Yahoo Search Goes Open Source
Feb 20th, 2008 by Colin Beasty
As part of its ongoing strategy of moving to an open source infrastructure and network, Yahoo says that its search index is now being processed using Apache Hadoop. Hadoop takes over from the proprietary system used previously, with cost savings and scalability among the benefits. It will run on a Linux server cluster with 10,000 processor cores and will do the job nearly 40 percent faster than the old software.
The irony of all this is that it comes just as Microsoft steps up its efforts to take over Yahoo. Microsoft has always been all about proprietary technology, though it has made strides toward open source adoption in recent years. That said, it will be interesting to see what influence a Microsoft acquisition would have on the search engine’s infrastructure, if and when that purchase happens.
The other irony is the implementation of Hadoop is yet another example of a vendor purchasing themselves a second-place company that’s still one step behind the leader. Yahoo is following in Google’s footsteps yet again. Simply stated, Hadoop is an open source implementation of Google’s MapReduce software.
Still, competing with Google using open source software where it can is a smart move on Yahoo’s part, especially when that software outperforms its own.
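For readers unfamiliar with the MapReduce model mentioned above, here is a minimal single-machine sketch in plain Python. It is not Hadoop's actual API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only shows the general shape of the idea: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. A real cluster would spread these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key; a real framework does this between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine (illustrative only)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["yahoo search", "yahoo hadoop"]))
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle be farmed out to separate machines, which is what makes the model a good fit for very large jobs.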



This story is getting a good amount of press today. Most people don't seem to know that Yahoo! and Hadoop isn't a new story.
Doug Cutting, the creator of Lucene, Nutch, and Hadoop, was actually brought on board at Yahoo! about a year ago and leads a team of engineers for the project. Basically, Yahoo! has taken over the Hadoop project to make it production ready for really, really large implementations.
For the past year they have been running it internally on a large cluster of servers to process log information for various groups within the company for analytics purposes.
Their move to run search on the technology is a statement that Hadoop is viable for anyone to use. It is not so much about Yahoo! trying to copy MapReduce as it is about commoditizing parallel programming and opening it up to all developers and startups. A lot of companies already use it for their day-to-day activities; two notable ones are Facebook and AdMob.
I have posted two PowerPoints from the Hadoop OSCON presentation last year. You can find them here:
http://kaiyzen.com/?p=77
“The other irony is the implementation of Hadoop is yet another example of a vendor purchasing themselves a second-place company that’s still one step behind the leader.”
Eh?
Hadoop is not a company. It is the leader when it comes to open source distributed computing infrastructure. Nothing else even comes close…