
How to Develop Big Data Applications for Hadoop

This video is a great introduction to implementing Hadoop on Amazon Web Services using Karmasphere Studio.

Recorded at a morning session of the O’Reilly Strata 2011 conference, it features a panel of speakers from Karmasphere, Amazon Web Services, and Concurrent. The video comes in three parts totaling 145 minutes, and while the editing could have been better, the content is excellent.

It starts off with the history of Hadoop, the basics of the MapReduce infrastructure, and the languages, libraries, and other supporting projects that go with it.

Ken Krugler of Amazon gives an overview of Amazon Web Services (AWS), followed by Chris Wensel of Concurrent, who talks about the company's Cascading product.

One of the central ideas of the video is that MapReduce (MR) is too low-level to express anything more than a simple algorithm. Tools such as Karmasphere Studio can help generate the needed boilerplate code when given a higher-level model. Tools that work with these higher-level models include:

  • Cascading, a Java API for combining multiple MR steps into a single data flow
  • Hive, which provides a SQL-like query language that works with flat files and most other file types
  • Pig, a data analysis platform with its own scripting language, Pig Latin
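
To make the "too low-level" point concrete, here is a minimal sketch of the classic word count written in the map/shuffle/reduce shape. It is plain Python running in memory, not Hadoop code and not anything shown in the video; the function names are made up for illustration. Even for a task this small, the logic has to be split into a mapper, a grouping step, and a reducer, which is the kind of plumbing the higher-level tools hide.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, the job the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

In a real Hadoop job each phase would also need job configuration, input and output formats, and serialization types, which is where most of the boilerplate comes from.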

A case study follows on how Playfish, a company that makes games for Facebook, uses Karmasphere Analyst to produce its reports. Every click in a Playfish game is treated as a tuple to be processed, and reports used to take a long time to run. Now, with Analyst and AWS, the reporting has sped up tremendously, enabling Playfish to respond to trends that much more quickly.

Next, a hands-on lab, led by Abe Taha of Karmasphere, was the highlight of the video. It covered:

  • installation of Karmasphere Studio into Eclipse
  • working with the Hadoop perspective to set up clusters
  • using the Java perspective to create various artifacts, like reducers, mappers, and partitioners
  • defining and loading data files with Karmasphere Analyst
  • using Hive to implement joins, which are easy in Hive but would be difficult in Java MR
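
The last point is easier to appreciate with a sketch of what a join looks like when written by hand in the MapReduce style. The following is an in-memory Python illustration of a reduce-side inner join, not the Java code from the lab; the `users` and `clicks` datasets and all function names are hypothetical. In Hive the same result would come from a single `JOIN ... ON` clause.

```python
from collections import defaultdict


def map_users(user):
    """Mapper for the hypothetical users dataset: tag each record with its source."""
    user_id, name = user
    yield user_id, ("user", name)


def map_clicks(click):
    """Mapper for the hypothetical clicks dataset: same key, different tag."""
    user_id, item = click
    yield user_id, ("click", item)


def reduce_join(user_id, tagged_values):
    """Reducer: separate the values by tag, then pair every user row with every click row."""
    names = [value for tag, value in tagged_values if tag == "user"]
    items = [value for tag, value in tagged_values if tag == "click"]
    for name in names:
        for item in items:
            yield user_id, name, item


def join(users, clicks):
    """Simulate map, shuffle, and reduce for an inner join on user_id."""
    groups = defaultdict(list)
    for user in users:
        for key, value in map_users(user):
            groups[key].append(value)
    for click in clicks:
        for key, value in map_clicks(click):
            groups[key].append(value)
    rows = []
    for key in sorted(groups):
        rows.extend(reduce_join(key, groups[key]))
    return rows


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob")]
    clicks = [(1, "shop"), (1, "gift"), (3, "farm")]
    for row in join(users, clicks):
        print(row)
```

Keys that appear in only one dataset produce no output rows, which is what makes this an inner join. Tagging records by source and untangling them again in the reducer is the part a SQL-like language spares you.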

A Q&A session then wrapped everything up.

Overall, a great video well worth the time.

This video is available at O’Reilly.


Filed under aws, book review, hadoop