Prepared by: U. Selvi, B.E., M.Tech.
Hadoop was created by Doug Cutting and has its origins in Apache Nutch, an open-source web search engine. Hadoop is widely used in industry for big data applications, e.g., spam filtering, network searching, click-stream analysis, and social recommendations.
Hadoop provides a reliable shared storage and analysis system. Storage is provided by the Hadoop Distributed File System (HDFS) and analysis by MapReduce.
MapReduce is a computing model that decomposes a large data manipulation job into individual tasks that can be executed in parallel across a cluster of servers. The results of the tasks are then combined to compute the final result. The MapReduce programming model was developed at Google.
The name MapReduce comes from the two fundamental data transformation operations it uses, map and reduce. The map operation converts the elements of a collection from one form to another, emitting key-value pairs that become the input to the reduce operation. The reduce operation then performs a summary operation on the values collected for each key.
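The map and reduce operations described above can be illustrated with a word count, the usual introductory example. The sketch below is plain single-machine Python, not Hadoop API code; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce tasks in parallel and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: convert one line of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key; in Hadoop the framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: summarize all values for one key, here by summing them."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big storage", "hadoop stores big data"]
    print(word_count(sample))
```

Each call to map_phase depends only on its own line, which is why the map tasks can be spread across many servers; the reduce step only needs all values for a given key to arrive at the same place.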
HDFS manages data storage across the cluster. Files are split into blocks, and each block is replicated several times (three times by default), so that the failure of a single hard drive or server does not result in data loss. HDFS therefore provides fault tolerance.
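The effect of block replication can be shown with a small simulation. This is a toy sketch under stated assumptions: the round-robin placement and the function names (place_replicas, surviving_blocks) are hypothetical simplifications for illustration and are not the placement policy HDFS actually uses.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simplified round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]  # hypothetical node names
    placement = place_replicas(6, nodes)
    # With three replicas per block, losing a single node loses no block.
    print(surviving_blocks(placement, ["n2"]) == set(placement))
```

With three replicas on distinct nodes, a block becomes unavailable only if all three of its holders fail at the same time.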
Hadoop can cope with, and continue to perform under, growing data sets. It therefore provides scalability.
The Hadoop ecosystem includes Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.
Hadoop supports data exploration with full datasets, mining of large datasets, large-scale pre-processing of raw data, and data agility.