Friday, October 4, 2013

Hot Topic On Technology

Apache Hadoop

  1. Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.
  2. The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
  3. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are automatically handled by the framework.
  4. It enables applications to work with thousands of independent computers and petabytes of data.
  5. Hadoop is written in the Java programming language and is an Apache top-level project being built and used by a global community of contributors. Hadoop and its related projects (Hive, HBase, ZooKeeper, and so on) have many contributors from across the ecosystem. Though Java code is most common, any programming language can be used with "streaming" to implement the "map" and "reduce" parts of the system.
  6. JobTracker and TaskTracker: The MapReduce engine consists of one JobTracker, to which client applications submit MapReduce jobs. The JobTracker pushes work out to available TaskTracker nodes in the cluster, striving to keep the work as close to the data as possible.
  7. Scheduling: By default Hadoop uses FIFO scheduling, with five optional scheduling priorities, to schedule jobs from a work queue. In version 0.19 the job scheduler was refactored out of the JobTracker, which added the ability to use an alternate scheduler (such as the Fair scheduler or the Capacity scheduler).
  8. Fair scheduler: The goal of the fair scheduler is to provide fast response times for small jobs and quality of service (QoS) for production jobs.
  9. Capacity scheduler: The capacity scheduler supports several features that are similar to those of the fair scheduler.
  10. Prominent users: Yahoo!, Facebook, Amazon, eBay, AVG, Google, etc.
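
To make point 2 more concrete, here is a minimal sketch of the MapReduce idea as a word count, written in plain Python and run in a single process. It is not Hadoop code and uses no Hadoop API: the function names (map_fragment, shuffle, reduce_counts, run_job) are made up for illustration, and a real cluster would spread the map and reduce calls across many nodes instead of looping over them locally.

```python
from collections import defaultdict


def map_fragment(line):
    """Map step: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the job the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    # Each line is an independent fragment of work. Because map_fragment
    # depends only on its own input, a fragment could be executed (or
    # re-executed after a failure) anywhere without changing the result.
    pairs = []
    for line in lines:
        pairs.extend(map_fragment(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["hadoop stores data", "hadoop processes data"]))
```

Since the map step has no shared state, losing one fragment only means running that fragment again, which is the property that lets the framework handle node failures automatically.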
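
The default scheduling in point 7 (FIFO with five optional priorities) can be pictured with a small toy queue. This is only a sketch under assumptions: the FifoJobQueue class and its methods are hypothetical and are not part of Hadoop, and the post does not describe how the JobTracker orders its queue internally. The five priority names follow Hadoop's job priority levels.

```python
import heapq
import itertools

# Five priority levels; a lower number is served first.
PRIORITIES = {"VERY_HIGH": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "VERY_LOW": 4}


class FifoJobQueue:
    """Toy work queue: higher priority first, FIFO order within a priority."""

    def __init__(self):
        self._heap = []
        # A monotonically increasing counter records submission order,
        # so jobs with equal priority come out first-in, first-out.
        self._counter = itertools.count()

    def submit(self, job_name, priority="NORMAL"):
        entry = (PRIORITIES[priority], next(self._counter), job_name)
        heapq.heappush(self._heap, entry)

    def next_job(self):
        """Return the next job name to run, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    queue = FifoJobQueue()
    queue.submit("nightly-report")
    queue.submit("log-cleanup", priority="LOW")
    queue.submit("urgent-fix", priority="VERY_HIGH")
    queue.submit("weekly-report")
    while (job := queue.next_job()) is not None:
        print(job)
```

With a plain FIFO queue like this, a long job at the head can delay every small job behind it, which is the kind of problem the Fair and Capacity schedulers aim to address.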



