Hadoop Ecosystem Application

A complete processing architecture

  • Data transfer (Flume, Sqoop, Kafka, Falcon)
  • File System (HDFS)
  • Data Storage (HBase, Cassandra)
  • Serialization (Avro, Trevni, Thrift)
  • Jobs Execution (MapReduce, YARN)
  • Data Interaction (Pig, Hive, Spark, Storm)
  • Machine learning and analytics (Mahout, Drill)
  • Search (Lucene, Solr)
  • Graph processing (Giraph)
  • Security (Knox, Sentry)
  • Operations and coordination (Oozie, ZooKeeper, Ambari)
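To make the "Jobs Execution (MapReduce)" layer above concrete, here is a toy, pure-Python sketch of the map, shuffle, and reduce phases that higher-level tools such as Pig and Hive compile their jobs into. No Hadoop is involved; the function names and input are illustrative.

```python
# Toy word count showing the map -> shuffle -> reduce phases of MapReduce.
from collections import defaultdict

def map_phase(lines):
    # Emit (word, 1) pairs, as a MapReduce mapper would.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Group values by key, as the framework's shuffle/sort step does.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word, as a reducer would.
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["hadoop ecosystem", "hadoop hdfs"])))
print(counts)  # {'hadoop': 2, 'ecosystem': 1, 'hdfs': 1}
```

In a real cluster the mapper and reducer run as distributed tasks and the shuffle moves data between nodes, but the data flow is the same.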

1. Apache ZooKeeper — coordination service for distributed applications
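As a sketch, ZooKeeper stores coordination data in a hierarchical namespace of znodes. With the standard `zkCli.sh` shell (the ensemble address, paths, and data are illustrative, and the prompt is simplified):

```
# Connect to a ZooKeeper ensemble and work with znodes
$ zkCli.sh -server zk1:2181
create /services/app1 "config-v1"   # register a znode with some data
get /services/app1                  # read the data back
ls /services                        # list child znodes
```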

2. Apache Oozie — Workflow Scheduling
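An Oozie workflow is defined in XML as a graph of actions. A minimal sketch with a single filesystem action (the workflow name and path are illustrative):

```xml
<!-- Minimal Oozie workflow: delete a staging directory, then finish -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="cleanup"/>
  <action name="cleanup">
    <fs>
      <delete path="${nameNode}/data/staging"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Cleanup action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```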

3. Apache Hive — SQL-based data warehouse on Hadoop (originally developed at Facebook)
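Hive exposes data in HDFS through SQL-like queries (HiveQL) that are compiled into distributed jobs. A sketch with a hypothetical table:

```sql
-- Hypothetical table of page views, stored as ORC files in HDFS
CREATE TABLE page_views (user_id STRING, url STRING, ts TIMESTAMP)
STORED AS ORC;

-- Top ten most visited URLs
SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```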

4. Apache Sqoop — bulk data transfer between relational databases (e.g. SQL Server, Oracle) and HDFS
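A typical Sqoop import pulls a relational table into HDFS in parallel. A sketch, where the JDBC URL, credentials, table, and target directory are all illustrative:

```shell
# Import an Oracle table into HDFS using four parallel mappers
sqoop import \
  --connect jdbc:oracle:thin:@db-host:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMERS \
  --target-dir /data/customers \
  --num-mappers 4
```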

5. Apache Pig

  • Pig Latin is a procedural, data-flow scripting language whose syntax and commands are used to express business logic.
  • The runtime engine is the compiler: it validates Pig Latin scripts, compiles them into sequences of MapReduce jobs, interacts with the Hadoop cluster, and uses HDFS to store and fetch data.
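As a sketch of what a Pig Latin data-flow script looks like, here is the canonical word count (input and output paths are illustrative):

```
-- Word count in Pig Latin
lines   = LOAD '/data/input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
STORE counts INTO '/data/wordcount';
```

The runtime engine compiles these five statements into one or more MapReduce jobs: the GROUP becomes the shuffle, and the COUNT becomes the reducer.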

6. Apache HBase — distributed, column-oriented NoSQL database on HDFS

HBase Node Architecture

HBase vs. RDBMS
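Unlike an RDBMS, HBase stores rows in tables of column families and is accessed by row key rather than by SQL. A quick sketch using the standard `hbase shell` (table, row, and column names are illustrative):

```
# Create a table with one column family, write a cell, and read it back
create 'users', 'profile'
put 'users', 'row1', 'profile:name', 'Ada'
get 'users', 'row1'
scan 'users'
```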

7. Apache Flume — large-scale collection and ingestion of log and event data into HDFS
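A Flume agent is wired together from a source, a channel, and a sink in a properties file. A minimal sketch that tails a log file into HDFS (agent name and paths are illustrative):

```properties
# Flume agent: tail an application log into HDFS
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/app-logs
agent1.sinks.sink1.channel = ch1
```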

8. Apache Mahout — scalable machine learning library

9. Apache Kafka — distributed publish/subscribe messaging system that serves as a central data-flow hub
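Producers write records to named topics and consumers read them back independently. A sketch using Kafka's bundled command-line tools (the broker address and topic name are illustrative):

```shell
# Create a topic, then produce and consume messages on it
kafka-topics.sh --create --topic events \
  --partitions 3 --replication-factor 1 \
  --bootstrap-server broker:9092

kafka-console-producer.sh --topic events --bootstrap-server broker:9092
kafka-console-consumer.sh --topic events --from-beginning \
  --bootstrap-server broker:9092
```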
