In this blog we are going to walk through the entire Hadoop Ecosystem. The Hadoop Ecosystem comprises many different analytical tools, because Big Data analysis is not limited to just a few of them. Today we live in a world where enormous volumes of data are generated every second; even a single click on a social networking site generates data. To analyze this exponentially growing data, many different tools have been developed over the years. IT giants like Yahoo and Facebook, along with The Apache Software Foundation, have built several analytical tools for Big Data computing.
Let us start by discussing the tools available under the Hadoop Ecosystem for data analysis. The Hadoop Ecosystem comprises the following tools:
1) HDFS - Hadoop Distributed File System
2) YARN - Yet Another Resource Negotiator
3) MapReduce - Processes Big Data using programs written in languages such as Java, R, etc. (a word-count sketch follows this list)
4) Spark - A framework for real-time, in-memory data analytics (a Spark example also follows this list)
5) Apache Storm - A fault-tolerant, distributed framework for real-time computation and the processing of data streams
6) PIG - Uses its own scripting language, Pig Latin, for data processing
7) HIVE - Uses SQL-like queries, known as Hive Query Language (HQL), for data processing (a sample HQL query follows this list)
8) HBase - NoSQL Database
9) Mahout, Spark MLlib - Used for Machine Learning
10) Apache Drill - SQL on Hadoop
11) ZooKeeper - Performs synchronization, configuration maintenance, grouping and naming
12) Oozie - Used for Job Scheduling
13) Flume, Sqoop - Data ingestion tools (Flume for streaming data such as logs, Sqoop for transferring data between Hadoop and relational databases)
14) Solr & Lucene - Searching and Indexing
15) Apache Ambari - Provisioning, Managing and Monitoring Apache Hadoop clusters
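To make the MapReduce entry concrete, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API. The input and output paths are placeholders taken from the command line; everything else uses the standard org.apache.hadoop.mapreduce classes.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

You would package this into a JAR and launch it with something like hadoop jar wordcount.jar WordCount /input /output.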
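For comparison, here is the same word count sketched with Spark's Java RDD API; the HDFS paths are assumptions for illustration. Notice that the whole pipeline is a few chained transformations instead of separate Mapper and Reducer classes, which is a big part of Spark's appeal.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("spark-word-count");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      // Placeholder paths; any HDFS or local path works here
      JavaRDD<String> lines = sc.textFile("hdfs:///input/logs.txt");
      JavaRDD<String> words =
          lines.flatMap(line -> Arrays.asList(line.split("\\s+")).iterator());
      JavaPairRDD<String, Integer> counts =
          words.mapToPair(word -> new Tuple2<>(word, 1))
               .reduceByKey(Integer::sum); // sum the 1s for each word, in memory
      counts.saveAsTextFile("hdfs:///output/word-counts");
    }
  }
}
```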
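Finally, a sketch of the Hive entry: running an HQL query from Java over JDBC. The HiveServer2 URL, the credentials and the clicks table are all hypothetical; the point is that you write familiar SQL-style queries while Hive compiles them into distributed jobs behind the scenes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
  public static void main(String[] args) throws Exception {
    // Requires the hive-jdbc driver on the classpath
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // Hypothetical HiveServer2 endpoint and credentials
    String url = "jdbc:hive2://localhost:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement();
         // HQL looks like SQL, but Hive executes it as distributed jobs
         ResultSet rs = stmt.executeQuery(
             "SELECT page, COUNT(*) AS hits FROM clicks GROUP BY page")) {
      while (rs.next()) {
        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
      }
    }
  }
}
```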
Besides these, there are many other tools available for data processing, but the ones above are the most frequently used for Big Data computing. These are the tools that make the lives of Big Data Engineers, Data Scientists and Data Analysts a bit easier.