
Big Data Analytics Tools

In this blog we are going to walk through the Hadoop Ecosystem, which comprises many different analytical tools. Big Data analysis is not limited to just a few tools. Today we live in a world where enormous volumes of data are generated every second; even a single click on a social networking site generates data. To analyse data growing at this pace, many different tools have been developed over the years, both by IT giants like Yahoo and Facebook and by the Apache Software Foundation.
Let us start by discussing the tools present in the Hadoop Ecosystem for data analysis. The Hadoop Ecosystem comprises the following tools:

1) HDFS - Hadoop Distributed File System
2) YARN - Yet Another Resource Negotiator
3) MapReduce - A programming model for processing Big Data (written in languages like Java, or others via Hadoop Streaming)
4) Spark - An in-memory framework for fast, near-real-time data analytics
5) Apache Storm - A fault-tolerant, distributed framework for real-time computation and processing of data streams
6) Pig - Uses its own scripting language, Pig Latin, for data processing
7) Hive - Uses SQL-like queries for data processing, known as Hive Query Language (HQL)
8) HBase - A NoSQL database that runs on top of HDFS
9) Mahout, Spark MLlib - Libraries for machine learning
10) Apache Drill - SQL on Hadoop
11) ZooKeeper - Performs synchronization, configuration maintenance, grouping and naming
12) Oozie - Used for job scheduling
13) Flume, Sqoop - Data ingestion tools (Flume for streaming data, Sqoop for relational databases)
14) Solr & Lucene - Searching and indexing
15) Apache Ambari - Provisioning, managing and monitoring Apache Hadoop clusters
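To make the MapReduce model from the list above concrete, here is a minimal word-count sketch in Python, written in the spirit of Hadoop Streaming (where a mapper emits key/value pairs and a reducer aggregates them by key). The function names, the local "shuffle" step, and the sample input are illustrative only; a real job would run the mapper and reducer as separate scripts distributed across a Hadoop cluster.

```python
# Word count in the MapReduce style: map emits (word, 1) pairs,
# the shuffle/sort phase groups pairs by key, and reduce sums
# the counts for each word. Illustrative sketch only -- not the
# Hadoop API itself.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Sum the partial counts for a single word.
    return (word, sum(counts))

def run_job(lines):
    # Map phase: collect (word, 1) pairs from every input line.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle/sort phase: bring identical keys together.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct word.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

if __name__ == "__main__":
    counts = run_job(["big data is big", "hadoop processes big data"])
    print(counts)
```

On a cluster, the same map and reduce logic runs in parallel over HDFS blocks, which is what lets MapReduce scale to very large data sets.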

Besides these, there are many other tools available for data processing, but the ones above are the most frequently used for Big Data computing. These are the tools making the lives of Big Data engineers, data scientists and data analysts a bit easier.


Popular posts from this blog

History of Hadoop

Hadoop is an open-source framework for data processing written in Java. First, let's dig into the history of Hadoop and where the name came from. Kids are very good at inventing new names: the word "googol" was coined by a child, and it later inspired the company name Google. Doug Cutting is the creator of Hadoop, and the idea for the name came from his son, who had named his yellow stuffed elephant toy "Hadoop". Hadoop is a framework which allows us to process very large data sets (i.e., data sets at petabyte and even zettabyte scale).