Apache Hadoop - Hadoop Open Source Code

Apache Hadoop

  • Availability: Open Source

  • It is a big data framework for storing and processing large datasets on clusters, using the MapReduce programming model.

  • It is used for batch processing, in which data is collected first and then processed afterward as a group.

  • It can process terabytes to petabytes of data.

  • It runs on ordinary computers: any machine with sufficient processing capacity can participate as a node in a Hadoop cluster, and no specialized hardware is required for data processing.

  • It uses a distributed file system that is shared across all participating nodes. Data is broken into smaller, equal-sized blocks and sent to several nodes for processing in parallel.

  • It does not check or validate submitted data for security purposes; it processes whatever data is submitted to it.

  • Apache Hive is built on top of Hadoop and is used to query and manage datasets in distributed storage. It provides ETL tools and SQL-like query execution via MapReduce, and it allows custom mappers and reducers to be plugged in.

  • The Apache Pig platform analyzes large datasets using its own high-level, SQL-like language called “Pig Latin”, which lets users write MapReduce tasks without coding them directly.
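
The block-based storage described in the list above can be illustrated with a small sketch. This is plain Python rather than the HDFS API: the function names, the round-robin placement policy, and the node names are all assumptions made for illustration, and the block size is tiny so the result is easy to inspect (real HDFS blocks are typically many megabytes). Note that the final block of a file may be shorter than the others.

```python
# Illustrative only: plain Python, not the HDFS API.
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks: int, nodes: list[str]) -> dict[str, list[int]]:
    """Hand out block indices to nodes round-robin (a hypothetical policy)."""
    placement = {node: [] for node in nodes}
    for block_id, node in zip(range(num_blocks), cycle(nodes)):
        placement[node].append(block_id)
    return placement


data = b"x" * 10  # stand-in for a large file
blocks = split_into_blocks(data, block_size=4)  # tiny size for demonstration
placement = assign_blocks(len(blocks), ["node-a", "node-b"])  # hypothetical nodes

print([len(b) for b in blocks])
print(placement)
```

Each node can then work on its own blocks independently, which is what makes parallel processing possible.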

Modules

  • Hadoop Common: the libraries and utilities required by the other Hadoop modules

  • Hadoop Distributed File System (HDFS): stores data on commodity machines and provides very high aggregate bandwidth across the Hadoop cluster

  • Hadoop YARN: manages the computational resources in a cluster and schedules those resources for user applications

  • Hadoop MapReduce: a programming model and implementation for large-scale, parallel data processing
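
To make the MapReduce programming model more concrete, here is a minimal word-count sketch. It is plain Python that imitates the map, shuffle, and reduce phases in a single process; it does not use Hadoop's own API, and the function names are assumptions chosen for illustration. In a real cluster the map and reduce calls would run in parallel on different nodes.

```python
# Illustrative only: plain Python that mimics the map, shuffle, and reduce phases.
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big clusters"])
print(counts)
```

The mapper and reducer see only one line or one key at a time, which is why the work can be spread across many machines.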

Features

  • Open Source

  • Distributed Processing

  • Reliability

  • Fault Tolerance

  • Scalability and High Availability

  • Easy to use

  • Data Locality