Hadoop: A set of open-source programs and procedures that anyone can use as the “backbone” of their big data operations.

The four modules of Hadoop

  1. Distributed File-System
  2. MapReduce
  3. Hadoop Common
  4. YARN

Distributed File-System

The two most important modules are the Distributed File System and MapReduce. The Hadoop Distributed File System uses its own file system that “sits above” the host computer’s file system, meaning the data can be accessed from any computer running any supported operating system. MapReduce, meanwhile, provides the basic tools for “massaging” the data.


MapReduce

MapReduce performs two basic operations: reading data from the database and putting it into a format suitable for analysis (Map), then performing mathematical operations on that data, e.g. counting the number of males aged 30+ in a customer database (Reduce).
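The Map and Reduce steps described above can be sketched in plain Java, with no Hadoop cluster required. Note that the `Customer` record, the sample data, and the method name here are hypothetical, purely for illustration; a real Hadoop job would instead subclass the Mapper and Reducer classes from the org.apache.hadoop.mapreduce package.

```java
import java.util.List;

public class MapReduceSketch {
    // Hypothetical customer record for illustration (not a real Hadoop type).
    record Customer(String name, String gender, int age) {}

    // Made-up sample data standing in for a customer database.
    static final List<Customer> SAMPLE = List.of(
            new Customer("Alice", "F", 34),
            new Customer("Bob",   "M", 41),
            new Customer("Carol", "F", 28),
            new Customer("Dave",  "M", 25),
            new Customer("Frank", "M", 33));

    // Map step: emit a 1 for every record matching "male, aged 30+".
    // Reduce step: sum the emitted counts into a single total.
    static long countMales30Plus(List<Customer> customers) {
        return customers.stream()
                .map(c -> (c.gender().equals("M") && c.age() >= 30) ? 1L : 0L) // Map
                .reduce(0L, Long::sum);                                        // Reduce
    }

    public static void main(String[] args) {
        System.out.println(countMales30Plus(SAMPLE)); // prints 2 (Bob and Frank)
    }
}
```

In a real cluster, the Map step would run in parallel on the machines holding each chunk of the data, and the Reduce step would combine their partial counts.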

Hadoop Common

Another module, Hadoop Common provides the tools (in Java) needed for the user’s computer systems to read data stored under the Hadoop file system.


YARN

The final module of the four, YARN manages the resources of the systems that store the data and run the analysis.

Over the years, various other procedures, libraries, and features have come to be considered part of the Hadoop framework.

The Apache Software Foundation first released the Hadoop system in 2005.