Hadoop: A set of open-source programs and procedures that anyone can use as the “backbone” of their big data operations.
The four modules of Hadoop
- Distributed File-System
- MapReduce
- Hadoop Common
- YARN
Distributed File-System
The two most important modules are the Distributed File System and MapReduce. The Hadoop Distributed File System “sits above” the host computer’s own file system, meaning the data can be accessed from any computer running any supported operating system. MapReduce, in turn, provides the basic tools for “massaging” the data.
MapReduce
Reads data from the database and puts it into a format suitable for analysis (Map), then performs mathematical operations on it, e.g. counting the number of males aged 30+ in a customer database (Reduce).
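The map/reduce split above can be sketched in plain Java. This is a minimal in-memory illustration of the model, not real Hadoop code (a real job would implement Hadoop’s Mapper and Reducer classes and run on a cluster); the `Customer` record and its fields are hypothetical. The Map step emits a (key, 1) pair for each male customer aged 30+, and the Reduce step sums the values per key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceSketch {

    // Hypothetical customer record: name, gender ("M"/"F"), age.
    record Customer(String name, String gender, int age) {}

    public static long countMales30Plus(List<Customer> customers) {
        // Map phase: for each matching record, emit a ("male30+", 1) pair.
        var pairs = customers.stream()
                .filter(c -> c.gender().equals("M") && c.age() >= 30)
                .map(c -> new SimpleEntry<>("male30+", 1));

        // Reduce phase: group the pairs by key and sum their counts.
        Map<String, Integer> reduced = pairs.collect(
                Collectors.groupingBy(SimpleEntry::getKey,
                        Collectors.summingInt(SimpleEntry::getValue)));

        return reduced.getOrDefault("male30+", 0);
    }

    public static void main(String[] args) {
        List<Customer> db = List.of(
                new Customer("Alice", "F", 34),
                new Customer("Bob",   "M", 41),
                new Customer("Carol", "F", 29),
                new Customer("Dave",  "M", 30),
                new Customer("Frank", "M", 25));
        System.out.println(countMales30Plus(db)); // Bob and Dave match
    }
}
```

The point of the two-phase shape is that the Map step can run independently on each chunk of data across many machines, and only the small (key, count) pairs need to be gathered for the Reduce step.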
Hadoop Common
Another module, which provides the tools (written in Java) needed for the user’s computer system to read data stored under the Hadoop file system.
YARN
The final module of the four; it manages the resources of the systems that store the data and run the analysis.
Over the years, various other procedures, libraries or features have come to be considered part of the Hadoop framework.
The Apache Software Foundation first released the Hadoop system in 2006.