HDFS Java Hadoop API Overview




Hadoop API introduction: The Hadoop API provides a native Java API for file system operations such as creating, renaming or deleting files and directories; opening, reading or writing files; and setting permissions. A very basic example of how to read and write files with the Hadoop API can be found on the Apache wiki. This is great for applications running within the […]
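Here is a minimal sketch of the operations listed above, built on org.apache.hadoop.fs.FileSystem. It is not the Apache wiki example the post refers to. It creates a file, renames it, sets its permissions, reads it back and deletes it. It assumes hadoop-common (Hadoop 2.x or later) on the classpath and Java 9+ for InputStream.readAllBytes. The class name, method name and /tmp paths are hypothetical, and the NameNode address in the commented-out line is only a placeholder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsApiSketch {

    /**
     * Writes text to `file`, renames it to `renamed`, sets its permissions,
     * reads it back, then deletes it. Returns the text that was read.
     */
    public static String roundTrip(Configuration conf, Path file, Path renamed, String text)
            throws IOException {
        // Resolves to HDFS, the local file system, etc., depending on the path's
        // scheme and on fs.defaultFS in the configuration.
        FileSystem fs = file.getFileSystem(conf);

        // Create (overwriting if present) and write the bytes; missing parent
        // directories are created automatically.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }

        // rename() reports most failures through its boolean result, not an exception.
        if (!fs.rename(file, renamed)) {
            throw new IOException("rename failed: " + file + " -> " + renamed);
        }

        // Octal 0640 = rw-r----- (owner read/write, group read, others nothing).
        fs.setPermission(renamed, new FsPermission((short) 0640));

        // Open the file and read its full contents (readAllBytes needs Java 9+).
        String readBack;
        try (FSDataInputStream in = fs.open(renamed)) {
            readBack = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }

        // Non-recursive delete of a single file.
        fs.delete(renamed, false);
        return readBack;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; without it the default (local) file system is used.
        // conf.set("fs.defaultFS", "hdfs://namenode.example:8020");
        String text = roundTrip(conf,
                new Path("/tmp/hadoop-api-demo.txt"),
                new Path("/tmp/hadoop-api-demo-renamed.txt"),
                "hello from the Hadoop API");
        System.out.println(text);
    }
}
```

The same code runs unchanged against HDFS or the local disk, because FileSystem picks the implementation from the path's scheme. That makes it convenient to try out locally before pointing fs.defaultFS at a cluster.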


Overview of Hadoop Cassandra




Cassandra is an open-source NoSQL database. It was developed at Facebook to handle that company's unique need to process enormous amounts of data. Being NoSQL does not mean Cassandra is unstructured: its data is stored in familiar rows and columns, as in a regular SQL database. But there are no relations between […]


What is Hadoop Sqoop? What is Flume? – Hadoop Tutorial




In this article we are going to learn about Hadoop Sqoop and Hadoop Flume. Before we learn more about Flume and Sqoop, let's study the issues with loading data into Hadoop. Analytical processing using Hadoop requires loading huge amounts of data from diverse sources into Hadoop clusters. This process of […]


Hadoop – HDFS Operations




In this article we will look at the HDFS operations we usually need for our jobs. Starting HDFS: Initially you have to format the configured HDFS file system. Open the NameNode (HDFS server) and execute the following command:

$ hadoop namenode -format

After formatting HDFS, start the distributed file system. The following command will start […]


Hadoop Commands Reference




There are many more commands in “$HADOOP_HOME/bin/hadoop fs” than are demonstrated here, although these basic operations will get you started. Running ./bin/hadoop fs with no additional arguments will list all the commands that can be run with the FsShell system. Furthermore, $HADOOP_HOME/bin/hadoop fs -help commandName will display a short usage summary for the operation in question, if you […]
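The same FsShell commands can also be run from Java. This is a minimal sketch, assuming hadoop-common is on the classpath. It passes the arguments you would type after hadoop fs to org.apache.hadoop.fs.FsShell through ToolRunner, which returns the command's exit code. The class and method names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class FsShellSketch {

    /** Runs one FsShell command, e.g. {"-ls", "/"}, and returns its exit code (0 on success). */
    public static int runFsCommand(Configuration conf, String... args) throws Exception {
        // FsShell is the class behind "hadoop fs". ToolRunner first strips generic
        // options such as -D or -fs, then passes the remaining arguments to FsShell.run().
        return ToolRunner.run(conf, new FsShell(), args);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent to: hadoop fs -help ls
        runFsCommand(conf, "-help", "ls");
        // Equivalent to: hadoop fs -ls /tmp
        int rc = runFsCommand(conf, "-ls", "/tmp");
        System.out.println("exit code: " + rc);
    }
}
```

FsShell writes its listing to standard output, so the integer it returns is the programmatic way to detect failure. For example, -ls on a path that does not exist returns a non-zero code.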


Apache YARN Hadoop NextGen MapReduce




MapReduce underwent a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2), or YARN. The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application […]


Hadoop Distributed File System (HDFS) for Big Data




The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining […]
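HDFS gets this reliability by splitting each file into large blocks and storing every block on several DataNodes. The arithmetic below is a pure-JDK sketch of what that costs in storage. It assumes the usual defaults of a 128 MB block size (dfs.blocksize in Hadoop 2.x and later) and a replication factor of 3 (dfs.replication). A cluster may configure both differently, and the class name is hypothetical.

```java
public class HdfsStorageMath {

    /** Number of HDFS blocks a file occupies: ceil(fileBytes / blockBytes); an empty file has none. */
    public static long blockCount(long fileBytes, long blockBytes) {
        if (blockBytes <= 0) {
            throw new IllegalArgumentException("blockBytes must be positive");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /**
     * Raw disk consumed across the cluster: every byte is stored `replication` times.
     * A partial last block is not padded, so this depends on the file length, not the block count.
     */
    public static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // assumed dfs.blocksize default (Hadoop 2.x and later)
        int replication = 3;       // assumed dfs.replication default
        long file = 1000 * mb;     // a hypothetical 1000 MB file
        long blocks = blockCount(file, blockSize);
        System.out.println(blocks + " blocks, "
                + blocks * replication + " block replicas, "
                + rawBytes(file, replication) / mb + " MB of raw storage");
    }
}
```

For the hypothetical 1000 MB file, main prints "8 blocks, 24 block replicas, 3000 MB of raw storage". The eighth block holds only 104 MB. Because the 24 replicas can sit on different servers, both reads and failures are spread across the cluster.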


Hadoop FileInputFormat: Partitioning in MapReduce




As you may know, when a job (the MapReduce term for a program) is run, its input goes to the mappers, and the output of the mappers goes to the reducers. Ever wondered how many mappers and how many reducers a job needs? What parameters are taken into consideration when deciding the number of […]
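Hadoop's defaults give concrete answers to both questions. The number of map tasks equals the number of input splits computed by FileInputFormat. The number of reduce tasks is whatever the job configures (mapreduce.job.reduces, default 1), and HashPartitioner routes each map output key to one of those reducers. The pure-JDK sketch below re-implements those formulas for illustration only. It ignores non-splittable input such as gzip files and multi-file jobs, and its class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAndPartitionSketch {

    // FileInputFormat lets the last split be up to 10% larger than the split size
    // rather than creating a tiny extra split.
    private static final double SPLIT_SLOP = 1.1;

    /** Same formula as FileInputFormat.computeSplitSize: max(minSize, min(maxSize, blockSize)). */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Lengths of the input splits for one splittable file; one map task runs per split. */
    public static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // the final, possibly oversized, split
        }
        return splits;
    }

    /** Same formula as the default HashPartitioner: chooses the reducer for a map output key. */
    public static int partitionFor(Object key, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Defaults: minimum split size 1 byte, maximum unbounded, so split size = block size.
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        System.out.println(splitLengths(300 * mb, splitSize).size() + " map tasks for 300 MB");
        System.out.println(splitLengths(260 * mb, splitSize).size() + " map tasks for 260 MB");
        for (String word : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(word + " -> reducer " + partitionFor(word, 4));
        }
    }
}
```

With a 128 MB split size, a 300 MB file yields three splits (128, 128 and 44 MB). A 260 MB file yields only two (128 and 132 MB), because the remaining 132 MB falls within the 10% slop allowance.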


MRUnit Example for WordCount Algorithm




In this post we will discuss a basic MRUnit example for the WordCount algorithm. The tools used in this example are Eclipse 3.8 and mrunit-1.0.0-hadoop2.jar.

Procedure:
1. Download the MRUnit jar from this link and add it to the Java project's build path in Eclipse (File –> Properties –> Java Build Path –> Add External JARs).
2. […]
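A test of this kind might look like the sketch below. It assumes mrunit-1.0.0-hadoop2.jar, the Hadoop 2 client libraries and JUnit 4 on the build path. It defines a standard WordCount mapper and reducer (hypothetical names, not necessarily the post's own classes). MapDriver and ReduceDriver check each one separately, and MapReduceDriver checks the whole pipeline.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMRUnitTest {

    /** Emits (word, 1) for every whitespace-separated token in a line. */
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                context.write(new Text(tokens.nextToken()), ONE);
            }
        }
    }

    /** Sums the counts emitted for each word. */
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    private ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    private MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        WordCountMapper mapper = new WordCountMapper();
        WordCountReducer reducer = new WordCountReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void mapperEmitsOnePerWord() throws IOException {
        // Expected outputs are listed in the order the mapper emits them.
        mapDriver.withInput(new LongWritable(0), new Text("cat dog cat"))
                 .withOutput(new Text("cat"), new IntWritable(1))
                 .withOutput(new Text("dog"), new IntWritable(1))
                 .withOutput(new Text("cat"), new IntWritable(1))
                 .runTest();
    }

    @Test
    public void reducerSumsCounts() throws IOException {
        reduceDriver.withInput(new Text("cat"), Arrays.asList(new IntWritable(1), new IntWritable(1)))
                    .withOutput(new Text("cat"), new IntWritable(2))
                    .runTest();
    }

    @Test
    public void fullJobCountsWords() throws IOException {
        // MapReduceDriver sorts and groups map output by key before reducing,
        // so the expected output is in key order.
        mapReduceDriver.withInput(new LongWritable(0), new Text("cat dog cat"))
                       .withOutput(new Text("cat"), new IntWritable(2))
                       .withOutput(new Text("dog"), new IntWritable(1))
                       .runTest();
    }
}
```

In Eclipse the class runs as an ordinary JUnit test (Run As –> JUnit Test). runTest() fails the test if the actual output differs from the expected pairs.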
