Hadoop – HDFS Operations




This article covers the HDFS operations most often needed in day-to-day work. Starting HDFS: initially you have to format the configured HDFS file system. Open the namenode (HDFS server) and execute the following command:
$ hadoop namenode -format
After formatting HDFS, start the distributed file system. The following command will start […]


Hadoop Commands Reference




There are many more commands in “$HADOOP_HOME/bin/hadoop fs” than are demonstrated here, although these basic operations will get you started. Running ./bin/hadoop dfs with no additional arguments lists all the commands that can be run with the FsShell system. Furthermore, $HADOOP_HOME/bin/hadoop fs -help commandName displays a short usage summary for the operation in question, if you […]


How to run hadoop – map reduce jobs without a cluster




This document is intended to help beginning Java developers get started with running Hadoop and experimenting with Hadoop MapReduce jobs without setting up a cluster of their own. To follow it you need basic theoretical knowledge of Hadoop, HDFS and MapReduce jobs. It is also advisable to have some prior knowledge on […]


Word Count – Hadoop Map Reduce Example




Word count is the typical example with which Hadoop MapReduce developers first get hands-on. This sample MapReduce job counts the number of occurrences of each word in the provided input files. What are the minimum requirements? Input text files (any text file); the Cloudera test VM; the mapper, reducer and driver classes to process the […]
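To make the mapper and reducer roles concrete, here is a minimal sketch of the word count logic in plain Python. It does not use the Hadoop API or the Cloudera VM mentioned above; the function names `map_words`, `reduce_counts` and `word_count` are hypothetical, and lowercasing the words is an assumption made only for this illustration.

```python
from collections import defaultdict


def map_words(line):
    """Mapper step: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.split():
        # Lowercasing is an assumption of this sketch, not a Hadoop requirement.
        yield word.lower(), 1


def reduce_counts(pairs):
    """Reducer step: sum the counts per word (stands in for shuffle plus reduce)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Driver: run the mapper over every line, then reduce all emitted pairs."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

In a real job the mapper and reducer run on different machines and the framework groups the pairs by key between the two steps; here a single dictionary plays that part.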


Hadoop Distributed File System (HDFS) and MapReduce




The Hadoop Distributed File System (HDFS) is a fault-tolerant, self-healing distributed file system designed to turn a cluster of industry-standard servers into a massively scalable pool of storage. Developed specifically for large-scale data processing workloads where scalability, flexibility and throughput are critical, HDFS accepts data in any format regardless of schema, optimizes for high-bandwidth streaming, […]


Apache YARN Hadoop NextGen MapReduce




MapReduce underwent a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2), or YARN. The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application […]


Hadoop Distributed File System (HDFS) for Big Data




The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining […]


hadoop fileinputformat Partitioning in MapReduce




As you may know, when a job (the MapReduce term for a program) is run, it goes to the mapper, and the output of the mapper goes to the reducer. Ever wondered how many mappers and how many reducers are required for a job execution? What parameters are taken into consideration for deciding the number of […]
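As a rough illustration of how mapper output can be routed to reducers, the sketch below hashes each key and takes the result modulo the number of reducers, which is the general idea behind hash partitioning. It is plain Python rather than Hadoop code: the names `partition` and `assign` are hypothetical, and `zlib.crc32` is used only because it gives a stable hash in Python, not because Hadoop uses it.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for a key: stable hash modulo the reducer count."""
    # crc32 is an assumption of this sketch; it is deterministic across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def assign(pairs, num_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    mapper_output = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
    for index, bucket in enumerate(assign(mapper_output, 2)):
        print(index, bucket)
```

The property that matters is that every pair with the same key lands in the same bucket, so a single reducer sees all the values for that key.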


Mapreduce Use Case to Calculate PageRank




PageRank is a way of measuring the importance of website pages. It works by counting the number and quality of links to a page to produce a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. In the general […]
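The link-counting idea can be sketched as a small iterative computation in plain Python. This is an illustration under stated assumptions, not the implementation from the article: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the even spreading of rank from pages without outgoing links are all choices made for this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict mapping page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # "Map" step: every page sends an equal share of its rank to its targets.
        received = {page: 0.0 for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = ranks[page] / len(targets)
                for target in targets:
                    received[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                for target in pages:
                    received[target] += ranks[page] / n

        # "Reduce" step: combine the received shares with the damping term.
        ranks = {page: (1 - damping) / n + damping * received[page] for page in pages}

    return ranks


if __name__ == "__main__":
    example = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, rank in sorted(pagerank(example).items()):
        print(page, round(rank, 3))
```

Each pass of the loop corresponds to one MapReduce job in a distributed version: the map phase emits rank shares keyed by target page, and the reduce phase sums them per page.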

