HDFS JAVA Hadoop API Overview




Hadoop API Introduction: The Hadoop API provides a native Java API for file system operations such as creating, renaming, or deleting files and directories; opening, reading, or writing files; and setting permissions. A very basic example of reading and writing files through the Hadoop API can be found on the Apache wiki. This is great for applications running within the […]


Apache Hadoop Yarn Overview




Here we describe Apache Hadoop YARN, a resource manager built into Hadoop. It is also a stand-alone framework that other applications can use to run across a distributed architecture. We illustrate YARN by setting up a Hadoop cluster, as YARN by itself is not much to see. It is not something you work with […]


How to run Hadoop MapReduce jobs without a cluster




This document is intended to help Java developers with basic experience get started with running Hadoop and experimenting with Hadoop MapReduce jobs, without setting up any cluster on their end. To follow this document you need basic theoretical knowledge of Hadoop, HDFS, and MapReduce jobs. It is also advisable to have some prior knowledge of […]


Hadoop Distributed File System (HDFS) and MapReduce




The Hadoop Distributed File System (HDFS): HDFS is a fault-tolerant and self-healing distributed file system designed to turn a cluster of industry-standard servers into a massively scalable pool of storage. Developed specifically for large-scale data processing workloads where scalability, flexibility, and throughput are critical, HDFS accepts data in any format regardless of schema, optimizes for high-bandwidth streaming, […]


Apache YARN Hadoop NextGen MapReduce




MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2), or YARN. The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application […]


hadoop fileinputformat Partitioning in MapReduce




As you may know, when a job (the MapReduce term for a program) is run, it goes to the mapper, and the output of the mapper goes to the reducer. Ever wondered how many mappers and how many reducers are required to execute a job? What parameters are taken into consideration when deciding the number of […]
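As a rough illustration of how mapper output is routed to reducers, the sketch below mimics hash-based partitioning in plain Java, with no Hadoop dependency. The class name `HashPartitionSketch`, the method `partitionFor`, and the reducer count of 3 are hypothetical names and values chosen for this example; the formula follows the usual hash-partitioning approach (hash of the key, sign bit cleared, modulo the number of reducers) and is not taken from the post itself.

```java
// Plain-Java sketch of hash partitioning; names are hypothetical, not Hadoop classes.
final class HashPartitionSketch {

    // Returns a reducer index in [0, numReducers) for the given key.
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode can never produce a negative partition number.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3; // assumed value for illustration only
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReducers));
        }
    }
}
```

Because the partition depends only on the key, every record with the same key lands on the same reducer, which is what lets a reducer see all values for that key.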


Mapreduce Use Case to Calculate PageRank




PageRank is a way of measuring the importance of website pages. It works by counting the number and quality of links to a page to arrive at a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. In the general […]
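To make the link-counting idea concrete, here is a minimal single-machine sketch of one PageRank iteration in plain Java. It is not the post's MapReduce implementation: the class `PageRankSketch`, the method `iterate`, the damping factor of 0.85, and the three-page graph are all assumptions made for illustration. The inner loop plays the role of a map step (each page sends an equal share of its rank to the pages it links to), and `merge` plays the role of a reduce step (summing the shares received by each page).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-machine sketch of one PageRank iteration; names and values are hypothetical.
final class PageRankSketch {

    static final double DAMPING = 0.85; // commonly used value, assumed here

    // links: page -> pages it links to. Assumes every page appears as a key
    // in both maps; pages with no outgoing links are skipped for simplicity.
    static Map<String, Double> iterate(Map<String, List<String>> links,
                                       Map<String, Double> ranks) {
        int n = links.size();
        Map<String, Double> next = new HashMap<>();
        // Every page starts the round with the same base share.
        for (String page : links.keySet()) {
            next.put(page, (1 - DAMPING) / n);
        }
        for (Map.Entry<String, List<String>> e : links.entrySet()) {
            List<String> targets = e.getValue();
            if (targets.isEmpty()) {
                continue;
            }
            // "Map": split this page's rank evenly across its outgoing links.
            double share = ranks.get(e.getKey()) / targets.size();
            for (String target : targets) {
                // "Reduce": add up the shares arriving at each target page.
                next.merge(target, DAMPING * share, Double::sum);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("C"),
                "C", List.of("A"));
        Map<String, Double> ranks = Map.of("A", 1.0 / 3, "B", 1.0 / 3, "C", 1.0 / 3);
        System.out.println(iterate(links, ranks));
    }
}
```

In this small graph, page C receives links from both A and B, so after one iteration it holds the largest rank of the three.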


MRUnit Example for WordCount Algorithm




In this post we discuss a basic MRUnit example for the WordCount algorithm. The tools used in this example are Eclipse 3.8 and mrunit-1.0.0-hadoop2.jar. Procedure: 1. Download the MRUnit jar from this link and add it to the Java project build path in Eclipse (File –> properties –> java build path –> add external jars). 2. […]


Merging Small Files Into Avro File




This post continues the previous post on the small files problem. In the previous post we merged a large number of small files in an HDFS directory into a SequenceFile; in this post we will merge a large number of small files on the local file system into an Avro file in an HDFS output directory. We will […]
