Mapreduce Use Case to Calculate PageRank

PageRank is a way of measuring the importance of web pages. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. In the general […]
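PageRank is usually computed iteratively, and each iteration maps naturally onto one MapReduce job. The mapper splits a page's current rank evenly among its outgoing links and also re-emits the link list so the graph survives to the next round. The reducer then sums the incoming shares per page. The following is a minimal in-memory Python sketch of one such iteration, not the post's implementation. It assumes the commonly used damping factor d = 0.85, the normalized update PR(p) = (1 - d)/N + d * sum(PR(q)/out(q)), and a graph with no dangling pages. The names `map_page`, `reduce_page` and `pagerank_iteration` are hypothetical.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor, the value commonly used for PageRank


def map_page(page, rank, links):
    """Mapper: pass the link structure through and split the page's rank among its out-links."""
    yield page, ("links", links)
    for target in links:
        yield target, ("share", rank / len(links))


def reduce_page(page, values, num_pages):
    """Reducer: recover the link list, sum the incoming shares and apply the damping formula."""
    links, total = [], 0.0
    for kind, value in values:
        if kind == "links":
            links = value
        else:
            total += value
    new_rank = (1 - DAMPING) / num_pages + DAMPING * total
    return page, new_rank, links


def pagerank_iteration(graph):
    """One MapReduce round over graph = {page: (rank, [out_links])}; a dict simulates the shuffle."""
    shuffled = defaultdict(list)
    for page, (rank, links) in graph.items():
        for key, value in map_page(page, rank, links):
            shuffled[key].append(value)
    result = {}
    for page, values in shuffled.items():
        page, rank, links = reduce_page(page, values, len(graph))
        result[page] = (rank, links)
    return result


if __name__ == "__main__":
    # Tiny three-page graph (assumed example data), every page starting at rank 1/N.
    graph = {"A": (1 / 3, ["B", "C"]), "B": (1 / 3, ["C"]), "C": (1 / 3, ["A"])}
    for _ in range(20):
        graph = pagerank_iteration(graph)
    for page, (rank, _) in sorted(graph.items()):
        print(page, round(rank, 4))
```

In a real Hadoop job, each call to `pagerank_iteration` would be a separate job whose output feeds the next, repeated until the ranks stop changing noticeably.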

MRUnit Example for WordCount Algorithm

In this post we will discuss a basic MRUnit example for the WordCount algorithm. The tools used in this example are Eclipse 3.8 and mrunit-1.0.0-hadoop2.jar. Procedure: 1. Download the MRUnit jar from this link and add it to the Java project build path in Eclipse (File –> Properties –> Java Build Path –> Add External JARs). 2. […]
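MRUnit tests a mapper or reducer by feeding it a fixed input and asserting the exact expected output (in Java this is done through drivers such as `MapDriver` and `ReduceDriver`). The Python sketch below mirrors that idea without Hadoop, using the standard `unittest` module. The functions `word_count_map`, `word_count_reduce` and `run_map_reduce` are hypothetical stand-ins for the post's Java WordCount classes, and splitting lines on whitespace is an assumption.

```python
import unittest
from collections import defaultdict


def word_count_map(offset, line):
    """Mapper: emit (word, 1) for every whitespace-separated token; the offset key is ignored."""
    return [(word, 1) for word in line.split()]


def word_count_reduce(word, counts):
    """Reducer: sum all counts emitted for one word."""
    return [(word, sum(counts))]


def run_map_reduce(lines):
    """Simulate map, shuffle (group by key, keys sorted) and reduce.
    The line index stands in for Hadoop's byte-offset key."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for word, one in word_count_map(offset, line):
            grouped[word].append(one)
    output = []
    for word in sorted(grouped):
        output.extend(word_count_reduce(word, grouped[word]))
    return output


class WordCountTest(unittest.TestCase):
    # Each test plays the role of one MRUnit driver: fixed input, exact expected output.
    def test_mapper(self):
        self.assertEqual(word_count_map(0, "cat dog cat"), [("cat", 1), ("dog", 1), ("cat", 1)])

    def test_reducer(self):
        self.assertEqual(word_count_reduce("cat", [1, 1]), [("cat", 2)])

    def test_map_reduce(self):
        self.assertEqual(run_map_reduce(["cat dog", "dog cat cat"]), [("cat", 3), ("dog", 2)])


if __name__ == "__main__":
    # argv and exit=False keep unittest from parsing real CLI args or exiting the interpreter.
    unittest.main(argv=["wordcount-test"], exit=False)
```

As with MRUnit, the mapper and reducer are checked separately first, then together, so a failure points directly at the faulty phase.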

Merging Small Files Into Avro File

This post is a continuation of the previous post on the small files problem. In the previous post we merged a large number of small files in an HDFS directory into a SequenceFile; in this post we will merge a large number of small files on the local file system into an Avro file in an HDFS output directory. We will […]
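Merging into Avro amounts to choosing a record schema and appending one record per small file to a single Avro data file. Below is a minimal Python sketch of the first half only, assuming a two-field schema (file name plus raw bytes). The schema, the record name `SmallFile` and the helper `small_file_records` are hypothetical, not taken from the post. Serializing the records into an Avro container file and copying it to HDFS would be done with an Avro library (the post itself works in Java), which this sketch deliberately leaves out.

```python
import json
from pathlib import Path

# Assumed schema: one Avro record per small file, holding its name and raw bytes.
# The record and field names are hypothetical, not taken from the original post.
SMALL_FILE_SCHEMA = {
    "type": "record",
    "name": "SmallFile",
    "fields": [
        {"name": "filename", "type": "string"},
        {"name": "contents", "type": "bytes"},
    ],
}


def small_file_records(directory):
    """Yield one dict per regular file in `directory` (sorted by name),
    shaped to match SMALL_FILE_SCHEMA; subdirectories are skipped."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            yield {"filename": path.name, "contents": path.read_bytes()}


if __name__ == "__main__":
    # Print the schema as the JSON text an Avro writer would be given.
    print(json.dumps(SMALL_FILE_SCHEMA, indent=2))
```

Keeping the file name as a field means the original files can still be identified after the merge, which is the same role the key plays in the SequenceFile approach.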

Merging Small Files into SequenceFile

In this post, we will discuss one of the best-known use cases of SequenceFiles: merging a large number of small files into a single SequenceFile. This requirement arises mainly because Hadoop and MapReduce do not process a large number of small files efficiently.

Need for Merging Small Files: As Hadoop […]
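The idea behind the merge is to store each small file as one key/value record (file name as key, file contents as value) inside a single large file, so that HDFS and MapReduce deal with one big file instead of many tiny ones. A real SequenceFile is written from Java with Hadoop's `SequenceFile.Writer`; the Python sketch below only illustrates the key/value packing using a hypothetical length-prefixed layout of its own, which is not the SequenceFile binary format. `pack_files` and `unpack_files` are illustrative names.

```python
import io
import struct

# Hypothetical record layout: 4-byte big-endian key length, key (UTF-8),
# 4-byte big-endian value length, value bytes. This is NOT the SequenceFile format.
_LEN = struct.Struct(">I")


def pack_files(files):
    """Merge {filename: bytes} into one binary blob of key/value records."""
    out = io.BytesIO()
    for name, data in files.items():
        key = name.encode("utf-8")
        out.write(_LEN.pack(len(key)))
        out.write(key)
        out.write(_LEN.pack(len(data)))
        out.write(data)
    return out.getvalue()


def unpack_files(blob):
    """Read the records back, yielding (filename, bytes) pairs in write order."""
    buf = io.BytesIO(blob)
    while True:
        header = buf.read(_LEN.size)
        if not header:  # clean end of data
            break
        (key_len,) = _LEN.unpack(header)
        name = buf.read(key_len).decode("utf-8")
        (value_len,) = _LEN.unpack(buf.read(_LEN.size))
        yield name, buf.read(value_len)
```

Storing lengths before each key and value lets a reader walk the merged file record by record without any delimiter escaping, which is why binary container formats generally take this approach.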

Avro MapReduce 2 API Example

Avro supports both the old MapReduce package API (org.apache.hadoop.mapred) and the new MapReduce package API (org.apache.hadoop.mapreduce). Avro data can be used as the input to and output from a MapReduce job, as well as the intermediate format. In this post we will walk through an example run of the Avro MapReduce 2 API. This post can be treated as […]

MapReduce Multiple Outputs Use case

Use Case Description: In this post we will discuss the usage of MultipleOutputs in MapReduce jobs through a real-world use case. We consider a use case in which the reducer generates multiple output files whose names are based on certain input […]
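In the Java API, a reducer typically creates a `MultipleOutputs` instance in `setup()`, calls `write(key, value, baseOutputPath)` so that each record lands in a file whose name is derived from the data, and closes it in `cleanup()`. The Python sketch below only simulates that routing in memory. Routing on a `category` field, the lowercasing rule, and the names `output_name` and `route_records` are hypothetical, since the excerpt does not say which input attribute the post uses.

```python
from collections import defaultdict


def output_name(record):
    """Derive the base output path from the record (hypothetical rule: lowercased category)."""
    return record["category"].lower()


def route_records(records, partition=0):
    """Group one reducer's records by output file, mimicking names such as
    'books-r-00000' that MultipleOutputs produces from a base output path."""
    files = defaultdict(list)
    for record in records:
        files[f"{output_name(record)}-r-{partition:05d}"].append(record)
    return dict(files)
```

Because every reducer appends its own partition number, two reducers writing the same base name still produce distinct files, which is why the partition is part of the generated name.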

Mapreduce Use Case for N-Gram Statistics

In this post we will provide a solution to the well-known N-gram calculator problem in MapReduce programming.

N-Gram: In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs […]
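A common MapReduce formulation of n-gram statistics has the mapper slide a window of n words over each line and emit (n-gram, 1), with the reducer summing the counts per n-gram, exactly as in word count. The Python sketch below assumes word-level n-grams, lowercase whitespace tokenization, and n-grams that do not cross line boundaries; these choices and the names `ngram_map` and `ngram_counts` are illustrative, not the post's.

```python
from collections import Counter


def ngram_map(line, n):
    """Mapper: emit (n-gram, 1) for each contiguous run of n words in the line."""
    words = line.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), 1


def ngram_counts(lines, n):
    """Shuffle and reduce in one step: sum the emitted ones per n-gram."""
    counts = Counter()
    for line in lines:
        for gram, one in ngram_map(line, n):
            counts[gram] += one
    return counts
```

A line with fewer than n words simply emits nothing, because the window range is empty.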

Hadoop Performance Tuning

There are many ways to improve the performance of Hadoop jobs. In this post, we will cover a few MapReduce properties that can be tuned at various MapReduce phases to improve job performance. There is no one-size-fits-all technique for tuning Hadoop jobs; because of Hadoop's architecture, achieving balance among resources is often […]

Avro MapReduce Word Count Example

In this post, we will discuss the well-known word count example in MapReduce and create a sample Avro data file in the Hadoop Distributed File System (HDFS).

Prerequisite: To execute the MapReduce word count program given in this post, the avro-mapred-1.7.4-hadoop2.jar file must be present in the $HADOOP_HOME/share/hadoop/common/lib directory. This jar contains the classes used […]
