Hadoop Hive Introduction




Hadoop Hive Overview: Hadoop Hive is very similar to Apache Pig. It lets you create tables and load external files into those tables using SQL. It then creates MapReduce jobs in Java. Java is a very wordy language, so using Pig and Hive is simpler than writing those jobs by hand. Some have said that Hadoop Hive is a data warehouse tool (Bluntly put, […]


Hadoop Pig Interview Questions and Answers Part – 1




Below are some Hadoop Pig interview questions and answers that are suitable for both freshers and experienced Hadoop programmers. 1. What is Apache Pig? Pig is a scripting language for easily exploring huge data sets, gigabytes or terabytes in size. Pig provides an engine for executing data flows in parallel […]


Pig Interview Questions and Answers Part – 2




Below are a few more Pig interview questions and answers. 1. What is a tuple? A tuple is an ordered set of fields, and a field is a piece of data. 2. What is a relation in Pig? A Pig relation is a bag of tuples. A Pig relation is similar to a […]
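The tuple and relation definitions above can be pictured with a small Python sketch. This is only an analogy, not Pig code: the sample records and the `make_relation` helper are hypothetical, and a `Counter` stands in for a bag (an unordered collection that allows duplicates).

```python
from collections import Counter

# A field is a single piece of data; a tuple is an ordered set of fields.
# The sample values below are made up for illustration.
record = ("alice", 30, "NY")


def make_relation(tuples):
    """Model a relation as a bag of tuples (hypothetical helper).

    A Counter keeps duplicates (as counts) and ignores insertion order,
    which are the two properties that distinguish a bag from a list or a set.
    """
    return Counter(tuples)


relation = make_relation([
    ("alice", 30, "NY"),
    ("bob", 25, "CA"),
    ("alice", 30, "NY"),  # duplicates are allowed in a bag
])

# Number of tuples in the relation, duplicates included.
total_tuples = sum(relation.values())
```

The order of fields inside a tuple matters, while the order of tuples inside the relation does not.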


Pig Installation on Ubuntu




In this post, we will describe the procedure for installing Pig on an Ubuntu machine. Prerequisites: Below are the basic requirements for installing Pig on Ubuntu and getting started. Java 1.6 or a later version installed, with the JAVA_HOME environment variable set to the Java installation directory. Hadoop 1.x or 2.x installed on the cluster. In this post we will use […]


Load Functions In Pig




In this post, we will discuss the basics of load functions in Pig with some examples, and we will also discuss custom load functions in Pig written as UDFs. To work with data in Pig, the first thing we need to do is load data from a source, and Pig has a built-in […]


Built-in Load Store Functions in Pig




In this post, we will discuss the following built-in load/store functions in Pig with examples: PigStorage, TextLoader, BinStorage, JsonLoader, JsonStorage, AvroStorage, and HBaseStorage. PigStorage: PigStorage() is the default load/store function in Pig. PigStorage expects data to be formatted using field delimiters, and the default delimiter is the tab character ('\t'). PigStorage() itself can be used for both […]
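To make the idea of delimiter-based loading and storing concrete, here is a minimal Python sketch. It is not how PigStorage is implemented. The `split_fields` and `join_fields` helpers are hypothetical names that only mimic the behaviour described above: one record per line, fields separated by a delimiter, with tab as the default.

```python
def split_fields(line, delimiter="\t"):
    """Split one text record into a tuple of string fields.

    Hypothetical helper: mirrors what a delimiter-based loader does
    conceptually when it reads a line.
    """
    return tuple(line.rstrip("\n").split(delimiter))


def join_fields(fields, delimiter="\t"):
    """Turn a tuple of fields back into one delimited text record.

    Hypothetical helper: the 'store' direction of the same idea.
    """
    return delimiter.join(str(field) for field in fields)


# Load direction: a tab-delimited line becomes a tuple of fields.
loaded = split_fields("1\talice\t30\n")

# Store direction: a tuple becomes a tab-delimited line again.
stored = join_fields((1, "alice", 30))

# A different delimiter can be passed explicitly, for example a comma.
loaded_csv = split_fields("1,alice,30", delimiter=",")
```

In Pig Latin itself, the delimiter is passed as an argument to the function, for example `PigStorage(',')` for comma-separated data.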


Processing Logs in Pig




In the previous post, we introduced log files and the architecture of log analysis in Hadoop. In this post, we will go into much more detail on processing logs in Pig. As discussed in the previous post, there are three major types of log files. Web Server Access Logs […]


Hadoop Pig Installation, Pig Configuration in Local and MapReduce Mode




Hi all, in this post we will see how to install Pig and run it locally or on a cluster using MapReduce. To install Pig, we first need to install Java. To get started with Pig, we will follow these basic steps: (1) install Java, (2) install Pig, (3) run the Pig scripts in local or Hadoop mode. Java […]
