MLWhiz | AI Unwrapped

MLWhiz | AI Unwrapped

Share this post

MLWhiz | AI Unwrapped
MLWhiz | AI Unwrapped
Hadoop Mapreduce Streaming Tricks and Techniques
Copy link
Facebook
Email
Notes
More

Hadoop Mapreduce Streaming Tricks and Techniques

Rahul Agarwal's avatar
Rahul Agarwal
May 09, 2015
∙ Paid

Share this post

MLWhiz | AI Unwrapped
MLWhiz | AI Unwrapped
Hadoop Mapreduce Streaming Tricks and Techniques
Copy link
Facebook
Email
Notes
More
Share

I have been using Hadoop a lot now a days and thought about writing some of the novel techniques that a user could use to get the most out of the Hadoop Ecosystem.

Using Shell Scripts to run your Programs

I am not a fan of large bash commands. The ones where you have to specify the whole path of the jar files and the such. You can effectively organize your workflow by using shell scripts. Now Shell scripts are not as formidable as they sound. We wont be doing programming perse using these shell scripts(Though they are pretty good at that too), we will just use them to store commands that we need to use sequentially.

Below is a sample of the shell script I use to run my Mapreduce Codes.

#!/bin/bash
#Defining program variables
IP="/data/input"
OP="/data/output"
HADOOP_JAR_PATH="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.5.0.jar"
MAPPER="test_m.py"
REDUCER="test_r.py"

hadoop fs -rmr -skipTrash $OP
hadoop jar $HADOOP_JAR_PA…

Keep reading with a 7-day free trial

Subscribe to MLWhiz | AI Unwrapped to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Rahul Agarwal
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More