MLWhiz | AI Unwrapped

Shell Basics every Data Scientist Should know -Part I

Rahul Agarwal
Oct 09, 2015

Shell commands are powerful. "Life would be hell without the shell" is how I like to put it (and that is probably the reason I dislike Windows).

Consider a case where you have a 6 GB pipe-delimited file sitting on your laptop and you want to find the count of distinct values in one particular column. You can probably do this in more than one way. You could load the file into a database and run SQL commands, or you could write a Python or Perl script.

But whatever you do, it probably won't be simpler or less time-consuming than this:

cat data.txt | cut -d "|" -f 1 | sort | uniq | wc -l
30

And this will usually run faster than an equivalent Perl or Python script.

Here is what this command says:

  • Use the cat command to print/stream the contents of the file to stdout.

  • Pipe the streaming contents from our cat command to the next command cut.

  • The cut command takes the delimiter via the -d argument and the column via the -f argument, and streams the selected column to stdout.

  • Pipe the streaming content to the sort comman…
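The steps above can be tried on a small file before pointing them at a multi-gigabyte one. The sketch below is only an illustration: the file name sample.txt and its five rows are made up for the example, and the shorter second form (letting cut read the file directly and using sort -u in place of sort | uniq) is a common equivalent, not something taken from the original command.

```shell
# Hypothetical sample data standing in for the real 6 GB file.
tmpdir=$(mktemp -d)
printf 'a|1\nb|2\na|3\nc|4\nb|5\n' > "$tmpdir/sample.txt"

# Same pipeline as in the post: stream the file, keep column 1,
# sort it, drop adjacent duplicates, and count the remaining lines.
count=$(cat "$tmpdir/sample.txt" | cut -d "|" -f 1 | sort | uniq | wc -l)

# Shorter equivalent: cut reads the file itself, sort -u de-duplicates.
count_short=$(cut -d "|" -f 1 "$tmpdir/sample.txt" | sort -u | wc -l)

# Both variables hold the number of distinct values in column 1.
echo "distinct values:" $count $count_short

rm -r "$tmpdir"
```

Note that uniq only collapses adjacent duplicate lines, which is why the sort step has to come before it.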

© 2025 Rahul Agarwal