MLWhiz | AI Unwrapped

Data Science 101 : Playing with Scraping in Python

Rahul Agarwal
Oct 02, 2014

This is a simple illustration of using the Pattern module to scrape web data with Python. We will scrape IMDb for the top-rated TV series along with their ratings.

We will use this URL:

http://www.imdb.com/search/title?count=100&num_votes=5000,&ref_=gnr_tv_hr&sort=user_rating,desc&start=1&title_type=tv_series,mini_series

This URL gives a list of top-rated TV series that have at least 5000 votes. The thing to note in this URL is the "&start=" parameter, which specifies the position in the ranking at which the list begins. With "count=100", specifying 1 returns entries 1-100, specifying 101 returns entries 101-200, and so on.
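The paging scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `page_starts` and `page_url` are hypothetical (they are not from the original post), and the query parameters are simply copied from the URL shown earlier. No network request is made here.

```python
from urllib.parse import urlencode

# Base of the search URL shown above (query string removed).
BASE_URL = "http://www.imdb.com/search/title"


def page_starts(n_pages, count=100):
    """Hypothetical helper: the 'start' value for each page of results."""
    return [1 + i * count for i in range(n_pages)]


def page_url(start, count=100):
    """Hypothetical helper: rebuild the search URL for a given 'start' value."""
    params = [
        ("count", count),
        ("num_votes", "5000,"),          # at least 5000 votes
        ("ref_", "gnr_tv_hr"),
        ("sort", "user_rating,desc"),
        ("start", start),                # position of the first entry on the page
        ("title_type", "tv_series,mini_series"),
    ]
    # safe="," keeps the commas unescaped, as in the original URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    for start in page_starts(3):
        print(page_url(start))
```

Keeping the URL construction in one small function means the rest of the scraper only has to loop over the `start` values.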

Let's start by importing the Python modules needed for scraping the data:

import requests                     # Fetches the HTML of a web page as text
from pattern import web             # Parses the data that we loaded using requests

Loading the data …

© 2025 Rahul Agarwal