Create a list of movies to watch with Python and Urlist

I usually like to keep lists of movies to watch, books to read, games to play on Google Keep, mainly because it comes with a widget that looks nice on my phone. Its sharing capabilities though are, well, nonexistent. Yes, you can email a list with a bullet point for each entry but it ends there.

Since I wanted to share a list of movies to watch with my wife, I resorted to Urlist. It’s a neat, straight-to-the-point tool that’s good for sharing links with friends, collaborators, anybody.

I wish they had an API available, so that this post could have been about a tool that automatically creates lists for you (heck, I could even write a simple chrome extension!), but so far there’s none.

Our list is going to have, for every entry, the vote that the movie got on IMDB, a brief summary of its plot and cast. If any of them attracts your SO’s attention, (s)he can just click to see further info about the movie 🙂

The script requires BeautifulSoup and Requests, 2 awesome libraries to scrape the web.

To install them, you can use either pip:

sudo pip install beautifulsoup4 requests

or easy_install:

sudo easy_install beautifulsoup4 requests

Create the list on Urlist, launch the script:

python scrape_IMDB.py

and for every movie you want to add:

  1. search it on IMDB
  2. copy the URL
  3. paste the URL on Urlist to add an entry
  4. paste the URL on the console where the script is running
  5. copy the output of the script
  6. back to Urlist, hit edit and paste what you copied

Here’s the script (you can download it from pastebin):

from bs4 import BeautifulSoup
import requests

done = False

while not done:
  try:
    url = raw_input("IMDB URL: ")

    # get the IMDB page
    r = requests.get(url)
    data = r.text

    # and parse it with BeautifulSoup
    soup = BeautifulSoup(data)

    # the td containing what we're looking for
    td = soup.find('td', {'id': 'overview-top'})
    rating = td.find('div', {'class': 'star-box-giga-star'}).string
    plot = td.find('p', {'itemprop': 'description'}).string
    # the div containing the main actors in the cast
    actors = td.find('div', {'itemprop': 'actors'})
    stars = ', '.join([actor.string for actor in actors.find_all('span', {'class': 'itemprop', 'itemprop': 'name'})])

    print '*%s* - %s. %s' % (rating.strip(), stars, plot)
  except KeyboardInterrupt:
    done = True
print
print 'bye!'

It’s super simple! It gets the page, finds the HTML source for what we’re looking for, and prints it out as formatted text that’s good for Urlist.

The way you find items with BeautifulSoup is relatively similar to what you do with jQuery: you look for elements in the DOM that contain what you’re looking for (to find what they are, just use your browser’s inspector… on Chrome, right click on the text and choose “Inspect element…” to see where it is in the DOM), and manipulate them as strings or arrays of strings.

Easy enough!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s