I had a PHP/MySQL script created for me that does the following:
1. Takes a large list of all released movies
2. Runs a search for the title on IMDB.com.
Example: http://www.imdb.com/find?s=all&q=pink+panther
- The script takes the first result under ‘Popular Titles’
- If there are no Popular Title results, it simply takes the first link that it can find.
3. On the IMDB result page, it scrapes the rating (out of 10) and the Metascore (out of 100).
4. Next, the script searches for the title on RottenTomatoes.com and opens the first result.
It scrapes the ‘average rating’ (out of 10) from that page.
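For reference, the IMDB search URL in step 2 can be built from a title as below. This is a minimal sketch that assumes only the query format shown in the example URL (s=all plus the title in q); the function name imdb_search_url is hypothetical, and the existing script may build the URL differently.

```python
from urllib.parse import urlencode

IMDB_FIND_URL = "http://www.imdb.com/find"


def imdb_search_url(title: str) -> str:
    """Build an IMDB 'find' URL for a movie title (hypothetical helper)."""
    # urlencode uses quote_plus by default, so spaces become '+',
    # matching the example URL ...&q=pink+panther
    return IMDB_FIND_URL + "?" + urlencode({"s": "all", "q": title})


print(imdb_search_url("pink panther"))
```

Encoding the title this way also escapes characters such as '&' or ':' that appear in some movie titles and would otherwise break the query string.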
This method works fairly well, but there are a few ways I would like to improve the script.
MODIFICATIONS
-------------
- In Step #2, if there are no ‘Popular Titles’ results, I would like the script to take the first result under ‘Titles (Exact Matches)’ instead of the first link it finds on the page. If neither section exists, the script can simply skip the movie.
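The fallback order requested above can be expressed as a small selection function. This sketch assumes the search page has already been parsed into a mapping from section heading to the list of result links under it. That parsing step, and the function name pick_imdb_result, are hypothetical, because the real step depends on IMDB's HTML.

```python
from typing import Dict, List, Optional


def pick_imdb_result(sections: Dict[str, List[str]]) -> Optional[str]:
    """Pick the result link to follow (hypothetical helper).

    Assumes `sections` maps a section heading on the IMDB search page
    to the links listed under it, in page order.
    """
    # Preferred section first, then the exact-match fallback
    for heading in ("Popular Titles", "Titles (Exact Matches)"):
        links = sections.get(heading) or []
        if links:
            return links[0]
    # Neither section has results: the caller should skip this movie
    return None
```

Returning None rather than "the first link on the page" is the behavior change requested: any other section (for example approximate matches) is ignored.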
- When it reaches the result page on IMDB, the script should record the IMDB ID#, which appears right in the URL. For example:
http://www.imdb.com/title/tt0103064/ <-- the ID# is 0103064
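Pulling the ID# out of the result URL is a one-line regular expression. This minimal sketch assumes title URLs keep the /title/tt<digits>/ shape shown in the example. The helper name imdb_id_from_url is hypothetical.

```python
import re
from typing import Optional

# Matches the "/title/tt0103064" part of an IMDB title URL and captures the digits
_TITLE_ID = re.compile(r"/title/tt(\d+)")


def imdb_id_from_url(url: str) -> Optional[str]:
    """Return the numeric IMDB ID# as a string, or None if the URL has none."""
    match = _TITLE_ID.search(url)
    return match.group(1) if match else None
```

The ID is kept as a string on purpose. Converting it to an integer would drop the leading zero in 0103064.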
- Instead of searching on RottenTomatoes, the script should use the RottenTomatoes JSON API, which can look up a movie by its IMDB ID#. This should be more accurate than a blind search on RottenTomatoes. To find the movie, we just use the following URL:
http://api.rottentomatoes.com/api/public/v1.0/movie_alias.json?type=imdb&id=0103064&apikey=z7y3sjqf9ecysct7pz2c4naj&_prettyprint=true
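The lookup URL can be assembled from the IMDB ID# and an API key using the same query parameters as the example. This sketch assumes _prettyprint=true only controls whitespace in the response, so it is left out. The key is passed in rather than hard-coded, and the function name rt_alias_url is hypothetical.

```python
from urllib.parse import urlencode

RT_ALIAS_URL = "http://api.rottentomatoes.com/api/public/v1.0/movie_alias.json"


def rt_alias_url(imdb_id: str, api_key: str) -> str:
    """Build the movie_alias lookup URL for an IMDB ID# (hypothetical helper)."""
    # Parameters mirror the example URL; _prettyprint is omitted on the
    # assumption that it only affects formatting of the JSON
    return RT_ALIAS_URL + "?" + urlencode(
        {"type": "imdb", "id": imdb_id, "apikey": api_key}
    )
```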
The value we need is "alternate" inside the "links" object; in this case it is http://www.rottentomatoes.com/m/terminator_2_judgment_day/
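Reading links.alternate from the API response takes only a few lines of JSON handling. This minimal sketch assumes the response is a JSON object with a "links" object, as described above. Anything else, such as an error body, is treated as "not found" so the movie can be skipped. The sample response in the usage line is a hypothetical, trimmed example, and rt_page_url is a hypothetical name.

```python
import json
from typing import Optional


def rt_page_url(response_text: str) -> Optional[str]:
    """Return links.alternate from a movie_alias.json response, or None."""
    try:
        data = json.loads(response_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, dict):
        return None  # no "links" object, e.g. the movie was not found
    return links.get("alternate")


# Hypothetical, trimmed response used only for illustration
sample = '{"links": {"alternate": "http://www.rottentomatoes.com/m/terminator_2_judgment_day/"}}'
print(rt_page_url(sample))
```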
The script should then go to that RottenTomatoes page and scrape the score, as it does now.