Harnessing GameFAQs with Ruby on Rails and Hpricot
One of the things that’s most annoying when making sites that rely on external data is keeping them in sync. Although this usually means importing some sample data during development, eventually you’ll have to do it the right way. This evening I hit that point on a project where I needed to get a list of all arcade games from GameFAQs. Keeping these updated manually is completely out of the question, especially considering there are over 3,500 of them.
I decided to turn to Hpricot, which I’d heard about on Peepcode as well as a number of other blogs. Hpricot is a very simple HTML parser for Ruby.
To get started, just install the hpricot gem…
$ gem install hpricot
After that you can require it in your controller and go wild. I needed a list of all arcade games available on GameFAQs, which means hitting 27 different pages and parsing the results (26 letters plus one page for titles that start with a number). My games table is extremely simple at this point, with just an id, name and gamefaqs_id. Since all I really need is to update my local games table with the data from GameFAQs, I also want to make sure I don’t insert duplicate records. One thing to note though: GameFAQs lists multiple names for the same game, so you might pull back three different names that share one id. In my case I’m just using the first one I find, but you could switch this up easily enough.
So where’s the code?
require 'hpricot'
require 'open-uri'

class GatewayController < ApplicationController
  before_filter :check_administrator_role

  def update_games
    # One list page per letter, plus '0' for titles that start with a number
    letters = ('a'..'z').to_a << '0'
    letters.each do |letter|
      page = Hpricot(open('http://www.gamefaqs.com/coinop/arcade/list_' + letter + '.html'))
      page.search("//div#container/div#content/div#sky_col_wrap/div#main_col_wrap/div#main_col/div[@class='pod']/div[@class='body']/table/tr").each do |g|
        a = g.search("//td:first/a").first
        next unless a # skip rows without a game link, such as header rows
        name = a.inner_html
        link = a['href']
        gamefaqs_id = link.match(/[0-9]+/)[0]
        # Create this game if it doesn't exist
        if !Game.find_by_gamefaqs_id(gamefaqs_id, :select => 'true')
          Game.create(:name => name, :gamefaqs_id => gamefaqs_id)
        end
      end
    end
    render :template => "games/update"
  end
end
Not too bad for under 30 lines in Rails. The letters variable contains all possible endings for the URL, with a pair of loops to do the work. The main work goes on in the page.search() call, which returns the table rows (tr elements) containing the information we need. From each row you can grab the a element in the first td, and from that the gamefaqs_id and game name.
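If you want to poke at the non-network parts on their own, here is a rough sketch in plain Ruby, with no Hpricot or Rails needed. The helper names (list_urls, extract_gamefaqs_id, first_name_per_id) and the sample rows are made up for illustration and are not part of the controller above; the sketch only assumes that each row boils down to a name and an href whose first run of digits is the game id.

```ruby
# Hypothetical helpers for illustration only; none of these exist in the
# original controller.

# Build the 27 list-page URLs: 'a'..'z' plus '0' for titles starting with a number.
def list_urls(base = 'http://www.gamefaqs.com/coinop/arcade/list_')
  (('a'..'z').to_a << '0').map { |letter| "#{base}#{letter}.html" }
end

# Return the first run of digits in a link, or nil when there is none.
def extract_gamefaqs_id(href)
  match = href.to_s.match(/[0-9]+/)
  match && match[0]
end

# Keep the first name seen for each id, mirroring the
# "just use the first one I find" rule for games with several names.
def first_name_per_id(rows)
  rows.each_with_object({}) do |(name, href), games|
    id = extract_gamefaqs_id(href)
    next if id.nil?      # rows without an id (e.g. headers) are skipped
    games[id] ||= name   # later names for the same id are ignored
  end
end

# Made-up sample rows standing in for parsed table rows: [name, href]
rows = [
  ['Example Game',                   '/coinop/arcade/home/123456.html'],
  ['Example Game (alternate title)', '/coinop/arcade/home/123456.html'],
  ['Another Game',                   '/coinop/arcade/home/654321.html'],
  ['Not a game row',                 '/coinop/arcade/']
]

games = first_name_per_id(rows)
games.each { |id, name| puts "#{id}: #{name}" }
```

Collecting everything into a hash first would also let you check for existing records in one pass instead of one query per row, though for a few thousand games the simple version in the controller is probably fine.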

I always thought KLOV was a better source for arcade game info. Whatcha working on? Sounds right up my alley – I’m a Rails dev and an arcade game collector.
I finally got around to deploying it this past weekend, although it still has a ways to go.
If you’re interested in checking it out, the link is below. It’s still a work in progress, but if you have any suggestions I’d love to hear them!
http://www.arcadefly.com