Cambridge Search Meetup – a night of crawling and scraping

Last night was the busiest Cambridge Search Meetup yet, with two excellent talks and plenty of discussion and networking. First up was Harry Waye of Arachnys, who provide access to data on emerging markets that no one else has, using a variety of custom crawling technology and heavy use of tools such as Google Translate. If you want to trawl the Greek corporate registry or find financial news from Kazakhstan, a standard Google search is little help. Harry talked about how Arachnys have experimented with Google Custom Search Engine and the ‘headless browser’ PhantomJS to crawl sites.

Our second talk was from Shane Evans, whom I first met when he led software development for our client Mydeco. It was there that he first worked on Scrapy, an open source Python crawling framework. Shane showed how easy it is to get a Scrapy web spider running in a few lines of code, and how extensible and customisable Scrapy is for a huge variety of crawling and scraping situations. There’s even a fully hosted version at Scrapinghub with graphical tools for setting up web crawling and page scraping. We’re big fans of Scrapy at Flax and we’ve used it in a number of projects, so it was good to see an overview of why Scrapy exists and how it can be used.

Thanks to both our speakers, who travelled from out of town, as did several other attendees. We’re pleased to say this was our 15th Meetup and we now have 100 members. We’re already planning further events, one of which will be on the evening of the first day of the Enterprise Search Europe conference.
