Better search for life sciences at the BioSolr Workshop, day 2 – Elasticsearch & others

Posted on February 15, 2016 by Charlie Hull

Over the last 18 months we've been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass Continue reading

Posted in Biotechnology, Blog, Events | Tagged bioinformatics, biology, biosolr, DIH, django, elasticsearch, indexing, lucene, python, redis, scaling, SOLR, sphinx, sql | Leave a reply

XJoin for Solr, part 1: filtering using price discount data

Posted on January 25, 2016 by Tom Winch

5

In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.
What is XJoin?
...Continue reading

Posted in Biotechnology, Blog, E-commerce, Technical | Tagged bioinformatics, ecommerce, example, filtering, indexing, java, lucene, patch, python, SOLR, xjoin | 5 Replies

How we built a search engine for UK MP tweets with Solr, Python & StanfordNLP

Posted on February 6, 2014 by Charlie Hull

Reply

Matt Pearce writes: We recently released UKMP, a search application built on work done on last year's Enterprise Search hack day. This presents the tweets of UK Members of Parliament with search options including filtering by party, retweet and favourite count, and entities (people, locations a...Continue reading

Posted in Technical | Tagged government, indexing, python, SOLR, stanford NLP, user interface | Leave a reply

Cambridge Search Meetup – a night of crawling and scraping

Posted on February 22, 2013 by Charlie Hull

Reply

Last night was the busiest ever Cambridge Search Meetup, with two excellent talks and a lot of discussion and networking. First was Harry Waye of Arachnys, who provide access to data on emerging markets that no-one else has using a variety of custom crawling technology and heavy use of tools such Google Translate. If you want to trawl the Greek corporate registry or find out financial news...Continue reading

Posted in Meetups | Tagged crawling, events, open source, python, scraping, scrapy | Leave a reply

Open source search engines and programming languages

Posted on September 3, 2010 by Charlie Hull

1

So you're writing a search-related application in your favourite language, and you've decided to choose an open source search engine to power it. So far, so good - but how are the two going to communicate? Let's look at two engines, Xapian and Lucene, and compare how this might be done. Lucene is written in Java, Xapian in C/C++ - so if you're using those languages respectively, everything should be relatively simple - j...Continue reading

Posted in Technical | Tagged c#, flax, java, lucene, open source, python, SOLR, xapian | 1 Reply

flax.crawler arrives

Posted on August 2, 2010 by Charlie Hull

Reply

We've recently uploaded a new crawler framework to the Flax code repository. This is designed for use from Python to build a web crawler for your project. It's multithreaded and simple to use, here's a minimal example:
import crawler

crawler.dump = MyContentDumperImplementati...Continue reading
Posted in Technical | Tagged crawling, flax, open source, python | Leave a reply
flax.core 0.1 available Posted on June 24, 2010 by Tom Reply Charlie wrote previously that we try and work with flexible, lightweight frameworks: flax.core is a Python library for conveniently adding functionality to Xapian projects. The current (and first!) version is 0.1, which can be checked out from the flaxcode repository. This version supports named fields for inde...Continue reading Posted in Technical | Tagged flax, indexing, open source, python, xapian | Leave a reply Packaged solutions and customisability, the Python way Posted on June 14, 2010 by Charlie Hull Reply With any large scale software installation, there is going to be some customisation and tweaking necessary, and enterprise search systems are no exception. Whatever features are packaged with a system, some of those you need will be missing and some won't be used at all. It's rare to see a situation where the search engine can just be installed straight out of the box. Our Flax system is based on the Xapian core, which has a set of bindings to various differe...Continue reading Posted in Technical | Tagged flax, indexing, open source, python, xapian | Leave a reply Python and Flax presentation Posted on June 25, 2009 by Charlie Hull Reply My colleague Richard Boulton will be presenting at Europython in Birmingham, U.K. next week, specifically at 15.30 on Tuesday 30th June - an abstract is available. He'll be talking about Xapian, Xappy and Flax, and showing examples of these in action including one using a Django integration layer....Continue reading Posted in Events | Tagged Add new tag, django, flax, python, xapian, xappy | Leave a reply