<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Flax Blog &#187; xapian</title>
	<atom:link href="http://www.flax.co.uk/blog/tag/xapian/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.flax.co.uk/blog</link>
	<description>Open source &#38; enterprise search</description>
	<lastBuildDate>Wed, 25 Jan 2012 14:56:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Flax&#8217;s 10th birthday!</title>
		<link>http://www.flax.co.uk/blog/2011/07/27/flaxs-10th-birthday/</link>
		<comments>http://www.flax.co.uk/blog/2011/07/27/flaxs-10th-birthday/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 10:23:12 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[happy birthday to us]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=618</guid>
		<description><![CDATA[<p>Today marks 10 years since we formed Flax (originally as Lemur Consulting Ltd.). We had an idea that search based on open source software was going to be increasingly important (indeed, our original business model was consultancy based on&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Today marks 10 years since we formed Flax (originally as Lemur Consulting Ltd.). We had an idea that search based on open source software was going to be increasingly important (indeed, our original business model was consultancy based on <a href="http://www.xapian.org">Xapian</a>) and I think we&#8217;ve been proved right over the decade. Today, in the depths of a recession, we&#8217;re seeing significant growth in the business and some fascinating opportunities: the sector is still going through rapid change and it will be very interesting to see what the next few years bring.</p>
<p>Thanks to all of those who have worked <a href="http://www.flax.co.uk/our_clients">with us</a> and <a href="http://www.flax.co.uk/who_we_are">for us</a> over the last decade &#8211; we look forward to the next ten years in this exciting field!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/07/27/flaxs-10th-birthday/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Whitepaper &#8211; Why you should be considering open source search</title>
		<link>http://www.flax.co.uk/blog/2011/06/22/whitepaper-why-you-should-be-considering-open-source-search/</link>
		<comments>http://www.flax.co.uk/blog/2011/06/22/whitepaper-why-you-should-be-considering-open-source-search/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 10:49:50 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Reference]]></category>
		<category><![CDATA[durrants]]></category>
		<category><![CDATA[FAST]]></category>
		<category><![CDATA[guardian]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[strategy]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=585</guid>
		<description><![CDATA[<p>I&#8217;ve uploaded a whitepaper I wrote a short while ago:</p>
<p><em>&#8220;In these rapidly changing times we don&#8217;t know what we will need to search tomorrow – so it&#8217;s important to be adaptable, flexible and able to cope with data</em>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve uploaded a whitepaper I wrote a short while ago:</p>
<p><em>&#8220;In these rapidly changing times we don&#8217;t know what we will need to search tomorrow – so it&#8217;s important to be adaptable, flexible and able to cope with data volumes that may not scale linearly. Maintaining control over the future of your search software is also key. Open source search has come of age and every modern business should be aware of its advantages.&#8221;</em></p>
<p>It&#8217;s available in our <a href="http://www.flax.co.uk/downloads/">downloads</a> area, together with several case studies on open source search projects we&#8217;ve carried out for clients.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/06/22/whitepaper-why-you-should-be-considering-open-source-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Open source search evening &#8211; ElasticSearch, Xapian and GSoC</title>
		<link>http://www.flax.co.uk/blog/2011/05/04/open-source-search-evening-elasticsearch-xapian-and-gsoc/</link>
		<comments>http://www.flax.co.uk/blog/2011/05/04/open-source-search-evening-elasticsearch-xapian-and-gsoc/#comments</comments>
		<pubDate>Wed, 04 May 2011 13:42:35 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[elasticsearch]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=565</guid>
		<description><![CDATA[<p>Last night there was a small <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/events/16913125/">gathering</a> in Cambridge of open source search engine developers and enthusiasts. <a href="http://twitter.com/#!/rboulton">Richard Boulton</a> hosted the event and began with an introduction to <a href="http://www.elasticsearch.org/"><strong>elasticsearch</strong></a>, which is an &#8220;Open Source (Apache 2), Distributed,&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Last night there was a small <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/events/16913125/">gathering</a> in Cambridge of open source search engine developers and enthusiasts. <a href="http://twitter.com/#!/rboulton">Richard Boulton</a> hosted the event and began with an introduction to <a href="http://www.elasticsearch.org/"><strong>elasticsearch</strong></a>, which is an &#8220;Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene&#8221;. Richard told us how this system attempts to make prototyping and building search systems easier by automatically guessing data schemas, offering a powerful, hierarchical &#8216;query language&#8217; and automatically distributing the search load. Richard concluded that although elasticsearch is not as mature as Apache <a href="http://lucene.apache.org/solr/">Solr</a>, it is certainly a project to consider, though development is rapid and documentation is not easy to find. We&#8217;ll watch this project with interest.</p>
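<p>To give a feel for what &#8216;automatically guessing data schemas&#8217; involves, here is a toy Python sketch that assigns each field a type based on the first value seen for it in a JSON document. The rules and type names are our own assumptions for illustration, loosely modelled on the kinds of types a search engine might use. This is not elasticsearch&#8217;s actual mapping code.</p>

```python
import json
from datetime import datetime

def guess_type(value):
    """Guess a field type from one JSON value (toy rules, not elasticsearch's own)."""
    if isinstance(value, bool):  # test bool first: in Python, bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # treat ISO-style dates as dates
            return "date"
        except ValueError:
            return "string"
    return "unknown"

def guess_schema(doc_json, schema=None):
    """Add fields not yet seen to the schema; the first value seen fixes each type."""
    schema = {} if schema is None else schema
    for field, value in json.loads(doc_json).items():
        schema.setdefault(field, guess_type(value))
    return schema

schema = guess_schema('{"title": "Xapian", "year": 2011, "published": "2011-05-04", "open": true}')
print(schema)
# {'title': 'string', 'year': 'long', 'published': 'date', 'open': 'boolean'}
```

<p>In this sketch the first value seen fixes a field&#8217;s type, so a later document that stores <code>year</code> as text keeps the guessed <code>long</code>. A real engine has to decide whether to reject or convert such values.</p>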
<p><a href="http://oligarchy.co.uk/">Olly Betts</a> next told us about various <a href="http://trac.xapian.org/wiki/GSoC2011">Xapian projects</a> running as part of this year&#8217;s Google Summer of Code; this led into a discussion of <a href="http://en.wikipedia.org/wiki/Learning_to_rank">Learning to Rank</a> and how this might be implemented in practical terms. It&#8217;s great to see these cutting-edge features being added to an open source project. </p>
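<p>Learning to Rank covers many techniques. As one illustration only (not what was discussed on the night, and not code from any Xapian GSoC project), here is a minimal pairwise sketch in Python. A linear scoring function over per-document features is trained with perceptron-style updates whenever a less relevant document scores at least as high as a more relevant one. The two features (a BM25-style score and a title-match flag) and the relevance labels are made up for the example.</p>

```python
from itertools import combinations

def score(weights, features):
    """Linear ranking score: dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))

def train_pairwise(examples, epochs=20, rate=0.1):
    """examples: (features, relevance) pairs for documents returned by one query.
    For every pair with different relevance that is ordered wrongly (or tied),
    move the weights towards the feature difference of better minus worse."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for (fa, ra), (fb, rb) in combinations(examples, 2):
            if ra == rb:
                continue  # no preference between equally relevant documents
            better, worse = (fa, fb) if ra > rb else (fb, fa)
            if score(weights, worse) >= score(weights, better):
                weights = [w + rate * (x - y) for w, x, y in zip(weights, better, worse)]
    return weights

# Made-up training data: ([bm25_score, title_match], relevance label)
examples = [([2.0, 1.0], 2), ([2.5, 0.0], 1), ([0.5, 0.0], 0)]
weights = train_pairwise(examples)
ranked = sorted(examples, key=lambda ex: score(weights, ex[0]), reverse=True)
print([rel for _, rel in ranked])  # [2, 1, 0]
```

<p>Pairwise methods like this learn only the relative order of documents for a query, which is what a ranked results list needs. Practical systems use far richer features and much more training data.</p>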
<p>Thanks to Richard for organising the evening and to all who came.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/05/04/open-source-search-evening-elasticsearch-xapian-and-gsoc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ECIR 2011 overview</title>
		<link>http://www.flax.co.uk/blog/2011/04/26/ecir-2011-overview/</link>
		<comments>http://www.flax.co.uk/blog/2011/04/26/ecir-2011-overview/#comments</comments>
		<pubDate>Tue, 26 Apr 2011 11:12:09 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=550</guid>
		<description><![CDATA[<p>I spent part of last week at the <a href="http://www.ecir2011.dcu.ie/">33rd European Conference on Information Retrieval</a> in Dublin, as I had been asked to speak during the Industry Day (of which, more later &#8211; far too much useful information to cram&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I spent part of last week at the <a href="http://www.ecir2011.dcu.ie/">33rd European Conference on Information Retrieval</a> in Dublin, as I had been asked to speak during the Industry Day (of which, more later &#8211; far too much useful information to cram into one blog post!). Arriving late on Wednesday afternoon, I caught up with Olly Betts of <a href="http://oligarchy.co.uk">Oligarchy</a>, one of the core <a href="http://www.xapian.org">Xapian</a> developers, who&#8217;d travelled from New Zealand. Olly told me more about the Xapian <a href="http://trac.xapian.org/wiki/GSoC2011">projects running as part of Google&#8217;s Summer of Code</a> &#8211; very exciting to hear that there were over 40 applicants this year for a limited number of slots. </p>
<p>We went on to the conference banquet at the <a href="http://www.villageatlyons.com/">Lyons Estate</a> outside the city &#8211; which in some ways reminded me of <a href="http://www.portmeirion-village.com/">Portmeirion</a> &#8211; and caught up with people from Google Zurich amongst others. This was one of several fantastic venues organised by the Dublin team led by Cathal Gurrin (at Industry Day itself we were high above the city with a great view, and I heard good things about the Guinness Storehouse, the venue for the first day of the conference). </p>
<p>Thanks to all the team (especially <a href="http://www.essex.ac.uk/csee/staff/profile.aspx?ID=1585">Udo Kruschwitz</a> and <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a> for organising Industry Day). I look forward to catching up with some of you at the next <a href="http://irsg.bcs.org/SearchSolutions/2010/sse2010.php">BCS IRSG Search Solutions</a> event on November 16th.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/04/26/ecir-2011-overview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Open Source action in UK government</title>
		<link>http://www.flax.co.uk/blog/2011/02/02/open-source-action-in-uk-government/</link>
		<comments>http://www.flax.co.uk/blog/2011/02/02/open-source-action-in-uk-government/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 15:10:25 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=498</guid>
		<description><![CDATA[<p>I&#8217;ve been reading the revised <a href="http://www.cabinetoffice.gov.uk/sites/default/files/resources/open_source.pdf">Open Source, Open Standards and ReUse: Government Action Plan</a> &#8211; it&#8217;s surprising (and heartening) to see this has existed in one form or another since as far back as 2004.</p>
<p>The key changes for&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading the revised <a href="http://www.cabinetoffice.gov.uk/sites/default/files/resources/open_source.pdf">Open Source, Open Standards and ReUse: Government Action Plan</a> &#8211; it&#8217;s surprising (and heartening) to see this has existed in one form or another since as far back as 2004.</p>
<p>The key changes for this version are:</p>
<ul>
<li>suppliers have to show evidence they&#8217;ve considered open source options &#8211; hopefully this will be more than a quick trawl through <a href="http://sourceforge.net/">SourceForge</a></li>
<li>&#8216;shadow license costs&#8217; have to be shown in calculations to take account of previous purchases of &#8216;perpetual&#8217; licenses &#8211; apparently in some cases this could make software license fees for a project appear as zero!</li>
<li>all purchases have to be on the basis of re-use across the government sector &#8211; so no need to pay again if a system moves to the <a href="http://www.guardian.co.uk/technology/2010/jan/27/cloud-computing-government-uk">cloud</a> in the future</li>
</ul>
<p>This all sounds great for the open source community; let&#8217;s also hope that increased openness in government means that we&#8217;ll be able to check the Action Plan is actually being followed!</p>
<p>By the way, a great example of open source in action on government data is <a href="http://www.theyworkforyou.com/">They Work For You</a>, which cleans up Hansard and makes it more accessible &#8211; search is powered by <a href="http://www.xapian.org">Xapian</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/02/02/open-source-action-in-uk-government/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Open source intranet search over millions of documents with full security</title>
		<link>http://www.flax.co.uk/blog/2011/01/26/open-source-intranet-search-over-millions-of-documents-with-full-security/</link>
		<comments>http://www.flax.co.uk/blog/2011/01/26/open-source-intranet-search-over-millions-of-documents-with-full-security/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 11:03:34 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[faceted search]]></category>
		<category><![CDATA[file format]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[intranet]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=489</guid>
		<description><![CDATA[<p>Last year my colleague Tom Mortimer <a href="http://www.flax.co.uk/blog/2010/12/03/intranet-search-event/">talked about indexing security information</a> within an open source enterprise search application, and we&#8217;re happy to announce more details of the project. Our <a href="http://www.taitworld.com">client</a> is an international radio supplier, who had considered&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Last year my colleague Tom Mortimer <a href="http://www.flax.co.uk/blog/2010/12/03/intranet-search-event/">talked about indexing security information</a> within an open source enterprise search application, and we&#8217;re happy to announce more details of the project. Our <a href="http://www.taitworld.com">client</a> is an international radio supplier, who had considered both closed source products and search appliances, but chose open source for greater flexibility and the much lower cost of scaling to indexes of millions of documents. </p>
<p>Using the <a href="http://www.flax.co.uk/what_we_do/">Flax platform</a>, we built a high-performance multi-threaded filesystem crawler to gather documents, translated them to plain text using our own open source Flax Filters and captured Unix file permissions and access control lists (ACLs). User logins are authenticated against an LDAP server, and we use each user&#8217;s identity to show only the results they are allowed to see. We also added the ability to tag documents directly within the search results page (for example, to mark &#8216;current&#8217; versions, or even personal favourites) &#8211; the tags can then be used to filter future results. <a href="http://en.wikipedia.org/wiki/Faceted_search">Faceted search</a> is also available.</p>
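<p>Tag filtering and faceted search both come down to restricting and counting over document metadata. Here is a minimal in-memory Python sketch, assuming each hit carries a hypothetical <code>filetype</code> field and a set of user-applied <code>tags</code>. A real engine computes facet counts during the search itself rather than over a Python list, and this is not the Flax implementation.</p>

```python
from collections import Counter

# Hypothetical search hits, each with a file type facet and user-applied tags
results = [
    {"path": "/specs/radio-a.doc", "filetype": "doc", "tags": {"current"}},
    {"path": "/specs/radio-a-old.doc", "filetype": "doc", "tags": set()},
    {"path": "/manuals/radio-b.pdf", "filetype": "pdf", "tags": {"current", "favourite"}},
]

def filter_by_tag(docs, tag):
    """Keep only the documents that a user has tagged with `tag`."""
    return [d for d in docs if tag in d["tags"]]

def facet_counts(docs, field):
    """Count how many documents fall under each value of `field`."""
    return Counter(d[field] for d in docs)

print(facet_counts(results, "filetype"))  # counts over all hits
current = filter_by_tag(results, "current")
print([d["path"] for d in current])  # ['/specs/radio-a.doc', '/manuals/radio-b.pdf']
print(facet_counts(current, "filetype"))  # counts after the tag filter
```

<p>Recomputing the counts after each filter is what lets a faceted interface show how many results each further refinement would leave.</p>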
<p>You can read more about the project in a <a href="http://www.flax.co.uk/downloads/tait_case_study_jan2011.pdf">case study</a> (PDF) and Tom&#8217;s <a href="http://www.flax.co.uk/downloads/intranet_show_and_tell_2010.pdf">presentation slides</a> (PDF) explain more about the method we used to index the security information.</p>
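<p>Tom&#8217;s slides describe the approach we took. As a generic illustration only, here is one common way to do &#8216;security trimming&#8217;, sketched in Python under our own assumptions. At crawl time each document is given the set of principals allowed to read it, derived from its Unix mode bits and any extra ACL entries. At query time a hit is kept only if that set overlaps the user&#8217;s own principals. The file paths, user and group names are hypothetical, and the LDAP group lookup is replaced by a plain dictionary.</p>

```python
import stat

def allowed_principals(owner, group, mode, acl_users=(), acl_groups=()):
    """Return tokens for everyone who may read a file, e.g. 'user:alice', 'group:eng'.
    Simplified: ignores directory permissions and the precedence rules of real POSIX ACLs."""
    principals = set()
    if mode & stat.S_IRUSR:
        principals.add(f"user:{owner}")
    if mode & stat.S_IRGRP:
        principals.add(f"group:{group}")
    if mode & stat.S_IROTH:
        principals.add("everyone")  # world-readable
    principals.update(f"user:{u}" for u in acl_users)    # extra named-user ACL entries
    principals.update(f"group:{g}" for g in acl_groups)  # extra named-group ACL entries
    return principals

# Hypothetical index: document path -> principals captured at crawl time
index = {
    "/eng/design.doc": allowed_principals("alice", "eng", 0o640),
    "/hr/salaries.xls": allowed_principals("bob", "hr", 0o600),
    "/public/handbook.pdf": allowed_principals("bob", "hr", 0o644),
    "/eng/review.doc": allowed_principals("carol", "eng", 0o600, acl_users=["dave"]),
}

# Stand-in for an LDAP lookup of each user's group memberships
USER_GROUPS = {"alice": {"eng"}, "dave": {"sales"}}

def user_principals(user):
    """All tokens that identify this user, for matching against document principals."""
    groups = USER_GROUPS.get(user, set())
    return {f"user:{user}", "everyone"} | {f"group:{g}" for g in groups}

def visible_results(user, hits):
    """Security trimming: keep only hits whose readers overlap the user's principals."""
    mine = user_principals(user)
    return [path for path in hits if index[path] & mine]

print(visible_results("alice", list(index)))  # ['/eng/design.doc', '/public/handbook.pdf']
print(visible_results("dave", list(index)))   # ['/public/handbook.pdf', '/eng/review.doc']
```

<p>Filtering after the search, as here, keeps the example short. With millions of documents it is usually better to index the principal tokens as terms and apply them as a filter inside the query, so that hit counts and result paging only ever reflect documents the user may see.</p>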
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/01/26/open-source-intranet-search-over-millions-of-documents-with-full-security/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Networking in a great city for enterprise search</title>
		<link>http://www.flax.co.uk/blog/2011/01/14/networking-in-a-great-city-for-enterprise-search/</link>
		<comments>http://www.flax.co.uk/blog/2011/01/14/networking-in-a-great-city-for-enterprise-search/#comments</comments>
		<pubDate>Fri, 14 Jan 2011 11:09:33 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[autonomy]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=465</guid>
		<description><![CDATA[<p>Cambridge, U.K. has a long history of hosting search experts and businesses. Back in the 1980s two firms arose &#8211; Cambridge CD Publishing, founded by <a href="http://en.wikipedia.org/wiki/Martin_Porter">Martin Porter</a> and John Snyder, grew into <a href="http://www.searchtools.com/tools/muscat.html">Muscat</a>, and Cambridge Neurodynamics became&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Cambridge, U.K. has a long history of hosting search experts and businesses. Back in the 1980s two firms arose &#8211; Cambridge CD Publishing, founded by <a href="http://en.wikipedia.org/wiki/Martin_Porter">Martin Porter</a> and John Snyder, grew into <a href="http://www.searchtools.com/tools/muscat.html">Muscat</a>, and Cambridge Neurodynamics became <a href="http://en.wikipedia.org/wiki/Autonomy_Corporation">Autonomy</a>. We believe <a href="http://www.smartlogic.com/">Smartlogic</a> still have a small office here. <a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a>, co-author of the probabilistic theory of information retrieval (which <a href="http://www.xapian.org">Xapian</a> uses for ranking), is based here at <a href="http://research.microsoft.com/en-us/labs/cambridge/default.aspx">Microsoft Research</a>.</p>
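<p>One well-known product of that probabilistic work is the BM25 weighting formula, which Xapian uses as its default weighting scheme. Below is a minimal Python sketch of BM25, using the commonly quoted parameter values k1 = 1.2 and b = 0.75 and one common variant of the IDF term. Xapian&#8217;s own implementation has extra parameters and may use different defaults, so treat this as an illustration of the formula rather than Xapian&#8217;s exact calculation.</p>

```python
import math

def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, df, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document's score.
    tf: term frequency in the document; df: number of documents containing the term."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))  # one common, non-negative IDF variant
    # Longer-than-average documents get a larger normaliser, and so a lower weight
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # The weight grows with tf but saturates: it can never exceed idf * (k1 + 1)
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

def bm25_score(query_terms, doc_terms, corpus):
    """Sum the BM25 weights of the query terms for one tokenised document."""
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        df = sum(1 for d in corpus if term in d)
        score += bm25_term_weight(tf, len(doc_terms), avg_len, len(corpus), df)
    return score

corpus = [
    "open source search engine".split(),
    "search search search appliance".split(),
    "radio supplier intranet".split(),
]
scores = [bm25_score(["search", "open"], doc, corpus) for doc in corpus]
print([round(s, 3) for s in scores])
```

<p>The second document repeats &#8216;search&#8217; three times yet scores below the first, which matches both query terms. The saturating tf component, together with the higher IDF of the rarer term &#8216;open&#8217;, keeps one repeated word from dominating.</p>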
<p>Today, the city is still home to innovative search companies, including <a href="http://www.trueknowledge.com/">True Knowledge</a>, <a href="http://www.grapeshot.co.uk/">Grapeshot</a> and of course ourselves. We know of many more companies &#8216;under the radar&#8217; developing search technologies, both to complement existing systems and as completely new approaches to information retrieval, including visual search.</p>
<p>To encourage networking and to help keep the city at the forefront of search developments, we&#8217;ve created the <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/">Enterprise Search Cambridge Meetup group</a> and our first meeting is on February 16th &#8211; all are welcome, whether currently working with search and related technologies or simply interested in the possibilities. Hope to meet you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/01/14/networking-in-a-great-city-for-enterprise-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chalk and cheese &#8211; the difficulty of analysing open source options</title>
		<link>http://www.flax.co.uk/blog/2010/12/09/chalk-and-cheese-the-difficulty-of-analysing-open-source-options/</link>
		<comments>http://www.flax.co.uk/blog/2010/12/09/chalk-and-cheese-the-difficulty-of-analysing-open-source-options/#comments</comments>
		<pubDate>Thu, 09 Dec 2010 14:55:53 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Reference]]></category>
		<category><![CDATA[analyst]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=452</guid>
		<description><![CDATA[<p>David Fishman of Lucid Imagination has <a href="http://www.lucidimagination.com/blog/2010/12/07/open-source-search-analysts-radar/">blogged</a> on how open source search is treated by the analyst community (you can even use his links to get hold of some of the reports mentioned for the usual price of your&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>David Fishman of Lucid Imagination has <a href="http://www.lucidimagination.com/blog/2010/12/07/open-source-search-analysts-radar/">blogged</a> on how open source search is treated by the analyst community (you can even use his links to get hold of some of the reports mentioned for the usual price of your contact details). We can add to his list a report from the <a href="http://www.realstorygroup.com/Research/Channel/Search/Vendors">Real Story Group</a> &#8211; and I hear Ovum will shortly release an updated report. </p>
<p>What I find most interesting about these analyst reports is how various vendors are subdivided &#8211; either by target market, or by size, or by how &#8216;complex&#8217; their platform is. Open source solutions don&#8217;t always fit the categories &#8211; for example Real Story Group list &#8216;Apache Project&#8217; as a &#8216;specialised vendor&#8217; &#8211; which it really isn&#8217;t. Perhaps it&#8217;s time for some new categories in these analyst reports &#8211; maybe a list of specialist open source integrators, linked with the available technologies such as <a href="http://lucene.apache.org/">Lucene</a>, <a href="http://www.xapian.org">Xapian</a> or <a href="http://sphinxsearch.com/">Sphinx</a>, combined with some data about likely costs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/12/09/chalk-and-cheese-the-difficulty-of-analysing-open-source-options/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Legal search is broken &#8211; can it be fixed with open source taxonomies?</title>
		<link>http://www.flax.co.uk/blog/2010/11/11/legal-search-is-broken-can-it-be-fixed-with-open-source-taxonomies/</link>
		<comments>http://www.flax.co.uk/blog/2010/11/11/legal-search-is-broken-can-it-be-fixed-with-open-source-taxonomies/#comments</comments>
		<pubDate>Thu, 11 Nov 2010 10:22:06 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[law]]></category>
		<category><![CDATA[legal]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=423</guid>
		<description><![CDATA[<p>I spent yesterday afternoon at the <a href="http://www.iskouk.org/">International Society for Knowledge Organisation</a>&#8217;s <a href="http://www.iskouk.org/events/legal_knowledge_nov2010.htm">Legal KnowHow event</a>, a series of talks on legal knowledge and how it is managed. The audience was a mixture of lawyers, legal information managers, vendors and&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I spent yesterday afternoon at the <a href="http://www.iskouk.org/">International Society for Knowledge Organisation</a>&#8217;s <a href="http://www.iskouk.org/events/legal_knowledge_nov2010.htm">Legal KnowHow event</a>, a series of talks on legal knowledge and how it is managed. The audience was a mixture of lawyers, legal information managers, vendors and academics, and the talks came from those who are planning legal knowledge systems or implementing them. I also particularly enjoyed hearing from <a href="http://wyner.info/LanguageLogicLawSoftware/">Adam Wyner</a> from Liverpool University who is modelling legal arguments in software, using open source text analysis. You can see some of the key points I picked up on our <a href="http://twitter.com/#!/search?q=%23iskolegal">Twitter feed</a>.</p>
<p>What became clear to me during the afternoon is that search technology is not currently serving the needs of lawyers or law firms. The users want a simple Google-like interface (or think they do), the software is having trouble presenting results in context and the source data is large, complex and unwieldy. The software used for search is from some of the biggest commercial search vendors (legal firms seem to &#8216;follow the pack&#8217; in terms of what vendor they select &#8211; unfortunately few of the large law firms seem to have even considered the credible open source alternatives such as Lucene/Solr or Xapian).</p>
<p>In many cases taxonomies were presented as the solution &#8211; make sure every document fits tidily into a hierarchy and all the search problems go away, as lawyers can simply navigate to what they need. All very simple in theory &#8211; however, each big law firm and each big legal information publisher has its own idea of what this taxonomy should be.</p>
<p>After the final presentation I argued that this seemed to be a classic case where an open source model could help. If a firm or publisher were prepared to create an <strong>open source legal taxonomy</strong> (and to be fair, we&#8217;re only talking about 5000 entries or so &#8211; this wouldn&#8217;t be a very big structure) and let this be developed and improved collaboratively, they would themselves benefit from others&#8217; experience, the transfer of legal data between repositories would be easier and even the search vendors might learn a little about how lawyers actually want to search. The original creators would be seen as thought-leaders and could even license the taxonomy so it could not be rebadged and passed off as original by another firm or publisher.</p>
<p>However my plea fell on stony ground: law firms seem to think that their own taxonomies have inherent value (and thus should never be let outside the company) and they regard the open source model with suspicion. Perhaps legal search will remain broken for the time being.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/11/11/legal-search-is-broken-can-it-be-fixed-with-open-source-taxonomies/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Open source search engines and programming languages</title>
		<link>http://www.flax.co.uk/blog/2010/09/03/open-source-search-engines-and-programming-languages/</link>
		<comments>http://www.flax.co.uk/blog/2010/09/03/open-source-search-engines-and-programming-languages/#comments</comments>
		<pubDate>Fri, 03 Sep 2010 10:40:16 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=352</guid>
		<description><![CDATA[<p>So you&#8217;re writing a search-related application in your favourite language, and you&#8217;ve decided to choose an open source search engine to power it. So far, so good &#8211; but how are the two going to communicate?</p>
<p>Let&#8217;s look at two&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>So you&#8217;re writing a search-related application in your favourite language, and you&#8217;ve decided to choose an open source search engine to power it. So far, so good &#8211; but how are the two going to communicate?</p>
<p>Let&#8217;s look at two engines, <a href="http://www.xapian.org">Xapian</a> and <a href="http://lucene.apache.org/">Lucene</a>, and compare how this might be done. Lucene is written in Java and Xapian in C++, so if you&#8217;re using those languages respectively, everything should be relatively simple &#8211; just download the source code and get on with it. However, if you&#8217;re using another language, you&#8217;re going to have to work out how to interface to the engine. </p>
<p>The Lucene project has been rewritten in several other languages: for C/C++ there&#8217;s <a href="http://incubator.apache.org/lucy/">Lucy</a> (which includes Perl and Ruby bindings), and there&#8217;s even a .Net version called, not surprisingly, <a href="http://incubator.apache.org/lucene.net/">Lucene.NET</a>. For Python there&#8217;s <a href="http://lucene.apache.org/pylucene/">PyLucene</a>, although this wraps the Java library rather than reimplementing it. Some of these &#8216;ports&#8217; of Lucene are &#8216;looser&#8217; than others (i.e. they may not share the same API or feature set), and they may not be updated as often as Lucene itself. There are also versions in Perl, Ruby, Delphi and even Lisp (scary!) &#8211; there&#8217;s a <a href="http://wiki.apache.org/lucene-java/LuceneImplementations">full list</a> available. Not all are currently active projects.</p>
<p>Xapian takes a different approach, with only one core project, but a sheaf of bindings to other languages. Currently these bindings cover C#, Java, Perl, PHP, Python, Ruby and Tcl &#8211; but interestingly these are <em>auto-generated</em> using the <a href="http://www.swig.org/">Simplified Wrapper and Interface Generator</a> or SWIG. This means that every time Xapian&#8217;s API changes, the bindings can easily be updated to reflect this (it&#8217;s actually not quite that simple, but SWIG copes with the vast majority of code that would otherwise have to be manually edited). SWIG actually supports other languages as well (according to the SWIG website, &#8220;Common Lisp (CLISP, Allegro CL, CFFI, UFFI), Lua, Modula-3, OCAML, Octave and R. Also several interpreted and compiled Scheme implementations (Guile, MzScheme, Chicken)&#8221;) so in theory bindings to these could also be built relatively easily.</p>
<p>There&#8217;s also another way to communicate with both engines, using a <em>search server</em>. <a href="http://lucene.apache.org/solr/">SOLR</a> is the search server for Lucene, whereas for Xapian there is <a href="http://www.flax.co.uk/the_software">Flax Search Service</a>. In this case, any language that supports Web Services (you&#8217;d be hard pressed to find a modern language that doesn&#8217;t) can communicate with the engine, simply by passing data over HTTP. </p>
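<p>To make that concrete, here is a minimal Python sketch using only the standard library. It builds a query URL for Solr&#8217;s standard <code>select</code> handler, assuming a default single-core install at <code>localhost:8983</code>, and pulls the matching documents out of a JSON response. To keep the example runnable without a server, it parses a canned response with a made-up document rather than making the request. The Flax Search Service has its own API, which is not shown here.</p>

```python
import json
from urllib.parse import urlencode

# Assumed location of a default single-core Solr install's standard search handler
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_query_url(query, rows=10):
    """Build a Solr search URL that asks for a JSON response (wt=json)."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "rows": rows, "wt": "json"})

def parse_results(body):
    """Return (total number of hits, list of returned documents) from a Solr JSON response."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]

url = build_query_url("open source")
print(url)  # http://localhost:8983/solr/select?q=open+source&rows=10&wt=json

# A live client would fetch the URL, e.g. with urllib.request.urlopen(url).read();
# a canned response (made-up document) keeps this example runnable without a server.
canned = '{"responseHeader": {"status": 0}, "response": {"numFound": 1, "start": 0, "docs": [{"id": "585", "title": "Whitepaper"}]}}'
total, docs = parse_results(canned)
print(total, [d["title"] for d in docs])  # 1 ['Whitepaper']
```

<p>The same two steps, building a URL and decoding the response, look much the same in any language with an HTTP client and a JSON parser, which is the appeal of the search-server approach.</p>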
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/09/03/open-source-search-engines-and-programming-languages/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

