<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Flax Blog &#187; lucene</title>
	<atom:link href="http://www.flax.co.uk/blog/tag/lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.flax.co.uk/blog</link>
	<description>Open source &#38; enterprise search</description>
	<lastBuildDate>Wed, 25 Jan 2012 14:56:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Fall and rise of search in a world of Big Data &#8211; part 1</title>
		<link>http://www.flax.co.uk/blog/2011/10/28/the-fall-and-rise-of-search-in-a-world-of-big-data-part-1/</link>
		<comments>http://www.flax.co.uk/blog/2011/10/28/the-fall-and-rise-of-search-in-a-world-of-big-data-part-1/#comments</comments>
		<pubDate>Fri, 28 Oct 2011 10:07:10 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[endeca]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[market]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[oracle]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=653</guid>
		<description><![CDATA[<p>It&#8217;s been an interesting and busy few weeks this autumn &#8211; starting with <a href="http://2011.lucene-eurocon.org/">Lucene Eurocon in Barcelona</a>. &#8216;Big Data&#8217; was a main theme, with some great <a href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011">presentations</a> including the keynote from Grant Ingersoll and the talk from Eric&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been an interesting and busy few weeks this autumn &#8211; starting with <a href="http://2011.lucene-eurocon.org/">Lucene Eurocon in Barcelona</a>. &#8216;Big Data&#8217; was a main theme, with some great <a href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011">presentations</a> including the keynote from Grant Ingersoll and the talk from Eric Baldeschwieler of Hortonworks, showing how Lucene fits with other Apache projects such as <a href="http://hadoop.apache.org/">Hadoop</a>, <a href="http://mahout.apache.org/">Mahout</a> and <a href="http://hbase.apache.org/">HBase.</a> I also enjoyed the presentations from Andrzej Bialecki on a portable index format for Lucene, Jan Høydahl of <a href="http://www.cominvent.com/">Cominvent AS</a> on the Solr Update Chain and James Alexander of the Open University on building a Solr-powered search of their video archives. Luckily this year the presentations were videoed &#8211; so I can catch up on the presentations I missed &#8211; you&#8217;ll also be able to see me talk about our recent work with <a href="http://www.realwire.com/releases/Reed-Specialist-Recruitment-benefits-from-open-source-recruitment-search-system-developed-by-Flax">Reed Specialist Recruitment.</a></p>
<p>Of course, one of the major reasons for attending an event like this is the networking and talks outside the main event, and it was great to catch up with others in the field &#8211; one meeting between a number of us with an interest in pipelining and data conditioning led to the creation of an <a href="http://www.meetup.com/SearchPipelines/">informal group</a> to discuss how we might better share ideas, code and best practices.</p>
<p>While we were at the conference, the announcement came that search vendor <a href="http://gigaom.com/cloud/why-oracle-bought-big-data-veteran-endeca/">Endeca had been bought by Oracle</a> &#8211; and yes, this is also probably about Big Data. These are fascinating times &#8211; is search becoming the enabling technology for a revolution in how we deal with digital information?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/10/28/the-fall-and-rise-of-search-in-a-world-of-big-data-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to remove a stored field in Lucene</title>
		<link>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/</link>
		<comments>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 12:12:42 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=598</guid>
		<description><![CDATA[<p>While working on a customer project recently we found a very large field that was stored unnecessarily in the Lucene index, taking up a lot of space. As it would have taken a very long time to re-index (there are&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>While working on a customer project recently we found a very large field that was stored unnecessarily in the Lucene index, taking up a lot of space. As it would have taken a very long time to re-index (there are tens of millions of complex documents in this case) we looked for a way to remove the stored field in-place.</p>
<p>There&#8217;s an interesting set of <a href="http://www.slideshare.net/abial/eurocon2010">slides from last year&#8217;s Apache Lucene Eurocon</a> which discuss this kind of Lucene index post-processing, but we didn&#8217;t find any tools to do this particular task (although this doesn&#8217;t mean they don&#8217;t exist &#8211; for example <a href="http://code.google.com/p/luke/">Luke</a> may be helpful). So we wrote our own, based on some examples in the &#8216;contrib&#8217; directory of Solr 4. We override the document() methods of FilterIndexReader to remove the required field from each returned Document&#8217;s field list. Terms aren&#8217;t interfered with, so it really is like changing the field from being stored to not being stored; it&#8217;s still indexed.</p>
<p>The code is available <a href="http://code.google.com/p/flaxcode/source/browse/#svn%2Ftrunk%2Flucene_tools">here</a>. It&#8217;s written against Lucene 2.9.3 (which is contained in Solr 1.4.1).</p>
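<p>As a rough illustration of the wrapping idea described above (this is not the linked tool), here is a minimal Java sketch. To keep it runnable without Lucene on the classpath it uses stand-in types: <code>StoredFieldReader</code> and <code>FieldDroppingReader</code> are hypothetical names invented for this example, and a plain map stands in for a Lucene Document. In Lucene itself the equivalent would be a FilterIndexReader subclass whose document() methods strip the field.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in (assumption) for the part of an index reader used here: fetch one document's stored fields. */
interface StoredFieldReader {
    Map<String, String> document(int docId);
}

/** Wraps another reader and hides one stored field from every document it returns. */
class FieldDroppingReader implements StoredFieldReader {
    private final StoredFieldReader in;
    private final String fieldToDrop;

    FieldDroppingReader(StoredFieldReader in, String fieldToDrop) {
        this.in = in;
        this.fieldToDrop = fieldToDrop;
    }

    @Override
    public Map<String, String> document(int docId) {
        // Copy the stored fields so the wrapped reader's data is left untouched,
        // then remove the unwanted field from the copy before returning it.
        Map<String, String> doc = new LinkedHashMap<>(in.document(docId));
        doc.remove(fieldToDrop);
        return doc;
    }
}
```

<p>Anything that reads documents through the wrapper never sees the dropped field, while the wrapped reader is unchanged. In Lucene the wrapped reader would presumably then be copied into a new index so that the stored data is rewritten without the field.</p>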
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Whitepaper &#8211; Why you should be considering open source search</title>
		<link>http://www.flax.co.uk/blog/2011/06/22/whitepaper-why-you-should-be-considering-open-source-search/</link>
		<comments>http://www.flax.co.uk/blog/2011/06/22/whitepaper-why-you-should-be-considering-open-source-search/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 10:49:50 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Reference]]></category>
		<category><![CDATA[durrants]]></category>
		<category><![CDATA[FAST]]></category>
		<category><![CDATA[guardian]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[strategy]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=585</guid>
		<description><![CDATA[<p>I&#8217;ve uploaded a whitepaper I wrote a short while ago:</p>
<p><em>&#8220;In these rapidly changing times we don&#8217;t know what we will need to search tomorrow – so it&#8217;s important to be adaptable, flexible and able to cope with data</em>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve uploaded a whitepaper I wrote a short while ago:</p>
<p><em>&#8220;In these rapidly changing times we don&#8217;t know what we will need to search tomorrow – so it&#8217;s important to be adaptable, flexible and able to cope with data volumes that may not scale linearly. Maintaining control over the future of your search software is also key. Open source search has come of age and every modern business should be aware of its advantages.&#8221;</em></p>
<p>It&#8217;s available in our <a href="http://www.flax.co.uk/downloads/">downloads</a> area, together with several case studies on open source search projects we&#8217;ve carried out for clients.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/06/22/whitepaper-why-you-should-be-considering-open-source-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Open source search evening &#8211; ElasticSearch, Xapian and GSoC</title>
		<link>http://www.flax.co.uk/blog/2011/05/04/open-source-search-evening-elasticsearch-xapian-and-gsoc/</link>
		<comments>http://www.flax.co.uk/blog/2011/05/04/open-source-search-evening-elasticsearch-xapian-and-gsoc/#comments</comments>
		<pubDate>Wed, 04 May 2011 13:42:35 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[elasticsearch]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=565</guid>
		<description><![CDATA[<p>Last night there was a small <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/events/16913125/">gathering</a> in Cambridge of open source search engine developers and enthusiasts. <a href="http://twitter.com/#!/rboulton">Richard Boulton</a> hosted the event and began with an introduction to <a href="http://www.elasticsearch.org/"><strong>elasticsearch</strong></a>, which is an &#8220;Open Source (Apache 2), Distributed,&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Last night there was a small <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/events/16913125/">gathering</a> in Cambridge of open source search engine developers and enthusiasts. <a href="http://twitter.com/#!/rboulton">Richard Boulton</a> hosted the event and began with an introduction to <a href="http://www.elasticsearch.org/"><strong>elasticsearch</strong></a>, which is an &#8220;Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene&#8221;. Richard told us about how this system attempts to make prototyping and building search systems easier by automatically guessing data schemas, offering a powerful, hierarchical &#8216;query language&#8217; and automatically distributing the search load. Richard&#8217;s conclusions were that although elasticsearch is not as mature as Apache <a href="http://lucene.apache.org/solr/">Solr</a> it is certainly a project to consider: however development is rapid and documentation is not easy to find. We&#8217;ll watch this project with interest.</p>
<p><a href="http://oligarchy.co.uk/">Olly Betts</a> next told us about various <a href="http://trac.xapian.org/wiki/GSoC2011">Xapian projects</a> running as part of this year&#8217;s Google Summer of Code; this led into a discussion of <a href="http://en.wikipedia.org/wiki/Learning_to_rank">Learning to Rank</a> and how this might be implemented in practical terms. It&#8217;s great to see these cutting-edge features being added to an open source project. </p>
<p>Thanks to Richard for organising the evening and to all who came.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/05/04/open-source-search-evening-elasticsearch-xapian-and-gsoc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ECIR 2011 Industry day &#8211; part 2 of 2</title>
		<link>http://www.flax.co.uk/blog/2011/04/28/ecir-2011-industry-day-part-2-of-2/</link>
		<comments>http://www.flax.co.uk/blog/2011/04/28/ecir-2011-industry-day-part-2-of-2/#comments</comments>
		<pubDate>Thu, 28 Apr 2011 12:14:39 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[geolocation]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[user interface]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=560</guid>
		<description><![CDATA[<p>Here&#8217;s the second writeup. </p>
<p>We started after lunch with a talk from <a href="http://labs.yahoo.com/Flavio_Junqueira">Flavio Junqueira</a> of Yahoo! on web search engine caching. He talked both about the various things that can be cached (query results, term lists and document&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s the second writeup. </p>
<p>We started after lunch with a talk from <a href="http://labs.yahoo.com/Flavio_Junqueira">Flavio Junqueira</a> of Yahoo! on web search engine caching. He talked both about the various things that can be cached (query results, term lists and document data) and the pros and cons of dynamic versus static caching. His work has focused on the former, with a decoupled approach &#8211; i.e. the cache doesn&#8217;t automatically know what&#8217;s changed in the index. The approach is to give data in the cache a &#8216;time to live&#8217; (TTL), after which it is refreshed &#8211; an acceptable approach as search engines don&#8217;t have a &#8216;perfect&#8217; view of the web at any one point in time. As he mentioned, this method is less useful for &#8216;real-time&#8217; data such as news.</p>
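<p>To make the TTL idea concrete, here is a minimal Java sketch of a decoupled cache. It is an illustration only, not the implementation presented in the talk: the class name <code>TtlCache</code>, the loader function and the injectable clock are all assumptions made for this example. An entry is served as-is until it is at least as old as the TTL, and only then is it recomputed, so the cache never needs to be told what has changed in the index.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical TTL cache: entries are reused until they reach ttlMillis in age, then reloaded. */
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long storedAt;
        Entry(V value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final Function<K, V> loader; // e.g. runs the query against the index
    private final LongSupplier clock;    // injectable time source, in milliseconds

    TtlCache(long ttlMillis, Function<K, V> loader, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
        this.clock = clock;
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        // Reload when the key is missing or the cached entry has reached its time to live.
        if (entry == null || now - entry.storedAt >= ttlMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

<p>In production the clock would typically be <code>System::currentTimeMillis</code>. Choosing the TTL is the trade-off mentioned above: a long TTL saves work but serves staler results, which is why the approach suits general web search better than news.</p>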
<p><a href="https://researcher.ibm.com/researcher/view.php?person=ie-FCALABRE">Francesco Calabrese</a> followed, talking about his work in the IBM Smarter Cities Technology Centre in Dublin itself. Using data from mobile devices his group has looked at &#8216;digital footprints&#8217; and how they might be used to better understand such things as public transport provision. An interesting effect they have noticed is that they can predict the type of an event (say a football match) from the points of origin of the attendees. This talk wasn&#8217;t really about search, although the data gathered would be useful in search applications with geolocation features.</p>
<p><a href="http://uk.linkedin.com/in/geryducatel">Gery Ducatel</a> from BT was next, with a description of a search application for their mobile workforce, allowing searches over a job database as well as reference and health &#038; safety information. This had some interesting aspects, not least with the user interface &#8211; you can&#8217;t type long strings wearing heavy gloves while halfway up a telegraph pole! The system uses various NLP features such as a part-of-speech tagger to break down a query and provide easy-to-use dropdown options for potential results. The user interface, while not the prettiest I&#8217;ve seen, also made good use of geolocation to show where other engineers had carried out nearby jobs.</p>
<p>I followed with my talk on Unexpected Search, which I&#8217;ll detail in a future blog post. We then moved onto a panel discussion on the <a href="http://www.bbc.co.uk/news/technology-12491688">IBM Watson</a> project &#8211; suffice it to say that although I&#8217;ve been asked about this a lot in the last few months, it seems to me that this was a great PR coup for IBM rather than a huge leap forward in the technology (which by the way includes the open source Lucene search engine).</p>
<p>Thanks again to Udo and Tony for organising the day, and for inviting me to speak &#8211; there was a fascinating range of speakers and topics, and it was great to catch up with others working in the industry. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/04/28/ecir-2011-industry-day-part-2-of-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The year open source search got serious</title>
		<link>http://www.flax.co.uk/blog/2010/12/17/the-year-open-source-search-got-serious/</link>
		<comments>http://www.flax.co.uk/blog/2010/12/17/the-year-open-source-search-got-serious/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 10:22:35 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[autonomy]]></category>
		<category><![CDATA[durrants]]></category>
		<category><![CDATA[financial times]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[partner]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=461</guid>
		<description><![CDATA[<p>It&#8217;s been an interesting and busy twelve months here at Flax &#8211; we&#8217;ve worked on some fantastic <a href="http://www.flax.co.uk/blog/2010/10/25/building-a-new-press-cuttings-service-for-the-financial-times/">customer</a> <a href="http://www.flax.co.uk/blog/2010/12/13/next-generation-media-monitoring-with-open-source-search/">projects</a>, spoken at conferences at <a href="http://www.flax.co.uk/blog/2010/10/22/search-solutions-2010-a-brief-review/">home</a> and <a href="http://www.lucenerevolution.com/">abroad</a> and made some great alliances and <a href="http://www.flax.co.uk/blog/2010/10/04/flax-partners-with-lucid-imagination/">partnerships</a>. We are&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been an interesting and busy twelve months here at Flax &#8211; we&#8217;ve worked on some fantastic <a href="http://www.flax.co.uk/blog/2010/10/25/building-a-new-press-cuttings-service-for-the-financial-times/">customer</a> <a href="http://www.flax.co.uk/blog/2010/12/13/next-generation-media-monitoring-with-open-source-search/">projects</a>, spoken at conferences at <a href="http://www.flax.co.uk/blog/2010/10/22/search-solutions-2010-a-brief-review/">home</a> and <a href="http://www.lucenerevolution.com/">abroad</a> and made some great alliances and <a href="http://www.flax.co.uk/blog/2010/10/04/flax-partners-with-lucid-imagination/">partnerships</a>. We are talking to more people than ever before about the advantages of open source search and we&#8217;ve even started a <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/">local Meetup group</a>.</p>
<p>This has been the year when open source search moved out of the shadows and became a force to reckon with &#8211; whether handling <a href="http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html">billions of queries</a> or <a href="http://www.lucidimagination.com/Community/Marketplace/Application-Showcase-Wiki/LinkedIn">millions of customers</a>, powering <a href="http://www.guardian.co.uk/open-platform">innovative new APIs</a> for open content from <a href="http://www.flax.co.uk/blog/2010/10/19/when-search-isnt-just-search-at-the-guardian/">forward-looking media companies</a> or simply making it easier for search applications to be developed. <a href="http://www.flax.co.uk/support">Commercial support</a> is now available to rival anything offered by the closed source world and there are now <a href="http://www.lucidimagination.com/enterprise-search-solutions">fully packaged solutions</a> built on open source. In some sectors open source may even become the default choice (see what <a href="http://www.lucidimagination.com/events/revolution2010/presentation-abstracts#market-trends">IDC</a> said about the embedded/OEM market).</p>
<p>There&#8217;s still significant change to come in the search sector &#8211; I expect a few vendors will be in trouble by this time next year as they realise their business models (often built on per-document charges) are out-of-date, and we <a href="http://www.businessinsider.com/no-microsoft-probably-wont-buy-autonomy-2010-12">might also see further acquisitions by the usual behemoths</a>. All this leads to reduced choice and increased costs for customers, and this is where open source can help &#8211; you can build your search solution in-house, or engage companies like ours to help, but you&#8217;re no longer locked in to a vendor&#8217;s roadmap and shackled to their business plan (or the consequences of its failure!).</p>
<p>I&#8217;ll leave the final word to Matt Asay of Canonical, who <a href="http://www.theregister.co.uk/2010/12/17/open_source_year_in_review/">says</a>: &#8220;Open source is how we do business 10 years into this new millennium.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/12/17/the-year-open-source-search-got-serious/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chalk and cheese &#8211; the difficulty of analysing open source options</title>
		<link>http://www.flax.co.uk/blog/2010/12/09/chalk-and-cheese-the-difficulty-of-analysing-open-source-options/</link>
		<comments>http://www.flax.co.uk/blog/2010/12/09/chalk-and-cheese-the-difficulty-of-analysing-open-source-options/#comments</comments>
		<pubDate>Thu, 09 Dec 2010 14:55:53 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Reference]]></category>
		<category><![CDATA[analyst]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=452</guid>
		<description><![CDATA[<p>David Fishman of Lucid Imagination has <a href="http://www.lucidimagination.com/blog/2010/12/07/open-source-search-analysts-radar/">blogged</a> on how open source search is treated by the analyst community (you can even use his links to get hold of some of the reports mentioned for the usual price of your&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>David Fishman of Lucid Imagination has <a href="http://www.lucidimagination.com/blog/2010/12/07/open-source-search-analysts-radar/">blogged</a> on how open source search is treated by the analyst community (you can even use his links to get hold of some of the reports mentioned for the usual price of your contact details). We can add to his list a report from the <a href="http://www.realstorygroup.com/Research/Channel/Search/Vendors">Real Story Group</a> &#8211; and I hear Ovum will shortly release an updated report. </p>
<p>What I find most interesting about these analyst reports is how various vendors are subdivided &#8211; either by target market, or by size, or by how &#8216;complex&#8217; their platform is. Open source solutions don&#8217;t always fit the categories &#8211; for example Real Story Group list &#8216;Apache Project&#8217; as a &#8216;specialised vendor&#8217; &#8211; which it really isn&#8217;t. Perhaps it&#8217;s time for some new categories in these analyst reports &#8211; maybe a list of specialist open source integrators, linked with the available technologies such as <a href="http://lucene.apache.org/">Lucene</a>, <a href="http://www.xapian.org">Xapian</a> or <a href="http://sphinxsearch.com/">Sphinx</a>, combined with some data about likely costs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/12/09/chalk-and-cheese-the-difficulty-of-analysing-open-source-options/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Legal search is broken &#8211; can it be fixed with open source taxonomies?</title>
		<link>http://www.flax.co.uk/blog/2010/11/11/legal-search-is-broken-can-it-be-fixed-with-open-source-taxonomies/</link>
		<comments>http://www.flax.co.uk/blog/2010/11/11/legal-search-is-broken-can-it-be-fixed-with-open-source-taxonomies/#comments</comments>
		<pubDate>Thu, 11 Nov 2010 10:22:06 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[law]]></category>
		<category><![CDATA[legal]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=423</guid>
		<description><![CDATA[<p>I spent yesterday afternoon at the <a href="http://www.iskouk.org/">International Society for Knowledge Organisation</a>&#8217;s <a href="http://www.iskouk.org/events/legal_knowledge_nov2010.htm">Legal KnowHow event</a>, a series of talks on legal knowledge and how it is managed. The audience was a mixture of lawyers, legal information managers, vendors and&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I spent yesterday afternoon at the <a href="http://www.iskouk.org/">International Society for Knowledge Organisation</a>&#8217;s <a href="http://www.iskouk.org/events/legal_knowledge_nov2010.htm">Legal KnowHow event</a>, a series of talks on legal knowledge and how it is managed. The audience was a mixture of lawyers, legal information managers, vendors and academics, and the talks came from those who are planning legal knowledge systems or implementing them. I also particularly enjoyed hearing from <a href="http://wyner.info/LanguageLogicLawSoftware/">Adam Wyner</a> from Liverpool University who is modelling legal arguments in software, using open source text analysis. You can see some of the key points I picked up on our <a href="http://twitter.com/#!/search?q=%23iskolegal">Twitter feed</a>.</p>
<p>What became clear to me during the afternoon is that search technology is not currently serving the needs of lawyers or law firms. The users want a simple Google-like interface (or think they do), the software is having trouble presenting results in context and the source data is large, complex and unwieldy. The software used for search is from some of the biggest commercial search vendors (legal firms seem to &#8216;follow the pack&#8217; in terms of what vendor they select &#8211; unfortunately few of the large law firms seem to have even considered the credible open source alternatives such as Lucene/Solr or Xapian).</p>
<p>In many cases taxonomies were presented as the solution &#8211; make sure every document fits tidily into a hierarchy and all the search problems go away, as lawyers can simply navigate to what they need. All very simple in theory &#8211; however each big law firm and each big legal information publisher has their own idea of what this taxonomy should be.</p>
<p>After the final presentation I argued that this seemed to be a classic case where an open source model could help. If a firm or publisher were prepared to create an <strong>open source legal taxonomy</strong> (and to be fair, we&#8217;re only talking about 5000 entries or so &#8211; this wouldn&#8217;t be a very big structure) and let this be developed and improved collaboratively, they would themselves benefit from others&#8217; experience, the transfer of legal data between repositories would be easier and even the search vendors might learn a little about how lawyers actually want to search. The original creators would be seen as thought-leaders and could even license the taxonomy so it could not be rebadged and passed off as original by another firm or publisher.</p>
<p>However my plea fell on stony ground: law firms seem to think that their own taxonomies have inherent value (and thus should never be let outside the company) and they regard the open source model with suspicion. Perhaps legal search will remain broken for the time being.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/11/11/legal-search-is-broken-can-it-be-fixed-with-open-source-taxonomies/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More about LucidWorks Enterprise</title>
		<link>http://www.flax.co.uk/blog/2010/11/05/more-about-lucidworks-enterprise/</link>
		<comments>http://www.flax.co.uk/blog/2010/11/05/more-about-lucidworks-enterprise/#comments</comments>
		<pubDate>Fri, 05 Nov 2010 11:36:19 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Reference]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[partner]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=420</guid>
		<description><![CDATA[<p>If you&#8217;re considering a Lucene/Solr powered search solution, you may be interested in LucidWorks Enterprise, produced by our partners <a href="http://lucidimagination.com/">Lucid Imagination</a>. They&#8217;ve taken Lucene/Solr and added a powerful admin GUI, ReST API, web spiders, file crawlers, database connectors, alerts,&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re considering a Lucene/Solr powered search solution, you may be interested in LucidWorks Enterprise, produced by our partners <a href="http://lucidimagination.com/">Lucid Imagination</a>. They&#8217;ve taken Lucene/Solr and added a powerful admin GUI, ReST API, web spiders, file crawlers, database connectors, alerts, a clickthrough framework and more. All this comes with a range of excellent support options backed by the experts at Lucid.</p>
<p>If you&#8217;d like to know more read this <a href="http://www.flax.co.uk/downloads/lucidworksenterprise.pdf">downloadable PDF</a> or <a href="http://www.flax.co.uk/contact_us">contact us</a> for more information and a demo.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/11/05/more-about-lucidworks-enterprise/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Further revolutions</title>
		<link>http://www.flax.co.uk/blog/2010/10/08/further-revolutions/</link>
		<comments>http://www.flax.co.uk/blog/2010/10/08/further-revolutions/#comments</comments>
		<pubDate>Fri, 08 Oct 2010 21:04:36 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[FAST]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=373</guid>
		<description><![CDATA[<p>Back for the second day of <a href="http://www.lucenerevolution.com">Lucene Revolution</a>, with some great talks on migrating to Solr from FAST ESP, the new flexible indexing features coming to Lucene &#8216;real soon now&#8217;, and finishing off with a panel discussion. I felt&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Back for the second day of <a href="http://www.lucenerevolution.com">Lucene Revolution</a>, with some great talks on migrating to Solr from FAST ESP, the new flexible indexing features coming to Lucene &#8216;real soon now&#8217;, and finishing off with a panel discussion. I felt privileged to sit as part of this panel between Eric Gries, CEO of Lucid Imagination, and Paul Doscher of Exalead &#8211; the discussion was lively and interesting (I hope!) to the audience.</p>
<p>I&#8217;m looking forward to returning to the UK with all I&#8217;ve learnt from this event, and to follow up on some of the ideas generated &#8211; for example, it would be great to be able to demonstrate LucidWorks Enterprise to interested parties in London.</p>
<p>Thanks to Stephen Arnold&#8217;s team and all at Lucid Imagination for organising such a great conference. It won&#8217;t be the last I&#8217;m sure!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/10/08/further-revolutions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

