<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Flax Blog &#187; open source</title>
	<atom:link href="http://www.flax.co.uk/blog/tag/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.flax.co.uk/blog</link>
	<description>Open source &#38; enterprise search</description>
	<lastBuildDate>Wed, 25 Jan 2012 14:56:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Search events for 2012 &#8211; the first crop</title>
		<link>http://www.flax.co.uk/blog/2012/01/25/search-events-for-2012-the-first-crop/</link>
		<comments>http://www.flax.co.uk/blog/2012/01/25/search-events-for-2012-the-first-crop/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 14:56:17 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=704</guid>
		<description><![CDATA[<p>Details of search events in 2012 are already beginning to appear; here are a few to start with:</p>
<ul>
<li>1-5 April 2012 &#8211; <a href="http://ecir2012.upf.edu/">European Conference on Information Retrieval</a> (ECIR) in Barcelona, Spain. An academic conference featuring new developments in IR.</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Details of search events in 2012 are already beginning to appear; here are a few to start with:</p>
<ul>
<li>1-5 April 2012 &#8211; <a href="http://ecir2012.upf.edu/">European Conference on Information Retrieval</a> (ECIR) in Barcelona, Spain. An academic conference featuring new developments in IR. </li>
<li>7-10 May 2012 &#8211; Lucid Imagination&#8217;s <a href="http://lucenerevolution.org/">Lucene Revolution</a> in Boston, USA. The largest conference on open source search &#8211; this event has a great buzz as the Lucene/Solr community continues to grow.</li>
<li>30/31 May 2012 &#8211; <a href="http://www.enterprisesearcheurope.com/2012">Enterprise Search Europe</a> in London, after a successful first event last year. Great for those planning or working on enterprise search projects.</li>
</ul>
<p>More to come as we hear about them &#8211; we&#8217;ll also be running another <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/">Cambridge Search Meetup</a> soon. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/01/25/search-events-for-2012-the-first-crop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cambridge Search Meetup review &#8211; Two different kinds of university search</title>
		<link>http://www.flax.co.uk/blog/2011/12/08/cambridge-search-meetup-review-two-different-kinds-of-university-search/</link>
		<comments>http://www.flax.co.uk/blog/2011/12/08/cambridge-search-meetup-review-two-different-kinds-of-university-search/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 10:38:18 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[adaptive search]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[university]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[website]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=686</guid>
		<description><![CDATA[<p>James Alexander of the Open University talked first on the <a href="http://www3.open.ac.uk/media/fullstory.aspx?id=17403">Access to Video Assets</a> project, a prototype system that looked at preservation, digitisation and access to thousands of TV programmes originally broadcast by the BBC. James&#8217; team have worked&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>James Alexander of the Open University talked first on the <a href="http://www3.open.ac.uk/media/fullstory.aspx?id=17403">Access to Video Assets</a> project, a prototype system that looked at preservation, digitisation and access to thousands of TV programmes originally broadcast by the BBC. James&#8217; team have worked out an approach based on open source software &#8211; storing programme metadata and video assets in a <a href="http://fedora-commons.org/">Fedora Commons</a> repository, indexing and searching using <a href="http://lucene.apache.org/solr/">Apache Solr</a>, authentication via <a href="http://drupal.org/">Drupal</a> &#8211; that is testament to the flexibility of these packages (some of which are being used in non-traditional ways &#8211; for example Drupal is used in a &#8216;nodeless&#8217; fashion). He showed the search interface, which allowed you to find the exact points within a long video where particular words are mentioned and play the video directly in a pop-up window. I&#8217;d seen this talk before (here&#8217;s a <a href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011_presentations#james_alexander">video and slides</a> from Lucene Eurocon) but what I hadn&#8217;t grasped is how Solr is used as a mediation layer between the user and what can be some very complex data around the video asset itself (subtitles, rights information, format information, scripts etc.). As he mentioned, search is being used as a gateway technology for effective re-use of this huge archive.</p>
<p><a href="http://cswww.essex.ac.uk/staff/udo/">Udo Kruschwitz</a> was next with a brief treatment of his ongoing work on automatically extracting domain knowledge and using this to improve search results (for example see the &#8216;Suggestions&#8217; on the <a href="http://www.essex.ac.uk/Search/SearchResults.aspx?q=accomodation&#038;ssSubmit=search">University of Essex website</a>) &#8211; he showed us some of the various methods his team have tried to analyze query logs, including <a href="http://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms">Ant Colony Optimisation</a> (modelling &#8216;trails&#8217; of queries that can be reinforced by repeat visits, or &#8216;fade&#8217; over time as they are less used). I liked the concept of developing a &#8216;community&#8217; search profile where individual search profiles are hard to obtain &#8211; and how this could be simply subdivided (so for example searchers from inside a university might have a different profile to those outside). The key idea here is that all these techniques are <strong>automatic</strong>, so the system is continually evolving to give better search suggestions and hints. Udo and his team are soon to release an open source adaptive search framework to be called &#8220;Sunny Aberdeen&#8221; which we look forward to hearing about.</p>
<p>The evening ended with networking and a pint or two in traditional fashion &#8211; thanks to both our speakers and to all who came, from as far afield as Milton Keynes, Essex and Luton. The group now has 70 members and we&#8217;re building an active and friendly local community of search enthusiasts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/12/08/cambridge-search-meetup-review-two-different-kinds-of-university-search/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Outside the search box &#8211; when you need more than just a search engine</title>
		<link>http://www.flax.co.uk/blog/2011/12/06/search-plus-when-you-need-more-than-just-a-search-engine/</link>
		<comments>http://www.flax.co.uk/blog/2011/12/06/search-plus-when-you-need-more-than-just-a-search-engine/#comments</comments>
		<pubDate>Tue, 06 Dec 2011 14:40:55 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[entity recognition]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[taxonomy]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=676</guid>
		<description><![CDATA[<p>Core search features are increasingly a commodity &#8211; you can knock up some indexing scripts in whatever scripting language you like in a short time, build a searchable inverted index with freely available open source software, and hook up your&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Core search features are increasingly a commodity &#8211; you can knock up some indexing scripts in whatever scripting language you like in a short time, build a searchable inverted index with freely available open source software, and hook up your search UI quickly via HTTP. This all used to be a lot harder than it is now (unfortunately some vendors would have you believe this is still the case, which is reflected in their hefty price tags).</p>
<p>However, we&#8217;re increasingly asked to develop features <em>outside</em> the traditional search stack, to make this standard search a lot more accurate/relevant or to apply &#8216;search&#8217; to non-traditional areas. For example, Named Entity Recognition (NER) is a powerful technique to extract entities such as proper names from text &#8211; these can then be fed back into the indexing process as metadata for each document. Part of Speech (POS) tagging tells you which words are nouns, verbs etc. Sentiment Analysis promises to give you some idea of the &#8216;tone&#8217; of a comment or news piece &#8211; positive, negative or neutral for example, which is very useful in e-commerce applications (did customers like your product?). Word Sense Disambiguation (WSD) attempts to tell you the context a word is being used in (did you mean pen for writing or pen for livestock?).</p>
<p>There are commercial offerings from companies such as <a href="http://www.nstein.com">Nstein</a> and <a href="http://www.lexalytics.com">Lexalytics</a> that offer some of these features. An increasing number of companies provide their services as APIs, where you pay per use &#8211; for example Thomson Reuters&#8217; <a href="http://www.opencalais.com/">OpenCalais</a> service, <a href="http://www.pingar.com/">Pingar</a> from New Zealand and WSD specialists <a href="http://springsense.com/">SpringSense</a>. We&#8217;ve also worked with open source tools such as <a href="http://nlp.stanford.edu/">Stanford NLP</a> which perform very well when compared to commercial offerings (and can certainly compete on cost grounds). <a href="http://radimrehurek.com/gensim/">Gensim</a> is a powerful package that allows for semantic modelling of topics. The Apache <a href="http://mahout.apache.org/">Mahout</a> machine learning library allows for these techniques to be scaled to very large data sets.</p>
<p>These techniques can be used to build systems that don&#8217;t just provide powerful and enhanced search, but also automatic categorisation and classification into taxonomies, document clustering, recommendation engines and automatic identification of similar documents. It&#8217;s great to be thinking outside the box &#8211; the search box that is!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/12/06/search-plus-when-you-need-more-than-just-a-search-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building bridges in the Cloud with open source search</title>
		<link>http://www.flax.co.uk/blog/2011/11/23/building-bridges-in-the-cloud-with-open-source-search/</link>
		<comments>http://www.flax.co.uk/blog/2011/11/23/building-bridges-in-the-cloud-with-open-source-search/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 14:46:00 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[client]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[document management]]></category>
		<category><![CDATA[intranet]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sharepoint]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=667</guid>
		<description><![CDATA[<p>We&#8217;ve just published a <a href="http://www.flax.co.uk/downloads/cspencer_case_study_nov2011.pdf">case study</a> on our work for C Spencer Ltd., a UK-based civil engineering company who take a pro-active approach to document management &#8211; instead of taking the default SharePoint route or buying another product off&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve just published a <a href="http://www.flax.co.uk/downloads/cspencer_case_study_nov2011.pdf">case study</a> on our work for C Spencer Ltd., a UK-based civil engineering company who take a pro-active approach to document management &#8211; instead of taking the default SharePoint route or buying another product off the shelf, they decided to create their own in-house system based on open source components, hosted in the Amazon Web Services (AWS) cloud. We&#8217;ve helped them integrate <a href="http://lucene.apache.org/solr/">Apache Solr</a> to provide full text search across the millions of items held in the document management system, with a sub-second response. Their staff can now find letters, contracts, emails and designs quickly via a web interface.</p>
<p>C Spencer are known for their innovative and modern approach &#8211; they&#8217;re even <a href="http://www.thisishullandeastriding.co.uk/150m-green-waste-power-station-plan-Hull/story-12765690-detail/story.html">building their own green power station</a> on a brownfield site in Hull. It&#8217;s thus not surprising that they chose cutting-edge open source technology for search: tracking and managing documents correctly is extremely important to their business.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/11/23/building-bridges-in-the-cloud-with-open-source-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Another powerful API based on Solr launches, searching more patents than Google</title>
		<link>http://www.flax.co.uk/blog/2011/10/07/another-powerful-api-based-on-solr-launches-searching-more-patents-than-google/</link>
		<comments>http://www.flax.co.uk/blog/2011/10/07/another-powerful-api-based-on-solr-launches-searching-more-patents-than-google/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 12:57:20 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[patents]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=633</guid>
		<description><![CDATA[<p>Our customer Cambridge Intellectual Property <a href="http://www.cambridgenetwork.co.uk/news/article/default.aspx?objid=85392">announced yesterday</a> their new API for a collection of 55 million patents &#8211; 48 million more than Google Patents. It&#8217;s great to see a Cambridge company innovating in this space, especially as the service&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Our customer Cambridge Intellectual Property <a href="http://www.cambridgenetwork.co.uk/news/article/default.aspx?objid=85392">announced yesterday</a> their new API for a collection of 55 million patents &#8211; 48 million more than Google Patents. It&#8217;s great to see a Cambridge company innovating in this space, especially as the service is powered by <a href="http://lucene.apache.org/solr/">Apache Solr</a> (we&#8217;ve given them some small assistance with configuring and tuning this software over the last few months).</p>
<p>The API, available on the <a href="http://www.boliven.com/bws/patents_api">Boliven website</a>, offers a REST based service and returns patent data in JSON or XML &#8211; so users can easily integrate patent data with their own applications. It can also return PDFs or summaries of the selected patents. In addition, the API will allow users to search and query Boliven&#8217;s database of 45+ million science literature documents including journal publications and medical device trials. That&#8217;s around 100 million items in total.</p>
<p>Like the Guardian&#8217;s Open Platform which I wrote about <a href="http://www.flax.co.uk/blog/2010/10/19/when-search-isnt-just-search-at-the-guardian/">previously</a>, this is a great example of open source search technology as a platform for new delivery methods &#8211; showing how effective (and economical) it can be at this large scale.</p>
<p>It didn&#8217;t take me long to find <a href="http://www.boliven.com/patent/AU3209101?q=charles+hull+john+snyder">my own</a> small contribution to the patent landscape.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/10/07/another-powerful-api-based-on-solr-launches-searching-more-patents-than-google/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is Enterprise Search dead? No, but it&#8217;s changing&#8230;</title>
		<link>http://www.flax.co.uk/blog/2011/09/15/is-enterprise-search-dead-no-but-its-changing/</link>
		<comments>http://www.flax.co.uk/blog/2011/09/15/is-enterprise-search-dead-no-but-its-changing/#comments</comments>
		<pubDate>Thu, 15 Sep 2011 11:05:48 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analyst]]></category>
		<category><![CDATA[attivio]]></category>
		<category><![CDATA[autonomy]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[FAST]]></category>
		<category><![CDATA[market]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=630</guid>
		<description><![CDATA[<p>I spent yesterday morning at Ovum&#8217;s <a href="http://enterprisesearch.ovumevents.com/">briefing on Enterprise Search</a>, and they kindly invited me to sit on a discussion panel. One of the more controversial topics raised by analyst Mike Davis was &#8216;Is Enterprise Search dead?&#8217; which provoked&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I spent yesterday morning at Ovum&#8217;s <a href="http://enterprisesearch.ovumevents.com/">briefing on Enterprise Search</a>, and they kindly invited me to sit on a discussion panel. One of the more controversial topics raised by analyst Mike Davis was &#8216;Is Enterprise Search dead?&#8217; which provoked some lively discussion. We also heard from Tyler Tate of <a href="http://www.twigkit.com">Twigkit</a> on Search UX, <a href="http://www.exalead.com">Exalead</a> on Search Based Applications and <a href="http://searchtechnologies.com">Search Technologies</a> on data conditioning and why metadata is so important.</p>
<p>One can&#8217;t deny that the search market is going through some huge changes at the moment. Larger vendors are being <a href="http://www.flax.co.uk/blog/2011/08/19/mixed-reactions-as-hp-buys-autonomy/">acquired</a> which can lead to some major (and not always welcome) <a href="http://www.flax.co.uk/blog/2010/02/09/fast-drops-linux-unix-support-no-surprise/">changes</a> in the product, pricing and service. Smaller vendors are finding it increasingly hard to compete with the plethora of powerful open source solutions (we&#8217;ve heard rumours of prices of closed source solutions being dropped radically to attempt to secure new business). There are also some interesting moves towards more comprehensive Business Intelligence and Unified Access solutions, such as <a href="http://www.attivio.com">Attivio</a>. </p>
<p>I don&#8217;t think enterprise search is dying as a market or an offering, simply changing &#8211; and hopefully for the better, into an era of more realistic pricing, solutions that actually work (rather than promising &#8216;magic&#8217;) and more openness in terms of the technology and capability. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/09/15/is-enterprise-search-dead-no-but-its-changing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mixed reactions as HP buys Autonomy</title>
		<link>http://www.flax.co.uk/blog/2011/08/19/mixed-reactions-as-hp-buys-autonomy/</link>
		<comments>http://www.flax.co.uk/blog/2011/08/19/mixed-reactions-as-hp-buys-autonomy/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 10:43:09 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[analyst]]></category>
		<category><![CDATA[autonomy]]></category>
		<category><![CDATA[FAST]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=621</guid>
		<description><![CDATA[<p>The blogotweetosphere has been positively buzzing since last night&#8217;s announcement that Hewlett Packard will be buying Autonomy for £7.1bn, while divesting itself of its PC business. Many commentators have put a positive spin on this, pointing to Autonomy&#8217;s <a href="http://www.cabume.co.uk/software/cambridge-hq-and-uk-staff-to-net-gbp30m-as-hp-offers-gbp6bn-for-autonomy.html">meteoric</a>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>The blogotweetosphere has been positively buzzing since last night&#8217;s announcement that Hewlett Packard will be buying Autonomy for £7.1bn, while divesting itself of its PC business. Many commentators have put a positive spin on this, pointing to Autonomy&#8217;s <a href="http://www.cabume.co.uk/software/cambridge-hq-and-uk-staff-to-net-gbp30m-as-hp-offers-gbp6bn-for-autonomy.html">meteoric rise from a small office in Cambridge</a> to the behemoth it is today. It&#8217;s undoubtedly good news for Autonomy&#8217;s shareholders.  <a href="http://kellblog.com/2011/08/18/hp-rumored-to-be-buying-uks-autonomy-for-10b/?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%3A+Kellblog+%28Kellblog%29">Dave Kellogg</a> correctly identifies Autonomy as a &#8220;finance company dressed in (meaning-based) technology company clothing&#8221; with a &#8220;happy ending&#8221;.</p>
<p>However, the reaction isn&#8217;t all positive &#8211; the <a href="http://ht.ly/66ZMo">FT implies</a> this deal is at the &#8220;lunatic end of the valuation spectrum&#8221;. <a href="http://www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202511698633&#038;In_LargestEver_Legal_Technology_Deal_HP_Acquires_Autonomy_for_10B&#038;slreturn=1&#038;hbxlogin=1">Law Technology News</a> says &#8220;Autonomy&#8217;s e-discovery revenue stream is high-end but unsustainable&#8221; and quotes users who had problems with the system: &#8220;We had a lot of issues with the applications crashing, the documents tending not to get checked in&#8221; &#8230; &#8220;[Autonomy sales staff] were pricey, arrogant, and they couldn&#8217;t care less about us. &#8230; It cannot get any worse.&#8221;</p>
<p>HP will have to work hard to integrate Autonomy into both its corporate culture and software frameworks &#8211; a problem Microsoft has faced since its acquisition of FAST a short while ago. <a href="http://arnoldit.com/wordpress/2011/08/18/hp-and-autonomy-what-is-ahead/">Stephen Arnold</a> thinks this process will be &#8220;risky&#8221;. What it means for the rest of the search sector is harder to guess, although <a href="http://www.intranetfocus.com/archives/446">Martin White of Intranet Focus</a> says this deal indicates HP can see a &#8220;future in search applications&#8221; and, interestingly, &#8220;A number of privately-held search vendors are probably working out what their valuation would be&#8221;.</p>
<p>My view is that this is just the latest in a series of huge shifts in the enterprise search market, partly spurred on by the rise of open source options and the gradual realisation that the huge license fees charged by some vendors may be unsustainable. I don&#8217;t think Autonomy will be the last company looking for a safe haven in the years to come.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/08/19/mixed-reactions-as-hp-buys-autonomy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flax&#8217;s 10th birthday!</title>
		<link>http://www.flax.co.uk/blog/2011/07/27/flaxs-10th-birthday/</link>
		<comments>http://www.flax.co.uk/blog/2011/07/27/flaxs-10th-birthday/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 10:23:12 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[happy birthday to us]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=618</guid>
		<description><![CDATA[<p>Today marks 10 years since we formed Flax (originally as Lemur Consulting Ltd.). We had an idea that search based on open source software was going to be increasingly important (indeed, our original business model was consultancy based on&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Today marks 10 years since we formed Flax (originally as Lemur Consulting Ltd.). We had an idea that search based on open source software was going to be increasingly important (indeed, our original business model was consultancy based on <a href="http://www.xapian.org">Xapian</a>) and I think we&#8217;ve been proved right over the decade. Today, in the depths of a recession, we&#8217;re seeing significant growth in the business and some fascinating opportunities: the sector is still going through rapid change and it will be very interesting to see what the next few years bring.</p>
<p>Thanks to all of those who have worked <a href="http://www.flax.co.uk/our_clients">with us</a> and <a href="http://www.flax.co.uk/who_we_are">for us</a> over the last decade &#8211; we look forward to the next ten years in this exciting field!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/07/27/flaxs-10th-birthday/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Enterprise Search Europe &amp; a SuperSized Search Meetup</title>
		<link>http://www.flax.co.uk/blog/2011/07/22/enterprise-search-europe-a-supersized-search-meetup/</link>
		<comments>http://www.flax.co.uk/blog/2011/07/22/enterprise-search-europe-a-supersized-search-meetup/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 10:57:46 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[enterprise search europe]]></category>
		<category><![CDATA[meetup]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=614</guid>
		<description><![CDATA[<p>We&#8217;ve been helping to organise a new conference to be held in London this October, <a href="http://www.enterprisesearcheurope.com/2011/">Enterprise Search Europe</a>. This two-day event promises to give a <em>&#8216;European perspective on the technology, selection, implementation and optimisation of enterprise-scale search&#8217; </em>and features&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been helping to organise a new conference to be held in London this October, <a href="http://www.enterprisesearcheurope.com/2011/">Enterprise Search Europe</a>. This two-day event promises to give a <em>&#8216;European perspective on the technology, selection, implementation and optimisation of enterprise-scale search&#8217; </em>and features speakers from <a href="http://www.3i.com/">3i plc</a>, <a href="http://www.logica.co.uk/">Logica</a>, <a href="http://www.guardian.co.uk/">The Guardian</a> and a number of search providers such as <a href="http://www.findwise.com/">Findwise</a>, <a href="http://www.funnelback.com/">Funnelback</a> and ourselves (I&#8217;ll be talking on <em>&#8216;Building a Strong Business Foundation with Open Source Search&#8217;</em> on the second day). </p>
<p>It&#8217;s going to be a busy time as I&#8217;m also chairing a panel on the first day and helping run the evening reception, which is co-hosted by the <a href="http://www.meetup.com/es-london/">London</a> and <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/">Cambridge</a> Search Meetups &#8211; this is likely to be one of the largest Search Meetups ever and is sure to be a fascinating evening, featuring speakers from the conference in an informal setting (i.e., a pub!).</p>
<p>Hope to see some of you there.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/07/22/enterprise-search-europe-a-supersized-search-meetup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to remove a stored field in Lucene</title>
		<link>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/</link>
		<comments>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 12:12:42 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=598</guid>
		<description><![CDATA[<p>While working on a customer project recently we found a very large field that was stored unnecessarily in the Lucene index, taking up a lot of space. As it would have taken a very long time to re-index (there are&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>While working on a customer project recently we found a very large field that was stored unnecessarily in the Lucene index, taking up a lot of space. As it would have taken a very long time to re-index (there are tens of millions of complex documents in this case) we looked for a way to remove the stored field in-place.</p>
<p>There&#8217;s an interesting set of <a href="http://www.slideshare.net/abial/eurocon2010">slides from last year&#8217;s Apache Lucene Eurocon</a> which discuss this kind of Lucene index post-processing, but we didn&#8217;t find any tools to do this particular task (although this doesn&#8217;t mean they don&#8217;t exist &#8211; for example <a href="http://code.google.com/p/luke/">Luke</a> may be helpful). So we wrote our own, based on some examples in the &#8216;contrib&#8217; directory of Solr 4. We override the document() methods of FilterIndexReader to remove the unwanted field from each returned Document&#8217;s field list. The indexed terms are left untouched, so the effect is the same as changing the field from stored to not stored; it&#8217;s still indexed.</p>
<p>The code is available <a href="http://code.google.com/p/flaxcode/source/browse/#svn%2Ftrunk%2Flucene_tools">here</a>. It&#8217;s written against Lucene 2.9.3 (which is contained in Solr 1.4.1).</p>
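<p>To make the pattern a little more concrete, here is a minimal, library-free sketch in Java. It deliberately does not use Lucene: StoredDoc, DocReader, ArrayDocReader and FieldStrippingReader are hypothetical stand-ins invented for this illustration, playing the roles of a stored document, an index reader and a filtering reader. It only shows the idea of wrapping a reader and dropping one stored field from each document it returns; the code linked above remains the reference for the actual Lucene calls.</p>

```java
// Hypothetical stand-in types for illustration only; none of these are Lucene classes.

/** A document's stored fields, held as parallel arrays of names and values. */
final class StoredDoc {
    final String[] names;
    final String[] values;

    StoredDoc(String[] names, String[] values) {
        this.names = names;
        this.values = values;
    }

    /** Returns the stored value of the first field called name, or null if absent. */
    String get(String name) {
        int i = 0;
        for (String n : names) {
            if (n.equals(name)) {
                return values[i];
            }
            i++;
        }
        return null;
    }

    /** Returns a copy of this document with every field called name left out. */
    StoredDoc without(String name) {
        int kept = 0;
        for (String n : names) {
            if (!n.equals(name)) {
                kept++;
            }
        }
        String[] keptNames = new String[kept];
        String[] keptValues = new String[kept];
        int src = 0;
        int dst = 0;
        for (String n : names) {
            if (!n.equals(name)) {
                keptNames[dst] = n;
                keptValues[dst] = values[src];
                dst++;
            }
            src++;
        }
        return new StoredDoc(keptNames, keptValues);
    }
}

/** Stand-in for the stored-field side of an index reader. */
interface DocReader {
    int numDocs();
    StoredDoc document(int n);
}

/** A reader over an in-memory array, playing the role of the existing index. */
final class ArrayDocReader implements DocReader {
    private final StoredDoc[] docs;

    ArrayDocReader(StoredDoc[] docs) { this.docs = docs; }

    public int numDocs() { return docs.length; }

    public StoredDoc document(int n) { return docs[n]; }
}

/** Wraps another reader and hides one stored field in every document it returns. */
final class FieldStrippingReader implements DocReader {
    private final DocReader in;
    private final String fieldToDrop;

    FieldStrippingReader(DocReader in, String fieldToDrop) {
        this.in = in;
        this.fieldToDrop = fieldToDrop;
    }

    public int numDocs() { return in.numDocs(); }

    public StoredDoc document(int n) {
        // Delegate to the wrapped reader, then drop the unwanted field from the copy.
        return in.document(n).without(fieldToDrop);
    }
}

class StripFieldDemo {
    public static void main(String[] args) {
        StoredDoc doc = new StoredDoc(
                new String[] {"id", "title", "body"},
                new String[] {"42", "Annual report", "a very large stored value"});
        DocReader original = new ArrayDocReader(new StoredDoc[] {doc});
        DocReader filtered = new FieldStrippingReader(original, "body");

        System.out.println(filtered.document(0).get("title")); // prints "Annual report"
        System.out.println(filtered.document(0).get("body"));  // prints "null"
    }
}
```

<p>In this sketch the wrapped reader and its documents are never modified; the field only disappears from what the filtering reader hands back. Anything that reads through the filtering reader simply never sees the dropped field.</p>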
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

