<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Flax Blog</title>
	<atom:link href="http://www.flax.co.uk/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.flax.co.uk/blog</link>
	<description>Open source &#38; enterprise search</description>
	<lastBuildDate>Fri, 11 May 2012 11:43:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Big Data &#8211; It&#8217;s not always big and it&#8217;s not always clever</title>
		<link>http://www.flax.co.uk/blog/2012/05/11/big-data-its-not-always-big-and-its-not-always-clever/</link>
		<comments>http://www.flax.co.uk/blog/2012/05/11/big-data-its-not-always-big-and-its-not-always-clever/#comments</comments>
		<pubDate>Fri, 11 May 2012 11:43:34 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[analyst]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=753</guid>
		<description><![CDATA[<p>There&#8217;s been a recent flurry of activity from search vendors (and those larger companies that have been buying them) around the theme of Big Data, which has become the fashionable marketing term for a sheaf of technologies including search, machine&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s been a recent flurry of activity from search vendors (and those larger companies that have been buying them) around the theme of Big Data, which has become the fashionable marketing term for a sheaf of technologies including search, machine learning, MapReduce and scalability in general. If anyone impertinently <a href="http://www.realstorygroup.com/Blog/2344-IBM-and-Vivisimo-Long-live-the-federation">asks</a> why <a href="http://www-03.ibm.com/press/us/en/pressrelease/37491.wss">company X bought company Y</a> the answer seems to be &#8216;because they have capability in Big Data and our customers will need this&#8217;.</p>
<p>Search companies like ours have been working with large datasets since the beginning &#8211; back in 1999/2000 the founders of Flax led a team to build a half-billion-page Web search engine, which as I recall ran on a cluster of 30 or so servers. Since then we&#8217;ve worked with other collections of tens or hundreds of millions of items. Even a relatively small company can have a few million files on their intranet, if you count all those emails, customer records and PowerPoint presentations. So yes, you could say we can do Big Data &#8211; we certainly know how to design and build systems that scale.</p>
<p>However, it makes me nervous when a set of technologies that <strong>could</strong> (in theory) be used together are simply lumped together for marketing purposes as the Next Big Thing. The devil is, as always, in the detail (and the integration) and it&#8217;s important to remember that just because you can fit all your data into a system doesn&#8217;t mean that system will help you make any sense of it. A recent term for unstructured data (which of course we search developers have been working with for decades) is <a href="http://www.realstorygroup.com/Blog/2354-Dark-Data-provides-Lucid-moment-as-Big-Data-turf-war-heats-up">Dark Data</a>, which implies that it is mysterious and hidden &#8211; but that doesn&#8217;t mean it has any actual value. Those considering a Big Data project should be aware that in any computer system <strong><a href="http://en.wikipedia.org/wiki/Garbage_in,_garbage_out">GIGO</a></strong> is still an issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/05/11/big-data-its-not-always-big-and-its-not-always-clever/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An open source replacement for the dtSearch closed source search engine</title>
		<link>http://www.flax.co.uk/blog/2012/04/24/dtsolr-an-open-source-replacement-for-the-dtsearch-closed-source-search-engine/</link>
		<comments>http://www.flax.co.uk/blog/2012/04/24/dtsolr-an-open-source-replacement-for-the-dtsearch-closed-source-search-engine/#comments</comments>
		<pubDate>Tue, 24 Apr 2012 10:00:48 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[migration]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=741</guid>
		<description><![CDATA[<p>We&#8217;ve been working on a client project where we needed to replace the <a href="http://www.dtsearch.co.uk/">dtSearch</a> closed source search engine, which doesn&#8217;t perform that well at scale in this case. As the client has significant investment in stored queries (it&#8217;s for&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been working on a client project where we needed to replace the <a href="http://www.dtsearch.co.uk/">dtSearch</a> closed source search engine, which doesn&#8217;t perform that well at scale in this case. As the client has significant investment in stored queries (it&#8217;s for a monitoring application) they were keen that the new engine spoke exactly the same query language as the old &#8211; so we&#8217;ve built a version of <a href="http://lucene.apache.org/">Apache Lucene</a> to replace dtSearch. There are a few other modifications we had to make as well, to return such things as positional information from deep within the Lucene code (this is particularly important in monitoring as you want to show clients <em>where</em> the keywords they were interested in appeared in an article &#8211; they may be checking their media coverage in detail, and position on the page is important).</p>
<p>First, we developed a new Lucene <a href="http://www.lucidimagination.com/search/link?url=http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/index.html">Analyzer</a> that speaks the same syntax as dtSearch, allowing us to index text input. On the search side we have a Lucene <a href="http://lucene.apache.org/core/3_6_0/queryparsersyntax.html">QueryParser</a> that shares this syntax. To make it easier to use we&#8217;ve wrapped the whole lot in a modified <a href="http://lucene.apache.org/solr/">Solr</a> server. As we needed some features of very recent Lucene code, our modifications are based on a patch to Lucene trunk (and so the source code isn&#8217;t for the faint-hearted &#8211; if you need it <a href="http://www.flax.co.uk/contact_us">let us know</a>, but we&#8217;re not currently providing it for download).</p>
<p>We&#8217;re not sure if there&#8217;s anyone else out there who needs an open source alternative to dtSearch &#8211; but in case there is we&#8217;ve provided a <a href="http://www.flax.co.uk/downloads/flaxSolr1.tgz">downloadable WAR file</a> with the latest Solr executables in our <a href="http://www.flax.co.uk/downloads/">downloads</a> area, including a brief README file.</p>
<p>More generally, what this project demonstrates is that even if you have significant investment in your existing search infrastructure it is entirely possible to move to an open source alternative, which may be faster and will almost certainly be more economically scalable. Does anyone else have a search engine they&#8217;d like to replace?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/04/24/dtsolr-an-open-source-replacement-for-the-dtsearch-closed-source-search-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Amazon CloudSearch &#8211; a game changer?</title>
		<link>http://www.flax.co.uk/blog/2012/04/12/amazon-cloudsearch-a-game-changer/</link>
		<comments>http://www.flax.co.uk/blog/2012/04/12/amazon-cloudsearch-a-game-changer/#comments</comments>
		<pubDate>Thu, 12 Apr 2012 09:43:16 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[search service]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=736</guid>
		<description><![CDATA[<p>Amazon have just launched a <a href="http://aws.typepad.com/aws/2012/04/amazon-cloudsearch-start-searching-in-one-hour.html/">cloud-based search service</a>, which promises a &#8216;fully managed search service in the cloud&#8217; &#8211; and it certainly looks impressive, with auto-scaling built in. You simply create a service, upload documents as JSON or XML&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Amazon have just launched a <a href="http://aws.typepad.com/aws/2012/04/amazon-cloudsearch-start-searching-in-one-hour.html/">cloud-based search service</a>, which promises a &#8216;fully managed search service in the cloud&#8217; &#8211; and it certainly looks impressive, with auto-scaling built in. You simply create a service, upload documents as JSON or XML and then perform searches. For cases where you need to search publicly available data this offers a great way to avoid having to install and integrate any search software &#8211; of course it won&#8217;t be so popular if you&#8217;re worried about where your data actually <em>is</em>, or <a href="http://www.backup-technology.com/9796/u-s-patriot-act-dampens-microsoft-cloud-services/">other complications such as the Patriot Act</a>.</p>
<p>As you might expect, some people are already offering <a href="http://www.searchtechnologies.com/amazon-cloudsearch-services.html">services based around CloudSearch</a> (we&#8217;d be happy to do the same &#8211; <a href="http://www.flax.co.uk/contact_us">just ask</a>!) and there&#8217;s a <a href="http://wikipedia.searchtechnologies.com/search?q=cloud+search">demo of searching Wikipedia</a> available. I&#8217;m not sure who SmackBot is but I&#8217;m slightly wary of reading any Wikipedia articles it&#8217;s had something to do with&#8230;</p>
<p>Of course searching Wikipedia is <a href="http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html">nothing new</a> and I sometimes wish for a different choice of source material for search demos.</p>
<p>One thing that seems clear is that with the rise of cloud-based search options (here&#8217;s <a href="http://www.lucidimagination.com/products/lucidworks-search-platform/cloud">another from our partners Lucid Imagination</a>, based on Apache Lucene/Solr) the cost and complication of &#8216;simple&#8217; search projects could fall dramatically, applying further pressure to those companies selling closed source search engines for frankly unrealistic prices. Amazon&#8217;s offering, with their huge experience in cloud-based services, has the potential to be a game changer for the search market.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/04/12/amazon-cloudsearch-a-game-changer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search Meetup Cambridge &#8211; Challenges of Unstructured Data</title>
		<link>http://www.flax.co.uk/blog/2012/03/15/search-meetup-cambridge-challenges-of-unstructured-data/</link>
		<comments>http://www.flax.co.uk/blog/2012/03/15/search-meetup-cambridge-challenges-of-unstructured-data/#comments</comments>
		<pubDate>Thu, 15 Mar 2012 09:52:12 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[discovery]]></category>
		<category><![CDATA[entity extraction]]></category>
		<category><![CDATA[forensics]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[sharepoint]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=733</guid>
		<description><![CDATA[<p>Another <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/events/52568422/">Cambridge Search Meetup</a> this week, with two speakers on unstructured data, plus the usual networking, beer and snacks. We started with Dean Yearsley of <a href="http://www.pingar.com">Pingar</a> talking and bravely attempting a live demo of their <a href="http://pingar.com/get-the-api/">API</a>, which&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Another <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/events/52568422/">Cambridge Search Meetup</a> this week, with two speakers on unstructured data, plus the usual networking, beer and snacks. We started with Dean Yearsley of <a href="http://www.pingar.com">Pingar</a> talking and bravely attempting a live demo of their <a href="http://pingar.com/get-the-api/">API</a>, which amongst other things has facilities for entity extraction in multiple languages including English, Chinese and Japanese. The Pingar system is written in .NET and thus unsurprisingly plays well with SharePoint: Dean demonstrated it automatically providing extra metadata for SharePoint items, especially useful if a new column has been added to a SharePoint store, as it would be tedious for operators to have to add data for this column to each item manually.</p>
<p>Jordan Hrycaj of 7Safe, recently acquired by <a href="http://www.paconsulting.com/">PA Consulting</a>, was up next to talk about what he described as &#8216;ad-hoc&#8217; search &#8211; for use in digital forensics or digital discovery applications. The application he described can be used to search the hard disks of suspect PCs or servers for information such as credit card numbers extremely quickly, working at a low level to avoid leaving any impression on the data (i.e., no file timestamps are altered) and usually working on live systems. This system is command line based, tiny in size and portable across operating systems and is an impressive way to cut down the likely candidates for a data security breach. It was fascinating to hear about a way to search that doesn&#8217;t depend on indexing, and the compromises made for performance reasons (e.g., regular expressions can be used but without wildcards).</p>
<p>Thanks to both speakers and to all who came to hear them. We already have some more talks lined up so we expect the next Meetup to be sooner rather than later!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/03/15/search-meetup-cambridge-challenges-of-unstructured-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search backwards &#8211; media monitoring with open source search</title>
		<link>http://www.flax.co.uk/blog/2012/03/08/search-backwards-media-monitoring-with-open-source-search/</link>
		<comments>http://www.flax.co.uk/blog/2012/03/08/search-backwards-media-monitoring-with-open-source-search/#comments</comments>
		<pubDate>Thu, 08 Mar 2012 12:00:45 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[durrants]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[real time]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=722</guid>
		<description><![CDATA[<p>We&#8217;re working with a number of clients on media monitoring solutions, which are a special case of search application (we&#8217;ve <a href="http://www.flax.co.uk/downloads/durrants_case_study_091210.pdf">worked on this previously</a> for <a href="http://www.flax.co.uk/our_clients">Durrants</a>). In standard search, you apply a single query to a large amount&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re working with a number of clients on media monitoring solutions, which are a special case of search application (we&#8217;ve <a href="http://www.flax.co.uk/downloads/durrants_case_study_091210.pdf">worked on this previously</a> for <a href="http://www.flax.co.uk/our_clients">Durrants</a>). In standard search, you apply a single query to a large number of documents, expecting to get a ranked list of documents that match your query as a result. However, in media monitoring you need to search each incoming document (for example, a news article or blog post) with many queries representing what the end user wants to monitor &#8211; and you need to do this quickly as you may have tens or hundreds of thousands of articles to monitor in close to real time (Durrants have over 60,000 client queries to apply to half a million articles a day). This &#8216;backwards&#8217; search isn&#8217;t really what search engines were designed to do, so performance could potentially be very poor.</p>
<p>There are several ways around this problem: for example in most cases you don&#8217;t need to monitor every article for every client, as they will have told you they&#8217;re only interested in certain sources (for example, a car manufacturer might want to keep an eye on car magazines and the reviews in the back page of the Guardian Saturday magazine, but doesn&#8217;t care about the rest of the paper or fashion magazines). However, pre-filtering queries in this way can be complex, especially when there are so many potential sources of data.</p>
<p>We&#8217;ve recently managed to develop a method for searching incoming articles using a brute-force approach based on <a href="http://lucene.apache.org/core/">Apache Lucene</a> which in early tests is performing very well &#8211; around 70,000 queries applied to a single article in around a second on a standard MacBook. On suitable server hardware this would be even faster &#8211; and of course you have all the other features of Lucene potentially available, such as phrase queries, wildcards and highlighting. We&#8217;re looking forward to being able to develop some powerful &#8211; and economically scalable &#8211; media monitoring solutions based on this core.</p>
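<p>To make the shape of this &#8216;backwards&#8217; search concrete, here is a minimal plain-Java sketch. It is not the Lucene-based method described above: the <code>StoredQuery</code> class, the source names and the rule that every term of a query must appear in the article are all assumptions made purely for illustration. It shows the two ideas from this post: pre-filtering stored queries by source, then applying every remaining query to a single incoming article by brute force.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: a toy reverse search in plain Java, not the Lucene-based system. */
final class ReverseSearchSketch {

    /** A hypothetical stored client query: all terms must appear in the article. */
    static final class StoredQuery {
        final String id;
        final Set<String> requiredTerms;
        final Set<String> sources; // an empty set means "monitor every source"

        StoredQuery(String id, Set<String> requiredTerms, Set<String> sources) {
            this.id = id;
            this.requiredTerms = requiredTerms;
            this.sources = sources;
        }
    }

    /** Small helper to build a set of strings. */
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    /** Applies every stored query to one incoming article and returns the ids that match. */
    static List<String> matchingQueryIds(String source, String text, List<StoredQuery> queries) {
        // Tokenise the article once, so each query is only a set lookup.
        Set<String> articleTerms = setOf(text.toLowerCase(Locale.ROOT).split("\\W+"));
        List<String> matches = new ArrayList<>();
        for (StoredQuery query : queries) {
            // Pre-filter: skip queries whose client is not interested in this source.
            boolean sourceWanted = query.sources.isEmpty() || query.sources.contains(source);
            if (sourceWanted && articleTerms.containsAll(query.requiredTerms)) {
                matches.add(query.id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<StoredQuery> queries = new ArrayList<>();
        queries.add(new StoredQuery("q1", setOf("hatchback", "review"), setOf("car-magazine")));
        queries.add(new StoredQuery("q2", setOf("hatchback"), setOf()));
        queries.add(new StoredQuery("q3", setOf("hatchback"), setOf("fashion-magazine")));
        // Prints [q1, q2]: q3 is dropped by the source pre-filter.
        System.out.println(matchingQueryIds("car-magazine", "A review of the new hatchback", queries));
    }
}
```

<p>A real system would need phrase queries, wildcards and highlighting as well, which is where a search library earns its keep; the sketch only shows why the loop runs over queries rather than over documents.</p>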
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/03/08/search-backwards-media-monitoring-with-open-source-search/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>NOT WITHIN queries in Lucene</title>
		<link>http://www.flax.co.uk/blog/2012/02/22/not-within-queries-in-lucene/</link>
		<comments>http://www.flax.co.uk/blog/2012/02/22/not-within-queries-in-lucene/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 09:29:00 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[dtSearch]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[parsing]]></category>
		<category><![CDATA[query]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=718</guid>
		<description><![CDATA[<p>A guest post from <a href="http://www.romseysoftware.co.uk/">Alan Woodward</a> who has joined the <a href="http://www.flax.co.uk/who_we_are">Flax team</a> recently:</p>
<p>I’ve been working on migrating a client from a legacy dtSearch platform to a new system based on <a href="http://lucene.apache.org/core/">Lucene</a>, part of which involves writing&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>A guest post from <a href="http://www.romseysoftware.co.uk/">Alan Woodward</a> who has joined the <a href="http://www.flax.co.uk/who_we_are">Flax team</a> recently:</p>
<p>I’ve been working on migrating a client from a legacy dtSearch platform to a new system based on <a href="http://lucene.apache.org/core/">Lucene</a>, part of which involves writing a query parser to translate their existing dtSearch queries into Lucene Query objects. dtSearch allows you to perform proximity searches – find documents with term A within X positions of term B – which can be reproduced using Lucene SpanQueries (a good introduction to span queries can be found on the <a href="http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/">Lucid Imagination blog</a>). SpanQueries search for Spans – a start term, an end term, and a maximum distance in positions between them (which Lucene calls the slop). So to search for &#8220;fish&#8221; within two positions of &#8220;chips&#8221;, you’d create a SpanNearQuery, passing in the terms “fish” and “chips” and a slop of 2.</p>
<p>You can also search for terms that are not within X positions of another term.  This too is possible to achieve with SpanQueries, with a bit of trickery.</p>
<p>Let’s say we have the following document:</p>
<p>    fish and chips is nicer than fish and jam</p>
<p>We want to match documents that contain the term ‘fish’, but not if it’s within two positions of the term ‘chips’ – the relevant dtSearch syntax here is &#8220;fish&#8221; NOT WITHIN/2 &#8220;chips&#8221;. A query of this type should return the document above, as the second instance of the term ‘fish’ matches our criteria. We can’t just negate a normal &#8220;fish&#8221; WITHIN/2 &#8220;chips&#8221; query, as that won’t match our document. We need to somehow distinguish between tokens within a document based on their context.</p>
<p>Enter the SpanNotQuery. A SpanNotQuery takes two SpanQueries, and returns all documents that have instances of the first Span that do not overlap with instances of the second. The Lucid Imagination post linked above gives the example of searching for “George Bush” – say you wanted documents relating to George W Bush, but not to George H W Bush. You could create a SpanNotQuery that looked for &#8220;George&#8221; within 2 positions of &#8220;Bush&#8221;, not overlapping with &#8220;H&#8221;.</p>
<p>In our specific case, we want to find instances of “fish” that do not overlap with Spans of &#8220;fish&#8221; within/2 &#8220;chips&#8221;. So to create our query, we need the following:</p>
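<p><code>// Span classes are in org.apache.lucene.search.spans; Term is org.apache.lucene.index.Term<br />
int distance = 2;<br />
boolean ordered = true; // true: "fish" must come before "chips"<br />
SpanQuery fish = new SpanTermQuery(new Term(FIELD, "fish"));<br />
SpanQuery chips = new SpanTermQuery(new Term(FIELD, "chips"));<br />
SpanQuery fishnearchips = new SpanNearQuery(new SpanQuery[] { fish, chips },<br />
                                                distance, ordered);<br />
<br />
// Keep only the "fish" spans that do not overlap a "fish near chips" span<br />
Query q = new SpanNotQuery(fish, fishnearchips);</code></p>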
<p>It’s a bit verbose, but that’s <a href="http://www.java.com">Java</a> for you.</p>
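<p>If you want to check the intended NOT WITHIN behaviour without setting up an index, the following plain-Java sketch works directly on token positions. It is an illustration only and does not use Lucene: the class and method names are made up for this example, and it assumes that &#8220;within N positions&#8221; means a position difference of at most N in either direction, which is not quite the same as the ordered SpanNearQuery above.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: NOT WITHIN semantics over token positions, without Lucene. */
final class NotWithinSketch {

    /** Returns the position of every occurrence of term in tokens. */
    static List<Integer> positionsOf(String[] tokens, String term) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                positions.add(i);
            }
        }
        return positions;
    }

    /**
     * Returns the positions of 'wanted' that have no occurrence of 'excluded'
     * within 'distance' positions on either side.
     */
    static List<Integer> notWithin(String[] tokens, String wanted, String excluded, int distance) {
        List<Integer> excludedPositions = positionsOf(tokens, excluded);
        List<Integer> survivors = new ArrayList<>();
        for (int position : positionsOf(tokens, wanted)) {
            boolean tooClose = false;
            for (int other : excludedPositions) {
                if (Math.abs(position - other) <= distance) {
                    tooClose = true;
                    break;
                }
            }
            if (!tooClose) {
                survivors.add(position);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        String[] doc = "fish and chips is nicer than fish and jam".split(" ");
        // Prints [6]: only the second "fish" is far enough from "chips".
        System.out.println(notWithin(doc, "fish", "chips", 2));
    }
}
```

<p>A document matches the NOT WITHIN query when the returned list is non-empty, which is the same idea as the SpanNotQuery keeping only the spans that survive the exclusion.</p>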
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/02/22/not-within-queries-in-lucene/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Searching for (and finding) open source in the UK Government</title>
		<link>http://www.flax.co.uk/blog/2012/02/17/searching-and-finding-open-source-in-uk-government/</link>
		<comments>http://www.flax.co.uk/blog/2012/02/17/searching-and-finding-open-source-in-uk-government/#comments</comments>
		<pubDate>Fri, 17 Feb 2012 10:30:46 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[lucid]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=710</guid>
		<description><![CDATA[<p>There have been some very encouraging noises recently about increased use of open source software by the UK Government: for example we&#8217;ve seen the creation of an <a href="http://www.cabinetoffice.gov.uk/resource-library/open-source-procurement-toolkit">Open Source Procurement Toolkit</a> by the Cabinet Office, which <a href="http://www.cabinetoffice.gov.uk/sites/default/files/resources/Open-Source-Option-v1.pdf"> lists</a>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>There have been some very encouraging noises recently about increased use of open source software by the UK Government: for example we&#8217;ve seen the creation of an <a href="http://www.cabinetoffice.gov.uk/resource-library/open-source-procurement-toolkit">Open Source Procurement Toolkit</a> by the Cabinet Office, which <a href="http://www.cabinetoffice.gov.uk/sites/default/files/resources/Open-Source-Option-v1.pdf">lists</a> <a href="http://www.xapian.org">Xapian</a> and <a href="http://lucene.apache.org/solr/">Apache Lucene/Solr</a> as alternatives to the usual closed source options. The <a href="http://www.cesg.gov.u">CESG</a>, the &#8220;UK Government&#8217;s National Technical Authority for Information Assurance&#8221;, has clarified its position on open source software, which has led to the Cabinet Office <a href="http://www.zdnet.co.uk/news/enterprise-apps/2011/11/09/cesg-open-source-software-is-secure-enough-for-us-40094387/">dispelling</a> some of the old <a href="http://www.guardian.co.uk/government-computing-network/2012/feb/10/open-source-software-standards-liam-maxwell">myths about security and open source</a>. We know that the Cabinet Office&#8217;s &#8216;skunkworks&#8217;, the <a href="http://digital.cabinetoffice.gov.uk/">Government Digital Service</a>, are using <a href="http://digital.cabinetoffice.gov.uk/2012/01/12/directscot/">Solr</a> for several of their projects. Francis Maude MP was recently in the USA with some of the GDS team and <a href="http://digital.cabinetoffice.gov.uk/2012/02/10/west-coast/">visited</a> amongst others our US partners <a href="http://www.lucidimagination.com/">Lucid Imagination</a>.</p>
<p>The <a href="http://ossg.bcs.org/">British Computer Society</a> have helped organise a series of Awareness Events for civil servants and I&#8217;m glad to be speaking at the first of these next Tuesday 21st February on open source search &#8211; hopefully this will further increase the momentum and make it even clearer that a modern Government needs to consider this modern, flexible and economically scalable approach to software.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/02/17/searching-and-finding-open-source-in-uk-government/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search events for 2012 &#8211; the first crop</title>
		<link>http://www.flax.co.uk/blog/2012/01/25/search-events-for-2012-the-first-crop/</link>
		<comments>http://www.flax.co.uk/blog/2012/01/25/search-events-for-2012-the-first-crop/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 14:56:17 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=704</guid>
		<description><![CDATA[<p>Details of search events in 2012 are already beginning to appear; here are a few to start with:</p>
<ul>
<li>1-5 April 2012 &#8211; <a href="http://ecir2012.upf.edu/">European Conference on Information Retrieval</a> (ECIR) in Barcelona, Spain. An academic conference featuring new developments in IR.</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Details of search events in 2012 are already beginning to appear; here are a few to start with:</p>
<ul>
<li>1-5 April 2012 &#8211; <a href="http://ecir2012.upf.edu/">European Conference on Information Retrieval</a> (ECIR) in Barcelona, Spain. An academic conference featuring new developments in IR. </li>
<li>7-10 May 2012 &#8211; Lucid Imagination&#8217;s <a href="http://lucenerevolution.org/">Lucene Revolution</a> in Boston, USA. The largest conference on open source search &#8211; this event has a great buzz as the Lucene/Solr community continues to grow.</li>
<li>30/31 May 2012 &#8211; <a href="http://www.enterprisesearcheurope.com/2012">Enterprise Search Europe</a> in London, after a successful first event last year. Great for those planning or working on enterprise search projects.</li>
</ul>
<p>More to come as we hear about them &#8211; we&#8217;ll also be running another <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/">Cambridge Search Meetup</a> soon. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2012/01/25/search-events-for-2012-the-first-crop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Twelve Days of (Search) Christmas</title>
		<link>http://www.flax.co.uk/blog/2011/12/22/the-twelve-days-of-search-christmas/</link>
		<comments>http://www.flax.co.uk/blog/2011/12/22/the-twelve-days-of-search-christmas/#comments</comments>
		<pubDate>Thu, 22 Dec 2011 12:08:48 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[badpoetry]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=690</guid>
		<description><![CDATA[<p>On the twelfth day of (Search) Christmas my inbox brought to me:</p>
<p>Twelve users searching,<br />
Eleven pages found,<br />
Ten facets shown,<br />
Nine <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/">Search Meetups</a>,<br />
Eight entity extractors,<br />
Seven <a href="http://www.flax.co.uk/flax_for_solr">SOLR</a> servers,<br />
Six <a href="http://www.flax.co.uk/the_software">Xapian</a>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>On the twelfth day of (Search) Christmas my inbox brought to me:</p>
<p>Twelve users searching,<br />
Eleven pages found,<br />
Ten facets shown,<br />
Nine <a href="http://www.meetup.com/Enterprise-Search-Cambridge-UK/">Search Meetups</a>,<br />
Eight entity extractors,<br />
Seven <a href="http://www.flax.co.uk/flax_for_solr">SOLR</a> servers,<br />
Six <a href="http://www.flax.co.uk/the_software">Xapian</a> patches,<br />
Five Open Source,<br />
Four <a href="http://www.flax.co.uk/blog/newsletters/flaxnewsdec2011.pdf">cloud apps</a>,<br />
Three <a href="http://www.flax.co.uk/blog/newsletters/flaxnewsnov2010.pdf">Lucid partners</a>,<br />
Two big acquisitions,<br />
And a Mike Lynch on board at HP.</p>
<p>Have a great Christmas and New Year from everyone at Flax.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/12/22/the-twelve-days-of-search-christmas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cambridge Search Meetup review &#8211; Two different kinds of university search</title>
		<link>http://www.flax.co.uk/blog/2011/12/08/cambridge-search-meetup-review-two-different-kinds-of-university-search/</link>
		<comments>http://www.flax.co.uk/blog/2011/12/08/cambridge-search-meetup-review-two-different-kinds-of-university-search/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 10:38:18 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[adaptive search]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[university]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[website]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=686</guid>
		<description><![CDATA[<p>James Alexander of the Open University talked first on the <a href="http://www3.open.ac.uk/media/fullstory.aspx?id=17403">Access to Video Assets</a> project, a prototype system that looked at preservation, digitisation and access to thousands of TV programmes originally broadcast by the BBC. James&#8217; team have worked&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>James Alexander of the Open University talked first on the <a href="http://www3.open.ac.uk/media/fullstory.aspx?id=17403">Access to Video Assets</a> project, a prototype system that looked at preservation, digitisation and access to thousands of TV programmes originally broadcast by the BBC. James&#8217; team have worked out an approach based on open source software &#8211; storing programme metadata and video assets in a <a href="http://fedora-commons.org/">Fedora Commons</a> repository, indexing and searching using <a href="http://lucene.apache.org/solr/">Apache Solr</a>, authentication via <a href="http://drupal.org/">Drupal</a> &#8211; that is testament to the flexibility of these packages (some of which are being used in non-traditional ways &#8211; for example Drupal is used in a &#8216;nodeless&#8217; fashion). He showed the search interface, which allowed you to find the exact points within a long video where particular words are mentioned and play video directly with a pop-up window. I&#8217;d seen this talk before (here&#8217;s a <a href="http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011_presentations#james_alexander">video and slides</a> from Lucene Eurocon) but what I hadn&#8217;t grasped is how Solr is used as a mediation layer between the user and what can be some very complex data around the video asset itself (subtitles, rights information, format information, scripts etc.). As he mentioned, search is being used as a gateway technology to effective re-use of this huge archive.</p>
<p><a href="http://cswww.essex.ac.uk/staff/udo/">Udo Kruschwitz</a> was next with a brief treatment of his ongoing work on automatically extracting domain knowledge and using this to improve search results (for example see the &#8216;Suggestions&#8217; on the <a href="http://www.essex.ac.uk/Search/SearchResults.aspx?q=accomodation&#038;ssSubmit=search">University of Essex website</a>) &#8211; he showed us some of the various methods his team have tried to analyze query logs, including <a href="http://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms">Ant Colony Optimisation</a> (modelling &#8216;trails&#8217; of queries that can be reinforced by repeat visits, or &#8216;fade&#8217; over time as they are less used). I liked the concept of developing a &#8216;community&#8217; search profile where individual search profiles are hard to obtain &#8211; and how this could be simply subdivided (so for example searchers from inside a university might have a different profile to those outside). The key idea here is that all these techniques are <strong>automatic</strong>, so the system is continually evolving to give better search suggestions and hints. Udo and his team are soon to release an open source adaptive search framework to be called &#8220;Sunny Aberdeen&#8221; which we look forward to hearing about.</p>
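<p>For readers who haven&#8217;t met the idea before, the &#8216;trails&#8217; that are reinforced by use and fade over time can be pictured with a few lines of Java. This is only a rough sketch of the general idea and not Udo&#8217;s team&#8217;s actual method: the class name, the fixed reinforcement of 1.0 per observed follow-up query and the single evaporation rate are all assumptions made for illustration.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: query-to-query trails that strengthen with use and fade over time. */
final class QueryTrailSketch {

    // weights.get(from).get(to) is the strength of the trail from a query to a follow-up query.
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    private final double evaporationRate;

    QueryTrailSketch(double evaporationRate) {
        this.evaporationRate = evaporationRate;
    }

    /** Strengthens the trail each time a searcher follows query 'from' with query 'to'. */
    void reinforce(String from, String to) {
        weights.computeIfAbsent(from, key -> new HashMap<>()).merge(to, 1.0, Double::sum);
    }

    /** Fades every trail a little, for example once per batch of query logs. */
    void evaporate() {
        for (Map<String, Double> trails : weights.values()) {
            trails.replaceAll((to, weight) -> weight * (1.0 - evaporationRate));
        }
    }

    /** Returns the follow-up queries seen after 'from', strongest trail first. */
    List<String> suggestions(String from) {
        Map<String, Double> trails =
                weights.getOrDefault(from, Collections.<String, Double>emptyMap());
        List<String> result = new ArrayList<>(trails.keySet());
        result.sort((a, b) -> Double.compare(trails.get(b), trails.get(a)));
        return result;
    }

    public static void main(String[] args) {
        QueryTrailSketch trails = new QueryTrailSketch(0.5);
        trails.reinforce("accommodation", "halls");
        trails.reinforce("accommodation", "halls");
        trails.evaporate();
        trails.evaporate();
        trails.reinforce("accommodation", "rent");
        // Prints [rent, halls]: the older trail has faded to 0.5, the fresh one is 1.0.
        System.out.println(trails.suggestions("accommodation"));
    }
}
```

<p>Keeping one such structure per group of searchers (say, inside and outside the university) would be one simple way to picture the subdivided &#8216;community&#8217; profile mentioned above.</p>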
<p>The evening ended with networking and a pint or two in traditional fashion &#8211; thanks to both our speakers and to all who came, from as far afield as Milton Keynes, Essex and Luton. The group now has 70 members and we&#8217;re building an active and friendly local community of search enthusiasts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/12/08/cambridge-search-meetup-review-two-different-kinds-of-university-search/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

