<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Flax Blog &#187; Technical</title>
	<atom:link href="http://www.flax.co.uk/blog/category/technical/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.flax.co.uk/blog</link>
	<description>Open source &#38; enterprise search</description>
	<lastBuildDate>Wed, 25 Jan 2012 14:56:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Building bridges in the Cloud with open source search</title>
		<link>http://www.flax.co.uk/blog/2011/11/23/building-bridges-in-the-cloud-with-open-source-search/</link>
		<comments>http://www.flax.co.uk/blog/2011/11/23/building-bridges-in-the-cloud-with-open-source-search/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 14:46:00 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[client]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[document management]]></category>
		<category><![CDATA[intranet]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sharepoint]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=667</guid>
		<description><![CDATA[<p>We&#8217;ve just published a <a href="http://www.flax.co.uk/downloads/cspencer_case_study_nov2011.pdf">case study</a> on our work for C Spencer Ltd., a UK-based civil engineering company who take a pro-active approach to document management &#8211; instead of taking the default Sharepoint route or buying another product off&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve just published a <a href="http://www.flax.co.uk/downloads/cspencer_case_study_nov2011.pdf">case study</a> on our work for C Spencer Ltd., a UK-based civil engineering company who take a proactive approach to document management &#8211; instead of taking the default SharePoint route or buying another product off the shelf, they decided to create their own in-house system based on open source components, hosted on the Amazon AWS Cloud. We&#8217;ve helped them integrate <a href="http://lucene.apache.org/solr/">Apache Solr</a> to provide full text search across the millions of items held in the document management system, with sub-second response times. Their staff can now find letters, contracts, emails and designs quickly via a web interface.</p>
<p>C Spencer are known for their innovative and modern approach &#8211; they&#8217;re even <a href="http://www.thisishullandeastriding.co.uk/150m-green-waste-power-station-plan-Hull/story-12765690-detail/story.html">building their own green power station</a> on a brownfield site in Hull. It&#8217;s thus not surprising that they chose cutting-edge open source technology for search: tracking and managing documents correctly is extremely important to their business.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/11/23/building-bridges-in-the-cloud-with-open-source-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to remove a stored field in Lucene</title>
		<link>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/</link>
		<comments>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 12:12:42 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=598</guid>
		<description><![CDATA[<p>While working on a customer project recently we found a very large field that was stored unnecessarily in the Lucene index, taking up a lot of space. As it would have taken a very long time to re-index (there are&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>While working on a customer project recently we found a very large field that was stored unnecessarily in the Lucene index, taking up a lot of space. As it would have taken a very long time to re-index (there are tens of millions of complex documents in this case) we looked for a way to remove the stored field in-place.</p>
<p>There&#8217;s an interesting set of <a href="http://www.slideshare.net/abial/eurocon2010">slides from last year&#8217;s Apache Lucene Eurocon</a> which discuss this kind of Lucene index post-processing, but we didn&#8217;t find any tools to do this particular task (although this doesn&#8217;t mean they don&#8217;t exist &#8211; for example <a href="http://code.google.com/p/luke/">Luke</a> may be helpful). So we wrote our own, based on some examples in the &#8216;contrib&#8217; directory of Solr 4. We override the document() methods of FilterIndexReader to remove the unwanted field from each returned Document&#8217;s field list. Terms aren&#8217;t affected, so the effect is just like changing the field from stored to not stored: it&#8217;s still indexed.</p>
<p>The code is available <a href="http://code.google.com/p/flaxcode/source/browse/#svn%2Ftrunk%2Flucene_tools">here</a>. It&#8217;s written against Lucene 2.9.3 (which is contained in Solr 1.4.1).</p>
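<p>The idea can be modelled in a few lines of Python. This is a toy sketch, not Lucene&#8217;s API: <code>ToyIndex</code> and <code>FieldDroppingReader</code> are hypothetical classes standing in for an index and for a FilterIndexReader subclass. The wrapper passes term lookups through unchanged and only strips the named field from the stored fields that <code>document()</code> returns. Note that a filtering reader only changes what is read; to actually reclaim space, the filtered view has to be written out to an index.</p>

```python
class ToyIndex:
    """Hypothetical stand-in for an index: stored fields plus an inverted index."""

    def __init__(self):
        self.stored = []     # doc id -> dict of stored field values
        self.postings = {}   # (field, term) -> list of doc ids

    def add(self, stored_fields, indexed_terms):
        doc_id = len(self.stored)
        self.stored.append(dict(stored_fields))
        for field, terms in indexed_terms.items():
            for term in terms:
                self.postings.setdefault((field, term), []).append(doc_id)
        return doc_id

    def document(self, doc_id):
        return dict(self.stored[doc_id])

    def docs_for(self, field, term):
        return list(self.postings.get((field, term), []))


class FieldDroppingReader:
    """Wraps a reader and hides one stored field, in the spirit of overriding
    document() in a FilterIndexReader subclass; all else is delegated."""

    def __init__(self, inner, field_to_drop):
        self.inner = inner
        self.field_to_drop = field_to_drop

    def document(self, doc_id):
        doc = self.inner.document(doc_id)
        doc.pop(self.field_to_drop, None)  # drop the stored value only
        return doc

    def docs_for(self, field, term):
        return self.inner.docs_for(field, term)  # terms are untouched


index = ToyIndex()
index.add({"title": "Report", "body": "a very large body"}, {"body": ["large", "body"]})
reader = FieldDroppingReader(index, "body")
print(reader.document(0))                  # {'title': 'Report'}
print(reader.docs_for("body", "large"))    # [0]
```

<p>The field is still searchable through its terms; it just no longer comes back with the document.</p>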
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Open source intranet search over millions of documents with full security</title>
		<link>http://www.flax.co.uk/blog/2011/01/26/open-source-intranet-search-over-millions-of-documents-with-full-security/</link>
		<comments>http://www.flax.co.uk/blog/2011/01/26/open-source-intranet-search-over-millions-of-documents-with-full-security/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 11:03:34 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[faceted search]]></category>
		<category><![CDATA[file format]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[intranet]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=489</guid>
		<description><![CDATA[<p>Last year my colleague Tom Mortimer <a href="http://www.flax.co.uk/blog/2010/12/03/intranet-search-event/">talked about indexing security information</a> within an open source enterprise search application, and we&#8217;re happy to announce more details of the project. Our <a href="http://www.taitworld.com">client</a> is an international radio supplier, who had considered&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Last year my colleague Tom Mortimer <a href="http://www.flax.co.uk/blog/2010/12/03/intranet-search-event/">talked about indexing security information</a> within an open source enterprise search application, and we&#8217;re happy to announce more details of the project. Our <a href="http://www.taitworld.com">client</a> is an international radio supplier, who had considered both closed source products and search appliances, but chose open source for greater flexibility and the much lower cost of scaling to indexes of millions of documents. </p>
<p>Using the <a href="http://www.flax.co.uk/what_we_do/">Flax platform</a>, we built a high-performance multi-threaded filesystem crawler to gather documents, translated them to plain text using our own open source Flax Filters and captured Unix file permissions and access control lists (ACLs). User logins are authenticated against an LDAP server and we use this to show only the results a particular user is allowed to see. We also added the ability to tag documents directly within the search results page (for example, to mark &#8216;current&#8217; versions, or even personal favourites) &#8211; the tags can then be used to filter future results. <a href="http://en.wikipedia.org/wiki/Faceted_search">Faceted search</a> is also available.</p>
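<p>As a rough illustration of turning captured Unix permissions into a result filter, the sketch below decides whether a given user may read a document using only the classic owner/group/other read bits. It ignores ACLs and root, and the function name, record layout and user/group ids are all hypothetical; it is not necessarily the method used in this project.</p>

```python
import stat


def can_read(mode, file_uid, file_gid, user_uid, user_gids):
    """Return True if a user may read a file, judged only by the
    owner/group/other read bits (ACLs and root are not considered)."""
    if user_uid == file_uid:
        # The owner class applies even if group/other bits are more generous.
        return bool(mode & stat.S_IRUSR)
    if file_gid in user_gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Hypothetical search hits carrying permissions captured at index time.
hits = [
    {"path": "/docs/public.txt", "mode": 0o644, "uid": 1000, "gid": 100},
    {"path": "/docs/team.txt", "mode": 0o640, "uid": 1000, "gid": 200},
    {"path": "/docs/private.txt", "mode": 0o600, "uid": 1000, "gid": 100},
]

# A hypothetical user (uid 1001) in groups 100 and 200, e.g. as looked up via LDAP.
visible = [h["path"] for h in hits
           if can_read(h["mode"], h["uid"], h["gid"], 1001, {100, 200})]
print(visible)  # ['/docs/public.txt', '/docs/team.txt']
```

<p>Filtering after the search like this is the simplest form; indexing the permissions as filterable terms lets the engine apply the restriction during the search itself, which keeps result counts and paging correct.</p>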
<p>You can read more about the project in a <a href="http://www.flax.co.uk/downloads/tait_case_study_jan2011.pdf">case study</a> (PDF) and Tom&#8217;s <a href="http://www.flax.co.uk/downloads/intranet_show_and_tell_2010.pdf">presentation slides</a> (PDF) explain more about the method we used to index the security information.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2011/01/26/open-source-intranet-search-over-millions-of-documents-with-full-security/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Building a new press cuttings service for the Financial Times</title>
		<link>http://www.flax.co.uk/blog/2010/10/25/building-a-new-press-cuttings-service-for-the-financial-times/</link>
		<comments>http://www.flax.co.uk/blog/2010/10/25/building-a-new-press-cuttings-service-for-the-financial-times/#comments</comments>
		<pubDate>Mon, 25 Oct 2010 09:39:10 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[financial times]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[web service]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=395</guid>
		<description><![CDATA[<p>Those of you who read my <a href="http://slidesha.re/a7h7WL">slides from Search Solutions 2010</a> will have spotted a case study on our work for the <a href="http://www.ft.com">Financial Times</a>, one of the world’s leading business news organisations. </p>
<p>When the <a href="http://www.prweek.com/uk/News/MostRead/1013735/Financial-Times-launches-press-cuttings-service-FTcom/">Financial Times</a>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Those of you who read my <a href="http://slidesha.re/a7h7WL">slides from Search Solutions 2010</a> will have spotted a case study on our work for the <a href="http://www.ft.com">Financial Times</a>, one of the world’s leading business news organisations. </p>
<p>When the <a href="http://www.prweek.com/uk/News/MostRead/1013735/Financial-Times-launches-press-cuttings-service-FTcom/">Financial Times decided to bring their digital press cuttings in-house</a> in summer 2010, they asked us to build a powerful &#8216;search server&#8217; that they could easily integrate into their existing product offerings.</p>
<p>We built an indexer for the XML source data and a <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">RESTful</a> <a href="http://en.wikipedia.org/wiki/Web_service">Web Service</a> API, offering search features including Boolean operators, phrase searches, area specifiers (search the whole article, body, headline, byline or any combination), date range restrictions, similarity search (“articles like this one”) and <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>. Spelling correction and synonyms are also available, as is detailed logging of indexing and of all searches.</p>
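<p>Faceted search, at its simplest, means counting the values of a field across the documents that match a query, so users can see how the results break down and narrow them further. Here is a minimal pure-Python sketch combining a date range restriction with a facet count. The article records and their <code>section</code> and <code>date</code> fields are hypothetical, and the substring match is only a stand-in for a real full-text query.</p>

```python
from collections import Counter
from datetime import date

# Hypothetical article records; a real service would fetch these from its index.
ARTICLES = [
    {"id": 1, "headline": "Banks rally", "section": "Markets", "date": date(2010, 6, 1)},
    {"id": 2, "headline": "Oil falls", "section": "Commodities", "date": date(2010, 6, 3)},
    {"id": 3, "headline": "Bond yields rise", "section": "Markets", "date": date(2010, 7, 9)},
    {"id": 4, "headline": "Bank profits up", "section": "Companies", "date": date(2010, 8, 2)},
]


def search(records, word, start, end):
    """Crude stand-in for a full-text query: case-insensitive substring
    match on the headline, restricted to an inclusive date range."""
    word = word.lower()
    return [r for r in records
            if word in r["headline"].lower()
            and r["date"] >= start and end >= r["date"]]


def facet_counts(results, field):
    """Count how many of the matching documents carry each value of `field`."""
    return Counter(r[field] for r in results)


hits = search(ARTICLES, "bank", date(2010, 6, 1), date(2010, 12, 31))
print([r["id"] for r in hits])               # [1, 4]
print(dict(facet_counts(hits, "section")))   # {'Markets': 1, 'Companies': 1}
```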
<p>This might sound like a complex task, but using open source technology we created this system <strong>in less than a fortnight</strong>. Initially designed as a small-scale prototype, the system scaled easily to indexing hundreds of thousands of pages. You can use the service at<br />
<a href="http://presscuttings.ft.com">http://presscuttings.ft.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/10/25/building-a-new-press-cuttings-service-for-the-financial-times/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>When search isn&#8217;t just search at The Guardian</title>
		<link>http://www.flax.co.uk/blog/2010/10/19/when-search-isnt-just-search-at-the-guardian/</link>
		<comments>http://www.flax.co.uk/blog/2010/10/19/when-search-isnt-just-search-at-the-guardian/#comments</comments>
		<pubDate>Tue, 19 Oct 2010 09:06:43 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[guardian]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SOLR]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=378</guid>
		<description><![CDATA[<p>A <a href="http://www.meetup.com/es-london/calendar/14829629/?from=list&#038;offset=0">fascinating event</a> last night as the <a href="http://www.guardian.co.uk/">Guardian</a> team told us more about how they&#8217;ve used open source search technology to build their new <a href="http://www.guardian.co.uk/open-platform">open platform</a>. The presentations were brief and to-the-point, and covered how the team&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.meetup.com/es-london/calendar/14829629/?from=list&#038;offset=0">fascinating event</a> last night, at which the <a href="http://www.guardian.co.uk/">Guardian</a> team told us more about how they&#8217;ve used open source search technology to build their new <a href="http://www.guardian.co.uk/open-platform">open platform</a>. The presentations were brief and to the point, and covered how the team have created a detailed, rich API to their news content, all built on the open source engine <a href="http://lucene.apache.org/solr/">Apache Solr</a> &#8211; opening up Guardian Media Group content to the world for mashups, repurposing and innovative new business models.</p>
<p>The Guardian have an existing Oracle database with <a href="http://en.wikipedia.org/wiki/Java_Platform,_Enterprise_Edition">J2EE</a> web applications to serve content, but discovered that certain operations, such as returning content with multiple tags or dynamically generated &#8216;related&#8217; content, were very database-intensive and difficult to scale. The use of Solr effectively <em>flattens the cost</em> of these complex queries, and also allows them to scale up capacity on demand by simply spinning up more Solr instances on the <a href="http://aws.amazon.com/ec2/">Amazon EC2 cloud</a>. Interestingly, site search for the <a href="http://www.guardian.co.uk/">Guardian website</a> doesn&#8217;t yet use Solr, although they hope to move this across soon.</p>
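<p>To make the &#8216;multiple tags&#8217; case concrete: in Solr, each tag restriction can be expressed as a separate filter query (<code>fq</code>) parameter, and Solr caches filter results independently, so repeated combinations of tags stay cheap. The sketch below only builds the request URL; the host, path and <code>tag</code> field name are assumptions for illustration, not details of the Guardian&#8217;s setup.</p>

```python
from urllib.parse import urlencode

# Assumed Solr endpoint and field name; adjust for a real installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def tagged_content_url(tags, rows=10):
    """Build a Solr query URL for documents carrying *all* the given tags.

    Each tag becomes its own fq (filter query) parameter; Solr intersects
    them and caches each filter separately for reuse by later queries.
    Tags containing double quotes would need escaping first."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", 'tag:"%s"' % t) for t in tags]
    return SOLR_SELECT + "?" + urlencode(params)


print(tagged_content_url(["politics", "economy"]))
```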
<p>What we&#8217;re seeing here is a change in how search technology is used, especially by forward-looking organisations: from being a bolt-on to an existing website or application, <strong>search is now the platform</strong> for new developments. I&#8217;ll be talking about other ways <a href="http://irsg.bcs.org/SearchSolutions/2010/sse2010.php">open source search has been used for news content</a> at the British Computer Society this coming Thursday 21st October &#8211; I believe there are still a few places available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/10/19/when-search-isnt-just-search-at-the-guardian/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Open source search engines and programming languages</title>
		<link>http://www.flax.co.uk/blog/2010/09/03/open-source-search-engines-and-programming-languages/</link>
		<comments>http://www.flax.co.uk/blog/2010/09/03/open-source-search-engines-and-programming-languages/#comments</comments>
		<pubDate>Fri, 03 Sep 2010 10:40:16 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=352</guid>
		<description><![CDATA[<p>So you&#8217;re writing a search-related application in your favourite language, and you&#8217;ve decided to choose an open source search engine to power it. So far, so good &#8211; but how are the two going to communicate?</p>
<p>Let&#8217;s look at two&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>So you&#8217;re writing a search-related application in your favourite language, and you&#8217;ve decided to choose an open source search engine to power it. So far, so good &#8211; but how are the two going to communicate?</p>
<p>Let&#8217;s look at two engines, <a href="http://www.xapian.org">Xapian</a> and <a href="http://www.lucene.net">Lucene</a>, and compare how this might be done. Lucene is written in Java and Xapian in C++, so if you&#8217;re using one of those languages, everything should be relatively simple &#8211; just download the source code and get on with it. If not, you&#8217;re going to have to work out how to interface to the engine.</p>
<p>The Lucene project has been ported to several other languages: for C/C++ there&#8217;s <a href="http://incubator.apache.org/lucy/">Lucy</a> (which includes Perl and Ruby bindings), for Python there&#8217;s <a href="http://lucene.apache.org/pylucene/">PyLucene</a> (strictly a wrapper around the Java library rather than a rewrite), and there&#8217;s even a .NET version called, not surprisingly, <a href="http://incubator.apache.org/lucene.net/">Lucene.NET</a>. Some of these &#8216;ports&#8217; of Lucene are &#8216;looser&#8217; than others (i.e. they may not share the same API or feature set), and they may not be updated as often as Lucene itself. There are also versions in Perl, Ruby, Delphi and even Lisp (scary!) &#8211; there&#8217;s a <a href="http://wiki.apache.org/lucene-java/LuceneImplementations">full list</a> available. Not all are currently active projects.</p>
<p>Xapian takes a different approach, with only one core project, but a sheaf of bindings to other languages. Currently these bindings cover C#, Java, Perl, PHP, Python, Ruby and Tcl &#8211; but interestingly these are <em>auto-generated</em> using the <a href="http://www.swig.org/">Simplified Wrapper and Interface Generator</a>, or SWIG. This means that every time Xapian&#8217;s API changes, the bindings can easily be updated to reflect this (it&#8217;s not quite that simple in practice, but SWIG copes with the vast majority of code that would otherwise have to be edited by hand). SWIG also supports other languages (according to the SWIG website, &#8220;Common Lisp (CLISP, Allegro CL, CFFI, UFFI), Lua, Modula-3, OCAML, Octave and R. Also several interpreted and compiled Scheme implementations (Guile, MzScheme, Chicken)&#8221;), so in theory bindings to these could also be built relatively easily.</p>
<p>There&#8217;s also another way to communicate with both engines: using a <em>search server</em>. <a href="http://lucene.apache.org/solr/">Solr</a> is the search server for Lucene, whereas for Xapian there is <a href="http://www.flax.co.uk/the_software">Flax Search Service</a>. In this case, any language that supports Web Services (you&#8217;d be hard pressed to find a modern language that doesn&#8217;t) can communicate with the engine, simply by passing data over HTTP.</p>
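<p>To illustrate how little the client side needs, here is a self-contained Python sketch: a toy &#8216;search server&#8217; built on the standard library&#8217;s <code>http.server</code>, with an in-memory set of documents and a hypothetical <code>/search?q=</code> endpoint returning JSON, plus a client that queries it using nothing but HTTP. Neither Solr nor Flax Search Service is involved; the point is that the client would look much the same in any language with an HTTP library.</p>

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

# A tiny in-memory "index" standing in for a real search engine.
DOCS = {1: "open source search", 2: "enterprise search appliance", 3: "python web crawler"}


class SearchHandler(BaseHTTPRequestHandler):
    """Serves a hypothetical /search?q=term endpoint that returns JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        term = parse_qs(url.query).get("q", [""])[0].lower()
        hits = [doc_id for doc_id, text in DOCS.items() if term and term in text.split()]
        body = json.dumps({"query": term, "hits": hits}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def query(base_url, term):
    """Client side: one HTTP GET and a JSON decode."""
    with urlopen(base_url + "/search?" + urlencode({"q": term})) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), SearchHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    print(query(base, "search"))  # {'query': 'search', 'hits': [1, 2]}
    server.shutdown()
    server.server_close()
```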
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/09/03/open-source-search-engines-and-programming-languages/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>flax.crawler arrives</title>
		<link>http://www.flax.co.uk/blog/2010/08/02/flax-crawler-arrives/</link>
		<comments>http://www.flax.co.uk/blog/2010/08/02/flax-crawler-arrives/#comments</comments>
		<pubDate>Mon, 02 Aug 2010 15:22:24 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[crawling]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=329</guid>
		<description><![CDATA[<p>We&#8217;ve recently uploaded a new <a href="http://code.google.com/p/flaxcode/source/browse/trunk/flax/crawler/">crawler framework</a> to the Flax code repository. This is designed for use from Python to build a <a href="http://en.wikipedia.org/wiki/Web_crawler">web crawler</a> for your project. It&#8217;s multithreaded and simple to use, here&#8217;s a minimal example:&#8230;</p>
]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve recently uploaded a new <a href="http://code.google.com/p/flaxcode/source/browse/trunk/flax/crawler/">crawler framework</a> to the Flax code repository. This is designed for use from Python to build a <a href="http://en.wikipedia.org/wiki/Web_crawler">web crawler</a> for your project. It&#8217;s multithreaded and simple to use; here&#8217;s a minimal example:</p>
<pre>    import crawler

    # StdURL and MyContentDumperImplementation must also be imported or
    # defined; the content dumper is your own implementation
    crawler.dump = MyContentDumperImplementation()
    crawler.pool.add_url(StdURL("http://test/"))
    crawler.pool.add_url(StdURL("http://anothertest/"))
    crawler.start()</pre>
<p>Note that you can provide your own implementation of various parts of the crawler &#8211; and you must at least provide a &#8216;content dumper&#8217; to store whatever the crawler finds and downloads.</p>
<p>We&#8217;ve also included a reference implementation, a working crawler that stores URLs and downloaded content in a SQLite3 database.</p>
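<p>The idea of storing URLs and downloaded content in SQLite3 can be sketched with Python&#8217;s standard <code>sqlite3</code> module. The <code>SQLiteDumper</code> class and its <code>dump(url, content)</code> method are hypothetical: the interface a content dumper actually has to implement is defined in the flax.crawler code, so check there before plugging anything like this in.</p>

```python
import sqlite3
import threading


class SQLiteDumper:
    """Hypothetical content dumper that stores each fetched page in SQLite."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets crawler threads share one connection;
        # the lock serialises access from different threads.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.lock = threading.Lock()
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content BLOB)")

    def dump(self, url, content):
        # The connection's context manager commits the insert on success.
        with self.lock, self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
                (url, content))

    def get(self, url):
        with self.lock:
            row = self.conn.execute(
                "SELECT content FROM pages WHERE url = ?", (url,)).fetchone()
        return row[0] if row else None


dumper = SQLiteDumper()
dumper.dump("http://test/", b"page body")
print(dumper.get("http://test/"))  # b'page body'
```

<p>Re-fetching a URL simply replaces the stored copy, since the URL is the table&#8217;s primary key.</p>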
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/08/02/flax-crawler-arrives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Log analysis and adaptive search</title>
		<link>http://www.flax.co.uk/blog/2010/07/30/log-analysis-and-adaptive-search/</link>
		<comments>http://www.flax.co.uk/blog/2010/07/30/log-analysis-and-adaptive-search/#comments</comments>
		<pubDate>Fri, 30 Jul 2010 14:03:51 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[intranet]]></category>
		<category><![CDATA[networking]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=326</guid>
		<description><![CDATA[<p>I attended an interesting talk by Udo Kruschwitz on Adaptive Intranet Search last night as part of the <a href="http://www.meetup.com/es-london/">Enterprise Search London</a> Meetup. Udo has built a search engine for the University of Essex and has been investigating how to&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I attended an interesting talk by Udo Kruschwitz on Adaptive Intranet Search last night as part of the <a href="http://www.meetup.com/es-london/">Enterprise Search London</a> Meetup. Udo has built a search engine for the University of Essex and has been investigating how to help users to refine their query using techniques such as suggesting related terms (there&#8217;s a similar feature in Xapian called &#8216;top terms&#8217; &#8211; <a href="http://xapian.org/search?P=language&#038;DEFAULTOP=or&#038;DB=default&#038;FMT=xapian.org&#038;xP=Zenglish&#038;xDB=default&#038;xFILTERS=--O">here&#8217;s an example</a>). As part of this he&#8217;s done a great deal of analysis of query and session logs, and is building up expertise on <em>automatically maintained domain knowledge</em> &#8211; moving away from the traditional model of manually maintained networks relating one word or phrase to another. For example, his system is learning automatically that when users type &#8220;map&#8221; into the search box, they really want to search for &#8220;campus map&#8221;.  The number of documents in his test collection is small and the volume of searches is low; it will be interesting to see how these ideas scale to larger collections and groups of users.</p>
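<p>As a very simple illustration of learning such associations from logs (not Udo&#8217;s actual method), one can count how often one query is immediately followed by another within the same session, and suggest the most frequent follow-ups. The log format here, a time-ordered list of (session id, query) pairs, and the sample data are assumptions.</p>

```python
from collections import Counter, defaultdict


def learn_refinements(log):
    """Count query -> next-query transitions within each session.

    `log` is an iterable of (session_id, query) pairs in time order."""
    last_query = {}                     # session id -> previous query
    follow_ups = defaultdict(Counter)   # query -> Counter of next queries
    for session, query in log:
        query = query.strip().lower()
        prev = last_query.get(session)
        if prev is not None and prev != query:
            follow_ups[prev][query] += 1
        last_query[session] = query
    return follow_ups


def suggest(follow_ups, query, n=1):
    """Return up to n of the most frequent refinements seen after `query`."""
    return [q for q, _ in follow_ups.get(query.lower(), Counter()).most_common(n)]


# Hypothetical session log.
log = [
    ("s1", "map"), ("s1", "campus map"),
    ("s2", "map"), ("s2", "campus map"),
    ("s3", "map"), ("s3", "bus map"),
    ("s4", "library"), ("s4", "library opening hours"),
]
refinements = learn_refinements(log)
print(suggest(refinements, "map"))  # ['campus map']
```

<p>With a small collection and few searches, counts like these are sparse and noisy, which is exactly why it will be interesting to see how the approach scales.</p>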
<p>The group was small, informal and seemed to consist mainly of those with expertise in implementing search solutions &#8211; no sales or marketing here, just a group of people discussing how best to get the job done. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/07/30/log-analysis-and-adaptive-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>flax.core 0.1 available</title>
		<link>http://www.flax.co.uk/blog/2010/06/24/flax-core-0-1-available/</link>
		<comments>http://www.flax.co.uk/blog/2010/06/24/flax-core-0-1-available/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 15:50:53 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=296</guid>
		<description><![CDATA[<p>Charlie <a href="http://www.flax.co.uk/blog/2010/06/14/packaged-solutions-and-customisability-the-python-way/">wrote previously</a> that we try and work with flexible, lightweight frameworks:  <strong>flax.core</strong> is a Python library for conveniently adding functionality to Xapian projects. The current (and first!) version is 0.1, which can be checked out from the <a&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Charlie <a href="http://www.flax.co.uk/blog/2010/06/14/packaged-solutions-and-customisability-the-python-way/">wrote previously</a> that we try to work with flexible, lightweight frameworks: <strong>flax.core</strong> is a Python library for conveniently adding functionality to Xapian projects. The current (and first!) version is 0.1, which can be checked out from the <a href="http://code.google.com/p/flaxcode/source/checkout">flaxcode repository</a>. This version supports named fields for indexing and search (no need to deal with prefixes or value numbers), facets, simplified query construction, and an optional action-oriented indexing framework.</p>
<p>Unlike <a href="http://xappy.org/">Xappy</a>, flax.core makes no attempt to abstract or hide the Xapian API, and is therefore aimed at a rather different audience. The reason is our observation that &#8220;interesting&#8221; search applications often require customisation at the Xapian API level, for example bespoke MatchDeciders, PostingSources or Sorters. Rather than having to dive in and modify the flax.core code, these application-specific modifications can happily co-exist with the unmodified flax.core (at least, this is the intention). It is also intended that flax.core remains minimal enough to easily port to other languages such as PHP or Java.</p>
<p>The primary flax.core class is <strong>Fieldmap</strong>, which associates a set of named fields with a Xapian database. As an example, the following code sets up a simple structure of one &#8216;freetext&#8217; and one &#8216;filter&#8217; field:</p>
<pre>    import xapian
    import flax.core
    from datetime import datetime    # needed by the indexing example below

    db = xapian.WritableDatabase('db', xapian.DB_CREATE)
    fm = flax.core.Fieldmap()
    fm.language = 'en'               # stem for English
    fm.setfield('mytext', False)     # freetext field
    fm.setfield('mydate', True)      # filter field

    fm.save(db)</pre>
<p>and this code indexes some text and a datetime:</p>
<pre>    doc = fm.document()
    doc.index('mytext', "I don't like spam.")
    doc.index('mydate', datetime(2010, 2, 3, 12, 0))
    fm.add_document(db, doc)
    db.flush()</pre>
<p>Fields can be of type string, int, float or datetime. Types are handled automatically and are not tied to field names (so a single field could in principle hold values of different types, though this is not a good idea).</p>
<p>Indexing can also be performed by the Action framework. In this case, a text file contains a list of:</p>
<ul>
<li>external identifiers (such as XPaths, SQL column names, etc.)</li>
<li>flax field names</li>
<li>indexing actions</li>
</ul>
<p>For example, an actions file for XML might look like this:</p>
<pre>
    .//metadata[@name='Author']/@value
        author: filter(facet)
        author2: index(default)

    .//metadata[@name='Year']/@value
        published: numeric
</pre>
<p>This means that &#8216;Author&#8217; metadata elements are indexed as two flax fields: &#8216;author&#8217; is a filter field which stores facet values, while &#8216;author2&#8217; is a freetext field which is searchable by default. &#8216;Year&#8217; metadata elements are indexed as the flax field &#8216;published&#8217;, which is numeric.</p>
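<p>To show how a file in this format could drive indexing, here is a small sketch that parses it and pulls values out of an XML document with Python&#8217;s <code>xml.etree.ElementTree</code>. The parser, its output structure and the sample document are assumptions made for illustration; flax.core&#8217;s own Action framework may interpret the file differently. ElementTree&#8217;s limited XPath support cannot select an attribute with a trailing <code>/@value</code> step, so the sketch splits that step off and reads the attribute itself.</p>

```python
import xml.etree.ElementTree as ET

ACTIONS = """
.//metadata[@name='Author']/@value
    author: filter(facet)
    author2: index(default)

.//metadata[@name='Year']/@value
    published: numeric
"""


def parse_actions(text):
    """Map each external identifier (an unindented line) to a list of
    (flax fieldname, actions) pairs from the indented lines below it."""
    rules, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = line.strip()
            rules[current] = []
        else:
            field, _, actions = line.strip().partition(":")
            rules[current].append((field.strip(), actions.strip()))
    return rules


def extract(root, path):
    """Evaluate a path such as .//metadata[@name='Year']/@value."""
    element_path, _, attr = path.rpartition("/@")
    if not element_path:  # no trailing /@attr step: use element text
        return [el.text for el in root.findall(path)]
    return [el.get(attr) for el in root.findall(element_path)]


# Hypothetical source document, built in code for the example.
doc = ET.Element("doc")
ET.SubElement(doc, "metadata", name="Author", value="Jane Smith")
ET.SubElement(doc, "metadata", name="Year", value="2009")

for path, fields in parse_actions(ACTIONS).items():
    for value in extract(doc, path):
        for field, actions in fields:
            print(field, actions, value)
```

<p>This prints one line per flax field: &#8216;author&#8217; and &#8216;author2&#8217; both receive &#8216;Jane Smith&#8217;, and &#8216;published&#8217; receives &#8216;2009&#8217;.</p>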
<p>The flaxcode repository contains two example flax.core applications here:</p>
<pre>
    applications/flax_core_examples
</pre>
<p>One is an XML indexer implemented in fewer than 100 lines, the other a minimal web search application of similar length. Currently there is no documentation other than these examples and the docstrings in flax.core. If anyone needs more, I&#8217;ll put some together.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/06/24/flax-core-0-1-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Packaged solutions and customisability, the Python way</title>
		<link>http://www.flax.co.uk/blog/2010/06/14/packaged-solutions-and-customisability-the-python-way/</link>
		<comments>http://www.flax.co.uk/blog/2010/06/14/packaged-solutions-and-customisability-the-python-way/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 11:14:43 +0000</pubDate>
		<dc:creator>charlie</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[flax]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://www.flax.co.uk/blog/?p=293</guid>
		<description><![CDATA[<p>With any large scale software installation, there is going to be some customisation and tweaking necessary, and enterprise search systems are no exception. Whatever features are packaged with a system, some of those you need will be missing and some&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>With any large-scale software installation, some customisation and tweaking is going to be necessary, and enterprise search systems are no exception. Whatever features are packaged with a system, some of those you need will be missing and some won&#8217;t be used at all. It&#8217;s rare to see a situation where the search engine can just be installed straight out of the box.</p>
<p>Our Flax system is based on the <a href="http://www.xapian.org">Xapian</a> core, which has a set of bindings to various languages including Perl, Python, PHP, Java, Ruby, C# and even Tcl, making integration with systems where a particular language is preferred relatively easy. However, for the Flax layer itself (comprising file filters, indexers, crawlers, front ends, administration tools etc. &#8211; the &#8216;toolkit&#8217; for building a complete search system) we chose Python, for much the same reasons as the <a href="http://www.python.org/about/success/verity/">Ultraseek developers did back in 2003</a>.</p>
<p>The flexibility of Python means we can add any missing features very fast, and create complete new systems in a matter of days &#8211; for example, a complete indexer can often be written in fewer than 50 lines of code, by re-using existing components and taking advantage of the many Python modules available (such as XML parsers). Our open source approach also means that solutions we create for one customer can often be repurposed and adapted for another, which again makes for very short development cycles. Python is also available on a wide variety of platforms.</p>
<p>We&#8217;re <a href="http://xkcd.com/353/">not alone</a> in our preference for Python of course!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flax.co.uk/blog/2010/06/14/packaged-solutions-and-customisability-the-python-way/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

