Posts Tagged ‘open source’

An open approach to tuning search for gov.uk

Roo Reynolds from the GDS team has written a great blog post about the ongoing process of tuning the search for gov.uk, which I highly recommend.

We regularly see situations where a search project has been set up as ‘fire and forget’ – which is never a good idea: not only does content grow, but user needs change and search requirements evolve, whatever the application. Search should be a living project: monitoring user behaviour should reveal not just which searches ‘work’ (i.e. the user gets some results which they then click on) but, more importantly, which ones don’t. For example, common misspellings or acronyms might be a useful addition to a synonym list; if average search response times are lengthening then it might be time to consider performance tuning or even scaling out; and constant use of the ‘Next 10 Results’ button might indicate a problem with relevance ranking.
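
The log signals described above can be checked with a few lines of code. The sketch below is illustrative only: it assumes a hypothetical log record (`SearchEvent`, holding the query, result count, a click flag, response time and page number), which is not gov.uk's or any particular engine's log format. It counts zero-result queries, queries whose results nobody clicked, the average response time and how often users page past the first results.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class SearchEvent:
    """One logged search. This record layout is hypothetical."""
    query: str
    num_results: int
    clicked: bool        # did the user click any result?
    response_ms: float   # time taken to answer the query
    page: int = 1        # results page viewed; > 1 means 'Next 10 Results' was used


def summarise(events: list[SearchEvent]) -> dict:
    """Summarise a batch of logged searches into simple warning signs."""
    if not events:
        raise ValueError("no search events to summarise")
    # Normalise case and whitespace so 'HMRC ' and 'hmrc' count as one query.
    queries = [" ".join(e.query.lower().split()) for e in events]
    zero_hits = Counter(q for q, e in zip(queries, events) if e.num_results == 0)
    no_click = Counter(
        q for q, e in zip(queries, events) if e.num_results > 0 and not e.clicked
    )
    paged = sum(1 for e in events if e.page > 1)
    return {
        # Candidates for the synonym list or spelling correction.
        "zero_result_queries": zero_hits.most_common(),
        # Results were shown but nothing looked worth clicking: a relevance hint.
        "results_but_no_click": no_click.most_common(),
        # Track over time: a rising average suggests tuning or scaling out.
        "mean_response_ms": mean(e.response_ms for e in events),
        # Share of searches where users paged past the first results.
        "paging_rate": paged / len(events),
    }


if __name__ == "__main__":
    log = [
        SearchEvent("council tax", 120, True, 85.0),
        SearchEvent("concil tax", 0, False, 40.0),
        SearchEvent("Concil  tax", 0, False, 42.0),
        SearchEvent("HMRC", 30, False, 90.0),
        SearchEvent("council tax", 120, True, 95.0, page=2),
    ]
    print(summarise(log))
```

Run regularly, lists like these give a starting point for synonym and spelling fixes; deciding what level of each signal warrants action remains a judgement call.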

Luckily any improvements to gov.uk made by the GDS team should appear in their GitHub repository at some point – as I mentioned before, the GDS team are (very sensibly) committed to an open source approach.

Posted in Reference, Technical

June 12th, 2013

A belated report on Enterprise Search Europe 2013

Earlier this month I attended the third Enterprise Search Europe conference, this time not to speak but to run workshops, panels, tracks and social events. On Tuesday a colleague and I gave a workshop on Getting the Best from Open Source Search, which I hope was useful to attendees. One thing I did take away is how widely the level of experience with open source, and indeed with search technology itself, can vary: some attendees had already experimented extensively with Apache Lucene/Solr and some simply wanted to learn more about the associated risks & benefits of this approach.

The first day of the conference started with Ed Dale of Ernst & Young talking about implementing enterprise search for a truly global organisation. E&Y’s search is over a surprisingly small number of documents (only 2 million or so) but they are lucky enough to have a relatively large and experienced team running their search as an ongoing operation – no ‘fire and forget’ here (an approach often taken and seldom successful). We moved on to hear from Kristian Norling on the second year of Findwise’s Enterprise Search Survey (some interesting numbers, with the full results available soon) and then a fascinating and amusing talk from Joe Lamantia on the Language of Discovery, backed up by a second talk from Tyler Tate – it seems Discovery might be a better term for what we call Search, at least from a usability perspective. The morning ended with Stephen Arnold’s provocative take on how the performance of search technology hasn’t improved measurably in many decades due to processing limitations, and how the rise of Big Data is only going to compound the problem.

The afternoon began with a panel session on the future of open source search – my personal thanks to Daniel Lee of Artirix, Eric Pugh of Open Source Connections and René Kriegler for leading a lively discussion on the seemingly inexorable rise of open source search and what may happen next. There were some interesting points raised on how significant investment in open source search may change the picture. We continued the open source theme with talks on open source solutions for the City of Antibes and Shopping24, before a drinks reception and a move to the pub across the road for the combined London and Cambridge Search Meetup. Our theme was ‘The Nightmare before Search’: some great (and unbloggable!) war stories about crazy search implementations were followed by networking late into the night.

The next day continued with a session on search implementation from speakers including Dan Foster of Legal & General, and a track on Big Data during which we heard from Eric Pugh on building a very large scale system using open source software. Sadly I had to drop out at this point for meetings and only returned for the closing plenary sessions. I particularly enjoyed Kara Pernice’s insights on how to build usable intranet search and Valentin Richter’s session on migrating to a new search technology (a topic on many minds, especially for those using FAST ESP, which goes out of mainstream support in a couple of months). Lynda Moulton did her best to sum up what we had learnt over the last few days – a very hard job when the event covered so many aspects of search & discovery.

Many thanks as ever to Information Today and chair Martin White for organising the event – although it was an intense few days, it was great to catch up with everyone and to talk search. We’re looking forward to next year – did I hear a rumour that the ‘Europe’ in the title might be more emphasised next time? We shall see!

Posted in events

May 28th, 2013

Search events for 2013

Here’s a quick roundup of search-related events coming soon:

Next week Lucene/Solr Revolution is to be held in San Diego, with a couple of days of training on April 29th & 30th and the main event on 1st and 2nd May. This is probably the biggest event dedicated to Apache Lucene/Solr and features a huge array of presentations from Etsy, Wells Fargo, LucidWorks and even Microsoft, who are increasingly supporting open source technologies.

Enterprise Search Europe is next, on 15th and 16th May, with a day of workshops on the 14th, including one from the Flax team. I’m looking forward to the various open source panels and presentations of course, and to hearing from people from Ernst & Young, Nielsen Norman Group, Oracle and the University of Manchester. We’re also running a Meetup event on the first evening, open to all, with the usual informal mix of beer, snacks and search!

Some of the Flax team are hoping to attend Berlin Buzzwords on June 3rd & 4th – this conference promises to address “search”, “store” and “scale”, which certainly sounds interesting! We know there will be lots of talks on elasticsearch and Lucene/Solr.

There’s more to come in the Autumn of course – more details when we know them. Hope to meet you at one of these great events!

Why we won’t pay to play at conferences

One unedifying result of having been asked to speak on open source search at various events and conferences over the last few years is the discovery that not all events are equal – some genuinely wish to create a programme of interesting talks of value to the audience, and some simply wish to sell as much sponsorship as possible to those who would like to present. Some of the larger analyst firms are guilty of this behaviour – their Summits and Forums are often packed with talks by big-budget solution providers (and their industry sector reports similarly reflect the fact that if you pay, you play). At Flax we don’t have much budget for sponsorship so we’re often excluded, even though the talks we give are seldom if ever pushing any particular solution – a benefit of the open source model is that even if you hear about it from us you can still go and download and use the software yourself without paying us or anyone else a penny.

Luckily there are events that don’t work like this – the excellent Search Solutions day run in late Autumn by the British Computer Society and of course Enterprise Search Europe (disclaimer: I’m on the programme committee for the latter). In my view this means we get a higher quality set of talks, presenters who know and can discuss their subject rather than just reading out the company-approved PowerPoint deck, and attendees who can see a wider range of views and options.

Building high-end search features at low cost with Apache Solr

One of the best things about the increased use of open source search technology is that features that were previously unattainable for clients with small budgets are now within reach. Our client Bride and Groom Direct, a UK-based business selling wedding gifts and stationery, asked us if we could help improve the search features on their website and in particular the auto-suggest – and they asked us to take a look at the website of US mega-retailer Sears.com for inspiration. They particularly liked the way that while you type, Sears’ website doesn’t just show you suggested words but also clickable picture previews of products you might be looking for.

Using Apache Solr, in under two days we built a similar feature for their website: since we didn’t have direct access to their development servers, we provided both Solr configuration files and a simple jQuery/JavaScript demo of the features they needed (about 170 lines of code). Their own developers then integrated these changes based on our notes. I think it’s safe to say that Bride and Groom Direct is a rather smaller business than Sears, but with open source they can have access to equally good search facilities. They’ve been kind enough to let us feature them on our Clients page and, as you can see, they’re happy with the results.
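
One way to serve both kinds of suggestion from a single Solr request is sketched below: ordinary result documents supply the picture previews, while faceting with `facet.prefix` supplies word completions. This is an assumption-laden illustration, not necessarily the approach used for Bride and Groom Direct. The field names (`name_suggest`, `name_terms`, `image_url`, `product_url`) are hypothetical, and the schema is assumed to index `name_suggest` with an edge n-gram filter so that partial words match.

```python
from urllib.parse import urlencode


def phrase_escape(text: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted Solr phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_suggest_params(typed: str, previews: int = 5, words: int = 8) -> str:
    """Build the query string for one Solr request returning product previews
    (documents) and word completions (facet values).

    Hypothetical schema assumed:
      name_suggest: product name indexed with an edge n-gram filter,
                    so the partial word 'wedd' matches 'wedding'
      name_terms:   lower-cased product-name words, used for completions
      image_url, product_url: stored fields needed to draw a preview
    """
    typed = " ".join(typed.lower().split())  # normalise case and spacing
    if not typed:
        raise ValueError("nothing typed yet")
    last_word = typed.split()[-1]
    params = [
        # Product previews: documents matching everything typed so far.
        ("q", f'name_suggest:"{phrase_escape(typed)}"'),
        ("fl", "id,name,image_url,product_url"),
        ("rows", str(previews)),
        # Word completions: indexed words starting with the last word typed,
        # counted over the products matched by q.
        ("facet", "true"),
        ("facet.field", "name_terms"),
        ("facet.prefix", last_word),
        ("facet.limit", str(words)),
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]
    return urlencode(params)


if __name__ == "__main__":
    # Would be appended to a select handler URL such as
    # http://localhost:8983/solr/products/select? (core name hypothetical).
    print(build_suggest_params("Wedding Al"))
```

Because facet counts are computed over the documents matched by `q`, the completions stay consistent with the previews shown. On the browser side, a script would request this on each keystroke, ideally after a short delay so that fast typists don't trigger a request per character.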

Posted in Technical

March 1st, 2013

Cambridge Search Meetup – a night of crawling and scraping

Last night was the busiest ever Cambridge Search Meetup, with two excellent talks and a lot of discussion and networking. First was Harry Waye of Arachnys, who provide access to data on emerging markets that no-one else has, using a variety of custom crawling technology and heavy use of tools such as Google Translate. If you want to trawl the Greek corporate registry or find financial news from Kazakhstan, a standard Google search is little help: Harry talked about how Arachnys have experimented with Google Custom Search Engine and the ‘headless browser’ PhantomJS to crawl sites.

Our second talk was from Shane Evans, who I first met when he led software development for our client Mydeco. While there he first worked on the development of an open source Python crawling framework, Scrapy: Shane showed how easy it is to get a Scrapy web spider running in a few lines of code, and how extensible and customisable Scrapy is for a huge variety of crawling and scraping situations. There’s even a fully hosted version at Scrapinghub with graphical tools for setting up web crawling and page scraping. We’re big fans of Scrapy at Flax and we’ve used it in a number of projects, so it was good to see an overview of why Scrapy exists and how it can be used.
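
Scrapy takes care of request scheduling, politeness, retries and export pipelines, so the following is neither Scrapy code nor its API. It is a dependency-free sketch, using only Python's standard `html.parser`, of the core step a spider performs on each page: pull out the data you want (here just the page title) and the links to follow next. The `scrape` function and its returned dictionary layout are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the page title and absolute http(s) links from one HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop '#fragment' parts.
                absolute = urldefrag(urljoin(self.base_url, href)).url
                # Skip mailto:, javascript: etc. and links already seen.
                if urlparse(absolute).scheme in ("http", "https") and absolute not in self.links:
                    self.links.append(absolute)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(base_url: str, html: str) -> dict:
    """Extract an item (the title) and the outgoing links from one fetched page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"url": base_url, "title": parser.title.strip(), "links": parser.links}


if __name__ == "__main__":
    page = ('<html><head><title> Registry </title></head><body>'
            '<a href="/companies?page=2">Next</a>'
            '<a href="mailto:x@example.com">Mail</a></body></html>')
    print(scrape("https://example.com/companies", page))
```

A crawler would add the returned links to a queue of pages to fetch; managing that queue well at scale is much of what a framework like Scrapy provides.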

Thanks to both our speakers, who travelled from out of town, as did several other attendees. We’re pleased to say this was our 15th Meetup and we now have 100 members; we’re already planning further events, including one on the evening of the first day of the Enterprise Search Europe conference.

Posted in Technical, events

February 22nd, 2013

Business Leaders, Open Source and free Pi

I spent last night at a networking event organised by the Business Leaders Network on the subject of Open Source Business Models – this isn’t the usual sort of event I attend, being held in a very posh law firm’s offices overlooking the Thames and with some fellow attendees from venture capital firms and investment banks. Although the panel included speakers from Canonical, Rackspace and the Raspberry Pi foundation (the gently amusing Jack Lang, a Cambridge luminary who I could have happily listened to for the full hour) the theme was generally non-technical.

Questions from the floor (and via Twitter) showed that many outside the technical sector (and probably a few within it) are still bemused at how one can build a thriving business on open source, given that, as the panel admitted, it can involve making your intellectual property available to your competitors, giving your product away for nothing and investing heavily in community building. One of the most interesting responses from the panel indicated that an open source entrant to an existing market can shrink that market by 40-50%. A venture capitalist I spoke to afterwards couldn’t understand why this could be a positive thing; however, if a market is dominated by big players selling overpriced solutions, some disruptive deflation can reshape the market considerably. This is certainly what we’ve seen in the search sector recently, and investment in the right place at the right time can still reap considerable rewards (consider Elasticsearch’s recent funding).

The panel also made the point that a key part of open source success is investment in people – both within a business and in the wider community. Another question, about what an open source business is actually selling, prompted a range of answers: a brand, peace of mind, happiness, experience and a platform. It was clear that the discussion could have continued for a lot longer as the audience were keen to hear more, and the BLN may thus run further open source themed events – the appetite for knowledge about open source business models outside the technical community is large.

Thanks to Mark Littlewood for organising such an interesting evening and particular thanks for the free Raspberry Pi – we have a cunning plan about what to do with it so watch this space!

Posted in Business, events

February 7th, 2013

New Year predictions: further search storms ahead!

2012 has been a fascinating and stormy year for those of us in the search business. We’ve seen a raft of further acquisitions of commercial closed source search companies by bigger players, some convinced that what used to be called Enterprise Search is now a solution to Big Data (like Stephen Arnold we wonder what will succeed Big Data as the next marketing term – I love his phrase “In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing”). One acquisition hasn’t gone so smoothly: Autonomy, bought by HP for a price that no-one in the search business thought was remotely sensible, has been accused of being oversold vapourware: this is a story that will continue to develop in 2013. If you want a great overview of the current market read Martin White’s latest research note.

Here in the slightly calmer waters of open source search, we’ve seen a huge rise in enquiries from companies, often blue-chip, that no longer need persuading that open source is a serious contender for even the largest search and content projects. Often these companies have considered large commercial solutions but are put off by both the price and high-pressure marketing tactics – in a world of reduced budgets you simply can’t sell magic beans for a pile of gold. We’ve also seen increased interest in related technologies such as machine learning and automatic categorisation – search really isn’t just about search any more.

At Flax we’re busier than we have ever been and we expect the trend to continue. We’re looking forward to running more Cambridge Search Meetups, visiting and helping organise conferences such as Enterprise Search Europe and Lucene Revolution, building our network of carefully chosen partners and of course working on exciting and cutting-edge development projects.

As the storms in our sector continue to rage overhead we’ll simply be getting on with what we do best, building effective search.

Posted in Business, News

January 3rd, 2013

Trading-up to open source – a safer route to effective search

It hasn’t taken long for some of Autonomy’s rivals to attempt to capitalise on the recent bad PR around HP’s acquisition – OpenText has offered a ‘software trade-in’, Recommind has offered a ‘trade-up’ and Swiss company RSD has offered a free license for their governance software to Autonomy customers. No word yet from Exalead, Oracle (Endeca), Microsoft (FAST) or any of the other big commercial search companies but I’m sure their salespeople are making the most of the situation.

Migrating a search engine from one technology to another is rarely trouble-free: data must be re-indexed, query architectures rewritten, integration with external systems redone and relevancy checked. However, with sufficient forethought it can be done successfully. We’ve just helped one client migrate from a commercial engine to Apache Solr in a matter of weeks: although at first glance Solr didn’t seem to support all of the features the commercial engine provided, it proved possible to simulate them using multiple queries, and with careful design for scalability, query performance is comparable.
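
One way to make ‘relevancy checked’ concrete during a migration is to run the same set of real user queries against both engines and compare the top results. The sketch below measures, for each query, what fraction of the old engine's top-k document IDs also appear in the new engine's top-k, and flags queries below a threshold for human review. The function names, `k=10` and the 0.7 threshold are illustrative assumptions. Overlap is not a relevance score in itself, since the new engine may legitimately return better results, so flagged queries still need a person to judge them.

```python
def overlap_at_k(old: list[str], new: list[str], k: int = 10) -> float:
    """Fraction of the old engine's top-k IDs also found in the new engine's top-k."""
    old_top = set(old[:k])
    if not old_top:
        return 1.0  # the old engine returned nothing, so nothing can be lost
    return len(old_top & set(new[:k])) / len(old_top)


def compare_engines(
    results_old: dict[str, list[str]],
    results_new: dict[str, list[str]],
    k: int = 10,
    threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Return (query, overlap) pairs below the threshold, worst first.

    Both arguments map a query string to the ranked document IDs returned
    by that engine; a query missing from results_new counts as no results.
    """
    flagged = []
    for query, old_ids in results_old.items():
        score = overlap_at_k(old_ids, results_new.get(query, []), k)
        if score < threshold:
            flagged.append((query, score))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    old = {"pension": ["d1", "d2", "d3", "d4"], "tax": ["d5", "d6"]}
    new = {"pension": ["d2", "d1", "d9", "d3"], "tax": ["d7", "d8"]}
    print(compare_engines(old, new, k=4))
```

Here ‘pension’ keeps three of its four old top results (0.75) and passes, while ‘tax’ shares none and is flagged for review.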

Choosing one closed source engine to replace another doesn’t remove the risk that future corporate mergers & acquisitions will cause exactly the same lack of confidence that is no doubt affecting Autonomy customers – or huge increases in license fees, a drop in the quality of available support or the end of the product line altogether – and we’ve heard of all of these effects over the last few years. Moving to an open source search engine gives you freedom and control of the future of the technology your business is reliant upon, with a wealth of options for migration assistance, development and support.

So here’s our offer – we’d be happy to talk, for free (by phone or face-to-face for customers within reach of our Cambridge offices), to any Autonomy customers considering migration and to help them consider the open source options (some of these even have the Bayesian, probabilistic search features Autonomy IDOL provides) – and together with our partners we can also provide a level of ongoing support comparable to any closed source vendor. We don’t have salespeople, we don’t have a product to sell you and you’ll be talking directly to experts with decades of experience implementing search – and there’s no obligation to take things any further. We’d simply like to offer an alternative (and we believe, safer) route to effective search.

Posted in News

December 5th, 2012

Following the money… all the way to open source search.

There’s an old saying that to find out what’s really going on, you have to “follow the money”. In the search industry two recent events have pointed the way: firstly, Attivio raised $34 million in new funding. Attivio produce a solution based on their own Active Intelligence Engine (yes, it’s still just a search engine) which itself is based on open source projects such as Apache Lucene. Secondly, this week the new(ish) company formed to offer support for the ElasticSearch open source search engine also raised funding to the tune of $10m.

From these two events we can conclude that the smart money has realised that the enterprise search market is heading in only one direction – towards open source software or solutions mainly based on it (another good example being our partner LucidWorks). News from this week’s ApacheCon in Germany of incredibly busy sessions around Lucene, Solr and ElasticSearch (as well as related and complementary projects such as Stanbol) shows that the technical community agrees. I don’t think this will be the last time we hear of a significant investment by both the financial and technical communities in open source search.