Here are two relatively new networking groups – these are informal gatherings of those who work with enterprise search. I’ve been to the first one and it was very interesting.
London Open Source Social – for those working with open-source enterprise search
Enterprise Search London – more generally for those working in enterprise search
Posted in events.
Tagged with events, networking, open source.
A new year, and a chance to think about what might happen in the world of enterprise search over the next twelve months. I’ll make a stab at some predictions:
- Price cuts – possibly driven by even harsher competition between Google and Microsoft FAST, I can see prices coming down for packaged enterprise search. Autonomy will probably raise theirs.
- Real-time search matures – not just Twitter or Facebook, but real-time data from many sources becoming part of enterprise search results.
- More geolocation-aware search – in the U.K. at least, we're seeing signs that the source data is finally being freed up, which should make it a lot simpler and cheaper to build location-aware solutions.
- Fewer second-tier players in the market – it's still difficult out there, and I'm afraid not every company will survive the next year.
You’re welcome to take any of these with a generous pinch of salt!
Posted in Business.
Tagged with autonomy, microsoft, real time.
I was back at Online 2009 on Thursday to take part in the closing panel, "Cloud Computing, Open Source and Semantics: Content and Search Predictions", moderated by Stephen Arnold. We only touched on four of the ten controversial themes Stephen had prepared: we talked a lot about how 'Google pressure' will affect the market, how XML isn't necessarily a universal panacea for representing data, the growth of rich media and the challenges it presents, and finally security. There were some great questions from the floor as well; thanks to all who came, and to the organisers and Stephen for inviting us. I wish we'd had more time!
I didn't agree with Stephen's main point that Google will crush us all – I think the battles between Google and Microsoft (and Google and everyone else) are a distraction. While they're fighting it out, the rest of us can get on with developing cutting-edge search technologies. Open source search technology gives us tremendous flexibility, lets us develop solutions very fast and allows the customer to take ownership of the system being developed, and it now offers performance, scalability and commercial support comparable to the traditional closed-source world.
The real question is how this will affect the profitability of existing companies in the search space. I wonder who won’t be around at next year’s Online Information show…
Posted in Business, News.
Tagged with events, open source, performance.
I’ve created a page with links to our Flax Newsletters – let us know if you would like to be added to the mailing list (or indeed, if you’d like to be removed from it).
Posted in News.
Tagged with flax.
I visited the Online Information exhibition yesterday at Olympia. My first impression was that the exhibition area was very quiet, and a few of the exhibitors agreed with me. The current financial situation would seem the obvious cause. At previous shows exhibitors have given away all kinds of freebies, from bags to mini mice to branded juggling balls, but this year you'd be lucky if you came away with a couple of free pens and a boiled sweet.
I dropped in on the associated conference later, and caught a presentation titled "The Real Time Web: Discovery vs. Search". Antonio Gulli of Microsoft told us about their new European offices, including one in Soho, which are concentrating on bringing new features to Bing. But the results look very familiar: is Bing doomed to play catch-up? The only 'real time' feature he discussed was indexing Twitter, although apparently they'll soon be indexing Facebook as well. Surely real time encompasses more than these two platforms?
Stephen Arnold gave us his thoughts on what we should mean by 'real time', sensibly pointing out that the financial services world has been using real-time systems for many years. He also injected some notes of caution about how difficult it is to trust information spread amongst peers on social networking sites – here's a recent case (read further down the page for a great quote from Graham Cluley).
Someone from Endeca (I didn't catch the name; he was replacing the published speaker) showed us lots of slides of various applications of search, but his theme seemed to be more about how search can replace traditional databases (something I've blogged about recently) than about 'real time'.
We finished with Conrad Wolfram demonstrating Wolfram Alpha, which isn't really a search engine but rather a computation engine: it tries to give you a set of answers, rather than a list of possible resources where the answer might be found. Not a lot of 'real time' here either.
I’m back on Thursday as part of the closing keynote panel.
Posted in Uncategorized, events.
Tagged with events, real time.
We've recently been working with mySkreen, who, like Hulu in the U.S., provide a service for finding and viewing television programmes via your web browser. mySkreen is the brainchild of Frédéric Sitterlé, previously Head of New Media at the Le Figaro media group.
mySkreen works with French-language content, and is currently indexing over 1.6 million programmes (and counting). Using Flax, you can search by programme title, actors, genres or time periods. We added some innovative query parsing to translate fuzzy queries such as 'tomorrow evening' into more exact time periods, and some clever ranking so that more easily available programmes appear higher in the search results. We also added faceted search and automatic spelling correction.
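To illustrate the general idea of turning a fuzzy phrase like 'tomorrow evening' into an exact time period, here is a minimal Python sketch. It is not the mySkreen code: the function name `parse_time_phrase`, the English vocabulary and the hour boundaries chosen for 'morning', 'evening' and so on are all hypothetical assumptions. The resulting half-open range could then be used as a range filter on programme broadcast times.

```python
from datetime import datetime, timedelta

# Hypothetical vocabularies: these words and hour ranges are illustrative
# assumptions, not the rules used in the mySkreen project.
DAY_OFFSETS = {"today": 0, "tonight": 0, "tomorrow": 1}
PARTS_OF_DAY = {
    "morning": (6, 12),
    "afternoon": (12, 18),
    "evening": (18, 24),
    "tonight": (18, 24),
}


def parse_time_phrase(phrase, now):
    """Translate a fuzzy phrase such as 'tomorrow evening' into a half-open
    [start, end) datetime range, or return None if nothing is recognised."""
    offset = None
    hours = None
    for word in phrase.lower().split():
        if word in DAY_OFFSETS:
            offset = DAY_OFFSETS[word]
        if word in PARTS_OF_DAY:
            hours = PARTS_OF_DAY[word]
    if offset is None and hours is None:
        return None
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    day = midnight + timedelta(days=offset or 0)   # a bare 'evening' means today
    start_hour, end_hour = hours or (0, 24)        # a bare day means the whole day
    return day + timedelta(hours=start_hour), day + timedelta(hours=end_hour)


now = datetime(2009, 10, 5, 14, 30)
print(parse_time_phrase("tomorrow evening", now))
# (datetime.datetime(2009, 10, 6, 18, 0), datetime.datetime(2009, 10, 7, 0, 0))
```

A real query parser would also need to handle French phrasing, time zones and phrases mixed in with ordinary search terms; this sketch only shows the mapping step.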
This was a fast-moving project with a very quick turnaround: we first visited mySkreen in Paris in August and delivered customised code to them less than four weeks later; the flexibility of Flax and the open source model helped to make this possible.
Posted in News.
Tagged with flax, indexing, media.
Avi Rappoport writes about ‘real-time’ search, a popular subject at the moment. Twitter search is one example of this kind of application, where a stream of new content is arriving very quickly.
From a search engine developer’s point of view there are various things to consider: how quickly new content must become searchable, how to balance this against performance demands and how to rank the results.
Many search engine architectures are built on the assumption that indexes won't need to be updated very often, sacrificing index freshness for search speed, so constantly adding new content is expensive in terms of performance. One approach is to maintain several indexes: a small, fresh one and some older, static ones, with the fresh index periodically being merged into the older static set. Searches must of course be made across all these indexes, with care taken to combine statistics such as document frequencies across them, so that relevance ranking stays accurate.
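The multi-index approach above can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions (unique document ids, no updates or deletions, simple TF-IDF scoring); the class names and the `merge_threshold` parameter are hypothetical, not part of Flax or any other engine. The point it shows is that new documents only touch the small fresh segment, while search computes document frequencies across all segments so a term's weight doesn't depend on which segment happens to hold it.

```python
import math
from collections import Counter


class Segment:
    """A tiny in-memory inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = {}
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term, tf in Counter(text.lower().split()).items():
            self.postings.setdefault(term, {})[doc_id] = tf

    def absorb(self, other):
        """Merge another segment's postings into this one."""
        for term, docs in other.postings.items():
            self.postings.setdefault(term, {}).update(docs)
        self.doc_count += other.doc_count


class TieredIndex:
    """A small 'fresh' segment for new content plus a larger static one."""

    def __init__(self, merge_threshold=1000):
        self.fresh = Segment()
        self.static = Segment()
        self.merge_threshold = merge_threshold  # hypothetical tuning knob

    def add(self, doc_id, text):
        self.fresh.add(doc_id, text)  # cheap: only the small segment changes
        if self.fresh.doc_count >= self.merge_threshold:
            self.merge()

    def merge(self):
        # Periodically fold the fresh segment into the static one.
        self.static.absorb(self.fresh)
        self.fresh = Segment()

    def search(self, query):
        segments = [self.static, self.fresh]
        # Global statistics: count documents and document frequencies across
        # all segments, so ranking is consistent wherever a document lives.
        total_docs = sum(s.doc_count for s in segments)
        scores = Counter()
        for term in query.lower().split():
            df = sum(len(s.postings.get(term, {})) for s in segments)
            if df == 0:
                continue
            idf = math.log(1 + total_docs / df)
            for s in segments:
                for doc_id, tf in s.postings.get(term, {}).items():
                    scores[doc_id] += tf * idf
        return [doc_id for doc_id, _ in scores.most_common()]
```

In a production engine the merge would typically build a new segment in the background rather than mutating the live static index, but the split between cheap writes and globally consistent statistics is the same.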
The question of ranking is also an interesting one: in a ‘real-time’ situation, how should we present the results – does ‘more recent’ always trump ‘more relevant’? As always, a combination of both is probably the best default approach, with an option available to the user to choose one or the other.
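One simple way to combine recency and relevance, while still letting the user pick one or the other, is to blend a relevance score with an exponential time decay. The sketch below assumes relevance scores already normalised to [0, 1]; the `half_life_seconds` and `recency_weight` defaults are arbitrary illustrative choices, not recommended values.

```python
def blended_score(relevance, age_seconds, half_life_seconds=3600.0, recency_weight=0.5):
    """Mix a relevance score in [0, 1] with a recency score that halves
    every half_life_seconds."""
    recency = 0.5 ** (age_seconds / half_life_seconds)
    return (1 - recency_weight) * relevance + recency_weight * recency


def rank(results, mode="blend", half_life_seconds=3600.0, recency_weight=0.5):
    """Order (doc_id, relevance, age_seconds) tuples by the user's chosen mode."""
    if mode == "recent":
        ordered = sorted(results, key=lambda r: r[2])  # smallest age (newest) first
    elif mode == "relevant":
        ordered = sorted(results, key=lambda r: r[1], reverse=True)
    else:  # default: a combination of both
        ordered = sorted(
            results,
            key=lambda r: blended_score(r[1], r[2], half_life_seconds, recency_weight),
            reverse=True,
        )
    return [doc_id for doc_id, _, _ in ordered]


results = [
    ("old-but-relevant", 0.9, 86400),  # a day old
    ("fresh-but-weak", 0.3, 60),       # a minute old
    ("fresh-and-good", 0.7, 600),      # ten minutes old
]
print(rank(results))
```

With these settings the blended order puts 'fresh-and-good' first, while the 'recent' and 'relevant' modes reproduce the two pure orderings; changing the half-life shifts how quickly older items fall away.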
In any case there will always be some delay between content being published and being searchable – the trick is to keep this to the minimum, so it appears as ‘real-time’ as possible.
Posted in News, Technical.
Tagged with indexing, real time.
We sponsored Open Source Search Cambridge last week, which went very well, with attendees from as far away as Tokyo and New Zealand, a great variety of talks, presentations and networking, and some excellent food!
Shane Evans from mydeco gave a detailed talk on Creating a product search engine, with some interesting details on how query-independent weights are calculated. He was followed by Olly Betts on How Gmane is implemented using Xapian – 72 million messages indexed on a single server! We also had talks from those involved with the Cheshire3 XML search engine and PuppyIR, a project to develop search frameworks for children, and found out more about how Glasses Direct have implemented their search using Solr.
The afternoon consisted of a number of well-attended seminars on search topics, such as comparisons of the various open source search engines available. The day ended with informal networking in a nearby pub.
Based on the feedback we got, there’s definitely interest in a similar event next year – watch this space.
Update: sounds like Search Solutions 2009 was also a good day.
Posted in events.
Tagged with events, lucene, open source, xapian.
As September begins, there are various events coming up that may be of interest to some of our readers. We have a list of conferences we're attending and/or presenting at. Gartner are running their Portals, Content and Collaboration Summit in mid-September in London. Also in London is E Commerce Expo 2009 in late October, which may be of interest as most e-commerce solutions need some kind of search facility (although in our opinion many fall woefully short, failing to implement features such as spelling correction and synonyms).
For more Enterprise Search events, there’s a calendar provided by Information Today which is pretty exhaustive.
Posted in Business, News.
Tagged with events.
One of the things we often notice about existing systems based on relational databases (RDB) is that as they scale to millions of items, simple lookup tasks become slow and inefficient. These tasks don’t usually require complicated database operations, so in most cases it is possible to relocate the data from the RDB into a search engine like Flax.
Consider a system where a search engine has already been implemented to search textual product information, but numerical data on each product, such as price, is still stored in an RDB. Users will often need filters on search results such as 'show me items under £10', and so an RDB operation similar to 'SELECT productID FROM products WHERE price < 10' will be needed in addition to the search engine query. Modern search engines like Flax implement range searches, so numerical information such as price can be stored in the documents themselves and this filter can be carried out by the search engine as part of the full-text search for the product information.
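To show what 'range search inside the search engine' means, here is a minimal Python sketch of a toy index that keeps both text terms and a sortable numeric value per product. It is not Flax's API: the `ProductIndex` class and its methods are hypothetical, prices are assumed to be stored as integer pence (so £10 is 1000) to avoid floating-point rounding, and product ids are assumed to be unique, non-empty strings. The numeric values are kept sorted so a range filter is a binary search rather than a scan, and the result is intersected with the full-text matches in one query, with no RDB round trip.

```python
import bisect
from collections import defaultdict


class ProductIndex:
    """Toy search index holding both text terms and a sortable numeric value."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of product ids
        self.prices = []                  # sorted list of (price_pence, product_id)

    def add(self, product_id, description, price_pence):
        for term in description.lower().split():
            self.postings[term].add(product_id)
        bisect.insort(self.prices, (price_pence, product_id))

    def ids_in_price_range(self, low, high):
        """Ids with low <= price < high, found by binary search on sorted values."""
        # "" sorts before any non-empty id, so these find the first entry >= bound.
        start = bisect.bisect_left(self.prices, (low, ""))
        end = bisect.bisect_left(self.prices, (high, ""))
        return {pid for _, pid in self.prices[start:end]}

    def search(self, query, low=0, high=None):
        """Products matching every query term AND falling inside the price range."""
        terms = query.lower().split()
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        if high is None:
            high = float("inf")
        return sorted(matches & self.ids_in_price_range(low, high))


index = ProductIndex()
index.add("p1", "red cotton scarf", 899)
index.add("p2", "red wool scarf", 1599)
index.add("p3", "blue cotton scarf", 499)
print(index.search("scarf", high=1000))  # 'show me scarves under £10'
```

Like the SQL example, the upper bound here is exclusive (`price < 10`); a real engine would also rank the matches rather than simply sorting them by id.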
We’ve noticed with several clients that it is now possible to move all their data from the original RDB into the search engine. This can obviously lead to cost savings, as only one system must be hosted, maintained and backed up, and scaling out can be far simpler.
Another way to look at this is to consider a search engine as an example of a document-oriented database.
Posted in Technical.
Tagged with database, flax.