Archive for the ‘Business’ Category

Analysts getting a bad press – how can they do better?

It seems to be a bad summer for analyst companies in several sectors: here’s Forrester getting a kicking from Digital Clarity Group about their Wave report on Digital Experience Delivery Platforms (my first challenge was understanding what on earth those are, but I think it’s a new shiny name for web content management), Nuix putting the boot into Gartner about their eDiscovery Magic Quadrant, and Stephen Few jumping up and down in hobnail boots on both analyst firms about Business Intelligence (insert your own joke here), complete with a not particularly enlightening reply from Forrester themselves.

Miles Kehoe has already taken a look at Gartner’s Magic Quadrant report on our own Enterprise Search sector. I’ve written before about how I don’t think open source solutions are particularly well treated by the large analyst firms, as they often focus only on vendors. The world has changed somewhat, though, and five of the seventeen vendors mentioned build on open source technology, so at least some of this major part of the market is covered.

However, the problem remains that the MQ ignores a great deal of the enterprise search sector: it doesn’t cover SharePoint with its FAST-derived search facility, Oracle’s Endeca (which apparently is no longer available as a standalone product, a surprise to me), Funnelback (which is again incorrectly labelled as open source – it’s the Squiz CMS software that’s open source, not the search engine they bought) or the rising star of Elasticsearch. If you were new to the sector you might conclude that none of these options are available to you. Gartner itself says “This Magic Quadrant introduces search managers and information architects in end-user organizations to the range of enterprise search vendors they can choose from” – but this range is severely and artificially restricted.

Let’s hope that the analyst firms take note of some of this bad press – perhaps it’s time to change approach, be more open about biases and methodologies, and stop producing hugely oversimplified diagrams to characterise complex and deep business sectors.

Posted in Business

July 30th, 2014

Why GCloud search is badly broken & how to fix it

The GCloud initiative and the associated CloudStore are a great idea – an attempt to level the playing field of UK government IT supply, take advantage of flexible and agile delivery of software and services, and help SMEs like ourselves compete against the large System Integrators (SIs) that dominate this market. GCloud sales have now reached £154m, although this is still a fraction of what the UK government spends on IT. We’re on GCloud 5 ourselves, by the way, so I have a vested interest in helping potential customers find us; we’ve also helped with government systems before.

Unfortunately the CloudStore itself has a search facility that is badly broken. There are several obvious issues: many of the entries created by the larger suppliers have been keyword stuffed – here’s a particularly egregious example from Atos which seems to include most of the terms used in software in the last few years. I found this by searching for ‘enterprise search’, which produces very few relevant-looking results. The online guidance for CloudStore search suggests putting double quotes around the search terms (sadly I suspect few users will think of doing this), which improves things a little, but there are still a lot of irrelevant results: an online conferencing system comes fifth, for example.

Fortunately all is not lost and in the next iteration of GCloud we are promised major improvements to the search engine. I’m hoping this will include phrase boosting. However, if the big SIs and others are allowed to create the sort of bad-quality content I have shown above, no search engine in the world will be able to sort the wheat from the chaff. It is essential that CloudStore entries are subject to some kind of curation and that keyword stuffing is banned and/or heavily penalised, otherwise SMEs like ourselves will still find it very hard to compete with the big SIs.
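To illustrate why phrase boosting helps, here is a minimal sketch in Python. It is not how CloudStore (or any particular engine) scores results; the `score` and `rank` functions, the sample entries and the boost value of 5.0 are all hypothetical. Each query term found in an entry earns one point, and an entry where the terms appear together, in order, earns an extra boost, so a genuine ‘enterprise search’ offering outranks an entry that merely scatters both words through a long keyword list.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """Return True if the phrase tokens occur consecutively, in order."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def score(text: str, query: str, phrase_boost: float = 5.0) -> float:
    tokens = tokenize(text)
    terms = tokenize(query)
    # One point for every query term that appears anywhere in the entry.
    total = float(sum(1 for term in terms if term in tokens))
    # Hypothetical phrase boost: extra credit when the terms appear together, in order.
    if len(terms) > 1 and contains_phrase(tokens, terms):
        total += phrase_boost
    return total


def rank(entries: dict[str, str], query: str) -> list[str]:
    """Return entry names ordered from highest to lowest score."""
    return sorted(entries, key=lambda name: score(entries[name], query), reverse=True)


# Made-up entries: one keyword-stuffed, one genuinely about enterprise search.
entries = {
    "stuffed": "Cloud hosting, enterprise resource planning, conferencing, "
               "search engine optimisation, big data, mobile, analytics",
    "relevant": "Open source enterprise search consultancy and support",
}
print(rank(entries, "enterprise search"))  # ['relevant', 'stuffed']
```

A stuffed entry that happens to contain the exact phrase would still receive the boost, which is why curation matters as much as ranking.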

Update: it seems there is a new system under construction, and the search works a lot better. Let’s hope it comes out of alpha soon and can be used by purchasers!

Posted in Business, Technical

June 26th, 2014

As Hadoop gains, does Lucene benefit?

The last few weeks have seen a rush of investment in companies that offer Hadoop-powered Big Data platforms – the most recent being Intel’s investment in Cloudera, but Hortonworks has also snapped up $100m.

Gartner correctly explains that Hadoop isn’t just one project, but an ecosystem comprising an increasing number of open source projects (and some closed source distributions and add-ons). Once you’ve got your Big Data in an HDFS-shaped pile, there are many ways to make sense of it – and one of those is a search engine, so there’s been a lot of work recently on adding Lucene-powered search engines such as Apache Solr and Elasticsearch to the mix. There have also been some interesting partnerships.

I’m thus wondering whether this could signal a significant boost to the development of these search projects: there are already Lucene/Solr committers working at Hadoop-flavoured companies who have been working on distributed search and other improvements to scalability. Let’s hope some of the investment cash goes to search!

Convergence and collisions in Enterprise Search

At the end of next month I’ll be at Enterprise Search Europe (I’m on the programme committee and help with the open source track) and the opening keynote this year is from Dale Roberts, author of the book Decision Sourcing. Dale will be talking about how Social, Big Data, Analytics and Enterprise Search are on a collision course and business leaders ignore these four themes at their peril.

So I wondered how, in practical terms, one might build systems based on these four themes. There are technical and logistical challenges of course (not least convincing someone to pay for the effort), but it’s worth exploring nonetheless.

Social in a business context can mean many things: social media is inherently noisy (and as far as I can see mostly cats) but when social tools are used within a business they can be a great way to encourage collaboration. We ourselves have added social features to search applications – user tagging of search results for example, to improve relevance for future searches and to help with de-duplication. Much has been made of the idea of finding not just relevant documents, but the subject matter experts that may have written them, or just other people in your organisation who are interested in the same subject. From a technical point of view none of this is particularly hard – you just have to add these social signals to your index and surface them in some intuitive way – but getting a high enough percentage of users to contribute to shared discussions and participate in tagging can be difficult.
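As a rough illustration of adding social signals to an index, the sketch below keeps an in-memory record of user tags alongside each document’s words and lets each distinct user who tagged a document with a query term add to its score. The `SocialIndex` class, its `tag_weight` parameter and the single-word-tag assumption are all hypothetical; a real system might instead store tags as a field in Solr or Elasticsearch and boost on it at query time.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class SocialIndex:
    """Toy in-memory index in which user tags act as an extra relevance signal."""

    def __init__(self, tag_weight: float = 2.0):
        self.tag_weight = tag_weight  # hypothetical: how much one user's tag counts
        self.tokens: dict[str, set[str]] = {}  # doc_id -> words in the document
        # doc_id -> tag -> set of users who applied that tag
        self.tags = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id: str, body: str) -> None:
        self.tokens[doc_id] = set(tokenize(body))

    def tag(self, doc_id: str, user: str, tag: str) -> None:
        # Assumes single-word tags; using a set means each user counts once per tag.
        self.tags[doc_id][tag.lower()].add(user)

    def search(self, query: str) -> list[str]:
        terms = tokenize(query)
        results = []
        for doc_id, words in self.tokens.items():
            text_score = sum(1 for t in terms if t in words)
            tag_score = sum(len(self.tags[doc_id].get(t, ())) for t in terms)
            total = text_score + self.tag_weight * tag_score
            if total > 0:
                results.append((total, doc_id))
        # Highest score first; ties broken alphabetically by doc_id.
        results.sort(key=lambda r: (-r[0], r[1]))
        return [doc_id for _, doc_id in results]


idx = SocialIndex()
idx.add("a", "Quarterly budget report")
idx.add("b", "Budget spreadsheet")
idx.tag("b", "alice", "report")  # a user marks the spreadsheet as a useful 'report'
print(idx.search("budget report"))  # ['b', 'a']
```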

Big Data is an overused term, but in a business context people usually apply it to very large collections of log files or other data showing how customers are interacting with a business. A lot of search engine experts will tell you that Big Data isn’t always that ‘big’: we’ve been dealing with collections of hundreds of millions or even billions of indexed items for many years now. The trick is scaling your solution appropriately, not just in technical terms but economically, with costs growing as close to linearly as possible. If you’ve got a few million items, I’m sorry but you haven’t got Big Data, you’ve just got some data.
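One common way to keep scaling close to linear is to partition the index into shards and query them all, merging the partial results (often called scatter-gather). The sketch below shows the idea in a single Python process; the `ShardedIndex` class, the MD5-based routing and the term-count scoring are illustrative assumptions, not how Solr or Elasticsearch implement it internally, and the shards are searched one after another here rather than in parallel.

```python
import hashlib
import heapq


class ShardedIndex:
    """Toy index split across shards and searched with scatter-gather."""

    def __init__(self, num_shards: int):
        # Each shard maps doc_id -> list of lower-cased words.
        self.shards: list[dict[str, list[str]]] = [{} for _ in range(num_shards)]

    def _shard_for(self, doc_id: str) -> int:
        # A stable hash (unlike Python's salted hash()) so routing survives restarts.
        digest = hashlib.md5(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def add(self, doc_id: str, text: str) -> None:
        self.shards[self._shard_for(doc_id)][doc_id] = text.lower().split()

    @staticmethod
    def _search_shard(shard, terms, k):
        # Score = total occurrences of the query terms; keep only this shard's top k.
        scored = ((sum(words.count(t) for t in terms), doc_id)
                  for doc_id, words in shard.items())
        return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))

    def search(self, query: str, k: int = 10) -> list[str]:
        terms = query.lower().split()
        partial = []
        for shard in self.shards:  # "scatter": in a real cluster these run in parallel
            partial.extend(self._search_shard(shard, terms, k))
        # "gather": the global top k is always contained in the union of per-shard top ks
        return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


idx = ShardedIndex(num_shards=4)
idx.add("d1", "Solr solr search")
idx.add("d2", "search")
idx.add("d3", "hadoop")
print(idx.search("search solr", k=2))  # ['d1', 'd2']
```

Adding shards (and machines to host them) is what lets capacity grow roughly in step with data volume.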

I’ve always been unsure of the benefits of search Analytics, but I’m beginning to change my mind, having seen some very impressive demos recently. Search engines have always counted things; the clever bit is allowing queries that surface unusual or interesting information, and using modern visualisation techniques to show it. Knowing the most popular search term may not be as important as spotting an unexpected one.
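A crude way to spot an unexpected search term is to compare each term’s share of recent queries with its share of a historical baseline. The sketch below does this with a smoothed ratio (sometimes called lift); the `unusual_terms` function, its `min_count` and `smoothing` parameters and the sample query logs are hypothetical.

```python
from collections import Counter


def unusual_terms(baseline, recent, min_count=3, smoothing=1.0, top=5):
    """Rank recent query terms by how much more often they occur than in the baseline."""
    base = Counter(baseline)
    now = Counter(recent)
    base_total = sum(base.values())
    now_total = sum(now.values())
    if now_total == 0:
        return []
    lifts = []
    for term, count in now.items():
        if count < min_count:
            continue  # ignore very rare terms, which are mostly noise
        recent_rate = count / now_total
        # Additive smoothing (smoothing > 0) so unseen baseline terms don't divide by zero.
        baseline_rate = (base[term] + smoothing) / (base_total + smoothing)
        lifts.append((term, recent_rate / baseline_rate))
    lifts.sort(key=lambda item: item[1], reverse=True)
    return lifts[:top]


# Made-up query logs: 'holiday' and 'payroll' are popular, 'flood' is new.
baseline = ["holiday"] * 50 + ["payroll"] * 50
recent = ["holiday"] * 20 + ["payroll"] * 20 + ["flood"] * 5
print(unusual_terms(baseline, recent))
```

Here ‘holiday’ and ‘payroll’ are still the most popular terms by count, but ‘flood’ has by far the highest lift.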

So we’ve indexed our data, including tags, personnel records and internal chatrooms; put it all onto an elastically scalable platform; and built some intuitive and useful interfaces to search and analyse it. I’m pretty sure you could do all this with the open source technologies we have today (including Scrapy, Apache Lucene/Solr, Elasticsearch, Apache Hadoop, Redis, Logstash, Kibana, jQuery, Dropwizard, Python and Java). This isn’t the whole story, though: you’d need a cross-disciplinary team within your organisation able to gather user requirements and drive adoption, a suitable budget for prototyping, development, ongoing support and refinement, and a vision of the benefits the system would bring your business. Not an inconsiderable challenge!

What questions should we be able to ask the system? I’ll leave that as an exercise for the reader.

See you in April! If you’d like a 20% discount on registration use the code HULL20. We’ll also be running an evening Meetup on Tuesday 29th April open to both conference attendees and others.

G-Cloud and open file formats, a cautionary tale

We’re lucky enough to have our services available on the G-Cloud, a new initiative by the UK Government’s Cabinet Office with the aim of breaking the sometimes monopolistic practices of ‘big IT’ when supplying government clients. We’ve recently had a couple of contracts procured via the G-Cloud iii framework and one of the requirements is to report whenever a client is invoiced. This is done via a website called Management Information Systems Online (MISO).

Part of the process is to input various mysterious Product Codes, and to find out what these were I downloaded a file from the MISO website. I use the Firefox browser and OpenOffice, so I had assumed that opening this file would be a relatively simple process… perhaps unwisely.

Firstly, due to some quirk of the website and/or browser, the file arrives with no file extension. I assume it’s some kind of Microsoft Office document, so I try renaming it to .xls as an Excel spreadsheet and opening it in OpenOffice Calc. This doesn’t work: I end up with a load of XML in the spreadsheet cells. As it’s XML, I wonder if it’s one of the newer, XML-powered Office formats, so I rename it to .xlsx, but no, that doesn’t work either. Opening the file in a text editor shows it’s some kind of XML with Microsoft schemas abounding. At this point I try contacting the MISO technical support department, but they aren’t able to help.
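Rather than guessing an extension by trial and error, one can look at the first few bytes of the file. The sketch below checks a few well-known signatures: a ZIP header (the container used by .xlsx and other Office Open XML files), the OLE2 compound-document header used by legacy binary .xls files, and XML mentioning Microsoft’s SpreadsheetML namespace (the format this file turns out to be). The `sniff_office_format` function and its return labels are hypothetical, and it assumes any XML is UTF-8 or ASCII-compatible (a UTF-16 file would fall through to "unknown").

```python
SPREADSHEETML_NS = b"urn:schemas-microsoft-com:office:spreadsheet"


def sniff_office_format(data: bytes) -> str:
    """Guess a file's format from its leading bytes rather than its name."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"  # ZIP container: .xlsx/.docx (Office Open XML) or any other ZIP
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"  # OLE2 compound document: legacy binary .xls/.doc
    head = data[:4096].removeprefix(b"\xef\xbb\xbf").lstrip()  # drop a UTF-8 BOM
    if head.startswith(b"<"):
        if SPREADSHEETML_NS in head or b'progid="Excel.Sheet"' in head:
            return "spreadsheetml"  # Excel 2003 XML Spreadsheet
        return "xml"
    return "unknown"


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:  # e.g. the extension-less download
            print(sniff_office_format(f.read(4096)))
```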

A quick Google search reveals that the file is probably SpreadsheetML, a format used before 2007, when Microsoft finally went the whole hog and embraced (well, forced everyone else to embrace) its own XML-based standard for Office documents. SpreadsheetML is supposedly something OpenOffice can easily read, so I try renaming the file to .xml and importing it. OpenOffice now tells me "OpenOffice.org requires a Java runtime environment (JRE) to perform this task. The selected JRE is defective."

This is now taking far too long. After some more research I discover that what this actually means is that OpenOffice needs a version of Java 6 (now discouraged by Oracle), and I have to register for an Oracle account just to download it. Finally, OpenOffice is able to read the file and I can fill in the original form.
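For anyone in the same position without a working Java runtime, it may help to know that a SpreadsheetML file is ordinary XML and can be read with a standard library parser. The sketch below uses Python’s `xml.etree.ElementTree` to turn each worksheet into rows of cell text; the `read_spreadsheetml` function is hypothetical, and it deliberately ignores formatting, formulas and row-level `ss:Index` gaps.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"  # SpreadsheetML namespace
NS = {"ss": SS}


def read_spreadsheetml(data: bytes) -> dict[str, list[list[str]]]:
    """Map each worksheet name to its rows of cell text."""
    root = ET.fromstring(data)  # the <Workbook> element
    sheets = {}
    for worksheet in root.findall("ss:Worksheet", NS):
        name = worksheet.get(f"{{{SS}}}Name", "")
        rows = []
        for row in worksheet.findall("ss:Table/ss:Row", NS):
            values: list[str] = []
            for cell in row.findall("ss:Cell", NS):
                # ss:Index is a 1-based column number meaning earlier cells were skipped.
                index = cell.get(f"{{{SS}}}Index")
                if index is not None:
                    values.extend([""] * (int(index) - 1 - len(values)))
                data_el = cell.find("ss:Data", NS)
                values.append("" if data_el is None else "".join(data_el.itertext()))
            rows.append(values)
        sheets[name] = rows
    return sheets


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            for sheet, rows in read_spreadsheetml(f.read()).items():
                print(sheet, rows[:5])  # first few rows of each sheet
```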

If anything this process proves that central government has a long way to go towards adopting open standards and using plain, widely adopted file formats. The G-Cloud framework is a great step forward – but some of the details still need some work.

Business Leaders, Open Source and free Pi

I spent last night at a networking event organised by the Business Leaders Network on the subject of Open Source Business Models – this isn’t the usual sort of event I attend, being held in a very posh law firm’s offices overlooking the Thames and with some fellow attendees from venture capital firms and investment banks. Although the panel included speakers from Canonical, Rackspace and the Raspberry Pi foundation (the gently amusing Jack Lang, a Cambridge luminary who I could have happily listened to for the full hour) the theme was generally non-technical.

Questions from the floor (and via Twitter) showed that many outside the technical sector (and probably a few within it) are still bemused at how one can build a thriving business on open source, when the panel admitted that it can involve making your intellectual property available to your competitors, giving your product away for nothing and investing heavily in community building. One of the most interesting responses from the panel was that an open source entrant to an existing market can shrink that market by 40-50%. A venture capitalist I spoke to afterwards couldn’t understand why this could be a positive thing; however, if a market is dominated by big players selling overpriced solutions, some disruptive deflation can reshape it considerably. This is certainly what we’ve seen in the search sector recently, and investment in the right place at the right time can still reap considerable rewards (consider Elasticsearch’s recent funding).

The panel also made the point that a key part of open source success is investment in people, both within a business and in the wider community. Another question, about what an open source business is actually selling, prompted a range of answers: a brand, peace of mind, happiness, experience and a platform. It was clear that the discussion could have continued for a lot longer as the audience were keen to hear more, and the BLN may run further open source themed events: the appetite for knowledge about open source business models outside the technical community is large.

Thanks to Mark Littlewood for organising such an interesting evening and particular thanks for the free Raspberry Pi – we have a cunning plan about what to do with it so watch this space!

Posted in Business, events

February 7th, 2013

Phony wars: the battle between Solr and Elasticsearch

The most well known open source search engine, Apache Lucene/Solr, has a rival in Elasticsearch, also based on Apache Lucene. Or maybe it doesn’t. I’m not convinced that there’s an actual battle going on here, above and beyond the fact that the commercial companies formed to support each technology (Lucidworks and Elasticsearch [the company]) are obviously competitors. Let’s look at the evidence:

  • Elasticsearch contains (by some measures) 64 years of effort, Solr only 55 years…a point to Elasticsearch!
  • Elasticsearch commits are 31% down on last year, Solr commits are 85% up…a point to Solr!
  • There are more books about Solr than Elasticsearch…a point to Solr!
  • Elasticsearch, sorry elasticsearch, has a cool lower case logo and fancy website…a point to Elasticsearch!
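Figures such as “64 years of effort” are estimates derived from code size, not timesheets. One standard model for this is basic COCOMO, which in its ‘organic’ mode estimates effort as 2.4 × KLOC^1.05 person-months, where KLOC is thousands of lines of code. The sketch below implements that formula; we are not asserting it is exactly the measure behind the numbers above, and the function names are our own.

```python
def cocomo_basic_organic_person_months(kloc: float) -> float:
    """Basic COCOMO, 'organic' mode: effort = 2.4 * KLOC ** 1.05 person-months."""
    if kloc < 0:
        raise ValueError("kloc must be non-negative")
    return 2.4 * kloc ** 1.05


def effort_years(lines_of_code: int) -> float:
    """Convert a raw line count into estimated person-years of effort."""
    return cocomo_basic_organic_person_months(lines_of_code / 1000) / 12
```

Because the exponent is only slightly above 1, estimated effort grows a little faster than code size, so such numbers say more about how much code a project contains than about how good it is.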

This is of course before we get to any actual technical differences in terms of performance, scalability, ease-of-use etc. which are probably a lot more important than the list above. There are vocal critics and supporters of each project on Twitter and other media, but the great thing in our view is that there is a choice of two such excellent search technologies, both open source, so for real world applications one can try both at little cost and choose whichever is most appropriate (there are even proven migration routes between the two – we’ve helped one client with this process).

Posted in Business, Technical

January 14th, 2013

New Year predictions: further search storms ahead!

2012 has been a fascinating and stormy year for those of us in the search business. We’ve seen a raft of further acquisitions of commercial closed source search companies by bigger players, some convinced that what used to be called Enterprise Search is now a solution to Big Data (like Stephen Arnold we wonder what will succeed Big Data as the next marketing term – I love his phrase “In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing”). One acquisition hasn’t gone so smoothly: Autonomy, bought by HP for a price that no-one in the search business thought was remotely sensible, has been accused of being oversold vapourware: this is a story that will continue to develop in 2013. If you want a great overview of the current market read Martin White’s latest research note.

Here in the slightly calmer waters of open source search, we’ve seen a huge rise in enquiries from often blue-chip companies, no longer needing persuasion that open source is a serious contender for even the largest search and content projects. Often these companies have considered large commercial solutions but are put off by both the price and high-pressure marketing tactics – in a world of reduced budgets you simply can’t sell magic beans for a pile of gold. We’ve also seen increased interest in related technologies such as machine learning and automatic categorisation – search really isn’t just about search any more.

At Flax we’re busier than we have ever been and we expect the trend to continue. We’re looking forward to running more Cambridge Search Meetups, visiting and helping organise conferences such as Enterprise Search Europe and Lucene Revolution, building our network of carefully chosen partners and of course working on exciting and cutting-edge development projects.

As the storms in our sector continue to rage overhead we’ll simply be getting on with what we do best, building effective search.

Posted in Business, News

January 3rd, 2013

Autonomy & HP – a technology viewpoint

I’m not going to comment on the various financial aspects of the recent news about HP’s write-down of the value of its Autonomy acquisition – others are able to do this far better than me – but I would urge anyone interested to re-read the documents Oracle released earlier this year. However, I am going to write about the IDOL technology itself (I’d also recommend Tony Byrne’s excellent post).

Autonomy’s ability to market its technology has never been in doubt: aggressive and fearless, it painted IDOL as unique and magical, able to understand the meaning of data in multiple forms. However, this has never been true; computers simply don’t understand ‘meaning’ like we do. IDOL’s foundation was just a search engine using Bayesian probabilistic ranking; although most other search technologies use the vector space model, there are a few other examples of this approach: Muscat, a company founded a few years before and literally across the hall from Autonomy in a Cambridge incubator, grew to a £30m business with customers including Fujitsu and the Daily Telegraph newspaper. Sadly Muscat was a casualty of the dot-com years, but it is where the founders of Flax first met and worked together on a project to build a half-billion-page web search engine.
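For readers unfamiliar with the term, the vector space model treats each document and query as a vector of weighted terms and ranks documents by the angle between them. The sketch below uses TF-IDF weights and cosine similarity; the function names, sample documents and whitespace tokenisation are simplifying assumptions, and real engines add refinements such as stemming and different normalisation schemes.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: list[str]) -> list[int]:
    """Return document indices, most similar to the query first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get zero weight.
    query_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [cosine(query_vec, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = ["cambridge search company", "cambridge web search", "bayesian ranking"]
print(rank("bayesian ranking", docs))  # [2, 0, 1]
```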

Another even less well-known example is OmniQ, eventually acquired and subsequently shelved by Sybase. Digging in the archives reveals some familiar-sounding phrases such as “automatically capture and retrieve information based on concepts”.

Originally developed at Muscat, the open source library Xapian also uses Bayesian ranking and we’ve used this successfully to build systems for the Financial Times, Newspaper Licensing Agency and Tait Electronics. Recently, Apache Lucene/Solr version 4.0 has introduced the idea of ‘pluggable’ ranking models, with one option being the Bayesian BM25. It’s important to remember though that Bayesian ranking is only one way to approach a search problem and in many cases, simply unnecessary.
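BM25 comes out of the probabilistic relevance framework rather than the vector space model. The sketch below scores documents with the common formulation using parameters k1 and b (1.2 and 0.75 are conventional defaults) and the ‘1 +’ IDF variant that avoids negative weights for very common terms. It is an illustration only: Lucene’s BM25Similarity differs in details such as how document lengths are encoded, and `bm25_scores` is our own hypothetical function.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Return a BM25 score for each document against the query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs  # average document length
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # number of documents containing each term
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["bayesian ranking for search", "vector space model", "search engine"]
print(bm25_scores("bayesian search", docs))
```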

It certainly isn’t magic.

Flax partners with open source support specialists Sirius Corporation

We’re very happy to announce we’ve partnered with Sirius Corporation. Sirius are the leading U.K. provider of managed services, support and training for open source software with an impressive and growing list of clients including Canonical, Médecins Sans Frontières and the Met Office. We’ve recently carried out a major project for which Sirius will be providing ongoing support on a 24/7 SLA basis and we’re looking forward to further collaboration with this energetic, highly professional and skilled company.

We’re also happy to announce that Flax and Sirius will be co-hosting a free, half day event on Open Source Enterprise Search on Friday 20th July from 9.30 a.m. Held at the Sirius Corporation offices in Weybridge, Surrey, this will be an opportunity to find out how open source search can directly benefit your business. Whether you need search over documents on an intranet or database, pages on a website or more specialised applications such as media monitoring, taxonomy and classification, open source technologies can offer an economical and highly scalable route to success. The event will feature focussed briefings, networking and discussion with leading experts in the field. It’s completely free to attend and breakfast, refreshments and a riverside barbeque lunch will be provided.

You can register and find out more online.

Posted in Business, News

June 28th, 2012
