A few years ago much marketing noise was made about Big Data. Every software vendor suddenly had a Big Data suite; you could suddenly buy Big Data-capable hardware; consultants and experts released thought pieces, blogs and books all about Big Data and how it would change the world. The reality, of course, was slightly different: Big Data meant… well, it meant whatever you wanted it to mean for your commercial purpose. For some people, anything that didn’t fit in an Excel spreadsheet was Big Data; for others, who had genuinely large collections of data to process, it was often hard to sort the wheat from the PR chaff and find a solution that worked.
Those of us in the search engine sector would occasionally mention that we’d been dealing with not inconsequential amounts of data for many years (for example, the founders of Flax met while building a half-billion-page web search engine back in 1999). We already knew something about distributed computing, clusters of servers and how to scale for performance and reliability. There’s even some shared history: Hadoop, the foundation of so many Big Data architectures, was created by the same person who created the search library Lucene and the web crawler Nutch – so he could build a big search engine. As a result we ended up with suites of Big Data-capable software where the clever bit was… search technology.
We’re at a similar point now with AI. No matter how many pictures of humanoid robots the marketing uses, what people are calling AI is not the Terminator or a robot companion built by a reclusive billionaire. It’s generally a combination of techniques such as machine learning (ML) and natural language processing (NLP), some of which have been around for decades, which can (if you get them right) spot patterns in data, recognise graphical shapes, analyse human speech and so on. Getting them right is the hard bit: you need good, reliable signals, models that work and, most importantly, clever people to put it all together (and few of these people are available).
Again, some of the most interesting work is happening in the world of search, and it is more likely to be real than a dodgy prototype thrown together in the hope that Google will buy your startup. In search, the necessary fundamentals of large-scale data processing, text processing, user interaction and matching are well understood through decades of experience, so AI techniques can be applied with practical results: for example, Learning to Rank, which cleverly re-orders search results based on signals important to the business or user. So again, underneath the current trend we find a dependence on search technology. It’s unfortunate that some commentators have assumed this means everything in search is powered by magic AI; in some cases it’s rather the reverse.
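To make the Learning to Rank idea a little more concrete, here is a minimal sketch in Python. It is not how any particular search engine or plugin implements it: the feature names (text score, click rate, in stock), the toy relevance judgements and the function names are all hypothetical, and the simple pairwise perceptron stands in for the far more capable models used in practice. The point is only to show the shape of the technique: learn weights from judged examples, then use them to re-order results the search engine has already retrieved.

```python
# Hypothetical toy training data, invented for illustration only.
# Each entry: (text_score, click_rate, in_stock) features plus a graded
# relevance judgement (higher means more relevant).
TRAINING = [
    ((0.9, 0.1, 0.0), 1),
    ((0.6, 0.8, 1.0), 3),
    ((0.8, 0.3, 1.0), 2),
    ((0.3, 0.2, 0.0), 0),
]


def score(weights, features):
    """Linear model: weighted sum of the signals."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(examples, epochs=200, lr=0.1):
    """Pairwise perceptron: whenever a more relevant document does not
    outscore a less relevant one, nudge the weights towards the
    difference between their feature vectors."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for feats_a, rel_a in examples:
            for feats_b, rel_b in examples:
                misordered = score(weights, feats_a) <= score(weights, feats_b)
                if rel_a > rel_b and misordered:
                    weights = [
                        w + lr * (a - b)
                        for w, a, b in zip(weights, feats_a, feats_b)
                    ]
    return weights


def rerank(weights, results):
    """Re-order already-retrieved results by the learned score."""
    return sorted(
        results,
        key=lambda r: score(weights, r["features"]),
        reverse=True,
    )


# The search engine's original order, here sorted by text score alone.
RESULTS = [
    {"id": "A", "features": (0.9, 0.1, 0.0)},
    {"id": "C", "features": (0.8, 0.3, 1.0)},
    {"id": "B", "features": (0.6, 0.8, 1.0)},
    {"id": "D", "features": (0.3, 0.2, 0.0)},
]

WEIGHTS = train_pairwise(TRAINING)
RERANKED_IDS = [r["id"] for r in rerank(WEIGHTS, RESULTS)]
```

With this toy data the re-ranked order promotes the well-clicked, in-stock document above the one that merely matched the text best. Note that the search engine still does the heavy lifting of retrieving candidates; the learned model only re-orders the top of the list.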
Activate, a conference previously known as Lucene Revolution and run by our partners Lucidworks, has brought together AI and search deliberately to explore these connections. We’re looking forward to attending next month – come and find us if you want to discuss your project!