Elasticsearch, Kibana and duplicate keys in JSON

Posted on August 3, 2017 by Tom

JSON has been the lingua franca of data exchange for many years. It's human-readable, lightweight and widely supported. However, the JSON spec does not define what parsers should do when they encounter a duplicate key in an object, e.g.:

{
  "foo": "spam",
  "foo": "eggs",
  ...
}

Implementations are free to interpret this however they like, and when different systems interpret it differently, this can cause problems. We recently encounter...Continue reading
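
As a quick illustration of one such interpretation (not taken from the post itself), Python's standard json module silently keeps the last value for a repeated key. The sketch below shows that default, alongside a stricter parse built on the module's object_pairs_hook parameter. The helper name reject_duplicates is made up for this example.

```python
import json

# The duplicate-key object from the example above, without the elided fields.
RAW = '{"foo": "spam", "foo": "eggs"}'


def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook: raise if a key repeats within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Default behaviour of Python's json module: the last value wins.
lenient = json.loads(RAW)

# Stricter parse: the hook sees every key/value pair, so it can spot repeats.
try:
    json.loads(RAW, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(lenient)
print(strict_error)
```

Other parsers may keep the first value, keep both, or reject the document outright, which is exactly where two systems handling the same payload can end up disagreeing.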

London Lucene/Solr Meetup – Introducing Marple & Solr Classification

Posted on March 27, 2017 by Charlie Hull

A small crowd for this month's London Lucene/Solr Meetup, kindly hosted by Barclays in their sumptuous Canary Wharf offices. I introduced the Meetup and spoke briefly on how Flax is currently looking for team members (want to work on a variety of cutting-edge open source search projects in the UK and abroad? Get in touch!) before introducing Flax's Alan Woodwar...Continue reading

Working with Hadoop, Kafka, Samza and the wider Big Data ecosystem

Posted on March 3, 2016 by Charlie Hull

We've been working on a number of projects recently involving open source software often described as 'Big Data' solutions - here's a quick overview of them. The grandfather of them all, of course, is Apache Hadoop, now not so much a single project as an ecosystem including storage and processing for potentially huge amounts of data, spread across clusters of machines. Interestingly, Hadoop was originally created by D...Continue reading

Better search for life sciences at the BioSolr Workshop, day 2 – Elasticsearch & others

Posted on February 15, 2016 by Charlie Hull

Over the last 18 months we've been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass ...Continue reading

Posted in Biotechnology, Blog, Events | Tagged bioinformatics, biology, biosolr, DIH, django, elasticsearch, indexing, lucene, python, redis, scaling, SOLR, sphinx, sql | Leave a reply

Better search for life sciences at the BioSolr Workshop, day 1 – Apache Lucene/Solr

Posted on February 10, 2016 by Charlie Hull

Over the last 18 months we've been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass ...Continue reading

Posted in Biotechnology, Blog, Events, Presentations | Tagged bioinformatics, biology, biosolr, EBI, EMBL-EBI, federated search, high availability, indexing, lucene, MySQL, NCBI, SOLR, xjoin | Leave a reply

XJoin for Solr, part 2: a click-through example

Posted on January 29, 2016 by Tom Winch

In my last blog post, I demonstrated how to set up and configure Solr to use the new XJoin search components we've developed for the BioSolr project, using an example from an e-commerce setting. This time, I'll show...Continue reading

Posted in Biotechnology, Blog, E-commerce, Reference, Technical | Tagged biosolr, click-through, e-commerce, example, filtering, indexing, lucene, SOLR | Leave a reply

The fun and frustration of writing a plugin for Elasticsearch for ontology indexing

Posted on January 27, 2016 by Matt Pearce

As part of our work on the BioSolr project, I have been continuing to work on the various Elasticsearch ontology annotation plugins (note that even though the project started with a focus on Solr - thus the name - we have also been developing some features for Ela...Continue reading

Posted in Biotechnology, Blog, Reference, Technical | Tagged bioinformatics, biosolr, elasticsearch, indexing, ontology, plugins, SOLR | 6 Replies

XJoin for Solr, part 1: filtering using price discount data

Posted on January 25, 2016 by Tom Winch

In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.
What is XJoin?
...Continue reading

Posted in Biotechnology, Blog, E-commerce, Technical | Tagged bioinformatics, ecommerce, example, filtering, indexing, java, lucene, patch, python, SOLR, xjoin | 5 Replies

Elasticsearch vs. Solr: performance improvements

Posted on December 18, 2015 by Tom

I had been planning not to continue with these posts, but after Matt Weber pointed out the GitHub pull requests (which to my embarrassment I'd not even noticed) he'd made to address some methodological flaws, another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (for my original, suboptimal query structure) to 34ms, and in median search time from 27ms to 25ms - see figure 1. Elasticsearch, on the ...Continue reading

Posted in Blog, Technical | Tagged elasticsearch, indexing, performance, SOLR | Leave a reply

Elasticsearch London Meetup: Templates, easy log search & lead generation

Posted on January 30, 2015 by Charlie Hull

After a long day at a Real Time Analytics event (of which more later) I dropped into the Elasticsearch London User Group, hosted by Red Badger and provided with a ridiculously huge amount of pizza (I have a theory that you'll be able to spot an Elasticsearch developer in a few years by the size of their pizza-filled belly). ...Continue reading

Posted in Blog, Meetups | Tagged elasticsearch, events, indexing, networking, performance, real time | 1 Reply
