hackday – Flax
http://www.flax.co.uk
The Open Source Search Specialists
Thu, 10 Oct 2019 09:03:26 +0000

Lucene Hackdays in London & Montreal
http://www.flax.co.uk/blog/2018/10/23/lucene-hackdays-in-london-montreal/
Tue, 23 Oct 2018 09:35:13 +0000

We ran a couple of Lucene Hackdays over the last couple of weeks: a chance to get together with other people working on open source search, learn from each other and try to improve both Lucene and associated software.

Our first Hackday was in London, hosted by Mimecast at their offices near Moorgate. Despite a fire alarm practice (during which we ended up under some flats at the Barbican, whose residents may have been a little surprised at quite how many people were milling around under their balconies) we had a busy day. We split into three groups: one to look at tools for inspecting Lucene indexes, one to tackle various outstanding bugs and issues with Lucene and Solr, and one to review a well-known issue where different Solr replicas can provide slightly different result ordering. By 5.30 p.m., when we were scheduled to finish, we were still frantically hacking on some last-minute JavaScript to add a feature to our Marple index inspector. Luckily, a few minutes later and to a collective sigh of relief, we had it working and repaired to a local pub for food and drink (kindly sponsored by Elastic).

The next week a number of us were in Montreal for the Activate conference (previously known as Lucene/Solr Revolution but now sprinkled with cutting-edge AI fairy dust!). Our second Hackday was hosted by Netgovern, and we worked on various Lucene/Solr issues, made some improvements to our Harahachibu proxy (which attempts to block Solr updates when disk space is low) and discussed in depth how to improve the Solr onboarding experience. Pizza (sponsored by OneMoreCloud) and coffee fuelled the hacking, and we also added some new features including a Query Parser for MinHash queries. Many Lucene/Solr committers attended and afterwards we met up nearby for food and a drink (thanks to Searchstax for sponsoring this!), where we were joined by a few others, including Yonik Seeley, creator of Solr.
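For readers wondering what "blocking updates when disk space is low" might look like in practice, here is a minimal Python sketch of the kind of check such a proxy could make before forwarding an update. This is not Harahachibu's actual code: the function name, the `Usage` tuple and the 10% threshold are all assumptions made for illustration.

```python
import shutil
from typing import Callable, NamedTuple


class Usage(NamedTuple):
    """Same field layout as the tuple returned by shutil.disk_usage."""
    total: int
    used: int
    free: int


def should_block_updates(
    path: str,
    min_free_fraction: float = 0.10,  # hypothetical threshold, not a Harahachibu default
    disk_usage: Callable[[str], Usage] = shutil.disk_usage,
) -> bool:
    """Return True when free space on the volume holding `path` is below the threshold."""
    usage = disk_usage(path)
    if usage.total == 0:
        # No usable capacity reported: err on the side of blocking.
        return True
    return usage.free / usage.total < min_free_fraction
```

A proxy built around this idea would call something like `should_block_updates("/var/solr/data")` (a hypothetical path) for each incoming update and reject the request when it returns `True`, while letting queries through untouched.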

Next it was time for Activate – of which more later! Thanks to everyone who attended – you can see some notes and links about what we worked on here. I’m sure work will continue on these issues.

A fabulous FactHack for Full Fact
http://www.flax.co.uk/blog/2017/01/27/fabulous-facthack-full-fact/
Fri, 27 Jan 2017 10:49:20 +0000

Last week we ran a hackday for Full Fact, hosted by Facebook in their London office. We had planned to gather a room full of search experts from our London Lucene/Solr Meetup, and around twenty people attended from a range of companies including Bloomberg, Alfresco and the European Bioinformatics Institute, among them a number of Lucene/Solr committers.

Mevan Babakar of Full Fact has already written a detailed review of the day, but to summarise we worked on three areas:

  • Building a web service around our Luwak stored query engine, to give it an easy-to-use API. We now have an early version of this which allows Full Fact to check claims they have previously fact checked against a stream of incoming data (e.g. subtitles or transcripts of political events).
  • Creating a way to extract numbers from text and turn them into a consistent form (e.g. ‘eleven percent’, ‘11%’, ‘0.11’) so that we can use range queries more easily – Derek Jones’ team researched existing solutions and he has blogged about what they achieved.
  • Investigating how to use natural language processing to identify parts of speech and tag them in a Lucene index using synonyms and token stacking, to allow for queries such as ‘<noun> is rising’ to match text like ‘crime is rising’ – the team forked Lucene/Solr to experiment with this.
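To make the second of these areas concrete, here is a minimal Python sketch of turning ‘eleven percent’, ‘11%’ and ‘0.11’ into one consistent value that a range query could use. It is not the code written on the day: the function name and the tiny word list are assumptions for illustration, and a real solution would need far broader coverage (compound number words, currencies, ranges and so on).

```python
import re

# Tiny illustrative vocabulary (an assumption); real text needs a much larger one.
_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "twenty": 20, "fifty": 50,
}

# Captures the number part and a trailing percent marker, if there is one.
_PERCENT = re.compile(r"^\s*(.+?)\s*(%|percent|per cent)\s*$", re.IGNORECASE)


def normalise_number(text: str) -> float:
    """Map '11%', 'eleven percent' or '0.11' to the same float, 0.11.

    Raises ValueError for anything this sketch does not understand.
    """
    match = _PERCENT.match(text)
    body = match.group(1) if match else text.strip()
    word_value = _WORDS.get(body.lower())
    value = float(word_value) if word_value is not None else float(body)
    # Percentages are stored as fractions so all three forms compare equal.
    return value / 100.0 if match else value
```

Once every mention is indexed as the same numeric value, a claim such as ‘more than ten percent’ can be expressed as an ordinary numeric range rather than a set of string variants.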

We’re hoping to build on these achievements to continue to support Full Fact as they develop open source automated fact checking tools for both their own operations and for other fact checking organisations across the world (there were fact checkers from Argentina and Africa attending to give us an international perspective). Our thanks to all of those who contributed.

I’ve also introduced Full Fact to many others within the search and text analytics community and we would welcome further contributions from anyone who can lend their expertise and time – get in touch if you can help. This is only the beginning!

Just the facts with Solr & Luwak
http://www.flax.co.uk/blog/2017/01/04/just-facts-solr-luwak/
Wed, 04 Jan 2017 15:58:19 +0000

It won’t have escaped your notice that factchecking has been very much in the news recently, due to last year’s political upheavals in both the US and UK and the suspected influence of fake news on voters. Both traditional and social media organisations are making efforts in this area; examples include Channel 4 and Facebook.

At our recent London Lucene/Solr Meetup, UK charity Full Fact spoke eloquently on the need for automated factchecking tools to help identify and correct stories that are demonstrably false. They’ve also published a great report on The State of Automated Factchecking which mentions both Apache Solr and our powerful stored query library Luwak as components of their platform. We’ve been helping Full Fact with their prototype factchecking tools for a while now, and during the Meetup I suggested we might run a hackday to develop these further.

Thus I’m very pleased to announce that Facebook have offered us a venue in London for the hackday on January 20th (register here). Many Solr developers, including several committers and PMC members, are signed up to attend already. We’ll use Full Fact’s report and their experiences of factchecking newspapers, TV’s Question Time and Hansard to design and build practical, useful tools and identify a future roadmap. We’ll aim to publish what we build as open source software which should also benefit factchecking organisations across the world.

If you’re concerned about the impact of fake news on the political process and want to help, join the Meetup and/or donate to Full Fact.

A tale of two cities (and two Lucene Hackdays)
http://www.flax.co.uk/blog/2016/10/21/tale-two-cities-two-lucene-hackdays/
Fri, 21 Oct 2016 10:27:00 +0000

To mark Flax’s 15th anniversary we ran two Lucene Hackdays recently, in London and Boston. I even made some Flax cakes! The London event was attended by around 20 people from companies both large and small and kindly hosted by Bloomberg (who are currently very active in the Lucene/Solr community). We split up into a number of groups to work on a range of projects. Erica Sundberg from Blackrock took a group of beginners through installing Solr and indexing their first collection, while also considering how a minimal Solr example could be built (some of the shipped examples being rather complex). Another team led by Christine Poerschke of Bloomberg looked at a way to avoid slightly different statistics being returned from different Solr replicas (which can cause result ordering to appear to ‘jump’) and Diego Ceccarelli looked at adding BM25F ranking to Lucene. Other groups looked at SQL streaming with Solr (committer Joel Bernstein dialed in via Skype to help) and Flax’s Alan Woodward worked on Marple, a browser-based explorer for Lucene indexes. The day finished with a curry dinner kindly sponsored by Alfresco.
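The replica issue is easier to see with numbers. The Python sketch below uses the IDF formula from Lucene's BM25 similarity and two made-up replicas whose index statistics differ slightly; one commonly cited cause is deleted documents that one replica has not yet merged away. The document counts, terms and the simplification of scoring by IDF alone (term frequency and document length held equal) are all assumptions for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF as used by Lucene's BM25 similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def rank(doc_count: int, doc_freqs: dict, doc_terms: dict) -> list:
    """Rank documents by the summed IDF of the query terms they contain."""
    scores = {
        doc: sum(bm25_idf(doc_count, doc_freqs[term]) for term in terms)
        for doc, terms in doc_terms.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Query "lucene solr": document A contains only "lucene", B only "solr".
DOC_TERMS = {"A": ["lucene"], "B": ["solr"]}

# Replica 1 has merged its segments; replica 2 still counts ten deleted
# documents that contain "lucene" in its statistics.
replica_1 = rank(1000, {"lucene": 100, "solr": 101}, DOC_TERMS)
replica_2 = rank(1010, {"lucene": 110, "solr": 101}, DOC_TERMS)

print(replica_1)  # A ranks above B
print(replica_2)  # B ranks above A
```

Both replicas hold the same live documents, yet the same query returns them in a different order depending on which replica answers, which is exactly the ‘jump’ a user sees when paging or refreshing.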

Several days later we ran a similar Hackday in Boston, as many Lucene people were in town for Lucene Revolution. Many more Lucene/Solr committers attended this time and enjoyed a chance to work on their own projects or to continue some of the work we’d started in London. Doug Turnbull came up with a way to do BM25F ranking with existing Lucene features, while Alexandre Rafalovitch and I had a long conversation about minimal Solr examples and improving the way beginners can start with Solr. Other projects included new field types for Lucene, improved highlighters and DocValues. BA Insight were kind enough to provide the venue and Lucidworks sponsored drinks and snacks later in the pub downstairs.

We’ve gathered notes on what we worked on with links to some of the software we developed here – please do get involved if you can! In particular the Marple project is attracting further contributions (and interest from those who developed and maintain the existing Luke Lucene index inspector).

I’d like to thank everyone who came to the Hackdays, our generous sponsors for providing venues, food and drink, and those who helped organise the events. The feedback has been excellent (and do let us know if you have any further comments) and people seem keen for this to be a regular event before the annual Lucene Revolution conference – a chance to work on Lucene-based projects outside of regular work, to meet, network and spend time with other contributors and to enjoy being part of a great open source community. We’ll be back!

Not one, but three Lucene hackdays coming soon!
http://www.flax.co.uk/blog/2016/08/24/not-one-three-lucene-hackdays-coming-soon/
Wed, 24 Aug 2016 14:07:51 +0000

We’re always keen to get more people involved in the Lucene search community – there’s always lots to do, from deep hacking of the core code, to testing with different frameworks and clients, to creating documentation and examples. It’s also just over fifteen years since Tom Mortimer and I founded Flax and we thought we should mark this birthday with some kind of event! So I’m very happy to announce we’ll be involved in three Lucene hackday events over the next two months:

Firstly, Dr. Leif Azzopardi has kindly invited us to speak and participate in the Lucene4IR Workshop to be held at the University of Strathclyde in Glasgow on 8th & 9th September 2016. The event is aimed at those in academia wanting to get more involved in practical applications of Lucene, and we are hoping they will also contribute ideas from cutting-edge information retrieval research. We’ll be giving the keynote talk on how we use Lucene-based search engines in industry and also getting involved in the coding sessions later. There are some free places (although registration is only £69) and there are even some travel grants available.

A month later on 7th October we’re running a Lucene Hackday in London as part of our London Lucene/Solr Usergroup (note that Elasticsearch users are also very welcome to this and the other events mentioned). Bloomberg are kindly providing a venue, and we’ll have Lucene committers on hand to guide us.

The next week is the largest Lucene event of the year, Lucene Revolution – but we’ll be in Boston a couple of days early on Tuesday 10th October to run a Boston Lucene Hackday. BA Insight are our hosts this time and we’re hoping some of those coming to Revolution later in the week will be able to participate.

So all we need is you – bring a laptop and your ideas for new things to add to or do with Lucene. We’ll even provide cake for Flax’s birthday at the latter two events! Feel free to suggest what we should hack on in the comments below.

London Lucene Solr Meetup – Enterprising attitudes to open source search & query completion strategies
http://www.flax.co.uk/blog/2016/05/18/london-lucene-solr-meetup-enterprising-attitudes-open-source-search-query-completion-strategies/
Wed, 18 May 2016 15:34:23 +0000

Last night the London Lucene Solr Meetup was hosted by Elsevier in their Finsbury Square offices. Our first speaker was Martin White, expert consultant, author of many books about enterprise search and intranets and visiting professor at the University of Sheffield (oh, and Flax partner). Martin showed us some scary numbers about the terribly low level of satisfaction with enterprise search, drawing on research from AIIM and Findwise (I highly recommend you contribute to their ongoing survey if you can, it’s a great resource). One example is that around 55% of people in enterprises find it ‘very difficult’ to find information, which can have a huge effect on productivity. Martin suggested that there is a huge opportunity for open source search in the enterprise market, but that we need a way of communicating the benefits to non-technical staff – as these people are generally the ones in charge of budgets. He ended with a suggestion that a trade association for smaller, independent search companies could be formed, an idea I’m going to explore further.

After a short break we continued with Tomasz Sobczak of Findwise (who had travelled from Poland especially to speak) on query completion strategies – you’ll have seen this feature where a search system suggests endings for the query you’ve begun to type. He described the various applications of this (including completing place names in map searches and available products in e-commerce) and described the many ways it can be implemented in Solr: facet.prefix, facet.contains, using N-grams, Shingles, the Suggester component, queries using synonyms and the Terms component. He noted the various pros and cons of each approach including how they may affect performance and suggested how a separate Solr index might be used purely for query completion. Data for query completion should also be clean and secure (you don’t want to show something the user isn’t allowed to know exists via query completion!). He finished with an example from Findwise’s work for Ericsson.
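As an illustration of the simplest of these approaches, here is a minimal Python sketch that builds a Solr request using `facet.prefix` for query completion. It is not taken from the talk: the `completion_url` helper, the `places` core and the `place_name` field are hypothetical names, while `facet`, `facet.field`, `facet.prefix`, `facet.limit`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def completion_url(base_url: str, field: str, prefix: str, limit: int = 10) -> str:
    """Build a Solr select URL that returns indexed terms starting with `prefix`."""
    params = {
        "q": "*:*",              # match everything; only the facet counts matter
        "rows": 0,               # no documents needed in the response
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,  # only terms beginning with the typed text
        "facet.limit": limit,    # number of suggestions to return
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


url = completion_url("http://localhost:8983/solr/places", "place_name", "lon")
print(url)
```

Because faceting works on indexed terms, the suggestions depend on how the field is analysed: a lowercased field needs a lowercased prefix, and a tokenised field will suggest single words rather than whole phrases. Any filter queries used for security in the main search would also need to be added here, for the reason given above.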

After the talks we had a brief discussion around how some of the less exciting features of Solr might be improved (we’ve blogged about our search for sponsorship for fixing some of these issues) and the suggestion arose that we might run some more Solr hackdays, in London or even the U.S.A. We’ll be looking into this possibility.

Thanks to our hosts, speakers and indeed everyone who came – see you next time!

Cambridge Search Meetup – Elasticsearch Hackday
http://www.flax.co.uk/blog/2014/10/03/cambridge-search-meetup-elasticsearch-hackday/
Fri, 03 Oct 2014 12:32:00 +0000

Last Friday we hosted a hackday featuring Elasticsearch in Cambridge, following a similar event last year focused on Apache Lucene/Solr. Around 20 people attended from organisations working in sectors including analytics, digital music, bioinformatics and e-commerce, and all the Flax team were there as well.

We started with a brief presentation on Elasticsearch and asked around the room for any data collections we might be able to use. Lee from Elasticsearch (the company) had brought collections of UK crime data and the complete works of Shakespeare; we also had several million rows of digital music metadata, Wikipedia edit data for all UK MPs (to follow last year’s theme!) and several years of data describing Premier League football. Unlike our Solr hackday where each team worked on the same general task, this time we split into four different teams who worked on all of the above except the Wikipedia edits. We’d also been provided with a very high-performance Elasticsearch cluster by BigStep for our use, which meant it was very quick to index the above data and start working with it.

By lunchtime (the food was sponsored by Elasticsearch, who also provided stickers, plush ELKs and lollipops – thanks guys!) we had some very basic information about the various datasets – such as which scene in which Shakespeare play has the most characters on stage (the answer is 21 in Richard III), and which football teams seemed to gain the most advantage from playing at home. Note that we had already moved beyond basic search functionality to use Elasticsearch as an analytic platform, answering particular questions using features such as aggregations.
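To give a flavour of what an aggregation for the Shakespeare question might look like, here is a Python sketch that builds an Elasticsearch request body: a `terms` aggregation over scenes, ordered by a `cardinality` sub-aggregation that counts distinct speakers. This is not the query the team wrote on the day. The field names `scene_id` and `speaker` are assumptions about how the data was indexed, and counting speakers only approximates ‘characters on stage’.

```python
import json


def busiest_scene_query(scene_field: str, speaker_field: str) -> dict:
    """Request body asking for the scene with the most distinct speakers."""
    return {
        "size": 0,  # aggregation results only, no search hits
        "aggs": {
            "scenes": {
                "terms": {
                    "field": scene_field,
                    "size": 1,  # keep only the top scene
                    "order": {"distinct_speakers": "desc"},
                },
                "aggs": {
                    "distinct_speakers": {
                        "cardinality": {"field": speaker_field}
                    }
                },
            }
        },
    }


body = busiest_scene_query("scene_id", "speaker")
print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint. Note that `cardinality` returns an approximate count, and ordering a `terms` aggregation by a sub-aggregation can be imprecise when the data is spread over several shards, so a result like this is worth double-checking on a small dataset.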

We continued during the afternoon to develop the various applications and finished with a ‘show and tell’. Some of the teams had managed to develop user interfaces for Elasticsearch, the most polished being a clickable Google Map that would show you which types of crime were significantly above and below the national average for the area you selected – unsurprisingly in Cambridge, stolen bicycles were very common! By the end of the day, everyone had gained experience of Elasticsearch, some for the first time. We finished the day, as is traditional, with a swift pint and further networking.

Thanks to Cambridge Business Lounge (a highly recommended co-working space) for the venue, BigStep for hosting and Elasticsearch for sponsoring lunch and providing the swag, and of course to all who attended. We’ll return with a further Cambridge Search Meetup soon!

Meetups, genomes and hack days: Grant Ingersoll visits the UK
http://www.flax.co.uk/blog/2013/07/29/meetups-genomes-and-hack-days-grant-ingersoll-visits-the-uk/
Mon, 29 Jul 2013 12:18:00 +0000

Lucene/Solr committer, Mahout co-creator, LucidWorks co-founder and general all-round search expert Grant Ingersoll visited us last week on his way to the SIGIR conference in Dublin. We visited the European Bioinformatics Institute on the Wellcome Trust Genome Campus to hear about some fascinating projects using Lucene/Solr to index genomes, phenomes and proteins and for Grant to give a talk on recent developments in both Lucene/Solr and Mahout – it was gratifying that over 50 people turned up to listen and at least 30 of these indicated they were using the technology.

After a brief rest it was then time to travel to London so Grant could talk at the Enterprise Search London Meetup on both recent developments in Lucene/Solr and what he dubbed ‘Search engine (ab)use’ – some crazy use cases of Lucene/Solr including for very fast key/value storage. He shared some great statistics, including how Twitter make new tweets searchable in around 50 microseconds using only 8-10 indexing servers.

Next it was back to Cambridge for our own Lucene/Solr hack day in a great new co-working space. Attendees ranged from those who had never used Lucene/Solr to those with significant search expertise, and some had come from as far away as Germany. After a brief introduction we split into several groups, each mentored by a member of the Flax team. Two groups (one composed entirely of those who had never used Lucene) worked on a dataset of tweets from UK members of parliament and a healthy sense of competition developed between them – you can see some of the code they developed in our GitHub account, including an entity extractor webservice. Another group, led by Grant, created a SolrCloud cluster, with around 1-2 million documents split into 2 shards – running on ten laptops over a wireless connection! Impressively this was set up in less than ten minutes. Others worked on their own applications including an index of proteins and there was even some work on the Lucene/Solr code itself.

We’re hoping to put the results of some of these projects live very soon, so you can see just what can be built in a single day using this powerful open source software. Thanks to all who came, our hosts at Cambridge Business Lounge and of course Grant for his considerable energy and invaluable expertise. If nothing else, we’ve introduced a lot more people to open source search and sparked some ideas, and we ended the week with beer in a sunny pub garden, which is always nice!
