security – Flax
http://www.flax.co.uk
The Open Source Search Specialists
Thu, 10 Oct 2019 09:03:26 +0000

A search-based suggester for Elasticsearch with security filters
http://www.flax.co.uk/blog/2017/11/16/search-based-suggester-elasticsearch-security-filters/
Thu, 16 Nov 2017 14:30:12 +0000

The post A search-based suggester for Elasticsearch with security filters appeared first on Flax.

Both Solr and Elasticsearch include suggester components, which can be used to provide search engine users with suggested completions of queries as they type.

Query autocomplete has become an expected part of the search experience. Its benefits to the user include less typing, speed, spelling correction, and cognitive assistance.

A challenge we have encountered with a few customers is autocomplete for search applications that include user-based access control (i.e. certain documents or classes of document are hidden from certain users or classes of user). In general, the system should not suggest query completions that match only documents the user cannot access. For one thing, a suggested query that then returns no results confounds the user’s expectation and makes the system look broken. For another, suggestions may “leak” information that the administrators would rather keep hidden (e.g. an intranet user could type “dev” into a search box and get “developer redundancies” as a suggestion).

Access control logic is often implemented as a Boolean filter query. Although both the Solr and Elasticsearch suggesters have simple “context” filtering, they do not allow arbitrary Boolean filters. This is because the suggesters are not implemented as search components, for reasons of performance.

To be useful, suggesters must be fast; they must provide suggestions that make intuitive sense to the user and that, if followed, lead to search results; and they must be reasonably comprehensive, taking account of all the content the user potentially has access to. For these reasons, it is impractical in most cases to obtain suggestions directly from the main index using a search-based method.

However, an alternative is to create an auxiliary index consisting of suggestion phrases, and retrieve suggestions using normal queries. The source of the suggestion index can be anything you like: hand-curated suggestions and logged user queries are two possibilities.

To demonstrate this I have written a small proof-of-concept system for a search-based suggester where the suggestions are generated directly from the main documents. Since any access control metadata is also available from the documents, we can use it to exclude suggestions based on the current user. A document in the suggester index looks something like this:

suggestion: "secret report"
freq: 16
meta:
  - include_groups: [ "directors" ]
    exclude_people: [ "Bob", "Lauren" ]
  - include_groups: [ "financial", "IT" ]
    exclude_people: [ "Max" ]

In this case, the phrase “secret report” has been extracted from one or more documents which are visible to the group “directors” (excluding Bob and Lauren) and one or more documents visible to groups “financial” and “IT” (excluding Max). Thus, “secret report” can be suggested only to those people who have access to the source documents, provided the suggestion query includes the corresponding filter.
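As a rough illustration of how entries of this shape might be produced, here is a minimal Python sketch. It is not the code from the proof of concept: the phrase extractor (plain word bigrams) and the input keys `text`, `include_groups` and `exclude_people` are assumptions made for the example, and `freq` here simply counts phrase occurrences.

```python
from collections import defaultdict


def extract_phrases(text, n=2):
    """Return lower-cased word n-grams (a deliberately naive extractor)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_suggestions(docs):
    """Aggregate phrases across documents into suggester-index entries.

    Each doc is a dict with the hypothetical keys "text",
    "include_groups" and "exclude_people".
    """
    freq = defaultdict(int)
    meta = defaultdict(list)
    for doc in docs:
        acl = {
            "include_groups": sorted(doc["include_groups"]),
            "exclude_people": sorted(doc["exclude_people"]),
        }
        for phrase in extract_phrases(doc["text"]):
            freq[phrase] += 1
            # Keep one copy of each distinct access-control clause per phrase.
            if acl not in meta[phrase]:
                meta[phrase].append(acl)
    return [
        {"suggestion": phrase, "freq": freq[phrase], "meta": meta[phrase]}
        for phrase in sorted(freq)
    ]


docs = [
    {"text": "Secret report draft",
     "include_groups": ["directors"], "exclude_people": ["Bob", "Lauren"]},
    {"text": "The secret report",
     "include_groups": ["financial", "IT"], "exclude_people": ["Max"]},
]
entries = build_suggestions(docs)
```

Each distinct access-control clause is stored once per phrase, so a phrase that occurs in documents with different permissions carries every clause under which it may be shown.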

The proof of concept uses Elasticsearch, and includes Python code to create the main and the suggestion indexes, and a script to demonstrate filtered suggesting. The repository is here.
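Independently of the repository code, the filtering described above could be sketched as follows. The function names are hypothetical, and the query assumes that `meta` is mapped as a `nested` field with `include_groups` and `exclude_people` as keyword fields; the actual proof of concept may do this differently.

```python
def can_see(entry, user, groups):
    """True if at least one access clause in entry["meta"] admits the user."""
    return any(
        bool(set(groups) & set(clause["include_groups"]))
        and user not in clause["exclude_people"]
        for clause in entry["meta"]
    )


def suggestion_query(prefix, user, groups, size=5):
    """Build an Elasticsearch query body for filtered suggestions.

    Assumes "meta" is a nested field, so that the include and exclude
    conditions are evaluated together within a single clause.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match_phrase_prefix": {"suggestion": prefix}}],
                "filter": [{
                    "nested": {
                        "path": "meta",
                        "query": {
                            "bool": {
                                "filter": [
                                    {"terms": {"meta.include_groups": groups}}
                                ],
                                "must_not": [
                                    {"term": {"meta.exclude_people": user}}
                                ],
                            }
                        },
                    }
                }],
            }
        },
        # Most frequent phrases first.
        "sort": [{"freq": {"order": "desc"}}],
    }


entry = {
    "suggestion": "secret report",
    "freq": 16,
    "meta": [
        {"include_groups": ["directors"], "exclude_people": ["Bob", "Lauren"]},
        {"include_groups": ["financial", "IT"], "exclude_people": ["Max"]},
    ],
}
query = suggestion_query("sec", user="Bob", groups=["directors"])
```

`can_see` states the rule in plain Python, which makes it easy to check against the index. The nested mapping matters because a flattened mapping would mix the groups and exclusions of different clauses, so someone excluded by one clause but admitted by another (Bob, if he were also in “IT”) would wrongly be filtered out.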

If you would like Flax to help build suggesters for your search application, do get in touch!

Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch
http://www.flax.co.uk/blog/2017/09/28/elastic-london-meetup-rightmove-signal-media-new-free-security-plugin-elasticsearch/
Thu, 28 Sep 2017 08:44:26 +0000

The post Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch appeared first on Flax.

I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million stored searches on new property listings as part of an overall migration from the Exalead search engine and Oracle database to a new stack based on Elasticsearch, Apache Kafka and CouchDB. After creating a proof-of-concept system on Amazon’s cloud they discovered that simply running all 3.5m Percolator queries every time a new property appeared would be too slow and thus implemented a series of filters to cut down the number of queries applied, including filtering out rental properties and those in the wrong location. They are now running around 40m saved searches per day and also plan to upgrade from their current Elasticsearch 2.4 system to the newer version 5, as well as carry out further performance improvements. After the talk I chatted to the presenter George Theofanous about our work for Bloomberg using our own library Luwak, which could be a way for Rightmove to run stored searches much more efficiently.

Next up was Signal Media, describing how they built an automated system for upgrading Elasticsearch after their cluster grew to over 60 nodes (they ingest a million articles a day and up to May 2016 were running on Elasticsearch 1.5, which had a number of issues with stability and performance). To avoid having to completely shut down and upgrade their cluster, Joachim Draeger described how they carried out major version upgrades by creating a new, parallel cluster (he named this the ‘blue/green’ method), with their indexing pipeline supplying both clusters and their UI code being gradually switched over to the new cluster once stability and performance were verified. This process has cut their cluster to only 23 nodes with a 50% cost saving and many performance and stability benefits. For ongoing minor version changes they have built an automated rolling upgrade system using two Amazon EBS volumes for each node (one is for the system, and is simply switched off as a node is disabled; the other is for data and is re-attached to a new node once it is created with the upgraded Elasticsearch machine image). With careful monitoring of cluster stability and (of course) testing, this system enables them to upgrade their entire production cluster in a safe and reliable way without affecting their customers.

After the talks I announced the Search Industry Awards I’ll be helping to judge in November (please apply if you have a suitable search project or innovation!) and then spoke to Simone Scarduzio about his free Elasticsearch and Kibana security plugin, a great alternative to the Elastic X-Pack (only available to Elastic subscription customers). We’ll certainly be taking a deeper look at this plugin for our own clients.

Thanks again to Yann Cluchey for organising the event and all the speakers and hosts.

Searching for (and finding) open source in the UK Government
http://www.flax.co.uk/blog/2012/02/17/searching-and-finding-open-source-in-uk-government/
Fri, 17 Feb 2012 10:30:46 +0000

The post Searching for (and finding) open source in the UK Government appeared first on Flax.

There have been some very encouraging noises recently about increased use of open source software by the UK Government: for example we’ve seen the creation of an Open Source Procurement Toolkit by the Cabinet Office, which lists Xapian and Apache Lucene/Solr as alternatives to the usual closed source options. The CESG, the “UK Government’s National Technical Authority for Information Assurance”, has clarified its position on open source software, which has led to the Cabinet Office dispelling some of the old myths about security and open source. We know that the Cabinet Office’s ‘skunkworks’, the Government Digital Service, are using Solr for several of their projects. Francis Maude MP was recently in the USA with some of the GDS team and visited, amongst others, our US partners Lucid Imagination.

The British Computer Society have helped organise a series of Awareness Events for civil servants and I’m glad to be speaking at the first of these next Tuesday 21st February on open source search – hopefully this will further increase the momentum and make it even more clear that a modern Government needs to consider this modern, flexible and economically scalable approach to software.

Events: Open source for government and search in Cambridge
http://www.flax.co.uk/blog/2011/04/01/events-open-source-for-government-and-search-in-cambridge/
Fri, 01 Apr 2011 14:04:16 +0000

The post Events: Open source for government and search in Cambridge appeared first on Flax.

We’ll be attending the Guardian’s Public Procurement Show on June 14th & 15th as part of the Open Government stand – with the recent release by the UK government Cabinet Office of a new IT strategy (here are some industry reactions) it will be interesting to see whether anyone still believes the FUD about open source in the face of the evidence.

We’re also organising another search meetup in Cambridge on April 5th, this time featuring two perspectives on learning, and will also be at a more informal gathering of open source search people on May 3rd.

Open source intranet search over millions of documents with full security
http://www.flax.co.uk/blog/2011/01/26/open-source-intranet-search-over-millions-of-documents-with-full-security/
Wed, 26 Jan 2011 11:03:34 +0000

The post Open source intranet search over millions of documents with full security appeared first on Flax.

Last year my colleague Tom Mortimer talked about indexing security information within an open source enterprise search application, and we’re happy to announce more details of the project. Our client is an international radio supplier, who had considered both closed source products and search appliances, but chose open source for greater flexibility and the much lower cost of scaling to indexes of millions of documents.

Using the Flax platform, we built a high-performance multi-threaded filesystem crawler to gather documents, translated them to plain text using our own open source Flax Filters and captured Unix file permissions and access control lists (ACLs). User logins are authenticated against an LDAP server and we use this to show only the results a particular user is allowed to see. We also added the ability to tag documents directly within the search results page (for example, to mark ‘current’ versions, or even personal favourites) – the tags can then be used to filter future results. Faceted search is also available.

You can read more about the project in a case study (PDF) and Tom’s presentation slides (PDF) explain more about the method we used to index the security information.

Intranet search event
http://www.flax.co.uk/blog/2010/12/03/intranet-search-event/
Fri, 03 Dec 2010 10:18:39 +0000

The post Intranet search event appeared first on Flax.

Intranet Search was the theme for a small gathering last night at the (rather imposing) Ministry of Justice in London. We heard from Luke Oatham on intranet search at the Ministry itself, powered by Google over a reasonably small set of static and hand-published HTML. Simon Thompson continued with a neat way of enhancing SharePoint search, using jQuery to create an auto-complete tool for his company intranet, which interestingly displayed both ‘people’ and ‘page’ results in the same drop-down menu. Tyler Tate couldn’t make it to the event due to bad weather, but bravely volunteered to present over Skype on a (surprisingly good) 3G connection, and talked about handling diverse data (video, slides). Next up was our very own Tom Mortimer talking about indexing security information (of which more later) and we finished up with a quick talk from Rangi Robinson on the intranet at Framestore, with search powered by the open source Sphinx project.

Thanks to Simon Thompson and Angel Brown for organising the event and inviting us to speak.

Find out how we build in document security for open source search
http://www.flax.co.uk/blog/2010/11/12/find-out-how-we-build-in-document-security-for-open-source-search/
Fri, 12 Nov 2010 15:26:40 +0000

The post Find out how we build in document security for open source search appeared first on Flax.

My colleague Tom Mortimer will be talking at the London Intranet Show & Tell on 2nd December, about how to implement document-level security for search: his presentation is titled “Implementing ACLs in an open source search solution”.

There are still a few tickets left for this small event, which will be of value to those working on intranet search.
