We recently did a proof-of-concept project for a customer which ingested log events from various sources into a Kafka – Logstash – Elasticsearch – Kibana stack. This was configured with Ansible and hosted on about a dozen VMs inside the customer’s main network.
For various reasons resources were tight. One problem which we ran into several times was running out of disk space on the Elasticsearch nodes (this was despite setting up Curator to delete older indexes, and increasing the available storage as much as possible). Like most software, Elasticsearch does not always handle this situation gracefully, and we often had to ssh in and manually delete index files to get the system working again.
As a result of this experience, we have written a simple proxy server which can detect when an Elasticsearch or Solr cluster is close to running out of storage, and reject any further updates with a configurable error (503 Service Unavailable would seem to be the most appropriate) until enough space is freed up for indexing to continue. We call this Hara Hachi Bu, after the Confucian teaching to only eat until you are 80% full. It is available to download on GitHub and is released under the Apache 2.0 license. This is a very early release and we would welcome feedback or contributions. Although we have only tested it with Elasticsearch and Solr, it should be adaptable to any data store with a RESTful API.
The server is implemented using DropWizard (version 0.9.2), a framework we’ve used a lot for its ease of use and configurability. It is intended to sit between an indexer and your search engine (or a similar disk-based data store), and checks that disk space is available whenever certain endpoints are requested. If the available disk space is below a configured threshold value, the request is rejected with a configurable HTTP status code.
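To make the threshold rule concrete, here is a minimal sketch in plain Java. It is not taken from the project: the class name ThresholdGateSketch, the use of bytes as the unit, and the LongSupplier used to read free space are all assumptions made for illustration.

```java
import java.util.OptionalInt;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the project's actual class.
final class ThresholdGateSketch {
    private final long thresholdBytes;   // assumed unit: bytes of free space required
    private final int rejectStatus;      // configurable HTTP status, e.g. 503
    private final LongSupplier freeSpace; // assumed source of the current free space

    ThresholdGateSketch(long thresholdBytes, int rejectStatus, LongSupplier freeSpace) {
        this.thresholdBytes = thresholdBytes;
        this.rejectStatus = rejectStatus;
        this.freeSpace = freeSpace;
    }

    // Returns the status code to reject with, or empty if the request may proceed.
    OptionalInt rejection() {
        // Reject only when free space is strictly below the threshold.
        return freeSpace.getAsLong() < thresholdBytes
                ? OptionalInt.of(rejectStatus)
                : OptionalInt.empty();
    }
}
```

A caller would construct the gate once from configuration and consult rejection() before forwarding each checked request.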
There are disk space checkers for Elasticsearch (using the /_cluster/stats endpoint), a local Solr installation, or a cluster of hosts. If using a cluster, each machine is required to regularly post its disk space to the application. Custom implementations can also be added, by implementing the DiskSpaceChecker interface.
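As a rough illustration of what a custom checker for the clustered case might look like, the sketch below uses an assumed interface shape. The project's real DiskSpaceChecker may declare different methods, so the names DiskSpaceCheckerSketch, ClusterCheckerSketch, isSpaceAvailable and report are hypothetical, as is the rule that every reporting host must be above the threshold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed shape only: the real DiskSpaceChecker interface may differ.
interface DiskSpaceCheckerSketch {
    boolean isSpaceAvailable();
}

// Hypothetical clustered checker: each host reports its own free space.
final class ClusterCheckerSketch implements DiskSpaceCheckerSketch {
    private final long minFreeBytes;
    // Latest reported free space per host; safe for concurrent reports.
    private final Map<String, Long> freeByHost = new ConcurrentHashMap<>();

    ClusterCheckerSketch(long minFreeBytes) {
        this.minFreeBytes = minFreeBytes;
    }

    // Would be called by whatever handles a host's periodic report.
    void report(String host, long freeBytes) {
        freeByHost.put(host, freeBytes);
    }

    @Override
    public boolean isSpaceAvailable() {
        // Assumption: with no reports yet, the state is unknown and treated as unavailable.
        if (freeByHost.isEmpty()) {
            return false;
        }
        // Assumption: space is available only if every host is at or above the threshold.
        return freeByHost.values().stream().allMatch(free -> free >= minFreeBytes);
    }
}
```

Treating an empty report set as unavailable is a deliberately cautious choice in this sketch; a real deployment might prefer to allow writes until the first report arrives.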
The trickiest part of the implementation was allowing DropWizard's own endpoints through without them being proxied. We did this by implementing both a filter and a servlet: the filter looks out for locally known endpoints and passes them straight through, while unknown endpoints have a /proxy prefix added to the URL path and are then caught by the proxy servlet. The filter also carries out the disk space check on URLs in the check list, allowing them to be rejected before they reach the servlet. (If you’ve come up with a different solution to this problem, we’d be interested to hear about it.)
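The filter's decision logic can be sketched without the servlet API, as below. This is an illustration under stated assumptions rather than the project's code: the class and enum names are hypothetical, paths are compared by exact match (the real filter may match differently), and the disk space check is represented by a BooleanSupplier.

```java
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the routing decision, independent of the servlet API.
final class RoutingSketch {
    enum Action { PASS_THROUGH, REJECT, FORWARD }

    static final class Decision {
        final Action action;
        final String target; // path to continue with, or null when rejected

        Decision(Action action, String target) {
            this.action = action;
            this.target = target;
        }
    }

    private final Set<String> localPaths;     // endpoints served by the application itself
    private final Set<String> checkedPaths;   // the "check list" of guarded URLs
    private final BooleanSupplier spaceAvailable;

    RoutingSketch(Set<String> localPaths, Set<String> checkedPaths,
                  BooleanSupplier spaceAvailable) {
        this.localPaths = localPaths;
        this.checkedPaths = checkedPaths;
        this.spaceAvailable = spaceAvailable;
    }

    Decision decide(String path) {
        // Locally known endpoints go straight through, unproxied.
        if (localPaths.contains(path)) {
            return new Decision(Action.PASS_THROUGH, path);
        }
        // Guarded URLs are rejected before they reach the proxy servlet.
        if (checkedPaths.contains(path) && !spaceAvailable.getAsBoolean()) {
            return new Decision(Action.REJECT, null);
        }
        // Everything else is prefixed so that the proxy servlet picks it up.
        return new Decision(Action.FORWARD, "/proxy" + path);
    }
}
```

In a real filter, PASS_THROUGH would correspond to continuing the filter chain, REJECT to sending the configured status code, and FORWARD to dispatching the request to the prefixed path.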
The proxy was implemented by extending the Jetty ProxyServlet (http://www.eclipse.org/jetty/
Internally, the application builds the DiskSpaceChecker defined in the configuration. DropWizard resources (or endpoints) and health checks are added depending on the implementation, alongside a default, generic health check which simply reports whether or not disk space is currently available. For example, the /setSpace resource is only available when using the clustered configuration.