Tired of Slow, Manual Logstash Syntax Checking?

To speed up writing log parsers in Logstash, we developed our own Logstash syntax checking tool. This blog article outlines how the syntax checker works and explains why it does not interfere with Logstash’s real-time pipelining capability.

Logstash Syntax Checking Limitations

Don’t take this the wrong way: At Trovent, we use Logstash extensively. Logstash is great for our log parsing needs. And we do a lot of log parsing in the context of our anomaly detection solutions!
But the reality is that out-of-the-box Logstash does not provide built-in syntax checking that is both usable and intuitive, which makes writing log parsers painfully slow.

Yes, experienced Logstash users will point out that Logstash can already test a configuration: it only needs the --config.test_and_exit (or -t) flag. But even for such a simple task, Logstash takes more than 10 seconds (depending, of course, on many factors, such as the host’s compute power and configuration, the Java configuration, etc.). Waiting more than 10 seconds on every check while Logstash starts the JVM, prepares the Java and Ruby environments and parses the command line is a major obstacle when developing complex Logstash pipelines.

So we set out to build our own Logstash syntax checking tool.

Automate and Accelerate Logstash Syntax Checking

To reach our overarching goal of automating and accelerating Logstash syntax checking, we set ourselves the following key development objectives:

  • Provide real-time capability
  • Implement a Logstash configuration editor that closely resembles modern IDEs
  • Ensure that our custom version can easily be updated to future Logstash releases
  • Enhance Logstash without making changes to standard Logstash code

Syntax Checking Set-up in Standard Logstash Release

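In the official Logstash release, every configuration syntax test goes through the same simplified workflow: start the JVM, prepare the Java and Ruby environments, parse the command line, check the configuration and exit.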

But this set-up has the aforementioned downsides:

  • No real-time syntax checking
  • Very slow performance in preparing the environment for syntax testing

Enhanced Logstash Syntax Checking

The general concept behind our enhanced Logstash syntax checking set-up is to add a separate entry point to the Java program (let’s call it LogstashSyntax.java) and to let a shell script decide which entry point runs the requested Logstash action, either:

  • the existing Logstash.java entry point; or
  • the custom LogstashSyntax.java entry point
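The dispatch logic can be pictured with a short sketch. The real launchers are shell scripts (bin/logstash and bin/logstash-syntax in the commands used later in this article); the Python below only illustrates the idea. org.logstash.Logstash is the standard Logstash main class, while the org.logstash.LogstashSyntax class name, the ENTRY_POINTS table, build_java_command and the classpath default are assumptions made for illustration.

```python
import os

# Hypothetical mapping from launcher script name to Java main class.
# org.logstash.Logstash is the standard Logstash entry point; the package
# and name of the syntax-checker class are assumptions.
ENTRY_POINTS = {
    "logstash": "org.logstash.Logstash",
    "logstash-syntax": "org.logstash.LogstashSyntax",
}


def build_java_command(launcher_path, args, classpath="logstash-core/lib/jars/*"):
    """Pick the entry point from the launcher's file name and build a java command.

    The classpath default is a placeholder, not the exact Logstash layout.
    """
    launcher = os.path.basename(launcher_path)
    if launcher not in ENTRY_POINTS:
        raise ValueError(f"unknown launcher: {launcher}")
    # Same JVM invocation for both launchers; only the main class differs.
    return ["java", "-cp", classpath, ENTRY_POINTS[launcher], *args]


if __name__ == "__main__":
    print(build_java_command("./bin/logstash-syntax", ["-f", "pipeline", "-t", "--port", "8080"]))
```

Because only the main class changes, the standard Logstash.java code path stays untouched, which matches the objective of enhancing Logstash without modifying its standard code.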

The custom LogstashSyntax.java entry point performs the same tasks as the Logstash.java entry point, with one difference. Logstash.java shuts down and exits once the environment is prepared and the requested actions are performed. LogstashSyntax.java does not exit. Instead, it starts an embedded Jetty server that serves REST API requests.
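The key difference is that the expensive environment preparation happens once and the process then stays alive. The following minimal Python sketch shows that pattern. It uses the standard library’s http.server in place of Java and embedded Jetty. The /api/syntax-check path is taken from this article. Everything else is hypothetical, including the JSON response shape and the names check_braces, make_server and post_config. check_braces is a toy brace counter standing in for Logstash’s real configuration parser.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_braces(config):
    """Toy stand-in for Logstash's parser: only checks that braces balance."""
    depth = 0
    for offset, char in enumerate(config):
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                return f"unexpected '}}' at byte {offset}"
    return None if depth == 0 else f"{depth} unclosed '{{'"


def make_server(checker, host="127.0.0.1", port=8080):
    """Wrap an already prepared checker in a long-running HTTP server.

    Any expensive set-up happens once, before the server starts; each
    request afterwards only pays for the check itself.
    """

    class SyntaxCheckHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/api/syntax-check":
                self.send_error(404)
                return
            # The request body is the raw pipeline configuration text.
            length = int(self.headers.get("Content-Length", 0))
            config = self.rfile.read(length).decode("utf-8")
            error = checker(config)
            # Hypothetical response shape, chosen for this sketch only.
            body = json.dumps({"valid": error is None, "error": error}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging for the demo

    return HTTPServer((host, port), SyntaxCheckHandler)


def post_config(url, config):
    """Rough Python equivalent of the curl call: POST config text, decode the JSON reply."""
    request = urllib.request.Request(url, data=config.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    make_server(check_braces).serve_forever()
```

Calling serve_forever() keeps the process running, so start-up is paid once rather than per check. In the article’s service, the check itself runs inside the already prepared Logstash environment instead of a toy function.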

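In our custom Logstash release, the simplified workflow changes. The JVM and the Java and Ruby environments are prepared only once, when the service starts. Each subsequent syntax test is just a REST request to the already running server.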

The addition of the Logstash syntax checker service does not change Logstash’s real-time pipelining capabilities. Logstash and the Logstash syntax checker can work in parallel, even on the same machine.

Deployment of Enhanced Logstash Release with Real-time Syntax Checking

Let’s take a simple Dockerfile configuration and build a Docker image from which we can start both Logstash and the Logstash syntax checker service:

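# Build stage: compile the customized Logstash from source
FROM openjdk:8 AS build

# git is needed to fetch the sources
RUN apt-get update -y && apt-get install -y git

# clone the trovent/logstash repository
RUN git clone https://github.com/trovent/logstash.git
WORKDIR /logstash

# build the tar distribution and unpack it under build/
RUN ./gradlew unpackTarDistribution

# Runtime stage: a slim JRE image containing only the built distribution
FROM openjdk:8-jre-alpine

# copy the contents of the unpacked logstash-*-SNAPSHOT directory from the build stage
RUN mkdir /usr/share/logstash
COPY --from=build /logstash/build/logstash-*-SNAPSHOT /usr/share/logstash

# prepare working dir and an empty directory for pipeline configs
WORKDIR /usr/share/logstash
RUN mkdir pipeline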

Let’s build the custom Logstash Docker image with:

docker build -t trovent/logstash .

As we now have our Docker image ready, we have a couple of options for using this image. First, we can use it for Logstash’s main purpose – real-time pipelining. And in this regard, our Logstash instance does not differ from the official Logstash release:

docker run --rm --name logstash trovent/logstash sh ./bin/logstash -e 'input {}'

In addition, an independent service can be run from the same Docker image and serve as the Logstash configuration syntax checker:

docker run --rm -p 8000:8080 \
  --name logstash_syntax_checker trovent/logstash \
  sh ./bin/logstash-syntax -f pipeline -t --port 8080

Confirming the Real-time Syntax Checker Does What It Should

Now, let’s test our enhanced Logstash syntax checker. We started the REST service on port 8080 inside the Docker container by passing the --port parameter to the logstash-syntax command. We also mapped that container port to port 8000 on the host (-p 8000:8080).

Here is an example of testing invalid Logstash syntax (note the misspelled section name inputs instead of input):

curl -X POST 'http://localhost:8000/api/syntax-check' \
  --data-raw 'inputs { stdin { } } output { stdout { } }'

The response is the following:

  "error":"Expected one of [ \t\r\n], "#", "{" at line 1, column 6 (byte 6) after input"

We can now run the syntax check and return a structured response to the client, containing the error message and the line and column where the error appeared. We can also deliver that response within milliseconds!
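If a client only has the error string, as in the response shown above, it can still extract the position and highlight the offending spot, much like an IDE would. The following Python sketch is purely illustrative. The regular expression assumes the "at line N, column M (byte B)" wording seen in the example response, and ConfigError, parse_config_error and show_error_position are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed wording of the position suffix, taken from the example response:
# "... at line 1, column 6 (byte 6) after input"
POSITION_RE = re.compile(r"at line (\d+), column (\d+) \(byte (\d+)\)")


class ConfigError(NamedTuple):
    message: str
    line: int    # 1-based
    column: int  # 1-based
    byte: int


def parse_config_error(message: str) -> Optional[ConfigError]:
    """Extract line, column and byte offset from an error message, if present."""
    match = POSITION_RE.search(message)
    if match is None:
        return None
    line, column, byte = (int(group) for group in match.groups())
    return ConfigError(message, line, column, byte)


def show_error_position(config: str, error: ConfigError) -> str:
    """Return the offending config line with a caret under the reported column."""
    source_line = config.splitlines()[error.line - 1]
    return source_line + "\n" + " " * (error.column - 1) + "^"


if __name__ == "__main__":
    config = "inputs { stdin { } } output { stdout { } }"
    message = r'Expected one of [ \t\r\n], "#", "{" at line 1, column 6 (byte 6) after input'
    error = parse_config_error(message)
    if error is not None:
        print(show_error_position(config, error))
```

Run as a script, this prints the configuration line with a caret under the s of inputs, the first character that does not fit the expected input keyword.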

With that, we achieved our key objective of automating and accelerating Logstash syntax checking!

If the concept of real-time Logstash syntax checking is potentially of value in your own Logstash implementations, please feel free to try it for yourself: https://github.com/trovent/logstash

Of course, we would be happy to hear your feedback. Please don’t hesitate to contact us!