Elasticsearch + Logstash + Kibana to geo identify our users


While I was updating my DevOps Days Warsaw 2014 talk (slides) I decided it would be good to show how to use multiple filters in Logstash – for example grok combined with the geoip filter, which enriches our log data with the location of the visitors of our site. If you are interested in that, and later in how to use Kibana 4 to visualize the data, I hope you will find this post useful.

The data

We start with the simplest data I could get – access logs from http://solr.pl and http://elasticsearchserverbook.com. Each log line looks similar to the following one:

180.76.4.169 - - [11/Mar/2015:00:21:11 +0100] "GET / HTTP/1.1" 301 191 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

We have the caller IP address, the target resource, the response code, and so on. The data is available on our GitHub account, in the repository dedicated to the updated talk – https://github.com/solrpl/zerotohero.

Logstash configuration

Because we are not processing logs continuously, we will just run Logstash once. I’ll write a dedicated post showing how to use log shippers to deliver logs continuously, but this blog entry is not about that. So, to make Logstash work we need three things:

  • input – the definition of where the data comes from
  • filters – the definition of how the input data will be processed
  • output – the definition of where the processed data will be sent

Let’s look at each configuration section now.

Input

I put the data in the /home/gr0/hero.log file, so our Logstash input definition looks as follows:

input {
  file {
    path => "/home/gro/hero.log"
    type => "access_log"
    start_position => "beginning"
  }
}

The path property specifies the location of the data file, the type property is our name for the data, and start_position set to beginning tells Logstash that we want to process the whole file from the start.
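
One thing worth knowing when experimenting: the file input remembers how far it has read by keeping a so-called sincedb file, so re-running Logstash against the same file may not re-process it from the beginning. A minimal sketch of how to disable that tracking during testing (the sincedb_path value below is my assumption for a throw-away setup, not something for production):

input {
  file {
    path => "/home/gr0/hero.log"
    type => "access_log"
    start_position => "beginning"
    # assumption: forget the read position so every run re-reads the whole file
    sincedb_path => "/dev/null"
  }
}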

Filters

Now let’s get to the filters definition:

filter {
  if [type] == "access_log"  {
    grok {
      match => {
        "message" => "%{COMBINEDAPACHELOG}"
      }
    }
    geoip {
      source => "clientip"
    }
  }
}

First we say that we want to process our access_log type and we define two filters – grok and geoip. The grok filter is responsible for parsing the access log – it takes the unstructured log lines from the defined input and gives each of them structure. The second filter is geoip, which adds location information to our data. For the geoip filter we are required to specify the source property, which names the field holding the IP address we are interested in – in our case it is the clientip field generated by the grok filter.
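
To give you an idea of what the filters produce, a processed event could look more or less like the following (this is only a sketch in rubydebug-like notation – the exact fields added by the COMBINEDAPACHELOG pattern and by the geoip filter depend on the Logstash version and the GeoIP database, and the location values below are made up for illustration):

{
       "message" => "180.76.4.169 - - [11/Mar/2015:00:21:11 +0100] \"GET / HTTP/1.1\" 301 191 \"\" \"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)\"",
          "type" => "access_log",
      "clientip" => "180.76.4.169",
          "verb" => "GET",
       "request" => "/",
      "response" => "301",
         "geoip" => {
        "country_name" => "China",
            "location" => [ 113.25, 23.12 ]
    }
}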

Output

The output is very simple as well, as we just want to push our data to a local Elasticsearch instance:

output {
  elasticsearch {
    host => "localhost"
    port => 9200
    protocol => "http"
    index => "logs_%{+YYYY.MM.dd}"
    manage_template => true
  }
}

We specify the host, port and protocol. Next we specify the index name pattern by using the index property, which in our case will result in indices like logs_2015.03.21, logs_2015.03.22 and so on. Finally, we say that Logstash should manage the Elasticsearch index template, so we don’t have to care about it at all – this is done by setting manage_template to true.
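
If you are curious what template Logstash installed, you can ask Elasticsearch to list the registered index templates (just an illustration – the template name and its contents depend on the Logstash version and configuration):

curl 'localhost:9200/_template?pretty'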

Final configuration

So the final configuration stored in logstash.conf file looks as follows:

input {
  file {
    path => "/home/gro/hero.log"
    type => "access_log"
    start_position => "beginning"
  }
}

filter {
  if [type] == "access_log"  {
    grok {
      match => {
        "message" => "%{COMBINEDAPACHELOG}"
      }
    }
    geoip {
      source => "clientip"
    }
  }
}

output {
  elasticsearch {
    host => "localhost"
    port => 9200
    index => "logs_%{+YYYY.MM.dd}"
    protocol => "http"
    manage_template => true
  }
}

Starting Logstash processing

Now we just need to run:

bin/logstash -f logstash.conf

and wait for Logstash to finish 🙂
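
Before running the command it may also be worth validating the configuration file – recent Logstash 1.x releases accept a --configtest flag for that:

bin/logstash -f logstash.conf --configtest

Once Logstash is done, a quick sanity check is to ask Elasticsearch which of our daily indices were created and how many documents they hold (the index names depend on the dates present in your logs):

curl 'localhost:9200/_cat/indices/logs_*?v'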

Kibana

Now let’s visualize it by using Kibana 4.

Initializing Kibana

We start Kibana with the default configuration by using a simple command:

bin/kibana

After that you should see a log line in the Kibana console looking as follows:

{"@timestamp":"2015-03-15T14:21:41.769Z","level":"info","message":"Listening on 0.0.0.0:5601","node_env":"\"production\""}

That means that Kibana has started and we need to point our web browser to localhost:5601 to initialize it further. The browser should show us the Kibana initialization screen, which should look similar to the following one:

kibana_initialization

The things to remember are to have both “Index contains time-based events” and “Use event times to create index names” checked, the “Index pattern interval” set to match our time based indices – in our case daily – and the “Index name or pattern” set to the pattern we use, which in our case is [logs_]YYYY.MM.DD. Finally, we choose the time field – @timestamp in our case – and click “Create“.
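
If you want to double-check that @timestamp was indexed as a date field (Kibana only offers date fields in the time field dropdown), you can look at the mapping of one of the indices – use an index name that matches your data:

curl 'localhost:9200/logs_2015.03.11/_mapping?pretty'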

Creating map with user locations

Now all we have to do is go to the Visualize tab and choose Tile map:

kibana_tile_map

We choose “From a new search“:

kibana_new_search

We are now at a page where we need to choose the aggregation type. We are interested in the “Geo coordinates” aggregation, which will be based on the location information that we configured Logstash to extract and add to the data:

kibana_geo_coordinates

We need to choose the Geohash aggregation and the geoip.location field:

kibana_geo_coordinates_done

Once that is done, we should see where our users come from:

kibana_full

And that’s all folks 🙂