Month: July 2022

Kibana Vega Chart with Query

I have finally managed to produce a chart that includes a query — I don’t want to have to walk all of the help desk users through setting up the query, although I figured having the ability to select your own time range would be useful.

{
  $schema: https://vega.github.io/schema/vega-lite/v2.json
  title: User Logon Count

  // Define the data source
  data: {
    url: {
      // Which index to search
      index: firewall_logs*

      body: {
        _source: ['@timestamp', 'user', 'action']

"query": {
	"bool": {
		"must": [{
				"query_string": {
					"default_field": "subtype",
					"query": "user"
				}
			},
	   {
				"range": {
					"@timestamp": {
						"%timefilter%": true
                    			}
                  		}
     	}]
	}
}

        
        aggs: {
          time_buckets: {
            date_histogram: {
              field: @timestamp
              interval: {%autointerval%: true}
              extended_bounds: {
                // Use the current time range's start and end
                min: {%timefilter%: "min"}
                max: {%timefilter%: "max"}
              }
              // Use this for linear (e.g. line, area) graphs.  Without it, empty buckets will not show up
              min_doc_count: 0
            }
          }
        }
        size: 0
      }
    }
    format: {property: "aggregations.time_buckets.buckets"}
  }
  mark: point
  encoding: {
    x: {
      field: key
      type: temporal
      axis: {title: false} // Don't add title to x-axis
    }
    y: {
      field: doc_count
      type: quantitative
      axis: {title: "Document count"}
    }
  }
}

Debugging Vega Graphs in Kibana

If you open the browser’s developer console, you can access debugging information. This works when you are editing a visualization as well as when you are viewing one. To see a list of available functions, type VEGA_DEBUG. and a drop-down will show you what’s available. The command “VEGA_DEBUG.vega_spec” outputs pretty much everything about the chart.

To access the data set being graphed with the Vega Lite grammar, use “VEGA_DEBUG.view.data(“source_0)” — if you are using the Vega grammar, use the dataset name that you have defined.

Kibana – Visualizations and Dashboards

Kibana – Creating Visualizations

General

Time Series Visualization Pipeline

Kibana – Creating a Dashboard

Kibana – Creating Visualizations

General

To create a new visualization, select the visualization icon from the left-hand navigation menu and click “Create visualization”. You’ll need to select the type of visualization you want to create.

TSVB (Time Series Visualization Builder)

The Time Series Visualization Pipeline is a GUI visualization builder to create graphs from time series data. This means the x-axis will be datetime values and the y-axis will the data you want to visualize over the time period. To create a new visualization of this type, select “TSVB” on the “New Visualization” menu.

Scroll down and select “Panel options” – here you specify the index you want to visualize. Select the field that will be used as the time for each document (e.g. if your document has a special field like eventOccuredAt, you’d select that here). I generally leave the time interval at ‘auto’ – although you might specifically want to present a daily or hourly report.

Once you have selected the index, return to the “Data” tab. First, select the type of aggregation you want to use. In this example, we are showing the number of documents for a variety of policies.

The “Group by” dropdown allows you to have chart lines for different categories (instead of just having the count of documents over the time series, which is what “Everything” produces) – to use document data to create the groupings, select “Terms”.

Select the field you want to group on – in this case, I want the count for each unique “policyname” value, so I selected “policyname.keyword” as the grouping term.

Voila – a time series chart showing how many documents are found for each policy name. Click “Save” at the top left of the chart to save the visualization.

Provide a name for the visualization, write a brief description, and click “Save”. The visualization will now be available for others to view or for inclusion in dashboards.

TimeLion

TimeLion looks like it is going away soon, but it’s what I’ve seen as the recommendation for drawing horizontal lines on charts.

This visualization type is a little cryptic – you need to enter Timelion expression — .es() retrieves data from ElasticSearch, .value(3500) draws a horizontal line at 3,500

If there is null data at a time value, TimeLion will draw a discontinuous line. You can modify this behavior by specifying a fit function.

Note that you’ll need to click “Update” to update the chart before you are able to save the visualization.

Vega

Vega is an experimental visualization type.

This is, by far, the most flexible but most complex approach to creating a visualization. I’ve used it to create the Sankey visualization showing the source and destination countries from our firewall logs. Both Vega and Vega-Lite grammars can be used. ElasticCo provides a getting started guide, and there are many example online that you can use as the basis for your visualization.

Kibana – Creating a Dashboard

To create a dashboard, select the “Dashboards” icon on the left-hand navigation bar. Click “Create dashboard”

Click “Add an existing” to add existing visualizations to the dashboard.

Select the dashboards you want added, then click “Save” to save your dashboard.

Provide a name and brief description, then click “Save”.

 

Kibana Sankey Visualization

Now that we’ve got a lot of data being ingested into our ELK platform, I am beginning to build out visualizations and dashboards. This Vega visualization (source below) shows the number of connections between source and destination countries.

{ 
  $schema: https://vega.github.io/schema/vega/v3.0.json
  data: [
    {
      // Respect currently selected time range and filter string
      name: rawData
      url: {
        %context%: true
        %timefield%: @timestamp
        index: firewall_logs*
        body: {
          size: 0
          aggs: {
            table: {
              composite: {
                size: 10000
                sources: [
                  {
                    source_country: {
                     terms: {field: "srccountry.keyword"}
                    }
                  }
                  {
                    dest_country: {
                     terms: {field: "dstcountry.keyword"}
                    }
                  }
                ]
              }
            }
          }
        }
      }
      format: {property: "aggregations.table.buckets"}
      // Add aliases for data.* elements
      transform: [
        {type: "formula", expr: "datum.key.source_country", as: "source_country"}
        {type: "formula", expr: "datum.key.dest_country", as: "dest_country"}
        {type: "formula", expr: "datum.doc_count", as: "size"}
      ]
    }
    {
      name: nodes
      source: rawData
      transform: [
        // Filter to selected country
        {
          type: filter
          expr: !groupSelector || groupSelector.source_country == datum.source_country || groupSelector.dest_country == datum.dest_country
        }
        {type: "formula", expr: "datum.source_country+datum.dest_country", as: "key"}
        {
          type: fold
          fields: ["source_country", "dest_country"]
          as: ["stack", "grpId"]
        }
        {
          type: formula
          expr: datum.stack == 'source_country' ? datum.source_country+' '+datum.dest_country : datum.dest_country+' '+datum.source_country
          as: sortField
        }
        {
          type: stack
          groupby: ["stack"]
          sort: {field: "sortField", order: "descending"}
          field: size
        }
        {type: "formula", expr: "(datum.y0+datum.y1)/2", as: "yc"}
      ]
    }
    {
      name: groups
      source: nodes
      transform: [
        // Aggregate country groups and include number of documents for each grouping
        {
          type: aggregate
          groupby: ["stack", "grpId"]
          fields: ["size"]
          ops: ["sum"]
          as: ["total"]
        }
        {
          type: stack
          groupby: ["stack"]
          sort: {field: "grpId", order: "descending"}
          field: total
        }
        {type: "formula", expr: "scale('y', datum.y0)", as: "scaledY0"}
        {type: "formula", expr: "scale('y', datum.y1)", as: "scaledY1"}
        {type: "formula", expr: "datum.stack == 'source_country'", as: "rightLabel"}
        {
          type: formula
          expr: datum.total/domain('y')[1]
          as: percentage
        }
      ]
    }
    {
      name: destinationNodes
      source: nodes
      transform: [
        {type: "filter", expr: "datum.stack == 'dest_country'"}
      ]
    }
    {
      name: edges
      source: nodes
      transform: [
        {type: "filter", expr: "datum.stack == 'source_country'"}
        {
          type: lookup
          from: destinationNodes
          key: key
          fields: ["key"]
          as: ["target"]
        }
        {
          type: linkpath
          orient: horizontal
          shape: diagonal
          sourceY: {expr: "scale('y', datum.yc)"}
          sourceX: {expr: "scale('x', 'source_country') + bandwidth('x')"}
          targetY: {expr: "scale('y', datum.target.yc)"}
          targetX: {expr: "scale('x', 'dest_country')"}
        }
        // Calculation to determine line thickness
        {
          type: formula
          expr: range('y')[0]-scale('y', datum.size)
          as: strokeWidth
        }
        {
          type: formula
          expr: datum.size/domain('y')[1]
          as: percentage
        }
      ]
    }
  ]
  scales: [
    {
      name: x
      type: band
      range: width
      domain: ["source_country", "dest_country"]
      paddingOuter: 0.05
      paddingInner: 0.95
    }
    {
      name: y
      type: linear
      range: height
      domain: {data: "nodes", field: "y1"}
    }
    {
      name: color
      type: ordinal
      range: category
      domain: {data: "rawData", fields: ["source_country", "dest_country"]}
    }
    {
      name: stackNames
      type: ordinal
      range: ["Source Country", "Destination Country"]
      domain: ["source_country", "dest_country"]
    }
  ]
  axes: [
    {
      orient: bottom
      scale: x
      encode: {
        labels: {
          update: {
            text: {scale: "stackNames", field: "value"}
          }
        }
      }
    }
    {orient: "left", scale: "y"}
  ]
  marks: [
    {
      type: path
      name: edgeMark
      from: {data: "edges"}
      clip: true
      encode: {
        update: {
          stroke: [
            {
              test: groupSelector && groupSelector.stack=='source_country'
              scale: color
              field: dest_country
            }
            {scale: "color", field: "source_country"}
          ]
          strokeWidth: {field: "strokeWidth"}
          path: {field: "path"}
          strokeOpacity: {
            signal: !groupSelector && (groupHover.source_country == datum.source_country || groupHover.dest_country == datum.dest_country) ? 0.9 : 0.3
          }
          zindex: {
            signal: !groupSelector && (groupHover.source_country == datum.source_country || groupHover.dest_country == datum.dest_country) ? 1 : 0
          }
          tooltip: {
            signal: datum.source_country + ' → ' + datum.dest_country + '    ' + format(datum.size, ',.0f') + '   (' + format(datum.percentage, '.1%') + ')'
          }
        }
        hover: {
          strokeOpacity: {value: 1}
        }
      }
    }
    {
      type: rect
      name: groupMark
      from: {data: "groups"}
      encode: {
        enter: {
          fill: {scale: "color", field: "grpId"}
          width: {scale: "x", band: 1}
        }
        update: {
          x: {scale: "x", field: "stack"}
          y: {field: "scaledY0"}
          y2: {field: "scaledY1"}
          fillOpacity: {value: 0.6}
          tooltip: {
            signal: datum.grpId + '   ' + format(datum.total, ',.0f') + '   (' + format(datum.percentage, '.1%') + ')'
          }
        }
        hover: {
          fillOpacity: {value: 1}
        }
      }
    }
    {
      type: text
      from: {data: "groups"}
      interactive: false
      encode: {
        update: {
          x: {
            signal: scale('x', datum.stack) + (datum.rightLabel ? bandwidth('x') + 8 : -8)
          }
          yc: {signal: "(datum.scaledY0 + datum.scaledY1)/2"}
          align: {signal: "datum.rightLabel ? 'left' : 'right'"}
          baseline: {value: "middle"}
          fontWeight: {value: "bold"}
          // Do not show labels on smaller items
          text: {signal: "abs(datum.scaledY0-datum.scaledY1) > 13 ? datum.grpId : ''"}
        }
      }
    }
    {
      type: group
      data: [
        {
          name: dataForShowAll
          values: [{}]
          transform: [{type: "filter", expr: "groupSelector"}]
        }
      ]
      // Set button size and positioning
      encode: {
        enter: {
          xc: {signal: "width/2"}
          y: {value: 30}
          width: {value: 80}
          height: {value: 30}
        }
      }
      marks: [
        {
          type: group
          name: groupReset
          from: {data: "dataForShowAll"}
          encode: {
            enter: {
              cornerRadius: {value: 6}
              fill: {value: "#f5f5f5"}
              stroke: {value: "#c1c1c1"}
              strokeWidth: {value: 2}
              // use parent group's size
              height: {
                field: {group: "height"}
              }
              width: {
                field: {group: "width"}
              }
            }
            update: {
              opacity: {value: 1}
            }
            hover: {
              opacity: {value: 0.7}
            }
          }
          marks: [
            {
              type: text
              interactive: false
              encode: {
                enter: {
                  xc: {
                    field: {group: "width"}
                    mult: 0.5
                  }
                  yc: {
                    field: {group: "height"}
                    mult: 0.5
                    offset: 2
                  }
                  align: {value: "center"}
                  baseline: {value: "middle"}
                  fontWeight: {value: "bold"}
                  text: {value: "Show All"}
                }
              }
            }
          ]
        }
      ]
    }
  ]
  signals: [
    {
      name: groupHover
      value: {}
      on: [
        {
          events: @groupMark:mouseover
          update: "{source_country:datum.stack=='source_country' && datum.grpId, dest_country:datum.stack=='dest_country' && datum.grpId}"
        }
        {events: "mouseout", update: "{}"}
      ]
    }
    {
      name: groupSelector
      value: false
      on: [
        {
          events: @groupMark:click!
          update: "{stack:datum.stack, source_country:datum.stack=='source_country' && datum.grpId, dest_country:datum.stack=='dest_country' && datum.grpId}"
        }
        {
          events: [
            {type: "click", markname: "groupReset"}
            {type: "dblclick"}
          ]
          update: "false"
        }
      ]
    }
  ]
}

Did you know … you can chat with yourself in Teams?

I’ll admit it — I send myself emails. And text messages. And, back before smart phones, I sent myself voice messages too. That was one item where Teams was as step backwards — I had to make my own Teams space (with just me as a member!) in order to send myself notes here. But not anymore …
You can finally chat with yourself! Note that retention policies may be shorter in chats than Teams spaces … so you might still want to go the route of creating your own Teams space to ensure that note you send yourself for an end-of-the-year task or a list of accomplishments for your annual review are still around when you need them. But for quick notes so you remember something tomorrow (or next week, or three weeks from now), chatting with yourself is perfect for holding short-term reminders.

 

Logstash, JRuby, and Private Temp

There’s a long-standing bug in logstash where the private temp folder created for jruby isn’t cleaned up when the logstash process exits. To avoid filling up the temp disk space, I put together a quick script to check the PID associated with each jruby temp folder, see if it’s an active process, and remove the temp folder if the associated process doesn’t exist.

When the PID has been re-used, this means we’ve got an extra /tmp/jruby-### folder hanging about … but each folder is only 10 meg. The impacting issue is when we’ve restarted logstash a thousand times and a thousand ten meg folders are hanging about.

This script can be cron’d to run periodically or it can be run when the logstash service launches.

import subprocess
import re

from shutil import rmtree

strResult = subprocess.check_output(f"ls /tmp", shell=True)

for strLine in strResult.decode("utf-8").split('\n'):
        if len(strLine) > 0 and strLine.startswith("jruby-"):
                listSplitFileNames = re.split("jruby-([0-9]*)", strLine)
                if listSplitFileNames[1] is not None:
                        try:
                                strCheckPID = subprocess.check_output(f"ps -efww |  grep {listSplitFileNames[1]} | grep -v grep", shell=True)
                                #print(f"PID check result is {strCheckPID}")
                        except:
                                print(f"I am deleting |{strLine}|")
                                rmtree(f"/tmp/{strLine}")