Tag: data visualization

July 22, 2022

Kibana Visualization – Vega Line Chart with Baseline

There’s often a difference between hypothetical (e.g. the physics formula answer) and real results — sometimes this is because sciences will ignore “negligible” factors that can be, well, more than negligible, sometimes this is because the “real world” isn’t perfect. In transmission media, this difference is a measurable “loss” — hypothetically, we know we could send X data in Y delta-time, but we only sent X’. Loss also happens because stuff breaks — metal corrodes, critters nest in fiber junction boxes, dirt builds up on a dish. And it’s not easy, when looking at loss data at a single point in time, to identify what’s normal loss and what’s a problem.

We’re starting a project to record a baseline of loss for all sorts of things — this will allow individuals to check the current loss data against that which engineers say “this is as good as it’s gonna get”. If the current value is close … there’s not a problem. If there’s a big difference … someone needs to go fix something.

Unfortunately, creating a graph in Kibana that shows the baseline was … not trivial. There is a rule mark that allows you to draw a straight line between two points. You cannot just say “draw a line at y from 0 to some large value that’s going to be off the graph. The line doesn’t render (say, 0 => today or the year 2525). You cannot just get the max value of the axis.

I finally stumbled across a series of data contortions that make the baseline graphable.

The data sets I have available have a datetime object (when we measured this loss) and a loss value. For scans, there may be lots of scans for a single device. For baselines, there will only be one record.

The joinaggregate transformation method — which appends the value to each element of the data set — was essential because I needed to know the largest datetime value that would appear in the chart.

, {“type”: “joinaggregate”, “fields”: [“transformedtimestamp”], “ops”: [“max”], “as”: [“maxtime”]}

The lookup transformation method — which can access elements from other data sets — allowed me to get that maximum timestamp value into the baseline data set. Except … lookup needs an exact match in the search field. Luckily, it does return a random (I presume either first or last … but it didn’t matter in this case because all records have the same max date value) record when multiple matches are found.

So I used a formula transformation method to add a constant to each record as well

, {“type”: “formula”, “as”: “pi”, “expr”: “PI”}

Now that there’s a record to be found, I can add the max time from our scan data into our baseline data

, {“type”: “lookup”, “from”: “scandata”, “key”: “pi”, “fields”: [“pi”], “values”: [“maxtime”], “as”: [“maxtime”]}

Voila — a chart with a horizontal line at the baseline loss value. Yes, I randomly copied a record to use as the baseline and selected the wrong one (why some scans are below the “good as it’s ever going to get” baseline value!). But … once we have live data coming into the system, we’ll have reasonable looking graphs.

The full Vega spec for this graph:

{
    "$schema": "https://vega.github.io/schema/vega/v4.json",
      "description": "Scan data with baseline",
    "padding": 5,

    "title": {
        "text": "Scan Data",
        "frame": "bounds",
        "anchor": "start",
        "offset": 12,
        "zindex": 0
      },
    "data": [
    {
        "name": "scandata",
        "url": {
            "%context%": true,
            "%timefield%": "@timestamp",
            "index": "traces-*",
            "body": {
            "sort": [{
                "@timestamp": {
                    "order": "asc"
                }
            }],
            "size": 10000,
            "_source":["@timestamp","Events.Summary.total loss"]
            }
        }
        ,"format": { "property": "hits.hits"}
        ,"transform":[
            {"type": "formula", "expr": "datetime(datum._source['@timestamp'])", "as": "transformedtimestamp"}
            , {"type": "joinaggregate", "fields": ["transformedtimestamp"], "ops": ["max"], "as": ["maxtime"]}
            , {"type": "formula", "as": "pi", "expr": "PI"}
        ]
    }
  ,
   {
        "name": "baseline",
        "url": {
            "%context%": true,
            "index": "baselines*",
            "body": {
                "sort": [{
                    "@timestamp": {
                        "order": "desc"
                    }
                }],
                "size": 1,
                "_source":["@timestamp","Events.Summary.total loss"]
            }
        }
        ,"format": { "property": "hits.hits" }
        ,"transform":[
                {"type": "formula", "as": "pi", "expr": "PI"}
                , {"type": "lookup", "from": "scandata", "key": "pi", "fields": ["pi"], "values": ["maxtime"], "as": ["maxtime"]}
        ]
  }
]      
,
    "scales": [
      {
        "name": "x",
        "type": "point",
        "range": "width",
        "domain": {"data": "scandata", "field": "transformedtimestamp"}
      },
      {
        "name": "y",
        "type": "linear",
        "range": "height",
        "nice": true,
        "zero": true,
        "domain": {"data": "scandata", "field": "_source.Events.Summary.total loss"}
      }
    ],
        "axes": [
      {"orient": "bottom", "scale": "x"},
      {"orient": "left", "scale": "y"}
    ],
     "marks": [
                {
            "type": "line",
            "from": {"data": "scandata"},
            "encode": {
              "enter": {
                "x": { "scale": "x", "field": "transformedtimestamp", "type": "temporal",
      "timeUnit": "yearmonthdatehourminute"},
                "y": {"scale": "y",       "type": "quantitative","field": "_source.Events.Summary.total loss"},
                "strokeWidth": {"value": 2},
                "stroke": {"value": "green"}
              }
            }
          }
                 ,        {
            "type": "rule",
            "from": {"data": "baseline"},
            "encode": {
              "enter": {
                "stroke": {"value": "#652c90"},
                "x": {"scale": "x", "value": 0},
                "y": {"scale": "y",      "type": "quantitative","field": "_source.Events.Summary.total loss"},
                "x2": {"scale": "x","field": "maxtime", "type": "temporal"},
                "strokeWidth": {"value": 4},
                "opacity": {"value": 0.3}
              }
            }
          }
     ]         
}

July 21, 2022

Vega Visualization when Data Element Name Contains At Symbol

We have data created by an external source (i.e. I cannot just change the names used so it works) — the datetime field is named @timestamp and I had an awful time figuring out out how to address that element within a transformation expression.

Just to make sure I wasn’t doing something silly, I created a copy of the data element named without the at symbol. Voila – transformedtimestamp is populated with a datetime element.

I finally figured it out – it appears that I have encountered a JavaScript limitation. Instead of using the dot-notation to access the element, the array subscript method works – not datum.@timestamp in any iteration or with any combination of escapes.

July 15, 2022

Kibana Vega Chart with Query

I have finally managed to produce a chart that includes a query — I don’t want to have to walk all of the help desk users through setting up the query, although I figured having the ability to select your own time range would be useful.

{
  $schema: https://vega.github.io/schema/vega-lite/v2.json
  title: User Logon Count

  // Define the data source
  data: {
    url: {
      // Which index to search
      index: firewall_logs*

      body: {
        _source: ['@timestamp', 'user', 'action']

"query": {
	"bool": {
		"must": [{
				"query_string": {
					"default_field": "subtype",
					"query": "user"
				}
			},
	   {
				"range": {
					"@timestamp": {
						"%timefilter%": true
                    			}
                  		}
     	}]
	}
}

        
        aggs: {
          time_buckets: {
            date_histogram: {
              field: @timestamp
              interval: {%autointerval%: true}
              extended_bounds: {
                // Use the current time range's start and end
                min: {%timefilter%: "min"}
                max: {%timefilter%: "max"}
              }
              // Use this for linear (e.g. line, area) graphs.  Without it, empty buckets will not show up
              min_doc_count: 0
            }
          }
        }
        size: 0
      }
    }
    format: {property: "aggregations.time_buckets.buckets"}
  }
  mark: point
  encoding: {
    x: {
      field: key
      type: temporal
      axis: {title: false} // Don't add title to x-axis
    }
    y: {
      field: doc_count
      type: quantitative
      axis: {title: "Document count"}
    }
  }
}

July 14, 2022

Debugging Vega Graphs in Kibana

If you open the browser’s developer console, you can access debugging information. This works when you are editing a visualization as well as when you are viewing one. To see a list of available functions, type VEGA_DEBUG. and a drop-down will show you what’s available. The command “VEGA_DEBUG.vega_spec” outputs pretty much everything about the chart.

To access the data set being graphed with the Vega Lite grammar, use “VEGA_DEBUG.view.data(“source_0)” — if you are using the Vega grammar, use the dataset name that you have defined.

July 13, 2022

Kibana – Visualizations and Dashboards

Kibana – Creating Visualizations

General

Time Series Visualization Pipeline

Kibana – Creating a Dashboard

Kibana – Creating Visualizations

General

To create a new visualization, select the visualization icon from the left-hand navigation menu and click “Create visualization”. You’ll need to select the type of visualization you want to create.

TSVB (Time Series Visualization Builder)

The Time Series Visualization Pipeline is a GUI visualization builder to create graphs from time series data. This means the x-axis will be datetime values and the y-axis will the data you want to visualize over the time period. To create a new visualization of this type, select “TSVB” on the “New Visualization” menu.

Scroll down and select “Panel options” – here you specify the index you want to visualize. Select the field that will be used as the time for each document (e.g. if your document has a special field like eventOccuredAt, you’d select that here). I generally leave the time interval at ‘auto’ – although you might specifically want to present a daily or hourly report.

Once you have selected the index, return to the “Data” tab. First, select the type of aggregation you want to use. In this example, we are showing the number of documents for a variety of policies.

The “Group by” dropdown allows you to have chart lines for different categories (instead of just having the count of documents over the time series, which is what “Everything” produces) – to use document data to create the groupings, select “Terms”.

Select the field you want to group on – in this case, I want the count for each unique “policyname” value, so I selected “policyname.keyword” as the grouping term.

Voila – a time series chart showing how many documents are found for each policy name. Click “Save” at the top left of the chart to save the visualization.

Provide a name for the visualization, write a brief description, and click “Save”. The visualization will now be available for others to view or for inclusion in dashboards.

TimeLion

TimeLion looks like it is going away soon, but it’s what I’ve seen as the recommendation for drawing horizontal lines on charts.

This visualization type is a little cryptic – you need to enter Timelion expression — .es() retrieves data from ElasticSearch, .value(3500) draws a horizontal line at 3,500

If there is null data at a time value, TimeLion will draw a discontinuous line. You can modify this behavior by specifying a fit function.

Note that you’ll need to click “Update” to update the chart before you are able to save the visualization.

Vega

Vega is an experimental visualization type.

This is, by far, the most flexible but most complex approach to creating a visualization. I’ve used it to create the Sankey visualization showing the source and destination countries from our firewall logs. Both Vega and Vega-Lite grammars can be used. ElasticCo provides a getting started guide, and there are many example online that you can use as the basis for your visualization.

Kibana – Creating a Dashboard

To create a dashboard, select the “Dashboards” icon on the left-hand navigation bar. Click “Create dashboard”

Click “Add an existing” to add existing visualizations to the dashboard.

Select the dashboards you want added, then click “Save” to save your dashboard.

Provide a name and brief description, then click “Save”.

July 27, 2020

Data Visualization – Renters at Risk of Eviction

Tonight, I saw a post about the percent of rental households facing eviction — a staggering statistic on its own, but percentages can hide large or small numbers. 22% or 55% of households facing eviction sounds awful, but how awful depends on how many households rent or own in the state. The government, however, publishes a lot of data about US households. Data profiles for 2018 are available, so I’m using the number of renters by state in 2018 to translate percentages into households.

This chart represents 18 million households facing eviction — this is housing units, not number of people. One household may be one person or it may be ten people.

Beyond the immediately obvious question of “where are all of these people going once they are evicted?, there is peripheral impact — a lot of rental units are financed. How are property owners paying when a quarter of their rental units vacant? Are their kids still going to school? The government insisted on rescuing “too big to fail” banks. Ten or twenty million homeless people across the country seems more dire. Yet our government cannot pass an unemployment extension in a timely fashion.

There’s a historical hypothesis that the relative stability of the past couple hundred years was achieved by maintaining a large middle class. People constantly plagued by poverty and starvation are open to suggested alternatives — what do you have to lose? But give 60% a little something — a decent place to live, enough food, transportation, a little extra money for “fun stuff” and you’ve got a populace invested in maintaining the status quo (however inequitable that may be). Mass unemployment and homelessness is the stuff of mass protest and revolution.

March 12, 2020

SARS CoV-2 Data

Visualization from Johns Hopkins Uni Center for Systems Science and Engineering: https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

Testing Stats: https://www.cdc.gov/coronavirus/2019-ncov/testing-in-us.html

Interesting combination of data — there have been 13,624 tests (although the data points for the past few days is currently incomplete) and 1,663 infections. That means like 87% of the people who have been tested weren’t infected. Which could be that they’ve been tested before they are infected enough, or it could be that there are a LOT of uninfected people getting tested. Since the actual number of tests is going to be higher, the percent actually infected is lower.