{"id":9097,"date":"2022-06-24T13:08:42","date_gmt":"2022-06-24T18:08:42","guid":{"rendered":"https:\/\/www.rushworth.us\/lisa\/?p=9097"},"modified":"2022-06-24T13:09:10","modified_gmt":"2022-06-24T18:09:10","slug":"9097","status":"publish","type":"post","link":"https:\/\/www.rushworth.us\/lisa\/?p=9097","title":{"rendered":"Logstash"},"content":{"rendered":"<p><strong>General Info<\/strong><\/p>\n<p>Logstash is a pipeline-based data processing service. Data comes into Logstash, is manipulated, and is sent elsewhere. The source is maintained on <a href=\"https:\/\/github.com\/elastic\/logstash\">GitHub by Elastic<\/a>.<\/p>\n<p><strong>Installation<\/strong><\/p>\n<p><a href=\"https:\/\/www.elastic.co\/downloads\/past-releases\/#logstash\">Logstash<\/a> was downloaded from Elastic and installed from a gzipped tar archive to the \/opt\/elk\/logstash folder.<\/p>\n<p><strong>Configuration<\/strong><\/p>\n<p>The Logstash server is configured using the <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/logstash-settings-file.html\">logstash.yml file<\/a>.<\/p>\n<p>Logstash uses Log4j 2 for logging. <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/logging.html#log4j2\">Logging configuration<\/a> is maintained in the log4j2.properties file.<\/p>\n<p>Logstash is Java-based, and the JVM settings are maintained in the jvm.options file \u2013 this includes min heap space, garbage collection configuration, JRuby settings, etc.<\/p>\n<p>Logstash loads the pipelines defined in \/opt\/elk\/logstash\/config\/pipelines.yml \u2013 each pipeline needs an ID and a path to its configuration. The path can be to a config file or to a folder of config files for the pipeline. 
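The config file such a path points at follows the input \u2013 filter \u2013 output structure covered in the sections below. As a minimal sketch (the port, field value, hosts, and index name here are illustrative assumptions, not our actual settings):

```
# Minimal pipeline config sketch - the port, hosts, and index name
# are illustrative assumptions, not production values
input {
  tcp {
    port  => 5055
    codec => "json"
  }
}
filter {
  mutate {
    add_field => { "pipeline" => "example" }
  }
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "example-%{+YYYY.MM.dd}"
  }
}
```

Any of the three sections can list multiple plugins, and the filter section can be omitted entirely if no manipulation is needed.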
The number of workers for the pipeline defaults to the number of CPU cores, so we normally define a worker count as well \u2013 this can be increased as load dictates.<\/p>\n<p>- pipeline.id: LJR<br \/>\n&nbsp;&nbsp;pipeline.workers: 2<br \/>\n&nbsp;&nbsp;path.config: &quot;\/opt\/elk\/logstash\/config\/ljr.conf&quot;<\/p>\n<p>Each pipeline is configured in an individual <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/configuration-file-structure.html\">config file<\/a> that defines the input, any data manipulation to be performed, and the output.<\/p>\n<p><strong>Testing Configuration<\/strong><\/p>\n<p>As we have it configured, you must reload Logstash to implement any configuration changes. As errors in pipeline definitions will prevent the pipeline from loading, it is best to test the configuration prior to restarting Logstash.<\/p>\n<p>\/opt\/elk\/logstash\/bin\/logstash --config.test_and_exit -f ljr_firewall_logs_wip.conf<\/p>\n<p>Warnings and errors may appear along the way \u2013 but if the test ends with &#8220;Configuration OK&#8221;, then it&#8217;s OK!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1076\" height=\"225\" class=\"wp-image-9098\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-5.png\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-5.png 1076w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-5-300x63.png 300w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-5-1024x214.png 1024w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-5-768x161.png 768w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-5-750x157.png 750w\" sizes=\"auto, (max-width: 1076px) 100vw, 1076px\" \/><\/p>\n<p><strong>Automatic Config Reload<\/strong><\/p>\n<p>The <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/7.7\/reloading-config.html\">configuration <em>can<\/em> automatically be reloaded<\/a> 
when changes to config files are detected. This doesn&#8217;t give you the opportunity to test a configuration prior to it going live on the server (once it&#8217;s saved, it will be loaded &#8230; or <em>not<\/em> loaded if there&#8217;s an error).<\/p>\n<p><strong>Input<\/strong><\/p>\n<p>Input tells Logstash what format of data the pipeline will receive \u2013 is JSON data being sent to the pipeline, is syslog sending log data to the pipeline, or does data come from STDIN? The types of data that can be received are defined by the <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/input-plugins.html\">input plugins<\/a>. Each input has its own configuration parameters. We use <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/plugins-inputs-beats.html\">Beats<\/a>, <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/plugins-inputs-syslog.html\">syslog<\/a>, JSON (a codec applied to an input, not an input plugin itself), and <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/plugins-inputs-kafka.html\">Kafka<\/a>.<\/p>\n<p>The input configuration also indicates which port to use for the pipeline \u2013 this needs to be unique!<\/p>\n<p>Input for a pipeline on port 5055 receiving JSON-formatted data:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"206\" height=\"97\" class=\"wp-image-9099\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-6.png\" \/><\/p>\n<p>Input for a pipeline on port 5100 (both TCP and UDP) receiving syslog data:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"336\" height=\"267\" class=\"wp-image-9100\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-7.png\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-7.png 336w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-7-300x238.png 300w\" sizes=\"auto, (max-width: 336px) 100vw, 336px\" 
\/><\/p>\n<p><strong>Output<\/strong><\/p>\n<p>Output is similarly simple \u2013 various <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/output-plugins.html\">output plugins<\/a> define the systems to which data can be shipped. Each output has its own configuration parameters \u2013 <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/plugins-outputs-elasticsearch.html\">ElasticSearch<\/a>, <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/plugins-outputs-kafka.html\">Kafka<\/a>, and <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/plugins-outputs-file.html\">file<\/a> are the three output plugins we currently use.<\/p>\n<p><strong>ElasticSearch<\/strong><\/p>\n<p>Most of the data we ingest into Logstash is processed and sent to ElasticSearch. The data is indexed and available to users through ES and Kibana.<\/p>\n<p><strong>Kafka<\/strong><\/p>\n<p>Some data is sent to Kafka basically as a holding queue. It is then picked up by the &#8220;aggregation&#8221; Logstash server, processed some more, and relayed to the ElasticSearch system.<\/p>\n<p><strong>File<\/strong><\/p>\n<p>File output is generally used for debugging \u2013 seeing the output data allows you to verify your data manipulations are working properly (as well as confirming that data is transiting the pipeline without resorting to tcpdump!).<\/p>\n<p><strong>Filter<\/strong><\/p>\n<p>Filtering allows data to be removed, attributes to be added to records, and data to be parsed into fields. The types of filters that can be applied are defined by the <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/master\/filter-plugins.html\">filter plugins<\/a>. Each plugin has its own documentation. 
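As a sketch of the kind of filtering we do (the firewall match strings come from our own pipelines; the path test and &#8216;sourcetype&#8217; value are assumptions for illustration):

```
filter {
  # Drop firewall noise - no reason to waste I/O indexing these records
  if "FIREWALL" in [message] or "id=firewall" in [message] or "FIREWALL_VRF" in [message] {
    drop { }
  }
  # Add a 'sourcetype' field based on the log file path
  # (the path pattern and value here are illustrative)
  if [log][file][path] =~ /httpd/ {
    mutate {
      add_field => { "sourcetype" => "apache" }
    }
  }
}
```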
Most of our data streams are filtered using Grok \u2013 see below for more details on that.<\/p>\n<p><a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/current\/event-dependent-configuration.html#conditionals\">Conditional rules<\/a> can be used in filters. This example filters out messages that contain the string &#8220;FIREWALL&#8221;, &#8220;id=firewall&#8221;, or &#8220;FIREWALL_VRF&#8221; because the business need does not require these messages \u2013 there&#8217;s no reason to waste disk space and I\/O processing, indexing, and storing them.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"824\" height=\"51\" class=\"wp-image-9101\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-8.png\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-8.png 824w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-8-300x19.png 300w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-8-768x48.png 768w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-8-800x51.png 800w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-8-750x46.png 750w\" sizes=\"auto, (max-width: 824px) 100vw, 824px\" \/><\/p>\n<p>This example adds a field, &#8216;sourcetype&#8217;, with a value that is based on the log file path.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"472\" height=\"334\" class=\"wp-image-9102\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-9.png\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-9.png 472w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-9-300x212.png 300w\" sizes=\"auto, (max-width: 472px) 100vw, 472px\" \/><\/p>\n<p><strong>Grok<\/strong><\/p>\n<p>The <a href=\"https:\/\/www.elastic.co\/guide\/en\/logstash\/master\/plugins-filters-grok.html\">grok filter<\/a> is a 
Logstash plugin that is used to extract data from log records \u2013 this allows us to pull important information into distinct fields within the ElasticSearch record. Instead of having the full message in the &#8216;message&#8217; field, you can have success\/failure, the logon user, or the source IP in its own field. This allows more robust reporting. If the use case <em>just<\/em> wants to store data, parsing the record may not be required. But, if users want to report on the number of users logged in per hour or how much data is sent to each IP address, we need to have the relevant fields available in the document.<\/p>\n<p><a href=\"https:\/\/github.com\/logstash-plugins\/logstash-patterns-core\">Patterns used by the grok filter are maintained in a Git repository<\/a> \u2013 the grok-patterns file contains the base patterns like &#8216;DATA&#8217; in %{DATA:fieldname}.<\/p>\n<p>The following are the ones I&#8217;ve used most frequently:<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Name<\/strong><\/td>\n<td><strong>Field Type<\/strong><\/td>\n<td><strong>Pattern Notes<\/strong><\/td>\n<td><strong>Notes<\/strong><\/td>\n<\/tr>\n<tr>\n<td>DATA<\/td>\n<td>Text data<\/td>\n<td>.*?<\/td>\n<td>A non-greedy match \u2013 it does not expand to the most matching characters, so looking for foo.*?bar in &#8220;<strong>foobar<\/strong> is not really a word, but foobar gets used a lot in IT documentation&#8221; will only match the bold portion of the text<\/td>\n<\/tr>\n<tr>\n<td>GREEDYDATA<\/td>\n<td>Text data<\/td>\n<td>.*<\/td>\n<td>A greedy match \u2013 it matches the <em>most<\/em> characters possible, so foo.*bar in &#8220;<strong>foobar is not really a word, but foobar<\/strong> gets used a lot in IT documentation&#8221; matches the whole bold portion of the text<\/td>\n<\/tr>\n<tr>\n<td>IPV4<\/td>\n<td>IPv4 address<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>IPV6<\/td>\n<td>IPv6 address<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>IP<\/td>\n<td>IP address 
\u2013 either v4 or v6<\/td>\n<td>(?:%{IPV6}|%{IPV4})<\/td>\n<td>This provides some flexibility as groups move to IPv6 \u2013 but it&#8217;s a more complex filter, so I&#8217;ve been using IPV4 with the understanding that we may need to adjust some parsing rules in the future<\/td>\n<\/tr>\n<tr>\n<td>LOGLEVEL<\/td>\n<td>Text data<\/td>\n<td><\/td>\n<td>Regex to match a list of standard log level strings \u2013 provides data validation over using DATA (i.e. if someone sets their log level to &#8220;superawful&#8221;, it won&#8217;t match)<\/td>\n<\/tr>\n<tr>\n<td>SYSLOGBASE<\/td>\n<td>Text data<\/td>\n<td><\/td>\n<td>This matches the standard start of a syslog record. Often used as &#8220;%{SYSLOGBASE} %{GREEDYDATA:msgtext}&#8221; to parse out the timestamp, facility, host, and program \u2013 the remainder of the text is mapped to &#8220;msgtext&#8221;<\/td>\n<\/tr>\n<tr>\n<td>URI<\/td>\n<td>Text data<\/td>\n<td><\/td>\n<td>A protocol:\/\/user@host:port\/path?query string is parsed into the protocol, user, host, path, and query parameters<\/td>\n<\/tr>\n<tr>\n<td>INT<\/td>\n<td>Numeric data<\/td>\n<td>(?:[+-]?(?:[0-9]+))<\/td>\n<td>Signed or unsigned integer<\/td>\n<\/tr>\n<tr>\n<td>NUMBER<\/td>\n<td>Numeric data<\/td>\n<td><\/td>\n<td>Can include a cast like %{NUMBER:fieldname:int} or %{NUMBER:fieldname:float}<\/td>\n<\/tr>\n<tr>\n<td>TIMESTAMP_ISO8601<\/td>\n<td>DateTime<\/td>\n<td>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?<\/td>\n<td>There are various other date patterns depending on how the string will be formatted. This is the one that matches YYYY-MM-DDThh:mm:ss (with either &#8216;T&#8217; or a space as the separator)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>Parsing an entire log string<\/strong><\/p>\n<p>In a system with a set format for log data, parsing the entire line is reasonable \u2013 and, often, there will be a prebuilt pattern for well-known log types. For example, 
if you are using the default Apache HTTPD log format, you don&#8217;t need to write a filter for each component of the log line \u2013 just match either the HTTPD_COMBINEDLOG or HTTPD_COMMONLOG pattern.<\/p>\n<p>match =&gt; { &quot;message&quot; =&gt; &quot;%{HTTPD_COMMONLOG}&quot; }<\/p>\n<p>But you can create your own filter as well \u2013 internally developed applications and less common vendor applications won&#8217;t have prebuilt filter rules.<\/p>\n<p>match =&gt; { &quot;message&quot; =&gt; &quot;%{TIMESTAMP_ISO8601:logtime} - %{IPV4:srcip} - %{IPV4:dstip} - %{DATA:result}&quot; }<\/p>\n<p><strong>Extracting an array of data<\/strong><\/p>\n<p>Instead of trying to map an entire line at once, you can extract individual data elements by matching an array of patterns within the message.<\/p>\n<p>match =&gt; { &quot;message&quot; =&gt; [&quot;srcip=%{IPV4:src_ip}&quot;,<br \/>\n&quot;srcport=%{NUMBER:srcport:int}&quot;,<br \/>\n&quot;dstip=%{IPV4:dst_ip}&quot;,<br \/>\n&quot;dstport=%{NUMBER:dstport:int}&quot;] }<\/p>\n<p>This means the IP and port information will be extracted regardless of the order in which the fields are written in the log record. This also allows you to parse data out of log records where multiple different formats are used (as an example, the NSS Firewall logs) instead of trying to write different parsers for each of the possible string combinations.<\/p>\n<p>Grok, by default, breaks when a match is found. This means you can &#8216;stack&#8217; different filters instead of using if tests. Sometimes, though, you don&#8217;t <em>want<\/em> to break when a match is found \u2013 maybe you are extracting a bit of data that gets used in another match. 
In these cases, you can set break_on_match to &#8216;false&#8217; in the grok rule.<\/p>\n<p>I have also had to set break_on_match to &#8216;false&#8217; when extracting an array of values from a message.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"443\" height=\"282\" class=\"wp-image-9103\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-10.png\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-10.png 443w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2022\/06\/word-image-10-300x191.png 300w\" sizes=\"auto, (max-width: 443px) 100vw, 443px\" \/><\/p>\n<p><strong>Troubleshooting<\/strong><\/p>\n<p><strong>Log Files<\/strong><\/p>\n<p>Logstash logs output to \/opt\/elk\/logstash\/logs\/logstash-plain.log \u2013 the logging level is defined in the \/opt\/elk\/logstash\/config\/log4j2.properties configuration file.<\/p>\n<p><strong>Viewing Data Transmitted to a Pipeline<\/strong><\/p>\n<p>There are several ways to confirm that data is being received by a pipeline \u2013 tcpdump can be used to verify information is being received on the port. If no data is being received, the port may be offline (if there is an error in the pipeline config, the pipeline will not load \u2013 grep \/opt\/elk\/logstash\/logs\/logstash-plain.log for the pipeline name to view errors), there may be a firewall preventing communication, or the sender may not be transmitting data.<\/p>\n<p>tcpdump -vv dst port 5100<\/p>\n<p>If data is confirmed to be coming into the pipeline port, add a &#8220;file&#8221; output to the pipeline to view the records transiting it.<\/p>\n<p><strong>Issues<\/strong><\/p>\n<p><strong>Data from filebeat servers not received in ElasticSearch<\/strong><\/p>\n<p>We have encountered a scenario where data from the filebeat servers was not being transmitted to ElasticSearch. Monitoring the filebeat server did not show any data being sent. 
Restarting the Logstash servers allowed data to be transmitted as expected.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>General Info Logstash is a pipeline based data processing service. Data comes into logstash, is manipulated, and is sent elsewhere. The source is maintained on GitHub by ElasticCo. Installation Logstash was downloaded from ElasticCo and installed from a gzipped tar archive to the \/opt\/elk\/logstash folder. Configuration The Logstash server is configured using the logstash.yml file. &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1588],"tags":[1589,1643],"class_list":["post-9097","post","type-post","status-publish","format-standard","hentry","category-elk","tag-elk","tag-logstash"],"_links":{"self":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/9097","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9097"}],"version-history":[{"count":2,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/9097\/revisions"}],"predecessor-version":[{"id":9105,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/9097\/revisions\/9105"}],"wp:attachment":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9097"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9097"},{"tax
onomy":"post_tag","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9097"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}