{"id":9022,"date":"2022-05-24T15:08:30","date_gmt":"2022-05-24T20:08:30","guid":{"rendered":"https:\/\/www.rushworth.us\/lisa\/?p=9022"},"modified":"2022-05-24T15:08:31","modified_gmt":"2022-05-24T20:08:31","slug":"elk-monitoring","status":"publish","type":"post","link":"https:\/\/www.rushworth.us\/lisa\/?p=9022","title":{"rendered":"ELK Monitoring"},"content":{"rendered":"\n<p>We have a number of logstash servers gathering data from various filebeat sources. We&#8217;ve recently experienced a problem where the pipeline stops getting data for <em>some<\/em> of those sources. Not all of them, and restarting a non-functional filebeat source only gets its data flowing again for ten minutes or so. We were able to rectify the immediate problem by restarting our logstash services (IT troubleshooting step #1: we restarted all of the filebeats and, when that didn&#8217;t help, moved on to restarting the logstashes).<\/p>\n\n\n\n<p>But we need a way to detect when this happens; losing days of log data from some sources is <em>really bad<\/em>. So I put together a Python script to verify there&#8217;s <em>something<\/em> coming in from each of the filebeat sources. 
<\/p>\n\n\n\n<p>The script uses the 7.x Elasticsearch Python client:<\/p>\n\n<p><code>pip install elasticsearch==7.13.4<\/code><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n#!\/usr\/bin\/env python3\n#-*- coding: utf-8 -*-\nimport time\n\n# Suppress the InsecureRequestWarning urllib3 emits because we skip SSL certificate verification below\nimport urllib3\nurllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)\n\nfrom elasticsearch import Elasticsearch\n\n# Modules for email alerting\nimport smtplib\nfrom email.mime.multipart import MIMEMultipart\nfrom email.mime.text import MIMEText\n\n\n# Config variables\nstrSenderAddress = &quot;devnull@example.com&quot;\nstrRecipientAddress = &quot;me@example.com&quot;\nstrSMTPHostname = &quot;mail.example.com&quot;\niSMTPPort = 25\n\nlistSplunkRelayHosts = &#x5B;&#039;host293&#039;, &#039;host590&#039;, &#039;host591&#039;, &#039;host022&#039;, &#039;host014&#039;, &#039;host135&#039;]\niAgeThreshold = 3600 # Alert if last document is more than an hour old (3600 seconds)\n\n# One client is reused for every host&#039;s query\nelastic_client = Elasticsearch(&quot;https:\/\/elasticsearchhost.example.com:9200&quot;, http_auth=(&#039;rouser&#039;,&#039;r0pAs5w0rD&#039;), verify_certs=False)\n\nlistProblems = &#x5B;] # (host, detail) pairs to report in the alert email\n\nfor strRelayHost in listSplunkRelayHosts:\n\tiCurrentUnixTimestamp = time.time()\n\n\t# Newest document from this host, ignoring \/var\/log\/messages\n\tquery_body = {\n\t\t&quot;sort&quot;: {\n\t\t\t&quot;@timestamp&quot;: {\n\t\t\t\t&quot;order&quot;: &quot;desc&quot;\n\t\t\t}\n\t\t},\n\t\t&quot;query&quot;: {\n\t\t\t&quot;bool&quot;: {\n\t\t\t\t&quot;must&quot;: {\n\t\t\t\t\t&quot;term&quot;: {\n\t\t\t\t\t\t&quot;host.hostname&quot;: strRelayHost\n\t\t\t\t\t}\n\t\t\t\t},\n\t\t\t\t&quot;must_not&quot;: {\n\t\t\t\t\t&quot;term&quot;: {\n\t\t\t\t\t\t&quot;source&quot;: &quot;\/var\/log\/messages&quot;\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\n\tresult = elastic_client.search(index=&quot;network_syslog*&quot;, body=query_body, size=1)\n\tall_hits = result&#x5B;&#039;hits&#039;]&#x5B;&#039;hits&#039;]\n\n\tiDocumentAge = None\n\tif all_hits:\n\t\t# The @timestamp sort value is epoch milliseconds; convert the age to seconds\n\t\tiDocumentAge = ((iCurrentUnixTimestamp * 1000) - all_hits&#x5B;0]&#x5B;&#039;sort&#039;]&#x5B;0]) \/ 1000.0\n\n\tif iDocumentAge is None:\n\t\t# No matching document at all is the worst case, so it goes in the alert too\n\t\tlistProblems.append((strRelayHost, &quot;no record found&quot;))\n\t\tprint(f&quot;PROBLEM - For {strRelayHost}, no record found&quot;)\n\telif iDocumentAge &gt; iAgeThreshold:\n\t\tlistProblems.append((strRelayHost, f&quot;{iDocumentAge:.0f} second(s)&quot;))\n\t\tprint(f&quot;PROBLEM - For {strRelayHost}, document age is {iDocumentAge:.0f} second(s)&quot;)\n\telse:\n\t\tprint(f&quot;GOOD - For {strRelayHost}, document age is {iDocumentAge:.0f} second(s)&quot;)\n\n\nif listProblems:\n\tmsg = MIMEMultipart(&#039;alternative&#039;)\n\tmsg&#x5B;&#039;Subject&#039;] = &quot;ELK Filebeat Alert&quot;\n\tmsg&#x5B;&#039;From&#039;] = strSenderAddress\n\tmsg&#x5B;&#039;To&#039;] = strRecipientAddress\n\n\t# Build the plain-text and HTML bodies from the same list of problems\n\tstrTextMessage = &quot;\\n&quot;.join(f&quot;{strHost}: {strDetail}&quot; for strHost, strDetail in listProblems)\n\tstrRows = &quot;\\n&quot;.join(f&quot;&lt;tr&gt;&lt;td&gt;{strHost}&lt;\/td&gt;&lt;td&gt;{strDetail}&lt;\/td&gt;&lt;\/tr&gt;&quot; for strHost, strDetail in listProblems)\n\tstrHTMLMessage = f&quot;&lt;html&gt;&lt;body&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;Server&lt;\/th&gt;&lt;th&gt;Document Age&lt;\/th&gt;&lt;\/tr&gt;{strRows}&lt;\/table&gt;&lt;\/body&gt;&lt;\/html&gt;&quot;\n\n\tpart1 = MIMEText(strTextMessage, &#039;plain&#039;)\n\tpart2 = MIMEText(strHTMLMessage, &#039;html&#039;)\n\n\tmsg.attach(part1)\n\tmsg.attach(part2)\n\n\t# Connect on the configured SMTP port\n\ts = smtplib.SMTP(strSMTPHostname, iSMTPPort)\n\ts.sendmail(strSenderAddress, strRecipientAddress, msg.as_string())\n\ts.quit()\n\n<\/pre><\/div>\n","protected":false},"excerpt":{"rendered":"<p>We have a number of logstash servers gathering data from various filebeat sources. We&#8217;ve recently experienced a problem where the pipeline stops getting data for some of those sources. Not all &#8212; and restarting the non-functional filebeat source sends data for ten minutes or so. 
We were able to rectify the immediate problem by restarting &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1588],"tags":[1590,1589,1642,1643,1644,664,633],"class_list":["post-9022","post","type-post","status-publish","format-standard","hentry","category-elk","tag-elasticsearch","tag-elk","tag-filebeat","tag-logstash","tag-monitoring","tag-python","tag-script"],"_links":{"self":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/9022","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9022"}],"version-history":[{"count":1,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/9022\/revisions"}],"predecessor-version":[{"id":9023,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/9022\/revisions\/9023"}],"wp:attachment":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9022"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}