Tag: script

ELK Monitoring

We have a number of logstash servers gathering data from various filebeat sources. We’ve recently experienced a problem where the pipeline stops getting data for some of those sources. Not all — and restarting the non-functional filebeat source sends data for ten minutes or so. We were able to rectify the immediate problem by restarting our logstash services (IT troubleshooting step #1 — we restarted all of the filebeats and, when that didn’t help, moved on to restarting the logstashes)

But we need to have a way to ensure this isn’t happening — losing days of log data from some sources is really bad. So I put together a Python script to verify there’s something coming in from each of the filebeat sources.

pip install elasticsearch==7.13.4

#!/usr/bin/env python3
#-*- coding: utf-8 -*-
# Disable warnings that not verifying SSL trust isn't a good idea
import requests

from elasticsearch import Elasticsearch
import time

# Modules for email alerting
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Config variables
strSenderAddress = "devnull@example.com"
strRecipientAddress = "me@example.com"
strSMTPHostname = "mail.example.com"
iSMTPPort = 25

listSplunkRelayHosts = ['host293', 'host590', 'host591', 'host022', 'host014', 'host135']
iAgeThreashold = 3600 # Alert if last document is more than an hour old (3600 seconds)

strAlert = None

for strRelayHost in listSplunkRelayHosts:
	iCurrentUnixTimestamp = time.time()
	elastic_client = Elasticsearch("https://elasticsearchhost.example.com:9200", http_auth=('rouser','r0pAs5w0rD'), verify_certs=False)

	query_body = {
		"sort": {
			"@timestamp": {
				"order": "desc"
		"query": {
			"bool": {
				"must": {
					"term": {
						"host.hostname": strRelayHost
				"must_not": {
					"term": {
						"source": "/var/log/messages"

	result = elastic_client.search(index="network_syslog*", body=query_body,size=1)
	all_hits = result['hits']['hits']

	iDocumentAge = None
	for num, doc in enumerate(all_hits):
		iDocumentAge =  (  (iCurrentUnixTimestamp*1000) - doc.get('sort')[0]) / 1000.0

	if iDocumentAge is not None:
		if iDocumentAge > iAgeThreashold:
			if strAlert is None:
				strAlert = f"<tr><td>{strRelayHost}</td><td>{iDocumentAge}</td></tr>"
				strAlert = f"{strAlert}\n<tr><td>{strRelayHost}</td><td>{iDocumentAge}</td></tr>\n"
			print(f"PROBLEM - For {strRelayHost}, document age is {iDocumentAge} second(s)")
			print(f"GOOD - For {strRelayHost}, document age is {iDocumentAge} second(s)")
		print(f"PROBLEM - For {strRelayHost}, no recent record found")

if strAlert is not None:
	msg = MIMEMultipart('alternative')
	msg['Subject'] = "ELK Filebeat Alert"
	msg['From'] = strSenderAddress
	msg['To'] = strRecipientAddress

	strHTMLMessage = f"<html><body><table><tr><th>Server</th><th>Document Age</th></tr>{strAlert}</table></body></html>"
	strTextMessage = strAlert

	part1 = MIMEText(strTextMessage, 'plain')
	part2 = MIMEText(strHTMLMessage, 'html')


	s = smtplib.SMTP(strSMTPHostname)
	s.sendmail(strSenderAddress, strRecipientAddress, msg.as_string())


A few times now, I’ve encountered individuals with cron jobs or bash scripts where a command execution ends in 2>/dev/null … and the individual is stymied by the fact it’s not working but there’s no clue as to why. The error output is being sent into a big black hole never to escape!

The trick here is to understand file descriptors — 1 is basically a shortcut name for STDOUT and 2 is basically a shortcut name for STDERR (0 is STDIN, although that’s not particularly relevant here).  So 2>/dev/null says “take all of the STDERR stuff and redirect it to /dev/null”.

Sometimes you’ll see both STDERR and STDOUT being redirected either to a file or to /dev/null — in that case you will see 2>&1 where the ampersand prior to the “1” indicates the stream is being redirected to a file descriptor (2>1 would direct STDOUT to a file named “1”) — so >/dev/null 2>&1 is the normal way you’d see it written. Functionally, >/dev/null 1>&2 would be the same thing … but redirecting all output into error is, conceptually, a little odd.

To visualize all of this, use a command that will output something to both STDERR and STDOUT — for clarify, I’ve used “1>/dev/null” (redirect STDOUT to /devnull) in conjunction with 2>&1 (redirect STDERR to STDOUT). As written in the text above, the number 1 is generally omitted and just >/dev/null is written.



Shell Scripting: “File Exist” Test With Wildcards

Determining if a specific file exists within a shell script is straight-forward:

if [ -f filename.txt ]; then DoSomething; fi

The -f verifies that a regular file exists. You might want -s (exists and size is greater than zero), -w (exists and is writable), -e (a regular or special file exists), etc. But the test comes from the “CONDITIONAL EXPRESSIONS” section of the bash man page and is simply used in an if statement.

What if you don’t know the exact name of the file? Using the text “if [ -f *substring*.xtn ]” seems like it works. If there is no matching file, the condition evaluates to FALSE. If there is one matching file the condition evaluates to TRUE. But when there are multiple matching files, you get an error because there are too many parameters

[lisa@fc test]# ll
total 0
[lisa@fc test]# if [ -f *something*.txt ]; then echo "The file exists"; fi
[lisa@fc test]# touch 1something1.txt
[lisa@fc test]# if [ -f *something*.txt ]; then echo "The file exists"; fi
The file exists
[lisa@fc test]# touch 2something2.txt
[lisa@fc test]# if [ -f *something*.txt ]; then echo "The file exists"; fi
-bash: [: 1something1.txt: binary operator expected

Beyond throwing an error … we are not executing the code-block meant to be run when the condition is TRUE. In a shell script, execution will continue past the block as if the condition evaluated to FALSE (i.e. the script does not just abnormally end on the error, making the failure more obvious).

To test for the existence of possibly multiple files matching a pattern, we can evaluate the number of files returned from ls. I include 2>/dev/null to hide the error which will otherwise be displayed when there are zero matching files.

[lisa@fc test]# ll
total 0
[lisa@fc test]# if [ $(ls *something*.txt 2>/dev/null | wc -l) -gt 0 ]; then echo "Some matching files are found."; fi
[lisa@fc test]# touch 1something1.txt
[lisa@fc test]# if [ $(ls *something*.txt 2>/dev/null | wc -l) -gt 0 ]; then echo "Some matching files are found."; fi
Some matching files are found.
[lisa@fc test]# touch 2something2.txt
[lisa@fc test]# if [ $(ls *something*.txt 2>/dev/null | wc -l) -gt 0 ]; then echo "Some matching files are found."; fi
Some matching files are found.
[lisa@fc test]#

Now we have a test that evaluates to TRUE when there are one or more matching files in the path.