{"id":10559,"date":"2023-12-15T15:18:39","date_gmt":"2023-12-15T20:18:39","guid":{"rendered":"https:\/\/www.rushworth.us\/lisa\/?p=10559"},"modified":"2023-12-15T15:18:40","modified_gmt":"2023-12-15T20:18:40","slug":"elasticsearch-too-many-shards","status":"publish","type":"post","link":"https:\/\/www.rushworth.us\/lisa\/?p=10559","title":{"rendered":"ElasticSearch &#8212; Too Many Shards"},"content":{"rendered":"\n<p>Our ElasticSearch environment melted down in a fairly spectacular fashion &#8212; evidently (at least in older iterations), it&#8217;s an unhandled Java exception when a server is trying to send data over to <em>another<\/em> server that is refusing it because that would put the receiver over the shard limit. So we <em>didn&#8217;t<\/em> just have a server or three go into read only mode &#8212; we had cascading failure where java would except out and the process was dead. Restarting the ElasticSearch service temporarily restored functionality &#8212; so I quickly increased the max shards per node limit to keep the system up whilst I cleaned up whatever I could clean up<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\ncurl -X PUT http:\/\/uid:pass@`hostname`:9200\/_cluster\/settings -H &quot;Content-Type: application\/json&quot; -d &#039;{ &quot;persistent&quot;: { &quot;cluster.max_shards_per_node&quot;: &quot;5000&quot; } }&#039;\n<\/pre><\/div>\n\n\n<p>There were two requests against the ES API that were helpful in cleaning &#8216;stuff&#8217; up &#8212; GET \/_cat\/allocation?v returns a list of each node in the ES cluster with a count of shards (plus disk space) being used. This was useful in confirming that load across &#8216;hot&#8217;, &#8216;warm&#8217;, and &#8216;cold&#8217; nodes was reasonable. If it was not, we would want to investigate <em>why<\/em> some nodes were under-allocated. We were, however, fine. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ES-ShardsPerNode.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"470\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ES-ShardsPerNode-1024x470.png\" alt=\"\" class=\"wp-image-10560\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ES-ShardsPerNode-1024x470.png 1024w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ES-ShardsPerNode-300x138.png 300w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ES-ShardsPerNode-768x352.png 768w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ES-ShardsPerNode-750x344.png 750w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ES-ShardsPerNode.png 1218w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>The second request, GET \/_cat\/shards?v=true, dumps out <em>all<\/em> of the shards that comprise the stored data. In my case, a lot of clients create a new index daily &#8212; MyApp-20231215 &#8212; and then proceed to add <em>absolutely nothing<\/em> to that index. Literally 10% of our shards were devoted to storing <em>zero<\/em> documents! Well, that&#8217;s silly. I created a quick script to remove any zero-document index that is older than a week. A new document coming in will create the index again, and we don&#8217;t need to waste shards not storing data. 
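<\/p>\n\n\n\n<p>The cleanup logic is roughly the following sketch (not the exact script; it assumes the indices end in a -YYYYMMDD date suffix like MyApp-20231215 above, GNU date, and the same uid:pass placeholder). GET \/_cat\/indices with the docs.count column makes empty indices easy to find:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\nCUTOFF=$(date -d &#039;7 days ago&#039; +%Y%m%d)\ncurl -s &quot;http:\/\/uid:pass@`hostname`:9200\/_cat\/indices?h=index,docs.count&quot; | while read INDEX DOCS; do\n  SUFFIX=${INDEX##*-}\n  # only remove empty indices whose date suffix is older than the cutoff\n  if [[ &quot;$DOCS&quot; == &quot;0&quot; &amp;&amp; &quot;$SUFFIX&quot; =~ ^[0-9]{8}$ &amp;&amp; &quot;$SUFFIX&quot; -lt &quot;$CUTOFF&quot; ]]; then\n    curl -s -X DELETE &quot;http:\/\/uid:pass@`hostname`:9200\/$INDEX&quot;\n  fi\ndone\n<\/pre><\/div>\n\n\n\n<p>Swap the DELETE for an echo on the first run to sanity-check which indices it would remove.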
<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ListOfShards.png\"><img loading=\"lazy\" decoding=\"async\" width=\"994\" height=\"719\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ListOfShards.png\" alt=\"\" class=\"wp-image-10561\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ListOfShards.png 994w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ListOfShards-300x217.png 300w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ListOfShards-768x556.png 768w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2023\/12\/ListOfShards-750x543.png 750w\" sizes=\"auto, (max-width: 994px) 100vw, 994px\" \/><\/a><\/figure>\n\n\n\n<p>Once you&#8217;ve cleaned up the shards, it&#8217;s a good idea to drop cluster.max_shards_per_node back down to its normal value. I&#8217;m also putting together a script to run through the per-node shard allocation data and alert us when allocation is unbalanced or when the total shard count approaches our limit. Hopefully this will let us proactively reduce shards instead of having the entire cluster fall over one night. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Our ElasticSearch environment melted down in a fairly spectacular fashion &#8212; evidently (at least in older iterations), it&#8217;s an unhandled Java exception when a server is trying to send data over to another server that is refusing it because that would put the receiver over the shard limit. 
So we didn&#8217;t just have a server &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1588],"tags":[1590,1589,1941],"class_list":["post-10559","post","type-post","status-publish","format-standard","hentry","category-elk","tag-elasticsearch","tag-elk","tag-shard-allocation"],"_links":{"self":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10559","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10559"}],"version-history":[{"count":1,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10559\/revisions"}],"predecessor-version":[{"id":10562,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10559\/revisions\/10562"}],"wp:attachment":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10559"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10559"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}