{"id":11197,"date":"2024-10-08T11:39:34","date_gmt":"2024-10-08T16:39:34","guid":{"rendered":"https:\/\/www.rushworth.us\/lisa\/?p=11197"},"modified":"2024-10-08T11:52:44","modified_gmt":"2024-10-08T16:52:44","slug":"k8s-resolv-conf-and-ndots","status":"publish","type":"post","link":"https:\/\/www.rushworth.us\/lisa\/?p=11197","title":{"rendered":"K8s, resolv.conf, and ndots"},"content":{"rendered":"<p>I had a very strange problem when firewalld was used with nftables as the back end \u2013 rules configured properly in firewalld didn\u2019t exist in the nftables rulesets so \u2026 didn\u2019t exist. The most obvious failure in the k8s cluster was DNS resolution \u2013 requests to any nodes where nftables was the back end just timed out. While diagnosing the \u201cDNS queries time out\u201d issue, I was watching the logs from the coredns pods. And I saw a <em>lot<\/em> of NXDOMAIN errors. Not because I had mistyped a hostname or anything \u2013 each pod was appending every domain in the resolv.conf search list <em>before<\/em> trying the actual hostname.<\/p>\n<p>The quick solution was to update our hostnames to include the trailing dot for the root zone. It is not <code>redishost.example.com<\/code> but rather <code>redishost.example.com.<\/code><\/p>\n<p>But that didn\u2019t explain why \u2013 I\u2019ve got plenty of Linux boxes with a few search domains in resolv.conf, and I have never once seen redishost.example.com.example.com come across the query log. There <em>is<\/em> a configuration option I\u2019ve rarely used that controls when the resolver goes to the search list. You can <a href=\"https:\/\/linux.die.net\/man\/5\/resolv.conf\">configure ndots \u2013 the default is one<\/a>, but you can set any value up to 15 (larger values are silently capped).
Surely, they wouldn\u2019t set ndots to something crazy high \u2026 right??<\/p>\n<p>Oh, look \u2013<\/p>\n<pre>Defaulted container \"kafka-streams-app\" out of: kafka-streams-app, filebeat\r\nbash-4.4# cat \/etc\/resolv.conf\r\nsearch kstreams.svc.cluster.local svc.cluster.local cluster.local mgmt.example.net dsys.example.net dnoc.example.net admin.example.net example.com\r\nnameserver 10.6.0.5\r\noptions ndots:5<\/pre>\n<p>Yup, it\u2019s right there in the Kubernetes source, and it\u2019s been there for seven years:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1018\" height=\"98\" class=\"wp-image-11198\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-1.png\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-1.png 1018w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-1-300x29.png 300w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-1-768x74.png 768w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-1-750x72.png 750w\" sizes=\"auto, (max-width: 1018px) 100vw, 1018px\" \/><\/p>\n<p>What does this mean? Well, ndots is a threshold on the number of dots in a hostname. If a name has fewer than <em>ndots<\/em> dots, the resolver tries appending the search domains <em>first<\/em> and only then tries what you typed, as a last resort. With the default of one, that basically means only a name with no dots at all gets the search domains appended first. I guess if you go out and register a gTLD for your company \u2013 my hostname is literally just example. \u2013 then you\u2019ll have a little inefficiency as the search domains are tried. But that\u2019s a real edge case. With the k8s default of five, anything with <em>fewer than five<\/em> dots gets <em>all<\/em> of those search domains appended first.<\/p>\n<p>So what happens when I need redishost.example.com?
I see the following lookups fail, because none of these names exist:<\/p>\n<pre>[INFO] 64.24.29.155:57014 - # \"A IN redishost.example.com.svc.cluster.local. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:57028 - # \"AAAA IN redishost.example.com.svc.cluster.local. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:56096 - # \"A IN redishost.example.com.cluster.local. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:56193 - # \"AAAA IN redishost.example.com.cluster.local. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:55001 - # \"A IN redishost.example.com.mgmt.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:55194 - # \"AAAA IN redishost.example.com.mgmt.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:54078 - # \"A IN redishost.example.com.dsys.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:54127 - # \"AAAA IN redishost.example.com.dsys.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:52061 - # \"A IN redishost.example.com.dnoc.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:52182 - # \"AAAA IN redishost.example.com.dnoc.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:51018 - # \"A IN redishost.example.com.admin.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:51104 - # \"AAAA IN redishost.example.com.admin.example.net. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:50052 - # \"A IN redishost.example.com.example.com. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n[INFO] 64.24.29.155:50189 - # \"AAAA IN redishost.example.com.example.com. udp # false 512\" NXDOMAIN qr,aa,rd 158 0.00019419s\r\n<\/pre>\n<p>Wonderful: IPv6 is enabled, so it\u2019s trying AAAA records too. <em>Finally<\/em> it resolves redishost.example.com!<\/p>\n<p>Luckily, there is a quick solution. Update the deployment YAML to set a custom ndots value in the pod\u2019s <code>dnsConfig<\/code> options \u2013 I like 1. I could see where someone might want two: with ndots:2, a name like something.else would get svc.cluster.local appended first, instead of wasting a lookup on something.else as an absolute name. <em>I<\/em> don\u2019t want to do that. But I could see why something higher than one might be desirable in k8s. Not sure I buy it\u2019s awesome enough to be the <em>default<\/em>, though!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"411\" height=\"99\" class=\"wp-image-11199\" src=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-2.png\" srcset=\"https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-2.png 411w, https:\/\/www.rushworth.us\/lisa\/wp-content\/uploads\/2024\/10\/word-image-11197-2-300x72.png 300w\" sizes=\"auto, (max-width: 411px) 100vw, 411px\" \/><\/p>\n<p>Redeployed and instantly cut the DNS traffic by about 90%, and reduced application latency, since each DNS call no longer has to sit through fourteen failures before the final success.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I had a very strange problem when firewalld was used with nftables as the back end \u2013 rules configured properly in firewalld didn\u2019t exist in the nftables rulesets so \u2026 didn\u2019t exist. 
The most obvious failure in the k8s cluster was DNS resolution \u2013 requests to any nodes where nftables was the back end just &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1541],"tags":[2053,1542,615],"class_list":["post-11197","post","type-post","status-publish","format-standard","hentry","category-kubernetes","tag-coredns","tag-k8s","tag-kubernetes"],"_links":{"self":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/11197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=11197"}],"version-history":[{"count":1,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/11197\/revisions"}],"predecessor-version":[{"id":11200,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/11197\/revisions\/11200"}],"wp:attachment":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=11197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=11197"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=11197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}