{"id":10436,"date":"2023-10-02T20:18:19","date_gmt":"2023-10-03T01:18:19","guid":{"rendered":"https:\/\/www.rushworth.us\/lisa\/?p=10436"},"modified":"2023-10-03T11:01:46","modified_gmt":"2023-10-03T16:01:46","slug":"linux-high-load-with-cifs-mounts-using-kernel-6-5-5","status":"publish","type":"post","link":"https:\/\/www.rushworth.us\/lisa\/?p=10436","title":{"rendered":"Linux \u2013 High Load with CIFS Mounts using Kernel 6.5.5"},"content":{"rendered":"<p>We recently updated our Fedora servers from 36 and 37 to 38. Since the upgrade, we have observed servers with very high load averages \u2013 8+ on a 4-cpu server \u2013 but the server didn\u2019t seem unreasonably slow. On the Unix servers I first used, Irix and Solaris, load average counts threads in a Runnable state. Linux, however, includes both Runnable and Uninterruptible states in the load average. This means processes blocked on I\/O \u2013 a mkdir call against a mounted remote server, for example, or local disk I\/O \u2013 are included in the load average. As such, a high load average on Linux may indicate CPU resource contention, but it may also indicate I\/O contention elsewhere.<\/p>\n<p>But there\u2019s a third possibility \u2013 code that opts for the simplicity of an uninterruptible sleep without <em>needing<\/em> to be uninterruptible. Since our upgrade, CIFS mounts have a laundromat thread that I assume cleans up a cache \u2013 I see four cifsd-cfid-laundromat threads in an uninterruptible sleep state \u2013 which means my load average, when the system is doing absolutely nothing, would be 4.<\/p>\n<pre>2023-10-03 11:11:12 [lisa@server01 ~\/]# ps aux | grep \" [RD]\"\r\nUSER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\r\nroot 1150 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]\r\nroot 1151 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]\r\nroot 1152 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]\r\nroot 1153 0.0 0.0 0 0 ? 
D Sep28 0:01 [cifsd-cfid-laundromat]\r\nroot 556598 0.0 0.0 224668 3072 pts\/11 R+ 11:11 0:00 ps aux<\/pre>\n<p>Looking around the Internet, I see quite a few bug reports regarding this situation \u2026 so it seems like an \u201cignore it and wait\u201d problem \u2013 although the load average value is increased by these sleeping threads, it\u2019s cosmetic. That explains why the server didn\u2019t seem to be running slowly even though the load average was so high.<\/p>\n<p><a href=\"https:\/\/lkml.org\/lkml\/2023\/9\/26\/1144\">https:\/\/lkml.org\/lkml\/2023\/9\/26\/1144<\/a><\/p>\n<pre>Date: Tue, 26 Sep 2023 17:54:10 -0700\r\nFrom: Paul Aurich \r\nSubject: Re: Possible bug report: kernel 6.5.0\/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in\"D\" state)\r\n\r\nOn 2023-09-19 13:23:44 -0500, Steve French wrote:\r\n&gt;On Tue, Sep 19, 2023 at 1:07\u202fPM Tom Talpey &lt;tom@talpey.com&gt; wrote:\r\n&gt;&gt; These changes are good, but I'm skeptical they will reduce the load\r\n&gt;&gt; when the laundromat thread is actually running. All these do is avoid\r\n&gt;&gt; creating it when not necessary, right?\r\n&gt;\r\n&gt;It does create half as many laundromat threads (we don't need\r\n&gt;laundromat on connection to IPC$) even for the Windows server target\r\n&gt;example, but helps more for cases where server doesn't support\r\n&gt;directory leases.\r\n\r\nPerhaps the laundromat thread should be using msleep_interruptible()?\r\n\r\nUsing an interruptible sleep appears to prevent the thread from contributing\r\nto the load average, and has the happy side-effect of removing the up-to-1s delay\r\nwhen tearing down the tcon (since a7c01fa93ae, kthread_stop() will return\r\nearly triggered by kthread_stop).\r\n\r\n~Paul<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We recently updated our Fedora servers from 36 and 37 to 38. 
Since the upgrade, we have observed servers with very high load averages \u2013 8+ on a 4-cpu server \u2013 but the server didn\u2019t seem unreasonably slow. On the Unix servers I first used, Irix and Solaris, load average counts threads in a Runnable &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30],"tags":[1913,294,1914,1915,867],"class_list":["post-10436","post","type-post","status-publish","format-standard","hentry","category-system-administration","tag-cifs","tag-linux","tag-load","tag-load-average","tag-samba"],"_links":{"self":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10436","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10436"}],"version-history":[{"count":1,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10436\/revisions"}],"predecessor-version":[{"id":10437,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10436\/revisions\/10437"}],"wp:attachment":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10436"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10436"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10436"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated"
:true}]}}