Commit ffe5a83005b0d23575ab109755b4cb5518a5d91f

Authored by Chuck Lever
Committed by Trond Myklebust
1 parent 8cb7f74eee

NFS: Slow down state manager after an unhandled error

If the state manager thread is not actually able to fully recover from
some situation, it wakes up waiters, who kick off a new state manager
thread.  Quite often the fresh invocation of the state manager fails in
the same way.

This results in a livelock as the client dumps thousands of NFS
requests a second on the network in a vain attempt to recover.  Not
very friendly.

To mitigate this situation, add a delay in the state manager after
an unhandled error, so that the client sends just a few requests
every second in this case.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Showing 1 changed file with 1 addition and 0 deletions

@@ -2015,6 +2015,7 @@
 	pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s"
 			" with error %d\n", section_sep, section,
 			clp->cl_hostname, -status);
+	ssleep(1);
 	nfs4_end_drain_session(clp);
 	nfs4_clear_state_manager_bit(clp);
 }
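
The throttling idea behind the added ssleep(1) can be illustrated outside the kernel. The following is a minimal userspace sketch, not the NFS client code: it models the "waiters kick off a new state manager" cycle as a plain retry loop and uses nanosleep() as a stand-in for the kernel's ssleep(). The names run_with_backoff, sleep_ms, and fail_n_times are hypothetical and exist only for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() under strict -std= modes */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical recovery callback: returns 0 on success, negative on failure. */
typedef int (*recover_fn)(void *ctx);

/* Userspace stand-in for the kernel's ssleep()/msleep(): pause for ms milliseconds. */
static void sleep_ms(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/*
 * Call recover() until it succeeds or max_attempts is reached, pausing
 * delay_ms after every failed attempt. Without the pause, a persistently
 * failing recover() would be retried as fast as the CPU allows; with it,
 * the retry rate is bounded by roughly 1000 / delay_ms attempts per second.
 * Returns the number of attempts made.
 */
static int run_with_backoff(recover_fn recover, void *ctx,
			    int max_attempts, long delay_ms)
{
	int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;
		if (recover(ctx) == 0)
			break;		/* recovered, stop retrying */
		sleep_ms(delay_ms);	/* throttle before the next attempt */
	}
	return attempts;
}

/* Demo callback: fails while the counter in ctx is positive, then succeeds. */
static int fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -5;	/* arbitrary negative error code for the demo */
	}
	return 0;
}
```

Calling run_with_backoff(fail_n_times, &n, 100, 1000) with n = 3 would make four attempts spaced about one second apart, which mirrors the "just a few requests every second" behaviour the commit message describes. The actual patch differs in shape: it sleeps once on the error path before the state manager releases its bit, rather than looping itself.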