Commit abb5a5cc6bba1516403146c5b79036fe843beb70

Authored by Paul Jackson
Committed by Linus Torvalds
1 parent aa95387774

[PATCH] Cpuset: fix ABBA deadlock with cpu hotplug lock

Fix ABBA deadlock between lock_cpu_hotplug() and the cpuset
callback_mutex lock.

It only happens on cpu_exclusive cpusets, due to the dynamic
sched domain code trying to take the cpu hotplug lock inside
the cpuset callback_mutex lock.

This bug has apparently been present for several months, but was
not hit until a particular customer workload ran on a large system.

This fix appears correct on inspection, but it will take a few
more days of running it on that customer's workload to be
confident we nailed it.  We don't have any other reproducible
test case.

The cpu hotplug lock, taken by lock_cpu_hotplug(), tends to be
held across large stretches of code.  The other places that hold
both that lock and the cpuset callback_mutex lock always nest the
cpuset lock inside the hotplug lock.  This place tries to do the
reverse, risking an ABBA deadlock.
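To make the nesting rule concrete, here is a minimal userspace sketch, assuming pthread mutexes as stand-ins for the two kernel locks. The ranked_lock type, ordered_lock()/ordered_unlock(), usual_path() and old_rmdir_path() are hypothetical names for illustration; this is a simple lockdep-style order check, not the kernel's lockdep. Each lock gets a rank (hotplug lock outer, callback_mutex inner), and taking a lock while one of equal or higher rank is already held is counted as an ordering violation. That inverted nesting is what allows an ABBA deadlock once two threads run the two orders concurrently.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for the two kernel locks. The rank
 * encodes the nesting used elsewhere in the kernel: the hotplug lock is
 * outer (rank 1), callback_mutex is inner (rank 2). */
struct ranked_lock {
	pthread_mutex_t mu;
	int rank;
};

static struct ranked_lock hotplug_lock  = { PTHREAD_MUTEX_INITIALIZER, 1 };
static struct ranked_lock callback_lock = { PTHREAD_MUTEX_INITIALIZER, 2 };

/* Ranks currently held, innermost last (single-threaded model). */
static int held[8];
static int nheld;
static int order_violations;

/* A lock may only be taken while every held lock has a lower rank.
 * Taking it inside an equal- or higher-ranked lock is the inverted
 * nesting that lets two threads deadlock ABBA-style. */
static void ordered_lock(struct ranked_lock *l)
{
	for (int i = 0; i < nheld; i++) {
		if (held[i] >= l->rank) {
			order_violations++;
			break;
		}
	}
	pthread_mutex_lock(&l->mu);
	held[nheld++] = l->rank;
}

/* Assumes locks are released in LIFO order. */
static void ordered_unlock(struct ranked_lock *l)
{
	pthread_mutex_unlock(&l->mu);
	nheld--;
}

/* Nesting used elsewhere: callback_mutex inside the hotplug lock. */
static void usual_path(void)
{
	ordered_lock(&hotplug_lock);
	ordered_lock(&callback_lock);
	ordered_unlock(&callback_lock);
	ordered_unlock(&hotplug_lock);
}

/* Nesting in the old cpuset_rmdir(): hotplug lock inside callback_mutex. */
static void old_rmdir_path(void)
{
	ordered_lock(&callback_lock);
	ordered_lock(&hotplug_lock);
	ordered_unlock(&hotplug_lock);
	ordered_unlock(&callback_lock);
}
```

Calling usual_path() leaves order_violations at 0; a following call to old_rmdir_path() raises it to 1. Because the model is single-threaded, it flags the inversion without actually deadlocking.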

This is in the cpuset_rmdir() code, where we:
  * take the callback_mutex lock
  * mark the cpuset CS_REMOVED
  * call update_cpu_domains for cpu_exclusive cpusets
  * in that call, take the cpu_hotplug lock if the
    cpuset is marked for removal.
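The steps above can be modelled as a tiny state machine. This sketch assumes only what the list says: in the rmdir path the cpuset is already marked CS_REMOVED, so update_cpu_domains() goes on to take the hotplug lock. The names lock_state, old_update_cpu_domains() and old_rmdir() are hypothetical. The model also shows why only cpu_exclusive cpusets are affected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of which locks are held; not kernel data structures. */
struct lock_state {
	bool callback_held;
	bool inverted;   /* hotplug lock taken while callback_mutex was held */
};

/* Stand-in for update_cpu_domains() in the old rmdir path: the cpuset is
 * already marked CS_REMOVED, so this call takes the hotplug lock. */
static void old_update_cpu_domains(struct lock_state *s)
{
	if (s->callback_held)
		s->inverted = true;   /* reverse of the usual hotplug -> callback nesting */
	/* lock_cpu_hotplug(); rebuild sched domains; unlock_cpu_hotplug(); */
}

/* Old cpuset_rmdir() sequence, step by step. Returns true if the
 * hotplug lock would have been nested inside callback_mutex. */
static bool old_rmdir(bool cpu_exclusive)
{
	struct lock_state s = { false, false };

	s.callback_held = true;               /* take callback_mutex */
	/* mark the cpuset CS_REMOVED */
	if (cpu_exclusive)
		old_update_cpu_domains(&s);   /* takes the hotplug lock */
	s.callback_held = false;              /* release callback_mutex */
	return s.inverted;
}
```

old_rmdir(true) reports the inverted nesting; old_rmdir(false) does not, because update_cpu_domains() is never called for a cpuset that is not cpu_exclusive.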

Thanks to Jack Steiner for identifying this deadlock.

The fix is to tear down the dynamic sched domain before we grab
the cpuset callback_mutex lock.  This way, the two locks are
serialized, with the hotplug lock taken and released before
trying for the cpuset lock.
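A sketch of the fixed ordering, again with pthread mutexes standing in for manage_mutex, callback_mutex and the hotplug lock. It mirrors the shape of the cpuset_rmdir() change in the diff: clear cpu_exclusive first through a stand-in for update_flag(CS_CPU_EXCLUSIVE, cs, "0"), bail out with manage_mutex released on error, and only then take callback_mutex to mark the cpuset removed. The stand-in assumes update_flag() changes the flag under callback_mutex and calls update_cpu_domains() only after releasing it, which is what the patch's locking note relies on. All function names here are hypothetical. The check in update_cpu_domains_model() uses pthread_mutex_trylock(), which returns EBUSY for a default mutex that is already locked, even by the calling thread; in this single-threaded model that means callback_mutex is held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel locks named in the patch. */
static pthread_mutex_t manage_mutex   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t callback_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hotplug_lock   = PTHREAD_MUTEX_INITIALIZER;

/* Times the hotplug lock was taken with callback_mutex held. */
static int bad_nesting;

/* Single-threaded model: a trylock that fails with EBUSY means we hold it. */
static bool callback_held(void)
{
	int rc = pthread_mutex_trylock(&callback_mutex);

	if (rc == 0) {
		pthread_mutex_unlock(&callback_mutex);
		return false;
	}
	return rc == EBUSY;
}

/* Stand-in for update_cpu_domains(): must not be called holding callback_mutex. */
static void update_cpu_domains_model(void)
{
	if (callback_held())
		bad_nesting++;
	pthread_mutex_lock(&hotplug_lock);
	/* ... rebuild sched domains ... */
	pthread_mutex_unlock(&hotplug_lock);
}

/* Stand-in for update_flag(CS_CPU_EXCLUSIVE, cs, "0"): flips the flag under
 * callback_mutex, releases it, and only then rebuilds the domains. */
static int clear_cpu_exclusive_model(bool *cpu_exclusive)
{
	pthread_mutex_lock(&callback_mutex);
	*cpu_exclusive = false;
	pthread_mutex_unlock(&callback_mutex);
	update_cpu_domains_model();
	return 0;   /* this stand-in never fails */
}

/* Fixed ordering: the hotplug lock is taken and released before callback_mutex. */
static int rmdir_model(bool *cpu_exclusive, bool *removed)
{
	pthread_mutex_lock(&manage_mutex);
	if (*cpu_exclusive) {
		int retval = clear_cpu_exclusive_model(cpu_exclusive);
		if (retval < 0) {   /* error path mirrors the diff */
			pthread_mutex_unlock(&manage_mutex);
			return retval;
		}
	}
	pthread_mutex_lock(&callback_mutex);
	*removed = true;            /* set_bit(CS_REMOVED, &cs->flags) */
	pthread_mutex_unlock(&callback_mutex);
	pthread_mutex_unlock(&manage_mutex);
	return 0;
}
```

Running rmdir_model() on a cpu_exclusive cpuset leaves bad_nesting at 0, while calling update_cpu_domains_model() with callback_mutex held increments it, which is the situation the new comment on update_cpu_domains() forbids.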

I suspect that this bug was introduced when I changed the
cpuset locking from one lock to two.  The dynamic sched domain
dependency on cpu_exclusive cpusets and its hotplug hooks were
added to this code earlier, when cpusets had only a single lock.
It may well have been fine then.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1 file changed, 21 insertions(+), 3 deletions(-)

@@ -762,6 +762,8 @@
  *
  * Call with manage_mutex held. May nest a call to the
  * lock_cpu_hotplug()/unlock_cpu_hotplug() pair.
+ * Must not be called holding callback_mutex, because we must
+ * not call lock_cpu_hotplug() while holding callback_mutex.
  */
 
 static void update_cpu_domains(struct cpuset *cur)
@@ -781,7 +783,7 @@
 		if (is_cpu_exclusive(c))
 			cpus_andnot(pspan, pspan, c->cpus_allowed);
 	}
-	if (is_removed(cur) || !is_cpu_exclusive(cur)) {
+	if (!is_cpu_exclusive(cur)) {
 		cpus_or(pspan, pspan, cur->cpus_allowed);
 		if (cpus_equal(pspan, cur->cpus_allowed))
 			return;
@@ -1917,6 +1919,17 @@
 	return cpuset_create(c_parent, dentry->d_name.name, mode | S_IFDIR);
 }
 
+/*
+ * Locking note on the strange update_flag() call below:
+ *
+ * If the cpuset being removed is marked cpu_exclusive, then simulate
+ * turning cpu_exclusive off, which will call update_cpu_domains().
+ * The lock_cpu_hotplug() call in update_cpu_domains() must not be
+ * made while holding callback_mutex. Elsewhere the kernel nests
+ * callback_mutex inside lock_cpu_hotplug() calls. So the reverse
+ * nesting would risk an ABBA deadlock.
+ */
+
 static int cpuset_rmdir(struct inode *unused_dir, struct dentry *dentry)
 {
 	struct cpuset *cs = dentry->d_fsdata;
@@ -1936,11 +1949,16 @@
 		mutex_unlock(&manage_mutex);
 		return -EBUSY;
 	}
+	if (is_cpu_exclusive(cs)) {
+		int retval = update_flag(CS_CPU_EXCLUSIVE, cs, "0");
+		if (retval < 0) {
+			mutex_unlock(&manage_mutex);
+			return retval;
+		}
+	}
 	parent = cs->parent;
 	mutex_lock(&callback_mutex);
 	set_bit(CS_REMOVED, &cs->flags);
-	if (is_cpu_exclusive(cs))
-		update_cpu_domains(cs);
 	list_del(&cs->sibling);	/* delete my sibling from parent->children */
 	spin_lock(&cs->dentry->d_lock);
 	d = dget(cs->dentry);