
30 Jan, 2010

2 commits

  • This patch fixes a regression where the kernel debugger and
    the perf API did not nicely share hw breakpoint reservations.

    The kernel debugger cannot use any mutex_lock() calls because it
    can start the kernel running from an invalid context.

    A mutex-free version of the reservation API had to be created so
    that the kernel debugger can safely update hw breakpoint
    reservations (sketched below).

    It is improbable that a breakpoint reservation will be
    concurrently processed at the moment kgdb interrupts the system.
    Should this corner case occur, the end user is warned, and the
    kernel debugger will prohibit updating the hardware breakpoint
    reservations.

    Any time the kernel debugger reserves a hardware breakpoint it
    will be a system-wide reservation.

    Signed-off-by: Jason Wessel
    Acked-by: Frederic Weisbecker
    Cc: kgdb-bugreport@lists.sourceforge.net
    Cc: K.Prasad
    Cc: Peter Zijlstra
    Cc: Alan Stern
    Cc: torvalds@linux-foundation.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jason Wessel
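
    A minimal sketch of the mutex-free path, assuming the internal
    helpers nr_bp_mutex and __reserve_bp_slot() that the regular
    reservation API would use (names illustrative, not confirmed by
    this log):

      int dbg_reserve_bp_slot(struct perf_event *bp)
      {
              /* refuse rather than block if the mutex is already held */
              if (mutex_is_locked(&nr_bp_mutex))
                      return -1;      /* caller warns the user, refuses update */

              return __reserve_bp_slot(bp);
      }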
     
  • In the 2.6.33 kernel, the hw_breakpoint API is now used for the
    performance event counters. The hw_breakpoint_handler() now
    consumes the hw breakpoints that were previously set by kgdb
    arch specific code. In order for kgdb to work in conjunction
    with this core API change, kgdb must use some of the low level
    functions of the hw_breakpoint API to install, uninstall, and
    deal with hw breakpoint reservations.

    The kgdb core required a change to call kgdb_disable_hw_debug
    anytime a slave cpu enters kgdb_wait() in order to keep all the
    hw breakpoints in sync as well as to prevent hitting a hw
    breakpoint while kgdb is active.

    During the architecture-specific initialization of kgdb, it will
    pre-allocate 4 disabled (struct perf_event **) structures. Kgdb
    will use these to manage the capabilities for the 4 hw
    breakpoint registers per cpu. Right now the hw_breakpoint API
    does not have a way to ask how many breakpoints are available
    on each CPU, so it is possible that the install of a breakpoint
    might fail when kgdb restores the system to the run state. The
    intent of this patch is to first get the basic functionality of
    hw breakpoints working and leave it to the person debugging the
    kernel to understand what hw breakpoints are in use and what
    restrictions have been imposed as a result. Breakpoint
    constraints will be dealt with in a future patch.

    While atomic, the x86-specific kgdb code will call
    arch_uninstall_hw_breakpoint() and arch_install_hw_breakpoint()
    to manage the cpu-specific hw breakpoints (sketched below).

    The net result of these changes allows kgdb to use the same pool
    of hw breakpoints that is used by the perf event API, but
    neither knows about future reservations for the available hw
    breakpoint slots.

    Signed-off-by: Jason Wessel
    Acked-by: Frederic Weisbecker
    Cc: kgdb-bugreport@lists.sourceforge.net
    Cc: K.Prasad
    Cc: Peter Zijlstra
    Cc: Alan Stern
    Cc: torvalds@linux-foundation.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jason Wessel
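
    A hedged sketch of the atomic-context install pass; HBP_NUM is
    x86's debug-register count (4), while the breakinfo/pev layout is
    illustrative:

      /* While atomic: (re)install the enabled slots on this cpu. */
      for (i = 0; i < HBP_NUM; i++) {
              struct perf_event *bp;

              if (!breakinfo[i].enabled)
                      continue;
              bp = *per_cpu_ptr(breakinfo[i].pev, cpu); /* pre-allocated */
              if (arch_install_hw_breakpoint(bp))
                      printk(KERN_ERR "kgdb: hw breakpoint %d install failed\n", i);
      }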
     

28 Jan, 2010

3 commits

  • On a given architecture, when hardware breakpoint registration fails
    due to an unsupported access type (read/write/execute), we lose the bp
    slot, since register_perf_hw_breakpoint() does not release the bp slot
    on failure. Hence, any subsequent hardware breakpoint registration
    starts failing with a 'no space left on device' error.

    This patch introduces error handling in register_perf_hw_breakpoint()
    and releases the bp slot on error (sketched below).

    Signed-off-by: Mahesh Salgaonkar
    Cc: Ananth N Mavinakayanahalli
    Cc: K. Prasad
    Cc: Maneesh Soni
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Mahesh Salgaonkar
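
    A sketch of the corrected flow, assuming the 2.6.33-era helpers
    reserve_bp_slot(), release_bp_slot() and
    arch_validate_hwbkpt_settings():

      int register_perf_hw_breakpoint(struct perf_event *bp)
      {
              int ret;

              ret = reserve_bp_slot(bp);
              if (ret)
                      return ret;

              ret = arch_validate_hwbkpt_settings(bp, bp->ctx->task);
              if (ret)
                      release_bp_slot(bp);    /* don't leak the slot */

              return ret;
      }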
     
  • Due to an incorrect line break the output currently contains tabs;
    also remove a trailing space. (The underlying pitfall is sketched
    below.)

    The actual output that logcheck sent me looked like this:
    Task events/1 (pid = 10) is on cpu 1^I^I^I^I(state = 1, flags = 84208040)

    After this patch it becomes:
    Task events/1 (pid = 10) is on cpu 1 (state = 1, flags = 84208040)

    Signed-off-by: Frans Pop
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frans Pop
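
    The likely culprit is a backslash line continuation inside the
    format string, which splices the next line's leading tabs into the
    output; a minimal illustration (format and arguments are
    stand-ins):

      /*
       * broken: the backslash continuation keeps the next line's
       * leading tabs inside the format string (hence the ^I^I^I^I)
       */
      printk(KERN_INFO "Task %s (pid = %d) is on cpu %d\
      		(state = %ld, flags = %x)\n",
             p->comm, p->pid, cpu, p->state, p->flags);

      /* fixed: adjacent string literals concatenate without whitespace */
      printk(KERN_INFO "Task %s (pid = %d) is on cpu %d "
             "(state = %ld, flags = %x)\n",
             p->comm, p->pid, cpu, p->state, p->flags);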
     
  • We moved to migrate on wakeup, which means that sleeping tasks could
    still be present on offline cpus. Amend the check to only test running
    tasks.

    Reported-by: Heiko Carstens
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Jan, 2010

4 commits

  • Lockdep has found the real bug, but the output doesn't look right to me:

    > =========================================================
    > [ INFO: possible irq lock inversion dependency detected ]
    > 2.6.33-rc5 #77
    > ---------------------------------------------------------
    > emacs/1609 just changed the state of lock:
    > (&(&tty->ctrl_lock)->rlock){+.....}, at: [] tty_fasync+0xe8/0x190
    > but this lock took another, HARDIRQ-unsafe lock in the past:
    > (&(&sighand->siglock)->rlock){-.....}

    "HARDIRQ-unsafe" and "this lock took another" looks wrong, afaics.

    > ... key at: [] __key.46539+0x0/0x8
    > ... acquired at:
    > [] __lock_acquire+0x1056/0x15a0
    > [] lock_acquire+0x9f/0x120
    > [] _raw_spin_lock_irqsave+0x52/0x90
    > [] __proc_set_tty+0x3e/0x150
    > [] tty_open+0x51d/0x5e0

    The stack-trace shows that this lock (ctrl_lock) was taken under
    ->siglock (which is hopefully irq-safe).

    This is a clear typo in check_usage_backwards(), where we tell the
    fancy print routine that we are going forwards (see the sketch
    below).

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
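
    A hedged sketch of the one-flag nature of the fix; argument names
    are illustrative, the point being that check_usage_backwards()
    passed the 'forwards' flag to the report routine:

      /* check_usage_backwards(): report the inversion as backwards (0),
       * not forwards (1) */
      - return print_irq_inversion_bug(curr, backwards_match, this, 1, irqclass);
      + return print_irq_inversion_bug(curr, backwards_match, this, 0, irqclass);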
     
  • Update the graph tracer examples to cover the new frame pointer semantics
    (in terms of passing it along). Move the HAVE_FUNCTION_GRAPH_FP_TEST docs
    out of the Kconfig, into the right place, and expand on the details.

    Signed-off-by: Mike Frysinger
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Mike Frysinger
     
  • If the iterator comes to an empty page for some reason, or if
    the page is emptied by a consuming read, the iterator code
    currently does not check whether the iterator is past the
    contents, and may return a false entry.

    This patch adds a check to the ring buffer iterator to test if the
    current page has been completely read, and sets the iterator to the
    next page if necessary (sketched below).

    Signed-off-by: Steven Rostedt

    Steven Rostedt
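
    A hedged sketch of the added check, with illustrative names in the
    style of kernel/trace/ring_buffer.c (rb_page_commit() as "bytes on
    this page", rb_inc_iter() as "advance to the next page"):

      /* the iterator has consumed everything on this page */
      if (iter->head >= rb_page_commit(iter->head_page)) {
              rb_inc_iter(iter);      /* move on to the next page */
              goto again;             /* retry the peek from there */
      }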
     
  • Usually, reads of the ring buffer are performed by a single task.
    There are two types of reads from the ring buffer.

    One is a consuming read which will consume the entry that was read
    and the next read will be the entry that follows.

    The other is an iterator that will let the user read the contents of
    the ring buffer without modifying it. When an iterator is allocated,
    writes to the ring buffer are disabled to protect the iterator.

    The problem exists when consuming reads happen while an iterator is
    allocated, specifically the kind of read that swaps out an entire
    page (used by splice) and replaces it with a new one. If the
    iterator is on the page that is swapped out, then the next read may
    read from this swapped-out page and return garbage.

    This patch adds a check when reading the iterator to make sure that
    the iterator contents are still valid. If a consuming read has
    taken place, the iterator is reset (sketched below).

    Signed-off-by: Steven Rostedt

    Steven Rostedt
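
    A hedged sketch of the staleness check: cache the reader page and
    read count when the iterator is (re)set, and compare them on every
    read (field names follow the description and are illustrative):

      /* at iterator (re)set time: remember where the reader was */
      iter->cache_reader_page = cpu_buffer->reader_page;
      iter->cache_read = cpu_buffer->read;

      /* on every iterator read: a consuming read invalidates us */
      if (unlikely(iter->cache_read != cpu_buffer->read ||
                   iter->cache_reader_page != cpu_buffer->reader_page))
              rb_iter_reset(iter);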
     

26 Jan, 2010

2 commits

  • commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume
    logic) introduced a potential kgdb deadlock. When the kernel is
    stopped by kgdb inside code which holds watchdog_lock, then kgdb
    deadlocks in clocksource_resume_watchdog().

    clocksource_resume_watchdog() is called from kgdb via
    clocksource_touch_watchdog() to avoid the clocksource watchdog
    marking TSC unstable after the kernel has been stopped.

    Solve this by replacing the spin_lock with a spin_trylock and just
    returning in case the lock is held (sketched below). Not resetting
    the watchdog might result in TSC being marked unstable, but that's
    an acceptable penalty for using kgdb.

    The timekeeping is easily screwed up by kgdb anyway when the system
    uses either jiffies or a clocksource which wraps in short intervals
    (e.g. pm_timer wraps about every 4.6s), so we really do not have to
    worry about that occasional TSC-marked-unstable side effect.

    The second caller of clocksource_resume_watchdog() is
    clocksource_resume(). The trylock is safe here as well because the
    system is UP at this point, interrupts are disabled and nothing
    else can hold watchdog_lock.

    Reported-by: Jason Wessel
    LKML-Reference:
    Cc: kgdb-bugreport@lists.sourceforge.net
    Cc: Martin Schwidefsky
    Cc: John Stultz
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
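
    A sketch of the resulting function, assuming the existing
    clocksource_reset_watchdog() helper:

      static void clocksource_resume_watchdog(void)
      {
              unsigned long flags;

              /*
               * If kgdb stopped the kernel while watchdog_lock was
               * held, taking it here would deadlock: skip the reset.
               */
              if (!spin_trylock_irqsave(&watchdog_lock, flags))
                      return;

              clocksource_reset_watchdog();
              spin_unlock_irqrestore(&watchdog_lock, flags);
      }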
     
  • If the contents of the ftrace ring buffer get corrupted and the
    trace file is read, it can cause a kernel oops (usually just
    killing the user task thread). This is caused by the checking of
    the pid in the buffer: if the pid is negative, it still references
    the cmdline cache array, which could point to an invalid address.

    The simple fix is to test for negative PIDs (sketched below).

    Signed-off-by: Steven Rostedt

    Steven Rostedt
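
    A hedged sketch of the guard in the cmdline lookup path (the
    placeholder string is illustrative):

      /* a corrupted buffer can hand us a negative pid: don't index with it */
      if (WARN_ON_ONCE(pid < 0)) {
              strcpy(comm, "<XXX>");
              return;
      }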
     


22 Jan, 2010

2 commits

  • There are a number of issues:

    1) TASK_WAKING vs cgroup_clone (cpusets)

    copy_process():

      sched_fork()
        child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
      if (current->nsproxy != p->nsproxy)
        ns_cgroup_clone()
          cgroup_clone()
            mutex_lock(inode->i_mutex)
            mutex_lock(cgroup_mutex)
            cgroup_attach_task()
              ss->can_attach()
              ss->attach() [ -> cpuset_attach() ]
                cpuset_attach_task()
                  set_cpus_allowed_ptr();
                    while (child->state == TASK_WAKING)
                      cpu_relax();

    will deadlock the system.

    2) cgroup_clone (cpusets) vs copy_process

    So even if the above would work we still have:

    copy_process():

      if (current->nsproxy != p->nsproxy)
        ns_cgroup_clone()
          cgroup_clone()
            mutex_lock(inode->i_mutex)
            mutex_lock(cgroup_mutex)
            cgroup_attach_task()
              ss->can_attach()
              ss->attach() [ -> cpuset_attach() ]
                cpuset_attach_task()
                  set_cpus_allowed_ptr();
      ...

      p->cpus_allowed = current->cpus_allowed

    over-writing the modified cpus_allowed.

    3) fork() vs hotplug

    if we unplug the child's cpu after the sanity check, when the child
    gets attached to the task_list but before wake_up_new_task(), shit
    will meet with fan.

    Solve all these issues by moving fork cpu selection into
    wake_up_new_task().

    Reported-by: Serge E. Hallyn
    Tested-by: Serge E. Hallyn
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: x86: Add support for the ANY bit
    perf: Change the is_software_event() definition
    perf: Honour event state for aux stream data
    perf: Fix perf_event_do_pending() fallback callsite
    perf kmem: Print usage help for unknown commands
    perf kmem: Increase "Hit" column length
    hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
    perf timechart: Use tid not pid for COMM change

    Linus Torvalds
     

21 Jan, 2010

4 commits

  • Anton reported that perf record kept receiving events even after
    calling ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK, COMM
    and MMAP events didn't respect the disabled state and kept flowing
    in (the guard is sketched below).

    Reported-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    Tested-by: Anton Blanchard
    LKML-Reference:
    CC: stable@kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
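
    A hedged sketch of the guard added to the side-band match helpers
    (shown for a hypothetical task-match helper; the same check would
    apply to COMM and MMAP matching):

      static int perf_event_task_match(struct perf_event *event)
      {
              /* honour the disabled state for side-band records too */
              if (event->state < PERF_EVENT_STATE_INACTIVE)
                      return 0;

              return 1;       /* remaining filter checks elided */
      }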
     
  • Paul questioned the context in which we should call
    perf_event_do_pending(). After looking at that, I found that it
    should be called from IRQ context these days; however, the fallback
    call-site is placed in softirq context. Amend this by placing the
    callback in the IRQ timer path.

    Reported-by: Paul Mackerras
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Assume an A->B schedule is in progress. If B acquired the BKL
    earlier and needs to reschedule this time, then in B's context it
    will go to need_resched_nonpreemptible to reschedule. But at this
    point, prev and switch_count still refer to A. That is wrong and
    leads to incorrect scheduler statistics (the shape of the fix is
    sketched below).

    Signed-off-by: Yong Zhang
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yong Zhang
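
    A hedged sketch of the shape of the fix, re-deriving prev and
    switch_count after the label so a loop back picks up the task now
    on the cpu (fragment; surrounding schedule() code elided):

      need_resched_nonpreemptible:
              /*
               * When B re-enters here after re-acquiring the BKL, prev
               * and switch_count must describe B, not the task A we
               * switched away from.
               */
              prev = rq->curr;
              switch_count = &prev->nivcsw;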
     
  • SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
    enabled, leading to many cache misses on large machines as we traverse
    looking for an idle shared cache to wake to. Change the enabler of
    select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
    sibling domain level.

    Reported-by: Lin Ming
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

18 Jan, 2010

1 commit

  • Marc reported that the BUG_ON in clockevents_notify() triggers on his
    system. This happens because the kernel tries to remove an active
    clock event device (used for broadcasting) from the device list.

    The handling of devices which can be used as per cpu device and as a
    global broadcast device is suboptimal.

    The simplest solution for now (and for stable) is to check whether
    the device is used as the global broadcast device (sketched below),
    but this needs to be revisited.

    [ tglx: restored the cpuweight check and massaged the changelog ]

    Reported-by: Marc Dionne
    Tested-by: Marc Dionne
    Signed-off-by: Xiaotian Feng
    LKML-Reference:
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Xiaotian Feng
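
    A hedged sketch of the check in the CPU-dead notification path;
    tick_is_broadcast_device() is an existing helper, the rest is
    illustrative:

      /*
       * Only drop a dying cpu's per-cpu device if it is not also
       * serving as the global broadcast device.
       */
      if (cpumask_test_cpu(cpu, dev->cpumask) &&
          cpumask_weight(dev->cpumask) == 1 &&
          !tick_is_broadcast_device(dev)) {
              BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
              list_del(&dev->list);
      }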
     

17 Jan, 2010

7 commits

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futexes: Remove rw parameter from get_futex_key()

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing/filters: Add comment for match callbacks
    tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
    tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching
    lib: Introduce strnstr()
    tracing/filters: Fix MATCH_END_ONLY filter matching
    tracing/filters: Fix MATCH_FRONT_ONLY filter matching
    ftrace: Fix MATCH_END_ONLY function filter
    tracing/x86: Derive arch from bits argument in recordmcount.pl
    ring-buffer: Add rb_list_head() wrapper around new reader page next field
    ring-buffer: Wrap a list.next reference with rb_list_head()

    Linus Torvalds
     
  • The change in acpi_cpufreq to use smp_call_function_any() causes a
    warning when it is called, since the function erroneously passes
    the cpu id to cpumask_of_node() rather than the node that the cpu
    is on. Fix this (one-line sketch below).

    cpumask_of_node(3): node > nr_node_ids(1)
    Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
    Call Trace:
    [] cpumask_of_node+0x23/0x58
    [] smp_call_function_any+0x65/0xfa
    [] ? do_drv_read+0x0/0x2f
    [] get_cur_val+0xb0/0x102
    [] get_cur_freq_on_cpu+0x74/0xc5
    [] acpi_cpufreq_cpu_init+0x417/0x515
    [] ? __down_write+0xb/0xd
    [] cpufreq_add_dev+0x278/0x922

    Signed-off-by: David John
    Cc: Suresh Siddha
    Cc: Rusty Russell
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David John
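
    The fix is presumably a one-liner of this shape in
    smp_call_function_any():

      /* buggy: passes a cpu id where a node id is expected */
      nodemask = cpumask_of_node(cpu);

      /* fixed: map the cpu to its node first */
      nodemask = cpumask_of_node(cpu_to_node(cpu));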
     
  • On my first try using them I missed that the fifos need to be a
    power of two in size, resulting in a runtime bug. Document that
    requirement everywhere (and fix one grammar bug). A usage sketch
    follows this entry.

    Signed-off-by: Andi Kleen
    Acked-by: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
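
    A minimal usage sketch against the 2.6.33 kfifo API, rounding the
    requested size up so the power-of-two requirement is met:

      #include <linux/kfifo.h>
      #include <linux/log2.h>
      #include <linux/slab.h>

      static int example_init(void)
      {
              struct kfifo fifo;
              int ret;

              /* kfifo sizes must be a power of two: round the request up */
              ret = kfifo_alloc(&fifo, roundup_pow_of_two(1000), GFP_KERNEL);
              if (ret)
                      return ret;

              /* ... use the fifo ... */
              kfifo_free(&fifo);
              return 0;
      }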
     
  • In some upcoming code it's useful to peek into a FIFO without
    permanently removing data. This patch implements a new
    kfifo_out_peek() to do this (usage sketched below).

    Signed-off-by: Andi Kleen
    Acked-by: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
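
    A hedged usage sketch, assuming the signature this patch appears to
    introduce (destination buffer, length, and an offset into the
    FIFO):

      unsigned char hdr[4];
      unsigned int n;

      /* look at the first bytes without consuming them */
      n = kfifo_out_peek(&fifo, hdr, sizeof(hdr), 0);

      /* the data is still there: a later kfifo_out() sees the same bytes */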
     
  • Right now, for kfifo_*_user it's not easily possible to distinguish
    between a user copy failing and the FIFO not containing enough
    data. The problem is that both conditions are multiplexed into the
    same return code.

    Avoid this by moving the "copy length" into a separate output
    parameter and only returning 0/-EFAULT in the main return value
    (sketched below).

    I didn't fully adapt the weird "record" variants; those seem to be
    unused anyway and were rather messy (should they just be removed?).

    I would appreciate some double checking that I did all the
    conversions correctly.

    Signed-off-by: Andi Kleen
    Cc: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
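
    A hedged sketch of the resulting calling convention, with the
    copied length as a separate output parameter as described above:

      unsigned int copied;
      int ret;

      /* the return value now only signals success or -EFAULT ... */
      ret = kfifo_to_user(&fifo, ubuf, len, &copied);
      if (ret)
              return ret;     /* user copy faulted */

      /* ... while the transferred length arrives separately */
      return copied;          /* may be less than len if the FIFO ran short */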
     
  • The pointers to user buffers are currently unsigned char *, which requires
    a lot of casting in the caller for any non-char typed buffers. Use void *
    instead.

    Signed-off-by: Andi Kleen
    Acked-by: Stefani Seibold
    Cc: Roland Dreier
    Cc: Dmitry Torokhov
    Cc: Andy Walls
    Cc: Vikram Dhillon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

15 Jan, 2010

6 commits

  • We should be clear on 2 things:

    - the length parameter of a match callback includes the
    trailing '\0'.

    - the string to be searched might not be NULL-terminated.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • MATCH_FULL matching for PTR_STRING is not working correctly:

    # echo 'func == vt' > events/bkl/lock_kernel/filter
    # echo 1 > events/bkl/lock_kernel/enable
    ...
    # cat trace
    Xorg-1484 [000] 1973.392586: lock_kernel: ... func=vt_ioctl()
    gpm-1402 [001] 1974.027740: lock_kernel: ... func=vt_ioctl()

    We should pass to regex.match(..., len) the length (including '\0')
    of the source string instead of the length of the pattern string.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • The @str might not be NULL-terminated if it's of type
    DYN_STRING or STATIC_STRING, so we should use strnstr()
    instead of strstr() (strnstr() is sketched below).

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
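
    The semantics of the newly introduced strnstr() are along these
    lines (a length-limited strstr(); a sketch, not necessarily the
    exact lib/string.c code):

      /*
       * Find the first occurrence of s2 in s1, looking at no more
       * than the first len bytes of s1.
       */
      char *strnstr(const char *s1, const char *s2, size_t len)
      {
              size_t l2 = strlen(s2);

              if (!l2)
                      return (char *)s1;
              while (len >= l2) {
                      len--;
                      if (!memcmp(s1, s2, l2))
                              return (char *)s1;
                      s1++;
              }
              return NULL;
      }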
     
  • For a '*foo' pattern, we should allow any string ending with
    'foo', but event filtering incorrectly disallows strings
    like bar_foo_foo.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • MATCH_FRONT_ONLY actually behaves as a full match:

    # ./perf record -R -f -a -e lock:lock_acquire \
    --filter 'name ~rcu_*' sleep 1
    # ./perf trace
    (no output)

    We should pass the length of the pattern string to strncmp().

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • For a '*foo' pattern, we should allow any string ending with
    'foo', but the ftrace filter incorrectly disallows strings
    like bar_foo_foo (the corrected comparison is sketched below):

    # echo '*io' > set_ftrace_filter
    # cat set_ftrace_filter | grep 'req_bio_endio'
    # cat available_filter_functions | grep 'req_bio_endio'
    req_bio_endio

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
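
    A hedged sketch of the corrected end-only comparison (names
    illustrative: str is the candidate, regex/len the pattern and its
    length):

      case MATCH_END_ONLY:
              slen = strlen(str);
              /* suffix match: compare the last len bytes of str */
              if (slen >= len && memcmp(str + slen - len, regex, len) == 0)
                      matched = 1;
              break;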
     

13 Jan, 2010

1 commit

  • Currently, futexes have two problems:

    A) The current futex code doesn't handle private file mappings properly.

    get_futex_key() uses PageAnon() to distinguish file and
    anon, which can cause the following bad scenario:

    1) thread-A calls futex(private-mapping, FUTEX_WAIT); it
    sleeps on the file mapping object.
    2) thread-B writes to the variable, which triggers a COW; the
    page becomes anonymous.
    3) thread-B calls futex(private-mapping, FUTEX_WAKE); it
    wakes up blocked threads on the anonymous page (i.e. it wakes
    nothing).

    B) Current futex code doesn't handle zero page properly.

    Read mode get_user_pages() can return zero page, but current
    futex code doesn't handle it at all. Then, zero page makes
    infinite loop internally.

    The solution is to always use write-mode get_user_pages() for the
    page lookup (sketched below). This prevents the lookup from
    returning either the file page of a private mapping or the zero
    page.

    Performance concerns:

    Probably very little, because glibc always initializes futex
    variables before calling futex(), so glibc users never see the
    overhead of this patch.

    Compatibility concerns:

    This patch has few compatibility issues. After this patch,
    FUTEX_WAIT requires write access to futex variables (read-only
    mappings get EFAULT). But practically it's not a problem: glibc
    always initializes futex variables explicitly, and nobody uses
    read-only mappings.

    Reported-by: Hugh Dickins
    Signed-off-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Darren Hart
    Cc:
    Cc: Linus Torvalds
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: Ulrich Drepper
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
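
    A hedged sketch of the lookup in get_futex_key() after the change
    (retry logic elided; the write=1 argument is the point):

      /*
       * Always fault the page in for write: this forces the COW for a
       * private mapping and can never return the zero page, so the
       * resulting key is stable.
       */
      err = get_user_pages_fast(address, 1, 1 /* write */, &page);
      if (err < 0)
              return err;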
     

12 Jan, 2010

2 commits

  • When print-fatal-signals is enabled it's possible to dump any memory
    reachable by the kernel to the log by simply jumping to that address from
    user space.

    Or crash the system if there's some hardware with read side effects.

    The fatal signals handler will dump 16 bytes at the execution address,
    which is fully controlled by ring 3.

    In addition, when something jumps to an unmapped address there will
    be up to 16 additional useless page faults, which might be slow
    (and at least is not very efficient).

    Fortunately this option is off by default and only exists on i386.

    But fix it by checking for kernel addresses and also stopping when
    there's a page fault (sketched below).

    Signed-off-by: Andi Kleen
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
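
    A hedged sketch of the dump loop; get_user() both rejects kernel
    addresses (via access_ok) and reports faults, so one check covers
    both problems:

      int i;

      for (i = 0; i < 16; i++) {
              unsigned char insn;

              if (get_user(insn, (unsigned char *)(regs->ip + i)))
                      break;  /* kernel address or faulting page: stop */
              printk("%02x ", insn);
      }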
     
  • The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
    here in cgroup_diput():

    /*
    * if we're getting rid of the cgroup, refcount should ensure
    * that there are no pidlists left.
    */
    BUG_ON(!list_empty(&cgrp->pidlists));

    The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
    when pidlist_array_load() calls cgroup_pidlist_find():

    (1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
    pre-existing cgroup_pidlist, and increments its use_count.
    (2) if no matching cgroup_pidlist is found, then a new one is allocated, it
    down_write's its mutex, and the use_count is set to 0.
    (3) the matching, or new, cgroup_pidlist is returned to pidlist_array_load(),
    which increments its use_count, regardless of whether it is new or
    pre-existing, and up_write's the mutex.

    So if a matching list is ever encountered by cgroup_pidlist_find() during
    the life of a cgroup directory, it results in an inflated use_count value,
    preventing it from ever getting released by cgroup_release_pid_array().
    Then if the directory is subsequently removed, cgroup_diput() hits the
    BUG_ON() when it finds that the directory's cgroup is still populated with
    a pidlist.

    The patch simply removes the use_count increment when a matching pidlist
    is found by cgroup_pidlist_find(), because it gets bumped by the calling
    pidlist_array_load() function while still protected by the list's mutex.

    Signed-off-by: Dave Anderson
    Reviewed-by: Li Zefan
    Acked-by: Ben Blum
    Cc: Paul Menage
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Anderson