11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    ">rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

28 Sep, 2016

1 commit

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
     

20 Sep, 2016

1 commit

  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.
    Since the online target runs always on the target CPU we can drop
    smp_call_function_single(). The functions is invoked with interrupts off to
    keep the old calling convention. If the maintainer things that this function
    can be called with interrupts enabled then it can be removed :)

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Robert Richter
    Cc: Peter Zijlstra
    Cc: oprofile-list@lists.sf.net
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160906170457.32393-10-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

17 Apr, 2015

1 commit

  • sync_buffer() needs the mmap_sem for two distinct operations, both only
    occurring upon user context switch handling:

    1) Dealing with the exe_file.

    2) Adding the dcookie data as we need to lookup the vma that
    backs it. This is done via add_sample() and add_data().

    This patch isolates 1), for it will no longer need the mmap_sem for
    serialization. However, for now, make of the more standard
    get_mm_exe_file(), requiring only holding the mmap_sem to read the value,
    and relying on reference counting to make sure that the exe file won't
    dissappear underneath us while doing the get dcookie.

    As a consequence, for 2) we move the mmap_sem locking into where we really
    need it, in lookup_dcookie(). The benefits are twofold: reduce mmap_sem
    hold times, and cleaner code.

    [akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko]
    Signed-off-by: Davidlohr Bueso
    Cc: Robert Richter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

16 Apr, 2015

1 commit


27 Aug, 2014

1 commit


20 Mar, 2014

1 commit

  • Subsystems that want to register CPU hotplug callbacks, as well as perform
    initialization for the CPUs that are already online, often do it as shown
    below:

    get_online_cpus();

    for_each_online_cpu(cpu)
    init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

    This is wrong, since it is prone to ABBA deadlocks involving the
    cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
    with CPU hotplug operations).

    Instead, the correct and race-free way of performing the callback
    registration is:

    cpu_notifier_register_begin();

    for_each_online_cpu(cpu)
    init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_notifier_register_done();

    Fix the nmi-timer code in oprofile by using this latter form of callback
    registration.

    Cc: Robert Richter
    Cc: Ingo Molnar
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

04 Sep, 2013

6 commits


15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the remaining one-off uses of the __cpuinit macros
    from all C files in the drivers/* directory.

    [1] https://lkml.org/lkml/2013/5/20/589

    Cc: Greg Kroah-Hartman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as it's module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesytem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

23 Feb, 2013

1 commit


09 Oct, 2012

1 commit

  • Some security modules and oprofile still uses VM_EXECUTABLE for retrieving
    a task's executable file. After this patch they will use mm->exe_file
    directly. mm->exe_file is protected with mm->mmap_sem, so locking stays
    the same.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Chris Metcalf [arch/tile]
    Acked-by: Tetsuo Handa [tomoyo]
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Acked-by: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

27 Aug, 2012

1 commit

  • Under certain workloads we see the following warnings:

    WQ on CPU0, prefer CPU1
    WQ on CPU0, prefer CPU2
    WQ on CPU0, prefer CPU3

    It warns the user that the wq to access a per-cpu buffers runs not on
    the same cpu. This happens if the wq is rescheduled on a different cpu
    than where the buffer is located. This was probably implemented to
    detect performance issues. Not sure if there actually is one as the
    buffers are copied to a single buffer anyway which should be the
    actual bottleneck.

    We wont change WQ implementation. Since a user can do nothing the
    warning is pointless. Removing it.

    Cc: Andi Kleen
    Signed-off-by: Robert Richter

    Robert Richter
     

22 Jun, 2012

1 commit

  • This changes oprofile_perf.c to use the per-cpu framework.

    Using the per-cpu framework should avoid error like the following:

    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:28:28: error: variably modified 'perf_events' at file scope

    Reported-by: William Cohen
    Cc: Will Deacon
    Signed-off-by: Robert Richter

    Robert Richter
     

21 Jun, 2012

1 commit

  • The OProfile perf backend uses a static array to keep track of the
    perf events on the system. When compiling with CONFIG_CPUMASK_OFFSTACK=y
    && SMP, nr_cpumask_bits is not a compile-time constant and the build
    will fail with:

    oprofile_perf.c:28: error: variably modified 'perf_events' at file scope

    This patch uses NR_CPUs instead of nr_cpumask_bits for the array
    initialisation. If this causes space problems in the future, we can
    always move to dynamic allocation for the events array.

    Cc: Matt Fleming
    Reported-by: Russell King - ARM Linux
    Signed-off-by: Will Deacon
    Cc: # v2.6.37+
    Signed-off-by: Robert Richter

    Will Deacon
     

06 Apr, 2012

1 commit

  • Many users of debugfs copy the implementation of default_open() when
    they want to support a custom read/write function op. This leads to a
    proliferation of the default_open() implementation across the entire
    tree.

    Now that the common implementation has been consolidated into libfs we
    can replace all the users of this function with simple_open().

    This replacement was done with the following semantic patch:

    @ open @
    identifier open_f != simple_open;
    identifier i, f;
    @@
    -int open_f(struct inode *i, struct file *f)
    -{
    (
    -if (i->i_private)
    -f->private_data = i->i_private;
    |
    -f->private_data = i->i_private;
    )
    -return 0;
    -}

    @ has_open depends on open @
    identifier fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ...
    -.open = open_f,
    +.open = simple_open,
    ...
    };

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stephen Boyd
    Cc: Greg Kroah-Hartman
    Cc: Al Viro
    Cc: Julia Lawall
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     

21 Mar, 2012

2 commits


07 Jan, 2012

1 commit

  • * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits)
    perf kvm: Fix copy & paste error in description
    perf script: Kill script_spec__delete
    perf top: Fix a memory leak
    perf stat: Introduce get_ratio_color() helper
    perf session: Remove impossible condition check
    perf tools: Fix feature-bits rework fallout, remove unused variable
    perf script: Add generic perl handler to process events
    perf tools: Use for_each_set_bit() to iterate over feature flags
    perf tools: Unify handling of features when writing feature section
    perf report: Accept fifos as input file
    perf tools: Moving code in some files
    perf tools: Fix out-of-bound access to struct perf_session
    perf tools: Continue processing header on unknown features
    perf tools: Improve macros for struct feature_ops
    perf: builtin-record: Document and check that mmap_pages must be a power of two.
    perf: builtin-record: Provide advice if mmap'ing fails with EPERM.
    perf tools: Fix truncated annotation
    perf script: look up thread using tid instead of pid
    perf tools: Look up thread names for system wide profiling
    perf tools: Fix comm for processes with named threads
    ...

    Linus Torvalds
     

20 Dec, 2011

2 commits


07 Dec, 2011

1 commit


15 Nov, 2011

2 commits


04 Nov, 2011

3 commits

  • The legacy x86 nmi watchdog code was removed with the implementation
    of the perf based nmi watchdog. This broke Oprofile's nmi timer
    mode. To run nmi timer mode we relied on a continuous ticking nmi
    source which the nmi watchdog provided. The nmi tick was no longer
    available and current watchdog can not be used anymore since it runs
    with very long periods in the range of seconds. This patch
    reimplements the nmi timer mode using a perf counter nmi source.

    V2:
    * removing pr_info()
    * fix undefined reference to `__udivdi3' for 32 bit build
    * fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb
    * removed nmi timer setup in arch/x86
    * implemented function stubs for op_nmi_init/exit()
    * made code more readable in oprofile_init()

    V3:
    * fix architectural initialization in oprofile_init()
    * fix CONFIG_OPROFILE_NMI_TIMER dependencies

    Acked-by: Peter Zijlstra
    Signed-off-by: Robert Richter

    Robert Richter
     
  • Remove exit functions by moving init/exit code to oprofile's setup/
    shutdown functions. Doing so the oprofile module exit code will be
    easier and less error-prone.

    Signed-off-by: Robert Richter

    Robert Richter
     
  • Oprofile may crash in a KVM guest while unlaoding modules. This
    happens if oprofile_arch_init() fails and oprofile switches to the hr
    timer mode as a fallback. In this case oprofile_arch_exit() is called,
    but it never was initialized properly which causes the crash. This
    patch fixes this.

    oprofile: using timer interrupt.
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] unregister_syscore_ops+0x41/0x58
    PGD 41da3f067 PUD 41d80e067 PMD 0
    Oops: 0002 [#1] PREEMPT SMP
    CPU 5
    Modules linked in: oprofile(-)

    Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d #18 Advanced Micro Device Anaheim/Anaheim
    RIP: 0010:[] [] unregister_syscore_ops+0x41/0x58
    RSP: 0018:ffff88041de1de98 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200
    RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
    RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082
    R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080
    R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
    FS: 00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040)
    Stack:
    ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e
    ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993
    ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200
    Call Trace:
    [] op_nmi_exit+0x15/0x17 [oprofile]
    [] oprofile_arch_exit+0xe/0x10 [oprofile]
    [] oprofile_exit+0x13/0x15 [oprofile]
    [] sys_delete_module+0x1c3/0x22f
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b
    Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
    89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
    RIP [] unregister_syscore_ops+0x41/0x58
    RSP
    CR2: 0000000000000008
    ---[ end trace 06d4e95b6aa3b437 ]---

    CC: stable@kernel.org # 2.6.37+
    Signed-off-by: Robert Richter

    Robert Richter
     

13 Sep, 2011

1 commit

  • The oprofilefs_lock can be taken in atomic context (in profiling
    interrupts) and therefore cannot cannot be preempted on -rt -
    annotate it.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

22 Jul, 2011

1 commit


01 Jul, 2011

1 commit

  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and derived hardware breakpoints APIs) and storing it in the perf_event.
    The field can be accessed from the callback as event->overflow_handler_context.
    All callers are updated.

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     

31 May, 2011

1 commit

  • This fixes the A->B/B->A locking dependency, see the warning below.

    The function task_exit_notify() is called with (task_exit_notifier)
    .rwsem set and then calls sync_buffer() which locks buffer_mutex. In
    sync_start() the buffer_mutex was set to prevent notifier functions to
    be started before sync_start() is finished. But when registering the
    notifier, (task_exit_notifier).rwsem is locked too, but now in
    different order than in sync_buffer(). In theory this causes a locking
    dependency, what does not occur in practice since task_exit_notify()
    is always called after the notifier is registered which means the lock
    is already released.

    However, after checking the notifier functions it turned out the
    buffer_mutex in sync_start() is unnecessary. This is because
    sync_buffer() may be called from the notifiers even if sync_start()
    did not finish yet, the buffers are already allocated but empty. No
    need to protect this with the mutex.

    So we fix this theoretical locking dependency by removing buffer_mutex
    in sync_start(). This is similar to the implementation before commit:

    750d857 oprofile: fix crash when accessing freed task structs

    which introduced the locking dependency.

    Lockdep warning:

    oprofiled/4447 is trying to acquire lock:
    (buffer_mutex){+.+...}, at: [] sync_buffer+0x31/0x3ec [oprofile]

    but task is already holding lock:
    ((task_exit_notifier).rwsem){++++..}, at: [] __blocking_notifier_call_chain+0x39/0x67

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 ((task_exit_notifier).rwsem){++++..}:
    [] lock_acquire+0xf8/0x11e
    [] down_write+0x44/0x67
    [] blocking_notifier_chain_register+0x52/0x8b
    [] profile_event_register+0x2d/0x2f
    [] sync_start+0x47/0xc6 [oprofile]
    [] oprofile_setup+0x60/0xa5 [oprofile]
    [] event_buffer_open+0x59/0x8c [oprofile]
    [] __dentry_open+0x1eb/0x308
    [] nameidata_to_filp+0x60/0x67
    [] do_last+0x5be/0x6b2
    [] path_openat+0xc7/0x360
    [] do_filp_open+0x3d/0x8c
    [] do_sys_open+0x110/0x1a9
    [] sys_open+0x20/0x22
    [] system_call_fastpath+0x16/0x1b

    -> #0 (buffer_mutex){+.+...}:
    [] __lock_acquire+0x1085/0x1711
    [] lock_acquire+0xf8/0x11e
    [] mutex_lock_nested+0x63/0x309
    [] sync_buffer+0x31/0x3ec [oprofile]
    [] task_exit_notify+0x16/0x1a [oprofile]
    [] notifier_call_chain+0x37/0x63
    [] __blocking_notifier_call_chain+0x50/0x67
    [] blocking_notifier_call_chain+0x14/0x16
    [] profile_task_exit+0x1a/0x1c
    [] do_exit+0x2a/0x6fc
    [] do_group_exit+0x83/0xae
    [] sys_exit_group+0x17/0x1b
    [] system_call_fastpath+0x16/0x1b

    other info that might help us debug this:

    1 lock held by oprofiled/4447:
    #0: ((task_exit_notifier).rwsem){++++..}, at: [] __blocking_notifier_call_chain+0x39/0x67

    stack backtrace:
    Pid: 4447, comm: oprofiled Not tainted 2.6.39-00007-gcf4d8d4 #10
    Call Trace:
    [] print_circular_bug+0xae/0xbc
    [] __lock_acquire+0x1085/0x1711
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] lock_acquire+0xf8/0x11e
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] ? mark_lock+0x42f/0x552
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] mutex_lock_nested+0x63/0x309
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] sync_buffer+0x31/0x3ec [oprofile]
    [] ? __blocking_notifier_call_chain+0x39/0x67
    [] ? __blocking_notifier_call_chain+0x39/0x67
    [] task_exit_notify+0x16/0x1a [oprofile]
    [] notifier_call_chain+0x37/0x63
    [] __blocking_notifier_call_chain+0x50/0x67
    [] blocking_notifier_call_chain+0x14/0x16
    [] profile_task_exit+0x1a/0x1c
    [] do_exit+0x2a/0x6fc
    [] ? retint_swapgs+0xe/0x13
    [] do_group_exit+0x83/0xae
    [] sys_exit_group+0x17/0x1b
    [] system_call_fastpath+0x16/0x1b

    Reported-by: Marcin Slusarz
    Cc: Carl Love
    Cc: # .36+
    Signed-off-by: Robert Richter

    Robert Richter