19 May, 2010

4 commits

  • Add get_online_cpus/put_online_cpus to ensure that no cpu goes
    offline during the flushing of the padata percpu queues.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • Previously, yield() was used to wait until all references to the
    internal control structure in use were dropped before it was freed.
    This patch implements padata_flush_queues, which actively flushes
    the padata percpu queues in this case.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • padata_get_next needs to check whether the next object that needs
    serialization must be parallel processed by the local cpu. This
    check was implemented incorrectly and always returned true, so the
    try_again loop in padata_reorder was never taken. This can lead to
    object leaks in some rare cases due to a race that appears with the
    trylock in padata_reorder. The try_again loop was not a good idea
    after all, because a cpu could take that loop frequently, so we
    handle this with a timer instead.

    This patch adds a timer to handle the race that appears with the
    trylock. If cpu1 queues an object to the reorder queue while cpu2
    holds the pd->lock but has already left the while loop in
    padata_reorder, cpu2 cannot handle this object and cpu1 exits
    because it cannot take the lock. Usually the next cpu that takes
    the lock handles this object too. The timer is needed only if this
    object was the last one to arrive at the reorder queues; in that
    case the timer function sends it out.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     

03 May, 2010

6 commits


25 Apr, 2010

1 commit

  • On ppc64 you get this error:

    $ setarch ppc -R true
    setarch: ppc: Unrecognized architecture

    because uname still reports ppc64 as the machine.

    So mask off the personality flags when checking for PER_LINUX32.

    Signed-off-by: Andreas Schwab
    Reviewed-by: Christoph Hellwig
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Schwab
     

22 Apr, 2010

1 commit

  • creds_are_invalid() reads both cred->usage and cred->subscribers and then
    compares them to make sure the number of processes subscribed to a cred struct
    never exceeds the refcount of that cred struct.

    The problem is that this can cause a race with both copy_creds() and
    exit_creds() as the two counters, whilst they are of atomic_t type, are only
    atomic with respect to themselves, and not atomic with respect to each other.

    This means that creds_are_invalid() can read the values on one CPU
    whilst they're being modified on another, and so can observe an
    evolving state in which the subscribers count is momentarily greater
    than the usage count.

    Switching the order in which the counts are read cannot help, so the thing to
    do is to remove that particular check.

    I had considered rechecking the values to see if they're in flux if the test
    fails, but I can't guarantee they won't appear the same, even if they've
    changed several times in the meantime.

    Note that this can only happen if CONFIG_DEBUG_CREDENTIALS is enabled.

    The problem is only likely to occur with multithreaded programs, and can be
    tested by the tst-eintr1 program from glibc's "make check". The symptoms look
    like:

    CRED: Invalid credentials
    CRED: At include/linux/cred.h:240
    CRED: Specified credentials: ffff88003dda5878 [real][eff]
    CRED: ->magic=43736564, put_addr=(null)
    CRED: ->usage=766, subscr=766
    CRED: ->*uid = { 0,0,0,0 }
    CRED: ->*gid = { 0,0,0,0 }
    CRED: ->security is ffff88003d72f538
    CRED: ->security {359, 359}
    ------------[ cut here ]------------
    kernel BUG at kernel/cred.c:850!
    ...
    RIP: 0010:[] [] __invalid_creds+0x4e/0x52
    ...
    Call Trace:
    [] copy_creds+0x6b/0x23f

    Note the ->usage=766 and subscr=766. The values appear the same because
    they've been re-read since the check was made.

    Reported-by: Roland McGrath
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

21 Apr, 2010

1 commit

  • Patch 570b8fb505896e007fd3bb07573ba6640e51851d:

    Author: Mathieu Desnoyers
    Date: Tue Mar 30 00:04:00 2010 +0100
    Subject: CRED: Fix memory leak in error handling

    attempts to fix a memory leak in the error handling by making the offending
    return statement into a jump down to the bottom of the function where a
    kfree(tgcred) is inserted.

    This is, however, incorrect, as it does a kfree() after doing put_cred() if
    security_prepare_creds() fails. That will result in a double free if 'error'
    is jumped to as put_cred() will also attempt to free the new tgcred record by
    virtue of it being pointed to by the new cred record.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

19 Apr, 2010

1 commit

  • The lockdep facility temporarily disables lockdep checking by
    incrementing the current->lockdep_recursion variable. Such
    disabling happens in NMIs and in other situations where lockdep
    might expect to recurse on itself.

    This patch therefore checks current->lockdep_recursion, disabling RCU
    lockdep splats when this variable is non-zero. In addition, this patch
    removes the "likely()", as suggested by Lai Jiangshan.

    Reported-by: Frederic Weisbecker
    Reported-by: David Miller
    Tested-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: eric.dumazet@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

11 Apr, 2010

1 commit

  • When CONFIG_DEBUG_BLOCK_EXT_DEVT is set, the device number is
    decoded improperly by old_decode_dev(), which results in an error
    while hibernating with s2disk.

    All users already pass the new device number, so switch to
    new_decode_dev().

    Signed-off-by: Jiri Slaby
    Reported-and-tested-by: Jiri Kosina
    Signed-off-by: "Rafael J. Wysocki"

    Jiri Slaby
     

08 Apr, 2010

1 commit


07 Apr, 2010

2 commits


06 Apr, 2010

5 commits

  • taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with
    the following error:

    sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument)

    This box has 128 threads and 16 bytes is enough to cover it.

    Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched:
    sched_getaffinity(): Allow less than NR_CPUS length) compares
    these 16 bytes against nr_cpu_ids.

    Fix it by comparing nr_cpu_ids to the number of bits in the
    cpumask we pass in.

    Signed-off-by: Anton Blanchard
    Reviewed-by: KOSAKI Motohiro
    Cc: Sharyathi Nagesh
    Cc: Ulrich Drepper
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Jack Steiner
    Cc: Russ Anderson
    Cc: Mike Travis
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     
  • Module refcounting is implemented with a per-cpu counter for speed.
    However there is a race when tallying the counter where a reference
    may be taken by one CPU and released by another. Reference count
    summation may then see the decrement without having seen the
    previous increment, leading to a lower-than-expected count. A module
    which never has its actual reference count drop below 1 may return a
    reference count of 0 due to this race.

    Module removal generally runs under stop_machine, which prevents this
    race causing bugs due to removal of in-use modules. However there are
    other real bugs in module.c code and driver code (module_refcount is
    exported) where the callers do not run under stop_machine.

    Fix this by maintaining running per-cpu counters for the number of
    module refcount increments and the number of refcount decrements. The
    increments are tallied after the decrements, so any decrement seen will
    always have its corresponding increment counted. The final refcount is
    the difference of the total increments and decrements, preventing a
    low-refcount from being returned.

    Signed-off-by: Nick Piggin
    Acked-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • There have been a number of reports of people seeing the message:
    "name_count maxed, losing inode data: dev=00:05, inode=3185"
    in dmesg. These usually lead to people reporting problems to the filesystem
    group who are in turn clueless what they mean.

    Eventually someone finds me and I explain what is going on and that
    these come from the audit system. The basic problem is that the
    audit subsystem never expects a single syscall to 'interact' (for
    some wishy-washy meaning of interact) with more than 20 inodes. But
    in fact some operations, like loading kernel modules, can cause
    changes to lots of inodes in debugfs.

    There are a couple real fixes being bandied about including removing the
    fixed compile time limit of 20 or not auditing changes in debugfs (or
    both) but neither are small and obvious so I am not sending them for
    immediate inclusion (I hope Al forwards a real solution next devel
    window).

    In the meantime this patch simply adds 'audit' to the beginning of the
    crap message so if a user sees it, they come blame me first and we can
    talk about what it means and make sure we understand all of the reasons
    it can happen and make sure this gets solved correctly in the long run.

    Signed-off-by: Eric Paris
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • * 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
    eeepc-wmi: include slab.h
    staging/otus: include slab.h from usbdrv.h
    percpu: don't implicitly include slab.h from percpu.h
    kmemcheck: Fix build errors due to missing slab.h
    include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
    iwlwifi: don't include iwl-dev.h from iwl-devtrace.h
    x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h

    Fix up trivial conflicts in include/linux/percpu.h due to
    is_kernel_percpu_address() having been introduced since the slab.h
    cleanup with the percpu_up.c splitup.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    module: add stub for is_module_percpu_address
    percpu, module: implement and use is_kernel/module_percpu_address()
    module: encapsulate percpu handling better and record percpu_size

    Linus Torvalds
     

05 Apr, 2010

4 commits


03 Apr, 2010

11 commits


01 Apr, 2010

2 commits

  • Scheduler's task migration events don't work because they always
    pass NULL regs to perf_sw_event(). The event hence gets filtered
    out in perf_swevent_add().

    Scheduler's context switch events use task_pt_regs() to get the
    context when the event occurred, which is wrong, as it won't give
    us the place in the kernel where we went to sleep but rather the
    place where we left userspace. The result is even more wrong if we
    switch from a kernel thread.

    Use the hot regs snapshot for both events as they belong to the
    non-interrupt/exception based events family. Unlike page faults
    or so that provide the regs matching the exact origin of the event,
    we need to save the current context.

    This makes the task migration event work and fixes the context
    switch callchains and origin ip.

    Example: perf record -a -e cs

    Before:

    10.91% ksoftirqd/0 0 [k] 0000000000000000
    |
    --- (nil)
    perf_callchain
    perf_prepare_sample
    __perf_event_overflow
    perf_swevent_overflow
    perf_swevent_add
    perf_swevent_ctx_event
    do_perf_sw_event
    __perf_sw_event
    perf_event_task_sched_out
    schedule
    run_ksoftirqd
    kthread
    kernel_thread_helper

    After:

    23.77% hald-addon-stor [kernel.kallsyms] [k] schedule
    |
    --- schedule
    |
    |--60.00%-- schedule_timeout
    | wait_for_common
    | wait_for_completion
    | blk_execute_rq
    | scsi_execute
    | scsi_execute_req
    | sr_test_unit_ready
    | |
    | |--66.67%-- sr_media_change
    | | media_changed
    | | cdrom_media_changed
    | | sr_block_media_changed
    | | check_disk_change
    | | cdrom_open

    v2: Always build perf_arch_fetch_caller_regs() now that software
    events need that too. They don't need it from modules, unlike trace
    events, so we keep the EXPORT_SYMBOL in trace_event_perf.c

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: David Miller

    Frederic Weisbecker
     
  • The trace event buffer used by perf to record raw sample events
    is typed as an array of char and may then not be aligned to 8
    by alloc_percpu().

    But we need it to be aligned to 8 on sparc64 because we cast this
    buffer into an arbitrary structure type built by the TRACE_EVENT()
    macro to store the traces. So if a 64-bit field is accessed inside,
    it may not be under the expected alignment.

    Use an array of long instead to force the appropriate alignment,
    and perform a compile-time check to ensure the size in bytes of the
    buffer is a multiple of sizeof(long) so that its actual size doesn't
    get shrunk under us.

    This fixes unaligned accesses reported while using perf lock on
    sparc64.

    Suggested-by: David Miller
    Suggested-by: Tejun Heo
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Steven Rostedt

    Frederic Weisbecker