30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
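
    The header-selection rule from the first bullet can be sketched in
    userspace C as follows. This is only an illustration: the token lists
    and the names contains_any() and needed_header() are hypothetical, and
    the real slabh-sweep.py script (Python, not shown here) may match usage
    quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token lists; the real script's patterns are not shown in the text. */
static const char *const slab_tokens[] = { "kmalloc(", "kzalloc(", "kfree(", "kmem_cache_" };
static const char *const gfp_tokens[]  = { "GFP_KERNEL", "GFP_ATOMIC", "__get_free_pages(" };

#define N_TOKENS(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if any token occurs in src, 0 otherwise. */
static int contains_any(const char *src, const char *const *tokens, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(src, tokens[i]))
			return 1;
	return 0;
}

/*
 * Pick the single header a file should include directly:
 * slab.h if any slab facility is used, gfp.h if only gfp is used,
 * NULL if neither is used.
 */
static const char *needed_header(const char *src)
{
	assert(src != NULL);
	if (contains_any(src, slab_tokens, N_TOKENS(slab_tokens)))
		return "linux/slab.h";
	if (contains_any(src, gfp_tokens, N_TOKENS(gfp_tokens)))
		return "linux/gfp.h";
	return NULL;
}
```

    Checking slab first mirrors the rule in the text: a file that uses
    slab needs only slab.h, since slab.h in turn includes gfp.h.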

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit


15 Mar, 2010

3 commits


14 Mar, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Provide generic perf_sample_data initialization
    MAINTAINERS: Add Arnaldo as tools/perf/ co-maintainer
    perf trace: Don't use pager if scripting
    perf trace/scripting: Remove extraneous header read
    perf, ARM: Modify kuser rmb() call to compile for Thumb-2
    x86/stacktrace: Don't dereference bad frame pointers
    perf archive: Don't try to collect files without a build-id
    perf_events, x86: Fixup fixed counter constraints
    perf, x86: Restrict the ANY flag
    perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE
    perf, x86: add some IBS macros to perf_event.h
    perf, x86: make IBS macros available in perf_event.h
    hw-breakpoints: Remove stub unthrottle callback
    x86/hw-breakpoints: Remove the name field
    perf: Remove pointless breakpoint union
    perf lock: Drop the buffers multiplexing dependency
    perf lock: Fix and add misc documentally things
    percpu: Add __percpu sparse annotations to hw_breakpoint

    Linus Torvalds
     

13 Mar, 2010

2 commits

  • inflate_fast() can do either POST INC or PRE INC on its pointers walking
    the memory to decompress. Default is PRE INC.

    The sout pointer offset was miscalculated in one case as the calculation
    assumed sout was a char *. This breaks inflate_fast() iff configured to do
    POST INC.
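
    The class of mistake described above comes from C pointer arithmetic
    being scaled by the pointee size. The sketch below is not the
    inflate_fast() code (which is not shown here); bytes_moved_u16() is a
    hypothetical helper that only demonstrates the scaling.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Step a pointer to unsigned short back by n elements and report how
 * many bytes it actually moved. An offset computed as if the pointer
 * were a char * would expect n bytes, not n * sizeof(unsigned short).
 */
static size_t bytes_moved_u16(size_t n)
{
	unsigned short buf[8];
	unsigned short *end = buf + 8;	/* one past the end is a valid pointer */
	unsigned short *p;

	assert(n <= 8);
	p = end - n;
	return (size_t)((unsigned char *)end - (unsigned char *)p);
}
```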

    Signed-off-by: Joakim Tjernlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     
  • Commit 6846ee5ca68d81e6baccf0d56221d7a00c1be18b ("zlib: Fix build of
    powerpc boot wrapper") made the new optimized inflate only available on
    arch's that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

    This patch will again enable the optimization for all arch's by defining
    our own endian independent version of unaligned access. As an added
    bonus, arch's that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS do a
    plain load instead.
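
    One common way to read a 16-bit value from a possibly unaligned
    address without depending on byte order is to copy its bytes into a
    properly aligned object. The sketch below uses memcpy() for that; the
    name load_unaligned16() is hypothetical and the patch itself may be
    written differently.

```c
#include <assert.h>
#include <string.h>

/*
 * Read an unsigned short from an address with no alignment guarantee.
 * Copying the bytes preserves the native byte order, so the result
 * equals what a plain load would return on hardware that allows it.
 */
static unsigned short load_unaligned16(const void *p)
{
	unsigned short v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}
```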

    Signed-off-by: Joakim Tjernlund
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: David Woodhouse
    Cc: Kumar Gala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     

10 Mar, 2010

1 commit


08 Mar, 2010

3 commits

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing
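
    A minimal userspace sketch of the pattern follows. struct demo_ops
    and every other name in it are hypothetical stand-ins, not the real
    sysfs_ops definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops structure holding a pair of callbacks. */
struct demo_ops {
	int (*show)(int value);
	int (*store)(int value);
};

static int demo_show(int value)  { return value; }
static int demo_store(int value) { return value + 1; }

/* One const table shared by every object; it is never written at runtime. */
static const struct demo_ops demo_ops = {
	.show  = demo_show,
	.store = demo_store,
};

/* Objects refer to the shared table through a pointer to const. */
struct demo_object {
	const struct demo_ops *ops;
	int value;
};

static int demo_object_show(const struct demo_object *obj)
{
	assert(obj != NULL && obj->ops != NULL);
	return obj->ops->show(obj->value);
}
```

    With these declarations an assignment such as obj->ops->show = NULL
    is rejected at compile time, which is the accidental modification
    the list above refers to.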

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • Constify struct kset_uevent_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • This reverts commit a069c266ae5fdfbf5b4aecf2c672413aa33b2504.

    It turns out that not only was it missing a case (XFS) that needed it,
    but perhaps more importantly, people sometimes want to enable new
    modules that they hadn't had enabled before, and if such a module uses
    list_sort(), it can't easily be inserted any more.

    So rather than add a "select LIST_SORT" to the XFS case, just leave it
    compiled in. It's not all _that_ big, after all, and the inconvenience
    isn't worth it.

    Requested-by: Alexey Dobriyan
    Cc: Christoph Hellwig
    Cc: Don Mullis
    Cc: Andrew Morton
    Cc: Dave Chinner
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 Mar, 2010

14 commits

  • This adds separate I/O and memory specs, so we don't have to change the
    field width in a shared spec, which then lets us make all the specs const
    and static, since they never change.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Add clues about what the SMALL and SPECIAL flags do.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Reducing the size of struct printf_spec is a good thing because multiple
    instances are commonly passed on stack.

    It's possible for type to be u8 and field_width to be s8, but this is
    likely small enough for now.
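
    The effect of narrowing the fields can be sketched as below. Both
    layouts are hypothetical illustrations (the actual struct printf_spec
    members are not listed in this message), as is the helper
    spec_bytes_saved().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: every field is a full int. */
struct spec_wide {
	int type;
	int flags;
	int field_width;
	int base;
	int precision;
	int qualifier;
};

/* Hypothetical "after" layout: fields narrowed to what they need. */
struct spec_narrow {
	uint16_t type;
	int16_t  field_width;
	uint8_t  flags;
	uint8_t  base;
	int8_t   precision;
	uint8_t  qualifier;
};

/* Bytes saved per instance, e.g. per copy passed on the stack. */
static size_t spec_bytes_saved(void)
{
	assert(sizeof(struct spec_narrow) <= sizeof(struct spec_wide));
	return sizeof(struct spec_wide) - sizeof(struct spec_narrow);
}
```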

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
    [LogFS] Change magic number
    [LogFS] Remove h_version field
    [LogFS] Check feature flags
    [LogFS] Only write journal if dirty
    [LogFS] Fix bdev erases
    [LogFS] Silence gcc
    [LogFS] Prevent 64bit divisions in hash_index
    [LogFS] Plug memory leak on error paths
    [LogFS] Add MAINTAINERS entry
    [LogFS] add new flash file system

    Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
    fs/logfs/inode.c introduced by write_inode() being changed to use
    writeback_control by commit a9185b41a4f84971b930c519f0c63bd450c4810d
    ("pass writeback_control to ->write_inode")

    Linus Torvalds
     
  • Signed-off-by: Joakim Tjernlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     
  • Replace open-coded loop with for_each_set_bit().
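
    As a rough userspace illustration of what such an iterator replaces,
    the sketch below walks the set bits of a single unsigned long. The
    demo_ names are hypothetical; the kernel's for_each_set_bit() works on
    multi-word bitmaps and takes a size argument.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Index of the first set bit at or after start, or DEMO_BITS if none. */
static unsigned int demo_find_next_bit(unsigned long word, unsigned int start)
{
	assert(start <= DEMO_BITS);
	for (unsigned int i = start; i < DEMO_BITS; i++)
		if (word & (1UL << i))
			return i;
	return DEMO_BITS;
}

/* Visit each set bit in word, lowest first. */
#define demo_for_each_set_bit(bit, word)			\
	for ((bit) = demo_find_next_bit((word), 0);		\
	     (bit) < DEMO_BITS;					\
	     (bit) = demo_find_next_bit((word), (bit) + 1))

/* Example user: add up the indices of all set bits. */
static unsigned int sum_set_bit_indices(unsigned long word)
{
	unsigned int bit, sum = 0;

	demo_for_each_set_bit(bit, word)
		sum += bit;
	return sum;
}
```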

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The function name must be followed by a space, hyphen, space, and a short
    description.
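
    A comment in that shape looks like the one below. demo_clamp() is a
    hypothetical function used only to carry the comment; it is not part
    of the patch.

```c
#include <assert.h>

/**
 * demo_clamp - limit a value to the inclusive range [lo, hi]
 * @val: value to clamp
 * @lo: lower bound
 * @hi: upper bound, must not be less than @lo
 *
 * Returns @val if it lies within the range, otherwise the nearer bound.
 */
static int demo_clamp(int val, int lo, int hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```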

    Signed-off-by: Ben Hutchings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     
  • Build list_sort() only for configs that need it -- those that don't save
    ~581 bytes (i386).

    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • Clarify and correct header comment of list_sort().

    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • XFS and UBIFS can pass long lists to list_sort(); this alternative
    implementation scales better, reaching ~3x performance gain when list
    length exceeds the L2 cache size.

    Stand-alone program timings were run on a Core 2 duo L1=32KB L2=4MB,
    gcc-4.4, with flags extracted from an Ubuntu kernel build. Object size is
    581 bytes compared to 455 for Mark J. Roberts' code.

    Worst case for either implementation is a list length just over a power of
    two, and to roughly the same degree, so here are timing results for a
    range of 2^N+1 lengths. List elements were 16 bytes each including malloc
    overhead; initial order was random.

                                   time (msec)
                                   Tatham-Roberts
                                   |       generic-Mullis-v2
    loop_count    length       |       |   ratio
       4000000         2     206     294   1.427
       2000000         3     176     227   1.289
       1000000         5     199     172   0.864
        500000         9     235     178   0.757
        250000        17     243     182   0.748
        125000        33     261     196   0.750
         62500        65     277     209   0.754
         31250       129     292     219   0.750
         15625       257     317     235   0.741
          7812       513     340     252   0.741
          3906      1025     362     267   0.737
          1953      2049     388     283   0.729  ~ L1 size
           976      4097     556     323   0.580
           488      8193     678     361   0.532
           244     16385     773     395   0.510
           122     32769     844     418   0.495
            61     65537     917     454   0.495
            30    131073    1128     543   0.481
            15    262145    2355     869   0.369  ~ L2 size
             7    524289    5597    1714   0.306
             3   1048577    6218    2022   0.325

    Mark's code does not actually implement the usual or generic mergesort,
    but rather a variant from Simon Tatham described here:

    http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html

    Simon's algorithm performs O(log N) passes over the entire input list,
    doing merges of sublists that double in size on each pass. The generic
    algorithm instead merges pairs of equal length lists as early as possible,
    in recursive order. For either algorithm, the elements that extend the
    list beyond power-of-two length are a special case, handled as nearly as
    possible as a "rounding-up" to a full POT.

    Some intuition for the locality of reference implications of merge order
    may be gotten by watching this animation:

    http://www.sorting-algorithms.com/merge-sort

    Simon's algorithm requires only O(1) extra space rather than the generic
    algorithm's O(log N), but in my non-recursive implementation the actual
    O(log N) data is merely a vector of ~20 pointers, which I've put on the
    stack.
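
    The merge order described above can be sketched in userspace C as
    follows. This is a simplified illustration under stated assumptions,
    not the kernel's list_sort(): it uses a singly linked list with an int
    key instead of struct list_head and a cmp() callback, and struct node,
    merge(), sort_list() and MAX_PARTS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTS 20	/* slot i holds a sorted run of 2^i elements */

struct node {
	int key;
	struct node *next;
};

/* Merge two sorted NULL-terminated lists; ties take from a first (stable). */
static struct node *merge(struct node *a, struct node *b)
{
	struct node head, *tail = &head;

	while (a && b) {
		if (a->key <= b->key) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}

/* Bottom-up merge sort: merge equal-length runs as early as possible. */
static struct node *sort_list(struct node *list)
{
	struct node *part[MAX_PARTS] = { NULL };
	struct node *cur, *result = NULL;
	int lev;

	while (list) {
		cur = list;
		list = list->next;
		cur->next = NULL;

		/* Like a binary carry: merge upward while a slot is occupied. */
		for (lev = 0; lev < MAX_PARTS - 1 && part[lev]; lev++) {
			cur = merge(part[lev], cur);
			part[lev] = NULL;
		}
		if (part[lev])	/* top slot: absorb rather than overflow */
			cur = merge(part[lev], cur);
		assert(lev < MAX_PARTS);
		part[lev] = cur;
	}

	/* Higher slots hold earlier elements, so they go first in each merge. */
	for (lev = 0; lev < MAX_PARTS; lev++)
		if (part[lev])
			result = merge(part[lev], result);
	return result;
}
```

    The part[] array plays the role of the small on-stack pointer vector
    mentioned above: its length bounds the O(log N) extra space, and no
    recursion is needed.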

    Long-running list_sort() calls: If the list passed in may be long, or the
    client's cmp() callback function is slow, the client's cmp() may
    periodically invoke cond_resched() to voluntarily yield the CPU. All
    inner loops of list_sort() call back to cmp().

    Stability of the sort: distinct elements that compare equal emerge from
    the sort in the same order as with Mark's code, for simple test cases. A
    boot-time test is provided to verify this and other correctness
    requirements.

    A kernel that uses drm.ko appears to run normally with this change; I have
    no suitable hardware to similarly test the use by UBIFS.

    [akpm@linux-foundation.org: style tweaks, fix comment, make list_sort_test __init]
    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • Signed-off-by: André Goddard Rosa
    Cc: Li Zefan
    Cc: Joe Perches
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    André Goddard Rosa
     
  • Removes 32 bytes on core2 with gcc 4.4.1:
       text    data     bss     dec     hex filename
       3196       0       0    3196     c7c lib/string-BEFORE.o
       3164       0       0    3164     c5c lib/string-AFTER.o

    Signed-off-by: André Goddard Rosa
    Cc: Joe Perches
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    André Goddard Rosa
     
  • Adds a debugfs interface and additional failure modes to LKDTM to
    provide similar functionality to the provoke-crash driver submitted here:

    http://lwn.net/Articles/371208/

    Crashes can now be induced either through module parameters (as before)
    or through the debugfs interface as in provoke-crash.

    The patch also provides a new "direct" interface, where KPROBES are not
    used, i.e., the crash is invoked directly upon write to the debugfs
    file. When built without KPROBES configured, only this mode is available.

    Signed-off-by: Simon Kagstrom
    Cc: M. Mohan Kumar
    Cc: Americo Wang
    Cc: David Woodhouse
    Cc: Ingo Molnar
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Kagstrom
     
  • Use the same log level for printk's in show_mem(), so that those messages
    can be shown completely when using log level 6.

    Signed-off-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     

01 Mar, 2010

3 commits

  • Conflicts:
    drivers/firmware/iscsi_ibft.c

    David S. Miller
     
  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Mark atomic irq ops raw for 32bit legacy
    x86: Merge show_regs()
    x86: Macroise x86 cache descriptors
    x86-32: clean up rwsem inline asm statements
    x86: Merge asm/atomic_{32,64}.h
    x86: Sync asm/atomic_32.h and asm/atomic_64.h
    x86: Split atomic64_t functions into seperate headers
    x86-64: Modify memcpy()/memset() alternatives mechanism
    x86-64: Modify copy_user_generic() alternatives mechanism
    x86: Lift restriction on the location of FIX_BTMAP_*
    x86, core: Optimize hweight32()

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits)
    rcu: Fix accelerated GPs for last non-dynticked CPU
    rcu: Make non-RCU_PROVE_LOCKING rcu_read_lock_sched_held() understand boot
    rcu: Fix accelerated grace periods for last non-dynticked CPU
    rcu: Export rcu_scheduler_active
    rcu: Make rcu_read_lock_sched_held() take boot time into account
    rcu: Make lockdep_rcu_dereference() message less alarmist
    sched, cgroups: Fix module export
    rcu: Add RCU_CPU_STALL_VERBOSE to dump detailed per-task information
    rcu: Fix rcutorture mod_timer argument to delay one jiffy
    rcu: Fix deadlock in TREE_PREEMPT_RCU CPU stall detection
    rcu: Convert to raw_spinlocks
    rcu: Stop overflowing signed integers
    rcu: Use canonical URL for Mathieu's dissertation
    rcu: Accelerate grace period if last non-dynticked CPU
    rcu: Fix citation of Mathieu's dissertation
    rcu: Documentation update for CONFIG_PROVE_RCU
    security: Apply lockdep-based checking to rcu_dereference() uses
    idr: Apply lockdep-based diagnostics to rcu_dereference() uses
    radix-tree: Disable RCU lockdep checking in radix tree
    vfs: Abstract rcu_dereference_check for files-fdtable use
    ...

    Linus Torvalds
     

28 Feb, 2010

3 commits

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (88 commits)
    powerpc: Fix lwsync feature fixup vs. modules on 64-bit
    powerpc: Convert pmc_owner_lock to raw_spinlock
    powerpc: Convert die.lock to raw_spinlock
    powerpc: Convert tlbivax_lock to raw_spinlock
    powerpc: Convert mpic locks to raw_spinlock
    powerpc: Convert pmac_pic_lock to raw_spinlock
    powerpc: Convert big_irq_lock to raw_spinlock
    powerpc: Convert feature_lock to raw_spinlock
    powerpc: Convert i8259_lock to raw_spinlock
    powerpc: Convert beat_htab_lock to raw_spinlock
    powerpc: Convert confirm_error_lock to raw_spinlock
    powerpc: Convert ipic_lock to raw_spinlock
    powerpc: Convert native_tlbie_lock to raw_spinlock
    powerpc: Convert beatic_irq_mask_lock to raw_spinlock
    powerpc: Convert nv_lock to raw_spinlock
    powerpc: Convert context_lock to raw_spinlock
    powerpc/85xx: Add NOR, LEDs and PIB support for MPC8568E-MDS boards
    powerpc/86xx: Enable VME driver on the GE SBC610
    powerpc/86xx: Enable VME driver on the GE PPC9A
    powerpc/86xx: Add MSI section to GE PPC9A DTS
    ...

    Linus Torvalds
     
  • Remove pointless union in the breakpoint field of hw_perf_event.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul Mackerras

    Frederic Weisbecker
     
  • I forgot to add a 'perf lock' line to command-list.txt,
    so users of perf could not find perf lock when they typed 'perf'.

    Fixing command-list.txt requires a document
    (tools/perf/Documentation/perf-lock.txt).
    But perf lock is still too much "under construction" to write a
    stable document, so this is something like a pseudo document for now.

    I also wrote a description of perf lock in the help section of
    CONFIG_LOCK_STAT; this will guide users of the lock trace events.

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Hitoshi Mitake
     

27 Feb, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (187 commits)
    sh: remove dead LED code for migo-r and ms7724se
    sh: ecovec build fix for CONFIG_I2C=n
    sh: ecovec r-standby support
    sh: ms7724se r-standby support
    sh: SH-Mobile R-standby register save/restore
    clocksource: Fix up a registration/IRQ race in the sh drivers.
    sh: ms7724: modify scan_timing for KEYSC
    sh: ms7724: Add sh_sir support
    sh: mach-ecovec24: Add sh_sir support
    sh: wire up SET/GET_UNALIGN_CTL.
    sh: allow alignment fault mode to be configured at kernel boot.
    sh: sh7724: Update FSI/SPU2 clock
    sh: always enable sh7724 vpu_clk and set to 166MHz on Ecovec
    sh: add sh7724 kick callback to clk_div4_table
    sh: introduce struct clk_div4_table
    sh: clock-cpg div4 set_rate() shift fix
    sh: Turn on speculative return for SH7785 and SH7786
    sh: Merge legacy and dynamic PMB modes.
    sh: Use uncached I/O helpers in PMB setup.
    sh: Provide uncached I/O helpers.
    ...

    Linus Torvalds
     

26 Feb, 2010

1 commit


25 Feb, 2010

5 commits

  • When RCU detects a grace-period stall, it currently just prints
    out the PID of any tasks doing the stalling. This patch adds
    RCU_CPU_STALL_VERBOSE, which enables the more-verbose reporting
    from sched_show_task().

    Suggested-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Because idr can be used with any of a number of locks or with
    any flavor of RCU, just disable the lockdep-based diagnostics.
    If idr needs diagnostics, the check expression will need to be
    passed into the relevant idr primitives as an additional
    argument.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Because the radix tree is used with many different locking
    designs, we cannot do any effective checking without changing
    the radix-tree APIs. It might make sense to do this later, but
    only if the RCU lockdep checking proves itself sufficiently
    valuable.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Inspection is proving insufficient to catch all RCU misuses,
    which is understandable given that rcu_dereference() might be
    protected by any of four different flavors of RCU (RCU, RCU-bh,
    RCU-sched, and SRCU), and might also/instead be protected by any
    of a number of locking primitives. It is therefore time to
    enlist the aid of lockdep.

    This set of patches is inspired by earlier work by Peter
    Zijlstra and Thomas Gleixner, and takes the following approach:

    o Set up separate lockdep classes for RCU, RCU-bh, and RCU-sched.

    o Set up separate lockdep classes for each instance of SRCU.

    o Create primitives that check for being in an RCU read-side
    critical section. These return exact answers if lockdep is
    fully enabled, but if unsure, report being in an RCU read-side
    critical section. (We want to avoid false positives!)
    The primitives are:

    For RCU: rcu_read_lock_held(void)

    For RCU-bh: rcu_read_lock_bh_held(void)

    For RCU-sched: rcu_read_lock_sched_held(void)

    For SRCU: srcu_read_lock_held(struct srcu_struct *sp)

    o Add rcu_dereference_check(), which takes a second argument
    in which one places a boolean expression based on the above
    primitives and/or lockdep_is_held().

    o A new kernel configuration parameter, CONFIG_PROVE_RCU, enables
    rcu_dereference_check(). This depends on CONFIG_PROVE_LOCKING,
    and should be quite helpful during the transition period while
    CONFIG_PROVE_RCU-unaware patches are in flight.

    The existing rcu_dereference() primitive does no checking, but
    upcoming patches will change that.
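
    The idea behind the second argument of rcu_dereference_check() can be
    mocked in userspace as below. Nothing here is the kernel
    implementation: every demo_ name, struct config, global_config and
    read_config_value() are hypothetical, the "lock held" state is a plain
    counter, and a failed check bumps a counter instead of producing a
    lockdep report.

```c
#include <assert.h>
#include <stddef.h>

static int demo_read_depth;	/* stand-in for "inside a read-side critical section" */
static int demo_lock_held;	/* stand-in for lockdep_is_held() on some lock */
static int demo_warnings;	/* counts failed checks */

static void demo_read_lock(void)     { demo_read_depth++; }
static void demo_read_unlock(void)   { assert(demo_read_depth > 0); demo_read_depth--; }
static int  demo_read_lock_held(void) { return demo_read_depth > 0; }

/* Evaluate the caller's condition, record a warning if false, yield p. */
#define demo_dereference_check(p, cond) \
	(((cond) ? (void)0 : (void)demo_warnings++), (p))

struct config {
	int value;
};

static struct config *global_config;

/* Legal either inside a read-side section or with the update lock held. */
static int read_config_value(void)
{
	struct config *c = demo_dereference_check(global_config,
			demo_read_lock_held() || demo_lock_held);

	return c ? c->value : -1;
}
```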

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Merge reason: Update from -rc4 to -final.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

23 Feb, 2010

1 commit

  • This is a retry of the reverted 859ddf09743a8cc680af33f7259ccd0fd36bfe9d
    ("idr: fix a critical misallocation bug") which contained two bugs.

    * pa[idp->layers] should be cleared even if it's not used by
    sub_alloc() because it's used by idr_mark_full().

    * The original condition check also assigned pa[l] to p which the new
    code didn't do thus leaving p pointing at the wrong layer.

    Both problems have been fixed and the idr code has received a good
    amount of testing using a userland testing setup where a simple bitmap
    allocator is run in parallel to verify the result of idr allocation.

    The bug this patch fixes is caused by the sub_alloc() optimization path
    bypassing the out-of-room condition check and restarting the allocation
    loop with a starting value higher than the maximum allowed value. For a
    detailed description, please read the commit message of 859ddf09.

    Signed-off-by: Tejun Heo
    Based-on-patch-from: Eric Paris
    Reported-by: Eric Paris
    Tested-by: Stefan Lippers-Hollmann
    Tested-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo