03 Feb, 2010

14 commits

  • Fix kfifo kernel-doc warnings:

    Warning(kernel/kfifo.c:361): No description found for parameter 'total'
    Warning(kernel/kfifo.c:402): bad line: @ @lenout: pointer to output variable with copied data
    Warning(kernel/kfifo.c:412): No description found for parameter 'lenout'

    Signed-off-by: Randy Dunlap
    Cc: Stefani Seibold
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Add missing braces for multiline 'if' statements in fm3130_probe.

    Signed-off-by: Sergey Matyukevich
    Signed-off-by: Alessandro Zummo
    Cc: Sergey Lapin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Matyukevich
     
  • Fix the kernel oops when dev_dbg is called with mx3_fbi->txd == NULL

    Fix the late initialisation of mx3fb->backlight_level. Otherwise, in the
    call chain started by init_fb_chan(), __blank() calls
    sdc_set_brightness(mx3fb, mx3fb->backlight_level), which shuts down the
    CONTRAST PWM output.

    Signed-off-by: Alberto Panizzo
    Acked-by: Guennadi Liakhovetski
    Cc: Sascha Hauer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alberto Panizzo
     
  • Eric Paris located a bug in idr. With IDR_BITS of 6, it grows to three
    layers when id 4096 is first allocated. When that happens, idr wraps
    incorrectly and searches the idr array ignoring the high bits. The
    following test code from Eric demonstrates the bug nicely.

    #include <linux/idr.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    static DEFINE_IDR(test_idr);

    int init_module(void)
    {
            int ret, forty95, forty96;
            void *addr;

            /* add 2 entries both with 4095 as the start address */
    again1:
            if (!idr_pre_get(&test_idr, GFP_KERNEL))
                    return -ENOMEM;
            ret = idr_get_new_above(&test_idr, (void *)4095, 4095, &forty95);
            if (ret) {
                    if (ret == -EAGAIN)
                            goto again1;
                    return ret;
            }
            if (forty95 != 4095)
                    printk(KERN_ERR "hmmm, forty95=%d\n", forty95);

    again2:
            if (!idr_pre_get(&test_idr, GFP_KERNEL))
                    return -ENOMEM;
            ret = idr_get_new_above(&test_idr, (void *)4096, 4095, &forty96);
            if (ret) {
                    if (ret == -EAGAIN)
                            goto again2;
                    return ret;
            }
            if (forty96 != 4096)
                    printk(KERN_ERR "hmmm, forty96=%d\n", forty96);

            /* try to find the 2 entries, noticing that 4096 broke */
            addr = idr_find(&test_idr, forty95);
            if ((int)addr != forty95)
                    printk(KERN_ERR "hmmm, after find forty95=%d addr=%d\n", forty95, (int)addr);
            addr = idr_find(&test_idr, forty96);
            if ((int)addr != forty96)
                    printk(KERN_ERR "hmmm, after find forty96=%d addr=%d\n", forty96, (int)addr);
            /* really weird, the entry which should be at 4096 is actually at 0!! */
            addr = idr_find(&test_idr, 0);
            if ((int)addr)
                    printk(KERN_ERR "found an entry at id=0 for addr=%d\n", (int)addr);

            idr_remove(&test_idr, forty95);
            idr_remove(&test_idr, forty96);

            return 0;
    }

    void cleanup_module(void)
    {
    }

    MODULE_AUTHOR("Eric Paris ");
    MODULE_DESCRIPTION("Simple idr test");
    MODULE_LICENSE("GPL");

    This happens because when sub_alloc() backtracks it doesn't always do so
    step by step, while the over-the-limit detection assumes step-by-step
    backtracking. The logic in sub_alloc() looks like the following.

    restart:
      clear pa[top level + 1] for end cond detection
      l = top level
      while (true) {
          search for empty slot at this level
          if (not found) {
              push id to the next possible value
              l++
    A:        if (pa[l] is clear)
                  failed, return asking caller to grow the tree
              if (going up 1 level gives more slots to search)
                  continue the while loop above with the incremented l
              else
    C:            goto restart
          }
          adjust id accordingly to the found slot
          if (l == 0)
              return found id;
          create lower level if not there yet
          record pa[l] and l--
      }

    Test A is the fail exit condition, but it assumes that failure is
    propagated upward one level at a time. The C optimization path breaks
    that assumption and restarts the whole thing with a start value which is
    above the possible limit with the current layers. sub_alloc() assumes the
    start id value is inside the limit when called and test A is the only exit
    condition check, so it ends up searching for an empty slot while ignoring
    the high set bit.

    So, for the 4095->4096 test, the level0 search fails but pa[1] contains a
    valid pointer. However, going up 1 level wouldn't give any more empty
    slots, so it takes C, and when the whole thing restarts nobody notices the
    high bit set beyond the top level.

    This patch fixes the bug by changing the fail exit condition check to full
    id limit check.
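
The arithmetic behind the 4095->4096 case can be sketched in userspace C. This is only an illustration, not the kernel code: the sketch_* names are hypothetical, and the only value taken from the text above is IDR_BITS of 6. It shows that two layers cover ids 0..4095, that id 4096 needs a third layer, and that a lookup which ignores bits above the top layer lands on id 0.

```c
#include <stdbool.h>

#define SKETCH_IDR_BITS 6   /* same value as the IDR_BITS quoted above */

/* Number of ids that a tree with the given number of layers can address. */
static long sketch_capacity(int layers)
{
    return 1L << (SKETCH_IDR_BITS * layers);
}

/* A full id limit check: does this id fit within the current layers? */
static bool sketch_id_fits(long id, int layers)
{
    return id >= 0 && id < sketch_capacity(layers);
}

/* What a walk effectively looks up when bits above the top layer are ignored. */
static long sketch_truncated_id(long id, int layers)
{
    return id & (sketch_capacity(layers) - 1);
}
```

With two layers, sketch_id_fits(4096, 2) is false and sketch_truncated_id(4096, 2) is 0, which matches the "entry which should be at 4096 is actually at 0" symptom in the test module.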

    Based-on-patch-from: Eric Paris
    Reported-by: Eric Paris
    Signed-off-by: Tejun Heo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cfq-iosched: Do not idle on async queues
    blk-cgroup: Fix potential deadlock in blk-cgroup
    block: fix bugs in bio-integrity mempool usage
    block: fix bio_add_page for non trivial merge_bvec_fn case
    drbd: null dereference bug
    drbd: fix max_segment_size initialization

    Linus Torvalds
     
  • Improve handling of fragmented per-CPU vmaps. Previously we did not free
    up a per-CPU map until all of its addresses had been used and freed. So
    fragmented blocks could fill up vmalloc space even if they actually had
    no active vmap regions within them.

    Add some logic to allow all CPUs to have these blocks purged in the case
    of failure to allocate a new vm area, and also put some logic to trim
    such blocks of a current CPU if we hit them in the allocation path (so
    as to avoid a large build up of them).

    Christoph reported some vmap allocation failures when using the per CPU
    vmap APIs in XFS, which cannot be reproduced after this patch and the
    previous bug fix.
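
A rough userspace model of the purge idea, under stated assumptions: each block tracks how many slots are still free and how many have been used and released ("dirty"), and a block with no live mappings that has not yet been fully consumed counts as fragmented. The struct and function names are hypothetical and this is not the mm/vmalloc.c implementation.

```c
#include <stdbool.h>
#include <stddef.h>

struct vblock_sketch {
    unsigned int total;   /* slots in the block */
    unsigned int free;    /* slots never handed out */
    unsigned int dirty;   /* slots handed out and released again */
    bool purged;          /* set once the block has been reclaimed */
};

/* No live mappings (every slot is free or dirty), but not fully used up yet. */
static bool block_is_fragmented(const struct vblock_sketch *b)
{
    return b->free + b->dirty == b->total && b->dirty != b->total;
}

/* Reclaim every fragmented block; returns how many blocks were purged. */
static size_t purge_fragmented(struct vblock_sketch *blocks, size_t n)
{
    size_t purged = 0;

    for (size_t i = 0; i < n; i++) {
        if (blocks[i].purged || !block_is_fragmented(&blocks[i]))
            continue;
        blocks[i].free = 0;                 /* stop further allocations */
        blocks[i].dirty = blocks[i].total;  /* treat the whole block as released */
        blocks[i].purged = true;
        purged++;
    }
    return purged;
}
```

In this model an allocator would call purge_fragmented() across all CPUs' blocks after a failed attempt to allocate a new area and then retry.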

    Cc: linux-mm@kvack.org
    Cc: stable@kernel.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • RCU list walking of the per-cpu vmap cache was broken. It did not use
    RCU primitives, and also the union of free_list and rcu_head is
    obviously wrong (because free_list is indeed the list we are RCU
    walking).
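
Why the union is a problem can be seen from the layout alone. The following is a simplified userspace sketch with hypothetical stand-in types (not the kernel's struct list_head or struct rcu_head): in a union both members start at the same offset, so queueing the RCU callback would overwrite the very list links that readers may still be walking.

```c
#include <stddef.h>

/* Stand-ins for a list node and an RCU callback head. */
struct list_node_sketch {
    struct list_node_sketch *next, *prev;
};

struct rcu_head_sketch {
    struct rcu_head_sketch *next;
    void (*func)(struct rcu_head_sketch *head);
};

/* Broken layout: the two members share storage. */
struct block_broken {
    union {
        struct list_node_sketch free_list;
        struct rcu_head_sketch rcu_head;
    } u;
};

/* Fixed layout: separate storage, so list links survive until the grace period ends. */
struct block_fixed {
    struct list_node_sketch free_list;
    struct rcu_head_sketch rcu_head;
};
```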

    While we are there, remove a couple of unused fields from an earlier
    iteration.

    These APIs aren't actually used anywhere, because of problems with the
    XFS conversion. Christoph has now verified that the problems are solved
    with these patches. Also it is an exported interface, so I think it
    will be good to be merged now (and Christoph wants to get the XFS
    changes into their local tree).

    Cc: stable@kernel.org
    Cc: linux-mm@kvack.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    random: Remove unused inode variable
    crypto: padlock-sha - Add import/export support
    random: drop weird m_time/a_time manipulation

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
    GFS2: Use GFP_NOFS for alloc structure
    GFS2: Fix previous patch
    GFS2: Don't withdraw on partial rindex entries
    GFS2: Fix refcnt leak on gfs2_follow_link() error path

    Linus Torvalds
     
  • * 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Fix access to released memory in clk_debugfs_register_one()
    sh: Fix access to released memory in dwarf_unwinder_cleanup()
    usb: r8a66597-hdc disable interrupts fix
    spi: spi_sh_msiof: Fixed data sampling on the correct edge

    Linus Torvalds
     
  • * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
    MIPS: 64-bit: Detect virtual memory size
    MIPS: AR7: Fix USB slave mem range typo
    MIPS: Alchemy: Fix dbdma ring destruction memory debugcheck.

    Linus Torvalds
     
  • Commit 221af7f87b9 ("Split 'flush_old_exec' into two functions") split
    the function at the point of no return - ie right where there were no
    more error cases to check. That made sense from a technical standpoint,
    but when we then also combined it with the actual personality setting
    going in between flush_old_exec() and setup_new_exec(), it needs to be a
    bit more careful.

    In particular, we need to make sure that we really flush the old
    personality bits in the 'flush' stage, rather than later in the 'setup'
    stage, since otherwise we might be flushing the _new_ personality state
    that we're just setting up.

    So this moves the flags and personality flushing (and 'flush_thread()',
    which is the arch-specific function that generally resets lazy FP state
    etc) of the old process into flush_old_exec(), so that it doesn't affect
    any state that execve() is setting up for the new process environment.

    This was reported by Michal Simek as breaking his Microblaze qemu
    environment.
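
The ordering problem can be modelled with a toy in userspace C. Everything here is hypothetical (the struct, the function names, the single personality word); it only illustrates that flushing after the new personality has been set wipes the new state, while flushing first does not.

```c
#include <stdbool.h>

struct task_sketch {
    unsigned int personality;   /* stands in for personality bits and flags */
};

/* 'flush' stage: drop state inherited from the old image. */
static void sketch_flush_old(struct task_sketch *t)
{
    t->personality = 0;
}

/* Loader step that runs between the two stages: install the new personality. */
static void sketch_set_personality(struct task_sketch *t, unsigned int pers)
{
    t->personality = pers;
}

/* Runs the three steps and returns the personality the new image ends up with. */
static unsigned int sketch_exec(struct task_sketch *t, unsigned int new_pers,
                                bool flush_in_setup)
{
    if (!flush_in_setup)
        sketch_flush_old(t);        /* fixed ordering: flush before setting */
    sketch_set_personality(t, new_pers);
    if (flush_in_setup)
        sketch_flush_old(t);        /* broken ordering: wipes the new state */
    return t->personality;
}
```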

    Reported-and-tested-by: Michal Simek
    Cc: Peter Anvin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • A few weeks back, Shaohua Li posted a similar patch. I am reposting it
    with more test results.

    This patch does two things.

    - Do not idle on async queues.

    - It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
    Currently, we seem to be driving a queue depth of 1 always for WRITES.
    This is true even if there is only one write queue in the system, and all
    the logic of infinite queue depth in case of a single busy queue, as well
    as slowly increasing queue depth based on the last delayed sync request,
    does not seem to be kicking in at all.

    This patch will allow deeper WRITE queue depths (subject to the other
    WRITE queue depth constraints like cfq_quantum and last delayed sync
    request).

    Shaohua Li had reported getting more out of his SSD. For me, I have
    one LUN exported from an HP EVA, and when pure buffered writes are on, I
    can get more out of the system. Following are test results of pure
    buffered writes (with end_fsync=1) with vanilla and patched kernels. These
    results are the average of 3 sets of runs with an increasing number of
    threads.

    AVERAGE[bufwfs][vanilla]
    -------
    job     Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
    ---     ---  --  ------------  -----------  -------------  -----------
    bufwfs  3    1   0             0            95349          474141
    bufwfs  3    2   0             0            100282         806926
    bufwfs  3    4   0             0            109989         2.7301e+06
    bufwfs  3    8   0             0            116642         3762231
    bufwfs  3    16  0             0            118230         6902970

    AVERAGE[bufwfs] [patched kernel]
    -------
    job     Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
    ---     ---  --  ------------  -----------  -------------  -----------
    bufwfs  3    1   0             0            270722         404352
    bufwfs  3    2   0             0            206770         1.06552e+06
    bufwfs  3    4   0             0            195277         1.62283e+06
    bufwfs  3    8   0             0            260960         2.62979e+06
    bufwfs  3    16  0             0            299260         1.70731e+06

    I also ran buffered writes along with some sequential reads and some
    buffered reads on a SATA disk, because the potential risk is that driving
    a higher queue depth in the presence of sync IO could raise the max
    completion latency (clat), which we want to keep low.

    With some random and sequential reads going on in the system on one SATA
    disk I did not see any significant increase in max clat. So it looks like
    other WRITE queue depth control logic is doing its job. Here are the
    results.

    AVERAGE[brr, bsr, bufw together] [vanilla]
    -------
    job   Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
    ---   ---  --  ------------  -----------  -------------  -----------
    brr   3    1   850           546345       0              0
    bsr   3    1   14650         729543       0              0
    bufw  3    1   0             0            23908          8274517

    brr   3    2   981.333       579395       0              0
    bsr   3    2   14149.7       1175689      0              0
    bufw  3    2   0             0            21921          1.28108e+07

    brr   3    4   898.333       1.75527e+06  0              0
    bsr   3    4   12230.7       1.40072e+06  0              0
    bufw  3    4   0             0            19722.3        2.4901e+07

    brr   3    8   900           3160594      0              0
    bsr   3    8   9282.33       1.91314e+06  0              0
    bufw  3    8   0             0            18789.3        23890622

    AVERAGE[brr, bsr, bufw mixed] [patched kernel]
    -------
    job   Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
    ---   ---  --  ------------  -----------  -------------  -----------
    brr   3    1   837           417973       0              0
    bsr   3    1   14357.7       591275       0              0
    bufw  3    1   0             0            24869.7        8910662

    brr   3    2   1038.33       543434       0              0
    bsr   3    2   13351.3       1205858      0              0
    bufw  3    2   0             0            18626.3        13280370

    brr   3    4   913           1.86861e+06  0              0
    bsr   3    4   12652.3       1430974      0              0
    bufw  3    4   0             0            15343.3        2.81305e+07

    brr   3    8   890           2.92695e+06  0              0
    bsr   3    8   9635.33       1.90244e+06  0              0
    bufw  3    8   0             0            17200.3        24424392

    So it looks like it might make sense to include this patch.

    Thanks
    Vivek

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Linux kernel 2.6.32 and later allocate address space from the top of the
    kernel virtual memory address space.

    This patch implements virtual memory size detection for 64 bit MIPS CPUs
    to avoid resulting crashes.

    Signed-off-by: Guenter Roeck
    Cc: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/935/
    Reviewed-by: David Daney
    Signed-off-by: Ralf Baechle

    Guenter Roeck
     

02 Feb, 2010

20 commits

  • Signed-off-by: Marek Skuczynski
    Signed-off-by: Paul Mundt

    Marek Skuczynski
     
  • Signed-off-by: Marek Skuczynski
    Acked-by: Matt Fleming
    Signed-off-by: Paul Mundt

    Marek Skuczynski
     
  • This patch improves disable_controller() in the r8a66597-hdc
    driver to disable all interrupts and clear status flags. It
    also makes sure that disable_controller() is called during
    probe(). This fixes the relatively rare case of unexpected
    pending interrupts after kexec reboot.

    Signed-off-by: Magnus Damm
    Acked-by: Yoshihiro Shimoda
    Signed-off-by: Paul Mundt

    Magnus Damm
     
  • The spi_sh_msiof.c driver presently misconfigures REDG and TEDG. TEDG==0
    outputs data at the **rising edge** of the clock and REDG==0 samples data
    at the **falling edge** of the clock. Therefore for SPI, TEDG must be
    equal to REDG, otherwise the last byte received is not sampled in SPI
    mode 3.

    This brings the driver in line with the SH7723 HW Reference Manual
    settings documented in Figures 20.20 and 20.21 ("SPI Clock and data
    timing").

    Signed-off-by: Markus Pietrek
    Acked-by: Magnus Damm
    Signed-off-by: Paul Mundt

    Markus Pietrek
     
  • The previous changeset left behind an unused inode variable.
    This patch removes it.

    Reported-by: Stephen Rothwell
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • As the padlock driver for SHA uses a software fallback to perform
    partial hashing, it must implement custom import/export functions.
    Otherwise hmac which depends on import/export for prehashing will
    not work with padlock-sha.

    Reported-by: Wolfgang Walter
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • No other driver does anything remotely like this that I know of except
    for the tty drivers, and I can't see any reason for random/urandom to do
    it. In fact, it's a (trivial, harmless) timing information leak. And
    obviously, it generates power- and flash-cycle wasting I/O, especially
    if combined with something like hwrngd. Also, it breaks ubifs's
    expectations.

    Signed-off-by: Matt Mackall
    Signed-off-by: Herbert Xu

    Matt Mackall
     
  • Signed-off-by: Alexander Clouter
    To: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/919/
    Signed-off-by: Ralf Baechle

    Alexander Clouter
     
  • DBDMA descriptors need to be located at 32-byte aligned addresses;
    however, kmalloc in conjunction with the SLAB allocator and
    CONFIG_DEBUG_SLAB enabled doesn't return such addresses. The dbdma code
    works around that by allocating a larger area and realigning the start
    address within it.

    When freeing a channel however this adjustment is not taken into
    account which results in an oops:

    Kernel bug detected[#1]:
    [...]
    Call Trace:
    [] cache_free_debugcheck+0x284/0x318
    [] kfree+0xe8/0x2a0
    [] au1xxx_dbdma_chan_free+0x2c/0x7c
    [] au1x_pcm_dbdma_free+0x34/0x4c
    [] au1xpsc_pcm_close+0x28/0x38
    [] soc_codec_close+0x14c/0x1cc
    [] snd_pcm_release_substream+0x60/0xac
    [] snd_pcm_release+0x40/0xa0
    [] __fput+0x11c/0x228
    [] filp_close+0x7c/0x98
    [] sys_close+0x9c/0xe4
    [] stack_done+0x20/0x3c

    Fix this by recording the address delivered by kmalloc() and using
    it as parameter to kfree().

    This fix is only necessary with the SLAB allocator and CONFIG_DEBUG_SLAB
    enabled; non-debug SLAB, SLUB do return nicely aligned addresses,
    debug-enabled SLUB currently panics early in the boot process.
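
The pattern described above (over-allocate, realign, but free the original pointer) can be sketched in userspace C, with malloc()/free() standing in for kmalloc()/kfree(). The struct and function names are hypothetical; only the 32-byte alignment comes from the text.

```c
#include <stdint.h>
#include <stdlib.h>

#define DESC_ALIGN 32u   /* descriptors must sit on 32-byte boundaries */

struct ring_sketch {
    void *raw;     /* pointer exactly as returned by malloc(); only this may be freed */
    void *descs;   /* raw rounded up to a DESC_ALIGN boundary; used for descriptors */
};

/* Allocate 'bytes' of descriptor space at a DESC_ALIGN-aligned address. */
static int ring_alloc(struct ring_sketch *r, size_t bytes)
{
    uintptr_t addr;

    r->raw = malloc(bytes + DESC_ALIGN - 1);   /* slack for the realignment */
    if (!r->raw) {
        r->descs = NULL;
        return -1;
    }
    addr = ((uintptr_t)r->raw + DESC_ALIGN - 1) & ~(uintptr_t)(DESC_ALIGN - 1);
    r->descs = (void *)addr;
    return 0;
}

/* Free using the recorded allocator pointer, never the adjusted one. */
static void ring_free(struct ring_sketch *r)
{
    free(r->raw);
    r->raw = NULL;
    r->descs = NULL;
}
```

Passing r->descs to free() instead would hand the allocator an address it never returned, which is the class of error the debug check in the trace above catches.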

    Signed-off-by: Manuel Lauss
    To: Linux-MIPS
    Cc: Manuel Lauss
    Patchwork: http://patchwork.linux-mips.org/patch/878/
    Signed-off-by: Ralf Baechle

    Manuel Lauss
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ASoC: AM3517: ASoC driver not getting compiled
    ASoC: AIC23: Fixing writes to non-existing registers in resume function
    ALSA: hda - Add an ASUS mobo to MSI blacklist

    Linus Torvalds
     
  • * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
    drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure.
    drm/radeon/kms: move radeon KMS on/off switch out of staging.
    drm/radeon/kms: Bailout of blit if error happen & protect with mutex V3
    drm/vmwgfx: Don't send bad flags to the host
    drm/vmwgfx: Request SVGA version 2 and bail if not found
    drm/vmwgfx: Correctly detect 3D
    drm/ttm: remove unnecessary save_flags and ttm_flag_masked in ttm_bo_util.c
    drm/kms: Remove incorrect comment in struct drm_mode_modeinfo
    drm/ttm: remove padding from ttm_ref_object on 64bit builds
    drm/radeon/kms: release agp on error.
    drm/kms/radeon/agp: Move the check of the aper_size after drm_acp_acquire and drm_agp_info
    drm/kms/radeon/agp: Fix warning, format ‘%d’ expects type ‘int’, but argument 4 has type ‘size_t’
    drm/ttm: Avoid conflicting reserve_memtype during ttm_tt_set_page_caching.
    drm/kms/radeon: pick digitial encoders smarter. (v3)
    drm/radeon/kms: use active device to pick connector for encoder
    drm/radeon/kms: fix incorrect logic in DP vs eDP connector checking.

    Linus Torvalds
     
  • …t/frederic/random-tracing

    * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
    reiserfs: Fix vmalloc call under reiserfs lock

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    lockdep: Fix check_usage_backwards() error message

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf, hw_breakpoint, kgdb: Do not take mutex for kernel debugger
    x86, hw_breakpoints, kgdb: Fix kgdb to use hw_breakpoint API
    hw_breakpoints: Release the bp slot if arch_validate_hwbkpt_settings() fails.
    perf: Ignore perf.data.old
    perf report: Fix segmentation fault when running with '-g none'

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Correct printk whitespace in warning from cpu down task check
    sched: Fix incorrect sanity check
    sched: Fix fork vs hotplug vs cpuset namespaces

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: Prevent potential kgdb dead lock

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing/documentation: Cover new frame pointer semantics
    tracing/documentation: Fix a typo in ftrace.txt
    ring-buffer: Check for end of page in iterator
    ring-buffer: Check if ring buffer iterator has stale data
    tracing: Prevent kernel oops with corrupted buffer

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/agp: Fix agp_amd64_init regression
    x86: Add quirk for Intel DG45FC board to avoid low memory corruption
    x86: Add Dell OptiPlex 760 reboot quirk
    x86, UV: Fix RTC latency bug by reading replicated cachelines
    oprofile/x86: add Xeon 7500 series support
    oprofile/x86: fix crash when profiling more than 28 events
    lib/dma-debug.c: mark file-local struct symbol static.
    x86/amd-iommu: Fix deassignment of a device from the pt_domain
    x86/amd-iommu: Fix IOMMU-API initialization for iommu=pt
    x86/amd-iommu: Fix NULL pointer dereference in __detach_device()
    x86/amd-iommu: Fix possible integer overflow

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
    regulator: Specify REGULATOR_CHANGE_STATUS for WM835x LED constraints

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc: TIF_ABI_PENDING bit removal
    powerpc/pseries: Fix xics build without CONFIG_SMP
    powerpc/4xx: Add pcix type 1 transactions
    powerpc/pci: Add missing call to header fixup
    powerpc/pci: Add missing hookup to pci_slot
    powerpc/pci: Add calls to set_pcie_port_type() and set_pcie_hotplug_bridge()
    powerpc/40x: Update the PowerPC 40x board defconfigs
    powerpc/44x: Update PowerPC 44x board defconfigs

    Linus Torvalds
     

01 Feb, 2010

6 commits

  • The WM8350 LED driver needs to be able to enable and disable the
    regulators it is using. Previously the core wasn't properly enforcing
    status change constraints so the driver was able to function but this
    has always been intended to be required.

    Signed-off-by: Mark Brown
    Cc: stable@kernel.org
    Signed-off-by: Liam Girdwood

    Mark Brown
     
  • This is called under a glock, so it's a good plan to use GFP_NOFS.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The do_div() call needs to remain.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since gfs2 writes the rindex file a block at a time, and releases the
    exclusive lock after each block, it is possible that another process
    will grab the lock in the middle of the write. Since the rindex entry
    size does not evenly divide the block size, that other process may see
    partial entries. On grows, this is fine. The process can simply ignore
    the partial entries. Previously, the code withdrew when it saw partial
    entries. Now it simply ignores them.
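
As a minimal sketch of "ignore the partial entry", assuming the reader knows the current file size and the size of one entry: count only whole entries and leave any trailing bytes for a later read. The helper names are hypothetical and any sizes used with them are purely illustrative.

```c
#include <stdint.h>

/* Number of complete entries in a file of file_bytes; a partial tail is not counted. */
static uint64_t complete_entries(uint64_t file_bytes, uint64_t entry_bytes)
{
    if (entry_bytes == 0)
        return 0;
    return file_bytes / entry_bytes;
}

/* Bytes belonging to an entry that is only partially written so far. */
static uint64_t partial_tail_bytes(uint64_t file_bytes, uint64_t entry_bytes)
{
    if (entry_bytes == 0)
        return 0;
    return file_bytes % entry_bytes;
}
```

A non-zero partial_tail_bytes() is then treated as "write still in progress" rather than as corruption.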

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • I triggered the following lockdep warning.

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.33-rc2 #1
    -------------------------------------------------------
    test_io_control/7357 is trying to acquire lock:
    (blkio_list_lock){+.+...}, at: [] blkiocg_weight_write+0x82/0x9e

    but task is already holding lock:
    (&(&blkcg->lock)->rlock){......}, at: [] blkiocg_weight_write+0x3b/0x9e

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (&(&blkcg->lock)->rlock){......}:
    [] validate_chain+0x8bc/0xb9c
    [] __lock_acquire+0x723/0x789
    [] lock_acquire+0x90/0xa7
    [] _raw_spin_lock_irqsave+0x27/0x5a
    [] blkiocg_add_blkio_group+0x1a/0x6d
    [] cfq_get_queue+0x225/0x3de
    [] cfq_set_request+0x217/0x42d
    [] elv_set_request+0x17/0x26
    [] get_request+0x203/0x2c5
    [] get_request_wait+0x18/0x10e
    [] __make_request+0x2ba/0x375
    [] generic_make_request+0x28d/0x30f
    [] submit_bio+0x8a/0x8f
    [] submit_bh+0xf0/0x10f
    [] ll_rw_block+0xc0/0xf9
    [] ext3_find_entry+0x319/0x544 [ext3]
    [] ext3_lookup+0x2c/0xb9 [ext3]
    [] do_lookup+0xd3/0x172
    [] link_path_walk+0x5fb/0x95c
    [] path_walk+0x3c/0x81
    [] do_path_lookup+0x21/0x8a
    [] do_filp_open+0xf0/0x978
    [] open_exec+0x1b/0xb7
    [] do_execve+0xbb/0x266
    [] sys_execve+0x24/0x4a
    [] ptregs_execve+0x12/0x18

    -> #1 (&(&q->__queue_lock)->rlock){..-.-.}:
    [] validate_chain+0x8bc/0xb9c
    [] __lock_acquire+0x723/0x789
    [] lock_acquire+0x90/0xa7
    [] _raw_spin_lock_irqsave+0x27/0x5a
    [] cfq_unlink_blkio_group+0x17/0x41
    [] blkiocg_destroy+0x72/0xc7
    [] cgroup_diput+0x4a/0xb2
    [] dentry_iput+0x93/0xb7
    [] d_kill+0x1c/0x36
    [] dput+0xf5/0xfe
    [] do_rmdir+0x95/0xbe
    [] sys_rmdir+0x10/0x12
    [] sysenter_do_call+0x12/0x32

    -> #0 (blkio_list_lock){+.+...}:
    [] validate_chain+0x61c/0xb9c
    [] __lock_acquire+0x723/0x789
    [] lock_acquire+0x90/0xa7
    [] _raw_spin_lock+0x1e/0x4e
    [] blkiocg_weight_write+0x82/0x9e
    [] cgroup_file_write+0xc6/0x1c0
    [] vfs_write+0x8c/0x116
    [] sys_write+0x3b/0x60
    [] sysenter_do_call+0x12/0x32

    other info that might help us debug this:

    1 lock held by test_io_control/7357:
    #0: (&(&blkcg->lock)->rlock){......}, at: [] blkiocg_weight_write+0x3b/0x9e
    stack backtrace:
    Pid: 7357, comm: test_io_control Not tainted 2.6.33-rc2 #1
    Call Trace:
    [] print_circular_bug+0x91/0x9d
    [] validate_chain+0x61c/0xb9c
    [] __lock_acquire+0x723/0x789
    [] lock_acquire+0x90/0xa7
    [] ? blkiocg_weight_write+0x82/0x9e
    [] _raw_spin_lock+0x1e/0x4e
    [] ? blkiocg_weight_write+0x82/0x9e
    [] blkiocg_weight_write+0x82/0x9e
    [] cgroup_file_write+0xc6/0x1c0
    [] ? trace_hardirqs_off+0xb/0xd
    [] ? cpu_clock+0x2e/0x44
    [] ? security_file_permission+0xf/0x11
    [] ? rw_verify_area+0x8a/0xad
    [] ? cgroup_file_write+0x0/0x1c0
    [] vfs_write+0x8c/0x116
    [] sys_write+0x3b/0x60
    [] sysenter_do_call+0x12/0x32

    To prevent deadlock, we should take the locks in the following sequence:

    blkio_list_lock -> queue_lock -> blkcg_lock.

    The following patch should fix this bug.
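
The ordering rule can be illustrated with a small userspace rank checker, in the spirit of what lockdep enforces. This is only a sketch: the enum, struct and function names are hypothetical, and no real locking is performed. Each lock gets a rank matching the sequence above, and an acquisition is refused if a lock of equal or higher rank is already held.

```c
#include <stdbool.h>

/* Ranks follow the order blkio_list_lock -> queue_lock -> blkcg_lock. */
enum lock_rank {
    RANK_BLKIO_LIST = 1,
    RANK_QUEUE = 2,
    RANK_BLKCG = 3,
};

struct held_locks {
    int depth;                  /* number of locks currently held */
    enum lock_rank stack[3];    /* held ranks, in acquisition order */
};

/* Record an acquisition; returns false if it would violate the ordering. */
static bool rank_acquire(struct held_locks *h, enum lock_rank r)
{
    if (h->depth > 0 && h->stack[h->depth - 1] >= r)
        return false;           /* taking a lower or equal rank on top: refuse */
    if (h->depth >= 3)
        return false;           /* defensive bound on the stack */
    h->stack[h->depth++] = r;
    return true;
}

/* Drop the most recently acquired lock. */
static void rank_release(struct held_locks *h)
{
    if (h->depth > 0)
        h->depth--;
}
```

In this model the path from the report, which holds blkcg->lock and then asks for blkio_list_lock, is rejected, while taking blkio_list_lock first is accepted.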

    Signed-off-by: Gui Jianfeng
    Signed-off-by: Jens Axboe

    Gui Jianfeng
     
  • Here are the powerpc bits to remove TIF_ABI_PENDING now that
    set_personality() is called at the appropriate place in exec.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Benjamin Herrenschmidt

    Andreas Schwab