23 Aug, 2010

17 commits

  • kernel needs to kobject_put on dev->kobj if elv_register_queue fails.

    Signed-off-by: Xiaotian Feng
    Cc: "Martin K. Petersen"
    Cc: Stephen Hemminger
    Cc: Nikanth Karthikesan
    Cc: David Teigland
    Signed-off-by: Jens Axboe

    Xiaotian Feng
     
  • If kmalloc() fails then cleanup and return failure (-1).

    Signed-off-by: Dan Carpenter
    Acked-by: Stephen M. Cameron
    Signed-off-by: Jens Axboe

    Dan Carpenter
     
  • Some documentation to provide help with tunables.

    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • o Divyesh had gotten rid of this code in the past. I want to re-introduce it
    as it helps me a lot during debugging.

    Reviewed-by: Jeff Moyer
    Reviewed-by: Divyesh Shah
    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • o Implement a new tunable group_idle, which allows idling on the group
    instead of a cfq queue. Hence one can set slice_idle = 0 and not idle
    on the individual queues but idle on the group. This way, on fast storage,
    we get fairness between groups while overall throughput
    improves.

    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • o Implement another CFQ mode where we charge group in terms of number
    of requests dispatched instead of measuring the time. Measuring in terms
    of time is not possible when we are driving deeper queue depths and there
    are requests from multiple cfq queues in the request queue.

    o This mode currently gets activated if one sets slice_idle=0 and associated
    disk supports NCQ. Again the idea is that on an NCQ disk with idling disabled
    most of the queues will dispatch 1 or more requests and then cfq queue
    expiry happens and we don't have a way to measure time. So start providing
    fairness in terms of IOPS.

    o Currently IOPS mode works only with cfq group scheduling. CFQ is following
    different scheduling algorithms for queue and group scheduling. These IOPS
    stats are used only for group scheduling hence in non-group mode nothing
    should change.

    o For CFQ group scheduling one can disable slice idling so that we don't idle
    on queue and drive deeper request queue depths (achieving better throughput),
    at the same time group idle is enabled so one should get service
    differentiation among groups.

    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Vivek Goyal
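
The charging rule described in this commit can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the CFQ code: `toy_cfqd`, `toy_iops_mode()` and `toy_group_charge()` are hypothetical names, and the `ncq` flag stands in for however the scheduler detects that the disk supports NCQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device scheduler state. */
struct toy_cfqd {
    unsigned int slice_idle;    /* 0 means queue idling is disabled */
    bool ncq;                   /* assumed flag: disk supports NCQ */
};

/* IOPS mode: idling disabled on a disk that supports NCQ. */
bool toy_iops_mode(const struct toy_cfqd *cfqd)
{
    assert(cfqd != NULL);
    return cfqd->slice_idle == 0 && cfqd->ncq;
}

/* Charge a group either in requests dispatched or in time used. */
unsigned long toy_group_charge(const struct toy_cfqd *cfqd,
                               unsigned long time_used,
                               unsigned long nr_dispatched)
{
    return toy_iops_mode(cfqd) ? nr_dispatched : time_used;
}
```

With slice_idle=0 on an NCQ disk the group is charged the request count; in every other case the time-based charge is kept.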
     
  • Do not idle either on cfq queue or service tree if slice_idle=0. User does
    not want any queue or service tree idling. Currently even if slice_idle=0,
    we were waiting for request to finish before expiring the queue and that
    can lead to lower queue depths.

    Acked-by: Jeff Moyer
    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • The doorbell reset initially appears to work correctly,
    the controller resets, comes up, some i/o can even be
    done, but on at least some Smart Arrays in some servers,
    it eventually causes a subsequent controller lockup due
    to some kind of PCIe error, and kdump can end up leaving
    the root filesystem in an unbootable state. For this
    reason, until the problem is fixed, or at least isolated
    to certain hardware enough to be avoided, the doorbell
    reset should not be used at all.

    Signed-off-by: Stephen M. Cameron
    Signed-off-by: Jens Axboe

    Stephen M. Cameron
     
  • If the cgroup hierarchy for blkio control groups is deeper than two
    levels, the kernel should not allow the creation of further levels. The
    mkdir system call does not expect EINVAL as a return value. This patch
    replaces EINVAL with the more appropriate EPERM.

    Signed-off-by: Ciju Rajan K
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Jens Axboe

    Ciju Rajan K
     
  • * 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
    radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
    radix-tree: clear all tags in radix_tree_node_rcu_free

    Linus Torvalds
     
  • Linus Torvalds
     
  • Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree:
    omplement function radix_tree_range_tag_if_tagged") does not safely
    set tags on intermediate tree nodes. The code walks down the tree
    setting tags before it has fully resolved the path to the leaf under
    the assumption there will be a leaf slot with the tag set in the
    range it is searching.

    Unfortunately, this is not a valid assumption - we can abort after
    setting a tag on an intermediate node if we overrun the number of
    tags we are allowed to set in a batch, or stop scanning because we
    have passed the last scan index before we reach a leaf slot with
    the tag we are searching for set.

    As a result, we can leave the function with tags set on intermediate
    nodes which can be tripped over later by tag-based lookups. The
    result of these stale tags is that lookup may end prematurely or
    livelock because the lookup cannot make progress.

    The fix for the problem involves recording the traversal path we
    take to the leaf nodes, and only propagating the tags back up the
    tree once the tag is set in the leaf node slot. We are already
    recording the path for efficient traversal, so there is no
    additional overhead to do the intermediate node tag setting in
    this manner.

    This fixes a radix tree lookup livelock triggered by the new
    writeback sync livelock avoidance code introduced in commit
    f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback
    livelock avoidance using page tagging").

    Signed-off-by: Dave Chinner
    Acked-by: Jan Kara

    Dave Chinner
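
The ordering the fix relies on (tag the leaf slot first, then walk back up) can be sketched on a toy tree. This is an illustration under stated assumptions, not the lib/radix-tree.c code: `toy_node`, `FANOUT` and both functions are hypothetical, and parent links stand in for the recorded traversal path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FANOUT 4    /* hypothetical, far smaller than a real node */

/* Toy node: one tag bit per slot, plus a link back to the parent slot. */
struct toy_node {
    struct toy_node *parent;
    unsigned int offset;        /* our slot index in the parent */
    bool tag[FANOUT];
};

/*
 * Set the tag on a leaf slot first, then propagate it towards the root.
 * Nothing above the leaf is touched until the leaf tag is set, so an
 * early exit can never leave a tag on an intermediate node only.
 */
void toy_tag_leaf_and_propagate(struct toy_node *leaf, unsigned int slot)
{
    struct toy_node *node = leaf;

    assert(leaf != NULL && slot < FANOUT);
    node->tag[slot] = true;

    while (node->parent != NULL) {
        unsigned int off = node->offset;

        node = node->parent;
        if (node->tag[off])
            break;              /* ancestors are already tagged */
        node->tag[off] = true;
    }
}

/* True if any slot in the node carries the tag. */
bool toy_any_tagged(const struct toy_node *node)
{
    for (unsigned int i = 0; i < FANOUT; i++)
        if (node->tag[i])
            return true;
    return false;
}
```

A caller that aborts before reaching a leaf never calls the propagate step, so no ancestor is left with a stale tag.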
     
  • Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement
    writeback livelock avoidance using page tagging") introduced a new
    radix tree tag, increasing the number of tags in each node from 2 to
    3. It did not, however, fix up the code in
    radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
    and hence could leave stray tags set in the new tag array.

    The result is that the livelock avoidance code added in the
    above commit would hit stale tags when doing tag based lookups,
    resulting in livelocks when trying to traverse the tree.

    Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
    again in the future by using a loop to walk all the tags up to
    RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
    leaves behind.

    Signed-off-by: Dave Chinner
    Acked-by: Nick Piggin
    Acked-by: Jan Kara

    Dave Chinner
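
The idea of looping over every tag, rather than naming each one, can be sketched as below. This is a minimal stand-alone illustration, not the kernel function: `toy_tagged_node`, `TOY_MAX_TAGS` and `toy_clear_slot_tags()` are hypothetical, with `TOY_MAX_TAGS` playing the role RADIX_TREE_MAX_TAGS plays in the message above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_MAX_TAGS 3  /* assumed to match the "2 to 3" change above */

/* Toy node: one bitmap word per tag. */
struct toy_tagged_node {
    unsigned long tags[TOY_MAX_TAGS];
};

/*
 * Clear the bit for slot `offset` in every tag bitmap. Because the loop
 * bound is the tag count itself, a tag added later is covered as well.
 */
void toy_clear_slot_tags(struct toy_tagged_node *node, unsigned int offset)
{
    assert(node != NULL);
    assert(offset < sizeof(unsigned long) * CHAR_BIT);

    for (unsigned int tag = 0; tag < TOY_MAX_TAGS; tag++)
        node->tags[tag] &= ~(1UL << offset);
}
```

Hard-coding two clear calls would have missed the third tag; bounding the loop by the tag count avoids repeating that mistake.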
     
  • * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: PIT: free irq source id in handling error path
    KVM: destroy workqueue on kvm_create_pit() failures
    KVM: fix poison overwritten caused by using wrong xstate size

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (58 commits)
    drm/i915,intel_agp: Add support for Sandybridge D0
    drm/i915: fix render pipe control notify on sandybridge
    agp/intel: set 40-bit dma mask on Sandybridge
    drm/i915: Remove the conflicting BUG_ON()
    drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/
    drm/i915/suspend: Flush register writes before busy-waiting.
    i915: disable DAC on Ironlake also when doing CRT load detection.
    drm/i915: wait for actual vblank, not just 20ms
    drm/i915: make sure eDP PLL is enabled at the right time
    drm/i915: fix VGA plane disable for Ironlake+
    drm/i915: eDP mode set sequence corrections
    drm/i915: add panel reset workaround
    drm/i915: Enable RC6 on Ironlake.
    drm/i915/sdvo: Only set is_lvds if we have a valid fixed mode.
    drm/i915: Set up a render context on Ironlake
    drm/i915 invalidate indirect state pointers at end of ring exec
    drm/i915: Wake-up wait_request() from elapsed hang-check (v2)
    drm/i915: Apply i830 errata for cursor alignment
    drm/i915: Only update i845/i865 CURBASE when disabled (v2)
    drm/i915: FBC is updated within set_base() so remove second call in mode_set()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slab: fix object alignment
    slub: add missing __percpu markup in mm/slub_def.h

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
    nilfs2: wait for discard to finish

    Linus Torvalds
     

22 Aug, 2010

11 commits


21 Aug, 2010

12 commits

  • Like the mlock() change previously, this makes the stack guard check
    code use vma->vm_prev to see what the mapping below the current stack
    is, rather than have to look it up with find_vma().

    Also, accept an abutting stack segment, since that happens naturally if
    you split the stack with mlock or mprotect.

    Tested-by: Ian Campbell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
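
A rough sketch of the check described above, using a back pointer instead of a lookup. The names here are hypothetical (`toy_stack_vma`, `TOY_GROWSDOWN`, `toy_has_guard_page()`), and the flag is only assumed to mark a stack mapping; the real mm code is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GROWSDOWN 0x1UL     /* hypothetical flag for a stack mapping */

struct toy_stack_vma {
    unsigned long vm_start, vm_end, vm_flags;
    struct toy_stack_vma *vm_prev;  /* mapping below, or NULL */
};

/* True if the lowest page of `vma` should be treated as a guard page. */
bool toy_has_guard_page(const struct toy_stack_vma *vma)
{
    const struct toy_stack_vma *prev;

    assert(vma != NULL);
    if (!(vma->vm_flags & TOY_GROWSDOWN))
        return false;

    /* An abutting stack segment below means this vma is not the lowest. */
    prev = vma->vm_prev;
    if (prev && prev->vm_end == vma->vm_start &&
        (prev->vm_flags & TOY_GROWSDOWN))
        return false;

    return true;
}
```

When a stack has been split in two, only the lower half reports a guard page in this sketch; the upper half sees an abutting stack segment below it.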
     
  • If we've split the stack vma, only the lowest one has the guard page.
    Now that we have a doubly linked list of vma's, checking this is trivial.

    Tested-by: Ian Campbell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • It's a really simple list, and several of the users want to go backwards
    in it to find the previous vma. So rather than have to look up the
    previous entry with 'find_vma_prev()' or something similar, just make it
    doubly linked instead.

    Tested-by: Ian Campbell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
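
Keeping the back pointer up to date costs only a few assignments at insertion time. The sketch below shows one way to do it on a toy list; `toy_vma` and `toy_vma_link()` are hypothetical names and the structure is reduced to the link fields plus the address range.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma {
    unsigned long vm_start, vm_end;
    struct toy_vma *vm_next, *vm_prev;
};

/*
 * Link `vma` into the list right after `prev`, or at the head when
 * `prev` is NULL, fixing up both the forward and the backward links.
 */
void toy_vma_link(struct toy_vma **head, struct toy_vma *vma,
                  struct toy_vma *prev)
{
    struct toy_vma *next;

    assert(head != NULL && vma != NULL);
    vma->vm_prev = prev;
    if (prev) {
        next = prev->vm_next;
        prev->vm_next = vma;
    } else {
        next = *head;
        *head = vma;
    }
    vma->vm_next = next;
    if (next)
        next->vm_prev = vma;
}
```

After this, finding the mapping below any vma is a single pointer read (`vma->vm_prev`) rather than a search.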
     
  • Apparently, the check for a 6-byte ID string introduced by commit
    426c457a3216fac74e3d44dd39729b0689f4c7ab ("mtd: nand: extend NAND flash
    detection to new MLC chips") is NOT sufficient to determine whether or
    not a Samsung chip uses their new MLC detection scheme or the old,
    standard scheme. This adds a condition to check cell type.

    Signed-off-by: Tilman Sauerbeck
    Signed-off-by: Brian Norris
    Signed-off-by: David Woodhouse
    Cc: stable@kernel.org

    Tilman Sauerbeck
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, apic: Fix apic=debug boot crash
    x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
    x86-32: Fix dummy trampoline-related inline stubs
    x86-32: Separate 1:1 pagetables from swapper_pg_dir
    x86, cpu: Fix regression in AMD errata checking code

    Linus Torvalds
     
  • This list moved to lists.ozlabs.org quite some time ago.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • All these lists moved to lists.ozlabs.org quite a while ago.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • Chapter 6 is right about mutex_trylock, but chapter 10 wasn't. This error
    was introduced during semaphore-to-mutex conversion of the Unreliable
    guide. :-)

    If user context which performs mutex_lock() or mutex_trylock() is
    preempted by interrupt context which performs mutex_trylock() on the same
    mutex instance, a deadlock occurs. This is because these functions do not
    disable local IRQs when they operate on mutex->wait_lock.

    Signed-off-by: Stefan Richter
    Acked-by: Rusty Russell
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stefan Richter
     
  • gcc-4.0.2:

    drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4_8xxx_error_recovery':
    drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
    drivers/scsi/qla4xxx/ql4_os.c:2377: sorry, unimplemented: called from here
    drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
    drivers/scsi/qla4xxx/ql4_os.c:2393: sorry, unimplemented: called from here

    Cc: Ravi Anand
    Cc: Vikas Chaudhary
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Fix uml compile error:

    include/linux/dma-mapping.h:145: error: redefinition of 'dma_get_cache_alignment'
    arch/um/include/asm/dma-mapping.h:99: note: previous definition of 'dma_get_cache_alignment' was here

    Introduced by commit 4565f0170dfc ("dma-mapping: unify
    dma_get_cache_alignment implementations")

    Signed-off-by: Miklos Szeredi
    Cc: Jeff Dike
    Cc: FUJITA Tomonori
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • dump_tasks() needs to hold the RCU read lock around its access of the
    target task's UID. To this end it should use task_uid() as it only needs
    that one thing from the creds.

    The fact that dump_tasks() holds tasklist_lock is insufficient to prevent the
    target process replacing its credentials on another CPU.

    This patch therefore calls rcu_read_lock() explicitly.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    mm/oom_kill.c:410 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 1
    4 locks held by kworker/1:2/651:
    #0: (events){+.+.+.}, at: []
    process_one_work+0x137/0x4a0
    #1: (moom_work){+.+...}, at: []
    process_one_work+0x137/0x4a0
    #2: (tasklist_lock){.+.+..}, at: []
    out_of_memory+0x164/0x3f0
    #3: (&(&p->alloc_lock)->rlock){+.+...}, at: []
    find_lock_task_mm+0x2e/0x70

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: David Howells
    Acked-by: Paul E. McKenney
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Commit 0aad4b3124 ("oom: fold __out_of_memory into out_of_memory")
    introduced a tasklist_lock leak, which caused the following
    warnings and panic.

    ================================================
    [ BUG: lock held when returning to user space! ]
    ------------------------------------------------
    rsyslogd/1422 is leaving the kernel with locks still held!
    1 lock held by rsyslogd/1422:
    #0: (tasklist_lock){.+.+.+}, at: [] out_of_memory+0x164/0x3f0
    BUG: scheduling while atomic: rsyslogd/1422/0x00000002
    INFO: lockdep is turned off.

    This patch fixes it.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Minchan Kim
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
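
A lock leak of this kind typically comes from an early exit that skips the unlock. The sketch below shows the usual single-exit pattern on a toy lock; it is not the mm/oom_kill.c code, and `toy_lock`, `toy_read_lock()`, `toy_read_unlock()` and `toy_handle_oom()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only counts how often it is held; not a real lock. */
struct toy_lock {
    int held;
};

void toy_read_lock(struct toy_lock *l)
{
    l->held++;
}

void toy_read_unlock(struct toy_lock *l)
{
    assert(l->held > 0);
    l->held--;
}

/*
 * Every path taken after toy_read_lock() leaves through "out", so an
 * early exit cannot return with the lock still held.
 */
int toy_handle_oom(struct toy_lock *tasklist, bool kill_current)
{
    int path;

    toy_read_lock(tasklist);
    if (kill_current) {
        path = 1;       /* shortcut: a bare return here would leak */
        goto out;
    }
    path = 2;           /* slow path: pick a victim */
out:
    toy_read_unlock(tasklist);
    return path;
}
```

Both paths leave `held` at zero, which is the property the lockdep report above shows being violated.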