05 Mar, 2011

1 commit

  • This merge creates two sets of conflicts. One is simple context
    conflicts caused by removal of throtl_scheduled_delayed_work() in
    for-linus and removal of throtl_shutdown_timer_wq() in
    for-2.6.39/core.

    The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
    call directly into q->request_fn() __blk_run_queue()) in for-linus
    clashing with the FLUSH reimplementation in for-2.6.39/core. The
    conflict isn't trivial but the resolution is straightforward.

    * __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
    should be called with @force_kblockd set to %true.

    * elv_insert() in blk_kick_flush() should use
    %ELEVATOR_INSERT_REQUEUE.

    Both changes are to avoid invoking ->request_fn() directly from
    request completion path and closely match the changes in the commit
    255bb490c8.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

04 Mar, 2011

1 commit

  • The following steps lead to a deadlock in the kernel:

    dd if=/dev/zero of=img bs=512 count=1000
    losetup -f img
    mkfs.ext2 /dev/loop0
    mount -t ext2 -o loop /dev/loop0 mnt
    umount mnt/

    Stacktrace:
    [] irq_exit+0x36/0x59
    [] smp_apic_timer_interrupt+0x6b/0x75
    [] apic_timer_interrupt+0x31/0x38
    [] mutex_spin_on_owner+0x54/0x5b
    [] lo_release+0x12/0x67 [loop]
    [] __blkdev_put+0x7c/0x10c
    [] fput+0xd5/0x1aa
    [] loop_clr_fd+0x1a9/0x1b1 [loop]
    [] lo_release+0x39/0x67 [loop]
    [] __blkdev_put+0x7c/0x10c
    [] deactivate_locked_super+0x17/0x36
    [] sys_umount+0x27e/0x2a5
    [] sys_oldumount+0xb/0xe
    [] sysenter_do_call+0x12/0x26
    [] 0xffffffff

    Regression since 2a48fc0ab24241755dc9, which introduced the private
    loop_mutex as part of the BKL removal process.

    As per [1], the mutex can be safely removed.

    [1] http://www.gossamer-threads.com/lists/linux/kernel/1341930

    Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394
    Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172

    Signed-off-by: Petr Uzel
    Cc: stable@kernel.org
    Reviewed-by: Nikanth Karthikesan
    Acked-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Petr Uzel
     

03 Mar, 2011

5 commits

  • If we enable trace events to trace block actions, we use
    blk_fill_rwbs_rq to analyze the corresponding actions
    in the request's cmd_flags, but we only take the lowest 2 bits
    from it, so most of the other flags (e.g., REQ_SYNC) are missing.
    For example, with a sync write we get:
    write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + 8 [write_test]

    Since now we have integrated the flags of both bio and request,
    it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
    blk_fill_rwbs_rq isn't needed any more.

    With this patch, after a sync write we get:
    write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 + 8 [write_test]
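
    The effect of the mask can be illustrated with a small user-space
    sketch. The flag values and the demo_fill_rwbs() helper below are
    hypothetical stand-ins chosen for illustration; they are not the
    kernel's real constants or its blk_fill_rwbs() implementation.

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define DEMO_REQ_WRITE   (1u << 0)
#define DEMO_REQ_DISCARD (1u << 1)
#define DEMO_REQ_SYNC    (1u << 4)
#define DEMO_REQ_META    (1u << 5)

/*
 * Build a short action string such as "W" or "WS" from a flag word.
 * buf must have room for at least 4 bytes.
 */
static void demo_fill_rwbs(char *buf, unsigned int flags)
{
    size_t i = 0;

    /* First character: the basic direction of the request. */
    if (flags & DEMO_REQ_DISCARD)
        buf[i++] = 'D';
    else if (flags & DEMO_REQ_WRITE)
        buf[i++] = 'W';
    else
        buf[i++] = 'R';

    /* Modifier characters live above the two low bits. */
    if (flags & DEMO_REQ_SYNC)
        buf[i++] = 'S';
    if (flags & DEMO_REQ_META)
        buf[i++] = 'M';

    buf[i] = '\0';
}
```

    Passing (DEMO_REQ_WRITE | DEMO_REQ_SYNC) & 0x3 yields "W", because the
    sync bit is masked away before the helper sees it; passing the full
    flag word yields "WS".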

    Signed-off-by: Tao Ma
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Tao Ma
     
  • Move blk_throtl_exit() into blk_cleanup_queue() as blk_throtl_exit() is
    written in such a way that it needs queue lock. In blk_release_queue()
    there is no guarantee that ->queue_lock is still around.

    Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
    one problem.

    https://lkml.org/lkml/2010/10/23/86

    And a quick fix moved blk_throtl_exit() to blk_release_queue().

    commit 7ad58c028652753814054f4e3ac58f925e7343f4
    Author: Jens Axboe
    Date: Sat Oct 23 20:40:26 2010 +0200

    block: fix use-after-free bug in blk throttle code

    This patch reverts above change and does not try to shutdown the
    throtl work in blk_sync_queue(). By avoiding call to
    throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid
    the problem reported by Ingo.

    blk_sync_queue() seems to be used only by md driver and it seems to be
    using it to make sure q->unplug_fn is not called as md registers its
    own unplug functions and it is about to free up the data structures
    used by unplug_fn(). Block throttle does not call back into unplug_fn()
    or into md. So there is no need to cancel blk throttle work.

    In fact I think cancelling block throttle work is bad because it might
    happen that some bios are throttled and scheduled to be dispatched later
    with the help of pending work and if work is cancelled, these bios might
    never be dispatched.

    Block layer also uses blk_sync_queue() during blk_cleanup_queue() and
    blk_release_queue() time. That should be safe as we are also calling
    blk_throtl_exit() which should make sure all the throttling related
    data structures are cleaned up.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Now we initialize ->queue_lock at queue allocation time so a driver does
    not have to worry about initializing it before calling
    blk_cleanup_queue().

    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • There does not seem to be a clear convention whether q->queue_lock is
    initialized or not when blk_cleanup_queue() is called. In the past it
    was not necessary but now blk_throtl_exit() takes up queue lock by
    default and needs queue lock to be available.

    In fact elevator_exit() code also has similar requirement just that it
    is less stringent in the sense that elevator_exit() is called only if
    elevator is initialized.

    Two problems have been noticed because of ambiguity about spin lock
    status.

    - If a driver calls blk_alloc_queue() and then calls
    blk_cleanup_queue() almost immediately (because some other
    driver structure allocation failed or some other error happened),
    then blk_throtl_exit() will run into issues as queue lock is not
    initialized. Loop driver ran into this issue recently and I
    noticed error paths in md driver too. Similar error paths should
    exist in other drivers too.

    - If some driver provided external spin lock and zapped the lock
    before blk_cleanup_queue(), then it can lead to issues.

    So this patch initializes the default queue lock at queue allocation time.

    Block throttling code is one of the users of queue lock and it is
    initialized at the queue allocation time, so it makes sense to
    initialize ->queue_lock also to internal lock. A driver can override
    that lock later. With this change a driver does not have to worry
    about initializing the queue lock to default before calling
    blk_cleanup_queue().
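
    A minimal user-space sketch of this arrangement follows. All names
    (demo_lock, demo_queue, demo_alloc_queue() and so on) are
    hypothetical, and the toy spinlock only stands in for the kernel's
    spinlock_t; the point is that the lock pointer is valid from the
    moment the queue is allocated.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy spinlock standing in for the kernel's spinlock_t. */
struct demo_lock {
    atomic_flag held;
};

static void demo_lock_init(struct demo_lock *l)
{
    atomic_flag_clear(&l->held);
}

static void demo_lock_acquire(struct demo_lock *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ; /* spin until the holder releases */
}

static void demo_lock_release(struct demo_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Hypothetical queue: an embedded default lock plus the pointer users take. */
struct demo_queue {
    struct demo_lock internal_lock;
    struct demo_lock *queue_lock;
    int throttled;  /* stand-in for throttling state torn down on cleanup */
};

/* Allocation points queue_lock at the internal lock right away. */
static struct demo_queue *demo_alloc_queue(void)
{
    struct demo_queue *q = calloc(1, sizeof(*q));

    if (!q)
        return NULL;
    demo_lock_init(&q->internal_lock);
    q->queue_lock = &q->internal_lock;
    return q;
}

/* A driver may still substitute its own lock after allocation. */
static void demo_set_driver_lock(struct demo_queue *q, struct demo_lock *l)
{
    q->queue_lock = l;
}

/* Cleanup can take queue_lock unconditionally, even on early error paths. */
static void demo_cleanup_queue(struct demo_queue *q)
{
    demo_lock_acquire(q->queue_lock);
    q->throttled = 0;
    demo_lock_release(q->queue_lock);
    free(q);
}
```

    In this sketch, calling demo_cleanup_queue() directly after
    demo_alloc_queue() is safe, which corresponds to the early error
    path described in the first problem above.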

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Replace the numeric literals in diskstats_show() with macros.

    Cc: Jens Axboe
    Signed-off-by: Liu Yuan
    Signed-off-by: Jens Axboe

    Liu Yuan
     

02 Mar, 2011

6 commits

  • blk-flush decomposes a flush into sequence of multiple requests. On
    completion of a request, the next one is queued; however, block layer
    must not implicitly call into q->request_fn() directly from completion
    path. This makes the queue behave unexpectedly when seen from the
    drivers and violates the assumption that q->request_fn() is called
    with process context + queue_lock.

    This patch makes the following two changes to blk-flush to make sure
    q->request_fn() is not called directly from request completion path.

    - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
    use kblockd instead of calling directly into q->request_fn().

    - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
    ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
    request queue directly.

    Reported by Jan in the following threads.

    http://thread.gmane.org/gmane.linux.ide/48778
    http://thread.gmane.org/gmane.linux.ide/48786

    stable: applicable to v2.6.37.

    Signed-off-by: Tejun Heo
    Reported-by: Jan Beulich
    Cc: "David S. Miller"
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • __blk_run_queue() automatically either calls q->request_fn() directly
    or schedules kblockd depending on whether the function is recursed.
    blk-flush implementation needs to be able to explicitly choose
    kblockd. Add @force_kblockd.

    All the current users are converted to specify %false for the
    parameter and this patch doesn't introduce any behavior change.
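
    The decision described above can be sketched in user space as
    follows. The demo_runq structure, its counters and the
    demo_run_queue() helper are hypothetical; the recursion guard is a
    plain bool here, and "scheduling kblockd" is reduced to bumping a
    counter.

```c
#include <stdbool.h>

/* Hypothetical queue state for illustration only. */
struct demo_runq {
    bool in_request_fn;  /* set while the request handler is running */
    int direct_calls;    /* times the handler was invoked directly */
    int deferred_runs;   /* times the run was handed to the worker */
};

/* Stand-in for q->request_fn(). */
static void demo_request_fn(struct demo_runq *q)
{
    q->direct_calls++;
}

/* Stand-in for queueing work on kblockd. */
static void demo_schedule_worker(struct demo_runq *q)
{
    q->deferred_runs++;
}

/*
 * Call the handler directly unless the caller forces deferral or the
 * handler is already running (recursion); otherwise defer to the worker.
 */
static void demo_run_queue(struct demo_runq *q, bool force_kblockd)
{
    if (!force_kblockd && !q->in_request_fn) {
        q->in_request_fn = true;
        demo_request_fn(q);
        q->in_request_fn = false;
    } else {
        demo_schedule_worker(q);
    }
}
```

    With force_kblockd set to false the sketch behaves as before
    (direct call unless recursing); with true it always defers, which is
    what a completion path would ask for.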

    stable: This is prerequisite for fixing ide oops caused by the new
    blk-flush implementation.

    Signed-off-by: Tejun Heo
    Cc: Jan Beulich
    Cc: James Bottomley
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Effectively, make group_isolation=1 the default and remove the tunable.
    The group_isolation=0 setting existed because, by default, we idle on
    the sync-noidle tree, and on fast devices this can be very harmful
    for throughput.

    However, this problem can also be addressed by tuning slice_idle and
    possibly group_idle on faster storage devices.

    This change simplifies the CFQ code by removing the feature entirely.

    Signed-off-by: Justin TerAvest
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Justin TerAvest
     
  • Conflicts:
    block/cfq-iosched.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Ben Hutchings
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Ben Hutchings
     
  • o Dominik Klein reported a system hang issue while doing some blkio
    throttling testing.

    https://lkml.org/lkml/2011/2/24/173

    o Some tracing revealed that CFQ was not dispatching any more jobs as
    queue unplug was not happening. And queue unplug was not happening
    because unplug work was not being called as there was one throttling
    work on same cpu which had not finished yet. And throttling work had
    not finished as it was trying to dispatch a bio to CFQ but all the
    request descriptors were consumed, so it was put to sleep.

    o So basically it is a cyclic dependency between CFQ unplug work and
    throtl dispatch work. Tejun suggested using a separate workqueue for
    such cases.

    o This patch uses a separate workqueue for throttle related work and
    does not rely on kblockd workqueue anymore.

    Cc: stable@kernel.org
    Reported-by: Dominik Klein
    Signed-off-by: Vivek Goyal
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

01 Mar, 2011

10 commits

  • * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
    hwmon: (adt7411) add MODULE_DEVICE_TABLE
    hwmon: (ad7414) add MODULE_DEVICE_TABLE

    Linus Torvalds
     
  • Fix new kernel-doc warning in fs/block_dev.c:

    Warning(fs/block_dev.c:937): No description found for parameter 'kill_dirty'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Several ACPI drivers fail to build if CONFIG_NET is unset, because
    they refer to things depending on CONFIG_THERMAL that in turn depends
    on CONFIG_NET. However, CONFIG_THERMAL doesn't really need to depend
    on CONFIG_NET, because the only part of it requiring CONFIG_NET is
    the netlink interface in thermal_sys.c.

    Put the netlink interface in thermal_sys.c under #ifdef CONFIG_NET
    and remove the dependency of CONFIG_THERMAL on CONFIG_NET from
    drivers/thermal/Kconfig.
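
    The general pattern can be sketched as below. DEMO_CONFIG_NET and
    both helpers are hypothetical names, and providing a no-op stub for
    the disabled case is an assumption of this sketch, not something
    stated in the commit message.

```c
/* Define DEMO_CONFIG_NET (e.g. -DDEMO_CONFIG_NET) to build the netlink part. */

#ifdef DEMO_CONFIG_NET
/* Hypothetical event sender; a real one would talk to the network stack. */
static int demo_generate_netlink_event(unsigned int orig, int event)
{
    (void)orig;
    (void)event;
    return 1;  /* pretend one message was sent */
}
#else
/* Without networking, callers still build against a no-op stub. */
static inline int demo_generate_netlink_event(unsigned int orig, int event)
{
    (void)orig;
    (void)event;
    return 0;
}
#endif

/* Core code that does not need networking calls the helper unconditionally. */
static int demo_report_trip(unsigned int zone_id)
{
    return demo_generate_netlink_event(zone_id, 0);
}
```

    Callers such as demo_report_trip() then compile in both
    configurations without any #ifdef of their own.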

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Randy Dunlap
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Stephen Rothwell
    Cc: Luming Yu
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
    drm: fix unsigned vs signed comparison issue in modeset ctl ioctl.
    drm/nv50-nvc0: make sure vma is definitely unmapped when destroying bo

    Linus Torvalds
     
  • …/git/tmlind/linux-omap-2.6

    * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
    omap4: prcm: Fix the CPUx clockdomain offsets
    OMAP2+: clocksource: fix crash on boot when !CONFIG_OMAP_32K_TIMER
    OMAP2/3: clock: fix fint calculation for DPLL_FREQSEL
    OMAP2+: mailbox: fix lookups for multiple mailboxes
    OMAP2420: mailbox: fix IVA vs DSP IRQ numbering
    mach-omap2: smartreflex: world-writable debugfs voltage files
    mach-omap2: pm: world-writable debugfs timer files
    mach-omap2: mux: world-writable debugfs files

    Linus Torvalds
     
  • …or-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf timechart: Fix max number of cpus
    perf timechart: Fix black idle boxes in the title
    perf hists: Print number of samples, not the period sum

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Use u32 instead of long to set reset vector back to 0

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clockevents: Prevent oneshot mode when broadcast device is periodic

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: fix truncate after open
    fuse: fix hang of single threaded fuseblk filesystem

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
    ocfs2: Check heartbeat mode for kernel stacks only
    Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number.
    ocfs2: Fix estimate of necessary credits for mkdir

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    eukrea-tlv320: fix platform_name
    ASoC: correct pxa AC97 DAI names
    ALSA: hda - Add support for new IDT 92HD98 and 92HD99 codecs
    ALSA: HDA: Add ideapad quirk for two Dell machines
    ALSA: HDA: Add a new Conexant codec 506e (20590)
    ALSA: usb-audio: fix oops due to cleanup race when disconnecting
    ASoC: Hook wm_hubs micbiases up to CLK_SYS
    ASoC: Correct definition of WM8903_VMID_RES_5K
    ASoC: Fix WM8958 default microphone detection argument ordering
    ALSA: HDA: Fix mic initialization in VIA auto parser
    ALSA: fix one memory leak in sound jack

    Linus Torvalds
     
  • Commit e2cda3226481 ("thp: add pmd mangling generic functions") replaced
    some macros in include/asm-generic/pgtable.h with inline functions.

    If the functions are to be defined (not all architectures need them)
    then struct vm_area_struct must be defined first. So include
    .

    Fixes a build failure seen in Debian:

    CC [M] drivers/media/dvb/mantis/mantis_pci.o
    In file included from arch/arm/include/asm/pgtable.h:460,
    from drivers/media/dvb/mantis/mantis_pci.c:25:
    include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
    include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type

    Signed-off-by: Ben Hutchings
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     

28 Feb, 2011

6 commits


27 Feb, 2011

2 commits


26 Feb, 2011

9 commits

  • Takashi Iwai
     
  • When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, we can only
    switch into oneshot mode when the backup broadcast device
    supports oneshot mode as well. Otherwise we would try to switch the
    broadcast device into an unsupported mode unconditionally. This went
    unnoticed so far as the current available broadcast devices support
    oneshot mode. Seth unearthed this problem while debugging and working
    around an hpet related BIOS wreckage.

    Add the necessary check to tick_is_oneshot_available().
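
    A rough user-space sketch of such a check is shown below. The
    structure, feature bits and function name are hypothetical and only
    mirror the logic described above; they are not the kernel's
    definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits for illustration only. */
#define DEMO_FEAT_ONESHOT (1u << 0)
#define DEMO_FEAT_C3STOP  (1u << 1)

struct demo_clock_dev {
    unsigned int features;
};

/*
 * Oneshot mode is usable when the per-cpu device supports it and, if that
 * device stops in deep idle states, the broadcast device supports it too.
 */
static bool demo_oneshot_available(const struct demo_clock_dev *cpu_dev,
                                   const struct demo_clock_dev *bc_dev)
{
    if (!cpu_dev || !(cpu_dev->features & DEMO_FEAT_ONESHOT))
        return false;

    /* No broadcast fallback is needed if the device never stops. */
    if (!(cpu_dev->features & DEMO_FEAT_C3STOP))
        return true;

    return bc_dev && (bc_dev->features & DEMO_FEAT_ONESHOT);
}
```

    A per-cpu device with both bits set paired with a periodic-only
    broadcast device makes the function return false, which is the case
    the added check is meant to catch.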

    Reported-and-tested-by: Seth Forshee
    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Cc: stable@kernel.org # .21 ->

    Thomas Gleixner
     
  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM: Make ACPI wakeup from S5 work again when CONFIG_PM_SLEEP is unset

    Linus Torvalds
     
  • Fixes sysfs config attribute to allow access to entire 16MB maintenance
    space of RapidIO devices.

    Signed-off-by: Alexandre Bounine
    Cc: Kumar Gala
    Cc: Matt Porter
    Cc: Li Yang
    Cc: Thomas Moll
    Cc: Micha Nelissen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Initialize ts_real.flags to fix compiler warning about possible
    uninitialized use of this field.

    Signed-off-by: Alexander Gordeev
    Cc: john stultz
    Cc: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Gordeev
     
  • It seems odd that truncate_inode_pages_range(), called not only when
    truncating but also when evicting inodes, has mem_cgroup_uncharge_start
    and _end() batching in its second loop to clear up a few leftovers, but
    not in its first loop that does almost all the work: add them there too.

    Signed-off-by: Hugh Dickins
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The THP code didn't pass the correct interleaving shift to the memory
    policy code. Fix this here by adjusting for the order.

    Signed-off-by: Andi Kleen
    Reviewed-by: Christoph Lameter
    Acked-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • A race can occur when io_submit() races with io_destroy():

    CPU1                                        CPU2
    io_submit()
      do_io_submit()
        ...
        ctx = lookup_ioctx(ctx_id);
                                                io_destroy()
        Now do_io_submit() holds the last reference to ctx.
        ...
        queue new AIO
        put_ioctx(ctx) - frees ctx with active AIOs

    We solve this issue by checking whether ctx is being destroyed in AIO
    submission path after adding new AIO to ctx. Then we are guaranteed that
    either io_destroy() waits for new AIO or we see that ctx is being
    destroyed and bail out.
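
    The ordering (add first, then check) can be sketched as follows.
    The demo_ctx structure and demo_submit() are hypothetical, and this
    single-threaded sketch deliberately omits locking; in a concurrent
    setting both steps would have to run under a lock that the destroy
    path also takes.

```c
#include <stdbool.h>

/* Hypothetical context for illustration only. */
struct demo_ctx {
    bool dead;   /* set by the destroy path */
    int active;  /* number of requests currently queued */
};

/*
 * Queue a request, then re-check whether the context is being destroyed.
 * Returns 0 on success, -1 if the submission was backed out.
 */
static int demo_submit(struct demo_ctx *ctx)
{
    ctx->active++;       /* make the new request visible first */

    if (ctx->dead) {     /* destroy already started: undo and bail out */
        ctx->active--;
        return -1;
    }
    return 0;
}
```

    Because the request is counted before the check, a destroy path that
    waits for active to drop to zero either sees the new request or has
    already set dead, in which case the submission backs out.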

    Cc: Nick Piggin
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.

    lookup_ioctx doesn't implement the rcu lookup pattern properly.
    rcu_read_lock does not prevent refcount going to zero, so we might take
    a refcount on a zero count ioctx.

    Fix the bug by atomically testing for zero refcount before incrementing.
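
    A minimal user-space sketch of "increment unless zero", written with
    C11 atomics, is shown below. demo_get_unless_zero() is a
    hypothetical helper used for illustration and is not the kernel
    primitive used by the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count is still non-zero. */
static bool demo_get_unless_zero(atomic_int *refcount)
{
    int old = atomic_load(refcount);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(refcount, &old, old + 1))
            return true;
    }

    /* The object is already on its way to being freed. */
    return false;
}
```

    A lookup that finds an object whose count has already reached zero
    gets false back and must treat the object as not found, instead of
    reviving it.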

    [jack@suse.cz: added comment into the code]
    Reviewed-by: Jeff Moyer
    Signed-off-by: Nick Piggin
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin