12 Feb, 2011

1 commit

  • Commit 2a48fc0ab242417 ("block: autoconvert trivial BKL users to private
    mutex") replaced uses of the BKL in the nbd driver with mutex
    operations. Since then, I've been seeing these lock ups:

    INFO: task qemu-nbd:16115 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    qemu-nbd D 0000000000000001 0 16115 16114 0x00000004
    ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000
    0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80
    ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020
    Call Trace:
    [] __mutex_lock_slowpath+0xf7/0x180
    [] mutex_lock+0x2b/0x50
    [] nbd_ioctl+0x6c/0x1c0 [nbd]
    [] blkdev_ioctl+0x230/0x730
    [] block_ioctl+0x41/0x50
    [] do_vfs_ioctl+0x93/0x370
    [] sys_ioctl+0x81/0xa0
    [] system_call_fastpath+0x16/0x1b

    Instrumenting the nbd module's ioctl handler with some extra logging
    clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived
    ioctl in the sense that it doesn't return until another ioctl asks the
    driver to disconnect. However, that other ioctl blocks, waiting for the
    module-level mutex that replaced the BKL, and then we're stuck.

    This patch removes the module-level mutex altogether. It's clearly
    wrong, and as far as I can see, it's entirely unnecessary, since the nbd
    driver maintains per-device mutexes, and I don't see anything that would
    require a module-level (or kernel-level, for that matter) mutex.

    Signed-off-by: Soren Hansen
    Acked-by: Serge Hallyn
    Acked-by: Paul Clements
    Cc: Arnd Bergmann
    Cc: Jens Axboe
    Cc: [2.6.37.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Soren Hansen
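
    The lock-up described above can be modelled in user space. This is a
    minimal sketch, not the nbd driver's code: module_mutex and
    disconnect_would_block() are hypothetical names, and
    pthread_mutex_trylock() stands in for the second ioctl so the model
    reports the problem instead of hanging.

```c
#include <pthread.h>

/* Hypothetical stand-in for the module-level mutex that replaced the BKL. */
static pthread_mutex_t module_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the moment when a long-lived "do it" call is in flight and a
 * second "disconnect" call arrives. Returns 1 if the disconnect would
 * block, 0 if it could proceed.
 */
static int disconnect_would_block(int use_module_mutex)
{
	int blocked = 0;

	/* The long-lived call enters; in the buggy model it keeps the global lock. */
	if (use_module_mutex)
		pthread_mutex_lock(&module_mutex);

	/* The disconnect call arrives. trylock fails instead of hanging. */
	if (use_module_mutex) {
		if (pthread_mutex_trylock(&module_mutex) != 0)
			blocked = 1;
		else
			pthread_mutex_unlock(&module_mutex);
	}

	/* The long-lived call would only get here after a disconnect. */
	if (use_module_mutex)
		pthread_mutex_unlock(&module_mutex);

	return blocked;
}
```

    With the module-level lock the second call can never get in, and the
    first call never returns to release it. Without that lock the
    disconnect proceeds.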
     

19 Jan, 2011

4 commits

  • Signed-off-by: Stephen M. Cameron
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Stephen M. Cameron
     
  • Change Makefile to use -y instead of -objs, because -objs is
    deprecated according to Documentation/kbuild/makefiles.txt.

    Signed-off-by: Tracey Dent
    Cc: "Ed L. Cashin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Tracey Dent
     
  • Performing
    $ sudo mount -o loop -o umask=0 /dev/sdb1 /mnt/
    mount: wrong fs type, bad option, bad superblock on /dev/loop0,
    missing codepage or helper program, or other error
    In some cases useful info is found in syslog - try
    dmesg | tail or so

    $ sudo modprobe -r loop

    results in oops:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
    IP: [] do_raw_spin_lock+0x14/0x122
    Process modprobe (pid: 6189, threadinfo ffff88009a898000, task ffff880154a88000)
    Call Trace:
    [] _raw_spin_lock_irq+0x4a/0x51
    [] ? blk_throtl_exit+0x3b/0xa0
    [] ? cancel_delayed_work_sync+0xd/0xf
    [] blk_throtl_exit+0x3b/0xa0
    [] blk_release_queue+0x21/0x65
    [] kobject_release+0x51/0x66
    [] ? kobject_release+0x0/0x66
    [] kref_put+0x43/0x4d
    [] kobject_put+0x47/0x4b
    [] blk_cleanup_queue+0x56/0x5b
    [] loop_exit+0x68/0x844 [loop]
    [] sys_delete_module+0x1e8/0x25b
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b

    because of an attempt to acquire a NULL queue_lock.
    I added the same lines as in blk_queue_make_request -
    `fall back to embedded per-queue lock'.

    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Sergey Senozhatsky
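
    The fallback mentioned above can be sketched as follows. This is a
    simplified user-space model under stated assumptions: model_queue,
    model_lock_t and model_queue_init() are hypothetical names, not the
    block layer's own structures.

```c
#include <stddef.h>

typedef int model_lock_t;          /* hypothetical stand-in for a spinlock */

struct model_queue {
	model_lock_t *queue_lock;  /* lock the queue actually uses */
	model_lock_t embedded_lock;/* per-queue lock embedded in the queue */
};

/* Initialise a queue; driver_lock may be NULL if the driver supplies none. */
static void model_queue_init(struct model_queue *q, model_lock_t *driver_lock)
{
	q->embedded_lock = 0;
	q->queue_lock = driver_lock;

	/* Fall back to the embedded per-queue lock so teardown never sees NULL. */
	if (!q->queue_lock)
		q->queue_lock = &q->embedded_lock;
}
```

    After initialisation the lock pointer is always valid, so cleanup
    code that takes the lock unconditionally has something to acquire.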
     
  • Change Makefile to use -y instead of -objs, because -objs is
    deprecated according to Documentation/kbuild/makefiles.txt.

    Signed-off-by: Tracey Dent
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Tracey Dent
     

14 Jan, 2011

3 commits

  • * 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-block:
    cciss: reinstate proper FIFO order of command queue list
    floppy: replace NO_GEOM macro with a function

    Linus Torvalds
     
  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix cleanup when trying to mount inexistent image
    net/ceph: make ceph_msgr_wq non-reentrant
    ceph: fsc->*_wq's aren't used in memory reclaim path
    ceph: Always free allocated memory in osdmap_decode()
    ceph: Makefile: Remove unnessary code
    ceph: associate requests with opening sessions
    ceph: drop redundant r_mds field
    ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
    ceph: add dir_layout to inode

    Linus Torvalds
     

13 Jan, 2011

2 commits

  • Previously we didn't clean up the sysfs entry that was just
    created.

    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
     
  • * 'stable/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen/xenbus: making backend support modular is too complex
    xen/pci: Make xen-pcifront be dependent on XEN_XENBUS_FRONTEND
    xen/xenbus: fixup checkpatch issues in xenbus_probe*
    xen/netfront: select XEN_XENBUS_FRONTEND
    xen/xenbus: clean up noise in xenbus_probe_frontend.c
    xen/xenbus: clean up noise in xenbus_probe_backend.c
    xen/xenbus: clean up noise in xenbus_probe.c
    xen/xenbus: cleanup debug noise in xenbus_comms.c
    xen/xenbus: clean up error handling
    xen/xenbus: make frontend bus GPL
    xen/xenbus: make sure backend bus is registered earlier
    xenbus/frontend: register bus earlier
    xen: remove xen/evtchn.h
    xen: add backend driver support
    xen: separate out frontend xenbus

    Linus Torvalds
     

11 Jan, 2011

1 commit


08 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits)
    usb: don't use flush_scheduled_work()
    speedtch: don't abuse struct delayed_work
    media/video: don't use flush_scheduled_work()
    media/video: explicitly flush request_module work
    ioc4: use static work_struct for ioc4_load_modules()
    init: don't call flush_scheduled_work() from do_initcalls()
    s390: don't use flush_scheduled_work()
    rtc: don't use flush_scheduled_work()
    mmc: update workqueue usages
    mfd: update workqueue usages
    dvb: don't use flush_scheduled_work()
    leds-wm8350: don't use flush_scheduled_work()
    mISDN: don't use flush_scheduled_work()
    macintosh/ams: don't use flush_scheduled_work()
    vmwgfx: don't use flush_scheduled_work()
    tpm: don't use flush_scheduled_work()
    sonypi: don't use flush_scheduled_work()
    hvsi: don't use flush_scheduled_work()
    xen: don't use flush_scheduled_work()
    gdrom: don't use flush_scheduled_work()
    ...

    Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c
    as per Tejun.

    Linus Torvalds
     

06 Jan, 2011

1 commit

  • Impact: refactor

    Make a distinct frontend xenbus, in preparation for adding a backend xenbus.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    [corresponds to 2fd433a4188f in git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
    with adjustments to reflect changes in the code which is moved]
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
     

27 Dec, 2010

1 commit


24 Dec, 2010

2 commits


21 Dec, 2010

2 commits

  • .. caused by a missing semi-colon, introduced in commit 0fc13c8995cd
    ("cciss: fix cciss_revalidate panic").

    Reported-by: Stephen Rothwell
    Reported-by: Thiago Farina
    Cc: Jens Axboe
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cciss: fix cciss_revalidate panic
    block: max hardware sectors limit wrapper
    block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
    blk-throttle: Correct the placement of smp_rmb()
    blk-throttle: Trim/adjust slice_end once a bio has been dispatched
    block: check for proper length of iov entries earlier in blk_rq_map_user_iov()
    drbd: fix for spin_lock_irqsave in endio callback
    drbd: don't recvmsg with zero length

    Linus Torvalds
     

20 Dec, 2010

1 commit

  • Commit a8adbe3 forgot to remove the return variable, kill it.

    drivers/block/loop.c: In function 'lo_splice_actor':
    drivers/block/loop.c:398: warning: unused variable 'ret'
    [...]
    fs/nfsd/vfs.c: In function 'nfsd_splice_actor':
    fs/nfsd/vfs.c:848: warning: unused variable 'ret'

    Reported-by: Stephen Rothwell
    Signed-off-by: Jens Axboe

    Jens Axboe
     

17 Dec, 2010

2 commits

  • If you delete a logical drive, and then run BLKRRPART (e.g. via fdisk)
    on a logical drive which is "after" the deleted logical drive in the h->drv[]
    array, then cciss_revalidate panics because it will access the null pointer
    h->drv[x] when x hits the deleted drive.

    Signed-off-by: Stephen M. Cameron
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Stephen M. Cameron
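
    The failure mode above is a scan over a sparse pointer array that
    dereferences every slot. A minimal sketch of the guarded scan,
    assuming a simplified layout: model_ctlr, model_drive and
    model_find_by_lun() are hypothetical names, not the cciss driver's
    own.

```c
#include <stddef.h>

#define MODEL_MAX_DRIVES 8

struct model_drive {
	unsigned int lun_id;
};

struct model_ctlr {
	/* Slots of deleted logical drives are NULL. */
	struct model_drive *drv[MODEL_MAX_DRIVES];
};

/* Return the slot holding lun_id, or -1 if no drive matches. */
static int model_find_by_lun(const struct model_ctlr *h, unsigned int lun_id)
{
	for (int i = 0; i < MODEL_MAX_DRIVES; i++) {
		if (h->drv[i] == NULL)
			continue;       /* deleted drive: nothing to compare */
		if (h->drv[i]->lun_id == lun_id)
			return i;
	}
	return -1;
}
```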
     
  • This patch pulls calls to buf->ops->confirm() from all actors passed
    (also indirectly) to splice_from_pipe_feed().

    Is avoiding the call to buf->ops->confirm() while splice()ing to
    /dev/null an intentional optimization? No other user does that,
    and this patch removes the special case.

    Against current linux.git 6313e3c21743cc88bb5bd8aa72948ee1e83937b6.

    Signed-off-by: Michał Mirosław
    Signed-off-by: Jens Axboe

    Michał Mirosław
     

16 Dec, 2010

1 commit


03 Dec, 2010

1 commit


02 Dec, 2010

1 commit

  • The new interface creates directories per mapped image
    and under each it creates a subdir per available snapshot.
    This allows keeping a cleaner interface within the sysfs
    guidelines. The ABI documentation was updated too.

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
     

28 Nov, 2010

3 commits

  • In commit 9b7f76dc37919ea36caa9680a3f765e5b19b25fb,
    Author: Lars Ellenberg
    Date: Wed Aug 11 23:40:24 2010 +0200

    drbd: new configuration parameter c-min-rate

    a bad chunk slipped through, which is now reverted as well,
    restoring the correct irqsave for the endio callback.

    This patch also adds comments at both req_mod()
    and in the endio callback so it should not happen again.

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Lars Ellenberg
     
  • This should fix a performance degradation we observed recently.

    If we don't expect any subheader, we should not call into the tcp stack,
    as that may add considerable latency if there is no data available at
    this point.

    For a synthetic synchronous write load with single outstanding writes,
    this additional latency when processing the "unplug remote" packet
    added up to a performance degradation factor >= 10.

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Lars Ellenberg
     
  • …/tj/misc into for-2.6.38/core

    Jens Axboe
     

27 Nov, 2010

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cciss: fix build for PROC_FS disabled
    block: fix amiga and atari floppy driver compile warning
    blk-throttle: Fix calculation of max number of WRITES to be dispatched
    ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()
    xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests
    xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER
    xen/blkfront: change blk_shadow.request to proper pointer
    xen/blkfront: map REQ_FLUSH into a full barrier

    Linus Torvalds
     

18 Nov, 2010

1 commit


17 Nov, 2010

2 commits

  • The recent patch to fix the removal of a non-existing proc
    directory introduced this build problem for !CONFIG_PROC_FS:

    drivers/block/cciss.c:4929: error: 'proc_cciss' undeclared (first use in this function)

    Fix it by moving proc_cciss outside of the CONFIG_PROC_FS scope.

    Reported-by: Randy Dunlap
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Move the mid-layer's ->queuecommand() invocation from being locked
    with the host lock to being unlocked to facilitate speeding up the
    critical path for drivers who don't need this lock taken anyway.

    The patch below presents a simple SCSI host lock push-down as an
    equivalent transformation. No locking or other behavior should change
    with this patch. All existing bugs and locking orders are preserved.

    Additionally, add one parameter to queuecommand,
    struct Scsi_Host *
    and remove one parameter from queuecommand,
    void (*done)(struct scsi_cmnd *)

    Scsi_Host* is a convenient pointer that most host drivers need anyway,
    and 'done' is redundant to struct scsi_cmnd->scsi_done.

    Minimal code disturbance was attempted with this change. Most drivers
    needed only two one-line modifications for their host lock push-down.

    Signed-off-by: Jeff Garzik
    Acked-by: James Bottomley
    Signed-off-by: Linus Torvalds

    Jeff Garzik
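
    The push-down described above can be pictured as a thin wrapper: the
    entry point is now called unlocked and takes the host lock itself
    around the old, unchanged body. The sketch below is a user-space
    model only. The model_* names are hypothetical, and a pthread mutex
    stands in for the host spinlock.

```c
#include <pthread.h>

struct model_cmnd {
	void (*scsi_done)(struct model_cmnd *);
	int result;
	int completed;
};

struct model_host {
	pthread_mutex_t host_lock;
};

/* Example completion callback: just count the completion. */
static void model_mark_done(struct model_cmnd *cmd)
{
	cmd->completed++;
}

/* Old-style body: still expects the host lock to be held by its caller. */
static int model_queuecommand_lck(struct model_cmnd *cmd,
				  void (*done)(struct model_cmnd *))
{
	cmd->result = 0;
	done(cmd);
	return 0;
}

/* New-style entry point: called unlocked, takes the host lock itself. */
static int model_queuecommand(struct model_host *host, struct model_cmnd *cmd)
{
	int rc;

	pthread_mutex_lock(&host->host_lock);
	/* 'done' is no longer a parameter; it is read from the command. */
	rc = model_queuecommand_lck(cmd, cmd->scsi_done);
	pthread_mutex_unlock(&host->host_lock);
	return rc;
}
```

    In this model, a driver that does not need the lock would drop the
    wrapper and implement the new-style entry point directly.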
     

16 Nov, 2010

1 commit

  • Geert, my crosstool doesn't produce the warning below. I guess this has
    something to do with the compiler version.

    - Geert noticed following warning during compilation.

    drivers/block/amiflop.c:1344: warning: ‘rq’ may be used uninitialized in
    this function
    drivers/block/ataflop.c:1402: warning: ‘rq’ may be used uninitialized in
    this function

    - Initialize rq to NULL to fix the warning. If we can't find a suitable request
    to dispatch, this function should return NULL instead of a possibly garbage
    pointer.

    - Cross compile tested only. Don't have hardware to test it.

    Reported-by: Geert Uytterhoeven
    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

15 Nov, 2010

1 commit


13 Nov, 2010

3 commits

  • After recent blkdev_get() modifications, open_by_devnum() and
    open_bdev_exclusive() are simple wrappers around blkdev_get().
    Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

    blkdev_get_by_dev() is identical to open_by_devnum().
    blkdev_get_by_path() is slightly different in that it doesn't
    automatically add %FMODE_EXCL to @mode.

    All users are converted. Most conversions are mechanical and don't
    introduce any behavior difference. There are several exceptions.

    * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
    reason to OR it explicitly on blkdev_put().

    * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
    sb->s_mode.

    * With the above changes, sb->s_mode now always should contain
    FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect
    errors.

    The new blkdev_get_*() functions come with proper docbook comments.
    While at it, add function description to blkdev_get() too.

    Signed-off-by: Tejun Heo
    Cc: Philipp Reisner
    Cc: Neil Brown
    Cc: Mike Snitzer
    Cc: Joern Engel
    Cc: Chris Mason
    Cc: Jan Kara
    Cc: "Theodore Ts'o"
    Cc: KONISHI Ryusuke
    Cc: reiserfs-devel@vger.kernel.org
    Cc: xfs-masters@oss.sgi.com
    Cc: Alexander Viro

    Tejun Heo
     
  • Over time, block layer has accumulated a set of APIs dealing with bdev
    open, close, claim and release.

    * blkdev_get/put() are the primary open and close functions.

    * bd_claim/release() deal with exclusive open.

    * open/close_bdev_exclusive() are a combination of open and claim and
    the other way around, respectively.

    * bd_link/unlink_disk_holder() to create and remove holder/slave
    symlinks.

    * open_by_devnum() wraps bdget() + blkdev_get().

    The interface is a bit confusing and the decoupling of open and claim
    makes it impossible to properly guarantee exclusive access as
    in-kernel open + claim sequence can disturb the existing exclusive
    open even before the block layer knows the current open is for another
    exclusive access. Reorganize the interface such that,

    * blkdev_get() is extended to include exclusive access management.
    @holder argument is added and, if @FMODE_EXCL is specified, it will
    gain exclusive access atomically w.r.t. other exclusive accesses.

    * blkdev_put() is similarly extended. It now takes @mode argument and
    if @FMODE_EXCL is set, it releases an exclusive access. Also, when
    the last exclusive claim is released, the holder/slave symlinks are
    removed automatically.

    * bd_claim/release() and close_bdev_exclusive() are no longer
    necessary and either made static or removed.

    * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
    is no longer necessary and removed.

    * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
    and blkdev_get(). It also has an unexpected extra bdev_read_only()
    test which probably should be moved into blkdev_get().

    * open_by_devnum() is modified to take @holder argument and pass it to
    blkdev_get().

    Most of bdev open/close operations are unified into blkdev_get/put()
    and most exclusive accesses are tested atomically at the open time (as
    it should). This cleans up code and removes some, both valid and
    invalid, but unnecessary all the same, corner cases.

    open_bdev_exclusive() and open_by_devnum() can use further cleanup -
    rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
    special features. Well, let's leave them for another day.

    Most conversions are straight-forward. drbd conversion is a bit more
    involved as there was some reordering, but the logic should stay the
    same.

    Signed-off-by: Tejun Heo
    Acked-by: Neil Brown
    Acked-by: Ryusuke Konishi
    Acked-by: Mike Snitzer
    Acked-by: Philipp Reisner
    Cc: Peter Osterlund
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: "Theodore Ts'o"
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Cc: dm-devel@redhat.com
    Cc: drbd-dev@lists.linbit.com
    Cc: Leo Chen
    Cc: Scott Branden
    Cc: Chris Mason
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: reiserfs-devel@vger.kernel.org
    Cc: Alexander Viro

    Tejun Heo
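
    The reorganized open/claim semantics can be sketched with a small
    single-threaded model. Everything here is an assumption made for
    illustration: model_bdev, MODEL_FMODE_EXCL and its value, and the
    two functions are hypothetical, and the real code does its
    bookkeeping under a lock, which this sketch omits.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_FMODE_EXCL 0x80u  /* hypothetical flag value */

struct model_bdev {
	int openers;            /* all opens, exclusive or not */
	const void *holder;     /* current exclusive holder, or NULL */
	int holders;            /* number of exclusive claims by that holder */
};

/* Open and, if requested, claim in one step. */
static int model_blkdev_get(struct model_bdev *b, unsigned int mode,
			    const void *holder)
{
	if (mode & MODEL_FMODE_EXCL) {
		if (holder == NULL)
			return -EINVAL;
		if (b->holder != NULL && b->holder != holder)
			return -EBUSY;  /* someone else holds it exclusively */
		b->holder = holder;
		b->holders++;
	}
	b->openers++;
	return 0;
}

/* Close; releases one exclusive claim when mode says so. */
static void model_blkdev_put(struct model_bdev *b, unsigned int mode)
{
	if ((mode & MODEL_FMODE_EXCL) && b->holders > 0 && --b->holders == 0)
		b->holder = NULL;       /* last exclusive claim released */
	b->openers--;
}
```

    Because the claim is decided inside the open call, a second
    exclusive opener is refused before it ever holds the device open.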
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
    block: remove unused copy_io_context()
    Documentation: remove anticipatory scheduler info
    block: remove REQ_HARDBARRIER
    ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2)
    ioprio: fix RCU locking around task dereference
    block: ioctl: fix information leak to userland
    block: read i_size with i_size_read()
    cciss: fix proc warning on attempt to remove non-existant directory
    bio: take care not overflow page count when mapping/copying user data
    block: limit vec count in bio_kmalloc() and bio_alloc_map_data()
    block: take care not to overflow when calculating total iov length
    block: check for proper length of iov entries in blk_rq_map_user_iov()
    cciss: remove controllers supported by hpsa
    cciss: use usleep_range not msleep for small sleeps
    cciss: limit commands allocated on reset_devices
    cciss: Use kernel provided PCI state save and restore functions
    cciss: fix board status waiting code
    drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs
    drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses
    drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code
    ...

    Linus Torvalds
     

12 Nov, 2010

1 commit


10 Nov, 2010

3 commits

  • REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
    at this point is:

    - various checks inside the block layer.
    - sanity checks in bio based drivers.
    - now unused bio_empty_barrier helper.
    - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while,
    but Xen really needs to sort out its barrier situation.
    - setting of ordered tags in uas - dead code copied from old scsi
    drivers.
    - scsi different retry for barriers - it's dead and should have been
    removed when flushes were converted to FS requests.
    - blktrace handling of barriers - removed. Someone who knows blktrace
    better should add support for REQ_FLUSH and REQ_FUA, though.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Conflicts:
    drivers/block/cciss.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Convert direct reads of an inode's i_size to using i_size_read().

    i_size_{read,write} use a seqcount to protect reads from accessing
    incomplete writes. Concurrent i_size_write()s require mutual exclusion
    to protect the seqcount that is used by i_size_{read,write}. But
    i_size_read() callers do not need to use additional locking.

    Signed-off-by: Mike Snitzer
    Acked-by: NeilBrown
    Acked-by: Lars Ellenberg
    Signed-off-by: Jens Axboe

    Mike Snitzer
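
    The seqcount idea above can be sketched in portable C11. This is an
    illustrative model, not the kernel's implementation: model_size and
    its functions are hypothetical names, and the size is split into two
    32-bit halves to stand in for a 64-bit value that cannot be read in
    one step. As the message says, writers must be serialized by the
    caller.

```c
#include <stdatomic.h>
#include <stdint.h>

struct model_size {
	atomic_uint seq;        /* odd while a write is in progress */
	_Atomic uint32_t lo;    /* low half of the size */
	_Atomic uint32_t hi;    /* high half of the size */
};

static void model_size_init(struct model_size *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->lo, 0);
	atomic_init(&s->hi, 0);
}

/* Writers must already exclude each other (for example with a mutex). */
static void model_size_write(struct model_size *s, uint64_t v)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Readers take no lock; they retry if a write overlapped the read. */
static uint64_t model_size_read(struct model_size *s)
{
	unsigned int before, after;
	uint32_t lo, hi;

	do {
		before = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((before & 1u) || before != after);

	return ((uint64_t)hi << 32) | lo;
}
```

    A direct read of the two halves could pair a new low half with an
    old high half. The retry loop is what rules that out.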