31 Mar, 2011

1 commit


25 Mar, 2011

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    fs: simplify iget & friends
    fs: pull inode->i_lock up out of writeback_single_inode
    fs: rename inode_lock to inode_hash_lock
    fs: move i_wb_list out from under inode_lock
    fs: move i_sb_list out from under inode_lock
    fs: remove inode_lock from iput_final and prune_icache
    fs: Lock the inode LRU list separately
    fs: factor inode disposal
    fs: protect inode->i_state with inode->i_lock
    autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
    autofs4 - remove autofs4_lock
    autofs4 - fix d_manage() return on rcu-walk
    autofs4 - fix autofs4_expire_indirect() traversal
    autofs4 - fix dentry leak in autofs4_expire_direct()
    autofs4 - reinstate last used update on access
    vfs - check non-mountpoint dentry might block in __follow_mount_rcu()

    Linus Torvalds
     
  • Protect the inode writeback list with a new global lock
    inode_wb_list_lock and use it to protect the list manipulations and
    traversals. This lock replaces the inode_lock as the inodes on the
    list can be validity checked while holding the inode->i_lock and
    hence the inode_lock is no longer needed to protect the list.

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
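
    As an illustration (not part of the commit above), a minimal sketch of
    the locking pattern described here, assuming inode_wb_list_lock is
    visible via linux/writeback.h as introduced by this series; the helper
    name is made up:

        #include <linux/fs.h>
        #include <linux/writeback.h>
        #include <linux/backing-dev.h>

        /* Writeback-list walk: the global inode_wb_list_lock guards the
         * list itself, inode->i_lock is enough to validate each inode,
         * and inode_lock is no longer involved. */
        static void sketch_requeue_io(struct bdi_writeback *wb)
        {
                struct inode *inode, *tmp;

                spin_lock(&inode_wb_list_lock);
                list_for_each_entry_safe(inode, tmp, &wb->b_more_io, i_wb_list) {
                        spin_lock(&inode->i_lock);
                        if (!(inode->i_state & (I_FREEING | I_WILL_FREE)))
                                list_move(&inode->i_wb_list, &wb->b_io);
                        spin_unlock(&inode->i_lock);
                }
                spin_unlock(&inode_wb_list_lock);
        }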
     
  • Protect inode state transitions and validity checks with the
    inode->i_lock. This enables us to make inode state transitions
    independently of the inode_lock and is the first step to peeling
    away the inode_lock from the code.

    This requires that __iget() is done atomically with i_state checks
    during list traversals so that we don't race with another thread
    marking the inode I_FREEING between the state check and grabbing the
    reference.

    Also remove the unlock_new_inode() memory barrier optimisation
    required to avoid taking the inode_lock when clearing I_NEW.
    Simplify the code by taking the inode->i_lock around the
    state change and wakeup. Because the wakeup is no longer tricky,
    remove the wake_up_inode() function and open code the wakeup where
    necessary.

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
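
    A minimal sketch (illustration only, not the patch itself) of the
    check-then-__iget() pattern the commit describes; __iget() is the
    VFS-internal reference helper, and the surrounding list traversal is
    omitted:

        #include <linux/fs.h>

        /* The i_state check and __iget() now happen atomically under
         * inode->i_lock, so another thread cannot mark the inode
         * I_FREEING between the check and the reference grab. */
        static struct inode *sketch_grab_inode(struct inode *inode)
        {
                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
                        spin_unlock(&inode->i_lock);
                        return NULL;    /* inode is being torn down */
                }
                __iget(inode);
                spin_unlock(&inode->i_lock);
                return inode;
        }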
     

19 Mar, 2011

1 commit


10 Mar, 2011

5 commits

  • Conflicts:
    block/blk-core.c
    block/blk-flush.c
    drivers/md/raid1.c
    drivers/md/raid10.c
    drivers/md/raid5.c
    fs/nilfs2/btnode.c
    fs/nilfs2/mdt.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
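
    As an illustration of the replacement API (not part of the commit
    above): explicit on-stack plugging via blk_start_plug() and
    blk_finish_plug(); the batch-submission loop is made up:

        #include <linux/blkdev.h>
        #include <linux/bio.h>

        /* Submit a batch of bios under an on-stack plug; anything still
         * plugged is flushed by blk_finish_plug(). */
        static void sketch_submit_batch(struct bio **bios, int nr)
        {
                struct blk_plug plug;
                int i;

                blk_start_plug(&plug);
                for (i = 0; i < nr; i++)
                        submit_bio(READ, bios[i]);
                blk_finish_plug(&plug);
        }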
     
  • Not all block drivers clear events immediately after reporting. Some
    do so in ->revalidate_disk() or other steps during ->open(). There is
    a slim chance that an event poll may happen between the clearing event
    check from check_disk_change() and the actual clearing of the events,
    which would result in spurious events.

    Block event checks while a block device open is in progress. There is
    no need to kick an explicit event check afterwards, as events are
    always checked during open.

    -v2: The original patch could have called disk_unblock_events() with
    an already released or %NULL @disk, causing an oops. Fixed by making
    sure references are put after disk_unblock_events() is called.
    It also makes the error path of __blkdev_get() a bit simpler.
    This problem was reported by Jens.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Kay Sievers

    Tejun Heo
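
    A minimal sketch (illustrative, error handling omitted) of the ordering
    the -v2 note describes: events are blocked across the open, and the
    disk reference is only dropped after disk_unblock_events(). __blkdev_get()
    is assumed to be the internal open helper in fs/block_dev.c:

        #include <linux/fs.h>
        #include <linux/genhd.h>

        /* internal helper in fs/block_dev.c, declared here only for the sketch */
        static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part);

        static int sketch_open_with_events_blocked(struct block_device *bdev,
                                                   struct gendisk *disk,
                                                   fmode_t mode)
        {
                int ret;

                disk_block_events(disk);
                ret = __blkdev_get(bdev, mode, 0);
                disk_unblock_events(disk);
                put_disk(disk);         /* reference put after unblocking */
                return ret;
        }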
     
  • The block event mechanism currently always checks events when the
    device is being closed regardless of the open mode. The intention was
    to allow detection of EJECT_REQUEST when a device is closed whether
    disk event polling is enabled or not.

    This is unnecessary as, for devices of interest, events are checked
    from either userland or the kernel, and in the former case
    ->check_events() is performed on the open of each poll attempt anyway.
    Furthermore, this unconditional event check on close makes the code
    susceptible to an event loop if the block driver doesn't clear reported
    events correctly - an event triggers userland to open and close the
    device, which in turn causes another event, rinse and repeat.

    Check events on close only if checking was blocked by an exclusive
    write open.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Kay Sievers

    Tejun Heo
     
  • Currently, disk_unblock_events() implicitly kicks an event check when
    the block count reaches zero. This behavior is not described in the
    comment and hinders future changes. Make the unblocker check events
    explicitly by calling disk_check_events() as necessary.

    This patch doesn't cause any behavior difference.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Kay Sievers

    Tejun Heo
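
    A small sketch of the resulting calling convention (illustrative;
    disk_check_events() is the name used in the commit message and is
    assumed here to take the gendisk):

        #include <linux/genhd.h>

        static void sketch_unblock_and_check(struct gendisk *disk)
        {
                disk_unblock_events(disk);
                /* explicit now - unblocking alone no longer kicks a check */
                disk_check_events(disk);
        }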
     

01 Mar, 2011

1 commit


26 Feb, 2011

1 commit

  • * 'for-linus' of git://neil.brown.name/md:
    md: Fix - again - partition detection when array becomes active
    Fix over-zealous flush_disk when changing device size.
    md: avoid spinlock problem in blk_throtl_exit
    md: correctly handle probe of an 'mdp' device.
    md: don't set_capacity before array is active.
    md: Fix raid1->raid0 takeover

    Linus Torvalds
     

25 Feb, 2011

1 commit

  • The new implementation of bd_link_disk_holder() added by 49731baa41d
    (block: restore multiple bd_link_disk_holder() support) didn't get an
    extra reference for the holder_dir kobject of the slave bdev; however,
    bdev kills holder_dir on removal, not release, so if the slave bdev is
    removed while holder links still exist, holder_dir is destroyed with
    the links still in place, which leads to an oops later when
    bd_unlink_disk_holder() tries to remove those links.

    Make bd_link_disk_holder() grab an extra reference for the slave's
    holder_dir and put it in bd_unlink_disk_holder().

    Signed-off-by: Tejun Heo
    Reported-by: "Hawrylewicz Czarnowski, Przemyslaw"
    Tested-by: "Hawrylewicz Czarnowski, Przemyslaw"
    Cc: Neil Brown
    Cc: Jens Axboe
    Signed-off-by: Linus Torvalds

    Tejun Heo
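
    A minimal sketch of the fix as described (illustrative; the surrounding
    symlink bookkeeping in fs/block_dev.c is omitted): take a reference on
    the slave's holder_dir when a link is made and drop it when the link is
    removed:

        #include <linux/blkdev.h>
        #include <linux/kobject.h>

        static void sketch_link_holder_dir(struct block_device *bdev)
        {
                kobject_get(bdev->bd_part->holder_dir);
        }

        static void sketch_unlink_holder_dir(struct block_device *bdev)
        {
                kobject_put(bdev->bd_part->holder_dir);
        }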
     

24 Feb, 2011

1 commit

  • There are two cases when we call flush_disk.
    In one, the device has disappeared (check_disk_change) so any
    data we hold becomes irrelevant.
    In the other, the device has changed size (check_disk_size_change)
    so data we hold may be irrelevant.

    In both cases it makes sense to discard any 'clean' buffers,
    so they will be read back from the device if needed.

    In the former case it makes sense to discard 'dirty' buffers
    as there will never be anywhere safe to write the data. In the
    second case it does *not* make sense to discard dirty buffers
    as that will lead to file system corruption when you simply enlarge
    the containing device.

    flush_disk calls __invalidate_device.
    __invalidate_device calls both invalidate_inodes and invalidate_bdev.

    invalidate_inodes *does* discard I_DIRTY inodes and this does lead
    to fs corruption.

    invalidate_bdev does *not* discard dirty pages, but I don't really care
    about that at present.

    So this patch adds a flag to __invalidate_device (calling it
    __invalidate_device2) to indicate whether dirty buffers should be
    killed, and this is passed to invalidate_inodes which can choose to
    skip dirty inodes.

    flush_disk is then passed true from check_disk_change and false from
    check_disk_size_change.

    dm avoids tripping over this problem by calling i_size_write directly
    rather than using check_disk_size_change.

    md does use check_disk_size_change and so is affected.

    This regression was introduced by commit 608aeef17a which causes
    check_disk_size_change to call flush_disk, so it is suitable for any
    kernel since 2.6.27.

    Cc: stable@kernel.org
    Acked-by: Jeff Moyer
    Cc: Andrew Patterson
    Cc: Jens Axboe
    Signed-off-by: NeilBrown

    NeilBrown
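
    A minimal sketch of the flag plumbing described above (illustrative;
    __invalidate_device2 is the name given in the commit text):

        #include <linux/fs.h>

        /* check_disk_change():      device is gone -> kill_dirty = true
         * check_disk_size_change(): size changed   -> kill_dirty = false */
        static void sketch_flush_disk(struct block_device *bdev, bool kill_dirty)
        {
                /* kill_dirty is forwarded to invalidate_inodes(), which can
                 * then skip I_DIRTY inodes instead of discarding them */
                __invalidate_device2(bdev, kill_dirty);
        }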
     

17 Feb, 2011

1 commit

  • This reverts commit 75f1dc0d076d ("block: check bdev_read_only() from
    blkdev_get()"). That commit added stricter checking to make sure
    devices that were being used read-only were actually opened in that
    mode.

    It turns out that the change breaks a bunch of kernel code that opens
    block devices. Affected systems include dm, md, and the loop device.
    Because strict checking for read-only opens of block devices was not
    done before this, the code that opens the devices was opening them
    read-write even if they were being used read-only. Auditing all that
    code will take time, and new userspace packages for dm, mdadm, etc.
    will also be required.

    Signed-off-by: Chuck Ebbert
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     

15 Jan, 2011

1 commit

  • Commit e09b457b (block: simplify holder symlink handling) incorrectly
    assumed that there is at most one link. dm may use multiple links and
    expects the block layer to track a reference count for each link,
    which is different from and unrelated to the exclusive device holder
    identified by @holder when the device is opened.

    Remove the single holder assumption and automatic removal of the link
    and revive the per-link reference count tracking. The code
    essentially behaves the same as before commit e09b457b, sans the
    unnecessary kobject reference count dancing.

    While at it, note that this facility should not be used by anyone
    other than the current users. Sysfs symlinks shouldn't be abused like
    this and the whole thing doesn't belong in the block layer at all.

    Signed-off-by: Tejun Heo
    Reported-by: Milan Broz
    Cc: Jun'ichi Nomura
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Cc: Kay Sievers
    Signed-off-by: Jens Axboe

    Tejun Heo
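
    A minimal sketch of the per-link tracking this change revives (field
    layout illustrative, not copied from the patch):

        #include <linux/list.h>
        #include <linux/genhd.h>

        struct sketch_holder_link {
                struct list_head list;   /* hangs off the slave block_device */
                struct gendisk   *disk;  /* the holding (master) disk */
                int              refcnt; /* one per bd_link_disk_holder() call */
        };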
     

14 Jan, 2011

1 commit

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

13 Jan, 2011

1 commit


07 Jan, 2011

1 commit

  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to the
    inability to reuse cache-hot slab objects. As iterations increase and RCU
    freeing starts kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (i.e. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
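
    A minimal sketch of the RCU freeing this introduces (simplified;
    inode_cachep stands in for the inode slab cache):

        #include <linux/fs.h>
        #include <linux/rcupdate.h>
        #include <linux/slab.h>

        static struct kmem_cache *inode_cachep;

        static void sketch_i_callback(struct rcu_head *head)
        {
                struct inode *inode = container_of(head, struct inode, i_rcu);

                kmem_cache_free(inode_cachep, inode);
        }

        static void sketch_destroy_inode(struct inode *inode)
        {
                /* freed only after a grace period, so lock-free walkers
                 * may still inspect the inode safely */
                call_rcu(&inode->i_rcu, sketch_i_callback);
        }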
     

17 Dec, 2010

1 commit

  • Currently, media presence polling for removable block devices is done
    from userland. There are several issues with this.

    * Polling is done by periodically opening the device. For SCSI
    devices, the command sequence generated by such action involves a
    few different commands including TEST_UNIT_READY. This behavior,
    while perfectly legal, is different from Windows, which only issues a
    single command, GET_EVENT_STATUS_NOTIFICATION. Unfortunately, some
    ATAPI devices lock up after being periodically queried with such
    command sequences.

    * There is no reliable and unintrusive way for a userland program to
    tell whether the target device is safe for media presence polling.
    For example, polling for media presence during an on-going burning
    session can make it fail. The polling program can avoid this by
    opening the device with O_EXCL but then it risks making a valid
    exclusive user of the device fail w/ -EBUSY.

    * Userland polling is unnecessarily heavy and in-kernel implementation
    is lighter and better coordinated (workqueue, timer slack).

    This patch implements framework for in-kernel disk event handling,
    which includes media presence polling.

    * bdops->check_events() is added, which supersedes ->media_changed().
    It should check whether there are any pending events and, if so, return
    them. Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE and
    DISK_EVENT_EJECT_REQUEST. ->check_events() is guaranteed not to be
    called in parallel.

    * gendisk->events and ->async_events are added. These should be
    initialized by block driver before passing the device to add_disk().
    The former contains the mask of all supported events and the latter
    the mask of all events which the device can report without polling.
    /sys/block/*/events[_async] export these to userland.

    * Kernel parameter block.events_dfl_poll_msecs controls the system
    polling interval (default is 0 which means disable) and
    /sys/block/*/events_poll_msecs control polling intervals for
    individual devices (default is -1 meaning use system setting). Note
    that if a device can report all supported events asynchronously and
    its polling interval isn't explicitly set, the device won't be
    polled regardless of the system polling interval.

    * If a device is opened exclusively with write access, event checking
    is automatically disabled until all write exclusive accesses are
    released.

    * There are operations which 'clear' events. For example, both of the
    currently defined events are cleared after the device has been
    successfully opened. This information is passed to the ->check_events()
    callback using the @clearing argument as a hint.

    * Event checking is always performed from system_nrt_wq and timer
    slack is set to 25% for polling.

    * Nothing changes for drivers which implement ->media_changed() but
    not ->check_events(). Going forward, all drivers will be converted
    to ->check_events() and ->media_changed() will be dropped.

    Signed-off-by: Tejun Heo
    Cc: Kay Sievers
    Cc: Jan Kara
    Signed-off-by: Jens Axboe

    Tejun Heo
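
    A minimal sketch of how a driver hooks into this framework (the mydrv_*
    names are hypothetical; the ->check_events() signature, the event masks
    and gendisk->events come from the commit text):

        #include <linux/blkdev.h>
        #include <linux/genhd.h>
        #include <linux/module.h>

        static bool mydrv_media_changed(void *drvdata);  /* hypothetical */

        static unsigned int mydrv_check_events(struct gendisk *disk,
                                               unsigned int clearing)
        {
                unsigned int pending = 0;

                if (mydrv_media_changed(disk->private_data))
                        pending |= DISK_EVENT_MEDIA_CHANGE;
                return pending;
        }

        static const struct block_device_operations mydrv_fops = {
                .owner          = THIS_MODULE,
                .check_events   = mydrv_check_events,
        };

        static void mydrv_init_disk(struct gendisk *disk)
        {
                /* advertise supported events before add_disk() */
                disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST;
                disk->fops = &mydrv_fops;
        }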
     

18 Nov, 2010

1 commit


13 Nov, 2010

5 commits

  • After recent blkdev_get() modifications, open_by_devnum() and
    open_bdev_exclusive() are simple wrappers around blkdev_get().
    Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

    blkdev_get_by_dev() is identical to open_by_devnum().
    blkdev_get_by_path() is slightly different in that it doesn't
    automatically add %FMODE_EXCL to @mode.

    All users are converted. Most conversions are mechanical and don't
    introduce any behavior difference. There are several exceptions.

    * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
    reason to OR it explicitly on blkdev_put().

    * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
    sb->s_mode.

    * With the above changes, sb->s_mode now always should contain
    FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect
    errors.

    The new blkdev_get_*() functions come with proper docbook comments.
    While at it, add a function description to blkdev_get() too.

    Signed-off-by: Tejun Heo
    Cc: Philipp Reisner
    Cc: Neil Brown
    Cc: Mike Snitzer
    Cc: Joern Engel
    Cc: Chris Mason
    Cc: Jan Kara
    Cc: "Theodore Ts'o"
    Cc: KONISHI Ryusuke
    Cc: reiserfs-devel@vger.kernel.org
    Cc: xfs-masters@oss.sgi.com
    Cc: Alexander Viro

    Tejun Heo
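
    A minimal usage sketch of the new wrappers (the path and holder cookie
    are illustrative); note that, unlike open_bdev_exclusive(), FMODE_EXCL
    has to be passed explicitly:

        #include <linux/blkdev.h>
        #include <linux/err.h>

        static int sketch_open_exclusive(void *holder)
        {
                struct block_device *bdev;

                bdev = blkdev_get_by_path("/dev/sdb1",
                                          FMODE_READ | FMODE_WRITE | FMODE_EXCL,
                                          holder);
                if (IS_ERR(bdev))
                        return PTR_ERR(bdev);

                /* ... use bdev ... */

                blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
                return 0;
        }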
     
  • bdev read-only status can be queried using bdev_read_only() and may
    change while the device is being opened. Enforce it by checking it
    from blkdev_get() after open succeeds.

    This makes bdev_read_only() check in open_bdev_exclusive() and
    fsg_lun_open() unnecessary. Drop them.

    Signed-off-by: Tejun Heo
    Cc: David Brownell
    Cc: linux-usb@vger.kernel.org

    Tejun Heo
     
  • With claim/release rolled into blkdev_get/put(), there's no reason to
    keep bd_abort/finish_claim(), __bd_claim() and bd_release() as
    separate functions. It only makes the code difficult to follow.
    Collapse them into blkdev_get/put(). This will ease future changes
    around claim/release.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Over time, block layer has accumulated a set of APIs dealing with bdev
    open, close, claim and release.

    * blkdev_get/put() are the primary open and close functions.

    * bd_claim/release() deal with exclusive open.

    * open/close_bdev_exclusive() are combination of open and claim and
    the other way around, respectively.

    * bd_link/unlink_disk_holder() to create and remove holder/slave
    symlinks.

    * open_by_devnum() wraps bdget() + blkdev_get().

    The interface is a bit confusing and the decoupling of open and claim
    makes it impossible to properly guarantee exclusive access, as an
    in-kernel open + claim sequence can disturb an existing exclusive
    open even before the block layer knows whether the current open is for
    another exclusive access. Reorganize the interface such that:

    * blkdev_get() is extended to include exclusive access management.
    A @holder argument is added and, if @FMODE_EXCL is specified, it will
    gain exclusive access atomically w.r.t. other exclusive accesses.

    * blkdev_put() is similarly extended. It now takes @mode argument and
    if @FMODE_EXCL is set, it releases an exclusive access. Also, when
    the last exclusive claim is released, the holder/slave symlinks are
    removed automatically.

    * bd_claim/release() and close_bdev_exclusive() are no longer
    necessary and either made static or removed.

    * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
    is no longer necessary and removed.

    * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
    and blkdev_get(). It also has an unexpected extra bdev_read_only()
    test which probably should be moved into blkdev_get().

    * open_by_devnum() is modified to take @holder argument and pass it to
    blkdev_get().

    Most bdev open/close operations are unified into blkdev_get/put()
    and most exclusive accesses are tested atomically at open time (as
    they should be). This cleans up the code and removes some corner
    cases, some valid, some invalid, but all of them unnecessary.

    open_bdev_exclusive() and open_by_devnum() can use further cleanup -
    rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
    special features. Well, let's leave them for another day.

    Most conversions are straightforward. The drbd conversion is a bit
    more involved as there was some reordering, but the logic should stay
    the same.

    Signed-off-by: Tejun Heo
    Acked-by: Neil Brown
    Acked-by: Ryusuke Konishi
    Acked-by: Mike Snitzer
    Acked-by: Philipp Reisner
    Cc: Peter Osterlund
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: "Theodore Ts'o"
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Cc: dm-devel@redhat.com
    Cc: drbd-dev@lists.linbit.com
    Cc: Leo Chen
    Cc: Scott Branden
    Cc: Chris Mason
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: reiserfs-devel@vger.kernel.org
    Cc: Alexander Viro

    Tejun Heo
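
    A minimal sketch of the reorganised core calls (the claim cookie is
    illustrative): the exclusive claim is taken atomically with the open
    and released by blkdev_put() with FMODE_EXCL:

        #include <linux/blkdev.h>

        static int sketch_claim_and_open(struct block_device *bdev, void *cookie)
        {
                int ret;

                ret = blkdev_get(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL,
                                 cookie);
                if (ret)
                        return ret;

                /* exclusive access is guaranteed from here on */

                blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
                return 0;
        }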
     
  • The code to manage symlinks in /sys/block/*/{holders|slaves} is overly
    complex, with multiple holder considerations, redundant extra
    references to all involved kobjects, unused generic kobject holder
    support and unnecessary mixup with bd_claim/release functionalities.

    Strip it down to what's necessary (single gendisk holder) and make it
    use a separate interface. This is a step for cleaning up
    bd_claim/release. This patch makes dm-table slightly more complex but
    it will be simplified again with further changes.

    Signed-off-by: Tejun Heo
    Acked-by: Neil Brown
    Acked-by: Mike Snitzer
    Cc: dm-devel@redhat.com

    Tejun Heo
     

29 Oct, 2010

1 commit


26 Oct, 2010

3 commits

  • The use of the same inode list structure (inode->i_list) for two
    different list constructs with different lifecycles and purposes
    makes it impossible to separate the locking of the different
    operations. Therefore, to enable the separation of the locking of
    the writeback and reclaim lists, split the inode->i_list into two
    separate lists dedicated to their specific tracking functions.

    Signed-off-by: Nick Piggin
    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Nick Piggin
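
    A minimal sketch of the resulting layout (field names as used later in
    this series; the struct is trimmed to the relevant members):

        #include <linux/list.h>

        struct sketch_inode_lists {
                struct list_head i_wb_list; /* writeback (dirty) list linkage */
                struct list_head i_lru;     /* unused-inode LRU linkage */
        };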
     
  • bdev inodes can remain dirty even after their last close. Hence the
    BDI associated with the bdev->inode gets modified during the last
    close to point to the default BDI. However, the bdev inode still
    needs to be moved to the dirty lists of the new BDI, otherwise it
    will corrupt the writeback list it was left on.

    Add a new function bdev_inode_switch_bdi() to move all the bdi state
    from the old bdi to the new one safely. This is only a temporary
    measure until the bdev inode/bdi lifecycle problems are sorted
    out.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Dave Chinner
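
    A minimal sketch of the helper as described (simplified; at this point
    in the series the global inode_lock still exists, as the lock split
    shown earlier in this log came later in time):

        #include <linux/fs.h>
        #include <linux/writeback.h>
        #include <linux/backing-dev.h>

        static void sketch_bdev_inode_switch_bdi(struct inode *inode,
                                                 struct backing_dev_info *dst)
        {
                spin_lock(&inode_lock);
                inode->i_data.backing_dev_info = dst;
                if (inode->i_state & I_DIRTY)
                        list_move(&inode->i_wb_list, &dst->wb.b_dirty);
                spin_unlock(&inode_lock);
        }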
     
  • Clones an existing reference to an inode; the caller must already hold one.

    Signed-off-by: Al Viro

    Al Viro
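
    This reads like the introduction of ihold(); a minimal sketch of such a
    helper, assuming i_count is the atomic reference count:

        #include <linux/fs.h>

        static void sketch_ihold(struct inode *inode)
        {
                /* caller already holds a reference, so the count must be
                 * at least 2 after the increment */
                WARN_ON(atomic_inc_return(&inode->i_count) < 2);
        }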
     

17 Sep, 2010

1 commit

  • All the blkdev_issue_* helpers can only sanely be used by synchronous
    callers. To issue cache flushes or barriers asynchronously the caller needs
    to set up a bio by itself with a completion callback to move the asynchronous
    state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
    specified when calling blkdev_issue_* and also remove the now unused flags
    argument to blkdev_issue_flush and blkdev_issue_zeroout. For
    blkdev_issue_discard we need to keep it for the secure discard flag, which
    gains a more descriptive name and loses the bitops vs flag confusion.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
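
    A minimal sketch of the simplified calling convention (sector ranges
    are illustrative; BLKDEV_DISCARD_SECURE is assumed to be the more
    descriptive name the secure-discard flag ends up with):

        #include <linux/blkdev.h>

        static int sketch_issue_helpers(struct block_device *bdev)
        {
                int ret;

                ret = blkdev_issue_flush(bdev, GFP_KERNEL, NULL);
                if (ret)
                        return ret;

                ret = blkdev_issue_zeroout(bdev, 0, 8, GFP_KERNEL);
                if (ret)
                        return ret;

                return blkdev_issue_discard(bdev, 0, 8, GFP_KERNEL,
                                            BLKDEV_DISCARD_SECURE);
        }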
     

11 Aug, 2010

3 commits

  • The cgroup device whitelist code gets confused when trying to grant
    permission to a disk partition that is not currently open. Part of
    blkdev_open() includes __blkdev_get() on the whole disk.

    Basically, the only ways to reliably allow a cgroup access to a partition
    on a block device when using the whitelist are to 1) also give it access
    to the whole block device or 2) make sure the partition is already open in
    a different context.

    The patch avoids the cgroup check for the whole disk case when opening a
    partition.

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=589662

    Signed-off-by: Chris Wright
    Acked-by: Serge E. Hallyn
    Tested-by: Serge E. Hallyn
    Reported-by: Vivek Goyal
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: "Daniel P. Berrange"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wright
     
  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

3 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Move the call to vmtruncate to get rid of excess blocks to the callers
    in preparation for the new truncate sequence, and rename the non-truncating
    version to block_write_begin.

    While we're at it, also remove several unused arguments to block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
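
    A minimal sketch of a caller after this change (the myfs_* names are
    hypothetical): block_write_begin() no longer truncates, so the
    filesystem trims blocks beyond i_size itself on failure:

        #include <linux/fs.h>
        #include <linux/buffer_head.h>
        #include <linux/mm.h>

        static int myfs_get_block(struct inode *inode, sector_t iblock,
                                  struct buffer_head *bh_result, int create);

        static int myfs_write_begin(struct file *file, struct address_space *mapping,
                                    loff_t pos, unsigned len, unsigned flags,
                                    struct page **pagep, void **fsdata)
        {
                struct inode *inode = mapping->host;
                int ret;

                ret = block_write_begin(mapping, pos, len, flags, pagep,
                                        myfs_get_block);
                if (ret && pos + len > inode->i_size)
                        vmtruncate(inode, inode->i_size);
                return ret;
        }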
     
  • Move the call to vmtruncate to get rid of excess blocks to the callers
    in preparation for the new truncate calling sequence. This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it, as just open-coding the two additional
    parameters is shorter than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Aug, 2010

1 commit

  • The open and release block_device_operations are currently
    called with the BKL held. In order to change that, we must
    first make sure that all drivers that currently rely
    on this have no regressions.

    This blindly pushes the BKL into all .open and .release
    operations for all block drivers to prepare for the
    next step. The drivers can subsequently replace the BKL
    with their own locks or remove it completely when it can
    be shown that it is not needed.

    The functions blkdev_get and blkdev_put are the only
    remaining users of the big kernel lock in the block
    layer, besides a few uses in the ioctl code, none
    of which need to serialize with blkdev_{get,put}.

    Most of the code in these two functions also runs under the
    protection of bdev->bd_mutex, including the actual calls to
    ->open and ->release, and the common code does not
    access any global data structures that need the BKL.

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
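
    A minimal sketch of the mechanical pushdown (the mydrv_* names are
    placeholders for a real driver):

        #include <linux/smp_lock.h>
        #include <linux/blkdev.h>

        static int mydrv_do_open(struct block_device *bdev, fmode_t mode);

        static int mydrv_open(struct block_device *bdev, fmode_t mode)
        {
                int ret;

                /* the BKL is now taken by the driver itself rather than
                 * by the blkdev_get()/blkdev_put() callers */
                lock_kernel();
                ret = mydrv_do_open(bdev, mode);
                unlock_kernel();
                return ret;
        }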
     

05 Aug, 2010

1 commit

  • bd_prepare_to_claim() incorrectly allowed multiple attempts for
    exclusive open to progress in parallel if the attempting holders are
    identical. This triggered BUG_ON() as reported in the following bug.

    https://bugzilla.kernel.org/show_bug.cgi?id=16393

    __bd_abort_claiming() is used to finish claiming blocks and doesn't
    work if multiple openers are inside a claiming block. Allowing
    multiple parallel open attempts to continue doesn't gain anything as
    those are serialized down in the call chain anyway. Fix it by always
    allowing only a single open attempt in a claiming block.

    This problem can easily be reproduced by adding a delay after
    bd_prepare_to_claim() and attempting to mount two partitions of a
    disk.

    stable: only applicable to v2.6.35

    Signed-off-by: Tejun Heo
    Reported-by: Markus Trippelsdorf
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

11 Jun, 2010

1 commit