08 Aug, 2010

4 commits

  • The blkpg_ioctl and blkdev_reread_part access fields of
    the bdev and gendisk structures, yet they always do so
    under the protection of bdev->bd_mutex, which seems
    sufficient.

    Signed-off-by: Arnd Bergmann
    cked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • We only call the functions set_device_ro(),
    invalidate_bdev(), sync_filesystem() and sync_blockdev()
    while holding the BKL in these commands. All
    of these are also done in other code paths without
    the BKL, which leads me to the conclusion that
    the BKL is not needed here either.

    The reason we hold it here is that it was originally
    pushed down into the ioctl function from vfs_ioctl.

    Signed-off-by: Arnd Bergmann
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • The blktrace driver currently needs the BKL, but
    we should not need to take that in the block layer,
    so just push it down into the driver itself.

    It is quite likely that the BKL is not actually
    required in blktrace code and could be removed
    in a follow-on patch.

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • As a preparation for the removal of the big kernel
    lock in the block layer, this removes the BKL
    from the common ioctl handling code, moving it
    into every single driver still using it.

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     

29 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

03 Dec, 2009

1 commit

  • The discard ioctl is used by mkfs utilities to clear a block device
    prior to putting metadata down. However, not all devices return zeroed
    blocks after a discard. Some drives return stale data, potentially
    containing old superblocks. It is therefore important to know whether
    discarded blocks are properly zeroed.

    Both ATA and SCSI drives have configuration bits that indicate whether
    zeroes are returned after a discard operation. Implement a block level
    interface that allows this information to be bubbled up the stack and
    queried via a new block device ioctl.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

04 Oct, 2009

1 commit

  • Not all users of the topology information want to use libblkid. Provide
    the topology information through bdev ioctls.

    Also clarify sector size comments for existing BLK ioctls.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

14 Sep, 2009

1 commit

  • blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
    the only difference between the two is that blkdev_issue_discard needs to
    send a barrier discard request and blk_ioctl_discard a non-barrier one,
    and blk_ioctl_discard needs to wait on the request. To facilitates this
    add a flags argument to blkdev_issue_discard to control both aspects of the
    behaviour. This will be very useful later on for using the waiting
    funcitonality for other callers.

    Based on an earlier patch from Matthew Wilcox .

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

23 May, 2009

2 commits

  • Convert all external users of queue limits to using wrapper functions
    instead of poking the request queue variables directly.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Until now we have had a 1:1 mapping between storage device physical
    block size and the logical block sized used when addressing the device.
    With SATA 4KB drives coming out that will no longer be the case. The
    sector size will be 4KB but the logical block size will remain
    512-bytes. Hence we need to distinguish between the physical block size
    and the logical ditto.

    This patch renames hardsect_size to logical_block_size.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

15 Apr, 2009

1 commit


29 Dec, 2008

1 commit


18 Nov, 2008

1 commit


21 Oct, 2008

7 commits


09 Oct, 2008

9 commits

  • disk->__part used to be statically allocated to the maximum possible
    number of partitions. This patch makes partition array allocation
    dynamic. The added overhead is minimal as only real change is one
    memory dereference changed to RCU one. This saves both a bit of
    memory and cpu cycles iterating through unoccupied slots and makes
    increasing partition limit easier.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • genhd and partition code handled disk and partitions separately. All
    information about the whole disk was in struct genhd and partitions in
    struct hd_struct. However, the whole disk (part0) and other
    partitions have a lot in common and the data structures end up having
    good number of common fields and thus separate code paths doing the
    same thing. Also, the partition array was indexed by partno - 1 which
    gets pretty confusing at times.

    This patch introduces partition 0 and makes the partition array
    indexed by partno. Following patches will unify the handling of disk
    and parts piece-by-piece.

    This patch also implements disk_partitionable() which tests whether a
    disk is partitionable. With coming dynamic partition array change,
    the most common usage of disk_max_parts() will be testing whether a
    disk is partitionable and the number of max partitions will become
    much less important.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • disk->part[] is protected by its matching bdev's lock. However,
    non-critical accesses like collecting stats and printing out sysfs and
    proc information used to be performed without any locking. As
    partitions can come and go dynamically, partitions can go away
    underneath those non-critical accesses. As some of those accesses are
    writes, this theoretically can lead to silent corruption.

    This patch fixes the race by using RCU for the partition array and dev
    reference counter to hold partitions.

    * Rename disk->part[] to disk->__part[] to make sure no one outside
    genhd layer proper accesses it directly.

    * Use RCU for disk->__part[] dereferencing.

    * Implement disk_{get|put}_part() which can be used to get and put
    partitions from gendisk respectively.

    * Iterators are implemented to help iterate through all partitions
    safely.

    * Functions which require RCU readlock are marked with _rcu suffix.

    * Use disk_put_part() in __blkdev_put() instead of directly putting
    the contained kobject.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * Implement disk_devt() and part_devt() and use them to directly
    access devt instead of computing it from ->major and ->first_minor.

    Note that all references to ->major and ->first_minor outside of
    block layer is used to determine devt of the disk (the part0) and as
    ->major and ->first_minor will continue to represent devt for the
    disk, converting these users aren't strictly necessary. However,
    convert them for consistency.

    * Implement disk_max_parts() to avoid directly deferencing
    genhd->minors.

    * Update bdget_disk() such that it doesn't assume consecutive minor
    space.

    * Move devt computation from register_disk() to add_disk() and make it
    the only one (all other usages use the initially determined value).

    These changes clean up the code and will help disk->part dereference
    fix and extended block device numbers.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • In hd_struct, @partno is used to denote partition number and a number
    of other places use @part to denote hd_struct. Functions use @part
    and @index instead. This causes confusion and makes it difficult to
    use consistent variable names for hd_struct. Always use @partno if a
    variable represents partition number.

    Also, print out functions use @f or @part for seq_file argument. Use
    @seqf uniformly instead.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • d805dda4 tried to fix error case handling in add_partition() but had a
    few problems.

    * disk->part[] entry is set early and left dangling if operation
    fails.

    * Once device initialized, the last put_device() is responsible for
    freeing all the resources. The failure path freed part_stats and p
    regardless of put_device() causing double free.

    * holders subdir holds reference to the disk device, so failure path
    should remove it to release resources properly which was missing.

    This patch fixes the above problems and while at it move partition
    slot busy check into add_partition() for completeness and inlines
    holders subdirectory creation. Using separate function for it just
    obfuscates the code.

    Signed-off-by: Tejun Heo
    Cc: Abdel Benamrouche
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • delete_partition() was noop for zero length partition. As the
    addition code allows creating zero lenght partition and deletion is
    assumed to always succeed, this causes memory leak for zero length
    partitions. Allow zero length partitions to end their meaningless
    lives.

    While at it, allow deleting zero lenght partition via
    BLKPG_DEL_PARTITION ioctl too.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • But blkdev_issue_discard() still emits requests which are interpreted as
    soft barriers, because naïve callers might otherwise issue subsequent
    writes to those same sectors, which might cross on the queue (if they're
    reallocated quickly enough).

    Callers still _can_ issue non-barrier discard requests, but they have to
    take care of queue ordering for themselves.

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

    David Woodhouse
     
  • We may well want mkfs tools to use this to mark the whole device as
    unwanted before they format it, for example.

    The ioctl takes a pair of uint64_ts, which are start offset and length
    in _bytes_. Although at the moment it might make sense for them both to
    be in 512-byte sectors, I don't want to limit the ABI to that.

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

    David Woodhouse
     

26 Jul, 2008

1 commit


10 Oct, 2007

1 commit


08 May, 2007

1 commit

  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
    quilt add $file;
    sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

21 Feb, 2007

1 commit

  • >=============================================
    >[ INFO: possible recursive locking detected ]
    >2.6.19-1.2909.fc7 #1
    >---------------------------------------------
    >anaconda/587 is trying to acquire lock:
    > (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >but task is already holding lock:
    > (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >other info that might help us debug this:
    >1 lock held by anaconda/587:
    > #0: (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >stack backtrace:
    > [] show_trace_log_lvl+0x1a/0x2f
    > [] show_trace+0x12/0x14
    > [] dump_stack+0x16/0x18
    > [] __lock_acquire+0x116/0xa09
    > [] lock_acquire+0x56/0x6f
    > [] __mutex_lock_slowpath+0xe5/0x24a
    > [] mutex_lock+0x21/0x24
    > [] blkdev_ioctl+0x600/0x76d
    > [] block_ioctl+0x1b/0x1f
    > [] do_ioctl+0x22/0x68
    > [] vfs_ioctl+0x252/0x265
    > [] sys_ioctl+0x49/0x63
    > [] syscall_call+0x7/0xb

    Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment
    clarifying the bd_mutex locking, because I confused myself and initially
    thought the lock order was wrong too.

    Signed-off-by: Peter Zijlstra
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

11 Feb, 2007

1 commit

  • Some partitioning systems create special partitions that
    span the entire disk. One example are Sun partitions, and
    this whole-disk partition exists to tell the firmware the
    extent of the entire device so it can load the boot block
    and do other things.

    Such partitions should not be treated as normal partitions,
    because all the other partitions overlap this whole-disk one.
    So we'd see multiple instances of the same UUID etc. which
    we do not want. udev and friends can thus search for this
    'whole_disk' attribute and use it to decide to ignore the
    partition.

    Signed-off-by: Fabio Massimo Di Nitto
    Signed-off-by: David S. Miller

    Fabio Massimo Di Nitto
     

09 Dec, 2006

2 commits


03 Oct, 2006

1 commit

  • Export blkdev_driver_ioctl for device-mapper.

    If we get as far as the device-mapper ioctl handler, we know the ioctl is not
    a standard block layer BLK* one, so we don't need to check for them a second
    time and can call blkdev_driver_ioctl() directly.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     

15 Jul, 2006

1 commit


24 Mar, 2006

1 commit