20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

17 Jul, 2007

3 commits

  • Fix this:

    fs/block_dev.c: In function 'bd_claim_by_disk':
    fs/block_dev.c:970: warning: 'found' may be used uninitialized in this function

    and given that free_bd_holder() now needs free(NULL)-is-legal behaviour, we
    can simplify bd_release_from_kobject().

    Cc: Bjorn Steinbrink
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Replace some funky codepaths in fs/block_dev.c with cleaner versions of the
    affected places.

    [akpm@linux-foundation.org: fix return value]
    Signed-off-by: Johannes Weiner
    Cc: Bjorn Steinbrink
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • fs/block_dev.c: Use list_for_each_entry() instead of list_for_each()
    in nr_blockdev_pages()

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     

10 Jul, 2007

1 commit


17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 May, 2007

1 commit


08 May, 2007

3 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code object
    manipulation of the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove duplicate work in kill_bdev().

    It currently invalidates and then truncates the bdev's mapping.
    invalidate_mapping_pages() will opportunistically remove pages from the
    mapping. And truncate_inode_pages() will forcefully remove all pages.

    The only thing truncate doesn't do is flush the bh lrus. So do that
    explicitly. This avoids (very unlikely) but possible invalid lookup
    results if the same bdev is quickly re-issued.

    It also will prevent extreme kernel latencies which are observed when
    blockdevs which have a large amount of pagecache are unmounted, by avoiding
    invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no
    cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
    has one.

    [akpm@linux-foundation.org: restore nrpages==0 optimisation]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
    quilt add $file;
    sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

21 Feb, 2007

1 commit

  • >=============================================
    >[ INFO: possible recursive locking detected ]
    >2.6.19-1.2909.fc7 #1
    >---------------------------------------------
    >anaconda/587 is trying to acquire lock:
    > (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >but task is already holding lock:
    > (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >other info that might help us debug this:
    >1 lock held by anaconda/587:
    > #0: (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >stack backtrace:
    > [] show_trace_log_lvl+0x1a/0x2f
    > [] show_trace+0x12/0x14
    > [] dump_stack+0x16/0x18
    > [] __lock_acquire+0x116/0xa09
    > [] lock_acquire+0x56/0x6f
    > [] __mutex_lock_slowpath+0xe5/0x24a
    > [] mutex_lock+0x21/0x24
    > [] blkdev_ioctl+0x600/0x76d
    > [] block_ioctl+0x1b/0x1f
    > [] do_ioctl+0x22/0x68
    > [] vfs_ioctl+0x252/0x265
    > [] sys_ioctl+0x49/0x63
    > [] syscall_call+0x7/0xb

    Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment
    clarifying the bd_mutex locking, because I confused myself and initially
    thought the lock order was wrong too.

    Signed-off-by: Peter Zijlstra
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

13 Feb, 2007

1 commit


04 Feb, 2007

1 commit

  • Andrew Vasquez is reporting as-iosched oopses and a 65% throughput
    slowdown due to the recent special-casing of direct-io against
    blockdevs. We don't know why either of these things are occurring.

    The patch minimally reverts us back to the 2.6.19 code for a 2.6.20
    release.

    Cc: Andrew Vasquez
    Cc: Ken Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

23 Jan, 2007

2 commits

  • For large size DIO that needs multiple bio, one full page worth of data was
    lost at the boundary of bio's maximum sector or segment limits. After a
    bio is full and got submitted. The outer while (nbytes) { ... } loop will
    allocate a new bio and just march on to index into next page. It just
    forgets about the page that bio_add_page() rejected when previous bio is
    full. Fix it by put the rejected page back to pvec so we pick it up again
    for the next bio.

    Signed-off-by: Ken Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • size_t is unsigned. IO errors aren't getting through.

    Cc: "Chen, Kenneth W"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

12 Jan, 2007

1 commit

  • Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
    xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

    (XFS unlocks the semaphore from a different task, by design. The mutex
    code warns about this)

    Signed-off-by: Dave Chinner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

14 Dec, 2006

1 commit

  • Implement block device specific .direct_IO method instead of going through
    generic direct_io_worker for block device.

    direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
    file system, where it needs to perform block allocation, hole detection,
    extents file on write, and tons of other corner cases. The end result is
    that it takes tons of CPU time to submit an I/O.

    For block device, the block allocation is much simpler and a tight triple
    loop can be written to iterate each iovec and each page within the iovec in
    order to construct/prepare bio structure and then subsequently submit it to
    the block layer. This significantly speeds up O_D on block device.

    [akpm@osdl.org: small speedup]
    Signed-off-by: Ken Chen
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     

09 Dec, 2006

7 commits

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     
  • Don't leak a ->bd_part_count when the partition open fails with -ENXIO.

    Signed-off-by: Peter Zijlstra
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Now that the nesting in blkdev_{get,put} is simpler, adding mutex_lock_nested
    is trivial.

    Cc: Ingo Molnar
    Acked-by: Peter Zijlstra
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • When we open (actually blkdev_get) a partition we need to also open (get) the
    whole device that holds the partition. The involves some limited recursion.
    This patch tries to simplify some aspects of this.

    As well as opening the whole device, we need to increment ->bd_part_count when
    a partition is opened (this is used by rescan_partitions to avoid a rescan if
    any partition is active, as that would be confusing).

    The main change this patch makes is to move the inc/dec of bd_part_count into
    blkdev_{get,put} for the whole rather than doing it in blkdev_{get,put} for
    the partition.

    More specifically, we introduce __blkdev_get and __blkdev_put which do exactly
    what blkdev_{get,put} did, only with an extra "for_part" argument
    (blkget_{get,put} then call the __ version with a '0' for the extra argument).

    If for_part is 1, then the blkdev is being get(put) because a partition is
    being opened(closed) for the first(last) time, and so bd_part_count should be
    updated (on success). The particular advantage of pushing this function down
    is that the bd_mutex lock (which is needed to update bd_part_count) is already
    held at the lower level.

    Note that this slightly changes the semantics of bd_part_count. Instead of
    updating it whenever a partition is opened or released, it is now only updated
    on the first open or last release. This is an adequate semantic as it is only
    ever tested for "== 0".

    Having introduced these functions we remove the current bd_part_count updates
    from do_open (which is really the body of blkdev_get) and call
    __blkdev_get(... 1). Similarly in blkget_put we remove the old bd_part_count
    updates and call __blkget_put(..., 1). This call is moved to the end of
    __blkdev_put to avoid nested locks of bd_mutex.

    Finally the mutex_lock on whole->bd_mutex in do_open can be removed. It was
    only really needed to protect bd_part_count, and that is now managed (and
    protected) within the recursive call.

    The observation that bd_part_count is central to the locking issues, and the
    modifications to create __blkdev_put are from Peter Zijlstra.

    Cc: Ingo Molnar
    Acked-by: Peter Zijlstra
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The extra call to get_gendisk is not good. It causes a ->probe and possible
    module load before it is really appropriate to do this.

    Cc: Ingo Molnar
    Acked-by: Peter Zijlstra
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Use the gendisk partition number to set a lock class.

    Signed-off-by: Peter Zijlstra
    Cc: Neil Brown
    Cc: Ingo Molnar
    Acked-by: Arjan van de Ven
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Remove the old complex and crufty bd_mutex annotation.

    Signed-off-by: Peter Zijlstra
    Cc: Neil Brown
    Cc: Ingo Molnar
    Cc: Arjan van de Ven
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

08 Dec, 2006

2 commits

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Nov, 2006

1 commit

  • fs/block_dev.c: In function 'find_bd_holder':
    fs/block_dev.c:666: warning: return makes integer from pointer without a cast
    fs/block_dev.c:669: warning: return makes integer from pointer without a cast
    fs/block_dev.c: In function 'add_bd_holder':
    fs/block_dev.c:685: warning: unused variable 'tmp'
    fs/block_dev.c: In function 'bd_claim_by_kobject':
    fs/block_dev.c:773: warning: assignment makes pointer from integer without a cast

    Acked-by: Jun'ichi Nomura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

31 Oct, 2006

2 commits

  • add_bd_holder() is called from bd_claim_by_kobject to put a given struct
    bd_holder in the list if there is no matching entry.

    There are 3 possible results of add_bd_holder():
    1. there is no matching entry and add the given one to the list
    2. there is matching entry, so just increment reference count of
    the existing one
    3. something failed during its course

    1 and 2 are successful cases. But for case 2, someone has to free the
    unused struct bd_holder.

    The current code frees it inside of add_bd_holder and returns same value
    0 for both cases 1 and 2. However, it's natural and less error-prone if
    caller frees it since it's allocated by the caller.

    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
     
  • This fixes bd_claim_by_kobject to release bdev correctly in case that
    bd_claim succeeds but following add_bd_holder fails.

    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
     

29 Oct, 2006

1 commit


01 Oct, 2006

6 commits


30 Sep, 2006

2 commits

  • In the case below we are locking the whole disk not a partition. This
    change simply brings the code in line with the piece above where when we
    are the 'first' opener, and we are a partition.

    Signed-off-by: Jason Baron
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     
  • Check driver layer errors.

    Fix from: "Jun'ichi Nomura"

    In blockdevc-check-errors.patch, add_bd_holder() is modified to return error
    values when some of its operation failed. Among them, it returns -EEXIST when
    a given bd_holder object already exists in the list.

    However, in this case, the function completed its work successfully and need
    no action by its caller other than freeing unused bd_holder object. So I
    think it's better to return success after freeing by itself.

    Otherwise, bd_claim-ing with same claim pointer will fail.
    Typically, lvresize will fails with following message:
    device-mapper: reload ioctl failed: Invalid argument
    and you'll see messages like below in kernel log:
    device-mapper: table: 254:13: linear: dm-linear: Device lookup failed
    device-mapper: ioctl: error adding target to table

    Similarly, it should not add bd_holder to the list if either one of symlinking
    fails. I don't have a test case for this to happen but it should cause
    dereference of freed pointer.

    If a matching bd_holder is found in bd_holder_list, add_bd_holder() completes
    its job by just incrementing the reference count. In this case, it should be
    considered as success but it used to return 'fail' to let the caller free
    temporary bd_holder. Fixed it to return success and free given object by
    itself.

    Also, if either one of symlinking fails, the bd_holder should not be added to
    the list so that it can be discarded later. Otherwise, the caller will free
    bd_holder which is in the list.

    Signed-off-by: Jun'ichi Nomura
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

28 Aug, 2006

1 commit

  • On Wed, 2006-08-09 at 07:57 +0200, Rolf Eike Beer wrote:
    > =============================================
    > [ INFO: possible recursive locking detected ]
    > ---------------------------------------------
    > parted/7929 is trying to acquire lock:
    > (&bdev->bd_mutex){--..}, at: [] __blkdev_put+0x1e/0x13c
    >
    > but task is already holding lock:
    > (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
    >
    > other info that might help us debug this:
    > 1 lock held by parted/7929:
    > #0: (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
    > stack backtrace:
    > [] show_trace_log_lvl+0x58/0x15b
    > [] show_trace+0xd/0x10
    > [] dump_stack+0x17/0x1a
    > [] __lock_acquire+0x753/0x99c
    > [] lock_acquire+0x4a/0x6a
    > [] mutex_lock_nested+0xc8/0x20c
    > [] __blkdev_put+0x1e/0x13c
    > [] blkdev_put+0xa/0xc
    > [] do_open+0x336/0x3a8
    > [] blkdev_open+0x1f/0x4c
    > [] __dentry_open+0xc7/0x1aa
    > [] nameidata_to_filp+0x1c/0x2e
    > [] do_filp_open+0x2e/0x35
    > [] do_sys_open+0x38/0x68
    > [] sys_open+0x16/0x18
    > [] sysenter_past_esp+0x56/0x8d

    OK, I'm having a look here; its all new to me so bear with me.

    blkdev_open() calls
    do_open(bdev, ...,BD_MUTEX_NORMAL) and takes
    mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_NORMAL)

    then something fails, and we're thrown to:

    out_first: where
    if (bdev != bdev->bd_contains)
    blkdev_put(bdev->bd_contains) which is
    __blkdev_put(bdev->bd_contains, BD_MUTEX_NORMAL) which does
    mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_NORMAL) bd_contains is either bdev or whole, and
    since we take the branch it must be whole. So it seems to me the
    following patch would be the right one:

    [akpm@osdl.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Cc: Arjan van de Ven
    Acked-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

04 Jul, 2006

1 commit

  • Teach special (recursive) locking code to the lock validator.

    Effects on non-lockdep kernels:

    - the introduction of the following function variants:

    extern struct block_device *open_partition_by_devnum(dev_t, unsigned);

    extern int blkdev_put_partition(struct block_device *);

    static int
    blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);

    which on non-lockdep are the same as open_by_devnum(), blkdev_put()
    and blkdev_get().

    - a subclass parameter to do_open(). [unused on non-lockdep]

    - a subclass parameter to __blkdev_put(), which is a new internal
    function for the main blkdev_put*() functions. [parameter unused
    on non-lockdep kernels, except for two sanity check WARN_ON()s]

    these functions carry no semantical difference - they only express
    object dependencies towards the lockdep subsystem.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar