14 Jan, 2011

1 commit

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

13 Jan, 2011

1 commit


07 Jan, 2011

7 commits

  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downsides of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_hash so it may be called from lock-free RCU lookups. See similar
    patch for d_compare for details.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_compare so it may be called from lock-free RCU lookups. This
    does put significant restrictions on what may be done from the callback,
    however there don't seem to have been any problems with in-tree fses.
    If some strange use case pops up that _really_ cannot cope with the
    rcu-walk rules, we can just add new rcu-unaware callbacks, which would
    cause name lookup to drop out of rcu-walk mode.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Use vfat's method for dealing with negative dentries to preserve case,
    rather than overwrite dentry name in d_revalidate, which is a bit ugly
    and also gets in the way of doing lock-free path walking.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

13 Nov, 2010

2 commits

  • After recent blkdev_get() modifications, open_by_devnum() and
    open_bdev_exclusive() are simple wrappers around blkdev_get().
    Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

    blkdev_get_by_dev() is identical to open_by_devnum().
    blkdev_get_by_path() is slightly different in that it doesn't
    automatically add %FMODE_EXCL to @mode.

    All users are converted. Most conversions are mechanical and don't
    introduce any behavior difference. There are several exceptions.

    * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
    reason to OR it explicitly on blkdev_put().

    * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
    sb->s_mode.

    * With the above changes, sb->s_mode now always should contain
    FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect
    errors.

    The new blkdev_get_*() functions are with proper docbook comments.
    While at it, add function description to blkdev_get() too.

    Signed-off-by: Tejun Heo
    Cc: Philipp Reisner
    Cc: Neil Brown
    Cc: Mike Snitzer
    Cc: Joern Engel
    Cc: Chris Mason
    Cc: Jan Kara
    Cc: "Theodore Ts'o"
    Cc: KONISHI Ryusuke
    Cc: reiserfs-devel@vger.kernel.org
    Cc: xfs-masters@oss.sgi.com
    Cc: Alexander Viro

    Tejun Heo
     
  • Over time, block layer has accumulated a set of APIs dealing with bdev
    open, close, claim and release.

    * blkdev_get/put() are the primary open and close functions.

    * bd_claim/release() deal with exclusive open.

    * open/close_bdev_exclusive() are combination of open and claim and
    the other way around, respectively.

    * bd_link/unlink_disk_holder() to create and remove holder/slave
    symlinks.

    * open_by_devnum() wraps bdget() + blkdev_get().

    The interface is a bit confusing and the decoupling of open and claim
    makes it impossible to properly guarantee exclusive access as
    in-kernel open + claim sequence can disturb the existing exclusive
    open even before the block layer knows the current open if for another
    exclusive access. Reorganize the interface such that,

    * blkdev_get() is extended to include exclusive access management.
    @holder argument is added and, if is @FMODE_EXCL specified, it will
    gain exclusive access atomically w.r.t. other exclusive accesses.

    * blkdev_put() is similarly extended. It now takes @mode argument and
    if @FMODE_EXCL is set, it releases an exclusive access. Also, when
    the last exclusive claim is released, the holder/slave symlinks are
    removed automatically.

    * bd_claim/release() and close_bdev_exclusive() are no longer
    necessary and either made static or removed.

    * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
    is no longer necessary and removed.

    * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
    and blkdev_get(). It also has an unexpected extra bdev_read_only()
    test which probably should be moved into blkdev_get().

    * open_by_devnum() is modified to take @holder argument and pass it to
    blkdev_get().

    Most of bdev open/close operations are unified into blkdev_get/put()
    and most exclusive accesses are tested atomically at the open time (as
    it should). This cleans up code and removes some, both valid and
    invalid, but unnecessary all the same, corner cases.

    open_bdev_exclusive() and open_by_devnum() can use further cleanup -
    rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
    special features. Well, let's leave them for another day.

    Most conversions are straight-forward. drbd conversion is a bit more
    involved as there was some reordering, but the logic should stay the
    same.

    Signed-off-by: Tejun Heo
    Acked-by: Neil Brown
    Acked-by: Ryusuke Konishi
    Acked-by: Mike Snitzer
    Acked-by: Philipp Reisner
    Cc: Peter Osterlund
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: "Theodore Ts'o"
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Cc: dm-devel@redhat.com
    Cc: drbd-dev@lists.linbit.com
    Cc: Leo Chen
    Cc: Scott Branden
    Cc: Chris Mason
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: reiserfs-devel@vger.kernel.org
    Cc: Alexander Viro

    Tejun Heo
     

29 Oct, 2010

1 commit


26 Oct, 2010

2 commits


25 Oct, 2010

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Update broken web addresses in arch directory.
    Update broken web addresses in the kernel.
    Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
    Revert "Fix typo: configuation => configuration" partially
    ida: document IDA_BITMAP_LONGS calculation
    ext2: fix a typo on comment in ext2/inode.c
    drivers/scsi: Remove unnecessary casts of private_data
    drivers/s390: Remove unnecessary casts of private_data
    net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
    drivers/infiniband: Remove unnecessary casts of private_data
    drivers/gpu/drm: Remove unnecessary casts of private_data
    kernel/pm_qos_params.c: Remove unnecessary casts of private_data
    fs/ecryptfs: Remove unnecessary casts of private_data
    fs/seq_file.c: Remove unnecessary casts of private_data
    arm: uengine.c: remove C99 comments
    arm: scoop.c: remove C99 comments
    Fix typo configue => configure in comments
    Fix typo: configuation => configuration
    Fix typo interrest[ing|ed] => interest[ing|ed]
    Fix various typos of valid in comments
    ...

    Fix up trivial conflicts in:
    drivers/char/ipmi/ipmi_si_intf.c
    drivers/usb/gadget/rndis.c
    net/irda/irnet/irnet_ppp.c

    Linus Torvalds
     

05 Oct, 2010

2 commits

  • The BKL is only used in put_super, fill_super and remount_fs that are all
    three protected by the superblocks s_umount rw_semaphore. Therefore it is
    safe to remove the BKL entirely.

    Signed-off-by: Jan Blunck
    Signed-off-by: Arnd Bergmann

    Jan Blunck
     
  • This patch is a preparation necessary to remove the BKL from do_new_mount().
    It explicitly adds calls to lock_kernel()/unlock_kernel() around
    get_sb/fill_super operations for filesystems that still uses the BKL.

    I've read through all the code formerly covered by the BKL inside
    do_kern_mount() and have satisfied myself that it doesn't need the BKL
    any more.

    do_kern_mount() is already called without the BKL when mounting the rootfs
    and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
    from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
    through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
    afs_mntpt_follow_link(). Both later functions are actually the filesystems
    follow_link inode operation. vfs_kern_mount() is calling the specified
    get_sb function and lets the filesystem do its job by calling the given
    fill_super function.

    Therefore I think it is safe to push down the BKL from the VFS to the
    low-level filesystems get_sb/fill_super operation.

    [arnd: do not add the BKL to those file systems that already
    don't use it elsewhere]

    Signed-off-by: Jan Blunck
    Signed-off-by: Arnd Bergmann
    Cc: Matthew Wilcox
    Cc: Christoph Hellwig

    Jan Blunck
     

21 Sep, 2010

1 commit


11 Aug, 2010

2 commits


10 Aug, 2010

4 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Replace inode_setattr with opencoded variants of it in all callers. This
    moves the remaining call to vmtruncate into the filesystem methods where it
    can be replaced with the proper truncate sequence.

    In a few cases it was obvious that we would never end up calling vmtruncate
    so it was left out in the opencoded variant:

    spufs: explicitly checks for ATTR_SIZE earlier
    btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
    ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

    In addition to that ncpfs called inode_setattr with handcrafted iattrs,
    which allowed to trim down the opencoded variant.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of accessive blocks to the only
    remaining caller and rename the non-truncating version to nobh_write_begin.

    Get rid of the superflous file argument to it while we're at it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of accessive blocks to the callers
    in prepearation of the new truncate calling sequence. This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it as just opencoding the two additional
    paramters is shorted than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

09 Aug, 2010

1 commit


31 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    quota: Convert quota statistics to generic percpu_counter
    ext3 uses rb_node = NULL; to zero rb_root.
    quota: Fixup dquot_transfer
    reiserfs: Fix resuming of quotas on remount read-write
    pohmelfs: Remove dead quota code
    ufs: Remove dead quota code
    udf: Remove dead quota code
    quota: rename default quotactl methods to dquot_
    quota: explicitly set ->dq_op and ->s_qcop
    quota: drop remount argument to ->quota_on and ->quota_off
    quota: move unmount handling into the filesystem
    quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
    quota: move remount handling into the filesystem
    ocfs2: Fix use after free on remount read-only

    Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c

    Linus Torvalds
     

28 May, 2010

1 commit


24 May, 2010

5 commits

  • Follow the dquot_* style used elsewhere in dquot.c.

    [Jan Kara: Fixed up missing conversion of ext2]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Only set the quota operation vectors if the filesystem actually supports
    quota instead of doing it for all filesystems in alloc_super().

    [Jan Kara: Export dquot_operations and vfs_quotactl_ops]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently the VFS calls into the quotactl interface for unmounting
    filesystems. This means filesystems with their own quota handling
    can't easily distinguish between user-space originating quotaoff
    and an unount. Instead move the responsibily of the unmount handling
    into the filesystem to be consistent with all other dquot handling.

    Note that we do call dquot_disable a lot later now, e.g. after
    a sync_filesystem. But this is fine as the quota code does all its
    writes via blockdev's mapping and that is synced even later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Instead of having wrappers in the VFS namespace export the dquot_suspend
    and dquot_resume helpers directly. Also rename vfs_quota_disable to
    dquot_disable while we're at it.

    [Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently do_remount_sb calls into the dquot code to tell it about going
    from rw to ro and ro to rw. Move this code into the filesystem to
    not depend on the dquot code in the VFS - note ocfs2 already ignores
    these calls and handles remount by itself. This gets rid of overloading
    the quotactl calls and allows to unify the VFS and XFS codepaths in
    that area later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

22 May, 2010

2 commits


21 May, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
    vlynq: make whole Kconfig-menu dependant on architecture
    add descriptive comment for TIF_MEMDIE task flag declaration.
    EEPROM: max6875: Header file cleanup
    EEPROM: 93cx6: Header file cleanup
    EEPROM: Header file cleanup
    agp: use NULL instead of 0 when pointer is needed
    rtc-v3020: make bitfield unsigned
    PCI: make bitfield unsigned
    jbd2: use NULL instead of 0 when pointer is needed
    cciss: fix shadows sparse warning
    doc: inode uses a mutex instead of a semaphore.
    uml: i386: Avoid redefinition of NR_syscalls
    fix "seperate" typos in comments
    cocbalt_lcdfb: correct sections
    doc: Change urls for sparse
    Powerpc: wii: Fix typo in comment
    i2o: cleanup some exit paths
    Documentation/: it's -> its where appropriate
    UML: Fix compiler warning due to missing task_struct declaration
    UML: add kernel.h include to signal.c
    ...

    Linus Torvalds
     

15 May, 2010

1 commit

  • I spotted the missing kfree() while removing the BKL.

    [akpm@linux-foundation.org: avoid multiple returns so it doesn't happen again]
    Signed-off-by: Jan Blunck
    Cc: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Jan Blunck
     

23 Apr, 2010

1 commit


22 Apr, 2010

1 commit


16 Apr, 2010

2 commits