08 Mar, 2010

2 commits

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing
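
    As a userspace illustration of the pattern (this is not the kernel's
    struct sysfs_ops; demo_ops, demo_show, demo_object and demo_read are
    made-up names), a shared ops table can be declared const so that it
    normally lands in read-only data and any write through it is rejected
    at compile time:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical ops table holding a single "show" method. */
struct demo_ops {
    int (*show)(char *buf, size_t len);
};

static int demo_show(char *buf, size_t len)
{
    /* snprintf returns the number of characters it would have written. */
    return snprintf(buf, len, "demo\n");
}

/* const: one shared instance, typically placed in .rodata. */
static const struct demo_ops demo_ops = {
    .show = demo_show,
};

/* Objects hold a pointer-to-const, so they cannot alter the shared table. */
struct demo_object {
    const struct demo_ops *ops;
};

static int demo_read(const struct demo_object *obj, char *buf, size_t len)
{
    /* "obj->ops->show = NULL;" would not compile: the target is const. */
    return obj->ops->show(buf, len);
}
```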

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • Constify struct kset_uevent_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     

06 Mar, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     
  • This gives the filesystem more information about the writeback that
    is happening. Trond requested this for the NFS unstable write handling,
    and other filesystems might benefit from this too by being able to
    distinguish between the different callers in more detail.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Mar, 2010

2 commits

  • Currently sync_quota_sb does a lot of sync and truncate action that only
    applies to "VFS" style quotas and is actively harmful for the sync
    performance in XFS. Move it into vfs_quota_sync and add a wait parameter
    to ->quota_sync to tell if we need it or not.

    My audit of the GFS2 code says it's also not needed given the way GFS2
    implements quotas, but I'd be happy if this can get a detailed review.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    init: Open /dev/console from rootfs
    mqueue: fix typo "failues" -> "failures"
    mqueue: only set error codes if they are really necessary
    mqueue: simplify do_open() error handling
    mqueue: apply mathematics distributivity on mq_bytes calculation
    mqueue: remove unneeded info->messages initialization
    mqueue: fix mq_open() file descriptor leak on user-space processes
    fix race in d_splice_alias()
    set S_DEAD on unlink() and non-directory rename() victims
    vfs: add NOFOLLOW flag to umount(2)
    get rid of ->mnt_parent in tomoyo/realpath
    hppfs can use existing proc_mnt, no need for do_kern_mount() in there
    Mirror MS_KERNMOUNT in ->mnt_flags
    get rid of useless vfsmount_lock use in put_mnt_ns()
    Take vfsmount_lock to fs/internal.h
    get rid of insanity with namespace roots in tomoyo
    take check for new events in namespace (guts of mounts_poll()) to namespace.c
    Don't mess with generic_permission() under ->d_lock in hpfs
    sanitize const/signedness for udf
    nilfs: sanitize const/signedness in dealing with ->d_name.name
    ...

    Fix up fairly trivial (famous last words...) conflicts in
    drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c

    Linus Torvalds
     

04 Mar, 2010

1 commit


01 Mar, 2010

4 commits

  • This patch changes glock numbers from printing in decimal to hex.
    Since DLM prints corresponding resource IDs in hex, it makes debugging
    easier.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • When we queue data buffers for ordered write, the buffers are added
    to the head of the ordered write list. When the log needs to push
    these buffers to disk, it also walks the list from the head. The
    result is that the ordered buffers are submitted to disk in
    reverse order.

    For large writes, this means that whenever the log flushes, large
    streams of reverse sequential order buffers are pushed down into the
    block layers. The elevators don't handle this particularly well, so
    IO rates tend to be significantly lower than if the IO was issued in
    ascending block order.

    Queue new ordered buffers to the tail of the ordered buffer list to
    ensure that IO is dispatched in the order it was submitted. This
    should significantly improve large sequential write speeds. On a
    disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for
    noop and from 38MB/s to 50MB/s for cfq.
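
    The ordering effect can be reproduced with a small userspace sketch.
    The list type and the demo_* helpers are hypothetical stand-ins, not
    the GFS2 log code: adding at the head and then walking from the head
    yields descending block numbers, while adding at the tail preserves
    submission order.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer queued for ordered write. */
struct demo_buf {
    unsigned long long blkno;
    struct demo_buf *next;
};

struct demo_list {
    struct demo_buf *head;
    struct demo_buf *tail;
};

/* Old behaviour: the newest buffer goes to the front of the list. */
static void demo_add_head(struct demo_list *l, struct demo_buf *b)
{
    b->next = l->head;
    l->head = b;
    if (l->tail == NULL)
        l->tail = b;
}

/* New behaviour: the newest buffer goes to the back of the list. */
static void demo_add_tail(struct demo_list *l, struct demo_buf *b)
{
    b->next = NULL;
    if (l->tail != NULL)
        l->tail->next = b;
    else
        l->head = b;
    l->tail = b;
}

/* Walk from the head, as a flush would, recording the block order. */
static size_t demo_flush_order(const struct demo_list *l,
                               unsigned long long *out, size_t max)
{
    size_t n = 0;

    for (const struct demo_buf *b = l->head; b != NULL && n < max; b = b->next)
        out[n++] = b->blkno;
    return n;
}
```

    Queuing blocks 10, 11 and 12 with demo_add_head() flushes them as
    12, 11, 10; with demo_add_tail() they flush as 10, 11, 12.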

    Signed-off-by: Dave Chinner
    Signed-off-by: Steven Whitehouse

    Dave Chinner
     
  • As a consequence of the previous patch, we can now remove the
    loop which used to be required due to the circular dependency
    between the inodes and glocks. Instead we can just invalidate
    the inodes, and then clear up any glocks which are left.

    Also we no longer need the rwsem since there is no longer any
    danger of the inode invalidation calling back into the glock
    code (and from there back into the inode code).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space; the other fields
    were unused. This means that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata which relate
    to that glock (inode) in order to sync or sync and invalidate it
    (depending on the remotely requested lock mode).

    This patch adds a new type of glock which, in addition to
    its normal fields, has an address space. This applies to all
    inode and rgrp glocks (but to no other glock types which remain
    as before). As a result, we no longer need to have the second
    inode.

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the most immediately apparent, the
    second is just as important as it now enables a number of clean
    ups at umount time. Those will be the subject of future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

12 Feb, 2010

2 commits

  • This patch solves a corner case during allocation which occurs if both
    metadata (indirect) and data blocks are required but there is an
    obstacle in the filesystem (e.g. a resource group header or another
    allocated block) such that when the allocation is requested only
    enough blocks for the metadata are returned.

    By changing the exit condition of this loop, we ensure that a
    minimum of one data block will always be returned.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • We need this one-liner to signal the mount helper of the 'insufficient journals' condition.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     

03 Feb, 2010

2 commits

  • Although all glocks are, by the time of the umount glock wait,
    scheduled for demotion, some of them haven't made it far
    enough through the process for the original set of waiting
    code to wait for them.

    This extends the ref count to the whole glock lifetime in order
    to ensure that the waiting does catch all glocks. It does make
    it a bit more invasive, but it seems the only sensible solution
    at the moment.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch adds a wait on umount between the point at which we
    dispose of all glocks and the point at which we unmount the
    lock protocol. This ensures that we've received all the replies
    to our unlock requests before we stop the locking.

    Signed-off-by: Steven Whitehouse
    Reported-by: Fabio M. Di Nitto

    Steven Whitehouse
     

01 Feb, 2010

3 commits


12 Jan, 2010

1 commit


11 Jan, 2010

1 commit


08 Jan, 2010

3 commits

  • The ref counting for the bh returned by gfs2_ea_find() was
    wrong. This patch ensures that we always drop the ref count
    to that bh correctly.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The rename code was taking a resource group lock in cases where
    it wasn't actually needed; this caused problems if the rename
    resulted in an inode being unlinked. The patch ensures that
    we only take the rgrp lock early if it is really needed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The VFS reads the inode size during generic_file_aio_write() but
    with no locking around it. In order to get the expected result
    from O_APPEND opens, this patch updates the inode size before
    calling generic_file_aio_write().

    There is of course still a race here, in that there is nothing to
    prevent another node coming in and extending the file in the
    mean time. On the other hand, when used with file locking this
    will ensure that the expected results are obtained.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

18 Dec, 2009

2 commits

  • This reverts commit e4c570c4cb7a95dbfafa3d016d2739bf3fdfe319, as
    requested by Alexey:

    "I think I gave a good enough arguments to not merge it.
    To iterate:
    * patch makes impossible to start using ext3 on EXT3_FS=n kernels
    without reboot.
    * this is done only for one pointer on task_struct"

    None of config options which define task_struct are tristate directly
    or effectively."

    Requested-by: Alexey Dobriyan
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • After I_SYNC was split from I_LOCK the leftover is always used together with
    I_NEW and thus superfluous.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Dec, 2009

1 commit

  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handlers mechanism for filesystems that require it later,
    e.g. cifs.
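
    A userspace sketch of the sharing this enables. All demo_* names and
    the flag values are hypothetical, and the signatures are simplified
    rather than copied from struct xattr_handler: one get method serves
    both ACL handlers and uses the flags it is passed to pick the
    underlying attribute.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag values used only by this sketch. */
enum { DEMO_ACL_ACCESS = 1, DEMO_ACL_DEFAULT = 2 };

/* Hypothetical handler: a prefix, a flags word and one get method. */
struct demo_xattr_handler {
    const char *prefix;
    int flags;
    int (*get)(const char *name, char *buf, size_t len, int flags);
};

/* A single method serves both handlers and selects by flags. */
static int demo_acl_get(const char *name, char *buf, size_t len, int flags)
{
    const char *which = (flags == DEMO_ACL_DEFAULT) ? "default" : "access";

    (void)name; /* unused in this sketch */
    return snprintf(buf, len, "%s", which);
}

static const struct demo_xattr_handler demo_acl_access_handler = {
    .prefix = "system.posix_acl_access",
    .flags  = DEMO_ACL_ACCESS,
    .get    = demo_acl_get,
};

static const struct demo_xattr_handler demo_acl_default_handler = {
    .prefix = "system.posix_acl_default",
    .flags  = DEMO_ACL_DEFAULT,
    .get    = demo_acl_get,
};

/* Dispatch passes the handler's own flags through to the method. */
static int demo_dispatch_get(const struct demo_xattr_handler *h,
                             char *buf, size_t len)
{
    return h->get(h->prefix, buf, len, h->flags);
}
```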

    [with GFS2 bits updated by Steven Whitehouse]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     

16 Dec, 2009

2 commits


03 Dec, 2009

12 commits

  • This patch fixes some ref counting issues. Firstly by moving
    the point at which we drop the ref count after a dlm lock
    operation has completed we ensure that we never call
    gfs2_glock_hold() on a lock with a zero ref count.

    Secondly, by using atomic_dec_and_lock() in gfs2_glock_put()
    we ensure that at no time will a glock with zero ref count
    appear on the lru_list. That means that we can remove the
    check for this in our shrinker (which was racy).
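
    The idea behind atomic_dec_and_lock() can be sketched in userspace
    with C11 atomics. This is an illustration only, not the kernel helper
    or the GFS2 code; the spinlock, struct demo_glock and the demo_*
    functions are assumptions made for the example. The final decrement
    to zero happens with the list lock held, so the object can be taken
    off the list before anyone else can see it with a zero count.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object: a ref count plus an "is on the LRU list" marker. */
struct demo_glock {
    atomic_int ref;
    bool on_lru;
};

/* Minimal spinlock standing in for the lock protecting the LRU list. */
static atomic_flag demo_lru_lock = ATOMIC_FLAG_INIT;

static void demo_lock(void)
{
    while (atomic_flag_test_and_set(&demo_lru_lock)) {
        /* spin until the flag is released */
    }
}

static void demo_unlock(void)
{
    atomic_flag_clear(&demo_lru_lock);
}

/*
 * Decrement *ref. Returns true with the lock held only if the count
 * reached zero; otherwise returns false with the lock not held.
 */
static bool demo_dec_and_lock(atomic_int *ref)
{
    int old = atomic_load(ref);

    /* Fast path: not the last reference, so no lock is needed. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(ref, &old, old - 1))
            return false;
    }

    /* Slow path: the drop to zero happens under the lock. */
    demo_lock();
    if (atomic_fetch_sub(ref, 1) == 1)
        return true;
    demo_unlock();
    return false;
}

/* Drop a reference; the final put unlinks from the LRU before unlocking. */
static bool demo_put(struct demo_glock *gl)
{
    if (!demo_dec_and_lock(&gl->ref))
        return false;
    gl->on_lru = false; /* a zero-ref object is never left on the list */
    demo_unlock();
    return true;        /* the caller may now free the object */
}
```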

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • No one is calling wb_writeback and write_cache_pages with
    wbc.nonblocking=1 any more. And lumpy pageout will want to do
    nonblocking writeback without the congestion wait.

    Signed-off-by: Wu Fengguang
    Signed-off-by: Steven Whitehouse

    Wu Fengguang
     
  • When a gfs2 filesystem is grown, it needs to rebuild the rindex list to be able
    to use the new space. gfs2 does this when the rindex is marked not uptodate,
    which happens when the rindex glock is dropped. However, on a single node
    setup, there is never any reason to drop the rindex glock, so gfs2 never
    invalidates the rindex. This patch makes gfs2 automatically drop the
    rindex glock after filesystem grows, so it can refresh the rindex list.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • There are two spare fields in the header common to all GFS2
    metadata. One is just the right size to fit a journal id
    in it, and this patch updates the journal code so that each
    time a metadata block is modified, we tag it with the journal
    id of the node which is performing the modification.

    The reason for this is that it should make it much easier to
    debug issues which arise if we can tell which node was the
    last to modify a particular metadata block.

    Since the field is updated before the block is written into
    the journal, each journal should only contain metadata which
    is tagged with its own journal id. The one exception to this
    is the journal header block, which might have a different node's
    id in it, if that journal was recovered by another node in the
    cluster.

    Thus each journal will contain a record of which nodes recovered
    it, via the journal header.

    The other field in the metadata header could potentially be
    used to hold information about what kind of operation was
    performed, but for the time being we just zero it on each
    transaction so that if we use it for that in future, we'll
    know that the information (where it exists) is reliable.

    I did consider using the other field to hold the journal
    sequence number, however since in GFS2's journaling we write
    the modified data into the journal and not the original
    data, this gives no information as to what action caused the
    modification, so I think we can probably come up with a better
    use for those 64 bits in the future.
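
    A rough sketch of the tagging step. The struct layout, the field
    names and the 32-bit width of the journal id are assumptions for
    illustration, and on-disk byte order is ignored; only the 64-bit
    size of the other spare field comes from the text above.

```c
#include <stdint.h>

/* Hypothetical header layout; names and widths are illustrative only. */
struct demo_meta_header {
    uint32_t magic;
    uint32_t type;
    uint64_t spare64; /* the other spare field: zeroed for now */
    uint32_t format;
    uint32_t jid;     /* journal id of the last node to modify the block */
};

/* Tag a metadata block before it is written into the journal. */
static void demo_tag_block(struct demo_meta_header *mh, uint32_t jid)
{
    mh->jid = jid;
    mh->spare64 = 0; /* keep the field reliable for a possible future use */
}
```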

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • In some cases we already have the rindex lock when
    we enter this function.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This function only had one caller left, and that caller only
    called it for leaf blocks, hence one branch of the "if" was
    never taken. In addition the call to get_left had already
    verified the metadata type, so the function can be reduced
    to a single line of code in its caller.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since the default is barriers on, this only displays the
    nobarrier option when that is active.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Currently gfs2 issues barriers unconditionally. There are various reasons
    to disable them, be that just for testing or for stupid devices flushing
    large battery-backed caches. Add a nobarrier option that matches xfs and
    btrfs for this. Also add a symmetric barrier option to turn it back on
    at remount time.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Steven Whitehouse

    Christoph Hellwig
     
  • It's not necessary to do any 64bit division for the statfs sync code, so
    remove it.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • GFS2 now has three new mount options, statfs_quantum, quota_quantum and
    statfs_percent. statfs_quantum and quota_quantum simply allow you to
    set the tunables of the same name. Setting statfs_quantum to 0
    will also turn on the statfs_slow tunable. statfs_percent accepts an
    integer between 0 and 100. Numbers between 1 and 100 will cause GFS2 to
    do an early sync when the local number of blocks free changes by at
    least statfs_percent from the total number of blocks free. Setting
    statfs_percent to 0 disables this.
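
    The statfs_percent rule can be read as the following check. This is a
    sketch under assumptions: demo_needs_early_sync and its parameters
    are hypothetical names, treating the change as a magnitude and
    returning false for a non-positive total are choices made here, and
    very large block counts could overflow the multiplication.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a local change in free blocks is large enough to
 * trigger an early sync. percent == 0 disables the check.
 */
static bool demo_needs_early_sync(int64_t local_change, int64_t total_free,
                                  unsigned int percent)
{
    if (percent == 0 || total_free <= 0)
        return false;
    if (local_change < 0)
        local_change = -local_change;
    /* change / total >= percent / 100, rearranged to avoid a division. */
    return local_change * 100 >= total_free * (int64_t)percent;
}
```

    With 1000 blocks free in total and statfs_percent=5, a local change
    of 50 blocks meets the threshold while a change of 49 does not.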

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • This adds support to GFS2 to send quota warnings via netlink.
    Also it removes a stray \r which was left over from when the
    code used to print warnings on the console.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch adds the ability to set GFS2 quota limit and
    warning levels via the XFS quota API.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse