14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Steven Whitehouse
    Cc: cluster-devel@redhat.com
    Signed-off-by: James Morris

    David Howells
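
The wrapper idea above can be sketched in userspace C. This is only an illustration under assumed names: `cred_sketch`, `task_sketch`, `task_uid()` and `task_euid()` are hypothetical and do not reproduce the kernel's real structures or macros. The point is that callers who go through an accessor keep working when the credentials move out of the task structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; the kernel's real structures differ. */
struct cred_sketch {
    unsigned int uid;
    unsigned int euid;
};

struct task_sketch {
    /* Credentials live behind a pointer, outside the task itself. */
    const struct cred_sketch *cred;
};

/* Accessors hide where the IDs are stored from every caller. */
static unsigned int task_uid(const struct task_sketch *t)
{
    assert(t != NULL && t->cred != NULL);
    return t->cred->uid;
}

static unsigned int task_euid(const struct task_sketch *t)
{
    assert(t != NULL && t->cred != NULL);
    return t->cred->euid;
}
```

Because callers never name the fields directly, a task can later be pointed at a modified copy of its credentials without touching any call site.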
     

23 Oct, 2008

2 commits


14 Oct, 2008

1 commit

  • This is a much better version of a previous patch to make the parser
    tables constant. Rather than changing the typedef, we put the "const" in
    all the various places where it's required, allowing the __initconst
    exception for nfsroot which was the cause of the previous trouble.

    This was posted for review some time ago and I believe it's been in -mm
    since then.

    Signed-off-by: Steven Whitehouse
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Steven Whitehouse
     

11 Oct, 2008

1 commit


26 Sep, 2008

1 commit

  • This patch adds barrier support to GFS2. There is not a lot of change
    really... we just add the barrier flag when we write journal header
    blocks. If the underlying device refuses to support them, we fall back
    to the previous way of doing things (wait for the I/O and hope) since
    there is nothing else we can do. There is no user configuration,
    barriers will always be on unless the device refuses to support them.
    This seems a reasonable solution to me since this is a correctness
    issue.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
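
The fallback described above can be sketched as follows. This is a minimal userspace model, not the GFS2 code: `fake_dev`, `fake_submit()` and `write_log_header()` are hypothetical names, and the assumption that a refusing device reports `-EOPNOTSUPP` is made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; counters record what was actually issued. */
struct fake_dev {
    bool supports_barriers;
    int barrier_writes;   /* writes issued with the barrier flag */
    int plain_writes;     /* writes issued without it */
};

/* Returns 0 on success or a negative errno, as kernel code usually does. */
static int fake_submit(struct fake_dev *dev, bool barrier)
{
    assert(dev != NULL);
    if (barrier) {
        if (!dev->supports_barriers)
            return -EOPNOTSUPP;   /* assumed refusal code */
        dev->barrier_writes++;
        return 0;
    }
    dev->plain_writes++;
    return 0;
}

/* Try the barrier write first; on refusal, retry once without it. */
static int write_log_header(struct fake_dev *dev)
{
    int ret = fake_submit(dev, true);

    if (ret == -EOPNOTSUPP)
        ret = fake_submit(dev, false);   /* "wait for the I/O and hope" */
    return ret;
}
```

There is deliberately no configuration knob in the sketch, matching the commit's choice: the barrier is always attempted and dropped only when the device refuses it.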
     

18 Sep, 2008

2 commits

  • Until now, we've used the same scheme as GFS1 for atime. This has failed
    since atime is a per vfsmnt flag, not a per fs flag and as such the
    "noatime" flag was not getting passed down to the filesystems. This
    patch removes all the "special casing" around atime updates and we
    simply use the VFS's atime code.

    The net result is that GFS2 will now support all the same atime-related
    mount options as any other filesystem on a per-vfsmnt basis. We do lose
    the "lazy atime" updates, but we gain "relatime". We could add lazy
    atime to the VFS at a later date, if there is still a requirement for
    that variant - I suspect relatime will be enough.

    Also we lose about 100 lines of code after this patch has been applied,
    and I have a suspicion that it will speed things up a bit, even when
    atime is "on". So it seems like a nice clean up as well.

    From a user perspective, everything stays the same except the loss of
    the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
    least, and to be honest I don't think anybody ever used it) and that a
    number of options which were ignored before now work correctly.

    Please let me know if you've got any comments. I'm pushing this out
    early so that you can all see what my plans are.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The following patch shrinks the gfs2_args structure which is embedded in
    every GFS2 superblock. It cuts down the size of the options to a single
    unsigned int (the 13 bits of bitfields will be rounded up to that size
    by the compiler) from the current 11 unsigned ints. So on x86 that's 44
    bytes shrinking to 4 bytes, in each and every GFS2 superblock.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
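
The size arithmetic above can be checked with a small sketch. The field names and the split into nine 1-bit flags plus two 2-bit fields are illustrative assumptions chosen to total 11 options and 13 bits; they are not taken from the real gfs2_args definition. Bitfield layout is implementation-defined, so the packed size is what common compilers produce rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Before: one full unsigned int per option, 11 options in total. */
struct args_wide {
    unsigned int opt[11];
};

/* After: the same 11 options packed into 13 bits of bitfields.
 * Names and widths are hypothetical. */
struct args_packed {
    unsigned int flag_a:1;
    unsigned int flag_b:1;
    unsigned int flag_c:1;
    unsigned int flag_d:1;
    unsigned int flag_e:1;
    unsigned int flag_f:1;
    unsigned int flag_g:1;
    unsigned int flag_h:1;
    unsigned int flag_i:1;
    unsigned int mode_x:2;   /* a small enumerated choice */
    unsigned int mode_y:2;   /* another small enumerated choice */
};

/* Bytes saved per superblock by switching layouts. */
static size_t bytes_saved(void)
{
    assert(sizeof(struct args_packed) <= sizeof(struct args_wide));
    return sizeof(struct args_wide) - sizeof(struct args_packed);
}
```

With a 4-byte unsigned int, the wide layout is 44 bytes and the packed one is 4 bytes, which is the saving quoted in the commit message.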
     

15 Sep, 2008

2 commits


05 Sep, 2008

2 commits

  • In case of error, the function gfs2_inode_lookup returns an
    ERR pointer, but never returns a NULL pointer. So a NULL test that
    necessarily comes after an IS_ERR test should be deleted, and a NULL
    test that may come after a call to this function should be
    strengthened by an IS_ERR test.

    The semantic match that finds this problem is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @match_bad_null_test@
    expression x, E;
    statement S1,S2;
    @@
    x = gfs2_inode_lookup(...)
    ... when != x = E
    * if (x != NULL)
    S1 else S2
    // </smpl>

    Signed-off-by: Julien Brunel
    Signed-off-by: Julia Lawall
    Signed-off-by: Steven Whitehouse

    Julien Brunel
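
For readers unfamiliar with the ERR pointer convention, here is a userspace imitation. `ERR_PTR`, `PTR_ERR` and `IS_ERR` are re-implemented below in the spirit of the kernel helpers of the same names; `lookup()` and `use_inode()` are hypothetical stand-ins, not the GFS2 functions. The sketch shows why a NULL test cannot detect failure from such a function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct inode {
    int ino;
};

/* Hypothetical lookup: returns a valid pointer or an ERR pointer,
 * never NULL. */
static struct inode *lookup(struct inode *table, int n, int ino)
{
    assert(table != NULL);
    for (int i = 0; i < n; i++)
        if (table[i].ino == ino)
            return &table[i];
    return ERR_PTR(-ENOENT);
}

/* Correct caller: test with IS_ERR before dereferencing. */
static long use_inode(struct inode *table, int n, int ino)
{
    struct inode *ip = lookup(table, n, ino);

    if (IS_ERR(ip))
        return PTR_ERR(ip);
    return ip->ino;
}
```

A caller that wrote `if (ip != NULL)` here would treat the failure case as success and dereference an invalid address.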
     
  • In the case that a request for a glock arrives right after the
    grant reply has arrived, it sometimes means that the gl_tstamp
    field hasn't been updated recently enough. The net result is that
    the min-hold time for the glock is ignored. If this happens
    often enough, it leads to poor performance.

    This patch adds an additional test, so that if the reply pending
    bit is set on a glock, then it will select the maximum length of
    time for the min-hold time, rather than looking at gl_tstamp.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
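
The extra test can be sketched like this. The structure and function are hypothetical simplifications: times are plain tick counts, counter wraparound is ignored, and `reply_pending` stands in for the reply pending bit mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state involved; times are in ticks. */
struct lock_state {
    unsigned long tstamp;     /* when the lock was last granted */
    unsigned long min_hold;   /* configured minimum hold time */
    bool reply_pending;       /* reply seen, tstamp not yet refreshed */
};

/* How long to delay a demote request that arrives at time `now`. */
static unsigned long demote_delay(const struct lock_state *ls,
                                  unsigned long now)
{
    unsigned long hold_until;

    assert(ls != NULL);

    /* tstamp is stale while a reply is pending, so do not trust it:
     * hold for the full minimum hold time instead. */
    if (ls->reply_pending)
        return ls->min_hold;

    hold_until = ls->tstamp + ls->min_hold;
    return now < hold_until ? hold_until - now : 0;
}
```

Without the first branch, a stale timestamp makes the hold window look long expired, so the delay comes out as zero and the min-hold time is effectively ignored.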
     

29 Aug, 2008

1 commit

  • Add a count for lockspace create and release so that create can
    be called multiple times to use the lockspace from different places.
    Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with
    the previous behavior of returning -EEXIST if the lockspace already
    exists.

    Signed-off-by: David Teigland

    David Teigland
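
A minimal sketch of the counting behaviour, modelling a single lockspace. `LSFL_NEWEXCL`, `ls_create()` and `ls_release()` are hypothetical names that mirror the description above; the flag value and return conventions are assumptions, not the DLM API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define LSFL_NEWEXCL 0x1u   /* hypothetical value for the exclusive flag */

/* One lockspace; create_count == 0 means it does not exist yet. */
struct lockspace {
    int create_count;
};

/* Create the lockspace, or take another reference if it already exists.
 * With LSFL_NEWEXCL, an existing lockspace is an error instead. */
static int ls_create(struct lockspace *ls, unsigned int flags)
{
    assert(ls != NULL);
    if (ls->create_count > 0 && (flags & LSFL_NEWEXCL))
        return -EEXIST;
    ls->create_count++;
    return 0;
}

/* Drop one reference. Returns 1 when the last user has gone and the
 * lockspace would be torn down, 0 if other users remain, and -EINVAL
 * if the lockspace was never created. */
static int ls_release(struct lockspace *ls)
{
    assert(ls != NULL);
    if (ls->create_count == 0)
        return -EINVAL;
    ls->create_count--;
    return ls->create_count == 0 ? 1 : 0;
}
```

Each successful create must be paired with one release; only the final release tears the lockspace down.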
     

27 Aug, 2008

1 commit

  • This patch fixes a locking issue in the rename code by ensuring that we hold
    the per sb rename lock over both directory and "other" renames which involve
    different parent directories.

    At the same time, this moved the (only called from one place) function
    gfs2_ok_to_move into the file that it's called from, so we can mark it
    static. This should make the code a bit easier to follow.

    Signed-off-by: Steven Whitehouse
    Cc: Peter Staubach

    Steven Whitehouse
     

13 Aug, 2008

3 commits

  • This patch fixes a problem whereby simultaneous unlink, rmdir,
    rename and link operations (e.g. rm -fR *) from multiple nodes
    on the same GFS2 file system can cause kernel panics, hangs,
    and/or memory corruption. It also gets rid of all the non-rgrp
    calls to gfs2_glock_nq_m.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch is intended to fix the issues reported in bz #457798. Instead
    of having the metafs as a separate filesystem, it becomes a second root
    of gfs2. As a result it will appear as type gfs2 in /proc/mounts, but it
    is still possible (for backwards compatibility purposes) to mount it as
    type gfs2meta. A new mount flag "meta" is introduced so that it's possible
    to tell the two cases apart in /proc/mounts.

    As a result it becomes possible to mount type gfs2 with -o meta and
    get the same result as mounting type gfs2meta. So it is possible to
    mount just the metafs on its own. Currently if you do this, it's then
    impossible to mount the "normal" root of the gfs2 filesystem without
    first unmounting the metafs root. I'm not sure if that's a feature or
    a bug :-)

    Either way, this is a great improvement on the previous scheme and I've
    verified that it works ok with bind mounts on both the "normal" root
    and the metafs root in various combinations.

    There were also a bunch of functions in super.c which didn't belong there,
    so this moves them into ops_fstype.c where they can be static. Hopefully
    the mount/umount sequence is now more obvious as a result.

    Signed-off-by: Steven Whitehouse
    Cc: Alexander Viro

    Steven Whitehouse
     
  • Due to an incorrect iterator, some glocks were being missed from the
    glock dumps obtained via debugfs. This patch fixes the problem and
    ensures that we don't miss any glocks in future.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Jul, 2008

3 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • * kill nameidata * argument; map the 3 bits in ->flags anybody cares
    about to new MAY_... ones and pass with the mask.
    * kill redundant gfs2_iop_permission()
    * sanitize ecryptfs_permission()
    * fix remaining places where ->permission() instances might barf on new
    MAY_... found in mask.

    The obvious next target in that direction is permission(9)

    folded fix for nfs_permission() breakage from Miklos Szeredi

    Signed-off-by: Al Viro

    Al Viro
     
    Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexers. Nobody uses this "feature", nor does anybody use
    the passed kmem cache in a non-trivial way, so pass only a pointer to the
    object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is a flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

16 Jul, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
    [GFS2] Fix GFS2's use of do_div() in its quota calculations
    [GFS2] Remove unused declaration
    [GFS2] Remove support for unused and pointless flag
    [GFS2] Replace rgrp "recent list" with mru list
    [GFS2] Allow local DF locks when holding a cached EX glock
    [GFS2] Fix delayed demote race
    [GFS2] don't call permission()
    [GFS2] Fix module building
    [GFS2] Glock documentation
    [GFS2] Remove all_list from lock_dlm
    [GFS2] Remove obsolete conversion deadlock avoidance code
    [GFS2] Remove remote lock dropping code
    [GFS2] kernel panic mounting volume
    [GFS2] Revise readpage locking
    [GFS2] Fix ordering of args for list_add
    [GFS2] trivial sparse lock annotations
    [GFS2] No lock_nolock
    [GFS2] Fix ordering bug in lock_dlm
    [GFS2] Clean up the glock core

    Linus Torvalds
     

15 Jul, 2008

1 commit


11 Jul, 2008

1 commit


10 Jul, 2008

3 commits

  • The implementation of gfs2_inode_attr_in is removed.
    So remove its declaration.

    Signed-off-by: Li Xiaodong
    Signed-off-by: Steven Whitehouse

    Li Xiaodong
     
  • The ability to mark files for direct i/o access when opened
    normally is both unused and pointless, so this patch removes
    support for that feature.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch removes the "recent list" which is used during allocation
    and replaces it with the (already existing) mru list used during
    deletion. The "recent list" was not a true mru list leading to a number
    of inefficiencies including a "next" function which made scanning the
    list an order N^2 operation with respect to the number of list elements.

    This should increase allocation performance with large numbers of rgrps.
    It's also a useful preparation and cleanup before some further changes
    which are planned in this area.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

07 Jul, 2008

2 commits

  • We already allow local SH locks while we hold a cached EX glock, so here
    we allow DF locks as well. This works only because we rely on the VFS's
    invalidation for locally cached data, and because if we hold an EX lock,
    then we know that no other node can be caching data relating to this
    file.

    It dramatically speeds up initial writes to O_DIRECT files since we fall
    back to buffered I/O for this and would otherwise bounce between DF and
    EX modes on each and every write call. The lessons to be learned from
    that are to ensure that (for the time being anyway) O_DIRECT files are
    preallocated and that they are written to using reasonably large I/O
    sizes. Even so, this change fixes that corner case nicely.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • There is a race in the delayed demote code where it does the wrong thing
    if a demotion to UN has occurred for other reasons before the delay has
    expired. This patch adds an assert to catch that condition as well as
    fixing the root cause by adding an additional check for the UN state.

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
     

03 Jul, 2008

2 commits

  • GFS2 calls permission() to verify permissions after locks on the files
    have been taken.

    For this it's sufficient to call gfs2_permission() instead. This
    results in the following changes:

    - IS_RDONLY() check is not performed
    - IS_IMMUTABLE() check is not performed
    - devcgroup_inode_permission() is not called
    - security_inode_permission() is not called

    IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
    flag should provide protection against read-only remounts during
    operations. do_gfs2_set_flags() has been fixed to perform
    mnt_want_write()/mnt_drop_write() to protect against remounting
    read-only.

    IS_IMMUTABLE has been added to gfs2_permission()

    Repeating the security checks seems to be pointless, as they don't
    normally change, and if they do, it's independent of the filesystem
    state.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Steven Whitehouse

    Miklos Szeredi
     
  • - Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
    failures in all users)
    - Change all users to either use generic_file_llseek_unlocked directly or
    take the BKL around it. I changed the file systems that don't use the BKL
    for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
    take the BKL, but explicitly in their own source now.

    I moved them all over in a single patch to avoid unbisectable sections.

    Open problem: 32bit kernels can corrupt fpos because its modification
    is not atomic, but they can do that anyway because there are other paths
    that modify it without the BKL.

    Do we need a special lock for the pos/f_version = 0 checks?

    Trond says the NFS BKL is likely not needed, but keep it for now
    until his full audit.

    v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
    and factor duplicated code (suggested by hch)

    Cc: Trond.Myklebust@netapp.com
    Cc: swhiteho@redhat.com
    Cc: sfrench@samba.org
    Cc: vandrove@vc.cvut.cz

    Signed-off-by: Andi Kleen
    Signed-off-by: Andi Kleen
    Signed-off-by: Jonathan Corbet

    Andi Kleen
     

27 Jun, 2008

10 commits

  • Two lines missed from the previous patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • I discovered that we had a list onto which every lock_dlm
    lock was being put. Its only function was to discover whether
    we'd got any locks left after umount. Since there was already
    a counter for that purpose as well, I removed the list. The
    saving is sizeof(struct list_head) per glock - well worth
    having.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is only used by GFS1 so can be removed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • There are several reasons why this is undesirable:

    1. It never happens during normal operation anyway
    2. If it does happen it causes performance to be very, very poor
    3. It isn't likely to solve the original problem (memory shortage
    on remote DLM node) it was supposed to solve
    4. It uses a bunch of arbitrary constants which are unlikely to be
    correct for any particular situation and for which the tuning seems
    to be a black art.
    5. In an N node cluster, only 1/N of the dropped locks will actually
    contribute to solving the problem on average.

    So all in all we are better off without it. This also makes merging
    the lock_dlm module into GFS2 a bit easier.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes Red Hat bugzilla bug 450156.

    This started with a not-too-improbable mount failure because the
    locking protocol was never set back to its proper "lock_dlm" after the
    system was rebooted in the middle of a gfs2_fsck. That left a
    (purposely) invalid locking protocol in the superblock, which caused an
    error when the file system was mounted the next time.

    When there's an error mounting, vfs calls DQUOT_OFF, which calls
    vfs_quota_off which calls gfs2_sync_fs. Next, gfs2_sync_fs calls
    gfs2_log_flush passing s_fs_info. But due to the error, s_fs_info
    had been previously set to NULL, and so we have the kernel oops.

    My solution in this patch is to test for the NULL value before passing
    it. I tested this patch and it fixes the problem.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
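
The shape of the fix can be sketched in userspace. The types and functions below are hypothetical stand-ins that borrow the names from the description; they are not the kernel definitions. The only point is the NULL test before the private data is handed on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the filesystem-private data and superblock. */
struct fs_info {
    int flushes;   /* counts log flushes, for demonstration only */
};

struct super_block {
    struct fs_info *s_fs_info;   /* NULL after a failed mount */
};

/* The flush itself requires valid private data. */
static void log_flush(struct fs_info *sdp)
{
    assert(sdp != NULL);
    sdp->flushes++;
}

/* Sync entry point: skip the flush when the mount never completed. */
static int sync_fs(struct super_block *sb)
{
    assert(sb != NULL);
    if (sb->s_fs_info)
        log_flush(sb->s_fs_info);
    return 0;
}
```

Returning success when there is nothing to flush is a choice made for this sketch: the caller is tearing down after a mount error, so there is no state to write back.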
     
  • The previous attempt to fix the locking in readpage failed due
    to the use of a "try lock" which resulted in occasional high
    cpu usage during testing (due to repeated tries) and also it
    did not resolve all the ordering problems wrt the transaction
    lock (although it did solve all the inode lock ordering problems).

    This patch avoids the problem by unlocking the page and getting the
    locks in the correct order. This means that we have to retest the
    page to ensure that it hasn't changed when we relock the page.

    This now passes the tests which were previously failing.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The patch to remove lock_nolock managed to get the arguments
    of this list_add backwards. This fixes it.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Annotate the &sdp->sd_log_lock.

    Signed-off-by: Harvey Harrison
    Signed-off-by: Steven Whitehouse

    Harvey Harrison
     
    This patch merges the lock_nolock module into GFS2 itself. As well as removing
    some of the overhead of the module, it also means that it's now impossible to
    build GFS2 without a lock module (which would be a pointless thing to do
    anyway).

    We also plan to merge lock_dlm into GFS2 in the future, but that is a more
    tricky task, and will therefore be a separate patch.

    Signed-off-by: Steven Whitehouse
    Cc: David Teigland

    Steven Whitehouse
     
    This looks like a lot of change, but in fact it's not. Mostly it's
    things moving from one file to another. The change is just that
    instead of queuing lock completions and callbacks from the DLM
    we now pass them directly to GFS2.

    This gives us a net loss of two list heads per glock (a fair
    saving in memory) plus a reduction in the latency of delivering
    the messages to GFS2, plus we now have one thread fewer as well.
    There was a bug where callbacks and completions could be delivered
    in the wrong order due to this unnecessary queuing which is fixed
    by this patch.

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse