18 Sep, 2008

1 commit

  • Until now, we've used the same scheme as GFS1 for atime. This has failed
    because atime is a per-vfsmnt flag, not a per-fs flag, and as such the
    "noatime" flag was not being passed down to the filesystems. This
    patch removes all the "special casing" around atime updates and we
    simply use the VFS's atime code.

    The net result is that GFS2 will now support all the same atime related
    mount options as any other filesystem, on a per-vfsmnt basis. We do lose
    the "lazy atime" updates, but we gain "relatime" (the basic relatime rule
    is sketched after this entry). We could add lazy atime to the VFS at a
    later date, if there is still a requirement for that variant - I suspect
    relatime will be enough.

    Also we lose about 100 lines of code after this patch has been applied,
    and I have a suspicion that it will speed things up a bit, even when
    atime is "on". So it seems like a nice clean up as well.

    From a user perspective, everything stays the same except the loss of
    the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
    least, and to be honest I don't think anybody ever used it) and that a
    number of options which were ignored before now work correctly.

    Please let me know if you've got any comments. I'm pushing this out
    early so that you can all see what my plans are.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
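
    A minimal sketch of the relatime rule referred to above (illustrative
    only; plain C with an assumed helper name, not the actual VFS code):
    atime is only written back when it has fallen behind mtime or ctime,
    so read-mostly workloads avoid an inode write on every access. Later
    kernels additionally force an update once the recorded atime is more
    than a day old.

        #include <stdbool.h>
        #include <time.h>

        /* Sketch: should a relatime mount update atime on this access? */
        static bool relatime_needs_update(time_t atime, time_t mtime, time_t ctime)
        {
                if (mtime >= atime)     /* data modified since the last recorded access */
                        return true;
                if (ctime >= atime)     /* inode changed since the last recorded access */
                        return true;
                return false;           /* otherwise skip the on-disk atime update */
        }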
     

27 Jun, 2008

2 commits

  • There are several reasons why this is undesirable:

    1. It never happens during normal operation anyway
    2. If it does happen it causes performance to be very, very poor
    3. It isn't likely to solve the original problem it was supposed to
    solve (memory shortage on a remote DLM node)
    4. It uses a bunch of arbitrary constants which are unlikely to be
    correct for any particular situation and for which the tuning seems
    to be a black art
    5. In an N node cluster, only 1/N of the dropped locks will actually
    contribute to solving the problem, on average

    So all in all we are better off without it. This also makes merging
    the lock_dlm module into GFS2 a bit easier.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch implements a number of cleanups to the core of the
    GFS2 glock code. As a result a lot of code is removed. It looks
    like a really big change, but actually a large part of this patch
    is either removing or moving existing code.

    There are some new bits too though, such as the new run_queue()
    function which is considerably streamlined. Highlights of this
    patch include:

    o Fixes a cluster coherency bug during SH -> EX lock conversions
    o Removes the "glmutex" code in favour of a single bit lock (the
    bit-lock idea is sketched after this entry)
    o Removes the ->go_xmote_bh() for inodes since it was duplicating
    ->go_lock()
    o We now use a single ->lm_lock() function for both locks and
    unlocks (i.e. an unlock is a lock with target mode LM_ST_UNLOCKED)
    o The fast path is considerably shorter, giving performance gains
    especially with lock_nolock
    o The glock_workqueue is now used for all the callbacks from the DLM,
    which allows us to simplify the lock_dlm module (see following patch)
    o The way is now open to make further changes such as eliminating the two
    threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
    scheme.

    This patch has undergone extensive testing with various test suites
    so it should be pretty stable by now.

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
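
    A minimal sketch of the single-bit-lock idea mentioned in the list
    above (names and the bit number are assumed; this is not the real
    GFS2 code): one flag bit in the glock's flags word acts as a
    try-lock, replacing the old glmutex.

        #include <linux/bitops.h>
        #include <linux/types.h>

        #define EX_GLF_LOCK 1   /* assumed bit number within the flags word */

        /* Try to take the bit lock; returns true on success. */
        static bool example_gl_trylock(unsigned long *flags)
        {
                return !test_and_set_bit_lock(EX_GLF_LOCK, flags);
        }

        /* Release the bit lock (clear_bit_unlock provides release semantics). */
        static void example_gl_unlock(unsigned long *flags)
        {
                clear_bit_unlock(EX_GLF_LOCK, flags);
        }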
     

31 Mar, 2008

2 commits

  • We've previously been using a "try lock" in readpage on the basis that
    it would prevent deadlocks due to the inverted lock ordering (our normal
    lock ordering is glock first and then page lock). Unfortunately tests
    have shown that this isn't enough. If the glock has a demote request
    queued such that run_queue() in the glock code tries to do a demote when
    it is called under readpage, then it will try to write out all the dirty
    pages, which requires locking them. This then deadlocks with the page
    locked by readpage.

    The solution is to always require two calls into readpage (sketched
    after this entry). The first unlocks the page, gets the glock and
    returns AOP_TRUNCATED_PAGE; the second does the actual readpage and
    unlocks the glock & page as required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
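
    A sketch of the two-pass pattern described above. The helpers
    glock_held_by_caller(), acquire_glock_shared() and do_block_read()
    are placeholders standing in for the real GFS2 functions, so this is
    an illustration of the retry convention rather than the actual code.

        #include <linux/fs.h>
        #include <linux/pagemap.h>
        #include <linux/types.h>

        extern bool glock_held_by_caller(struct inode *inode);
        extern void acquire_glock_shared(struct inode *inode);
        extern int do_block_read(struct page *page);

        static int example_readpage(struct file *file, struct page *page)
        {
                struct inode *inode = page->mapping->host;

                if (!glock_held_by_caller(inode)) {
                        /* First call: we hold the page lock but not the glock,
                         * the inverse of the normal ordering.  Drop the page
                         * lock, take the glock, and ask the VFS to retry. */
                        unlock_page(page);
                        acquire_glock_shared(inode);
                        return AOP_TRUNCATED_PAGE;  /* VFS re-finds the page */
                }

                /* Second call: glock already held, so the ordering is normal;
                 * the glock and page are released as required afterwards. */
                return do_block_read(page);
        }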
     
  • gfs2_glock_hold() can now become static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Steven Whitehouse

    Adrian Bunk
     

08 Feb, 2008

1 commit

  • The gl_owner_pid field is used to get the holder task by its pid and to
    check whether current is the holder, so handle it properly, i.e. via
    struct pid * manipulations (sketched after this entry).

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Acked-by: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
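
    A sketch of struct pid based ownership tracking (the structure and
    helper names here are assumed; only gl_owner_pid comes from the text
    above):

        #include <linux/pid.h>
        #include <linux/sched.h>
        #include <linux/types.h>

        struct example_glock {
                struct pid *gl_owner_pid;   /* counted reference, not a raw pid_t */
        };

        static void example_set_owner(struct example_glock *gl)
        {
                gl->gl_owner_pid = get_pid(task_pid(current));  /* take a reference */
        }

        static bool example_owned_by_current(struct example_glock *gl)
        {
                return gl->gl_owner_pid == task_pid(current);   /* compare struct pid pointers */
        }

        static void example_clear_owner(struct example_glock *gl)
        {
                put_pid(gl->gl_owner_pid);  /* drop the reference */
                gl->gl_owner_pid = NULL;
        }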
     

10 Oct, 2007

2 commits

  • This patch adds a new flag to the gfs2_holder structure, GL_FLOCK.
    It is set on holders of glocks representing flocks. This flag is
    checked in add_to_queue() (sketched after this entry) and a process
    is permitted to queue more than one holder onto a glock if it is set.
    This solves the issue of a process not being able to do multiple
    flocks on the same file. Through a single descriptor, a process can
    now promote and demote flocks. Through multiple descriptors a process
    can now queue multiple flocks on the same file. There's still the
    problem of a process deadlocking itself (because gfs2 blocking locks
    are not interruptible) by queueing incompatible locks.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
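
    A simplified sketch of the add_to_queue() check described above
    (the structure layout and flag value are assumed; only the GL_FLOCK
    name comes from the text):

        #include <linux/types.h>

        #define GL_FLOCK 0x100      /* assumed flag value */

        struct example_holder {
                pid_t gh_owner;
                unsigned int gh_flags;
        };

        /* May this holder be queued alongside an existing holder already
         * on the same glock's queue? */
        static bool example_may_queue(const struct example_holder *gh,
                                      const struct example_holder *existing)
        {
                if (gh->gh_owner != existing->gh_owner)
                        return true;    /* different processes: no self-queueing issue */
                /* Same process: only permitted when both holders are flocks. */
                return (gh->gh_flags & GL_FLOCK) && (existing->gh_flags & GL_FLOCK);
        }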
     
  • We only need a single gfs2_scand process rather than the one
    per filesystem which we had previously. As a result the parameter
    determining the frequency of gfs2_scand runs becomes a module
    parameter rather than a mount parameter as it was before (a sketch
    of the parameter declaration follows this entry).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
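
    A sketch of how such a module parameter is declared (the parameter
    name and default value are assumed):

        #include <linux/module.h>
        #include <linux/moduleparam.h>

        static unsigned int scand_secs = 5;     /* assumed default scan interval */
        module_param(scand_secs, uint, 0644);
        MODULE_PARM_DESC(scand_secs,
                         "Seconds between glock scans (module-wide, not per mount)");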
     

09 Jul, 2007

1 commit

  • There were two issues during deallocation of unlinked inodes. The
    first related to the use of a "try" lock: in the case of the inode
    lock it wasn't trying hard enough to deallocate in all circumstances
    (now changed to a normal glock), and in the case of the iopen lock it
    didn't wait for the demotion of the shared lock before attempting to
    get the exclusive lock, thereby sometimes (depending on timing)
    failing to complete the deallocation when it should have done.

    The second issue was the lack of a way to invalidate dcache entries
    on remote nodes, which meant that unlinks were taking a long time to
    return disk space to the fs. This patch adds code to invalidate the
    dcache entries across the cluster for unlinked inodes, which fixes
    that as well.

    This patch was written jointly by Abhijith Das and Steven Whitehouse.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     

22 May, 2007

1 commit

  • The first thing mm.h does is include sched.h, solely for the can_do_mlock()
    inline function which has a "current" dereference inside. By dealing with
    can_do_mlock(), mm.h can be detached from sched.h, which is good. See
    below for why.

    This patch
    a) removes the unconditional inclusion of sched.h from mm.h
    b) makes can_do_mlock() a normal function in mm/mlock.c (sketched after
    this entry)
    c) exports can_do_mlock() so as not to break compilation
    d) adds sched.h inclusions back to files that were getting it indirectly
    e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that
    were getting them indirectly

    Net result is:
    a) mm.h users would get less code to open, read, preprocess, parse, ... if
    they don't need sched.h
    b) sched.h stops being a dependency for a significant number of files:
    on x86_64 allmodconfig, touching sched.h results in a recompile of 4083
    files; after the patch it's only 3744 (-8.3%).

    Cross-compile tested on

    all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
    alpha alpha-up
    arm
    i386 i386-up i386-defconfig i386-allnoconfig
    ia64 ia64-up
    m68k
    mips
    parisc parisc-up
    powerpc powerpc-up
    s390 s390-up
    sparc sparc-up
    sparc64 sparc64-up
    um-x86_64
    x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
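
    After the change, can_do_mlock() ends up as an ordinary exported
    function in mm/mlock.c, roughly along these lines (a sketch based on
    kernels of that era, not copied verbatim):

        #include <linux/capability.h>
        #include <linux/mm.h>
        #include <linux/module.h>
        #include <linux/sched.h>

        int can_do_mlock(void)
        {
                if (capable(CAP_IPC_LOCK))
                        return 1;
                if (current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur != 0)
                        return 1;
                return 0;
        }
        EXPORT_SYMBOL(can_do_mlock);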
     

01 May, 2007

3 commits

  • In testing the previously posted and accepted patch for
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
    I uncovered some gfs2 badness. It turns out that the current
    gfs2 code saves off a process pointer when a glock is taken,
    in both the glock and glock holder structures. Those
    structures will persist in memory long after the process has
    ended, leaving pointers to poisoned memory.

    This problem isn't caused by the 228540 fix; the new capability
    introduced by the fix just uncovered the problem.

    I wrote this patch that avoids saving process pointers
    and instead saves off the process pid. Rather than
    referencing the bad pointers, it now does process lookups.
    There is special code that makes the output nicer for
    printing holder information for processes that have ended.

    This patch also adds a stub for the new "sprint_symbol"
    function that exists in Andrew Morton's -mm patch set, but
    won't go into the base kernel until 2.6.22, since it adds
    functionality but doesn't fix a bug.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Robert Peterson
     
  • This patch prevents the printing of a warning message in cases where
    the fs is functioning normally by handing off responsibility for
    unlinked, but still open inodes, to another node for eventual deallocation.
    Also, there is now an improved system for ensuring that such requests
    to other nodes do not get lost. The callback on the iopen lock is
    only ever called when i_nlink == 0 and when a node is unable to deallocate
    it due to it still being in use on another node. When a node receives
    the callback therefore, it knows that i_nlink must be zero, so we mark
    it as such (in gfs2_drop_inode) in order that it will then attempt
    deallocation of the inode itself.

    As an additional benefit, queuing a demote request no longer requires
    a memory allocation. This simplifies the code for dealing with gfs2_holders
    as it removes one special case.

    There are two new fields in struct gfs2_glock (sketched after this
    entry): gl_demote_state is the state which the remote node has
    requested and gl_demote_time is the time when the request came in.
    Both fields are only valid when the GLF_DEMOTE flag is set in gl_flags.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
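
    The relationship between the new fields, sketched (member types are
    assumed and all other members are omitted):

        struct gfs2_glock {
                /* ... other members omitted ... */
                unsigned long gl_flags;         /* GLF_DEMOTE set while a remote demote is pending */
                unsigned int gl_demote_state;   /* state the remote node has asked for */
                unsigned long gl_demote_time;   /* when the demote request came in (jiffies) */
        };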
     
  • The attached patch resolves bz 228540. This adds the capability
    for gfs2 to dump gfs2 locks through the debugfs file system (the
    usual debugfs plumbing for such a file is sketched after this entry).
    This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
    gfs2 because all the ioctls were stripped out. Please see the bugzilla
    for more history about the fix. This patch is also attached to the
    bugzilla record.

    The patch is against Steve Whitehouse's latest nmw git tree kernel
    (2.6.21-rc1) and has been tested on system trin-10.

    Signed-off-by: Robert Peterson
    Signed-off-by: Steven Whitehouse

    Robert Peterson
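
    The usual debugfs plumbing for such a dump file looks roughly like
    this (a sketch using the standard seq_file single_open() helpers;
    the names and file layout are assumed, not the real GFS2 code):

        #include <linux/debugfs.h>
        #include <linux/module.h>
        #include <linux/seq_file.h>

        static int example_glocks_show(struct seq_file *m, void *unused)
        {
                /* Walk the glock hash here, one line per glock/holder. */
                seq_printf(m, "glock dump would go here\n");
                return 0;
        }

        static int example_glocks_open(struct inode *inode, struct file *file)
        {
                return single_open(file, example_glocks_show, inode->i_private);
        }

        static const struct file_operations example_glocks_fops = {
                .owner   = THIS_MODULE,
                .open    = example_glocks_open,
                .read    = seq_read,
                .llseek  = seq_lseek,
                .release = single_release,
        };

        /* Called at mount time to create e.g. <debugfs>/gfs2/<fsname>/glocks. */
        static void example_create_debugfs_file(struct dentry *parent, void *fs_data)
        {
                debugfs_create_file("glocks", 0444, parent, fs_data,
                                    &example_glocks_fops);
        }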
     

06 Feb, 2007

4 commits

  • This patch doesn't make any changes to the ordering of the various
    operations related to glocking, but it does tidy up the calls to the
    glops.c functions to make the structure more obvious.

    The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
    made static within glock.c since they are called by every set of glock
    operations. The xmote_th and drop_th glock operations are then made
    conditional upon those two routines existing and called from the
    previously mentioned functions in glock.c respectively.

    Also it can be seen that the go_sync operation isn't needed since it can
    easily be replaced by calls to xmote_bh and drop_bh respectively. This
    results in no longer (confusingly) calling back into routines in glock.c
    from glops.c and also reducing the glock operations by one member.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Here is a patch for GFS2 to remove the local exclusive flag. In
    the places it was used, mutexes are always held earlier in the
    call path, so it appears redundant in the LM_ST_SHARED case.

    Also, the GFS2 holders were setting local exclusive in any case where
    the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
    code where the flag was tested have been replaced with tests for the
    lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
    same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
    as globally exclusive).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The "greedy" code was an attempt to retain glocks for a minimum length
    of time when they relate to mmap()ed files. The current implementation
    of this feature is not, however, ideal in that it required allocating
    memory in order to do this and its overly complicated.

    It also misses the mark by ignoring the other I/O operations which are
    just as likely to suffer from the same problem. So the plan is to remove
    this now and then add the functionality back as part of the glock state
    machine at a later date (and thus take into account all the possible
    users of this feature)

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This removes the extra filldir callback which gfs2 was using to
    enclose an attempt at readahead for inodes during readdir. The
    code was too complicated and also hurt performance badly in the
    case where the getdents64/readdir call isn't followed by stat(),
    and it wasn't even getting it right all the time when it was.

    As a result, on my test box an "ls" of a directory containing 250000
    files fell from about 7 mins (freshly mounted, so nothing cached) to
    between about 15 and 25 seconds. When the directory content was cached,
    the time taken fell from about 3 mins to about 4 or 5 seconds.

    Interestingly, in the cached case, running "ls -l" once reduced the time
    taken for subsequent runs of "ls" to about 6 secs even without this
    patch. Now it turns out that there was a special case of glocks being
    used for prefetching the metadata, but because of the timeouts for these
    locks (set to 10 secs) the metadata was being timed out before it was
    used, and thus the prefetch code was constantly trying to prefetch
    the same data over and over.

    Calling "ls -l" meant that the inodes were brought into memory and once
    the inodes are cached, the glocks are not disposed of until the inodes
    are pushed out of the cache, thus extending the lifetime of the glocks,
    and thus bringing down the time for subsequent runs of "ls"
    considerably.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

30 Nov, 2006

3 commits

  • This fixes a bug which resulted in poor performance due to flushing
    the journal too often. The code path in question was via the inode_go_sync()
    function in glops.c. The solution is not to flush the journal immediately
    when inodes are ejected from memory, but to batch up the work for glockd to
    deal with later on. This means that glocks may now live on beyond the end of
    the lifetime of their inodes (but not very much longer in the normal case).

    Also fixed in this patch is a bug (which was hidden by the bug mentioned
    above) in the calculation of the number of free journal blocks.

    The gfs2_logd process has been altered to be more responsive to the journal
    filling up. We now wake it up when the number of uncommitted journal blocks
    has reached the threshold level, rather than trying to flush directly at the
    end of each transaction (sketched after this entry). This again means doing
    fewer, but larger, log flushes in general.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
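
    A sketch of the threshold-based wakeup described above (structure
    and field names are assumed):

        #include <linux/atomic.h>
        #include <linux/sched.h>

        struct example_sbd {
                atomic_t uncommitted_blocks;    /* journal blocks not yet flushed */
                unsigned int logd_threshold;    /* wake the daemon at this point */
                struct task_struct *logd_task;  /* the gfs2_logd style daemon */
        };

        static void example_transaction_end(struct example_sbd *sdp, unsigned int blocks)
        {
                atomic_add(blocks, &sdp->uncommitted_blocks);

                /* Don't flush here; let the daemon batch the work instead. */
                if (atomic_read(&sdp->uncommitted_blocks) >= sdp->logd_threshold)
                        wake_up_process(sdp->logd_task);
        }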
     
  • This fixes a race between the glock and the page lock encountered
    during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages
    function doesn't need the same fix since it only uses a try lock anyway, so
    it will fall back to gfs2_readpage in the case of a potential deadlock.

    This bug was spotted by Russell Cattelan.

    Cc: Russell Cattelan
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • There is no way to set the GL_DUMP flag, and in any case the
    same thing can be done with systemtap if required for debugging,
    so this removes it.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

08 Sep, 2006

3 commits

  • As requested by Jan Engelhardt, this removes the typedefs in the
    locking module interface and replaces them with void *. Also
    since we are changing the interface, I've added a few consts
    as well.

    Cc: Jan Engelhardt
    Cc: David Teigland
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This removes one of the typedefs from the locking interface. It
    is replaced by a forward declaration of the gfs2 superblock (the
    pattern is sketched after this entry). The other two are not so
    easy to solve since, in their case, they can refer to one of two
    possible structures.

    Cc: David Teigland
    Cc: Jan Engelhardt
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
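
    The pattern in question, sketched (the structure and function names
    are assumed): a forward declaration lets the locking interface take
    a typed superblock pointer without the header pulling in the full
    definition.

        /* In the locking interface header: the forward declaration is
         * sufficient for pointer parameters. */
        struct gfs2_sbd;

        int example_lm_mount(struct gfs2_sbd *sdp, const char *table_name);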
     
  • There are several reasons why we want to do this:
    - Firstly it's large and thus we'll scale better with multiple
    GFS2 filesystems mounted at the same time
    - Secondly it's easier to scale its size as required (that's a plan
    for later patches)
    - Thirdly, we can use kzalloc rather than vmalloc when allocating
    the superblock (it's now only 4888 bytes)
    - Fourthly, it's all part of my plan to eventually be able to use RCU
    with the glock hash.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

01 Sep, 2006

1 commit

  • As per comments from Jan Engelhardt this
    updates the copyright message to say "version" in full rather than
    "v.2". Also incore.h has been updated to remove forward structure
    declarations which are not required.

    The gfs2_quota_lvb structure has now had endianness annotations added
    to it. Also quota.c has been updated so that we now store the
    lvb data locally in an endian-independent format (sketched after this
    entry) to avoid needing a structure in host endianness too. As a result
    the endianness conversions are done as required at various points and
    thus the conversion routines in lvb.[ch] are no longer required. I've
    moved the one remaining constant in lvb.h that's used into lm.h
    and removed the now unused lvb.[ch].

    I have not changed the HIF_ constants. That is left to a later patch
    which I hope will unify the gh_flags and gh_iflags fields of the
    struct gfs2_holder.

    Cc: Jan Engelhardt
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
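
    A sketch of the endian-independent storage idea (structure and field
    names are assumed): the LVB contents are kept in a fixed big-endian
    layout and converted only at the point of use, so no host-endian copy
    of the structure is needed.

        #include <asm/byteorder.h>
        #include <linux/types.h>

        struct example_quota_lvb {
                __be32 qb_magic;
                __be32 qb_pad;
                __be64 qb_limit;        /* always stored big-endian */
                __be64 qb_warn;
                __be64 qb_value;
        };

        static u64 example_quota_limit(const struct example_quota_lvb *lvb)
        {
                return be64_to_cpu(lvb->qb_limit);      /* convert when reading */
        }

        static void example_quota_set_value(struct example_quota_lvb *lvb, u64 value)
        {
                lvb->qb_value = cpu_to_be64(value);     /* convert when writing */
        }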
     

15 Jun, 2006

1 commit

  • This patch fixes the way we have been dealing with unlinked,
    but still open, files. It removes all limits (other than memory
    for inodes, as per every other filesystem) on the number of these
    which we can support on GFS2. It also means that (like other
    filesystems) it's the responsibility of the last process to close
    the file to deallocate the storage, rather than the person who did
    the unlinking. Note that with GFS2, those two events might take
    place on different nodes.

    Also there are a number of other changes:

    o We use the Linux inode subsystem as it was intended to be
    used, wrt allocating GFS2 inodes
    o The Linux inode cache is now the point which we use for
    local enforcement of only holding one copy of the inode in
    core at once (previously we used the glock layer).
    o We no longer use the unlinked "special" file. We just ignore it
    completely. This makes unlinking more efficient.
    o We now use the 4th block allocation state (sketched after this
    entry). The previously unused state is used to track unlinked but
    still open inodes.
    o gfs2_inoded is no longer needed
    o Several fields are now no longer needed (and removed) from the in
    core struct gfs2_inode
    o Several fields are no longer needed (and removed) from the in core
    superblock

    There are a number of future possible optimisations and clean ups
    which have been made possible by this patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
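
    The four per-block bitmap states referred to above, sketched as an
    enum (the names and values here are illustrative, not necessarily
    the on-disk constants):

        enum example_block_state {
                EX_BLKST_FREE     = 0,  /* unallocated */
                EX_BLKST_USED     = 1,  /* ordinary allocated block */
                EX_BLKST_UNLINKED = 2,  /* previously unused: unlinked but still open inode */
                EX_BLKST_DINODE   = 3,  /* allocated on-disk inode */
        };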
     

28 Apr, 2006

1 commit

  • This patch contains the following possible cleanups:
    - make needlessly global code static
    - #if 0 unused functions
    - remove the following global function that was both unused and
    unimplemented:
    - super.c: gfs2_do_upgrade()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Steven Whitehouse

    Adrian Bunk
     

18 Apr, 2006

1 commit

  • When allocating memory to sort directory entries, use vmalloc()
    rather than kmalloc(), since for larger directories the required
    size can easily be greater than the 128k maximum of kmalloc()
    (the change is sketched after this entry).

    Also this adds the first steps towards getting the AOP_TRUNCATED_PAGE
    return code used in the glock code, by flagging all places where we
    request a glock while holding a page lock.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
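
    A sketch of the allocation change (helper names assumed): the sort
    buffer comes from vmalloc(), which is not subject to the kmalloc()
    size cap, and is released with the matching vfree().

        #include <linux/types.h>
        #include <linux/vmalloc.h>

        static void *example_alloc_sort_buffer(size_t size)
        {
                /* Large directories can need well over the ~128k kmalloc()
                 * limit; virtually contiguous memory is fine for sorting. */
                return vmalloc(size);
        }

        static void example_free_sort_buffer(void *buf)
        {
                vfree(buf);
        }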
     

30 Mar, 2006

1 commit

  • Update the debugging code in trans.c and at the same time improve
    the debugging code for gfs2_holders. The new code should be pretty
    fast during the normal case and provide just as much information
    in case of errors (or more).

    One small function from glock.c has moved to glock.h as a static inline so
    that its return address won't get in the way of the debugging (the reason
    is sketched after this entry).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
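
    The reason for the static inline, sketched (function and field names
    are assumed): the wrapper is inlined into its caller, so the address
    recorded for debugging is the real call site rather than a line
    inside the wrapper.

        struct example_holder {
                unsigned long gh_ip;    /* call site recorded for debugging */
        };

        /* Ordinary function: records the address of whoever called it. */
        void example_holder_init(struct example_holder *gh)
        {
                gh->gh_ip = (unsigned long)__builtin_return_address(0);
        }

        /* Kept as a static inline in the header: since it has no frame of
         * its own, the address recorded above belongs to the real caller. */
        static inline void example_holder_reinit(struct example_holder *gh)
        {
                example_holder_init(gh);
        }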
     
