25 Jan, 2008

40 commits

  • I spotted this bug while I was digging around. Looks like it could cause
    a lockup in some rare error condition.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • There was a bug in the truncation/invalidation race path for
    ->page_mkwrite for gfs2. It ought to return 0 so that the effect is the
    same as if the page was truncated at any of the other points at which
    the page_lock is dropped. This will result in the restart of the whole
    page fault path. If it was due to a real truncation (as opposed to an
    invalidate because we let a glock go) then the ->fault path will pick
    that up when it gets called again.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes a minor typo. Surprisingly, it still compiled.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • The comparison was being made against the wrong quantity.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is a small I/O performance enhancement to gfs2. (Actually, it is a rework of
    an earlier version I got wrong). The idea here is to check if the write extends
    past the last block in the file. If so, the function can save itself a lot of
    time and trouble because it knows an allocate will be required. Benchmarks like
    iozone should see better performance.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch removes a vestigial variable "i_spin" from the gfs2_inode
    structure. This not only saves us memory (>300000 of these in memory
    for the oom test) it also saves us time because we don't have to
    spend time initializing it (i.e. slightly better performance).

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • It is possible to reduce the size of GFS2 inodes by taking the i_alloc
    structure out of the gfs2_inode. This patch allocates the i_alloc
    structure whenever its needed, and frees it afterward. This decreases
    the amount of low memory we use at the expense of requiring a memory
    allocation for each page or partial page that we write. A quick test
    with postmark shows that the overhead is not measurable and I also note
    that OCFS2 use the same approach.

    In the future I'd like to solve the problem by shrinking down the size
    of the members of the i_alloc structure, but for now, this reduces the
    immediate problem of using too much low-memory on x86 and doesn't add
    too much overhead.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Although the values were all being calculated correctly, there was a
    race in the assert due to the way it was using atomic variables. This
    changes the value we assert on so that we get the same effect by testing
    a different variable. This prevents the assert triggering when it shouldn't.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes a couple of problems which affected the execution of files
    on GFS2. The first is that there was a corner case where inodes were not
    always uptodate at the point at which permissions checks were being carried
    out, this was resulting in refusal of execute permission, but only on the
    first lookup, subsequent requests worked correctly. The second was a problem
    relating to incorrect updating of file sizes which was introduced with the
    write_begin/end code for GFS2 a little while back.

    Signed-off-by: Steven Whitehouse
    Cc: Abhijith Das

    Steven Whitehouse
     
  • Here is a patch for the latest upstream GFS2 code:
    The journal extent map needs to be initialized sooner than it
    currently is. Otherwise failed mount attempts (e.g. not enough
    journals, etc.) may panic trying to access the uninitialized list.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • To improve performance on NUMA, we use the VM's standard page
    migration for writeback and ordered pages. Probably we could
    also do the same for journaled data, but that would need a
    careful audit of the code, so will be the subject of a later
    patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The go_drop_th function is never called or referenced.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • A missing offset in the calculation.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is a small correction to my previously posted patch1.
    It just changes a divide to a shift. It's faster and doesn't
    introduce odd dependencies on 32-bit compiles.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch eliminates the unneeded sd_statfs_mutex mutex but preserves
    the ordering as discussed.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch optimizes function gfs2_meta_read. Basically, gfs2_meta_wait
    was being called regardless of whether a disk read was requested.
    This just pulls that wait into the if that triggers the read.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Function gfs2_block_map was often looking up the disk inode twice.
    This optimizes it so that only does it once.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch optimizes the function gfs2_glmutex_lock.
    The basic theory is: Why bother initializing a holder, setting up
    wait bits and then waiting on them, if you know the glock can be
    yours. So the holder stuff is placed inside the if checking if the
    glock is locked. This one needs careful scrutiny because changing
    anything to do with locking should strike terror into one's heart.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • I eliminated the passing of an unused parameter into gfs2_bitfit called rgd.

    This also changes the gfs2_bitfit code that searches for free (or used) blocks.
    Before, the code was trying to check for bytes that indicated 4 blocks in
    the undesired state. The problem is, it was spending more time trying to
    do this than it actually was saving. This version only optimizes the case
    where we're looking for free blocks, and it checks a machine word at a time.
    So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit
    machines, it will check 64-bits (32 blocks) at a time. The compiler
    optimizes that quite well and we save some time, especially when running
    through full bitmaps (like the bitmaps allocated for the journals).

    There's probably a more elegant or optimized way to do this, but I haven't
    thought of it yet. I'm open to suggestions.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This just eliminates an unused variable from the quota code.
    Not likely to be a time saver.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch saves a little time when gfs2 writes to the journals by
    keeping a mapping between logical and physical blocks on disk.
    That's better than constantly looking up indirect pointers in
    buffers, when the journals are several levels of indirection
    (which they typically are).

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch is just a cleanup. Function gfs2_get_block() just calls
    function gfs2_block_map reversing the last two parameters. By
    reversing the parameters, gfs2_block_map() may be called directly
    and function gfs2_get_block may be eliminated altogether.
    Since this function is done for every block operation,
    this streamlines the code and makes it a little bit more efficient.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • The fl_owner is that of lockd when posix locks arrive from nfs
    clients, so it can't be used to distinguish between lock holders.
    Use fl_pid as owner instead; it's the pid of the process on the
    nfs client.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • A certain scenario in the rename code path triggers a kernel BUG()
    because it accidentally does recursive locking The first lock is
    requested to unlink an already existing inode (replacing a file) and the
    second lock is requested when the destination directory needs to alloc
    some space. It is rare that these two
    events happen during the same rename call, and even more rare that these
    two instances try to lock the same rgrp. It is, however, possible.
    https://bugzilla.redhat.com/show_bug.cgi?id=404711

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     
  • GFS2 supports two modes of locking - lock_nolock for single node filesystem
    and lock_dlm for cluster mode locking. The gfs2 lock methods are removed from
    file operation table for lock_nolock protocol. This would allow VFS to handle
    posix lock and flock logics just like other in-tree filesystems without
    duplication.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     
  • Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio M. Di Nitto
     
  • Hi Steven,

    Steven Whitehouse wrote:
    > Hi,
    >
    > Now in the -nmw git tree. Thanks,
    >
    > Steve.
    >
    > On Wed, 2007-11-21 at 11:54 -0600, Ryan O'Hara wrote:

    this patch introduces a bunch of build warnings by leaving around

    struct inode *inode = &ip->i_inode;

    The patch in attachment cleans them up. Please apply.

    Signed-off-by: Fabio Massimo Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     
  • Remove read/write permission() checks from xattr operations.
    VFS layer is already handling permission for xattrs via the
    xattr_permission() call, so there is no need for gfs2 to
    check permissions. Futhermore, using permission() for SELinux
    xattrs ops is incorrect.

    Signed-off-by: Ryan O'Hara
    Signed-off-by: Steven Whitehouse

    Ryan O'Hara
     
  • The issue is indeed UP vs SMP and it is totally random.

    spin_is_locked() is a bad assertion because there is no correct answer on UP.
    on UP spin_is_locked() has to return either one value or another, always.

    This means that in my setup I am lucky enough to trigger the issue and your you
    are lucky enough not to.

    the patch in attachment removes the bogus calls to BUG_ON and according to David
    (in CC and thanks for the long explanation on the problem) we can rely upon
    things like lockdep to find problem that might be trying to catch.

    Signed-off-by: Fabio M. Di Nitto
    Cc: David S. Miller
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     
  • Print error with log_error() to be consistent with others.

    Signed-off-by: David Teigland
    Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • The patch is a fix to abort mount if the mount.gfs* and possible
    umount.* are missing from /sbin.

    While we do what we can to guarantee that they are installed properly in
    userland (CVS HEAD), we want to make sure that mount still aborts properly.

    The only sign of missing helpers is that lock_dlm will receive no mount options
    at all. According to David the problem does not exist for lock_nolock as the
    helpers are not required.

    The patch has been tested for both gfs and gfs2 and it works as expected. The
    lack of mount.gfs* will generate an error that is propagated to mount:

    oot@node1:~# mount -t gfs2 /dev/nbd2 /mnt/
    mount: wrong fs type, bad option, bad superblock on /dev/nbd2,
    missing codepage or helper program, or other error
    In some cases useful info is found in syslog - try
    dmesg | tail or so

    [ 3513.303346] GFS2: fsid=: Trying to join cluster "lock_dlm", "gutsy:gfs2"
    [ 3513.304546] DLM/GFS2/GFS ERROR: (u)mount helpers are not installed properly!
    [ 3513.306290] GFS2: fsid=: can't mount proto=lock_dlm, table=gutsy:gfs2, hostdata=

    You might want to notice that it will also avoid mount to hang or fail silently
    or with strange errors that will require the cluster to reboot/restart before
    you can actually mount the filesystem again.

    Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     
  • We only care about the content of the jindex in two cases,
    one is when we mount the fs and the other is when we need
    to recover another journal. In both cases we have to update
    the jindex anyway, so there is no point in updating it
    periodically between times, so this removes it to simplify
    gfs2_logd.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This means that we can mark gfs2_ail1_empty static and prepares
    the way for further changes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch changes the counter which keeps track of the free
    blocks in the journal to an atomic_t in preparation for the
    following patch which will update the log reservation code.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The only reason for adding glocks to the journal was to keep track
    of which locks required a log flush prior to release. We add a
    flag to the glock to allow this check to be made in a simpler way.

    This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
    and means that we can avoid extra work during the journal flush.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Use wait_event_interruptible() in the lock_dlm thread instead
    of an open coded equivalent, and include a kthread_should_stop()
    check in the wait test so we don't miss a kthread_stop().

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • This patch changes the /sys/fs/gfs2//id file to give the device
    id "major:minor" rather than the s_id. That enables gfs2_tool to
    match devices properly (by id, not name) when locating the tuning files.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
    depending upon which of the two waiters lists they were going to
    be queued upon. They were then tested when the holders were taken
    off the lists to ensure that the right type of holder was being
    dequeued.

    Since we are already using separate lists, there doesn't seem a
    lot of point having these flags as well, and since setting them
    and testing them is in the fast path for locking and unlocking
    glock, this patch removes them.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse