04 Jan, 2012

3 commits

  • These new callbacks notify the dlm user about lock recovery.
    GFS2, and possibly others, need to be aware of when the dlm
    will be doing lock recovery for a failed lockspace member.

    In the past, this coordination has been done between dlm and
    file system daemons in userspace, which then direct their
    kernel counterparts. These callbacks allow the same
    coordination directly, and more simply.

    Signed-off-by: David Teigland

    David Teigland
     
  • Slot numbers are assigned to nodes when they join the lockspace.
    The slot number chosen is the minimum unused value starting at 1.
    Once a node is assigned a slot, that slot number will not change
    while the node remains a lockspace member. If the node leaves
    and rejoins it can be assigned a new slot number.

    A new generation number is also added to a lockspace. It is
    set and incremented during each recovery along with the slot
    collection/assignment.

    The slot numbers will be passed to gfs2, which will use them as
    journal ids.

    Signed-off-by: David Teigland

    David Teigland
     
  • Put all the calls to recovery barriers in the same function
    to clarify where they each happen. Should not change any behavior.
    Also modify some recovery debug lines to make them consistent.

    Signed-off-by: David Teigland

    David Teigland
     
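
The recovery-callback and slot commits above together define an
in-kernel coordination interface. A minimal sketch of its shape,
with names and signatures inferred from the commit text rather than
quoted from the patches:

    #include <linux/bitops.h>
    #include <linux/types.h>

    struct dlm_slot {
            int nodeid;
            int slot;
    };

    struct dlm_lockspace_ops {
            /* lock recovery for a failed member is about to start */
            void (*recover_prep)(void *ospace);
            /* called for each failed member, identifying its slot */
            void (*recover_slot)(void *ospace, struct dlm_slot *slot);
            /* recovery done; gfs2 maps slot numbers to journal ids,
             * and 'generation' increments with every recovery */
            void (*recover_done)(void *ospace, struct dlm_slot *slots,
                                 int num_slots, int our_slot,
                                 uint32_t generation);
    };

    /* slot choice rule from the second commit: the minimum unused
     * value, starting at 1 */
    static int choose_slot(const unsigned long *used_slots, int max_slots)
    {
            return find_next_zero_bit(used_slots, max_slots, 1);
    }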

19 Nov, 2011

1 commit

  • Change the linked lists to rb_tree's in the rsb
    hash table to speed up searches. Slow rsb searches
    were having a large impact on gfs2 performance due
    to the large number of dlm locks gfs2 uses.

    Signed-off-by: Bob Peterson
    Signed-off-by: David Teigland

    Bob Peterson
     
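
A generic sketch of the lookup this commit speeds up, using the
kernel rb_tree API; the rsb field names and the comparison are
illustrative, not lifted from the patch:

    #include <linux/rbtree.h>
    #include <linux/string.h>

    struct rsb {
            struct rb_node res_node;   /* illustrative fields */
            int res_length;
            char res_name[64];
    };

    /* O(log n) search replacing a linear walk of a hash chain */
    static struct rsb *rsb_search(struct rb_root *root,
                                  const char *name, int len)
    {
            struct rb_node *node = root->rb_node;

            while (node) {
                    struct rsb *r = rb_entry(node, struct rsb, res_node);
                    int rc = len - r->res_length;

                    if (!rc)
                            rc = memcmp(name, r->res_name, len);
                    if (rc < 0)
                            node = node->rb_left;
                    else if (rc > 0)
                            node = node->rb_right;
                    else
                            return r;
            }
            return NULL;
    }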

26 Jul, 2011

1 commit

  • * 'for-3.1' of git://linux-nfs.org/~bfields/linux:
    nfsd: don't break lease on CLAIM_DELEGATE_CUR
    locks: rename lock-manager ops
    nfsd4: update nfsv4.1 implementation notes
    nfsd: turn on reply cache for NFSv4
    nfsd4: call nfsd4_release_compoundargs from pc_release
    nfsd41: Deny new lock before RECLAIM_COMPLETE done
    fs: locks: remove init_once
    nfsd41: check the size of request
    nfsd41: error out when client sets maxreq_sz or maxresp_sz too small
    nfsd4: fix file leak on open_downgrade
    nfsd4: remember to put RW access on stateid destruction
    NFSD: Added TEST_STATEID operation
    NFSD: added FREE_STATEID operation
    svcrpc: fix list-corrupting race on nfsd shutdown
    rpc: allow autoloading of gss mechanisms
    svcauth_unix.c: quiet sparse noise
    svcsock.c: include sunrpc.h to quiet sparse noise
    nfsd: Remove deprecated nfsctl system call and related code.
    NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND

    Fix up trivial conflicts in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     

21 Jul, 2011

1 commit

  • Both the filesystem and the lock manager can associate operations with a
    lock. Confusingly, one of them (fl_release_private) actually has the
    same name in both operation structures.

    It would save some confusion to give the lock-manager ops different
    names.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
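
A before/after sketch of the collision and the rename; the field
lists are abridged and illustrative, with only fl_release_private
named in the commit text:

    struct file_lock;

    /* Before: both ops structures had an fl_release_private method. */
    struct file_lock_operations {
            void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
            void (*fl_release_private)(struct file_lock *);
    };

    /* After: lock-manager ops take an lm_ prefix instead. */
    struct lock_manager_operations {
            void (*lm_break)(struct file_lock *);
            void (*lm_release_private)(struct file_lock *);
            /* ...the remaining ops are renamed the same way... */
    };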

16 Jul, 2011

1 commit

  • Instead of creating our own kthread (dlm_astd) to deliver
    callbacks for all lockspaces, use a per-lockspace workqueue
    to deliver the callbacks. This eliminates complications and
    slowdowns from many lockspaces sharing the same thread.

    Signed-off-by: David Teigland

    David Teigland
     
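
A sketch of the per-lockspace model described above; the struct and
function names are illustrative, and an ordered workqueue is assumed
so that callbacks stay in sequence:

    #include <linux/workqueue.h>

    struct my_lockspace {
            struct workqueue_struct *callback_wq;
    };

    static int ls_callback_start(struct my_lockspace *ls, const char *name)
    {
            /* one queue per lockspace: callback delivery for one
             * lockspace can no longer stall behind another's */
            ls->callback_wq = alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM,
                                                      name);
            if (!ls->callback_wq)
                    return -ENOMEM;
            return 0;
    }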

13 Jul, 2011

1 commit

  • By pre-allocating rsb structs before searching the hash
    table, they can be inserted immediately. This avoids
    always having to repeat the search when adding the struct
    to the hash list.

    This also adds space to the rsb struct for a max resource
    name, so an rsb allocation can be used by any request.
    The constant size also allows us to finally use a slab
    for the rsb structs.

    Signed-off-by: David Teigland

    David Teigland
     
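
The shape of the allocate-before-search pattern, sketched with
assumed helper names; the fixed-size name buffer is what makes a
slab cache possible:

    #include <linux/slab.h>
    #include <linux/spinlock.h>
    #include <linux/rbtree.h>

    /* assumed stand-ins for the real dlm cache and helpers */
    extern struct kmem_cache *rsb_cache;
    struct rsb;
    extern struct rsb *rsb_search(struct rb_root *root,
                                  const char *name, int len);
    extern void rsb_insert(struct rb_root *root, struct rsb *r);

    static struct rsb *find_or_create_rsb(struct rb_root *root,
                                          spinlock_t *lock,
                                          const char *name, int len)
    {
            struct rsb *r;
            struct rsb *new_r = kmem_cache_zalloc(rsb_cache, GFP_NOFS);

            if (!new_r)
                    return NULL;

            spin_lock(lock);
            r = rsb_search(root, name, len);
            if (!r) {
                    /* miss: insert immediately, no repeated search */
                    rsb_insert(root, new_r);
                    r = new_r;
                    new_r = NULL;
            }
            spin_unlock(lock);

            if (new_r)
                    kmem_cache_free(rsb_cache, new_r); /* found existing */
            return r;
    }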

11 Jul, 2011

3 commits

  • This is simpler and quicker than the hash table, and
    avoids needing to search the hash list for every new
    lkid to check if it's used.

    Signed-off-by: David Teigland

    David Teigland
     
  • The gfp and size args were switched.

    Signed-off-by: David Teigland

    David Teigland
     
  • In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small
    issues:

    1) There's no need to test the return value of the allocation and do a
    memset if it succeeds. Just use kzalloc() to obtain zeroed memory.

    2) Since kfree() handles NULL pointers gracefully, the test of
    'warned' against NULL before the kfree() after the loop is completely
    pointless. Remove it.

    3) The arguments to kmalloc() (now kzalloc()) were swapped. Thanks to
    Dr. David Alan Gilbert for pointing this out.

    Signed-off-by: Jesper Juhl
    Signed-off-by: David Teigland

    Jesper Juhl
     
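
The second and third commits above come down to the same two
allocation rules; a minimal sketch ('warned' and 'num' are
illustrative stand-ins for the dlm_scan_waiters() locals):

    #include <linux/slab.h>

    static void alloc_rules_example(int num)
    {
            int *warned;

            /* kmalloc/kzalloc take (size, gfp flags) -- in that
             * order; the fixed bug had the arguments swapped */
            warned = kzalloc(num * sizeof(int), GFP_KERNEL);
            if (!warned)
                    return;
            /* no memset needed: kzalloc returns zeroed memory */

            /* ... use warned[] ... */

            /* kfree(NULL) is a no-op, so no NULL test is needed */
            kfree(warned);
    }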

27 May, 2011

1 commit

  • * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
    gfs2: Drop __TIME__ usage
    isdn/diva: Drop __TIME__ usage
    atm: Drop __TIME__ usage
    dlm: Drop __TIME__ usage
    wan/pc300: Drop __TIME__ usage
    parport: Drop __TIME__ usage
    hdlcdrv: Drop __TIME__ usage
    baycom: Drop __TIME__ usage
    pmcraid: Drop __DATE__ usage
    edac: Drop __DATE__ usage
    rio: Drop __DATE__ usage
    scsi/wd33c93: Drop __TIME__ usage
    scsi/in2000: Drop __TIME__ usage
    aacraid: Drop __TIME__ usage
    media/cx231xx: Drop __TIME__ usage
    media/radio-maxiradio: Drop __TIME__ usage
    nozomi: Drop __TIME__ usage
    cyclades: Drop __TIME__ usage

    Linus Torvalds
     

26 May, 2011

1 commit

  • The kernel already prints its build timestamp during boot, no need to
    repeat it in random drivers and produce different object files each
    time.

    Cc: Christine Caulfield
    Cc: David Teigland
    Cc: cluster-devel@redhat.com
    Signed-off-by: Michal Marek

    Michal Marek
     
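
A before/after sketch; the old dlm message is paraphrased, not
quoted:

    #include <linux/init.h>
    #include <linux/kernel.h>

    static int __init example_init(void)
    {
            /* before -- every rebuild produced a different object:
             *   printk("DLM (built %s %s) installed\n",
             *          __DATE__, __TIME__);
             * after -- stable and reproducible: */
            printk("DLM installed\n");
            return 0;
    }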

23 May, 2011

1 commit

  • Allow processes blocked on plock requests to be interrupted
    when they are killed. This leaves the problem of cleaning
    up the lock state in userspace. This has three parts:

    1. Add a flag to unlock operations sent to userspace
    indicating the file is being closed. Userspace will
    then look for and clear any waiting plock operations that
    were abandoned by an interrupted process.

    2. Queue an unlock-close operation (like in 1) to clean up
    userspace from an interrupted plock request. This is needed
    because the vfs will not send a cleanup-unlock if it sees no
    locks on the file, which it won't if the interrupted operation
    was the only one.

    3. Do not use replies from userspace for unlock-close operations
    because they are unnecessary (they are just cleaning up for the
    process which did not make an unlock call). This also simplifies
    the new unlock-close generated from point 2.

    Signed-off-by: David Teigland

    David Teigland
     
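
A sketch of parts 1 and 3 of the scheme, assuming the
DLM_PLOCK_FL_CLOSE flag that the dlm plock interface exposes to
userspace; the surrounding function is illustrative:

    #include <linux/dlm_plock.h>
    #include <linux/types.h>

    static void fill_unlock_op(struct dlm_plock_info *info, bool closing)
    {
            info->optype = DLM_PLOCK_OP_UNLOCK;
            if (closing) {
                    /* tell the daemon the file is closing so it can
                     * clear plock ops abandoned by a killed process;
                     * no reply is used for this cleanup op (part 3) */
                    info->flags |= DLM_PLOCK_FL_CLOSE;
            }
    }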

05 Apr, 2011

1 commit

  • kmalloc a stub message struct during recovery instead of sharing the
    struct in the lockspace. This leaves the lockspace stub_ms only for
    faking downconvert replies, where it is never modified and sharing
    is not a problem.

    Also improve the debug messages in the same recovery function.

    Signed-off-by: David Teigland

    David Teigland
     

11 Mar, 2011

3 commits

  • Replaces deprecated create_singlethread_workqueue().

    Signed-off-by: David Teigland

    David Teigland
     
  • Make all three hash tables a consistent size of 1024
    rather than 1024, 512, 256. All three tables, for
    resources, locks, and lock dir entries, will generally
    be filled to the same order of magnitude.

    Signed-off-by: David Teigland

    David Teigland
     
  • Change how callbacks are recorded for locks. Previously, information
    about multiple callbacks was combined into a couple of variables that
    indicated what the end result should be. In some situations, we
    could not tell from this combined state what the exact sequence of
    callbacks were, and would end up either delivering the callbacks in
    the wrong order, or suppress redundant callbacks incorrectly. This
    new approach records all the data for each callback, leaving no
    uncertainty about what needs to be delivered.

    Signed-off-by: David Teigland

    David Teigland
     
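
A sketch of the record-everything approach from the last commit
above; the ring size and field names are assumptions, not the
patch's:

    #include <linux/types.h>

    #define CB_RING_SIZE 6

    /* one complete record per callback, so delivery order and
     * redundancy can be decided from full information instead of
     * a pair of merged state variables */
    struct lock_callback {
            u64 seq;        /* arrival order across the lockspace */
            u32 flags;      /* completion (cast) vs blocking (bast) */
            int mode;       /* lock mode the callback reports */
            int sb_status;  /* status for the lksb on a cast */
    };

    struct lock_cb_state {
            struct lock_callback cbs[CB_RING_SIZE];
            u64 last_cast_seq;  /* lets a bast older than the last
                                   cast be dropped as stale */
    };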

12 Feb, 2011

1 commit

  • The recent commit to use cmwq for send and recv threads
    dcce240ead802d42b1e45ad2fcb2ed4a399cb255 introduced problems,
    apparently due to multiple workqueue threads. Single threads
    make the problems go away, so return to that until we fully
    understand the concurrency issues with multiple threads.

    Signed-off-by: David Teigland

    David Teigland
     
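
This revert and the later replacement of create_singlethread_workqueue()
(11 Mar, above) bracket the same API choice; a sketch of the
single-threaded form in the cmwq era, with illustrative names:

    #include <linux/workqueue.h>

    static struct workqueue_struct *send_wq;

    static int comms_wq_start(void)
    {
            /* an ordered workqueue executes one work item at a
             * time, giving single-thread semantics without the
             * legacy constructor */
            send_wq = alloc_ordered_workqueue("dlm_send", WQ_MEM_RECLAIM);
            if (!send_wq)
                    return -ENOMEM;
            return 0;
    }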

17 Jan, 2011

1 commit

    This patch fixes the following kconfig error, seen after
    CONFIGFS_FS was changed to select SYSFS:

    fs/sysfs/Kconfig:1:error: recursive dependency detected!
    fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
    fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by DLM
    fs/dlm/Kconfig:1: symbol DLM depends on SYSFS

    Signed-off-by: Nicholas A. Bellinger
    Cc: Joel Becker
    Cc: Randy Dunlap
    Cc: Stephen Rothwell
    Cc: James Bottomley

    Nicholas Bellinger
     

13 Nov, 2010

3 commits

  • Calling cond_resched() after every send can unnecessarily
    degrade performance. Go back to an old method of scheduling
    after 25 messages.

    Signed-off-by: Bob Peterson
    Signed-off-by: David Teigland

    Bob Peterson
     
  • Nagling doesn't help and can sometimes hurt dlm comms.

    Signed-off-by: David Teigland

    David Teigland
     
  • So far as I can tell, there is no reason to use a single-threaded
    send workqueue for dlm, since it may need to send to several sockets
    concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid
    any possible deadlocks, WQ_HIGHPRI since locking traffic is highly
    latency sensitive (and to avoid a priority inversion wrt GFS2's
    glock_workqueue) and WQ_FREEZABLE just in case someone needs to do
    that (even though with current cluster infrastructure, it doesn't
    make sense as the node will most likely land up ejected from the
    cluster) in the future.

    Signed-off-by: Steven Whitehouse
    Cc: Tejun Heo
    Signed-off-by: David Teigland

    Steven Whitehouse
     
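
Sketches of the three lowcomms changes above in one place; the
constants, helper names, and workqueue name are illustrative
stand-ins for the real dlm code:

    #include <linux/net.h>
    #include <linux/sched.h>
    #include <linux/workqueue.h>
    #include <net/tcp.h>

    /* assumed helpers for the send loop */
    extern bool have_queued_messages(void);
    extern void send_one_message(void);

    /* 1. Reschedule every 25 sends rather than after each one. */
    #define MAX_SEND_MSG_COUNT 25

    static void send_loop_sketch(void)
    {
            int count = 0;

            while (have_queued_messages()) {
                    send_one_message();
                    if (++count >= MAX_SEND_MSG_COUNT) {
                            cond_resched();
                            count = 0;
                    }
            }
    }

    /* 2. Disable Nagle: small lock messages must not sit waiting
     *    to be coalesced with later data. */
    static void set_nodelay_sketch(struct socket *sock)
    {
            int one = 1;

            kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY,
                              (char *)&one, sizeof(one));
    }

    /* 3. Multithreaded send workqueue with the flags named above. */
    static struct workqueue_struct *create_send_wq_sketch(void)
    {
            return alloc_workqueue("dlm_send",
                                   WQ_MEM_RECLAIM | WQ_HIGHPRI |
                                   WQ_FREEZABLE, 0);
    }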

12 Nov, 2010

1 commit

    In the normal regime, where an application uses non-blocking I/O
    writes on a socket, it will handle -EAGAIN and use poll() to wait
    for send space.

    They don't actually sleep on the socket I/O write.

    But kernel level RPC layers that do socket I/O operations directly
    and key off of -EAGAIN on the write() to "try again later" don't
    use poll(), they instead have their own sleeping mechanism and
    rely upon ->sk_write_space() to trigger the wakeup.

    So they do effectively sleep on the write(), but this mechanism
    alone does not let the socket layers know what's going on.

    Therefore they must emulate what would have happened, otherwise
    TCP cannot possibly see that the connection is application window
    size limited.

    Handle this, therefore, like SUNRPC by setting SOCK_NOSPACE and
    bumping the ->sk_write_pending count as needed when we hit the send buffer
    limits.

    This should make TCP send buffer size auto-tuning and the
    ->sk_write_space() callback invocations actually happen.

    Signed-off-by: David S. Miller
    Signed-off-by: David Teigland

    David Miller
     
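
A sketch of the emulation described above; the function names are
illustrative, but SOCK_NOSPACE and sk_write_pending are the real
socket-layer hooks involved:

    #include <net/sock.h>

    /* on -EAGAIN from a kernel-space send: look blocked on send
     * space, the way a user process sleeping in poll() would */
    static void note_send_blocked(struct socket *sock)
    {
            if (!test_and_set_bit(SOCK_NOSPACE, &sock->flags))
                    sock->sk->sk_write_pending++;
    }

    /* ->sk_write_space() handler: undo it when space appears */
    static void example_write_space(struct sock *sk)
    {
            struct socket *sock = sk->sk_socket;

            if (sock && test_and_clear_bit(SOCK_NOSPACE, &sock->flags)) {
                    sk->sk_write_pending--;
                    /* requeue the blocked send work here */
            }
    }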

15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {
    <+... nonseekable_open(...) ...+>
    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {
    <+... nested_open(...) ...+>
    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    <+...
    (
    *off = E
    |
    *off += E
    |
    func(..., off, ...)
    |
    E = *off
    )
    ...+>
    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    <+...
    (
    *off = E
    |
    *off += E
    |
    func(..., off, ...)
    |
    E = *off
    )
    ...+>
    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

03 Sep, 2010

1 commit

  • When converting a lock, an lkb is in the granted state and also being used
    to request a new state. In the case that the conversion was a "try 1cb"
    type which has failed, and if the new state was incompatible with the old
    state, a callback was being generated to the requesting node. This is
    incorrect as callbacks should only be sent to all the other nodes holding
    blocking locks. The requesting node should receive the normal (failed)
    response to its "try 1cb" conversion request only.

    This was discovered while debugging a performance problem on GFS2;
    however, this fix speeds up GFS as well. In the GFS2 case the
    performance gain
    is over 10x for cases of write activity to an inode whose glock is cached
    on another, idle (wrt that glock) node.

    (comment added, dct)

    Signed-off-by: Steven Whitehouse
    Tested-by: Abhijith Das
    Signed-off-by: David Teigland

    Steven Whitehouse
     
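
A sketch of the rule this fix enforces; the types and helpers are
illustrative, not dlm's actual queue-walking code:

    #include <linux/list.h>
    #include <linux/types.h>

    struct lkb {
            int lkb_grmode;
            struct list_head lkb_statequeue;
    };
    extern bool modes_compat(int grmode, int rqmode);
    extern void queue_bast(struct lkb *lkb, int rqmode);

    /* on a failed "try 1cb" conversion, blocking callbacks go only
     * to the other holders whose granted mode conflicts; the
     * requester gets nothing but its failed reply */
    static void send_basts(struct list_head *grantq,
                           struct lkb *requester, int rqmode)
    {
            struct lkb *lkb;

            list_for_each_entry(lkb, grantq, lkb_statequeue) {
                    if (lkb == requester)
                            continue;
                    if (!modes_compat(lkb->lkb_grmode, rqmode))
                            queue_bast(lkb, rqmode);
            }
    }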

06 Aug, 2010

1 commit

  • hlist_for_each_entry binds its first argument to a non-null value, and thus
    any null test on the value of that argument is superfluous.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    // <smpl>
    @@
    iterator I;
    expression x,E,E1,E2;
    statement S,S1,S2;
    @@

    I(x,...) { <...
    (
    - if (x == NULL && ...) S
    |
    - if (x != NULL || ...)
      S
    |
    - (x == NULL) || E
    |
    - (x != NULL) && E
    |
    - (x == NULL && ...) ? E1 : E2
    |
    - (x != NULL || ...) ? E1 : E2
    )
      ...> }
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: David Teigland

    Julia Lawall