11 Mar, 2011

1 commit

  • Change how callbacks are recorded for locks. Previously, information
    about multiple callbacks was combined into a couple of variables that
    indicated what the end result should be. In some situations, we
    could not tell from this combined state what the exact sequence of
    callbacks were, and would end up either delivering the callbacks in
    the wrong order, or suppress redundant callbacks incorrectly. This
    new approach records all the data for each callback, leaving no
    uncertainty about what needs to be delivered.

    Signed-off-by: David Teigland

    David Teigland
     

15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

01 May, 2010

1 commit

  • Commit 7fe2b3190b8b299409f13cf3a6f85c2bd371f8bb fixed possible
    misordering of completion asts (casts) and blocking asts (basts)
    for kernel locks. This patch does the same for locks taken by
    user space applications.

    Signed-off-by: David Teigland

    David Teigland
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Feb, 2010

1 commit

  • When both blocking and completion callbacks are queued for lock,
    the dlm would always deliver the completion callback (cast) first.
    In some cases the blocking callback (bast) is queued before the
    cast, though, and should be delivered first. This patch keeps
    track of the order in which they were queued and delivers them
    in that order.

    This patch also keeps track of the granted mode in the last cast
    and eliminates the following bast if the bast mode is compatible
    with the preceding cast mode. This happens when a remotely mastered
    lock is demoted, e.g. EX->NL, in which case the local node queues
    a cast immediately after sending the demote message. In this way
    a cast can be queued for a mode, e.g. NL, that makes an in-transit
    bast extraneous.

    Signed-off-by: David Teigland

    David Teigland
     

01 Dec, 2009

1 commit

  • Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
    ls_allocation would be GFP_KERNEL for userland lockspaces
    and GFP_NOFS for file system lockspaces.

    It was discovered that any lockspaces on the system can
    affect all others by triggering memory reclaim in the
    file system which could in turn call back into the dlm
    to acquire locks, deadlocking dlm threads that were
    shared by all lockspaces, like dlm_recv.

    Signed-off-by: David Teigland

    David Teigland
     

12 Mar, 2009

1 commit

  • Using offsetof() to calculate name length does not work because
    it does not produce consistent results with with structure packing.
    This caused memcpy to corrupt memory by copying 4 extra bytes off
    the end of the buffer on 64 bit kernels with 32 bit userspace
    (the only case where this 32/64 compat code is used).

    The fix is to calculate name length directly from the start instead
    of trying to derive it later using count and offsetof.

    Signed-off-by: David Teigland

    David Teigland
     

24 Dec, 2008

1 commit

  • The lkb bastmode value is set in the context of processing the
    lock, and read by the dlm_astd thread. Because it's accessed
    in these two separate contexts, the writing/reading ought to
    be done under a lock. This is simple to do by setting it and
    reading it when the lkb is added to and removed from dlm_astd's
    callback list which is properly locked.

    Signed-off-by: David Teigland

    David Teigland
     

05 Sep, 2008

1 commit


29 Aug, 2008

2 commits

  • If dlm_controld (the userspace daemon that controls the setup and
    recovery of the dlm) fails, the kernel should shut down the lockspaces
    in the kernel rather than leaving them running. This is detected by
    having dlm_controld hold a misc device open while running, and if
    the kernel detects a close while the daemon is still needed, it stops
    the lockspaces in the kernel.

    Knowing that the userspace daemon isn't running also allows the
    lockspace create/remove routines to avoid waiting on the daemon
    for join/leave operations.

    Signed-off-by: David Teigland

    David Teigland
     
  • Add a count for lockspace create and release so that create can
    be called multiple times to use the lockspace from different places.
    Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with
    the previous behavior of returning -EEXIST if the lockspace already
    exists.

    Signed-off-by: David Teigland

    David Teigland
     

14 Aug, 2008

1 commit


29 Jul, 2008

1 commit


15 Jul, 2008

1 commit


21 Jun, 2008

1 commit


07 Feb, 2008

2 commits


04 Feb, 2008

1 commit

  • a) in device_write(): add sentinel NUL byte, making sure that lspace.name will
    be NUL-terminated
    b) in compat_input() be keep it simple about the amounts of data we are copying.

    Signed-off-by: Al Viro
    Signed-off-by: David Teigland

    Al Viro
     

31 Jan, 2008

3 commits


20 Oct, 2007

1 commit

  • The task_struct->pid member is going to be deprecated, so start
    using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
    the kernel.

    The first thing to start with is the pid, printed to dmesg - in
    this case we may safely use task_pid_nr(). Besides, printks produce
    more (much more) than a half of all the explicit pid usage.

    [akpm@linux-foundation.org: git-drm went and changed lots of stuff]
    Signed-off-by: Pavel Emelyanov
    Cc: Dave Airlie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

09 Jul, 2007

4 commits

  • Add a function that can be used through libdlm by a system daemon to cancel
    another process's deadlocked lock. A completion ast with EDEADLK is returned
    to the process waiting for the lock.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Various fixes related to the new timeout feature:
    - add_timeout() missed setting TIMEWARN flag on lkb's when the
    TIMEOUT flag was already set
    - clear_proc_locks should remove a dead process's locks from the
    timeout list
    - the end-of-life calculation for user locks needs to consider that
    ETIMEDOUT is equivalent to -DLM_ECANCEL
    - make initial default timewarn_cs config value visible in configfs
    - change bit position of TIMEOUT_CANCEL flag so it's not copied to
    a remote master node
    - set timestamp on remote lkb's so a lock dump will display the time
    they've been waiting

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Change the user/kernel device interface used by libdlm:
    - Add ability for userspace to check the version of the interface. libdlm
    can now adapt to different versions of the kernel interface.
    - Increase the size of the flags passed in a lock request so all possible
    flags can be used from userspace.
    - Add an opaque "xid" value for each lock. This "transaction id" will be
    used later to associate locks with each other during deadlock detection.
    - Add a "timeout" value for each lock. This is used along with the
    DLM_LKF_TIMEOUT flag.

    Also, remove a fragment of unused code in device_read().

    This patch requires updating libdlm which is backward compatible with
    older kernels.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT
    flag is set, then the request/conversion will be canceled after waiting
    the specified number of centiseconds (specified per lock). This feature
    is only available for locks requested through libdlm (can be enabled for
    kernel dlm users if there's a use for it.)

    If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
    a warning message will be sent to userspace (using genetlink) after a
    request/conversion has been waiting for a given number of centiseconds
    (configurable per node). The time warnings will be used in the future
    to do deadlock detection in userspace.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

01 May, 2007

4 commits

  • This patch removes a redundant (and incorrect) assignment from compat_output

    Signed-Off-By: Patrick Caulfield
    Signed-off-by: Steven Whitehouse

    Patrick Caulfield
     
  • Add code to accept purge commands from userland.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Full cancel and force-unlock support. In the past, cancel and force-unlock
    wouldn't work if there was another operation in progress on the lock. Now,
    both cancel and unlock-force can overlap an operation on a lock, meaning there
    may be 2 or 3 operations in progress on a lock in parallel. This support is
    important not only because cancel and force-unlock are explicit operations
    that an app can use, but both are used implicitly when a process exits while
    holding locks.

    Summary of changes:

    - add-to and remove-from waiters functions were rewritten to handle situations
    with more than one remote operation outstanding on a lock

    - validate_unlock_args detects when an overlapping cancel/unlock-force
    can be sent and when it needs to be delayed until a request/lookup
    reply is received

    - processing request/lookup replies detects when cancel/unlock-force
    occured during the op, and carries out the delayed cancel/unlock-force

    - manipulation of the "waiters" (remote operation) state of a lock moved under
    the standard rsb mutex that protects all the other lock state

    - the two recovery routines related to locks on the waiters list changed
    according to the way lkb's are now locked before accessing waiters state

    - waiters recovery detects when lkb's being recovered have overlapping
    cancel/unlock-force, and may not recover such locks

    - revert_lock (cancel) returns a value to distinguish cases where it did
    nothing vs cases where it actually did a cancel; the cancel completion ast
    should only be done when cancel did something

    - orphaned locks put on new list so they can be found later for purging

    - cancel must be called on a lock when making it an orphan

    - flag user locks (ENDOFLIFE) at the end of their useful life (to the
    application) so we can return an error for any further cancel/unlock-force

    - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
    either a completion or blocking ast

    - clear an unread bast on a lock that's become unlocked

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Currently if the lockspace removal fails the misc device associated with a
    lockspace is left deleted. After that there is no way to access the orphaned
    lockspace from userland.

    This patch recreates the misc device if th dlm_release_lockspace fails. I
    believe this is better than attempting to remove the lockspace first because
    that leaves an unattached device lying around. The potential gap in which there
    is no access to the lockspace between removing the misc device and recreating it
    is acceptable ... after all the application is trying to remove it, and only new
    users of the lockspace will be affected.

    Signed-Off-By: Patrick Caulfield
    Signed-off-by: Steven Whitehouse

    Patrick Caulfield
     

08 Mar, 2007

1 commit


13 Feb, 2007

1 commit

  • Many struct file_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

06 Feb, 2007

1 commit

  • When a user process exits, we clear all the locks it holds. There is a
    problem, though, with locks that the process had begun unlocking before it
    exited. We couldn't find the lkb's that were in the process of being
    unlocked remotely, to flag that they are DEAD. To solve this, we move
    lkb's being unlocked onto a new list in the per-process structure that
    tracks what locks the process is holding. We can then go through this
    list to flag the necessary lkb's when clearing locks for a process when it
    exits.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

01 Sep, 2006

1 commit


21 Jul, 2006

1 commit


20 Jul, 2006

1 commit


13 Jul, 2006

1 commit

  • This changes the way the dlm handles user locks. The core dlm is now
    aware of user locks so they can be dealt with more efficiently. There is
    no more dlm_device module which previously managed its own duplicate copy
    of every user lock.

    Signed-off-by: Patrick Caulfield
    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland