23 Jun, 2006

6 commits

  • We can now make posix_locks_deadlock() static.

    Signed-off-by: Adrian Bunk
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Pass the POSIX lock owner ID to the flush operation.

    This is useful for filesystems which don't want to store any locking state
    in inode->i_flock but want to handle locking/unlocking of POSIX locks
    internally. FUSE is one such filesystem, but it is possible that some
    network filesystems would need this as well.

    Also add a flag to indicate that a POSIX locking request was generated by
    close(), so filesystems using the above feature won't send an extra locking
    request in this case.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • locks_remove_posix() can use posix_lock_file() instead of doing the lock
    removal by hand; posix_lock_file() now does exactly the same thing.

    The comment about pids no longer applies: posix_lock_file() takes only the
    owner into account.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • posix_lock_file() always allocates new locks in advance, even if it's easy to
    determine that no allocations will be needed.

    Optimize these cases:

    - FL_ACCESS flag is set

    - Unlocking the whole range

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • posix_lock_file() was too cautious, failing operations on OOM even if they
    didn't actually require an allocation.

    This has the disadvantage that a failing unlock on process exit can lead to
    a memory leak. There are two possibilities for this:

    - the filesystem implements .lock() and calls back to posix_lock_file().
    On cleanup of the files_struct, locks_remove_posix() is called, which
    should remove all locks belonging to that files_struct. However, if the
    filesystem's call to posix_lock_file() fails, those locks will never be
    freed.

    - if a file is closed while a lock is blocked, then after acquiring the
    lock fcntl_setlk() will undo it. But this unlock itself might fail on OOM,
    again possibly leaking the lock.

    The solution is to defer checking the allocations until it is certain that
    they will be needed. This solves the above problem, since an unlock will
    always succeed unless it splits an existing region.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch removes the steal_locks() function.

    steal_locks() doesn't work correctly with any filesystem that does its own
    lock management, including NFS, CIFS, etc.

    In addition it has weird semantics on local filesystems in case tasks
    sharing file-descriptor tables are doing POSIX locking operations in
    parallel to execve().

    The steal_locks() function has an effect on applications doing:

    clone(CLONE_FILES)
    /* in child */
    lock
    execve
    lock

    POSIX locks acquired before the execve (by "child", "parent" or any further
    task sharing the files_struct) will, after the execve, be owned exclusively
    by "child".

    According to Chris Wright, some LSB/LTP-style test suite fails without the
    stealing behavior, but there's no known real-world application that would
    also fail.

    Apps using NPTL are not affected, since all other threads are killed before
    execve.

    Apps using LinuxThreads are only affected if they

    - have multiple threads during exec (LinuxThreads doesn't kill other
    threads, the app may do it with pthread_kill_other_threads_np())
    - rely on POSIX locks being inherited across exec

    Both conditions are documented, but not their interaction.

    Apps using clone() natively are affected if they

    - use clone(CLONE_FILES)
    - rely on POSIX locks being inherited across exec

    The above scenarios are unlikely, but possible.

    If the patch is vetoed, there's a plan B, that involves mostly keeping the
    weird stealing semantics, but changing the way lock ownership is handled so
    that network and local filesystems work consistently.

    That would add more complexity though, so this solution seems to be
    preferred by most people.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: Chris Wright
    Cc: Christoph Hellwig
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

14 Jun, 2006

1 commit


08 May, 2006

1 commit

  • It is insane to be giving lease_init() the task of freeing the lock it is
    supposed to initialise, given that the lock is not guaranteed to be
    allocated on the stack. This causes lockups in fcntl_setlease().
    Problem diagnosed by Daniel Hokka Zakrisson.

    Also fix a slab leak in __setlease() due to an uninitialised return value.
    Problem diagnosed by Björn Steinbrink.

    Signed-off-by: Trond Myklebust
    Tested-by: Daniel Hokka Zakrisson
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

20 Apr, 2006

1 commit

  • There are places in the kernel where we look up files in fd tables and
    access the file structure without holding references to the file. So, we
    need special care to avoid the race between looking up files in the fd
    table and the teardown of the file on another CPU. Otherwise, one might
    see a NULL f_dentry or a similarly torn-down version of the file. This
    patch fixes those special places where such a race may happen.

    Signed-off-by: Dipankar Sarma
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     

01 Apr, 2006

2 commits

  • sys_flock() currently has a race which can result in a double free in the
    multi-thread case.

    Thread 1                      Thread 2

    sys_flock(file, LOCK_EX)
                                  sys_flock(file, LOCK_UN)

    If Thread 2 removes the lock from inode->i_flock before Thread 1 tests for
    list_empty(&lock->fl_link) at the end of sys_flock(), then both threads
    will end up calling locks_free_lock() for the same lock.

    Fix is to make flock_lock_file() do the same as posix_lock_file(), namely
    to make a copy of the request, so that the caller can always free the lock.

    This also has the side-effect of fixing up a reference problem in the
    lockd handling of flock.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • Don't panic! Just BUG_ON().

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

27 Mar, 2006

3 commits

  • Lockd and the NFSv4 server both exercise a race condition where
    posix_test_lock() is called either before or after posix_lock_file() to
    deal with a denied lock request due to a conflicting lock.

    Remove the race condition for the NFSv4 server by adding a conflicting-lock
    parameter to __posix_lock_file(), renaming it to __posix_lock_file_conf().

    Keep the posix_lock_file() interface, and add a posix_lock_file_conf()
    interface; both call __posix_lock_file_conf().

    [akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs]
    Signed-off-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Adamson
     
  • BUG instead of handling a case that should never happen.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • While hunting with oprofile on an SMP platform, I discovered that dentry
    lookups were slowed down because d_hash_mask, d_hash_shift and
    dentry_hashtable sat in a cache line that also contained inodes_stat. So
    each time inodes_stat is changed by one CPU, the other CPUs have to refill
    their cache line.

    This patch moves some variables to the __read_mostly section, in order to
    avoid false sharing. RCU dentry lookups can go full speed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

21 Mar, 2006

5 commits

  • Currently lockd directly accesses the file_lock_list from fs/locks.c.
    It does so to mark locks granted or reclaimable. This is very
    suboptimal, because a) lockd needs to poke into locks.c internals, and
    b) it needs to iterate over all locks in the system to mark locks
    granted or reclaimable.

    This patch adds lists for granted and reclaimable locks to the nlm_host
    structure instead, and adds locks to those.

    nlmclnt_lock:
    now adds the lock to h_granted instead of setting the
    NFS_LCK_GRANTED bit, still O(1)

    nlmclnt_mark_reclaim:
    goes away completely, replaced by a list_splice_init.
    Complexity reduced from O(locks in the system) to O(1)

    reclaimer:
    iterates over h_reclaim now, complexity reduced from
    O(locks in the system) to O(locks per nlm_host)

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Trond Myklebust

    Christoph Hellwig
     
  • The caller of posix_test_lock() should never need to look at the lock
    private data, so do not copy that information. This also means that there
    is no need to call the fl_release_private methods.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • posix_test_lock() returns a pointer to a struct file_lock which is unprotected
    and can be removed while in use by the caller. Move the conflicting lock from
    the return to a parameter, and copy the conflicting lock.

    In most cases the caller ends up putting the copy of the conflicting lock on
    the stack. On i386, sizeof(struct file_lock) appears to be about 100 bytes.
    We're assuming that's reasonable.

    Signed-off-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • posix_lock_file() is used to add a blocked lock to Lockd's block, so
    posix_block_lock() is no longer needed.

    Signed-off-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • The struct file_lock->fl_u area must be copied using the fl_copy_lock()
    operation.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

09 Jan, 2006

1 commit

  • uninline some file locking functions

    add/remove: 3/0 grow/shrink: 0/15 up/down: 256/-1525 (-1269)
    function                  old     new   delta
    locks_free_lock             -     134    +134
    posix_same_owner            -      69     +69
    __locks_delete_block        -      53     +53
    posix_locks_conflict      126     108     -18
    locks_remove_posix        266     237     -29
    locks_wake_up_blocks      121      87     -34
    locks_block_on_timeout     83      47     -36
    locks_insert_block        157     120     -37
    locks_delete_block         62      23     -39
    posix_unblock_lock        104      59     -45
    posix_locks_deadlock      162     100     -62
    locks_delete_lock         228     119    -109
    sys_flock                 338     217    -121
    __break_lease             600     474    -126
    lease_init                252     122    -130
    fcntl_setlk64             793     649    -144
    fcntl_setlk               793     649    -144
    __posix_lock_file        1477    1026    -451

    Signed-off-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     

07 Jan, 2006

2 commits

  • If the server receives an NLM cancel call and finds no waiting lock to
    cancel, then chances are the lock has already been applied, and the client
    just hadn't yet processed the NLM granted callback before it sent the
    cancel.

    The Open Group text, for example, permits a server to return either success
    (LCK_GRANTED) or failure (LCK_DENIED) in this case. But returning an error
    seems more helpful; the client may be able to use it to recognize that a
    race has occurred and to recover from it.

    So, modify the relevant functions to return an error in this case.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    J. Bruce Fields
     
  • Currently when lockd gets an NLM_CANCEL request, it also does an unlock for
    the same range. This is incorrect.

    The Open Group documentation says that "This procedure cancels an
    *outstanding* blocked lock request." (Emphasis mine.)

    Also, consider a client that holds a lock on the first byte of a file and
    requests a lock on the entire file. If the client cancels that request
    (perhaps because the requesting process is signalled), the server shouldn't
    perform an unlock on the entire file, since that would also remove the
    previous lock that the client was already granted.

    Or consider a lock request that actually *downgraded* an exclusive lock to
    a shared lock.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    J. Bruce Fields
     

14 Nov, 2005

2 commits

  • Remove time_out_leases() printk that's easily triggered by users.

    Signed-off-by: Chris Wright
    Signed-off-by: Trond Myklebust

    Chris Wright
     
  • The patch
    http://linux.bkbits.net:8080/linux-2.6/diffs/fs/locks.c@1.70??nav=index.html
    introduced a pretty nasty memory leak in the lease code. When freeing
    the lease, the code in locks_delete_lock() will correctly clean up
    the fasync queue, but when we return to fcntl_setlease(), the freed
    fasync entry will be reinstated.

    This patch ensures that we skip the call to fasync_helper() when we're
    freeing up the lease.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    J. Bruce Fields
     

19 Oct, 2005

1 commit


24 Sep, 2005

1 commit

  • [PATCH] Fix miscompare in __posix_lock_file

    If an application requests the same lock twice, the
    kernel should just leave the existing lock in place.
    Currently, it will install a second lock of the same type.

    Signed-off-by: Olaf Kirch
    Signed-off-by: Trond Myklebust

    Olaf Kirch
     

18 Sep, 2005

1 commit

  • With the new fdtable locking rules, you have to protect fdtable with either
    ->file_lock or rcu_read_lock/unlock(). There are some places where we
    aren't doing either. This patch fixes those places.

    Signed-off-by: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     

10 Sep, 2005

1 commit

  • In order for RCU to work, the file table array, the fd sets and their
    sizes must be updated atomically. Instead of ensuring this through too
    many memory barriers, we put the arrays and their sizes in a separate
    structure. This patch takes the first step of putting the file table
    elements in a separate structure, fdtable, that is embedded within
    files_struct. It also changes all the users to refer to the file table
    using the files_fdtable() macro. Subsequent application of RCU becomes
    easier after this.

    Signed-off-by: Dipankar Sarma
    Signed-Off-By: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     

28 Jul, 2005

1 commit

  • I believe that there is a problem with the handling of POSIX locks, which
    the attached patch should address.

    The problem appears to be a race between fcntl(2) and close(2). A
    multithreaded application could close a file descriptor at the same time as
    it is trying to acquire a lock using the same file descriptor. I would
    suggest that that multithreaded application is not providing the proper
    synchronization for itself, but the OS should still behave correctly.

    SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that
    when a file descriptor is closed, all POSIX locks on the file owned by the
    process which closed the file descriptor should be released.

    The trick here is when those locks are released. The current code releases
    all locks which exist when close is processing, but any locks in progress
    are handled when the last reference to the open file is released.

    There are three cases to consider.

    One is the simple case, a multithreaded (mt) process has a file open and
    races to close it and acquire a lock on it. In this case, the close will
    release one reference to the open file and when the fcntl is done, it will
    release the other reference. For this situation, no locks should exist on
    the file when both the close and fcntl operations are done. The current
    system will handle this case because the last reference to the open file is
    being released.

    The second case is when the mt process has dup(2)'d the file descriptor.
    The close will release one reference to the file and the fcntl, when done,
    will release another, but there will still be at least one more reference
    to the open file. One could argue that the existence of a lock on the file
    after the close has completed is okay, because it was acquired after the
    close operation and there is still a way for the application to release the
    lock on the file, using an existing file descriptor.

    The third case is when the mt process has forked, after opening the file
    and either before or after becoming an mt process. In this case, each
    process would hold a reference to the open file. For each process, this
    degenerates to first case above. However, the lock continues to exist
    until both processes have released their references to the open file. This
    lock could block other lock requests.

    The changes to release the lock when the last reference to the open file
    is released aren't quite right, because they would allow the lock to exist
    for as long as there was a reference to the open file. That is too long.

    The new proposed solution is to add support in the fcntl code path to
    detect a race with close and then to release the lock which was just
    acquired when such a race is detected. This causes locks to be released
    in a timely fashion and makes the system conform to the POSIX semantic
    specification.

    This was tested by instrumenting a kernel to detect the handling of locks
    and then running a program which generates case #3 above. A dangling lock
    could be reliably generated. With the changes to detect the close/fcntl
    race, a dangling lock could no longer be generated.

    Cc: Matthew Wilcox
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Staubach
     

08 Jul, 2005

1 commit

  • We're dereferencing `flp' and then testing it for NULLness.

    Either the compiler accidentally saved us or the existing null-pointer
    check is redundant.

    This defect was found automatically by Coverity Prevent, a static analysis tool.

    Signed-off-by: Zaur Kambarov
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMBAROV, ZAUR
     

23 Jun, 2005

1 commit


06 May, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds