17 Oct, 2007

1 commit

  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 Oct, 2007

9 commits

  • Currently /proc/locks is shown with a proc_read function, but its behavior
    is rather complex as it has to manually handle current offset and buffer
    length. On the other hand, files that show objects from lists can be
    easily reimplemented using the sequential files and the seq_list_XXX()
    helpers.

    This saves (as usually) 16 lines of code and more than 200 from
    the .text section.

    [akpm@linux-foundation.org: no externs in C]
    [akpm@linux-foundation.org: warning fixes]
    Signed-off-by: Pavel Emelyanov
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Pavel Emelyanov
     
  • fs/locks.c: use list_for_each_entry() instead of list_for_each() in
    posix_locks_deadlock() and get_locks_status()

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton

    Matthias Kaehlcke
     
  • The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
    inode as "mandatory lockable" and there's a macro for this check called
    MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform
    the explicit i_mode checking. Besides, Andrew pointed out, that this macro is
    buggy itself, as it dereferences the inode arg twice.

    Convert this macro into static inline function and switch its users to it,
    making the code shorter and more readable.

    The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
    for superblock is already known to be true.

    Signed-off-by: Pavel Emelyanov
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: David Howells
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton

    Pavel Emelyanov
     
  • This code is run under lock_kernel(), which is dropped during
    sleeping operations, so the following race is possible:

    CPU1: CPU2:
    vfs_setlease(); vfs_setlease();
    lock_kernel();
    lock_kernel(); /* spin */
    generic_setlease():
    ...
    for (before = ...)
    /* here we found some lease after
    * which we will insert the new one
    */
    fl = locks_alloc_lock();
    /* go to sleep in this allocation and
    * drop the BKL
    */
    generic_setlease():
    ...
    for (before = ...)
    /* here we find the "before" pointing
    * at the one we found on CPU1
    */
    ->fl_change(my_before, arg);
    lease_modify();
    locks_free_lock();
    /* and we freed it */
    ...
    unlock_kernel();
    locks_insert_lock(before, fl);
    /* OOPS! We have just tried to add the lease
    * at the tail of already removed one
    */

    The similar races are already handled in other code - all the
    allocations are performed before any checks/updates.

    Thanks to Kamalesh Babulal for testing and for a bug report on an
    earlier version.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: J. Bruce Fields
    Cc: Kamalesh Babulal

    Pavel Emelyanov
     
  • This routine deletes all the elements from the list
    with the "while (!list_empty())" loop, and we already
    have a list_first_entry() macro to help it look nicer :)

    Signed-off-by: Pavel Emelyanov

    Pavel Emelyanov
     
  • This comment wasn't updated when lease support was added, and it makes
    essentially the same mistake that the code made before a recent bugfix.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • When the flock_lock_file() is called to change the flock
    from F_RDLCK to F_WRLCK or vice versa the existing flock
    can be removed without appropriate warning.

    Look:
    for_each_lock(inode, before) {
    struct file_lock *fl = *before;
    if (IS_POSIX(fl))
    break;
    if (IS_LEASE(fl))
    continue;
    if (filp != fl->fl_file)
    continue;
    if (request->fl_type == fl->fl_type)
    goto out;
    found = 1;
    locks_delete_lock(before); <<<<<< !
    break;
    }

    if after this point the subsequent locks_alloc_lock() will
    fail the return code will be -ENOMEM, but the existing lock
    is already removed.

    This is a known feature that such "re-locking" is not atomic,
    but in the racy case the file should stay locked (although by
    some other process), but in this case the file will be unlocked.

    The proposal is to prepare the lock in advance keeping no chance
    to fail in the future code.

    Found during making the flocks pid-namespaces aware.

    (Note: Thanks to Reuben Farrelly for finding a bug in an earlier version
    of this patch.)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: J. Bruce Fields
    Cc: Reuben Farrelly

    Pavel Emelyanov
     
  • There's no need for another variable local to this loop; we can use the
    variable (of the same name!) already declared at the top of the function,
    and not used till later (at which point it's initialized, so this is safe).

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The first argument to posix_locks_conflict() is meant to be a lock request,
    and the second a lock from an inode's lock request. It doesn't really
    make a difference which order you call them in, since the only
    asymmetric test in posix_lock_conflict() is the check whether the second
    argument is a posix lock--and every caller already does that check for
    some reason.

    But may as well fix posix_test_lock() to call posix_locks_conflict()
    with the arguments in the same order as everywhere else.

    Signed-off-by: "J. Bruce Fields"

    J. Bruce Fields
     

12 Sep, 2007

1 commit

  • The inode->i_flock list contains the leases, flocks and posix
    locks in the specified order. However, the flocks are added in
    the head of this list thus hiding the leases from F_GETLEASE
    command, from time_out_leases() and other code that expects
    the leases to come first.

    The following example will demonstrate this:

    #define _GNU_SOURCE

    #include
    #include
    #include
    #include

    static void show_lease(int fd)
    {
    int res;

    res = fcntl(fd, F_GETLEASE);
    switch (res) {
    case F_RDLCK:
    printf("Read lease\n");
    break;
    case F_WRLCK:
    printf("Write lease\n");
    break;
    case F_UNLCK:
    printf("No leases\n");
    break;
    default:
    printf("Some shit\n");
    break;
    }
    }

    int main(int argc, char **argv)
    {
    int fd, res;

    fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
    perror("Can't open file");
    return 1;
    }

    res = fcntl(fd, F_SETLEASE, F_WRLCK);
    if (res == -1) {
    perror("Can't set lease");
    return 1;
    }

    show_lease(fd);

    if (flock(fd, LOCK_SH) == -1) {
    perror("Can't flock shared");
    return 1;
    }

    show_lease(fd);

    return 0;
    }

    The first call to show_lease() will show the write lease set, but
    the second will show no leases.

    Fix the flock adding so that the leases always stay in the head
    of this list.

    Found during making the flocks pid-namespaces aware.

    Signed-off-by: Pavel Emelyanov
    Acked-by: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Andrew Morton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

01 Aug, 2007

1 commit


20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

19 Jul, 2007

9 commits


17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

11 May, 2007

1 commit

  • In 9d6a8c5c213e34c475e72b245a8eb709258e968c we changed posix_test_lock
    to modify its single file_lock argument instead of taking separate input
    and output arguments. This makes it no longer safe to set the output
    lock's fl_type to F_UNLCK before looking for a conflict, since that
    means searching for a conflict against a lock with type F_UNLCK.

    This fixes a regression which causes F_GETLK to incorrectly report no
    conflict on most filesystems (including any filesystem that doesn't do
    its own locking).

    Also fix posix_lock_to_flock() to copy the lock type. This isn't
    strictly necessary, since the caller already does this; but it seems
    less likely to cause confusion in the future.

    Thanks to Doug Chapman for the bug report.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Doug Chapman
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     

08 May, 2007

2 commits

  • * 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
    gfs2: nfs lock support for gfs2
    lockd: add code to handle deferred lock requests
    lockd: always preallocate block in nlmsvc_lock()
    lockd: handle test_lock deferrals
    lockd: pass cookie in nlmsvc_testlock
    lockd: handle fl_grant callbacks
    lockd: save lock state on deferral
    locks: add fl_grant callback for asynchronous lock return
    nfsd4: Convert NFSv4 to new lock interface
    locks: add lock cancel command
    locks: allow {vfs,posix}_lock_file to return conflicting lock
    locks: factor out generic/filesystem switch from setlock code
    locks: factor out generic/filesystem switch from test_lock
    locks: give posix_test_lock same interface as ->lock
    locks: make ->lock release private data before returning in GETLK case
    locks: create posix-to-flock helper functions
    locks: trivial removal of unnecessary parentheses

    Linus Torvalds
     
  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code object
    manipulation of the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 May, 2007

7 commits

  • Acquiring a lock on a cluster filesystem may require communication with
    remote hosts, and to avoid blocking lockd or nfsd threads during such
    communication, we allow the results to be returned asynchronously.

    When a ->lock() call needs to block, the file system will return
    -EINPROGRESS, and then later return the results with a call to the
    routine in the fl_grant field of the lock_manager_operations struct.

    This differs from the case when ->lock returns -EAGAIN to a blocking
    lock request; in that case, the filesystem calls fl_notify when the lock
    is granted, and the caller retries the original lock. So while
    fl_notify is merely a hint to the caller that it should retry, fl_grant
    actually communicates the final result of the lock operation (with the
    lock already acquired in the succesful case).

    Therefore fl_grant takes a lock, a status and, for the test lock case, a
    conflicting lock. We also allow fl_grant to return an error to the
    filesystem, to handle the case where the fl_grant requests arrives after
    the lock manager has already given up waiting for it.

    Signed-off-by: Marc Eshel
    Signed-off-by: J. Bruce Fields

    Marc Eshel
     
  • Lock managers need to be able to cancel pending lock requests. In the case
    where the exported filesystem manages its own locks, it's not sufficient just
    to call posix_unblock_lock(); we need to let the filesystem know what's
    happening too.

    We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this
    might also be made available to userspace applications that could benefit from
    an asynchronous locking api.

    Signed-off-by: Marc Eshel
    Signed-off-by: "J. Bruce Fields"

    Marc Eshel
     
  • The nfsv4 protocol's lock operation, in the case of a conflict, returns
    information about the conflicting lock.

    It's unclear how clients can use this, so for now we're not going so far as to
    add a filesystem method that can return a conflicting lock, but we may as well
    return something in the local case when it's easy to.

    Signed-off-by: Marc Eshel
    Signed-off-by: "J. Bruce Fields"

    Marc Eshel
     
  • Factor out the code that switches between generic and filesystem-specific lock
    methods; eventually we want to call this from lock managers (lockd and nfsd)
    too; currently they only call the generic methods.

    This patch does that for all the setlk code.

    Signed-off-by: Marc Eshel
    Signed-off-by: "J. Bruce Fields"

    Marc Eshel
     
  • Factor out the code that switches between generic and filesystem-specific lock
    methods; eventually we want to call this from lock managers (lockd and nfsd)
    too; currently they only call the generic methods.

    This patch does that for test_lock.

    Note that this hasn't been necessary until recently, because the few
    filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS.
    However GFS (and, in the future, other cluster filesystems) need to implement
    their own locking to get cluster-coherent locking, and also want to be able to
    export locking to NFS (lockd and NFSv4).

    So we accomplish this by factoring out code such as this and exporting it for
    the use of lockd and nfsd.

    Signed-off-by: "J. Bruce Fields"

    J. Bruce Fields
     
  • posix_test_lock() and ->lock() do the same job but have gratuitously
    different interfaces. Modify posix_test_lock() so the two agree,
    simplifying some code in the process.

    Signed-off-by: Marc Eshel
    Signed-off-by: "J. Bruce Fields"

    Marc Eshel
     
  • The file_lock argument to ->lock is used to return the conflicting lock
    when found. There's no reason for the filesystem to return any private
    information with this conflicting lock, but nfsv4 is.

    Fix nfsv4 client, and modify locks.c to stop calling fl_release_private
    for it in this case.

    Signed-off-by: "J. Bruce Fields"
    Cc: "Trond Myklebust" "

    J. Bruce Fields
     

17 Apr, 2007

2 commits


09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

08 Dec, 2006

2 commits

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

02 Oct, 2006

1 commit


01 Oct, 2006

1 commit