14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Signed-off-by: James Morris

    David Howells
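The wrapper idea can be sketched in user space. This is only an illustration: `struct cred`, `struct task`, `init_task`, `task_cred()` and the variable `current` are hypothetical stand-ins rather than the kernel's definitions; only the `current_uid()` / `current_euid()` naming follows the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real task_struct. */
struct cred {
    unsigned int uid, euid;
};

struct task {
    struct cred cred_storage;   /* credentials embedded in the task */
    const struct cred *cred;    /* could later point at a shared COW copy */
};

static struct task init_task = {
    .cred_storage = { .uid = 1000, .euid = 0 },
    .cred = NULL,
};
static struct task *current = &init_task;

/* Callers go through accessors, so the storage layout can change behind them. */
static const struct cred *task_cred(const struct task *t)
{
    return t->cred ? t->cred : &t->cred_storage;
}

#define current_uid()  (task_cred(current)->uid)
#define current_euid() (task_cred(current)->euid)
```

Because callers only ever write `current_uid()`, moving the credentials out of the task structure changes `task_cred()` and nothing else.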
     

24 Oct, 2008

1 commit

  • * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: (35 commits)
    proc: remove fs/proc/proc_misc.c
    proc: move /proc/vmcore creation to fs/proc/vmcore.c
    proc: move pagecount stuff to fs/proc/page.c
    proc: move all /proc/kcore stuff to fs/proc/kcore.c
    proc: move /proc/schedstat boilerplate to kernel/sched_stats.h
    proc: move /proc/modules boilerplate to kernel/module.c
    proc: move /proc/diskstats boilerplate to block/genhd.c
    proc: move /proc/zoneinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmstat boilerplate to mm/vmstat.c
    proc: move /proc/pagetypeinfo boilerplate to mm/vmstat.c
    proc: move /proc/buddyinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmallocinfo to mm/vmalloc.c
    proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c
    proc: move /proc/slab_allocators boilerplate to mm/slab.c
    proc: move /proc/interrupts boilerplate code to fs/proc/interrupts.c
    proc: move /proc/stat to fs/proc/stat.c
    proc: move rest of /proc/partitions code to block/genhd.c
    proc: move /proc/cpuinfo code to fs/proc/cpuinfo.c
    proc: move /proc/devices code to fs/proc/devices.c
    proc: move rest of /proc/locks to fs/locks.c
    ...

    Linus Torvalds
     

23 Oct, 2008

1 commit


21 Oct, 2008

1 commit


27 Jul, 2008

1 commit

  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexers. Nobody uses this "feature", nor does anybody use
    the passed kmem cache in a non-trivial way, so pass only a pointer to the object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
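A minimal user-space sketch of a constructor that receives only the object pointer. `struct mini_cache`, `struct file_lock_like`, `init_once()` and `mini_cache_alloc()` are hypothetical names for illustration. As a simplification, the constructor here runs on every allocation, whereas the real slab allocators run it when a slab is populated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature object cache; the real kmem_cache is far richer. */
struct mini_cache {
    size_t size;
    void (*ctor)(void *object);   /* constructor sees only the object */
};

struct file_lock_like {
    int type;
    int initialised;
};

/* Constructor with the single-argument signature. */
static void init_once(void *object)
{
    struct file_lock_like *fl = object;

    memset(fl, 0, sizeof(*fl));
    fl->initialised = 1;
}

/* Allocate one object and run the constructor on it, if there is one. */
static void *mini_cache_alloc(const struct mini_cache *c)
{
    void *obj = malloc(c->size);

    if (obj && c->ctor)
        c->ctor(obj);
    return obj;
}
```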
     

26 Jul, 2008

3 commits

  • Allow filesystem's ->lock() method to call posix_lock_file() instead of
    posix_lock_file_wait(), and return FILE_LOCK_DEFERRED. This makes it
    possible to implement a ->lock() function that works with the lock
    manager, which needs the call to be asynchronous.

    Now the vfs_lock_file() helper can be used, so this is a cleanup as well.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Extract common code into a function.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Use a special error value FILE_LOCK_DEFERRED to mean that a locking
    operation returned asynchronously. This is returned by:

    - posix_lock_file() for sleeping locks, to mean that the lock has been
      queued on the block list, and will be woken up when it might become
      available and needs to be retried (either fl_lmops->fl_notify() is
      called or fl_wait is woken up).

    - f_op->lock(), to mean either the above, or that the filesystem will
      call back with fl_lmops->fl_grant() when the result of the locking
      operation is known. The filesystem can do this for sleeping as well
      as non-sleeping locks.

    This is to make sure that return values of -EAGAIN and -EINPROGRESS from
    filesystems are not mistaken to mean an asynchronous locking operation.

    This also makes error handling in fs/locks.c and lockd/svclock.c slightly
    cleaner.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
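The benefit of a dedicated return value can be sketched as follows. The numeric value of `FILE_LOCK_DEFERRED` is an assumption here (a positive number, so it cannot collide with a negative errno), and `enum outcome` and `classify_lock_result()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <errno.h>

/* Assumed value: positive, so it cannot collide with a -errno code. */
#define FILE_LOCK_DEFERRED 1

enum outcome { LOCK_GRANTED, LOCK_PENDING, LOCK_FAILED };

/* Hypothetical caller-side classification of a ->lock() return value. */
static enum outcome classify_lock_result(int ret)
{
    if (ret == 0)
        return LOCK_GRANTED;
    if (ret == FILE_LOCK_DEFERRED)
        return LOCK_PENDING;   /* result arrives later via callback or wakeup */
    return LOCK_FAILED;        /* includes -EAGAIN and -EINPROGRESS */
}
```

With this split, a filesystem returning -EAGAIN or -EINPROGRESS is treated as a plain failure and is never confused with a queued request.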
     

23 Jun, 2008

1 commit


12 May, 2008

1 commit

  • It acts exactly like a regular 'cond_resched()', but will not get
    optimized away when CONFIG_PREEMPT is set.

    Normal kernel code is already preemptable in the presence of
    CONFIG_PREEMPT, so cond_resched() is optimized away (see commit
    02b67cc3ba36bdba351d6c3a00593f4ec550d9d3 "sched: do not do
    cond_resched() when CONFIG_PREEMPT").

    But when wanting to conditionally reschedule while holding a lock, you
    need to use "cond_resched_lock(lock)", and the new function is the BKL
    equivalent of that.

    Also make fs/locks.c use it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 May, 2008

1 commit

  • fcntl_setlk()/close() race prevention has a subtle hole - we need to
    make sure that if we *do* have an fcntl/close race on SMP box, the
    access to descriptor table and inode->i_flock won't get reordered.

    As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
    STORE descriptor table entry, LOAD inode->i_flock with not a single
    lock in common on both sides. We do have BKL around the first STORE,
    but check in locks_remove_posix() is outside of BKL and for a good
    reason - we don't want BKL on common path of close(2).

    Solution is to hold ->file_lock around fcheck() in there; that orders
    us wrt removal from descriptor table that preceded locks_remove_posix()
    on close path and we either come first (in which case eviction will be
    handled by the close side) or we'll see the effect of close and do
    eviction ourselves. Note that even though it's read-only access,
    we do need ->file_lock here - rcu_read_lock() won't be enough to
    order the things.

    Signed-off-by: Al Viro

    Al Viro
     

02 May, 2008

1 commit


26 Apr, 2008

6 commits

  • Commit 1a747ee0 ("locks: don't call ->copy_lock methods on return of
    conflicting locks") changed fs/lockd/svclock.c to call
    __locks_copy_lock() instead of locks_copy_lock(), but lockd can be built
    as a module and __locks_copy_lock() is not exported, which causes a
    build error

    ERROR: "__locks_copy_lock" [fs/lockd/lockd.ko] undefined!

    with CONFIG_LOCKD=m.

    Fix this by exporting __locks_copy_lock().

    Signed-off-by: Roland Dreier
    Signed-off-by: Linus Torvalds

    Roland Dreier
     
  • The file_lock structure is used both as a heavy-weight representation of
    an active lock, with pointers to reference-counted structures, etc., and
    as a simple container for parameters that describe a file lock.

    The conflicting lock returned from __posix_lock_file is an example of
    the latter; so don't call the filesystem or lock manager callbacks when
    copying to it. This also saves the need for an unnecessary
    locks_init_lock in the nfsv4 server.

    Thanks to Trond for pointing out the error.

    Signed-off-by: J. Bruce Fields
    Cc: Trond Myklebust

    J. Bruce Fields
     
  • fcntl_setlease() has a struct dentry* that is used only once; this patch
    removes it.

    Signed-off-by: David M. Richter
    Signed-off-by: J. Bruce Fields

    David M. Richter
     
  • In generic_setlease(), the struct file_lock is allocated after the tests for
    the presence of conflicting readers/writers are done, despite the fact that
    the allocation might block; this patch moves the allocation earlier. A subsequent
    set of patches will rely on this behavior to properly serialize between a
    modified __break_lease() and generic_setlease().

    Signed-off-by: David M. Richter
    Signed-off-by: J. Bruce Fields

    David M. Richter
     
  • In generic_setlease(), we don't need to allocate a new struct file_lock
    or check for readers or writers when called with F_UNLCK.

    Signed-off-by: David M. Richter
    Signed-off-by: J. Bruce Fields

    David M. Richter
     
  • Fixes a return-value mixup from 85c59580b30c82aa771aa33b37217a6b6851bc14
    "locks: Fix potential OOPS in generic_setlease()", in which -ENOMEM replaced
    what had been intended to stay -EAGAIN in the variable "error".

    Signed-off-by: David M. Richter
    Signed-off-by: J. Bruce Fields

    David M. Richter
     

19 Apr, 2008

1 commit


15 Apr, 2008

1 commit

  • Miklos Szeredi found the bug:

    "Basically what happens is that on the server nlm_fopen() calls
    nfsd_open() which returns -EACCES, to which nlm_fopen() returns
    NLM_LCK_DENIED.

    "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
    which in turn will cause fcntl_setlk() to retry forever."

    So, for example, opening a file on an nfs filesystem, changing
    permissions to forbid further access, then trying to lock the file,
    could result in an infinite loop.

    And Trond Myklebust identified the culprit, from Marc Eshel and me:

    7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out
    generic/filesystem switch from setlock code"

    That commit claimed to just be reshuffling code, but actually introduced
    a behavioral change by calling the lock method repeatedly as long as it
    returned -EAGAIN.

    We assumed this would be safe, since we assumed a lock of type SETLKW
    would only return with either success or an error other than -EAGAIN.
    However, nfs can in fact return -EAGAIN in this situation, and
    independently of whether that behavior is correct or not, we don't
    actually need this change, and it seems far safer not to depend on such
    assumptions about the filesystem's ->lock method.

    Therefore, revert the problematic part of the original commit. This
    leaves vfs_lock_file() and its other callers unchanged, while returning
    fcntl_setlk and fcntl_setlk64 to their former behavior.

    Signed-off-by: J. Bruce Fields
    Tested-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: Marc Eshel
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
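The difference between the two behaviors can be sketched in user space. Everything here is a simplified, hypothetical model: `stub_lock()` stands in for a filesystem ->lock() method that keeps returning -EAGAIN, and the `max_tries` parameter exists only so the demonstration terminates.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical ->lock() stub that always reports -EAGAIN. */
static int calls;

static int stub_lock(void)
{
    calls++;
    return -EAGAIN;
}

/* Pattern being reverted: retry for as long as -EAGAIN comes back.
 * max_tries exists only so that this demonstration terminates. */
static int setlk_retrying(int (*lock)(void), int max_tries)
{
    int error = -EAGAIN;

    while (error == -EAGAIN && max_tries-- > 0)
        error = lock();
    return error;
}

/* Restored pattern: call once and hand the result back to the caller. */
static int setlk_once(int (*lock)(void))
{
    return lock();
}
```

Without the artificial bound, the retrying variant never returns when the method keeps answering -EAGAIN, which is the infinite loop described above.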
     

20 Mar, 2008

1 commit

  • Fix kernel-doc notation warnings in fs/.

    Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
    * mark_files_ro
    Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
    * lease_get_mtime
    Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
    * lease_get_mtime
    Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
    * lookup_one_len: filesystem helper to lookup single pathname component
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
    * bh_uptodate_or_lock: Test whether the buffer is uptodate
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
    * bh_submit_read: Submit a locked buffer for reading
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
    * writeback_acquire: attempt to get exclusive writeback access to a device
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
    * writeback_in_progress: determine whether there is writeback in progress
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
    * writeback_release: relinquish exclusive writeback access against a device.
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
    Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
    * void journal_invalidatepage()

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

09 Feb, 2008

1 commit

  • Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
    _all_ converted to operate on the current pid namespace. After this each call
    like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
    one.

    Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
    appropriate.

    Signed-off-by: Pavel Emelyanov
    Reviewed-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

04 Feb, 2008

3 commits

  • fcntl(F_GETLK,..) can return a pid from a pid namespace other than the
    current one (if the process belongs to several namespaces). The same is
    true for pids in /proc/locks. So the correct behavior is to save a pointer
    to the struct pid of the lock owner.

    Signed-off-by: Vitaliy Gusev
    Acked-by: Serge Hallyn
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: J. Bruce Fields

    Vitaliy Gusev
     
  • interruptible_sleep_on_locked() is just an open-coded
    wait_event_interruptible_timeout(), with the one difference that
    interruptible_sleep_on_locked() doesn't bother to check the condition on
    which it is waiting, depending instead on the BKL to avoid the case
    where it blocks after the wakeup has already been called.

    locks_block_on_timeout() is only used in one place, so it's actually
    simpler to inline it into its caller.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: J. Bruce Fields

    Matthew Wilcox
     
  • For such a short function (with such a long comment),
    posix_locks_deadlock() seems to cause a lot of confusion. Attempt to
    make it a bit clearer:

    - Remove the initial posix_same_owner() check, which can never
      pass (since this is only called in the case that block_fl and
      caller_fl conflict)
    - Use an explicit loop (and a helper function) instead of a goto.
    - Rewrite the comment, attempting a clearer explanation, and
      removing some uninteresting historical detail.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

31 Oct, 2007

1 commit

  • It's currently possible to send posix_locks_deadlock() into an infinite
    loop (under the BKL).

    For now, fix this just by bailing out after a few iterations. We may
    want to fix this in a way that better clarifies the semantics of
    deadlock detection. But that will take more time, and this minimal fix
    is probably adequate for any realistic scenario, and is simple enough to
    be appropriate for applying to stable kernels now.

    Thanks to George Davis for reporting the problem.

    Cc: "George G. Davis"
    Signed-off-by: J. Bruce Fields
    Acked-by: Alan Cox
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
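A bounded walk of this kind might look like the sketch below. It is a hypothetical model, not the code in fs/locks.c: `struct owner` and `would_deadlock()` are illustrative names, each owner waits on at most one other owner, and the bound of 10 is an assumed value for "a few iterations".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEADLK_ITERATIONS 10   /* assumed bound */

/* Hypothetical model: each owner is blocked on at most one other owner. */
struct owner {
    struct owner *waiting_for;     /* NULL when not blocked */
};

/* Would blocking `caller` on `blocker` close a cycle back to `caller`? */
static int would_deadlock(const struct owner *caller,
                          const struct owner *blocker)
{
    int i = 0;

    while (blocker) {
        if (i++ > MAX_DEADLK_ITERATIONS)
            return 0;              /* give up rather than loop forever */
        if (blocker == caller)
            return 1;
        blocker = blocker->waiting_for;
    }
    return 0;
}
```

The bound matters when the wait-for chain already contains a cycle that does not include the caller: without it the walk would never terminate.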
     

17 Oct, 2007

1 commit

  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 Oct, 2007

9 commits

  • Currently /proc/locks is shown with a proc_read function, but its behavior
    is rather complex as it has to manually handle current offset and buffer
    length. On the other hand, files that show objects from lists can be
    easily reimplemented using the sequential files and the seq_list_XXX()
    helpers.

    This saves (as usual) 16 lines of code and more than 200 bytes from
    the .text section.

    [akpm@linux-foundation.org: no externs in C]
    [akpm@linux-foundation.org: warning fixes]
    Signed-off-by: Pavel Emelyanov
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Pavel Emelyanov
     
  • fs/locks.c: use list_for_each_entry() instead of list_for_each() in
    posix_locks_deadlock() and get_locks_status()

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton

    Matthias Kaehlcke
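For readers unfamiliar with the idiom, here is a user-space sketch of entry-based list iteration. It is a simplified re-creation, not the kernel header: `list_for_each_entry_t` takes the entry type explicitly instead of relying on `typeof`, and `struct mini_lock` and `sum_pids()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its list member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified variant that takes the type explicitly instead of using typeof. */
#define list_for_each_entry_t(pos, head, type, member)          \
    for (pos = container_of((head)->next, type, member);        \
         &pos->member != (head);                                \
         pos = container_of(pos->member.next, type, member))

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct mini_lock { int pid; struct list_head link; };

/* The loop body works on entries directly; no per-iteration list_entry(). */
static int sum_pids(struct list_head *head)
{
    struct mini_lock *fl;
    int sum = 0;

    list_for_each_entry_t(fl, head, struct mini_lock, link)
        sum += fl->pid;
    return sum;
}
```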
     
  • The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
    inode as "mandatory lockable" and there's a macro for this check called
    MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform
    the explicit i_mode checking. Besides, Andrew pointed out that this macro is
    itself buggy, as it dereferences the inode arg twice.

    Convert this macro into a static inline function and switch its users to it,
    making the code shorter and more readable.

    The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
    for superblock is already known to be true.

    Signed-off-by: Pavel Emelyanov
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: David Howells
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton

    Pavel Emelyanov
     
  • This code is run under lock_kernel(), which is dropped during
    sleeping operations, so the following race is possible:

    CPU1:                                   CPU2:
    vfs_setlease();                         vfs_setlease();
      lock_kernel();
                                              lock_kernel(); /* spin */
      generic_setlease():
        ...
        for (before = ...)
        /* here we found some lease after
         * which we will insert the new one
         */
        fl = locks_alloc_lock();
        /* go to sleep in this allocation and
         * drop the BKL
         */
                                              generic_setlease():
                                                ...
                                                for (before = ...)
                                                /* here we find the "before" pointing
                                                 * at the one we found on CPU1
                                                 */
                                                ->fl_change(my_before, arg);
                                                  lease_modify();
                                                    locks_free_lock();
                                                    /* and we freed it */
                                                ...
                                              unlock_kernel();
        locks_insert_lock(before, fl);
        /* OOPS! We have just tried to add the lease
         * at the tail of already removed one
         */

    Similar races are already handled elsewhere in the code: all the
    allocations are performed before any checks/updates.

    Thanks to Kamalesh Babulal for testing and for a bug report on an
    earlier version.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: J. Bruce Fields
    Cc: Kamalesh Babulal

    Pavel Emelyanov
     
  • This routine deletes all the elements from the list
    with the "while (!list_empty())" loop, and we already
    have a list_first_entry() macro to help it look nicer :)

    Signed-off-by: Pavel Emelyanov

    Pavel Emelyanov
     
  • This comment wasn't updated when lease support was added, and it makes
    essentially the same mistake that the code made before a recent bugfix.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • When the flock_lock_file() is called to change the flock
    from F_RDLCK to F_WRLCK or vice versa the existing flock
    can be removed without appropriate warning.

    Look:
    for_each_lock(inode, before) {
            struct file_lock *fl = *before;
            if (IS_POSIX(fl))
                    break;
            if (IS_LEASE(fl))
                    continue;
            if (filp != fl->fl_file)
                    continue;
            if (request->fl_type == fl->fl_type)
                    goto out;
            found = 1;
            locks_delete_lock(before); <<<<<< !
            break;
    }

    If the subsequent locks_alloc_lock() fails after this point, the return
    code will be -ENOMEM, but the existing lock has already been removed.

    It is a known feature that such "re-locking" is not atomic, but in the
    racy case the file should stay locked (although by some other process),
    whereas in this case the file ends up unlocked.

    The proposal is to prepare the lock in advance, leaving no chance of
    failure in the later code.

    Found during making the flocks pid-namespaces aware.

    (Note: Thanks to Reuben Farrelly for finding a bug in an earlier version
    of this patch.)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: J. Bruce Fields
    Cc: Reuben Farrelly

    Pavel Emelyanov
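The "prepare in advance" approach can be sketched in user space as follows. This is a hypothetical model, not flock_lock_file(): `struct mini_lock`, `mini_alloc_lock()`, `mini_relock()` and the `fail_next_alloc` switch are invented for illustration, with the switch used only to simulate an allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct mini_lock { int type; };

/* Hypothetical allocator hook so that a failure can be simulated. */
static int fail_next_alloc;

static struct mini_lock *mini_alloc_lock(void)
{
    if (fail_next_alloc) {
        fail_next_alloc = 0;
        return NULL;
    }
    return calloc(1, sizeof(struct mini_lock));
}

/* Replace *slot with a lock of new_type. The allocation happens first,
 * so an -ENOMEM return leaves the existing lock untouched. */
static int mini_relock(struct mini_lock **slot, int new_type)
{
    struct mini_lock *new_fl = mini_alloc_lock();

    if (!new_fl)
        return -ENOMEM;
    new_fl->type = new_type;
    free(*slot);               /* only now is the old lock removed */
    *slot = new_fl;
    return 0;
}
```

Once the allocation has succeeded, nothing after the removal of the old lock can fail, so the file is never left unlocked by an error path.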
     
  • There's no need for another variable local to this loop; we can use the
    variable (of the same name!) already declared at the top of the function,
    and not used till later (at which point it's initialized, so this is safe).

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The first argument to posix_locks_conflict() is meant to be a lock request,
    and the second a lock from an inode's lock list. It doesn't really
    make a difference which order you call them in, since the only
    asymmetric test in posix_locks_conflict() is the check whether the second
    argument is a posix lock--and every caller already does that check for
    some reason.

    But may as well fix posix_test_lock() to call posix_locks_conflict()
    with the arguments in the same order as everywhere else.

    Signed-off-by: "J. Bruce Fields"

    J. Bruce Fields
     

12 Sep, 2007

1 commit

  • The inode->i_flock list contains the leases, flocks and posix
    locks, in that order. However, the flocks are added at
    the head of this list, thus hiding the leases from the F_GETLEASE
    command, from time_out_leases() and from other code that expects
    the leases to come first.

    The following example will demonstrate this:

    #define _GNU_SOURCE

    #include <unistd.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>

    static void show_lease(int fd)
    {
            int res;

            res = fcntl(fd, F_GETLEASE);
            switch (res) {
            case F_RDLCK:
                    printf("Read lease\n");
                    break;
            case F_WRLCK:
                    printf("Write lease\n");
                    break;
            case F_UNLCK:
                    printf("No leases\n");
                    break;
            default:
                    printf("Some shit\n");
                    break;
            }
    }

    int main(int argc, char **argv)
    {
            int fd, res;

            fd = open(argv[1], O_RDONLY);
            if (fd == -1) {
                    perror("Can't open file");
                    return 1;
            }

            res = fcntl(fd, F_SETLEASE, F_WRLCK);
            if (res == -1) {
                    perror("Can't set lease");
                    return 1;
            }

            show_lease(fd);

            if (flock(fd, LOCK_SH) == -1) {
                    perror("Can't flock shared");
                    return 1;
            }

            show_lease(fd);

            return 0;
    }

    The first call to show_lease() will show the write lease set, but
    the second will show no leases.

    Fix the flock insertion so that the leases always stay at the head
    of this list.

    Found during making the flocks pid-namespaces aware.

    Signed-off-by: Pavel Emelyanov
    Acked-by: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Andrew Morton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

01 Aug, 2007

1 commit


20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

19 Jul, 2007

1 commit