06 Feb, 2008

25 commits

  • smbfs is a bit buggy and has no maintainer. Change it to shout at the user on
    the first five mount attempts - tell them to switch to CIFS.

    Come December we'll mark it BROKEN and see what happens.

    [olecom@flower.upol.cz: documentation update]
    Cc: Urban Widmark
    Acked-by: Steven French
    Signed-off-by: Oleg Verych
    Cc: Jeff Layton
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • To convert from tv_nsec to tv_usec, one needs to divide by 1000, not multiply.

    Signed-off-by: Dominique Quatravaux
    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dominique Quatravaux
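
    The unit conversion behind this fix can be sketched in userspace C.
    This is only an illustration, not the patched kernel code; the helper
    name timespec_to_timeval_sketch() is made up for this example.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: convert a timespec (nanoseconds) to a timeval
 * (microseconds). One microsecond is 1000 nanoseconds, so the
 * sub-second field is divided by 1000, never multiplied. */
struct timeval timespec_to_timeval_sketch(const struct timespec *ts)
{
    struct timeval tv;

    assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
    tv.tv_sec = ts->tv_sec;
    tv.tv_usec = ts->tv_nsec / 1000;
    return tv;
}
```

    Multiplying instead would push tv_usec far past its valid range of
    0..999999 for almost any input.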
     
  • The patch supports legacy (32-bit) capability userspace, and where possible
    translates 32-bit capabilities to/from userspace and the VFS to 64-bit
    kernel space capabilities. If a capability set cannot be compressed into
    32 bits for consumption by user space, the system call fails with -ERANGE.

    FWIW libcap-2.00 supports this change (and earlier capability formats)

    http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: use get_task_comm()]
    [ezk@cs.sunysb.edu: build fix]
    [akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
    [akpm@linux-foundation.org: unused var]
    [serue@us.ibm.com: export __cap_ symbols]
    Signed-off-by: Andrew G. Morgan
    Cc: Stephen Smalley
    Acked-by: Serge Hallyn
    Cc: Chris Wright
    Cc: James Morris
    Cc: Casey Schaufler
    Signed-off-by: Erez Zadok
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morgan
     
  • Originally vfs_getxattr would pull the security xattr variable using
    the inode getxattr handle and then proceed to clobber it with a subsequent call
    to the LSM.

    This patch reorders the two operations such that when the xattr requested is
    in the security namespace it first attempts to grab the value from the LSM
    directly.

    If it fails to obtain the value because there is no module present or the
    module does not support the operation it will fall back to using the inode
    getxattr operation.

    In the event that both are inaccessible it returns EOPNOTSUPP.

    Signed-off-by: David P. Quigley
    Cc: Stephen Smalley
    Cc: Chris Wright
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Casey Schaufler
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David P. Quigley
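
    The lookup order described above can be sketched as follows. This is
    a userspace illustration under stated assumptions: the getter_fn
    type, getxattr_sketch() and the two sample getters are all
    hypothetical stand-ins for the LSM hook and the inode operation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical getter signature: returns value length or -errno. */
typedef int (*getter_fn)(const char *name, char *buf, size_t len);

/* Try the LSM first for security.* names, then fall back to the inode. */
int getxattr_sketch(const char *name, char *buf, size_t len,
                    getter_fn lsm_get, getter_fn inode_get)
{
    if (strncmp(name, "security.", 9) == 0 && lsm_get) {
        int ret = lsm_get(name, buf, len);
        if (ret != -EOPNOTSUPP)
            return ret;         /* LSM answered (value or real error) */
    }
    if (inode_get)
        return inode_get(name, buf, len);
    return -EOPNOTSUPP;         /* neither source is available */
}

/* Sample getter: a module that does not support the operation. */
int lsm_no_module(const char *name, char *buf, size_t len)
{
    (void)name; (void)buf; (void)len;
    return -EOPNOTSUPP;
}

/* Sample getter: an inode operation that returns a fixed 3-byte value. */
int inode_fixed_value(const char *name, char *buf, size_t len)
{
    (void)name;
    if (len < 3)
        return -ERANGE;
    memcpy(buf, "ctx", 3);
    return 3;
}
```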
     
  • This patch modifies the interface to inode_getsecurity to have the function
    return a buffer containing the security blob and its length via parameters
    instead of relying on the calling function to give it an appropriately sized
    buffer.

    Security blobs obtained with this function should be freed using the
    release_secctx LSM hook. This alleviates the problem of the caller having to
    guess a length and preallocate a buffer for this function, allowing it to be
    used elsewhere for Labeled NFS.

    The patch also removes the unused err parameter. The conversion is similar to
    the one performed by Al Viro for the security_getprocattr hook.

    Signed-off-by: David P. Quigley
    Cc: Stephen Smalley
    Cc: Chris Wright
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Casey Schaufler
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David P. Quigley
     
    After dirtying a 100M file, the normal behavior is to start the
    writeback for all data after a 30s delay. But sometimes the following
    happens instead:

    - after 30s: ~4M
    - after 5s: ~4M
    - after 5s: all remaining 92M

    Some analysis shows that the internal io dispatch queues go like this:

             s_io            s_more_io
             -------------------------
    1)       100M,1K         0
    2)       1K              96M
    3)       0               96M

    1) initial state with a 100M file and a 1K file
    2) 4M written, nr_to_write <= 0, so write more
    3) 1K written, nr_to_write > 0, no more writes (BUG)

    nr_to_write > 0 in (3) fools the upper layer into thinking that all data
    have been written out. The big dirty file is actually still sitting in
    s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io
    becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
    may starve newly expired inodes in s_dirty. It is also not an option to
    draw inodes from both s_more_io and s_dirty, and let the loop go on: this
    might lead to live locks, and might also starve other superblocks in sync
    time (well, kupdate may still starve some superblocks, that's another bug).

    We have to return when a full scan of s_io completes. So nr_to_write > 0
    does not necessarily mean that "all data are written". This patch
    introduces a flag writeback_control.more_io to indicate that more io should
    be done. With it the big dirty file no longer has to wait for the next
    kupdate invocation 5s later.

    In sync_sb_inodes() we only set more_io on super_blocks we actually
    visited. This avoids the interaction between two pdflush daemons.

    Also in __sync_single_inode() we don't blindly keep requeuing the io if the
    filesystem cannot progress. Failing to do so may lead to 100% iowait.

    Tested-by: Mike Snitzer
    Signed-off-by: Fengguang Wu
    Cc: Michael Rubin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
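
    The decision the new flag enables can be sketched like this. It is a
    simplified model, not the kernel's writeback loop: struct wb_control
    and writeback_should_continue() are hypothetical names that keep only
    the two fields discussed above.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for struct writeback_control. */
struct wb_control {
    long nr_to_write;   /* remaining page quota after a pass */
    int more_io;        /* set when inodes were parked in s_more_io */
};

/* Decide whether the caller should start another pass right away.
 * Leftover quota alone no longer means "everything was written":
 * the pass may have ended because the scan of s_io completed. */
int writeback_should_continue(const struct wb_control *wbc)
{
    assert(wbc != 0);
    if (wbc->nr_to_write <= 0)
        return 1;               /* quota used up, more data is likely */
    return wbc->more_io != 0;   /* quota left: trust the explicit flag */
}
```

    In state (3) of the table above, nr_to_write is positive but more_io
    would be set, so the 96M in s_more_io is not left waiting.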
     
  • Since I_SYNC was split out from I_LOCK, the concern in commit
    4b89eed93e0fa40a63e3d7b1796ec1337ea7a3aa ("Write back inode data pages
    even when the inode itself is locked") is no longer valid.

    We should revert to the original behavior: in __writeback_single_inode(),
    when we find an I_SYNC-ed inode and we're not doing a data-integrity sync,
    skip writing entirely. Otherwise, we are double calling do_writepages().

    Signed-off-by: Qi Yong
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Joern Engel
    Cc: WU Fengguang
    Cc: Michael Rubin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qi Yong
     
  • This patch fixes a sles9 system hang in start_this_handle from a customer
    with some heavy workload where all tasks are waiting on kjournald to commit
    the transaction, but kjournald waits on t_updates to go down to zero (it
    never does).

    This was reported as a lowmem shortage deadlock but when checking the debug
    data I noticed the VM wasn't under pressure at all (well it was really
    under vm pressure, because lots of tasks hanged in the VM prune_dcache
    methods trying to flush dirty inodes, but no task was hanging in GFP_NOFS
    mode, the holder of the journal handle should have if this was a vm issue
    in the first place).

    No task was apparently holding the leftover handle in the committing
    transaction, so I deduced t_updates was stuck to 1 because a journal_stop
    was never run by some path (this turned out to be correct). With a debug
    patch adding proper reverse links and stack trace logging in ext3 deployed
    in production, I found journal_stop is never run because
    mark_inode_dirty_sync is called inside release_task called by do_exit.
    (that was quite fun because I would have never thought about this
    subtleness, I thought a regular path in ext3 had a bug and it forgot to
    call journal_stop)

    do_exit->release_task->mark_inode_dirty_sync->schedule() (will never
    come back to run journal_stop)

    The reason is that shrink_dcache_parent is racy by design (feature not
    a bug) and it can do blocking I/O in some case, but the point is that
    calling shrink_dcache_parent at the last stage of do_exit isn't safe
    for self-reaping tasks.

    I guess the memory pressure of the unbalanced highmem system allowed
    to trigger this more easily.

    Now mainline doesn't have this line in iput (like sles9 has):

    if (inode->i_state & I_DIRTY_DELAYED)
            mark_inode_dirty_sync(inode);

    so it will probably not crash with ext3, but for example ext2 implements an
    I/O-blocking ext2_put_inode that will lead to similar screwups with
    ext2_free_blocks never coming back and it's definitely wrong to call
    blocking-IO paths inside do_exit. So this should fix a subtle bug in
    mainline too (not verified in practice though). The equivalent fix for
    ext3 is also not verified yet to fix the problem in sles9 but I don't have
    doubt it will (it usually takes days to crash, so it'll take weeks to be
    sure).

    An alternate fix would be to offload that work to a kernel thread, but I
    don't think a reschedule for this is worth it, the vm should be able to
    collect those entries for the synchronous release_task.

    Signed-off-by: Andrea Arcangeli
    Cc: Jan Kara
    Cc: Ingo Molnar
    Cc: "Eric W. Biederman"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Make /proc/ page monitoring configurable

    This puts the following files under an embedded config option:

    /proc/pid/clear_refs
    /proc/pid/smaps
    /proc/pid/pagemap
    /proc/kpagecount
    /proc/kpageflags

    [akpm@linux-foundation.org: Kconfig fix]
    Signed-off-by: Matt Mackall
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
    This makes a subset of physical page flags available to userspace. Together
    with /proc/pid/pagemap, this allows tracking of a wide variety of VM behaviors.

    Exported flags are decoupled from the kernel's internal flags. This
    allows us to reorder flag bits, and synthesize any bits that get
    redefined in terms of other bits.

    [akpm@linux-foundation.org: remove unneeded access_ok()]
    [akpm@linux-foundation.org: s/0/NULL/]
    Signed-off-by: Matt Mackall
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • This makes physical page map counts available to userspace. Together
    with /proc/pid/pagemap and /proc/pid/clear_refs, this can be used to
    monitor memory usage on a per-page basis.

    [akpm@linux-foundation.org: remove unneeded access_ok()]
    [bunk@stusta.de: make struct proc_kpagemap static]
    Signed-off-by: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Cc: David Rientjes
    Signed-off-by: Adrian Bunk
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • This interface provides a mapping for each page in an address space to its
    physical page frame number, allowing precise determination of what pages are
    mapped and what pages are shared between processes.

    New in this version:

    - headers gone again (as recommended by Dave Hansen and Alan Cox)
    - 64-bit entries (as per discussion with Andi Kleen)
    - swap pte information exported (from Dave Hansen)
    - page walker callback for holes (from Dave Hansen)
    - direct put_user I/O (as suggested by Rusty Russell)

    This patch folds in cleanups and swap PTE support from Dave Hansen.

    Signed-off-by: Matt Mackall
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • Reorder source so that all the code and data for each interface is together.

    Signed-off-by: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Cc: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • This puts all the clear_refs code where it belongs and probably lets things
    compile on MMU-less systems as well.

    Signed-off-by: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Cc: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • This pulls the shared map display code out of show_map and puts it in
    show_smap where it belongs.

    Signed-off-by: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Acked-by: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • Use the generic pagewalker for smaps and clear_refs

    Signed-off-by: Matt Mackall
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • The "proportional set size" (PSS) of a process is the count of pages it has
    in memory, where each page is divided by the number of processes sharing
    it. So if a process has 1000 pages all to itself, and 1000 shared with one
    other process, its PSS will be 1500.

    - lwn.net: "ELC: How much memory are applications really using?"

    The PSS proposed by Matt Mackall is a very nice metric for measuring a
    process's memory footprint. So collect and export it via
    /proc/pid/smaps.

    Matt Mackall's pagemap/kpagemap and John Berthels's exmap can also do the
    job. They are comprehensive tools. But for PSS, let's do it in the simple
    way.

    Cc: John Berthels
    Cc: Bernardo Innocenti
    Cc: Padraig Brady
    Cc: Denys Vlasenko
    Cc: Balbir Singh
    Signed-off-by: Matt Mackall
    Signed-off-by: Fengguang Wu
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
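
    The PSS definition quoted above can be worked through in a few lines
    of C. This is an illustrative sketch using floating point; pss_pages()
    and its mapcount array are hypothetical and do not reflect how the
    kernel accumulates the value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: mapcount[i] is the number of processes sharing page i.
 * Each resident page contributes 1/mapcount pages to the PSS. */
double pss_pages(const unsigned *mapcount, size_t npages)
{
    double pss = 0.0;

    assert(mapcount != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++) {
        if (mapcount[i] == 0)
            continue;           /* not mapped, contributes nothing */
        pss += 1.0 / mapcount[i];
    }
    return pss;
}
```

    For the example in the text, 1000 pages with a mapcount of 1 plus
    1000 pages with a mapcount of 2 sum to 1000 + 500 = 1500.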
     
    Allow a sticky directory mount option for hugetlbfs. This allows an admin
    to create a shared hugetlbfs mount point for multiple users while
    preventing users from accidentally deleting each other's files.
    It is similar to the default tmpfs mount option, or the typical option
    used on /tmp.

    Signed-off-by: Ken Chen
    Cc: Badari Pulavarty
    Cc: Adam Litke
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Chen
     
    The constructor for buffer_head slabs was removed recently. We need the
    constructor back in slab defrag in order to ensure that slab objects always
    have a definite state even before we allocate them.

    I think we mistakenly merged the removal of the constructor into a cleanup
    patch. You (ie: akpm) had a test that showed that the removal of the
    constructor led to a small regression. The prior state makes things easier
    for slab defrag.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Checking if an address is a vmalloc address is done in a couple of places.
    Define a common version in mm.h and replace the other checks.

    Again the include structures suck. The definition of VMALLOC_START and
    VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
    there.

    Signed-off-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
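
    A common check of this kind is a simple range test. The sketch below
    assumes made-up bounds; SKETCH_VMALLOC_START, SKETCH_VMALLOC_END and
    in_vmalloc_range() are hypothetical, and the real limits are defined
    per architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds, chosen only so the example has numbers to test. */
#define SKETCH_VMALLOC_START ((uintptr_t)0xc8000000u)
#define SKETCH_VMALLOC_END   ((uintptr_t)0xe0000000u)

/* Returns nonzero if addr lies in [START, END). */
int in_vmalloc_range(uintptr_t addr)
{
    return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}
```

    Keeping one shared helper means every caller agrees on whether the
    end address is inclusive.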
     
  • Simplify page cache zeroing of segments of pages through 3 functions

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the position where to
    start and end the zeroing which avoids length calculations and
    makes code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the
    code needlessly complex. The parameter for zeroing is always
    KM_USER0 except in one single case that we open code.

    Avoiding KM_USER0 means that a lot of code no longer has to deal
    with the special casing for HIGHMEM. Dealing with
    kmap is only necessary for HIGHMEM configurations. In those
    configurations we use KM_USER0 like we do for a series of other
    functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
    had to be a macro. The zero_user_* functions introduced
    here can be inline because that constant is not used when these
    functions are called.

    Also extract the flushing of the caches to be outside of the kmap.

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: Michael Halcrow
    Cc: Steven French
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
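
    The three calling conventions above can be modelled on an ordinary
    byte buffer. This is a userspace sketch, assuming a 4096-byte page
    and no kmap or cache flushing; the *_sketch names are hypothetical
    analogues, not the kernel functions.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the example */

/* Zero [start1, end1) and [start2, end2). Empty ranges are skipped,
 * so callers pass positions and never compute lengths themselves. */
void zero_segments_sketch(unsigned char *page,
                          unsigned start1, unsigned end1,
                          unsigned start2, unsigned end2)
{
    assert(end1 <= SKETCH_PAGE_SIZE && end2 <= SKETCH_PAGE_SIZE);
    if (end1 > start1)
        memset(page + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant, expressed with positions. */
void zero_segment_sketch(unsigned char *page, unsigned start, unsigned end)
{
    zero_segments_sketch(page, start, end, 0, 0);
}

/* Length variant for callers that already know the length. */
void zero_range_sketch(unsigned char *page, unsigned start, unsigned length)
{
    zero_segments_sketch(page, start, start + length, 0, 0);
}
```

    A typical use is clearing the head and tail of a page around a
    region that is about to be filled with data.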
     
  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
                        const struct itimerspec *utmr,
                        struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
    parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API gives new settings to the timerfd fd, optionally
    retrieving the previous expiration time (in case the "otmr" parameter is not
    NULL).

    The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter. Otherwise it's a relative time.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • As Roland pointed out, we have the very old problem with exec. de_thread()
    sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
    clears signal->flags. All signals (even fatal ones) sent in this window
    (which is not too small) will be lost.

    With this patch exec doesn't abuse SIGNAL_GROUP_EXIT. signal_group_exit(),
    the new helper, should be used to detect exit_group() or exec() in progress.
    It can have more users, but this patch does only strictly necessary changes.

    Signed-off-by: Oleg Nesterov
    Cc: Davide Libenzi
    Cc: Ingo Molnar
    Cc: Robin Holt
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • It was dumb to make get_task_comm() return void. Change it to return a
    pointer to the resulting output for caller convenience.

    Cc: Ulrich Drepper
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote:

    > I remember I talked with Arjan about this time ago. Basically, since 1)
    > you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
    > are used, you can see a wake_up() from inside another wake_up(), but they
    > will never refer to the same lock instance.
    > Think about:
    >
    > dfd = socket(...);
    > efd1 = epoll_create();
    > efd2 = epoll_create();
    > epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
    > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
    >
    > When a packet arrives to the device underneath "dfd", the net code will
    > issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
    > callback wakeup entry on that queue, and the wake_up() performed by the
    > "dfd" net code will end up in ep_poll_callback(). At this point epoll
    > (efd1) notices that it may have some event ready, so it needs to wake up
    > the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
    > that ends up in another wake_up(), after having checked about the
    > recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
    > avoid stack blasting. Never hit the same queue, to avoid loops like:
    >
    > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
    > epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
    > epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
    > epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
    >
    > The code "if (tncur->wq == wq || ..." prevents re-entering the same
    > queue/lock.

    Since the epoll code is very careful not to nest same-instance locks,
    allow the recursion.

    Signed-off-by: Peter Zijlstra
    Tested-by: Stefan Richter
    Acked-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
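
    The nesting scenario from the quoted mail can be reproduced from
    userspace. This sketch substitutes a pipe for the socket in the
    original example so that it needs no network setup; the function
    name nested_epoll_ready() is made up for this illustration.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Adds a pipe to efd1, adds efd1 to efd2, makes the pipe readable and
 * returns what epoll_wait() reports on efd2 (or -1 on setup failure). */
int nested_epoll_ready(void)
{
    int pipefd[2], efd1 = -1, efd2 = -1, n = -1;
    struct epoll_event ev = {0}, ready;

    if (pipe(pipefd) < 0)
        return -1;
    efd1 = epoll_create(1);
    efd2 = epoll_create(1);
    if (efd1 < 0 || efd2 < 0)
        goto cleanup;

    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(efd1, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto cleanup;

    ev.data.fd = efd1;              /* an epoll fd inside another epoll fd */
    if (epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, &ev) < 0)
        goto cleanup;

    /* The wakeup on the pipe propagates through efd1 to efd2. */
    if (write(pipefd[1], "x", 1) != 1)
        goto cleanup;
    n = epoll_wait(efd2, &ready, 1, 1000);

cleanup:
    if (efd1 >= 0)
        close(efd1);
    if (efd2 >= 0)
        close(efd2);
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

    The write triggers a wake_up() on the pipe, whose callback in turn
    wakes the waiters of efd1, which is the nested wake_up() the commit
    message is about.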
     

04 Feb, 2008

7 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits)
    Jesper Juhl is the new trivial patches maintainer
    Documentation: mention email-clients.txt in SubmittingPatches
    fs/binfmt_elf.c: spello fix
    do_invalidatepage() comment typo fix
    Documentation/filesystems/porting fixes
    typo fixes in net/core/net_namespace.c
    typo fix in net/rfkill/rfkill.c
    typo fixes in net/sctp/sm_statefuns.c
    lib/: Spelling fixes
    kernel/: Spelling fixes
    include/scsi/: Spelling fixes
    include/linux/: Spelling fixes
    include/asm-m68knommu/: Spelling fixes
    include/asm-frv/: Spelling fixes
    fs/: Spelling fixes
    drivers/watchdog/: Spelling fixes
    drivers/video/: Spelling fixes
    drivers/ssb/: Spelling fixes
    drivers/serial/: Spelling fixes
    drivers/scsi/: Spelling fixes
    ...

    Linus Torvalds
     
  • * 'locks' of git://linux-nfs.org/~bfields/linux:
    pid-namespaces-vs-locks-interaction
    file locks: Use wait_event_interruptible_timeout()
    locks: clarify posix_locks_deadlock

    Linus Torvalds
     
  • Drivers that register a ->fault handler, but do not range-check the
    offset argument, must set VM_DONTEXPAND in the vm_flags in order to
    prevent an expanding mremap from overflowing the resource.

    I've audited the tree and attempted to fix these problems (usually by
    adding VM_DONTEXPAND where it is not obvious).

    Signed-off-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
    fcntl(F_GETLK, ...) can return the pid of a process as seen from a pid
    namespace other than the current one (if the process belongs to several
    namespaces). The same is true for pids in /proc/locks. The correct
    behavior is to save a pointer to the struct pid of the lock owner.

    Signed-off-by: Vitaliy Gusev
    Acked-by: Serge Hallyn
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: J. Bruce Fields

    Vitaliy Gusev
     
  • interruptible_sleep_on_locked() is just an open-coded
    wait_event_interruptible_timeout(), with the one difference that
    interruptible_sleep_on_locked() doesn't bother to check the condition on
    which it is waiting, depending instead on the BKL to avoid the case
    where it blocks after the wakeup has already been called.

    locks_block_on_timeout() is only used in one place, so it's actually
    simpler to inline it into its caller.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: J. Bruce Fields

    Matthew Wilcox
     
  • For such a short function (with such a long comment),
    posix_locks_deadlock() seems to cause a lot of confusion. Attempt to
    make it a bit clearer:

    - Remove the initial posix_same_owner() check, which can never
      pass (since this is only called in the case that block_fl and
      caller_fl conflict)
    - Use an explicit loop (and a helper function) instead of a goto.
    - Rewrite the comment, attempting a clearer explanation, and
      removing some uninteresting historical detail.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • s/litle/little

    Signed-off-by: Ohad Ben-Cohen
    Signed-off-by: Adrian Bunk

    Ohad Ben-Cohen
     

03 Feb, 2008

4 commits


02 Feb, 2008

4 commits

  • * 'for-linus' of git://linux-nfs.org/~bfields/linux: (100 commits)
    SUNRPC: RPC program information is stored in unsigned integers
    SUNRPC: Move exported symbol definitions after function declaration part 2
    NLM: tear down RPC clients in nlm_shutdown_hosts
    SUNRPC: spin svc_rqst initialization to its own function
    nfsd: more careful input validation in nfsctl write methods
    lockd: minor log message fix
    knfsd: don't bother mapping putrootfh enoent to eperm
    rdma: makefile
    rdma: ONCRPC RDMA protocol marshalling
    rdma: SVCRDMA sendto
    rdma: SVCRDMA recvfrom
    rdma: SVCRDMA Core Transport Services
    rdma: SVCRDMA Transport Module
    rdma: SVCRMDA Header File
    svc: Add svc_xprt_names service to replace svc_sock_names
    knfsd: Support adding transports by writing portlist file
    svc: Add svc API that queries for a transport instance
    svc: Add /proc/sys/sunrpc/transport files
    svc: Add transport hdr size for defer/revisit
    svc: Move the xprt independent code to the svc_xprt.c file
    ...

    Linus Torvalds
     
  • It's possible for a RPC to outlive the lockd daemon that created it, so
    we need to make sure that all RPC's are killed when lockd is coming
    down. When nlm_shutdown_hosts is called, kill off all RPC tasks
    associated with the host. Since we need to wait until they have all gone
    away, we might as well just shut down the RPC client altogether.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Neil Brown points out that we're checking buf[size-1] in a couple places
    without first checking whether size is zero.

    Actually, given the implementation of simple_transaction_get(), buf[-1]
    is zero, so in both of these cases the subsequent check of the value of
    buf[size-1] will catch this case.

    But it seems fragile to depend on that, so add explicit checks for this
    case.

    Signed-off-by: J. Bruce Fields
    Acked-by: NeilBrown

    J. Bruce Fields
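
    The explicit check argued for above can be sketched as follows. This
    is a userspace illustration with a hypothetical helper name, not one
    of the nfsctl write methods themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: validate that the input ends in a newline and strip it.
 * The size check comes first so that buf[size - 1] is never evaluated
 * with size == 0, whatever happens to sit in front of the buffer. */
int strip_trailing_newline(char *buf, size_t size)
{
    assert(buf != NULL);
    if (size == 0)
        return -EINVAL;
    if (buf[size - 1] != '\n')
        return -EINVAL;
    buf[size - 1] = '\0';
    return 0;
}
```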
     
  • Wendy Cheng noticed that function name doesn't agree here.

    Signed-off-by: J. Bruce Fields
    Cc: Wendy Cheng

    J. Bruce Fields