01 Mar, 2012

5 commits

  • commit abe9a6d57b4544ac208401f9c0a4262814db2be4 upstream.

    server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
    non-zero.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Weston Andros Adamson
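    The leak above follows a common pattern: something is allocated, and then an error path returns early without freeing it. Below is a minimal userspace sketch of the usual kernel-style fix, which sends the error path through a single cleanup label. `struct server_scope`, `scope_alloc()`, `check_flags()` and `exchange_id()` are hypothetical stand-ins, not the NFS client code. The allocation counter exists only to make a leak observable.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the allocated server scope (not the kernel struct). */
struct server_scope {
	char name[64];
};

static int live_scopes; /* number of scopes currently allocated */

static struct server_scope *scope_alloc(const char *name)
{
	struct server_scope *s = calloc(1, sizeof(*s));

	if (s) {
		/* calloc zeroed the buffer, so the copy stays NUL-terminated */
		strncpy(s->name, name, sizeof(s->name) - 1);
		live_scopes++;
	}
	return s;
}

static void scope_free(struct server_scope *s)
{
	if (s) {
		free(s);
		live_scopes--;
	}
}

/* Hypothetical stand-in for nfs4_check_cl_exchange_flags(): nonzero = bad reply. */
static int check_flags(unsigned int flags)
{
	return (flags & 0x1u) ? -EINVAL : 0;
}

static int exchange_id(unsigned int flags, struct server_scope **out)
{
	struct server_scope *scope = scope_alloc("example-scope");
	int status;

	if (!scope)
		return -ENOMEM;
	status = check_flags(flags);
	if (status != 0)
		goto out_free;	/* the error path must release the scope too */
	*out = scope;		/* success: the caller now owns the scope */
	return 0;
out_free:
	scope_free(scope);
	return status;
}
```

    With the goto in place, a rejected reply (in this sketch, flags with bit 0 set) leaves no live allocation behind.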
     
  • commit b9f9a03150969e4bd9967c20bce67c4de769058f upstream.

    To ensure that we don't just reuse the bad delegation when we attempt to
    recover the nfs4_state that received the bad stateid error.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 331818f1c468a24e581aedcbe52af799366a9dfe upstream.

    Commit bf118a342f10dafe44b14451a1392c3254629a1f (NFSv4: include bitmap
    in nfsv4 get acl data) introduces the 'acl_scratch' page for the case
    where we may need to decode multi-page data. However it fails to take
    into account the fact that the variable may be NULL (for the case where
    we're not doing multi-page decode), and it also attaches it to the
    encoding xdr_stream rather than the decoding one.

    The immediate result is an Oops in nfs4_xdr_enc_getacl due to the
    call to page_address() with a NULL page pointer.

    Signed-off-by: Trond Myklebust
    Cc: Andy Adamson
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit e188dc02d3a9c911be56eca5aa114fe7e9822d53 upstream.

    d_inode_lookup() leaks a dentry reference on IS_DEADDIR().

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 545d680938be1e86a6c5250701ce9abaf360c495 upstream.

    After passing through a ->setxattr() call, eCryptfs needs to copy the
    inode attributes from the lower inode to the eCryptfs inode, as they
    may have changed in the lower filesystem's ->setxattr() path.

    One example is if an extended attribute containing a POSIX Access
    Control List is being set. The new ACL may cause the lower filesystem to
    modify the mode of the lower inode and the eCryptfs inode would need to
    be updated to reflect the new mode.

    https://launchpad.net/bugs/926292

    Signed-off-by: Tyler Hicks
    Reported-by: Sebastien Bacher
    Cc: John Johansen
    Signed-off-by: Greg Kroah-Hartman

    Tyler Hicks
     

21 Feb, 2012

3 commits

  • commit ff4fa4a25a33f92b5653bb43add0c63bea98d464 upstream.

    standard_receive3 will check the validity of the response from the
    server (via checkSMB). It'll pass the result of that check to handle_mid
    which will dequeue it and mark it with a status of
    MID_RESPONSE_MALFORMED if checkSMB returned an error. At that point,
    standard_receive3 will also return an error, which will make the
    demultiplex thread skip doing the callback for the mid.

    This is wrong -- if we were able to identify the request and the
    response is marked malformed, then we want the demultiplex thread to do
    the callback. Fix this by making standard_receive3 return 0 in this
    situation.

    Reported-and-Tested-by: Mark Moseley
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit 8b0192a5f478da1c1ae906bf3ffff53f26204f56 upstream.

    Currently, it's always set to 0 (no oplock requested).

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit 15eb77a07c714ac80201abd0a9568888bcee6276 upstream.

    bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when
    tearing down the original bdi. Fix trace_writeback_single_inode to
    use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a
    torn-down bdi.

    Reported-by: Rabin Vincent
    Tested-by: Rabin Vincent
    Signed-off-by: Wu Fengguang
    Signed-off-by: Greg Kroah-Hartman

    Wu Fengguang
     

14 Feb, 2012

6 commits

  • commit de47a4176c532ef5961b8a46a2d541a3517412d3 upstream.

    For null user mounts, do not invoke string length function
    during session setup.

    Reported-and-Tested-by: Chris Clayton
    Acked-by: Jeff Layton
    Signed-off-by: Shirish Pargaonkar
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Shirish Pargaonkar
     
  • commit 684a3ff7e69acc7c678d1a1394fe9e757993fd34 upstream.

    ecryptfs_write() can enter an infinite loop when truncating a file to a
    size larger than 4G. This only happens on architectures where size_t is
    represented by 32 bits.

    This was caused by a size_t overflow due to it incorrectly being used to
    store the result of a calculation which uses potentially large values of
    type loff_t.

    [tyhicks@canonical.com: rewrite subject and commit message]
    Signed-off-by: Li Wang
    Signed-off-by: Yunchuan Wen
    Reviewed-by: Cong Wang
    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Li Wang
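    The overflow can be reproduced in userspace. The sketch below is an illustration, not the eCryptfs code: `uint32_t` stands in for a 32-bit `size_t`, `int64_t` stands in for `loff_t`, and the function names are hypothetical.

```c
#include <stdint.h>

/* uint32_t models size_t on a 32-bit build; int64_t models loff_t. */
typedef uint32_t size32_t;

/* Buggy pattern: a loff_t-sized difference is stored in a 32-bit size type. */
static size32_t bytes_left_buggy(int64_t new_size, int64_t pos)
{
	return (size32_t)(new_size - pos);	/* high bits are silently dropped */
}

/* Fixed pattern: keep the arithmetic in the 64-bit file-offset type. */
static int64_t bytes_left_fixed(int64_t new_size, int64_t pos)
{
	return new_size - pos;
}
```

    Seen through a 32-bit size type, a 5 GiB truncate target looks like 1 GiB and a target of exactly 4 GiB looks like 0. A loop driven by such a wrong count can fail to make the progress it expects.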
     
  • commit 853a0c25baf96b028de1654bea1e0c8857eadf3d upstream.

    When we hit EIO while writing LVID, the buffer uptodate bit is cleared.
    This then results in an annoying warning from mark_buffer_dirty() when we
    write the buffer again. So just set the uptodate flag unconditionally.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jan Kara
    Cc: Dave Jones
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 6d08f2c7139790c268820a2e590795cb8333181a upstream.

    Once /proc/pid/mem is opened, the memory can't be released until
    mem_release() even if its owner exits.

    Change mem_open() to do atomic_inc(mm_count) + mmput(), this only
    pins mm_struct. Change mem_rw() to do atomic_inc_not_zero(mm_count)
    before access_remote_vm(), this verifies that this mm is still alive.

    I am not sure what mem_rw() should return if atomic_inc_not_zero()
    fails. With this patch it returns zero to match the "mm == NULL" case;
    maybe it should return -EINVAL like it did before e268337d.

    Perhaps it makes sense to add the additional fatal_signal_pending()
    check into the main loop, to ensure we do not hold this memory if
    the target task was oom-killed.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
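    The open/read split described above relies on an "increment unless zero" primitive. The sketch below is a userspace analogue of atomic_inc_not_zero() built on C11 <stdatomic.h>. `inc_not_zero()` is a hypothetical name, and this is not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *v only if it is currently nonzero; return true on success.
 * Userspace analogue of atomic_inc_not_zero(), using a compare-and-swap loop.
 */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}
```

    The increment only succeeds while the counter is nonzero. As a result, a reader that holds only a pin on the structure cannot revive a counter that has already dropped to zero.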
     
  • commit 572d34b946bae070debd42db1143034d9687e13f upstream.

    No functional changes, cleanup and preparation.

    mem_read() and mem_write() are very similar. Move this code into the
    new common helper, mem_rw(), which takes the additional "int write"
    argument.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit 71879d3cb3dd8f2dfdefb252775c1b3ea04a3dd4 upstream.

    mem_release() can hit mm == NULL, add the necessary check.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     

04 Feb, 2012

8 commits

  • commit ce597919361dcec97341151690e780eade2a9cf4 upstream.

    Recently an OOPS was observed from the usb serial io_ti driver when it tried to remove
    sysfs directories. Upon investigation it turns out this driver was always buggy
    and that a recent sysfs change had stopped guarding itself against removing attributes
    from sysfs directories that had already been removed. :(

    Historically we have been silent about attempts to remove files from nonexistent sysfs
    directories and have politely returned error codes. That has resulted in people writing
    broken code that ignores the error codes.

    Issue a kernel WARNING and a stack backtrace to make it clear in no uncertain
    terms that abusing sysfs is not ok, and the callers need to fix their code.

    This change transforms the io_ti OOPS into a more comprehensible error message
    and stack backtrace.

    Signed-off-by: Eric W. Biederman
    Reported-by: Wolfgang Frisch
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 353b67d8ced4dc53281c88150ad295e24bc4b4c5 upstream.

    When we reach cleanup_journal_tail(), there is no guarantee that
    checkpointed buffers are on stable storage - especially if buffers were
    written out by log_do_checkpoint(), they are likely to be only in the disk's
    caches. Thus when we update the journal superblock, effectively removing old
    transactions from the journal, this superblock write can reach stable storage
    before those checkpointed buffers, which can result in filesystem corruption
    after a crash.

    A similar problem can happen if we replay the journal and wipe it before
    flushing disk's caches.

    Thus we must unconditionally issue a cache flush before we update journal
    superblock in these cases. The fix is slightly complicated by the fact that we
    have to get the log tail before we issue the cache flush, but we can store it in the
    journal superblock only after the cache flush. Otherwise we risk races where
    new tail is written before appropriate cache flush is finished.

    I managed to reproduce the corruption using a somewhat tweaked version of Chris Mason's
    barrier-test scheduler. This should also fix occasional reports of 'Bit already
    freed' filesystem errors, which are totally unreproducible, but inspection of
    several fs images I've gathered over time points to a problem like this.

    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
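    The ordering requirement can be modeled with a fake device that records the requests it receives. This is a hypothetical illustration, not jbd code. `struct fake_dev`, `dev_flush()`, `dev_write_sb()` and `update_journal_tail()` are invented names. The tail value is assumed to have been sampled by the caller before the flush, as the message requires.

```c
#include <stddef.h>

/* Hypothetical device model that logs the order of requests it receives. */
enum dev_op { OP_CACHE_FLUSH, OP_WRITE_SUPERBLOCK };

struct fake_dev {
	enum dev_op log[8];
	size_t nops;
	unsigned long sb_tail;	/* tail value stored in the journal superblock */
};

static void dev_flush(struct fake_dev *d)
{
	d->log[d->nops++] = OP_CACHE_FLUSH;
}

static void dev_write_sb(struct fake_dev *d, unsigned long tail)
{
	d->sb_tail = tail;
	d->log[d->nops++] = OP_WRITE_SUPERBLOCK;
}

/*
 * tail was sampled by the caller before this call. Flush the disk cache
 * so checkpointed buffers are durable, and only then publish the new tail.
 */
static void update_journal_tail(struct fake_dev *d, unsigned long tail)
{
	dev_flush(d);
	dev_write_sb(d, tail);
}
```

    Swapping the two calls in update_journal_tail() reproduces the unsafe order, in which the superblock can reach stable storage before the flush.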
     
  • commit 9b025eb3a89e041bab6698e3858706be2385d692 upstream.

    Commit b52a360b forgot to call xfs_iunlock() when it detected a corrupted
    symlink and bailed out. Fix it by jumping to 'out' instead of returning.

    CC: Carlos Maiolino
    Signed-off-by: Jan Kara
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 58ded24f0fcb85bddb665baba75892f6ad0f4b8a upstream.

    If pages passed to the eCryptfs extent-based crypto functions are not
    mapped and the module parameter ecryptfs_verbosity=1 was specified at
    loading time, a NULL pointer dereference will occur.

    Note that this wouldn't happen on a production system, as you wouldn't
    pass ecryptfs_verbosity=1 on a production system. It leaks private
    information to the system logs and is for debugging only.

    The debugging info printed in these messages is no longer very useful
    and rather than doing a kmap() in these debugging paths, it will be
    better to simply remove the debugging paths completely.

    https://launchpad.net/bugs/913651

    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Tyler Hicks
     
  • commit a261a03904849c3df50bd0300efb7fb3f865137d upstream.

    Most filesystems call inode_change_ok() very early in ->setattr(), but
    eCryptfs didn't call it at all. It allowed the lower filesystem to make
    the call in its ->setattr() function. Then, eCryptfs would copy the
    appropriate inode attributes from the lower inode to the eCryptfs inode.

    This patch changes that and actually calls inode_change_ok() on the
    eCryptfs inode, fairly early in ecryptfs_setattr(). Ideally, the call
    would happen earlier in ecryptfs_setattr(), but there are some possible
    inode initialization steps that must happen first.

    Since the call was already being made on the lower inode, the change in
    functionality should be minimal, except for the case of a file extending
    truncate call. In that case, inode_newsize_ok() was never being
    called on the eCryptfs inode. Rather than inode_newsize_ok() catching
    maximum file size errors early on, eCryptfs would encrypt zeroed pages
    and write them to the lower filesystem until the lower filesystem's
    write path caught the error in generic_write_checks(). This patch
    introduces a new function, called ecryptfs_inode_newsize_ok(), which
    checks if the new lower file size is within the appropriate limits when
    the truncate operation will be growing the lower file.

    In summary this change prevents eCryptfs truncate operations (and the
    resulting page encryptions), which would exceed the lower filesystem
    limits or FSIZE rlimits, from ever starting.

    Signed-off-by: Tyler Hicks
    Reviewed-by: Li Wang
    Signed-off-by: Greg Kroah-Hartman

    Tyler Hicks
     
  • commit 5e6f0d769017cc49207ef56996e42363ec26c1f0 upstream.

    ecryptfs_write() handles the truncation of eCryptfs inodes. It grabs a
    page, zeroes out the appropriate portions, and then encrypts the page
    before writing it to the lower filesystem. It was unkillable and due to
    the lack of sparse file support could result in tying up a large portion
    of system resources, while encrypting pages of zeros, with no way for
    the truncate operation to be stopped from userspace.

    This patch adds the ability for ecryptfs_write() to detect a pending
    fatal signal and return as gracefully as possible. The intent is to
    leave the lower file in a usable state, while still allowing a user to
    break out of the encryption loop. If a pending fatal signal is detected,
    the eCryptfs inode size is updated to reflect the modified inode size
    and then -EINTR is returned.

    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Tyler Hicks
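    The early-exit behaviour can be sketched as a loop that checks for a pending fatal signal once per page. `fake_fatal_signal_pending()` is a test double standing in for the kernel's fatal_signal_pending(), and `extend_with_zero_pages()` is a hypothetical name. The zeroing and encryption work is elided.

```c
#include <errno.h>
#include <stdbool.h>

/* Test double for fatal_signal_pending(current): fires after N checks. */
static int checks_before_signal;

static bool fake_fatal_signal_pending(void)
{
	return checks_before_signal-- <= 0;
}

/*
 * Extend *size toward target one page at a time. If a fatal signal is
 * pending, stop early: *size already reflects the pages written so far,
 * and -EINTR is returned.
 */
static int extend_with_zero_pages(long long *size, long long target,
				  long long page_size)
{
	while (*size < target) {
		long long step = target - *size;

		if (fake_fatal_signal_pending())
			return -EINTR;
		if (step > page_size)
			step = page_size;
		/* A real implementation would zero, encrypt and write a page here. */
		*size += step;
	}
	return 0;
}
```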
     
  • commit 30373dc0c87ffef68d5628e77d56ffb1fa22e1ee upstream.

    Print inode on metadata read failure. The only real
    way of dealing with metadata read failures is to delete
    the underlying file system file. Having the inode
    allows one to run 'find . -inum INODE'.

    [tyhicks@canonical.com: Removed some minor not-for-stable parts]
    Signed-off-by: Tim Gardner
    Reviewed-by: Kees Cook
    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Tim Gardner
     
  • commit db10e556518eb9d21ee92ff944530d84349684f4 upstream.

    A malicious count value specified when writing to /dev/ecryptfs may
    result in a very large kernel memory allocation.

    This patch peeks at the specified packet payload size, adds that to the
    size of the packet headers and compares the result with the write count
    value. The resulting maximum memory allocation size is approximately 532
    bytes.

    Signed-off-by: Tyler Hicks
    Reported-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tyler Hicks
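    Below is a minimal sketch of validating the write count before allocating anything. It assumes a hypothetical packet layout: a fixed-size header followed by a payload whose length the header declares. `PKT_HEADER_LEN`, `PKT_MAX_PAYLOAD` and `check_write_count()` are invented for illustration and do not reflect the real /dev/ecryptfs packet format.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical limits, chosen only for illustration. */
#define PKT_HEADER_LEN  6u	/* fixed-size header preceding the payload */
#define PKT_MAX_PAYLOAD 512u	/* largest payload the receiver accepts */

/*
 * Reject a write whose count does not match header + declared payload,
 * so an arbitrary count can never drive the allocation size.
 */
static int check_write_count(size_t count, size_t declared_payload)
{
	if (count < PKT_HEADER_LEN)
		return -EINVAL;
	if (declared_payload > PKT_MAX_PAYLOAD)
		return -EINVAL;
	if (count != PKT_HEADER_LEN + declared_payload)
		return -EINVAL;
	return 0;	/* safe to allocate count bytes, bounded by the limits above */
}
```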
     

26 Jan, 2012

18 commits

  • commit 85e72aa5384b1a614563ad63257ded0e91d1a620 upstream.

    /proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
    pages and corresponding page table entries of the task with PID pid, which
    includes any special mappings inserted into the page tables in order to
    provide things like vDSOs and user helper functions.

    On ARM this causes a problem because the vectors page is mapped as a
    global mapping and since ec706dab ("ARM: add a vma entry for the user
    accessible vector page"), a VMA is also inserted into each task for this
    page to aid unwinding through signals and syscall restarts. Since the
    vectors page is required for handling faults, clearing the YOUNG bit (and
    subsequently writing a faulting pte) means that we lose the vectors page
    *globally* and cannot fault it back in. This results in a system deadlock
    on the next exception.

    To see this problem in action, just run:

    $ echo 1 > /proc/self/clear_refs

    on an ARM platform (as any user) and watch your system hang. I think this
    has been the case since 2.6.37.

    This patch avoids clearing the aforementioned bits for reserved pages,
    therefore leaving the vectors page intact on ARM. Since reserved pages
    are not candidates for swap, this change should not have any impact on the
    usefulness of clear_refs.

    Signed-off-by: Will Deacon
    Reported-by: Moussa Ba
    Acked-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Russell King
    Acked-by: Nicolas Pitre
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • commit ce91acb3acae26f4163c5a6f1f695d1a1e8d9009 upstream.

    We've had some reports of servers (namely, the Solaris in-kernel CIFS
    server) that don't deal properly with writes that are "too large" even
    though they set CAP_LARGE_WRITE_ANDX. Change the default to better
    mirror what Windows clients do.

    Cc: Pavel Shilovsky
    Reported-by: Nick Davis
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit b1c770c273a4787069306fc82aab245e9ac72e9d upstream.

    When finding the longest extent in an AG, we read the value directly
    out of the AGF buffer without endian conversion. This will give an
    incorrect length, resulting in FITRIM operations potentially not
    trimming everything that it should.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers
    Signed-off-by: Greg Kroah-Hartman

    Dave Chinner
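    The sketch below shows the difference for a 32-bit big-endian on-disk field. XFS stores its on-disk structures big-endian, and the kernel's conversion helper is be32_to_cpu(). `be32_decode()` and `raw_host_read()` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit big-endian on-disk value regardless of host byte order. */
static uint32_t be32_decode(const unsigned char b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Buggy pattern: reinterpret the raw bytes in host order (wrong on little-endian). */
static uint32_t raw_host_read(const unsigned char b[4])
{
	uint32_t v;

	memcpy(&v, b, sizeof(v));
	return v;
}
```

    On a little-endian host, raw_host_read() returns 65536 for the bytes 00 00 01 00, where be32_decode() returns 256. A length read this way would be wrong.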
     
  • commit e268337dfe26dfc7efd422a804dbb27977a3cccc upstream.

    Jüri Aedla reported that the /proc//mem handling really isn't very
    robust, and it also doesn't match the permission checking of any of the
    other related files.

    This changes it to do the permission checks at open time, and instead of
    tracking the process, it tracks the VM at the time of the open. That
    simplifies the code a lot, but does mean that if you hold the file
    descriptor open over an execve(), you'll continue to read from the _old_
    VM.

    That is different from our previous behavior, but much simpler. If
    somebody actually finds a load where this matters, we'll need to revert
    this commit.

    I suspect that nobody will ever notice - because the process mapping
    addresses will also have changed as part of the execve. So you cannot
    actually usefully access the fd across a VM change simply because all
    the offsets for IO would have changed too.

    Reported-by: Jüri Aedla
    Cc: Al Viro
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit c3e0ef9a298e028a82ada28101ccd5cf64d209ee upstream.

    For 32-bit architectures using standard jiffies the idletime calculation
    in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds
    of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000.
    Switch to 64-bit calculations.

    Cc: Michael Abbott
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Greg Kroah-Hartman

    Martin Schwidefsky
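    The wraparound can be reproduced with a sketch that sums per-CPU idle jiffies into a 32-bit accumulator versus a 64-bit one. The function names are hypothetical, and the real uptime code is structured differently. Only the overflow is modeled.

```c
#include <stdint.h>

/* Buggy pattern: a 32-bit accumulator wraps modulo 2^32. */
static uint32_t total_idle_jiffies_32(const uint32_t *per_cpu, int ncpus)
{
	uint32_t sum = 0;

	for (int i = 0; i < ncpus; i++)
		sum += per_cpu[i];	/* unsigned overflow wraps silently */
	return sum;
}

/* Fixed pattern: accumulate in 64 bits. */
static uint64_t total_idle_jiffies_64(const uint32_t *per_cpu, int ncpus)
{
	uint64_t sum = 0;

	for (int i = 0; i < ncpus; i++)
		sum += per_cpu[i];
	return sum;
}

static uint64_t idle_seconds(uint64_t jiffies, uint32_t hz)
{
	return jiffies / hz;
}
```

    At HZ=1000, 2^32 jiffies is about 4,294,967 seconds (roughly 49.7 days) of idle time. With four idle CPUs, that total accumulates in about a quarter of the wall-clock time.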
     
  • commit 74a6eeb44ca6174d9cc93b9b8b4d58211c57bc80 upstream.

    One bio can have at most BIO_MAX_PAGES pages. We should limit it because otherwise
    bio_alloc will fail when there are many pages in one read/write_pagelist.

    Signed-off-by: Peng Tao
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Peng Tao
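    Below is a sketch of capping each bio at a fixed number of pages. The cap of 256 is an assumption for illustration. It is named `SKETCH_BIO_MAX_PAGES` so as not to imply it is the kernel constant. `next_bio_pages()` and `bios_needed()` are hypothetical helpers.

```c
#include <stddef.h>

/* Assumed per-bio page cap for this sketch; not necessarily BIO_MAX_PAGES. */
#define SKETCH_BIO_MAX_PAGES 256u

/* Number of pages to put in the next bio: never more than the cap. */
static size_t next_bio_pages(size_t remaining)
{
	return remaining < SKETCH_BIO_MAX_PAGES ? remaining : SKETCH_BIO_MAX_PAGES;
}

/* How many capped bios a pagelist of npages needs. */
static size_t bios_needed(size_t npages)
{
	size_t nbios = 0;

	while (npages > 0) {
		npages -= next_bio_pages(npages);
		nbios++;
	}
	return nbios;
}
```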
     
  • commit 93a3844ee0f843b05a1df4b52e1a19ff26b98d24 upstream.

    bl_free_block_dev() may sleep, so we cannot call it with a spinlock held.
    Besides, there is no need to take bm_lock as we are the last user freeing bm_devlist.

    Signed-off-by: Peng Tao
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Peng Tao
     
  • commit 39e567ae36fe03c2b446e1b83ee3d39bea08f90b upstream.

    When calling _add_entry, we should take the im_lock to protect
    against other modifiers.

    Signed-off-by: Peng Tao
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Peng Tao
     
  • commit eaf5f9073533cde21c7121c136f1c3f072d9cf59 upstream.

    Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
    cause shrink_dcache_parent() to loop forever.

    Here's what appears to happen:

    1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1

    2 - CPU1: select_parent(P) locks P->d_lock

    3 - CPU0: shrink_dentry_list() locks C->d_lock
        dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

    4 - CPU1: select_parent(P) locks C->d_lock,
        moves C from dispose list being processed on CPU0 to the new
        dispose list, returns 1

    5 - CPU0: shrink_dentry_list() finds dispose list empty, returns

    6 - Goto 2 with CPU0 and CPU1 switched

    Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
    it found a new one, causing shrink_dentry_list() to think it's making progress
    and loop over and over.

    One way to trigger this is to make udev call stat() on the sysfs file while it
    is going away.

    Having a file in /lib/udev/rules.d/ with only this one rule seems to do the trick:

    ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"

    Then execute the following loop:

    while true; do
        echo -bond0 > /sys/class/net/bonding_masters
        echo +bond0 > /sys/class/net/bonding_masters
        echo -bond1 > /sys/class/net/bonding_masters
        echo +bond1 > /sys/class/net/bonding_masters
    done

    One fix would be to check all callers and prevent concurrent calls to
    shrink_dcache_parent(). But I think a better solution is to stop the
    stealing behavior.

    This patch adds a new dentry flag that is set when the dentry is added to the
    dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a
    new reference just before being pruned.

    If the dentry has this flag, select_parent() will skip it and let
    shrink_dentry_list() retry pruning it. With select_parent() skipping those
    dentries there will not be the appearance of progress (new dentries found) when
    there is none, hence shrink_dcache_parent() will not loop forever.

    The flag is also set in prune_dcache_sb() for consistency, as suggested by
    Linus.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit b48f03b319ba78f3abf9a7044d1f436d8d90f4f9 upstream.

    select_parent currently abuses the dentry cache LRU to provide
    cleanup features for child dentries that need to be freed. It moves
    them to the tail of the LRU, then tells shrink_dcache_parent() to
    call __shrink_dcache_sb to unconditionally move them to a dispose
    list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
    relock the dentries to move them off the LRU onto the dispose list,
    but otherwise does not touch the dentries that select_parent() moved
    to the tail of the LRU. It then passes the dispose list to
    shrink_dentry_list() which tries to free the dentries.

    IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
    exactly the same list of dentries for disposal directly in
    select_parent() and call shrink_dentry_list() instead of calling
    __shrink_dcache_sb() to do that. This means that we avoid long holds
    on the lru lock walking the LRU moving dentries to the dispose list.
    We also avoid the need to relock each dentry just to move it off the
    LRU, reducing the number of times we lock each dentry to dispose of
    them in shrink_dcache_parent() from 3 to 2.

    Further, we remove one of the two callers of __shrink_dcache_sb().
    This also means that __shrink_dcache_sb can be moved back into
    prune_dcache_sb() and we no longer have to handle referenced
    dentries conditionally, simplifying the code.

    Signed-off-by: Dave Chinner
    Signed-off-by: Linus Torvalds
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Dave Chinner
     
  • commit fed474857efbed79cd390d0aee224231ca718f63 upstream.

    Removing the parent of a watched file results in "kernel BUG at
    fs/notify/mark.c:139".

    To reproduce:

    add "-w /tmp/audit/dir/watched_file" to audit.rules
    rm -rf /tmp/audit/dir

    This is caused by fsnotify_destroy_mark() being called without an
    extra reference taken by the caller.

    Reported by Francesco Cosoleto here:

    https://bugzilla.novell.com/show_bug.cgi?id=689860

    Fix by removing the BUG_ON and adding a comment about not accessing mark after
    the iput.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit b2ea70afade7080360ac55c4e64ff7a5fafdb67b upstream.

    expkey_parse() oopses when handling a 0 length export. This is easily
    triggerable from usermode by writing 0 bytes into
    '/proc/[proc id]/net/rpc/nfsd.fh/channel'.

    Below is the log:

    [ 1402.286893] BUG: unable to handle kernel paging request at ffff880077c49fff
    [ 1402.287632] IP: [] expkey_parse+0x28/0x2e1
    [ 1402.287632] PGD 2206063 PUD 1fdfd067 PMD 1ffbc067 PTE 8000000077c49160
    [ 1402.287632] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 1402.287632] CPU 1
    [ 1402.287632] Pid: 20198, comm: trinity Not tainted 3.2.0-rc2-sasha-00058-gc65cd37 #6
    [ 1402.287632] RIP: 0010:[] [] expkey_parse+0x28/0x2e1
    [ 1402.287632] RSP: 0018:ffff880077f0fd68 EFLAGS: 00010292
    [ 1402.287632] RAX: ffff880077c49fff RBX: 00000000ffffffea RCX: 0000000001043400
    [ 1402.287632] RDX: 0000000000000000 RSI: ffff880077c4a000 RDI: ffffffff82283de0
    [ 1402.287632] RBP: ffff880077f0fe18 R08: 0000000000000001 R09: ffff880000000000
    [ 1402.287632] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880077c4a000
    [ 1402.287632] R13: ffffffff82283de0 R14: 0000000001043400 R15: ffffffff82283de0
    [ 1402.287632] FS: 00007f25fec3f700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
    [ 1402.287632] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 1402.287632] CR2: ffff880077c49fff CR3: 0000000077e1d000 CR4: 00000000000406e0
    [ 1402.287632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1402.287632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 1402.287632] Process trinity (pid: 20198, threadinfo ffff880077f0e000, task ffff880077db17b0)
    [ 1402.287632] Stack:
    [ 1402.287632] ffff880077db17b0 ffff880077c4a000 ffff880077f0fdb8 ffffffff810b411e
    [ 1402.287632] ffff880000000000 ffff880077db17b0 ffff880077c4a000 ffffffff82283de0
    [ 1402.287632] 0000000001043400 ffffffff82283de0 ffff880077f0fde8 ffffffff81111f63
    [ 1402.287632] Call Trace:
    [ 1402.287632] [] ? lock_release+0x1af/0x1bc
    [ 1402.287632] [] ? might_fault+0x97/0x9e
    [ 1402.287632] [] ? might_fault+0x4e/0x9e
    [ 1402.287632] [] cache_do_downcall+0x3e/0x4f
    [ 1402.287632] [] cache_write.clone.16+0xbb/0x130
    [ 1402.287632] [] ? cache_write_pipefs+0x1a/0x1a
    [ 1402.287632] [] cache_write_procfs+0x19/0x1b
    [ 1402.287632] [] proc_reg_write+0x8e/0xad
    [ 1402.287632] [] vfs_write+0xaa/0xfd
    [ 1402.287632] [] ? fget_light+0x35/0x9e
    [ 1402.287632] [] sys_write+0x48/0x6f
    [ 1402.287632] [] system_call_fastpath+0x16/0x1b
    [ 1402.287632] Code: c0 c9 c3 55 48 63 d2 48 89 e5 48 8d 44 32 ff 41 57 41 56 41 55 41 54 53 bb ea ff ff ff 48 81 ec 88 00 00 00 48 89 b5 58 ff ff ff
    [ 1402.287632] 38 0a 0f 85 89 02 00 00 c6 00 00 48 8b 3d 44 4a e5 01 48 85
    [ 1402.287632] RIP [] expkey_parse+0x28/0x2e1
    [ 1402.287632] RSP
    [ 1402.287632] CR2: ffff880077c49fff
    [ 1402.287632] ---[ end trace 368ef53ff773a5e3 ]---

    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Sasha Levin
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
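    In the trace above, CR2 (ffff880077c49fff) is exactly one byte below RSI (ffff880077c4a000). That is consistent with reading mesg[mlen - 1] when mlen is 0. Below is a sketch of the guard, with `terminate_message()` as a hypothetical stand-in for the start of expkey_parse().

```c
#include <errno.h>

/*
 * Validate and terminate a newline-terminated downcall message in place.
 * Checking mlen first avoids reading mesg[-1] on a zero-length write.
 */
static int terminate_message(char *mesg, int mlen)
{
	if (mlen < 1 || mesg[mlen - 1] != '\n')
		return -EINVAL;
	mesg[mlen - 1] = '\0';
	return 0;
}
```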
     
  • commit b93d87c19821ba7d3ee11557403d782e541071ad upstream.

    Lockowners are looked up by file as well as by owner, but we were
    forgetting to do a comparison on the file. This could cause an
    incorrect result from lockt.

    (Note: looking up the inode from the lockowner is pretty awkward here.
    The data structures need fixing.)

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
     
  • commit 69e4747ee9727d660b88d7e1efe0f4afcb35db1b upstream.

    Since commit 080d676de095 ("aio: allocate kiocbs in batches") iocbs are
    allocated in a batch during processing of first iocbs. All iocbs in a
    batch are automatically added to ctx->active_reqs list and accounted in
    ctx->reqs_active.

    If one (not the last one) of the iocbs submitted by a user fails, further
    iocbs are not processed, but they are still present in ctx->active_reqs
    and accounted in ctx->reqs_active. This causes the process to get stuck in
    D state in wait_for_all_aios() on exit, since ctx->reqs_active will never
    go down to zero. Furthermore, since kiocb_batch_free() frees iocbs
    without removing them from the active_reqs list, the list becomes corrupted,
    which may cause an oops.

    Fix this by removing iocb from ctx->active_reqs and updating
    ctx->reqs_active in kiocb_batch_free().

    Signed-off-by: Gleb Natapov
    Reviewed-by: Jeff Moyer
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Gleb Natapov
     
  • commit 1f5d78dc4823a85f112aaa2d0f17624f8c2a6c52 upstream.

    We switched to dynamic debugging in commit
    56e46742e846e4de167dde0e1e1071ace1c882a5 but did not take into account that
    we no longer control whether a specific message is enabled or not.
    So now we lock the "dbg_lock" and release it in every debugging macro, which
    makes them not so light-weight.

    This commit removes the "dbg_lock" protection from the debugging macros to
    fix the issue.

    The downside is that now our DBGKEY() stuff is broken, but this is not
    critical at all and will be fixed later.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Greg Kroah-Hartman

    Artem Bityutskiy
     
  • commit d34315da9146253351146140ea4b277193ee5e5f upstream.

    Patch 56e46742e846e4de167dde0e1e1071ace1c882a5 broke UBIFS debugging messages:
    before that commit when UBIFS debugging was enabled, users saw a few useful
    debugging messages after mount. However, that patch turned 'dbg_msg()' into
    'pr_debug()', so to enable the debugging messages users have to enable them
    first via /sys/kernel/debug/dynamic_debug/control, which is very impractical.

    This commit makes 'dbg_msg()' use 'printk()' instead of 'pr_debug()', just
    as it was before the breakage.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Greg Kroah-Hartman

    Artem Bityutskiy
     
  • commit 8a0d551a59ac92d8ff048d6cb29d3a02073e81e8 upstream.

    Setting the security context of a NFSv4 mount via the context= mount
    option is currently broken. The NFSv4 codepath allocates a parsed
    options struct, and then parses the mount options to fill it. It
    eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
    That clobbers the lsm_opts struct that was populated earlier. This bug
    also looks like it causes a small memory leak on each v4 mount where
    context= is used.

    Fix this by moving the initialization of the lsm_opts into
    nfs_alloc_parsed_mount_data. Also, add a destructor for
    nfs_parsed_mount_data to make it easier to free all of the allocations
    hanging off of it, and to ensure that the security_free_mnt_opts is
    called whenever security_init_mnt_opts is.

    I believe this regression was introduced quite some time ago, probably
    by commit c02d7adf.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit bf118a342f10dafe44b14451a1392c3254629a1f upstream.

    The NFSv4 bitmap size is unbounded: a server can return an arbitrarily
    sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
    nfs4_fattr_bitmap_maxsz as a guess at the maximum bitmask returned by a server
    with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
    xdr length to the (cached) acl page data.

    This is a general solution to commit e5012d1f "NFSv4.1: update
    nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
    when getting ACLs.

    Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
    was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Andy Adamson