07 Jun, 2014

1 commit

  • The page table walker doesn't check for non-present hugetlb entries in
    the common path, so hugetlb_entry() callbacks must check for them. The
    reason for this behavior is that some callers want to handle it in their
    own way.

    [ I think that reason is bogus, btw - it should just do what the regular
    code does, which is to call the "pte_hole()" function for such hugetlb
    entries - Linus]

    However, some callers don't check it now, which causes unpredictable
    results, for example when there is a race between migrating a hugepage
    and reading /proc/pid/numa_maps. This patch fixes it by adding
    !pte_present checks to the buggy callbacks.

    This bug has existed for years and became visible with the
    introduction of hugepage migration.
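    A rough userspace model of the callback-side check may help. Everything
    here is hypothetical: the struct stands in for a huge pte, the walker for
    the page table walker's hugetlb path, and the present flag for
    pte_present(). It only illustrates why a callback that interprets an
    entry must first skip non-present ones, such as the entry of a hugepage
    under migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a huge page table entry. In the kernel this
 * would be a pte_t tested with pte_present(). */
struct model_hpte {
    bool present;       /* false for e.g. a migration entry */
    unsigned long pfn;  /* only meaningful when present */
};

typedef int (*hugetlb_cb)(const struct model_hpte *pte, void *priv);

/* Like the walker described above, this does NOT filter out
 * non-present entries: every entry reaches the callback. */
static int model_walk_hugetlb(const struct model_hpte *ptes, size_t n,
                              hugetlb_cb cb, void *priv)
{
    for (size_t i = 0; i < n; i++) {
        int err = cb(&ptes[i], priv);

        if (err)
            return err;
    }
    return 0;
}

/* A callback with the kind of check the patch adds: skip non-present
 * entries before interpreting the pfn. Counts present huge pages. */
static int count_present_cb(const struct model_hpte *pte, void *priv)
{
    unsigned long *count = priv;

    if (!pte->present)
        return 0;
    (*count)++;
    return 0;
}
```

    Walking { present, non-present, present } with count_present_cb()
    counts 2; without the check, the callback would also act on the stale
    pfn of the non-present entry.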

    ChangeLog v2:
    - fix if condition (check !pte_present() instead of pte_present())

    Reported-by: Sasha Levin
    Signed-off-by: Naoya Horiguchi
    Cc: Rik van Riel
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    [ Backported to 3.15. Signed-off-by: Josh Boyer ]
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

03 Jun, 2014

1 commit

  • One sysfs residue still remains: the sb_magic SYSFS_MAGIC. However,
    this should be specific to each kernfs user, so this patch moves it
    out. Kernfs users should specify their magic number when mounting.

    Signed-off-by: Jianyu Zhan
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Jianyu Zhan
     

01 Jun, 2014

1 commit

  • lock_parent() very much on purpose does nested locking of dentries, and
    is careful to maintain the right order (lock parent first). But because
    it didn't annotate the nested locking order, lockdep thought it might be
    a deadlock on d_lock, and complained.

    Add the proper annotation for the inner locking of the child dentry to
    make lockdep happy.

    Introduced by commit 046b961b45f9 ("shrink_dentry_list(): take parent's
    ->d_lock earlier").

    Reported-and-tested-by: Josh Boyer
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 May, 2014

3 commits

  • it's 1 in the only remaining caller.

    Signed-off-by: Al Viro

    Al Viro
     
  • We have the same problem with ->d_lock order in the inner loop, where
    we are dropping references to ancestors. Same solution, basically -
    instead of using dentry_kill() we use lock_parent() (introduced in the
    previous commit) to get that lock in a safe way, recheck ->d_count
    (in case lock_parent() has ended up dropping and retaking ->d_lock
    and somebody managed to grab a reference during that window), trylock
    the inode->i_lock and use __dentry_kill() to do the rest.

    Signed-off-by: Al Viro

    Al Viro
     
  • The cause of livelocks there is that we are taking ->d_lock on
    dentry and its parent in the wrong order, forcing us to use
    trylock on the parent's one. d_walk() takes them in the right
    order, and unfortunately it's not hard to create a situation
    where shrink_dentry_list() can't make progress since trylock
    keeps failing, and shrink_dcache_parent() or check_submounts_and_drop()
    keeps calling d_walk() disrupting the very shrink_dentry_list() it's
    waiting for.

    The solution is straightforward - if that trylock fails, let's unlock
    the dentry itself and take locks in the right order. We need to
    stabilize ->d_parent without holding ->d_lock, but that's doable
    using RCU. And we'd better do that in the very beginning of the
    loop in shrink_dentry_list(), since the checks on refcount, etc.
    would need to be redone anyway.

    That deals with half of the problem - killing dentries on the
    shrink list itself. The other half (dropping their parents) is
    handled in the next commit.

    Locking the parent is interesting - it would be easy to do
    rcu_read_lock(), lock whatever we think is the parent, lock the dentry
    itself and check if the parent is still the right one. Except that we
    need to check that *before* locking the dentry, or we risk taking
    ->d_lock out of order. Fortunately, once the parent D1 is locked, we can
    check whether the child D2's ->d_parent is equal to D1 without needing
    to lock D2; D2->d_parent can start or stop pointing to D1 only under
    D1->d_lock, so taking D1->d_lock is enough. In other words, the right
    solution is rcu_read_lock / lock what looks like the parent right now /
    check if it's still our parent / rcu_read_unlock / lock the child.
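    The lock_parent() sequence described above can be sketched in userspace
    C with pthread mutexes. This is a model under stated assumptions, not the
    dcache code: struct node and its fields are hypothetical, nodes are
    assumed never to be freed while in use (the kernel relies on RCU for
    that), and ->parent is assumed to change only while the old parent's
    lock is held. A root points to itself, as IS_ROOT() dentries do.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a dentry. A root's parent points to itself. */
struct node {
    pthread_mutex_t lock;
    _Atomic(struct node *) parent;
};

/* Called with child->lock held. Returns the parent with both locks held
 * (parent taken before child), or NULL if the child is a root, in which
 * case only child->lock is held. */
static struct node *lock_parent(struct node *child)
{
    struct node *parent = atomic_load(&child->parent);

    if (parent == child)
        return NULL;                    /* root: no parent to lock */
    if (pthread_mutex_trylock(&parent->lock) == 0)
        return parent;                  /* trylock cannot deadlock */

    /* Slow path: drop the child so locks can be taken parent-first. */
    pthread_mutex_unlock(&child->lock);
    for (;;) {
        parent = atomic_load(&child->parent);  /* kernel: under RCU */
        pthread_mutex_lock(&parent->lock);
        /* ->parent can only stop pointing here under parent->lock,
         * so this comparison is stable while we hold it. */
        if (parent == atomic_load(&child->parent))
            break;
        pthread_mutex_unlock(&parent->lock);   /* lost a race, retry */
    }
    if (parent == child)                /* became a root meanwhile; */
        return NULL;                    /* its lock is already held */
    pthread_mutex_lock(&child->lock);   /* kernel: spin_lock_nested() */
    return parent;
}
```

    Only the fast path and the root case are easy to exercise from a single
    thread; the slow path needs a second thread holding the parent's lock.
    As the 01 Jun entry notes, the kernel also has to annotate the inner
    child lock for lockdep, which pthread mutexes have no equivalent of.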

    Signed-off-by: Al Viro

    Al Viro
     

29 May, 2014

2 commits


28 May, 2014

2 commits

  • It can happen only when dentry_kill() is called with unlock_on_failure
    equal to 0 - other callers had dentry pinned until the moment they've
    got ->d_lock and DCACHE_DENTRY_KILLED is set only after lockref_mark_dead().

    IOW, only one of three call sites of dentry_kill() might end up reaching
    that code. Just move it there.

    Signed-off-by: Al Viro

    Al Viro
     
  • Commit 6130f5315ee8 "switch vmsplice_to_user() to copy_page_to_iter()" in
    v3.15-rc1 broke vmsplice(2).

    This patch fixes two bugs:

    - count is not initialized to a proper value, which resulted in no data
    being copied

    - if rw_copy_check_uvector() returns a negative value, the iov might be leaked.

    Tested OK.
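    The shape of the fix can be modelled in userspace C.
    model_check_uvector() is a hypothetical stand-in for
    rw_copy_check_uvector(); it assumes, as the leak described above
    implies, that the helper may hand back a heap-allocated copy of the
    iovec array even when it returns an error. A NULL iov_base stands in
    for an invalid user pointer. To keep the sketch small, the caller
    returns the validated byte count rather than the bytes actually copied.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define FAST_IOV 8   /* hypothetical size of the on-stack array */

/* Hypothetical stand-in for rw_copy_check_uvector(): copies the iovecs to
 * 'fast' when they fit, otherwise to a heap array, and stores whichever
 * array it used in *out BEFORE validating. Returns the total length or a
 * negative errno; on failure *out may still point at heap memory. */
static ssize_t model_check_uvector(const struct iovec *uv, size_t nr,
                                   struct iovec *fast, struct iovec **out)
{
    struct iovec *iov = fast;
    ssize_t total = 0;

    if (nr > FAST_IOV) {
        iov = malloc(nr * sizeof(*iov));
        if (!iov) {
            *out = fast;
            return -ENOMEM;
        }
    }
    *out = iov;
    memcpy(iov, uv, nr * sizeof(*iov));
    for (size_t i = 0; i < nr; i++) {
        if (iov[i].iov_base == NULL)     /* stand-in for a bad user pointer */
            return -EFAULT;
        total += (ssize_t)iov[i].iov_len;
    }
    return total;
}

/* Caller shaped like the fix: 'count' is taken from the helper's return
 * value, and the heap copy is released on the error path as well. */
static ssize_t model_vmsplice_to_user(const struct iovec *uv, size_t nr)
{
    struct iovec fast[FAST_IOV];
    struct iovec *iov;
    ssize_t count = model_check_uvector(uv, nr, fast, &iov);

    if (count > 0) {
        /* ... copy up to 'count' bytes from the pipe into iov[] ... */
    }
    if (iov != fast)
        free(iov);
    return count;
}
```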

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     

26 May, 2014

2 commits

  • Pull AFS fixes and cleanups from David Howells:
    "Here are some patches to the AFS filesystem:

    1) Fix problems in the clean-up parts of the cache manager service
    handler.

    2) Split afs_end_call() introduced in (1) and replace some identical
    code elsewhere with a call to the first half of the split function.

    3) Fix an error introduced in the workqueue PREPARE_WORK() elimination
    commits.

    4) Clean up argument passing to functions called from the workqueue as
    there's now an insulating layer between them and the workqueue.
    This is made possible by (3)"

    * 'afs' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    AFS: Pass an afs_call* to call->async_workfn() instead of a work_struct*
    AFS: Fix kafs module unloading
    AFS: Part of afs_end_call() is identical to code elsewhere, so split it
    AFS: Fix cache manager service handlers

    Linus Torvalds
     
  • Pull two nfsd bugfixes from Bruce Fields:
    "Just two bugfixes, one for a merge-window-introduced ACL regression,
    the other for a longer-standing v4 state bug"

    * 'for-3.15' of git://linux-nfs.org/~bfields/linux:
    nfsd4: warn on finding lockowner without stateid's
    nfsd4: remove lockowner when removing lock stateid
    nfsd4: fix corruption on setting an ACL.

    Linus Torvalds
     

24 May, 2014

1 commit

  • In dlm_init(), if creating dlm_lockname_cache fails in
    dlm_init_master_caches(), the previously created dlm_lockres_cache is
    destroyed twice, which crashes the system when loading the module.
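    One way to make such a teardown safe to run twice is to clear each cache
    pointer after destroying it. The sketch below models this in userspace
    C, with malloc()/free() standing in for cache creation and destruction;
    all names are hypothetical, and it is not a copy of the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two caches and a failure knob. */
static void *lockres_cache;
static void *lockname_cache;
static int fail_lockname;   /* simulate the second creation failing */
static int destroy_calls;   /* counts real destroys, to expose doubles */

static void destroy_cache(void *cache)
{
    free(cache);
    destroy_calls++;
}

/* Teardown made idempotent by clearing each pointer after destroying
 * it, so a second call from a caller's error path does nothing. */
static void destroy_master_caches(void)
{
    if (lockname_cache) {
        destroy_cache(lockname_cache);
        lockname_cache = NULL;
    }
    if (lockres_cache) {
        destroy_cache(lockres_cache);
        lockres_cache = NULL;
    }
}

static int init_master_caches(void)
{
    lockres_cache = malloc(64);
    if (!lockres_cache)
        goto bail;
    lockname_cache = fail_lockname ? NULL : malloc(64);
    if (!lockname_cache)
        goto bail;
    return 0;
bail:
    destroy_master_caches();   /* first teardown */
    return -ENOMEM;
}

/* Caller whose own error path tears the caches down again. */
static int model_dlm_init(void)
{
    int status = init_master_caches();

    if (status) {
        destroy_master_caches();   /* second teardown: now a no-op */
        return status;
    }
    return 0;
}
```

    With fail_lockname set, destroy_calls ends at 1; without the two NULL
    assignments, the second teardown would free lockres_cache again.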

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

23 May, 2014

3 commits

  • call->async_workfn() can take an afs_call* arg rather than a work_struct* as
    the functions assigned there are now called from afs_async_workfn() which has
    to call container_of() anyway.

    Signed-off-by: David Howells
    Reviewed-by: Nathaniel Wesley Filardo
    Reviewed-by: Tejun Heo

    David Howells
     
  • At present, it is not possible to successfully unload the kafs module if there
    are outstanding async outgoing calls (those made with afs_make_call()). This
    appears to be due to the changes introduced by:

    commit 059499453a9abd1857d442b44da8b4c126dc72a8
    Author: Tejun Heo
    Date: Fri Mar 7 10:24:50 2014 -0500
    Subject: afs: don't use PREPARE_WORK

    which didn't go far enough. The problem is due to:

    (1) The aforementioned commit introduced a separate handler function pointer
    in the call, call->async_workfn, in addition to the original workqueue
    item, call->async_work, for asynchronous operations because the
    workqueue subsystem cannot handle the work item's function pointer
    being changed whilst the item is queued or being processed.

    (2) afs_async_workfn() was introduced in that commit to be the callback for
    call->async_work. Its sole purpose is to run whatever call->async_workfn
    points to.

    (3) call->async_workfn is only used from afs_async_workfn(), which is only
    set on async_work by afs_collect_incoming_call() - i.e. for incoming
    calls.

    (4) call->async_workfn is *not* set by afs_make_call() when outgoing calls are
    made, and call->async_work is set to afs_process_async_call() - and not
    to afs_async_workfn().

    (5) afs_process_async_call() now changes call->async_workfn rather than
    call->async_work to point to afs_delete_async_call() to clean up, but this
    is only effective for incoming calls because call->async_work does not
    point to afs_async_workfn() for outgoing calls.

    (6) Because, for outgoing calls, call->async_work remains pointing to
    afs_process_async_call(), this results in an infinite loop.

    Instead, make the workqueue uniformly vector through call->async_workfn, via
    afs_async_workfn() and simply initialise call->async_workfn to point to
    afs_process_async_call() in afs_make_call().
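    The uniform vectoring can be modelled in userspace C. All types and
    functions below are hypothetical stand-ins, not the AFS structures: the
    work item's callback is fixed at setup time and always points at a
    trampoline that recovers the call with a container_of()-style cast and
    dispatches through call->async_workfn, passing the call itself as in the
    async_workfn signature change above. A toy single-item queue reruns the
    item while it is requeued.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of container_of(), without the kernel's type check. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical work item: its callback is set once and never changed. */
struct work_model {
    void (*func)(struct work_model *work);
    int queued;
};

/* Hypothetical call: carries the handler that may change over time. */
struct call_model {
    struct work_model async_work;
    void (*async_workfn)(struct call_model *call);
    int processed, deleted;
};

/* The fixed trampoline every call's work item points at. */
static void async_workfn_tramp(struct work_model *work)
{
    struct call_model *call =
        container_of(work, struct call_model, async_work);

    call->async_workfn(call);
}

static void delete_call(struct call_model *call)
{
    call->deleted = 1;   /* real code would free the call here */
}

static void process_call(struct call_model *call)
{
    call->processed++;
    /* Finished: switch handlers and requeue. This works only because
     * the work item itself always runs the trampoline. */
    call->async_workfn = delete_call;
    call->async_work.queued = 1;
}

/* Outgoing-call setup after the fix: the work item uses the trampoline
 * and async_workfn is initialised to the processing handler. */
static void make_call(struct call_model *call)
{
    call->async_work.func = async_workfn_tramp;
    call->async_work.queued = 1;
    call->async_workfn = process_call;
    call->processed = 0;
    call->deleted = 0;
}

/* Toy single-item queue: rerun the item while it is requeued, with a cap
 * so that a regression shows up as a failure rather than a hang. */
static int run_work(struct work_model *work, int max_runs)
{
    int runs = 0;

    while (work->queued && runs < max_runs) {
        work->queued = 0;
        work->func(work);
        runs++;
    }
    return runs;
}
```

    Running the item executes process_call() once and delete_call() once. If
    the work item's callback were the processing handler itself, switching
    async_workfn would have no effect and the queue would keep rerunning it,
    which is the loop described in (6).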

    Signed-off-by: Nathaniel Wesley Filardo
    Signed-off-by: David Howells
    Reviewed-by: Tejun Heo

    Nathaniel Wesley Filardo
     
  • Split afs_end_call() into two pieces, one of which is identical to code in
    afs_process_async_call(). Replace the latter with a call to the first part of
    afs_end_call().

    Signed-off-by: Nathaniel Wesley Filardo
    Signed-off-by: David Howells

    Nathaniel Wesley Filardo
     

22 May, 2014

2 commits

  • Pull two btrfs fixes from Chris Mason:
    "This has two fixes that we've been testing for 3.16, but since both
    are safe and fix real bugs, it makes sense to send for 3.15 instead"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: send, fix incorrect ref access when using extrefs
    Btrfs: fix EIO on reading file after ioctl clone works on it

    Linus Torvalds
     
  • Pull xfs fixes from Dave Chinner:
    "Code inspection of the XFS error number sign translations found a
    bunch of issues, including returning incorrectly signed errors for
    some data integrity operations.

    These leak to userspace and result in applications not getting the
    errors correctly reported. Hence they need fixing sooner rather than
    later.

    A couple of the bugs are in data integrity operations, a couple more
    are in the new COLLAPSE_RANGE code. One of these came in through a
    recent ext4 merge and so I had to update the base tree to 3.15-rc5
    before fixing the issues"

    * tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs:
    xfs: list_lru_init returns a negative error
    xfs: negate xfs_icsb_init_counters error value
    xfs: negate mount workqueue init error value
    xfs: fix wrong err sign on xfs_set_acl()
    xfs: fix wrong errno from xfs_initxattrs
    xfs: correct error sign on COLLAPSE_RANGE errors
    xfs: xfs_commit_metadata returns wrong errno
    xfs: fix incorrect error sign in xfs_file_aio_read
    xfs: xfs_dir_fsync() returns positive errno
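    Assuming, as these fixes imply, that XFS code of this era uses positive
    errno values internally while the VFS expects 0 or a negative errno,
    each boundary crossing needs exactly one negation. A sketch of both
    directions, with every function name hypothetical:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical internal helper using the positive-errno convention. */
static int model_internal_flush(int fail)
{
    return fail ? EIO : 0;
}

/* Hypothetical VFS-facing method: the VFS expects 0 or a negative errno,
 * so the sign is flipped exactly once, at this boundary. */
static int model_fsync(int fail)
{
    int error = model_internal_flush(fail);

    return -error;
}

/* Hypothetical generic helper that, like list_lru_init(), already
 * returns a negative errno. */
static int model_generic_init(void)
{
    return -ENOMEM;
}

/* Going the other way: negate the generic result before it flows
 * into internal code that expects positive errnos. */
static int model_internal_setup(void)
{
    int error = -model_generic_init();

    return error;
}
```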

    Linus Torvalds
     

21 May, 2014

6 commits

  • The current code assumes a one-to-one lockowner<->lock stateid
    correspondence.

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The nfsv4 state code has always assumed a one-to-one correspondence
    between lock stateids and lockowners, even if it appears not to in some
    places.

    We may actually change that, but for now when FREE_STATEID releases a
    lock stateid it also needs to release the parent lockowner.

    Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
    calls same_lockowner_ino on a lockowner that unexpectedly has an empty
    so_stateids list.

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Fix the cache manager RPC service handlers. The afs_send_empty_reply() and
    afs_send_simple_reply() functions:

    (a) Kill the call and free up the buffers associated with it if they fail.

    (b) Return with the call intact if they succeed.

    However, none of the callers actually check the result or clean up if
    successful - and they may use the now-freed data if the reply fails.

    This was detected by Dan Carpenter using a static checker:

    The patch 08e0e7c82eea: "[AF_RXRPC]: Make the in-kernel AFS
    filesystem use AF_RXRPC." from Apr 26, 2007, leads to the following
    static checker warning:
    "fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
    warn: 'call' was already freed."

    Reported-by: Dan Carpenter
    Signed-off-by: David Howells

    David Howells
     
  • Pull driver core fixes from Greg KH:
    "Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
    some reported issues and a regression from 3.13"

    * tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    sysfs: make sure read buffer is zeroed
    kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs

    Linus Torvalds
     
  • When running send, if an inode only has extended reference items
    associated with it and no regular references, send.c:get_first_ref()
    incorrectly assumed the reference it found was of type
    BTRFS_INODE_REF_KEY, due to use of the wrong key variable.
    This caused weird behaviour when using the found item as a regular
    reference, such as a weird path string, and occasionally (when lucky)
    a crash:

    [ 190.600652] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
    [ 190.600994] Modules linked in: btrfs xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc psmouse serio_raw evbug pcspkr i2c_piix4 e1000 floppy
    [ 190.602565] CPU: 2 PID: 14520 Comm: btrfs Not tainted 3.13.0-fdm-btrfs-next-26+ #1
    [ 190.602728] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 190.602868] task: ffff8800d447c920 ti: ffff8801fa79e000 task.ti: ffff8801fa79e000
    [ 190.603030] RIP: 0010:[] [] memcpy+0x54/0x110
    [ 190.603262] RSP: 0018:ffff8801fa79f880 EFLAGS: 00010202
    [ 190.603395] RAX: ffff8800d4326e3f RBX: 000000000000036a RCX: ffff880000000000
    [ 190.603553] RDX: 000000000000032a RSI: ffe708844042936a RDI: ffff8800d43271a9
    [ 190.603710] RBP: ffff8801fa79f8c8 R08: 00000000003a4ef0 R09: 0000000000000000
    [ 190.603867] R10: 793a4ef09f000000 R11: 9f0000000053726f R12: ffff8800d43271a9
    [ 190.604020] R13: 0000160000000000 R14: ffff8802110134f0 R15: 000000000000036a
    [ 190.604020] FS: 00007fb423d09b80(0000) GS:ffff880216200000(0000) knlGS:0000000000000000
    [ 190.604020] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 190.604020] CR2: 00007fb4229d4b78 CR3: 00000001f5d76000 CR4: 00000000000006e0
    [ 190.604020] Stack:
    [ 190.604020] ffffffffa01f4d49 ffff8801fa79f8f0 00000000000009f9 ffff8801fa79f8c8
    [ 190.604020] 00000000000009f9 ffff880211013260 000000000000f971 ffff88021147dba8
    [ 190.604020] 00000000000009f9 ffff8801fa79f918 ffffffffa02367f5 ffff8801fa79f928
    [ 190.604020] Call Trace:
    [ 190.604020] [] ? read_extent_buffer+0xb9/0x120 [btrfs]
    [ 190.604020] [] fs_path_add_from_extent_buffer+0x45/0x60 [btrfs]
    [ 190.604020] [] get_first_ref+0x1f6/0x210 [btrfs]
    [ 190.604020] [] __get_cur_name_and_parent+0x174/0x3a0 [btrfs]
    [ 190.604020] [] ? kmem_cache_alloc_trace+0x11d/0x1e0
    [ 190.604020] [] ? fs_path_alloc+0x24/0x60 [btrfs]
    [ 190.604020] [] get_cur_path+0xd1/0x240 [btrfs]
    (...)

    Steps to reproduce (either crash or some weirdness like an odd path string):

    mkfs.btrfs -f -O extref /dev/sdd
    mount /dev/sdd /mnt

    mkdir /mnt/testdir
    touch /mnt/testdir/foobar

    for i in `seq 1 2550`; do
        ln /mnt/testdir/foobar /mnt/testdir/foobar_link_`printf "%04d" $i`
    done

    ln /mnt/testdir/foobar /mnt/testdir/final_foobar_name

    rm -f /mnt/testdir/foobar
    for i in `seq 1 2550`; do
        rm -f /mnt/testdir/foobar_link_`printf "%04d" $i`
    done

    btrfs subvolume snapshot -r /mnt /mnt/mysnap
    btrfs send /mnt/mysnap -f /tmp/mysnap.send

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason
    Reviewed-by: Liu Bo

    Filipe Manana
     
  • For an inline data extent, we need to make its length aligned;
    otherwise we can get a phantom extent map which confuses readpages()
    into returning -EIO.

    This can be detected by xfstests/btrfs/035.
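    A minimal sketch of the alignment step, assuming the extent map's end is
    rounded up to the filesystem's sector size (4096 in the test values).
    align_up() follows the idea of the kernel's ALIGN() macro for
    power-of-two alignments; inline_extent_end() is a hypothetical helper,
    and how an unaligned end produces a phantom map is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical: end offset of the extent map for an inline extent that
 * starts at file offset 'start' and holds 'size' bytes of data. */
static uint64_t inline_extent_end(uint64_t start, uint64_t size,
                                  uint64_t sectorsize)
{
    return align_up(start + size, sectorsize);
}
```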

    Reported-by: David Disseldorp
    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     

20 May, 2014

2 commits

  • Pull Metag architecture and related fixes from James Hogan:
    "Mostly fixes for metag and parisc relating to upgrowing stacks.

    - Fix missing compiler barriers in metag memory barriers.
    - Fix BUG_ON on metag when RLIMIT_STACK hard limit is increased
    beyond safe value.
    - Make maximum stack size configurable. This reduces the default
    user stack size back to 80MB (especially on parisc after their
    removal of _STK_LIM_MAX override). This only affects metag and
    parisc.
    - Remove metag _STK_LIM_MAX override to match other arches and follow
    parisc, now that it is safe to do so (due to the BUG_ON fix
    mentioned above).
    - Finally now that both metag and parisc _STK_LIM_MAX overrides have
    been removed, it makes sense to remove _STK_LIM_MAX altogether"

    * tag 'metag-for-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
    asm-generic: remove _STK_LIM_MAX
    metag: Remove _STK_LIM_MAX override
    parisc,metag: Do not hardcode maximum userspace stack size
    metag: Reduce maximum stack size to 256MB
    metag: fix memory barriers

    Linus Torvalds
     
  • 13c589d5b0ac ("sysfs: use seq_file when reading regular files")
    switched sysfs from custom read implementation to seq_file to enable
    later transition to kernfs. After the change, the buffer passed to
    ->show() is acquired through seq_get_buf(); unfortunately, this
    introduces a subtle behavior change. Before the commit, the buffer
    passed to ->show() was always zeroed as it was allocated using
    get_zeroed_page(). Because seq_file doesn't clear buffers on
    allocation and neither does seq_get_buf(), after the commit, depending
    on the behavior of ->show(), we may end up exposing uninitialized data
    to userland thus possibly altering userland visible behavior and
    leaking information.

    Fix it by explicitly clearing the buffer.
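    The fix can be illustrated in userspace C: a buffer obtained without
    clearing may hold stale bytes, so it is zeroed before ->show() runs, and
    anything ->show() does not write reads back as zero. The buffer size,
    the show callback and read_attr() are hypothetical; malloc() stands in
    for the seq_file buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096   /* assumed buffer size */

/* Hypothetical ->show()-style callback that writes only a short string
 * and leaves the rest of the buffer untouched. */
static size_t short_show(char *buf)
{
    memcpy(buf, "1\n", 2);
    return 2;
}

/* Get a buffer that is not zeroed on allocation, then clear it
 * explicitly before calling ->show(), so no stale bytes survive. */
static char *read_attr(size_t (*show)(char *), size_t *len)
{
    char *buf = malloc(MODEL_PAGE_SIZE);

    if (!buf)
        return NULL;
    memset(buf, 0, MODEL_PAGE_SIZE);
    *len = show(buf);
    return buf;
}
```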

    Signed-off-by: Tejun Heo
    Reported-by: Ron
    Fixes: 13c589d5b0ac ("sysfs: use seq_file when reading regular files")
    Cc: stable # 3.13+
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

16 May, 2014

1 commit


15 May, 2014

10 commits


13 May, 2014

3 commits

  • The kernfs open method - kernfs_fop_open() - inherited extra
    permission checks from sysfs. While the vfs layer allows ignoring the
    read/write permissions checks if the issuer has CAP_DAC_OVERRIDE,
    sysfs explicitly denied open regardless of the cap if the file doesn't
    have any of the UGO perms of the requested access or doesn't implement
    the requested operation. It can be debated whether this was a good
    idea or not but the behavior is too subtle and dangerous to change at
    this point.

    After cgroup got converted to kernfs, this extra perm check also got
    applied to cgroup, breaking libcgroup, which opens write-only files
    with O_RDWR as root. This patch gates the extra open permission check
    with a new flag KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK and enables it for
    sysfs. For sysfs, nothing changes. For cgroup, root can now perform
    any operation regardless of the permissions, as it could before the
    kernfs conversion. Note that kernfs still fails unimplemented
    operations with -EINVAL.

    While at it, add comments explaining KERNFS_ROOT flags.
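    The gating can be sketched as a single check. The flag value, struct
    layout and function are hypothetical (only the name
    KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK comes from the commit), 0222 and 0444
    stand for the write and read UGO bits, and the -EACCES return is an
    assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical value; only the flag's name comes from the commit. */
#define MODEL_ROOT_EXTRA_OPEN_PERM_CHECK 0x1u

struct model_node {
    unsigned int root_flags;   /* flags of the owning kernfs root */
    unsigned int mode;         /* permission bits, e.g. 0200 */
    bool has_read, has_write;  /* which operations are implemented */
};

/* Extra open-time check applied only when the root opts in (sysfs).
 * The normal VFS permission check, which honours CAP_DAC_OVERRIDE,
 * happens elsewhere and is not modelled here. */
static int model_extra_open_check(const struct model_node *kn,
                                  bool want_read, bool want_write)
{
    if (!(kn->root_flags & MODEL_ROOT_EXTRA_OPEN_PERM_CHECK))
        return 0;
    if (want_write && (!(kn->mode & 0222) || !kn->has_write))
        return -EACCES;
    if (want_read && (!(kn->mode & 0444) || !kn->has_read))
        return -EACCES;
    return 0;
}
```

    With the flag clear (cgroup), opening a write-only file O_RDWR passes
    this check; with it set (sysfs), the same open is refused.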

    Signed-off-by: Tejun Heo
    Reported-by: Andrey Wagin
    Tested-by: Andrey Wagin
    Cc: Li Zefan
    References: http://lkml.kernel.org/g/CANaxB-xUm3rJ-Cbp72q-rQJO5mZe1qK6qXsQM=vh0U8upJ44+A@mail.gmail.com
    Fixes: 2bd59d48ebfb ("cgroup: convert to kernfs")
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • Pull file locking fix from Jeff Layton:
    "Fix for regression in handling of F_GETLK commands"

    * tag 'locks-v3.15-4' of git://git.samba.org/jlayton/linux:
    locks: only validate the lock vs. f_mode in F_SETLK codepaths
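    From userspace, the case at stake is presumably asking, via F_GETLK,
    whether a write lock could be placed using a read-only descriptor. The
    sketch assumes POSIX fcntl() semantics, where only setting a lock
    requires a descriptor opened with a matching access mode;
    probe_write_lock() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Test whether a whole-file write lock could be placed, using a
 * read-only descriptor. F_GETLK only tests for conflicts; it does not
 * take the lock, so the descriptor's access mode should not matter.
 * Returns the fcntl() result and stores the reported l_type in *type. */
static int probe_write_lock(const char *path, short *type)
{
    struct flock fl;
    int fd = open(path, O_RDONLY);
    int ret;

    if (fd < 0)
        return -1;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */
    ret = fcntl(fd, F_GETLK, &fl);
    *type = fl.l_type;       /* F_UNLCK if nothing would conflict */
    close(fd);
    return ret;
}
```

    On a file nobody has locked, fcntl() should return 0 and report F_UNLCK.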

    Linus Torvalds
     
  • Pull cifs fix from Steve French:
    "Small cifs fix for metadata caching"

    * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: fix actimeo=0 corner case when cifs_i->time == jiffies

    Linus Torvalds