15 Oct, 2016

1 commit

  • Pull cgroup updates from Tejun Heo:

    - tracepoints for basic cgroup management operations added

    - kernfs and cgroup path formatting functions updated to behave in the
    style of strlcpy()

    - non-critical bug fixes

    * 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
    cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
    cpuset: fix error handling regression in proc_cpuset_show()
    cgroup: add tracepoints for basic operations
    cgroup: make cgroup_path() and friends behave in the style of strlcpy()
    kernfs: remove kernfs_path_len()
    kernfs: make kernfs_path*() behave in the style of strlcpy()
    kernfs: add dummy implementation of kernfs_path_from_node()
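
    For readers unfamiliar with the strlcpy() convention mentioned above, here
    is a minimal userspace sketch. The helpers join_path() and
    path_truncated() are hypothetical names, not kernel functions; the sketch
    only assumes the usual strlcpy() contract: always NUL-terminate, and
    return the length the full string would need.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): write "parent/name" into buf.
 * Like strlcpy(), it NUL-terminates whenever buflen > 0 and returns the
 * length of the full string, even if that string did not fit.
 */
static size_t join_path(char *buf, size_t buflen,
                        const char *parent, const char *name)
{
    int n = snprintf(buf, buflen, "%s/%s", parent, name);

    return n < 0 ? 0 : (size_t)n;
}

/* With this contract, truncation is detected by comparing against buflen. */
static int path_truncated(size_t ret, size_t buflen)
{
    return ret >= buflen;
}
```

    A caller can size a buffer once, call join_path(), and retry with a
    larger buffer only when path_truncated() reports that the result did not
    fit.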

    Linus Torvalds
     

14 Oct, 2016

4 commits

  • Pull NFS client updates from Anna Schumaker:
    "Highlights include:

    Stable bugfixes:
    - sunrpc: fix write space race causing stalls
    - NFS: Fix inode corruption in nfs_prime_dcache()
    - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
    - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
    - NFSv4: Open state recovery must account for file permission changes
    - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

    Features:
    - Add support for tracking multiple layout types with an ordered list
    - Add support for using multiple backchannel threads on the client
    - Add support for pNFS file layout session trunking
    - Delay xprtrdma use of DMA API (for device driver removal)
    - Add support for xprtrdma remote invalidation
    - Add support for larger xprtrdma inline thresholds
    - Use a scatter/gather list for sending xprtrdma RPC calls
    - Add support for the CB_NOTIFY_LOCK callback
    - Improve hashing sunrpc auth_creds by using both uid and gid

    Bugfixes:
    - Fix xprtrdma use of DMA API
    - Validate filenames before adding to the dcache
    - Fix corruption of xdr->nwords in xdr_copy_to_scratch
    - Fix setting buffer length in xdr_set_next_buffer()
    - Don't deadlock the state manager on the SEQUENCE status flags
    - Various delegation and stateid related fixes
    - Retry operations if an interrupted slot receives EREMOTEIO
    - Make nfs boot time y2038 safe"

    * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
    NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
    fs: nfs: Make nfs boot time y2038 safe
    sunrpc: replace generic auth_cred hash with auth-specific function
    sunrpc: add RPCSEC_GSS hash_cred() function
    sunrpc: add auth_unix hash_cred() function
    sunrpc: add generic_auth hash_cred() function
    sunrpc: add hash_cred() function to rpc_authops struct
    Retry operation on EREMOTEIO on an interrupted slot
    pNFS: Fix atime updates on pNFS clients
    sunrpc: queue work on system_power_efficient_wq
    NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
    NFSv4: If recovery failed for a specific open stateid, then don't retry
    NFSv4: Fix retry issues with nfs41_test/free_stateid
    NFSv4: Open state recovery must account for file permission changes
    NFSv4: Mark the lock and open stateids as invalid after freeing them
    NFSv4: Don't test open_stateid unless it is set
    NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
    NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
    NFSv4: Fix a race when updating an open_stateid
    NFSv4: Fix a race in nfs_inode_reclaim_delegation()
    ...

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "Some RDMA work and some good bugfixes, and two new features that could
    benefit from user testing:

    - Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
    COPY is already supported on the client side, so a call to
    copy_file_range() on a recent client should now result in a
    server-side copy that doesn't require all the data to make a round
    trip to the client and back.

    - Jeff Layton implemented callbacks to notify clients when contended
    locks become available, which should reduce latency on workloads
    with contended locks"
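
    As a rough illustration of the client-side call mentioned above, the
    sketch below loops over copy_file_range() until the requested length has
    been copied. The helper name copy_range() is hypothetical, and the code
    assumes a C library that provides the copy_file_range() wrapper. Whether
    the copy is actually performed on the server depends on the client,
    server and mount, which this code cannot observe.

```c
#define _GNU_SOURCE             /* copy_file_range() */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from the current offset of in_fd to
 * the current offset of out_fd. Returns the number of bytes copied (which
 * may be short if in_fd hits end of file), or -1 with errno set.
 */
static ssize_t copy_range(int in_fd, int out_fd, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                    len - done, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of input file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

    Passing NULL for both offset pointers makes the call use and advance the
    file offsets of the two descriptors, much like read() and write() would.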

    * tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
    NFSD: Implement the COPY call
    nfsd: handle EUCLEAN
    nfsd: only WARN once on unmapped errors
    exportfs: be careful to only return expected errors.
    nfsd4: setclientid_confirm with unmatched verifier should fail
    nfsd: randomize SETCLIENTID reply to help distinguish servers
    nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
    nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
    nfsd: add a LRU list for blocked locks
    nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
    nfsd: plumb in a CB_NOTIFY_LOCK operation
    NFSD: fix corruption in notifier registration
    svcrdma: support Remote Invalidation
    svcrdma: Server-side support for rpcrdma_connect_private
    rpcrdma: RDMA/CM private message data structure
    svcrdma: Skip put_page() when send_reply() fails
    svcrdma: Tail iovec leaves an orphaned DMA mapping
    nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
    nfsd: eliminate cb_minorversion field
    nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

    Linus Torvalds
     
  • …kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
     ----------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

    Pull XFS support for shared data extents from Dave Chinner:
    "This is the second part of the XFS updates for this merge cycle. This
    pullreq contains the new shared data extents feature for XFS.

    Given the complexity and size of this change I am expecting - like the
    addition of reverse mapping last cycle - that there will be some
    follow-up bug fixes and cleanups around the -rc3 stage for issues that
    I'm sure will show up once the code hits a wider userbase.

    What it is:

    At the most basic level we are simply adding shared data extents to
    XFS - i.e. a single extent on disk can now have multiple owners. To do
    this we have to add new on-disk features to both track the shared
    extents and the number of times they've been shared. This is done by
    the new "refcount" btree that sits in every allocation group. When we
    share or unshare an extent, this tree gets updated.

    Along with this new tree, the reverse mapping tree needs to be updated
    to track each owner of a shared extent. This also needs to be updated on
    every share/unshare operation. These interactions at extent allocation
    and freeing time have complex ordering and recovery constraints, so
    there's a significant amount of new intent-based transaction code to
    ensure that operations are performed atomically from both the runtime
    and integrity/crash recovery perspectives.

    We also need to break sharing when writes hit a shared extent - this
    is where the new copy-on-write implementation comes in. We allocate
    new storage and copy the original data along with the overwrite data
    into the new location. We only do this for data as we don't share
    metadata at all - each inode has its own metadata that tracks the
    shared data extents, the extents undergoing CoW and its own private
    extents.

    Of course, being XFS, nothing is simple - we use delayed allocation
    for CoW similar to how we use it for normal writes. ENOSPC is a
    significant issue here - we build on the reservation code added in
    4.8-rc1 with the reverse mapping feature to ensure we don't get
    spurious ENOSPC issues part way through a CoW operation. These
    mechanisms also help minimise fragmentation due to repeated CoW
    operations. To further reduce fragmentation overhead, we've also
    introduced a CoW extent size hint, which indicates how large a region
    we should allocate when we execute a CoW operation.

    With all this functionality in place, we can hook up .copy_file_range,
    .clone_file_range and .dedupe_file_range and we gain all the
    capabilities of reflink and other vfs provided functionality that
    enable manipulation to shared extents. We also added a fallocate mode
    that explicitly unshares a range of a file, which we implemented as an
    explicit CoW of all the shared extents in a file.

    As such, it's a huge chunk of new functionality with new on-disk
    format features and internal infrastructure. It warns at mount time as
    an experimental feature and that it may eat data (as we do with all
    new on-disk features until they stabilise). We have not released
    userspace support for it yet - userspace support currently requires
    downloading Darrick's xfsprogs repo and building from source, so the
    access to this feature is really developer/tester only at this point.
    Initial userspace support will be released at the same time the kernel
    with this code in it is released.

    The new code causes 5-6 new failures with xfstests - these aren't
    serious functional failures but things like the output of tests changing
    slightly due to perturbations in layouts, space usage, etc. OTOH,
    we've added 150+ new tests to xfstests that specifically exercise this
    new functionality so it's got far better test coverage than any
    functionality we've previously added to XFS.

    Darrick has done a pretty amazing job getting us to this stage, and
    special mention also needs to go to Christoph (review, testing,
    improvements and bug fixes) and Brian (caught several intricate bugs
    during review) for the effort they've also put in.

    Summary:

    - unshare range (FALLOC_FL_UNSHARE) support for fallocate

    - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
    interface

    - shared extent support for XFS

    - copy-on-write support for shared extents

    - copy_file_range support

    - clone_file_range support (implements reflink)

    - dedupe_file_range support

    - defrag support for reverse mapping enabled filesystems"
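
    From userspace, the clone operation listed above is reachable through the
    FICLONE ioctl from <linux/fs.h>. The sketch below tries a clone and falls
    back to an ordinary byte copy when the ioctl fails, for example because
    the filesystem cannot share extents. clone_or_copy() is a hypothetical
    helper, and treating every ioctl failure as "fall back to copying" is a
    simplifying assumption.

```c
#include <errno.h>
#include <linux/fs.h>           /* FICLONE */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make dst_fd a copy of src_fd.
 * Returns 1 if the data was cloned (extents shared), 0 if it was copied
 * byte by byte, and -1 on error.
 */
static int clone_or_copy(int src_fd, int dst_fd)
{
    char buf[65536];
    ssize_t n;

    /* Ask the filesystem to share src_fd's extents with dst_fd. */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 1;

    /* Assumption: any failure means "cannot clone here", so copy instead. */
    if (lseek(src_fd, 0, SEEK_SET) == -1 || lseek(dst_fd, 0, SEEK_SET) == -1)
        return -1;

    while ((n = read(src_fd, buf, sizeof buf)) > 0) {
        char *p = buf;

        while (n > 0) {
            ssize_t w = write(dst_fd, p, (size_t)n);
            if (w == -1) {
                if (errno == EINTR)
                    continue;   /* interrupted, retry the write */
                return -1;
            }
            p += w;
            n -= w;
        }
    }
    return n == -1 ? -1 : 0;
}
```

    On a filesystem with shared-extent support the first branch returns
    immediately and no file data is read or written by the program; the
    copy-on-write described in the message only happens later, when one of
    the files is modified.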

    * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
    xfs: convert COW blocks to real blocks before unwritten extent conversion
    xfs: rework refcount cow recovery error handling
    xfs: clear reflink flag if setting realtime flag
    xfs: fix error initialization
    xfs: fix label inaccuracies
    xfs: remove isize check from unshare operation
    xfs: reduce stack usage of _reflink_clear_inode_flag
    xfs: check inode reflink flag before calling reflink functions
    xfs: implement swapext for rmap filesystems
    xfs: refactor swapext code
    xfs: various swapext cleanups
    xfs: recognize the reflink feature bit
    xfs: simulate per-AG reservations being critically low
    xfs: don't mix reflink and DAX mode for now
    xfs: check for invalid inode reflink flags
    xfs: set a default CoW extent size of 32 blocks
    xfs: convert unwritten status of reverse mappings for shared files
    xfs: use interval query for rmap alloc operations on shared files
    xfs: add shared rmap map/unmap/convert log item types
    xfs: increase log reservations for reflink
    ...

    Linus Torvalds
     
  • Pull watchdog updates from Wim Van Sebroeck:

    - a new watchdog pretimeout governor framework

    - support to upload the firmware on the ziirave_wdt

    - several fixes and cleanups

    * git://www.linux-watchdog.org/linux-watchdog: (26 commits)
    watchdog: imx2_wdt: add pretimeout function support
    watchdog: softdog: implement pretimeout support
    watchdog: pretimeout: add pretimeout_available_governors attribute
    watchdog: pretimeout: add option to select a pretimeout governor in runtime
    watchdog: pretimeout: add panic pretimeout governor
    watchdog: pretimeout: add noop pretimeout governor
    watchdog: add watchdog pretimeout governor framework
    watchdog: hpwdt: add support for iLO5
    fs: compat_ioctl: add pretimeout functions for watchdogs
    watchdog: add pretimeout support to the core
    watchdog: imx2_wdt: use preferred BIT macro instead of open coded values
    watchdog: st_wdt: Remove support for obsolete platforms
    watchdog: bindings: Remove obsolete platforms from dt doc.
    watchdog: mt7621_wdt: Remove assignment of dev pointer
    watchdog: rt2880_wdt: Remove assignment of dev pointer
    watchdog: constify watchdog_ops structures
    watchdog: tegra: constify watchdog_ops structures
    watchdog: iTCO_wdt: constify iTCO_wdt_pm structure
    watchdog: cadence_wdt: Fix the suspend resume
    watchdog: txx9wdt: Add missing clock (un)prepare calls for CCF
    ...

    Linus Torvalds
     

12 Oct, 2016

31 commits

  • Merge more updates from Andrew Morton:

    - a few block updates that fell in my lap

    - lib/ updates

    - checkpatch

    - autofs

    - ipc

    - a ton of misc other things

    * emailed patches from Andrew Morton: (100 commits)
    mm: split gfp_mask and mapping flags into separate fields
    fs: use mapping_set_error instead of opencoded set_bit
    treewide: remove redundant #include <linux/kconfig.h>
    hung_task: allow hung_task_panic when hung_task_warnings is 0
    kthread: add kerneldoc for kthread_create()
    kthread: better support freezable kthread workers
    kthread: allow to modify delayed kthread work
    kthread: allow to cancel kthread work
    kthread: initial support for delayed kthread work
    kthread: detect when a kthread work is used by more workers
    kthread: add kthread_destroy_worker()
    kthread: add kthread_create_worker*()
    kthread: allow to call __kthread_create_on_node() with va_list args
    kthread/smpboot: do not park in kthread_create_on_cpu()
    kthread: kthread worker API cleanup
    kthread: rename probe_kthread_data() to kthread_probe_data()
    scripts/tags.sh: enable code completion in VIM
    mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping
    kdump, vmcoreinfo: report memory sections virtual addresses
    ipc/sem.c: add cond_resched in exit_sme
    ...

    Linus Torvalds
     
  • The mapping_set_error() helper sets the correct AS_ flag for the mapping
    so there is no reason to open code it. Use the helper directly.
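
    To make "sets the correct AS_ flag" concrete, here is a small userspace
    model of the idea. The struct, the MODEL_ flag names and
    model_mapping_set_error() are stand-ins invented for this sketch; it
    assumes the helper keeps -ENOSPC distinct and folds every other error
    into the generic I/O error flag.

```c
#include <errno.h>

/* Stand-ins for the kernel's AS_EIO / AS_ENOSPC flag bits (assumed names). */
enum {
    MODEL_AS_EIO    = 1 << 0,
    MODEL_AS_ENOSPC = 1 << 1,
};

/* Stand-in for the address_space whose flags record writeback errors. */
struct model_mapping {
    unsigned long flags;
};

/*
 * Model of the helper: do nothing when there is no error; otherwise keep
 * -ENOSPC distinct and record every other error as a generic I/O error.
 */
static void model_mapping_set_error(struct model_mapping *m, int error)
{
    if (!error)
        return;
    if (error == -ENOSPC)
        m->flags |= MODEL_AS_ENOSPC;
    else
        m->flags |= MODEL_AS_EIO;
}
```

    Under this model an error such as -ENXIO ends up recorded as the generic
    I/O error flag, so its original value is not preserved.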

    [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
    Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Kernel source files need not include <linux/kconfig.h> explicitly
    because the top Makefile forces its inclusion with:

    -include $(srctree)/include/linux/kconfig.h

    This commit removes explicit includes except the following:

    * arch/s390/include/asm/facilities_src.h
    * tools/testing/radix-tree/linux/kernel.h

    These two are used for host programs.

    Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • This is a patch that provides behavior that is more consistent, and
    probably less surprising to users. I consider the change optional, and
    welcome opinions about whether it should be applied.

    By default, pipes are created with a capacity of 64 kiB. However,
    /proc/sys/fs/pipe-max-size may be set smaller than this value. In this
    scenario, an unprivileged user could thus create a pipe whose initial
    capacity exceeds the limit. Therefore, it seems logical to cap the
    initial pipe capacity according to the value of pipe-max-size.

    The test program shown earlier in this patch series can be used to
    demonstrate the effect of the change brought about with this patch:

    # cat /proc/sys/fs/pipe-max-size
    1048576
    # sudo -u mtk ./test_F_SETPIPE_SZ 1
    Initial pipe capacity: 65536
    # echo 10000 > /proc/sys/fs/pipe-max-size
    # cat /proc/sys/fs/pipe-max-size
    16384
    # sudo -u mtk ./test_F_SETPIPE_SZ 1
    Initial pipe capacity: 16384
    # ./test_F_SETPIPE_SZ 1
    Initial pipe capacity: 65536

    The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
    caps the initial allocation for a new pipe for unprivileged users, but
    not for privileged users.
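
    The two values compared in this transcript can also be read
    programmatically. The following sketch uses fcntl(F_GETPIPE_SZ) for the
    capacity of an existing pipe and reads the sysctl file for the limit.
    The helper names pipe_capacity() and read_pipe_max_size() are
    hypothetical.

```c
#define _GNU_SOURCE             /* F_GETPIPE_SZ */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Capacity in bytes of the pipe referred to by fd, or -1 on error. */
static int pipe_capacity(int fd)
{
    return fcntl(fd, F_GETPIPE_SZ);
}

/* Current value of /proc/sys/fs/pipe-max-size, or -1 if it cannot be read. */
static long read_pipe_max_size(void)
{
    FILE *f = fopen("/proc/sys/fs/pipe-max-size", "r");
    long val = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}
```

    Calling pipe_capacity() on a freshly created pipe and comparing the
    result with read_pipe_max_size() reproduces the check that the
    transcript performs by hand.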

    Link: http://lkml.kernel.org/r/31dc7064-2a17-9c5b-1df1-4e3012ee992c@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • This is an optional patch, to provide a small performance
    improvement. Alter account_pipe_buffers() so that it returns the
    new value in user->pipe_bufs. This means that we can refactor
    too_many_pipe_buffers_soft() and too_many_pipe_buffers_hard() to
    avoid the cost of repeated atomic_long_read() calls to get the
    value of user->pipe_bufs.

    Link: http://lkml.kernel.org/r/93e5f193-1e5e-3e1f-3a20-eae79b7e1310@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • The limit checking in alloc_pipe_info() (used by pipe(2) and when
    opening a FIFO) has the following problems:

    (1) When checking capacity required for the new pipe, the checks against
    the limit in /proc/sys/fs/pipe-user-pages-{soft,hard} are made
    against existing consumption, and exclude the memory required for
    the new pipe capacity. As a consequence: (1) the memory allocation
    throttling provided by the soft limit does not kick in quite as
    early as it should, and (2) the user can overrun the hard limit.

    (2) As currently implemented, accounting and checking against the limits
    is done as follows:

    (a) Test whether the user has exceeded the limit.
    (b) Make new pipe buffer allocation.
    (c) Account new allocation against the limits.

    This is racy. Multiple processes may pass point (a) simultaneously,
    and then allocate pipe buffers that are accounted for only in step
    (c). The race means that the user's pipe buffer allocation could be
    pushed over the limit (by an arbitrary amount, depending on how
    unlucky we were in the race). [Thanks to Vegard Nossum for spotting
    this point, which I had missed.]

    This patch addresses the above problems as follows:

    * Alter the checks against limits to include the memory required for the
    new pipe.
    * Re-order the accounting step so that it precedes the buffer allocation.
    If the accounting step determines that a limit has been reached, revert
    the accounting and cause the operation to fail.
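
    The reordering described above (account first, check the resulting
    total, revert on failure) can be modelled in userspace with C11 atomics.
    This is only a sketch of the pattern, not the kernel code: struct
    model_user, model_account() and model_reserve() are hypothetical names,
    and both the "0 means no limit" convention and the exact comparison
    against the limit are assumptions of this model.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-user count of pipe pages. */
struct model_user {
    atomic_long pipe_bufs;
};

/* Add delta pages and return the new total in a single atomic step. */
static long model_account(struct model_user *u, long delta)
{
    return atomic_fetch_add(&u->pipe_bufs, delta) + delta;
}

/*
 * Try to reserve nr_pages for a new pipe. A hard_limit of 0 means "no
 * limit" in this model. The reservation is accounted before anything is
 * allocated, so concurrent callers all see each other's pages in the
 * total they check.
 */
static bool model_reserve(struct model_user *u, long nr_pages, long hard_limit)
{
    long total = model_account(u, nr_pages);    /* account first */

    if (hard_limit && total > hard_limit) {
        model_account(u, -nr_pages);            /* over the limit: revert */
        return false;
    }
    return true;                                /* caller may now allocate */
}
```

    Because the total that is checked already includes the caller's own
    pages, two racing callers cannot both pass the check and then jointly
    overshoot the limit, which is the failure mode of the test, allocate,
    account ordering.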

    Link: http://lkml.kernel.org/r/8ff3e9f9-23f6-510c-644f-8e70cd1c0bd9@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • Replace an 'if' block that covers most of the code in this function
    with a 'goto'. This makes the code a little simpler to read, and also
    simplifies the next patch (fix limit checking in alloc_pipe_info()).

    Link: http://lkml.kernel.org/r/aef030c1-0257-98a9-4988-186efa48530c@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • The limit checking in pipe_set_size() (used by fcntl(F_SETPIPE_SZ))
    has the following problems:

    (1) When increasing the pipe capacity, the checks against the limits in
    /proc/sys/fs/pipe-user-pages-{soft,hard} are made against existing
    consumption, and exclude the memory required for the increased pipe
    capacity. The new increase in pipe capacity can then push the total
    memory used by the user for pipes (possibly far) over a limit. This
    can also trigger the problem described next.

    (2) The limit checks are performed even when the new pipe capacity is
    less than the existing pipe capacity. This can lead to problems if a
    user sets a large pipe capacity, and then the limits are lowered,
    with the result that the user will no longer be able to decrease the
    pipe capacity.

    (3) As currently implemented, accounting and checking against the
    limits is done as follows:

    (a) Test whether the user has exceeded the limit.
    (b) Make new pipe buffer allocation.
    (c) Account new allocation against the limits.

    This is racy. Multiple processes may pass point (a)
    simultaneously, and then allocate pipe buffers that are accounted
    for only in step (c). The race means that the user's pipe buffer
    allocation could be pushed over the limit (by an arbitrary amount,
    depending on how unlucky we were in the race). [Thanks to Vegard
    Nossum for spotting this point, which I had missed.]

    This patch addresses the above problems as follows:

    * Perform checks against the limits only when increasing a pipe's
    capacity; an unprivileged user can always decrease a pipe's capacity.
    * Alter the checks against limits to include the memory required for
    the new pipe capacity.
    * Re-order the accounting step so that it precedes the buffer
    allocation. If the accounting step determines that a limit has
    been reached, revert the accounting and cause the operation to fail.

    The program below can be used to demonstrate problems 1 and 2, and the
    effect of the fix. The program takes one or more command-line arguments.
    The first argument specifies the number of pipes that the program should
    create. The remaining arguments are, alternately, pipe capacities that
    should be set using fcntl(F_SETPIPE_SZ), and sleep intervals (in
    seconds) between the fcntl() operations. (The sleep intervals allow the
    possibility to change the limits between fcntl() operations.)

    Problem 1
    =========

    Using the test program on an unpatched kernel, we first set some
    limits:

    # echo 0 > /proc/sys/fs/pipe-user-pages-soft
    # echo 1000000000 > /proc/sys/fs/pipe-max-size
    # echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB

    Then show that we can set a pipe capacity (100MB) that is
    over the hard limit:

    # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
    Initial pipe capacity: 65536
    Loop 1: set pipe capacity to 100000000 bytes
    F_SETPIPE_SZ returned 134217728

    Now set the capacity to 100MB twice. The second call fails (which is
    probably surprising to most users, since it seems like a no-op):

    # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 0 100000000
    Initial pipe capacity: 65536
    Loop 1: set pipe capacity to 100000000 bytes
    F_SETPIPE_SZ returned 134217728
    Loop 2: set pipe capacity to 100000000 bytes
    Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

    With a patched kernel, setting a capacity over the limit fails at the
    first attempt:

    # echo 0 > /proc/sys/fs/pipe-user-pages-soft
    # echo 1000000000 > /proc/sys/fs/pipe-max-size
    # echo 10000 > /proc/sys/fs/pipe-user-pages-hard
    # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
    Initial pipe capacity: 65536
    Loop 1: set pipe capacity to 100000000 bytes
    Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

    There is a small chance that the change to fix this problem could
    break user-space, since there are cases where fcntl(F_SETPIPE_SZ)
    calls that previously succeeded might fail. However, the chances are
    small, since (a) the pipe-user-pages-{soft,hard} limits are new (in
    4.5), and (b) the default soft/hard limits are high/unlimited. Therefore,
    it seems warranted to make these limits operate more precisely (and
    behave more like what users probably expect).

    Problem 2
    =========

    Running the test program on an unpatched kernel, we first set some limits:

    # getconf PAGESIZE
    4096
    # echo 0 > /proc/sys/fs/pipe-user-pages-soft
    # echo 1000000000 > /proc/sys/fs/pipe-max-size
    # echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB

    Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
    first setting a pipe capacity (10MB), sleeping for a few seconds,
    during which time the hard limit is lowered, and then setting the pipe
    capacity to a smaller amount (5MB):

    # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
    [1] 748
    # Initial pipe capacity: 65536
    Loop 1: set pipe capacity to 10000000 bytes
    F_SETPIPE_SZ returned 16777216
    Sleeping 15 seconds

    # echo 1000 > /proc/sys/fs/pipe-user-pages-hard # 4.096 MB
    # Loop 2: set pipe capacity to 5000000 bytes
    Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

    In this case, the user should be able to lower the pipe capacity.

    With a kernel that has the patch below, the second fcntl()
    succeeds:

    # echo 0 > /proc/sys/fs/pipe-user-pages-soft
    # echo 1000000000 > /proc/sys/fs/pipe-max-size
    # echo 10000 > /proc/sys/fs/pipe-user-pages-hard
    # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
    [1] 3215
    # Initial pipe capacity: 65536
    # Loop 1: set pipe capacity to 10000000 bytes
    F_SETPIPE_SZ returned 16777216
    Sleeping 15 seconds

    # echo 1000 > /proc/sys/fs/pipe-user-pages-hard

    # Loop 2: set pipe capacity to 5000000 bytes
    F_SETPIPE_SZ returned 8388608
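
    The values returned by F_SETPIPE_SZ in these transcripts are larger than
    the requested sizes. They are consistent with the request being rounded
    up to a power-of-two number of pages. The sketch below is a userspace
    model of that rounding, not the kernel function; model_round_pipe_size()
    and MODEL_PAGE_SIZE are hypothetical names, and the 4096-byte page size
    is taken from the getconf output above.

```c
/* Assumed page size, matching the "getconf PAGESIZE" output of 4096. */
#define MODEL_PAGE_SIZE 4096UL

/* Round a requested capacity up to a power-of-two number of pages. */
static unsigned long model_round_pipe_size(unsigned long size)
{
    unsigned long nr_pages = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
    unsigned long pow2 = 1;

    while (pow2 < nr_pages)
        pow2 <<= 1;             /* next power of two >= nr_pages */

    return pow2 * MODEL_PAGE_SIZE;
}
```

    With this model, requests of 100000000, 10000000 and 5000000 bytes map
    to 134217728, 16777216 and 8388608 bytes respectively, matching the
    transcripts. The same page size explains the "40.96 MB" annotation:
    10000 pages of 4096 bytes are 40960000 bytes.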

    8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

    /* test_F_SETPIPE_SZ.c

    (C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later

    Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity
    and interactions with limits defined by /proc/sys/fs/pipe-* files.
    */

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>

    int
    main(int argc, char *argv[])
    {
        int (*pfd)[2];
        int npipes;
        int pcap, rcap;
        int j, p, s, stime, loop;

        if (argc < 2) {
            fprintf(stderr, "Usage: %s num-pipes "
                    "[pipe-capacity sleep-time]...\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        npipes = atoi(argv[1]);

        /* One pair of file descriptors per pipe */
        pfd = calloc(npipes, sizeof (int [2]));
        if (pfd == NULL) {
            perror("calloc");
            exit(EXIT_FAILURE);
        }

        for (j = 0; j < npipes; j++) {
            if (pipe(pfd[j]) == -1) {
                fprintf(stderr, "Loop %d: pipe() failed: ", j);
                perror("pipe");
                exit(EXIT_FAILURE);
            }
        }

        printf("Initial pipe capacity: %d\n", fcntl(pfd[0][0], F_GETPIPE_SZ));

        /* Remaining arguments are (pipe-capacity, sleep-time) pairs */
        for (j = 2; j < argc; j += 2 ) {
            loop = j / 2;
            pcap = atoi(argv[j]);
            printf(" Loop %d: set pipe capacity to %d bytes\n", loop, pcap);

            for (p = 0; p < npipes; p++) {
                s = fcntl(pfd[p][0], F_SETPIPE_SZ, pcap);
                if (s == -1) {
                    fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ "
                            "failed: ", loop, p);
                    perror("fcntl");
                    exit(EXIT_FAILURE);
                }

                /* All pipes should report the same resulting capacity */
                if (p == 0) {
                    printf(" F_SETPIPE_SZ returned %d\n", s);
                    rcap = s;
                } else {
                    if (s != rcap) {
                        fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ "
                                "unexpected return: %d\n", loop, p, s);
                        exit(EXIT_FAILURE);
                    }
                }

                stime = (j + 1 < argc) ? atoi(argv[j + 1]) : 0;
                if (stime > 0) {
                    printf(" Sleeping %d seconds\n", stime);
                    sleep(stime);
                }
            }
        }

        exit(EXIT_SUCCESS);
    }

    8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

    Patch history:

    v2
    * Switch order of test in 'if' statement to avoid function call
    (to capable()) in normal path. [This is a fix to a preexisting
    wart in the code. Thanks to Willy Tarreau]
    * Perform (size > pipe_max_size) check before calling
    account_pipe_buffers(). [Thanks to Vegard Nossum]
    Quoting Vegard:

    The potential problem happens if the user passes a very large number
    which will overflow pipe->user->pipe_bufs.

    On 32-bit, sizeof(int) == sizeof(long), so if they pass arg = INT_MAX
    then round_pipe_size() returns INT_MAX. Although it's true that the
    accounting is done in terms of pages and not bytes, so you'd need on
    the order of (1 << 13) = 8192 processes hitting the limit at the same
    time in order to make it overflow, which seems a bit unlikely.

    (See https://lkml.org/lkml/2016/8/12/215 for another discussion on the
    limit checking)

    Link: http://lkml.kernel.org/r/1e464945-536b-2420-798b-e77b9c7e8593@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • This is a preparatory patch for following work. account_pipe_buffers()
    performs accounting in the 'user_struct'. There is no need to pass a
    pointer to a 'pipe_inode_info' struct (which is then dereferenced to
    obtain a pointer to the 'user' field). Instead, pass a pointer directly
    to the 'user_struct'. This change is needed in preparation for a
    subsequent patch that fixes the limit checking in alloc_pipe_info()
    (and the resulting code is a little more logical).

    Link: http://lkml.kernel.org/r/7277bf8c-a6fc-4a7d-659c-f5b145c981ab@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • This is a preparatory patch for following work. Move the F_SETPIPE_SZ
    limit-checking logic from pipe_fcntl() into pipe_set_size(). This
    simplifies the code a little, and allows for reworking required in
    a later patch that fixes the limit checking in pipe_set_size().

    Link: http://lkml.kernel.org/r/3701b2c5-2c52-2c3e-226d-29b9deb29b50@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • Patch series "pipe: fix limit handling", v2.

    When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various limits
    defined by /proc/sys/fs/pipe-* files are checked to see if unprivileged
    users are exceeding limits on memory consumption.

    While documenting and testing the operation of these limits I noticed
    that, as currently implemented, these checks have a number of problems:

    (1) When increasing the pipe capacity, the checks against the limits
    in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
    existing consumption, and exclude the memory required for the
    increased pipe capacity. The new increase in pipe capacity can then
    push the total memory used by the user for pipes (possibly far) over
    a limit. This can also trigger the problem described next.

    (2) The limit checks are performed even when the new pipe capacity
    is less than the existing pipe capacity. This can lead to problems
    if a user sets a large pipe capacity, and then the limits are
    lowered, with the result that the user will no longer be able to
    decrease the pipe capacity.

    (3) As currently implemented, accounting and checking against the
    limits is done as follows:

    (a) Test whether the user has exceeded the limit.
    (b) Make new pipe buffer allocation.
    (c) Account new allocation against the limits.

    This is racy. Multiple processes may pass point (a) simultaneously,
    and then allocate pipe buffers that are accounted for only in step
    (c). The race means that the user's pipe buffer allocation could be
    pushed over the limit (by an arbitrary amount, depending on how
    unlucky we were in the race). [Thanks to Vegard Nossum for spotting
    this point, which I had missed.]

    This patch series addresses these three problems.
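
    The three problems above can be illustrated with a small userspace
    sketch. It is not the kernel code: struct pipe_user and
    resize_charge() are hypothetical names, and the approach shown
    (charge the difference first, check the resulting total, roll back
    on failure, never refuse a shrink) is one common way to avoid a
    check-then-account race, assumed here for illustration only.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical per-user accounting record; not the kernel's structure. */
struct pipe_user {
    atomic_long pages;
};

/*
 * Charge the difference between the new and old capacity first, then
 * check the limit against the resulting total. Because the charge is a
 * single atomic add, two racing callers cannot both pass a stale check.
 * Shrinking (delta <= 0) is never refused; a refused grow is rolled back.
 */
static int resize_charge(struct pipe_user *u, long old_pages,
                         long new_pages, long limit)
{
    long delta = new_pages - old_pages;
    long total = atomic_fetch_add(&u->pages, delta) + delta;

    if (delta > 0 && total > limit) {
        atomic_fetch_sub(&u->pages, delta);
        return -EPERM;
    }
    return 0;
}
```

    With a limit of 10 pages, growing from 0 to 8 succeeds, growing
    from 8 to 12 is refused and leaves the charge at 8, and shrinking
    from 8 to 4 succeeds even if the limit has since been lowered to 2.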

    This patch (of 8):

    This is a minor preparatory patch. After subsequent patches,
    round_pipe_size() will be called from pipe_set_size(), so place
    round_pipe_size() above pipe_set_size().

    Link: http://lkml.kernel.org/r/91a91fdb-a959-ba7f-b551-b62477cc98a1@gmail.com
    Signed-off-by: Michael Kerrisk
    Reviewed-by: Vegard Nossum
    Cc: Willy Tarreau
    Cc:
    Cc: Tetsuo Handa
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • The cmd part of this struct is the same as the index of the entry
    within _ioctls[]. In fact this cmd is unused, so we can drop this part.
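
    As a rough illustration of why such a field is redundant, here is a
    minimal sketch of a handler table where the array position alone
    identifies the command. The handler names and lookup_handler() are
    hypothetical and are not taken from the autofs source.

```c
#include <stddef.h>

typedef int (*ioctl_fn)(int arg);

/* Hypothetical handlers standing in for the real ioctl functions. */
static int handle_version(int arg) { return arg; }
static int handle_open(int arg)    { return arg * 2; }

/* The array position already identifies the command, so no cmd field. */
static const ioctl_fn handlers[] = {
    handle_version,   /* index 0 */
    handle_open,      /* index 1 */
};

/* Return the handler for a command index, or NULL if out of range. */
static ioctl_fn lookup_handler(unsigned int idx)
{
    if (idx >= sizeof(handlers) / sizeof(handlers[0]))
        return NULL;
    return handlers[idx];
}
```

    A caller would subtract the first command number from the incoming
    command to obtain idx, then call the returned function if it is not
    NULL.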

    Link: http://lkml.kernel.org/r/20160831033414.9910.66697.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • Having this in autofs_i.h gives the illusion that uncommenting this enables
    pr_debug(), but it doesn't enable all the pr_debug() in autofs because
    inclusion order matters.

    XFS has the same DEBUG macro in its core header fs/xfs/xfs.h, however XFS
    seems to have a rule to include this prior to other XFS headers as well as
    kernel headers. This is not the case with autofs, and DEBUG could be
    enabled via Makefile, so autofs should just get rid of this comment to
    make the code less confusing. It's a comment, so there is literally no
    functional difference.

    Link: http://lkml.kernel.org/r/20160831033409.9910.77067.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • All other warnings use "cmd(0x%08x)" and this is the only one with
    "cmd(%d)". (The output below comes from my userspace debug program,
    not the automount daemon.)

    [ 1139.905676] autofs4:pid:1640:check_dev_ioctl_version: ioctl control interface version mismatch: kernel(1.0), user(0.0), cmd(-1072131215)
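
    For comparison, a small sketch of how the same value reads with the
    hexadecimal format used by the other warnings. format_cmd() is a
    hypothetical helper written for this illustration.

```c
#include <stdio.h>

/* Format an ioctl command number the way the other warnings do. */
static int format_cmd(char *buf, size_t len, int cmd)
{
    /* Cast to unsigned so %x is given the type it expects. */
    return snprintf(buf, len, "cmd(0x%08x)", (unsigned int)cmd);
}
```

    Given the value from the log line above, -1072131215, this produces
    "cmd(0xc0189371)", in which the individual bytes of the command
    number are visible.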

    Link: http://lkml.kernel.org/r/20160812024851.12352.75458.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • No functional changes, based on the following justification.

    1. Make the code more consistent using the ioctl vector _ioctls[],
    rather than assigning NULL only for this ioctl command.
    2. Remove goto done; for better maintainability in the long run.
    3. The existing code is based on the fact that validate_dev_ioctl()
    sets ioctl version for any command, but AUTOFS_DEV_IOCTL_VERSION_CMD
    should explicitly set it regardless of the default behavior.

    Link: http://lkml.kernel.org/r/20160812024846.12352.9885.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • The count of miscellaneous device ioctls in fs/autofs4/autofs_i.h is wrong.

    The number of ioctls is the difference between AUTOFS_DEV_IOCTL_VERSION_CMD
    and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD (14) not the difference between
    AUTOFS_IOC_COUNT and 11 (21).

    [kusumi.tomohiro@gmail.com: fix typo that made the count macro negative]
    Link: http://lkml.kernel.org/r/20160831033420.9910.16809.stgit@pluto.themaw.net
    Link: http://lkml.kernel.org/r/20160812024841.12352.11975.stgit@pluto.themaw.net
    Signed-off-by: Ian Kent
    Cc: Tomohiro Kusumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • This isn't a return value, so change the message to indicate the status is
    the result of may_umount().

    (or locate pr_debug() after put_user() with the same message)

    Link: http://lkml.kernel.org/r/20160812024836.12352.74628.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • Returning -ENOTTY here fails to free dynamically allocated param.
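
    The general pattern for this kind of fix can be sketched as below:
    route the error through a single exit label so the allocation is
    always released. This is a userspace illustration, not the autofs
    code; handle_cmd() and the live_allocs counter are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* instrumentation for this sketch only */

static int handle_cmd(int cmd_is_valid)
{
    int err = 0;
    char *param = malloc(64);

    if (!param)
        return -ENOMEM;
    live_allocs++;

    if (!cmd_is_valid) {
        /* A direct "return -ENOTTY;" here would leak param. */
        err = -ENOTTY;
        goto out;
    }

    /* Normal command processing would go here. */

out:
    free(param);
    live_allocs--;
    return err;
}
```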

    Link: http://lkml.kernel.org/r/20160812024815.12352.69153.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • These two were left from commit aa55ddf340c9 ("autofs4: remove unused
    ioctls") which removed unused ioctls.

    Link: http://lkml.kernel.org/r/20160812024810.12352.96377.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • Free dentry data allocated by autofs4_new_ino() with autofs4_free_ino()
    instead of a raw kfree(), since we have an interface to free autofs_info*.

    This patch was modified to remove the need to set the dentry info field to
    NULL due to a change in the previous patch.

    Link: http://lkml.kernel.org/r/20160812024805.12352.43650.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • The inode allocation failure case in autofs4_dir_symlink() frees the
    autofs dentry info of the dentry without setting ->d_fsdata to NULL.

    That could lead to a double free so just get rid of the free and leave it
    to ->d_release().

    Link: http://lkml.kernel.org/r/20160812024759.12352.10653.stgit@pluto.themaw.net
    Signed-off-by: Ian Kent
    Cc: Tomohiro Kusumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • It's invalid if the given mode is neither dir nor link, so warn on else
    case.

    Link: http://lkml.kernel.org/r/20160812024754.12352.8536.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • Somewhere along the line the error handling gotos have become incorrect.

    Link: http://lkml.kernel.org/r/20160812024749.12352.15100.stgit@pluto.themaw.net
    Signed-off-by: Ian Kent
    Cc: Tomohiro Kusumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • This patch does what the comment below suggests. The test can be done
    earlier, and it is considered better to do it first, before various
    functions get called during initialization.

    /* Couldn't this be tested earlier? */

    Link: http://lkml.kernel.org/r/20160812024744.12352.43075.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • autofs4_kill_sb() doesn't need to be declared as extern, and no other
    functions in .h are explicitly declared as extern.

    Link: http://lkml.kernel.org/r/20160812024739.12352.99354.stgit@pluto.themaw.net
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomohiro Kusumi
     
  • The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
    with the number of fds passed. We had a customer report page allocation
    failures of order-4 for this allocation. This is a costly order, so it might
    easily fail, as the VM expects such allocation to have a lower-order fallback.

    Such trivial fallback is vmalloc(), as the memory doesn't have to be physically
    contiguous and the allocation is temporary for the duration of the syscall
    only. There were some concerns, whether this would have negative impact on the
    system by exposing vmalloc() to userspace. Although an excessive use of vmalloc
    can cause some system wide performance issues - TLB flushes etc. - a large
    order allocation is not for free either and an excessive reclaim/compaction can
    have a similar effect. Also note that the size is effectively limited by
    RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means the
    bitmaps will fit well within single page and thus the vmalloc() fallback could
    be only exercised for processes where root allows a higher limit.

    Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
    it doesn't need this kind of fallback.
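
    The size argument can be made concrete with a little arithmetic. The
    sketch below assumes that select(2) needs six bitmaps (three input
    sets and three result sets), each rounded up to a whole number of
    longs; that count is an assumption of this example rather than
    something stated above, and select_bitmap_bytes() is a hypothetical
    helper.

```c
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (8 * sizeof(long))

/* Bytes needed for six fd bitmaps covering nfds descriptors. */
static size_t select_bitmap_bytes(size_t nfds)
{
    size_t longs = (nfds + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    return 6 * longs * sizeof(long);
}
```

    Under that assumption, 1024 descriptors need 768 bytes, well within
    a 4 KiB page, while 65536 descriptors need 48 KiB, which exceeds an
    order-3 (32 KiB) allocation and so would need order-4.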

    [eric.dumazet@gmail.com: fix failure path logic]
    [akpm@linux-foundation.org: use proper type for size]
    Link: http://lkml.kernel.org/r/20160927084536.5923-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Alexander Viro
    Cc: Eric Dumazet
    Cc: David Laight
    Cc: Hillf Danton
    Cc: Nicholas Piggin
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • After much discussion, it seems that the fallocate feature flag
    FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
    FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted
    for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is
    set. A length that goes past the end of the device will be clamped to the
    device size if KEEP_SIZE is set; or will return -EINVAL if not. Both
    start and length must be aligned to the device's logical block size.

    Since the semantics of fallocate are fairly well established already, wire
    up the two pieces. The other fallocate variants (collapse range, insert
    range, and allocate blocks) are not supported.
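
    The rules described above can be sketched as a standalone range
    check. This is an illustration, not the block layer code:
    check_range() is hypothetical, the SK_* flags are stand-ins for the
    FALLOC_FL_* constants, and rejecting a start at or past the end of
    the device is an assumption of this sketch.

```c
#include <errno.h>

/* Stand-in flag values; real code would use the FALLOC_FL_* constants. */
enum {
    SK_KEEP_SIZE  = 1 << 0,
    SK_PUNCH_HOLE = 1 << 1,
    SK_ZERO_RANGE = 1 << 2,
};

/* Returns the (possibly clamped) length, or a negative errno. */
static long long check_range(int mode, long long start, long long len,
                             long long dev_size, long long block_size)
{
    /* Zero-range with or without keep-size, or punch with keep-size. */
    if (mode != SK_ZERO_RANGE &&
        mode != (SK_ZERO_RANGE | SK_KEEP_SIZE) &&
        mode != (SK_PUNCH_HOLE | SK_KEEP_SIZE))
        return -EOPNOTSUPP;

    if (start < 0 || len <= 0 || start >= dev_size)
        return -EINVAL;

    /* Past the end: clamp only when keep-size is set. */
    if (len > dev_size - start) {
        if (!(mode & SK_KEEP_SIZE))
            return -EINVAL;
        len = dev_size - start;
    }

    /* Start and length must be aligned to the logical block size. */
    if (start % block_size || len % block_size)
        return -EINVAL;

    return len;
}
```

    For a 1 MiB device with 512-byte blocks, a punch without keep-size
    is rejected as unsupported, a zero-range that overruns the end by
    3584 bytes is clamped to 512 bytes when keep-size is set, and the
    same request without keep-size fails with -EINVAL.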

    Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Bart Van Assche
    Cc: Theodore Ts'o
    Cc: Martin K. Petersen
    Cc: Mike Snitzer # tweaked header
    Cc: Brian Foster
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • In the dlm_migrate_request_handler(), when `ret' is -EEXIST, the mle
    should be freed, otherwise the memory will be leaked.

    Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4A3D3522A@H3CMLB12-EX.srv.huawei-3com.com
    Signed-off-by: Guozhonghua
    Reviewed-by: Mark Fasheh
    Cc: Eric Ren
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guozhonghua
     
  • Pull libnvdimm updates from Dan Williams:
    "Aside from the recently added pmem sub-division support these have
    been in -next for several releases with no reported issues. The sub-
    division support was included in next-20161010 with no reported
    issues. It passes all unit tests including new tests for all the new
    functionality below.

    Summary:

    - PMEM sub-division support: Allow a single PMEM region to be divided
    into multiple namespaces. Originally, ~2 years ago, it was thought
    that partitions of a /dev/pmemX block device could handle
    sub-allocations of persistent memory for different use cases. With
    the decision to not support DAX mappings of raw block-devices, and
    the genesis of device-dax, the need for having multiple
    pmem-namespace per region has grown.

    - Device-DAX unified inode: In support of dynamic-resizing of a
    device-dax instance the kernel arranges for all mappings of a
    device-dax node to share the same inode. This allows unmap /
    truncate / invalidation events to affect all instances of the
    device similar to the behavior of mmap on block devices.

    - Hardware error scrubbing reworks: The original address-range-scrub
    and badblocks tracking solution allowed clearing entries at the
    individual namespace level, but it failed to clear the internal
    list of media errors maintained at the bus level. The result was
    that the next scrub or namespace disable/re-enable event would
    restore the cleared badblocks, but now that is fixed. The v4.8
    kernel introduced an auto-scrub-on-machine-check behavior to
    repopulate the badblocks list. Now, in v4.9, the auto-scrub
    behavior can be disabled, in which case the kernel simply arranges
    for the error reported in the machine-check to be added to the list.

    - DIMM health-event notification support: ACPI 6.1 defines a
    notification event code that can be sent to ACPI NVDIMM devices. A
    poll(2) capable file descriptor for these events can be obtained
    from the nmemX/nfit/flags sysfs-attribute of a libnvdimm memory
    device.

    - Miscellaneous fixes: NVDIMM-N probe error, device-dax build error,
    and a change to dedup the flush hint list to not flush the memory
    controller more than necessary"

    * tag 'libnvdimm-for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
    /dev/dax: fix Kconfig dependency build breakage
    dax: use correct dev_t value
    dax: convert devm_create_dax_dev to PTR_ERR
    libnvdimm, namespace: allow creation of multiple pmem-namespaces per region
    libnvdimm, namespace: lift single pmem limit in scan_labels()
    libnvdimm, namespace: filter out of range labels in scan_labels()
    libnvdimm, namespace: enable allocation of multiple pmem namespaces
    libnvdimm, namespace: update label implementation for multi-pmem
    libnvdimm, namespace: expand pmem device naming scheme for multi-pmem
    libnvdimm, region: update nd_region_available_dpa() for multi-pmem support
    libnvdimm, namespace: sort namespaces by dpa at init
    libnvdimm, namespace: allow multiple pmem-namespaces per region at scan time
    tools/testing/nvdimm: support for sub-dividing a pmem region
    libnvdimm, namespace: unify blk and pmem label scanning
    libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helper
    libnvdimm, label: convert label tracking to a linked list
    libnvdimm, region: move region-mapping input-paramters to nd_mapping_desc
    nvdimm: reduce duplicated wpq flushes
    libnvdimm: clear the internal poison_list when clearing badblocks
    pmem: reduce kmap_atomic sections to the memcpys only
    ...

    Linus Torvalds
     
  • Pull btrfs updates from Chris Mason:
    "This is a big variety of fixes and cleanups.

    Liu Bo continues to fix up fuzzer-related problems, and some of Josef's
    cleanups are prep for his bigger extent buffer changes (slated for
    v4.10)"

    * 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (39 commits)
    Revert "btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs"
    Btrfs: remove unnecessary btrfs_mark_buffer_dirty in split_leaf
    Btrfs: don't BUG() during drop snapshot
    btrfs: fix btrfs_no_printk stub helper
    Btrfs: memset to avoid stale content in btree leaf
    btrfs: parent_start initialization cleanup
    btrfs: Remove already completed TODO comment
    btrfs: Do not reassign count in btrfs_run_delayed_refs
    btrfs: fix a possible umount deadlock
    Btrfs: fix memory leak in do_walk_down
    btrfs: btrfs_debug should consume fs_info when DEBUG is not defined
    btrfs: convert send's verbose_printk to btrfs_debug
    btrfs: convert pr_* to btrfs_* where possible
    btrfs: convert printk(KERN_* to use pr_* calls
    btrfs: unsplit printed strings
    btrfs: clean the old superblocks before freeing the device
    Btrfs: kill BUG_ON in run_delayed_tree_ref
    Btrfs: don't leak reloc root nodes on error
    btrfs: squash lines for simple wrapper functions
    Btrfs: improve check_node to avoid reading corrupted nodes
    ...

    Linus Torvalds
     
  • Pull UBI/UBIFS updates from Richard Weinberger:
    "This pull request contains:

    - Fixes for both UBI and UBIFS
    - overlayfs support (O_TMPFILE, RENAME_WHITEOUT/EXCHANGE)
    - Code refactoring for the upcoming MLC support"

    [ Ugh, we just got rid of the "rename2()" naming for the extended rename
    functionality. And this re-introduces it in ubifs with the cross-
    renaming and whiteout support.

    But rather than do any re-organizations in the merge itself, the
    naming can be cleaned up later ]

    * tag 'upstream-4.9-rc1' of git://git.infradead.org/linux-ubifs: (27 commits)
    UBIFS: improve function-level documentation
    ubifs: fix host xattr_len when changing xattr
    ubifs: Use move variable in ubifs_rename()
    ubifs: Implement RENAME_EXCHANGE
    ubifs: Implement RENAME_WHITEOUT
    ubifs: Implement O_TMPFILE
    ubi: Fix Fastmap's update_vol()
    ubi: Fix races around ubi_refill_pools()
    ubi: Deal with interrupted erasures in WL
    UBI: introduce the VID buffer concept
    UBI: hide EBA internals
    UBI: provide an helper to query LEB information
    UBI: provide an helper to check whether a LEB is mapped or not
    UBI: add an helper to check lnum validity
    UBI: simplify LEB write and atomic LEB change code
    UBI: simplify recover_peb() code
    UBI: move the global ech and vidh variables into struct ubi_attach_info
    UBI: provide helpers to allocate and free aeb elements
    UBI: fastmap: use ubi_io_{read, write}_data() instead of ubi_io_{read, write}()
    UBI: fastmap: use ubi_rb_for_each_entry() in unmap_peb()
    ...

    Linus Torvalds
     

11 Oct, 2016

4 commits

  • Pull networking fixes from David Miller:

    1) Netfilter list handling fix, from Linus.

    2) RXRPC/AFS bug fixes (oops on call to serviceless endpoints,
    build warnings, missing notifications, etc.), from David Howells.

    3) Kernel log message missing newlines, from Colin Ian King.

    4) Don't enter direct reclaim in netlink dumps, the idea is to use a
    high order allocation first and fallback quickly to a 0-order
    allocation if such a high-order one cannot be done cheaply and
    without reclaim. From Eric Dumazet.

    5) Fix firmware download errors in btusb bluetooth driver, from Ethan
    Hsieh.

    6) Missing Kconfig deps for QCOM_EMAC, from Geert Uytterhoeven.

    7) Fix MDIO_XGENE dup Kconfig entry. From Laura Abbott.

    8) Constrain ipv6 rtr_solicits sysctl values properly, from Maciej
    Żenczykowski.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
    netfilter: Fix slab corruption.
    be2net: Enable VF link state setting for BE3
    be2net: Fix TX stats for TSO packets
    be2net: Update Copyright string in be_hw.h
    be2net: NCSI FW section should be properly updated with ethtool for BE3
    be2net: Provide an alternate way to read pf_num for BEx chips
    wan/fsl_ucc_hdlc: Fix size used in dma_free_coherent()
    net: macb: NULL out phydev after removing mdio bus
    xen-netback: make sure that hashes are not send to unaware frontends
    Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion
    MAINTAINERS: add myself as a maintainer of xen-netback
    ipv6 addrconf: disallow rtr_solicits < -1
    Bluetooth: btusb: Fix atheros firmware download error
    drivers: net: phy: Correct duplicate MDIO_XGENE entry
    ethernet: qualcomm: QCOM_EMAC should depend on HAS_DMA and HAS_IOMEM
    net: ethernet: mediatek: remove hwlro property in the device tree
    net: ethernet: mediatek: get hw lro capability by the chip id instead of by the dtsi
    net: ethernet: mediatek: get the chip id by ETHDMASYS registers
    net: bgmac: Fix errant feature flag check
    netlink: do not enter direct reclaim from netlink_dump()
    ...

    Linus Torvalds
     
  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Al Viro
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds