Eric Lee / smarc-fsl-linux-kernel

15 Oct, 2016

1 commit

f34d3606f Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup updates from Tejun Heo:

- tracepoints for basic cgroup management operations added

- kernfs and cgroup path formatting functions updated to behave in the
style of strlcpy()

- non-critical bug fixes

* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
cpuset: fix error handling regression in proc_cpuset_show()
cgroup: add tracepoints for basic operations
cgroup: make cgroup_path() and friends behave in the style of strlcpy()
kernfs: remove kernfs_path_len()
kernfs: make kernfs_path*() behave in the style of strlcpy()
kernfs: add dummy implementation of kernfs_path_from_node()

Linus Torvalds
2016-10-15 03:18:50 +0800

14 Oct, 2016

4 commits

c4a86165d Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs ... Browse Code »

Pull NFS client updates from Anna Schumaker:
"Highlights include:

Stable bugfixes:
- sunrpc: fix writ espace race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid

Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
fs: nfs: Make nfs boot time y2038 safe
sunrpc: replace generic auth_cred hash with auth-specific function
sunrpc: add RPCSEC_GSS hash_cred() function
sunrpc: add auth_unix hash_cred() function
sunrpc: add generic_auth hash_cred() function
sunrpc: add hash_cred() function to rpc_authops struct
Retry operation on EREMOTEIO on an interrupted slot
pNFS: Fix atime updates on pNFS clients
sunrpc: queue work on system_power_efficient_wq
NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
NFSv4: If recovery failed for a specific open stateid, then don't retry
NFSv4: Fix retry issues with nfs41_test/free_stateid
NFSv4: Open state recovery must account for file permission changes
NFSv4: Mark the lock and open stateids as invalid after freeing them
NFSv4: Don't test open_stateid unless it is set
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
NFSv4: Fix a race when updating an open_stateid
NFSv4: Fix a race in nfs_inode_reclaim_delegation()
...

Linus Torvalds
2016-10-14 12:28:20 +0800
277855647 Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:

- Anna Schumacker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.

- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"

* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

Linus Torvalds
2016-10-14 12:04:42 +0800
35a891be9 Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/dgc/linux-xfs

< XFS has gained super CoW powers! >
----------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Pull XFS support for shared data extents from Dave Chinner:
"This is the second part of the XFS updates for this merge cycle. This
pullreq contains the new shared data extents feature for XFS.

Given the complexity and size of this change I am expecting - like the
addition of reverse mapping last cycle - that there will be some
follow-up bug fixes and cleanups around the -rc3 stage for issues that
I'm sure will show up once the code hits a wider userbase.

What it is:

At the most basic level we are simply adding shared data extents to
XFS - i.e. a single extent on disk can now have multiple owners. To do
this we have to add new on-disk features to both track the shared
extents and the number of times they've been shared. This is done by
the new "refcount" btree that sits in every allocation group. When we
share or unshare an extent, this tree gets updated.

Along with this new tree, the reverse mapping tree needs to be updated
to track each owner or a shared extent. This also needs to be updated
ever share/unshare operation. These interactions at extent allocation
and freeing time have complex ordering and recovery constraints, so
there's a significant amount of new intent-based transaction code to
ensure that operations are performed atomically from both the runtime
and integrity/crash recovery perspectives.

We also need to break sharing when writes hit a shared extent - this
is where the new copy-on-write implementation comes in. We allocate
new storage and copy the original data along with the overwrite data
into the new location. We only do this for data as we don't share
metadata at all - each inode has it's own metadata that tracks the
shared data extents, the extents undergoing CoW and it's own private
extents.

Of course, being XFS, nothing is simple - we use delayed allocation
for CoW similar to how we use it for normal writes. ENOSPC is a
significant issue here - we build on the reservation code added in
4.8-rc1 with the reverse mapping feature to ensure we don't get
spurious ENOSPC issues part way through a CoW operation. These
mechanisms also help minimise fragmentation due to repeated CoW
operations. To further reduce fragmentation overhead, we've also
introduced a CoW extent size hint, which indicates how large a region
we should allocate when we execute a CoW operation.

With all this functionality in place, we can hook up .copy_file_range,
.clone_file_range and .dedupe_file_range and we gain all the
capabilities of reflink and other vfs provided functionality that
enable manipulation to shared extents. We also added a fallocate mode
that explicitly unshares a range of a file, which we implemented as an
explicit CoW of all the shared extents in a file.

As such, it's a huge chunk of new functionality with new on-disk
format features and internal infrastructure. It warns at mount time as
an experimental feature and that it may eat data (as we do with all
new on-disk features until they stabilise). We have not released
userspace suport for it yet - userspace support currently requires
download from Darrick's xfsprogs repo and build from source, so the
access to this feature is really developer/tester only at this point.
Initial userspace support will be released at the same time the kernel
with this code in it is released.

The new code causes 5-6 new failures with xfstests - these aren't
serious functional failures but things the output of tests changing
slightly due to perturbations in layouts, space usage, etc. OTOH,
we've added 150+ new tests to xfstests that specifically exercise this
new functionality so it's got far better test coverage than any
functionality we've previously added to XFS.

Darrick has done a pretty amazing job getting us to this stage, and
special mention also needs to go to Christoph (review, testing,
improvements and bug fixes) and Brian (caught several intricate bugs
during review) for the effort they've also put in.

Summary:

- unshare range (FALLOC_FL_UNSHARE) support for fallocate

- copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
interface

- shared extent support for XFS

- copy-on-write support for shared extents

- copy_file_range support

- clone_file_range support (implements reflink)

- dedupe_file_range support

- defrag support for reverse mapping enabled filesystems"

* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
xfs: convert COW blocks to real blocks before unwritten extent conversion
xfs: rework refcount cow recovery error handling
xfs: clear reflink flag if setting realtime flag
xfs: fix error initialization
xfs: fix label inaccuracies
xfs: remove isize check from unshare operation
xfs: reduce stack usage of _reflink_clear_inode_flag
xfs: check inode reflink flag before calling reflink functions
xfs: implement swapext for rmap filesystems
xfs: refactor swapext code
xfs: various swapext cleanups
xfs: recognize the reflink feature bit
xfs: simulate per-AG reservations being critically low
xfs: don't mix reflink and DAX mode for now
xfs: check for invalid inode reflink flags
xfs: set a default CoW extent size of 32 blocks
xfs: convert unwritten status of reverse mappings for shared files
xfs: use interval query for rmap alloc operations on shared files
xfs: add shared rmap map/unmap/convert log item types
xfs: increase log reservations for reflink
...

Linus Torvalds
2016-10-14 11:28:22 +0800
e3799a210 Merge git://www.linux-watchdog.org/linux-watchdog ... Browse Code »

Pull watchdog updates from Wim Van Sebroeck:

- a new watchdog pretimeout governor framework

- support to upload the firmware on the ziirave_wdt

- several fixes and cleanups

* git://www.linux-watchdog.org/linux-watchdog: (26 commits)
watchdog: imx2_wdt: add pretimeout function support
watchdog: softdog: implement pretimeout support
watchdog: pretimeout: add pretimeout_available_governors attribute
watchdog: pretimeout: add option to select a pretimeout governor in runtime
watchdog: pretimeout: add panic pretimeout governor
watchdog: pretimeout: add noop pretimeout governor
watchdog: add watchdog pretimeout governor framework
watchdog: hpwdt: add support for iLO5
fs: compat_ioctl: add pretimeout functions for watchdogs
watchdog: add pretimeout support to the core
watchdog: imx2_wdt: use preferred BIT macro instead of open coded values
watchdog: st_wdt: Remove support for obsolete platforms
watchdog: bindings: Remove obsolete platforms from dt doc.
watchdog: mt7621_wdt: Remove assignment of dev pointer
watchdog: rt2880_wdt: Remove assignment of dev pointer
watchdog: constify watchdog_ops structures
watchdog: tegra: constify watchdog_ops structures
watchdog: iTCO_wdt: constify iTCO_wdt_pm structure
watchdog: cadence_wdt: Fix the suspend resume
watchdog: txx9wdt: Add missing clock (un)prepare calls for CCF
...

Linus Torvalds
2016-10-14 07:44:20 +0800

12 Oct, 2016

31 commits

a379f71a3 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:

- a few block updates that fell in my lap

- lib/ updates

- checkpatch

- autofs

- ipc

- a ton of misc other things

* emailed patches from Andrew Morton : (100 commits)
mm: split gfp_mask and mapping flags into separate fields
fs: use mapping_set_error instead of opencoded set_bit
treewide: remove redundant #include
hung_task: allow hung_task_panic when hung_task_warnings is 0
kthread: add kerneldoc for kthread_create()
kthread: better support freezable kthread workers
kthread: allow to modify delayed kthread work
kthread: allow to cancel kthread work
kthread: initial support for delayed kthread work
kthread: detect when a kthread work is used by more workers
kthread: add kthread_destroy_worker()
kthread: add kthread_create_worker*()
kthread: allow to call __kthread_create_on_node() with va_list args
kthread/smpboot: do not park in kthread_create_on_cpu()
kthread: kthread worker API cleanup
kthread: rename probe_kthread_data() to kthread_probe_data()
scripts/tags.sh: enable code completion in VIM
mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping
kdump, vmcoreinfo: report memory sections virtual addresses
ipc/sem.c: add cond_resched in exit_sme
...

Linus Torvalds
2016-10-12 08:34:10 +0800
5114a97a8 fs: use mapping_set_error instead of opencoded set_bit ... Browse Code »

The mapping_set_error() helper sets the correct AS_ flag for the mapping
so there is no reason to open code it. Use the helper directly.

[akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
Signed-off-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2016-10-12 06:06:33 +0800
97139d4a6 treewide: remove redundant #include <linux/kconfig.h> ... Browse Code »

Kernel source files need not include explicitly
because the top Makefile forces to include it with:

-include $(srctree)/include/linux/kconfig.h

This commit removes explicit includes except the following:

* arch/s390/include/asm/facilities_src.h
* tools/testing/radix-tree/linux/kernel.h

These two are used for host programs.

Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2016-10-12 06:06:33 +0800
086e774a5 pipe: cap initial pipe capacity according to pipe-max-size limit ... Browse Code »

This is a patch that provides behavior that is more consistent, and
probably less surprising to users. I consider the change optional, and
welcome opinions about whether it should be applied.

By default, pipes are created with a capacity of 64 kiB. However,
/proc/sys/fs/pipe-max-size may be set smaller than this value. In this
scenario, an unprivileged user could thus create a pipe whose initial
capacity exceeds the limit. Therefore, it seems logical to cap the
initial pipe capacity according to the value of pipe-max-size.

The test program shown earlier in this patch series can be used to
demonstrate the effect of the change brought about with this patch:

# cat /proc/sys/fs/pipe-max-size
1048576
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
# echo 10000 > /proc/sys/fs/pipe-max-size
# cat /proc/sys/fs/pipe-max-size
16384
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 16384
# ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536

The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
caps the initial allocation for a new pipe for unprivileged users, but
not for privileged users.

Link: http://lkml.kernel.org/r/31dc7064-2a17-9c5b-1df1-4e3012ee992c@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:32 +0800
9c87bcf0a pipe: make account_pipe_buffers() return a value, and use it ... Browse Code »

This is an optional patch, to provide a small performance
improvement. Alter account_pipe_buffers() so that it returns the
new value in user->pipe_bufs. This means that we can refactor
too_many_pipe_buffers_soft() and too_many_pipe_buffers_hard() to
avoid the costs of repeated use of atomic_long_read() to get the
value user->pipe_bufs.

Link: http://lkml.kernel.org/r/93e5f193-1e5e-3e1f-3a20-eae79b7e1310@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:32 +0800
a005ca0e6 pipe: fix limit checking in alloc_pipe_info() ... Browse Code »

The limit checking in alloc_pipe_info() (used by pipe(2) and when
opening a FIFO) has the following problems:

(1) When checking capacity required for the new pipe, the checks against
the limit in /proc/sys/fs/pipe-user-pages-{soft,hard} are made
against existing consumption, and exclude the memory required for
the new pipe capacity. As a consequence: (1) the memory allocation
throttling provided by the soft limit does not kick in quite as
early as it should, and (2) the user can overrun the hard limit.

(2) As currently implemented, accounting and checking against the limits
is done as follows:

(a) Test whether the user has exceeded the limit.
(b) Make new pipe buffer allocation.
(c) Account new allocation against the limits.

This is racey. Multiple processes may pass point (a) simultaneously,
and then allocate pipe buffers that are accounted for only in step
(c). The race means that the user's pipe buffer allocation could be
pushed over the limit (by an arbitrary amount, depending on how
unlucky we were in the race). [Thanks to Vegard Nossum for spotting
this point, which I had missed.]

This patch addresses the above problems as follows:

* Alter the checks against limits to include the memory required for the
new pipe.
* Re-order the accounting step so that it precedes the buffer allocation.
If the accounting step determines that a limit has been reached, revert
the accounting and cause the operation to fail.

Link: http://lkml.kernel.org/r/8ff3e9f9-23f6-510c-644f-8e70cd1c0bd9@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:32 +0800
09b4d1990 pipe: simplify logic in alloc_pipe_info() ... Browse Code »

Replace an 'if' block that covers most of the code in this function
with a 'goto'. This makes the code a little simpler to read, and also
simplifies the next patch (fix limit checking in alloc_pipe_info())

Link: http://lkml.kernel.org/r/aef030c1-0257-98a9-4988-186efa48530c@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:32 +0800
b0b91d18e pipe: fix limit checking in pipe_set_size() ... Browse Code »

The limit checking in pipe_set_size() (used by fcntl(F_SETPIPE_SZ))
has the following problems:

(1) When increasing the pipe capacity, the checks against the limits in
/proc/sys/fs/pipe-user-pages-{soft,hard} are made against existing
consumption, and exclude the memory required for the increased pipe
capacity. The new increase in pipe capacity can then push the total
memory used by the user for pipes (possibly far) over a limit. This
can also trigger the problem described next.

(2) The limit checks are performed even when the new pipe capacity is
less than the existing pipe capacity. This can lead to problems if a
user sets a large pipe capacity, and then the limits are lowered,
with the result that the user will no longer be able to decrease the
pipe capacity.

(3) As currently implemented, accounting and checking against the
limits is done as follows:

(a) Test whether the user has exceeded the limit.
(b) Make new pipe buffer allocation.
(c) Account new allocation against the limits.

This is racey. Multiple processes may pass point (a)
simultaneously, and then allocate pipe buffers that are accounted
for only in step (c). The race means that the user's pipe buffer
allocation could be pushed over the limit (by an arbitrary amount,
depending on how unlucky we were in the race). [Thanks to Vegard
Nossum for spotting this point, which I had missed.]

This patch addresses the above problems as follows:

* Perform checks against the limits only when increasing a pipe's
capacity; an unprivileged user can always decrease a pipe's capacity.
* Alter the checks against limits to include the memory required for
the new pipe capacity.
* Re-order the accounting step so that it precedes the buffer
allocation. If the accounting step determines that a limit has
been reached, revert the accounting and cause the operation to fail.

The program below can be used to demonstrate problems 1 and 2, and the
effect of the fix. The program takes one or more command-line arguments.
The first argument specifies the number of pipes that the program should
create. The remaining arguments are, alternately, pipe capacities that
should be set using fcntl(F_SETPIPE_SZ), and sleep intervals (in
seconds) between the fcntl() operations. (The sleep intervals allow the
possibility to change the limits between fcntl() operations.)

Problem 1
=========

Using the test program on an unpatched kernel, we first set some
limits:

# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB

Then show that we can set a pipe with capacity (100MB) that is
over the hard limit

# sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
Initial pipe capacity: 65536
Loop 1: set pipe capacity to 100000000 bytes
F_SETPIPE_SZ returned 134217728

Now set the capacity to 100MB twice. The second call fails (which is
probably surprising to most users, since it seems like a no-op):

# sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 0 100000000
Initial pipe capacity: 65536
Loop 1: set pipe capacity to 100000000 bytes
F_SETPIPE_SZ returned 134217728
Loop 2: set pipe capacity to 100000000 bytes
Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

With a patched kernel, setting a capacity over the limit fails at the
first attempt:

# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard
# sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
Initial pipe capacity: 65536
Loop 1: set pipe capacity to 100000000 bytes
Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

There is a small chance that the change to fix this problem could
break user-space, since there are cases where fcntl(F_SETPIPE_SZ)
calls that previously succeeded might fail. However, the chances are
small, since (a) the pipe-user-pages-{soft,hard} limits are new (in
4.5), and the default soft/hard limits are high/unlimited. Therefore,
it seems warranted to make these limits operate more precisely (and
behave more like what users probably expect).

Problem 2
=========

Running the test program on an unpatched kernel, we first set some limits:

# getconf PAGESIZE
4096
# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB

Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
first setting a pipe capacity (10MB), sleeping for a few seconds,
during which time the hard limit is lowered, and then set pipe
capacity to a smaller amount (5MB):

# sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
[1] 748
# Initial pipe capacity: 65536
Loop 1: set pipe capacity to 10000000 bytes
F_SETPIPE_SZ returned 16777216
Sleeping 15 seconds

# echo 1000 > /proc/sys/fs/pipe-user-pages-hard # 4.096 MB
# Loop 2: set pipe capacity to 5000000 bytes
Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

In this case, the user should be able to lower the limit.

With a kernel that has the patch below, the second fcntl()
succeeds:

# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard
# sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
[1] 3215
# Initial pipe capacity: 65536
# Loop 1: set pipe capacity to 10000000 bytes
F_SETPIPE_SZ returned 16777216
Sleeping 15 seconds

# echo 1000 > /proc/sys/fs/pipe-user-pages-hard

# Loop 2: set pipe capacity to 5000000 bytes
F_SETPIPE_SZ returned 8388608

8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

/* test_F_SETPIPE_SZ.c

(C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later

Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity
and interactions with limits defined by /proc/sys/fs/pipe-* files.
*/

#define _GNU_SOURCE
#include
#include
#include
#include

int
main(int argc, char *argv[])
{
int (*pfd)[2];
int npipes;
int pcap, rcap;
int j, p, s, stime, loop;

if (argc < 2) {
fprintf(stderr, "Usage: %s num-pipes "
"[pipe-capacity sleep-time]...\n", argv[0]);
exit(EXIT_FAILURE);
}

npipes = atoi(argv[1]);

pfd = calloc(npipes, sizeof (int [2]));
if (pfd == NULL) {
perror("calloc");
exit(EXIT_FAILURE);
}

for (j = 0; j < npipes; j++) {
if (pipe(pfd[j]) == -1) {
fprintf(stderr, "Loop %d: pipe() failed: ", j);
perror("pipe");
exit(EXIT_FAILURE);
}
}

printf("Initial pipe capacity: %d\n", fcntl(pfd[0][0], F_GETPIPE_SZ));

for (j = 2; j < argc; j += 2 ) {
loop = j / 2;
pcap = atoi(argv[j]);
printf(" Loop %d: set pipe capacity to %d bytes\n", loop, pcap);

for (p = 0; p < npipes; p++) {
s = fcntl(pfd[p][0], F_SETPIPE_SZ, pcap);
if (s == -1) {
fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ "
"failed: ", loop, p);
perror("fcntl");
exit(EXIT_FAILURE);
}

if (p == 0) {
printf(" F_SETPIPE_SZ returned %d\n", s);
rcap = s;
} else {
if (s != rcap) {
fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ "
"unexpected return: %d\n", loop, p, s);
exit(EXIT_FAILURE);
}
}

stime = (j + 1 < argc) ? atoi(argv[j + 1]) : 0;
if (stime > 0) {
printf(" Sleeping %d seconds\n", stime);
sleep(stime);
}
}
}

exit(EXIT_SUCCESS);
}

8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

Patch history:

v2
* Switch order of test in 'if' statement to avoid function call
(to capability()) in normal path. [This is a fix to a preexisting
wart in the code. Thanks to Willy Tarreau]
* Perform (size > pipe_max_size) check before calling
account_pipe_buffers(). [Thanks to Vegard Nossum]
Quoting Vegard:

The potential problem happens if the user passes a very large number
which will overflow pipe->user->pipe_bufs.

On 32-bit, sizeof(int) == sizeof(long), so if they pass arg = INT_MAX
then round_pipe_size() returns INT_MAX. Although it's true that the
accounting is done in terms of pages and not bytes, so you'd need on
the order of (1 << 13) = 8192 processes hitting the limit at the same
time in order to make it overflow, which seems a bit unlikely.

(See https://lkml.org/lkml/2016/8/12/215 for another discussion on the
limit checking)

Link: http://lkml.kernel.org/r/1e464945-536b-2420-798b-e77b9c7e8593@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:32 +0800
3734a13b9 pipe: refactor argument for account_pipe_buffers() ... Browse Code »

This is a preparatory patch for following work. account_pipe_buffers()
performs accounting in the 'user_struct'. There is no need to pass a
pointer to a 'pipe_inode_info' struct (which is then dereferenced to
obtain a pointer to the 'user' field). Instead, pass a pointer directly
to the 'user_struct'. This change is needed in preparation for a
subsequent patch that the fixes the limit checking in alloc_pipe_info()
(and the resulting code is a little more logical).

Link: http://lkml.kernel.org/r/7277bf8c-a6fc-4a7d-659c-f5b145c981ab@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:31 +0800
d37d41666 pipe: move limit checking logic into pipe_set_size() ... Browse Code »

This is a preparatory patch for following work. Move the F_SETPIPE_SZ
limit-checking logic from pipe_fcntl() into pipe_set_size(). This
simplifies the code a little, and allows for reworking required in
a later patch that fixes the limit checking in pipe_set_size()

Link: http://lkml.kernel.org/r/3701b2c5-2c52-2c3e-226d-29b9deb29b50@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:31 +0800
f491bd711 pipe: relocate round_pipe_size() above pipe_set_size() ... Browse Code »

Patch series "pipe: fix limit handling", v2.

When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various limits
defined by /proc/sys/fs/pipe-* files are checked to see if unprivileged
users are exceeding limits on memory consumption.

While documenting and testing the operation of these limits I noticed
that, as currently implemented, these checks have a number of problems:

(1) When increasing the pipe capacity, the checks against the limits
in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
existing consumption, and exclude the memory required for the
increased pipe capacity. The new increase in pipe capacity can then
push the total memory used by the user for pipes (possibly far) over
a limit. This can also trigger the problem described next.

(2) The limit checks are performed even when the new pipe capacity
is less than the existing pipe capacity. This can lead to problems
if a user sets a large pipe capacity, and then the limits are
lowered, with the result that the user will no longer be able to
decrease the pipe capacity.

(3) As currently implemented, accounting and checking against the
limits is done as follows:

(a) Test whether the user has exceeded the limit.
(b) Make new pipe buffer allocation.
(c) Account new allocation against the limits.

This is racey. Multiple processes may pass point (a) simultaneously,
and then allocate pipe buffers that are accounted for only in step
(c). The race means that the user's pipe buffer allocation could be
pushed over the limit (by an arbitrary amount, depending on how
unlucky we were in the race). [Thanks to Vegard Nossum for spotting
this point, which I had missed.]

This patch series addresses these three problems.

This patch (of 8):

This is a minor preparatory patch. After subsequent patches,
round_pipe_size() will be called from pipe_set_size(), so place
round_pipe_size() above pipe_set_size().

Link: http://lkml.kernel.org/r/91a91fdb-a959-ba7f-b551-b62477cc98a1@gmail.com
Signed-off-by: Michael Kerrisk
Reviewed-by: Vegard Nossum
Cc: Willy Tarreau
Cc:
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk (man-pages)
2016-10-12 06:06:31 +0800
fcc24534b autofs: refactor ioctl fn vector in iookup_dev_ioctl() ... Browse Code »

cmd part of this struct is the same as an index of itself within
_ioctls[]. In fact this cmd is unused, so we can drop this part.

Link: http://lkml.kernel.org/r/20160831033414.9910.66697.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
962ca7cfb autofs: remove possibly misleading /* #define DEBUG */ ... Browse Code »

Having this in autofs_i.h gives illusion that uncommenting this enables
pr_debug(), but it doesn't enable all the pr_debug() in autofs because
inclusion order matters.

XFS has the same DEBUG macro in its core header fs/xfs/xfs.h, however XFS
seems to have a rule to include this prior to other XFS headers as well as
kernel headers. This is not the case with autofs, and DEBUG could be
enabled via Makefile, so autofs should just get rid of this comment to
make the code less confusing. It's a comment, so there is literally no
functional difference.

Link: http://lkml.kernel.org/r/20160831033409.9910.77067.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
390855547 autofs: fix print format for ioctl warning message ... Browse Code »

All other warnings use "cmd(0x%08x)" and this is the only one with
"cmd(%d)". (below comes from my userspace debug program, but not
automount daemon)

[ 1139.905676] autofs4:pid:1640:check_dev_ioctl_version: ioctl control interface version mismatch: kernel(1.0), user(0.0), cmd(-1072131215)

Link: http://lkml.kernel.org/r/20160812024851.12352.75458.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
d9e192320 autofs: add autofs_dev_ioctl_version() for AUTOFS_DEV_IOCTL_VERSION_CMD ... Browse Code »

No functional changes, based on the following justification.

1. Make the code more consistent using the ioctl vector _ioctls[],
rather than assigning NULL only for this ioctl command.
2. Remove goto done; for better maintainability in the long run.
3. The existing code is based on the fact that validate_dev_ioctl()
sets ioctl version for any command, but AUTOFS_DEV_IOCTL_VERSION_CMD
should explicitly set it regardless of the default behavior.

Link: http://lkml.kernel.org/r/20160812024846.12352.9885.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Kent
2016-10-12 06:06:31 +0800
aa8419367 autofs: fix dev ioctl number range check ... Browse Code »

The count of miscellaneous device ioctls in fs/autofs4/autofs_i.h is wrong.

The number of ioctls is the difference between AUTOFS_DEV_IOCTL_VERSION_CMD
and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD (14) not the difference between
AUTOFS_IOC_COUNT and 11 (21).

[kusumi.tomohiro@gmail.com: fix typo that made the count macro negative]
Link: http://lkml.kernel.org/r/20160831033420.9910.16809.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/20160812024841.12352.11975.stgit@pluto.themaw.net
Signed-off-by: Ian Kent
Cc: Tomohiro Kusumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Kent
2016-10-12 06:06:31 +0800
b6e3795a0 autofs: fix pr_debug() message ... Browse Code »

This isn't a return value, so change the message to indicate the status is
the result of may_umount().

(or locate pr_debug() after put_user() with the same message)

Link: http://lkml.kernel.org/r/20160812024836.12352.74628.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
41a4497a4 autofs: don't fail to free_dev_ioctl(param) ... Browse Code »

Returning -ENOTTY here fails to free dynamically allocated param.

Link: http://lkml.kernel.org/r/20160812024815.12352.69153.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
eea618e6d autofs: remove obsolete sb fields ... Browse Code »

These two were left from commit aa55ddf340c9 ("autofs4: remove unused
ioctls") which removed unused ioctls.

Link: http://lkml.kernel.org/r/20160812024810.12352.96377.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
ca552599b autofs: use autofs4_free_ino() to kfree dentry data ... Browse Code »

kfree dentry data allocated by autofs4_new_ino() with autofs4_free_ino()
instead of raw kfree. (since we have the interface to free autofs_info*)

This patch was modified to remove the need to set the dentry info field to
NULL dew to a change in the previous patch.

Link: http://lkml.kernel.org/r/20160812024805.12352.43650.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
1574fa7be autofs: remove ino free in autofs4_dir_symlink() ... Browse Code »

The inode allocation failure case in autofs4_dir_symlink() frees the
autofs dentry info of the dentry without setting ->d_fsdata to NULL.

That could lead to a double free so just get rid of the free and leave it
to ->d_release().

Link: http://lkml.kernel.org/r/20160812024759.12352.10653.stgit@pluto.themaw.net
Signed-off-by: Ian Kent
Cc: Tomohiro Kusumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Kent
2016-10-12 06:06:31 +0800
97537b35b autofs: add WARN_ON(1) for non dir/link inode case ... Browse Code »

It's invalid if the given mode is neither dir nor link, so warn on else
case.

Link: http://lkml.kernel.org/r/20160812024754.12352.8536.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
1973a1226 autofs: fix autofs4_fill_super() error exit handling ... Browse Code »

Somewhere along the line the error handling gotos have become incorrect.

Link: http://lkml.kernel.org/r/20160812024749.12352.15100.stgit@pluto.themaw.net
Signed-off-by: Ian Kent
Cc: Tomohiro Kusumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Kent
2016-10-12 06:06:31 +0800
749800ef5 autofs: test autofs versions first on sb initialization ... Browse Code »

This patch does what the below comment says. It could be and it's
considered better to do this first before various functions get called
during initialization.

/* Couldn't this be tested earlier? */

Link: http://lkml.kernel.org/r/20160812024744.12352.43075.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
4a44c1859 autofs: drop unnecessary extern in autofs_i.h ... Browse Code »

autofs4_kill_sb() doesn't need to be declared as extern, and no other
functions in .h are explicitly declared as extern.

Link: http://lkml.kernel.org/r/20160812024739.12352.99354.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomohiro Kusumi
2016-10-12 06:06:31 +0800
2d19309cf fs/select: add vmalloc fallback for select(2) ... Browse Code »

The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
with the number of fds passed. We had a customer report page allocation
failures of order-4 for this allocation. This is a costly order, so it might
easily fail, as the VM expects such allocation to have a lower-order fallback.

Such trivial fallback is vmalloc(), as the memory doesn't have to be physically
contiguous and the allocation is temporary for the duration of the syscall
only. There were some concerns, whether this would have negative impact on the
system by exposing vmalloc() to userspace. Although an excessive use of vmalloc
can cause some system wide performance issues - TLB flushes etc. - a large
order allocation is not for free either and an excessive reclaim/compaction can
have a similar effect. Also note that the size is effectively limited by
RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means the
bitmaps will fit well within single page and thus the vmalloc() fallback could
be only excercised for processes where root allows a higher limit.

Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
it doesn't need this kind of fallback.

[eric.dumazet@gmail.com: fix failure path logic]
[akpm@linux-foundation.org: use proper type for size]
Link: http://lkml.kernel.org/r/20160927084536.5923-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
Cc: Alexander Viro
Cc: Eric Dumazet
Cc: David Laight
Cc: Hillf Danton
Cc: Nicholas Piggin
Cc: Jason Baron
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vlastimil Babka
2016-10-12 06:06:30 +0800
25f4c4141 block: implement (some of) fallocate for block devices ... Browse Code »

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted
for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is
set. A length that goes past the end of the device will be clamped to the
device size if KEEP_SIZE is set; or will return -EINVAL if not. Both
start and length must be aligned to the device's logical block size.

Since the semantics of fallocate are fairly well established already, wire
up the two pieces. The other fallocate variants (collapse range, insert
range, and allocate blocks) are not supported.

Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org
Signed-off-by: Darrick J. Wong
Reviewed-by: Hannes Reinecke
Reviewed-by: Bart Van Assche
Cc: Theodore Ts'o
Cc: Martin K. Petersen
Cc: Mike Snitzer # tweaked header
Cc: Brian Foster
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Darrick J. Wong
2016-10-12 06:06:30 +0800
0cc482ee4 ocfs2: fix memory leak in dlm_migrate_request_handler() ... Browse Code »

In the dlm_migrate_request_handler(), when `ret' is -EEXIST, the mle
should be freed, otherwise the memory will be leaked.

Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4A3D3522A@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: Guozhonghua
Reviewed-by: Mark Fasheh
Cc: Eric Ren
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Guozhonghua
2016-10-12 06:06:30 +0800
d09ba1311 Merge tag 'libnvdimm-for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm ... Browse Code »

Pull libnvdimm updates from Dan Williams:
"Aside from the recently added pmem sub-division support these have
been in -next for several releases with no reported issues. The sub-
division support was included in next-20161010 with no reported
issues. It passes all unit tests including new tests for all the new
functionality below.

Summary:

- PMEM sub-division support: Allow a single PMEM region to be divided
into multiple namespaces. Originally, ~2 years ago, it was thought
that partitions of a /dev/pmemX block device could handle
sub-allocations of persistent memory for different use cases. With
the decision to not support DAX mappings of raw block-devices, and
the genesis of device-dax, the need for having multiple
pmem-namespace per region has grown.

- Device-DAX unified inode: In support of dynamic-resizing of a
device-dax instance the kernel arranges for all mappings of a
device-dax node to share the same inode. This allows unmap /
truncate / invalidation events to affect all instances of the
device similar to the behavior of mmap on block devices.

- Hardware error scrubbing reworks: The original address-range-scrub
and badblocks tracking solution allowed clearing entries at the
individual namespace level, but it failed to clear the internal
list of media errors maintained at the bus level. The result was
that the next scrub or namespace disable/re-enable event would
restore the cleared badblocks, but now that is fixed. The v4.8
kernel introduced an auto-scrub-on-machine-check behavior to
repopulate the badblocks list. Now, in v4.9, the auto-scrub
behavior can be disabled and simply arrange for the error reported
in the machine-check to be added to the list.

- DIMM health-event notification support: ACPI 6.1 defines a
notification event code that can be send to ACPI NVDIMM devices. A
poll(2) capable file descriptor for these events can be obtained
from the nmemX/nfit/flags sysfs-attribute of a libnvdimm memory
device.

- Miscellaneous fixes: NVDIMM-N probe error, device-dax build error,
and a change to dedup the flush hint list to not flush the memory
controller more than necessary"

* tag 'libnvdimm-for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
/dev/dax: fix Kconfig dependency build breakage
dax: use correct dev_t value
dax: convert devm_create_dax_dev to PTR_ERR
libnvdimm, namespace: allow creation of multiple pmem-namespaces per region
libnvdimm, namespace: lift single pmem limit in scan_labels()
libnvdimm, namespace: filter out of range labels in scan_labels()
libnvdimm, namespace: enable allocation of multiple pmem namespaces
libnvdimm, namespace: update label implementation for multi-pmem
libnvdimm, namespace: expand pmem device naming scheme for multi-pmem
libnvdimm, region: update nd_region_available_dpa() for multi-pmem support
libnvdimm, namespace: sort namespaces by dpa at init
libnvdimm, namespace: allow multiple pmem-namespaces per region at scan time
tools/testing/nvdimm: support for sub-dividing a pmem region
libnvdimm, namespace: unify blk and pmem label scanning
libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helper
libnvdimm, label: convert label tracking to a linked list
libnvdimm, region: move region-mapping input-paramters to nd_mapping_desc
nvdimm: reduce duplicated wpq flushes
libnvdimm: clear the internal poison_list when clearing badblocks
pmem: reduce kmap_atomic sections to the memcpys only
...

Linus Torvalds
2016-10-12 03:19:31 +0800
f29135b54 Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs updates from Chris Mason:
"This is a big variety of fixes and cleanups.

Liu Bo continues to fixup fuzzer related problems, and some of Josef's
cleanups are prep for his bigger extent buffer changes (slated for
v4.10)"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (39 commits)
Revert "btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs"
Btrfs: remove unnecessary btrfs_mark_buffer_dirty in split_leaf
Btrfs: don't BUG() during drop snapshot
btrfs: fix btrfs_no_printk stub helper
Btrfs: memset to avoid stale content in btree leaf
btrfs: parent_start initialization cleanup
btrfs: Remove already completed TODO comment
btrfs: Do not reassign count in btrfs_run_delayed_refs
btrfs: fix a possible umount deadlock
Btrfs: fix memory leak in do_walk_down
btrfs: btrfs_debug should consume fs_info when DEBUG is not defined
btrfs: convert send's verbose_printk to btrfs_debug
btrfs: convert pr_* to btrfs_* where possible
btrfs: convert printk(KERN_* to use pr_* calls
btrfs: unsplit printed strings
btrfs: clean the old superblocks before freeing the device
Btrfs: kill BUG_ON in run_delayed_tree_ref
Btrfs: don't leak reloc root nodes on error
btrfs: squash lines for simple wrapper functions
Btrfs: improve check_node to avoid reading corrupted nodes
...

Linus Torvalds
2016-10-12 02:23:06 +0800
4c609922a Merge tag 'upstream-4.9-rc1' of git://git.infradead.org/linux-ubifs ... Browse Code »

Pull UBI/UBIFS updates from Richard Weinberger:
"This pull request contains:

- Fixes for both UBI and UBIFS
- overlayfs support (O_TMPFILE, RENAME_WHITEOUT/EXCHANGE)
- Code refactoring for the upcoming MLC support"

[ Ugh, we just got rid of the "rename2()" naming for the extended rename
functionality. And this re-introduces it in ubifs with the cross-
renaming and whiteout support.

But rather than do any re-organizations in the merge itself, the
naming can be cleaned up later ]

* tag 'upstream-4.9-rc1' of git://git.infradead.org/linux-ubifs: (27 commits)
UBIFS: improve function-level documentation
ubifs: fix host xattr_len when changing xattr
ubifs: Use move variable in ubifs_rename()
ubifs: Implement RENAME_EXCHANGE
ubifs: Implement RENAME_WHITEOUT
ubifs: Implement O_TMPFILE
ubi: Fix Fastmap's update_vol()
ubi: Fix races around ubi_refill_pools()
ubi: Deal with interrupted erasures in WL
UBI: introduce the VID buffer concept
UBI: hide EBA internals
UBI: provide an helper to query LEB information
UBI: provide an helper to check whether a LEB is mapped or not
UBI: add an helper to check lnum validity
UBI: simplify LEB write and atomic LEB change code
UBI: simplify recover_peb() code
UBI: move the global ech and vidh variables into struct ubi_attach_info
UBI: provide helpers to allocate and free aeb elements
UBI: fastmap: use ubi_io_{read, write}_data() instead of ubi_io_{read, write}()
UBI: fastmap: use ubi_rb_for_each_entry() in unmap_peb()
...

Linus Torvalds
2016-10-12 01:49:44 +0800

11 Oct, 2016

4 commits

6b5e09a74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Netfilter list handling fix, from Linus.

2) RXRPC/AFS bug fixes from David Howells (oops on call to serviceless
endpoints, build warnings, missing notifications, etc.) From David
Howells.

3) Kernel log message missing newlines, from Colin Ian King.

4) Don't enter direct reclaim in netlink dumps, the idea is to use a
high order allocation first and fallback quickly to a 0-order
allocation if such a high-order one cannot be done cheaply and
without reclaim. From Eric Dumazet.

5) Fix firmware download errors in btusb bluetooth driver, from Ethan
Hsieh.

6) Missing Kconfig deps for QCOM_EMAC, from Geert Uytterhoeven.

7) Fix MDIO_XGENE dup Kconfig entry. From Laura Abbott.

8) Constrain ipv6 rtr_solicits sysctl values properly, from Maciej
Żenczykowski.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
netfilter: Fix slab corruption.
be2net: Enable VF link state setting for BE3
be2net: Fix TX stats for TSO packets
be2net: Update Copyright string in be_hw.h
be2net: NCSI FW section should be properly updated with ethtool for BE3
be2net: Provide an alternate way to read pf_num for BEx chips
wan/fsl_ucc_hdlc: Fix size used in dma_free_coherent()
net: macb: NULL out phydev after removing mdio bus
xen-netback: make sure that hashes are not send to unaware frontends
Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion
MAINTAINERS: add myself as a maintainer of xen-netback
ipv6 addrconf: disallow rtr_solicits < -1
Bluetooth: btusb: Fix atheros firmware download error
drivers: net: phy: Correct duplicate MDIO_XGENE entry
ethernet: qualcomm: QCOM_EMAC should depend on HAS_DMA and HAS_IOMEM
net: ethernet: mediatek: remove hwlro property in the device tree
net: ethernet: mediatek: get hw lro capability by the chip id instead of by the dtsi
net: ethernet: mediatek: get the chip id by ETHDMASYS registers
net: bgmac: Fix errant feature flag check
netlink: do not enter direct reclaim from netlink_dump()
...

Linus Torvalds
2016-10-11 23:10:19 +0800
101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
2016-10-11 11:16:43 +0800
3873691e5 Merge remote-tracking branch 'ovl/rename2' into for-linus Browse Code »

Al Viro
2016-10-11 11:02:51 +0800
97d211670 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas

This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check

Linus Torvalds
2016-10-11 08:11:50 +0800