Eric Lee / smarc-fsl-linux-kernel

20 Aug, 2016

1 commit

a8414fa36 Merge tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/ke… ... Browse Code »

…rnel/git/dgc/linux-xfs

Pull xfs and iomap fixes from Dave Chinner:
"Changes in this update:

Regression fixes for XFS changes introduce in 4.8-rc1:
- buffer IO accounting assert failure
- ENOSPC block accounting reservation issue
- DAX IO path page cache invalidation fix
- rmapbt on-disk block count in agf
- correct classification of rmap block type when updating AGFL.
- iomap support for attribute fork mapping

Regression fixes for iomap infrastructure in 4.8-rc1:
- fiemap: honor FIEMAP_FLAG_SYNC
- fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
- make mark_page_accessed and pagefault_disable usage consistent with
other IO paths"

* tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: remove OWN_AG rmap when allocating a block from the AGFL
xfs: (re-)implement FIEMAP_FLAG_XATTR
xfs: simplify xfs_file_iomap_begin
iomap: mark ->iomap_end as optional
iomap: prepare iomap_fiemap for attribute mappings
iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
iomap: remove superflous pagefault_disable from iomap_write_actor
iomap: remove superflous mark_page_accessed from iomap_write_actor
xfs: store rmapbt block count in the AGF
xfs: don't invalidate whole file on DAX read/write
xfs: fix bogus space reservation in xfs_iomap_write_allocate
xfs: don't assert fail on non-async buffers on ioacct decrement

Linus Torvalds
2016-08-20 00:06:41 +0800

18 Aug, 2016

1 commit

184ca8234 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.

2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.

4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.

5) Fix fib_trie logic when walking the trie during /proc/net/route
output than can access a stale node pointer. From David Forster.

6) Several sctp_diag fixes from Phil Sutter.

7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

8) Checksum fixup fixes in bpf from Daniel Borkmann.

9) Memork leaks in nfnetlink, from Liping Zhang.

10) Use after free in rxrpc, from David Howells.

11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.

12) Calipso resource leak, from Colin Ian King.

13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

15) Fix lockdep splats in macsec, from Sabrina Dubroca.

16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.

17) Various tc-action bug fixes, from CONG Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...

Linus Torvalds
2016-08-18 08:26:58 +0800

17 Aug, 2016

13 commits

32438cf9d Merge branch 'iomap-fixes-4.8-rc3' into for-next Browse Code »

Dave Chinner
2016-08-17 09:13:37 +0800
a03f1a663 xfs: remove OWN_AG rmap when allocating a block from the AGFL ... Browse Code »

When we're really tight on space, xfs_alloc_ag_vextent_small() can
allocate a block from the AGFL and give it to the caller. Since the
caller is never the AGFL-fixing method, we must remove the OWN_AG
reverse mapping because it will clash with whatever rmap the caller
wants to set up. This bug was discovered by running generic/299
repeatedly.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-17 09:12:57 +0800
1d4795e7b xfs: (re-)implement FIEMAP_FLAG_XATTR ... Browse Code »

Use a special read-only iomap_ops implementation to support fiemap on
the attr fork.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-08-17 06:45:30 +0800
b95a21271 xfs: simplify xfs_file_iomap_begin ... Browse Code »

We'll never get nimap == 0 for a successful return from xfs_bmapi_read,
so don't try to handle it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-08-17 06:44:52 +0800
f20ac7ab1 iomap: mark ->iomap_end as optional ... Browse Code »

No need to implement it for read-only mappings.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-08-17 06:42:34 +0800
ac2dc058b iomap: prepare iomap_fiemap for attribute mappings ... Browse Code »

By bassing through an -ENOENT, similar to the old XFS implementation of
FIEMAP_FLAG_XATTR.

Signed-off-by: Dave Chinner
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Dave Chinner
2016-08-17 06:41:34 +0800
8896b8f60 iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag ... Browse Code »

The flag is checked as supported, but then we do an unconditional
sync of the file, regardless of whether the flag is set or not. Make
the sync conditional on having the FIEMAP_FLAG_SYNC flag set.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-08-17 06:41:10 +0800
274c88749 iomap: remove superflous pagefault_disable from iomap_write_actor ... Browse Code »

iov_iter_copy_from_user_atomic disables page faults internally, no need to
do it around the call. This also brings the iomap code in line with
the original filemap version.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-08-17 06:40:18 +0800
97dd8c9ee iomap: remove superflous mark_page_accessed from iomap_write_actor ... Browse Code »

This catches up with commit 2457ae ("mm: non-atomically mark page
accessed during page cache allocation where possible"), which
moved the initial access marking into the pagecache allocator.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-08-17 06:39:47 +0800
f32866fdc xfs: store rmapbt block count in the AGF ... Browse Code »

Track the number of blocks used for the rmapbt in the AGF. When we
get to the AG reservation code we need this counter to quickly
make our reservation during mount.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Darrick J. Wong
2016-08-17 06:31:49 +0800
8b2180b3b xfs: don't invalidate whole file on DAX read/write ... Browse Code »

When we do DAX IO, we try to invalidate the entire page cache held
on the file. This is incorrect as it will trash the entire mapping
tree that now tracks dirty state in exceptional entries in the radix
tree slots.

What we are trying to do is remove cached pages (e.g from reads
into holes) that sit in the radix tree over the range we are about
to write to. Hence we should just limit the invalidation to the
range we are about to overwrite.

Reported-by: Jan Kara
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-08-17 06:31:33 +0800
0af32fb46 xfs: fix bogus space reservation in xfs_iomap_write_allocate ... Browse Code »

The space reservations was without an explaination in commit

"Add error reporting calls in error paths that return EFSCORRUPTED"

back in 2003. There is no reason to reserve disk blocks in the
transaction when allocating blocks for delalloc space as we already
reserved the space when creating the delalloc extent.

With this fix we stop running out of the reserved pool in
generic/229, which has happened for long time with small blocksize
file systems, and has increased in severity with the new buffered
write path.

[ dchinner: we still need to pass the block reservation into
xfs_bmapi_write() to ensure we don't deadlock during AG selection.
See commit dbd5c8c ("xfs: pass total block res. as total
xfs_bmapi_write() parameter") for more details on why this is
necessary. ]

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-08-17 06:30:28 +0800
4dd3fd719 xfs: don't assert fail on non-async buffers on ioacct decrement ... Browse Code »

The buffer I/O accounting mechanism tracks async buffers under I/O. As
an optimization, the buffer I/O count is incremented only once on the
first async I/O for a given hold cycle of a buffer and decremented once
the buffer is released to the LRU (or freed).

xfs_buf_ioacct_dec() has an ASSERT() check for an XBF_ASYNC buffer, but
we have one or two corner cases where a buffer can be submitted for I/O
multiple times via different methods in a single hold cycle. If an async
I/O occurs first, the I/O count is incremented. If a sync I/O occurs
before the hold count drops, XBF_ASYNC is cleared by the time the I/O
count is decremented.

Remove the async assert check from xfs_buf_ioacct_dec() as this is a
perfectly valid scenario. For the purposes of I/O accounting, we really
only care about the buffer async state at I/O submission time.

Discovered-and-analyzed-by: Dave Chinner
Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-08-17 06:30:28 +0800

14 Aug, 2016

1 commit

a1e210331 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:

- an NVMe fix from Gabriel, fixing a suspend/resume issue on some
setups

- addition of a few missing entries in the block queue sysfs
documentation, from Joe

- a fix for a sparse shadow warning for the bvec iterator, from
Johannes

- a writeback deadlock involving raid issuing barriers, and not
flushing the plug when we wakeup the flusher threads. From
Konstantin

- a set of patches for the NVMe target/loop/rdma code, from Roland and
Sagi

* 'for-linus' of git://git.kernel.dk/linux-block:
bvec: avoid variable shadowing warning
doc: update block/queue-sysfs.txt entries
nvme: Suspend all queues before deletion
mm, writeback: flush plugged IO in wakeup_flusher_threads()
nvme-rdma: Remove unused includes
nvme-rdma: start async event handler after reconnecting to a controller
nvmet: Fix controller serial number inconsistency
nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
nvmet-rdma: Correctly handle RDMA device hot removal
nvme-rdma: Make sure to shutdown the controller if we can
nvme-loop: Remove duplicate call to nvme_remove_namespaces
nvme-rdma: Free the I/O tags when we delete the controller
nvme-rdma: Remove duplicate call to nvme_remove_namespaces
nvme-rdma: Fix device removal handling
nvme-rdma: Queue ns scanning after a sucessful reconnection
nvme-rdma: Don't leak uninitialized memory in connect request private data

Linus Torvalds
2016-08-14 00:56:45 +0800

13 Aug, 2016

3 commits

b112324c2 Merge tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd fixes from Bruce Fields:
"Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for
races in the LOCK code which appear to go back to the big nfsd state
lock removal from 3.17"

* tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux:
nfsd: don't return an unhashed lock stateid after taking mutex
nfsd: Fix race between FREE_STATEID and LOCK
nfsd: fix dentry refcounting on create

Linus Torvalds
2016-08-13 07:28:41 +0800
dd257933f nfsd: don't return an unhashed lock stateid after taking mutex ... Browse Code »

nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (a'la FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.

Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.

Signed-off-by: Jeff Layton
Tested-by: Alexey Kodanev
Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

Jeff Layton
2016-08-13 04:10:25 +0800
990917006 Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

- Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
needs multiple different security services (e.g. krb5i and krb5p).

- Stable patch to fix a regression introduced by the use of
SO_REUSEPORT, and that prevented the use of multiple different NFS
versions to the same server.

- TCP socket reconnection timer fixes.

- Patch from Neil to disable the use of IPv6 temporary addresses"

* tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Cap the transport reconnection timer at 1/2 lease period
NFSv4: Cleanup the setting of the nfs4 lease period
SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
SUNRPC: Fix reconnection timeouts
NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
SUNRPC: disable the use of IPv6 temporary addresses.
SUNRPC: allow for upcalls for same uid but different gss service
SUNRPC: Fix up socket autodisconnect
SUNRPC: Handle EADDRNOTAVAIL on connection failures

Linus Torvalds
2016-08-13 03:32:24 +0800

12 Aug, 2016

4 commits

4b9eaf33d Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge fixes from Andrew Morton:
"7 fixes"

* emailed patches from Andrew Morton :
mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats
mm, oom: fix uninitialized ret in task_will_free_mem()
kasan: remove the unnecessary WARN_ONCE from quarantine.c
mm: memcontrol: fix memcg id ref counter on swap charge move
mm: memcontrol: fix swap counter leak on swapout from offline cgroup
proc, meminfo: use correct helpers for calculating LRU sizes in meminfo
mm/hugetlb: fix incorrect hugepages count during mem hotplug

Linus Torvalds
2016-08-12 07:58:24 +0800
2f95ff90b proc, meminfo: use correct helpers for calculating LRU sizes in meminfo ... Browse Code »

meminfo_proc_show() and si_mem_available() are using the wrong helpers
for calculating the size of the LRUs. The user-visible impact is that
there appears to be an abnormally high number of unevictable pages.

Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.net
Signed-off-by: Mel Gorman
Cc: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2016-08-12 07:58:13 +0800
3b3ce01a5 Merge tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client ... Browse Code »

Pull ceph fixes from Ilya Dryomov:
"A patch for a NULL dereference bug introduced in 4.8-rc1 and a handful
of static checker fixes"

* tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client:
ceph: initialize pathbase in the !dentry case in encode_caps_cb()
rbd: nuke the 32-bit pool id check
rbd: destroy header_oloc in rbd_dev_release()
ceph: fix null pointer dereference in ceph_flush_snaps()
libceph: using kfree_rcu() to simplify the code
libceph: make cancel_generic_request() static
libceph: fix return value check in alloc_msg_with_page_vector()

Linus Torvalds
2016-08-12 04:53:34 +0800
42691398b nfsd: Fix race between FREE_STATEID and LOCK ... Browse Code »

When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:

Frame 324 R OPEN stateid [2,O]

Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.

To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.

Reported-by: Alexey Kodanev
Fix-suggested-by: Jeff Layton
Signed-off-by: Chuck Lever
Tested-by: Alexey Kodanev
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-08-12 03:08:39 +0800

11 Aug, 2016

2 commits

502aa0a5b nfsd: fix dentry refcounting on create ... Browse Code »

b44061d0b9 introduced a dentry ref counting bug. Previously we were
grabbing one ref to dchild in nfsd_create(), but with the creation of
nfsd_create_locked() we have a ref for dchild from the lookup in
nfsd_create(), and then another ref in nfsd_create_locked(). The ref
from the lookup in nfsd_create() is never dropped and results in
dentries still in use at unmount.

Signed-off-by: Josef Bacik
Fixes: b44061d0b9 "nfsd: reorganize nfsd_create"
Reported-by: kernel test robot
Reviewed-by: Jeff Layton
Acked-by: Al Viro
Signed-off-by: J. Bruce Fields

Josef Bacik
2016-08-11 23:42:08 +0800
9512c47ec Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"Some fixes for btrfs send/recv and fsync from Filipe and Robbie Ko.

Bonus points to Filipe for already having xfstests in place for many
of these"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve()
Btrfs: improve performance on fsync against new inode after rename/unlink
Btrfs: be more precise on errors when getting an inode from disk
Btrfs: send, don't bug on inconsistent snapshots
Btrfs: send, avoid incorrect leaf accesses when sending utimes operations
Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations
Btrfs: send, fix warning due to late freeing of orphan_dir_info structures
Btrfs: incremental send, fix premature rmdir operations
Btrfs: incremental send, fix invalid paths for rename operations
Btrfs: send, add missing error check for calls to path_loop()
Btrfs: send, fix failure to move directories with the same name around
Btrfs: add missing check for writeback errors on fsync

Linus Torvalds
2016-08-11 02:16:03 +0800

10 Aug, 2016

2 commits

51350ea0d mm, writeback: flush plugged IO in wakeup_flusher_threads() ... Browse Code »

I've found funny live-lock between raid10 barriers during resync and
memory controller hard limits. Inside mpage_readpages() task holds on to
its plug bio which blocks the barrier in raid10. Its memory cgroup have
no free memory thus the task goes into reclaimer but all reclaimable
pages are dirty and cannot be written because raid10 is rebuilding and
stuck on the barrier.

Common flush of such IO in schedule() never happens, because the caller
doesn't go to sleep.

Lock is 'live' because changing memory limit or killing tasks which
holds that stuck bio unblock whole progress.

That was what happened in 3.18.x but I see no difference in upstream
logic. Theoretically this might happen even without memory cgroup.

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe

Konstantin Khlebnikov
2016-08-10 09:58:06 +0800
c4159a75b mm: memcontrol: only mark charged pages with PageKmemcg ... Browse Code »

To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page->_mapcount to -512. Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context. To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true. The latter is set iff there are kmem-enabled memory cgroups
(online or offline). The root cgroup is not considered kmem-enabled.

As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup when there are kmem-enabled memory cgroups and is freed after all
kmem-enabled memory cgroups were removed, e.g.

# no memory cgroups has been created yet, create one
mkdir /sys/fs/cgroup/memory/test
# run something allocating pages with __GFP_ACCOUNT, e.g.
# a program using pipe
dmesg | tail
# remove the memory cgroup
rmdir /sys/fs/cgroup/memory/test

we'll get bad page state bug complaining about page->_mapcount != -1:

BUG: Bad page state in process swapper/0 pfn:1fd945c
page:ffffea007f651700 count:0 mapcount:-511 mapping: (null) index:0x0
flags: 0x1000000000000000()

To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.

Fixes: 4949148ad433 ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: Eric Dumazet
Signed-off-by: Vladimir Davydov
Signed-off-by: Linus Torvalds

Vladimir Davydov
2016-08-10 01:14:10 +0800

09 Aug, 2016

2 commits

4eacd4cb3 ceph: initialize pathbase in the !dentry case in encode_caps_cb() ... Browse Code »

pathbase is the base inode; set it to 0 if we've got no path.

Coverity-id: 146348
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder

Ilya Dryomov
2016-08-09 23:26:56 +0800
e4d2b16a4 ceph: fix null pointer dereference in ceph_flush_snaps() ... Browse Code »

Signed-off-by: Yan, Zheng

Yan, Zheng
2016-08-09 03:41:43 +0800

08 Aug, 2016

2 commits

1eff9d322 block: rename bio bi_rw to bi_opf ... Browse Code »

Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe

Jens Axboe
2016-08-08 04:41:02 +0800
c11f0c0b5 block/mm: make bdev_ops->rw_page() take a bool for read/write ... Browse Code »

Commit abf545484d31 changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.

Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.

Reviewed-by: Mike Christie
Signed-off-by: Jens Axboe

Jens Axboe
2016-08-08 04:41:02 +0800

07 Aug, 2016

3 commits

e9d488c31 Merge tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc ... Browse Code »

Pull binfmt_misc update from James Bottomley:
"This update is to allow architecture emulation containers to function
such that the emulation binary can be housed outside the container
itself. The container and fs parts both have acks from relevant
experts.

To use the new feature you have to add an F option to your binfmt_misc
configuration"

From the docs:
"The usual behaviour of binfmt_misc is to spawn the binary lazily when
the misc format file is invoked. However, this doesn't work very well
in the face of mount namespaces and changeroots, so the F mode opens
the binary as soon as the emulation is installed and uses the opened
image to spawn the emulator, meaning it is always available once
installed, regardless of how the environment changes"

* tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
binfmt_misc: add F option description to documentation
binfmt_misc: add persistent opened binary handler for containers
fs: add filp_clone_open API

Linus Torvalds
2016-08-07 22:13:14 +0800
337684a17 fs: return EPERM on immutable inode ... Browse Code »

In most cases, EPERM is returned on immutable inode, and there're only a
few places returning EACCES. I noticed this when running LTP on
overlayfs, setxattr03 failed due to unexpected EACCES on immutable
inode.

So converting all EACCES to EPERM on immutable inode.

Acked-by: Dave Chinner
Signed-off-by: Eryu Guan
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Eryu Guan
2016-08-07 22:03:31 +0800
fe64f3283 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
"Assorted cleanups and fixes.

In the "trivial API change" department - ->d_compare() losing 'parent'
argument"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cachefiles: Fix race between inactivating and culling a cache object
9p: use clone_fid()
9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
vfs: make dentry_needs_remove_privs() internal
vfs: remove file_needs_remove_privs()
vfs: fix deadlock in file_remove_privs() on overlayfs
get rid of 'parent' argument of ->d_compare()
cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
affs ->d_compare(): don't bother with ->d_inode
fold _d_rehash() and __d_rehash() together
fold dentry_rcuwalk_invalidate() into its only remaining caller

Linus Torvalds
2016-08-07 22:01:14 +0800

06 Aug, 2016

6 commits

0cbbc422d Merge tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/ker… ... Browse Code »

…nel/git/dgc/linux-xfs

Pull more xfs updates from Dave Chinner:
"This is the second part of the XFS updates for this merge cycle, and
contains the new reverse block mapping feature for XFS.

Reverse mapping allows us to track the owner of a specific block on
disk precisely. It is implemented as a set of btrees (one per
allocation group) that track the owners of allocated extents.
Effectively it is a "used space tree" that is updated when we allocate
or free extents. i.e. it is coherent with the free space btrees we
already maintain and never overlaps with them.

This reverse mapping infrastructure is the building block of several
upcoming features - reflink, copy-on-write data, dedupe, online
metadata and data scrubbing, highly accurate bad sector/data loss
reporting to users, and significantly improved reconstruction of
damaged and corrupted filesystems. There's a lot of new stuff coming
along in the next couple of cycles,a nd it all builds in the rmap
infrastructure.

As such, it's a huge chunk of new code with new on-disk format
features and internal infrastructure. It warns at mount time as an
experimental feature and that it may eat data (as we do with all new
on-disk features until they stabilise). We have not released
userspace suport for it yet - userspace support currently requires
download from Darrick's xfsprogs repo and build from source, so the
access to this feature is really developer/tester only at this point.
Initial userspace support will be released at the same time kernel
with this code in it is released.

The new rmap enabled code regresses 3 xfstests - all are ENOSPC
related corner cases, one of which Darrick posted a fix for a few
hours ago. The other two are fixed by infrastructure that is part of
the upcoming reflink patchset. This new ENOSPC infrastructure
requires a on-disk format tweak required to keep mount times in
check - we need to keep an on-disk count of allocated rmapbt blocks so
we don't have to scan the entire btrees at mount time to count them.

This is currently being tested and will be part of the fixes sent in
the next week or two so users will not be exposed to this change"

* tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (52 commits)
xfs: move (and rename) the deferred bmap-free tracepoints
xfs: collapse single use static functions
xfs: remove unnecessary parentheses from log redo item recovery functions
xfs: remove the extents array from the rmap update done log item
xfs: in btree_lshift, only allocate temporary cursor when needed
xfs: remove unnecesary lshift/rshift key initialization
xfs: remove the get*keys and update_keys btree ops pointers
xfs: enable the rmap btree functionality
xfs: don't update rmapbt when fixing agfl
xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
xfs: add rmap btree block detection to log recovery
xfs: add rmap btree geometry feature flag
xfs: propagate bmap updates to rmapbt
xfs: enable the xfs_defer mechanism to process rmaps to update
xfs: log rmap intent items
xfs: create rmap update intent log items
xfs: add rmap btree insert and delete helpers
xfs: convert unwritten status of reverse mappings
xfs: remove an extent from the rmap btree
xfs: add an extent to the rmap btree
...

Linus Torvalds
2016-08-06 21:50:36 +0800
835c92d43 Merge branch 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull qstr constification updates from Al Viro:
"Fairly self-contained bunch - surprising lot of places passes struct
qstr * as an argument when const struct qstr * would suffice; it
complicates analysis for no good reason.

I'd prefer to feed that separately from the assorted fixes (those are
in #for-linus and with somewhat trickier topology)"

* 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qstr: constify instances in adfs
qstr: constify instances in lustre
qstr: constify instances in f2fs
qstr: constify instances in ext2
qstr: constify instances in vfat
qstr: constify instances in procfs
qstr: constify instances in fuse
qstr constify instances in fs/dcache.c
qstr: constify instances in nfs
qstr: constify instances in ocfs2
qstr: constify instances in autofs4
qstr: constify instances in hfs
qstr: constify instances in hfsplus
qstr: constify instances in logfs
qstr: constify dentry_init_security

Linus Torvalds
2016-08-06 21:49:02 +0800
372ee1638 rxrpc: Fix races between skb free, ACK generation and replying ... Browse Code »

Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.

Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released. The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb. ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.

To this end, the following changes are made:

(1) kernel_rxrpc_data_consumed() is added. This should be called whenever
an skb is emptied so as to crank the ACK and call states. This does
not release the skb, however. kernel_rxrpc_free_skb() must now be
called to achieve that. These together replace
rxrpc_kernel_data_delivered().

(2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().

This makes afs_deliver_to_call() easier to work as the skb can simply
be discarded unconditionally here without trying to work out what the
return value of the ->deliver() function means.

The ->deliver() functions can, via afs_data_complete(),
afs_transfer_reply() and afs_extract_data() mark that an skb has been
consumed (thereby cranking the state) without the need to
conditionally free the skb to make sure the state is correct on an
incoming call for when the call processor tries to send the reply.

(3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
has finished with a packet and MSG_PEEK isn't set.

(4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().

Because of this, we no longer need to clear the destructor and put the
call before we free the skb in cases where we don't want the ACK/call
state to be cranked.

(5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
the delivery function already), and the caller is now responsible for
producing an abort if that was the last packet.

(6) There are many bits of unmarshalling code where:

ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}

is to be found. As -EAGAIN can now be passed back to the caller, we
now just return if ret < 0:

ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;

(7) Checks for trailing data and empty final data packets has been
consolidated as afs_data_complete(). So:

if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;

becomes:

ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;

(8) afs_transfer_reply() now checks the amount of data it has against the
amount of data desired and the amount of data in the skb and returns
an error to induce an abort if we don't get exactly what we want.

Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:

general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[] [] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
0000000000000006 000000000be04930 0000000000000000 ffff880400000000
ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
[] ? mark_held_locks+0x5e/0x74
[] ? __local_bh_enable_ip+0x9b/0xa1
[] ? trace_hardirqs_on_caller+0x16d/0x189
[] lock_acquire+0x122/0x1b6
[] ? lock_acquire+0x122/0x1b6
[] ? skb_dequeue+0x18/0x61
[] _raw_spin_lock_irqsave+0x35/0x49
[] ? skb_dequeue+0x18/0x61
[] skb_dequeue+0x18/0x61
[] afs_deliver_to_call+0x344/0x39d [kafs]
[] afs_process_async_call+0x4c/0xd5 [kafs]
[] afs_async_workfn+0xe/0x10 [kafs]
[] process_one_work+0x29d/0x57c
[] worker_thread+0x24a/0x385
[] ? rescuer_thread+0x2d0/0x2d0
[] kthread+0xf3/0xfb
[] ret_from_fork+0x1f/0x40
[] ? kthread_create_on_node+0x1cf/0x1cf

Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2016-08-06 12:08:40 +0800
a02040d8d Merge tag 'pstore-v4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull pstore fixes from Kees Cook:
"Fixes for pstore ramoops driver to catch bad kfree() and to use better
DT bindings"

* tag 'pstore-v4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ramoops: use persistent_ram_free() instead of kfree() for freeing prz
ramoops: use DT reserved-memory bindings

Linus Torvalds
2016-08-06 11:52:52 +0800
fff648da9 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"Here's the second round of block updates for this merge window.

It's a mix of fixes for changes that went in previously in this round,
and fixes in general. This pull request contains:

- Fixes for loop from Christoph

- A bdi vs gendisk lifetime fix from Dan, worth two cookies.

- A blk-mq timeout fix, when on frozen queues. From Gabriel.

- Writeback fix from Jan, ensuring that __writeback_single_inode()
does the right thing.

- Fix for bio->bi_rw usage in f2fs from me.

- Error path deadlock fix in blk-mq sysfs registration from me.

- Floppy O_ACCMODE fix from Jiri.

- Fix to the new bio op methods from Mike.

One more followup will be coming here, ensuring that we don't
propagate the block types outside of block. That, and a rename of
bio->bi_rw is coming right after -rc1 is cut.

- Various little fixes"

* 'for-linus' of git://git.kernel.dk/linux-block:
mm/block: convert rw_page users to bio op use
loop: make do_req_filebacked more robust
loop: don't try to use AIO for discards
blk-mq: fix deadlock in blk_mq_register_disk() error path
Include: blkdev: Removed duplicate 'struct request;' declaration.
Fixup direct bi_rw modifiers
block: fix bdi vs gendisk lifetime mismatch
blk-mq: Allow timeouts to run while queue is freezing
nbd: fix race in ioctl
block: fix use-after-free in seq file
f2fs: drop bio->bi_rw manual assignment
block: add missing group association in bio-cloning functions
blkcg: kill unused field nr_undestroyed_grps
writeback: Write dirty times for WB_SYNC_ALL writeback
floppy: fix open(O_ACCMODE) for ioctl-only open

Linus Torvalds
2016-08-06 11:31:51 +0800
8d480326c NFSv4: Cap the transport reconnection timer at 1/2 lease period ... Browse Code »

We don't want to miss a lease period renewal due to the TCP connection
failing to reconnect in a timely fashion. To ensure this doesn't happen,
cap the reconnection timer so that we retry the connection attempt
at least every 1/2 lease period.

Signed-off-by: Trond Myklebust

Trond Myklebust
2016-08-06 07:22:22 +0800