Eric Lee / smarc-fsl-linux-kernel

26 Apr, 2018

40 commits

ad10785a7 netfilter: x_tables: fix pointer leaks to userspace ... Browse Code »

[ Upstream commit 1e98ffea5a8935ec040ab72299e349cb44b8defd ]

Several netfilter matches and targets put kernel pointers into
info objects, but don't set usersize in descriptors.
This leads to kernel pointer leaks if a match/target is set
and then read back to userspace.

Properly set usersize for these matches/targets.

Found with manual code inspection.

Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize")
Signed-off-by: Dmitry Vyukov
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Dmitry Vyukov
2018-04-26 17:02:13 +0800
2b7cc9368 x86/hyperv: Check for required priviliges in hyperv_init() ... Browse Code »

[ Upstream commit 89a8f6d4904c8cf3ff8fee9fdaff392a6bbb8bf6 ]

In hyperv_init() its presumed that it always has access to VP index and
hypercall MSRs while according to the specification it should be checked if
it's allowed to access the corresponding MSRs before accessing them.

Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Thomas Gleixner
Reviewed-by: Thomas Gleixner
Cc: Stephen Hemminger
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář
Cc: Haiyang Zhang
Cc: "Michael Kelley (EOSG)"
Cc: Roman Kagan
Cc: Andy Lutomirski
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini
Cc: "K. Y. Srinivasan"
Cc: Cathy Avery
Cc: Mohammed Gamal
Link: https://lkml.kernel.org/r/20180124132337.30138-2-vkuznets@redhat.com
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Vitaly Kuznetsov
2018-04-26 17:02:13 +0800
cdf635a66 gianfar: prevent integer wrapping in the rx handler ... Browse Code »

[ Upstream commit 202a0a70e445caee1d0ec7aae814e64b1189fa4d ]

When the frame check sequence (FCS) is split across the last two frames
of a fragmented packet, part of the FCS gets counted twice, once when
subtracting the FCS, and again when subtracting the previously received
data.

For example, if 1602 bytes are received, and the first fragment contains
the first 1600 bytes (including the first two bytes of the FCS), and the
second fragment contains the last two bytes of the FCS:

'skb->len == 1600' from the first fragment

size = lstatus & BD_LENGTH_MASK; # 1602
size -= ETH_FCS_LEN; # 1598
size -= skb->len; # -2

Since the size is unsigned, it wraps around and causes a BUG later in
the packet handling, as shown below:

kernel BUG at ./include/linux/skbuff.h:2068!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c021ec60] skb_pull+0x24/0x44
LR [c01e2fbc] gfar_clean_rx_ring+0x498/0x690
Call Trace:
[df7edeb0] [c01e2c1c] gfar_clean_rx_ring+0xf8/0x690 (unreliable)
[df7edf20] [c01e33a8] gfar_poll_rx_sq+0x3c/0x9c
[df7edf40] [c023352c] net_rx_action+0x21c/0x274
[df7edf90] [c0329000] __do_softirq+0xd8/0x240
[df7edff0] [c000c108] call_do_irq+0x24/0x3c
[c0597e90] [c00041dc] do_IRQ+0x64/0xc4
[c0597eb0] [c000d920] ret_from_except+0x0/0x18
--- interrupt: 501 at arch_cpu_idle+0x24/0x5c

Change the size to a signed integer and then trim off any part of the
FCS that was received prior to the last fragment.

Fixes: 6c389fc931bc ("gianfar: fix size of scatter-gathered frames")
Signed-off-by: Andy Spencer
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Andy Spencer
2018-04-26 17:02:13 +0800
67fa8bfff ntb_transport: Fix bug with max_mw_size parameter ... Browse Code »

[ Upstream commit cbd27448faff4843ac4b66cc71445a10623ff48d ]

When using the max_mw_size parameter of ntb_transport to limit the size of
the Memory windows, communication cannot be established and the queues
freeze.

This is because the mw_size that's reported to the peer is correctly
limited but the size used locally is not. So the MW is initialized
with a buffer smaller than the window but the TX side is using the
full window. This means the TX side will be writing to a region of the
window that points nowhere.

This is easily fixed by applying the same limit to tx_size in
ntb_transport_init_queue().

Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Logan Gunthorpe
Acked-by: Allen Hubbe
Cc: Dave Jiang
Signed-off-by: Jon Mason
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Logan Gunthorpe
2018-04-26 17:02:13 +0800
d810c5481 RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure ... Browse Code »

[ Upstream commit b081808a66345ba725b77ecd8d759bee874cd937 ]

Failure in XRCD FW deallocation command leaves memory leaked and
returns error to the user which he can't do anything about it.

This patch changes behavior to always free memory and always return
success to the user.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Reviewed-by: Yuval Shaia
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2018-04-26 17:02:13 +0800
0bddd43ac powerpc/numa: Ensure nodes initialized for hotplug ... Browse Code »

[ Upstream commit ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6 ]

This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot. The
problems of interest include:

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
them are allowed to be 'possible' and 'online'. Memory allocations
for those nodes are taken from another node that does have memory
until and if memory is hot-added to the node.

* Nodes which have no resources assigned at boot, but which may still
be referenced subsequently by affinity or associativity attributes,
are kept in the list of 'possible' nodes for powerpc. Hot-add of
memory or CPUs to the system can reference these nodes and bring
them online instead of redirecting the references to one of the set
of nodes known to have memory at boot.

Note that this software operates under the context of CPU hotplug. We
are not doing memory hotplug in this code, but rather updating the
kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology). We are initializing a node that may be used
by CPUs or memory before it can be referenced as invalid by a CPU
hotplug operation. CPU hotplug operations are protected by a range of
APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected by
mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap. Using HMC hot-add/hot-remove operations, we have been able to
add and remove CPUs to any possible node without failures. HMC
operations involve a degree self-serialization, though.

Signed-off-by: Michael Bringmann
Reviewed-by: Nathan Fontenot
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Michael Bringmann
2018-04-26 17:02:13 +0800
0caebc381 powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes ... Browse Code »

[ Upstream commit a346137e9142b039fd13af2e59696e3d40c487ef ]

On powerpc systems which allow 'hot-add' of CPU or memory resources,
it may occur that the new resources are to be inserted into nodes that
were not used for these resources at bootup. In the kernel, any node
that is used must be defined and initialized. These empty nodes may
occur when,

* Dedicated vs. shared resources. Shared resources require information
such as the VPHN hcall for CPU assignment to nodes. Associativity
decisions made based on dedicated resource rules, such as
associativity properties in the device tree, may vary from decisions
made using the values returned by the VPHN hcall.

* memoryless nodes at boot. Nodes need to be defined as 'possible' at
boot for operation with other code modules. Previously, the powerpc
code would limit the set of possible nodes to those which have
memory assigned at boot, and were thus online. Subsequent add/remove
of CPUs or memory would only work with this subset of possible
nodes.

* memoryless nodes with CPUs at boot. Due to the previous restriction
on nodes, nodes that had CPUs but no memory were being collapsed
into other nodes that did have memory at boot. In practice this
meant that the node assignment presented by the runtime kernel
differed from the affinity and associativity attributes presented by
the device tree or VPHN hcalls. Nodes that might be known to the
pHyp were not 'possible' in the runtime kernel because they did not
have memory at boot.

This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot. This patch
set fixes a couple of problems.

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
them are allowed to be 'possible' and 'online'. Memory allocations
for those nodes are taken from another node that does have memory
until and if memory is hot-added to the node. * Nodes which have no
resources assigned at boot, but which may still be referenced
subsequently by affinity or associativity attributes, are kept in
the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs
to the system can reference these nodes and bring them online
instead of redirecting to one of the set of nodes that were known to
have memory at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system. This new setting will
override the instruction:

nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the "ibm,max-associativity-domains" property is not present at
boot, no operation will be performed to define or enable additional
nodes, or enable the above 'nodes_and()'.

Signed-off-by: Michael Bringmann
Reviewed-by: Nathan Fontenot
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Michael Bringmann
2018-04-26 17:02:12 +0800
b086dd2d7 samples/bpf: Partially fixes the bpf.o build ... Browse Code »

[ Upstream commit c25ef6a5e62fa212d298ce24995ce239f29b5f96 ]

Do not build lib/bpf/bpf.o with this Makefile but use the one from the
library directory. This avoid making a buggy bpf.o file (e.g. missing
symbols).

This patch is useful if some code (e.g. Landlock tests) needs both the
bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Mickaël Salaün
2018-04-26 17:02:12 +0800
64e5e46cd i40e: fix reported mask for ntuple filters ... Browse Code »

[ Upstream commit 40339af33c703bacb336493157d43c86a8bf2fed ]

In commit 36777d9fa24c ("i40e: check current configured input set when
adding ntuple filters") some code was added to report the input set
mask for a given filter when reporting it to the user.

This code is necessary so that the reported filter correctly displays
that it is or is not masking certain fields.

Unfortunately the code was incorrect. Development error accidentally
swapped the mask values for the IPv4 addresses with the L4 port numbers.
The port numbers are only 16bits wide while IPv4 addresses are 32 bits.
Unfortunately we assigned only 16 bits to the IPv4 address masks.
Additionally we assigned 32bit value 0xFFFFFFF to the TCP port numbers.
This second part does not matter as the value would be truncated to
16bits regardless, but it is unnecessary.

Fix the reported masks to properly report that the entire field is
masked.

Fixes: 36777d9fa24c ("i40e: check current configured input set when adding ntuple filters")
Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Jacob Keller
2018-04-26 17:02:12 +0800
1ec85fe4e i40e: program fragmented IPv4 filter input set ... Browse Code »

[ Upstream commit 02b4016bfe43d2d5ed043be7ffa56cda6a4d1100 ]

When implementing support for IP_USER_FLOW filters, we correctly
programmed a filter for both the non fragmented IPv4/Other filter, as
well as the fragmented IPv4 filters. However, we did not properly
program the input set for fragmented IPv4 PCTYPE. This meant that the
filters would almost certainly not match, unless the user specified all
of the flow types.

Add support to program the fragmented IPv4 filter input set. Since we
always program these filters together, we'll assume that the two input
sets must match, and will thus always program the input sets to the same
value.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Jacob Keller
2018-04-26 17:02:12 +0800
7addb3e4a ixgbe: don't set RXDCTL.RLPML for 82599 ... Browse Code »

[ Upstream commit 2bafa8fac19a31ca72ae1a3e48df35f73661dbed ]

commit 2de6aa3a666e ("ixgbe: Add support for padding packet")

Uses RXDCTL.RLPML to limit the maximum frame size on Rx when using
build_skb. Unfortunately that register does not work on 82599.

Added an explicit check to avoid setting this register on 82599 MAC.

Extended the comment related to the setting of RXDCTL.RLPML to better
explain its purpose.

Signed-off-by: Emil Tantilov
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Emil Tantilov
2018-04-26 17:02:12 +0800
27eb641f2 jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path ... Browse Code »

[ Upstream commit 5bdd0c6f89fba430e18d636493398389dadc3b17 ]

If jffs2_iget() fails for a newly-allocated inode, jffs2_do_clear_inode()
can get called twice in the error handling path, the first call in
jffs2_iget() itself and the second through iget_failed(). This can result
to a use-after-free error in the second jffs2_do_clear_inode() call, such
as shown by the oops below wherein the second jffs2_do_clear_inode() call
was trying to free node fragments that were already freed in the first
jffs2_do_clear_inode() call.

[ 78.178860] jffs2: error: (1904) jffs2_do_read_inode_internal: CRC failed for read_inode of inode 24 at physical location 0x1fc00c
[ 78.178914] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b
[ 78.185871] pgd = ffffffc03a567000
[ 78.188794] [6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000
[ 78.194968] Internal error: Oops: 96000004 [#1] PREEMPT SMP
...
[ 78.513147] PC is at rb_first_postorder+0xc/0x28
[ 78.516503] LR is at jffs2_kill_fragtree+0x28/0x90 [jffs2]
[ 78.520672] pc : [] lr : [] pstate: 60000105
[ 78.526757] sp : ffffff800cea38f0
[ 78.528753] x29: ffffff800cea38f0 x28: ffffffc01f3f8e80
[ 78.532754] x27: 0000000000000000 x26: ffffff800cea3c70
[ 78.536756] x25: 00000000dc67c8ae x24: ffffffc033d6945d
[ 78.540759] x23: ffffffc036811740 x22: ffffff800891a5b8
[ 78.544760] x21: 0000000000000000 x20: 0000000000000000
[ 78.548762] x19: ffffffc037d48910 x18: ffffff800891a588
[ 78.552764] x17: 0000000000000800 x16: 0000000000000c00
[ 78.556766] x15: 0000000000000010 x14: 6f2065646f6e695f
[ 78.560767] x13: 6461657220726f66 x12: 2064656c69616620
[ 78.564769] x11: 435243203a6c616e x10: 7265746e695f6564
[ 78.568771] x9 : 6f6e695f64616572 x8 : ffffffc037974038
[ 78.572774] x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
[ 78.576775] x5 : 002f91d85bd44a2f x4 : 0000000000000000
[ 78.580777] x3 : 0000000000000000 x2 : 000000403755e000
[ 78.584779] x1 : 6b6b6b6b6b6b6b6b x0 : 6b6b6b6b6b6b6b6b
...
[ 79.038551] [] rb_first_postorder+0xc/0x28
[ 79.042962] [] jffs2_do_clear_inode+0x88/0x100 [jffs2]
[ 79.048395] [] jffs2_evict_inode+0x3c/0x48 [jffs2]
[ 79.053443] [] evict+0xb0/0x168
[ 79.056835] [] iput+0x1c0/0x200
[ 79.060228] [] iget_failed+0x30/0x3c
[ 79.064097] [] jffs2_iget+0x2d8/0x360 [jffs2]
[ 79.068740] [] jffs2_lookup+0xe8/0x130 [jffs2]
[ 79.073434] [] lookup_slow+0x118/0x190
[ 79.077435] [] walk_component+0xfc/0x28c
[ 79.081610] [] path_lookupat+0x84/0x108
[ 79.085699] [] filename_lookup+0x88/0x100
[ 79.089960] [] user_path_at_empty+0x58/0x6c
[ 79.094396] [] vfs_statx+0xa4/0x114
[ 79.098138] [] SyS_newfstatat+0x58/0x98
[ 79.102227] [] __sys_trace_return+0x0/0x4
[ 79.106489] Code: d65f03c0 f9400001 b40000e1 aa0103e0 (f9400821)

The jffs2_do_clear_inode() call in jffs2_iget() is unnecessary since
iget_failed() will eventually call jffs2_do_clear_inode() if needed, so
just remove it.

Fixes: 5451f79f5f81 ("iget: stop JFFS2 from using iget() and read_inode()")
Reviewed-by: Richard Weinberger
Signed-off-by: Jake Daryll Obina
Signed-off-by: Al Viro
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Jake Daryll Obina
2018-04-26 17:02:12 +0800
19b3638ce RDMA/uverbs: Use an unambiguous errno for method not supported ... Browse Code »

[ Upstream commit 3624a8f02568f08aef299d3b117f2226f621177d ]

Returning EOPNOTSUPP is problematic because it can also be
returned by the method function, and we use it in quite a few
places in drivers these days.

Instead, dedicate EPROTONOSUPPORT to indicate that the ioctl framework
is enabled but the requested object and method are not supported by
the kernel. No other case will return this code, and it lets userspace
know to fall back to write().

grep says we do not use it today in drivers/infiniband subsystem.

Signed-off-by: Jason Gunthorpe
Reviewed-by: Matan Barak
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Jason Gunthorpe
2018-04-26 17:02:12 +0800
827aab45c crypto: artpec6 - remove select on non-existing CRYPTO_SHA384 ... Browse Code »

[ Upstream commit 980b4c95e78e4113cb7b9f430f121dab1c814b6c ]

Since CRYPTO_SHA384 does not exists, Kconfig should not select it.
Anyway, all SHA384 stuff is in CRYPTO_SHA512 which is already selected.

Fixes: a21eb94fc4d3i ("crypto: axis - add ARTPEC-6/7 crypto accelerator driver")
Signed-off-by: Corentin Labbe
Signed-off-by: Herbert Xu
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Corentin LABBE
2018-04-26 17:02:12 +0800
592ea370b device property: Define type of PROPERTY_ENRTY_*() macros ... Browse Code »

[ Upstream commit c505cbd45f6e9c539d57dd171d95ec7e5e9f9cd0 ]

Some of the drivers may use the macro at runtime flow, like

struct property_entry p[10];
...
p[index++] = PROPERTY_ENTRY_U8("u8 property", u8_data);

In that case and absence of the data type compiler fails the build:

drivers/char/ipmi/ipmi_dmi.c:79:29: error: Expected ; at end of statement
drivers/char/ipmi/ipmi_dmi.c:79:29: error: got {

Acked-by: Corey Minyard
Cc: Corey Minyard
Signed-off-by: Andy Shevchenko
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Andy Shevchenko
2018-04-26 17:02:12 +0800
c5fda2b86 tty: serial: exar: Relocate sleep wake-up handling ... Browse Code »

[ Upstream commit c7e1b4059075c9e8eed101d7cc5da43e95eb5e18 ]

Exar sleep wake-up handling has been done on a per-channel basis by
virtue of INT0 being accessible from each channel's address space. I
believe this was initially done out of necessity, but now that Exar
devices have their own driver, we can do things more efficiently by
registering a dedicated INT0 handler at the PCI device level.

I see this change providing the following benefits:

1. If more than one port is active, eliminates the redundant bus
cycles for reading INT0 on every interrupt.
2. This note associated with hooking in the per-channel handler in
8250_port.c is resolved:
/* Fixme: probably not the best place for this */

Cc: Matt Schulte
Signed-off-by: Aaron Sierra
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Aaron Sierra
2018-04-26 17:02:11 +0800
519a71195 x86/hyperv: Stop suppressing X86_FEATURE_PCID ... Browse Code »

[ Upstream commit 617ab45c9a8900e64a78b43696c02598b8cad68b ]

When hypercall-based TLB flush was enabled for Hyper-V guests PCID feature
was deliberately suppressed as a precaution: back then PCID was never
exposed to Hyper-V guests and it wasn't clear what will happen if some day
it becomes available. The day came and PCID/INVPCID features are already
exposed on certain Hyper-V hosts.

>From TLFS (as of 5.0b) it is unclear how TLB flush hypercalls combine with
PCID. In particular the usage of PCID is per-cpu based: the same mm gets
different CR3 values on different CPUs. If the hypercall does exact
matching this will fail. However, this is not the case. David Zhang
explains:

"In practice, the AddressSpace argument is ignored on any VM that supports
PCIDs.

Architecturally, the AddressSpace argument must match the CR3 with PCID
bits stripped out (i.e., the low 12 bits of AddressSpace should be 0 in
long mode). The flush hypercalls flush all PCIDs for the specified
AddressSpace."

With this, PCID can be enabled.

Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Thomas Gleixner
Cc: David Zhang
Cc: Stephen Hemminger
Cc: Haiyang Zhang
Cc: "Michael Kelley (EOSG)"
Cc: Andy Lutomirski
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan"
Cc: Aditya Bhandari
Link: https://lkml.kernel.org/r/20180124103629.29980-1-vkuznets@redhat.com
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Vitaly Kuznetsov
2018-04-26 17:02:11 +0800
9a1dda252 fm10k: fix "failed to kill vid" message for VF ... Browse Code »

[ Upstream commit cf315ea596ec26d7aa542a9ce354990875a920c0 ]

When a VF is under PF VLAN assignment:

ip link set vf vlan

This will remove all previous entries in the VLAN table including those
generated by VLAN interfaces created on the VF. The issue arises when
the VF is under PF VLAN assignment and one or more of these VLAN
interfaces of the VF are deleted. When deleting these VLAN interfaces,
the following message will be generated in "dmesg":

failed to kill vid 0081/ for device

This is due to the fact that "ndo_vlan_rx_kill_vid" exits with an error.
The handler for this ndo is "fm10k_update_vid". Any calls to this
function while under PF VLAN management will exit prematurely and, thus,
it will generate the failure message.

Additionally, since "fm10k_update_vid" exits prematurely, none of the
VLAN update is performed. So, even though the actual VLAN interfaces of
the VF will be deleted, the active_vlans bitmask is not cleared. When
the VF is no longer under PF VLAN assignment, the driver mistakenly
restores the previous entries of the VLAN table based on an
unsynchronized list of active VLANs.

The solution to this issue involves checking the VLAN update action type
before exiting "fm10k_update_vid". If the VLAN update action type is to
"add", this action will not be permitted while the VF is under PF VLAN
assignment and the VLAN update is abandoned like before.

However, if the VLAN update action type is to "kill", then we need to
also clear the active_vlans bitmask. However, we don't need to actually
queue any messages to the PF, because the MAC and VLAN tables have
already been cleared, and the PF would silently ignore these requests
anyways.

Signed-off-by: Ngai-Mint Kwan
Signed-off-by: Jacob Keller
Tested-by: Krishneil Singh
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Ngai-Mint Kwan
2018-04-26 17:02:11 +0800
0e7a0c139 igb: Clear TXSTMP when ptp_tx_work() is timeout ... Browse Code »

[ Upstream commit 3a53285228165225a7f76c7d5ff1ddc0213ce0e4 ]

Problem description:
After ethernet cable connect and disconnect for several iterations on a
device with i210, tx timestamp will stop being put into the socket.

Steps to reproduce:
1. Setup a device with i210 and wire it to a 802.1AS capable switch (
Extreme Networks Summit x440 is used in our case)
2. Have the gptp daemon running on the device and make sure it is synced
with the switch
3. Have the switch disable and enable the port, wait for the device gets
resynced with the switch
4. Iterates step 3 until the device is not albe to get resynced
5. Review the log in dmesg and you will see warning message "igb : clearing
Tx timestamp hang"

Root cause:
If ptp_tx_work() gets scheduled just before the port gets disabled, a LINK
DOWN event will be processed before ptp_tx_work(), which may cause timeout
in ptp_tx_work(). In the timeout logic, the TSYNCTXCTL's TXTT bit (Transmit
timestamp valid bit) is not cleared, causing no new timestamp loaded to
TXSTMP register. Consequently therefore, no new interrupt is triggerred by
TSICR.TXTS bit and no more Tx timestamp send to the socket.

Signed-off-by: Daniel Hua
Tested-by: Aaron Brown
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Daniel Hua
2018-04-26 17:02:11 +0800
187bf2819 igb: Allow to remove administratively set MAC on VFs ... Browse Code »

[ Upstream commit 177132df5e45b134c147f419f567a3b56aafaf2b ]

Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
for use by a virtual machine (either using vfio device assignment or
macvtap passthru mode), it saves the current MAC address and vlan tag
so that it can reset them to their original value when the guest is
done. Libvirt can't leave the VF MAC set to the value used by the
now-defunct guest since it may be started again later using a
different VF, but it certainly shouldn't just pick any random value,
either. So it saves the state of everything prior to using the VF, and
resets it to that.

The igb driver initializes the MAC addresses of all VFs to
00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
netlink message, also visible in the list of VFs in the output of "ip
link show"). But when libvirt attempts to restore the MAC address back
to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
responds with "Invalid argument".

Forbidding a reset back to the original value leaves the VF MAC at the
value set for the now-defunct virtual machine. Especially on a system
with NetworkManager enabled, this has very bad consequences, since
NetworkManager forces all interfacess to be IFF_UP all the time - if
the same virtual machine is restarted using a different VF (or even on
a different host), there will be multiple interfaces watching for
traffic with the same MAC address.

To allow libvirt to revert to the original state, we need a way to
remove the administrative set MAC on a VF, to allow normal host
operation again, and to reset/overwrite the VF MAC via VF netdev.

This patch implements the outlined scenario by allowing to set the
VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
so it's possible to reset the VF MAC back to the original value via
the VF netdev.

Note: Recent patches to libvirt allow for a workaround if the NIC
isn't capable of resetting the administrative MAC back to all 0, but
in theory the NIC should allow resetting the MAC in the first place.

Signed-off-by: Corinna Vinschen
Tested-by: Aaron Brown
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Corinna Vinschen
2018-04-26 17:02:11 +0800
048af64fd ASoC: rockchip: Use dummy_dai for rt5514 dsp dailink ... Browse Code »

[ Upstream commit fde7f9dbc71365230eeb8c8ea97ce9b552c8e5bd ]

The rt5514 dsp captures pcm data through spi directly, so we should not
use rockchip-i2s as it's cpu dai like other codecs.

Use dummy_dai for rt5514 dsp dailink to make voice wakeup work again.

Reported-by: Jimmy Cheng-Yi Chiang
Fixes: (72cfb0f20c75 ASoC: rockchip: Use codec of_node and dai_name for rt5514 dsp)
Signed-off-by: Jeffy Chen
Tested-by: Brian Norris
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Jeffy Chen
2018-04-26 17:02:11 +0800
f25ba4f6b blk-mq-debugfs: don't allow write on attributes with seq_operations set ... Browse Code »

[ Upstream commit 6b136a24b05c81a24e0b648a4bd938bcd0c4f69e ]

Attributes that only implement .seq_ops are read-only, any write to
them should be rejected. But currently kernel would crash when
writing to such debugfs entries, e.g.

chmod +w /sys/kernel/debug/block//requeue_list
echo 0 > /sys/kernel/debug/block//requeue_list
chmod -w /sys/kernel/debug/block//requeue_list

Fix it by returning -EPERM in blk_mq_debugfs_write() when writing to
such attributes.

Cc: Ming Lei
Signed-off-by: Eryu Guan
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Eryu Guan
2018-04-26 17:02:11 +0800
a42ebbdae KVM: s390: vsie: use READ_ONCE to access some SCB fields ... Browse Code »

[ Upstream commit b3ecd4aa8632a86428605ab73393d14779019d82 ]

Another VCPU might try to modify the SCB while we are creating the
shadow SCB. In general this is no problem - unless the compiler decides
to not load values once, but e.g. twice.

For us, this is only relevant when checking/working with such values.
E.g. the prefix value, the mso, state of transactional execution and
addresses of satellite blocks.

E.g. if we blindly forward values (e.g. general purpose registers or
execution controls after masking), we don't care.

Leaving unpin_blocks() untouched for now, will handle it separately.

The worst thing right now that I can see would be a missed prefix
un/remap (mso, prefix, tx) or using wrong guest addresses. Nothing
critical, but let's try to avoid unpredictable behavior.

Signed-off-by: David Hildenbrand
Message-Id:
Reviewed-by: Christian Borntraeger
Acked-by: Cornelia Huck
Signed-off-by: Christian Borntraeger
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

David Hildenbrand
2018-04-26 17:02:11 +0800
48d441324 platform/x86: thinkpad_acpi: suppress warning about palm detection ... Browse Code »

[ Upstream commit 587d8628fb71c3bfae29fb2bbe84c1478c59bac8 ]

This patch prevents the thinkpad_acpi driver from warning about 2 event
codes returned for keyboard palm-detection. No behavioral changes,
other than suppressing the warning in the kernel log. The events are
still forwarded via acpi-netlink channels.

We could, optionally, decide to forward the event through a
input-switch on the tpacpi input device. However, so far no suitable
input-code exists, and no similar drivers report such events. Hence,
leave it an acpi event for now.

Note that the event-codes are named based on empirical studies. On the
ThinkPad X1 5th Gen the sensor can be found underneath the arrow key.

Cc: Matthew Thode
Signed-off-by: David Herrmann
Acked-by: Henrique de Moraes Holschuh
Signed-off-by: Andy Shevchenko
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

David Herrmann
2018-04-26 17:02:11 +0800
b9d78055c i40evf: ignore link up if not running ... Browse Code »

[ Upstream commit e0346f9fcb6c636d2f870e6666de8781413f34ea ]

If we receive the link status message from PF with link up before queues
are actually enabled, it will trigger a TX hang. This fixes the issue
by ignoring a link up message if the VF state is not yet in RUNNING
state.

Signed-off-by: Alan Brady
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Alan Brady
2018-04-26 17:02:10 +0800
09f6d65db i40evf: Don't schedule reset_task when device is being removed ... Browse Code »

[ Upstream commit 06aa040f039404a0039a5158cd12f41187487a1f ]

When a host disables and enables a PF device, all the associated
VFs are removed and added back in. It also generates a PFR which in turn
resets all the connected VFs. This behaviour is different from that of
Linux guest on Linux host. Hence we end up in a situation where there's
a PFR and device removal at the same time. And watchdog doesn't have a
clue about this and schedules a reset_task. This patch adds code to send
signal to reset_task that the device is currently being removed.

Signed-off-by: Avinash Dayanand
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Avinash Dayanand
2018-04-26 17:02:10 +0800
7c7ae4ed2 bpf: test_maps: cleanup sockmaps when test ends ... Browse Code »

[ Upstream commit 783687810e986a15ffbf86c516a1a48ff37f38f7 ]

Bug: BPF programs and maps related to sockmaps test exist
in memory even after test_maps ends.

This patch fixes it as a short term workaround (sockmap
kernel side needs real fixing) by empyting sockmaps when
test ends.

Fixes: 6f6d33f3b3d0f ("bpf: selftests add sockmap tests")
Signed-off-by: Prashant Bhole
[ daniel: Note on workaround. ]
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Prashant Bhole
2018-04-26 17:02:10 +0800
c6c6e38ae block: Set BIO_TRACE_COMPLETION on new bio during split ... Browse Code »

[ Upstream commit 20d59023c5ec4426284af492808bcea1f39787ef ]

We inadvertently set it again on the source bio, but we need
to set it on the new split bio instead.

Fixes: fbbaf700e7b1 ("block: trace completion of all bios.")
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Goldwyn Rodrigues
2018-04-26 17:02:10 +0800
f2e73df30 nfp: fix error return code in nfp_pci_probe() ... Browse Code »

[ Upstream commit e58decc9c51eb61697aba35ba8eda33f4b80552d ]

Fix to return error code -EINVAL instead of 0 when num_vfs above
limit_vfs, as done elsewhere in this function.

Fixes: 0dc786219186 ("nfp: handle SR-IOV already enabled when driver is probing")
Signed-off-by: Wei Yongjun
Acked-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Wei Yongjun
2018-04-26 17:02:10 +0800
859195841 HID: roccat: prevent an out of bounds read in kovaplus_profile_activated() ... Browse Code »

[ Upstream commit 7ad81482cad67cbe1ec808490d1ddfc420c42008 ]

We get the "new_profile_index" value from the mouse device when we're
handling raw events. Smatch taints it as untrusted data and complains
that we need a bounds check. This seems like a reasonable warning
otherwise there is a small read beyond the end of the array.

Fixes: 0e70f97f257e ("HID: roccat: Add support for Kova[+] mouse")
Signed-off-by: Dan Carpenter
Acked-by: Silvan Jegen
Signed-off-by: Jiri Kosina
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Dan Carpenter
2018-04-26 17:02:10 +0800
6a5505da4 Input: stmfts - set IRQ_NOAUTOEN to the irq flag ... Browse Code »

[ Upstream commit cba04cdf437d745fac85220d1d692a9ae23d7004 ]

The interrupt is requested before the device is powered on and
it's value in some cases cannot be reliable. It happens on some
devices that an interrupt is generated as soon as requested
before having the chance to disable the irq.

Set the irq flag as IRQ_NOAUTOEN before requesting it.

This patch mutes the error:

stmfts 2-0049: failed to read events: -11

received sometimes during boot time.

Signed-off-by: Andi Shyti
Signed-off-by: Dmitry Torokhov
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Andi Shyti
2018-04-26 17:02:10 +0800
8afed2798 scsi: fas216: fix sense buffer initialization ... Browse Code »

[ Upstream commit 96d5eaa9bb74d299508d811d865c2c41b38b0301 ]

While testing with the ARM specific memset() macro removed, I ran into a
compiler warning that shows an old bug:

drivers/scsi/arm/fas216.c: In function 'fas216_rq_sns_done':
drivers/scsi/arm/fas216.c:2014:40: error: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to provide an explicit length? [-Werror=sizeof-pointer-memaccess]

It turns out that the definition of the scsi_cmd structure changed back
in linux-2.6.25, so now we clear only four bytes (sizeof(pointer))
instead of 96 (SCSI_SENSE_BUFFERSIZE). I did not check whether we
actually need to initialize the buffer here, but it's clear that if we
do it, we should use the correct size.

Fixes: de25deb18016 ("[SCSI] use dynamically allocated sense buffer")
Signed-off-by: Arnd Bergmann
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Arnd Bergmann
2018-04-26 17:02:10 +0800
800fda575 scsi: devinfo: fix format of the device list ... Browse Code »

[ Upstream commit 3f884a0a8bdf28cfd1e9987d54d83350096cdd46 ]

Replace "" with NULL for product revision level, and merge TEXEL
duplicate entries.

Cc: Hannes Reinecke
Cc: Martin K. Petersen
Cc: James E.J. Bottomley
Cc: SCSI ML
Signed-off-by: Xose Vazquez Perez
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Xose Vazquez Perez
2018-04-26 17:02:10 +0800
a09881cfb f2fs: avoid hungtask when GC encrypted block if io_bits is set ... Browse Code »

[ Upstream commit a9d572c7550044d5b217b5287d99a2e6d34b97b0 ]

When io_bits is set, GCing encrypted block may hit the following hungtask.
Since io_bits requires aligned block address, f2fs_submit_page_write may
return -EAGAIN if new_blkaddr does not satisify io_bits alignment. As a
result, the encrypted page will never be writtenback.

This patch makes move_data_block aware the EAGAIN error and cancel the
writeback.

[ 246.751371] INFO: task kworker/u4:4:797 blocked for more than 90 seconds.
[ 246.752423] Not tainted 4.15.0-rc4+ #11
[ 246.754176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.755336] kworker/u4:4 D25448 797 2 0x80000000
[ 246.755597] Workqueue: writeback wb_workfn (flush-7:0)
[ 246.755616] Call Trace:
[ 246.755695] ? __schedule+0x322/0xa90
[ 246.755761] ? blk_init_request_from_bio+0x120/0x120
[ 246.755773] ? pci_mmcfg_check_reserved+0xb0/0xb0
[ 246.755801] ? __radix_tree_create+0x19e/0x200
[ 246.755813] ? delete_node+0x136/0x370
[ 246.755838] schedule+0x43/0xc0
[ 246.755904] io_schedule+0x17/0x40
[ 246.755939] wait_on_page_bit_common+0x17b/0x240
[ 246.755950] ? wake_page_function+0xa0/0xa0
[ 246.755961] ? add_to_page_cache_lru+0x160/0x160
[ 246.755972] ? page_cache_tree_insert+0x170/0x170
[ 246.755983] ? __lru_cache_add+0x96/0xb0
[ 246.756086] __filemap_fdatawait_range+0x14f/0x1c0
[ 246.756097] ? wait_on_page_bit_common+0x240/0x240
[ 246.756120] ? __wake_up_locked_key_bookmark+0x20/0x20
[ 246.756167] ? wait_on_all_pages_writeback+0xc9/0x100
[ 246.756179] ? __remove_ino_entry+0x120/0x120
[ 246.756192] ? wait_woken+0x100/0x100
[ 246.756204] filemap_fdatawait_range+0x9/0x20
[ 246.756216] write_checkpoint+0x18a1/0x1f00
[ 246.756254] ? blk_get_request+0x10/0x10
[ 246.756265] ? cpumask_next_and+0x43/0x60
[ 246.756279] ? f2fs_sync_inode_meta+0x160/0x160
[ 246.756289] ? remove_element.isra.4+0xa0/0xa0
[ 246.756300] ? __put_compound_page+0x40/0x40
[ 246.756310] ? f2fs_sync_fs+0xec/0x1c0
[ 246.756320] ? f2fs_sync_fs+0x120/0x1c0
[ 246.756329] f2fs_sync_fs+0x120/0x1c0
[ 246.756357] ? trace_event_raw_event_f2fs__page+0x260/0x260
[ 246.756393] ? ata_build_rw_tf+0x173/0x410
[ 246.756397] f2fs_balance_fs_bg+0x198/0x390
[ 246.756405] ? drop_inmem_page+0x230/0x230
[ 246.756415] ? ahci_qc_prep+0x1bb/0x2e0
[ 246.756418] ? ahci_qc_issue+0x1df/0x290
[ 246.756422] ? __accumulate_pelt_segments+0x42/0xd0
[ 246.756426] ? f2fs_write_node_pages+0xd1/0x380
[ 246.756429] f2fs_write_node_pages+0xd1/0x380
[ 246.756437] ? sync_node_pages+0x8f0/0x8f0
[ 246.756440] ? update_curr+0x53/0x220
[ 246.756444] ? __accumulate_pelt_segments+0xa2/0xd0
[ 246.756448] ? __update_load_avg_se.isra.39+0x349/0x360
[ 246.756452] ? do_writepages+0x2a/0xa0
[ 246.756456] do_writepages+0x2a/0xa0
[ 246.756460] __writeback_single_inode+0x70/0x490
[ 246.756463] ? check_preempt_wakeup+0x199/0x310
[ 246.756467] writeback_sb_inodes+0x2a2/0x660
[ 246.756471] ? is_empty_dir_inode+0x40/0x40
[ 246.756474] ? __writeback_single_inode+0x490/0x490
[ 246.756477] ? string+0xbf/0xf0
[ 246.756480] ? down_read_trylock+0x35/0x60
[ 246.756484] __writeback_inodes_wb+0x9f/0xf0
[ 246.756488] wb_writeback+0x41d/0x4b0
[ 246.756492] ? writeback_inodes_wb.constprop.55+0x150/0x150
[ 246.756498] ? set_worker_desc+0xf7/0x130
[ 246.756502] ? current_is_workqueue_rescuer+0x60/0x60
[ 246.756511] ? _find_next_bit+0x2c/0xa0
[ 246.756514] ? wb_workfn+0x400/0x5d0
[ 246.756518] wb_workfn+0x400/0x5d0
[ 246.756521] ? finish_task_switch+0xdf/0x2a0
[ 246.756525] ? inode_wait_for_writeback+0x30/0x30
[ 246.756529] process_one_work+0x3a7/0x6f0
[ 246.756533] worker_thread+0x82/0x750
[ 246.756537] kthread+0x16f/0x1c0
[ 246.756541] ? trace_event_raw_event_workqueue_work+0x110/0x110
[ 246.756544] ? kthread_create_worker_on_cpu+0xb0/0xb0
[ 246.756548] ret_from_fork+0x1f/0x30

Signed-off-by: Sheng Yong
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Sheng Yong
2018-04-26 17:02:09 +0800
889177d17 RDMA/cma: Check existence of netdevice during port validation ... Browse Code »

[ Upstream commit 00db63c128dd3daf38f481371976c24d32678142 ]

If valid netdevice is not found for RoCE, GID table should not be
searched with NULL netdevice.

Doing so causes the search routines to ignore the netdev argument and may
match the wrong GID table entry if the netdev is deleted.

Fixes: abae1b71dd37 ("IB/cma: cma_validate_port should verify the port and netdevice")
Signed-off-by: Parav Pandit
Reviewed-by: Mark Bloch
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Parav Pandit
2018-04-26 17:02:09 +0800
48b8839d9 Btrfs: raid56: fix race between merge_bio and rbio_orig_end_io ... Browse Code »

[ Upstream commit 7583d8d088ff2c323b1d4f15b191ca2c23d32558 ]

Before rbio_orig_end_io() goes to free rbio, rbio may get merged with
more bios from other rbios and rbio->bio_list becomes non-empty,
in that case, these newly merged bios don't end properly.

Once unlock_stripe() is done, rbio->bio_list will not be updated any
more and we can call bio_endio() on all queued bios.

It should only happen in error-out cases, the normal path of recover
and full stripe write have already set RBIO_RMW_LOCKED_BIT to disable
merge before doing IO, so rbio_orig_end_io() called by them doesn't
have the above issue.

Reported-by: Jérôme Carretero
Signed-off-by: Liu Bo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Liu Bo
2018-04-26 17:02:09 +0800
ebe064401 Btrfs: fix unexpected EEXIST from btrfs_get_extent ... Browse Code »

[ Upstream commit 18e83ac75bfe67009c4ddcdd581bba8eb16f4030 ]

This fixes a corner case that is caused by a race of dio write vs dio
read/write.

Here is how the race could happen.

Suppose that no extent map has been loaded into memory yet.
There is a file extent [0, 32K), two jobs are running concurrently
against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
read from [0, 4K) or [4K, 8K).

t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K).

------------------------------------------------------
t1 t2
btrfs_get_blocks_direct() btrfs_get_blocks_direct()
-> btrfs_get_extent() -> btrfs_get_extent()
-> lookup_extent_mapping()
-> add_extent_mapping() -> lookup_extent_mapping()
# load [0, 32K)
-> btrfs_new_extent_direct()
-> btrfs_drop_extent_cache()
# split [0, 32K) and
# drop [8K, 32K)
-> add_extent_mapping()
# add [8K, 32K)
-> add_extent_mapping()
# handle -EEXIST when adding
# [0, 32K)
------------------------------------------------------
About how t2(dio read/write) runs into -EEXIST:

a) add_extent_mapping() gets -EEXIST for adding em [0, 32k),

b) search_extent_mapping() then returns [0, 8k) as the existing em,
even though start == existing->start, em is [0, 32k) so that
extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k,

c) then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
(with a length 0) and returns -EEXIST as [8k, 32k) is already in tree,

d) so btrfs_get_extent() ends up returning -EEXIST to dio read/write,
which is confusing applications.

Here I conclude all the possible situations,
1) start < existing->start

+-----------+em+-----------+
+--prev---+ | +-------------+ |
| | | | | |
+---------+ + +---+existing++ ++
+
|
+
start

2) start == existing->start

+------------em------------+
| +-------------+ |
| | | |
+ +----existing-+ +
|
|
+
start

3) start > existing->start && start < (existing->start + existing->len)

+------------em------------+
| +-------------+ |
| | | |
+ +----existing-+ +
|
|
+
start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
| +-------------+ | +--next---+
| | | | | |
+ +---+existing++ + +---------+
+
|
+
start

As we can see, it turns out that if start is within existing em (front
inclusive), then the existing em should be returned as is, otherwise,
we try our best to merge candidate em with sibling ems to form a
larger em (in order to reduce the total number of em).

Reported-by: David Vallender
Signed-off-by: Liu Bo
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba

Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Liu Bo
2018-04-26 17:02:09 +0800
c231cec82 btrfs: fail mount when sb flag is not in BTRFS_SUPER_FLAG_SUPP ... Browse Code »

[ Upstream commit 6f794e3c5c8f8fdd3b5bb20d9ded894e685b5bbe ]

It appears from the original commit [1] that there isn't any design
specific reason not to fail the mount instead of just warning. This
patch will change it to fail.

[1]
commit 319e4d0661e5323c9f9945f0f8fb5905e5fe74c3
btrfs: Enhance super validation check

Fixes: 319e4d0661e5323 ("btrfs: Enhance super validation check")
Signed-off-by: Anand Jain
Reviewed-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Anand Jain
2018-04-26 17:02:09 +0800
d91bb7c69 Btrfs: fix scrub to repair raid6 corruption ... Browse Code »

[ Upstream commit 762221f095e3932669093466aaf4b85ed9ad2ac1 ]

The raid6 corruption is that,
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,

- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,

however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content.

We've fixed normal reads to rebuild raid6 correctly with more retries
in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
scrub to do the exactly same rebuild process.

[1]: https://patchwork.kernel.org/patch/10091755/

Signed-off-by: Liu Bo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Liu Bo
2018-04-26 17:02:09 +0800
db6d651ec btrfs: Fix out of bounds access in btrfs_search_slot ... Browse Code »

[ Upstream commit 9ea2c7c9da13c9073e371c046cbbc45481ecb459 ]

When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then
the level variable is going to be 7 (this is the max height of the
tree). On the other hand btrfs_cow_block is always called with
"level + 1" as an index into the nodes and slots arrays. This leads to
an out of bounds access. Admittdely this will be benign since an OOB
access of the nodes array will likely read the 0th element from the
slots array, which in this case is going to be 0 (since we start CoW at
the top of the tree). The OOB access into the slots array in turn will
read the 0th and 1st values of the locks array, which would both be 0
at the time. However, this benign behavior relies on the fact that the
path being passed hasn't been initialised, if it has already been used to
query a btree then it could potentially have populated the nodes/slots arrays.

Fix it by explicitly checking if we are at level 7 (the maximum allowed
index in nodes/slots arrays) and explicitly call the CoW routine with
NULL for parent's node/slot.

Signed-off-by: Nikolay Borisov
Fixes-coverity-id: 711515
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Nikolay Borisov
2018-04-26 17:02:09 +0800