Eric Lee / smarc-fsl-linux-kernel

06 Apr, 2019

40 commits

e6786f868 perf annotate: Fix getting source line failure ... Browse Code »

[ Upstream commit 11db1ad4513d6205d2519e1a30ff4cef746e3243 ]

The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0de33
("perf annotate: No need to calculate notes->start twice") removed notes->start
assignment in symbol__calc_lines(). It will get failed in
find_address_in_section() from symbol__tty_annotate() subroutine as the
a2l->addr is wrong. So the annotate summary doesn't report the line number of
source code correctly.

Before fix:

liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
void hotspot_1(void)
{
volatile int i;

for (i = 0; i < 0x10000000; i++);
for (i = 0; i < 0x10000000; i++);
for (i = 0; i < 0x10000000; i++);
}

int main(void)
{
hotspot_1();

return 0;
}
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1

liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
----------------------------------------------

19.30 common_while_1[32]
19.03 common_while_1[4e]
19.01 common_while_1[16]
5.04 common_while_1[13]
4.99 common_while_1[4b]
4.78 common_while_1[2c]
4.77 common_while_1[10]
4.66 common_while_1[2f]
4.59 common_while_1[51]
4.59 common_while_1[35]
4.52 common_while_1[19]
4.20 common_while_1[56]
0.51 common_while_1[48]
Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
-----------------------------------------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000000005fa :
: hotspot_1():
: void hotspot_1(void)
: {
0.00 : 5fa: push %rbp
0.00 : 5fb: mov %rsp,%rbp
: volatile int i;
:
: for (i = 0; i < 0x10000000; i++);
0.00 : 5fe: movl $0x0,-0x4(%rbp)
0.00 : 605: jmp 610
0.00 : 607: mov -0x4(%rbp),%eax
common_while_1[10] 4.77 : 60a: add $0x1,%eax
common_while_1[13] 5.04 : 60d: mov %eax,-0x4(%rbp)
common_while_1[16] 19.01 : 610: mov -0x4(%rbp),%eax
common_while_1[19] 4.52 : 613: cmp $0xfffffff,%eax
0.00 : 618: jle 607
: for (i = 0; i < 0x10000000; i++);
...

After fix:

liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
----------------------------------------------

33.34 common_while_1.c:5
33.34 common_while_1.c:6
33.32 common_while_1.c:7
Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
-----------------------------------------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000000005fa :
: hotspot_1():
: void hotspot_1(void)
: {
0.00 : 5fa: push %rbp
0.00 : 5fb: mov %rsp,%rbp
: volatile int i;
:
: for (i = 0; i < 0x10000000; i++);
0.00 : 5fe: movl $0x0,-0x4(%rbp)
0.00 : 605: jmp 610
0.00 : 607: mov -0x4(%rbp),%eax
common_while_1.c:5 4.70 : 60a: add $0x1,%eax
4.89 : 60d: mov %eax,-0x4(%rbp)
common_while_1.c:5 19.03 : 610: mov -0x4(%rbp),%eax
common_while_1.c:5 4.72 : 613: cmp $0xfffffff,%eax
0.00 : 618: jle 607
: for (i = 0; i < 0x10000000; i++);
0.00 : 61a: movl $0x0,-0x4(%rbp)
0.00 : 621: jmp 62c
0.00 : 623: mov -0x4(%rbp),%eax
common_while_1.c:6 4.54 : 626: add $0x1,%eax
4.73 : 629: mov %eax,-0x4(%rbp)
common_while_1.c:6 19.54 : 62c: mov -0x4(%rbp),%eax
common_while_1.c:6 4.54 : 62f: cmp $0xfffffff,%eax
...

Signed-off-by: Wei Li
Acked-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Jin Yao
Cc: Namhyung Kim
Cc: Peter Zijlstra
Fixes: 425859ff0de33 ("perf annotate: No need to calculate notes->start twice")
Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Sasha Levin

Wei Li
2019-04-06 04:33:03 +0800
763a895aa clk: fractional-divider: check parent rate only if flag is set ... Browse Code »

[ Upstream commit d13501a2bedfbea0983cc868d3f1dc692627f60d ]

Custom approximation of fractional-divider may not need parent clock
rate checking. For example Rockchip SoCs work fine using grand parent
clock rate even if target rate is greater than parent.

This patch checks parent clock rate only if CLK_SET_RATE_PARENT flag
is set.

For detailed example, clock tree of Rockchip I2S audio hardware.
- Clock rate of CPLL is 1.2GHz, GPLL is 491.52MHz.
- i2s1_div is integer divider can divide N (N is 1~128).
Input clock is CPLL or GPLL. Initial divider value is N = 1.
Ex) PLL = CPLL, N = 10, i2s1_div output rate is
CPLL / 10 = 1.2GHz / 10 = 120MHz
- i2s1_frac is fractional divider can divide input to x/y, x and
y are 16bit integer.

CPLL --> | selector | ---> i2s1_div -+--> | selector | --> I2S1 MCLK
GPLL --> | | ,--------------' | |
`--> i2s1_frac ---> | |

Clock mux system try to choose suitable one from i2s1_div and
i2s1_frac for master clock (MCLK) of I2S1.

Bad scenario as follows:
- Try to set MCLK to 8.192MHz (32kHz audio replay)
Candidate setting is
- i2s1_div: GPLL / 60 = 8.192MHz
i2s1_div candidate is exactly same as target clock rate, so mux
choose this clock source. i2s1_div output rate is changed
491.52MHz -> 8.192MHz

- After that try to set to 11.2896MHz (44.1kHz audio replay)
Candidate settings are
- i2s1_div : CPLL / 107 = 11.214945MHz
- i2s1_frac: i2s1_div = 8.192MHz
This is because clk_fd_round_rate() thinks target rate
(11.2896MHz) is higher than parent rate (i2s1_div = 8.192MHz)
and returns parent clock rate.

Above is current upstreamed behavior. Clock mux system choose
i2s1_div, but this clock rate is not acceptable for I2S driver, so
users cannot replay audio.

Expected behavior is:
- Try to set master clock to 11.2896MHz (44.1kHz audio replay)
Candidate settings are
- i2s1_div : CPLL / 107 = 11.214945MHz
- i2s1_frac: i2s1_div * 147/6400 = 11.2896MHz
Change i2s1_div to GPLL / 1 = 491.52MHz at same
time.

If apply this commit, clk_fd_round_rate() calls custom approximate
function of Rockchip even if target rate is higher than parent.
Custom function changes both grand parent (i2s1_div) and parent
(i2s_frac) settings at same time. Clock mux system can choose
i2s1_frac and audio works fine.

Signed-off-by: Katsuhiro Suzuki
Reviewed-by: Heiko Stuebner
[sboyd@kernel.org: Make function into a macro instead]
Signed-off-by: Stephen Boyd
Signed-off-by: Sasha Levin

Katsuhiro Suzuki
2019-04-06 04:33:03 +0800
d3ec442d6 IB/mlx4: Increase the timeout for CM cache ... Browse Code »

[ Upstream commit 2612d723aadcf8281f9bf8305657129bd9f3cd57 ]

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping
in a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when
an entry has to evicted form the cache. But life is not perfect,
remote peers may die or be rebooted. Hence, it's a timeout to wipe out
a cache entry, when the PF driver assumes the remote peer has gone.

During workloads where a high number of QPs are destroyed concurrently,
excessive amount of CM DREQ retries has been observed

The problem can be demonstrated in a bare-metal environment, where two
nodes have instantiated 8 VFs each. This using dual ported HCAs, so we
have 16 vPorts per physical server.

64 processes are associated with each vPort and creates and destroys
one QP for each of the remote 64 processes. That is, 1024 QPs per
vPort, all in all 16K QPs. The QPs are created/destroyed using the
CM.

When tearing down these 16K QPs, excessive CM DREQ retries (and
duplicates) are observed. With some cat/paste/awk wizardry on the
infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the
nodes:

cm_rx_duplicates:
dreq 2102
cm_rx_msgs:
drep 1989
dreq 6195
rep 3968
req 4224
rtu 4224
cm_tx_msgs:
drep 4093
dreq 27568
rep 4224
req 3968
rtu 3968
cm_tx_retries:
dreq 23469

Note that the active/passive side is equally distributed between the
two nodes.

Enabling pr_debug in cm.c gives tons of:

[171778.814239] mlx4_ib_multiplex_cm_handler: id{slave:
1,sl_cm_id: 0xd393089f} is NULL!

By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
tear-down phase of the application is reduced from approximately 90 to
50 seconds. Retries/duplicates are also significantly reduced:

cm_rx_duplicates:
dreq 2460
[]
cm_tx_retries:
dreq 3010
req 47

Increasing the timeout further didn't help, as these duplicates and
retries stems from a too short CMA timeout, which was 20 (~4 seconds)
on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
numbers fell down to about 10 for both of them.

Adjustment of the CMA timeout is not part of this commit.

Signed-off-by: Håkon Bugge
Acked-by: Jack Morgenstein
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Håkon Bugge
2019-04-06 04:33:03 +0800
61584032c loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() ... Browse Code »

[ Upstream commit 758a58d0bc67457f1215321a536226654a830eeb ]

Commit 0da03cab87e6
("loop: Fix deadlock when calling blkdev_reread_part()") moves
blkdev_reread_part() out of the loop_ctl_mutex. However,
GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
__blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
will not rescan the loop device to delete all partitions.

Below are steps to reproduce the issue:

step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
step2 # losetup -P /dev/loop0 tmp.raw
step3 # parted /dev/loop0 mklabel gpt
step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
step5 # losetup -d /dev/loop0

Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and
there is below kernel warning message:

[ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)

This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().

Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()")
Signed-off-by: Dongli Zhang
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Dongli Zhang
2019-04-06 04:33:03 +0800
07a31820b platform/mellanox: mlxreg-hotplug: Fix KASAN warning ... Browse Code »

[ Upstream commit e4c275f77624961b56cce397814d9d770a45ac59 ]

Fix the following KASAN warning produced when booting a 64-bit kernel:
[ 13.334750] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70
[ 13.342166] Read of size 8 at addr ffff880235067178 by task kworker/2:1/42
[ 13.342176] CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 4.20.0-rc1+ #106
[ 13.342179] Hardware name: Mellanox Technologies Ltd. MSN2740/Mellanox x86 SFF board, BIOS 5.6.5 06/07/2016
[ 13.342190] Workqueue: events deferred_probe_work_func
[ 13.342194] Call Trace:
[ 13.342206] dump_stack+0xc7/0x15b
[ 13.342214] ? show_regs_print_info+0x5/0x5
[ 13.342220] ? kmsg_dump_rewind_nolock+0x59/0x59
[ 13.342234] ? _raw_write_lock_irqsave+0x100/0x100
[ 13.351593] print_address_description+0x73/0x260
[ 13.351603] kasan_report+0x260/0x380
[ 13.351611] ? find_first_bit+0x19/0x70
[ 13.351619] find_first_bit+0x19/0x70
[ 13.351630] mlxreg_hotplug_work_handler+0x73c/0x920 [mlxreg_hotplug]
[ 13.351639] ? __lock_text_start+0x8/0x8
[ 13.351646] ? _raw_write_lock_irqsave+0x80/0x100
[ 13.351656] ? mlxreg_hotplug_remove+0x1e0/0x1e0 [mlxreg_hotplug]
[ 13.351663] ? regmap_volatile+0x40/0xb0
[ 13.351668] ? regcache_write+0x4c/0x90
[ 13.351676] ? mlxplat_mlxcpld_reg_write+0x24/0x30 [mlx_platform]
[ 13.351681] ? _regmap_write+0xea/0x220
[ 13.351688] ? __mutex_lock_slowpath+0x10/0x10
[ 13.351696] ? devm_add_action+0x70/0x70
[ 13.351701] ? mutex_unlock+0x1d/0x40
[ 13.351710] mlxreg_hotplug_probe+0x82e/0x989 [mlxreg_hotplug]
[ 13.351723] ? mlxreg_hotplug_work_handler+0x920/0x920 [mlxreg_hotplug]
[ 13.351731] ? sysfs_do_create_link_sd.isra.2+0xf4/0x190
[ 13.351737] ? sysfs_rename_link_ns+0xf0/0xf0
[ 13.351743] ? devres_close_group+0x2b0/0x2b0
[ 13.351749] ? pinctrl_put+0x20/0x20
[ 13.351755] ? acpi_dev_pm_attach+0x2c/0xd0
[ 13.351763] platform_drv_probe+0x70/0xd0
[ 13.351771] really_probe+0x480/0x6e0
[ 13.351778] ? device_attach+0x10/0x10
[ 13.351784] ? __lock_text_start+0x8/0x8
[ 13.351790] ? _raw_write_lock_irqsave+0x80/0x100
[ 13.351797] ? _raw_write_lock_irqsave+0x80/0x100
[ 13.351806] ? __driver_attach+0x190/0x190
[ 13.351812] driver_probe_device+0x17d/0x1a0
[ 13.351819] ? __driver_attach+0x190/0x190
[ 13.351825] bus_for_each_drv+0xd6/0x130
[ 13.351831] ? bus_rescan_devices+0x20/0x20
[ 13.351837] ? __mutex_lock_slowpath+0x10/0x10
[ 13.351845] __device_attach+0x18c/0x230
[ 13.351852] ? device_bind_driver+0x70/0x70
[ 13.351859] ? __mutex_lock_slowpath+0x10/0x10
[ 13.351866] bus_probe_device+0xea/0x110
[ 13.351874] deferred_probe_work_func+0x1c9/0x290
[ 13.351882] ? driver_deferred_probe_add+0x1d0/0x1d0
[ 13.351889] ? preempt_notifier_dec+0x20/0x20
[ 13.351897] ? read_word_at_a_time+0xe/0x20
[ 13.351904] ? strscpy+0x151/0x290
[ 13.351912] ? set_work_pool_and_clear_pending+0x9c/0xf0
[ 13.351918] ? __switch_to_asm+0x34/0x70
[ 13.351924] ? __switch_to_asm+0x40/0x70
[ 13.351929] ? __switch_to_asm+0x34/0x70
[ 13.351935] ? __switch_to_asm+0x40/0x70
[ 13.351942] process_one_work+0x5cc/0xa00
[ 13.351952] ? pwq_dec_nr_in_flight+0x1e0/0x1e0
[ 13.351960] ? pci_mmcfg_check_reserved+0x80/0xb8
[ 13.351967] ? run_rebalance_domains+0x250/0x250
[ 13.351980] ? stack_access_ok+0x35/0x80
[ 13.351986] ? deref_stack_reg+0xa1/0xe0
[ 13.351994] ? schedule+0xcd/0x250
[ 13.352000] ? worker_enter_idle+0x2d6/0x330
[ 13.352006] ? __schedule+0xeb0/0xeb0
[ 13.352014] ? fork_usermode_blob+0x130/0x130
[ 13.352019] ? mutex_lock+0xa7/0x100
[ 13.352026] ? _raw_spin_lock_irq+0x98/0xf0
[ 13.352032] ? _raw_read_unlock_irqrestore+0x30/0x30
[ 13.352037] i2c i2c-2: Added multiplexed i2c bus 11
[ 13.352043] worker_thread+0x181/0xa80
[ 13.352052] ? __switch_to_asm+0x34/0x70
[ 13.352058] ? __switch_to_asm+0x40/0x70
[ 13.352064] ? process_one_work+0xa00/0xa00
[ 13.352070] ? __switch_to_asm+0x34/0x70
[ 13.352076] ? __switch_to_asm+0x40/0x70
[ 13.352081] ? __switch_to_asm+0x34/0x70
[ 13.352086] ? __switch_to_asm+0x40/0x70
[ 13.352092] ? __switch_to_asm+0x34/0x70
[ 13.352097] ? __switch_to_asm+0x40/0x70
[ 13.352105] ? __schedule+0x3d6/0xeb0
[ 13.352112] ? migrate_swap_stop+0x470/0x470
[ 13.352119] ? save_stack+0x89/0xb0
[ 13.352127] ? kmem_cache_alloc_trace+0xe5/0x570
[ 13.352132] ? kthread+0x59/0x1d0
[ 13.352138] ? ret_from_fork+0x35/0x40
[ 13.352154] ? __schedule+0xeb0/0xeb0
[ 13.352161] ? remove_wait_queue+0x150/0x150
[ 13.352169] ? _raw_write_lock_irqsave+0x80/0x100
[ 13.352175] ? __lock_text_start+0x8/0x8
[ 13.352183] ? process_one_work+0xa00/0xa00
[ 13.352188] kthread+0x1a4/0x1d0
[ 13.352195] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 13.352202] ret_from_fork+0x35/0x40

[ 13.353879] The buggy address belongs to the page:
[ 13.353885] page:ffffea0008d419c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 13.353890] flags: 0x2ffff8000000000()
[ 13.353897] raw: 02ffff8000000000 ffffea0008d419c8 ffffea0008d419c8 0000000000000000
[ 13.353903] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 13.353905] page dumped because: kasan: bad access detected

[ 13.353908] Memory state around the buggy address:
[ 13.353912] ffff880235067000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 13.353917] ffff880235067080: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04
[ 13.353921] >ffff880235067100: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 f2 f2 f2 f2 04
[ 13.353923] ^
[ 13.353927] ffff880235067180: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 00 00 00 00 00
[ 13.353931] ffff880235067200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 13.353933] ==================================================================

The warning is caused by the below loop:
for_each_set_bit(bit, (unsigned long *)&asserted, 8) {
while "asserted" is declared as 'unsigned'.

The casting of 32-bit unsigned integer pointer to a 64-bit unsigned long
pointer. There are two problems here.
It causes the access of four extra byte, which can corrupt memory
The 32-bit pointer address may not be 64-bit aligned.

The fix changes variable "asserted" to "unsigned long".

Fixes: 1f976f6978bf ("platform/x86: Move Mellanox platform hotplug driver to platform/mellanox")
Signed-off-by: Vadim Pasternak
Signed-off-by: Darren Hart (VMware)
Signed-off-by: Sasha Levin

Vadim Pasternak
2019-04-06 04:33:03 +0800
0bacfb4ad platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN ... Browse Code »

[ Upstream commit 4d9b2864a415fec39150bc13efc730c7eb88711e ]

Commit ae7c8cba3221 ("platform/x86: ideapad-laptop: add lenovo RESCUER
R720-15IKBN to no_hw_rfkill_list") added
DMI_MATCH(DMI_BOARD_NAME, "80WW")
for Lenovo RESCUER R720-15IKBN.

But DMI_BOARD_NAME does not match 80WW on Lenovo RESCUER R720-15IKBN,
thus cause Wireless LAN still be hard blocked.

On Lenovo RESCUER R720-15IKBN:
~$ cat /sys/class/dmi/id/sys_vendor
LENOVO
~$ cat /sys/class/dmi/id/board_name
Provence-5R3
~$ cat /sys/class/dmi/id/product_name
80WW
~$ cat /sys/class/dmi/id/product_version
Lenovo R720-15IKBN

So on Lenovo RESCUER R720-15IKBN:
DMI_SYS_VENDOR should match "LENOVO",
DMI_BOARD_NAME should match "Provence-5R3",
DMI_PRODUCT_NAME should match "80WW",
DMI_PRODUCT_VERSION should match "Lenovo R720-15IKBN".

Fix it, and in according with other entries in no_hw_rfkill_list,
use DMI_PRODUCT_VERSION instead of DMI_BOARD_NAME.

Fixes: ae7c8cba3221 ("platform/x86: ideapad-laptop: add lenovo RESCUER R720-15IKBN to no_hw_rfkill_list")
Signed-off-by: Yang Fan
Signed-off-by: Darren Hart (VMware)
Signed-off-by: Sasha Levin

Yang Fan
2019-04-06 04:33:03 +0800
a64ffbaf7 mlxsw: spectrum: Avoid -Wformat-truncation warnings ... Browse Code »

[ Upstream commit ab2c4e2581ad32c28627235ff0ae8c5a5ea6899f ]

Give precision identifiers to the two snprintf() formatting the priority
and TC strings to avoid producing these two warnings:

drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_prio_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:37: warning: '%d'
directive output may be truncated writing between 1 and 3 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:3: note: 'snprintf'
output between 3 and 36 bytes into a destination of size 32
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mlxsw_sp_port_hw_prio_stats[i].str, prio);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_tc_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:37: warning: '%d'
directive output may be truncated writing between 1 and 11 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:3: note: 'snprintf'
output between 3 and 44 bytes into a destination of size 32
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mlxsw_sp_port_hw_tc_stats[i].str, tc);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Florian Fainelli
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin

Florian Fainelli
2019-04-06 04:33:02 +0800
49dd86f0f e1000e: Fix -Wformat-truncation warnings ... Browse Code »

[ Upstream commit 135e7245479addc6b1f5d031e3d7e2ddb3d2b109 ]

Provide precision hints to snprintf() since we know the destination
buffer size of the RX/TX ring names are IFNAMSIZ + 5 - 1. This fixes the
following warnings:

drivers/net/ethernet/intel/e1000e/netdev.c: In function
'e1000_request_msix':
drivers/net/ethernet/intel/e1000e/netdev.c:2109:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
"%s-rx-0", netdev->name);
^
drivers/net/ethernet/intel/e1000e/netdev.c:2107:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
snprintf(adapter->rx_ring->name,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(adapter->rx_ring->name) - 1,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-rx-0", netdev->name);
~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/e1000e/netdev.c:2125:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
"%s-tx-0", netdev->name);
^
drivers/net/ethernet/intel/e1000e/netdev.c:2123:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
snprintf(adapter->tx_ring->name,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(adapter->tx_ring->name) - 1,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-tx-0", netdev->name);
~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin

Florian Fainelli
2019-04-06 04:33:02 +0800
c6fb45d89 net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat ... Browse Code »

[ Upstream commit f6d9758b12660484b6639364cc406da92a918c96 ]

The following false positive lockdep splat has been observed.

======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #302 Not tainted
------------------------------------------------------
systemd-udevd/160 is trying to acquire lock:
edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704

but task is already holding lock:
edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&desc->request_mutex){+.+.}:
mutex_lock_nested+0x1c/0x24
__setup_irq+0xa0/0x704
request_threaded_irq+0xd0/0x150
mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
mdio_probe+0x2c/0x54
really_probe+0x200/0x2c4
driver_probe_device+0x5c/0x174
__driver_attach+0xd8/0xdc
bus_for_each_dev+0x58/0x7c
bus_add_driver+0xe4/0x1f0
driver_register+0x7c/0x110
mdio_driver_register+0x24/0x58
do_one_initcall+0x74/0x2e8
do_init_module+0x60/0x1d0
load_module+0x1968/0x1ff4
sys_finit_module+0x8c/0x98
ret_fast_syscall+0x0/0x28
0xbedf2ae8

-> #0 (&chip->reg_lock){+.+.}:
__mutex_lock+0x50/0x8b8
mutex_lock_nested+0x1c/0x24
__setup_irq+0x640/0x704
request_threaded_irq+0xd0/0x150
mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
mdio_probe+0x2c/0x54
really_probe+0x200/0x2c4
driver_probe_device+0x5c/0x174
__driver_attach+0xd8/0xdc
bus_for_each_dev+0x58/0x7c
bus_add_driver+0xe4/0x1f0
driver_register+0x7c/0x110
mdio_driver_register+0x24/0x58
do_one_initcall+0x74/0x2e8
do_init_module+0x60/0x1d0
load_module+0x1968/0x1ff4
sys_finit_module+0x8c/0x98
ret_fast_syscall+0x0/0x28
0xbedf2ae8

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&desc->request_mutex);
lock(&chip->reg_lock);
lock(&desc->request_mutex);
lock(&chip->reg_lock);

&desc->request_mutex refer to two different mutex. #1 is the GPIO for
the chip interrupt. #2 is the chained interrupt between global 1 and
global 2.

Add lockdep classes to the GPIO interrupt to avoid this.

Reported-by: Russell King
Signed-off-by: Andrew Lunn
Signed-off-by: David S. Miller

Signed-off-by: Sasha Levin

Andrew Lunn
2019-04-06 04:33:02 +0800
194b888af mmc: omap: fix the maximum timeout setting ... Browse Code »

[ Upstream commit a6327b5e57fdc679c842588c3be046c0b39cc127 ]

When running OMAP1 kernel on QEMU, MMC access is annoyingly noisy:

MMC: CTO of 0xff and 0xfe cannot be used!
MMC: CTO of 0xff and 0xfe cannot be used!
MMC: CTO of 0xff and 0xfe cannot be used!
[ad inf.]

Emulator warnings appear to be valid. The TI document SPRU680 [1]
("OMAP5910 Dual-Core Processor MultiMedia Card/Secure Data Memory Card
(MMC/SD) Reference Guide") page 36 states that the maximum timeout is 253
cycles and "0xff and 0xfe cannot be used".

Fix by using 0xfd as the maximum timeout.

Tested using QEMU 2.5 (Siemens SX1 machine, OMAP310), and also checked on
real hardware using Palm TE (OMAP310), Nokia 770 (OMAP1710) and Nokia N810
(OMAP2420) that MMC works as before.

[1] http://www.ti.com/lit/ug/spru680/spru680.pdf

Fixes: 730c9b7e6630f ("[MMC] Add OMAP MMC host driver")
Signed-off-by: Aaro Koskinen
Signed-off-by: Ulf Hansson
Signed-off-by: Sasha Levin

Aaro Koskinen
2019-04-06 04:33:02 +0800
dcedd3795 btrfs: qgroup: Make qgroup async transaction commit more aggressive ... Browse Code »

[ Upstream commit f5fef4593653dfa2a865c485bb81415de51d5c99 ]

[BUG]
Btrfs qgroup will still hit EDQUOT under the following case:

$ dev=/dev/test/test
$ mnt=/mnt/btrfs
$ umount $mnt &> /dev/null
$ umount $dev &> /dev/null

$ mkfs.btrfs -f $dev
$ mount $dev $mnt -o nospace_cache

$ btrfs subv create $mnt/subv
$ btrfs quota enable $mnt
$ btrfs quota rescan -w $mnt
$ btrfs qgroup limit -e 1G $mnt/subv

$ fallocate -l 900M $mnt/subv/padding
$ sync

$ rm $mnt/subv/padding

# Hit EDQUOT
$ xfs_io -f -c "pwrite 0 512M" $mnt/subv/real_file

[CAUSE]
Since commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance
to reduce early EDQUOT"), btrfs is not forced to commit transaction to
reclaim more quota space.

Instead, we just check pertrans metadata reservation against some
threshold and try to do asynchronously transaction commit.

However in above case, the pertrans metadata reservation is pretty small
thus it will never trigger asynchronous transaction commit.

[FIX]
Instead of only accounting pertrans metadata reservation, we calculate
how much free space we have, and if there isn't much free space left,
commit transaction asynchronously to try to free some space.

This may slow down the fs when we have less than 32M free qgroup space,
but should reduce a lot of false EDQUOT, so the cost should be
acceptable.

Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin

Qu Wenruo
2019-04-06 04:33:02 +0800
6cf5f631b powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback ... Browse Code »

[ Upstream commit 5330367fa300742a97e20e953b1f77f48392faae ]

After we ALIGN up the address we need to make sure we didn't overflow
and resulted in zero address. In that case, we need to make sure that
the returned address is greater than mmap_min_addr.

This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
run as non root user for

mmap(-1, MAP_HUGETLB)

The bug is that a non-root user requesting address -1 will be given address 0
which will then fail, whereas they should have been given something else that
would have succeeded.

We also avoid the first mmap(-1, MAP_HUGETLB) returning NULL address as mmap address
with this change. So we think this is not a security issue, because it only affects
whether we choose an address below mmap_min_addr, not whether we
actually allow that address to be mapped. ie. there are existing capability
checks to prevent a user mapping below mmap_min_addr and those will still be
honoured even without this fix.

Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb")
Reviewed-by: Laurent Dufour
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin

Aneesh Kumar K.V
2019-04-06 04:33:02 +0800
fc96b44c0 iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables ... Browse Code »

[ Upstream commit 032ebd8548c9d05e8d2bdc7a7ec2fe29454b0ad0 ]

L1 tables are allocated with __get_dma_pages, and therefore already
ignored by kmemleak.

Without this, the kernel would print this error message on boot,
when the first L1 table is allocated:

[ 2.810533] kmemleak: Trying to color unknown object at 0xffffffd652388000 as Black
[ 2.818190] CPU: 5 PID: 39 Comm: kworker/5:0 Tainted: G S 4.19.16 #8
[ 2.831227] Workqueue: events deferred_probe_work_func
[ 2.836353] Call trace:
...
[ 2.852532] paint_ptr+0xa0/0xa8
[ 2.855750] kmemleak_ignore+0x38/0x6c
[ 2.859490] __arm_v7s_alloc_table+0x168/0x1f4
[ 2.863922] arm_v7s_alloc_pgtable+0x114/0x17c
[ 2.868354] alloc_io_pgtable_ops+0x3c/0x78
...

Fixes: e5fc9753b1a8314 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
Signed-off-by: Nicolas Boichat
Acked-by: Will Deacon
Signed-off-by: Joerg Roedel
Signed-off-by: Sasha Levin

Nicolas Boichat
2019-04-06 04:33:02 +0800
d81bdb3c1 ARM: 8840/1: use a raw_spinlock_t in unwind ... Browse Code »

[ Upstream commit 74ffe79ae538283bbf7c155e62339f1e5c87b55a ]

Mostly unwind is done with irqs enabled however SLUB may call it with
irqs disabled while creating a new SLUB cache.

I had system freeze while loading a module which called
kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
interrupts and then

->new_slab_objects()
->new_slab()
->setup_object()
->setup_object_debug()
->init_tracking()
->set_track()
->save_stack_trace()
->save_stack_trace_tsk()
->walk_stackframe()
->unwind_frame()
->unwind_find_idx()
=>spin_lock_irqsave(&unwind_lock);

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Russell King
Signed-off-by: Sasha Levin

Sebastian Andrzej Siewior
2019-04-06 04:33:01 +0800
951307172 serial: 8250_pxa: honor the port number from devicetree ... Browse Code »

[ Upstream commit fe9ed6d2483fda55465f32924fb15bce0fac3fac ]

Like the other OF-enabled drivers, use the port number from the firmware if
the devicetree specifies an alias:

aliases {
...
serial2 = &uart2; /* Should be ttyS2 */
}

This is how the deprecated pxa.c driver behaved, switching to 8250_pxa
messes up the numbering.

Signed-off-by: Lubomir Rintel
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin

Lubomir Rintel
2019-04-06 04:33:01 +0800
2636ccec9 coresight: etm4x: Add support to enable ETMv4.2 ... Browse Code »

[ Upstream commit 5666dfd1d8a45a167f0d8b4ef47ea7f780b1f24a ]

SDM845 has ETMv4.2 and can use the existing etm4x driver.
But the current etm driver checks only for ETMv4.0 and
errors out for other etm4x versions. This patch adds this
missing support to enable SoC's with ETMv4x to use same
driver by checking only the ETM architecture major version
number.

Without this change, we get below error during etm probe:

/ # dmesg | grep etm
[ 6.660093] coresight-etm4x: probe of 7040000.etm failed with error -22
[ 6.666902] coresight-etm4x: probe of 7140000.etm failed with error -22
[ 6.673708] coresight-etm4x: probe of 7240000.etm failed with error -22
[ 6.680511] coresight-etm4x: probe of 7340000.etm failed with error -22
[ 6.687313] coresight-etm4x: probe of 7440000.etm failed with error -22
[ 6.694113] coresight-etm4x: probe of 7540000.etm failed with error -22
[ 6.700914] coresight-etm4x: probe of 7640000.etm failed with error -22
[ 6.707717] coresight-etm4x: probe of 7740000.etm failed with error -22

With this change, etm probe is successful:

/ # dmesg | grep etm
[ 6.659198] coresight-etm4x 7040000.etm: CPU0: ETM v4.2 initialized
[ 6.665848] coresight-etm4x 7140000.etm: CPU1: ETM v4.2 initialized
[ 6.672493] coresight-etm4x 7240000.etm: CPU2: ETM v4.2 initialized
[ 6.679129] coresight-etm4x 7340000.etm: CPU3: ETM v4.2 initialized
[ 6.685770] coresight-etm4x 7440000.etm: CPU4: ETM v4.2 initialized
[ 6.692403] coresight-etm4x 7540000.etm: CPU5: ETM v4.2 initialized
[ 6.699024] coresight-etm4x 7640000.etm: CPU6: ETM v4.2 initialized
[ 6.705646] coresight-etm4x 7740000.etm: CPU7: ETM v4.2 initialized

Signed-off-by: Sai Prakash Ranjan
Reviewed-by: Suzuki K Poulose
Signed-off-by: Mathieu Poirier
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin

Sai Prakash Ranjan
2019-04-06 04:33:01 +0800
c70214d51 powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc ... Browse Code »

[ Upstream commit e7140639b1de65bba435a6bd772d134901141f86 ]

When building with -Wsometimes-uninitialized, Clang warns:

arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
if (cpu_has_feature(CPU_FTRS_POWER9))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
if (opcode == NULL)
^~~~~~
arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
condition is always true
if (cpu_has_feature(CPU_FTRS_POWER9))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
'opcode' to silence this warning
const struct powerpc_opcode *opcode;
^
= NULL
1 warning generated.

This warning seems to make no sense on the surface because opcode is set
to NULL right below this statement. However, there is a comma instead of
semicolon to end the dialect assignment, meaning that the opcode
assignment only happens in the if statement. Properly terminate that
line so that Clang no longer warns.

Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)")
Signed-off-by: Nathan Chancellor
Reviewed-by: Nick Desaulniers
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin

Nathan Chancellor
2019-04-06 04:33:01 +0800
638ecaf58 kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing ... Browse Code »

[ Upstream commit 9390dff66a52d1a60c6e517d8fa6cdbdffc83cb1 ]

If include/config/auto.conf.cmd is lost for some reasons, it is not
self-healing, so the top Makefile misses to run syncconfig.
Move include/config/auto.conf.cmd to the target side.

I used a pattern rule instead of a normal rule here although it is
a bit gross.

If the rule were written with a normal rule like this,

include/config/auto.conf \
include/config/auto.conf.cmd \
include/config/tristate.conf: $(KCONFIG_CONFIG)
$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig

... syncconfig would be executed per target.

Using a pattern rule makes sure that syncconfig is executed just once
because Make assumes the recipe will create all of the targets.

Here is a quote from the GNU Make manual [1]:

"Pattern rules may have more than one target. Unlike normal rules,
this does not act as many different rules with the same prerequisites
and recipe. If a pattern rule has multiple targets, make knows that
the rule's recipe is responsible for making all of the targets. The
recipe is executed only once to make all the targets. When searching
for a pattern rule to match a target, the target patterns of a rule
other than the one that matches the target in need of a rule are
incidental: make worries only about giving a recipe and prerequisites
to the file presently in question. However, when this file's recipe is
run, the other targets are marked as having been updated themselves."

[1]: https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.html

Signed-off-by: Masahiro Yamada
Signed-off-by: Sasha Levin

Masahiro Yamada
2019-04-06 04:33:01 +0800
5db107484 scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c ... Browse Code »

[ Upstream commit 1749ef00f7312679f76d5e9104c5d1e22a829038 ]

We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):

[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM 2107900 .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5525.857838] sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATMOIC here. All the different
call-contexts use a mutex at some point, and nothing in between that
requires no sleeping, as far as I could see. Additionally, the code that
later allocates the block queue for the device (scsi_mq_alloc_queue())
already uses GFP_KERNEL.

There are similar allocations in two other functions:
scsi_probe_and_add_lun(), and scsi_add_lun(),; that can also be done with
GFP_KERNEL.

Here is the contexts for the three functions so far:

scsi_alloc_sdev()
scsi_probe_and_add_lun()
scsi_sequential_lun_scan()
__scsi_scan_target()
scsi_scan_target()
mutex_lock()
scsi_scan_channel()
scsi_scan_host_selected()
mutex_lock()
scsi_report_lun_scan()
__scsi_scan_target()
...
__scsi_add_device()
mutex_lock()
__scsi_scan_target()
...
scsi_report_lun_scan()
...
scsi_get_host_dev()
mutex_lock()

scsi_probe_and_add_lun()
...

scsi_add_lun()
scsi_probe_and_add_lun()
...

So replace all these, and give them a bit of a better chance to succeed,
with more chances of reclaim.

Signed-off-by: Benjamin Block
Reviewed-by: Bart Van Assche
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin

Benjamin Block
2019-04-06 04:33:01 +0800
4acf79745 powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables ... Browse Code »

[ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ]

We store 2 multilevel tables in iommu_table - one for the hardware and
one with the corresponding userspace addresses. Before allocating
the tables, the iommu_table_group_ops::get_table_size() hook returns
the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
the locked_vm counter correctly. When the table is actually allocated,
the amount of allocated memory is stored in iommu_table::it_allocated_size
and used to decrement the locked_vm counter when we release the memory
used by the table; .get_table_size() and .create_table() calculate it
independently but the result is expected to be the same.

However the allocator does not add the userspace table size to
.it_allocated_size so when we destroy the table because of VFIO PCI
unplug (i.e. VFIO container is gone but the userspace keeps running),
we decrement locked_vm by just a half of size of memory we are
releasing.

To make things worse, since we enabled on-demand allocation of
indirect levels, it_allocated_size contains only the amount of memory
actually allocated at the table creation time which can just be a
fraction. It is not a problem with incrementing locked_vm (as
get_table_size() value is used) but it is with decrementing.

As the result, we leak locked_vm and may not be able to allocate more
IOMMU tables after few iterations of hotplug/unplug.

This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
have a single place which calculates the maximum memory a table can
occupy. The original meaning of it_allocated_size is somewhat lost now
though.

We do not ditch it_allocated_size whatsoever here and we do not call
get_table_size() from vfio_iommu_spapr_tce.c when decrementing
locked_vm as we may have multiple IOMMU groups per container and even
though they all are supposed to have the same get_table_size()
implementation, there is a small chance for failure or confusion.

Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin

Alexey Kardashevskiy
2019-04-06 04:33:01 +0800
6030bcc04 usb: chipidea: Grab the (legacy) USB PHY by phandle first ... Browse Code »

[ Upstream commit 68ef236274793066b9ba3154b16c0acc1c891e5c ]

According to the chipidea driver bindings, the USB PHY is specified via
the "phys" phandle node. However, this only takes effect for USB PHYs
that use the common PHY framework. For legacy USB PHYs, a simple lookup
based on the USB PHY type is done instead.

This does not play out well when more than one USB PHY is registered,
since the first registered PHY matching the type will always be
returned regardless of what the driver was bound to.

Fix this by looking up the PHY based on the "phys" phandle node.
Although generic PHYs are rather matched by their "phys-name" and not
the "phys" phandle directly, there is no helper for similar lookup on
legacy PHYs and it's probably not worth the effort to add it.

When no legacy USB PHY is found by phandle, fallback to grabbing any
registered USB2 PHY. This ensures backward compatibility if some users
were actually relying on this mechanism.

Signed-off-by: Paul Kocialkowski
Signed-off-by: Peter Chen
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin

Paul Kocialkowski
2019-04-06 04:33:01 +0800
b142c7973 crypto: cavium/zip - fix collision with generic cra_driver_name ... Browse Code »

[ Upstream commit 41798036430015ad45137db2d4c213cd77fd0251 ]

The cavium/zip implementation of the deflate compression algorithm is
incorrectly being registered under the generic driver name, which
prevents the generic implementation from being registered with the
crypto API when CONFIG_CRYPTO_DEV_CAVIUM_ZIP=y. Similarly the lzs
algorithm (which does not currently have a generic implementation...)
is incorrectly being registered as lzs-generic.

Fix the naming collision by adding a suffix "-cavium" to the
cra_driver_name of the cavium/zip algorithms.

Fixes: 640035a2dc55 ("crypto: zip - Add ThunderX ZIP driver core")
Cc: Mahipal Challa
Cc: Jan Glauber
Signed-off-by: Eric Biggers
Signed-off-by: Herbert Xu
Signed-off-by: Sasha Levin

Eric Biggers
2019-04-06 04:33:01 +0800
d401d1211 crypto: crypto4xx - add missing of_node_put after of_device_is_available ... Browse Code »

[ Upstream commit 8c2b43d2d85b48a97d2f8279278a4aac5b45f925 ]

Add an of_node_put when a tested device node is not available.

The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):

//
@@
identifier f;
local idexpression e;
expression x;
@@

e = f(...);
... when != of_node_put(e)
when != x = e
when != e = x
when any
if () {
... when != of_node_put(e)
(
return e;
|
+ of_node_put(e);
return ...;
)
}
//

Fixes: 5343e674f32fb ("crypto4xx: integrate ppc4xx-rng into crypto4xx")
Signed-off-by: Julia Lawall
Signed-off-by: Herbert Xu
Signed-off-by: Sasha Levin

Julia Lawall
2019-04-06 04:33:00 +0800
241ebd2ea mt76: fix a leaked reference by adding a missing of_node_put ... Browse Code »

[ Upstream commit 34e022d8b780a03902d82fb3997ba7c7b1f40c81 ]

The call to of_find_node_by_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./drivers/net/wireless/mediatek/mt76/eeprom.c:58:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:61:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:67:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:70:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:72:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.

Signed-off-by: Wen Yang
Cc: Felix Fietkau
Cc: Lorenzo Bianconi
Cc: Kalle Valo
Cc: "David S. Miller"
Cc: Matthias Brugger
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Kalle Valo
Signed-off-by: Sasha Levin

Wen Yang
2019-04-06 04:33:00 +0800
6115055b4 wil6210: check null pointer in _wil_cfg80211_merge_extra_ies ... Browse Code »

[ Upstream commit de77a53c2d1e8fb3621e63e8e1f0f0c9a1a99ff7 ]

ies1 or ies2 might be null when code inside
_wil_cfg80211_merge_extra_ies access them.
Add explicit check for null and make sure ies1/ies2 are not
accessed in such a case.

spos might be null and be accessed inside
_wil_cfg80211_merge_extra_ies.
Add explicit check for null in the while condition statement
and make sure spos is not accessed in such a case.

Signed-off-by: Alexei Avshalom Lazar
Signed-off-by: Maya Erez
Signed-off-by: Kalle Valo
Signed-off-by: Sasha Levin

Alexei Avshalom Lazar
2019-04-06 04:33:00 +0800
9546c3662 PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove() ... Browse Code »

[ Upstream commit 95c80bc6952b6a5badc7b702d23e5bf14d251e7c ]

Dongdong reported a deadlock triggered by a hotplug event during a sysfs
"remove" operation:

pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
# echo 1 > 0000:00:0c.0/remove

PME and hotplug share an MSI/MSI-X vector. The sysfs "remove" side is:

remove_store
pci_stop_and_remove_bus_device_locked
pci_lock_rescan_remove
pci_stop_and_remove_bus_device
...
pcie_pme_remove
pcie_pme_suspend
synchronize_irq # wait for hotplug IRQ handler
pci_unlock_rescan_remove

The hotplug side is:

pciehp_ist
pciehp_handle_presence_or_link_change
pciehp_configure_device
pci_lock_rescan_remove # wait for pci_unlock_rescan_remove()

INFO: task bash:10913 blocked for more than 120 seconds.

# ps -ax |grep D
PID TTY STAT TIME COMMAND
10913 ttyAMA0 Ds+ 0:00 -bash
14022 ? D 0:00 [irq/745-pciehp]

# cat /proc/14022/stack
__switch_to+0x94/0xd8
pci_lock_rescan_remove+0x20/0x28
pciehp_configure_device+0x30/0x140
pciehp_handle_presence_or_link_change+0x324/0x458
pciehp_ist+0x1dc/0x1e0

# cat /proc/10913/stack
__switch_to+0x94/0xd8
synchronize_irq+0x8c/0xc0
pcie_pme_suspend+0xa4/0x118
pcie_pme_remove+0x20/0x40
pcie_port_remove_service+0x3c/0x58
...
pcie_port_device_remove+0x2c/0x48
pcie_portdrv_remove+0x68/0x78
pci_device_remove+0x48/0x120
...
pci_stop_bus_device+0x84/0xc0
pci_stop_and_remove_bus_device_locked+0x24/0x40
remove_store+0xa4/0xb8
dev_attr_store+0x44/0x60
sysfs_kf_write+0x58/0x80

It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
reasons.

First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
native hotplug interrupt handler as well as for the PME one, because they
share one IRQ (as per the spec). That may deadlock if hotplug is signaled
while pcie_pme_remove() is running and the latter calls
pci_lock_rescan_remove() before the former.

Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
for the port, it will return without disabling the interrupt as expected by
pcie_pme_remove() which was overlooked by commit c7b5a4e6e8fb ("PCI / PM:
Fix native PME handling during system suspend/resume").

To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
its status and prevent the PME worker function from re-enabling it before
calling free_irq() on it, which should be sufficient.

Fixes: c7b5a4e6e8fb ("PCI / PM: Fix native PME handling during system suspend/resume")
Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.com
Reported-by: Dongdong Liu
Signed-off-by: Rafael J. Wysocki
[bhelgaas: add URL and deadlock details from Dongdong]
Signed-off-by: Bjorn Helgaas
Signed-off-by: Sasha Levin

Rafael J. Wysocki
2019-04-06 04:33:00 +0800
224c996e4 tools lib traceevent: Fix buffer overflow in arg_eval ... Browse Code »

[ Upstream commit 7c5b019e3a638a5a290b0ec020f6ca83d2ec2aaa ]

Fix buffer overflow observed when running perf test.

The overflow is when trying to evaluate "1ULL << (64 - 1)" which is
resulting in -9223372036854775808 which overflows the 20 character
buffer.

If is possible this bug has been reported before but I still don't see
any fix checked in:

See: https://www.spinics.net/lists/linux-perf-users/msg07714.html

Reported-by: Michael Sartain
Reported-by: Mathias Krause
Signed-off-by: Tony Jones
Acked-by: Steven Rostedt (VMware)
Cc: Frederic Weisbecker
Fixes: f7d82350e597 ("tools/events: Add files to create libtraceevent.a")
Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.de
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Sasha Levin

Tony Jones
2019-04-06 04:33:00 +0800
83c395332 fs: fix guard_bio_eod to check for real EOD errors ... Browse Code »

[ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]

guard_bio_eod() can truncate a segment in bio to allow it to do IO on
odd last sectors of a device.

It already checks if the IO starts past EOD, but it does not consider
the possibility of an IO request starting within device boundaries can
contain more than one segment past EOD.

In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.

Fix this by checking if truncated_bytes is lower than PAGE_SIZE.

This situation has been found on filesystems such as isofs and vfat,
which doesn't check the device size before mount, if the device is
smaller than the filesystem itself, a readahead on such filesystem,
which spans EOD, can trigger this situation, leading a call to
zero_user() with a wrong size possibly corrupting memory.

I didn't see any crash, or didn't let the system run long enough to
check if memory corruption will be hit somewhere, but adding
instrumentation to guard_bio_end() to check truncated_bytes size, was
enough to see the error.

The following script can trigger the error.

MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0

mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT

losetup -D

losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT

find $MNT -type f -exec cat {} + >/dev/null

Kudos to Eric Sandeen for coming up with the reproducer above

Reviewed-by: Ming Lei
Signed-off-by: Carlos Maiolino
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Carlos Maiolino
2019-04-06 04:33:00 +0800
6a817a7ae jbd2: fix invalid descriptor block checksum ... Browse Code »

[ Upstream commit 6e876c3dd205d30b0db6850e97a03d75457df007 ]

In jbd2_journal_commit_transaction(), if we are in abort mode,
we may flush the buffer without setting descriptor block checksum
by goto start_journal_io. Then fs is mounted,
jbd2_descriptor_block_csum_verify() failed.

[ 271.379811] EXT4-fs (vdd): shut down requested (2)
[ 271.381827] Aborting journal on device vdd-8.
[ 271.597136] JBD2: Invalid checksum recovering block 22199 in log
[ 271.598023] JBD2: recovery failed
[ 271.598484] EXT4-fs (vdd): error loading journal

Fix this problem by keep setting descriptor block checksum if the
descriptor buffer is not NULL.

This checksum problem can be reproduced by xfstests generic/388.

Signed-off-by: luojiajun
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
Signed-off-by: Sasha Levin

luojiajun
2019-04-06 04:33:00 +0800
ca66f6671 netfilter: conntrack: tcp: only close if RST matches exact sequence ... Browse Code »

[ Upstream commit be0502a3f2e94211a8809a09ecbc3a017189b8fb ]

TCP resets cause instant transition from established to closed state
provided the reset is in-window. Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.

Main problem for conntrack is that its a middlebox, i.e. whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).

Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing is
changed, RST is subject to normal in-window check.

2. If the RSTs sequence number either matches exactly RCV.NXT,
connection state moves to CLOSE.

3. The same applies if the RST sequence number aligns with a previous
packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.

If the peer sends a challenge ack, connection timeout will be reset.

If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.

If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.

Packetdrill test case:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792
0.100 > S. 0:0(0) ack 1 win 64240
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001

// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001

// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R 2:2(0) ack 1001 win 260

// 2nd reset, but too far ahead sequence. Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260

// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260

// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501

// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260

// ACK
0.300 < . 1001:1001(0) ack 1001 win 257

// more data could be exchanged here, connection
// is still established

// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002

// Close the connection without reading outstanding data
0.700 close(4) = 0

// so one more reset. Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.

With patch, this generates following conntrack events:
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin

Florian Westphal
2019-04-06 04:33:00 +0800
709aaa09b netfilter: nf_tables: check the result of dereferencing base_chain->stats ... Browse Code »

[ Upstream commit a9f5e78c403d2d62ade4f4c85040efc85f4049b8 ]

Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.

base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.

And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.

And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.

Thanks for Eric's help to finish this patch.

Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet
Signed-off-by: Zhang Yu
Signed-off-by: Li RongQing
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin

Li RongQing
2019-04-06 04:33:00 +0800
36a3219e6 cifs: Fix NULL pointer dereference of devname ... Browse Code »

[ Upstream commit 68e2672f8fbd1e04982b8d2798dd318bf2515dd2 ]

There is a NULL pointer dereference of devname in strspn()

The oops looks something like:

CIFS: Attempting to mount (null)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:strspn+0x0/0x50
...
Call Trace:
? cifs_parse_mount_options+0x222/0x1710 [cifs]
? cifs_get_volume_info+0x2f/0x80 [cifs]
cifs_setup_volume_info+0x20/0x190 [cifs]
cifs_get_volume_info+0x50/0x80 [cifs]
cifs_smb3_do_mount+0x59/0x630 [cifs]
? ida_alloc_range+0x34b/0x3d0
cifs_do_mount+0x11/0x20 [cifs]
mount_fs+0x52/0x170
vfs_kern_mount+0x6b/0x170
do_mount+0x216/0xdc0
ksys_mount+0x83/0xd0
__x64_sys_mount+0x25/0x30
do_syscall_64+0x65/0x220
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix this by adding a NULL check on devname in cifs_parse_devname()

Signed-off-by: Yao Liu
Signed-off-by: Steve French
Signed-off-by: Sasha Levin

Yao Liu
2019-04-06 04:33:00 +0800
d579b4eae cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED ... Browse Code »

[ Upstream commit 969ae8e8d4ee54c99134d3895f2adf96047f5bee ]

Old windows version or Netapp SMB server will return
NT_STATUS_NOT_SUPPORTED since they do not allow or implement
FSCTL_VALIDATE_NEGOTIATE_INFO. The client should accept the response
provided it's properly signed.

See
https://blogs.msdn.microsoft.com/openspecification/2012/06/28/smb3-secure-dialect-negotiation/

and

MS-SMB2 validate negotiate response processing:
https://msdn.microsoft.com/en-us/library/hh880630.aspx

Samba client had already handled it.
https://bugzilla.samba.org/attachment.cgi?id=13285&action=edit

Signed-off-by: Namjae Jeon
Signed-off-by: Steve French
Signed-off-by: Sasha Levin

Namjae Jeon
2019-04-06 04:32:59 +0800
4ab78f4d7 f2fs: fix to check inline_xattr_size boundary correctly ... Browse Code »

[ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]

We use below condition to check inline_xattr_size boundary:

if (!F2FS_OPTION(sbi).inline_xattr_size ||
F2FS_OPTION(sbi).inline_xattr_size >=
DEF_ADDRS_PER_INODE -
F2FS_TOTAL_EXTRA_ATTR_SIZE -
DEF_INLINE_RESERVED_SIZE -
DEF_MIN_INLINE_SIZE)

There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.

.bitmap 1 * 1 = 1
.reserved 1 * 1 = 1
.dentry 11 * 2 = 22
.filename 8 * 2 = 16
total 40

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin

Chao Yu
2019-04-06 04:32:59 +0800
8c81fcd3d dm thin: add sanity checks to thin-pool and external snapshot creation ... Browse Code »

[ Upstream commit 70de2cbda8a5d788284469e755f8b097d339c240 ]

Invoking dm_get_device() twice on the same device path with different
modes is dangerous. Because in that case, upgrade_mode() will alloc a
new 'dm_dev' and free the old one, which may be referenced by a previous
caller. Dereferencing the dangling pointer will trigger kernel NULL
pointer dereference.

The following two cases can reproduce this issue. Actually, they are
invalid setups that must be disallowed, e.g.:

1. Creating a thin-pool with read_only mode, and the same device as
both metadata and data.

dmsetup create thinp --table \
"0 41943040 thin-pool /dev/vdb /dev/vdb 128 0 1 read_only"

BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
...
Call Trace:
new_read+0xfb/0x110 [dm_bufio]
dm_bm_read_lock+0x43/0x190 [dm_persistent_data]
? kmem_cache_alloc_trace+0x15c/0x1e0
__create_persistent_data_objects+0x65/0x3e0 [dm_thin_pool]
dm_pool_metadata_open+0x8c/0xf0 [dm_thin_pool]
pool_ctr.cold.79+0x213/0x913 [dm_thin_pool]
? realloc_argv+0x50/0x70 [dm_mod]
dm_table_add_target+0x14e/0x330 [dm_mod]
table_load+0x122/0x2e0 [dm_mod]
? dev_status+0x40/0x40 [dm_mod]
ctl_ioctl+0x1aa/0x3e0 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
do_vfs_ioctl+0xa2/0x600
? handle_mm_fault+0xda/0x200
? __do_page_fault+0x26c/0x4f0
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9

2. Creating a external snapshot using the same thin-pool device.

dmsetup create thinp --table \
"0 41943040 thin-pool /dev/vdc /dev/vdb 128 0 2 ignore_discard"
dmsetup message /dev/mapper/thinp 0 "create_thin 0"
dmsetup create snap --table \
"0 204800 thin /dev/mapper/thinp 0 /dev/mapper/thinp"

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
Call Trace:
? __alloc_pages_nodemask+0x13c/0x2e0
retrieve_status+0xa5/0x1f0 [dm_mod]
? dm_get_live_or_inactive_table.isra.7+0x20/0x20 [dm_mod]
table_status+0x61/0xa0 [dm_mod]
ctl_ioctl+0x1aa/0x3e0 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
do_vfs_ioctl+0xa2/0x600
ksys_ioctl+0x60/0x90
? ksys_write+0x4f/0xb0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Jason Cai (Xiang Feng)
Signed-off-by: Mike Snitzer
Signed-off-by: Sasha Levin

Jason Cai (Xiang Feng)
2019-04-06 04:32:59 +0800
626d98bbd cifs: use correct format characters ... Browse Code »

[ Upstream commit 259594bea574e515a148171b5cd84ce5cbdc028a ]

When compiling with -Wformat, clang emits the following warnings:

fs/cifs/smb1ops.c:312:20: warning: format specifies type 'unsigned
short' but the argument has type 'unsigned int' [-Wformat]
tgt_total_cnt, total_in_tgt);
^~~~~~~~~~~~

fs/cifs/cifs_dfs_ref.c:289:4: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->flags, ref->server_type);
^~~~~~~~~~

fs/cifs/cifs_dfs_ref.c:289:16: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->flags, ref->server_type);
^~~~~~~~~~~~~~~~

fs/cifs/cifs_dfs_ref.c:291:4: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->ref_flag, ref->path_consumed);
^~~~~~~~~~~~~

fs/cifs/cifs_dfs_ref.c:291:19: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->ref_flag, ref->path_consumed);
^~~~~~~~~~~~~~~~~~
The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.

Link: https://github.com/ClangBuiltLinux/linux/issues/378

Signed-off-by: Louis Taylor
Signed-off-by: Steve French
Reviewed-by: Nick Desaulniers
Signed-off-by: Sasha Levin

Louis Taylor
2019-04-06 04:32:59 +0800
a6c56bf63 page_poison: play nicely with KASAN ... Browse Code »

[ Upstream commit 4117992df66a26fa33908b4969e04801534baab1 ]

KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING).
It triggers false positives in the allocation path:

BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
Read of size 8 at addr ffff88881f800000 by task swapper/0
CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
__asan_report_load8_noabort+0x19/0x20
memchr_inv+0x2ea/0x330
kernel_poison_pages+0x103/0x3d5
get_page_from_freelist+0x15e7/0x4d90

because KASAN has not yet unpoisoned the shadow page for allocation
before it checks memchr_inv() but only found a stale poison pattern.

Also, false positives in free path,

BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
check_memory_region+0x22d/0x250
memset+0x28/0x40
kernel_poison_pages+0x29e/0x3d5
__free_pages_ok+0x75f/0x13e0

due to KASAN adds poisoned redzones around slab objects, but the page
poisoning needs to poison the whole page.

Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw
Signed-off-by: Qian Cai
Acked-by: Andrey Ryabinin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Qian Cai
2019-04-06 04:32:59 +0800
d609ecd88 fs/file.c: initialize init_files.resize_wait ... Browse Code »

[ Upstream commit 5704a06810682683355624923547b41540e2801a ]

(Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647)

'get_unused_fd_flags' in kthread cause kernel crash. It works fine on
4.1, but causes crash after get 64 fds. It also cause crash on
ubuntu1404/1604/1804, centos7.5, and the crash messages are almost the
same.

The crash message on centos7.5 shows below:

start fd 61
start fd 62
start fd 63
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: __wake_up_common+0x2e/0x90
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.3.3.el7.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
RIP: 0010:__wake_up_common+0x2e/0x90
RSP: 0018:ffff8e94247a2d18 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
Call Trace:
__wake_up+0x39/0x50
expand_files+0x131/0x250
__alloc_fd+0x47/0x170
get_unused_fd_flags+0x30/0x40
test_fd+0x12a/0x1c0 [test]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21
Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
RIP __wake_up_common+0x2e/0x90
RSP
CR2: 0000000000000000

This issue exists since CentOS 7.5 3.10.0-862 and CentOS 7.4
(3.10.0-693.21.1 ) is ok. Root cause: the item 'resize_wait' is not
initialized before being used.

Reported-by: Richard Zhang
Reviewed-by: Andrew Morton
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Shuriyc Chu
2019-04-06 04:32:59 +0800
9b4f27667 f2fs: do not use mutex lock in atomic context ... Browse Code »

[ Upstream commit 9083977dabf3833298ddcd40dee28687f1e6b483 ]

Fix below warning coming because of using mutex lock in atomic context.

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
Preemption disabled at: __radix_tree_preload+0x28/0x130
Call trace:
dump_backtrace+0x0/0x2b4
show_stack+0x20/0x28
dump_stack+0xa8/0xe0
___might_sleep+0x144/0x194
__might_sleep+0x58/0x8c
mutex_lock+0x2c/0x48
f2fs_trace_pid+0x88/0x14c
f2fs_set_node_page_dirty+0xd0/0x184

Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
spin_lock() acquired.

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin

Sahitya Tummala
2019-04-06 04:32:59 +0800
20141feb9 ocfs2: fix a panic problem caused by o2cb_ctl ... Browse Code »

[ Upstream commit cc725ef3cb202ef2019a3c67c8913efa05c3cce6 ]

In the process of creating a node, it will cause NULL pointer
dereference in kernel if o2cb_ctl failed in the interval (mkdir,
o2cb_set_node_attribute(node_num)] in function o2cb_add_node.

The node num is initialized to 0 in function o2nm_node_group_make_item,
o2nm_node_group_drop_item will mistake the node number 0 for a valid
node number when we delete the node before the node number is set
correctly. If the local node number of the current host happens to be
0, cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
o2hb_thread still running. The panic stack is generated as follows:

o2hb_thread
\-o2hb_do_disk_heartbeat
\-o2hb_check_own_slot
|-slot = ®->hr_slots[o2nm_this_node()];
//o2nm_this_node() return O2NM_INVALID_NODE_NUM

We need to check whether the node number is set when we delete the node.

Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.com
Signed-off-by: Jia Guo
Reviewed-by: Joseph Qi
Acked-by: Jun Piao
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Jia Guo
2019-04-06 04:32:59 +0800