06 Apr, 2019

40 commits

  • [ Upstream commit 11db1ad4513d6205d2519e1a30ff4cef746e3243 ]

    The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0de33
    ("perf annotate: No need to calculate notes->start twice") removed the
    notes->start assignment in symbol__calc_lines(). find_address_in_section(),
    called from the symbol__tty_annotate() subroutine, now fails because
    a2l->addr is wrong, so the annotate summary doesn't report the source line
    numbers correctly.

    Before fix:

    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
    void hotspot_1(void)
    {
    volatile int i;

    for (i = 0; i < 0x10000000; i++);
    for (i = 0; i < 0x10000000; i++);
    for (i = 0; i < 0x10000000; i++);
    }

    int main(void)
    {
    hotspot_1();

    return 0;
    }
    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1

    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

    Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
    ----------------------------------------------

    19.30 common_while_1[32]
    19.03 common_while_1[4e]
    19.01 common_while_1[16]
    5.04 common_while_1[13]
    4.99 common_while_1[4b]
    4.78 common_while_1[2c]
    4.77 common_while_1[10]
    4.66 common_while_1[2f]
    4.59 common_while_1[51]
    4.59 common_while_1[35]
    4.52 common_while_1[19]
    4.20 common_while_1[56]
    0.51 common_while_1[48]
    Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
    -----------------------------------------------------------------------------------------------------------------
    :
    :
    :
    : Disassembly of section .text:
    :
    : 00000000000005fa :
    : hotspot_1():
    : void hotspot_1(void)
    : {
    0.00 : 5fa: push %rbp
    0.00 : 5fb: mov %rsp,%rbp
    : volatile int i;
    :
    : for (i = 0; i < 0x10000000; i++);
    0.00 : 5fe: movl $0x0,-0x4(%rbp)
    0.00 : 605: jmp 610
    0.00 : 607: mov -0x4(%rbp),%eax
    common_while_1[10] 4.77 : 60a: add $0x1,%eax
    common_while_1[13] 5.04 : 60d: mov %eax,-0x4(%rbp)
    common_while_1[16] 19.01 : 610: mov -0x4(%rbp),%eax
    common_while_1[19] 4.52 : 613: cmp $0xfffffff,%eax
    0.00 : 618: jle 607
    : for (i = 0; i < 0x10000000; i++);
    ...

    After fix:

    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

    Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
    ----------------------------------------------

    33.34 common_while_1.c:5
    33.34 common_while_1.c:6
    33.32 common_while_1.c:7
    Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
    -----------------------------------------------------------------------------------------------------------------
    :
    :
    :
    : Disassembly of section .text:
    :
    : 00000000000005fa :
    : hotspot_1():
    : void hotspot_1(void)
    : {
    0.00 : 5fa: push %rbp
    0.00 : 5fb: mov %rsp,%rbp
    : volatile int i;
    :
    : for (i = 0; i < 0x10000000; i++);
    0.00 : 5fe: movl $0x0,-0x4(%rbp)
    0.00 : 605: jmp 610
    0.00 : 607: mov -0x4(%rbp),%eax
    common_while_1.c:5 4.70 : 60a: add $0x1,%eax
    4.89 : 60d: mov %eax,-0x4(%rbp)
    common_while_1.c:5 19.03 : 610: mov -0x4(%rbp),%eax
    common_while_1.c:5 4.72 : 613: cmp $0xfffffff,%eax
    0.00 : 618: jle 607
    : for (i = 0; i < 0x10000000; i++);
    0.00 : 61a: movl $0x0,-0x4(%rbp)
    0.00 : 621: jmp 62c
    0.00 : 623: mov -0x4(%rbp),%eax
    common_while_1.c:6 4.54 : 626: add $0x1,%eax
    4.73 : 629: mov %eax,-0x4(%rbp)
    common_while_1.c:6 19.54 : 62c: mov -0x4(%rbp),%eax
    common_while_1.c:6 4.54 : 62f: cmp $0xfffffff,%eax
    ...

    Signed-off-by: Wei Li
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: 425859ff0de33 ("perf annotate: No need to calculate notes->start twice")
    Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Wei Li
     
  • [ Upstream commit d13501a2bedfbea0983cc868d3f1dc692627f60d ]

    Custom approximation of fractional-divider may not need parent clock
    rate checking. For example Rockchip SoCs work fine using grand parent
    clock rate even if target rate is greater than parent.

    This patch checks parent clock rate only if CLK_SET_RATE_PARENT flag
    is set.

    As a detailed example, consider the clock tree of the Rockchip I2S audio
    hardware:
    - Clock rate of CPLL is 1.2GHz, GPLL is 491.52MHz.
    - i2s1_div is an integer divider that divides by N (N is 1~128).
    Its input clock is CPLL or GPLL. The initial divider value is N = 1.
    Ex) PLL = CPLL, N = 10, i2s1_div output rate is
    CPLL / 10 = 1.2GHz / 10 = 120MHz
    - i2s1_frac is a fractional divider that scales its input by x/y, where
    x and y are 16-bit integers.

    CPLL --> | selector | ---> i2s1_div -+--> | selector | --> I2S1 MCLK
    GPLL --> |          | ,--------------'    |          |
                          `--> i2s1_frac ---> |          |

    The clock mux tries to choose the more suitable of i2s1_div and
    i2s1_frac as the master clock (MCLK) of I2S1.

    The bad scenario is as follows:
    - Try to set MCLK to 8.192MHz (32kHz audio replay)
    Candidate setting is
    - i2s1_div: GPLL / 60 = 8.192MHz
    The i2s1_div candidate exactly matches the target clock rate, so the
    mux chooses this clock source. The i2s1_div output rate changes
    491.52MHz -> 8.192MHz

    - After that, try to set it to 11.2896MHz (44.1kHz audio replay)
    Candidate settings are
    - i2s1_div : CPLL / 107 = 11.214945MHz
    - i2s1_frac: i2s1_div = 8.192MHz
    This is because clk_fd_round_rate() thinks the target rate
    (11.2896MHz) is higher than the parent rate (i2s1_div = 8.192MHz)
    and returns the parent clock rate.

    The above is the current upstream behavior. The clock mux chooses
    i2s1_div, but this clock rate is not acceptable to the I2S driver, so
    users cannot replay audio.

    Expected behavior is:
    - Try to set master clock to 11.2896MHz (44.1kHz audio replay)
    Candidate settings are
    - i2s1_div : CPLL / 107 = 11.214945MHz
    - i2s1_frac: i2s1_div * 147/6400 = 11.2896MHz
    Change i2s1_div to GPLL / 1 = 491.52MHz at same
    time.

    With this commit applied, clk_fd_round_rate() calls the custom
    approximation function of Rockchip even if the target rate is higher
    than the parent rate. The custom function changes both the grandparent
    (i2s1_div) and parent (i2s1_frac) settings at the same time. The clock
    mux can then choose i2s1_frac and audio works fine.
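
    The arithmetic of the two scenarios can be checked with a short Python
    sketch. It is only an illustration: frac_candidate() is a hypothetical
    helper, not the kernel's approximation code, and it assumes the rates
    quoted in the message (CPLL = 1.2GHz, GPLL = 491.52MHz, 16-bit x and y).

```python
from fractions import Fraction

CPLL_HZ = 1_200_000_000   # from the message: CPLL is 1.2GHz
GPLL_HZ = 491_520_000     # from the message: GPLL is 491.52MHz
MAX_16BIT = 0xFFFF        # x and y of i2s1_frac are 16-bit integers


def frac_candidate(parent_hz, target_hz):
    """Return (x, y) so that parent_hz * x / y approximates target_hz.

    Hypothetical helper: a fractional divider cannot multiply, so there is
    no candidate when the target exceeds the parent rate.
    """
    if target_hz > parent_hz:
        return None
    ratio = Fraction(target_hz, parent_hz).limit_denominator(MAX_16BIT)
    return ratio.numerator, ratio.denominator


target = 11_289_600  # 11.2896MHz, 44.1kHz audio replay

# Bad scenario: i2s1_div was left at GPLL / 60 = 8.192MHz by the earlier request.
print(frac_candidate(GPLL_HZ // 60, target))

# Expected behavior: i2s1_div is moved back to GPLL / 1 at the same time.
print(frac_candidate(GPLL_HZ, target))
```

    With the parent left at 8.192MHz no fraction exists, while with the
    grandparent rate of 491.52MHz the ratio reduces to the 147/6400 quoted
    above.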

    Signed-off-by: Katsuhiro Suzuki
    Reviewed-by: Heiko Stuebner
    [sboyd@kernel.org: Make function into a macro instead]
    Signed-off-by: Stephen Boyd
    Signed-off-by: Sasha Levin

    Katsuhiro Suzuki
     
  • [ Upstream commit 2612d723aadcf8281f9bf8305657129bd9f3cd57 ]

    Using CX-3 virtual functions, either from a bare-metal machine or
    pass-through from a VM, MAD packets are proxied through the PF driver.

    Since the VF drivers have separate name spaces for MAD Transaction Ids
    (TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
    in a cache.

    Following the RDMA Connection Manager (CM) protocol, it is clear when
    an entry has to be evicted from the cache. But life is not perfect;
    remote peers may die or be rebooted. Hence, a timeout is used to wipe
    out a cache entry once the PF driver assumes the remote peer has gone.

    During workloads where a high number of QPs are destroyed concurrently,
    an excessive number of CM DREQ retries has been observed.

    The problem can be demonstrated in a bare-metal environment, where two
    nodes have instantiated 8 VFs each. With dual-ported HCAs, this gives
    16 vPorts per physical server.

    64 processes are associated with each vPort and create and destroy
    one QP for each of the remote 64 processes. That is, 1024 QPs per
    vPort, all in all 16K QPs. The QPs are created/destroyed using the
    CM.

    When tearing down these 16K QPs, excessive CM DREQ retries (and
    duplicates) are observed. With some cat/paste/awk wizardry on the
    infiniband_cm sysfs, we observe the following, summed over the 16
    vPorts on one of the nodes:

    cm_rx_duplicates:
    dreq 2102
    cm_rx_msgs:
    drep 1989
    dreq 6195
    rep 3968
    req 4224
    rtu 4224
    cm_tx_msgs:
    drep 4093
    dreq 27568
    rep 4224
    req 3968
    rtu 3968
    cm_tx_retries:
    dreq 23469

    Note that the active/passive side is equally distributed between the
    two nodes.

    Enabling pr_debug in cm.c gives tons of:

    [171778.814239] mlx4_ib_multiplex_cm_handler: id{slave:
    1,sl_cm_id: 0xd393089f} is NULL!

    By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
    tear-down phase of the application is reduced from approximately 90 to
    50 seconds. Retries/duplicates are also significantly reduced:

    cm_rx_duplicates:
    dreq 2460
    []
    cm_tx_retries:
    dreq 3010
    req 47

    Increasing the timeout further didn't help, as these duplicates and
    retries stem from a too short CMA timeout, which was 20 (~4 seconds)
    on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
    numbers fell to about 10 for both of them.

    Adjustment of the CMA timeout is not part of this commit.
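
    The "~4 seconds" and "~17 seconds" figures are consistent with the
    usual InfiniBand timeout encoding of 4.096 microseconds times 2 to the
    power of the configured value. A minimal sketch, assuming that encoding
    applies to the CMA timeout discussed here (ib_timeout_seconds() is a
    name made up for this example):

```python
def ib_timeout_seconds(exponent):
    """Convert an exponent-encoded timeout to seconds: 4.096 us * 2**exponent."""
    return 4.096e-6 * (2 ** exponent)


for value in (20, 22):
    print(value, round(ib_timeout_seconds(value), 2))
```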

    Signed-off-by: Håkon Bugge
    Acked-by: Jack Morgenstein
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Håkon Bugge
     
  • [ Upstream commit 758a58d0bc67457f1215321a536226654a830eeb ]

    Commit 0da03cab87e6
    ("loop: Fix deadlock when calling blkdev_reread_part()") moves
    blkdev_reread_part() out of the loop_ctl_mutex. However,
    GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
    __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
    will not rescan the loop device to delete all partitions.

    Below are steps to reproduce the issue:

    step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
    step2 # losetup -P /dev/loop0 tmp.raw
    step3 # parted /dev/loop0 mklabel gpt
    step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
    step5 # losetup -d /dev/loop0

    Step5 will not be able to delete /dev/loop0p1 (introduced by step4), and
    the kernel prints the following warning:

    [ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)

    This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().

    Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()")
    Signed-off-by: Dongli Zhang
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Dongli Zhang
     
  • [ Upstream commit e4c275f77624961b56cce397814d9d770a45ac59 ]

    Fix the following KASAN warning produced when booting a 64-bit kernel:
    [ 13.334750] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70
    [ 13.342166] Read of size 8 at addr ffff880235067178 by task kworker/2:1/42
    [ 13.342176] CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 4.20.0-rc1+ #106
    [ 13.342179] Hardware name: Mellanox Technologies Ltd. MSN2740/Mellanox x86 SFF board, BIOS 5.6.5 06/07/2016
    [ 13.342190] Workqueue: events deferred_probe_work_func
    [ 13.342194] Call Trace:
    [ 13.342206] dump_stack+0xc7/0x15b
    [ 13.342214] ? show_regs_print_info+0x5/0x5
    [ 13.342220] ? kmsg_dump_rewind_nolock+0x59/0x59
    [ 13.342234] ? _raw_write_lock_irqsave+0x100/0x100
    [ 13.351593] print_address_description+0x73/0x260
    [ 13.351603] kasan_report+0x260/0x380
    [ 13.351611] ? find_first_bit+0x19/0x70
    [ 13.351619] find_first_bit+0x19/0x70
    [ 13.351630] mlxreg_hotplug_work_handler+0x73c/0x920 [mlxreg_hotplug]
    [ 13.351639] ? __lock_text_start+0x8/0x8
    [ 13.351646] ? _raw_write_lock_irqsave+0x80/0x100
    [ 13.351656] ? mlxreg_hotplug_remove+0x1e0/0x1e0 [mlxreg_hotplug]
    [ 13.351663] ? regmap_volatile+0x40/0xb0
    [ 13.351668] ? regcache_write+0x4c/0x90
    [ 13.351676] ? mlxplat_mlxcpld_reg_write+0x24/0x30 [mlx_platform]
    [ 13.351681] ? _regmap_write+0xea/0x220
    [ 13.351688] ? __mutex_lock_slowpath+0x10/0x10
    [ 13.351696] ? devm_add_action+0x70/0x70
    [ 13.351701] ? mutex_unlock+0x1d/0x40
    [ 13.351710] mlxreg_hotplug_probe+0x82e/0x989 [mlxreg_hotplug]
    [ 13.351723] ? mlxreg_hotplug_work_handler+0x920/0x920 [mlxreg_hotplug]
    [ 13.351731] ? sysfs_do_create_link_sd.isra.2+0xf4/0x190
    [ 13.351737] ? sysfs_rename_link_ns+0xf0/0xf0
    [ 13.351743] ? devres_close_group+0x2b0/0x2b0
    [ 13.351749] ? pinctrl_put+0x20/0x20
    [ 13.351755] ? acpi_dev_pm_attach+0x2c/0xd0
    [ 13.351763] platform_drv_probe+0x70/0xd0
    [ 13.351771] really_probe+0x480/0x6e0
    [ 13.351778] ? device_attach+0x10/0x10
    [ 13.351784] ? __lock_text_start+0x8/0x8
    [ 13.351790] ? _raw_write_lock_irqsave+0x80/0x100
    [ 13.351797] ? _raw_write_lock_irqsave+0x80/0x100
    [ 13.351806] ? __driver_attach+0x190/0x190
    [ 13.351812] driver_probe_device+0x17d/0x1a0
    [ 13.351819] ? __driver_attach+0x190/0x190
    [ 13.351825] bus_for_each_drv+0xd6/0x130
    [ 13.351831] ? bus_rescan_devices+0x20/0x20
    [ 13.351837] ? __mutex_lock_slowpath+0x10/0x10
    [ 13.351845] __device_attach+0x18c/0x230
    [ 13.351852] ? device_bind_driver+0x70/0x70
    [ 13.351859] ? __mutex_lock_slowpath+0x10/0x10
    [ 13.351866] bus_probe_device+0xea/0x110
    [ 13.351874] deferred_probe_work_func+0x1c9/0x290
    [ 13.351882] ? driver_deferred_probe_add+0x1d0/0x1d0
    [ 13.351889] ? preempt_notifier_dec+0x20/0x20
    [ 13.351897] ? read_word_at_a_time+0xe/0x20
    [ 13.351904] ? strscpy+0x151/0x290
    [ 13.351912] ? set_work_pool_and_clear_pending+0x9c/0xf0
    [ 13.351918] ? __switch_to_asm+0x34/0x70
    [ 13.351924] ? __switch_to_asm+0x40/0x70
    [ 13.351929] ? __switch_to_asm+0x34/0x70
    [ 13.351935] ? __switch_to_asm+0x40/0x70
    [ 13.351942] process_one_work+0x5cc/0xa00
    [ 13.351952] ? pwq_dec_nr_in_flight+0x1e0/0x1e0
    [ 13.351960] ? pci_mmcfg_check_reserved+0x80/0xb8
    [ 13.351967] ? run_rebalance_domains+0x250/0x250
    [ 13.351980] ? stack_access_ok+0x35/0x80
    [ 13.351986] ? deref_stack_reg+0xa1/0xe0
    [ 13.351994] ? schedule+0xcd/0x250
    [ 13.352000] ? worker_enter_idle+0x2d6/0x330
    [ 13.352006] ? __schedule+0xeb0/0xeb0
    [ 13.352014] ? fork_usermode_blob+0x130/0x130
    [ 13.352019] ? mutex_lock+0xa7/0x100
    [ 13.352026] ? _raw_spin_lock_irq+0x98/0xf0
    [ 13.352032] ? _raw_read_unlock_irqrestore+0x30/0x30
    [ 13.352037] i2c i2c-2: Added multiplexed i2c bus 11
    [ 13.352043] worker_thread+0x181/0xa80
    [ 13.352052] ? __switch_to_asm+0x34/0x70
    [ 13.352058] ? __switch_to_asm+0x40/0x70
    [ 13.352064] ? process_one_work+0xa00/0xa00
    [ 13.352070] ? __switch_to_asm+0x34/0x70
    [ 13.352076] ? __switch_to_asm+0x40/0x70
    [ 13.352081] ? __switch_to_asm+0x34/0x70
    [ 13.352086] ? __switch_to_asm+0x40/0x70
    [ 13.352092] ? __switch_to_asm+0x34/0x70
    [ 13.352097] ? __switch_to_asm+0x40/0x70
    [ 13.352105] ? __schedule+0x3d6/0xeb0
    [ 13.352112] ? migrate_swap_stop+0x470/0x470
    [ 13.352119] ? save_stack+0x89/0xb0
    [ 13.352127] ? kmem_cache_alloc_trace+0xe5/0x570
    [ 13.352132] ? kthread+0x59/0x1d0
    [ 13.352138] ? ret_from_fork+0x35/0x40
    [ 13.352154] ? __schedule+0xeb0/0xeb0
    [ 13.352161] ? remove_wait_queue+0x150/0x150
    [ 13.352169] ? _raw_write_lock_irqsave+0x80/0x100
    [ 13.352175] ? __lock_text_start+0x8/0x8
    [ 13.352183] ? process_one_work+0xa00/0xa00
    [ 13.352188] kthread+0x1a4/0x1d0
    [ 13.352195] ? kthread_create_worker_on_cpu+0xc0/0xc0
    [ 13.352202] ret_from_fork+0x35/0x40

    [ 13.353879] The buggy address belongs to the page:
    [ 13.353885] page:ffffea0008d419c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
    [ 13.353890] flags: 0x2ffff8000000000()
    [ 13.353897] raw: 02ffff8000000000 ffffea0008d419c8 ffffea0008d419c8 0000000000000000
    [ 13.353903] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
    [ 13.353905] page dumped because: kasan: bad access detected

    [ 13.353908] Memory state around the buggy address:
    [ 13.353912] ffff880235067000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 13.353917] ffff880235067080: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04
    [ 13.353921] >ffff880235067100: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 f2 f2 f2 f2 04
    [ 13.353923] ^
    [ 13.353927] ffff880235067180: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 00 00 00 00 00
    [ 13.353931] ffff880235067200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 13.353933] ==================================================================

    The warning is caused by the loop below:
    for_each_set_bit(bit, (unsigned long *)&asserted, 8) {
    where "asserted" is declared as 'unsigned'.

    The loop casts a pointer to a 32-bit unsigned integer to a pointer to a
    64-bit unsigned long. There are two problems here:
    - It causes an access to four extra bytes, which can corrupt memory.
    - The 32-bit variable's address may not be 64-bit aligned.

    The fix changes variable "asserted" to "unsigned long".
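
    The effect of the cast can be modelled outside the kernel. The sketch
    below is only an analogy in Python: the byte layout is made up, and
    first_set_bit() is a stand-in for the kernel helper, not its
    implementation. It shows that an 8-byte load at the address of a 4-byte
    variable also picks up whatever is stored next to it.

```python
import struct

# Made-up little-endian stack layout: a 4-byte 'asserted' holding 0,
# followed by 4 bytes that belong to a neighbouring variable.
stack = struct.pack("<II", 0x00000000, 0x00000010)

as_u32 = struct.unpack_from("<I", stack, 0)[0]  # what the code meant to read
as_u64 = struct.unpack_from("<Q", stack, 0)[0]  # what the cast makes it load


def first_set_bit(word, nbits):
    """Index of the lowest set bit below nbits, or nbits if none is set."""
    for bit in range(nbits):
        if word & (1 << bit):
            return bit
    return nbits


print(hex(as_u32), first_set_bit(as_u32, 32))
print(hex(as_u64), first_set_bit(as_u64, 64))
```

    In the loop above only 8 bits are scanned, so the extra bytes would not
    change the result on a little-endian machine, but the wide load itself
    is what KASAN reports.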

    Fixes: 1f976f6978bf ("platform/x86: Move Mellanox platform hotplug driver to platform/mellanox")
    Signed-off-by: Vadim Pasternak
    Signed-off-by: Darren Hart (VMware)
    Signed-off-by: Sasha Levin

    Vadim Pasternak
     
  • [ Upstream commit 4d9b2864a415fec39150bc13efc730c7eb88711e ]

    Commit ae7c8cba3221 ("platform/x86: ideapad-laptop: add lenovo RESCUER
    R720-15IKBN to no_hw_rfkill_list") added
    DMI_MATCH(DMI_BOARD_NAME, "80WW")
    for Lenovo RESCUER R720-15IKBN.

    But DMI_BOARD_NAME does not match 80WW on Lenovo RESCUER R720-15IKBN,
    so Wireless LAN remains hard blocked.

    On Lenovo RESCUER R720-15IKBN:
    ~$ cat /sys/class/dmi/id/sys_vendor
    LENOVO
    ~$ cat /sys/class/dmi/id/board_name
    Provence-5R3
    ~$ cat /sys/class/dmi/id/product_name
    80WW
    ~$ cat /sys/class/dmi/id/product_version
    Lenovo R720-15IKBN

    So on Lenovo RESCUER R720-15IKBN:
    DMI_SYS_VENDOR should match "LENOVO",
    DMI_BOARD_NAME should match "Provence-5R3",
    DMI_PRODUCT_NAME should match "80WW",
    DMI_PRODUCT_VERSION should match "Lenovo R720-15IKBN".

    Fix it and, in accordance with other entries in no_hw_rfkill_list,
    use DMI_PRODUCT_VERSION instead of DMI_BOARD_NAME.
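
    Why the old entry never matched can be shown with a small model. This
    is a sketch, not the driver's table: entry_matches() is hypothetical,
    substring matching is assumed for each field, and the two entries are
    written from the description above rather than copied from the source.

```python
# DMI strings reported by the machine, copied from the message above.
dmi = {
    "sys_vendor": "LENOVO",
    "board_name": "Provence-5R3",
    "product_name": "80WW",
    "product_version": "Lenovo R720-15IKBN",
}


def entry_matches(entry, dmi_strings):
    """True if every (field, substring) pair of the entry is found."""
    return all(sub in dmi_strings.get(field, "") for field, sub in entry.items())


# Illustrative entries: the board-name match from the earlier commit and
# the product-version match this fix switches to.
old_entry = {"sys_vendor": "LENOVO", "board_name": "80WW"}
new_entry = {"sys_vendor": "LENOVO", "product_version": "Lenovo R720-15IKBN"}

print(entry_matches(old_entry, dmi), entry_matches(new_entry, dmi))
```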

    Fixes: ae7c8cba3221 ("platform/x86: ideapad-laptop: add lenovo RESCUER R720-15IKBN to no_hw_rfkill_list")
    Signed-off-by: Yang Fan
    Signed-off-by: Darren Hart (VMware)
    Signed-off-by: Sasha Levin

    Yang Fan
     
  • [ Upstream commit ab2c4e2581ad32c28627235ff0ae8c5a5ea6899f ]

    Give precision identifiers to the two snprintf() formatting the priority
    and TC strings to avoid producing these two warnings:

    drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
    'mlxsw_sp_port_get_prio_strings':
    drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:37: warning: '%d'
    directive output may be truncated writing between 1 and 3 bytes into a
    region of size between 0 and 31 [-Wformat-truncation=]
    snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
    ^~
    drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:3: note: 'snprintf'
    output between 3 and 36 bytes into a destination of size 32
    snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    mlxsw_sp_port_hw_prio_stats[i].str, prio);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
    'mlxsw_sp_port_get_tc_strings':
    drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:37: warning: '%d'
    directive output may be truncated writing between 1 and 11 bytes into a
    region of size between 0 and 31 [-Wformat-truncation=]
    snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
    ^~
    drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:3: note: 'snprintf'
    output between 3 and 44 bytes into a destination of size 32
    snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    mlxsw_sp_port_hw_tc_stats[i].str, tc);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
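
    How a precision on the string directive bounds the output can be
    sketched as follows. Python's % formatting accepts the same "%.*s"
    precision syntax, so it serves as a stand-in for snprintf() here. The
    precision of 29 and the single-digit index are assumptions chosen to
    fit the 32-byte destination from the warning, not values taken from
    the patch.

```python
ETH_GSTRING_LEN = 32  # destination size reported in the warnings above


def stat_name(prefix, index, max_prefix=29):
    """Format '<prefix>_<index>' with the prefix cut to max_prefix characters."""
    return "%.*s_%d" % (max_prefix, prefix, index)


# Hypothetical statistic name; a normal prefix is left untouched.
print(stat_name("rx_octets_prio", 3))

# An over-long prefix is truncated, leaving room for "_", one digit and
# the terminating NUL that a C buffer of ETH_GSTRING_LEN bytes needs.
print(len(stat_name("x" * 40, 7)))
```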

    Signed-off-by: Florian Fainelli
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Florian Fainelli
     
  • [ Upstream commit 135e7245479addc6b1f5d031e3d7e2ddb3d2b109 ]

    Provide precision hints to snprintf() since we know the destination
    buffer size of the RX/TX ring names is IFNAMSIZ + 5 - 1. This fixes the
    following warnings:

    drivers/net/ethernet/intel/e1000e/netdev.c: In function
    'e1000_request_msix':
    drivers/net/ethernet/intel/e1000e/netdev.c:2109:13: warning: 'snprintf'
    output may be truncated before the last format character
    [-Wformat-truncation=]
    "%s-rx-0", netdev->name);
    ^
    drivers/net/ethernet/intel/e1000e/netdev.c:2107:3: note: 'snprintf'
    output between 6 and 21 bytes into a destination of size 20
    snprintf(adapter->rx_ring->name,
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    sizeof(adapter->rx_ring->name) - 1,
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "%s-rx-0", netdev->name);
    ~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/net/ethernet/intel/e1000e/netdev.c:2125:13: warning: 'snprintf'
    output may be truncated before the last format character
    [-Wformat-truncation=]
    "%s-tx-0", netdev->name);
    ^
    drivers/net/ethernet/intel/e1000e/netdev.c:2123:3: note: 'snprintf'
    output between 6 and 21 bytes into a destination of size 20
    snprintf(adapter->tx_ring->name,
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    sizeof(adapter->tx_ring->name) - 1,
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "%s-tx-0", netdev->name);
    ~~~~~~~~~~~~~~~~~~~~~~~~

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Florian Fainelli
     
  • [ Upstream commit f6d9758b12660484b6639364cc406da92a918c96 ]

    The following false positive lockdep splat has been observed.

    ======================================================
    WARNING: possible circular locking dependency detected
    4.20.0+ #302 Not tainted
    ------------------------------------------------------
    systemd-udevd/160 is trying to acquire lock:
    edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704

    but task is already holding lock:
    edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&desc->request_mutex){+.+.}:
    mutex_lock_nested+0x1c/0x24
    __setup_irq+0xa0/0x704
    request_threaded_irq+0xd0/0x150
    mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
    mdio_probe+0x2c/0x54
    really_probe+0x200/0x2c4
    driver_probe_device+0x5c/0x174
    __driver_attach+0xd8/0xdc
    bus_for_each_dev+0x58/0x7c
    bus_add_driver+0xe4/0x1f0
    driver_register+0x7c/0x110
    mdio_driver_register+0x24/0x58
    do_one_initcall+0x74/0x2e8
    do_init_module+0x60/0x1d0
    load_module+0x1968/0x1ff4
    sys_finit_module+0x8c/0x98
    ret_fast_syscall+0x0/0x28
    0xbedf2ae8

    -> #0 (&chip->reg_lock){+.+.}:
    __mutex_lock+0x50/0x8b8
    mutex_lock_nested+0x1c/0x24
    __setup_irq+0x640/0x704
    request_threaded_irq+0xd0/0x150
    mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
    mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
    mdio_probe+0x2c/0x54
    really_probe+0x200/0x2c4
    driver_probe_device+0x5c/0x174
    __driver_attach+0xd8/0xdc
    bus_for_each_dev+0x58/0x7c
    bus_add_driver+0xe4/0x1f0
    driver_register+0x7c/0x110
    mdio_driver_register+0x24/0x58
    do_one_initcall+0x74/0x2e8
    do_init_module+0x60/0x1d0
    load_module+0x1968/0x1ff4
    sys_finit_module+0x8c/0x98
    ret_fast_syscall+0x0/0x28
    0xbedf2ae8

    other info that might help us debug this:

    Possible unsafe locking scenario:

    CPU0 CPU1
    ---- ----
    lock(&desc->request_mutex);
    lock(&chip->reg_lock);
    lock(&desc->request_mutex);
    lock(&chip->reg_lock);

    &desc->request_mutex refers to two different mutexes. #1 is the GPIO for
    the chip interrupt. #2 is the chained interrupt between global 1 and
    global 2.

    Add lockdep classes to the GPIO interrupt to avoid this.

    Reported-by: Russell King
    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Signed-off-by: Sasha Levin

    Andrew Lunn
     
  • [ Upstream commit a6327b5e57fdc679c842588c3be046c0b39cc127 ]

    When running OMAP1 kernel on QEMU, MMC access is annoyingly noisy:

    MMC: CTO of 0xff and 0xfe cannot be used!
    MMC: CTO of 0xff and 0xfe cannot be used!
    MMC: CTO of 0xff and 0xfe cannot be used!
    [ad inf.]

    Emulator warnings appear to be valid. The TI document SPRU680 [1]
    ("OMAP5910 Dual-Core Processor MultiMedia Card/Secure Data Memory Card
    (MMC/SD) Reference Guide") page 36 states that the maximum timeout is 253
    cycles and "0xff and 0xfe cannot be used".

    Fix by using 0xfd as the maximum timeout.
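
    The clamp amounts to the following. This is a sketch only; clamp_cto()
    and its argument are names invented for the example, and only the 0xfd
    limit comes from the text above.

```python
MAX_CTO = 0xFD  # 253 cycles; 0xff and 0xfe cannot be used


def clamp_cto(requested_cycles):
    """Limit a requested command timeout to the largest usable value."""
    return min(requested_cycles, MAX_CTO)


print(hex(clamp_cto(0xFF)), hex(clamp_cto(0x40)))
```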

    Tested using QEMU 2.5 (Siemens SX1 machine, OMAP310), and also checked on
    real hardware using Palm TE (OMAP310), Nokia 770 (OMAP1710) and Nokia N810
    (OMAP2420) that MMC works as before.

    [1] http://www.ti.com/lit/ug/spru680/spru680.pdf

    Fixes: 730c9b7e6630f ("[MMC] Add OMAP MMC host driver")
    Signed-off-by: Aaro Koskinen
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Aaro Koskinen
     
  • [ Upstream commit f5fef4593653dfa2a865c485bb81415de51d5c99 ]

    [BUG]
    Btrfs qgroup will still hit EDQUOT in the following case:

    $ dev=/dev/test/test
    $ mnt=/mnt/btrfs
    $ umount $mnt &> /dev/null
    $ umount $dev &> /dev/null

    $ mkfs.btrfs -f $dev
    $ mount $dev $mnt -o nospace_cache

    $ btrfs subv create $mnt/subv
    $ btrfs quota enable $mnt
    $ btrfs quota rescan -w $mnt
    $ btrfs qgroup limit -e 1G $mnt/subv

    $ fallocate -l 900M $mnt/subv/padding
    $ sync

    $ rm $mnt/subv/padding

    # Hit EDQUOT
    $ xfs_io -f -c "pwrite 0 512M" $mnt/subv/real_file

    [CAUSE]
    Since commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance
    to reduce early EDQUOT"), btrfs is not forced to commit the transaction
    to reclaim more quota space.

    Instead, we just check the pertrans metadata reservation against some
    threshold and try to commit the transaction asynchronously.

    However, in the above case the pertrans metadata reservation is pretty
    small, so it never triggers an asynchronous transaction commit.

    [FIX]
    Instead of only accounting pertrans metadata reservation, we calculate
    how much free space we have, and if there isn't much free space left,
    commit transaction asynchronously to try to free some space.

    This may slow down the fs when we have less than 32M free qgroup space,
    but should reduce a lot of false EDQUOT, so the cost should be
    acceptable.
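
    The decision described in [FIX] can be pictured as below. This is a
    rough sketch under stated assumptions: should_commit_async() and its
    parameters are hypothetical, and only the 32M threshold and the idea of
    comparing free qgroup space against it come from the text.

```python
SZ_1M = 1024 * 1024
SZ_32M = 32 * SZ_1M  # threshold mentioned in the message


def should_commit_async(limit, used, reserved, threshold=SZ_32M):
    """True when the remaining qgroup space drops below the threshold."""
    free = limit - used - reserved
    return free < threshold


limit = 1024 * SZ_1M  # 1G limit from the reproducer
used = 900 * SZ_1M    # the deleted padding file still counts until a commit

# Early in the 512M write there is still room; later the free space runs out.
print(should_commit_async(limit, used, 10 * SZ_1M))
print(should_commit_async(limit, used, 100 * SZ_1M))
```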

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin

    Qu Wenruo
     
  • [ Upstream commit 5330367fa300742a97e20e953b1f77f48392faae ]

    After we ALIGN up the address, we need to make sure it did not overflow
    and result in a zero address. In that case, we need to make sure that
    the returned address is greater than mmap_min_addr.

    This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
    run as non root user for

    mmap(-1, MAP_HUGETLB)

    The bug is that a non-root user requesting address -1 will be given address 0
    which will then fail, whereas they should have been given something else that
    would have succeeded.

    With this change we also avoid the first mmap(-1, MAP_HUGETLB) returning a
    NULL address as the mmap address. We think this is not a security issue, because
    it only affects whether we choose an address below mmap_min_addr, not whether we
    actually allow that address to be mapped, i.e. there are existing capability
    checks to prevent a user mapping below mmap_min_addr and those will still be
    honoured even without this fix.
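
    The overflow is easy to reproduce with 64-bit arithmetic. The sketch
    below is an illustration only: align_up() mimics an ALIGN-style
    round-up with wraparound, pick_hint() is a hypothetical helper, and the
    16M huge page size and the mmap_min_addr value are assumptions.

```python
MASK64 = (1 << 64) - 1


def align_up(addr, size):
    """Round addr up to a multiple of size (a power of two), wrapping at 64 bits."""
    return (addr + size - 1) & ~(size - 1) & MASK64


def pick_hint(addr, size, mmap_min_addr):
    """Return the aligned hint, or None when it must be ignored."""
    aligned = align_up(addr, size)
    # A wrapped-around hint lands at 0, which is below mmap_min_addr.
    return aligned if aligned >= mmap_min_addr else None


HPAGE = 1 << 24          # assumed 16M huge page
MMAP_MIN_ADDR = 0x10000  # assumed value for the example

print(align_up(MASK64, HPAGE))                    # address -1 wraps around
print(pick_hint(MASK64, HPAGE, MMAP_MIN_ADDR))
print(hex(pick_hint(0x20000001, HPAGE, MMAP_MIN_ADDR)))
```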

    Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb")
    Reviewed-by: Laurent Dufour
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Aneesh Kumar K.V
     
  • [ Upstream commit 032ebd8548c9d05e8d2bdc7a7ec2fe29454b0ad0 ]

    L1 tables are allocated with __get_dma_pages, and therefore already
    ignored by kmemleak.

    Without this, the kernel would print this error message on boot,
    when the first L1 table is allocated:

    [ 2.810533] kmemleak: Trying to color unknown object at 0xffffffd652388000 as Black
    [ 2.818190] CPU: 5 PID: 39 Comm: kworker/5:0 Tainted: G S 4.19.16 #8
    [ 2.831227] Workqueue: events deferred_probe_work_func
    [ 2.836353] Call trace:
    ...
    [ 2.852532] paint_ptr+0xa0/0xa8
    [ 2.855750] kmemleak_ignore+0x38/0x6c
    [ 2.859490] __arm_v7s_alloc_table+0x168/0x1f4
    [ 2.863922] arm_v7s_alloc_pgtable+0x114/0x17c
    [ 2.868354] alloc_io_pgtable_ops+0x3c/0x78
    ...

    Fixes: e5fc9753b1a8314 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
    Signed-off-by: Nicolas Boichat
    Acked-by: Will Deacon
    Signed-off-by: Joerg Roedel
    Signed-off-by: Sasha Levin

    Nicolas Boichat
     
  • [ Upstream commit 74ffe79ae538283bbf7c155e62339f1e5c87b55a ]

    Unwinding is mostly done with irqs enabled; however, SLUB may call it
    with irqs disabled while creating a new SLUB cache.

    I had a system freeze while loading a module which called
    kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
    interrupts and then

    ->new_slab_objects()
    ->new_slab()
    ->setup_object()
    ->setup_object_debug()
    ->init_tracking()
    ->set_track()
    ->save_stack_trace()
    ->save_stack_trace_tsk()
    ->walk_stackframe()
    ->unwind_frame()
    ->unwind_find_idx()
    =>spin_lock_irqsave(&unwind_lock);

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Russell King
    Signed-off-by: Sasha Levin

    Sebastian Andrzej Siewior
     
  • [ Upstream commit fe9ed6d2483fda55465f32924fb15bce0fac3fac ]

    Like the other OF-enabled drivers, use the port number from the firmware if
    the devicetree specifies an alias:

    aliases {
    ...
    serial2 = &uart2; /* Should be ttyS2 */
    }

    This is how the deprecated pxa.c driver behaved; switching to 8250_pxa
    messes up the numbering.
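
    The mapping from alias to port number can be pictured with a small
    model. It is a sketch, not the OF API: alias_id() is a hypothetical
    helper, and the aliases are represented as a plain dictionary.

```python
def alias_id(aliases, stem, node_label):
    """Return N for an alias '<stem><N>' that points at node_label, else None."""
    for name, target in aliases.items():
        suffix = name[len(stem):]
        if name.startswith(stem) and suffix.isdigit() and target == node_label:
            return int(suffix)
    return None


aliases = {"serial2": "&uart2"}  # from the devicetree fragment above

port = alias_id(aliases, "serial", "&uart2")
print("ttyS%d" % port)
```

    When no alias exists, a driver would fall back to its own numbering,
    which is the case the None return stands for here.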

    Signed-off-by: Lubomir Rintel
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin

    Lubomir Rintel
     
  • [ Upstream commit 5666dfd1d8a45a167f0d8b4ef47ea7f780b1f24a ]

    SDM845 has ETMv4.2 and can use the existing etm4x driver.
    But the current etm driver checks only for ETMv4.0 and
    errors out for other etm4x versions. This patch adds the
    missing support so that SoCs with ETMv4.x can use the same
    driver, by checking only the ETM architecture major version
    number.
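
    The difference between the two checks can be sketched as follows. The
    encoding is an assumption made for illustration (major version in the
    high nibble, minor version in the low nibble, so v4.2 is 0x42), and the
    function names are invented for this example.

```python
ETM_ARCH_V4 = 0x40  # assumed encoding: major 4, minor 0


def arch_supported_old(arch):
    """Exact match: accepts v4.0 only."""
    return arch == ETM_ARCH_V4


def arch_supported_new(arch):
    """Major-version match: accepts any v4.x."""
    return (arch & 0xF0) == ETM_ARCH_V4


for arch in (0x40, 0x42):
    print(hex(arch), arch_supported_old(arch), arch_supported_new(arch))
```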

    Without this change, we get the error below during etm probe:

    / # dmesg | grep etm
    [ 6.660093] coresight-etm4x: probe of 7040000.etm failed with error -22
    [ 6.666902] coresight-etm4x: probe of 7140000.etm failed with error -22
    [ 6.673708] coresight-etm4x: probe of 7240000.etm failed with error -22
    [ 6.680511] coresight-etm4x: probe of 7340000.etm failed with error -22
    [ 6.687313] coresight-etm4x: probe of 7440000.etm failed with error -22
    [ 6.694113] coresight-etm4x: probe of 7540000.etm failed with error -22
    [ 6.700914] coresight-etm4x: probe of 7640000.etm failed with error -22
    [ 6.707717] coresight-etm4x: probe of 7740000.etm failed with error -22

    With this change, etm probe is successful:

    / # dmesg | grep etm
    [ 6.659198] coresight-etm4x 7040000.etm: CPU0: ETM v4.2 initialized
    [ 6.665848] coresight-etm4x 7140000.etm: CPU1: ETM v4.2 initialized
    [ 6.672493] coresight-etm4x 7240000.etm: CPU2: ETM v4.2 initialized
    [ 6.679129] coresight-etm4x 7340000.etm: CPU3: ETM v4.2 initialized
    [ 6.685770] coresight-etm4x 7440000.etm: CPU4: ETM v4.2 initialized
    [ 6.692403] coresight-etm4x 7540000.etm: CPU5: ETM v4.2 initialized
    [ 6.699024] coresight-etm4x 7640000.etm: CPU6: ETM v4.2 initialized
    [ 6.705646] coresight-etm4x 7740000.etm: CPU7: ETM v4.2 initialized
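
    A minimal user-space sketch of a major-version-only check. It assumes the
    architecture version is packed into one byte, with the major version in
    the upper nibble and the minor version in the lower nibble (so ETMv4.2
    would read as 0x42). The helper name and the encoding are assumptions
    made for illustration, not the driver's actual code:

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed encoding: major version in bits [7:4], minor in bits [3:0]. */
    #define ETM_ARCH_MAJOR_V4 0x4

    /* Hypothetical helper: accept any ETMv4.x by ignoring the minor version. */
    bool etm4_arch_supported_sketch(uint8_t arch)
    {
        return ((arch >> 4) & 0xf) == ETM_ARCH_MAJOR_V4;
    }
    ```

    With this shape, 0x40 and 0x42 are both accepted, while a value with a
    different major nibble is rejected.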

    Signed-off-by: Sai Prakash Ranjan
    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin

    Sai Prakash Ranjan
     
  • [ Upstream commit e7140639b1de65bba435a6bd772d134901141f86 ]

    When building with -Wsometimes-uninitialized, Clang warns:

    arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
    uninitialized whenever 'if' condition is false
    [-Wsometimes-uninitialized]
    if (cpu_has_feature(CPU_FTRS_POWER9))
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
    if (opcode == NULL)
    ^~~~~~
    arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
    condition is always true
    if (cpu_has_feature(CPU_FTRS_POWER9))
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
    'opcode' to silence this warning
    const struct powerpc_opcode *opcode;
    ^
    = NULL
    1 warning generated.

    This warning seems to make no sense on the surface because opcode is set
    to NULL right below this statement. However, there is a comma instead of
    semicolon to end the dialect assignment, meaning that the opcode
    assignment only happens in the if statement. Properly terminate that
    line so that Clang no longer warns.
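
    The effect of the stray comma can be reproduced in a few lines. This is
    a sketch with made-up variable names (opcode_reset stands in for the
    "opcode = NULL" assignment), not the xmon source:

    ```c
    /* Buggy shape: the trailing comma joins both assignments into one
     * expression statement, so opcode_reset is only set when cond is true. */
    int comma_bug(int cond)
    {
        int dialect = 0;
        int opcode_reset = 0;

        if (cond)
            dialect |= 1,   /* comma operator, not a statement end */
        opcode_reset = 1;

        (void)dialect;
        return opcode_reset;
    }

    /* Fixed shape: the semicolon ends the if body after the first assignment,
     * so opcode_reset is always set. */
    int comma_fixed(int cond)
    {
        int dialect = 0;
        int opcode_reset = 0;

        if (cond)
            dialect |= 1;
        opcode_reset = 1;

        (void)dialect;
        return opcode_reset;
    }
    ```

    When cond is false, comma_bug() leaves opcode_reset untouched, which is
    the path Clang's -Wsometimes-uninitialized points at.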

    Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)")
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Nathan Chancellor
     
  • [ Upstream commit 9390dff66a52d1a60c6e517d8fa6cdbdffc83cb1 ]

    If include/config/auto.conf.cmd is lost for some reason, it is not
    self-healing, so the top Makefile fails to run syncconfig.
    Move include/config/auto.conf.cmd to the target side.

    I used a pattern rule instead of a normal rule here although it is
    a bit gross.

    If the rule were written with a normal rule like this,

    include/config/auto.conf \
    include/config/auto.conf.cmd \
    include/config/tristate.conf: $(KCONFIG_CONFIG)
    $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig

    ... syncconfig would be executed per target.

    Using a pattern rule makes sure that syncconfig is executed just once
    because Make assumes the recipe will create all of the targets.

    Here is a quote from the GNU Make manual [1]:

    "Pattern rules may have more than one target. Unlike normal rules,
    this does not act as many different rules with the same prerequisites
    and recipe. If a pattern rule has multiple targets, make knows that
    the rule's recipe is responsible for making all of the targets. The
    recipe is executed only once to make all the targets. When searching
    for a pattern rule to match a target, the target patterns of a rule
    other than the one that matches the target in need of a rule are
    incidental: make worries only about giving a recipe and prerequisites
    to the file presently in question. However, when this file's recipe is
    run, the other targets are marked as having been updated themselves."

    [1]: https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.html

    Signed-off-by: Masahiro Yamada
    Signed-off-by: Sasha Levin

    Masahiro Yamada
     
  • [ Upstream commit 1749ef00f7312679f76d5e9104c5d1e22a829038 ]

    We had a test-report where, under memory pressure, adding LUNs to the
    systems would fail (the tests add LUNs strictly in sequence):

    [ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM 2107900 .148 PQ: 0 ANSI: 5
    [ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
    [ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
    [ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
    [ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
    [ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
    [ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
    [ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
    [ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [ 5525.857838] sdk: sdk1
    [ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
    [ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
    [ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
    [ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
    [ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
    [ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
    [ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

    Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
    there seems to be no reason to use GFP_ATOMIC here. All the different
    call-contexts use a mutex at some point, and nothing in between that
    requires no sleeping, as far as I could see. Additionally, the code that
    later allocates the block queue for the device (scsi_mq_alloc_queue())
    already uses GFP_KERNEL.

    There are similar allocations in two other functions,
    scsi_probe_and_add_lun() and scsi_add_lun(), that can also be done with
    GFP_KERNEL.

    Here are the contexts for the three functions so far:

    scsi_alloc_sdev()
    scsi_probe_and_add_lun()
    scsi_sequential_lun_scan()
    __scsi_scan_target()
    scsi_scan_target()
    mutex_lock()
    scsi_scan_channel()
    scsi_scan_host_selected()
    mutex_lock()
    scsi_report_lun_scan()
    __scsi_scan_target()
    ...
    __scsi_add_device()
    mutex_lock()
    __scsi_scan_target()
    ...
    scsi_report_lun_scan()
    ...
    scsi_get_host_dev()
    mutex_lock()

    scsi_probe_and_add_lun()
    ...

    scsi_add_lun()
    scsi_probe_and_add_lun()
    ...

    So replace all these, and give them a bit of a better chance to succeed,
    with more chances of reclaim.

    Signed-off-by: Benjamin Block
    Reviewed-by: Bart Van Assche
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Sasha Levin

    Benjamin Block
     
  • [ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ]

    We store 2 multilevel tables in iommu_table - one for the hardware and
    one with the corresponding userspace addresses. Before allocating
    the tables, the iommu_table_group_ops::get_table_size() hook returns
    the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
    the locked_vm counter correctly. When the table is actually allocated,
    the amount of allocated memory is stored in iommu_table::it_allocated_size
    and used to decrement the locked_vm counter when we release the memory
    used by the table; .get_table_size() and .create_table() calculate it
    independently but the result is expected to be the same.

    However the allocator does not add the userspace table size to
    .it_allocated_size so when we destroy the table because of VFIO PCI
    unplug (i.e. VFIO container is gone but the userspace keeps running),
    we decrement locked_vm by just half the size of the memory we are
    releasing.

    To make things worse, since we enabled on-demand allocation of
    indirect levels, it_allocated_size contains only the amount of memory
    actually allocated at the table creation time which can just be a
    fraction. It is not a problem with incrementing locked_vm (as
    get_table_size() value is used) but it is with decrementing.

    As a result, we leak locked_vm and may not be able to allocate more
    IOMMU tables after a few iterations of hotplug/unplug.
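
    The accounting mismatch can be modelled in user space. This is a toy
    sketch: the struct, the function names and the byte sizes are all
    hypothetical, and it only mirrors the charge/release arithmetic described
    above:

    ```c
    #include <stddef.h>

    /* Toy model: sizes of the hardware table and the userspace table, in bytes. */
    struct table_sketch {
        size_t hw;
        size_t user;
    };

    /* Amount charged to locked_vm at creation: both tables. */
    size_t charged_on_create(const struct table_sketch *t)
    {
        return t->hw + t->user;
    }

    /* Buggy release: only the hardware table size was recorded. */
    size_t released_buggy(const struct table_sketch *t)
    {
        return t->hw;
    }

    /* Fixed release: the same amount that was charged. */
    size_t released_fixed(const struct table_sketch *t)
    {
        return charged_on_create(t);
    }

    /* locked_vm left over after n create/destroy cycles. */
    size_t leaked_after(const struct table_sketch *t, unsigned int n, int fixed)
    {
        size_t locked_vm = 0;

        for (unsigned int i = 0; i < n; i++) {
            locked_vm += charged_on_create(t);
            locked_vm -= fixed ? released_fixed(t) : released_buggy(t);
        }
        return locked_vm;
    }
    ```

    In this model every buggy cycle leaves the userspace table's size behind
    in locked_vm, so the leak grows linearly with the number of unplugs.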

    This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
    hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
    have a single place which calculates the maximum memory a table can
    occupy. The original meaning of it_allocated_size is somewhat lost now
    though.

    We do not ditch it_allocated_size whatsoever here and we do not call
    get_table_size() from vfio_iommu_spapr_tce.c when decrementing
    locked_vm as we may have multiple IOMMU groups per container and even
    though they all are supposed to have the same get_table_size()
    implementation, there is a small chance for failure or confusion.

    Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
    Signed-off-by: Alexey Kardashevskiy
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Alexey Kardashevskiy
     
  • [ Upstream commit 68ef236274793066b9ba3154b16c0acc1c891e5c ]

    According to the chipidea driver bindings, the USB PHY is specified via
    the "phys" phandle node. However, this only takes effect for USB PHYs
    that use the common PHY framework. For legacy USB PHYs, a simple lookup
    based on the USB PHY type is done instead.

    This does not play out well when more than one USB PHY is registered,
    since the first registered PHY matching the type will always be
    returned regardless of what the driver was bound to.

    Fix this by looking up the PHY based on the "phys" phandle node.
    Although generic PHYs are rather matched by their "phys-name" and not
    the "phys" phandle directly, there is no helper for similar lookup on
    legacy PHYs and it's probably not worth the effort to add it.

    When no legacy USB PHY is found by phandle, fall back to grabbing any
    registered USB2 PHY. This ensures backward compatibility if some users
    were actually relying on this mechanism.

    Signed-off-by: Paul Kocialkowski
    Signed-off-by: Peter Chen
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin

    Paul Kocialkowski
     
  • [ Upstream commit 41798036430015ad45137db2d4c213cd77fd0251 ]

    The cavium/zip implementation of the deflate compression algorithm is
    incorrectly being registered under the generic driver name, which
    prevents the generic implementation from being registered with the
    crypto API when CONFIG_CRYPTO_DEV_CAVIUM_ZIP=y. Similarly the lzs
    algorithm (which does not currently have a generic implementation...)
    is incorrectly being registered as lzs-generic.

    Fix the naming collision by adding a suffix "-cavium" to the
    cra_driver_name of the cavium/zip algorithms.

    Fixes: 640035a2dc55 ("crypto: zip - Add ThunderX ZIP driver core")
    Cc: Mahipal Challa
    Cc: Jan Glauber
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin

    Eric Biggers
     
  • [ Upstream commit 8c2b43d2d85b48a97d2f8279278a4aac5b45f925 ]

    Add an of_node_put when a tested device node is not available.

    The semantic patch that fixes this problem is as follows
    (http://coccinelle.lip6.fr):

    //
    @@
    identifier f;
    local idexpression e;
    expression x;
    @@

    e = f(...);
    ... when != of_node_put(e)
    when != x = e
    when != e = x
    when any
    if () {
    ... when != of_node_put(e)
    (
    return e;
    |
    + of_node_put(e);
    return ...;
    )
    }
    //

    Fixes: 5343e674f32fb ("crypto4xx: integrate ppc4xx-rng into crypto4xx")
    Signed-off-by: Julia Lawall
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin

    Julia Lawall
     
  • [ Upstream commit 34e022d8b780a03902d82fb3997ba7c7b1f40c81 ]

    The call to of_find_node_by_phandle returns a node pointer with refcount
    incremented thus it must be explicitly decremented after the last
    usage.
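
    The get/put pairing can be sketched with a toy refcounted object. The
    struct and helpers below are hypothetical user-space stand-ins, not the
    kernel's device node API; the point is only that every exit path,
    including the early error path, drops the reference:

    ```c
    /* Toy stand-in for a refcounted device node. */
    struct node_sketch {
        int refcount;
        int available;
    };

    static struct node_sketch *node_get(struct node_sketch *n)
    {
        n->refcount++;
        return n;
    }

    static void node_put(struct node_sketch *n)
    {
        n->refcount--;
    }

    /* Returns 0 on success, -1 if the node is not available. */
    int probe_sketch(struct node_sketch *n)
    {
        struct node_sketch *np = node_get(n);
        int ret = 0;

        if (!np->available) {
            ret = -1;
            goto out_put;   /* an early return here would leak the reference */
        }

        /* ... use np ... */

    out_put:
        node_put(np);
        return ret;
    }
    ```

    After either outcome the reference count is back to its starting value.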

    Detected by coccinelle with the following warnings:
    ./drivers/net/wireless/mediatek/mt76/eeprom.c:58:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
    ./drivers/net/wireless/mediatek/mt76/eeprom.c:61:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
    ./drivers/net/wireless/mediatek/mt76/eeprom.c:67:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
    ./drivers/net/wireless/mediatek/mt76/eeprom.c:70:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
    ./drivers/net/wireless/mediatek/mt76/eeprom.c:72:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.

    Signed-off-by: Wen Yang
    Cc: Felix Fietkau
    Cc: Lorenzo Bianconi
    Cc: Kalle Valo
    Cc: "David S. Miller"
    Cc: Matthias Brugger
    Cc: linux-wireless@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mediatek@lists.infradead.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Kalle Valo
    Signed-off-by: Sasha Levin

    Wen Yang
     
  • [ Upstream commit de77a53c2d1e8fb3621e63e8e1f0f0c9a1a99ff7 ]

    ies1 or ies2 might be null when code inside
    _wil_cfg80211_merge_extra_ies accesses them.
    Add explicit check for null and make sure ies1/ies2 are not
    accessed in such a case.

    spos might be null and be accessed inside
    _wil_cfg80211_merge_extra_ies.
    Add explicit check for null in the while condition statement
    and make sure spos is not accessed in such a case.

    Signed-off-by: Alexei Avshalom Lazar
    Signed-off-by: Maya Erez
    Signed-off-by: Kalle Valo
    Signed-off-by: Sasha Levin

    Alexei Avshalom Lazar
     
  • [ Upstream commit 95c80bc6952b6a5badc7b702d23e5bf14d251e7c ]

    Dongdong reported a deadlock triggered by a hotplug event during a sysfs
    "remove" operation:

    pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
    # echo 1 > 0000:00:0c.0/remove

    PME and hotplug share an MSI/MSI-X vector. The sysfs "remove" side is:

    remove_store
    pci_stop_and_remove_bus_device_locked
    pci_lock_rescan_remove
    pci_stop_and_remove_bus_device
    ...
    pcie_pme_remove
    pcie_pme_suspend
    synchronize_irq # wait for hotplug IRQ handler
    pci_unlock_rescan_remove

    The hotplug side is:

    pciehp_ist
    pciehp_handle_presence_or_link_change
    pciehp_configure_device
    pci_lock_rescan_remove # wait for pci_unlock_rescan_remove()

    INFO: task bash:10913 blocked for more than 120 seconds.

    # ps -ax |grep D
    PID TTY STAT TIME COMMAND
    10913 ttyAMA0 Ds+ 0:00 -bash
    14022 ? D 0:00 [irq/745-pciehp]

    # cat /proc/14022/stack
    __switch_to+0x94/0xd8
    pci_lock_rescan_remove+0x20/0x28
    pciehp_configure_device+0x30/0x140
    pciehp_handle_presence_or_link_change+0x324/0x458
    pciehp_ist+0x1dc/0x1e0

    # cat /proc/10913/stack
    __switch_to+0x94/0xd8
    synchronize_irq+0x8c/0xc0
    pcie_pme_suspend+0xa4/0x118
    pcie_pme_remove+0x20/0x40
    pcie_port_remove_service+0x3c/0x58
    ...
    pcie_port_device_remove+0x2c/0x48
    pcie_portdrv_remove+0x68/0x78
    pci_device_remove+0x48/0x120
    ...
    pci_stop_bus_device+0x84/0xc0
    pci_stop_and_remove_bus_device_locked+0x24/0x40
    remove_store+0xa4/0xb8
    dev_attr_store+0x44/0x60
    sysfs_kf_write+0x58/0x80

    It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
    reasons.

    First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
    native hotplug interrupt handler as well as for the PME one, because they
    share one IRQ (as per the spec). That may deadlock if hotplug is signaled
    while pcie_pme_remove() is running and the latter calls
    pci_lock_rescan_remove() before the former.

    Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
    for the port, it will return without disabling the interrupt as expected by
    pcie_pme_remove() which was overlooked by commit c7b5a4e6e8fb ("PCI / PM:
    Fix native PME handling during system suspend/resume").

    To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
    its status and prevent the PME worker function from re-enabling it before
    calling free_irq() on it, which should be sufficient.

    Fixes: c7b5a4e6e8fb ("PCI / PM: Fix native PME handling during system suspend/resume")
    Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.com
    Reported-by: Dongdong Liu
    Signed-off-by: Rafael J. Wysocki
    [bhelgaas: add URL and deadlock details from Dongdong]
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Sasha Levin

    Rafael J. Wysocki
     
  • [ Upstream commit 7c5b019e3a638a5a290b0ec020f6ca83d2ec2aaa ]

    Fix buffer overflow observed when running perf test.

    The overflow occurs when evaluating "1ULL << (64 - 1)", which
    results in -9223372036854775808 and overflows the 20 character
    buffer.
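
    The arithmetic can be checked in user space. The sketch below assumes the
    value is printed as a signed 64-bit decimal, which is LLONG_MIN on
    two's-complement targets; the helper names are made up for illustration:

    ```c
    #include <limits.h>
    #include <stdio.h>

    /* Characters needed to print v in decimal, excluding the terminating NUL. */
    int decimal_len(long long v)
    {
        return snprintf(NULL, 0, "%lld", v);
    }

    /* Bytes a buffer needs to hold the string plus its terminating NUL. */
    int decimal_buf_size(long long v)
    {
        return decimal_len(v) + 1;
    }
    ```

    For LLONG_MIN the sign plus 19 digits already fill 20 characters, so a
    20-byte buffer has no room left for the NUL terminator.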

    It is possible this bug has been reported before but I still don't see
    any fix checked in:

    See: https://www.spinics.net/lists/linux-perf-users/msg07714.html

    Reported-by: Michael Sartain
    Reported-by: Mathias Krause
    Signed-off-by: Tony Jones
    Acked-by: Steven Rostedt (VMware)
    Cc: Frederic Weisbecker
    Fixes: f7d82350e597 ("tools/events: Add files to create libtraceevent.a")
    Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.de
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Tony Jones
     
  • [ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]

    guard_bio_eod() can truncate a segment in bio to allow it to do IO on
    odd last sectors of a device.

    It already checks if the IO starts past EOD, but it does not consider
    the possibility that an IO request starting within device boundaries
    can contain more than one segment past EOD.

    In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
    underflow bvec->bv_len.

    Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
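
    The underflow itself is plain unsigned wraparound. The sketch below uses
    hypothetical helper names and, for illustration, guards against the
    segment length rather than PAGE_SIZE; it is not the kernel function:

    ```c
    #include <stdbool.h>

    /* What an unchecked subtraction yields: unsigned arithmetic wraps around. */
    unsigned int shrink_unchecked(unsigned int len, unsigned int truncated)
    {
        return len - truncated;
    }

    /* Hypothetical guarded variant: refuse to shrink a segment of *len bytes
     * when truncated is larger than the segment, leaving *len untouched. */
    bool shrink_segment_sketch(unsigned int *len, unsigned int truncated)
    {
        if (truncated > *len)
            return false;
        *len -= truncated;
        return true;
    }
    ```

    Shrinking a 4096-byte segment by 8192 bytes without the guard produces a
    huge length instead of a negative one, which is what makes a later zeroing
    call dangerous.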

    This situation has been found on filesystems such as isofs and vfat,
    which don't check the device size before mount. If the device is
    smaller than the filesystem itself, a readahead on such a filesystem
    that spans EOD can trigger this situation, leading to a call to
    zero_user() with a wrong size, possibly corrupting memory.

    I didn't see any crash (or didn't let the system run long enough to
    check whether memory corruption would be hit somewhere), but adding
    instrumentation to guard_bio_eod() to check the truncated_bytes size was
    enough to see the error.

    The following script can trigger the error.

    MNT=/mnt
    IMG=./DISK.img
    DEV=/dev/loop0

    mkfs.vfat $IMG
    mount $IMG $MNT
    cp -R /etc $MNT &> /dev/null
    umount $MNT

    losetup -D

    losetup --find --show --sizelimit 16247280 $IMG
    mount $DEV $MNT

    find $MNT -type f -exec cat {} + >/dev/null

    Kudos to Eric Sandeen for coming up with the reproducer above

    Reviewed-by: Ming Lei
    Signed-off-by: Carlos Maiolino
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Carlos Maiolino
     
  • [ Upstream commit 6e876c3dd205d30b0db6850e97a03d75457df007 ]

    In jbd2_journal_commit_transaction(), if we are in abort mode,
    we may flush the buffer without setting the descriptor block checksum
    by jumping to start_journal_io. Then, when the fs is mounted,
    jbd2_descriptor_block_csum_verify() fails.

    [ 271.379811] EXT4-fs (vdd): shut down requested (2)
    [ 271.381827] Aborting journal on device vdd-8.
    [ 271.597136] JBD2: Invalid checksum recovering block 22199 in log
    [ 271.598023] JBD2: recovery failed
    [ 271.598484] EXT4-fs (vdd): error loading journal

    Fix this problem by still setting the descriptor block checksum if the
    descriptor buffer is not NULL.

    This checksum problem can be reproduced by xfstests generic/388.

    Signed-off-by: luojiajun
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Signed-off-by: Sasha Levin

    luojiajun
     
  • [ Upstream commit be0502a3f2e94211a8809a09ecbc3a017189b8fb ]

    TCP resets cause instant transition from established to closed state
    provided the reset is in-window. Endpoints that implement RFC 5961
    require resets to match the next expected sequence number.
    RST segments that are in-window (but that do not match RCV.NXT) are
    ignored, and a "challenge ACK" is sent back.

    The main problem for conntrack is that it's a middlebox, i.e. whereas an end
    host might have ACK'd SEQ (and would thus accept an RST with this
    sequence number), conntrack might not have seen this ACK (yet).

    Therefore we can't simply flag RSTs with non-exact match as invalid.

    This updates RST processing as follows:

    1. If the connection is in a state other than ESTABLISHED, nothing is
    changed, RST is subject to normal in-window check.

    2. If the RST's sequence number exactly matches RCV.NXT, the
    connection state moves to CLOSE.

    3. The same applies if the RST sequence number aligns with a previous
    packet in the same direction.

    In all other cases, the connection remains in ESTABLISHED state.
    If the normal-in-window check passes, the timeout will be lowered
    to that of CLOSE.

    If the peer sends a challenge ack, connection timeout will be reset.

    If the challenge ACK triggers another RST (RST was valid after all),
    this 2nd RST will match expected sequence and conntrack state changes to
    CLOSE.

    If no challenge ACK is received, the connection will time out after
    CLOSE seconds (10 seconds by default), just like without this patch.
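
    The rules above can be summarised as a small decision function. This is
    a simplified user-space model: the enum, struct, parameter names and the
    way the window check is folded in are all assumptions made for
    illustration, not conntrack's actual code:

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    enum ct_state_sketch { CT_ESTABLISHED, CT_CLOSE, CT_OTHER };

    struct rst_verdict {
        enum ct_state_sketch new_state;
        bool lower_timeout;   /* apply the CLOSE timeout without changing state */
    };

    /* seq: sequence number of the RST; rcv_nxt: next expected sequence number;
     * last_seq_same_dir: sequence number of a previous packet seen in the same
     * direction; in_window: result of the normal in-window check. */
    struct rst_verdict handle_rst_sketch(enum ct_state_sketch state,
                                         uint32_t seq, uint32_t rcv_nxt,
                                         uint32_t last_seq_same_dir,
                                         bool in_window)
    {
        struct rst_verdict v = { state, false };

        if (state != CT_ESTABLISHED) {
            /* Rule 1: unchanged, only the in-window check applies. */
            if (in_window)
                v.new_state = CT_CLOSE;
            return v;
        }

        /* Rules 2 and 3: exact match, or match with an earlier packet. */
        if (seq == rcv_nxt || seq == last_seq_same_dir) {
            v.new_state = CT_CLOSE;
            return v;
        }

        /* Otherwise stay ESTABLISHED; an in-window RST lowers the timeout. */
        v.lower_timeout = in_window;
        return v;
    }
    ```

    In this model an in-window RST with a non-matching sequence number keeps
    the entry in ESTABLISHED but shortens its lifetime, leaving room for a
    challenge ACK to reset the timeout.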

    Packetdrill test case:

    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    0.000 bind(3, ..., ...) = 0
    0.000 listen(3, 1) = 0

    0.100 < S 0:0(0) win 32792
    0.100 > S. 0:0(0) ack 1 win 64240
    0.200 < . 1:1(0) ack 1 win 257
    0.200 accept(3, ..., ...) = 4

    // Receive a segment.
    0.210 < P. 1:1001(1000) ack 1 win 46
    0.210 > . 1:1(0) ack 1001

    // Application writes 1000 bytes.
    0.250 write(4, ..., 1000) = 1000
    0.250 > P. 1:1001(1000) ack 1001

    // First reset, old sequence. Conntrack (correctly) considers this
    // invalid due to failed window validation (regardless of this patch).
    0.260 < R 2:2(0) ack 1001 win 260

    // 2nd reset, but too far ahead sequence. Same: correctly handled
    // as invalid.
    0.270 < R 99990001:99990001(0) ack 1001 win 260

    // in-window, but not exact sequence.
    // Current Linux kernels might reply with a challenge ack, and do not
    // remove connection.
    // Without this patch, conntrack state moves to CLOSE.
    // With patch, timeout is lowered like CLOSE, but connection stays
    // in ESTABLISHED state.
    0.280 < R 1010:1010(0) ack 1001 win 260

    // Expect challenge ACK
    0.281 > . 1001:1001(0) ack 1001 win 501

    // With or without this patch, RST will cause connection
    // to move to CLOSE (sequence number matches)
    // 0.282 < R 1001:1001(0) ack 1001 win 260

    // ACK
    0.300 < . 1001:1001(0) ack 1001 win 257

    // more data could be exchanged here, connection
    // is still established

    // Client closes the connection.
    0.610 < F. 1001:1001(0) ack 1001 win 260
    0.650 > . 1001:1001(0) ack 1002

    // Close the connection without reading outstanding data
    0.700 close(4) = 0

    // so one more reset. Will be deemed acceptable with patch as well:
    // connection is already closing.
    0.701 > R. 1001:1001(0) ack 1002 win 501
    // End packetdrill test case.

    With patch, this generates following conntrack events:
    [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
    [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
    [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
    [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
    [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
    [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

    Without patch, first RST moves connection to close, whereas socket state
    does not change until FIN is received.
    [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
    [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
    [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
    [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

    Cc: Jozsef Kadlecsik
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
     
  • [ Upstream commit a9f5e78c403d2d62ade4f4c85040efc85f4049b8 ]

    Check the result of dereferencing base_chain->stats against NULL, instead
    of the result of this_cpu_ptr.

    base_chain->stats may be changed to NULL when a chain is updated and a
    new NULL counter can be attached.

    And we do not need to check the return value of this_cpu_ptr, since
    base_chain->stats comes from the percpu allocator; if it is non-NULL,
    this_cpu_ptr returns a valid value.

    And fix two sparse errors by replacing rcu_access_pointer and
    rcu_dereference with READ_ONCE under rcu_read_lock.

    Thanks for Eric's help to finish this patch.

    Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
    Signed-off-by: Eric Dumazet
    Signed-off-by: Zhang Yu
    Signed-off-by: Li RongQing
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Li RongQing
     
  • [ Upstream commit 68e2672f8fbd1e04982b8d2798dd318bf2515dd2 ]

    There is a NULL pointer dereference of devname in strspn()

    The oops looks something like:

    CIFS: Attempting to mount (null)
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    ...
    RIP: 0010:strspn+0x0/0x50
    ...
    Call Trace:
    ? cifs_parse_mount_options+0x222/0x1710 [cifs]
    ? cifs_get_volume_info+0x2f/0x80 [cifs]
    cifs_setup_volume_info+0x20/0x190 [cifs]
    cifs_get_volume_info+0x50/0x80 [cifs]
    cifs_smb3_do_mount+0x59/0x630 [cifs]
    ? ida_alloc_range+0x34b/0x3d0
    cifs_do_mount+0x11/0x20 [cifs]
    mount_fs+0x52/0x170
    vfs_kern_mount+0x6b/0x170
    do_mount+0x216/0xdc0
    ksys_mount+0x83/0xd0
    __x64_sys_mount+0x25/0x30
    do_syscall_64+0x65/0x220
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fix this by adding a NULL check on devname in cifs_parse_devname()
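
    A user-space sketch of such a guard. The function name is hypothetical,
    and both the empty-string check and the "exactly two leading delimiters"
    rule are assumptions added for illustration; only the NULL check comes
    from the description above:

    ```c
    #include <errno.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical stand-in for the start of a device name parser. */
    int check_devname_sketch(const char *devname)
    {
        /* Reject NULL (and, as an assumption, empty) names before any
         * string function dereferences the pointer. */
        if (devname == NULL || *devname == '\0')
            return -EINVAL;

        /* Assumed rule: a UNC name starts with exactly two delimiters. */
        if (strspn(devname, "/\\") != 2)
            return -EINVAL;

        return 0;
    }
    ```

    With the guard in place a NULL name yields an error code instead of a
    crash inside strspn().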

    Signed-off-by: Yao Liu
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin

    Yao Liu
     
  • [ Upstream commit 969ae8e8d4ee54c99134d3895f2adf96047f5bee ]

    Old Windows versions or NetApp SMB servers will return
    NT_STATUS_NOT_SUPPORTED since they do not allow or implement
    FSCTL_VALIDATE_NEGOTIATE_INFO. The client should accept the response
    provided it's properly signed.

    See
    https://blogs.msdn.microsoft.com/openspecification/2012/06/28/smb3-secure-dialect-negotiation/

    and

    MS-SMB2 validate negotiate response processing:
    https://msdn.microsoft.com/en-us/library/hh880630.aspx

    Samba client had already handled it.
    https://bugzilla.samba.org/attachment.cgi?id=13285&action=edit

    Signed-off-by: Namjae Jeon
    Signed-off-by: Steve French
    Signed-off-by: Sasha Levin

    Namjae Jeon
     
  • [ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]

    We use the condition below to check the inline_xattr_size boundary:

    if (!F2FS_OPTION(sbi).inline_xattr_size ||
    F2FS_OPTION(sbi).inline_xattr_size >=
    DEF_ADDRS_PER_INODE -
    F2FS_TOTAL_EXTRA_ATTR_SIZE -
    DEF_INLINE_RESERVED_SIZE -
    DEF_MIN_INLINE_SIZE)

    There are three problems in that check:
    - we should allow inline_xattr_size to equal the min size of the inline
    {data,dentry} area.
    - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
    different size units: the former is in units of 4 bytes, the latter in
    units of 1 byte.
    - DEF_MIN_INLINE_SIZE only indicates the min size of the inline data area;
    however, we need to consider the min size of the inline dentry area as
    well. A minimal inline dentry should contain at least two entries, '.' and
    '..', so the min inline_dentry size is 40 bytes.

    .bitmap 1 * 1 = 1
    .reserved 1 * 1 = 1
    .dentry 11 * 2 = 22
    .filename 8 * 2 = 16
    total 40
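
    The same sum written out as code, using only the per-field byte counts
    from the table above. The constant and function names are made up for
    this sketch and are not f2fs macros:

    ```c
    /* Sizes in bytes, taken from the table above. */
    enum {
        BITMAP_BYTES        = 1,
        RESERVED_BYTES      = 1,
        DENTRY_BYTES        = 11,
        FILENAME_SLOT_BYTES = 8,
        MIN_DENTRIES        = 2    /* '.' and '..' */
    };

    /* Minimum inline dentry area: bitmap + reserved + two dentries + two
     * filename slots. */
    int min_inline_dentry_size(void)
    {
        return BITMAP_BYTES + RESERVED_BYTES +
               DENTRY_BYTES * MIN_DENTRIES +
               FILENAME_SLOT_BYTES * MIN_DENTRIES;
    }
    ```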

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit 70de2cbda8a5d788284469e755f8b097d339c240 ]

    Invoking dm_get_device() twice on the same device path with different
    modes is dangerous, because in that case upgrade_mode() will alloc a
    new 'dm_dev' and free the old one, which may be referenced by a previous
    caller. Dereferencing the dangling pointer will trigger a kernel NULL
    pointer dereference.

    The following two cases can reproduce this issue. Actually, they are
    invalid setups that must be disallowed, e.g.:

    1. Creating a thin-pool with read_only mode, and the same device as
    both metadata and data.

    dmsetup create thinp --table \
    "0 41943040 thin-pool /dev/vdb /dev/vdb 128 0 1 read_only"

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
    ...
    Call Trace:
    new_read+0xfb/0x110 [dm_bufio]
    dm_bm_read_lock+0x43/0x190 [dm_persistent_data]
    ? kmem_cache_alloc_trace+0x15c/0x1e0
    __create_persistent_data_objects+0x65/0x3e0 [dm_thin_pool]
    dm_pool_metadata_open+0x8c/0xf0 [dm_thin_pool]
    pool_ctr.cold.79+0x213/0x913 [dm_thin_pool]
    ? realloc_argv+0x50/0x70 [dm_mod]
    dm_table_add_target+0x14e/0x330 [dm_mod]
    table_load+0x122/0x2e0 [dm_mod]
    ? dev_status+0x40/0x40 [dm_mod]
    ctl_ioctl+0x1aa/0x3e0 [dm_mod]
    dm_ctl_ioctl+0xa/0x10 [dm_mod]
    do_vfs_ioctl+0xa2/0x600
    ? handle_mm_fault+0xda/0x200
    ? __do_page_fault+0x26c/0x4f0
    ksys_ioctl+0x60/0x90
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x55/0x150
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    2. Creating an external snapshot using the same thin-pool device.

    dmsetup create thinp --table \
    "0 41943040 thin-pool /dev/vdc /dev/vdb 128 0 2 ignore_discard"
    dmsetup message /dev/mapper/thinp 0 "create_thin 0"
    dmsetup create snap --table \
    "0 204800 thin /dev/mapper/thinp 0 /dev/mapper/thinp"

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    ...
    Call Trace:
    ? __alloc_pages_nodemask+0x13c/0x2e0
    retrieve_status+0xa5/0x1f0 [dm_mod]
    ? dm_get_live_or_inactive_table.isra.7+0x20/0x20 [dm_mod]
    table_status+0x61/0xa0 [dm_mod]
    ctl_ioctl+0x1aa/0x3e0 [dm_mod]
    dm_ctl_ioctl+0xa/0x10 [dm_mod]
    do_vfs_ioctl+0xa2/0x600
    ksys_ioctl+0x60/0x90
    ? ksys_write+0x4f/0xb0
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x55/0x150
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
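
    One way to reject both setups is to compare the device path arguments
    before any device is opened. The sketch below is a userspace model and
    not the actual patch: the helper names check_pool_devices() and
    check_thin_devices() are hypothetical, and a plain string comparison
    would miss two different paths that resolve to the same device.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <string.h>

    /* Hypothetical check for a thin-pool table: the metadata and data
     * arguments must not name the same device path. */
    static int check_pool_devices(const char *metadata_dev, const char *data_dev)
    {
        assert(metadata_dev && data_dev);
        return strcmp(metadata_dev, data_dev) == 0 ? -EINVAL : 0;
    }

    /* Hypothetical check for a thin table with an external origin: the
     * origin must not be the pool device itself. */
    static int check_thin_devices(const char *pool_dev, const char *origin_dev)
    {
        assert(pool_dev && origin_dev);
        return strcmp(pool_dev, origin_dev) == 0 ? -EINVAL : 0;
    }
    ```

    With the paths from the two reproducers above, both checks would
    return -EINVAL, so the table load would fail before a second
    dm_get_device() call could replace the 'dm_dev'.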

    Signed-off-by: Jason Cai (Xiang Feng)
    Signed-off-by: Mike Snitzer
    Signed-off-by: Sasha Levin

    Jason Cai (Xiang Feng)
     
  • [ Upstream commit 259594bea574e515a148171b5cd84ce5cbdc028a ]

    When compiling with -Wformat, clang emits the following warnings:

    fs/cifs/smb1ops.c:312:20: warning: format specifies type 'unsigned
    short' but the argument has type 'unsigned int' [-Wformat]
    tgt_total_cnt, total_in_tgt);
    ^~~~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:289:4: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->flags, ref->server_type);
    ^~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:289:16: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->flags, ref->server_type);
    ^~~~~~~~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:291:4: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->ref_flag, ref->path_consumed);
    ^~~~~~~~~~~~~

    fs/cifs/cifs_dfs_ref.c:291:19: warning: format specifies type 'short'
    but the argument has type 'int' [-Wformat]
    ref->ref_flag, ref->path_consumed);
    ^~~~~~~~~~~~~~~~~~

    The types of these arguments are unconditionally defined, so this patch
    updates the format characters to the correct ones for ints and unsigned
    ints.
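
    As a small illustration of the mismatch, the sketch below formats two
    unsigned int values with %u. The function name format_counts() and the
    format string are hypothetical; only the argument names are taken from
    the warning above. With %hu, a value above 65535 would be converted to
    unsigned short before printing and so would be truncated.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: writes both counters into buf and returns the
     * number of characters snprintf() would have written. */
    static int format_counts(char *buf, size_t len,
                             unsigned int tgt_total_cnt,
                             unsigned int total_in_tgt)
    {
        assert(buf != NULL);
        /* %u matches unsigned int; %hu would expect an unsigned short. */
        return snprintf(buf, len, "total=%u in_tgt=%u",
                        tgt_total_cnt, total_in_tgt);
    }
    ```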

    Link: https://github.com/ClangBuiltLinux/linux/issues/378

    Signed-off-by: Louis Taylor
    Signed-off-by: Steve French
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Sasha Levin

    Louis Taylor
     
  • [ Upstream commit 4117992df66a26fa33908b4969e04801534baab1 ]

    KASAN does not play well with page poisoning (CONFIG_PAGE_POISONING).
    It triggers false positives in the allocation path:

    BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
    Read of size 8 at addr ffff88881f800000 by task swapper/0
    CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
    Call Trace:
    dump_stack+0xe0/0x19a
    print_address_description.cold.2+0x9/0x28b
    kasan_report.cold.3+0x7a/0xb5
    __asan_report_load8_noabort+0x19/0x20
    memchr_inv+0x2ea/0x330
    kernel_poison_pages+0x103/0x3d5
    get_page_from_freelist+0x15e7/0x4d90

    because KASAN has not yet unpoisoned the shadow memory for the
    allocation when memchr_inv() checks the page, so it only finds stale
    shadow poison.

    False positives also occur in the free path:

    BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
    Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
    CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
    Call Trace:
    dump_stack+0xe0/0x19a
    print_address_description.cold.2+0x9/0x28b
    kasan_report.cold.3+0x7a/0xb5
    check_memory_region+0x22d/0x250
    memset+0x28/0x40
    kernel_poison_pages+0x29e/0x3d5
    __free_pages_ok+0x75f/0x13e0

    because KASAN adds poisoned redzones around slab objects, whereas page
    poisoning needs to poison the whole page.

    Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw
    Signed-off-by: Qian Cai
    Acked-by: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Qian Cai
     
  • [ Upstream commit 5704a06810682683355624923547b41540e2801a ]

    (Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647)

    Calling 'get_unused_fd_flags' in a kthread causes a kernel crash. It
    works fine on 4.1, but crashes after 64 fds have been obtained. It also
    crashes on ubuntu1404/1604/1804 and centos7.5, and the crash messages
    are almost the same.

    The crash message on centos7.5 is shown below:

    start fd 61
    start fd 62
    start fd 63
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: __wake_up_common+0x2e/0x90
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
    virtio floppy dm_mirror dm_region_hash dm_log dm_mod
    CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.3.3.el7.x86_64 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
    task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
    RIP: 0010:__wake_up_common+0x2e/0x90
    RSP: 0018:ffff8e94247a2d18 EFLAGS: 00010086
    RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
    RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
    R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
    FS: 0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
    Call Trace:
    __wake_up+0x39/0x50
    expand_files+0x131/0x250
    __alloc_fd+0x47/0x170
    get_unused_fd_flags+0x30/0x40
    test_fd+0x12a/0x1c0 [test]
    kthread+0xd1/0xe0
    ret_from_fork_nospec_begin+0x21/0x21
    Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
    RIP __wake_up_common+0x2e/0x90
    RSP
    CR2: 0000000000000000

    This issue has existed since CentOS 7.5 (3.10.0-862); CentOS 7.4
    (3.10.0-693.21.1) is OK. Root cause: the 'resize_wait' member is not
    initialized before being used.
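
    To illustrate why an uninitialized wait queue head is fatal, the
    sketch below is a userspace model of the circular list that a wait
    queue head embeds. The names init_list_head() and count_waiters() are
    hypothetical stand-ins; in the kernel, init_waitqueue_head() performs
    the initialization. A zeroed head has next == NULL rather than
    pointing at itself, which is consistent with the NULL dereference in
    __wake_up_common shown above.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct list_head {
        struct list_head *next, *prev;
    };

    /* An empty list points at itself; a zeroed head has next == NULL. */
    static void init_list_head(struct list_head *head)
    {
        head->next = head;
        head->prev = head;
    }

    /* Walk the list the way a wake-up routine would. This only works on
     * an initialized head. */
    static int count_waiters(const struct list_head *head)
    {
        int n = 0;

        assert(head->next != NULL); /* catches the never-initialized case */
        for (const struct list_head *p = head->next; p != head; p = p->next)
            n++;
        return n;
    }
    ```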

    Reported-by: Richard Zhang
    Reviewed-by: Andrew Morton
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Shuriyc Chu
     
  • [ Upstream commit 9083977dabf3833298ddcd40dee28687f1e6b483 ]

    Fix the warning below, which is caused by taking a mutex lock in atomic
    context.

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
    in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
    Preemption disabled at: __radix_tree_preload+0x28/0x130
    Call trace:
    dump_backtrace+0x0/0x2b4
    show_stack+0x20/0x28
    dump_stack+0xa8/0xe0
    ___might_sleep+0x144/0x194
    __might_sleep+0x58/0x8c
    mutex_lock+0x2c/0x48
    f2fs_trace_pid+0x88/0x14c
    f2fs_set_node_page_dirty+0xd0/0x184

    Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
    spin_lock() acquired.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sahitya Tummala
     
  • [ Upstream commit cc725ef3cb202ef2019a3c67c8913efa05c3cce6 ]

    While creating a node, a NULL pointer dereference occurs in the kernel
    if o2cb_ctl fails in the interval (mkdir,
    o2cb_set_node_attribute(node_num)] in function o2cb_add_node.

    The node number is initialized to 0 in o2nm_node_group_make_item, so
    o2nm_node_group_drop_item will mistake node number 0 for a valid
    node number when the node is deleted before its number has been set
    correctly. If the local node number of the current host happens to be
    0, cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
    o2hb_thread is still running. The panic stack is generated as follows:

    o2hb_thread
    \-o2hb_do_disk_heartbeat
    \-o2hb_check_own_slot
    |-slot = &reg->hr_slots[o2nm_this_node()];
    //o2nm_this_node() returns O2NM_INVALID_NODE_NUM

    We need to check whether the node number is set when we delete the node.
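
    The check can be modelled in userspace as below. This is a sketch
    under assumptions, not the patch itself: struct node, struct cluster,
    the num_set flag, drop_node() and the value of INVALID_NODE_NUM are
    all hypothetical stand-ins for the o2nm structures.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define INVALID_NODE_NUM 255u /* stand-in for O2NM_INVALID_NODE_NUM */

    struct node {
        unsigned int num; /* starts at 0, like a freshly made node */
        bool num_set;     /* hypothetical: true once the number is configured */
    };

    struct cluster {
        unsigned int local_node;
    };

    /* Only invalidate the local node when the dropped node's number was
     * really set; a default 0 must not match local node 0. */
    static void drop_node(struct cluster *c, const struct node *n)
    {
        assert(c != NULL && n != NULL);
        if (n->num_set && c->local_node == n->num)
            c->local_node = INVALID_NODE_NUM;
    }
    ```

    Dropping a half-created node (num_set == false) leaves local_node
    untouched even when both numbers are 0, which is the case that led to
    the panic described above.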

    Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.com
    Signed-off-by: Jia Guo
    Reviewed-by: Joseph Qi
    Acked-by: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Jia Guo