07 Oct, 2020

2 commits

  • Kernel threads intentionally do CLONE_FS in order to follow any changes
    that 'init' does to set up the root directory (or cwd).

    It is admittedly a bit odd, but it avoids the situation where 'init'
    does some extensive setup to initialize the system environment, and then
    we execute a usermode helper program, and it uses the original FS setup
    from boot time that may be very limited and incomplete.

    [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will
    follow the root regardless, since it fixes up other users of root (see
    chroot_fs_refs() for details), but overmounting root and doing a
    chroot() would not. ]

    However, Vegard Nossum noticed that the CLONE_FS not only means that we
    follow the root and current working directories, it also means we share
    umask with whatever init changed it to. That wasn't intentional.

    Just reset umask to the original default (0022) before actually starting
    the usermode helper program.
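
    The effect can be modelled in user space. This is a minimal sketch, not the kernel change itself: fork() copies the umask into the child, whereas CLONE_FS shares it, and child_umask() is a hypothetical helper name introduced only for this illustration. It reports which mask a child sees with and without the reset to 0022.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not kernel code): report the umask a child process
 * ends up with. 'parent_mask' stands in for whatever 'init' set its umask to.
 * Returns the mask seen by the child, or -1 on error.
 */
static int child_umask(mode_t parent_mask, int reset_in_child)
{
	int fds[2];
	mode_t seen = 0;
	mode_t saved;
	ssize_t n;
	pid_t pid;

	assert((parent_mask & ~(mode_t)0777) == 0);
	saved = umask(parent_mask);	/* the "init" side changes its umask */

	if (pipe(fds) < 0) {
		umask(saved);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		umask(saved);
		return -1;
	}

	if (pid == 0) {
		mode_t cur;

		if (reset_in_child)
			umask(0022);	/* reset to the default before running the helper */
		cur = umask(0);		/* umask() returns the previous mask */
		if (write(fds[1], &cur, sizeof(cur)) != (ssize_t)sizeof(cur))
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	n = read(fds[0], &seen, sizeof(seen));
	close(fds[0]);
	waitpid(pid, NULL, 0);
	umask(saved);			/* restore the caller's own mask */

	return n == (ssize_t)sizeof(seen) ? (int)seen : -1;
}
```

    Without the reset the child observes the parent's mask; with it, the child always starts from 0022.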

    Reported-by: Vegard Nossum
    Cc: Al Viro
    Acked-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Tetsuo Handa reports that splice() can return 0 before the real EOF, if
    the data in the splice source pipe is an empty pipe buffer. That empty
    pipe buffer case doesn't happen in any normal situation, but you can
    trigger it by doing a write to a pipe that fails due to a page fault.

    Tetsuo has a test-case to show the behavior:

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
            const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
            int pipe_fd[2] = { -1, -1 };
            pipe(pipe_fd);
            write(pipe_fd[1], NULL, 4096);
            /* This splice() should wait unless interrupted. */
            return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
    }

    which results in

    write(5, NULL, 4096) = -1 EFAULT (Bad address)
    splice(4, NULL, 3, NULL, 65536, 0) = 0

    and this can confuse splice() users into believing they have hit EOF
    prematurely.
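
    To see why a spurious 0 matters, consider the common drain loop that treats a return of 0 as end of file. The sketch below is an assumption about how a typical splice() user is written, not code from the report; splice_until_eof() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: move everything from the read end of a pipe into
 * out_fd. Like many splice() users, it treats a return value of 0 as EOF,
 * so a premature 0 would make it stop before the data is complete.
 * Returns the number of bytes moved, or -1 on error.
 */
static long splice_until_eof(int pipe_rd, int out_fd)
{
	long total = 0;

	assert(pipe_rd >= 0 && out_fd >= 0);
	for (;;) {
		ssize_t n = splice(pipe_rd, NULL, out_fd, NULL, 65536, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return -1;
		}
		if (n == 0)
			break;			/* interpreted as EOF */
		total += n;
	}
	return total;
}
```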

    The issue was introduced when the pipe write code started pre-allocating
    the pipe buffers before copying data from user space.

    This is a modified version of Tetsuo's original patch.

    Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot")
    Link: https://lore.kernel.org/linux-fsdevel/20201005121339.4063-1-penguin-kernel@I-love.SAKURA.ne.jp/
    Reported-by: Tetsuo Handa
    Acked-by: Tetsuo Handa
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Oct, 2020

2 commits

  • Pull x86 platform driver fixes from Andy Shevchenko:
    "We have some fixes for Tablet Mode reporting in particular, that users
    are complaining a lot about.

    Summary:

    - Attempt #3 of enabling Tablet Mode reporting w/o regressions

    - Improve battery recognition code in ASUS WMI driver

    - Fix Kconfig dependency warning for Fujitsu and LG laptop drivers

    - Add fixes in Thinkpad ACPI driver for _BCL method and NVRAM polling

    - Fix power supply extended topology in Mellanox driver

    - Fix memory leak in OLPC EC driver

    - Avoid static struct device in Intel PMC core driver

    - Add support for the touchscreen found in MPMAN Converter9 2-in-1

    - Update MAINTAINERS to reflect the real state of affairs"

    * tag 'platform-drivers-x86-v5.9-2' of git://git.infradead.org/linux-platform-drivers-x86:
    platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
    MAINTAINERS: Add Mark Gross and Hans de Goede as x86 platform drivers maintainers
    platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting
    platform/x86: intel-vbtn: Revert "Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360"
    platform/x86: intel_pmc_core: do not create a static struct device
    platform/x86: mlx-platform: Fix extended topology configuration for power supply units
    platform/x86: pcengines-apuv2: Fix typo on define of AMD_FCH_GPIO_REG_GPIO55_DEVSLP0
    platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP
    platform/x86: fix kconfig dependency warning for LG_LAPTOP
    platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
    platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360
    platform/x86: asus-wmi: Add BATC battery name to the list of supported
    platform/x86: asus-nb-wmi: Revert "Do not load on Asus T100TA and T200TA"
    platform/x86: touchscreen_dmi: Add info for the MPMAN Converter9 2-in-1
    Documentation: laptops: thinkpad-acpi: fix underline length build warning
    Platform: OLPC: Fix memleak in olpc_ec_probe

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Make sure SKB control block is in the proper state during IPSEC
    ESP-in-TCP encapsulation. From Sabrina Dubroca.

    2) Various kinds of attributes were not being cloned properly when we
    build new xfrm_state objects from existing ones. Fix from Antony
    Antony.

    3) Make sure to keep BTF sections, from Tony Ambardar.

    4) TX DMA channels need proper locking in lantiq driver, from Hauke
    Mehrtens.

    5) Honour route MTU during forwarding, always. From Maciej
    Żenczykowski.

    6) Fix races in kTLS which can result in crashes, from Rohit
    Maheshwari.

    7) Skip TCP DSACKs with ridiculous sequence ranges, from Priyaranjan
    Jha.

    8) Use correct address family in xfrm state lookups, from Herbert Xu.

    9) A bridge FDB flush should not clear out user managed fdb entries
    with the ext_learn flag set, from Nikolay Aleksandrov.

    10) Fix nested locking of netdev address lists, from Taehee Yoo.

    11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.

    12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.

    13) Don't free command entries in mlx5 while comp handler could still be
    running, from Eran Ben Elisha.

    14) Error flow of request_irq() in mlx5 is busted, due to an off by one
    we try to free an IRQ never allocated. From Maor Gottlieb.

    15) Fix leak when dumping netlink policies, from Johannes Berg.

    16) Sendpage cannot be performed when a page is a slab page, or the page
    count is < 1. Some subsystems such as nvme were doing so. Create a
    "sendpage_ok()" helper and use it as needed, from Coly Li.

    17) Don't leak request socket when using syncookies with mptcp, from
    Paolo Abeni.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
    net/core: check length before updating Ethertype in skb_mpls_{push,pop}
    net: mvneta: fix double free of txq->buf
    net_sched: check error pointer in tcf_dump_walker()
    net: team: fix memory leak in __team_options_register
    net: typhoon: Fix a typo Typoon --> Typhoon
    net: hinic: fix DEVLINK build errors
    net: stmmac: Modify configuration method of EEE timers
    tcp: fix syn cookied MPTCP request socket leak
    libceph: use sendpage_ok() in ceph_tcp_sendpage()
    scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
    drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
    tcp: use sendpage_ok() to detect misused .sendpage
    nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
    net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
    net: introduce helper sendpage_ok() in include/linux/net.h
    net: usb: pegasus: Proper error handing when setting pegasus' MAC address
    net: core: document two new elements of struct net_device
    netlink: fix policy dump leak
    net/mlx5e: Fix race condition on nhe->n pointer in neigh update
    net/mlx5e: Fix VLAN create flow
    ...

    Linus Torvalds
     

05 Oct, 2020

6 commits

  • Evaluating ACPI _BCL can fail, in which case the ACPI buffer size is set
    to 0. When this ACPI buffer is reused, AE_BUFFER_OVERFLOW is triggered.

    Re-initializing the buffer size lets the ACPI evaluation succeed.

    Fixes: 46445b6b896fd ("thinkpad-acpi: fix handle locate for video and query of _BCL")
    Signed-off-by: Aaron Ma
    Signed-off-by: Andy Shevchenko

    Aaron Ma
     
  • Linus Torvalds
     
  • Openvswitch allows dropping a packet's Ethernet header, therefore
    skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true
    and mac_len=0. In that case the pointer passed to skb_mod_eth_type()
    doesn't point to an Ethernet header and the new Ethertype is written at
    unexpected locations.

    Fix this by verifying that mac_len is big enough to contain an Ethernet
    header.
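
    The shape of the check can be sketched in user space as follows. This is a simplified model on a raw byte buffer, not the skb API; set_eth_type_checked() and the *_MODEL constants are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_MODEL		14	/* length of an Ethernet header */
#define ETH_TYPE_OFFSET_MODEL	12	/* Ethertype follows two 6-byte addresses */

/*
 * Hypothetical model: only rewrite the Ethertype when the MAC header is
 * really large enough to be an Ethernet header. Returns 1 if the Ethertype
 * was written, 0 if the update was skipped.
 */
static int set_eth_type_checked(uint8_t *mac_hdr, size_t mac_len,
				int ethernet, uint16_t ethertype)
{
	assert(mac_hdr != NULL);

	if (!ethernet || mac_len < ETH_HLEN_MODEL)
		return 0;	/* no Ethernet header here: leave memory alone */

	/* Store the Ethertype in network byte order. */
	mac_hdr[ETH_TYPE_OFFSET_MODEL] = (uint8_t)(ethertype >> 8);
	mac_hdr[ETH_TYPE_OFFSET_MODEL + 1] = (uint8_t)(ethertype & 0xff);
	return 1;
}
```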

    Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions")
    Signed-off-by: Guillaume Nault
    Acked-by: Davide Caratti
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • clang static analysis reports this problem:

    drivers/net/ethernet/marvell/mvneta.c:3465:2: warning:
    Attempt to free released memory
    kfree(txq->buf);
    ^~~~~~~~~~~~~~~

    When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs,
    it frees without poisoning txq->buf. The error is caught
    in the mvneta_setup_txqs() caller which handles the error
    by cleaning up all of the txqs with a call to
    mvneta_txq_sw_deinit which also frees txq->buf.

    Since mvneta_txq_sw_deinit() is a general cleaner, all of the
    partial cleaning in mvneta_txq_sw_init()'s error handling
    is not needed.
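
    The ownership rule described above can be sketched with a small user-space model. The struct and function names below are hypothetical stand-ins, not the driver's code: init never frees on failure, and the caller runs the one general cleanup routine.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-queue state. */
struct txq_model {
	void *buf;
	void *tso_hdrs;
};

static int live_allocs;	/* allocations not yet freed */

static void *model_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* General cleaner: frees everything and clears the pointers. */
static void txq_deinit(struct txq_model *q)
{
	model_free(q->buf);
	q->buf = NULL;
	model_free(q->tso_hdrs);
	q->tso_hdrs = NULL;
}

/* Init does no partial cleanup; 'fail_tso' simulates the second allocation failing. */
static int txq_init(struct txq_model *q, int fail_tso)
{
	assert(q != NULL);
	q->tso_hdrs = NULL;
	q->buf = model_alloc(64);
	if (!q->buf)
		return -1;
	q->tso_hdrs = fail_tso ? NULL : model_alloc(64);
	if (!q->tso_hdrs)
		return -1;	/* buf is left for txq_deinit() to free once */
	return 0;
}

/* Caller handles the error by running the general cleaner. */
static int setup_txq(struct txq_model *q, int fail_tso)
{
	int err = txq_init(q, fail_tso);

	if (err)
		txq_deinit(q);
	return err;
}
```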

    Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO")
    Signed-off-by: Tom Rix
    Signed-off-by: David S. Miller

    Tom Rix
     
  • Although we take RTNL on dump path, it is possible to
    skip RTNL on insertion path. So the following race condition
    is possible:

    rtnl_lock()             // no rtnl lock
                            mutex_lock(&idrinfo->lock);
                            // insert ERR_PTR(-EBUSY)
                            mutex_unlock(&idrinfo->lock);
    tc_dump_action()
    rtnl_unlock()

    So we have to skip those temporary -EBUSY entries on dump path
    too.
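
    A user-space sketch of skipping such placeholder entries while walking a table is shown below. It models the kernel's ERR_PTR()/IS_ERR() convention with hypothetical helpers (model_err_ptr(), model_is_err(), dump_walk()); it is not the tcf_dump_walker() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (model of ERR_PTR()). */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer lies in the top MODEL_MAX_ERRNO addresses (model of IS_ERR()). */
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Walk the table and count the entries that would be dumped. */
static size_t dump_walk(void *const *entries, size_t n)
{
	size_t dumped = 0;
	size_t i;

	assert(entries != NULL);
	for (i = 0; i < n; i++) {
		/* Skip empty slots and temporary error placeholders. */
		if (!entries[i] || model_is_err(entries[i]))
			continue;
		dumped++;
	}
	return dumped;
}
```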

    Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
    Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • The variable "i" isn't initialized back correctly after the first loop
    under the label inst_rollback gets executed.

    The value of "i" is assigned to be option_count - 1, and the ensuing
    loop (under alloc_rollback) begins by initializing i--.
    Thus, the value of i when the loop begins execution will now become
    i = option_count - 2.

    Thus, when kfree(dst_opts[i]) is called in the second loop in this
    order (i.e., inst_rollback followed by alloc_rollback),
    dst_opts[option_count - 2] is the first element freed, and
    dst_opts[option_count - 1] does not get freed, which causes a memory
    leak.

    This memory leak can be fixed, by assigning i = option_count (instead of
    option_count - 1).
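
    The off-by-one can be reproduced with a small model of the second rollback loop. freed_by_rollback() is a hypothetical helper written for this illustration; it counts how many entries a "for (i--; i >= 0; i--)" loop frees for a given starting value of i.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical model of the alloc_rollback loop: allocate 'option_count'
 * entries, then free them with a loop that pre-decrements 'i' from 'start'.
 * Returns the number of entries the rollback loop freed, or -1 on error.
 */
static int freed_by_rollback(int option_count, int start)
{
	void **dst_opts;
	int freed = 0;
	int i;

	assert(option_count > 0 && start <= option_count);

	dst_opts = calloc((size_t)option_count, sizeof(*dst_opts));
	if (!dst_opts)
		return -1;
	for (i = 0; i < option_count; i++)
		dst_opts[i] = malloc(16);

	i = start;
	for (i--; i >= 0; i--) {	/* same loop shape as in the commit message */
		free(dst_opts[i]);
		dst_opts[i] = NULL;
		freed++;
	}

	/* Release whatever the rollback missed so the model itself does not leak. */
	for (i = 0; i < option_count; i++)
		free(dst_opts[i]);
	free(dst_opts);
	return freed;
}
```

    Starting from option_count - 1 frees one entry too few; starting from option_count frees them all.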

    Fixes: 80f7c6683fe0 ("team: add support for per-port options")
    Reported-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
    Tested-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
    Signed-off-by: Anant Thazhemadam
    Signed-off-by: David S. Miller

    Anant Thazhemadam
     

04 Oct, 2020

11 commits

  • s/Typoon/Typhoon/

    Signed-off-by: Christophe JAILLET
    Signed-off-by: David S. Miller

    Christophe JAILLET
     
  • Fix many (lots deleted here) build errors in hinic by selecting NET_DEVLINK.

    ld: drivers/net/ethernet/huawei/hinic/hinic_hw_dev.o: in function `mgmt_watchdog_timeout_event_handler':
    hinic_hw_dev.c:(.text+0x30a): undefined reference to `devlink_health_report'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump':
    hinic_devlink.c:(.text+0x1c): undefined reference to `devlink_fmsg_u32_pair_put'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump':
    hinic_devlink.c:(.text+0x126): undefined reference to `devlink_fmsg_binary_pair_put'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_hw_reporter_dump':
    hinic_devlink.c:(.text+0x1ba): undefined reference to `devlink_fmsg_string_pair_put'
    ld: hinic_devlink.c:(.text+0x227): undefined reference to `devlink_fmsg_u8_pair_put'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_alloc':
    hinic_devlink.c:(.text+0xaee): undefined reference to `devlink_alloc'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_free':
    hinic_devlink.c:(.text+0xb04): undefined reference to `devlink_free'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_register':
    hinic_devlink.c:(.text+0xb26): undefined reference to `devlink_register'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_unregister':
    hinic_devlink.c:(.text+0xb46): undefined reference to `devlink_unregister'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_create':
    hinic_devlink.c:(.text+0xb75): undefined reference to `devlink_health_reporter_create'
    ld: hinic_devlink.c:(.text+0xb95): undefined reference to `devlink_health_reporter_create'
    ld: hinic_devlink.c:(.text+0xbac): undefined reference to `devlink_health_reporter_destroy'
    ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_destroy':

    Fixes: 51ba902a16e6 ("net-next/hinic: Initialize hw interface")
    Signed-off-by: Randy Dunlap
    Cc: Bin Luo
    Cc: "David S. Miller"
    Cc: Jakub Kicinski
    Cc: Aviad Krawczyk
    Cc: Zhao Chen
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • The ethtool manual states that the tx-timer is "the amount of time the
    device should stay in idle mode prior to asserting its Tx LPI". The
    previous implementation for "ethtool --set-eee tx-timer" sets the LPI TW
    timer duration which is not correct. Hence, this patch fixes the
    "ethtool --set-eee tx-timer" to configure the EEE LPI timer.

    The LPI TW Timer will be using the defined default value instead of
    "ethtool --set-eee tx-timer" which follows the EEE LS timer implementation.

    Changelog V2
    *Not removing/modifying the eee_timer.
    *EEE LPI timer can be configured through ethtool and also the eee_timer
    module param.
    *EEE TW Timer will be configured with default value only, not able to be
    configured through ethtool or module param. This follows the implementation
    of the EEE LS Timer.

    Fixes: d765955d2ae0 ("stmmac: add the Energy Efficient Ethernet support")
    Signed-off-by: Vineetha G. Jaya Kumaran
    Signed-off-by: Voon Weifeng
    Signed-off-by: David S. Miller

    Vineetha G. Jaya Kumaran
     
  • Pull kvm fixes from Paolo Bonzini:
    "Two bugfixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept
    KVM: arm64: Restore missing ISB on nVHE __tlb_switch_to_guest

    Linus Torvalds
     
  • Pull xen fix from Juergen Gross:
    "Fix a regression introduced in 5.9-rc3 which caused a system running
    as fully virtualized guest under Xen to crash when using legacy
    devices like a floppy"

    * tag 'for-linus-5.9b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/events: don't use chip_data for legacy IRQs

    Linus Torvalds
     
  • Pull USB/PHY fixes from Greg KH:
    "Here are some small USB and PHY driver fixes for 5.9-rc8

    The PHY driver fix resolves an issue found by Dan Carpenter for a
    memory leak.

    The USB fixes fall into two groups:

    - usb gadget fix from Bryan that is a fix for a previous security fix
    that showed up in in-the-wild testing

    - usb core driver matching bugfixes. This fixes a bug that has
    plagued both the usbip driver and syzbot testing tools this -rc
    release cycle. All is now working properly so usbip connections
    will work, and syzbot can get back to fuzzing USB drivers properly.

    All have been in linux-next for a while with no reported issues"

    * tag 'usb-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    usbcore/driver: Accommodate usbip
    usbcore/driver: Fix incorrect downcast
    usbcore/driver: Fix specific driver selection
    Revert "usbip: Implement a match function to fix usbip"
    USB: gadget: f_ncm: Fix NDP16 datagram validation
    phy: ti: am654: Fix a leak in serdes_am654_probe()

    Linus Torvalds
     
  • Pull i2c fixes from Wolfram Sang:
    "Some more driver fixes for i2c"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: npcm7xx: Clear LAST bit after a failed transaction.
    i2c: cpm: Fix i2c_ram structure
    i2c: i801: Exclude device from suspend direct complete optimization

    Linus Torvalds
     
  • Pull input fixes from Dmitry Torokhov:
    "A couple more driver quirks, now enabling newer trackpoints from
    Synaptics for real"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
    Input: trackpoint - enable Synaptics trackpoints

    Linus Torvalds
     
  • One of the entries has three fields "mistake||correction||correction"
    rather than the expected two fields "mistake||correction". Fix it.

    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200930234359.255295-1-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
  • memalloc_nocma_{save/restore} APIs can be used to skip page allocation
    on CMA area, but, there is a missing case and the page on CMA area could
    be allocated even if APIs are used. This patch handles this case to fix
    the potential issue.

    For now, these APIs are used to prevent long-term pinning on the CMA
    page. When the long-term pinning is requested on the CMA page, it is
    migrated to the non-CMA page before pinning. This non-CMA page is
    allocated by using memalloc_nocma_{save/restore} APIs. If the APIs don't
    work as intended, the CMA page is allocated and it is pinned for a long
    time. This long-term pin for the CMA page causes cma_alloc() failure
    and it could result in wrong behaviour on the device driver who uses the
    cma_alloc().

    Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
    could have the pages on CMA area so we need to skip it if ALLOC_CMA
    isn't specified.

    Fixes: 8510e69c8efe ("mm/page_alloc: fix memalloc_nocma_{save/restore} APIs")
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: "Aneesh Kumar K . V"
    Cc: Mel Gorman
    Link: https://lkml.kernel.org/r/1601429472-12599-1-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • The routine that applies debug flags to the kmem_cache slabs
    inadvertently prevents non-debug flags from being applied to those
    same objects. That is, if slub_debug=<flag>,<slab> is specified,
    non-debugged slabs will end up having flags of zero, and the slabs
    may be unusable.

    Fix this by including the input flags for non-matching slabs with the
    contents of slub_debug, so that the caches are created as expected
    alongside any debugging options that may be requested. With this, we
    can remove the check for a NULL slub_debug_string, since it's covered
    by the loop itself.

    Fixes: e17f1dfba37b ("mm, slub: extend slub_debug syntax for multiple blocks")
    Signed-off-by: Eric Farman
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Kees Cook
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: https://lkml.kernel.org/r/20200930161931.28575-1-farman@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Eric Farman
     

03 Oct, 2020

19 commits

  • …kvmarm/kvmarm into kvm-master

    KVM/arm64 fixes for 5.9, take #3

    - Fix synchronization of VTTBR update on TLB invalidation for nVHE systems

    Paolo Bonzini
     
  • The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of
    the #PF intercept bit in the exception bitmap when they do not match.
    This means that, if PFEC_MASK and/or PFEC_MATCH are set, the
    hypervisor can get a vmexit for #PF exceptions even when the
    corresponding bit is clear in the exception bitmap.

    This is unexpected and is promptly detected by a WARN_ON_ONCE.
    To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept
    is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr).
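
    The interaction between the three fields can be written down as a small predicate. This is a sketch of the rule as described above, using hypothetical names (pf_causes_vmexit(), PF_VECTOR_MODEL); it is not code from KVM.

```c
#include <assert.h>
#include <stdint.h>

#define PF_VECTOR_MODEL 14	/* #PF is exception vector 14 */

/*
 * Hypothetical model: decide whether a page fault with error code 'pfec'
 * causes a vmexit. When the masked error code equals PFEC_MATCH, the
 * exception bitmap bit is used as-is; otherwise its meaning is reversed.
 */
static int pf_causes_vmexit(uint32_t exception_bitmap, uint32_t pfec,
			    uint32_t pfec_mask, uint32_t pfec_match)
{
	int intercept_bit = (exception_bitmap >> PF_VECTOR_MODEL) & 1;
	int matches = (pfec & pfec_mask) == pfec_match;

	return matches ? intercept_bit : !intercept_bit;
}
```

    With the intercept bit clear, a zero mask and match never exits, while a stale non-zero mask can still produce an exit, which is the unexpected case.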

    Reported-by: Qian Cai
    Reported-by: Naresh Kamboju
    Tested-by: Naresh Kamboju
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • From: Saeed Mahameed

    ====================
    This series introduces some fixes to mlx5 driver.

    v1->v2:
    - Patch #1 Don't return while mutex is held. (Dave)

    v2->v3:
    - Drop patch #1, will consider a better approach (Jakub)
    - use cpu_relax() instead of cond_resched() (Jakub)
    - while(i--) to reverse a loop (Jakub)
    - Drop old mellanox email sign-off and change the committer email
    (Jakub)

    Please pull and let me know if there is any problem.

    For -stable v4.15
    ('net/mlx5e: Fix VLAN cleanup flow')
    ('net/mlx5e: Fix VLAN create flow')

    For -stable v4.16
    ('net/mlx5: Fix request_irqs error flow')

    For -stable v5.4
    ('net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU')
    ('net/mlx5: Avoid possible free of command entry while timeout comp handler')

    For -stable v5.7
    ('net/mlx5e: Fix return status when setting unsupported FEC mode')

    For -stable v5.8
    ('net/mlx5e: Fix race condition on nhe->n pointer in neigh update')
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • If a syn-cookies request socket doesn't pass MPTCP-level
    validation done in syn_recv_sock(), we need to release
    it immediately, or it will be leaked.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/89
    Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
    Reported-and-tested-by: Geliang Tang
    Reviewed-by: Matthieu Baerts
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Coly Li says:

    ====================
    Introduce sendpage_ok() to detect misused sendpage in network related drivers

    As Sagi Grimberg suggested, the original fix is refined to a more common
    inline routine:
    static inline bool sendpage_ok(struct page *page)
    {
    return (!PageSlab(page) && page_count(page) >= 1);
    }
    If sendpage_ok() returns true, the checking page can be handled by the
    concrete zero-copy sendpage method in network layer.

    The v10 series has 7 patches, fixes a WARN_ONCE() usage from v9 series,
    - The 1st patch in this series introduces sendpage_ok() in header file
    include/linux/net.h.
    - The 2nd patch adds WARN_ONCE() for improper zero-copy send in
    kernel_sendpage().
    - The 3rd patch fixes the page checking issue in nvme-over-tcp driver.
    - The 4th patch adds page_count check by using sendpage_ok() in
    do_tcp_sendpages() as Eric Dumazet suggested.
    - The 5th through 7th patches just replace existing open coded checks with
    the inline sendpage_ok() routine.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In libceph, ceph_tcp_sendpage() performs the following check before
    handing the page to the network layer's zero-copy sendpage method:
    if (page_count(page) >= 1 && !PageSlab(page))

    This check is exactly what sendpage_ok() does. This patch replaces the
    open-coded check with sendpage_ok() as a code cleanup.

    Signed-off-by: Coly Li
    Acked-by: Jeff Layton
    Cc: Ilya Dryomov
    Signed-off-by: David S. Miller

    Coly Li
     
  • In the iscsi driver, iscsi_tcp_segment_map() uses the following code to
    check whether or not the page should be handled by sendpage:
    if (!recv && page_count(sg_page(sg)) >= 1 && !PageSlab(sg_page(sg)))

    The "page_count(sg_page(sg)) >= 1 && !PageSlab(sg_page(sg))" part
    makes sure the page can be sent to the network layer's zero-copy path.
    This part is exactly what sendpage_ok() does.

    This patch uses sendpage_ok() in iscsi_tcp_segment_map() to replace
    the original open-coded checks.

    Signed-off-by: Coly Li
    Reviewed-by: Lee Duncan
    Acked-by: Martin K. Petersen
    Cc: Vasily Averin
    Cc: Cong Wang
    Cc: Mike Christie
    Cc: Chris Leech
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Signed-off-by: David S. Miller

    Coly Li
     
  • In _drbd_send_page() a page is checked by the following code before it
    is sent by kernel_sendpage():
    (page_count(page) < 1) || PageSlab(page)
    If the check is true, the page won't be sent by kernel_sendpage() and is
    handled by sock_no_sendpage() instead.

    This kind of check is exactly what the sendpage_ok() helper does, which
    was introduced into include/linux/net.h to solve a similar send-page
    issue in the nvme-tcp code.

    This patch uses sendpage_ok() to replace the open-coded checks of page
    type and refcount in _drbd_send_page(), as a code cleanup.

    Signed-off-by: Coly Li
    Cc: Philipp Reisner
    Cc: Sagi Grimberg
    Signed-off-by: David S. Miller

    Coly Li
     
  • commit a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab
    objects") adds the checks for Slab pages, but pages that have a
    page_count of 0 are still missing from the check.

    The network layer's sendpage method is not designed to send page_count 0
    pages either, therefore both PageSlab() and page_count() should be
    checked for the page being sent. This is exactly what sendpage_ok()
    does.

    This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused
    .sendpage, to make the code more robust.

    Fixes: a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects")
    Suggested-by: Eric Dumazet
    Signed-off-by: Coly Li
    Cc: Vasily Averin
    Cc: David S. Miller
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Coly Li
     
  • Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
    send slab pages. But pages allocated by __get_free_pages() without
    __GFP_COMP, which also have a refcount of 0, are still sent by
    kernel_sendpage() to the remote end, which is problematic.

    The newly introduced helper sendpage_ok() checks both the PageSlab tag
    and the page_count counter, and returns true if the page being checked
    is OK to be sent by kernel_sendpage().

    This patch fixes the page checking issue of nvme_tcp_try_send_data()
    with sendpage_ok(). If sendpage_ok() returns true, send this page by
    kernel_sendpage(), otherwise use sock_no_sendpage to handle this page.

    Signed-off-by: Coly Li
    Cc: Chaitanya Kulkarni
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Mikhail Skorzhinskii
    Cc: Philipp Reisner
    Cc: Sagi Grimberg
    Cc: Vlastimil Babka
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Coly Li
     
  • If a page sent into kernel_sendpage() is a slab page or doesn't have a
    ref_count, the page is improper to send by the zero-copy sendpage()
    method. Otherwise such a page might be unexpectedly released in the
    network code path and cause an unpredictable panic due to corruption of
    kernel memory management data structures.

    This patch adds a WARN_ONCE() on the page being sent before it is passed
    to the concrete zero-copy sendpage() method. If the page is improper for
    the zero-copy sendpage() method, a warning message can be observed before
    the consequent unpredictable kernel panic.

    This patch does not change the existing kernel_sendpage() behavior for
    an improper zero-copy page send; it just provides a warning message as a
    hint for the potential panic that follows from kernel memory heap
    corruption.

    Signed-off-by: Coly Li
    Cc: Cong Wang
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Coly Li
     
  • The original problem was in the nvme-over-tcp code, which mistakenly uses
    kernel_sendpage() to send pages allocated by __get_free_pages() without
    the __GFP_COMP flag. Such pages don't have a refcount (page_count is 0) on
    tail pages, so sending them by kernel_sendpage() may trigger a kernel
    panic from a corrupted kernel heap, because these pages are incorrectly
    freed in the network stack as page_count 0 pages.

    This patch introduces a helper, sendpage_ok(), which returns true if the
    page being checked
    - is not a slab page: PageSlab(page) is false.
    - has a page refcount: page_count(page) is not zero.

    All drivers that want to send a page to the remote end by
    kernel_sendpage() may use this helper to check whether the page is OK. If
    the helper does not return true, the driver should try another,
    non-sendpage method (e.g. sock_no_sendpage()) to handle the page.

    Signed-off-by: Coly Li
    Cc: Chaitanya Kulkarni
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Mikhail Skorzhinskii
    Cc: Philipp Reisner
    Cc: Sagi Grimberg
    Cc: Vlastimil Babka
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Coly Li
     
  • v2:

    If reading the MAC address from the EEPROM fails, don't throw an error; use a
    randomly generated MAC instead. Either way the adapter will soldier on and the
    return type of set_ethernet_addr() can be reverted to void.

    v1:

    Fix a bug in set_ethernet_addr() which does not take into account possible
    errors (or partial reads) returned by its helpers. This can potentially lead to
    writing random data into device's MAC address registers.

    Signed-off-by: Petko Manolov
    Signed-off-by: David S. Miller

    Petko Manolov
     
  • As warned by "make htmldocs", there are two new struct elements
    that aren't documented:

    ../include/linux/netdevice.h:2159: warning: Function parameter or member 'unlink_list' not described in 'net_device'
    ../include/linux/netdevice.h:2159: warning: Function parameter or member 'nested_level' not described in 'net_device'

    Fixes: 1fc70edb7d7b ("net: core: add nested_level variable in net_device")
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: David S. Miller

    Mauro Carvalho Chehab
     
  • Pull pin control fixes from Linus Walleij:
    "Some pin control fixes here. All of them are driver fixes, the Intel
    Cherryview being the most interesting one.

    - Fix a mux problem for I2C in the MVEBU driver.

    - Fix a really hairy inversion problem in the Intel Cherryview
    driver.

    - Fix the register for the sdc2_clk in the Qualcomm SM8250 driver.

    - Check the virtual GPIO boot failure in the Mediatek driver"

    * tag 'pinctrl-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
    pinctrl: mediatek: check mtk_is_virt_gpio input parameter
    pinctrl: qcom: sm8250: correct sdc2_clk
    pinctrl: cherryview: Preserve CHV_PADCTRL1_INVRXTX_TXDATA flag on GPIOs
    pinctrl: mvebu: Fix i2c sda definition for 98DX3236

    Linus Torvalds
     
  • Pull PCI fixes from Bjorn Helgaas:

    - Fix rockchip regression in rockchip_pcie_valid_device() (Lorenzo
    Pieralisi)

    - Add Pali Rohár as aardvark PCI maintainer (Pali Rohár)

    * tag 'pci-v5.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    MAINTAINERS: Add Pali Rohár as aardvark PCI maintainer
    PCI: rockchip: Fix bus checks in rockchip_pcie_valid_device()

    Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "Two patches in driver frameworks. The iscsi one corrects a bug induced
    by a BPF change to network locking and the other is a regression we
    introduced"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()
    scsi: target: Fix lun lookup for TARGET_SCF_LOOKUP_LUN_FROM_TAG case

    Linus Torvalds
     
  • Pull io_uring fixes from Jens Axboe:

    - fix for async buffered reads if read-ahead is fully disabled (Hao)

    - double poll match fix

    - ->show_fdinfo() potential ABBA deadlock complaint fix

    * tag 'io_uring-5.9-2020-10-02' of git://git.kernel.dk/linux-block:
    io_uring: fix async buffered reads when readahead is disabled
    io_uring: fix potential ABBA deadlock in ->show_fdinfo()
    io_uring: always delete double poll wait entry on match

    Linus Torvalds
     
  • Pull block fix from Jens Axboe:
    "Single fix for a ->commit_rqs failure case"

    * tag 'block-5.9-2020-10-02' of git://git.kernel.dk/linux-block:
    blk-mq: call commit_rqs while list empty but error happen

    Linus Torvalds