17 Jan, 2021

3 commits

  • [ Upstream commit f9c4845385c8f6631ebd5dddfb019ea7a285fba4 ]

    ip_finish_output_gso() may call .ndo_features_check() even before the
    skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt
    to inspect the L2 header via vlan_eth_hdr().

    Switch to vlan_get_protocol(), as already used further down in the
    common qeth_features_check() path.

    Fixes: f13ade199391 ("s390/qeth: run non-offload L3 traffic over common xmit path")
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     
  • [ Upstream commit b41b554c1ee75070a14c02a88496b1f231c7eacc ]

    Due to insufficient locking, qeth_core_set_online() and
    qeth_dev_layer2_store() can run in parallel, both attempting to load &
    setup the discipline (and stepping on each other's toes along the way).
    A similar race can also occur between qeth_core_remove_device() and
    qeth_dev_layer2_store().

    Access to .discipline is meant to be protected by the discipline_mutex,
    so add/expand the locking in qeth_core_remove_device() and
    qeth_core_set_online().
    Adjust the locking in qeth_l*_remove_device() accordingly, as it's now
    handled by the callers in a consistent manner.

    Based on an initial patch by Ursula Braun.

    Fixes: 9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations")
    Signed-off-by: Julian Wiedmann
    Reviewed-by: Alexandra Winter
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     
  • [ Upstream commit 0b9902c1fcc59ba75268386c0420a554f8844168 ]

    When qeth_dev_layer2_store() - holding the discipline_mutex - waits
    inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete,
    we can hit a deadlock if qeth_do_reset() concurrently calls
    qeth_set_online() and thus tries to acquire the discipline_mutex.

    Move the discipline_mutex locking outside of qeth_set_online() and
    qeth_set_offline(), and turn the discipline into a parameter so that
    callers understand the dependency.

    To fix the deadlock, we can now relax the locking:
    As already established, qeth_l*_remove_device() waits for
    qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk
    of having card->discipline ripped out while it's running, and thus
    doesn't need to take the discipline_mutex.

    Fixes: 9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations")
    Signed-off-by: Julian Wiedmann
    Reviewed-by: Alexandra Winter
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     

30 Dec, 2020

5 commits

  • commit 53a7f655834c7c335bf683f248208d4fbe4b47bc upstream.

    In dasd_alias_disconnect_device_from_lcu the device is removed from any
    list on the LCU. Afterwards the LCU is removed from the lcu list if it
    does not contain devices any longer.

    The lcu->lock protects the lcu from parallel updates. But to cancel all
    workers and wait for completion the lcu->lock has to be unlocked.

    If two devices are removed in parallel and both are removed from the LCU
    the first device that takes the lcu->lock again will delete the LCU because
    it is already empty but the second device also tries to free the LCU which
    leads to a list corruption of the lcu list.

    Fix by removing the device right before the lcu is checked without
    unlocking the lcu->lock in between.

    Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
    Cc: stable@vger.kernel.org
    Signed-off-by: Stefan Haberland
    Reviewed-by: Jan Hoeppner
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Stefan Haberland
     
  • commit 0ede91f83aa335da1c3ec68eb0f9e228f269f6d8 upstream.

    dasd_alias_add_device() moves devices to the active_devices list in case
    of a scheduled LCU update regardless if they have previously been in a
    pavgroup or not.

    Example: device A and B are in the same pavgroup.

    Device A has already been in a pavgroup and the private->pavgroup pointer
    is set and points to a valid pavgroup. While going through dasd_add_device
    it is moved from the pavgroup to the active_devices list.

    In parallel device B might be removed from the same pavgroup in
    remove_device_from_lcu() which in turn checks if the group is empty
    and deletes it accordingly because device A has already been removed from
    there.

    When device A now enters remove_device_from_lcu(), the code tries to remove
    it from the pavgroup again because the pavgroup pointer is still set, and
    the empty group is cleaned up a second time, which leads to a list corruption.

    Fix by setting private->pavgroup to NULL in dasd_add_device.

    If the device has been the last device on the pavgroup an empty pavgroup
    remains but this will be cleaned up by the scheduled lcu_update which
    iterates over all existing pavgroups.

    Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
    Cc: stable@vger.kernel.org
    Signed-off-by: Stefan Haberland
    Reviewed-by: Jan Hoeppner
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Stefan Haberland
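
    The stale-pointer problem described in commit 0ede91f83aa3 above can be
    illustrated with a minimal user-space sketch. All names here (toy_group,
    toy_dev, toy_move_to_active, toy_remove_from_lcu) are hypothetical
    stand-ins and not the driver's real structures; locking is left out on
    purpose. The point is only that clearing the back pointer when the device
    leaves the group keeps a later removal from tearing the group down twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a PAV group: just a member count. */
struct toy_group {
	int members;
	int cleanups;	/* how often the empty group was torn down */
};

/* Hypothetical device with a back pointer to its group. */
struct toy_dev {
	struct toy_group *pavgroup;
};

/* Move a device off its group; the fix is clearing the back pointer. */
void toy_move_to_active(struct toy_dev *dev)
{
	if (dev->pavgroup) {
		dev->pavgroup->members--;
		dev->pavgroup = NULL;
	}
}

/* Remove a device; tear the group down once it is empty. */
void toy_remove_from_lcu(struct toy_dev *dev)
{
	struct toy_group *grp = dev->pavgroup;

	if (!grp)
		return;	/* already moved away: nothing to unlink */
	assert(grp->members > 0);
	dev->pavgroup = NULL;
	if (--grp->members == 0)
		grp->cleanups++;
}
```

    With two devices in one group, moving the first one away and then removing
    both leaves the group cleaned up exactly once in this model.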
     
  • commit a29ea01653493b94ea12bb2b89d1564a265081b6 upstream.

    Prevent _lcu_update from adding a device to a pavgroup if the LCU still
    requires an update. The data is not reliable any longer and in parallel
    devices might have been moved on the lists already.
    This might lead to list corruptions or invalid PAV grouping.
    Only add devices to a pavgroup if the LCU is up to date. Additional steps
    are taken by the scheduled lcu update.

    Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
    Cc: stable@vger.kernel.org
    Signed-off-by: Stefan Haberland
    Reviewed-by: Jan Hoeppner
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Stefan Haberland
     
  • commit 658a337a606f48b7ebe451591f7681d383fa115e upstream.

    For an LCU update a read unit address configuration IO is required.
    This is started using sleep_on(), which has early exit paths in case the
    device is not usable for IO. For example when it is in offline processing.

    In those cases the LCU update should fail and not be retried.
    Therefore lcu_update_work checks if EOPNOTSUPP is returned or not.

    Commit 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration")
    accidentally removed the EOPNOTSUPP return code from
    read_unit_address_configuration(), which in turn might lead to an endless
    loop of the LCU update in offline processing.

    Fix by returning EOPNOTSUPP again if the device is not able to perform the
    request.

    Fixes: 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration")
    Cc: stable@vger.kernel.org #5.3
    Signed-off-by: Stefan Haberland
    Reviewed-by: Jan Hoeppner
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Stefan Haberland
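
    The retry rule described in commit 658a337a606f above can be sketched in
    user-space C. This is a simplified model under stated assumptions: the
    names toy_device, toy_read_uac and toy_lcu_update are made up, the
    transient error is arbitrarily modelled as -EAGAIN, and the retry loop is
    bounded, whereas the driver reschedules a worker. It only shows why a
    distinct -EOPNOTSUPP return lets the caller stop retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model: only the fields this sketch needs. */
struct toy_device {
	int usable;	/* 0 while e.g. offline processing is running */
	int attempts;	/* number of read attempts made so far */
	int fail_first;	/* number of transient failures to simulate */
};

/* Stand-in for the read-unit-address-configuration step. */
int toy_read_uac(struct toy_device *dev)
{
	assert(dev != NULL);
	dev->attempts++;
	if (!dev->usable)
		return -EOPNOTSUPP;	/* permanent: caller must not retry */
	if (dev->fail_first > 0) {
		dev->fail_first--;
		return -EAGAIN;		/* transient: caller may retry */
	}
	return 0;
}

/* Stand-in for the update worker: retry unless -EOPNOTSUPP is returned. */
int toy_lcu_update(struct toy_device *dev, int max_tries)
{
	int rc = -EAGAIN;

	while (max_tries-- > 0) {
		rc = toy_read_uac(dev);
		if (rc == 0 || rc == -EOPNOTSUPP)
			break;
	}
	return rc;
}
```

    If the unusable-device path returned a generic error instead, the loop
    above would spin until max_tries is exhausted; without such a bound it
    would never terminate, which matches the endless loop the commit describes.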
     
  • [ Upstream commit 14d4c4fa46eeaa3922e8e1c4aa727eb0a1412804 ]

    Use of the sch->dev reference after the put_device() call could trigger
    use-after-free bugs.

    Fix this by simply adjusting the position of put_device.

    Fixes: 37db8985b211 ("s390/cio: add basic protected virtualization support")
    Reported-by: Hulk Robot
    Suggested-by: Cornelia Huck
    Signed-off-by: Qinglang Miao
    Reviewed-by: Cornelia Huck
    Reviewed-by: Vineeth Vijayan
    [vneethv@linux.ibm.com: Slight modification in the commit-message]
    Signed-off-by: Vineeth Vijayan
    Signed-off-by: Heiko Carstens
    Signed-off-by: Sasha Levin

    Qinglang Miao
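
    The ordering rule behind commit 14d4c4fa46ee above can be shown with a toy
    reference count. This is a sketch, not the cio code: toy_dev, toy_put and
    toy_finish are hypothetical names, and the release step only sets a flag
    instead of freeing memory so that the example stays well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device object. */
struct toy_dev {
	int refcount;
	int id;
	int released;	/* set on release; real code would free the object */
};

/* Drop one reference; the object counts as gone once the count hits zero. */
void toy_put(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = 1;	/* d must not be used after this point */
}

/* Fixed ordering: finish every access to *d, then drop the reference. */
int toy_finish(struct toy_dev *d)
{
	int id;

	assert(!d->released);
	id = d->id;	/* last use of d */
	toy_put(d);	/* may release d */
	return id;
}
```

    Swapping the two statements in toy_finish() would read d->id from an
    object that may already have been released, which is the class of bug the
    commit fixes by moving put_device().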
     

28 Nov, 2020

1 commit

  • Pull networking fixes from Jakub Kicinski:
    "Networking fixes for 5.10-rc6, including fixes from the WiFi driver,
    and CAN subtrees.

    Current release - regressions:

    - gro_cells: reduce number of synchronize_net() calls

    - ch_ktls: release a lock before jumping to an error path

    Current release - always broken:

    - tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header

    Previous release - regressions:

    - net/tls: fix missing received data after fast remote close

    - vsock/virtio: discard packets only when socket is really closed

    - sock: set sk_err to ee_errno on dequeue from errq

    - cxgb4: fix the panic caused by non smac rewrite

    Previous release - always broken:

    - tcp: fix corner cases around setting ECN with BPF selection of
    congestion control

    - tcp: fix race condition when creating child sockets from syncookies
    on loopback interface

    - usbnet: ipheth: fix connectivity with iOS 14

    - tun: honor IOCB_NOWAIT flag

    - net/packet: fix packet receive on L3 devices without visible hard
    header

    - devlink: Make sure devlink instance and port are in same net
    namespace

    - net: openvswitch: fix TTL decrement action netlink message format

    - bonding: wait for sysfs kobject destruction before freeing struct
    slave

    - net: stmmac: fix upstream patch applied to the wrong context

    - bnxt_en: fix return value and unwind in probe error paths

    Misc:

    - devlink: add extra layer of categorization to the reload stats uAPI
    before it's released"

    * tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
    sock: set sk_err to ee_errno on dequeue from errq
    mptcp: fix NULL ptr dereference on bad MPJ
    net: openvswitch: fix TTL decrement action netlink message format
    can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check
    can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0
    can: m_can: fix nominal bitiming tseg2 min for version >= 3.1
    can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags
    can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given
    can: gs_usb: fix endianess problem with candleLight firmware
    ch_ktls: lock is not freed
    net/tls: Protect from calling tls_dev_del for TLS RX twice
    devlink: Make sure devlink instance and port are in same net namespace
    devlink: Hold rtnl lock while reading netdev attributes
    ptp: clockmatrix: bug fix for idtcm_strverscmp
    enetc: Let the hardware auto-advance the taprio base-time of 0
    gro_cells: reduce number of synchronize_net() calls
    net: stmmac: fix incorrect merge of patch upstream
    ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
    Documentation: netdev-FAQ: suggest how to post co-dependent series
    ibmvnic: enhance resetting status check during module exit
    ...

    Linus Torvalds
     

21 Nov, 2020

5 commits

  • When qeth_iqd_tx_complete() detects that a TX buffer requires additional
    async completion via QAOB, it might fail to replace the queue entry's
    metadata (and ends up triggering recovery).

    Assume now that the device gets torn down, overruling the recovery.
    If the QAOB notification then arrives before the tear down has
    sufficiently progressed, the buffer state is changed to
    QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob().

    The tear down code calls qeth_drain_output_queue(), where
    qeth_cleanup_handled_pending() will then attempt to replace such a
    buffer _again_. If it succeeds this time, the buffer ends up dangling in
    its replacement's ->next_pending list ... where it will never be freed,
    since there's no further call to qeth_cleanup_handled_pending().

    But the second attempt isn't actually needed, we can simply leave the
    buffer on the queue and re-use it after a potential recovery has
    completed. The qeth_clear_output_buffer() in qeth_drain_output_queue()
    will ensure that it's in a clean state again.

    Fixes: 72861ae792c2 ("qeth: recovery through asynchronous delivery")
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Jakub Kicinski

    Julian Wiedmann
     
  • The two expected notification sequences are
    1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when
    our TX completion code first observed the pending TX and the QAOB
    then completes at a later time; or
    2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB
    completion before our TX completion code even noticed that the TX
    was pending.

    But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run
    concurrently, we may end up with a race that results in a sequence of
    TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse
    the af_iucv code in its tracking of pending transmits.

    Rework the notification code, so that qeth_qdio_handle_aob() defers its
    notification if the TX completion code is still active.

    Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Jakub Kicinski

    Julian Wiedmann
     
  • Calling into socket code is ugly already, at least check whether we are
    dealing with the expected sk_family. Only looking at skb->protocol is
    bound to cause troubles (consider eg. af_packet).

    Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Jakub Kicinski

    Julian Wiedmann
     
  • Remove workaround that supported early hardware implementations
    of PNSO OC3. Rely on the 'enarf' feature bit instead.

    Fixes: fa115adff2c1 ("s390/qeth: Detect PNSO OC3 capability")
    Signed-off-by: Alexandra Winter
    Reviewed-by: Julian Wiedmann
    [jwi: use logical instead of bit-wise AND]
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Jakub Kicinski

    Alexandra Winter
     
  • Pull block fixes from Jens Axboe:

    - NVMe pull request from Christoph:
    - Doorbell Buffer freeing fix (Minwoo Im)
    - CSE log leak fix (Keith Busch)

    - blk-cgroup hd_struct leak fix (Christoph)

    - Flush request state fix (Ming)

    - dasd NULL deref fix (Stefan)

    * tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
    s390/dasd: fix null pointer dereference for ERP requests
    blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats
    nvme: fix memory leak freeing command effects
    nvme: directly cache command effects log
    nvme: free sq/cq dbbuf pointers when dbbuf set fails
    block: mark flush request as IDLE when it is really finished

    Linus Torvalds
     

16 Nov, 2020

1 commit

  • When requeueing all requests on the device request queue to the blocklayer
    we might get to an ERP (error recovery) request that is a copy of an
    original CQR.

    Those requests do not have blocklayer request information or a pointer to
    the dasd_queue set. When trying to access those data it will lead to a
    null pointer dereference in dasd_requeue_all_requests().

    Fix by checking if the request is an ERP request that can simply be
    ignored. The blocklayer request will be requeued by the original CQR that
    is on the device queue right behind the ERP request.

    Fixes: 9487cfd3430d ("s390/dasd: fix handling of internal requests")
    Cc: #4.16
    Signed-off-by: Stefan Haberland
    Reviewed-by: Jan Hoeppner
    Signed-off-by: Jens Axboe

    Stefan Haberland
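
    A minimal sketch of the check described in the commit above, under stated
    assumptions: toy_cqr and toy_requeue_all are hypothetical, the request
    layout is heavily simplified, and the assumption that an ERP copy is
    recognised by a non-NULL pointer to its original is an illustration, not
    a statement about the driver's actual fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified request; not the driver's real layout. */
struct toy_cqr {
	struct toy_cqr *refers;	/* non-NULL: ERP copy of an original request */
	int *blk_req;		/* block-layer request info, NULL on ERP copies */
	int requeued;
};

/* Requeue everything on the queue, ignoring ERP copies. Returns the count. */
int toy_requeue_all(struct toy_cqr *queue, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_cqr *cqr = &queue[i];

		if (cqr->refers)	/* ERP request: the original follows on the queue */
			continue;
		assert(cqr->blk_req != NULL);	/* safe to use block-layer info now */
		cqr->requeued = 1;
		count++;
	}
	return count;
}
```

    Without the early continue, the loop would reach the block-layer access
    for the ERP copy, whose blk_req is NULL in this model.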
     

03 Nov, 2020

2 commits

  • When both the paes and the pkey kernel module are statically built
    into the kernel, the paes cipher selftests run before the pkey
    kernel module is initialized. So a static variable set in the pkey
    init function and used in the pkey_clr2protkey function is not
    initialized when the paes cipher's selftests request to call pckmo for
    transforming a clear key value into a protected key.

    This patch moves the initial setup of the static variable into
    the function pkey_clr2protkey. So it's possible to use the function
    for transforming a clear to a protected key even before the pkey
    init function has been called, and the paes selftests may run
    successfully.

    Reported-by: Alexander Egorenkov
    Cc: # 4.20
    Fixes: f822ad2c2c03 ("s390/pkey: move pckmo subfunction available checks away from module init")
    Signed-off-by: Harald Freudenberger
    Signed-off-by: Heiko Carstens

    Harald Freudenberger
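
    The initialization-order fix described in the commit above follows a
    common pattern: set up the static state on first use instead of in an
    init function that may run too late. A minimal sketch, assuming made-up
    names (toy_query_functions, toy_clr2protkey) and a made-up capability
    mask; the real code queries the available pckmo subfunctions, and this
    sketch is not thread-safe.

```c
#include <assert.h>

/* Counts how often the (hypothetical) capability query has run. */
int toy_query_calls;

/* Hypothetical stand-in for querying the available subfunctions. */
unsigned int toy_query_functions(void)
{
	toy_query_calls++;
	return 0x5u;	/* made-up capability mask: bits 0x1 and 0x4 set */
}

/* The mask is fetched on first use, so no init function has to run first. */
int toy_clr2protkey(unsigned int needed_bit)
{
	static unsigned int functions;
	static int functions_valid;

	assert(needed_bit != 0);
	if (!functions_valid) {
		functions = toy_query_functions();
		functions_valid = 1;
	}
	return (functions & needed_bit) ? 0 : -1;
}
```

    Callers that run early, such as a selftest, get a valid answer on their
    first call, and the query itself still runs only once.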
     
  • With the last rework of the AP bus scan function one get_device() is
    missing, causing the reference counter to be one instance too
    low. Together with binding/unbinding device drivers to an ap device it
    may end up in a segfault because the ap device is freed but a device
    driver still assumes its pointer to the ap device is valid:

    Unable to handle kernel pointer dereference in virtual kernel address space
    Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
    Fault in home space mode while using kernel ASCE.
    Krnl PSW : 0404e00180000000 000000001472f3b6 (klist_next+0x7e/0x180)
    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
    Call Trace:
    [] klist_next+0x7e/0x180
    ([] klist_next+0x32/0x180)
    [] bus_for_each_dev+0x66/0xb8
    [] ap_scan_adapter+0xcc/0x6c0
    [] ap_scan_bus+0x82/0x140
    [] process_one_work+0x27c/0x478
    [] worker_thread+0x66/0x368
    [] kthread+0x17a/0x1a0
    [] ret_from_fork+0x24/0x2c
    Kernel panic - not syncing: Fatal exception: panic_on_oops

    Fixed by adjusting the reference count with get_device() at the right
    place. Also now the device drivers don't need to adjust the ap
    device's reference counting any more. This is now done in the ap bus
    probe and remove functions.

    Reported-by: Marc Hartmayer
    Fixes: 4f2fcccdb547 ("s390/ap: add card/queue deconfig state")
    Signed-off-by: Harald Freudenberger
    Signed-off-by: Heiko Carstens

    Harald Freudenberger
     

27 Oct, 2020

1 commit

  • The system EID that is defined by the ISM driver is not correct. Using
    an incorrect system EID allows communication with remote Linux systems
    that use the same incorrect system EID, but when it comes to
    interoperability with other operating systems the system EIDs never
    match, which prevents SMC-Dv2 communication.
    Using the correct system EID fixes this problem.

    Fixes: 201091ebb2a1 ("net/smc: introduce System Enterprise ID (SEID)")
    Signed-off-by: Karsten Graul
    Signed-off-by: Jakub Kicinski

    Karsten Graul
     

17 Oct, 2020

3 commits

  • Pull s390 updates from Vasily Gorbik:

    - Remove address space overrides using set_fs()

    - Convert to generic vDSO

    - Convert to generic page table dumper

    - Add ARCH_HAS_DEBUG_WX support

    - Add leap seconds handling support

    - Add NVMe firmware-assisted kernel dump support

    - Extend NVMe boot support with memory clearing control and addition of
    kernel parameters

    - AP bus and zcrypt api code rework. Add adapter configure/deconfigure
    interface. Extend debug features. Add failure injection support

    - Add ECC secure private keys support

    - Add KASan support for running protected virtualization host with
    4-level paging

    - Utilize destroy page ultravisor call to speed up secure guests
    shutdown

    - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code

    - Various checksum improvements

    - Other small various fixes and improvements all over the code

    * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
    s390/uaccess: fix indentation
    s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
    s390/zcrypt: fix wrong format specifications
    s390/kprobes: move insn_page to text segment
    s390/sie: fix typo in SIGP code description
    s390/lib: fix kernel doc for memcmp()
    s390/zcrypt: Introduce Failure Injection feature
    s390/zcrypt: move ap_msg param one level up the call chain
    s390/ap/zcrypt: revisit ap and zcrypt error handling
    s390/ap: Support AP card SCLP config and deconfig operations
    s390/sclp: Add support for SCLP AP adapter config/deconfig
    s390/ap: add card/queue deconfig state
    s390/ap: add error response code field for ap queue devices
    s390/ap: split ap queue state machine state from device state
    s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
    s390/zcrypt: introduce msg tracking in zcrypt functions
    s390/startup: correct early pgm check info formatting
    s390: remove orphaned extern variables declarations
    s390/kasan: make sure int handler always run with DAT on
    s390/ipl: add support to control memory clearing for nvme re-IPL
    ...

    Linus Torvalds
     
  • Merge more updates from Andrew Morton:
    "155 patches.

    Subsystems affected by this patch series: mm (dax, debug, thp,
    readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
    core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
    binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
    romfs, and fault-injection"

    * emailed patches from Andrew Morton: (155 commits)
    lib, uaccess: add failure injection to usercopy functions
    lib, include/linux: add usercopy failure capability
    ROMFS: support inode blocks calculation
    ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
    sched.h: drop in_ubsan field when UBSAN is in trap mode
    scripts/gdb/tasks: add headers and improve spacing format
    scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
    kernel/relay.c: drop unneeded initialization
    panic: dump registers on panic_on_warn
    rapidio: fix the missed put_device() for rio_mport_add_riodev
    rapidio: fix error handling path
    nilfs2: fix some kernel-doc warnings for nilfs2
    autofs: harden ioctl table
    ramfs: fix nommu mmap with gaps in the page cache
    mm: remove the now-unnecessary mmget_still_valid() hack
    mm/gup: take mmap_lock in get_dump_page()
    binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
    coredump: rework elf/elf_fdpic vma_dump_size() into common helper
    coredump: refactor page range dumping into common helper
    coredump: let dump_emit() bail out on short writes
    ...

    Linus Torvalds
     
  • We soon want to pass flags, e.g., to mark added System RAM resources
    mergeable. Prepare for that.

    This patch is based on a similar patch by Oscar Salvador:

    https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Juergen Gross # Xen related part
    Reviewed-by: Pankaj Gupta
    Acked-by: Wei Liu
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Baoquan He
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc: Greg Kroah-Hartman
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: David Hildenbrand
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Boris Ostrovsky
    Cc: Stefano Stabellini
    Cc: "Oliver O'Halloran"
    Cc: Pingfan Liu
    Cc: Nathan Lynch
    Cc: Libor Pechacek
    Cc: Anton Blanchard
    Cc: Leonardo Bras
    Cc: Ard Biesheuvel
    Cc: Eric Biederman
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: Roger Pau Monné
    Cc: Thomas Gleixner
    Cc: Wei Yang
    Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

16 Oct, 2020

1 commit

  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multipath TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require many offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings; reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • Pull SCSI updates from James Bottomley:
    "The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi,
    hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes.

    There are only three core changes: adding sense codes, cleaning up
    noretry and adding an option for limitless retries"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits)
    scsi: hisi_sas: Recover PHY state according to the status before reset
    scsi: hisi_sas: Filter out new PHY up events during suspend
    scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
    scsi: hisi_sas: Add check for methods _PS0 and _PR0
    scsi: hisi_sas: Add controller runtime PM support for v3 hw
    scsi: hisi_sas: Switch to new framework to support suspend and resume
    scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq()
    scsi: qedf: Remove redundant assignment to variable 'rc'
    scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store()
    scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro
    scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed
    scsi: sun_esp: Use module_platform_driver to simplify the code
    scsi: sun3x_esp: Use module_platform_driver to simplify the code
    scsi: sni_53c710: Use module_platform_driver to simplify the code
    scsi: qlogicpti: Use module_platform_driver to simplify the code
    scsi: mac_esp: Use module_platform_driver to simplify the code
    scsi: jazz_esp: Use module_platform_driver to simplify the code
    scsi: mvumi: Fix error return in mvumi_io_attach()
    scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req()
    scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
    ...

    Linus Torvalds
     

14 Oct, 2020

1 commit

  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

10 Oct, 2020

1 commit

  • Fixes 5 wrong format specification findings found by the
    kernel test robot in ap_queue.c:

    warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
    __func__, status.response_code,
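
    A minimal sketch of how such a warning typically arises, assuming the
    response code is a bit-field narrower than int (the struct and function
    names below are hypothetical, not taken from ap_queue.c). Such a field
    is promoted to int when passed to a variadic function, so a plain int
    conversion matches where "%hhu" would not:

```c
#include <stdio.h>

/* Hypothetical stand-in for a status word with an 8-bit response code. */
struct demo_status {
	unsigned int queue_empty : 1;
	unsigned int response_code : 8;
};

/* Writes the response code as decimal text; returns snprintf()'s result. */
static inline int demo_format_rc(char *buf, size_t len, struct demo_status status)
{
	/*
	 * The 8-bit bit-field is promoted to int in the variadic call,
	 * so "%d" matches the argument type; "%hhu" would trigger -Wformat.
	 */
	return snprintf(buf, len, "%d", status.response_code);
}
```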

    Signed-off-by: Harald Freudenberger
    Reported-by: kernel test robot
    Fixes: 2ea2a6099ae3 ("s390/ap: add error response code field for ap queue devices")
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     

08 Oct, 2020

9 commits

  • Introduce a way to specify additional debug flags with a crypto
    request to be able to trigger certain failures within the zcrypt
    device drivers and/or ap core code.

    This failure injection possibility is only enabled with a kernel debug
    build (CONFIG_ZCRYPT_DEBUG) and should never be available on a regular
    kernel running in a production environment.

    Details:

    * The ioctl(ICARSAMODEXPO) gets a struct ica_rsa_modexpo. If the
    leftmost bit of the 32 bit unsigned int inputdatalength field is
    set, the uppermost 16 bits are separated and used as debug flag
    value. The process is checked to have the CAP_SYS_ADMIN capability
    enabled or EPERM is returned.

    * The ioctl(ICARSACRT) gets a struct ica_rsa_modexpo_crt. If the
    leftmost bit of the 32 bit unsigned int inputdatalength field is set,
    the uppermost 16 bits are separated and used as debug flag
    value. The process is checked to have the CAP_SYS_ADMIN capability
    enabled or EPERM is returned.

    * The ioctl(ZSECSENDCPRB) used to send CCA CPRBs gets a struct
    ica_xcRB. If the leftmost bit of the 32 bit unsigned int status
    field is set, the uppermost 16 bits of this field are used as debug
    flag value. The process is checked to have the CAP_SYS_ADMIN
    capability enabled or EPERM is returned.

    * The ioctl(ZSENDEP11CPRB) used to send EP11 CPRBs gets a struct
    ep11_urb. If the leftmost bit of the 64 bit unsigned int req_len
    field is set, the uppermost 16 bits of this field are used as debug
    flag value. The process is checked to have the CAP_SYS_ADMIN
    capability enabled or EPERM is returned.

    So it is possible to send an additional 16 bit value to the zcrypt API
    to be used to carry a failure injection command which may trigger
    special behavior within the zcrypt API and layers below. This 16 bit
    value is referred to in the rest of this text as 'fi command' for
    Failure Injection.

    The lower 8 bits of the fi command form a numerical argument in
    the range of 1-255, the 'fi action' to be performed with the
    request or the resulting reply:

    * 0x00 (all requests): No failure injection action but flags may be
    provided which may affect the processing of the request or reply.
    * 0x01 (only CCA CPRBs): The CPRB's agent_ID field is set to
    'FF'. This results in a reply code 0x90 (Transport-Protocol
    Failure).
    * 0x02 (only CCA CPRBs): After the APQN to send to has been chosen,
    the domain field within the CPRB is overwritten with value 99 to
    enforce a reply with RY 0x8A.
    * 0x03 (all requests): At NQAP invocation the invalid qid value 0xFF00
    is used causing a response code of 0x01 (AP queue not valid).

    The upper 8 bits of the fi command may carry bit flags which may
    influence the processing of a request or response:

    * 0x01: No retry. If this bit is set, the usual loop in the zcrypt API
    which retries a CPRB up to 10 times when the lower layers return
    with EAGAIN is abandoned after the first attempt to send the CPRB.
    * 0x02: Toggle special. Toggles the special bit on this request. This
    should result in a reply code RY~0x41 and result in an ioctl
    failure with errno EINVAL.

    These failure injection possibilities may get some further extensions
    in the future. As of now this is a starting point for Continuous Test
    and Integration to trigger some failures and watch for the reaction of
    the ap bus and zcrypt device driver code.
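
    The encoding described above can be sketched for the 32 bit case as
    follows. All identifiers are illustrative and not taken from the
    zcrypt sources, masking the length to its lower 16 bits is an
    assumption, and the CAP_SYS_ADMIN check is left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; they are not taken from the zcrypt sources. */
enum demo_fi_action {
	DEMO_FI_ACTION_NONE       = 0x00, /* no injection, flags may still apply */
	DEMO_FI_ACTION_CCA_AGENT  = 0x01, /* CCA only: agent_ID set to 'FF' */
	DEMO_FI_ACTION_CCA_DOMAIN = 0x02, /* CCA only: domain overwritten with 99 */
	DEMO_FI_ACTION_NQAP_QID   = 0x03, /* all requests: invalid qid at NQAP */
};

enum demo_fi_flag {
	DEMO_FI_FLAG_NO_RETRY       = 0x01,
	DEMO_FI_FLAG_TOGGLE_SPECIAL = 0x02,
};

struct demo_fi_split {
	bool has_fi;     /* leftmost bit of the field was set */
	uint16_t fi_cmd; /* uppermost 16 bits, marker bit included */
	uint32_t len;    /* remaining length value */
};

/* Upper 8 bits carry the flags, lower 8 bits the fi action. */
static inline uint16_t demo_fi_cmd_make(uint8_t flags, uint8_t action)
{
	return (uint16_t)(((unsigned int)flags << 8) | action);
}

static inline uint8_t demo_fi_cmd_action(uint16_t cmd)
{
	return (uint8_t)(cmd & 0xff);
}

static inline uint8_t demo_fi_cmd_flags(uint16_t cmd)
{
	return (uint8_t)(cmd >> 8);
}

/* Store the fi command in the upper half and set the leftmost bit. */
static inline uint32_t demo_fi_pack32(uint32_t len, uint16_t fi_cmd)
{
	return (UINT32_C(1) << 31) | ((uint32_t)fi_cmd << 16) | (len & 0xffff);
}

/* Split a field; without the leftmost bit it is a plain length. */
static inline struct demo_fi_split demo_fi_unpack32(uint32_t field)
{
	struct demo_fi_split s = { false, 0, field };

	if (field & (UINT32_C(1) << 31)) {
		s.has_fi = true;
		s.fi_cmd = (uint16_t)(field >> 16);
		s.len = field & 0xffff; /* assumption: length is the lower half */
	}
	return s;
}
```

    Note that in this sketch the leftmost bit is part of the extracted 16
    bit value, so it shows up as 0x80 in the flags byte after unpacking.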

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • Move the creating and disposal of the struct ap_message one
    level up the call chain. The ap message was constructed in the
    calling functions in msgtype50 and msgtype6 but only for the
    ica rsa messages. For CCA and EP11 CPRBs the ap message struct
    is created in the zcrypt api functions.

    This patch moves the construction of the ap message struct into
    the functions zcrypt_rsa_modexpo and zcrypt_rsa_crt. So now all
    the 4 zcrypt api functions zcrypt_rsa_modexpo, zcrypt_rsa_crt,
    zcrypt_send_cprb and zcrypt_send_ep11_cprb appear and act
    similar.

    There are no functional changes coming with this patch.
    However, the availability of the ap_message struct has
    advantages which will be needed by a follow up patch.

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • Revisit the ap queue error handling: Based on discussions and
    evaluations with the firmware folk here is now a rework of the response
    code handling for all the AP instructions. The idea is to distinguish
    between failures because of some kind of invalid request where a retry
    does not make any sense and a failure where another attempt to send
    the very same request may succeed. The first case is handled by
    returning EINVAL to the userspace application. The second case results
    in retries within the zcrypt API controlled by a per message retry
    counter.

    Revisit the zcrypt error handling: Similar here, based on discussions
    with the firmware people here comes a rework of the handling of all
    the reply codes. Main point here is that there are only very few
    cases left, where a zcrypt device queue is switched to offline. It
    should never be the case that an AP reply message is 'unknown' to the
    device driver as it indicates a total mismatch between device driver
    and crypto card firmware. In all other cases, the code distinguishes
    between failure because of invalid message (see above - EINVAL) or
    failures of the infrastructure (see above - EAGAIN).
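
    The distinction between the two failure classes could be sketched
    roughly like this. The names, the callback type and the limit of 10
    are assumptions made for the example and do not mirror the driver
    code:

```c
#include <errno.h>

#define DEMO_RETRY_MAX 10 /* assumed limit for this sketch */

struct demo_msg {
	int retries; /* per message retry counter */
	int last_rc; /* result of the most recent attempt */
};

/* Hypothetical lower layer: returns 0, -EINVAL or -EAGAIN. */
typedef int (*demo_send_fn)(void *ctx);

static inline int demo_send(struct demo_msg *msg, demo_send_fn send, void *ctx)
{
	int rc;

	do {
		rc = send(ctx);
		msg->last_rc = rc;
		/* Success or an invalid request: another attempt cannot help. */
		if (rc != -EAGAIN)
			return rc;
	} while (++msg->retries < DEMO_RETRY_MAX);

	return -EAGAIN;
}

/* Test backend: fails with -EAGAIN as often as *ctx says, then succeeds. */
static inline int demo_flaky_send(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -EAGAIN;
	}
	return 0;
}
```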

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • Support SCLP AP adapter config and deconfig operations:
    The sysfs deconfig attribute /sys/devices/ap/cardxx/deconfig
    for each AP card is now read-write. Writing in a '1' triggers
    a synchronous SCLP request to configure the adapter, writing
    in a '0' sends a synchronous SCLP deconfigure request.

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • Add support for AP bus adapter config and deconfig to the sclp
    core code. The code is statically built into the kernel when
    ZCRYPT is configured either as module or with static support.

    This is the base functionality for having configure/deconfigure
    support in the AP bus and card code. Another patch will exploit
    this soon.

    Signed-off-by: Harald Freudenberger
    Suggested-by: Pierre Morel
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • This patch adds a new config state to the ap card and queue
    devices. This state reflects the response code
    0x03 "AP deconfigured" on TQAP invocation and is tracked with
    every ap bus scan.

    Together with this new state now a card/queue device which
    is 'deconfigured' is not disposed any more. However, for backward
    compatibility the online state now needs to take this state into
    account. So a card/queue is offline when the device is not configured.
    Furthermore a device can't get switched from offline to online state
    when not configured.

    The config state is shown in sysfs at
    /sys/devices/ap/cardxx/config
    for the card and
    /sys/devices/ap/cardxx/xx.yyyy/config
    for each queue within each card.
    It is a read-only attribute reflecting the negation of the
    'AP deconfig' state as it is noted in the AP documents.
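
    A simplified model of how the config state and the online state
    interact, as a sketch only. The struct, the function names and the
    -ENODEV return value are assumptions, and every response code other
    than 0x03 is treated as 'configured' here:

```c
#include <errno.h>
#include <stdbool.h>

struct demo_ap_dev {
	bool config; /* negation of the 'AP deconfigured' state */
	bool online; /* online state as requested by the user */
};

/* Called per bus scan with the TQAP response code (simplified). */
static inline void demo_scan_update(struct demo_ap_dev *d, unsigned int tqap_rc)
{
	d->config = (tqap_rc != 0x03); /* 0x03: AP deconfigured */
}

/* A device counts as online only while it is configured. */
static inline bool demo_is_online(const struct demo_ap_dev *d)
{
	return d->config && d->online;
}

/* Switching to online is refused while the device is not configured. */
static inline int demo_set_online(struct demo_ap_dev *d, bool online)
{
	if (online && !d->config)
		return -ENODEV; /* assumed error code for this sketch */
	d->online = online;
	return 0;
}
```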

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • On AP instruction failures the last response code is now
    kept in the struct ap_queue. There is also a new sysfs
    attribute showing this field (enabled only on debug kernels).

    Also slight rework of the AP_DBF macros to get some more
    content into one debug feature message line.

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • The state machine for each ap queue covered a mixture of
    device states and state machine (firmware queue state) states.

    This patch splits the device states and the state machine
    states into two different enums and variables. The major
    state is the device state with currently these values:

    AP_DEV_STATE_UNINITIATED - fresh and virgin, not touched
    AP_DEV_STATE_OPERATING - queue dev is working normally
    AP_DEV_STATE_SHUTDOWN - remove/unbind/shutdown in progress
    AP_DEV_STATE_ERROR - device is in error state

    Only when the device state is > UNINITIATED the state machine
    is run. The state machine represents the states of the firmware
    queue:

    AP_SM_STATE_RESET_START - starting point, reset (RAPQ) ap queue
    AP_SM_STATE_RESET_WAIT - reset triggered, waiting to be finished
    if irqs enabled, set up irq (AQIC)
    AP_SM_STATE_SETIRQ_WAIT - enable irq triggered, waiting to be
    finished, then go to IDLE
    AP_SM_STATE_IDLE - queue is operational but empty
    AP_SM_STATE_WORKING - queue is operational, requests are stored
    and replies may wait for getting fetched
    AP_SM_STATE_QUEUE_FULL - firmware queue is full, so only replies
    can get fetched

    For debugging each ap queue shows a sysfs attribute 'states' which
    displays the device and state machine state and is only available
    when the kernel is built with CONFIG_ZCRYPT_DEBUG enabled.
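
    The split into two enums might look like the sketch below. The enum
    constants are the ones listed above, with numeric values assumed to
    follow the listed order. The struct and the two helpers are
    hypothetical, and demo_can_enqueue() is only one reading of the state
    descriptions:

```c
#include <stdbool.h>

enum ap_dev_state {
	AP_DEV_STATE_UNINITIATED = 0, /* fresh, not touched */
	AP_DEV_STATE_OPERATING,       /* queue dev is working normally */
	AP_DEV_STATE_SHUTDOWN,        /* remove/unbind/shutdown in progress */
	AP_DEV_STATE_ERROR,           /* device is in error state */
};

enum ap_sm_state {
	AP_SM_STATE_RESET_START = 0,  /* starting point, reset ap queue */
	AP_SM_STATE_RESET_WAIT,       /* reset triggered, waiting */
	AP_SM_STATE_SETIRQ_WAIT,      /* enable irq triggered, waiting */
	AP_SM_STATE_IDLE,             /* operational but empty */
	AP_SM_STATE_WORKING,          /* operational, requests are stored */
	AP_SM_STATE_QUEUE_FULL,       /* only replies can get fetched */
};

/* Hypothetical container for the two separate state variables. */
struct demo_queue {
	enum ap_dev_state dev_state;
	enum ap_sm_state sm_state;
};

/* The state machine is run only when the device state is > UNINITIATED. */
static inline bool demo_sm_may_run(const struct demo_queue *q)
{
	return q->dev_state > AP_DEV_STATE_UNINITIATED;
}

/* Assumption: new requests fit only into an IDLE or WORKING queue. */
static inline bool demo_can_enqueue(const struct demo_queue *q)
{
	return q->dev_state == AP_DEV_STATE_OPERATING &&
	       (q->sm_state == AP_SM_STATE_IDLE ||
		q->sm_state == AP_SM_STATE_WORKING);
}
```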

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • Introduce a new internal struct zcrypt_track with a retry counter
    field and a last return code field. Fill and update these fields at
    certain points during processing of a request/reply. This tracking
    info is then used to
    - avoid trying to resend the message forever. Now at most
    TRACK_AGAIN_MAX (currently 10) attempts are made to send each message
    and then the ioctl returns to userspace with errno EAGAIN.
    - avoid trying to resend the message on the very same card/domain. If
    possible (more than one APQN with same quality) don't use the very
    same qid as the previous attempt when again scheduling the request.
    This is done by adding penalty weight values when the dispatching
    takes place. There is a penalty TRACK_AGAIN_CARD_WEIGHT_PENALTY for
    using the same card as previously and another penalty define
    TRACK_AGAIN_QUEUE_WEIGHT_PENALTY to be considered when the same qid
    as the previous send attempt is calculated. Both values make it
    harder to choose the very same card/domain but not impossible. For
    example when only one APQN is available a resend can only address the
    very same APQN.

    There are some more ideas for the future to extend the use of this
    tracking information. For example the last response code at NQAP and
    DQAP could be stored there, making it possible to extend tracing
    and debugging of requests failing to get processed properly.
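
    The penalty idea can be illustrated with a small selection routine.
    This is a sketch under assumptions: the types, the function and the
    two penalty values are made up for the example and are not the
    driver's TRACK_AGAIN_* definitions:

```c
#include <stddef.h>

/* Arbitrary example values, not the driver's TRACK_AGAIN_* defines. */
#define DEMO_CARD_PENALTY  1000u
#define DEMO_QUEUE_PENALTY 10000u

struct demo_apqn {
	unsigned int card;
	unsigned int domain;
	unsigned int weight; /* lower is better */
};

struct demo_last {
	int valid;           /* non-zero if a previous attempt exists */
	unsigned int card;
	unsigned int domain;
};

/* Pick the APQN with the lowest weight after applying the penalties. */
static inline const struct demo_apqn *
demo_pick(const struct demo_apqn *q, size_t n, const struct demo_last *last)
{
	const struct demo_apqn *best = NULL;
	unsigned int best_w = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int w = q[i].weight;

		if (last && last->valid && q[i].card == last->card) {
			w += DEMO_CARD_PENALTY; /* same card as before */
			if (q[i].domain == last->domain)
				w += DEMO_QUEUE_PENALTY; /* same qid as before */
		}
		if (!best || w < best_w) {
			best = &q[i];
			best_w = w;
		}
	}
	return best; /* NULL only if n == 0 */
}
```

    With a single candidate the penalties have no effect, which matches
    the case where a resend can only address the very same APQN.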

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     

06 Oct, 2020

1 commit


03 Oct, 2020

5 commits

  • drivers/s390/net/ctcm_fsms.h: fsm_action_nop - only declaration left
    after commit 04885948b101 ("ctc: removal of the old ctc driver")

    drivers/s390/net/ctcm_mpc.h: ctcmpc_open - only declaration left after
    commit 293d984f0e36 ("ctcm: infrastructure for replaced ctc driver")

    Reviewed-by: Julian Wiedmann
    Signed-off-by: Vasily Gorbik
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Vasily Gorbik
     
  • - Add/delete some blanks, white spaces and braces.
    - Fix misindentations.
    - Adjust a deprecated header include, and htons() conversion.
    - Remove extra 'return' statements.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Julian Wiedmann
     
  • Replace our custom version of netdev_name().

    Once we started to allocate the netdev at probe time with
    commit d3d1b205e89f ("s390/qeth: allocate netdevice early"), this
    stopped working as intended anyway.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Julian Wiedmann
     
  • The discipline struct is a fixed group of function pointers.
    So declare the L2 and L3 disciplines as constant.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Julian Wiedmann
     
  • For OSA devices that are _not_ configured in prio-queue mode, give users
    the option of selecting the number of active TX queues.
    This requires setting up the HW queues with a reasonable default QoS
    value in the QIB's PQUE parm area.

    As with the other device types, we bring up the device with a minimal
    number of TX queues for compatibility reasons.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Julian Wiedmann