09 Jun, 2020

1 commit

  • Pull s390 updates from Vasily Gorbik:

    - Add support for multi-function devices in pci code.

    - Enable PF-VF linking for architectures using the pdev->no_vf_scan
    flag (currently just s390).

    - Add reipl from NVMe support.

    - Get rid of critical section cleanup in entry.S.

    - Refactor PNSO CHSC (perform network subchannel operation) in cio and
    qeth.

    - QDIO interrupts and error handling fixes and improvements, more
    refactoring changes.

    - Align ioremap() with generic code.

    - Accept requests without the prefetch bit set in vfio-ccw.

    - Enable path handling via two new regions in vfio-ccw.

    - Other small fixes and improvements all over the code.

    * tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits)
    vfio-ccw: make vfio_ccw_regops variables declarations static
    vfio-ccw: Add trace for CRW event
    vfio-ccw: Wire up the CRW irq and CRW region
    vfio-ccw: Introduce a new CRW region
    vfio-ccw: Refactor IRQ handlers
    vfio-ccw: Introduce a new schib region
    vfio-ccw: Refactor the unregister of the async regions
    vfio-ccw: Register a chp_event callback for vfio-ccw
    vfio-ccw: Introduce new helper functions to free/destroy regions
    vfio-ccw: document possible errors
    vfio-ccw: Enable transparent CCW IPL from DASD
    s390/pci: Log new handle in clp_disable_fh()
    s390/cio, s390/qeth: cleanup PNSO CHSC
    s390/qdio: remove q->first_to_kick
    s390/qdio: fix up qdio_start_irq() kerneldoc
    s390: remove critical section cleanup from entry.S
    s390: add machine check SIGP
    s390/pci: ioremap() align with generic code
    s390/ap: introduce new ap function ap_get_qdev()
    Documentation/s390: Update / remove developerWorks web links
    ...

    Linus Torvalds
     

06 Jun, 2020

1 commit

  • Pull SCSI updates from James Bottomley:
    "This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
    target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
    of other minor updates.

    There are no major core changes in this series apart from a
    refactoring in scsi_lib.c"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
    scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes
    scsi: cxgb3i: Fix some leaks in init_act_open()
    scsi: ibmvscsi: Make some functions static
    scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim
    scsi: ufs: Fix WriteBooster flush during runtime suspend
    scsi: ufs: Fix index of attributes query for WriteBooster feature
    scsi: ufs: Allow WriteBooster on UFS 2.2 devices
    scsi: ufs: Remove unnecessary memset for dev_info
    scsi: ufs-qcom: Fix scheduling while atomic issue
    scsi: mpt3sas: Fix reply queue count in non RDPQ mode
    scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
    scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
    scsi: vhost: Notify TCM about the maximum sg entries supported per command
    scsi: qla2xxx: Remove return value from qla_nvme_ls()
    scsi: qla2xxx: Remove an unused function
    scsi: iscsi: Register sysfs for iscsi workqueue
    scsi: scsi_debug: Parser tables and code interaction
    scsi: core: Refactor scsi_mq_setup_tags function
    scsi: core: Fix incorrect usage of shost_for_each_device
    scsi: qla2xxx: Fix endianness annotations in source files
    ...

    Linus Torvalds
     

04 Jun, 2020

2 commits

  • Fixes the following sparse warnings:
    drivers/s390/cio/vfio_ccw_chp.c:62:30: warning: symbol 'vfio_ccw_schib_region_ops' was not declared. Should it be static?
    drivers/s390/cio/vfio_ccw_chp.c:117:30: warning: symbol 'vfio_ccw_crw_region_ops' was not declared. Should it be static?

    Link: https://lkml.kernel.org/r/patch.git-a34be7aede18.your-ad-here.call-01591269421-ext-5655@work.hours
    Reviewed-by: Cornelia Huck
    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghang Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

03 Jun, 2020

5 commits

  • Since CRW events are (should be) rare, let's put a trace
    in that routine too.

    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    Signed-off-by: Cornelia Huck

    Eric Farman
     
  • Use the IRQ to notify userspace that there is a CRW
    pending in the region, related to path-availability
    changes on the passthrough subchannel.

    Signed-off-by: Farhan Ali
    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    Signed-off-by: Cornelia Huck

    Farhan Ali
     
  • This region provides a mechanism to pass a Channel Report Word
    that affects vfio-ccw devices, and needs to be passed to the guest
    for its awareness and/or processing.

    The base driver (see crw_collect_info()) provides space for two
    CRWs, as a subchannel event may have two CRWs chained together
    (one for the ssid, one for the subchannel). As vfio-ccw will
    deal with everything at the subchannel level, provide space
    for a single CRW to be transferred in one shot.
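
As a rough illustration of the single-CRW layout described above, the region can be pictured as one 32-bit CRW followed by padding. This is only a sketch: the struct name, the helper, and the clear-on-read behaviour are assumptions made for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one 32-bit Channel Report Word plus padding,
 * giving the region a fixed 8-byte size. */
struct crw_region_sketch {
    uint32_t crw;
    uint32_t pad;
};

/* Hand out the pending CRW in one shot and clear the region, so a second
 * read returns 0 until a new CRW is stored (assumed behaviour). */
static uint32_t crw_region_read_once(struct crw_region_sketch *region)
{
    uint32_t crw;

    assert(region != NULL);
    crw = region->crw;
    memset(region, 0, sizeof(*region));
    return crw;
}
```

Sizing the region for a single CRW keeps each transfer to one fixed-size read, which matches the "one shot" wording of the commit message.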

    Signed-off-by: Farhan Ali
    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    [CH: added padding to ccw_crw_region]
    Signed-off-by: Cornelia Huck

    Farhan Ali
     
  • Pull block driver updates from Jens Axboe:
    "On top of the core changes, here are the block driver changes for this
    merge window:

    - NVMe changes:
    - NVMe over Fibre Channel protocol updates, which also reach
    over to drivers/scsi/lpfc (James Smart)
    - namespace revalidation support on the target (Anthony
    Iliopoulos)
    - gcc zero length array fix (Arnd Bergmann)
    - nvmet cleanups (Chaitanya Kulkarni)
    - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
    - use a SRQ per completion vector (Max Gurtovoy)
    - fix handling of runtime changes to the queue count (Weiping
    Zhang)
    - t10 protection information support for nvme-rdma and
    nvmet-rdma (Israel Rukshin and Max Gurtovoy)
    - target side AEN improvements (Chaitanya Kulkarni)
    - various fixes and minor improvements all over, including the
    nvme part of the lpfc driver

    - Floppy code cleanup series (Willy, Denis)

    - Floppy contention fix (Jiri)

    - Loop CONFIGURE support (Martijn)

    - bcache fixes/improvements (Coly, Joe, Colin)

    - q->queuedata cleanups (Christoph)

    - Get rid of ioctl_by_bdev (Christoph, Stefan)

    - md/raid5 allocation fixes (Coly)

    - zero length array fixes (Gustavo)

    - swim3 task state fix (Xu)"

    * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
    bcache: configure the asynchronous registertion to be experimental
    bcache: asynchronous devices registration
    bcache: fix refcount underflow in bcache_device_free()
    bcache: Convert pr_ uses to a more typical style
    bcache: remove redundant variables i and n
    lpfc: Fix return value in __lpfc_nvme_ls_abort
    lpfc: fix axchg pointer reference after free and double frees
    lpfc: Fix pointer checks and comments in LS receive refactoring
    nvme: set dma alignment to qword
    nvmet: cleanups the loop in nvmet_async_events_process
    nvmet: fix memory leak when removing namespaces and controllers concurrently
    nvmet-rdma: add metadata/T10-PI support
    nvmet: add metadata support for block devices
    nvmet: add metadata/T10-PI support
    nvme: add Metadata Capabilities enumerations
    nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
    nvmet: rename nvmet_rw_len to nvmet_rw_data_len
    nvmet: add metadata characteristics for a namespace
    nvme-rdma: add metadata/T10-PI support
    nvme-rdma: introduce nvme_rdma_sgl structure
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "Core block changes that have been queued up for this release:

    - Remove dead blk-throttle and blk-wbt code (Guoqing)

    - Include pid in blktrace note traces (Jan)

    - Don't spew I/O errors on wouldblock termination (me)

    - Zone append addition (Johannes, Keith, Damien)

    - IO accounting improvements (Konstantin, Christoph)

    - blk-mq hardware map update improvements (Ming)

    - Scheduler dispatch improvement (Salman)

    - Inline block encryption support (Satya)

    - Request map fixes and improvements (Weiping)

    - blk-iocost tweaks (Tejun)

    - Fix for timeout failing with error injection (Keith)

    - Queue re-run fixes (Douglas)

    - CPU hotplug improvements (Christoph)

    - Queue entry/exit improvements (Christoph)

    - Move DMA drain handling to the few drivers that use it (Christoph)

    - Partition handling cleanups (Christoph)"

    * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
    block: mark bio_wouldblock_error() bio with BIO_QUIET
    blk-wbt: rename __wbt_update_limits to wbt_update_limits
    blk-wbt: remove wbt_update_limits
    blk-throttle: remove tg_drain_bios
    blk-throttle: remove blk_throtl_drain
    null_blk: force complete for timeout request
    blk-mq: drain I/O when all CPUs in a hctx are offline
    blk-mq: add blk_mq_all_tag_iter
    blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
    blk-mq: use BLK_MQ_NO_TAG in more places
    blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
    blk-mq: move more request initialization to blk_mq_rq_ctx_init
    blk-mq: simplify the blk_mq_get_request calling convention
    blk-mq: remove the bio argument to ->prepare_request
    nvme: force complete cancelled requests
    blk-mq: blk-mq: provide forced completion method
    block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
    block: blk-crypto-fallback: remove redundant initialization of variable err
    block: reduce part_stat_lock() scope
    block: use __this_cpu_add() instead of access by smp_processor_id()
    ...

    Linus Torvalds
     

02 Jun, 2020

6 commits

  • To simplify future expansion.

    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    Signed-off-by: Cornelia Huck

    Eric Farman
     
  • The schib region can be used by userspace to get the subchannel-
    information block (SCHIB) for the passthrough subchannel.
    This can be useful to get information such as channel path
    information via the SCHIB.PMCW fields.

    Signed-off-by: Farhan Ali
    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    Signed-off-by: Cornelia Huck

    Farhan Ali
     
  • This is mostly for the purposes of a later patch, since
    we'll need to do the same thing later.

    While we are at it, move the resulting function call to ahead
    of the unregistering of the IOMMU notifier, so that it's done
    in the reverse order of how it was created.

    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    Signed-off-by: Cornelia Huck

    Eric Farman
     
  • Register the chp_event callback to receive channel path related
    events for the subchannels managed by vfio-ccw.

    Signed-off-by: Farhan Ali
    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    Signed-off-by: Cornelia Huck

    Farhan Ali
     
  • Consolidate some of the cleanup code for the regions, so that
    as more are added we reduce code duplication.

    Signed-off-by: Farhan Ali
    Signed-off-by: Eric Farman
    Reviewed-by: Cornelia Huck
    Message-Id:
    Signed-off-by: Cornelia Huck

    Farhan Ali
     
  • Remove the explicit prefetch check when using vfio-ccw devices.
    This check does not trigger in practice as all Linux channel programs
    are intended to use prefetch.

    It is expected that all ORBs issued by Linux will request prefetch.
    Although non-prefetching ORBs are not rejected, they will prefetch
    nonetheless. A warning is issued up to once per 5 seconds when a
    forced prefetch occurs.
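
The "up to once per 5 seconds" behaviour can be sketched with a small time-based limiter. This is not the kernel's ratelimit API; the struct and function names are hypothetical, and time is passed in as plain seconds to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WARN_INTERVAL_SEC 5 /* from the commit message: at most once per 5 seconds */

/* Hypothetical limiter state. */
struct warn_limiter {
    bool     seen;      /* true once a first warning has been emitted */
    uint64_t last_warn; /* time of the last emitted warning, in seconds */
};

/* Returns true when a warning may be emitted at time `now` (seconds). */
static bool warn_allowed(struct warn_limiter *l, uint64_t now)
{
    assert(l != NULL);
    if (l->seen && now - l->last_warn < WARN_INTERVAL_SEC)
        return false; /* still inside the quiet interval */
    l->seen = true;
    l->last_warn = now;
    return true;
}
```

A caller would still perform the forced prefetch on every non-prefetch ORB and consult the limiter only to decide whether to log.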

    A non-prefetch ORB does not necessarily result in an error, however
    frequent encounters with non-prefetch ORBs indicate that channel
    programs are being executed in a way that is inconsistent with what
    the guest is requesting. While there is currently no known case of an
    error caused by forced prefetch, it is possible in theory that forced
    prefetch could result in an error if applied to a channel program that
    is dependent on non-prefetch.

    Signed-off-by: Jared Rossi
    Reviewed-by: Eric Farman
    Message-Id:
    Signed-off-by: Cornelia Huck

    Jared Rossi
     

28 May, 2020

3 commits

  • CHSC3D (PNSO - perform network subchannel operation) is used for
    OC0 (Store-network-bridging-information) as well as for
    OC3 (Store-network-address-information). So common fields are renamed
    from *brinfo* to *pnso*.
    Also *_bridge_host_* is changed into *_addr_change_*, e.g.
    qeth_bridge_host_event to qeth_addr_change_event, for the
    same reasons.
    The keywords in the card traces are changed accordingly.

    Remove unused L3 types, as PNSO will only return Layer2 entries.

    Make PNSO CHSC implementation more consistent with existing API usage:
    Add new function ccw_device_pnso() to drivers/s390/cio/device_ops.c and
    the function declaration to arch/s390/include/asm/ccwdev.h, which takes
    a struct ccw_device * as parameter instead of schid and calls
    chsc_pnso().

    PNSO CHSC has no strict relationship to qdio. So move the calling
    function from qdio to qeth_l2 and move the necessary structures to a
    new file arch/s390/include/asm/chsc.h.

    Do response code evaluation only in chsc_error_from_response() and
    use return code in all other places. qeth_anset_makerc() was meant to
    evaluate the PNSO response code, but never did, because pnso_rc was
    already non-zero.

    Indentation was corrected in some places.

    Signed-off-by: Alexandra Winter
    Reviewed-by: Peter Oberparleiter
    Reviewed-by: Vineeth Vijayan
    Reviewed-by: Julian Wiedmann
    Signed-off-by: Vasily Gorbik

    Alexandra Winter
     
  • q->first_to_kick is obsolete, and can be replaced by q->first_to_check.

    Both cursors start off at 0. Out of the three code paths that update
    first_to_check, the qdio_inspect_queue() path is irrelevant as it
    doesn't even touch first_to_kick anymore.
    This leaves us with the two tasklet-driven code paths. Here any update
    to first_to_check is followed by a call to qdio_kick_handler(), which
    advances first_to_kick by the same amount.

    So the two cursors will differ only for a tiny moment. Drivers have no
    way of deterministically observing this difference, and thus it doesn't
    matter which of the cursors we use for reporting an error to q->handler.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     
  • Document the actual semantics, correcting an old copy & paste mistake.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     

21 May, 2020

2 commits

  • The IBM partition parser requires device type specific information only
    available to the DASD driver to correctly register partitions. The
    current approach of using ioctl_by_bdev with a fake user space pointer
    is discouraged.

    Fix this by replacing IOCTL calls with direct in-kernel function calls.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Stefan Haberland
    Reviewed-by: Jan Hoeppner
    Reviewed-by: Peter Oberparleiter
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Stefan Haberland
     
  • Prepare for in-kernel callers of this functionality.

    Signed-off-by: Christoph Hellwig
    [sth@de.ibm.com: remove leftover kfree]
    Signed-off-by: Stefan Haberland
    Reviewed-by: Peter Oberparleiter
    Reviewed-by: Jan Hoeppner
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

20 May, 2020

7 commits

  • Provide a new interface function to be used by the ap drivers:
    struct ap_queue *ap_get_qdev(ap_qid_t qid);
    Returns ptr to the struct ap_queue device or NULL if there
    was no ap_queue device with this qid found. When something is
    found, the reference count of the embedded device is increased.
    So the caller has to decrease the reference count after use
    with a call to put_device(&aq->ap_dev.device).

    With this patch also the ap_card_list is removed from the
    ap core code and a new hashtable is introduced which stores
    hnodes of all the ap queues known to the ap bus.

    The hashtable approach and a first implementation of this
    interface comes from a previous patch from
    Anthony Krowiak and an idea from Halil Pasic.
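
The get/put contract described above (look up by qid, take a reference on success, caller drops it afterwards) can be sketched in plain C. Everything here is a simplified stand-in: the chained hash table, the `queue_sketch` type, and the function names are hypothetical and do not reflect the kernel's hashtable or device-model APIs.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_HASH_BUCKETS 16 /* arbitrary size for this sketch */

/* Hypothetical stand-in for a refcounted queue device. */
struct queue_sketch {
    unsigned int qid;
    int refcount;
    struct queue_sketch *next; /* hash chain */
};

static struct queue_sketch *queue_hash[QUEUE_HASH_BUCKETS];

/* Insert a queue at the head of its bucket's chain. */
static void queue_add(struct queue_sketch *q)
{
    unsigned int b = q->qid % QUEUE_HASH_BUCKETS;

    q->next = queue_hash[b];
    queue_hash[b] = q;
}

/* Look up a queue by qid; on success take a reference for the caller. */
static struct queue_sketch *queue_get(unsigned int qid)
{
    struct queue_sketch *q;

    for (q = queue_hash[qid % QUEUE_HASH_BUCKETS]; q; q = q->next) {
        if (q->qid == qid) {
            q->refcount++;
            return q;
        }
    }
    return NULL; /* no queue with this qid is known */
}

/* Drop the reference taken by queue_get(). */
static void queue_put(struct queue_sketch *q)
{
    assert(q->refcount > 0);
    q->refcount--;
}
```

Taking the reference inside the lookup keeps the object from disappearing between the search and its first use by the caller.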

    Signed-off-by: Harald Freudenberger
    Suggested-by: Tony Krowiak
    Suggested-by: Halil Pasic
    Reviewed-by: Tony Krowiak
    Signed-off-by: Vasily Gorbik

    Harald Freudenberger
     
  • SBALs in PRIMED or ERROR state represent new work on the Input Queue.
    But while inbound_primed() does all sorts of ACK management for new
    PRIMED work, the same handling is currently missing for ERROR work.
    In particular the path for ERROR work doesn't clear up _old_ ACKs.

    Treat ERROR work the same as PRIMED work, but consider that the QEBSM
    auto-ACK feature doesn't apply here. So we need to set the ACK manually,
    as if it was a non-QEBSM device.

    Note that this doesn't aspire to actually improve performance, the main
    goal is to just unify the code paths and have consistent behaviour.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     
  • inbound_primed() currently has two code paths - one for QEBSM that knows
    how to deal with multiple ACKs, and a non-QEBSM path that strictly
    assumes a single ACK on the queue.

    In preparation for a subsequent patch, slightly adjust the non-QEBSM
    path so that it can manage a queue with multiple ACKs.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     
  • Refilling the Input Queue requires additional checks, as the refilled
    SBALs can overlap with the ACKs that qdio maintains on the queue.

    This code path is way too complex, and does a whole bunch of wrap-around
    checks that the modulo arithmetic in sub_buf() takes care of by itself.
    So shrink down all that code into a few lines of equivalent
    functionality.
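
To see why explicit wrap-around checks become redundant, consider a ring whose size is a power of two: masking the result of an unsigned subtraction already wraps correctly. The sketch below assumes a ring of 128 buffers; the names `ring_add`, `ring_sub`, and `ring_contains` are hypothetical and are not the qdio helpers themselves.

```c
#include <assert.h>

#define RING_SIZE 128u /* assumed power-of-two number of buffers */
#define RING_MASK (RING_SIZE - 1u)

/* Advance a buffer index by `inc`, wrapping at the end of the ring. */
static unsigned int ring_add(unsigned int bufnr, unsigned int inc)
{
    return (bufnr + inc) & RING_MASK;
}

/* Step a buffer index back by `dec`; unsigned underflow plus the mask
 * yields the wrapped index without any explicit branch. */
static unsigned int ring_sub(unsigned int bufnr, unsigned int dec)
{
    return (bufnr - dec) & RING_MASK;
}

/* True when buffer `nr` lies within `count` buffers starting at `start`,
 * including ranges that wrap past the end of the ring. */
static int ring_contains(unsigned int start, unsigned int count, unsigned int nr)
{
    assert(count <= RING_SIZE);
    return ring_sub(nr, start) < count;
}
```

With helpers of this shape, an overlap test between refilled buffers and outstanding ACKs reduces to a single distance comparison.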

    Signed-off-by: Julian Wiedmann
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     
  • commit 8ebd51a705c5 ("s390/cio: idset.c: remove some unused functions")
    left behind this, remove it

    Link: https://lkml.kernel.org/r/20200508140643.30540-1-yuehaibing@huawei.com
    Signed-off-by: YueHaibing
    Reviewed-by: Vineeth Vijayan
    [vneethv@linux.ibm.com: Slight modification in the title]
    Signed-off-by: Vineeth Vijayan
    Signed-off-by: Vasily Gorbik

    YueHaibing
     
  • Commit 394216275c7d ("s390: remove broken hibernate / power management support")
    removed support for ARCH_HIBERNATION_POSSIBLE on s390.
    So drop the unused pm ops from the iucv drivers.

    CC: Hendrik Brueckner
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Julian Wiedmann
     
  • commit 5e1fb45ec8e2 ("s390/ccwgroup: remove pm support") removed power
    management support from the ccwgroup bus driver. So remove the
    associated callbacks from all ccwgroup drivers.

    CC: Vineeth Vijayan
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Julian Wiedmann
     

16 May, 2020

1 commit


14 May, 2020

1 commit

  • Fix to return negative error code -ENOMEM from the smcd_alloc_dev()
    error handling case instead of 0, as done elsewhere in this function.
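
The class of bug fixed here is a shared error label that returns a status variable which was never set on one path. A minimal sketch, with a hypothetical `probe_sketch()` function and a flag that simulates the allocation failure:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct dev_sketch { int id; };

/* `alloc_fails` simulates an allocation failure for the example. */
static int probe_sketch(int alloc_fails, struct dev_sketch **out)
{
    struct dev_sketch *dev;
    int ret = 0;

    assert(out != NULL);
    dev = alloc_fails ? NULL : calloc(1, sizeof(*dev));
    if (!dev) {
        ret = -ENOMEM; /* without this assignment, ret would still be 0 */
        goto err;
    }
    *out = dev;
    return 0;

err:
    *out = NULL;
    return ret;
}
```

Returning 0 from the error path would make the caller believe the probe succeeded even though no device was set up.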

    Fixes: 684b89bc39ce ("s390/ism: add device driver for internal shared memory")
    Reported-by: Hulk Robot
    Signed-off-by: Wei Yongjun
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Wei Yongjun
     

12 May, 2020

8 commits

  • At the moment we allocate and register the Scsi_Host object corresponding
    to a zfcp adapter (FCP device) very early in the life cycle of the adapter
    - even before we fully discover and initialize the underlying
    firmware/hardware. This had the advantage that we could already use the
    Scsi_Host object, and fill in all its information during said discover and
    initialize.

    Due to commit 737eb78e82d5 ("block: Delay default elevator initialization")
    (first released in v5.4), we noticed a regression that would prevent us
    from using any storage volume if zfcp is configured with support for DIF or
    DIX (zfcp.dif=1 || zfcp.dix=1). Doing so would result in an illegal memory
    access as soon as the first request is sent with such a configuration. As
    an example of a crash resulting from this:

    scsi host0: scsi_eh_0: sleeping
    scsi host0: zfcp
    qdio: 0.0.1900 ZFCP on SC 4bd using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
    scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36
    Unable to handle kernel pointer dereference in virtual kernel address space
    Failing address: 0000000000000000 TEID: 0000000000000483
    Fault in home space mode while using kernel ASCE.
    AS:0000000035c7c007 R3:00000001effcc007 S:00000001effd1000 P:000000000000003d
    Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in: ...
    CPU: 1 PID: 783 Comm: kworker/u760:5 Kdump: loaded Not tainted 5.6.0-rc2-bb-next+ #1
    Hardware name: ...
    Workqueue: scsi_wq_0 fc_scsi_scan_rport [scsi_transport_fc]
    Krnl PSW : 0704e00180000000 000003ff801fcdae (scsi_queue_rq+0x436/0x740 [scsi_mod])
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
    Krnl GPRS: 0fffffffffffffff 0000000000000000 0000000187150120 0000000000000000
    000003ff80223d20 000000000000018e 000000018adc6400 0000000187711000
    000003e0062337e8 00000001ae719000 0000000187711000 0000000187150000
    00000001ab808100 0000000187150120 000003ff801fcd74 000003e0062336a0
    Krnl Code: 000003ff801fcd9e: e310a35c0012 lt %r1,860(%r10)
    000003ff801fcda4: a7840010 brc 8,000003ff801fcdc4
    #000003ff801fcda8: e310b2900004 lg %r1,656(%r11)
    >000003ff801fcdae: d71710001000 xc 0(24,%r1),0(%r1)
    000003ff801fcdb4: e310b2900004 lg %r1,656(%r11)
    000003ff801fcdba: 41201018 la %r2,24(%r1)
    000003ff801fcdbe: e32010000024 stg %r2,0(%r1)
    000003ff801fcdc4: b904002b lgr %r2,%r11
    Call Trace:
    [] scsi_queue_rq+0x436/0x740 [scsi_mod]
    ([] scsi_queue_rq+0x3fc/0x740 [scsi_mod])
    [] blk_mq_dispatch_rq_list+0x390/0x680
    [] blk_mq_sched_dispatch_requests+0x196/0x1a8
    [] __blk_mq_run_hw_queue+0x144/0x160
    [] __blk_mq_delay_run_hw_queue+0x96/0x228
    [] blk_mq_run_hw_queue+0xd2/0xe0
    [] blk_mq_sched_insert_request+0x192/0x1d8
    [] blk_execute_rq_nowait+0x80/0x90
    [] blk_execute_rq+0x6e/0xb0
    [] __scsi_execute+0xe2/0x1f0 [scsi_mod]
    [] scsi_probe_and_add_lun+0x358/0x840 [scsi_mod]
    [] __scsi_scan_target+0xc4/0x228 [scsi_mod]
    [] scsi_scan_target+0xd4/0x100 [scsi_mod]
    [] fc_scsi_scan_rport+0x96/0xc0 [scsi_transport_fc]
    [] process_one_work+0x458/0x7d0
    [] worker_thread+0x242/0x448
    [] kthread+0x15c/0x170
    [] ret_from_fork+0x30/0x38
    INFO: lockdep is turned off.
    Last Breaking-Event-Address:
    [] scsi_add_cmd_to_list+0x9e/0xa8 [scsi_mod]
    Kernel panic - not syncing: Fatal exception: panic_on_oops

    While this issue is exposed by the commit named above, this is only by
    accident. The real issue exists for longer already - basically since it's
    possible to use blk-mq via scsi-mq, and blk-mq pre-allocates all requests
    for a tag-set during initialization of the same. For a given Scsi_Host
    object this is done when adding the object to the midlayer
    (`scsi_add_host()` and such). In `scsi_mq_setup_tags()` the midlayer
    calculates how much memory is required for a single scsi_cmnd, and its
    additional data, which also might include space for additional protection
    data - depending on whether the Scsi_Host has any form of protection
    capabilities (`scsi_host_get_prot()`).

    The problem is this: because zfcp does this step before we actually
    know whether the firmware/hardware has these capabilities, we don't set any
    protection capabilities in the Scsi_Host object. And so, no space is
    allocated for additional protection data for requests in the Scsi_Host
    tag-set.

    Once we go through discover and initialize the FCP device firmware/hardware
    fully (this is done via the firmware commands "Exchange Config Data" and
    "Exchange Port Data") we find out whether it actually supports DIF and DIX,
    and we set the corresponding capabilities in the Scsi_Host object (in
    `zfcp_scsi_set_prot()`). Now the Scsi_Host potentially has protection
    capabilities, but the already allocated requests in the tag-set don't have
    any space allocated for that.

    When we then trigger target scanning or add scsi_devices manually, the
    midlayer will use requests from that tag-set, and before sending most
    requests, it will also call `scsi_mq_prep_fn()`. To prepare the scsi_cmnd
    this function will check again whether the used Scsi_Host has any
    protection capabilities - and now it potentially has - and if so, it will
    try to initialize the assumed to be preallocated structures and thus it
    causes the crash, like shown above.

    Before delaying the default elevator initialization with the commit named
    above, we always would also allocate an elevator for any scsi_device before
    ever sending any requests - in contrast to now, where we do it after
    device-probing. That elevator in turn would have its own tag-set, and that
    is initialized after we went through discovery and initialization of the
    underlying firmware/hardware. So requests from that tag-set can be
    allocated properly, and if used - unless the user changes/disables the
    default elevator - this would hide the underlying issue.

    To fix this for any configuration - with or without an elevator - we move
    the allocation and registration of the Scsi_Host object for a given FCP
    device to after the first complete discovery and initialization of the
    underlying firmware/hardware. By doing that we can make all basic
    properties of the Scsi_Host known to the midlayer by the time we call
    `scsi_add_host()`, including whether we have any protection capabilities.
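
The ordering problem can be reduced to a few lines: per-request memory is laid out once, based on the host's capabilities at that moment, while request preparation consults the capabilities again later. The sketch below uses hypothetical types and an arbitrary buffer size; it models the failure as a `false` return where the real code would touch memory that was never allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PROT_DATA_SIZE 24 /* arbitrary size for this sketch */

struct host_sketch { bool has_prot; };

struct request_sketch {
    void *prot_data; /* only allocated if the host had protection at setup */
};

/* Mimics tag-set setup: the per-request layout is fixed here. */
static struct request_sketch *request_prealloc(const struct host_sketch *h)
{
    struct request_sketch *rq;

    assert(h != NULL);
    rq = calloc(1, sizeof(*rq));
    if (rq && h->has_prot)
        rq->prot_data = calloc(1, PROT_DATA_SIZE);
    return rq;
}

/* Mimics command preparation: trusts the host's current capabilities.
 * Returns false where the real code would use a missing buffer. */
static bool request_prepare(const struct host_sketch *h,
                            const struct request_sketch *rq)
{
    if (h->has_prot && !rq->prot_data)
        return false;
    return true;
}
```

Registering the host only after its capabilities are known means preallocation and preparation see the same answer, which is the effect of the reordering described above.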

    To do that we have to delay all the accesses that we would have done in the
    past during discovery and initialization, and do them instead once we are
    finished with it. The previous patches ramp up to this by fencing and
    factoring out all these accesses, and make it possible to re-do them later
    on. In addition we also make use of the diagnostic buffers we recently
    added with

    commit 92953c6e0aa7 ("scsi: zfcp: signal incomplete or error for sync exchange config/port data")
    commit 7e418833e689 ("scsi: zfcp: diagnostics buffer caching and use for exchange port data")
    commit 088210233e6f ("scsi: zfcp: add diagnostics buffer for exchange config data")

    (first released in v5.5), because these already cache all the information
    we need for that "re-do operation" - the cached information is always
    updated during xconf or xport data, so it won't be stale.

    In addition to the move and re-do, this patch also updates the
    function-documentation of `zfcp_scsi_adapter_register()` and changes how it
    reports if a Scsi_Host object already exists. In that case future
    recovery-operations can skip this step completely and behave much like they
    would do in the past - zfcp does not release a once allocated Scsi_Host
    object unless the corresponding FCP device is deconstructed completely.

    Link: https://lore.kernel.org/r/030dd6da318bbb529f0b5268ec65cebcd20fc0a3.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When setting an adapter online for the first time, we also create a couple
    of entries for it in the sysfs device tree. This is true even if the
    adapter has never successfully gone through exchange config and exchange
    port data.

    When moving the scsi host object allocation and registration to after the
    first exchange config and exchange port data, this makes the `port_rescan`
    attribute susceptible to invalid pointer-dereferences of the shost field
    before the adapter is fully initialized.

    When written to, it schedules a `scan_work` item that will in turn make use
    of the associated fibre channel host object to check the topology used for
    this FCP device.

    Because scanning for remote ports can't be done successfully without
    completing exchange config and exchange port data first, we can simply
    fence `port_rescan`, and so prevent the illegal access.

    As with cases where we can't get a reference to the adapter, we also return
    -ENODEV here. Applications need to handle that errno today already.

    After a successful allocation of the scsi host object nothing changes in
    the work flow.
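
    A minimal sketch of such a fence, as a userspace model: the
    `model_*` names are hypothetical and the scan work runs inline instead
    of being scheduled. Only the -ENODEV return value is taken from the
    description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter object. */
struct model_adapter {
	void *shost;            /* NULL until the host object is allocated */
	unsigned int scans_run; /* counts executed scan work items */
};

/* Models the scan work item, which relies on the host object. */
static void model_scan_work(struct model_adapter *adapter)
{
	assert(adapter->shost != NULL); /* guaranteed by the fence below */
	adapter->scans_run++;
}

/*
 * Models a sysfs store handler: returns the number of consumed bytes on
 * success and -ENODEV if the adapter is missing or not yet initialized.
 */
static long model_port_rescan_store(struct model_adapter *adapter,
				    size_t count)
{
	if (!adapter)
		return -ENODEV; /* no reference to the adapter */
	if (!adapter->shost)
		return -ENODEV; /* fence: host object not allocated yet */

	model_scan_work(adapter);
	return (long)count;
}
```

    Both failure cases yield the same errno, so a caller that already
    handles -ENODEV needs no new error path.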

    Link: https://lore.kernel.org/r/ef65366d309993ca91b6917727590ca7ca166c8f.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • Common status flags that all main objects - adapter, port, and unit -
    support are propagated to sub-objects when set or cleared. For instance,
    when setting the status ZFCP_STATUS_COMMON_ERP_INUSE for an adapter object,
    we will propagate this to all its child ports and units - same for when
    clearing a common status flag.

    Units of an adapter object are enumerated via __shost_for_each_device()
    over the scsi host object of the corresponding adapter.

    Once we move the scsi host object allocation and registration to after the
    first exchange config and exchange port data, this won't be possible for
    cases where we set or clear common statuses during the very first adapter
    recovery.

    But since we won't have any port or unit objects yet at that point in time,
    we can just fence the status propagation for cases where the scsi host
    object is not yet set in the adapter object. It won't change any effective
    status propagations, but will prevent us from dereferencing invalid
    pointers.

    For any later point in the work flow the scsi host object will be set and
    thus nothing is changed then.
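
    The fenced propagation could look roughly like the following model. The
    flag value, the fixed-size unit array, and all `model_*` names are
    assumptions made for illustration; the driver itself iterates over the
    devices of the scsi host object.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_STATUS_ERP_INUSE 0x1u /* hypothetical flag value */
#define MODEL_MAX_UNITS 4

/* Hypothetical host object that owns the unit status words. */
struct model_shost {
	unsigned int unit_status[MODEL_MAX_UNITS];
	size_t nr_units;
};

/* Hypothetical stand-in for the adapter object. */
struct model_adapter {
	unsigned int status;
	struct model_shost *shost; /* NULL during the very first recovery */
};

/* Set a common status flag on the adapter and on all of its units. */
static void model_set_adapter_status(struct model_adapter *adapter,
				     unsigned int mask)
{
	size_t i;

	assert(adapter != NULL);
	adapter->status |= mask;

	if (!adapter->shost)
		return; /* fence: no units can exist yet */

	for (i = 0; i < adapter->shost->nr_units; i++)
		adapter->shost->unit_status[i] |= mask;
}
```

    The adapter's own status is updated in both cases; only the walk over
    the units is skipped while there is no host object to walk.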

    Link: https://lore.kernel.org/r/f51fe5f236a1e3d1ce53379c308777561bfe35e1.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When doing the very first adapter recovery - initialization - for a FCP
    device in a point-to-point topology we also allocate the port object
    corresponding to the attached remote port, and trigger a port recovery for
    it that will run after the adapter recovery finished.

    Right now this happens right after we finished with the exchange config
    data command, and uses the fibre channel host object corresponding to the
    FCP device to determine whether a point-to-point topology is used.

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, this use of the fc_host object is not
    possible anymore at that point in the work flow.

    But the allocation and recovery trigger doesn't have notable side-effects
    on the following exchange port data processing, so we can move those to
    after xport data, and thus also to after the scsi host object allocation,
    once we move it. Then the fc_host object can be used again, like it is now.

    For any further adapter recoveries this doesn't change anything, because at
    that point the port object already exists and recovery is triggered
    elsewhere for existing port objects.

    Link: https://lore.kernel.org/r/73e5d4ac21e2b37bf0c3ca8e530bc5a5c6e74f8f.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When receiving a notification that a FCP device lost its local link we
    usually update the fibre channel host object which represents that FCP
    device to reflect that.

    This notification/information can also surface when the FCP device is
    running through adapter recovery (exchange config and exchange port data
    return incomplete).

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, and this happens during the very first
    adapter recovery, these updates cannot be done until after the scsi host
    object is allocated.

    Reorder the fc_host updates in zfcp_fsf_fc_host_link_down() so that they
    only happen after a check of whether the scsi host object is already
    allocated or not.

    During the first adapter recovery these updates are skipped if a
    link-down condition is detected, but we can repeat them after we have
    allocated the scsi host object, if necessary.

    For any further link-down handling the only changes in the work flow are
    the slightly reordered assignments in zfcp_fsf_fc_host_link_down().
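
    The reordering can be pictured with a small model: adapter-local state
    is reset first, and the host object is only touched after the check.
    The fields and `model_*` names are hypothetical and do not reflect the
    actual assignments in zfcp_fsf_fc_host_link_down().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fibre channel host object. */
struct model_fc_host {
	unsigned int port_id;
	unsigned long long fabric_name;
};

/* Hypothetical stand-in for the adapter object. */
struct model_adapter {
	unsigned int hw_version;       /* adapter-local, always available */
	struct model_fc_host *fc_host; /* NULL until allocated */
};

/*
 * Handle a link-down notification. Returns true if the host object was
 * updated, false if that part was skipped and has to be repeated later.
 */
static bool model_link_down(struct model_adapter *adapter)
{
	assert(adapter != NULL);

	/* Assignments that do not need the host object come first. */
	adapter->hw_version = 0;

	if (!adapter->fc_host)
		return false; /* fence: repeat after the allocation */

	adapter->fc_host->port_id = 0;
	adapter->fc_host->fabric_name = 0;
	return true;
}
```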

    Link: https://lore.kernel.org/r/f841f2cda61dcd7b8549910c44e1831927459edf.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When executing exchange port data for a FCP device for the first time, or
    after an adapter recovery, we update several properties of the fibre
    channel host object which represents that FCP device.

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, this is not possible for the former case.

    Move all these updates into a separate, fenced function that first checks
    whether the scsi host object already exists before making the updates.

    During the first ever exchange port data in the adapter life cycle this
    will make the exchange port data handler skip over this update step, but we
    can repeat it later, after we allocated the scsi host object.

    For any further recovery of that adapter the work flow is only changed
    slightly because then the scsi host object already exists and we don't free
    it until we release the adapter completely at the end of its life cycle.
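
    As a sketch of the "skip now, repeat later" flow: the handler below
    always refreshes a cached copy of the response and then calls the fenced
    update, which can be called again once the host object exists. The
    structures, fields, and `model_*` names are hypothetical, and the cache
    only loosely stands in for the driver's diagnostics buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of an exchange port data response. */
struct model_xport_data {
	unsigned int supported_speed;
	unsigned int maxframe_size;
};

/* Hypothetical stand-in for the fibre channel host object. */
struct model_fc_host {
	unsigned int supported_speed;
	unsigned int maxframe_size;
};

/* Hypothetical stand-in for the adapter object. */
struct model_adapter {
	struct model_xport_data cache; /* refreshed on every response */
	struct model_fc_host *fc_host; /* NULL until allocated */
};

/* Fenced update: a no-op as long as the host object does not exist. */
static void model_xport_host_update(struct model_adapter *adapter)
{
	assert(adapter != NULL);

	if (!adapter->fc_host)
		return;

	adapter->fc_host->supported_speed = adapter->cache.supported_speed;
	adapter->fc_host->maxframe_size = adapter->cache.maxframe_size;
}

/* Response handler: cache the data, then try the fenced update. */
static void model_xport_handler(struct model_adapter *adapter,
				const struct model_xport_data *resp)
{
	assert(adapter != NULL && resp != NULL);

	adapter->cache = *resp;
	model_xport_host_update(adapter);
}
```

    During the first run the handler leaves the host object alone; after
    the allocation a single call to `model_xport_host_update()` applies the
    cached values.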

    Link: https://lore.kernel.org/r/ae454c2dc6da0b02907c489af91d0b211d331825.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When executing exchange config data for a FCP device for the first time, or
    after an adapter recovery, we update several properties of the scsi host or
    fibre channel host object that represent that FCP device.

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, this is not possible for the former case.

    Move all these updates into a separate, fenced function that first checks
    whether the scsi host object already exists before making the updates.

    During the first ever exchange config data in the adapter life cycle this
    will make the exchange config data handler skip over this update step, but
    we can repeat it later, after we allocated the scsi host object.

    For any further recovery of that adapter the work flow is only changed
    slightly because then the scsi host object already exists and we don't free
    it until we release the adapter completely at the end of its life cycle.

    Link: https://lore.kernel.org/r/5fc3f4d38d4334f7aa595497c6f7865fb1102e0f.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When establishing and activating the QDIO queue pair for a FCP device for
    the first time, or after an adapter recovery, we publish some of its
    characteristics to the scsi host object representing that FCP device.

    When moving the scsi host object allocation and registration to after the
    first exchange config and exchange port data, this is not possible for the
    former case - QDIO open for the first time - because that happens before
    exchange config and exchange port data.

    Move the scsi host object update into a fenced function that checks whether
    the object already exists or not. This way we can repeat that step later,
    once we are past the allocation.

    Once the first recovery succeeds we don't release the scsi host object
    anymore, so further recoveries do work as before.
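
    A rough model of such a fenced publish step follows. Which
    characteristics are published and how they are derived is an assumption
    here: the sketch treats each queue entry as addressing one 4 KiB page
    and converts that to 512-byte sectors. All `model_*` names are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096u /* assumption: one page per queue entry */
#define MODEL_SECTOR_SIZE 512u

/* Hypothetical stand-in for the scsi host object. */
struct model_shost {
	unsigned int sg_tablesize;
	unsigned int max_sectors;
};

/* Hypothetical queue characteristics known after the queues are open. */
struct model_qdio {
	unsigned int max_entries_per_req;
};

/* Hypothetical stand-in for the adapter object. */
struct model_adapter {
	struct model_qdio qdio;
	struct model_shost *shost; /* NULL while the queues are first opened */
};

/* Publish queue limits to the host object, if there is one already. */
static void model_qdio_shost_update(struct model_adapter *adapter)
{
	assert(adapter != NULL);

	if (!adapter->shost)
		return; /* fence: repeat once the host object is allocated */

	adapter->shost->sg_tablesize = adapter->qdio.max_entries_per_req;
	adapter->shost->max_sectors = adapter->qdio.max_entries_per_req *
				      (MODEL_PAGE_SIZE / MODEL_SECTOR_SIZE);
}
```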

    Link: https://lore.kernel.org/r/a214ebf508f71e3690113e3e90edab1cea0e24e3.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     

07 May, 2020

3 commits