17 Oct, 2020

1 commit

  • Pull s390 updates from Vasily Gorbik:

    - Remove address space overrides using set_fs()

    - Convert to generic vDSO

    - Convert to generic page table dumper

    - Add ARCH_HAS_DEBUG_WX support

    - Add leap seconds handling support

    - Add NVMe firmware-assisted kernel dump support

    - Extend NVMe boot support with memory clearing control and addition of
    kernel parameters

    - AP bus and zcrypt api code rework. Add adapter configure/deconfigure
    interface. Extend debug features. Add failure injection support

    - Add ECC secure private keys support

    - Add KASan support for running protected virtualization host with
    4-level paging

    - Utilize destroy page ultravisor call to speed up secure guests
    shutdown

    - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code

    - Various checksum improvements

    - Various other small fixes and improvements all over the code

    * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
    s390/uaccess: fix indentation
    s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
    s390/zcrypt: fix wrong format specifications
    s390/kprobes: move insn_page to text segment
    s390/sie: fix typo in SIGP code description
    s390/lib: fix kernel doc for memcmp()
    s390/zcrypt: Introduce Failure Injection feature
    s390/zcrypt: move ap_msg param one level up the call chain
    s390/ap/zcrypt: revisit ap and zcrypt error handling
    s390/ap: Support AP card SCLP config and deconfig operations
    s390/sclp: Add support for SCLP AP adapter config/deconfig
    s390/ap: add card/queue deconfig state
    s390/ap: add error response code field for ap queue devices
    s390/ap: split ap queue state machine state from device state
    s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
    s390/zcrypt: introduce msg tracking in zcrypt functions
    s390/startup: correct early pgm check info formatting
    s390: remove orphaned extern variables declarations
    s390/kasan: make sure int handler always run with DAT on
    s390/ipl: add support to control memory clearing for nvme re-IPL
    ...

    Linus Torvalds
     

16 Sep, 2020

2 commits

  • While reviewing commit 936e6b85da04 ("scsi: zfcp: Fix panic on ERP timeout
    for previously dismissed ERP action"), I stumbled over
    zfcp_fsf_req_complete() and wondered whether it has similar issues wrt
    concurrent modification of req->erp_action by
    zfcp_erp_strategy_check_fsfreq().

    But a closer look shows that both of its callers [zfcp_fsf_reqid_check(),
    zfcp_fsf_req_dismiss_all()] remove the request from the adapter's req_list
    under the req_list's lock. Hence we can trust that if
    zfcp_erp_strategy_check_fsfreq() concurrently looks up the corresponding
    req_id, it won't find this request and is thus unable to modify it while
    it's being processed by zfcp_fsf_req_complete().

    Add a code comment that hopefully makes this easier for future readers, and
    condense the two accesses to ->erp_action that made me trip over this code
    path in the first place.

    Link: https://lore.kernel.org/r/c500eac301fcbba5af942bbd200f2d6b14e46994.1599765652.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Reviewed-by: Benjamin Block
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Julian Wiedmann
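
The locking argument in the commit above can be illustrated with a small userspace model. This is only a sketch, not the zfcp code: `struct req`, `struct req_list`, `req_list_remove()` and `req_list_dismiss()` are hypothetical names, and a pthread mutex stands in for the req_list lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an FSF request tracked by request ID. */
struct req {
	unsigned long id;
	void *erp_action;
	struct req *next;
};

/* Hypothetical stand-in for the adapter's req_list and its lock. */
struct req_list {
	pthread_mutex_t lock;
	struct req *head;
};

/* Unlink the request with the given ID while holding the lock. */
struct req *req_list_remove(struct req_list *l, unsigned long id)
{
	struct req **pp, *found = NULL;

	pthread_mutex_lock(&l->lock);
	for (pp = &l->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			found = *pp;
			*pp = found->next;
			found->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return found;
}

/* Model of the concurrent checker: look the request up by ID and modify
 * it, all under the same lock. Returns 1 if it was found, 0 otherwise. */
int req_list_dismiss(struct req_list *l, unsigned long id)
{
	struct req *r;
	int found = 0;

	pthread_mutex_lock(&l->lock);
	for (r = l->head; r; r = r->next) {
		if (r->id == id) {
			r->erp_action = NULL;
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return found;
}
```

Once `req_list_remove()` has returned a request, a later `req_list_dismiss()` for the same ID finds nothing, so the caller that removed the request can keep processing it without further synchronization.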
     
  • Use the right helper to avoid poking around in the list's internals.

    Link: https://lore.kernel.org/r/ed669555c73aab95b29444c10066f492c0c43391.1599765652.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Reviewed-by: Benjamin Block
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Julian Wiedmann
     

14 Sep, 2020

1 commit


18 Aug, 2020

1 commit

  • Before v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use
    timer_setup()"), we intentionally only passed zfcp_adapter as context
    argument to zfcp_fsf_request_timeout_handler(). Since we only trigger
    adapter recovery, it was unnecessary to sync against races between timeout
    and (late) completion. Likewise, we only passed zfcp_erp_action as context
    argument to zfcp_erp_timeout_handler(). Since we only wake up an ERP action,
    it was unnecessary to sync against races between timeout and (late)
    completion.

    Meanwhile the timeout handlers get timer_list as context argument and do a
    timer-specific container-of to zfcp_fsf_req, which may already have been
    freed.

    Fix it by using del_timer_sync() instead of del_timer(). This makes sure
    that any request timeout handler that might just have started before the
    timer was deleted has completed, so the request free happens afterwards.

    Space time diagram of potential use-after-free:

    Basic idea is to have 2 or more pending requests whose timeouts run out at
    almost the same time.

    req 1 timeout     ERP thread        req 2 timeout
    ----------------  ----------------  ---------------------------------------
    zfcp_fsf_request_timeout_handler
    fsf_req = from_timer(fsf_req, t, timer)
    adapter = fsf_req->adapter
    zfcp_qdio_siosl(adapter)
    zfcp_erp_adapter_reopen(adapter,...)
                      zfcp_erp_strategy
                      ...
                      zfcp_fsf_req_dismiss_all
                      list_for_each_entry_safe
                      zfcp_fsf_req_complete 1
                      del_timer 1
                      zfcp_fsf_req_free 1
                      zfcp_fsf_req_complete 2
                                        zfcp_fsf_request_timeout_handler
                      del_timer 2
                                        fsf_req = from_timer(fsf_req, t, timer)
                      zfcp_fsf_req_free 2
                                        adapter = fsf_req->adapter
                                                  ^^^^^^^ already freed

    Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com
    Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()")
    Cc: #4.15+
    Suggested-by: Julian Wiedmann
    Reviewed-by: Julian Wiedmann
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Steffen Maier
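
Why a freed request hurts the timeout handler can be sketched with a userspace version of the container-of step. The structures below are heavily reduced, hypothetical stand-ins for the kernel ones, and `timeout_handler_adapter()` is an illustrative name only.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(): step back from a
 * member pointer to the start of the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct timer { unsigned long expires; };
struct adapter { int id; };
struct fsf_req {
	struct adapter *adapter;
	struct timer timer;	/* embedded: lives and dies with the request */
};

/* The handler only receives the timer and derives the request from it.
 * If the request was freed in the meantime, the read of req->adapter
 * would be a use-after-free. */
struct adapter *timeout_handler_adapter(struct timer *t)
{
	struct fsf_req *req;

	assert(t != NULL);
	req = container_of(t, struct fsf_req, timer);
	return req->adapter;
}
```

Because the timer is embedded in the request, the request has to stay allocated until a handler that is already running has finished, which is the ordering the commit message attributes to del_timer_sync().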
     

07 Aug, 2020

1 commit

  • Pull SCSI updates from James Bottomley:
    "This consists of the usual driver updates (ufs, qla2xxx, tcmu, lpfc,
    hpsa, zfcp, scsi_debug) and minor bug fixes.

    We also have a huge docbook fix update like most other subsystems and
    no major update to the core (the few non trivial updates are either
    minor fixes or removing an unused feature [scsi_sdb_cache])"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (307 commits)
    scsi: scsi_transport_srp: Sanitize scsi_target_block/unblock sequences
    scsi: ufs-mediatek: Apply DELAY_AFTER_LPM quirk to Micron devices
    scsi: ufs: Introduce device quirk "DELAY_AFTER_LPM"
    scsi: virtio-scsi: Correctly handle the case where all LUNs are unplugged
    scsi: scsi_debug: Implement tur_ms_to_ready parameter
    scsi: scsi_debug: Fix request sense
    scsi: lpfc: Fix typo in comment for ULP
    scsi: ufs-mediatek: Prevent LPM operation on undeclared VCC
    scsi: iscsi: Do not put host in iscsi_set_flashnode_param()
    scsi: hpsa: Correct ctrl queue depth
    scsi: target: tcmu: Make TMR notification optional
    scsi: target: tcmu: Implement tmr_notify callback
    scsi: target: tcmu: Fix and simplify timeout handling
    scsi: target: tcmu: Factor out new helper ring_insert_padding
    scsi: target: tcmu: Do not queue aborted commands
    scsi: target: tcmu: Use priv pointer in se_cmd
    scsi: target: Add tmr_notify backend function
    scsi: target: Modify core_tmr_abort_task()
    scsi: target: iscsi: Fix inconsistent debug message
    scsi: target: iscsi: Fix login error when receiving
    ...

    Linus Torvalds
     

08 Jul, 2020

5 commits

  • zfcp_qdio_send() and zfcp_qdio_int_req() run concurrently, adding and
    completing SBALs on the Request Queue. There's a theoretical race where
    zfcp_qdio_int_req() completes a number of SBALs & increments the queue's
    free-level _before_ zfcp_qdio_send() was able to decrement it.

    This can cause ->req_q_free to momentarily hold a value larger than
    QDIO_MAX_BUFFERS_PER_Q. Luckily zfcp_qdio_send() is always called under
    ->req_q_lock, and all readers of the free-level also take this lock. So we
    can trust that zfcp_qdio_send() will clean up such a temporary overflow
    before anyone can actually observe it.

    But it's still confusing and annoying to worry about. So adjust the code to
    avoid this race.

    Link: https://lore.kernel.org/r/7f61f59a1f8db270312e64644f9173b8f1ac895f.1593780621.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Julian Wiedmann
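
The overshoot described above depends only on the order of the two counter updates. The following sketch models that with C11 atomics. The commit message does not spell out the exact code change, so this shows just one ordering that avoids the problem; `struct queue`, `MAX_BUFFERS` and the function names are assumptions made for illustration. The completion is called directly from the send functions to force the unlucky interleaving deterministically.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_BUFFERS 128	/* assumed queue size, for illustration only */

/* Hypothetical model of the request queue's free-level counter. */
struct queue {
	atomic_int free;	/* number of free buffers */
	int max_seen;		/* highest free-level observed, for the demo */
};

/* Completion path: 'n' buffers were completed, give them back. */
void complete_buffers(struct queue *q, int n)
{
	int now = atomic_fetch_add(&q->free, n) + n;

	if (now > q->max_seen)
		q->max_seen = now;
}

/* Racy order: the buffers are handed over first and accounted for later,
 * so a fast completion increments the counter before the decrement. */
void send_account_late(struct queue *q, int n)
{
	complete_buffers(q, n);		/* completion wins the race */
	atomic_fetch_sub(&q->free, n);
}

/* Safe order: reserve the buffers before handing them over, so the
 * completion can at most restore the previous free-level. */
void send_account_early(struct queue *q, int n)
{
	atomic_fetch_sub(&q->free, n);
	complete_buffers(q, n);
}
```

With the late accounting the free-level briefly reaches `MAX_BUFFERS + n`. With the early accounting it never exceeds `MAX_BUFFERS`, at the cost of having to give the buffers back if the hand-over fails.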
     
  • Instead of manually moving each element of the unit and port lists into our
    temporary on-stack lists, splice them over in one go.

    Link: https://lore.kernel.org/r/cacb179f49ece50fd4dce119c61252d632cdc1d4.1593780621.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Julian Wiedmann
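
Splicing a whole list in one go, as opposed to moving each element, can be sketched with a minimal circular doubly linked list. This is a userspace model in the style of the kernel's list_head, not the kernel implementation; `list_splice_all()` and `list_count()` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

void list_init(struct list_head *h) { h->next = h; h->prev = h; }
int list_empty(const struct list_head *h) { return h->next == h; }

void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move all entries of 'from' to the front of 'to' in constant time and
 * leave 'from' empty. Only four pointers change, however long the list. */
void list_splice_all(struct list_head *from, struct list_head *to)
{
	struct list_head *first, *last;

	assert(from != to);
	if (list_empty(from))
		return;
	first = from->next;
	last = from->prev;
	first->prev = to;
	last->next = to->next;
	to->next->prev = last;
	to->next = first;
	list_init(from);
}

size_t list_count(const struct list_head *h)
{
	const struct list_head *p;
	size_t n = 0;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

After the splice the source list is empty and reusable, and the temporary on-stack list owns every entry in its original order.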
     
  • We already maintain a pointer to act->adapter. Use it consistently to avoid
    any confusion about whose ->erp_ready_head and ->erp_ready_wq we are
    accessing.

    Link: https://lore.kernel.org/r/d1bb04322f240dee32f4c4a551bc93bc736f4b01.1593780621.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Julian Wiedmann
     
  • zfcp no longer uses the qdio PCI flag, update the comment.

    Link: https://lore.kernel.org/r/6717c26fc986bff8776d110e27c199b523684c63.1593780621.git.bblock@linux.ibm.com
    Fixes: 21ddaa53f92d ("[SCSI] zfcp: Remove PCI flag")
    Reviewed-by: Steffen Maier
    Reviewed-by: Fedor Loshakov
    Signed-off-by: Julian Wiedmann
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Julian Wiedmann
     
  • We don't need crypto-grade random numbers for randomized backoffs. Instead
    use prandom_u32_max(ep_ro) which generates a pseudo-random number uniformly
    distributed in the interval [0, ep_ro).

    Link: https://lore.kernel.org/r/8fc7c4c4069ff1783f4a9ccd84a923f581a09ec5.1593780621.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: George Spelvin
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    George Spelvin
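
A bounded pseudo-random backoff of the kind described above can be sketched in userspace as follows. The xorshift32 generator is only a stand-in for the kernel's pseudo-random source, and `bounded()` and `backoff_ticks()` are hypothetical names; the multiply-and-shift scaling is one common way to map a 32-bit value into [0, bound) without a modulo.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift32: a small non-cryptographic PRNG, used here only as a stand-in
 * for the kernel's pseudo-random source. The state must be non-zero. */
uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Scale a 32-bit value into [0, bound) with a multiply and a shift. */
uint32_t bounded(uint32_t rnd, uint32_t bound)
{
	return (uint32_t)(((uint64_t)rnd * bound) >> 32);
}

/* Pick a backoff in [0, ep_ro) from the pseudo-random stream. */
uint32_t backoff_ticks(uint32_t *state, uint32_t ep_ro)
{
	return bounded(xorshift32(state), ep_ro);
}
```

For a backoff, predictability by an attacker is not a concern, so a cheap generator like this is sufficient and avoids draining the crypto-grade source.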
     

24 Jun, 2020

1 commit

  • Suppose that, for unrelated reasons, FSF requests on behalf of recovery are
    very slow and can run into the ERP timeout.

    In the case at hand, we did adapter recovery to a large degree. However
    due to the slowness a LUN open is pending so the corresponding fc_rport
    remains blocked. After fast_io_fail_tmo we trigger close physical port
    recovery for the port under which the LUN should have been opened. The new
    higher order port recovery dismisses the pending LUN open ERP action and
    dismisses the pending LUN open FSF request. Such dismissal decouples the
    ERP action from the pending corresponding FSF request by setting
    zfcp_fsf_req->erp_action to NULL (among other things)
    [zfcp_erp_strategy_check_fsfreq()].

    If now the ERP timeout for the pending open LUN request runs out, we must
    not use zfcp_fsf_req->erp_action in the ERP timeout handler. This is a
    problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use
    timer_setup()"). Before that we intentionally only passed zfcp_erp_action
    as context argument to zfcp_erp_timeout_handler().

    Note: The lifetime of the corresponding zfcp_fsf_req object continues until
    a (late) response or an (unrelated) adapter recovery.

    Just like the regular response path ignores dismissed requests
    [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the
    ERP timeout handler now needs to ignore dismissed requests. So simply
    return early in the ERP timeout handler if the FSF request is marked as
    dismissed in its status flags. To protect against the race where
    zfcp_erp_strategy_check_fsfreq() dismisses and sets
    zfcp_fsf_req->erp_action to NULL after our previous status flag check,
    return early if zfcp_fsf_req->erp_action is NULL. After all, the former
    ERP action does not need to be woken up as that was already done as part of
    the dismissal above [zfcp_erp_action_dismiss()].

    This fixes the following panic due to kernel page fault in IRQ context:

    Unable to handle kernel pointer dereference in virtual kernel address space
    Failing address: 0000000000000000 TEID: 0000000000000483
    Fault in home space mode while using kernel ASCE.
    AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d
    Oops: 0004 ilc:2 [#1] SMP
    Modules linked in: ...
    CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G E X ...
    Hardware name: IBM 8561 T01 701 (LPAR)
    Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp])
    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000
    000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02
    0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000
    0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8
    Krnl Code: 001fffff80549bd4: a7180000 lhi %r1,0
    001fffff80549bd8: 4120a0f0 la %r2,240(%r10)
    #001fffff80549bdc: a53e0003 llilh %r3,3
    >001fffff80549be0: ba132000 cs %r1,%r3,0(%r2)
    001fffff80549be4: a7740037 brc 7,1fffff80549c52
    001fffff80549be8: e320b0180004 lg %r2,24(%r11)
    001fffff80549bee: e31020e00004 lg %r1,224(%r2)
    001fffff80549bf4: 412020e0 la %r2,224(%r2)
    Call Trace:
    [] zfcp_erp_notify+0x40/0xc0 [zfcp]
    [] call_timer_fn+0x38/0x190
    [] expire_timers+0xfc/0x190
    [] run_timer_softirq+0xec/0x218
    [] __do_softirq+0x144/0x398
    [] do_softirq_own_stack+0x72/0x88
    [] irq_exit+0xb0/0xb8
    [] do_IRQ+0x82/0xb0
    [] ext_int_handler+0x128/0x12c
    [] clear_subpage.constprop.13+0x38/0x60
    ([] clear_huge_page+0xec/0x250)
    [] do_huge_pmd_anonymous_page+0x32a/0x768
    [] __handle_mm_fault+0x88a/0x900
    [] handle_mm_fault+0xd8/0x1b0
    [] do_dat_exception+0x136/0x3e8
    [] pgm_check_handler+0x1c8/0x220
    Last Breaking-Event-Address:
    [] zfcp_erp_timeout_handler+0x10/0x18 [zfcp]
    Kernel panic - not syncing: Fatal exception in interrupt

    Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com
    Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()")
    Cc: #4.15+
    Reviewed-by: Julian Wiedmann
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Steffen Maier
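
The two early returns described in the commit above can be modelled in a few lines. This is a single-threaded userspace sketch with hypothetical names (`REQ_DISMISSED`, `struct erp_action`, `erp_timeout_handler()`), not the driver code; in the kernel the pointer read would presumably need a READ_ONCE()-style access because it races with the dismissal.

```c
#include <assert.h>
#include <stddef.h>

#define REQ_DISMISSED 0x1u	/* hypothetical status flag */

/* Hypothetical, reduced stand-ins for the driver structures. */
struct erp_action { int woken; };
struct fsf_req {
	unsigned int status;
	struct erp_action *erp_action;	/* set to NULL on dismissal */
};

/* Returns 1 if the ERP action was notified, 0 if the request was ignored. */
int erp_timeout_handler(struct fsf_req *req)
{
	struct erp_action *act;

	assert(req != NULL);
	/* First check: the request is already marked as dismissed. */
	if (req->status & REQ_DISMISSED)
		return 0;
	/* Second check: a dismissal may have cleared the pointer after the
	 * flag check above. Take one copy and only use that copy. */
	act = req->erp_action;
	if (!act)
		return 0;
	act->woken = 1;
	return 1;
}
```

Ignoring the request in both cases is safe because, as the commit message notes, the dismissal itself already woke up the former ERP action.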
     

12 May, 2020

8 commits

  • At the moment we allocate and register the Scsi_Host object corresponding
    to a zfcp adapter (FCP device) very early in the life cycle of the adapter
    - even before we fully discover and initialize the underlying
    firmware/hardware. This had the advantage that we could already use the
    Scsi_Host object, and fill in all its information during said discover and
    initialize.

    Due to commit 737eb78e82d5 ("block: Delay default elevator initialization")
    (first released in v5.4), we noticed a regression that would prevent us
    from using any storage volume if zfcp is configured with support for DIF or
    DIX (zfcp.dif=1 || zfcp.dix=1). Doing so would result in an illegal memory
    access as soon as the first request is sent with such a configuration. An
    example of a crash resulting from this:

    scsi host0: scsi_eh_0: sleeping
    scsi host0: zfcp
    qdio: 0.0.1900 ZFCP on SC 4bd using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
    scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36
    Unable to handle kernel pointer dereference in virtual kernel address space
    Failing address: 0000000000000000 TEID: 0000000000000483
    Fault in home space mode while using kernel ASCE.
    AS:0000000035c7c007 R3:00000001effcc007 S:00000001effd1000 P:000000000000003d
    Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in: ...
    CPU: 1 PID: 783 Comm: kworker/u760:5 Kdump: loaded Not tainted 5.6.0-rc2-bb-next+ #1
    Hardware name: ...
    Workqueue: scsi_wq_0 fc_scsi_scan_rport [scsi_transport_fc]
    Krnl PSW : 0704e00180000000 000003ff801fcdae (scsi_queue_rq+0x436/0x740 [scsi_mod])
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
    Krnl GPRS: 0fffffffffffffff 0000000000000000 0000000187150120 0000000000000000
    000003ff80223d20 000000000000018e 000000018adc6400 0000000187711000
    000003e0062337e8 00000001ae719000 0000000187711000 0000000187150000
    00000001ab808100 0000000187150120 000003ff801fcd74 000003e0062336a0
    Krnl Code: 000003ff801fcd9e: e310a35c0012 lt %r1,860(%r10)
    000003ff801fcda4: a7840010 brc 8,000003ff801fcdc4
    #000003ff801fcda8: e310b2900004 lg %r1,656(%r11)
    >000003ff801fcdae: d71710001000 xc 0(24,%r1),0(%r1)
    000003ff801fcdb4: e310b2900004 lg %r1,656(%r11)
    000003ff801fcdba: 41201018 la %r2,24(%r1)
    000003ff801fcdbe: e32010000024 stg %r2,0(%r1)
    000003ff801fcdc4: b904002b lgr %r2,%r11
    Call Trace:
    [] scsi_queue_rq+0x436/0x740 [scsi_mod]
    ([] scsi_queue_rq+0x3fc/0x740 [scsi_mod])
    [] blk_mq_dispatch_rq_list+0x390/0x680
    [] blk_mq_sched_dispatch_requests+0x196/0x1a8
    [] __blk_mq_run_hw_queue+0x144/0x160
    [] __blk_mq_delay_run_hw_queue+0x96/0x228
    [] blk_mq_run_hw_queue+0xd2/0xe0
    [] blk_mq_sched_insert_request+0x192/0x1d8
    [] blk_execute_rq_nowait+0x80/0x90
    [] blk_execute_rq+0x6e/0xb0
    [] __scsi_execute+0xe2/0x1f0 [scsi_mod]
    [] scsi_probe_and_add_lun+0x358/0x840 [scsi_mod]
    [] __scsi_scan_target+0xc4/0x228 [scsi_mod]
    [] scsi_scan_target+0xd4/0x100 [scsi_mod]
    [] fc_scsi_scan_rport+0x96/0xc0 [scsi_transport_fc]
    [] process_one_work+0x458/0x7d0
    [] worker_thread+0x242/0x448
    [] kthread+0x15c/0x170
    [] ret_from_fork+0x30/0x38
    INFO: lockdep is turned off.
    Last Breaking-Event-Address:
    [] scsi_add_cmd_to_list+0x9e/0xa8 [scsi_mod]
    Kernel panic - not syncing: Fatal exception: panic_on_oops

    While this issue is exposed by the commit named above, this is only by
    accident. The real issue exists for longer already - basically since it's
    possible to use blk-mq via scsi-mq, and blk-mq pre-allocates all requests
    for a tag-set during initialization of the same. For a given Scsi_Host
    object this is done when adding the object to the midlayer
    (`scsi_add_host()` and such). In `scsi_mq_setup_tags()` the midlayer
    calculates how much memory is required for a single scsi_cmnd, and its
    additional data, which also might include space for additional protection
    data - depending on whether the Scsi_Host has any form of protection
    capabilities (`scsi_host_get_prot()`).

    The problem is that zfcp does this step before we actually know whether
    the firmware/hardware has these capabilities, so we don't set any
    protection capabilities in the Scsi_Host object. As a result, no space is
    allocated for additional protection data for requests in the Scsi_Host
    tag-set.

    Once we go through discover and initialize the FCP device firmware/hardware
    fully (this is done via the firmware commands "Exchange Config Data" and
    "Exchange Port Data") we find out whether it actually supports DIF and DIX,
    and we set the corresponding capabilities in the Scsi_Host object (in
    `zfcp_scsi_set_prot()`). Now the Scsi_Host potentially has protection
    capabilities, but the already allocated requests in the tag-set don't have
    any space allocated for that.

    When we then trigger target scanning or add scsi_devices manually, the
    midlayer will use requests from that tag-set, and before sending most
    requests, it will also call `scsi_mq_prep_fn()`. To prepare the scsi_cmnd
    this function will check again whether the used Scsi_Host has any
    protection capabilities - and now it potentially has - and if so, it will
    try to initialize the assumed to be preallocated structures and thus it
    causes the crash, like shown above.

    Before delaying the default elevator initialization with the commit named
    above, we always would also allocate an elevator for any scsi_device before
    ever sending any requests - in contrast to now, where we do it after
    device-probing. That elevator in turn would have its own tag-set, and that
    is initialized after we went through discovery and initialization of the
    underlying firmware/hardware. So requests from that tag-set can be
    allocated properly, and if used - unless the user changed/disabled the
    default elevator - this would hide the underlying issue.

    To fix this for any configuration - with or without an elevator - we move
    the allocation and registration of the Scsi_Host object for a given FCP
    device to after the first complete discovery and initialization of the
    underlying firmware/hardware. By doing that we can make all basic
    properties of the Scsi_Host known to the midlayer by the time we call
    `scsi_add_host()`, including whether we have any protection capabilities.

    To do that we have to delay all the accesses that we would have done in the
    past during discovery and initialization, and do them instead once we are
    finished with it. The previous patches ramp up to this by fencing and
    factoring out all these accesses, and make it possible to re-do them later
    on. In addition we also make use of the diagnostic buffers we recently
    added with

    commit 92953c6e0aa7 ("scsi: zfcp: signal incomplete or error for sync exchange config/port data")
    commit 7e418833e689 ("scsi: zfcp: diagnostics buffer caching and use for exchange port data")
    commit 088210233e6f ("scsi: zfcp: add diagnostics buffer for exchange config data")

    (first released in v5.5), because these already cache all the information
    we need for that "re-do operation" - the cached information is always
    updated during xconf or xport data, so it won't be stale.

    In addition to the move and re-do, this patch also updates the
    function-documentation of `zfcp_scsi_adapter_register()` and changes how it
    reports if a Scsi_Host object already exists. In that case future
    recovery-operations can skip this step completely and behave much like they
    would do in the past - zfcp does not release a once allocated Scsi_Host
    object unless the corresponding FCP device is deconstructed completely.

    Link: https://lore.kernel.org/r/030dd6da318bbb529f0b5268ec65cebcd20fc0a3.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
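
The ordering problem behind this crash can be reduced to a small model: request memory is sized once at registration time, but the size that is needed depends on capabilities that may be set later. The sketch below uses hypothetical names and assumed sizes (`CMD_BASE_SIZE`, `PROT_DATA_SIZE`, `struct host`, `register_host()`, `prep_is_safe()`); it does not reproduce the midlayer code.

```c
#include <assert.h>
#include <stddef.h>

#define CMD_BASE_SIZE 64	/* assumed sizes, for illustration only */
#define PROT_DATA_SIZE 24

/* Hypothetical, reduced model of a SCSI host. */
struct host {
	unsigned int prot_caps;		/* 0 means no DIF/DIX support */
	size_t prealloc_cmd_size;	/* fixed when the host is registered */
};

/* Size a command needs given the host's current capabilities. */
size_t cmd_size_needed(const struct host *h)
{
	return CMD_BASE_SIZE + (h->prot_caps ? PROT_DATA_SIZE : 0);
}

/* Model of host registration: all requests are sized once, here. */
void register_host(struct host *h)
{
	assert(h != NULL);
	h->prealloc_cmd_size = cmd_size_needed(h);
}

/* Preparing a request is only safe if the preallocation is big enough. */
int prep_is_safe(const struct host *h)
{
	return h->prealloc_cmd_size >= cmd_size_needed(h);
}
```

Registering first and setting the capabilities afterwards leaves `prep_is_safe()` false, which corresponds to the crash shown above. Discovering the capabilities first and registering afterwards, as the fix does, keeps both sizes in agreement.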
     
  • When setting an adapter online for the first time, we also create a couple
    of entries for it in the sysfs device tree. This is also true even if the
    adapter has not yet ever gone successfully through exchange config and
    exchange port data.

    When moving the scsi host object allocation and registration to after the
    first exchange config and exchange port data, this makes the `port_rescan`
    attribute susceptible to invalid pointer-dereferences of the shost field
    before the adapter is fully initialized.

    When written to, it schedules a `scan_work` item that will in turn make use
    of the associated fibre channel host object to check the topology used for
    this FCP device.

    Because scanning for remote ports can't be done successfully without
    completing exchange config and exchange port data first, we can simply
    fence `port_rescan`, and so prevent the illegal access.

    As with cases where we can't get a reference to the adapter, we also return
    -ENODEV here. Applications need to handle that errno today already.

    After a successful allocation of the scsi host object nothing changes in
    the work flow.

    Link: https://lore.kernel.org/r/ef65366d309993ca91b6917727590ca7ca166c8f.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
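
Fencing an attribute until the host object exists might look roughly like the model below. The store handler's shape (byte count on success, negative errno on failure) follows the usual sysfs convention; `struct adapter`, `scans_queued` and `port_rescan_store()` are simplified, hypothetical stand-ins for the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the driver structures. */
struct scsi_host { int host_no; };
struct adapter {
	struct scsi_host *shost;	/* NULL until the first successful init */
	int scans_queued;
};

/* Model of a sysfs store handler: returns bytes consumed or -errno. */
long port_rescan_store(struct adapter *adapter, size_t count)
{
	/* Existing case: no reference to the adapter could be obtained. */
	if (!adapter)
		return -ENODEV;
	/* New fence: the host object does not exist yet, so the scan work
	 * would dereference an invalid pointer. Report the same errno. */
	if (!adapter->shost)
		return -ENODEV;
	adapter->scans_queued++;
	return (long)count;
}
```

Reusing -ENODEV means callers see no new error code: a write before initialization fails the same way as a write to an adapter that cannot be referenced.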
     
  • Common status flags that all main objects - adapter, port, and unit -
    support are propagated to sub-objects when set or cleared. For instance,
    when setting the status ZFCP_STATUS_COMMON_ERP_INUSE for an adapter object,
    we will propagate this to all its child ports and units - same for when
    clearing a common status flag.

    Units of an adapter object are enumerated via __shost_for_each_device()
    over the scsi host object of the corresponding adapter.

    Once we move the scsi host object allocation and registration to after the
    first exchange config and exchange port data, this won't be possible for
    cases where we set or clear common statuses during the very first adapter
    recovery.

    But since we won't have any port or unit objects yet at that point in time,
    we can just fence the status propagation for cases where the scsi host
    object is not yet set in the adapter object. It won't change any effective
    status propagations, but will prevent us from dereferencing invalid
    pointers.

    For any later point in the work flow the scsi host object will be set and
    thus nothing is changed then.

    Link: https://lore.kernel.org/r/f51fe5f236a1e3d1ce53379c308777561bfe35e1.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When doing the very first adapter recovery - initialization - for a FCP
    device in a point-to-point topology we also allocate the port object
    corresponding to the attached remote port, and trigger a port recovery for
    it that will run after the adapter recovery finished.

    Right now this happens right after we finished with the exchange config
    data command, and uses the fibre channel host object corresponding to the
    FCP device to determine whether a point-to-point topology is used.

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, this use of the fc_host object is not
    possible anymore at that point in the work flow.

    But the allocation and recovery trigger doesn't have notable side-effects
    on the following exchange port data processing, so we can move those to
    after xport data, and thus also to after the scsi host object allocation,
    once we move it. Then the fc_host object can be used again, like it is now.

    For any further adapter recoveries this doesn't change anything, because at
    that point the port object already exists and recovery is triggered
    elsewhere for existing port objects.

    Link: https://lore.kernel.org/r/73e5d4ac21e2b37bf0c3ca8e530bc5a5c6e74f8f.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When receiving a notification that a FCP device lost its local link we
    usually update the fibre channel host object which represents that FCP
    device to reflect that.

    This notification/information can also surface when the FCP device is
    running through adapter recovery (exchange config and exchange port data
    return incomplete).

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, and this happens during the very first
    adapter recovery, these updates can not be done until after the scsi host
    object is allocated.

    Reorder the fc_host updates in zfcp_fsf_fc_host_link_down() so that they
    only happen after a check of whether the scsi host object is already
    allocated or not.

    During the first adapter recovery this will cause these updates to be
    skipped if a link-down condition is detected, but we can repeat them after
    we have allocated the scsi host object, if necessary.

    For any further link-down handling the only changes in the work flow are
    the slightly reordered assignments in zfcp_fsf_fc_host_link_down().

    Link: https://lore.kernel.org/r/f841f2cda61dcd7b8549910c44e1831927459edf.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When executing exchange port data for a FCP device for the first time, or
    after an adapter recovery, we update several properties of the fibre
    channel host object which represents that FCP device.

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, this is not possible for the former case.

    Move all these updates into a separate, fenced function that first checks
    whether the scsi host object already exists or not, before making the
    updates.

    During the first ever exchange port data in the adapter life cycle this
    will make the exchange port data handler skip over this update step, but we
    can repeat it later, after we allocated the scsi host object.

    For any further recovery of that adapter the work flow is only changed
    slightly because then the scsi host object already exists and we don't free
    it until we release the adapter completely at the end of its life cycle.

    Link: https://lore.kernel.org/r/ae454c2dc6da0b02907c489af91d0b211d331825.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     
  • When executing exchange config data for a FCP device for the first time, or
    after an adapter recovery, we update several properties of the scsi host or
    fibre channel host object that represent that FCP device.

    When moving the scsi host object allocation and registration - and thus
    also the fibre channel host object allocation - to after the first exchange
    config and exchange port data, this is not possible for the former case.

    Move all these updates into a separate, fenced function that first checks
    whether the scsi host object already exists or not, before making the
    updates.

    During the first ever exchange config data in the adapter life cycle this
    will make the exchange config data handler skip over this update step, but
    we can repeat it later, after we allocated the scsi host object.

    For any further recovery of that adapter the work flow is only changed
    slightly because then the scsi host object already exists and we don't free
    it until we release the adapter completely at the end of its life cycle.

    Link: https://lore.kernel.org/r/5fc3f4d38d4334f7aa595497c6f7865fb1102e0f.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
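
The "fenced update, repeated later" pattern used by the two commits above can be sketched as follows. All names here (`struct adapter`, `cached_speed`, `shost_update()`, `exchange_config_done()`, `adapter_register()`) are hypothetical, and a single integer stands in for the properties that the driver copies into the host object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the driver structures. */
struct scsi_host { unsigned int port_speed; };
struct adapter {
	struct scsi_host *shost;	/* NULL until registration */
	unsigned int cached_speed;	/* last value reported by firmware */
};

/* Fenced update: returns 1 if applied, 0 if skipped for lack of a host. */
int shost_update(struct adapter *a)
{
	if (!a->shost)
		return 0;
	a->shost->port_speed = a->cached_speed;
	return 1;
}

/* Response handler: always cache the data, update the host if possible. */
void exchange_config_done(struct adapter *a, unsigned int speed)
{
	a->cached_speed = speed;
	shost_update(a);
}

/* Registration after the first successful init: attach the host object
 * and repeat the update that was skipped earlier. */
void adapter_register(struct adapter *a, struct scsi_host *h)
{
	assert(a != NULL && h != NULL);
	if (a->shost)
		return;		/* already registered, nothing to redo */
	a->shost = h;
	shost_update(a);
}
```

The first response only fills the cache; registration replays it, and every later response updates the host directly, so later recoveries behave as they did before the reordering.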
     
  • When establishing and activating the QDIO queue pair for a FCP device for
    the first time, or after an adapter recovery, we publish some of its
    characteristics to the scsi host object representing that FCP device.

    When moving the scsi host object allocation and registration to after the
    first exchange config and exchange port data, this is not possible for the
    former case - QDIO open for the first time - because that happens before
    exchange config and exchange port data.

    Move the scsi host object update into a fenced function that checks whether
    the object already exists or not. This way we can repeat that step later,
    once we are past the allocation.

    Once the first recovery succeeds we don't release the scsi host object
    anymore, so further recoveries do work as before.

    Link: https://lore.kernel.org/r/a214ebf508f71e3690113e3e90edab1cea0e24e3.1588956679.git.bblock@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     

11 Apr, 2020

1 commit

  • Pull more SCSI updates from James Bottomley:
    "This is a batch of changes that didn't make it in the initial pull
    request because the lpfc series had to be rebased to redo an incorrect
    split.

    It's basically driver updates to lpfc, target, bnx2fc and ufs with the
    rest being minor updates except the sr_block_release one which fixes a
    use after free introduced by the removal of the global mutex in the
    first patch set"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (35 commits)
    scsi: core: Add DID_ALLOC_FAILURE and DID_MEDIUM_ERROR to hostbyte_table
    scsi: ufs: Use ufshcd_config_pwr_mode() when scaling gear
    scsi: bnx2fc: fix boolreturn.cocci warnings
    scsi: zfcp: use fallthrough;
    scsi: aacraid: do not overwrite retval in aac_reset_adapter()
    scsi: sr: Fix sr_block_release()
    scsi: aic7xxx: Remove more FreeBSD-specific code
    scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
    scsi: ufs: set device as active power mode after resetting device
    scsi: iscsi: Report unbind session event when the target has been removed
    scsi: lpfc: Change default SCSI LUN QD to 64
    scsi: libfc: rport state move to PLOGI if all PRLI retry exhausted
    scsi: libfc: If PRLI rejected, move rport to PLOGI state
    scsi: bnx2fc: Update the driver version to 2.12.13
    scsi: bnx2fc: Fix SCSI command completion after cleanup is posted
    scsi: bnx2fc: Process the RQE with CQE in interrupt context
    scsi: target: use the stack for XCOPY passthrough cmds
    scsi: target: increase XCOPY I/O size
    scsi: target: avoid per-loop XCOPY buffer allocations
    scsi: target: drop xcopy DISK BLOCK LENGTH debug
    ...

    Linus Torvalds
     

06 Apr, 2020

4 commits

  • It's no longer needed.

    Signed-off-by: Julian Wiedmann
    Reviewed-by: Benjamin Block
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     
  • Upper-layer drivers allocate their SBALs by calling qdio_alloc_buffers()
    for each individual queue. But when later passing the SBAL addresses to
    qdio_establish(), they need to be in a single array of pointers.
    So if the driver uses multiple Input or Output queues, it needs to
    allocate a temporary array just to present all its SBAL pointers in this
    layout.

    This patch slightly changes the format of the QDIO initialization data,
    so that drivers can pass a per-queue array where each element points to
    a queue's SBAL array.
    zfcp doesn't use multiple queues, so the impact there is trivial.
    For qeth this brings a nice reduction in complexity, and removes
    a page-sized allocation.
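
    To illustrate the layout change, here is a hedged sketch with hypothetical
    types and names (demo_sbal, DEMO_BUFS_PER_QUEUE, demo_flatten,
    demo_lookup); it is not the QDIO API. The first helper builds the single
    flat pointer array that drivers previously had to assemble, the second
    reads a buffer directly through a per-queue array.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUFS_PER_QUEUE 4	/* hypothetical; kept small for the demo */

struct demo_sbal { int id; };	/* hypothetical stand-in for an SBAL */

/*
 * Old-style layout: flatten every queue's SBAL pointers into one array.
 * 'flat' must have room for nr_queues * DEMO_BUFS_PER_QUEUE entries.
 */
static void demo_flatten(struct demo_sbal **queues[], size_t nr_queues,
			 struct demo_sbal *flat[])
{
	for (size_t q = 0; q < nr_queues; q++)
		for (size_t i = 0; i < DEMO_BUFS_PER_QUEUE; i++)
			flat[q * DEMO_BUFS_PER_QUEUE + i] = queues[q][i];
}

/*
 * New-style layout: each element of 'queues' points to one queue's SBAL
 * pointer array, so no temporary flat array is needed.
 */
static struct demo_sbal *demo_lookup(struct demo_sbal **queues[],
				     size_t q, size_t i)
{
	assert(i < DEMO_BUFS_PER_QUEUE);
	return queues[q][i];
}
```

    With the per-queue form, the temporary array (and its allocation) in
    demo_flatten() simply disappears from the driver.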

    Signed-off-by: Julian Wiedmann
    Reviewed-by: Benjamin Block
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     
  • In preparation for a subsequent patch, move the setup of init_data into
    the only caller.

    Signed-off-by: Julian Wiedmann
    Reviewed-by: Benjamin Block
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     
  • All that qdio_allocate() actually uses from the init_data is the cdev,
    and the number of Input and Output Queues. Have the driver pass those as
    parameters, and defer the init_data processing into qdio_establish().
    This includes writing per-device(!) trace entries, and most of the
    sanity checks.

    Signed-off-by: Julian Wiedmann
    Reviewed-by: Benjamin Block
    Signed-off-by: Vasily Gorbik

    Julian Wiedmann
     

03 Apr, 2020

1 commit

  • Pull SCSI updates from James Bottomley:
    "This series has a huge amount of churn because it pulls in Mauro's doc
    update changing all our txt files to rst ones.

    Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc,
    zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and
    some other minor updates.

    The major core change is Hannes moving functions out of the aacraid
    driver and into the core"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits)
    scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code
    scsi: ufs: Do not rely on prefetched data
    scsi: dc395x: remove dc395x_bios_param
    scsi: libiscsi: Fix error count for active session
    scsi: hpsa: correct race condition in offload enabled
    scsi: message: fusion: Replace zero-length array with flexible-array member
    scsi: qedi: Add PCI shutdown handler support
    scsi: qedi: Add MFW error recovery process
    scsi: ufs: Enable block layer runtime PM for well-known logical units
    scsi: ufs-qcom: Override devfreq parameters
    scsi: ufshcd: Let vendor override devfreq parameters
    scsi: ufshcd: Update the set frequency to devfreq
    scsi: ufs: Resume ufs host before accessing ufs device
    scsi: ufs-mediatek: customize the delay for enabling host
    scsi: ufs: make HCE polling more compact to improve initialization latency
    scsi: ufs: allow custom delay prior to host enabling
    scsi: ufs-mediatek: use common delay function
    scsi: ufs: introduce common and flexible delay function
    scsi: ufs: use an enum for host capabilities
    scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()
    ...

    Linus Torvalds
     

01 Apr, 2020

1 commit

  • Convert the various uses of fallthrough comments to fallthrough;

    Done via script
    Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/
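
    For readers unfamiliar with the conversion, a userspace approximation of
    what a converted switch statement looks like. The macro definition below
    is an assumption for illustration (it falls back to a no-op on compilers
    without the attribute), and demo_weight() is a hypothetical function.

```c
#include <assert.h>

/* Userspace approximation of the fallthrough pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* no-op fallback */
#endif

/* Hypothetical example: level 2 accumulates the work of level 1 as well. */
static int demo_weight(int level)
{
	int weight = 0;

	assert(level >= 0);	/* levels are non-negative in this demo */

	switch (level) {
	case 2:
		weight += 10;
		fallthrough;	/* previously a "fall through" comment */
	case 1:
		weight += 1;
		break;
	default:
		break;
	}
	return weight;
}
```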

    Signed-off-by: Joe Perches
    Reviewed-by: Fedor Loshakov
    Reviewed-by: Steffen Maier
    [bblock@linux.ibm.com: resolved merge conflict with recently upstream-sent patch "zfcp: expose fabric name as common fc_host sysfs attribute"]
    Link: https://lore.kernel.org/r/d14669a67a17392490d3184117941123765db1a4.1585663010.git.bblock@linux.ibm.com
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Joe Perches
     

18 Mar, 2020

10 commits

  • Log any FC Endpoint Security errors to the kernel ring buffer with rate-
    limiting.

    Link: https://lore.kernel.org/r/20200312174505.51294-11-maier@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Jens Remus
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Jens Remus
     
  • Enable explicit FC Endpoint Security error reporting by the FCP channel and
    handle any FSF security errors according to specification. Take the
    following recovery actions when a FSF_SECURITY_ERROR is reported for the
    specified FSF commands:

    - Open Port: Retry the command if possible
    - Send FCP : Physically close the remote port and reopen

    For Open Port the command status is set to error, which triggers a retry.
    For Send FCP the command status is set to error and recovery is triggered
    to physically reopen the remote port.

    Link: https://lore.kernel.org/r/20200312174505.51294-10-maier@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Jens Remus
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Jens Remus
     
  • Trace changes in Fibre Channel Endpoint Security capabilities of FCP
    devices as well as changes in Fibre Channel Endpoint Security state of
    their connections to FC remote ports as FC Endpoint Security changes with
    trace level 3 in HBA DBF.

    A change in FC Endpoint Security capabilities of FCP devices is traced as
    response to FSF command FSF_QTCB_EXCHANGE_PORT_DATA with a trace tag of
    "fsfcesa" and a WWPN of ZFCP_DBF_INVALID_WWPN = 0x0000000000000000 (see
    FC-FS-4 §18 "Name_Identifier Formats", NAA field).

    A change in FC Endpoint Security state of connections between FCP devices
    and FC remote ports is traced as response to FSF command
    FSF_QTCB_OPEN_PORT_WITH_DID with a trace tag of "fsfcesp".

    Example trace record of FC Endpoint Security capability change of FCP
    device formatted with zfcpdbf from s390-tools:

    Timestamp : ...
    Area : HBA
    Subarea : 00
    Level : 3
    Exception : -
    CPU ID : ...
    Caller : 0x...
    Record ID : 5 ZFCP_DBF_HBA_FCES
    Tag : fsfcesa FSF FC Endpoint Security adapter
    Request ID : 0x...
    Request status : 0x00000010
    FSF cmnd : 0x0000000e FSF_QTCB_EXCHANGE_PORT_DATA
    FSF sequence no: 0x...
    FSF issued : ...
    FSF stat : 0x00000000 FSF_GOOD
    FSF stat qual : n/a
    Prot stat : n/a
    Prot stat qual : n/a
    Port handle : 0x00000000 none (invalid)
    LUN handle : n/a
    WWPN : 0x0000000000000000 ZFCP_DBF_INVALID_WWPN
    FCES old : 0x00000000 old FC Endpoint Security
    FCES new : 0x00000007 new FC Endpoint Security

    Example trace record of FC Endpoint Security change of connection to
    FC remote port formatted with zfcpdbf from s390-tools:

    Timestamp : ...
    Area : HBA
    Subarea : 00
    Level : 3
    Exception : -
    CPU ID : ...
    Caller : 0x...
    Record ID : 5 ZFCP_DBF_HBA_FCES
    Tag : fsfcesp FSF FC Endpoint Security port
    Request ID : 0x...
    Request status : 0x00000010
    FSF cmnd : 0x00000005 FSF_QTCB_OPEN_PORT_WITH_DID
    FSF sequence no: 0x...
    FSF issued : ...
    FSF stat : 0x00000000 FSF_GOOD
    FSF stat qual : n/a
    Prot stat : n/a
    Prot stat qual : n/a
    Port handle : 0x...
    WWPN : 0x500507630401120c WWPN
    FCES old : 0x00000000 old FC Endpoint Security
    FCES new : 0x00000004 new FC Endpoint Security

    Link: https://lore.kernel.org/r/20200312174505.51294-9-maier@linux.ibm.com
    Reviewed-by: Steffen Maier
    Signed-off-by: Jens Remus
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Jens Remus
     
  • Log the usage of and subsequent changes in FC Endpoint Security of
    connections between FCP devices and FC remote ports to the kernel ring
    buffer. Activation of FC Endpoint Security is logged as informational.
    Change and deactivation are logged as warning.

    No logging takes place if FC Endpoint Security is not used (i.e. never
    activated) on a connection or if it does not change during reopen of a port
    (e.g. due to adapter or port recovery).

    Link: https://lore.kernel.org/r/20200312174505.51294-8-maier@linux.ibm.com
    Reviewed-by: Steffen Maier
    Reviewed-by: Fedor Loshakov
    Signed-off-by: Jens Remus
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Jens Remus
     
  • Add an interface to read Fibre Channel Endpoint Security information of FCP
    channels and their connections to FC remote ports. It comes in the form of
    new sysfs attributes that are attached to the CCW device representing the
    FCP device and its zfcp port objects.

    The read-only sysfs attribute "fc_security" of a CCW device representing a
    FCP device shows the FC Endpoint Security capabilities of the device.
    Possible values are: "unknown", "unsupported", "none", or a comma-
    separated list of one or more mnemonics and/or one hexadecimal value
    representing the supported FC Endpoint Security:

    Authentication: Authentication supported
    Encryption : Encryption supported

    The read-only sysfs attribute "fc_security" of a zfcp port object shows the
    FC Endpoint Security used on the connection between its parent FCP device
    and the FC remote port. Possible values are: "unknown", "unsupported",
    "none", or a mnemonic or hexadecimal value representing the FC Endpoint
    Security used:

    Authentication: Connection has been authenticated
    Encryption : Connection is encrypted

    Both sysfs attributes may return hexadecimal values instead of mnemonics,
    if the mnemonic lookup table does not contain an entry for the FC Endpoint
    Security reported by the FCP device.
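
    A rough sketch of the mnemonic-with-hexadecimal-fallback formatting
    described above. The bit assignments, the lookup table, and the function
    name are hypothetical and chosen for illustration only; the "unknown" and
    "unsupported" states are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_SEC_AUTH 0x1u	/* hypothetical bit assignment */
#define DEMO_SEC_ENC  0x2u	/* hypothetical bit assignment */

static const struct {
	unsigned int mask;
	const char *name;
} demo_names[] = {
	{ DEMO_SEC_AUTH, "Authentication" },
	{ DEMO_SEC_ENC,  "Encryption" },
};

/*
 * Write "none", a comma-separated list of mnemonics, and/or one hexadecimal
 * value for bits that have no entry in the lookup table.
 */
static void demo_format_security(unsigned int sec, char *buf, size_t len)
{
	size_t used;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if (sec == 0) {
		snprintf(buf, len, "none");
		return;
	}

	for (size_t i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
		if (!(sec & demo_names[i].mask))
			continue;
		used = strlen(buf);
		snprintf(buf + used, len - used, "%s%s",
			 used ? ", " : "", demo_names[i].name);
		sec &= ~demo_names[i].mask;	/* bit has been named */
	}

	if (sec) {	/* leftover bits without a mnemonic */
		used = strlen(buf);
		snprintf(buf + used, len - used, "%s0x%x",
			 used ? ", " : "", sec);
	}
}
```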

    Link: https://lore.kernel.org/r/20200312174505.51294-7-maier@linux.ibm.com
    Reviewed-by: Fedor Loshakov
    Reviewed-by: Steffen Maier
    Reviewed-by: Benjamin Block
    Signed-off-by: Jens Remus
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Jens Remus
     
  • Introduce automatic variables for adapter and QTCB bottom in
    zfcp_fsf_open_port_handler(). This facilitates subsequent changes to meet
    the 80 character per line limit.

    Link: https://lore.kernel.org/r/20200312174505.51294-6-maier@linux.ibm.com
    Reviewed-by: Fedor Loshakov
    Reviewed-by: Steffen Maier
    Reviewed-by: Benjamin Block
    Signed-off-by: Jens Remus
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Jens Remus
     
  • When we get an unsolicited notification that the local link went down,
    zfcp_fsf_status_read_link_down() calls zfcp_fsf_link_down_info_eval().
    This only blocks rports, and sets ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED and
    ZFCP_STATUS_COMMON_ERP_FAILED. Only the fc_host port_state changes to
    "Linkdown", because zfcp_scsi_get_host_port_state() is an active callback
    and uses the adapter status.

    Other fc_host attributes model, port_id, port_type, speed, fabric_name (and
    zfcp device attributes card_version, peer_wwpn, peer_wwnn, peer_d_id) which
    depend on a local link, continued to show their last known "good" value.

    Only if something triggered an exchange config data, some values were
    updated to their unknown equivalent via case
    FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE due to local link down. Triggers for
    exchange config data are adapter recovery, or reading any of the following
    zfcp-specific scsi host sysfs attributes "requests", "megabytes", or
    "seconds_active" in /sys/devices/css*/*.*.*/*.*.*/host*/scsi_host/host*/.

    The other fc_host attributes active_fc4s and permanent_port_name continued
    to show their last known "good" value. Only if something triggered an
    exchange port data, some values changed. Active_fc4s became all zeros as
    unknown equivalent during link down. Permanent_port_name does not depend
    on a local link. But for non-NPIV FCP devices, permanent_port_name
    erroneously became whatever value fc_host port_name had at that point in
    time (see previous paragraph). Triggers for exchange port data are the
    zfcp-specific scsi host sysfs attribute "utilization", or
    [{reset,get}_fc_host_stats] write anything into "reset_statistics" or read
    any of the other attributes under
    /sys/devices/css*/*.*.*/*.*.*/host*/fc_host/host*/statistics/.

    (cf. v4.9 commit bd77befa5bcf ("zfcp: fix fc_host port_type with NPIV"))

    This is particularly confusing when using "lszfcp -b -Ha" or
    dbginfo.sh which read fc_host attributes and also scsi_host attributes.
    After link down, the first invocation produces (abbreviated):

    Class = "fc_host"
    active_fc4s = "0x00 0x00 0x01 0x00 ..."
    ...
    fabric_name = "0x10000027f8e04c49"
    ...
    permanent_port_name = "0xc05076e4588059c1"
    port_id = "0x244800"
    port_state = "Linkdown"
    port_type = "NPort (fabric via point-to-point)"
    ...
    speed = "16 Gbit"
    Class = "scsi_host"
    ...
    megabytes = "0 0"
    ...
    requests = "0 0 0"
    seconds_active = "37"
    ...
    utilization = "0 0 0"

    The second and next invocations produce (abbreviated):

    Class = "fc_host"
    active_fc4s = "0x00 0x00 0x00 0x00 ..."
    ...
    fabric_name = "0x0"
    ...
    permanent_port_name = "0x0"
    port_id = "0x000000"
    port_state = "Linkdown"
    port_type = "Unknown"
    ...
    speed = "unknown"
    Class = "scsi_host"
    ...
    megabytes = "0 0"
    ...
    requests = "0 0 0"
    seconds_active = "38"
    ...
    utilization = "0 0 0"

    Factor out the resetting of local link dependent fc_host attributes from
    zfcp_fsf_exchange_config_data_handler() case
    FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE into a new helper function
    zfcp_fsf_fc_host_link_down(). All code places that detect local link down
    (SRB, FSF_PROT_LINK_DOWN, xconf data/port incomplete) call
    zfcp_fsf_link_down_info_eval(). Call the new helper from there. This works
    because zfcp_fsf_link_down_info_eval() and thus the helper is called before
    zfcp_fsf_exchange_{config,port}_evaluate().

    Port_name and node_name are always valid, so never reset them.

    Get the permanent_port_name from exchange port data unconditionally as it
    always has a valid known good value, even during link down.

    Note: Rather than hardcode in zfcp_fsf_exchange_config_evaluate(), fc_host
    supported_classes could theoretically get its value from
    fsf_qtcb_bottom_port.class_of_service in zfcp_fsf_exchange_port_evaluate().

    When the link comes back, we get a different notification, perform adapter
    recovery, and this triggers an implicit exchange config data followed by
    exchange port data filling in the link dependent fc_host attributes with
    known good values again.

    Link: https://lore.kernel.org/r/20200312174505.51294-5-maier@linux.ibm.com
    Reviewed-by: Jens Remus
    Reviewed-by: Benjamin Block
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Steffen Maier
     
  • Manufacturer, HBA model, firmware version, and hardware version. Use the
    same value format as for the driver-specific attributes. Keep the
    driver-specific attributes for stable user space sysfs API.

    Link: https://lore.kernel.org/r/20200312174505.51294-4-maier@linux.ibm.com
    Reviewed-by: Jens Remus
    Reviewed-by: Benjamin Block
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Steffen Maier
     
  • FICON Express8S or older, as well as card features newer than FICON
    Express16S+, have no particular firmware level requirement.

    FICON Express16S or FICON Express16S+ have the following
    minimum firmware level requirements to show a proper fabric name value:

    z13 machine
      FICON Express16S  , MCL P08424.005 , LIC version 0x00000721
    z14 machine
      FICON Express16S  , MCL P42611.008 , LIC version 0x10200069
      FICON Express16S+ , MCL P42625.010 , LIC version 0x10300147

    Otherwise, the read value is not the fabric name.

    Each FCP channel of these card features might need one SAN fabric re-login
    after concurrent microcode update in order to show the proper fabric name.
    Possible ways to trigger a SAN fabric re-login are one of: Pull fibres
    between FCP channel port and SAN switch port on either side and re-plug,
    disable SAN switch port adjacent to FCP channel port and re-enable switch
    port, or at Service Element toggle off all CHPIDs of FCP channel over all
    LPARs and toggle CHPIDs on again. When zfcp operates subchannels (FCP
    devices) on such an FCP channel, it recovers from the fabric re-login.

    Initialize fabric name for any topology and set it to the invalid WWPN 0x0
    for anything but fabric topology. Otherwise for e.g. point-to-point topology
    one could see the initial -1 from fc_host_setup() and after a link unplug
    our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix
    fc_host attributes that should be unknown on local link down")) and stay 0x0
    on link replug. I did not initialize to 0x0 somewhere even earlier in the
    code path such that it would not flap from real to 0x0 to real on e.g. an
    exchange config data with fabric topology.

    Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com
    Reviewed-by: Benjamin Block
    Reviewed-by: Jens Remus
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Steffen Maier
     
  • v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote
    ports") introduced zfcp automatic port scan.

    Before that, the user had to use the sysfs attribute "port_add" of an FCP
    device (adapter) to add and open remote (target) ports, even for the remote
    peer port in point-to-point topology. That code path did a proper port open
    recovery trigger taking the erp_lock.

    Since the above commit, a new helper function zfcp_erp_enqueue_ptp_port()
    performed an UNlocked port open recovery trigger. This can race with other
    parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt
    e.g. adapter->erp_total_count or adapter->erp_ready_head.

    As already found for fabric topology in v4.17 commit fa89adba1941 ("scsi:
    zfcp: fix infinite iteration on ERP ready list"), there was an endless loop
    during tracing of rport (un)block. A subsequent v4.18 commit 9e156c54ace3
    ("scsi: zfcp: assert that the ERP lock is held when tracing a recovery
    trigger") introduced a lockdep assertion for that case.

    As a side effect, that lockdep assertion now uncovered the unlocked code
    path for PtP. It is from within an adapter ERP action:

    zfcp_erp_strategy[1479]                  intentionally DROPs erp lock around
                                             zfcp_erp_strategy_do_action()
    zfcp_erp_strategy_do_action[1441]        NO erp lock
    zfcp_erp_adapter_strategy[876]           NO erp lock
    zfcp_erp_adapter_strategy_open[855]      NO erp lock
    zfcp_erp_adapter_strategy_open_fsf[806]  NO erp lock
    zfcp_erp_adapter_strat_fsf_xconf[772]    erp lock only around
                                             zfcp_erp_action_to_running(),
                                             BUT *_not_* around
                                             zfcp_erp_enqueue_ptp_port()
    zfcp_erp_enqueue_ptp_port[728]           BUG: *_not_* taking erp lock
    _zfcp_erp_port_reopen[432]               assumes to be called with erp lock
    zfcp_erp_action_enqueue[314]             assumes to be called with erp lock
    zfcp_dbf_rec_trig[288]                   _checks_ to be called with erp lock:
                                             lockdep_assert_held(&adapter->erp_lock);

    It causes the following lockdep warning:

    WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288
    zfcp_dbf_rec_trig+0x16a/0x188
    no locks held by zfcperp0.0.17c0/775.

    Fix this by using the proper locked recovery trigger helper function.

    Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com
    Fixes: cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports")
    Cc: #v2.6.27+
    Reviewed-by: Jens Remus
    Reviewed-by: Benjamin Block
    Signed-off-by: Steffen Maier
    Signed-off-by: Martin K. Petersen

    Steffen Maier
     

29 Feb, 2020

1 commit

  • Pull SCSI fixes from James Bottomley:
    "Four small fixes.

    Three are in drivers for fairly obvious bugs. The fourth is a set of
    regressions introduced by the compat_ioctl changes because some of the
    compat updates wrongly replaced .ioctl instead of .compat_ioctl"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places
    scsi: zfcp: fix wrong data and display format of SFP+ temperature
    scsi: sd_sbc: Fix sd_zbc_report_zones()
    scsi: libfc: free response frame from GPN_ID

    Linus Torvalds
     

25 Feb, 2020

1 commit

  • When implementing support for retrieval of local diagnostic data from the
    FCP channel, the wrong data format was assumed for the temperature of the
    local SFP+ connector. The Fibre Channel Link Services (FC-LS-3)
    specification is not clear on the format of the stored integer, and only
    after consulting the SNIA specification SFF-8472 did we realize it is
    stored as two's complement. Thus, the used data and display format is
    wrong, and highly misleading for users when the temperature should drop
    below 0°C (however unlikely that may be).

    To fix this, change the data format in `struct fsf_qtcb_bottom_port` from
    unsigned to signed, and change the printf format string used to generate
    `zfcp_sysfs_adapter_diag_sfp_temperature_show()` from `%hu` to `%hd`.
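
    The effect of the format change can be reproduced in isolation. The
    helpers below are hypothetical and only demonstrate how the same 16-bit
    raw value reads when interpreted as unsigned versus as two's complement;
    they are not the zfcp sysfs code, and no unit for the raw value is
    assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Old interpretation: the raw 16-bit value read as unsigned. */
static int demo_temp_as_unsigned(uint16_t raw)
{
	return raw;
}

/*
 * New interpretation: the same bits read as two's complement. The cast of
 * an out-of-range value is implementation-defined before C23, but wraps
 * as expected on mainstream compilers.
 */
static int demo_temp_as_signed(uint16_t raw)
{
	return (int16_t)raw;
}

/* Format the value the way a "%hd" based show function would. */
static int demo_show_temp(uint16_t raw, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%hd\n", (int16_t)raw);
}
```

    For a raw value of 0xFFFB the unsigned reading is 65531, while the signed
    reading is -5.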

    Link: https://lore.kernel.org/r/d6e3be5428da5c9490cfff4df7cae868bc9f1a7e.1582039501.git.bblock@linux.ibm.com
    Fixes: a10a61e807b0 ("scsi: zfcp: support retrieval of SFP Data via Exchange Port Data")
    Fixes: 6028f7c4cd87 ("scsi: zfcp: introduce sysfs interface for diagnostics of local SFP transceiver")
    Cc: # 5.5+
    Reviewed-by: Jens Remus
    Reviewed-by: Fedor Loshakov
    Reviewed-by: Steffen Maier
    Signed-off-by: Benjamin Block
    Signed-off-by: Martin K. Petersen

    Benjamin Block
     

20 Feb, 2020

1 commit