11 Jan, 2012

2 commits

  • SCSI updates for post 3.2 merge window

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (67 commits)
    [SCSI] lpfc 8.3.28: Update driver version to 8.3.28
    [SCSI] lpfc 8.3.28: Add Loopback support for SLI4 adapters
    [SCSI] lpfc 8.3.28: Critical Miscellaneous fixes
    [SCSI] Lpfc 8.3.28: FC and SCSI Discovery Fixes
    [SCSI] lpfc 8.3.28: Add support for ABTS failure handling
    [SCSI] lpfc 8.3.28: SLI fixes and added SLI4 support
    [SCSI] lpfc 8.3.28: Miscellaneous fixes in sysfs and mgmt interfaces
    [SCSI] mpt2sas: Removed redundant calling of _scsih_probe_devices() from _scsih_probe
    [SCSI] mac_scsi: Remove obsolete IRQ_FLG_* users
    [SCSI] qla4xxx: Update driver version to 5.02.00-k10
    [SCSI] qla4xxx: check for FW alive before calling chip_reset
    [SCSI] qla4xxx: Fix qla4xxx_dump_buffer to dump buffer correctly
    [SCSI] qla4xxx: Fix the IDC locking mechanism
    [SCSI] qla4xxx: Wait for disable_acb before doing set_acb
    [SCSI] qla4xxx: Don't recover adapter if device state is FAILED
    [SCSI] qla4xxx: fix call trace on rmmod with ql4xdontresethba=1
    [SCSI] qla4xxx: Fix CPU lockups when ql4xdontresethba set
    [SCSI] qla4xxx: Perform context resets in case of context failures.
    [SCSI] iscsi class: export pid of process that created
    [SCSI] mpt2sas: Remove unused duplicate diag_buffer_enable param
    ...

    Linus Torvalds
     
  • * 'upstream-linus' of git://github.com/jgarzik/libata-dev:
    ahci: support the STA2X11 I/O Hub
    pata_bf54x: fix BMIDE status register emulation
    ata: add ata port hibernate callbacks
    ata: update ata port's runtime status during system resume
    [SCSI] runtime resume parent for child's system-resume
    ahci: platform support for suspend/resume
    libata-core: kill duplicate statement in ata_do_set_mode()
    pata_of_platform: remove direct dependency on OF_IRQ
    SATA/PATA: convert drivers/ata/* to use module_platform_driver()
    pata_cs5536: forward port changes from cs5536
    libata-sff: use ATAPI_{COD|IO}
    ata: add ata port runtime PM callbacks
    ata: add ata port system PM callbacks
    [SCSI] sd: check runtime PM status in sd_shutdown
    [SCSI] check runtime PM status in system PM
    [SCSI] add flag to skip the runtime PM calls on the host
    ata: make ata port as parent device of scsi host
    ahci: start engine only during soft/hard resets

    Linus Torvalds
     

09 Jan, 2012

1 commit

  • With the previous change, ata port runtime suspend now happens as:

    disk suspend --> scsi target suspend --> scsi host suspend --> ata port
    suspend

    Suspending the ata port (parent device) needs to schedule scsi EH, which
    will resume the scsi host (child device). Resuming the child device will
    in turn make the parent device resume first, so the sequence is recursive.

    This patch adds a new flag Scsi_Host::eh_noresume.
    ata port will set this flag to skip the runtime PM calls on scsi host.

    Acked-by: Alan Stern
    Signed-off-by: Lin Ming
    Signed-off-by: Jeff Garzik

    Lin Ming
     

04 Jan, 2012

1 commit


15 Dec, 2011

3 commits

  • Use DCB notifiers to set the skb priority to allow packets
    to be steered and tagged correctly over DCB enabled drivers
    that setup traffic classes.

    This allows queue_mapping() routines to be removed in these
    drivers that were previously inspecting the ethertype of
    every skb to mark FCoE/FIP frames.

    Signed-off-by: John Fastabend
    Signed-off-by: Robert Love
    Signed-off-by: James Bottomley

    john fastabend
     
  • There could be multiple userspace entities creating/destroying/
    recovering sessions, and the kernel's iscsi drivers could
    be doing this too. If the userspace apps try to manage the kernel
    ones, the driver/fw can get out of sync and cause the user to
    lose the root disk, hit oopses, or see ping-ponging because userspace
    wants to do one thing but the kernel manager thought we
    were trying to do another.

    This patch fixes the problem by just exporting the pid of
    the entity that created the session. Userspace programs like
    iscsid, iscsiadm, iscsistart, qlogic's tools, etc, can then
    figure out which sessions they own and only manage them.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • All the handlers have now implemented the match function, so we don't need to
    use scsi_dev_info any more for matching purposes.

    Signed-off-by: Babu Moger
    Acked-by: Hannes Reinecke
    Signed-off-by: James Bottomley

    Moger, Babu
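As a rough illustration of the pid-export commit above (Mike Christie): a management tool could compare each session's exported creator pid with its own pid and leave every other session alone. This is a minimal user-space sketch. The session records and the "creator" key are hypothetical stand-ins, since the commit message does not say how the pid is exposed.

```python
import os


def sessions_owned_by(sessions, pid):
    """Return the sorted ids of sessions whose creator pid equals `pid`.

    `sessions` maps a session id to a dict with a hypothetical
    "creator" key holding the pid of the entity that created it.
    """
    return sorted(sid for sid, info in sessions.items()
                  if info.get("creator") == pid)


# Hypothetical snapshot: two sessions created by this process, one by
# some other manager (for example a vendor tool).
me = os.getpid()
snapshot = {
    1: {"creator": me},
    2: {"creator": me + 1},
    3: {"creator": me},
}
mine = sessions_owned_by(snapshot, me)
```

A tool following this pattern would only log out of, or recover, the sessions listed in `mine`.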
     

29 Oct, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
    [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
    [SCSI] ipr: Fix BUG on adapter dump timeout
    [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
    [SCSI] hpsa: change confusing message to be more clear
    [SCSI] iscsi class: fix vlan configuration
    [SCSI] qla4xxx: fix data alignment and use nl helpers
    [SCSI] iscsi class: fix link local mispelling
    [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
    [SCSI] aacraid: use lower snprintf() limit
    [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
    [SCSI] lpfc 8.3.27: T10 additions for SLI4
    [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
    [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
    [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
    [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
    [SCSI] megaraid_sas: Changelog and version update
    [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
    [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
    [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
    [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
    ...

    Linus Torvalds
     

25 Oct, 2011

3 commits

  • This is finally the RAID5 Write support.

    The bigger part of this patch is not the XOR engine itself, but the
    read4write logic, which is a complete mini prepare_for_striping
    reading engine that can read scattered pages of a stripe into cache
    so they can be used for XOR calculation when the write is not
    stripe aligned.

    The main algorithm behind the XOR engine is the 2 dimensional array:
    struct __stripe_pages_2d.
    A drawing might save 1000 words
    ---

    __stripe_pages_2d
        |
    n = pages_in_stripe_unit;
    w = group_width - parity;
        |                        pages array presented to the XOR lib
        |                             |
        V                             |
    __1_page_stripe[0].pages --> [c0][c1]..[cw][c_par]
                                 [c0][c1]..[cw][c_par]
                                 [c0][c1]..[cw][c_par]
                                 ^
                                 |
                     data added columns first then row

    ---
    The pages are put on this array columns first, i.e.:
    p0-of-c0, p1-of-c0, ... pn-of-c0, p0-of-c1, ...
    So we are doing a corner turn of the pages.

    Note that pages will zigzag down and left, but are put sequentially
    in growing order. So when the time comes to XOR the stripe, only the
    beginning and end of the array need be checked. We scan the array
    and any NULL spot will be filled by pages-to-be-read.

    The FS that wants to support RAID5 needs to supply an
    operations-vector that searches a given page in cache, and specifies
    if the page is uptodate or needs reading. All these pages to be read
    are put on a slave ore_io_state and synchronously read. All the pages
    of a stripe are read in one IO, using the scatter gather mechanism.

    In write we constrain our IO to only be incomplete on a single
    stripe. This means either the complete IO is within a single stripe, so
    we might have pages to read from both the beginning and the end of the
    stripe, or we have some reading to do at the beginning but end at a stripe
    boundary. The leftover pages are pushed to the next IO by the API
    already established by previous work, where an IO offset/length
    combination presented to the ORE might get the length truncated and
    the user must re-submit the leftover pages. (Both exofs and NFS
    support this)

    But any ORE user should make its best effort to align its IO
    beforehand and avoid complications. A cached ore_layout->stripe_size
    member can be used for that calculation. (NOTE: ORE demands
    that stripe_size not be bigger than 32 bits.)

    What else? Well read it and tell me.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • This patch introduces the first stage of RAID5 support
    mainly the skip-over-raid-units when reading. For
    writes it inserts BLANK units, into where XOR blocks
    should be calculated and written to.

    It introduces the new "general raid maths", and the main
    additional parameters and components needed for raid5.

    Since at this stage it could corrupt future versions that
    actually do support raid5, the enablement of raid5
    mounting and setting of parity-count > 0 is disabled. So
    the raid5 code will never be used. Mounting of raid5 is
    only enabled later once the basic XOR write is also in.
    But if the patch "enable RAID5" is applied this code has
    been tested to be able to properly read raid5 volumes
    and is according to standard.

    Also it has been tested that the new maths still properly
    supports RAID0 and grouping code just as before.
    (BTW: I have found more bugs in the pnfs-obj RAID math
    fixed here)

    The ore.c file is getting too big, so new ore_raid.[hc]
    files are added that will include the special raid stuff
    that is not used in striping and mirrors. With future write
    support these will get bigger.
    When adding the ore_raid.c to Kbuild file I was forced to
    rename ore.ko to libore.ko. Is it possible to keep source
    file, say ore.c and module file ore.ko the same even if there
    are multiple files inside ore.ko?

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • ore_calc_stripe_info is needed by exofs::export.c
    for the layout calculations. Make it exportable

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
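The column-first placement and the XOR step described in the RAID5 write commit above can be modeled in a few lines. This is only a user-space sketch with toy 4-byte "pages", not the kernel code. The function names are invented for illustration, and the grid here holds data columns only (w = group_width - parity), leaving out the parity column.

```python
def place_columns_first(pages, n, w, start=0):
    """Place a sequential run of data pages into an n-row, w-column grid.

    Slot i maps to column i // n and row i % n, so one column (one
    device's stripe unit) fills top to bottom before the next begins.
    `start` is the first slot written. Untouched slots stay None and
    stand for pages that would have to be read before the XOR.
    """
    if start < 0 or start + len(pages) > n * w:
        raise ValueError("pages do not fit in a single stripe")
    grid = [[None] * w for _ in range(n)]
    for i, page in enumerate(pages, start):
        grid[i % n][i // n] = page
    return grid


def xor_parity(row):
    """XOR the equal-sized byte pages of one fully populated row."""
    parity = bytearray(len(row[0]))
    for page in row:
        for k, b in enumerate(page):
            parity[k] ^= b
    return bytes(parity)


# n = 2 pages per stripe unit, w = 3 data columns.
pages = [bytes([v] * 4) for v in range(1, 7)]
grid = place_columns_first(pages, n=2, w=3)
parities = [xor_parity(row) for row in grid]

# An unaligned write: only slots 2..4 are supplied, so the holes sit
# at the beginning and the end of the run.
partial = place_columns_first(pages[:3], n=2, w=3, start=2)
```

Because XOR is its own inverse, XOR-ing a row's parity with all but one of its data pages gives back the missing page.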
     

20 Oct, 2011

3 commits

  • Userspace was sending the priority/id part of the vlan tag
    and sysfs was displaying the id in the vlan file. This
    renames the vlan sysfs file to vlan_id to reflect that it
    was showing the id and to match the vlan_priority file.
    This also adds an ISCSI_NET_PARAM_VLAN_TAG iscsi nl command
    to reflect that we are sending down the vlan/priority
    part of the tag.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • This has the driver use helpers for a common operation and fixes
    an issue where if multiple iscsi params are sent they could be
    sent at offsets that cause unaligned accesses. The nla helpers
    account for the padding needed to align properly for the driver.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • Replaced the iscsi_get_next_target_id with IDA to make
    target-id allocation efficient for iscsi offload drivers

    This patch should be applied after Jonathan Cameron's patch
    "ida : simplified functions for id allocation"

    Signed-off-by: John Soni Jose
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
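To make the alignment point in the netlink-helper commit above concrete: Linux netlink attributes carry a 4-byte header and are padded to a 4-byte boundary, so packing several parameters with that padding keeps every attribute at an aligned offset. The sketch below only reproduces the arithmetic in user space. The payload sizes are made up, and `attr_offsets` is a hypothetical helper, not a kernel function.

```python
NLA_ALIGNTO = 4  # netlink attributes are aligned to 4 bytes
NLA_HDRLEN = 4   # 2-byte length field + 2-byte type field


def nla_align(length):
    """Round `length` up to the next multiple of NLA_ALIGNTO."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def attr_offsets(payload_sizes):
    """Return (offsets, total) for attributes packed back to back.

    Each attribute occupies header + payload, rounded up to the
    alignment, so the following attribute starts on a 4-byte boundary.
    """
    offsets, pos = [], 0
    for size in payload_sizes:
        offsets.append(pos)
        pos += nla_align(NLA_HDRLEN + size)
    return offsets, pos


# Three made-up parameters with 1-, 6- and 2-byte payloads.
offsets, total = attr_offsets([1, 6, 2])
```

Without the rounding step the second attribute would start at offset 5, which is the kind of unaligned access the commit describes.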
     

16 Oct, 2011

2 commits

  • port->dev_list maintains a list of devices attached to a given sas root port.
    It needs to be mutated under a lock as contexts outside of the
    single-threaded-libsas-workqueue access the list via sas_find_dev_by_rphy().
    Fixup locations where the list was being mutated without a lock.

    This is a follow-up to commit 5911e963 "[SCSI] libsas: remove expander
    from dev list on error", where Luben noted [1]:

    > 2/ We have unlocked list manipulations in sas_ex_discover_end_dev(),
    > sas_unregister_common_dev(), and sas_ex_discover_end_dev()

    Yes, I can see that and that is very unfortunate.

    [1]: http://marc.info/?l=linux-scsi&m=131480962006471&w=2

    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • Except for obtaining the netdev from lport, fcoe_get_lesb is the common code
    for the LLDs.

    Signed-off-by: Bhanu Prakash Gollapudi
    Acked-by: Yi Zou
    Signed-off-by: James Bottomley

    Bhanu Prakash Gollapudi
     

15 Oct, 2011

4 commits

  • Current ore_check_io API receives a residual
    pointer, to report partial IO. But it is actually
    not used, because in a multiple devices IO there
    is never a linearity in the IO failure.

    On the other hand if every failing device is reported
    through a received callback measures can be taken to
    handle only failed devices. One at a time.

    This will also be needed by the objects-layout-driver
    for its error reporting facility.

    Exofs is not currently using the new information and
    keeps the old behaviour of failing the complete IO in
    case of an error. (No partial completion)

    TODO: Use an ore_check_io callback to set_page_error only
    the failing pages. And re-dirty write pages.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • All users of the ore will need to check if current code
    supports the given layout. For example RAID5/6 is not
    currently supported.

    So move all the checks from exofs/super.c to a new
    ore_verify_layout() to be used by ore users.

    Note that any new layout should be passed through the
    ore_verify_layout() because the ore engine will prepare
    and verify some internal members of ore_layout, and
    assumes it's called.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Users like the objlayout-driver would like to only pass
    a partial device table that covers the IO in question.
    For example exofs divides the file into raid-group-sized
    chunks and only serves group_width number of devices at
    a time.

    The partiality is communicated by setting
    ore_components->first_dev, and the array covers all logical
    devices from oc->first_dev up to (oc->first_dev + oc->numdevs).

    The ore_comp_dev() API receives a logical device index
    and returns the actual present device in the table.
    An out-of-range dev_index will BUG.

    Logical device index is the theoretical device index as if
    all the devices of a file are present, i.e.:
    total_devs = group_width * mirror_p1 * group_count
    0 < total_devs

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Now that each ore_io_state covers only a single raid group,
    a single striping_info calculation is needed. Embed one inside
    ore_io_state to cache the calculation results and eliminate
    an extra call.

    Also the outer _prepare_for_striping is removed since it does nothing.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
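The partial device table from the commit above (first_dev, numdevs and the ore_comp_dev() lookup) can be pictured with a small user-space model. The class and function names are illustrative only, and the kernel's BUG on an out-of-range index is represented here by an exception.

```python
class Components:
    """Toy stand-in for a partial device table (illustrative names)."""

    def __init__(self, first_dev, devs):
        self.first_dev = first_dev  # logical index of devs[0]
        self.devs = list(devs)

    @property
    def numdevs(self):
        return len(self.devs)


def total_devs(group_width, mirror_p1, group_count):
    """Number of logical devices if the whole table were present."""
    return group_width * mirror_p1 * group_count


def comp_dev(oc, dev_index):
    """Map a logical device index to the device present in the table."""
    rel = dev_index - oc.first_dev
    if not 0 <= rel < oc.numdevs:
        # The commit says the kernel BUGs here; the model raises instead.
        raise IndexError(f"logical device {dev_index} is not in the table")
    return oc.devs[rel]


# 4-wide groups, 2 copies of each (mirror_p1 = 2), 3 groups: 24 devices.
# The table passed in covers only the second group, logical devices 8..15.
all_devs = total_devs(group_width=4, mirror_p1=2, group_count=3)
oc = Components(first_dev=8, devs=[f"dev{i}" for i in range(8, 16)])
picked = comp_dev(oc, 10)
```

Callers keep using logical indices everywhere, and only the lookup needs to know which slice of the table is actually present.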
     

04 Oct, 2011

1 commit

  • In the pNFS obj-LD the device table at the layout level needs
    to point to a device_cache node, where it is possible and likely
    that many layouts will point to the same device-nodes.

    In Exofs we have a more orderly structure where we have a single
    array of devices that repeats twice for a round-robin view of the
    device table.

    This patch moves to a model that can be used by the pNFS obj-LD
    where struct ore_components holds an array of ore_dev-pointers.
    (ore_dev is newly defined and contains a struct osd_dev *od
    member)

    Each pointer in the array of pointers will point to a bigger
    user-defined dev_struct. That can be accessed by use of the
    container_of macro.

    In Exofs an __alloc_dev_table() function allocates the
    ore_dev-pointers array as well as an exofs_dev array, in one
    allocation and does the addresses dance to set everything pointing
    correctly. It still keeps the double allocation trick for the
    inodes round-robin view of the table.

    The device table is always allocated dynamically, also for the
    single device case. So it is unconditionally freed at umount.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     

03 Oct, 2011

10 commits

  • The struct ore_striping_info will be used later in other
    structures, and ore_calc_stripe_info as well. Rename them and
    make struct ore_striping_info public. ore_calc_stripe_info
    is still static and will be made public on first use.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • The struct pnfs_osd_data_map data_map member of exofs_sb_info was
    never used after mount. In fact all its members were duplicated
    by the ore_layout structure. So just remove the duplicated information.

    Also removed some stupid, but perfectly supported, restrictions on
    layout parameters. The case where num_devices is not divisible by
    mirror_count+1 is perfectly fine since the rotating device view
    will eventually use all the devices it can get.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • ore_components already has a comps member, so this leads
    to things like comps->comps, which is annoying. The name oc
    was already used in new code. So rename all old usage of
    ore_components comps => ore_components oc.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Allow the sas-transport-class to update events for local phys via a new
    PHY_FUNC_GET_EVENTS command to ->lldd_control_phy(). Fixup drivers that
    are not prepared for new enum phy_func values, and unify
    ->lldd_control_phy() error codes.

    These are the SAS defined phy events that are reported in a
    smp-report-phy-error-log command:
    * /sys/class/sas_phy//invalid_dword_count
    * /sys/class/sas_phy//running_disparity_error_count
    * /sys/class/sas_phy//loss_of_dword_sync_count
    * /sys/class/sas_phy//phy_reset_problem_count

    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • Based on original implementation from Jiangbi Liu and Maciej Trela.

    ATAPI transfers happen in two-to-three stages. The two stage atapi
    commands are those that include a dma data transfer. The data transfer
    portion of these operations is handled by the hardware packet-dma
    acceleration. The three-stage commands do not have a data transfer and
    are handled without hardware assistance in raw frame mode.

    stage1: transmit host-to-device fis to notify the device of an incoming
    atapi cdb. Upon reception of the pio-setup-fis repost the task_context
    to perform the dma transfer of the cdb+data (go to stage3), or repost
    the task_context to transmit the cdb as a raw frame (go to stage 2).

    stage2: wait for hardware notification of the cdb transmission and then
    go to stage 3.

    stage3: wait for the arrival of the terminating device-to-host fis and
    terminate the command.

    To keep the implementation simple we only support ATAPI packet-dma
    protocol (for commands with data) to avoid needing to handle the data
    transfer manually (like we do for SATA-PIO). This may affect
    compatibility for a small number of devices (see
    ATA_HORKAGE_ATAPI_MOD16_DMA).

    If the data-transfer underruns, or encounters an error the
    device-to-host fis is expected to arrive in the unsolicited frame queue
    to pass to libata for disposition. However, in the DONE_UNEXP_FIS (data
    underrun) case it appears we need to craft a response. In the
    DONE_REG_ERR case we do receive the UF and propagate it to libsas.

    Signed-off-by: Maciej Trela
    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • Cache-align xid and ex_lock, besides
    removing holes.

    Signed-off-by: Vasu Dev
    Tested-by: Ross Brattain
    Signed-off-by: Yi Zou
    Signed-off-by: James Bottomley

    Vasu Dev
     
  • Re-arrange its fields to avoid padding and have better
    cacheline alignments.

    Removed the unused start_time, end_time and last_pkt_time
    fields.

    Together this reduced the struct size from 480 to 448 bytes, which
    saves one cacheline on x86_64 and eliminates 8 pads, while
    keeping logical fields together.

    Signed-off-by: Vasu Dev
    Tested-by: Ross Brattain
    Signed-off-by: Yi Zou
    Signed-off-by: James Bottomley

    Vasu Dev
     
  • Several sas drivers legitimately check the protocol against the union of
    SAS_PROTOCOL_SATA and SAS_PROTOCOL_STP. Provide a SAS_PROTOCOL_STP_ALL
    to silence warnings like:

    drivers/scsi/pm8001/pm8001_sas.c:438:3: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
    drivers/scsi/mvsas/mv_sas.c:798:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
    drivers/scsi/mvsas/mv_sas.c:1783:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
    drivers/scsi/mvsas/mv_sas.c:1886:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
    drivers/scsi/isci/request.c:3565:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]

    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • If the user has disabled CONFIG_SCSI_SAS_HOST_SMP then libsas drivers
    will not be receiving smp-gpio frames and do not need this lookup code.

    Reported-by: Randy Dunlap
    Tested-by: Randy Dunlap
    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • Allow expander table-to-table attachments for
    expanders that support it.

    Signed-off-by: Luben Tuikov
    Signed-off-by: James Bottomley

    Luben Tuikov
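The stage flow from the ATAPI commit above (Dan Williams) reads naturally as a small state machine. The sketch below is a user-space model of that description only. The event names ("pio_setup_fis", "cdb_transmitted", "d2h_fis") are invented labels for the notifications the message mentions, not identifiers from the driver.

```python
def next_stage(stage, event, has_data):
    """Return the stage that follows `stage` when `event` arrives."""
    if stage == "stage1" and event == "pio_setup_fis":
        # With data, the cdb and data go out together by packet-dma.
        # Without data, the cdb is sent as a raw frame first.
        return "stage3" if has_data else "stage2"
    if stage == "stage2" and event == "cdb_transmitted":
        return "stage3"
    if stage == "stage3" and event == "d2h_fis":
        return "done"
    raise ValueError(f"unexpected event {event!r} in {stage}")


def run(has_data):
    """Walk one command through its stages and return the path taken."""
    events = ["pio_setup_fis"]
    if not has_data:
        events.append("cdb_transmitted")
    events.append("d2h_fis")

    stage, path = "stage1", ["stage1"]
    for event in events:
        stage = next_stage(stage, event, has_data)
        path.append(stage)
    return path


with_data = run(has_data=True)
without_data = run(has_data=False)
```

The two paths match the message: commands with a data transfer skip stage 2, while commands without one pass through all three stages.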
     

22 Sep, 2011

1 commit

  • Add SFF-8485 v0.7 / SAS-1 smp-write-gpio register support to libsas.
    Defer SAS-2 support unless/until it defines an sgpio interface.

    Minimum implementation needed to get the lights blinking.
    try_test_sas_gpio_gp_bit() provides a common method to parse the
    incoming write data (raw bitstream), and the to_sas_gpio_gp_bit() helper
    routine can be used as a basis for the set/clear operations for the
    'read' implementation. Host implementations parse as many bits
    (ODx.[012]) as are locally supported and report the number of registers
    successfully written. If the submitted data overruns the internal
    number of registers available report the write as a success with the
    number of bytes remaining reported in ->resid_len.

    Example (assuming an active backplane) set the "identify" pattern for
    the first 21 devices:

    smp_write_gpio --count=2 --data=92,49,24,92,24,92,49,24 -t 4 --index=1 /dev/bsg/sas_hostX

    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
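The example data in the commit above can be decoded by hand to see the per-device pattern. The sketch below rests on assumptions that the commit message does not state: that the --data values are hexadecimal bytes, that each 4-byte register is big-endian, that later registers hold higher bits of the stream, and that device x occupies bits 3x to 3x+2. `od_fields` is a hypothetical helper, not a libsas function.

```python
def od_fields(data, devices):
    """Decode one 3-bit field per device from raw register bytes.

    Assumed layout (see the lead-in): 4-byte big-endian registers,
    register k supplies bits 32k..32k+31 of the bitstream, and
    device x owns bits 3x..3x+2.
    """
    if len(data) % 4:
        raise ValueError("data must be a whole number of 4-byte registers")
    if devices * 3 > len(data) * 8:
        raise ValueError("not enough data for that many devices")
    stream = 0
    for reg, off in enumerate(range(0, len(data), 4)):
        stream |= int.from_bytes(data[off:off + 4], "big") << (32 * reg)
    return [(stream >> (3 * x)) & 0b111 for x in range(devices)]


# The eight --data bytes from the commit message, read as hex.
data = bytes.fromhex("9249249224924924")
fields = od_fields(data, devices=21)
```

Under these assumptions all 21 fields decode to the same value with only the middle bit set, which is consistent with one pattern being applied to the first 21 devices.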
     

31 Aug, 2011

2 commits


29 Aug, 2011

1 commit

  • The problem is that if we are doing a scsi scan and the device then goes
    into recovery, we will wait for the recovery to complete. It waits
    because scsi-ml will send inquiries or report luns and the queueing code
    will have been blocked due to the host not being ready. However, if we
    are already in recovery when a scan is started, the scan will silently fail
    and some devices will not be added.

    It is easy to hit the problem where devices do not show up with
    FC where we are doing tests that disrupt the target controllers.
    When the controller is disrupted (reboot, or setting firmware, etc.),
    and we cause the dev loss tmo to fire, then devices will be removed.
    Then when the problem has been fixed, the rport will be scanned and
    devices should be added back. But if we cause another disruption before
    scanning has started, then devices will not get added back. If the
    disruption does not start until after the scan has started, then the
    devices will be added back.

    This patch fixes that problem by not failing scans when the host
    is in recovery. We will let scsi-ml send the IO and let the queueing
    and scsi error handling deal with it like is done if we went into
    recovery while scanning.

    For recovery cases where the host is being torn down, with the
    patch we will still fail the scan, since there is no point in scanning.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
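The rule described in the commit above comes down to a predicate on the host state. The sketch below is a user-space model of that decision. The state names are modeled on the SCSI host states and should be read as assumptions, and `scan_allowed` is an illustrative name rather than the kernel function.

```python
from enum import Enum, auto


class HostState(Enum):
    """Host states used by this model (names are assumptions)."""
    CREATED = auto()
    RUNNING = auto()
    CANCEL = auto()
    DEL = auto()
    RECOVERY = auto()
    CANCEL_RECOVERY = auto()  # in recovery while being torn down
    DEL_RECOVERY = auto()     # in recovery while being deleted


def scan_allowed(state):
    """Allow scans on a running host and on a host in plain recovery.

    In recovery the scan I/O is simply queued and handled by error
    handling. During teardown there is no point in scanning.
    """
    return state in (HostState.RUNNING, HostState.RECOVERY)


allowed = [s.name for s in HostState if scan_allowed(s)]
```

With this rule a scan that starts during recovery waits for recovery to finish instead of silently adding no devices.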
     

27 Aug, 2011

5 commits