26 Sep, 2011

1 commit

  • This oops was reported recently:
    d:mon> e
    cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
    pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
    lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
    sp: c0000000fd4c73a0
    msr: 8000000000009032
    dar: 0
    dsisr: 40000000
    current = 0xc0000000fd640d40
    paca = 0xc00000000054ff80
    pid = 5085, comm = iscsid
    d:mon> t
    [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
    [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
    [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18 [scsi_transport_iscsi2]
    [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
    [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
    [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
    [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
    [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
    [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
    [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
    --- Exception: c01 (System Call) at 00000080da560cfc

    The root cause was an EEH error, which sent us down the offload_close path in
    the cxgb3 driver, which in turn sets cdev->l2opt to NULL without regard for
    upper layer drivers (like the cxgbi drivers) that might have execution contexts
    in the middle of using it. The result is the oops above, when t3_l2t_get attempts
    to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error
    handler sets the pointer to NULL.

    The fix is to prevent the setting of the NULL pointer until after there are no
    further users of it. The t3cdev->l2opt pointer is now converted to be an RCU
    pointer, and the L2DATA macro is now called under the protection of
    rcu_read_lock(). When the EEH error path:
    t3_adapter_error->offload_close->cxgb3_offload_deactivate
    is executed, setting of that l2opt pointer to NULL is now gated on an RCU
    quiescence point, allowing L2DATA callers to safely check for a NULL
    pointer without concern that the underlying data will be freed before the
    pointer is dereferenced.
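
    The ordering described above (unpublish the pointer, wait for readers, then
    free) can be sketched in userspace. This is only a model under stated
    assumptions: it uses C11 atomics and a crude reader counter as stand-ins for
    rcu_read_lock(), rcu_dereference() and synchronize_rcu(), and every name in
    it (l2t_model, cdev_model, l2t_nentries, offload_deactivate) is hypothetical
    rather than taken from the cxgb3 driver.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's L2 table and t3cdev. */
struct l2t_model {
    unsigned int nentries;
};

struct cdev_model {
    _Atomic(struct l2t_model *) l2opt;  /* plays the role of t3cdev->l2opt */
    atomic_int readers;                 /* crude reader tracking, not real RCU */
};

/* Reader side: returns the entry count, or -1 if the table is already gone. */
static int l2t_nentries(struct cdev_model *cdev)
{
    int n = -1;

    atomic_fetch_add(&cdev->readers, 1);              /* ~ rcu_read_lock() */
    struct l2t_model *d = atomic_load(&cdev->l2opt);  /* ~ rcu_dereference() */
    if (d != NULL)
        n = (int)d->nentries;
    atomic_fetch_sub(&cdev->readers, 1);              /* ~ rcu_read_unlock() */
    return n;
}

/* Teardown side: unpublish first, wait for readers, only then free. */
static void offload_deactivate(struct cdev_model *cdev)
{
    struct l2t_model *old =
        atomic_exchange(&cdev->l2opt, (struct l2t_model *)NULL);

    while (atomic_load(&cdev->readers) != 0)
        ;                                             /* ~ synchronize_rcu() */
    free(old);
}
```

    A reader that loaded the old pointer is counted before the exchange, so the
    teardown side cannot free the table underneath it; a reader that arrives
    later sees NULL and backs off.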

    This has been tested by the reporter and shown to fix the reported oops.

    [nhorman: fix up uninitialised variable reported by Dan Carpenter]
    Signed-off-by: Neil Horman
    Reviewed-by: Karen Xie
    Cc: stable@kernel.org
    Signed-off-by: James Bottomley

    Neil Horman
     

18 Aug, 2011

3 commits


17 Aug, 2011

1 commit

  • Fix a bug introduced in 69cce1d14049 ("net: Abstract dst->neighbour
    accesses behind helpers.") where we might dereference skb_dst(skb)
    even if it is NULL, which causes:

    [ 240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
    [ 240.948007] IP: [] ipoib_start_xmit+0x39/0x280 [ib_ipoib]
    [...]
    [ 240.948007] Call Trace:
    [ 240.948007]
    [ 240.948007] [] dev_hard_start_xmit+0x2a0/0x590
    [ 240.948007] [] ? arp_create+0x70/0x200
    [ 240.948007] [] sch_direct_xmit+0xef/0x1c0

    Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212
    Signed-off-by: Bernd Schubert
    Signed-off-by: Roland Dreier

    Bernd Schubert
     

28 Jul, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
    kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
    iscsi-target: Add iSCSI fabric support for target v4.1
    iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
    iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
    iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi

    Linus Torvalds
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

25 Jul, 2011

1 commit

  • This patch renames the following iscsi_proto.h structures to avoid
    namespace issues with drivers/target/iscsi/iscsi_target_core.h:

    *) struct iscsi_cmd -> struct iscsi_scsi_req
    *) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp
    *) struct iscsi_login -> struct iscsi_login_req

    This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE*
    and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and
    fixes the incorrect definition of struct iscsi_snack to follow
    RFC-3720 Section 10.16. SNACK Request.

    Also, this patch updates libiscsi, iSER, be2iscsi, and bnx2i to
    use the updated structure definitions in a handful of locations.

    Signed-off-by: Mike Christie
    Signed-off-by: Nicholas A. Bellinger

    Nicholas Bellinger
     

23 Jul, 2011

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
    IB/qib: Defer HCA error events to tasklet
    mlx4_core: Bump the driver version to 1.0
    RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
    IB/mlx4: Support PMA counters for IBoE
    IB/mlx4: Use flow counters on IBoE ports
    IB/pma: Add include file for IBA performance counters definitions
    mlx4_core: Add network flow counters
    mlx4_core: Fix location of counter index in QP context struct
    mlx4_core: Read extended capabilities into the flags field
    mlx4_core: Extend capability flags to 64 bits
    IB/mlx4: Generate GID change events in IBoE code
    IB/core: Add GID change event
    RDMA/cma: Don't allow IPoIB port space for IBoE
    RDMA: Allow for NULL .modify_device() and .modify_port() methods
    IB/qib: Update active link width
    IB/qib: Fix potential deadlock with link down interrupt
    IB/qib: Add sysfs interface to read free contexts
    IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
    IB/qib: Remove double define
    IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
    ...

    Linus Torvalds
     
  • Roland Dreier
     
  • With ib_qib options:

    options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4

    a run of ib_write_bw -a yields the following:

    ------------------------------------------------------------------
    #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
    1048576    5000           2910.64            229.80
    ------------------------------------------------------------------

    The top cpu use in a profile is:

    CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
    of 0x00 (No unit mask) count 1002300
    Counted LLC_MISSES events (Last level cache demand requests from this core that
    missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
    samples  %        samples  %        app name   symbol name
    15237    29.2642  964      17.1195  ib_qib.ko  qib_7322intr
    12320    23.6618  1040     18.4692  ib_qib.ko  handle_7322_errors
    4106     7.8860   0        0        vmlinux    vsnprintf

    Analysis of the stats, the profile, the code, and the annotated profile indicates:
    - All of the overflow interrupts (one per packet overflow) are
      serviced on CPU0 with no mitigation on the frequency.
    - All of the receive interrupts are being serviced by CPU0. (That is
      the way truescale.cmds statically allocates the kctx IRQs to CPU.)
    - The code is spending all of its time servicing QIB_I_C_ERROR
      RcvEgrFullErr interrupts on CPU0, starving the packet receive
      processing.
    - The decode_err routine is very inefficient, using a printf variant
      to format a "%s" and continuing to loop after the errs mask has been
      cleared.
    - Both qib_7322intr and handle_7322_errors read PCI registers, which
      is very inefficient.

    The fix does the following:
    - Adds a tasklet to service QIB_I_C_ERROR
    - Replaces the very inefficient scnprintf() with a memcpy(). A field
    is added to qib_hwerror_msgs to save the sizeof("string") at
    compile time so that a strlen is not needed during err_decode().
    - The most frequent errors (Overflows) are serviced first to exit the
    loop as early as possible.
    - The loop now exits as soon as the errs mask is clear rather than
    fruitlessly looping through the msp array.
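
    The decode changes listed above can be sketched as follows. This is a
    minimal model, not the ib_qib code: the table layout, the bit positions,
    the names "ErrB" and "ErrC", and the function decode_errs() are all
    assumptions made for illustration. It shows a size cached with sizeof at
    compile time, memcpy() in place of a printf variant, the most frequent
    error placed first, and a loop that stops once the errs mask is clear.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry; sz caches sizeof("string") at compile time. */
struct hwerror_msg_model {
    uint64_t mask;
    const char *msg;
    size_t sz;                      /* includes the trailing NUL */
};

#define HWE_MSG(m, s) { .mask = (m), .msg = (s), .sz = sizeof(s) }

/* Most frequent error first, so the common case leaves the loop early. */
static const struct hwerror_msg_model msgs[] = {
    HWE_MSG(UINT64_C(1) << 0, "RcvEgrFullErr"),
    HWE_MSG(UINT64_C(1) << 1, "ErrB"),      /* hypothetical */
    HWE_MSG(UINT64_C(1) << 2, "ErrC"),      /* hypothetical */
};

/*
 * Writes the names of all set bits into buf, comma separated.
 * Returns the number of table entries examined.
 */
static int decode_errs(char *buf, size_t len, uint64_t errs)
{
    size_t used = 0;
    int passes = 0;

    if (len == 0)
        return 0;
    for (size_t i = 0; i < sizeof(msgs) / sizeof(msgs[0]) && errs; i++) {
        passes++;
        if (!(errs & msgs[i].mask))
            continue;
        size_t n = msgs[i].sz - 1;          /* text without the NUL */
        if (used + n + 2 > len)             /* room for ',' and NUL */
            break;
        if (used)
            buf[used++] = ',';
        memcpy(buf + used, msgs[i].msg, n); /* no strlen, no printf */
        used += n;
        errs &= ~msgs[i].mask;              /* done with this bit */
    }
    buf[used] = '\0';
    return passes;
}
```

    With only the overflow bit set, the loop examines a single entry and
    returns.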

    With this fix the performance changes to:

    ------------------------------------------------------------------
    #bytes #iterations BW peak[MB/sec] BW average[MB/sec]
    1048576 5000 2990.64 2941.35
    ------------------------------------------------------------------

    During testing of the error handling overflow patch, it was determined
    that some CPUs were slower when servicing both overflow and receive
    interrupts on CPU0 with different MSI interrupt vectors.

    This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
    interrupt for kctx's < 2 and to service them on the default interrupt.
    For some CPUs, the interrupt enter/exit is more costly than the
    additional PCI read in the default handler.

    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Mike Marciniszyn
     

22 Jul, 2011

1 commit

  • - unify vlan and nonvlan rx path
    - kill nesvnic->vlan_grp and nes_netdev_vlan_rx_register
    - allow to turn on/off rx/tx vlan accel via ethtool (set_features)

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

19 Jul, 2011

17 commits

  • Since printk_ratelimit() shouldn't be used anymore (see comment in
    include/linux/printk.h), replace it with printk_ratelimited().

    Signed-off-by: Manuel Zerpies
    Signed-off-by: Roland Dreier

    Manuel Zerpies
     
  • Use the per port counter attached to all QPs created on that port to
    implement port level packets/bytes performance counters a la IB.
    Derived from a patch by Eli Cohen

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • Allocate flow counter per Ethernet/IBoE port, and attach this counter
    to all the QPs created on that port. Based on patch by Eli Cohen.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • Move the various definitions and mad structures needed for software
    implementation of IBA PM agent from the ipath and qib drivers into a
    single include file, which in turn could be used by more consumers.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • IBoE doesn't use LIDs. Use the GID change event to update the IB core
    cache for addition/deletion of GIDs.

    Signed-off-by: Eli Cohen
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • Add IB GID change event type. This is needed for IBoE when the HW
    driver updates the GID (e.g when new VLANs are added/deleted) table
    and the change should be reflected to the IB core cache.

    Signed-off-by: Eli Cohen
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • This patch fixes a kernel crash in cma_set_qkey().

    When the link layer is Ethernet, it is wrong to use IPoIB port space
    since no IPoIB interface is available. Specifically, setting the
    Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and
    an SA query, which doesn't make sense over Ethernet.

    Signed-off-by: Moni Shoua
    Acked-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Moni Shoua
     
  • These methods don't make sense for iWARP devices, so rather than
    forcing them to implement stubs, just return -ENOSYS in the core if
    the hardware driver doesn't set .modify_device and/or .modify_port.
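
    A minimal sketch of that core-side check, assuming a simplified device
    structure: ibdev_model, demo_modify_port() and core_modify_port() are
    hypothetical names, and the real ib_device layout and method signatures
    differ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down device with one optional method. */
struct ibdev_model {
    int (*modify_port)(struct ibdev_model *dev, int port, int mask);
    int last_mask;
};

/* A driver that does implement the method. */
static int demo_modify_port(struct ibdev_model *dev, int port, int mask)
{
    (void)port;
    dev->last_mask = mask;
    return 0;
}

/* Core wrapper: drivers without the method need no stub. */
static int core_modify_port(struct ibdev_model *dev, int port, int mask)
{
    if (dev->modify_port == NULL)
        return -ENOSYS;
    return dev->modify_port(dev, port, mask);
}
```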

    Signed-off-by: Roland Dreier

    Bart Van Assche
     
  • Update the active link width on QLE7220 chips when link goes down if
    chip width does not match shadowed width.

    Signed-off-by: Mitko Haralanov
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Mitko Haralanov
     
  • There is a possibility of a deadlock due to the way locks are
    acquired and released in qib_set_uevent_bits(). The function
    qib_set_uevent_bits() is called in process context and it uses
    spin_lock() and spin_unlock(). This same lock is acquired/released
    in interrupt context which can lead to a deadlock when running on
    the same cpu.

    The fix is to replace spin_lock() and spin_unlock() with
    spin_lock_irqsave() and spin_unlock_irqrestore() respectively in
    qib_set_uevent_bits().

    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Ram Vepa
     
  • Indicate the number of free user contexts via the sysfs file
    /sys/class/infiniband/qib0/nfreectxts as required for PSM.

    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Ram Vepa
     
  • The PCIE capability offset is saved during PCI bus walking. It will
    remove an unnecessary search in the PCI configuration space if this
    value is referenced instead of reacquiring it. Also, pci_is_pcie is a
    better way of determining if the device is PCIE or not (as it uses the
    same saved PCIE capability offset).

    Signed-off-by: Jon Mason
    Signed-off-by: Roland Dreier

    Jon Mason
     
  • Signed-off-by: Edwin van Vliet
    Reviewed-by: Jesper Juhl
    Acked-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Edwin van Vliet
     
  • The PCIE capability offset is saved during PCI bus walking. It will
    remove an unnecessary search in the PCI configuration space if this
    value is referenced instead of reacquiring it.

    Signed-off-by: Jon Mason
    Acked-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Jon Mason
     
  • Adapt to the new API. We plan to remove the old one later. Almost all
    changes are trivial, but there is one real fix: the following code is
    unsafe:

    int ncpus = num_online_cpus();
    for (i = 0; i < ncpus; i++) {
        ..
    }

    because 1) there is no guarantee that the online CPUs are numbered 0 to
    num_online_cpus() - 1, since some architectures assign sparse CPU
    numbers, and 2) CPU hotplug may change cpu_online_mask at the same
    time, so we need to pin it with get_online_cpus().
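
    The sparse-numbering problem can be illustrated with a userspace model in
    which cpu_online_mask is a 64-bit bitmap. The helper names are
    hypothetical, and the hotplug race (point 2) is not modelled here; in the
    kernel the walk would use for_each_online_cpu() with the mask pinned.

```c
#include <stdint.h>

/* Model: bit n set means CPU n is online. */
static unsigned int count_online(uint64_t online)
{
    unsigned int n = 0;

    for (; online; online &= online - 1)    /* clear lowest set bit */
        n++;
    return n;
}

/* Unsafe pattern: assumes online CPUs are numbered 0..n-1. */
static uint64_t visit_by_count(uint64_t online)
{
    uint64_t visited = 0;
    unsigned int n = count_online(online);

    for (unsigned int i = 0; i < n; i++)
        visited |= UINT64_C(1) << i;
    return visited;
}

/* Safe pattern: walk the bits that are actually set. */
static uint64_t visit_by_mask(uint64_t online)
{
    uint64_t visited = 0;

    for (unsigned int cpu = 0; cpu < 64; cpu++)
        if (online & (UINT64_C(1) << cpu))
            visited |= UINT64_C(1) << cpu;
    return visited;
}
```

    With CPUs 0, 2 and 4 online (mask 0x15), the counting loop visits CPUs 0,
    1 and 2: it touches an offline CPU and never reaches CPU 4.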

    Signed-off-by: KOSAKI Motohiro
    Acked-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Motohiro KOSAKI
     
  • Adapt to the new APIs. We plan to remove the old ones later and to
    change the current->cpus_allowed implementation.

    No functional change.

    Signed-off-by: KOSAKI Motohiro
    Acked-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Motohiro KOSAKI
     
  • Avoid assigning an IS_ERR value to the cm_id pointer. This fixes a
    few anomalies in the error flow due to confusion about checking for
    NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value
    every time we wish to determine if the cma_id object has a cm device
    associated with it.

    Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can
    check directly for the existence of the device pointer -- for a
    non-NULL check, it makes no difference whether it is the iWARP or the
    IB pointer).
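
    The idea of never storing an error-encoded pointer can be sketched in
    userspace. The err_ptr()/is_err()/ptr_err() helpers below only imitate
    the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(), and the structure and
    function names are hypothetical rather than taken from the RDMA CM code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_MODEL 4095

/* Userspace imitations of ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_MODEL;
}
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct cm_model { int unused; };
struct id_priv_model { struct cm_model *cm_id; };

static struct cm_model the_cm;

/* Hypothetical constructor that reports failure as an encoded pointer. */
static struct cm_model *create_cm(int fail)
{
    return fail ? err_ptr(-ENOMEM) : &the_cm;
}

/* Check the temporary; the stored pointer is only ever NULL or valid. */
static int attach_cm(struct id_priv_model *priv, int fail)
{
    struct cm_model *id = create_cm(fail);

    if (is_err(id))
        return (int)ptr_err(id);    /* priv->cm_id is left untouched */
    priv->cm_id = id;
    return 0;
}
```

    Later code can then test priv->cm_id against NULL alone.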

    Finally, make a few code changes here to improve coding consistency.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Roland Dreier

    Jack Morgenstein
     

18 Jul, 2011

1 commit


16 Jul, 2011

1 commit

  • Instead of having firmware command functions return an error and also
    a status, leading to code like:

    err = mthca_FW_COMMAND(..., &status);
    if (err)
        goto out;
    if (status) {
        err = -E...;
        goto out;
    }

    all over the place, just handle the FW status inside the FW command
    handling code (the way mlx4 does it), so we can simply write:

    err = mthca_FW_COMMAND(...);
    if (err)
        goto out;
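
    One way to fold the status into the return value is a translation table
    inside the command path. The sketch below is a userspace model under
    stated assumptions: the status codes, the table contents and the names
    fw_post_model() and fw_command_model() are hypothetical and do not
    reproduce the mthca implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical firmware status codes. */
enum { FW_STATUS_OK = 0, FW_STATUS_BAD_PARAM = 1, FW_STATUS_BUSY = 2 };

/* Status-to-errno translation kept in one place. */
static const int trans_table_model[] = {
    [FW_STATUS_OK]        = 0,
    [FW_STATUS_BAD_PARAM] = -EINVAL,
    [FW_STATUS_BUSY]      = -EBUSY,
};

/* Fake transport: a negative op models a command that never completes. */
static int fw_post_model(int op, unsigned char *status)
{
    if (op < 0)
        return -ETIMEDOUT;
    *status = (unsigned char)op;    /* pretend firmware echoes op as status */
    return 0;
}

/* Single entry point: callers only ever see one error code. */
static int fw_command_model(int op)
{
    unsigned char status = 0;
    int err = fw_post_model(op, &status);

    if (err)
        return err;
    if (status >= sizeof(trans_table_model) / sizeof(trans_table_model[0]))
        return -EIO;                /* unknown status */
    return trans_table_model[status];
}
```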

    In addition to simplifying the source code, this also saves a healthy
    chunk of text:

    add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847)
    function old new delta
    static.trans_table 324 584 +260
    mthca_cmd_poll 352 477 +125
    mthca_cmd_wait 511 567 +56
    mthca_table_put 213 240 +27
    mthca_cleanup_db_tab 372 387 +15
    __mthca_remove_one 314 323 +9
    mthca_cleanup_user_db_tab 275 283 +8
    __mthca_init_one 1738 1746 +8
    mthca_cleanup 20 21 +1
    mthca_MAD_IFC 1081 1082 +1
    mthca_MGID_HASH 43 40 -3
    mthca_MAP_ICM_AUX 23 20 -3
    mthca_MAP_ICM 19 16 -3
    mthca_MAP_FA 23 20 -3
    mthca_READ_MGM 43 38 -5
    mthca_QUERY_SRQ 43 38 -5
    mthca_QUERY_QP 59 54 -5
    mthca_HW2SW_SRQ 43 38 -5
    mthca_HW2SW_MPT 60 55 -5
    mthca_HW2SW_EQ 43 38 -5
    mthca_HW2SW_CQ 43 38 -5
    mthca_free_icm_table 120 114 -6
    mthca_query_srq 214 206 -8
    mthca_free_qp 662 654 -8
    mthca_cmd 38 28 -10
    mthca_alloc_db 1321 1311 -10
    mthca_setup_hca 1067 1055 -12
    mthca_WRITE_MTT 35 22 -13
    mthca_WRITE_MGM 40 27 -13
    mthca_UNMAP_ICM_AUX 36 23 -13
    mthca_UNMAP_FA 36 23 -13
    mthca_SYS_DIS 36 23 -13
    mthca_SYNC_TPT 36 23 -13
    mthca_SW2HW_SRQ 35 22 -13
    mthca_SW2HW_MPT 35 22 -13
    mthca_SW2HW_EQ 35 22 -13
    mthca_SW2HW_CQ 35 22 -13
    mthca_RUN_FW 36 23 -13
    mthca_DISABLE_LAM 36 23 -13
    mthca_CLOSE_IB 36 23 -13
    mthca_CLOSE_HCA 38 25 -13
    mthca_ARM_SRQ 39 26 -13
    mthca_free_icms 178 164 -14
    mthca_QUERY_DDR 389 375 -14
    mthca_resize_cq 1063 1048 -15
    mthca_unmap_eq_icm 123 107 -16
    mthca_map_eq_icm 396 380 -16
    mthca_cmd_box 90 74 -16
    mthca_SET_IB 433 417 -16
    mthca_RESIZE_CQ 369 353 -16
    mthca_MAP_ICM_page 240 224 -16
    mthca_MAP_EQ 183 167 -16
    mthca_INIT_IB 473 457 -16
    mthca_INIT_HCA 745 729 -16
    mthca_map_user_db 816 798 -18
    mthca_SYS_EN 157 139 -18
    mthca_cleanup_qp_table 78 59 -19
    mthca_cleanup_eq_table 168 149 -19
    mthca_UNMAP_ICM 143 121 -22
    mthca_modify_srq 172 149 -23
    mthca_unmap_fmr 198 174 -24
    mthca_query_qp 814 790 -24
    mthca_query_pkey 343 319 -24
    mthca_SET_ICM_SIZE 34 10 -24
    mthca_QUERY_DEV_LIM 1870 1846 -24
    mthca_map_cmd 1130 1105 -25
    mthca_ENABLE_LAM 401 375 -26
    mthca_modify_port 247 220 -27
    mthca_query_device 884 850 -34
    mthca_NOP 75 41 -34
    mthca_table_get 287 249 -38
    mthca_init_qp_table 333 293 -40
    mthca_MODIFY_QP 348 308 -40
    mthca_close_hca 131 89 -42
    mthca_free_eq 435 390 -45
    mthca_query_port 755 705 -50
    mthca_free_cq 581 528 -53
    mthca_alloc_icm_table 578 524 -54
    mthca_multicast_attach 1041 986 -55
    mthca_init_hca 326 271 -55
    mthca_query_gid 487 431 -56
    mthca_free_srq 524 468 -56
    mthca_free_mr 168 111 -57
    mthca_create_eq 1560 1501 -59
    mthca_multicast_detach 790 728 -62
    mthca_write_mtt 918 854 -64
    mthca_register_device 1406 1342 -64
    mthca_fmr_alloc 947 883 -64
    mthca_mr_alloc 652 582 -70
    mthca_process_mad 1242 1164 -78
    mthca_dev_lim 910 830 -80
    find_mgm 482 400 -82
    mthca_modify_qp 3852 3753 -99
    mthca_init_cq 1281 1181 -100
    mthca_alloc_srq 1719 1610 -109
    mthca_init_eq_table 1807 1679 -128
    mthca_init_tavor 761 491 -270
    mthca_init_arbel 2617 2098 -519

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     

14 Jul, 2011

2 commits

  • Conflicts:
    net/bluetooth/l2cap_core.c

    David S. Miller
     
  • SCSI scanning of a channel:id:lun triplet in Linux works as follows
    (function scsi_scan_target() in drivers/scsi/scsi_scan.c):

    - If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target
    and process the result.

    - If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN
    corresponding to the specified channel:id:lun triplet to verify
    whether the LUN exists.

    So a SCSI driver must either take the channel and target id values into
    account in its queuecommand() function or it should declare that it only
    supports one channel and one target id.

    Currently the ib_srp driver does neither. As a result scanning the
    SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI
    devices to be created. For each 0:0:L device, several duplicates are
    created with the same LUN number and with (C:I) != (0:0). Fix this by
    declaring that the ib_srp driver only supports one channel and one
    target id.

    Signed-off-by: Bart Van Assche
    Cc:
    Acked-by: David Dillow
    Signed-off-by: Roland Dreier

    Bart Van Assche
     

06 Jul, 2011

1 commit


05 Jul, 2011

1 commit

  • Commits 71c29bd5c235 ("IB/uverbs: Add devnode method to set path/mode")
    and c3af0980ce01 ("IB: Add devnode methods to cm_class and umad_class")
    added devnode methods that set the mode.

    However, these methods don't check for a NULL mode, and so we get a
    crash when unloading modules because devtmpfs_delete_node() calls
    device_get_devnode() with mode == NULL.

    Add the missing checks.
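
    The shape of such a check, sketched in userspace: devnode_model() is a
    hypothetical stand-in for a devnode method, and the "infiniband/" path
    prefix and 0666 mode are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Returns a newly allocated node path (caller frees) and, only if the
 * caller asked for it, the mode. mode may legitimately be NULL.
 */
static char *devnode_model(const char *name, mode_t *mode)
{
    size_t len = strlen("infiniband/") + strlen(name) + 1;
    char *path = malloc(len);

    if (mode)                       /* the missing check */
        *mode = 0666;
    if (path)
        snprintf(path, len, "infiniband/%s", name);
    return path;
}
```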

    Signed-off-by: Goldwyn Rodrigues
    [ Also fix cm.c. - Roland ]
    Signed-off-by: Roland Dreier
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     

18 Jun, 2011

5 commits