09 Jan, 2012

2 commits

  • infiniband changes for 3.3 merge window

    * tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
    rdma/core: Fix sparse warnings
    RDMA/cma: Fix endianness bugs
    RDMA/nes: Fix terminate during AE
    RDMA/nes: Make unnecessarily global nes_set_pau() static
    RDMA/nes: Change MDIO bus clock to 2.5MHz
    IB/cm: Fix layout of APR message
    IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE
    IB/qib: Default some module parameters optimally
    IB/qib: Optimize locking for get_txreq()
    IB/qib: Fix a possible data corruption when receiving packets
    IB/qib: Eliminate 64-bit jiffies use
    IB/qib: Fix style issues
    IB/uverbs: Protect QP multicast list

    Linus Torvalds
     
  • * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
    reiserfs: Properly display mount options in /proc/mounts
    vfs: prevent remount read-only if pending removes
    vfs: count unlinked inodes
    vfs: protect remounting superblock read-only
    vfs: keep list of mounts for each superblock
    vfs: switch ->show_options() to struct dentry *
    vfs: switch ->show_path() to struct dentry *
    vfs: switch ->show_devname() to struct dentry *
    vfs: switch ->show_stats to struct dentry *
    switch security_path_chmod() to struct path *
    vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
    vfs: trim includes a bit
    switch mnt_namespace ->root to struct mount
    vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
    vfs: opencode mntget() mnt_set_mountpoint()
    vfs: spread struct mount - remaining argument of next_mnt()
    vfs: move fsnotify junk to struct mount
    vfs: move mnt_devname
    vfs: move mnt_list to struct mount
    vfs: switch pnode.h macros to struct mount *
    ...

    Linus Torvalds
     

05 Jan, 2012

3 commits


04 Jan, 2012

3 commits

  • Add a missing 16-bit reserved field between ap_status and info fields.

    Signed-off-by: Eli Cohen
    Acked-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Eli Cohen
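
The effect of the missing field can be sketched in user-space C by comparing where `info` lands with and without the 16-bit reserved field. The struct names, the `APR_INFO_LEN` value, and the helper function are hypothetical and only model the two fields the message names; they are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the info field, chosen only for illustration. */
#define APR_INFO_LEN 72

/* Layout without the reserved field: info directly follows ap_status. */
struct apr_tail_before {
    uint8_t ap_status;
    uint8_t info[APR_INFO_LEN];
};

/*
 * Layout with the reserved field. The 16 bits are written as two bytes so
 * that the compiler inserts no alignment padding in this sketch.
 */
struct apr_tail_after {
    uint8_t ap_status;
    uint8_t reserved[2];
    uint8_t info[APR_INFO_LEN];
};

/* Number of bytes by which info moves once the reserved field is present. */
size_t apr_info_offset_delta(void)
{
    size_t before = offsetof(struct apr_tail_before, info);
    size_t after = offsetof(struct apr_tail_after, info);

    assert(after >= before);
    return after - before;
}
```

In this model every byte of `info` is read two bytes too early when the reserved field is left out, which is the kind of misparse a wire-format layout fix addresses.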
     
  • Userspace verbs multicast attach/detach operations on a QP are done
    while holding the rwsem of the QP for reading. That's not sufficient
    since a reader lock allows more than one reader to acquire the
    lock. However, multicast attach/detach does list manipulation that
    can corrupt the list if multiple threads run in parallel.

    Fix this by acquiring the rwsem as a writer to serialize attach/detach
    operations. Add idr_write_qp() and put_qp_write() to encapsulate
    this.

    This fixes oops seen when running applications that perform multicast
    joins/leaves.

    Reported-by: Mike Dubman
    Signed-off-by: Eli Cohen
    Cc:
    Signed-off-by: Roland Dreier

    Eli Cohen
     
  • Both callers of device_get_devnode() are only interested in the lower 16 bits,
    and nobody tries to return anything wider than 16 bits anyway.

    Signed-off-by: Al Viro

    Al Viro
     

24 Dec, 2011

1 commit


20 Dec, 2011

1 commit

  • private_data_len is defined as a u8. If the user specifies a large
    private_data size (> 220 bytes), we will calculate a total length that
    exceeds 255, resulting in private_data_len wrapping back to 0. This
    can lead to overwriting random kernel memory. Avoid this by verifying
    that the resulting size fits into a u8.

    Reported-by: B. Thery
    Addresses:
    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
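
The wraparound and the check can be illustrated with a small user-space sketch. `HDR_LEN` is an assumed header size picked for the example, and the function names are hypothetical; none of this is taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the header that precedes the user's private data. */
#define HDR_LEN 36u

/* Buggy pattern: the sum is silently truncated when narrowed to a u8. */
uint8_t total_len_unchecked(size_t user_len)
{
    return (uint8_t)(HDR_LEN + user_len);
}

/*
 * Fixed pattern: compute the total in a wide type and refuse any value
 * that does not fit into a u8. Returns true and stores the length on
 * success, false if the total would overflow.
 */
bool total_len_checked(size_t user_len, uint8_t *out)
{
    size_t total = HDR_LEN + user_len;

    assert(out != NULL);
    if (total < user_len || total > UINT8_MAX)
        return false;
    *out = (uint8_t)total;
    return true;
}
```

With the assumed 36-byte header, a 220-byte request gives a total of 256, which the unchecked version stores as 0. The checked version rejects it and still accepts 219 bytes (total 255).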
     

06 Dec, 2011

2 commits


03 Dec, 2011

1 commit


30 Nov, 2011

1 commit

  • Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()")
    forgot to take care of infiniband uses of dst neighbours.

    Many thanks to Marc Aurele who provided a nice bug report and feedback.

    Reported-by: Marc Aurele La France
    Signed-off-by: Eric Dumazet
    Cc: David Miller
    Cc:
    Signed-off-by: Roland Dreier

    Eric Dumazet
     

23 Nov, 2011

1 commit


07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

02 Nov, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
    mlx4_core: Deprecate log_num_vlan module param
    IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
    IB/mlx4: Enable 4K mtu for IBoE
    RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
    RDMA/cxgb4: Serialize calls to CQ's comp_handler
    RDMA/cxgb3: Serialize calls to CQ's comp_handler
    IB/qib: Fix issue with link states and QSFP cables
    IB/mlx4: Configure extended active speeds
    mlx4_core: Add extended port capabilities support
    IB/qib: Hold links until tuning data is available
    IB/qib: Clean up checkpatch issue
    IB/qib: Remove s_lock around header validation
    IB/qib: Precompute timeout jiffies to optimize latency
    IB/qib: Use RCU for qpn lookup
    IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
    IB/qib: Decode path MTU optimization
    IB/qib: Optimize RC/UC code by IB operation
    IPoIB: Use the right function to do DMA unmap pages
    RDMA/cxgb4: Use correct QID in insert_recv_cqe()
    RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
    ...

    Linus Torvalds
     
  • …sc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next

    Roland Dreier
     

01 Nov, 2011

4 commits

  • Some kernel components (infiniband and perf) pin user space memory by
    increasing the page count and account that memory as "mlocked".

    The difference between mlocking and pinning is:

    A. mlocked pages are marked with PG_mlocked and are exempt from
    swapping. Page migration may move them around though.
    They are kept on a special LRU list.

    B. Pinned pages cannot be moved because something needs to
    directly access physical memory. They may not be on any
    LRU list.

    I recently saw an mlockalled process where mm->locked_vm became
    bigger than the virtual size of the process (!) because some
    memory was accounted for twice:

    Once when the page was mlocked and once when the Infiniband
    layer increased the refcount because it needed to pin the RDMA
    memory.

    This patch introduces a separate counter for pinned pages and
    accounts them separately.

    Signed-off-by: Christoph Lameter
    Cc: Mike Marciniszyn
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
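
The double accounting can be modelled with plain counters. This is a user-space sketch: `struct mm_counters`, the `pinned_vm` and `total_vm` field names, and the helper functions are assumptions made for illustration. Only `locked_vm` is named in the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-process accounting, in pages. */
struct mm_counters {
    unsigned long total_vm;   /* virtual size of the process */
    unsigned long locked_vm;  /* pages accounted as mlocked */
    unsigned long pinned_vm;  /* pages accounted as pinned (assumed name) */
};

/* mlock accounting is the same in both schemes. */
void mlock_pages(struct mm_counters *mm, unsigned long npages)
{
    assert(mm != NULL);
    mm->locked_vm += npages;
}

/* Old scheme: pinning is charged to locked_vm as well. */
void pin_pages_old(struct mm_counters *mm, unsigned long npages)
{
    assert(mm != NULL);
    mm->locked_vm += npages;
}

/* New scheme: pinning is charged to its own counter. */
void pin_pages_new(struct mm_counters *mm, unsigned long npages)
{
    assert(mm != NULL);
    mm->pinned_vm += npages;
}
```

For a 100-page process that has mlocked everything and then pins 40 of those pages, the old scheme reports `locked_vm` as 140, more than the process size. The new scheme reports 100 locked and 40 pinned.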
     
  • These were getting it implicitly via device.h --> module.h but
    we are going to stop that when we clean up the headers.

    Fix these in advance so the tree remains bisect-clean.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • They had been getting it implicitly via device.h but we can't
    rely on that for the future, due to a pending cleanup so fix
    it now.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • They get it via module.h (via device.h) but we want to clean that up.
    When we do, we'll get things like:

    CC [M] drivers/infiniband/core/sysfs.o
    sysfs.c:361: error: 'S_IRUGO' undeclared here (not in a function)
    sysfs.c:654: error: 'S_IWUSR' undeclared here (not in a function)

    so add in the stat header it is using explicitly in advance.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

14 Oct, 2011

16 commits

  • Allow processes that share the same XRC domain to open an existing
    shareable QP. This permits those processes to receive events on the
    shared QP and transfer ownership, so that any process may modify the
    QP. The latter allows the creating process to exit, while a remaining
    process can still transition it for path migration purposes.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC TGT QPs are shared resources among multiple processes. Since the
    creating process may exit, allow other processes which share the same
    XRC domain to open an existing QP. This allows us to transfer
    ownership of an XRC TGT QP to another process.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Because an XRC TGT QP can end up being shared among multiple
    processes, don't have the ib_cm automatically send a DREQ when the
    userspace process that owns the ib_cm_id exits. Disconnect can be
    initiated by the user directly; otherwise, the owner of the XRC INI QP
    controls the connection.

    Note that as a result of the process exiting, the ib_cm will stop
    tracking the XRC connection on the target side. For the purposes of
    disconnecting, this isn't a big deal. The ib_cm will respond to the
    DREQ appropriately. For other messages, mainly LAP, the CM will
    reject the request, since there's no one available to route the
    request to.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow users to connect XRC QPs through the rdma_cm.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow the user to indicate the QP type separately from the port space
    when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a
    1:1 relationship between the QP type and port space, so we need to
    switch on the QP type to select between UD and connected QPs.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Add RDMA_PS_IB. XRC QP types will use the IB port space when operating
    over the RDMA CM. For the 'IP protocol' field value, we select 0x3F,
    which is listed as being for 'any local network'.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • The XRC annex was updated to have XRC behave more like RD. Specifically,
    the XRC TGT QPN moves from the local QPN to local EECN field. Lookup of
    SRQN is done using the REQ/REP protocol.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Update the REQ and REP messages to support XRC connection setup
    according to the XRC Annex. Several existing fields must be set to 0 or
    1 when connecting XRC QPs, and a reserved field is changed to an
    extended transport type.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow user space to operate on XRC TGT QPs the same way as other types
    of QPs, with one notable exception: since XRC TGT QPs may be shared
    among multiple processes, the XRC TGT QP is allowed to exist beyond the
    lifetime of the creating process.

    The process that creates the QP is allowed to destroy it, but if the
    process exits without destroying the QP, then the QP will be left bound
    to the lifetime of the XRCD.

    TGT QPs are not associated with CQs or a PD.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC INI QPs are similar to send only RC QPs. Allow user space to create
    INI QPs. Note that INI QPs do not require receive CQs.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • We require more information to create XRC SRQs than we can
    exchange using the existing create SRQ ABI. Provide an enhanced create
    ABI for extended SRQ types.

    Based on patches by Jack Morgenstein
    and Roland Dreier

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow user space to create XRC domains. Because XRCDs are expected to
    be shared among multiple processes, we use inodes to identify an XRCD.

    Based on patches by Jack Morgenstein

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC TGT QPs are intended to be shared among multiple users and
    processes. Allow the destruction of an XRC TGT QP to be done explicitly
    through ib_destroy_qp() or when the XRCD is destroyed.

    To support destroying an XRC TGT QP, we need to track TGT QPs with the
    XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also
    cleaned up.

    To avoid stale reference issues, if a user is holding a reference on a
    TGT QP, we increment a reference count on the QP. The user releases the
    reference by calling ib_release_qp. This releases any access to the QP
    from a user above verbs, but allows the QP to continue to exist until
    destroyed by the XRCD.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC ("eXtended reliable connected") is an IB transport that provides
    better scalability by allowing senders to specify which shared receive
    queue (SRQ) should be used to receive a message, which essentially
    allows one transport context (QP connection) to serve multiple
    destinations (as long as they share an adapter, of course).

    XRC communication is between an initiator (INI) QP and a target (TGT)
    QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP
    behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC
    QPs, except that work requests posted to an XRC INI QP must specify the
    remote SRQ that is the target of the work request.

    We define two new QP types for XRC, to distinguish between INI and TGT
    QPs, and update the core layer to support XRC QPs.

    This patch is derived from work by Jack Morgenstein

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC ("eXtended reliable connected") is an IB transport that provides
    better scalability by allowing senders to specify which shared receive
    queue (SRQ) should be used to receive a message, which essentially
    allows one transport context (QP connection) to serve multiple
    destinations (as long as they share an adapter, of course).

    XRC defines SRQs that are specifically used by XRC connections. Expand
    the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to
    only XRC use according to the IB XRC Annex.

    Portions of this patch were derived from work by
    Jack Morgenstein.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Currently, there is only a single ("basic") type of SRQ, but with XRC
    support we will add a second. Prepare for this by defining an SRQ type
    and setting all current users to IB_SRQT_BASIC.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     

13 Oct, 2011

1 commit

  • XRC ("eXtended reliable connected") is an IB transport that provides
    better scalability by allowing senders to specify which shared receive
    queue (SRQ) should be used to receive a message, which essentially
    allows one transport context (QP connection) to serve multiple
    destinations (as long as they share an adapter, of course).

    A few new concepts are introduced to support this. This patch adds:

    - A new device capability flag, IB_DEVICE_XRC, which low-level
    drivers set to indicate that a device supports XRC.
    - A new object type, XRC domains (struct ib_xrcd), and new verbs
    ib_alloc_xrcd()/ib_dealloc_xrcd(). XRCDs are used to limit which
    XRC SRQs an incoming message can target.

    This patch is derived from work by Jack Morgenstein.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     

12 Oct, 2011

1 commit

  • Introduce support for the following extended speeds:

    FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
    64b/66b encoding rather than 8b/10b encoding.
    FDR: IBA extended speed 14.0625 Gbps.
    EDR: IBA extended speed 25.78125 Gbps.

    Signed-off-by: Marcel Apfelbaum
    Reviewed-by: Hal Rosenstock
    Reviewed-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Marcel Apfelbaum
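
The encoding remark for FDR-10 can be made concrete with a little arithmetic: a line code that carries 64 payload bits in every 66 bits on the wire loses far less bandwidth than one that carries 8 in every 10. The helper below is a hypothetical sketch, not driver code, and the 10 Gbps 8b/10b lane used for comparison is an illustrative assumption rather than a figure from the commit.

```c
#include <assert.h>

/*
 * Effective per-lane data rate in Gbps, given the signalling rate and a
 * line code that carries payload_bits of data in every coded_bits sent.
 */
double lane_data_rate_gbps(double signal_gbps,
                           unsigned int payload_bits,
                           unsigned int coded_bits)
{
    assert(coded_bits != 0 && payload_bits <= coded_bits);
    return signal_gbps * (double)payload_bits / (double)coded_bits;
}
```

With the FDR-10 figures from the message, `lane_data_rate_gbps(10.3125, 64, 66)` evaluates to 10 Gbps of payload per lane, whereas a 10 Gbps lane using 8b/10b would carry 8 Gbps.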