07 Oct, 2012

5 commits

  • set netlink_dump_control.module to avoid panic.

    Signed-off-by: Gao feng
    Cc: Roland Dreier
    Cc: Sean Hefty
    Signed-off-by: David S. Miller

    Gao feng
     
  • I get a panic when I use ss -a and rmmod inet_diag at the
    same time.

    It's because netlink_dump uses inet_diag_dump which belongs to module
    inet_diag.

    I searched the code and found that many modules have the same problem.
    We need to take a reference on the module that cb->dump belongs to.

    Thanks for all the help from Stephen, Jan, Eric, Steffen and Pablo.

    Change From v3:
    change netlink_dump_start to inline, as suggested by Pablo and
    Eric.

    Change From v2:
    delete netlink_dump_done, and call module_put in netlink_dump
    and netlink_sock_destruct.
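
    The idea behind the fix can be modeled in user space. The sketch below
    is not the kernel patch: sketch_module, sketch_try_get() and the other
    names are hypothetical stand-ins for struct module, try_module_get()
    and module_put(). It only illustrates why pinning the owner of a dump
    callback for the duration of the dump keeps that module from being
    removed in the middle of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct module. */
struct sketch_module {
    int refcount;    /* number of users currently pinning the module */
    bool unloading;  /* set once removal has begun */
};

/* Hypothetical stand-in for a dump control block with an owner field. */
struct sketch_dump_control {
    int (*dump)(void *ctx);
    struct sketch_module *module;  /* owner of ->dump, NULL for built-in code */
};

/* Take a reference unless the module is already going away. */
bool sketch_try_get(struct sketch_module *m)
{
    if (m == NULL)
        return true;  /* built-in code can never be unloaded */
    if (m->unloading)
        return false;
    m->refcount++;
    return true;
}

/* Drop a reference previously taken with sketch_try_get(). */
void sketch_put(struct sketch_module *m)
{
    if (m != NULL) {
        assert(m->refcount > 0);
        m->refcount--;
    }
}

/* Removal is refused while any dump still holds a reference. */
bool sketch_try_unload(struct sketch_module *m)
{
    if (m->refcount > 0)
        return false;
    m->unloading = true;
    return true;
}

/* Pin the callback's owner before the dump starts; -1 means "refused". */
int sketch_dump_start(struct sketch_dump_control *c)
{
    return sketch_try_get(c->module) ? 0 : -1;
}

/* Drop the reference when the dump finishes or its socket is destroyed. */
void sketch_dump_done(struct sketch_dump_control *c)
{
    sketch_put(c->module);
}
```

    In this model an unload attempt made between sketch_dump_start() and
    sketch_dump_done() fails, which is the behaviour the panic report
    above was missing.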

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Pull UAPI disintegration fixes from David Howells:
    "There are three main parts:

    (1) I found I needed some more fixups in the wake of testing Arm64
    (some asm/unistd.h files had weird guards that caused problems -
    mostly in arches for which I don't have a compiler) and some
    __KERNEL__ splitting needed to take place in Arm64.

    (2) I found that c6x was missing some __KERNEL__ guards in its
    asm/signal.h. Mark Salter pointed me at a tree with a patch to
    remove that file entirely and use the asm-generic variant instead.

    (3) Lastly, m68k turned out to have a header installation problem due
    to it lacking a kvm_para.h file.

    The conditional installation bits for linux/kvm_para.h, linux/kvm.h
    and linux/a.out.h weren't very well specified - and didn't work if
    an arch didn't have the asm/ version of that file, but there *was*
    an asm-generic/ version.

    It seems the "ifneq ($(wildcard ...),)" for each of those three
    headers in include/kernel/Kbuild is invoked twice during header
    installation, and the second time it matches on the just installed
    asm-generic/kvm_para.h file and thus incorrectly installs
    linux/kvm_para.h as well.

    Most arches actually have an asm/kvm_para.h, so this wasn't
    detectable in those."

    * 'uapi-prep' of git://git.infradead.org/users/dhowells/linux-headers:
    UAPI: Fix conditional header installation handling (notably kvm_para.h on m68k)
    c6x: remove c6x signal.h
    UAPI: Split compound conditionals containing __KERNEL__ in Arm64
    UAPI: Fix the guards on various asm/unistd.h files
    c6x: make dsk6455 the default config

    Linus Torvalds
     
  • Pull SLAB changes from Pekka Enberg:
    "New and noteworthy:

    * More SLAB allocator unification patches from Christoph Lameter and
    others. This paves the way for slab memcg patches that hopefully
    will land in v3.8.

    * SLAB tracing improvements from Ezequiel Garcia.

    * Kernel tainting upon SLAB corruption from Dave Jones.

    * Miscellaneous SLAB allocator bug fixes and improvements from various
    people."

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (43 commits)
    slab: Fix build failure in __kmem_cache_create()
    slub: init_kmem_cache_cpus() and put_cpu_partial() can be static
    mm/slab: Fix kmem_cache_alloc_node_trace() declaration
    Revert "mm/slab: Fix kmem_cache_alloc_node_trace() declaration"
    mm, slob: fix build breakage in __kmalloc_node_track_caller
    mm/slab: Fix kmem_cache_alloc_node_trace() declaration
    mm/slab: Fix typo _RET_IP -> _RET_IP_
    mm, slub: Rename slab_alloc() -> slab_alloc_node() to match SLAB
    mm, slab: Rename __cache_alloc() -> slab_alloc()
    mm, slab: Match SLAB and SLUB kmem_cache_alloc_xxx_trace() prototype
    mm, slab: Replace 'caller' type, void* -> unsigned long
    mm, slob: Add support for kmalloc_track_caller()
    mm, slab: Remove silly function slab_buffer_size()
    mm, slob: Use NUMA_NO_NODE instead of -1
    mm, sl[au]b: Taint kernel when we detect a corrupted slab
    slab: Only define slab_error for DEBUG
    slab: fix the DEADLOCK issue on l3 alien lock
    slub: Zero initial memory segment for kmem_cache and kmem_cache_node
    Revert "mm/sl[aou]b: Move sysfs_slab_add to common"
    mm/sl[aou]b: Move kmem_cache refcounting to common code
    ...

    Linus Torvalds
     
  • Pull ARM Xen support from Konrad Rzeszutek Wilk:

    Features:
    * Allow a Linux guest to boot as initial domain and as normal guests
    on Xen on ARM (specifically ARMv7 with virtualization extensions). PV
    console, block and network frontend/backends are working.
    Bug-fixes:
    * Fix compile linux-next fallout.
    * Fix PVHVM bootup crashing.

    The Xen-unstable hypervisor (which will become 4.3 in about six months)
    supports ARMv7 platforms.

    The goal in implementing this architecture is to exploit the hardware
    as much as possible. That means using as few PV operations as possible
    (so no PV MMU) and using existing PV drivers for I/O (network, block,
    console, etc). This is similar to how PVHVM guests operate on x86
    platforms nowadays, except that on ARM there is no need for QEMU. The
    end result is that we share a lot of the generic Xen drivers and
    infrastructure.

    Details on how to compile/boot/etc are available at this Wiki:

    http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions

    and this blog has links to a technical discussion/presentations on the
    overall architecture:

    http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/

    * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits)
    xen/xen_initial_domain: check that xen_start_info is initialized
    xen: mark xen_init_IRQ __init
    xen/Makefile: fix dom-y build
    arm: introduce a DTS for Xen unprivileged virtual machines
    MAINTAINERS: add myself as Xen ARM maintainer
    xen/arm: compile netback
    xen/arm: compile blkfront and blkback
    xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
    xen/arm: receive Xen events on ARM
    xen/arm: initialize grant_table on ARM
    xen/arm: get privilege status
    xen/arm: introduce CONFIG_XEN on ARM
    xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
    xen/arm: Introduce xen_ulong_t for unsigned long
    xen/arm: Xen detection and shared_info page mapping
    docs: Xen ARM DT bindings
    xen/arm: empty implementation of grant_table arch specific functions
    xen/arm: sync_bitops
    xen/arm: page.h definitions
    xen/arm: hypercalls
    ...

    Linus Torvalds
     

06 Oct, 2012

35 commits

  • Pull powerpc updates from Benjamin Herrenschmidt:
    "Some highlights in addition to the usual batch of fixes:

    - 64TB address space support for 64-bit processes by Aneesh Kumar

    - Gavin Shan did a major cleanup & re-organization of our EEH support
    code (IBM fancy PCI error handling & recovery infrastructure) which
    paves the way for supporting different platform backends, along
    with some rework of the PCIe code for the PowerNV platform in order
    to remove home made resource allocations and instead use the
    generic code (which is possible after some small improvements to it
    done by Gavin).

    - Uprobes support by Ananth N Mavinakayanahalli

    - A pile of embedded updates from Freescale folks, including new SoC
    and board supports, more KVM stuff including preparing for 64-bit
    BookE KVM support, ePAPR 1.1 updates, etc..."

    Fixup trivial conflicts in drivers/scsi/ipr.c

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
    powerpc/iommu: Fix multiple issues with IOMMU pools code
    powerpc: Fix VMX fix for memcpy case
    driver/mtd:IFC NAND:Initialise internal SRAM before any write
    powerpc/fsl-pci: use 'Header Type' to identify PCIE mode
    powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get
    powerpc: Remove tlb batching hack for nighthawk
    powerpc: Set paca->data_offset = 0 for boot cpu
    powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
    powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled
    powerpc/mpc85xx: Update interrupt handling for IFC controller
    powerpc/85xx: Enable USB support in p1023rds_defconfig
    powerpc/smp: Do not disable IPI interrupts during suspend
    powerpc/eeh: Fix crash on converting OF node to edev
    powerpc/eeh: Lock module while handling EEH event
    powerpc/kprobe: Don't emulate store when kprobe stwu r1
    powerpc/kprobe: Complete kprobe and migrate exception frame
    powerpc/kprobe: Introduce a new thread flag
    powerpc: Remove unused __get_user64() and __put_user64()
    powerpc/eeh: Global mutex to protect PE tree
    powerpc/eeh: Remove EEH PE for normal PCI hotplug
    ...

    Linus Torvalds
     
  • Pull networking changes from David Miller:
    "The most important bit in here is the fix for input route caching from
    Eric Dumazet, it's a shame we couldn't fully analyze this in time for
    3.6 as it's a 3.6 regression introduced by the routing cache removal.

    Anyways, will send quickly to -stable after you pull this in.

    Other changes of note:

    1) Fix lockdep splats in team and bonding, from Eric Dumazet.

    2) IPV6 adds link local route even when there is no link local
    address, from Nicolas Dichtel.

    3) Fix ixgbe PTP implementation, from Jacob Keller.

    4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya.

    5) MAC length computed improperly in VLAN demux, from Antonio
    Quartulli."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
    ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
    Remove noisy printks from llcp_sock_connect
    tipc: prevent dropped connections due to rcvbuf overflow
    silence some noisy printks in irda
    team: set qdisc_tx_busylock to avoid LOCKDEP splat
    bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
    sctp: check src addr when processing SACK to update transport state
    sctp: fix a typo in prototype of __sctp_rcv_lookup()
    ipv4: add a fib_type to fib_info
    can: mpc5xxx_can: fix section type conflict
    can: peak_pcmcia: fix error return code
    can: peak_pci: fix error return code
    cxgb4: Fix build error due to missing linux/vmalloc.h include.
    bnx2x: fix ring size for 10G functions
    cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params()
    ixgbe: add support for X540-AT1
    ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit
    ixgbe: fix PTP ethtool timestamping function
    ixgbe: (PTP) Fix PPS interrupt code
    ixgbe: Fix PTP X540 SDP alignment code for PPS signal
    ...

    Linus Torvalds
     
  • Merge misc patches from Andrew Morton:
    "The MM tree is rather stuck while I wait to find out what the heck is
    happening with sched/numa. Probably I'll need to route around all the
    code which was added to -next, sigh.

    So this is "everything else", or at least most of it - other small
    bits are still awaiting resolutions of various kinds."

    * emailed patches from Andrew Morton: (180 commits)
    lib/decompress.c add __init to decompress_method and data
    kernel/resource.c: fix stack overflow in __reserve_region_with_split()
    omfs: convert to use beXX_add_cpu()
    taskstats: cgroupstats_user_cmd() may leak on error
    aoe: update aoe-internal version number to 50
    aoe: update documentation to better reflect aoe-plus-udev usage
    aoe: remove unused code
    aoe: make dynamic block minor numbers the default
    aoe: update and specify AoE address guards and error messages
    aoe: retain static block device numbers for backwards compatibility
    aoe: support more AoE addresses with dynamic block device minor numbers
    aoe: update documentation with new URL and VM settings reference
    aoe: update copyright year in touched files
    aoe: update internal version number to 49
    aoe: remove unused code and add cosmetic improvements
    aoe: increase net_device reference count while using it
    aoe: associate frames with the AoE storage target
    aoe: disallow unsupported AoE minor addresses
    aoe: do revalidation steps in order
    aoe: failover remote interface based on aoe_deadsecs parameter
    ...

    Linus Torvalds
     
  • Fix the warning:

    WARNING: vmlinux.o(.text+0x14cfd8): Section mismatch in reference from the variable compressed_formats to the function .init.text:gunzip()
    The function compressed_formats() references
    the function __init gunzip().
    etc..

    Within decompress.c, compressed_formats[] needs an __initdata annotation,
    because some of its data members refer to functions which will be
    unloaded after init.

    Consequently, its user decompress_method() will get the __init prefix.

    Signed-off-by: Hein Tibosch
    Cc: Albin Tonnerre
    Cc: Phillip Lougher
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hein Tibosch
     
  • Using a recursive call to add a non-conflicting region in
    __reserve_region_with_split() could result in a stack overflow in the case
    that the recursive calls are too deep. Convert the recursive calls to an
    iterative loop to avoid the problem.

    Tested on a machine containing 135 regions. The kernel no longer panicked
    with stack overflow.

    Also tested with code arbitrarily adding regions with no conflict,
    embedding two consecutive conflicts and embedding two non-consecutive
    conflicts.
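
    The recursion-to-loop conversion can be illustrated with a simplified
    user-space model. This is a sketch, not the kernel code: sketch_range
    and both functions are hypothetical, and the busy regions are assumed
    to be sorted by start address and non-overlapping. Each pass reserves
    the gap in front of the first conflict and then continues after it,
    so the stack depth stays constant however many regions conflict.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_range {
    unsigned long start;
    unsigned long end;  /* inclusive */
};

/*
 * Return the lowest busy range overlapping [start, end], or NULL.
 * Assumes busy[] is sorted by start and its entries do not overlap.
 */
const struct sketch_range *
sketch_first_conflict(const struct sketch_range *busy, size_t n,
                      unsigned long start, unsigned long end)
{
    for (size_t i = 0; i < n; i++)
        if (busy[i].start <= end && busy[i].end >= start)
            return &busy[i];
    return NULL;
}

/*
 * Collect every free gap inside [start, end] into out[] with a loop
 * instead of recursing on the left and right halves of each conflict.
 * Returns the number of gaps written.
 */
size_t sketch_reserve_with_split(const struct sketch_range *busy, size_t n,
                                 unsigned long start, unsigned long end,
                                 struct sketch_range *out, size_t out_cap)
{
    size_t count = 0;

    while (start <= end) {
        const struct sketch_range *c =
            sketch_first_conflict(busy, n, start, end);

        if (c == NULL) {
            /* No conflict left: the whole remainder is free. */
            assert(count < out_cap);
            out[count].start = start;
            out[count].end = end;
            count++;
            break;
        }
        if (c->start > start) {
            /* Reserve the gap in front of the conflict. */
            assert(count < out_cap);
            out[count].start = start;
            out[count].end = c->start - 1;
            count++;
        }
        if (c->end >= end)
            break;           /* conflict covers the rest of the request */
        start = c->end + 1;  /* continue behind the conflict */
    }
    return count;
}
```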

    Signed-off-by: T Makphaibulchoke
    Reviewed-by: Ram Pai
    Cc: Paul Gortmaker
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    T Makphaibulchoke
     
  • Convert cpu_to_beXX(beXX_to_cpu(E1) + E2) to use beXX_add_cpu().
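
    As a rough user-space illustration of the pattern being replaced, the
    sketch below uses hypothetical sketch_* helpers (not the kernel's
    byteorder macros) to show the open-coded convert, add, convert back
    sequence folded into a single add-in-place helper for the 32-bit case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 32-bit value whose in-memory bytes are big-endian (hypothetical type). */
typedef uint32_t sketch_be32;

/* Store v so that its most significant byte comes first in memory. */
sketch_be32 sketch_cpu_to_be32(uint32_t v)
{
    unsigned char b[4];
    sketch_be32 out;

    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
    memcpy(&out, b, sizeof(out));
    return out;
}

/* Read a big-endian value back into host byte order. */
uint32_t sketch_be32_to_cpu(sketch_be32 v)
{
    unsigned char b[4];

    memcpy(b, &v, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Add val to a big-endian variable in place. */
void sketch_be32_add_cpu(sketch_be32 *var, uint32_t val)
{
    assert(var != NULL);
    *var = sketch_cpu_to_be32(sketch_be32_to_cpu(*var) + val);
}
```

    A caller then writes sketch_be32_add_cpu(&x, n) instead of spelling
    out x = sketch_cpu_to_be32(sketch_be32_to_cpu(x) + n).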

    dpatch engine is used to auto generate this patch.
    (https://github.com/weiyj/dpatch)

    Signed-off-by: Wei Yongjun
    Acked-by: Bob Copeland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yongjun
     
  • If prepare_reply() succeeds we have allocated memory for 'rep_skb'. If
    nla_reserve() then subsequently fails and returns NULL we fail to release
    the memory we allocated, thus causing a leak.
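
    The shape of such a fix is a single error path that releases the
    buffer. The following is a user-space sketch with hypothetical names
    (sketch_prepare_reply(), sketch_reserve() and a live-buffer counter
    added purely so the leak can be observed); it does not reproduce the
    taskstats code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Number of reply buffers currently allocated; non-zero at exit means a leak. */
int sketch_live_buffers;

struct sketch_reply {
    int len;
};

/* Allocate a reply buffer; returns 0 on success, -1 on failure. */
int sketch_prepare_reply(struct sketch_reply **out)
{
    struct sketch_reply *rep = malloc(sizeof(*rep));

    if (rep == NULL)
        return -1;
    rep->len = 0;
    sketch_live_buffers++;
    *out = rep;
    return 0;
}

void sketch_free_reply(struct sketch_reply *rep)
{
    free(rep);
    sketch_live_buffers--;
}

/* Stand-in for an attribute reservation that may fail and return NULL. */
void *sketch_reserve(struct sketch_reply *rep, bool fail)
{
    return fail ? NULL : rep;
}

int sketch_handle_cmd(bool reserve_fails)
{
    struct sketch_reply *rep;
    int rc = sketch_prepare_reply(&rep);

    if (rc != 0)
        return rc;

    if (sketch_reserve(rep, reserve_fails) == NULL) {
        rc = -1;
        goto err;  /* without this path the buffer would leak */
    }

    /* Success: in this model, sending the reply consumes the buffer. */
    sketch_free_reply(rep);
    return 0;

err:
    sketch_free_reply(rep);
    return rc;
}
```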

    Signed-off-by: Jesper Juhl
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Because udev use is so widespread, making the old static mapping the
    default is too conservative, given the severe limitations it places on
    usable AoE addresses. Storage virtualization and larger shelves have made
    the old limitations too confining.

    These changes make the dynamic block device minor numbers the default,
    removing the limitations on usable AoE addresses.

    The static arrangement is still available with aoe_dyndevs=0, and the
    aoe-stat tool from the userland aoetools package, the user space
    counterpart to the aoe driver, recognizes the case where there is a
    mismatch between the minor number in sysfs and the minor number in a
    special device file.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • In general, specific is better when it comes to messages about AoE usage
    problems. Also, explicit checks for the AoE broadcast addresses are
    added.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The old mapping between AoE target shelf and slot addresses and the block
    device minor number is retained as a backwards-compatible feature, with a
    new "aoe_dyndevs" module parameter available for enabling dynamic block
    device minor numbers.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The ATA over Ethernet protocol uses a major (shelf) and minor (slot)
    address to identify a particular storage target. These changes remove an
    artificial limitation the aoe driver imposes on the use of AoE addresses.
    For example, without these changes, the slot address has a maximum of 15,
    but users commonly use slot numbers much greater than that.

    The AoE shelf and slot address space is often used sparsely. Instead of
    using a static mapping between AoE addresses and the block device minor
    number, the block device minor numbers are now allocated on demand.
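
    The difference between the two schemes can be sketched in user space.
    Everything below is hypothetical and only illustrative: the constant
    of 16 slots per shelf is an assumption chosen to match the "maximum
    of 15" limit mentioned above, and the tiny fixed-size table stands in
    for whatever allocator the driver really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SLOTS_PER_SHELF 16  /* assumption: slots 0..15 per shelf */
#define SKETCH_MAX_DEVS 8          /* arbitrary table size for the sketch */

/* Static scheme: the minor number is computed from the AoE address. */
long sketch_static_minor(unsigned shelf, unsigned slot)
{
    if (slot >= SKETCH_SLOTS_PER_SHELF)
        return -1;  /* address cannot be represented */
    return (long)shelf * SKETCH_SLOTS_PER_SHELF + (long)slot;
}

/* Dynamic scheme: minors are handed out as addresses are first seen. */
struct sketch_dev_table {
    bool used[SKETCH_MAX_DEVS];
    unsigned shelf[SKETCH_MAX_DEVS];
    unsigned slot[SKETCH_MAX_DEVS];
};

long sketch_dynamic_minor(struct sketch_dev_table *t,
                          unsigned shelf, unsigned slot)
{
    long free_idx = -1;

    for (long i = 0; i < SKETCH_MAX_DEVS; i++) {
        if (t->used[i] && t->shelf[i] == shelf && t->slot[i] == slot)
            return i;  /* address already has a minor */
        if (!t->used[i] && free_idx < 0)
            free_idx = i;  /* remember the first free minor */
    }
    if (free_idx < 0)
        return -1;  /* table full */

    t->used[free_idx] = true;
    t->shelf[free_idx] = shelf;
    t->slot[free_idx] = slot;
    return free_idx;
}
```

    With the dynamic table a sparse address such as shelf 7, slot 42
    simply takes the next free minor, whereas the static formula has to
    reject it.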

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The old area has a new URL. Also, now that the driver can perform better,
    it is worth mentioning the VM settings that help aoe to sink dirty pages
    out early, avoiding unnecessary memory pressure when much I/O is going on.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The internal version number of the aoe driver appears in a console message
    when the driver loads and is usually obtained by the user with the
    userland aoe-version tool, part of the aoetools.[1]

    Although this patchset includes bugfixes backported from higher-numbered
    versions published on the coraid.com website, it is a form of version 49.

    1. http://aoetools.sourceforge.net/

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • This change removes some unused code and attempts to increase code
    consistency.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • This change eliminates the danger that the user could rmmod the driver for
    a network interface that is being used for AoE by the aoe driver.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • In the driver code, "target" and aoetgt refer to a particular remote
    interface on the AoE storage target. The latter is identified by its AoE
    major and minor addresses. Commands that are being sent to an AoE storage
    target {major, minor} can be sent or retransmitted to any of the remote
    MAC addresses associated with the AoE storage target.

    That is, frames are naturally associated with not an aoetgt (AoE major,
    AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor).
    Making the code reflect that reality simplifies the driver, especially
    when the path to a remote MAC address becomes unusable.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • A guard is inserted to prevent AoE minor addresses (slot addresses) higher
    than 15 from being used, as they are not yet supported by the driver.

    There is a change coming that will allow the aoe driver to overcome this
    limit by using system device minor numbers dynamically, but until then,
    this guard prevents unexpected targets from being used by the driver when
    AoE targets with high minor numbers are on the AoE network.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The discovery process begins with an optional AoE config query command and
    an AoE config query response. Normally when an aoe device is already
    open, the config query response does not trigger an ATA identify device
    command to be sent out, since the response contains storage capacity
    information that, if changed, could surprise the user of the device.

    The userland "aoe-revalidate" tool uses a character device to trigger an
    AoE config query for a particular AoE storage target and an ATA device
    identify command, even when the device is open.

    This change causes the config query to go out first, reflecting the normal
    discovery sequence. The responses could come back in any order, so this
    change is fairly cosmetic.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The aoe_deadsecs module parameter allows the user to specify a hard limit
    on the number of seconds an AoE command can be retransmitted before the
    AoE block device is considered to have failed.

    Using aoe_deadsecs to determine the time we try using a different remote
    interface helps to ensure that the hard limit is not reached before we've
    tried to recover by sending to a different remote port.

    As a data storage target, the AoE target is unambiguously identified by
    its {major, minor} AoE address tuple, and an AoE target can have multiple
    MAC addresses. However, note that "target" in the driver code and
    comments means a {major, minor, MAC address} tuple, as in "somewhere to
    send packets".

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Users with several network interfaces dedicated to AoE generally do not
    configure them to support different-sized AoE data payloads on purpose.

    For a given AoE target, there will be a set of local network interfaces
    that can reach it. Using only the payload that will fit in the
    smallest-sized MTU of all those local interfaces greatly simplifies the
    driver, especially in failure scenarios.
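
    A minimal sketch of that rule, under stated assumptions: the function
    name is hypothetical, the per-frame header overhead is passed in by
    the caller rather than taken from the AoE specification, and rounding
    down to whole 512-byte sectors is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SECTOR_SIZE 512u  /* assumption: payload is whole sectors */

/*
 * Return the largest payload, in bytes, that fits the smallest MTU among
 * the local interfaces that can reach a target, or 0 if nothing fits.
 * header_bytes is the protocol overhead counted against the MTU.
 */
unsigned sketch_max_payload(const unsigned *mtus, size_t n,
                            unsigned header_bytes)
{
    unsigned min_mtu;

    if (n == 0)
        return 0;

    min_mtu = mtus[0];
    for (size_t i = 1; i < n; i++)
        if (mtus[i] < min_mtu)
            min_mtu = mtus[i];

    if (min_mtu <= header_bytes)
        return 0;

    /* Round down to a whole number of sectors. */
    return (min_mtu - header_bytes) / SKETCH_SECTOR_SIZE * SKETCH_SECTOR_SIZE;
}
```

    Because every path to the target can carry the resulting payload, a
    frame never has to be resized when it is retransmitted through a
    different local interface.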

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The dev_queue_xmit function needs to have interrupts enabled, so the
    simplest way to get the locking right but still fulfill that requirement is
    to use a process that can call dev_queue_xmit serially over queued
    transmissions.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • To allow users to choose an elevator algorithm for their particular
    workloads, change from a make_request-style driver to an
    I/O-request-queue-handler-style driver.

    We have to do a couple of things that might be surprising. We manipulate
    the page _count directly on the assumption that we still have no guarantee
    that users of the block layer are prohibited from submitting bios
    containing pages with zero reference counts.[1] If such a prohibition now
    exists, I can get rid of the _count manipulation.

    Just as before this patch, we still keep track of the sk_buffs that the
    network layer still hasn't finished yet and cap the resources we use with
    a "pool" of skbs.[2]

    Now that the block layer maintains the disk stats, the aoe driver's
    diskstats function can go away.

    1. https://lkml.org/lkml/2007/3/1/374
    2. https://lkml.org/lkml/2007/7/6/241

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Make the frames the aoe driver uses to track the relationship between bios
    and packets more flexible and detached, so that they can be passed to an
    "aoe_ktio" thread for completion of I/O.

    The frames are handled much like skbs, with a capped amount of
    preallocation so that real-world use cases are likely to run smoothly and
    degrade gracefully even under memory pressure.

    Decoupling I/O completion from the receive path and serializing it in a
    process makes it easier to think about the correctness of the locking in
    the driver, especially in the case of a remote MAC address becoming
    unusable.

    [dan.carpenter@oracle.com: cleanup an allocation a bit]
    Signed-off-by: Ed Cashin
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Add the ability to work with large packets composed of a number of
    segments, using the scatter gather feature of the block layer (biovecs)
    and the network layer (skb frag array). The motivation is the performance
    gained by using a packet data payload greater than a page size and by
    using the network card's scatter gather feature.

    Users of the out-of-tree aoe driver already had these changes, but since
    early 2011, they have complained of increased memory utilization and
    higher CPU utilization during heavy writes.[1] The commit below appears
    related, as it disables scatter gather on non-IP protocols inside the
    harmonize_features function, even when the NIC supports sg.

    commit f01a5236bd4b140198fbcc550f085e8361fd73fa
    Author: Jesse Gross
    Date: Sun Jan 9 06:23:31 2011 +0000

    net offloading: Generalize netif_get_vlan_features().

    With that regression in place, transmits always linearize sg AoE packets,
    but in-kernel users did not have this patch. Before 2.6.38, though, these
    changes were working to allow sg to increase performance.

    1. http://www.spinics.net/lists/linux-mm/msg15184.html

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Add discard support to nbd. If the nbd-server supports discard, it will
    send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag
    in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
    for the device (QUEUE_FLAG_DISCARD).

    If discard support is enabled, then when the nbd client system receives a
    discard request, this will be passed along to the nbd-server. When the
    discard request is received by the nbd-server, it will perform:

    fallocate(.. FALLOC_FL_PUNCH_HOLE ..)

    to punch a hole in the backend storage that is no longer needed.

    Signed-off-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
     
  • Add a set-flags ioctl, allowing various option flags to be set on an nbd
    device. This allows the nbd-client to set the device flags (to enable
    read-only mode, or enable discard support, etc.).

    Flags are typically specified by the nbd-server. During the negotiation
    phase of the nbd connection, the server sends its flags to the client.
    The client then uses NBD_SET_FLAGS to inform the kernel of the options.
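
    The negotiation flow can be modeled as follows. This is a user-space
    sketch only: the SKETCH_FLAG_* bit values, the device structure and
    both functions are hypothetical, and the real driver may apply the
    flags at a different point than this model does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not taken from the nbd headers. */
#define SKETCH_FLAG_READ_ONLY (1u << 1)
#define SKETCH_FLAG_SEND_TRIM (1u << 5)

/* Hypothetical model of the per-device state the flags control. */
struct sketch_nbd_device {
    unsigned flags;        /* flags as received from the server */
    bool read_only;
    bool discard_enabled;
};

/* Model of the set-flags step: the client hands the server's flags over. */
void sketch_set_flags(struct sketch_nbd_device *dev, unsigned server_flags)
{
    dev->flags = server_flags;
    dev->read_only = (server_flags & SKETCH_FLAG_READ_ONLY) != 0;
    dev->discard_enabled = (server_flags & SKETCH_FLAG_SEND_TRIM) != 0;
}

/* A discard request is forwarded to the server only if it was negotiated. */
bool sketch_forward_discard(const struct sketch_nbd_device *dev)
{
    return dev->discard_enabled;
}
```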

    Also included is a one-line fix to debug output for the set-timeout ioctl.

    Signed-off-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
     
  • Replace the single global destination ID counter with a per-net allocation
    mechanism to allow independent destID management for each available
    RapidIO network. Using a bitmap-based mechanism instead of counters allows
    destination ID release and reuse in systems that support hot-swap.
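
    Why a bitmap permits reuse where a counter does not can be shown with
    a small user-space sketch. The structure, the functions and the limit
    of 64 IDs are hypothetical and are not taken from the RapidIO code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_MAX_DESTID 64  /* arbitrary limit for the sketch */
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
    ((SKETCH_MAX_DESTID + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* Each network owns its own bitmap, so nets allocate independently. */
struct sketch_rio_net {
    unsigned long destid_map[SKETCH_WORDS];
};

/* Return the lowest free destination ID and mark it used, or -1 if full. */
int sketch_destid_alloc(struct sketch_rio_net *net)
{
    for (int id = 0; id < SKETCH_MAX_DESTID; id++) {
        unsigned long *word = &net->destid_map[id / SKETCH_BITS_PER_WORD];
        unsigned long bit = 1UL << (id % SKETCH_BITS_PER_WORD);

        if (!(*word & bit)) {
            *word |= bit;
            return id;
        }
    }
    return -1;
}

/* Release an ID so a later allocation can hand it out again. */
void sketch_destid_free(struct sketch_rio_net *net, int id)
{
    assert(id >= 0 && id < SKETCH_MAX_DESTID);
    net->destid_map[id / SKETCH_BITS_PER_WORD] &=
        ~(1UL << (id % SKETCH_BITS_PER_WORD));
}
```

    A monotonically increasing counter could never hand out a released
    ID again; here the freed bit is found on the next allocation.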

    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Cc: Li Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Make RIONET driver multi-net safe/capable by introducing per-net lists of
    RapidIO network peers. Rework registration of network adapters to support
    all available RIO master port devices.

    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Cc: Li Yang
    Cc: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Modify mport initialization routine to run the RapidIO discovery process
    asynchronously. This allows ports to be enumerated and discovered in an
    arbitrary order in systems with multiple RapidIO controllers without
    creating a deadlock if an enumerator port is registered after a
    discovering one.

    Making the netID match the mportID ensures consistent netID assignment in
    multiport RapidIO systems with an asynchronous discovery process (the
    global counter implementation is affected by a race between threads).

    [akpm@linux-foundation.org: tweak code layout]
    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Cc: Li Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Modify handling of device lists to resolve issues caused by using a single
    global list of RIO devices during enumeration/discovery. The most common
    sign of the existing issue is incorrect contents of switch routing tables
    in systems with multiple mport controllers, while single-port
    configurations perform as expected.

    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Cc: Li Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • The following set of patches provides modifications targeting support of
    multiple RapidIO master port (mport) devices on a CPU-side of
    RapidIO-capable board. While the RapidIO subsystem code has definitions
    suitable for multi-controller/multi-net support, the existing
    implementation cannot be considered ready for multiple mport
    configurations.

    =========== NOTES: =============

    a) The patches below do not address RapidIO side view of multiport
    processing elements defined in Part 6 of RapidIO spec Rev.2.1 (section
    6.4.1). These devices have Base Device ID CSR (0x60) and Component Tag
    CSR (0x6C) shared by all SRIO ports. For example, Freescale's P4080,
    P3041 and P5020 have a dual-port SRIO controller implemented according
    to the specification. Enumeration/discovery of such devices from RapidIO
    side may require device-specific fixups.

    b) Devices referenced above may also require implementation-specific
    code to set up a host device ID for the mport device. These operations
    are not addressed by patches in this package.

    =================================

    Details about provided patches:

    1. Fix blocking wait for discovery ready

    While it does not happen on PowerPC based platforms, x86 based platforms
    that run the RapidIO discovery process may dump a stalled CPU warning if
    they wait too long to be enumerated.

    Currently users can avoid it by disabling the soft-lockup detector
    using "nosoftlockup" kernel parameter OR by ensuring that enumeration
    is completed before soft-lockup is detected.

    This patch eliminates blocking wait and keeps a scheduler running.
    It also is required for patch 3 below which introduces asynchronous
    discovery process.

    2. Use device lists handling on per-net basis

    This patch allows multiple RapidIO nets to be supported correctly and
    resolves possible issues caused by using a single global list of devices
    during RapidIO system enumeration/discovery. The most common sign of
    the existing issue is incorrect contents of switch routing tables in
    systems with multiple mport controllers, while single-port
    configurations perform as expected.

    The patch does not eliminate the global RapidIO device list but
    changes some routines in enumeration/discovery to use per-net device
    lists instead. This way compatibility with upper layer RIO routines is
    preserved.

    3. Run discovery as an asynchronous process

    This patch modifies RapidIO initialization routine to asynchronously
    run the discovery process for each corresponding mport. This allows
    having an arbitrary order of enumerating and discovering mports without
    creating a deadlock situation if an enumerator port was registered
    after a discovering one.

    On boards with multiple discovering mports it also eliminates order
    dependency between mports and may reduce total time of RapidIO
    subsystem initialization.

    Making the netID match the mportID ensures consistent netID assignment
    in multiport RapidIO systems with an asynchronous discovery process
    (the global counter implementation is affected by a race between threads).

    4. Rework RIONET to support multiple RIO master ports

    In the current version of the driver rionet_probe() has the comment "XXX
    Make multi-net safe". Now is a good time to address this comment.

    This patch makes the RIONET driver multi-net safe/capable by introducing
    per-net lists of RapidIO network peers. It also enables registration of
    network adapters for all available mport devices.

    5. Add destination ID allocation mechanism

    The patch replaces a single global destination ID counter with a
    per-net allocation mechanism to allow independent destID management for
    each available RapidIO network. Using a bitmap-based mechanism instead
    of counters allows destination ID release and reuse in systems that
    support hot-swap.

    This patch:

    Fix blocking wait loop in the RapidIO discovery routine to avoid warning
    dumps about stalled CPU on x86 platforms.

    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Cc: Li Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine