20 Jan, 2011

9 commits

  • This implements a mqprio queueing discipline that by default creates
    a pfifo_fast qdisc per tx queue and provides the needed configuration
    interface.

    Using the mqprio qdisc, the number of tcs currently in use along
    with the range of queues allotted to each class can be configured. By
    default skbs are mapped to traffic classes using the skb priority.
    This mapping is configurable.

    Configurable parameters:

    struct tc_mqprio_qopt {
        __u8 num_tc;
        __u8 prio_tc_map[TC_BITMASK + 1];
        __u8 hw;
        __u16 count[TC_MAX_QUEUE];
        __u16 offset[TC_MAX_QUEUE];
    };

    Here the count/offset pairings give the queue alignment and the
    prio_tc_map gives the mapping from skb->priority to tc.
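
    As a rough illustration of how these fields fit together, the sketch
    below mirrors the structure in userspace and resolves a priority to a
    traffic class and its queue range. The names mqprio_opt, prio_to_tc,
    tc_queue_range and example_opt are hypothetical, and the values 15 and
    16 used for TC_BITMASK and TC_MAX_QUEUE are assumptions made for this
    example; this is not the kernel implementation.

```c
#include <stdint.h>

#define TC_BITMASK   15   /* assumed value: keeps the lower 4 priority bits */
#define TC_MAX_QUEUE 16   /* assumed value for this sketch */

/* Userspace mirror of the structure quoted above (illustrative only). */
struct mqprio_opt {
    uint8_t  num_tc;
    uint8_t  prio_tc_map[TC_BITMASK + 1];
    uint8_t  hw;
    uint16_t count[TC_MAX_QUEUE];
    uint16_t offset[TC_MAX_QUEUE];
};

/* Hypothetical helper: resolve a priority to its traffic class. */
static unsigned int prio_to_tc(const struct mqprio_opt *opt, uint32_t priority)
{
    return opt->prio_tc_map[priority & TC_BITMASK];
}

/* Hypothetical helper: first and last tx queue owned by a traffic class. */
static void tc_queue_range(const struct mqprio_opt *opt, unsigned int tc,
                           unsigned int *first, unsigned int *last)
{
    *first = opt->offset[tc];
    *last = opt->offset[tc] + opt->count[tc] - 1;
}

/* Example layout: two classes over eight queues.
 * Priorities 0-3 map to tc 0 (queues 0-3), the rest to tc 1 (queues 4-7). */
static struct mqprio_opt example_opt(void)
{
    struct mqprio_opt opt = {0};
    unsigned int p;

    opt.num_tc = 2;
    for (p = 0; p <= TC_BITMASK; p++)
        opt.prio_tc_map[p] = (p < 4) ? 0 : 1;
    opt.count[0] = 4;
    opt.offset[0] = 0;
    opt.count[1] = 4;
    opt.offset[1] = 4;
    return opt;
}
```

    With this layout, a priority of 5 resolves to tc 1, whose queues run
    from offset 4 through 7.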

    The hw bit determines if the hardware should configure the count
    and offset values. If the hardware bit is set then the operation
    will fail if the hardware does not implement the ndo_setup_tc
    operation. This is to avoid undetermined states where the hardware
    may or may not control the queue mapping. Also minimal bounds
    checking is done on the count/offset to verify a queue does not
    exceed num_tx_queues and that queue ranges do not overlap. Otherwise
    it is left to user policy or hardware configuration to create
    useful mappings.
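
    The bounds checking described above could look roughly like the
    following sketch. The names tc_range and ranges_valid are hypothetical,
    and rejecting a zero-length range is an extra assumption of this
    example rather than something stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pairing of one class's queue count and starting offset. */
struct tc_range {
    uint16_t count;
    uint16_t offset;
};

/* Return true when every range fits inside num_tx_queues and no two
 * ranges overlap. */
static bool ranges_valid(const struct tc_range *tc, unsigned int num_tc,
                         unsigned int num_tx_queues)
{
    unsigned int i, j;

    for (i = 0; i < num_tc; i++) {
        unsigned int end = tc[i].offset + tc[i].count; /* one past the last queue */

        /* Assumption of this sketch: an empty class is treated as invalid. */
        if (tc[i].count == 0)
            return false;

        /* The range must not run past the device's tx queues. */
        if (tc[i].offset >= num_tx_queues || end > num_tx_queues)
            return false;

        /* Two half-open ranges overlap when each starts before the other ends. */
        for (j = i + 1; j < num_tc; j++) {
            unsigned int other_end = tc[j].offset + tc[j].count;

            if (tc[i].offset < other_end && tc[j].offset < end)
                return false;
        }
    }
    return true;
}
```

    Anything beyond these two checks, such as whether the layout is
    actually useful, stays with user policy or the hardware configuration.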

    It is expected that hardware QOS schemes can be implemented by
    creating appropriate mappings of queues in ndo_setup_tc().

    One expected use case is drivers will use the ndo_setup_tc to map
    queue ranges onto 802.1Q traffic classes. This provides a generic
    mechanism to map network traffic onto these traffic classes and
    removes the need for lower layer drivers to know specifics about
    traffic types.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This patch provides a mechanism for lower layer devices to
    steer traffic using skb->priority to tx queues. This allows
    hardware based QOS schemes to use the default qdisc without
    incurring the penalties related to global state and the qdisc
    lock, while reliably receiving skbs on the correct tx ring
    to avoid head of line blocking resulting from shuffling in
    the LLD. Finally, all the goodness from txq caching and xps/rps
    can still be leveraged.

    Many drivers and hardware exist with the ability to implement
    QOS schemes in the hardware but currently these drivers tend
    to rely on firmware to reroute specific traffic, a driver
    specific select_queue or the queue_mapping action in the
    qdisc.

    By using select_queue for this, drivers need to be updated for
    each and every traffic type and we lose the goodness of much
    of the upstream work. Firmware solutions are inherently
    inflexible. And finally if admins are expected to build a
    qdisc and filter rules to steer traffic this requires knowledge
    of how the hardware is currently configured. The number of tx
    queues and the queue offsets may change depending on resources.
    Also this approach incurs all the overhead of a qdisc with filters.

    With the mechanism in this patch users can set skb priority using
    expected methods, i.e. setsockopt(), or the stack can set the priority
    directly. Then the skb will be steered to the correct tx queues
    aligned with hardware QOS traffic classes. In the normal case with
    a single traffic class and all queues in this class, everything
    works as is until the LLD enables multiple tcs.

    To steer the skb we mask the priority down to its lower 4 bits
    and allow the hardware to configure up to 15 distinct classes
    of traffic. This is expected to be sufficient for most applications;
    at any rate it is more than the 802.1Q spec designates and is
    equal to the number of prio bands currently implemented in
    the default qdisc.
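
    One way to picture the steering step is the sketch below: the priority
    is masked to 4 bits, looked up in a priority-to-class map, and a flow
    hash then picks a queue inside that class's range. The structure
    toy_tc_layout and the function toy_pick_tx_queue are hypothetical, and
    the multiply-and-shift scaling of the hash is an assumption of this
    example, not a claim about what the patch does.

```c
#include <stdint.h>

#define TC_BITMASK 15  /* assumed: keep the lower 4 bits of the priority */

/* Hypothetical per-device view of the traffic class layout. */
struct toy_tc_layout {
    uint8_t  prio_tc_map[TC_BITMASK + 1];
    uint16_t count[16];
    uint16_t offset[16];
};

/* Pick a tx queue for a packet with the given priority and flow hash. */
static unsigned int toy_pick_tx_queue(const struct toy_tc_layout *l,
                                      uint32_t priority, uint32_t flow_hash)
{
    unsigned int tc = l->prio_tc_map[priority & TC_BITMASK];

    /* Scale the 32-bit hash into [0, count) without a division. */
    unsigned int slot =
        (unsigned int)(((uint64_t)flow_hash * l->count[tc]) >> 32);

    return l->offset[tc] + slot;
}
```

    Because the hash only selects a slot within the class's own range, a
    flow never lands on a queue that belongs to another traffic class.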

    This, in conjunction with a userspace application such as
    lldpad, can be used to implement 802.1Q transmission selection
    algorithms, one of these being the extended transmission
    selection algorithm currently used for DCB.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • If a rtnetlink request specifies a negative or zero ifindex and has no
    interface name attribute, but has a group attribute, then the changes
    are made to all the interfaces belonging to the specified group.
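
    The selection rule can be modelled in a few lines. The sketch below is
    a toy userspace model, not rtnetlink code: toy_request, toy_resolve,
    toy_netdev and toy_set_group_mtu are hypothetical names, and changing
    the MTU is just a stand-in for "the changes".

```c
#include <stddef.h>

enum toy_target { TOY_BY_INDEX, TOY_BY_NAME, TOY_BY_GROUP, TOY_NONE };

/* Hypothetical, simplified view of a link request. */
struct toy_request {
    int ifindex;          /* zero or negative means "not specified" */
    const char *ifname;   /* NULL when there is no name attribute */
    int has_group;        /* non-zero when a group attribute is present */
    unsigned int group;
};

/* Decide what the request addresses, in the order described above. */
static enum toy_target toy_resolve(const struct toy_request *req)
{
    if (req->ifindex > 0)
        return TOY_BY_INDEX;
    if (req->ifname)
        return TOY_BY_NAME;
    if (req->has_group)
        return TOY_BY_GROUP;
    return TOY_NONE;
}

/* Hypothetical stand-in for a network device. */
struct toy_netdev {
    int ifindex;
    unsigned int group;
    unsigned int mtu;
};

/* Apply one change to every device in the group; return how many changed. */
static size_t toy_set_group_mtu(struct toy_netdev *devs, size_t n,
                                unsigned int group, unsigned int mtu)
{
    size_t i, changed = 0;

    for (i = 0; i < n; i++) {
        if (devs[i].group == group) {
            devs[i].mtu = mtu;
            changed++;
        }
    }
    return changed;
}
```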

    Signed-off-by: Vlad Dogaru
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Vlad Dogaru
     
  • Net devices can now be grouped, enabling simpler manipulation from
    userspace. This patch adds a group field to the net_device structure, as
    well as rtnetlink support to query and modify it.

    Signed-off-by: Vlad Dogaru
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Vlad Dogaru
     
  • Clean up some unused macros in net/*.
    1. Macros left behind by code changes,
    e.g. PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
    2. Macros never used since they were introduced to the kernel,
    e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.

    Signed-off-by: Shan Wei
    Acked-by: Sjur Braendeland
    Signed-off-by: David S. Miller

    Shan Wei
     
  • Update vxge driver version to 2.5.2

    Signed-off-by: Jon Mason
    Signed-off-by: David S. Miller

    Jon Mason
     
  • To reduce the possibility of losing an interrupt in the handler due to a
    race between an interrupt processing and disable/enable of interrupts,
    enable MSIX one shot.

    Also, add support for adaptive interrupt coalescing.

    Signed-off-by: Jon Mason
    Signed-off-by: Masroor Vettuparambil
    Signed-off-by: David S. Miller

    Jon Mason
     
  • The firmware PXE EPROM version detection is failing due to passing the
    wrong parameter into the firmware query function. Also, the version
    printing function has an extraneous newline.

    Signed-off-by: Jon Mason
    Signed-off-by: Sivakumar Subramani
    Signed-off-by: David S. Miller

    Jon Mason
     
  • Reorder the commands to be in the inverse order of their allocations
    (instead of the random order they appear to be in), propagate return
    code on errors from pci_request_region and register_netdev, reduce the
    config_dev_cnt and total_dev_cnt counters on remove, and return the
    correct error code for vdev->vpaths kzalloc failures. Also, prevent
    leaking of vdev->vpaths memory and netdev in the vxge_probe error path,
    which occurred because vxge_device_unregister does not free them.
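
    The general pattern behind this kind of fix, releasing resources in the
    inverse order of their allocation and propagating the error code, can
    be sketched as follows. Everything here is a toy model: toy_alloc,
    toy_free, toy_dev, toy_probe and toy_remove are hypothetical names, the
    fail_step parameter exists only to inject failures, and live_allocs is
    a counter added so that leaks are visible.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs;  /* number of allocations not yet freed */

static void *toy_alloc(int fail)
{
    void *p;

    if (fail)
        return NULL;
    p = malloc(16);
    if (p)
        live_allocs++;
    return p;
}

static void toy_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

struct toy_dev {
    void *netdev;
    void *vpaths;
};

/* fail_step: 0 = no failure, 1 = netdev alloc, 2 = vpaths alloc,
 * 3 = registration (a stand-in for a call such as register_netdev()). */
static int toy_probe(struct toy_dev *d, int fail_step)
{
    int ret;

    d->netdev = toy_alloc(fail_step == 1);
    if (!d->netdev) {
        ret = -ENOMEM;
        goto err_out;
    }

    d->vpaths = toy_alloc(fail_step == 2);
    if (!d->vpaths) {
        ret = -ENOMEM;
        goto err_free_netdev;
    }

    ret = (fail_step == 3) ? -EIO : 0;
    if (ret)
        goto err_free_vpaths;  /* propagate the callee's error code */

    return 0;

    /* Unwind in the inverse order of the allocations above. */
err_free_vpaths:
    toy_free(d->vpaths);
err_free_netdev:
    toy_free(d->netdev);
err_out:
    return ret;
}

static void toy_remove(struct toy_dev *d)
{
    toy_free(d->vpaths);
    toy_free(d->netdev);
}
```

    Each label undoes exactly the steps that had succeeded before the
    failure, so no error path can skip a free.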

    Signed-off-by: Jon Mason
    Signed-off-by: Sivakumar Subramani
    Signed-off-by: David S. Miller

    Jon Mason
     

19 Jan, 2011

31 commits

  • Packet filter (BPF) doesn't need to disable softirqs, being fully
    re-entrant and lock-less.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Linux Socket Filters can already be successfully attached and detached on unix
    sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
    See: Documentation/networking/filter.txt

    But the filter was never used in the unix socket code so it did not work. This
    patch uses sk_filter() to filter buffers before delivery.

    This short program demonstrates the problem on SOCK_DGRAM.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <poll.h>
    #include <sys/socket.h>
    #include <linux/filter.h>

    int main(void) {
        int i, j, ret;
        int sv[2];
        struct pollfd fds[2];
        char *message = "Hello world!";
        char buffer[64];
        struct sock_filter ins[32] = {{0,},};
        struct sock_fprog filter;

        socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);

        for (i = 0 ; i < 2 ; i++) {
            fds[i].fd = sv[i];
            fds[i].events = POLLIN;
            fds[i].revents = 0;
        }

        for (j = 1 ; j < 13 ; j++) {

            /* Set a socket filter to truncate the message to j bytes */
            memset(ins, 0, sizeof(ins));
            ins[0].code = BPF_RET|BPF_K;
            ins[0].k = j;
            filter.len = 1;
            filter.filter = ins;
            setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));

            /* Send a message */
            send(sv[0], message, strlen(message) + 1, 0);

            /* The filter should let the message pass but truncated. */
            poll(fds, 2, 0);

            /* Receive the truncated message */
            ret = recv(sv[1], buffer, 64, 0);
            printf("received %d bytes, expected %d\n", ret, j);
        }

        for (i = 0 ; i < 2 ; i++)
            close(sv[i]);

        return 0;
    }

    Signed-off-by: Alban Crequy
    Reviewed-by: Ian Molton
    Signed-off-by: David S. Miller

    Alban Crequy
     
  • David S. Miller
     
  • Just stumbled upon the issue while looking for another bug.

    The code looks correct, the indentation is not.

    Signed-off-by: Anton Vorontsov
    Signed-off-by: David S. Miller

    Anton Vorontsov
     
  • sh_irda cannot use RX and TX at the same time,
    but this driver didn't return to RX mode when a TX error occurred.
    This patch handles the xmit error case to solve this issue.

    Signed-off-by: Kuninori Morimoto
    Signed-off-by: David S. Miller

    Kuninori Morimoto
     
  • In netif_skb_features() we return only the features that are valid for vlans
    if we have a vlan packet. However, we should not mask out NETIF_F_HW_VLAN_TX
    since it enables transmission of vlan tags and is obviously valid.
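
    The masking described here can be sketched with a few illustrative
    bits. The F_* values below are made up for the example and are not the
    kernel's NETIF_F_* constants; toy_skb_features is a hypothetical name.

```c
#include <stdint.h>

/* Illustrative bit values only, not the kernel's feature flags. */
#define F_SG         (1u << 0)
#define F_IP_CSUM    (1u << 1)
#define F_TSO        (1u << 2)
#define F_HW_VLAN_TX (1u << 3)

/* For a vlan packet, keep only the features valid for vlans, but never
 * mask out the vlan tag insertion feature itself. */
static uint32_t toy_skb_features(uint32_t dev_features,
                                 uint32_t vlan_features, int is_vlan)
{
    if (!is_vlan)
        return dev_features;
    return dev_features & (vlan_features | F_HW_VLAN_TX);
}
```

    Because the result is still ANDed with the device's own features, the
    tag insertion bit survives only when the device actually advertises it.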

    Reported-by: Eric Dumazet
    Signed-off-by: Jesse Gross
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jesse Gross
     
  • - tx_fixup() can be called from either the timer callback or from xmit()
    in usbnet, so a spinlock is added to avoid concurrency-related problems.
    - minor correction due to a checkpatch warning for a line over 80
    chars after the previous patch was applied.

    Signed-off-by: Alexey Orishko
    Signed-off-by: David S. Miller

    Alexey Orishko
     
  • In drivers/net/ns83820.c::ns83820_init_one() we dynamically allocate
    memory via alloc_etherdev(). We then call PRIV() on the returned storage
    which is 'return netdev_priv()'. netdev_priv() takes the pointer it is
    passed and adds 'ALIGN(sizeof(struct net_device), NETDEV_ALIGN)' to it and
    returns it. Then we test the resulting pointer for NULL, which it is
    unlikely to be at this point, and later dereference it. This will go bad
    if alloc_etherdev() actually returned NULL.

    This patch reworks the code slightly so that we test for a NULL pointer
    (and return -ENOMEM) directly after calling alloc_etherdev().
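
    The pitfall is easy to reproduce outside the kernel. In the sketch
    below, toy_priv stands in for a netdev_priv()-style accessor that adds
    a fixed offset to whatever pointer it is given; TOY_DEV_SIZE,
    toy_alloc_dev and toy_init_one are hypothetical names, and the
    fail_alloc parameter only simulates an allocation failure.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_DEV_SIZE 64u  /* stand-in for the aligned device struct size */

/* Returns the private area that follows the device structure. Note that
 * the result is non-NULL on typical platforms even when dev is NULL. */
static void *toy_priv(void *dev)
{
    return (void *)((uintptr_t)dev + TOY_DEV_SIZE);
}

static void *toy_alloc_dev(size_t priv_size, int fail)
{
    return fail ? NULL : calloc(1, TOY_DEV_SIZE + priv_size);
}

static int toy_init_one(int fail_alloc, void **out_dev)
{
    void *dev;
    int *priv;

    dev = toy_alloc_dev(sizeof(int), fail_alloc);
    if (!dev)
        return -ENOMEM;  /* test the allocation itself, not the offset pointer */

    priv = toy_priv(dev);
    *priv = 42;          /* safe: dev is known to be valid here */
    *out_dev = dev;
    return 0;
}
```

    Testing the value returned by toy_priv() instead would let a failed
    allocation slip through, since the added offset hides the NULL.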

    Signed-off-by: Jesper Juhl
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Jesper Juhl
     
  • When a network namespace is created (via CLONE_NEWNET), the loopback
    interface is automatically added to the new namespace, triggering a
    printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.

    This is problematic for applications which use CLONE_NEWNET as
    part of a sandbox, like Chromium's suid sandbox or recent versions of
    vsftpd. On a busy machine, it can lead to thousands of useless
    "lo: Disabled Privacy Extensions" messages appearing in dmesg.

    It's easy enough to check the status of privacy extensions via the
    use_tempaddr sysctl, so just removing the printk seems like the most
    sensible solution.

    Signed-off-by: Romain Francoise
    Signed-off-by: David S. Miller

    Romain Francoise
     
  • Update bnx2x version to 1.62.00-4

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • Fix AER settings for BCM57712 to allow accessing all device addresses range in CL45 MDC/MDIO

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • Fix BCM84823 LED behavior which may show on some systems

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • Device may show incorrect duplex mode for devices with external PHY

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • Improve microcode loading verification before proceeding to next stage

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • LED on BCM57712+BCM8727 systems requires different settings

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • Common init used to be called by the driver when the first port comes up, mainly to reset and reload external PHY microcode.
    However, in case a management driver is active on the other port, traffic would be halted. So limit the common init to be done only once after POR.

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • Enable controlling BCM8073 PN polarity swap through nvm configuration, which is required in certain systems

    Signed-off-by: Yaniv Rosner
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yaniv Rosner
     
  • Linus Torvalds
     
  • * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
    hwmon: (lm93) Add support for LM94

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Validate cpu early in perf_event_alloc()
    perf: Find_get_context: fix the per-cpu-counter check
    perf: Fix contexted inheritance

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Clear irqstack thread_info
    x86: Make relocatable kernel work with new binutils

    Linus Torvalds
     
  • * 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (26 commits)
    MIPS: Malta: enable Cirrus FB console
    MIPS: add CONFIG_VIRTUALIZATION for virtio support
    MIPS: Implement __read_mostly
    MIPS: ath79: add common WMAC device for AR913X based boards
    MIPS: ath79: Add initial support for the Atheros AP81 reference board
    MIPS: ath79: add common SPI controller device
    SPI: Add SPI controller driver for the Atheros AR71XX/AR724X/AR913X SoCs
    MIPS: ath79: add common GPIO buttons device
    MIPS: ath79: add common watchdog device
    MIPS: ath79: add common GPIO LEDs device
    MIPS: ath79: add initial support for the Atheros PB44 reference board
    MIPS: ath79: utilize the MIPS multi-machine support
    MIPS: ath79: add GPIOLIB support
    MIPS: Add initial support for the Atheros AR71XX/AR724X/AR931X SoCs
    MIPS: jump label: Add MIPS support.
    MIPS: Use WARN() in uasm for better diagnostics.
    MIPS: Optimize TLB handlers for Octeon CPUs
    MIPS: Add LDX and LWX instructions to uasm.
    MIPS: Use BBIT instructions in TLB handlers
    MIPS: Declare uasm bbit0 and bbit1 functions.
    ...

    Linus Torvalds
     
  • David S. Miller
     
  • This patch adds basic support for LM94 to the LM93 driver. LM94 specific
    sensors and features are not supported.

    Signed-off-by: Guenter Roeck
    Acked-by: Jean Delvare

    Guenter Roeck
     
  • When reading the valid tx/rx chains from EEPROM, there is a bug that uses
    the tx chain value for both tx and rx. This causes low
    receive throughput on 1x2 devices because rx will only utilize a single
    chain instead of two chains.

    Signed-off-by: Wey-Yi Guy
    Signed-off-by: John W. Linville

    Wey-Yi Guy
     
  • ath5k_reset must be called with sc->lock. Since the tx queue
    watchdog runs in a workqueue and accesses sc, it's appropriate
    to just take the lock over the whole function.

    Signed-off-by: Bob Copeland
    Signed-off-by: John W. Linville

    Bob Copeland
     
  • Starting from perf_event_alloc()->perf_init_event(), the kernel
    assumes that event->cpu is either -1 or a valid CPU number.

    Change perf_event_alloc() to validate this argument early. This
    also means we can remove the similar check in
    find_get_context().
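
    An early check of this kind might look like the sketch below. The
    function name validate_event_cpu, the nr_cpus parameter and the choice
    of -EINVAL as the error are assumptions made for illustration; the
    commit message does not spell out the exact condition.

```c
#include <errno.h>

/* Accept -1 (no specific CPU) or an index in [0, nr_cpus). */
static int validate_event_cpu(int cpu, int nr_cpus)
{
    if (cpu == -1)
        return 0;
    if (cpu < 0 || cpu >= nr_cpus)
        return -EINVAL;
    return 0;
}
```

    Doing this once at allocation time means later code can rely on the
    value without repeating the check.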

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Cc: Alan Stern
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Prasad
    Cc: Roland McGrath
    Cc: gregkh@suse.de
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • If task == NULL, find_get_context() should always check that cpu
    is correct.

    Afaics, the bug was introduced by 38a81da2 "perf events: Clean
    up pid passing", but even before that commit "&& cpu != -1" was
    not exactly right, -ESRCH from find_task_by_vpid() is not
    accurate.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Cc: Alan Stern
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Prasad
    Cc: Roland McGrath
    Cc: gregkh@suse.de
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • While most users of a physical Malta board are using the serial port
    as the console, a lot of QEMU users would prefer to interact with a
    graphical console. Enable the Cirrus FB support in the Malta default
    configuration to make that possible. Note that the default console will
    still be the serial port; users have to pass "console=tty0" to the
    kernel to use the Cirrus FB.

    Signed-off-by: Aurelien Jarno
    To: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/2001/
    Signed-off-by: Ralf Baechle

    Aurelien Jarno
     
  • Add CONFIG_VIRTUALIZATION to the MIPS architecture and include
    the virtio code there. Used to enable the virtio drivers under QEMU.

    Signed-off-by: Aurelien Jarno
    To: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/2002/
    Signed-off-by: Ralf Baechle

    Aurelien Jarno
     
  • Just do what everyone else is doing by placing __read_mostly things in
    the .data.read_mostly section.

    mips_io_port_base can not be read-only (const) and writable
    (__read_mostly) at the same time. One of them has to go, so I chose
    to eliminate the __read_mostly. It will still get stuck in a portion
    of memory that is not adjacent to things that are written, and thus
    not be on a dirty cache line, for whatever that is worth.

    Signed-off-by: David Daney
    To: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/1702/
    Signed-off-by: Ralf Baechle

    David Daney