14 Oct, 2013

5 commits

  • [ Upstream commit 9a3bab6b05383f1e4c3716b3615500c51285959e ]

    A host might need net_secret[] and never open a single socket.

    Problem added in commit aebda156a570782
    ("net: defer net_secret[] initialization")

    Based on prior patch from Hannes Frederic Sowa.

    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit d0fe8c888b1fd1a2f84b9962cabcb98a70988aec ]

    I've been hitting a NULL ptr deref while using netconsole because the
    np->dev check and the pointer manipulation in netpoll_cleanup are done
    without rtnl and the following sequence happens when having a netconsole
    over a vlan and we remove the vlan while disabling the netconsole:
    CPU 1                                   CPU 2
                                            removes vlan and calls the notifier
    enters store_enabled(), calls
    netpoll_cleanup which checks np->dev
    and then waits for rtnl
                                            executes the netconsole netdev
                                            release notifier making np->dev
                                            == NULL and releases rtnl
    continues to dereference a member of
    np->dev which at this point is == NULL

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     
  • [ Upstream commit b0dd663b60944a3ce86430fa35549fb37968bda0 ]

    The received ARP request type in the Ethernet packet header is ETH_P_ARP rather than ETH_P_IP.

    [ Bug introduced by commit b7394d2429c198b1da3d46ac39192e891029ec0f
    ("netpoll: prepare for ipv6") ]

    Signed-off-by: Sonic Zhang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sonic Zhang
     
  • [ Upstream commit b86783587b3d1d552326d955acee37eac48800f1 ]

    In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for
    later usage"), we missed that existing code was using nhoff as a
    temporary variable that does not always contain the transport header
    offset.

    This is not a problem for TCP/UDP because port offset (@poff)
    is 0 for these protocols.

    Signed-off-by: Eric Dumazet
    Cc: Daniel Borkmann
    Cc: Nikolay Aleksandrov
    Acked-by: Nikolay Aleksandrov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 50d1784ee4683f073c0362ee360bfae7a3333d6c ]

    commit 416186fbf8c5b4e4465 ("net: Split core bits of netdev_pick_tx
    into __netdev_pick_tx") added a bug that disables caching of queue
    index in the socket.

    This is the source of packet reorders for TCP flows, and
    again this is happening more often when using FQ pacing.

    Old code was doing

    if (queue_index != old_index)
        sk_tx_queue_set(sk, queue_index);

    Alexander renamed the variables but forgot to change sk_tx_queue_set()
    2nd parameter.

    if (queue_index != new_index)
        sk_tx_queue_set(sk, queue_index);

    This means we store -1 over and over in sk->sk_tx_queue_mapping

    Signed-off-by: Eric Dumazet
    Cc: Alexander Duyck
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
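
The effect of the wrong second argument can be reproduced in a small user-space model. This is only a sketch: `struct sock_model`, `sk_tx_queue_set_model()`, `select_queue_model()` and both `pick_tx_*` functions are hypothetical stand-ins, not the kernel code, and the modulo-based queue selection is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for struct sock: only the cached TX queue index. */
struct sock_model {
    int sk_tx_queue_mapping;    /* -1 means "nothing cached" */
};

static void sk_tx_queue_set_model(struct sock_model *sk, int queue)
{
    sk->sk_tx_queue_mapping = queue;
}

/* Hypothetical selector: map a flow hash onto one of nqueues queues. */
static int select_queue_model(unsigned int flow_hash, int nqueues)
{
    assert(nqueues > 0);
    return (int)(flow_hash % (unsigned int)nqueues);
}

/* Mirrors the buggy pattern: the stale index is written back to the socket. */
static int pick_tx_buggy(struct sock_model *sk, unsigned int flow_hash, int nqueues)
{
    int queue_index = sk->sk_tx_queue_mapping;

    if (queue_index < 0 || queue_index >= nqueues) {
        int new_index = select_queue_model(flow_hash, nqueues);

        if (queue_index != new_index)
            sk_tx_queue_set_model(sk, queue_index);  /* bug: caches the old value */
        queue_index = new_index;
    }
    return queue_index;
}

/* Mirrors the fixed pattern: the freshly selected index is cached. */
static int pick_tx_fixed(struct sock_model *sk, unsigned int flow_hash, int nqueues)
{
    int queue_index = sk->sk_tx_queue_mapping;

    if (queue_index < 0 || queue_index >= nqueues) {
        int new_index = select_queue_model(flow_hash, nqueues);

        if (queue_index != new_index)
            sk_tx_queue_set_model(sk, new_index);    /* cache the new value */
        queue_index = new_index;
    }
    return queue_index;
}
```

In this model the buggy variant leaves the socket's cached index at -1 after every call, so each packet goes through queue selection again; the fixed variant caches the index once and then reuses it.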
     

27 Sep, 2013

1 commit


14 Sep, 2013

5 commits

  • [ Upstream commit 702821f4ea6f68db18aa1de7d8ed62c6ba586a64 ]

    commit 8728c544a9cbdc ("net: dev_pick_tx() fix") and commit
    b6fe83e9525a ("bonding: refine IFF_XMIT_DST_RELEASE capability")
    are quite incompatible : Queue selection is disabled because skb
    dst was dropped before entering bonding device.

    This causes major performance regression, mainly because TCP packets
    for a given flow can be sent to multiple queues.

    This is particularly visible when using the new FQ packet scheduler
    with MQ + FQ setup on the slaves.

    We can safely revert the first commit now that 416186fbf8c5b
    ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx")
    properly caps the queue_index.

    Reported-by: Xi Wang
    Diagnosed-by: Xi Wang
    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Cc: Alexander Duyck
    Cc: Denys Fedorysychenko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 3e805ad288c524bb65aad3f1e004402223d3d504 ]

    Fix the iproute2 command `bridge vlan show`, after switching from
    rtgenmsg to ifinfomsg.

    Let's start with a little history:

    Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in
    the 3.9 merge window.
    In the kernel commit 6cbdceeb, he added attribute support to
    bridge GETLINK requests sent with rtgenmsg.

    Mar 6th: Vlad got this iproute2 reference implementation of the bridge
    vlan netlink interface accepted (iproute2 9eff0e5c)

    Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
    http://patchwork.ozlabs.org/patch/239602/
    http://marc.info/?t=136680900700007

    Apr 28th: Linus released 3.9

    Apr 30th: Stephen released iproute2 3.9.0

    The `bridge vlan show` command hasn't worked since the switch to
    ifinfomsg, nor in any released version of iproute2, because the kernel
    side only supports rtgenmsg, which iproute2 switched away from just
    prior to the iproute2 3.9.0 release.

    I haven't been able to find any documentation about either rtgenmsg
    or ifinfomsg, or about which situation calls for which, but kernel
    commit 88c5b5ce seems to suggest that ifinfomsg should be used.

    Fixing this in the kernel will break compatibility, but I doubt that
    anybody has been using it due to this bug in the user space reference
    implementation, at least not without noticing this bug. That said, the
    functionality still works fully in 3.9 when iproute2 commit 63338dca
    is reverted.

    This could also be fixed in iproute2, but that's an ugly patch that would
    reintroduce rtgenmsg in iproute2, and from searching in netdev it seems
    like rtgenmsg usage is discouraged. I'm assuming that the only reason
    that Vlad implemented the kernel side to use rtgenmsg was that
    iproute2 was using it at the time.

    Signed-off-by: Asbjoern Sloth Toennesen
    Reviewed-by: Vlad Yasevich
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Asbjoern Sloth Toennesen
     
  • [ Upstream commit 645359930231d5e78fd3296a38b98c1a658a7ade ]

    Fix inverted check when deleting an fdb entry.

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sridhar Samudrala
     
  • [ Upstream commit 63134803a6369dcf7dddf7f0d5e37b9566b308d2 ]

    dev->ndo_neigh_setup() might need some of the values of neigh_parms, so
    populate them before calling it.

    Signed-off-by: Veaceslav Falico
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Veaceslav Falico
     
  • [ Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509 ]

    It's possible to assign an invalid value to the net.core.somaxconn
    sysctl variable, because there are no checks at all.

    The sk_max_ack_backlog field of the sock structure is defined as
    unsigned short. Therefore, the backlog argument in inet_listen()
    shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
    is truncated to the somaxconn value. So, the somaxconn value shouldn't
    exceed 65535 (USHRT_MAX).
    Also, negative values of somaxconn are meaningless.

    before:
    $ sysctl -w net.core.somaxconn=256
    net.core.somaxconn = 256
    $ sysctl -w net.core.somaxconn=65536
    net.core.somaxconn = 65536
    $ sysctl -w net.core.somaxconn=-100
    net.core.somaxconn = -100

    after:
    $ sysctl -w net.core.somaxconn=256
    net.core.somaxconn = 256
    $ sysctl -w net.core.somaxconn=65536
    error: "Invalid argument" setting key "net.core.somaxconn"
    $ sysctl -w net.core.somaxconn=-100
    error: "Invalid argument" setting key "net.core.somaxconn"

    Based on a prior patch from Changli Gao.

    Signed-off-by: Roman Gushchin
    Reported-by: Changli Gao
    Suggested-by: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Roman Gushchin
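
The range check described above can be sketched in user space as follows. This is a minimal model under stated assumptions: `set_somaxconn_model()` is a hypothetical helper, not the kernel's sysctl handler, and it only illustrates rejecting values outside 0..USHRT_MAX with -EINVAL while leaving the stored value unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical setter: accept only values that fit in an unsigned short,
 * matching the width of sk_max_ack_backlog described in the message.
 */
static int set_somaxconn_model(int *somaxconn, long value)
{
    assert(somaxconn);

    if (value < 0 || value > USHRT_MAX)
        return -EINVAL;         /* reject; the stored value is untouched */

    *somaxconn = (int)value;
    return 0;
}
```

With this model, 256 and 65535 are accepted, while 65536 and -100 return -EINVAL, which corresponds to the "Invalid argument" errors in the "after" transcript.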
     

29 Jul, 2013

2 commits

  • [ Upstream commit d4b812dea4a236f729526facf97df1a9d18e191c ]

    In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
    ("vlan: don't deliver frames for unknown vlans to protocols")
    Florian made sure we set pkt_type to PACKET_OTHERHOST
    if the vlan id is set and we could not find a vlan device for this
    particular id.

    But we also have a problem if prio bits are set.

    Steinar reported an issue on a router receiving IPv6 frames with a
    vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
    because skb->vlan_tci is set.

    Forwarded frame is completely corrupted : We can see (8100:4000)
    being inserted in the middle of IPv6 source address :

    16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
    9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
    0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
    0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
    0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
    0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

    It seems we are not really ready to properly cope with this right now.

    We can probably do better in future kernels :
    vlan_get_ingress_priority() should be a netdev property instead of
    a per vlan_dev one.

    For stable kernels, let's clear vlan_tci to fix the bugs.

    Reported-by: Steinar H. Gunderson
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit c9ab4d85de222f3390c67aedc9c18a50e767531e ]

    There is a race in neighbour code, because neigh_destroy() uses
    skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
    while other parts of the code assume neighbour rwlock is what
    protects arp_queue

    Convert all skb_queue_purge() calls to the __skb_queue_purge() variant

    Use __skb_queue_head_init() instead of skb_queue_head_init()
    to make clear we do not use arp_queue.lock

    And hold neigh->lock in neigh_destroy() to close the race.

    Reported-by: Joe Jin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

27 Jun, 2013

1 commit

  • When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
    rename of a network interface, it can end up waiting for a workqueue
    to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
    SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
    the fact that read_seqcount_begin() will spin forever waiting for the
    writer process (the one doing the interface rename) to update the
    devnet_rename_seq sequence.

    This patch fixes the problem by adding a helper (netdev_get_name())
    and using it in the code handling the SIOCGIFNAME ioctl and
    SO_BINDTODEVICE getsockopt.

    The netdev_get_name() helper uses raw_seqcount_begin() to avoid
    spinning forever, waiting for devnet_rename_seq->sequence to become
    even. cond_resched() is used in the contended case, before retrying
    the access to give the writer process a chance to finish.

    The use of raw_seqcount_begin() will incur some unneeded work in the
    reader process in the contended case, but this is better than
    deadlocking the system.

    Signed-off-by: Nicolas Schichan
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Nicolas Schichan
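
The reader-side idea can be modelled in single-threaded user-space C. This is a sketch only: the `*_model` names and `get_name_once()` are hypothetical, memory barriers are omitted, and where the message describes cond_resched() followed by a retry, the model simply returns -EAGAIN so the caller can decide when to try again.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define IFNAMSIZ_MODEL 16

/* Hypothetical sequence counter: an odd value means a writer is active. */
struct seqcount_model {
    unsigned int sequence;
};

/* Never spins: returns the counter with the low bit cleared. */
static unsigned int raw_seqcount_begin_model(const struct seqcount_model *s)
{
    return s->sequence & ~1u;
}

/* Nonzero if the counter no longer matches the value seen at begin. */
static int read_seqcount_retry_model(const struct seqcount_model *s,
                                     unsigned int start)
{
    return s->sequence != start;
}

/*
 * One read attempt. If a writer is active (odd sequence), the masked start
 * value can never match, so the copy is discarded and -EAGAIN is returned
 * instead of spinning inside the begin call.
 */
static int get_name_once(const struct seqcount_model *rename_seq,
                         const char *dev_name, char out[IFNAMSIZ_MODEL])
{
    unsigned int seq = raw_seqcount_begin_model(rename_seq);

    assert(strlen(dev_name) < IFNAMSIZ_MODEL);
    strcpy(out, dev_name);      /* possibly wasted work if a rename is active */

    if (read_seqcount_retry_model(rename_seq, seq))
        return -EAGAIN;         /* caller yields, then retries */
    return 0;
}
```

The wasted copy in the contended case is the "unneeded work" the message mentions; the benefit is that the reader always gets back to a point where it can yield to the writer.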
     

26 Jun, 2013

1 commit

  • commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
    added a possible skb leak, because it frees only the head of segment
    list, in case a skb_linearize() call fails.

    This patch adds a kfree_skb_list() helper to fix the bug.

    Signed-off-by: Eric Dumazet
    Cc: Pravin B Shelar
    Cc: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
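
A helper that frees a whole segment list, rather than only its head, can be sketched in user space like this. `struct skb_model`, `alloc_segs_model()` and `kfree_skb_list_model()` are hypothetical names for illustration; the real helper operates on struct sk_buff and is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a segment: only the link to the next one. */
struct skb_model {
    struct skb_model *next;
};

/* Build a singly linked list of n segments (illustration only). */
static struct skb_model *alloc_segs_model(int n)
{
    struct skb_model *head = NULL;

    assert(n >= 0);
    while (n-- > 0) {
        struct skb_model *skb = malloc(sizeof(*skb));

        if (!skb)
            abort();
        skb->next = head;
        head = skb;
    }
    return head;
}

/* Free every segment in the list and return how many were freed. */
static int kfree_skb_list_model(struct skb_model *segs)
{
    int freed = 0;

    while (segs) {
        struct skb_model *next = segs->next;  /* read the link before freeing */

        free(segs);
        segs = next;
        freed++;
    }
    return freed;
}
```

Freeing only the head of a three-segment list would leak the other two segments; walking the list releases all three.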
     

18 Jun, 2013

1 commit

  • As part of the push to add 802.1ad service provider tagging support to the
    kernel the VLAN features flags were renamed. Unfortunately the kernel name
    for the VLAN hardware acceleration features that the kernel shows user space
    was included in the rename, which broke ethtool (txvlan and rxvlan options
    do not work). This patch restores the original names, i.e. the original ABI.
    If we wanted to make clear to users that we are referring to CTAGs we can
    always change ethtool's short_name and long_name for these features (for
    example something along the lines of txvlan -> txvlan-ctag, tx-vlan-offload ->
    tx-vlan-ctag-offload).

    Cc: Patrick McHardy
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Fernando Luis Vazquez Cao
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Fernando Luis Vazquez Cao
     

11 Jun, 2013

1 commit


05 Jun, 2013

1 commit

  • Eric Dumazet spotted that we have to check skb->head instead
    of skb->data as skb->head points to the beginning of the
    data area of the skbuff. Similarly, we have to initialize the
    skb->head pointer, not skb->data in __alloc_skb_head.

    After this fix, netlink crashes in the release path of the
    sk_buff, so let's fix that as well.

    This bug was introduced in (0ebd0ac net: add function to
    allocate sk_buff head without data area).

    Reported-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira
     

01 Jun, 2013

4 commits

  • The dev_mc_sync_multiple function is currently calling
    __hw_addr_sync, and not __hw_addr_sync_multiple. This will result in
    addresses only being synced to the first device from the set.

    Corrected by calling the _multiple variant.

    Signed-off-by: Jay Vosburgh
    Reviewed-by: Vlad Yasevich
    Tested-by: Shawn Bohrer
    Signed-off-by: David S. Miller

    Jay Vosburgh
     
  • Currently, __hw_addr_sync_one is called in a loop by
    __hw_addr_sync_multiple to sync each of a "from" device's hw addresses
    to a "to" device. __hw_addr_sync_one calls __hw_addr_add_ex to attempt
    to add each address. __hw_addr_add_ex is called with global=false, and
    sync=true.

    __hw_addr_add_ex checks to see if the new address matches an
    address already on the list. If so, it tests global and sync. In this
    case, sync=true, and it then checks if the address is already synced,
    and if so, returns 0.

    This 0 return causes __hw_addr_sync_one to increment the sync_cnt
    and refcount for the "from" list's address entry, even though the address
    is already synced and has a reference and sync_cnt. This will cause
    the sync_cnt and refcount to increment without bound every time an
    address is added to the "from" device and synced to the "to" device.

    The fix here has two parts:

    First, when __hw_addr_add_ex finds the address already exists
    and is synced, return -EEXIST instead of 0.

    Second, __hw_addr_sync_one checks the error return for -EEXIST,
    and if so, it (a) does not add a refcount/sync_cnt, and (b) returns 0
    itself so that __hw_addr_sync_multiple will not return an error.

    Signed-off-by: Jay Vosburgh
    Reviewed-by: Vlad Yasevich
    Tested-by: Shawn Bohrer
    Signed-off-by: David S. Miller

    Jay Vosburgh
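
The two-part fix can be illustrated with a reduced user-space model in which each list holds a single address entry. All names here (`struct hw_addr_model`, `add_ex_model()`, `sync_one_model()`) are hypothetical, and the model fixes global=false and sync=true as in the call described above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical single-entry view of an address on a "from" or "to" list. */
struct hw_addr_model {
    int present;    /* entry exists on the list */
    int synced;     /* entry was added by a sync operation */
    int refcount;
    int sync_cnt;
};

/* Part one: report -EEXIST when the address is already present and synced. */
static int add_ex_model(struct hw_addr_model *to)
{
    assert(to);

    if (to->present) {
        if (to->synced)
            return -EEXIST;     /* returning 0 here was the bug */
        to->synced = 1;
        to->refcount++;
        return 0;
    }
    to->present = 1;
    to->synced = 1;
    to->refcount = 1;
    return 0;
}

/* Part two: treat -EEXIST as success, but do not bump the counters again. */
static int sync_one_model(struct hw_addr_model *to, struct hw_addr_model *from)
{
    int err = add_ex_model(to);

    assert(from);
    if (err && err != -EEXIST)
        return err;
    if (!err) {
        from->sync_cnt++;
        from->refcount++;
    }
    return 0;
}
```

Syncing the same address twice in this model leaves the "from" entry with sync_cnt 1 and refcount 2 after both calls, instead of growing on every sync.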
     
  • When an address is added to a subordinate interface (the "to"
    list), the address entry in the "from" list is not marked "synced" as
    the entry added to the "to" list is.

    When performing the unsync operation (e.g., dev_mc_unsync),
    __hw_addr_unsync_one calls __hw_addr_del_entry with the "synced"
    parameter set to true for the case when the address reference is being
    released from the "from" list. This causes a test inside to fail,
    with the result being that the reference count on the "from" address
    is not properly decremented and the address on the "from" list will
    never be freed.

    Correct this by having __hw_addr_unsync_one call the
    __hw_addr_del_entry function with the "sync" flag set to false for the
    "remove from the from list" case.

    Signed-off-by: Jay Vosburgh
    Reviewed-by: Vlad Yasevich
    Tested-by: Shawn Bohrer
    Signed-off-by: David S. Miller

    Jay Vosburgh
     
  • The sync_cnt field is not being initialized, which can result
    in arbitrary values in the field. Fixed by initializing it to zero.

    Signed-off-by: Jay Vosburgh
    Reviewed-by: Vlad Yasevich
    Tested-by: Shawn Bohrer
    Signed-off-by: David S. Miller

    Jay Vosburgh
     

29 May, 2013

1 commit


20 May, 2013

1 commit

  • ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!

    That function is only present with CONFIG_NET. Turns out that
    crypto/algif_skcipher.c also uses that outside net, but it actually
    needs sockets anyway.

    In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added
    CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
    that function and revert that commit too.

    socket.h already includes uio.h, so no callers need updating; trying
    only broke things for x86_64 randconfig (thanks Fengguang!).

    Reported-by: Randy Dunlap
    Acked-by: David S. Miller
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Rusty Russell
     

12 May, 2013

1 commit

  • We have seen multiple NULL dereferences in __inet6_lookup_established()

    After analysis, I found that inet6_sk() could be NULL while the
    check for sk_family == AF_INET6 was true.

    Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP
    and TCP stacks.

    Once an IPv6 socket using SLAB_DESTROY_BY_RCU is inserted in a hash
    table, we can no longer clear the pinet6 field.

    This patch extends logic used in commit fcbdf09d9652c891
    ("net: fix nulls list corruptions in sk_prot_alloc")

    TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
    to make sure we do not clear pinet6 field.

    At socket clone phase, we do not really care, as cloning the parent (non
    NULL) pinet6 is not adding a fatal race.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 May, 2013

1 commit


06 May, 2013

2 commits


03 May, 2013

2 commits


02 May, 2013

5 commits

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Supply a function (proc_remove()) to remove a proc entry (and any subtree
    rooted there) by proc_dir_entry pointer rather than by name and (optionally)
    root dir entry pointer. This allows us to eliminate all remaining pde->name
    accesses outside of procfs.

    Signed-off-by: David Howells
    Acked-by: Grant Likely
    cc: linux-acpi@vger.kernel.org
    cc: openipmi-developer@lists.sourceforge.net
    cc: devicetree-discuss@lists.ozlabs.org
    cc: linux-pci@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     
  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configuring various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in bnx2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to module_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     
  • Bart Van Assche recently reported a warning to me:

    [] warn_slowpath_common+0x7f/0xc0
    [] warn_slowpath_null+0x1a/0x20
    [] mutex_trylock+0x16d/0x180
    [] netpoll_poll_dev+0x49/0xc30
    [] ? __alloc_skb+0x82/0x2a0
    [] netpoll_send_skb_on_dev+0x265/0x410
    [] netpoll_send_udp+0x28a/0x3a0
    [] ? write_msg+0x53/0x110 [netconsole]
    [] write_msg+0xcf/0x110 [netconsole]
    [] call_console_drivers.constprop.17+0xa1/0x1c0
    [] console_unlock+0x2d6/0x450
    [] vprintk_emit+0x1ee/0x510
    [] printk+0x4d/0x4f
    [] scsi_print_command+0x7d/0xe0 [scsi_mod]

    This resulted from my commit ca99ca14c which introduced a mutex_trylock
    operation in a path that could execute in interrupt context. When mutex
    debugging is enabled, the above warns the user when we are in fact
    executing in interrupt context.

    After some discussion, it seems that a semaphore is the proper mechanism to use
    here. While mutexes are defined to be unusable in interrupt context, no such
    condition exists for semaphores (save for the fact that the non blocking api
    calls, like up and down_trylock must be used when in irq context).

    Signed-off-by: Neil Horman
    Reported-by: Bart Van Assche
    CC: Bart Van Assche
    CC: David Miller
    CC: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Neil Horman
     

30 Apr, 2013

5 commits

  • Conflicts:
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
    drivers/net/ethernet/emulex/benet/be.h
    include/net/tcp.h
    net/mac802154/mac802154.h

    Most conflicts were minor overlapping stuff.

    The be2net driver brought in some fixes that added __vlan_put_tag
    calls, which in net-next take an additional argument.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently, peeking on a unix datagram socket with an offset larger than len of
    the data in the sk receive queue returns immediately with bogus data. That's
    because *off is not reset between each skb_queue_walk().

    This patch fixes this so that the behavior is the same as peeking with no
    offset on an empty queue: the caller blocks.

    Signed-off-by: Benjamin Poirier
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Benjamin Poirier
     
  • "77c1090 net: fix infinite loop in __skb_recv_datagram()" (v3.8) introduced a
    regression:
    After that commit, recv can no longer peek beyond a 0-sized skb in the queue.
    __skb_recv_datagram() instead stops at the first skb with len == 0 and results
    in the system call failing with -EFAULT via skb_copy_datagram_iovec().

    When peeking at an offset with 0-sized skb(s), each one of those is received
    only once, in sequence. The offset starts moving forward again after receiving
    datagrams with len > 0.

    Signed-off-by: Benjamin Poirier
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Benjamin Poirier
     
  • Use consume_skb() to free the original skb that is successfully transmitted
    as gso segmented skbs so that it is not treated as a drop due to an error.

    Signed-off-by: Sridhar Samudrala
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Sridhar Samudrala
     
  • Remove duplicate statements by using do-while loop instead of while loop.

    - A;
    - while (e) {
    + do {
          A;
    - }
    + } while (e);

    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
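
As a concrete instance of the A / e pattern above, here is a small user-space sketch. The digit-counting functions are hypothetical examples chosen for illustration and are unrelated to the code the patch actually touched.

```c
#include <assert.h>

/* Before: statement A appears once ahead of the loop and once inside it. */
static int count_digits_while(unsigned int v)
{
    int n = 0;

    n++;                /* A */
    v /= 10;
    while (v) {         /* e */
        n++;            /* A, duplicated */
        v /= 10;
    }
    return n;
}

/* After: the do-while form runs A at least once without duplicating it. */
static int count_digits_do(unsigned int v)
{
    int n = 0;

    do {
        n++;            /* A */
        v /= 10;
    } while (v);        /* e */
    return n;
}

/* Both forms are expected to agree for any input. */
static void check_same(unsigned int v)
{
    assert(count_digits_while(v) == count_digits_do(v));
}
```

The transformation is only valid because A runs unconditionally before the first test of e in the original, which is exactly what a do-while loop expresses.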