01 Apr, 2014

3 commits


31 Mar, 2014

7 commits

  • This patch replaces/reworks the kernel-internal BPF interpreter with
    an optimized BPF instruction set format that is modelled more closely
    on native instruction sets and is designed to be JITed with a
    one-to-one mapping. Thus, the new interpreter is noticeably faster than
    the current implementation of sk_run_filter(), mainly for two reasons:

    1. Fall-through jumps:

    BPF jump instructions are forced to go to either the 'true' or the
    'false' branch, which causes a branch-miss penalty. The new BPF jump
    instructions have only one branch and fall through otherwise,
    which fits the CPU branch predictor logic better. `perf stat`
    shows a drastic difference in branch-misses between the old and
    new code.

    2. Jump-threaded implementation of interpreter vs switch
    statement:

    Instead of a single table-jump at the top of the 'switch' statement,
    gcc will now generate multiple table-jump instructions, which
    helps the CPU branch predictor logic.
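
The two effects described above can be illustrated in user space. The following is a minimal sketch and not the kernel's interpreter: the opcode names, `struct toy_insn` and `toy_run()` are hypothetical, and the code assumes the GCC/Clang "labels as values" extension. Each handler ends in its own indirect jump, and the conditional jump has a single target and falls through otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy opcodes; these are NOT the kernel's opcode values. */
enum toy_op { TOY_ADD_K, TOY_MUL_K, TOY_JGT_K, TOY_RET, TOY_OP_MAX };

/* Hypothetical toy instruction: opcode, jump offset, immediate. */
struct toy_insn {
	uint8_t  op;
	int16_t  off;
	uint32_t k;
};

/*
 * Jump-threaded dispatch: every handler ends in its own indirect jump,
 * so the CPU sees several indirect branches instead of a single one at
 * the top of a switch statement.
 */
static uint32_t toy_run(const struct toy_insn *insn, uint32_t acc)
{
	static const void *jumptable[TOY_OP_MAX] = {
		[TOY_ADD_K] = &&do_add,
		[TOY_MUL_K] = &&do_mul,
		[TOY_JGT_K] = &&do_jgt,
		[TOY_RET]   = &&do_ret,
	};

#define DISPATCH()					\
	do {						\
		assert(insn->op < TOY_OP_MAX);		\
		goto *jumptable[insn->op];		\
	} while (0)

	DISPATCH();

do_add:
	acc += insn->k;
	insn++;
	DISPATCH();
do_mul:
	acc *= insn->k;
	insn++;
	DISPATCH();
do_jgt:
	/* Single branch target; otherwise fall through to the next insn. */
	if (acc > insn->k)
		insn += insn->off;
	insn++;
	DISPATCH();
do_ret:
	return acc;

#undef DISPATCH
}
```

For the toy program { ADD 5; JGT 10, +1; MUL 2; RET }, `toy_run(prog, 0)` takes the fall-through path and returns 10, while `toy_run(prog, 10)` takes the branch, skips the multiply and returns 15.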

    Note that the verification of filters is still being done through
    sk_chk_filter() in classical BPF format, so filters from user- or
    kernel space are verified in the same way as we do now, and same
    restrictions/constraints hold as well.

    We reuse current BPF JIT compilers in a way that this upgrade would
    even be fine as is, but nevertheless allows for a successive upgrade
    of BPF JIT compilers to the new format.

    The internal instruction set migration is being done after the
    probing for JIT compilation, so in case JIT compilers are able to
    create a native opcode image, we're going to use that, and in all
    other cases we're doing a follow-up migration of the BPF program's
    instruction set, so that it can be transparently run in the new
    interpreter.

    In short, the *internal* format extends BPF in the following way (more
    details can be taken from the appended documentation):

    - Number of registers increases from 2 to 10
    - Register width increases from 32-bit to 64-bit
    - Conditional jt/jf targets replaced with jt/fall-through
    - Adds signed > and >= insns
    - 16 4-byte stack slots for register spill-fill replaced
    with up to 512 bytes of multi-use stack space
    - Introduction of bpf_call insn and register passing convention
    for zero overhead calls from/to other kernel functions
    - Adds arithmetic right shift and endianness conversion insns
    - Adds atomic_add insn
    - Old tax/txa insns are replaced with 'mov dst,src' insn

    Performance of two BPF filters generated by libpcap or bpf_asm
    was measured on x86_64, i386 and arm32 (other libpcap programs
    have similar performance differences):

    fprog #1 is taken from Documentation/networking/filter.txt:
    tcpdump -i eth0 port 22 -dd

    fprog #2 is taken from 'man tcpdump':
    tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -dd

    Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the
    same SKB (cache-hit) or 10k SKBs (cache-miss); time in ns per call,
    smaller is better:

    --x86_64--
                 fprog #1    fprog #1    fprog #2    fprog #2
                 cache-hit   cache-miss  cache-hit   cache-miss
    old BPF              90         101         192         202
    new BPF              31          71          47          97
    old BPF jit          12          34          17          44
    new BPF jit         TBD

    --i386--
                 fprog #1    fprog #1    fprog #2    fprog #2
                 cache-hit   cache-miss  cache-hit   cache-miss
    old BPF             107         136         227         252
    new BPF              40         119          69         172

    --arm32--
                 fprog #1    fprog #1    fprog #2    fprog #2
                 cache-hit   cache-miss  cache-hit   cache-miss
    old BPF             202         300         475         540
    new BPF             180         270         330         470
    old BPF jit          26         182          37         202
    new BPF jit         TBD
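
To read these numbers as speedup factors, divide the old time per call by the new one. The helper below is only an illustrative sketch; the function name `speedup()` is hypothetical and is not part of the benchmark.

```c
#include <assert.h>

/*
 * Ratio of old to new time per call. Values above 1.0 mean the new
 * interpreter is faster.
 */
static double speedup(unsigned int old_ns, unsigned int new_ns)
{
	assert(new_ns != 0);
	return (double)old_ns / (double)new_ns;
}
```

With the figures above, `speedup(90, 31)` is roughly 2.9 for fprog #1 on x86_64 (cache-hit), while `speedup(202, 180)` is roughly 1.1 for the same case on arm32.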

    Thus, without changing any userland BPF filters, applications on
    top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf
    classifier, netfilter's xt_bpf, team driver's load-balancing mode,
    and many more will have better interpreter filtering performance.

    While we are replacing the internal BPF interpreter, we also need
    to convert seccomp BPF in the same step to make use of the new
    internal structure since it makes use of lower-level API details
    without being further decoupled through higher-level calls like
    sk_unattached_filter_{create,destroy}(), for example.

    Just as with normal socket filtering, seccomp BPF also experiences
    a time-to-verdict speedup:

    05-sim-long_jumps.c of libseccomp was used as micro-benchmark:

    seccomp_rule_add_exact(ctx,...
    seccomp_rule_add_exact(ctx,...

    rc = seccomp_load(ctx);

    for (i = 0; i < 10000000; i++)
            syscall(199, 100);

    'short filter' has 2 rules
    'large filter' has 200 rules

    'short filter' performance is slightly better on x86_64/i386/arm32
    'large filter' is much faster on x86_64 and i386 and shows no
    difference on arm32

    --x86_64-- short filter
    old BPF: 2.7 sec
    39.12% bench libc-2.15.so [.] syscall
    8.10% bench [kernel.kallsyms] [k] sk_run_filter
    6.31% bench [kernel.kallsyms] [k] system_call
    5.59% bench [kernel.kallsyms] [k] trace_hardirqs_on_caller
    4.37% bench [kernel.kallsyms] [k] trace_hardirqs_off_caller
    3.70% bench [kernel.kallsyms] [k] __secure_computing
    3.67% bench [kernel.kallsyms] [k] lock_is_held
    3.03% bench [kernel.kallsyms] [k] seccomp_bpf_load
    new BPF: 2.58 sec
    42.05% bench libc-2.15.so [.] syscall
    6.91% bench [kernel.kallsyms] [k] system_call
    6.25% bench [kernel.kallsyms] [k] trace_hardirqs_on_caller
    6.07% bench [kernel.kallsyms] [k] __secure_computing
    5.08% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp

    --arm32-- short filter
    old BPF: 4.0 sec
    39.92% bench [kernel.kallsyms] [k] vector_swi
    16.60% bench [kernel.kallsyms] [k] sk_run_filter
    14.66% bench libc-2.17.so [.] syscall
    5.42% bench [kernel.kallsyms] [k] seccomp_bpf_load
    5.10% bench [kernel.kallsyms] [k] __secure_computing
    new BPF: 3.7 sec
    35.93% bench [kernel.kallsyms] [k] vector_swi
    21.89% bench libc-2.17.so [.] syscall
    13.45% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp
    6.25% bench [kernel.kallsyms] [k] __secure_computing
    3.96% bench [kernel.kallsyms] [k] syscall_trace_exit

    --x86_64-- large filter
    old BPF: 8.6 seconds
    73.38% bench [kernel.kallsyms] [k] sk_run_filter
    10.70% bench libc-2.15.so [.] syscall
    5.09% bench [kernel.kallsyms] [k] seccomp_bpf_load
    1.97% bench [kernel.kallsyms] [k] system_call
    new BPF: 5.7 seconds
    66.20% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp
    16.75% bench libc-2.15.so [.] syscall
    3.31% bench [kernel.kallsyms] [k] system_call
    2.88% bench [kernel.kallsyms] [k] __secure_computing

    --i386-- large filter
    old BPF: 5.4 sec
    new BPF: 3.8 sec

    --arm32-- large filter
    old BPF: 13.5 sec
    73.88% bench [kernel.kallsyms] [k] sk_run_filter
    10.29% bench [kernel.kallsyms] [k] vector_swi
    6.46% bench libc-2.17.so [.] syscall
    2.94% bench [kernel.kallsyms] [k] seccomp_bpf_load
    1.19% bench [kernel.kallsyms] [k] __secure_computing
    0.87% bench [kernel.kallsyms] [k] sys_getuid
    new BPF: 13.5 sec
    76.08% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp
    10.98% bench [kernel.kallsyms] [k] vector_swi
    5.87% bench libc-2.17.so [.] syscall
    1.77% bench [kernel.kallsyms] [k] __secure_computing
    0.93% bench [kernel.kallsyms] [k] sys_getuid

    BPF filters generated by seccomp are very branchy, so the new
    internal BPF performance is better than the old one. Performance
    gains will be even higher when BPF JIT is committed for the
    new structure, which is planned in future work (as successive
    JIT migrations).

    BPF has also been stress-tested with trinity's BPF fuzzer.

    Joint work with Daniel Borkmann.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Cc: Hagen Paul Pfeifer
    Cc: Kees Cook
    Cc: Paul Moore
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: linux-kernel@vger.kernel.org
    Acked-by: Kees Cook
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • As with ppp, we need to migrate the ISDN/PPP code to make use
    of the sk_unattached_filter api in order to decouple it from direct
    filter structure access. By using sk_unattached_filter_{create,destroy},
    we also make it possible to jit compile filters for faster
    filter verdicts.

    Joint work with Alexei Starovoitov.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Cc: Karsten Keil
    Cc: isdn4linux@listserv.isdn4linux.de
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • There are currently pch_gbe, cpts, and ixp4xx_eth drivers that open-code
    and reimplement a BPF classifier for the PTP protocol. Since all of them
    effectively do the very same thing and load the very same PTP/BPF filter,
    we can just consolidate that code by introducing ptp_classify_raw() in
    the time-stamping core framework which can be used in drivers.

    As drivers get initialized after bootstrapping the core networking
    subsystem, they can make use of ptp_insns wrapped through
    ptp_classify_raw(), which allows us to simplify and remove PTP classifier
    setup code in drivers.

    Joint work with Alexei Starovoitov.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Cc: Richard Cochran
    Cc: Jiri Benc
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This patch migrates an open-coded sk_run_filter() implementation to
    proper use of the BPF API, that is, sk_unattached_filter_create(). This
    migration is needed, as we will be internally transforming the filter
    to a different representation, and therefore it needs to be decoupled.

    It is okay to do so as skb_timestamping_init() is called during
    initialization of the network stack in core initcall via sock_init().
    This would effectively also allow for PTP filters to be jit compiled if
    bpf_jit_enable is set.

    For better readability, some newlines are introduced as well; also,
    ptp_classify.h is only in kernel space.

    Joint work with Alexei Starovoitov.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Cc: Richard Cochran
    Cc: Jiri Benc
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This patch basically does two things, i) removes the extern keyword
    from the include/linux/filter.h file to be more consistent with the
    rest of Joe's changes, and ii) moves filter accounting into the filter
    core framework.

    Filter accounting, mainly done through sk_filter_{un,}charge(), takes
    care of the case when sockets are being cloned through sk_clone_lock(),
    so that removal of the filter on one socket won't result in eviction
    as it's still referenced by the other.

    These functions actually belong in net/core/filter.c and not
    include/net/sock.h as we want to keep all that in a central place.
    They are also not in the fast path, so uninlining them is fine and even
    allows us to get rid of sk_filter_release_rcu()'s EXPORT_SYMBOL and a
    forward declaration.

    Joint work with Alexei Starovoitov.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Cc: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • In order to open up the possibility to internally transform a BPF program
    into an alternative and possibly not trivially reversible representation, we
    need to keep the original BPF program around, so that it can be passed back
    to user space without the need for a complex decoder.

    The reason for that use case resides in commit a8fc92778080 ("sk-filter:
    Add ability to get socket filter program (v2)"), that is, the ability
    to retrieve the currently attached BPF filter from a given socket used
    mainly by the checkpoint-restore project, for example.

    Therefore, we add two helpers sk_{store,release}_orig_filter for taking
    care of that. In the sk_unattached_filter_create() case, there's no such
    possibility/requirement to retrieve a loaded BPF program. Therefore, we
    can spare ourselves the work in that case.

    This approach will simplify and slightly speed up both the sk_get_filter()
    and sock_diag_put_filterinfo() handlers as we won't need to successively
    decode filters anymore through sk_decode_filter(). As we still need
    sk_decode_filter() later on, we're keeping it around.

    Joint work with Alexei Starovoitov.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Cc: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This patch adds a jited flag into sk_filter struct in order to indicate
    whether a filter is currently jited or not. The size of sk_filter is
    not being expanded as the 32 bit 'len' member allows upper bits to be
    reused since a filter can currently only grow as large as BPF_MAXINSNS.

    Therefore, there is also enough room for other flags needed in the future
    to reuse the 'len' field if necessary. The jited flag also allows for
    having alternative interpreter functions, since currently we can only
    detect jit compiled filters by testing that fp->bpf_func does not equal
    the address of sk_run_filter().
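
The size argument can be sketched in user space as follows. This is an illustration under assumptions, not the actual sk_filter definition: the struct names, the helper `toy_set_len()` and the exact bit split (1 bit for the flag, 31 bits for the length) are hypothetical. The limit of 4096 instructions is the value of BPF_MAXINSNS in the classic BPF UAPI.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BPF_MAXINSNS 4096	/* value of BPF_MAXINSNS */

/* Before: the whole 32-bit word holds the instruction count. */
struct toy_filter_old {
	uint32_t len;
};

/* After: one bit is carved out of the same word for the flag. */
struct toy_filter_new {
	uint32_t jited:1,
		 len:31;
};

/* Store a length, checking that it respects the instruction limit. */
static void toy_set_len(struct toy_filter_new *fp, uint32_t len)
{
	assert(len <= TOY_BPF_MAXINSNS);
	fp->len = len;
}
```

Since the largest permitted length needs only 13 bits, both structs occupy the same 4 bytes with GCC and Clang on common ABIs, and the remaining upper bits stay free for further flags.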

    Joint work with Alexei Starovoitov.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Cc: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

30 Mar, 2014

4 commits

  • Conflicts:
    drivers/net/ethernet/marvell/mvneta.c

    The mvneta.c conflict is a case of overlapping changes,
    a conversion to devm_ioremap_resource() vs. a conversion
    to netdev_alloc_pcpu_stats.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Stop taking the transmit lock when a network device has specified
    NETIF_F_LLTX.

    If no locks are needed to transmit a packet, this is the ideal scenario
    for netpoll as all packets can be transmitted immediately.

    Even if some locks are needed in ndo_start_xmit, skipping any unnecessary
    serialization is desirable for netpoll as it makes it more likely a
    debugging packet may be transmitted immediately instead of being
    deferred until later.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • The netpoll_rx_enable and netpoll_rx_disable functions have always
    controlled polling the network drivers transmit and receive queues.

    Rename them to netpoll_poll_enable and netpoll_poll_disable to make
    their functionality clear.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • The gfp parameter was added in:
    commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af
    Author: Amerigo Wang
    Date: Fri Aug 10 01:24:37 2012 +0000

    netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

    slave_enable_netpoll() and __netpoll_setup() may be called
    with read_lock() held, so should use GFP_ATOMIC to allocate
    memory. Eric suggested to pass gfp flags to __netpoll_setup().

    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Reported-by: Dan Carpenter
    Signed-off-by: Eric Dumazet
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    The reason for the gfp parameter was removed in:
    commit c4cdef9b7183159c23c7302aaf270d64c549f557
    Author: dingtianhong
    Date: Tue Jul 23 15:25:27 2013 +0800

    bonding: don't call slave_xxx_netpoll under spinlocks

    The slave_xxx_netpoll will call synchronize_rcu_bh(),
    so the function may schedule and sleep, it shouldn't be
    called under spinlocks.

    bond_netpoll_setup() and bond_netpoll_cleanup() are always
    protected by rtnl lock, it is no need to take the read lock,
    as the slave list couldn't be changed outside rtnl lock.

    Signed-off-by: Ding Tianhong
    Cc: Jay Vosburgh
    Cc: Andy Gospodarek
    Signed-off-by: David S. Miller

    Nothing else that calls __netpoll_setup or ndo_netpoll_setup
    requires a gfp parameter, so remove the gfp parameter from both
    of these functions making the code clearer.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

29 Mar, 2014

5 commits

  • Some drivers incorrectly assign vlan acceleration features to
    vlan_features thus causing issues for Q-in-Q vlan configurations.
    Warn the user of such cases.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • skb_network_protocol() already accounts for multiple vlan
    headers that may be present in the skb. However, skb_mac_gso_segment()
    doesn't know anything about it and assumes that skb->mac_len
    is set correctly to skip all mac headers. That may not
    always be the case. If we are simply forwarding the packet (via
    bridge or macvtap), all vlan headers may not be accounted for.

    A simple solution is to allow skb_network_protocol to return
    the vlan depth it has calculated. This way skb_mac_gso_segment
    will correctly skip all mac headers.
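
The idea of returning the computed depth together with the protocol can be sketched in user space. This is a simplified illustration that works on a raw frame buffer rather than an skb; the function name `toy_network_protocol()` and the `TOY_` constants are hypothetical, although the numeric values match the usual Ethernet and VLAN definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN     14	/* dst MAC + src MAC + EtherType */
#define TOY_VLAN_HLEN     4	/* TCI + encapsulated EtherType */
#define TOY_ETH_P_8021Q  0x8100
#define TOY_ETH_P_8021AD 0x88A8

/* Read a big-endian 16-bit value. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Walk all stacked VLAN tags. Returns the innermost EtherType and stores
 * the total L2 header length in *depth. Returns 0 if the frame is
 * truncated, in which case *depth is left untouched.
 */
static uint16_t toy_network_protocol(const uint8_t *frame, size_t len,
				     size_t *depth)
{
	size_t hlen = TOY_ETH_HLEN;
	uint16_t proto;

	assert(frame != NULL && depth != NULL);
	if (len < TOY_ETH_HLEN)
		return 0;

	proto = read_be16(frame + 12);
	while (proto == TOY_ETH_P_8021Q || proto == TOY_ETH_P_8021AD) {
		if (len < hlen + TOY_VLAN_HLEN)
			return 0;
		/* The encapsulated EtherType is the last 2 bytes of the tag. */
		proto = read_be16(frame + hlen + 2);
		hlen += TOY_VLAN_HLEN;
	}

	*depth = hlen;
	return proto;
}
```

A caller that needs to skip all MAC-level headers can then use the returned depth instead of trusting a precomputed header length.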

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Dropping packets in __dev_queue_xmit() when transmit queue
    is stopped (NIC TX ring buffer full or BQL limit reached) currently
    outputs a syslog message.

    It would be better to get a precise count of such events available in
    netdevice stats so that monitoring tools can have a clue.

    This extends the work done in caf586e5f23ce
    ("net: add a core netdev->rx_dropped counter")

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Add implementation for the add/del vxlan port ndo calls, using the
    CONFIG_DEV firmware command.

    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Or Gerlitz
     
  • Introduce the CONFIG_DEV firmware command which we will use to
    configure the UDP port assumed by the firmware for the VXLAN offloads.

    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Or Gerlitz
     

28 Mar, 2014

2 commits

  • skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
    orphan them. Also, it doesn't handle errors, so this patch takes care of that
    as well, and modifies the callers accordingly. skb_tx_error() is also added to
    the callers so they will signal the failed delivery towards the creator of the
    skb.

    Signed-off-by: Zoltan Kiss
    Signed-off-by: David S. Miller

    Zoltan Kiss
     
  • This fixes a race which happens by freeing an object on the stack.
    Quoting Julius:
    > The issue is
    > that it calls usbnet_terminate_urbs() before that, which temporarily
    > installs a waitqueue in dev->wait in order to be able to wait on the
    > tasklet to run and finish up some queues. The waiting itself looks
    > okay, but the access to 'dev->wait' is totally unprotected and can
    > race arbitrarily. I think in this case usbnet_bh() managed to succeed
    > it's dev->wait check just before usbnet_terminate_urbs() sets it back
    > to NULL. The latter then finishes and the waitqueue_t structure on its
    > stack gets overwritten by other functions halfway through the
    > wake_up() call in usbnet_bh().

    The fix is to just not allocate the data structure on the stack.
    As dev->wait is abused as a flag it also takes a runtime PM change
    to fix this bug.

    Signed-off-by: Oliver Neukum
    Reported-by: Grant Grundler
    Tested-by: Grant Grundler
    Signed-off-by: David S. Miller

    Oliver Neukum
     

27 Mar, 2014

3 commits

  • This patch adds support for Samsung 10Gb ethernet driver (sxgbe).

    - sxgbe core initialization
    - Tx and Rx support
    - MDIO support
    - ISRs for Tx and Rx
    - ifconfig support to driver

    Signed-off-by: Siva Reddy Kallam
    Signed-off-by: Vipul Pandya
    Signed-off-by: Girish K S
    Neatening-by: Joe Perches
    Signed-off-by: Byungho An
    Signed-off-by: David S. Miller

    Siva Reddy
     
  • VLAN supports two protocols, 802.1q and 802.1ad, so add a new function
    called vlan_dev_vlan_proto() which returns the vlan proto for the
    input dev.

    Signed-off-by: Ding Tianhong
    Signed-off-by: David S. Miller

    dingtianhong
     
  • The packet hash can be considered a property of the packet, not just
    on RX path.

    This patch changes name of rxhash and l4_rxhash skbuff fields to be
    hash and l4_hash respectively. This includes changing uses of the
    field in the code which don't call the access functions.

    Signed-off-by: Tom Herbert
    Signed-off-by: Eric Dumazet
    Cc: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Tom Herbert
     

26 Mar, 2014

2 commits

  • Conflicts:
    Documentation/devicetree/bindings/net/micrel-ks8851.txt
    net/core/netpoll.c

    The net/core/netpoll.c conflict is a bug fix in 'net' happening
    to code which is completely removed in 'net-next'.

    In micrel-ks8851.txt we simply have overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • John W. Linville says:

    ====================
    Please pull this batch of wireless updates intended for 3.15!

    For the mac80211 bits, Johannes says:

    "This has a whole bunch of bugfixes for things that went into -next
    previously as well as some other bugfixes I didn't want to rush into
    3.14 at this point. The rest of it is some cleanups and a few small
    features, the biggest of which is probably Janusz's regulatory DFS CAC
    time code."

    For the Bluetooth bits, Gustavo says:

    "One more pull request to 3.15. This is mostly a bug fix pull request, it
    contains several fixes and clean up all over the tree, plus some small new
    features."

    For the NFC bits, Samuel says:

    "This is the NFC pull request for 3.15. With this one we have:

    - Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
    15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
    now supports those through the NFC netlink and digital APIs.

    - Support for TI's trf7970a chipset. This chipset relies on the NFC
    digital layer and the driver currently supports type 2, 4A and 5 tags.

    - Support for NXP's pn544 secure firmware download. The pn544 C3 chipsets
    rely on a different firmware download protocol than the C2 one. We
    now support both and use the right one depending on the version we
    detect at runtime.

    - Support for 4A tags from the NFC digital layer.

    - A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."

    For the iwlwifi bits, Emmanuel says:

    "We were sending a host command while the mutex wasn't held. This
    led to hard-to-catch races."

    And...

    "I have a fix for a "merge damage" which is not really a merge
    damage: it enables scheduled scan which has been disabled in
    wireless.git. Since you merged wireless.git into wireless-next.git,
    this can now be fixed in wireless-next.git.

    Besides this, Alex made a workaround for a hardware bug. This fix
    allows us to consume less power in S3. Arik and Eliad continue to
    work on D0i3 which is a run-time power saving feature. Eliad also
    contributes a few bits to the rate scaling logic to which Eyal adds his
    own contribution. Avri dives deep in the power code - newer firmware
    will allow enabling power save in newer scenarios. Johannes made a few
    clean-ups. I have the regular amount of BT Coex boring stuff. I disable
    uAPSD since we identified firmware bugs that cause packet loss. One
    thing that does stand out is the udev event that we now send when the
    FW asserts. I hope it will allow us to debug the FW more easily."

    Also included is one last iwlwifi pull for a build breakage fix...

    For the Atheros bits, Kalle says:

    "Michal now did some optimisations and was able to improve throughput by
    100 Mbps on our MIPS based AP135 platform. Chun-Yeow added some
    workarounds to be able to better use ad-hoc mode. Ben improved log
    messages and added support for MSDU chaining. And, as usual, also some
    smaller fixes."

    Beyond that...

    Andrea Merello continues his rtl8180 refactoring, in preparation for
    a long-awaited rtl8187 driver. We get a new driver (rsi) for the
    RS9113 chip, from Fariya Fatima. And, of course, we get the usual
    round of updates for ath9k, brcmfmac, mwifiex, wil6210, etc. as well.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Mar, 2014

2 commits

  • Replace kfree_skb with dev_kfree_skb_any in vlan_insert_tag as
    vlan_insert_tag can be called from hard irq context (netpoll)
    and from other contexts.

    dev_kfree_skb_any is used as vlan_insert_tag only frees the skb if the
    skb can not be modified to insert a tag, in which case vlan_insert_tag
    drops the skb.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Pull networking fixes from David Miller:

    1) OpenVswitch's lookup_datapath() returns error pointers, so don't
    check against NULL. From Jiri Pirko.

    2) pfkey_compile_policy() code path tries to do a GFP_KERNEL allocation
    under RCU locks, fix by using GFP_ATOMIC when necessary. From
    Nikolay Aleksandrov.

    3) phy_suspend() indirectly passes uninitialized data into the ethtool
    get wake-on-lan implementations. Fix from Sebastian Hesselbarth.

    4) CPSW driver unregisters CPTS twice, fix from Benedikt Spranger.

    5) If SKB allocation of reply packet fails, vxlan's arp_reduce()
    dereferences a NULL pointer. Fix from David Stevens.

    6) IPV6 neigh handling in vxlan doesn't validate the destination
    address properly, and it builds a packet with the src and dst
    reversed. Fix also from David Stevens.

    7) Fix spinlock recursion during subscription failures in TIPC stack,
    from Erik Hugne.

    8) Revert buggy conversion of davinci_emac to devm_request_irq, from
    Christian Riesch.

    9) Wrong flags passed into forwarding database netlink notifications,
    from Nicolas Dichtel.

    10) The netpoll neighbour solicitation handler checks wrong ethertype,
    needs to be ETH_P_IPV6 rather than ETH_P_ARP. Fix from Li RongQing.
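
Item 1 refers to the kernel's error-pointer convention, where a function encodes a negative errno in the returned pointer instead of returning NULL. The following user-space sketch re-creates the idiom for illustration: the `toy_` helpers mimic the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and `toy_lookup_datapath()` is a hypothetical stand-in, not the Open vSwitch function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top TOY_MAX_ERRNO addresses. */
static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Decode the errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
	assert(toy_is_err(ptr));
	return (long)(intptr_t)ptr;
}

struct toy_datapath {
	int ifindex;
};

static struct toy_datapath toy_dp = { .ifindex = 1 };

/* Hypothetical lookup: returns an error pointer, never NULL, on failure. */
static struct toy_datapath *toy_lookup_datapath(int ifindex)
{
	if (ifindex != toy_dp.ifindex)
		return toy_err_ptr(-ENODEV);
	return &toy_dp;
}
```

A caller that only tests the result against NULL would treat the failure case as success here, because the error pointer is non-NULL; the check has to use the `toy_is_err()` style of test instead.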

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
    tipc: fix spinlock recursion bug for failed subscriptions
    vxlan: fix nonfunctional neigh_reduce()
    net: davinci_emac: Fix rollback of emac_dev_open()
    net: davinci_emac: Replace devm_request_irq with request_irq
    netpoll: fix the skb check in pkt_is_ns
    net: micrel : ks8851-ml: add vdd-supply support
    ip6mr: fix mfc notification flags
    ipmr: fix mfc notification flags
    rtnetlink: fix fdb notification flags
    tcp: syncookies: do not use getnstimeofday()
    netlink: fix setsockopt in mmap examples in documentation
    openvswitch: Correctly report flow used times for first 5 minutes after boot.
    via-rhine: Disable device in error path
    ATHEROS-ATL1E: Convert iounmap to pci_iounmap
    vxlan: fix potential NULL dereference in arp_reduce()
    cnic: Update version to 2.5.20 and copyright year.
    cnic,bnx2i,bnx2fc: Fix inconsistent use of page size
    cnic: Use proper ulp_ops for per device operations.
    net: cdc_ncm: fix control message ordering
    ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly
    ...

    Linus Torvalds
     

22 Mar, 2014

2 commits


21 Mar, 2014

7 commits

  • …it/rostedt/linux-trace

    Pull trace fix from Steven Rostedt:
    "Vaibhav Nagarnaik discovered that since 3.10 a clean-up patch made the
    array index in the trace event format bogus.

    He supplied an elegant solution that uses __stringify() and also
    removes the need for the event_storage and event_storage_mutex and
    also cuts off a few K of overhead from the trace events"

    * tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix array size mismatch in format string

    Linus Torvalds
     
  • Add remove_linear_migration_ptes_from_nonlinear(), to fix an interesting
    little include/linux/swapops.h:131 BUG_ON(!PageLocked) found by trinity:
    indicating that remove_migration_ptes() failed to find one of the
    migration entries that was temporarily inserted.

    The problem comes from remap_file_pages()'s switch from vma_interval_tree
    (good for inserting the migration entry) to i_mmap_nonlinear list (no good
    for locating it again); but can only be a problem if the remap_file_pages()
    range does not cover the whole of the vma (zap_pte() clears the range).

    remove_migration_ptes() needs a file_nonlinear method to go down the
    i_mmap_nonlinear list, applying linear location to look for migration
    entries in those vmas too, just in case there was this race.

    The file_nonlinear method does need rmap_walk_control.arg to do this;
    but it never needed vma passed in - vma comes from its own iteration.

    Reported-and-tested-by: Dave Jones
    Reported-and-tested-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • According to "Universal Serial Bus Communications Class Subclass
    Specification for Mobile Broadband Interface Model, Revision 1.0,
    Errata-1" published by USB-IF, the wMTU field of the MBIM extended
    functional descriptor indicates the operator preferred MTU for IP data
    streams.

    This patch modifies cdc_ncm_setup to ensure that the MTU value set on
    the usbnet device does not exceed the operator preferred MTU indicated
    by wMTU if the MBIM device exposes a MBIM extended functional
    descriptor.
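
The clamping rule can be sketched as a small pure function. This is an assumption-laden illustration and not the cdc_ncm code: the name `toy_clamp_mtu()` and its parameters are hypothetical, byte-order conversion of the descriptor field is omitted, and treating a wMTU of 0 as "no preference" is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the MTU to set on the net device. The operator preferred MTU
 * only applies when an MBIM extended functional descriptor is present,
 * and it can only lower the device MTU, never raise it.
 */
static uint32_t toy_clamp_mtu(uint32_t dev_mtu, int has_mbim_ext_desc,
			      uint16_t wmtu)
{
	assert(dev_mtu > 0);
	if (has_mbim_ext_desc && wmtu != 0 && wmtu < dev_mtu)
		return wmtu;
	return dev_mtu;
}
```

For example, a device MTU of 1500 with a wMTU of 1428 yields 1428, while the same device without the extended descriptor keeps 1500.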

    Signed-off-by: Ben Chan
    Signed-off-by: David S. Miller

    Ben Chan
     
  • Adds support for N-Port VFs, this includes:
    1. Adding support in the wrapped FW command
       In wrapped commands, we need to verify and convert
       the slave's port into the real physical port.
       Furthermore, when sending the response back to the slave,
       a reverse conversion should be made.
    2. Adjusting sqpn for QP1 para-virtualization
       The slave assumes that sqpn is used for QP1 communication.
       If the slave is assigned to a port != (first port), we need
       to adjust the sqpn that will direct its QP1 packets into the
       correct endpoint.
    3. Adjusting gid[5] to modify the port for raw ethernet
       In B0 steering, gid[5] contains the port. It needs
       to be adjusted into the physical port.
    4. Adjusting number of ports in the query / ports caps in the FW commands
       When a slave queries the hardware, it needs to view only
       the physical ports it's assigned to.
    5. Adjusting the sched_qp according to the port number
       The QP port is encoded in the sched_qp, thus in modify_qp we need
       to encode the correct port in sched_qp.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Matan Barak
     
  • This patch adds the following utils:
    1. Convert slave_id -> VF
    2. Get the active ports by slave_id
    3. Convert slave's port to real port
    4. Get the slave's port from real port
    5. Get all slaves that use the i'th real port
    6. Get all slaves that use the i'th real port exclusively

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Matan Barak
     
  • Adds the required data structures to support VFs with N (1 or 2)
    ports instead of always using the number of physical ports.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Matan Barak
     
  • In event format strings, the array size is reported in two locations:
    once in the array subscript and once via the "size:" attribute. The
    values reported there do not match.

    For example, in sched:sched_switch the prev_comm and next_comm character
    arrays have subscript values of [32] whereas the actual field size is
    16.

    name: sched_switch
    ID: 301
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:char prev_comm[32]; offset:8; size:16; signed:1;
    field:pid_t prev_pid; offset:24; size:4; signed:1;
    field:int prev_prio; offset:28; size:4; signed:1;
    field:long prev_state; offset:32; size:8; signed:1;
    field:char next_comm[32]; offset:40; size:16; signed:1;
    field:pid_t next_pid; offset:56; size:4; signed:1;
    field:int next_prio; offset:60; size:4; signed:1;

    After bisection, the following commit was blamed:
    92edca0 tracing: Use direct field, type and system names

    This commit removes the duplication of strings for field->name and
    field->type assuming that all the strings passed in
    __trace_define_field() are immutable. This is not true for arrays, where
    the type string is created in event_storage variable and field->type for
    all array fields points to event_storage.

    Use __stringify() to create a string constant for the type string.

    Also, get rid of event_storage and event_storage_mutex that are not
    needed anymore.
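
The compile-time approach can be sketched in user space. The macros below mimic the two-level expansion used by the kernel's __stringify(); the `TOY_` names and `toy_comm_type()` are hypothetical and only show how an array type string becomes an immutable string literal instead of being formatted into a shared buffer at run time.

```c
#include <assert.h>
#include <string.h>

/* Two-level expansion so that macro arguments are expanded before '#'. */
#define TOY_STRINGIFY_1(x) #x
#define TOY_STRINGIFY(x)   TOY_STRINGIFY_1(x)

#define TOY_TASK_COMM_LEN 16

/* Builds e.g. "char[16]" as a string literal at compile time. */
#define TOY_ARRAY_TYPE(type, len) #type "[" TOY_STRINGIFY(len) "]"

/* Each use site gets its own constant string; no shared storage needed. */
static const char *toy_comm_type(void)
{
	static const char type[] = TOY_ARRAY_TYPE(char, TOY_TASK_COMM_LEN);

	assert(strlen(type) > 0);
	return type;
}
```

Because the length macro is expanded before stringification, `toy_comm_type()` yields "char[16]" rather than "char[TOY_TASK_COMM_LEN]", and the string can never be overwritten by a later field definition.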

    Also, an added benefit is that this reduces the overhead of events a bit more:

       text    data     bss      dec    hex filename
    8424787 2036472 1302528 11763787 b3804b vmlinux
    8420814 2036408 1302528 11759750 b37086 vmlinux.patched

    Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

    Cc: Laurent Chavey
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

20 Mar, 2014

2 commits


19 Mar, 2014

1 commit

  • This is a context modified revert of commit 6a9612e2cb22
    ("net: cdc_ncm: remove ncm_parm field") which introduced
    a NCM specification violation, causing setup errors for
    some devices. These errors resulted in the device and
    host disagreeing about shared settings, with complete
    failure to communicate as the end result.

    The NCM specification requires that many of the NCM specific
    control requests are sent only while the NCM Data Interface
    is in alternate setting 0. Reverting the commit ensures that
    we follow this requirement.

    Fixes: 6a9612e2cb22 ("net: cdc_ncm: remove ncm_parm field")
    Reported-and-tested-by: Pasi Kärkkäinen
    Reported-by: Thomas Schäfer
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller

    Bjørn Mork