10 Jan, 2014

5 commits


08 Jan, 2014

6 commits


07 Jan, 2014

10 commits

  • GRO/GSO layers can be enabled on a node, even if said
    node is only forwarding packets.

    This patch permits GSO (and upcoming GRO) support for GRE
    encapsulated packets, even if the host has no GRE tunnel setup.

    Signed-off-by: Eric Dumazet
    Cc: H.K. Jerry Chu
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Jesse Gross says:

    ====================
    [GIT net-next] Open vSwitch

    Open vSwitch changes for net-next/3.14. Highlights are:
    * Performance improvements in the mechanism to get packets to userspace
    using memory mapped netlink and skb zero copy where appropriate.
    * Per-cpu flow stats in situations where flows are likely to be shared
    across CPUs. Standard flow stats are used in other situations to save
    memory and allocation time.
    * A handful of code cleanups and rationalization.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Drop user features if an outdated user space instance that does not
    understand the concept of user_features attempts to create a new
    datapath.

    Signed-off-by: Thomas Graf
    Signed-off-by: Jesse Gross

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Reviewed-by: Daniel Borkmann
    Signed-off-by: Jesse Gross

    Thomas Graf
     
  • Make the skb zerocopy logic written for nfnetlink queue available for
    use by other modules.

    Signed-off-by: Thomas Graf
    Reviewed-by: Daniel Borkmann
    Acked-by: David S. Miller
    Signed-off-by: Jesse Gross

    Thomas Graf
     
  • Allocates a new sk_buff large enough to cover the specified payload
    plus required Netlink headers. Will check receiving socket for
    memory mapped i/o capability and use it if enabled. Will fall back
    to non-mapped skb if message size exceeds the frame size of the ring.

    Signed-off-by: Thomas Graf
    Reviewed-by: Daniel Borkmann
    Signed-off-by: Jesse Gross

    Thomas Graf
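
    The fallback described in the commit above can be modelled in user space
    roughly as follows. This is a sketch only: `Receiver` and
    `choose_allocation` are hypothetical names, and the comparison ignores any
    per-frame ring header overhead that the kernel may account for.

```python
from dataclasses import dataclass

# Size of struct nlmsghdr in bytes (u32 + u16 + u16 + u32 + u32).
NLMSG_HDRLEN = 16


@dataclass
class Receiver:
    """Hypothetical stand-in for the receiving Netlink socket."""
    mmap_enabled: bool      # socket has a memory mapped RX ring
    ring_frame_size: int    # bytes available per ring frame


def choose_allocation(payload_len: int, rx: Receiver) -> str:
    """Return "mmap" if the message fits a ring frame, else "regular"."""
    needed = NLMSG_HDRLEN + payload_len
    if rx.mmap_enabled and needed <= rx.ring_frame_size:
        return "mmap"
    # Either the socket has no ring or the message is too large for a frame.
    return "regular"
```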
     
  • Conflicts:
    drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
    net/ipv6/ip6_tunnel.c
    net/ipv6/ip6_vti.c

    ipv6 tunnel statistic bug fixes conflicting with consolidation into
    generic sw per-cpu net stats.

    qlogic conflict between queue counting bug fix and the addition
    of multiple MAC address support.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • TCP out_of_order_queue lock is not used, as queue manipulation
    happens with socket lock held and we therefore use the lockless
    skb queue routines (such as __skb_queue_head()).

    We can use __skb_queue_head_init() instead of skb_queue_head_init()
    to make this more consistent.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Proportional Integral controller Enhanced (PIE) is a scheduler to address the
    bufferbloat problem.

    From the IETF draft below:
    " Bufferbloat is a phenomenon where excess buffers in the network cause high
    latency and jitter. As more and more interactive applications (e.g. voice over
    IP, real time video streaming and financial transactions) run in the Internet,
    high latency and jitter degrade application performance. There is a pressing
    need to design intelligent queue management schemes that can control latency and
    jitter; and hence provide desirable quality of service to users.

    We present here a lightweight design, PIE(Proportional Integral controller
    Enhanced) that can effectively control the average queueing latency to a target
    value. Simulation results, theoretical analysis and Linux testbed results have
    shown that PIE can ensure low latency and achieve high link utilization under
    various congestion situations. The design does not require per-packet
    timestamp, so it incurs very small overhead and is simple enough to implement
    in both hardware and software. "

    Many thanks to Dave Taht for extensive feedback, reviews, testing and
    suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
    suggestions. Naeem Khademi and Dave Taht independently contributed to ECN
    support.

    For more information, please see technical paper about PIE in the IEEE
    Conference on High Performance Switching and Routing 2013. A copy of the paper
    can be found at ftp://ftpeng.cisco.com/pie/.

    Please also refer to the IETF draft submission at
    http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

    All relevant code, documents and test scripts and results can be found at
    ftp://ftpeng.cisco.com/pie/.

    For problems with the iproute2/tc or Linux kernel code, please contact Vijay
    Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) or Mythili Prabhu
    (mysuryan@cisco.com).

    Signed-off-by: Vijay Subramanian
    Signed-off-by: Mythili Prabhu
    CC: Dave Taht
    Signed-off-by: David S. Miller

    Vijay Subramanian
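
    As a rough illustration of the control law the PIE draft describes, the
    sketch below estimates queueing delay from backlog and drain rate (so no
    per-packet timestamp is needed) and nudges the drop probability toward a
    target delay. The function names and the `alpha`/`beta` defaults are
    illustrative assumptions, not values taken from the kernel code.

```python
def estimate_delay(queue_bytes: float, depart_rate_bytes_per_s: float) -> float:
    """Queueing delay in seconds, estimated as backlog / drain rate."""
    if depart_rate_bytes_per_s <= 0:
        return 0.0
    return queue_bytes / depart_rate_bytes_per_s


def update_drop_prob(p: float, delay: float, old_delay: float, target: float,
                     alpha: float = 0.125, beta: float = 1.25) -> float:
    """One periodic update of the drop probability.

    The proportional term reacts to the distance from the target delay,
    the second term reacts to whether the delay is rising or falling.
    """
    p += alpha * (delay - target) + beta * (delay - old_delay)
    # Keep the result a valid probability.
    return min(max(p, 0.0), 1.0)
```

    With a 20 ms target, a delay that has risen from 20 ms to 30 ms raises the
    drop probability, while a steady 5 ms delay drives it back toward zero.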
     
  • Pablo Neira Ayuso says:

    ====================
    nftables updates for net-next

    The following patchset contains nftables updates for your net-next tree,
    they are:

    * Add set operation to the meta expression by means of the select_ops()
    infrastructure, this allows us to set the packet mark among other things.
    From Arturo Borrero Gonzalez.

    * Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
    Borkmann.

    * Add new queue expression to nf_tables. This comes with two previous patches
    to prepare this new feature, one to add mask in nf_tables_core to
    evaluate the queue verdict appropriately and another to refactor common
    code with xt_NFQUEUE, from Eric Leblond.

    * Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
    Eric Leblond.

    * Add the reject expression to nf_tables, this adds the missing TCP RST
    support. It comes with an initial patch to refactor common code with
    xt_NFQUEUE, again from Eric Leblond.

    * Remove an unused variable assignment in nf_tables_dump_set(), from Michal
    Nazarewicz.

    * Remove the nft_meta_target code, now that Arturo added the set operation
    to the meta expression, from me.

    * Add help information for nf_tables to Kconfig, also from me.

    * Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
    available to other nf_tables objects, requested by Arturo, from me.

    * Expose the table usage counter, so we can know how many chains are using
    this table without dumping the list of chains, from Tomasz Bursztyka.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

06 Jan, 2014

2 commits

  • macvlan needs vlan_pcpu_stats so make it visible even if compiling
    without VLAN_8021Q support. Otherwise a very long compiler error happens.

    Fixes: cdf3e274cf1b36 ("macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats")
    Cc: Li RongQing
    Signed-off-by: Hannes Frederic Sowa
    Acked-By: Li RongQing
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Pablo Neira Ayuso says:

    ====================
    netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next tree,
    they are:

    * Add full port randomization support. Some crazy researchers found a way
    to reconstruct the secure ephemeral ports that are allocated in random mode
    by sending off-path bursts of UDP packets to overrun the socket buffer of
    the DNS resolver to trigger retransmissions, then if the timing for the
    DNS resolution done by a client is larger than usual, then they conclude
    that the port that received the burst of UDP packets is the one that was
    opened. It seems a rather aggressive method to me but it seems to work for
    them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
    new NAT mode to fully randomize ports using prandom.

    * Add a new classifier to x_tables based on the socket net_cls set via
    cgroups. This includes two patches to prepare the field as requested by
    Zefan Li. Also from Daniel Borkmann.

    * Use prandom instead of get_random_bytes in several locations of the
    netfilter code, from Florian Westphal.

    * Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
    mark, also from Florian Westphal.

    * Fix compilation warning due to unused variable in IPVS, from Geert
    Uytterhoeven.

    * Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

    * Add IPComp extension to x_tables, from Fan Du.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
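
    The idea behind fully random port selection mentioned in the first item
    above can be sketched as below. This is a user-space model under stated
    assumptions: `pick_port` is a hypothetical helper, Python's `random` module
    stands in for the kernel's prandom, and the probing order is not claimed to
    match the kernel's NAT code.

```python
import random
from typing import Optional, Set


def pick_port(lo: int, hi: int, in_use: Set[int],
              rng: random.Random) -> Optional[int]:
    """Pick a free port in [lo, hi], starting from a fresh random offset.

    Every call draws a new random starting point, so successive allocations
    do not follow a predictable sequence.
    """
    span = hi - lo + 1
    start = rng.randrange(span)
    for i in range(span):
        port = lo + (start + i) % span
        if port not in in_use:
            return port
    return None  # the whole range is exhausted
```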
     

05 Jan, 2014

7 commits

  • This function is used to get a specific core when there is more than
    one core of that specific type. This is used in bgmac to reset all GMAC
    cores.

    Signed-off-by: Hauke Mehrtens
    Acked-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Hauke Mehrtens
     
  • macvlan_pcpu_stats and vlan_pcpu_stats are the same, so unify them into one;
    since macvlan is a kind of vlan, vlan_pcpu_stats is a suitable name for both
    vlan and macvlan.

    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     
  • pcpu_tstats and br_cpu_netstats are the same, so unify them into one,
    pcpu_sw_netstats.

    Define pcpu_sw_netstats in netdevice.h, remove pcpu_tstats
    from if_tunnel and remove br_cpu_netstats from br_private.h.

    Cc: Cong Wang
    Cc: Stephen Hemminger
    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
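
    The pattern behind a shared per-CPU software stats structure can be
    modelled as below: each CPU updates only its own counters, and a reader
    sums them on demand. The field names follow my understanding of
    pcpu_sw_netstats and should be treated as an assumption; the kernel's
    synchronisation member is omitted, and `PerCpuNetstats` is a hypothetical
    wrapper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwNetstats:
    """One CPU's software counters (assumed field set)."""
    rx_packets: int = 0
    rx_bytes: int = 0
    tx_packets: int = 0
    tx_bytes: int = 0


class PerCpuNetstats:
    def __init__(self, ncpus: int) -> None:
        self.per_cpu: List[SwNetstats] = [SwNetstats() for _ in range(ncpus)]

    def account_rx(self, cpu: int, nbytes: int) -> None:
        s = self.per_cpu[cpu]      # only this CPU's slot is touched
        s.rx_packets += 1
        s.rx_bytes += nbytes

    def account_tx(self, cpu: int, nbytes: int) -> None:
        s = self.per_cpu[cpu]
        s.tx_packets += 1
        s.tx_bytes += nbytes

    def total(self) -> SwNetstats:
        """Fold all per-CPU slots into one device-wide view."""
        out = SwNetstats()
        for s in self.per_cpu:
            out.rx_packets += s.rx_packets
            out.rx_bytes += s.rx_bytes
            out.tx_packets += s.tx_packets
            out.tx_bytes += s.tx_bytes
        return out
```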
     
  • Jeff Kirsher says:

    ====================
    Intel Wired LAN Driver Updates

    This series contains updates to i40e and pci_regs.h.

    Anjali provides a patch to prevent messages from stray HMC events, except
    at interrupt message level, and refactors the HMC error handling.

    Catherine adds routines in probe to populate/check PCI bus speed and width,
    then verify we are in a 8GT/s x8 PCIe slot and warn when we are not.

    Shannon adds Wake-on-LAN support for i40e, fixes curly brace use as well as
    return type for i40e_vsi_clear_rings().

    Joseph implements receive offload for VXLAN for i40e, where the hardware
    supports checksum offload/verification of the inner/outer header.

    Mitch provides the bulk of the changes, where he refactors the VF reset
    code so that it works on real hardware. Then does code cleanup by
    calling existing functions to enable and disable queues for VFs and
    remove unused functions. Removes unnecessary log messages that are
    seen at every VF reset, for example complaining about disabling queues
    that are already disabled. Fixes an error return when the VF asks to
    add an invalid MAC address and if the VF sends a bad message, make it
    more informative about what is actually going on.

    Jesse refactors the LED function to flash LED lights correctly.

    v2:
    - removed patch 5 "i40e: add set settings and pauseparam" based on
    feedback from Ben Hutchings, will re-work that patch for later
    submission
    - Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
    address Or Gerlitz's questions and concerns. This patch adds the
    implementation for the VXLAN ndo's and allows the hardware to do
    receive checksum offload for inner packets on the UDP ports that
    VXLAN notifies us about.
    - Added patch "i40e: using for_each_set_bit to simplify the code"
    from Wei Yongjun. This patch uses for_each_set_bit() to simplify
    the code.

    v3:
    - fixed indentation issue in patch 11 based on feedback from
    Sergei Shtylyov.

    Sorry for the delayed release of v4; it was delayed due to the holidays.

    v4:
    - Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
    while holding a spin lock in patch 6 by executing the AQ commands from
    a subtask.
    - Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
    option for i40e and wrapped appropriate code with the config option in
    patch 6.
    - Updated patch 7 based on the changes made in patch 6 in the above two
    bullets.

    v5:
    - Added the patch to pci_regs.h based on David Miller's feedback to add
    PCI defines for speed and width
    - Updated patch 3 description to better explain the changes based on
    feedback from David Miller
    - Updated patch 4 to use the newly added defines to pci_regs.h instead
    of local defines
    - Updated patch 7 to use in the #include based on feedback
    from David Miller
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
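
    For readers unfamiliar with the for_each_set_bit() idiom mentioned in the
    v2 notes above, the following is a small user-space analogue. It only
    mirrors the iteration pattern (visit each set bit below `size`, in
    ascending order) and is not the kernel macro itself.

```python
from typing import Iterator


def for_each_set_bit(word: int, size: int) -> Iterator[int]:
    """Yield the index of every set bit in `word` below bit `size`."""
    for bit in range(size):
        if word & (1 << bit):
            yield bit
```

    A caller can then write `for q in for_each_set_bit(queue_mask, 16): ...`
    instead of an explicit loop with a manual bit test.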
     
  • phy_scan_fixups() isn't and shouldn't be called by the drivers directly, so
    unexport it. And since Florian Fainelli's recent patches, the function is only
    called locally, so we can make it static as well.

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: David S. Miller

    Sergei Shtylyov
     
  • Remove adjust_state() callback from 'struct phy_device' since it seems to have
    never been really used from the inception: phy_start_machine() has been always
    called with 2nd argument equal to NULL.

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: David S. Miller

    Sergei Shtylyov
     
  • Running 'checkpatch.pl' gives some errors and warnings:

    - no spaces around =;

    - * separated by space from the function name;

    - { in function definition not on a separate line;

    - line over 80 characters.

    While fixing these, also fix the following style issues:

    - file name in the heading comment;

    - alignment not matching open paren.

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: David S. Miller

    Sergei Shtylyov
     

04 Jan, 2014

10 commits

  • Add missing PCI bus link speed 8.0 GT/s and bus link widths of
    x1, x2, x4 and x8.

    CC:
    CC: Bjorn Helgaas
    Signed-off-by: Jeff Kirsher
    Acked-by: Bjorn Helgaas

    Jeff Kirsher
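
    To show how such link speed and width defines are typically used, here is
    a sketch that decodes a PCIe Link Status value and checks for an 8.0 GT/s
    x8 link, as the i40e probe change in this series is described as doing.
    The constant values are as I recall them from pci_regs.h and should be
    verified against the header; `decode_lnksta` and `is_8gt_x8` are
    hypothetical helpers.

```python
# Link Status register fields (values assumed, check pci_regs.h).
PCI_EXP_LNKSTA_CLS = 0x000F        # current link speed mask
PCI_EXP_LNKSTA_CLS_8_0GB = 0x0003  # 8.0 GT/s
PCI_EXP_LNKSTA_NLW = 0x03F0        # negotiated link width mask
PCI_EXP_LNKSTA_NLW_SHIFT = 4
PCI_EXP_LNKSTA_NLW_X8 = 0x0080     # x8 link


def decode_lnksta(lnksta: int) -> tuple:
    """Return (speed_code, lane_count) from a Link Status value."""
    speed_code = lnksta & PCI_EXP_LNKSTA_CLS
    lanes = (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT
    return speed_code, lanes


def is_8gt_x8(lnksta: int) -> bool:
    """True when the link runs at 8.0 GT/s with eight lanes."""
    return ((lnksta & PCI_EXP_LNKSTA_CLS) == PCI_EXP_LNKSTA_CLS_8_0GB
            and (lnksta & PCI_EXP_LNKSTA_NLW) == PCI_EXP_LNKSTA_NLW_X8)
```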
     
  • Add nested IFLA_BOND_AD_INFO for bonding 802.3ad info.

    Signed-off-by: Scott Feldman
    Signed-off-by: David S. Miller

    sfeldma@cumulusnetworks.com
     
  • Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
    ad_select via netlink.

    Signed-off-by: Scott Feldman
    Signed-off-by: David S. Miller

    sfeldma@cumulusnetworks.com
     
  • Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
    lacp_rate via netlink.

    Signed-off-by: Scott Feldman
    Signed-off-by: David S. Miller

    sfeldma@cumulusnetworks.com
     
  • The llc_sap_list_lock does not need to be global, only acquired
    in core.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • Namespace related cleaning

    * make cred_to_ucred static
    * remove unused sock_rmalloc function

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • A per-CPU route cache eliminates sharing of the dst refcnt between CPUs.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Avoid doing a route lookup on every packet being tunneled.

    In ip_tunnel.c cache the route returned from ip_route_output if
    the tunnel is "connected" so that all the routing parameters are
    taken from tunnel parms for a packet. Specifically, not NBMA tunnel
    and tos is from tunnel parms (not inner packet).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
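
    The caching rule in the commit above can be modelled as follows. This is a
    sketch under assumptions: `Tunnel` and its `lookup` callback are
    hypothetical, `None` is used to model "no fixed destination" (NBMA) and
    "TOS inherited from the inner packet", and cache invalidation is left out.

```python
from typing import Callable, Optional


class Tunnel:
    def __init__(self, daddr: Optional[str], tos: Optional[int],
                 lookup: Callable[[str, int], object]) -> None:
        self.daddr = daddr      # None models an NBMA tunnel
        self.tos = tos          # None models TOS taken from the inner packet
        self._lookup = lookup   # stands in for the route lookup
        self._cached = None

    @property
    def connected(self) -> bool:
        # All routing parameters come from the tunnel parms.
        return self.daddr is not None and self.tos is not None

    def route_for(self, inner_dst: str, inner_tos: int) -> object:
        if self.connected:
            if self._cached is None:
                self._cached = self._lookup(self.daddr, self.tos)
            return self._cached
        # Parameters depend on the packet, so look up every time.
        dst = self.daddr if self.daddr is not None else inner_dst
        tos = self.tos if self.tos is not None else inner_tos
        return self._lookup(dst, tos)
```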
     
  • It would be useful e.g. in a server or desktop environment to have
    a facility in the notion of fine-grained "per application" or "per
    application group" firewall policies. Probably, users in the mobile,
    embedded area (e.g. Android based) with different security policy
    requirements for application groups could have great benefit from
    that as well. For example, with a little bit of configuration effort,
    an admin could whitelist well-known applications, and thus block
    otherwise unwanted "hard-to-track" applications like [1] from a
    user's machine. Blocking is just one example, but it is not limited
    to that, meaning we can have much different scenarios/policies that
    netfilter allows us than just blocking, e.g. fine grained settings
    where applications are allowed to connect/send traffic to, application
    traffic marking/conntracking, application-specific packet mangling,
    and so on.

    Implementation of PID-based matching would not be appropriate
    as they frequently change, and child tracking would make that
    even more complex and ugly. Cgroups would be a perfect candidate
    for accomplishing that as they associate a set of tasks with a
    set of parameters for one or more subsystems, in our case the
    netfilter subsystem, which, of course, can be combined with other
    cgroup subsystems into something more complex if needed.

    As mentioned, to overcome this constraint, such processes could
    be placed into one or multiple cgroups where different fine-grained
    rules can be defined depending on the application scenario, while
    e.g. everything else that is not part of that could be dropped (or
    vice versa), thus making life harder for unwanted processes to
    communicate to the outside world. So, we make use of cgroups here
    to track jobs and limit their resources in terms of iptables
    policies; in other words, limiting, tracking, etc. what they are
    allowed to communicate.

    In our case we're working on outgoing traffic based on the local
    socket it originated from. Also, one doesn't even need to have
    a priori knowledge of the application internals regarding their
    particular use of ports or protocols. Matching is *extremely*
    lightweight as we just test for the sk_classid marker of sockets,
    originating from net_cls. net_cls and netfilter do not contradict
    each other; in fact, each construct can live as standalone or they
    can be used in combination with each other, which is perfectly fine,
    plus it serves Tejun's requirement to not introduce a new cgroups
    subsystem. Through this, we result in a very minimal and efficient
    module, and don't add anything except netfilter code.

    One possible, minimal usage example (many other iptables options
    can be applied obviously):

    1) Configuring cgroups if not already done, e.g.:

    mkdir /sys/fs/cgroup/net_cls
    mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
    mkdir /sys/fs/cgroup/net_cls/0
    echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
    (resp. a real flow handle id for tc)

    2) Configuring netfilter (iptables-nftables), e.g.:

    iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

    3) Running applications, e.g.:

    ping 208.67.222.222
    echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
    64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
    [...]
    ping 208.67.220.220
    ping: sendmsg: Operation not permitted
    [...]
    echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
    64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
    [...]

    Of course, real-world deployments would make use of cgroups user
    space toolsuite, or own custom policy daemons dynamically moving
    applications from/to various cgroups.

    [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

    Signed-off-by: Daniel Borkmann
    Cc: Tejun Heo
    Cc: cgroups@vger.kernel.org
    Acked-by: Li Zefan
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
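
    The matching logic described in the commit above boils down to comparing
    a socket's class id with the rule's id, optionally inverted. The sketch
    below models that and the example rule from step 2; `cgroup_match` and
    `output_verdict` are hypothetical names, and a default ACCEPT policy for
    the OUTPUT chain is assumed.

```python
from typing import Optional


def cgroup_match(sk_classid: Optional[int], rule_id: int,
                 invert: bool = False) -> bool:
    """Model of the match: classid equality, flipped when inverted."""
    if sk_classid is None:
        return False  # packet has no local socket, nothing to match on
    return (sk_classid == rule_id) != invert


def output_verdict(sk_classid: Optional[int]) -> str:
    """Models: iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP"""
    if cgroup_match(sk_classid, 1, invert=True):
        return "DROP"
    return "ACCEPT"  # assumed default policy of the chain
```

    In terms of the session in step 3, a process placed in the cgroup with
    classid 1 gets ACCEPT, while any other local process gets DROP.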
     
  • While we're at it and introduced CGROUP_NET_CLASSID, let's also make
    NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
    into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
    CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
    option consistent among networking cgroups, but also among cgroups
    CONFIG conventions in general as the vast majority has a prefix of
    CONFIG_CGROUP_.

    Signed-off-by: Daniel Borkmann
    Cc: Zefan Li
    Cc: cgroups@vger.kernel.org
    Acked-by: Li Zefan
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann