22 Mar, 2012

2 commits

  • napi->skb is allocated in napi_get_frags() using
    netdev_alloc_skb_ip_align(), with a reserve of NET_SKB_PAD +
    NET_IP_ALIGN bytes.

    However, when such an skb is recycled in napi_reuse_skb(), it ends up
    with a reserve of only NET_IP_ALIGN bytes, which is suboptimal.

    Signed-off-by: Eric Dumazet
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds
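
    The napi_reuse_skb() commit at the top of 22 Mar is about headroom
    accounting: a recycled buffer should get back the same reserve it had
    when first allocated. The userspace sketch below models only that idea.
    toy_buf and the toy_* functions are hypothetical, and TOY_SKB_PAD and
    TOY_IP_ALIGN are assumed stand-ins for NET_SKB_PAD and NET_IP_ALIGN,
    whose real values are architecture dependent. It is not the kernel's
    sk_buff code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for NET_SKB_PAD and NET_IP_ALIGN (assumed values). */
#define TOY_SKB_PAD  64
#define TOY_IP_ALIGN 2
#define TOY_BUF_SIZE 2048

/* Hypothetical, heavily simplified model of an skb's linear buffer. */
struct toy_buf {
    unsigned char storage[TOY_BUF_SIZE];
    unsigned char *data;   /* start of packet data */
    size_t len;            /* bytes of packet data */
};

static size_t toy_headroom(const struct toy_buf *b)
{
    return (size_t)(b->data - b->storage);
}

/* Like skb_reserve(): move the data pointer forward to leave headroom. */
static void toy_reserve(struct toy_buf *b, size_t n)
{
    assert(b->len == 0 && toy_headroom(b) + n <= TOY_BUF_SIZE);
    b->data += n;
}

/* Allocation path: reserve PAD + IP_ALIGN, as napi_get_frags() does. */
static void toy_alloc(struct toy_buf *b)
{
    b->data = b->storage;
    b->len = 0;
    toy_reserve(b, TOY_SKB_PAD + TOY_IP_ALIGN);
}

/* Recycle path before the fix: only IP_ALIGN ends up reserved. */
static void toy_reuse_old(struct toy_buf *b)
{
    b->data = b->storage;
    b->len = 0;
    toy_reserve(b, TOY_IP_ALIGN);
}

/* Recycle path after the fix: the allocation-time reserve is restored. */
static void toy_reuse_fixed(struct toy_buf *b)
{
    b->data = b->storage;
    b->len = 0;
    toy_reserve(b, TOY_SKB_PAD + TOY_IP_ALIGN);
}
```

    The point of the model is that allocation and recycling agree on the
    headroom, so a recycled buffer has the same room for prepending
    headers as a fresh one.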
     

21 Mar, 2012

2 commits

  • Pull networking merge from David Miller:
    "1) Move ixgbe driver over to purely page based buffering on receive.
    From Alexander Duyck.

    2) Add receive packet steering support to e1000e, from Bruce Allan.

    3) Convert TCP MD5 support over to RCU, from Eric Dumazet.

    4) Reduce CPU usage in handling out-of-order TCP packets on modern
    systems, also from Eric Dumazet.

    5) Support the IP{,V6}_UNICAST_IF socket options, making the Wine
    folks happy, from Erich Hoover.

    6) Support VLAN trunking from guests in hyperv driver, from Haiyang
    Zhang.

    7) Support byte-queue-limits in r8169, from Igor Maravic.

    8) Skeleton code intended for IP_RECVTOS in IP_PKTOPTIONS existed but
    was never properly implemented; Jiri Benc fixed that.

    9) 64-bit statistics support in r8169 and 8139too, from Junchang Wang.

    10) Support kernel side dump filtering by ctmark in netfilter
    ctnetlink, from Pablo Neira Ayuso.

    11) Support byte-queue-limits in gianfar driver, from Paul Gortmaker.

    12) Add new peek socket options to assist with socket migration, from
    Pavel Emelyanov.

    13) Add sch_plug packet scheduler whose queue is controlled by
    userland daemons using explicit freeze and release commands. From
    Shriram Rajagopalan.

    14) Fix FCoE checksum offload handling on transmit, from Yi Zou."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1846 commits)
    Fix pppol2tp getsockname()
    Remove printk from rds_sendmsg
    ipv6: fix incorrent ipv6 ipsec packet fragment
    cpsw: Hook up default ndo_change_mtu.
    net: qmi_wwan: fix build error due to cdc-wdm dependecy
    netdev: driver: ethernet: Add TI CPSW driver
    netdev: driver: ethernet: add cpsw address lookup engine support
    phy: add am79c874 PHY support
    mlx4_core: fix race on comm channel
    bonding: send igmp report for its master
    fs_enet: Add MPC5125 FEC support and PHY interface selection
    net: bpf_jit: fix BPF_S_LDX_B_MSH compilation
    net: update the usage of CHECKSUM_UNNECESSARY
    fcoe: use CHECKSUM_UNNECESSARY instead of CHECKSUM_PARTIAL on tx
    net: do not do gso for CHECKSUM_UNNECESSARY in netif_needs_gso
    ixgbe: Fix issues with SR-IOV loopback when flow control is disabled
    net/hyperv: Fix the code handling tx busy
    ixgbe: fix namespace issues when FCoE/DCB is not enabled
    rtlwifi: Remove unused ETH_ADDR_LEN defines
    igbvf: Use ETH_ALEN
    ...

    Fix up fairly trivial conflicts in drivers/isdn/gigaset/interface.c and
    drivers/net/usb/{Kconfig,qmi_wwan.c} as per David.

    Linus Torvalds
     
  • Pull cgroup changes from Tejun Heo:
    "Out of the 8 commits, one fixes a long-standing locking issue around
    tasklist walking and others are cleanups."

    * 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
    cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
    cgroup: remove cgroup_subsys argument from callbacks
    cgroup: remove extra calls to find_existing_css_set
    cgroup: replace tasklist_lock with rcu_read_lock
    cgroup: simplify double-check locking in cgroup_attach_proc
    cgroup: move struct cgroup_pidlist out from the header file
    cgroup: remove cgroup_attach_task_current_cg()

    Linus Torvalds
     

20 Mar, 2012

1 commit


13 Mar, 2012

1 commit


12 Mar, 2012

1 commit

    The following 4 functions:

        move_addr_to_kernel
        move_addr_to_user
        verify_iovec
        verify_compat_iovec

    are always effectively called with a sockaddr_storage.

    Make this explicit by changing their signature.

    This removes a large number of casts from sockaddr_storage to sockaddr.

    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
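
    For the 12 Mar commit, a userspace sketch of the idea: a helper whose
    destination parameter is typed as struct sockaddr_storage *, so callers
    pass their storage directly instead of casting it to struct sockaddr *,
    and the type itself documents that the destination can hold any address
    family. copy_addr_in() is a hypothetical name loosely modeled on
    move_addr_to_kernel(); it uses memcpy() where the kernel would use
    copy_from_user().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, loosely modeled on move_addr_to_kernel(): the
 * destination is typed as struct sockaddr_storage *, so no cast to
 * struct sockaddr * is needed at the call site. */
static int copy_addr_in(const void *src, int len, struct sockaddr_storage *dst)
{
    if (len < 0 || (size_t)len > sizeof(*dst))
        return -EINVAL;          /* larger than any supported address */
    if (len == 0)
        return 0;                /* nothing to copy */
    memcpy(dst, src, (size_t)len);
    return 0;
}
```

    A caller fills in a family-specific address (for example a
    struct sockaddr_in), copies it into a struct sockaddr_storage, and later
    reads it back through the family-specific type after checking ss_family.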
     

07 Mar, 2012

1 commit


06 Mar, 2012

2 commits


05 Mar, 2012

2 commits


01 Mar, 2012

1 commit

    Add VF spoof check to IFLA policy. The original patch I submitted to
    add the spoof checking feature to rtnl failed to add the proper policy
    rule that identifies the data type and length. This patch corrects that
    oversight. No bugs have been reported against this, but the missing
    rule could cause problems for netlink message parsing that relies on
    the policy table.

    CC: stable@vger.kernel.org
    Signed-off-by: Greg Rose
    Tested-by: Sibai Li
    Signed-off-by: Jeff Kirsher

    Greg Rose
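
    The 01 Mar fix adds a missing policy-table entry. A policy table maps
    each attribute type to its expected payload, and the parser rejects an
    attribute that is too short; in this model a missing (zeroed) entry
    imposes no minimum length at all. The sketch is a hypothetical userspace
    model: the toy_* names and attribute numbers are invented for
    illustration, and the kernel's real mechanism is struct nla_policy used
    by the netlink parsing helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute numbers, for this sketch only. */
enum { TOY_ATTR_UNSPEC, TOY_ATTR_MAC, TOY_ATTR_SPOOFCHK, TOY_ATTR_MAX };

/* A policy entry: 0 means "no length requirement". */
struct toy_policy {
    size_t min_len;
};

/* Payload shaped like a VF index plus an on/off setting. */
struct toy_vf_spoofchk {
    uint32_t vf;
    uint32_t setting;
};

/* Before the fix: no entry for TOY_ATTR_SPOOFCHK, so it stays zeroed. */
static const struct toy_policy policy_old[TOY_ATTR_MAX] = {
    [TOY_ATTR_MAC] = { .min_len = 6 },
};

/* After the fix: the entry states the expected payload length. */
static const struct toy_policy policy_new[TOY_ATTR_MAX] = {
    [TOY_ATTR_MAC]      = { .min_len = 6 },
    [TOY_ATTR_SPOOFCHK] = { .min_len = sizeof(struct toy_vf_spoofchk) },
};

/* Reject an attribute whose payload is shorter than its policy demands. */
static int toy_validate(const struct toy_policy *policy, int type, size_t len)
{
    if (type <= TOY_ATTR_UNSPEC || type >= TOY_ATTR_MAX)
        return 0;                 /* unknown types are ignored here */
    if (len < policy[type].min_len)
        return -ERANGE;
    return 0;
}
```

    With policy_old, a truncated spoof-check attribute passes validation and
    a consumer could read past its end; with policy_new it is rejected up
    front.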
     

27 Feb, 2012

2 commits


25 Feb, 2012

1 commit


24 Feb, 2012

4 commits

    This flag requests that network devices pass all
    received frames up the stack, even ones with errors
    such as an invalid FCS (frame check sequence). This will
    allow sniffers to see bad packets and perhaps
    give the user some idea of how to fix the problem.

    Signed-off-by: Ben Greear
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Ben Greear
     
  • This is useful for testing RX handling of frames with bad
    CRCs.

    Requires driver support to actually put the packet on the
    wire properly.

    Signed-off-by: Ben Greear
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Ben Greear
     
  • When set on hardware that supports the feature,
    this causes the Ethernet FCS to be appended
    to the end of the skb.

    Useful for sniffing packets.

    Signed-off-by: Ben Greear
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Ben Greear
     
  • …_key_slow_[inc|dec]()

    So here's a boot tested patch on top of Jason's series that does
    all the cleanups I talked about and turns jump labels into a
    more intuitive to use facility. It should also address the
    various misconceptions and confusions that surround jump labels.

    Typical usage scenarios:

    #include <linux/static_key.h>

    struct static_key key = STATIC_KEY_INIT_TRUE;

    if (static_key_false(&key))
            do unlikely code
    else
            do likely code

    Or:

    if (static_key_true(&key))
            do likely code
    else
            do unlikely code

    The static key is modified via:

            static_key_slow_inc(&key);
            ...
            static_key_slow_dec(&key);

    The 'slow' prefix makes it abundantly clear that this is an
    expensive operation.

    I've updated all in-kernel code to use this everywhere. Note
    that I (intentionally) have not pushed the rename blindly
    through to the lowest levels: the actual jump-label patching
    arch facility should keep that name, so we want to decouple
    jump labels from the static-key facility a bit.

    On non-jump-label enabled architectures static keys default to
    likely()/unlikely() branches.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: Jason Baron <jbaron@redhat.com>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: a.p.zijlstra@chello.nl
    Cc: mathieu.desnoyers@efficios.com
    Cc: davem@davemloft.net
    Cc: ddaney.cavm@gmail.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Ingo Molnar
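
    The static_key semantics described above can be modeled in plain
    userspace C: a key is enabled while its count is positive, the slow
    inc/dec calls adjust that count, and the fast-path tests are ordinary
    hinted branches. This corresponds only to the likely()/unlikely()
    fallback mentioned for architectures without jump-label support; the
    real facility patches the branch instruction in place, which this sketch
    does not attempt. All toy_* names are hypothetical, and __builtin_expect
    assumes GCC or Clang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct static_key. */
struct toy_static_key {
    atomic_int enabled;   /* > 0 means the key is "true" */
};

#define TOY_STATIC_KEY_INIT_FALSE { 0 }
#define TOY_STATIC_KEY_INIT_TRUE  { 1 }

/* Branch hints, like the kernel's likely()/unlikely() (GCC/Clang builtin). */
#define toy_likely(x)   __builtin_expect(!!(x), 1)
#define toy_unlikely(x) __builtin_expect(!!(x), 0)

/* Fast-path test for code that is normally off. */
static inline bool toy_static_key_false(struct toy_static_key *key)
{
    return toy_unlikely(atomic_load(&key->enabled) > 0);
}

/* Fast-path test for code that is normally on. */
static inline bool toy_static_key_true(struct toy_static_key *key)
{
    return toy_likely(atomic_load(&key->enabled) > 0);
}

/* "Slow" updates: the real facility patches code here; this model only
 * adjusts a counter. */
static void toy_static_key_slow_inc(struct toy_static_key *key)
{
    atomic_fetch_add(&key->enabled, 1);
}

static void toy_static_key_slow_dec(struct toy_static_key *key)
{
    int old = atomic_fetch_sub(&key->enabled, 1);
    assert(old > 0);   /* an unbalanced dec would be a caller bug */
}
```

    Because the key is reference counted, two independent users can each
    call the inc function, and the key only turns off again after both have
    called the matching dec.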
     

22 Feb, 2012

5 commits

  • Implement a new netlink attribute type IFLA_EXT_MASK. The mask
    is a 32 bit value that can be used to indicate to the kernel that
    certain extended ifinfo values are requested by the user application.
    At this time the only mask value defined is RTEXT_FILTER_VF to
    indicate that the user wants the ifinfo dump to send information
    about the VFs belonging to the interface.

    This patch fixes a bug in which certain applications do not have
    large enough buffers to accommodate the extra information returned
    by the kernel with large numbers of SR-IOV virtual functions.
    Those applications will not send the new netlink attribute with
    their interface info dump requests, so the kernel will not return
    unexpectedly large replies to them.

    Modifies the rtnl_calcit function to traverse the list of net
    devices and compute the minimum buffer size that can hold the
    info dumps of all matching devices based upon the filter passed
    in via the new netlink attribute filter mask. If no filter
    mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

    With this change it is possible to add yet to be defined netlink
    attributes to the dump request which should make it fairly extensible
    in the future.

    Signed-off-by: Greg Rose
    Signed-off-by: David S. Miller

    Greg Rose
     
    The race condition fixed here happens as follows:

    1. While function neigh_periodic_work scans the neighbor hash table
    pointed to by field tbl->nht, it unlocks and locks tbl->lock between
    buckets in order to call cond_resched.

    2. Assume that function neigh_periodic_work calls cond_resched, that is,
    the lock tbl->lock is available, and function neigh_hash_grow runs.

    3. Once function neigh_hash_grow finishes, and RCU calls
    neigh_hash_free_rcu, the original struct neigh_hash_table that function
    neigh_periodic_work was using doesn't exist anymore.

    4. Once back at neigh_periodic_work, whenever the old struct
    neigh_hash_table is accessed, things can go badly.

    Signed-off-by: Michel Machado
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Michel Machado
     
    This one specifies where to start MSG_PEEK-ing queue data from. When
    set to a negative value, MSG_PEEK works as usual and always peeks
    from the head of the queue.

    When some bytes are peeked from the queue and the peeking offset is
    non-negative, the offset is moved forward so that the next peek will
    return the next portion of data.

    When a non-peeking recvmsg occurs and the peeking offset is
    non-negative, it is moved backward so that the next peek will still
    peek the proper data (i.e. the data that would have been peeked if
    there had been no non-peeking recv in between).

    The offset is set using a per-protocol operation to let the protocol
    handle the locking issues and to check whether the peeking offset
    feature is supported by the protocol the socket belongs to.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
    This argument is only considered with the MSG_PEEK flag, and the value
    it points to specifies where to start peeking bytes from. If the offset
    happens to point into the middle of the returned skb, the offset within
    that skb is written back to this same argument.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This makes lines shorter and simplifies further patching.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
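
    The peek-offset rules in the 22 Feb commit that introduces the SO_PEEK_OFF
    socket option (the bullet beginning "This one specifies where to start
    MSG_PEEK-ing") can be checked with a small model. The toy_sock queue
    below is hypothetical and byte-oriented like a stream socket; it mirrors
    only the offset bookkeeping from the commit message (advance on peek,
    move back on a real read, negative means disabled), not the kernel's
    locking or skb handling. Where a real socket would block or return
    EAGAIN, toy_recv() simply returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte queue with a peek offset, modeling sk_peek_off. */
struct toy_sock {
    char data[64];
    size_t len;       /* bytes currently queued */
    long peek_off;    /* < 0: peeking offset disabled */
};

static void toy_enqueue(struct toy_sock *sk, const char *src, size_t n)
{
    assert(sk->len + n <= sizeof(sk->data));
    memcpy(sk->data + sk->len, src, n);
    sk->len += n;
}

/* Returns bytes copied. peek != 0 behaves like recvmsg(..., MSG_PEEK). */
static size_t toy_recv(struct toy_sock *sk, char *out, size_t n, int peek)
{
    size_t start = 0, copied;

    if (peek && sk->peek_off >= 0)
        start = (size_t)sk->peek_off;          /* skip already-peeked bytes */
    if (start >= sk->len)
        return 0;                              /* nothing (left) to read */
    copied = sk->len - start < n ? sk->len - start : n;
    memcpy(out, sk->data + start, copied);

    if (peek) {
        if (sk->peek_off >= 0)
            sk->peek_off += (long)copied;      /* next peek continues here */
    } else {
        /* Consume the bytes, then move the peek offset back by as much,
         * clamping at zero, so it still points at the same data. */
        memmove(sk->data, sk->data + copied, sk->len - copied);
        sk->len -= copied;
        if (sk->peek_off >= 0)
            sk->peek_off = sk->peek_off >= (long)copied
                               ? sk->peek_off - (long)copied : 0;
    }
    return copied;
}
```

    For example, with "abcdef" queued and the offset at 0, two 2-byte peeks
    return "ab" and then "cd"; a 3-byte read then consumes "abc" and pulls
    the offset back from 4 to 1, so the next peek returns "ef", exactly what
    would have followed "cd" without the read.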
     

20 Feb, 2012

1 commit


15 Feb, 2012

1 commit


14 Feb, 2012

1 commit


11 Feb, 2012

4 commits


09 Feb, 2012

2 commits

  • Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
    only, and failed on IB/ipoib traffic.

    He provided a patch faking a zeroed header to let GRO aggregate frames.

    Roland Dreier, Herbert Xu, and others suggested we change the GRO L2
    header check to be more generic, i.e. not assuming the L2 header is
    14 bytes, but taking into account hard_header_len.

    __napi_gro_receive() has special handling for the common case (Ethernet)
    to avoid a memcmp() call and use an inline optimized function instead.

    Signed-off-by: Eric Dumazet
    Reported-by: Shlomo Pongratz
    Cc: Roland Dreier
    Cc: Or Gerlitz
    Cc: Herbert Xu
    Tested-by: Sean Hefty
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
    only, and failed on IB/ipoib traffic.

    He provided a patch faking a zeroed header to let GRO aggregate frames.

    Roland Dreier, Herbert Xu, and others suggested we change the GRO L2
    header check to be more generic, i.e. not assuming the L2 header is
    14 bytes, but taking into account hard_header_len.

    __napi_gro_receive() has special handling for the common case (Ethernet)
    to avoid a memcmp() call and use an inline optimized function instead.

    Signed-off-by: Eric Dumazet
    Reported-by: Shlomo Pongratz
    Cc: Roland Dreier
    Cc: Or Gerlitz
    Cc: Herbert Xu
    Tested-by: Sean Hefty
    Signed-off-by: David S. Miller

    Eric Dumazet
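
    The GRO change (listed twice above) replaces an Ethernet-only L2 header
    check with one that compares hard_header_len bytes, while keeping a fast
    path for the common 14-byte Ethernet case. A hedged userspace sketch of
    that split follows; l2_header_differs() and eth_header_differs() are
    hypothetical names, and the word-wise comparison only approximates the
    kernel's optimized inline helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_HLEN 14   /* Ethernet header: 2 MAC addresses + EtherType */

/* Fast path for Ethernet: compare 14 bytes as 4 + 4 + 4 + 2 byte words
 * instead of calling memcmp(). memcpy() keeps the loads safe for unaligned
 * buffers. Returns nonzero if the headers differ. */
static unsigned long eth_header_differs(const unsigned char *a,
                                        const unsigned char *b)
{
    uint32_t a32[3], b32[3];
    uint16_t a16, b16;

    memcpy(a32, a, 12);
    memcpy(b32, b, 12);
    memcpy(&a16, a + 12, 2);
    memcpy(&b16, b + 12, 2);
    return (a32[0] ^ b32[0]) | (a32[1] ^ b32[1]) | (a32[2] ^ b32[2]) |
           (uint32_t)(a16 ^ b16);
}

/* Generic check: use the device's header length instead of assuming
 * 14 bytes (the commit cites IB/IPoIB traffic, where that fails). */
static int l2_header_differs(const unsigned char *a, const unsigned char *b,
                             unsigned int hard_header_len)
{
    if (hard_header_len == TOY_ETH_HLEN)
        return eth_header_differs(a, b) != 0;
    return memcmp(a, b, hard_header_len) != 0;
}
```

    A difference beyond byte 14 is invisible to an Ethernet-sized check but
    is caught once the comparison length follows hard_header_len.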
     

05 Feb, 2012

2 commits


03 Feb, 2012

1 commit

  • The argument is not used at all, and it's not necessary, because
    a specific callback handler of course knows which subsys it
    belongs to.

    Now only ->populate() takes this argument, because the handlers of
    this callback always call cgroup_add_file()/cgroup_add_files().

    So we reduce a few lines of code, though the shrinking of object size
    is minimal.

    16 files changed, 113 insertions(+), 162 deletions(-)

       text    data      bss       dec     hex  filename
    5486240  656987  7039960  13183187  c928d3  vmlinux.o.orig
    5486170  656987  7039960  13183117  c9288d  vmlinux.o

    Signed-off-by: Li Zefan
    Signed-off-by: Tejun Heo

    Li Zefan
     

02 Feb, 2012

3 commits