01 May, 2012

1 commit


14 Apr, 2012

1 commit

  • The skb struct ubuf_info callback gets passed struct ubuf_info
    itself, not the arg value as the field name and the function signature
    seem to imply. Rename the arg field to ctx to match usage,
    add documentation and change the callback argument type
    to make usage clear and to have compiler check correctness.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
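
The callback convention described in this commit can be sketched in userspace as follows. This is a minimal model, not the kernel definition: `struct tx_owner`, `tx_done()` and `release_ubuf()` are hypothetical names introduced for illustration, and the field layout is an assumption based on the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the zerocopy descriptor (assumed layout). */
struct ubuf_info {
    void (*callback)(struct ubuf_info *ubuf); /* receives the descriptor itself */
    void *ctx;                                /* owner-private context, formerly "arg" */
    unsigned long desc;                       /* owner-private descriptor index */
};

/* Hypothetical owner state that ctx points back to. */
struct tx_owner {
    int completed;
};

/* Completion handler: reaches its owner through ubuf->ctx. */
static void tx_done(struct ubuf_info *ubuf)
{
    struct tx_owner *owner = ubuf->ctx;

    owner->completed++;
}

/* Models the point where the stack is done with the user buffers. */
static void release_ubuf(struct ubuf_info *ubuf)
{
    assert(ubuf != NULL && ubuf->callback != NULL);
    ubuf->callback(ubuf);
}
```

Because the callback parameter is typed as `struct ubuf_info *` rather than `void *`, passing the wrong pointer draws a compiler diagnostic instead of failing silently.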
     

11 Apr, 2012

1 commit

  • Marc Merlin reported many order-1 allocation failures in the TX path on
    his wireless setup, which don't make any sense with an MTU=1500 network
    and non-SG-capable hardware.

    After investigation, it turns out that TCP uses sk_stream_alloc_skb() and
    used, as a convention, skb_tailroom(skb) to know how many bytes of data
    payload could be put in this skb (for non-SG-capable devices).

    Note: these skbs used kmalloc-4096 (MTU=1500 + MAX_HEADER +
    sizeof(struct skb_shared_info) being above 2048).

    Later, the mac80211 layer needs to add some bytes at the tail of the skb
    (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and, since no more tailroom is
    available, has to call pskb_expand_head() and request order-1
    allocations.

    This patch changes sk_stream_alloc_skb() so that only
    sk->sk_prot->max_header bytes of headroom are reserved, and uses a new
    skb field, avail_size, to hold the data payload limit.

    This way, order-0 allocations done by the TCP stack can leave more than
    2 KB of tailroom, and no further allocation is performed in the mac80211
    layer (or any layer needing some tailroom).

    avail_size is unioned with mark/dropcount, since mark will be set later
    in the IP stack for output packets. Therefore, skb size is unchanged.

    Reported-by: Marc MERLIN
    Tested-by: Marc MERLIN
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
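
The tailroom arithmetic in this commit can be illustrated with a toy model. It assumes that the older scheme pushed all spare bucket bytes into headroom, so that tailroom equalled the payload limit, while the newer scheme reserves only the protocol header and records the limit in `avail_size`. The constants `TOY_MAX_HEADER` and `TOY_SHINFO` and all function names are hypothetical and chosen only to make the numbers concrete.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative assumptions, not kernel values (except the 18 from the message). */
enum {
    TOY_MAX_HEADER = 256,   /* stands in for sk->sk_prot->max_header */
    TOY_SHINFO = 320,       /* stands in for sizeof(struct skb_shared_info) */
    ENCRYPT_TAILROOM = 18   /* IEEE80211_ENCRYPT_TAILROOM */
};

struct toy_skb {
    size_t buf_size;   /* bytes shared by headroom, payload and tailroom */
    size_t headroom;
    size_t avail_size; /* payload limit */
};

/* Smallest power-of-two bucket that fits n bytes (toy kmalloc model). */
static size_t slab_bucket(size_t n)
{
    size_t b = 32;

    while (b < n)
        b <<= 1;
    return b;
}

/* use_avail_size == 0 models the old convention, 1 the new one. */
static struct toy_skb toy_alloc(size_t payload, int use_avail_size)
{
    struct toy_skb skb;

    assert(payload > 0);
    skb.buf_size = slab_bucket(TOY_MAX_HEADER + payload + TOY_SHINFO) - TOY_SHINFO;
    skb.avail_size = payload;
    skb.headroom = use_avail_size ? (size_t)TOY_MAX_HEADER
                                  : skb.buf_size - payload;
    return skb;
}

/* Tailroom left once the payload limit has been filled. */
static size_t tailroom_when_full(const struct toy_skb *skb)
{
    return skb->buf_size - skb->headroom - skb->avail_size;
}
```

Under these assumed sizes a 1500-byte payload lands in the 4096-byte bucket. The old convention leaves no tailroom once the payload is filled, so appending the 18 encryption bytes would force a reallocation, whereas the new convention leaves room to spare.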
     

29 Mar, 2012

2 commits

  • …m/linux/kernel/git/dhowells/linux-asm_system

    Pull "Disintegrate and delete asm/system.h" from David Howells:
    "Here are a bunch of patches to disintegrate asm/system.h into a set of
    separate bits to relieve the problem of circular inclusion
    dependencies.

    I've built all the working defconfigs from all the arches that I can
    and made sure that they don't break.

    The reason for these patches is that I recently encountered a circular
    dependency problem that came about when I produced some patches to
    optimise get_order() by rewriting it to use ilog2().

    This uses bitops - and on the SH arch asm/bitops.h drags in
    asm-generic/get_order.h by a circuitous route involving asm/system.h.

    The main difficulty seems to be asm/system.h. It holds a number of
    low level bits with no/few dependencies that are commonly used (eg.
    memory barriers) and a number of bits with more dependencies that
    aren't used in many places (eg. switch_to()).

    These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

    Move memory barriers here. This is already done for MIPS and Alpha.

    (2) asm/switch_to.h

    Move switch_to() and related stuff here.

    (3) asm/exec.h

    Move arch_align_stack() here. Other process execution related bits
    could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

    Move xchg() and cmpxchg() here as they're full word atomic ops and
    frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

    Move die() and related bits.

    (6) asm/auxvec.h

    Move AT_VECTOR_SIZE_ARCH here.

    Other arch headers are created as needed on a per-arch basis."

    Fixed up some conflicts from other header file cleanups and moving code
    around that has happened in the meantime, so David's testing is somewhat
    weakened by that. We'll find out anything that got broken and fix it..

    * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
    Delete all instances of asm/system.h
    Remove all #inclusions of asm/system.h
    Add #includes needed to permit the removal of asm/system.h
    Move all declarations of free_initmem() to linux/mm.h
    Disintegrate asm/system.h for OpenRISC
    Split arch_align_stack() out from asm-generic/system.h
    Split the switch_to() wrapper out of asm-generic/system.h
    Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
    Create asm-generic/barrier.h
    Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
    Disintegrate asm/system.h for Xtensa
    Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
    Disintegrate asm/system.h for Tile
    Disintegrate asm/system.h for Sparc
    Disintegrate asm/system.h for SH
    Disintegrate asm/system.h for Score
    Disintegrate asm/system.h for S390
    Disintegrate asm/system.h for PowerPC
    Disintegrate asm/system.h for PA-RISC
    Disintegrate asm/system.h for MN10300
    ...

    Linus Torvalds
     
  • Remove all #inclusions of asm/system.h preparatory to splitting and killing
    it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

    Signed-off-by: David Howells

    David Howells
     

28 Mar, 2012

1 commit

  • Pull networking fixes from David Miller:
    1) Name string overrun fix in gianfar driver from Joe Perches.

    2) VHOST bug fixes from Michael S. Tsirkin and Nadav Har'El

    3) Fix dependencies on xt_LOG netfilter module, from Pablo Neira Ayuso.

    4) Fix RCU locking in xt_CT, also from Pablo Neira Ayuso.

    5) Add a parameter to skb_add_rx_frag() so we can fix the truesize
    adjustments in the drivers that use it. The individual drivers
    aren't fixed by this commit, but will be dealt with using follow-on
    commits. From Eric Dumazet.

    6) Add some device IDs to qmi_wwan driver, from Andrew Bird.

    7) Fix a potential rcu_read_lock() imbalance in rt6_fill_node(). From
    Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()
    net: add a truesize parameter to skb_add_rx_frag()
    gianfar: Fix possible overrun and simplify interrupt name field creation
    USB: qmi_wwan: Add ZTE (Vodafone) K3570-Z and K3571-Z net interfaces
    USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces
    USB: qmi_wwan: Add ZTE (Vodafone) K3565-Z and K4505-Z net interfaces
    qlcnic: Bug fix for LRO
    netfilter: nf_conntrack: permanently attach timeout policy to conntrack
    netfilter: xt_CT: fix assignation of the generic protocol tracker
    netfilter: xt_CT: missing rcu_read_lock section in timeout assignment
    netfilter: cttimeout: fix dependency with l4protocol conntrack module
    netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6
    vhost: fix release path lockdep checks
    vhost: don't forget to schedule()
    tools/virtio: stub out strong barriers
    tools/virtio: add linux/hrtimer.h stub
    tools/virtio: add linux/module.h stub

    Linus Torvalds
     

26 Mar, 2012

1 commit

  • skb_add_rx_frag() API is misleading.

    Network skbs built with this helper can use uncharged kernel memory and
    eventually stress/crash machine in OOM.

    Add a 'truesize' parameter and then fix drivers in followup patches.

    Signed-off-by: Eric Dumazet
    Cc: Wey-Yi Guy
    Signed-off-by: David S. Miller

    Eric Dumazet
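
The accounting problem can be sketched with a small userspace model. `struct toy_skb` and `toy_add_rx_frag()` are hypothetical stand-ins, and the model assumes only what the message states: the helper gains a `truesize` argument so that the memory actually pinned by a fragment is charged, not just the bytes of frame data it carries.

```c
#include <assert.h>

/* Toy skb carrying only the counters relevant to this commit. */
struct toy_skb {
    unsigned int len;      /* total bytes of frame data */
    unsigned int data_len; /* bytes held in fragments */
    unsigned int truesize; /* memory charged for this skb */
};

/*
 * Attach a fragment holding `size` bytes of data that pins `truesize`
 * bytes of memory (for example a whole page for a short frame).
 */
static void toy_add_rx_frag(struct toy_skb *skb, unsigned int size,
                            unsigned int truesize)
{
    assert(truesize >= size);
    skb->len += size;
    skb->data_len += size;
    skb->truesize += truesize;
}
```

A driver that places a 100-byte frame in its own 4096-byte page would pass `size = 100` and `truesize = 4096`, so socket memory limits see the real cost.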
     

25 Mar, 2012

1 commit

  • Pull cleanup from Paul Gortmaker:
    "The changes shown here are to unify linux's BUG support under the one
    <linux/bug.h> file. Due to historical reasons, we have some BUG code
    in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in
    linux/kernel.h predates the addition of linux/bug.h, but old code in
    kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h
    was including <asm/bug.h> to pseudo link them.

    This has caused confusion[1] and general yuck/WTF[2] reactions. Here
    is an example that violates the principle of least surprise:

    CC lib/string.o
    lib/string.c: In function 'strlcat':
    lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
    make[2]: *** [lib/string.o] Error 1
    $
    $ grep linux/bug.h lib/string.c
    #include <linux/bug.h>
    $

    We've included <linux/bug.h> for the BUG infrastructure and yet we
    still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh -
    very confusing for someone who is new to kernel development.

    With the above in mind, the goals of this changeset are:

    1) find and fix any include/*.h files that were relying on the
    implicit presence of BUG code.
    2) find and fix any C files that were consuming kernel.h and hence
    relying on implicitly getting some/all BUG code.
    3) Move the BUG related code living in kernel.h to <linux/bug.h>
    4) remove the asm/bug.h from kernel.h to finally break the chain.

    During development, the order was more like 3-4, build-test, 1-2. But
    to ensure that git history for bisect doesn't get needless build
    failures introduced, the commits have been reordered to fix the problem
    areas in advance.

    [1] https://lkml.org/lkml/2012/1/3/90
    [2] https://lkml.org/lkml/2012/1/17/414"

    Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
    and linux-next.

    * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    kernel.h: doesn't explicitly use bug.h, so don't include it.
    bug: consolidate BUILD_BUG_ON with other bug code
    BUG: headers with BUG/BUG_ON etc. need linux/bug.h
    bug.h: add include of it to various implicit C users
    lib: fix implicit users of kernel.h for TAINT_WARN
    spinlock: macroize assert_spin_locked to avoid bug.h dependency
    x86: relocate get/set debugreg fcns to include/asm/debugreg.

    Linus Torvalds
     

20 Mar, 2012

1 commit

  • As suggested by Ben, this adds the clarification on the usage of
    CHECKSUM_UNNECESSARY on the outgoing patch. Also add the usage
    description of NETIF_F_FCOE_CRC and CHECKSUM_UNNECESSARY
    for the kernel FCoE protocol driver.

    This is a follow-up to the following:
    http://patchwork.ozlabs.org/patch/147315/

    Signed-off-by: Yi Zou
    Cc: Ben Hutchings
    Cc: Jeff Kirsher
    Cc: www.Open-FCoE.org
    Signed-off-by: David S. Miller

    Yi Zou
     

10 Mar, 2012

1 commit


05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define) then
    that header really should be including <linux/bug.h> and not just
    expecting it to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

27 Feb, 2012

1 commit


24 Feb, 2012

2 commits


22 Feb, 2012

2 commits


11 Feb, 2012

1 commit


06 Jan, 2012

1 commit

  • nr_frags can be 8 bits since 256 is plenty of fragments. This allows it to be
    packed with tx_flags.

    Also by moving ip6_frag_id and dataref (both 4 bytes) next to each other we can
    avoid a hole between ip6_frag_id and frag_list on 64 bit systems.

    Signed-off-by: Ian Campbell
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Ian Campbell
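
The hole mentioned above comes from alignment padding, which a two-struct comparison makes visible. The structs below are simplified illustrations, not the kernel's `struct skb_shared_info`: they keep only the three members named in the message, with fixed-width types assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 4-byte field followed by a pointer leaves a 4-byte hole on 64-bit ABIs. */
struct shinfo_before {
    uint32_t ip6_frag_id;
    void *frag_list;
    uint32_t dataref;
};

/* Placing the two 4-byte fields side by side fills that hole. */
struct shinfo_after {
    uint32_t ip6_frag_id;
    uint32_t dataref;
    void *frag_list;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t layout_saving(void)
{
    assert(sizeof(struct shinfo_after) <= sizeof(struct shinfo_before));
    return sizeof(struct shinfo_before) - sizeof(struct shinfo_after);
}
```

On a typical LP64 target the first layout occupies 24 bytes and the second 16; on 32-bit targets the two are usually the same size, which is why the message singles out 64-bit systems.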
     

24 Dec, 2011

1 commit


05 Dec, 2011

1 commit

  • We discovered that the TCP stack could retransmit misaligned skbs if a
    malicious peer acknowledged a sub-MSS frame. This currently can happen
    only if the output interface is not SG enabled: if SG is enabled, TCP
    builds headless skbs (all payload is included in fragments), so the TCP
    trimming process only removes parts of skb fragments and headers stay
    aligned.

    Some arches can't handle misalignments, so force a head reallocation and
    shrink headroom to MAX_TCP_HEADER.

    Don't care about misalignments on x86 and PPC (or other arches setting
    NET_IP_ALIGN to 0).

    This patch introduces __pskb_copy() which can specify the headroom of
    new head, and pskb_copy() becomes a wrapper on top of __pskb_copy()

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Nov, 2011

1 commit

  • Given that we no longer use the struct net_device *dev argument, and this
    interface brings little benefit, remove netdev_{alloc|free}_page(), to
    debloat include/linux/skbuff.h a bit.

    (Some drivers used a mix of these interfaces and alloc_pages())

    When allocating a page given to device for DMA transfer (device to
    memory), it makes sense to use a cold one (__GFP_COLD)

    Signed-off-by: Eric Dumazet
    CC: Jeff Kirsher
    CC: Dimitris Michailidis
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Nov, 2011

1 commit


17 Nov, 2011

2 commits


15 Nov, 2011

1 commit

  • One of the things we discussed during the netdev 2011 conference was the
    idea to change some network drivers to allocate/populate their skb at RX
    completion time, right before feeding the skb to the network stack.

    In the old days, we allocated skbs when populating the RX ring.

    This means bringing into cpu cache sk_buff and skb_shared_info cache
    lines (since we clear/initialize them), then 'queue' skb->data to NIC.

    By the time the NIC fills a frame in the skb->data buffer and the host
    can process it, the cpu has probably thrown away the cache lines from its
    caches, because a lot of things happened between the allocation and final
    use.

    So the deal would be to allocate only the data buffer for the NIC to
    populate its RX ring buffer. And use build_skb() at RX completion to
    attach a data buffer (now filled with an ethernet frame) to a new skb,
    initialize the skb_shared_info portion, and give the hot skb to network
    stack.

    build_skb() is the function to allocate an skb, caller providing the
    data buffer that should be attached to it. Drivers are expected to call
    skb_reserve() right after build_skb() to adjust skb->data to the
    Ethernet frame (usually skipping NET_SKB_PAD and NET_IP_ALIGN, but some
    drivers might add a hardware provided alignment)

    Data provided to build_skb() MUST have been allocated by a prior
    kmalloc() call, with enough room to add SKB_DATA_ALIGN(sizeof(struct
    skb_shared_info)) bytes at the end of the data without corrupting
    incoming frame.

    data = kmalloc(NET_SKB_PAD + NET_IP_ALIGN + 1536 +
                   SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
                   GFP_ATOMIC);
    ...
    skb = build_skb(data);
    if (!skb) {
            recycle_data(data);
    } else {
            skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
            ...
    }

    Signed-off-by: Eric Dumazet
    CC: Eilon Greenstein
    CC: Ben Hutchings
    CC: Tom Herbert
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    CC: Thomas Graf
    CC: Herbert Xu
    CC: Jeff Kirsher
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Nov, 2011

1 commit

  • The 802.1X EAPOL handshake that hostapd performs requires
    knowing whether the frame was ack'ed by the peer.
    Currently, we fudge this pretty badly by not even
    transmitting the frame as a normal data frame but
    injecting it with radiotap and getting the status
    out of radiotap monitor as well. This is rather
    complex, confuses users (mon.wlan0 presence) and
    doesn't work with all hardware.

    To get rid of that hack, introduce a real wifi TX
    status option for data frame transmissions.

    This works similarly to the existing TX timestamping
    in that it reflects the SKB back to the socket's
    error queue with a SCM_WIFI_STATUS cmsg that has
    an int indicating ACK status (0/1).

    Since it is possible that at some point we will
    want to have TX timestamping and wifi status in a
    single errqueue SKB (there's little point in not
    doing that), redefine SO_EE_ORIGIN_TIMESTAMPING
    to SO_EE_ORIGIN_TXSTATUS which can collect more
    than just the timestamp; keep the old constant
    as an alias of course. Currently the internal APIs
    don't make that possible, but it wouldn't be hard
    to split them up in a way that makes it possible.

    Thanks to Neil Horman for helping me figure out
    the functions that add the control messages.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
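
From the application side, the status arrives as a control message on the socket's error queue. The sketch below shows only the parsing step, on Linux: `wifi_ack_status()` is a hypothetical helper, and the fallback value 41 for `SO_WIFI_STATUS` is an assumption taken from the generic socket header that applies only if the system headers do not define it (some architectures use a different number).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_WIFI_STATUS
#define SO_WIFI_STATUS 41 /* assumption: asm-generic value, for old headers */
#endif
#ifndef SCM_WIFI_STATUS
#define SCM_WIFI_STATUS SO_WIFI_STATUS
#endif

/*
 * Scan the control messages of a message read from the error queue.
 * Returns 1 if the frame was acked, 0 if it was not, and -1 if no
 * wifi status control message is present.
 */
static int wifi_ack_status(struct msghdr *msg)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL);
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_WIFI_STATUS &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
            int acked;

            memcpy(&acked, CMSG_DATA(cmsg), sizeof(acked));
            return acked ? 1 : 0;
        }
    }
    return -1;
}
```

In a real program the `msghdr` would presumably be filled by `recvmsg()` with `MSG_ERRQUEUE`, after the option has been enabled on the socket with `setsockopt()` and `SO_WIFI_STATUS`.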
     

01 Nov, 2011

1 commit


25 Oct, 2011

1 commit


24 Oct, 2011

1 commit

  • The pair of functions,

    * skb_clone_tx_timestamp()
    * skb_complete_tx_timestamp()

    were designed to allow timestamping in PHY devices. The first
    function, called during the MAC driver's hard_xmit method, identifies
    PTP protocol packets, clones them, and gives them to the PHY device
    driver. The PHY driver may hold onto the packet and deliver it at a
    later time using the second function, which adds the packet to the
    socket's error queue.

    As pointed out by Johannes, nothing prevents the socket from
    disappearing while the cloned packet is sitting in the PHY driver
    awaiting a timestamp. This patch fixes the issue by taking a reference
    on the socket for each such packet. In addition, the comments
    regarding the usage of these functions are expanded to highlight the
    rule that PHY drivers must use skb_complete_tx_timestamp() to release
    the packet, in order to release the socket reference, too.

    These functions first appeared in v2.6.36.

    Reported-by: Johannes Berg
    Signed-off-by: Richard Cochran
    Cc:
    Signed-off-by: Eric Dumazet
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Richard Cochran
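
The lifetime rule from this commit can be modelled with a plain reference counter. All names here (`toy_sock`, `toy_clone`, `toy_clone_tx_timestamp()`, `toy_complete_tx_timestamp()`) are hypothetical; the sketch only assumes what the message says, namely that each held clone pins the socket until the completion function releases it.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
    int refcnt;       /* socket stays alive while this is above zero */
    int errqueue_len; /* packets delivered to the error queue */
};

struct toy_clone {
    struct toy_sock *sk; /* socket pinned by this clone, NULL once released */
};

/* Hand a clone to the PHY driver and pin the socket for its lifetime. */
static struct toy_clone toy_clone_tx_timestamp(struct toy_sock *sk)
{
    struct toy_clone clone = { sk };

    sk->refcnt++;
    return clone;
}

/* Deliver the clone to the error queue and drop the socket reference. */
static void toy_complete_tx_timestamp(struct toy_clone *clone)
{
    assert(clone->sk != NULL && clone->sk->refcnt > 0);
    clone->sk->errqueue_len++;
    clone->sk->refcnt--;
    clone->sk = NULL;
}
```

A driver that freed the clone by any other path would leave the count raised in this model, which mirrors the rule that PHY drivers must finish through the completion function.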
     

21 Oct, 2011

2 commits


20 Oct, 2011

2 commits

  • I audited all of the callers in the tree and only one of them (pktgen) expects
    it to do so. Taking this reference is pretty obviously confusing and error
    prone.

    In particular I looked at the following commits which switched callers of
    (__)skb_frag_set_page to the skb paged fragment api:

    6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API.
    5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API.
    0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API.
    86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API.
    4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API.
    18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API.
    b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API.
    b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API.
    804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs
    ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs

    Signed-off-by: Ian Campbell
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Ian Campbell
     
  • skb_recycle_check resets the skb if it's eligible for recycling.
    However, there are times when a driver might want to optionally
    manipulate the skb data with the skb before resetting the skb,
    but after it has determined eligibility. We do this by splitting the
    eligibility check from the skb reset, creating two inline functions to
    accomplish that task.

    Signed-off-by: Andy Fleming
    Acked-by: David Daney
    Signed-off-by: David S. Miller

    Andy Fleming
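
The split described here can be sketched as two small helpers plus a combined wrapper. The `toy_` names and the eligibility conditions are assumptions for illustration; the kernel's actual checks are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_skb {
    bool cloned;     /* data shared with a clone */
    bool shared;     /* more than one user holds the skb */
    size_t buf_size; /* size of the data buffer */
    size_t len;      /* bytes of data currently in the skb */
};

/* Stage 1: decide eligibility without touching the skb. */
static bool toy_is_recycleable(const struct toy_skb *skb, size_t needed)
{
    return !skb->cloned && !skb->shared && skb->buf_size >= needed;
}

/* Stage 2: reset the skb so it can be reused. */
static void toy_recycle(struct toy_skb *skb)
{
    skb->len = 0;
}

/* Combined form, equivalent to running both stages back to back. */
static bool toy_recycle_check(struct toy_skb *skb, size_t needed)
{
    assert(skb != NULL);
    if (!toy_is_recycleable(skb, needed))
        return false;
    toy_recycle(skb);
    return true;
}
```

A driver that wants to inspect the data first would call the eligibility helper, do its work while `len` is still intact, and only then call the reset helper.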
     

19 Oct, 2011

1 commit

  • To ease skb->truesize sanitization, it's better to be able to localize
    all references to skb frags size.

    Define accessors : skb_frag_size() to fetch frag size, and
    skb_frag_size_{set|add|sub}() to manipulate it.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
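
The accessor pattern can be sketched on a simplified fragment descriptor. `struct toy_frag` and the `toy_` functions are hypothetical; they follow the get/set/add/sub naming from the message, but the parameter types are assumptions.

```c
#include <assert.h>

/* Simplified fragment descriptor; the real skb_frag_t has more fields. */
struct toy_frag {
    unsigned int page_offset;
    unsigned int size;
};

static unsigned int toy_frag_size(const struct toy_frag *frag)
{
    return frag->size;
}

static void toy_frag_size_set(struct toy_frag *frag, unsigned int size)
{
    frag->size = size;
}

static void toy_frag_size_add(struct toy_frag *frag, unsigned int delta)
{
    frag->size += delta;
}

static void toy_frag_size_sub(struct toy_frag *frag, unsigned int delta)
{
    assert(delta <= frag->size); /* never shrink below zero */
    frag->size -= delta;
}
```

With every read and write funnelled through these helpers, a later change to how the size is stored or accounted touches one place instead of every driver.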
     

14 Oct, 2011

1 commit

  • skb truesize currently accounts for sk_buff struct and part of skb head.
    kmalloc() roundings are also ignored.

    Considering that skb_shared_info is larger than sk_buff, it's time to
    take it into account for better memory accounting.

    This patch introduces SKB_TRUESIZE(X) macro to centralize various
    assumptions into a single place.

    At skb alloc phase, we put skb_shared_info struct at the exact end of
    skb head, to allow a better use of memory (lowering number of
    reallocations), since kmalloc() gives us power-of-two memory blocks.

    Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
    aligned to cache lines, as before.

    Note: This patch might trigger performance regressions because of
    misconfigured protocol stacks, hitting per socket or global memory
    limits that were previously not reached. But it's a necessary step for
    more accurate memory accounting.

    Signed-off-by: Eric Dumazet
    CC: Andi Kleen
    CC: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
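
The accounting idea can be sketched with toy macros. The cache line size and both struct sizes below are assumptions picked for illustration, and `TOY_TRUESIZE` is a hypothetical stand-in that charges the payload plus both cache-aligned bookkeeping structures.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define TOY_CACHE_BYTES  64u
#define TOY_SKB_BYTES    232u /* stands in for sizeof(struct sk_buff) */
#define TOY_SHINFO_BYTES 320u /* stands in for sizeof(struct skb_shared_info) */

/* Round x up to a multiple of the cache line size. */
#define TOY_DATA_ALIGN(x) \
    (((x) + (TOY_CACHE_BYTES - 1u)) & ~(TOY_CACHE_BYTES - 1u))

/* Memory charged for an skb carrying x bytes of data. */
#define TOY_TRUESIZE(x) \
    ((x) + TOY_DATA_ALIGN(TOY_SKB_BYTES) + TOY_DATA_ALIGN(TOY_SHINFO_BYTES))

/* Function form, convenient when the payload size is a runtime value. */
static unsigned int toy_truesize(unsigned int payload)
{
    unsigned int total = TOY_TRUESIZE(payload);

    assert(total > payload); /* overhead is always charged */
    return total;
}
```

With these assumed sizes, a 1500-byte payload is charged 1500 + 256 + 320 = 2076 bytes, noticeably more than an accounting that ignores the shared info block.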
     

22 Sep, 2011

1 commit

  • Conflicts:
    MAINTAINERS
    drivers/net/Kconfig
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
    drivers/net/ethernet/broadcom/tg3.c
    drivers/net/wireless/iwlwifi/iwl-pci.c
    drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
    drivers/net/wireless/rt2x00/rt2800usb.c
    drivers/net/wireless/wl12xx/main.c

    David S. Miller
     

16 Sep, 2011

1 commit

  • dev_forward_skb loops an skb back into the host networking
    stack, which might hang on to the memory indefinitely.
    In particular, this can happen in macvtap in bridged mode.
    Copy the userspace fragments to avoid blocking the
    sender in that case.

    As this patch makes skb_copy_ubufs extern now,
    I also added some documentation and made the function clear
    the SKBTX_DEV_ZEROCOPY flag automatically instead
    of doing that in all callers. This can be made into a separate
    patch if people feel it's worth it.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     

25 Aug, 2011

1 commit


23 Aug, 2011

1 commit

  • The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
    bare get/put_page on SKB page fragments and to isolate users from subsequent
    changes to the skb_frag_t data structure.

    Signed-off-by: Ian Campbell
    Cc: "David S. Miller"
    Cc: Eric Dumazet
    Cc: "Michał Mirosław"
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Ian Campbell
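
The wrapper idea can be modelled in a few lines. `toy_page`, `toy_frag`, `toy_frag_ref()` and `toy_frag_unref()` are hypothetical names, and a bare counter stands in for the page reference count.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
    int refcount;
};

/* Callers see only the fragment; how it reaches its page is hidden. */
struct toy_frag {
    struct toy_page *page;
};

/* Take a reference on the page backing this fragment. */
static void toy_frag_ref(struct toy_frag *frag)
{
    assert(frag->page != NULL);
    frag->page->refcount++;
}

/* Drop a reference on the page backing this fragment. */
static void toy_frag_unref(struct toy_frag *frag)
{
    assert(frag->page != NULL && frag->page->refcount > 0);
    frag->page->refcount--;
}
```

Code written against the wrappers keeps working if the fragment later stores its page differently, since only the two helpers need to change.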
     

19 Aug, 2011

1 commit