21 Apr, 2010

1 commit


17 Apr, 2010

1 commit

  • The af_packet protocol is used by Perl to issue ioctls, as reported by
    Stephane Riviere:

    "Net::RawIP relies on SIOCGIFADDR and SIOCGIFHWADDR to get the IP and MAC
    addresses of the network interface."

    But in a new network namespace these ioctls fail, because they are
    disabled for any namespace other than the init_net_ns.

    These two lines should not be there, as af_inet and af_packet have been
    namespace-aware for a long time now. I suppose we forgot to remove these
    lines because we sent the af_packet patch first, before af_inet was
    supported.
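
    A minimal userspace sketch of the two ioctls Net::RawIP relies on. The
    helper name query_if_addrs() is hypothetical. It accepts any open socket;
    with this fix, an AF_PACKET socket in a non-initial network namespace
    should answer these requests too. Error handling is reduced to returning
    -1 with errno left as set by ioctl().

```c
#define _DEFAULT_SOURCE /* so glibc's <net/if.h> exposes struct ifreq */
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: fetch the IPv4 address and the hardware (MAC)
 * address of interface `ifname` through an already-open socket `fd`.
 * Returns 0 on success, -1 on failure with errno set by ioctl(). */
int query_if_addrs(int fd, const char *ifname,
                   struct in_addr *ip, unsigned char mac[6])
{
    struct ifreq ifr;
    struct sockaddr_in sin;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    /* SIOCGIFADDR returns the address as a struct sockaddr in ifr_addr. */
    if (ioctl(fd, SIOCGIFADDR, &ifr) < 0)
        return -1;
    memcpy(&sin, &ifr.ifr_addr, sizeof(sin));
    *ip = sin.sin_addr;

    /* SIOCGIFHWADDR returns the link-layer address in ifr_hwaddr.sa_data. */
    if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0)
        return -1;
    memcpy(mac, ifr.ifr_hwaddr.sa_data, 6);
    return 0;
}
```

    Usage: query_if_addrs(fd, "eth0", &ip, mac) on a packet socket created
    inside the new namespace.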

    Signed-off-by: Daniel Lezcano
    Reported-by: Stephane Riviere
    Signed-off-by: David S. Miller

    Daniel Lezcano
     

13 Apr, 2010

1 commit

  • Enable the SO_TIMESTAMPING socket infrastructure for raw packet sockets.
    We introduce PACKET_TX_TIMESTAMP for the control message cmsg_type.

    Similar support for UDP and CAN sockets was added in commit
    51f31cabe3ce5345b51e4a4f82138b38c4d5dc91
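
    A hedged sketch of the userspace side, assuming the uapi headers
    <linux/if_packet.h> and <linux/net_tstamp.h> are installed. Both helper
    names are hypothetical. Which timestamps are actually generated depends on
    the kernel version and the driver; transmit timestamps are read back with
    recvmsg(fd, &msg, MSG_ERRQUEUE), where the packet-socket cmsg can be
    recognised by its SOL_PACKET level and PACKET_TX_TIMESTAMP type.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/net_tstamp.h>

/* Hypothetical helper: request software TX timestamps (plus software
 * timestamp reporting) on an existing socket, e.g. an AF_PACKET one.
 * Assumption: whether they are produced depends on kernel and driver. */
int enable_tx_sw_timestamps(int fd)
{
    int flags = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}

/* Hypothetical helper: recognise the SOL_PACKET / PACKET_TX_TIMESTAMP
 * control message among those returned from the error queue. */
int is_packet_tx_timestamp(const struct cmsghdr *c)
{
    return c->cmsg_level == SOL_PACKET && c->cmsg_type == PACKET_TX_TIMESTAMP;
}
```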

    Signed-off-by: Richard Cochran
    Signed-off-by: David S. Miller

    Richard Cochran
     

12 Apr, 2010

1 commit


04 Apr, 2010

2 commits

  • Convert the multicast list, and the core code that manipulates it, to
    work the same way as uc_list.

    +uses two functions for adding/removing a mc address (normal and "global"
    variant) instead of a function parameter.
    +removes dev_mcast.c completely.
    +exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
    manipulating lists in a sandbox (used in bonding and 80211 drivers)

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • +slightly renames the unicast functions to be consistent with the multicast ones

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered:
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others adding it to the
    implementation .h or embedding .c file was more appropriate. This step
    added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    7, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

03 Mar, 2010

1 commit

  • My previous patch 914c8ad2d18b62ad1420f518c0cab0b0b90ab308 incorrectly made
    the length check in packet_mc_add stricter. The problem is that userspace
    does not fill in this field (it stays zeroed) when setting
    PACKET_MR_PROMISC or PACKET_MR_ALLMULTI. So move the strict check to the
    point in the path where addr_len must be set correctly.
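
    To illustrate the two cases, a sketch of how userspace might fill
    struct packet_mreq from <linux/if_packet.h>. The helper names are
    hypothetical. A PACKET_MR_PROMISC request leaves mr_alen at zero, while a
    PACKET_MR_MULTICAST request carries a real address whose length should
    match the device's (6 for Ethernet).

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: a PACKET_MR_PROMISC request needs no address,
 * so mr_alen stays 0, the case the relaxed check accepts again. */
struct packet_mreq make_promisc_mreq(int ifindex)
{
    struct packet_mreq mr;

    memset(&mr, 0, sizeof(mr));
    mr.mr_ifindex = ifindex;
    mr.mr_type = PACKET_MR_PROMISC;
    return mr;
}

/* Hypothetical helper: a PACKET_MR_MULTICAST request carries a real
 * link-layer address; alen should equal the device's address length.
 * Returns -1 if the address does not fit in mr_address. */
int make_multicast_mreq(int ifindex, const unsigned char *addr,
                        unsigned short alen, struct packet_mreq *mr)
{
    if (alen > sizeof(mr->mr_address))
        return -1;
    memset(mr, 0, sizeof(*mr));
    mr->mr_ifindex = ifindex;
    mr->mr_type = PACKET_MR_MULTICAST;
    mr->mr_alen = alen;
    memcpy(mr->mr_address, addr, alen);
    return 0;
}

/* Apply either request to an AF_PACKET socket (hypothetical wrapper). */
int packet_add_membership(int fd, const struct packet_mreq *mr)
{
    return setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, mr, sizeof(*mr));
}
```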

    Signed-off-by: Jiri Pirko
    Reported-by: Pavel Roskin
    Signed-off-by: David S. Miller

    Jiri Pirko
     

01 Mar, 2010

1 commit


26 Feb, 2010

1 commit


25 Feb, 2010

1 commit

  • Update rcu_dereference() primitives to use new lockdep-based
    checking. The rcu_dereference() in __in6_dev_get() may be
    protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
    The rcu_dereference() in __sk_free() is protected by the fact
    that it is never reached if an update could change it. Check
    for this by using rcu_dereference_check() to verify that the
    struct sock's ->sk_wmem_alloc counter is zero.

    Acked-by: Eric Dumazet
    Acked-by: David S. Miller
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

23 Feb, 2010

1 commit

  • Convert AF_PACKET to use RCU, eliminating one more reader/writer lock.

    There is no need for a real sk_del_node_init_rcu(), because sk_del_node_init
    is already doing the equivalent of hlist_del_init_rcu; but I added
    some comments to try to make that obvious.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

11 Feb, 2010

1 commit


06 Feb, 2010

1 commit

  • Early on, this was an experimental facility that few
    people other than Alexey Kuznetsov played with.

    Now it's a pretty fundamental thing, and as people add
    more features to AF_PACKET sockets this config option
    creates ifdef spaghetti.

    So kill it off.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Feb, 2010

1 commit

  • This patch adds GSO/checksum offload to af_packet sockets using
    virtio_net_hdr. Based on Rusty's patch to add this support to tun.
    It allows GSO/checksum offload to be enabled when using raw socket
    backend with virtio_net.
    Adds PACKET_VNET_HDR socket option to prepend virtio_net_hdr in the
    receive path and process/skip virtio_net_hdr in the send path. This
    option is only allowed with SOCK_RAW sockets attached to ethernet
    type devices.

    v2 updates
    ----------
    Michael's Comments
    - Perform length check in packet_snd() when GSO is off even when
    vnet_hdr is present.
    - Check for SKB_GSO_FCOE type and return -EINVAL
    - don't allow tx/rx ring when vnet_hdr is enabled.
    Herbert's Comments
    - Removed ethernet specific code.
    - protocol value is assumed to be passed in by the caller.
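
    A hedged sketch of the sender side, assuming <linux/virtio_net.h> is
    available and that header fields are in host byte order. The helper names
    and the IPv4/TCP, untagged-Ethernet framing are illustrative assumptions.
    The header asks the kernel to finish the TCP checksum and requests no
    segmentation; on a SOCK_RAW socket it is written in front of the frame.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/virtio_net.h>

#define ETH_HDR_LEN     14 /* assumption: untagged Ethernet header */
#define TCP_CSUM_OFFSET 16 /* checksum field offset inside the TCP header */

/* Hypothetical helper: enable PACKET_VNET_HDR on a SOCK_RAW packet socket. */
int enable_vnet_hdr(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one));
}

/* Hypothetical helper: describe an IPv4/TCP frame whose TCP checksum the
 * kernel (or device) should complete; no GSO is requested. */
void vnet_hdr_tcp4_csum(struct virtio_net_hdr *h, unsigned int ip_hdr_len)
{
    memset(h, 0, sizeof(*h));
    h->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
    h->gso_type = VIRTIO_NET_HDR_GSO_NONE;
    h->csum_start = ETH_HDR_LEN + ip_hdr_len; /* checksum covers the TCP header on */
    h->csum_offset = TCP_CSUM_OFFSET;         /* store result here, relative to csum_start */
}
```

    The frame would then be sent as the virtio_net_hdr immediately followed
    by the Ethernet frame in one write.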

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Sridhar Samudrala
     

23 Jan, 2010

1 commit


18 Jan, 2010

1 commit


12 Jan, 2010

1 commit


16 Dec, 2009

1 commit

  • commit 654d1f8a019dfa06d (packet: less dev_put() calls)
    introduced a problem: it calls potentially sleeping functions from an
    rcu_read_lock()-protected section.

    Fix this by releasing the lock before the sock_wmalloc()/memcpy_fromiovec() calls.

    After skb allocation and copy from user space, we redo device
    lookup and appropriate tests.

    Reported-and-tested-by: Frederic Weisbecker
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Nov, 2009

1 commit


26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.
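
    Why a helper instead of a plain pointer comparison: a userspace model
    of net_eq(), with a hypothetical stand-in for struct net. With
    namespaces compiled in (modelled here with -DCONFIG_NET_NS) it compares
    pointers; without them there is only one namespace, so it returns 1 and
    the compiler can fold the test away.

```c
/* Hypothetical stand-in for the kernel's struct net. */
struct net {
    int id;
};

/* Userspace model of net_eq(): pointer comparison only when network
 * namespaces are configured; otherwise every namespace is the same one. */
static inline int net_eq(const struct net *net1, const struct net *net2)
{
#ifdef CONFIG_NET_NS
    return net1 == net2;
#else
    (void)net1;
    (void)net2;
    return 1;
#endif
}
```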

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

11 Nov, 2009

1 commit


06 Nov, 2009

1 commit

  • The generic __sock_create function has a kern argument which allows the
    security system to make decisions based on whether a socket is being
    created by the kernel or by userspace. This patch passes that flag to the
    net_proto_family specific create function, so it can do the same thing.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

02 Nov, 2009

1 commit

  • - packet_sendmsg_spkt() can use dev_get_by_name_rcu() to avoid touching device refcount.

    - packet_getname_spkt() & packet_getname() can use dev_get_by_index_rcu() to
    avoid touching device refcount too.

    tpacket_snd() & packet_snd() cannot use RCU yet because they can sleep when
    allocating an skb.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Oct, 2009

1 commit


29 Oct, 2009

1 commit

  • Currently PACKET_TX_RING forces a certain amount of every frame to remain
    unused. This probably originates from an early version of the
    PACKET_TX_RING patch that in fact used the extra space when the (since
    removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current
    code does not make any use of this extra space.

    This patch removes the extra space reservation and lets userspace make
    use of the full frame size.
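
    A sketch of the resulting frame arithmetic, assuming TPACKET_V2 and
    assuming the post-patch layout in which frame data begins
    TPACKET2_HDRLEN - sizeof(struct sockaddr_ll) bytes into each frame. The
    function names are hypothetical; the macros and structs come from
    <linux/if_packet.h>.

```c
#include <stddef.h>
#include <linux/if_packet.h>

/* Assumption: with TPACKET_V2 the TX path reads frame data starting at
 * TPACKET2_HDRLEN - sizeof(struct sockaddr_ll), with no extra reserve. */
size_t tx_data_offset_v2(void)
{
    return TPACKET2_HDRLEN - sizeof(struct sockaddr_ll);
}

/* Largest payload that fits in a TX ring frame of `frame_size` bytes. */
size_t tx_max_payload_v2(size_t frame_size)
{
    size_t off = tx_data_offset_v2();

    return frame_size > off ? frame_size - off : 0;
}
```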

    Signed-off-by: Gabor Gombas
    Signed-off-by: David S. Miller

    Gabor Gombas
     

27 Oct, 2009

1 commit

  • We currently use a 16 bit field (vlan_tci) to store VLAN ID/PRIO on a skb.

    A null value is used as a special value, meaning VLAN tagging is not
    enabled. This forbids the use of a null VLAN ID.

    As pointed out by David, some drivers use the 3 high-order bits (PRIO).

    As the VLAN ID is 12 bits, we can use the remaining bit (CFI) as a flag,
    and allow a null VLAN ID.

    In case future code really wants to use VLAN_CFI_MASK, we'll have to use
    a bit outside of vlan_tci.

    #define VLAN_PRIO_MASK 0xe000 /* Priority Code Point */
    #define VLAN_PRIO_SHIFT 13
    #define VLAN_CFI_MASK 0x1000 /* Canonical Format Indicator */
    #define VLAN_TAG_PRESENT VLAN_CFI_MASK
    #define VLAN_VID_MASK 0x0fff /* VLAN Identifier */
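
    Using the masks above, a minimal sketch of packing and unpacking
    vlan_tci so that VID 0 stays distinguishable from "no tag". The helper
    names are hypothetical; only the bit layout is taken from the definitions
    above.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_PRIO_MASK   0xe000 /* Priority Code Point */
#define VLAN_PRIO_SHIFT  13
#define VLAN_CFI_MASK    0x1000 /* Canonical Format Indicator */
#define VLAN_TAG_PRESENT VLAN_CFI_MASK
#define VLAN_VID_MASK    0x0fff /* VLAN Identifier */

/* Hypothetical helper: pack PRIO and VID and mark the tag as present. */
uint16_t vlan_tci_make(uint8_t prio, uint16_t vid)
{
    return (uint16_t)(((prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK) |
                      (vid & VLAN_VID_MASK) | VLAN_TAG_PRESENT);
}

/* A zero tci means "no tag"; any tagged value has the CFI bit set. */
bool vlan_tci_present(uint16_t tci) { return (tci & VLAN_TAG_PRESENT) != 0; }
uint16_t vlan_tci_vid(uint16_t tci) { return tci & VLAN_VID_MASK; }
uint8_t vlan_tci_prio(uint16_t tci)
{
    return (uint8_t)((tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT);
}
```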

    Reported-by: Gertjan Hofman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Oct, 2009

2 commits

  • We hold RTNL, we can use __dev_get_by_index() instead of dev_get_by_index()

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • While doing multiple captures, I found af_packet was dirtying the cache
    line containing its prot_hook.

    This slows down machines where several CPUs are necessary to handle
    capture traffic, as each prot_hook is traversed for each packet coming
    into or out of the host.

    This patch moves "struct packet_type prot_hook" to the end of
    packet_sock, and uses ____cacheline_aligned_in_smp to make sure
    it remains shared by all CPUs.
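
    The kernel change uses ____cacheline_aligned_in_smp; a userspace
    analogue of the same layout idea in C11 follows. The 64-byte line size,
    struct capture_sock and its fields are assumptions for illustration:
    frequently written counters come first, and the read-mostly hook starts
    on its own cache line so writes to the counters do not invalidate the
    line other CPUs keep reading.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed cache-line size for this sketch */

/* Hypothetical analogue of packet_sock's layout after the change. */
struct capture_sock {
    unsigned long rx_packets;      /* written on every packet */
    unsigned long rx_drops;        /* written on drops */
    alignas(CACHELINE) struct {    /* starts a fresh cache line */
        unsigned short type;       /* read by every CPU, rarely written */
        void (*func)(void *);
    } hook;
};
```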

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Oct, 2009

1 commit

  • Create a new socket level option to report number of queue overflows

    Recently I augmented the AF_PACKET protocol to report the number of frames lost
    on the socket receive queue between any two enqueued frames. This value was
    exported via a SOL_PACKET level cmsg. After I completed that work, it was
    requested that this feature be generalized so that any datagram oriented socket
    could make use of this option. As such, I've created this patch. It creates a
    new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
    SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
    overflowed between any two given frames. It also augments the AF_PACKET
    protocol to take advantage of this new feature (as it previously did not touch
    sk->sk_drops, which this patch uses to record the overflow count). Tested
    successfully by me.

    Notes:

    1) Unlike my previous patch, this patch simply records the sk_drops value, which
    is not a number of drops between packets, but rather a total number of drops.
    Deltas must be computed in user space.

    2) While this patch currently works with datagram oriented protocols, it will
    also be accepted by non-datagram oriented protocols. I'm not sure if that's
    agreeable to everyone, but my argument in favor of doing so is that, for those
    protocols which aren't applicable to this option, sk_drops will always be zero,
    and reporting no drops on a receive queue that isn't used for those
    non-participating protocols seems reasonable to me. This also saves us having
    to code in a per-protocol opt in mechanism.

    3) This applies cleanly to net-next assuming that commit
    977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted
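
    A hedged sketch of the userspace side of note 1: scan the recvmsg()
    ancillary data for the SOL_SOCKET/SO_RXQ_OVFL cmsg (a 32-bit counter)
    and compute deltas between frames. The helper names are hypothetical.
    The fallback value 40 is the asm-generic one, used only if the libc
    headers lack the constant (some architectures use a different number).
    Enabling the option is an ordinary setsockopt(fd, SOL_SOCKET,
    SO_RXQ_OVFL, &one, sizeof(one)).

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40 /* assumption: asm-generic value */
#endif

/* Hypothetical helper: returns 1 and stores the cumulative drop count
 * in *drops if the message carries an SO_RXQ_OVFL cmsg, else 0. */
int rxq_ovfl_from_msg(struct msghdr *msg, uint32_t *drops)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SO_RXQ_OVFL) {
            memcpy(drops, CMSG_DATA(c), sizeof(*drops));
            return 1;
        }
    }
    return 0;
}

/* The counter is a running total, so per-interval loss is a difference;
 * unsigned arithmetic also handles wraparound of the 32-bit value. */
uint32_t rxq_ovfl_delta(uint32_t prev, uint32_t cur)
{
    return cur - prev;
}
```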

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
     

12 Oct, 2009

1 commit


07 Oct, 2009

2 commits


05 Oct, 2009

1 commit

  • Add ancillary data to better represent loss information

    I've had a few requests recently to provide more detail regarding frame loss
    during an AF_PACKET packet capture session. Specifically the requestors want to
    see where in a packet sequence frames were lost, i.e. they want to see that 40
    frames were lost between frames 302 and 303 in a packet capture file. In order
    to do this we need:

    1) The kernel to export this data to user space
    2) The applications to make use of it

    This patch addresses item (1). It does this by doing the following:

    A) Any time we drop a frame for which we would increment po->stats.tp_drops, we
    also now increment a stat called po->stats.tp_gap.

    B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
    value of po->stats.tp_gap in skb->mark. skb->cb would nominally be the place to
    record this, but since all the space there is used up, we're overloading
    skb->mark. It's safe to do since any enqueued packet is guaranteed to be
    unshared at this point, and skb->mark isn't used for anything else in the rx
    path to the application. After we record tp_gap in the skb, we zero
    po->stats.tp_gap. This allows us to keep a counter of the number of frames lost
    between any two enqueued packets.

    C) When the application goes to dequeue a frame from the packet socket, we look
    at skb->mark for that frame. If it is non-zero, we add a cmsg chunk to the
    msghdr of level SOL_PACKET and type PACKET_GAPDATA. It's a 32-bit integer that
    represents the number of frames lost between this packet and the last previous
    frame received.

    Note there is a chance that if there is frame loss after a receive, and then the
    socket is closed, some gap data might be lost. This is covered by the use of
    the PACKET_AUXDATA socket option, which gives total loss data. With a bit of
    math, the final gap can be determined that way.
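
    The "bit of math" can be sketched as follows, with hypothetical names:
    the loss after the last received frame is the total loss figure minus
    the per-packet gaps already reported. Per the SO_RXQ_OVFL entry above,
    this PACKET_GAPDATA patch was later reverted, so the sketch is
    illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: gaps[] holds the n PACKET_GAPDATA values seen so
 * far; total_drops is the cumulative loss count for the socket. Whatever
 * the per-packet gaps do not account for was lost after the last frame. */
uint32_t final_gap(const uint32_t *gaps, size_t n, uint32_t total_drops)
{
    uint32_t seen = 0;

    for (size_t i = 0; i < n; i++)
        seen += gaps[i];
    return total_drops - seen;
}
```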

    I've tested this patch myself, and it works well.

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet

    include/linux/if_packet.h | 2 ++
    net/packet/af_packet.c | 33 +++++++++++++++++++++++++++++++++
    2 files changed, 35 insertions(+)
    Signed-off-by: David S. Miller

    Neil Horman
     

01 Oct, 2009

2 commits


28 Sep, 2009

1 commit


24 Jul, 2009

1 commit


18 Jun, 2009

1 commit

  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    We need to take into account this offset when reporting
    sk_wmem_alloc to user, in PROC_FS files or various
    ioctls (SIOCOUTQ/TIOCOUTQ)
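
    A hypothetical userspace model of the reporting rule: after commit
    2b85a34e911bf483c27cfdd124aeb1605145dc80 the counter starts at 1 rather
    than 0, so any value shown to users (PROC_FS files, SIOCOUTQ/TIOCOUTQ)
    must have that initial unit subtracted. The struct and function names
    are illustrative only, not kernel API.

```c
#include <stdatomic.h>

/* Hypothetical model of the relevant part of struct sock. */
struct model_sock {
    atomic_int wmem_alloc;
};

/* Assumption mirrored from the commit: the counter starts at 1, not 0. */
void model_sock_init(struct model_sock *sk) { atomic_init(&sk->wmem_alloc, 1); }

/* Account bytes queued for transmission. */
void model_wmem_charge(struct model_sock *sk, int bytes)
{
    atomic_fetch_add(&sk->wmem_alloc, bytes);
}

/* Value reported to users: hide the initial offset of 1. */
int model_wmem_alloc_get(struct model_sock *sk)
{
    return atomic_load(&sk->wmem_alloc) - 1;
}
```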

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Jun, 2009

1 commit

  • Define three accessors to get, set, and drop the dst attached to an skb:

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of:
    dst_release(skb->dst);
    skb->dst = NULL;

    Delete skb->dst field
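
    A self-contained userspace model of the three accessors, with minimal
    stand-in types. The private field name _dst and the simple reference
    count in dst_release() are hypothetical simplifications; the sketch only
    shows how skb_dst_drop() folds the open-coded release-and-clear pair into
    one call once the field is no longer touched directly.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types. */
struct dst_entry { int refcnt; };
struct sk_buff { struct dst_entry *_dst; }; /* private: use the accessors */

/* Simplified model: drop one reference. */
static void dst_release(struct dst_entry *dst)
{
    if (dst)
        dst->refcnt--;
}

struct dst_entry *skb_dst(const struct sk_buff *skb) { return skb->_dst; }

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) { skb->_dst = dst; }

/* Replaces: dst_release(skb->dst); skb->dst = NULL; */
void skb_dst_drop(struct sk_buff *skb)
{
    if (skb->_dst) {
        dst_release(skb->_dst);
        skb->_dst = NULL;
    }
}
```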

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet