11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming the fields
    that hold port identifiers to portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.
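
    To illustrate the distinction (a minimal userspace sketch, not part of the
    patch): the nl_pid field of struct sockaddr_nl is a netlink port id.
    Binding with nl_pid = 0 asks the kernel to assign a unique port id, which
    only happens to equal the process id for a process's first netlink socket.

      #include <linux/netlink.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int open_netlink(void)
      {
          struct sockaddr_nl addr;
          int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

          if (fd < 0)
              return -1;

          memset(&addr, 0, sizeof(addr));
          addr.nl_family = AF_NETLINK;
          addr.nl_pid = 0;  /* a netlink port id, not a process id */

          if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
              close(fd);
              return -1;
          }
          return fd;
      }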

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

01 Sep, 2012

1 commit


25 Aug, 2012

1 commit


24 Aug, 2012

1 commit


23 Aug, 2012

3 commits

  • Change since v1:

    * Fixed inuse counters access spotted by Eric

    In patch eea68e2f (packet: Report socket mclist info via diag module) I've
    introduced a "scheduling in atomic" problem in the packet diag module -- the
    socket list is traversed under rcu_read_lock(), while the sk mclist access
    performed under it requires the rtnl lock (i.e. a mutex) to be taken.

    [152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
    [152363.820573] 4 locks held by crtools/12517:
    [152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [] sock_diag_rcv+0x1f/0x3e
    [152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [] sock_diag_rcv_msg+0xdb/0x11a
    [152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [] netlink_dump+0x23/0x1ab
    [152363.820693] #3: (rcu_read_lock){.+.+..}, at: [] packet_diag_dump+0x0/0x1af

    A similar problem was then re-introduced by further packet diag patches (the
    fanout mutex and the pgvec mutex for rings) :(

    Apart from being terribly sorry for the above, I propose to change the packet
    sk list protection from a spinlock to a mutex. This lock currently protects two
    kinds of modifications:

    * sklist
    * prot inuse counters

    The sklist modifications can simply be re-protected with a mutex, since they
    already occur in a sleeping context. The inuse counters modifications are
    trickier -- the __this_cpu_* ops are used inside, thus requiring the caller to
    handle the potential context issues himself. Since packet sockets' counters are
    modified in only two places (packet_create and packet_release), we only need to
    protect the context from being preempted. BH disabling is not required in this
    case.
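
    A minimal sketch of the resulting pattern, as in packet_create (names
    follow af_packet.c; illustrative rather than the literal diff):

      mutex_lock(&net->packet.sklist_lock);       /* sleeping context is fine */
      sk_add_node_rcu(sk, &net->packet.sklist);
      mutex_unlock(&net->packet.sklist_lock);

      preempt_disable();                          /* stable CPU for __this_cpu_* */
      sock_prot_inuse_add(net, &packet_proto, 1);
      preempt_enable();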

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Instead of using a hard-coded value for the status variable, it makes the
    code more readable to use the designated define from linux/if_packet.h.
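
    For example (an assumed call site; TP_STATUS_KERNEL is defined as 0 in
    linux/if_packet.h, so behavior is unchanged):

      __packet_set_status(po, ph, TP_STATUS_KERNEL);  /* was: ..., 0 */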

    Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
    Signed-off-by: David S. Miller

    danborkmann@iogearbox.net
     
  • David S. Miller
     

20 Aug, 2012

3 commits

  • If a packet is emitted on one socket in a group of fanout sockets,
    it is transmitted again. It is thus read again on one of the sockets
    of the fanout group. This results in a loop for software that
    generates packets upon receiving one.
    This retransmission is not the intended behavior: a fanout group
    must behave like a single socket. The packet should not be
    transmitted on a socket if it originates from a socket belonging
    to the same fanout group.

    This patch fixes the issue by changing the transmission check to
    take fanout group membership into account.
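
    A sketch of the membership check (this follows the af_packet code closely,
    but treat it as illustrative): when a transmitted packet is looped back to
    the taps, delivery is skipped if the originating socket belongs to the
    receiving hook's fanout group.

      static bool match_fanout_group(struct packet_type *ptype, struct sock *sk)
      {
          if (ptype->af_packet_priv == (void *)((struct packet_sock *)sk)->fanout)
              return true;

          return false;
      }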

    Reported-by: Aleksandr Kotov
    Signed-off-by: Eric Leblond
    Signed-off-by: David S. Miller

    Eric Leblond
     
  • The reported value is the same as the one returned by the PACKET_FANOUT
    getsockopt, but unlike it, an absent fanout setup results in an absent
    nlattr, rather than in an nlattr with a zero value. This is done because a
    zero fanout report could mean either no fanout, or a fanout with both id
    and type zero.
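
    A sketch of the dump-side logic under that rule (attribute name taken
    from the packet diag patches; treat as illustrative): the nlattr is only
    emitted when a fanout is actually configured.

      if (po->fanout) {
          u32 val = po->fanout->id | ((u32)po->fanout->type << 16);

          ret = nla_put_u32(nlskb, PACKET_DIAG_FANOUT, val);
      }
      /* else: no attribute at all, instead of an ambiguous zero */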

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • One extension bit may result in two nlattrs -- one per ring type.
    If some ring type is not configured, then the respective nlattr
    will be empty.

    The structure reported contains the data that is given to the
    corresponding ring-setup socket option.
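
    For reference, the reported structure mirrors the ring-setup parameters
    (field list as assumed from the packet_diag uapi header):

      struct packet_diag_ring {
          __u32 pdr_block_size;
          __u32 pdr_block_nr;
          __u32 pdr_frame_size;
          __u32 pdr_frame_nr;
          __u32 pdr_retire_tmo;
          __u32 pdr_sizeof_priv;
          __u32 pdr_features;
      };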

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

15 Aug, 2012

5 commits


13 Aug, 2012

1 commit

  • Here's a quote of the comment about the BUG macro from asm-generic/bug.h:

    Don't use BUG() or BUG_ON() unless there's really no way out; one
    example might be detecting data structure corruption in the middle
    of an operation that can't be backed out of. If the (sub)system
    can somehow continue operating, perhaps with reduced functionality,
    it's probably not BUG-worthy.

    If you're tempted to BUG(), think again: is completely giving up
    really the *only* solution? There are usually better options, where
    users don't need to reboot ASAP and can mostly shut down cleanly.

    In our case, the status flag of a ring buffer slot is managed from both
    sides, kernel space and user space. This means that even though the kernel
    side might work as expected, user space can screw up and change this flag
    in the window between send(2) setting it to TP_STATUS_SENDING and the given
    skb being destructed some time later; this will then hit the BUG macro. As
    David suggested, the best solution is to simply remove this statement, since
    it cannot be used for kernel-side internal consistency checks. I've tested
    it and the system still behaves /stable/ in this case, so in accordance with
    the above comment, we should rather remove it.
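
    An illustration of the change (assumed shape of the call site in
    tpacket_destruct_skb): the assertion goes away and the slot is simply
    handed back.

      /* removed -- user space may legally rewrite the flag at any time:
       * BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
       */
      __packet_set_status(po, ph, TP_STATUS_AVAILABLE);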

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    danborkmann@iogearbox.net
     

09 Aug, 2012

1 commit


28 Jun, 2012

1 commit

  • 1. Removed code replication for tov calculation for 1G, 10G and
    made it common for speeds > 1G (1G, 10G, 40G, 100G).
    2. Defined values for 4 different 40G PHYs (KR4, LR4, SR4, CR4).

    Signed-off-by: Parav Pandit
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    parav.pandit@emulex.com
     

12 Jun, 2012

1 commit


04 Jun, 2012

1 commit

  • Adding casts of objects to the same type is unnecessary
    and confusing for a human reader.

    For example, this cast:

    int y;
    int *p = (int *)&y;

    I used the coccinelle script below to find and remove these
    unnecessary casts. I manually removed the conversions this
    script produces of casts with __force and __user.

    @@
    type T;
    T *p;
    @@

    - (T *)p
    + p

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

22 Apr, 2012

1 commit


20 Apr, 2012

1 commit


16 Apr, 2012

1 commit


29 Mar, 2012

1 commit


24 Feb, 2012

1 commit


31 Dec, 2011

1 commit


28 Dec, 2011

1 commit


24 Dec, 2011

1 commit


23 Dec, 2011

1 commit

  • skb->truesize might be big even for a small packet.

    It's even bigger after commit 87fb4b7b533 (net: more accurate skb
    truesize) and with a big MTU.

    We should allow queueing at least one packet per receiver, even with a
    low RCVBUF setting.
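
    A sketch of the relaxed check (assumed shape, not the literal diff): the
    skb is accepted unless the receive queue has already reached sk_rcvbuf,
    so at least one packet can be queued regardless of its truesize.

      if (atomic_read(&sk->sk_rmem_alloc) >= (unsigned)sk->sk_rcvbuf)
          goto drop_n_acct;  /* queue already at or over the limit */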

    Reported-by: Michal Simek
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Nov, 2011

2 commits

  • packet: Add needed_tailroom to packet_sendmsg_spkt

    While auditing LL_ALLOCATED_SPACE I noticed that packet_sendmsg_spkt
    did not include needed_tailroom when allocating an skb. This isn't
    a fatal error, as we should always tolerate inadequate tail room, but
    it isn't optimal.

    This patch fixes that.
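
    A sketch of the corrected allocation (assumed shape): the device's
    needed_tailroom is simply added to the requested size.

      skb = sock_wmalloc(sk, len + LL_RESERVED_SPACE(dev) +
                         dev->needed_tailroom, 0, GFP_KERNEL);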

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • net: Remove all uses of LL_ALLOCATED_SPACE

    The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
    alignment to the sum of needed_headroom and needed_tailroom. As
    the amount that is then reserved for head room is needed_headroom
    with alignment, this means that the tail room left may be too small.

    This patch replaces all uses of LL_ALLOCATED_SPACE with the macro
    LL_RESERVED_SPACE and direct reference to needed_tailroom.

    This also fixes the problem with needed_headroom changing between
    allocating the skb and reserving the head room.
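
    The replacement pattern looks like this (illustrative): headroom is
    reserved with alignment, and tailroom is accounted for separately.

      skb = alloc_skb(LL_RESERVED_SPACE(dev) + len + dev->needed_tailroom,
                      GFP_ATOMIC);
      if (skb)
          skb_reserve(skb, LL_RESERVED_SPACE(dev));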

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

04 Nov, 2011

1 commit

  • This popped some compiler warnings due to mismatched prototypes. Just
    remove most manual inlines; the compiler should be able to figure out
    what makes sense to inline and what doesn't.

    net/packet/af_packet.c:252: warning: 'prb_curr_blk_in_use' declared inline after being called
    net/packet/af_packet.c:252: warning: previous declaration of 'prb_curr_blk_in_use' was here
    net/packet/af_packet.c:258: warning: 'prb_queue_frozen' declared inline after being called
    net/packet/af_packet.c:258: warning: previous declaration of 'prb_queue_frozen' was here
    net/packet/af_packet.c:248: warning: 'packet_previous_frame' declared inline after being called
    net/packet/af_packet.c:248: warning: previous declaration of 'packet_previous_frame' was here
    net/packet/af_packet.c:251: warning: 'packet_increment_head' declared inline after being called
    net/packet/af_packet.c:251: warning: previous declaration of 'packet_increment_head' was here
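
    A standalone repro of this warning class (hypothetical code, not from
    af_packet.c): the function is forward-declared without inline, called,
    and only later defined inline.

      static int frozen(int x);           /* no inline here ... */

      static int use(int x)
      {
          return frozen(x);               /* ... but called here ... */
      }

      static inline int frozen(int x)     /* ... and declared inline here */
      {
          return x != 0;
      }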

    Signed-off-by: Olof Johansson
    Cc: Chetan Loke
    Signed-off-by: David S. Miller

    Olof Johansson
     

19 Oct, 2011

1 commit

  • Fragmented multicast frames are delivered to a single macvlan port,
    because the ip defrag logic considers the other copies redundant.

    Implement a defrag step before trying to send the multicast frame.
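
    A sketch of the hook (ip_check_defrag() was introduced for this purpose;
    shape assumed from the era's rx-handler code):

      skb = ip_check_defrag(skb, IP_DEFRAG_MACVLAN);
      if (!skb)
          return RX_HANDLER_CONSUMED;  /* fragment queued; datagram not complete yet */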

    Reported-by: Ben Greear
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Oct, 2011

1 commit


08 Oct, 2011

1 commit


04 Oct, 2011

1 commit

  • This is a minor change.

    Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
    ...) would return total and dropped packets since its last invocation. The
    introduction of socket queue overflow reporting [1] changed drop
    rate calculation in the normal packet socket path, but not when using a
    packet ring. As a result, the getsockopt now returns different statistics
    depending on the reception method used. With a ring, it still returns the
    count since the last call, as counts are incremented in tpacket_rcv and
    reset in getsockopt. Without a ring, it returns 0 if no drops occurred
    since the last getsockopt and the total drops over the lifespan of
    the socket otherwise. The culprit is this line in packet_rcv, executed
    on a drop:

    drop_n_acct:
    po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);

    As it shows, the new drop number is taken from the socket drop counter,
    which is not reset at getsockopt. I put together a small example
    that demonstrates the issue [2]. It runs for 10 seconds and overflows
    the queue/ring on every odd second. The reported drop rates are:
    ring: 16, 0, 16, 0, 16, ...
    non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0 , 74.

    Note how the even-numbered non-ring counts monotonically increase. Because
    the getsockopt adds tp_drops to tp_packets, total counts are similarly
    reported cumulatively. Long story short, reinstating the original code, as
    the below patch does, fixes the issue at the cost of additional per-packet
    cycles. Another solution that does not introduce per-packet overhead
    would be to keep the current data path, record the value of sk_drops at
    getsockopt() call N in a new field in struct packet_sock, and subtract
    that when reporting at call N+1. I'll be happy to code that instead;
    it's just messier.
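
    A sketch of the reinstated accounting (assumed shape of the fix):

      drop_n_acct:
          spin_lock(&sk->sk_receive_queue.lock);
          po->stats.tp_drops++;
          atomic_inc(&sk->sk_drops);
          spin_unlock(&sk->sk_receive_queue.lock);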

    [1] http://patchwork.ozlabs.org/patch/35665/
    [2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

16 Sep, 2011

1 commit

  • This patch does several things:
    - introduces __ethtool_get_settings, which is called from ethtool code and
      from drivers as well. Puts ASSERT_RTNL there.
    - dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
    - changes calls in drivers so rtnl locking is respected. In
      iboe_get_rate, ->get_settings() was previously called unlocked. This
      fixes it. prb_calc_retire_blk_tmo() in af_packet.c had the same
      problem; it is also fixed, by calling __dev_get_by_index() instead of
      dev_get_by_index() and holding rtnl_lock around both calls (see the
      sketch below).
    - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
      so bnx2fc_if_create() and fcoe_if_create() are called locked, as they
      are from other places.
    - uses __ethtool_get_settings() in bonding code
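
    A sketch of the locked pattern for the af_packet case (illustrative; ecmd
    is a struct ethtool_cmd):

      rtnl_lock();
      dev = __dev_get_by_index(sock_net(sk), po->ifindex);
      if (dev)
          err = __ethtool_get_settings(dev, &ecmd);
      rtnl_unlock();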

    Signed-off-by: Jiri Pirko

    v2->v3:
    - removed dev_ethtool_get_settings()
    - added ASSERT_RTNL into __ethtool_get_settings()
    - prb_calc_retire_blk_tmo: use __dev_get_by_index() and lock
      around it and the __ethtool_get_settings() call
    v1->v2:
    - added missing EXPORT_SYMBOL
    Reviewed-by: Ben Hutchings [except FCoE bits]
    Acked-by: Ralf Baechle
    Signed-off-by: David S. Miller

    Jiri Pirko
     

27 Aug, 2011

1 commit


25 Aug, 2011

1 commit

  • 1) Blocks can be configured with a non-static frame size.
    2) Read/poll is at a block level (as opposed to packet level).
    3) Added a poll timeout to avoid indefinite user-space waits on idle links.
    4) Added user-configurable knobs:
    4.1) block::timeout
    4.2) tpkt_hdr::sk_rxhash

    Changes:
    C1) tpacket_rcv()
    C1.1) packet_current_frame() is replaced by packet_current_rx_frame().
    The bulk of the processing is then moved into the following chain:

      packet_current_rx_frame()
        __packet_lookup_frame_in_block()
          fill_curr_block()
          or
          retire_current_block()
            dispatch_next_block()
          or
          return NULL (queue is plugged/paused)
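
    For context, a minimal userspace sketch of configuring such a ring (sizes
    are arbitrary examples; fd is assumed to be an AF_PACKET socket):

      int ver = TPACKET_V3;
      struct tpacket_req3 req = {
          .tp_block_size       = 1 << 22,  /* 4 MiB per block */
          .tp_block_nr         = 64,
          .tp_frame_size       = 1 << 11,
          .tp_frame_nr         = ((1 << 22) / (1 << 11)) * 64,
          .tp_retire_blk_tov   = 60,       /* knob 4.1: block timeout, msec */
          .tp_feature_req_word = TP_FT_REQ_FILL_RXHASH,  /* knob 4.2 */
      };

      setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));
      setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));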

    Signed-off-by: Chetan Loke
    Signed-off-by: David S. Miller

    chetan loke
     

14 Jul, 2011

1 commit

  • Currently we flush tp_status and then flush the remainder of the header +
    payload. tp_status should be flushed at the end to avoid stale data being
    read by user space.

    Incorrectly re-ordered barriers in v1.
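
    The intended ordering, sketched (illustrative, not the literal diff):
    publish the header + payload first, then the status word, so user space
    polling tp_status never sees the slot before its data is visible.

      /* 1. fill and flush header + payload */
      smp_wmb();                                    /* order payload before status */
      __packet_set_status(po, ph, TP_STATUS_USER);  /* 2. status flushed last */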

    Signed-off-by: Chetan Loke
    Signed-off-by: David S. Miller

    Chetan Loke