02 Dec, 2011

1 commit


22 Nov, 2011

1 commit


04 Nov, 2011

1 commit

  • Simon Kirby reported lockdep warnings and following messages :

    [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
    preempt_count 00000101, exited with 00000102?

    [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
    preempt_count 00000101, exited with 00000102?

    Problem comes from commit 0e734419
    (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

    If inet_csk_route_child_sock() returns NULL, we should release socket
    lock before freeing it.

    Another lock imbalance exists if __inet_inherit_port() returns an error
    since commit 093d282321da ( tproxy: fix hash locking issue when using
    port redirection in __inet_inherit_port()) a backport is also needed for
    >= 2.6.37 kernels.

    Reported-by: Simon Kirby
    Signed-off-by: Eric Dumazet
    Tested-by: Eric Dumazet
    CC: Balazs Scheidler
    CC: KOVACS Krisztian
    Reviewed-by: Thomas Gleixner
    Tested-by: Simon Kirby
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Aug, 2011

1 commit

  • Computers have become a lot faster since we compromised on the
    partial MD4 hash which we use currently for performance reasons.

    MD5 is a much safer choice, and is inline with both RFC1948 and
    other ISS generators (OpenBSD, Solaris, etc.)

    Furthermore, only having 24-bits of the sequence number be truly
    unpredictable is a very serious limitation. So the periodic
    regeneration and 8-bit counter have been removed. We compute and
    use a full 32-bit sequence number.

    For ipv6, DCCP was found to use a 32-bit truncated initial sequence
    number (it needs 43-bits) and that is fixed here as well.

    Reported-by: Dan Kaminsky
    Tested-by: Willy Tarreau
    Signed-off-by: David S. Miller

    David S. Miller
     

19 May, 2011

1 commit


09 May, 2011

2 commits


04 May, 2011

1 commit


29 Apr, 2011

2 commits

  • Now that output route lookups update the flow with
    destination address selection, we can fetch it from
    fl4->daddr instead of rt->rt_dst

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We lack proper synchronization to manipulate inet->opt ip_options

    Problem is ip_make_skb() calls ip_setup_cork() and
    ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
    without any protection against another thread manipulating inet->opt.

    Another thread can change inet->opt pointer and free old one under us.

    Use RCU to protect inet->opt (changed to inet->inet_opt).

    Instead of handling atomic refcounts, just copy ip_options when
    necessary, to avoid cache line dirtying.

    We cant insert an rcu_head in struct ip_options since its included in
    skb->cb[], so this patch is large because I had to introduce a new
    ip_options_rcu structure.

    Signed-off-by: Eric Dumazet
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Apr, 2011

1 commit

  • These functions are used together as a unit for route resolution
    during connect(). They address the chicken-and-egg problem that
    exists when ports need to be allocated during connect() processing,
    yet such port allocations require addressing information from the
    routing code.

    It's currently more heavy handed than it needs to be, and in
    particular we allocate and initialize a flow object twice.

    Let the callers provide the on-stack flow object. That way we only
    need to initialize it once in the ip_route_connect() call.

    Later, if ip_route_newports() needs to do anything, it re-uses that
    flow object as-is except for the ports which it updates before the
    route re-lookup.

    Also, describe why this set of facilities are needed and how it works
    in a big comment.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David S. Miller
     

13 Mar, 2011

4 commits


03 Mar, 2011

1 commit


02 Mar, 2011

3 commits


25 Feb, 2011

1 commit

  • ip_route_newports() is the only place in the entire kernel that
    cares about the port members in the routing cache entry's lookup
    flow key.

    Therefore the only reason we store an entire flow inside of the
    struct rtentry is for this one special case.

    Rewrite ip_route_newports() such that:

    1) The caller passes in the original port values, so we don't need
    to use the rth->fl.fl_ip_{s,d}port values to remember them.

    2) The lookup flow is constructed by hand instead of being copied
    from the routing cache entry's flow.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Nov, 2010

1 commit


21 Oct, 2010

1 commit

  • When __inet_inherit_port() is called on a tproxy connection the wrong locks are
    held for the inet_bind_bucket it is added to. __inet_inherit_port() made an
    implicit assumption that the listener's port number (and thus its bind bucket).
    Unfortunately, if you're using the TPROXY target to redirect skbs to a
    transparent proxy that assumption is not true anymore and things break.

    This patch adds code to __inet_inherit_port() so that it can handle this case
    by looking up or creating a new bind bucket for the child socket and updates
    callers of __inet_inherit_port() to gracefully handle __inet_inherit_port()
    failing.

    Reported by and original patch from Stephen Buck .
    See http://marc.info/?t=128169268200001&r=1&w=2 for the original discussion.

    Signed-off-by: KOVACS Krisztian
    Signed-off-by: Patrick McHardy

    Balazs Scheidler
     

11 Jun, 2010

1 commit


12 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Mar, 2010

1 commit

  • dccp: fix panic caused by failed initialisation

    This fixes a kernel panic reported thanks to Andre Noll:

    if DCCP is compiled into the kernel and any out of the initialisation
    steps in net/dccp/proto.c:dccp_init() fail, a subsequent attempt to create
    a SOCK_DCCP socket will panic, since inet{,6}_create() are not prevented
    from creating DCCP sockets.

    This patch fixes the problem by propagating a failure in dccp_init() to
    dccp_v{4,6}_init_net(), and from there to dccp_v{4,6}_init(), so that the
    DCCP protocol is not made available if its initialisation fails.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

18 Jan, 2010

1 commit


09 Dec, 2009

1 commit

  • First patch changes __inet_hash_nolisten() and __inet6_hash()
    to get a timewait parameter to be able to unhash it from ehash
    at same time the new socket is inserted in hash.

    This makes sure timewait socket wont be found by a concurrent
    writer in __inet_check_established()

    Reported-by: kapil dakhane
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Dec, 2009

1 commit

  • Add optional function parameters associated with sending SYNACK.
    These parameters are not needed after sending SYNACK, and are not
    used for retransmission. Avoids extending struct tcp_request_sock,
    and avoids allocating kernel memory.

    Also affects DCCP as it uses common struct request_sock_ops,
    but this parameter is currently reserved for future use.

    Signed-off-by: William.Allen.Simpson@gmail.com
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    William Allen Simpson
     

06 Nov, 2009

1 commit

  • struct can_proto had a capability field which wasn't ever used. It is
    dropped entirely.

    struct inet_protosw had a capability field which can be more clearly
    expressed in the code by just checking if sock->type = SOCK_RAW.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    Goal is to transfert fields used at lookup time in the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path)

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Sep, 2009

1 commit


02 Sep, 2009

1 commit


03 Jun, 2009

2 commits

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of :
    dst_release(skb->dst)
    skb->dst = NULL;

    Delete skb->dst field

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

    Delete skb->rtable field

    Setting rtable is not allowed, just set dst instead as rtable is an alias.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Nov, 2008

1 commit

  • RCU was added to UDP lookups, using a fast infrastructure :
    - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
    price of call_rcu() at freeing time.
    - hlist_nulls permits to use few memory barriers.

    This patch uses same infrastructure for TCP/DCCP established
    and timewait sockets.

    Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
    using short lived TCP connections. A followup patch, converting
    rwlocks to spinlocks will even speedup this case.

    __inet_lookup_established() is pretty fast now we dont have to
    dirty a contended cache line (read_lock/read_unlock)

    Only established and timewait hashtable are converted to RCU
    (bind table and listen table are still using traditional locking)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Nov, 2008

2 commits

  • This inserts the required de-allocation routines for memory allocated
    by feature negotiation in the socket destructors, replacing
    dccp_feat_clean() in one instance.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides feature-negotiation initialisation for both DCCP sockets
    and DCCP request_sockets, to support feature negotiation during
    connection setup.

    It also resolves a FIXME regarding the congestion control
    initialisation.

    Thanks to Wei Yongjun for help with the IPv6 side of this patch.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

31 Oct, 2008

1 commit


08 Oct, 2008

1 commit