24 Dec, 2009

1 commit

  • Add rtnetlink init_rcvwnd to set the TCP initial receive window size
    advertised by passive and active TCP connections.
    The current Linux TCP implementation limits the advertised TCP initial
    receive window to the one prescribed by slow start. For short-lived
    TCP connections used for transaction-type traffic (e.g. HTTP
    requests), bounding the advertised TCP initial receive window results
    in increased latency to complete the transaction.
    Setting the initial congestion window is already supported via
    rtnetlink init_cwnd, but that feature is of little use without the
    ability to set a larger TCP initial receive window.
    The rtnetlink init_rcvwnd allows the TCP initial receive window to be
    increased, so that a TCP connection can advertise a larger receive
    window than the one bounded by slow start.
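
    A rough userspace model of the effect (not the kernel's actual
    tcp_select_initial_window() code): the metric, expressed in segments,
    overrides the slow-start-derived default. The helper name, the 4*mss
    default, and the clamping rule below are illustrative assumptions only.

    /* Toy model only -- NOT the kernel implementation. */
    #include <stdio.h>

    static unsigned int initial_rcv_wnd(unsigned int mss,
                                        unsigned int window_clamp,
                                        unsigned int init_rcvwnd_segs)
    {
        /* default: roughly what slow start would allow (a few MSS) */
        unsigned int wnd = 4 * mss;

        /* a per-route metric, if set, requests a larger initial window */
        if (init_rcvwnd_segs)
            wnd = init_rcvwnd_segs * mss;

        if (wnd > window_clamp)
            wnd = window_clamp;
        return wnd;
    }

    int main(void)
    {
        printf("default     : %u bytes\n", initial_rcv_wnd(1460, 65535, 0));
        printf("initrwnd=16 : %u bytes\n", initial_rcv_wnd(1460, 65535, 16));
        return 0;
    }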

    Signed-off-by: Laurent Chavey
    Signed-off-by: David S. Miller

    laurent chavey
     

16 Dec, 2009

1 commit

  • The reverted series creates a regression, triggering badness for SYN_RECV
    sockets, for example:

    [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
    [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
    [19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
    [19148.023496] MSR: 00029032 CR: 24002442 XER: 00000000
    [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

    This is likely caused by the change in the 'estab' parameter
    passed to tcp_parse_options() when invoked by the functions
    in net/ipv4/tcp_minisocks.c

    But even if that is fixed, the ->conn_request() changes made in
    this patch series are fundamentally wrong. They try to use the
    listening socket's 'dst' to probe the route settings. The
    listening socket doesn't even have a route, and you can't
    get the right route (the child request one) until much later,
    after we set up all of the state, and it must be done by hand.

    This stuff really isn't ready, so the best thing to do is a
    full revert. This reverts the following commits:

    f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d
    022c3f7d82f0f1c68018696f2f027b87b9bb45c2
    1aba721eba1d84a2defce45b950272cee1e6c72a
    cda42ebd67ee5fdf09d7057b5a4584d36fe8a335
    345cda2fd695534be5a4494f1b59da9daed33663
    dc343475ed062e13fc260acccaab91d7d80fd5b2
    05eaade2782fb0c90d3034fd7a7d5a16266182bb
    6a2a2d6bf8581216e08be15fcb563cfd6c430e1e

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Nov, 2009

1 commit


04 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces on the
    first line to ease grep games.

    struct something
    {

    becomes:

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Oct, 2009

1 commit


21 Oct, 2009

1 commit

  • dst_negative_advice() should check for a changed dst and reset
    sk_tx_queue_mapping accordingly. Pass the sock to
    dst_negative_advice() in its callers.

    (sk_reset_txq is defined just for use by dst_negative_advice. The
    only way I could find to get around this is to move
    dst_negative_advice() from dst.h to dst.c, include sock.h in dst.c,
    etc.)
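
    A minimal sketch of the intent (not the literal patch), assuming the
    existing __sk_dst_get() helper and the sk_reset_txq() wrapper
    mentioned above:

    static inline void dst_negative_advice(struct sock *sk)
    {
        struct dst_entry *ndst, *dst = __sk_dst_get(sk);

        if (dst && dst->ops->negative_advice) {
            ndst = dst->ops->negative_advice(dst);

            /* the advice callback may hand back a different dst; if so,
             * the cached tx queue mapping is stale as well */
            if (ndst != dst) {
                sk->sk_dst_cache = ndst;
                sk_reset_txq(sk);   /* clears sk_tx_queue_mapping */
            }
        }
    }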

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     

02 Sep, 2009

1 commit

  • struct net::ipv6.ip6_dst_ops is separately dynamically allocated,
    but there is no fundamental reason for it. Embed it directly into
    struct netns_ipv6.

    For that:
    * move struct dst_ops into a separate header to fix circular
    dependencies (I honestly tried not to; it's pretty much impossible
    to do any other way)
    * drop the dynamic allocation, allocate together with netns

    For a change, remove struct dst_ops::dst_net; it's deducible
    using container_of() given the dst_ops pointer.
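
    The container_of() trick is easy to see with stub types. The
    standalone example below uses simplified stand-ins for struct net,
    netns_ipv6 and dst_ops, not the real kernel definitions:

    #include <stdio.h>
    #include <stddef.h>

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct dst_ops    { int family; };
    struct netns_ipv6 { struct dst_ops ip6_dst_ops; };
    struct net        { int ns_id; struct netns_ipv6 ipv6; };

    /* once ip6_dst_ops is embedded, the owning netns is recoverable
     * from the dst_ops pointer, so no ->dst_net back-pointer is needed */
    static struct net *net_from_ip6_dst_ops(struct dst_ops *ops)
    {
        return container_of(ops, struct net, ipv6.ip6_dst_ops);
    }

    int main(void)
    {
        struct net ns = { .ns_id = 42 };

        printf("recovered ns_id = %d\n",
               net_from_ip6_dst_ops(&ns.ipv6.ip6_dst_ops)->ns_id);
        return 0;
    }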

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

03 Jun, 2009

1 commit

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of:
    dst_release(skb->dst);
    skb->dst = NULL;

    Delete skb->dst field
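
    A self-contained sketch of the accessor pattern, using stub types in
    place of the real sk_buff/dst_entry and a trivial dst_release(); the
    kernel's actual definitions differ in detail:

    #include <stdio.h>
    #include <stdlib.h>

    struct dst_entry { int refcnt; };
    struct sk_buff   { struct dst_entry *dst; };

    static void dst_release(struct dst_entry *dst)
    {
        if (dst && --dst->refcnt == 0)
            free(dst);
    }

    static struct dst_entry *skb_dst(const struct sk_buff *skb)
    {
        return skb->dst;
    }

    static void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
    {
        skb->dst = dst;
    }

    /* replaces the open-coded dst_release(skb->dst); skb->dst = NULL; */
    static void skb_dst_drop(struct sk_buff *skb)
    {
        dst_release(skb->dst);
        skb->dst = NULL;
    }

    int main(void)
    {
        struct dst_entry *d = calloc(1, sizeof(*d));
        struct sk_buff skb = { .dst = NULL };

        if (!d)
            return 1;
        d->refcnt = 1;
        skb_dst_set(&skb, d);
        printf("attached: %p\n", (void *)skb_dst(&skb));
        skb_dst_drop(&skb);
        printf("dropped : %p\n", (void *)skb_dst(&skb));
        return 0;
    }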

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 Nov, 2008

1 commit

  • Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
    to flow_cache_lookup() and resolver callback.

    Take it from socket or netdevice. Stub DECnet to init_net.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

17 Nov, 2008

1 commit

  • As found in the past (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
    [NET]: Fix tbench regression in 2.6.25-rc1), it is really
    important that struct dst_entry refcount is aligned on a cache line.

    We cannot use __attribute__((aligned)), so manually pad the structure
    for 32 and 64 bit arches.

    for 32bit: offsetof(struct dst_entry, __refcnt) is 0x80
    for 64bit: offsetof(struct dst_entry, __refcnt) is 0xc0

    As it is not possible to guess the cache line size at compile time,
    we use a generic value of 64 bytes, which satisfies many current arches.
    (Using 128-byte alignment on 64bit arches would waste 64 bytes.)

    Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" that
    would break this alignment.

    "tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
    (2350 MB/s instead of 2250 MB/s)
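
    The padding plus BUILD_BUG_ON idea can be demonstrated with a toy
    struct. The layout below is invented for illustration (it is not
    dst_entry's real layout) and the macro is a local stand-in for the
    kernel's BUILD_BUG_ON:

    #include <stdio.h>
    #include <stddef.h>

    #define CACHE_LINE 64
    /* negative array size forces a compile error if cond is true */
    #define BUILD_BUG_ON(cond) extern char build_bug[(cond) ? -1 : 1]

    struct toy_dst {
        void          *next;
        unsigned long  metrics[14];
        /* pad so the hot reference count starts a fresh cache line */
        char           pad[CACHE_LINE -
                           ((sizeof(void *) + 14 * sizeof(unsigned long))
                            % CACHE_LINE)];
        int            __refcnt;
    };

    BUILD_BUG_ON(offsetof(struct toy_dst, __refcnt) % CACHE_LINE != 0);

    int main(void)
    {
        printf("__refcnt offset = %zu\n",
               offsetof(struct toy_dst, __refcnt));
        return 0;
    }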

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Nov, 2008

1 commit


29 Oct, 2008

1 commit


05 Aug, 2008

1 commit

  • dst_input() was doing something completely absurd, looping
    on skb->dst->input() if NET_XMIT_BYPASS was seen, but these
    functions never return such an error.

    And as a result plain ole' NET_XMIT_BYPASS has no more
    references and can be completely killed off.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Jul, 2008

1 commit

  • Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in
    kernel units (jiffies) and this leaks out through the netlink API to
    user space where the units for jiffies are unknown.

    This patch changes the kernel to convert to/from milliseconds. This
    changes the ABI, but milliseconds seemed like the most natural unit
    for these parameters. Values available via syscall in
    /proc/net/rt_cache and netlink will be in milliseconds.
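
    A trivial standalone illustration of why raw jiffies make a poor ABI
    unit: the same stored number means different real durations depending
    on HZ, whereas milliseconds are self-describing. The helper below is
    a simplified local conversion, not the kernel's jiffies_to_msecs():

    #include <stdio.h>

    static unsigned long to_msecs(unsigned long jiffies, unsigned int hz)
    {
        return jiffies * 1000UL / hz;
    }

    int main(void)
    {
        /* the same exported "25" would mean very different RTTs */
        printf("25 jiffies at HZ=100 : %lu ms\n", to_msecs(25, 100));
        printf("25 jiffies at HZ=1000: %lu ms\n", to_msecs(25, 1000));
        return 0;
    }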

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

28 Mar, 2008

1 commit

  • Codiff stats (allyesconfig, v2.6.24-mm1):
    -16420 187 funcs, 103 +, 16523 -, diff: -16420 --- dst_release

    Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
    -7257 186 funcs, 70 +, 7327 -, diff: -7257 --- dst_release
    dst_release | +40

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

13 Mar, 2008

1 commit

  • Compared with kernel 2.6.24, the tbench result has a regression with
    2.6.25-rc1.

    1) On a stoakley with 2 quad-core processors: 4%.
    2) On a tigerton with 4 quad-core processors: more than 30%.

    Bisect located the patch below.

    b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit
    commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b
    Author: Herbert Xu
    Date: Tue Nov 13 21:33:32 2007 -0800

    [IPV6]: Move nfheader_len into rt6_info

    The dst member nfheader_len is only used by IPv6. It's also currently
    creating a rather ugly alignment hole in struct dst. Therefore this patch
    moves it from there into struct rt6_info.

    The above patch changes the cache line alignment, especially of member
    __refcnt. I did a test by adding 2 unsigned long paddings before
    lastuse, so the 3 members lastuse/__refcnt/__use are moved to the next
    cache line. The performance is recovered.

    I created a patch to rearrange the members in struct dst_entry.

    With Eric's and Valdis Kletnieks's suggestions, I made a finer arrangement.

    1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y, so
    sizeof(dst_entry)=200 whether CONFIG_NET_CLS_ROUTE is y or n. I
    tested many patches on my 16-core tigerton by moving tclassid to
    different places. It looks like tclassid could also have an impact on
    performance. If tclassid is moved before metrics, or not moved at
    all, the performance isn't good. So I moved it behind metrics.

    2) Add comments before __refcnt.

    On 16-core tigerton:

    If CONFIG_NET_CLS_ROUTE=y, the result with the patch below is about 18%
    better than the one without the patch;

    If CONFIG_NET_CLS_ROUTE=n, the result with the patch below is about 30%
    better than the one without the patch.

    With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't
    introduce regression.

    Thank Eric, Valdis, and David!

    Signed-off-by: Zhang Yanmin
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zhang Yanmin
     

29 Jan, 2008

9 commits

  • On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to
    0x180 bytes by SLAB allocator.

    We can reduce this to exactly 0x140 bytes, without alignment overhead,
    and store 12 struct rtable per PAGE instead of 10.

    rate_tokens is currently defined as an "unsigned long", while its
    content should not exceed 6*HZ. It can safely be converted to an
    unsigned int.

    Moving tclassid right after rate_tokens to fill the 4-byte hole
    saves 8 bytes on 'struct dst_entry', which in turn saves 8 bytes
    on 'struct rtable'.
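
    The two layout tricks (narrowing a field that never needs 64 bits,
    then sliding another field into the hole it leaves) can be seen with
    made-up structs; the printed sizes on x86_64 show the 8-byte saving.
    These structs are illustrative, not the real rtable/dst_entry:

    #include <stdio.h>

    struct before {
        void          *peer;
        unsigned long  rate_tokens;  /* only ever holds <= 6*HZ */
        unsigned int   tclassid;     /* leaves a 4-byte hole */
        void          *hh;
    };

    struct after {
        void          *peer;
        unsigned int   rate_tokens;  /* now 32 bits */
        unsigned int   tclassid;     /* fills the hole next to it */
        void          *hh;
    };

    int main(void)
    {
        printf("before: %zu bytes, after: %zu bytes\n",
               sizeof(struct before), sizeof(struct after));
        return 0;
    }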

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The network namespace pointer can be stored in the dst_ops structure.
    This is useful when there are multiple instances of the dst_ops for a
    protocol. When there is only a single instance, this field is simply
    never used by the protocol, so there is no impact on the protocols
    which do not implement network namespaces.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • The garbage collection function receives the dst_ops structure as a
    parameter. This is useful for the next incoming patchset because it
    will need the dst_ops (there will be several instances) and the
    network namespace pointer (contained in the dst_ops).

    The protocols which do not take care of the namespaces will not be
    impacted by this change (except for the function signature); they
    just ignore the parameter.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • The info placeholder member of dst_entry seems to be unused in the
    network stack.

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller

    Rami Rosen
     
  • RFC 4301 requires us to relookup ICMP traffic that does not match any
    policies using the reverse of its payload. This patch implements this
    for ICMP traffic that originates from or terminates on localhost.

    This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
    and on inbound by the new state flag XFRM_STATE_ICMP.

    On inbound the policy check is now performed by the ICMP protocol so
    that it can repeat the policy check where necessary.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch introduces an enum for bits in the flags argument of xfrm_lookup.
    This is so that we can cram more information into it later.

    Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has
    been added with the value 1 << 0 to represent the current meaning of flags.

    The test in __xfrm_lookup has been changed accordingly.
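
    In other words, the old boolean flags value 1 now has a name; a
    minimal sketch (the exact declaration site in the kernel headers may
    differ):

    #include <stdio.h>

    enum {
        XFRM_LOOKUP_WAIT = 1 << 0,  /* previously the literal "1" */
    };

    int main(void)
    {
        int flags = XFRM_LOOKUP_WAIT;

        if (flags & XFRM_LOOKUP_WAIT)
            printf("caller is prepared to wait for resolution\n");
        return 0;
    }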

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • As part of the work on asynchronous cryptographic operations, we need
    to be able to resume from the spot where they occur. As such, it
    helps if we isolate them to one spot.

    This patch moves most of the remaining family-specific processing into
    the common output code.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • We have a number of copies of dst_discard scattered around the place
    which all do the same thing, namely free a packet on the input or
    output paths.

    This patch deletes all of them except dst_discard and points all the
    users to it.

    The only non-trivial bit is decnet where it returns an error.
    However, conceptually this is identical to the blackhole functions
    used in IPv4 and IPv6 which do not return errors. So they should
    either all return errors or all return zero. For now I've stuck with
    the majority and picked zero as the return value.

    It doesn't really matter in practice since few if any drivers would
    react differently depending on a zero return value or NET_RX_DROP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The dst member nfheader_len is only used by IPv6. It's also currently
    creating a rather ugly alignment hole in struct dst. Therefore this patch
    moves it from there into struct rt6_info.

    It also reorders the fields in rt6_info to minimize holes.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

11 Nov, 2007

1 commit


11 Jul, 2007

1 commit


25 May, 2007

1 commit

  • The current IPSEC rule resolution behavior we have does not work for a
    lot of people, even though technically it's an improvement from the
    -EAGAIN business we had before.

    Right now we'll block until the key manager resolves the route. That
    works for simple cases, but many folks would rather packets get
    silently dropped until the key manager resolves the IPSEC rules.

    We can't tell these folks to "set the socket non-blocking" because
    they don't have control over the non-block setting of things like the
    sockets used to resolve DNS deep inside of the resolver libraries in
    libc.

    With that in mind I coded up the patch below with some help from
    Herbert Xu which provides packet-drop behavior during larval state
    resolution, controllable via sysctl and off by default.

    This lays the framework to either:

    1) Make this default at some point or...

    2) Move this logic into xfrm{4,6}_policy.c and implement the
    ARP-like resolution queue we've all been dreaming of.
    The idea would be to queue packets to the policy, then
    once the larval state is resolved by the key manager we
    re-resolve the route and push the packets out. The
    packets would timeout if the rule didn't get resolved
    in a certain amount of time.

    Signed-off-by: David S. Miller

    David S. Miller
     

11 Feb, 2007

2 commits

  • This last patch (but not least :) ) finally moves the next pointer to
    the end of struct dst_entry. This permits route cache lookups to be
    performed at a minimal cost of one cache line per entry, instead of
    two.

    Both 32bits and 64bits platforms benefit from this new layout.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This patch introduces an anonymous union to nicely express the fact
    that all objects inherited from struct dst_entry should access the
    generic 'next' pointer, but with appropriate type verification.

    This patch is a prerequisite for the following patches.
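
    A compilable toy version of the pattern: one piece of storage,
    several typed views of the same 'next' pointer. The derived types
    here are stand-ins, not the real rtable/rt6_info:

    #include <stdio.h>

    struct rtable;
    struct rt6_info;

    struct dst_entry {
        union {
            struct dst_entry *next;     /* generic view */
            struct rtable    *rt_next;  /* IPv4 view    */
            struct rt6_info  *rt6_next; /* IPv6 view    */
        };
    };

    struct rtable   { struct dst_entry dst; int v4_data; };
    struct rt6_info { struct dst_entry dst; int v6_data; };

    int main(void)
    {
        struct rtable a = { .v4_data = 1 }, b = { .v4_data = 2 };

        a.dst.rt_next = &b;               /* typed access ...  */
        printf("generic next = %p\n",     /* ... same storage  */
               (void *)a.dst.next);
        printf("&b           = %p\n", (void *)&b);
        return 0;
    }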

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Dec, 2006

1 commit

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

29 Sep, 2006

1 commit


23 Sep, 2006

1 commit

  • For locally originated outbound IPv6 packets which will fragment,
    ip6_append_data() should know the length of the extension headers
    before sending them, and that length is carried by the dst_entry.
    IPv6 IPsec headers are subject to fragmentation, so the
    transformation was designed to place all headers after the fragment
    header. OTOH, Mobile IPv6 extension headers do not fragment, so it
    is a good idea to make the dst_entry carry the non-fragmented header
    length to tell ip6_append_data() about it.

    Signed-off-by: Masahide NAKAMURA
    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Masahide NAKAMURA
     

26 Apr, 2006

1 commit


08 Jan, 2006

1 commit

  • Call netfilter hooks before IPsec transforms. Packets visit the
    FORWARD/LOCAL_OUT and POST_ROUTING hooks before the first
    encapsulation, and the LOCAL_OUT and POST_ROUTING hooks before each
    following tunnel-mode transform.

    Patch from Herbert Xu :

    Move the loop from dst_output into xfrm4_output/xfrm6_output since
    they're the only ones who need it. xfrm{4,6}_output_one() processes
    the first SA and all subsequent transport mode SAs, and is called in
    a loop that calls the netfilter hooks between each two calls.

    In order to avoid the tail call issue, I've added the inline function
    nf_hook which is nf_hook_slow plus the empty list check.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

04 Jan, 2006

1 commit


26 Oct, 2005

1 commit

  • Now that we've switched over to storing MTUs in the xfrm_dst entries,
    we no longer need the dst's get_mss methods. This patch gets rid of
    them.

    It also documents the fact that our MTU calculation is not optimal
    for ESP.

    Signed-off-by: Herbert Xu
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
     

20 Apr, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds