26 Feb, 2011

1 commit


19 Feb, 2011

1 commit


01 Feb, 2011

1 commit

  • When an IPSEC SA is still being set up, __xfrm_lookup() will return
    -EREMOTE and so ip_route_output_flow() will return a blackhole route.
    This can happen in a sendmsg call, and after d33e455337ea ("net: Abstract
    default MTU metric calculation behind an accessor.") this leads to a
    crash in ip_append_data() because the blackhole dst_ops have no
    default_mtu() method and so dst_mtu() calls a NULL pointer.

    Fix this by adding default_mtu() methods (that simply return 0, matching
    the old behavior) to the blackhole dst_ops.

    The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
    VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
    looks to be needed as well.

    Signed-off-by: Roland Dreier
    Signed-off-by: David S. Miller

    Roland Dreier
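
The crash and the fix described above can be illustrated with a small user-space model. This is only a sketch: the struct layouts are heavily simplified stand-ins for the kernel's struct dst_ops and struct dst_entry, and the field name mtu_metric is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct dst_entry;

/* Simplified stand-in for the kernel's struct dst_ops. */
struct dst_ops {
    unsigned int (*default_mtu)(const struct dst_entry *dst);
};

/* Simplified stand-in for struct dst_entry (mtu_metric is a made-up field). */
struct dst_entry {
    const struct dst_ops *ops;
    unsigned int mtu_metric;   /* 0 means "no explicit MTU metric" */
};

/* Accessor in the style the commit message attributes to d33e455337ea:
 * fall back to the per-ops default when the raw metric is zero. */
static unsigned int dst_mtu(const struct dst_entry *dst)
{
    assert(dst != NULL && dst->ops != NULL);
    unsigned int mtu = dst->mtu_metric;
    if (!mtu)
        mtu = dst->ops->default_mtu(dst);  /* NULL hook => NULL call */
    return mtu;
}

/* The fix: give the blackhole ops a hook that simply returns 0. */
static unsigned int blackhole_default_mtu(const struct dst_entry *dst)
{
    (void)dst;
    return 0;
}

static const struct dst_ops blackhole_dst_ops = {
    .default_mtu = blackhole_default_mtu,
};
```

With the hook in place, calling dst_mtu() on an entry that uses blackhole_dst_ops and has a zero metric returns 0 instead of jumping through a NULL pointer.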
     

28 Jan, 2011

1 commit

  • They are bogus. The basic idea is that I wanted to make sure
    that prefixed routes never bind to peers.

    The test I used was whether RTF_CACHE was set.

    But first of all, the RTF_CACHE flag is set at different spots
    depending upon which ip6_rt_copy() caller you're talking about.

    I've validated all of the code paths, and even in the future
    where we bind peers more aggressively (for route metric COW'ing)
    we never bind to prefix'd routes, only fully specified ones.
    This even applies when addrconf or icmp6 routes are allocated.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Jan, 2011

1 commit


19 Dec, 2010

1 commit


18 Dec, 2010

1 commit


17 Dec, 2010

1 commit

  • The first big packets sent to a "low-MTU" client correctly
    trigger the creation of a temporary route containing the reduced MTU.

    But after the temporary route has expired, a new ICMP6 "packet too big"
    will be sent; rt6_pmtu_discovery will find the previous EXPIRED route,
    check that its MTU isn't bigger than the one in the ICMP packet, and do
    nothing until the temporary route is deleted by gc.

    I ran a simple experiment:
    while :; do
    time ( dd if=/dev/zero bs=10K count=1 | ssh hostname dd of=/dev/null ) || break;
    done

    The "time" reports real 0m0.197s if the temporary route hasn't expired, but
    it reports real 0m52.837s (!!!!) immediately after the temporary route has
    expired.

    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller

    Andrey Vagin
     

15 Dec, 2010

1 commit


14 Dec, 2010

1 commit

  • Make all RTAX_ADVMSS metric accesses go through a new helper function,
    dst_metric_advmss().

    Leave the actual default metric as "zero" in the real metric slot,
    and compute the actual default value dynamically via a new dst_ops
    AF specific callback.

    For stacked IPSEC routes, we use the advmss of the path which
    preserves existing behavior.

    Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
    advmss on pmtu updates. This inconsistency in advmss handling
    results in more raw metric accesses than I wish we ended up with.

    Signed-off-by: David S. Miller

    David S. Miller
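
The "zero in the slot, default computed by an AF-specific callback" scheme can be sketched in user space as follows. The metric index, the dev_mtu field and the two callbacks are assumptions made for illustration; the real kernel defaults also apply minimum and maximum clamps that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metric indices, not the kernel's RTAX_* values. */
enum { METRIC_ADVMSS, METRIC_MAX };

struct dst_entry;

struct dst_ops {
    /* AF-specific callback that computes the default advmss. */
    unsigned int (*default_advmss)(const struct dst_entry *dst);
};

struct dst_entry {
    const struct dst_ops *ops;
    unsigned int dev_mtu;               /* made-up field for the sketch */
    unsigned int metrics[METRIC_MAX];   /* 0 = "use the default" */
};

/* Helper through which every advmss read goes. */
unsigned int dst_metric_advmss(const struct dst_entry *dst)
{
    assert(dst != NULL && dst->ops != NULL &&
           dst->ops->default_advmss != NULL);
    unsigned int advmss = dst->metrics[METRIC_ADVMSS];
    if (!advmss)
        advmss = dst->ops->default_advmss(dst);
    return advmss;
}

/* Assumed IPv4-style default: MTU minus 20-byte IP and 20-byte TCP headers. */
static unsigned int ipv4_like_default_advmss(const struct dst_entry *dst)
{
    return dst->dev_mtu - 40;
}

/* Assumed IPv6-style default: MTU minus 40-byte IPv6 and 20-byte TCP headers. */
static unsigned int ipv6_like_default_advmss(const struct dst_entry *dst)
{
    return dst->dev_mtu - 60;
}

static const struct dst_ops ipv4_like_ops = { ipv4_like_default_advmss };
static const struct dst_ops ipv6_like_ops = { ipv6_like_default_advmss };
```

An explicitly set metric wins; only a zero slot triggers the callback, so the stored metrics never need to carry the computed default.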
     

13 Dec, 2010

3 commits


10 Dec, 2010

1 commit

  • Use helper functions to hide all direct accesses, especially writes,
    to dst_entry metrics values.

    This will allow us to:

    1) More easily change how the metrics are stored.

    2) Implement COW for metrics.

    In particular this will help us put metrics into the inetpeer
    cache if that is what we end up doing. We can make the _metrics
    member a pointer instead of an array, initially have it point
    at the read-only metrics in the FIB, and then on the first set
    grab an inetpeer entry and point the _metrics member there.

    Signed-off-by: David S. Miller
    Acked-by: Eric Dumazet

    David S. Miller
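
The copy-on-write idea outlined above might look roughly like this in a user-space model. It is a sketch under stated assumptions: the private copy is a plain malloc() allocation standing in for the inetpeer entry the message mentions, and NUM_METRICS and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NUM_METRICS 4   /* hypothetical table size */

struct dst_entry {
    const unsigned int *metrics;  /* shared read-only table or private copy */
    unsigned int *own;            /* non-NULL once a private copy exists */
};

/* Start out pointing at the shared, read-only metrics. */
void dst_init_metrics(struct dst_entry *dst, const unsigned int *shared)
{
    dst->metrics = shared;
    dst->own = NULL;
}

/* Read helper: callers never touch the storage directly. */
unsigned int dst_metric(const struct dst_entry *dst, size_t idx)
{
    assert(idx < NUM_METRICS);
    return dst->metrics[idx];
}

/* Write helper: the first write copies the shared table.
 * Returns false if the private copy cannot be allocated. */
bool dst_metric_set(struct dst_entry *dst, size_t idx, unsigned int val)
{
    assert(idx < NUM_METRICS);
    if (!dst->own) {
        unsigned int *copy = malloc(NUM_METRICS * sizeof(*copy));
        if (!copy)
            return false;
        memcpy(copy, dst->metrics, NUM_METRICS * sizeof(*copy));
        dst->own = copy;
        dst->metrics = copy;
    }
    dst->own[idx] = val;
    return true;
}

/* Drop the private copy, if any. */
void dst_release_metrics(struct dst_entry *dst)
{
    free(dst->own);
    dst->own = NULL;
    dst->metrics = NULL;
}
```

Because all reads and writes go through the helpers, the storage behind them can change from an array to a pointer without touching any caller, which is the point of the patch.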
     

01 Dec, 2010

1 commit


29 Nov, 2010

1 commit

  • 1. IPV6_TLV_TEL_DST_SIZE
    This has not been used in the several years since it was created.

    2. RT6_INFO_LEN
    Commit 33120b30 removed all references to RT6_INFO_LEN, but the definition itself remained.

    commit 33120b30cc3b8665204d4fcde7288638b0dd04d5
    Author: Alexey Dobriyan
    Date: Tue Nov 6 05:27:11 2007 -0800

    [IPV6]: Convert /proc/net/ipv6_route to seq_file interface

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     

18 Nov, 2010

1 commit


13 Nov, 2010

1 commit


04 Nov, 2010

1 commit

  • There are some percpu_counter list corruption and poison overwritten warnings
    in recent kernels, which are caused by fc66f95c.

    Commit fc66f95c switches to percpu_counter: in ip6_route_net_init, the kernel
    initializes the percpu_counter for dst entries, but the percpu_counter is never destroyed
    in ip6_route_net_exit. So when the related data is freed by the kernel, the freed percpu_counter
    is still on the list, and inserting or removing another percpu_counter then corrupts
    the list. Also, if the insert/remove operation modifies the ->prev or ->next pointer of
    the freed value, the poison overwritten warning results.

    With the following patch, the percpu_counter list corruption and poison overwritten
    warnings disappear.

    Signed-off-by: Xiaotian Feng
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: "Pekka Savola (ipv6)"
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Xiaotian Feng
     

12 Oct, 2010

1 commit

  • struct dst_ops tracks the number of allocated dsts in an atomic_t field,
    subject to high cache line contention under stress workloads.

    Switch to a percpu_counter to reduce the number of times we need to dirty a
    central location. Place it on a separate cache line to avoid dirtying
    read-only fields.

    Stress test :

    (Sending 160.000.000 UDP frames,
    IP route cache disabled, dual E5540 @2.53GHz,
    32bit kernel, FIB_TRIE, SLUB/NUMA)

    Before:

    real 0m51.179s
    user 0m15.329s
    sys 10m15.942s

    After:

    real 0m45.570s
    user 0m15.525s
    sys 9m56.669s

    With a small reordering of struct neighbour fields, subject of a
    following patch, (to separate refcnt from other read mostly fields)

    real 0m41.841s
    user 0m15.261s
    sys 8m45.949s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
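
Why a per-CPU counter dirties the central location less often can be seen in a single-threaded model. This is a sketch, not the kernel's percpu_counter: NR_CPUS, BATCH, the struct layout and the central_writes statistic are all hypothetical, and locking is omitted entirely.

```c
#include <assert.h>

#define NR_CPUS 4    /* hypothetical CPU count */
#define BATCH   32   /* hypothetical fold threshold */

struct pcpu_counter {
    long count;                    /* central, shared value */
    long local[NR_CPUS];           /* per-CPU deltas not yet folded in */
    unsigned long central_writes;  /* how often 'count' was dirtied */
};

/* Add to the calling CPU's local delta; touch the central value only
 * when the local delta reaches the batch size in either direction. */
void pcpu_add(struct pcpu_counter *c, unsigned int cpu, long amount)
{
    assert(cpu < NR_CPUS);
    long v = c->local[cpu] + amount;
    if (v >= BATCH || v <= -BATCH) {
        c->count += v;
        c->central_writes++;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Exact value: central count plus all pending per-CPU deltas. */
long pcpu_sum(const struct pcpu_counter *c)
{
    long sum = c->count;
    for (unsigned int i = 0; i < NR_CPUS; i++)
        sum += c->local[i];
    return sum;
}
```

In this model, 100 increments on one CPU write the central field only 3 times, where an atomic_t would be written 100 times. The trade-off is that reading c->count alone is approximate; an exact value needs pcpu_sum().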
     

05 Oct, 2010

1 commit


04 Oct, 2010

1 commit


29 Sep, 2010

1 commit

  • AnyIP is the capability to receive packets and establish incoming
    connections on IPs we have not explicitly configured on the machine.

    An example use case is to configure a machine to accept all incoming
    traffic on eth0, and leave the policy of whether traffic for a given IP
    should be delivered to the machine up to the load balancer.

    Can be setup as follows:
    ip -6 rule from all iif eth0 lookup 200
    ip -6 route add local default dev lo table 200
    (in this case for all IPv6 addresses)

    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
     

28 Sep, 2010

1 commit


24 Sep, 2010

1 commit


15 Aug, 2010

1 commit

  • sysctl outputs the ipv6 gc_elasticity and min_adv_mss values divided by
    HZ. However, they are not in units of jiffies: ip6_rt_min_advmss
    refers to a packet size and ip6_rt_gc_elasticity is used as a scaler, as in
    expire>>ip6_rt_gc_elasticity. So replace the jiffies conversion
    handler with the regular handler for them.

    This has an impact on scripts that currently work assuming the
    divide by HZ; they will yield different results with this patch in place.

    Signed-off-by: Min Zhang
    Signed-off-by: David S. Miller

    Min Zhang
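
The effect of the wrong handler is plain integer arithmetic, sketched below. The function names are hypothetical, and the stored values and HZ settings used in the note afterwards are illustrative assumptions, not values taken from the patch.

```c
#include <assert.h>

/* Model of what a jiffies-converting handler shows on read:
 * the stored value is treated as jiffies and divided by HZ. */
long show_as_jiffies(long stored, long hz)
{
    assert(hz > 0);
    return stored / hz;
}

/* Model of the regular integer handler: the value is shown as stored. */
long show_plain(long stored)
{
    return stored;
}
```

For example, with HZ assumed to be 250, a stored shift count of 9 is displayed as 0 and a stored packet size of 1220 is displayed as 4, while the regular handler shows 9 and 1220. The displayed number also changes with the kernel's HZ setting, which makes no sense for a size or a shift count.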
     

15 Jun, 2010

1 commit


11 Jun, 2010

1 commit


29 May, 2010

1 commit

  • Commit f4f914b5 (net: ipv6 bind to device issue) caused
    a regression with Mobile IPv6 when it changed the meaning
    of fl->oif to become a strict requirement of the route
    lookup. Instead, only force strict mode when
    sk->sk_bound_dev_if is set on the calling socket, getting
    the intended behavior and fixing the regression.

    Tested-by: Arnaud Ebalard
    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
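
The decision described above can be modelled as a small pure function. This is a sketch only: struct sock_model, LOOKUP_F_IFACE and route_lookup_flags are hypothetical stand-ins, and the dst_needs_strict parameter represents the multicast, link-local and loopback cases that already force strict mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOOKUP_F_IFACE 0x1   /* hypothetical stand-in for RT6_LOOKUP_F_IFACE */

/* Minimal socket model: 0 means "not bound to a device". */
struct sock_model {
    int bound_dev_if;
};

/* Strict interface matching is forced only when the calling socket is
 * bound to a device, or when the destination address requires it.
 * A plain output-interface hint in the flow no longer forces it. */
int route_lookup_flags(const struct sock_model *sk, bool dst_needs_strict)
{
    assert(sk == NULL || sk->bound_dev_if >= 0);
    int flags = 0;
    if ((sk != NULL && sk->bound_dev_if != 0) || dst_needs_strict)
        flags |= LOOKUP_F_IFACE;
    return flags;
}
```

A caller without a socket, or with an unbound socket, gets a non-strict lookup for a global destination, which is the behavior Mobile IPv6 relies on according to the message.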
     

18 May, 2010

1 commit

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

22 Apr, 2010

1 commit

  • The issue arises when 2 NICs are both assigned the same
    IPv6 global address.

    If a sender binds to a particular NIC (SO_BINDTODEVICE),
    the outgoing traffic is sent via the first one found.
    The bound device is thus not taken into account during
    routing.

    From the ip6_route_output function:

    If the binding address is multicast, linklocal or loopback,
    the RT6_LOOKUP_F_IFACE bit is set, but not for a global address.

    So binding a global address will neglect the SO_BINDTODEVICE-bound device,
    because the fib6_rule_lookup function path won't check the
    flowi::oif field and will take the first route that fits.

    Signed-off-by: Jiri Olsa
    Signed-off-by: Scott Otto
    Signed-off-by: David S. Miller

    Jiri Olsa
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

29 Mar, 2010

1 commit


20 Mar, 2010

1 commit


08 Mar, 2010

1 commit

  • IPV6_PREFER_SRC_xxx definitions:
    | #define IPV6_PREFER_SRC_TMP 0x0001
    | #define IPV6_PREFER_SRC_PUBLIC 0x0002
    | #define IPV6_PREFER_SRC_COA 0x0004

    RT6_LOOKUP_F_xxx definitions:
    | #define RT6_LOOKUP_F_SRCPREF_TMP 0x00000008
    | #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010
    | #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020

    So, we can translate between these two groups by shift operation
    instead of multiple 'if's.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki / 吉藤英明
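
Since each RT6_LOOKUP_F_SRCPREF_xxx value listed above is the matching IPV6_PREFER_SRC_xxx value shifted left by 3 bits, the translation can be sketched as below. The six flag values are copied from the commit message; SRCPREF_SHIFT, SRCPREF_MASK and the two function names are hypothetical names chosen for this sketch.

```c
#include <assert.h>

/* Values as listed in the commit message. */
#define IPV6_PREFER_SRC_TMP          0x0001
#define IPV6_PREFER_SRC_PUBLIC       0x0002
#define IPV6_PREFER_SRC_COA          0x0004

#define RT6_LOOKUP_F_SRCPREF_TMP     0x00000008
#define RT6_LOOKUP_F_SRCPREF_PUBLIC  0x00000010
#define RT6_LOOKUP_F_SRCPREF_COA     0x00000020

/* Hypothetical helper names for this sketch. */
#define SRCPREF_SHIFT 3
#define SRCPREF_MASK  (IPV6_PREFER_SRC_TMP | IPV6_PREFER_SRC_PUBLIC | \
                       IPV6_PREFER_SRC_COA)

/* Compile-time check that the two groups really differ by one shift. */
_Static_assert((IPV6_PREFER_SRC_TMP << SRCPREF_SHIFT) ==
               RT6_LOOKUP_F_SRCPREF_TMP, "TMP flag mismatch");
_Static_assert((IPV6_PREFER_SRC_PUBLIC << SRCPREF_SHIFT) ==
               RT6_LOOKUP_F_SRCPREF_PUBLIC, "PUBLIC flag mismatch");
_Static_assert((IPV6_PREFER_SRC_COA << SRCPREF_SHIFT) ==
               RT6_LOOKUP_F_SRCPREF_COA, "COA flag mismatch");

/* Socket-level preferences -> route lookup flags. */
unsigned int srcprefs_to_lookup_flags(unsigned int srcprefs)
{
    return (srcprefs & SRCPREF_MASK) << SRCPREF_SHIFT;
}

/* Route lookup flags -> socket-level preferences. */
unsigned int lookup_flags_to_srcprefs(unsigned int flags)
{
    return (flags >> SRCPREF_SHIFT) & SRCPREF_MASK;
}
```

The compile-time checks guard the assumption the shift relies on: if either group of constants were renumbered, the build would fail instead of silently mistranslating flags.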
     

26 Feb, 2010

1 commit

  • RFC 4291 section 2.4 states that all uncategorized addresses
    should be considered as Global Unicast.

    This will remove IPV6_ADDR_RESERVED completely
    and return IPV6_ADDR_UNICAST in ipv6_addr_type() instead.

    Signed-off-by: Ulrich Weber
    Signed-off-by: David S. Miller

    Ulrich Weber
     

19 Feb, 2010

1 commit


18 Jan, 2010

1 commit


19 Dec, 2009

1 commit


08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds