23 Nov, 2011

1 commit


09 Nov, 2011

1 commit


19 Oct, 2011

1 commit


31 Aug, 2011

1 commit


05 Aug, 2011

1 commit

  • When support for binding to 'mapped INADDR_ANY (::ffff.0.0.0.0)' was added
    in 0f8d3c7ac3693d7b6c731bf2159273a59bf70e12 the rest of the code
    wasn't told so now it's possible to bind IPv6 datagram socket to
    ::ffff.0.0.0.0, connect it to another IPv4 address and it will all
    work except for getsockhame() which does not return the local address
    as expected.

    To give getsockname() something to work with check for 'mapped INADDR_ANY'
    when connecting and update the in-core source addresses appropriately.

    Signed-off-by: Max Matveev
    Signed-off-by: David S. Miller

    Max Matveev
     

13 Mar, 2011

4 commits


02 Mar, 2011

1 commit

  • Route lookups follow a general pattern in the ipv6 code wherein
    we first find the non-IPSEC route, potentially override the
    flow destination address due to ipv6 options settings, and then
    finally make an IPSEC search using either xfrm_lookup() or
    __xfrm_lookup().

    __xfrm_lookup() is used when we want to generate a blackhole route
    if the key manager needs to resolve the IPSEC rules (in this case
    -EREMOTE is returned and the original 'dst' is left unchanged).

    Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
    resolution is necessary, we simply fail the lookup completely.

    All of these cases are encapsulated into two routines,
    ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which
    handles unconnected UDP datagram sockets.

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Oct, 2010

1 commit


09 Sep, 2010

1 commit

  • commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
    added a secondary hash on UDP, hashed on (local addr, local port).

    Problem is that following sequence :

    fd = socket(...)
    connect(fd, &remote, ...)

    not only selects remote end point (address and port), but also sets
    local address, while UDP stack stored in secondary hash table the socket
    while its local address was INADDR_ANY (or ipv6 equivalent)

    Sequence is :
    - autobind() : choose a random local port, insert socket in hash tables
    [while local address is INADDR_ANY]
    - connect() : set remote address and port, change local address to IP
    given by a route lookup.

    When an incoming UDP frame comes, if more than 10 sockets are found in
    primary hash table, we switch to secondary table, and fail to find
    socket because its local address changed.

    One solution to this problem is to rehash datagram socket if needed.

    We add a new rehash(struct socket *) method in "struct proto", and
    implement this method for UDP v4 & v6, using a common helper.

    This rehashing only takes care of secondary hash table, since primary
    hash (based on local port only) is not changed.

    Reported-by: Krzysztof Piotr Oledzki
    Signed-off-by: Eric Dumazet
    Tested-by: Krzysztof Piotr Oledzki
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Jun, 2010

1 commit

  • There are more than a dozen occurrences of following code in the
    IPv6 stack:

    if (opt && opt->srcrt) {
    struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
    ipv6_addr_copy(&final, &fl.fl6_dst);
    ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
    final_p = &final;
    }

    Replace those with a helper. Note that the helper overrides final_p
    in all cases. This is ok as final_p was previously initialized to
    NULL when declared.

    Signed-off-by: Arnaud Ebalard
    Signed-off-by: David S. Miller

    Arnaud Ebalard
     

12 May, 2010

1 commit


06 May, 2010

1 commit

  • I noticed when I added support for IPV6_DONTFRAG that if you set
    IPV6_RECVERR and tried to send a UDP packet larger than 64K to an
    IPv6 destination, you'd correctly get an EMSGSIZE, but reading from
    MSG_ERRQUEUE returned the incorrect address in the cmsg:

    struct msghdr:
    msg_name 0x7fff8f3c96d0
    msg_namelen 28
    struct sockaddr_in6:
    sin6_family 10
    sin6_port 7639
    sin6_flowinfo 0
    sin6_addr ::ffff:38.32.0.0
    sin6_scope_id 0 ((null))

    It should have returned this in my case:

    struct msghdr:
    msg_name 0x7fffd866b510
    msg_namelen 28
    struct sockaddr_in6:
    sin6_family 10
    sin6_port 7639
    sin6_flowinfo 0
    sin6_addr 2620:0:a09:e000:21f:29ff:fe57:f88b
    sin6_scope_id 0 ((null))

    The problem is that ipv6_recv_error() assumes that if the error
    wasn't generated by ICMPv6, it's an IPv4 address sitting there,
    and proceeds to create a v4-mapped address from it.

    Change ipv6_icmp_error() and ipv6_local_error() to set skb->protocol
    to htons(ETH_P_IPV6) so that ipv6_recv_error() knows the address
    sitting right after the extended error is IPv6, else it will
    incorrectly map the first octet into an IPv4-mapped IPv6 address
    in the cmsg structure returned in a recvmsg() call to obtain
    the error.

    Signed-off-by: Brian Haley

    --
    To unsubscribe from this list: send the line "unsubscribe netdev" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Signed-off-by: David S. Miller

    Brian Haley
     

24 Apr, 2010

2 commits


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

02 Nov, 2009

1 commit


19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    Goal is to transfert fields used at lookup time in the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path)

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Oct, 2009

1 commit


07 Oct, 2009

1 commit

  • Atis Elsts wrote:
    > Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.

    Ok, I'll just drop that part, I'm not sure what should be done in this case.

    > Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?

    I just sent that patch out too quickly, here's a better one with the updates.

    Add support for IPv6 route lookups using sk_mark.

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
     

26 Nov, 2008

1 commit

  • Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
    to flow_cache_lookup() and resolver callback.

    Take it from socket or netdevice. Stub DECnet to init_net.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

13 Nov, 2008

1 commit

  • This patch fixes two bugs:

    1. setsockopt() of anything but a Type 2 routing header should return
    EINVAL instead of EPERM. Noticed by Shan Wei
    (shanwei@cn.fujitsu.com).

    2. setsockopt()/sendmsg() of a Type 2 routing header with invalid
    length or segments should return EINVAL. These values are statically
    fixed in RFC 3775, unlike the variable Type 0 was.

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
     

30 Jul, 2008

1 commit


14 Jun, 2008

1 commit


12 Jun, 2008

2 commits


05 Jun, 2008

2 commits


29 Jan, 2008

2 commits


09 Jan, 2008

1 commit


11 Oct, 2007

1 commit

  • This patch makes most of the generic device layer network
    namespace safe. This patch makes dev_base_head a
    network namespace variable, and then it picks up
    a few associated variables. The functions:
    dev_getbyhwaddr
    dev_getfirsthwbytype
    dev_get_by_flags
    dev_get_by_name
    __dev_get_by_name
    dev_get_by_index
    __dev_get_by_index
    dev_ioctl
    dev_ethtool
    dev_load
    wireless_process_ioctl

    were modified to take a network namespace argument, and
    deal with it.

    vlan_ioctl_set and brioctl_set were modified so their
    hooks will receive a network namespace argument.

    So basically anthing in the core of the network stack that was
    affected to by the change of dev_base was modified to handle
    multiple network namespaces. The rest of the network stack was
    simply modified to explicitly use &init_net the initial network
    namespace. This can be fixed when those components of the network
    stack are modified to handle multiple network namespaces.

    For now the ifindex generator is left global.

    Fundametally ifindex numbers are per namespace, or else
    we will have corner case problems with migration when
    we get that far.

    At the same time there are assumptions in the network stack
    that the ifindex of a network device won't change. Making
    the ifindex number global seems a good compromise until
    the network stack can cope with ifindex changes when
    you change namespaces, and the like.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

11 Jul, 2007

2 commits

  • Based on .

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • This patch makes MIPv6 loadable module named "mip6".

    Here is a modprobe.conf(5) example to load it automatically
    when user application uses XFRM state for MIPv6:

    alias xfrm-type-10-43 mip6
    alias xfrm-type-10-60 mip6

    Some MIPv6 feature is not included by this modular, however,
    it should not be affected to other features like either IPsec
    or IPv6 with and without the patch.
    We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt
    separately for future work.

    Loadable features:
    * MH receiving check (to send ICMP error back)
    * RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
    * XFRM policy/state database handling for RO

    These are NOT covered as loadable:
    * Home Address flags and its rule on source address selection
    * XFRM sub policy (depends on its own kernel option)
    * XFRM functions to receive RO as IPv6 extension header
    * MH sending/receiving through raw socket if user application
    opens it (since raw socket allows to do so)
    * RH2 sending as ancillary data
    * RH2 operation with setsockopt(2)

    Signed-off-by: Masahide NAKAMURA
    Signed-off-by: David S. Miller

    Masahide NAKAMURA
     

25 May, 2007

1 commit

  • The current IPSEC rule resolution behavior we have does not work for a
    lot of people, even though technically it's an improvement from the
    -EAGAIN buisness we had before.

    Right now we'll block until the key manager resolves the route. That
    works for simple cases, but many folks would rather packets get
    silently dropped until the key manager resolves the IPSEC rules.

    We can't tell these folks to "set the socket non-blocking" because
    they don't have control over the non-block setting of things like the
    sockets used to resolve DNS deep inside of the resolver libraries in
    libc.

    With that in mind I coded up the patch below with some help from
    Herbert Xu which provides packet-drop behavior during larval state
    resolution, controllable via sysctl and off by default.

    This lays the framework to either:

    1) Make this default at some point or...

    2) Move this logic into xfrm{4,6}_policy.c and implement the
    ARP-like resolution queue we've all been dreaming of.
    The idea would be to queue packets to the policy, then
    once the larval state is resolved by the key manager we
    re-resolve the route and push the packets out. The
    packets would timeout if the rule didn't get resolved
    in a certain amount of time.

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Apr, 2007

3 commits

  • Spring cleaning time...

    There seems to be a lot of places in the network code that have
    extra bogus semicolons after conditionals. Most commonly is a
    bogus semicolon after: switch() { }

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • These are a bit more subtle, they are of this type:

    - skb->h.raw = payload;
    __skb_pull(skb, payload - skb->data);
    + skb_reset_transport_header(skb);

    __skb_pull results in:

    skb->data = skb->data + payload - skb->data;
    skb->data = payload;

    So after __skb_pull we have skb->data pointing to payload and we can
    just call skb_reset_transport_header(skb), that will do:

    skb->h.raw = payload;

    The others are similar, allowing us to get rid of some more cases where a
    pointer was being attributed to the layer headers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo