16 Nov, 2010

1 commit

  • The commit below added a new helper dev_ingress_queue to cleanly obtain the
    ingress queue pointer. This necessitated including 'linux/netdevice.h':

    commit 24824a09e35402b8d58dcc5be803a5ad3937bdba
    Author: Eric Dumazet
    Date: Sat Oct 2 06:11:55 2010 +0000

    net: dynamic ingress_queue allocation

    However this include triggers issues for applications in userspace
    which use the rtnetlink interfaces. Commonly this requires they include
    'net/if.h' and 'linux/rtnetlink.h' leading to a compiler error as below:

    In file included from /usr/include/linux/netdevice.h:28:0,
    from /usr/include/linux/rtnetlink.h:9,
    from t.c:2:
    /usr/include/linux/if.h:135:8: error: redefinition of ‘struct ifmap’
    /usr/include/net/if.h:112:8: note: originally defined here
    /usr/include/linux/if.h:169:8: error: redefinition of ‘struct ifreq’
    /usr/include/net/if.h:127:8: note: originally defined here
    /usr/include/linux/if.h:218:8: error: redefinition of ‘struct ifconf’
    /usr/include/net/if.h:177:8: note: originally defined here

    The new helper is only defined for the kernel and protected by __KERNEL__
    therefore we can simply pull the include down into the same protected
    section.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: David S. Miller

    Andy Whitcroft
     

05 Oct, 2010

2 commits

  • rtnl_dereference() is used in contexts where RTNL is held, to fetch an
    RCU protected pointer.

    Updates to this pointer are prevented by RTNL, so we don't need
    smp_read_barrier_depends() and the ACCESS_ONCE() provided in
    rcu_dereference_check().

    rtnl_dereference() is mainly a macro to document the locking invariant.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Because ingress is not used very much, and net_device->ingress_queue is
    quite a big object (128 or 256 bytes), allocate it dynamically only when
    needed (tc qdisc add dev eth0 ingress ...)

    dev_ingress_queue(dev) helper should be used only with RTNL taken.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
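
A minimal userspace sketch of the on-demand allocation described in this entry, assuming a much simplified device structure. The names demo_dev, demo_queue, demo_ingress_queue() and demo_ingress_queue_create() are hypothetical stand-ins rather than the kernel's definitions, and the RTNL locking that the real helper depends on is not modelled here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ingress queue; 128 bytes, as in the entry. */
struct demo_queue {
    unsigned char pad[128];
};

/* Hypothetical device: holds only a pointer, so unused devices pay no cost. */
struct demo_dev {
    struct demo_queue *ingress_queue;
};

/* Accessor: returns NULL until an ingress queue has been requested. */
struct demo_queue *demo_ingress_queue(const struct demo_dev *dev)
{
    assert(dev != NULL);
    return dev->ingress_queue;
}

/* Allocates the queue on first use and returns the same one afterwards. */
struct demo_queue *demo_ingress_queue_create(struct demo_dev *dev)
{
    assert(dev != NULL);
    if (dev->ingress_queue == NULL)
        dev->ingress_queue = calloc(1, sizeof(*dev->ingress_queue));
    return dev->ingress_queue;
}
```

In this sketch a caller would invoke demo_ingress_queue_create() only on the path that configures ingress, and use demo_ingress_queue() everywhere else, checking for NULL.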
     

16 Sep, 2010

1 commit


09 Sep, 2010

1 commit

  • We use rcu_dereference_check(p, rcu_read_lock_held() ||
    lockdep_rtnl_is_held()) several times in network stack.

    More usages are coming too, so it's time to create a helper.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Jul, 2010

1 commit

  • Add a new rt attribute, RTA_MARK, and use it in
    rt_fill_info()/inet_rtm_getroute() to support the following commands:

    ip route get 192.168.20.110 mark NUMBER
    ip route get 192.168.20.108 from 192.168.20.110 iif eth1 mark NUMBER
    ip route list cache [192.168.20.110] mark NUMBER

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 May, 2010

1 commit

  • This patch adds support for multiple independent multicast routing instances,
    named "tables".

    Userspace multicast routing daemons can bind to a specific table instance by
    issuing a setsockopt call using a new option MRT6_TABLE. The table number is
    stored in the raw socket data and affects all following ip6mr setsockopt(),
    getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
    is created with a default routing rule pointing to it. Newly created pim6reg
    devices have the table number appended ("pim6regX"), with the exception of
    devices created in the default table, which are named just "pim6reg" for
    compatibility reasons.

    Packets are directed to a specific table instance using routing rules,
    similar to how regular routing rules work. Currently iif, oif and mark
    are supported as keys; source and destination addresses could be
    supported additionally.

    Example usage:

    - bind pimd/xorp/... to a specific table:

    uint32_t table = 123;
    setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

    - create routing rules directing packets to the new table:

    # ip -6 mrule add iif eth0 lookup 123
    # ip -6 mrule add oif eth0 lookup 123

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

26 Apr, 2010

1 commit

  • Decouple rtnetlink address families from real address families in socket.h to
    be able to add rtnetlink interfaces to code that is not a real address family
    without increasing AF_MAX/NPROTO.

    This will be used to add support for multicast route dumping from all tables
    as the proc interface can't be extended to support anything but the main table
    without breaking compatibility.

    This partially undoes the patch to introduce independent families for routing
    rules and converts ipmr routing rules to a new rtnetlink family. Similar to
    that patch, values up to 127 are reserved for real address families, values
    above that may be used arbitrarily.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

01 Mar, 2010

1 commit


25 Feb, 2010

1 commit

  • Update rcu_dereference() primitives to use new lockdep-based
    checking. The rcu_dereference() in __in6_dev_get() may be
    protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
    The rcu_dereference() in __sk_free() is protected by the fact
    that it is never reached if an update could change it. Check
    for this by using rcu_dereference_check() to verify that the
    struct sock's ->sk_wmem_alloc counter is zero.

    Acked-by: Eric Dumazet
    Acked-by: David S. Miller
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

24 Dec, 2009

1 commit

  • Add rtnetlink init_rcvwnd to set the TCP initial receive window size
    advertised by passive and active TCP connections.
    The current Linux TCP implementation limits the advertised TCP initial
    receive window to the one prescribed by slow start. For short-lived
    TCP connections used for transaction-type traffic (e.g. HTTP
    requests), bounding the advertised TCP initial receive window results
    in increased latency to complete the transaction.
    Setting the initial congestion window is already supported
    using rtnetlink init_cwnd, but the feature is useless without the
    ability to set a larger TCP initial receive window.
    The rtnetlink init_rcvwnd allows increasing the TCP initial receive
    window, allowing TCP connections to advertise a larger TCP receive
    window than the one bounded by slow start.

    Signed-off-by: Laurent Chavey
    Signed-off-by: David S. Miller

    laurent chavey
     

16 Dec, 2009

1 commit

  • It creates a regression, triggering badness for SYN_RECV
    sockets, for example:

    [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
    [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
    [19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
    [19148.023496] MSR: 00029032 CR: 24002442 XER: 00000000
    [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

    This is likely caused by the change in the 'estab' parameter
    passed to tcp_parse_options() when invoked by the functions
    in net/ipv4/tcp_minisocks.c

    But even if that is fixed, the ->conn_request() changes made in
    this patch series are fundamentally wrong. They try to use the
    listening socket's 'dst' to probe the route settings. The
    listening socket doesn't even have a route, and you can't
    get the right route (the child request one) until much later,
    after we set up all of the state, and it must be done by hand.

    This stuff really isn't ready, so the best thing to do is a
    full revert. This reverts the following commits:

    f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d
    022c3f7d82f0f1c68018696f2f027b87b9bb45c2
    1aba721eba1d84a2defce45b950272cee1e6c72a
    cda42ebd67ee5fdf09d7057b5a4584d36fe8a335
    345cda2fd695534be5a4494f1b59da9daed33663
    dc343475ed062e13fc260acccaab91d7d80fd5b2
    05eaade2782fb0c90d3034fd7a7d5a16266182bb
    6a2a2d6bf8581216e08be15fcb563cfd6c430e1e

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces
    on the first line to ease grep games.

    struct something
    {

    becomes:

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Oct, 2009

4 commits


09 Sep, 2009

1 commit

  • Fix apparent thinko related to RTM_DELADDRLABEL, introduced by commit
    2a8cc6c89039e0530a3335954253b76ed0f9339a ("[IPV6] ADDRCONF: Support
    RFC3484 configurable address selection policy table.").

    Signed-off-by: Tushar Gohad
    Signed-off-by: David S. Miller

    Tushar Gohad
     

20 Mar, 2009

1 commit

  • To improve manageability, it would be good to be able to distinguish routes
    added by the administrator from those added by a DHCP client. The only
    necessary kernel change is to add a value to the rtnetlink include file so
    the iproute2 utility can use it.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

25 Feb, 2009

1 commit

  • This patch changes the return value of nlmsg_notify() as follows:

    If NETLINK_BROADCAST_ERROR is set by any of the listeners and
    an error in the delivery happened, return the broadcast error;
    else if there are no listeners apart from the socket that
    requested a change with the echo flag, return the result of the
    unicast notification. Thus, with this patch, the unicast
    notification is handled in the same way as a broadcast listener
    that has set the NETLINK_BROADCAST_ERROR socket flag.

    This patch is useful when the caller of nlmsg_notify()
    wants to know the result of the delivery of a netlink notification
    (including the broadcast delivery) and take action if
    the delivery failed. For example, ctnetlink can drop packets
    if the event delivery failed, providing reliable logging and
    state synchronization at the cost of dropping packets.

    This patch also modifies the rtnetlink code to ignore the return
    value of rtnl_notify() in all callers. The function rtnl_notify()
    (before this patch) returned the error of the unicast notification,
    which makes rtnl_set_sk_err() report errors to all listeners. This
    is not of any help since the origin of the change (the socket that
    requested the echoing) notices the ENOBUFS error if the notification
    fails and should resync itself.

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

31 Jan, 2009

1 commit


21 Nov, 2008

1 commit

  • This adds support for Data Center Bridging (DCB) features in the ixgbe
    driver and adds an rtnetlink interface for configuring DCB to the
    kernel. The DCB features supported are Priority Grouping (PG),
    which allows bandwidth guarantees to be allocated to groups of traffic
    based on the 802.1q priority, and Priority Based Flow Control (PFC),
    which introduces a new MAC control PAUSE frame that works at the
    granularity of the 802.1p priority instead of the link (IEEE 802.3x).

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: David S. Miller

    Alexander Duyck
     

23 Sep, 2008

1 commit


26 Jul, 2008

1 commit


20 Jul, 2008

1 commit


11 Jun, 2008

1 commit

  • Most legacy software does not cope with tables > 255: rtm_table is a u8,
    so tb_id is sent masked with 0xff, and it is possible to confuse, for
    example, table 510 with table 254 (main).

    This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
    tb_id > 255. This keeps such old applications happy, while new
    ones are still able to use RTA_TABLE to get a proper table id.

    Signed-off-by: Krzysztof Piotr Oledzki
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Krzysztof Piotr Oledzki
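
The arithmetic behind this entry can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, and the two constants carry a DEMO_ prefix to make clear they are local copies of the values quoted in the entry (252 for the compat table, 254 for main), not the header's own definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RT_TABLE_COMPAT 252u /* value quoted in the entry */
#define DEMO_RT_TABLE_MAIN   254u /* "table 254 (main)" in the entry */

/* Old behaviour: the 32-bit table id is squeezed into a u8 by masking. */
uint8_t demo_rtm_table_truncated(uint32_t tb_id)
{
    return (uint8_t)(tb_id & 0xff);
}

/* New behaviour: ids that do not fit in a u8 are reported as the compat table. */
uint8_t demo_rtm_table_compat(uint32_t tb_id)
{
    return tb_id > 255 ? (uint8_t)DEMO_RT_TABLE_COMPAT : (uint8_t)tb_id;
}

/* Self-check of the example from the entry: 510 is 0x1fe, and 0x1fe & 0xff is 0xfe. */
void demo_rtm_table_check(void)
{
    assert(demo_rtm_table_truncated(510) == DEMO_RT_TABLE_MAIN);
    assert(demo_rtm_table_compat(510) == DEMO_RT_TABLE_COMPAT);
    assert(demo_rtm_table_compat(254) == DEMO_RT_TABLE_MAIN);
}
```

With masking alone, a legacy reader cannot tell table 510 from main; with the compat value it at least sees a distinct table number and never a wrong match.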
     

04 Jun, 2008

1 commit


24 Apr, 2008

1 commit

  • ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is
    held. This causes bogus warnings when running in atomic context, which
    f.e. happens when adding secondary unicast addresses through
    macvlan or vlan or when synchronizing multicast addresses from
    wireless devices.

    Mid-term we might want to consider moving all address updates
    to process context since the locking seems overly complicated,
    for now just fix the bogus warning by changing ASSERT_RTNL to
    use mutex_is_locked().

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

05 Feb, 2008

1 commit


29 Jan, 2008

2 commits


13 Nov, 2007

1 commit


11 Oct, 2007

1 commit

  • As discussed before, this patch provides userland with a way to access
    relevant options in Router Advertisements, after they are processed
    and validated by the kernel. Extra options are processed in a generic
    way; this patch only exports RDNSS options described in RFC5006, but
    support to control which options are exported could be easily added.

    A new rtnetlink message type is defined, to transport Neighbor
    Discovery options, along with optional context information. At the
    moment only the address of the router sending an RDNSS option is
    included, but additional attributes may be later defined, if needed by
    new use cases.

    Signed-off-by: Pierre Ynard
    Signed-off-by: David S. Miller

    Pierre Ynard
     

31 Aug, 2007

1 commit


11 Jul, 2007

3 commits

  • Sent the wrong patch previously.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Add a nested compat attribute type that can be used to convert
    attributes that contain a structure to nested attributes in a
    backwards compatible way.

    The attribute looks like this:

    struct {
        [ compat contents ]
        struct rtattr {
            .rta_len = total size,
            .rta_type = type,
        } rta;
        struct old_structure struct;

        [ nested top-level attribute ]
        struct rtattr {
            .rta_len = nest size,
            .rta_type = type,
        } nest_attr;

        [ optional 0 .. n nested attributes ]
        struct rtattr {
            .rta_len = private attribute len,
            .rta_type = private attribute type,
        } nested_attr;
        struct nested_data data;
    };

    Since both userspace and kernel deal correctly with attributes that are
    larger than expected, old versions will just parse the compat part and
    ignore the rest.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
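
The layout in this entry can be turned into a small size calculation. The sketch below is an assumption-laden illustration, not the kernel's code: the demo_ helpers are hypothetical, and they assume the usual rtattr conventions of a 4-byte header (two 16-bit fields) and 4-byte alignment of each attribute.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RTA_HDRLEN ((size_t)4) /* assumed: rta_len + rta_type, 16 bits each */

/* Round a length up to the assumed 4-byte attribute alignment. */
size_t demo_rta_align(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Offset of the nested top-level attribute from the start of the outer one:
 * outer header, then the old structure, padded to the alignment. */
size_t demo_nest_offset(size_t old_struct_len)
{
    return demo_rta_align(DEMO_RTA_HDRLEN + old_struct_len);
}

/* "total size" of the outer attribute: compat part plus the nested attribute,
 * where nested_len is the byte count of the already aligned inner attributes. */
size_t demo_compat_total_len(size_t old_struct_len, size_t nested_len)
{
    return demo_nest_offset(old_struct_len) + DEMO_RTA_HDRLEN + nested_len;
}

/* Worked example: a 6-byte old structure and 8 bytes of nested attributes. */
void demo_compat_check(void)
{
    assert(demo_nest_offset(6) == 12);          /* 4 + 6 = 10, padded to 12 */
    assert(demo_compat_total_len(6, 8) == 24);  /* 12 + 4 + 8 */
}
```

An old parser in this model reads only the first old_struct_len bytes of the payload, which is why the trailing nested attribute does not disturb it.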
     
  • With help from Chris Wedgwood.

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Apr, 2007

2 commits

  • This patch adds a new interface to register rtnetlink message
    handlers, replacing the exported rtnl_links[] array which
    required many message handlers to be exported unnecessarily.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
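
The offset-versus-pointer idea in this entry can be illustrated with a toy buffer. This is a sketch under stated assumptions: demo_buf, hdr_off and both helpers are hypothetical and do not reproduce struct sk_buff; the only point shown is that a 32-bit offset from the buffer head carries the same information as a full pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer: one real pointer plus a 4-byte offset from it. */
struct demo_buf {
    unsigned char *head;
    uint32_t hdr_off; /* position stored relative to head */
};

/* Store a position inside the buffer as an offset from head. */
void demo_set_hdr(struct demo_buf *b, unsigned char *p)
{
    assert(p >= b->head);
    b->hdr_off = (uint32_t)(p - b->head);
}

/* Recover the pointer only when it is actually needed. */
unsigned char *demo_hdr_ptr(const struct demo_buf *b)
{
    return b->head + b->hdr_off;
}

/* Differences between two positions need no pointer conversion at all. */
uint32_t demo_hdr_distance(uint32_t off_a, uint32_t off_b)
{
    return off_b - off_a;
}
```

On a 64-bit machine the offset field takes 4 bytes where a pointer would take 8, which is the saving the entry refers to.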
     

09 Dec, 2006

1 commit