29 Jun, 2013

1 commit

  • Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
    assumed that "locally destined, and routed packets, never trigger
    PMTU events or redirects that will be processed by us".

    However, it seems that tunnel devices do trigger PMTU events in certain
    cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
    skb_dst(skb)->ops->update_pmtu to propagate mtu information from the
    outer flows. These can cause the inner flow mtu to be decreased. If
    next hop exceptions are not consulted for pmtu, IP fragmentation will
    not be done properly for these routes.

    It also seems that we really need to have the PMTU information always
    for netfilter TCPMSS clamp-to-pmtu feature to work properly.

    So for the time being, cache separate copies of input routes for
    each next hop exception.

    Signed-off-by: Timo Teräs
    Reviewed-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Timo Teräs
     

13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

06 Jun, 2013

1 commit

  • Merge 'net' bug fixes into 'net-next' as we have patches
    that will build on top of them.

    This merge commit includes a change from Emil Goode
    (emilgoode@gmail.com) that fixes a warning that would
    have been introduced by this merge. Specifically it
    fixes the pingv6_ops method ipv6_chk_addr() to add a
    "const" to the "struct net_device *dev" argument and
    likewise update the dummy_ipv6_chk_addr() declaration.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Jun, 2013

3 commits

  • commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
    added the support to flush learned pmtu information.

    However, using rt_genid is quite heavy as it is bumped on route
    add/change and multicast events amongst other places. These can
    happen quite often, especially if using dynamic routing protocols.

    While this is ok with routes (as they are just recreated locally),
    the pmtu information is learned from remote systems and the icmp
    notification can come with long delays. It is worth having a separate
    genid to avoid excessive pmtu resets.

    Cc: Steffen Klassert
    Signed-off-by: Timo Teräs
    Signed-off-by: David S. Miller

    Timo Teräs
     
  • The tunnel devices call update_pmtu for each packet sent; this causes
    contention on the fnhe_lock. Ignore the pmtu update if pmtu is not
    actually changed, and there is still plenty of time before the entry
    expires.

    Signed-off-by: Timo Teräs
    Signed-off-by: David S. Miller

    Timo Teräs
     
  • This reverts commit 05ab86c5 (xfrm4: Invalidate all ipv4 routes on
    IPsec pmtu events). Flushing all cached entries is not needed.

    Instead, invalidate only the related next hop dsts to recheck for
    the added next hop exception where needed. This also fixes a subtle
    race due to bumping generation id's before updating the pmtu.

    Cc: Steffen Klassert
    Signed-off-by: Timo Teräs
    Signed-off-by: David S. Miller

    Timo Teräs
     

28 May, 2013

1 commit

  • Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect()
    doesn't call __build_flow_key() directly but via the
    ip_rt_build_flow_key() wrapper. This leads to __build_flow_key()
    getting a pointer to the IPv4 header of the ICMP redirect packet
    rather than a pointer to the embedded IPv4 header of the packet
    initiating the redirect.

    As a result, handling of ICMP redirects initiated by TCP packets
    is broken. Issue was introduced by

    4895c771c ("ipv4: Add FIB nexthop exceptions.")

    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Michal Kubecek
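
To make the two headers concrete, here is a small user-space sketch that locates the embedded IPv4 header inside an ICMP redirect. It relies only on the standard layout (outer IPv4 header, 8-byte ICMP header, then the header of the packet that triggered the redirect). The helper names are hypothetical and this is not the kernel's code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20
#define ICMP_HDR_LEN      8 /* type, code, checksum, gateway address */

/* Header length in bytes, taken from the IHL nibble. */
static size_t ipv4_hdr_len(const uint8_t *iph)
{
    return (size_t)(iph[0] & 0x0f) * 4;
}

/* Protocol field of an IPv4 header (byte offset 9). */
static uint8_t ipv4_protocol(const uint8_t *iph)
{
    return iph[9];
}

/* Given outer IPv4 header + ICMP redirect, return a pointer to the
 * embedded IPv4 header, or NULL if the buffer is too short. */
static const uint8_t *icmp_embedded_iph(const uint8_t *pkt, size_t len)
{
    size_t outer;

    assert(pkt != NULL);
    if (len < IPV4_MIN_HDR_LEN)
        return NULL;
    outer = ipv4_hdr_len(pkt);
    if (outer < IPV4_MIN_HDR_LEN ||
        len < outer + ICMP_HDR_LEN + IPV4_MIN_HDR_LEN)
        return NULL;
    return pkt + outer + ICMP_HDR_LEN;
}
```

A flow key built from the outer header always sees protocol 1 (ICMP), while one built from the embedded header sees the protocol of the original flow, for example 6 for TCP. That difference is why redirects triggered by TCP packets were affected.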
     

22 Mar, 2013

1 commit


20 Feb, 2013

1 commit


19 Feb, 2013

1 commit

  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This looks a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create, so proc_net_fops_create
    can be removed after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

23 Jan, 2013

1 commit

  • git commit 9cb3a50c (ipv4: Invalidate the socket cached route on
    pmtu events if possible) introduced a refcount problem. We don't
    get a refcount on the route if we get it from __sk_dst_get(), but
    we need one if we want to reuse this route because __sk_dst_set()
    releases the refcount of the old route. This patch adds proper
    refcount handling for that case. We introduce a 'new' flag to
    indicate that we are going to use a new route and we release the
    old route only if we replace it by a new one.

    Reported-by: Julian Anastasov
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
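
The refcount rule described above can be modelled with a toy user-space sketch. All types and helpers below are hypothetical stand-ins; the only behaviour taken from the message is that the cached route is borrowed without a reference, and that installing a route releases the old one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct route { int refcnt; };
struct sock_cache { struct route *cached; };

static void route_hold(struct route *r) { r->refcnt++; }

static void route_release(struct route *r)
{
    assert(r->refcnt > 0);
    r->refcnt--;
}

/* Install rt as the cached route. Consumes the caller's reference on
 * rt and releases the reference held on the previous route. */
static void cache_set(struct sock_cache *sk, struct route *rt)
{
    struct route *old = sk->cached;

    sk->cached = rt;
    if (old != NULL)
        route_release(old);
}

/* replacement == NULL models "the cached route is still usable". */
static void update_cached_route(struct sock_cache *sk, struct route *replacement)
{
    struct route *rt = sk->cached; /* borrowed: no reference taken */
    bool is_new = false;

    if (replacement != NULL) {
        route_hold(replacement);   /* a reference we own */
        rt = replacement;
        is_new = true;
    }

    /* ... the pmtu update on rt would happen here ... */

    /* Only hand over a reference we actually own. Calling cache_set()
     * with the borrowed route would drop a reference never taken. */
    if (is_new)
        cache_set(sk, rt);
}
```

Without the flag, the reuse path would call cache_set() with the borrowed pointer and drop the last reference on a route that is still in use.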
     

22 Jan, 2013

1 commit

  • The route lookup in ipv4_sk_update_pmtu() might return a route
    different from the route we cached at the socket. This is because
    standard routes are per cpu, so each cpu has its own struct rtable.
    This means that we do not invalidate the socket cached route if the
    NET_RX_SOFTIRQ is not served by the same cpu that the sending socket
    uses. As a result, the cached route is reused until we disconnect.

    With this patch we invalidate the socket cached route if possible.
    If the socket is owned by the user, we can't update the cached
    route directly. A followup patch will implement socket release
    callback functions for datagram sockets to handle this case.

    Reported-by: Yurij M. Plotnikov
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

17 Jan, 2013

2 commits

  • Routes with locked mtu should not use learned pmtu information,
    so do not update the pmtu on these routes.

    Reported-by: Julian Anastasov
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • The output route check was introduced with git commit 261663b0
    (ipv4: Don't use the cached pmtu informations for input routes)
    during times when we cached the pmtu information on the
    inetpeer. Now the pmtu information is back in the routes,
    so this check is obsolete. It also had some unwanted side effects,
    as reported by Timo Teras and Lukas Tribus.

    Signed-off-by: Steffen Klassert
    Acked-by: Timo Teräs
    Signed-off-by: David S. Miller

    Steffen Klassert
     

08 Dec, 2012

1 commit

  • Commit f1ce3062c538 (ipv4: Remove 'rt_dst' from 'struct rtable') removes the
    call to ipmr_get_route(), which will get multicast parameters of the route.

    I revert the part of the patch that removes this call. I think the goal was only
    to get rid of rt_dst field.

    The patch is only compile-tested. My first idea was to remove ipmr_get_route()
    because rt_fill_info() was the only user, but it seems the previous patch cleans
    the code a bit too much ;-)

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

26 Nov, 2012

1 commit


23 Nov, 2012

1 commit

  • Starting from 3.6 we cache output routes for
    multicasts only when using route to 224/4. For local receivers
    we can set RTCF_LOCAL flag depending on the membership but
    in such case we use maddr and saddr which are not caching
    keys as before. Additionally, we can not use same place to
    cache routes that differ in RTCF_LOCAL flag value.

    Fix it by caching only RTCF_MULTICAST entries
    without RTCF_LOCAL (send-only, no loopback). As a side effect,
    we avoid unneeded lookup for fnhe when not caching because
    multicasts are not redirected and they do not learn PMTU.

    Thanks to Maxime Bizon for showing the caching
    problems in __mkroute_output for 3.6 kernels: different
    RTCF_LOCAL flag in cache can lead to wrong ip_mc_output or
    ip_output call and the visible problem is that traffic can
    not reach local receivers via loopback.

    Reported-by: Maxime Bizon
    Tested-by: Maxime Bizon
    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

19 Nov, 2012

1 commit

  • In preparation for supporting the creation of network namespaces
    by unprivileged users, modify all of the per net sysctl exports
    and refuse to allow them to unprivileged users.

    This makes it safe for unprivileged users in general to access
    per net sysctls, and allows sysctls to be exported to unprivileged
    users on an individual basis as they are deemed safe.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

13 Nov, 2012

1 commit

  • The xfrm gc threshold value depends on ip_rt_max_size. This
    value was set to INT_MAX with the routing cache removal patch,
    so we start doing garbage collecting when we have INT_MAX/2
    IPsec routes cached. Fix this by going back to the static
    threshold of 1024 routes.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

19 Oct, 2012

1 commit

  • Currently we can not flush cached pmtu/redirect information via
    the ipv4_sysctl_rtcache_flush sysctl. We need to check the rt_genid
    of the old route and reset the nh exception if the old route is
    expired when we bind a new route to a nh exception.

    Signed-off-by: Steffen Klassert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Steffen Klassert
     

11 Oct, 2012

1 commit

  • Sparse complains about RTA_MARK which should be host order according
    to include file and usage in iproute.

    net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
    net/ipv4/route.c:2223:46: expected restricted __be32 [usertype] value
    net/ipv4/route.c:2223:46: got unsigned int [unsigned] [usertype] flowic_mark

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

09 Oct, 2012

7 commits

  • Add flag to request that output route should be
    returned with known rt_gateway, in case we want to use
    it as nexthop for neighbour resolving.

    The returned route can be cached as follows:

    - in NH exception: because the cached routes are not shared
    with other destinations
    - in FIB NH: when using gateway because all destinations for
    NH share same gateway

    As last option, to return rt_gateway!=0 we have to
    set DST_NOCACHE.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Add new flag to remember when route is via gateway.
    We will use it to allow rt_gateway to contain address of
    directly connected host for the cases when DST_NOCACHE is
    used or when the NH exception caches per-destination route
    without DST_NOCACHE flag, i.e. when routes are not used for
    other destinations. This way we force the neighbour
    resolving to work with the routed destination but we
    can use different address in the packet, feature needed
    for IPVS-DR where original packet for virtual IP is routed
    via route to real IP.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Avoid checking nh_pcpu_rth_output in fast path,
    abort fib_info creation on alloc_percpu failure.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • After "Cache input routes in fib_info nexthops" (commit
    d2d68ba9fe) and "Elide fib_validate_source() completely when possible"
    (commit 7a9bc9b81a) we can not send ICMP redirects. It seems we
    should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
    the same fib_info can be used for traffic that is not redirected,
    eg. from other input devices or from sources that are not in same subnet.

    As a result, we have to disable the caching of RTCF_DOREDIRECT
    flag and to force source validation for the case when forwarding
    traffic to the input device. If traffic comes from directly connected
    source we allow redirection as it was done before both changes.

    Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
    is disabled; this can avoid source address validation and
    help cache the routes.

    After the change "Adjust semantics of rt->rt_gateway"
    (commit f8126f1d51) we should make sure our ICMP_REDIR_HOST messages
    contain daddr instead of 0.0.0.0 when target is directly connected.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • We report cached pmtu values even if they are already expired.
    Change this to not report these values after they are expired
    and fix a race in the expire time calculation, as suggested by
    Eric Dumazet.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • When a local tool like tracepath tries to send packets bigger than
    the device mtu, we create a nh exception and set the pmtu to device
    mtu. The device mtu does not expire, so check if the device mtu is
    smaller than the reported pmtu and don't create a nh exception in
    that case.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • Some protocols, like IPsec still cache routes. So we need to invalidate
    the old route on pmtu events to avoid the reuse of stale routes.
    We also need to update the mtu and expire time of the route if we already
    use a nh exception route, otherwise we ignore newly learned pmtu values
    after the first expiration.

    With this patch we always invalidate or update the route on pmtu events.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

29 Sep, 2012

1 commit

  • Conflicts:
    drivers/net/team/team.c
    drivers/net/usb/qmi_wwan.c
    net/batman-adv/bat_iv_ogm.c
    net/ipv4/fib_frontend.c
    net/ipv4/route.c
    net/l2tp/l2tp_netlink.c

    The team, fib_frontend, route, and l2tp_netlink conflicts were simply
    overlapping changes.

    qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

    With help from Antonio Quartulli.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Sep, 2012

3 commits


11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

08 Sep, 2012

2 commits


01 Sep, 2012

2 commits

  • In ipv4_mtu there is some logic where we are testing for a non-zero value
    and a timer expiration, then setting the value to zero, and then testing if
    the value is zero we set it to a value based on the dst. Instead of
    bothering with the extra steps it is easier to just clean up the logic so
    that we set it to the dst based value if it is zero or if the timer has
    expired.

    Signed-off-by: Alexander Duyck

    Alexander Duyck
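
The simplified decision can be written as a single condition. The sketch below is a user-space illustration under assumed names (effective_mtu, tick_after_eq and the parameters are hypothetical); it only mirrors the rule stated in the message: use the dst-based value when the learned value is zero or its timer has expired.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is at or after b" for a free-running tick counter. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* learned_pmtu: value learned from the path, 0 if none.
 * expires:      tick at which the learned value stops being valid.
 * dst_mtu:      fallback value derived from the dst. */
static unsigned int effective_mtu(unsigned int learned_pmtu,
                                  unsigned long expires,
                                  unsigned long now,
                                  unsigned int dst_mtu)
{
    unsigned int mtu = learned_pmtu;

    assert(dst_mtu != 0);
    if (mtu == 0 || tick_after_eq(now, expires))
        mtu = dst_mtu;
    return mtu;
}
```

This replaces the three-step sequence (test, zero the value, test for zero again) with one branch that gives the same result.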
     
  • Merge the 'net' tree to get the recent set of netfilter bug fixes in
    order to assist with some merge hassles Pablo is going to have to deal
    with for upcoming changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Aug, 2012

1 commit

  • The following lockdep splat was reported by Pavel Roskin:

    [ 1570.586223] ===============================
    [ 1570.586225] [ INFO: suspicious RCU usage. ]
    [ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
    [ 1570.586229] -------------------------------
    [ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
    [ 1570.586233]
    [ 1570.586233] other info that might help us debug this:
    [ 1570.586233]
    [ 1570.586236]
    [ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
    [ 1570.586238] 2 locks held by Chrome_IOThread/4467:
    [ 1570.586240] #0: (slock-AF_INET){+.-...}, at: [] release_sock+0x2c/0xa0
    [ 1570.586253] #1: (fnhe_lock){+.-...}, at: [] update_or_create_fnhe+0x2c/0x270
    [ 1570.586260]
    [ 1570.586260] stack backtrace:
    [ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
    [ 1570.586265] Call Trace:
    [ 1570.586271] [] lockdep_rcu_suspicious+0xfd/0x130
    [ 1570.586275] [] update_or_create_fnhe+0x15c/0x270
    [ 1570.586278] [] __ip_rt_update_pmtu+0x73/0xb0
    [ 1570.586282] [] ip_rt_update_pmtu+0x29/0x90
    [ 1570.586285] [] inet_csk_update_pmtu+0x2c/0x80
    [ 1570.586290] [] tcp_v4_mtu_reduced+0x2e/0xc0
    [ 1570.586293] [] tcp_release_cb+0xa4/0xb0
    [ 1570.586296] [] release_sock+0x55/0xa0
    [ 1570.586300] [] tcp_sendmsg+0x4af/0xf50
    [ 1570.586305] [] inet_sendmsg+0x120/0x230
    [ 1570.586308] [] ? inet_sk_rebuild_header+0x40/0x40
    [ 1570.586312] [] ? sock_update_classid+0xbd/0x3b0
    [ 1570.586315] [] ? sock_update_classid+0x130/0x3b0
    [ 1570.586320] [] do_sock_write+0xc5/0xe0
    [ 1570.586323] [] sock_aio_write+0x53/0x80
    [ 1570.586328] [] do_sync_write+0xa3/0xe0
    [ 1570.586332] [] vfs_write+0x165/0x180
    [ 1570.586335] [] sys_write+0x45/0x90
    [ 1570.586340] [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Eric Dumazet
    Reported-by: Pavel Roskin
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Aug, 2012

1 commit

  • Multicast traffic allocates dst with DST_NOCACHE, but dst is
    not inserted into rt_uncached_list.

    This slows down multicast workloads on SMP because rt_uncached_lock is
    contended.

    Change the test before taking the lock to actually check the dst
    was inserted into rt_uncached_list.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Aug, 2012

1 commit

  • Sylvain Munaut reported the following info:

    - TCP connection get "stuck" with data in send queue when doing
    "large" transfers ( like typing 'ps ax' on a ssh connection )
    - Only happens on path where the PMTU is lower than the MTU of
    the interface
    - Is not present right after boot, it only appears 10-20min after
    boot or so. (and that's inside the _same_ TCP connection, it works
    fine at first and then in the same ssh session, it'll get stuck)
    - Definitely seems related to fragments somehow since I see a router
    sending ICMP message saying fragmentation is needed.
    - Exact same setup works fine with kernel 3.5.1

    Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
    period is over.

    ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
    but dst_set_expires() does nothing because dst.expires is already set.

    It seems we want to set the expires field to a new value, regardless
    of prior one.

    With help from Julian Anastasov.

    Reported-by: Sylvain Munaut
    Signed-off-by: Eric Dumazet
    CC: Julian Anastasov
    Tested-by: Sylvain Munaut
    Signed-off-by: David S. Miller

    Eric Dumazet