Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

16 Sep, 2014

1 commit

f92ee6198 xfrm: Generate blackhole routes only from route lookup functions

Currently we generate a blackhole route whenever we have
matching policies but cannot resolve the states. Here we assume
that dst_output() is called to kill the blackholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unwanted.

We fix this by generating blackhole routes only from the
route lookup functions, where we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131b1d ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: Konstantinos Kolelis
Signed-off-by: Steffen Klassert

Steffen Klassert
2014-09-16 16:08:40 +0800

13 Sep, 2014

1 commit

381f4dca4 ipv6: clean up anycast when an interface is destroyed

If we try to rmmod the driver for an interface while sockets with
setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up
and we get stuck on:

unregister_netdevice: waiting for ens3 to become free. Usage count = 1

If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no
problem.

We need to perform a cleanup similar to the one for multicast in
addrconf_ifdown(how == 1).

Signed-off-by: Sabrina Dubroca
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Sabrina Dubroca
2014-09-13 05:33:06 +0800

08 Sep, 2014

1 commit

de185ab46 ipv6: restore the behavior of ipv6_sock_ac_drop()

It is possible that the interface is already gone after joining
the list of anycast on this interface, as we don't hold a refcount
for the device. In this case we are safe to ignore the error.

What's more important, for API compatibility we should not
change this behavior for applications even if it were correct.

Fixes: commit a9ed4a2986e13011 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")
Cc: Sabrina Dubroca
Cc: David S. Miller
Signed-off-by: Cong Wang
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

WANG Cong
2014-09-08 07:10:07 +0800

06 Sep, 2014

3 commits

e7478dfc4 ipv6: use addrconf_get_prefix_route() to remove peer addr

addrconf_get_prefix_route() ensures that we get the right route from the right table.

Signed-off-by: Nicolas Dichtel
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Nicolas Dichtel
2014-09-06 08:13:24 +0800
f24062b07 ipv6: fix a refcnt leak with peer addr

There is no reason to take a refcnt before deleting the peer address route.
It's done some lines below for the local prefix route because
inet6_ifa_finish_destroy() will release it at the end.
For the peer address route, we want to free it right now.

This bug has been introduced by commit
caeaba79009c ("ipv6: add support of peer address").

Signed-off-by: Nicolas Dichtel
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Nicolas Dichtel
2014-09-06 08:13:24 +0800
a9ed4a298 ipv6: fix rtnl locking in setsockopt for anycast and multicast

Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
triggers the assertion in addrconf_join_solict()/addrconf_leave_solict()

ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
calling ipv6_dev_mc_inc/dec.

This patch moves ASSERT_RTNL() up a level in the call stack.

Signed-off-by: Cong Wang
Signed-off-by: Sabrina Dubroca
Reported-by: Tommi Rantala
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Sabrina Dubroca
2014-09-06 02:52:28 +0800

03 Sep, 2014

2 commits

41ad82f7f netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG

make defconfig reports:

warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NETFILTER_ADVANCED)

Fixes: d79a61d netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_*
Reported-by: kbuild test robot
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Pablo Neira
2014-09-03 04:59:54 +0800
abccc5878 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
pull request: Netfilter/IPVS fixes for net

The following patchset contains seven Netfilter fixes for your net
tree, they are:

1) Make the NAT infrastructure independent of x_tables, some users are
already starting to test nf_tables with NAT without enabling x_tables.
Without this patch for Kconfig, there's a superfluous dependency
between NAT and x_tables.

2) Allow the use of 0 in the cgroup match; the kernel rejects it with
-EINVAL for no good reason. From Daniel Borkmann.

3) Select CONFIG_NF_NAT from the nf_tables NAT expression, this also
resolves another NAT dependency with x_tables.

4) Use HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL in the Netfilter hook
code as elsewhere in the kernel to resolve toolchain problems, from
Zhouyi Zhou.

5) Use iptunnel_handle_offloads() to set up tunnel encapsulation
depending on the offload capabilities, reported by Alex Gartrell,
patch from Julian Anastasov.

6) Fix wrong family when registering the ip_vs_local_reply6() hook,
also from Julian.

7) Select the NF_LOG_* symbols from NETFILTER_XT_TARGET_LOG. Rafał
Miłecki reported that when jumping from 3.16 to 3.17-rc, his log
target is not selected anymore due to changes in the previous
development cycle to accommodate the full logging support for
nf_tables.
====================

Signed-off-by: David S. Miller

David S. Miller
2014-09-03 04:56:30 +0800

23 Aug, 2014

1 commit

793c3b400 net: ipv6: fib: don't sleep inside atomic lock

The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:

> BUG: sleeping function called from invalid context at mm/slub.c:1250
> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
> 2 locks held by dhcpcd/2909:
> #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
> #1: (&tb->tb6_lock){++--+.}, at: [] ip6_route_add+0x65a/0x800
> CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
> 0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000
> ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98
> 0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001
> Call Trace:
> [] dump_stack+0x4e/0x68
> [] __might_sleep+0x10a/0x120
> [] kmem_cache_alloc_trace+0x4e/0x190
> [] ? fib6_commit_metrics+0x66/0x110
> [] fib6_commit_metrics+0x66/0x110
> [] fib6_add+0x883/0xa80
> [] ? ip6_route_add+0x65a/0x800
> [] ip6_route_add+0x675/0x800
> [] ? ip6_route_add+0x6a/0x800
> [] inet6_rtm_newroute+0x5c/0x80
> [] rtnetlink_rcv_msg+0x211/0x260
> [] ? rtnl_lock+0x17/0x20
> [] ? lock_release_holdtime+0x28/0x180
> [] ? rtnl_lock+0x17/0x20
> [] ? __rtnl_unlock+0x20/0x20
> [] netlink_rcv_skb+0x6e/0xd0
> [] rtnetlink_rcv+0x25/0x40
> [] netlink_unicast+0xd9/0x180
> [] netlink_sendmsg+0x700/0x770
> [] ? local_clock+0x25/0x30
> [] sock_sendmsg+0x6c/0x90
> [] ? might_fault+0xa3/0xb0
> [] ? verify_iovec+0x7d/0xf0
> [] ___sys_sendmsg+0x37e/0x3b0
> [] ? trace_hardirqs_on_caller+0x185/0x220
> [] ? mutex_unlock+0xe/0x10
> [] ? netlink_insert+0xbc/0xe0
> [] ? netlink_autobind.isra.30+0x125/0x150
> [] ? netlink_autobind.isra.30+0x60/0x150
> [] ? netlink_bind+0x159/0x230
> [] ? might_fault+0x5a/0xb0
> [] ? SYSC_bind+0x7e/0xd0
> [] __sys_sendmsg+0x4d/0x80
> [] SyS_sendmsg+0x12/0x20
> [] system_call_fastpath+0x16/0x1b

Fix this by replacing the mode GFP_KERNEL with GFP_ATOMIC.

Signed-off-by: Benjamin Block
Acked-by: David Rientjes
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Benjamin Block
2014-08-23 01:54:49 +0800

19 Aug, 2014

1 commit

8993cf8ed netfilter: move NAT Kconfig switches out of the iptables scope

Currently, the NAT configs depend on iptables and ip6tables. However,
users should be capable of enabling NAT for nft without having to
switch on iptables.

Fix this by adding new specific IP_NF_NAT and IP6_NF_NAT config
switches for iptables and ip6tables NAT support. I have also moved
the original NF_NAT_IPV4 and NF_NAT_IPV6 configs out of the scope
of iptables to make them independent of it.

This patch also adds NETFILTER_XT_NAT which selects the xt_nat
combo that provides snat/dnat for iptables. We cannot use NF_NAT
anymore since nf_tables can select this.

Reported-by: Matteo Croce
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2014-08-19 03:55:54 +0800

15 Aug, 2014

2 commits

4fab90719 tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()

Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().

Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.

Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Diagnosed-by: Willem de Bruijn
Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Neal Cardwell
2014-08-15 05:38:54 +0800
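As a rough illustration of the dispatch idea in 4fab90719, here is a minimal userspace sketch. It assumes a per-socket operations table; the struct layout and every name below are hypothetical stand-ins, not the kernel's definitions. The point is only that the handler is chosen by the ops attached to the socket rather than by the socket's family.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for per-address-family operations. */
struct sock;
struct af_ops {
    int (*mtu_reduced)(struct sock *sk);
};

struct sock {
    int family;                 /* socket family, e.g. 10 for AF_INET6 */
    const struct af_ops *ops;   /* ops matching the traffic actually carried */
};

enum { HANDLED_V4 = 4, HANDLED_V6 = 6 };

static int v4_mtu_reduced(struct sock *sk) { (void)sk; return HANDLED_V4; }
static int v6_mtu_reduced(struct sock *sk) { (void)sk; return HANDLED_V6; }

static const struct af_ops ipv4_ops = { .mtu_reduced = v4_mtu_reduced };
static const struct af_ops ipv6_ops = { .mtu_reduced = v6_mtu_reduced };

/* Dispatch through the ops table instead of branching on sk->family. */
static int release_cb_mtu_reduced(struct sock *sk)
{
    assert(sk != NULL && sk->ops != NULL);
    return sk->ops->mtu_reduced(sk);
}
```

In this model, an AF_INET6 socket that carries IPv4 traffic would point at `ipv4_ops` and so reach the IPv4 handler even though its family says IPv6.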
bc8fc7b8f sit: Fix ipip6_tunnel_lookup device matching criteria

As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"),
when looking up a tunnel, tunnel's underlying interface (t->parms.link)
is verified to match incoming traffic's ingress device.

However the comparison was incorrectly based on skb->dev->iflink.

Instead, dev->ifindex should be used, which correctly represents the
interface from which the IP stack hands the ipip6 packets.

This allows setting up sit tunnels bound to vlan interfaces (otherwise
incoming ipip6 traffic on the vlan interface was dropped due to
ipip6_tunnel_lookup match failure).

Signed-off-by: Shmulik Ladkani
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Shmulik Ladkani
2014-08-15 05:38:54 +0800

07 Aug, 2014

3 commits

33caee399 Merge branch 'akpm' (patchbomb from Andrew Morton)

Merge incoming from Andrew Morton:
- Various misc things.
- arch/sh updates.
- Part of ocfs2. Review is slow.
- Slab updates.
- Most of -mm.
- printk updates.
- lib/ updates.
- checkpatch updates.

* emailed patches from Andrew Morton: (226 commits)
checkpatch: update $declaration_macros, add uninitialized_var
checkpatch: warn on missing spaces in broken up quoted
checkpatch: fix false positives for --strict "space after cast" test
checkpatch: fix false positive MISSING_BREAK warnings with --file
checkpatch: add test for native c90 types in unusual order
checkpatch: add signed generic types
checkpatch: add short int to c variable types
checkpatch: add for_each tests to indentation and brace tests
checkpatch: fix brace style misuses of else and while
checkpatch: add --fix option for a couple OPEN_BRACE misuses
checkpatch: use the correct indentation for which()
checkpatch: add fix_insert_line and fix_delete_line helpers
checkpatch: add ability to insert and delete lines to patch/file
checkpatch: add an index variable for fixed lines
checkpatch: warn on break after goto or return with same tab indentation
checkpatch: emit a warning on file add/move/delete
checkpatch: add test for commit id formatting style in commit log
checkpatch: emit fewer kmalloc_array/kcalloc conversion warnings
checkpatch: improve "no space after cast" test
checkpatch: allow multiple const * types
...

Linus Torvalds
2014-08-07 12:14:42 +0800
1d023284c list: fix order of arguments for hlist_add_after(_rcu)

All other add functions for lists have the new item as first argument
and the position where it is added as second argument. This was changed
for no good reason in this function and makes using it unnecessarily
confusing.

The name was changed to hlist_add_behind() to cause unconverted code to
generate a compile error instead of using the wrong parameter order.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ken Helias
Cc: "Paul E. McKenney"
Acked-by: Jeff Kirsher [intel driver bits]
Cc: Hugh Dickins
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken Helias
2014-08-07 09:01:24 +0800
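To make the argument order from 1d023284c concrete, here is a small userspace sketch modelled on the kernel's hlist. It is a simplified re-implementation for illustration, not the code from include/linux/list.h; only the "new item first, position second" convention is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;  /* address of the pointer that points at us */
};

struct hlist_head {
    struct hlist_node *first;
};

/* Insert n at the front of the list. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* New item first, position second: insert n directly behind prev. */
static void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)
{
    assert(n != NULL && prev != NULL);
    n->next = prev->next;
    prev->next = n;
    n->pprev = &prev->next;
    if (n->next)
        n->next->pprev = &n->next;
}
```

Because both parameters have the same type, swapping them would compile silently, which is why renaming the function is a reasonable way to flush out unconverted callers.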
9ea88a153 tcp: md5: check md5 signature without socket lock

Since a8afca032 (tcp: md5: protects md5sig_info with RCU) tcp_md5_do_lookup
doesn't require socket lock, rcu_read_lock is enough. Therefore socket lock is
no longer required for tcp_v{4,6}_inbound_md5_hash too, so we can move these
calls (wrapped with rcu_read_{,un}lock) before bh_lock_sock:
from tcp_v{4,6}_do_rcv to tcp_v{4,6}_rcv.

Signed-off-by: Dmitry Popov
Signed-off-by: David S. Miller

Dmitry Popov
2014-08-07 07:00:20 +0800

06 Aug, 2014

2 commits

d247b6ab3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/Makefile
net/ipv6/sysctl_net_ipv6.c

Two ipv6_table_template[] additions overlap, so the index
of the ipv6_table[x] assignments needed to be adjusted.

In the drivers/net/Makefile case, we've gotten rid of the
garbage whereby we had to list every single USB networking
driver in the top-level Makefile, there is just one
"USB_NETWORKING" that guards everything.

Signed-off-by: David S. Miller

David S. Miller
2014-08-06 09:46:26 +0800
09c2d251b net-timestamp: add key to disambiguate concurrent datagrams

Datagrams timestamped on transmission can coexist in the kernel stack
and be reordered in packet scheduling. When reading looped datagrams
from the socket error queue it is not always possible to uniquely
correlate looped data with the original send() call (for application
level retransmits). Even if possible, it may be expensive and complex,
requiring packet inspection.

Introduce a data-independent ID mechanism to associate timestamps with
send calls. Pass an ID alongside the timestamp in field ee_data of
sock_extended_err.

The ID is a simple 32 bit unsigned int that is associated with the
socket and incremented on each send() call for which software tx
timestamp generation is enabled.

The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
avoid changing ee_data for existing applications that expect it to be 0.
The counter is reset each time the flag is reenabled. Reenabling
does not change the ID of already submitted data. It is possible
to receive out of order IDs if the timestamp stream is not quiesced
first.

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2014-08-06 07:35:54 +0800

03 Aug, 2014

5 commits

166bd890a ipv6: data of fwmark_reflect sysctl needs to be updated on netns construction

Fixes: e110861f86094cd ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-08-03 07:16:54 +0800
d4ad4d22e inet: frags: use kmem_cache for inet_frag_queue

Use kmem_cache to allocate/free inet_frag_queue objects since they're
all the same size per inet_frags user and are alloced/freed in high volumes
thus making it a perfect case for kmem_cache.

Signed-off-by: Nikolay Aleksandrov
Acked-by: Florian Westphal
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2014-08-03 06:31:31 +0800
2e404f632 inet: frags: use INET_FRAG_EVICTED to prevent icmp messages

Now that we have INET_FRAG_EVICTED we might as well use it to stop
sending icmp messages in the "frag_expire" functions instead of
stripping INET_FRAG_FIRST_IN from their flags when evicting.
Also fix the comment style in ip6_expire_frag_queue().

Signed-off-by: Nikolay Aleksandrov
Reviewed-by: Florian Westphal
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2014-08-03 06:31:31 +0800
06aa8b8a0 inet: frags: rename last_in to flags

The last_in field has been used to store various flags different from
first/last frag in so give it a more descriptive name: flags.

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2014-08-03 06:31:31 +0800
d2373862b inet: frags: use INC_STATS_BH in the ipv6 reassembly code

Softirqs are already disabled so no need to do it again, thus let's be
consistent and use the IP6_INC_STATS_BH variant.

Signed-off-by: Nikolay Aleksandrov
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2014-08-03 06:31:31 +0800

01 Aug, 2014

2 commits

4330487ac net: use inet6_iif instead of IP6CB()->iif

Signed-off-by: Duan Jiong
Signed-off-by: David S. Miller

Duan Jiong
2014-08-01 13:37:06 +0800
7304fe468 net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS

When dealing with ICMPv[46] error messages, the functions icmp_socket_deliver()
and icmpv6_notify() perform validity checks on the packet's length, but then some
protocols check the packet's length redundantly. So remove those duplicated
statements, and increment the counters ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in
icmp_socket_deliver() and icmpv6_notify() respectively.

In addition, add missed counter in udp6/udplite6 when socket is NULL.

Signed-off-by: Duan Jiong
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Duan Jiong
2014-08-01 13:04:18 +0800

31 Jul, 2014

2 commits

ccda4a77f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-07-30

This is the last pull request for ipsec-next before I'll be
off for two weeks starting on friday. David, can you please
take urgent ipsec patches directly into net/net-next during
this time?

1) Error handling simplifications for vti and vti6.
From Mathias Krause.

2) Remove a duplicate semicolon after a return statement.
From Christoph Paasch.
====================

Signed-off-by: David S. Miller

David S. Miller
2014-07-31 11:05:54 +0800
f139c74a8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Signed-off-by: David S. Miller

David S. Miller
2014-07-31 04:25:49 +0800

30 Jul, 2014

1 commit

a317a2f19 ipv6: fail early when creating netdev named all or default

We create a proc dir for each network device, which causes
conflicts when a device is named "all" or "default".

Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.

Reported-by: Stephane Chazelas
Cc: "David S. Miller"
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2014-07-30 02:43:50 +0800
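The early check described in a317a2f19 amounts to a name comparison. The helper below is a hypothetical userspace sketch of that test; the function name is invented for illustration, and where exactly the kernel performs the check is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Names that would collide with the fixed "all" and "default" entries. */
static bool ipv6_devname_reserved(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "all") == 0 || strcmp(name, "default") == 0;
}
```

A caller would reject device creation with an error when this returns true, instead of letting the later proc registration produce a kernel warning.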

29 Jul, 2014

1 commit

04ca6973f ip: make IP identifiers less predictable

In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways of exploiting Linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if the generator is used at most
once per jiffy, this patch inserts no hole in the ID sequence and does not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, add saddr into the hash as well, but not nexthdr.

When pinging the patched target, we can see that IDs are now hard to predict.

21:57:11.008086 IP (...)
A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
target > A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
target > A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
target > A: ICMP echo reply, seq 3, length 64

[1] TCP sessions use a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet
Reported-by: Jeffrey Knockel
Reported-by: Jedidiah R. Crandall
Cc: Willy Tarreau
Cc: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2014-07-29 09:46:34 +0800

28 Jul, 2014

9 commits

1bab4c750 inet: frag: set limits and make init_net's high_thresh limit global

This patch makes init_net's high_thresh limit to be the maximum for all
namespaces, thus introducing a global memory limit threshold equal to the
sum of the individual high_thresh limits which are capped.
It also introduces some sane minimums for low_thresh as it shouldn't be
able to drop below 0 (or > high_thresh in the unsigned case), and
overall low_thresh should not ever be above high_thresh, so we make the
following relations for a namespace:
init_net:
high_thresh = max(not capped), min(init_net low_thresh)
low_thresh = max(init_net high_thresh), min(0)

all other namespaces:
high_thresh = max(init_net high_thresh), min(namespace's low_thresh)
low_thresh = max(namespace's high_thresh), min(0)

The major issue with having low_thresh > high_thresh is that we'll
schedule eviction but never evict anything and thus rely only on the
timers.

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2014-07-28 13:34:36 +0800
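The relations listed in 1bab4c750 can be read as range checks on new sysctl values. The following userspace sketch encodes them; the struct and function names are hypothetical, and whether an out-of-range value is rejected or clamped is left to the caller, since the commit message does not say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace limits, in bytes. */
struct frag_limits {
    long low_thresh;
    long high_thresh;
};

/* low_thresh must stay within [0, this namespace's high_thresh]. */
static bool low_thresh_valid(const struct frag_limits *ns, long new_low)
{
    assert(ns != NULL);
    return new_low >= 0 && new_low <= ns->high_thresh;
}

/*
 * high_thresh must not drop below this namespace's low_thresh, and for
 * namespaces other than init_net it must not exceed init_net's high_thresh.
 * Pass init_net == NULL when validating init_net itself (no upper cap).
 */
static bool high_thresh_valid(const struct frag_limits *ns,
                              const struct frag_limits *init_net,
                              long new_high)
{
    assert(ns != NULL);
    if (new_high < ns->low_thresh)
        return false;
    return init_net == NULL || new_high <= init_net->high_thresh;
}
```

Keeping low_thresh at or below high_thresh avoids the case the commit warns about, where eviction is scheduled but never evicts anything.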
ab1c724f6 inet: frag: use seqlock for hash rebuild

Rehash is a rare operation, so don't force readers to take
the read-side rwlock.

Instead, we only have to detect the (rare) case where
the secret was altered while we are trying to insert
a new inetfrag queue into the table.

If it was changed, drop the bucket lock and recompute
the hash to get the 'new' chain bucket that we have to
insert into.

Joint work with Nikolay Aleksandrov.

Signed-off-by: Florian Westphal
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:36 +0800
e3a57d18b inet: frag: remove periodic secret rebuild timer

merge functionality into the eviction workqueue.

Instead of rebuilding every n seconds, take advantage of the upper
hash chain length limit.

If we hit it, mark table for rebuild and schedule workqueue.
To prevent frequent rebuilds when we're completely overloaded,
don't rebuild more than once every 5 seconds.

The ipfrag_secret_interval sysctl is now obsolete and has been marked as
deprecated. It can still be changed, so scripts won't break, but it
won't have any effect. A comment is left above each unused secret_timer
variable to avoid confusion.

Joint work with Nikolay Aleksandrov.

Signed-off-by: Florian Westphal
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:36 +0800
3fd588eb9 inet: frag: remove lru list

no longer used.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:36 +0800
434d30540 inet: frag: don't account number of fragment queues

The 'nqueues' counter is protected by the lru list lock;
once that's removed, this needs to be converted to an atomic
counter. Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.

We still report the memory currently used by fragment
reassembly queues.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:36 +0800
b13d3cbfb inet: frag: move eviction of queues to work queue

When the high_thresh limit is reached we try to toss the 'oldest'
incomplete fragment queues until memory limits are below the low_thresh
value. This happens in softirq/packet processing context.

This has two drawbacks:

1) processors might evict a queue that was about to be completed
by another cpu, because they will compete wrt. resource usage and
resource reclaim.

2) LRU list maintenance is expensive.

But when constantly overloaded, even the 'least recently used' element is
recent, so removing 'lru' queue first is not 'fairer' than removing any
other fragment queue.

This moves eviction out of the fast path:

When the low threshold is reached, a work queue is scheduled
which then iterates over the table and removes the queues that exceed
the memory limits of the namespace. It sets a new flag called
INET_FRAG_EVICTED on the evicted queues so the proper counters will get
incremented when the queue is forcefully expired.

When the high threshold is reached, no more fragment queues are
created until we're below the limit again.

The LRU list is now unused and will be removed in a followup patch.

Joint work with Nikolay Aleksandrov.

Suggested-by: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:35 +0800
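The two-threshold policy from b13d3cbfb can be summarised as two predicates. This is a userspace sketch with hypothetical names; in particular, whether the comparisons are strict is an assumption, since the commit message only says a threshold is "reached".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct frag_state {
    size_t mem;          /* memory currently used by fragment queues */
    size_t low_thresh;
    size_t high_thresh;
};

/* Should the background eviction worker be scheduled? (strict > is assumed) */
static bool frag_should_schedule_evictor(const struct frag_state *s)
{
    assert(s != NULL);
    return s->mem > s->low_thresh;
}

/* May the fast path create another fragment queue? */
static bool frag_may_create_queue(const struct frag_state *s)
{
    assert(s != NULL);
    return s->mem <= s->high_thresh;
}
```

Between the two thresholds the fast path keeps accepting fragments while the worker trims memory in the background; above the high threshold new queues are refused until usage drops.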
86e93e470 inet: frag: move evictor calls into frag_find function

First step to move eviction handling into a work queue.

We lose two spots that accounted evicted fragments in MIB counters.

Accounting will be restored since the upcoming work-queue evictor
invokes the frag queue timer callbacks instead.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:35 +0800
fb3cfe6e7 inet: frag: remove hash size assumptions from callers

hide actual hash size from individual users: The _find
function will now fold the given hash value into the required range.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:35 +0800
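Folding a hash value into the table range, as fb3cfe6e7 describes, is commonly done with a mask when the table size is a power of two. The sketch below shows that technique in userspace; `FRAG_HASH_SIZE` and `frag_hash_bucket()` are hypothetical names, and the table size of 1024 is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size; the mask trick requires a power of two. */
#define FRAG_HASH_SIZE 1024u

/* Fold an arbitrary 32-bit hash into a valid bucket index. */
static uint32_t frag_hash_bucket(uint32_t hash)
{
    assert((FRAG_HASH_SIZE & (FRAG_HASH_SIZE - 1)) == 0);
    return hash & (FRAG_HASH_SIZE - 1);
}
```

Doing the fold in one place means callers can return any 32-bit hash and never need to know the table size.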
36c777821 inet: frag: constify match, hashfn and constructor arguments

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2014-07-28 13:34:35 +0800

25 Jul, 2014

2 commits

ac3d2e5a9 ipv6: remove obsolete comment in ip6_append_data()

After 11878b40e [net-timestamp: SOCK_RAW and PING timestamping], this comment
becomes obsolete, since the code checks not only UDP sockets but also RAW sockets,
and the code is clear enough not to need the comment.

Signed-off-by: Li RongQing
Signed-off-by: David S. Miller

Li RongQing
2014-07-25 14:47:04 +0800
56ec0fb10 neigh: remove exceptional & on function name

In this file, function names are otherwise used as pointers without &.

A simplified version of the Coccinelle semantic patch that makes this
change is as follows:

//
@r@
identifier f;
@@

f(...) { ... }

@@
identifier r.f;
@@

- &f
+ f
//

Signed-off-by: Himangi Saraogi
Acked-by: Julia Lawall
Signed-off-by: David S. Miller

Himangi Saraogi
2014-07-25 14:23:31 +0800

24 Jul, 2014

1 commit

274f482d3 sock: remove skb argument from sk_rcvqueues_full

It hasn't been used since commit 0fd7bac (net: relax rcvbuf limits).

Signed-off-by: Sorin Dumitru
Signed-off-by: David S. Miller

Sorin Dumitru
2014-07-24 04:23:06 +0800