07 Mar, 2014

1 commit

  • Quoting Alexander Aring:
    While fragmentation and unloading of 6lowpan module I got this kernel Oops
    after few seconds:

    BUG: unable to handle kernel paging request at f88bbc30
    [..]
    Modules linked in: ipv6 [last unloaded: 6lowpan]
    Call Trace:
    [] ? call_timer_fn+0x54/0xb3
    [] ? process_timeout+0xa/0xa
    [] run_timer_softirq+0x140/0x15f

    The problem is that incomplete frags are still around after the unload;
    when their frag expire timer fires, we get a crash.

    When a netns is removed (also done when unloading the module), inet_frag
    calls the evictor with the 'force' argument to purge remaining frags.

    The evictor loop terminates when the accounted memory ('work') drops
    to 0 or the lru-list becomes empty. However, the mem accounting is done
    via percpu counters and may not be accurate, i.e. the loop may terminate
    prematurely.

    Alter the evictor so that, when force is requested, it stops only once
    the lru list is empty.
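
    In sketch form, the new loop shape (kernel-style C; names follow the
    inet_frag code of that era, but this is illustrative rather than the
    literal patch):

        /* keep evicting until the memory target is met, or, when forced,
         * until the LRU list is completely drained */
        while (work > 0 || force) {
                spin_lock(&nf->lru_lock);
                if (list_empty(&nf->lru_list)) {
                        spin_unlock(&nf->lru_lock);
                        break;  /* the only trusted exit under 'force' */
                }
                q = list_first_entry(&nf->lru_list,
                                     struct inet_frag_queue, lru_list);
                atomic_inc(&q->refcnt);
                spin_unlock(&nf->lru_lock);
                /* ... kill q, drop refs, subtract freed memory from work ... */
        }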

    Reported-by: Phoebe Buckheister
    Reported-by: Alexander Aring
    Tested-by: Alexander Aring
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Florian Westphal
     

06 Mar, 2014

1 commit

  • I stumbled upon this very serious bug while hunting for another one.
    It's a very subtle race condition between inet_frag_evictor,
    inet_frag_intern and the IPv4/6 frag_queue and expire functions
    (basically the users of inet_frag_kill/inet_frag_put).

    What happens is that after a fragment has been added to the hash chain,
    but before it's been added to the lru_list (inet_frag_lru_add) in
    inet_frag_intern, it may get deleted: either by an expired timer (if
    the system load is high or the timeout sufficiently low) or by the
    frag_queue function for various reasons. Then, after it does get added
    to the lru_list, it's only a matter of time before the evictor reaches
    a piece of memory which has been freed, leading to a number of
    different bugs depending on what's left there.

    I've been able to trigger this on both IPv4 and IPv6 (which is expected,
    as the frag code is shared), but it's been much more difficult to
    trigger on IPv4 due to differences in how the two protocols treat
    fragments.

    The setup I used to reproduce this is: 2 machines with 4 x 10G bonded
    in a RR bond, so the same flow can be seen on multiple cards at the
    same time. Then I used multiple instances of ping/ping6 to generate
    fragmented packets and flood the machines with them while running
    other processes to load the attacked machine.

    It is very important to have the _same flow_ coming in on multiple CPUs
    concurrently. Usually the attacked machine would die in less than 30
    minutes; if configured properly, with many evictor calls and short
    timeouts, it could happen in 10 minutes or so.

    An important point to make is that any caller (frag_queue or timer) of
    inet_frag_kill will remove both the timer refcount and the
    original/guarding refcount, thus removing everything that's keeping the
    frag from being freed at the next inet_frag_put. All of this can
    happen before the frag was ever added to the LRU list; then it gets
    added and the evictor uses a freed fragment.

    An example for IPv6: a fragment is being added and is at the stage of
    being inserted in the hash, after the hash lock has been released but
    before inet_frag_lru_add executes (or is able to obtain the lru lock).
    An overlapping fragment for the same flow arrives at a different CPU,
    which finds it in the hash; since it's overlapping, it drops it by
    invoking inet_frag_kill, thus removing all guarding refcounts, and
    afterwards frees it by invoking inet_frag_put, which removes the last
    refcount added previously by inet_frag_find. Then inet_frag_lru_add
    gets executed by inet_frag_intern and we have a freed fragment in the
    lru_list.

    The fix is simple: move the lru_add under the hash-chain-locked region,
    so a removing function has to wait for the fragment to be added to the
    lru_list before it can remove it. This works because the hash chain
    removal is done before the lru_list one, and there's no window between
    the two list adds in which the frag can get dropped. With this fix
    applied I couldn't kill the same machine in 24 hours with the same
    setup.
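
    In rough outline, the reordering in inet_frag_intern (a sketch assuming
    the per-bucket chain_lock of that era, not the literal patch):

        spin_lock(&hb->chain_lock);
        /* ... handle a racing insert of the same queue ... */
        atomic_inc(&qp->refcnt);
        hlist_add_head(&qp->list, &hb->chain);
        inet_frag_lru_add(nf, qp);      /* moved: was done after the unlock */
        spin_unlock(&hb->chain_lock);   /* queue appears in hash and LRU together */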

    Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of
    rwlock")

    CC: Florian Westphal
    CC: Jesper Dangaard Brouer
    CC: David S. Miller

    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

24 Oct, 2013

1 commit

  • All fragmentation hash secrets now get initialized by their
    corresponding hash function with net_get_random_once. Thus we can
    eliminate the initial seeding.

    Also provide a comment that hash secret seeding happens at the first
    call to the corresponding hashing function.
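
    The seed-on-first-use pattern being referred to looks roughly like this
    (variable and function names are illustrative):

        #include <linux/jhash.h>
        #include <linux/net.h>          /* net_get_random_once() */

        static u32 frag_hash_secret __read_mostly;

        /* the hash function seeds its own secret the first time it runs,
         * so no explicit seeding is needed at init time */
        static unsigned int frag_hashfn_sketch(u32 saddr, u32 daddr)
        {
                net_get_random_once(&frag_hash_secret,
                                    sizeof(frag_hash_secret));
                return jhash_2words(saddr, daddr, frag_hash_secret);
        }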

    Cc: David S. Miller
    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

10 Jul, 2013

1 commit

  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie
    lifetime handling, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback; just use an
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high-rate flow from overrunning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4-mapped ipv6 sockets in
    udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     

04 Jul, 2013

1 commit

  • The global variable num_physpages is scheduled to be removed, so use
    totalram_pages instead of num_physpages at runtime.
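
    The substitution is mechanical; a hypothetical call site changes like
    this:

        /* hypothetical RAM-based sizing check; only the symbol changes */
        if (totalram_pages >= (128 * 1024))     /* was: num_physpages */
                thresh = 1 << 20;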

    Signed-off-by: Jiang Liu
    Cc: Miklos Szeredi
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

20 Jun, 2013

1 commit

  • This patch removes an empty ifdef from inet_frag_intern()
    in net/ipv4/inet_fragment.c.

    Commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a
    ("hlist: drop the node parameter from iterators") removed the hlist
    node usage from net/ipv4/inet_fragment.c, but did not remove the
    enclosing #ifdef, which is now empty.

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller

    Rami Rosen
     

06 May, 2013

1 commit

  • This patch fixes a race between inet_frag_lru_move() and
    inet_frag_lru_add() which was introduced in commit
    3ef0eb0db4bf92c6d2510fe5c4dc51852746f206
    ("net: frag, move LRU list maintenance outside of rwlock").

    One cpu has already added a new fragment queue into the hash but not
    into the LRU. Another cpu finds it in the hash and tries to move it to
    the end of the LRU. This leads to a NULL pointer dereference inside
    list_move_tail().

    Another possible race condition is between inet_frag_lru_move() and
    inet_frag_lru_del(): the move can happen after the deletion.

    This patch initializes the LRU list head before adding the fragment
    into the hash, and inet_frag_lru_move() doesn't touch it if it's empty.
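
    A minimal sketch of the two changes (close to, but not necessarily
    exactly, the patch):

        /* 1) make the LRU head valid before the queue is visible in hash */
        INIT_LIST_HEAD(&q->lru_list);

        /* 2) only move a queue that is actually on the LRU */
        static inline void inet_frag_lru_move_sketch(struct inet_frag_queue *q)
        {
                spin_lock(&q->net->lru_lock);
                if (!list_empty(&q->lru_list))
                        list_move_tail(&q->lru_list, &q->net->lru_list);
                spin_unlock(&q->net->lru_lock);
        }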

    I saw this kernel oops two times in a couple of days.

    [119482.128853] BUG: unable to handle kernel NULL pointer dereference at (null)
    [119482.132693] IP: [] __list_del_entry+0x29/0xd0
    [119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
    [119482.140221] Oops: 0000 [#1] SMP
    [119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
    [119482.152692] CPU 3
    [119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
    [119482.161478] RIP: 0010:[] [] __list_del_entry+0x29/0xd0
    [119482.166004] RSP: 0018:ffff880216d5db58 EFLAGS: 00010207
    [119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
    [119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
    [119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
    [119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
    [119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
    [119482.194140] FS: 00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
    [119482.198928] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
    [119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
    [119482.223113] Stack:
    [119482.228004] ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
    [119482.233038] ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
    [119482.238083] 00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
    [119482.243090] Call Trace:
    [119482.248009] [] ip_defrag+0x8fa/0xd10
    [119482.252921] [] ipv4_conntrack_defrag+0x83/0xe0
    [119482.257803] [] nf_iterate+0x8b/0xa0
    [119482.262658] [] ? inet_del_offload+0x40/0x40
    [119482.267527] [] nf_hook_slow+0x74/0x130
    [119482.272412] [] ? inet_del_offload+0x40/0x40
    [119482.277302] [] ip_rcv+0x268/0x320
    [119482.282147] [] __netif_receive_skb_core+0x612/0x7e0
    [119482.286998] [] __netif_receive_skb+0x18/0x60
    [119482.291826] [] process_backlog+0xa0/0x160
    [119482.296648] [] net_rx_action+0x139/0x220
    [119482.301403] [] __do_softirq+0xe7/0x220
    [119482.306103] [] run_ksoftirqd+0x28/0x40
    [119482.310809] [] smpboot_thread_fn+0xff/0x1a0
    [119482.315515] [] ? lg_local_lock_cpu+0x40/0x40
    [119482.320219] [] kthread+0xc0/0xd0
    [119482.324858] [] ? insert_kthread_work+0x40/0x40
    [119482.329460] [] ret_from_fork+0x7c/0xb0
    [119482.334057] [] ? insert_kthread_work+0x40/0x40
    [119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
    [119482.343787] RIP [] __list_del_entry+0x29/0xd0
    [119482.348675] RSP
    [119482.353493] CR2: 0000000000000000

    Oops happened on this path:
    ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()

    Signed-off-by: Konstantin Khlebnikov
    Cc: Jesper Dangaard Brouer
    Cc: Florian Westphal
    Cc: Eric Dumazet
    Cc: David S. Miller
    Acked-by: Florian Westphal
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

05 Apr, 2013

1 commit

  • This patch implements per hash bucket locking for the frag queue
    hash. This removes two write locks, and the only remaining write
    lock is for protecting hash rebuild. This essentially reduces the
    readers-writer lock to a rebuild lock.
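
    The resulting structure, in sketch form (a spinlock beside each chain;
    the old rwlock is kept only for rebuilds):

        struct inet_frag_bucket {
                struct hlist_head       chain;
                spinlock_t              chain_lock;     /* per-bucket */
        };

        struct inet_frags_sketch {
                struct inet_frag_bucket hash[INETFRAGS_HASHSZ];
                rwlock_t                lock;   /* effectively a rebuild lock */
                /* ... */
        };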

    This patch is part of "net: frag performance followup"
    http://thread.gmane.org/gmane.linux.network/263644
    of which two patches have already been accepted.

    Same test setup as previous:
    (http://thread.gmane.org/gmane.linux.network/257155)
    Two 10G interfaces, on separate NUMA nodes, are under test and use
    Ethernet flow-control. A third interface is used for generating the
    DoS attack (with trafgen).

    Notice, I have changed the frag DoS generator script to be more
    efficient/deadly. Before, it would only hit one RX queue; now it sends
    packets causing multi-queue RX, due to "better" RX hashing.

    Test types summary (netperf UDP_STREAM):
    Test-20G64K == 2x10G with 65K fragments
    Test-20G3F == 2x10G with 3x fragments (3*1472 bytes)
    Test-20G64K+DoS == Same as 20G64K with frag DoS
    Test-20G3F+DoS == Same as 20G3F with frag DoS
    Test-20G64K+MQ == Same as 20G64K with Multi-Queue frag DoS
    Test-20G3F+MQ == Same as 20G3F with Multi-Queue frag DoS

    When I rebased this-patch(03) (on top of net-next commit a210576c) and
    removed the _bh spinlock, I saw a performance regression. BUT this
    was caused by some unrelated change in-between. See tests below.

    Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
    Test (B) is a verifying retest of commit 1b5ab0de, corresponding to patch-02.
    Test (C) is what I reported before for this patch.

    Test (D) is net-next master HEAD (commit a210576c), which reveals some
    (unknown) performance regression (compared against test (B)).
    Test (D) functions as the new base test.

    Performance table summary (in Mbit/s):

    (#) Test-type: 20G64K 20G3F 20G64K+DoS 20G3F+DoS 20G64K+MQ 20G3F+MQ
    ---------- ------- ------- ---------- --------- -------- -------
    (A) Patch-02 : 18848.7 13230.1 4103.04 5310.36 130.0 440.2
    (B) 1b5ab0de : 18841.5 13156.8 4101.08 5314.57 129.0 424.2
    (C) Patch-03v1: 18838.0 13490.5 4405.11 6814.72 196.6 461.6

    (D) a210576c : 18321.5 11250.4 3635.34 5160.13 119.1 405.2
    (E) with _bh : 17247.3 11492.6 3994.74 6405.29 166.7 413.6
    (F) without bh: 17471.3 11298.7 3818.05 6102.11 165.7 406.3

    Tests (E) and (F) are this patch (03), with (V1) and without (V2) the _bh spinlocks.

    I cannot explain the slowdown for 20G64K (but it's an artificial
    "lab test" so I'm not worried). The other results do show
    improvements, and the test (E) "with _bh" version is slightly better.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet

    ----
    V2:
    - By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
    need the spinlock _bh versions, as Netfilter currently does a
    local_bh_disable() before entering inet_fragment.
    - Fold-in desc from cover-mail
    V3:
    - Drop the chain_len counter per hash bucket.
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

28 Mar, 2013

2 commits

  • Move the protection of netns_frags.nqueues updates under the LRU lock
    instead of the write lock. They are located on the same cacheline, and
    this is also needed when transitioning to per hash bucket locking.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • The LRU list is protected by its own lock since commit 3ef0eb0db4
    (net: frag, move LRU list maintenance outside of rwlock), and
    no longer by a read_lock.

    This makes it possible to remove the inet_frag_queue that is about
    to be "evicted" from the LRU list head. This avoids the problem of
    several CPUs grabbing the same frag queue.

    Note, we cannot remove the inet_frag_lru_del() call in fq_unlink()
    called by inet_frag_kill(), because inet_frag_kill() is also used in
    other situations. Thus, we use list_del_init() to allow this
    double list_del to work.
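
    The eviction side then looks roughly like this (sketch):

        spin_lock(&nf->lru_lock);
        if (!list_empty(&nf->lru_list)) {
                q = list_first_entry(&nf->lru_list,
                                     struct inet_frag_queue, lru_list);
                /* detach under lru_lock so no other CPU grabs the same
                 * queue; list_del_init() keeps the later
                 * inet_frag_lru_del() in fq_unlink() a harmless no-op */
                list_del_init(&q->lru_list);
                atomic_inc(&q->refcnt);
        }
        spin_unlock(&nf->lru_lock);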

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

25 Mar, 2013

1 commit


19 Mar, 2013

1 commit

  • This patch introduces a constant limit on the lengths of the fragment
    queue hash table bucket lists. Currently the limit of 128 is chosen
    somewhat arbitrarily; it just ensures that we can fill up the fragment
    cache with empty packets up to the default ip_frag_high_thresh limits.
    It should protect against list iteration eating considerable amounts
    of cpu.

    If we reach the maximum length in one hash bucket, a warning is printed.
    This is implemented on the caller side of inet_frag_find to distinguish
    between the different users of inet_fragment.c.

    I dropped the out of memory warning in the ipv4 fragment lookup path,
    because we already get a warning by the slab allocator.
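
    In sketch form, the bounded lookup (close to the idea, not the exact
    code; locking elided):

        #define INETFRAGS_MAXDEPTH 128          /* the new constant limit */

        struct inet_frag_queue *q;
        int depth = 0;

        hlist_for_each_entry(q, &hb->chain, list) {
                if (q->net == nf && f->match(q, key)) {
                        atomic_inc(&q->refcnt);
                        return q;
                }
                depth++;
        }
        if (depth <= INETFRAGS_MAXDEPTH)
                return inet_frag_create(nf, f, key);
        return ERR_PTR(-ENOBUFS);               /* caller prints the warning */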

    Cc: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for-each-entry iterators were conceived
    differently from the list ones, which look like:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.
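
    At a typical call site the conversion looks like this (the inet_fragment
    hash chain is used here purely as an illustrative example):

        struct inet_frag_queue *q;
        struct hlist_node *pos;

        /* before: an extra scratch node parameter */
        hlist_for_each_entry(q, pos, &hb->chain, list)
                if (match(q, key))
                        break;

        /* after: same loop, no node parameter */
        hlist_for_each_entry(q, &hb->chain, list)
                if (match(q, key))
                        break;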

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

30 Jan, 2013

3 commits

  • Updating the fragmentation queues' LRU (Least-Recently-Used) list
    required taking the hash writer lock. However, the LRU list isn't
    tied to the hash at all, so we can use a separate lock for it.
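
    In sketch form (the new lock sits next to the list it guards; names
    follow the code of that era):

        struct netns_frags_sketch {
                struct list_head        lru_list;
                spinlock_t              lru_lock;  /* guards lru_list only */
                /* ... */
        };

        static inline void inet_frag_lru_del_sketch(struct inet_frag_queue *q)
        {
                spin_lock(&q->net->lru_lock);
                list_del(&q->lru_list);
                spin_unlock(&q->net->lru_lock);
        }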

    Original-idea-by: Florian Westphal
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Replace the per network namespace shared atomic "mem" accounting
    variable, in the fragmentation code, with a lib/percpu_counter.

    Getting percpu_counter to scale to the fragmentation code usage
    requires some tweaks.

    At first view, percpu_counter looks superfast, but it does not
    scale on multi-CPU/NUMA machines, because the default batch size
    is too small for the frag code's usage. Thus, I have adjusted the
    batch size by using __percpu_counter_add() directly, instead of
    percpu_counter_sub() and percpu_counter_add().

    The batch size is increased to 130,000, based on the largest 64K
    fragment memory usage. This does introduce some imprecision in the
    memory accounting, but it does not need to be strict for this
    use-case.

    It is also essential, for this to scale, that the percpu_counter
    does not share a cacheline with other writers.
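
    The accounting helpers then look roughly like this
    (__percpu_counter_add() was the batch-taking primitive of that era; it
    has since been renamed percpu_counter_add_batch()):

        #include <linux/percpu_counter.h>

        /* batch sized so a full 64K fragment fits in the per-cpu slack */
        static int frag_percpu_counter_batch = 130000;

        static inline void add_frag_mem_sketch(struct percpu_counter *mem, int i)
        {
                __percpu_counter_add(mem, i, frag_percpu_counter_batch);
        }

        static inline void sub_frag_mem_sketch(struct percpu_counter *mem, int i)
        {
                __percpu_counter_add(mem, -i, frag_percpu_counter_batch);
        }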

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • This change is primarily a preparation to ease the extension of memory
    limit tracking.

    The change does reduce the number of atomic operations during freeing
    of a frag queue. This does introduce some performance improvement, as
    these atomic operations are at the core of the performance problems
    seen on NUMA systems.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

20 Sep, 2012

1 commit


09 Jun, 2012

1 commit


13 Jul, 2010

1 commit


30 Mar, 2010

1 commit

  • Update gfp.h and slab.h includes to prepare for breaking implicit
    slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h, and if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered:
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from the tests on
    step 6, I'm fairly confident about the coverage of this conversion
    patch. If there is a breakage, it's likely to be something in one of
    the arch headers, which should be easily discoverable on most builds
    of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Feb, 2009

1 commit


26 Jul, 2008

1 commit

  • Removes a legacy reinvent-the-wheel type thing. The generic
    machinery integrates much better with automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuitively, BUG_TRAP() is actually equal to
    WARN_ON() rather than BUG_ON(), though some instances might
    eventually be promoted to BUG_ON(); I left that for the future.

    I could make at least one BUILD_BUG_ON conversion.
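
    The conversion itself is mechanical, modulo the inverted condition,
    since BUG_TRAP(x) fired when x was false (illustrative example):

        /* before */
        BUG_TRAP(qp->last_in & COMPLETE);

        /* after: WARN_ON() triggers on a true condition, hence the '!' */
        WARN_ON(!(qp->last_in & COMPLETE));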

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

28 Jun, 2008

1 commit

  • The problem is that while we work without the inet_frags.lock, even
    read-locked, the secret rebuild timer may fire (on another CPU, since
    BHs are only disabled locally in inet_frag_find) and change the rnd
    seed for ipv4/6 fragments.

    It was caused by my patch fd9e63544cac30a34c951f0ec958038f0529e244
    ([INET]: Omit double hash calculations in xxx_frag_intern) late
    in the 2.6.24 kernel, so this should probably be queued to -stable.
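
    The hazard, in sketch form: the bucket index depends on the seed, so it
    must be derived under the same lock the rebuild timer takes
    (illustrative, not the literal fix):

        write_lock(&f->lock);
        /* recompute here: f->rnd may have been changed by the secret
         * rebuild timer since any earlier hash calculation */
        hash = f->hashfn(qp);
        hlist_add_head(&qp->list, &f->hash[hash]);
        write_unlock(&f->lock);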

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

03 Apr, 2008

1 commit


29 Mar, 2008

2 commits


29 Jan, 2008

9 commits


18 Oct, 2007

5 commits

  • Since we now allocate the queues in inet_fragment.c, we
    can safely free them in the same place. The ->destructor
    callback thus becomes optional for inet_frags.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Since this callback is used to check for conflicts in the
    hashtable when inserting a newly created frag queue, we can
    do the same by matching the queue against the argument that
    was used to create it.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Here we need another callback, ->match, to check whether the
    entry found in the hash matches the key passed. The key used
    is the same as the creation argument for inet_frag_create.

    Yet again, this ->match is the same for netfilter and ipv6.
    Running a few steps forward: this callback will later
    replace the ->equal one.

    Since inet_frag_find() uses the already consolidated
    inet_frag_create(), remove the xxx_frag_create from the
    protocol code.
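
    The shape of such a ->match callback, sketched with ipv4-style fields
    (type and field names are illustrative):

        struct ip4_create_arg_sketch {
                __be32 saddr, daddr;
                __be16 id;
                u8 protocol;
        };

        struct ipq_sketch {
                struct inet_frag_queue q;
                __be32 saddr, daddr;
                __be16 id;
                u8 protocol;
        };

        static int ip4_frag_match_sketch(struct inet_frag_queue *q, void *a)
        {
                struct ipq_sketch *qp = container_of(q, struct ipq_sketch, q);
                struct ip4_create_arg_sketch *arg = a;

                return qp->id == arg->id &&
                       qp->saddr == arg->saddr &&
                       qp->daddr == arg->daddr &&
                       qp->protocol == arg->protocol;
        }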

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This one uses the xxx_frag_intern() and xxx_frag_alloc()
    routines, which are already consolidated, so remove them
    from protocol code (as promised).

    The ->constructor callback is used to init the rest of
    the frag queue and it is the same for netfilter and ipv6.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Just perform the kzalloc() allocation and set up the common
    fields in the inet_frag_queue(). Then return the result
    to the caller to initialize the rest.

    The inet_frag_alloc() may return NULL, so check the
    return value before doing the container_of(). This looks
    ugly, but the xxx_frag_alloc() will be removed soon.

    The xxx_expire() timer callbacks are patched,
    because the argument is now the inet_frag_queue, not
    the protocol specific queue.
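
    The caller-side pattern described above, in sketch form (signature
    simplified; 'struct ipq' stands in for the protocol-private queue type):

        struct inet_frag_queue *q;

        q = inet_frag_alloc(f, arg);    /* kzalloc() + common field setup */
        if (q == NULL)
                return NULL;            /* never container_of() a NULL */

        return container_of(q, struct ipq, q);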

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov