Eric Lee / smarc-fsl-linux-kernel

16 Jun, 2015

1 commit

3fd22af80 sock_diag: specify info_size per inet protocol ... Browse Code »

Previously, there was no clear distinction between the inet protocols
that used struct tcp_info to report information and those that didn't.
This change adds a specific size attribute to the inet_diag_handler
struct which defines these interfaces. This will make dispatching
sock_diag get_info requests identical for all inet protocols in a
following patch.

Tested: ss -au
Tested: ss -at
Signed-off-by: Craig Gallek
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Craig Gallek
2015-06-16 10:49:22 +0800

24 Apr, 2015

1 commit

b357a364c inet: fix possible panic in reqsk_queue_unlink() ... Browse Code »

[ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
0000000000000080
[ 3897.931025] IP: [] reqsk_timer_handler+0x1a6/0x243

There is a race when reqsk_timer_handler() and tcp_check_req() call
inet_csk_reqsk_queue_unlink() on the same req at the same time.

Before commit fa76ce7328b2 ("inet: get rid of central tcp/dccp listener
timer"), listener spinlock was held and race could not happen.

To solve this bug, we change reqsk_queue_unlink() to not assume req
must be found, and we return a status, to conditionally release a
refcount on the request sock.

This also means tcp_check_req() in non fastopen case might or not
consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have
to properly handle this.

(Same remark for dccp_check_req() and its callers)

inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
called 4 times in tcp and 3 times in dccp.

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet
Reported-by: Yuchung Cheng
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-24 23:39:15 +0800

14 Apr, 2015

1 commit

789f558cf tcp/dccp: get rid of central timewait timer ... Browse Code »

Using a timer wheel for timewait sockets was nice ~15 years ago when
memory was expensive and machines had a single processor.

This does not scale, code is ugly and source of huge latencies
(Typically 30 ms have been seen, cpus spinning on death_lock spinlock.)

We can afford to use an extra 64 bytes per timewait sock and spread
timewait load to all cpus to have better behavior.

Tested:

On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1
on the target (lpaa24)

Before patch :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
419594

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
437171

While test is running, we can observe 25 or even 33 ms latencies.

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2

After patch :

About 90% increase of throughput :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
810442

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
800992

And latencies are kept to minimal values during this load, even
if network utilization is 90% higher :

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-14 04:40:05 +0800

24 Mar, 2015

4 commits

c69736696 inet: fix double request socket freeing ... Browse Code »

Eric Hugne reported following error :

I'm hitting this warning on latest net-next when i try to SSH into a machine
with eth0 added to a bridge (but i think the problem is older than that)

Steps to reproduce:
node2 ~ # brctl addif br0 eth0
[ 223.758785] device eth0 entered promiscuous mode
node2 ~ # ip link set br0 up
[ 244.503614] br0: port 1(eth0) entered forwarding state
[ 244.505108] br0: port 1(eth0) entered forwarding state
node2 ~ # [ 251.160159] ------------[ cut here ]------------
[ 251.160831] WARNING: CPU: 0 PID: 3 at include/net/request_sock.h:102 tcp_v4_err+0x6b1/0x720()
[ 251.162077] Modules linked in:
[ 251.162496] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc3+ #18
[ 251.163334] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 251.164078] ffffffff81a8365c ffff880038a6ba18 ffffffff8162ace4 0000000000009898
[ 251.165084] 0000000000000000 ffff880038a6ba58 ffffffff8104da85 ffff88003fa437c0
[ 251.166195] ffff88003fa437c0 ffff88003fa74e00 ffff88003fa43bb8 ffff88003fad99a0
[ 251.167203] Call Trace:
[ 251.167533] [] dump_stack+0x45/0x57
[ 251.168206] [] warn_slowpath_common+0x85/0xc0
[ 251.169239] [] warn_slowpath_null+0x15/0x20
[ 251.170271] [] tcp_v4_err+0x6b1/0x720
[ 251.171408] [] ? _raw_read_lock_irq+0x3/0x10
[ 251.172589] [] ? inet_del_offload+0x40/0x40
[ 251.173366] [] icmp_socket_deliver+0x65/0xb0
[ 251.174134] [] icmp_unreach+0xc2/0x280
[ 251.174820] [] icmp_rcv+0x2bd/0x3a0
[ 251.175473] [] ip_local_deliver_finish+0x82/0x1e0
[ 251.176282] [] ip_local_deliver+0x88/0x90
[ 251.177004] [] ip_rcv_finish+0xf0/0x310
[ 251.177693] [] ip_rcv+0x2dc/0x390
[ 251.178336] [] __netif_receive_skb_core+0x713/0xa20
[ 251.179170] [] __netif_receive_skb+0x1a/0x80
[ 251.179922] [] process_backlog+0x94/0x120
[ 251.180639] [] net_rx_action+0x1e2/0x310
[ 251.181356] [] __do_softirq+0xa7/0x290
[ 251.182046] [] run_ksoftirqd+0x19/0x30
[ 251.182726] [] smpboot_thread_fn+0x153/0x1d0
[ 251.183485] [] ? SyS_setgroups+0x130/0x130
[ 251.184228] [] kthread+0xee/0x110
[ 251.184871] [] ? kthread_create_on_node+0x1b0/0x1b0
[ 251.185690] [] ret_from_fork+0x58/0x90
[ 251.186385] [] ? kthread_create_on_node+0x1b0/0x1b0
[ 251.187216] ---[ end trace c947fc7b24e42ea1 ]---
[ 259.542268] br0: port 1(eth0) entered forwarding state

Remove the double calls to reqsk_put()

[edumazet] :

I got confused because reqsk_timer_handler() _has_ to call
reqsk_put(req) after calling inet_csk_reqsk_queue_drop(), as
the timer handler holds a reference on req.

Signed-off-by: Fan Du
Signed-off-by: Eric Dumazet
Reported-by: Erik Hugne
Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: David S. Miller

Fan Du
2015-03-24 09:40:48 +0800
52036a430 ipv6: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets ... Browse Code »

dccp_v6_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-24 04:52:26 +0800
85645bab5 ipv4: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets ... Browse Code »

dccp_v4_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

New dccp_req_err() helper is exported so that we can use it in IPv6
in following patch.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-24 04:52:26 +0800
42cb80a23 inet: remove sk_listener parameter from syn_ack_timeout() ... Browse Code »

It is not needed, and req->sk_listener points to the listener anyway.
request_sock argument can be const.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-24 04:52:25 +0800

21 Mar, 2015

2 commits

fa76ce732 inet: get rid of central tcp/dccp listener timer ... Browse Code »

One of the major issue for TCP is the SYNACK rtx handling,
done by inet_csk_reqsk_queue_prune(), fired by the keepalive
timer of a TCP_LISTEN socket.

This function runs for awful long times, with socket lock held,
meaning that other cpus needing this lock have to spin for hundred of ms.

SYNACK are sent in huge bursts, likely to cause severe drops anyway.

This model was OK 15 years ago when memory was very tight.

We now can afford to have a timer per request sock.

Timer invocations no longer need to lock the listener,
and can be run from all cpus in parallel.

With following patch increasing somaxconn width to 32 bits,
I tested a listener with more than 4 million active request sockets,
and a steady SYNFLOOD of ~200,000 SYN per second.
Host was sending ~830,000 SYNACK per second.

This is ~100 times more what we could achieve before this patch.

Later, we will get rid of the listener hash and use ehash instead.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-21 00:40:25 +0800
52452c542 inet: drop prev pointer handling in request sock ... Browse Code »

When request sock are put in ehash table, the whole notion
of having a previous request to update dl_next is pointless.

Also, following patch will get rid of big purge timer,
so we want to delete a request sock without holding listener lock.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-21 00:40:25 +0800

19 Mar, 2015

3 commits

08d2cc3b2 inet: request sock should init IPv6/IPv4 addresses ... Browse Code »

In order to be able to use sk_ehashfn() for request socks,
we need to initialize their IPv6/IPv4 addresses.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-19 10:00:35 +0800
77a6a471b ipv6: get rid of __inet6_hash() ... Browse Code »

We can now use inet_hash() and __inet_hash() instead of private
functions.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-19 10:00:35 +0800
d1e559d0b inet: add IPv6 support to sk_ehashfn() ... Browse Code »

Intent is to converge IPv4 & IPv6 inet_hash functions to
factorize code.

IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr
in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set()
helpers.

__inet6_hash can now use sk_ehashfn() instead of a private
inet6_sk_ehashfn() and will simply use __inet_hash() in a
following patch.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-19 10:00:34 +0800

18 Mar, 2015

1 commit

407640de2 inet: add sk_listener argument to inet_reqsk_alloc() ... Browse Code »

listener socket can be used to set net pointer, and will
be later used to hold a reference on listener.

Add a const qualifier to first argument (struct request_sock_ops *),
and factorize all write_pnet(&ireq->ireq_net, sock_net(sk));

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-18 10:01:55 +0800

15 Mar, 2015

1 commit

16f86165b inet: fill request sock ir_iif for IPv4 ... Browse Code »

Once request socks will be in ehash table, they will need to have
a valid ir_iff field.

This is currently true only for IPv6. This patch extends support
for IPv4 as well.

This means inet_diag_fill_req() can now properly use ir_iif,
which is better for IPv6 link locals anyway, as request sockets
and established sockets will propagate consistent netlink idiag_if.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-15 03:05:10 +0800

13 Mar, 2015

2 commits

3f66b083a inet: introduce ireq_family ... Browse Code »

Before inserting request socks into general hash table,
fill their socket family.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-13 10:58:13 +0800
bd337c581 ipv6: add missing ireq_net & ir_cookie initializations ... Browse Code »

I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie

Lets clear ireq->ir_cookie in inet_reqsk_alloc()

Signed-off-by: Eric Dumazet
Fixes: 33cf7c90fe2f ("net: add real socket cookies")
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-13 10:58:12 +0800

12 Mar, 2015

2 commits

d77c555d3 net: fix CONFIG_NET_NS=n compilation ... Browse Code »

I forgot to use write_pnet() in three locations.

Signed-off-by: Eric Dumazet
Fixes: 33cf7c90fe2f9 ("net: add real socket cookies")
Reported-by: kbuild test robot
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-12 11:28:49 +0800
33cf7c90f net: add real socket cookies ... Browse Code »

A long standing problem in netlink socket dumps is the use
of kernel socket addresses as cookies.

1) It is a security concern.

2) Sockets can be reused quite quickly, so there is
no guarantee a cookie is used once and identify
a flow.

3) request sock, establish sock, and timewait socks
for a given flow have different cookies.

Part of our effort to bring better TCP statistics requires
to switch to a different allocator.

In this patch, I chose to use a per network namespace 64bit generator,
and to use it only in the case a socket needs to be dumped to netlink.
(This might be refined later if needed)

Note that I tried to carry cookies from request sock, to establish sock,
then timewait sockets.

Signed-off-by: Eric Dumazet
Cc: Eric Salo
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-12 09:55:28 +0800

11 Mar, 2015

1 commit

34160ea3f inet_diag: add const to inet_diag_req_v2 ... Browse Code »

diag dumpers should not modify the request.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-11 01:45:28 +0800

03 Mar, 2015

1 commit

1b7841404 net: Remove iocb argument from sendmsg and recvmsg ... Browse Code »

After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig
Suggested-by: Al Viro
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-03-03 02:06:31 +0800

11 Dec, 2014

1 commit

f95b414ed net: introduce helper macro for_each_cmsghdr ... Browse Code »

Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.

Signed-off-by: Gu Zheng
Signed-off-by: David S. Miller

Gu Zheng
2014-12-11 11:41:55 +0800

24 Nov, 2014

1 commit

6ce8e9ce5 new helper: memcpy_from_msg() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-11-24 17:28:48 +0800

19 Nov, 2014

4 commits

a77b63436 dccp: spelling s/reseting/resetting ... Browse Code »

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-11-19 04:26:32 +0800
02c31d2e5 dccp: replace min/casting by min_t ... Browse Code »

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-11-19 04:26:32 +0800
54da7996b dccp: remove blank lines between function/EXPORT_SYMBOL ... Browse Code »

See Documentation/CodingStyle chapter 6.

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-11-19 04:26:32 +0800
6d80c4732 dccp: kerneldoc warning fixes ... Browse Code »

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-11-19 04:26:31 +0800

09 Nov, 2014

1 commit

c0560b9c5 dccp: Convert DCCP_WARN to net_warn_ratelimited ... Browse Code »

Remove the dependency on the "warning" sysctl (net_msg_warn)
which is only used by the LIMIT_NETDEBUG macro.

Convert the LIMIT_NETDEBUG use in DCCP_WARN to the more
common net_warn_ratelimited mechanism.

This still ratelimits based on the net_ratelimit()
function, but removes the check for the sysctl.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2014-11-09 10:22:54 +0800

06 Nov, 2014

1 commit

51f3d02b9 net: Add and use skb_copy_datagram_msg() helper. ... Browse Code »

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller

David S. Miller
2014-11-06 05:46:40 +0800

19 Oct, 2014

1 commit

2e923b025 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Include fixes for netrom and dsa (Fabian Frederick and Florian
Fainelli)

2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.

3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
ip_tunnel, fou), from Li ROngQing.

4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.

5) Use after free in virtio_net, from Michael S Tsirkin.

6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
Shelar.

7) ISDN gigaset and capi bug fixes from Tilman Schmidt.

8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.

9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.

10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
Wang and Eric Dumazet.

11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
chunks, from Daniel Borkmann.

12) Don't call sock_kfree_s() with NULL pointers, this function also has
the side effect of adjusting the socket memory usage. From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
bna: fix skb->truesize underestimation
net: dsa: add includes for ethtool and phy_fixed definitions
openvswitch: Set flow-key members.
netrom: use linux/uaccess.h
dsa: Fix conversion from host device to mii bus
tipc: fix bug in bundled buffer reception
ipv6: introduce tcp_v6_iif()
sfc: add support for skb->xmit_more
r8152: return -EBUSY for runtime suspend
ipv4: fix a potential use after free in fou.c
ipv4: fix a potential use after free in ip_tunnel_core.c
hyperv: Add handling of IP header with option field in netvsc_set_hash()
openvswitch: Create right mask with disabled megaflows
vxlan: fix a free after use
openvswitch: fix a use after free
ipv4: dst_entry leak in ip_send_unicast_reply()
ipv4: clean up cookie_v4_check()
ipv4: share tcp_v4_save_options() with cookie_v4_check()
ipv4: call __ip_options_echo() in cookie_v4_check()
atm: simplify lanai.c by using module_pci_driver
...

Linus Torvalds
2014-10-19 00:31:37 +0800

18 Oct, 2014

1 commit

870c31513 ipv6: introduce tcp_v6_iif() ... Browse Code »

Commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.

This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]

This patch adds tcp_v6_iif() helper and uses it where necessary.

Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.

Signed-off-by: Eric Dumazet
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang
Signed-off-by: David S. Miller

Eric Dumazet
2014-10-18 11:48:07 +0800

10 Oct, 2014

1 commit

c798360cd Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu ... Browse Code »

Pull percpu updates from Tejun Heo:
"A lot of activities on percpu front. Notable changes are...

- percpu allocator now can take @gfp. If @gfp doesn't contain
GFP_KERNEL, it tries to allocate from what's already available to
the allocator and a work item tries to keep the reserve around
certain level so that these atomic allocations usually succeed.

This will replace the ad-hoc percpu memory pool used by
blk-throttle and also be used by the planned blkcg support for
writeback IOs.

Please note that I noticed a bug in how @gfp is interpreted while
preparing this pull request and applied the fix 6ae833c7fe0c
("percpu: fix how @gfp is interpreted by the percpu allocator")
just now.

- percpu_ref now uses longs for percpu and global counters instead of
ints. It leads to more sparse packing of the percpu counters on
64bit machines but the overhead should be negligible and this
allows using percpu_ref for refcnting pages and in-memory objects
directly.

- The switching between percpu and single counter modes of a
percpu_ref is made independent of putting the base ref and a
percpu_ref can now optionally be initialized in single or killed
mode. This allows avoiding percpu shutdown latency for cases where
the refcounted objects may be synchronously created and destroyed
in rapid succession with only a fraction of them reaching fully
operational status (SCSI probing does this when combined with
blk-mq support). It's also planned to be used to implement forced
single mode to detect underflow more timely for debugging.

There's a separate branch percpu/for-3.18-consistent-ops which cleans
up the duplicate percpu accessors. That branch causes a number of
conflicts with s390 and other trees. I'll send a separate pull
request w/ resolutions once other branches are merged"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
percpu: fix how @gfp is interpreted by the percpu allocator
blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
percpu_ref: add PERCPU_REF_INIT_* flags
percpu_ref: decouple switching to percpu mode and reinit
percpu_ref: decouple switching to atomic mode and killing
percpu_ref: add PCPU_REF_DEAD
percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
percpu_ref: replace pcpu_ prefix with percpu_
percpu_ref: minor code and comment updates
percpu_ref: relocate percpu_ref_reinit()
Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
Revert "percpu: free percpu allocation info for uniprocessor system"
percpu-refcount: make percpu_ref based on longs instead of ints
percpu-refcount: improve WARN messages
percpu: fix locking regression in the failure path of pcpu_alloc()
percpu-refcount: add @gfp to percpu_ref_init()
proportions: add @gfp to init functions
percpu_counter: add @gfp to percpu_counter_init()
percpu_counter: make percpu_counters_lock irq-safe
...

Linus Torvalds
2014-10-10 19:26:02 +0800

09 Oct, 2014

1 commit

35a9ad8af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"Most notable changes in here:

1) By far the biggest accomplishment, thanks to a large range of
contributors, is the addition of multi-send for transmit. This is
the result of discussions back in Chicago, and the hard work of
several individuals.

Now, when the ->ndo_start_xmit() method of a driver sees
skb->xmit_more as true, it can choose to defer the doorbell
telling the driver to start processing the new TX queue entires.

skb->xmit_more means that the generic networking is guaranteed to
call the driver immediately with another SKB to send.

There is logic added to the qdisc layer to dequeue multiple
packets at a time, and the handling mis-predicted offloads in
software is now done with no locks held.

Finally, pktgen is extended to have a "burst" parameter that can
be used to test a multi-send implementation.

Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
virtio_net

Adding support is almost trivial, so export more drivers to
support this optimization soon.

I want to thank, in no particular or implied order, Jesper
Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
David Tat, Hannes Frederic Sowa, and Rusty Russell.

2) PTP and timestamping support in bnx2x, from Michal Kalderon.

3) Allow adjusting the rx_copybreak threshold for a driver via
ethtool, and add rx_copybreak support to enic driver. From
Govindarajulu Varadarajan.

4) Significant enhancements to the generic PHY layer and the bcm7xxx
driver in particular (EEE support, auto power down, etc.) from
Florian Fainelli.

5) Allow raw buffers to be used for flow dissection, allowing drivers
to determine the optimal "linear pull" size for devices that DMA
into pools of pages. The objective is to get exactly the
necessary amount of headers into the linear SKB area pre-pulled,
but no more. The new interface drivers use is eth_get_headlen().
From WANG Cong, with driver conversions (several had their own
by-hand duplicated implementations) by Alexander Duyck and Eric
Dumazet.

6) Support checksumming more smoothly and efficiently for
encapsulations, and add "foo over UDP" facility. From Tom
Herbert.

7) Add Broadcom SF2 switch driver to DSA layer, from Florian
Fainelli.

8) eBPF now can load programs via a system call and has an extensive
testsuite. Alexei Starovoitov and Daniel Borkmann.

9) Major overhaul of the packet scheduler to use RCU in several major
areas such as the classifiers and rate estimators. From John
Fastabend.

10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
Duyck.

11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
Dumazet.

12) Add Datacenter TCP congestion control algorithm support, From
Florian Westphal.

13) Reorganize sk_buff so that __copy_skb_header() is significantly
faster. From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
netlabel: directly return netlbl_unlabel_genl_init()
net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
net: description of dma_cookie cause make xmldocs warning
cxgb4: clean up a type issue
cxgb4: potential shift wrapping bug
i40e: skb->xmit_more support
net: fs_enet: Add NAPI TX
net: fs_enet: Remove non NAPI RX
r8169:add support for RTL8168EP
net_sched: copy exts->type in tcf_exts_change()
wimax: convert printk to pr_foo()
af_unix: remove 0 assignment on static
ipv6: Do not warn for informational ICMP messages, regardless of type.
Update Intel Ethernet Driver maintainers list
bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
tipc: fix bug in multicast congestion handling
net: better IFF_XMIT_DST_RELEASE support
net/mlx4_en: remove NETDEV_TX_BUSY
3c59x: fix bad split of cpu_to_le32(pci_map_single())
net: bcmgenet: fix Tx ring priority programming
...

Linus Torvalds
2014-10-09 09:40:54 +0800

08 Oct, 2014

1 commit

d0cd84817 Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine ... Browse Code »

Pull dmaengine updates from Dan Williams:
"Even though this has fixes marked for -stable, given the size and the
needed conflict resolutions this is 3.18-rc1/merge-window material.

These patches have been languishing in my tree for a long while. The
fact that I do not have the time to do proper/prompt maintenance of
this tree is a primary factor in the decision to step down as
dmaengine maintainer. That and the fact that the bulk of drivers/dma/
activity is going through Vinod these days.

The net_dma removal has not been in -next. It has developed simple
conflicts against mainline and net-next (for-3.18).

Continuing thanks to Vinod for staying on top of drivers/dma/.

Summary:

1/ Step down as dmaengine maintainer see commit 08223d80df38
"dmaengine maintainer update"

2/ Removal of net_dma, as it has been marked 'broken' since 3.13
(commit 77873803363c "net_dma: mark broken"), without reports of
performance regression.

3/ Miscellaneous fixes"

* tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
net: make tcp_cleanup_rbuf private
net_dma: revert 'copied_early'
net_dma: simple removal
dmaengine maintainer update
dmatest: prevent memory leakage on error path in thread
ioat: Use time_before_jiffies()
dmaengine: fix xor sources continuation
dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
drivers: dma: Include appropriate header file in dca.c
drivers: dma: Mark functions as static in dma_v3.c
dma: mv_xor: Add DMA API error checks
ioat/dca: Use dev_is_pci() to check whether it is pci device

Linus Torvalds
2014-10-08 08:39:25 +0800

02 Oct, 2014

2 commits

e500f488c net/dccp/ccid.c: add __init to ccid_activate ... Browse Code »

ccid_activate is only called by __init ccid_initialize_builtins in same module.

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-10-02 06:33:13 +0800
0c5b8a462 net/dccp/proto.c: add __init to dccp_mib_init ... Browse Code »

dccp_mib_init is only called by __init dccp_init in same module.

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-10-02 06:33:13 +0800

29 Sep, 2014

1 commit

a224772db ipv6: add a struct inet6_skb_parm param to ipv6_opt_accepted() ... Browse Code »

ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-09-29 04:35:43 +0800

28 Sep, 2014

1 commit

7bced3975 net_dma: simple removal ... Browse Code »

Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.

This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.

Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():

https://lkml.org/lkml/2014/9/3/177

Cc: Dave Jiang
Cc: Vinod Koul
Cc: David Whipple
Cc: Alexander Duyck
Cc:
Reported-by: Roman Gushchin
Acked-by: David S. Miller
Signed-off-by: Dan Williams

Dan Williams
2014-09-28 22:05:16 +0800

08 Sep, 2014

1 commit

908c7f194 percpu_counter: add @gfp to percpu_counter_init() ... Browse Code »

Percpu allocator now supports allocation mask. Add @gfp to
percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
with percpu_counters too.

We could have left percpu_counter_init() alone and added
percpu_counter_init_gfp(); however, the number of users isn't that
high and introducing _gfp variants to all percpu data structures would
be quite ugly, so let's just do the conversion. This is the one with
the most users. Other percpu data structures are a lot easier to
convert.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo
Acked-by: Jan Kara
Acked-by: "David S. Miller"
Cc: x86@kernel.org
Cc: Jens Axboe
Cc: "Theodore Ts'o"
Cc: Alexander Viro
Cc: Andrew Morton

Tejun Heo
2014-09-08 08:51:29 +0800

02 Jul, 2014

1 commit

9fe516ba3 inet: move ipv6only in sock_common ... Browse Code »

When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-07-02 14:46:21 +0800

28 Jun, 2014

1 commit

476eab825 net: remove inet6_reqsk_alloc ... Browse Code »

Since pktops is only used for IPv6 only and opts is used for IPv4
only, we can move these fields into a union and this allows us to drop
the inet6_reqsk_alloc function as after this change it becomes
equivalent with inet_reqsk_alloc.

This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
field was not annotated after a request_sock was allocated.

Signed-off-by: Octavian Purdila
Signed-off-by: David S. Miller

Octavian Purdila
2014-06-28 06:53:35 +0800