Eric Lee / smarc-fsl-linux-kernel

20 Sep, 2018

2 commits

990204ddc inet: frags: break the 2GB limit for frags storage

Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonably well under pressure.

Current memory tracking is using one atomic_t and integers.

Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.

Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :

if (... || frag_mem_limit(nf) > nf->high_thresh)
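
The overflow can be illustrated outside the kernel. The following is a minimal sketch, not the kernel code: `over_limit_32` and `over_limit_64` are hypothetical helpers that model the threshold comparison with a 32-bit and a 64-bit counter respectively.

```c
#include <stdint.h>

/* Hypothetical model of the old accounting: a 32-bit signed counter
 * compared against a 32-bit threshold. If high_thresh is INT32_MAX
 * (about 2GB), no representable counter value can exceed it. */
static int over_limit_32(int32_t mem, int32_t high_thresh)
{
    return mem > high_thresh;
}

/* Hypothetical model of the new accounting: a 64-bit counter, which is
 * what atomic_long_t provides on 64-bit arches. */
static int over_limit_64(int64_t mem, int64_t high_thresh)
{
    return mem > high_thresh;
}
```

With a threshold at the top of the 32-bit range, the first comparison can never be true, while the second still fires once the counter passes the limit.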

Tested:

$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh

$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880

$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds 3317150 0.0
IpReasmFails 3317112 0.0
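
As a rough illustration of why the counters need more than 32 bits, the FRAG line above could be parsed as follows. This is a sketch only; `parse_frag_line` is a hypothetical helper and the line layout is assumed to match the output shown above.

```c
#include <stdio.h>

/* Hypothetical helper: parses a "FRAG: inuse N memory M" line as printed
 * by /proc/net/sockstat. Returns 1 if both fields were read, else 0.
 * 64-bit fields are used because "memory" can exceed 32 bits. */
static int parse_frag_line(const char *line,
                           unsigned long long *inuse,
                           unsigned long long *memory)
{
    return sscanf(line, "FRAG: inuse %llu memory %llu", inuse, memory) == 2;
}
```

The memory value reported in the test above (16000002880) does not fit in an unsigned 32-bit integer.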

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
(cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714)
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2018-09-20 04:43:46 +0800
9aee41eff inet: frags: use rhashtables for reassembly units

Some applications still rely on IP fragmentation, and to be fair the Linux
reassembly unit does not work under any serious load.

It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)

A work queue is supposed to garbage collect items when the host is under memory
pressure, and to do a hash rebuild, changing the seed used in hash computations.

This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
occurring every 5 seconds if host is under fire.

Then there is the problem of sharing this hash table for all netns.

It is time to switch to rhashtables, and allocate one of them per netns
to speedup netns dismantle, since this is a critical metric these days.

Lookup is now using RCU. A followup patch will even remove
the refcount hold/release left from prior implementation and save
a couple of atomic operations.

Before this patch, 16 cpus (16 RX queue NIC) could not handle more
than 1 Mpps frags DDOS.

After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (exact number depends on frags being evicted
after timeout)

$ grep FRAG /proc/net/sockstat
FRAG: inuse 1966916 memory 2140004608

A followup patch will change the limits for 64bit arches.

Signed-off-by: Eric Dumazet
Cc: Kirill Tkhai
Cc: Herbert Xu
Cc: Florian Westphal
Cc: Jesper Dangaard Brouer
Cc: Alexander Aring
Cc: Stefan Schmidt
Signed-off-by: David S. Miller
(cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2018-09-20 04:43:46 +0800

12 Jun, 2018

1 commit

be1f1827a netdev-FAQ: clarify DaveM's position for stable backports

[ Upstream commit 75d4e704fa8d2cf33ff295e5b441317603d7f9fd ]

Per discussion with David at netconf 2018, let's clarify
DaveM's position of handling stable backports in netdev-FAQ.

This is important for people relying on upstream -stable
releases.

Cc: Greg Kroah-Hartman
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2018-06-12 04:49:19 +0800

09 Mar, 2018

1 commit

ff01f118d doc: Change the min default value of tcp_wmem/tcp_rmem.

[ Upstream commit a61a86f8db92923a2a4c857c49a795bcae754497 ]

The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096. And the
tcp_wmem/tcp_rmem min default values are 4096.

Fixes: bd68a2a854ad ("net: set SK_MEM_QUANTUM to 4096")
Cc: Eric Dumazet
Signed-off-by: Tonghao Zhang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Tonghao Zhang
2018-03-09 14:41:13 +0800

08 Oct, 2017

1 commit

00a534e5e doc: Fix typo "8023.ad" in bonding documentation

Should be "802.3ad" like everywhere else in the document.

Signed-off-by: Axel Beckert
Signed-off-by: David S. Miller

Axel Beckert
2017-10-08 06:19:13 +0800

23 Sep, 2017

2 commits

71aa60f67 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix NAPI poll list corruption in enic driver, from Christian
Lamparter.

2) Fix route use after free, from Eric Dumazet.

3) Fix regression in reuseaddr handling, from Josef Bacik.

4) Assert the size of control messages in compat handling since we copy
it in from userspace twice. From Meng Xu.

5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.)
from Ursula Braun.

6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn.

7) Don't use ARRAY_SIZE on spinlock array which might have zero
entries, from Geert Uytterhoeven.

8) Fix miscomputation of checksum in ipv6 udp code, from Subash Abhinov
Kasiviswanathan.

9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from Xin
Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
inet: fix improper empty comparison
net: use inet6_rcv_saddr to compare sockets
net: set tb->fast_sk_family
net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
MAINTAINERS: update git tree locations for ieee802154 subsystem
net: prevent dst uses after free
net: phy: Fix truncation of large IRQ numbers in phy_attached_print()
net/smc: no close wait in case of process shut down
net/smc: introduce a delay
net/smc: terminate link group if out-of-sync is received
net/smc: longer delay for client link group removal
net/smc: adapt send request completion notification
net/smc: adjust net_device refcount
net/smc: take RCU read lock for routing cache lookup
net/smc: add receive timeout check
net/smc: add missing dev_put
net: stmmac: Cocci spatch "of_table"
lan78xx: Use default values loaded from EEPROM/OTP after reset
lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
lan78xx: Fix for eeprom read/write when device auto suspend
...

Linus Torvalds
2017-09-23 23:41:27 +0800
c0a3a64e7 Merge tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
"Major additions:

- sysctl and seccomp operation to discover available actions
(tyhicks)

- new per-filter configurable logging infrastructure and sysctl
(tyhicks)

- SECCOMP_RET_LOG to log allowed syscalls (tyhicks)

- SECCOMP_RET_KILL_PROCESS as the new strictest possible action

- self-tests for new behaviors"

[ This is the seccomp part of the security pull request during the merge
window that was nixed due to unrelated problems - Linus ]

* tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
samples: Unrename SECCOMP_RET_KILL
selftests/seccomp: Test thread vs process killing
seccomp: Implement SECCOMP_RET_KILL_PROCESS action
seccomp: Introduce SECCOMP_RET_KILL_PROCESS
seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD
seccomp: Action to log before allowing
seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
seccomp: Selftest for detection of filter flag support
seccomp: Sysctl to configure actions that are allowed to be logged
seccomp: Operation for checking if an action is available
seccomp: Sysctl to display available actions
seccomp: Provide matching filter for introspection
selftests/seccomp: Refactor RET_ERRNO tests
selftests/seccomp: Add simple seccomp overhead benchmark
selftests/seccomp: Add tests for basic ptrace actions

Linus Torvalds
2017-09-23 10:16:41 +0800

20 Sep, 2017

1 commit

35e015e1f ipv6: fix net.ipv6.conf.all interface DAD handlers

Currently, writing into
net.ipv6.conf.all.{accept_dad,use_optimistic,optimistic_dad} has no effect.
Fix handling of these flags by:

- using the maximum of global and per-interface values for the
accept_dad flag. That is, if at least one of the two values is
non-zero, enable DAD on the interface. If at least one value is
set to 2, enable DAD and disable IPv6 operation on the interface if
MAC-based link-local address was found

- using the logical OR of global and per-interface values for the
optimistic_dad flag. If at least one of them is set to one, optimistic
duplicate address detection (RFC 4429) is enabled on the interface

- using the logical OR of global and per-interface values for the
use_optimistic flag. If at least one of them is set to one,
optimistic addresses won't be marked as deprecated during source address
selection on the interface.
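
The combination rules listed above can be sketched in a few lines of C. This is an illustration of the described semantics only; the helper names are hypothetical and are not the functions the patch touches.

```c
/* Hypothetical helper: accept_dad uses the maximum of the global
 * ("all") value and the per-interface value. */
static int effective_accept_dad(int all_val, int dev_val)
{
    return all_val > dev_val ? all_val : dev_val;
}

/* Hypothetical helper: optimistic_dad and use_optimistic use the
 * logical OR of the global and per-interface values. */
static int effective_optimistic(int all_val, int dev_val)
{
    return all_val || dev_val;
}
```

For example, a global accept_dad of 0 with a per-interface value of 2 yields 2, and either optimistic flag set to one enables the behaviour on the interface.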

While at it, as we're modifying the prototype for ipv6_use_optimistic_addr(),
drop inline, and let the compiler decide.

Fixes: 7fd2561e4ebd ("net: ipv6: Add a sysctl to make optimistic addresses useful candidates")
Signed-off-by: Matteo Croce
Signed-off-by: David S. Miller

Matteo Croce
2017-09-20 07:44:02 +0800

19 Sep, 2017

1 commit

51513748d Documentation: networking: fix ASCII art in switchdev.txt

Fix ASCII art in Documentation/networking/switchdev.txt:

Change non-ASCII "spaces" to ASCII spaces.

Change 2 erroneous '+' characters in ASCII art to '-' (at the '*'
characters below):

line 32:
+--+----+----+----+-*--+----+---+ +-----+-----+
line 41:
+--------------+---*------------+

Signed-off-by: Randy Dunlap
Acked-by: Pavel Machek
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller

Randy Dunlap
2017-09-19 07:38:46 +0800

17 Sep, 2017

1 commit

2130c0281 Documentation: link in networking docs

Fix link in filter.txt.

Acked-by: Pavel Machek

Signed-off-by: David S. Miller

Pavel Machek
2017-09-17 00:12:48 +0800

07 Sep, 2017

1 commit

aae3dbb47 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.

2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.

3) Allow generic XDP to work on virtual devices, from John Fastabend.

4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.

5) Remove UFO offloads from the tree, gave us little other than bugs.

6) Remove the IPSEC flow cache, from Florian Westphal.

7) Support ipv6 route offload in mlxsw driver.

8) Support VF representors in bnxt_en, from Sathya Perla.

9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.

10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.

11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.

12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.

13) Add new jump instructions to eBPF, from Daniel Borkmann.

14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.

15) Support XDP in tap driver, from Jason Wang.

16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.

17) Add Huawei hinic ethernet driver.

18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...

Linus Torvalds
2017-09-07 05:45:08 +0800

04 Sep, 2017

2 commits

81a84ad3c Merge branch 'docs-next' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"After a fair amount of churn in the last couple of cycles, docs are
taking it easier this time around. Lots of fixes and some new
documentation, but nothing all that radical. Perhaps the most
interesting change for many is the scripts/sphinx-pre-install tool
from Mauro; it will tell you exactly which packages you need to
install to get a working docs toolchain on your system.

There are two little patches reaching outside of Documentation/; both
just tweak kerneldoc comments to eliminate warnings and fix some
dangling doc pointers"

* 'docs-next' of git://git.lwn.net/linux: (52 commits)
Documentation/sphinx: fix kernel-doc decode for non-utf-8 locale
genalloc: Fix an incorrect kerneldoc comment
doc: Add documentation for the genalloc subsystem
assoc_array: fix path to assoc_array documentation
kernel-doc parser mishandles declarations split into lines
docs: ReSTify table of contents in core.rst
docs: process: drop git snapshots from applying-patches.rst
Documentation:input: fix typo
swap: Remove obsolete sentence
sphinx.rst: Allow Sphinx version 1.6 at the docs
docs-rst: fix verbatim font size on tables
Documentation: stable-kernel-rules: fix broken git urls
rtmutex: update rt-mutex
rtmutex: update rt-mutex-design
docs: fix minimal sphinx version in conf.py
docs: fix nested numbering in the TOC
NVMEM documentation fix: A minor typo
docs-rst: pdf: use same vertical margin on all Sphinx versions
doc: Makefile: if sphinx is not found, run a check script
docs: Fix paths in security/keys
...

Linus Torvalds
2017-09-04 12:07:29 +0800
b63f6044d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, updates to the conntrack core, enhancements for
nf_tables, conversion of netfilter hooks from linked list to array to
improve memory locality and assorted improvements for the Netfilter
codebase. More specifically, they are:

1) Add expectation to hashes after timer initialization to prevent
access from another CPU that walks on the hashes and calls
del_timer(), from Florian Westphal.

2) Don't update nf_tables chain counters from hot path, this is only
used by the x_tables compatibility layer.

3) Get rid of nested rcu_read_lock() calls from netfilter hook path.
Hooks are always guaranteed to run from rcu read side, so remove
nested rcu_read_lock() where possible. Patch from Taehee Yoo.

4) nf_tables new ruleset generation notifications include PID and name
of the process that has updated the ruleset, from Phil Sutter.

5) Use skb_header_pointer() from nft_fib, so we can reuse this code from
the nf_family netdev family. Patch from Pablo M. Bermudo.

6) Add support for nft_fib in nf_tables netdev family, also from Pablo.

7) Use deferrable workqueue for conntrack garbage collection, to reduce
power consumption, patch from Subash Abhinov Kasiviswanathan.

8) Add nf_ct_expect_iterate_net() helper and use it. From Florian
Westphal.

9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.

10) Drop references on conntrack removal path when skbuffs have escaped via
nfqueue, from Florian.

11) Don't queue packets to nfqueue with dying conntrack, from Florian.

12) Constify nf_hook_ops structure, from Florian.

13) Remove needless branch in nf_tables trace code, from Phil Sutter.

14) Add nla_strdup(), from Phil Sutter.

15) Raise nf_tables objects name size up to 255 chars, people want to use
DNS names, so increase this according to what RFC 1035 specifies.
Patch series from Phil Sutter.

16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook
registration on demand, suggested by Eric Dumazet, patch from Florian.

17) Remove unused variables in compat_copy_entry_from_user both in
ip_tables and arp_tables code. Patch from Taehee Yoo.

18) Constify struct nf_conntrack_l4proto, from Julia Lawall.

19) Constify nf_loginfo structure, also from Julia.

20) Use a single rb root in connlimit, from Taehee Yoo.

21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.

22) Use audit_log() instead of open-coding it, from Geliang Tang.

23) Allow to mangle tcp options via nft_exthdr, from Florian.

24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes
a fix for a miscalculation of the minimal length.

25) Simplify branch logic in h323 helper, from Nick Desaulniers.

26) Calculate netlink attribute size for conntrack tuple at compile
time, from Florian.

27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure.
From Florian.

28) Remove holes in nf_conntrack_l4proto structure, so it becomes
smaller. From Florian.

29) Get rid of print_tuple() indirection for /proc conntrack listing.
Place all the code in net/netfilter/nf_conntrack_standalone.c.
Patch from Florian.

30) Do not build in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is
off. From Florian.

31) Constify most nf_conntrack_{l3,l4}proto helper functions, from
Florian.

32) Fix broken indentation in ebtables extensions, from Colin Ian King.

33) Fix several harmless sparse warnings, from Florian.

34) Convert netfilter hook infrastructure to use array for better memory
locality, joint work done by Florian and Aaron Conole. Moreover, add
some instrumentation to debug this.

35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once
per batch, from Florian.

36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.

37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.

38) Remove unused code in the generic protocol tracker, from Davide
Caratti.

I think I will have material for a second Netfilter batch in my queue if
time allows it to fit in this merge window.
====================

Signed-off-by: David S. Miller

David S. Miller
2017-09-04 08:08:42 +0800

02 Sep, 2017

1 commit

cc8889ae8 doc: document MSG_ZEROCOPY

Documentation for this feature was missing from the patchset.
Copied a lot from the netdev 2.1 paper, addressing some small
interface changes since then.

Changes
v1 -> v2
- change email discussion URL format
- clarify that u32 counter is per-syscall, unsigned and
wraps after UINT_MAX calls
- describe errno on send failure specific to MSG_ZEROCOPY
- a few very minor rewordings
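
The wrapping behaviour of the u32 counter mentioned above can be sketched as follows. This is a minimal illustration in plain C, not code from the documentation being added. The helper names are hypothetical, and the idea that completions are reported as an inclusive [lo, hi] range of counter values is an assumption not stated in this commit message.

```c
#include <stdint.h>

/* Hypothetical helper: the next per-call counter value. Unsigned
 * arithmetic wraps from UINT32_MAX back to 0. */
static uint32_t zc_next_id(uint32_t id)
{
    return (uint32_t)(id + 1u);
}

/* Hypothetical helper: number of calls covered by an inclusive range
 * [lo, hi] of counter values, computed modulo 2^32 so that a range
 * spanning the wrap point is still counted correctly. */
static uint32_t zc_range_len(uint32_t lo, uint32_t hi)
{
    return (uint32_t)(hi - lo) + 1u;
}
```

A range from UINT32_MAX to 1 covers three calls (UINT32_MAX, 0 and 1), which is why the subtraction is done in unsigned arithmetic rather than by comparing lo and hi directly.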

Signed-off-by: Willem de Bruijn
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Willem de Bruijn
2017-09-02 01:39:35 +0800

31 Aug, 2017

2 commits

d35d6e92c hv_netvsc: Fix typos in the document of UDP hashing

There are two typos in the document, netvsc.txt,
regarding UDP hashing level. This patch fixes them.

Signed-off-by: Haiyang Zhang
Signed-off-by: David S. Miller

Haiyang Zhang
2017-08-31 07:04:44 +0800
ceed73a2c drivers: net: ethernet: qualcomm: rmnet: Initial implementation

RmNet driver provides a transport agnostic MAP (multiplexing and
aggregation protocol) support in embedded module. Module provides
virtual network devices which can be attached to any IP-mode
physical device. This will be used to provide all MAP functionality
on future hardware in a single consistent location.

Signed-off-by: Subash Abhinov Kasiviswanathan
Signed-off-by: David S. Miller

Subash Abhinov Kasiviswanathan
2017-08-31 02:41:13 +0800

30 Aug, 2017

2 commits

eaa72dc47 neigh: increase queue_len_bytes to match wmem_default

Florian reported UDP xmit drops that could be root caused to the
too small neigh limit.

Current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64bit arches).

Once ARP/ND resolution is in progress, we should allow a few more
packets to be queued, at least for one producer.

Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.

Signed-off-by: Eric Dumazet
Reported-by: Florian Fainelli
Signed-off-by: David S. Miller

Eric Dumazet
2017-08-30 07:10:50 +0800
c96558403 Documentation: networking: Add blurb about patches in patchwork

Explain that the patch queue in patchwork should not be touched by patch
submitters.

Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller

Florian Fainelli
2017-08-30 06:12:34 +0800

29 Aug, 2017

3 commits

c038a58cc rxrpc: Allow failed client calls to be retried

Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1. This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.

Two new functions are provided:

(1) rxrpc_kernel_check_call() - This is used to find out the completion
state of a call to guess whether it can be retried and whether it
should be retried.

(2) rxrpc_kernel_retry_call() - Disconnect the call from its current
connection, reset the state and submit it as a new client call to a
new address. The new address need not match the previous address.

A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed call will be retained at the same point it was at when
an error condition was detected. msg_data_left() can be used to find out
how much data was packaged before the error occurred.

Signed-off-by: David Howells

David Howells
2017-08-29 17:55:20 +0800
e833251ad rxrpc: Add notification of end-of-Tx phase

Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
a notification that the AF_RXRPC call has transitioned out of the Tx phase and
is now waiting for a reply or a final ACK.

This is called from AF_RXRPC with the call state lock held so the
notification is guaranteed to come before any reply is passed back.

Further, modify the AFS filesystem to make use of this so that we don't have
to change the afs_call state before sending the last bit of data.

Signed-off-by: David Howells

David Howells
2017-08-29 17:55:20 +0800
065919163 Documentation: networking: add RSS information

Signed-off-by: Madalin Bucur
Signed-off-by: David S. Miller

Madalin Bucur
2017-08-29 07:41:01 +0800

25 Aug, 2017

2 commits

3fd871270 strparser: initialize all callbacks

commit bbb03029a899 ("strparser: Generalize strparser") added more
function pointers to 'struct strp_callbacks'; however, kcm_attach() was
not updated to initialize them. This could cause the ->lock() and/or
->unlock() function pointers to be set to garbage values, causing a
crash in strp_work().

Fix the bug by moving the callback structs into static memory, so
unspecified members are zeroed. Also constify them while we're at it.
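
The effect of moving the callbacks into static memory can be sketched with a stand-in structure. This is an illustration under assumptions: `struct demo_callbacks`, `demo_parse`, `demo_cb` and `demo_has_lock` are hypothetical names, not the kernel's `struct strp_callbacks` or its users.

```c
#include <stddef.h>

/* Hypothetical stand-in for a callback table with optional members. */
struct demo_callbacks {
    int (*parse)(void);
    void (*lock)(void);
    void (*unlock)(void);
};

static int demo_parse(void)
{
    return 0;
}

/* Static storage with a designated initializer: every member that is
 * not named here is guaranteed to be zero (NULL), never garbage. */
static const struct demo_callbacks demo_cb = {
    .parse = demo_parse,
};

/* A consumer can then safely test whether an optional callback is set. */
static int demo_has_lock(const struct demo_callbacks *cb)
{
    return cb->lock != NULL;
}
```

By contrast, a structure declared on the stack without an initializer and then filled in member by member leaves the unassigned pointers indeterminate, which matches the failure mode described above.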

This bug was found by syzkaller, which encountered the following splat:

IP: 0x55
PGD 3b1ca067
P4D 3b1ca067
PUD 3b12f067
PMD 0

Oops: 0010 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: kstrp strp_work
task: ffff88006bb0e480 task.stack: ffff88006bb10000
RIP: 0010:0x55
RSP: 0018:ffff88006bb17540 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000
RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48
RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000
R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48
R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000
FS: 0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098
worker_thread+0x223/0x1860 kernel/workqueue.c:2233
kthread+0x35e/0x430 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Code: Bad RIP value.
RIP: 0x55 RSP: ffff88006bb17540
CR2: 0000000000000055
---[ end trace f0e4920047069cee ]---

Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and
CONFIG_AF_KCM=y):

#include <linux/bpf.h>
#include <linux/kcm.h>
#include <linux/types.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

static const struct bpf_insn bpf_insns[3] = {
    { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */
    { .code = 0x95 }, /* BPF_EXIT_INSN() */
};

static const union bpf_attr bpf_attr = {
    .prog_type = 1,
    .insn_cnt = 2,
    .insns = (uintptr_t)&bpf_insns,
    .license = (uintptr_t)"",
};

int main(void)
{
    int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD,
                         &bpf_attr, sizeof(bpf_attr));
    int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
    int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0);

    ioctl(kcm_fd, SIOCKCMATTACH,
          &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd });
}

Fixes: bbb03029a899 ("strparser: Generalize strparser")
Cc: Dmitry Vyukov
Cc: Tom Herbert
Signed-off-by: Eric Biggers
Signed-off-by: David S. Miller

Eric Biggers
2017-08-25 12:57:50 +0800
22b6722bf ipv6: Add sysctl for per namespace flow label reflection

Reflecting IPv6 Flow Label at server nodes is useful in environments
that employ multipath routing to load balance the requests. As "IPv6
Flow Label Reflection" standard draft [1] points out - ICMPv6 PTB error
messages generated in response to downstream packets from the server
can be routed by a load balancer back to the original server without
looking at transport headers, if the server applies the flow label
reflection. This enables the Path MTU Discovery past the ECMP router in
load-balance or anycast environments where each server node is reachable
by only one path.

Introduce a sysctl to enable flow label reflection per net namespace for
all newly created sockets. Previously, the same could be achieved only per
socket by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR
socket option.

[1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01

Signed-off-by: Jakub Sitnicki
Signed-off-by: David S. Miller

Jakub Sitnicki
2017-08-25 09:05:43 +0800

24 Aug, 2017

1 commit

d2aaa3dc4 bpf, doc: Add arm32 as arch supporting eBPF JIT

As eBPF JIT support for arm32 was added recently with
commit 39c13c204bb1150d401e27d41a9d8b332be47c49, it seems appropriate to
add arm32 as arch with support for eBPF JIT in bpf and sysctl docs as well.

Signed-off-by: Shubham Bansal
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Shubham Bansal
2017-08-24 13:40:12 +0800

23 Aug, 2017

2 commits

3b0c34580 hv_netvsc: Update netvsc Document for UDP hash level setting

Update Documentation/networking/netvsc.txt for UDP hash level setting
and related info.

Signed-off-by: Haiyang Zhang
Signed-off-by: David S. Miller

Haiyang Zhang
2017-08-23 05:08:12 +0800
51ba902a1 net-next/hinic: Initialize hw interface

Initialize hw interface as part of the nic initialization for accessing hw.

Signed-off-by: Aviad Krawczyk
Signed-off-by: Zhao Chen
Signed-off-by: David S. Miller

Aviad Krawczyk
2017-08-23 01:48:52 +0800

22 Aug, 2017

1 commit

e2a7c34fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2017-08-22 08:06:42 +0800

21 Aug, 2017

1 commit

5a7844981 switchdev: documentation: minor typo fixes

Two typos in switchdev.txt

Signed-off-by: Chris Packham
Signed-off-by: David S. Miller

Chris Packham
2017-08-21 10:49:10 +0800

15 Aug, 2017

1 commit

fd76875ca seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD

In preparation for adding SECCOMP_RET_KILL_PROCESS, rename SECCOMP_RET_KILL
to the more accurate SECCOMP_RET_KILL_THREAD.

The existing selftest values are intentionally left as SECCOMP_RET_KILL
just to be sure we're exercising the alias.
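
The rename-with-alias pattern described above might look like the following. This is a sketch only: the `DEMO_` macros and `demo_is_kill_thread` are hypothetical names that mirror the pattern, and the numeric value is illustrative rather than taken from the kernel headers.

```c
/* Hypothetical new, more accurate name. The value is illustrative. */
#define DEMO_RET_KILL_THREAD 0x00000000U

/* Hypothetical old name kept as an alias so existing users keep working. */
#define DEMO_RET_KILL DEMO_RET_KILL_THREAD

/* Code written against either name sees the same action value. */
static int demo_is_kill_thread(unsigned int action)
{
    return action == DEMO_RET_KILL_THREAD;
}
```

Leaving existing tests on the old name, as the commit does, checks that the alias keeps resolving to the same action.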

Signed-off-by: Kees Cook

Kees Cook
2017-08-15 04:46:48 +0800

11 Aug, 2017

1 commit

8ac5ac1b0 doc: linux-wpan: Change the old function names to the latest function names

The function declaration in the latest include/net/mac802154.h has been
changed since v3.19.

ieee802154_alloc_device => ieee802154_alloc_hw
ieee802154_free_device => ieee802154_free_hw
ieee802154_register_device => ieee802154_register_hw
ieee802154_unregister_device => ieee802154_unregister_hw

However, the description in the Device drivers API section of
Documentation/networking/ieee802154.txt is still in the state of
v3.18.63.

Signed-off-by: Jian-Hong Pan
Acked-by: Stefan Schmidt
Signed-off-by: Jonathan Corbet

Jian-Hong Pan
2017-08-11 05:03:18 +0800

10 Aug, 2017

1 commit

92b31a9af bpf: add BPF_J{LT,LE,SLT,SLE} instructions

Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
particularly *JLT/*JLE counterparts involving immediates need
to be rewritten from e.g. X < [IMM] by swapping arguments into
[IMM] > X, meaning the immediate first is required to be loaded
into a register Y := [IMM], such that then we can compare with
Y > X. Note that the destination operand is always required to
be a register.

This has the downside of having unnecessarily increased register
pressure, meaning complex program would need to spill other
registers temporarily to stack in order to obtain an unused
register for the [IMM]. Loading to registers will thus also
affect state pruning since we need to account for that register
use and potentially those registers that had to be spilled/filled
again. As a consequence slightly more stack space might have
been used due to spilling, and BPF programs are a bit longer
due to extra code involving the register load and potentially
required spill/fills.
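
The rewrite described above can be modelled in plain C. This only illustrates the comparison semantics and the extra temporary, it is not eBPF bytecode, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the workaround: without a "less than" jump,
 * X < IMM is evaluated as Y > X after first loading IMM into a
 * scratch register Y. */
static int lt_via_swapped_jgt(uint64_t x, uint64_t imm)
{
    uint64_t y = imm; /* the extra register load */

    return y > x;
}

/* Hypothetical model of a direct "less than" comparison, where the
 * immediate can be used as the source operand without a scratch register. */
static int lt_direct(uint64_t x, uint64_t imm)
{
    return x < imm;
}
```

Both forms give the same result for every input. The difference lies only in the additional register that the first form occupies.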

Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
instructions to the eBPF instruction set.

Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2017-08-10 07:53:56 +0800

09 Aug, 2017

1 commit

0cbf47416 Documentation: describe the new eBPF verifier value tracking behaviour

Also bring the eBPF documentation up to date in other ways.

Signed-off-by: Edward Cree
Signed-off-by: David S. Miller

Edward Cree
2017-08-09 08:51:35 +0800

04 Aug, 2017

1 commit

f6c00a3bb Merge tag 'batadv-next-for-davem-20170802' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- Remove unnecessary length qualifier, by Joe Perches

- Remove too short %pM field width, by Sven Eckelmann

- Remove return value handling from skb_put_data, by Sven Eckelmann

- Spelling fixes, by Colin Ian King

- Convert batman-adv.txt to reStructuredText, by Sven Eckelmann
====================

Signed-off-by: David S. Miller

David S. Miller
2017-08-04 00:24:06 +0800

03 Aug, 2017

1 commit

a5050c610 netvsc: add documentation

Add some background documentation on netvsc device options
and limitations.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

stephen hemminger
2017-08-03 07:55:33 +0800

02 Aug, 2017

1 commit

bbb03029a strparser: Generalize strparser

Generalize strparser beyond just being used in conjunction
with read_sock. strparser will also be used in the send path with
zero proxy. The primary change is to create strp_process function
that performs the critical processing on skbs. The documentation
is also updated to reflect the new uses.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2017-08-02 06:26:19 +0800

01 Aug, 2017

2 commits

b6690b143 tcp: remove low_latency sysctl

Was only checked by the removed prequeue code.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2017-08-01 05:37:49 +0800
4d3a57f23 netfilter: conntrack: do not enable connection tracking unless needed

Discussion during NFWS 2017 in Faro has shown that the current
conntrack behaviour is unreasonable.

Even if conntrack module is loaded on behalf of a single net namespace,
it's turned on for all namespaces, which is expensive. Commit
481fa373476 ("netfilter: conntrack: add nf_conntrack_default_on sysctl")
attempted to provide an alternative to the 'default on' behaviour by
adding a sysctl to change it.

However, as Eric points out, the sysctl only becomes available
once the module is loaded, and then it's too late.

So we either have to move the sysctl to the core, or, alternatively,
change conntrack to become active only once the rule set requires this.

This does the latter, conntrack is only enabled when a rule needs it.

Reported-by: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2017-08-01 02:42:00 +0800

29 Jul, 2017

1 commit

e45eba246 batman-adv: Convert batman-adv.txt to reStructuredText

Converting the freeform text to parsable reStructuredText allows its
integration into the Sphinx-based documentation system of the kernel. It will
therefore be accessible as hypertext under
https://www.kernel.org/doc/html/latest/

Signed-off-by: Sven Eckelmann
Signed-off-by: Simon Wunderlich

Sven Eckelmann
2017-07-29 15:51:28 +0800

19 Jul, 2017

1 commit

3c2a89ddc net: xfrm: revert to lower xfrm dst gc limit

revert c386578f1cdb4dac230395 ("xfrm: Let the flowcache handle its size by default.").

Once we remove flow cache, we don't have a flow cache limit anymore.
We must not allow (virtually) unlimited allocations of xfrm dst entries.
Revert back to the old xfrm dst gc limits.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2017-07-19 02:13:41 +0800

12 Jul, 2017

1 commit

5e34fa23c net: Fix minor code bug in timestamping.txt

Passing (void*)val instead of &val would make a pointer out of an integer
and cause sock_setsockopt to -EFAULT.

See tools/testing/selftests/networking/timestamping/timestamping.c
for a working example.
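
The corrected calling convention can be sketched as follows. This is a minimal illustration, assuming a Linux system where `SO_TIMESTAMPING` and the `SOF_TIMESTAMPING_*` flags are available from the headers shown. `enable_rx_sw_timestamping` is a hypothetical helper and the chosen flag combination is only an example.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* Hypothetical helper: enables software receive timestamps on fd.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int enable_rx_sw_timestamping(int fd)
{
    int val = SOF_TIMESTAMPING_RX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;

    /* Pass the address of val. Writing (void *)val instead would turn
     * the integer itself into a bogus pointer. */
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}
```

The option value is always passed by address together with its size, even when it is a plain integer.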

Cc: David S. Miller
Cc: netdev@vger.kernel.org
Signed-off-by: Ahmad Fatoum
Signed-off-by: David S. Miller

Ahmad Fatoum
2017-07-12 04:34:54 +0800