Eric Lee / smarc-fsl-linux-kernel

14 Oct, 2013

5 commits

bdf831a68 net: net_secret should not depend on TCP

[ Upstream commit 9a3bab6b05383f1e4c3716b3615500c51285959e ]

A host might need net_secret[] and never open a single socket.

Problem added in commit aebda156a570782
("net: defer net_secret[] initialization")

Based on prior patch from Hannes Frederic Sowa.

Reported-by: Hannes Frederic Sowa
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2013-10-14 07:08:30 +0800
e66bdd710 netpoll: fix NULL pointer dereference in netpoll_cleanup

[ Upstream commit d0fe8c888b1fd1a2f84b9962cabcb98a70988aec ]

I've been hitting a NULL ptr deref while using netconsole because the
np->dev check and the pointer manipulation in netpoll_cleanup are done
without rtnl and the following sequence happens when having a netconsole
over a vlan and we remove the vlan while disabling the netconsole:
CPU 1                                   CPU 2
                                        removes vlan and calls the notifier
enters store_enabled(), calls
netpoll_cleanup which checks np->dev
and then waits for rtnl
                                        executes the netconsole netdev
                                        release notifier making np->dev
                                        == NULL and releases rtnl
continues to dereference a member of
np->dev which at this point is == NULL

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Nikolay Aleksandrov
2013-10-14 07:08:29 +0800
0885f6b8b netpoll: Should handle ETH_P_ARP other than ETH_P_IP in netpoll_neigh_reply

[ Upstream commit b0dd663b60944a3ce86430fa35549fb37968bda0 ]

The type of a received ARP request in the Ethernet packet header is ETH_P_ARP rather than ETH_P_IP.

[ Bug introduced by commit b7394d2429c198b1da3d46ac39192e891029ec0f
("netpoll: prepare for ipv6") ]

Signed-off-by: Sonic Zhang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Sonic Zhang
2013-10-14 07:08:29 +0800
98913d075 net: flow_dissector: fix thoff for IPPROTO_AH

[ Upstream commit b86783587b3d1d552326d955acee37eac48800f1 ]

In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for
later usage"), we missed that existing code was using nhoff as a
temporary variable that might not always contain the transport header
offset.

This is not a problem for TCP/UDP because port offset (@poff)
is 0 for these protocols.

Signed-off-by: Eric Dumazet
Cc: Daniel Borkmann
Cc: Nikolay Aleksandrov
Acked-by: Nikolay Aleksandrov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2013-10-14 07:08:29 +0800
b991056a4 net: fix multiqueue selection

[ Upstream commit 50d1784ee4683f073c0362ee360bfae7a3333d6c ]

commit 416186fbf8c5b4e4465 ("net: Split core bits of netdev_pick_tx
into __netdev_pick_tx") added a bug that disables caching of queue
index in the socket.

This is the source of packet reorders for TCP flows, and
again this is happening more often when using FQ pacing.

Old code was doing

    if (queue_index != old_index)
        sk_tx_queue_set(sk, queue_index);

Alexander renamed the variables but forgot to change sk_tx_queue_set()
2nd parameter.

    if (queue_index != new_index)
        sk_tx_queue_set(sk, queue_index);

This means we store -1 over and over in sk->sk_tx_queue_mapping.
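
The effect of passing the stale variable can be reproduced in a small userspace model. This is only a sketch: `struct sock_model`, `sk_tx_queue_set_model()` and the two `pick_tx_*()` functions are hypothetical stand-ins, not the kernel's code, and the assumed fix is simply to store `new_index` instead of `queue_index`.

```c
/* Hypothetical stand-in for the socket's cached TX queue (-1 = nothing cached). */
struct sock_model {
    int tx_queue_mapping;
};

static void sk_tx_queue_set_model(struct sock_model *sk, int queue)
{
    sk->tx_queue_mapping = queue;
}

/* Buggy variant: the comparison is right, but the old value is stored again. */
static int pick_tx_buggy(struct sock_model *sk, int computed_index)
{
    int queue_index = sk->tx_queue_mapping; /* previously cached value */
    int new_index = computed_index;         /* freshly selected queue */

    if (queue_index != new_index)
        sk_tx_queue_set_model(sk, queue_index); /* re-stores the stale value */
    return new_index;
}

/* Assumed fix: cache the freshly selected queue. */
static int pick_tx_fixed(struct sock_model *sk, int computed_index)
{
    int queue_index = sk->tx_queue_mapping;
    int new_index = computed_index;

    if (queue_index != new_index)
        sk_tx_queue_set_model(sk, new_index);
    return new_index;
}
```

With a socket that starts at -1, the buggy variant never updates the cache, so every packet goes through queue selection again; the fixed variant caches the index on the first call.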

Signed-off-by: Eric Dumazet
Cc: Alexander Duyck
Acked-by: Alexander Duyck
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2013-10-14 07:08:28 +0800

27 Sep, 2013

1 commit

979ad974d net: Check the correct namespace when spoofing pid over SCM_RIGHTS

commit d661684cf6820331feae71146c35da83d794467e upstream.

This is a security bug.

The follow-up will fix nsproxy to discourage this type of issue from
happening again.

Signed-off-by: Andy Lutomirski
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Andy Lutomirski
2013-09-27 08:18:05 +0800

14 Sep, 2013

5 commits

56a12aceb net: revert 8728c544a9c ("net: dev_pick_tx() fix")

[ Upstream commit 702821f4ea6f68db18aa1de7d8ed62c6ba586a64 ]

commit 8728c544a9cbdc ("net: dev_pick_tx() fix") and commit
b6fe83e9525a ("bonding: refine IFF_XMIT_DST_RELEASE capability")
are quite incompatible : Queue selection is disabled because skb
dst was dropped before entering bonding device.

This causes major performance regression, mainly because TCP packets
for a given flow can be sent to multiple queues.

This is particularly visible when using the new FQ packet scheduler
with MQ + FQ setup on the slaves.

We can safely revert the first commit now that 416186fbf8c5b
("net: Split core bits of netdev_pick_tx into __netdev_pick_tx")
properly caps the queue_index.

Reported-by: Xi Wang
Diagnosed-by: Xi Wang
Signed-off-by: Eric Dumazet
Cc: Tom Herbert
Cc: Alexander Duyck
Cc: Denys Fedorysychenko
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2013-09-14 21:54:56 +0800
f784dbb9b rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header

[ Upstream commit 3e805ad288c524bb65aad3f1e004402223d3d504 ]

Fix the iproute2 command `bridge vlan show`, after switching from
rtgenmsg to ifinfomsg.

Let's start with a little history:

Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in
the 3.9 merge window.
In the kernel commit 6cbdceeb, he added attribute support to
bridge GETLINK requests sent with rtgenmsg.

Mar 6th: Vlad got this iproute2 reference implementation of the bridge
vlan netlink interface accepted (iproute2 9eff0e5c)

Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
http://patchwork.ozlabs.org/patch/239602/
http://marc.info/?t=136680900700007

Apr 28th: Linus released 3.9

Apr 30th: Stephen released iproute2 3.9.0

The `bridge vlan show` command hasn't been working since the switch to
ifinfomsg, nor in any released version of iproute2, because the kernel side
only supports rtgenmsg, which iproute2 switched away from just prior to
the iproute2 3.9.0 release.

I haven't been able to find any documentation about either rtgenmsg
or ifinfomsg, or about which situation calls for which, but kernel commit
88c5b5ce seems to suggest that ifinfomsg should be used.

Fixing this in the kernel will break compatibility, but I doubt that anybody
has been using it due to this bug in the user space reference
implementation, at least not without noticing this bug. That said, the
functionality is still fully functional in 3.9 when reverting iproute2
commit 63338dca.

This could also be fixed in iproute2, but that's an ugly patch that would
reintroduce rtgenmsg in iproute2, and from searching in netdev it seems
like rtgenmsg usage is discouraged. I'm assuming that the only reason
that Vlad implemented the kernel side to use rtgenmsg was because
iproute2 was using it at the time.

Signed-off-by: Asbjoern Sloth Toennesen
Reviewed-by: Vlad Yasevich
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Asbjoern Sloth Toennesen
2013-09-14 21:54:55 +0800
21db4be13 rtnetlink: Fix inverted check in ndo_dflt_fdb_del()

[ Upstream commit 645359930231d5e78fd3296a38b98c1a658a7ade ]

Fix inverted check when deleting an fdb entry.

Signed-off-by: Sridhar Samudrala
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Sridhar Samudrala
2013-09-14 21:54:54 +0800
5cf1ad6c6 neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup

[ Upstream commit 63134803a6369dcf7dddf7f0d5e37b9566b308d2 ]

dev->ndo_neigh_setup() might need some of the values of neigh_parms, so
populate them before calling it.

Signed-off-by: Veaceslav Falico
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Veaceslav Falico
2013-09-14 21:54:54 +0800
4691236ce net: check net.core.somaxconn sysctl values

[ Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509 ]

It's possible to assign an invalid value to the net.core.somaxconn
sysctl variable, because there are no checks at all.

The sk_max_ack_backlog field of the sock structure is defined as
unsigned short. Therefore, the backlog argument in inet_listen()
shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
is truncated to the somaxconn value. So, the somaxconn value shouldn't
exceed 65535 (USHRT_MAX).
Also, negative values of somaxconn are meaningless.

before:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
net.core.somaxconn = 65536
$ sysctl -w net.core.somaxconn=-100
net.core.somaxconn = -100

after:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
error: "Invalid argument" setting key "net.core.somaxconn"
$ sysctl -w net.core.somaxconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"
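
The range rule described above (0 through USHRT_MAX inclusive) can be sketched as a plain validation function. This is a userspace illustration only: `set_somaxconn_model()` is a hypothetical name and does not show how the kernel's sysctl table enforces the limit.

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical model of the check: accept only values that fit in the
 * unsigned short sk_max_ack_backlog field and leave the stored value
 * untouched when the request is rejected.
 */
static int set_somaxconn_model(int *somaxconn, long requested)
{
    if (requested < 0 || requested > USHRT_MAX)
        return -EINVAL;
    *somaxconn = (int)requested;
    return 0;
}
```

Rejecting the write outright, rather than clamping it, matches the "Invalid argument" errors shown in the "after" transcript.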

Based on a prior patch from Changli Gao.

Signed-off-by: Roman Gushchin
Reported-by: Changli Gao
Suggested-by: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Roman Gushchin
2013-09-14 21:54:54 +0800

29 Jul, 2013

2 commits

37b25f3f9 vlan: mask vlan prio bits

[ Upstream commit d4b812dea4a236f729526facf97df1a9d18e191c ]

In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could not find a vlan device for this
particular id.

But we also have a problem if prio bits are set.

Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.
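
The tag value 4000 quoted above is hexadecimal (0x4000). As a worked example of why that means "id 0, prio 2", here is a minimal decoder for the standard 802.1Q tag control field (3 priority bits, 1 DEI bit, 12 VLAN id bits). The macro and function names are illustrative and are not the kernel's.

```c
#include <stdint.h>

#define TCI_VID_MASK   0x0fffu /* low 12 bits: VLAN id */
#define TCI_PRIO_SHIFT 13      /* top 3 bits: priority */

/* VLAN id carried in a 16-bit tag control field. */
static unsigned tci_vid(uint16_t tci)
{
    return tci & TCI_VID_MASK;
}

/* Priority (PCP) carried in a 16-bit tag control field. */
static unsigned tci_prio(uint16_t tci)
{
    return (unsigned)tci >> TCI_PRIO_SHIFT;
}
```

For 0x4000 the id bits are all zero while the priority bits are 010, so a check that looks only at the VLAN id treats the frame as untagged even though the tag field as a whole is nonzero.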

Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :

16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

It seems we are not really ready to properly cope with this right now.

We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.

For stable kernels, let's clear vlan_tci to fix the bugs.

Reported-by: Steinar H. Gunderson
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2013-07-29 07:30:05 +0800
ac294f13d neighbour: fix a race in neigh_destroy()

[ Upstream commit c9ab4d85de222f3390c67aedc9c18a50e767531e ]

There is a race in the neighbour code, because neigh_destroy() uses
skb_queue_purge(&neigh->arp_queue) without holding the neighbour lock,
while other parts of the code assume the neighbour rwlock is what
protects arp_queue.

Convert all skb_queue_purge() calls to the __skb_queue_purge() variant.

Use __skb_queue_head_init() instead of skb_queue_head_init()
to make clear we do not use arp_queue.lock

And hold neigh->lock in neigh_destroy() to close the race.

Reported-by: Joe Jin
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2013-07-29 07:29:44 +0800

27 Jun, 2013

1 commit

5dbe7c178 net: fix kernel deadlock with interface rename and netdev name retrieval.

When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
rename of a network interface, it can end up waiting for a workqueue
to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
the fact that read_seqcount_begin() will spin forever waiting for the
writer process (the one doing the interface rename) to update the
devnet_rename_seq sequence.

This patch fixes the problem by adding a helper (netdev_get_name())
and using it in the code handling the SIOCGIFNAME ioctl and
SO_BINDTODEVICE getsockopt.

The netdev_get_name() helper uses raw_seqcount_begin() to avoid
spinning forever, waiting for devnet_rename_seq->sequence to become
even. cond_resched() is used in the contended case, before retrying
the access to give the writer process a chance to finish.

The use of raw_seqcount_begin() will incur some unneeded work in the
reader process in the contended case, but this is better than
deadlocking the system.
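
The reader strategy described above can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel helper: `struct name_table`, `try_read_name()` and `get_name()` are hypothetical names, `sched_yield()` stands in for cond_resched(), and the plain copy of the name is tolerated only because a torn snapshot is discarded (a production version would need properly synchronized data accesses).

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

struct name_table {
    atomic_uint seq; /* odd while a writer is in the middle of an update */
    char name[16];
};

/*
 * One read attempt. The start value has its low bit cleared, so an update
 * that is already in progress makes the final comparison fail instead of
 * making the reader spin until the counter becomes even.
 */
static bool try_read_name(struct name_table *t, char *out, size_t len)
{
    unsigned start = atomic_load_explicit(&t->seq, memory_order_acquire) & ~1u;

    strncpy(out, t->name, len - 1);
    out[len - 1] = '\0';

    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&t->seq, memory_order_relaxed) == start;
}

/* Retry loop: yield the CPU between attempts so the writer can finish. */
static void get_name(struct name_table *t, char *out, size_t len)
{
    while (!try_read_name(t, out, len))
        sched_yield();
}
```

The cost is the one the commit message mentions: in the contended case the reader copies data it will throw away, but it never busy-waits on a writer that cannot run.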

Signed-off-by: Nicolas Schichan
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Nicolas Schichan
2013-06-27 04:42:54 +0800

26 Jun, 2013

1 commit

bd8a7036c gre: fix a possible skb leak

commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
added a possible skb leak, because it frees only the head of segment
list, in case a skb_linearize() call fails.

This patch adds a kfree_skb_list() helper to fix the bug.
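
The idea behind such a helper is to walk the segment list and free every element, not just the head. A userspace sketch of that pattern follows; `struct seg` and `free_seg_list()` are hypothetical stand-ins that use malloc()/free() and are not the kernel's sk_buff code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a segment linked through ->next. */
struct seg {
    struct seg *next;
};

/* Free every element of the list and return how many were freed. */
static size_t free_seg_list(struct seg *segs)
{
    size_t freed = 0;

    while (segs) {
        struct seg *next = segs->next; /* read the link before freeing */

        free(segs);
        segs = next;
        freed++;
    }
    return freed;
}
```

Freeing only the head, as the buggy error path did, would leave every segment after the first one allocated.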

Signed-off-by: Eric Dumazet
Cc: Pravin B Shelar
Cc: Daniel Borkmann
Signed-off-by: David S. Miller

Eric Dumazet
2013-06-26 07:07:44 +0800

18 Jun, 2013

1 commit

788dfcaca vlan: restore ethtool ABI to control VLAN hardware acceleration

As part of the push to add 802.1ad service provider tagging support to the
kernel, the VLAN feature flags were renamed. Unfortunately the name
for the VLAN hardware acceleration features that the kernel shows user space
was included in the rename, which broke ethtool (the txvlan and rxvlan options
do not work). This patch restores the original names, i.e. the original ABI.
If we want to make clear to users that we are referring to CTAGs we can
always change ethtool's short_name and long_name for these features (for
example something along the lines of txvlan -> txvlan-ctag, tx-vlan-offload ->
tx-vlan-ctag-offload).

Cc: Patrick McHardy
Cc: David S. Miller
Cc: netdev@vger.kernel.org
Signed-off-by: Fernando Luis Vazquez Cao
Reviewed-by: Ben Hutchings
Signed-off-by: David S. Miller

Fernando Luis Vazquez Cao
2013-06-18 08:09:35 +0800

11 Jun, 2013

1 commit

ed13998c3 sock_diag: fix filter code sent to userspace

Filters need to be translated to real BPF code for userland, like SO_GETFILTER.

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2013-06-11 13:23:32 +0800

05 Jun, 2013

1 commit

5e71d9d77 net: fix sk_buff head without data area

Eric Dumazet spotted that we have to check skb->head instead
of skb->data as skb->head points to the beginning of the
data area of the skbuff. Similarly, we have to initialize the
skb->head pointer, not skb->data in __alloc_skb_head.

After this fix, netlink crashes in the release path of the
sk_buff, so let's fix that as well.

This bug was introduced in (0ebd0ac net: add function to
allocate sk_buff head without data area).

Reported-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Pablo Neira
2013-06-05 08:26:49 +0800

01 Jun, 2013

4 commits

b190a5087 net/core: dev_mc_sync_multiple calls wrong helper

The dev_mc_sync_multiple function is currently calling
__hw_addr_sync, and not __hw_addr_sync_multiple. This will result in
addresses only being synced to the first device from the set.

Corrected by calling the _multiple variant.

Signed-off-by: Jay Vosburgh
Reviewed-by: Vlad Yasevich
Tested-by: Shawn Bohrer
Signed-off-by: David S. Miller

Jay Vosburgh
2013-06-01 07:56:56 +0800
29ca2f8fc net/core: __hw_addr_sync_one / _multiple broken

Currently, __hw_addr_sync_one is called in a loop by
__hw_addr_sync_multiple to sync each of a "from" device's hw addresses
to a "to" device. __hw_addr_sync_one calls __hw_addr_add_ex to attempt
to add each address. __hw_addr_add_ex is called with global=false, and
sync=true.

__hw_addr_add_ex checks to see if the new address matches an
address already on the list. If so, it tests global and sync. In this
case, sync=true, and it then checks if the address is already synced,
and if so, returns 0.

This 0 return causes __hw_addr_sync_one to increment the sync_cnt
and refcount for the "from" list's address entry, even though the address
is already synced and has a reference and sync_cnt. This will cause
the sync_cnt and refcount to increment without bound every time an
address is added to the "from" device and synced to the "to" device.

The fix here has two parts:

First, when __hw_addr_add_ex finds the address already exists
and is synced, return -EEXIST instead of 0.

Second, __hw_addr_sync_one checks the error return for -EEXIST,
and if so, it (a) does not add a refcount/sync_cnt, and (b) returns 0
itself so that __hw_addr_sync_multiple will not return an error.
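
The two-part fix can be modelled with a single address entry on each list. This is a simplified sketch: `struct addr_entry`, `add_synced()` and `sync_one()` are hypothetical names that mirror only the return-value handling described above, not the real list code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, single-address model of a hardware address list entry. */
struct addr_entry {
    bool present;
    bool synced;
    int refcount;
    int sync_cnt;
};

/* Models the add with global=false, sync=true on the "to" list. */
static int add_synced(struct addr_entry *to)
{
    if (to->present) {
        if (to->synced)
            return -EEXIST; /* part one: report "already synced" */
        to->synced = true;
        to->refcount++;
        return 0;
    }
    to->present = true;
    to->synced = true;
    to->refcount = 1;
    return 0;
}

/* Models the sync of one "from" address to the "to" list. */
static int sync_one(struct addr_entry *to, struct addr_entry *from)
{
    int err = add_synced(to);

    if (err == -EEXIST)
        return 0; /* part two: no extra counts, and no error upward */
    if (err)
        return err;
    from->sync_cnt++;
    from->refcount++;
    return 0;
}
```

Calling `sync_one()` repeatedly for the same pair now bumps the "from" counters once only, which is the behaviour the unbounded increment was breaking.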

Signed-off-by: Jay Vosburgh
Reviewed-by: Vlad Yasevich
Tested-by: Shawn Bohrer
Signed-off-by: David S. Miller

Jay Vosburgh
2013-06-01 07:56:56 +0800
60ba834c2 net/core: __hw_addr_unsync_one "from" address not marked synced

When an address is added to a subordinate interface (the "to"
list), the address entry in the "from" list is not marked "synced" as
the entry added to the "to" list is.

When performing the unsync operation (e.g., dev_mc_unsync),
__hw_addr_unsync_one calls __hw_addr_del_entry with the "synced"
parameter set to true for the case when the address reference is being
released from the "from" list. This causes a test inside to fail,
with the result being that the reference count on the "from" address
is not properly decremented and the address on the "from" list will
never be freed.

Correct this by having __hw_addr_unsync_one call the
__hw_addr_del_entry function with the "sync" flag set to false for the
"remove from the from list" case.

Signed-off-by: Jay Vosburgh
Reviewed-by: Vlad Yasevich
Tested-by: Shawn Bohrer
Signed-off-by: David S. Miller

Jay Vosburgh
2013-06-01 07:56:56 +0800
9747ba663 net/core: __hw_addr_create_ex does not initialize sync_cnt

The sync_cnt field is not being initialized, which can result
in arbitrary values in the field. Fixed by initializing it to zero.

Signed-off-by: Jay Vosburgh
Reviewed-by: Vlad Yasevich
Tested-by: Shawn Bohrer
Signed-off-by: David S. Miller

Jay Vosburgh
2013-06-01 07:56:56 +0800

29 May, 2013

1 commit

456db6a4d net/core/sock.c: add missing VSOCK string in af_family_*_key_strings

The three arrays of strings, af_family_key_strings,
af_family_slock_key_strings and af_family_clock_key_strings, lack the
VSOCK string.

Signed-off-by: Federico Vaga
Signed-off-by: David S. Miller

Federico Vaga
2013-05-29 14:58:49 +0800

20 May, 2013

1 commit

d2f83e907 Hoist memcpy_fromiovec/memcpy_toiovec into lib/

ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!

That function is only present with CONFIG_NET. Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.

In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.

socket.h already includes uio.h, so no callers need updating; trying
only broke things for x86_64 randconfig (thanks Fengguang!).

Reported-by: Randy Dunlap
Acked-by: David S. Miller
Acked-by: Michael S. Tsirkin
Signed-off-by: Rusty Russell

Rusty Russell
2013-05-20 08:54:22 +0800

12 May, 2013

1 commit

f77d60212 ipv6: do not clear pinet6 field

We have seen multiple NULL dereferences in __inet6_lookup_established()

After analysis, I found that inet6_sk() could be NULL while the
check for sk_family == AF_INET6 was true.

Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP
and TCP stacks.

Once an IPv6 socket using SLAB_DESTROY_BY_RCU is inserted in a hash
table, we can no longer clear the pinet6 field.

This patch extends logic used in commit fcbdf09d9652c891
("net: fix nulls list corruptions in sk_prot_alloc")

TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
to make sure we do not clear pinet6 field.

At socket clone phase, we do not really care, as cloning the parent (non
NULL) pinet6 is not adding a fatal race.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-05-12 07:26:38 +0800

09 May, 2013

1 commit

19acc3272 gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()

Rather than having logic to calculate the inner protocol in every
tunnel gso handler, move it to the gso code. This simplifies the code.

Cc: Eric Dumazet
Cc: Cong Wang
Cc: David S. Miller
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

Pravin B Shelar
2013-05-09 04:13:30 +0800

06 May, 2013

2 commits

a3dbbc2ba netpoll: inverted down_trylock() test

The return value is reversed from mutex_trylock().
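
The two return conventions are easy to mix up: the kernel's mutex_trylock() returns 1 when the lock was taken and 0 on contention, while down_trylock() returns 0 when the semaphore was taken and 1 otherwise. The userspace model below only mimics those return values; `struct lock_model` and the `*_model()` functions are hypothetical, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical single-holder lock used only to model return values. */
struct lock_model {
    bool held;
};

/* mutex_trylock() convention: 1 on success, 0 if already held. */
static int mutex_trylock_model(struct lock_model *l)
{
    if (l->held)
        return 0;
    l->held = true;
    return 1;
}

/* down_trylock() convention: 0 on success, 1 if already held. */
static int down_trylock_model(struct lock_model *l)
{
    if (l->held)
        return 1;
    l->held = true;
    return 0;
}

/* A caller written for the semaphore convention: nonzero means "busy". */
static bool poll_with_sem(struct lock_model *l)
{
    if (down_trylock_model(l))
        return false; /* could not take it, skip the work */
    /* ... protected work would go here ... */
    l->held = false;  /* release, standing in for up() */
    return true;
}
```

A test carried over unchanged from the mutex version, such as `if (!trylock(...)) return;`, bails out exactly when the semaphore was acquired, which is the inversion this commit corrects.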

Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller

Dan Carpenter
2013-05-06 23:06:52 +0800
243198d09 rps_dev_flow_table_release(): no need to delay vfree()

The same story as with fib_trie patch - vfree() from RCU callbacks
is legitimate now.

Signed-off-by: Al Viro
Signed-off-by: David S. Miller

Al Viro
2013-05-06 23:06:51 +0800

03 May, 2013

2 commits

b29d31451 net: vlan,ethtool: netdev_features_t is more than 32 bit

Signed-off-by: Bjørn Mork
Signed-off-by: David S. Miller

Bjørn Mork
2013-05-03 01:58:12 +0800
6708c9e5c net: use netdev_features_t in skb_needs_linearize()

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2013-05-03 01:58:12 +0800

02 May, 2013

5 commits

20b4fb485 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...

Linus Torvalds
2013-05-02 08:51:54 +0800
a8ca16ea7 proc: Supply a function to remove a proc entry by PDE

Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer. This allows us to eliminate all remaining pde->name
accesses outside of procfs.

Signed-off-by: David Howells
Acked-by: Grant Likely
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:46 +0800
0bb80f240 proc: Split the namespace stuff out into linux/proc_ns.h

Split the proc namespace stuff out into linux/proc_ns.h.

Signed-off-by: David Howells
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn
cc: Eric W. Biederman
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:39 +0800
73287a43c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:
"Highlights (1721 non-merge commits, this has to be a record of some
sort):

1) Add 'random' mode to team driver, from Jiri Pirko and Eric
Dumazet.

2) Make it so that any driver that supports configuration of multiple
MAC addresses can provide the forwarding database add and del
calls by providing a default implementation and hooking that up if
the driver doesn't have an explicit set of handlers. From Vlad
Yasevich.

3) Support GSO segmentation over tunnels and other encapsulating
devices such as VXLAN, from Pravin B Shelar.

4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
Dukkipati.

6) In the PHY layer, allow supporting wake-on-lan in situations where
the PHY registers have to be written for it to be configured.

Use it to support wake-on-lan in mv643xx_eth.

From Michael Stapelberg.

7) Significantly improve firewire IPV6 support, from YOSHIFUJI
Hideaki.

8) Allow multiple packets to be sent in a single transmission using
network coding in batman-adv, from Martin Hundebøll.

9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

10) Generalize the VXLAN forwarding tables so that there is more
flexibility in configuring various aspects of the endpoints.
From David Stevens.

11) Support RSS and TSO in hardware over GRE tunnels in bnx2x driver,
from Dmitry Kravkov.

12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo
Neira Ayuso.

13) Start adding networking selftests.

14) In situations of overload on the same AF_PACKET fanout socket, or
per-cpu packet receive queue, minimize drop by distributing the
load to other cpus/fanouts. From Willem de Bruijn and Eric
Dumazet.

15) Add support for new payload offset BPF instruction, from Daniel
Borkmann.

16) Convert several drivers over to module_platform_driver(), from
Sachin Kamat.

17) Provide a minimal BPF JIT image disassembler userspace tool, from
Daniel Borkmann.

18) Rewrite F-RTO implementation in TCP to match the final
specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
you like netlink, so I implemented netlink dumping of netlink
sockets.") From Andrey Vagin.

20) Remove ugly passing of rtnetlink attributes into rtnl_doit
functions, from Thomas Graf.

21) Allow userspace to be able to see if a configuration change occurs
in the middle of an address or device list dump, from Nicolas
Dichtel.

22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
Frederic Sowa.

23) Increase accuracy of packet length used by packet scheduler, from
Jason Wang.

24) Beginning set of changes to make ipv4/ipv6 fragment handling more
scalable and less susceptible to overload and locking contention,
from Jesper Dangaard Brouer.

25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
instead. From Hong Zhiguo.

26) Optimize route usage in IPVS by avoiding reference counting where
possible, from Julian Anastasov.

27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
Eitzenberger.

29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
nfnetlink_log, and nfnetlink_queue. From Gao feng.

30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

31) Support several new r8169 chips, from Hayes Wang.

32) Support tokenized interface identifiers in ipv6, from Daniel
Borkmann.

33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

34) Add 802.1ad vlan offload support, from Patrick McHardy.

35) Support mmap() based netlink communication, also from Patrick
McHardy.

36) Support HW timestamping in mlx4 driver, from Amir Vadai.

37) Rationalize AF_PACKET packet timestamping when transmitting, from
Willem de Bruijn and Daniel Borkmann.

38) Bring parity to what's provided by /proc/net/packet socket dumping
and the info provided by netlink socket dumping of AF_PACKET
sockets. From Nicolas Dichtel.

39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
filter: fix va_list build error
af_unix: fix a fatal race with bit fields
bnx2x: Prevent memory leak when cnic is absent
bnx2x: correct reading of speed capabilities
net: sctp: attribute printl with __printf for gcc fmt checks
netlink: kconfig: move mmap i/o into netlink kconfig
netpoll: convert mutex into a semaphore
netlink: Fix skb ref counting.
net_sched: act_ipt forward compat with xtables
mlx4_en: fix a build error on 32bit arches
Revert "bnx2x: allow nvram test to run when device is down"
bridge: avoid OOPS if root port not found
drivers: net: cpsw: fix kernel warn on cpsw irq enable
sh_eth: use random MAC address if no valid one supplied
3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
tg3: fix to append hardware time stamping flags
unix/stream: fix peeking with an offset larger than data in queue
unix/dgram: fix peeking with an offset larger than data in queue
unix/dgram: peek beyond 0-sized skbs
openvswitch: Remove unneeded ovs_netdev_get_ifindex()
...

Linus Torvalds
2013-05-02 05:08:52 +0800
bd7c4b604 netpoll: convert mutex into a semaphore

Bart Van Assche recently reported a warning to me:

[] warn_slowpath_common+0x7f/0xc0
[] warn_slowpath_null+0x1a/0x20
[] mutex_trylock+0x16d/0x180
[] netpoll_poll_dev+0x49/0xc30
[] ? __alloc_skb+0x82/0x2a0
[] netpoll_send_skb_on_dev+0x265/0x410
[] netpoll_send_udp+0x28a/0x3a0
[] ? write_msg+0x53/0x110 [netconsole]
[] write_msg+0xcf/0x110 [netconsole]
[] call_console_drivers.constprop.17+0xa1/0x1c0
[] console_unlock+0x2d6/0x450
[] vprintk_emit+0x1ee/0x510
[] printk+0x4d/0x4f
[] scsi_print_command+0x7d/0xe0 [scsi_mod]

This resulted from my commit ca99ca14c which introduced a mutex_trylock
operation in a path that could execute in interrupt context. When mutex
debugging is enabled, the above warns the user when we are in fact
executing in interrupt context.

After some discussion, it seems that a semaphore is the proper mechanism to use
here. While mutexes are defined to be unusable in interrupt context, no such
condition exists for semaphores (save for the fact that the non-blocking API
calls, like up and down_trylock, must be used when in irq context).

Signed-off-by: Neil Horman
Reported-by: Bart Van Assche
CC: Bart Van Assche
CC: David Miller
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller

Neil Horman
2013-05-02 03:00:24 +0800

30 Apr, 2013

5 commits

58717686c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/net/ethernet/emulex/benet/be.h
include/net/tcp.h
net/mac802154/mac802154.h

Most conflicts were minor overlapping stuff.

The be2net driver brought in some fixes that added __vlan_put_tag
calls, which in net-next take an additional argument.

Signed-off-by: David S. Miller

David S. Miller
2013-04-30 15:55:20 +0800
39cc86130 unix/dgram: fix peeking with an offset larger than data in queue

Currently, peeking on a unix datagram socket with an offset larger than len of
the data in the sk receive queue returns immediately with bogus data. That's
because *off is not reset between each skb_queue_walk().

This patch fixes this so that the behavior is the same as peeking with no
offset on an empty queue: the caller blocks.

Signed-off-by: Benjamin Poirier
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Benjamin Poirier
2013-04-30 12:43:54 +0800
add05ad4e unix/dgram: peek beyond 0-sized skbs

"77c1090 net: fix infinite loop in __skb_recv_datagram()" (v3.8) introduced a
regression:
After that commit, recv can no longer peek beyond a 0-sized skb in the queue.
__skb_recv_datagram() instead stops at the first skb with len == 0 and results
in the system call failing with -EFAULT via skb_copy_datagram_iovec().

When peeking at an offset with 0-sized skb(s), each one of those is received
only once, in sequence. The offset starts moving forward again after receiving
datagrams with len > 0.

Signed-off-by: Benjamin Poirier
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Benjamin Poirier
2013-04-30 12:43:54 +0800
0c772159d net: Use consume_skb() to free gso segmented skb

Use consume_skb() to free the original skb that is successfully transmitted
as gso segmented skbs so that it is not treated as a drop due to an error.

Signed-off-by: Sridhar Samudrala
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Sridhar Samudrala
2013-04-30 12:18:13 +0800
70e3ba72b net/core: remove duplicate statements by do-while loop

Remove duplicate statements by using do-while loop instead of while loop.

-   A;
-   while (e) {
+   do {
        A;
-   }
+   } while (e);

Signed-off-by: Akinobu Mita
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2013-04-30 09:28:43 +0800