18 Nov, 2016
1 commit
-
Make struct pernet_operations::id unsigned.
There are 2 reasons to do so:
1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data."int" being used as an array index needs to be sign-extended
to 64-bit before being used.void f(long *p, int i)
{
g(p[i]);
}roughly translates to
movsx rsi, esi
mov rdi, [rsi+...]
call gMOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}And this function is used a lot, so those sign extensions add up.
Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]However, overall balance is in negative direction:
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
15 Nov, 2016
1 commit
-
Similar to commit 14135f30e33c ("inet: fix sleeping inside inet_wait_for_connect()"),
sk_wait_event() needs to fix too, because release_sock() is blocking,
it changes the process state back to running after sleep, which breaks
the previous prepare_to_wait().Switch to the new wait API.
Cc: Eric Dumazet
Cc: Peter Zijlstra
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
21 Oct, 2016
1 commit
-
firewire-net:
- set min/max_mtu
- remove fwnet_change_mtunes:
- set max_mtu
- clean up nes_netdev_change_mtuxpnet:
- set min/max_mtu
- remove xpnet_dev_change_mtuhippi:
- set min/max_mtu
- remove hippi_change_mtubatman-adv:
- set max_mtu
- remove batadv_interface_change_mtu
- initialization is a little async, not 100% certain that max_mtu is set
in the optimal place, don't have hardware to test withrionet:
- set min/max_mtu
- remove rionet_change_mtuslip:
- set min/max_mtu
- streamline sl_change_mtuum/net_kern:
- remove pointless ndo_change_mtuhsi/clients/ssi_protocol:
- use core MTU range checking
- remove now redundant ssip_pn_set_mtuipoib:
- set a default max MTU value
- Note: ipoib's actual max MTU can vary, depending on if the device is in
connected mode or not, so we'll just set the max_mtu value to the max
possible, and let the ndo_change_mtu function continue to validate any new
MTU change requests with checks for CM or not. Note that ipoib has no
min_mtu set, and thus, the network core's mtu > 0 check is the only lower
bounds here.mptlan:
- use net core MTU range checking
- remove now redundant mpt_lan_change_mtufddi:
- min_mtu = 21, max_mtu = 4470
- remove now redundant fddi_change_mtu (including export)fjes:
- min_mtu = 8192, max_mtu = 65536
- The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
get past the core net MTU range checks so fjes_change_mtu can validate a
new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)hsr:
- min_mtu = 0 (calls ether_setup, max_mtu is 1500)f_phonet:
- min_mtu = 6, max_mtu = 65541u_ether:
- min_mtu = 14, max_mtu = 15412phonet/pep-gprs:
- min_mtu = 576, max_mtu = 65530
- remove redundant gprs_set_mtuCC: netdev@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: Stefan Richter
CC: Faisal Latif
CC: linux-rdma@vger.kernel.org
CC: Cliff Whickman
CC: Robin Holt
CC: Jes Sorensen
CC: Marek Lindner
CC: Simon Wunderlich
CC: Antonio Quartulli
CC: Sathya Prakash
CC: Chaitra P B
CC: Suganath Prabu Subramani
CC: MPT-FusionLinux.pdl@broadcom.com
CC: Sebastian Reichel
CC: Felipe Balbi
CC: Arvid Brodin
CC: Remi Denis-Courmont
Signed-off-by: Jarod Wilson
Signed-off-by: David S. Miller
11 Feb, 2016
1 commit
-
In order to support fast reuseport lookups in TCP, the hash function
defined in struct proto must be capable of returning an error code.
This patch changes the function signature of all related hash functions
to return an integer and handles or propagates this return value at
all call sites.Signed-off-by: Craig Gallek
Signed-off-by: David S. Miller
13 Jan, 2016
1 commit
-
Ivaylo Dimitrov reported a regression caused by commit 7866a621043f
("dev: add per net_device packet type chains").skb->dev becomes NULL and we crash in __netif_receive_skb_core().
Before above commit, different kind of bugs or corruptions could happen
without major crash.But the root cause is that phonet_rcv() can queue skb without checking
if skb is shared or not.Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
Reported-by: Ivaylo Dimitrov
Tested-by: Ivaylo Dimitrov
Signed-off-by: Eric Dumazet
Cc: Remi Denis-Courmont
Signed-off-by: David S. Miller
11 May, 2015
1 commit
-
In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
03 Mar, 2015
1 commit
-
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.Cc: Christoph Hellwig
Suggested-by: Al Viro
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller
20 Jan, 2015
1 commit
-
My previous patch to this file changed the code to be bug-compatible
towards userspace. Unless userspace (which I wasn't able to find)
implements the dump reader by hand in a wrong way, this isn't needed.
If it uses libnl or similar code putting multiple messages into a
single SKB is far more efficient.Change the code to do this. While at it, also clean it up and don't
use so many variables - just store the address in the callback args
directly.Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
18 Jan, 2015
1 commit
-
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.This makes the very common pattern of
if (genlmsg_end(...) < 0) { ... }
be a whole bunch of dead code. Many places also simply do
return nlmsg_end(...);
and the caller is expected to deal with it.
This also commonly (at least for me) causes errors, because it is very
common to writeif (my_function(...))
/* error condition */and if my_function() does "return nlmsg_end()" this is of course wrong.
Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with < 0 with no change in behaviour, so I opted for the more
efficient version.One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for
Signed-off-by: David S. Miller
24 Nov, 2014
1 commit
-
Signed-off-by: Al Viro
12 Nov, 2014
1 commit
-
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.All messages are still ratelimited.
Some KERN_ uses are changed to KERN_DEBUG.
This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled. Even so,
these messages are now _not_ emitted by default.This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings". For backward compatibility,
the sysctl is not removed, but it has no function. The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.cMiscellanea:
o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign argumentsSigned-off-by: Joe Perches
Signed-off-by: David S. Miller
06 Nov, 2014
1 commit
-
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.Having a helper like this means there will be less places to touch
during that transformation.Based upon descriptions and patch from Al Viro.
Signed-off-by: David S. Miller
07 Oct, 2014
1 commit
-
-Add __rcu annotation on table to fix sparse warnings:
net/phonet/pn_dev.c:279:25: warning: incorrect type in assignment (different address spaces)
net/phonet/pn_dev.c:279:25: expected struct net_device *
net/phonet/pn_dev.c:279:25: got void [noderef] *
net/phonet/pn_dev.c:376:17: warning: incorrect type in assignment (different address spaces)
net/phonet/pn_dev.c:376:17: expected struct net_device *volatile
net/phonet/pn_dev.c:376:17: got struct net_device [noderef] *
net/phonet/pn_dev.c:392:17: warning: incorrect type in assignment (different address spaces)
net/phonet/pn_dev.c:392:17: expected struct net_device *
net/phonet/pn_dev.c:392:17: got void [noderef] *-Access table with rcu_access_pointer (fixes the following sparse errors):
net/phonet/pn_dev.c:278:25: error: incompatible types in comparison expression (different address spaces)
net/phonet/pn_dev.c:391:17: error: incompatible types in comparison expression (different address spaces)Suggested-by: Eric Dumazet
Signed-off-by: Fabian Frederick
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
16 Jul, 2014
1 commit
-
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.Coccinelle patch:
@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)v9: move comments here from the wrong commit
Signed-off-by: Tom Gundersen
Reviewed-by: David Herrmann
Signed-off-by: David S. Miller
25 Apr, 2014
1 commit
-
It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
12 Apr, 2014
1 commit
-
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.Signed-off-by: David S. Miller
19 Jan, 2014
1 commit
-
This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.Signed-off-by: Steffen Hurrle
Suggested-by: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
20 Nov, 2013
1 commit
-
Pull networking fixes from David Miller:
"Mostly these are fixes for fallout due to merge window changes, as
well as cures for problems that have been with us for a much longer
period of time"1) Johannes Berg noticed two major deficiencies in our genetlink
registration. Some genetlink protocols we passing in constant
counts for their ops array rather than something like
ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
using fixed IDs for their multicast groups.We have to retain these fixed IDs to keep existing userland tools
working, but reserve them so that other multicast groups used by
other protocols can not possibly conflict.In dealing with these two problems, we actually now use less state
management for genetlink operations and multicast groups.2) When configuring interface hardware timestamping, fix several
drivers that simply do not validate that the hwtstamp_config value
is one the driver actually supports. From Ben Hutchings.3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.
4) In dev_forward_skb(), set the skb->protocol in the right order
relative to skb_scrub_packet(). From Alexei Starovoitov.5) Bridge erroneously fails to use the proper wrapper functions to make
calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
Makita.6) When detaching a bridge port, make sure to flush all VLAN IDs to
prevent them from leaking, also from Toshiaki Makita.7) Put in a compromise for TCP Small Queues so that deep queued devices
that delay TX reclaim non-trivially don't have such a performance
decrease. One particularly problematic area is 802.11 AMPDU in
wireless. From Eric Dumazet.8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
here. Fix from Eric Dumzaet, reported by Dave Jones.9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.
10) When computing mergeable buffer sizes, virtio-net fails to take the
virtio-net header into account. From Michael Dalton.11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
bumping, this one has been with us for a while. From Eric Dumazet.12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
Hugne.13) 6lowpan bit used for traffic classification was wrong, from Jukka
Rissanen.14) macvlan has the same issue as normal vlans did wrt. propagating LRO
disabling down to the real device, fix it the same way. From Michal
Kubecek.15) CPSW driver needs to soft reset all slaves during suspend, from
Daniel Mack.16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.
17) The xen-netfront RX buffer refill timer isn't properly scheduled on
partial RX allocation success, from Ma JieYue.18) When ipv6 ping protocol support was added, the AF_INET6 protocol
initialization cleanup path on failure was borked a little. Fix
from Vlad Yasevich.19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
blocks we can do the wrong thing with the msg_name we write back to
userspace. From Hannes Frederic Sowa. There is another fix in the
works from Hannes which will prevent future problems of this nature.20) Fix route leak in VTI tunnel transmit, from Fan Du.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
genetlink: make multicast groups const, prevent abuse
genetlink: pass family to functions using groups
genetlink: add and use genl_set_err()
genetlink: remove family pointer from genl_multicast_group
genetlink: remove genl_unregister_mc_group()
hsr: don't call genl_unregister_mc_group()
quota/genetlink: use proper genetlink multicast APIs
drop_monitor/genetlink: use proper genetlink multicast APIs
genetlink: only pass array to genl_register_family_with_ops()
tcp: don't update snd_nxt, when a socket is switched from repair mode
atm: idt77252: fix dev refcnt leak
xfrm: Release dst if this dst is improper for vti tunnel
netlink: fix documentation typo in netlink_set_err()
be2net: Delete secondary unicast MAC addresses during be_close
be2net: Fix unconditional enabling of Rx interface options
net, virtio_net: replace the magic value
ping: prevent NULL pointer dereference on write to msg_name
bnx2x: Prevent "timeout waiting for state X"
bnx2x: prevent CFC attention
bnx2x: Prevent panic during DMAE timeout
...
19 Nov, 2013
1 commit
-
Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.Reported-by: mpb
Suggested-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
15 Nov, 2013
1 commit
-
All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.Signed-off-by: Tetsuo Handa
Signed-off-by: Kees Cook
Cc: Joe Perches
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Aug, 2013
1 commit
-
UIDs are printed in the proc_fs as signed int, whereas
they are unsigned int.Signed-off-by: Francesco Fusco
Signed-off-by: David S. Miller
13 Jun, 2013
1 commit
-
Reduce the uses of this unnecessary typedef.
Done via perl script:
$ git grep --name-only -w ctl_table net | \
xargs perl -p -i -e '\
sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'Reflow the modified lines that now exceed 80 columns.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
29 May, 2013
1 commit
-
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.Signed-off-by: Jiri Pirko
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller
22 Mar, 2013
1 commit
-
With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
28 Feb, 2013
1 commit
-
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;type T;
expression a,c,d,e;
identifier b;
statement S;
@@-T b;
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Feb, 2013
2 commits
-
proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.Signed-off-by: Gao feng
Signed-off-by: David S. Miller -
Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.Signed-off-by: Gao feng
Signed-off-by: David S. Miller
19 Nov, 2012
1 commit
-
- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged
users to make netlink calls to modify their local network
namespace.- In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
that calls that are not safe for unprivileged users are still
protected.Later patches will remove the extra capable calls from methods
that are safe for unprivilged users.Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
11 Sep, 2012
1 commit
-
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman"
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
15 Aug, 2012
1 commit
-
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Cc: Arnaldo Carvalho de Melo
Cc: Sridhar Samudrala
Acked-by: Vlad Yasevich
Acked-by: David S. Miller
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
18 Jun, 2012
1 commit
-
Signed-off-by: Rémi Denis-Courmont
Cc: Sakari Ailus
Signed-off-by: David S. Miller
21 Apr, 2012
2 commits
-
This results in code with less boiler plate that is a bit easier
to read.Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
This makes it clearer which sysctls are relative to your current network
namespace.This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
16 Apr, 2012
1 commit
-
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
14 Apr, 2012
2 commits
-
Signed-off-by: Rémi Denis-Courmont
Signed-off-by: David S. Miller -
Signed-off-by: Rémi Denis-Courmont
Signed-off-by: David S. Miller
13 Apr, 2012
2 commits
-
Pull in the 'net' tree to get CAIF bug fixes upon which
the following set of CAIF feature patches depend.Signed-off-by: David S. Miller
-
Recently an oops was reported in phonet if there was a failure during
network namespace creation.[ 163.733755] ------------[ cut here ]------------
[ 163.734501] kernel BUG at include/net/netns/generic.h:45!
[ 163.734501] invalid opcode: 0000 [#1] PREEMPT SMP
[ 163.734501] CPU 2
[ 163.734501] Pid: 19145, comm: trinity Tainted: G W 3.4.0-rc1-next-20120405-sasha-dirty #57
[ 163.734501] RIP: 0010:[] [] phonet_pernet+0x182/0x1a0
[ 163.734501] RSP: 0018:ffff8800674d5ca8 EFLAGS: 00010246
[ 163.734501] RAX: 000000003fffffff RBX: 0000000000000000 RCX: ffff8800678c88d8
[ 163.734501] RDX: 00000000003f4000 RSI: ffff8800678c8910 RDI: 0000000000000282
[ 163.734501] RBP: ffff8800674d5cc8 R08: 0000000000000000 R09: 0000000000000000
[ 163.734501] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880068bec920
[ 163.734501] R13: ffffffff836b90c0 R14: 0000000000000000 R15: 0000000000000000
[ 163.734501] FS: 00007f055e8de700(0000) GS:ffff88007d000000(0000) knlGS:0000000000000000
[ 163.734501] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 163.734501] CR2: 00007f055e6bb518 CR3: 0000000070c16000 CR4: 00000000000406e0
[ 163.734501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 163.734501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 163.734501] Process trinity (pid: 19145, threadinfo ffff8800674d4000, task ffff8800678c8000)
[ 163.734501] Stack:
[ 163.734501] ffffffff824d5f00 ffffffff810e2ec1 ffff880067ae0000 00000000ffffffd4
[ 163.734501] ffff8800674d5cf8 ffffffff824d667a ffff880067ae0000 00000000ffffffd4
[ 163.734501] ffffffff836b90c0 0000000000000000 ffff8800674d5d18 ffffffff824d707d
[ 163.734501] Call Trace:
[ 163.734501] [] ? phonet_pernet+0x20/0x1a0
[ 163.734501] [] ? get_parent_ip+0x11/0x50
[ 163.734501] [] phonet_device_destroy+0x1a/0x100
[ 163.734501] [] phonet_device_notify+0x3d/0x50
[ 163.734501] [] notifier_call_chain+0xee/0x130
[ 163.734501] [] raw_notifier_call_chain+0x11/0x20
[ 163.734501] [] call_netdevice_notifiers+0x52/0x60
[ 163.734501] [] rollback_registered_many+0x185/0x270
[ 163.734501] [] unregister_netdevice_many+0x14/0x60
[ 163.734501] [] ipip_exit_net+0x1b3/0x1d0
[ 163.734501] [] ? ipip_rcv+0x420/0x420
[ 163.734501] [] ops_exit_list+0x35/0x70
[ 163.734501] [] setup_net+0xab/0xe0
[ 163.734501] [] copy_net_ns+0x76/0x100
[ 163.734501] [] create_new_namespaces+0xfb/0x190
[ 163.734501] [] unshare_nsproxy_namespaces+0x61/0x80
[ 163.734501] [] sys_unshare+0xff/0x290
[ 163.734501] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 163.734501] [] system_call_fastpath+0x16/0x1b
[ 163.734501] Code: e0 c3 fe 66 0f 1f 44 00 00 48 c7 c2 40 60 4d 82 be 01 00 00 00 48 c7 c7 80 d1 23 83 e8 48 2a c4 fe e8 73 06 c8 fe 48 85 db 75 0e 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 83 c4 10 48 89 d8
[ 163.734501] RIP [] phonet_pernet+0x182/0x1a0
[ 163.734501] RSP
[ 163.861289] ---[ end trace fb5615826c548066 ]---After investigation it turns out there were two issues.
1) Phonet was not implementing network devices but was using register_pernet_device
instead of register_pernet_subsys.This was allowing there to be cases when phonenet was not initialized and
the phonet net_generic was not set for a network namespace when network
device events were being reported on the netdevice_notifier for a network
namespace leading to the oops above.2) phonet_exit_net was implementing a confusing and special case of handling all
network devices from going away that it was hard to see was correct, and would
only occur when the phonet module was removed.Now that unregister_netdevice_notifier has been modified to synthesize unregistration
events for the network devices that are extant when called this confusing special
case in phonet_exit_net is no longer needed.Signed-off-by: Eric W. Biederman
Acked-by: Rémi Denis-Courmont
Signed-off-by: David S. Miller
11 Apr, 2012
1 commit
06 Apr, 2012
1 commit
-
A phonet packet is limited to USHRT_MAX bytes, this is never checked during
tx which means that the user can specify any size he wishes, and the kernel
will attempt to allocate that size.In the good case, it'll lead to the following warning, but it may also cause
the kernel to kick in the OOM and kill a random task on the server.[ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
[ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46
[ 8921.756672] Call Trace:
[ 8921.758185] [] warn_slowpath_common+0x87/0xb0
[ 8921.762868] [] warn_slowpath_null+0x15/0x20
[ 8921.765399] [] __alloc_pages_slowpath+0x65/0x730
[ 8921.769226] [] ? zone_watermark_ok+0x1a/0x20
[ 8921.771686] [] ? get_page_from_freelist+0x625/0x660
[ 8921.773919] [] __alloc_pages_nodemask+0x1f8/0x240
[ 8921.776248] [] kmalloc_large_node+0x70/0xc0
[ 8921.778294] [] __kmalloc_node_track_caller+0x34/0x1c0
[ 8921.780847] [] ? sock_alloc_send_pskb+0xbc/0x260
[ 8921.783179] [] __alloc_skb+0x75/0x170
[ 8921.784971] [] sock_alloc_send_pskb+0xbc/0x260
[ 8921.787111] [] ? release_sock+0x7e/0x90
[ 8921.788973] [] sock_alloc_send_skb+0x10/0x20
[ 8921.791052] [] pep_sendmsg+0x60/0x380
[ 8921.792931] [] ? pn_socket_bind+0x156/0x180
[ 8921.794917] [] ? pn_socket_autobind+0x3f/0x90
[ 8921.797053] [] pn_socket_sendmsg+0x4f/0x70
[ 8921.798992] [] sock_aio_write+0x187/0x1b0
[ 8921.801395] [] ? sub_preempt_count+0xae/0xf0
[ 8921.803501] [] ? __lock_acquire+0x42c/0x4b0
[ 8921.805505] [] ? __sock_recv_ts_and_drops+0x140/0x140
[ 8921.807860] [] do_sync_readv_writev+0xbc/0x110
[ 8921.809986] [] ? might_fault+0x97/0xa0
[ 8921.811998] [] ? security_file_permission+0x1e/0x90
[ 8921.814595] [] do_readv_writev+0xe2/0x1e0
[ 8921.816702] [] ? do_setitimer+0x1ac/0x200
[ 8921.818819] [] ? get_parent_ip+0x11/0x50
[ 8921.820863] [] ? sub_preempt_count+0xae/0xf0
[ 8921.823318] [] vfs_writev+0x46/0x60
[ 8921.825219] [] sys_writev+0x4f/0xb0
[ 8921.827127] [] system_call_fastpath+0x16/0x1b
[ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---Signed-off-by: Sasha Levin
Acked-by: Rémi Denis-Courmont
Signed-off-by: David S. Miller