11 Feb, 2016

1 commit

  • In order to support fast reuseport lookups in TCP, the hash function
    defined in struct proto must be capable of returning an error code.
    This patch changes the function signature of all related hash functions
    to return an integer and handles or propagates this return value at
    all call sites.

    Signed-off-by: Craig Gallek
    Signed-off-by: David S. Miller

    Craig Gallek
     

13 Jan, 2016

1 commit

  • Ivaylo Dimitrov reported a regression caused by commit 7866a621043f
    ("dev: add per net_device packet type chains").

    skb->dev becomes NULL and we crash in __netif_receive_skb_core().

    Before above commit, different kind of bugs or corruptions could happen
    without major crash.

    But the root cause is that phonet_rcv() can queue skb without checking
    if skb is shared or not.

    Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.

    Reported-by: Ivaylo Dimitrov
    Tested-by: Ivaylo Dimitrov
    Signed-off-by: Eric Dumazet
    Cc: Remi Denis-Courmont
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 May, 2015

1 commit


03 Mar, 2015

1 commit

  • After TIPC doesn't depend on iocb argument in its internal
    implementations of sendmsg() and recvmsg() hooks defined in proto
    structure, no any user is using iocb argument in them at all now.
    Then we can drop the redundant iocb argument completely from kinds of
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

20 Jan, 2015

1 commit

  • My previous patch to this file changed the code to be bug-compatible
    towards userspace. Unless userspace (which I wasn't able to find)
    implements the dump reader by hand in a wrong way, this isn't needed.
    If it uses libnl or similar code putting multiple messages into a
    single SKB is far more efficient.

    Change the code to do this. While at it, also clean it up and don't
    use so many variables - just store the address in the callback args
    directly.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
     

24 Nov, 2014

1 commit


12 Nov, 2014

1 commit

  • Use the more common dynamic_debug capable net_dbg_ratelimited
    and remove the LIMIT_NETDEBUG macro.

    All messages are still ratelimited.

    Some KERN_ uses are changed to KERN_DEBUG.

    This may have some negative impact on messages that were
    emitted at KERN_INFO that are not not enabled at all unless
    DEBUG is defined or dynamic_debug is enabled. Even so,
    these messages are now _not_ emitted by default.

    This also eliminates the use of the net_msg_warn sysctl
    "/proc/sys/net/core/warnings". For backward compatibility,
    the sysctl is not removed, but it has no function. The extern
    declaration of net_msg_warn is removed from sock.h and made
    static in net/core/sysctl_net_core.c

    Miscellanea:

    o Update the sysctl documentation
    o Remove the embedded uses of pr_fmt
    o Coalesce format fragments
    o Realign arguments

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be less places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Oct, 2014

1 commit

  • -Add __rcu annotation on table to fix sparse warnings:
    net/phonet/pn_dev.c:279:25: warning: incorrect type in assignment (different address spaces)
    net/phonet/pn_dev.c:279:25: expected struct net_device *
    net/phonet/pn_dev.c:279:25: got void [noderef] *
    net/phonet/pn_dev.c:376:17: warning: incorrect type in assignment (different address spaces)
    net/phonet/pn_dev.c:376:17: expected struct net_device *volatile
    net/phonet/pn_dev.c:376:17: got struct net_device [noderef] *
    net/phonet/pn_dev.c:392:17: warning: incorrect type in assignment (different address spaces)
    net/phonet/pn_dev.c:392:17: expected struct net_device *
    net/phonet/pn_dev.c:392:17: got void [noderef] *

    -Access table with rcu_access_pointer (fixes the following sparse errors):
    net/phonet/pn_dev.c:278:25: error: incompatible types in comparison expression (different address spaces)
    net/phonet/pn_dev.c:391:17: error: incompatible types in comparison expression (different address spaces)

    Suggested-by: Eric Dumazet
    Signed-off-by: Fabian Frederick
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Fabian Frederick
     

16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

25 Apr, 2014

1 commit

  • It is possible by passing a netlink socket to a more privileged
    executable and then to fool that executable into writing to the socket
    data that happens to be valid netlink message to do something that
    privileged executable did not intend to do.

    To keep this from happening replace bare capable and ns_capable calls
    with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
    Which act the same as the previous calls except they verify that the
    opener of the socket had the desired permissions as well.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->s_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about it's
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

20 Nov, 2013

1 commit

  • Pull networking fixes from David Miller:
    "Mostly these are fixes for fallout due to merge window changes, as
    well as cures for problems that have been with us for a much longer
    period of time"

    1) Johannes Berg noticed two major deficiencies in our genetlink
    registration. Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

    2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports. From Ben Hutchings.

    3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

    4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet(). From Alexei Starovoitov.

    5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
    Makita.

    6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

    7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease. One particularly problematic area is 802.11 AMPDU in
    wireless. From Eric Dumazet.

    8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here. Fix from Eric Dumzaet, reported by Dave Jones.

    9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

    10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account. From Michael Dalton.

    11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
    bumping, this one has been with us for a while. From Eric Dumazet.

    12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

    13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

    14) macvlan has the same issue as normal vlans did wrt. propagating LRO
    disabling down to the real device, fix it the same way. From Michal
    Kubecek.

    15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

    16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

    17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

    18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little. Fix
    from Vlad Yasevich.

    19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace. From Hannes Frederic Sowa. There is another fix in the
    works from Hannes which will prevent future problems of this nature.

    20) Fix route leak in VTI tunnel transmit, from Fan Du.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    genetlink: make multicast groups const, prevent abuse
    genetlink: pass family to functions using groups
    genetlink: add and use genl_set_err()
    genetlink: remove family pointer from genl_multicast_group
    genetlink: remove genl_unregister_mc_group()
    hsr: don't call genl_unregister_mc_group()
    quota/genetlink: use proper genetlink multicast APIs
    drop_monitor/genetlink: use proper genetlink multicast APIs
    genetlink: only pass array to genl_register_family_with_ops()
    tcp: don't update snd_nxt, when a socket is switched from repair mode
    atm: idt77252: fix dev refcnt leak
    xfrm: Release dst if this dst is improper for vti tunnel
    netlink: fix documentation typo in netlink_set_err()
    be2net: Delete secondary unicast MAC addresses during be_close
    be2net: Fix unconditional enabling of Rx interface options
    net, virtio_net: replace the magic value
    ping: prevent NULL pointer dereference on write to msg_name
    bnx2x: Prevent "timeout waiting for state X"
    bnx2x: prevent CFC attention
    bnx2x: Prevent panic during DMAE timeout
    ...

    Linus Torvalds
     

19 Nov, 2013

1 commit

  • Only update *addr_len when we actually fill in sockaddr, otherwise we
    can return uninitialized memory from the stack to the caller in the
    recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
    checks because we only get called with a valid addr_len pointer either
    from sock_common_recvmsg or inet_recvmsg.

    If a blocking read waits on a socket which is concurrently shut down we
    now return zero and set msg_msgnamelen to 0.

    Reported-by: mpb
    Suggested-by: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

15 Nov, 2013

1 commit


16 Aug, 2013

1 commit


13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

29 May, 2013

1 commit

  • So far, only net_device * could be passed along with netdevice notifier
    event. This patch provides a possibility to pass custom structure
    able to provide info that event listener needs to know.

    Signed-off-by: Jiri Pirko

    v2->v3: fix typo on simeth
    shortened dev_getter
    shortened notifier_info struct name
    v1->v2: fix notifier_call parameter in call_netdevice_notifier()
    Signed-off-by: David S. Miller

    Jiri Pirko
     

22 Mar, 2013

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    that under /proc/net,it's not a general function for
    removing proc entries of netns. if we want to remove
    some proc entries which under /proc/net/stat/, we still
    need to call remove_proc_entry.

    this patch use remove_proc_entry to replace proc_net_remove.
    we can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    It looks a little chaos.this patch changes all of
    proc_net_fops_create to proc_create. we can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

19 Nov, 2012

1 commit

  • - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
    to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged
    users to make netlink calls to modify their local network
    namespace.

    - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
    that calls that are not safe for unprivileged users are still
    protected.

    Later patches will remove the extra capable calls from methods
    that are safe for unprivilged users.

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

15 Aug, 2012

1 commit


18 Jun, 2012

1 commit


21 Apr, 2012

2 commits

  • This results in code with less boiler plate that is a bit easier
    to read.

    Additionally stops us from using compatibility code in the sysctl
    core, hastening the day when the compatibility code can be removed.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This makes it clearer which sysctls are relative to your current network
    namespace.

    This makes it a little less error prone by not exposing sysctls for the
    initial network namespace in other namespaces.

    This is the same way we handle all of our other network interfaces to
    userspace and I can't honestly remember why we didn't do this for
    sysctls right from the start.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

16 Apr, 2012

1 commit


14 Apr, 2012

2 commits


13 Apr, 2012

2 commits

  • Pull in the 'net' tree to get CAIF bug fixes upon which
    the following set of CAIF feature patches depend.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Recently an oops was reported in phonet if there was a failure during
    network namespace creation.

    [ 163.733755] ------------[ cut here ]------------
    [ 163.734501] kernel BUG at include/net/netns/generic.h:45!
    [ 163.734501] invalid opcode: 0000 [#1] PREEMPT SMP
    [ 163.734501] CPU 2
    [ 163.734501] Pid: 19145, comm: trinity Tainted: G W 3.4.0-rc1-next-20120405-sasha-dirty #57
    [ 163.734501] RIP: 0010:[] [] phonet_pernet+0x182/0x1a0
    [ 163.734501] RSP: 0018:ffff8800674d5ca8 EFLAGS: 00010246
    [ 163.734501] RAX: 000000003fffffff RBX: 0000000000000000 RCX: ffff8800678c88d8
    [ 163.734501] RDX: 00000000003f4000 RSI: ffff8800678c8910 RDI: 0000000000000282
    [ 163.734501] RBP: ffff8800674d5cc8 R08: 0000000000000000 R09: 0000000000000000
    [ 163.734501] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880068bec920
    [ 163.734501] R13: ffffffff836b90c0 R14: 0000000000000000 R15: 0000000000000000
    [ 163.734501] FS: 00007f055e8de700(0000) GS:ffff88007d000000(0000) knlGS:0000000000000000
    [ 163.734501] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 163.734501] CR2: 00007f055e6bb518 CR3: 0000000070c16000 CR4: 00000000000406e0
    [ 163.734501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 163.734501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 163.734501] Process trinity (pid: 19145, threadinfo ffff8800674d4000, task ffff8800678c8000)
    [ 163.734501] Stack:
    [ 163.734501] ffffffff824d5f00 ffffffff810e2ec1 ffff880067ae0000 00000000ffffffd4
    [ 163.734501] ffff8800674d5cf8 ffffffff824d667a ffff880067ae0000 00000000ffffffd4
    [ 163.734501] ffffffff836b90c0 0000000000000000 ffff8800674d5d18 ffffffff824d707d
    [ 163.734501] Call Trace:
    [ 163.734501] [] ? phonet_pernet+0x20/0x1a0
    [ 163.734501] [] ? get_parent_ip+0x11/0x50
    [ 163.734501] [] phonet_device_destroy+0x1a/0x100
    [ 163.734501] [] phonet_device_notify+0x3d/0x50
    [ 163.734501] [] notifier_call_chain+0xee/0x130
    [ 163.734501] [] raw_notifier_call_chain+0x11/0x20
    [ 163.734501] [] call_netdevice_notifiers+0x52/0x60
    [ 163.734501] [] rollback_registered_many+0x185/0x270
    [ 163.734501] [] unregister_netdevice_many+0x14/0x60
    [ 163.734501] [] ipip_exit_net+0x1b3/0x1d0
    [ 163.734501] [] ? ipip_rcv+0x420/0x420
    [ 163.734501] [] ops_exit_list+0x35/0x70
    [ 163.734501] [] setup_net+0xab/0xe0
    [ 163.734501] [] copy_net_ns+0x76/0x100
    [ 163.734501] [] create_new_namespaces+0xfb/0x190
    [ 163.734501] [] unshare_nsproxy_namespaces+0x61/0x80
    [ 163.734501] [] sys_unshare+0xff/0x290
    [ 163.734501] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 163.734501] [] system_call_fastpath+0x16/0x1b
    [ 163.734501] Code: e0 c3 fe 66 0f 1f 44 00 00 48 c7 c2 40 60 4d 82 be 01 00 00 00 48 c7 c7 80 d1 23 83 e8 48 2a c4 fe e8 73 06 c8 fe 48 85 db 75 0e 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 83 c4 10 48 89 d8
    [ 163.734501] RIP [] phonet_pernet+0x182/0x1a0
    [ 163.734501] RSP
    [ 163.861289] ---[ end trace fb5615826c548066 ]---

    After investigation it turns out there were two issues.
    1) Phonet was not implementing network devices but was using register_pernet_device
    instead of register_pernet_subsys.

    This was allowing there to be cases when phonenet was not initialized and
    the phonet net_generic was not set for a network namespace when network
    device events were being reported on the netdevice_notifier for a network
    namespace leading to the oops above.

    2) phonet_exit_net was implementing a confusing and special case of handling all
    network devices from going away that it was hard to see was correct, and would
    only occur when the phonet module was removed.

    Now that unregister_netdevice_notifier has been modified to synthesize unregistration
    events for the network devices that are extant when called this confusing special
    case in phonet_exit_net is no longer needed.

    Signed-off-by: Eric W. Biederman
    Acked-by: Rémi Denis-Courmont
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

11 Apr, 2012

1 commit


06 Apr, 2012

1 commit

  • A phonet packet is limited to USHRT_MAX bytes, this is never checked during
    tx which means that the user can specify any size he wishes, and the kernel
    will attempt to allocate that size.

    In the good case, it'll lead to the following warning, but it may also cause
    the kernel to kick in the OOM and kill a random task on the server.

    [ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
    [ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46
    [ 8921.756672] Call Trace:
    [ 8921.758185] [] warn_slowpath_common+0x87/0xb0
    [ 8921.762868] [] warn_slowpath_null+0x15/0x20
    [ 8921.765399] [] __alloc_pages_slowpath+0x65/0x730
    [ 8921.769226] [] ? zone_watermark_ok+0x1a/0x20
    [ 8921.771686] [] ? get_page_from_freelist+0x625/0x660
    [ 8921.773919] [] __alloc_pages_nodemask+0x1f8/0x240
    [ 8921.776248] [] kmalloc_large_node+0x70/0xc0
    [ 8921.778294] [] __kmalloc_node_track_caller+0x34/0x1c0
    [ 8921.780847] [] ? sock_alloc_send_pskb+0xbc/0x260
    [ 8921.783179] [] __alloc_skb+0x75/0x170
    [ 8921.784971] [] sock_alloc_send_pskb+0xbc/0x260
    [ 8921.787111] [] ? release_sock+0x7e/0x90
    [ 8921.788973] [] sock_alloc_send_skb+0x10/0x20
    [ 8921.791052] [] pep_sendmsg+0x60/0x380
    [ 8921.792931] [] ? pn_socket_bind+0x156/0x180
    [ 8921.794917] [] ? pn_socket_autobind+0x3f/0x90
    [ 8921.797053] [] pn_socket_sendmsg+0x4f/0x70
    [ 8921.798992] [] sock_aio_write+0x187/0x1b0
    [ 8921.801395] [] ? sub_preempt_count+0xae/0xf0
    [ 8921.803501] [] ? __lock_acquire+0x42c/0x4b0
    [ 8921.805505] [] ? __sock_recv_ts_and_drops+0x140/0x140
    [ 8921.807860] [] do_sync_readv_writev+0xbc/0x110
    [ 8921.809986] [] ? might_fault+0x97/0xa0
    [ 8921.811998] [] ? security_file_permission+0x1e/0x90
    [ 8921.814595] [] do_readv_writev+0xe2/0x1e0
    [ 8921.816702] [] ? do_setitimer+0x1ac/0x200
    [ 8921.818819] [] ? get_parent_ip+0x11/0x50
    [ 8921.820863] [] ? sub_preempt_count+0xae/0xf0
    [ 8921.823318] [] vfs_writev+0x46/0x60
    [ 8921.825219] [] sys_writev+0x4f/0xb0
    [ 8921.827127] [] system_call_fastpath+0x16/0x1b
    [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---

    Signed-off-by: Sasha Levin
    Acked-by: Rémi Denis-Courmont
    Signed-off-by: David S. Miller

    Sasha Levin
     

02 Apr, 2012

1 commit


13 Jan, 2012

1 commit

  • commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
    RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
    complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
    y).

    We miss needed barriers, even on x86, when y is not NULL.

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Nov, 2011

1 commit