10 Mar, 2017

1 commit

  • Lockdep issues a circular dependency warning when AFS issues an operation
    through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

    The theory lockdep comes up with is as follows:

    (1) If the pagefault handler decides it needs to read pages from AFS, it
    calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
    creating a call requires the socket lock:

    mmap_sem must be taken before sk_lock-AF_RXRPC

    (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
    binds the underlying UDP socket whilst holding its socket lock.
    inet_bind() takes its own socket lock:

    sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

    (3) Reading from a TCP socket into a userspace buffer might cause a fault
    and thus cause the kernel to take the mmap_sem, but the TCP socket is
    locked whilst doing this:

    sk_lock-AF_INET must be taken before mmap_sem

    However, lockdep's theory is wrong in this instance because it deals only
    with lock classes and not individual locks. The AF_INET lock in (2) isn't
    really equivalent to the AF_INET lock in (3) as the former deals with a
    socket entirely internal to the kernel that never sees userspace. This is
    a limitation in the design of lockdep.

    Fix the general case by:

    (1) Double up all the locking keys used in sockets so that one set are
    used if the socket is created by userspace and the other set is used
    if the socket is created by the kernel.

    (2) Store the kern parameter passed to sk_alloc() in a variable in the
    sock struct (sk_kern_sock). This informs sock_lock_init(),
    sock_init_data() and sk_clone_lock() as to the lock keys to be used.

    Note that the child created by sk_clone_lock() inherits the parent's
    kern setting.

    (3) Add a 'kern' parameter to ->accept() that is analogous to the one
    passed in to ->create() that distinguishes whether kernel_accept() or
    sys_accept4() was the caller and can be passed to sk_alloc().

    Note that a lot of accept functions merely dequeue an already
    allocated socket. I haven't touched these as the new socket already
    exists before we get the parameter.

    Note also that there are a couple of places where I've made the accepted
    socket unconditionally kernel-based:

    irda_accept()
    rds_rcp_accept_one()
    tcp_accept_from_sock()

    because they follow a sock_create_kern() and accept off of that.

    Whilst creating this, I noticed that lustre and ocfs don't create sockets
    through sock_create_kern() and thus they aren't marked as for-kernel,
    though they appear to be internal. I wonder if these should do that so
    that they use the new set of lock keys.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

02 Mar, 2017

1 commit


25 Dec, 2016

1 commit


14 Jul, 2016

1 commit

  • Sockets can have a filter program attached that drops or trims
    incoming packets based on the filter program return value.

    Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
    verifies this on arrival in rose_route_frame and unconditionally pulls
    the bytes in rose_recvmsg. The filter can trim packets to below this
    value in-between, causing pull to fail, leaving the partial header at
    the time of skb_copy_datagram_msg.

    Place a lower bound on the size to which sk_filter may trim packets
    by introducing sk_filter_trim_cap and call this for rose packets.

    Signed-off-by: Willem de Bruijn
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

24 Jun, 2015

1 commit


23 Jun, 2015

1 commit


19 Jun, 2015

1 commit


11 May, 2015

1 commit


03 Mar, 2015

4 commits

  • Now that there are no more users kill dev_rebuild_header and all of it's
    implementations.

    This is long overdue.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Patterned after the similar code in net/rom this turns out
    to be a trivial obviously correct transmformation.

    Cc: Ralf Baechle
    Cc: linux-hams@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Not setting the destination address is a bug that I suspect causes no
    problems today, as only the arp code seems to call dev_hard_header and
    the description I have of rose is that it is expected to be used with a
    static neigbour table.

    I have derived the offset and the length of the rose destination address
    from rose_rebuild_header where arp_find calls neigh_ha_snapshot to set
    the destination address.

    Cc: Ralf Baechle
    Cc: linux-hams@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • After TIPC doesn't depend on iocb argument in its internal
    implementations of sendmsg() and recvmsg() hooks defined in proto
    structure, no any user is using iocb argument in them at all now.
    Then we can drop the redundant iocb argument completely from kinds of
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

24 Nov, 2014

1 commit


06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be less places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Sep, 2014

1 commit


16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->s_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about it's
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

07 Jan, 2014

1 commit


30 Dec, 2013

1 commit

  • recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen.

    After commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
    (net: rework recvmsg handler msg_name and msg_namelen logic), we now
    always take the else branch due to namelen being initialized to 0.

    Digging in netdev-vger-cvs git repo shows that msg_namelen was
    initialized with a fixed-size since at least 1995, so the else branch
    was never taken.

    Compile tested only.

    Signed-off-by: Florian Westphal
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     

23 Dec, 2013

1 commit


21 Nov, 2013

1 commit


13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

29 May, 2013

1 commit

  • So far, only net_device * could be passed along with netdevice notifier
    event. This patch provides a possibility to pass custom structure
    able to provide info that event listener needs to know.

    Signed-off-by: Jiri Pirko

    v2->v3: fix typo on simeth
    shortened dev_getter
    shortened notifier_info struct name
    v1->v2: fix notifier_call parameter in call_netdevice_notifier()
    Signed-off-by: David S. Miller

    Jiri Pirko
     

08 Apr, 2013

1 commit

  • The code in rose_recvmsg() does not initialize all of the members of
    struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
    Nor does it initialize the padding bytes of the structure inserted by
    the compiler for alignment. This will lead to leaking uninitialized
    kernel stack bytes in net/socket.c.

    Fix the issue by initializing the memory used for sockaddr info with
    memset(0).

    Cc: Ralf Baechle
    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    that under /proc/net,it's not a general function for
    removing proc entries of netns. if we want to remove
    some proc entries which under /proc/net/stat/, we still
    need to call remove_proc_entry.

    this patch use remove_proc_entry to replace proc_net_remove.
    we can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    It looks a little chaos.this patch changes all of
    proc_net_fops_create to proc_create. we can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

21 Apr, 2012

2 commits

  • This results in code with less boiler plate that is a bit easier
    to read.

    Additionally stops us from using compatibility code in the sysctl
    core, hastening the day when the compatibility code can be removed.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This makes it clearer which sysctls are relative to your current network
    namespace.

    This makes it a little less error prone by not exposing sysctls for the
    initial network namespace in other namespaces.

    This is the same way we handle all of our other network interfaces to
    userspace and I can't honestly remember why we didn't do this for
    sysctls right from the start.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

16 Apr, 2012

1 commit


03 Apr, 2012

1 commit

  • Pull networking fixes from David Miller:

    1) Provide device string properly for USB i2400m wimax devices, also
    don't OOPS when providing firmware string. From Phil Sutter.

    2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu.

    3) Add another device ID to USB zaurus driver, from Guan Xin.

    4) Loop index start in pool vector iterator is wrong causing MAC to not
    get configured in bnx2x driver, fix from Dmitry Kravkov.

    5) EQL driver assumes HZ=100, fix from Eric Dumazet.

    6) Now that skb_add_rx_frag() can specify the truesize increment
    separately, do so in f_phonet and cdc_phonet, also from Eric
    Dumazet.

    7) virtio_net accidently uses net_ratelimit() not only on the kernel
    warning but also the statistic bump, fix from Rick Jones.

    8) ip_route_input_mc() uses fixed init_net namespace, oops, use
    dev_net(dev) instead. Fix from Benjamin LaHaise.

    9) dev_forward_skb() needs to clear the incoming interface index of the
    SKB so that it looks like a new incoming packet, also from Benjamin
    LaHaise.

    10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of
    5GHZ, fix from Stanislav Yakovlev.

    11) Missing kmalloc() return value checks in orinoco, from Santosh
    Nayak.

    12) ath9k doesn't check for HT capabilities in the right way, it is
    checking ht_supported instead of the ATH9K_HW_CAP_HT flag. Fix from
    Sujith Manoharan.

    13) Fix x86 BPF JIT emission of 16-bit immediate field of AND
    instructions, from Feiran Zhuang.

    14) Avoid infinite loop in GARP code when registering sysfs entries.
    From David Ward.

    15) rose protocol uses memcpy instead of memcmp in a device address
    comparison, oops. Fix from Daniel Borkmann.

    16) Fix build of lpc_eth due to dev_hw_addr_rancom() interface being
    renamed to eth_hw_addr_random(). From Roland Stigge.

    17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way
    that ipv4 does. Fix from Shmulik Ladkani.

    18) via-rhine has an inverted bit test, causing suspend/resume
    regressions. Fix from Andreas Mohr.

    19) RIONET assumes 4K page size, fix from Akinobu Mita.

    20) Initialization of imask register in sky2 is buggy, because bits are
    "or'd" into an uninitialized local variable. Fix from Lino
    Sanfilippo.

    21) Fix FCOE checksum offload handling, from Yi Zou.

    22) Fix VLAN processing regression in e1000, from Jiri Pirko.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
    sky2: dont overwrite settings for PHY Quick link
    tg3: Fix 5717 serdes powerdown problem
    net: usb: cdc_eem: fix mtu
    net: sh_eth: fix endian check for architecture independent
    usb/rtl8150 : Remove duplicated definitions
    rionet: fix page allocation order of rionet_active
    via-rhine: fix wait-bit inversion.
    ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4
    net: lpc_eth: Fix rename of dev_hw_addr_random
    net/netfilter/nfnetlink_acct.c: use linux/atomic.h
    rose_dev: fix memcpy-bug in rose_set_mac_address
    Fix non TBI PHY access; a bad merge undid bug fix in a previous commit.
    net/garp: avoid infinite loop if attribute already exists
    x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND
    bonding: emit event when bonding changes MAC
    mac80211: fix oper channel timestamp updation
    ath9k: Use HW HT capabilites properly
    MAINTAINERS: adding maintainer for ipw2x00
    net: orinoco: add error handling for failed kmalloc().
    net/wireless: ipw2x00: fix a typo in wiphy struct initilization
    ...

    Linus Torvalds
     

02 Apr, 2012

1 commit

  • If both addresses equal, nothing needs to be done. If the device is down,
    then we simply copy the new address to dev->dev_addr. If the device is up,
    then we add another loopback device with the new address, and if that does
    not fail, we remove the loopback device with the old address. And only
    then, we update the dev->dev_addr.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    danborkmann@iogearbox.net
     

29 Mar, 2012

1 commit


01 Nov, 2011

1 commit


07 Jul, 2011

1 commit


14 Apr, 2011

1 commit


31 Mar, 2011

1 commit


28 Mar, 2011

2 commits

  • Define some constant offsets for CALL_REQUEST based on the description
    at and the
    definition of ROSE as using 10-digit (5-byte) addresses. Use them
    consistently. Validate all implicit and explicit facilities lengths.
    Validate the address length byte rather than either trusting or
    assuming its value.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • When parsing the FAC_NATIONAL_DIGIS facilities field, it's possible for
    a remote host to provide more digipeaters than expected, resulting in
    heap corruption. Check against ROSE_MAX_DIGIS to prevent overflows, and
    abort facilities parsing on failure.

    Additionally, when parsing the FAC_CCITT_DEST_NSAP and
    FAC_CCITT_SRC_NSAP facilities fields, a remote host can provide a length
    of less than 10, resulting in an underflow in a memcpy size, causing a
    kernel panic due to massive heap corruption. A length of greater than
    20 results in a stack overflow of the callsign array. Abort facilities
    parsing on these invalid length values.

    Signed-off-by: Dan Rosenberg
    Cc: stable@kernel.org
    Signed-off-by: David S. Miller

    Dan Rosenberg