10 Dec, 2014

1 commit

  • Note that the code _using_ ->msg_iter at that point will be very
    unhappy with anything other than unshifted iovec-backed iov_iter.
    We still need to convert users to proper primitives.

    Signed-off-by: Al Viro

    Al Viro
     

24 Nov, 2014

1 commit

  • ... and make it handle multi-segment iovecs - deals with that
    "fix this later" issue for free. A bit of shame, really - it
    had been there since 2.3.15pre3 when the whole thing went into the
    tree, practically a historical artefact by now...

    Signed-off-by: Al Viro

    Al Viro
     

06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be less places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Oct, 2014

1 commit

  • Testing xmit_more support with netperf and connected UDP sockets,
    I found strange dst refcount false sharing.

    Current handling of IFF_XMIT_DST_RELEASE is not optimal.

    Dropping dst in validate_xmit_skb() is certainly too late in case
    packet was queued by cpu X but dequeued by cpu Y

    The logical point to take care of drop/force is in __dev_queue_xmit()
    before even taking qdisc lock.

    As Julian Anastasov pointed out, need for skb_dst() might come from some
    packet schedulers or classifiers.

    This patch adds new helper to cleanly express needs of various drivers
    or qdiscs/classifiers.

    Drivers that need skb_dst() in their ndo_start_xmit() should call
    following helper in their setup instead of the prior :

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    ->
    netif_keep_dst(dev);

    Instead of using a single bit, we use two bits, one being
    eventually rebuilt in bonding/team drivers.

    The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
    rebuilt in bonding/team. Eventually, we could add something
    smarter later.

    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Sep, 2014

1 commit


02 Sep, 2014

1 commit


25 Aug, 2014

1 commit


22 Aug, 2014

1 commit


14 Aug, 2014

2 commits

  • b67bfe0d42cac56c512dd5da4b1b347a23f4b70a (hlist: drop the node
    parameter from iterators) dropped the node parameter from
    iterators which lec_tbl_walk() was using to iterate the list.

    Signed-off-by: Chas Williams
    Signed-off-by: David S. Miller

    chas williams - CONTRACTOR
     
  • One should not call blocking primitives inside a wait loop, since both
    require task_struct::state to sleep, so the inner will destroy the
    outer state.

    sigd_enq() will possibly sleep for alloc_skb(). Move sigd_enq() before
    prepare_to_wait() to avoid sleeping while waiting interruptibly. You do
    not actually need to call sigd_enq() after the initial prepare_to_wait()
    because we test the termination condition before calling schedule().

    Based on suggestions from Peter Zijlstra.

    Signed-off-by: Chas Williams
    Acked-by: Peter Zijlstra
    Signed-off-by: David S. Miller

    chas williams - CONTRACTOR
     

16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

13 Jun, 2014

1 commit

  • Pull networking updates from David Miller:

    1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

    2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

    3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

    4) BPF now has a "random" opcode, from Chema Gonzalez.

    5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

    6) Support TCP fastopen over ipv6, from Daniel Lee.

    7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

    8) Support software TSO in fec driver too, from Nimrod Andy.

    9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

    10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

    11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

    12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

    13) Support busy polling in SCTP, from Neal Horman.

    14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

    15) Bridge promisc mode handling improvements from Vlad Yasevich.

    16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
    rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
    tcp: fixing TLP's FIN recovery
    net: fec: Add software TSO support
    net: fec: Add Scatter/gather support
    net: fec: Increase buffer descriptor entry number
    net: fec: Factorize feature setting
    net: fec: Enable IP header hardware checksum
    net: fec: Factorize the .xmit transmit function
    bridge: fix compile error when compiling without IPv6 support
    bridge: fix smatch warning / potential null pointer dereference
    via-rhine: fix full-duplex with autoneg disable
    bnx2x: Enlarge the dorq threshold for VFs
    bnx2x: Check for UNDI in uncommon branch
    bnx2x: Fix 1G-baseT link
    bnx2x: Fix link for KR with swapped polarity lane
    sctp: Fix sk_ack_backlog wrap-around problem
    net/core: Add VF link state control policy
    net/fsl: xgmac_mdio is dependent on OF_MDIO
    net/fsl: Make xgmac_mdio read error message useful
    net_sched: drr: warn when qdisc is not work conserving
    ...

    Linus Torvalds
     

31 May, 2014

1 commit

  • This preprocessor check is commented out ever since this file was added
    during the v2.3 development cycle. It is unclear what it purpose might
    have been. Whatever it was, it can safely be removed now.

    Signed-off-by: Paul Bolle
    Signed-off-by: David S. Miller

    Paul Bolle
     

18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->s_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about it's
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Mar, 2014

1 commit

  • Use del_timer_sync to ensure that the timer is stopped on all CPUs before
    the driver exists.

    This change was suggested by Thomas Gleixner.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @r@
    declarer name module_exit;
    identifier ex;
    @@

    module_exit(ex);

    @@
    identifier r.ex;
    @@

    ex(...) {

    }
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     

22 Jan, 2014

2 commits

  • net/appletalk/aarp.c: In function ‘__aarp_send_query’:
    net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]
    ...
    net/atm/lec.c: In function ‘send_to_lecd’:
    net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default]
    In file included from net/atm/lec.c:17:0:
    include/linux/etherdevice.h:227:20: note: expected ‘u8 *’ but argument is of type ‘unsigned char (*)[6]’
    ...
    net/caif/caif_usb.c: In function ‘cfusbl_create’:
    net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
    save some cycles on arm and powerpc.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

21 Nov, 2013

1 commit


29 May, 2013

2 commits

  • commit 351638e7deeed2ec8ce451b53d3 (net: pass info struct via netdevice notifier)
    breaks booting of my KVM guest, this is due to we still forget to pass
    struct netdev_notifier_info in several places. This patch completes it.

    Cc: Jiri Pirko
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • So far, only net_device * could be passed along with netdevice notifier
    event. This patch provides a possibility to pass custom structure
    able to provide info that event listener needs to know.

    Signed-off-by: Jiri Pirko

    v2->v3: fix typo on simeth
    shortened dev_getter
    shortened notifier_info struct name
    v1->v2: fix notifier_call parameter in call_netdevice_notifier()
    Signed-off-by: David S. Miller

    Jiri Pirko
     

02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

10 Apr, 2013

1 commit

  • The only part of proc_dir_entry the code outside of fs/proc
    really cares about is PDE(inode)->data. Provide a helper
    for that; static inline for now, eventually will be moved
    to fs/proc, along with the knowledge of struct proc_dir_entry
    layout.

    Signed-off-by: Al Viro

    Al Viro
     

08 Apr, 2013

2 commits

  • Conflicts:
    drivers/nfc/microread/mei.c
    net/netfilter/nfnetlink_queue_core.c

    Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
    some cleanups are going to go on-top.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The current code does not fill the msg_name member in case it is set.
    It also does not set the msg_namelen member to 0 and therefore makes
    net/socket.c leak the local, uninitialized sockaddr_storage variable
    to userland -- 128 bytes of kernel stack memory.

    Fix that by simply setting msg_namelen to 0 as obviously nobody cared
    about vcc_recvmsg() not filling the msg_name in case it was set.

    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
     

28 Mar, 2013

1 commit

  • Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
    an 802.3 frame. Frames with a lower value in the ethernet type field
    are Ethernet II.

    Also update all the users of this value that David Miller and
    I could find to use the new constant.

    Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
    should be >= not >.

    As suggested by Jesse Gross.

    Compile tested only.

    Cc: David Miller
    Cc: Jesse Gross
    Cc: Karsten Keil
    Cc: John W. Linville
    Cc: Johannes Berg
    Cc: Bart De Schuymer
    Cc: Stephen Hemminger
    Cc: Patrick McHardy
    Cc: Marcel Holtmann
    Cc: Gustavo Padovan
    Cc: Johan Hedberg
    Cc: linux-bluetooth@vger.kernel.org
    Cc: netfilter-devel@vger.kernel.org
    Cc: bridge@lists.linux-foundation.org
    Cc: linux-wireless@vger.kernel.org
    Cc: linux1394-devel@lists.sourceforge.net
    Cc: linux-media@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: dev@openvswitch.org
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Stefan Richter
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


19 Feb, 2013

1 commit

  • proc_net_remove is only used to remove proc entries
    that under /proc/net,it's not a general function for
    removing proc entries of netns. if we want to remove
    some proc entries which under /proc/net/stat/, we still
    need to call remove_proc_entry.

    this patch use remove_proc_entry to replace proc_net_remove.
    we can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

18 Dec, 2012

1 commit


02 Dec, 2012

6 commits

  • We don't need to schedule the wakeup tasklet on *every* unlock; only if we
    actually blocked the channel in the first place.

    Signed-off-by: David Woodhouse
    Acked-by: Krzysztof Mazur

    David Woodhouse
     
  • The br2684 does not check if used vcc is in connected state,
    causing potential Oops in pppoatm_send() when vcc->send() is called
    on not fully connected socket.

    Now br2684 can be assigned only on connected sockets; otherwise
    -EINVAL error is returned.

    Signed-off-by: Krzysztof Mazur
    Signed-off-by: David Woodhouse

    Krzysztof Mazur
     
  • The br2684 code used module_put() during unassignment from vcc with
    hope that we have BKL. This assumption is no longer true.

    Now owner field in atmvcc is used to move this module_put()
    to vcc_destroy_socket().

    Signed-off-by: David Woodhouse
    Acked-by: Krzysztof Mazur

    David Woodhouse
     
  • Now that we can return zero from pppoatm_send() for reasons *other* than
    the queue being full, that means we can't depend on a subsequent call to
    pppoatm_pop() waking the queue, and we might leave it stalled
    indefinitely.

    Use the ->release_cb() callback to wake the queue after the sock is
    unlocked.

    Signed-off-by: David Woodhouse
    Acked-by: Krzysztof Mazur

    David Woodhouse
     
  • Avoid submitting packets to a vcc which is being closed. Things go badly
    wrong when the ->pop method gets later called after everything's been
    torn down.

    Use the ATM socket lock for synchronisation with vcc_destroy_socket(),
    which clears the ATM_VF_READY bit under the same lock. Otherwise, we
    could end up submitting a packet to the device driver even after its
    ->ops->close method has been called. And it could call the vcc's ->pop
    method after the protocol has been shut down. Which leads to a panic.

    Signed-off-by: David Woodhouse
    Acked-by: Krzysztof Mazur

    David Woodhouse
     
  • The immediate use case for this is that it will allow us to ensure that a
    pppoatm queue is woken after it has to drop a packet due to the sock being
    locked.

    Note that 'release_cb' is called when the socket is *unlocked*. This is
    not to be confused with vcc_release() — which probably ought to be called
    vcc_close().

    Signed-off-by: David Woodhouse
    Acked-by: Krzysztof Mazur

    David Woodhouse
     

30 Nov, 2012

2 commits

  • The pppoatm_may_send() is quite heavy and it's called three times
    in pppoatm_send() and inlining costs more than 200 bytes of code
    (more than 10% of total pppoatm driver code size).

    add/remove: 1/0 grow/shrink: 0/1 up/down: 132/-367 (-235)
    function old new delta
    pppoatm_may_send - 132 +132
    pppoatm_send 900 533 -367

    Signed-off-by: Krzysztof Mazur
    Signed-off-by: David Woodhouse

    Krzysztof Mazur
     
  • The vcc_destroy_socket() closes vcc before the protocol is detached
    from vcc by calling vcc->push() with NULL skb. This leaves some time
    window, where the protocol may call vcc->send() on closed vcc
    and crash.

    Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
    indicate that vcc is not ready. If the vcc is not ready we just
    drop frame. Queueing frames is much more complicated because we
    don't have callbacks that inform us about vcc flags changes.

    Signed-off-by: Krzysztof Mazur
    Signed-off-by: David Woodhouse

    Krzysztof Mazur
     

28 Nov, 2012

1 commit

  • The pppoatm_send() does not take any lock that will prevent concurrent
    vcc_sendmsg(). This causes two problems:

    - there is no locking between checking the send queue size
    with atm_may_send() and incrementing sk_wmem_alloc,
    and the real queue size can be a little higher than sk_sndbuf

    - the vcc->sendmsg() can be called concurrently. I'm not sure
    if it's allowed. Some drivers (eni, nicstar, ...) seem
    to assume it will never happen.

    Now pppoatm_send() takes ATM socket lock, the same that is used
    in vcc_sendmsg() and other ATM socket functions. The pppoatm_send()
    is called with BH disabled, so bh_lock_sock() is used instead
    of lock_sock().

    Signed-off-by: Krzysztof Mazur
    Cc: Chas Williams - CONTRACTOR
    Signed-off-by: David Woodhouse

    Krzysztof Mazur