13 Jan, 2016

1 commit

  • Pull networking updates from David Miller:

    1) Support busy polling generically, for all NAPI drivers. From Eric
    Dumazet.

    2) Add byte/packet counter support to nft_ct, from Florian Westphal.

    3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

    4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
    Frederic Sowa.

    5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

    6) Add support for VLAN device bridging to mlxsw switch driver, from
    Ido Schimmel.

    7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

    8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

    9) Reorganize wireless drivers into per-vendor directories just like we
    do for ethernet drivers. From Kalle Valo.

    10) Provide a way for administrators to "destroy" connected sockets via the
    SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti.

    11) Add support to add/remove multicast routes via netlink, from Nikolay
    Aleksandrov.

    12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

    13) Add forwarding and packet duplication facilities to nf_tables, from
    Pablo Neira Ayuso.

    14) Dead route support in MPLS, from Roopa Prabhu.

    15) TSO support for thunderx chips, from Sunil Goutham.

    16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

    17) Rationalize, consolidate, and more completely document the checksum
    offloading facilities in the networking stack. From Tom Herbert.

    18) Support aborting an ongoing scan in mac80211/cfg80211, from
    Vidyullatha Kanchanapally.

    19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
    net: bnxt: always return values from _bnxt_get_max_rings
    net: bpf: reject invalid shifts
    phonet: properly unshare skbs in phonet_rcv()
    dwc_eth_qos: Fix dma address for multi-fragment skbs
    phy: remove an unneeded condition
    mdio: remove an unneed condition
    mdio_bus: NULL dereference on allocation error
    net: Fix typo in netdev_intersect_features
    net: freescale: mac-fec: Fix build error from phy_device API change
    net: freescale: ucc_geth: Fix build error from phy_device API change
    bonding: Prevent IPv6 link local address on enslaved devices
    IB/mlx5: Add flow steering support
    net/mlx5_core: Export flow steering API
    net/mlx5_core: Make ipv4/ipv6 location more clear
    net/mlx5_core: Enable flow steering support for the IB driver
    net/mlx5_core: Initialize namespaces only when supported by device
    net/mlx5_core: Set priority attributes
    net/mlx5_core: Connect flow tables
    net/mlx5_core: Introduce modify flow table command
    net/mlx5_core: Managing root flow table
    ...

    Linus Torvalds
     

04 Jan, 2016

1 commit


04 Dec, 2015

1 commit


02 Dec, 2015

1 commit

  • This patch is a cleanup to make following patch easier to
    review.

    Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
    from (struct socket)->flags to a (struct socket_wq)->flags
    to benefit from RCU protection in sock_wake_async()

    To ease backports, we rename both constants.

    Two new helpers, sk_set_bit(int nr, struct sock *sk)
    and sk_clear_bit(int nr, struct sock *sk) are added so that
    following patch can change their implementation.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
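
    The helper pattern described in this commit can be sketched in userspace C.
    This is a minimal model, not the kernel code: the structures are simplified
    stand-ins, the flag word is shown already living in socket_wq (the stated
    goal of the follow-up patch), the SOCKWQ_ names are the renamed constants as
    I recall them rather than anything stated in this log, and sk_test_bit() is
    a hypothetical helper added only so the example can read the bits back.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit numbers for the async flags. The SOCKWQ_ prefix is an assumption
 * about the renamed constants; the log does not spell out the new names. */
enum {
	SOCKWQ_ASYNC_NOSPACE  = 0,
	SOCKWQ_ASYNC_WAITDATA = 1,
};

#define FLAG_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Simplified stand-ins for the kernel structures. */
struct socket_wq { unsigned long flags; };
struct sock      { struct socket_wq *wq; };

/* Callers only go through these helpers, so the location of the flag
 * word can change without touching them. Unlike the kernel's set_bit()
 * and clear_bit(), these plain read-modify-write operations are not atomic. */
static inline void sk_set_bit(int nr, struct sock *sk)
{
	assert(nr >= 0 && nr < FLAG_BITS);
	sk->wq->flags |= 1UL << nr;
}

static inline void sk_clear_bit(int nr, struct sock *sk)
{
	assert(nr >= 0 && nr < FLAG_BITS);
	sk->wq->flags &= ~(1UL << nr);
}

/* Hypothetical read-side helper, present only for this example. */
static inline bool sk_test_bit(int nr, const struct sock *sk)
{
	assert(nr >= 0 && nr < FLAG_BITS);
	return (sk->wq->flags >> nr) & 1UL;
}
```

    Because every caller names the bit and the sock rather than the flag word
    itself, a later change of storage only has to edit the helper bodies.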
     

01 Dec, 2015

1 commit

  • The memory barrier in the helper wq_has_sleeper is needed by just
    about every user of waitqueue_active. This patch generalises it
    by making it take a wait_queue_head_t directly. The existing
    helper is renamed to skwq_has_sleeper.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
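
    A rough userspace analogue of the two helpers, using C11 atomics, may make
    the split clearer. It is a sketch under stated assumptions: the wait queue
    is reduced to a waiter counter, the sequentially consistent fence stands in
    for the kernel's smp_mb(), and the NULL check in skwq_has_sleeper() reflects
    my recollection of the socket-specific wrapper rather than text in this log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy wait queue: a waiter count stands in for the kernel's task list. */
typedef struct { atomic_int nr_waiters; } wait_queue_head_t;

struct socket_wq { wait_queue_head_t wait; };

/* Cheap check with no ordering guarantee of its own. */
static inline bool waitqueue_active(wait_queue_head_t *wq)
{
	return atomic_load_explicit(&wq->nr_waiters, memory_order_relaxed) != 0;
}

/* Generic helper: takes the wait queue head directly. */
static inline bool wq_has_sleeper(wait_queue_head_t *wq)
{
	assert(wq != NULL);
	/* Full fence (stand-in for smp_mb()): the waker's earlier store to
	 * the wake-up condition must be ordered before this load, pairing
	 * with a fence on the waiter side between enqueueing itself and
	 * re-checking the condition. */
	atomic_thread_fence(memory_order_seq_cst);
	return waitqueue_active(wq);
}

/* Socket-flavoured wrapper, keeping the old calling convention. */
static inline bool skwq_has_sleeper(struct socket_wq *wq)
{
	return wq != NULL && wq_has_sleeper(&wq->wait);
}
```

    A waker would typically update its condition, call wq_has_sleeper(), and
    only issue the wake-up when it returns true.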
     

25 Nov, 2015

1 commit

  • Normally, the transmit phase of a client call is implicitly ack'd by the
    reception of the first data packet of the response.
    However, if a security negotiation happens, the transmit phase, if it is
    entirely contained in a single packet, may get an ack packet in response
    and then may get aborted due to security negotiation failure.

    Because the client has shifted state to RXRPC_CALL_CLIENT_AWAIT_REPLY due
    to having transmitted all the data, the code that handles processing of the
    received ack packet doesn't note the hard ack of the data packet.

    The following abort packet in the case of security negotiation failure then
    incurs an assertion failure when it tries to drain the Tx queue because the
    hard ack state is out of sync (hard ack means the packets have been
    processed and can be discarded by the sender; a soft ack means that the
    packets are received but could still be discarded and rerequested by the
    receiver).

    To fix this, we should record the hard ack we received for the ack packet.

    The assertion failure looks like:

    RxRPC: Assertion failed
    1 ] [] rxrpc_rotate_tx_window+0xbc/0x131 [af_rxrpc]
    ...

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
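
    The flag split described in this commit can be illustrated with a small
    userspace sketch. The bit values below are arbitrary, and the exact
    compositions of GFP_ATOMIC and GFP_KERNEL are my recollection of the
    resulting definitions, not something this message spells out; the body of
    gfpflags_allow_blocking() is likewise an assumption consistent with the
    description, and gfp_mempool_first_try() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Kernel flag names kept for readability; bit positions are arbitrary. */
#define __GFP_HIGH            0x01u	/* high priority reserve */
#define __GFP_ATOMIC          0x02u	/* truly atomic, no fallback */
#define __GFP_DIRECT_RECLAIM  0x04u	/* caller may sleep in direct reclaim */
#define __GFP_KSWAPD_RECLAIM  0x08u	/* caller wants kswapd woken */
#define __GFP_IO              0x10u
#define __GFP_FS              0x20u

/* __GFP_WAIT now means "direct reclaim and wake kswapd". */
#define __GFP_WAIT  (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Assumed compositions, see the lead-in. */
#define GFP_ATOMIC  (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL  (__GFP_WAIT | __GFP_IO | __GFP_FS)

static_assert((GFP_ATOMIC & __GFP_DIRECT_RECLAIM) == 0,
	      "GFP_ATOMIC callers must never enter direct reclaim");

/* Blocking is now a question about direct reclaim only. */
static inline bool gfpflags_allow_blocking(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}

/* Hypothetical helper for a mempool-backed caller: do not sleep on the
 * first attempt, but still let kswapd be woken and stay out of the
 * atomic reserves. */
static inline gfp_t gfp_mempool_first_try(gfp_t gfp)
{
	return gfp & ~__GFP_DIRECT_RECLAIM;
}
```

    Testing for the whole of __GFP_WAIT would misfire on such a caller, since
    one of its two bits remains set after direct reclaim is cleared; that is
    the false-positive case the helper is meant to avoid.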
     

06 Nov, 2015

1 commit

  • Pull security subsystem update from James Morris:
    "This is mostly maintenance updates across the subsystem, with a
    notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
    maintainer of that"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
    apparmor: clarify CRYPTO dependency
    selinux: Use a kmem_cache for allocation struct file_security_struct
    selinux: ioctl_has_perm should be static
    selinux: use sprintf return value
    selinux: use kstrdup() in security_get_bools()
    selinux: use kmemdup in security_sid_to_context_core()
    selinux: remove pointless cast in selinux_inode_setsecurity()
    selinux: introduce security_context_str_to_sid
    selinux: do not check open perm on ftruncate call
    selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
    KEYS: Merge the type-specific data with the payload data
    KEYS: Provide a script to extract a module signature
    KEYS: Provide a script to extract the sys cert list from a vmlinux file
    keys: Be more consistent in selection of union members used
    certs: add .gitignore to stop git nagging about x509_certificate_list
    KEYS: use kvfree() in add_key
    Smack: limited capability for changing process label
    TPM: remove unnecessary little endian conversion
    vTPM: support little endian guests
    char: Drop owner assignment from i2c_driver
    ...

    Linus Torvalds
     

21 Oct, 2015

1 commit

  • Merge the type-specific data with the payload data into one four-word chunk
    as it seems pointless to keep them separate.

    Use user_key_payload() for accessing the payloads of overloaded
    user-defined keys.

    Signed-off-by: David Howells
    cc: linux-cifs@vger.kernel.org
    cc: ecryptfs@vger.kernel.org
    cc: linux-ext4@vger.kernel.org
    cc: linux-f2fs-devel@lists.sourceforge.net
    cc: linux-nfs@vger.kernel.org
    cc: ceph-devel@vger.kernel.org
    cc: linux-ima-devel@lists.sourceforge.net

    David Howells
     

21 Sep, 2015

1 commit


11 May, 2015

2 commits


12 Apr, 2015

2 commits


01 Apr, 2015

4 commits

  • Handle VERSION Rx protocol packets. We should respond to a VERSION packet
    with a string indicating the Rx version. This is a maximum of 64 characters
    and is padded out to 65 chars with NUL bytes.

    Note that other AFS clients use the version request as a NAT keepalive so we
    need to handle it rather than returning an abort.

    The standard formulation seems to be:

    <project> <version> built <yyyy>-<mm>-<dd>

    for example:

    " OpenAFS 1.6.2 built 2013-05-07 "

    (note the three extra spaces) as obtained with:

    rxdebug grand.mit.edu -version

    from the openafs package.

    Signed-off-by: David Howells

    David Howells
     
  • Use iov_iter_count() in rxrpc_send_data() to get the remaining data length
    instead of using the len argument as the len argument is now redundant.

    Signed-off-by: David Howells

    David Howells
     
  • Don't call skb_add_data() in rxrpc_send_data() if there's no data to copy and
    also skip the calculations associated with it in such a case.

    Signed-off-by: David Howells

    David Howells
     
  • This commit:

    commit af2b040e470b470bfc881981db3c796072853eae
    Author: Al Viro
    Date: Thu Nov 27 21:44:24 2014 -0500
    Subject: rxrpc: switch rxrpc_send_data() to iov_iter primitives

    incorrectly changes a do-while loop into a while loop in rxrpc_send_data().

    Unfortunately, at least one pass through the loop is required - even if
    there is no data - so that the packet that closes the send phase can be
    sent if MSG_MORE is not set.

    Signed-off-by: David Howells

    David Howells
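
    The do-while requirement in the last commit above can be modelled with a
    short sketch. It is not the rxrpc code: send_data(), struct tx_result and
    PKT_PAYLOAD are hypothetical names, and packet queueing is reduced to
    counters. It only shows why the loop body has to run once even when the
    remaining length is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PKT_PAYLOAD ((size_t)1024)	/* hypothetical per-packet payload size */

struct tx_result {
	unsigned int packets;	/* packets queued for transmission */
	bool last_sent;		/* the packet closing the send phase was queued */
};

static struct tx_result send_data(size_t len, bool msg_more)
{
	struct tx_result res = { 0, false };

	/* do-while: with len == 0 and MSG_MORE clear, one pass is still
	 * needed to emit the (empty) packet that closes the send phase.
	 * A plain while (len > 0) loop would skip it entirely. */
	do {
		size_t chunk = len < PKT_PAYLOAD ? len : PKT_PAYLOAD;
		bool full, last;

		len -= chunk;
		full = (chunk == PKT_PAYLOAD);
		last = (len == 0 && !msg_more);

		/* Queue a packet when it is full or when it ends the send
		 * phase; otherwise the partial packet waits for more data. */
		if (full || last) {
			res.packets++;
			res.last_sent = last;
		}
	} while (len > 0);

	assert(len == 0);
	return res;
}
```

    With this model, send_data(0, false) queues exactly one closing packet,
    while send_data(0, true) queues nothing because more data is expected.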
     

21 Mar, 2015

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    net/core/sysctl_net_core.c
    net/ipv4/inet_diag.c

    The be_main.c conflict resolution was really tricky. The conflict
    hunks generated by GIT were very unhelpful, to say the least. It
    split functions in half and moved them around, when the actual
    conflict existed solely inside one function, that being
    be_map_pci_bars().

    So instead, to resolve this, I checked out be_main.c from the top
    of net-next, then I applied the be_main.c changes from 'net' since
    the last time I merged. And this worked beautifully.

    The inet_diag.c and sysctl_net_core.c conflicts were simple
    overlapping changes, and were easy to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Mar, 2015

1 commit

  • [I would really like an ACK on that one from dhowells; it appears to be
    quite straightforward, but...]

    MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as a matter of
    fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
    in there. It gets passed via flags; in fact, another such check in the same
    function is done correctly - as flags & MSG_PEEK.

    It had been that way (effectively disabled) for 8 years, though, so the patch
    needs beating up - that case had never been tested. If it is correct, it's
    -stable fodder.

    Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro
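
    The distinction between the flags argument and msg->msg_flags can be shown
    with a toy receive handler. This is a sketch, not the rxrpc function:
    demo_recvmsg() and struct rx_queue are hypothetical, and only MSG_PEEK and
    struct msghdr come from the real socket headers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* MSG_PEEK, struct msghdr */

/* Hypothetical receive queue of small integer "packets". */
struct rx_queue {
	const int *items;
	size_t head, count;
};

/* Returns the next item, or -1 if the queue is empty. */
static int demo_recvmsg(struct rx_queue *q, struct msghdr *msg, int flags)
{
	int item;

	assert(q != NULL && msg != NULL);
	if (q->head == q->count)
		return -1;

	item = q->items[q->head];

	/* Correct: test the flags argument. A test on msg->msg_flags would
	 * never see MSG_PEEK here, so every read would consume the item. */
	if (!(flags & MSG_PEEK))
		q->head++;

	msg->msg_flags = 0;	/* msg_flags reports flags back to the caller */
	return item;
}
```

    Two consecutive calls with MSG_PEEK return the same item; the first call
    without it returns that item once more and then advances the queue.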
     

10 Mar, 2015

1 commit


09 Mar, 2015

1 commit

  • When reading from the error queue, msg_name and msg_control are only
    populated for some errors. A new exception for empty timestamp skbs
    added a false positive on icmp errors without payload.

    `traceroute -M udpconn` only displayed gateways that return payload
    with the icmp error: the embedded network headers are pulled before
    sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise.

    Fix this regression by refining when msg_name and msg_control
    branches are taken. The solutions for the two fields are independent.

    msg_name only makes sense for errors that configure serr->port and
    serr->addr_offset. Test the first instead of skb->len. This also fixes
    another issue. saddr could hold the wrong data, as serr->addr_offset
    is not initialized in some code paths, pointing to the start of the
    network header. It is only valid when serr->port is set (non-zero).

    msg_control support differs between IPv4 and IPv6. IPv4 only honors
    requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The
    skb->len test can simply be removed, because skb->dev is also tested
    and never true for empty skbs. IPv6 honors requests for all errors
    aside from local errors and timestamps on empty skbs.

    In both cases, make the policy more explicit by moving this logic to
    a new function that decides whether to process msg_control and that
    optionally prepares the necessary fields in skb->cb[]. After this
    change, the IPv4 and IPv6 paths are more similar.

    The last case is rxrpc. Here, simply refine to only match timestamps.

    Fixes: 49ca0d8bfaf3 ("net-timestamp: no-payload option")

    Reported-by: Jan Niehusmann
    Signed-off-by: Willem de Bruijn

    ----

    Changes
    v1->v2
    - fix local origin test inversion in ip6_datagram_support_cmsg
    - make v4 and v6 code paths more similar by introducing analogous
    ipv4_datagram_support_cmsg
    - fix compile bug in rxrpc
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

04 Mar, 2015

1 commit


03 Mar, 2015

1 commit

  • Now that TIPC no longer depends on the iocb argument in its internal
    implementations of the sendmsg() and recvmsg() hooks defined in the proto
    structure, nothing uses the iocb argument in them any more.
    We can therefore drop the redundant iocb argument completely from all
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

02 Mar, 2015

3 commits

  • Commit 3b885787ea4112 ("net: Generalize socket rx gap / receive queue overflow cmsg")
    allowed receiving packet dropcount information as a socket level option.
    RXRPC sockets recvmsg function was changed to support this by calling
    sock_recv_ts_and_drops() instead of sock_recv_timestamp().

    However, protocol families wishing to receive dropcount should call
    sock_queue_rcv_skb() or set the dropcount specifically (as done
    in packet_rcv()). This was not done for rxrpc and thus this feature
    never worked on these sockets.

    Formalize this by not calling sock_recv_ts_and_drops() in rxrpc, as
    part of an effort to move skb->dropcount into skb->cb[].

    Signed-off-by: Eyal Birger
    Signed-off-by: David S. Miller

    Eyal Birger
     
  • rxrpc_resend_timeout has an initial value of 4 * HZ; use it as-is.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Typo, 'stop' is never set to true.
    The intent seems to be to not attempt to retransmit more packets after
    sendmsg returns an error.

    This change is based on code inspection only.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

05 Feb, 2015

1 commit


04 Feb, 2015

2 commits


03 Feb, 2015

1 commit

  • Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
    timestamps, this loops timestamps on top of empty packets.

    Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
    cmsg reception (aside from timestamps) are no longer possible. This
    works together with a follow on patch that allows administrators to
    only allow tx timestamping if it does not loop payload or metadata.

    Signed-off-by: Willem de Bruijn

    ----

    Changes (rfc -> v1)
    - add documentation
    - remove unnecessary skb->len test (thanks to Richard Cochran)
    Signed-off-by: David S. Miller

    Willem de Bruijn
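
    From userspace, requesting payload-free transmit timestamps might look like
    the sketch below. The SOF_TIMESTAMPING_* constants and SO_TIMESTAMPING are
    the real interface; the two function names are hypothetical, and the choice
    of software timestamps is only an assumption made for the example.

```c
#include <assert.h>
#include <sys/socket.h>		/* setsockopt(), SOL_SOCKET, SO_TIMESTAMPING */
#include <linux/net_tstamp.h>	/* SOF_TIMESTAMPING_* */

/* Flag set for software tx timestamps looped back without payload. */
static unsigned int tx_tsonly_flags(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE |	/* generate a software tx timestamp */
	       SOF_TIMESTAMPING_SOFTWARE |	/* report software timestamps */
	       SOF_TIMESTAMPING_OPT_TSONLY;	/* loop the timestamp on an empty packet */
}

/* Returns 0 on success, -1 with errno set on failure. */
static int enable_tx_tsonly(int fd)
{
	unsigned int val = tx_tsonly_flags();

	assert(fd >= 0);
	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}
```

    After enabling this on a socket, the timestamps are read from the error
    queue with recvmsg() and MSG_ERRQUEUE; the returned message carries the
    timestamp control data but none of the original payload.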
     

11 Dec, 2014

1 commit


10 Dec, 2014

1 commit

  • Note that the code _using_ ->msg_iter at that point will be very
    unhappy with anything other than unshifted iovec-backed iov_iter.
    We still need to convert users to proper primitives.

    Signed-off-by: Al Viro

    Al Viro
     

06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be fewer places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Oct, 2014

1 commit

  • Pull security subsystem updates from James Morris.

    Mostly ima, selinux, smack and key handling updates.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
    integrity: do zero padding of the key id
    KEYS: output last portion of fingerprint in /proc/keys
    KEYS: strip 'id:' from ca_keyid
    KEYS: use swapped SKID for performing partial matching
    KEYS: Restore partial ID matching functionality for asymmetric keys
    X.509: If available, use the raw subjKeyId to form the key description
    KEYS: handle error code encoded in pointer
    selinux: normalize audit log formatting
    selinux: cleanup error reporting in selinux_nlmsg_perm()
    KEYS: Check hex2bin()'s return when generating an asymmetric key ID
    ima: detect violations for mmaped files
    ima: fix race condition on ima_rdwr_violation_check and process_measurement
    ima: added ima_policy_flag variable
    ima: return an error code from ima_add_boot_aggregate()
    ima: provide 'ima_appraise=log' kernel option
    ima: move keyring initialization to ima_init()
    PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
    PKCS#7: Better handling of unsupported crypto
    KEYS: Overhaul key identification when searching for asymmetric keys
    KEYS: Implement binary asymmetric key ID handling
    ...

    Linus Torvalds
     

24 Sep, 2014

1 commit


17 Sep, 2014

1 commit

  • A previous patch added a ->match_preparse() method to the key type. This is
    allowed to override the function called by the iteration algorithm.
    Therefore, we can just set a default that simply checks for an exact match of
    the key description with the original criterion data and allow match_preparse
    to override it as needed.

    The key_type::match op is then redundant and can be removed, as can the
    user_match() function.

    Signed-off-by: David Howells
    Acked-by: Vivek Goyal

    David Howells
     

10 Sep, 2014

1 commit


02 Sep, 2014

1 commit

  • sk->sk_error_queue is dequeued in four locations. All share the
    exact same logic. Deduplicate.

    Also collapse the two critical sections for dequeue (at the top of
    the recv handler) and signal (at the bottom).

    This moves signal generation for the next packet forward, which should
    be harmless.

    It also changes the behavior if the recv handler exits early with an
    error. Previously, a signal for follow-up packets on the errqueue
    would then not be scheduled. The new behavior, to always signal, is
    arguably a bug fix.

    For rxrpc, the change causes the same function to be called repeatedly
    for each queued packet (because the recv handler == sk_error_report).
    It is likely that all packets will fail for the same reason (e.g.,
    memory exhaustion).

    This code runs without sk_lock held, so it is not safe to trust that
    sk->sk_err is immutable in between releasing q->lock and the subsequent
    test. Introduce int err just to avoid this potential race.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
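
    The deduplicated dequeue logic can be approximated in userspace as follows.
    This is a simplified model under stated assumptions: struct err_queue, its
    fixed-size array and the reports counter are hypothetical stand-ins for the
    skb queue and for sk_error_report(), and a pthread mutex replaces the
    queue spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ERRQ_MAX 8	/* hypothetical fixed capacity */

struct err_queue {
	pthread_mutex_t lock;
	int ee_errno[ERRQ_MAX];	/* one errno per queued error */
	size_t head, tail;
	int sk_err;		/* pending error seen by the socket owner */
	unsigned int reports;	/* stand-in for calls to sk_error_report() */
};

/* Pop the oldest error into *out; returns 0 on success, -1 if empty. */
static int sock_dequeue_err(struct err_queue *q, int *out)
{
	int err = 0, ret = -1;

	assert(q != NULL && out != NULL);

	/* One critical section covers both the dequeue and the peek at
	 * the follow-up entry. */
	pthread_mutex_lock(&q->lock);
	if (q->head != q->tail) {
		*out = q->ee_errno[q->head++];
		ret = 0;
		if (q->head != q->tail)
			err = q->ee_errno[q->head];
	}
	pthread_mutex_unlock(&q->lock);

	/* Decide on the local copy: q->sk_err may be changed by someone
	 * else once the lock is dropped, so it is not re-read here. */
	q->sk_err = err;
	if (err)
		q->reports++;
	return ret;
}
```

    Each successful dequeue signals once if another error is still queued,
    which mirrors the "always signal" behaviour the message argues for.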
     

23 Aug, 2014

1 commit