11 Dec, 2014

1 commit


10 Dec, 2014

1 commit

  • Note that the code _using_ ->msg_iter at that point will be very
    unhappy with anything other than unshifted iovec-backed iov_iter.
    We still need to convert users to proper primitives.

    Signed-off-by: Al Viro

    Al Viro
     

06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be fewer places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Oct, 2014

1 commit

  • Pull security subsystem updates from James Morris.

    Mostly ima, selinux, smack and key handling updates.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
    integrity: do zero padding of the key id
    KEYS: output last portion of fingerprint in /proc/keys
    KEYS: strip 'id:' from ca_keyid
    KEYS: use swapped SKID for performing partial matching
    KEYS: Restore partial ID matching functionality for asymmetric keys
    X.509: If available, use the raw subjKeyId to form the key description
    KEYS: handle error code encoded in pointer
    selinux: normalize audit log formatting
    selinux: cleanup error reporting in selinux_nlmsg_perm()
    KEYS: Check hex2bin()'s return when generating an asymmetric key ID
    ima: detect violations for mmaped files
    ima: fix race condition on ima_rdwr_violation_check and process_measurement
    ima: added ima_policy_flag variable
    ima: return an error code from ima_add_boot_aggregate()
    ima: provide 'ima_appraise=log' kernel option
    ima: move keyring initialization to ima_init()
    PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
    PKCS#7: Better handling of unsupported crypto
    KEYS: Overhaul key identification when searching for asymmetric keys
    KEYS: Implement binary asymmetric key ID handling
    ...

    Linus Torvalds
     

24 Sep, 2014

1 commit


17 Sep, 2014

1 commit

  • A previous patch added a ->match_preparse() method to the key type. This is
    allowed to override the function called by the iteration algorithm.
    Therefore, we can just set a default that simply checks for an exact match of
    the key description with the original criterion data and allow match_preparse
    to override it as needed.

    The key_type::match op is then redundant and can be removed, as can the
    user_match() function.

    Signed-off-by: David Howells
    Acked-by: Vivek Goyal

    David Howells
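
The default-plus-override arrangement described in this commit can be sketched in plain C. This is only a userspace illustration: the struct and function names (all suffixed _sketch) are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct key_sketch {
	const char *description;
};

struct key_match_data_sketch {
	/* Comparison used by the search; a key type may replace it. */
	bool (*cmp)(const struct key_sketch *key,
		    const struct key_match_data_sketch *md);
	const char *raw_data;	/* the original criterion data */
};

/* Default comparison: exact match of description and criterion. */
bool key_default_cmp_sketch(const struct key_sketch *key,
			    const struct key_match_data_sketch *md)
{
	return strcmp(key->description, md->raw_data) == 0;
}

/* Set the default first, then let the type's preparse hook override it. */
void key_match_setup_sketch(struct key_match_data_sketch *md,
			    const char *criterion,
			    void (*match_preparse)(struct key_match_data_sketch *md))
{
	md->cmp = key_default_cmp_sketch;
	md->raw_data = criterion;
	if (match_preparse)
		match_preparse(md);
}
```

A key type with no special needs passes no preparse hook and gets the exact-match default, which is why a separate per-type match op becomes redundant.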
     

10 Sep, 2014

1 commit


02 Sep, 2014

1 commit

  • sk->sk_error_queue is dequeued in four locations. All share the
    exact same logic. Deduplicate.

    Also collapse the two critical sections for dequeue (at the top of
    the recv handler) and signal (at the bottom).

    This moves signal generation for the next packet forward, which should
    be harmless.

    It also changes the behavior if the recv handler exits early with an
    error. Previously, a signal for follow-up packets on the errqueue
    would then not be scheduled. The new behavior, to always signal, is
    arguably a bug fix.

    For rxrpc, the change causes the same function to be called repeatedly
    for each queued packet (because the recv handler == sk_error_report).
    It is likely that all packets will fail for the same reason (e.g.,
    memory exhaustion).

    This code runs without sk_lock held, so it is not safe to trust that
    sk->sk_err is immutable in between releasing q->lock and the subsequent
    test. Introduce int err just to avoid this potential race.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
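
The "introduce int err" point can be illustrated with a small userspace model. It assumes a pthread mutex in place of the queue spinlock and a counter in place of the error-report callback; every name here is hypothetical and the real helper differs in detail.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of one queued error packet. */
struct err_pkt_sketch {
	struct err_pkt_sketch *next;
	int ee_errno;
};

/* Hypothetical model of the socket and its error queue. */
struct err_sock_sketch {
	pthread_mutex_t lock;		/* stands in for q->lock */
	struct err_pkt_sketch *head;	/* stands in for sk_error_queue */
	int sk_err;			/* may be changed by other contexts */
	unsigned int reports;		/* counts "signal" invocations */
};

/* Dequeue one packet and signal if another packet is still queued. */
struct err_pkt_sketch *dequeue_err_sketch(struct err_sock_sketch *sk)
{
	struct err_pkt_sketch *pkt;
	int err = 0;

	pthread_mutex_lock(&sk->lock);
	pkt = sk->head;
	if (pkt) {
		sk->head = pkt->next;
		pkt->next = NULL;
		if (sk->head)
			err = sk->head->ee_errno;	/* peek at next packet */
	}
	pthread_mutex_unlock(&sk->lock);

	sk->sk_err = err;
	if (err)		/* test the local copy, not sk->sk_err */
		sk->reports++;
	return pkt;
}
```

Testing the local err rather than re-reading sk->sk_err means the decision to signal is based on the value observed under the lock, even if another context rewrites sk_err right afterwards.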
     

23 Aug, 2014

1 commit


07 Aug, 2014

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Steady transitioning of the BPF infrastructure to a generic spot so
    all kernel subsystems can make use of it, from Alexei Starovoitov.

    2) SFC driver supports busy polling, from Alexandre Rames.

    3) Take advantage of hash table in UDP multicast delivery, from David
    Held.

    4) Lighten locking, in particular by getting rid of the LRU lists, in
    inet frag handling. From Florian Westphal.

    5) Add support for various RFC6458 control messages in SCTP, from
    Geir Ola Vaagland.

    6) Allow to filter bridge forwarding database dumps by device, from
    Jamal Hadi Salim.

    7) virtio-net also now supports busy polling, from Jason Wang.

    8) Some low level optimization tweaks in pktgen from Jesper Dangaard
    Brouer.

    9) Add support for ipv6 address generation modes, so that userland
    can have some input into the process. From Jiri Pirko.

    10) Consolidate common TCP connection request code in ipv4 and ipv6,
    from Octavian Purdila.

    11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.

    12) Generic resizable RCU hash table, with initial users in netlink and
    nftables. From Thomas Graf.

    13) Maintain a name assignment type so that userspace can see where a
    network device name came from (enumerated by kernel, assigned
    explicitly by userspace, etc.) From Tom Gundersen.

    14) Automatic flow label generation on transmit in ipv6, from Tom
    Herbert.

    15) New packet timestamping facilities from Willem de Bruijn, meant to
    assist in measuring latencies going into/out-of the packet
    scheduler, latency from TCP data transmission to ACK, etc"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
    cxgb4 : Disable recursive mailbox commands when enabling vi
    net: reduce USB network driver config options.
    tg3: Modify tg3_tso_bug() to handle multiple TX rings
    amd-xgbe: Perform phy connect/disconnect at dev open/stop
    amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
    net: sun4i-emac: fix memory leak on bad packet
    sctp: fix possible seqlock seadlock in sctp_packet_transmit()
    Revert "net: phy: Set the driver when registering an MDIO bus device"
    cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
    team: Simplify return path of team_newlink
    bridge: Update outdated comment on promiscuous mode
    net-timestamp: ACK timestamp for bytestreams
    net-timestamp: TCP timestamping
    net-timestamp: SCHED timestamp on entering packet scheduler
    net-timestamp: add key to disambiguate concurrent datagrams
    net-timestamp: move timestamp flags out of sk_flags
    net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
    cxgb4i : Move stray CPL definitions to cxgb4 driver
    tcp: reduce spurious retransmits due to transient SACK reneging
    qlcnic: Initialize dcbnl_ops before register_netdev
    ...

    Linus Torvalds
     

23 Jul, 2014

1 commit


21 Jul, 2014

1 commit


17 May, 2014

1 commit


12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    an access to freed memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about its
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
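
A minimal userspace sketch of the resulting call pattern follows. The queue, socket and callback types are hypothetical models (suffixed _sketch); the only point is that the notification callback takes no length, so nothing reads the buffer after it has been queued.

```c
#include <stddef.h>

/* Hypothetical model of a packet buffer. */
struct skb_sketch {
	struct skb_sketch *next;
	unsigned int len;
};

/* Hypothetical model of a socket with a receive queue. */
struct sock_sketch {
	struct skb_sketch *rx_head, *rx_tail;
	unsigned int wakeups;
	/* New-style callback: no length argument. */
	void (*sk_data_ready)(struct sock_sketch *sk);
};

/* Example callback that just counts notifications. */
void count_wakeup_sketch(struct sock_sketch *sk)
{
	sk->wakeups++;
}

void queue_and_notify_sketch(struct sock_sketch *sk, struct skb_sketch *skb)
{
	skb->next = NULL;
	if (sk->rx_tail)
		sk->rx_tail->next = skb;
	else
		sk->rx_head = skb;
	sk->rx_tail = skb;

	/*
	 * From here on a consumer may already have dequeued and freed
	 * skb, so it is not touched again; the callback needs only sk.
	 */
	sk->sk_data_ready(sk);
}
```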
     

04 Mar, 2014

1 commit

  • Keep track of rxrpc_call structures in a hashtable so they can be
    found directly from the network parameters which define the call.

    This allows incoming packets to be routed directly to a call without walking
    through the hierarchy of peer -> transport -> connection -> call and all the
    spinlocks that that entailed.

    Signed-off-by: Tim Smith
    Signed-off-by: David Howells

    Tim Smith
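
The idea of finding a call directly from its network parameters can be sketched as a chained hash table. This is an illustration only: the key fields, the table size and the FNV-style mixing function are assumptions chosen for the example, and the locking a real implementation needs is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define CALL_HASH_BITS_SKETCH 10
#define CALL_HASH_SIZE_SKETCH (1u << CALL_HASH_BITS_SKETCH)

/* Hypothetical set of parameters identifying a call. */
struct call_key_sketch {
	uint32_t peer_addr;
	uint16_t peer_port;
	uint16_t service_id;
	uint32_t epoch;
	uint32_t cid;
	uint32_t call_id;
};

struct call_sketch {
	struct call_key_sketch key;
	struct call_sketch *hash_next;	/* bucket chain */
};

static struct call_sketch *call_hash_sketch[CALL_HASH_SIZE_SKETCH];

/* FNV-style xor-and-multiply step, applied one field at a time. */
static uint32_t mix_sketch(uint32_t h, uint32_t v)
{
	return (h ^ v) * 16777619u;
}

static uint32_t call_key_hash_sketch(const struct call_key_sketch *k)
{
	uint32_t h = 2166136261u;

	h = mix_sketch(h, k->peer_addr);
	h = mix_sketch(h, ((uint32_t)k->peer_port << 16) | k->service_id);
	h = mix_sketch(h, k->epoch);
	h = mix_sketch(h, k->cid);
	h = mix_sketch(h, k->call_id);
	return h & (CALL_HASH_SIZE_SKETCH - 1);
}

static int call_key_equal_sketch(const struct call_key_sketch *a,
				 const struct call_key_sketch *b)
{
	return a->peer_addr == b->peer_addr && a->peer_port == b->peer_port &&
	       a->service_id == b->service_id && a->epoch == b->epoch &&
	       a->cid == b->cid && a->call_id == b->call_id;
}

void call_hash_add_sketch(struct call_sketch *call)
{
	uint32_t b = call_key_hash_sketch(&call->key);

	call->hash_next = call_hash_sketch[b];
	call_hash_sketch[b] = call;
}

/* One hash computation and a short chain walk replace the hierarchy walk. */
struct call_sketch *call_hash_find_sketch(const struct call_key_sketch *key)
{
	struct call_sketch *c = call_hash_sketch[call_key_hash_sketch(key)];

	while (c && !call_key_equal_sketch(&c->key, key))
		c = c->hash_next;
	return c;
}
```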
     

27 Feb, 2014

5 commits

  • Set the RxRPC header flag to request an ACK packet for every odd-numbered DATA
    packet unless it's the last one (which implicitly requests an ACK anyway).
    This is similar to how librx appears to work.

    If we don't do this, we'll send out a full window of packets and then just sit
    there until the other side gets bored and sends an ACK to indicate that it's
    been idle for a while and has received no new packets.

    Requesting a lot of ACKs shouldn't be a problem as ACKs should be merged when
    possible.

    As AF_RXRPC currently works, it will schedule an ACK to be generated upon
    receipt of a DATA packet with the ACK-request flag set - and in the time
    taken to schedule this in a work queue, several other packets are likely to
    arrive and then all get ACK'd together.

    Signed-off-by: David Howells

    David Howells
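
The rule in this commit (request an ACK on every odd-numbered DATA packet except the last) reduces to a one-line predicate. The function name is hypothetical and the sketch leaves out how the flag is actually written into the packet header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a DATA packet should carry an explicit ACK request.
 * The last packet of a call implicitly requests an ACK, so it never
 * needs the flag; otherwise odd sequence numbers ask for one.
 */
bool want_ack_request_sketch(uint32_t seq, bool is_last_packet)
{
	return !is_last_packet && (seq & 1u) != 0;
}
```

With this rule roughly every second packet in a window asks for an ACK, so the sender is not left waiting for the peer's idle timeout after transmitting a full window.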
     
  • Expose RxRPC parameters via sysctls to control the Rx window size, the Rx MTU
    maximum size and the number of packets that can be glued into a jumbo packet.

    More info added to Documentation/networking/rxrpc.txt.

    Signed-off-by: David Howells

    David Howells
     
  • Improve ACK production by the following means:

    (1) Don't send an ACK_REQUESTED ack immediately even if the RXRPC_MORE_PACKETS
    flag isn't set on a data packet that also has RXRPC_REQUEST_ACK set.

    MORE_PACKETS just means that the sender just emptied its Tx data buffer.
    More data will be forthcoming unless RXRPC_LAST_PACKET is also flagged.

    It is possible to see runs of DATA packets with MORE_PACKETS unset that
    aren't waiting for an ACK.

    It is therefore better to wait a small instant to see if we can combine an
    ACK for several packets.

    (2) Don't send an ACK_IDLE ack immediately unless we're responding to the
    terminal data packet of a call.

    Whilst sending an ACK_IDLE mid-call serves to let the other side know
    that we won't be asking it to resend certain Tx buffers and that it can
    discard them, spamming it with loads of acks just because we've
    temporarily run out of data just distracts it.

    (3) Put the ACK_IDLE ack generation timeout up to half a second rather than a
    single jiffy. Just because we haven't been given more data immediately
    doesn't mean that more isn't forthcoming. The other side may be busily
    finding the data to send to us.

    Signed-off-by: David Howells

    David Howells
     
  • Add sysctls for configuring RxRPC protocol handling, specifically controls on
    delays before ack generation, the delay before resending a packet, the maximum
    lifetime of a call and the expiration times of calls, connections and
    transports that haven't been recently used.

    More info added in Documentation/networking/rxrpc.txt.

    Signed-off-by: David Howells

    David Howells
     
  • AF_RXRPC sends UDP packets with the "Don't Fragment" bit set in an attempt to
    determine the maximum packet size between the local socket and the peer by
    invoking the generation of ICMP_FRAG_NEEDED packets.

    Once a packet is sent with the "Don't Fragment" bit set, it is then
    inconvenient to break it up as that requires recalculating all the rxrpc serial
    and sequence numbers and reencrypting all the fragments, so we switch off the
    "Don't Fragment" service temporarily and send the bounced packet again. Future
    packets then use the new MTU.

    That's all fine. The problem lies in rxrpc_UDP_error_report() where the code
    that deals with ICMP_FRAG_NEEDED packets lives. Packets of this type have a
    field (ee_info) to indicate the maximum packet size at the reporting node - but
    sometimes ee_info isn't filled in and is just left as 0 and the code must allow
    for this.

    When ee_info is 0, the code should take the MTU size we're currently using and
    reduce it for the next packet we want to send. However, it takes ee_info
    (which is known to be 0) and tries to reduce that instead.

    This was discovered by Coverity.

    Reported-by: Dave Jones
    Signed-off-by: David Howells

    David Howells
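
The intended behaviour can be sketched as a pure function. The specific reduction policy below (halve above 1500, otherwise step down by 100, with a floor) is an assumption made for illustration, as are the function and parameter names; the point taken from the commit is only that the estimate must start from the MTU currently in use when ee_info is 0.

```c
/*
 * Pick the MTU to use after an ICMP "fragmentation needed" report.
 * cur_mtu: MTU currently in use for the peer
 * ee_info: MTU reported by the bouncing node, or 0 if not filled in
 * min_mtu: hypothetical lower bound for the estimate
 */
unsigned int next_mtu_sketch(unsigned int cur_mtu, unsigned int ee_info,
			     unsigned int min_mtu)
{
	unsigned int mtu = ee_info;

	if (mtu == 0) {
		/* No size given: estimate from the current MTU, not ee_info. */
		mtu = cur_mtu;
		if (mtu > 1500) {
			mtu >>= 1;
			if (mtu < 1500)
				mtu = 1500;
		} else if (mtu > min_mtu + 100) {
			mtu -= 100;
		} else {
			mtu = min_mtu;
		}
	}
	return mtu;
}
```

Starting the estimate from ee_info when it is 0, as the buggy code did, would apply the reduction to zero instead of to a usable size.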
     

08 Feb, 2014

2 commits

  • When an ABORT is sent, aborting a connection, the sender quite reasonably
    forgets about the connection. If another frame is received, another ABORT
    will be sent. When the receiver gets it, it no longer applies to an extant
    connection, so an ABORT is sent, and so on...

    Prevent this by never sending a rejection for an ABORT packet.

    Signed-off-by: Tim Smith
    Signed-off-by: David Howells

    Tim Smith
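
The loop-breaking rule can be written as a small predicate. The enum names are hypothetical; the numeric packet type values follow the Rx protocol as commonly documented (DATA 1, ACK 2, ABORT 4) and should be treated as an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for a few Rx packet types. */
enum {
	PKT_TYPE_DATA_SKETCH  = 1,
	PKT_TYPE_ACK_SKETCH   = 2,
	PKT_TYPE_ABORT_SKETCH = 4,
};

/*
 * A packet that matches no known connection is normally answered with
 * an ABORT. Never answer an ABORT that way, otherwise two endpoints
 * that have both forgotten the connection reject each other forever.
 */
bool may_send_reject_sketch(uint8_t pkt_type)
{
	return pkt_type != PKT_TYPE_ABORT_SKETCH;
}
```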
     
  • The UDP checksum was already verified in rxrpc_data_ready() - which calls
    skb_checksum_complete() - as the RxRPC packet header contains no checksum of
    its own. Subsequent calls to skb_copy_and_csum_datagram_iovec() are thus
    redundant and are, in any case, being passed only a subset of the UDP payload -
    so the checksum will always fail if that path is taken.

    So there is no need to check skb->ip_summed in rxrpc_recvmsg(), and no need for
    the csum_copy_error: exit path.

    Signed-off-by: Tim Smith
    Signed-off-by: David Howells

    Tim Smith
     

29 Jan, 2014

1 commit


26 Jan, 2014

3 commits

  • On input, CHECKSUM_PARTIAL should be treated the same way as
    CHECKSUM_UNNECESSARY. See include/linux/skbuff.h

    Signed-off-by: Tim Smith
    Signed-off-by: David Howells

    Tim Smith
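
As a sketch of the receive-side decision, the following models the four checksum states with a hypothetical enum (these are not the kernel's macros) and treats the partial state like the unnecessary state, as the commit describes.

```c
#include <stdbool.h>

/* Hypothetical model of the checksum states recorded on a buffer. */
enum csum_state_sketch {
	CSUM_NONE_SKETCH,		/* nothing has been verified */
	CSUM_UNNECESSARY_SKETCH,	/* already verified, e.g. by the device */
	CSUM_COMPLETE_SKETCH,		/* a sum was supplied but must still be checked */
	CSUM_PARTIAL_SKETCH,		/* on input: handled like UNNECESSARY */
};

/* Does the receive path still have to verify the checksum itself? */
bool rx_csum_verify_needed_sketch(enum csum_state_sketch state)
{
	return state != CSUM_UNNECESSARY_SKETCH &&
	       state != CSUM_PARTIAL_SKETCH;
}
```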
     
  • skb_kill_datagram() does not dequeue the skb when MSG_PEEK is unset.
    This leaves a free'd skb on the queue, resulting in a double-free later.

    Without this, the following oops can occur:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] skb_dequeue+0x47/0x70
    PGD 0
    Oops: 0002 [#1] SMP
    Modules linked in: af_rxrpc ...
    CPU: 0 PID: 1191 Comm: listen Not tainted 3.12.0+ #4
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8801183536b0 ti: ffff880035c92000 task.ti: ffff880035c92000
    RIP: 0010:[] skb_dequeue+0x47/0x70
    RSP: 0018:ffff880035c93db8 EFLAGS: 00010097
    RAX: 0000000000000246 RBX: ffff8800d2754b00 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8800d254c084
    RBP: ffff880035c93dd0 R08: ffff880035c93cf0 R09: ffff8800d968f270
    R10: 0000000000000000 R11: 0000000000000293 R12: ffff8800d254c070
    R13: ffff8800d254c084 R14: ffff8800cd861240 R15: ffff880119b39720
    FS: 00007f37a969d740(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000008 CR3: 00000000d4413000 CR4: 00000000000006f0
    Stack:
    ffff8800d254c000 ffff8800d254c070 ffff8800d254c2c0 ffff880035c93df8
    ffffffffa041a5b8 ffff8800cd844c80 ffffffffa04385a0 ffff8800cd844cb0
    ffff880035c93e18 ffffffff81546cef ffff8800d45fea00 0000000000000008
    Call Trace:
    [] rxrpc_release+0x128/0x2e0 [af_rxrpc]
    [] sock_release+0x1f/0x80
    [] sock_close+0x12/0x20
    [] __fput+0xe1/0x230
    [] ____fput+0xe/0x10
    [] task_work_run+0xbc/0xe0
    [] do_exit+0x2be/0xa10
    [] ? do_munmap+0x297/0x3b0
    [] do_group_exit+0x3f/0xa0
    [] SyS_exit_group+0x14/0x20
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Tim Smith
    Signed-off-by: David Howells

    Tim Smith
     
  • If rx->conn is not NULL, rxrpc_connect_exclusive() does not
    acquire the transport's client lock, but it still releases it.

    The patch adds locking of the spinlock to this path.

    Found by Linux Driver Verification project (linuxtesting.org).

    Signed-off-by: Alexey Khoroshilov
    Signed-off-by: David Howells

    Alexey Khoroshilov
     

22 Jan, 2014

1 commit

  • Smatch complains because we are using an untrusted index into the
    rxrpc_acks[] array. It's just a read and it's only in the debug code,
    but it's simple enough to add a check and fix it.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
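
The kind of check this commit adds can be sketched as follows. The table contents and names are illustrative assumptions; the technique is simply to clamp an untrusted index to a fallback slot before reading the array.

```c
#include <stddef.h>

/* Illustrative name table; the last entry is the fallback. */
static const char *const ack_names_sketch[] = {
	"---", "REQ", "DUP", "OOS", "WIN", "MEM", "PNG", "PNR", "DLY", "IDL",
	"-?-",
};

#define ACK_NAMES_COUNT_SKETCH \
	(sizeof(ack_names_sketch) / sizeof(ack_names_sketch[0]))

/* Map a reason code taken from the wire to a printable name. */
const char *ack_name_sketch(unsigned int reason)
{
	if (reason >= ACK_NAMES_COUNT_SKETCH)
		reason = ACK_NAMES_COUNT_SKETCH - 1;	/* clamp to fallback */
	return ack_names_sketch[reason];
}
```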
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
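
A userspace approximation of the compile-time size check is sketched below. DECLARE_SOCKADDR_SKETCH is a hypothetical macro written for this example, using C11 _Static_assert and struct sockaddr_storage as the reserved buffer; the kernel's own macro is implemented differently.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Declare a typed pointer to an address buffer and refuse to compile
 * if the pointed-to type could not fit in the reserved storage.
 */
#define DECLARE_SOCKADDR_SKETCH(type, dst, src)				\
	_Static_assert(sizeof(*(type)0) <= sizeof(struct sockaddr_storage), \
		       "sockaddr type larger than msg_name storage");	\
	type dst = (type)(src)

/* Example user: read the port from an IPv4 address in msg_name. */
unsigned short port_of_sketch(void *msg_name)
{
	DECLARE_SOCKADDR_SKETCH(struct sockaddr_in *, sin, msg_name);

	return ntohs(sin->sin_port);
}
```

Because the check is a static assertion, an oversized address structure is rejected at build time rather than overrunning the buffer at run time.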
     

21 Nov, 2013

1 commit


20 Oct, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
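
A tiny example of the equivalence the commit relies on, using a hypothetical function name: both declarations below declare the same function with external linkage, so the extern keyword adds nothing.

```c
/* Old style: explicit extern on a function prototype. */
extern int add_sketch(int a, int b);

/* Preferred style: identical meaning, since external linkage is the default. */
int add_sketch(int a, int b);

int add_sketch(int a, int b)
{
	return a + b;
}
```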
     

22 Feb, 2013

1 commit

  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     

19 Feb, 2013

2 commits

  • proc_net_remove() is only used to remove proc entries directly
    under /proc/net; it is not a general function for removing
    the proc entries of a netns. To remove proc entries under
    /proc/net/stat/, we still need to call remove_proc_entry().

    This patch replaces proc_net_remove() with remove_proc_entry().
    proc_net_remove() can be removed after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This is inconsistent. This patch changes all uses of
    proc_net_fops_create to proc_create, so proc_net_fops_create
    can be removed after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

12 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: "David S. Miller"
    Signed-off-by: Kees Cook
    Acked-by: David S. Miller

    Kees Cook
     

10 Jan, 2013

1 commit


15 Oct, 2012

1 commit

  • Pull module signing support from Rusty Russell:
    "module signing is the highlight, but it's an all-over David Howells frenzy..."

    Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

    * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
    X.509: Fix indefinite length element skip error handling
    X.509: Convert some printk calls to pr_devel
    asymmetric keys: fix printk format warning
    MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
    MODSIGN: Make mrproper should remove generated files.
    MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
    MODSIGN: Use the same digest for the autogen key sig as for the module sig
    MODSIGN: Sign modules during the build process
    MODSIGN: Provide a script for generating a key ID from an X.509 cert
    MODSIGN: Implement module signature checking
    MODSIGN: Provide module signing public keys to the kernel
    MODSIGN: Automatically generate module signing keys if missing
    MODSIGN: Provide Kconfig options
    MODSIGN: Provide gitignore and make clean rules for extra files
    MODSIGN: Add FIPS policy
    module: signature checking hook
    X.509: Add a crypto key parser for binary (DER) X.509 certificates
    MPILIB: Provide a function to read raw data into an MPI
    X.509: Add an ASN.1 decoder
    X.509: Add simple ASN.1 grammar compiler
    ...

    Linus Torvalds
     

08 Oct, 2012

1 commit

  • Give the key type the opportunity to preparse the payload prior to the
    instantiation and update routines being called. This is done with the
    provision of two new key type operations:

    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);

    If the first operation is present, then it is called before key creation (in
    the add/update case) or before the key semaphore is taken (in the update and
    instantiate cases). The second operation is called to clean up if the first
    was called.

    preparse() is given the opportunity to fill in the following structure:

    struct key_preparsed_payload {
    char *description;
    void *type_data[2];
    void *payload;
    const void *data;
    size_t datalen;
    size_t quotalen;
    };

    Before the preparser is called, the first three fields will have been cleared,
    the payload pointer and size will be stored in data and datalen and the default
    quota size from the key_type struct will be stored into quotalen.

    The preparser may parse the payload in any way it likes and may store data in
    the type_data[] and payload fields for use by the instantiate() and update()
    ops.

    The preparser may also propose a description for the key by attaching it as a
    string to the description field. This can be used by passing a NULL or ""
    description to the add_key() system call or the key_create_or_update()
    function. This cannot work with request_key() as that requires the description
    to tell the upcall about the key to be created.

    This, for example permits keys that store PGP public keys to generate their own
    name from the user ID and public key fingerprint in the key.

    The instantiate() and update() operations are then modified to look like this:

    int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
    int (*update)(struct key *key, struct key_preparsed_payload *prep);

    and the new payload data is passed in *prep, whether or not it was preparsed.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
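
The calling sequence described in this commit can be modelled in userspace. The key_preparsed_payload layout is copied from the commit text; everything else (key_sketch, key_type_sketch, the demo_* operations and create_key_sketch) is hypothetical and only illustrates the order of the calls.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct key_preparsed_payload {
	char *description;
	void *type_data[2];
	void *payload;
	const void *data;
	size_t datalen;
	size_t quotalen;
};

/* Hypothetical, minimal key and key type. */
struct key_sketch {
	char *description;
};

struct key_type_sketch {
	size_t def_datalen;
	int (*preparse)(struct key_preparsed_payload *prep);
	void (*free_preparse)(struct key_preparsed_payload *prep);
	int (*instantiate)(struct key_sketch *key,
			   struct key_preparsed_payload *prep);
};

/* Demo preparser: propose the payload text as the key description. */
int demo_preparse_sketch(struct key_preparsed_payload *prep)
{
	prep->description = malloc(prep->datalen + 1);
	if (!prep->description)
		return -ENOMEM;
	memcpy(prep->description, prep->data, prep->datalen);
	prep->description[prep->datalen] = '\0';
	return 0;
}

/* Demo cleanup: release whatever the preparser still owns. */
void demo_free_preparse_sketch(struct key_preparsed_payload *prep)
{
	free(prep->description);
}

/* Demo instantiate: take ownership of the proposed description. */
int demo_instantiate_sketch(struct key_sketch *key,
			    struct key_preparsed_payload *prep)
{
	key->description = prep->description;
	prep->description = NULL;
	return 0;
}

int create_key_sketch(const struct key_type_sketch *type,
		      struct key_sketch *key, const void *data, size_t datalen)
{
	struct key_preparsed_payload prep;
	int ret;

	memset(&prep, 0, sizeof(prep));	/* clears the first three fields */
	prep.data = data;
	prep.datalen = datalen;
	prep.quotalen = type->def_datalen;

	if (type->preparse) {
		ret = type->preparse(&prep);
		if (ret < 0)
			goto out;
	}
	ret = type->instantiate(key, &prep);
out:
	if (type->preparse && type->free_preparse)
		type->free_preparse(&prep);	/* only if preparse was called */
	return ret;
}
```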
     

14 Sep, 2012

1 commit


11 Jul, 2012

2 commits