14 Oct, 2013

1 commit

  • [ Upstream commit bd784a140712fd06674f2240eecfc4ccae421129 ]

    DCCP shouldn't be setting sk_err on redirects as it
    isn't an error condition. it should be doing exactly
    what tcp is doing and leaving the error handler without
    touching the socket.

    Signed-off-by: Duan Jiong
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Duan Jiong
     

18 Mar, 2013

1 commit

  • TCPCT uses option-number 253, reserved for experimental use and should
    not be used in production environments.
    Further, TCPCT does not fully implement RFC 6013.

    As a nice side-effect, removing TCPCT increases TCP's performance for
    very short flows:

    Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
    for files of 1KB size.

    before this patch:
    average (among 7 runs) of 20845.5 Requests/Second
    after:
    average (among 7 runs) of 21403.6 Requests/Second

    Signed-off-by: Christoph Paasch
    Signed-off-by: David S. Miller

    Christoph Paasch
     

22 Feb, 2013

1 commit

  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     

19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    that under /proc/net,it's not a general function for
    removing proc entries of netns. if we want to remove
    some proc entries which under /proc/net/stat/, we still
    need to call remove_proc_entry.

    this patch use remove_proc_entry to replace proc_net_remove.
    we can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    It looks a little chaos.this patch changes all of
    proc_net_fops_create to proc_create. we can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

12 Jan, 2013

2 commits

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: Gerrit Renker
    CC: "David S. Miller"
    Signed-off-by: Kees Cook
    Acked-by: David S. Miller
    Acked-by: Gerrit Renker

    Kees Cook
     
  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: Gerrit Renker
    CC: "David S. Miller"
    Signed-off-by: Kees Cook
    Acked-by: David S. Miller
    Acked-by: Gerrit Renker

    Kees Cook
     

15 Dec, 2012

1 commit

  • If in either of the above functions inet_csk_route_child_sock() or
    __inet_inherit_port() fails, the newsk will not be freed:

    unreferenced object 0xffff88022e8a92c0 (size 1592):
    comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
    hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x21/0x3e
    [] kmem_cache_alloc+0xb5/0xc5
    [] sk_prot_alloc.isra.53+0x2b/0xcd
    [] sk_clone_lock+0x16/0x21e
    [] inet_csk_clone_lock+0x10/0x7b
    [] tcp_create_openreq_child+0x21/0x481
    [] tcp_v4_syn_recv_sock+0x3a/0x23b
    [] tcp_check_req+0x29f/0x416
    [] tcp_v4_do_rcv+0x161/0x2bc
    [] tcp_v4_rcv+0x6c9/0x701
    [] ip_local_deliver_finish+0x70/0xc4
    [] ip_local_deliver+0x4e/0x7f
    [] ip_rcv_finish+0x1fc/0x233
    [] ip_rcv+0x217/0x267
    [] __netif_receive_skb+0x49e/0x553
    [] netif_receive_skb+0x50/0x82

    This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
    a single sock_put() is not enough to free the memory. Additionally, things
    like xfrm, memcg, cookie_values,... may have been initialized.
    We have to free them properly.

    This is fixed by forcing a call to tcp_done(), ending up in
    inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
    because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
    xfrm,...

    Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
    force it entering inet_csk_destroy_sock. To avoid the warning in
    inet_csk_destroy_sock, inet_num has to be set to 0.
    As inet_csk_destroy_sock does a dec on orphan_count, we first have to
    increase it.

    Calling tcp_done() allows us to remove the calls to
    tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

    A similar approach is taken for dccp by calling dccp_done().

    This is in the kernel since 093d282321 (tproxy: fix hash locking issue
    when using port redirection in __inet_inherit_port()), thus since
    version >= 2.6.37.

    Signed-off-by: Christoph Paasch
    Signed-off-by: David S. Miller

    Christoph Paasch
     

04 Nov, 2012

1 commit

  • For passive TCP connections using TCP_DEFER_ACCEPT facility,
    we incorrectly increment req->retrans each time timeout triggers
    while no SYNACK is sent.

    SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
    which we received the ACK from client). Only the last SYNACK is sent
    so that we can receive again an ACK from client, to move the req into
    accept queue. We plan to change this later to avoid the useless
    retransmit (and potential problem as this SYNACK could be lost)

    TCP_INFO later gives wrong information to user, claiming imaginary
    retransmits.

    Decouple req->retrans field into two independent fields :

    num_retrans : number of retransmit
    num_timeout : number of timeouts

    num_timeout is the counter that is incremented at each timeout,
    regardless of actual SYNACK being sent or not, and used to
    compute the exponential timeout.

    Introduce inet_rtx_syn_ack() helper to increment num_retrans
    only if ->rtx_syn_ack() succeeded.

    Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
    when we re-send a SYNACK in answer to a (retransmitted) SYN.
    Prior to this patch, we were not counting these retransmits.

    Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
    only if a synack packet was successfully queued.

    Reported-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Cc: Vijay Subramanian
    Cc: Elliott Hughes
    Cc: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Aug, 2012

2 commits

  • The CCID3 code fails to initialize the trailing padding bytes of struct
    tfrc_tx_info added for alignment on 64 bit architectures. It that for
    potentially leaks four bytes kernel stack via the getsockopt() syscall.
    Add an explicit memset(0) before filling the structure to avoid the
    info leak.

    Signed-off-by: Mathias Krause
    Cc: Gerrit Renker
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
    a NULL ccid pointer leading to a NULL pointer dereference. This could
    lead to a privilege escalation if the attacker is able to map page 0 and
    prepare it with a fake ccid_ops pointer.

    Signed-off-by: Mathias Krause
    Cc: Gerrit Renker
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Mathias Krause
     

24 Jul, 2012

1 commit

  • Use inet_iif() consistently, and for TCP record the input interface of
    cached RX dst in inet sock.

    rt->rt_iif is going to be encoded differently, so that we can
    legitimately cache input routes in the FIB info more aggressively.

    When the input interface is "use SKB device index" the rt->rt_iif will
    be set to zero.

    This forces us to move the TCP RX dst cache installation into the ipv4
    specific code, and as well it should since doing the route caching for
    ipv6 is pointless at the moment since it is not inspected in the ipv6
    input paths yet.

    Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
    obsolete set to a non-zero value to force invocation of the check
    callback.

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Jul, 2012

1 commit


17 Jul, 2012

1 commit

  • This will be used so that we can compose a full flow key.

    Even though we have a route in this context, we need more. In the
    future the routes will be without destination address, source address,
    etc. keying. One ipv4 route will cover entire subnets, etc.

    In this environment we have to have a way to possess persistent storage
    for redirects and PMTU information. This persistent storage will exist
    in the FIB tables, and that's why we'll need to be able to rebuild a
    full lookup flow key here. Using that flow key will do a fib_lookup()
    and create/update the persistent entry.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jul, 2012

2 commits

  • This is the ipv6 version of inet_csk_update_pmtu().

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This abstracts away the call to dst_ops->update_pmtu() so that we can
    transparently handle the fact that, in the future, the dst itself can
    be invalidated by the PMTU update (when we have non-host routes cached
    in sockets).

    So we try to rebuild the socket cached route after the method
    invocation if necessary.

    This isn't used by SCTP because it needs to cache dsts per-transport,
    and thus will need it's own local version of this helper.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Jul, 2012

3 commits


11 Jul, 2012

1 commit


05 Jul, 2012

1 commit


23 Jun, 2012

1 commit


16 Jun, 2012

1 commit

  • One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
    to handle the error pass the 32-bit info cookie in network byte order
    whereas ipv4 passes it around in host byte order.

    Like the ipv4 side, we have two helper functions. One for when we
    have a socket context and one for when we do not.

    ip6ip6 tunnels are not handled here, because they handle PMTU events
    by essentially relaying another ICMP packet-too-big message back to
    the original sender.

    This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all
    kinds of situations that simply cannot happen when we do the PMTU
    update directly using a fully resolved route.

    In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
    likely be removed or changed into a BUG_ON() check. We should never
    have a prefixed ipv6 route when we get there.

    Another piece of strange history here is that TCP and DCCP, unlike in
    ipv4, never invoke the update_pmtu() method from their ICMP error
    handlers. This is incredibly astonishing since this is the context
    where we have the most accurate context in which to make a PMTU
    update, namely we have a fully connected socket and associated cached
    socket route.

    Signed-off-by: David S. Miller

    David S. Miller
     

17 May, 2012

1 commit


21 Apr, 2012

2 commits

  • This results in code with less boiler plate that is a bit easier
    to read.

    Additionally stops us from using compatibility code in the sysctl
    core, hastening the day when the compatibility code can be removed.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This makes it clearer which sysctls are relative to your current network
    namespace.

    This makes it a little less error prone by not exposing sysctls for the
    initial network namespace in other namespaces.

    This is the same way we handle all of our other network interfaces to
    userspace and I can't honestly remember why we didn't do this for
    sysctls right from the start.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

20 Apr, 2012

1 commit


16 Apr, 2012

1 commit


15 Apr, 2012

1 commit

  • There are two struct request_sock_ops providers, tcp and dccp.

    inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout being
    NULL if we make it non NULL like syn_ack_timeout

    Signed-off-by: Eric Dumazet
    Cc: Gerrit Renker
    Cc: dccp@vger.kernel.org
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Mar, 2012

2 commits

  • This fixes a bug in the sequence number validation during the initial handshake.

    The code did not treat the initial sequence numbers ISS and ISR as read-only and
    did not keep state for GSR and GSS as required by the specification. This causes
    problems with retransmissions during the initial handshake, causing the
    budding connection to be reset.

    This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required.

    Signed-off-by: Samuel Jero
    Signed-off-by: Gerrit Renker

    Samuel Jero
     
  • This replaces an unjustified BUG_ON(), which could get triggered under normal
    conditions: X_calc can be 0 when p > 0. X would in this case be set to the
    minimum, s/t_mbi. Its replacement avoids t_ipi = 0 (unbounded sending rate).

    Thanks to Jordi, Victor and Xavier who reported this.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald

    Gerrit Renker
     

12 Jan, 2012

1 commit


20 Dec, 2011

2 commits

  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it'll break next kernel version.

    (Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
     
  • DaveM said:
    Please, this kind of stuff rots forever and not using bool properly
    drives me crazy.

    Joe Perches gave me the spatch script:

    @@
    bool b;
    @@
    -b = 0
    +b = false
    @@
    bool b;
    @@
    -b = 1
    +b = true

    I merely installed coccinelle, read the documentation and took credit.

    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
     

17 Dec, 2011

1 commit

  • I've made a mistake when fixing the sock_/inet_diag aliases :(

    1. The sock_diag layer should request the family-based alias,
    not just the IPPROTO_IP one;
    2. The inet_diag layer should request for AF_INET+protocol alias,
    not just the protocol one.

    Thus fix this.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

12 Dec, 2011

1 commit


10 Dec, 2011

2 commits


07 Dec, 2011

2 commits

  • Sorry, but the vger didn't let this message go to the list. Re-sending it with
    less spam-filter-prone subject.

    When dumping the AF_INET/AF_INET6 sockets user will also specify the protocol,
    so prepare the protocol diag handlers to work with IPPROTO_ constants.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The ultimate goal is to get the sock_diag module, that works in
    family+protocol terms. Currently this is suitable to do on the
    inet_diag basis, so rename parts of the code. It will be moved
    to sock_diag.c later.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov