18 Dec, 2019

1 commit

  • [ Upstream commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e ]

    This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
    as some modules currently pass a net argument without a socket to
    ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change
    ipv6_stub_impl.ipv6_dst_lookup to take net argument").

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     

05 Nov, 2019

1 commit


02 Nov, 2019

1 commit

  • Historically linux tried to stick to RFC 791, 1122, 2003
    for IPv4 ID field generation.

    RFC 6864 made clear that no matter how hard we try,
    we can not ensure unicity of IP ID within maximum
    lifetime for all datagrams with a given source
    address/destination address/protocol tuple.

    Linux uses a per socket inet generator (inet_id), initialized
    at connection startup with a XOR of 'jiffies' and other
    fields that appear clear on the wire.

    Thiemo Nagel pointed that this strategy is a privacy
    concern as this provides 16 bits of entropy to fingerprint
    devices.

    Let's switch to a random starting point, this is just as
    good as far as RFC 6864 is concerned and does not leak
    anything critical.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: Thiemo Nagel
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Oct, 2019

1 commit

  • commit 174e23810cd31
    ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
    recycle always drop skb extensions. The additional skb_ext_del() that is
    performed via nf_reset on napi skb recycle is not needed anymore.

    Most nf_reset() calls in the stack are there so queued skb won't block
    'rmmod nf_conntrack' indefinitely.

    This removes the skb_ext_del from nf_reset, and renames it to a more
    fitting nf_reset_ct().

    In a few selected places, add a call to skb_ext_reset to make sure that
    no active extensions remain.

    I am submitting this for "net", because we're still early in the release
    cycle. The patch applies to net-next too, but I think the rename causes
    needless divergence between those trees.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

27 Sep, 2019

1 commit

  • Currently, ip6_xmit() sets skb->priority based on sk->sk_priority

    This is not desirable for TCP since TCP shares the same ctl socket
    for a given netns. We want to be able to send RST or ACK packets
    with a non zero skb->priority.

    This patch has no functional change.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Jul, 2019

1 commit

  • In the sysctl code the proc_dointvec_minmax() function is often used to
    validate the user supplied value between an allowed range. This
    function uses the extra1 and extra2 members from struct ctl_table as
    minimum and maximum allowed value.

    On sysctl handler declaration, in every source file there are some
    readonly variables containing just an integer which address is assigned
    to the extra1 and extra2 members, so the sysctl range is enforced.

    The special values 0, 1 and INT_MAX are very often used as range
    boundary, leading duplication of variables like zero=0, one=1,
    int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

    Add a const int array containing the most commonly used values, some
    macros to refer more easily to the correct array member, and use them
    instead of creating a local one for every object file.

    This is the bloat-o-meter output comparing the old and new binary
    compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data old new delta
    sysctl_vals - 12 +12
    __kstrtab_sysctl_vals - 12 +12
    max 14 10 -4
    int_max 16 - -16
    one 68 - -68
    zero 128 28 -100
    Total: Before=20583249, After=20583085, chg -0.00%

    [mcroce@redhat.com: tipc: remove two unused variables]
    Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
    [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
    [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
    Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
    [akpm@linux-foundation.org: fix fs/eventpoll.c]
    Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
    Signed-off-by: Matteo Croce
    Signed-off-by: Arnd Bergmann
    Acked-by: Kees Cook
    Reviewed-by: Aaron Tomlin
    Cc: Matthew Wilcox
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matteo Croce
     

09 Jul, 2019

1 commit

  • Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
    If not set, by default an autogenerated flowlabel is selected.

    Explicit flowlabels require a control operation per label plus a
    datapath check on every connection (every datagram if unconnected).
    This is particularly expensive on unconnected sockets multiplexing
    many flows, such as QUIC.

    In the common case, where no lease is exclusive, the check can be
    safely elided, as both lease request and check trivially succeed.
    Indeed, autoflowlabel does the same even with exclusive leases.

    Elide the check if no process has requested an exclusive lease.

    fl6_sock_lookup previously returns either a reference to a lease or
    NULL to denote failure. Modify to return a real error and update
    all callers. On return NULL, they can use the label and will elide
    the atomic_dec in fl6_sock_release.

    This is an optimization. Robust applications still have to revert to
    requesting leases if the fast path fails due to an exclusive lease.

    Changes RFC->v1:
    - use static_key_false_deferred to rate limit jump label operations
    - call static_key_deferred_flush to stop timers on exit
    - move decrement out of RCU context
    - defer optimization also if opt data is associated with a lease
    - updated all fp6_sock_lookup callers, not just udp

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

05 Jun, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation version 2 of the license

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 315 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Armijn Hemel
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license v2 as published
    by the free software foundation

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 2 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Reviewed-by: Armijn Hemel
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531081037.837563564@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

31 May, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify it
    under the terms of the gnu general public license as published by the
    free software foundation either version 2 of the license or at your
    option any later version this program is distributed in the hope that it
    will be useful but without any warranty without even the implied warranty
    of merchantability or fitness for a particular purpose see the gnu
    general public license for more details you should have received a copy
    of the gnu general public license along with this program if not write to
    the free software foundation inc 675 mass ave cambridge ma 02139 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 4 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Richard Fontana
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190524100843.499675784@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details you
    should have received a copy of the gnu general public license along
    with this program if not write to the free software foundation inc
    675 mass ave cambridge ma 02139 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 441 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Michael Ellerman (powerpc)
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520071858.739733335@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


13 May, 2019

1 commit


20 Apr, 2019

1 commit

  • The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
    socket protocol handlers, and all of those end up calling the same
    sock_get_timestamp()/sock_get_timestampns() helper functions, which
    results in a lot of duplicate code.

    With the introduction of 64-bit time_t on 32-bit architectures, this
    gets worse, as we then need four different ioctl commands in each
    socket protocol implementation.

    To simplify that, let's add a new .gettstamp() operation in
    struct proto_ops, and move ioctl implementation into the common
    sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
    through.

    We can reuse the sock_get_timestamp() implementation, but generalize
    it so it can deal with both native and compat mode, as well as
    timeval and timespec structures.

    Acked-by: Stefan Schmidt
    Acked-by: Neil Horman
    Acked-by: Marc Kleine-Budde
    Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/
    Signed-off-by: Arnd Bergmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

02 Apr, 2019

1 commit

  • If dccp_feat_push_change fails, we forget free the mem
    which is alloced by kmemdup in dccp_feat_clone_sp_val.

    Reported-by: Hulk Robot
    Fixes: e8ef967a54f4 ("dccp: Registration routines for changing feature values")
    Reviewed-by: Mukesh Ojha
    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

20 Mar, 2019

1 commit


09 Feb, 2019

1 commit


02 Feb, 2019

1 commit

  • Similarly to commit 276bdb82dedb ("dccp: check ccid before dereferencing")
    it is wise to test for a NULL ccid.

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ #37
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
    RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
    Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
    kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env
    RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
    RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
    RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
    R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
    R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0
    kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5'
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654
    dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688
    sk_backlog_rcv include/net/sock.h:936 [inline]
    __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473
    dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880
    ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
    ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
    NF_HOOK include/linux/netfilter.h:289 [inline]
    NF_HOOK include/linux/netfilter.h:283 [inline]
    ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
    dst_input include/net/dst.h:450 [inline]
    ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
    NF_HOOK include/linux/netfilter.h:289 [inline]
    NF_HOOK include/linux/netfilter.h:283 [inline]
    ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524
    __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
    __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
    process_backlog+0x206/0x750 net/core/dev.c:5923
    napi_poll net/core/dev.c:6346 [inline]
    net_rx_action+0x76d/0x1930 net/core/dev.c:6412
    __do_softirq+0x30b/0xb11 kernel/softirq.c:292
    run_ksoftirqd kernel/softirq.c:654 [inline]
    run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
    smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
    kthread+0x357/0x430 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
    Modules linked in:
    ---[ end trace 58a0ba03bea2c376 ]---
    RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
    RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
    Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
    RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
    RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
    RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
    R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
    R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Gerrit Renker
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Jan, 2019

1 commit

  • Instead of using pingpong as a single bit information, we refactor the
    code to treat it as a counter. When interactive session is detected,
    we set pingpong count to TCP_PINGPONG_THRESH. And when pingpong count
    is >= TCP_PINGPONG_THRESH, we consider the session in pingpong mode.

    This patch is a pure refactor and sets foundation for the next patch.
    This patch itself does not change any pingpong logic.

    Signed-off-by: Wei Wang
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Wang
     

29 Dec, 2018

2 commits

  • totalram_pages and totalhigh_pages are made static inline function.

    Main motivation was that managed_page_count_lock handling was complicating
    things. It was discussed in length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
    better to remove the lock and convert variables to atomic, with preventing
    poteintial store-to-read tearing as a bonus.

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     
  • Patch series "mm: convert totalram_pages, totalhigh_pages and managed
    pages to atomic", v5.

    This series converts totalram_pages, totalhigh_pages and
    zone->managed_pages to atomic variables.

    totalram_pages, zone->managed_pages and totalhigh_pages updates are
    protected by managed_page_count_lock, but readers never care about it.
    Convert these variables to atomic to avoid readers potentially seeing a
    store tear.

    Main motivation was that managed_page_count_lock handling was complicating
    things. It was discussed in length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemes better
    to remove the lock and convert variables to atomic. With the change,
    preventing poteintial store-to-read tearing comes as a bonus.

    This patch (of 4):

    This is in preparation to a later patch which converts totalram_pages and
    zone->managed_pages to atomic variables. Please note that re-reading the
    value might lead to a different value and as such it could lead to
    unexpected behavior. There are no known bugs as a result of the current
    code but it is better to prevent from them in principle.

    Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Reviewed-by: Pavel Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

25 Dec, 2018

1 commit

  • Patch eedbbb0d98b2 "net: dccp: initialize (addr,port) ..."
    added calling to inet_hashinfo2_init() from dccp_init().

    However, inet_hashinfo2_init() is marked as __init(), and
    thus the kernel panics when dccp is loaded as module. Removing
    __init() tag from inet_hashinfo2_init() is not feasible because
    it calls into __init functions in mm.

    This patch adds inet_hashinfo2_init_mod() function that can
    be called after the init phase is done; changes dccp_init() to
    call the new function; un-marks inet_hashinfo2_init() as
    exported.

    Fixes: eedbbb0d98b2 ("net: dccp: initialize (addr,port) ...")
    Reported-by: kernel test robot
    Signed-off-by: Peter Oskolkov
    Signed-off-by: David S. Miller

    Peter Oskolkov
     

18 Dec, 2018

1 commit

  • Commit d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
    removes port-only listener lookups. This caused segfaults in DCCP
    lookups because DCCP did not initialize the (addr,port) hashtable.

    This patch adds said initialization.

    The only non-trivial issue here is the size of the new hashtable.
    It seemed reasonable to make it match the size of the port-only
    hashtable (= INET_LHTABLE_SIZE) that was used previously. Other
    parameters to inet_hashinfo2_init() match those used in TCP.

    V2 changes: marked inet_hashinfo2_init as an exported symbol
    so that DCCP compiles when configured as a module.

    Tested: syzcaller issues fixed; the second patch in the patchset
    tests that DCCP lookups work correctly.

    Fixes: d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
    Reported-by: syzcaller
    Signed-off-by: Peter Oskolkov
    Signed-off-by: David S. Miller

    Peter Oskolkov
     

17 Dec, 2018

2 commits

  • This reverts commit ec49d83f245453515a9b6e88324e27bbcb69fbae.

    Cause build failures when DCCP is modular.

    ERROR: "inet_hashinfo2_init" [net/dccp/dccp.ko] undefined!

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
    removes port-only listener lookups. This caused segfaults in DCCP
    lookups because DCCP did not initialize the (addr,port) hashtable.

    This patch adds said initialization.

    The only non-trivial issue here is the size of the new hashtable.
    It seemed reasonable to make it match the size of the port-only
    hashtable (= INET_LHTABLE_SIZE) that was used previously. Other
    parameters to inet_hashinfo2_init() match those used in TCP.

    Tested: syzcaller issues fixed; the second patch in the patchset
    tests that DCCP lookups work correctly.

    Fixes: d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
    Reported-by: syzcaller
    Signed-off-by: Peter Oskolkov
    Signed-off-by: David S. Miller

    Peter Oskolkov
     

09 Nov, 2018

1 commit

  • We'll need this to handle ICMP errors for tunnels without a sending socket
    (i.e. FoU and GUE). There, we might have to look up different types of IP
    tunnels, registered as network protocols, before we get a match, so we
    want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
    inet_protos and inet6_protos. These error codes will be used in the next
    patch.

    For consistency, return sensible error codes in protocol error handlers
    whenever handlers can't handle errors because, even if valid, they don't
    match a protocol or any of its states.

    This has no effect on existing error handling paths.

    Signed-off-by: Stefano Brivio
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Stefano Brivio
     

08 Nov, 2018

1 commit


24 Oct, 2018

1 commit

  • This reverts commit dd979b4df817e9976f18fb6f9d134d6bc4a3c317.

    This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
    internal TCP socket for the initial handshake with the remote peer.
    Whenever the SMC connection can not be established this TCP socket is
    used as a fallback. All socket operations on the SMC socket are then
    forwarded to the TCP socket. In case of poll, the file->private_data
    pointer references the SMC socket because the TCP socket has no file
    assigned. This causes tcp_poll to wait on the wrong socket.

    Signed-off-by: Karsten Graul
    Signed-off-by: David S. Miller

    Karsten Graul
     

03 Oct, 2018

1 commit

  • Timer handlers do not imply rcu_read_lock(), so my recent fix
    triggered a LOCKDEP warning when SYNACK is retransmit.

    Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt
    usages instead of guessing what is done by callers, since it is
    not worth the pain.

    Get rid of ireq_opt_deref() helper since it hides the logic
    without real benefit, since it is now a standard rcu_dereference().

    Fixes: 1ad98e9d1bdf ("tcp/dccp: fix lockdep issue when SYN is backlogged")
    Signed-off-by: Eric Dumazet
    Reported-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Oct, 2018

1 commit

  • In normal SYN processing, packets are handled without listener
    lock and in RCU protected ingress path.

    But syzkaller is known to be able to trick us and SYN
    packets might be processed in process context, after being
    queued into socket backlog.

    In commit 06f877d613be ("tcp/dccp: fix other lockdep splats
    accessing ireq_opt") I made a very stupid fix, that happened
    to work mostly because of the regular path being RCU protected.

    Really the thing protecting ireq->ireq_opt is RCU read lock,
    and the pseudo request refcnt is not relevant.

    This patch extends what I did in commit 449809a66c1d ("tcp/dccp:
    block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
    pair in the paths that might be taken when processing SYN from
    socket backlog (thus possibly in process context)

    Fixes: 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Aug, 2018

1 commit


08 Aug, 2018

1 commit

  • The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
    can lead to undefined behavior [1].

    In order to fix this use a gradual shift of the window with a 'while'
    loop, similar to what tcp_cwnd_restart() is doing.

    When comparing delta and RTO there is a minor difference between TCP
    and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
    'cwnd' if delta equals RTO. That case is preserved in this change.

    [1]:
    [40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
    [40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
    [40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1
    ...
    [40851.377176] Call Trace:
    [40851.408503] dump_stack+0xf1/0x17b
    [40851.451331] ? show_regs_print_info+0x5/0x5
    [40851.503555] ubsan_epilogue+0x9/0x7c
    [40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
    [40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f
    [40851.686796] ? xfrm4_output_finish+0x80/0x80
    [40851.739827] ? lock_downgrade+0x6d0/0x6d0
    [40851.789744] ? xfrm4_prepare_output+0x160/0x160
    [40851.845912] ? ip_queue_xmit+0x810/0x1db0
    [40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
    [40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
    [40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp]
    [40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp]
    [40852.142412] dccp_sendmsg+0x428/0xb20 [dccp]
    [40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp]
    [40852.254833] ? sched_clock+0x5/0x10
    [40852.298508] ? sched_clock+0x5/0x10
    [40852.342194] ? inet_create+0xdf0/0xdf0
    [40852.388988] sock_sendmsg+0xd9/0x160
    ...

    Fixes: 113ced1f52e5 ("dccp ccid-2: Perform congestion-window validation")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
     

31 Jul, 2018

1 commit


29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Jun, 2018

2 commits


13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a * b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

07 Jun, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Add Maglev hashing scheduler to IPVS, from Inju Song.

    2) Lots of new TC subsystem tests from Roman Mashak.

    3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

    4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

    5) Add ttl inherit support to vxlan, from Hangbin Liu.

    6) Properly separate ipv6 routes into their logically independant
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

    7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

    8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

    9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

    10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

    11) Plumb extack down into fib_rules, from Roopa Prabhu.

    12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

    13) Add UDP GSO support, from Willem de Bruijn.

    14) Add documentation for eBPF helpers, from Quentin Monnet.

    15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

    16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

    17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

    18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

    19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

    20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

    21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

    22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

    23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

    24) Add TCP SACK compression, from Eric Dumazet.

    25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

    26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

    27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

    28) Add lots of forwarding selftests, from Petr Machata.

    29) Add generic network device failover driver, from Sridhar Samudrala.

    * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
    strparser: Add __strp_unpause and use it in ktls.
    rxrpc: Fix terminal retransmission connection ID to include the channel
    net: hns3: Optimize PF CMDQ interrupt switching process
    net: hns3: Fix for VF mailbox receiving unknown message
    net: hns3: Fix for VF mailbox cannot receiving PF response
    bnx2x: use the right constant
    Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
    net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
    enic: fix UDP rss bits
    netdev-FAQ: clarify DaveM's position for stable backports
    rtnetlink: validate attributes in do_setlink()
    mlxsw: Add extack messages for port_{un, }split failures
    netdevsim: Add extack error message for devlink reload
    devlink: Add extack to reload and port_{un, }split operations
    net: metrics: add proper netlink validation
    ipmr: fix error path when ipmr_new_table fails
    ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
    net: hns3: remove unused hclgevf_cfg_func_mta_filter
    netfilter: provide udp*_lib_lookup for nf_tproxy
    qed*: Utilize FW 8.37.2.0
    ...

    Linus Torvalds