01 Dec, 2018

1 commit

  • commit 604d415e2bd642b7e02c80e719e0396b9d4a77a6 upstream.

    syzkaller triggered a use-after-free [1], caused by a combination of
    skb_get() in llc_conn_state_process() and usage of sk_eat_skb()

    sk_eat_skb() is assuming the skb about to be freed is only used by
    the current thread. TCP/DCCP stacks enforce this because current
    thread holds the socket lock.

    llc_conn_state_process() wants to make sure skb does not disappear,
    and holds a reference on the skb it manipulates. But as soon as this
    skb is added to socket receive queue, another thread can consume it.

    This means that llc must use regular skb_unlink() and kfree_skb()
    so that both producer and consumer can safely work on the same skb.

    [1]
    BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
    BUG: KASAN: use-after-free in refcount_read include/linux/refcount.h:43 [inline]
    BUG: KASAN: use-after-free in skb_unref include/linux/skbuff.h:967 [inline]
    BUG: KASAN: use-after-free in kfree_skb+0xb7/0x580 net/core/skbuff.c:655
    Read of size 4 at addr ffff8801d1f6fba4 by task ksoftirqd/1/18

    CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.19.0-rc8+ #295
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c4/0x2b6 lib/dump_stack.c:113
    print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
    check_memory_region_inline mm/kasan/kasan.c:260 [inline]
    check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
    kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
    atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
    refcount_read include/linux/refcount.h:43 [inline]
    skb_unref include/linux/skbuff.h:967 [inline]
    kfree_skb+0xb7/0x580 net/core/skbuff.c:655
    llc_sap_state_process+0x9b/0x550 net/llc/llc_sap.c:224
    llc_sap_rcv+0x156/0x1f0 net/llc/llc_sap.c:297
    llc_sap_handler+0x65e/0xf80 net/llc/llc_sap.c:438
    llc_rcv+0x79e/0xe20 net/llc/llc_input.c:208
    __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
    process_backlog+0x218/0x6f0 net/core/dev.c:5829
    napi_poll net/core/dev.c:6249 [inline]
    net_rx_action+0x7c5/0x1950 net/core/dev.c:6315
    __do_softirq+0x30c/0xb03 kernel/softirq.c:292
    run_ksoftirqd+0x94/0x100 kernel/softirq.c:653
    smpboot_thread_fn+0x68b/0xa00 kernel/smpboot.c:164
    kthread+0x35a/0x420 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

    Allocated by task 18:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
    __alloc_skb+0x119/0x770 net/core/skbuff.c:193
    alloc_skb include/linux/skbuff.h:995 [inline]
    llc_alloc_frame+0xbc/0x370 net/llc/llc_sap.c:54
    llc_station_ac_send_xid_r net/llc/llc_station.c:52 [inline]
    llc_station_rcv+0x1dc/0x1420 net/llc/llc_station.c:111
    llc_rcv+0xc32/0xe20 net/llc/llc_input.c:220
    __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
    process_backlog+0x218/0x6f0 net/core/dev.c:5829
    napi_poll net/core/dev.c:6249 [inline]
    net_rx_action+0x7c5/0x1950 net/core/dev.c:6315
    __do_softirq+0x30c/0xb03 kernel/softirq.c:292

    Freed by task 16383:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x83/0x290 mm/slab.c:3756
    kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
    __kfree_skb+0x1d/0x20 net/core/skbuff.c:642
    sk_eat_skb include/net/sock.h:2366 [inline]
    llc_ui_recvmsg+0xec2/0x1610 net/llc/af_llc.c:882
    sock_recvmsg_nosec net/socket.c:794 [inline]
    sock_recvmsg+0xd0/0x110 net/socket.c:801
    ___sys_recvmsg+0x2b6/0x680 net/socket.c:2278
    __sys_recvmmsg+0x303/0xb90 net/socket.c:2390
    do_sys_recvmmsg+0x181/0x1a0 net/socket.c:2466
    __do_sys_recvmmsg net/socket.c:2484 [inline]
    __se_sys_recvmmsg net/socket.c:2480 [inline]
    __x64_sys_recvmmsg+0xbe/0x150 net/socket.c:2480
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801d1f6fac0
    which belongs to the cache skbuff_head_cache of size 232
    The buggy address is located 228 bytes inside of
    232-byte region [ffff8801d1f6fac0, ffff8801d1f6fba8)
    The buggy address belongs to the page:
    page:ffffea000747dbc0 count:1 mapcount:0 mapping:ffff8801d9be7680 index:0xffff8801d1f6fe80
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0007346e88 ffffea000705b108 ffff8801d9be7680
    raw: ffff8801d1f6fe80 ffff8801d1f6f0c0 000000010000000b 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801d1f6fa80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    ffff8801d1f6fb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8801d1f6fb80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
    ^
    ffff8801d1f6fc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801d1f6fc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

16 Oct, 2018

1 commit

  • WHen an llc sock is added into the sk_laddr_hash of an llc_sap,
    it is not marked with SOCK_RCU_FREE.

    This causes that the sock could be freed while it is still being
    read by __llc_lookup_established() with RCU read lock. sock is
    refcounted, but with RCU read lock, nothing prevents the readers
    getting a zero refcnt.

    Fix it by setting SOCK_RCU_FREE in llc_sap_add_socket().

    Reported-by: syzbot+11e05f04c15e03be5254@syzkaller.appspotmail.com
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

10 Aug, 2018

1 commit


08 Aug, 2018

1 commit

  • llc_sap_put() decreases the refcnt before deleting sap
    from the global list. Therefore, there is a chance
    llc_sap_find() could find a sap with zero refcnt
    in this global list.

    Close this race condition by checking if refcnt is zero
    or not in llc_sap_find(), if it is zero then it is being
    removed so we can just treat it as gone.

    Reported-by:
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

25 Jul, 2018

1 commit


29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 Jun, 2018

1 commit

  • Pull aio updates from Al Viro:
    "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

    The only thing I'm holding back for a day or so is Adam's aio ioprio -
    his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
    but let it sit in -next for decency sake..."

    * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    aio: sanitize the limit checking in io_submit(2)
    aio: fold do_io_submit() into callers
    aio: shift copyin of iocb into io_submit_one()
    aio_read_events_ring(): make a bit more readable
    aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
    aio: take list removal to (some) callers of aio_complete()
    aio: add missing break for the IOCB_CMD_FDSYNC case
    random: convert to ->poll_mask
    timerfd: convert to ->poll_mask
    eventfd: switch to ->poll_mask
    pipe: convert to ->poll_mask
    crypto: af_alg: convert to ->poll_mask
    net/rxrpc: convert to ->poll_mask
    net/iucv: convert to ->poll_mask
    net/phonet: convert to ->poll_mask
    net/nfc: convert to ->poll_mask
    net/caif: convert to ->poll_mask
    net/bluetooth: convert to ->poll_mask
    net/sctp: convert to ->poll_mask
    net/tipc: convert to ->poll_mask
    ...

    Linus Torvalds
     

26 May, 2018

1 commit


16 May, 2018

1 commit


08 May, 2018

1 commit

  • syzbot loves to set very small mtu on devices, since it brings joy.
    We must make llc_ui_sendmsg() fool proof.

    usercopy: Kernel memory overwrite attempt detected to wrapped address (offset 0, size 18446612139802320068)!

    kernel BUG at mm/usercopy.c:100!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 17464 Comm: syz-executor1 Not tainted 4.17.0-rc3+ #36
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:usercopy_abort+0xbb/0xbd mm/usercopy.c:88
    RSP: 0018:ffff8801868bf800 EFLAGS: 00010282
    RAX: 000000000000006c RBX: ffffffff87d2fb00 RCX: 0000000000000000
    RDX: 000000000000006c RSI: ffffffff81610731 RDI: ffffed0030d17ef6
    RBP: ffff8801868bf858 R08: ffff88018daa4200 R09: ffffed003b5c4fb0
    R10: ffffed003b5c4fb0 R11: ffff8801dae27d87 R12: ffffffff87d2f8e0
    R13: ffffffff87d2f7a0 R14: ffffffff87d2f7a0 R15: ffffffff87d2f7a0
    FS: 00007f56a14ac700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b2bc21000 CR3: 00000001abeb1000 CR4: 00000000001426f0
    DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000030602
    Call Trace:
    check_bogus_address mm/usercopy.c:153 [inline]
    __check_object_size+0x5d9/0x5d9 mm/usercopy.c:256
    check_object_size include/linux/thread_info.h:108 [inline]
    check_copy_size include/linux/thread_info.h:139 [inline]
    copy_from_iter_full include/linux/uio.h:121 [inline]
    memcpy_from_msg include/linux/skbuff.h:3305 [inline]
    llc_ui_sendmsg+0x4b1/0x1530 net/llc/af_llc.c:941
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:639
    __sys_sendto+0x3d7/0x670 net/socket.c:1789
    __do_sys_sendto net/socket.c:1801 [inline]
    __se_sys_sendto net/socket.c:1797 [inline]
    __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x455979
    RSP: 002b:00007f56a14abc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00007f56a14ac6d4 RCX: 0000000000455979
    RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000018
    RBP: 000000000072bea0 R08: 00000000200012c0 R09: 0000000000000010
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 0000000000000548 R14: 00000000006fbf60 R15: 0000000000000000
    Code: 55 c0 e8 c0 55 bb ff ff 75 c8 48 8b 55 c0 4d 89 f9 ff 75 d0 4d 89 e8 48 89 d9 4c 89 e6 41 56 48 c7 c7 80 fa d2 87 e8 a0 0b a3 ff 0b e8 95 55 bb ff e8 c0 a8 f7 ff 8b 95 14 ff ff ff 4d 89 e8
    RIP: usercopy_abort+0xbb/0xbd mm/usercopy.c:88 RSP: ffff8801868bf800

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Apr, 2018

2 commits

  • For SOCK_ZAPPED socket, we don't need to care about llc->sap,
    so we should just skip these refcount functions in this case.

    Fixes: f7e43672683b ("llc: hold llc_sap before release_sock()")
    Reported-by: kernel test robot
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • The connection timers of an llc sock could be still flying
    after we delete them in llc_sk_free(), and even possibly
    after we free the sock. We could just wait synchronously
    here in case of troubles.

    Note, I leave other call paths as they are, since they may
    not have to wait, at least we can change them to synchronously
    when needed.

    Also, move the code to net/llc/llc_conn.c, which is apparently
    a better place.

    Reported-by:
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

20 Apr, 2018

1 commit

  • syzbot reported we still access llc->sap in llc_backlog_rcv()
    after it is freed in llc_sap_remove_socket():

    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
    llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
    llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
    llc_conn_service net/llc/llc_conn.c:400 [inline]
    llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75
    llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891
    sk_backlog_rcv include/net/sock.h:909 [inline]
    __release_sock+0x12f/0x3a0 net/core/sock.c:2335
    release_sock+0xa4/0x2b0 net/core/sock.c:2850
    llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204

    llc->sap is refcount'ed and llc_sap_remove_socket() is paired
    with llc_sap_add_socket(). This can be amended by holding its refcount
    before llc_sap_remove_socket() and releasing it after release_sock().

    Reported-by:
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

02 Apr, 2018

1 commit


27 Mar, 2018

2 commits

  • llc_conn_send_pdu() pushes the skb into write queue and
    calls llc_conn_send_pdus() to flush them out. However, the
    status of dev_queue_xmit() is not returned to caller,
    in this case, llc_conn_state_process().

    llc_conn_state_process() needs hold the skb no matter
    success or failure, because it still uses it after that,
    therefore we should hold skb before dev_queue_xmit() when
    that skb is the one being processed by llc_conn_state_process().

    For other callers, they can just pass NULL and ignore
    the return value as they are.

    Reported-by: Noam Rathaus
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Prefer the direct use of octal for permissions.

    Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
    and some typing.

    Miscellanea:

    o Whitespace neatening around these conversions.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

12 Mar, 2018

1 commit

  • Avoid a VLA[1] by using a real constant expression instead of a variable.
    The compiler should be able to optimize the original code and avoid using
    an actual VLA. Anyway this change is useful because it will avoid a false
    positive with -Wvla, it might also help the compiler generating better
    code.

    [1] https://lkml.org/lkml/2018/3/7/621

    Signed-off-by: Salvatore Mesoraca
    Signed-off-by: David S. Miller

    Salvatore Mesoraca
     

13 Feb, 2018

1 commit

  • Changes since v1:
    Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

    Before:
    All these functions either return a negative error indicator,
    or store length of sockaddr into "int *socklen" parameter
    and return zero on success.

    "int *socklen" parameter is awkward. For example, if caller does not
    care, it still needs to provide on-stack storage for the value
    it does not need.

    None of the many FOO_getname() functions of various protocols
    ever used old value of *socklen. They always just overwrite it.

    This change drops this parameter, and makes all these functions, on success,
    return length of sockaddr. It's always >= 0 and can be differentiated
    from an error.

    Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

    rpc_sockname() lost "int buflen" parameter, since its only use was
    to be passed to kernel_getsockname() as &buflen and subsequently
    not used in any way.

    Userspace API is not changed.

    text data bss dec hex filename
    30108430 2633624 873672 33615726 200ef6e vmlinux.before.o
    30108109 2633612 873672 33615393 200ee21 vmlinux.o

    Signed-off-by: Denys Vlasenko
    CC: David S. Miller
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: linux-bluetooth@vger.kernel.org
    CC: linux-decnet-user@lists.sourceforge.net
    CC: linux-wireless@vger.kernel.org
    CC: linux-rdma@vger.kernel.org
    CC: linux-sctp@vger.kernel.org
    CC: linux-nfs@vger.kernel.org
    CC: linux-x25@vger.kernel.org
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

17 Jan, 2018

1 commit

  • /proc has been ignoring struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    - if (de->proc_fops)
    - inode->i_fop = de->proc_fops;
    + if (de->proc_fops) {
    + if (S_ISREG(inode->i_mode))
    + inode->i_fop = &proc_reg_file_ops;
    + else
    + inode->i_fop = de->proc_fops;
    + }

    VFS stopped pinning module at this point.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

16 Nov, 2017

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Maintain the TCP retransmit queue using an rbtree, with 1GB
    windows at 100Gb this really has become necessary. From Eric
    Dumazet.

    2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.

    3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
    Lunn.

    4) Add meter action support to openvswitch, from Andy Zhou.

    5) Add a data meta pointer for BPF accessible packets, from Daniel
    Borkmann.

    6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.

    7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.

    8) More work to move the RTNL mutex down, from Florian Westphal.

    9) Add 'bpftool' utility, to help with bpf program introspection.
    From Jakub Kicinski.

    10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
    Dangaard Brouer.

    11) Support 'blocks' of transformations in the packet scheduler which
    can span multiple network devices, from Jiri Pirko.

    12) TC flower offload support in cxgb4, from Kumar Sanghvi.

    13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
    Leitner.

    14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.

    15) Add RED qdisc offloadability, and use it in mlxsw driver. From
    Nogah Frankel.

    16) eBPF based device controller for cgroup v2, from Roman Gushchin.

    17) Add some fundamental tracepoints for TCP, from Song Liu.

    18) Remove garbage collection from ipv6 route layer, this is a
    significant accomplishment. From Wei Wang.

    19) Add multicast route offload support to mlxsw, from Yotam Gigi"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
    tcp: highest_sack fix
    geneve: fix fill_info when link down
    bpf: fix lockdep splat
    net: cdc_ncm: GetNtbFormat endian fix
    openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
    netem: remove unnecessary 64 bit modulus
    netem: use 64 bit divide by rate
    tcp: Namespace-ify sysctl_tcp_default_congestion_control
    net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
    ipv6: set all.accept_dad to 0 by default
    uapi: fix linux/tls.h userspace compilation error
    usbnet: ipheth: prevent TX queue timeouts when device not ready
    vhost_net: conditionally enable tx polling
    uapi: fix linux/rxrpc.h userspace compilation errors
    net: stmmac: fix LPI transitioning for dwmac4
    atm: horizon: Fix irq release error
    net-sysfs: trigger netlink notification on ifalias change via sysfs
    openvswitch: Using kfree_rcu() to simplify the code
    openvswitch: Make local function ovs_nsh_key_attr_size() static
    openvswitch: Fix return value check in ovs_meter_cmd_features()
    ...

    Linus Torvalds
     

07 Nov, 2017

1 commit


04 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

25 Oct, 2017

2 commits

  • …READ_ONCE()/WRITE_ONCE()

    Please do not apply this to mainline directly, instead please re-run the
    coccinelle script shown below and apply its output.

    For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
    preference to ACCESS_ONCE(), and new code is expected to use one of the
    former. So far, there's been no reason to change most existing uses of
    ACCESS_ONCE(), as these aren't harmful, and changing them results in
    churn.

    However, for some features, the read/write distinction is critical to
    correct operation. To distinguish these cases, separate read/write
    accessors must be used. This patch migrates (most) remaining
    ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
    coccinelle script:

    ----
    // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
    // WRITE_ONCE()

    // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

    virtual patch

    @ depends on patch @
    expression E1, E2;
    @@

    - ACCESS_ONCE(E1) = E2
    + WRITE_ONCE(E1, E2)

    @ depends on patch @
    expression E;
    @@

    - ACCESS_ONCE(E)
    + READ_ONCE(E)
    ----

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: davem@davemloft.net
    Cc: linux-arch@vger.kernel.org
    Cc: mpe@ellerman.id.au
    Cc: shuah@kernel.org
    Cc: snitzer@redhat.com
    Cc: thor.thayer@linux.intel.com
    Cc: tj@kernel.org
    Cc: viro@zeniv.linux.org.uk
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Mark Rutland
     
  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Cc: "David S. Miller"
    Cc: Eric Dumazet
    Cc: Hans Liljestrand
    Cc: "Paul E. McKenney"
    Cc: "Reshetova, Elena"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     

05 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

01 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    This patch uses refcount_inc_not_zero() instead of
    atomic_inc_not_zero_hint() due to absense of a _hint()
    version of refcount API. If the hint() version must
    be used, we might need to revisit API.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

27 May, 2017

1 commit

  • There is a race condition in llc_ui_bind if two or more processes/threads
    try to bind a same socket.

    If more processes/threads bind a same socket success that will lead to
    two problems, one is this action is not what we expected, another is
    will lead to kernel in unstable status or oops(in my simple test case,
    cause llc2.ko can't unload).

    The current code is test SOCK_ZAPPED bit to avoid a process to
    bind a same socket twice but that is can't avoid more processes/threads
    try to bind a same socket at the same time.

    So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.

    Signed-off-by: Lin Zhang
    Signed-off-by: David S. Miller

    linzhang
     

23 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section. Of course, that is not the
    case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.

    However, there is a phrase for this, namely "type safety". This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.

    Signed-off-by: Paul E. McKenney
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc:
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    [ paulmck: Add comments mentioning the old name, as requested by Eric
    Dumazet, in order to help people familiar with the old name find
    the new one. ]
    Acked-by: David Rientjes

    Paul E. McKenney
     

10 Mar, 2017

1 commit

  • Lockdep issues a circular dependency warning when AFS issues an operation
    through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

    The theory lockdep comes up with is as follows:

    (1) If the pagefault handler decides it needs to read pages from AFS, it
    calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
    creating a call requires the socket lock:

    mmap_sem must be taken before sk_lock-AF_RXRPC

    (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
    binds the underlying UDP socket whilst holding its socket lock.
    inet_bind() takes its own socket lock:

    sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

    (3) Reading from a TCP socket into a userspace buffer might cause a fault
    and thus cause the kernel to take the mmap_sem, but the TCP socket is
    locked whilst doing this:

    sk_lock-AF_INET must be taken before mmap_sem

    However, lockdep's theory is wrong in this instance because it deals only
    with lock classes and not individual locks. The AF_INET lock in (2) isn't
    really equivalent to the AF_INET lock in (3) as the former deals with a
    socket entirely internal to the kernel that never sees userspace. This is
    a limitation in the design of lockdep.

    Fix the general case by:

    (1) Double up all the locking keys used in sockets so that one set are
    used if the socket is created by userspace and the other set is used
    if the socket is created by the kernel.

    (2) Store the kern parameter passed to sk_alloc() in a variable in the
    sock struct (sk_kern_sock). This informs sock_lock_init(),
    sock_init_data() and sk_clone_lock() as to the lock keys to be used.

    Note that the child created by sk_clone_lock() inherits the parent's
    kern setting.

    (3) Add a 'kern' parameter to ->accept() that is analogous to the one
    passed in to ->create() that distinguishes whether kernel_accept() or
    sys_accept4() was the caller and can be passed to sk_alloc().

    Note that a lot of accept functions merely dequeue an already
    allocated socket. I haven't touched these as the new socket already
    exists before we get the parameter.

    Note also that there are a couple of places where I've made the accepted
    socket unconditionally kernel-based:

    irda_accept()
    rds_rcp_accept_one()
    tcp_accept_from_sock()

    because they follow a sock_create_kern() and accept off of that.

    Whilst creating this, I noticed that lustre and ocfs don't create sockets
    through sock_create_kern() and thus they aren't marked as for-kernel,
    though they appear to be internal. I wonder if these should do that so
    that they use the new set of lock keys.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

02 Mar, 2017

1 commit


13 Feb, 2017

1 commit

  • It seems nobody used LLC since linux-3.12.

    Fortunately fuzzers like syzkaller still know how to run this code,
    otherwise it would be no fun.

    Setting skb->sk without skb->destructor leads to all kinds of
    bugs, we now prefer to be very strict about it.

    Ideally here we would use skb_set_owner() but this helper does not exist yet,
    only CAN seems to have a private helper for that.

    Fixes: 376c7311bdb6 ("net: add a temporary sanity check in skb_orphan()")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Nov, 2016

1 commit

  • Similar to commit 14135f30e33c ("inet: fix sleeping inside inet_wait_for_connect()"),
    sk_wait_event() needs to fix too, because release_sock() is blocking,
    it changes the process state back to running after sleep, which breaks
    the previous prepare_to_wait().

    Switch to the new wait API.

    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

17 Sep, 2016

1 commit


10 May, 2016

1 commit


05 May, 2016

1 commit

  • The stack object “info” has a total size of 12 bytes. Its last byte
    is padding which is not initialized and leaked via “put_cmsg”.

    Signed-off-by: Kangjie Lu
    Signed-off-by: David S. Miller

    Kangjie Lu
     

14 Apr, 2016

1 commit

  • sock_owned_by_user should not be used without socket lock held. It seems
    to be a common practice to check .owned before lock reclassification, so
    provide a little help to abstract this check away.

    Cc: linux-cifs@vger.kernel.org
    Cc: linux-bluetooth@vger.kernel.org
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

18 Feb, 2016

1 commit


27 Jul, 2015

1 commit

  • Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
    with flags = MSG_WAITALL | MSG_PEEK.

    sk_wait_data waits for sk_receive_queue not empty, but in this case,
    the receive queue is not empty, but does not contain any skb that we
    can use.

    Add a "last skb seen on receive queue" argument to sk_wait_data, so
    that it sleeps until the receive queue has new skbs.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
    Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
    Reported-by: Enrico Scholz
    Reported-by: Dan Searle
    Signed-off-by: Sabrina Dubroca
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Sabrina Dubroca