24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and its variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings when it is the case.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

25 Jul, 2020

1 commit

  • Rework the remaining setsockopt code to pass a sockptr_t instead of a
    plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
    outside of architecture specific code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Stefan Schmidt [ieee802154]
    Acked-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

10 Jun, 2020

1 commit

  • The dynamic key update for addr_list_lock still causes troubles,
    for example the following race condition still exists:

    CPU 0: CPU 1:
    (RCU read lock) (RTNL lock)
    dev_mc_seq_show() netdev_update_lockdep_key()
    -> lockdep_unregister_key()
    -> netif_addr_lock_bh()

    because lockdep doesn't provide an API to update it atomically.
    Therefore, we have to move it back to static keys and use subclass
    for nest locking like before.

    In commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key
    changes"), I already reverted most parts of commit ab92d68fc22f
    ("net: core: add generic lockdep keys").

    This patch reverts the rest and also part of commit f3b0a18bb6cb
    ("net: remove unnecessary variables and callback"). After this
    patch, addr_list_lock changes back to using static keys and
    subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
    not have to change back to ->ndo_get_lock_subclass().

    And hopefully this reduces some syzbot lockdep noises too.

    Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
    Cc: Taehee Yoo
    Cc: Dmitry Vyukov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

05 May, 2020

1 commit

  • This patch reverts the folowing commits:

    commit 064ff66e2bef84f1153087612032b5b9eab005bd
    "bonding: add missing netdev_update_lockdep_key()"

    commit 53d374979ef147ab51f5d632dfe20b14aebeccd0
    "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"

    commit 1f26c0d3d24125992ab0026b0dab16c08df947c7
    "net: fix kernel-doc warning in "

    commit ab92d68fc22f9afab480153bd82a20f6e2533769
    "net: core: add generic lockdep keys"

    but keeps the addr_list_lock_key because we still lock
    addr_list_lock nestedly on stack devices, unlikely xmit_lock
    this is safe because we don't take addr_list_lock on any fast
    path.

    Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
    Cc: Dmitry Vyukov
    Cc: Taehee Yoo
    Signed-off-by: Cong Wang
    Acked-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Cong Wang
     

19 Apr, 2020

1 commit

  • nr_add_node() invokes nr_neigh_get_dev(), which returns a local
    reference of the nr_neigh object to "nr_neigh" with increased refcnt.

    When nr_add_node() returns, "nr_neigh" becomes invalid, so the refcount
    should be decreased to keep refcount balanced.

    The issue happens in one normal path of nr_add_node(), which forgets to
    decrease the refcnt increased by nr_neigh_get_dev() and causes a refcnt
    leak. It should decrease the refcnt before the function returns like
    other normal paths do.

    Fix this issue by calling nr_neigh_put() before the nr_add_node()
    returns.

    Signed-off-by: Xiyu Yang
    Signed-off-by: Xin Tan
    Signed-off-by: David S. Miller

    Xiyu Yang
     

25 Feb, 2020

6 commits


25 Oct, 2019

1 commit

  • Some interface types could be nested.
    (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
    These interface types should set lockdep class because, without lockdep
    class key, lockdep always warn about unexisting circular locking.

    In the current code, these interfaces have their own lockdep class keys and
    these manage itself. So that there are so many duplicate code around the
    /driver/net and /net/.
    This patch adds new generic lockdep keys and some helper functions for it.

    This patch does below changes.
    a) Add lockdep class keys in struct net_device
    - qdisc_running, xmit, addr_list, qdisc_busylock
    - these keys are used as dynamic lockdep key.
    b) When net_device is being allocated, lockdep keys are registered.
    - alloc_netdev_mqs()
    c) When net_device is being free'd llockdep keys are unregistered.
    - free_netdev()
    d) Add generic lockdep key helper function
    - netdev_register_lockdep_key()
    - netdev_unregister_lockdep_key()
    - netdev_update_lockdep_key()
    e) Remove unnecessary generic lockdep macro and functions
    f) Remove unnecessary lockdep code of each interfaces.

    After this patch, each interface modules don't need to maintain
    their lockdep keys.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

25 Jul, 2019

1 commit

  • sock_efree() releases the sock refcnt, if we don't hold this refcnt
    when setting skb->destructor to it, the refcnt would not be balanced.
    This leads to several bug reports from syzbot.

    I have checked other users of sock_efree(), all of them hold the
    sock refcnt.

    Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")
    Reported-and-tested-by:
    Reported-and-tested-by:
    Reported-and-tested-by:
    Reported-and-tested-by:
    Cc: Ralf Baechle
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

02 Jul, 2019

1 commit

  • When the skb is associated with a new sock, just assigning
    it to skb->sk is not sufficient, we have to set its destructor
    to free the sock properly too.

    Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


20 Apr, 2019

1 commit

  • The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
    socket protocol handlers, and all of those end up calling the same
    sock_get_timestamp()/sock_get_timestampns() helper functions, which
    results in a lot of duplicate code.

    With the introduction of 64-bit time_t on 32-bit architectures, this
    gets worse, as we then need four different ioctl commands in each
    socket protocol implementation.

    To simplify that, let's add a new .gettstamp() operation in
    struct proto_ops, and move ioctl implementation into the common
    sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
    through.

    We can reuse the sock_get_timestamp() implementation, but generalize
    it so it can deal with both native and compat mode, as well as
    timeval and timespec structures.

    Acked-by: Stefan Schmidt
    Acked-by: Neil Horman
    Acked-by: Marc Kleine-Budde
    Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/
    Signed-off-by: Arnd Bergmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

12 Apr, 2019

1 commit

  • Syzkaller report this:

    BUG: unable to handle kernel paging request at fffffbfff830524b
    PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c9716067 PTE 0
    Oops: 0000 [#1] SMP KASAN PTI
    CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 5.0.0+ #5
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    RIP: 0010:__list_add_valid+0x21/0xe0 lib/list_debug.c:23
    Code: 8b 0c 24 e9 17 fd ff ff 90 55 48 89 fd 48 8d 7a 08 53 48 89 d3 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 48 83 ec 08 3c 02 00 0f 85 8b 00 00 00 48 8b 53 08 48 39 f2 75 35 48 89 f2
    RSP: 0018:ffff8881ea2278d0 EFLAGS: 00010282
    RAX: dffffc0000000000 RBX: ffffffffc1829250 RCX: 1ffff1103d444ef4
    RDX: 1ffffffff830524b RSI: ffffffff85659300 RDI: ffffffffc1829258
    RBP: ffffffffc1879250 R08: fffffbfff0acb269 R09: fffffbfff0acb269
    R10: ffff8881ea2278f0 R11: fffffbfff0acb268 R12: ffffffffc1829250
    R13: dffffc0000000000 R14: 0000000000000008 R15: ffffffffc187c830
    FS: 00007fe0361df700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: fffffbfff830524b CR3: 00000001eb39a001 CR4: 00000000007606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    __list_add include/linux/list.h:60 [inline]
    list_add include/linux/list.h:79 [inline]
    proto_register+0x444/0x8f0 net/core/sock.c:3375
    nr_proto_init+0x73/0x4b3 [netrom]
    ? 0xffffffffc1628000
    ? 0xffffffffc1628000
    do_one_initcall+0xbc/0x47d init/main.c:887
    do_init_module+0x1b5/0x547 kernel/module.c:3456
    load_module+0x6405/0x8c10 kernel/module.c:3804
    __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
    do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x462e99
    Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fe0361dec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
    RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
    RBP: 00007fe0361dec70 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0361df6bc
    R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
    Modules linked in: netrom(+) ax25 fcrypt pcbc af_alg arizona_ldo1 v4l2_common videodev media v4l2_dv_timings hdlc ide_cd_mod snd_soc_sigmadsp_regmap snd_soc_sigmadsp intel_spi_platform intel_spi mtd spi_nor snd_usbmidi_lib usbcore lcd ti_ads7950 hi6421_regulator snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_hda_ext_core snd_hda_core snd_soc_rt5663 snd_soc_core snd_pcm_dmaengine snd_compress snd_soc_rl6231 mac80211 rtc_rc5t583 spi_slave_time leds_pwm hid_gt683r hid industrialio_triggered_buffer kfifo_buf industrialio ir_kbd_i2c rc_core led_class_flash dwc_xlgmac snd_ymfpci gameport snd_mpu401_uart snd_rawmidi snd_ac97_codec snd_pcm ac97_bus snd_opl3_lib snd_timer snd_seq_device snd_hwdep snd soundcore iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan
    bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ide_pci_generic piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ide_core psmouse input_leds i2c_piix4 serio_raw intel_agp intel_gtt ata_generic agpgart pata_acpi parport_pc rtc_cmos parport floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: rxrpc]
    Dumping ftrace buffer:
    (ftrace buffer empty)
    CR2: fffffbfff830524b
    ---[ end trace 039ab24b305c4b19 ]---

    If nr_proto_init failed, it may forget to call proto_unregister,
    tiggering this issue.This patch rearrange code of nr_proto_init
    to avoid such issues.

    Reported-by: Hulk Robot
    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

28 Jan, 2019

1 commit

  • sk_reset_timer() and sk_stop_timer() properly handle
    sock refcnt for timer function. Switching to them
    could fix a refcounting bug reported by syzbot.

    Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com
    Cc: Ralf Baechle
    Cc: linux-hams@vger.kernel.org
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

31 Dec, 2018

1 commit

  • nr_find_socket(), nr_find_peer() and nr_find_listener() lock the
    sock after finding it in the global list. However, the call path
    requires BH disabled for the sock lock consistently.

    Actually the locking is unnecessary at this point, we can just hold
    the sock refcnt to make sure it is not gone after we unlock the global
    list, and lock it later only when needed.

    Reported-and-tested-by: syzbot+f621cda8b7e598908efa@syzkaller.appspotmail.com
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

13 Jun, 2018

1 commit

  • The kzalloc() function has a 2-factor argument form, kcalloc(). This
    patch replaces cases of:

    kzalloc(a * b, gfp)

    with:
    kcalloc(a * b, gfp)

    as well as handling cases of:

    kzalloc(a * b * c, gfp)

    with:

    kzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kzalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kzalloc
    + kcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kzalloc(sizeof(THING) * C2, ...)
    |
    kzalloc(sizeof(TYPE) * C2, ...)
    |
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(C1 * C2, ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

05 Jun, 2018

1 commit

  • Pull aio updates from Al Viro:
    "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

    The only thing I'm holding back for a day or so is Adam's aio ioprio -
    his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
    but let it sit in -next for decency sake..."

    * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    aio: sanitize the limit checking in io_submit(2)
    aio: fold do_io_submit() into callers
    aio: shift copyin of iocb into io_submit_one()
    aio_read_events_ring(): make a bit more readable
    aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
    aio: take list removal to (some) callers of aio_complete()
    aio: add missing break for the IOCB_CMD_FDSYNC case
    random: convert to ->poll_mask
    timerfd: convert to ->poll_mask
    eventfd: switch to ->poll_mask
    pipe: convert to ->poll_mask
    crypto: af_alg: convert to ->poll_mask
    net/rxrpc: convert to ->poll_mask
    net/iucv: convert to ->poll_mask
    net/phonet: convert to ->poll_mask
    net/nfc: convert to ->poll_mask
    net/caif: convert to ->poll_mask
    net/bluetooth: convert to ->poll_mask
    net/sctp: convert to ->poll_mask
    net/tipc: convert to ->poll_mask
    ...

    Linus Torvalds
     

26 May, 2018

1 commit


16 May, 2018

1 commit


27 Mar, 2018

1 commit

  • Prefer the direct use of octal for permissions.

    Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
    and some typing.

    Miscellanea:

    o Whitespace neatening around these conversions.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

13 Feb, 2018

1 commit

  • Changes since v1:
    Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

    Before:
    All these functions either return a negative error indicator,
    or store length of sockaddr into "int *socklen" parameter
    and return zero on success.

    "int *socklen" parameter is awkward. For example, if caller does not
    care, it still needs to provide on-stack storage for the value
    it does not need.

    None of the many FOO_getname() functions of various protocols
    ever used old value of *socklen. They always just overwrite it.

    This change drops this parameter, and makes all these functions, on success,
    return length of sockaddr. It's always >= 0 and can be differentiated
    from an error.

    Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

    rpc_sockname() lost "int buflen" parameter, since its only use was
    to be passed to kernel_getsockname() as &buflen and subsequently
    not used in any way.

    Userspace API is not changed.

    text data bss dec hex filename
    30108430 2633624 873672 33615726 200ef6e vmlinux.before.o
    30108109 2633612 873672 33615393 200ee21 vmlinux.o

    Signed-off-by: Denys Vlasenko
    CC: David S. Miller
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: linux-bluetooth@vger.kernel.org
    CC: linux-decnet-user@lists.sourceforge.net
    CC: linux-wireless@vger.kernel.org
    CC: linux-rdma@vger.kernel.org
    CC: linux-sctp@vger.kernel.org
    CC: linux-nfs@vger.kernel.org
    CC: linux-x25@vger.kernel.org
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

17 Jan, 2018

1 commit

  • /proc has been ignoring struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    - if (de->proc_fops)
    - inode->i_fop = de->proc_fops;
    + if (de->proc_fops) {
    + if (S_ISREG(inode->i_mode))
    + inode->i_fop = &proc_reg_file_ops;
    + else
    + inode->i_fop = de->proc_fops;
    + }

    VFS stopped pinning module at this point.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

22 Nov, 2017

2 commits

  • With all callbacks converted, and the timer callback prototype
    switched over, the TIMER_FUNC_TYPE cast is no longer needed,
    so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
    $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
    $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

    The now unused macros are also dropped from include/linux/timer.h.

    Signed-off-by: Kees Cook

    Kees Cook
     
  • This changes all DEFINE_TIMER() callbacks to use a struct timer_list
    pointer instead of unsigned long. Since the data argument has already been
    removed, none of these callbacks are using their argument currently, so
    this renames the argument to "unused".

    Done using the following semantic patch:

    @match_define_timer@
    declarer name DEFINE_TIMER;
    identifier _timer, _callback;
    @@

    DEFINE_TIMER(_timer, _callback);

    @change_callback depends on match_define_timer@
    identifier match_define_timer._callback;
    type _origtype;
    identifier _origarg;
    @@

    void
    -_callback(_origtype _origarg)
    +_callback(struct timer_list *unused)
    { ... }

    Signed-off-by: Kees Cook

    Kees Cook
     

16 Nov, 2017

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Maintain the TCP retransmit queue using an rbtree, with 1GB
    windows at 100Gb this really has become necessary. From Eric
    Dumazet.

    2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.

    3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
    Lunn.

    4) Add meter action support to openvswitch, from Andy Zhou.

    5) Add a data meta pointer for BPF accessible packets, from Daniel
    Borkmann.

    6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.

    7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.

    8) More work to move the RTNL mutex down, from Florian Westphal.

    9) Add 'bpftool' utility, to help with bpf program introspection.
    From Jakub Kicinski.

    10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
    Dangaard Brouer.

    11) Support 'blocks' of transformations in the packet scheduler which
    can span multiple network devices, from Jiri Pirko.

    12) TC flower offload support in cxgb4, from Kumar Sanghvi.

    13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
    Leitner.

    14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.

    15) Add RED qdisc offloadability, and use it in mlxsw driver. From
    Nogah Frankel.

    16) eBPF based device controller for cgroup v2, from Roman Gushchin.

    17) Add some fundamental tracepoints for TCP, from Song Liu.

    18) Remove garbage collection from ipv6 route layer, this is a
    significant accomplishment. From Wei Wang.

    19) Add multicast route offload support to mlxsw, from Yotam Gigi"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
    tcp: highest_sack fix
    geneve: fix fill_info when link down
    bpf: fix lockdep splat
    net: cdc_ncm: GetNtbFormat endian fix
    openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
    netem: remove unnecessary 64 bit modulus
    netem: use 64 bit divide by rate
    tcp: Namespace-ify sysctl_tcp_default_congestion_control
    net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
    ipv6: set all.accept_dad to 0 by default
    uapi: fix linux/tls.h userspace compilation error
    usbnet: ipheth: prevent TX queue timeouts when device not ready
    vhost_net: conditionally enable tx polling
    uapi: fix linux/rxrpc.h userspace compilation errors
    net: stmmac: fix LPI transitioning for dwmac4
    atm: horizon: Fix irq release error
    net-sysfs: trigger netlink notification on ifalias change via sysfs
    openvswitch: Using kfree_rcu() to simplify the code
    openvswitch: Make local function ovs_nsh_key_attr_size() static
    openvswitch: Fix return value check in ovs_meter_cmd_features()
    ...

    Linus Torvalds
     

01 Nov, 2017

2 commits


22 Oct, 2017

1 commit


18 Oct, 2017

2 commits

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly for all users of sk_timer.

    Cc: "David S. Miller"
    Cc: Ralf Baechle
    Cc: Andrew Hendry
    Cc: Eric Dumazet
    Cc: Paolo Abeni
    Cc: David Howells
    Cc: Julia Lawall
    Cc: linzhang
    Cc: Ingo Molnar
    Cc: netdev@vger.kernel.org
    Cc: linux-hams@vger.kernel.org
    Cc: linux-x25@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     
  • The core sk_timer initializer can provide the common .data assignment
    instead of it being set separately in users.

    Cc: "David S. Miller"
    Cc: Ralf Baechle
    Cc: Andrew Hendry
    Cc: Eric Dumazet
    Cc: Paolo Abeni
    Cc: David Howells
    Cc: Colin Ian King
    Cc: Ingo Molnar
    Cc: linzhang
    Cc: netdev@vger.kernel.org
    Cc: linux-hams@vger.kernel.org
    Cc: linux-x25@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     

05 Oct, 2017

1 commit

  • Drop the arguments from the macro and adjust all callers with the
    following script:

    perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \
    $(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h)

    Signed-off-by: Kees Cook
    Acked-by: Geert Uytterhoeven # for m68k parts
    Acked-by: Guenter Roeck # for watchdog parts
    Acked-by: David S. Miller # for networking parts
    Acked-by: Greg Kroah-Hartman
    Acked-by: Kalle Valo # for wireless parts
    Acked-by: Arnd Bergmann
    Cc: linux-mips@linux-mips.org
    Cc: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Lai Jiangshan
    Cc: Sebastian Reichel
    Cc: Kalle Valo
    Cc: Paul Mackerras
    Cc: Pavel Machek
    Cc: linux1394-devel@lists.sourceforge.net
    Cc: Chris Metcalf
    Cc: linux-s390@vger.kernel.org
    Cc: linux-wireless@vger.kernel.org
    Cc: "James E.J. Bottomley"
    Cc: Wim Van Sebroeck
    Cc: Michael Ellerman
    Cc: Ursula Braun
    Cc: Viresh Kumar
    Cc: Harish Patil
    Cc: Stephen Boyd
    Cc: Michael Reed
    Cc: Manish Chopra
    Cc: Len Brown
    Cc: Arnd Bergmann
    Cc: linux-pm@vger.kernel.org
    Cc: Heiko Carstens
    Cc: Tejun Heo
    Cc: Julian Wiedmann
    Cc: John Stultz
    Cc: Mark Gross
    Cc: linux-watchdog@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: "Martin K. Petersen"
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Oleg Nesterov
    Cc: Ralf Baechle
    Cc: Stefan Richter
    Cc: Guenter Roeck
    Cc: netdev@vger.kernel.org
    Cc: Martin Schwidefsky
    Cc: Andrew Morton
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Sudip Mukherjee
    Link: https://lkml.kernel.org/r/1507159627-127660-11-git-send-email-keescook@chromium.org
    Signed-off-by: Thomas Gleixner

    Kees Cook
     

05 Jul, 2017

2 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

10 Mar, 2017

1 commit

  • Lockdep issues a circular dependency warning when AFS issues an operation
    through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

    The theory lockdep comes up with is as follows:

    (1) If the pagefault handler decides it needs to read pages from AFS, it
    calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
    creating a call requires the socket lock:

    mmap_sem must be taken before sk_lock-AF_RXRPC

    (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
    binds the underlying UDP socket whilst holding its socket lock.
    inet_bind() takes its own socket lock:

    sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

    (3) Reading from a TCP socket into a userspace buffer might cause a fault
    and thus cause the kernel to take the mmap_sem, but the TCP socket is
    locked whilst doing this:

    sk_lock-AF_INET must be taken before mmap_sem

    However, lockdep's theory is wrong in this instance because it deals only
    with lock classes and not individual locks. The AF_INET lock in (2) isn't
    really equivalent to the AF_INET lock in (3) as the former deals with a
    socket entirely internal to the kernel that never sees userspace. This is
    a limitation in the design of lockdep.

    Fix the general case by:

    (1) Double up all the locking keys used in sockets so that one set are
    used if the socket is created by userspace and the other set is used
    if the socket is created by the kernel.

    (2) Store the kern parameter passed to sk_alloc() in a variable in the
    sock struct (sk_kern_sock). This informs sock_lock_init(),
    sock_init_data() and sk_clone_lock() as to the lock keys to be used.

    Note that the child created by sk_clone_lock() inherits the parent's
    kern setting.

    (3) Add a 'kern' parameter to ->accept() that is analogous to the one
    passed in to ->create() that distinguishes whether kernel_accept() or
    sys_accept4() was the caller and can be passed to sk_alloc().

    Note that a lot of accept functions merely dequeue an already
    allocated socket. I haven't touched these as the new socket already
    exists before we get the parameter.

    Note also that there are a couple of places where I've made the accepted
    socket unconditionally kernel-based:

    irda_accept()
    rds_rcp_accept_one()
    tcp_accept_from_sock()

    because they follow a sock_create_kern() and accept off of that.

    Whilst creating this, I noticed that lustre and ocfs don't create sockets
    through sock_create_kern() and thus they aren't marked as for-kernel,
    though they appear to be internal. I wonder if these should do that so
    that they use the new set of lock keys.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells