02 Mar, 2019

2 commits

  • hugetlb pages should only be migrated if they are 'active'. The
    routines set/clear_page_huge_active() modify the active state of hugetlb
    pages.

    When a new hugetlb page is allocated at fault time, set_page_huge_active
    is called before the page is locked. Therefore, another thread could
    race and migrate the page while it is being added to the page table by the
    fault code. This race is somewhat hard to trigger, but can be seen by
    strategically adding udelay to simulate worst case scheduling behavior.
    Depending on 'how' the code races, various BUG()s could be triggered.

    To address this issue, simply delay the set_page_huge_active call until
    after the page is successfully added to the page table.
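    Here is a minimal userspace sketch of why the ordering matters. It
    assumes a simplified page model: struct model_page, observe() and the
    fault_* functions are hypothetical and are not kernel code. observe()
    plays the racing migration thread, which only acts on 'active' pages. It
    counts how often that thread could see an active page that is not yet
    in the page table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hugetlb page; not the kernel's struct page. */
struct model_page {
	bool locked;
	bool in_page_table;
	bool active;	/* stands in for set/clear_page_huge_active() */
};

/* Called between fault-path steps to mimic a racing migration thread:
 * returns 1 if migration could pick up a page that is active but not
 * yet in the page table. */
static int observe(const struct model_page *p)
{
	return (p->active && !p->in_page_table) ? 1 : 0;
}

/* Old ordering: marked active at allocation, before lock and mapping. */
static int fault_old_order(struct model_page *p)
{
	int races = 0;

	p->active = true;		races += observe(p);
	p->locked = true;		races += observe(p);
	p->in_page_table = true;	races += observe(p);
	p->locked = false;		races += observe(p);
	return races;
}

/* Fixed ordering: marked active only once the page is in the page table
 * (still before the page is unlocked). */
static int fault_new_order(struct model_page *p)
{
	int races = 0;

	p->locked = true;		races += observe(p);
	p->in_page_table = true;	races += observe(p);
	p->active = true;		races += observe(p);
	p->locked = false;		races += observe(p);
	return races;
}
```

    In this model, the old ordering exposes the window twice and the fixed
    ordering never does.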

    Hugetlb pages can also be leaked at migration time if the pages are
    associated with a file in an explicitly mounted hugetlbfs filesystem.
    For example, consider a two node system with 4GB worth of huge pages
    available. A program mmaps a 2G file in a hugetlbfs filesystem. It
    then migrates the pages associated with the file from one node to
    another. When the program exits, huge page counts are as follows:

    node0
    1024 free_hugepages
    1024 nr_hugepages

    node1
    0 free_hugepages
    1024 nr_hugepages

    Filesystem Size Used Avail Use% Mounted on
    nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool

    That is as expected. 2G of huge pages are taken from the free_hugepages
    counts, and 2G is the size of the file in the explicitly mounted
    filesystem. If the file is then removed, the counts become:

    node0
    1024 free_hugepages
    1024 nr_hugepages

    node1
    1024 free_hugepages
    1024 nr_hugepages

    Filesystem Size Used Avail Use% Mounted on
    nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool

    Note that the filesystem still shows 2G of pages used, while there
    actually are no huge pages in use. The only way to 'fix' the filesystem
    accounting is to unmount the filesystem.

    If a hugetlb page is associated with an explicitly mounted filesystem,
    this information is contained in the page_private field. At migration
    time, this information is not preserved. To fix, simply transfer
    page_private from old to new page at migration time if necessary.

    There is a related race with removing a huge page from a file and
    migration. When a huge page is removed from the pagecache, the
    page_mapping() field is cleared, yet page_private remains set until the
    page is actually freed by free_huge_page(). A page could be migrated
    while in this state. However, since page_mapping() is not set the
    hugetlbfs specific routine to transfer page_private is not called and we
    leak the page count in the filesystem.

    To fix that, check for this condition before migrating a huge page. If
    the condition is detected, return EBUSY for the page.
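    The two migration-time rules above can be sketched as follows. The
    sketch assumes a hypothetical struct model_hpage that carries only the
    two fields involved. The kernel applies these checks to struct page
    through page_mapping() and page_private().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct page fields involved. */
struct model_hpage {
	void *mapping;		/* non-NULL while the page is in a file's pagecache */
	unsigned long private;	/* hugetlbfs accounting info for explicit mounts */
};

/* Sketch of the migration-time rules described above. */
static int model_migrate_huge_page(struct model_hpage *old,
				   struct model_hpage *new)
{
	/* Removed from the pagecache but not yet freed: the hugetlbfs
	 * transfer step would not run, so refuse rather than leak. */
	if (old->private && !old->mapping)
		return -EBUSY;

	new->mapping = old->mapping;
	/* Transfer page_private so the filesystem accounting follows the page. */
	if (old->private) {
		new->private = old->private;
		old->private = 0;
	}
	old->mapping = NULL;
	return 0;
}
```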

    Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
    Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
    Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active")
    Signed-off-by: Mike Kravetz
    Reviewed-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: "Kirill A . Shutemov"
    Cc: Mel Gorman
    Cc: Davidlohr Bueso
    Cc:
    [mike.kravetz@oracle.com: v2]
    Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
    [mike.kravetz@oracle.com: update comment and changelog]
    Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • Building an arm64 allmodconfig kernel with clang results in over 140
    warnings about overly large stack frames, the worst ones being:

    drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
    drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
    drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
    drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
    drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
    drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'

    None of these happen with gcc today, and almost all of these are the
    result of a single known issue in llvm. Hopefully it will eventually
    get fixed with the clang-9 release.

    In the meantime, the best idea I have is to turn off asan-stack for
    clang-8 and earlier, so we can produce a kernel that is safe to run.

    I have posted three patches that address the frame overflow warnings
    that are not addressed by turning off asan-stack, so in combination with
    this change, we get much closer to a clean allmodconfig build, which in
    turn is necessary to do meaningful build regression testing.

    It is still possible to turn on the CONFIG_ASAN_STACK option on all
    versions of clang, and it's always enabled for gcc, but when
    CONFIG_COMPILE_TEST is set, the option remains invisible, so
    allmodconfig and randconfig builds (which are normally done with a
    forced CONFIG_COMPILE_TEST) will still result in a mostly clean build.
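    The resulting decision can be modelled roughly as below. This is a
    hypothetical sketch of the logic described in the text, not the actual
    Kconfig, and the struct and function names are invented. Treating
    clang-9 as the first version with asan-stack on by default is an
    assumption drawn from "clang-8 and earlier".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the build settings mentioned in the text. */
struct model_build {
	bool is_clang;
	int clang_major;	/* ignored for gcc */
	bool compile_test;	/* CONFIG_COMPILE_TEST */
	bool user_wants_stack;	/* user's answer when the option is visible */
};

/* Returns true if asan-stack instrumentation ends up enabled. */
static bool model_asan_stack_enabled(const struct model_build *b)
{
	if (!b->is_clang)
		return true;		/* always enabled for gcc */

	/* The option is only visible for clang builds without COMPILE_TEST. */
	bool visible = !b->compile_test;
	if (visible)
		return b->user_wants_stack;	/* can be turned on for any clang */

	/* Invisible: fall back to the default, off for clang-8 and earlier. */
	return b->clang_major >= 9;
}
```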

    Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de
    Link: https://bugs.llvm.org/show_bug.cgi?id=38809
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Qian Cai
    Reviewed-by: Mark Brown
    Acked-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Nick Desaulniers
    Cc: Kostya Serebryany
    Cc: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

26 Feb, 2019

3 commits

  • When a cell with a volume location server list is added manually by
    echoing the details into /proc/net/afs/cells, a record is added but the
    flag saying it has been looked up isn't set.

    This causes the VL server rotation code to wait forever, with the top of
    /proc/pid/stack looking like:

    afs_select_vlserver+0x3a6/0x6f3
    afs_vl_lookup_vldb+0x4b/0x92
    afs_create_volume+0x25/0x1b9
    ...

    with the thread stuck in afs_start_vl_iteration() waiting for
    AFS_CELL_FL_NO_LOOKUP_YET to be cleared.

    Fix this by clearing AFS_CELL_FL_NO_LOOKUP_YET when setting up a record
    if that record's details were supplied manually.
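    Here is a minimal sketch of the fix. It assumes a hypothetical struct
    model_cell and a hypothetical flag bit, which stand in for the AFS cell
    record and AFS_CELL_FL_NO_LOOKUP_YET.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for AFS_CELL_FL_NO_LOOKUP_YET. */
#define MODEL_CELL_FL_NO_LOOKUP_YET (1UL << 0)

struct model_cell {
	unsigned long flags;
};

/* Record setup: a fresh cell waits for a lookup, unless its VL server
 * details were supplied manually. */
static void model_cell_setup(struct model_cell *cell, bool vl_list_supplied)
{
	cell->flags = MODEL_CELL_FL_NO_LOOKUP_YET;
	if (vl_list_supplied)
		cell->flags &= ~MODEL_CELL_FL_NO_LOOKUP_YET;	/* the fix */
}

/* VL server rotation may only start once no lookup is pending. */
static bool model_can_start_vl_iteration(const struct model_cell *cell)
{
	return !(cell->flags & MODEL_CELL_FL_NO_LOOKUP_YET);
}
```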

    Fixes: 0a5143f2f89c ("afs: Implement VL server rotation")
    Reported-by: Dave Botsch
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • When we made the shmem_reserve_inode call in shmem_link conditional, we
    forgot to update the declaration for ret so that it always has a known
    value. Dan Carpenter pointed out this deficiency in the original patch.
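    The following userspace sketch shows the pattern. It assumes
    hypothetical model_* helpers in place of shmem_link() and
    shmem_reserve_inode(). Without the initializer, the i_nlink == 0 path
    (an O_TMPFILE being linked in) would return an indeterminate value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for shmem_reserve_inode(): 0 on success. */
static int model_reserve_inode(int *free_inodes)
{
	if (*free_inodes == 0)
		return -ENOSPC;
	(*free_inodes)--;
	return 0;
}

static int model_shmem_link(unsigned int i_nlink, int *free_inodes)
{
	int ret = 0;	/* the fix: defined even when no reservation is made */

	/* A tmpfile (i_nlink == 0) linked in for the first time was already
	 * accounted for, so only reserve for additional links. */
	if (i_nlink) {
		ret = model_reserve_inode(free_inodes);
		if (ret)
			goto out;
	}
	/* ... link the new dentry to the inode here ... */
out:
	return ret;
}
```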

    Fixes: 1062af920c07 ("tmpfs: fix link accounting when a tmpfile is linked in")
    Reported-by: Dan Carpenter
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Hugh Dickins
    Cc: Matej Kupljen
    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b.

    It was well-intentioned, but wrong. Overriding the exception tables for
    instructions for random reasons is just wrong, and that is what the new
    code did.

    It caused problems for tracing, and it caused problems for strncpy_from_user(),
    because the new checks made perfectly valid use cases break, rather than
    catch things that did bad things.

    Unchecked user space accesses are a problem, but that's not a reason to
    add invalid checks that then people have to work around with silly flags
    (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
    odd way to say "this commit was wrong" and was sprinkled into random
    places to hide the wrongness).

    The real fix to unchecked user space accesses is to get rid of the
    special "let's not check __get_user() and __put_user() at all" logic.
    Make __{get|put}_user() be just aliases to the regular {get|put}_user()
    functions, and make it impossible to access user space without having
    the proper checks in place.
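    Here is a minimal sketch of that proposal, using a toy "user" address
    range. model_access_ok(), model_get_user_u8() and
    model_raw_get_user_u8() are hypothetical stand-ins for access_ok(),
    get_user() and __get_user(). They are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "user" memory and its limit, standing in for the range
 * that access_ok() checks. */
static unsigned char model_user_mem[64];
#define MODEL_USER_LIMIT (sizeof(model_user_mem))

static int model_access_ok(size_t addr, size_t size)
{
	return addr <= MODEL_USER_LIMIT && size <= MODEL_USER_LIMIT - addr;
}

/* Checked accessor: range check first, then the copy. */
static int model_get_user_u8(uint8_t *dst, size_t addr)
{
	if (!model_access_ok(addr, 1))
		return -EFAULT;
	*dst = model_user_mem[addr];
	return 0;
}

/* Stands in for __get_user(): just an alias, so no caller can skip
 * the range check. */
#define model_raw_get_user_u8(dst, addr) model_get_user_u8((dst), (addr))
```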

    The raison d'être of the special double-underscore versions used to be
    that the range check was expensive, and if you did multiple user
    accesses, you'd do the range check up front (like the signal frame
    handling code, for example). But SMAP (on x86) and PAN (on ARM) have
    made that optimization pointless, because the _real_ expense is the "set
    CPU flag to allow user space access".

    So let's not break the valid cases to catch invalid cases that shouldn't
    even exist.

    Cc: Thomas Gleixner
    Cc: Kees Cook
    Cc: Tobin C. Harding
    Cc: Borislav Petkov
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Jann Horn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

25 Feb, 2019

3 commits

  • Linus Torvalds
     
  • Pull KVM fixes from Paolo Bonzini:
    "Bug fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: MMU: record maximum physical address width in kvm_mmu_extended_role
    kvm: x86: Return LA57 feature based on hardware capability
    x86/kvm/mmu: fix switch between root and guest MMUs
    s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "Hopefully the last pull request for this release. Fingers crossed:

    1) Only refcount ESP stats on full sockets, from Martin Willi.

    2) Missing barriers in AF_UNIX, from Al Viro.

    3) RCU protection fixes in ipv6 route code, from Paolo Abeni.

    4) Avoid false positives in untrusted GSO validation, from Willem de
    Bruijn.

    5) Forwarded mesh packets in mac80211 need more tailroom allocated,
    from Felix Fietkau.

    6) Use operstate consistently for linkup in team driver, from George
    Wilkie.

    7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF
    and PF code paths.

    8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.

    9) nfp eBPF code gen fixes from Jiong Wang.

    10) bnxt_en firmware timeout fix from Michael Chan.

    11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.

    12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
    net: phy: realtek: Dummy IRQ calls for RTL8366RB
    tcp: repaired skbs must init their tso_segs
    net/x25: fix a race in x25_bind()
    net: dsa: Remove documentation for port_fdb_prepare
    Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
    selftests: fib_tests: sleep after changing carrier. again.
    net: set static variable an initial value in atl2_probe()
    net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
    bpf, doc: add bpf list as secondary entry to maintainers file
    udp: fix possible user after free in error handler
    udpv6: fix possible user after free in error handler
    fou6: fix proto error handler argument type
    udpv6: add the required annotation to mib type
    mdio_bus: Fix use-after-free on device_register fails
    net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
    bnxt_en: Wait longer for the firmware message response to complete.
    bnxt_en: Fix typo in firmware message timeout logic.
    nfp: bpf: fix ALU32 high bits clearance bug
    nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
    Documentation: networking: switchdev: Update port parent ID section
    ...

    Linus Torvalds
     

24 Feb, 2019

10 commits

  • This fixes a regression introduced by
    commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7
    "net: phy: replace PHY_HAS_INTERRUPT with a check for
    config_intr and ack_interrupt".

    This assumes that a PHY cannot trigger interrupt unless
    it has .config_intr() or .ack_interrupt() implemented.
    A later patch makes the code assume both need to be
    implemented for interrupts to be present.

    But this PHY (which is inside a DSA) will happily
    fire interrupts without either callback.

    Implement dummy callbacks for .config_intr() and
    .ack_interrupt() in the phy header to fix this.
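    The effect on the interrupt check can be sketched as follows. The
    sketch uses hypothetical model_* types that keep only the two
    callbacks. The helper mirrors the "both callbacks present" rule
    described above and does not quote the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_phy_device;

/* Hypothetical subset of struct phy_driver: only the two IRQ hooks. */
struct model_phy_driver {
	int (*config_intr)(struct model_phy_device *phydev);
	int (*ack_interrupt)(struct model_phy_device *phydev);
};

/* Interrupt support is inferred from both callbacks being present. */
static bool model_phy_drv_supports_irq(const struct model_phy_driver *drv)
{
	return drv->config_intr && drv->ack_interrupt;
}

/* Dummy callbacks: the PHY fires its interrupt on its own, so there is
 * nothing to configure or acknowledge here. */
static int model_dummy_config_intr(struct model_phy_device *phydev)
{
	(void)phydev;
	return 0;
}

static int model_dummy_ack_interrupt(struct model_phy_device *phydev)
{
	(void)phydev;
	return 0;
}

static const struct model_phy_driver model_rtl8366rb_before = { NULL, NULL };
static const struct model_phy_driver model_rtl8366rb_after = {
	.config_intr	= model_dummy_config_intr,
	.ack_interrupt	= model_dummy_ack_interrupt,
};
```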

    Tested on the RTL8366RB on D-Link DIR-685.

    Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt")
    Cc: Heiner Kallweit
    Signed-off-by: Linus Walleij
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Linus Walleij
     
  • syzbot reported a WARN_ON(!tcp_skb_pcount(skb))
    in tcp_send_loss_probe() [1]

    This was caused by TCP_REPAIR sent skbs that inadvertently
    were missing a call to tcp_init_tso_segs().
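    The following simplified sketch shows what initializing the segment
    count means, using a hypothetical struct model_skb. The rule is one
    segment if the payload fits in the MSS; otherwise, the length divided
    by the MSS, rounded up. The real tcp_init_tso_segs() also
    re-initializes when the MSS changes, which this model omits.

```c
#include <assert.h>

/* Hypothetical slice of an skb's TCP state. */
struct model_skb {
	unsigned int len;
	unsigned int pcount;	/* tcp_skb_pcount(); 0 means never initialized */
	unsigned int gso_size;
};

static unsigned int model_init_tso_segs(struct model_skb *skb,
					unsigned int mss_now)
{
	if (!skb->pcount) {
		if (skb->len <= mss_now) {
			skb->pcount = 1;	/* fits in a single segment */
			skb->gso_size = 0;
		} else {
			/* round up: a partial last segment still counts */
			skb->pcount = (skb->len + mss_now - 1) / mss_now;
			skb->gso_size = mss_now;
		}
	}
	return skb->pcount;
}
```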

    [1]
    WARNING: CPU: 1 PID: 0 at net/ipv4/tcp_output.c:2534 tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc7+ #77
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    panic+0x2cb/0x65c kernel/panic.c:214
    __warn.cold+0x20/0x45 kernel/panic.c:571
    report_bug+0x263/0x2b0 lib/bug.c:186
    fixup_bug arch/x86/kernel/traps.c:178 [inline]
    fixup_bug arch/x86/kernel/traps.c:173 [inline]
    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
    do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
    RIP: 0010:tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
    Code: 88 fc ff ff 4c 89 ef e8 ed 75 c8 fb e9 c8 fc ff ff e8 43 76 c8 fb e9 63 fd ff ff e8 d9 75 c8 fb e9 94 f9 ff ff e8 bf 03 91 fb 0b e9 7d fa ff ff e8 b3 03 91 fb 0f b6 1d 37 43 7a 03 31 ff 89
    RSP: 0018:ffff8880ae907c60 EFLAGS: 00010206
    RAX: ffff8880a989c340 RBX: 0000000000000000 RCX: ffffffff85dedbdb
    RDX: 0000000000000100 RSI: ffffffff85dee0b1 RDI: 0000000000000005
    RBP: ffff8880ae907c90 R08: ffff8880a989c340 R09: ffffed10147d1ae1
    R10: ffffed10147d1ae0 R11: ffff8880a3e8d703 R12: ffff888091b90040
    R13: ffff8880a3e8d540 R14: 0000000000008000 R15: ffff888091b90860
    tcp_write_timer_handler+0x5c0/0x8a0 net/ipv4/tcp_timer.c:583
    tcp_write_timer+0x10e/0x1d0 net/ipv4/tcp_timer.c:607
    call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
    expire_timers kernel/time/timer.c:1362 [inline]
    __run_timers kernel/time/timer.c:1681 [inline]
    __run_timers kernel/time/timer.c:1649 [inline]
    run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
    __do_softirq+0x266/0x95a kernel/softirq.c:292
    invoke_softirq kernel/softirq.c:373 [inline]
    irq_exit+0x180/0x1d0 kernel/softirq.c:413
    exiting_irq arch/x86/include/asm/apic.h:536 [inline]
    smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807

    RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
    Code: ff ff ff 48 89 c7 48 89 45 d8 e8 59 0c a1 fa 48 8b 45 d8 e9 ce fe ff ff 48 89 df e8 48 0c a1 fa eb 82 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
    RSP: 0018:ffff8880a98afd78 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
    RAX: 1ffffffff1125061 RBX: ffff8880a989c340 RCX: 0000000000000000
    RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880a989cbbc
    RBP: ffff8880a98afda8 R08: ffff8880a989c340 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
    R13: ffffffff889282f8 R14: 0000000000000001 R15: 0000000000000000
    arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:555
    default_idle_call+0x36/0x90 kernel/sched/idle.c:93
    cpuidle_idle_call kernel/sched/idle.c:153 [inline]
    do_idle+0x386/0x570 kernel/sched/idle.c:262
    cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:353
    start_secondary+0x404/0x5c0 arch/x86/kernel/smpboot.c:271
    secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
    Kernel Offset: disabled
    Rebooting in 86400 seconds..

    Fixes: 79861919b889 ("tcp: fix TCP_REPAIR xmit queue setup")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Andrey Vagin
    Cc: Soheil Hassas Yeganeh
    Cc: Neal Cardwell
    Acked-by: Soheil Hassas Yeganeh
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • syzbot was able to trigger another soft lockup [1]

    I first thought it was the O(N^2) issue I mentioned in my
    prior fix (f657d22ee1f "net/x25: do not hold the cpu
    too long in x25_new_lci()"), but I eventually found
    that x25_bind() was not checking SOCK_ZAPPED state under
    socket lock protection.

    This means that multiple threads can end up calling
    x25_insert_socket() for the same socket, and corrupt x25_list.
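    The following sketch shows the corrected pattern. It assumes a
    hypothetical struct model_sock in which a C11 atomic_flag spinlock
    stands in for lock_sock()/release_sock(). The point is that the
    SOCK_ZAPPED test and the insertion sit in the same critical section.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket model. */
struct model_sock {
	atomic_flag lock;	/* stands in for the socket lock */
	bool zapped;		/* SOCK_ZAPPED: not yet bound and inserted */
	int source_addr;
};

/* Counts insertions; the kernel's x25_list has its own lock, which this
 * single-counter model does not reproduce. */
static int model_x25_list_len;

static void model_lock(struct model_sock *sk)
{
	while (atomic_flag_test_and_set_explicit(&sk->lock, memory_order_acquire))
		;	/* spin */
}

static void model_unlock(struct model_sock *sk)
{
	atomic_flag_clear_explicit(&sk->lock, memory_order_release);
}

/* Test and insert under one lock hold: two concurrent binds cannot both
 * see 'zapped' and insert the same socket twice. */
static int model_x25_bind(struct model_sock *sk, int addr)
{
	int rc = 0;

	model_lock(sk);
	if (sk->zapped) {
		sk->source_addr = addr;
		model_x25_list_len++;	/* x25_insert_socket() */
		sk->zapped = false;
	} else {
		rc = -EINVAL;
	}
	model_unlock(sk);
	return rc;
}
```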

    [1]
    watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.2:10492]
    Modules linked in:
    irq event stamp: 27515
    hardirqs last enabled at (27514): [] trace_hardirqs_on_thunk+0x1a/0x1c
    hardirqs last disabled at (27515): [] trace_hardirqs_off_thunk+0x1a/0x1c
    softirqs last enabled at (32): [] x25_get_neigh+0xa3/0xd0 net/x25/x25_link.c:336
    softirqs last disabled at (34): [] x25_find_socket+0x23/0x140 net/x25/af_x25.c:341
    CPU: 0 PID: 10492 Comm: syz-executor.2 Not tainted 5.0.0-rc7+ #88
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__sanitizer_cov_trace_pc+0x4/0x50 kernel/kcov.c:97
    Code: f4 ff ff ff e8 11 9f ea ff 48 c7 05 12 fb e5 08 00 00 00 00 e9 c8 e9 ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 8b 75 08 65 48 8b 04 25 40 ee 01 00 65 8b 15 38 0c 92 7e 81 e2
    RSP: 0018:ffff88806e94fc48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
    RAX: 1ffff1100d84dac5 RBX: 0000000000000001 RCX: ffffc90006197000
    RDX: 0000000000040000 RSI: ffffffff86324bf3 RDI: ffff88806c26d628
    RBP: ffff88806e94fc48 R08: ffff88806c1c6500 R09: fffffbfff1282561
    R10: fffffbfff1282560 R11: ffffffff89412b03 R12: ffff88806c26d628
    R13: ffff888090455200 R14: dffffc0000000000 R15: 0000000000000000
    FS: 00007f3a107e4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3a107e3db8 CR3: 00000000a5544000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    __x25_find_socket net/x25/af_x25.c:327 [inline]
    x25_find_socket+0x7d/0x140 net/x25/af_x25.c:342
    x25_new_lci net/x25/af_x25.c:355 [inline]
    x25_connect+0x380/0xde0 net/x25/af_x25.c:784
    __sys_connect+0x266/0x330 net/socket.c:1662
    __do_sys_connect net/socket.c:1673 [inline]
    __se_sys_connect net/socket.c:1670 [inline]
    __x64_sys_connect+0x73/0xb0 net/socket.c:1670
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457e29
    Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f3a107e3c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e29
    RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000005
    RBP: 000000000073c040 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a107e46d4
    R13: 00000000004be362 R14: 00000000004ceb98 R15: 00000000ffffffff
    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 10493 Comm: syz-executor.3 Not tainted 5.0.0-rc7+ #88
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline]
    RIP: 0010:queued_write_lock_slowpath+0x143/0x290 kernel/locking/qrwlock.c:86
    Code: 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 41 0f b6 55 00 38 d7 7c eb 84 d2 74 e7 48 89 df e8 cc aa 4e 00 eb dd be 04 00
    RSP: 0018:ffff888085c47bd8 EFLAGS: 00000206
    RAX: 0000000000000300 RBX: ffffffff89412b00 RCX: 1ffffffff1282560
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89412b00
    RBP: ffff888085c47c70 R08: 1ffffffff1282560 R09: fffffbfff1282561
    R10: fffffbfff1282560 R11: ffffffff89412b03 R12: 00000000000000ff
    R13: fffffbfff1282560 R14: 1ffff11010b88f7d R15: 0000000000000003
    FS: 00007fdd04086700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fdd04064db8 CR3: 0000000090be0000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    queued_write_lock include/asm-generic/qrwlock.h:104 [inline]
    do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203
    __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline]
    _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312
    x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267
    x25_bind+0x273/0x340 net/x25/af_x25.c:703
    __sys_bind+0x23f/0x290 net/socket.c:1481
    __do_sys_bind net/socket.c:1492 [inline]
    __se_sys_bind net/socket.c:1490 [inline]
    __x64_sys_bind+0x73/0xb0 net/socket.c:1490
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457e29

    Fixes: 90c27297a9bf ("X.25 remove bkl in bind")
    Signed-off-by: Eric Dumazet
    Cc: andrew hendry
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This callback was removed some time ago, also remove the documentation.

    Fixes: 1b6dd556c304 ("net: dsa: Remove prepare phase for FDB")
    Signed-off-by: Hauke Mehrtens
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Hauke Mehrtens
     
  • This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list
    when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net:
    bridge: remove ipv6 zero address check in mcast queries")

    The reason is that RFC 4541 is not a standard, only a suggestion.
    Currently we will elect 0.0.0.0 as Querier if there is no IP address
    configured on the bridge. If we do not add the port which receives a
    query with source 0.0.0.0 to the router list, IGMP reports will not be
    forwarded to the Querier, and IGMP data will not be forwarded to its
    destination either.

    As Nikolay suggested, revert this change first and add a boolopt API
    to disable non-zero election in the future if needed.

    Reported-by: Linus Lüssing
    Reported-by: Sebastian Gottschall
    Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
    Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries")
    Signed-off-by: Hangbin Liu
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • Just like commit e2ba732a1681 ("selftests: fib_tests: sleep after
    changing carrier"), wait one second to allow linkwatch to propagate the
    carrier change to the stack.

    There are two sets of carrier tests. The first slept after the carrier
    was set to off, and when the second set ran, it was likely that the
    linkwatch would be able to run again without much delay, reducing the
    likelihood of a race. However, if you run 'fib_tests.sh -t carrier' on a
    loop, you will quickly notice the failures.

    Sleeping in the second set of tests makes the failures go away.

    Cc: David Ahern
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     
    cards_found is a static variable, but when execution enters atl2_probe(),
    cards_found is set to zero. The value is then not consistent with the
    last probe, so the subsequent behavior is not what we expect.

    Signed-off-by: Mao Wenan
    Signed-off-by: David S. Miller

    Mao Wenan
     
  • Some Marvell Alaska PHYs support 2.5G, 5G and 10G BaseT links. Their
    default behaviour is to advertise all of these modes, but at the moment,
    only 10GBaseT is supported. To prevent link partners from establishing
    a link at 2.5G or 5G, clear those modes when configuring the aneg parameters.
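    The following sketch shows the masking step. The bit definitions are
    hypothetical and are not the kernel's ethtool link-mode bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical advertisement bits. */
#define MODEL_ADV_2500BASET	(1u << 0)
#define MODEL_ADV_5000BASET	(1u << 1)
#define MODEL_ADV_10000BASET	(1u << 2)

/* Drop the multi-gig modes the driver cannot bring up yet, so the link
 * partner never negotiates 2.5G or 5G. */
static uint32_t model_mv3310_adv(uint32_t adv)
{
	return adv & ~(uint32_t)(MODEL_ADV_2500BASET | MODEL_ADV_5000BASET);
}
```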

    Fixes: 20b2af32ff3f ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support")
    Signed-off-by: Maxime Chevallier
    Reported-by: Russell King
    Signed-off-by: David S. Miller

    Maxime Chevallier
     
  • Pull powerpc fix from Michael Ellerman:
    "One fix for an oops when using SRIOV, introduced by the recent changes
    to support compound IOMMU groups.

    Thanks to Alexey Kardashevskiy"

    * tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/powernv/sriov: Register IOMMU groups for VFs

    Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "Four small fixes: three in drivers and one in the core.

    The core fix is also minor in scope since the bug it fixes is only
    known to affect systems using SCSI reservations. Of the driver bugs,
    the libsas one is the most major because it can lead to multiple disks
    on the same expander not being exposed"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: core: reset host byte in DID_NEXUS_FAILURE case
    scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
    scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation
    scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task

    Linus Torvalds
     

23 Feb, 2019

22 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2019-02-23

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix a bug in BPF's LPM deletion logic to match correct prefix
    length, from Alban.

    2) Fix AF_XDP teardown by not destroying umem prematurely as it
    is still needed till all outstanding skbs are freed, from Björn.

    3) Fix unkillable BPF_PROG_TEST_RUN under preempt kernel by checking
    signal_pending() outside need_resched() condition which is never
    triggered there, from Stanislav.

    4) Fix two nfp JIT bugs, one in code emission for K-based xor, and
    another one to explicitly clear upper bits in alu32, from Jiong.

    5) Add bpf list address to maintainers file, from Daniel.
    ====================
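    Item 4 concerns the eBPF rule that 32-bit ALU operations zero the upper
    32 bits of the destination register. Below is a small sketch of that
    rule, with a hypothetical buggy variant for contrast. Neither function
    is nfp JIT code.

```c
#include <assert.h>
#include <stdint.h>

/* eBPF semantics: a BPF_ALU (32-bit) op computes on the low 32 bits and
 * zero-extends the result into the 64-bit destination register. */
static uint64_t model_alu32_xor_k(uint64_t dst, int32_t imm)
{
	return (uint64_t)((uint32_t)dst ^ (uint32_t)imm);
}

/* Hypothetical buggy code generation: XORs the low word but leaves the
 * stale upper word in place. */
static uint64_t model_buggy_alu32_xor_k(uint64_t dst, int32_t imm)
{
	uint32_t lo = (uint32_t)dst ^ (uint32_t)imm;

	return (dst & 0xffffffff00000000ull) | lo;
}
```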

    Signed-off-by: David S. Miller

    David S. Miller
     
  • …morris/linux-security

    Pull keys fixes from James Morris:
    "Two fixes from Eric Biggers"

    * 'fixes-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    KEYS: always initialize keyring_index_key::desc_len
    KEYS: user: Align the payload buffer

    Linus Torvalds
     
  • Pull power management fixes from Rafael Wysocki:
    "These fix a regression in the PM-runtime framework introduced by the
    recent switch-over of it to using hrtimers and a use-after-free
    introduced by one of the recent changes in the scmi-cpufreq driver.

    Specifics:

    - Use hrtimer_try_to_cancel() instead of hrtimer_cancel() in the
    PM-runtime framework to avoid a possible timer-related deadlock
    introduced recently (Vincent Guittot).

    - Reorder the scmi-cpufreq driver code to avoid accessing memory that
    has just been freed (Yangtao Li)"

    * tag 'pm-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM-runtime: Fix deadlock when canceling hrtimer
    cpufreq: scmi: Fix use-after-free in scmi_cpufreq_exit()

    Linus Torvalds
     
  • Pull ARM SoC fixes from Arnd Bergmann:
    "Only a handful of device tree fixes, all simple enough:

    NVIDIA Tegra:
    - Fix a regression for booting on chromebooks

    TI OMAP:
    - Two fixes for PHY mode on am335x reference boards

    Marvell mvebu:
    - A regression fix for Armada XP NAND flash controllers
    - An incorrect reset signal on the clearfog board"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
    ARM: tegra: Restore DT ABI on Tegra124 Chromebooks
    ARM: dts: am335x-evm: Fix PHY mode for ethernet
    ARM: dts: am335x-evmsk: Fix PHY mode for ethernet
    arm64: dts: clearfog-gt-8k: fix SGMII PHY reset signal
    ARM: dts: armada-xp: fix Armada XP boards NAND description

    Linus Torvalds
     
  • Pull ARC fixes from Vineet Gupta:
    "Fixes for ARC for 5.0, bunch of those are stable fodder anyways so
    sooner the better.

    - Fix memcpy to prevent prefetchw beyond end of buffer [Eugeniy]

    - Enable unaligned access early to prevent exceptions given newer gcc
    code gen [Eugeniy]

    - Tighten up uboot arg checking to prevent false negatives and also
    allow both jtag and bootloading to coexist w/o config option as
    needed by kernelCi folks [Eugeniy]

    - Set slab alignment to 8 for ARC to avoid unaligned atomic64_t
    [Alexey]

    - Disable regfile auto save on interrupts on HSDK platform due to a
    silicon issue [Vineet]

    - Avoid HS38x boot printing crash by not reading HS48x only reg
    [Vineet]"

    * tag 'arc-5.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARCv2: don't assume core 0x54 has dual issue
    ARC: define ARCH_SLAB_MINALIGN = 8
    ARC: enable uboot support unconditionally
    ARC: U-boot: check arguments paranoidly
    ARCv2: support manual regfile save on interrupts
    ARC: uacces: remove lp_start, lp_end from clobber list
    ARC: fix actionpoints configuration detection
    ARCv2: lib: memcpy: fix doing prefetchw outside of buffer
    ARCv2: Enable unaligned access in early ASM code

    Linus Torvalds
     
  • We recently created a bpf@vger.kernel.org list (https://lore.kernel.org/bpf/)
    for BPF related discussions, originally in context of BPF track at LSF/MM
    for topic discussions. It's *optional* but *desirable* to keep it in Cc for
    BPF related kernel/loader/llvm/tooling threads, meaning also infrastructure
    like llvm that sits on top of kernel but is crucial to BPF. In any case,
    netdev with its bpf delegate remains the primary list for patches, so
    nothing changes in the workflow. The main purpose is to raise awareness
    of the bpf@vger.kernel.org list, which folks can Cc for BPF-specific topics.

    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Pull parisc fixes from Helge Deller:
    "Fix ptrace syscall number modification which has been broken since
    kernel v4.5 and provide alternative email addresses for the remaining
    users of the retired parisc-linux.org email domain"

    * 'parisc-5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    CREDITS/MAINTAINERS: Retire parisc-linux.org email domain
    parisc: Fix ptrace syscall number modification

    Linus Torvalds
     
  • …/masahiroy/linux-kbuild

    Pull more Kbuild fixes from Masahiro Yamada:

    - fix scripts/kallsyms.c to correctly check too long symbol names

    - fix sh build error for the combination of CONFIG_OF_EARLY_FLATTREE=y
    and CONFIG_USE_BUILTIN_DTB=n

    * tag 'kbuild-fixes-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    sh: fix build error for invisible CONFIG_BUILTIN_DTB_SOURCE
    kallsyms: Handle too long symbols in kallsyms.c

    Linus Torvalds
     
  • Paolo Abeni says:

    ====================
    udp: a few fixes

    This series includes some UDP-related fixlets. All of them were
    pointed out by the sparse tool. The first two patches are annotation-only,
    while the last two cover some very unlikely races.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Similar to the previous commit, this addresses the same issue for
    ipv4: use a single fetch operation and use the correct RCU
    annotation.

    Fixes: e7cc082455cb ("udp: Support for error handlers of tunnels with arbitrary destination port")
    Signed-off-by: Paolo Abeni
    Acked-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Before dereferencing the encap pointer, commit e7cc082455cb ("udp: Support
    for error handlers of tunnels with arbitrary destination port") checks
    for a NULL value, but the two fetch operations can race with removal.
    Fix this by using a single access.
    Also fix a couple of type annotations to make sparse happy.

    Fixes: e7cc082455cb ("udp: Support for error handlers of tunnels with arbitrary destination port")
    Signed-off-by: Paolo Abeni
    Acked-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Paolo Abeni
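    For illustration, a minimal userspace C11 sketch of the single-fetch
    pattern described in this commit. The kernel uses its own primitives
    (e.g. READ_ONCE() or rcu_dereference()); here atomic_load_explicit()
    stands in for them, and encap_err_hook, call_encap_err() and
    sample_encap_err() are hypothetical names, not the UDP code's.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical error-handler hook that another thread may clear at any time. */
typedef int (*encap_err_fn)(int err);

static _Atomic(encap_err_fn) encap_err_hook;

/* Example handler, for illustration only. */
static int sample_encap_err(int err)
{
    return err + 1;
}

/* Racy version (two fetches): the hook may become NULL between the check
 * and the call.
 *
 *     if (encap_err_hook)
 *             return encap_err_hook(err);
 *
 * Fixed version: fetch the pointer once and only use the local copy. */
static int call_encap_err(int err)
{
    encap_err_fn fn = atomic_load_explicit(&encap_err_hook,
                                           memory_order_acquire);
    if (fn == NULL)
        return -1;  /* no handler registered */
    return fn(err);
}
```

    Whatever happens to encap_err_hook after the load, fn is either NULL
    (and checked) or a valid handler, so the check and the call can no
    longer disagree.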
     
  • The last argument of gue6_err_proto_handler() has a wrong type annotation;
    fix it and make sparse happy again.

    Fixes: b8a51b38e4d4 ("fou, fou6: ICMP error handlers for FoU and GUE")
    Signed-off-by: Paolo Abeni
    Acked-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • In commit 029a37434880 ("udp6: cleanup stats accounting in recvmsg()")
    I forgot to add the percpu annotation for the mib pointer. Add it, and
    make sparse happy.

    Fixes: 029a37434880 ("udp6: cleanup stats accounting in recvmsg()")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • KASAN found a use-after-free in fixed_mdio_bus_init. Commit
    0c692d07842a ("drivers/net/phy/mdio_bus.c: call put_device on
    device_register() failure") calls put_device() when device_register()
    fails, which gives up the last reference to the device and lets
    mdiobus_release() run, kfreeing the bus. However, most drivers call
    mdiobus_free() to free the bus when mdiobus_register() fails, so the
    bus is accessed again after it has been freed. This patch reverts that
    commit so that mdiobus_free() frees the bus.

    KASAN report details as below:

    BUG: KASAN: use-after-free in mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
    Read of size 4 at addr ffff8881dc824d78 by task syz-executor.0/3524

    CPU: 1 PID: 3524 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0xfa/0x1ce lib/dump_stack.c:113
    print_address_description+0x65/0x270 mm/kasan/report.c:187
    kasan_report+0x149/0x18d mm/kasan/report.c:317
    mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
    fixed_mdio_bus_init+0x283/0x1000 [fixed_phy]
    ? 0xffffffffc0e40000
    ? 0xffffffffc0e40000
    ? 0xffffffffc0e40000
    do_one_initcall+0xfa/0x5ca init/main.c:887
    do_init_module+0x204/0x5f6 kernel/module.c:3460
    load_module+0x66b2/0x8570 kernel/module.c:3808
    __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
    do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x462e99
    Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f6215c19c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
    RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
    RBP: 00007f6215c19c70 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6215c1a6bc
    R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004

    Allocated by task 3524:
    set_track mm/kasan/common.c:85 [inline]
    __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
    kmalloc include/linux/slab.h:545 [inline]
    kzalloc include/linux/slab.h:740 [inline]
    mdiobus_alloc_size+0x54/0x1b0 drivers/net/phy/mdio_bus.c:143
    fixed_mdio_bus_init+0x163/0x1000 [fixed_phy]
    do_one_initcall+0xfa/0x5ca init/main.c:887
    do_init_module+0x204/0x5f6 kernel/module.c:3460
    load_module+0x66b2/0x8570 kernel/module.c:3808
    __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
    do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 3524:
    set_track mm/kasan/common.c:85 [inline]
    __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
    slab_free_hook mm/slub.c:1409 [inline]
    slab_free_freelist_hook mm/slub.c:1436 [inline]
    slab_free mm/slub.c:2986 [inline]
    kfree+0xe1/0x270 mm/slub.c:3938
    device_release+0x78/0x200 drivers/base/core.c:919
    kobject_cleanup lib/kobject.c:662 [inline]
    kobject_release lib/kobject.c:691 [inline]
    kref_put include/linux/kref.h:67 [inline]
    kobject_put+0x146/0x240 lib/kobject.c:708
    put_device+0x1c/0x30 drivers/base/core.c:2060
    __mdiobus_register+0x483/0x560 drivers/net/phy/mdio_bus.c:382
    fixed_mdio_bus_init+0x26b/0x1000 [fixed_phy]
    do_one_initcall+0xfa/0x5ca init/main.c:887
    do_init_module+0x204/0x5f6 kernel/module.c:3460
    load_module+0x66b2/0x8570 kernel/module.c:3808
    __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
    do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8881dc824c80
    which belongs to the cache kmalloc-2k of size 2048
    The buggy address is located 248 bytes inside of
    2048-byte region [ffff8881dc824c80, ffff8881dc825480)
    The buggy address belongs to the page:
    page:ffffea0007720800 count:1 mapcount:0 mapping:ffff8881f6c02800 index:0x0 compound_mapcount: 0
    flags: 0x2fffc0000010200(slab|head)
    raw: 02fffc0000010200 0000000000000000 0000000500000001 ffff8881f6c02800
    raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8881dc824c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff8881dc824c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8881dc824d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8881dc824d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8881dc824e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

    Fixes: 0c692d07842a ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure")
    Signed-off-by: YueHaibing
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    YueHaibing
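    A simplified model of the ownership rule involved, assuming a single
    reference count with a release callback (struct bus, bus_register(),
    bus_free() and the other helpers are hypothetical stand-ins, not the
    MDIO API). Because the caller's error path frees the bus itself, a
    failed registration must leave the caller's reference alone.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a bus embedding a refcounted device. */
struct bus {
    int refcount;
};

static int frees;  /* how many buses have been released so far */

static void bus_release(struct bus *b)
{
    frees++;
    free(b);
}

/* Drop one reference; the last one releases the bus. */
static void bus_put(struct bus *b)
{
    if (--b->refcount == 0)
        bus_release(b);
}

static struct bus *bus_alloc(void)
{
    struct bus *b = calloc(1, sizeof(*b));
    if (b)
        b->refcount = 1;  /* the caller owns this initial reference */
    return b;
}

/* Simulated registration that always fails. It leaves the caller's
 * reference untouched; calling bus_put(b) here as well would free the bus
 * and turn the caller's bus_free() into a use-after-free. */
static int bus_register(struct bus *b)
{
    (void)b;
    return -1;
}

/* Caller-side cleanup, the analogue of mdiobus_free() on the error path. */
static void bus_free(struct bus *b)
{
    bus_put(b);
}
```

    With this split, exactly one release happens on the failure path, and
    it happens in the caller's cleanup.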
     
  • Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
    keep legacy software happy. This is similar to what was done for
    ipv4 in commit 709772e6e065 ("net: Fix routing tables with
    id > 255 for legacy software").

    Signed-off-by: Kalash Nainwal
    Signed-off-by: David S. Miller

    Kalash Nainwal
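    A sketch of the clamping rule, assuming the UAPI value RT_TABLE_COMPAT =
    252 from <linux/rtnetlink.h> (redefined locally so it builds without
    kernel headers). rtm_table is a single byte, while the full 32-bit table
    id is still reported in the RTA_TABLE attribute; legacy_rtm_table() is a
    hypothetical helper name.

```c
#include <stdint.h>

/* UAPI value from <linux/rtnetlink.h>, copied for a self-contained build. */
#define RT_TABLE_COMPAT 252

/* rtm_table can only hold ids 0..255; larger ids are reported to legacy
 * software as RT_TABLE_COMPAT. */
static uint8_t legacy_rtm_table(uint32_t table_id)
{
    return table_id < 256 ? (uint8_t)table_id : RT_TABLE_COMPAT;
}
```

    Tools that understand RTA_TABLE keep seeing the real id; older tools
    that only read rtm_table get a recognizable placeholder instead of a
    truncated value.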
     
  • Michael Chan says:

    ====================
    bnxt_en: firmware message delay fixes.

    We were seeing some intermittent firmware message timeouts in our lab and
    these 2 small patches fix them. Please apply to stable as well. Thanks.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The code waits up to 20 usec for the firmware response to complete
    once we've seen the valid response header in the buffer. It turns
    out that in some scenarios, this wait time is not long enough.
    Extend it to 150 usec and use usleep_range() instead of udelay().

    Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
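    A userspace sketch of a bounded wait that sleeps between polls instead
    of spinning, assuming nanosleep() as a stand-in for usleep_range() and a
    hypothetical 5 usec poll step. wait_resp_valid() and the constants are
    illustrative only, not the bnxt_en code.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical constants for illustration. */
#define RESP_VALID_WAIT_US 150u  /* upper bound on the wait (was 20 usec) */
#define RESP_POLL_STEP_US    5u  /* sleep between checks */

/* Poll a "response complete" byte written by a device, sleeping between
 * checks. Returns 0 once the byte is set, or -1 after roughly
 * RESP_VALID_WAIT_US microseconds of sleeping. */
static int wait_resp_valid(const volatile uint8_t *valid)
{
    const struct timespec step = {
        .tv_sec = 0,
        .tv_nsec = RESP_POLL_STEP_US * 1000L,
    };

    for (unsigned waited = 0; waited < RESP_VALID_WAIT_US;
         waited += RESP_POLL_STEP_US) {
        if (*valid)
            return 0;
        nanosleep(&step, NULL);
    }
    return *valid ? 0 : -1;  /* one last check after the budget is spent */
}
```

    Sleeping rather than busy-waiting is what makes a longer bound
    affordable: the CPU is not held for the full 150 usec.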
     
  • The logic that polls for the firmware message response uses a shorter
    sleep interval for the first few passes. However, because of a typo, it
    was using the wrong (larger) counter to decide which passes get the
    short sleep. The result is a slightly shorter timeout period for these
    firmware messages than intended. Fix it by using the proper counter.

    Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
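    To see why the wrong counter shortens the timeout, here is a
    deterministic sketch with made-up constants (none of these names or
    values come from the driver): the pass budget is computed assuming
    SHORT_PASSES short sleeps, so selecting the short sleep with a larger
    counter makes more passes short and the total wait smaller.

```c
/* Hypothetical constants for illustration. */
#define SHORT_PASSES     3u   /* passes budgeted with the short sleep */
#define LARGER_COUNTER   5u   /* a larger counter used by mistake */
#define SHORT_SLEEP_US  25u
#define LONG_SLEEP_US  100u

/* Number of passes needed to cover timeout_us, assuming the first
 * SHORT_PASSES passes sleep SHORT_SLEEP_US and the rest LONG_SLEEP_US. */
static unsigned pass_budget(unsigned timeout_us)
{
    unsigned short_total = SHORT_PASSES * SHORT_SLEEP_US;
    unsigned rest = timeout_us > short_total ? timeout_us - short_total : 0;

    return SHORT_PASSES + (rest + LONG_SLEEP_US - 1) / LONG_SLEEP_US;
}

/* Total time slept over `passes` passes when pass i uses the short sleep
 * while i < short_threshold. */
static unsigned total_sleep_us(unsigned passes, unsigned short_threshold)
{
    unsigned total = 0;

    for (unsigned i = 0; i < passes; i++)
        total += (i < short_threshold) ? SHORT_SLEEP_US : LONG_SLEEP_US;
    return total;
}
```

    With a 1000 usec budget, pass_budget() yields 13 passes; they sleep
    1075 usec in total when the threshold is SHORT_PASSES, but only 925
    usec when LARGER_COUNTER is used by mistake.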
     
  • Jiong Wang says:

    ====================
    Code-gen for BPF_ALU | BPF_XOR | BPF_K is wrong when imm is -1; also,
    for ALU32 the high 32 bits of the 64-bit register should always be cleared.

    This set fixes both bugs.
    ====================

    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • The NFP BPF JIT compiler does a couple of small optimizations when jitting
    ALU imm instructions; some of these can avoid emitting code entirely, for
    example:

    A & -1 = A
    A | 0 = A
    A ^ 0 = A

    However, for ALU32, the high 32 bits of the 64-bit register should still
    be cleared according to ISA semantics.

    Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator")
    Reviewed-by: Jakub Kicinski
    Signed-off-by: Jiong Wang
    Signed-off-by: Daniel Borkmann

    Jiong Wang
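    A minimal C model of the ALU32 rule, assuming the BPF semantics stated
    above that a 32-bit ALU op writes a zero-extended result: even when the
    operation itself is skipped (here A & -1), the high 32 bits must still be
    cleared. alu32_and_imm() and alu32_and_imm_opt() are hypothetical names.

```c
#include <stdint.h>

/* BPF ALU32 semantics: operate on the low 32 bits and zero-extend the
 * result into the 64-bit destination register. */
static uint64_t alu32_and_imm(uint64_t reg, int32_t imm)
{
    return (uint64_t)((uint32_t)reg & (uint32_t)imm);
}

/* "Optimized" variant: A & -1 leaves the low 32 bits unchanged, so the AND
 * can be skipped, but the high 32 bits must still be cleared. Returning
 * reg unmodified here would reproduce the bug. */
static uint64_t alu32_and_imm_opt(uint64_t reg, int32_t imm)
{
    if (imm == -1)
        return reg & 0xffffffffULL;
    return alu32_and_imm(reg, imm);
}
```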
     
  • The intended optimization should be A ^ 0 = A, not A ^ -1 = A.

    Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator")
    Reviewed-by: Jakub Kicinski
    Signed-off-by: Jiong Wang
    Signed-off-by: Daniel Borkmann

    Jiong Wang
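    A small sketch of the identity check for the three ops listed in the ALU
    imm optimizations above, using a hypothetical enum: XOR is an identity
    only for imm == 0, while imm == -1 flips every bit (A ^ -1 == ~A).

```c
#include <stdbool.h>
#include <stdint.h>

enum alu_op { ALU_AND, ALU_OR, ALU_XOR };  /* hypothetical subset */

/* True if "A op imm" leaves the low 32 bits of A unchanged, so a JIT may
 * skip emitting the operation (ALU32 must still zero-extend). */
static bool alu_imm_is_identity(enum alu_op op, int32_t imm)
{
    switch (op) {
    case ALU_AND: return imm == -1;  /* A & -1 == A */
    case ALU_OR:  return imm == 0;   /* A | 0  == A */
    case ALU_XOR: return imm == 0;   /* A ^ 0  == A, but A ^ -1 == ~A */
    }
    return false;
}

/* Reference 32-bit evaluation, used to check the table above. */
static uint32_t alu_imm32(enum alu_op op, uint32_t a, int32_t imm)
{
    uint32_t k = (uint32_t)imm;

    switch (op) {
    case ALU_AND: return a & k;
    case ALU_OR:  return a | k;
    case ALU_XOR: return a ^ k;
    }
    return a;
}
```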
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    Three more fixes:
    * mac80211 mesh code wasn't allocating SKB tailroom properly
    in some cases
    * tx_sk_pacing_shift should be 7 for better performance
    * mac80211_hwsim wasn't propagating genlmsg_reply() errors
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
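    Background for the tx_sk_pacing_shift item, as a hedged sketch: in TCP
    small queues the kernel bounds a socket's queued bytes by roughly
    sk_pacing_rate >> sk_pacing_shift (with a floor of two skb truesizes,
    omitted here), so a shift of 7 allows about 1/128 s worth of data at the
    current pacing rate. tsq_limit_bytes() is a hypothetical helper.

```c
#include <stdint.h>

/* Approximate TSQ byte limit: pacing rate (bytes per second) shifted right
 * by the pacing shift, i.e. rate * 2^-shift seconds of data. */
static uint64_t tsq_limit_bytes(uint64_t pacing_rate_Bps, unsigned shift)
{
    return pacing_rate_Bps >> shift;
}
```

    For a pacing rate of 100,000,000 bytes/s, a shift of 7 gives 781250
    bytes, versus 390625 bytes for a shift of 8.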