06 Jan, 2020

2 commits

  • Linus Torvalds
     
  • Pull RISC-V fixes from Paul Walmsley:
    "Several fixes for RISC-V:

    - Fix function graph trace support

    - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions
    with macros elsewhere in the Linux kernel tree named "IRQ_TIMER"

    - Use __pa_symbol() when computing the physical address of a kernel
    symbol, rather than __pa()

    - Mark the RISC-V port as supporting GCOV

    One DT addition:

    - Describe the L2 cache controller in the FU540 DT file

    One documentation update:

    - Add patch acceptance guideline documentation"

    * tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
    Documentation: riscv: add patch acceptance guidelines
    riscv: prefix IRQ_ macro names with an RV_ namespace
    clocksource: riscv: add notrace to riscv_sched_clock
    riscv: ftrace: correct the condition logic in function graph tracer
    riscv: dts: Add DT support for SiFive L2 cache controller
    riscv: gcov: enable gcov for RISC-V
    riscv: mm: use __pa_symbol for kernel symbols

    Linus Torvalds
     

05 Jan, 2020

26 commits

  • Formalize, in kernel documentation, the patch acceptance policy for
    arch/riscv. In summary, it states that as maintainers, we plan to
    only accept patches for new modules or extensions that have been
    frozen or ratified by the RISC-V Foundation.

    We've been following these guidelines for the past few months. In the
    meantime, we've received quite a bit of feedback that it would be
    helpful to have these guidelines formally documented.

    Based on a suggestion from Matthew Wilcox, we also add a link to this
    file to Documentation/process/index.rst, to make this document easier
    to find. The format of this document has also been changed to align
    to the format outlined in the maintainer entry profiles, in accordance
    with comments from Jon Corbet and Dan Williams.

    Signed-off-by: Paul Walmsley
    Reviewed-by: Palmer Dabbelt
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: Krste Asanovic
    Cc: Andrew Waterman
    Cc: Matthew Wilcox
    Cc: Dan Williams
    Cc: Jonathan Corbet

    Paul Walmsley
     
  • "IRQ_TIMER", used in the arch/riscv CSR header file, is a sufficiently
    generic macro name that it's used by several source files across the
    Linux code base. Some of these other files ultimately include the
    arch/riscv CSR include file, causing collisions. Fix by prefixing the
    RISC-V csr.h IRQ_ macro names with an RV_ prefix.
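    A minimal sketch of what the renamed constants might look like, assuming the
    interrupt cause numbers from the RISC-V privileged spec (1/3 software, 5/7
    timer, 9/11 external, for S-mode/M-mode) and a CONFIG_RISCV_M_MODE switch
    for the mode-dependent alias. The IE_TIE mask and the unrelated IRQ_TIMER
    definition are illustrative assumptions. The point is that both names can
    now coexist in one translation unit.

```c
/* Interrupt cause numbers, now namespaced with RV_
 * (S-mode and M-mode variants, per the RISC-V privileged spec). */
#define RV_IRQ_S_SOFT   1
#define RV_IRQ_M_SOFT   3
#define RV_IRQ_S_TIMER  5
#define RV_IRQ_M_TIMER  7
#define RV_IRQ_S_EXT    9
#define RV_IRQ_M_EXT    11

/* Mode-dependent alias: M-mode kernels use the machine-mode numbers. */
#ifdef CONFIG_RISCV_M_MODE
# define RV_IRQ_TIMER   RV_IRQ_M_TIMER
#else
# define RV_IRQ_TIMER   RV_IRQ_S_TIMER
#endif

/* Illustrative interrupt-enable bit derived from the cause number. */
#define IE_TIE          (1UL << RV_IRQ_TIMER)

/* Hypothetical unrelated header elsewhere in the tree. Before the rename,
 * a second IRQ_TIMER definition with a different value would clash here. */
#define IRQ_TIMER       0
```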

    Fixes: a4c3733d32a72 ("riscv: abstract out CSR names for supervisor vs machine mode")
    Reported-by: Olof Johansson
    Acked-by: Olof Johansson
    Signed-off-by: Paul Walmsley

    Paul Walmsley
     
    When the function graph tracer is enabled, it reads the tracing clock in
    ftrace_push_return_trace(), which eventually calls riscv_sched_clock()
    to get the clock value. If riscv_sched_clock() is not marked 'notrace',
    it calls back into ftrace_push_return_trace(), causing an infinite
    loop.

    The resulting failure is as follows:

    command: echo function_graph >current_tracer
    [ 46.176787] Unable to handle kernel paging request at virtual address ffffffe04fb38c48
    [ 46.177309] Oops [#1]
    [ 46.177478] Modules linked in:
    [ 46.177770] CPU: 0 PID: 256 Comm: $d Not tainted 5.5.0-rc1 #47
    [ 46.177981] epc: ffffffe00035e59a ra : ffffffe00035e57e sp : ffffffe03a7569b0
    [ 46.178216] gp : ffffffe000d29b90 tp : ffffffe03a756180 t0 : ffffffe03a756968
    [ 46.178430] t1 : ffffffe00087f408 t2 : ffffffe03a7569a0 s0 : ffffffe03a7569f0
    [ 46.178643] s1 : ffffffe00087f408 a0 : 0000000ac054cda4 a1 : 000000000087f411
    [ 46.178856] a2 : 0000000ac054cda4 a3 : 0000000000373ca0 a4 : ffffffe04fb38c48
    [ 46.179099] a5 : 00000000153e22a8 a6 : 00000000005522ff a7 : 0000000000000005
    [ 46.179338] s2 : ffffffe03a756a90 s3 : ffffffe00032811c s4 : ffffffe03a756a58
    [ 46.179570] s5 : ffffffe000d29fe0 s6 : 0000000000000001 s7 : 0000000000000003
    [ 46.179809] s8 : 0000000000000003 s9 : 0000000000000002 s10: 0000000000000004
    [ 46.180053] s11: 0000000000000000 t3 : 0000003fc815749c t4 : 00000000000efc90
    [ 46.180293] t5 : ffffffe000d29658 t6 : 0000000000040000
    [ 46.180482] status: 0000000000000100 badaddr: ffffffe04fb38c48 cause: 000000000000000f
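    The recursion described above can be modelled in plain userspace C. This
    is a simulation, not kernel code: push_return_trace_sim() and
    sched_clock_sim() are hypothetical stand-ins for ftrace_push_return_trace()
    and riscv_sched_clock(), a bool flag stands in for the notrace attribute,
    and MAX_DEPTH replaces the stack exhaustion that actually ends the loop in
    the kernel.

```c
#include <stdbool.h>

#define MAX_DEPTH 64   /* stand-in for running out of kernel stack */

static int depth;
static bool overflowed;

static void push_return_trace_sim(bool clock_is_notrace);

/* Stand-in for riscv_sched_clock(). A traced function enters the tracer
 * again before doing its own work; a notrace one does not. */
static unsigned long long sched_clock_sim(bool notrace)
{
    static unsigned long long ticks;

    if (!notrace)
        push_return_trace_sim(notrace);
    return ++ticks;
}

/* Stand-in for ftrace_push_return_trace(): timestamps each traced call. */
static void push_return_trace_sim(bool clock_is_notrace)
{
    if (++depth > MAX_DEPTH)
        overflowed = true;   /* the kernel has no such guard and crashes */
    else
        (void)sched_clock_sim(clock_is_notrace);
    depth--;
}

/* Returns true if tracing a single function recurses without bound. */
bool tracing_recurses(bool clock_is_notrace)
{
    depth = 0;
    overflowed = false;
    push_return_trace_sim(clock_is_notrace);
    return overflowed;
}
```

    With the clock marked notrace, the tracer calls the clock exactly once and
    returns; without it, every clock read re-enters the tracer.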

    Signed-off-by: Zong Li
    Reviewed-by: Steven Rostedt (VMware)
    [paul.walmsley@sifive.com: cleaned up patch description]
    Fixes: 92e0d143fdef ("clocksource/drivers/riscv_timer: Provide the sched_clock")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paul Walmsley

    Zong Li
     
  • Merge misc fixes from Andrew Morton:
    "17 fixes"

    * emailed patches from Andrew Morton :
    hexagon: define ioremap_uc
    ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
    ocfs2: call journal flush to mark journal as empty after journal recovery when mount
    mm/hugetlb: defer freeing of huge pages if in non-task context
    mm/gup: fix memory leak in __gup_benchmark_ioctl
    mm/oom: fix pgtables units mismatch in Killed process message
    fs/posix_acl.c: fix kernel-doc warnings
    hexagon: work around compiler crash
    hexagon: parenthesize registers in asm predicates
    fs/namespace.c: make to_mnt_ns() static
    fs/nsfs.c: include headers for missing declarations
    fs/direct-io.c: include fs/internal.h for missing prototype
    mm: move_pages: return valid node id in status if the page is already on the target node
    memcg: account security cred as well to kmemcg
    kcov: fix struct layout for kcov_remote_arg
    mm/zsmalloc.c: fix the migrated zspage statistics.
    mm/memory_hotplug: shrink zones when offlining memory

    Linus Torvalds
     
  • …git/jj/linux-apparmor

    Pull apparmor fixes from John Johansen:

    - performance regression: only get a label reference if the fast path
    check fails

    - fix aa_xattrs_match() may sleep while holding a RCU lock

    - fix bind mounts aborting with -ENOMEM

    * tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
    apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
    apparmor: only get a label reference if the fast path check fails
    apparmor: fix bind mounts aborting with -ENOMEM

    Linus Torvalds
     
    aa_xattrs_match() unfortunately calls vfs_getxattr_alloc() from a
    context protected by rcu_read_lock(). This is not allowed, because
    vfs_getxattr_alloc() may sleep regardless of the gfp_t value passed
    to it.

    Fix this by breaking the rcu_read_lock on the policy search when the
    xattr match feature is requested, and restarting the search if a
    policy change occurs.
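    A generic sketch of the restart pattern described above, under stated
    assumptions: a pthread mutex stands in for the RCU read-side section, a
    revision counter detects policy changes, and struct policy_db,
    xattr_matches() and find_attach_sketch() are hypothetical names, not
    AppArmor's code. The sleepable check runs unlocked, and the search starts
    over if the revision moved in the meantime.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy set; the mutex stands in for rcu_read_lock(). */
struct policy_db {
    pthread_mutex_t lock;
    unsigned long revision;   /* bumped whenever the policy changes */
    const int *xattr_ids;     /* per-profile value a file must carry */
    size_t count;
    int pending_reloads;      /* test hook: reloads that race with us */
};

/* Stand-in for vfs_getxattr_alloc() plus the compare. It may sleep,
 * so it must run without the lock held. */
static bool xattr_matches(struct policy_db *db, int id, int file_value)
{
    pthread_mutex_lock(&db->lock);
    if (db->pending_reloads > 0) {   /* simulate a concurrent policy load */
        db->revision++;
        db->pending_reloads--;
    }
    pthread_mutex_unlock(&db->lock);
    return id == file_value;
}

/* Returns the first matching id or -1; *restarts counts search restarts. */
int find_attach_sketch(struct policy_db *db, int file_value, int *restarts)
{
    *restarts = 0;
restart:
    pthread_mutex_lock(&db->lock);
    unsigned long rev = db->revision;
    for (size_t i = 0; i < db->count; i++) {
        int id = db->xattr_ids[i];

        pthread_mutex_unlock(&db->lock);   /* break the read-side section */
        bool ok = xattr_matches(db, id, file_value);
        pthread_mutex_lock(&db->lock);

        if (db->revision != rev) {         /* policy changed: start over */
            pthread_mutex_unlock(&db->lock);
            (*restarts)++;
            goto restart;
        }
        if (ok) {
            pthread_mutex_unlock(&db->lock);
            return id;
        }
    }
    pthread_mutex_unlock(&db->lock);
    return -1;
}
```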

    Fixes: 8e51f9087f40 ("apparmor: Add support for attaching profiles via xattr, presence and value")
    Reported-by: Jia-Ju Bai
    Reported-by: Al Viro
    Signed-off-by: John Johansen

    John Johansen
     
  • Pull MIPS fixes from Paul Burton:
    "A collection of MIPS fixes:

    - Fill the struct cacheinfo shared_cpu_map field with sensible
    values, notably avoiding issues with perf which was unhappy in the
    absence of these values.

    - A boot fix for Loongson 2E & 2F machines which was fallout from
    some refactoring performed this cycle.

    - A Kconfig dependency fix for the Loongson CPU HWMon driver.

    - A couple of VDSO fixes, ensuring gettimeofday() behaves
    appropriately for kernel configurations that don't include support
    for a clocksource the VDSO can use & fixing the calling convention
    for the n32 & n64 VDSOs which would previously clobber the $gp/$28
    register.

    - A build fix for vmlinuz compressed images which were
    inappropriately building with -fsanitize-coverage despite not being
    part of the kernel proper, then failing to link due to the missing
    __sanitizer_cov_trace_pc() function.

    - A couple of eBPF JIT fixes, including disabling it for MIPS32 due
    to a large number of issues with the code generated there &
    reflecting ISA dependencies in Kconfig to enforce that systems
    which don't support the JIT must include the interpreter"

    * tag 'mips_fixes_5.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
    MIPS: Avoid VDSO ABI breakage due to global register variable
    MIPS: BPF: eBPF JIT: check for MIPS ISA compliance in Kconfig
    MIPS: BPF: Disable MIPS32 eBPF JIT
    MIPS: Prevent link failure with kcov instrumentation
    MIPS: Kconfig: Use correct form for 'depends on'
    mips: Fix gettimeofday() in the vdso library
    MIPS: Fix boot on Fuloong2 systems
    mips: cacheinfo: report shared CPU map

    Linus Torvalds
     
    Similar to commit 38e45d81d14e ("sparc64: implement ioremap_uc"), define
    ioremap_uc for hexagon to avoid errors from
    -Wimplicit-function-declaration.

    Link: http://lkml.kernel.org/r/20191209222956.239798-2-ndesaulniers@google.com
    Link: https://github.com/ClangBuiltLinux/linux/issues/797
    Fixes: e537654b7039 ("lib: devres: add a helper function for ioremap_uc")
    Signed-off-by: Nick Desaulniers
    Suggested-by: Nathan Chancellor
    Acked-by: Brian Cain
    Cc: Lee Jones
    Cc: Andy Shevchenko
    Cc: Tuowen Zhao
    Cc: Mika Westerberg
    Cc: Luis Chamberlain
    Cc: Greg Kroah-Hartman
    Cc: Alexios Zavras
    Cc: Allison Randal
    Cc: Will Deacon
    Cc: Richard Fontana
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Boqun Feng
    Cc: Ingo Molnar
    Cc: Geert Uytterhoeven
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     
    Because ocfs2_get_dlm_debug() is called one time too few here, the
    ocfs2 file system triggers a system crash, usually after it has been
    unmounted.

    The crash is caused by generic memory corruption, so the backtraces
    are not always the same. For example:

    ocfs2: Unmounting device (253,16) on (node 172167785)
    general protection fault: 0000 [#1] SMP PTI
    CPU: 3 PID: 14107 Comm: fence_legacy Kdump:
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    RIP: 0010:__kmalloc+0xa5/0x2a0
    Code: 00 00 4d 8b 07 65 4d 8b
    RSP: 0018:ffffaa1fc094bbe8 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: d310a8800d7a3faf RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff96e68fc036c0
    RBP: d310a8800d7a3faf R08: ffff96e6ffdb10a0 R09: 00000000752e7079
    R10: 000000000001c513 R11: 0000000004091041 R12: 0000000000000dc0
    R13: 0000000000000039 R14: ffff96e68fc036c0 R15: ffff96e68fc036c0
    FS: 00007f699dfba540(0000) GS:ffff96e6ffd80000(0000) knlGS:00000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f3a9d9b768 CR3: 000000002cd1c000 CR4: 00000000000006e0
    Call Trace:
    ext4_htree_store_dirent+0x35/0x100 [ext4]
    htree_dirblock_to_tree+0xea/0x290 [ext4]
    ext4_htree_fill_tree+0x1c1/0x2d0 [ext4]
    ext4_readdir+0x67c/0x9d0 [ext4]
    iterate_dir+0x8d/0x1a0
    __x64_sys_getdents+0xab/0x130
    do_syscall_64+0x60/0x1f0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f699d33a9fb

    This regression problem was introduced by commit e581595ea29c ("ocfs: no
    need to check return value of debugfs_create functions").

    Link: http://lkml.kernel.org/r/20191225061501.13587-1-ghe@suse.com
    Fixes: e581595ea29c ("ocfs: no need to check return value of debugfs_create functions")
    Signed-off-by: Gang He
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Cc: [5.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gang He
     
    If the journal is dirty at mount time, it is replayed, but the jbd2
    superblock log tail cannot be updated to mark a new start, because
    JBD2_ABORT has already been set in journal->j_flag by
    journal_init_common().

    When a new transaction is committed, it is recorded starting at block 1
    (journal->j_tail is set to 1 in journal_reset()). If another emergency
    restart happens before the journal superblock is updated, the newly
    recorded transaction will not be replayed on the next mount.

    The following steps describe this procedure in detail.
    1. mount and touch some files
    2. these transactions are committed to journal area but not checkpointed
    3. emergency restart
    4. mount again and its journals are replayed
    5. journal super block's first s_start is 1, but its s_seq is not updated
    6. touch a new file and its trans is committed but not checkpointed
    7. emergency restart again
    8. mount and journal is dirty, but trans committed in 6 will not be
    replayed.

    This problem is easy to hit when the LUN is used by only one node. If
    it is used by multiple nodes, another node replays the journal and its
    journal superblock is updated after recovery, as this patch does
    (ocfs2_recover_node->ocfs2_replay_journal).

    Touching a new file after the journal has been replayed can produce the
    following jbd2 journal, in which seq 15 is the first valid commit while
    the journal superblock still records 13 as the first seq.

    logdump:
    Block 0: Journal Superblock
    Seq: 0 Type: 4 (JBD2_SUPERBLOCK_V2)
    Blocksize: 4096 Total Blocks: 32768 First Block: 1
    First Commit ID: 13 Start Log Blknum: 1
    Error: 0
    Feature Compat: 0
    Feature Incompat: 2 block64
    Feature RO compat: 0
    Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
    FS Share Cnt: 1 Dynamic Superblk Blknum: 0
    Per Txn Block Limit Journal: 0 Data: 0

    Block 1: Journal Commit Block
    Seq: 14 Type: 2 (JBD2_COMMIT_BLOCK)

    Block 2: Journal Descriptor
    Seq: 15 Type: 1 (JBD2_DESCRIPTOR_BLOCK)
    No. Blocknum Flags
    0. 587 none
    UUID: 00000000000000000000000000000000
    1. 8257792 JBD2_FLAG_SAME_UUID
    2. 619 JBD2_FLAG_SAME_UUID
    3. 24772864 JBD2_FLAG_SAME_UUID
    4. 8257802 JBD2_FLAG_SAME_UUID
    5. 513 JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
    ...
    Block 7: Inode
    Inode: 8257802 Mode: 0640 Generation: 57157641 (0x3682809)
    FS Generation: 2839773110 (0xa9437fb6)
    CRC32: 00000000 ECC: 0000
    Type: Regular Attr: 0x0 Flags: Valid
    Dynamic Features: (0x1) InlineData
    User: 0 (root) Group: 0 (root) Size: 7
    Links: 1 Clusters: 0
    ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
    atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019
    mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
    dtime: 0x0 -- Thu Jan 1 08:00:00 1970
    ...
    Block 9: Journal Commit Block
    Seq: 15 Type: 2 (JBD2_COMMIT_BLOCK)

    The following is the recovery log produced when the journal above is
    recovered on the next mount.

    syslog:
    ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
    fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

    Because the first commit seq recorded in the journal superblock (13) is
    inconsistent with the one recorded in block 1 (seq 14), journal recovery
    terminates before seq 15 even though it is an unbroken commit, and inode
    8257802, a newly created file, is lost.

    Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
    Signed-off-by: Kai Li
    Reviewed-by: Joseph Qi
    Reviewed-by: Changwei Ge
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kai Li
     
  • The following lockdep splat was observed when a certain hugetlbfs test
    was run:

    ================================
    WARNING: inconsistent lock state
    4.18.0-159.el8.x86_64+debug #1 Tainted: G W --------- - -
    --------------------------------
    inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    swapper/30/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    ffffffff9acdc038 (hugetlb_lock){+.?.}, at: free_huge_page+0x36f/0xaa0
    {SOFTIRQ-ON-W} state was registered at:
    lock_acquire+0x14f/0x3b0
    _raw_spin_lock+0x30/0x70
    __nr_hugepages_store_common+0x11b/0xb30
    hugetlb_sysctl_handler_common+0x209/0x2d0
    proc_sys_call_handler+0x37f/0x450
    vfs_write+0x157/0x460
    ksys_write+0xb8/0x170
    do_syscall_64+0xa5/0x4d0
    entry_SYSCALL_64_after_hwframe+0x6a/0xdf
    irq event stamp: 691296
    hardirqs last enabled at (691296): [] _raw_spin_unlock_irqrestore+0x4b/0x60
    hardirqs last disabled at (691295): [] _raw_spin_lock_irqsave+0x22/0x81
    softirqs last enabled at (691284): [] irq_enter+0xc3/0xe0
    softirqs last disabled at (691285): [] irq_exit+0x23e/0x2b0

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(hugetlb_lock);

    lock(hugetlb_lock);

    *** DEADLOCK ***
    :
    Call Trace:

    __lock_acquire+0x146b/0x48c0
    lock_acquire+0x14f/0x3b0
    _raw_spin_lock+0x30/0x70
    free_huge_page+0x36f/0xaa0
    bio_check_pages_dirty+0x2fc/0x5c0
    clone_endio+0x17f/0x670 [dm_mod]
    blk_update_request+0x276/0xe50
    scsi_end_request+0x7b/0x6a0
    scsi_io_completion+0x1c6/0x1570
    blk_done_softirq+0x22e/0x350
    __do_softirq+0x23d/0xad8
    irq_exit+0x23e/0x2b0
    do_IRQ+0x11a/0x200
    common_interrupt+0xf/0xf

    Both the hugetlb_lock and the subpool lock can be acquired in
    free_huge_page(). One way to solve the problem is to make both locks
    irq-safe. However, Mike Kravetz had learned that the hugetlb_lock is
    held for a linear scan of ALL hugetlb pages during a cgroup reparenting
    operation. That is too long to keep irqs disabled, unless hugetlb_lock
    can be broken down into finer-grained locks with shorter hold times.

    Another alternative is to defer the freeing to a workqueue job. This
    patch implements the deferred freeing by adding a free_hpage_workfn()
    work function to do the actual freeing. The free_huge_page() call in a
    non-task context saves the page to be freed in the hpage_freelist linked
    list in a lockless manner using the llist APIs.

    The generic workqueue is used to process the work, but a dedicated
    workqueue can be used instead if it is desirable to have the huge page
    freed ASAP.

    Thanks to Kirill Tkhai for suggesting the use of the llist APIs, which
    simplify the code.
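    The deferral described above can be approximated in userspace with C11
    atomics. This is an analogue, not the kernel's llist: lpush(), ldel_all()
    and drain_deferred() are hypothetical names mirroring llist_add(),
    llist_del_all() and the work function. As with llist_add(), the push
    reports whether the list was empty, which is when the worker would need
    to be queued.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Userspace analogue of struct llist_node / llist_head. */
struct lnode {
    struct lnode *next;
};

struct lhead {
    _Atomic(struct lnode *) first;
};

void lhead_init(struct lhead *h)
{
    atomic_init(&h->first, NULL);
}

/* Lock-free push, safe from any context (like llist_add()).
 * Returns 1 if the list was empty, i.e. the worker should be queued. */
int lpush(struct lnode *n, struct lhead *h)
{
    struct lnode *first = atomic_load(&h->first);

    do {
        n->next = first;   /* on CAS failure, first holds the new head */
    } while (!atomic_compare_exchange_weak(&h->first, &first, n));
    return first == NULL;
}

/* Detach every queued entry in one step (like llist_del_all()). */
struct lnode *ldel_all(struct lhead *h)
{
    return atomic_exchange(&h->first, NULL);
}

/* Worker body: runs in task context, where taking a non-irq-safe lock
 * is fine. Here it only counts the entries it would free. */
int drain_deferred(struct lhead *h)
{
    int freed = 0;

    for (struct lnode *n = ldel_all(h); n != NULL; ) {
        struct lnode *next = n->next;   /* read before n is released */
        /* real code: take the lock and free the huge page behind n */
        n = next;
        freed++;
    }
    return freed;
}
```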

    Link: http://lkml.kernel.org/r/20191217170331.30893-1-longman@redhat.com
    Signed-off-by: Waiman Long
    Reviewed-by: Mike Kravetz
    Acked-by: Davidlohr Bueso
    Acked-by: Michal Hocko
    Reviewed-by: Kirill Tkhai
    Cc: Aneesh Kumar K.V
    Cc: Matthew Wilcox
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • In the implementation of __gup_benchmark_ioctl() the allocated pages
    should be released before returning in case of an invalid cmd. Release
    pages via kvfree().
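    The fixed control flow follows the usual single-exit cleanup pattern. A
    userspace sketch, assuming hypothetical command codes and using
    calloc()/free() in place of the kernel's kvmalloc-family allocation and
    kvfree(): every path after the allocation, including the invalid-cmd
    path, goes through the same free.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical command codes; the real ioctl numbers differ. */
enum { BENCH_GET = 1, BENCH_PIN = 2 };

long bench_ioctl_sketch(unsigned int cmd, size_t nr_pages)
{
    long ret = 0;
    void **pages = calloc(nr_pages, sizeof(*pages));

    if (!pages)
        return -ENOMEM;

    switch (cmd) {
    case BENCH_GET:
    case BENCH_PIN:
        /* Pin the pages and time the operation (omitted). */
        break;
    default:
        ret = -EINVAL;   /* the buggy version returned -1 and leaked pages */
        goto free_pages;
    }

    /* Unpin the pages and report timings (omitted). */
free_pages:
    free(pages);         /* kvfree() in the kernel */
    return ret;
}
```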

    [akpm@linux-foundation.org: rework code flow, return -EINVAL rather than -1]
    Link: http://lkml.kernel.org/r/20191211174653.4102-1-navid.emamdoost@gmail.com
    Fixes: 714a3a1ebafe ("mm/gup_benchmark.c: add additional pinning methods")
    Signed-off-by: Navid Emamdoost
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Reviewed-by: John Hubbard
    Cc: Keith Busch
    Cc: Kirill A. Shutemov
    Cc: Dave Hansen
    Cc: Dan Williams
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Navid Emamdoost
     
  • pr_err() expects kB, but mm_pgtables_bytes() returns the number of bytes.
    As everything else is printed in kB, I chose to fix the value rather than
    the string.

    Before:

    [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
    ...
    [ 1878] 1000 1878 217253 151144 1269760 0 0 python
    ...
    Out of memory: Killed process 1878 (python) total-vm:869012kB, anon-rss:604572kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:1269760kB oom_score_adj:0

    After:

    [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
    ...
    [ 1436] 1000 1436 217253 151890 1294336 0 0 python
    ...
    Out of memory: Killed process 1436 (python) total-vm:869012kB, anon-rss:607516kB, file-rss:44kB, shmem-rss:0kB, UID:1000 pgtables:1264kB oom_score_adj:0
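    The fix amounts to a bytes-to-kB conversion before printing. A minimal
    sketch with a hypothetical helper: shifting right by 10 divides by 1024,
    so the 1294336-byte pgtables_bytes value above becomes 1264 kB, and the
    1269760 bytes mislabelled as kB in the "Before" output is really 1240 kB.

```c
/* 1 kB = 1024 bytes = 2^10, so a right shift by 10 converts bytes to kB. */
unsigned long pgtables_kb(unsigned long pgtables_bytes)
{
    return pgtables_bytes >> 10;
}
```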

    Link: http://lkml.kernel.org/r/20191211202830.1600-1-idryomov@gmail.com
    Fixes: 70cb6d267790 ("mm/oom: add oom_score_adj and pgtables to Killed process message")
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Andrew Morton
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Cc: Edward Chron
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ilya Dryomov
     
  • Fix kernel-doc warnings in fs/posix_acl.c.
    Also fix one typo (setgit -> setgid).

    fs/posix_acl.c:647: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
    fs/posix_acl.c:647: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
    fs/posix_acl.c:647: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'

    Link: http://lkml.kernel.org/r/29b0dc46-1f28-a4e5-b1d0-ba2b65629779@infradead.org
    Fixes: 073931017b49d ("posix_acl: Clear SGID bit when setting file permissions")

    Signed-off-by: Randy Dunlap
    Acked-by: Andreas Gruenbacher
    Reviewed-by: Jan Kara
    Cc: Jan Kara
    Cc: Andreas Gruenbacher
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Clang cannot translate the string "r30" into a valid register yet.

    Link: https://github.com/ClangBuiltLinux/linux/issues/755
    Link: http://lkml.kernel.org/r/20191028155722.23419-1-ndesaulniers@google.com
    Signed-off-by: Nick Desaulniers
    Suggested-by: Sid Manning
    Reviewed-by: Brian Cain
    Cc: Allison Randal
    Cc: Greg Kroah-Hartman
    Cc: Richard Fontana
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     
  • Hexagon requires that register predicates in assembly be parenthesized.

    Link: https://github.com/ClangBuiltLinux/linux/issues/754
    Link: http://lkml.kernel.org/r/20191209222956.239798-3-ndesaulniers@google.com
    Signed-off-by: Nick Desaulniers
    Suggested-by: Sid Manning
    Acked-by: Brian Cain
    Cc: Lee Jones
    Cc: Andy Shevchenko
    Cc: Tuowen Zhao
    Cc: Mika Westerberg
    Cc: Luis Chamberlain
    Cc: Greg Kroah-Hartman
    Cc: Alexios Zavras
    Cc: Allison Randal
    Cc: Will Deacon
    Cc: Richard Fontana
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Boqun Feng
    Cc: Ingo Molnar
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     
  • Make to_mnt_ns() static to address the following 'sparse' warning:

    fs/namespace.c:1731:22: warning: symbol 'to_mnt_ns' was not declared. Should it be static?

    Link: http://lkml.kernel.org/r/20191209234830.156260-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
  • Include linux/proc_fs.h and fs/internal.h to address the following
    'sparse' warnings:

    fs/nsfs.c:41:32: warning: symbol 'ns_dentry_operations' was not declared. Should it be static?
    fs/nsfs.c:145:5: warning: symbol 'open_related_ns' was not declared. Should it be static?

    Link: http://lkml.kernel.org/r/20191209234822.156179-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
  • Include fs/internal.h to address the following 'sparse' warning:

    fs/direct-io.c:591:5: warning: symbol 'sb_init_dio_done_wq' was not declared. Should it be static?

    Link: http://lkml.kernel.org/r/20191209234544.128302-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
    Felix Abecassis reports that move_pages() returns random status values
    if the pages are already on the target node, as shown by the test
    program below:

    #include <numaif.h>     /* set_mempolicy(), move_pages(), MPOL_*; link with -lnuma */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const long node_id = 1;
        const long page_size = sysconf(_SC_PAGESIZE);
        const int64_t num_pages = 8;

        /* Bind allocations to node_id so the pages start on the target node */
        unsigned long nodemask = 1UL << node_id;
        long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask));
        if (ret < 0)
            return (EXIT_FAILURE);

        void **pages = malloc(sizeof(void*) * num_pages);
        for (int i = 0; i < num_pages; ++i) {
            pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ,
                            MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS,
                            -1, 0);
            if (pages[i] == MAP_FAILED)
                return (EXIT_FAILURE);
        }

        ret = set_mempolicy(MPOL_DEFAULT, NULL, 0);
        if (ret < 0)
            return (EXIT_FAILURE);

        int *nodes = malloc(sizeof(int) * num_pages);
        int *status = malloc(sizeof(int) * num_pages);
        for (int i = 0; i < num_pages; ++i) {
            nodes[i] = node_id;
            status[i] = 0xd0; /* simulate garbage values */
        }

        /* Ask to move pages that already live on node_id to node_id */
        ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE);
        printf("move_pages: %ld\n", ret);
        for (int i = 0; i < num_pages; ++i)
            printf("status[%d] = %d\n", i, status[i]);

        return EXIT_SUCCESS;
    }

    Running the program prints nonsense status values:

    $ ./move_pages_bug
    move_pages: 0
    status[0] = 208
    status[1] = 208
    status[2] = 208
    status[3] = 208
    status[4] = 208
    status[5] = 208
    status[6] = 208
    status[7] = 208

    This is because the status is not set if the page is already on the
    target node, but move_pages() should return a valid status as long as
    it succeeds. A valid status is either an errno or a node id.

    We can't simply initialize the status array to zero, since the pages
    may not be on node 0. Fix it by setting the status to the id of the
    node the page is already on.
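    A simplified model of the status loop, to show the difference the fix
    makes. fill_status() and its with_fix flag are hypothetical; migration is
    assumed to always succeed, whereas real code may also store a negative
    errno. With the pre-filled 0xd0 (208) garbage from the test program, the
    unfixed path leaves 208 in place for a page already on its target node.

```c
#include <stdbool.h>
#include <stddef.h>

/* cur[i] is the node page i is on, target[i] the requested node. */
void fill_status(const int *cur, const int *target, int *status, size_t n,
                 bool with_fix)
{
    for (size_t i = 0; i < n; i++) {
        if (cur[i] == target[i]) {
            if (with_fix)
                status[i] = cur[i];   /* report the node it is already on */
            /* without the fix, status[i] keeps whatever the caller left */
        } else {
            status[i] = target[i];    /* page was migrated to the target */
        }
    }
}
```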

    Link: http://lkml.kernel.org/r/1575584353-125392-1-git-send-email-yang.shi@linux.alibaba.com
    Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
    Signed-off-by: Yang Shi
    Reported-by: Felix Abecassis
    Tested-by: Felix Abecassis
    Suggested-by: Michal Hocko
    Reviewed-by: John Hubbard
    Acked-by: Christoph Lameter
    Acked-by: Michal Hocko
    Reviewed-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: [4.17+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • The cred_jar kmem_cache is already memcg accounted in the current kernel
    but cred->security is not. Account cred->security to kmemcg.

    Recently we saw high root slab usage in our production environment, and
    on further inspection we found a buggy application leaking processes.
    Although that application was contained within its memcg, we observed
    much higher system memory overhead, a couple of GiB, during that period.
    This overhead can adversely impact isolation on the system.

    One source of high overhead we found was cred->security objects, which
    have a lifetime of at least the life of the process which allocated
    them.

    Link: http://lkml.kernel.org/r/20191205223721.40034-1-shakeelb@google.com
    Signed-off-by: Shakeel Butt
    Acked-by: Chris Down
    Reviewed-by: Roman Gushchin
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     
  • Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code.
    This makes it more convenient to write userspace apps that can be
    compiled into 32-bit or 64-bit binaries and still work with the same
    64-bit kernel.

    Also use proper __u32 types in uapi headers instead of unsigned ints.
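    Why the layout differs: on i386 a plain 64-bit integer is only 4-byte
    aligned inside a struct, so a u64 after three u32 fields lands at offset
    12 there but 16 on x86_64. A userspace mirror of a fixed layout, assuming
    8-byte alignment is forced on the 64-bit member (the kernel's uapi headers
    provide __aligned_u64 for this); the struct name is hypothetical. The
    static assertions hold for both 32-bit and 64-bit builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Explicit 32-bit fields plus an 8-byte-aligned u64 give identical
 * offsets for 32-bit and 64-bit userspace. */
struct kcov_remote_arg_sketch {
    uint32_t trace_mode;
    uint32_t area_size;
    uint32_t num_handles;
    _Alignas(8) uint64_t common_handle;   /* offset 16 on i386 and x86_64 */
    uint64_t handles[];
};

_Static_assert(offsetof(struct kcov_remote_arg_sketch, common_handle) == 16,
               "common_handle must sit at the same offset for 32- and 64-bit");
_Static_assert(sizeof(struct kcov_remote_arg_sketch) == 24,
               "header size must match for 32- and 64-bit callers");
```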

    Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com
    Fixes: eec028c9386ed1a ("kcov: remote coverage support")
    Signed-off-by: Andrey Konovalov
    Acked-by: Marco Elver
    Cc: Greg Kroah-Hartman
    Cc: Alan Stern
    Cc: Felipe Balbi
    Cc: Chunfeng Yun
    Cc: "Jacky . Cao @ sony . com"
    Cc: Dmitry Vyukov
    Cc: Alexander Potapenko
    Cc: Marco Elver
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
    When a zspage is migrated to another zone, the zone page state should
    be updated as well; otherwise the per-zone NR_ZSPAGE counts are wrong,
    including those shown in /proc/zoneinfo.
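    The accounting rule is: move the count with the page when the source and
    destination zones differ. A minimal model with hypothetical per-zone
    counters; the kernel presumably does this with dec_zone_page_state() on
    the old page and inc_zone_page_state() on the new one.

```c
#define NR_ZONES_SKETCH 4   /* hypothetical zone count */

/* Hypothetical per-zone NR_ZSPAGES counters. */
struct zone_counts {
    long nr_zspages[NR_ZONES_SKETCH];
};

/* Called once per migrated page: when the destination page lives in a
 * different zone, move one unit of the count from old_zone to new_zone. */
void account_zspage_migration(struct zone_counts *c, int old_zone, int new_zone)
{
    if (old_zone == new_zone)
        return;
    c->nr_zspages[old_zone]--;
    c->nr_zspages[new_zone]++;
}
```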

    Link: http://lkml.kernel.org/r/1575434841-48009-1-git-send-email-chanho.min@lge.com
    Fixes: 91537fee0013 ("mm: add NR_ZSMALLOC to vmstat")
    Signed-off-by: Chanho Min
    Signed-off-by: Jinsuk Choi
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: [4.9+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chanho Min
     
  • We currently try to shrink a single zone when removing memory. We use
    the zone of the first page of the memory we are removing. If that
    memmap was never initialized (e.g., memory was never onlined), we will
    read garbage and can trigger kernel BUGs (due to a stale pointer):

    BUG: unable to handle page fault for address: 000000000000353d
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP PTI
    CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    RIP: 0010:clear_zone_contiguous+0x5/0x10
    Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
    RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
    RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
    RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
    R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
    FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    __remove_pages+0x4b/0x640
    arch_remove_memory+0x63/0x8d
    try_remove_memory+0xdb/0x130
    __remove_memory+0xa/0x11
    acpi_memory_device_remove+0x70/0x100
    acpi_bus_trim+0x55/0x90
    acpi_device_hotplug+0x227/0x3a0
    acpi_hotplug_work_fn+0x1a/0x30
    process_one_work+0x221/0x550
    worker_thread+0x50/0x3b0
    kthread+0x105/0x140
    ret_from_fork+0x3a/0x50
    Modules linked in:
    CR2: 000000000000353d

    Instead, shrink the zones when offlining memory or when onlining
    fails. Introduce and use remove_pfn_range_from_zone() for that. We now
    properly shrink the zones, even if we have DIMMs whereby

    - Some memory blocks fall into no zone (never onlined)

    - Some memory blocks fall into multiple zones (offlined+re-onlined)

    - Multiple memory blocks that fall into different zones

    Drop the zone parameter (with a potential dubious value) from
    __remove_pages() and __remove_section().

    Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com
    Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
    Signed-off-by: David Hildenbrand
    Reviewed-by: Oscar Salvador
    Cc: Michal Hocko
    Cc: "Matthew Wilcox (Oracle)"
    Cc: "Aneesh Kumar K.V"
    Cc: Pavel Tatashin
    Cc: Greg Kroah-Hartman
    Cc: Dan Williams
    Cc: Logan Gunthorpe
    Cc: [5.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Pull dmaengine fixes from Vinod Koul:
    "A bunch of fixes for:

    - uninitialized dma_slave_caps access

    - virt-dma use after free in vchan_complete()

    - driver fixes for ioat, k3dma and jz4780"

    * tag 'dmaengine-fix-5.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
    ioat: ioat_alloc_ring() failure handling.
    dmaengine: virt-dma: Fix access after free in vchan_complete()
    dmaengine: k3dma: Avoid null pointer traversal
    dmaengine: dma-jz4780: Also break descriptor chains on JZ4725B
    dmaengine: Fix access to uninitialized dma_slave_caps

    Linus Torvalds
     
  • Pull media fixes from Mauro Carvalho Chehab:

    - some fixes in the CEC core to comply with the HDMI 2.0 specs and fix
    some border cases

    - a fix in the transmission logic of the pulse8-cec driver

    - one alignment fix for a data struct in ipu3 when built as 32-bit

    * tag 'media/v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
    media: intel-ipu3: Align struct ipu3_uapi_awb_fr_config_s to 32 bytes
    media: pulse8-cec: fix lost cec_transmit_attempt_done() call
    media: cec: check 'transmit_in_progress', not 'transmitting'
    media: cec: avoid decrementing transmit_queue_sz if it is 0
    media: cec: CEC 2.0-only bcast messages were ignored

    Linus Torvalds
     

04 Jan, 2020

8 commits

  • Pull btrfs fixes from David Sterba:
    "A few fixes for btrfs:

    - blkcg accounting problem with compression that could stall writes

    - setting up blkcg bio for compression crashes due to NULL bdev
    pointer

    - fix possible infinite loop in writeback for nocow files (here
    "possible" means almost impossible: 13 things need to happen to
    trigger it)"

    * tag 'for-5.5-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    Btrfs: fix infinite loop during nocow writeback due to race
    btrfs: fix compressed write bio blkcg attribution
    btrfs: punt all bios created in btrfs_submit_compressed_write()

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Three fixes in here:

    - Fix for a missing split on default memory boundary mask (4G) (Ming)

    - Fix for multi-page read bio truncate (Ming)

    - Fix for null_blk zone close request handling (Damien)"

    * tag 'block-5.5-20200103' of git://git.kernel.dk/linux-block:
    null_blk: Fix REQ_OP_ZONE_CLOSE handling
    block: fix splitting segments on boundary masks
    block: add bio_truncate to fix guard_bio_eod

    Linus Torvalds
     
  • …/masahiroy/linux-kbuild

    Pull Kbuild fixes from Masahiro Yamada:

    - fix build error in usr/gen_initramfs_list.sh

    - fix libelf-dev dependency in deb-pkg build

    * tag 'kbuild-fixes-v5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kbuild/deb-pkg: annotate libelf-dev dependency as :native
    gen_initramfs_list.sh: fix 'bad variable name' error

    Linus Torvalds
     
  • Pull thread fixes from Christian Brauner:
    "Here are two fixes:

    - Panic earlier when global init exits to generate usable coredumps.

    Currently, when global init and all threads in its thread-group
    have exited we panic via:

    do_exit()
    -> exit_notify()
    -> forget_original_parent()
    -> find_child_reaper()

    This makes it hard to extract a usable coredump for global init
    from a kernel crashdump because, by the time we panic, exit_mm() will
    have already released global init's mm. We now panic slightly
    earlier. This has been a problem in certain environments such as
    Android.

    - Fix a race in assigning and reading taskstats for thread-groups
    with more than one thread.

    This patch has been waiting for quite a while since people
    disagreed on what the correct fix was at first"

    * tag 'for-linus-2020-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    exit: panic before exit_mm() on global init exit
    taskstats: fix data-race

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    "Two more powerpc fixes for 5.5:

    - One commit to fix a build error when CONFIG_JUMP_LABEL=n,
    introduced by our recent fix to is_shared_processor().

    - A commit marking some SLB related functions as notrace, as tracing
    them triggers warnings.

    Thanks to Jason A Donenfeld"

    * tag 'powerpc-5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/spinlocks: Include correct header for static key
    powerpc/mm: Mark get_slice_psize() & slice_addr_is_low() as notrace

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "Nothing to worry about at this stage, just nice small changes:

    - A regression fix for AMD GPU detection in HD-audio

    - A long-standing sleep-in-atomic fix for an ice1724 device

    - Usual suspects, the device-specific quirks for HD- and USB-audio"

    * tag 'sound-5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: hda/realtek - Enable the bass speaker of ASUS UX431FLC
    ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
    ALSA: hda/realtek - Add Bass Speaker and fixed dac for bass speaker
    ALSA: hda - Apply sync-write workaround to old Intel platforms, too
    ALSA: hda/hdmi - fix atpx_present when CLASS is not VGA
    ALSA: usb-audio: fix set_format altsetting sanity check
    ALSA: hda/realtek - Add headset Mic no shutup for ALC283
    ALSA: usb-audio: set the interface format after resume on Dell WD19

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "New Year's fixes! Mostly amdgpu with a light smattering of arm
    graphics, and two AGP warning fixes.

    Quiet as expected; hopefully we don't get a post-holiday rush.

    agp:
    - two unused variables removed

    amdgpu:
    - ATPX regression fix
    - SMU metrics table locking fixes
    - gfxoff fix for raven
    - RLC firmware loading stability fix

    mediatek:
    - external display fix
    - dsi timing fix

    sun4i:
    - Fix double-free in connector/encoder cleanup (Stefan)

    malidp:
    - Make vtable static (Ben)"

    * tag 'drm-fixes-2020-01-03' of git://anongit.freedesktop.org/drm/drm:
    agp: remove unused variable arqsz in agp_3_5_enable()
    agp: remove unused variable mcapndx
    drm/amdgpu: correct RLC firmwares loading sequence
    drm/amdgpu: enable gfxoff for raven1 refresh
    drm/amdgpu/smu: add metrics table lock for vega20 (v2)
    drm/amdgpu/smu: add metrics table lock for navi (v2)
    drm/amdgpu/smu: add metrics table lock for arcturus (v2)
    drm/amdgpu/smu: add metrics table lock
    Revert "drm/amdgpu: simplify ATPX detection"
    drm/arm/mali: make malidp_mw_connector_helper_funcs static
    drm/sun4i: hdmi: Remove duplicate cleanup calls
    drm/mediatek: reduce the hbp and hfp for phy timing
    drm/mediatek: Fix can't get component for external display plane.
    drm/mediatek: Check return value of mtk_drm_ddp_comp_for_plane.

    Linus Torvalds
     
  • LTP memfd_create04 started failing for some huge page sizes
    after v5.4-10135-gc3bfc5dd73c6.

    The problem is the check introduced into the for_each_hstate() loop
    that should skip default_hstate_idx. Since it doesn't update the 'i'
    counter, all subsequent huge page sizes are skipped as well.

    Fixes: 8fc312b32b25 ("mm/hugetlbfs: fix error handling when setting up mounts")
    Signed-off-by: Jan Stancek
    Reviewed-by: Mike Kravetz
    Signed-off-by: Linus Torvalds

    Jan Stancek
     

03 Jan, 2020

4 commits

    Cross-compiling the x86 kernel on a non-x86 build machine produces
    the following error when CONFIG_UNWINDER_ORC is enabled, regardless
    of whether libelf-dev is installed or not.

    dpkg-checkbuilddeps: error: Unmet build dependencies: libelf-dev
    dpkg-buildpackage: warning: build dependencies/conflicts unsatisfied; aborting
    dpkg-buildpackage: warning: (Use -d flag to override.)

    Since this is a build-time dependency of a build tool, we need to
    depend on the native version of libelf-dev, so add the appropriate
    :native annotation.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Masahiro Yamada

    Ard Biesheuvel
     
  • Prior to commit 858805b336be ("kbuild: add $(BASH) to run scripts with
    bash-extension"), this shell script was almost always run by bash since
    bash is usually installed on the system by default.

    Now, this script is run by sh, which might be a symlink to dash. On such
    distributions, the following code emits an error:

    local dev=`LC_ALL=C ls -l "${location}"`

    You can reproduce the build error, for example by setting
    CONFIG_INITRAMFS_SOURCE="/dev".

    GEN usr/initramfs_data.cpio.gz
    ./usr/gen_initramfs_list.sh: 131: local: 1: bad variable name
    make[1]: *** [usr/Makefile:61: usr/initramfs_data.cpio.gz] Error 2

    This is because the output of `LC_ALL=C ls -l "${location}"` contains
    spaces, so dash splits it into several arguments to `local`.
    Surrounding the command substitution with double-quotes fixes the error.

    Fixes: 858805b336be ("kbuild: add $(BASH) to run scripts with bash-extension")
    Reported-by: Jory A. Pratt
    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     
    A struct that needs to be aligned to 32 bytes has a size of 28. Increase
    the size to 32.

    This makes elements of arrays of this struct aligned to 32 as well, as
    well as members of other structs that are aligned to 32 and that mix
    ipu3_uapi_awb_fr_config_s with other types.

    Fixes: commit dca5ef2aa1e6 ("media: staging/intel-ipu3: remove the unnecessary compiler flags")
    Signed-off-by: Sakari Ailus
    Tested-by: Bingbu Cao
    Signed-off-by: Mauro Carvalho Chehab

    Sakari Ailus
     
    The condition should be negated: the hook address must be assigned to
    the parent address only when function_graph_enter() succeeds, and
    function_graph_enter() returns 0 on success.

    Fixes: e949b6db51dc (riscv/function_graph: Simplify with function_graph_enter())
    Signed-off-by: Zong Li
    Reviewed-by: Steven Rostedt (VMware)
    Cc: stable@vger.kernel.org
    Signed-off-by: Paul Walmsley

    Zong Li