07 May, 2020

1 commit

  • eventfd is using ->read() as its file_operations read handler, but
    this prevents passing in information about whether a given IO operation
    is blocking or not. We can only use the file flags for that. To support
    async (-EAGAIN/poll based) retries for io_uring, we need ->read_iter()
    support. Convert eventfd to using ->read_iter().

    With ->read_iter(), we can support IOCB_NOWAIT. Ensure the fd setup
    is done such that we set file->f_mode with FMODE_NOWAIT.

    [missing include added]

    Signed-off-by: Jens Axboe
    Signed-off-by: Al Viro

    Jens Axboe
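
The difference between a file-wide blocking flag and a per-call one, as described in the eventfd commit above, can be sketched with a toy model. This is only an illustration, not the kernel implementation: the `CounterFile` class and its method names are hypothetical, and the model raises instead of sleeping when a blocking read finds nothing to return.

```python
import errno


class CounterFile:
    """Toy model of an eventfd-style counter (hypothetical, not kernel code)."""

    def __init__(self, nonblock_file_flag=False):
        self.count = 0
        # Models O_NONBLOCK set on the file itself.
        self.nonblock_file_flag = nonblock_file_flag

    def write(self, value):
        # Writes add to the counter.
        self.count += value

    def read(self):
        # ->read() style: only the file-wide flag is available.
        return self._do_read(nowait=self.nonblock_file_flag)

    def read_iter(self, iocb_nowait=False):
        # ->read_iter() style: the caller may ask for no-wait on this call only.
        return self._do_read(nowait=iocb_nowait or self.nonblock_file_flag)

    def _do_read(self, nowait):
        if self.count == 0:
            if nowait:
                return -errno.EAGAIN
            # A real blocking read would sleep here; the model just refuses.
            raise RuntimeError("would block (the model does not sleep)")
        value, self.count = self.count, 0  # a read returns and resets the counter
        return value
```

With this model, a caller such as io_uring can attempt `read_iter(iocb_nowait=True)` on a file opened without the non-blocking flag, get `-EAGAIN`, and retry later, which a plain `read()` cannot express.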
     

13 Apr, 2020

10 commits

  • Linus Torvalds
     
  • This sorts the actual field names too, potentially causing even more
    chaos and confusion at merge time if you have edited the MAINTAINERS
    file. But the end result is a more consistent layout, and hopefully
    it's a one-time pain minimized by doing this just before the -rc1
    release.

    This was entirely scripted:

    ./scripts/parse-maintainers.pl --input=MAINTAINERS --output=MAINTAINERS --order

    Requested-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • They are all supposed to be sorted, but people who add new entries don't
    always know the alphabet. Plus sometimes the entry names get edited,
    and people don't then re-order the entry.

    Let's see how painful this will be for merging purposes (the MAINTAINERS
    file is often edited in various different trees), but Joe claims there's
    relatively few patches in -next that touch this, and doing it just
    before -rc1 is likely the best time. Fingers crossed.

    This was scripted with

    ./scripts/parse-maintainers.pl --input=MAINTAINERS --output=MAINTAINERS

    but then I also ended up manually upper-casing a few entry names that
    stood out when looking at the end result.

    Requested-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "A set of three patches to fix the fallout of the newly added split
    lock detection feature.

    It addresses the case where a KVM guest triggers a split lock #AC and
    KVM reinjects it into the guest which is not prepared to handle it.

    Add proper sanity checks which prevent the unconditional injection
    into the guest and handle the #AC on the host side in the same way as
    user space detections are handled. Depending on the detection mode it
    either warns and disables detection for the task or kills the task if
    the mode is set to fatal"

    * tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest
    KVM: x86: Emulate split-lock access as a write in emulator
    x86/split_lock: Provide handle_guest_split_lock()

    Linus Torvalds
     
  • Pull time(keeping) updates from Thomas Gleixner:

    - Fix the time_for_children symlink in /proc/$PID/ so it properly
    reflects that it is part of the 'time' namespace

    - Add the missing userns limit for the allowed number of time
    namespaces, which was half defined but the actual array member was
    not added. This went unnoticed as the array has an excessive empty
    member at the end but introduced a user visible regression as the
    output was corrupted.

    - Prevent further silent ucount corruption by adding a BUILD_BUG_ON()
    to catch half updated data.

    * tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ucount: Make sure ucounts in /proc/sys/user don't regress again
    time/namespace: Add max_time_namespaces ucount
    time/namespace: Fix time_for_children symlink

    Linus Torvalds
     
  • Pull scheduler fixes/updates from Thomas Gleixner:

    - Deduplicate the average computations in the scheduler core and the
    fair class code.

    - Fix a race between runtime distribution and assignment which can
    cause exceeding the quota by up to 70%.

    - Prevent negative results in the imbalance calculation

    - Remove a stale warning in the workqueue code which can be triggered
    since the call site was moved out of preempt disabled code. It's a
    false positive.

    - Deduplicate the print macros for procfs

    - Add the uclamp values to the SCHED_DEBUG procfs output for completeness

    * tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/debug: Add task uclamp values to SCHED_DEBUG procfs
    sched/debug: Factor out printing formats into common macros
    sched/debug: Remove redundant macro define
    sched/core: Remove unused rq::last_load_update_tick
    workqueue: Remove the warning in wq_worker_sleeping()
    sched/fair: Fix negative imbalance in imbalance calculation
    sched/fair: Fix race between runtime distribution and assignment
    sched/fair: Align rq->avg_idle and rq->avg_scan_cost

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:
    "Three fixes/updates for perf:

    - Fix the perf event cgroup tracking which tries to track the cgroup
    even for disabled events.

    - Add Ice Lake server support for uncore events

    - Disable pagefaults when retrieving the physical address in the
    sampling code"

    * tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/core: Disable page faults when getting phys address
    perf/x86/intel/uncore: Add Ice Lake server uncore support
    perf/cgroup: Correct indirection in perf_less_group_idx()
    perf/core: Fix event cgroup tracking

    Linus Torvalds
     
  • Pull locking fixes from Thomas Gleixner:
    "Three small fixes/updates for the locking core code:

    - Plug a task struct reference leak in the percpu rwsem
    implementation.

    - Document the refcount interaction with PID_MAX_LIMIT

    - Improve the 'invalid wait context' data dump in lockdep so it
    contains all information which is required to decode the problem"

    * tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/lockdep: Improve 'invalid wait context' splat
    locking/refcount: Document interaction with PID_MAX_LIMIT
    locking/percpu-rwsem: Fix a task_struct refcount

    Linus Torvalds
     
  • Pull cifs fixes from Steve French:
    "Ten cifs/smb fixes:

    - five RDMA (smbdirect) related fixes

    - add experimental support for swap over SMB3 mounts

    - also a fix which improves performance of signed connections"

    * tag '5.7-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
    smb3: enable swap on SMB3 mounts
    smb3: change noisy error message to FYI
    smb3: smbdirect support can be configured by default
    cifs: smbd: Do not schedule work to send immediate packet on every receive
    cifs: smbd: Properly process errors on ib_post_send
    cifs: Allocate crypto structures on the fly for calculating signatures of incoming packets
    cifs: smbd: Update receive credits before sending and deal with credits roll back on failure before sending
    cifs: smbd: Check send queue size before posting a send
    cifs: smbd: Merge code to track pending packets
    cifs: ignore cached share root handle closing errors

    Linus Torvalds
     
  • Pull NFS client bugfix from Trond Myklebust:
    "Fix an RCU read lock leakage in pnfs_alloc_ds_commits_list()"

    * tag 'nfs-for-5.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    pNFS: Fix RCU lock leakage

    Linus Torvalds
     

12 Apr, 2020

4 commits


11 Apr, 2020

25 commits

  • Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking
    the RCU lock.

    Fixes: a9901899b649 ("pNFS: Add infrastructure for cleaning up per-layout commit structures")
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Two types of #AC can be generated in Intel CPUs:
    1. legacy alignment check #AC
    2. split lock #AC

    Reflect #AC back into the guest if the guest has legacy alignment checks
    enabled or if split lock detection is disabled.

    If the #AC is not a legacy one and split lock detection is enabled, then
    invoke handle_guest_split_lock() which will either warn and disable split
    lock detection for this task or force SIGBUS on it.

    [ tglx: Switch it to handle_guest_split_lock() and rename the misnamed
    helper function. ]

    Suggested-by: Sean Christopherson
    Signed-off-by: Xiaoyao Li
    Signed-off-by: Sean Christopherson
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Borislav Petkov
    Acked-by: Paolo Bonzini
    Link: https://lkml.kernel.org/r/20200410115517.176308876@linutronix.de

    Xiaoyao Li
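
The decision described in the commit above can be written out as a small table-like function. This is a sketch of the logic as stated in the commit message only; the `SldMode` enum, the function name, and the returned labels are hypothetical and do not correspond to kernel symbols.

```python
from enum import Enum


class SldMode(Enum):
    """Host split lock detection mode (hypothetical names for this sketch)."""
    OFF = "off"
    WARN = "warn"
    FATAL = "fatal"


def handle_guest_ac(guest_legacy_ac_enabled: bool, sld_mode: SldMode) -> str:
    """Return what happens to an #AC raised while running a guest."""
    # A legacy alignment check #AC, or any #AC while split lock detection
    # is disabled on the host, is reflected back into the guest.
    if guest_legacy_ac_enabled or sld_mode is SldMode.OFF:
        return "reflect_to_guest"
    # Otherwise it is treated as a split lock #AC and handled on the host.
    if sld_mode is SldMode.WARN:
        return "warn_and_disable_sld_for_task"
    return "force_sigbus"
```

For example, `handle_guest_ac(False, SldMode.FATAL)` yields `"force_sigbus"`, while any call with `guest_legacy_ac_enabled=True` yields `"reflect_to_guest"` regardless of the host mode.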
     
  • Emulate split-lock accesses as writes if split lock detection is on
    to avoid #AC during emulation, which will result in a panic(). This
    should never occur for a well-behaved guest, but a malicious guest can
    manipulate the TLB to trigger emulation of a locked instruction[1].

    More discussion can be found at [2][3].

    [1] https://lkml.kernel.org/r/8c5b11c9-58df-38e7-a514-dc12d687b198@redhat.com
    [2] https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com
    [3] https://lkml.kernel.org/r/20200227001117.GX9940@linux.intel.com

    Suggested-by: Sean Christopherson
    Signed-off-by: Xiaoyao Li
    Signed-off-by: Sean Christopherson
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Borislav Petkov
    Acked-by: Paolo Bonzini
    Link: https://lkml.kernel.org/r/20200410115517.084300242@linutronix.de

    Xiaoyao Li
     
  • Without at least minimal handling for split lock detection induced #AC,
    VMX will just run into the same problem as the VMware hypervisor, which
    was reported by Kenneth.

    It will inject the #AC blindly into the guest whether the guest is
    prepared or not.

    Provide a function for guest mode which acts depending on the host
    SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a
    warning, disable SLD and mark the task accordingly. Otherwise force
    SIGBUS.

    [ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ]

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Borislav Petkov
    Acked-by: Paolo Bonzini
    Link: https://lkml.kernel.org/r/20200410115516.978037132@linutronix.de
    Link: https://lkml.kernel.org/r/20200402123258.895628824@linutronix.de

    Thomas Gleixner
     
  • The keyword here is 'twice' to explain the trick.

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     
  • Merge yet more updates from Andrew Morton:

    - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
    gup, hugetlb, pagemap, memremap)

    - Various other things (hfs, ocfs2, kmod, misc, seqfile)

    * akpm: (34 commits)
    ipc/util.c: sysvipc_find_ipc() should increase position index
    kernel/gcov/fs.c: gcov_seq_next() should increase position index
    fs/seq_file.c: seq_read(): add info message about buggy .next functions
    drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
    change email address for Pali Rohár
    selftests: kmod: test disabling module autoloading
    selftests: kmod: fix handling test numbers above 9
    docs: admin-guide: document the kernel.modprobe sysctl
    fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
    kmod: make request_module() return an error when autoloading is disabled
    mm/memremap: set caching mode for PCI P2PDMA memory to WC
    mm/memory_hotplug: add pgprot_t to mhp_params
    powerpc/mm: thread pgprot_t through create_section_mapping()
    x86/mm: introduce __set_memory_prot()
    x86/mm: thread pgprot_t through init_memory_mapping()
    mm/memory_hotplug: rename mhp_restrictions to mhp_params
    mm/memory_hotplug: drop the flags field from struct mhp_restrictions
    mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
    mm/vma: introduce VM_ACCESS_FLAGS
    mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
    ...

    Linus Torvalds
     
  • Pull Documentation fixes from Jonathan Corbet:
    "A handful of late-arriving fixes for the documentation tree"

    * tag 'docs-5.7-2' of git://git.lwn.net/linux:
    Documentation: android: binderfs: add 'stats' mount option
    Documentation: driver-api/usb/writing_usb_driver.rst Updates documentation links
    docs: driver-api: address duplicate label warning
    Documentation: sysrq: fix RST formatting
    docs: kernel-parameters.txt: Fix broken references
    docs: kernel-parameters.txt: Remove nompx
    docs: filesystems: fix typo in qnx6.rst

    Linus Torvalds
     
  • Pull orangefs updates from Mike Marshall:
    "A fix and two cleanups.

    Fix:

    - Christoph Hellwig noticed that some logic I added to
    orangefs_file_read_iter introduced a race condition, so he sent a
    reversion patch. I had to modify his patch since reverting at this
    point broke Orangefs.

    Cleanups:

    - Christoph Hellwig noticed that we were doing some unnecessary work
    in orangefs_flush, so he sent in a patch that removed the un-needed
    code.

    - Al Viro told me he had trouble building Orangefs. Orangefs should
    be easy to build, even for Al :-).

    I looked back at the test server build notes in orangefs.txt, just
    in case that's where the trouble really is, and found a couple of
    typos and made a couple of clarifications"

    * tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: clarify build steps for test server in orangefs.txt
    orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush
    orangefs: get rid of knob code...

    Linus Torvalds
     
  • Pull xtensa updates from Max Filippov:

    - replace setup_irq() by request_irq()

    - cosmetic fixes in xtensa Kconfig and boot/Makefile

    * tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensa:
    arch/xtensa: fix grammar in Kconfig help text
    xtensa: remove meaningless export ccflags-y
    xtensa: replace setup_irq() by request_irq()

    Linus Torvalds
     
  • Pull more xen updates from Juergen Gross:

    - two cleanups

    - fix a boot regression introduced in this merge window

    - fix wrong use of memory allocation flags

    * tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    x86/xen: fix booting 32-bit pv guest
    x86/xen: make xen_pvmmu_arch_setup() static
    xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
    xen: Use evtchn_type_t as a type for event channels

    Linus Torvalds
     
  • If a seq_file .next function does not change the position index, a
    read after some lseek can generate unexpected output.

    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Andrew Morton
    Acked-by: Waiman Long
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Al Viro
    Cc: Ingo Molnar
    Cc: NeilBrown
    Cc: Peter Oberparleiter
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/b7a20945-e315-8bb0-21e6-3875c14a8494@virtuozzo.com
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • If a seq_file .next function does not change the position index, a
    read after some lseek can generate unexpected output.

    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Andrew Morton
    Acked-by: Peter Oberparleiter
    Cc: Al Viro
    Cc: Davidlohr Bueso
    Cc: Ingo Molnar
    Cc: Manfred Spraul
    Cc: NeilBrown
    Cc: Steven Rostedt
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/f65c6ee7-bd00-f910-2f8a-37cc67e4ff88@virtuozzo.com
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Patch series "seq_file .next functions should increase position index".

    In Aug 2018 NeilBrown noticed commit 1f4aace60b0e ("fs/seq_file.c:
    simplify seq_file iteration code and interface")

    "Some ->next functions do not increment *pos when they return NULL...
    Note that such ->next functions are buggy and should be fixed. A simple
    demonstration is

    dd if=/proc/swaps bs=1000 skip=1

    Choose any block size larger than the size of /proc/swaps. This will
    always show the whole last line of /proc/swaps"

    The described problem still exists. If you lseek into the middle of
    the last output line, the following read will output the end of the
    last line and then the whole last line once again.

    $ dd if=/proc/swaps bs=1 # usual output
    Filename Type Size Used Priority
    /dev/dm-0 partition 4194812 97536 -2
    104+0 records in
    104+0 records out
    104 bytes copied

    $ dd if=/proc/swaps bs=40 skip=1 # last line was generated twice
    dd: /proc/swaps: cannot skip to specified offset
    v/dm-0 partition 4194812 97536 -2
    /dev/dm-0 partition 4194812 97536 -2
    3+1 records in
    3+1 records out
    131 bytes copied

    There are a lot of other affected files; I've found 30+, including
    /proc/net/ip_tables_matches and /proc/sysvipc/*

    I've already sent patches to the mailing lists of the affected
    subsystems; this patch set fixes the problem in files related to
    pstore, tracing, gcov, sysvipc and other subsystems processed via the
    linux-kernel@ mailing list directly

    https://bugzilla.kernel.org/show_bug.cgi?id=206283

    This patch (of 4):

    Add debug code to seq_read() to detect missed or out-of-tree incorrect
    .next seq_file functions.

    [akpm@linux-foundation.org: s/pr_info/pr_info_ratelimited/, per Qian Cai]
    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Andrew Morton
    Cc: NeilBrown
    Cc: Al Viro
    Cc: Steven Rostedt
    Cc: Davidlohr Bueso
    Cc: Ingo Molnar
    Cc: Manfred Spraul
    Cc: Peter Oberparleiter
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/244674e5-760c-86bd-d08a-047042881748@virtuozzo.com
    Link: http://lkml.kernel.org/r/7c24087c-e280-e580-5b0c-0cdaeb14cd18@virtuozzo.com
    Signed-off-by: Linus Torvalds

    Vasily Averin
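
The contract behind this series (a .next function must advance the position index even when it returns NULL) and the kind of check the debug code performs can be sketched as follows. This is a simplified model, not the seq_read() code: `good_next`, `buggy_next` and `walk` are hypothetical names, and the real iteration logic in fs/seq_file.c is considerably more involved.

```python
from typing import Callable, List, Optional, Tuple

NextFn = Callable[[List[str], int], Tuple[Optional[str], int]]


def good_next(records: List[str], pos: int) -> Tuple[Optional[str], int]:
    """Always advances the position, even when there is no next record."""
    pos += 1
    return (records[pos] if pos < len(records) else None), pos


def buggy_next(records: List[str], pos: int) -> Tuple[Optional[str], int]:
    """Leaves the position unchanged when it runs off the end."""
    if pos + 1 >= len(records):
        return None, pos  # bug: the position index is not incremented
    return records[pos + 1], pos + 1


def walk(records: List[str], next_fn: NextFn) -> Tuple[List[str], List[int]]:
    """Iterate from position 0 and report positions where next_fn stalled."""
    emitted: List[str] = []
    stalled_at: List[int] = []
    pos = 0
    rec = records[0] if records else None
    while rec is not None:
        emitted.append(rec)
        old = pos
        rec, pos = next_fn(records, pos)
        if pos == old:
            # Model of the debug check: the iterator did not move.
            stalled_at.append(old)
            pos += 1  # advance anyway so the saved index is past the end
    return emitted, stalled_at
```

Walking `["a", "b", "c"]` with `buggy_next` emits all three records but reports a stall at position 2, which is the situation the added message is meant to flag.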
     
  • Remove dev_err() messages after platform_get_irq*() failures.
    platform_get_irq() already prints an error.

    Generated by: scripts/coccinelle/api/platform_get_irq.cocci

    Fixes: 6c41ac96ad92 ("dmaengine: tegra-apb: Support COMPILE_TEST")
    Signed-off-by: kbuild test robot
    Signed-off-by: Julia Lawall
    Signed-off-by: Andrew Morton
    Reviewed-by: Dmitry Osipenko
    Acked-by: Thierry Reding
    Cc: Laxman Dewangan
    Cc: Vinod Koul
    Cc: Stephen Warren
    Cc: Jon Hunter
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002271133450.2973@hadrien
    Signed-off-by: Linus Torvalds

    kbuild test robot
     
  • For security reasons I stopped using my gmail account, and my kernel
    address is now an up-to-date alias to my personal address.

    People periodically send me emails to the address which they found in
    the source code of drivers, so this change reflects the state where
    people can contact me.

    [ Added .mailmap entry as per Joe Perches - Linus ]
    Signed-off-by: Pali Rohár
    Signed-off-by: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Joe Perches
    Link: http://lkml.kernel.org/r/20200307104237.8199-1-pali@kernel.org
    Signed-off-by: Linus Torvalds

    Pali Rohár
     
  • Test that request_module() fails with -ENOENT when
    /proc/sys/kernel/modprobe contains (a) a nonexistent path, and (b) an
    empty path.

    Case (b) is a regression test for the patch "kmod: make request_module()
    return an error when autoloading is disabled".

    Tested with 'kmod.sh -t 0010 && kmod.sh -t 0011', and also simply with
    'kmod.sh' to run all kmod tests.

    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Acked-by: Luis Chamberlain
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: Jeff Vander Stoep
    Cc: Jessica Yu
    Cc: Kees Cook
    Cc: NeilBrown
    Link: http://lkml.kernel.org/r/20200312202552.241885-5-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
  • get_test_count() and get_test_enabled() were broken for test numbers
    above 9 due to awk interpreting a field specification like '$0010' as
    octal rather than decimal. Fix it by stripping the leading zeroes.

    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Acked-by: Luis Chamberlain
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: Jeff Vander Stoep
    Cc: Jessica Yu
    Cc: Kees Cook
    Cc: NeilBrown
    Link: http://lkml.kernel.org/r/20200318230515.171692-5-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
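
The octal pitfall described above is easy to reproduce outside awk. The sketch below is not the kmod.sh code; `field_number` is a hypothetical helper that shows the same fix (strip the leading zeroes before treating the test ID as a number), and the octal comparison uses Python's explicit base argument only to illustrate what a misreading of "0010" produces.

```python
def field_number(test_id: str) -> int:
    """Turn a zero-padded test ID such as '0010' into a decimal number."""
    stripped = test_id.lstrip("0")
    # An ID made only of zeroes strips to the empty string; treat it as 0.
    return int(stripped) if stripped else 0


# What an octal reading of the same digits would give instead.
octal_reading = int("0010", 8)
```

Here `field_number("0010")` is 10, whereas `octal_reading` is 8, so test 0010 would have been looked up in the wrong field. IDs up to 0007 are unaffected because their octal and decimal values coincide.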
     
  • Document the kernel.modprobe sysctl in the same place that all the other
    kernel.* sysctls are documented. Make sure to mention how to use this
    sysctl to completely disable module autoloading, and how this sysctl
    relates to CONFIG_STATIC_USERMODEHELPER.

    [ebiggers@google.com: v5]
    Link: http://lkml.kernel.org/r/20200318230515.171692-4-ebiggers@kernel.org
    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: Jeff Vander Stoep
    Cc: Jessica Yu
    Cc: Kees Cook
    Cc: Luis Chamberlain
    Cc: NeilBrown
    Link: http://lkml.kernel.org/r/20200312202552.241885-4-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
  • After request_module(), nothing is stopping the module from being
    unloaded until someone takes a reference to it via try_module_get().

    The WARN_ONCE() in get_fs_type() is thus user-reachable, via userspace
    running 'rmmod' concurrently.

    Since WARN_ONCE() is for kernel bugs only, not for user-reachable
    situations, downgrade this warning to pr_warn_once().

    Keep it printed once only, since the intent of this warning is to detect
    a bug in modprobe at boot time. Printing the warning more than once
    wouldn't really provide any useful extra information.

    Fixes: 41124db869b7 ("fs: warn in case userspace lied about modprobe return")
    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Reviewed-by: Jessica Yu
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: Jeff Vander Stoep
    Cc: Jessica Yu
    Cc: Kees Cook
    Cc: Luis Chamberlain
    Cc: NeilBrown
    Cc: [4.13+]
    Link: http://lkml.kernel.org/r/20200312202552.241885-3-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
  • Patch series "module autoloading fixes and cleanups", v5.

    This series fixes a bug where request_module() was reporting success to
    kernel code when module autoloading had been completely disabled via
    'echo > /proc/sys/kernel/modprobe'.

    It also addresses the issues raised on the original thread
    (https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
    by documenting the modprobe sysctl, adding a self-test for the empty path
    case, and downgrading a user-reachable WARN_ONCE().

    This patch (of 4):

    It's long been possible to disable kernel module autoloading completely
    (while still allowing manual module insertion) by setting
    /proc/sys/kernel/modprobe to the empty string.

    This can be preferable to setting it to a nonexistent file since it
    avoids the overhead of an attempted execve(), avoids potential
    deadlocks, and avoids the call to security_kernel_module_request() and
    thus on SELinux-based systems eliminates the need to write SELinux rules
    to dontaudit module_request.

    However, when module autoloading is disabled in this way,
    request_module() returns 0. This is broken because callers expect 0 to
    mean that the module was successfully loaded.

    Apparently this was never noticed because this method of disabling
    module autoloading isn't used much, and also most callers don't use the
    return value of request_module() since it's always necessary to check
    whether the module registered its functionality or not anyway.

    But improperly returning 0 can indeed confuse a few callers, for example
    get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:

    if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
            fs = __get_fs_type(name, len);
            WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
    }

    This is easily reproduced with:

    echo > /proc/sys/kernel/modprobe
    mount -t NONEXISTENT none /

    It causes:

    request_module fs-NONEXISTENT succeeded, but still no fs?
    WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
    [...]

    This should actually use pr_warn_once() rather than WARN_ONCE(), since
    it's also user-reachable if userspace immediately unloads the module.
    Regardless, request_module() should correctly return an error when it
    fails. So let's make it return -ENOENT, which matches the error when
    the modprobe binary doesn't exist.

    I've also sent patches to document and test this case.

    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Reviewed-by: Jessica Yu
    Acked-by: Luis Chamberlain
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: Jeff Vander Stoep
    Cc: Ben Hutchings
    Cc: Josh Triplett
    Cc:
    Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
    Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
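
The return-value change described in this commit, and why it matters to a caller like get_fs_type(), can be modelled in a few lines. This is a sketch under stated assumptions, not kernel code: both function names and the `helper_exists` and `fixed` parameters are hypothetical, and a successful helper run is simplified to a return value of 0.

```python
import errno


def request_module_result(modprobe_path: str, helper_exists: bool,
                          fixed: bool = True) -> int:
    """Model of the return value for an autoload request."""
    if modprobe_path == "":
        # Autoloading disabled via an empty /proc/sys/kernel/modprobe.
        # Before the fix this reported success (0); afterwards -ENOENT.
        return -errno.ENOENT if fixed else 0
    if not helper_exists:
        # Matches the error when the modprobe binary doesn't exist.
        return -errno.ENOENT
    return 0  # simplification: assume the helper ran and succeeded


def get_fs_type_warns(modprobe_path: str, helper_exists: bool,
                      fs_registered: bool, fixed: bool = True) -> bool:
    """True if the caller would hit its 'succeeded, but still no fs?' warning."""
    ret = request_module_result(modprobe_path, helper_exists, fixed)
    return ret == 0 and not fs_registered
```

With `fixed=False`, an empty modprobe path and an unknown filesystem make `get_fs_type_warns` return `True`, mirroring the reproduced WARNING; with the fix the same inputs return `False` because the request now fails with `-ENOENT`.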
     
  • PCI BAR IO memory should never be mapped as WB, however prior to this
    the PAT bits were set WB and it was typically overridden by MTRR
    registers set by the firmware.

    Set PCI P2PDMA memory to be UC as this is what it currently, typically,
    ends up being mapped as on x86 after the MTRR registers override the
    cache setting.

    Future use-cases may need to generalize this by adding flags to select
    the caching type, as some P2PDMA cases may not want UC. However, those
    use-cases are not upstream yet and this can be changed when they arrive.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Jason Gunthorpe
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-8-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • devm_memremap_pages() is currently used by the PCI P2PDMA code to create
    struct page mappings for IO memory. At present, these mappings are
    created with PAGE_KERNEL which implies setting the PAT bits to be WB.
    However, on x86, an mtrr register will typically override this and force
    the cache type to be UC-. In the case firmware doesn't set this
    register it is effectively WB and will typically result in a machine
    check exception when it's accessed.

    Other arches are not currently likely to function correctly seeing they
    don't have any MTRR registers to fall back on.

    To solve this, provide a way to specify the pgprot value explicitly to
    arch_add_memory().

    Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a
    simple change to pass the pgprot_t down to their respective functions
    which set up the page tables. For x86_32, set the page tables
    explicitly using __set_memory_prot() (seeing they are already mapped).

    For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
    should be fine, for now, seeing these architectures don't support
    ZONE_DEVICE.

    A check in __add_pages() is also added to ensure the pgprot parameter
    was set for all arches.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Dan Williams
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • In preparation to support a pgprot_t argument for arch_add_memory().

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Michal Hocko
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-6-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • For use in the 32bit arch_add_memory() to set the pgprot type of the
    memory to add.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-5-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • In preparation to support a pgprot_t argument for arch_add_memory().

    It's required to move the prototype of init_memory_mapping() seeing the
    original location came before the definition of pgprot_t.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe