18 Mar, 2021

3 commits

  • Matthias reports that the Amazon Kindle automatically removes its
    emulated media if it doesn't receive another SCSI command within about
    one second after a SYNCHRONIZE CACHE. It does so even when the host
    has sent a PREVENT MEDIUM REMOVAL command. The reason for this
    behavior isn't clear, although it's not hard to make some guesses.

    At any rate, the results can be unexpected for anyone who tries to
    access the Kindle in an unusual fashion, and in theory they can lead
    to data loss (for example, if one file is closed and synchronized
    while other files are still in the middle of being written).

    To avoid such problems, this patch creates a new usb-storage quirks
    flag telling the driver always to issue a REQUEST SENSE following a
    SYNCHRONIZE CACHE command, and adds an unusual_devs entry for the
    Kindle with the flag set. This is sufficient to prevent the Kindle
    from doing its automatic unload, without interfering with proper
    operation.

    Another possible way to deal with this would be to increase the
    frequency of TEST UNIT READY polling that the kernel normally carries
    out for removable-media storage devices. However, that would increase
    the overall load on the system, and it is not as reliable because the
    user can override the polling interval. Changing the driver's
    behavior is safer and has minimal overhead.
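    The decision the new quirk adds can be sketched in userspace as a small
    predicate. The flag name QUIRK_SENSE_AFTER_SYNC and the helper
    needs_request_sense() are hypothetical stand-ins, not the patch's actual
    identifiers. The opcodes are the standard SCSI values: SYNCHRONIZE CACHE
    (10) is 0x35 and REQUEST SENSE is 0x03. This sketch assumes that only the
    10-byte SYNCHRONIZE CACHE variant is checked.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Standard SCSI opcodes. */
    #define SCSI_REQUEST_SENSE      0x03
    #define SCSI_SYNCHRONIZE_CACHE  0x35  /* 10-byte variant only */

    /* Hypothetical quirk bit standing in for the patch's new flag. */
    #define QUIRK_SENSE_AFTER_SYNC  (1u << 0)

    /*
     * Return true if the driver should follow the command it just sent
     * with a REQUEST SENSE, so that the device sees another SCSI command
     * promptly after SYNCHRONIZE CACHE and does not unload its media.
     */
    static bool needs_request_sense(uint32_t quirks, uint8_t opcode)
    {
        return (quirks & QUIRK_SENSE_AFTER_SYNC) &&
               opcode == SCSI_SYNCHRONIZE_CACHE;
    }
    ```

    Devices without the flag are unaffected, which matches the patch's goal
    of changing behavior only for the Kindle's unusual_devs entry.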

    CC:
    Reported-and-tested-by: Matthias Schwarzott
    Signed-off-by: Alan Stern
    Link: https://lore.kernel.org/r/20210317190654.GA497856@rowland.harvard.edu
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     
  • When the gadget is disconnected, the running sequence is as follows:
    . composite_disconnect
    . Call trace:
        usb_string_copy+0xd0/0x128
        gadget_config_name_configuration_store+0x4
        gadget_config_name_attr_store+0x40/0x50
        configfs_write_file+0x198/0x1f4
        vfs_write+0x100/0x220
        SyS_write+0x58/0xa8
    . configfs_composite_unbind
    . configfs_composite_bind

    configfs_composite_bind() contains the assignment
    "cn->strings.s = cn->configuration;"

    When usb_string_copy() is invoked, it allocates memory, copies the input
    string, frees the previously pointed-to memory, and uses the newly
    allocated memory.

    When the gadget is connected, the host sends requests to get information.
    Call trace:
    usb_gadget_get_string+0xec/0x168
    lookup_string+0x64/0x98
    composite_setup+0xa34/0x1ee8

    If the gadget is disconnected and reconnected quickly, then in the failing
    case the cn->configuration memory has already been released by the kfree()
    in usb_string_copy(), but configfs_composite_bind() has not yet run to
    assign the newly allocated cn->configuration pointer to cn->strings.s.

    When strlen(s->s) in usb_gadget_get_string() is executed, the dangling
    memory is accessed and a "BUG: KASAN: use-after-free" error occurs.
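    The hazard can be modeled in userspace. A second pointer (strings_s,
    standing in for cn->strings.s) aliases the buffer that a copy helper
    frees and replaces. The sketch contrasts that free-and-replace helper
    with an in-place variant that reuses a fixed-size buffer, so the alias
    never dangles. The struct, both helper names, and the 126-byte limit are
    assumptions for illustration. Reusing the buffer is one possible
    mitigation and is not quoted from the actual fix.

    ```c
    #include <stdlib.h>
    #include <string.h>

    #define MAX_STR_LEN 126  /* assumed maximum length, excluding the NUL */

    /* Hypothetical stand-in for the configfs config-name object (cn). */
    struct config_name {
        char *configuration;    /* owned string buffer */
        const char *strings_s;  /* alias, like cn->strings.s */
    };

    /* Free-and-replace: any alias of the old buffer dangles afterwards. */
    static int copy_replace(char **dst, const char *src)
    {
        size_t n = strlen(src) + 1;
        char *s = malloc(n);

        if (!s)
            return -1;
        memcpy(s, src, n);
        free(*dst);  /* an alias such as strings_s now points at freed memory */
        *dst = s;
        return 0;
    }

    /*
     * In-place: allocate one MAX_STR_LEN + 1 buffer on first use and
     * overwrite it afterwards, so existing aliases stay valid. Assumes any
     * non-NULL *dst was allocated by this function.
     */
    static int copy_in_place(char **dst, const char *src)
    {
        size_t n = strlen(src);

        if (n > MAX_STR_LEN)
            return -1;
        if (!*dst) {
            *dst = malloc(MAX_STR_LEN + 1);
            if (!*dst)
                return -1;
        }
        memcpy(*dst, src, n + 1);
        return 0;
    }
    ```

    With copy_in_place(), reading through strings_s after a reconnect sees
    the new string instead of freed memory. The trade-off is a fixed
    worst-case allocation per string.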

    Cc: stable@vger.kernel.org
    Signed-off-by: Jim Lin
    Signed-off-by: Macpaul Lin
    Link: https://lore.kernel.org/r/1615444961-13376-1-git-send-email-macpaul.lin@mediatek.com
    Signed-off-by: Greg Kroah-Hartman

    Jim Lin
     
  • Currently udc->ud.tcp_rx is assigned twice. The second assignment is
    incorrect: it should be to udc->ud.tcp_tx instead of tcp_rx. Fix this.
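    A minimal sketch of the corrected assignment. The struct is a
    hypothetical subset of the usbip device state. In the kernel the thread
    handles are struct task_struct pointers; void pointers are used here to
    keep the example self-contained.

    ```c
    /* Hypothetical subset of the usbip device state. */
    struct usbip_device_model {
        void *tcp_rx;  /* receive thread handle */
        void *tcp_tx;  /* transmit thread handle */
    };

    static void start_threads(struct usbip_device_model *ud, void *rx, void *tx)
    {
        ud->tcp_rx = rx;
        ud->tcp_tx = tx;  /* the buggy code assigned this to tcp_rx again */
    }
    ```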

    Fixes: 46613c9dfa96 ("usbip: fix vudc usbip_sockfd_store races leading to gpf")
    Acked-by: Shuah Khan
    Signed-off-by: Colin Ian King
    Cc: stable
    Addresses-Coverity: ("Unused value")
    Link: https://lore.kernel.org/r/20210311104445.7811-1-colin.king@canonical.com
    Signed-off-by: Greg Kroah-Hartman

    Colin Ian King
     

16 Mar, 2021

1 commit


15 Mar, 2021

14 commits

  • …el/git/westeri/thunderbolt into usb-linus

    Mika writes:

    thunderbolt: Fixes for v5.12-rc4

    This includes a fix to initialize HopID IDAs earlier to make sure
    tb_switch_release() always works, and another fix that increases runtime
    PM reference count on DisplayPort tunnel discovery.

    * tag 'thunderbolt-for-v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
    thunderbolt: Increase runtime PM reference count on DP tunnel discovery
    thunderbolt: Initialize HopID IDAs in tb_switch_alloc()

    Greg Kroah-Hartman
     
  • Linus Torvalds
     
  • Doing a

    prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);

    will copy 1 byte from userspace to a (quite big) on-stack array
    and then stash the whole array in mm->saved_auxv.
    An AT_NULL terminator will be inserted at the very end.

    The /proc/*/auxv handler will find that AT_NULL terminator
    and copy the original stack contents to userspace.

    This devious scheme requires CAP_SYS_RESOURCE.
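    The leak pattern can be sketched in userspace. If only len bytes are
    copied into a larger stack buffer and the whole buffer is then saved,
    whatever the stack held before is exposed. Zeroing the buffer first, as
    below, is one way to close the leak. AUXV_WORDS and set_auxv() are
    hypothetical, and this is not claimed to be the exact upstream fix.

    ```c
    #include <stddef.h>
    #include <string.h>

    #define AUXV_WORDS 46  /* hypothetical stand-in for the kernel's array size */

    /* Save len bytes of user-supplied auxv without leaking stack contents. */
    static int set_auxv(unsigned long saved[AUXV_WORDS],
                        const void *user, size_t len)
    {
        unsigned long buf[AUXV_WORDS];

        if (len > sizeof(buf))
            return -1;
        /* Without this memset, bytes past len would be stale stack data. */
        memset(buf, 0, sizeof(buf));
        memcpy(buf, user, len);
        /* AT_NULL (0) terminator at the very end, as described above. */
        buf[AUXV_WORDS - 2] = 0;
        buf[AUXV_WORDS - 1] = 0;
        memcpy(saved, buf, sizeof(buf));
        return 0;
    }
    ```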

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Pull irq fixes from Thomas Gleixner:
    "A set of irqchip updates:

    - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct

    - Add a missing DT compatible string for the Ingenic driver

    - Remove the pointless debugfs_file pointer from struct irqdomain"

    * tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/ingenic: Add support for the JZ4760
    dt-bindings/irq: Add compatible string for the JZ4760B
    irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
    ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
    irqdomain: Remove debugfs_file from struct irq_domain

    Linus Torvalds
     
  • Pull timer fix from Thomas Gleixner:
    "A single fix for hrtimers to prevent an interrupt storm caused by
    the lack of reevaluation of the timers which expire in softirq context
    under certain circumstances, e.g. when the clock was set"

    * tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()

    Linus Torvalds
     
  • Pull scheduler fixes from Thomas Gleixner:
    "A set of scheduler updates:

    - Prevent a NULL pointer dereference in the migration_stop_cpu()
    mechanism

    - Prevent self concurrency of affine_move_task()

    - Small fixes and cleanups related to task migration/affinity setting

    - Ensure that sync_runqueues_membarrier_state() is invoked on the
    current CPU when it is in the cpu mask"

    * tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/membarrier: fix missing local execution of ipi_sync_rq_state()
    sched: Simplify set_affinity_pending refcounts
    sched: Fix affine_move_task() self-concurrency
    sched: Optimize migration_cpu_stop()
    sched: Collate affine_move_task() stoppers
    sched: Simplify migration_cpu_stop()
    sched: Fix migration_cpu_stop() requeueing

    Linus Torvalds
     
  • Pull objtool fix from Thomas Gleixner:
    "A single objtool fix to handle the PUSHF/POPF validation correctly for
    the paravirt changes which modified arch_local_irq_restore not to use
    popf"

    * tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    objtool,x86: Fix uaccess PUSHF/POPF validation

    Linus Torvalds
     
  • Pull locking fixes from Thomas Gleixner:
    "A couple of locking fixes:

    - A fix for the static_call mechanism so it handles unaligned
    addresses correctly.

    - Make u64_stats_init() a macro so every instance gets a separate
    lockdep key.

    - Make seqcount_latch_init() a macro as well to preserve the static
    variable which is used for the lockdep key"

    * tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    seqlock,lockdep: Fix seqcount_latch_init()
    u64_stats,lockdep: Fix u64_stats_init() vs lockdep
    static_call: Fix the module key fixup

    Linus Torvalds
     
  • Pull perf fixes from Borislav Petkov:

    - Make sure PMU internal buffers are flushed for per-CPU events too and
    properly handle PID/TID for large PEBS.

    - Properly handle the case where there is no PMU by returning an
    empty list of perf MSRs for VMX to switch, instead of reading random
    garbage from the stack.

    * tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
    perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
    perf/core: Flush PMU internal buffers for per-CPU events

    Linus Torvalds
     
  • Pull EFI fix from Ard Biesheuvel via Borislav Petkov:
    "Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was
    added in v5.10 but failed to take the SetVirtualAddressMap() RT service
    into account"

    * tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table

    Linus Torvalds
     
  • Pull x86 fixes from Borislav Petkov:

    - A couple of SEV-ES fixes and robustifications: verify that the usermode
    stack pointer in NMI is not coming from the syscall gap, correctly track
    IRQ states in the #VC handler, and access user insn bytes atomically
    in the same handler, as the latter cannot sleep.

    - Balance 32-bit fast syscall exit path to do the proper work on exit
    and thus not confuse audit and ptrace frameworks.

    - Two fixes for the ORC unwinder going "off the rails" into KASAN
    redzones and when ORC data is missing.

    * tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/sev-es: Use __copy_from_user_inatomic()
    x86/sev-es: Correctly track IRQ states in runtime #VC handler
    x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
    x86/sev-es: Introduce ip_within_syscall_gap() helper
    x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
    x86/unwind/orc: Silence warnings caused by missing ORC data
    x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    "Some more powerpc fixes for 5.12:

    - Fix wrong instruction encoding for lis in ppc_function_entry(),
    which could potentially lead to missed kprobes.

    - Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of
    non-volatile GPRs immediately after exec.

    - Clean up a missed SRR specifier in the recent interrupt rework.

    - Don't treat unrecoverable_exception() as an interrupt handler, it's
    called from other handlers so shouldn't do the interrupt entry/exit
    accounting itself.

    - Fix build errors caused by missing declarations for
    [en/dis]able_kernel_vsx().

    Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri
    Olsa, Naveen N. Rao, and Nicholas Piggin"

    * tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/traps: unrecoverable_exception() is not an interrupt handler
    powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
    powerpc/64s/exception: Clean up a missed SRR specifier
    powerpc: Fix inverted SET_FULL_REGS bitop
    powerpc/64s: Use symbolic macros for function entry encoding
    powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()

    Linus Torvalds
     
  • Pull KVM fixes from Paolo Bonzini:
    "More fixes for ARM and x86"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: LAPIC: Advancing the timer expiration on guest initiated write
    KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
    KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
    kvm: x86: annotate RCU pointers
    KVM: arm64: Fix exclusive limit for IPA size
    KVM: arm64: Reject VM creation when the default IPA size is unsupported
    KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
    KVM: arm64: Don't use cbz/adr with external symbols
    KVM: arm64: Fix range alignment when walking page tables
    KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
    KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
    KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
    KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
    KVM: arm64: Fix nVHE hyp panic host context restore
    KVM: arm64: Avoid corrupting vCPU context register in guest exit
    KVM: arm64: nvhe: Save the SPE context early
    kvm: x86: use NULL instead of using plain integer as pointer
    KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
    KVM: x86: Ensure deadline timer has truly expired before posting its IRQ

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "28 patches.

    Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
    highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
    zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
    ia64"

    * emailed patches from Andrew Morton : (28 commits)
    zram: fix broken page writeback
    zram: fix return value on writeback_store
    mm/memcg: set memcg when splitting page
    mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
    ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
    ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
    mm/userfaultfd: fix memory corruption due to writeprotect
    kasan: fix KASAN_STACK dependency for HW_TAGS
    kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
    mm/madvise: replace ptrace attach requirement for process_madvise
    include/linux/sched/mm.h: use rcu_dereference in in_vfork()
    kfence: fix reports if constant function prefixes exist
    kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
    kfence: fix printk format for ptrdiff_t
    linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
    MAINTAINERS: exclude uapi directories in API/ABI section
    binfmt_misc: fix possible deadlock in bm_register_write
    mm/highmem.c: fix zero_user_segments() with start > end
    hugetlb: do early cow when page pinned on src mm
    mm: use is_cow_mapping() across tree where proper
    ...

    Linus Torvalds
     

14 Mar, 2021

22 commits

  • …t/maz/arm-platforms into irq/urgent

    Pull irqchip fixes from Marc Zyngier:

    - More compatible strings for the Ingenic irqchip (introducing the
    JZ4760B SoC)
    - Select GENERIC_IRQ_MULTI_HANDLER on the ARM ep93xx platform
    - Drop all GENERIC_IRQ_MULTI_HANDLER selections from the irqchip
    Kconfig, now relying on the architecture to get it right
    - Drop the debugfs_file field from struct irq_domain, now that
    debugfs can track things on its own

    Thomas Gleixner
     
  • Pull char/misc driver fixes from Greg KH:
    "Here are some small misc/char driver fixes to resolve some reported
    problems:

    - habanalabs driver fixes

    - Acrn build fixes (reported many times)

    - pvpanic module table export fix

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    misc/pvpanic: Export module FDT device table
    misc: fastrpc: restrict user apps from sending kernel RPC messages
    virt: acrn: Correct type casting of argument of copy_from_user()
    virt: acrn: Use EPOLLIN instead of POLLIN
    virt: acrn: Use vfs_poll() instead of f_op->poll()
    virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU
    cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP
    habanalabs: fix debugfs address translation
    habanalabs: Disable file operations after device is removed
    habanalabs: Call put_pid() when releasing control device
    drivers: habanalabs: remove unused dentry pointer for debugfs files
    habanalabs: mark hl_eq_inc_ptr() as static

    Linus Torvalds
     
  • Pull staging driver fixes from Greg KH:
    "Here are some small staging driver fixes for reported problems. They
    include:

    - wfx header file cleanup patch reverted as it could cause problems

    - comedi driver endian fixes

    - buffer overflow problems for staging wifi drivers

    - build dependency issue for rtl8192e driver

    All have been in linux-next for a while with no reported problems"

    * tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
    Revert "staging: wfx: remove unused included header files"
    staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
    staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
    staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
    staging: comedi: pcl726: Use 16-bit 0 for interrupt data
    staging: comedi: ni_65xx: Use 16-bit 0 for interrupt data
    staging: comedi: ni_6527: Use 16-bit 0 for interrupt data
    staging: comedi: comedi_parport: Use 16-bit 0 for interrupt data
    staging: comedi: amplc_pc236_common: Use 16-bit 0 for interrupt data
    staging: comedi: pcl818: Fix endian problem for AI command data
    staging: comedi: pcl711: Fix endian problem for AI command data
    staging: comedi: me4000: Fix endian problem for AI command data
    staging: comedi: dmm32at: Fix endian problem for AI command data
    staging: comedi: das800: Fix endian problem for AI command data
    staging: comedi: das6402: Fix endian problem for AI command data
    staging: comedi: adv_pci1710: Fix endian problem for AI command data
    staging: comedi: addi_apci_1500: Fix endian problem for command sample
    staging: comedi: addi_apci_1032: Fix endian problem for COS sample
    staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
    staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
    ...

    Linus Torvalds
     
  • Pull tty/serial fixes from Greg KH:
    "Here are some small tty and serial driver fixes to resolve some
    reported problems:

    - LED tty trigger fixes based on review, acked by the LED
    maintainer

    - revert a max310x serial driver patch as it was causing problems

    - revert a pty change as it was also causing problems

    All of these have been in linux-next for a while with no reported
    problems"

    * tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    Revert "drivers:tty:pty: Fix a race causing data loss on close"
    Revert "serial: max310x: rework RX interrupt handling"
    leds: trigger/tty: Use led_set_brightness_sync() from workqueue
    leds: trigger: Fix error path to not unlock the unlocked mutex

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are a small number of USB fixes for 5.12-rc3 to resolve a bunch
    of reported issues:

    - usbip fixups for issues found by syzbot

    - xhci driver fixes and quirk additions

    - gadget driver fixes

    - dwc3 QCOM driver fix

    - usb-serial new ids and fixes

    - usblp fix for a long-time issue

    - cdc-acm quirk addition

    - other tiny fixes for reported problems

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
    xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
    usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
    xhci: Improve detection of device initiated wake signal.
    usb: xhci: do not perform Soft Retry for some xHCI hosts
    usbip: fix vudc usbip_sockfd_store races leading to gpf
    usbip: fix vhci_hcd attach_store() races leading to gpf
    usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
    usbip: fix vudc to check for stream socket
    usbip: fix vhci_hcd to check for stream socket
    usbip: fix stub_dev to check for stream socket
    usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
    USB: usblp: fix a hang in poll() if disconnected
    USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
    usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
    usb: dwc3: qcom: Honor wakeup enabled/disabled state
    usb: gadget: f_uac1: stop playback on function disable
    usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
    USB: gadget: u_ether: Fix a configfs return code
    usb: dwc3: qcom: add ACPI device id for sc8180x
    Goodix Fingerprint device is not a modem
    ...

    Linus Torvalds
     
  • Pull erofs fix from Gao Xiang:
    "Fix an urgent regression introduced by commit baa2c7c97153 ("block:
    set .bi_max_vecs as actual allocated vector number"), which could
    cause an unexpected hang since Linux 5.12-rc1.

    Resolve it by avoiding using bio->bi_max_vecs completely"

    * tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
    erofs: fix bio->bi_max_vecs behavior change

    Linus Torvalds
     
  • …t/masahiroy/linux-kbuild

    Pull Kbuild fixes from Masahiro Yamada:

    - avoid 'make image_name' invoking syncconfig

    - fix a couple of bugs in scripts/dummy-tools

    - fix LLD_VENDOR and locale issues in scripts/ld-version.sh

    - rebuild GCC plugins when the compiler is upgraded

    - allow LTO to be enabled with KASAN_HW_TAGS

    - allow LTO to be enabled without LLVM=1

    * tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kbuild: fix ld-version.sh to not be affected by locale
    kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)
    kbuild: remove LLVM=1 test from HAS_LTO_CLANG
    kbuild: remove unneeded -O option to dtc
    kbuild: dummy-tools: adjust to scripts/cc-version.sh
    kbuild: Allow LTO to be selected with KASAN_HW_TAGS
    kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc
    kbuild: rebuild GCC plugins when the compiler is upgraded
    kbuild: Fix ld-version.sh script if LLD was built with LLD_VENDOR
    kbuild: dummy-tools: fix inverted tests for gcc
    kbuild: add image_name to no-sync-config-targets

    Linus Torvalds
     
  • commit 0d8359620d9b ("zram: support page writeback") introduced two
    problems. It overwrites writeback_store's return value with kstrtol's
    return value, which makes the return value zero, so the user could see
    zero as the return value of the write syscall even though it wrote data
    successfully.

    It also breaks the index value in the loop because it no longer
    increases the index. This means it can write only the first starting
    block index, so the user cannot write all idle pages in the zram and
    loses the chance to save memory.

    This patch fixes those issues.
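    The corrected loop can be sketched in simplified form. The parsing step
    is omitted, and the per-block writer is a hypothetical callback
    (write_block_fn) in place of the real page writeback. The byte count
    reported to write(2) lives in its own variable and is never reused for
    helper status codes. The index also advances on every iteration.

    ```c
    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical per-block writer: 0 on success, negative errno on error. */
    typedef int (*write_block_fn)(unsigned long index, void *ctx);

    /* Write back blocks [0, nr_blocks); return len or the first error. */
    static ssize_t writeback_model(size_t len, unsigned long nr_blocks,
                                   write_block_fn write_one_block, void *ctx)
    {
        ssize_t ret = (ssize_t)len;  /* success value for write(2) */
        unsigned long index;

        for (index = 0; index < nr_blocks; index++) {  /* index advances */
            int err = write_one_block(index, ctx);     /* separate status */

            if (err) {
                ret = err;
                break;
            }
        }
        return ret;
    }

    /* Example callbacks: count visited blocks, or always fail with -EIO. */
    static int count_block(unsigned long index, void *ctx)
    {
        (void)index;
        (*(unsigned long *)ctx)++;
        return 0;
    }

    static int fail_block(unsigned long index, void *ctx)
    {
        (void)index;
        (void)ctx;
        return -EIO;
    }
    ```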

    Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org
    Fixes: 0d8359620d9b ("zram: support page writeback")
    Signed-off-by: Minchan Kim
    Reported-by: Amos Bianchi
    Cc: Sergey Senozhatsky
    Cc: John Dias
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • writeback_store's return value is overwritten by submit_bio_wait's return
    value. Thus, writeback_store will return zero since there was no IO
    error. In the end, the write syscall from userspace sees zero as the
    return value, which could make the process stall, retrying the write
    until it succeeds.

    Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org
    Fixes: 3b82a051c101 ("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store")
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Colin Ian King
    Cc: John Dias
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • As described in the split_page() comment, for a non-compound high-order
    page, the sub-pages must be freed individually. If only the first page
    has its memcg set, the tail pages cannot be uncharged when they are freed.

    For example, when alloc_pages_exact() is used to allocate 1MB of
    contiguous physical memory, 2MB is charged (kmemcg is enabled and
    __GFP_ACCOUNT is set). When make_alloc_exact() frees the unused 1MB and
    free_pages_exact() frees the requested 1MB, only 4KB (one page) is
    actually uncharged.

    Therefore, the memcg of the tail page needs to be set when splitting a
    page.
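    The accounting failure can be modeled with a per-page owner field. When
    only the head page records its memcg, freeing the pages one by one
    uncharges a single page. Copying the memcg to every tail page at split
    time lets every free uncharge. All names here are hypothetical and only
    mirror the behavior described above.

    ```c
    #include <stddef.h>

    /* Hypothetical page model: memcg is an opaque owner tag. */
    struct page_model {
        void *memcg;
    };

    /* Charge nr pages, recording the owner on the head page only. */
    static void charge_high_order(struct page_model *pages, size_t nr,
                                  void *memcg, long *charged)
    {
        pages[0].memcg = memcg;
        *charged += (long)nr;
    }

    /* Propagate the head's memcg to every tail page, as the fix describes. */
    static void split_page_memcg_model(struct page_model *pages, size_t nr)
    {
        for (size_t i = 1; i < nr; i++)
            pages[i].memcg = pages[0].memcg;
    }

    /* Free one page: only a page that knows its memcg can be uncharged. */
    static void free_one(struct page_model *page, long *charged)
    {
        if (page->memcg) {
            (*charged)--;
            page->memcg = NULL;
        }
    }
    ```

    With 4KB pages, the 2MB block in the example is 512 pages. Without the
    split step, 511 of them would stay charged after all are freed.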

    Michel:

    There are at least two explicit users of __GFP_ACCOUNT with
    alloc_pages_exact added recently. See 7efe8ef274024 ("KVM: arm64:
    Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c419621873713
    ("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
    just a theoretical issue.

    Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com
    Signed-off-by: Zhou Guanghui
    Acked-by: Johannes Weiner
    Reviewed-by: Zi Yan
    Reviewed-by: Shakeel Butt
    Acked-by: Michal Hocko
    Cc: Hanjun Guo
    Cc: Hugh Dickins
    Cc: Kefeng Wang
    Cc: "Kirill A. Shutemov"
    Cc: Nicholas Piggin
    Cc: Rui Xiang
    Cc: Tianhong Ding
    Cc: Weilong Chen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhou Guanghui
     
  • Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly pass
    in the page number argument.

    This makes the interface name more generic, so other potential users
    can call it. In addition, the complete memcg information (memcg and
    flags) needs to be set on the tail pages.

    Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com
    Signed-off-by: Zhou Guanghui
    Acked-by: Johannes Weiner
    Reviewed-by: Zi Yan
    Reviewed-by: Shakeel Butt
    Acked-by: Michal Hocko
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Nicholas Piggin
    Cc: Kefeng Wang
    Cc: Hanjun Guo
    Cc: Tianhong Ding
    Cc: Weilong Chen
    Cc: Rui Xiang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhou Guanghui
     
  • In https://bugs.gentoo.org/769614 Dmitry noticed that
    `ptrace(PTRACE_GET_SYSCALL_INFO)` does not return the error sign properly.

    The bug is a mismatch between the get and set error helpers:

    static inline long syscall_get_error(struct task_struct *task,
                                         struct pt_regs *regs)
    {
            return regs->r10 == -1 ? regs->r8:0;
    }

    static inline long syscall_get_return_value(struct task_struct *task,
                                                struct pt_regs *regs)
    {
            return regs->r8;
    }

    static inline void syscall_set_return_value(struct task_struct *task,
                                                struct pt_regs *regs,
                                                int error, long val)
    {
            if (error) {
                    /* error < 0, but ia64 uses > 0 return value */
                    regs->r8 = -error;
                    regs->r10 = -1;
            } else {
                    regs->r8 = val;
                    regs->r10 = 0;
            }
    }

    Tested on v5.10 on rx3600 machine (ia64 9040 CPU).
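    The mismatch can be checked with a userspace model of the two
    registers. The setter stores the positive errno in r8 and sets r10 to
    -1, so a matching getter has to negate r8 again. The regs_model struct
    is a hypothetical stand-in for struct pt_regs. The negating getter is a
    sketch consistent with the setter above and is not quoted from the
    upstream patch.

    ```c
    #include <errno.h>  /* EINVAL etc. for callers */

    /* Hypothetical subset of ia64 struct pt_regs. */
    struct regs_model {
        long r8;   /* return value, or positive errno on error */
        long r10;  /* -1 on error, 0 on success */
    };

    static void set_return_value(struct regs_model *regs, int error, long val)
    {
        if (error) {
            regs->r8 = -error;  /* error < 0; ia64 stores it positive */
            regs->r10 = -1;
        } else {
            regs->r8 = val;
            regs->r10 = 0;
        }
    }

    /* Negate r8 so the getter returns the same negative errno the setter got. */
    static long get_error(const struct regs_model *regs)
    {
        return regs->r10 == -1 ? -regs->r8 : 0;
    }
    ```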

    Link: https://lkml.kernel.org/r/20210221002554.333076-2-slyfox@gentoo.org
    Link: https://bugs.gentoo.org/769614
    Signed-off-by: Sergei Trofimovich
    Reported-by: Dmitry V. Levin
    Reviewed-by: Dmitry V. Levin
    Cc: John Paul Adrian Glaubitz
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergei Trofimovich
     
  • In https://bugs.gentoo.org/769614 Dmitry noticed that
    `ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called via
    glibc's syscall() wrapper.

    ia64 has two ways to call syscalls from userspace: via the `break` and
    `epc` instructions.

    The difference is in stack layout:

    1. `epc` creates a simple stack frame: no locals, in{0..7} == out{0..8}
    2. `break` uses the userspace stack frame: there may be locals (glibc
    provides one), in{0..7} == out{0..8}.

    Both work fine in the syscall handling code itself.

    But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses the unwind mechanism to
    re-extract syscall arguments, and it does not account for locals.

    The change always skips the local registers. This should not affect the
    `epc` path, as the kernel's handler already enforces locals=0, and it
    fixes the `break` path.

    Tested on v5.10 on rx3600 machine (ia64 9040 CPU).

    Link: https://lkml.kernel.org/r/20210221002554.333076-1-slyfox@gentoo.org
    Link: https://bugs.gentoo.org/769614
    Signed-off-by: Sergei Trofimovich
    Reported-by: Dmitry V. Levin
    Cc: Oleg Nesterov
    Cc: John Paul Adrian Glaubitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergei Trofimovich
     
  • Userfaultfd self-test fails occasionally, indicating a memory corruption.

    Analyzing this problem indicates that there is a real bug since mmap_lock
    is only taken for read in mwriteprotect_range() and defers flushes, and
    since there is insufficient consideration of concurrent deferred TLB
    flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in
    wp_page_copy(), this flush takes place after the copy has already been
    performed, and therefore changes of the page are possible between the time
    of the copy and the time in which the PTE is flushed.

    To make matters worse, memory-unprotection using userfaultfd also poses a
    problem. Although memory unprotection is logically a promotion of PTE
    permissions, and therefore should not require a TLB flush, the current
    userfaultfd code might actually cause a demotion of the architectural PTE
    permission: when userfaultfd_writeprotect() unprotects a memory region, it
    unintentionally *clears* the RW-bit if it was already set. Note that
    unprotecting a PTE that is not write-protected is a valid use-case: the
    userfaultfd monitor might ask to unprotect a region that holds both
    write-protected and write-unprotected PTEs.

    The scenario that happens in selftests/vm/userfaultfd is as follows:

    cpu0                        cpu1                    cpu2
    ----                        ----                    ----
                                                        [ Writable PTE
                                                          cached in TLB ]
    userfaultfd_writeprotect()
    [ write-*unprotect* ]
    mwriteprotect_range()
    mmap_read_lock()
    change_protection()

    change_protection_range()
    ...
    change_pte_range()
    [ *clear* “write”-bit ]
    [ defer TLB flushes ]
                                [ page-fault ]
                                ...
                                wp_page_copy()
                                 cow_user_page()
                                  [ copy page ]
                                                        [ write to old
                                                          page ]
                                ...
                                 set_pte_at_notify()

    A similar scenario can happen:

    cpu0                        cpu1                        cpu2                cpu3
    ----                        ----                        ----                ----
                                                                                [ Writable PTE
                                                                                  cached in TLB ]
    userfaultfd_writeprotect()
    [ write-protect ]
    [ deferred TLB flush ]
                                userfaultfd_writeprotect()
                                [ write-unprotect ]
                                [ deferred TLB flush ]
                                                            [ page-fault ]
                                                            wp_page_copy()
                                                             cow_user_page()
                                                             [ copy page ]
                                                             ...                [ write to page ]
                                                            set_pte_at_notify()

    This race exists since commit 292924b26024 ("userfaultfd: wp: apply
    _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed out, these races became
    apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification"),
    which made wp_page_copy() more likely to take place, specifically if
    page_count(page) > 1.

    To resolve the aforementioned races, check whether there are pending
    flushes on uffd-write-protected VMAs, and if there are, perform a flush
    before doing the COW.

    Further optimizations will follow to avoid unnecessary PTE
    write-protection and TLB flushes during uffd-write-unprotect.
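    The shape of the fix (check for pending TLB flushes, and flush before
    copying) can be sketched with a simple counter. The names mm_model,
    flush_tlb_model() and cow_copy() are hypothetical. In the kernel,
    pending flushes are tracked per mm (for example via
    mm_tlb_flush_pending()), and the check is limited to
    uffd-write-protected VMAs as described above.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical per-mm state. */
    struct mm_model {
        int tlb_flush_pending;  /* deferred flushes not yet performed */
        int flushes_done;
    };

    static void flush_tlb_model(struct mm_model *mm)
    {
        mm->flushes_done++;
        mm->tlb_flush_pending = 0;
    }

    /*
     * Copy-on-write step: if the VMA is uffd-write-protected and flushes
     * are still pending, flush first so no CPU can keep writing through a
     * stale writable TLB entry while the page is being copied.
     */
    static void cow_copy(struct mm_model *mm, bool uffd_wp,
                         char *dst, const char *src, size_t n)
    {
        if (uffd_wp && mm->tlb_flush_pending)
            flush_tlb_model(mm);
        memcpy(dst, src, n);
    }
    ```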

    Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
    Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
    Signed-off-by: Nadav Amit
    Suggested-by: Yu Zhao
    Reviewed-by: Peter Xu
    Tested-by: Peter Xu
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Pavel Emelyanov
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Minchan Kim
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: [5.9+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadav Amit
     
  • There's a runtime failure when running a HW_TAGS-enabled kernel built with
    GCC on hardware that doesn't support MTE. GCC-built kernels always have
    CONFIG_KASAN_STACK enabled, even though stack instrumentation isn't
    supported by HW_TAGS. Having that config enabled causes KASAN to issue
    MTE-only instructions to unpoison kernel stacks, which causes the failure.

    Fix the issue by disallowing CONFIG_KASAN_STACK when HW_TAGS is used.

    (The commit that introduced CONFIG_KASAN_HW_TAGS specified the proper
    dependency for CONFIG_KASAN_STACK_ENABLE but not for CONFIG_KASAN_STACK.)

    Link: https://lkml.kernel.org/r/59e75426241dbb5611277758c8d4d6f5f9298dac.1615215441.git.andreyknvl@google.com
    Fixes: 6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS")
    Signed-off-by: Andrey Konovalov
    Reported-by: Catalin Marinas
    Cc:
    Cc: Will Deacon
    Cc: Vincenzo Frascino
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Marco Elver
    Cc: Peter Collingbourne
    Cc: Evgenii Stepanov
    Cc: Branislav Rankov
    Cc: Kevin Brodsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
    after debug_pagealloc_unmap_pages(). This causes a crash when
    debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
    unmapped page.

    This patch puts kasan_free_nondeferred_pages() before
    debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
    the page unavailable.

    Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com
    Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
    Signed-off-by: Andrey Konovalov
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Vincenzo Frascino
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Marco Elver
    Cc: Peter Collingbourne
    Cc: Evgenii Stepanov
    Cc: Branislav Rankov
    Cc: Kevin Brodsky
    Cc: Christoph Hellwig
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • process_madvise currently requires ptrace attach capability.
    PTRACE_MODE_ATTACH gives one process complete control over another
    process. It effectively removes the security boundary between the two
    processes (in one direction). Granting ptrace attach capability even to a
    system process is considered dangerous since it creates an attack surface.
    This severely limits the usage of this API.

    The operations process_madvise can perform do not affect the correctness
    of the operation of the target process; they only affect where the data is
    physically located (and therefore, how fast it can be accessed). What we
    want is the ability for one process to influence another process in order
    to optimize performance across the entire system while leaving the
    security boundary intact.

    Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and
    CAP_SYS_NICE: PTRACE_MODE_READ prevents leaking ASLR metadata, and
    CAP_SYS_NICE is required for influencing process performance.

    Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com
    Signed-off-by: Suren Baghdasaryan
    Reviewed-by: Kees Cook
    Acked-by: Minchan Kim
    Acked-by: David Rientjes
    Cc: Jann Horn
    Cc: Jeff Vander Stoep
    Cc: Michal Hocko
    Cc: Shakeel Butt
    Cc: Tim Murray
    Cc: Florian Weimer
    Cc: Oleg Nesterov
    Cc: James Morris
    Cc: [5.10+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suren Baghdasaryan
     
  • Fix a sparse warning by using rcu_dereference(). Technically this is a
    bug and a sufficiently aggressive compiler could reload the `real_parent'
    pointer outside the protection of the rcu lock (and access freed memory),
    but I think it's pretty unlikely to happen.

    Link: https://lkml.kernel.org/r/20210221194207.1351703-1-willy@infradead.org
    Fixes: b18dc5f291c0 ("mm, oom: skip vforked tasks from being selected")
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Miaohe Lin
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Some architectures prefix all functions with a constant string ('.' on
    ppc64). Add ARCH_FUNC_PREFIX, which may optionally be defined in
    , so that get_stack_skipnr() can work properly.

    Link: https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f058@csgroup.eu
    Link: https://lkml.kernel.org/r/20210304144000.1148590-1-elver@google.com
    Signed-off-by: Marco Elver
    Reported-by: Christophe Leroy
    Tested-by: Christophe Leroy
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Andrey Konovalov
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marco Elver
     
    cache_alloc_debugcheck_after() performs checks on an object, including
    adjusting the returned pointer. None of this should apply to KFENCE
    objects. For non-bulk allocations, these checks are already skipped when
    the object is allocated via KFENCE; for bulk allocations, however,
    cache_alloc_debugcheck_after() is still called via
    cache_alloc_debugcheck_after_bulk().

    Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects.

    Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com
    Signed-off-by: Marco Elver
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Andrey Konovalov
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marco Elver
     
  • Use %td for ptrdiff_t.

    Link: https://lkml.kernel.org/r/3abbe4c9-16ad-c168-a90f-087978ccd8f7@csgroup.eu
    Link: https://lkml.kernel.org/r/20210303121157.3430807-1-elver@google.com
    Signed-off-by: Marco Elver
    Reported-by: Christophe Leroy
    Reviewed-by: Alexander Potapenko
    Cc: Dmitriy Vyukov
    Cc: Andrey Konovalov
    Cc: Jann Horn
    Cc: Christophe Leroy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marco Elver
     
    Separating compiler-clang.h from compiler-gcc.h inadvertently dropped the
    definitions of the three HAVE_BUILTIN_BSWAP macros, which requires falling
    back to the open-coded version and hoping that the compiler detects it.

    Since all versions of clang support the __builtin_bswap interfaces, add
    back the flags and have the headers pick these up automatically.

    This results in a 4% improvement in compilation speed for arm defconfig.

    Note: it might also be worth revisiting which architectures set
    CONFIG_ARCH_USE_BUILTIN_BSWAP for one compiler or the other. Today this
    is set on six architectures (arm32, csky, mips, powerpc, s390, x86),
    while another ten architectures define custom helpers (alpha, arc, ia64,
    m68k, mips, nios2, parisc, sh, sparc, xtensa), and the rest (arm64,
    h8300, hexagon, microblaze, nds32, openrisc, riscv) just get the
    unoptimized version and rely on the compiler to detect it.

    A long time ago, the compiler builtins were architecture-specific, but
    nowadays all compilers that are able to build the kernel have correct
    implementations of them, though some may not be as optimized as the
    inline asm versions.

    The patch that dropped the optimization landed in v4.19, so, as
    discussed, it would be fairly safe to backport this revert to the
    4.19/5.4/5.10 stable kernels. There is a remaining risk of regressions,
    but the change has no known side effects besides compile speed.

    Link: https://lkml.kernel.org/r/20210226161151.2629097-1-arnd@kernel.org
    Link: https://lore.kernel.org/lkml/20210225164513.3667778-1-arnd@kernel.org/
    Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Nathan Chancellor
    Reviewed-by: Kees Cook
    Acked-by: Miguel Ojeda
    Acked-by: Nick Desaulniers
    Acked-by: Luc Van Oostenryck
    Cc: Masahiro Yamada
    Cc: Nick Hu
    Cc: Greentime Hu
    Cc: Vincent Chen
    Cc: Paul Walmsley
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: Guo Ren
    Cc: Randy Dunlap
    Cc: Sami Tolvanen
    Cc: Marco Elver
    Cc: Arvind Sankar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann