09 Feb, 2016

20 commits

  • Add some text and an example to Documentation/x86/early-microcode.txt
    explaining how to build in microcode.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-18-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Before this, we issued this message from save_microcode_in_initrd()
    which is called from free_initrd_mem(), i.e., only when we have an
    initrd enabled. However, we can update from builtin microcode too but
    then we don't issue the update message.

    Fix it by issuing that message on the generic driver init path.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-17-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Reflow arguments, sort local variables in reverse christmas tree, kill
    "out" label.

    No functionality change.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-16-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • @cpu is unused, kill it.

    No functionality change.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-15-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Rename it to mc_tmp_ptrs to denote better what it is - a temporary array
    for saving pointers to microcode blobs. And "initrd" is not accurate
    anymore since initrd is not the only source for early microcode.
    Therefore, rename copy_initrd_ptrs() to copy_ptrs() simply and
    "initrd_start" to "offset".

    And then do the following convention: the global variable is called
    "mc_tmp_ptrs" and the local function arguments "mc_ptrs" for
    differentiation.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-14-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • ... and drop the 32-bit casting games which we had to do at the time
    because wrmsr() was unforgiving then, see c3fd0bd5e19a from the
    full history tree:

    commit c3fd0bd5e19aaff9cdd104edff136a2023db657e
    Author: Linus Torvalds
    Date: Tue Feb 17 23:23:41 2004 -0800

    Fix up the microcode update on regular 32-bit x86. Our wrmsr()
    is a bit unforgiving and really doesn't like 64-bit values.
    ...
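
    As a rough illustration of why 64-bit values were awkward there: the
    WRMSR instruction takes its operand as two 32-bit halves in EDX:EAX.
    The sketch below is not the kernel's code; struct msr_halves,
    split_msr_value() and join_msr_value() are hypothetical names used only
    to show the split.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 32-bit halves WRMSR takes in EDX:EAX. */
struct msr_halves {
	uint32_t lo;	/* low 32 bits, would go into EAX */
	uint32_t hi;	/* high 32 bits, would go into EDX */
};

/* Split a 64-bit value (e.g. the address of a microcode blob) into halves. */
static struct msr_halves split_msr_value(uint64_t val)
{
	struct msr_halves h;

	h.lo = (uint32_t)(val & 0xffffffffu);
	h.hi = (uint32_t)(val >> 32);	/* safe: val is 64 bits wide */
	return h;
}

/* Reassemble the halves; the inverse of split_msr_value(). */
static uint64_t join_msr_value(struct msr_halves h)
{
	return ((uint64_t)h.hi << 32) | h.lo;
}
```

    An accessor that accepts a single 64-bit argument can do this split
    internally, so callers no longer need their own casts.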

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-13-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Get rid of local variable cpu_num as it is equal to @cpu now. Deref
    cpu_data() only when it is really needed at the end.

    No functionality change.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-12-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • If we're going to BUG_ON() because we're running on the wrong CPU, we
    had better do it as the first thing when entering that function. Also,
    turn it into a WARN_ON() because it is not worth panicking the system
    if we apply the microcode on the wrong CPU - we're simply going to
    exit early.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-11-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Well, it is apparent what it points to - microcode. And since it is the
    intel loader, no need for the "_intel" suffix. Use "!" for the 0/NULL
    checks, while at it.

    No functionality change.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-10-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • It is shorter and easier on the eyes. Change the "== 0" tests to "!..."
    while at it.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-9-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • So it is always a head-twister when trying to stare at code which has a
    bunch of

    struct mc_saved_data *mc_saved_data;

    local function variables *and* a global mc_saved_data of the same name.

    Rename all locals to "mcs" to differentiate from the global one.

    No functionality change.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-8-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • It is supplied by pr_fmt already.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-7-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • This is especially annoying on large boxes:

    x86: Booting SMP configuration:
    .... node #0, CPUs: #1
    microcode: CPU1 microcode updated early to revision 0x428, date = 2014-05-29
    #2
    microcode: CPU2 microcode updated early to revision 0x428, date = 2014-05-29
    #3
    ...

    so issue the update message only once.

    $ grep microcode /proc/cpuinfo

    shows whether every core got updated properly.

    Reported-by: Ingo Molnar
    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-6-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • "uci" is an element of the ucode_cpu_info[] array, it can't be NULL.

    Tested-by: Thomas Voegtle
    Signed-off-by: Dan Carpenter
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: kernel-janitors@vger.kernel.org
    Link: http://lkml.kernel.org/r/1454499225-21544-5-git-send-email-bp@alien8.de
    Link: http://lkml.kernel.org/r/20140120103046.GC14233@elgon.mountain
    Signed-off-by: Ingo Molnar

    Dan Carpenter
     
  • We do parse for the disable microcode loader chicken bit very early.
    After the driver merge, the __setup() param parsing method is not needed
    anymore so get rid of it.

    In addition, fix a compiler warning from an old SLES11 gcc (4.3.4)
    reported by Jan Beulich:

    arch/x86/kernel/cpu/microcode/core.c: In function ‘load_ucode_bsp’:
    arch/x86/kernel/cpu/microcode/core.c:96: warning: array subscript is above array bounds

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-4-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Set the initrd @start depending on the presence of an initrd. Otherwise,
    builtin microcode loading doesn't work as the start is wrong and we're
    using it to compute offset to the microcode blobs.

    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: # 4.4
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-3-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Thomas Voegtle reported that doing oldconfig with a .config which has
    CONFIG_MICROCODE enabled but BLK_DEV_INITRD disabled prevents the
    microcode loading mechanism from being built.

    So untangle it from the BLK_DEV_INITRD dependency so that oldconfig
    doesn't turn it off, and add explanatory text to its Kconfig help
    describing the supported methods for supplying microcode.

    Reported-by: Thomas Voegtle
    Tested-by: Thomas Voegtle
    Signed-off-by: Borislav Petkov
    Cc: # 4.4
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1454499225-21544-2-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Pull KVM fixes from Paolo Bonzini:
    "KVM-ARM fixes, mostly coming from the PMU work"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    arm64: KVM: Fix guest dead loop when register accessor returns false
    arm64: KVM: Fix comments of the CP handler
    arm64: KVM: Fix wrong use of the CPSR MODE mask for 32bit guests
    arm64: KVM: Obey RES0/1 reserved bits when setting CPTR_EL2
    arm64: KVM: Fix AArch64 guest userspace exception injection

    Linus Torvalds
     
  • …nel/git/broonie/regmap

    Pull regmap fix from Mark Brown:
    "A single revert back to v4.4 endianness handling.

    Commit 29bb45f25ff3 ("regmap-mmio: Use native endianness for
    read/write") attempted to fix some long standing bugs in the MMIO
    implementation for big endian systems caused by duplicate byte
    swapping in both regmap and readl()/writel(). Sadly the fix makes
    things worse rather than better, so revert it for now"

    * tag 'regmap-fix-v4.5-big-endian' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
    regmap: mmio: Revert to v4.4 endianness handling

    Linus Torvalds
     
  • Fix the doubled "started" and tidy up the following sentences.

    Signed-off-by: Masahiro Yamada
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

08 Feb, 2016

4 commits

  • …/kvmarm/kvmarm into kvm-master

    KVM/ARM fixes for v4.5-rc2

    A few random fixes, mostly coming from the PMU work by Shannon:

    - fix for injecting faults coming from the guest's userspace
    - cleanup for our CPTR_EL2 accessors (reserved bits)
    - fix for a bug impacting perf (user/kernel discrimination)
    - fix for a 32bit sysreg handling bug

    Paolo Bonzini
     
  • Linus Torvalds
     
  • Pull ARM SoC fixes from Olof Johansson:
    "The first real batch of fixes for this release cycle, so there are a
    few more than usual.

    Most of these are fixes and tweaks to board support (DT bugfixes,
    etc). I've also picked up a couple of small cleanups that seemed
    innocent enough that there was little reason to wait (const/
    __initconst and Kconfig deps).

    Quite a bit of the changes on OMAP were due to fixes to no longer
    write to rodata from assembly when ARM_KERNMEM_PERMS was enabled, but
    there were also other fixes.

    Kirkwood had a bunch of gpio fixes for some boards. OMAP had RTC
    fixes on OMAP5, and Nomadik had changes to MMC parameters in DT.

    All in all, mostly the usual mix of various fixes"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (46 commits)
    ARM: multi_v7_defconfig: enable DW_WATCHDOG
    ARM: nomadik: fix up SD/MMC DT settings
    ARM64: tegra: Add chosen node for tegra132 norrin
    ARM: realview: use "depends on" instead of "if" after prompt
    ARM: tango: use "depends on" instead of "if" after prompt
    ARM: tango: use const and __initconst for smp_operations
    ARM: realview: use const and __initconst for smp_operations
    bus: uniphier-system-bus: revive tristate prompt
    arm64: dts: Add missing DMA Abort interrupt to Juno
    bus: vexpress-config: Add missing of_node_put
    ARM: dts: am57xx: sbc-am57x: correct Eth PHY settings
    ARM: dts: am57xx: cl-som-am57x: fix CPSW EMAC pinmux
    ARM: dts: am57xx: sbc-am57x: fix UART3 pinmux
    ARM: dts: am57xx: cl-som-am57x: update SPI Flash frequency
    ARM: dts: am57xx: cl-som-am57x: set HOST mode for USB2
    ARM: dts: am57xx: sbc-am57x: fix SB-SOM EEPROM I2C address
    ARM: dts: LogicPD Torpedo: Revert Duplicative Entries
    ARM: dts: am437x: pixcir_tangoc: use correct flags for irq types
    ARM: dts: am4372: fix irq type for arm twd and global timer
    ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
    ...

    Linus Torvalds
     
  • Pull mailbox fixes from Jassi Brar:

    - fix getting element from the pcc-channels array by simply indexing
    into it

    - prevent building mailbox-test driver for archs that don't have IOMEM

    * 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
    mailbox: Fix dependencies for !HAS_IOMEM archs
    mailbox: pcc: fix channel calculation in get_pcc_channel()

    Linus Torvalds
     

07 Feb, 2016

2 commits

  • Pull USB fixes from Greg KH:
    "Here are some USB fixes for 4.5-rc3.

    The usual, xhci fixes for reported issues, combined with some small
    gadget driver fixes, and a MAINTAINERS file update. All have been in
    linux-next with no reported issues"

    * tag 'usb-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    xhci: harden xhci_find_next_ext_cap against device removal
    xhci: Fix list corruption in urb dequeue at host removal
    usb: host: xhci-plat: fix NULL pointer in probe for device tree case
    usb: xhci-mtk: fix AHB bus hang up caused by roothubs polling
    usb: xhci-mtk: fix bpkts value of LS/HS periodic eps not behind TT
    usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Broxton-M platforms
    usb: xhci: set SSIC port unused only if xhci_suspend succeeds
    usb: xhci: add a quirk bit for ssic port unused
    usb: xhci: handle both SSIC ports in PME stuck quirk
    usb: dwc3: gadget: set the OTG flag in dwc3 gadget driver.
    Revert "xhci: don't finish a TD if we get a short-transfer event mid TD"
    MAINTAINERS: fix my email address
    usb: dwc2: Fix probe problem on bcm2835
    Revert "usb: dwc2: Move reset into dwc2_get_hwparams()"
    usb: musb: ux500: Fix NULL pointer dereference at system PM
    usb: phy: mxs: declare variable with initialized value
    usb: phy: msm: fix error handling in probe.

    Linus Torvalds
     
  • Pull staging and IIO driver fixes from Greg KH:
    "Here are some IIO and staging driver fixes for 4.5-rc3.

    All of them, except one, are for IIO drivers, and one is for a speakup
    driver fix caused by some earlier patches, to resolve a reported build
    failure"

    * tag 'staging-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    Staging: speakup: Fix allyesconfig build on mn10300
    iio: dht11: Use boottime
    iio: ade7753: avoid uninitialized data
    iio: pressure: mpl115: fix temperature offset sign
    iio: imu: Fix dependencies for !HAS_IOMEM archs
    staging: iio: Fix dependencies for !HAS_IOMEM archs
    iio: adc: Fix dependencies for !HAS_IOMEM archs
    iio: inkern: fix a NULL dereference on error
    iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer.
    iio: light: acpi-als: Report data as processed
    iio: dac: mcp4725: set iio name property in sysfs
    iio: add HAS_IOMEM dependency to VF610_ADC
    iio: add IIO_TRIGGER dependency to STK8BA50
    iio: proximity: lidar: correct return value
    iio-light: Use a signed return type for ltr501_match_samp_freq()

    Linus Torvalds
     

06 Feb, 2016

14 commits

  • Merge fixes from Andrew Morton:
    "22 fixes"

    * emailed patches from Andrew Morton: (22 commits)
    epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
    radix-tree: fix oops after radix_tree_iter_retry
    MAINTAINERS: trim the file triggers for ABI/API
    dax: dirty inode only if required
    thp: make deferred_split_scan() work again
    mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
    ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
    um: asm/page.h: remove the pte_high member from struct pte_t
    mm, hugetlb: don't require CMA for runtime gigantic pages
    mm/hugetlb: fix gigantic page initialization/allocation
    mm: downgrade VM_BUG in isolate_lru_page() to warning
    mempolicy: do not try to queue pages from !vma_migratable()
    mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
    vmstat: make vmstat_update deferrable
    mm, vmstat: make quiet_vmstat lighter
    mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
    memblock: don't mark memblock_phys_mem_size() as __init
    dump_stack: avoid potential deadlocks
    mm: validate_mm browse_rb SMP race condition
    m32r: fix build failure due to SMP and MMU
    ...

    Linus Torvalds
     
  • Pull Ceph fixes from Sage Weil:
    "We have a few wire protocol compatibility fixes, ports of a few recent
    CRUSH mapping changes, and a couple error path fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: MOSDOpReply v7 encoding
    libceph: advertise support for TUNABLES5
    crush: decode and initialize chooseleaf_stable
    crush: add chooseleaf_stable tunable
    crush: ensure take bucket value is valid
    crush: ensure bucket id is valid before indexing buckets array
    ceph: fix snap context leak in error path
    ceph: checking for IS_ERR instead of NULL

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Fixes all over the place:

    - amdkfd: two static checker fixes
    - mst: a bunch of static checker and spec/hw interaction fixes
    - amdgpu: fix Iceland hw properly, and some fiji bugs, along with
    some write-combining fixes.
    - exynos: some regression fixes
    - adv7511: fix some EDID reading issues"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (38 commits)
    drm/dp/mst: deallocate payload on port destruction
    drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
    drm/dp/mst: move GUID storage from mgr, port to only mst branch
    drm/dp/mst: change MST detection scheme
    drm/dp/mst: Calculate MST PBN with 31.32 fixed point
    drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
    drm/mst: Add range check for max_payloads during init
    drm/mst: Don't ignore the MST PBN self-test result
    drm: fix missing reference counting decrease
    drm/amdgpu: disable uvd and vce clockgating on Fiji
    drm/amdgpu: remove exp hardware support from iceland
    drm/amdgpu: load MEC ucode manually on iceland
    drm/amdgpu: don't load MEC2 on topaz
    drm/amdgpu: drop topaz support from gmc8 module
    drm/amdgpu: pull topaz gmc bits into gmc_v7
    drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
    drm/amdgpu: iceland use CI based MC IP
    drm/amdgpu: move gmc7 support out of CIK dependency
    drm/amdgpu/gfx7: enable cp inst/reg error interrupts
    drm/amdgpu/gfx8: enable cp inst/reg error interrupts
    ...

    Linus Torvalds
     
  • Pull power management and ACPI fixes from Rafael Wysocki:
    "These are: a fix for a recently introduced false-positive warnings
    about PM domain pointers being changed inappropriately (harmless but
    annoying), an MCH size workaround quirk for one more platform, a
    compiler warning fix (generic power domains framework), an ACPI LPSS
    (Intel SoCs) driver fixup and a cleanup of the ACPI CPPC core code.

    Specifics:

    - PM core fix to avoid false-positive warnings generated when the
    pm_domain field is cleared for a device that appears to be bound to
    a driver (Rafael Wysocki).

    - New MCH size workaround quirk for Intel Haswell-ULT (Josh Boyer).

    - Fix for an "unused function" compiler warning in the generic power
    domains framework (Ulf Hansson).

    - Fixup for the ACPI driver for Intel SoCs (acpi-lpss) to set the PM
    domain pointer of a device properly in one place that was
    overlooked by a recent PM core update (Andy Shevchenko).

    - Removal of a redundant function declaration in the ACPI CPPC core
    code (Timur Tabi)"

    * tag 'pm+acpi-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM: Avoid false-positive warnings in dev_pm_domain_set()
    PM / Domains: Silence compiler warning for an unused function
    ACPI / CPPC: remove redundant mbox_send_message() declaration
    ACPI / LPSS: set PM domain via helper setter
    PNP: Add Haswell-ULT to Intel MCH size workaround

    Linus Torvalds
     
  • In the current implementation of the EPOLLEXCLUSIVE flag (added for
    4.5-rc1), if epoll waiters create different POLL* sets and register them
    as exclusive against the same target fd, the kernel stops waking any
    further waiters once it finds the first idle waiter. This means that
    waiters could miss wakeups in certain cases.

    For example, when we wake up a pipe for reading we do:

    wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);

    So if one epoll set or epfd is added to pipe p with POLLIN and a second
    set epfd2 is added to pipe p with POLLRDNORM, only epfd may receive the
    wakeup since the current implementation will stop after it finds any
    intersection of events with a waiter that is blocked in epoll_wait().

    We could potentially address this by requiring all epoll waiters that
    are added to p to pass the same set of POLL* events. I.e., the first
    EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set of POLL*
    flags to be used by any other epfds that are added as EPOLLEXCLUSIVE.
    However, I think it might be a somewhat confusing interface as we would
    have to reference count the number of users for that set, and so
    userspace would have to keep track of that count, or we would need a
    more involved interface. It also adds some shared state that we'd have
    to store somewhere. I don't think anybody will want to bloat
    __wait_queue_head for this.

    I think what we could do instead, is to simply restrict EPOLLEXCLUSIVE
    such that it can only be specified with EPOLLIN and/or EPOLLOUT. So
    that way if the wakeup includes 'POLLIN' and not 'POLLOUT', we can stop
    once we hit the first idle waiter that specifies the EPOLLIN bit, since
    any remaining waiters that only have 'POLLOUT' set wouldn't need to be
    woken. Likewise, we can do the same thing if 'POLLOUT' is in the wakeup
    bit set and not 'POLLIN'. If both 'POLLOUT' and 'POLLIN' are set in the
    wake bit set (there is at least one example of this I saw in fs/pipe.c),
    then we just wake the entire exclusive list. Having 'POLLOUT' and
    'POLLIN' both set should not be on any performance critical path, so I
    think that's ok (in fs/pipe.c it's in pipe_release()). We also continue
    to include EPOLLERR and EPOLLHUP by default in any exclusive set. Thus,
    the user can specify EPOLLERR and/or EPOLLHUP but is not required to do
    so.

    Since epoll waiters may be interested in other events as well besides
    EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP, these can still be added by
    doing a 'dup' call on the target fd and adding that as one normally
    would with EPOLL_CTL_ADD. Since I think that the POLLIN and POLLOUT
    events are what we are interested in balancing, I think that the 'dup'
    thing could perhaps be added to only one of the waiter threads.
    However, I think that EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP should be
    sufficient for the majority of use-cases.

    Since EPOLLEXCLUSIVE is intended to be used with a target fd shared
    among multiple epfds, where between 1 and n of the epfds may receive an
    event, it does not satisfy the semantics of EPOLLONESHOT where only 1
    epfd would get an event. Thus, it is not allowed to be specified in
    conjunction with EPOLLEXCLUSIVE.

    EPOLL_CTL_MOD is also not allowed if the fd was previously added as
    EPOLLEXCLUSIVE. With the limited number of flags it does not seem as
    interesting, but this could be relaxed at some later point.
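
    The wake policy described above can be sketched as a small userspace
    model. This is only an illustration under simplifying assumptions, not
    the code in fs/eventpoll.c: struct waiter, wake_exclusive() and the
    EV_IN/EV_OUT bit values are hypothetical, and EPOLLERR/EPOLLHUP
    handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values; not the kernel's POLLIN/POLLOUT definitions. */
#define EV_IN	0x1u
#define EV_OUT	0x4u

/* Hypothetical model of one exclusive epoll waiter on a shared target fd. */
struct waiter {
	unsigned int events;	/* EV_IN and/or EV_OUT requested by this waiter */
	int idle;		/* nonzero if blocked in epoll_wait() */
	int notified;		/* set by wake_exclusive() when the event is delivered */
};

/*
 * Walk the exclusive waiters in order and deliver wake_mask.
 * - Only IN or only OUT in the mask: stop after the first idle waiter
 *   that asked for that bit.
 * - Both IN and OUT in the mask: notify every interested waiter.
 * Returns the number of waiters notified.
 */
static size_t wake_exclusive(struct waiter *w, size_t n, unsigned int wake_mask)
{
	unsigned int inout = wake_mask & (EV_IN | EV_OUT);
	int wake_all = (inout == (EV_IN | EV_OUT));
	size_t count = 0;
	size_t i;

	assert(w != NULL || n == 0);

	for (i = 0; i < n; i++)
		w[i].notified = 0;

	for (i = 0; i < n; i++) {
		if (!(w[i].events & inout))
			continue;	/* waiter did not ask for this event */
		w[i].notified = 1;
		count++;
		if (!wake_all && w[i].idle)
			break;		/* first idle match absorbs the wakeup */
	}
	return count;
}
```

    With waiters registered for OUT, IN and IN (all idle), a wake mask of
    EV_IN notifies only the second waiter, while EV_IN | EV_OUT notifies
    all three.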

    Signed-off-by: Jason Baron
    Tested-by: Madars Vitolins
    Cc: Michael Kerrisk
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Al Viro
    Cc: Eric Wong
    Cc: Jonathan Corbet
    Cc: Andy Lutomirski
    Cc: Hagen Paul Pfeifer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     
  • The helper radix_tree_iter_retry() resets next_index to the current
    index. In the following radix_tree_next_slot() call, the current chunk
    size becomes zero. This isn't checked, and it tries to dereference a
    NULL pointer in slot.

    The tagged iterator is fine because retry happens only at slot 0, where
    the tag bitmask in iter->tags is filled with a single bit.

    Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
    Signed-off-by: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Cc: Hugh Dickins
    Cc: Ohad Ben-Cohen
    Cc: Jeremiah Mahler
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Commit ea8f8fc8631 ("MAINTAINERS: add linux-api for review of API/ABI
    changes") added file triggers for various paths that likely indicated
    API/ABI changes. However, catching all changes in Documentation/ABI/
    and include/uapi/ produces a large volume of mail to linux-api, rather
    than only API/ABI changes. Drop those two entries, but leave
    include/linux/syscalls.h and kernel/sys_ni.c to catch syscall-related
    changes.

    [josh@joshtriplett.org: redid changelog]
    Signed-off-by: Michael Kerrisk
    Acked-by: Shuah khan
    Cc: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk (man-pages)
     
  • Signed-off-by: Dmitry Monakhov
    Reviewed-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Monakhov
     
  • We need to iterate over split_queue, not the local empty list, to get
    anything split from the shrinker.

    Fixes: e3ae19535c66 ("thp: limit number of object to scan on deferred_split_scan()")
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • The sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if
    an anon_vma appeared between lock and unlock. We have to check anon_vma
    first or call anon_vma_prepare() to be sure that it's here. There are
    only a few users of these legacy helpers. Let's get rid of them.

    This patch fixes anon_vma lock imbalance in validate_mm(). Write lock
    isn't required here, read lock is enough.

    It also reorders expand_downwards/expand_upwards: security_mmap_addr()
    and the wrapping-around check don't have to be under the anon vma lock.

    Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.com
    Signed-off-by: Konstantin Khlebnikov
    Reported-by: Dmitry Vyukov
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • When the recovery master goes down, dlm_do_local_recovery_cleanup() only
    removes the $RECOVERY lock owned by the dead node, but does not clear
    the refmap bit. This makes the umount thread fall into a dead loop
    migrating $RECOVERY to the dead node.

    Signed-off-by: xuejiufei
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • Commit 16da306849d0 ("um: kill pfn_t") introduced a compile warning for
    defconfig (SUBARCH=i386):

    arch/um/kernel/skas/mmu.c:38:206:
    warning: right shift count >= width of type [-Wshift-count-overflow]

    The aforementioned patch changes the definition of the phys_to_pfn() macro
    from

    ((pfn_t) ((p) >> PAGE_SHIFT))

    to

    ((p) >> PAGE_SHIFT)

    This effectively changes the phys_to_pfn() expansion's type from
    unsigned long long to unsigned long.

    Through the callchain init_stub_pte() => mk_pte(), the expansion of
    phys_to_pfn() is (indirectly) fed into the 'phys' argument of the
    pte_set_val(pte, phys, prot) macro, eventually leading to

    (pte).pte_high = (phys) >> 32;

    This results in the warning from above.

    Since UML only deals with 32 bit addresses, the upper 32 bits from
    'phys' used to be always zero anyway. Also, all page protection flags
    defined by UML don't use any bits beyond bit 9. Since the contents of a
    PTE are defined within architecture scope only, the ->pte_high member
    can be safely removed.

    Remove the ->pte_high member from struct pte_t.
    Rename ->pte_low to ->pte.
    Adapt the pte helper macros in arch/um/include/asm/page.h.

    Noteworthy is the pte_copy() macro where a smp_wmb() gets dropped. This
    write barrier doesn't seem to be paired with any read barrier though and
    thus, was useless anyway.
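
    To illustrate the warning in isolation: shifting a value by a count
    equal to or greater than its width is undefined in C, which is what
    happens once 'phys' is only 32 bits wide. The sketch below is not UML
    code; phys32_t, high_word() and low_word() are hypothetical names, and
    uint32_t stands in for a 32-bit unsigned long.

```c
#include <stdint.h>

/* Hypothetical stand-in for 'unsigned long' on a 32-bit target. */
typedef uint32_t phys32_t;

/*
 * The problematic form would be:
 *
 *	high = phys >> 32;	// shift count >= width of a 32-bit type
 *
 * Widening to 64 bits first makes the shift well defined. For a 32-bit
 * input the result is always zero, which is why a separate "high" field
 * carries no information.
 */
static uint32_t high_word(phys32_t phys)
{
	return (uint32_t)((uint64_t)phys >> 32);
}

/* The low word is simply the 32-bit value itself. */
static uint32_t low_word(phys32_t phys)
{
	return phys;
}
```

    Since high_word() returns zero for every possible input, dropping the
    field that stored it loses nothing.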

    Fixes: 16da306849d0 ("um: kill pfn_t")
    Signed-off-by: Nicolai Stange
    Cc: Dan Williams
    Cc: Richard Weinberger
    Cc: Nicolai Stange
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolai Stange
     
  • Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation
    at runtime") added runtime gigantic page allocation via
    alloc_contig_range(), making this support available only when CONFIG_CMA
    is enabled. Because it doesn't depend on MIGRATE_CMA pageblocks and the
    associated infrastructure, it is possible with a few simple adjustments
    to require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA.

    After this patch, alloc_contig_range() and related functions are
    available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
    enabled. Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION. This allows
    supporting runtime gigantic pages without the CMA-specific checks in
    page allocator fastpaths.

    Signed-off-by: Vlastimil Babka
    Cc: Luiz Capitulino
    Cc: Kirill A. Shutemov
    Cc: Zhang Yanfei
    Cc: Yasuaki Ishimatsu
    Cc: Joonsoo Kim
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Davidlohr Bueso
    Cc: Hillf Danton
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Attempting to preallocate 1G gigantic huge pages at boot time with
    "hugepagesz=1G hugepages=1" on the kernel command line will prevent
    booting with the following:

    kernel BUG at mm/hugetlb.c:1218!

    When mapcount accounting was reworked, the setting of
    compound_mapcount_ptr in prep_compound_gigantic_page was overlooked. As
    a result, the validation of mapcount in free_huge_page fails.

    The "BUG_ON" checks in free_huge_page were also changed to
    "VM_BUG_ON_PAGE" to assist with debugging.

    Fixes: 53f9263baba69 ("mm: rework mapcount accounting to enable 4k mapping of THPs")
    Signed-off-by: Mike Kravetz
    Signed-off-by: Naoya Horiguchi
    Acked-by: Kirill A. Shutemov
    Acked-by: David Rientjes
    Tested-by: Vlastimil Babka
    Cc: "Aneesh Kumar K.V"
    Cc: Jerome Marchand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz