20 Oct, 2013

1 commit


19 Oct, 2013

4 commits

  • Pull btrfs fix from Chris Mason:
    "Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a
    regression in our initial rc1 pull. When doing nocow writes we were
    sometimes starting a transaction with locks held"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: release path before starting transaction in can_nocow_extent

    Linus Torvalds
     
  • Pull ACPI and power management fixes from Rafael Wysocki:

    - An intel_pstate fix for misbehavior after system resume when sysfs
    attributes are set in a specific way before the corresponding suspend,
    from Dirk Brandewie.

    - A recent intel_pstate fix has no effect if unsigned long is 32-bit,
    so fix it up to cover that case as well.

    - The s3c64xx cpufreq driver was not updated when the index field of
    struct cpufreq_frequency_table was replaced with driver_data, so
    update it now. From Charles Keepax.

    - The Kconfig help text for ACPI_BUTTON still refers to
    /proc/acpi/event that has been dropped recently, so modify it to
    remove that reference. From Krzysztof Mazur.

    - A change from Lan Tianyu adds a missing mutex unlock to an error code
    path in acpi_resume_power_resources().

    - Some code related to ACPI power resources, whose very purpose is
    questionable, to put it lightly, turns out to cause problems during
    testing on real systems, so remove it completely (we may revisit that
    in the future if there's a compelling enough reason).
    From Rafael J Wysocki and Aaron Lu.

    * tag 'pm+acpi-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / PM: Drop two functions that are not used any more
    ATA / ACPI: remove power dependent device handling
    cpufreq: s3c64xx: Rename index to driver_data
    ACPI / power: Drop automaitc resume of power resource dependent devices
    intel_pstate: Fix type mismatch warning
    cpufreq / intel_pstate: Fix max_perf_pct on resume
    ACPI: remove /proc/acpi/event from ACPI_BUTTON help
    ACPI / power: Release resource_lock after acpi_power_get_state() return error

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "Two fixlets:

    - fix a (rare-config) build bug
    - fix a next-gen SGI/UV hw/firmware enumeration bug"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Update UV3 hub revision ID
    x86/microcode: Correct Kconfig dependencies

    Linus Torvalds
     
    We can't hold tree locks while we try to start a transaction, or we will
    deadlock. Thanks,

    Reported-by: Sage Weil
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

18 Oct, 2013

12 commits

  • * acpi-fixes:
    ACPI / PM: Drop two functions that are not used any more
    ATA / ACPI: remove power dependent device handling
    ACPI / power: Drop automaitc resume of power resource dependent devices
    ACPI: remove /proc/acpi/event from ACPI_BUTTON help
    ACPI / power: Release resource_lock after acpi_power_get_state() return error

    Rafael J. Wysocki
     
  • * pm-fixes:
    cpufreq: s3c64xx: Rename index to driver_data
    intel_pstate: Fix type mismatch warning
    cpufreq / intel_pstate: Fix max_perf_pct on resume

    Rafael J. Wysocki
     
  • Pull CIFS fixes from Steve French:
    "Five small cifs fixes (includes fixes for: unmount hang, 2 security
    related, symlink, large file writes)"

    * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: ntstatus_to_dos_map[] is not terminated
    cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods
    cifs: Fix inability to write files >2GB to SMB2/3 shares
    cifs: Avoid umount hangs with smb2 when server is unresponsive
    do not treat non-symlink reparse points as valid symlinks

    Linus Torvalds
     
  • Pull driver core fix from Greg KH:
    "Here is one fix for the hotplug memory path that resolves a regression
    when removing memory that showed up in 3.12-rc1"

    * tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    driver core: Release device_hotplug_lock when store_mem_state returns EINVAL

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are some USB fixes and new device ids for 3.12-rc6

    The largest change here is a bunch of new device ids for the option
    USB serial driver for new Huawei devices. Other than that, just some
    small bug fixes for issues that people have reported (run-time and
    build-time), nothing major"

    * tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register
    usb: misc: usb3503: Fix compile error due to incorrect regmap depedency
    usb/chipidea: fix oops on memory allocation failure
    usb-storage: add quirk for mandatory READ_CAPACITY_16
    usb: serial: option: blacklist Olivetti Olicard200
    USB: quirks: add touchscreen that is dazzeled by remote wakeup
    Revert "usb: musb: gadget: fix otg active status flag"
    USB: quirks.c: add one device that cannot deal with suspension
    USB: serial: option: add support for Inovia SEW858 device
    USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.
    USB: support new huawei devices in option.c
    usb: musb: start musb on the udc side, too
    xhci: Fix spurious wakeups after S5 on Haswell
    xhci: fix write to USB3_PSSEN and XUSB2PRM pci config registers
    xhci: quirk for extra long delay for S4
    xhci: Don't enable/disable RWE on bus suspend/resume.

    Linus Torvalds
     
  • Pull serial driver fixes from Greg KH:
    "Here are two serial driver fixes for your tree. One is a revert of a
    patch that causes a build error, the other is a fix to provide the
    correct brace placement which resolves a bug where the driver was not
    working properly"

    * tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    serial: vt8500: add missing braces
    Revert "serial: i.MX: evaluate linux,stdout-path property"

    Linus Torvalds
     
  • Pull char/misc driver fixes from Greg KH:
    "Here are some small iio and w1 driver fixes for 3.12-rc6.

    There is also a hyper-v fix in here, which turned out to be incorrect,
    so it was reverted. That will probably have to wait until 3.13-rc1 to
    get accepted as it's still being discussed"

    * tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code"
    Drivers: hv: vmbus: Fix a bug in channel rescind code
    iio:buffer: Free active scan mask in iio_disable_all_buffers()
    iio: frequency: adf4350: add missing clk_disable_unprepare() on error in adf4350_probe()
    w1 - call request_module with w1 master mutex unlocked
    w1 - fix fops in w1_bus_notify

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "All reasonably small fixes as rc6: a HD-audio mic fix, a us122l mmap
    regression fix, and kernel memory leak fix in hdsp driver. Hopefully
    this will be the last pull request for 3.12..."

    * tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: hdsp - info leak in snd_hdsp_hwdep_ioctl()
    ALSA: us122l: Fix pcm_usb_stream mmapping regression
    ALSA: hda - Fix inverted internal mic not indicated on some machines

    Linus Torvalds
     
  • Pull apparmor fixes from James Morris:
    "A couple more regressions fixed"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    apparmor: fix bad lock balance when introspecting policy
    apparmor: fix memleak of the profile hash

    Linus Torvalds
     
  • …/jic23/iio into char-misc-linus

    Jonathan writes:

    Third set of IIO fixes for the 3.12 cycle.

    Two little ones this time:

    1) A missing clk_unprepare in adf4350.
    2) A missing free of the active_scan_mask when iio_disable_all_buffers is
    called during an unexpected device removal. This leak was introduced by
    the fix a87c82e454f184a9473f8cdfd4d304205f585f65 ("iio: Stop sampling
    when the device is removed") and hence this is a regression fix.

    Greg Kroah-Hartman
     
  • Commit 3fa4d734 (usb: phy: rename nop_usb_xceiv => usb_phy_gen_xceiv)
    changed the conditional around the declaration of usb_nop_xceiv_register
    from
    #if defined(CONFIG_NOP_USB_XCEIV) || \
        (defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE))
    to
    #if IS_ENABLED(CONFIG_NOP_USB_XCEIV)

    While that looks the same, it is semantically different. The first expression
    is true if CONFIG_NOP_USB_XCEIV is built into the kernel, or if it is built as
    a module and the including code is also built as a module. The second
    expression is true whenever CONFIG_NOP_USB_XCEIV is built either as a module
    or into the kernel, regardless of how the including code is built.

    As a result, the arm:allmodconfig build fails with

    arch/arm/mach-omap2/built-in.o: In function `omap3_evm_init':
    arch/arm/mach-omap2/board-omap3evm.c:703: undefined reference to
    `usb_nop_xceiv_register'

    Fix the problem by reverting to the old conditional.

    Cc: Josh Boyer
    Signed-off-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman

    Guenter Roeck
     
  • This reverts commit 90d33f3ec519db19d785216299a4ee85ef58ec97 as it's not
    the correct fix for this issue, and it causes a build warning to be
    added to the kernel tree.

    Cc: K. Y. Srinivasan
    Cc:
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

17 Oct, 2013

23 commits

  • Two functions defined in device_pm.c, acpi_dev_pm_add_dependent()
    and acpi_dev_pm_remove_dependent(), have no callers and may be
    dropped, so drop them.

    Moreover, they are the only functions adding entries to and removing
    entries from the power_dependent list in struct acpi_device, so drop
    that list too.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
    Previously, we wanted SCSI devices corresponding to ATA devices to
    be runtime resumed when the power resource for those ATA devices was
    turned on by some other device, so we added the SCSI device to the
    dependent device list of the ATA device's ACPI node. However, this
    code has no effect after commit 41863fc (ACPI / power: Drop automaitc
    resume of power resource dependent devices) and the mechanism it was
    supposed to implement is now regarded as a bad idea, so drop it.

    [rjw: Changelog]
    Signed-off-by: Aaron Lu
    Signed-off-by: Rafael J. Wysocki

    Aaron Lu
     
  • Merge misc fixes from Andrew Morton.

    * emailed patches from Andrew Morton : (21 commits)
    mm: revert mremap pud_free anti-fix
    mm: fix BUG in __split_huge_page_pmd
    swap: fix set_blocksize race during swapon/swapoff
    procfs: call default get_unmapped_area on MMU-present architectures
    procfs: fix unintended truncation of returned mapped address
    writeback: fix negative bdi max pause
    percpu_refcount: export symbols
    fs: buffer: move allocation failure loop into the allocator
    mm: memcg: handle non-error OOM situations more gracefully
    tools/testing/selftests: fix uninitialized variable
    block/partitions/efi.c: treat size mismatch as a warning, not an error
    mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
    mm/zswap: bugfix: memory leak when re-swapon
    mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
    mm: migration: do not lose soft dirty bit if page is in migration state
    gcov: MAINTAINERS: Add an entry for gcov
    mm/hugetlb.c: correct missing private flag clearing
    mm/vmscan.c: don't forget to free shrinker->nr_deferred
    ipc/sem.c: synchronize semop and semctl with IPC_RMID
    ipc: update locking scheme comments
    ...

    Linus Torvalds
     
  • Revert commit 1ecfd533f4c5 ("mm/mremap.c: call pud_free() after fail
    calling pmd_alloc()").

    The original code was correct: pud_alloc(), pmd_alloc() and pte_alloc_map()
    ensure that the pud, pmd and pt are already allocated, and they seldom
    need to allocate; on failure, upper levels are freed if appropriate by
    the subsequent do_munmap(). Commit 1ecfd533f4c5, by contrast, did an
    unconditional pud_free() of a most-likely still-in-use pud, saved only
    by the near-impossibility of pmd_alloc() failing.

    Signed-off-by: Hugh Dickins
    Cc: Chen Gang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
    __split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).

    The BUG_ON is invalid: we don't always have down_write of mmap_sem there,
    so a racing do_huge_pmd_wp_page() might have copied-on-write to another
    huge page before our split_huge_page() got the anon_vma lock.

    Forget the BUG_ON, just go back and try again if this happens.

    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Naoya Horiguchi
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
    Fix a race between swapoff and swapon. Swapoff read old_block_size from
    swap_info outside of swapon_mutex, so it could be overwritten by a
    concurrent swapon.

    The race has a visible effect only if more than one swap block device
    exists with different block sizes (e.g. /dev/sda1 with block size 4096
    and /dev/sdb1 with 512). In such a case it leads to setting the wrong
    block size on the swapped-off device.

    The bug can be triggered with multiple concurrent swapoff and swapon:
    0. Swap for some device is on.
    1. swapoff:
    First the swapoff is called on this device and "struct swap_info_struct
    *p" is assigned. This is done under swap_lock; however, this lock is
    released for the call to try_to_unuse().

    2. swapon:
    After the assignment above (and before swapoff acquires swapon_mutex and
    swap_lock) the swapon is called on the same device.
    p->old_block_size is assigned the block size of the device.
    This block size should be the same as before, but sometimes it is not.
    The swapon ends successfully.

    3. swapoff:
    Swapoff resumes, grabs the locks and mutex and continues to disable this
    swap device. Now it sets the block size to the value taken from swap_info,
    which was overwritten by swapon in step 2.

    Signed-off-by: Krzysztof Kozlowski
    Reported-by: Weijie Yang
    Cc: Bob Liu
    Cc: Konrad Rzeszutek Wilk
    Cc: Shaohua Li
    Cc: Minchan Kim
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
     
  • Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
    proc_reg_get_unmapped_area in proc_reg_file_ops and
    proc_reg_file_ops_no_compat, as a result of which mmap now always returns
    EIO if the get_unmapped_area method is not defined for the target procfs
    file. This causes a regression for mmap on /proc/vmcore.

    To address this issue, do what get_unmapped_area() does: on MMU-present
    architectures, call the default current->mm->get_unmapped_area if
    pde->proc_fops->get_unmapped_area (i.e. the method in the procfs file's
    actual file operations) is not defined.

    Reported-by: Michael Holzheu
    Signed-off-by: HATAYAMA Daisuke
    Cc: Alexey Dobriyan
    Cc: David S. Miller
    Tested-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
    Currently, proc_reg_get_unmapped_area truncates the upper 32 bits of the
    mapped virtual address returned from the get_unmapped_area method in
    pde->proc_fops, because the variable rv is a signed int. That is too
    small to hold a virtual address of type unsigned long on x86_64, where
    int is 4 bytes while unsigned long is 8 bytes. To fix this issue, use
    unsigned long instead.

    Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device
    proc file mmap(2)").

    Signed-off-by: HATAYAMA Daisuke
    Cc: Alexey Dobriyan
    Cc: David S. Miller
    Tested-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Toralf runs trinity on UML/i386. After some time it hangs and the last
    message line is

    BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521]

    It turns out that pages_dirtied becomes very large, more than 1000000000
    pages in this case:

    period = HZ * pages_dirtied / task_ratelimit;
    BUG_ON(pages_dirtied > 2000000000);
    BUG_ON(pages_dirtied > 1000000000);

    Debug code added to catch a negative pause:

    + if (pause < 0) {
    + extern int printf(char *, ...);
    + printf("ick : pause : %li\n", pause);
    + printf("ick: pages_dirtied : %lu\n", pages_dirtied);
    + printf("ick: task_ratelimit: %lu\n", task_ratelimit);
    + BUG_ON(1);
    + }
    trace_balance_dirty_pages(bdi,

    Since pause is bounded by [min_pause, max_pause], where min_pause is itself
    bounded by max_pause, the max_pause calculation was suspected to go wrong,
    and the debug output confirms it:

    ick: pause : -717
    ick: min_pause : -177
    ick: max_pause : -717
    ick: pages_dirtied : 14
    ick: task_ratelimit: 0

    The problem lies in the two "long = unsigned long" assignments in
    bdi_max_pause(), which may go negative if the highest bit is 1, and the
    min_t(long, ...) check fails to keep the result from falling below 0. Fix
    all of them by using "unsigned long" throughout the function.

    Signed-off-by: Fengguang Wu
    Reported-by: Toralf Förster
    Tested-by: Toralf Förster
    Reviewed-by: Jan Kara
    Cc: Richard Weinberger
    Cc: Geert Uytterhoeven
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
    Export the percpu_refcount interface so that it can be used within modules.

    Signed-off-by: Matias Bjorling
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matias Bjorling
     
    Buffer allocation has a very crude indefinite loop around waking the
    flusher threads and performing global NOFS direct reclaim because it
    cannot handle allocation failures.

    The most immediate problem with this is that the allocation may fail due
    to a memory cgroup limit, where flushers + direct reclaim might not make
    any progress towards resolving the situation at all because, unlike the
    global case, a memory cgroup may have no cache at all, only anonymous
    pages and no swap. This situation leads to a reclaim livelock with
    insane IO from waking the flushers and thrashing unrelated filesystem
    cache in a tight loop.

    Use __GFP_NOFAIL allocations for buffers for now. This makes sure that
    any looping happens in the page allocator, which knows how to
    orchestrate kswapd, direct reclaim, and the flushers sensibly. It also
    allows memory cgroups to detect allocations that can't handle failure
    and will allow them to ultimately bypass the limit if reclaim cannot
    make progress.

    Reported-by: azurIt
    Signed-off-by: Johannes Weiner
    Cc: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
    callstack on OOM") assumed that only a few places that can trigger a
    memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
    readahead. But there are many more and it's impractical to annotate
    them all.

    First of all, we don't want to invoke the OOM killer when the failed
    allocation is gracefully handled, so defer the actual kill to the end of
    the fault handling as well. As an added bonus, this simplifies the code
    quite a bit.

    Second, since a failed allocation might not be the abrupt end of the
    fault, the memcg OOM handler needs to be re-entrant until the fault
    finishes for subsequent allocation attempts. If an allocation is
    attempted after the task already OOMed, allow it to bypass the limit so
    that it can quickly finish the fault and invoke the OOM killer.

    Reported-by: azurIt
    Signed-off-by: Johannes Weiner
    Cc: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
    The err variable is intended to receive the return value of timer_create()
    before it is checked, so assign it.

    Signed-off-by: Felipe Pena
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Felipe Pena
     
    In commit 27a7c642174e ("partitions/efi: account for pmbr size in lba")
    we started treating bad sizes in the lba field of the partition that has
    the 0xEE (GPT protective) type as errors.

    However, we may run into these "bad sizes" in the real world if someone
    uses dd to copy an image from a smaller disk to a bigger disk. Since
    this case used to work (even without using force_gpt), keep it working
    and treat the size mismatch as a warning instead of an error.

    Reported-by: Josh Triplett
    Reported-by: Sean Paul
    Signed-off-by: Doug Anderson
    Reviewed-by: Josh Triplett
    Acked-by: Davidlohr Bueso
    Tested-by: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Anderson
     
  • Commit 11feeb498086 ("kvm: optimize away THP checks in
    kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic
    compound pages.

    That commit depends on the assumption that PG_reserved is identical for
    all head and tail pages of a compound page, so that if get_user_pages()
    returns a tail page, we don't need to check the head page in order to
    know whether we are dealing with a reserved page that requires different
    refcounting.

    The assumption that PG_reserved is the same for head and tail pages is
    certainly correct for THP and regular hugepages, but gigantic hugepages
    allocated through bootmem don't clear the PG_reserved on the tail pages
    (the clearing of PG_reserved is done later only if the gigantic hugepage
    is freed).

    This patch corrects the gigantic compound page initialization so that we
    can retain the optimization in 11feeb498086. The cacheline was already
    modified in order to set PG_tail so this won't affect the boot time of
    large memory systems.

    [akpm@linux-foundation.org: tweak comment layout and grammar]
    Signed-off-by: Andrea Arcangeli
    Reported-by: andy123
    Acked-by: Rik van Riel
    Cc: Gleb Natapov
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Acked-by: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
    zswap_tree is not freed on swapoff, but it is allocated again with kmalloc
    on swapon, so memory leaks.

    Free the memory of zswap_tree in zswap_frontswap_invalidate_area().

    Signed-off-by: Weijie Yang
    Reviewed-by: Bob Liu
    Cc: Minchan Kim
    Reviewed-by: Minchan Kim
    Cc:
    From: Weijie Yang
    Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently

    Consider the following scenario:
    thread 0: reclaims entry x (takes a reference, but has not yet called
    zswap_get_swap_cache_page)
    thread 1: calls zswap_frontswap_invalidate_page to invalidate entry x.
    When it finishes, entry x and its zbud are not freed because the
    refcount != 0; now swap_map[x] = 0
    thread 0: now calls zswap_get_swap_cache_page;
    swapcache_prepare returns -ENOENT because entry x is no longer used,
    zswap_get_swap_cache_page returns ZSWAP_SWAPCACHE_NOMEM, and
    zswap_writeback_entry does nothing except drop the reference.
    Now the memory of zswap_entry x and its zpage is leaked.

    Changes:
    - check the refcount in the failure path and free the memory if it is
    no longer referenced.

    - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM, as the
    failure path can be caused not only by lack of memory but also by
    invalidation.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Weijie Yang
    Reviewed-by: Bob Liu
    Cc: Minchan Kim
    Cc:
    Acked-by: Seth Jennings

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Weijie Yang
     
    If a page we are inspecting is in swap, we may occasionally report it as
    having the soft dirty bit set (even if it is clean). The pte_soft_dirty
    helper should be called on present ptes only.

    Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Andy Lutomirski
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Reviewed-by: Naoya Horiguchi
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
    If page migration is turned on in the config and the page is migrating, we
    may lose the soft dirty bit. If fork and mprotect are called on
    migrating pages (once migration is complete), the pages do not obtain the
    soft dirty bit in the corresponding pte entries. Fix it by adding an
    appropriate test on swap entries.

    Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Andy Lutomirski
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Signed-off-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Oberparleiter
     
    We should clear the page's private flag when returning the page to the
    hugepage pool. Otherwise, a hugepage that is still marked can be allocated
    to a user who tries to allocate a non-reserved hugepage. If this user fails
    to map the hugepage, it is returned to the hugepage pool again, and since
    the page still has the private flag set, resv_huge_pages is mistakenly
    increased. This patch fixes this situation.

    Signed-off-by: Joonsoo Kim
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: "Aneesh Kumar K.V"
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Davidlohr Bueso
    Cc: David Gibson
    Cc: Wanpeng Li
    Cc: Naoya Horiguchi
    Cc: Hillf Danton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • This leak was added by commit 1d3d4437eae1 ("vmscan: per-node deferred
    work").

    unreferenced object 0xffff88006ada3bd0 (size 8):
    comm "criu", pid 14781, jiffies 4295238251 (age 105.641s)
    hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00 ........
    backtrace:
    [] kmemleak_alloc+0x5e/0xc0
    [] __kmalloc+0x247/0x310
    [] register_shrinker+0x3c/0xa0
    [] sget+0x5ab/0x670
    [] proc_mount+0x54/0x170
    [] mount_fs+0x43/0x1b0
    [] vfs_kern_mount+0x72/0x110
    [] kern_mount_data+0x19/0x30
    [] pid_ns_prepare_proc+0x20/0x40
    [] alloc_pid+0x466/0x4a0
    [] copy_process+0xc6a/0x1860
    [] do_fork+0x8b/0x370
    [] SyS_clone+0x16/0x20
    [] stub_clone+0x69/0x90
    [] 0xffffffffffffffff

    Signed-off-by: Andrew Vagin
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Glauber Costa
    Cc: Chuck Lever
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Vagin
     
  • After acquiring the semlock spinlock, operations must test that the
    array is still valid.

    - semctl() and exit_sem() would walk stale linked lists (ugly, but
    should be ok: all lists are empty)

    - semtimedop() would sleep forever - and if woken up due to a signal -
    access memory after free.

    The patch also:
    - standardizes the tests for .deleted, so that all tests in one
    function leave the function with the same approach.
    - unconditionally tests for .deleted immediately after every call to
    sem_lock, even if it means that for semctl(GETALL), .deleted will be
    tested twice.

    Both changes make the review simpler: After every sem_lock, there must
    be a test of .deleted, followed by a goto to the cleanup code (if the
    function uses "goto cleanup").

    The only exception is semctl_down(): If sem_ids().rwsem is locked, then
    the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
    additional test is required.

    Signed-off-by: Manfred Spraul
    Cc: Mike Galbraith
    Acked-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
    The initial documentation was somewhat incomplete; update it accordingly.

    [akpm@linux-foundation.org: make it more readable in 80 columns]
    Signed-off-by: Davidlohr Bueso
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso