02 Dec, 2012

4 commits

  • 8376fe22c7 ("workqueue: implement mod_delayed_work[_on]()")
    implemented mod_delayed_work[_on]() using the improved
    try_to_grab_pending(). The function is later used, among others, to
    replace [__]cancel_delayed_work() + queue_delayed_work() combinations.

    Unfortunately, a delayed_work item w/ zero @delay is handled slightly
    differently by mod_delayed_work_on() compared to
    queue_delayed_work_on(). The latter skips timer altogether and
    directly queues it using queue_work_on() while the former schedules
    timer which will expire on the closest tick. This means, when @delay
    is zero, that [__]cancel_delayed_work() + queue_delayed_work_on()
    makes the target item immediately executable while
    mod_delayed_work_on() may induce delay of up to a full tick.

    This somewhat subtle difference breaks some of the converted users.
    e.g. block queue plugging uses delayed_work for deferred processing
    and uses mod_delayed_work_on() when the queue needs to be immediately
    unplugged. The above problem manifested as noticeably higher number
    of context switches under certain circumstances.

    The difference in behavior was caused by missing special case handling
    for 0 delay in mod_delayed_work_on() compared to
    queue_delayed_work_on(). Joonsoo Kim posted a patch to add it -
    ("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1].
    The patch was queued for 3.8 but it was described as optimization and
    I missed that it was a correctness issue.

    As both queue_delayed_work_on() and mod_delayed_work_on() use
    __queue_delayed_work() for queueing, it seems that the better approach
    is to move the 0 delay special handling to the function instead of
    duplicating it in mod_delayed_work_on().

    Fix the problem by moving 0 delay special case handling from
    queue_delayed_work_on() to __queue_delayed_work(). This replaces
    Joonsoo's patch.

    [1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Anders Kaseorg
    Reported-and-tested-by: Zlatko Calusic
    Cc: Joonsoo Kim

    Tejun Heo
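
The fix described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `model_*` names and `struct model_dwork` are hypothetical stand-ins, not the kernel's types or functions. It shows why placing the zero-delay special case in the shared helper makes the "queue" and "modify" paths behave identically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a delayed work item; not the kernel structure. */
struct model_dwork {
    bool queued;           /* on the worklist, immediately executable */
    bool timer_pending;    /* a timer is armed to queue it later */
    unsigned long expires; /* tick at which the timer would fire */
};

/* Stand-in for queueing the item directly, bypassing the timer. */
static void model_queue_work(struct model_dwork *dw)
{
    dw->timer_pending = false;
    dw->queued = true;
}

/*
 * Stand-in for the shared helper. The zero-delay special case lives
 * here, so every caller gets the same behavior.
 */
static void model_queue_delayed_work(struct model_dwork *dw,
                                     unsigned long now, unsigned long delay)
{
    assert(dw != NULL);
    if (delay == 0) {
        model_queue_work(dw);
        return;
    }
    dw->queued = false;
    dw->timer_pending = true;
    dw->expires = now + delay;
}

/* Model of "queue unless already pending". Returns false if pending. */
static bool model_queue_delayed_work_on(struct model_dwork *dw,
                                        unsigned long now, unsigned long delay)
{
    if (dw->queued || dw->timer_pending)
        return false;
    model_queue_delayed_work(dw, now, delay);
    return true;
}

/* Model of "modify": drop any pending instance, then requeue. */
static bool model_mod_delayed_work_on(struct model_dwork *dw,
                                      unsigned long now, unsigned long delay)
{
    bool was_pending = dw->queued || dw->timer_pending;

    dw->queued = false;
    dw->timer_pending = false;
    model_queue_delayed_work(dw, now, delay);
    return was_pending;
}
```

In this model, calling `model_mod_delayed_work_on()` with a zero delay leaves the item queued with no timer armed, which is the same outcome as a cancel followed by a zero-delay queue.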
     
  • A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling
    off, never to be seen again. In the case where this occurred, an exiting
    thread hit reiserfs homebrew conditional resched while holding a mutex,
    bringing the box to its knees.

    PID: 18105 TASK: ffff8807fd412180 CPU: 5 COMMAND: "kdmflush"
    #0 [ffff8808157e7670] schedule at ffffffff8143f489
    #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
    #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
    #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
    #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
    #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
    #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
    #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
    #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
    #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
    [exception RIP: kernel_thread_helper]
    RIP: ffffffff8144a5c0 RSP: ffff8808157e7f58 RFLAGS: 00000202
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffffff8107af60 RDI: ffff8803ee491d18
    RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

    Signed-off-by: Mike Galbraith
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Mike Galbraith
     
  • Pull RCU fix from Ingo Molnar:
    "Fix leaking RCU extended quiescent state, which might trigger warnings
    and mess up the extended quiescent state tracking logic into thinking
    that we are in "RCU user mode" while we aren't."

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    rcu: Fix unrecovered RCU user mode in syscall_trace_leave()

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "This is mostly about unbreaking architectures that took the UAPI
    changes in the v3.7 cycle, plus misc fixes."

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf kvm: Fix building perf kvm on non x86 arches
    perf kvm: Rename perf_kvm to perf_kvm_stat
    perf: Make perf build for x86 with UAPI disintegration applied
    perf powerpc: Use uapi/unistd.h to fix build error
    tools: Pass the target in descend
    tools: Honour the O= flag when tool build called from a higher Makefile
    tools: Define a Makefile function to do subdir processing
    x86: Export asm/{svm.h,vmx.h,perf_regs.h}
    perf tools: Fix strbuf_addf() when the buffer needs to grow
    perf header: Fix numa topology printing
    perf, powerpc: Fix hw breakpoints returning -ENOSPC

    Linus Torvalds
     

01 Dec, 2012

20 commits

  • …it/acme/linux into perf/urgent

    Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

    - Don't build 'perf kvm stat' on non-x86 arches, fix from Xiao Guangrong.

    - UAPI fixes to get perf building again in non-x86 arches, from David Howells.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Pull x86 fixes from Peter Anvin.

    This includes the resume-time FPU corruption fix from the chromeos guys,
    marked for stable.

    * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, fpu: Avoid FPU lazy restore after suspend
    x86-32: Unbreak booting on some 486 clones
    x86, kvm: Remove incorrect redundant assembly constraint

    Linus Torvalds
     
  • Pull C6X fixes from Mark Salter.

    * tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
    c6x: use generic kvm_para.h
    c6x: remove internal kernel symbols from exported setup.h
    c6x: fix misleading comment
    c6x: run do_notify_resume with interrupts enabled

    Linus Torvalds
     
  • Pull assorted signal-related fixes from Al Viro:
    "uml regression fix (braino in sys_execve() patch) + a bunch of fucked
    sigaltstack-on-rt_sigreturn uses, similar to sparc64 fix that went in
    through davem's tree. m32r horrors not included - that one's waiting
    for maintainer."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    microblaze: rt_sigreturn is too trigger-happy about sigaltstack errors
    score: do_sigaltstack() expects a userland pointer...
    sh64: fix altstack switching on sigreturn
    openrisc: fix altstack switching on sigreturn
    um: get_safe_registers() should be done in flush_thread(), not start_thread()

    Linus Torvalds
     
  • Pull CIFS fixes from Steve French:
    "Two low risk, small fixes, that fix cifs regressions introduced in
    3.7."

    * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: Fix wrong buffer pointer usage in smb_set_file_info
    cifs: fix writeback race with file that is growing

    Linus Torvalds
     
  • Pull remoteproc fix from Ohad Ben-Cohen:
    "A single remoteproc fix for an error path issue reported by Ido Yariv."

    * tag 'rproc-3.7-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
    remoteproc: fix error path of ->find_vqs

    Linus Torvalds
     
  • Pull target fix from Nicholas Bellinger:
    "So just a single target fix for v3.7.0 this time around from Roland to
    address an aborted command bug w/ tcm_qla2xxx fabric ports.

    Also, there is one outstanding IBLOCK + virtio-blk bug that is still
    being tracked down affecting v3.6.x, but AFAICT thus far this appears
    to be a bug outside of target code."

    * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    target: Fix handling of aborted commands

    Linus Torvalds
     
  • When a CPU enters S3 state, the FPU state is lost.
    After resuming from S3, if we try to lazy restore the FPU for a process running
    on the same CPU, this will result in a corrupted FPU context.

    Ensure that "fpu_owner_task" is properly invalidated when (re-)initializing a CPU,
    so nobody will try to lazy restore a state which doesn't exist in the hardware.

    Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
    by doing thousands of suspend/resume cycles with 4 processes doing FPU
    operations running. Without the patch, a process is killed by a SIGFPE
    after a few hundred cycles.

    Cc: Duncan Laurie
    Cc: Olof Johansson
    Cc: v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write
    Signed-off-by: Vincent Palatin
    Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org
    Signed-off-by: H. Peter Anvin

    Vincent Palatin
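
The invalidation described in this commit can be sketched as a user-space model. The names below (`model_task`, `model_fpu_owner`, `model_cpu_init` and so on) are hypothetical and only mirror the idea in the message: a lazy restore is valid only while the CPU still records the task as owner of its FPU registers, so clearing that record at CPU (re-)initialization forces a full restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical task: remembers the CPU that last held its FPU state. */
struct model_task {
    int last_cpu;
};

/* Per-CPU record of whose FPU state is live in the registers. */
static struct model_task *model_fpu_owner[MODEL_NR_CPUS];

/* Load the task's FPU state into the registers of the given CPU. */
static void model_fpu_load(struct model_task *t, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_fpu_owner[cpu] = t;
    t->last_cpu = cpu;
}

/* A lazy restore (skipping the reload) is valid only if both sides agree. */
static bool model_fpu_lazy_restore(const struct model_task *t, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_fpu_owner[cpu] == t && t->last_cpu == cpu;
}

/*
 * CPU (re-)initialization, for example after resume: the register
 * contents are gone, so the owner record must be invalidated.
 */
static void model_cpu_init(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_fpu_owner[cpu] = NULL;
}
```

Without the `model_cpu_init()` step, the model would keep reporting that a lazy restore is safe after the registers were lost, which corresponds to the corruption described above.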
     
  • Pull DRM fixes from Dave Airlie:
    "Just driver fixes, nothing major, except maybe the Ironlake rc6
    disable:

    - intel:
    * revert ironlake rc6 - we still have one ilk regression, but this
    gets rid of one big one
    * turn off cloning
    * a directed fix for Apple edp
    - radeon: one modesetting fix
    - exynos: minor fixes"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
    radeon: fix pll/crtc mapping on dce2 and dce3 hardware
    Revert "drm/i915: enable rc6 on ilk again"
    drm/i915: do not default to 18 bpp for eDP if missing from VBT
    drm/exynos: Fix potential NULL pointer dereference in exynos_drm_encoder.c
    drm/exynos: Make exynos4/5_fimd_driver_data static
    drm/exynos: fix overlay updating issue
    drm/exynos: remove unnecessary code.
    drm/exynos: fix linux framebuffer address setting.
    drm/i915: disable cloning on sdvo

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "Seven fixes, some of them fingers-crossed :("

    * emailed patches from Andrew Morton : (7 patches)
    drivers/rtc/rtc-tps65910.c: fix invalid pointer access on _remove()
    mm: soft offline: split thp at the beginning of soft_offline_page()
    mm: avoid waking kswapd for THP allocations when compaction is deferred or contended
    revert "Revert "mm: remove __GFP_NO_KSWAPD""
    mm: vmscan: fix endless loop in kswapd balancing
    mm/vmemmap: fix wrong use of virt_to_page
    mm: compaction: fix return value of capture_free_page()

    Linus Torvalds
     
  • Pull ARM SoC fixes from Arnd Bergmann:
    "These are three fixes for the Marvell EBU family and one for the
    Samsung s3c platforms. All of them are obvious and should still make it
    into 3.7."

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ARM: Kirkwood: Update PCI-E fixup
    Dove: Fix irq_to_pmu()
    Dove: Attempt to fix PMU/RTC interrupts
    ARM: S3C24XX: Fix potential NULL pointer dereference error

    Linus Torvalds
     
  • Pull ARM ixp4xx bug fixes from Arnd Bergmann:
    "These were originally prepared by Krzysztof Halasa but not submitted
    in time for v3.7 due to some confusion about how ixp4xx patches should
    be handled. Jason Cooper thankfully offered to help out sending the
    patches upstream through arm-soc now, but given the timing, we could
    as well delay them for 3.8."

    * tag 'ixp4xx-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    IXP4xx: use __iomem for MMIO
    IXP4xx: map CPU config registers within VMALLOC region.
    IXP4xx: Always ioremap() Queue Manager MMIO region at boot.
    ixp4xx: Declare MODULE_FIRMWARE usage
    IXP4xx crypto: MOD_AES{128,192,256} already include key size.
    WAN: Remove redundant HDLC info printed by IXP4xx HSS driver.
    IXP4xx: Remove time limit for PCI TRDY to enable use of slow devices.
    IXP4xx: ixp4xx_crypto driver requires Queue Manager and NPE drivers.
    IXP4xx: HW pseudo-random generator is available on IXP45x/46x only.
    IXP4xx: Fix off-by-one bug in Goramo MultiLink platform.
    IXP4xx: Fix Goramo MultiLink platform compilation.

    Linus Torvalds
     
  • Pull final ARM fix from Russell King:
    "One final fix, spotted by Will, to do with what happens when we boot a
    SMP kernel on UP."

    * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
    ARM: 7586/1: sp804: set cpumask to cpu_possible_mask for clock event device

    Linus Torvalds
     
  • The tps65910_rtc data is registered as the platform driver data in
    _probe(). Therefore the tps65910_rtc should be used on unregistering
    the rtc device. And device pointer should be retrieved from the
    platform_device structure.

    This patch fixes the below oops:

    Unable to handle kernel NULL pointer dereference at virtual address 00000008
    Modules linked in: rtc_tps65910(-)
    CPU: 0 Not tainted (3.7.0-rc7-next-20121128-g6b1f974-dirty #7)
    PC is at tps65910_rtc_alarm_irq_enable+0x20/0x2c [rtc_tps65910]
    (tps65910_rtc_alarm_irq_enable+0x20/0x2c [rtc_tps65910])
    (tps65910_rtc_remove+0x18/0x28 [rtc_tps65910])
    (platform_drv_remove+0x18/0x1c)
    (__device_release_driver+0x70/0xcc)
    (driver_detach+0xb4/0xb8)
    (bus_remove_driver+0x7c/0xc0)
    (sys_delete_module+0x148/0x21c)

    Signed-off-by: Milo(Woogyom) Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kim, Milo
     
  • When we try to soft-offline a thp tail page, put_page() is called on the
    tail page unthinkingly and VM_BUG_ON is triggered in put_compound_page().

    This patch splits thp before going into the main body of soft-offlining.

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Tony Luck
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
    based on failures" reverted, Zdenek Kabelac reported the following

    Hmm, so it's just took longer to hit the problem and observe
    kswapd0 spinning on my CPU again - it's not as endless like before -
    but still it easily eats minutes - it helps to turn off Firefox
    or TB (memory hungry apps) so kswapd0 stops soon - and restart
    those apps again. (And I still have like >1GB of cached memory)

    kswapd0 R running task 0 30 2 0x00000000
    Call Trace:
    preempt_schedule+0x42/0x60
    _raw_spin_unlock+0x55/0x60
    put_super+0x31/0x40
    drop_super+0x22/0x30
    prune_super+0x149/0x1b0
    shrink_slab+0xba/0x510

    The sysrq+m indicates the system has no swap so it'll never reclaim
    anonymous pages as part of reclaim/compaction. That is one part of the
    problem but not the root cause as file-backed pages could also be
    reclaimed.

    The likely underlying problem is that kswapd is woken up or kept awake
    for each THP allocation request in the page allocator slow path.

    If compaction fails for the requesting process then compaction will be
    deferred for a time and direct reclaim is avoided. However, if there
    is a storm of THP requests that are simply rejected, it will still be
    the case that kswapd is awake for a prolonged period of time as
    pgdat->kswapd_max_order is updated each time. This is noticed by the
    main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead
    it will loop, shrinking a small number of pages and calling
    shrink_slab() on each iteration.

    This patch defers when kswapd gets woken up for THP allocations. For
    !THP allocations, kswapd is always woken up. For THP allocations,
    kswapd is woken up iff the process is willing to enter into direct
    reclaim/compaction.

    [akpm@linux-foundation.org: fix typo in comment]
    Signed-off-by: Mel Gorman
    Cc: Zdenek Kabelac
    Cc: Seth Jennings
    Cc: Jiri Slaby
    Cc: Rik van Riel
    Cc: Robert Jennings
    Cc: Valdis Kletnieks
    Cc: Glauber Costa
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
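
The wake-up rule stated in this commit can be written as a small decision function. This is a sketch, not the page allocator's code: `struct model_alloc_req` and both functions are hypothetical. Based on the commit title, it assumes that a THP request is "willing to enter direct reclaim/compaction" exactly when compaction is neither deferred nor contended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one allocation request. */
struct model_alloc_req {
    bool is_thp;               /* transparent huge page allocation */
    bool compaction_deferred;  /* compaction recently failed and is deferred */
    bool compaction_contended; /* compaction backed off due to contention */
};

/* Assumption: a THP request proceeds only if compaction can run. */
static bool model_thp_will_reclaim(const struct model_alloc_req *req)
{
    return !req->compaction_deferred && !req->compaction_contended;
}

/*
 * Non-THP requests always wake kswapd. THP requests wake it only if
 * the caller will itself enter direct reclaim/compaction.
 */
static bool model_should_wake_kswapd(const struct model_alloc_req *req)
{
    assert(req != NULL);
    if (!req->is_thp)
        return true;
    return model_thp_will_reclaim(req);
}
```

Under this rule a storm of rejected THP requests no longer keeps waking kswapd, while ordinary allocations are unaffected.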
     
  • It appears that this patch was innocent, and we hope that "mm: avoid
    waking kswapd for THP allocations when compaction is deferred or
    contended" will fix the final kswapd-spinning cause.

    Cc: Zdenek Kabelac
    Cc: Seth Jennings
    Cc: Valdis Kletnieks
    Cc: Jiri Slaby
    Cc: Rik van Riel
    Cc: Robert Jennings
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Kswapd does not in all places have the same criteria for a balanced
    zone. Zones are only being reclaimed when their high watermark is
    breached, but compaction checks loop over the zonelist again when the
    zone does not meet the low watermark plus two times the size of the
    allocation. This gets kswapd stuck in an endless loop over a small
    zone, like the DMA zone, where the high watermark is smaller than the
    compaction requirement.

    Add a function, zone_balanced(), that checks the watermark, and, for
    higher order allocations, if compaction has enough free memory. Then
    use it uniformly to check for balanced zones.

    This makes sure that when the compaction watermark is not met, at least
    reclaim happens and progress is made - or the zone is declared
    unreclaimable at some point and skipped entirely.

    Signed-off-by: Johannes Weiner
    Reported-by: George Spelvin
    Reported-by: Johannes Hirte
    Reported-by: Tomas Racek
    Tested-by: Johannes Hirte
    Reviewed-by: Rik van Riel
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
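
The unified check described in this commit can be modeled in user space as follows. The structure and helpers are hypothetical, and the compaction requirement is taken directly from the message above (low watermark plus two times the size of the allocation). The actual kernel helpers are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zone with page counts only; not the kernel's struct zone. */
struct model_zone {
    unsigned long free_pages;
    unsigned long low_wmark;
    unsigned long high_wmark;
};

/* Compaction needs the low watermark plus twice the allocation size. */
static bool model_compaction_has_room(const struct model_zone *z,
                                      unsigned int order)
{
    assert(order < 8 * sizeof(unsigned long) - 1);
    return z->free_pages > z->low_wmark + (2UL << order);
}

/*
 * One definition of "balanced" for every caller: the high watermark
 * must be met and, for higher-order requests, compaction must also
 * have enough free memory to work with.
 */
static bool model_zone_balanced(const struct model_zone *z, unsigned int order)
{
    assert(z != NULL);
    if (z->free_pages <= z->high_wmark)
        return false;
    if (order > 0 && !model_compaction_has_room(z, order))
        return false;
    return true;
}
```

For a small zone whose high watermark is below the compaction requirement, this model reports "not balanced" for a high-order request, so reclaim continues instead of the two checks disagreeing indefinitely.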
     
  • With CONFIG_DEBUG_VIRTUAL and CONFIG_SPARSEMEM_VMEMMAP enabled, memory
    hotremove triggers a kernel BUG at arch/x86/mm/physaddr.c:20.

    It is caused by free_section_usemap() calling virt_to_page().
    virt_to_page() is only valid for kernel direct mapping addresses, but
    sparse-vmemmap uses vmemmap addresses, so the lookup goes wrong here.

    ------------[ cut here ]------------
    kernel BUG at arch/x86/mm/physaddr.c:20!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: acpihp_drv acpihp_slot edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse vfat fat loop dm_mod coretemp kvm crc32c_intel ipv6 ixgbe igb iTCO_wdt i7core_edac edac_core pcspkr iTCO_vendor_support ioatdma microcode joydev sr_mod i2c_i801 dca lpc_ich mfd_core mdio tpm_tis i2c_core hid_generic tpm cdrom sg tpm_bios rtc_cmos button ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh ata_generic ata_piix libata megaraid_sas scsi_mod
    CPU 39
    Pid: 6454, comm: sh Not tainted 3.7.0-rc1-acpihp-final+ #45 QCI QSSC-S4R/QSSC-S4R
    RIP: 0010:[] [] __phys_addr+0x88/0x90
    RSP: 0018:ffff8804440d7c08 EFLAGS: 00010006
    RAX: 0000000000000006 RBX: ffffea0012000000 RCX: 000000000000002c
    ...

    Signed-off-by: Jianguo Wu
    Signed-off-by: Jiang Liu
    Reviewed-by: Wen Congyang
    Acked-by: Johannes Weiner
    Reviewed-by: Yasuaki Ishimatsu
    Reviewed-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
     
  • Commit ef6c5be658f6 ("fix incorrect NR_FREE_PAGES accounting (appears
    like memory leak)") fixes a NR_FREE_PAGES accounting leak but missed the
    return value, which was also missed by this reviewer until today.

    That return value is used by compaction when adding pages to a list of
    isolated free pages and without this follow-up fix, there is a risk of
    free list corruption.

    Signed-off-by: Mel Gorman
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

29 Nov, 2012

16 commits