18 May, 2011

9 commits

  • Now that we have CONFIG_DYNAMIC_DEBUG there is no need for yet
    another flag causing dev_dbg() and pr_debug() statements in the
    core PM code to produce output. Moreover, CONFIG_PM_VERBOSE
    causes so much output to be generated that it's not really useful
    and almost no one sets it.

    References: https://bugzilla.kernel.org/show_bug.cgi?id=23182
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • * power-domains:
    PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset
    PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops"
    OMAP1 / PM: Use generic clock manipulation routines for runtime PM
    PM / Runtime: Generic clock manipulation rountines for runtime PM (v6)
    PM / Runtime: Add subsystem data field to struct dev_pm_info
    OMAP2+ / PM: move runtime PM implementation to use device power domains
    PM / Platform: Use generic runtime PM callbacks directly
    shmobile: Use power domains for platform runtime PM
    PM: Export platform bus type's default PM callbacks
    PM: Make power domain callbacks take precedence over subsystem ones

    Rafael J. Wysocki
     
  • * syscore:
    PM: Remove sysdev suspend, resume and shutdown operations
    PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
    PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
    PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
    PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM
    ARM / Samsung: Use struct syscore_ops for "core" power management
    ARM / PXA: Use struct syscore_ops for "core" power management
    ARM / SA1100: Use struct syscore_ops for "core" power management
    ARM / Integrator: Use struct syscore_ops for core PM
    ARM / OMAP: Use struct syscore_ops for "core" power management
    ARM: Use struct syscore_ops instead of sysdevs for PM in common code

    Rafael J. Wysocki
     
  • This reverts commit bea3864fb627d110933cfb8babe048b63c4fc76e
    (PM / Hibernate: Reduce autotuned default image size), because users
    are now able to resolve the issue this commit was supposed to address
    in a different way (i.e. by using the new /sys/power/reserved_size
    interface).

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Martin reports that on his system hibernation occasionally fails due
    to the lack of memory, because the radeon driver apparently allocates
    too much of it during the device freeze stage. It turns out that the
    amount of memory allocated by radeon during hibernation (and
    presumably during system suspend too) depends on the utilization of
    the GPU (e.g. hibernating while there are two KDE 4 sessions with
    compositing enabled causes radeon to allocate more memory than for
    one KDE 4 session).

    In principle it should be possible to use image_size to make the
    memory preallocation mechanism free enough memory for the radeon
    driver, but in practice it is not easy to guess the right value
    because of the way the preallocation code uses image_size. For this
    reason, it seems reasonable to allow users to control the amount of
    memory reserved for driver allocations made after the hibernate
    preallocation, which currently is constant and amounts to 1 MB.

    Introduce a new sysfs file, /sys/power/reserved_size, whose value
    will be used as the amount of memory to reserve for the
    post-preallocation reservations made by device drivers, in bytes.
    For backwards compatibility, set its default (and initial) value to
    the currently used number (1 MB).

    References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
    Reported-and-tested-by: Martin Steigerwald
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
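
The new interface described above takes a byte count. A minimal user-space sketch of writing one, assuming a hypothetical helper named set_reserved_size() that is not part of the commit; the path is a parameter so the helper can be exercised against an ordinary file:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a reservation size, in bytes, to a sysfs-style
 * file such as /sys/power/reserved_size. Returns 0 on success, -1 on error. */
int set_reserved_size(const char *path, unsigned long bytes)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%lu\n", bytes) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_reserved_size("/sys/power/reserved_size", 4UL * 1024 * 1024) as root would request 4 MB instead of the 1 MB default; the right value depends on the drivers in use.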
     
  • We need to prevent kernel-forked processes during system poweroff.
    Such processes try to access the filesystem whose disks we are
    trying to shut down at the same time. This causes delays and exceptions
    in the storage drivers.

    A follow-up patch will add these calls and needs usermodehelper_disable()
    also on systems without suspend support.

    Signed-off-by: Kay Sievers
    Signed-off-by: Rafael J. Wysocki

    Kay Sievers
     
  • Some drivers erroneously use request_firmware() from their ->resume()
    (or ->thaw(), or ->restore()) callbacks, which is not going to work
    unless the firmware has been built in. This causes system resume to
    stall until the firmware-loading timeout expires, which makes users
    think that the resume has failed and reboot their machines
    unnecessarily. For this reason, make _request_firmware() print a
    warning and return immediately with error code if it has been called
    when tasks are frozen and it's impossible to start any new usermode
    helpers.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Valdis Kletnieks

    Rafael J. Wysocki
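
The fail-fast behaviour described in this commit can be modelled in user space roughly as follows. This is a sketch, not the kernel code: helpers_disabled, request_firmware_model() and the choice of -EBUSY as the error code are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's "usermode helpers cannot be started" state. */
bool helpers_disabled;

/* Hypothetical model of the guard: return at once instead of waiting for
 * a firmware-loading timeout that can never be satisfied. */
int request_firmware_model(const char *name)
{
    assert(name != NULL);
    if (helpers_disabled) {
        fprintf(stderr, "firmware: %s requested while tasks are frozen\n",
                name);
        return -EBUSY; /* error code chosen for this sketch */
    }
    /* A real implementation would ask user space for the image here. */
    return 0;
}
```

A driver calling this from a resume callback sees an immediate error and a log line rather than a stall.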
     
  • The freezer processes are dealing with multiple threads running
    simultaneously, and on a UP system, the memory reads/writes do
    not need barriers to keep things in sync. These are only needed
    on SMP systems, so use SMP barriers instead.

    Signed-off-by: Mike Frysinger
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Mike Frysinger
     
  • The current implementation of suspend-to-RAM returns 0 if there is an
    error from suspend_enter(), because suspend_devices_and_enter() ignores
    the return value from suspend_enter(). This patch addresses the issue:
    it keeps the error returned by suspend_enter() and lets
    suspend_devices_and_enter() relay it to the caller.

    Signed-off-by: MyungJoo Ham
    Signed-off-by: Kyungmin Park
    Signed-off-by: Rafael J. Wysocki

    MyungJoo Ham
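
The difference between dropping and relaying the error can be shown with a small model. The functions below are simplified stand-ins with hypothetical names, not the kernel routines themselves:

```c
#include <assert.h>

/* What the simulated suspend_enter() should return in this model. */
int enter_result;

int suspend_enter_model(void)
{
    return enter_result;
}

/* Before the fix: the result of entering the sleep state is dropped. */
int suspend_devices_and_enter_old(void)
{
    (void)suspend_enter_model();
    return 0;
}

/* After the fix: the result is kept and relayed to the caller. */
int suspend_devices_and_enter_new(void)
{
    int error = suspend_enter_model();

    return error;
}
```

With enter_result set to a negative value, only the second variant lets the caller notice that the transition failed.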
     

14 May, 2011

1 commit

  • If !CONFIG_USERNS, have current_user_ns() defined to (&init_user_ns).

    Get rid of _current_user_ns. This requires nsown_capable() to be
    defined in capability.c rather than as static inline in capability.h,
    so do that.

    Request_key needs init_user_ns defined at current_user_ns if
    !CONFIG_USERNS, so forward-declare that in cred.h if !CONFIG_USERNS
    at current_user_ns() define.

    Compile-tested with and without CONFIG_USERNS.

    Signed-off-by: Serge E. Hallyn
    [ This makes a huge performance difference for acl_permission_check(),
    up to 30%. And that is one of the hottest kernel functions for loads
    that are pathname-lookup heavy. ]
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
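
A reduced sketch of the compile-time selection described above, written as stand-alone C. The struct body and the lookup_current_user_ns() slow path are placeholders invented for the example; only the shape of the #ifdef mirrors the commit message.

```c
#include <assert.h>

/* Placeholder type: the real structure has many more fields. */
struct user_namespace {
    int id;
};

struct user_namespace init_user_ns = { 0 };

#ifdef CONFIG_USERNS
/* Hypothetical slow path that would follow the current credentials. */
struct user_namespace *lookup_current_user_ns(void);
#define current_user_ns() lookup_current_user_ns()
#else
/* Without namespaces there is only one answer, known at compile time. */
#define current_user_ns() (&init_user_ns)
#endif
```

When CONFIG_USERNS is not defined, every use of current_user_ns() collapses to the address of a global, which is the kind of simplification the bracketed performance note refers to.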
     

12 May, 2011

4 commits

  • Since suspend, resume and shutdown operations in struct sysdev_class
    and struct sysdev_driver are not used any more, remove them. Also
    drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
    for executing those operations and modify all of their users
    accordingly. This reduces kernel code size quite a bit and reduces
    its complexity.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • The SNAPSHOT_S2RAM ioctl used for implementing the feature allowing
    one to suspend to RAM after creating a hibernation image is currently
    broken, because it doesn't clear the "ready" flag in the struct
    snapshot_data object handled by it. As a result, the
    SNAPSHOT_UNFREEZE doesn't work correctly after SNAPSHOT_S2RAM has
    returned and the user space hibernate task cannot thaw the other
    processes as appropriate. Make SNAPSHOT_S2RAM clear data->ready
    to fix this problem.

    Tested-by: Alexandre Felipe Muller de Souza
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Rafael J. Wysocki
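
The interaction between the two ioctls can be pictured with a small state model. This is a sketch under assumptions: struct snapshot_model, the function names, the error codes, and the exact condition checked by the unfreeze step are illustrative and not copied from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct snapshot_data. */
struct snapshot_model {
    bool frozen; /* other tasks are frozen */
    bool ready;  /* an image has been created */
};

/* Model of SNAPSHOT_S2RAM with the fix applied. */
int s2ram_model(struct snapshot_model *d)
{
    if (!d->frozen)
        return -EPERM;
    /* ... the system would suspend to RAM and wake up here ... */
    d->ready = false; /* the fix: clear the stale "ready" flag */
    return 0;
}

/* Model of SNAPSHOT_UNFREEZE: assumed to refuse while "ready" is set. */
int unfreeze_model(struct snapshot_model *d)
{
    if (!d->frozen || d->ready)
        return -EBUSY;
    d->frozen = false;
    return 0;
}
```

Without the assignment to d->ready in s2ram_model(), the later unfreeze step in this model would keep failing and the tasks would stay frozen.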
     
  • If the process using the hibernate user space interface closes
    /dev/snapshot after creating a hibernation image without thawing
    tasks, snapshot_release() should call pm_restore_gfp_mask() to
    restore the GFP mask used before the creation of the image. Make
    that happen.

    Tested-by: Alexandre Felipe Muller de Souza
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Rafael J. Wysocki
     
  • A warning is printed by pm_restrict_gfp_mask() while the
    SNAPSHOT_S2RAM ioctl is being executed after creating a hibernation
    image, because pm_restrict_gfp_mask() has been called once already
    before the image creation and suspend_devices_and_enter() calls it
    once again. This happens after commit 452aa6999e6703ffbddd7f6ea124d3
    (mm/pm: force GFP_NOIO during suspend/hibernation and resume).

    To avoid this issue, move pm_restrict_gfp_mask() and
    pm_restore_gfp_mask() from suspend_devices_and_enter() to its caller
    in kernel/power/suspend.c.

    Reported-by: Alexandre Felipe Muller de Souza
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Rafael J. Wysocki
     

08 May, 2011

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf tools: Makefile: Use gcc to determine ARCH
    perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions
    hw_breakpoints, powerpc: Fix CONFIG_HAVE_HW_BREAKPOINT off-case in ptrace_set_debugreg()
    sh, hw_breakpoints: Fix racy access to ptrace breakpoints
    arm, hw_breakpoints: Fix racy access to ptrace breakpoints
    powerpc, hw_breakpoints: Fix racy access to ptrace breakpoints
    x86, hw_breakpoints: Fix racy access to ptrace breakpoints
    ptrace: Prepare to fix racy accesses on task breakpoints

    Linus Torvalds
     

07 May, 2011

1 commit

  • This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266.

    That commit changed the structure layout of the trace structure, which
    in turn broke PowerTOP (1.9x generation) quite badly.

    I appreciate not wanting to expose the variable in question, and
    PowerTOP was not using it, so I've replaced the variable with just a
    padding field - that way if in the future a new field is needed it can
    just use this padding field.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

06 May, 2011

1 commit


05 May, 2011

1 commit


03 May, 2011

1 commit

  • commit ab7798ffcf98b11a9525cf65bacdae3fd58d357f ("genirq: Expand generic
    show_interrupts()") added the Kconfig option GENERIC_IRQ_SHOW_LEVEL to
    accommodate PowerPC, but this doesn't actually enable the functionality due
    to a typo in the #ifdef check.

    Signed-off-by: Geert Uytterhoeven
    Cc: Linux/PPC Development
    Link: http://lkml.kernel.org/r/%3Calpine.DEB.2.00.1104302251370.19068%40ayla.of.borg%3E
    Signed-off-by: Thomas Gleixner

    Geert Uytterhoeven
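
This class of bug is easy to miss because the preprocessor treats an unknown symbol as simply undefined. A self-contained sketch; the misspelled symbol below is a made-up example, not necessarily the typo the commit fixed:

```c
#include <assert.h>

/* Pretend the option was enabled in the build configuration. */
#define CONFIG_GENERIC_IRQ_SHOW_LEVEL 1

/* Misspelled check (hypothetical typo): the symbol is never defined, so
 * the feature is silently compiled out and no warning is produced. */
int show_level_misspelled(void)
{
#ifdef CONFIG_GENERIC_IRQ_SHOW_LEVLE
    return 1;
#else
    return 0;
#endif
}

/* Correct check: follows the configured option. */
int show_level_fixed(void)
{
#ifdef CONFIG_GENERIC_IRQ_SHOW_LEVEL
    return 1;
#else
    return 0;
#endif
}
```

Both functions compile cleanly, which is why such a typo tends to surface only when someone notices the missing output.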
     

01 May, 2011

1 commit


30 Apr, 2011

4 commits

  • Many different platforms and subsystems may want to disable device
    clocks during suspend and enable them during resume which is going to
    be done in a very similar way in all those cases. For this reason,
    provide generic routines for the manipulation of device clocks during
    suspend and resume.

    Convert the ARM shmobile platform to using the new routines.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf, x86, nmi: Move LVT un-masking into irq handlers
    perf events, x86: Work around the Nehalem AAJ80 erratum
    perf, x86: Fix BTS condition
    ftrace: Build without frame pointers on Microblaze

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically
    rtc: max8925: Call dev_set_drvdata before rtc_device_register

    Linus Torvalds
     
  • If a rescuer and stop_machine() bringing down a CPU race with each
    other, they may deadlock on a non-preemptive kernel. The CPU won't
    accept a new task, so the rescuer can't migrate to the target CPU,
    while stop_machine() can't proceed because the rescuer is holding one
    of the CPU retrying migration. GCWQ_DISASSOCIATED is never cleared
    and worker_maybe_bind_and_lock() retries indefinitely.

    This problem can be reproduced semi-reliably while the system is
    entering suspend.

    http://thread.gmane.org/gmane.linux.kernel/1122051

    A lot of kudos to Thilo-Alexander for reporting this tricky issue and
    painstaking testing.

    stable: This affects all kernels with cmwq, so all kernels since and
    including v2.6.36 need this fix.

    Signed-off-by: Tejun Heo
    Reported-by: Thilo-Alexander Ginkel
    Tested-by: Thilo-Alexander Ginkel
    Cc: stable@kernel.org

    Tejun Heo
     

29 Apr, 2011

2 commits

  • Sedat and Bruno reported RCU stalls which turned out to be caused by
    the following:

    sched_init() calls init_rt_bandwidth() which calls hrtimer_init()
    _BEFORE_ hrtimers_init() is called. While not entirely correct this
    worked because hrtimer_init() only accessed statically initialized
    data (hrtimer_bases.clock_base[CLOCK_MONOTONIC])

    Commit e06383db9 (hrtimers: extend hrtimer base code to handle more
    then 2 clockids) added an indirection to the hrtimer_bases.clock_base
    lookup to avoid gap handling in the hot path. The table which is used
    for the translation from CLOCK_ID to HRTIMER_BASE index is
    initialized at runtime in hrtimers_init(). So the early call of the
    scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME.

    Thus the rt_bandwidth timer ends up on CLOCK_REALTIME. If the timer is
    armed and the wall clock time is set (e.g. ntpdate in the early boot
    process - which also gives the problem deterministic behaviour
    i.e. magic recovery after N hours), then the timer ends up with an
    expiry time far into the future. That breaks the RT throttler
    mechanism as rt runtime is accumulated and never cleared, so the rt
    throttler detects a false cpu hog condition and blocks all RT tasks
    until the timer finally expires. That in turn stalls the RCU thread of
    TINYRCU which leads to a huge number of RCU callbacks piling up.

    Make the translation table statically initialized, so we are back to
    the status of
    Reported-by: Bruno Prémont
    Cc: John Stultz
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3E
    Reviewed-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
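
The ordering problem and its fix can be reproduced with two lookup tables in plain C. All identifiers below are model names invented for the sketch; the only property taken from the commit message is that an unfilled table entry reads as the REALTIME base.

```c
#include <assert.h>

enum model_clockid {
    MODEL_CLOCK_REALTIME = 0,
    MODEL_CLOCK_MONOTONIC = 1,
    MODEL_MAX_CLOCKS = 16
};

enum model_base {
    MODEL_BASE_REALTIME = 0,
    MODEL_BASE_MONOTONIC = 1
};

/* Filled in by a late init function: all zero (REALTIME) until it runs. */
int runtime_table[MODEL_MAX_CLOCKS];

void runtime_table_init(void)
{
    runtime_table[MODEL_CLOCK_REALTIME] = MODEL_BASE_REALTIME;
    runtime_table[MODEL_CLOCK_MONOTONIC] = MODEL_BASE_MONOTONIC;
}

/* Filled in at compile time: correct before any init code has run. */
const int static_table[MODEL_MAX_CLOCKS] = {
    [MODEL_CLOCK_REALTIME] = MODEL_BASE_REALTIME,
    [MODEL_CLOCK_MONOTONIC] = MODEL_BASE_MONOTONIC,
};
```

An early caller that indexes runtime_table with MODEL_CLOCK_MONOTONIC before runtime_table_init() gets the REALTIME base, while static_table gives the right answer regardless of call order.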
     
  • In corner cases where softlockup watchdog is not set up successfully, the
    relevant nmi perf event for hardlockup watchdog could be disabled, then
    the status of the underlying hardware remains unchanged.

    Also, if the kthread doesn't start then the hrtimer won't run and the
    hardlockup detector will falsely fire.

    Signed-off-by: Hillf Danton
    Signed-off-by: Don Zickus
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

27 Apr, 2011

1 commit


25 Apr, 2011

1 commit

  • When a task is traced and is in a stopped state, the tracer
    may execute a ptrace request to examine the tracee state and
    get its task struct. Right after, the tracee can be killed
    and thus its breakpoints released.
    This can happen concurrently when the tracer is in the middle
    of reading or modifying these breakpoints, leading to dereferencing
    a freed pointer.

    Hence, to prepare the fix, create a generic breakpoint reference
    holding API. When a reference on the breakpoints of a task is
    held, the breakpoints won't be released until the last reference
    is dropped. After that, no more ptrace request on the task's
    breakpoints can be serviced for the tracer.

    Reported-by: Oleg Nesterov
    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: Prasad
    Cc: Paul Mundt
    Cc: v2.6.33..
    Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com

    Frederic Weisbecker
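
A reference-holding scheme of the kind described can be sketched with C11 atomics. The structure and the bp_get()/bp_put() names are hypothetical; the sketch only demonstrates the two stated properties: the breakpoints survive while a reference is held, and no new reference can be taken after the last one is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct tracee_model {
    atomic_int bp_refs; /* starts at 1: the tracee's own reference */
    int *breakpoints;   /* freed when the last reference is dropped */
};

/* Take a reference unless the count has already reached zero. */
bool bp_get(struct tracee_model *t)
{
    int old = atomic_load(&t->bp_refs);

    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&t->bp_refs, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the last one out releases the breakpoints. */
void bp_put(struct tracee_model *t)
{
    if (atomic_fetch_sub(&t->bp_refs, 1) == 1) {
        free(t->breakpoints);
        t->breakpoints = NULL;
    }
}
```

In this model a tracer calls bp_get() before touching the breakpoints and bp_put() afterwards, so a tracee that exits in between only drops its own reference.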
     

24 Apr, 2011

1 commit


21 Apr, 2011

1 commit

  • Microblaze doesn't need/support FRAME_POINTERS in order to have a working
    function tracer.

    This patch removes the Kconfig warning.

    Warning log:
    warning: (LOCKDEP && FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP &&
    FUNCTION_TRACER && KMEMCHECK) selects FRAME_POINTER which has unmet direct
    dependencies (DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 ||
    SUPERH || BLACKFIN || MN10300) || ARCH_WANT_FRAME_POINTERS)

    Signed-off-by: Michal Simek
    Link: http://lkml.kernel.org/r/1301908812-8119-2-git-send-email-monstr@monstr.eu
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Michal Simek
     

20 Apr, 2011

2 commits

  • Device suspend/resume infrastructure is used not only by the suspend
    and hibernate code in kernel/power, but also by APM, Xen and the
    kexec jump feature. However, commit 40dc166cb5dddbd36aa4ad11c03915ea
    (PM / Core: Introduce struct syscore_ops for core subsystems PM)
    failed to add syscore_suspend() and syscore_resume() calls to that
    code, which generally leads to breakage when the features in question
    are used.

    To fix this problem, add the missing syscore_suspend() and
    syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c
    and drivers/xen/manage.c.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman
    Acked-by: Ian Campbell

    Rafael J. Wysocki
     
  • …l/git/tip/linux-2.6-tip

    * 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    RTC: rtc-omap: Fix a leak of the IRQ during init failure
    posix clocks: Replace mutex with reader/writer semaphore

    Linus Torvalds
     

19 Apr, 2011

2 commits

  • If syscore_suspend() fails in suspend_enter(), create_image() or
    resume_target_kernel(), it is necessary to call sysdev_resume(),
    because sysdev_suspend() has been called already and succeeded
    and we are going to abort the transition.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman

    Rafael J. Wysocki
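
The unwinding rule stated above (undo the step that already succeeded before aborting) looks roughly like this in a stand-alone model. The *_model functions and the two state variables are invented for the sketch, and -1 stands in for whatever error the real code would return:

```c
#include <assert.h>

int sysdev_suspended; /* 1 while the modelled sysdevs are suspended */
int syscore_fail;     /* set to 1 to make the syscore step fail */

int sysdev_suspend_model(void)
{
    sysdev_suspended = 1;
    return 0;
}

void sysdev_resume_model(void)
{
    sysdev_suspended = 0;
}

int syscore_suspend_model(void)
{
    return syscore_fail ? -1 : 0;
}

int enter_model(void)
{
    int error = sysdev_suspend_model();

    if (error)
        return error;

    error = syscore_suspend_model();
    if (error) {
        /* The earlier step succeeded, so undo it before aborting. */
        sysdev_resume_model();
        return error;
    }

    /* ... the sleep state would be entered and left here ... */
    sysdev_resume_model();
    return 0;
}
```

Without the sysdev_resume_model() call on the error path, a failed transition would leave the modelled sysdevs suspended.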
     
  • next_pidmap() just quietly accepted whatever 'last' pid that was passed
    in, which is not all that safe when one of the users is /proc.

    Admittedly the proc code should do some sanity checking on the range
    (and that will be the next commit), but that doesn't mean that the
    helper functions should just do that pidmap pointer arithmetic without
    checking the range of its arguments.

    So clamp 'last' to PID_MAX_LIMIT. The fact that we then do "last+1"
    doesn't really matter, the for-loop does check against the end of the
    pidmap array properly (it's only the actual pointer arithmetic overflow
    case we need to worry about, and going one bit beyond isn't going to
    overflow).

    [ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]

    Reported-by: Tavis Ormandy
    Analyzed-by: Robert Święcki
    Cc: Eric W. Biederman
    Cc: Pavel Emelyanov
    Signed-off-by: Linus Torvalds

    Linus Torvalds
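
A range check of this kind can be sketched against a small bitmap. The limit, the bitmap and next_pid_model() are stand-ins chosen for the example, and rejecting an out-of-range value with -1 is one possible way to enforce the bound, not necessarily what the commit does:

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PID_MAX_LIMIT 4096u

/* One bit per pid; a set bit means the pid is in use. */
unsigned char pid_bitmap[MODEL_PID_MAX_LIMIT / CHAR_BIT];

/* Return the next in-use pid after 'last', or -1 if there is none. */
int next_pid_model(unsigned int last)
{
    unsigned int pid;

    /* Reject out-of-range input before doing any index arithmetic. */
    if (last >= MODEL_PID_MAX_LIMIT)
        return -1;

    for (pid = last + 1; pid < MODEL_PID_MAX_LIMIT; pid++) {
        if (pid_bitmap[pid / CHAR_BIT] & (1u << (pid % CHAR_BIT)))
            return (int)pid;
    }
    return -1;
}
```

As in the commit message, computing last + 1 after the check is harmless: the loop condition still stops at the end of the bitmap.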
     

18 Apr, 2011

1 commit

  • A dynamic posix clock is protected from asynchronous removal by a mutex.
    However, using a mutex has the unwanted effect that a long running clock
    operation in one process will unnecessarily block other processes.

    For example, one process might call read() to get an external time stamp
    coming in at one pulse per second. A second process calling clock_gettime
    would have to wait for almost a whole second.

    This patch fixes the issue by using a reader/writer semaphore instead of
    a mutex.

    Signed-off-by: Richard Cochran
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/%3C20110330132421.GA31771%40riccoc20.at.omicron.at%3E
    Signed-off-by: Thomas Gleixner

    Richard Cochran
     

17 Apr, 2011

2 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: make unplug timer trace event correspond to the schedule() unplug
    block: let io_schedule() flush the plug inline

    Linus Torvalds
     
  • …linus', 'timer-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec()
    perf: Fix a build error with some GCC versions

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Fix erroneous all_pinned logic
    sched: Fix sched-domain avg_load calculation

    * 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    RTC: rtc-mrst: follow on to the change of rtc_device_register()
    RTC: add missing "return 0" in new alarm func for rtc-bfin.c
    RTC: Fix s3c compile error due to missing s3c_rtc_setpie
    RTC: Fix early irqs caused by calling rtc_set_alarm too early

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, amd: Disable GartTlbWlkErr when BIOS forgets it
    x86, NUMA: Fix fakenuma boot failure
    x86/mrst: Fix boot crash caused by incorrect pin to irq mapping
    x86/ce4100: Add reg property to bridges

    Linus Torvalds
     

16 Apr, 2011

2 commits

  • It's a pretty close match to what we had before - the timer triggering
    would mean that nobody unplugged the plug in due time, in the new
    scheme this matches very closely what the schedule() unplug now is.
    It's essentially the difference between an explicit unplug (IO unplug)
    or an implicit unplug (timer unplug, we scheduled with pending IO
    queued).

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Linus correctly observes that the most important dispatch cases
    are now done from kblockd, which isn't ideal for latency reasons.
    The original reason for switching dispatches out-of-line was to
    avoid too deep a stack, so by _only_ letting the "accidental"
    flush directly in schedule() be guarded by offload to kblockd,
    we should be able to get the best of both worlds.

    So add a blk_schedule_flush_plug() that offloads to kblockd,
    and only use that from the schedule() path.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Apr, 2011

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: only force kblockd unplugging from the schedule() path
    block: cleanup the block plug helper functions
    block, blk-sysfs: Use the variable directly instead of a function call
    block: move queue run on unplug to kblockd
    block: kill queue_sync_plugs()
    block: readd plug trace event
    block: add callback function for unplug notification
    block: add comment on why we save and disable interrupts in flush_plug_list()
    block: fixup block IO unplug trace call
    block: remove block_unplug_timer() trace point
    block: splice plug list to local context

    Linus Torvalds