18 Jul, 2007

5 commits

  • Identical implementations of PTRACE_PEEKDATA go into generic_ptrace_peekdata()
    function.

    Signed-off-by: Alexey Dobriyan
    Cc: Christoph Hellwig
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • If the kernel OOPSed or BUGed then it probably should be considered as
    tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
    tainted kernel. This saves a lot of time explaining oddities in the
    calltraces.

    Signed-off-by: Pavel Emelianov
    Acked-by: Randy Dunlap
    Cc:
    Signed-off-by: Andrew Morton
    [ Added parisc patch from Matthew Wilson -Linus ]
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Currently, the freezer treats all tasks as freezable, except for the kernel
    threads that explicitly set the PF_NOFREEZE flag for themselves. This
    approach is problematic, since it requires every kernel thread to either
    set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
    care for the freezing of tasks at all.

    It seems better to only require the kernel threads that want to or need to
    be frozen to use some freezer-related code and to remove any
    freezer-related code from the other (nonfreezable) kernel threads, which is
    done in this patch.

    The patch causes all kernel threads to be nonfreezable by default (i.e. to
    have PF_NOFREEZE set by default) and introduces the set_freezable()
    function that should be called by the freezable kernel threads in order to
    unset PF_NOFREEZE. It also makes all of the currently freezable kernel
    threads call set_freezable(), so it shouldn't cause any (intentional)
    change of behaviour to appear. Additionally, it updates documentation to
    describe the freezing of tasks more accurately.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Cc: Pavel Machek
    Cc: Oleg Nesterov
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
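
Note on the freezer change above: the opt-in model it describes can be pictured with a small user-space toy. This is only a sketch of the idea, not the kernel code: the `toy_*` names, the `struct toy_task` type and the numeric value given to PF_NOFREEZE here are placeholders invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value for illustration only; not taken from kernel headers. */
#define PF_NOFREEZE 0x1u

/* Toy stand-in for a task; the real task_struct is far larger. */
struct toy_task {
    unsigned int flags;
    bool is_kernel_thread;
};

/* New default: kernel threads start out with PF_NOFREEZE set. */
struct toy_task toy_task_create(bool is_kernel_thread)
{
    struct toy_task t = { 0u, is_kernel_thread };

    if (is_kernel_thread)
        t.flags |= PF_NOFREEZE;
    return t;
}

/* Opt in to freezing by clearing PF_NOFREEZE, as set_freezable() is described. */
void toy_set_freezable(struct toy_task *t)
{
    assert(t != NULL);
    t->flags &= ~PF_NOFREEZE;
}

/* A task may be frozen only if it does not carry PF_NOFREEZE. */
bool toy_freezable(const struct toy_task *t)
{
    assert(t != NULL);
    return (t->flags & PF_NOFREEZE) == 0;
}
```

In this toy, a kernel thread that never calls toy_set_freezable() is simply skipped by the freezer, which is the behaviour the commit message asks for.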
     
  • kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
    variant in the past. But with __GFP_ZERO it is possible now to do zeroing
    while allocating.

    Use __GFP_ZERO to remove the explicit clearing of memory via memset wherever
    we can.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
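
Note on the __GFP_ZERO change above: the pattern is "let the allocator zero the memory instead of calling memset afterwards". A user-space analogy, assuming a hypothetical `toy_alloc()` wrapper and a hypothetical `TOY_ZERO` flag that plays the role of __GFP_ZERO, might look like this:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag standing in for __GFP_ZERO. */
#define TOY_ZERO 0x1u

/*
 * Allocate size bytes; when TOY_ZERO is set the allocator clears the
 * memory itself, so callers no longer need their own memset().
 */
void *toy_alloc(size_t size, unsigned int flags)
{
    void *p = malloc(size);

    if (p != NULL && (flags & TOY_ZERO))
        memset(p, 0, size);
    return p;
}
```

A caller that used to write `p = toy_alloc(n, 0); memset(p, 0, n);` can then write `p = toy_alloc(n, TOY_ZERO);` and drop the second statement.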
     
  • Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
    as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it
    can be used to satisfy hugepage allocations even when the system has been
    running a long time. This allows an administrator to resize the hugepage pool
    at runtime depending on the size of ZONE_MOVABLE.

    This patch adds a new sysctl called hugepages_treat_as_movable. When a
    non-zero value is written to it, future allocations for the huge page pool
    will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
    introduce additional external fragmentation of note as huge pages are always
    the largest contiguous block we care about.

    [akpm@linux-foundation.org: various fixes]
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

17 Jul, 2007

35 commits

  • All of the clockevent notifiers expect a pointer to
    an "unsigned int" cpu argument, but hrtimer_cpu_notify()
    passes in a pointer to a long.

    [ Discussed with and ok by Thomas Gleixner ]

    Signed-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    David Miller
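
Note on the notifier argument mismatch above: the danger of handing a pointer to a long to code that reads an unsigned int can be shown in user space. The function names below are hypothetical stand-ins, not the kernel's notifier API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for a callback that expects its argument to point at an
 * unsigned int CPU number: it reads exactly sizeof(unsigned int) bytes.
 */
unsigned int notifier_read_cpu(const void *arg)
{
    unsigned int cpu;

    assert(arg != NULL);
    memcpy(&cpu, arg, sizeof(cpu));
    return cpu;
}

/* Mismatched caller: passes the address of a long. */
unsigned int pass_long(long cpu)
{
    return notifier_read_cpu(&cpu);
}

/* Matching caller: passes the address of an unsigned int. */
unsigned int pass_uint(unsigned int cpu)
{
    return notifier_read_cpu(&cpu);
}
```

Where long is 64 bits wide, pass_long(3) reads only the first four bytes of the value: on a little-endian machine that happens to be 3, so the bug stays hidden, while on a big-endian machine it is the zero high half.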
     
  • Randy Dunlap noticed that the recent comment clarifications from Andrew
    had somehow gotten duplicated. Quoth Andrew: "hm, that could have been
    some late-night reject-fixing."

    Fix it up.

    Cc: From: Andrew Morton
    Cc: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
    [PATCH] sched: fix up fs/proc/array.c whitespace problems
    [PATCH] sched: prettify prio_to_wmult[]
    [PATCH] sched: document prio_to_wmult[]
    [PATCH] sched: improve weight-array comments
    [PATCH] sched: remove dead code from task_stime()

    Fixed up trivial conflict in fs/proc/array.c

    Linus Torvalds
     
  • kernel/printk.c: document possible deadlock against scheduler

    The comment on printk() states that it can be called from any context,
    which might give the false impression that it can be called from anywhere
    without any restrictions.

    This is, however, not true: a call to printk() from scheduler code (namely
    from schedule(), wake_up(), etc.) could deadlock on the runqueue lock when
    it tries to wake up klogd. Document this.

    Signed-off-by: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Kosina
     
  • Now that we always use stop_machine for module insertion or deletion, we no
    longer need the modlist_lock: merely disabling preemption is sufficient to
    block against list manipulation. This avoids deadlock on OOPSen where we
    can potentially grab the lock twice.

    Bug: 8695
    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar
    Cc: Tobias Oed
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Change cancel_work_sync() and cancel_delayed_work_sync() to return a boolean
    indicating whether the work was actually cancelled. A zero return value means
    that the work was not pending/queued.

    Without that kind of change it is not possible to avoid flush_workqueue()
    sometimes, see the next patch as an example.

    Also, this patch unifies both functions and kills the (unlikely) busy-wait
    loop.

    Signed-off-by: Oleg Nesterov
    Acked-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Imho, the current naming of cancel_xxx workqueue functions is very confusing.

    cancel_delayed_work()
    cancel_rearming_delayed_work()
    cancel_rearming_delayed_workqueue() // obsolete

    cancel_work_sync()

    This looks as if the first 2 functions differ in the "type" of their
    argument, which is no longer true; nowadays the difference is the behaviour.

    The semantics of cancel_rearming_delayed_work(dwork) were changed
    significantly: it no longer requires that dwork rearms itself, and it
    cancels dwork synchronously.

    Rename it to cancel_delayed_work_sync(). This matches cancel_delayed_work()
    and cancel_work_sync(). Re-create cancel_rearming_delayed_work() as a simple
    inline obsolete wrapper, like cancel_rearming_delayed_workqueue().

    Signed-off-by: Oleg Nesterov
    Acked-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Replace (n & (n-1)) with is_power_of_2()

    Signed-off-by: vignesh babu
    Acked-by: Stelian Pop
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vignesh babu
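
Note on the is_power_of_2() conversion above: the helper is not just a spelling change, because the open-coded bit trick also accepts zero. The sketch below is a user-space rendition; the helper body follows the usual definition and should be checked against the kernel headers before relying on it, and `open_coded_pow2()` is a name made up for the comparison.

```c
#include <stdbool.h>

/* Open-coded test: true for powers of two, but also true for n == 0. */
bool open_coded_pow2(unsigned long n)
{
    return (n & (n - 1)) == 0;
}

/* Helper-style test: same bit trick, with zero rejected explicitly. */
bool is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

Call sites that relied on zero passing the open-coded check would change behaviour after such a conversion, so each one is worth a second look.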
     
  • This follows a suggestion from Chuck Ebbert on how to make seccomp
    absolutely zero-cost in schedule too. The only remaining footprint of
    seccomp is the bzImage size, which becomes a few bytes (perhaps even a few
    kbytes) larger; measure it if you care about embedded systems.

    Signed-off-by: Andrea Arcangeli
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • This reduces the memory footprint and it enforces that only the current
    task can enable seccomp on itself (this is a requirement for a
    straightforward [modulo preempt ;) ] TIF_NOTSC implementation).

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Remove pointless `else'.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Hopefully this will help people to understand the new regime.

    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The recent PRIVATE and REQUEUE_PI changes to the futex code made it hard to
    read. Tidy it up.

    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Improve performance of sys_time(). sys_time() returns time in seconds, but
    it does so by calling do_gettimeofday() and then returning the tv_sec
    portion of the GTOD time. But the data structure "xtime", which is updated
    by every timer/scheduler tick, already offers HZ granularity time.

    The patch improves the sysbench OLTP macrobenchmark significantly:

    2.6.22-rc6:

    #threads
    1: transactions: 3733 (373.21 per sec.)
    2: transactions: 6676 (667.46 per sec.)
    3: transactions: 6957 (695.50 per sec.)
    4: transactions: 7055 (705.48 per sec.)
    5: transactions: 6596 (659.33 per sec.)

    2.6.22-rc6 + sys_time.patch:

    1: transactions: 4005 (400.47 per sec.)
    2: transactions: 7379 (737.77 per sec.)
    3: transactions: 7347 (734.49 per sec.)
    4: transactions: 7468 (746.65 per sec.)
    5: transactions: 7428 (742.47 per sec.)

    Mixed API uses of gettimeofday() and time() are guaranteed to be coherent
    via the use of an at-most-once-per-second slowpath that updates xtime.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Ingo Molnar
    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • While working on unshare support for the network namespace I noticed we
    were putting clone flags in an int, which is weird because the syscall
    uses unsigned long and we need at least an unsigned to properly hold all of
    the unshare flags.

    So to make the code consistent, this patch updates the code to use
    unsigned long instead of int for the clone flags in those places
    where we get it wrong today.

    Signed-off-by: Eric W. Biederman
    Acked-by: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The OpenVZ Linux kernel team has discovered a problem with 32-bit quota
    tools working on 64-bit architectures. In the 2.6.10 kernel the
    sys32_quotactl() function was replaced by sys_quotactl() with the comment
    "sys_quotactl seems to be 32/64bit clean, enable it for 32bit". However,
    this isn't right. Look at the if_dqblk structure:

    struct if_dqblk {
            __u64 dqb_bhardlimit;
            __u64 dqb_bsoftlimit;
            __u64 dqb_curspace;
            __u64 dqb_ihardlimit;
            __u64 dqb_isoftlimit;
            __u64 dqb_curinodes;
            __u64 dqb_btime;
            __u64 dqb_itime;
            __u32 dqb_valid;
    };

    For 32-bit quota tools sizeof(if_dqblk) == 0x44.
    But for a 64-bit kernel its size is 0x48, because of alignment!
    Thus we have a problem. The attached patch reintroduces the
    sys32_quotactl() function, which handles this and related situations.

    [michal.k.k.piotrowski@gmail.com: build fix]
    [akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
    Signed-off-by: Vasily Tarasov
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Jan Kara
    Cc:
    Signed-off-by: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Tarasov
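
Note on the if_dqblk size mismatch above: the 0x44 versus 0x48 figures can be reproduced in user space. The sketch assumes GCC or Clang (it uses the `aligned` attribute to lower the alignment of a typedef) and a 64-bit host; `u64_align4` and both struct names are invented for the illustration and are not the types used by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as a 64-bit kernel sees it: 64-bit fields are 8-byte aligned. */
struct dqblk_native {
    uint64_t dqb_bhardlimit;
    uint64_t dqb_bsoftlimit;
    uint64_t dqb_curspace;
    uint64_t dqb_ihardlimit;
    uint64_t dqb_isoftlimit;
    uint64_t dqb_curinodes;
    uint64_t dqb_btime;
    uint64_t dqb_itime;
    uint32_t dqb_valid;
};

/* Hypothetical helper: a 64-bit integer with i386-style 4-byte alignment. */
typedef uint64_t u64_align4 __attribute__((aligned(4)));

/* Layout as 32-bit i386 tools see it: no padding after dqb_valid. */
struct dqblk_32bit {
    u64_align4 dqb_bhardlimit;
    u64_align4 dqb_bsoftlimit;
    u64_align4 dqb_curspace;
    u64_align4 dqb_ihardlimit;
    u64_align4 dqb_isoftlimit;
    u64_align4 dqb_curinodes;
    u64_align4 dqb_btime;
    u64_align4 dqb_itime;
    uint32_t   dqb_valid;
};

/* Eight 8-byte fields plus one 4-byte field, with 4-byte struct alignment. */
_Static_assert(sizeof(struct dqblk_32bit) == 0x44, "32-bit layout is 0x44 bytes");

/* Bytes of tail padding the native layout adds on this host. */
size_t dqblk_tail_padding(void)
{
    return sizeof(struct dqblk_native) - sizeof(struct dqblk_32bit);
}
```

On a typical 64-bit host dqblk_tail_padding() reports the four bytes of padding that a 32-bit caller never allocated, which is why a compat entry point has to translate between the two layouts.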
     
  • Fix parameter name in audit_core_dumps for kerneldoc.

    Signed-off-by: Henrik Kretzschmar
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Kretzschmar
     
  • It should improve performance in some scenarios where a lot of
    these nsproxy objects are created by unsharing namespaces. This is
    a typical use of virtual servers that are being created or entered.

    This is also a good tool to find leaks and gather statistics on
    namespace usage.

    Signed-off-by: Cedric Le Goater
    Cc: Herbert Poetzl
    Cc: Pavel Emelianov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong,
    create_new_namespaces() uses ERR_PTR() to catch an error. This means that the
    subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or
    copy_utsname().

    Modify create_new_namespaces() to also use the errors returned by the
    copy_*_ns routines and not to systematically return ENOMEM.

    [oleg@tv-sign.ru: better changelog]
    Signed-off-by: Cedric Le Goater
    Cc: Serge E. Hallyn
    Cc: Badari Pulavarty
    Cc: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Eric W. Biederman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
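
Note on the NULL versus ERR_PTR() mismatch above: the bug class is easy to reproduce with user-space stand-ins for the kernel's error-pointer helpers. The lowercase helpers below imitate ERR_PTR(), IS_ERR() and PTR_ERR(); the MAX_ERRNO bound and the `toy_*` functions are assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bound: the top MAX_ERRNO addresses are treated as error codes. */
#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True for pointers in the reserved error range; NULL is not one of them. */
bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical callee in the old style: reports failure as NULL. */
void *toy_dup_ns_old(void)
{
    return NULL;
}

/* Hypothetical callee in the fixed style: reports failure as an error pointer. */
void *toy_dup_ns_new(void)
{
    return err_ptr(-ENOMEM);
}

/* Caller that checks with is_err() only; 0 means "looks like success". */
long toy_check(const void *ns)
{
    return is_err(ns) ? ptr_err(ns) : 0;
}
```

toy_check(toy_dup_ns_old()) returns 0, so the failure slips through and the NULL is used later; toy_check(toy_dup_ns_new()) returns -ENOMEM, and the caller can pass that code on instead of inventing its own.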
     
  • This patch enables the unshare of user namespaces.

    It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
    resets the current user_struct and adds a new root user (uid == 0).

    For now, unsharing the user namespace allows a process to reset its
    user_struct accounting, and uid 0 in the new user namespace should be
    contained using appropriate means, for instance SELinux.

    The plan, when the full support is complete (all uid checks covered), is to
    keep the original user's rights in the original namespace, and let a process
    become uid 0 in the new namespace, with full capabilities to the new
    namespace.

    Signed-off-by: Serge E. Hallyn
    Signed-off-by: Cedric Le Goater
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Basically, it will allow a process to unshare its user_struct table,
    resetting at the same time its own user_struct and all the associated
    accounting.

    A new root user (uid == 0) is added to the user namespace upon creation.
    Such root users have full privileges and it seems that these privileges
    should be controlled through some means (process capabilities ?)

    The unshare is not included in this patch.

    Changes since [try #4]:
    - Updated get_user_ns and put_user_ns to accept NULL, and
    get_user_ns to return the namespace.

    Changes since [try #3]:
    - moved struct user_namespace to files user_namespace.{c,h}

    Changes since [try #2]:
    - removed struct user_namespace* argument from find_user()

    Changes since [try #1]:
    - removed struct user_namespace* argument from find_user()
    - added a root_user per user namespace

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
    deactivate the unshare of the uts and ipc namespaces and do not improve
    performance.

    Signed-off-by: Cedric Le Goater
    Acked-by: "Serge E. Hallyn"
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Add TTY input auditing, used to audit system administrator's actions. This is
    required by various security standards such as DCID 6/3 and PCI to provide
    non-repudiation of administrator's actions and to allow a review of past
    actions if the administrator seems to overstep their duties or if the system
    becomes misconfigured for unknown reasons. These requirements do not make it
    necessary to audit TTY output as well.

    Compared to a user-space keylogger, this approach records TTY input using the
    audit subsystem, correlated with other audit events, and it is completely
    transparent to the user-space application (e.g. the console ioctls still
    work).

    TTY input auditing works on a higher level than auditing all system calls
    within the session, which would produce an overwhelming amount of mostly
    useless audit events.

    Add an "audit_tty" attribute, inherited across fork(). Data read from TTYs
    by a process with the attribute is sent to the audit subsystem by the kernel.
    The audit netlink interface is extended to allow modifying the audit_tty
    attribute, and to allow sending explanatory audit events from user-space (for
    example, a shell might send an event containing the final command, after the
    interactive command-line editing and history expansion is performed, which
    might be difficult to decipher from the TTY input alone).

    Because the "audit_tty" attribute is inherited across fork(), it would be set
    e.g. for sshd restarted within an audited session. To prevent this, the
    audit_tty attribute is cleared when a process with no open TTY file
    descriptors (e.g. after daemon startup) opens a TTY.

    See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
    more detailed rationale document for an older version of this patch.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Miloslav Trmac
    Cc: Al Viro
    Cc: Alan Cox
    Cc: Paul Fulghum
    Cc: Casey Schaufler
    Cc: Steve Grubb
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miloslav Trmac
     
  • Currently we handle spurious IRQ activity based upon seeing a lot of
    invalid interrupts, and we clear things back on the basis of lots of valid
    interrupts.

    Unfortunately in some cases you get legitimate invalid interrupts caused by
    timing asynchronicity between the PCI bus and the APIC bus when disabling
    interrupts and pulling other tricks. In this case, although the spurious
    IRQs are not a problem, our unhandled counters never clear and they act as
    a slow-running time bomb. (This is effectively what showed up as the serial
    port/tty problem that was fixed by clearing counters when registering a
    handler.)

    It's easy enough to add a second parameter - time. This means that if we
    see a regular stream of harmless spurious interrupts which are not harming
    processing we don't go off and do something stupid like disable the IRQ
    after a month of running. OTOH lockups and performance killers show up a
    lot more often than 10/second.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Alan Cox
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
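
Note on the spurious IRQ change above: adding time as a second parameter means an unhandled interrupt only extends a streak if the previous unhandled one was recent. A minimal user-space sketch of that idea follows; the struct, the function name, the 100 ms window (matching the "10/second" remark) and the streak limit are all assumptions chosen for illustration, not values taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW_MS     100u   /* assumed: gaps longer than this restart the count */
#define STREAK_LIMIT  1000u  /* assumed: streak length that counts as "stuck" */

struct irq_stats {
    unsigned long last_unhandled_ms;  /* time of the previous unhandled IRQ */
    unsigned int  unhandled;          /* length of the current streak */
};

/*
 * Record one unhandled interrupt at time now_ms.
 * Returns true when the streak is long enough to treat the IRQ as stuck.
 */
bool note_unhandled(struct irq_stats *s, unsigned long now_ms)
{
    assert(s != NULL);

    if (now_ms - s->last_unhandled_ms > WINDOW_MS)
        s->unhandled = 1;   /* previous one was long ago: start a new streak */
    else
        s->unhandled++;     /* still inside the window: extend the streak */

    s->last_unhandled_ms = now_ms;
    return s->unhandled >= STREAK_LIMIT;
}
```

With this rule, one harmless spurious interrupt per second never accumulates, however long the machine runs, while a storm of them trips the limit within about a second.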
     
  • Make available to the user the following task and process performance
    statistics:

    * Involuntary Context Switches (task_struct->nivcsw)
    * Voluntary Context Switches (task_struct->nvcsw)

    Statistics information is available from:
    1. taskstats interface (Documentation/accounting/)
    2. /proc/PID/status (task only).

    This data is useful for detecting hyperactivity patterns between processes.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Maxim Uvarov
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Jonathan Lim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maxim Uvarov
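
Note on the context switch statistics above: a tool can pick the two counters out of the text of /proc/PID/status. The sketch below parses a buffer that the caller has already read; the field names `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` are the ones current kernels print, which is an assumption about this patch since its output format is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Find a line starting with "<key>:" and parse the number that follows it. */
static bool find_field(const char *text, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", out) == 1;
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return false;
}

/* Extract both counters; returns false if either field is missing. */
bool parse_ctxt_switches(const char *status,
                         unsigned long *vol, unsigned long *invol)
{
    assert(status != NULL && vol != NULL && invol != NULL);
    return find_field(status, "voluntary_ctxt_switches", vol) &&
           find_field(status, "nonvoluntary_ctxt_switches", invol);
}
```

A caller would typically read /proc/self/status (or another PID's file) into a NUL-terminated buffer with fopen() and fread() and pass that buffer in.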
     
  • I forgot to remove capability.h from mm.h while removing sched.h! This
    patch remedies that, because the only inline function which was using
    CAP_something was made out of line.

    Cross-compile tested without regressions on:

    all powerpc defconfigs
    all mips defconfigs
    all m68k defconfigs
    all arm defconfigs
    all ia64 defconfigs

    alpha alpha-allnoconfig alpha-defconfig alpha-up
    arm
    i386 i386-allnoconfig i386-defconfig i386-up
    ia64 ia64-allnoconfig ia64-defconfig ia64-up
    m68k
    mips
    parisc parisc-allnoconfig parisc-defconfig parisc-up
    powerpc powerpc-up
    s390 s390-allnoconfig s390-defconfig s390-up
    sparc sparc-allnoconfig sparc-defconfig sparc-up
    sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up
    um-x86_64
    x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Add a flag in /proc/timer_stats to indicate deferrable timers. This will
    let developers/users differentiate between types of timers in
    /proc/timer_stats.

    Deferrable timer and normal timer will appear in /proc/timer_stats as below.
    10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)

    Also, the version of timer_stats changes from v0.1 to v0.2.

    Signed-off-by: Venkatesh Pallipadi
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     
  • Allow printk_time to be enabled or disabled at boot time. Previously it
    could be enabled only, but not disabled.

    Change printk_time from an int to a bool since that's what it is. Make its
    logical (exposed) name just be "time" (was "printk_time").

    Note: Changes kernel boot option syntax from "time" to "printk.time=value".

    Since printk_time is declared as a module_param, it can also be
    changed at run-time by modifying
    /sys/module/printk/parameters/time
    to a value of 1/Y/y to enable it or 0/N/n to disable it.

    Since printk_time is declared as a module_param, its value can also
    be set at boot-time by using
    linux printk.time=

    If the "time" boot option is used, print a message that it is deprecated
    and will be removed.

    Note its planned removal in feature-removal-schedule.txt.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
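
Note on the printk.time values above: the accepted spellings (1/Y/y to enable, 0/N/n to disable) amount to a tiny boolean parser. The function below is a user-space sketch with a made-up name, not the kernel's parameter code; accepting one trailing newline is an assumption added so that `echo`-style input works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Parse a single-character boolean: 1/Y/y mean true, 0/N/n mean false.
 * Returns false, leaving *out untouched, for anything else.
 */
bool parse_bool_param(const char *val, bool *out)
{
    assert(val != NULL && out != NULL);

    if (val[0] == '\0')
        return false;
    /* Allow exactly one character, optionally followed by a newline. */
    if (val[1] != '\0' && !(val[1] == '\n' && val[2] == '\0'))
        return false;

    switch (val[0]) {
    case '1': case 'Y': case 'y':
        *out = true;
        return true;
    case '0': case 'N': case 'n':
        *out = false;
        return true;
    default:
        return false;
    }
}
```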
     
  • Not called by anything in tree.

    Signed-off-by: Andi Kleen
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • cpuset.c:update_nodemask() uses a write_lock_irq() on tasklist_lock to
    block concurrent forks; a read_lock() suffices and is less intrusive.

    Signed-off-by: Paul Menage
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Add the print-fatal-signals=1 boot option and the
    /proc/sys/kernel/print-fatal-signals runtime switch.

    This feature prints some minimal information about userspace segfaults to
    the kernel console. This is useful to find early bootup bugs where
    userspace debugging is very hard.

    Defaults to off.

    [akpm@linux-foundation.org: Don't add new sysctl numbers]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Here there is no need even to alter the .show callback. The original code
    passes list_head in *v.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Fix ksoftirqd termination on cpu hotplug with naughty real time process.

    Assume the following case:

    - Try to hot remove CPU2 from CPU1.
    - There is a real time process on CPU2, and that process doesn't sleep at all.
    - That rt process and ksoftirqd/2 are migrated to CPU0.

    Then ksoftirqd/2 can't stop because that rt process runs forever on CPU0,
    and CPU1, waiting for ksoftirqd/2's termination, hangs. To fix this
    problem, set the priority of ksoftirqd/2 to the maximum before
    kthread_stop().

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Satoru Takeuchi
    Cc: Rusty Russell
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Ashok Raj
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Satoru Takeuchi
     
  • stop_machine_run() does its work on the "kstopmachine" thread, which has
    max priority. However, that thread gets that priority only after it is
    woken up. Therefore, in the following case ...

    - "kstopmachine" tries to run on CPU1

    - There is a real time process on CPU1 which doesn't relinquish CPU time
    voluntarily

    ... "kstopmachine" can't start to run and the CPU on which
    stop_machine_run() is running hangs. To fix this problem, call
    sched_setscheduler() before waking up that thread.

    Signed-off-by: Satoru Takeuchi
    Cc: Rusty Russell
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Ashok Raj
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Satoru Takeuchi
     
  • Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
    during suspend. This may cause confusion so I restore the old behaviour by
    using the boot based time instead of monotonic for uptime.

    Signed-off-by: Tomas Janousek
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomas Janousek