18 Mar, 2016

40 commits

  • The new helper returns the index of the matching string in an array.
    Use it here.

    Signed-off-by: Andy Shevchenko
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The new helper returns the index of the matching string in an array.
    Use it here.

    Signed-off-by: Andy Shevchenko
    Acked-by: Tejun Heo
    Cc: Bartlomiej Zolnierkiewicz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The new helper returns the index of the matching string in an array.
    Use it here.

    Signed-off-by: Andy Shevchenko
    Acked-by: Linus Walleij
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The new helper returns the index of the matching string in an array.
    Use it here.

    Signed-off-by: Andy Shevchenko
    Cc: Sebastian Reichel
    Cc: Dmitry Eremin-Solenikov
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The new helper returns the index of the matching string in an array.
    Use it here.

    Signed-off-by: Andy Shevchenko
    Cc: David Airlie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The new helper returns the index of the matching string in an array.
    Use it here.

    Signed-off-by: Andy Shevchenko
    Acked-by: Linus Walleij
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The new helper returns the index of the matching string in an array.
    Use it here.

    Signed-off-by: Andy Shevchenko
    Reviewed-by: Mika Westerberg
    Acked-by: Rafael J. Wysocki
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Occasionally we have to search for an occurrence of a string in an array
    of strings. Make a simple helper for that purpose.

    Signed-off-by: Andy Shevchenko
    Cc: "David S. Miller"
    Cc: Bartlomiej Zolnierkiewicz
    Cc: David Airlie
    Cc: David Woodhouse
    Cc: Dmitry Eremin-Solenikov
    Cc: Greg Kroah-Hartman
    Cc: Heikki Krogerus
    Cc: Linus Walleij
    Cc: Mika Westerberg
    Cc: Rafael J. Wysocki
    Cc: Sebastian Reichel
    Cc: Tejun Heo
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
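The changelog gives the helper's shape (index of a matching string in an array). A minimal userspace sketch of those semantics, treating the details as illustrative (the kernel version lives in lib/string.c and returns -EINVAL rather than -1 on no match):

```c
#include <stddef.h>
#include <string.h>

/* Userspace sketch of the new helper's semantics: linear scan of a
 * string array, returning the index of the first match.  A NULL entry
 * terminates the scan early; no match yields -1 (the kernel helper
 * returns -EINVAL instead). */
static int match_string(const char * const *array, size_t n, const char *string)
{
	size_t index;

	for (index = 0; index < n; index++) {
		if (!array[index])
			break;
		if (!strcmp(array[index], string))
			return (int)index;
	}
	return -1;
}
```

Callers that previously open-coded the loop, as in the drivers converted above, can then collapse it into a single call such as `match_string(modes, ARRAY_SIZE(modes), "auto")`.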
     
  • Without this fix, the test crashes inside tagged iteration.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • After calling radix_tree_iter_retry(), 'slot' will be set to NULL. This
    can cause radix_tree_next_slot() to dereference the NULL pointer. Add
    Konstantin Khlebnikov's test to the regression framework.

    Signed-off-by: Matthew Wilcox
    Reported-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • shmem likes to occasionally drop the lock, schedule, then reacquire the
    lock and continue with the iteration from the last place it left off.
    This is currently done with a pretty ugly goto. Introduce
    radix_tree_iter_next() and use it throughout shmem.c.

    [koct9i@gmail.com: fix bug in radix_tree_iter_next() for tagged iteration]
    Signed-off-by: Matthew Wilcox
    Cc: Hugh Dickins
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Instead of a 'goto restart', we can now use radix_tree_iter_retry() to
    restart from our current position. This will make a difference when
    there are more ways to happen across an indirect pointer. And it
    eliminates some confusing gotos.

    [vbabka@suse.cz: remove now-obsolete-and-misleading comment]
    Signed-off-by: Matthew Wilcox
    Cc: Hugh Dickins
    Cc: Konstantin Khlebnikov
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
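The retry helper described above is conceptually tiny: rewind the iterator so the next loop step revisits the current index, and hand the caller a NULL slot. A userspace analogue of just that mechanism (the struct and names are stand-ins, not the kernel types):

```c
#include <stddef.h>

/* Stand-in for struct radix_tree_iter: only the two fields the retry
 * idea touches. */
struct iter {
	unsigned long index;      /* index of the current slot */
	unsigned long next_index; /* where the loop will look next */
};

/* Analogue of radix_tree_iter_retry(): make the next iteration
 * re-examine the current index, and return a NULL slot so the loop
 * body falls through instead of jumping to a restart label. */
static void **iter_retry(struct iter *it)
{
	it->next_index = it->index;
	return NULL;
}
```

The win over `goto restart` is that the iteration resumes from the current position rather than from the beginning.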
     
  • Even though this is a 'can't happen' situation, use the new
    radix_tree_iter_retry() pattern to eliminate a goto.

    [akpm@linux-foundation.org: fix btrfs build]
    Signed-off-by: Matthew Wilcox
    Cc: Hugh Dickins
    Cc: Konstantin Khlebnikov
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: David Sterba
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This is debug code which is #if 0'd out.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • With huge pages, it is convenient to have the radix tree be able to
    return an entry that covers multiple indices. Previous attempts to deal
    with the problem have involved inserting N duplicate entries, which is a
    waste of memory and leads to problems trying to handle aliased tags, or
    probing the tree multiple times to find alternative entries which might
    cover the requested index.

    This approach inserts one canonical entry into the tree for a given
    range of indices, and may also insert other entries in order to ensure
    that lookups find the canonical entry.

    This solution only tolerates inserting powers of two that are greater
    than the fanout of the tree. If we wish to expand the radix tree's
    abilities to support large-ish pages that are smaller than the fanout at the
    penultimate level of the tree, then we would need to add one more step
    in lookup to ensure that any sibling nodes in the final level of the
    tree are dereferenced and we return the canonical entry that they
    reference.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • When we introduce entries that can cover multiple indices, we will need
    to stop in __radix_tree_create based on the shift, not the height.
    Split out for ease of bisect.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Set the 'indirect_ptr' bit on all the pointers to internal nodes, not
    just on the root node. This enables the following patches to support
    multi-order entries in the radix tree. This patch is split out for ease
    of bisection.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This code is mostly from Andrew Morton and Nick Piggin; tarball downloaded
    from http://ozlabs.org/~akpm/rtth.tar.gz with sha1sum
    0ce679db9ec047296b5d1ff7a1dfaa03a7bef1bd

    Some small modifications were necessary to the test harness to fix the
    build with the current Linux source code.

    I also made minor modifications to automatically test the radix-tree.c
    and radix-tree.h files that are in the current source tree, as opposed
    to a copied and slightly modified version. I am sure more could be done
    to tidy up the harness, as well as adding more tests.

    [koct9i@gmail.com: fix compilation]
    Signed-off-by: Matthew Wilcox
    Cc: Shuah Khan
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The radix-tree header uses the __ffs() function, which is defined in
    bitops.h. The current kernel headers implicitly include bitops.h, but
    the userspace test harness does not.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Christian Borntraeger reported that panic_on_warn doesn't have any
    effect on s390.

    The panic_on_warn feature was introduced with 9e3961a09798 ("kernel: add
    panic_on_warn"). However, it only took effect when
    WANT_WARN_ON_SLOWPATH is defined, which in turn is only the case for
    architectures that do not have their own __WARN_TAINT defined.

    Architectures that do have __WARN_TAINT defined handle warnings via
    report_bug() in lib/bug.c, which did not call panic() when
    panic_on_warn is set.

    Let's simply enable the panic_on_warn feature there by adding the same
    code that was added to warn_slowpath_common() in panic.c.

    This enables panic_on_warn also for arm64, parisc, powerpc, s390 and sh.

    Signed-off-by: Heiko Carstens
    Reported-by: Christian Borntraeger
    Tested-by: Christian Borntraeger
    Acked-by: Prarit Bhargava
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Tested-by: Michael Ellerman (powerpc)
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
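The fix boils down to mirroring warn_slowpath_common()'s check inside the report_bug() warning path. A userspace sketch of that check, with panic() stubbed out so the flow is visible (the flag-clearing order is the interesting part):

```c
#include <stdio.h>

static int panic_on_warn = 1;   /* stand-in for the sysctl flag */
static int panicked;

static void panic(const char *msg)  /* stub: the real one never returns */
{
	panicked = 1;
	printf("panic: %s\n", msg);
}

/* Sketch of the warning branch added to report_bug(): the same code as
 * warn_slowpath_common() in panic.c. */
static void report_bug_warning(void)
{
	if (panic_on_warn) {
		/* Clear the flag first so warnings emitted while
		 * panicking cannot re-enter panic(). */
		panic_on_warn = 0;
		panic("panic_on_warn set");
	}
}
```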
     
  • hlist_bl_unhashed() and hlist_bl_empty() are both boolean functions, so
    return bool instead of int.

    Signed-off-by: Chen Gang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • Benjamin Romer is no longer a maintainer for the Unisys s-Par driver,
    presently in drivers/staging/unisys/.

    Signed-off-by: David Kershner
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Kershner
     
  • This allows us to extract from the vmcore only the messages emitted
    since the last time the ring buffer was cleared. We just have to make
    sure its value is always up-to-date, when old messages are discarded to
    free space in log_make_free_space() for example.

    Signed-off-by: Zeyu Zhao
    Signed-off-by: Ivan Delalande
    Cc: Kay Sievers
    Cc: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ivan Delalande
     
  • have_callable_console() must also test the CON_ENABLED bit, not just
    CON_ANYTIME. We may have a disabled CON_ANYTIME console, in which case
    printk can wrongly assume that it's safe to call_console_drivers().

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
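The one-line nature of the fix can be sketched in isolation: a console is only safely callable early if it is both enabled and flagged CON_ANYTIME. The flag values and struct below are illustrative stand-ins, not the kernel's definitions:

```c
/* Illustrative flag bits; the kernel defines its own CON_* values. */
#define CON_ENABLED  0x01
#define CON_ANYTIME  0x10

struct console { int flags; };

/* Fixed check: testing CON_ANYTIME alone would wrongly treat a
 * disabled console as callable. */
static int callable(const struct console *con)
{
	return (con->flags & CON_ENABLED) && (con->flags & CON_ANYTIME);
}
```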
     
  • console_unlock() may cond_resched() if its caller has set
    `console_may_schedule' to 1, since 8d91f8b15361 ("printk: do
    cond_resched() between lines while outputting to consoles").

    The rules are:
    -- console_lock() always sets `console_may_schedule' to 1
    -- console_trylock() always sets `console_may_schedule' to 0

    However, console_trylock() callers (among them is printk()) do not
    always execute in atomic contexts, and some of them can cond_resched()
    in console_unlock(), so console_trylock() can set
    `console_may_schedule' to 1 for such processes.

    For !CONFIG_PREEMPT_COUNT kernels, however, console_trylock() always
    sets `console_may_schedule' to 0.

    It's possible to drop the explicit preempt_disable()/preempt_enable()
    in vprintk_emit(), because console_unlock() and console_trylock() are
    now smart enough:
    a) console_unlock() does not cond_resched() when it's unsafe
    (console_trylock() takes care of that)
    b) console_unlock() does the can_use_console() check.

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • console_unlock() may cond_resched() if its caller has set
    `console_may_schedule' to 1 (this functionality has been present since
    8d91f8b15361 ("printk: do cond_resched() between lines while outputting
    to consoles")).

    The rules are:
    -- console_lock() always sets `console_may_schedule' to 1
    -- console_trylock() always sets `console_may_schedule' to 0

    printk() calls console_unlock() with preemption disabled, which
    basically can lead to RCU stalls, watchdog soft lockups, etc. if
    something calls printk() frequently enough (IOW, the console_sem owner
    always has new data to send to the console drivers and can't leave
    console_unlock() for a long time).

    printk()->console_trylock() callers do not necessarily execute in atomic
    contexts, and some of them can cond_resched() in console_unlock().
    console_trylock() can set `console_may_schedule' to 1 (allow
    cond_resched() later in console_unlock()) when it's safe.

    This patch (of 3):

    vprintk_emit() disables preemption around console_trylock_for_printk()
    and console_unlock() calls for a strong reason -- can_use_console()
    check. The thing is that vprintk_emit() can be called on a CPU that is
    not fully brought up yet (!cpu_online()), which can potentially cause
    problems if a console driver wants to access per-cpu data. A console
    driver can explicitly state that it's safe to call it from a !online
    CPU by setting the CON_ANYTIME bit in the console's ->flags. That's why
    for !cpu_online() can_use_console() iterates over all consoles to find
    out if there is a CON_ANYTIME console; otherwise console_unlock() must
    be avoided.

    can_use_console() ensures that console_unlock() call is safe in
    vprintk_emit() only; console_lock() and console_trylock() are not
    covered by this check. Even though call_console_drivers(), invoked from
    console_cont_flush() and console_unlock(), tests `!cpu_online() &&
    CON_ANYTIME' for_each_console(), it may be too late, which can result
    in message loss.

    Assume that we have 2 CPUs -- CPU0 is online, CPU1 is !online -- and
    no CON_ANYTIME consoles are available.

    CPU0 (online)                      CPU1 (!online)
                                       console_trylock()
                                       ...
                                       console_unlock()
                                         console_cont_flush
                                           spin_lock logbuf_lock
                                           if (!cont.len) {
                                             spin_unlock logbuf_lock
                                             return
                                           }
                                         for (;;) {
    vprintk_emit
      spin_lock logbuf_lock
      log_store
      spin_unlock logbuf_lock
                                           spin_lock logbuf_lock
      !console_trylock_for_printk          msg_print_text
      return                               console_idx = log_next()
                                           console_seq++
                                           console_prev = msg->flags
                                           spin_unlock logbuf_lock

                                           call_console_drivers()
                                             for_each_console(con) {
                                               if (!cpu_online() &&
                                                   !(con->flags & CON_ANYTIME))
                                                 continue;
                                             }
                                           /*
                                            * no message printed, we lost it
                                            */
    vprintk_emit
      spin_lock logbuf_lock
      log_store
      spin_unlock logbuf_lock
      !console_trylock_for_printk
      return
                                           /*
                                            * go to the beginning of the loop,
                                            * find out there are new messages,
                                            * lose it
                                            */
                                         }

    A console_trylock()/console_lock() call on CPU1 may come from CPU
    notifiers registered on that CPU. Since notifiers do not get
    unregistered when a CPU goes DOWN, all of them receive notifications
    during CPU UP. For example, on my x86_64 I see around 50 notifications
    sent from the offline CPU to itself

    [swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
    [swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify

    while doing
    echo 0 > /sys/devices/system/cpu/cpu2/online
    echo 1 > /sys/devices/system/cpu/cpu2/online

    So grabbing the console_sem lock while CPU is !online is possible,
    in theory.

    This patch moves can_use_console() check out of
    console_trylock_for_printk(). Instead it calls it in console_unlock(),
    so now console_lock()/console_unlock() are also 'protected' by
    can_use_console(). This also means that console_trylock_for_printk() is
    not really needed anymore and can be removed.

    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Kyle McMartin
    Cc: Dave Jones
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • The v850 port was removed by commits f606ddf42fd4 and 07a887d399b8 in
    2008. These #defines are not used in the current kernel.

    Signed-off-by: Rob Landley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     
  • There are various email addresses for me throughout the kernel. Use the
    one that will always be valid.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This has hit me a couple of times already. I would be debugging code
    and the system would simply hang and then reboot. Finally, I found that
    the problem was caused by WARN_ON_ONCE() and friends.

    The macro WARN_ON_ONCE(condition) is defined as:

    static bool __section(.data.unlikely) __warned;
    int __ret_warn_once = !!(condition);

    if (unlikely(__ret_warn_once))
            if (WARN_ON(!__warned))
                    __warned = true;

    unlikely(__ret_warn_once);

    Which looks great and all. But what I have hit is an issue where the
    WARN_ON() itself hits the same WARN_ON_ONCE() code: because the
    variable __warned is not yet set, it too calls WARN_ON(), and that
    triggers the warning again. It keeps doing this until the stack is
    overflowed and the system crashes.

    Setting __warned before calling WARN_ON() makes the original
    WARN_ON_ONCE() really warn only once, and not an infinite number of
    times if the WARN_ON() also triggers the warning.

    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
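The recursion and its cure can be reproduced in a few lines of userspace C: if the warn path itself trips the same once-guard, marking the guard before descending caps the whole thing at a single report. The names below are stand-ins for the real macros:

```c
static int warn_hits;
static _Bool warned;   /* stand-in for WARN_ON_ONCE()'s __warned */

static void warn_on_once(int condition);

/* Stand-in for the WARN_ON() slowpath which, in the buggy scenario,
 * ends up executing the same WARN_ON_ONCE() again. */
static void warn_slowpath(void)
{
	warn_hits++;
	warn_on_once(1);
}

static void warn_on_once(int condition)
{
	if (condition && !warned) {
		warned = 1;        /* the fix: set BEFORE warning, so the  */
		warn_slowpath();   /* re-entrant call bails out immediately */
	}
}
```

With the old ordering (guard set only after the warn path returns), `warn_slowpath()` would recurse until the stack overflowed, matching the hang described above.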
     
  • arch/mn10300/kernel/fpu-nofpu.c:27:36: error: unknown type name 'elf_fpregset_t'
    int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpreg)

    Reported-by: kbuild test robot
    Cc: Josh Triplett
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • CONFIG_BUG=n && CONFIG_GENERIC_BUG=y make no sense and things break:

    In file included from include/linux/page-flags.h:9:0,
    from kernel/bounds.c:9:
    include/linux/bug.h:91:47: warning: 'struct bug_entry' declared inside parameter list
    static inline int is_warning_bug(const struct bug_entry *bug)
    ^
    include/linux/bug.h:91:47: warning: its scope is only this definition or declaration, which is probably not what you want
    include/linux/bug.h: In function 'is_warning_bug':
    >> include/linux/bug.h:93:12: error: dereferencing pointer to incomplete type
    return bug->flags & BUGFLAG_WARNING;

    Reported-by: kbuild test robot
    Cc: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • On an i686 PAE enabled machine the contiguous physical area can be
    large, which can cause truncation of variables in the calculation below
    in read_vmcore() and mmap_vmcore():

    tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);

    That is, the types being used are as follows on i686:
    m->offset: unsigned long long int
    m->size: unsigned long long int
    *fpos: loff_t (long long int)
    buflen: size_t (unsigned int)

    So casting (m->offset + m->size - *fpos) to size_t means truncating
    the value modulo 4GB.

    Suppose (m->offset + m->size - *fpos) is truncated to 0 while
    buflen > 0; then we get tsz = 0, which is of course not the expected
    result. Similarly we can get other truncated values less than buflen,
    so the real size passed down is no longer correct.

    If (m->offset + m->size - *fpos) is above 4GB, read_vmcore and
    mmap_vmcore use a min_t result computed from truncated values compared
    against buflen. fpos then proceeds with the wrong value, so we hit the
    bugs below:

    1) read_vmcore will refuse to continue so makedumpfile fails.
    2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range().

    Use unsigned long long in min_t instead so that the variables are not
    truncated.

    Signed-off-by: Baoquan He
    Signed-off-by: Dave Young
    Cc: HATAYAMA Daisuke
    Cc: Vivek Goyal
    Cc: Jianyu Zhan
    Cc: Minfei Huang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
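The truncation is easy to demonstrate outside the kernel. Here uint32_t stands in for i686's 32-bit size_t so the demo also works on 64-bit hosts, and min_t is reimplemented locally (the kernel's real min_t also type-checks its arguments):

```c
#include <stdint.h>

/* Simplified local version of the kernel's min_t(): cast both operands
 * to 'type' before comparing -- exactly where the truncation happens. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Buggy: with a 32-bit size_t, a 4GB 'remaining' wraps to 0, so tsz = 0. */
static uint64_t tsz_buggy(uint64_t remaining, uint32_t buflen)
{
	return min_t(uint32_t, remaining, buflen);
}

/* Fixed: compare at full width; the result still fits, since it is at
 * most buflen. */
static uint64_t tsz_fixed(uint64_t remaining, uint32_t buflen)
{
	return min_t(uint64_t, remaining, buflen);
}
```

With `remaining` equal to exactly 4GB and a 4096-byte buffer, the buggy variant yields 0 while the fixed one yields 4096.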
     
  • It is not elegant that the shell prompt does not start on a new line
    after executing "cat /proc/$pid/wchan". Make the prompt start on a new
    line.

    Signed-off-by: Minfei Huang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     
  • `proc_timers_operations` is only used when CONFIG_CHECKPOINT_RESTORE is
    enabled.

    Signed-off-by: Eric Engestrom
    Acked-by: Cyrill Gorcunov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Engestrom
     
  • This patch provides a proc/PID/timerslack_ns interface which exposes a
    task's timerslack value in nanoseconds and allows it to be changed.

    This allows power/performance management software to set timer slack for
    other threads according to its policy for the thread (such as when the
    thread is designated foreground vs. background activity)

    If the value written is non-zero, the slack is set to that value;
    otherwise it is set to the default for the thread.

    This interface checks that the calling task has permission to use
    PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure
    arbitrary apps do not change the timer slack for other apps.

    Signed-off-by: John Stultz
    Acked-by: Kees Cook
    Cc: Arjan van de Ven
    Cc: Thomas Gleixner
    Cc: Oren Laadan
    Cc: Ruchi Kandoi
    Cc: Rom Lemarchand
    Cc: Android Kernel Team
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Stultz
     
  • This patchset introduces a /proc/<pid>/timerslack_ns interface which
    would allow controlling processes to be able to set the timerslack value
    on other processes in order to save power by avoiding wakeups (something
    Android currently does via out-of-tree patches).

    The first patch tries to fix the internal timer_slack_ns usage which was
    defined as a long, which limits the slack range to ~4 seconds on 32bit
    systems. It converts it to a u64, which provides the same basically
    unlimited slack (500 years) on both 32bit and 64bit machines.

    The second patch introduces the /proc/<pid>/timerslack_ns interface
    itself, which allows the full 64bit slack range for a task to be read
    or set on both 32bit and 64bit machines.

    With these two patches, on a 32bit machine, after setting the slack on
    bash to 10 seconds:

    $ time sleep 1

    real 0m10.747s
    user 0m0.001s
    sys 0m0.005s

    The first patch is a little ugly, since I had to chase the slack delta
    arguments through a number of functions converting them to u64s. Let me
    know if it makes sense to break that up more or not.

    Other than that things are fairly straightforward.

    This patch (of 2):

    The timer_slack_ns value in the task struct is currently an unsigned
    long. This means that on 32bit applications, the maximum slack is just
    over 4 seconds. However, on 64bit machines it is much larger (~500
    years).

    This disparity could make application development confusing, so
    convert timer_slack_ns (as well as the default_slack) to a u64. This
    means both 32bit and 64bit systems have the same effective internal
    slack range.

    Now, the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK
    specifies the interface as an unsigned long, so we preserve that
    limitation on 32bit systems: SET_TIMERSLACK can only set the slack to
    an unsigned long value, and GET_TIMERSLACK will return ULONG_MAX if
    the slack is actually larger than what an unsigned long can store.

    This patch also modifies the hrtimer functions which specified the
    slack delta as an unsigned long.

    Signed-off-by: John Stultz
    Cc: Arjan van de Ven
    Cc: Thomas Gleixner
    Cc: Oren Laadan
    Cc: Ruchi Kandoi
    Cc: Rom Lemarchand
    Cc: Kees Cook
    Cc: Android Kernel Team
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Stultz
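The ABI preservation described above amounts to a clamp: the internal value is a u64, but a 32-bit PR_GET_TIMERSLACK can only report up to ULONG_MAX. A sketch of that behavior, with uint32_t playing the role of a 32-bit unsigned long (this is an illustration, not the kernel code):

```c
#include <stdint.h>

#define ULONG_MAX_32 UINT32_MAX   /* 32-bit unsigned long stand-in */

/* What a 32-bit PR_GET_TIMERSLACK would report for a u64-backed slack
 * value: the real value when it fits, ULONG_MAX when it does not. */
static uint32_t get_timerslack_abi32(uint64_t slack_ns)
{
	return slack_ns > ULONG_MAX_32 ? ULONG_MAX_32 : (uint32_t)slack_ns;
}
```

So a 5-second slack (5,000,000,000 ns, just over the 32-bit range) reads back as ULONG_MAX on 32-bit, while anything that fits is reported exactly.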
     
  • After the OOM killer is disabled during suspend operation, any
    !__GFP_NOFAIL && __GFP_FS allocations are forced to fail. Thus, any
    !__GFP_NOFAIL && !__GFP_FS allocations should be forced to fail as well.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • While oom_killer_disable() is called by freeze_processes() after all
    user threads except the current thread are frozen, it is possible that
    kernel threads invoke the OOM killer and send SIGKILL to the current
    thread due to sharing the thawed victim's memory. Therefore, checking
    for SIGKILL is preferable to checking TIF_MEMDIE.

    Signed-off-by: Tetsuo Handa
    Cc: Tetsuo Handa
    Cc: David Rientjes
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Add a new column to pool stats, which tells how many pages can ideally
    be freed by class compaction, making it easier to analyze zsmalloc
    fragmentation.

    At the moment, we have only numbers of FULL and ALMOST_EMPTY classes,
    but they don't tell us how badly the class is fragmented internally.

    The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:

    class  size  almost_full  almost_empty  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
    [..]
       12   224            0             2            146         5           8                 4         4
       13   240            0             0              0         0           0                 1         0
       14   256            1            13           1840      1672         115                 1        10
       15   272            0             0              0         0           0                 1         0
    [..]
       49   816            0             3            745       735         149                 1         2
       51   848            3             4            361       306          76                 4         8
       52   864           12            14            378       268          81                 3        21
       54   896            1            12            117        57          26                 2        12
       57   944            0             0              0         0           0                 3         0
    [..]
    Total                 26           131          12709     10994        1071                          134

    For example, from this particular output we can easily conclude that
    class-896 is heavily fragmented -- it occupies 26 pages, of which 12
    can be freed by compaction.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
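The freeable column in the sample output can be cross-checked with simple arithmetic, assuming 4KB pages: a zspage for a class packs pages_per_zspage * 4096 / size objects, so compaction could release every page beyond those needed to hold the live objects. This is an illustrative reconstruction that happens to match the sample rows, not zsmalloc's actual code path:

```c
#define PAGE_SIZE_BYTES 4096UL   /* assumed 4KB pages */

/* Pages that class compaction could ideally free: total pages in use
 * minus the pages needed to hold obj_used objects at full packing. */
static unsigned long freeable(unsigned long size,
			      unsigned long pages_per_zspage,
			      unsigned long obj_used,
			      unsigned long pages_used)
{
	unsigned long objs_per_zspage =
		pages_per_zspage * PAGE_SIZE_BYTES / size;
	unsigned long zspages_needed =
		(obj_used + objs_per_zspage - 1) / objs_per_zspage;

	return pages_used - zspages_needed * pages_per_zspage;
}
```

For class-896 above: 2 * 4096 / 896 = 9 objects per zspage, ceil(57 / 9) = 7 zspages = 14 pages, and 26 - 14 = 12 freeable pages, matching the table.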
     
  • When unmapping a huge class page in zs_unmap_object(), the page is
    unmapped with kunmap_atomic(). The "!area->huge" branch in
    __zs_unmap_object() is always true, and no code sets "area->huge" now,
    so we can drop it.

    Signed-off-by: YiPing Xu
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    YiPing Xu