06 Nov, 2015
40 commits
-
Before the main loop, vma is already NULL, so there is no need to set it
to NULL again.
Signed-off-by: Chen Gang
Reviewed-by: Oleg Nesterov
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
probe_kernel_address() is basically the same as the (later added)
probe_kernel_read().
The return value on EFAULT is a bit different: probe_kernel_address()
returns number-of-bytes-not-copied whereas probe_kernel_read() returns
-EFAULT. All callers have been checked; none cared.
probe_kernel_read() can be overridden by the architecture whereas
probe_kernel_address() cannot. parisc, blackfin and um do this, to insert
additional checking. Hence this patch possibly fixes obscure bugs,
although there are only two probe_kernel_address() callsites outside
arch/.
My first attempt involved removing probe_kernel_address() entirely and
converting all callsites to use probe_kernel_read() directly, but that got
tiresome.
This patch shrinks mm/slab_common.o by 218 bytes, for a single
probe_kernel_address() callsite.
Cc: Steven Miao
Cc: Jeff Dike
Cc: Richard Weinberger
Cc: "James E.J. Bottomley"
Cc: Helge Deller
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In the mlockall() syscall wrapper, the code after the 'out' goto label
just does a return. Remove the 'goto out' statements and return the error
values directly.
Also, instead of rewriting the ret variable before every if-check, move
the returns onto an 'error'-like path under each if-check.
An objdump asm listing showed a reduction of a few asm lines. The object
file size decreased from 220592 bytes to 220528 bytes for me (on
aarch64).
Signed-off-by: Alexey Klimov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A few lines below, the object is reinitialized by lookup_object(), so we
don't need to initialize it to NULL at the beginning of
find_and_get_object().
Signed-off-by: Alexey Klimov
Acked-by: Catalin Marinas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc
configurations defining ARCH_DMA_MINALIGN to 128), the first
kmalloc_caches[] entry to be initialised after slab_early_init = 0 is
"kmalloc-128" with index 7. Depending on the debug kernel configuration,
sizeof(struct kmem_cache) can be larger than 128 resulting in an
INDEX_NODE of 8.
Commit 8fc9cf420b36 ("slab: make more slab management structure off the
slab") enables off-slab management objects for sizes starting with
PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation
of the "kmalloc-128" cache would try to place the management objects
off-slab. However, since KMALLOC_MIN_SIZE is already 128 and
freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size)
returns NULL (kmalloc_caches[7] not populated yet). This triggers the
following bug on arm64:
kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540
Hardware name: Juno (DT)
PC is at __kmem_cache_create+0x21c/0x280
LR is at __kmem_cache_create+0x210/0x280
[...]
Call trace:
__kmem_cache_create+0x21c/0x280
create_boot_cache+0x48/0x80
create_kmalloc_cache+0x50/0x88
create_kmalloc_caches+0x4c/0xf4
kmem_cache_init+0x100/0x118
start_kernel+0x214/0x33c
This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab
management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE.
Fixes: 8fc9cf420b36 ("slab: make more slab management structure off the slab")
Signed-off-by: Catalin Marinas
Reported-by: Geert Uytterhoeven
Acked-by: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: [3.15+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In slub_order(), the order starts from max(min_order,
get_order(min_objects * size)). When (min_objects * size) has a
different order from (min_objects * size + reserved), this order is
skipped via a check in the loop.
This patch optimizes this a little by calculating the start order with
`reserved' taken into consideration and removing the check in the loop.
Signed-off-by: Wei Yang
Acked-by: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
get_order() is easier to understand.
This patch just replaces the open-coded calculation with it.
Signed-off-by: Wei Yang
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Joonsoo Kim
Reviewed-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In calculate_order(), it tries to calculate the best order by adjusting
the fraction and min_objects. On each iteration over min_objects,
fraction iterates over 16, 8, 4, which means the acceptable waste
increases as 1/16, 1/8, 1/4.
This patch corrects the comment to match the code.
Signed-off-by: Wei Yang
Acked-by: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The assignment to NULL within the error condition was written in a 2014
patch to suppress a compiler warning. However, it is cleaner to just
initialize the kmem_cache pointer to NULL and return it in case of an
error condition.
Signed-off-by: Alexandru Moise
Acked-by: Christoph Lameter
Cc: Pekka Enberg
Acked-by: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add documentation on how to use slabinfo-gnuplot.sh script.
Signed-off-by: Sergey Senozhatsky
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Gnuplot `slabinfo -X' stats collected, for example, using the
following command:
while [ 1 ]; do slabinfo -X >> stats; sleep 1; done
`slabinfo-gnuplot.sh stats' pre-processes the collected records
and generates graphs (totals, slabs sorted by size, slabs
sorted by loss).
Graphs can be [individually] regenerated with a different sample
range and graph width/height (-r %d,%d and -s %d,%d options).
To visually compare N `totals' graphs:
slabinfo-gnuplot.sh -t FILE1-totals FILE2-totals ... FILEN-totals
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
checkpatch.pl complains about globals being explicitly zeroed
out: "ERROR: do not initialise globals to 0 or NULL".
New globals introduced in this patch set have no explicit 0
initialization; clean up the old ones to make it less hairy.
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce a "-B|--Bytes" opt to disable store_size() dynamic
size scaling and report sizes in bytes instead.
This `expands' the interface a bit: it's no longer possible to use
printf("%6s") to output sizes.
Example:
slabinfo -X -N 2
Slabcache Totals
----------------
Slabcaches : 91 Aliases : 119->69 Active: 63
Memory used: 199798784 # Loss : 10689376 MRatio: 5%
# Objects : 324301 # PartObj: 18151 ORatio: 5%
Per Cache Average Min Max Total
----------------------------------------------------------------------------
#Objects 5147 1 89068 324301
#Slabs 199 1 3886 12537
#PartSlab 12 0 240 778
%PartSlab 32% 0% 100% 6%
PartObjs 5 0 4569 18151
% PartObj 26% 0% 100% 5%
Memory 3171409 8192 127336448 199798784
Used 3001736 160 121429728 189109408
Loss 169672 0 5906720 10689376
Per Object Average Min Max
-----------------------------------------------------------
Memory 585 8 8192
User 583 8 8192
Loss 2 0 64
Slabs sorted by size
--------------------
Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
ext4_inode_cache 69948 1736 127336448 3871/0/15 18 3 0 95 a
dentry 89068 288 26058752 3164/0/17 28 1 0 98 a
Slabs sorted by loss
--------------------
Name Objects Objsize Loss Slabs/Part/Cpu O/S O %Fr %Ef Flg
ext4_inode_cache 69948 1736 5906720 3871/0/15 18 3 0 95 a
inode_cache 11628 864 537472 642/0/4 18 2 0 94 a
Besides, store_size() does not use powers of two for G/M/K:
if (value > 1000000000UL) {
divisor = 100000000UL;
trailer = 'G';
} else if (value > 1000000UL) {
divisor = 100000UL;
trailer = 'M';
} else if (value > 1000UL) {
divisor = 100;
trailer = 'K';
}
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add "-X|--Xtotals" opt to output extended totals summary,
which includes:
-- totals summary
-- slabs sorted by size
-- slabs sorted by loss (waste)
Example:
=======
slabinfo -X -N 1
Slabcache Totals
----------------
Slabcaches : 91 Aliases : 120->69 Active: 65
Memory used: 568.3M # Loss : 30.4M MRatio: 5%
# Objects : 920.1K # PartObj: 161.2K ORatio: 17%
Per Cache Average Min Max Total
---------------------------------------------------------
#Objects 14.1K 1 227.8K 920.1K
#Slabs 533 1 11.7K 34.7K
#PartSlab 86 0 4.3K 5.6K
%PartSlab 24% 0% 100% 16%
PartObjs 17 0 129.3K 161.2K
% PartObj 17% 0% 100% 17%
Memory 8.7M 8.1K 384.7M 568.3M
Used 8.2M 160 366.5M 537.9M
Loss 468.8K 0 18.2M 30.4M
Per Object Average Min Max
---------------------------------------------
Memory 587 8 8.1K
User 584 8 8.1K
Loss 2 0 64
Slabs sorted by size
----------------------
Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
ext4_inode_cache 211142 1736 384.7M 11732/40/10 18 3 0 95 a
Slabs sorted by loss
----------------------
Name Objects Objsize Loss Slabs/Part/Cpu O/S O %Fr %Ef Flg
ext4_inode_cache 211142 1736 18.2M 11732/40/10 18 3 0 95 a
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix mismatches between usage() output and real opts[] options. Add
missing alternative opt names, e.g., '-S' had no '--Size' opts[] entry,
etc.
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce opt "-L|--sort-loss" to sort and output slabs by
loss (waste) in slabcache().
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce opt "-N|--lines=K" to limit the number of slabs
being reported in output_slabs().
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patchset adds an 'extended' slabinfo mode that provides additional
information:
-- totals summary
-- slabs sorted by size
-- slabs sorted by loss (waste)
The patches also introduce several new slabinfo options to limit the
number of slabs reported and to sort slabs by loss (waste), plus some fixes.
Extended output example (slabinfo -X -N 2):
Slabcache Totals
----------------
Slabcaches : 91 Aliases : 119->69 Active: 63
Memory used: 199798784 # Loss : 10689376 MRatio: 5%
# Objects : 324301 # PartObj: 18151 ORatio: 5%
Per Cache Average Min Max Total
----------------------------------------------------------------------------
#Objects 5147 1 89068 324301
#Slabs 199 1 3886 12537
#PartSlab 12 0 240 778
%PartSlab 32% 0% 100% 6%
PartObjs 5 0 4569 18151
% PartObj 26% 0% 100% 5%
Memory 3171409 8192 127336448 199798784
Used 3001736 160 121429728 189109408
Loss 169672 0 5906720 10689376
Per Object Average Min Max
-----------------------------------------------------------
Memory 585 8 8192
User 583 8 8192
Loss 2 0 64
Slabs sorted by size
--------------------
Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
ext4_inode_cache 69948 1736 127336448 3871/0/15 18 3 0 95 a
dentry 89068 288 26058752 3164/0/17 28 1 0 98 a
Slabs sorted by loss
--------------------
Name Objects Objsize Loss Slabs/Part/Cpu O/S O %Fr %Ef Flg
ext4_inode_cache 69948 1736 5906720 3871/0/15 18 3 0 95 a
inode_cache 11628 864 537472 642/0/4 18 2 0 94 a
The last patch in the series addresses Linus' comment from
http://marc.info/?l=linux-mm&m=144148518703321&w=2
(well, it's been some time. sorry.)
The gnuplot script takes a slabinfo records file, where every record is a
`slabinfo -X' output. So the basic workflow is, for example, as follows:
while [ 1 ]; do slabinfo -X -N 2 >> stats; sleep 1; done
^C
slabinfo-gnuplot.sh stats
The last command will produce 3 png files (and 3 stats files):
-- graph of slabinfo totals
-- graph of slabs by size
-- graph of slabs by loss
It's also possible to select a range of records for plotting (a range of collected
slabinfo outputs) via `-r 10,100' (for example), and to compare totals from several
measurements (to visually compare slab behaviour over, say, a 10,50 range) using
pre-parsed totals files:
slabinfo-gnuplot.sh -r 10,50 -t stats-totals1 .. stats-totals2
This also, technically, supports ktest: upload the new slabinfo to the target,
collect the stats and give the resulting stats file to slabinfo-gnuplot.
This patch (of 8):
Use getopt constants in `struct option' ->has_arg instead of numerical
representations.
Signed-off-by: Sergey Senozhatsky
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, when kmem_cache_destroy() is called for a global cache, we
print a warning for each per memcg cache attached to it that has active
objects (see shutdown_cache). This is redundant, because it gives no new
information and only clutters the log. If a cache being destroyed has
active objects, there must be a memory leak in the module that created the
cache, and it does not matter if the cache was used by users in memory
cgroups or not.
This patch moves the warning from shutdown_cache(), which is called for
shutting down both global and per memcg caches, to kmem_cache_destroy(),
so that the warning is only printed once if there are objects left in the
cache being destroyed.
Signed-off-by: Vladimir Davydov
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, we do not clear pointers to per memcg caches in the
memcg_params.memcg_caches array when a global cache is destroyed with
kmem_cache_destroy.
This is fine if the global cache does get destroyed. However, a cache can
be left on the list if it still has active objects when kmem_cache_destroy
is called (due to a memory leak). If this happens, the entries in the
array will point to already freed areas, which is likely to result in data
corruption when the cache is reused (via slab merging).
Signed-off-by: Vladimir Davydov
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
do_kmem_cache_create(), do_kmem_cache_shutdown(), and
do_kmem_cache_release() sound awkward for static helper functions that are
not supposed to be used outside slab_common.c. Rename them to
create_cache(), shutdown_cache(), and release_caches(), respectively.
This patch is a pure cleanup and does not introduce any functional
changes.
Signed-off-by: Vladimir Davydov
Acked-by: Christoph Lameter
Cc: Pekka Enberg
Acked-by: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The patch "slab.h: sprinkle __assume_aligned attributes" causes *tons* of
whinges if you do 'make C=2' with sparse 0.5.0:
CHECK drivers/media/usb/pwc/pwc-if.c
include/linux/slab.h:307:43: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:308:58: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:337:73: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:375:74: error: attribute '__assume_aligned__': unknown attribute
include/linux/slab.h:378:80: error: attribute '__assume_aligned__': unknown attribute
sparse apparently pretends to be gcc >= 4.9, yet isn't prepared to handle
all the function attributes supported by those gccs and complains loudly.
So hide the definition of __assume_aligned from it (so that the generic
one in compiler.h gets used).
Signed-off-by: Rasmus Villemoes
Reported-by: Valdis Kletnieks
Tested-By: Valdis Kletnieks
Cc: Christopher Li
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
gcc 4.9 added the function attribute assume_aligned, indicating to the
caller that the returned pointer may be assumed to have a certain minimal
alignment. This is useful if, for example, the return value is passed to
memset(). Add a shorthand macro for that.
Signed-off-by: Rasmus Villemoes
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Pekka Enberg
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A good candidate to return a boolean result.
Signed-off-by: Denis Kirjanov
Cc: Christoph Lameter
Reviewed-by: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Theoretically it is possible that the watchdog timer expires right at the
time when a user sets 'watchdog_thresh' to zero (note: this disables the
lockup detectors). In this scenario, the is_softlockup() function - which
is called by the timer - could produce a false positive.
Fix this by checking the current value of 'watchdog_thresh'.
Signed-off-by: Ulrich Obergfell
Acked-by: Don Zickus
Reviewed-by: Aaron Tomlin
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
watchdog_{park|unpark}_threads() are now called in code paths that protect
themselves against CPU hotplug, so {get|put}_online_cpus() calls are
redundant and can be removed.
Signed-off-by: Ulrich Obergfell
Acked-by: Don Zickus
Reviewed-by: Aaron Tomlin
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The handler functions for watchdog parameters in /proc/sys/kernel do not
protect themselves against races with CPU hotplug. Hence, theoretically
it is possible that a new watchdog thread is started on a hotplugged CPU
while a parameter is being modified, and the thread could thus use a
parameter value that is 'in transition'.
For example, if 'watchdog_thresh' is being set to zero (note: this
disables the lockup detectors) the thread would erroneously use the value
zero as the sample period.
To avoid such races and to keep the /proc handler code consistent,
call
{get|put}_online_cpus() in proc_watchdog_common()
{get|put}_online_cpus() in proc_watchdog_thresh()
{get|put}_online_cpus() in proc_watchdog_cpumask()
Signed-off-by: Ulrich Obergfell
Acked-by: Don Zickus
Reviewed-by: Aaron Tomlin
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The lockup detector suspend/resume interface that was introduced by
commit 8c073d27d7ad ("watchdog: introduce watchdog_suspend() and
watchdog_resume()") does not protect itself against races with CPU
hotplug. Hence, theoretically it is possible that a new watchdog thread
is started on a hotplugged CPU while the lockup detector is suspended,
and the thread could thus interfere unexpectedly with the code that
requested to suspend the lockup detector.
Avoid the race by calling
get_online_cpus() in lockup_detector_suspend()
put_online_cpus() in lockup_detector_resume()
Signed-off-by: Ulrich Obergfell
Acked-by: Don Zickus
Reviewed-by: Aaron Tomlin
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The only way to enable a hardlockup to panic the machine is to set
'nmi_watchdog=panic' on the kernel command line.
This makes it awkward for end users and folks who want to run automated
tests (like myself).
Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic
knob.
Signed-off-by: Don Zickus
Cc: Ulrich Obergfell
Acked-by: Jiri Kosina
Reviewed-by: Aaron Tomlin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) that some other CPU is holding.
IOW, we are often looking at the stacktrace of the victim and not the
actual offender.
Introduce a sysctl / cmdline parameter that makes it possible to have the
hardlockup detector perform an all-CPU backtrace.
Signed-off-by: Jiri Kosina
Reviewed-by: Aaron Tomlin
Cc: Ulrich Obergfell
Acked-by: Don Zickus
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If kthread_park() returns an error, watchdog_park_threads() should not
blindly 'roll back' the already parked threads to the unparked state.
Instead leave it up to the callers to handle such errors appropriately in
their context. For example, it is redundant to unpark the threads if the
lockup detectors will soon be disabled by the callers anyway.
Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Acked-by: Don Zickus
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
lockup_detector_suspend() now handles errors from watchdog_park_threads().
Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Acked-by: Don Zickus
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
update_watchdog_all_cpus() now passes errors from watchdog_park_threads()
up to functions in the call chain. This allows watchdog_enable_all_cpus()
and proc_watchdog_update() to handle such errors too.
Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Acked-by: Don Zickus
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move watchdog_disable_all_cpus() outside of the ifdef so that it is
available if CONFIG_SYSCTL is not defined. This is preparation for
"watchdog: implement error handling in update_watchdog_all_cpus() and
callers".
Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Acked-by: Don Zickus
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The original watchdog_park_threads() function that was introduced by
commit 81a4beef91ba ("watchdog: introduce watchdog_park_threads() and
watchdog_unpark_threads()") takes a very simple approach to handle
errors returned by kthread_park(): It attempts to roll back all watchdog
threads to the unparked state. However, this may be undesired behaviour
from the perspective of the caller which may want to handle errors as
appropriate in its specific context. Currently, there are two possible
call chains:
- watchdog suspend/resume interface
  lockup_detector_suspend
  watchdog_park_threads
- write to parameters in /proc/sys/kernel
  proc_watchdog_update
  watchdog_enable_all_cpus
  update_watchdog_all_cpus
  watchdog_park_threads
Instead of 'blindly' attempting to unpark the watchdog threads if a
kthread_park() call fails, the new approach is to disable the lockup
detectors in the above call chains. Failure becomes visible to the user
as follows:
- error messages from lockup_detector_suspend()
  or watchdog_enable_all_cpus()
- the state that can be read from /proc/sys/kernel/watchdog_enabled
- the 'write' system call in the latter call chain returns an error
I did not experience kthread_park() failures in practice; I used some
instrumentation to fake error returns from kthread_park() in order to test
the patches.
This patch (of 5):
Restore the previous value of watchdog_thresh _and_ sample_period if
proc_watchdog_update() returns an error. The variables must be consistent
to avoid false positives of the lockup detectors.
Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Acked-by: Don Zickus
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make is_hardlockup() return bool to improve readability, since this
particular function only returns either one or zero.
No functional change.
Signed-off-by: Yaowei Bai
Reviewed-by: Aaron Tomlin
Acked-by: Don Zickus
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the remote locking fails, we run a local vfs unlock that should work
and return success to userland, even though we didn't actually lock at
all. We need to tell the application that tried to lock that it didn't
get it, not that all went well.
Signed-off-by: Dominique Martinet
Cc: Eric Van Hensbergen
Cc: Ron Minnich
Cc: Latchesar Ionkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make struct callback_head aligned to the size of a pointer. On most
architectures this happens naturally due to ABI requirements, but some
architectures (like CRIS) have a weird ABI and we need to ask for it
explicitly.
The alignment is required to guarantee that bits 0 and 1 of @next will be
clear under normal conditions -- as long as we use call_rcu(),
call_rcu_bh(), call_rcu_sched(), or call_srcu() to queue a callback.
This guarantee is important for a few reasons:
- the future call_rcu_lazy() will make use of the lower bits in the pointer;
- the structure shares storage space in struct page with @compound_head,
  which encodes PageTail() in bit 0. The guarantee is needed to avoid
  a false-positive PageTail().
A false-positive PageTail() caused a crash on crisv32 [1]. It happened due
to a misaligned task_struct->rcu, which was byte-aligned.
[1] http://lkml.kernel.org/r/55FAEA67.9000102@roeck-us.net
Signed-off-by: Kirill A. Shutemov
Reported-by: Guenter Roeck
Tested-by: Guenter Roeck
Acked-by: Paul E. McKenney
Cc: Mikael Starvik
Cc: Jesper Nilsson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
readahead_pages in ocfs2_duplicate_clusters_by_page is defined but not
used, so clean it up.
Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A node can mount multiple ocfs2 volumes, and if the thread names are the
same for each volume/domain, it brings inconvenience when analyzing
problems because we have to identify which volume/domain the messages
belong to.
Since the thread name is printed in the messages, adding the volume uuid
or dlm name to the thread name benefits problem analysis.
Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Gang He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds