Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

24 Jun, 2014

2 commits

ed235875e kernel/watchdog.c: print traces for all cpus on lockup detection ... Browse Code »

A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.

Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.

On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e. the owner/holder of the contended resource) is
reported to the user. Often this information has proven to be
insufficient to assist debugging efforts.

To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting. This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.

If NMI is not supported we just revert back to the old method. A sysctl
and boot-time parameter is available to toggle this feature.

[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning]
Signed-off-by: Aaron Tomlin
Signed-off-by: Don Zickus
Cc: David S. Miller
Cc: Mateusz Guzik
Cc: Oleg Nesterov
Signed-off-by: Jan Moskyto Matejka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aaron Tomlin
2014-06-24 07:47:44 +0800
7cd2b0a34 mm, pcp: allow restoring percpu_pagelist_fraction default ... Browse Code »
5

Oleg reports a division by zero error on zero-length write() to the
percpu_pagelist_fraction sysctl:

divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
RSP: 0018:ffff8800d87a3e78 EFLAGS: 00010246
RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
FS: 00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
Call Trace:
proc_sys_call_handler+0xb3/0xc0
proc_sys_write+0x14/0x20
vfs_write+0xba/0x1e0
SyS_write+0x46/0xb0
tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity. If a value in the range [1, 7]
is written, the sysctl will return EINVAL.

This successfully solves the divide by zero issue at the same time.

Signed-off-by: David Rientjes
Reported-by: Oleg Drokin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2014-06-24 07:47:43 +0800

20 Jun, 2014

1 commit

c4222e463 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next ... Browse Code »

Pull sparc fixes from David Miller:
"Sparc sparse fixes from Sam Ravnborg"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits)
sparc64: fix sparse warnings in int_64.c
sparc64: fix sparse warning in ftrace.c
sparc64: fix sparse warning in kprobes.c
sparc64: fix sparse warning in kgdb_64.c
sparc64: fix sparse warnings in compat_audit.c
sparc64: fix sparse warnings in init_64.c
sparc64: fix sparse warnings in aes_glue.c
sparc: fix sparse warnings in smp_32.c + smp_64.c
sparc64: fix sparse warnings in perf_event.c
sparc64: fix sparse warnings in kprobes.c
sparc64: fix sparse warning in tsb.c
sparc64: clean up compat_sigset_t.seta handling
sparc64: fix sparse "Should it be static?" warnings in signal32.c
sparc64: fix sparse warnings in sys_sparc32.c
sparc64: fix sparse warning in pci.c
sparc64: fix sparse warnings in smp_64.c
sparc64: fix sparse warning in prom_64.c
sparc64: fix sparse warning in btext.c
sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
sparc64: fix sparse warning in process_64.c
...

Conflicts:
arch/sparc/include/asm/pgtable_64.h

Linus Torvalds
2014-06-20 01:50:07 +0800

13 Jun, 2014

1 commit

f9da455b9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:

1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.

3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.

4) BPF now has a "random" opcode, from Chema Gonzalez.

5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.

6) Support TCP fastopen over ipv6, from Daniel Lee.

7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

8) Support software TSO in fec driver too, from Nimrod Andy.

9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...

Linus Torvalds
2014-06-13 05:27:40 +0800

07 Jun, 2014

4 commits

6f8fd1d77 sysctl: convert use of typedef ctl_table to struct ctl_table ... Browse Code »

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2014-06-07 07:08:16 +0800
f4aacea2f sysctl: allow for strict write position handling ... Browse Code »
2

When writing to a sysctl string, each write, regardless of VFS position,
begins writing the string from the start. This means the contents of
the last write to the sysctl controls the string contents instead of the
first:

open("/proc/sys/kernel/modprobe", O_WRONLY) = 1
write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
write(1, "/bin/true", 9) = 9
close(1) = 0

$ cat /proc/sys/kernel/modprobe
/bin/true

Expected behaviour would be to have the sysctl be "AAAA..." capped at
maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
contents of the second write. Similarly, multiple short writes would
not append to the sysctl.

The old behavior is unlike regular POSIX files enough that doing audits
of software that interact with sysctls can end up in unexpected or
dangerous situations. For example, "as long as the input starts with a
trusted path" turns out to be an insufficient filter, as what must also
happen is for the input to be entirely contained in a single write
syscall -- not a common consideration, especially for high level tools.

This provides kernel.sysctl_writes_strict as a way to make this behavior
act in a less surprising manner for strings, and disallows non-zero file
position when writing numeric sysctls (similar to what is already done
when reading from non-zero file positions). For now, the default (0) is
to warn about non-zero file position use, but retain the legacy
behavior. Setting this to -1 disables the warning, and setting this to
1 enables the file position respecting behavior.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: move misplaced hunk, per Randy]
Signed-off-by: Kees Cook
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2014-06-07 07:08:13 +0800
2ca9bb456 sysctl: refactor sysctl string writing logic ... Browse Code »

Consolidate buffer length checking with new-line/end-of-line checking.
Additionally, instead of reading user memory twice, just do the
assignment during the loop.

This change doesn't affect the potential races here. It was already
possible to read a sysctl that was in the middle of a write. In both
cases, the string will always be NULL terminated. The pre-existing race
remains a problem to be solved.

Signed-off-by: Kees Cook
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2014-06-07 07:08:13 +0800
f88083005 sysctl: clean up char buffer arguments ... Browse Code »

When writing to a sysctl string, each write, regardless of VFS position,
began writing the string from the start. This meant the contents of the
last write to the sysctl controlled the string contents instead of the
first.

This misbehavior was featured in an exploit against Chrome OS. While
it's not in itself a vulnerability, it's a weirdness that isn't on the
mind of most auditors: "This filter looks correct, the first line
written would not be meaningful to sysctl" doesn't apply here, since the
size of the write and the contents of the final write are what matter
when writing to sysctls.

This adds the sysctl kernel.sysctl_writes_strict to control the write
behavior. The default (0) reports when VFS position is non-0 on a
write, but retains legacy behavior, -1 disables the warning, and 1
enables the position-respecting behavior.

The long-term plan here is to wait for userspace to be fixed in response
to the new warning and to then switch the default kernel behavior to the
new position-respecting behavior.

This patch (of 4):

The char buffer arguments are needlessly cast in weird places. Clean it
up so things are easier to read.

Signed-off-by: Kees Cook
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2014-06-07 07:08:13 +0800

05 Jun, 2014

1 commit

a0abcf2e8 Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next ... Browse Code »

Pull x86 cdso updates from Peter Anvin:
"Vdso cleanups and improvements largely from Andy Lutomirski. This
makes the vdso a lot less ''special''"

* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso, build: Make LE access macros clearer, host-safe
x86/vdso, build: Fix cross-compilation from big-endian architectures
x86/vdso, build: When vdso2c fails, unlink the output
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
x86, mm: Improve _install_special_mapping and fix x86 vdso naming
mm, fs: Add vm_ops->name as an alternative to arch_vma_name
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
x86, vdso: Move the 32-bit vdso special pages after the text
x86, vdso: Reimplement vdso.so preparation in build-time C
x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
x86, vdso: Clean up 32-bit vs 64-bit vdso params
x86, mm: Ensure correct alignment of the fixmap

Linus Torvalds
2014-06-05 23:05:29 +0800

19 May, 2014

1 commit

8c7260c0d sparc64: fix sparse warning in tsb.c ... Browse Code »

Fix following warning:
tsb.c:290:5: warning: symbol 'sysctl_tsb_ratio' was not declared. Should it be static?

Add extern declaration in asm/setup.h and remove local declaration
in kernel/sysctl.c

Signed-off-by: Sam Ravnborg
Signed-off-by: David S. Miller

Sam Ravnborg
2014-05-19 10:01:32 +0800

15 May, 2014

1 commit

122ff243f ipv4: make ip_local_reserved_ports per netns ... Browse Code »

ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.

By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()

Cc: "David S. Miller"
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2014-05-15 03:31:45 +0800

06 May, 2014

1 commit

3d7ee969b x86, vdso: Clean up 32-bit vs 64-bit vdso params ... Browse Code »

Rather than using 'vdso_enabled' and an awful #define, just call the
parameters vdso32_enabled and vdso64_enabled.

Signed-off-by: Andy Lutomirski
Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin

Andy Lutomirski
2014-05-06 04:18:40 +0800

26 Apr, 2014

1 commit

86d56134f kobject: Make support for uevent_helper optional. ... Browse Code »

Support for uevent_helper, aka hotplug, is not required on many systems
these days but it can still be enabled via sysfs or sysctl.

Reported-by: Darren Shepherd
Signed-off-by: Michael Marineau
Signed-off-by: Greg Kroah-Hartman

Michael Marineau
2014-04-26 03:00:49 +0800

08 Apr, 2014

1 commit

80df28476 hung_task: check the value of "sysctl_hung_task_timeout_sec" ... Browse Code »
5

As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleep and with print :

schedule_timeout: wrong timeout value ffffffffffffff83

and then the funtion watchdog will call schedule_timeout_interruptible
again and again. The screen will be filled with

"schedule_timeout: wrong timeout value ffffffffffffff83"

This patch does some check and correction in sysctl, to let the function
schedule_timeout_interruptible allways get the valid parameter.

Signed-off-by: Liu Hua
Tested-by: Satoru Takeuchi
Cc: [3.4+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Liu Hua
2014-04-08 07:36:07 +0800

04 Apr, 2014

1 commit

5509a5d27 drop_caches: add some documentation and info message ... Browse Code »

There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape". Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.

If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches. On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.

Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.

Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.

[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen
Signed-off-by: Michal Hocko
Acked-by: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Hansen
2014-04-04 07:21:04 +0800

02 Apr, 2014

1 commit

7a4883773 Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block layer updates from Jens Axboe:
"This is the pull request for the core block IO bits for the 3.15
kernel. It's a smaller round this time, it contains:

- Various little blk-mq fixes and additions from Christoph and
myself.

- Cleanup of the IPI usage from the block layer, and associated
helper code. From Frederic Weisbecker and Jan Kara.

- Duplicate code cleanup in bio-integrity from Gu Zheng. This will
give you a merge conflict, but that should be easy to resolve.

- blk-mq notify spinlock fix for RT from Mike Galbraith.

- A blktrace partial accounting bug fix from Roman Pen.

- Missing REQ_SYNC detection fix for blk-mq from Shaohua Li"

* 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits)
blk-mq: add REQ_SYNC early
rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
blk-mq: support partial I/O completions
blk-mq: merge blk_mq_insert_request and blk_mq_run_request
blk-mq: remove blk_mq_alloc_rq
blk-mq: don't dump CPU -> hw queue map on driver load
blk-mq: fix wrong usage of hctx->state vs hctx->flags
blk-mq: allow blk_mq_init_commands() to return failure
block: remove old blk_iopoll_enabled variable
blktrace: fix accounting of partially completed requests
smp: Rename __smp_call_function_single() to smp_call_function_single_async()
smp: Remove wait argument from __smp_call_function_single()
watchdog: Simplify a little the IPI call
smp: Move __smp_call_function_single() below its safe version
smp: Consolidate the various smp_call_function_single() declensions
smp: Teach __smp_call_function_single() to check for offline cpus
smp: Remove unused list_head from csd
smp: Iterate functions through llist_for_each_entry_safe()
block: Stop abusing rq->csd.list in blk-softirq
block: Remove useless IPI struct initialization
...

Linus Torvalds
2014-04-02 10:19:15 +0800

13 Mar, 2014

1 commit

89f8b33ca block: remove old blk_iopoll_enabled variable ... Browse Code »

This was a debugging measure to toggle enabled/disabled
when testing. But for real production setups, it's not
safe to toggle this setting without either reloading
drivers of quiescing IO first. Neither of which the toggle
enforces.

Additionally, it makes drivers deal with the conditional
state.

Remove it completely. It's up to the driver whether iopoll
is enabled or not.

Signed-off-by: Jens Axboe

Jens Axboe
2014-03-13 23:38:42 +0800

02 Feb, 2014

1 commit

eaa4e4fcf Merge branch 'linus' into sched/core, to resolve conflicts ... Browse Code »

Conflicts:
kernel/sysctl.c

Signed-off-by: Ingo Molnar

Ingo Molnar
2014-02-02 16:45:39 +0800

01 Feb, 2014

1 commit

ab5318788 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core debug changes from Ingo Molnar:
"This contains mostly kernel debugging related updates:

- make hung_task detection more configurable to distros
- add final bits for x86 UV NMI debugging, with related KGDB changes
- update the mailing-list of MAINTAINERS entries I'm involved with"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hung_task: Display every hung task warning
sysctl: Add neg_one as a standard constraint
x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
x86/uv/nmi: Fix Sparse warnings
kgdb/kdb: Fix no KDB config problem
MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries

Linus Torvalds
2014-02-01 00:59:46 +0800

28 Jan, 2014

1 commit

52bf84aa2 sched/numa, mm: Remove p->numa_migrate_deferred ... Browse Code »

Excessive migration of pages can hurt the performance of workloads
that span multiple NUMA nodes. However, it turns out that the
p->numa_migrate_deferred knob is a really big hammer, which does
reduce migration rates, but does not actually help performance.

Now that the second stage of the automatic numa balancing code
has stabilized, it is time to replace the simplistic migration
deferral code with something smarter.

Signed-off-by: Rik van Riel
Acked-by: Mel Gorman
Signed-off-by: Peter Zijlstra
Cc: Chegu Vinod
Link: http://lkml.kernel.org/r/1390860228-21539-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar

Rik van Riel
2014-01-28 20:17:04 +0800

25 Jan, 2014

2 commits

270750dbc hung_task: Display every hung task warning ... Browse Code »

When khungtaskd detects hung tasks, it prints out
backtraces from a number of those tasks.

Limiting the number of backtraces being printed
out can result in the user not seeing the information
necessary to debug the issue. The hung_task_warnings
sysctl controls this feature.

This patch makes it possible for hung_task_warnings
to accept a special value to print an unlimited
number of backtraces when khungtaskd detects hung
tasks.

The special value is -1. To use this value it is
necessary to change types from ulong to int.

Signed-off-by: Aaron Tomlin
Reviewed-by: Rik van Riel
Acked-by: David Rientjes
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1390239253-24030-3-git-send-email-atomlin@redhat.com
[ Build warning fix. ]
Signed-off-by: Ingo Molnar

Aaron Tomlin
2014-01-25 19:13:33 +0800
2397efb1b sysctl: Add neg_one as a standard constraint ... Browse Code »

Add neg_one to the list of standard constraints - will be used by the next patch.

Signed-off-by: Aaron Tomlin
Acked-by: Rik van Riel
Acked-by: David Rientjes
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1390239253-24030-2-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar

Aaron Tomlin
2014-01-25 15:59:53 +0800

24 Jan, 2014

2 commits

7984754b9 kexec: add sysctl to disable kexec_load ... Browse Code »

For general-purpose (i.e. distro) kernel builds it makes sense to build
with CONFIG_KEXEC to allow end users to choose what kind of things they
want to do with kexec. However, in the face of trying to lock down a
system with such a kernel, there needs to be a way to disable kexec_load
(much like module loading can be disabled). Without this, it is too easy
for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM
and modules_disabled are set. With this change, it is still possible to
load an image for use later, then disable kexec_load so the image (or lack
of image) can't be altered.

The intention is for using this in environments where "perfect"
enforcement is hard. Without a verified boot, along with verified
modules, and along with verified kexec, this is trying to give a system a
better chance to defend itself (or at least grow the window of
discoverability) against attack in the face of a privilege escalation.

In my mind, I consider several boot scenarios:

1) Verified boot of read-only verified root fs loading fd-based
verification of kexec images.
2) Secure boot of writable root fs loading signed kexec images.
3) Regular boot loading kexec (e.g. kcrash) image early and locking it.
4) Regular boot with no control of kexec image at all.

1 and 2 don't exist yet, but will soon once the verified kexec series has
landed. 4 is the state of things now. The gap between 2 and 4 is too
large, so this change creates scenario 3, a middle-ground above 4 when 2
and 1 are not possible for a system.

Signed-off-by: Kees Cook
Acked-by: Rik van Riel
Cc: Vivek Goyal
Cc: Eric Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2014-01-24 08:37:03 +0800
54a43d549 numa: add a sysctl for numa_balancing ... Browse Code »

Add a working sysctl to enable/disable automatic numa memory balancing
at runtime.

This allows us to track down performance problems with this feature and
is generally a good idea.

This was possible earlier through debugfs, but only with special
debugging options set. Also fix the boot message.

[akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/]
Signed-off-by: Andi Kleen
Acked-by: Mel Gorman
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2014-01-24 08:36:51 +0800

22 Jan, 2014

1 commit

49f0ce5f9 mm: add overcommit_kbytes sysctl variable ... Browse Code »

Some applications that run on HPC clusters are designed around the
availability of RAM and the overcommit ratio is fine tuned to get the
maximum usage of memory without swapping. With growing memory, the
1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
for these workload (on a 2TB machine it represents no less than 20GB).

This patch adds the new overcommit_kbytes sysctl variable that allow a
much finer grain.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Jerome Marchand
Cc: Dave Hansen
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jerome Marchand
2014-01-22 08:19:44 +0800

13 Jan, 2014

2 commits

1724813d9 sched/deadline: Remove the sysctl_sched_dl knobs ... Browse Code »

Remove the deadline specific sysctls for now. The problem with them is
that the interaction with the exisiting rt knobs is nearly impossible
to get right.

The current (as per before this patch) situation is that the rt and dl
bandwidth is completely separate and we enforce rt+dl < 100%. This is
undesirable because this means that the rt default of 95% leaves us
hardly any room, even though dl tasks are saver than rt tasks.

Another proposed solution was (a discarted patch) to have the dl
bandwidth be a fraction of the rt bandwidth. This is highly
confusing imo.

Furthermore neither proposal is consistent with the situation we
actually want; which is rt tasks ran from a dl server. In which case
the rt bandwidth is a direct subset of dl.

So whichever way we go, the introduction of dl controls at this point
is painful. Therefore remove them and instead share the rt budget.

This means that for now the rt knobs are used for dl admission control
and the dl runtime is accounted against the rt runtime. I realise that
this isn't entirely desirable either; but whatever we do we appear to
need to change the interface later, so better have a small interface
for now.

Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2014-01-13 20:47:23 +0800
332ac17ef sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks ... Browse Code »
13

In order of deadline scheduling to be effective and useful, it is
important that some method of having the allocation of the available
CPU bandwidth to tasks and task groups under control.
This is usually called "admission control" and if it is not performed
at all, no guarantee can be given on the actual scheduling of the
-deadline tasks.

Since when RT-throttling has been introduced each task group have a
bandwidth associated to itself, calculated as a certain amount of
runtime over a period. Moreover, to make it possible to manipulate
such bandwidth, readable/writable controls have been added to both
procfs (for system wide settings) and cgroupfs (for per-group
settings).

Therefore, the same interface is being used for controlling the
bandwidth distrubution to -deadline tasks and task groups, i.e.,
new controls but with similar names, equivalent meaning and with
the same usage paradigm are added.

However, more discussion is needed in order to figure out how
we want to manage SCHED_DEADLINE bandwidth at the task group level.
Therefore, this patch adds a less sophisticated, but actually
very sensible, mechanism to ensure that a certain utilization
cap is not overcome per each root_domain (the single rq for !SMP
configurations).

Another main difference between deadline bandwidth management and
RT-throttling is that -deadline tasks have bandwidth on their own
(while -rt ones doesn't!), and thus we don't need an higher level
throttling mechanism to enforce the desired bandwidth.

This patch, therefore:

- adds system wide deadline bandwidth management by means of:
* /proc/sys/kernel/sched_dl_runtime_us,
* /proc/sys/kernel/sched_dl_period_us,
that determine (i.e., runtime / period) the total bandwidth
available on each CPU of each root_domain for -deadline tasks;

- couples the RT and deadline bandwidth management, i.e., enforces
that the sum of how much bandwidth is being devoted to -rt
-deadline tasks to stay below 100%.

This means that, for a root_domain comprising M CPUs, -deadline tasks
can be created until the sum of their bandwidths stay below:

M * (sched_dl_runtime_us / sched_dl_period_us)

It is also possible to disable this bandwidth management logic, and
be thus free of oversubscribing the system up to any arbitrary level.

Signed-off-by: Dario Faggioli
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar

Dario Faggioli
2014-01-13 20:46:42 +0800

17 Dec, 2013

1 commit

1bd53a7ef sched/numa: Drop sysctl_numa_balancing_settle_count sysctl ... Browse Code »

commit 887c290e (sched/numa: Decide whether to favour task or group weights
based on swap candidate relationships) drop the check against
sysctl_numa_balancing_settle_count, this patch remove the sysctl.

Signed-off-by: Wanpeng Li
Acked-by: Mel Gorman
Reviewed-by: Rik van Riel
Acked-by: David Rientjes
Signed-off-by: Peter Zijlstra
Cc: Andrew Morton
Cc: Naoya Horiguchi
Link: http://lkml.kernel.org/r/1386833006-6600-1-git-send-email-liwanp@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar

Wanpeng Li
2013-12-17 22:24:38 +0800

14 Nov, 2013

1 commit

5e30025a3 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core locking changes from Ingo Molnar:
"The biggest changes:

- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.

- move the various kernel locking primitives to the new
kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...

Linus Torvalds
2013-11-14 15:30:30 +0800

13 Nov, 2013

1 commit

7833819d2 kernel/sysctl.c: check return value after call proc_put_char() in __do_proc_doulongvec_minmax() ... Browse Code »

Need to check the return value of proc_put_char(), as was done in
__do_proc_doulongvec_minmax().

Signed-off-by: Chen Gang
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chen Gang
2013-11-13 11:09:33 +0800

12 Nov, 2013

2 commits

39cf275a1 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle are:

- (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
van Riel, Peter Zijlstra et al. Yay!

- optimize preemption counter handling: merge the NEED_RESCHED flag
into the preempt_count variable, by Peter Zijlstra.

- wait.h fixes and code reorganization from Peter Zijlstra

- cfs_bandwidth fixes from Ben Segall

- SMP load-balancer cleanups from Peter Zijstra

- idle balancer improvements from Jason Low

- other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
stop_machine: Fix race between stop_two_cpus() and stop_cpus()
sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
sched: Fix asymmetric scheduling for POWER7
sched: Move completion code from core.c to completion.c
sched: Move wait code from core.c to wait.c
sched: Move wait.c into kernel/sched/
sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
sched: Avoid throttle_cfs_rq() racing with period_timer stopping
sched: Guarantee new group-entities always have weight
sched: Fix hrtimer_cancel()/rq->lock deadlock
sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
sched: Fix race on toggling cfs_bandwidth_used
sched: Remove extra put_online_cpus() inside sched_setaffinity()
sched/rt: Fix task_tick_rt() comment
sched/wait: Fix build breakage
sched/wait: Introduce prepare_to_wait_event()
sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
sched: Remove get_online_cpus() usage
sched: Fix race in migrate_swap_stop()
...

Linus Torvalds
2013-11-12 09:20:12 +0800
ad5d69899 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf updates from Ingo Molnar:
"As a first remark I'd like to note that the way to build perf tooling
has been simplified and sped up, in the future it should be enough for
you to build perf via:

cd tools/perf/
make install

(ie without the -j option.) The build system will figure out the
number of CPUs and will do a parallel build+install.

The various build system inefficiencies and breakages Linus reported
against the v3.12 pull request should now be resolved - please
(re-)report any remaining annoyances or bugs.

Main changes on the perf kernel side:

* Performance optimizations:
. perf ring-buffer code optimizations, by Peter Zijlstra
. perf ring-buffer code optimizations, by Oleg Nesterov
. x86 NMI call-stack processing optimizations, by Peter Zijlstra
. perf context-switch optimizations, by Peter Zijlstra
. perf sampling speedups, by Peter Zijlstra
. x86 Intel PEBS processing speedups, by Peter Zijlstra

* Enhanced hardware support:
. for Intel Ivy Bridge-EP uncore PMUs, by Zheng Yan
. for Haswell transactions, by Andi Kleen, Peter Zijlstra

* Core perf events code enhancements and fixes by Oleg Nesterov:
. for uprobes, if fork() is called with pending ret-probes
. for uprobes platform support code

* New ABI details by Andi Kleen:
. Report x86 Haswell TSX transaction abort cost as weight

Main changes on the perf tooling side (some of these tooling changes
utilize the above kernel side changes):

* 'perf report/top' enhancements:

. Convert callchain children list to rbtree, greatly reducing the
time taken for callchain processing, from Namhyung Kim.

. Add new COMM infrastructure, further improving histogram
processing, from Frédéric Weisbecker, one fix from Namhyung Kim.

. Add /proc/kcore based live-annotation improvements, including
build-id cache support, multi map 'call' instruction navigation
fixes, kcore address validation, objdump workarounds. From
Adrian Hunter.

. Show progress on histogram collapsing, that can take a long
time, from Namhyung Kim.

. Add --max-stack option to limit callchain stack scan in 'top'
and 'report', improving callchain processing when reducing the
stack depth is an option, from Waiman Long.

. Add new option --ignore-vmlinux for perf top, from Willy
Tarreau.

* 'perf trace' enhancements:

. 'perf trace' now can can use a 'perf probe' dynamic tracepoints
to hook into the userspace -> kernel pathname copy so that it
can map fds to pathnames without reading /proc/pid/fd/ symlinks.
From Arnaldo Carvalho de Melo.

. Show VFS path associated with fd in live sessions, using a
'vfs_getname' 'perf probe' created dynamic tracepoint or by
looking at /proc/pid/fd, from Arnaldo Carvalho de Melo.

. Add 'trace' beautifiers for lots of syscall arguments, from
Arnaldo Carvalho de Melo.

. Implement more compact 'trace' output by suppressing zeroed
args, from Arnaldo Carvalho de Melo.

. Show thread COMM by default in 'trace', from Arnaldo Carvalho de
Melo.

. Add option to show full timestamp in 'trace', from David Ahern.

. Add 'record' command in 'trace', to record raw_syscalls:*, from
David Ahern.

. Add summary option to dump syscall statistics in 'trace', from
David Ahern.

. Improve error messages in 'trace', providing hints about system
configuration steps needed for using it, from Ramkumar
Ramachandra.

. 'perf trace' now emits hints as to why tracing is not possible,
helping the user to setup the system to allow tracing in the
desired permission granularity, telling if the problem is due to
debugfs not being mounted or with not enough permission for
!root, /proc/sys/kernel/perf_event_paranoit value, etc. From
Arnaldo Carvalho de Melo.

* 'perf record' enhancements:

. Check maximum frequency rate for record/top, emitting better
error messages, from Jiri Olsa.

. 'perf record' code cleanups, from David Ahern.

. Improve write_output error message in 'perf record', from Adrian
Hunter.

. Allow specifying B/K/M/G unit to the --mmap-pages arguments,
from Jiri Olsa.

. Fix command line callchain attribute tests to handle the new
-g/--call-chain semantics, from Arnaldo Carvalho de Melo.

* 'perf kvm' enhancements:

. Disable live kvm command if timerfd is not supported, from David
Ahern.

. Fix detection of non-core features, from David Ahern.

* 'perf list' enhancements:

. Add usage to 'perf list', from David Ahern.

. Show error in 'perf list' if tracepoints not available, from
Pekka Enberg.

* 'perf probe' enhancements:

. Support "$vars" meta argument syntax for local variables,
allowing asking for all possible variables at a given probe
point to be collected when it hits, from Masami Hiramatsu.

* 'perf sched' enhancements:

. Address the root cause of that 'perf sched' stack initialization
build slowdown, by programmatically setting a big array after
moving the global variable back to the stack. Fix from Adrian
Hunter.

* 'perf script' enhancements:

. Set up output options for in-stream attributes, from Adrian
Hunter.

. Print addr by default for BTS in 'perf script', from Adrian
Juntmer

* 'perf stat' enhancements:

. Improved messages when doing profiling in all or a subset of
CPUs using a workload as the session delimitator, as in:

'perf stat --cpu 0,2 sleep 10s'

from Arnaldo Carvalho de Melo.

. Add units to nanosec-based counters in 'perf stat', from David
Ahern.

. Remove bogus info when using 'perf stat' -e cycles/instructions,
from Ramkumar Ramachandra.

* 'perf lock' enhancements:

. 'perf lock' fixes and cleanups, from Davidlohr Bueso.

* 'perf test' enhancements:

. Fixup PERF_SAMPLE_TRANSACTION handling in sample synthesizing
and 'perf test', from Adrian Hunter.

. Clarify the "sample parsing" test entry, from Arnaldo Carvalho
de Melo.

. Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" test,
from Arnaldo Carvalho de Melo.

. Memory leak fixes in 'perf test', from Felipe Pena.

* 'perf bench' enhancements:

. Change the procps visible command-name of invididual benchmark
tests plus cleanups, from Ingo Molnar.

* Generic perf tooling infrastructure/plumbing changes:

. Separating data file properties from session, code
reorganization from Jiri Olsa.

. Fix version when building out of tree, as when using one of
these:

$ make help | grep perf
perf-tar-src-pkg - Build perf-3.12.0.tar source tarball
perf-targz-src-pkg - Build perf-3.12.0.tar.gz source tarball
perf-tarbz2-src-pkg - Build perf-3.12.0.tar.bz2 source tarball
perf-tarxz-src-pkg - Build perf-3.12.0.tar.xz source tarball
$

from David Ahern.

. Enhance option parse error message, showing just the help lines
of the options affected, from Namhyung Kim.

. libtraceevent updates from upstream trace-cmd repo, from Steven
Rostedt.

. Always use perf_evsel__set_sample_bit to set sample_type, from
Adrian Hunter.

. Memory and mmap leak fixes from Chenggang Qin.

. Assorted build fixes for from David Ahern and Jiri Olsa.

. Speed up and prettify the build system, from Ingo Molnar.

. Implement addr2line directly using libbfd, from Roberto Vitillo.

. Separate the GTK support in a separate libperf-gtk.so DSO, that
is only loaded when --gtk is specified, from Namhyung Kim.

. perf bash completion fixes and improvements from Ramkumar
Ramachandra.

. Support for Openembedded/Yocto -dbg packages, from Ricardo
Ribalda Delgado.

And lots and lots of other fixes and code reorganizations that did not
make it into the list, see the shortlog, diffstat and the Git log for
details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (300 commits)
uprobes: Fix the memory out of bound overwrite in copy_insn()
uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()
perf tools: Remove unneeded include
perf record: Remove post_processing_offset variable
perf record: Remove advance_output function
perf record: Refactor feature handling into a separate function
perf trace: Don't relookup fields by name in each sample
perf tools: Fix version when building out of tree
perf evsel: Ditch evsel->handler.data field
uprobes: Export write_opcode() as uprobe_write_opcode()
uprobes: Introduce arch_uprobe->ixol
uprobes: Kill module_init() and module_exit()
uprobes: Move function declarations out of arch
perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support
perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes
perf: Factor out strncpy() in perf_event_mmap_event()
tools/perf: Add required memory barriers
perf: Fix arch_perf_out_copy_user default
perf: Update a stale comment
perf: Optimize perf_output_begin() -- address calculation
...

Linus Torvalds
2013-11-12 09:06:34 +0800

06 Nov, 2013

1 commit

c90423d1d Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move ... Browse Code »

Conflicts:
kernel/Makefile

There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.

Signed-off-by: Ingo Molnar

Ingo Molnar
2013-11-06 14:50:37 +0800

17 Oct, 2013

1 commit

8eaede49d sysrq: Allow magic SysRq key functions to be disabled through Kconfig ... Browse Code »

Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE)
into a Kconfig variable.

Original version by Bastian Blank .

Signed-off-by: Ben Hutchings
Signed-off-by: Greg Kroah-Hartman

Ben Hutchings
2013-10-17 04:01:44 +0800

09 Oct, 2013

3 commits

de1c9ce6f sched/numa: Skip some page migrations after a shared fault ... Browse Code »

Shared faults can lead to lots of unnecessary page migrations,
slowing down the system, and causing private faults to hit the
per-pgdat migration ratelimit.

This patch adds sysctl numa_balancing_migrate_deferred, which specifies
how many shared page migrations to skip unconditionally, after each page
migration that is skipped because it is a shared fault.

This reduces the number of page migrations back and forth in
shared fault situations. It also gives a strong preference to
the tasks that are already running where most of the memory is,
and to moving the other tasks to near the memory.

Testing this with a much higher scan rate than the default
still seems to result in fewer page migrations than before.

Memory seems to be somewhat better consolidated than previously,
with multi-instance specjbb runs on a 4 node system.

Signed-off-by: Rik van Riel
Signed-off-by: Mel Gorman
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Cc: Srikar Dronamraju
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar

Rik van Riel
2013-10-09 20:48:21 +0800
930aa174f sched/numa: Remove the numa_balancing_scan_period_reset sysctl ... Browse Code »

With scan rate adaptions based on whether the workload has properly
converged or not there should be no need for the scan period reset
hammer. Get rid of it.

Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Cc: Srikar Dronamraju
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar

Mel Gorman
2013-10-09 20:48:18 +0800
3a7053b32 sched/numa: Favour moving tasks towards the preferred node ... Browse Code »

This patch favours moving tasks towards NUMA node that recorded a higher
number of NUMA faults during active load balancing. Ideally this is
self-reinforcing as the longer the task runs on that node, the more faults
it should incur causing task_numa_placement to keep the task running on that
node. In reality a big weakness is that the nodes CPUs can be overloaded
and it would be more efficient to queue tasks on an idle node and migrate
to the new node. This would require additional smarts in the balancer so
for now the balancer will simply prefer to place the task on the preferred
node for a PTE scans which is controlled by the numa_balancing_settle_count
sysctl. Once the settle_count number of scans has complete the schedule
is free to place the task on an alternative node if the load is imbalanced.

[srikar@linux.vnet.ibm.com: Fixed statistics]
Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Cc: Srikar Dronamraju
[ Tunable and use higher faults instead of preferred. ]
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1381141781-10992-23-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar

Mel Gorman
2013-10-09 18:40:26 +0800

04 Oct, 2013

1 commit

723478c8a perf: Enforce 1 as lower limit for perf_event_max_sample_rate ... Browse Code »

/proc/sys/kernel/perf_event_max_sample_rate will accept
negative values as well as 0.

Negative values are unreasonable, and 0 causes a
divide by zero exception in perf_proc_update_handler.

This patch enforces a lower limit of 1.

Signed-off-by: Knut Petersen
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/5242DB0C.4070005@t-online.de
Signed-off-by: Ingo Molnar

Knut Petersen
2013-10-04 16:06:07 +0800

23 Sep, 2013

1 commit

cd64647f0 hung_task: Change sysctl_hung_task_check_count to 'int' ... Browse Code »

As 'sysctl_hung_task_check_count' is 'unsigned long' when this
value is assigned to max_count in check_hung_uninterruptible_tasks(),
it's truncated to 'int' type.

This causes a minor artifact: if we write 2^32 to sysctl.hung_task_check_count,
hung task detection will be effectively disabled.

With this fix, it will still truncate the user input to 32 bits, but
reading sysctl.hung_task_check_count reflects the actual truncated value.

Signed-off-by: Li Zefan
Acked-by: Ingo Molnar
Link: http://lkml.kernel.org/r/523FFF4E.9050401@huawei.com
Signed-off-by: Ingo Molnar

Li Zefan
2013-09-23 17:10:49 +0800

13 Sep, 2013

1 commit

26935fb06 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile 4 from Al Viro:
"list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
super: fix for destroy lrus
list_lru: dynamically adjust node arrays
shrinker: Kill old ->shrink API.
shrinker: convert remaining shrinkers to count/scan API
staging/lustre/libcfs: cleanup linux-mem.h
staging/lustre/ptlrpc: convert to new shrinker API
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
staging/lustre/ldlm: convert to shrinkers to count/scan API
hugepage: convert huge zero page shrinker to new shrinker API
i915: bail out earlier when shrinker cannot acquire mutex
drivers: convert shrinkers to new count/scan API
fs: convert fs shrinkers to new scan/count API
xfs: fix dquot isolation hang
xfs-convert-dquot-cache-lru-to-list_lru-fix
xfs: convert dquot cache lru to list_lru
xfs: rework buffer dispose list tracking
xfs-convert-buftarg-lru-to-generic-code-fix
xfs: convert buftarg LRU to generic code
fs: convert inode and dentry shrinking to be node aware
vmscan: per-node deferred work
...

Linus Torvalds
2013-09-13 06:01:38 +0800