Eric Lee / smarc-fsl-linux-kernel

28 Oct, 2010

1 commit

abffc0207 doc: clarify the behaviour of dirty_ratio/dirty_bytes ... Browse Code »

When dirty_ratio or dirty_bytes is written the other parameter is disabled
and set to 0 (in dirty_bytes_handler() / dirty_ratio_handler()).

We do the same for dirty_background_ratio and dirty_background_bytes.

However, in the sysctl documentation, we say that the counterpart becomes
a function of the old value, that is not correct.

Clarify the documentation reporting the actual behaviour.

Reviewed-by: Greg Thelen
Acked-by: David Rientjes
Signed-off-by: Andrea Righi
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Righi
2010-10-28 09:03:08 +0800

10 Aug, 2010

1 commit

ad915c432 oom: enable oom tasklist dump by default ... Browse Code »

The oom killer tasklist dump, enabled with the oom_dump_tasks sysctl, is
very helpful information in diagnosing why a user's task has been killed.
It emits useful information such as each eligible thread's memory usage
that can determine why the system is oom, so it should be enabled by
default.

Signed-off-by: David Rientjes
Acked-by: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2010-08-10 11:44:56 +0800

28 Jun, 2010

1 commit

2174efb6a Documentation/sysctl/vm.txt typo ... Browse Code »

Fix trivial typo: duplicated word.

Signed-off-by: Kulikov Vasiliy
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Kulikov Vasiliy
2010-06-28 19:59:28 +0800

25 May, 2010

2 commits

5e7719058 mm: compaction: add a tunable that decides when memory should be compacted and w… ... Browse Code »

…hen it should be reclaimed

The kernel applies some heuristics when deciding if memory should be
compacted or reclaimed to satisfy a high-order allocation. One of these
is based on the fragmentation. If the index is below 500, memory will not
be compacted. This choice is arbitrary and not based on data. To help
optimise the system and set a sensible default for this value, this patch
adds a sysctl extfrag_threshold. The kernel will only compact memory if
the fragmentation index is above the extfrag_threshold.

[randy.dunlap@oracle.com: Fix build errors when proc fs is not configured]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Mel Gorman
2010-05-25 23:06:59 +0800
76ab0f530 mm: compaction: add /proc trigger for memory compaction ... Browse Code »

Add a proc file /proc/sys/vm/compact_memory. When an arbitrary value is
written to the file, all zones are compacted. The expected user of such a
trigger is a job scheduler that prepares the system before the target
application runs.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Reviewed-by: KOSAKI Motohiro
Reviewed-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2010-05-25 23:06:59 +0800

16 May, 2010

1 commit

3b098e2d7 net: Consistent skb timestamping ... Browse Code »

With RPS inclusion, skb timestamping is not consistent in RX path.

If netif_receive_skb() is used, its deferred after RPS dispatch.

If netif_rx() is used, its done before RPS dispatch.

This can give strange tcpdump timestamps results.

I think timestamping should be done as soon as possible in the receive
path, to get meaningful values (ie timestamps taken at the time packet
was delivered by NIC driver to our stack), even if NAPI already can
defer timestamping a bit (RPS can help to reduce the gap)

Tom Herbert prefer to sample timestamps after RPS dispatch. In case
sampling is expensive (HPET/acpi_pm on x86), this makes sense.

Let admins switch from one mode to another, using a new
sysctl, /proc/sys/net/core/netdev_tstamp_prequeue

Its default value (1), means timestamps are taken as soon as possible,
before backlog queueing, giving accurate timestamps.

Setting a 0 value permits to sample timestamps when processing backlog,
after RPS dispatch, to lower the load of the pre-RPS cpu.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-05-16 14:57:10 +0800

13 Mar, 2010

1 commit

daaf1e688 memcg: handle panic_on_oom=always case ... Browse Code »

Presently, if panic_on_oom=2, the whole system panics even if the oom
happend in some special situation (as cpuset, mempolicy....). Then,
panic_on_oom=2 means painc_on_oom_always.

Now, memcg doesn't check panic_on_oom flag. This patch adds a check.

BTW, how it's useful ?

kdump+panic_on_oom=2 is the last tool to investigate what happens in
oom-ed system. When a task is killed, the sysytem recovers and there will
be few hint to know what happnes. In mission critical system, oom should
never happen. Then, panic_on_oom=2+kdump is useful to avoid next OOM by
knowing precise information via snapshot.

TODO:
- For memcg, it's for isolate system's memory usage, oom-notiifer and
freeze_at_oom (or rest_at_oom) should be implemented. Then, management
daemon can do similar jobs (as kdump) or taking snapshot per cgroup.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: David Rientjes
Cc: Nick Piggin
Reviewed-by: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-03-13 07:52:37 +0800

12 Dec, 2009

1 commit

d75757abd doc: Add documentation for bootloader_{type,version} ... Browse Code »

Add documentation for kernel/bootloader_type and
kernel/bootloader_version to sysctl/kernel.txt. This should really
have been done a long time ago.

Signed-off-by: H. Peter Anvin
Cc: Shen Feng

H. Peter Anvin
2009-12-12 06:28:56 +0800

10 Dec, 2009

1 commit

4ef58d4e2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
tree-wide: fix misspelling of "definition" in comments
reiserfs: fix misspelling of "journaled"
doc: Fix a typo in slub.txt.
inotify: remove superfluous return code check
hdlc: spelling fix in find_pvc() comment
doc: fix regulator docs cut-and-pasteism
mtd: Fix comment in Kconfig
doc: Fix IRQ chip docs
tree-wide: fix assorted typos all over the place
drivers/ata/libata-sff.c: comment spelling fixes
fix typos/grammos in Documentation/edac.txt
sysctl: add missing comments
fs/debugfs/inode.c: fix comment typos
sgivwfb: Make use of ARRAY_SIZE.
sky2: fix sky2_link_down copy/paste comment error
tree-wide: fix typos "couter" -> "counter"
tree-wide: fix typos "offest" -> "offset"
fix kerneldoc for set_irq_msi()
spidev: fix double "of of" in comment
comment typo fix: sybsystem -> subsystem
...

Linus Torvalds
2009-12-10 11:43:33 +0800

04 Dec, 2009

1 commit

af901ca18 tree-wide: fix assorted typos all over the place ... Browse Code »

That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa
Signed-off-by: Jiri Kosina

André Goddard Rosa
2009-12-04 22:39:55 +0800

19 Nov, 2009

1 commit

86926d009 sysctl: Remove CTL_NONE and CTL_UNNUMBERED ... Browse Code »

Now that the sysctl structures no longer have a ctl_name field
there is no reason to retain the definitions for CTL_NONE and
CTL_UNNUMBERED, or to explain their historic usage.

Signed-off-by: Eric W. Biederman

Eric W. Biederman
2009-11-19 00:14:55 +0800

09 Nov, 2009

1 commit

7beeec88e docs: fix core_pipe_limit info ... Browse Code »

Fix typos in core_pipe_limit info.

Signed-off-by: Randy Dunlap
Cc: Neil Horman
Signed-off-by: Jiri Kosina

Randy Dunlap
2009-11-09 16:40:55 +0800

24 Sep, 2009

3 commits

db1682636 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 ... Browse Code »

* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...

Linus Torvalds
2009-09-24 22:53:22 +0800
a293980c2 exec: let do_coredump() limit the number of concurrent dumps to pipes ... Browse Code »

Introduce core pipe limiting sysctl.

Since we can dump cores to pipe, rather than directly to the filesystem,
we create a condition in which a user can create a very high load on the
system simply by running bad applications.

If the pipe reader specified in core_pattern is poorly written, we can
have lots of ourstandig resources and processes in the system.

This sysctl introduces an ability to limit that resource consumption.
core_pipe_limit defines how many in-flight dumps may be run in parallel,
dumps beyond this value are skipped and a note is made in the kernel log.
A special value of 0 in core_pipe_limit denotes unlimited core dumps may
be handled (this is the default value).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Neil Horman
Reported-by: Earl Chew
Cc: Oleg Nesterov
Cc: Andi Kleen
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Horman
2009-09-24 22:21:00 +0800
bcadbbd4c Documentation: update stale definition of file-nr in fs.txt ... Browse Code »

In "documentation: update Documentation/filesystem/proc.txt and
Documentation/sysctls" (commit 760df93ec) we merged /proc/sys/fs
documentation in Documentation/sysctl/fs.txt and
Documentation/filesystem/proc.txt, but stale file-nr definition
remained.

This patch adds back the right fs-nr definition for 2.6 kernel.

Signed-off-by: Xiaotian Feng
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiaotian Feng
2009-09-24 22:20:57 +0800

23 Sep, 2009

1 commit

af91322ef printk: add printk_delay to make messages readable for some scenarios ... Browse Code »

When syslog is not possible, at the same time there's no serial/net
console available, it will be hard to read the printk messages. For
example oops/panic/warning messages in shutdown phase.

Add a printk delay feature, we can make each printk message delay some
milliseconds.

Setting the delay by proc/sysctl interface: /proc/sys/kernel/printk_delay

The value range from 0 - 10000, default value is 0

[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Dave Young
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Young
2009-09-23 22:39:28 +0800

22 Sep, 2009

2 commits

342ff1a1b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
trivial: fix typo in aic7xxx comment
trivial: fix comment typo in drivers/ata/pata_hpt37x.c
trivial: typo in kernel-parameters.txt
trivial: fix typo in tracing documentation
trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
trivial: remove unnecessary semicolons
trivial: Fix duplicated word "options" in comment
trivial: kbuild: remove extraneous blank line after declaration of usage()
trivial: improve help text for mm debug config options
trivial: doc: hpfall: accept disk device to unload as argument
trivial: doc: hpfall: reduce risk that hpfall can do harm
trivial: SubmittingPatches: Fix reference to renumbered step
trivial: fix typos "man[ae]g?ment" -> "management"
trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
trivial: fix missing printk space in amd_k7_smp_check
trivial: fix typo s/ketymap/keymap/ in comment
trivial: fix typo "to to" in multiple files
trivial: fix typos in comments s/DGBU/DBGU/
...

Linus Torvalds
2009-09-22 22:51:45 +0800
55c37a840 vm: document that setting vfs_cache_pressure to 0 isn't a good idea ... Browse Code »

Reported-by: Christian Thaeter
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2009-09-22 22:17:30 +0800

21 Sep, 2009

1 commit

b7f5ab6fb trivial: doc: document missing value 2 for randomize-va-space ... Browse Code »

The documentation for /proc/sys/kernel/* does not mention the possible
value 2 for randomize-va-space yet. While being there, doing some
reformatting, fixing grammar problems and clarifying the correlations
between randomize-va-space, kernel parameter "norandmaps" and the
CONFIG_COMPAT_BRK option.

Signed-off-by: Horst Schirmeier
Signed-off-by: Jiri Kosina

Horst Schirmeier
2009-09-21 21:14:53 +0800

16 Sep, 2009

1 commit

6a46079cf HWPOISON: The high level memory error handler in the VM v7 ... Browse Code »

Add the high level memory handler that poisons pages
that got corrupted by hardware (typically by a two bit flip in a DIMM
or a cache) on the Linux level. The goal is to prevent everyone
from accessing these pages in the future.

This done at the VM level by marking a page hwpoisoned
and doing the appropriate action based on the type of page
it is.

The code that does this is portable and lives in mm/memory-failure.c

To quote the overview comment:

High level machine check handler. Handles pages reported by the
hardware as being corrupted usually due to a 2bit ECC memory or cache
failure.

This focuses on pages detected as corrupted in the background.
When the current CPU tries to consume corruption the currently
running process can just be killed directly instead. This implies
that if the error cannot be handled for some reason it's safe to
just ignore it because no corruption has been consumed yet. Instead
when that happens another machine check will happen.

Handles page cache pages in various states. The tricky part
here is that we can access any page asynchronous to other VM
users, because memory failures could happen anytime and anywhere,
possibly violating some of their assumptions. This is why this code
has to be extremely careful. Generally it tries to use normal locking
rules, as in get the standard locks, even if that means the
error handling takes potentially a long time.

Some of the operations here are somewhat inefficient and have non
linear algorithmic complexity, because the data structures have not
been optimized for this case. This is in particular the case
for the mapping from a vma to a process. Since this case is expected
to be rare we hope we can get away with this.

There are in principle two strategies to kill processes on poison:
- just unmap the data and wait for an actual reference before
killing
- kill as soon as corruption is detected.
Both have advantages and disadvantages and should be used
in different situations. Right now both are implemented and can
be switched with a new sysctl vm.memory_failure_early_kill
The default is early kill.

The patch does some rmap data structure walking on its own to collect
processes to kill. This is unusual because normally all rmap data structure
knowledge is in rmap.c only. I put it here for now to keep
everything together and rmap knowledge has been seeping out anyways

Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
Nick Piggin (who did a lot of great work) and others.

Cc: npiggin@suse.de
Cc: riel@redhat.com
Signed-off-by: Andi Kleen
Acked-by: Rik van Riel
Reviewed-by: Hidehiro Kawai

Andi Kleen
2009-09-16 17:50:15 +0800

11 Sep, 2009

1 commit

c114728af [S390] add call home support ... Browse Code »

Signed-off-by: Hans-Joachim Picht
Signed-off-by: Martin Schwidefsky

Hans-Joachim Picht
2009-09-11 16:29:49 +0800

17 Jun, 2009

2 commits

90afa5de6 vmscan: properly account for the number of page cache pages zone_reclaim() can reclaim ... Browse Code »

A bug was brought to my attention against a distro kernel but it affects
mainline and I believe problems like this have been reported in various
guises on the mailing lists although I don't have specific examples at the
moment.

The reported problem was that malloc() stalled for a long time (minutes in
some cases) if a large tmpfs mount was occupying a large percentage of
memory overall. The pages did not get cleaned or reclaimed by
zone_reclaim() because the zone_reclaim_mode was unsuitable, but the lists
are uselessly scanned frequencly making the CPU spin at near 100%.

This patchset intends to address that bug and bring the behaviour of
zone_reclaim() more in line with expectations which were noticed during
investigation. It is based on top of mmotm and takes advantage of
Kosaki's work with respect to zone_reclaim().

Patch 1 fixes the heuristics that zone_reclaim() uses to determine if the
scan should go ahead. The broken heuristic is what was causing the
malloc() stall as it uselessly scanned the LRU constantly. Currently,
zone_reclaim is assuming zone_reclaim_mode is 1 and historically it
could not deal with tmpfs pages at all. This fixes up the heuristic so
that an unnecessary scan is more likely to be correctly avoided.

Patch 2 notes that zone_reclaim() returning a failure automatically means
the zone is marked full. This is not always true. It could have
failed because the GFP mask or zone_reclaim_mode were unsuitable.

Patch 3 introduces a counter zreclaim_failed that will increment each
time the zone_reclaim scan-avoidance heuristics fail. If that
counter is rapidly increasing, then zone_reclaim_mode should be
set to 0 as a temporarily resolution and a bug reported because
the scan-avoidance heuristic is still broken.

This patch:

On NUMA machines, the administrator can configure zone_reclaim_mode that
is a more targetted form of direct reclaim. On machines with large NUMA
distances for example, a zone_reclaim_mode defaults to 1 meaning that
clean unmapped pages will be reclaimed if the zone watermarks are not
being met.

There is a heuristic that determines if the scan is worthwhile but the
problem is that the heuristic is not being properly applied and is
basically assuming zone_reclaim_mode is 1 if it is enabled. The lack of
proper detection can manfiest as high CPU usage as the LRU list is scanned
uselessly.

Historically, once enabled it was depending on NR_FILE_PAGES which may
include swapcache pages that the reclaim_mode cannot deal with. Patch
vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by
Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included
pages that were not file-backed such as swapcache and made a calculation
based on the inactive, active and mapped files. This is far superior when
zone_reclaim==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a
reasonable starting figure.

This patch alters how zone_reclaim() works out how many pages it might be
able to reclaim given the current reclaim_mode. If RECLAIM_SWAP is set in
the reclaim_mode it will either consider NR_FILE_PAGES as potential
candidates or else use NR_{IN}ACTIVE}_PAGES-NR_FILE_MAPPED to discount
swapcache and other non-file-backed pages. If RECLAIM_WRITE is not set,
then NR_FILE_DIRTY number of pages are not candidates. If RECLAIM_SWAP is
not set, then NR_FILE_MAPPED are not.

[kosaki.motohiro@jp.fujitsu.com: Estimate unmapped pages minus tmpfs pages]
[fengguang.wu@intel.com: Fix underflow problem in Kosaki's estimate]
Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Acked-by: Christoph Lameter
Cc: KOSAKI Motohiro
Cc: Wu Fengguang
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2009-06-17 10:47:45 +0800
418589663 page allocator: use allocation flags as an index to the zone watermark ... Browse Code »

ALLOC_WMARK_MIN, ALLOC_WMARK_LOW and ALLOC_WMARK_HIGH determin whether
pages_min, pages_low or pages_high is used as the zone watermark when
allocating the pages. Two branches in the allocator hotpath determine
which watermark to use.

This patch uses the flags as an array index into a watermark array that is
indexed with WMARK_* defines accessed via helpers. All call sites that
use zone->pages_* are updated to use the helpers for accessing the values
and the array offsets for setting.

Signed-off-by: Mel Gorman
Reviewed-by: Christoph Lameter
Cc: KOSAKI Motohiro
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Nick Piggin
Cc: Dave Hansen
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2009-06-17 10:47:35 +0800

13 Jun, 2009

1 commit

19f594600 trivial: Miscellaneous documentation typo fixes ... Browse Code »

Fix various typos in documentation txts.

Signed-off-by: Matt LaPlante
Signed-off-by: Jiri Kosina

Matt LaPlante
2009-06-13 00:01:47 +0800

22 May, 2009

1 commit

2c9e703c6 Merge branch 'master' into next ... Browse Code »

Conflicts:
fs/exec.c

Removed IMA changes (the IMA checks are now performed via may_open()).

Signed-off-by: James Morris

James Morris
2009-05-22 16:40:59 +0800

15 May, 2009

1 commit

cd17cbfda Revert "mm: add /proc controls for pdflush threads" ... Browse Code »

This reverts commit fafd688e4c0c34da0f3de909881117d374e4c7af.

Work is progressing to switch away from pdflush as the process backing
for flushing out dirty data. So it seems pointless to add more knobs
to control pdflush threads. The original author of the patch did not
have any specific use cases for adding the knobs, so we can easily
revert this before 2.6.30 to avoid having to maintain this API
forever.

Signed-off-by: Jens Axboe

Jens Axboe
2009-05-15 17:32:24 +0800

08 May, 2009

1 commit

d25411709 Merge branch 'master' into next Browse Code »

James Morris
2009-05-08 15:56:47 +0800

03 May, 2009

1 commit

9e4a5bda8 mm: prevent divide error for small values of vm_dirty_bytes ... Browse Code »

Avoid setting less than two pages for vm_dirty_bytes: this is necessary to
avoid potential division by 0 (like the following) in get_dirty_limits().

[ 49.951610] divide error: 0000 [#1] PREEMPT SMP
[ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
[ 49.952195] CPU 1
[ 49.952195] Modules linked in: pcspkr
[ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
[ 49.952195] RIP: 0010:[] [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202
[ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
[ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
[ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
[ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
[ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
[ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
[ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
[ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
[ 49.952195] Stack:
[ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
[ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
[ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
[ 49.952195] Call Trace:
[ 49.952195] [] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
[ 49.952195] [] ? ext3_writeback_write_end+0x9e/0x120
[ 49.952195] [] generic_file_buffered_write+0x12f/0x330
[ 49.952195] [] __generic_file_aio_write_nolock+0x26d/0x460
[ 49.952195] [] ? generic_file_aio_write+0x52/0xd0
[ 49.952195] [] generic_file_aio_write+0x69/0xd0
[ 49.952195] [] ext3_file_write+0x26/0xc0
[ 49.952195] [] do_sync_write+0xf1/0x140
[ 49.952195] [] ? get_lock_stats+0x2a/0x60
[ 49.952195] [] ? autoremove_wake_function+0x0/0x40
[ 49.952195] [] vfs_write+0xcb/0x190
[ 49.952195] [] sys_write+0x50/0x90
[ 49.952195] [] system_call_fastpath+0x16/0x1b
[ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
[ 49.952195] RIP [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP
[ 50.096523] ---[ end trace 008d7aa02f244d7b ]---

Signed-off-by: Andrea Righi
Cc: Peter Zijlstra
Cc: David Rientjes
Cc: Dave Chinner
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Righi
2009-05-03 06:36:10 +0800

14 Apr, 2009

1 commit

ca8b99502 Documentation/sysctl/net.txt: fix a typo ... Browse Code »

s/spicified/specified

Signed-off-by: Li Zefan
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-04-14 06:04:28 +0800

07 Apr, 2009

1 commit

fafd688e4 mm: add /proc controls for pdflush threads ... Browse Code »

Add /proc entries to give the admin the ability to control the minimum and
maximum number of pdflush threads. This allows finer control of pdflush
on both large and small machines.

The rationale is simply one size does not fit all. Admins on large and/or
small systems may want to tune the min/max pdflush thread count to best
suit their needs. Right now the min/max is hardcoded to 2/8. While
probably a fair estimate for smaller machines, large machines with large
numbers of CPUs and large numbers of filesystems/block devices may benefit
from larger numbers of threads working on different block devices.

Even if the background flushing algorithm is radically changed, it is
still likely that multiple threads will be involved and admins would still
desire finer control on the min/max other than to have to recompile the
kernel.

The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
'/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.

The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
the current value of nr_pdflush_threads_max. This minimum is required
since additional thread creation is performed in a pdflush thread itself.

The minimum value for nr_pdflush_threads_max is the current value of
nr_pdflush_threads_min and the maximum value can be 1000.

Documentation/sysctl/vm.txt is also updated.

[akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
Signed-off-by: Peter W Morreale
Reviewed-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter W Morreale
2009-04-07 23:31:03 +0800

03 Apr, 2009

3 commits

45dad7bd9 documentation: fix unix_dgram_qlen description ... Browse Code »

Previous description about system parameter in /proc/sys/net/unix/ is
wrong (or missed). Simply add a new description about unix_dgram_qlen
according to latest kernel.

Signed-off-by: Li Xiaodong
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Xiaodong
2009-04-03 10:04:53 +0800
760df93ec documentation: update Documentation/filesystem/proc.txt and Documentation/sysctls ... Browse Code »

Now /proc/sys is described in many places and much information is
redundant. This patch updates the proc.txt and move the /proc/sys
desciption out to the files in Documentation/sysctls.

Details are:

merge
- 2.1 /proc/sys/fs - File system data
- 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
- 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface
with Documentation/sysctls/fs.txt.

remove
- 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
since it's not better then the Documentation/binfmt_misc.txt.

merge
- 2.3 /proc/sys/kernel - general kernel parameters
with Documentation/sysctls/kernel.txt

remove
- 2.5 /proc/sys/dev - Device specific parameters
since it's obsolete the sysfs is used now.

remove
- 2.6 /proc/sys/sunrpc - Remote procedure calls
since it's not better then the Documentation/sysctls/sunrpc.txt

move
- 2.7 /proc/sys/net - Networking stuff
- 2.9 Appletalk
- 2.10 IPX
to newly created Documentation/sysctls/net.txt.

remove
- 2.8 /proc/sys/net/ipv4 - IPV4 settings
since it's not better then the Documentation/networking/ip-sysctl.txt.

add
- Chapter 3 Per-Process Parameters
to descibe /proc//xxx parameters.

Signed-off-by: Shen Feng
Cc: Randy Dunlap
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shen Feng
2009-04-03 10:04:53 +0800
3d43321b7 modules: sysctl to block module loading ... Browse Code »

Implement a sysctl file that disables module-loading system-wide since
there is no longer a viable way to remove CAP_SYS_MODULE after the system
bounding capability set was removed in 2.6.25.

Value can only be set to "1", and is tested only if standard capability
checks allow CAP_SYS_MODULE. Given existing /dev/mem protections, this
should allow administrators a one-way method to block module loading
after initial boot-time module loading has finished.

Signed-off-by: Kees Cook
Acked-by: Serge Hallyn
Signed-off-by: James Morris

Kees Cook
2009-04-03 08:47:11 +0800

16 Jan, 2009

1 commit

db0fb1848 Update of Documentation: vm.txt and proc.txt ... Browse Code »

Update Documentation/sysctl/vm.txt and Documentation/filesystems/proc.txt.
More specifically, the section on /proc/sys/vm in
Documentation/filesystems/proc.txt was removed and a link to
Documentation/sysctl/vm.txt added.

Most of the verbiage from proc.txt was simply moved in vm.txt, with new
addtional text for "swappiness" and "stat_interval".

Signed-off-by: Peter W Morreale
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter W Morreale
2009-01-16 08:39:35 +0800

08 Jan, 2009

1 commit

dd8632a12 NOMMU: Make mmap allocation page trimming behaviour configurable. ... Browse Code »

NOMMU mmap allocates a piece of memory for an mmap that's rounded up in size to
the nearest power-of-2 number of pages. Currently it then discards the excess
pages back to the page allocator, making that memory available for use by other
things. This can, however, cause greater amount of fragmentation.

To counter this, a sysctl is added in order to fine-tune the trimming
behaviour. The default behaviour remains to trim pages aggressively, while
this can either be disabled completely or set to a higher page-granular
watermark in order to have finer-grained control.

vm region vm_top bits taken from an earlier patch by David Howells.

Signed-off-by: Paul Mundt
Signed-off-by: David Howells
Tested-by: Mike Frysinger

Paul Mundt
2009-01-08 20:04:47 +0800

07 Jan, 2009

1 commit

2da02997e mm: add dirty_background_bytes and dirty_bytes sysctls ... Browse Code »

This change introduces two new sysctls to /proc/sys/vm:
dirty_background_bytes and dirty_bytes.

dirty_background_bytes is the counterpart to dirty_background_ratio and
dirty_bytes is the counterpart to dirty_ratio.

With growing memory capacities of individual machines, it's no longer
sufficient to specify dirty thresholds as a percentage of the amount of
dirtyable memory over the entire system.

dirty_background_bytes and dirty_bytes specify quantities of memory, in
bytes, that represent the dirty limits for the entire system. If either
of these values is set, its value represents the amount of dirty memory
that is needed to commence either background or direct writeback.

When a `bytes' or `ratio' file is written, its counterpart becomes a
function of the written value. For example, if dirty_bytes is written to
be 8096, 8K of memory is required to commence direct writeback.
dirty_ratio is then functionally equivalent to 8K / the amount of
dirtyable memory:

dirtyable_memory = free pages + mapped pages + file cache

dirty_background_bytes = dirty_background_ratio * dirtyable_memory
-or-
dirty_background_ratio = dirty_background_bytes / dirtyable_memory

AND

dirty_bytes = dirty_ratio * dirtyable_memory
-or-
dirty_ratio = dirty_bytes / dirtyable_memory

Only one of dirty_background_bytes and dirty_background_ratio may be
specified at a time, and only one of dirty_bytes and dirty_ratio may be
specified. When one sysctl is written, the other appears as 0 when read.

The `bytes' files operate on a page size granularity since dirty limits
are compared with ZVC values, which are in page units.

Prior to this change, the minimum dirty_ratio was 5 as implemented by
get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
written value between 0 and 100. This restriction is maintained, but
dirty_bytes has a lower limit of only one page.

Also prior to this change, the dirty_background_ratio could not equal or
exceed dirty_ratio. This restriction is maintained in addition to
restricting dirty_background_bytes. If either background threshold equals
or exceeds that of the dirty threshold, it is implicitly set to half the
dirty threshold.

Acked-by: Peter Zijlstra
Cc: Dave Chinner
Cc: Christoph Lameter
Signed-off-by: David Rientjes
Cc: Andrea Righi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2009-01-07 07:59:03 +0800

30 Oct, 2008

1 commit

bb20698d4 Document kernel taint flags properly ... Browse Code »

This fills in the documentation for all of the current kernel taint
flags, and fixes the number for TAINT_CRAP, which was incorrectly
described.

Cc: Michael Kerrisk
Cc: Randy Dunlap
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-10-30 06:03:49 +0800

11 Oct, 2008

1 commit

061b1bd39 Staging: add TAINT_CRAP for all drivers/staging code ... Browse Code »

We need to add a flag for all code that is in the drivers/staging/
directory to prevent all other kernel developers from worrying about
issues here, and to notify users that the drivers might not be as good
as they are normally used to.

Based on code from Andreas Gruenbacher and Jeff Mahoney to provide a
TAINT flag for the support level of a kernel module in the Novell
enterprise kernel release.

This is the kernel portion of this feature, the ability for the flag to
be set needs to be done in the build process and will happen in a
follow-up patch.

Cc: Andreas Gruenbacher
Cc: Jeff Mahoney
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-10-11 06:31:05 +0800

23 Sep, 2008

1 commit

b4d19cc84 Documentation/sysctl/kernel.txt: fix softlockup_thresh description ... Browse Code »

- s/s/seconds/

- s/10 seconds/60 seconds/

- Mention the zero-disables-it feature.

Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2008-09-23 23:09:14 +0800

27 Jul, 2008

1 commit

d91958815 Documentation cleanup: trivial misspelling, punctuation, and grammar corrections. ... Browse Code »

Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt LaPlante
2008-07-27 03:00:06 +0800