Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

24 Mar, 2006

1 commit

825a46af5 [PATCH] cpuset memory spread basic implementation ... Browse Code »

This patch provides the implementation and cpuset interface for an alternative
memory allocation policy that can be applied to certain kinds of memory
allocations, such as the page cache (file system buffers) and some slab caches
(such as inode caches).

The policy is called "memory spreading." If enabled, it spreads out these
kinds of memory allocations over all the nodes allowed to a task, instead of
preferring to place them on the node where the task is executing.

All other kinds of allocations, including anonymous pages for a tasks stack
and data regions, are not affected by this policy choice, and continue to be
allocated preferring the node local to execution, as modified by the NUMA
mempolicy.

There are two boolean flag files per cpuset that control where the kernel
allocates pages for the file system buffers and related in kernel data
structures. They are called 'memory_spread_page' and 'memory_spread_slab'.

If the per-cpuset boolean flag file 'memory_spread_page' is set, then the
kernel will spread the file system buffers (page cache) evenly over all the
nodes that the faulting task is allowed to use, instead of preferring to put
those pages on the node where the task is running.

If the per-cpuset boolean flag file 'memory_spread_slab' is set, then the
kernel will spread some file system related slab caches, such as for inodes
and dentries evenly over all the nodes that the faulting task is allowed to
use, instead of preferring to put those pages on the node where the task is
running.

The implementation is simple. Setting the cpuset flags 'memory_spread_page'
or 'memory_spread_cache' turns on the per-process flags PF_SPREAD_PAGE or
PF_SPREAD_SLAB, respectively, for each task that is in the cpuset or
subsequently joins that cpuset. In subsequent patches, the page allocation
calls for the affected page cache and slab caches are modified to perform an
inline check for these flags, and if set, a call to a new routine
cpuset_mem_spread_node() returns the node to prefer for the allocation.

The cpuset_mem_spread_node() routine is also simple. It uses the value of a
per-task rotor cpuset_mem_spread_rotor to select the next node in the current
tasks mems_allowed to prefer for the allocation.

This policy can provide substantial improvements for jobs that need to place
thread local data on the corresponding node, but that need to access large
file system data sets that need to be spread across the several nodes in the
jobs cpuset in order to fit. Without this patch, especially for jobs that
might have one thread reading in the data set, the memory allocation across
the nodes in the jobs cpuset can become very uneven.

A couple of Copyright year ranges are updated as well. And a couple of email
addresses that can be found in the MAINTAINERS file are removed.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2006-03-24 23:33:22 +0800

15 Mar, 2006

1 commit

b4fb37662 [PATCH] Page migration documentation update ... Browse Code »

Update the documentation for page migration.

- Fix up bits and pieces in cpusets.txt

- Rework text in vm/page-migration to be clearer and reflect the final
version of page migration in 2.6.16. Mention Andi Kleen's numactl
package that contains user space tools for page migration via
libnuma. Add reference to numa_maps and to the manpage in numactl.

- Add todo list for outstanding issues

Signed-off-by: Christoph Lameter
Acked-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-03-15 13:43:02 +0800

11 Jan, 2006

1 commit

864913f30 cpuset two little doc fixes ... Browse Code »

Two little cpuset documentation fixes.

Signed-off-by: Paul Jackson
Signed-off-by: Adrian Bunk

Paul Jackson
2006-01-11 09:01:38 +0800

09 Jan, 2006

3 commits

90c9cc404 [PATCH] cpuset: remove marker_pid documentation ... Browse Code »

Remove documentation for the cpuset 'marker_pid' feature, that was in the
patch "cpuset: change marker for relative numbering" That patch was previously
pulled from *-mm at my (pj) request.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2006-01-09 12:13:43 +0800
bd5e09cf7 [PATCH] cpuset: document additional features ... Browse Code »

Document the additional cpuset features:
notify_on_release
marker_pid
memory_pressure
memory_pressure_enabled

Rearrange and improve formatting of existing documentation for
cpu_exclusive and mem_exclusive features.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2006-01-09 12:13:43 +0800
45b07ef31 [PATCH] cpusets: swap migration interface ... Browse Code »

Add a boolean "memory_migrate" to each cpuset, represented by a file
containing "0" or "1" in each directory below /dev/cpuset.

It defaults to false (file contains "0"). It can be set true by writing
"1" to the file.

If true, then anytime that a task is attached to the cpuset so marked, the
pages of that task will be moved to that cpuset, preserving, to the extent
practical, the cpuset-relative placement of the pages.

Also anytime that a cpuset so marked has its memory placement changed (by
writing to its "mems" file), the tasks in that cpuset will have their pages
moved to the cpusets new nodes, preserving, to the extent practical, the
cpuset-relative placement of the moved pages.

Signed-off-by: Paul Jackson
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2006-01-09 12:12:43 +0800

31 Oct, 2005

1 commit

33430dc59 [PATCH] Typo fix: explictly -> explicitly ... Browse Code »

(akpm: I don't do typo patches, but one of these is in a printk string)

Signed-off-by: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jean Delvare
2005-10-31 09:37:20 +0800

11 Sep, 2005

1 commit

d533f6718 [PATCH] Spelling fixes for Documentation/ ... Browse Code »

The attached patch fixes the following spelling errors in Documentation/
- double "the"
- Several misspellings of function/functionality
- infomation
- memeory
- Recieved
- wether
and possibly others which I forgot ;-)
Trailing whitespaces on the same line as the typo are also deleted.

Signed-off-by: Tobias Klauser
Signed-off-by: Domen Puncer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tobias Klauser
2005-09-11 01:06:28 +0800

08 Sep, 2005

1 commit

9bf2229f8 [PATCH] cpusets: formalize intermediate GFP_KERNEL containment ... Browse Code »

This patch makes use of the previously underutilized cpuset flag
'mem_exclusive' to provide what amounts to another layer of memory placement
resolution. With this patch, there are now the following four layers of
memory placement available:

1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
3) The current tasks cpuset (GFP_USER allocations constrained to here), and
4) Specific node placement, using mbind and set_mempolicy.

These nest - each layer is a subset (same or within) of the previous.

Layer (2) above is new, with this patch. The call used to check whether a
zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
extended to take a gfp_mask argument, and its logic is extended, in the case
that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
placement is allowed. The definition of GFP_USER, which used to be identical
to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
cpuset_gfp_hardwall_flag patch.

GFP_ATOMIC and GFP_KERNEL allocations will stay within the current tasks
cpuset, so long as any node therein is not too tight on memory, but will
escape to the larger layer, if need be.

The intended use is to allow something like a batch manager to handle several
jobs, each job in its own cpuset, but using common kernel memory for caches
and such. Swapper and oom_kill activity is also constrained to Layer (2). A
task in or below one mem_exclusive cpuset should not cause swapping on nodes
in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
task in another such cpuset. Heavy use of kernel memory for i/o caching and
such by one job should not impact the memory available to jobs in other
non-overlapping mem_exclusive cpusets.

This patch enables providing hardwall, inescapable cpusets for memory
allocations of each job, while sharing kernel memory allocations between
several jobs, in an enclosing mem_exclusive cpuset.

Like Dinakar's patch earlier to enable administering sched domains using the
cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
that had previously done nothing much useful other than restrict what cpuset
configurations were allowed.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2005-09-08 07:57:40 +0800

26 Jun, 2005

1 commit

85d7b9498 [PATCH] Dynamic sched domains: cpuset changes ... Browse Code »

Adds the core update_cpu_domains code and updated cpusets documentation

Signed-off-by: Dinakar Guniguntala
Acked-by: Paul Jackson
Acked-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dinakar Guniguntala
2005-06-26 07:24:45 +0800

21 May, 2005

1 commit

b39c4fab2 [PATCH] cpusets+hotplug+preepmt broken ... Browse Code »

This patch removes the entwining of cpusets and hotplug code in the "No
more Mr. Nice Guy" case of sched.c move_task_off_dead_cpu().

Since the hotplug code is holding a spinlock at this point, we cannot take
the cpuset semaphore, cpuset_sem, as would seem to be required either to
update the tasks cpuset, or to scan up the nested cpuset chain, looking for
the nearest cpuset ancestor that still has some CPUs that are online. So
we just punt and blast the tasks cpus_allowed with all bits allowed.

This reverts these lines of code to what they were before the cpuset patch.
And it updates the cpuset Doc file, to match.

The one known alternative to this that seems to work came from Dinakar
Guniguntala, and required the hotplug code to take the cpuset_sem semaphore
much earlier in its processing. So far as we know, the increased locking
entanglement between cpusets and hot plug of this alternative approach is
not worth doing in this case.

Signed-off-by: Paul Jackson
Acked-by: Nathan Lynch
Acked-by: Dinakar Guniguntala
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2005-05-21 06:48:19 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2 ... Browse Code »
212

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800