26 Mar, 2006

14 commits

  • Hugh is rightly concerned that the CONFIG_DEBUG_VM coverage has gone too
    far in vm_normal_page, considering that we expect production kernels to be
    shipped with the option turned off, and that the code has seen some large
    changes recently.

    Signed-off-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
    BUG_ON() Conversion in drivers/video/
    BUG_ON() Conversion in drivers/parisc/
    BUG_ON() Conversion in drivers/block/
    BUG_ON() Conversion in sound/sparc/cs4231.c
    BUG_ON() Conversion in drivers/s390/block/dasd.c
    BUG_ON() Conversion in lib/swiotlb.c
    BUG_ON() Conversion in kernel/cpu.c
    BUG_ON() Conversion in ipc/msg.c
    BUG_ON() Conversion in block/elevator.c
    BUG_ON() Conversion in fs/coda/
    BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
    BUG_ON() Conversion in input/serio/hil_mlc.c
    BUG_ON() Conversion in md/dm-hw-handler.c
    BUG_ON() Conversion in md/bitmap.c
    The comment describing how MS_ASYNC works in msync.c is confusing
    rcu: undeclared variable used in documentation
    fix typos "wich" -> "which"
    typo patch for fs/ufs/super.c
    Fix simple typos
    tabify drivers/char/Makefile
    ...

    Linus Torvalds
     
  • The "rounded up to nearest power of 2 in size" algorithm in
    alloc_large_system_hash is not correct. As coded, it takes an otherwise
    acceptable power-of-2 value and doubles it. For example, we see the error
    if we boot with thash_entries=2097152, which produces a hash table with
    4194304 entries.

    Signed-off-by: John Hawkes
    Cc: Roland Dreier
    Cc: "Chen, Kenneth W"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hawkes
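
    A minimal standalone illustration of the failure mode (not the kernel's
    exact code): shifting one past the top bit always doubles, even when the
    input is already a power of 2.

        /* Buggy: 1UL << (log2(n) + 1) turns 2097152 into 4194304. */
        static unsigned long round_up_buggy(unsigned long n)
        {
                unsigned long log = 0;

                while (n >> (log + 1))
                        log++;                  /* log = floor(log2(n)) */
                return 1UL << (log + 1);
        }

        /* Fixed: leave an exact power of 2 alone. */
        static unsigned long round_up_fixed(unsigned long n)
        {
                if ((n & (n - 1)) == 0)
                        return n;               /* already a power of 2 */
                return round_up_buggy(n);       /* genuine round-up */
        }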
     
  • A couple of places are forgetting to take it (tasklist_lock).

    The kswapd case is probably unimportant. keventd_create_kthread() was racy.

    The whole thing is a bit flaky: you start a kernel thread, get its pid from
    kernel_thread(), then look up its task_struct.

    a) It assumes that pid recycling takes a "long" time.

    b) We get a task_struct, but no reference is taken on it. The owner of the
    kswapd and kthread task_struct*'s must assume that the new thread won't
    exit unexpectedly. Because if it does, they're left holding dead memory
    and any attempt to control or stop that task will crash.

    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
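
    A hedged sketch of the safer pattern with 2.6-era primitives: hold
    tasklist_lock across the pid lookup, and pin the task before using it.

        pid = kernel_thread(fn, arg, CLONE_FS | CLONE_FILES);
        /* ... */
        read_lock(&tasklist_lock);      /* stop the pid->task mapping moving */
        task = find_task_by_pid(pid);
        if (task)
                get_task_struct(task);  /* b): take a real reference */
        read_unlock(&tasklist_lock);
        /* ... later, when done with it: put_task_struct(task); */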
     
  • In zone_pcp_init we print out all zones even if they are empty:

    On node 0 totalpages: 245760
    DMA zone: 245760 pages, LIFO batch:31
    DMA32 zone: 0 pages, LIFO batch:0
    Normal zone: 0 pages, LIFO batch:0
    HighMem zone: 0 pages, LIFO batch:0

    To conserve dmesg space, print only the non-empty zones.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
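
    The fix is a one-line guard in zone_pcp_init()'s loop; roughly:

        /* Only describe zones that actually contain pages. */
        if (zone->present_pages)
                printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%d\n",
                       zone->name, zone->present_pages, batch);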
     
  • The page migration code could function without NUMA, but we currently have
    no users for the non-NUMA case.

    Signed-off-by: Christoph Lameter
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • We have had this memory leak for a while now. The situation is complicated
    by the use of alloc_kmemlist() as a function to resize various caches by
    do_tune_cpucache().

    What we do here is first of all make sure that we deallocate properly in
    the loop over all the nodes.

    If we are just resizing caches then we can simply return with -ENOMEM if an
    allocation fails.

    If the cache is new then we need to rollback and remove all earlier
    allocations.

    We detect that a cache is new by checking whether the link to the global
    cache chain has been set up. This is a bit hackish ....

    (Also fix up some overlong lines that I added in the last patch...)

    Signed-off-by: Christoph Lameter
    Cc: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
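
    A hedged sketch of the resulting control flow (setup_node_lists() and
    free_node_lists() are hypothetical stand-ins for the per-node work):

        for_each_online_node(node) {
                if (setup_node_lists(cachep, node))    /* hypothetical */
                        goto fail;
        }
        return 0;

        fail:
        if (!cachep->next.next) {
                /* New cache, not yet linked onto the global cache chain:
                 * roll back whatever the earlier iterations allocated. */
                for_each_online_node(node)
                        free_node_lists(cachep, node); /* hypothetical */
        }
        return -ENOMEM;         /* plain resize: caller copes with failure */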
     
  • Inspired by Jesper Juhl's patch from today:

    1. Get rid of err.
    We never set it to anything other than zero.

    2. Drop the CONFIG_NUMA stuff.
    There are definitions for alloc_alien_cache() and free_alien_cache()
    that do the right thing for the non-NUMA case.

    3. Better naming of variables.

    4. Remove redundant cachep->nodelists[node] expressions.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • __drain_alien_cache() currently drains objects by freeing them to the
    (remote) freelists of the original node. However, each node also has a
    shared list containing objects to be used on any processor of that node.
    We can avoid a number of remote node accesses by copying the pointers to
    the free objects directly into the remote shared array.

    And while we are at it: Skip alien draining if the alien cache spinlock is
    already taken.

    Kiran reported that this is a performance benefit.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
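
    A sketch of both pieces (close to the mm/slab.c structures of the time):

        static void __drain_alien_cache(struct kmem_cache *cachep,
                                        struct array_cache *ac, int node)
        {
                struct kmem_list3 *l3 = cachep->nodelists[node];

                if (!ac->avail)
                        return;
                spin_lock(&l3->list_lock);
                /* Bulk-move pointers into the remote shared array ... */
                if (l3->shared)
                        transfer_objects(l3->shared, ac, ac->limit);
                /* ... and free whatever did not fit the old way. */
                free_block(cachep, ac->entry, ac->avail, node);
                ac->avail = 0;
                spin_unlock(&l3->list_lock);
        }

        /* Periodic reaping: skip the drain if the lock is contended. */
        if (ac && ac->avail && spin_trylock_irq(&ac->lock)) {
                __drain_alien_cache(cachep, ac, node);
                spin_unlock_irq(&ac->lock);
        }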
     
  • transfer_objects() can be used to transfer objects between the various
    object caches of the slab allocator. It is currently only used during
    __cache_alloc() to retrieve elements from the shared array. We will be
    using it soon to transfer elements from the alien caches to the remote
    shared array.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
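
    Its shape (a sketch, close to what mm/slab.c gained):

        /* Move up to `max' object pointers from one array_cache to another. */
        static int transfer_objects(struct array_cache *to,
                                    struct array_cache *from, unsigned int max)
        {
                int nr = min(min(from->avail, max), to->limit - to->avail);

                if (!nr)
                        return 0;
                memcpy(to->entry + to->avail, from->entry + from->avail - nr,
                       sizeof(void *) * nr);
                from->avail -= nr;
                to->avail += nr;
                to->touched = 1;
                return nr;
        }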
     
  • Convert mm/ to use the new kmem_cache_zalloc allocator.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • As suggested by Eric Dumazet, optimize kzalloc() calls that pass a
    compile-time constant size. Please note that the patch increases kernel
    text slightly (~200 bytes for defconfig on x86).

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
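
    The shape of the change (a hedged sketch; find_sized_cache() is a
    hypothetical stand-in for the real compile-time cache-selection machinery
    in <linux/slab.h>):

        static inline void *kzalloc(size_t size, gfp_t flags)
        {
                if (__builtin_constant_p(size)) {
                        /* size is a literal: the matching sized cache can
                         * be chosen at compile time, no runtime lookup */
                        return kmem_cache_zalloc(find_sized_cache(size),
                                                 flags);
                }
                return __kzalloc(size, flags);  /* out-of-line fallback */
        }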
     
  • Introduce a memory-zeroing variant of kmem_cache_alloc. The allocator
    already exists in XFS, and there are potential users for it, so this patch
    makes the allocator available to the general public.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
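
    Semantically it is just allocate-then-zero; a minimal sketch:

        void *kmem_cache_zalloc(struct kmem_cache *cache, gfp_t flags)
        {
                void *ret = kmem_cache_alloc(cache, flags);

                if (ret)
                        memset(ret, 0, kmem_cache_size(cache));
                return ret;
        }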
     
  • Implement /proc/slab_allocators. It produces output like:

    idr_layer_cache: 80 idr_pre_get+0x33/0x4e
    buffer_head: 2555 alloc_buffer_head+0x20/0x75
    mm_struct: 9 mm_alloc+0x1e/0x42
    mm_struct: 20 dup_mm+0x36/0x370
    vm_area_struct: 384 dup_mm+0x18f/0x370
    vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
    vm_area_struct: 1 split_vma+0x5a/0x10e
    vm_area_struct: 11 do_brk+0x206/0x2e2
    vm_area_struct: 2 copy_vma+0xda/0x142
    vm_area_struct: 9 setup_arg_pages+0x99/0x214
    fs_cache: 8 copy_fs_struct+0x21/0x133
    fs_cache: 29 copy_process+0xf38/0x10e3
    files_cache: 30 alloc_files+0x1b/0xcf
    signal_cache: 81 copy_process+0xbaa/0x10e3
    sighand_cache: 77 copy_process+0xe65/0x10e3
    sighand_cache: 1 de_thread+0x4d/0x5f8
    anon_vma: 241 anon_vma_prepare+0xd9/0xf3
    size-2048: 1 add_sect_attrs+0x5f/0x145
    size-2048: 2 journal_init_revoke+0x99/0x302
    size-2048: 2 journal_init_revoke+0x137/0x302
    size-2048: 2 journal_init_inode+0xf9/0x1c4

    Cc: Manfred Spraul
    Cc: Alexander Nyberg
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Ravikiran Thirumalai
    Signed-off-by: Al Viro
    DESC
    slab-leaks3-locking-fix
    EDESC
    From: Andrew Morton

    Update for slab-remove-cachep-spinlock.patch

    Cc: Al Viro
    Cc: Manfred Spraul
    Cc: Alexander Nyberg
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Ravikiran Thirumalai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     

25 Mar, 2006

1 commit


24 Mar, 2006

17 commits

  • This patch series creates a strndup_user() function to ease copying C
    strings from userspace. It also avoids common pitfalls, like userspace
    modifying the final \0 after the strlen_user().

    Signed-off-by: Davi Arnaut
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davi Arnaut
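
    A sketch of the helper's shape (close to what mm/util.c gained); the
    forced terminator is what closes the race on the final \0:

        char *strndup_user(const char __user *s, long n)
        {
                long len = strnlen_user(s, n);  /* counts the trailing '\0' */
                char *p;

                if (!len)
                        return ERR_PTR(-EFAULT);
                if (len > n)
                        return ERR_PTR(-EINVAL);
                p = kmalloc(len, GFP_KERNEL);
                if (!p)
                        return ERR_PTR(-ENOMEM);
                if (copy_from_user(p, s, len)) {
                        kfree(p);
                        return ERR_PTR(-EFAULT);
                }
                p[len - 1] = '\0';      /* don't trust userspace's one */
                return p;
        }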
     
  • No need to duplicate all that code.

    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • msync() does a strange thing. Essentially:

        vma = find_vma();
        for ( ; ; ) {
                if (!vma)
                        return -ENOMEM;
                ...
                vma = vma->vm_next;
        }

    so an msync() request which starts within or before a valid VMA and which ends
    within or beyond the final VMA will incorrectly return -ENOMEM.

    Fix.

    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
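
    A sketch of the corrected loop shape: holes are noted but only reported,
    and reaching `end' inside a vma is success:

        int unmapped_error = 0;

        vma = find_vma(mm, start);
        for (;;) {
                if (!vma)
                        return -ENOMEM;           /* tail of range unmapped */
                if (start < vma->vm_start) {
                        unmapped_error = -ENOMEM; /* hole: note it, go on */
                        start = vma->vm_start;
                }
                /* ... sync [start, min(end, vma->vm_end)) ... */
                if (end <= vma->vm_end)
                        return unmapped_error;    /* walked the whole range */
                start = vma->vm_end;
                vma = vma->vm_next;
        }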
     
  • It seems bad to hold mmap_sem while performing synchronous disk I/O. Alter
    the msync(MS_SYNC) code so that the lock is released while we sync the file.

    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
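
    The trick, sketched (do_fsync() standing in for the synchronous writeout
    path; the vma must be looked up again once the lock is retaken):

        if (file && (vma->vm_flags & VM_SHARED)) {
                get_file(file);             /* keep the file alive ...     */
                up_read(&mm->mmap_sem);     /* ... while we sleep unlocked */
                error = do_fsync(file, 0);
                fput(file);
                down_read(&mm->mmap_sem);
                vma = find_vma(mm, start);  /* the old vma may be gone */
        }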
     
  • It seems sensible to perform dirty page throttling in msync: as the application
    dirties pages we can kick off pdflush early, or even force the msync() caller
    to perform writeout, or even throttle the msync() caller.

    The main effect of this is to start disk writeback earlier if we've just
    discovered that a large amount of pagecache has been dirtied. (Otherwise it
    wouldn't happen for up to five seconds, next time pdflush wakes up).

    It will also cause the page-dirtying process to be penalised for dirtying
    those pages rather than whacking someone else with the problem.

    We should do this for munmap() and possibly even exit(), too.

    We drop the mmap_sem while performing the dirty page balancing. It doesn't
    seem right to hold mmap_sem for that long.

    Note that this patch only affects MS_ASYNC. MS_SYNC will be syncing all the
    dirty pages anyway.

    We note that msync(MS_SYNC) does a full-file-sync inside mmap_sem, and always
    has. We can fix that up...

    The patch also tightens up the mmap_sem coverage in sys_msync(): no point in
    taking it while we perform the incoming arg checking.

    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • We need set_page_dirty() to return true if it actually transitioned the page
    from a clean to dirty state. This wasn't right in a couple of places. Do a
    kernel-wide audit, fix things up.

    This leaves open the possibility of returning a negative errno from
    set_page_dirty() sometime in the future. But we don't do that at present.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Modify balance_dirty_pages_ratelimited() so that it can take a
    number-of-pages-which-I-just-dirtied argument. For msync().

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
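
    The resulting API shape, roughly (existing callers become the nr == 1
    case):

        void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
                                                unsigned long nr_pages_dirtied);

        static inline void
        balance_dirty_pages_ratelimited(struct address_space *mapping)
        {
                balance_dirty_pages_ratelimited_nr(mapping, 1);
        }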
     
  • Add two new Linux-specific fadvise() extensions:

    LINUX_FADV_ASYNC_WRITE: start async writeout of any dirty pages between file
    offsets `offset' and `offset+len'. Any pages which are currently under
    writeout are skipped, whether or not they are dirty.

    LINUX_FADV_WRITE_WAIT: wait upon writeout of any dirty pages between file
    offsets `offset' and `offset+len'.

    By combining these two operations the application may do several things:

    LINUX_FADV_ASYNC_WRITE: push some or all of the dirty pages at the disk.

    LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE: push all of the currently dirty
    pages at the disk.

    LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE, LINUX_FADV_WRITE_WAIT: push all
    of the currently dirty pages at the disk, wait until they have been written.

    It should be noted that none of these operations write out the file's
    metadata. So unless the application is strictly performing overwrites of
    already-instantiated disk blocks, there are no guarantees here that the data
    will be available after a crash.

    To complete this suite of operations I guess we should have a "sync file
    metadata only" operation. This gives applications access to all the building
    blocks needed for all sorts of sync operations. But sync-metadata doesn't fit
    well with the fadvise() interface. Probably it should be a new syscall:
    sys_fmetadatasync().

    The patch also diddles with the meaning of `endbyte' in sys_fadvise64_64().
    It is made to represent the last affected byte in the file (ie: it is
    inclusive). Generally, all these byterange and pagerange functions are
    inclusive so we can easily represent EOF with -1.

    As Ulrich notes, these two functions are somewhat abusive of the fadvise()
    concept, which appears to be "set the future policy for this fd".

    But these commands are a perfect fit with the fadvise() implementation, and
    several of the existing fadvise() commands are synchronous and don't affect
    future policy either. I think we can live with the slight incongruity.

    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
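
    For example, the "push all dirty pages and wait" combination from the list
    above, written out (a sketch in kernel-internal notation; the real entry
    point is the fadvise64_64 syscall):

        /* Wait for any writeout already in flight over the range ... */
        sys_fadvise64_64(fd, offset, len, LINUX_FADV_WRITE_WAIT);
        /* ... start writeout of everything that is dirty now ... */
        sys_fadvise64_64(fd, offset, len, LINUX_FADV_ASYNC_WRITE);
        /* ... and wait until that writeout has landed as well. */
        sys_fadvise64_64(fd, offset, len, LINUX_FADV_WRITE_WAIT);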
     
  • I had trouble working out whether filemap_fdatawrite_range()'s
    `end' parameter describes the last-byte-to-be-written or the last-plus-one.
    Clarify that in comments.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The hook in the slab cache allocation path to handle cpuset memory
    spreading for tasks in cpusets with 'memory_spread_slab' enabled has a
    modest performance bug. The hook calls into the memory spreading handler
    alternate_node_alloc() if either of 'memory_spread_slab' or
    'memory_spread_page' is enabled, even though the handler does nothing
    (albeit harmlessly) for the page case.

    Fix: drop PF_SPREAD_PAGE from the set of flag bits that are used to
    trigger a call to alternate_node_alloc().

    The page case is handled by separate hooks -- see the calls conditioned on
    cpuset_do_page_mem_spread() in mm/filemap.c.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
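
    After the fix, the trigger in ____cache_alloc() tests only the bits the
    handler acts on; schematically:

        /* PF_SPREAD_PAGE no longer routes through the slab hook. */
        if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
                objp = alternate_node_alloc(cachep, flags);
                if (objp)
                        return objp;
        }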
     
  • The hooks in the slab cache allocator code path for support of NUMA
    mempolicies and cpuset memory spreading are in an important code path. Many
    systems will use neither feature.

    This patch optimizes those hooks down to a single check of some bits in the
    current task's task_struct flags. For non-NUMA systems, this hook and
    related code is already ifdef'd out.

    The optimization is done by using another task flag, set if the task is
    using a non-default NUMA mempolicy. Taking this flag bit along with the
    PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
    memory spreading' patch set, one can check for the combination of any of
    these special-case memory placement mechanisms with a single test of the
    current task's task_struct flags.

    This patch also tightens up the code, to save a few bytes of kernel text
    space, and moves some of it out of line. Due to the nested inlines called
    from multiple places, we were ending up with three copies of this code, which
    once we get off the main code path (for local node allocation) seems a bit
    wasteful of instruction memory.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Provide the slab cache infrastructure to support cpuset memory spreading.

    See the previous patches, cpuset_mem_spread, for an explanation of cpuset
    memory spreading.

    This patch provides a slab cache SLAB_MEM_SPREAD flag. If it is set in the
    kmem_cache_create() call defining a slab cache, then any task marked with
    the process state flag PF_SPREAD_SLAB will spread memory page allocations
    for that cache over all the allowed nodes, instead of preferring the local
    (faulting) node.

    On systems not configured with CONFIG_NUMA, this results in no change to the
    page allocation code path for slab caches.

    On systems with cpusets configured in the kernel, but the "memory_spread"
    cpuset option not enabled for the current task's cpuset, this adds a call
    to a cpuset routine and a failed bit test of the process state flag
    PF_SPREAD_SLAB.

    For tasks so marked, a second inline test is done for the slab cache flag
    SLAB_MEM_SPREAD, and if that is set and if the allocation is not
    in_interrupt(), this adds a call to a cpuset routine that computes which of
    the task's mems_allowed nodes should be preferred for this allocation.

    ==> This patch adds another hook into the performance-critical
    code path for allocating objects from the slab cache, in the
    ____cache_alloc() chunk, below. The next patch optimizes this
    hook, reducing the impact of the combined mempolicy plus memory
    spreading hooks on this critical code path to a single check
    against the task's task_struct flags word.

    This patch provides the generic slab flags and logic needed to apply memory
    spreading to a particular slab.

    A subsequent patch will mark a few specific slab caches for this placement
    policy.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
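
    Opting a cache in is then a one-flag change where the cache is created; an
    illustrative call (cache name and ctor hypothetical, 2.6-era six-argument
    signature):

        cachep = kmem_cache_create("somefs_inode_cache",
                                   sizeof(struct somefs_inode_info), 0,
                                   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
                                   init_once, NULL);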
     
  • Change the page cache allocation calls to support cpuset memory spreading.

    See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory
    spreading.

    On systems without cpusets configured in the kernel, this is no change.

    On systems with cpusets configured in the kernel, but the "memory_spread"
    cpuset option not enabled for the current task's cpuset, this adds a call
    to a cpuset routine and a failed bit test of the process state flag
    PF_SPREAD_PAGE.

    For tasks in cpusets with "memory_spread" enabled, this adds a call to a
    cpuset routine that computes which of the task's mems_allowed nodes should
    be preferred for this allocation.

    If memory spreading applies to a particular allocation, then any other NUMA
    mempolicy does not apply.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
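
    What the allocation call becomes (close to the include/linux/pagemap.h
    shape of the era):

        static inline struct page *page_cache_alloc(struct address_space *x)
        {
                if (cpuset_do_page_mem_spread()) {
                        /* rotates over the task's mems_allowed nodes */
                        int n = cpuset_mem_spread_node();
                        return alloc_pages_node(n, mapping_gfp_mask(x), 0);
                }
                return alloc_pages(mapping_gfp_mask(x), 0);
        }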
     
  • If we get under some memory pressure in a cpuset (we only scan zones that
    are in the cpuset for memory) then kswapd is woken up for all zones. This
    patch only wakes up kswapd in zones that are part of the current cpuset.

    Signed-off-by: Christoph Lameter
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
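
    The change boils down to an early-out in wakeup_kswapd(); a sketch:

        void wakeup_kswapd(struct zone *zone, int order)
        {
                pg_data_t *pgdat = zone->zone_pgdat;

                /* ... existing checks ... */
                if (!cpuset_zone_allowed(zone, __GFP_HARDWALL))
                        return;         /* zone outside our cpuset: leave it */
                if (!waitqueue_active(&pgdat->kswapd_wait))
                        return;
                wake_up_interruptible(&pgdat->kswapd_wait);
        }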
     
  • Store the internal value for /proc/sys/vm/laptop_mode as jiffies instead
    of seconds. Let the sysctl interface do the conversions, instead of doing
    on-the-fly conversions every time the value is used.

    Add a description of the fact that laptop_mode doubles as a flag and a
    timeout to the comment above the laptop_mode variable.

    Signed-off-by: Bart Samwel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bart Samwel
     
  • Store the internal values for:

    /proc/sys/vm/dirty_writeback_centisecs
    /proc/sys/vm/dirty_expire_centisecs

    as jiffies instead of centiseconds. Let the sysctl interface do
    the conversions with full precision using clock_t_to_jiffies, instead of
    doing overflow-sensitive on-the-fly conversions every time the values are
    used.

    Cons: apparent precision loss if HZ is not a multiple of 100, because of
    conversion back and forth. This is a common problem for all sysctl values
    that use proc_dointvec_userhz_jiffies. (There is only one other in-tree
    use, in net/core/neighbour.c.)

    Signed-off-by: Bart Samwel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bart Samwel
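
    The mechanics, sketched as an abridged ctl_table entry (field values
    illustrative):

        {
                .procname       = "dirty_writeback_centisecs",
                .data           = &dirty_writeback_interval, /* now jiffies */
                .maxlen         = sizeof(dirty_writeback_interval),
                .mode           = 0644,
                .proc_handler   = &proc_dointvec_userhz_jiffies,
        },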
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Mar, 2006

3 commits

  • Linus points out that ext3_readdir's readahead only cuts in when
    ext3_readdir() is operating at the very start of the directory. So for large
    directories we end up performing no readahead at all and we suck.

    So take it all out and use the core VM's page_cache_readahead(). This means
    that ext3 directory reads will use all of readahead's dynamic sizing goop.

    Note that we're using the directory's filp->f_ra to hold the readahead state,
    but readahead is actually being performed against the underlying blockdev's
    address_space. Fortunately the readahead code is all set up to handle this.

    Tested with printk. It works. I was struggling to find a real workload which
    actually cared.

    (The patch also exports page_cache_readahead() to GPL modules.)

    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
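
    The replacement call, roughly (note the blockdev mapping paired with the
    directory file's own readahead state):

        /* blk: the directory block about to be read */
        pgoff_t index = blk >> (PAGE_CACHE_SHIFT - inode->i_blkbits);

        if (!ra_has_index(&filp->f_ra, index))
                page_cache_readahead(sb->s_bdev->bd_inode->i_mapping,
                                     &filp->f_ra, filp, index, 1);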
     
  • This patch introduces a user space interface for swsusp.

    The interface is based on a special character device, called the snapshot
    device, that allows user space processes to perform suspend and resume-related
    operations with the help of some ioctls and the read()/write() functions.
    Additionally, it allows these processes to allocate free swap pages from a
    selected swap partition, called the resume partition, so that they know
    which sectors of the resume partition are available to them.

    The interface uses the same low-level system memory snapshot-handling
    functions that are used by the built-in swap-writing/reading code of
    swsusp.

    The interface documentation is included in the patch.

    The patch assumes that the major and minor numbers of the snapshot device will
    be 10 (ie. misc device) and 231, the registration of which has already been
    requested.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Introduce the low-level interface for handling the snapshot of system
    memory, used by the in-kernel swap-writing/reading code of swsusp and by
    the userland interface code (to be introduced shortly).

    Also change the way in which swsusp records the allocated swap pages and,
    consequently, simplify the in-kernel swap-writing/reading code (this is
    necessary for the userland interface too). To this end, introduce two
    helper functions in mm/swapfile.c, so that the swsusp code does not refer
    directly to the swap internals.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

22 Mar, 2006

5 commits

  • Centralize the page migration functions in anticipation of additional
    tinkering. This creates a new file, mm/migrate.c.

    1. Extract buffer_migrate_page() from fs/buffer.c

    2. Extract central migration code from vmscan.c

    3. Extract some components from mempolicy.c

    4. Export pageout() and remove_from_swap() from vmscan.c

    5. Make it possible to configure NUMA systems without page migration
    and non-NUMA systems with page migration.

    I had to do some #ifdeffing in mempolicy.c that may need a cleanup.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The alien cache rotor in mm/slab.c assumes that the first online node is
    node 0. Eventually for some archs, especially with hotplug, this will no
    longer be true.

    Fix the interleave rotor to handle the general case of node numbering.

    Signed-off-by: Paul Jackson
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
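
    The general-case rotor advance, sketched with the nodemask helpers:

        /* Advance to the next online node, wrapping without assuming
         * that node 0 (or any particular node) is online. */
        node = next_node(node, node_online_map);
        if (node == MAX_NUMNODES)
                node = first_node(node_online_map);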
     
  • Fix bogus node loop in hugetlb.c alloc_fresh_huge_page(), which was
    assuming that nodes are numbered contiguously from 0 to num_online_nodes().
    Once the hotplug folks get this far, that will be false.

    Signed-off-by: Paul Jackson
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • When we've allocated SWAPFILE_CLUSTER pages, ->cluster_next should be the
    first index of the swap cluster. But the current code sets it to the wrong
    offset.

    Signed-off-by: Akinobu Mita
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • 1. Only disable interrupts if there is actually something to free

    2. Only dirty the pcp cacheline if we actually freed something.

    3. Disable interrupts for each single pcp and not for cleaning
    all the pcps in all zones of a node.

    drain_node_pages is called every 2 seconds from cache_reap. This
    fix should avoid most disabling of interrupts.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
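
    Sketch of the reworked per-cpu drain (close to the mm/page_alloc.c shape
    of the time):

        struct per_cpu_pageset *pset = zone_pcp(zone, smp_processor_id());
        unsigned long flags;
        int i;

        for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
                struct per_cpu_pages *pcp = &pset->pcp[i];

                if (!pcp->count)        /* (1)+(2): nothing to free, so   */
                        continue;       /* leave irqs and the line alone  */
                local_irq_save(flags);  /* (3): per-pcp, not per-node     */
                free_pages_bulk(zone, pcp->count, &pcp->list, 0);
                pcp->count = 0;
                local_irq_restore(flags);
        }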