16 May, 2006

3 commits

  • With CONFIG_NUMA set, kmem_cache_destroy() may fail and say "Can't
    free all objects." The problem is caused by sequences such as the
    following (suppose we are on a NUMA machine with two nodes, 0 and 1):

    * Allocate an object from cache on node 0.
    * Free the object on node 1. The object is put into node 1's alien
    array_cache for node 0.
    * Call kmem_cache_destroy(), which ultimately ends up in __cache_shrink().
    * __cache_shrink() does drain_cpu_caches(), which loops through all nodes.
    For each node it drains the shared array_cache and then handles the
    alien array_cache for the other node.

    However this means that node 0's shared array_cache will be drained,
    and then node 1 will move the contents of its alien[0] array_cache
    into that same shared array_cache. node 0's shared array_cache is
    never looked at again, so the objects left there will appear to be in
    use when __cache_shrink() calls __node_shrink() for node 0. So
    __node_shrink() will return 1 and kmem_cache_destroy() will fail.

    This patch fixes this by having drain_cpu_caches() do
    drain_alien_cache() on every node before it does drain_array() on the
    nodes' shared array_caches.

    The problem was originally reported by Or Gerlitz.

    Signed-off-by: Roland Dreier
    Acked-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Roland Dreier
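
    A sketch of the reordered drain loop, reconstructed from the
    description above (names follow 2.6.17-era mm/slab.c; treat this as
    an illustration, not the verbatim patch):

        static void drain_cpu_caches(struct kmem_cache *cachep)
        {
                struct kmem_list3 *l3;
                int node;

                on_each_cpu(do_drain, cachep, 1, 1);
                check_irq_on();

                /* Pass 1: flush every node's alien caches first, so
                 * objects freed remotely get pushed back to their home
                 * node's lists (possibly its shared array_cache). */
                for_each_online_node(node) {
                        l3 = cachep->nodelists[node];
                        if (l3 && l3->alien)
                                drain_alien_cache(cachep, l3->alien);
                }

                /* Pass 2: only now drain the shared array_caches, which
                 * may have just received objects in pass 1. */
                for_each_online_node(node) {
                        l3 = cachep->nodelists[node];
                        if (l3)
                                drain_array(cachep, l3, l3->shared, 1, node);
                }
        }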
     
  • slab_is_available() indicates slab based allocators are available for use.
    SPARSEMEM code needs to know this as it can be called at various times
    during the boot process.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
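
    The kind of call site this enables, modelled on 2.6.17-era
    mm/sparse.c (a sketch from memory, not a verbatim copy): before the
    slab bootstrap completes, only bootmem can satisfy the allocation.

        static struct mem_section *sparse_index_alloc(int nid)
        {
                struct mem_section *section = NULL;
                unsigned long array_size = SECTIONS_PER_ROOT *
                                           sizeof(struct mem_section);

                /* pick the allocator that is actually usable now */
                if (slab_is_available())
                        section = kmalloc_node(array_size, GFP_ATOMIC, nid);
                else
                        section = alloc_bootmem_node(NODE_DATA(nid),
                                                     array_size);

                if (section)
                        memset(section, 0, array_size);

                return section;
        }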
     
  • As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=6490, this
    function can experience overflows on 32-bit machines, causing our response to
    changed values of min_free_kbytes to go whacky.

    Fixing it efficiently is all too hard, so fix it with 64-bit math instead.

    Cc: Ake Sandgren
    Cc: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
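
    The failure mode is plain C arithmetic: the intermediate product of
    two unsigned 32-bit quantities wraps before the division. A minimal
    standalone illustration of the pattern (the values and variable
    names are made up for the demo):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                /* ~4 million pages (16GB) times a sizeable pages_min:
                 * the product exceeds 2^32 */
                uint32_t zone_pages = 4000000;
                uint32_t pages_min  = 2000;
                uint32_t lowmem     = 4500000;

                /* 32-bit math: the product wraps, giving nonsense */
                uint32_t bad  = zone_pages * pages_min / lowmem;

                /* 64-bit math: widen first, divide after */
                uint32_t good = (uint64_t)zone_pages * pages_min / lowmem;

                printf("32-bit: %u, 64-bit: %u\n", bad, good); /* 823 vs 1777 */
                return 0;
        }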
     

02 May, 2006

3 commits

  • Based on an older patch from Mike Kravetz

    We need to have a mem_map for high addresses in order to make fops->nopage
    work on spufs mem and register files. So far, we have used the
    memory_present() function during early bootup, but that did not work when
    CONFIG_NUMA was enabled.

    We now use the __add_pages() function to add the mem_map when loading the
    spufs module, which is a lot nicer.

    Signed-off-by: Arnd Bergmann
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joel H Schopp
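
    A hedged sketch of the module-load path. The __add_pages() signature
    matches 2.6.17-era mm/memory_hotplug.c; the helper name and the zone
    choice are illustrative assumptions, not the actual spufs code:

        /* create struct pages (the mem_map) for a physical range at
         * module load time rather than during early boot */
        static int spufs_add_mem_map(unsigned long start, unsigned long size)
        {
                unsigned long start_pfn = start >> PAGE_SHIFT;
                unsigned long nr_pages = size >> PAGE_SHIFT;
                struct zone *zone;

                /* grow a zone of the node owning this range (zone
                 * selection simplified for the sketch) */
                zone = NODE_DATA(pfn_to_nid(start_pfn))->node_zones;

                return __add_pages(zone, start_pfn, nr_pages);
        }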
     
  • This patch fixes two bugs with the way sparsemem interacts with memory add.
    They are:

    - memory leak if memmap for section already exists

    - calling alloc_bootmem_node() after boot

    These bugs were discovered, and a first cut at the fixes was
    provided, by Arnd Bergmann and Joel Schopp.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Joel Schopp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • Currently we check PageDirty() in order to make the decision to swap out
    the page. However, the dirty information may only be contained in the
    ptes pointing to the page. We need to first unmap the ptes before checking
    for PageDirty(). If unmap is successful then the page count of the page
    will also be decreased so that pageout() works properly.

    This is a fix necessary for 2.6.17. Without this fix we may migrate dirty
    pages for filesystems without migration functions. Filesystems may keep
    pointers to dirty pages. Migration of dirty pages can result in the
    filesystem keeping pointers to freed pages.

    Unmapping is currently not separated out from removing all the
    references to a page and moving the mapping. Therefore try_to_unmap
    will be called again in migrate_page() if the writeout is successful.
    However, it won't do anything since the ptes are already removed.

    The coming updates to the page migration code will restructure the code
    so that this is no longer necessary.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
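
    A simplified sketch of the reordered checks in the fallback swapout
    path (signatures per 2.6.17-era mm as best I recall; the
    unlock_retry label is illustrative):

        /* try_to_unmap() moves per-pte dirty bits into the struct page
         * and drops the pte references, so both the PageDirty() test
         * and pageout()'s is_page_cache_freeable() work afterwards */
        if (try_to_unmap(page, 1) != SWAP_SUCCESS)
                goto unlock_retry;

        if (PageDirty(page)) {
                switch (pageout(page, mapping)) {
                case PAGE_KEEP:
                case PAGE_ACTIVATE:
                        goto unlock_retry;
                case PAGE_SUCCESS:
                        /* writeout started; retry the migration later */
                        goto unlock_retry;
                case PAGE_CLEAN:
                        break;
                }
        }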
     

23 Apr, 2006

1 commit

  • Basic problem: pages of a shared memory segment can only be migrated once.

    In 2.6.16 through 2.6.17-rc1, shared memory mappings do not have a
    migratepage address space op. Therefore, migrate_pages() falls back to
    default processing. In this path, it will try to pageout() dirty pages.
    Once a shared memory page has been migrated it becomes dirty, so
    migrate_pages() will try to page it out. However, because the page count
    is 3 [cache + current + pte], pageout() will return PAGE_KEEP because
    is_page_cache_freeable() returns false. This will abort all subsequent
    migrations.

    This patch adds a migratepage address space op to shared memory segments to
    avoid taking the default path. We use the "migrate_page()" function
    because it knows how to migrate dirty pages. This allows shared memory
    segment pages to migrate, subject to other conditions such as # pte's
    referencing the page [page_mapcount(page)], when requested.

    I think this is safe. If we're migrating a shared memory page, then we
    found the page via a page table, so it must be in memory.

    Can be verified with memtoy and the shmem-mbind-test script, both
    available at: http://free.linux.hp.com/~lts/Tools/

    Signed-off-by: Lee Schermerhorn
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
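
    The shape of the change, after 2.6.17-era mm/shmem.c (surrounding
    entries abbreviated; treat as a sketch):

        static struct address_space_operations shmem_aops = {
                .writepage      = shmem_writepage,
                .set_page_dirty = __set_page_dirty_nobuffers,
        #ifdef CONFIG_TMPFS
                .prepare_write  = shmem_prepare_write,
                .commit_write   = simple_commit_write,
        #endif
                /* the new op: reuse the generic dirty-capable helper so
                 * migrate_pages() never falls into the pageout() path */
                .migratepage    = migrate_page,
        };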
     

11 Apr, 2006

11 commits

  • Signed-off-by: Coywolf Qi Hunt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Coywolf Qi Hunt
     
  • EXPORT_SYMBOL'ing of a static function is not a good idea.

    Signed-off-by: Adrian Bunk
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch is an enhancement of the OVERCOMMIT_GUESS algorithm in
    __vm_enough_memory() in mm/nommu.c.

    When the OVERCOMMIT_GUESS algorithm calculates the number of free
    pages, it subtracts the number of reserved pages from the result of
    nr_free_pages().

    Signed-off-by: Hideo Aoki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hideo AOKI
     
  • This patch is an enhancement of the OVERCOMMIT_GUESS algorithm in
    __vm_enough_memory() in mm/mmap.c.

    When the OVERCOMMIT_GUESS algorithm calculates the number of free
    pages, it subtracts the number of reserved pages from the result of
    nr_free_pages().

    Signed-off-by: Hideo Aoki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hideo AOKI
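
    For both this and the mm/nommu.c variant above, the core of the
    change is a single subtraction in the GUESS branch (a sketch; the
    context loosely follows 2.6.17-era mm/mmap.c):

        if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
                unsigned long free;

                free = get_page_cache_size();
                free += nr_swap_pages;
                free += nr_free_pages();

                /* new: leave out the pages the kernel tries to keep
                 * free -- they are not usable for anonymous memory */
                free -= totalreserve_pages;

                if (free > pages)
                        return 0;       /* enough memory */
                /* (slab-reclaimable and root-reserve handling omitted) */
                return -ENOMEM;
        }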
     
  • These patches are an enhancement of the OVERCOMMIT_GUESS algorithm in
    __vm_enough_memory().

    - why the kernel needed patching

    When the kernel can't actually allocate anonymous pages, the current
    OVERCOMMIT_GUESS can still return success. This behavior can be the
    cause of OOM kills under memory pressure.

    If Linux runs with page reservation features like
    /proc/sys/vm/lowmem_reserve_ratio and without a swap region, I think
    OOM kills occur easily.

    - the overall design approach in the patch

    When the OVERCOMMIT_GUESS algorithm calculates the number of free
    pages, the reserved free pages are regarded as non-free.

    This change helps avoid the pitfall that the number of free pages
    becomes less than the number the kernel tries to keep free.

    - testing results

    I tested the patches using my test kernel module.

    If the patches aren't applied to the kernel, __vm_enough_memory()
    returns success in that situation, but the actual page allocation
    fails.

    On the other hand, if the patches are applied, the allocation
    failure is avoided since __vm_enough_memory() returns failure in
    that situation.

    I checked this on an i386 SMP machine with 16GB of memory. I haven't
    tested a nommu environment yet.

    This patch adds totalreserve_pages for __vm_enough_memory().

    calculate_totalreserve_pages() checks the maximum lowmem_reserve
    pages and pages_high in each zone, and stores the sum over all zones
    in totalreserve_pages; see the sketch below.

    totalreserve_pages is calculated when the VM is initialized, and the
    variable is updated whenever /proc/sys/vm/lowmem_reserve_ratio or
    /proc/sys/vm/min_free_kbytes is changed.

    Signed-off-by: Hideo Aoki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hideo AOKI
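
    A sketch of the helper, reconstructed to match the description above
    (compare 2.6.17-era mm/page_alloc.c):

        static void calculate_totalreserve_pages(void)
        {
                struct pglist_data *pgdat;
                unsigned long reserve_pages = 0;
                int i, j;

                for_each_online_pgdat(pgdat) {
                        for (i = 0; i < MAX_NR_ZONES; i++) {
                                struct zone *zone = pgdat->node_zones + i;
                                unsigned long max = 0;

                                /* largest lowmem_reserve for this zone */
                                for (j = i; j < MAX_NR_ZONES; j++)
                                        if (zone->lowmem_reserve[j] > max)
                                                max = zone->lowmem_reserve[j];

                                /* pages_high is treated as reserved too */
                                max += zone->pages_high;

                                if (max > zone->present_pages)
                                        max = zone->present_pages;
                                reserve_pages += max;
                        }
                }
                totalreserve_pages = reserve_pages;
        }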
     
  • - Remove sparse comment

    - Remove duplicated include

    - Return the correct error condition in migrate_page_remove_references().

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The code compares newbrk with oldbrk, which are page aligned, before
    checking the memory limit set for the data segment. If the memory
    limit is not page aligned, this bypasses the limit test whenever the
    allocation stays within the same page.

    Signed-off-by: Ram Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ram Gupta
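
    A sketch of the reordering in sys_brk(), after 2.6.17-era mm/mmap.c:
    the rlimit check now runs before the same-page short-circuit.

        /* check against rlimit *before* the page-granularity test, so
         * a same-page brk can't slip past an unaligned limit */
        rlim = current->signal->rlim[RLIMIT_DATA].rlim_cur;
        if (rlim < RLIM_INFINITY && brk - mm->start_data > rlim)
                goto out;

        newbrk = PAGE_ALIGN(brk);
        oldbrk = PAGE_ALIGN(mm->brk);
        if (oldbrk == newbrk)
                goto set_brk;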
     
  • The earlier patch to consolidate mmu and nommu page allocation and
    refcounting by using compound pages for nommu allocations had a bug:
    kmalloc slabs whose pages were initially allocated by a
    non-__GFP_COMP allocator could be passed into mm/nommu.c kmalloc
    allocations which really wanted __GFP_COMP underlying pages. Fix
    that by having nommu pass __GFP_COMP to all higher order slab
    allocations.

    Signed-off-by: Luke Yang
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luke Yang
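
    As described, the fix amounts to forcing __GFP_COMP for nommu slab
    backing pages. A hedged sketch of where that lands (the exact
    placement inside slab's page-allocation helper is from memory):

        static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
                                   int nodeid)
        {
                struct page *page;

                flags |= cachep->gfpflags;
        #ifndef CONFIG_MMU
                /* nommu refcounts kmalloc memory through its pages, so
                 * higher-order backing pages must be compound */
                flags |= __GFP_COMP;
        #endif
                page = alloc_pages_node(nodeid, flags, cachep->gfporder);
                if (!page)
                        return NULL;
                /* (statistics and page-state bookkeeping omitted) */
                return page_address(page);
        }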
     
  • Add a statistics counter which is incremented every time the alien
    cache overflows. The alien cache limit is hardcoded to 12 right now.
    We can use this statistic to tune the alien cache if needed in the
    future.

    Signed-off-by: Alok N Kataria
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • Allocate off-slab slab descriptors from node local memory.

    Signed-off-by: Alok N Kataria
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • Rohit found an obscure bug causing buddy list corruption.

    page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
    to determine whether or not a free page's buddy is itself free and in the
    buddy lists.

    Each of the conjuncts may be true at different times due to
    unrelated conditions, so the non-atomic page_is_buddy test may find
    each conjunct to be true even if they were not both true at the same
    time (i.e. the page was not on the buddy lists).

    Signed-off-by: Martin Bligh
    Signed-off-by: Rohit Seth
    Signed-off-by: Nick Piggin
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Linus Torvalds

    Nick Piggin
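
    For reference, the racy predicate looked roughly like this
    (2.6.16-era mm/page_alloc.c, from memory):

        /* Non-atomic: PagePrivate and page_count are read separately,
         * so each half can be observed true at different moments even
         * though the page was never free on the buddy lists. */
        static inline int page_is_buddy(struct page *page, int order)
        {
                if (PagePrivate(page) &&
                    (page_order(page) == order) &&
                    page_count(page) == 0)
                        return 1;
                return 0;
        }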
     

10 Apr, 2006

1 commit

  • The node setup code would try to allocate the node metadata in the node
    itself, but that fails if there is no memory in there.

    This can happen with memory hotplug when the hotplug area defines a
    so-far empty node.

    Now use bootmem to try to allocate the mem_map in other nodes.

    If that fails, don't panic; just ignore the node.

    To make this work I added a new __alloc_bootmem_nopanic function that
    does what its name implies.

    TBD: should try to use nearby nodes here. Currently we just use any.
    It's hard to do it better because bootmem doesn't have proper fallback
    lists yet.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
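
    A caller-side usage sketch (the __alloc_bootmem_nopanic() signature
    mirrors __alloc_bootmem(); the surrounding error handling is
    illustrative):

        /* try to place this node's mem_map anywhere; unlike
         * __alloc_bootmem(), failure returns NULL instead of panicking,
         * so we can simply skip the empty node */
        map = __alloc_bootmem_nopanic(size, SMP_CACHE_BYTES,
                                      __pa(MAX_DMA_ADDRESS));
        if (!map) {
                printk(KERN_ERR
                       "Node %d: no memory for mem_map, ignoring node\n",
                       nid);
                return;
        }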
     

01 Apr, 2006

8 commits

  • This changes if() BUG(); constructs to BUG_ON(), which is cleaner,
    contains unlikely(), and can be better optimized away.

    Signed-off-by: Eric Sesterhenn
    Signed-off-by: Adrian Bunk

    Eric Sesterhenn
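
    The transformation, shown on a hypothetical condition:

        /* before: open-coded test, no unlikely() hint */
        if (page_count(page) != 0)
                BUG();

        /* after: one line, carries unlikely(), easier to optimize away
         * when CONFIG_BUG is disabled */
        BUG_ON(page_count(page) != 0);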
     
  • This changes if() BUG(); constructs to BUG_ON(), which is cleaner,
    contains unlikely(), and can be better optimized away.

    Signed-off-by: Eric Sesterhenn
    Signed-off-by: Adrian Bunk

    Eric Sesterhenn
     
  • This changes if() BUG(); constructs to BUG_ON(), which is cleaner,
    contains unlikely(), and can be better optimized away.

    Signed-off-by: Eric Sesterhenn
    Signed-off-by: Adrian Bunk

    Eric Sesterhenn
     
  • Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
    fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
    Reasons:

    - It's more flexible. Things which would require two or three syscalls with
    fadvise() can be done in a single syscall.

    - Using fadvise() in this manner is something not covered by POSIX.

    The patch wires up the syscall for x86.

    The syscall is implemented in the new fs/sync.c. The intention is
    that we can move sys_fsync(), sys_fdatasync() and perhaps sys_sync()
    into there later.

    Documentation for the syscall is in fs/sync.c.

    A test app (sync_file_range.c) is in
    http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

    The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
    say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
    NFS_DATA_SYNC which is hopefully the more common."

    Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
    the queue is congested. This is trivial to fix: add a new flag bit, set
    wbc->nonblocking. But I'm not sure that we want to expose implementation
    details down to that level.

    Note: we can sync an fd which wasn't opened for writing. The same is
    true of fsync() and fdatasync().

    Note: the code takes some care to handle attempts to sync file contents
    outside the 16TB offset on 32-bit machines. It makes such attempts appear to
    succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
    requests fail...

    Cc: Nick Piggin
    Cc: Michael Kerrisk
    Cc: Ulrich Drepper
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
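
    A minimal userspace usage sketch of the new syscall (via the glibc
    wrapper that later shipped; flag names as in the merged API):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
                if (argc < 2) {
                        fprintf(stderr, "usage: %s <file>\n", argv[0]);
                        return 1;
                }

                /* per the note above, write access is not required */
                int fd = open(argv[1], O_RDONLY);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }

                /* wait for any writeout of the first 1MiB already in
                 * flight, start writeout of dirty pages in the range,
                 * then wait for completion: a synchronous range flush */
                if (sync_file_range(fd, 0, 1 << 20,
                                    SYNC_FILE_RANGE_WAIT_BEFORE |
                                    SYNC_FILE_RANGE_WRITE |
                                    SYNC_FILE_RANGE_WAIT_AFTER) != 0)
                        perror("sync_file_range");

                close(fd);
                return 0;
        }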
     
  • The boot cmdline is parsed in parse_early_param() and
    parse_args(,unknown_bootoption).

    And __setup() is used in obsolete_checksetup().

    start_kernel()
    -> parse_args()
    -> unknown_bootoption()
    -> obsolete_checksetup()

    If __setup()'s callback (->setup_func()) returns 1 in
    obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
    handled.

    If ->setup_func() returns 0, obsolete_checksetup() tries the other
    ->setup_func()s. If all the ->setup_func()s that matched the
    parameter return 0, the parameter is added to argv_init[].

    Then, when running /sbin/init (or the program given by init=),
    argv_init[] is passed to it. If the app doesn't ignore those
    arguments, it may warn and exit.

    This patch fixes wrong usages of this, though only the obvious ones.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
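
    A sketch of the convention being enforced, with a hypothetical
    option name:

        /* returning 1 tells obsolete_checksetup() the parameter was
         * consumed; returning 0 would let "noexample" leak through
         * into init's argv_init[] */
        static int example_disabled __initdata;

        static int __init noexample_setup(char *str)
        {
                example_disabled = 1;
                return 1;
        }
        __setup("noexample", noexample_setup);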
     
  • With strict page reservation, I think the kernel should enforce that
    the number of free hugetlb pages does not fall below the reserved
    count. Currently that is possible via the sysctl path. Add a proper
    check in sysctl to disallow it.

    Signed-off-by: Ken Chen
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • git-commit: d5d4b0aa4e1430d73050babba999365593bdb9d2
    "[PATCH] optimize follow_hugetlb_page" breaks mlock on hugepage areas.

    I misinterpreted the pages argument and made get_page()
    unconditional. It should only take a ref count when the "pages"
    argument is non-NULL.

    Credit goes to Adam Litke who spotted the bug.

    Signed-off-by: Ken Chen
    Acked-by: Adam Litke
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
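
    The corrected pattern inside follow_hugetlb_page(), as described
    (sketch):

        /* take a reference only when the caller asked for the pages;
         * plain mlock()-style walks pass pages == NULL */
        if (pages) {
                get_page(page);
                pages[i] = page;
        }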
     
  • find_trylock_page() is an odd interface in that it doesn't take a reference
    like the others. Now that XFS no longer uses it, and its last remaining
    caller actually wants an elevated refcount, opencode that callsite and
    schedule find_trylock_page() for removal.

    Signed-off-by: Nick Piggin
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
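
    The opencoded replacement at that call site looks roughly like this
    (reconstructed; TestSetPageLocked() was the trylock primitive of the
    day):

        /* like find_trylock_page(), but holding a reference: look the
         * page up with an elevated refcount, then try to lock it */
        page = find_get_page(&swapper_space, entry.val);
        if (page && unlikely(TestSetPageLocked(page))) {
                /* locked by someone else: drop our ref, treat as absent */
                page_cache_release(page);
                page = NULL;
        }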
     

29 Mar, 2006

1 commit