21 Oct, 2005

1 commit


20 Oct, 2005

3 commits

  • This introduces a limit parameter to the core bootmem allocator. The new
    parameter indicates that physical memory allocated by the bootmem
    allocator should be within the requested limit.

    We also introduce the alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
    and alloc_bootmem_low_pages_node_limit APIs, but alloc_bootmem_low_pages_limit
    is the only API used for swiotlb.

    The existing alloc_bootmem_low_pages() API could instead have been
    changed and made to pass the right limit to the core allocator. But that
    would make the patch more intrusive for 2.6.14, as other arches use
    alloc_bootmem_low_pages(). We may do that post 2.6.14 as a
    cleanup.

    With this, swiotlb gets memory within 4G for both x86_64 and ia64
    arches.
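
    A minimal standalone sketch (not the kernel code) of what the limit
    parameter means: a bump-style allocator that refuses any request which
    would place memory above the caller's physical-address limit. All names
    here are invented for illustration.

    #include <stdint.h>
    #include <stdio.h>

    struct bootmem_model {
            uint64_t next;    /* next free "physical" address */
            uint64_t end;     /* end of the managed region */
    };

    /* Return the start address of the allocation, or 0 if it would exceed
     * either the managed region or the requested limit (e.g. 4G). */
    static uint64_t alloc_below_limit(struct bootmem_model *bm,
                                      uint64_t size, uint64_t limit)
    {
            uint64_t start = bm->next;

            if (start + size > bm->end)
                    return 0;
            if (limit && start + size > limit)
                    return 0;            /* would land above the limit */
            bm->next = start + size;
            return start;
    }

    int main(void)
    {
            struct bootmem_model bm = { 0x100000, 0x200000000ULL };
            /* swiotlb-style request: keep the buffer below 4G */
            uint64_t p = alloc_below_limit(&bm, 64ULL << 20, 1ULL << 32);

            printf("allocated at %#llx\n", (unsigned long long)p);
            return 0;
    }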

    Signed-off-by: Yasunori Goto
    Cc: Ravikiran G Thirumalai
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • hugetlbfs allows truncation of its files (should it?), but hugetlb.c often
    forgets that: crashes and misaccounting ensue.

    copy_hugetlb_page_range better grab the src page_table_lock since we don't
    want to guess what happens if concurrently truncated. unmap_hugepage_range
    rss accounting must not assume the full range was mapped. follow_hugetlb_page
    must guard with page_table_lock and be prepared to exit early.

    Restyle copy_hugetlb_page_range with a for loop like the others there.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The hugetlb pages are currently pre-faulted. At the time of mmap of
    hugepages, we populate the new PTEs. It is possible that HW has already
    cached some of the unused PTEs internally. These stale entries never
    get a chance to be purged in existing control flow.

    This patch extends the check in page fault code for hugepages. Check if
    a faulted address falls within the size of the hugetlb file backing it.
    We return VM_FAULT_MINOR for these cases (assuming that the arch
    specific page-faulting code purges the stale entry for the archs that
    need it).
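
    A standalone model (not the kernel code) of the bounds check being added,
    assuming 2MB huge pages; the helper names are invented for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    #define HPAGE_SIZE_MODEL (2UL << 20)    /* assume 2MB huge pages */

    /* true  -> the faulting huge page is backed by the file, so the fault
     *          can be handled (VM_FAULT_MINOR in the description above)
     * false -> the address lies beyond the size of the backing file */
    static bool fault_within_file(unsigned long vma_start,
                                  unsigned long fault_addr,
                                  unsigned long file_size)
    {
            unsigned long idx = (fault_addr - vma_start) / HPAGE_SIZE_MODEL;
            unsigned long file_pages =
                    (file_size + HPAGE_SIZE_MODEL - 1) / HPAGE_SIZE_MODEL;

            return idx < file_pages;
    }

    int main(void)
    {
            unsigned long start = 0x40000000UL;

            printf("inside:  %d\n",
                   fault_within_file(start, start + (1UL << 20), 4UL << 20));
            printf("outside: %d\n",
                   fault_within_file(start, start + (8UL << 20), 4UL << 20));
            return 0;
    }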

    Signed-off-by: Rohit Seth

    [ This is apparently arguably an ia64 port bug. But the code won't
    hurt, and for now it fixes a real problem on some ia64 machines ]

    Signed-off-by: Linus Torvalds

    Seth, Rohit
     

17 Oct, 2005

1 commit

  • As noticed by Nick Piggin, we need to make sure that we check the page
    count before we check for PageDirty, since the dirty check is only valid
    if the count implies that we're the only possible ones holding the page.

    We always did do this, but the code needs a read memory barrier to make
    sure that the ordering is also honored by the CPU.

    (The writer side is ordered due to the atomic decrement and test on the
    page count, see the discussion on linux-kernel)
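
    A standalone C11 model (not the kernel code) of the ordering requirement:
    load the reference count first, then issue a read barrier, then test the
    dirty flag, so the CPU cannot reorder the dirty load before the count load.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct page_model {
            atomic_int  count;
            atomic_bool dirty;
    };

    /* true if the page is unshared and clean, and the check can be trusted */
    static bool clean_and_unshared(struct page_model *p)
    {
            /* Only the sole holder may trust the dirty bit at all. */
            if (atomic_load_explicit(&p->count, memory_order_relaxed) != 1)
                    return false;

            /* The smp_rmb() analogue: order the two loads. */
            atomic_thread_fence(memory_order_acquire);

            return !atomic_load_explicit(&p->dirty, memory_order_relaxed);
    }

    int main(void)
    {
            struct page_model pg;

            atomic_init(&pg.count, 1);
            atomic_init(&pg.dirty, false);
            printf("clean and unshared: %d\n", clean_and_unshared(&pg));
            return 0;
    }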

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Oct, 2005

2 commits

  • Refuse to install a page into a mapping if the mapping count is already
    ridiculously large.

    You probably cannot trigger this on 32-bit architectures, but on a
    64-bit setup we should protect against it.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Revert this recent correctness change: Douglas Crosher
    reported that it broke an existing application, and that madvise() works
    without error on anonymous mappings on Solaris.

    This means that madvise() will remain non-standards-compliant: we should
    return -EBADF for all requests against non-file-backed vma's, but Linux only
    does this for MADV_WILLNEED requests.
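
    A small userspace demonstration of the behaviour described: on an
    anonymous (non-file-backed) mapping, only MADV_WILLNEED is rejected with
    EBADF on kernels of this era, while other advice values succeed. Exact
    behaviour may differ on later kernels.

    #define _DEFAULT_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static void try_advice(void *p, size_t len, int advice, const char *name)
    {
            if (madvise(p, len, advice) < 0)
                    printf("%s: %s\n", name, strerror(errno));
            else
                    printf("%s: ok\n", name);
    }

    int main(void)
    {
            size_t len = 4096;
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            try_advice(p, len, MADV_WILLNEED, "MADV_WILLNEED");
            try_advice(p, len, MADV_DONTNEED, "MADV_DONTNEED");
            munmap(p, len);
            return 0;
    }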

    Signed-off-by: Suzuki K P
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suzuki
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.
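
    A minimal compilable sketch of the change, with a stand-in definition of
    __nocast so it builds outside the kernel tree (sparse defines __CHECKER__
    when it runs):

    #ifdef __CHECKER__
    #define __nocast __attribute__((nocast))
    #else
    #define __nocast
    #endif

    typedef unsigned int __nocast gfp_t;

    /* before: the annotation had to be repeated at every declaration */
    void *old_alloc(unsigned int __nocast flags);

    /* after: the typedef carries the annotation and documents the intent */
    void *new_alloc(gfp_t flags);

    int main(void) { return 0; }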

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

01 Oct, 2005

1 commit

  • As requested by Thomas Gleixner :

    "5d3d0f7704ed0bc7eaca0501eeae3e5da1ea6c87 breaks a couple of ARM
    boards, which depend on the historical bootmem allocation order.
    There is a cleaner solution around to remove the pgdat list
    completely, but this is a topic for post 2.6.14

    Andi signalled ACK already."

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

28 Sep, 2005

2 commits

  • In kmalloc_node we are checking whether the allocation is for the same
    node while interrupts are "on". This may lead to an allocation on a node
    other than the one intended.

    This patch just shifts the check for the current node into
    __cache_alloc_node, where interrupts are disabled.

    Signed-off-by: Alok N Kataria
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alok N Kataria
     
  • Move the ZERO_PAGE remapping complexity to the move_pte macro in
    asm-generic, have it conditionally depend on
    __HAVE_ARCH_MULTIPLE_ZERO_PAGE, which gets defined for MIPS.

    For architectures without __HAVE_ARCH_MULTIPLE_ZERO_PAGE, move_pte becomes
    a noop.

    From: Hugh Dickins

    Fix nasty little bug we've missed in Nick's mremap move ZERO_PAGE patch.
    The "pte" at that point may be a swap entry or a pte_file entry: we must
    check pte_present before perhaps corrupting such an entry.

    Patch below against 2.6.14-rc2-mm1, but the same bug is in 2.6.14-rc2's
    mm/mremap.c, and more dangerous there since it's affecting all arches: I
    think the safest course is to send Nick's patch and Yoichi's build fix and
    this fix (build tested) on to Linus - so only MIPS can be affected.

    Signed-off-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

24 Sep, 2005

1 commit

  • As davem points out, this wasn't such a great idea. There may be some code
    which does:

    size = 1024*1024;
    while (kmalloc(size, ...) == 0)
            size /= 2;

    which will now explode.
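
    A userspace illustration of the pattern quoted above: probe for the
    largest obtainable buffer by halving the request until the allocator
    succeeds. A hard failure (BUG) on oversized requests would make such
    loops explode, as described.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            size_t size = 1024 * 1024;
            void *buf;

            /* keep halving until the allocation succeeds */
            while ((buf = malloc(size)) == NULL && size > 0)
                    size /= 2;

            printf("settled on %zu bytes\n", size);
            free(buf);
            return 0;
    }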

    Cc: "David S. Miller"
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

23 Sep, 2005

4 commits

  • Problem: In some circumstances, bd_claim() is returning the wrong error
    code.

    If we try to swapon an unused block device that isn't swap formatted, we
    get -EINVAL. But if that same block device is already mounted, we instead
    get -EBUSY, even though it still isn't a valid swap device.

    This issue came up on the busybox list trying to get the error message
    from "swapon -a" right. If a swap device is already enabled, we get -EBUSY,
    and we shouldn't report this as an error. But we can't distinguish the two
    -EBUSY conditions, which are very different errors.

    In the code, bd_claim() returns either 0 or -EBUSY, but in this case busy
    means "somebody other than sys_swapon has already claimed this", and
    _that_ means this block device can't be a valid swap device. So return
    -EINVAL there.

    Signed-off-by: Rob Landley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     
  • I had an issue on ia64 where I got a bug in kernel/workqueue because
    kzalloc returned a NULL pointer due to the task structure getting too big
    for the slab allocator. Usually these cases are caught by the kmalloc
    macro in include/linux/slab.h.

    Compilation will fail if too large a value is passed to kmalloc.

    However, kzalloc uses __kmalloc, which has no check for that. This patch
    makes __kmalloc BUG if too large an allocation is requested.
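
    A standalone model of the two checks described (the kernel's real kmalloc
    macro differs in detail): constant sizes that are provably too large are
    caught at build time by referencing a function that is never defined,
    while runtime sizes fall through to a check that aborts. Build with
    optimization so the dead branch is discarded. All names are invented.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_ALLOC (128 * 1024)          /* assumed cap for the model */

    /* Never defined: if a constant oversized request survives to link
     * time, the build fails, mirroring the compile-time kmalloc check. */
    extern void *model_alloc_too_large(void);

    static void *model_alloc_runtime(size_t size)
    {
            if (size > MAX_ALLOC) {
                    fprintf(stderr, "model_alloc: %zu bytes is too large\n",
                            size);
                    abort();                /* the __kmalloc BUG analogue */
            }
            return malloc(size);
    }

    #define model_alloc(size)                                       \
            (__builtin_constant_p(size) && (size) > MAX_ALLOC       \
                    ? model_alloc_too_large()                       \
                    : model_alloc_runtime(size))

    int main(void)
    {
            void *p = model_alloc(4096);    /* fine */
            /* model_alloc(1 << 24) would fail to link */
            free(p);
            return 0;
    }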

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The NUMA slab allocator may allocate pages from foreign nodes onto the
    lists for a particular node if a node runs out of memory. The
    slab->nodeid field then does not reflect that the page is now in use
    for the slabs of another node.

    This patch fixes that issue by adding a node field to free_block so that
    the caller can indicate which node currently uses a slab.

    Also removes the check for the current node from kmalloc_cache_node,
    since the process may later shift to another node, which may lead to an
    allocation on a node other than the one intended.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • It is essential that index_of() be inlined. But alpha undoes the gcc
    inlining hackery and index_of() ends up out-of-line. So fiddle with things
    to make that function inline again.

    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ivan Kokshaysky
     

22 Sep, 2005

2 commits


18 Sep, 2005

1 commit


15 Sep, 2005

2 commits

  • With the new changes that we made in the initialization of the slab
    allocator, we first set up the cache from which array caches are
    allocated, and then the cache from which kmem_list3's are allocated.

    Now if the array cache comes from a cache whose objsize > 32 (in this
    instance size-64), then the size-64 cache will be allocated first, and
    then the size-128 cache (if that is the cache from which kmem_list3's
    are going to be allocated).

    So with these new changes, we are not guaranteed that we will be
    initializing the malloc_sizes array in a serialized order. Thus there is
    a bug in __find_general_cachep, as we are checking whether the first
    cache_sizes ptr is NULL.

    This is replaced by checking whether the array-cache cache is initialized.
    Attached is a patch which does that. Boots fine on a x86-64, with
    DEBUG_SPIN, DEBUG_SLAB, and preempt.

    Signed-off-by: Alok N Kataria
    Signed-off-by: Shobhit Dayal
    Cc: Manfred Spraul
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alok Kataria
     
  • Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
    security_vm_enough_memory tend to forget to vm_unacct_memory when a
    failure occurs further down (typically in setup_arg_pages variants).

    These are all users of insert_vm_struct, and that reservation will only
    be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
    cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.

    So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
    Committed_AS each time they're run. But don't add VM_ACCOUNT to them,
    it's inappropriate to reserve against the very unlikely case that gdb
    be used to COW a vDSO page - we ought to do something about that in
    do_wp_page, but there are yet other inconsistencies to be resolved.

    The safe and economical way to fix this is to let insert_vm_struct do
    the security_vm_enough_memory check when it finds VM_ACCOUNT is set.

    And the MIPS irix_brk has been calling security_vm_enough_memory before
    calling do_brk which repeats it, doubly accounting and so also leaking.
    Remove that, and all the fs and arch calls to security_vm_enough_memory:
    give it a less misleading name later on.

    Signed-off-by: Hugh Dickins
    Signed-Off-By: Kirill Korotaev
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

13 Sep, 2005

4 commits


12 Sep, 2005

1 commit

  • Move the call to get_mm_counter() in update_mem_hiwater() to be
    inside the check for tsk->mm being null. Otherwise you can be
    following a null pointer here. This patch was submitted by
    Javier Herrero.

    Modify the end check for munmap regions to allow for the
    legacy behavior of 0 being valid. Pretty much all current
    uClinux system libc mallocs pass in 0 as the end point.
    A hard check will fail on these, so change the check so
    that if it is non-zero it must be valid, otherwise it fails.
    A passed-in value will always succeed (as it used to).

    Also export a few more mm system functions - to be consistent
    with the VM code exports.

    Signed-off-by: Greg Ungerer
    Signed-off-by: Linus Torvalds

    Greg Ungerer
     

11 Sep, 2005

5 commits


10 Sep, 2005

5 commits

  • Clean up timer initialization by introducing DEFINE_TIMER, à la
    DEFINE_SPINLOCK. Build and boot-tested on x86. A similar patch has been
    in the -RT tree for some time.
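
    A standalone model of the DEFINE_SPINLOCK/DEFINE_TIMER idea: a single
    macro both declares the object and supplies its static initializer, so
    no separate runtime init call is needed. The struct layout here is
    invented, not the real struct timer_list.

    #include <stdio.h>

    struct model_timer {
            void (*function)(unsigned long);
            unsigned long expires;
            unsigned long data;
    };

    #define DEFINE_MODEL_TIMER(name, fn, exp, d)                    \
            struct model_timer name = {                             \
                    .function = (fn), .expires = (exp), .data = (d) }

    static void tick(unsigned long data)
    {
            printf("tick %lu\n", data);
    }

    /* declared and initialized in one line */
    static DEFINE_MODEL_TIMER(my_timer, tick, 0, 42);

    int main(void)
    {
            my_timer.function(my_timer.data);
            return 0;
    }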

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • This patch clarifies the NULL handling of kfree() and vfree(). In
    addition, the wording of the calling-context restriction for vfree() and
    vunmap() is changed from "may not" to "must not."

    Signed-off-by: Pekka Enberg
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • The NUMA API change that introduced kmalloc_node was accepted for
    2.6.12-rc3. Now it is possible to do slab allocations on a node to
    localize memory structures. This API was used by the pageset localization
    patch and the block layer localization patch now in mm. The existing
    kmalloc_node is slow since it simply searches through all pages of the slab
    to find a page that is on the node requested. The two patches do a
    one-time allocation of slab structures at initialization and therefore
    the speed of kmalloc_node does not matter.

    This patch allows kmalloc_node to be as fast as kmalloc by introducing node
    specific page lists for partial, free and full slabs. Slab allocation
    improves in a NUMA system so that we are seeing a performance gain in AIM7
    of about 5% with this patch alone.

    More NUMA localizations are possible if kmalloc_node operates as fast
    as kmalloc.

    Test run on a 32p system with 32G RAM.

    w/o patch
    Tasks jobs/min jti jobs/min/task real cpu
    1 485.36 100 485.3640 11.99 1.91 Sat Apr 30 14:01:51 2005
    100 26582.63 88 265.8263 21.89 144.96 Sat Apr 30 14:02:14 2005
    200 29866.83 81 149.3342 38.97 286.08 Sat Apr 30 14:02:53 2005
    300 33127.16 78 110.4239 52.71 426.54 Sat Apr 30 14:03:46 2005
    400 34889.47 80 87.2237 66.72 568.90 Sat Apr 30 14:04:53 2005
    500 35654.34 76 71.3087 81.62 714.55 Sat Apr 30 14:06:15 2005
    600 36460.83 75 60.7681 95.77 853.42 Sat Apr 30 14:07:51 2005
    700 35957.00 75 51.3671 113.30 990.67 Sat Apr 30 14:09:45 2005
    800 33380.65 73 41.7258 139.48 1140.86 Sat Apr 30 14:12:05 2005
    900 35095.01 76 38.9945 149.25 1281.30 Sat Apr 30 14:14:35 2005
    1000 36094.37 74 36.0944 161.24 1419.66 Sat Apr 30 14:17:17 2005

    w/patch
    Tasks jobs/min jti jobs/min/task real cpu
    1 484.27 100 484.2736 12.02 1.93 Sat Apr 30 15:59:45 2005
    100 28262.03 90 282.6203 20.59 143.57 Sat Apr 30 16:00:06 2005
    200 32246.45 82 161.2322 36.10 282.89 Sat Apr 30 16:00:42 2005
    300 37945.80 83 126.4860 46.01 418.75 Sat Apr 30 16:01:28 2005
    400 40000.69 81 100.0017 58.20 561.48 Sat Apr 30 16:02:27 2005
    500 40976.10 78 81.9522 71.02 696.95 Sat Apr 30 16:03:38 2005
    600 41121.54 78 68.5359 84.92 834.86 Sat Apr 30 16:05:04 2005
    700 44052.77 78 62.9325 92.48 971.53 Sat Apr 30 16:06:37 2005
    800 41066.89 79 51.3336 113.38 1111.15 Sat Apr 30 16:08:31 2005
    900 38918.77 79 43.2431 134.59 1252.57 Sat Apr 30 16:10:46 2005
    1000 41842.21 76 41.8422 139.09 1392.33 Sat Apr 30 16:13:05 2005

    These are measurements taken directly after boot and show an improvement
    greater than 5%. However, the performance improvements become smaller if
    the AIM7 runs are repeated and settle down at around 5%.

    Links to earlier discussions:
    http://marc.theaimsgroup.com/?t=111094594500003&r=1&w=2
    http://marc.theaimsgroup.com/?t=111603406600002&r=1&w=2

    Changelog V4-V5:
    - alloc_arraycache and alloc_aliencache take node parameter instead of cpu
    - fix initialization so that nodes without cpus are properly handled.
    - simplify code in kmem_cache_init
    - patch against Andrew's temp mm3 release
    - Add Shai to credits
    - fallback to __cache_alloc from __cache_alloc_node if the node's cache
    is not available yet.

    Changelog V3-V4:
    - Patch against 2.6.12-rc5-mm1
    - Cleanup patch integrated
    - More and better use of for_each_node and for_each_cpu
    - GCC 2.95 fix (do not use [] use [0])
    - Correct determination of INDEX_AC
    - Remove hack to cause an error on platforms that have no CONFIG_NUMA but nodes.
    - Remove list3_data and list3_data_ptr macros for better readability

    Changelog V2-V3:
    - Made to patch against 2.6.12-rc4-mm1
    - Revised bootstrap mechanism so that larger size kmem_list3 structs can be
    supported. Do a generic solution so that the right slab can be found
    for the internal structs.
    - use for_each_online_node

    Changelog V1-V2:
    - Batching for freeing of wrong-node objects (alien caches)
    - Locking changes and NUMA #ifdefs as requested by Manfred

    Signed-off-by: Alok N Kataria
    Signed-off-by: Shobhit Dayal
    Signed-off-by: Shai Fultheim
    Signed-off-by: Christoph Lameter
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch modifies tmpfs to call the inode_init_security LSM hook to set
    up the incore inode security state for new inodes before the inode becomes
    accessible via the dcache.

    As there is no underlying storage of security xattrs in this case, it is
    not necessary for the hook to return the (name, value, len) triple to the
    tmpfs code, so this patch also modifies the SELinux hook function to
    correctly handle the case where the (name, value, len) pointers are NULL.

    The hook call is needed in tmpfs in order to support proper security
    labeling of tmpfs inodes (e.g. for udev with tmpfs /dev in Fedora). With
    this change in place, we should then be able to remove the
    security_inode_post_create/mkdir/... hooks safely.

    Signed-off-by: Stephen Smalley
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • Update the file systems in fs/ implementing a delete_inode() callback to
    call truncate_inode_pages(). One implementation note: In developing this
    patch I put the calls to truncate_inode_pages() at the very top of those
    filesystems' delete_inode() callbacks in order to retain the previous
    behavior. I'm guessing that some of those could probably be optimized.

    Signed-off-by: Mark Fasheh
    Acked-by: Christoph Hellwig
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

09 Sep, 2005

1 commit

  • Run PCI driver initialization on local node

    Instead of adding messy kmalloc_node()s everywhere, run the
    PCI driver probe on the node local to the device.

    This would not have helped for IDE, but should for
    other, cleaner drivers that do more initialization in probe().
    It won't help for drivers that do most of the work
    on first open (like many network drivers).

    Signed-off-by: Andi Kleen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     

08 Sep, 2005

3 commits

  • This patch introduces a kzalloc wrapper and converts kernel/ to use it. It
    saves a little program text.
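
    The idea, shown as a standalone libc analogue (the in-kernel kzalloc
    naturally uses kmalloc and gfp flags rather than malloc):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* allocate and zero in one call, so callers can drop their memsets */
    static void *zalloc(size_t size)
    {
            void *p = malloc(size);

            if (p)
                    memset(p, 0, size);
            return p;
    }

    int main(void)
    {
            int *v = zalloc(16 * sizeof(*v));

            printf("v[7] = %d\n", v ? v[7] : -1);
            free(v);
            return 0;
    }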

    Signed-off-by: Pekka Enberg
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka J Enberg
     
  • Now the real motivation for this cpuset mem_exclusive patch series seems
    trivial.

    This patch keeps a task in or under one mem_exclusive cpuset from provoking an
    oom kill of a task under a non-overlapping mem_exclusive cpuset. Since only
    interrupt and GFP_ATOMIC allocations are allowed to escape mem_exclusive
    containment, there is little to gain from oom killing a task under a
    non-overlapping mem_exclusive cpuset, as almost all kernel and user memory
    allocation must come from disjoint memory nodes.

    This patch enables configuring a system so that a runaway job under one
    mem_exclusive cpuset cannot cause the killing of a job in another such cpuset
    that might be using very high compute and memory resources for a prolonged
    time.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • This patch makes use of the previously underutilized cpuset flag
    'mem_exclusive' to provide what amounts to another layer of memory placement
    resolution. With this patch, there are now the following four layers of
    memory placement available:

    1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
    2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
    3) The current task's cpuset (GFP_USER allocations constrained to here), and
    4) Specific node placement, using mbind and set_mempolicy.

    These nest - each layer is a subset (same or within) of the previous.

    Layer (2) above is new, with this patch. The call used to check whether a
    zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
    extended to take a gfp_mask argument, and its logic is extended, in the case
    that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
    hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
    placement is allowed. The definition of GFP_USER, which used to be identical
    to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
    cpuset_gfp_hardwall_flag patch.

    GFP_ATOMIC and GFP_KERNEL allocations will stay within the current task's
    cpuset, so long as any node therein is not too tight on memory, but will
    escape to the larger layer, if need be.
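
    A standalone sketch (invented names, not kernel code) of the lookup: with
    __GFP_HARDWALL the current cpuset is the boundary; without it, walk up to
    the nearest enclosing mem_exclusive cpuset before testing the node.

    #include <stdbool.h>
    #include <stdio.h>

    struct cpuset_model {
            struct cpuset_model *parent;
            bool mem_exclusive;
            unsigned long mems_allowed;     /* bitmask of memory nodes */
    };

    static bool node_allowed(const struct cpuset_model *cs, int node,
                             bool hardwall)
    {
            if (!hardwall)                  /* GFP_KERNEL-style request */
                    while (cs->parent && !cs->mem_exclusive)
                            cs = cs->parent;
            return cs->mems_allowed & (1UL << node);
    }

    int main(void)
    {
            struct cpuset_model root = { NULL,  true,  0xff };
            struct cpuset_model box  = { &root, true,  0x0f }; /* mem_exclusive */
            struct cpuset_model job  = { &box,  false, 0x03 }; /* task's cpuset */

            printf("hardwall   node 2: %d\n", node_allowed(&job, 2, true));
            printf("GFP_KERNEL node 2: %d\n", node_allowed(&job, 2, false));
            return 0;
    }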

    The intended use is to allow something like a batch manager to handle several
    jobs, each job in its own cpuset, but using common kernel memory for caches
    and such. Swapper and oom_kill activity is also constrained to Layer (2). A
    task in or below one mem_exclusive cpuset should not cause swapping on nodes
    in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
    task in another such cpuset. Heavy use of kernel memory for i/o caching and
    such by one job should not impact the memory available to jobs in other
    non-overlapping mem_exclusive cpusets.

    This patch enables providing hardwall, inescapable cpusets for memory
    allocations of each job, while sharing kernel memory allocations between
    several jobs, in an enclosing mem_exclusive cpuset.

    Like Dinakar's patch earlier to enable administering sched domains using the
    cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
    that had previously done nothing much useful other than restrict what cpuset
    configurations were allowed.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson