26 Sep, 2006

40 commits

  • Make the FRV arch use the generic IRQ code rather than having its own
    routines for doing so.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • As David Howells points out, binfmt_elf sometimes uses
    off_t, sometimes uses loff_t. Use loff_t throughout.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Take tty_mutex when accessing ->signal->tty in selinux code. Noted by Alan
    Cox. Longer term, we are looking at refactoring the code to provide better
    encapsulation of the tty layer, but this is a simple fix that addresses the
    immediate bug.
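
    For reference, the access pattern being protected is roughly the following
    (an illustrative kernel-style sketch, not the exact patched hook; assumes
    <linux/tty.h> and <linux/sched.h>):

    struct tty_struct *tty;

    mutex_lock(&tty_mutex);
    tty = current->signal->tty;
    if (tty) {
            /* inspect the controlling tty while holding tty_mutex */
    }
    mutex_unlock(&tty_mutex);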

    Signed-off-by: Stephen Smalley
    Acked-by: Alan Cox
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • This patch converts the semaphore in the superblock security struct to a
    mutex. No locking changes or other code changes are done.
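
    The conversion is a mechanical mapping from the old semaphore API to the
    mutex API, roughly as follows (the sbsec->lock field name is shown as an
    assumption):

    struct semaphore sem;       /* becomes */  struct mutex lock;
    init_MUTEX(&sbsec->sem);    /* becomes */  mutex_init(&sbsec->lock);
    down(&sbsec->sem);          /* becomes */  mutex_lock(&sbsec->lock);
    up(&sbsec->sem);            /* becomes */  mutex_unlock(&sbsec->lock);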

    Signed-off-by: Eric Paris
    Acked-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • This patch converts the remaining isec->sem into a mutex. The locking is
    essentially the same as before, only using the faster, smaller mutex
    rather than a semaphore. An out_unlock path is introduced rather than the
    conditional unlocking found in the original code.
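
    The out_unlock path is the usual kernel error-handling idiom, roughly
    (a sketch only; first_step/second_step and the isec->lock field name are
    illustrative):

    int rc;

    mutex_lock(&isec->lock);
    rc = first_step(isec);
    if (rc)
            goto out_unlock;
    rc = second_step(isec);
    out_unlock:
    mutex_unlock(&isec->lock);
    return rc;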

    Signed-off-by: Eric Paris
    Acked-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • inode_security_set_sid is only called by security_inode_init_security, which
    is called when a new file is being created and needs to have its incore
    security state initialized and its security xattr set. This helper used to
    be called from other places as well, but this is now its only caller. So
    this patch rolls inode_security_set_sid directly back into
    security_inode_init_security. There is also no need to hold the isec->sem
    while doing this, as the inode is not available to other threads at this
    point in time.

    Signed-off-by: Eric Paris
    Acked-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Introduces support for policy version 21. This version of the binary
    kernel policy allows for defining range transitions on security classes
    other than the process security class. As always, backwards compatibility
    for older formats is retained. The security class is read in as specified
    when using the new format, while the "process" security class is assumed
    when using an older policy format.

    Signed-off-by: Darrel Goeddel
    Signed-off-by: Stephen Smalley
    Acked-by: James Morris
    Acked-by: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrel Goeddel
     
  • Enable configuration of SELinux maximum supported policy version to support
    legacy userland (init) that does not gracefully handle kernels that support
    newer policy versions two or more beyond the installed policy, as in FC3
    and FC4.

    [bunk@stusta.de: improve Kconfig help text]
    Signed-off-by: Stephen Smalley
    Acked-by: James Morris
    Acked-by: Eric Paris
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • Replace ctxid with sid in selinux_audit_rule_match interface for
    consistency with other interfaces.

    Signed-off-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • Rename selinux_ctxid_to_string to selinux_sid_to_string to be
    consistent with other interfaces.

    Signed-off-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • Eliminate selinux_task_ctxid since it duplicates selinux_task_get_sid.

    Signed-off-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • There are many places where we need to determine the node of a zone.
    Currently we use a difficult-to-read sequence of pointer dereferences.
    Put that into an inline function and use it throughout the VM. Maybe we
    can find a way to optimize the lookup in the future.
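
    The helper is essentially a one-liner along these lines (a sketch of the
    idea; the real patch also has to cover the !CONFIG_NUMA case):

    static inline int zone_to_nid(struct zone *zone)
    {
            return zone->zone_pgdat->node_id;
    }

    Callers can then write zone_to_nid(zone) instead of open-coding the
    pointer chase.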

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • I found two locations in hugetlb.c where we chase pointers instead of using
    page_to_nid(). page_to_nid() is more efficient and can get the node
    directly from the page flags.
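
    The change amounts to replacing the pointer chase with the flags-based
    helper, roughly:

    /* before: chase zone and pgdat pointers */
    nid = page_zone(page)->zone_pgdat->node_id;

    /* after: read the node id encoded in page->flags */
    nid = page_to_nid(page);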

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Update the comments for __oom_kill_task() to reflect the code changes.

    Signed-off-by: Ram Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ram Gupta
     
  • Minor performance fix.

    If we reclaimed enough slab pages from a zone then we can avoid going off
    node with the current allocation. Take care of updating nr_reclaimed when
    reclaiming from the slab.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently one can enable slab reclaim by setting an explicit option in
    /proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final
    option if freeing unmapped file-backed pages does not free enough pages to
    allow a local allocation.

    However, that means that the slab can grow excessively and that most memory
    of a node may be used by slabs. We have had a case where a machine with
    46GB of memory was using 40-42GB for slab. Zone reclaim was effective in
    dealing with pagecache pages. However, slab reclaim was only done during
    global reclaim (which is a bit rare on NUMA systems).

    This patch implements slab reclaim during zone reclaim. Zone reclaim
    occurs if there is a danger of an off node allocation. At that point we

    1. Shrink the per node page cache if the number of pagecache
    pages is more than min_unmapped_ratio percent of pages in a zone.

    2. Shrink the slab cache if the number of the node's reclaimable slab pages
    (this patch depends on an earlier one that implements that counter)
    is more than min_slab_ratio (a new /proc/sys/vm tunable).

    The shrinking of the slab cache is a bit problematic since it is not node
    specific. So we simply calculate what point in the slab we want to reach
    (current per-node slab use minus the number of pages that need to be
    allocated) and then repeatedly run global reclaim until that is
    unsuccessful or we have reached the limit. I hope we will have zone-based
    slab reclaim at some point, which will make that easier.

    The default for min_slab_ratio is 5%.

    Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
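
    The slab side of zone reclaim then boils down to a loop along these lines
    (an illustrative sketch; zone->min_slab_pages, nr_pages, sc, gfp_mask and
    lru_pages are taken from the surrounding zone_reclaim() context and are
    assumptions here):

    if (zone_page_state(zone, NR_SLAB_RECLAIMABLE) > zone->min_slab_pages) {
            /* target: current reclaimable slab minus what we need */
            unsigned long target =
                    zone_page_state(zone, NR_SLAB_RECLAIMABLE) - nr_pages;

            /*
             * shrink_slab() is not node specific, so repeatedly run
             * global slab reclaim until the target is reached or no
             * further progress is made.
             */
            while (zone_page_state(zone, NR_SLAB_RECLAIMABLE) > target &&
                   shrink_slab(sc.nr_scanned, gfp_mask, lru_pages))
                    ;
    }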

    [akpm@osdl.org: cleanups]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the atomic counter for slab_reclaim_pages and replace the counter
    and NR_SLAB with two ZVC counters that account for unreclaimable and
    reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.

    Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The
    intent seems to be to check for slab pages that could be freed.
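
    In practice, reads of the old global atomic become ZVC lookups, roughly:

    /* before: one global atomic counter */
    reclaimable = atomic_read(&slab_reclaim_pages);

    /* after: per-zone ZVC counters, readable globally or per zone */
    reclaimable = global_page_state(NR_SLAB_RECLAIMABLE);
    zone_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);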

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • *_pages is a better description of the role of the variable.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The allocpercpu functions __alloc_percpu and __free_percpu() make heavy
    use of the slab allocator. However, they are not conceptually slab
    operations. This also simplifies SLOB (at this point SLOB may be broken
    in mm; this should fix it).

    Signed-off-by: Christoph Lameter
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If a zone is unpopulated then we do not need to check for pages that are to
    be drained, nor for vm counters that may need to be updated.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • free_one_page currently adds the page to a fake list and calls
    free_page_bulk. free_page_bulk takes it off again and then calls
    __free_one_page.

    Make free_one_page go directly to __free_one_page. This saves the list
    on/off and a temporary list in free_one_page for higher-order pages.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • One of the changes necessary for shared page tables is to standardize the
    pxx_page macros. pte_page and pmd_page have always returned the struct
    page associated with their entry, while pte_page_kernel and pmd_page_kernel
    have returned the kernel virtual address. pud_page and pgd_page, on the
    other hand, return the kernel virtual address.

    Shared page tables need pud_page and pgd_page to return the actual page
    structures. There are very few actual users of these functions, so it is
    simple to standardize their usage.

    Since this is basic cleanup, I am submitting these changes as a standalone
    patch. Per Hugh Dickins' comments about it, I am also changing the
    pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.
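
    After the rename, the distinction at the pmd level looks roughly like this
    (illustrative fragment):

    struct page *page;
    pte_t *ptep;

    page = pmd_page(*pmd);                  /* struct page of the pte page */
    ptep = (pte_t *)pmd_page_vaddr(*pmd);   /* its kernel virtual address  */

    pud_page() and pgd_page() now likewise return struct page pointers, with
    pud_page_vaddr() and pgd_page_vaddr() returning the virtual address.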

    Signed-off-by: Dave McCracken
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave McCracken
     
  • On high-end systems (1024 or so CPUs) this can potentially cause a stack
    overflow. Fix the stack usage.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
     
  • In many places we will need to use the same combination of flags. Specify
    a single GFP_THISNODE definition for ease of use in gfp.h.
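
    In gfp.h this is just a convenience combination of existing flags, along
    the lines of (the exact flag set is shown here as an assumption):

    #ifdef CONFIG_NUMA
    #define GFP_THISNODE    (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
    #else
    #define GFP_THISNODE    ((__force gfp_t)0)
    #endif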

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Profiling really suffers with off node buffers. Fail if no memory is
    available on the nodes. The profiling code can deal with these failures
    should they occur.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • There are frequent references to *z in get_page_from_freelist.

    Add an explicit zone variable that can be used in all these places.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The uncached allocator manages per-node pools. Specify __GFP_THISNODE in
    order to force allocation on the indicated node or fail. The uncached
    allocator already has logic to deal with failing allocations.

    Signed-off-by: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If the user specified a node where we should move the page to then we
    really do not want any other node.

    Signed-off-by: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • …mory policy restrictions

    Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This
    flag is essential if a kernel component requires memory to be located on a
    certain node. It will be needed for alloc_pages_node() to force allocation
    on the indicated node and for alloc_pages() to force allocation on the
    current node.
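
    Usage is then along these lines (illustrative):

    /* allocate one page on node nid, or fail -- no fallback to other nodes */
    struct page *page = alloc_pages_node(nid, GFP_KERNEL | __GFP_THISNODE, 0);

    if (!page)
            return -ENOMEM;         /* the caller handles the failure */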

    Signed-off-by: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Place the alien array cache locks of on-slab malloc slab caches on a
    separate lockdep class. This avoids false positives from lockdep.

    [akpm@osdl.org: build fix]
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Cc: Thomas Gleixner
    Acked-by: Arjan van de Ven
    Cc: Ingo Molnar
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • It is fairly easy to get a system to oops by simply sizing a cache via
    /proc in such a way that one of the caches (shared is easiest) becomes
    bigger than the maximum allowed slab allocation size. This occurs because
    enable_cpucache() fails if it cannot reallocate some caches.

    However, enable_cpucache() is used for multiple purposes: resizing caches,
    cache creation and bootstrap.

    If the slab is already up then we already have working caches. The resize
    can fail without a problem. We just need to return the proper error code.
    F.e. after this patch:

    # echo "size-64 10000 50 1000" >/proc/slabinfo
    -bash: echo: write error: Cannot allocate memory

    notice no OOPS.

    If we are doing a kmem_cache_create() then we also should not panic but
    return -ENOMEM.

    If on the other hand we do not have a fully bootstrapped slab allocator yet
    then we should indeed panic since we are unable to bring up the slab to its
    full functionality.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The ability to free memory allocated to a slab cache is also useful if an
    error occurs during setup of a slab. So extract the function.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • [akpm@osdl.org: export fix]
    Signed-off-by: Christoph Hellwig
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Let's try to keep mm/ comments more useful and up to date. This is a start.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Also, check that we get a valid slabp_cache for off-slab slab descriptors.
    We should always get this. If we don't, we will have to disable off-slab
    descriptors for this cache and do the calculations again. This is a rare
    case, so add a BUG_ON for now, just in case.

    Signed-off-by: Alok N Kataria
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Cc: Pekka Enberg
    Cc: Manfred Spraul
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • Introduce ARCH_LOW_ADDRESS_LIMIT which can be set per architecture to
    override the 4GB default limit used by the bootmem allocator within
    __alloc_bootmem_low() and __alloc_bootmem_low_node(). E.g. s390 needs a
    2GB limit instead of 4GB.
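
    An architecture can then override the limit in its headers, e.g. (the s390
    value is shown as an assumption):

    /* generic default in <linux/bootmem.h> */
    #ifndef ARCH_LOW_ADDRESS_LIMIT
    #define ARCH_LOW_ADDRESS_LIMIT  0xffffffffUL    /* 4GB */
    #endif

    /* s390 override */
    #define ARCH_LOW_ADDRESS_LIMIT  0x7fffffffUL    /* 2GB */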

    Acked-by: Ingo Molnar
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Print the name of the task invoking the OOM killer. Could make debugging
    easier.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Skip kernel threads, rather than having them return 0 from badness.
    Theoretically, badness might truncate all results to 0, thus a kernel thread
    might be picked first, causing an infinite loop.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • PF_SWAPOFF processes currently cause select_bad_process to return straight
    away. Instead, give them high priority so we will kill them first; however,
    we also first ensure that no parallel OOM kills are happening at the same
    time.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Having the oomkilladj == OOM_DISABLE check before the releasing check means
    that oomkilladj == OOM_DISABLE tasks exiting will not stop the OOM killer.

    Moving the test down will give the desired behaviour. Also: it will allow
    them to "OOM-kill" themselves if they are exiting. As per the previous patch,
    this is required to prevent OOM killer deadlocks (and they don't actually get
    killed, because they're already exiting -- they're simply allowed access to
    memory reserves).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin