26 Sep, 2006
40 commits
-
Make the FRV arch use the generic IRQ code rather than having its own
routines for doing so.
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As David Howells points out, binfmt_elf sometimes uses
off_t, sometimes uses loff_t. Use loff_t throughout.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Take tty_mutex when accessing ->signal->tty in selinux code. Noted by Alan
Cox. Longer term, we are looking at refactoring the code to provide better
encapsulation of the tty layer, but this is a simple fix that addresses the
immediate bug.
Signed-off-by: Stephen Smalley
Acked-by: Alan Cox
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
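A minimal sketch of the locking pattern described above (the surrounding
SELinux hook code is elided; selinux_check_tty() is a hypothetical stand-in):

    struct tty_struct *tty;

    /* take tty_mutex so ->signal->tty cannot be released under us */
    mutex_lock(&tty_mutex);
    tty = current->signal->tty;
    if (tty)
        selinux_check_tty(tty);   /* hypothetical helper */
    mutex_unlock(&tty_mutex);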
This patch converts the semaphore in the superblock security struct to a
mutex. No locking changes or other code changes are done.
Signed-off-by: Eric Paris
Acked-by: Stephen Smalley
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
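The conversion follows the usual semaphore-to-mutex pattern; a sketch
(the field and variable names here are illustrative):

    /* before: counting semaphore used as a lock */
    struct semaphore sem;
    init_MUTEX(&sbsec->sem);
    down(&sbsec->sem);
    /* ... critical section ... */
    up(&sbsec->sem);

    /* after: a real mutex */
    struct mutex lock;
    mutex_init(&sbsec->lock);
    mutex_lock(&sbsec->lock);
    /* ... critical section ... */
    mutex_unlock(&sbsec->lock);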
This patch converts the remaining isec->sem into a mutex. The locking is
very similar to before, only now in a faster, smaller mutex rather than a
semaphore. An out_unlock path is introduced in place of the conditional
unlocking found in the original code.
Signed-off-by: Eric Paris
Acked-by: Stephen Smalley
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
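A sketch of the out_unlock pattern mentioned above (the checks shown are
hypothetical stand-ins for the real SELinux logic):

    int rc;

    mutex_lock(&isec->lock);
    rc = first_check(isec);      /* hypothetical */
    if (rc)
        goto out_unlock;
    rc = second_check(isec);     /* hypothetical */
out_unlock:
    mutex_unlock(&isec->lock);
    return rc;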
inode_security_set_sid is only called by security_inode_init_security, which
is called when a new file is being created and needs to have its incore
security state initialized and its security xattr set. This helper used to be
called from other places in the past, but now has only the one caller. So this
patch rolls inode_security_set_sid directly back into
security_inode_init_security. There is also no need to hold the isec->sem
while doing this, as the inode is not available to other threads at this point
in time.
Signed-off-by: Eric Paris
Acked-by: Stephen Smalley
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduces support for policy version 21. This version of the binary
kernel policy allows for defining range transitions on security classes
other than the process security class. As always, backwards compatibility
for older formats is retained. The security class is read in as specified
when using the new format, while the "process" security class is assumed
when using an older policy format.
Signed-off-by: Darrel Goeddel
Signed-off-by: Stephen Smalley
Acked-by: James Morris
Acked-by: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Enable configuration of SELinux maximum supported policy version to support
legacy userland (init) that does not gracefully handle kernels that support
newer policy versions two or more beyond the installed policy, as in FC3
and FC4.
[bunk@stusta.de: improve Kconfig help text]
Signed-off-by: Stephen Smalley
Acked-by: James Morris
Acked-by: Eric Paris
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace ctxid with sid in selinux_audit_rule_match interface for
consistency with other interfaces.
Signed-off-by: Stephen Smalley
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Rename selinux_ctxid_to_string to selinux_sid_to_string to be
consistent with other interfaces.
Signed-off-by: Stephen Smalley
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Eliminate selinux_task_ctxid since it duplicates selinux_task_get_sid.
Signed-off-by: Stephen Smalley
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are many places where we need to determine the node of a zone.
Currently we use a difficult-to-read sequence of pointer dereferences.
Put that into an inline function and use it throughout the VM. Maybe we can
find a way to optimize the lookup in the future.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
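The helper presumably wraps the pointer chase in one place; a sketch
consistent with the description:

    static inline int zone_to_nid(struct zone *zone)
    {
        return zone->zone_pgdat->node_id;
    }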
I found two locations in hugetlb.c where we chase pointers instead of using
page_to_nid(). page_to_nid() is more efficient and can get the node directly
from the page flags.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Update the comments for __oom_kill_task() to reflect the code changes.
Signed-off-by: Ram Gupta
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Minor performance fix.
If we reclaimed enough slab pages from a zone then we can avoid going off
node with the current allocation. Take care of updating nr_reclaimed when
reclaiming from the slab.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently one can enable slab reclaim by setting an explicit option in
/proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final
option if the freeing of unmapped file backed pages is not enough to free
enough pages to allow a local allocation.

However, that means that the slab can grow excessively and that most memory
of a node may be used by slabs. We have had a case where a machine with
46GB of memory was using 40-42GB for slab. Zone reclaim was effective in
dealing with pagecache pages. However, slab reclaim was only done during
global reclaim (which is a bit rare on NUMA systems).

This patch implements slab reclaim during zone reclaim. Zone reclaim
occurs if there is a danger of an off node allocation. At that point we
1. Shrink the per node page cache if the number of pagecache pages is more
   than min_unmapped_ratio percent of pages in a zone.
2. Shrink the slab cache if the number of the node's reclaimable slab pages
   (patch depends on an earlier one that implements that counter) is more
   than min_slab_ratio (a new /proc/sys/vm tunable).

The shrinking of the slab cache is a bit problematic since it is not node
specific. So we simply calculate what point in the slab we want to reach
(current per node slab use minus the number of pages that need to be
allocated) and then repeatedly run the global reclaim until that is
unsuccessful or we have reached the limit. I hope we will have zone based
slab reclaim at some point which will make that easier.

The default for min_slab_ratio is 5%.

Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
[akpm@osdl.org: cleanups]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
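A sketch of the slab side of zone reclaim as described above (the
shrink_slab() arguments and the min_slab_pages field are illustrative):

    if (zone_page_state(zone, NR_SLAB_RECLAIMABLE) > zone->min_slab_pages) {
        /*
         * Target: current per node slab use minus the number of
         * pages needed for the allocation.
         */
        unsigned long target =
            zone_page_state(zone, NR_SLAB_RECLAIMABLE) - nr_pages;

        /*
         * shrink_slab() is not node specific, so repeatedly run
         * global slab reclaim until we reach the target or it
         * stops making progress.
         */
        while (shrink_slab(nr_scanned, gfp_mask, lru_pages) &&
               zone_page_state(zone, NR_SLAB_RECLAIMABLE) > target)
            ;
    }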
Remove the atomic counter for slab_reclaim_pages and replace the counter
and NR_SLAB with two ZVC counters that account for unreclaimable and
reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.

Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The intent
seems to be to check for slab pages that could be freed.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
*_pages is a better description of the role of the variable.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The allocpercpu functions __alloc_percpu and __free_percpu() are heavily
using the slab allocator. However, they are conceptually separate from it,
so extract them from the slab allocator. This also simplifies SLOB (at this
point slob may be broken in mm; this should fix it).
Signed-off-by: Christoph Lameter
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If a zone is unpopulated then we do not need to check for pages to be
drained, nor for vm counters that may need to be updated.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
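Presumably just an early bail-out in the iteration; a sketch (the drain
helper is a hypothetical stand-in):

    for_each_zone(zone) {
        /* skip zones without memory: nothing to drain or fold */
        if (!populated_zone(zone))
            continue;
        drain_zone_pages(zone);   /* hypothetical */
    }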
free_one_page currently adds the page to a fake list and calls
free_pages_bulk; free_pages_bulk takes it off again and then calls
__free_one_page.

Make free_one_page go directly to __free_one_page. This saves the list
on/off and a temporary list in free_one_page for higher order pages.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
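A sketch of the simplified path (locking shown; the real function also
resets some zone accounting):

    static void free_one_page(struct zone *zone, struct page *page, int order)
    {
        spin_lock(&zone->lock);
        zone->pages_scanned = 0;
        __free_one_page(page, zone, order);  /* no temporary list */
        spin_unlock(&zone->lock);
    }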
One of the changes necessary for shared page tables is to standardize the
pxx_page macros. pte_page and pmd_page have always returned the struct
page associated with their entry, while pte_page_kernel and pmd_page_kernel
have returned the kernel virtual address. pud_page and pgd_page, on the
other hand, return the kernel virtual address.

Shared page tables need pud_page and pgd_page to return the actual page
structures. There are very few actual users of these functions, so it is
simple to standardize their usage.

Since this is basic cleanup, I am submitting these changes as a standalone
patch. Per Hugh Dickins' comments about it, I am also changing the
pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.
Signed-off-by: Dave McCracken
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
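The resulting convention, sketched for one hypothetical architecture (the
macro bodies vary per arch):

    /* pxx_page returns the struct page of the entry... */
    #define pmd_page(pmd)  pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
    #define pud_page(pud)  pfn_to_page(pud_val(pud) >> PAGE_SHIFT)

    /* ...while pxx_page_vaddr (formerly pxx_page_kernel) returns the
       kernel virtual address of the page table it points to */
    #define pmd_page_vaddr(pmd) \
        ((unsigned long)__va(pmd_val(pmd) & PAGE_MASK))
    #define pud_page_vaddr(pud) \
        ((unsigned long)__va(pud_val(pud) & PAGE_MASK))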
On High end systems (1024 or so cpus) this can potentially cause stack
overflow. Fix the stack usage.
Signed-off-by: Suresh Siddha
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In many places we will need to use the same combination of flags. Specify
a single GFP_THISNODE definition for ease of use in gfp.h.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
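The combined definition is presumably along these lines (the exact
companion flags are an assumption):

    /* include/linux/gfp.h */
    #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)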
Profiling really suffers with off node buffers. Fail if no memory is
available on the nodes. The profiling code can deal with these failures
should they occur.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are frequent references to *z in get_page_from_freelist.
Add an explicit zone variable that can be used in all these places.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The uncached allocator manages per node pools. Specify __GFP_THISNODE in
order to force allocation on the indicated node or fail. The uncached
allocator already has logic to deal with failing allocations.
Signed-off-by: Christoph Lameter
Cc: Andy Whitcroft
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the user specified a node to which we should move the page, then we
really do not want any other node.
Signed-off-by: Christoph Lameter
Cc: Andy Whitcroft
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This
flag is essential if a kernel component requires memory to be located on a
certain node. It will be needed for alloc_pages_node() to force allocation
on the indicated node and for alloc_pages() to force allocation on the
current node.
Signed-off-by: Christoph Lameter
Cc: Andy Whitcroft
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
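Typical use, per the description: pass the flag so the allocation either
lands on the requested node or fails outright:

    /* fail rather than fall back to another node */
    page = alloc_pages_node(nid, GFP_KERNEL | __GFP_THISNODE, order);
    if (!page)
        return -ENOMEM;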
Place the alien array cache locks of the on-slab kmalloc caches on a
separate lockdep class. This avoids false positives from lockdep.
[akpm@osdl.org: build fix]
Signed-off-by: Ravikiran Thirumalai
Signed-off-by: Shai Fultheim
Cc: Thomas Gleixner
Acked-by: Arjan van de Ven
Cc: Ingo Molnar
Cc: Pekka Enberg
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
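A sketch of the lockdep annotation (the key name is illustrative; alc is
the alien array cache):

    static struct lock_class_key on_slab_alc_key;

    /* give the alien cache lock its own class so lockdep does not
       conflate it with the locks of the cache it is allocated from */
    lockdep_set_class(&alc->lock, &on_slab_alc_key);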
It is fairly easy to get a system to oops by simply sizing a cache via
/proc in such a way that one of the caches (shared is easiest) becomes
bigger than the maximum allowed slab allocation size. This occurs because
enable_cpucache() fails if it cannot reallocate some caches.

However, enable_cpucache() is used for multiple purposes: resizing caches,
cache creation and bootstrap.

If the slab is already up then we already have working caches. The resize
can fail without a problem. We just need to return the proper error code.
F.e. after this patch:

  # echo "size-64 10000 50 1000" >/proc/slabinfo
  -bash: echo: write error: Cannot allocate memory

Notice: no OOPS.

If we are doing a kmem_cache_create() then we also should not panic but
should return -ENOMEM.

If on the other hand we do not have a fully bootstrapped slab allocator
yet then we should indeed panic since we are unable to bring up the slab
to its full functionality.
Signed-off-by: Christoph Lameter
Cc: Pekka Enberg
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
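A sketch of the resulting policy (control flow simplified):

    if (enable_cpucache(cachep)) {
        if (!slab_is_available())
            /* bootstrap: we cannot run without working caches */
            panic("enable_cpucache failed for %s\n", cachep->name);
        /* resize via /proc or kmem_cache_create(): fail gracefully */
        return -ENOMEM;
    }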
The ability to free memory allocated to a slab cache is also useful if an
error occurs during setup of a slab. So extract the function.
Signed-off-by: Christoph Lameter
Cc: Pekka Enberg
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
[akpm@osdl.org: export fix]
Signed-off-by: Christoph Hellwig
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Let's try to keep mm/ comments more useful and up to date. This is a start.
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Also, check whether we get a valid slabp_cache for off-slab slab descriptors.
We should always get one. If we don't, we will have to disable off-slab
descriptors for this cache and do the calculations again. This is a rare
case, so add a BUG_ON, for now, just in case.
Signed-off-by: Alok N Kataria
Signed-off-by: Ravikiran Thirumalai
Signed-off-by: Shai Fultheim
Cc: Pekka Enberg
Cc: Manfred Spraul
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
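A sketch of the check, assuming the era's kmem_find_general_cachep() is
the lookup used:

    if (flags & CFLGS_OFF_SLAB) {
        cachep->slabp_cache = kmem_find_general_cachep(slab_size, 0u);
        /*
         * We should always find a cache to hold the off-slab
         * descriptor; otherwise we would have to fall back to
         * on-slab descriptors and redo the calculation.
         */
        BUG_ON(!cachep->slabp_cache);
    }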
Introduce ARCH_LOW_ADDRESS_LIMIT which can be set per architecture to
override the 4GB default limit used by the bootmem allocator within
__alloc_bootmem_low() and __alloc_bootmem_low_node(). E.g. s390 needs a
2GB limit instead of 4GB.
Acked-by: Ingo Molnar
Cc: Martin Schwidefsky
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
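Presumably a generic default with a per-architecture override, e.g.:

    /* include/linux/bootmem.h: generic default, stay below 4GB */
    #ifndef ARCH_LOW_ADDRESS_LIMIT
    #define ARCH_LOW_ADDRESS_LIMIT  0xffffffffUL
    #endif

    /* an arch such as s390 can then define, in its own headers: */
    #define ARCH_LOW_ADDRESS_LIMIT  0x7fffffffUL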
Print the name of the task invoking the OOM killer. Could make debugging
easier.
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
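Probably a one-line addition along these lines (the exact format string is
an assumption):

    printk(KERN_WARNING "%s invoked oom-killer: gfp_mask=0x%x, order=%d\n",
           current->comm, gfp_mask, order);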
Skip kernel threads, rather than having them return 0 from badness.
Theoretically, badness might truncate all results to 0, thus a kernel thread
might be picked first, causing an infinite loop.
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
PF_SWAPOFF processes currently cause select_bad_process to return straight
away. Instead, give them high priority, so we will kill them first; however,
we also first ensure that no parallel OOM kills are happening at the same
time.
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Having the oomkilladj == OOM_DISABLE check before the releasing check means
that exiting oomkilladj == OOM_DISABLE tasks will not stop the OOM killer.

Moving the test down gives the desired behaviour. Also: it will allow such
tasks to "OOM-kill" themselves if they are exiting. As per the previous
patch, this is required to prevent OOM killer deadlocks (and they don't
actually get killed, because they're already exiting -- they're simply
allowed access to memory reserves).
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
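A simplified sketch of the reordered checks in select_bad_process
("releasing" covers tasks that are exiting or already have TIF_MEMDIE,
i.e. were already chosen for an OOM kill):

    releasing = test_tsk_thread_flag(p, TIF_MEMDIE) ||
                (p->flags & PF_EXITING);
    if (releasing)
        return ERR_PTR(-1UL);  /* task is freeing memory; wait for it */

    /* only now honour OOM_DISABLE, so exiting tasks above still count */
    if (p->oomkilladj == OOM_DISABLE)
        continue;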