11 May, 2007
3 commits
-
This was in SLUB in order to head off trouble while the nr_cpu_ids functionality was not yet merged. It's merged now, so there is no need to keep this.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It is referenced by memmap_init_zone (which is __meminit) via the early_pfn_in_nid macro when CONFIG_NODES_SPAN_OTHER_NODES is set (which basically means PowerPC 64).

This removes a section mismatch warning in those circumstances.
Signed-off-by: Stephen Rothwell
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Avoid atomic overhead in slab_alloc and slab_free
SLUB needs to use the slab_lock for the per cpu slabs to synchronize with potential kfree operations. This patch avoids that need by moving all free objects onto a lockless_freelist. The regular freelist continues to exist and will be used to free objects. So while we consume the lockless_freelist the regular freelist may build up objects.

If we are out of objects on the lockless_freelist then we may check the regular freelist. If it has objects then we move those over to the lockless_freelist and do this again. There is a significant savings in terms of atomic operations that have to be performed.

We can even free directly to the lockless_freelist if we know that we are running on the same processor. So this speeds up short-lived objects. They may be allocated and freed without taking the slab_lock. This is particularly good for netperf.

In order to maximize the effect of the new faster hotpath we extract the hottest performance pieces into inlined functions. These are then inlined into kmem_cache_alloc and kmem_cache_free. So hotpath allocation and freeing no longer requires a subroutine call within SLUB.

[I am not sure that it is worth doing this because it changes the easy-to-read structure of SLUB just to reduce atomic ops. However, there is someone out there with a benchmark on 4-way and 8-way processor systems that seems to show a 5% regression vs. SLAB (the regression seems to be due to SLUB's increased use of atomic operations vs. SLAB). I wonder if this is applicable or discernible at all in a real workload?]

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
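The fast path described above, as a minimal sketch (field and helper names follow the changelog; the surrounding structures are simplified and details such as interrupt disabling are elided):

/* Hot-path allocation: pop an object from the per-cpu lockless_freelist
 * without atomic operations. Only when it is empty do we fall back to
 * the locked slow path, which refills it from the regular freelist. */
static __always_inline void *slab_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
	struct page *page = s->cpu_slab[smp_processor_id()];
	void **object;

	if (unlikely(!page || !page->lockless_freelist))
		return __slab_alloc(s, gfpflags);	/* slow path: takes slab_lock */

	object = page->lockless_freelist;
	page->lockless_freelist = object[page->offset];	/* link to next free object */
	return object;
}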
10 May, 2007
28 commits
-
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Fix stacktrace simplification fallout.
sh: SH7760 DMABRG support.
sh: clockevent/clocksource/hrtimers/nohz TMU support.
sh: Truncate MAX_ACTIVE_REGIONS for the common case.
rtc: rtc-sh: Fix rtc_dev pointer for rtc_update_irq().
sh: Convert to common die chain.
sh: Wire up utimensat syscall.
sh: landisk mv_nr_irqs definition.
sh: Fixup ndelay() xloops calculation for alternate HZ.
sh: Add 32-bit opcode feature CPU flag.
sh: Fix PC adjustments for varying opcode length.
sh: Support for SH-2A 32-bit opcodes.
sh: Kill off redundant __div64_32 symbol export.
sh: Share exception vector table for SH-3/4.
sh: Always define TRAPA_BUG_OPCODE.
sh: __GFP_REPEAT for pte allocations, too.
rtc: rtc-sh: Fix up dev_dbg() warnings.
sh: generic quicklist support. -
Commit 6fe6900e1e5b6fa9e5c59aa5061f244fe3f467e2 introduced a nasty bug in read_cache_page_async().

It added a "mark_page_accessed(page)" at the final return path in read_cache_page_async(). But in error cases, 'page' holds the error code, and you can't mark it accessed.

[ and Glauber de Oliveira Costa points out that we can use a return instead of adding more goto's ]

Signed-off-by: David Howells
Acked-by: Nick Piggin
Signed-off-by: Linus Torvalds
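A minimal sketch of the shape of the fix (per the note above; __read_cache_page is the internal helper in mm/filemap.c of that era, and the exact upstream code may differ):

static struct page *read_cache_page_async(struct address_space *mapping,
					  pgoff_t index, filler_t *filler,
					  void *data)
{
	struct page *page = __read_cache_page(mapping, index, filler, data);

	if (IS_ERR(page))
		return page;	/* pointer encodes an errno: must not touch it */
	/* ... re-check page->mapping and uptodate state as before ... */
	mark_page_accessed(page);
	return page;
}
-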
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...

Fix trivial comment conflict in kernel/relay.c.
-
Currently the slab allocators contain callbacks into the page allocator to perform the draining of pagesets on remote nodes. This requires SLUB to have a whole subsystem in order to be compatible with SLAB. Moving node draining out of the slab allocators avoids a section of code in SLUB.

Move the node draining so that it is done when the vm statistics are updated. At that point we are already touching all the cachelines with the pagesets of a processor.

Add an expire counter there. If we have to update per zone or global vm statistics then assume that the pageset will require subsequent draining.

The expire counter will be decremented on each vm stats update pass until it reaches zero. Then we will drain one batch from the pageset. The draining will cause vm counter updates which will then cause another expiration until the pcp is empty. So we will drain a batch every 3 seconds.

Note that remote node draining is a somewhat esoteric feature that is required on large NUMA systems because otherwise significant portions of system memory can become trapped in pcp queues. The number of pcps is determined by the number of processors and nodes in a system. A system with 4 processors and 2 nodes has 8 pcps, which is okay. But a system with 1024 processors and 512 nodes has 512k pcps, with a high potential for large amounts of memory being caught in them.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
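A sketch of the expire-counter scheme (structure per the changelog; field and helper names are illustrative):

static void refresh_cpu_vm_stats(int cpu)
{
	struct zone *zone;

	for_each_zone(zone) {
		struct per_cpu_pageset *p = zone_pcp(zone, cpu);

		/* ... fold this cpu's vm stat diffs into the zone counters;
		 * any update re-arms p->expire ... */

		if (p->expire && !--p->expire && p->pcp.count)
			/* expired with pages still queued: drain one batch.
			 * The resulting counter updates re-arm the expire
			 * counter until the pcp is empty. */
			drain_zone_pages(zone, &p->pcp);
	}
}
-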
Make it configurable. The code in mm now makes the vm statistics interval independent from the cache reaper; use that opportunity to make it configurable.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
vmstat is currently using the cache reaper to periodically bring the statistics up to date. The cache reaper only exists in SLUB as a way to provide compatibility with SLAB. This patch removes the vmstat calls from the slab allocators and provides its own handling.

The advantage is also that we can use a different frequency for the updates. Refreshing vm stats is a pretty fast job so we can run this every second and stagger this by only one tick. This will lead to some overlap in large systems. For example, a system running at 250 HZ with 1024 processors will have 4 vm updates occurring at once.

However, the vm stats update only accesses per node information. It is only necessary to stagger the vm statistics updates per processor in each node. Vm counter updates occurring on distant nodes will not cause cacheline contention.

We could implement an alternate approach that runs the first processor on each node at the second and then each of the other processors on a node on a subsequent tick. That may be useful to keep a large amount of the second free of timer activity. Maybe the timer folks will have some feedback on this one?

[jirislaby@gmail.com: add missing break]
Cc: Arjan van de Ven
Signed-off-by: Christoph Lameter
Signed-off-by: Jiri Slaby
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
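A sketch of the staggered per-cpu timer described above (one update per second, offset by one tick per processor; vmstat_update is the per-cpu work function that reschedules itself):

static void __cpuinit start_cpu_timer(int cpu)
{
	struct delayed_work *work = &per_cpu(vmstat_work, cpu);

	INIT_DELAYED_WORK(work, vmstat_update);
	/* fire every HZ ticks, staggered by one tick per cpu */
	schedule_delayed_work_on(cpu, work, HZ + cpu);
}
-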
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki
Cc: Gautham R Shenoy
Cc: Pavel Machek
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
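For illustration, a hedged sketch of how a CPU-hotplug-aware subsystem consumes the new notifications (the *_FROZEN event names follow this patch; the handler body is hypothetical):

static int __cpuinit example_cpu_callback(struct notifier_block *nb,
					  unsigned long action, void *hcpu)
{
	switch (action) {
	case CPU_UP_PREPARE:
	case CPU_UP_PREPARE_FROZEN:	/* suspend/resume: same handling for now */
		/* allocate per-cpu state for cpu (long)hcpu */
		break;
	case CPU_DEAD:
	case CPU_DEAD_FROZEN:
		/* tear down per-cpu state */
		break;
	}
	return NOTIFY_OK;
}
-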
It's very common for file systems to need to zero part or all of a page; the simplest way is just to use kmap_atomic() and memset(). There's actually a library function in include/linux/highmem.h that does exactly that, but it's confusingly named memclear_highpage_flush(), which is descriptive of *how* it does the work rather than what the *purpose* is. So this patchset renames the function to zero_user_page(), and calls it from the various places that currently open-code it.

This first patch introduces the new function call, and converts all the core kernel callsites, both the open-coded ones and the old memclear_highpage_flush() ones. Following this patch is a series of conversions for each file system individually, per AKPM, and finally a patch deprecating the old call. The diffstat below shows the entire patchset.

[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Nate Diller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
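The new helper, roughly as described (shape inferred from the changelog and the old memclear_highpage_flush() in include/linux/highmem.h):

static inline void zero_user_page(struct page *page, unsigned int offset,
				  unsigned int size, enum km_type km)
{
	void *addr = kmap_atomic(page, km);	/* temporary atomic mapping */

	memset(addr + offset, 0, size);
	flush_dcache_page(page);
	kunmap_atomic(addr, km);
}
-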
Shut down the cache_reaper if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good.

Signed-off-by: Christoph Lameter
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
introduced.

Cc: Pekka Enberg
Cc: Srivatsa Vaddagiri
Cc: Gautham Shenoy
Signed-off-by: Heiko Carstens
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Export a couple of core functions for AFS write support to use:
find_get_pages_contig()
find_get_pages_tag()

Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When cpuset is configured, it breaks the strict hugetlb page reservation as the accounting is done on a global variable. Such reservation is completely rubbish in the presence of cpuset because the reservation is not checked against page availability for the current cpuset. Applications can still potentially be OOM'ed by the kernel for lack of free htlb pages in the cpuset that the task is in. Attempting to enforce strict accounting with cpuset is almost impossible (or too ugly) because cpuset is so fluid that tasks or memory nodes can be dynamically moved between cpusets.

The change of semantics for shared hugetlb mapping with cpuset is undesirable. However, in order to preserve some of the semantics, we fall back to checking against current free page availability as a best attempt, hopefully minimizing the impact of the changed semantics that cpuset has on hugetlb.

Signed-off-by: Ken Chen
Cc: Paul Jackson
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The internal hugetlb resv_huge_pages variable can permanently leak a nonzero value in the error path of the hugetlb page fault handler when hugetlb pages are used in combination with cpuset. The leaked count can permanently trap N hugetlb pages in the unusable "reserved" state.

Steps to reproduce the bug:
(1) create two cpusets, user1 and user2
(2) reserve 50 htlb pages in cpuset user1
(3) attempt to shmget/shmat 50 htlb pages inside cpuset user2
(4) kernel OOMs the user process in step 3
(5) ipcrm the shm segment

At this point resv_huge_pages will have a count of 49, even though there are no active hugetlbfs files nor hugetlb shared memory segments in the system. The leak is permanent and there is no recovery method other than a system reboot. The leaked count will hold up all future use of that many htlb pages in all cpusets.

The culprit is that the error path of alloc_huge_page() did not properly undo the change it made to resv_huge_pages, causing inconsistent state.

Signed-off-by: Ken Chen
Cc: David Gibson
Cc: Adam Litke
Cc: Martin Bligh
Acked-by: David Gibson
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
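A hedged sketch of the error-path fix (simplified from the shape of alloc_huge_page(); the helper names are illustrative):

static struct page *alloc_huge_page(struct vm_area_struct *vma,
				    unsigned long addr)
{
	struct page *page;

	spin_lock(&hugetlb_lock);
	if (vma->vm_flags & VM_MAYSHARE)
		resv_huge_pages--;		/* consume a reservation */
	page = dequeue_huge_page(vma, addr);
	if (!page)
		goto fail;
	spin_unlock(&hugetlb_lock);
	return page;

fail:
	if (vma->vm_flags & VM_MAYSHARE)
		resv_huge_pages++;		/* undo it; omitting this was the leak */
	spin_unlock(&hugetlb_lock);
	return NULL;
}
-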
No "blank" (or "*") line is allowed between the function name and lines for
it parameter(s).Cc: Randy Dunlap
Signed-off-by: Pekka Enberg
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
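For reference, a kernel-doc comment in the accepted form, with nothing between the function name and its parameter lines (the function shown is hypothetical):

/**
 * example_alloc - allocate an example object
 * @size: number of bytes to allocate
 * @flags: allocation flags
 *
 * The long description starts after the parameter block.
 */
-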
In some cases SLUB needlessly creates slabs that are larger than slub_max_order. Also the layout of some of the slabs was not satisfactory.

Go to an iterative approach.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have information about how long an object existed and about the nodes and cpus where the allocations and frees took place. Add that information to the tracking output in /sys/slab/xx/alloc_calls and /sys/slab/xx/free_calls.

This will then enable slabinfo to output nice reports like this:
christoph@qirst:~/slub$ ./slabinfo kmalloc-128
Slabcache: kmalloc-128      Aliases:  0    Order :  0

Sizes (bytes)     Slabs              Debug                 Memory
------------------------------------------------------------------------
Object :     128  Total  :      12   Sanity Checks : On    Total: 49152
SlabObj:     200  Full   :       7   Redzoning     : On    Used : 24832
SlabSiz:    4096  Partial:       4   Poisoning     : On    Loss : 24320
Loss   :      72  CpuSlab:       1   Tracking      : On    Lalig: 13968
Align  :       8  Objects:      20   Tracing       : Off   Lpadd:  1152

kmalloc-128 has no kmem_cache operations
kmalloc-128: Kernel object allocation
-----------------------------------------------------------------------
6 param_sysfs_setup+0x71/0x130 age=284512/284512/284512 pid=1 nodes=0-1,3
11 percpu_populate+0x39/0x80 age=283914/284428/284512 pid=1 nodes=0
21 __register_chrdev_region+0x31/0x170 age=282896/284347/284473 pid=1-1705 nodes=0-2
1 sys_inotify_init+0x76/0x1c0 age=283423 pid=1004 nodes=0
19 as_get_io_context+0x32/0xd0 age=6/247567/283988 pid=1-11782 nodes=0,2
10 ida_pre_get+0x4a/0x80 age=277666/283773/284526 pid=0-2177 nodes=0,2
24 kobject_kset_add_dir+0x37/0xb0 age=282727/283860/284472 pid=1-1723 nodes=0-2
1 acpi_ds_build_internal_buffer_obj+0xd3/0x11d age=284508 pid=1 nodes=0
24 con_insert_unipair+0xd7/0x110 age=284438/284438/284438 pid=1 nodes=0,2
1 uart_open+0x2d2/0x4b0 age=283896 pid=1 nodes=0
26 dma_pool_create+0x73/0x1a0 age=282762/282833/282916 pid=1705-1723 nodes=0
1 neigh_table_init_no_netlink+0xd2/0x210 age=284461 pid=1 nodes=0
2 neigh_parms_alloc+0x2b/0xe0 age=284410/284411/284412 pid=1 nodes=2
2 neigh_resolve_output+0x1e1/0x280 age=276289/276291/276293 pid=0-2443 nodes=0
1 netlink_kernel_create+0x90/0x170 age=284472 pid=1 nodes=0
4 xt_alloc_table_info+0x39/0xf0 age=283958/283958/283959 pid=1 nodes=1
3 fn_hash_insert+0x473/0x720 age=277653/277661/277666 pid=2177-2185 nodes=0
1 get_mtrr_state+0x285/0x2a0 age=284526 pid=0 nodes=0
1 cacheinfo_cpu_callback+0x26d/0x3e0 age=284458 pid=1 nodes=0
29 kernel_param_sysfs_setup+0x25/0x90 age=284511/284511/284512 pid=1 nodes=0-1,3
5 process_zones+0x5e/0x170 age=284546/284546/284546 pid=0 nodes=0
1 drm_core_init+0x48/0x160 age=284421 pid=1 nodes=2

kmalloc-128: Kernel object freeing
------------------------------------------------------------------------
163 age=4295176847 pid=0 nodes=0-3
1 __vunmap+0x6e/0xf0 age=282907 pid=1723 nodes=0
28 free_as_io_context+0x12/0x90 age=9243/262197/283474 pid=42-11754 nodes=0
1 acpi_get_object_info+0x1b7/0x1d4 age=284475 pid=1 nodes=0
1 do_acpi_find_child+0x45/0x4e age=284475 pid=1 nodes=0

NUMA nodes     :    0    1    2    3
------------------------------------------
All slabs           7    2    2    1
Partial slabs       2    2    0    0

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
CONFIG_SLUB_DEBUG can be used to switch off the debugging and sysfs components of SLUB. Thus SLUB will be able to replace SLOB. SLUB can arrange objects in a denser way than SLOB and the code size should be minimal without debugging and sysfs support.

Note that CONFIG_SLUB_DEBUG is materially different from CONFIG_SLAB_DEBUG. CONFIG_SLAB_DEBUG is used to enable slab debugging in SLAB. SLUB enables debugging via a boot parameter. SLUB debug code should always be present.

CONFIG_SLUB_DEBUG can be modified in the embedded config section.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
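A hedged sketch of the compile-time gating this enables (illustrative, not the exact upstream hunks): with CONFIG_SLUB_DEBUG disabled, the debug hooks compile down to empty stubs.

#ifdef CONFIG_SLUB_DEBUG
static int check_object(struct kmem_cache *s, struct page *page,
			void *object, int active)
{
	/* redzone, poison and tracking verification ... */
	return 1;
}
#else
static inline int check_object(struct kmem_cache *s, struct page *page,
			       void *object, int active)
{
	return 1;	/* debugging compiled out */
}
#endif
-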
Move the tracking definitions and the check_valid_pointer() function away from the debugging-related functions.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Tracing in both slab_alloc and slab_free has a lot of common code. Use a single function for both.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This replaces the PageError() checking. DebugSlab is clearer and allows for
future changes to the page bit used. We also need it to support
CONFIG_SLUB_DEBUG.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move the resiliency check into the SYSFS section, after validate_slab(), which is used by the resiliency check. This avoids a forward declaration.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Scanning of objects happens in a number of functions. Consolidate that code. Use DECLARE_BITMAP instead of open-coding the bitmap declarations.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
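The consolidation in one line (MAX_OBJS is a placeholder bound):

	/* preferred: */
	DECLARE_BITMAP(object_map, MAX_OBJS);
	/* instead of the open-coded equivalent: */
	unsigned long object_map[BITS_TO_LONGS(MAX_OBJS)];
-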
Update comments throughout SLUB to reflect the new developments. Fix up various awkward sentences.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Its only purpose was to bring some sort of symmetry to sysfs usage when dealing with bootstrapping per cpu flushing. Since we do not time out slabs anymore we have no need to run finish_bootstrap even without sysfs. Fold it back into slab_sysfs_init and drop the initcall for the !SYSFS case.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We really do not need all this gaga there.
ksize gives us all the information we need to figure out if the object can
cope with the new size.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
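A minimal sketch of the idea (simplified; the real krealloc() also handles the zero-size case):

void *krealloc(const void *p, size_t new_size, gfp_t flags)
{
	void *ret;
	size_t ks;

	if (unlikely(!p))
		return kmalloc(new_size, flags);

	ks = ksize(p);
	if (ks >= new_size)
		return (void *)p;	/* the existing object already copes */

	ret = kmalloc(new_size, flags);
	if (ret) {
		memcpy(ret, p, ks);
		kfree((void *)p);
	}
	return ret;
}
-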
We needlessly duplicate code. Also make check_valid_pointer inline.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If no redzoning is selected then we do not need padding before the next
object.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SLUB currently assumes that the cacheline size is static. However, i386, for example, supports dynamic cache line size determination.

Use cache_line_size() instead of L1_CACHE_BYTES in the allocator. That also explains the purpose of SLAB_HWCACHE_ALIGN. So we will need to keep that one around to allow dynamic alignment of objects depending on boot-time determination of the cache line size.

[akpm@linux-foundation.org: need to define it before we use it]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
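A sketch of alignment computed from the runtime cache line size (an illustrative helper in the spirit of the change, not necessarily the exact upstream code):

static unsigned long calculate_alignment(unsigned long flags,
					 unsigned long align,
					 unsigned long size)
{
	if (flags & SLAB_HWCACHE_ALIGN) {
		unsigned long ralign = cache_line_size();	/* runtime value, not L1_CACHE_BYTES */

		while (size <= ralign / 2)
			ralign /= 2;	/* small objects: align to a fraction of the line */
		align = max(align, ralign);
	}
	return ALIGN(align, sizeof(void *));
}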
09 May, 2007
9 commits
-
Signed-off-by: Michael Opdenacker
Signed-off-by: Adrian Bunk -
This moves SH over to the generic quicklists. As per x86_64,
we have special mappings for the PGDs, so these go on their
own list.

Signed-off-by: Paul Mundt
-
Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds -
This implements deferred IO support in fbdev. Deferred IO is a way to delay and repurpose IO. This implementation is done using mm's page_mkwrite and page_mkclean hooks in order to detect, delay and then rewrite IO. This functionality is used by hecubafb.

[adaplas]
This is useful for graphics hardware with no directly addressable/mappable framebuffer. Implementing this will allow the "framebuffer" to be accessible from user space via mmap().

Signed-off-by: Jaya Kumar
Signed-off-by: Antonino Daplas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
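A hedged sketch of the page_mkwrite side of the mechanism (names follow the fbdev deferred IO design; locking and the later mkclean-and-rewrite step in the delayed work are elided):

static int fb_deferred_io_mkwrite(struct vm_area_struct *vma,
				  struct page *page)
{
	struct fb_info *info = vma->vm_private_data;
	struct fb_deferred_io *fbdefio = info->fbdefio;

	/* remember the page userspace is about to dirty ... */
	list_add(&page->lru, &fbdefio->pagelist);
	/* ... and come back after a delay to execute the deferred IO */
	schedule_delayed_work(&info->deferred_work, fbdefio->delay);
	return 0;
}
-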
Same story as with the cat /proc/*/wchan vs. rmmod race, only /proc/slab_allocators wants more info than just the symbol name.

Signed-off-by: Alexey Dobriyan
Acked-by: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove do_sync_file_range() and convert callers to just use do_sync_mapping_range().

Signed-off-by: Mark Fasheh
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch moves the die notifier handling to common code. Previously various architectures had exactly the same code for it. Note that the new code is compiled unconditionally; this should be understood as an appeal to the other architecture maintainers to implement support for it as well (aka sprinkling a notify_die or two in the proper place).

arm had a notify_die that did something totally different; I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used in. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig
Cc:
Cc: Russell King
Signed-off-by: Bryan Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cleanup: setting an outstanding error on a mapping was open-coded too many times. Factor it out into mapping_set_error().

Signed-off-by: Guillaume Chazarain
Cc: Steven Whitehouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
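The factored-out helper, roughly as introduced (shape per the changelog; AS_EIO and AS_ENOSPC are the existing mapping error bits):

static inline void mapping_set_error(struct address_space *mapping, int error)
{
	if (unlikely(error)) {
		if (error == -ENOSPC)
			set_bit(AS_ENOSPC, &mapping->flags);
		else
			set_bit(AS_EIO, &mapping->flags);
	}
}
-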
This patch adds a white list to modpost.c for some functions, and an ia64 section, to fix section mismatches.

sparse_index_alloc() and zone_wait_table_init() call the bootmem allocator at boot time, and kmalloc/vmalloc at hotplug time. If memory hotplug is configured on, there are references to the bootmem allocator (init text) from them (normal text). This is the cause of the section mismatch.

Bootmem is called by many functions and it must be used only at boot time. I think their __init markings should be kept for section mismatch checking. So, I would like to register sparse_index_alloc() and zone_wait_table_init() in the white list.

In addition, ia64's .machvec section is a function table for some platform-dependent code. It is a mixture of .init.text and normal text. Its references to __init functions are valid too.

Signed-off-by: Yasunori Goto
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds