01 Apr, 2006
5 commits
-
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:- It's more flexible. Things which would require two or three syscalls with
fadvise() can be done in a single syscall.- Using fadvise() in this manner is something not covered by POSIX.
The patch wires up the syscall for x86.
The sycall is implemented in the new fs/sync.c. The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.Documentation for the syscall is in fs/sync.c.
A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested. This is trivial to fix: add a new flag bit, set
wbc->nonblocking. But I'm not sure that we want to expose implementation
details down to that level.Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines. It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
requests fail...Cc: Nick Piggin
Cc: Michael Kerrisk
Cc: Ulrich Drepper
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The boot cmdline is parsed in parse_early_param() and
parse_args(,unknown_bootoption).And __setup() is used in obsolete_checksetup().
start_kernel()
-> parse_args()
-> unknown_bootoption()
-> obsolete_checksetup()If __setup()'s callback (->setup_func()) returns 1 in
obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
handled.If ->setup_func() returns 0, obsolete_checksetup() tries other
->setup_func(). If all ->setup_func() that matched a parameter returns 0,
a parameter is seted to argv_init[].Then, when runing /sbin/init or init=app, argv_init[] is passed to the app.
If the app doesn't ignore those arguments, it will warning and exit.This patch fixes a wrong usage of it, however fixes obvious one only.
Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With strict page reservation, I think kernel should enforce number of free
hugetlb page don't fall below reserved count. Currently it is possible in
the sysctl path. Add proper check in sysctl to disallow that.Signed-off-by: Ken Chen
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
git-commit: d5d4b0aa4e1430d73050babba999365593bdb9d2
"[PATCH] optimize follow_hugetlb_page" breaks mlock on hugepage areas.I mis-interpret pages argument and made get_page() unconditional. It
should only get a ref count when "pages" argument is non-null.Credit goes to Adam Litke who spotted the bug.
Signed-off-by: Ken Chen
Acked-by: Adam Litke
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
find_trylock_page() is an odd interface in that it doesn't take a reference
like the others. Now that XFS no longer uses it, and its last remaining
caller actually wants an elevated refcount, opencode that callsite and
schedule find_trylock_page() for removal.Signed-off-by: Nick Piggin
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Mar, 2006
2 commits
-
Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
replaces for_each_cpu with for_each_possible_cpu().
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Mar, 2006
6 commits
-
Helper functions for for_each_online_pgdat/for_each_zone look too big to be
inlined. Speed of these helper macro itself is not very important. (inner
loops are tend to do more work than this)This patch make helper function to be out-of-lined.
inline out-of-line
.text 005c0680 005bf6a0005c0680 - 005bf6a0 = FE0 = 4Kbytes.
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
By using for_each_online_pgdat(), pgdat_list is not necessary now. This patch
removes it.Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace for_each_pgdat() with for_each_online_pgdat().
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a list_head to bootmem_data_t and make bootmems use it. bootmem list is
sorted by node_boot_start.Only nodes against which init_bootmem() is called are linked to the list.
(i386 allocates bootmem only from one node(0) not from all online nodes.)A summary:
1. for_each_online_pgdat() traverses all *online* nodes.
2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes.Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch removes zone_mem_map.
pfn_to_page uses pgdat, page_to_pfn uses zone. page_to_pfn can use pgdat
instead of zone, which is only one user of zone_mem_map. By modifing it,
we can remove zone_mem_map.Signed-off-by: KAMEZAWA Hiroyuki
Cc: Dave Hansen
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM.
Each arch has its own page_to_pfn(), pfn_to_page() for each models.
But most of them can use the same arithmetic.This patch adds asm-generic/memory_model.h, which includes generic
page_to_pfn(), pfn_to_page() definitions for each memory model.When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are
used instead of macro. This is enabled by some archs and reduces
text size.Signed-off-by: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: Ian Molton
Cc: Mikael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: Kyle McMartin
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Mar, 2006
8 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
Kconfig help: MTD_JEDECPROBE already supports Intel
Remove ugly debugging stuff
do_mounts.c: Minor ROOT_DEV comment cleanup
BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
BUG_ON() Conversion in mm/mempool.c
BUG_ON() Conversion in mm/memory.c
BUG_ON() Conversion in kernel/fork.c
BUG_ON() Conversion in ipc/sem.c
BUG_ON() Conversion in fs/ext2/
BUG_ON() Conversion in fs/hfs/
BUG_ON() Conversion in fs/dcache.c
BUG_ON() Conversion in fs/buffer.c
BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
BUG_ON() Conversion in md/dm-table.c
BUG_ON() Conversion in md/dm-path-selector.c
BUG_ON() Conversion in drivers/isdn
BUG_ON() Conversion in drivers/char
BUG_ON() Conversion in drivers/mtd/ -
Add another allocator to the common mempool code: a kzalloc/kfree allocator
This will be used by the next patch in the series to replace a mempool-backed
kzalloc allocator. It is also very likely that there will be more users in
the future.Signed-off-by: Matthew Dobson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add another allocator to the common mempool code: a kmalloc/kfree allocator
This will be used by the next patch in the series to replace duplicate
mempool-backed kmalloc allocators in several places in the kernel. It is also
very likely that there will be more users in the future.Signed-off-by: Matthew Dobson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert two mempool users that currently use their own mempool-backed page
allocators to use the generic mempool page allocator.Also included are 2 trivial whitespace fixes.
Signed-off-by: Matthew Dobson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This will be used by the next patch in the series to replace duplicate
mempool-backed page allocators in 2 places in the kernel. It is also likely
that there will be more users in the future.Signed-off-by: Matthew Dobson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, get_user_pages() returns fully coherent pages to the kernel for
anything other than anonymous pages. This is a problem for things like
fuse and the SCSI generic ioctl SG_IO which can potentially wish to do DMA
to anonymous pages passed in by users.The fix is to add a new memory management API: flush_anon_page() which
is used in get_user_pages() to make anonymous pages coherent.Signed-off-by: James Bottomley
Cc: Russell King
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk -
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk
26 Mar, 2006
15 commits
-
This fixes problems with very large nodes (over 128GB) filling up all of
the first 4GB with their mem_map and not leaving enough space for the
swiotlb.Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds -
Hugh is rightly concerned that the CONFIG_DEBUG_VM coverage has gone too
far in vm_normal_page, considering that we expect production kernels to be
shipped with the option turned off, and that the code has been under some
large changes recently.Signed-off-by: Nick Piggin
Signed-off-by: Linus Torvalds -
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
BUG_ON() Conversion in drivers/video/
BUG_ON() Conversion in drivers/parisc/
BUG_ON() Conversion in drivers/block/
BUG_ON() Conversion in sound/sparc/cs4231.c
BUG_ON() Conversion in drivers/s390/block/dasd.c
BUG_ON() Conversion in lib/swiotlb.c
BUG_ON() Conversion in kernel/cpu.c
BUG_ON() Conversion in ipc/msg.c
BUG_ON() Conversion in block/elevator.c
BUG_ON() Conversion in fs/coda/
BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
BUG_ON() Conversion in input/serio/hil_mlc.c
BUG_ON() Conversion in md/dm-hw-handler.c
BUG_ON() Conversion in md/bitmap.c
The comment describing how MS_ASYNC works in msync.c is confusing
rcu: undeclared variable used in documentation
fix typos "wich" -> "which"
typo patch for fs/ufs/super.c
Fix simple typos
tabify drivers/char/Makefile
... -
The "rounded up to nearest power of 2 in size" algorithm in
alloc_large_system_hash is not correct. As coded, it takes an otherwise
acceptable power-of-2 value and doubles it. For example, we see the error
if we boot with thash_entries=2097152 which produces a hash table with
4194304 entries.Signed-off-by: John Hawkes
Cc: Roland Dreier
Cc: "Chen, Kenneth W"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A couple of places are forgetting to take it.
The kswapd case is probably unimportant. keventd_create_kthread() was racy.
The whole thing is a bit flakey: you start a kernel thread, get its pid from
kernel_thread() then look up its task_struct.a) It assumes that pid recycling takes a "long" time.
b) We get a task_struct but no reference was taken on it. The owner of the
kswapd and kthread task_struct*'s must assume that the new thread won't
exit unexpectedly. Because if it does, they're left holding dead memory
and any attempt to control or stop that task will crash.Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In zone_pcp_init we print out all zones even if they are empty:
On node 0 totalpages: 245760
DMA zone: 245760 pages, LIFO batch:31
DMA32 zone: 0 pages, LIFO batch:0
Normal zone: 0 pages, LIFO batch:0
HighMem zone: 0 pages, LIFO batch:0To conserve dmesg space why not print only the non zero zones.
Signed-off-by: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The page migration code could function without NUMA but we currently have
no users for the non-NUMA case.Signed-off-by: Christoph Lameter
Cc: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have had this memory leak for a while now. The situation is complicated
by the use of alloc_kmemlist() as a function to resize various caches by
do_tune_cpucache().What we do here is first of all make sure that we deallocate properly in
the loop over all the nodes.If we are just resizing caches then we can simply return with -ENOMEM if an
allocation fails.If the cache is new then we need to rollback and remove all earlier
allocations.We detect that a cache is new by checking if the link to the global cache
chain has been setup. This is a bit hackish ....(also fix up too overlong lines that I added in the last patch...)
Signed-off-by: Christoph Lameter
Cc: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Inspired by Jesper Juhl's patch from today
1. Get rid of err
We do not set it to anything else but zero.2. Drop the CONFIG_NUMA stuff.
There are definitions for alloc_alien_cache and free_alien_cache()
that do the right thing for the non NUMA case.3. Better naming of variables.
4. Remove redundant cachep->nodelists[node] expressions.
Signed-off-by: Christoph Lameter
Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__drain_alien_cache() currently drains objects by freeing them to the
(remote) freelists of the original node. However, each node also has a
shared list containing objects to be used on any processor of that node.
We can avoid a number of remote node accesses by copying the pointers to
the free objects directly into the remote shared array.And while we are at it: Skip alien draining if the alien cache spinlock is
already taken.Kiran reported that this is a performance benefit.
Signed-off-by: Christoph Lameter
Cc: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
slabr_objects() can be used to transfer objects between various object
caches of the slab allocator. It is currently only used during
__cache_alloc() to retrieve elements from the shared array. We will be
using it soon to transfer elements from the alien caches to the remote
shared array.Signed-off-by: Christoph Lameter
Cc: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert mm/ to use the new kmem_cache_zalloc allocator.
Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As suggested by Eric Dumazet, optimize kzalloc() calls that pass a
compile-time constant size. Please note that the patch increases kernel
text slightly (~200 bytes for defconfig on x86).Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce a memory-zeroing variant of kmem_cache_alloc. The allocator
already exits in XFS and there are potential users for it so this patch
makes the allocator available for the general public.Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Implement /proc/slab_allocators. It produces output like:
idr_layer_cache: 80 idr_pre_get+0x33/0x4e
buffer_head: 2555 alloc_buffer_head+0x20/0x75
mm_struct: 9 mm_alloc+0x1e/0x42
mm_struct: 20 dup_mm+0x36/0x370
vm_area_struct: 384 dup_mm+0x18f/0x370
vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
vm_area_struct: 1 split_vma+0x5a/0x10e
vm_area_struct: 11 do_brk+0x206/0x2e2
vm_area_struct: 2 copy_vma+0xda/0x142
vm_area_struct: 9 setup_arg_pages+0x99/0x214
fs_cache: 8 copy_fs_struct+0x21/0x133
fs_cache: 29 copy_process+0xf38/0x10e3
files_cache: 30 alloc_files+0x1b/0xcf
signal_cache: 81 copy_process+0xbaa/0x10e3
sighand_cache: 77 copy_process+0xe65/0x10e3
sighand_cache: 1 de_thread+0x4d/0x5f8
anon_vma: 241 anon_vma_prepare+0xd9/0xf3
size-2048: 1 add_sect_attrs+0x5f/0x145
size-2048: 2 journal_init_revoke+0x99/0x302
size-2048: 2 journal_init_revoke+0x137/0x302
size-2048: 2 journal_init_inode+0xf9/0x1c4Cc: Manfred Spraul
Cc: Alexander Nyberg
Cc: Pekka Enberg
Cc: Christoph Lameter
Cc: Ravikiran Thirumalai
Signed-off-by: Al Viro
DESC
slab-leaks3-locking-fix
EDESC
From: Andrew MortonUpdate for slab-remove-cachep-spinlock.patch
Cc: Al Viro
Cc: Manfred Spraul
Cc: Alexander Nyberg
Cc: Pekka Enberg
Cc: Christoph Lameter
Cc: Ravikiran Thirumalai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Mar, 2006
1 commit
-
because of a typo. This patch just changes "my" to "by", which I
believe was the original intent.Signed-off-by: Adrian Bunk
24 Mar, 2006
3 commits
-
This patch series creates a strndup_user() function to easy copying C strings
from userspace. Also we avoid common pitfalls like userspace modifying the
final \0 after the strlen_user().Signed-off-by: Davi Arnaut
Cc: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
No need to duplicate all that code.
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
msync() does a strange thing. Essentially:
vma = find_vma();
for ( ; ; ) {
if (!vma)
return -ENOMEM;
...
vma = vma->vm_next;
}so an msync() request which starts within or before a valid VMA and which ends
within or beyond the final VMA will incorrectly return -ENOMEM.Fix.
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds