08 Oct, 2006
1 commit
-
Init list is called with a list parameter that is not equal to the
cachep->nodelists entry under NUMA if more than one node exists. This is
fully legitimatei. One may want to populate the list fields before
switching nodelist pointers.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Oct, 2006
2 commits
-
Add a way for a no_page() handler to request a retry of the faulting
instruction. It goes back to userland on page faults and just tries again
in get_user_pages(). I added a cond_resched() in the loop in that later
case.The problem I have with signal and spufs is an actual bug affecting apps and I
don't see other ways of fixing it.In addition, we are having issues with infiniband and 64k pages (related to
the way the hypervisor deals with some HV cards) that will require us to muck
around with the MMU from within the IB driver's no_page() (it's a pSeries
specific driver) and return to the caller the same way using NOPAGE_REFAULT.And to add to this, the graphics folks have been following a new approach of
memory management that involves transparently swapping objects between video
ram and main meory. To do that, they need installing PTEs from a no_page()
handler as well and that also requires returning with NOPAGE_REFAULT.(For the later, they are currently using io_remap_pfn_range to install one PTE
from no_page() which is a bit racy, we need to add a check for the PTE having
already been installed afer taking the lock, but that's ok, they are only at
the proof-of-concept stage. I'll send a patch adding a "clean" function to do
that, we can use that from spufs too and get rid of the sparsemem hacks we do
to create struct page for SPEs. Basically, that provides a generic solution
for being able to have no_page() map hardware devices, which is something that
I think sound driver folks have been asking for some time too).All of these things depend on having the NOPAGE_REFAULT exit path from
no_page() handlers.Signed-off-by: Benjamin Herrenchmidt
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Reduce the NUMA text size of mm/slab.o a little on x86 by using a local
variable to store the result of numa_node_id().text data bss dec hex filename
16858 2584 16 19458 4c02 mm/slab.o (before)
16804 2584 16 19404 4bcc mm/slab.o (after)[akpm@osdl.org: use better names]
[pbadari@us.ibm.com: fix that]
Cc: Christoph Lameter
Signed-off-by: Pekka Enberg
Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Oct, 2006
2 commits
-
* master.kernel.org:/pub/scm/linux/kernel/git/davej/configh:
Remove all inclusions ofManually resolved trivial path conflicts due to removed files in
the sound/oss/ subdirectory. -
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6: (292 commits)
[GFS2] Fix endian bug for de_type
[GFS2] Initialize SELinux extended attributes at inode creation time.
[GFS2] Move logging code into log.c (mostly)
[GFS2] Mark nlink cleared so VFS sees it happen
[GFS2] Two redundant casts removed
[GFS2] Remove uneeded endian conversion
[GFS2] Remove duplicate sb reading code
[GFS2] Mark metadata reads for blktrace
[GFS2] Remove iflags.h, use FS_
[GFS2] Fix code style/indent in ops_file.c
[GFS2] streamline-generic_file_-interfaces-and-filemap gfs fix
[GFS2] Remove readv/writev methods and use aio_read/aio_write instead (gfs bits)
[GFS2] inode-diet: Eliminate i_blksize from the inode structure
[GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs)
[GFS2] Fix typo in last patch
[GFS2] Fix direct i/o logic in filemap.c
[GFS2] Fix bug in Makefiles for lock modules
[GFS2] Remove (extra) fs_subsys declaration
[GFS2/DLM] Fix trailing whitespace
[GFS2] Tidy up meta_io code
...
04 Oct, 2006
10 commits
-
- rename ____kmalloc to kmalloc_track_caller so that people have a chance
to guess what it does just from it's name. Add a comment describing it
for those who don't. Also move it after kmalloc in slab.h so people get
less confused when they are just looking for kmalloc - move things around
in slab.c a little to reduce the ifdef mess.[penberg@cs.helsinki.fi: Fix up reversed #ifdef]
Signed-off-by: Christoph Hellwig
Signed-off-by: Pekka Enberg
Cc: Christoph Lameter
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix kernel-doc and function declaration (missing "void") in
mm/page_alloc.c.Add mm/page_alloc.c to kernel-api.tmpl in DocBook.
mm/page_alloc.c:2589:38: warning: non-ANSI function declaration of function 'remove_all_active_ranges'
Signed-off-by: Randy Dunlap
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Spotted by Hugh that hugetlb page is free'ed back to global pool before
performing any TLB flush in unmap_hugepage_range(). This potentially allow
threads to abuse free-alloc race condition.The generic tlb gather code is unsuitable to use by hugetlb, I just open
coded a page gathering list and delayed put_page until tlb flush is
performed.Cc: Hugh Dickins
Signed-off-by: Ken Chen
Acked-by: William Irwin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Having min be a signed quantity means gcc can't turn high latency divides
into shifts. There happen to be two such divides for GFP_ATOMIC (ie.
networking, ie. important) allocations, one of which depends on the other.
Fixing this makes code smaller as a bonus.Shame on somebody (probably me).
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fixes an kerneldoc error.
Signed-off-by: Henrik Kretzschmar
Cc: "Randy.Dunlap"
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
kbuild explicitly includes this at build time.
Signed-off-by: Dave Jones
-
This patch against fixes a spelling mistake ("control" instead of "cotrol").
Signed-off-by: Michael Opdenacker
Acked-by: Alan Cox
Signed-off-by: Adrian Bunk -
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Zeisberger
Signed-off-by: Adrian Bunk -
Randy brought it to my attention that in proper english "can not" should always
be written "cannot". I donot see any reason to argue, even if I mightnot
understand why this rule exists. This patch fixes "can not" in several
Documentation files as well as three Kconfigs.Signed-off-by: Matt LaPlante
Acked-by: Randy Dunlap
Signed-off-by: Adrian Bunk -
Signed-off-by: Adrian Bunk
02 Oct, 2006
1 commit
01 Oct, 2006
21 commits
-
Implement lazy MMU update hooks which are SMP safe for both direct and shadow
page tables. The idea is that PTE updates and page invalidations while in
lazy mode can be batched into a single hypercall. We use this in VMI for
shadow page table synchronization, and it is a win. It also can be used by
PPC and for direct page tables on Xen.For SMP, the enter / leave must happen under protection of the page table
locks for page tables which are being modified. This is because otherwise,
you end up with stale state in the batched hypercall, which other CPUs can
race ahead of. Doing this under the protection of the locks guarantees the
synchronization is correct, and also means that spurious faults which are
generated during this window by remote CPUs are properly handled, as the page
fault handler must re-check the PTE under protection of the same lock.Signed-off-by: Zachary Amsden
Signed-off-by: Jeremy Fitzhardinge
Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change pte_clear_full to a more appropriately named pte_clear_not_present,
allowing optimizations when not-present mapping changes need not be reflected
in the hardware TLB for protected page table modes. There is also another
case that can use it in the fremap code.Signed-off-by: Zachary Amsden
Signed-off-by: Jeremy Fitzhardinge
Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We don't want to read PTEs directly like this after they have been modified,
as a lazy MMU implementation of direct page tables may not have written the
updated PTE back to memory yet.Signed-off-by: Zachary Amsden
Signed-off-by: Jeremy Fitzhardinge
Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The recent fix to invalidate_inode_pages() (git commit 016eb4a) managed to
unfix invalidate_inode_pages2().The problem is that various bits of code in the kernel can take transient refs
on pages: the page scanner will do this when inspecting a batch of pages, and
the lru_cache_add() batching pagevecs also hold a ref.Net result is transient failures in invalidate_inode_pages2(). This affects
NFS directory invalidation (observed) and presumably also block-backed
direct-io (not yet reported).Fix it by reverting invalidate_inode_pages2() back to the old version which
ignores the page refcounts.We may come up with something more clever later, but for now we need a 2.6.18
fix for NFS.Cc: Chuck Lever
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is mostly included for parity with dec_nlink(), where we will have some
more hooks. This one should stay pretty darn straightforward for now.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.So, add a little helper function to do the decrements. We'll tie into it in a
bit to note when i_nlink hits zero.Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch cleans up generic_file_*_read/write() interfaces. Christoph
Hellwig gave me the idea for this clean ups.In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
do_sync_read/ do_sync_write() as their .read/.write methods. This allows us
to cleanup all variants of generic_file_* routines.Final available interfaces:
generic_file_aio_read() - read handler
generic_file_aio_write() - write handler
generic_file_aio_write_nolock() - no lock write handler__generic_file_aio_write_nolock() - internal worker routine
Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Cc: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
One of idiomatic ways to duplicate a region of memory is
dst = kmalloc(len, GFP_KERNEL);
if (!dst)
return -ENOMEM;
memcpy(dst, src, len);which is neat code except a programmer needs to write size twice. Which
sometimes leads to mistakes. If len passed to kmalloc is smaller that len
passed to memcpy, it's straight overwrite-beyond-end. If len passed to
memcpy is smaller than len passed to kmalloc, it's either a) legit
behaviour ;-), or b) cloned buffer will contain garbage in second half.Slight trolling of commit lists shows several duplications bugs
done exactly because of diverged lenghts:Linux:
[CRYPTO]: Fix memcpy/memset args.
[PATCH] memcpy/memset fixes
OpenBSD:
kerberosV/src/lib/asn1: der_copy.c:1.4If programmer is given only one place to play with lengths, I believe, such
mistakes could be avoided.With kmemdup, the snippet above will be rewritten as:
dst = kmemdup(src, len, GFP_KERNEL);
if (!dst)
return -ENOMEM;This also leads to smaller code (kzalloc effect). Quick grep shows
200+ places where kmemdup() can be used.Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The api for hot-add memory already has a construct for finding nodes based on
an address, memory_add_physaddr_to_nid. This patch allows the fucntion to do
something besides return 0. It uses the nodes_add infomation to lookup to
node info for a hot add event.Signed-off-by: Keith Mannthey
Cc: KAMEZAWA Hiroyuki
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Migate CONFIG_MEMORY_HOTPLUG to CONFIG_MEMORY_HOTPLUG_SPARSE where needed.
Signed-off-by: Keith Mannthey
Cc: KAMEZAWA Hiroyuki
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Create Kconfig namespace for MEMORY_HOTPLUG_RESERVE and MEMORY_HOTPLUG_SPARSE.
This is needed to create a disticiton between the 2 paths. Selecting the
high level opiton of MEMORY_HOTPLUG will get you MEMORY_HOTPLUG_SPARSE if you
have sparsemem enabled or MEMORY_HOTPLUG_RESERVE if you are x86_64 with
discontig and ACPI numa support.Signed-off-by: Keith Mannthey
Cc: KAMEZAWA Hiroyuki
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix up externs in memory_hotplug.c. Cleanup.
Signed-off-by: Keith Mannthey
Cc: KAMEZAWA Hiroyuki
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Don't try and give NULL to fput() in the error handling in do_mmap_pgoff()
as it'll cause an oops.Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.Signed-Off-By: David Howells
Signed-off-by: Jens Axboe -
Dissociate the generic_writepages() function from the mpage stuff, moving its
declaration to linux/mm.h and actually emitting a full implementation into
mm/page-writeback.c.The implementation is a partial duplicate of mpage_writepages() with all BIO
references removed.It is used by NFS to do writeback.
Signed-Off-By: David Howells
Signed-off-by: Jens Axboe -
Move the bounce buffer code from mm/highmem.c to mm/bounce.c so that it can be
more easily disabled when the block layer is disabled.!!!NOTE!!! There may be a bug in this code: Should init_emergency_pool() be
contingent on CONFIG_HIGHMEM?Signed-Off-By: David Howells
Signed-off-by: Jens Axboe -
Stop fallback_migrate_page() from using page_has_buffers() since that might not
be available. Use PagePrivate() instead since that's more general.Signed-Off-By: David Howells
Signed-off-by: Jens Axboe -
Move some functions out of the buffering code that aren't strictly buffering
specific. This is a precursor to being able to disable the block layer.(*) Moved some stuff out of fs/buffer.c:
(*) The file sync and general sync stuff moved to fs/sync.c.
(*) The superblock sync stuff moved to fs/super.c.
(*) do_invalidatepage() moved to mm/truncate.c.
(*) try_to_release_page() moved to mm/filemap.c.
(*) Moved some related declarations between header files:
(*) declarations for do_invalidatepage() and try_to_release_page() moved
to linux/mm.h.(*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
Signed-Off-By: David Howells
Signed-off-by: Jens Axboe
30 Sep, 2006
3 commits
-
Add access control lists for tmpfs.
Signed-off-by: Andreas Gruenbacher
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
akpm draws my attention to the fact that sysctl(VM_PAGE_CLUSTER) might
conceivably change page_cluster to 0 while valid_swaphandles() is in the
middle of using it, leading to an embarrassingly long loop: take a local
snapshot of page_cluster and work with that.Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ratelimit_pages in page-writeback.c is recalculated (in set_ratelimit())
every time a CPU is hot-added/removed. But this value is not recalculated
when new pages are hot-added.This patch fixes that problem by calling set_ratelimit() when new pages
are hot-added.[akpm@osdl.org: cleanups]
Signed-off-by: Chandra Seetharaman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds