Eric Lee / smarc-fsl-linux-kernel

20 Jul, 2007

1 commit

20c2df83d mm: Remove slab destructors from kmem_cache_create().

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt

Paul Mundt
2007-07-20 09:11:58 +0800

17 Jul, 2007

2 commits

b4c07bce7 hugetlbfs: handle empty options string

I was seeing a null pointer deref in fs/super.c:vfs_kern_mount().
Some file system get_sb() handler was returning NULL mnt_sb with
a non-negative return value. I also noticed a "hugetlbfs: Bad
mount option:" message in the log.

It turns out that hugetlbfs_parse_options() was not checking for an
empty option string after the call to strsep(). On failure,
hugetlbfs_parse_options() returns 1. hugetlbfs_fill_super() just
passed this return code back up the call stack where
vfs_kern_mount() missed the error and proceeded with a NULL mnt_sb.

Apparently introduced by patch:
hugetlbfs-use-lib-parser-fix-docs.patch

The problem was exposed by this line in my fstab:

none /huge hugetlbfs defaults 0 0

It can also be demonstrated by invoking mount of hugetlbfs
directly with no options or a bogus option.

This patch:

1) adds the check for empty option to hugetlbfs_parse_options(),
2) enhances the error message to bracket any unrecognized
option with quotes,
3) modifies hugetlbfs_parse_options() to return -EINVAL on any
unrecognized option,
4) adds a BUG_ON() to vfs_kern_mount() to catch any get_sb()
handler that returns a NULL mnt->mnt_sb with a return value
>= 0.
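
A minimal userspace sketch of items 1) and 3) above, assuming a comma-separated option string. The names parse_options_sketch() and known_option() and the two accepted option prefixes are hypothetical stand-ins; the kernel code matches tokens through lib/parser instead.

```c
#define _DEFAULT_SOURCE /* exposes strsep() on glibc */
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the real option table. */
static int known_option(const char *tok)
{
    return strncmp(tok, "size=", 5) == 0 || strncmp(tok, "mode=", 5) == 0;
}

/*
 * Walk a writable, comma-separated option string.
 * Returns 0 on success or -EINVAL on the first unrecognized option.
 */
int parse_options_sketch(char *options)
{
    char *tok;

    if (!options)
        return 0;

    while ((tok = strsep(&options, ",")) != NULL) {
        if (!*tok)              /* empty token, as in "" or "a,,b": skip it */
            continue;
        if (!known_option(tok))
            return -EINVAL;     /* a negative errno, never a positive 1 */
    }

    assert(options == NULL);    /* strsep() has consumed the whole string */
    return 0;
}
```

strsep() returns an empty token for an empty string, so without the `!*tok` check an option-less mount would be reported as a bad option. Returning a negative errno lets a caller that tests for a negative return value see the failure.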

Signed-off-by: Lee Schermerhorn
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2007-07-17 00:05:46 +0800
e73a75fa7 hugetlbfs: use lib/parser, fix docs

Use lib/parser.c to parse hugetlbfs mount options. Correct docs in
hugetlbpage.txt.

old size of hugetlbfs_fill_super: 675 bytes
new size of hugetlbfs_fill_super: 686 bytes
(hugetlbfs_parse_options() is inlined)

Signed-off-by: Randy Dunlap
Cc: Hugh Dickins
Cc: David Gibson
Cc: Adam Litke
Acked-by: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-07-17 00:05:46 +0800

17 Jun, 2007

1 commit

9d66586f7 shm: fix the filename of hugetlb sysv shared memory

Some user space tools need to identify SYSV shared memory when examining
/proc/<pid>/maps. To do so they look for a block device with major zero, a
dentry named SYSV, and having the minor of the internal sysv
shared memory kernel mount.

To help these tools and to make it easier for people just browsing
/proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the
SYSV dentry naming convention.

User space tools will still have to be aware that hugetlb sysv shared
memory lives on a different internal kernel mount and so has a different
block device minor number from the rest of sysv shared memory.

Signed-off-by: Eric W. Biederman
Cc: "Serge E. Hallyn"
Cc: Albert Cahalan
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-06-17 04:16:16 +0800

17 May, 2007

1 commit

a35afb830 Remove SLAB_CTOR_CONSTRUCTOR

SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-17 20:23:04 +0800

08 May, 2007

5 commits

5bc98594d hugetlbfs: add NULL check in hugetlb_zero_setup()

If hugetlbfs module_init() fails, hugetlbfs_vfsmount is not initialized and
shmget() with the SHM_HUGETLB flag will cause a NULL pointer dereference.

Signed-off-by: Akinobu Mita
Acked-by: William Irwin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2007-05-08 03:12:57 +0800
50953fe9e slab allocators: Remove SLAB_DEBUG_INITIAL flag

I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.

I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.

I would think that it is much easier to check the object state manually
before the free. That also places the check near the code that
manipulates the object.

Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there were code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG, otherwise it would just be dead code. But there is no such code
in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).

There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:57 +0800
036e08568 get_unmapped_area handles MAP_FIXED in hugetlbfs

Generic hugetlb_get_unmapped_area() now handles MAP_FIXED by just calling
prepare_hugepage_range()

Signed-off-by: Benjamin Herrenschmidt
Acked-by: William Irwin
Cc: Paul Mackerras
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: David Howells
Cc: Andi Kleen
Cc: "Luck, Tony"
Cc: Kyle McMartin
Cc: Grant Grundler
Cc: Matthew Wilcox
Cc: "David S. Miller"
Cc: Adam Litke
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Benjamin Herrenschmidt
2007-05-08 03:12:57 +0800
d85f33855 Make page->private usable in compound pages

If we add a new flag so that we can distinguish between the first page and the
tail pages then we can avoid using page->private in the first page.
page->private == page for the first page, so there is no real information in
there.

Freeing up page->private makes the use of compound pages more transparent.
They become more usable like real pages. Right now we have to be careful, for
example, if we are going beyond PAGE_SIZE allocations in the slab on i386
because we can then no longer use the private field. This is one of the issues
that cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information in
the page struct. I can probably avoid the 16 bit ints that I have in there
right now.

Also if page->private is available then a compound page may be equipped with
buffer heads. This may free up the way for filesystems to support larger
blocks than page size.

We add PageTail as an alias of PageReclaim. Compound pages cannot currently
be reclaimed. Because of the alias one needs to check PageCompound first.

The RFC for this approach was discussed at
http://marc.info/?t=117574302800001&r=1&w=2

[nacc@us.ibm.com: fix hugetlbfs]
Signed-off-by: Christoph Lameter
Signed-off-by: Nishanth Aravamudan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:53 +0800
d2ba27e80 proper prototype for hugetlb_get_unmapped_area()

Add a proper prototype for hugetlb_get_unmapped_area() in
include/linux/hugetlb.h.

Signed-off-by: Adrian Bunk
Acked-by: William Irwin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-05-08 03:12:51 +0800

13 Feb, 2007

2 commits

ee9b6d61a [PATCH] Mark struct super_operations const

This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".

Compile tested with gcc & sparse.

Signed-off-by: Josef 'Jeff' Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef 'Jeff' Sipek
2007-02-13 01:48:47 +0800
92e1d5be9 [PATCH] mark struct inode_operations const 2

Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2007-02-13 01:48:46 +0800

10 Feb, 2007

1 commit

6649a3863 [PATCH] hugetlb: preserve hugetlb pte dirty state

__unmap_hugepage_range() is buggy in that it does not preserve the dirty state
of huge_pte when unmapping a hugepage range. It causes data corruption in the
event of drop_caches being used by the sys admin. For example, an application
creates a hugetlb file, modifies pages, then unmaps it. While the
hugetlb file is still alive, along comes the sys admin doing an "echo 3 >
/proc/sys/vm/drop_caches".

drop_pagecache_sb() will happily free all pages that aren't marked dirty if
there are no active mappings. Later, when the application maps the hugetlb
file again, all the data is gone, triggering a catastrophic failure in the
application.

Not only that, the internal resv_huge_pages count will also get all messed
up. Fix it up by marking page dirty appropriately.

Signed-off-by: Ken Chen
Cc: "Nish Aravamudan"
Cc: Adam Litke
Cc: David Gibson
Cc: William Lee Irwin III
Cc:
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken Chen
2007-02-10 01:25:46 +0800

22 Dec, 2006

1 commit

fba2591bf VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions

They were horribly easy to mis-use because of their tempting naming, and
they also did way more than any users of them generally wanted them to
do.

A dirty page can become clean under two circumstances:

(a) when we write it out. We have "clear_page_dirty_for_io()" for
this, and that function remains unchanged.

In the "for IO" case it is not sufficient to just clear the dirty
bit, you also have to mark the page as being under writeback etc.

(b) when we actually remove a page due to it becoming inaccessible to
users, notably because it was truncate()'d away or the file (or
metadata) no longer exists, and we thus want to cancel any
outstanding dirty state.

For the (b) case, we now introduce "cancel_dirty_page()", which only
touches the page state itself, and verifies that the page is not mapped
(since cancelling writes on a mapped page would be actively wrong as it
is still accessible to users).

Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
ReiserFS, XFS all use the old confusing functions, and will be fixed
separately in subsequent commits (with some of them just removing the
offending logic, and others using clear_page_dirty_for_io()).

This was confirmed by Martin Michlmayr to fix the apt database
corruption on ARM.

Cc: Martin Michlmayr
Cc: Peter Zijlstra
Cc: Hugh Dickins
Cc: Nick Piggin
Cc: Arjan van de Ven
Cc: Andrei Popa
Cc: Andrew Morton
Cc: Dave Kleikamp
Cc: Gordon Farquharson
Cc: Martin Schwidefsky
Cc: Trond Myklebust
Signed-off-by: Linus Torvalds

Linus Torvalds
2006-12-22 01:19:57 +0800

09 Dec, 2006

1 commit

b39424e27 [PATCH] struct path: convert hugetlbfs

Signed-off-by: Josef Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef Sipek
2006-12-09 00:28:45 +0800

08 Dec, 2006

2 commits

e18b890bb [PATCH] slab: remove kmem_cache_t

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
	quilt add $file
	sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
	mv /tmp/$$ $file
	quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:25 +0800
e94b17660 [PATCH] slab: remove SLAB_KERNEL

SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:24 +0800

15 Nov, 2006

1 commit

68589bc35 [PATCH] hugetlb: prepare_hugepage_range check offset too

(David:)

If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
because the given file offset is not hugepage aligned - then do_mmap_pgoff
will go to the unmap_and_free_vma backout path.

But at this stage the vma hasn't been marked as hugepage, and the backout path
will call unmap_region() on it. That will eventually call down to the
non-hugepage version of unmap_page_range(). On ppc64, at least, that will
cause serious problems if there are any existing hugepage pagetable entries in
the vicinity - for example if there are any other hugepage mappings under the
same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries. I suspect this will also cause bad problems on ia64, though I don't
have a machine to test it on.

(Hugh:)

prepare_hugepage_range() should check file offset alignment when it checks
virtual address and length, to stop MAP_FIXED with a bad huge offset from
unmapping before it fails further down. PowerPC should apply the same
prepare_hugepage_range alignment checks as ia64 and all the others do.

Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
is the check for too small a mapping); but even so, move up setting of
VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
when unwinding from error will go the non-huge way, which may cause bad
behaviour on architectures (powerpc and ia64) which segregate their huge
mappings into a separate region of the address space.

Signed-off-by: Hugh Dickins
Cc: "Luck, Tony"
Cc: "David S. Miller"
Acked-by: Adam Litke
Acked-by: David Gibson
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2006-11-15 01:09:27 +0800

29 Oct, 2006

2 commits

856fc2950 [PATCH] hugetlb: fix prio_tree unit

hugetlb_vmtruncate_list was misconverted to prio_tree: its prio_tree is in
units of PAGE_SIZE (PAGE_CACHE_SIZE) like any other, not HPAGE_SIZE (whereas
its radix_tree is kept in units of HPAGE_SIZE, otherwise slots would be
absurdly sparse).

At first I thought the error benign, just calling __unmap_hugepage_range on
more vmas than necessary; but on 32-bit machines, when the prio_tree is
searched correctly, it happens to ensure the v_offset calculation won't
overflow. As it stood, when truncating at or beyond 4GB, it was liable to
discard pages COWed from lower offsets; or even to clear pmd entries of
preceding vmas, triggering exit_mmap's BUG_ON(nr_ptes).

Signed-off-by: Hugh Dickins
Cc: Adam Litke
Cc: David Gibson
Cc: "Chen, Kenneth W"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2006-10-29 02:30:53 +0800
b9d7e6ae8 [PATCH] hugetlb: fix size=4G parsing

On 32-bit machines, mount -t hugetlbfs -o size=4G gave a 0GB filesystem,
size=5G gave a 1GB filesystem etc: there's no point in masking size with
HPAGE_MASK just before shifting its lower bits away, and since HPAGE_MASK is a
UL, that removed all the higher bits of the unsigned long long size.
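
The truncation can be reproduced in portable C by modelling the 32-bit unsigned long mask with uint32_t. This is only a sketch: the 4 MiB huge page size (HPAGE_SHIFT of 22) and the function names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HPAGE_SHIFT 22 /* assumed: 4 MiB huge pages */

static_assert(HPAGE_SHIFT < 32, "the mask below must fit in 32 bits");

/* Models HPAGE_MASK as a 32-bit unsigned long. */
#define HPAGE_MASK_32 ((uint32_t)~((UINT32_C(1) << HPAGE_SHIFT) - 1))

/* Old behaviour: mask first, then shift. */
uint64_t pages_buggy_sketch(uint64_t size)
{
    /*
     * The 32-bit mask is zero-extended to 64 bits before the AND,
     * so bits 32..63 of size are cleared along with the low bits.
     */
    size &= HPAGE_MASK_32;
    return size >> HPAGE_SHIFT;
}

/* Fixed behaviour: the shift alone already discards the low bits. */
uint64_t pages_fixed_sketch(uint64_t size)
{
    return size >> HPAGE_SHIFT;
}
```

Under these assumptions a 4G request yields zero huge pages from the old code and a 5G request yields 1G worth, matching the symptoms described above.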

Signed-off-by: Hugh Dickins
Cc: Adam Litke
Cc: David Gibson
Cc: "Chen, Kenneth W"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2006-10-29 02:30:53 +0800

12 Oct, 2006

1 commit

502717f4e [PATCH] hugetlb: fix linked list corruption in unmap_hugepage_range()

commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes the kernel to oops with
the libhugetlbfs test suite. The problem is that hugetlb pages can be shared
by multiple mappings. Multiple threads can fight over page->lru in the
unmap path and bad things happen. We now serialize __unmap_hugepage_range
to avoid concurrent linked list manipulation. Such serialization is also
needed for shared page table pages on hugetlb areas. This patch fixes
the bug and also serves as a prepatch for shared page tables.

Signed-off-by: Ken Chen
Cc: Hugh Dickins
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chen, Kenneth W
2006-10-12 02:14:15 +0800

01 Oct, 2006

1 commit

d8c76e6f4 [PATCH] r/o bind mount prepwork: inc_nlink() helper

This is mostly included for parity with dec_nlink(), where we will have some
more hooks. This one should stay pretty darn straightforward for now.

Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Hansen
2006-10-01 15:39:30 +0800

30 Sep, 2006

1 commit

ddc0a51d2 [PATCH] hugetlbfs: add lock annotation to hugetlbfs_forget_inode()

hugetlbfs_forget_inode releases inode_lock. Add a lock annotation to this
function so that sparse can check callers for lock pairing, and so that
sparse will not complain about this function since it intentionally uses
the lock in this manner.

Signed-off-by: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josh Triplett
2006-09-30 00:18:08 +0800

27 Sep, 2006

1 commit

ba52de123 [PATCH] inode-diet: Eliminate i_blksize from the inode structure

This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.

Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.

[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Theodore Ts'o
2006-09-27 23:26:18 +0800

11 Jul, 2006

1 commit

b6174df5e [PATCH] mmap zero-length hugetlb file with PROT_NONE to protect a hugetlb virtual area

Sometimes, applications need the calls below to succeed even though
"/mnt/hugepages/file1" doesn't exist.

fd = open("/mnt/hugepages/file1", O_CREAT|O_RDWR, 0755);
*addr = mmap(NULL, 0x1024*1024*256, PROT_NONE, 0, fd, 0);

For regular pages (or files), the calls above work, but for huge
pages they fail, because hugetlbfs_file_mmap would fail if
(!(vma->vm_flags & VM_WRITE) && len > inode->i_size).

This capability on huge page is useful on ia64 when the process wants to
protect one area on region 4, so other threads couldn't read/write this
area. A famous JVM (Java Virtual Machine) implementation on IA64 needs the
capability.

Signed-off-by: Zhang Yanmin
Cc: David Gibson
Cc: Hugh Dickins
[ Expand-on-mmap semantics again... this time matching normal fs's. wli ]
Acked-by: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang, Yanmin
2006-07-11 04:24:21 +0800

29 Jun, 2006

1 commit

f5e54d6e5 [PATCH] mark address_space_operations const

Same as we already do with the file operations: keep them in .rodata and
prevent people from doing runtime patching.

Signed-off-by: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2006-06-29 05:59:04 +0800

23 Jun, 2006

3 commits

a43a8c39b [PATCH] tightening hugetlb strict accounting

Current hugetlb strict accounting for shared mappings always assumes the mapping
starts at zero file offset and reserves pages between zero and the size of the
file. This assumption often reserves (or locks down) a lot more pages than
necessary if the application maps at a non-zero file offset. libhugetlbfs is
one example that requires proper reservation for shared mappings that start at
a non-zero offset.

This patch extends the reservation and hugetlb strict accounting to support
any arbitrary pair of (offset, len), resulting in a much more robust and
accurate scheme. More importantly, it won't lock down any hugetlb pages
outside the file mapping.
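
As a rough illustration of the difference for a single mapping, the two accounting rules can be compared with simple arithmetic. This is a sketch under assumptions: the 4 MiB huge page size and the function names are invented here, and it ignores overlapping mappings, which the real accounting also has to handle.

```c
#include <assert.h>
#include <stdint.h>

#define HPAGE_SHIFT 22 /* assumed: 4 MiB huge pages */
#define HPAGE_SIZE  (UINT64_C(1) << HPAGE_SHIFT)

/* Old rule: reserve everything from file offset zero to the end of the mapping. */
uint64_t resv_from_zero_sketch(uint64_t offset, uint64_t len)
{
    assert(offset % HPAGE_SIZE == 0 && len % HPAGE_SIZE == 0);
    return (offset + len) >> HPAGE_SHIFT;
}

/* New rule: reserve only the huge pages covered by (offset, len). */
uint64_t resv_range_sketch(uint64_t offset, uint64_t len)
{
    assert(offset % HPAGE_SIZE == 0 && len % HPAGE_SIZE == 0);
    return len >> HPAGE_SHIFT;
}
```

For a four-page mapping that starts 100 huge pages into the file, the old rule would lock down 104 pages where the range-based rule needs only 4.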

Signed-off-by: Ken Chen
Acked-by: Adam Litke
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chen, Kenneth W
2006-06-23 22:42:48 +0800
726c33422 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800
454e2398b [PATCH] VFS: Permit filesystem to override root dentry on mount

Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.

(*) If one of the convenience functions is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().

(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().

This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.

However, with the way NFS superblock sharing is currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.

[*] Anonymous until discovered from another tree.

(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

29 Mar, 2006

1 commit

4b6f5d20b [PATCH] Make most file operations structs in fs/ const

This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups.

The goal is both to increase correctness (harder to accidentally write to
shared data structures) and to reduce the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean).

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-03-29 01:16:06 +0800

22 Mar, 2006

2 commits

bba1e9b21 [PATCH] convert hugetlbfs_counter to atomic

Implementation of hugetlbfs_counter() is functionally equivalent to
atomic_inc_return(). Use the simpler atomic form.

Signed-off-by: Ken Chen
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chen, Kenneth W
2006-03-22 23:54:04 +0800
b45b5bd65 [PATCH] hugepage: Strict page reservation for hugepage inodes

These days, hugepages are demand-allocated at first fault time. There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.

A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations. In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available. In particular, this defeats such
a program which will detect a mapping failure and adjust its hugepage usage
downward accordingly.

The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instantiated. MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL. MAP_PRIVATE mappings can still
trigger an OOM. (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation)
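
The overcommit scenario and the effect of strict reservation can be sketched with a toy pool counter. Everything here is hypothetical (the struct, the field names and both functions); it only models the bookkeeping idea, not the kernel data structures.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the hugepage pool (hypothetical, for illustration only). */
struct hpool_sketch {
    long free_pages; /* hugepages currently free in the pool */
    long resv_pages; /* free hugepages already promised to mappings */
};

/* Heuristic check: compares one mapping against the whole free pool. */
int map_heuristic_sketch(const struct hpool_sketch *p, long n)
{
    assert(n >= 0);
    return n <= p->free_pages ? 0 : -ENOMEM;
}

/* Strict reservation: only pages not yet promised can be reserved. */
int map_strict_sketch(struct hpool_sketch *p, long n)
{
    assert(n >= 0);
    if (n > p->free_pages - p->resv_pages)
        return -ENOMEM;
    p->resv_pages += n; /* remember the promise until the pages are instantiated */
    return 0;
}
```

With 10 free hugepages, two untouched 8-page mappings both pass the heuristic check, while under strict reservation the second one fails at mmap() time, which is the failure mode a program can detect and react to.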

This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.

This patch causes no regressions on the libhugetlbfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).

Signed-off-by: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Gibson
2006-03-22 23:54:03 +0800

02 Feb, 2006

1 commit

4e6a510a7 [PATCH] mm: hugepage accounting fix

2.6.15's hugepage faulting introduced huge_pages_needed accounting into
hugetlbfs: to count how many pages are already in cache, for spot check on
how far a new mapping may be allowed to extend the file. But it's muddled:
each hugepage found covers HPAGE_SIZE, not PAGE_SIZE. Once pages were
already in cache, it would overshoot, wrap its hugepages count backwards,
and so fail a harmless repeat mapping with -ENOMEM. Fixes the problem
found by Don Dupuis.

Signed-off-by: Hugh Dickins
Acked-By: Adam Litke
Acked-by: William Irwin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2006-02-02 00:53:15 +0800

15 Jan, 2006

1 commit

7339ff830 [PATCH] Add tmpfs options for memory placement policies

Anything that writes into a tmpfs filesystem is liable to disproportionately
decrease the available memory on a particular node. Since there's no telling
what sort of application (e.g. dd/cp/cat) might be dropping large files
there, this lets the admin choose the appropriate default behavior for their
site's situation.

Introduce a tmpfs mount option which allows specifying a memory policy and
a second option to specify the nodelist for that policy. With the default
policy, tmpfs will behave as it does today. This patch adds support for
preferred, bind, and interleave policies.

The default policy will cause pages to be added to tmpfs files on the node
which is doing the writing. Some jobs expect a single process to create
and manage the tmpfs files. This results in a node which has a
significantly reduced number of free pages.

With this patch, the administrator can specify the policy and nodes for
that policy where they would prefer allocations.

This patch was originally written by Brent Casavant and Hugh Dickins. I
added support for the bind and preferred policies and the mpol_nodelist
mount option.

Signed-off-by: Brent Casavant
Signed-off-by: Hugh Dickins
Signed-off-by: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robin Holt
2006-01-15 10:27:07 +0800

12 Jan, 2006

1 commit

16f7e0fe2 [PATCH] capable/capability.h (fs/)

fs: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap
Acked-by: Tim Schmielau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2006-01-12 10:42:13 +0800

10 Jan, 2006

1 commit

1b1dcc1b5 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar

(finished the conversion)

Signed-off-by: Jes Sorensen
Signed-off-by: Ingo Molnar

Jes Sorensen
2006-01-10 07:59:24 +0800

07 Jan, 2006

1 commit

1e8f889b1 [PATCH] Hugetlb: Copy on Write support

Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
supported. This helps us to safely use hugetlb pages in many more
applications. The patch makes the following changes. If needed, I also have
it broken out according to the following paragraphs.

1. Add a pair of functions to set/clear write access on huge ptes. The
writable check in make_huge_pte is moved out to the caller for use by COW
later.

2. Hugetlb copy-on-write requires special case handling in the following
situations:

- copy_hugetlb_page_range() - Copied pages must be write protected so
a COW fault will be triggered (if necessary) if those pages are written
to.

- find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the
page cache. MAP_PRIVATE pages still need to be locked however.

3. Provide hugetlb_cow() and call it from hugetlb_fault() and
hugetlb_no_page(); it handles the COW fault by making the actual copy.

4. Remove the check in hugetlbfs_file_map() so that MAP_PRIVATE mmaps
will be allowed. Make MAP_HUGETLB exempt from the deprecated VM_RESERVED
mapping check.

Signed-off-by: David Gibson
Signed-off-by: Adam Litke
Cc: William Lee Irwin III
Cc: "Seth, Rohit"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Gibson
2006-01-07 00:33:23 +0800

23 Nov, 2005

1 commit

74a8a65c5 [PATCH] Fix hugetlbfs_statfs() reporting of block limits

Currently, if a hugetlbfs is mounted without limits (the default), statfs()
will return -1 for max/free/used blocks. This does not appear to be in
line with normal convention: simple_statfs() and shmem_statfs() both return
0 in similar cases. Worse, it confuses the translation logic in
put_compat_statfs(), causing it to return -EOVERFLOW on such a mount.

This patch alters hugetlbfs_statfs() to return 0 for max/free/used blocks
on a mount without limits. Note that we need the test in the patch below,
rather than just using 0 in the sbinfo structure, because the -1 marked in
the free blocks field is used internally to tell the

Signed-off-by: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Gibson
2005-11-23 01:13:43 +0800

09 Nov, 2005

1 commit

8d3d81cf0 [PATCH] fs/hugetlbfs/inode.c: make a function static

This patch makes a needlessly global function static.

Signed-off-by: Adrian Bunk
Acked-by: William Irwin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2005-11-09 23:56:41 +0800

30 Oct, 2005

1 commit

2e9b367c2 [PATCH] hugetlb: overcommit accounting check

Basic overcommit checking for hugetlb_file_map() based on an implementation
used with demand faulting in SLES9.

Since demand faulting can't guarantee the availability of pages at mmap
time, this patch implements a basic sanity check to ensure that the number
of huge pages required to satisfy the mmap are currently available.
Despite the obvious race, I think it is a good start on doing proper
accounting. I'd like to work towards an accounting system that mimics the
semantics of normal pages (especially for the MAP_PRIVATE/COW case). That
work is underway and builds on what this patch starts.

Huge page shared memory segments are simpler and still maintain their
commit on shmget semantics.

Signed-off-by: Adam Litke
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adam Litke
2005-10-30 12:40:43 +0800