15 Jan, 2006

1 commit

  • Anything that writes into a tmpfs filesystem is liable to disproportionately
    decrease the available memory on a particular node. Since there's no telling
    what sort of application (e.g. dd/cp/cat) might be dropping large files
    there, this lets the admin choose the appropriate default behavior for their
    site's situation.

    Introduce a tmpfs mount option which allows specifying a memory policy and
    a second option to specify the nodelist for that policy. With the default
    policy, tmpfs will behave as it does today. This patch adds support for
    preferred, bind, and interleave policies.

    The default policy will cause pages to be added to tmpfs files on the node
    which is doing the writing. Some jobs expect a single process to create
    and manage the tmpfs files. This results in a node which has a
    significantly reduced number of free pages.

    With this patch, the administrator can specify the policy, and the nodes for
    that policy, on which they would prefer allocations (see the mount sketch
    after this entry).

    This patch was originally written by Brent Casavant and Hugh Dickins. I
    added support for the bind and preferred policies and the mpol_nodelist
    mount option.

    Signed-off-by: Brent Casavant
    Signed-off-by: Hugh Dickins
    Signed-off-by: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
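
    A minimal usage sketch under stated assumptions: the mpol= spelling of the
    policy option, the /mnt/scratch mount point, and the node range are all
    illustrative here; the commit names the mpol_nodelist option but not the
    exact syntax of the policy option itself.

        /* Hypothetical: mount tmpfs with an interleave policy over nodes 0-3
         * so writes no longer pile pages onto the writing node alone. */
        #include <stdio.h>
        #include <sys/mount.h>

        int main(void)
        {
            if (mount("tmpfs", "/mnt/scratch", "tmpfs", 0,
                      "size=1g,mpol=interleave,mpol_nodelist=0-3") != 0) {
                perror("mount");   /* needs CAP_SYS_ADMIN and NUMA hardware */
                return 1;
            }
            return 0;
        }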

07 Jan, 2006

1 commit

  • Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
    supported. This helps us to safely use hugetlb pages in many more
    applications. The patch makes the following changes. If needed, I also have
    it broken out according to the following paragraphs.

    1. Add a pair of functions to set/clear write access on huge ptes. The
    writable check in make_huge_pte is moved out to the caller for use by COW
    later.

    2. Hugetlb copy-on-write requires special case handling in the following
    situations:

    - copy_hugetlb_page_range() - Copied pages must be write protected so
    a COW fault will be triggered (if necessary) if those pages are written
    to.

    - find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the
    page cache. MAP_PRIVATE pages still need to be locked however.

    3. Provide hugetlb_cow(), called from hugetlb_fault() and
    hugetlb_no_page(), which handles the COW fault by making the actual copy.

    4. Remove the check in hugetlbfs_file_mmap() so that MAP_PRIVATE mmaps
    will be allowed. Make MAP_HUGETLB exempt from the deprecated VM_RESERVED
    mapping check. (A small userspace demonstration follows this entry.)

    Signed-off-by: David Gibson
    Signed-off-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: "Seth, Rohit"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
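
    A minimal demonstration sketch of the new MAP_PRIVATE semantics, assuming
    hugetlbfs is mounted at /mnt/huge and 2 MB huge pages are available; both
    the path and the page size are assumptions for illustration.

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/wait.h>
        #include <unistd.h>

        #define HPAGE_SIZE (2UL * 1024 * 1024)   /* assumed huge page size */

        int main(void)
        {
            int fd = open("/mnt/huge/demo", O_CREAT | O_RDWR, 0600);
            if (fd < 0) { perror("open"); return 1; }

            /* MAP_PRIVATE on a hugetlbfs file is what this patch enables. */
            char *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }

            p[0] = 'A';
            if (fork() == 0) {
                p[0] = 'B';     /* COW fault: the child gets its own copy */
                _exit(0);
            }
            wait(NULL);
            printf("parent sees '%c'\n", p[0]);   /* still 'A' */
            return 0;
        }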

23 Nov, 2005

1 commit

  • Currently, if a hugetlbfs is mounted without limits (the default), statfs()
    will return -1 for max/free/used blocks. This does not appear to be in
    line with normal convention: simple_statfs() and shmem_statfs() both return
    0 in similar cases. Worse, it confuses the translation logic in
    put_compat_statfs(), causing it to return -EOVERFLOW on such a mount.

    This patch alters hugetlbfs_statfs() to return 0 for max/free/used blocks
    on a mount without limits. Note that we need the test in the patch below,
    rather than just using 0 in the sbinfo structure, because the -1 marked in
    the free blocks field is used internally to tell the rest of hugetlbfs
    that the mount has no limits. (A short userspace illustration follows this
    entry.)

    Signed-off-by: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
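
    A short illustration of what the fix means for userspace, assuming a
    hugetlbfs mount at /mnt/huge (the path is an assumption):

        #include <stdio.h>
        #include <sys/vfs.h>

        int main(void)
        {
            struct statfs sb;

            if (statfs("/mnt/huge", &sb) != 0) { perror("statfs"); return 1; }

            /* On an unlimited mount these printed -1 before the fix,
             * tripping -EOVERFLOW in put_compat_statfs(); now they read 0. */
            printf("blocks=%ld free=%ld\n", (long)sb.f_blocks, (long)sb.f_bfree);
            return 0;
        }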

30 Oct, 2005

7 commits

  • Basic overcommit checking for hugetlb_file_map() based on an implementation
    used with demand faulting in SLES9.

    Since demand faulting can't guarantee the availability of pages at mmap
    time, this patch implements a basic sanity check to ensure that the number
    of huge pages required to satisfy the mmap is currently available (a
    simplified model follows this entry). Despite the obvious race, I think it
    is a good start on doing proper accounting. I'd like to work towards an
    accounting system that mimics the semantics of normal pages (especially
    for the MAP_PRIVATE/COW case). That work is underway and builds on what
    this patch starts.

    Huge page shared memory segments are simpler and still maintain their
    commit on shmget semantics.

    Signed-off-by: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
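
    A simplified, self-contained model of the check; the counter, the helper
    name, and the 2 MB page size are stand-ins for the kernel internals, not
    the patch's literal code.

        #include <errno.h>
        #include <stdio.h>

        #define HPAGE_SHIFT 21                     /* assumed 2 MB huge pages */
        static unsigned long free_huge_pages = 4;  /* stand-in pool counter */

        /* Return 0 if enough huge pages exist right now to back an mmap of
         * 'len' bytes, -ENOMEM otherwise. Deliberately racy: pages are
         * counted, not reserved, exactly as the commit message concedes. */
        static int hugetlb_overcommit_check(unsigned long len)
        {
            unsigned long needed =
                (len + (1UL << HPAGE_SHIFT) - 1) >> HPAGE_SHIFT;
            return needed <= free_huge_pages ? 0 : -ENOMEM;
        }

        int main(void)
        {
            printf("8MB:  %d\n", hugetlb_overcommit_check(8UL << 20));  /* 0 */
            printf("16MB: %d\n", hugetlb_overcommit_check(16UL << 20)); /* -12 */
            return 0;
        }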
  • Below is a patch to implement demand faulting for huge pages. The main
    motivation for changing from prefaulting to demand faulting is so that huge
    page memory areas can be allocated according to NUMA policy.

    Thanks to consolidated hugetlb code, switching the behavior requires changing
    only one fault handler. The bulk of the patch just moves the logic from
    hugetlb_prefault() to hugetlb_pte_fault() and find_get_huge_page() (a toy
    model of the fault-time lookup follows this entry).

    Signed-off-by: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
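
    A toy, self-contained model of the fault-time lookup described above; the
    structure and names are illustrative stand-ins, not the patch's code.

        #include <stdio.h>
        #include <stdlib.h>

        #define NPAGES 8

        struct huge_file { void *cache[NPAGES]; };  /* stand-in page cache */

        static void *fault_in_huge_page(struct huge_file *f, unsigned long idx)
        {
            if (!f->cache[idx]) {
                /* Prefaulting allocated every page at mmap() time; demand
                 * faulting defers allocation to first touch, so the page can
                 * follow the touching task's NUMA policy. */
                f->cache[idx] = calloc(1, 2UL << 20);  /* assumed 2 MB page */
                printf("faulted in page %lu\n", idx);
            }
            return f->cache[idx];
        }

        int main(void)
        {
            struct huge_file f = { { 0 } };
            fault_in_huge_page(&f, 3);   /* first touch allocates */
            fault_in_huge_page(&f, 3);   /* second touch hits the cache */
            return 0;
        }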
  • Reformat hugetlbfs_forget_inode and add the missing but harmless
    write_inode_now call. It looks the same as generic_forget_inode now except
    for the call to truncate_hugepages instead of truncate_inode_pages.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
  • hugetlbfs_do_delete_inode is the same as generic_delete_inode now, so remove
    it in favour of the latter.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Make hugetlbfs_delete_inode look the same as generic_delete_inode, fixing a
    bunch of missing updates to it at the same time. Rename it to
    hugetlbfs_do_delete_inode and add a real hugetlbfs_delete_inode that
    implements ->delete_inode.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Move hugetlbfs accounting into ->alloc_inode / ->destroy_inode. This keeps
    the code simpler, fixes a leak where a failing inode allocation wouldn't
    decrement the counter, and moves hugetlbfs_delete_inode and
    hugetlbfs_forget_inode closer to their generic counterparts. (The pairing
    is modeled after this entry.)

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
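
    A self-contained model of why the pairing fixes the leak: the decrement
    lives in the one routine every successfully allocated inode must pass
    through on its way out. Names are stand-ins for the hugetlbfs internals.

        #include <stdio.h>
        #include <stdlib.h>

        static long inodes_in_flight;   /* stand-in for the sbinfo counter */

        static void *alloc_inode_model(void)
        {
            void *inode = malloc(128);
            if (inode)
                inodes_in_flight++;     /* count only inodes that exist */
            return inode;               /* a failed alloc never bumped it */
        }

        static void destroy_inode_model(void *inode)
        {
            inodes_in_flight--;         /* every live inode ends up here */
            free(inode);
        }

        int main(void)
        {
            void *i = alloc_inode_model();
            if (i)
                destroy_inode_model(i);
            printf("in flight: %ld\n", inodes_in_flight);   /* prints 0 */
            return 0;
        }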
  • Remove the page_table_lock from around the calls to unmap_vmas, and replace
    the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
    now safe to descend without page_table_lock.

    Don't attempt fancy locking for hugepages, just take page_table_lock in
    unmap_hugepage_range. This makes zap_hugepage_range, and the hugetlb test in
    zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway. Nor
    does unmap_vmas have much use for its mm arg now. (The resulting per-table
    locking pattern is sketched after this entry.)

    The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
    page_table_lock: if they're implemented at all, they typically come down to
    flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
    (which we already audited for the mprotect case).

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
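
    A sketch of the per-page-table locking shape this change moves to, not the
    patch's literal diff; the walk body is abbreviated and the surrounding
    variables (mm, pmd, addr, end) are assumed from context.

        /* Inside zap_pte_range(): lock just the page table being walked
         * via pte_offset_map_lock() instead of holding page_table_lock
         * around the whole unmap. */
        spinlock_t *ptl;
        pte_t *pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        do {
            /* ... clear the pte and free the backing page ... */
        } while (pte++, addr += PAGE_SIZE, addr != end);
        pte_unmap_unlock(pte - 1, ptl);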

22 Jun, 2005

1 commit

  • Ingo recently introduced a great speedup for allocating new mmaps using the
    free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
    causes huge performance increases in thread creation.

    The downside of this patch is that it does lead to fragmentation in the
    mmap-ed areas (visible via /proc/self/maps), such that some applications
    that work fine under 2.4 kernels quickly run out of memory on any 2.6
    kernel.

    The problem is twofold:

    1) the free_area_cache is used to continue a search for memory where
    the last search ended. Before the change new areas were always
    searched from the base address on.

    So now new small areas are cluttering holes of all sizes
    throughout the whole mmap-able region, whereas before small requests
    tended to fill holes near the base, leaving holes far from the base
    large and available for larger requests.

    2) the free_area_cache is also set to the location of the last
    munmap-ed area, so in scenarios where we allocate e.g. five regions of
    1K each and then free regions 4, 2, 3 in that order, the next request
    for 1K will be placed in the position of the old region 3, whereas
    before we appended it to the still active region 1, placing it at the
    location of the old region 2. Before we had one free region of 2K; now
    we only get two free regions of 1K -> fragmentation.

    The patch addresses these issues by introducing yet another cache
    descriptor, cached_hole_size, that contains the largest known hole size
    below the current free_area_cache. If a new request comes in, its size is
    compared against the cached_hole_size and, if the request can be filled
    with a hole below free_area_cache, the search is started from the base
    instead (a simplified model of this decision follows this entry).

    The results look promising: whereas 2.6.12-rc4 fragments quickly (my
    earlier-posted leakme.c test program terminates after 50000+ iterations
    with 96 distinct and fragmented maps in /proc/self/maps), it performs
    nicely (as expected) with thread creation: Ingo's test_str02 with 20000
    threads requires 0.7s system time.

    Taking out Ingo's patch (un-patch available per request) by basically
    deleting all mentions of free_area_cache from the kernel and starting the
    search for new memory always at the respective bases, we observe: leakme
    terminates successfully with 11 distinct, hardly fragmented areas in
    /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system
    time for Ingo's test_str02 with 20000 threads.

    Now - drumroll ;-) the appended patch works fine with leakme: it ends with
    only 7 distinct areas in /proc/self/maps and also thread creation seems
    sufficiently fast with 0.71s for 20000 threads.

    Signed-off-by: Wolfgang Wander
    Credit-to: "Richard Purdie"
    Signed-off-by: Ken Chen
    Acked-by: Ingo Molnar (partly)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfgang Wander
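
    A simplified, self-contained model of the search-start decision; the
    constants and names here are illustrative, not the kernel's
    arch_get_unmapped_area().

        #include <stdio.h>

        #define TASK_UNMAPPED_BASE 0x10000000UL

        static unsigned long free_area_cache = 0x40000000UL; /* resume point */
        static unsigned long cached_hole_size; /* largest hole seen below it */

        static unsigned long pick_search_start(unsigned long len)
        {
            if (len <= cached_hole_size) {
                /* A big-enough hole may exist near the base: restart there
                 * and let the walk rebuild the hole-size cache. */
                cached_hole_size = 0;
                return TASK_UNMAPPED_BASE;
            }
            /* Fast path: continue where the last search left off. */
            return free_area_cache;
        }

        int main(void)
        {
            cached_hole_size = 1UL << 20;   /* pretend a 1 MB hole was seen */
            printf("64K request starts at %#lx\n",
                   pick_search_start(64UL << 10));   /* from the base */
            printf("4M request starts at %#lx\n",
                   pick_search_start(4UL << 20));    /* from the cache */
            return 0;
        }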

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds