24 Apr, 2007

3 commits

  • NR_FILE_PAGES must be accounted for depending on the zone that the page
    belongs to. If we replace the page in the radix tree then we may have to
    shift the count to another zone.
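
    A minimal sketch of the counter shift (illustrative, not the literal
    patch; these are the standard per-zone stat accessors):

        if (page_zone(oldpage) != page_zone(newpage)) {
                __dec_zone_page_state(oldpage, NR_FILE_PAGES);
                __inc_zone_page_state(newpage, NR_FILE_PAGES);
        }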

    Suggested-by: Ethan Solomita
    Eventually-typed-in-by: Christoph Lameter
    Cc: Martin Bligh
    Cc:
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
    to see lots of other processes killed with "No available memory
    (MPOL_BIND)". memhog is killed correctly once we initialize nodemask in
    constrained_alloc().
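
    A rough sketch of the shape of the fix (assumption: the mask should start
    from the online nodes before the policy-allowed nodes are cleared out of
    it, rather than being left uninitialized on the stack):

        nodemask_t nodes = node_online_map;     /* was: nodemask_t nodes; */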

    Signed-off-by: Hugh Dickins
    Acked-by: Christoph Lameter
    Acked-by: William Irwin
    Acked-by: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
    When finding other threads that share an mm with that task, we need to
    kill each of those individual threads, not the originally selected task again.

    (Bug introduced by f2a2a7108aa0039ba7a5fe7a0d2ecef2219a7584)
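
    A hedged sketch of the corrected loop (signatures illustrative):

        do_each_thread(g, q) {
                if (q->mm == p->mm && q->tgid != p->tgid)
                        __oom_kill_task(q, 1);  /* was: __oom_kill_task(p, 1) */
        } while_each_thread(g, q);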

    Acked-by: William Irwin
    Acked-by: Christoph Lameter
    Cc: Nick Piggin
    Cc: Andrew Morton
    Cc: Andi Kleen
    Signed-off-by: David Rientjes
    Signed-off-by: Linus Torvalds

    David Rientjes
     

13 Apr, 2007

1 commit


05 Apr, 2007

1 commit


04 Apr, 2007

2 commits

  • Mention the slab name when listing corrupt objects. Although the function
    that released the memory is mentioned, that is frequently ambiguous as such
    functions often release several pieces of memory.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The git commit c2fda5fed81eea077363b285b66eafce20dfd45a which
    added the page_test_and_clear_dirty call to page_mkclean and the
    git commit 7658cc289288b8ae7dd2c2224549a048431222b3 which fixes
    the "nasty and subtle race in shared mmap'ed page writeback"
    problem in clear_page_dirty_for_io cause data corruption on s390.

    The effect of the two changes is that for every call to
    clear_page_dirty_for_io a page_test_and_clear_dirty is done. If
    the per page dirty bit is set, set_page_dirty is called. Strangely,
    clear_page_dirty_for_io is called for not-uptodate pages, e.g.
    over this call-chain:

    [] clear_page_dirty_for_io+0x12a/0x130
    [] generic_writepages+0x258/0x3e0
    [] do_writepages+0x76/0x7c
    [] __writeback_single_inode+0xba/0x3e4
    [] sync_sb_inodes+0x23e/0x398
    [] writeback_inodes+0x12e/0x140
    [] wb_kupdate+0xd2/0x178
    [] pdflush+0x162/0x23c

    The bad news now is that page_test_and_clear_dirty might claim
    that a not-uptodate page is dirty since SetPageUptodate which
    resets the per page dirty bit has not yet been called. The page
    writeback that follows clobbers the data on disk.

    The simplest solution to this problem is to move the call to
    page_test_and_clear_dirty under the "if (page_mapped(page))".
    If a file backed page is mapped it is uptodate.
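
    A hedged sketch of the resulting page_mkclean() (details illustrative):

        int page_mkclean(struct page *page)
        {
                int ret = 0;

                BUG_ON(!PageLocked(page));

                if (page_mapped(page)) {
                        struct address_space *mapping = page_mapping(page);
                        if (mapping) {
                                ret = page_mkclean_file(mapping, page);
                                /* only mapped file pages are known uptodate,
                                   so only test the storage key here */
                                if (page_test_and_clear_dirty(page))
                                        ret = 1;
                        }
                }
                return ret;
        }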

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

29 Mar, 2007

5 commits

  • Fix the bug that reading into an xip mapping from /dev/zero fills the user
    page table with ZERO_PAGE() entries. Later on, xip cannot tell which pages
    have been ZERO_PAGE() filled by access to a sparse mapping, and which ones
    originate from /dev/zero. It will unmap ZERO_PAGE from all mappings when
    filling the sparse hole with data. xip now uses its own zeroed page
    for its sparse mappings. Please apply.

    Signed-off-by: Carsten Otte
    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • sys_madvise has down_write of mmap_sem, then madvise_remove calls
    vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can easily devise
    deadlocks from that ordering.

    Have madvise_remove drop mmap_sem while calling vmtruncate_range: luckily, since
    madvise_remove doesn't split or merge vmas, it's easy to handle this case with
    a NULL prev, without restructuring sys_madvise. (Though sad to retake
    mmap_sem when it's unlikely to be needed, and certainly down_read is
    sufficient for MADV_REMOVE, unlike the other madvices.)
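
    A hedged sketch of the shape of the fix (details illustrative):

        static long madvise_remove(struct vm_area_struct *vma,
                                   struct vm_area_struct **prev,
                                   unsigned long start, unsigned long end)
        {
                ...
                *prev = NULL;   /* tell sys_madvise we have dropped mmap_sem */

                up_write(&current->mm->mmap_sem);
                error = vmtruncate_range(mapping->host, offset, endoff);
                down_write(&current->mm->mmap_sem);
                return error;
        }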

    Signed-off-by: Hugh Dickins
    Cc: Miklos Szeredi
    Cc: Badari Pulavarty
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • shmem_truncate_range has its own truncate_inode_pages_range, to free any pages
    racily instantiated while it was in progress: a SHMEM_PAGEIN flag is set when
    this might have happened. But holepunching gets no chance to clear that flag
    at the start of vmtruncate_range, so it's always set (unless a truncate came
    just before), so holepunch almost always does this second
    truncate_inode_pages_range.

    shmem holepunch has unlikely swapfile races hereabouts whatever we do
    (without a fuller rework than is fit for this release): I was going to skip
    the second truncate in the punch_hole case, but Miklos points out that would
    make holepunch correctness more vulnerable to swapoff. So keep the second
    truncate, but follow it by an unmap_mapping_range to eliminate the
    disconnected pages (freed from pagecache while still mapped in userspace) that
    it might have left behind.
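
    A minimal sketch of the resulting sequence in shmem's holepunch path
    (offsets illustrative):

        truncate_inode_pages_range(inode->i_mapping, start, end);
        if (punch_hole)
                /* drop any ptes still mapping pages just freed from pagecache */
                unmap_mapping_range(inode->i_mapping, start, end - start + 1, 0);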

    Signed-off-by: Hugh Dickins
    Cc: Miklos Szeredi
    Cc: Badari Pulavarty
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Miklos Szeredi observes that during truncation of shmem page directories,
    info->lock is released to improve latency (after lowering i_size and
    next_index to exclude races); but this is quite wrong for holepunching, which
    receives no such protection from i_size or next_index, and is left vulnerable
    to races with shmem_unuse, shmem_getpage and shmem_writepage.

    Hold info->lock throughout when holepunching? No, any user could prevent
    rescheduling for far too long. Instead take info->lock just when needed: in
    shmem_free_swp when removing the swap entries, and whenever removing a
    directory page from the level above. But so long as we remove before
    scanning, we can safely skip taking the lock at the lower levels, except at
    misaligned start and end of the hole.

    Signed-off-by: Hugh Dickins
    Cc: Miklos Szeredi
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered in rare
    circumstances, because shmem_truncate_range() erroneously removes partially
    truncated directory pages at the end of the range: later reclaim on pages
    pointing to these removed directories triggers the BUG. Indeed, and it can
    also cause data loss beyond the hole.

    Fix this as in the patch proposed by Miklos, but distinguish between "limit"
    (how far we need to search: ignore truncation's next_index optimization in the
    holepunch case - if there are races it's more consistent to act on the whole
    range specified) and "upper_limit" (how far we can free directory pages:
    generally we must be careful to keep partially punched pages, but can relax at
    end of file - i_size being held stable by i_mutex).

    Signed-off-by: Hugh Dickins
    Cc: Miklos Szeredi
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

27 Mar, 2007

1 commit

  • There is a small problem in handling page bounce.

    At the moment blk_max_pfn equals max_pfn, which is in fact not the maximum
    possible _number_ of a page frame, but the _count_ of page frames. For
    example, on a 32-bit x86 node with 4GB RAM, max_pfn = 0x100000, not
    0xFFFFF.

    The request_queue structure has a member q->bounce_pfn, and the queue needs
    bounce pages for pages _above_ this limit. This is handled by
    blk_queue_bounce(), where the following check is performed:

    if (q->bounce_pfn >= blk_max_pfn)
    return;

    Assume that a driver has set q->bounce_pfn to 0xFFFF, but blk_max_pfn
    equals 0x10000. In that situation the check above fails, and for every bio
    we fall through to iterating over all the pages tied to the bio.

    Note that for quite a wide range of device drivers (ide, md, ...) this
    problem doesn't occur, because they use BLK_BOUNCE_ANY for bounce_pfn.
    BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, so the check above
    doesn't fail. But for other drivers, which obtain the required value from
    the device, it does fail. For example, sata_nv uses ATA_DMA_MASK or
    dev->dma_mask.

    I propose to use (max_pfn - 1) for blk_max_pfn, and the same for
    blk_max_low_pfn. The patch also cleans up some checks related to
    bounce_pfn.
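
    A small arithmetic illustration of the off-by-one (values from the 4GB
    example above; not the literal patch):

        /* max_pfn counts page frames, so the highest valid pfn is max_pfn - 1 */
        unsigned long max_pfn     = 0x100000;     /* 4GB / 4KB pages */
        unsigned long blk_max_pfn = max_pfn - 1;  /* 0xFFFFF, the last pfn */

        /* a queue that can reach all of memory sets bounce_pfn to the last pfn */
        if (q->bounce_pfn >= blk_max_pfn)         /* 0xFFFFF >= 0xFFFFF: true */
                return;                           /* no bouncing, skip the scan */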

    Signed-off-by: Vasily Tarasov
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Vasily Tarasov
     

23 Mar, 2007

2 commits

  • Make the SYSV SHM nattch counter work correctly by forcing multiple VMAs to
    be produced to represent MAP_SHARED segments, even if they overlap exactly.

    Using this test program:

    http://people.redhat.com/~dhowells/doshm.c

    Run as:

    doshm sysv

    I can see nattch going from one before the patch:

    # /doshm sysv
    Command: sysv
    shmid: 65536
    memory: 0xc3700000
    c0b00000-c0b04000 rw-p 00000000 00:00 0
    c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so
    c3520000-c352278c rw-p 00000000 00:0b 13763417 /doshm
    c3584000-c35865e8 r-xs 00000000 00:0b 13763417 /doshm
    c3588000-c358aa00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3590000-c359b6c0 rw-p 00000000 00:00 0
    c3620000-c3640000 rwxp 00000000 00:00 0
    c3700000-c37fa000 rw-S 00000000 00:06 1411 /SYSV00000000 (deleted)
    c3700000-c37fa000 rw-S 00000000 00:06 1411 /SYSV00000000 (deleted)
    nattch 1

    To two after the patch:

    # /doshm sysv
    Command: sysv
    shmid: 0
    memory: 0xc3700000
    c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so
    c3320000-c3340000 rwxp 00000000 00:00 0
    c3530000-c35325e8 r-xs 00000000 00:0b 13763417 /doshm
    c3534000-c353678c rw-p 00000000 00:0b 13763417 /doshm
    c3538000-c353aa00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3590000-c359b6c0 rw-p 00000000 00:00 0
    c35a4000-c35a8000 rw-p 00000000 00:00 0
    c3700000-c37fa000 rw-S 00000000 00:06 1369 /SYSV00000000 (deleted)
    c3700000-c37fa000 rw-S 00000000 00:06 1369 /SYSV00000000 (deleted)
    nattch 2

    That's +1 to nattch for each shmat() made.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Supply a get_unmapped_area() to fix NOMMU SYSV SHM support.

    Signed-off-by: David Howells
    Acked-by: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

17 Mar, 2007

4 commits

  • Looking at oom_kill.c, I found that the intention not to kill the selected
    process if any of its children/siblings has OOM_DISABLE set is not being
    met.

    Signed-off-by: Ankita Garg
    Acked-by: Nick Piggin
    Acked-by: William Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ankita Garg
     
  • The current NFS client congestion logic is severely broken: it marks the
    backing device congested during each nfs_writepages() call but doesn't
    mirror this in nfs_writepage() which makes for deadlocks. Also it
    implements its own waitqueue.

    Replace this by a more regular congestion implementation that puts a cap on
    the number of active writeback pages and uses the bdi congestion waitqueue.

    Also always use an interruptible wait since it makes sense to be able to
    SIGKILL the process even for mounts without 'intr'.
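
    A hedged sketch of the cap (the counter and threshold names are
    illustrative, taken from the description above):

        if (atomic_long_inc_return(&nfss->writeback) > NFS_CONGESTION_ON_THRESH)
                set_bdi_congested(&nfss->backing_dev_info, WRITE);
        ...
        if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
                clear_bdi_congested(&nfss->backing_dev_info, WRITE);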

    Signed-off-by: Peter Zijlstra
    Acked-by: Trond Myklebust
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • This patch fixes a user-triggerable oops that was reported by Leonid
    Ananiev as archived at http://lkml.org/lkml/2007/2/8/337.

    dio writes invalidate clean pages that intersect the written region so that
    subsequent buffered reads go to disk to read the new data. If this fails
    the interface tries to tell the caller that the cache is inconsistent by
    returning EIO.

    Before this patch we had the problem where this invalidation failure would
    clobber -EIOCBQUEUED as it made its way from fs/direct-io.c to fs/aio.c.
    Both fs/aio.c and bio completion call aio_complete() and we reference freed
    memory, usually oopsing.

    This patch addresses this problem by invalidating before the write so that
    we can cleanly return -EIO before ->direct_IO() has had a chance to return
    -EIOCBQUEUED.
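
    A hedged sketch of the ordering (helpers and offsets illustrative):

        /* invalidate the written range *before* ->direct_IO() runs, so a
           failure can still be reported to the caller as a plain -EIO */
        if (rw == WRITE && mapping->nrpages) {
                retval = invalidate_inode_pages2_range(mapping,
                                offset >> PAGE_CACHE_SHIFT, end_index);
                if (retval)
                        goto out;
        }

        retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);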

    There is a compromise here. During the dio write we can fault in mmap()ed
    pages which intersect the written range with get_user_pages() if the user
    provided them for the source buffer. This is a crazy thing to do, but we
    can make it mostly work in most cases by trying the invalidation again.
    The compromise is that we won't return an error if this second invalidation
    fails if it's an AIO write and we have -EIOCBQUEUED.

    This was tested by having two processes race performing large O_DIRECT and
    buffered ordered writes. Within minutes ext3 would see a race between
    ext3_releasepage() and jbd holding a reference on ordered data buffers and
    would cause invalidation to fail, panicking the box. The test can be found
    in the 'aio_dio_bugs' test group in test.kernel.org/autotest. After this
    patch the test passes.

    Signed-off-by: Zach Brown
    Signed-off-by: Benjamin LaHaise
    Cc: Leonid Ananiev
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • madvise(MADV_REMOVE) can go into an infinite loop or cause an oops if the
    call covers a region from the start of a vma, and extending past that vma.

    Signed-off-by: Nick Piggin
    Cc: Badari Pulavarty
    Acked-by: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

05 Mar, 2007

2 commits

  • Currently we do not check for vma flags if sys_move_pages is called to move
    individual pages. If sys_migrate_pages is called to move pages then we
    check for vm_flags that indicate a non-migratable vma, but that set still
    includes VM_LOCKED, even though we can migrate mlocked pages.

    Extract the vma_migratable check from mm/mempolicy.c, fix it and put it
    into migrate.h so that it can be used from both locations.

    Problem was spotted by Lee Schermerhorn.
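
    A hedged sketch of the helper after the move into migrate.h (the exact
    flag set is illustrative):

        static inline int vma_migratable(struct vm_area_struct *vma)
        {
                /* VM_LOCKED is deliberately absent: mlocked pages can migrate */
                if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP | VM_RESERVED))
                        return 0;
                return 1;
        }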

    Signed-off-by: Christoph Lameter
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • shmem's super_operations were missed from the recent const-ification;
    and simple_fill_super()'s, which can share with get_sb_pseudo()'s.

    Signed-off-by: Hugh Dickins
    Acked-by: Josef 'Jeff' Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

02 Mar, 2007

7 commits

  • Fix invalidate_inode_pages2_range() so that it does not immediately exit
    just because a single page in the specified range could not be removed.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • page_lock_anon_vma() uses spin_lock() to block RCU. This doesn't work with
    PREEMPT_RCU; we have to do rcu_read_lock() explicitly. Otherwise, it is
    theoretically possible that slab returns anon_vma's memory to the system
    before we do spin_unlock(&anon_vma->lock).

    [ Hugh points out that this only matters for PREEMPT_RCU, which isn't merged
    yet, and may never be. Regardless, this patch is conceptually the
    right thing to do, even if it doesn't matter at this point. - Linus ]
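
    A minimal sketch of the pattern (lookup spelled out for illustration):

        rcu_read_lock();                  /* explicitly block RCU reclaim */
        mapping = (unsigned long)page->mapping;
        if ((mapping & PAGE_MAPPING_ANON) && page_mapped(page)) {
                anon_vma = (struct anon_vma *)(mapping - PAGE_MAPPING_ANON);
                spin_lock(&anon_vma->lock);
        }
        ...
        spin_unlock(&anon_vma->lock);
        rcu_read_unlock();                /* only now may slab free the anon_vma */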

    Signed-off-by: Oleg Nesterov
    Cc: Paul McKenney
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • throttle_vm_writeout() is designed to wait for the dirty levels to subside.
    But if the caller holds IO or FS locks, we might be holding up that writeout.

    So change it to take a single nap to give other devices a chance to clean some
    memory, then return.
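
    A minimal sketch of the new shape (illustrative):

        void throttle_vm_writeout(void)
        {
                ...
                /* one nap to let other devices clean some memory, then return
                   instead of looping until the dirty levels subside */
                congestion_wait(WRITE, HZ/10);
        }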

    Cc: Nick Piggin
    Cc: OGAWA Hirofumi
    Cc: Kumar Gala
    Cc: Pete Zaitcev
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The code is seemingly trying to make sure that rb_next() brings us to
    successively increasing vma entries.

    But the two variables, prev and pend, used to perform these checks, are
    never advanced.

    Signed-off-by: David S. Miller
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Miller
     
  • Rename PG_checked to PG_owner_priv_1 to reflect its availability as a
    private flag for use by the owner/allocator of the page. In the case of
    pagecache pages (which might be considered to be owned by the mm),
    filesystems may use the flag.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix kernel-doc warnings in 2.6.20-git15 (lib/, mm/, kernel/, include/).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • shmem_{nopage,mmap} are no longer used in ipc/shm.c

    Signed-off-by: Adrian Bunk
    Cc: "Eric W. Biederman"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

21 Feb, 2007

3 commits

  • The alien cache is a per cpu per node array allocated for every slab on the
    system. Currently we size this array for all nodes that the kernel does
    support. For IA64 this is 1024 nodes. So we allocate an array with 1024
    objects even if we only boot a system with 4 nodes.

    This patch uses "nr_node_ids" to determine the number of possible nodes
    supported by a hardware configuration and only allocates an alien cache
    sized for possible nodes.

    The initialization of nr_node_ids occurred too late relative to the bootstrap
    of the slab allocator and so I moved the setup_nr_node_ids() into
    free_area_init_nodes().
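
    A hedged sketch of the sizing change in the alien-cache allocation
    (illustrative):

        struct array_cache **alien;

        /* size for the nodes this machine can actually have, not MAX_NUMNODES */
        alien = kmalloc(nr_node_ids * sizeof(struct array_cache *), GFP_KERNEL);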

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • highest_possible_node_id() is currently used to calculate the last possible
    node id so that the network subsystem can figure out how to size per-node
    arrays.

    I think having the ability to determine the maximum number of nodes in a
    system at runtime is useful, but then we should name this entry
    correspondingly; it should return the number of node_ids, and the value
    needs to be set up only once at bootup. The node_possible_map does not
    change after bootup.

    This patch introduces nr_node_ids and replaces the use of
    highest_possible_node_id(). nr_node_ids is calculated on bootup when the
    page allocator's pagesets are initialized.
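
    A minimal sketch of the bootup calculation (illustrative):

        int nr_node_ids __read_mostly = MAX_NUMNODES;

        static void __init setup_nr_node_ids(void)
        {
                unsigned int node;
                unsigned int highest = 0;

                for_each_node_mask(node, node_possible_map)
                        highest = node;
                nr_node_ids = highest + 1;
        }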

    [deweerdt@free.fr: fix oops]
    Signed-off-by: Christoph Lameter
    Cc: Neil Brown
    Cc: Trond Myklebust
    Signed-off-by: Frederik Deweerdt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • bind_zonelist() can create a zero-length zonelist if there is a
    memory-less node. This patch checks the length of the zonelist; if it is
    0, -EINVAL is returned.

    Tested on ia64/NUMA with a memory-less node.
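
    A hedged sketch of the check at the end of bind_zonelist() (error
    propagation illustrative):

        if (num == 0) {
                /* every node in the mask is memory-less: no zone was added */
                kfree(zl);
                return ERR_PTR(-EINVAL);
        }
        zl->zones[num] = NULL;
        return zl;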

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Andi Kleen
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

17 Feb, 2007

1 commit

  • When NFSD receives a write request, the data is typically in a number of
    1448 byte segments and writev is used to collect them together.

    Unfortunately, generic_file_buffered_write passes these to the filesystem
    one at a time, so an e.g. 32K over-write becomes a series of partial-page
    writes to each page, causing the filesystem to have to pre-read those pages
    - wasted effort.

    generic_file_buffered_write handles one segment of the vector at a time as
    it has to pre-fault in each segment to avoid deadlocks. When writing from
    kernel-space (as nfsd does) this is not an issue, so
    generic_file_buffered_write does not need to break an iovec from nfsd into
    little pieces.

    This patch avoids the splitting when get_fs is KERNEL_DS as it is
    from NFSd.
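
    A hedged sketch of the idea (variable names illustrative):

        size_t seglen;

        if (segment_eq(get_fs(), KERNEL_DS))
                seglen = iov_length(iov, nr_segs);  /* whole vector in one go */
        else
                seglen = iov->iov_len;              /* one pre-faulted segment */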

    This issue was introduced by commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83

    Acked-by: Nick Piggin
    Cc: Norman Weathers
    Cc: Vladimir V. Saveliev
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

16 Feb, 2007

3 commits


13 Feb, 2007

4 commits

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.
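
    A minimal illustration of the change (the structure name is hypothetical):

        /* before */
        static struct inode_operations foo_file_inode_operations = { ... };

        /* after: lives in .rodata; accidental writes now fail to compile */
        static const struct inode_operations foo_file_inode_operations = { ... };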

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Make mincore work for anon mappings, nonlinear, and migration entries.
    Based on a patch from Linus Torvalds.

    Signed-off-by: Nick Piggin
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add a NOPFN_REFAULT return code for vm_ops->nopfn() equivalent to
    NOPAGE_REFAULT for vm_ops->nopage(), indicating that the handler requests a
    re-execution of the faulting instruction.

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Arnd Bergmann
    Cc: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Add a vm_insert_pfn helper, so that ->fault handlers can have nopfn
    functionality by installing their own pte and returning NULL.

    Signed-off-by: Nick Piggin
    Signed-off-by: Benjamin Herrenschmidt
    Cc: Arnd Bergmann
    Cc: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

12 Feb, 2007

1 commit