Eric Lee / smarc-fsl-linux-kernel

28 Apr, 2008

1 commit

0dd1334fa fix invalidate_inode_pages2_range() to not clear ret ... Browse Code »

DIO invalidates page cache through invalidate_inode_pages2_range().
invalidate_inode_pages2_range() sets ret=-EIO when
invalidate_complete_page2() fails, but this ret is cleared if
do_launder_page() succeed on a page of next index.

In this case, dio is carried out even if invalidate_complete_page2() fails
on some pages.

This can cause inconsistency between memory and blocks on HDD because the
page cache still exists.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hisashi Hifumi
Cc: Badari Pulavarty
Cc: Ken Chen
Cc: Zach Brown
Cc: Nick Piggin
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Chuck Lever
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hisashi Hifumi
2008-04-28 23:58:18 +0800

04 Mar, 2008

1 commit

0643245f5 docbook: fix kernel-api source files ... Browse Code »

Fix docbook problems in kernel-api.tmpl.
These cause the generated docbook to be incorrect.

Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Randy Dunlap
2008-03-04 02:47:14 +0800

06 Feb, 2008

3 commits

62e1c5530 page migraton: handle orphaned pages ... Browse Code »

Orphaned page might have fs-private metadata, the page is truncated. As
the page hasn't mapping, page migration refuse to migrate the page. It
appears the page is only freed in page reclaim and if zone watermark is
low, the page is never freed, as a result migration always fail. I thought
we could free the metadata so such page can be freed in migration and make
migration more reliable.

[akpm@linux-foundation.org: go direct to try_to_free_buffers()]
Signed-off-by: Shaohua Li
Acked-by: Nick Piggin
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2008-02-06 01:44:19 +0800
a2b345642 Fix dirty page accounting leak with ext3 data=journal ... Browse Code »

In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 ("Clean up and make
try_to_free_buffers() not race with dirty pages"), try_to_free_buffers
was changed to bail out if the page was dirty.

That in turn caused truncate_complete_page to leak massive amounts of
memory, because the dirty bit was only cleared after the call to
try_to_free_buffers.

So the call to cancel_dirty_page was moved up to have the dirty bit
cleared early in 3e67c0987d7567ad666641164a153dca9a43b11d ("truncate:
clear page dirtiness before running try_to_free_buffers()").

The problem with that fix is, that the page can be redirtied after
cancel_dirty_page was called, eg. like this:

truncate_complete_page()
cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
do_invalidatepage()
ext3_invalidatepage()
journal_invalidatepage()
journal_unmap_buffer()
__dispose_buffer()
__journal_unfile_buffer()
__journal_temp_unlink_buffer()
mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

And then we end up with dirty pages being wrongly accounted.

As a result, in ecdfc9787fe527491baefc22dce8b2dbd5b2908d ("Resurrect
'try_to_free_buffers()' VM hackery") the changes to try_to_free_buffers
were reverted, so the original reason for the massive memory leak is
gone, and we can also revert the move of the call to cancel_dirty_page
from truncate_complete_page and get the accounting right again.

I'm not sure if it matters, but opposed to the final check in
__remove_from_page_cache, this one also cares about the task io
accounting, so maybe we want to use this instead, although it's not
quite the clean fix either.

Signed-off-by: Björn Steinbrink
Tested-by: Krzysztof Piotr Oledzki
Cc: Jan Kara
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Thomas Osterried
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bjorn Steinbrink
2008-02-06 01:44:19 +0800
eebd2aa35 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user ... Browse Code »

Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.

zero_user_segment(page, start, end)

Same for a single segment.

zero_user(page, start, length)

Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

Having to treat this special case everywhere makes the
code needlessly complex. The parameter for zeroing is always
KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter
Cc: Steven French
Cc: Michael Halcrow
Cc:
Cc: Steven Whitehouse
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: David Chinner
Cc: Michael Halcrow
Cc: Steven French
Cc: Steven Whitehouse
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2008-02-06 01:44:13 +0800

04 Feb, 2008

1 commit

28bc44d7d do_invalidatepage() comment typo fix ... Browse Code »

Fix a typo in the comment for do_invalidatepage().

Signed-off-by: Fengguang Wu
Signed-off-by: Adrian Bunk

Fengguang Wu
2008-02-04 00:04:10 +0800

17 Oct, 2007

2 commits

4af3c9cc4 Drop some headers from mm.h ... Browse Code »

mm.h doesn't use directly anything from mutex.h and backing-dev.h, so
remove them and add them back to files which need them.

Cross-compile tested on many configs and archs.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-10-17 23:42:55 +0800
c9e51e418 mm: count reclaimable pages per BDI ... Browse Code »

Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable.

Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-10-17 23:42:45 +0800

20 Jul, 2007

2 commits

54cb8821d mm: merge populate and nopage into fault (fixes nonlinear) ... Browse Code »

Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping. The hitch here
is that the ->nopage handler didn't pass down enough information (ie. pgoff).
But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn. Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache. Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
users have hit mainline yet.

[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin
Signed-off-by: Randy Dunlap
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-07-20 01:04:41 +0800
d00806b18 mm: fix fault vs invalidate race for linear mappings ... Browse Code »

Fix the race between invalidate_inode_pages and do_no_page.

Andrea Arcangeli identified a subtle race between invalidation of pages from
pagecache with userspace mappings, and do_no_page.

The issue is that invalidation has to shoot down all mappings to the page,
before it can be discarded from the pagecache. Between shooting down ptes to
a particular page, and actually dropping the struct page from the pagecache,
do_no_page from any process might fault on that page and establish a new
mapping to the page just before it gets discarded from the pagecache.

The most common case where such invalidation is used is in file truncation.
This case was catered for by doing a sort of open-coded seqlock between the
file's i_size, and its truncate_count.

Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then find the
page if it is within i_size, and then check truncate_count under the page
table lock and back out and retry if it had subsequently been changed (ptl
will serialise against unmapping, and ensure a potentially updated
truncate_count is actually visible).

Complexity and documentation issues aside, the locking protocol fails in the
case where we would like to invalidate pagecache inside i_size. do_no_page
can come in anytime and filemap_nopage is not aware of the invalidation in
progress (as it is when it is outside i_size). The end result is that
dangling (->mapping == NULL) pages that appear to be from a particular file
may be mapped into userspace with nonsense data. Valid mappings to the same
place will see a different page.

Andrea implemented two working fixes, one using a real seqlock, another using
a page->flags bit. He also proposed using the page lock in do_no_page, but
that was initially considered too heavyweight. However, it is not a global or
per-file lock, and the page cacheline is modified in do_no_page to increment
_count and _mapcount anyway, so a further modification should not be a large
performance hit. Scalability is not an issue.

This patch implements this latter approach. ->nopage implementations return
with the page locked if it is possible for their underlying file to be
invalidated (in that case, they must set a special vm_flags bit to indicate
so). do_no_page only unlocks the page after setting up the mapping
completely. invalidation is excluded because it holds the page lock during
invalidation of each page (and ensures that the page is not mapped while
holding the lock).

This also allows significant simplifications in do_no_page, because we have
the page locked in the right place in the pagecache from the start.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-07-20 01:04:41 +0800

18 Jul, 2007

1 commit

787d2214c fs: introduce some page/buffer invariants ... Browse Code »

It is a bug to set a page dirty if it is not uptodate unless it has
buffers. If the page has buffers, then the page may be dirty (some buffers
dirty) but not uptodate (some buffers not uptodate). The exception to this
rule is if the set_page_dirty caller is racing with truncate or invalidate.

A buffer can not be set dirty if it is not uptodate.

If either of these situations occurs, it indicates there could be some data
loss problem. Some of these warnings could be a harmless one where the
page or buffer is set uptodate immediately after it is dirtied, however we
should fix those up, and enforce this ordering.

Bring the order of operations for truncate into line with those of
invalidate. This will prevent a page from being able to go !uptodate while
we're holding the tree_lock, which is probably a good thing anyway.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-07-18 01:23:02 +0800

17 Jul, 2007

2 commits

fc9a07e7b invalidate_mapping_pages(): add cond_resched ... Browse Code »

invalidate_mapping_pages() can sometimes take a long time (millions of pages
to free). Long enough for the softlockup detector to trigger.

We used to have a cond_resched() in there but I took it out because the
drop_caches code calls invalidate_mapping_pages() under inode_lock.

The patch adds a nasty flag and puts the cond_resched() back.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-07-17 00:05:36 +0800
2706a1b89 vmscan: fix comments related to shrink_list() ... Browse Code »

Fix the shrink_list name on some files under mm/ directory.

Signed-off-by: Anderson Briglia
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anderson Briglia
2007-07-17 00:05:35 +0800

10 May, 2007

1 commit

01f2705da fs: convert core functions to zero_user_page ... Browse Code »

It's very common for file systems to need to zero part or all of a page,
the simplist way is just to use kmap_atomic() and memset(). There's
actually a library function in include/linux/highmem.h that does exactly
that, but it's confusingly named memclear_highpage_flush(), which is
descriptive of *how* it does the work rather than what the *purpose* is.
So this patchset renames the function to zero_user_page(), and calls it
from the various places that currently open code it.

This first patch introduces the new function call, and converts all the
core kernel callsites, both the open-coded ones and the old
memclear_highpage_flush() ones. Following this patch is a series of
conversions for each file system individually, per AKPM, and finally a
patch deprecating the old call. The diffstat below shows the entire
patchset.

[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Nate Diller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nate Diller
2007-05-10 03:30:55 +0800

02 Mar, 2007

1 commit

7b965e088 [PATCH] VM: invalidate_inode_pages2_range() should not exit early ... Browse Code »

Fix invalidate_inode_pages2_range() so that it does not immediately exit
just because a single page in the specified range could not be removed.

Signed-off-by: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Trond Myklebust
2007-03-02 06:53:39 +0800

12 Feb, 2007

2 commits

fc0ecff69 [PATCH] remove invalidate_inode_pages() ... Browse Code »

Convert all calls to invalidate_inode_pages() into open-coded calls to
invalidate_mapping_pages().

Leave the invalidate_inode_pages() wrapper in place for now, marked as
deprecated.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-02-12 02:51:31 +0800
54bc48552 [PATCH] Export invalidate_mapping_pages() to modules ... Browse Code »

It makes no sense to me to export invalidate_inode_pages() and not
invalidate_mapping_pages() and I actually need invalidate_mapping_pages()
because of its range specification ability...

akpm: also remove the export of invalidate_inode_pages() by making it an
inlined wrapper.

Signed-off-by: Anton Altaparmakov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Altaparmakov
2007-02-12 02:51:30 +0800

27 Jan, 2007

2 commits

569d3287c [PATCH] MM: Remove [PATCH] invalidate_inode_pages2_range() debug ... Browse Code »

NFS can handle the case where invalidate_inode_pages2_range() fails, so the
premise behind commit 8258d4a574d3a8c01f0ef68aa26b969398a0e140 is now gone.

Remove the WARN_ON_ONCE() which is causing users grief as we can see from
http://bugzilla.kernel.org/show_bug.cgi?id=7826

Signed-off-by: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Trond Myklebust
2007-01-27 05:51:00 +0800
ecdfc9787 Resurrect 'try_to_free_buffers()' VM hackery ... Browse Code »

It's not pretty, but it appears that ext3 with data=journal will clean
pages without ever actually telling the VM that they are clean. This,
in turn, will result in the VM (and balance_dirty_pages() in particular)
to never realize that the pages got cleaned, and wait forever for an
event that already happened.

Technically, this seems to be a problem with ext3 itself, but it used to
be hidden by 'try_to_free_buffers()' noticing this situation on its own,
and just working around the filesystem problem.

This commit re-instates that hack, in order to avoid a regression for
the 2.6.20 release. This fixes bugzilla 7844:

http://bugzilla.kernel.org/show_bug.cgi?id=7844

Peter Zijlstra points out that we should probably retain the debugging
code that this removes from cancel_dirty_page(), and I agree, but for
the imminent release we might as well just silence the warning too
(since it's not a new bug: anything that triggers that warning has been
around forever).

Acked-by: Randy Dunlap
Acked-by: Jens Axboe
Acked-by: Peter Zijlstra
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-01-27 04:47:06 +0800

12 Jan, 2007

1 commit

e3db7691e [PATCH] NFS: Fix race in nfs_release_page() ... Browse Code »

NFS: Fix race in nfs_release_page()

invalidate_inode_pages2() may find the dirty bit has been set on a page
owing to the fact that the page may still be mapped after it was locked.
Only after the call to unmap_mapping_range() are we sure that the page
can no longer be dirtied.
In order to fix this, NFS has hooked the releasepage() method and tries
to write the page out between the call to unmap_mapping_range() and the
call to remove_mapping(). This, however leads to deadlocks in the page
reclaim code, where the page may be locked without holding a reference
to the inode or dentry.

Fix is to add a new address_space_operation, launder_page(), which will
attempt to write out a dirty page without releasing the page lock.

Signed-off-by: Trond Myklebust

Also, the bare SetPageDirty() can skew all sort of accounting leading to
other nasties.

[akpm@osdl.org: cleanup]
Signed-off-by: Peter Zijlstra
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Trond Myklebust
2007-01-12 10:18:21 +0800

24 Dec, 2006

1 commit

8368e328d Clean up and export cancel_dirty_page() to modules ... Browse Code »

Make cancel_dirty_page() act more like all the other dirty and writeback
accounting functions: test for "mapping" being NULL, and do the
NR_FILE_DIRY accounting purely based on mapping_cap_account_dirty()).

Also, add it to the exports, so that modular filesystems can use it.

Acked-by: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2006-12-24 01:25:04 +0800

23 Dec, 2006

1 commit

5f2a105d5 [PATCH] truncate: dirty memory accounting fix ... Browse Code »

Only (un)account for IO and page-dirtying for devices which have real backing
store (ie: not tmpfs or ramdisks).

Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-12-23 00:55:45 +0800

22 Dec, 2006

2 commits

3e67c0987 [PATCH] truncate: clear page dirtiness before running try_to_free_buffers() ... Browse Code »

truncate presently invalidates the dirty page's buffer_heads then shoots down
the page. But try_to_free_buffers() will now bale out because the page is
dirty.

Net effect: the LRU gets filled with dirty pages which have invalidated
buffer_heads attached. They have no ->mapping and hence cannot be cleaned.
The machine leaks memory at an enormous rate.

Fix this by cleaning the page before running try_to_free_buffers(), so
try_to_free_buffers() can do its work.

Also, remember to do dirty-page-acoounting in cancel_dirty_page() so the
machine won't wedge up trying to write non-existent dirty pages.

Probably still wrong, but now less so.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-12-22 03:17:26 +0800
fba2591bf VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions ... Browse Code »

They were horribly easy to mis-use because of their tempting naming, and
they also did way more than any users of them generally wanted them to
do.

A dirty page can become clean under two circumstances:

(a) when we write it out. We have "clear_page_dirty_for_io()" for
this, and that function remains unchanged.

In the "for IO" case it is not sufficient to just clear the dirty
bit, you also have to mark the page as being under writeback etc.

(b) when we actually remove a page due to it becoming inaccessible to
users, notably because it was truncate()'d away or the file (or
metadata) no longer exists, and we thus want to cancel any
outstanding dirty state.

For the (b) case, we now introduce "cancel_dirty_page()", which only
touches the page state itself, and verifies that the page is not mapped
(since cancelling writes on a mapped page would be actively wrong as it
is still accessible to users).

Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
ReiserFS, XFS all use the old confusing functions, and will be fixed
separately in subsequent commits (with some of them just removing the
offending logic, and others using clear_page_dirty_for_io()).

This was confirmed by Martin Michlmayr to fix the apt database
corruption on ARM.

Cc: Martin Michlmayr
Cc: Peter Zijlstra
Cc: Hugh Dickins
Cc: Nick Piggin
Cc: Arjan van de Ven
Cc: Andrei Popa
Cc: Andrew Morton
Cc: Dave Kleikamp
Cc: Gordon Farquharson
Cc: Martin Schwidefsky
Cc: Trond Myklebust
Signed-off-by: Linus Torvalds

Linus Torvalds
2006-12-22 01:19:57 +0800

11 Dec, 2006

1 commit

e08748ce0 [PATCH] io-accounting: write-cancel accounting ... Browse Code »

Account for the number of byte writes which this process caused to not happen
after all.

Cc: Jay Lan
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Chris Sturtivant
Cc: Tony Ernst
Cc: Guillaume Thouvenin
Cc: David Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-12-11 01:55:41 +0800

17 Oct, 2006

1 commit

a649fd927 [PATCH] invalidate: remove_mapping() fix ... Browse Code »

If remove_mapping() failed to remove the page from its mapping, don't go and
mark it not uptodate! Makes kernel go dead.

(Actually, I don't think the ClearPageUptodate is needed there at all).

Says Nick Piggin:

"Right, it isn't needed because at this point the page is guaranteed
by remove_mapping to have no references (except us) and cannot pick
up any new ones because it is removed from pagecache.

We can delete it."

Signed-off-by: Andrew Morton
Acked-by: Nick Piggin
Signed-off-by: Linus Torvalds

Andrew Morton
2006-10-17 23:18:43 +0800

12 Oct, 2006

2 commits

887ed2f3a [PATCH] VM: Fix the gfp_mask in invalidate_complete_page2 ... Browse Code »

If try_to_release_page() is called with a zero gfp mask, then the
filesystem is effectively denied the possibility of sleeping while
attempting to release the page. There doesn't appear to be any valid
reason why this should be banned, given that we're not calling this from a
memory allocation context.

For this reason, change the gfp_mask argument of the call to GFP_KERNEL.

Signed-off-by: Trond Myklebust
Cc: Steve Dickson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Trond Myklebust
2006-10-12 02:14:22 +0800
8258d4a57 [PATCH] invalidate_inode_pages2_range() debug ... Browse Code »

A failure in invalidate_inode_pages2_range() can result in unpleasant things
happening in NFS (at least). Stick a WARN_ON_ONCE() in there so we can find
out if it happens, and maybe why.

(akpm: might be a -mm-only patch, we'll see..)

Cc: Chuck Lever
Cc: Trond Myklebust
Cc: Steve Dickson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-10-12 02:14:22 +0800

01 Oct, 2006

3 commits

bd4c8ce41 [PATCH] invalidate_inode_pages2(): ignore page refcounts ... Browse Code »

The recent fix to invalidate_inode_pages() (git commit 016eb4a) managed to
unfix invalidate_inode_pages2().

The problem is that various bits of code in the kernel can take transient refs
on pages: the page scanner will do this when inspecting a batch of pages, and
the lru_cache_add() batching pagevecs also hold a ref.

Net result is transient failures in invalidate_inode_pages2(). This affects
NFS directory invalidation (observed) and presumably also block-backed
direct-io (not yet reported).

Fix it by reverting invalidate_inode_pages2() back to the old version which
ignores the page refcounts.

We may come up with something more clever later, but for now we need a 2.6.18
fix for NFS.

Cc: Chuck Lever
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-10-01 15:39:33 +0800
9361401eb [PATCH] BLOCK: Make it possible to disable the block layer [try #6] ... Browse Code »

Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.

This patch does the following:

(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.

(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:

(*) Block I/O tracing.

(*) Disk partition code.

(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.

(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.

(*) MTD blockdev handling and FTL.

(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.

(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.

(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.

(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.

(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:

(*) Default blockdev file operations (to give error ENODEV on opening).

(*) Makes some /proc changes:

(*) /proc/devices does not list any blockdevs.

(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.

(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).

(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe

David Howells
2006-10-01 02:52:31 +0800
cf9a2ae8d [PATCH] BLOCK: Move functions out of buffer code [try #6] ... Browse Code »

Move some functions out of the buffering code that aren't strictly buffering
specific. This is a precursor to being able to disable the block layer.

(*) Moved some stuff out of fs/buffer.c:

(*) The file sync and general sync stuff moved to fs/sync.c.

(*) The superblock sync stuff moved to fs/super.c.

(*) do_invalidatepage() moved to mm/truncate.c.

(*) try_to_release_page() moved to mm/filemap.c.

(*) Moved some related declarations between header files:

(*) declarations for do_invalidatepage() and try_to_release_page() moved
to linux/mm.h.

(*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe

David Howells
2006-10-01 02:31:19 +0800

27 Sep, 2006

1 commit

0fd0e6b05 [PATCH] page invalidation cleanup ... Browse Code »

Clean up the invalidate code, and use a common function to safely remove
the page from pagecache.

Signed-off-by: Nick Piggin
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2006-09-27 23:26:12 +0800

09 Sep, 2006

1 commit

016eb4a0e [PATCH] invalidate_complete_page() race fix ... Browse Code »

If a CPU faults this page into pagetables after invalidate_mapping_pages()
checked page_mapped(), invalidate_complete_page() will still proceed to remove
the page from pagecache. This leaves the page-faulting process with a
detached page. If it was MAP_SHARED then file data loss will ensue.

Fix that up by checking the page's refcount after taking tree_lock.

Cc: Nick Piggin
Cc: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-09-09 01:22:50 +0800

23 Jun, 2006

1 commit

e0f23603f [PATCH] Remove semi-softlockup from invalidate_mapping_pages ... Browse Code »

If invalidate_mapping_pages is called to invalidate a very large mapping
(e.g. a very large block device) and if the only active page in that
device is near the end (or at least, at a very large index), such as, say,
the superblock of an md array, and if that page happens to be locked when
invalidate_mapping_pages is called, then

pagevec_lookup will return this page and
as it is locked, 'next' will be incremented and pagevec_lookup
will be called again. and again. and again.
while we count from 0 upto a very large number.

We should really always set 'next' to 'page->index+1' before going around
the loop again, not just if the page isn't locked.

Cc: "Steinar H. Gunderson"
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-23 22:43:07 +0800

10 Jan, 2006

1 commit

1b1dcc1b5 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem ... Browse Code »

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar

(finished the conversion)

Signed-off-by: Jes Sorensen
Signed-off-by: Ingo Molnar

Jes Sorensen
2006-01-10 07:59:24 +0800

09 Jan, 2006

1 commit

9d0243bca [PATCH] drop-pagecache ... Browse Code »

Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can. THis
operation requires root permissions.

It won't drop dirty data, so the user should run `sync' first.

Caveats:

a) Holds inode_lock for exorbitant amounts of time.

b) Needs to be taught about NUMA nodes: propagate these all the way through
so the discarding can be controlled on a per-node basis.

This is a debugging feature: useful for getting consistent results between
filesystem benchmarks. We could possibly put it under a config option, but
it's less than 300 bytes.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-01-09 12:12:40 +0800

07 Jan, 2006

1 commit

d7339071f [PATCH] reiser4: vfs: add truncate_inode_pages_range() ... Browse Code »

This patch makes truncate_inode_pages_range from truncate_inode_pages.
truncate_inode_pages became a one-liner call to truncate_inode_pages_range.

Reiser4 needs truncate_inode_pages_ranges because it tries to keep
correspondence between existences of metadata pointing to data pages and pages
to which those metadata point to. So, when metadata of certain part of file
is removed from filesystem tree, only pages of corresponding range are to be
truncated.

(Needed by the madvise(MADV_REMOVE) patch)

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hans Reiser
2006-01-07 00:33:22 +0800

24 Nov, 2005

1 commit

479ef592f [PATCH] 32bit integer overflow in invalidate_inode_pages2() ... Browse Code »

Fix a 32 bit integer overflow in invalidate_inode_pages2_range.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Drokin
2005-11-24 08:08:39 +0800

31 Oct, 2005

1 commit

aaa4059bc [PATCH] ext3: Fix unmapped buffers in transaction's lists ... Browse Code »

Fix the problem (BUG 4964) with unmapped buffers in transaction's
t_sync_data list. The problem is we need to call filesystem's own
invalidatepage() from block_write_full_page().

block_write_full_page() must call filesystem's invalidatepage(). Otherwise
following nasty race can happen:

proc 1 proc 2
------ ------
- write some new data to 'offset'
=> bh gets to the transactions data list
- starts truncate
=> i_size set to new size
- mpage_writepages()
- ext3_ordered_writepage() to 'offset'
- block_write_full_page()
- page->index > end_index+1
- block_invalidatepage()
- discard_buffer()
- clear_buffer_mapped()

- commit triggers and finds unmapped buffer - BOOM!

Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2005-10-31 09:37:17 +0800

01 May, 2005

1 commit

67be2dd1b [PATCH] DocBook: fix some descriptions ... Browse Code »

Some KernelDoc descriptions are updated to match the current code.
No code changes.

Signed-off-by: Martin Waitz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Martin Waitz
2005-05-01 23:59:26 +0800