17 Jan, 2006
1 commit
-
like other routines here, to ensure buffers are correctly initialised
with respect to b_private/b_end_io. Fixes an odd interaction between
XFS and reiserfs.Signed-off-by: Nathan Scott
15 Jan, 2006
1 commit
-
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kbSigned-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Acked-by: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jan, 2006
1 commit
-
fs: Use where capable() is used.
Signed-off-by: Randy Dunlap
Acked-by: Tim Schmielau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Jan, 2006
1 commit
-
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.Modified-by: Ingo Molnar
(finished the conversion)
Signed-off-by: Jes Sorensen
Signed-off-by: Ingo Molnar
09 Jan, 2006
3 commits
-
We've had two instances recently of overflows when doing
64_bit_value = (32_bit_value << PAGE_CACHE_SHIFT)
I did a tree-wide grep of `<page_base)
Cc: Oleg Drokin
Cc: David Howells
Cc: David Woodhouse
Cc:
Cc: Christoph Hellwig
Cc: Anton Altaparmakov
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Roman Zippel
Cc:
Cc: Miklos Szeredi
Cc: Russell King
Cc: Trond Myklebust
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it.
See mm/filemap.c:
And changes the filemap_write_and_wait() and filemap_write_and_wait_range().
Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
returns error. However, even if filemap_fdatawrite() returned an
error, it may have submitted the partially data pages to the device.
(e.g. in the case of -ENOSPC)Andrew Morton writes,
If filemap_fdatawrite() returns an error, this might be due to some
I/O problem: dead disk, unplugged cable, etc. Given the generally
crappy quality of the kernel's handling of such exceptions, there's a
good chance that the filemap_fdatawait() will get stuck in D state
forever.So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO.
Trond, could you please review the nfs part? Especially I'm not sure,
nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not.Acked-by: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch changes generic_cont_expand(), in order to share the code
with fatfs.- Use vmtruncate() if ->prepare_write() returns a error.
Even if ->prepare_write() returns an error, it may already have added some
blocks. So, this truncates blocks outside of ->i_size by vmtruncate().- Add generic_cont_expand_simple().
The generic_cont_expand_simple() assumes that ->prepare_write() can handle
the block boundary. With this, we don't need to care the extra byte.And for expanding a file size by truncate(), fatfs uses the
added generic_cont_expand_simple().Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Nov, 2005
1 commit
-
Get rid of the `int unused' parameter of __find_get_block_slow().
Signed-off-by: Coywolf Qi Hunt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Oct, 2005
2 commits
-
If a filesystem passes an idiotic blocksize into bread(), __getblk_slow() will
warn and will return NULL. We have a report (from Hubert Tonneau
) of isofs_fill_super() doing this (passing in
a silly block size) against an unplugged CDROM drive.But a couple of __getblk_slow() callers forgot to check for the NULL bh, hence
oops.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix the problem (BUG 4964) with unmapped buffers in transaction's
t_sync_data list. The problem is we need to call filesystem's own
invalidatepage() from block_write_full_page().block_write_full_page() must call filesystem's invalidatepage(). Otherwise
following nasty race can happen:proc 1 proc 2
------ ------
- write some new data to 'offset'
=> bh gets to the transactions data list
- starts truncate
=> i_size set to new size
- mpage_writepages()
- ext3_ordered_writepage() to 'offset'
- block_write_full_page()
- page->index > end_index+1
- block_invalidatepage()
- discard_buffer()
- clear_buffer_mapped()- commit triggers and finds unmapped buffer - BOOM!
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Oct, 2005
1 commit
-
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Oct, 2005
2 commits
-
- ->releasepage() annotated (s/int/gfp_t), instances updated
- missing gfp_t in fs/* added
- fixed misannotation from the original sweep caught by bitwise checks:
XFS used __nocast both for gfp_t and for flags used by XFS allocator.
The latter left with unsigned int __nocast; we might want to add a
different type for those but for now let's leave them alone. That,
BTW, is a case when __nocast use had been actively confusing - it had
been used in the same code for two different and similar types, with
no way to catch misuses. Switch of gfp_t to bitwise had caught that
immediately...One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications. Left alone for now.Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds -
Beginning of gfp_t annotations:
- -Wbitwise added to CHECKFLAGS
- old __bitwise renamed to __bitwise__
- __bitwise defined to either __bitwise__ or nothing, depending on
__CHECK_ENDIAN__ being defined
- gfp_t switched from __nocast to __bitwise__
- force cast to gfp_t added to __GFP_... constants
- new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
the result to intSigned-off-by: Al Viro
Signed-off-by: Linus Torvalds
09 Oct, 2005
1 commit
-
- added typedef unsigned int __nocast gfp_t;
- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
11 Sep, 2005
1 commit
-
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code. It does the following
things:- consolidates and enhances the spinlock/rwlock debugging code
- simplifies the asm/spinlock.h files
- encapsulates the raw spinlock type and moves generic spinlock
features (such as ->break_lock) into the generic code.- cleans up the spinlock code hierarchy to get rid of the spaghetti.
Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c. (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)Also, i've enhanced the rwlock debugging facility, it will now track
write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:include/asm-i386/spinlock_types.h | 16
include/asm-x86_64/spinlock_types.h | 16I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:SMP | UP
----------------------------|-----------------------------------
asm/spinlock_types_smp.h | linux/spinlock_types_up.h
linux/spinlock_types.h | linux/spinlock_types.h
asm/spinlock_smp.h | linux/spinlock_up.h
linux/spinlock_api_smp.h | linux/spinlock_api_up.h
linux/spinlock.h | linux/spinlock.h/*
* here's the role of the various spinlock/rwlock related include files:
*
* on SMP builds:
*
* asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
* initializers
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* asm/spinlock.h: contains the __raw_spin_*()/etc. lowlevel
* implementations, mostly inline assembly code
*
* (also included on UP-debug builds:)
*
* linux/spinlock_api_smp.h:
* contains the prototypes for the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*
* on UP builds:
*
* linux/spinlock_type_up.h:
* contains the generic, simplified UP spinlock type.
* (which is an empty structure on non-debug builds)
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* linux/spinlock_up.h:
* contains the __raw_spin_*()/etc. version of UP
* builds. (which are NOPs on non-debug, non-preempt
* builds)
*
* (included on UP-non-debug builds:)
*
* linux/spinlock_api_up.h:
* builds the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*/All SMP and UP architectures are converted by this patch.
arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers. m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.From: Grant Grundler
Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
Builds 32-bit SMP kernel (not booted or tested). I did not try to build
non-SMP kernels. That should be trivial to fix up later if necessary.I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids
some ugly nesting of linux/*.h and asm/*.h files. Those particular locks
are well tested and contained entirely inside arch specific code. I do NOT
expect any new issues to arise with them.If someone does ever need to use debug/metrics with them, then they will
need to unravel this hairball between spinlocks, atomic ops, and bit ops
that exist only because parisc has exactly one atomic instruction: LDCW
(load and clear word).From: "Luck, Tony"
ia64 fix
Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Grant Grundler
Cc: Matthew Wilcox
Signed-off-by: Hirokazu Takata
Signed-off-by: Mikael Pettersson
Signed-off-by: Benoit Boissinot
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Sep, 2005
2 commits
-
Introduce new ll_rw_block() operation SWRITE meaning that block layer should
wait for the buffer lock and write-out afterwards. Hence data in buffers at
the time of call are guaranteed to be submitted to the disk.Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Coywolf Qi Hunt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Jul, 2005
1 commit
-
Use a bit spin lock in the first buffer of the page to synchronise asynch
IO buffer completions, instead of the global page_uptodate_lock, which is
showing some scalabilty problems.Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Jun, 2005
1 commit
-
Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Jun, 2005
2 commits
-
fs/buffer.c::__block_prepare_write() has broken error recovery. It calls
the get_block() callback with "create = 1" and if that succeeds it
immediately clears buffer_new on the just allocated buffer (which has
buffer_new set).The bug is that if an error occurs and get_block() returns != 0, we break
from this loop and go into recovery code. This code has this comment:/* Error case: */
/*
* Zero out any newly allocated blocks to avoid exposing stale
* data. If BH_New is set, we know that the block was newly
* allocated in the above loop.
*/So the intent is obviously good in that it wants to clear just allocated
and hence not zeroed buffers. However the code recognises allocated
buffers by checking for buffer_new being set.Unfortunately __block_prepare_write() as discussed above already cleared
buffer_new on all allocated buffers thus no buffers will be cleared during
error recovery and old data will be leaked.The simplest way I can see to fix this is to make the current recovery code
work by _not_ clearing buffer_new after calling get_block() in
__block_prepare_write().We cannot safely allow buffer_new buffers to "leak out" of
__block_prepare_write(), thus we simply do a quick loop over the buffers
clearing buffer_new on each of them if it is set just before returning
"success" from __block_prepare_write().Signed-off-by: Anton Altaparmakov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch consolidates sys_fsync and sys_fdatasync.
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Jun, 2005
1 commit
-
try_to_free_pages accepts a third argument, order, but hasn't used it since
before 2.6.0. The following patch removes the argument and updates all the
calls to try_to_free_pages.Signed-off-by: Darren Hart
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 May, 2005
1 commit
-
If block_read_full_page() detects an error when running get_block() it will
run SetPageError(), then it will zero out the block in pagecache and will mark
the buffer_head uptodate.So at the end of readahead we end up with a non-uptodate pagecache page which
is marked PageError. But it has uptodate buffers.The pagefault code will run ClearPageError, will launch readpage a second time
and block_read_full_page() will notice the uptodate buffers and will mark the
page uptodate as well. We end up with an uptodate, !PageError page full of
zeros and the error is lost.(It seems a little odd that filemap_nopage() runs ClearPageError(). I guess
all of this adds up to meaning that for each attempted access to the page, the
pagefault handler will retry the I/O. Which is good and bad. If the app is
ignoring SIGBUS for some reason we could get a lot of back-to-back I/O
errors.)Fix it by not marking the pagecache buffer_head as uptodate if the attempt to
map that buffer to a disk block failed.Credit-to: Qu Fuping
For reporting the bug and identifying its source.
Signed-off-by: Qu Fuping
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 May, 2005
6 commits
-
This patch makes some needlessly global identifiers static.
Signed-off-by: Adrian Bunk
Acked-by: Arjan van de Ven
Acked-by: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The `last_bh' logic probably isn't worth much. In those situations where only
the front part of the page is being written out we will save some looping but
in the vastly more common case of an all-page writeout if just adds more code.Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove all those get_bh()'s and put_bh()'s by extending lock_page() to cover
the troublesome regions.(get_bh() and put_bh() happen every time whereas contention on a page's lock
in there happens basically never).Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When running
fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
page size over loopback to an image file on a tmpfs filesystem, I would
very quickly hit
BUG_ON(!buffer_async_write(bh));
in fs/buffer.c:end_buffer_async_writeIt seems that more than one request would be submitted for a given bh
at a time.What would happen is the following:
2 threads doing __mpage_writepages on the same page.
Thread 1 - lock the page first, and enter __block_write_full_page.
Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
Thread 1 - set page writeback, unlock page.
Thread 2 - lock page, wait on page writeback
Thread 1 - submit_bh on the first 2 buffers.
=> both requests complete, none of the page buffers are async_write,
end_page_writeback is called.
Thread 2 - wakes up. enters __block_write_full_page.
Thread 2 - mark_buffer_async_write on (eg.) the last buffer
Thread 1 - finds the last buffer has async_write set, submit_bh on that.
Thread 2 - submit_bh on the last buffer.
=> oops.So change __block_write_full_page to explicitly keep track of the last bh
we need to issue, so we don't touch anything after issuing the last
request.Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix a race where __block_prepare_write can leak out an in-flight read
against a bh if get_block returns an error. This can lead to the page
becoming unlocked while the buffer is locked and the read still in flight.
__mpage_writepage BUGs on this condition.BUG sighted on a 2-way Itanium2 system with 16K PAGE_SIZE running
fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
where $DIR is a new ext2 filesystem with 4K blocks that is quite
small (causing get_block to fail often with -ENOSPC).Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This makes sure that reclaimable buffer headers and reclaimable inodes
are accounted properly during the overcommit checks.Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 May, 2005
4 commits
-
Some KernelDoc descriptions are updated to match the current code.
No code changes.Signed-off-by: Martin Waitz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove PAGE_BUG - repalce it with BUG and BUG_ON.
Signed-off-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace a number of memory barriers with smp_ variants. This means we won't
take the unnecessary hit on UP machines.Signed-off-by: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In rare situations, drop_buffers() can be called for a page which has buffers,
but no ->mapping (it was truncated, but the buffers were left behind because
ext3 was still fiddling with them).But if there was an I/O error in a buffer_head, drop_buffers() will try to get
at the address_space and will oops.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Apr, 2005
2 commits
-
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.Let it rip!