17 Oct, 2006
1 commit
-
When IO error happens on metadata buffer, buffer is freed from memory and
later fsync() is called, filesystems like ext2 fail to report EIO. Wesolve the problem by introducing a pointer to associated address space into
the buffer_head. When a buffer is removed from a list of metadata buffers
associated with an address space, IO error is transferred from the buffer to
the address space, so that fsync can later report it.Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Oct, 2006
2 commits
-
A couple of flush_dcache_page()s are missing on the I/O-error paths.
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If grow_buffers() is for some reason passed a block number which wants to lie
outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then it
will accidentally truncate `index' and will then instnatiate a page at the
wrong pagecache offset. This causes __getblk_slow() to go into an infinite
loop.This can happen with corrupted disks, or with software errors elsewhere.
Detect that, and handle it.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Oct, 2006
1 commit
-
This was triggered, but not the fault of, the dirty page accounting
patches. Suitable for -stable as well, after it goes upstream.Unable to handle kernel NULL pointer dereference at virtual address 0000004c
EIP is at _spin_lock+0x12/0x66
Call Trace:
[] __set_page_dirty_buffers+0x15/0xc0
[] set_page_dirty+0x2c/0x51
[] set_page_dirty_balance+0xb/0x3b
[] __do_fault+0x1d8/0x279
[] __handle_mm_fault+0x125/0x951
[] do_page_fault+0x440/0x59f
[] error_code+0x39/0x40
[] 0x8048a33Signed-off-by: Nick Piggin
Signed-off-by: Linus Torvalds
01 Oct, 2006
1 commit
-
Move some functions out of the buffering code that aren't strictly buffering
specific. This is a precursor to being able to disable the block layer.(*) Moved some stuff out of fs/buffer.c:
(*) The file sync and general sync stuff moved to fs/sync.c.
(*) The superblock sync stuff moved to fs/super.c.
(*) do_invalidatepage() moved to mm/truncate.c.
(*) try_to_release_page() moved to mm/filemap.c.
(*) Moved some related declarations between header files:
(*) declarations for do_invalidatepage() and try_to_release_page() moved
to linux/mm.h.(*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
Signed-Off-By: David Howells
Signed-off-by: Jens Axboe
26 Sep, 2006
1 commit
-
Tracking of dirty pages in shared writeable mmap()s.
The idea is simple: write protect clean shared writeable pages, catch the
write-fault, make writeable and set dirty. On page write-back clean all the
PTE dirty bits and write protect them once again.The implementation is a tad harder, mainly because the default
backing_dev_info capabilities were too loosely maintained. Hence it is not
enough to test the backing_dev_info for cap_account_dirty.The current heuristic is as follows, a VMA is eligible when:
- its shared writeable
(vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
- it is not a 'special' mapping
(vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
- the backing_dev_info is cap_account_dirty
mapping_cap_account_dirty(vma->vm_file->f_mapping)
- f_op->mmap() didn't change the default page protectionPage from remap_pfn_range() are explicitly excluded because their COW
semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
because they don't have a backing store anyway.mprotect() is taught about the new behaviour as well. However it overrides
the last condition.Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
It can be called on any page, but is currently only implemented for mapped
pages, if the page is found the be of a VMA that accounts dirty pages it will
also wrprotect the PTE.Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
under ->private_lock. This seems to be safe, since ->private_lock is used to
serialize access to the buffers, not the page itself. This is needed because
clear_page_dirty() will call into page_mkclean() and would thereby violate
locking order.[dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
Signed-off-by: Peter Zijlstra
Cc: Hugh Dickins
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Aug, 2006
1 commit
-
We can immediately bail from invalidate_bdev() if the blockdev has no
pagecache.This solves the huge IPI storms which hald is causing on the big ia64
machines when it polls CDROM drives.Acked-by: Jes Sorensen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2006
3 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
Remove obsolete #include
remove obsolete swsusp_encrypt
arch/arm26/Kconfig typos
Documentation/IPMI typos
Kconfig: Typos in net/sched/Kconfig
v9fs: do not include linux/version.h
Documentation/DocBook/mtdnand.tmpl: typo fixes
typo fixes: specfic -> specific
typo fixes in Documentation/networking/pktgen.txt
typo fixes: occuring -> occurring
typo fixes: infomation -> information
typo fixes: disadvantadge -> disadvantage
typo fixes: aquire -> acquire
typo fixes: mecanism -> mechanism
typo fixes: bandwith -> bandwidth
fix a typo in the RTC_CLASS help text
smb is no longer maintainedManually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
-
This makes nr_dirty a per zone counter. Looping over all processors is
avoided during writeback state determination.The counter aggregation for nr_dirty had to be undone in the NFS layer since
we summed up the page counts from multiple zones. Someone more familiar with
NFS should probably review what I have done.[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk
29 Jun, 2006
1 commit
-
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.Signed-off-by: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Jun, 2006
1 commit
-
- add a proper prototype for the following global function:
- buffer_init()- make the following needlessly global function static:
- end_buffer_async_write()Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jun, 2006
1 commit
-
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.Signed-off-by: Jens Axboe
28 Mar, 2006
1 commit
-
Replace for_each_pgdat() with for_each_online_pgdat().
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Mar, 2006
6 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
Kconfig help: MTD_JEDECPROBE already supports Intel
Remove ugly debugging stuff
do_mounts.c: Minor ROOT_DEV comment cleanup
BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
BUG_ON() Conversion in mm/mempool.c
BUG_ON() Conversion in mm/memory.c
BUG_ON() Conversion in kernel/fork.c
BUG_ON() Conversion in ipc/sem.c
BUG_ON() Conversion in fs/ext2/
BUG_ON() Conversion in fs/hfs/
BUG_ON() Conversion in fs/dcache.c
BUG_ON() Conversion in fs/buffer.c
BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
BUG_ON() Conversion in md/dm-table.c
BUG_ON() Conversion in md/dm-path-selector.c
BUG_ON() Conversion in drivers/isdn
BUG_ON() Conversion in drivers/char
BUG_ON() Conversion in drivers/mtd/ -
Pass amount of disk needs to be mapped to get_block(). This way one can
modify the fs ->get_block() functions to map multiple blocks at the same time.[akpm@osdl.org: performance tweak]
[akpm@osdl.org: remove unneeded assignments]
Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
Update some old and moldy comments in and around the structure as well.The b_size increase allows us to perform larger mappings and allocations for
large I/O requests from userspace, which tie in with other changes allowing
the get_block_t() interface to map multiple blocks at once.Signed-off-by: Nathan Scott
Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The return value of this function is never used, so let's be honest and
declare it as void.Some places where invalidatepage returned 0, I have inserted comments
suggesting a BUG_ON.[akpm@osdl.org: JBD BUG fix]
[akpm@osdl.org: rework for git-nfs]
[akpm@osdl.org: don't go BUG in block_invalidate_page()]
Signed-off-by: Neil Brown
Acked-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The only user ignores the return value, and the only instanace
(block_sync_page) always returns 0...Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized awaySigned-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk
26 Mar, 2006
1 commit
-
freeze_bdev() uses a fsync_super() without sync_blockdev(). This patch
makes __fsync_super() and shares it.Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Mar, 2006
4 commits
-
Pull the guts out of do_fsync() - we can use it elsewhere.
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We need set_page_dirty() to return true if it actually transitioned the page
from a clean to dirty state. This wasn't right in a couple of places. Do a
kernel-wide audit, fix things up.This leaves open the possibility of returning a negative errno from
set_page_dirty() sometime in the future. But we don't do that at present.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Instead of using for_each_cpu(i), we can use for_each_online_cpu(i).
When a CPU goes offline (ie removed from online map), it might have a non
null bh_accounting.nr, so this patch adds a transfer of this counter to an
online CPU counter.We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we
can do this bh_accounting.Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change the kmem_cache_create calls for certain slab caches to support cpuset
memory spreading.See the previous patches, cpuset_mem_spread, for an explanation of cpuset
memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support
for memory spreading.The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab
caches, and buffer_head. This list may change over time. In particular,
other file system types that are used extensively on large NUMA systems may
want to allow for spreading their directory and inode slab cache entries.Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Mar, 2006
1 commit
-
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Acked-by: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Mar, 2006
1 commit
-
Centralize the page migration functions in anticipation of additional
tinkering. Creates a new file mm/migrate.c1. Extract buffer_migrate_page() from fs/buffer.c
2. Extract central migration code from vmscan.c
3. Extract some components from mempolicy.c
4. Export pageout() and remove_from_swap() from vmscan.c
5. Make it possible to configure NUMA systems without page migration
and non-NUMA systems with page migration.I had to so some #ifdeffing in mempolicy.c that may need a cleanup.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Mar, 2006
1 commit
-
page migration currently simply retries a couple of times if try_to_unmap()
fails without inspecting the return code.However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code
and avoid retrying the migration.migrate_page_remove_references() now needs to return a reason why the
failure occured. So switch migrate_page_remove_references to use -Exx
style error messages.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Feb, 2006
1 commit
-
The ll_rw_block() needs to get ref-count only if it submits a buffer(). This
patch avoids the needless get/put of ref-count.Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Feb, 2006
2 commits
-
The b_private field in buffer heads needs to be zero filled when the
buffers are allocated. Thanks to Nathan Scott for finding this. It was
causing problems on systems with both XFS and reiserfs.Signed-off-by: Chris Mason
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Migrate a page with buffers without requiring writeback
This introduces a new address space operation migratepage() that may be used
by a filesystem to implement its own version of page migration.A version is provided that migrates buffers attached to pages. Some
filesystems (ext2, ext3, xfs) are modified to utilize this feature.The swapper address space operation are modified so that a regular
migrate_page() will occur for anonymous pages without writeback (migrate_pages
forces every anonymous page to have a swap entry).Signed-off-by: Mike Kravetz
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jan, 2006
1 commit
-
like other routines here, to ensure buffers are correctly initialised
with respect to b_private/b_end_io. Fixes an odd interaction between
XFS and reiserfs.Signed-off-by: Nathan Scott
15 Jan, 2006
1 commit
-
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kbSigned-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Acked-by: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jan, 2006
1 commit
-
fs: Use where capable() is used.
Signed-off-by: Randy Dunlap
Acked-by: Tim Schmielau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Jan, 2006
1 commit
-
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.Modified-by: Ingo Molnar
(finished the conversion)
Signed-off-by: Jes Sorensen
Signed-off-by: Ingo Molnar
09 Jan, 2006
3 commits
-
We've had two instances recently of overflows when doing
64_bit_value = (32_bit_value << PAGE_CACHE_SHIFT)
I did a tree-wide grep of `<page_base)
Cc: Oleg Drokin
Cc: David Howells
Cc: David Woodhouse
Cc:
Cc: Christoph Hellwig
Cc: Anton Altaparmakov
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Roman Zippel
Cc:
Cc: Miklos Szeredi
Cc: Russell King
Cc: Trond Myklebust
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it.
See mm/filemap.c:
And changes the filemap_write_and_wait() and filemap_write_and_wait_range().
Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
returns error. However, even if filemap_fdatawrite() returned an
error, it may have submitted the partially data pages to the device.
(e.g. in the case of -ENOSPC)Andrew Morton writes,
If filemap_fdatawrite() returns an error, this might be due to some
I/O problem: dead disk, unplugged cable, etc. Given the generally
crappy quality of the kernel's handling of such exceptions, there's a
good chance that the filemap_fdatawait() will get stuck in D state
forever.So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO.
Trond, could you please review the nfs part? Especially I'm not sure,
nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not.Acked-by: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch changes generic_cont_expand(), in order to share the code
with fatfs.- Use vmtruncate() if ->prepare_write() returns a error.
Even if ->prepare_write() returns an error, it may already have added some
blocks. So, this truncates blocks outside of ->i_size by vmtruncate().- Add generic_cont_expand_simple().
The generic_cont_expand_simple() assumes that ->prepare_write() can handle
the block boundary. With this, we don't need to care the extra byte.And for expanding a file size by truncate(), fatfs uses the
added generic_cont_expand_simple().Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Nov, 2005
1 commit
-
Get rid of the `int unused' parameter of __find_get_block_slow().
Signed-off-by: Coywolf Qi Hunt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Oct, 2005
1 commit
-
If a filesystem passes an idiotic blocksize into bread(), __getblk_slow() will
warn and will return NULL. We have a report (from Hubert Tonneau
) of isofs_fill_super() doing this (passing in
a silly block size) against an unplugged CDROM drive.But a couple of __getblk_slow() callers forgot to check for the NULL bh, hence
oops.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds