Eric Lee / smarc-fsl-linux-kernel

22 Aug, 2011

1 commit

99b1bb61b Merge branch 'mw-3.1-jul25' of git://oss.oracle.com/git/smushran/linux-2.6 into ocfs2-fixes Browse Code »

Joel Becker
2011-08-22 12:02:57 +0800

28 Jul, 2011

2 commits

c7e25e6e0 ocfs2: Avoid livelock in ocfs2_readpage() ... Browse Code »

When someone writes to an inode, readers accessing the same inode via
ocfs2_readpage() just busyloop trying to get ip_alloc_sem because
do_generic_file_read() looks up the page again and retries ->readpage()
when previous attempt failed with AOP_TRUNCATED_PAGE. When there are enough
readers, they can occupy all CPUs and in non-preempt kernel the system is
deadlocked because writer holding ip_alloc_sem is never run to release the
semaphore. Fix the problem by making reader block on ip_alloc_sem to break
the busy loop.

Signed-off-by: Jan Kara
Signed-off-by: Joel Becker

Jan Kara
2011-07-28 17:07:19 +0800
a11f7e63c ocfs2: serialize unaligned aio ... Browse Code »
43

Fix a corruption that can happen when we have (two or more) outstanding
aio's to an overlapping unaligned region. Ext4
(e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix
similar issues.

In our case what happens is that we can have an outstanding aio on a region
and if a write comes in with some bytes overlapping the original aio we may
decide to read that region into a page before continuing (typically because
of buffered-io fallback). Since we have no ordering guarantees with the
aio, we can read stale or bad data into the page and then write it back out.

If the i/o is page and block aligned, then we avoid this issue as there
won't be any need to read data from disk.

I took the same approach as Eric in the ext4 patch and introduced some
serialization of unaligned async direct i/o. I don't expect this to have an
effect on the most common cases of AIO. Unaligned aio will be slower
though, but that's far more acceptable than data corruption.

Signed-off-by: Mark Fasheh
Signed-off-by: Joel Becker

Mark Fasheh
2011-07-28 17:07:16 +0800

25 Jul, 2011

1 commit

5cffff9e2 ocfs2: Fix ocfs2_page_mkwrite() ... Browse Code »

This patch address two shortcomings in ocfs2_page_mkwrite():
1. Makes the function return better VM_FAULT_* errors.
2. It handles a error that is triggered when a page is dropped from the mapping
due to memory pressure. This patch locks the page to prevent that.

[Patch was cleaned up by Sunil Mushran.]

Signed-off-by: Wengang Wang
Signed-off-by: Sunil Mushran

Wengang Wang
2011-07-25 01:36:54 +0800

21 Jul, 2011

3 commits

72c5052dd fs: move inode_dio_done to the end_io handler ... Browse Code »

For filesystems that delay their end_io processing we should keep our
i_dio_count until the the processing is done. Enable this by moving
the inode_dio_done call to the end_io handler if one exist. Note that
the actual move to the workqueue for ext4 and XFS is not done in
this patch yet, but left to the filesystem maintainers. At least
for XFS it's not needed yet either as XFS has an internal equivalent
to i_dio_count.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2011-07-21 08:47:50 +0800
df2d6f265 fs: always maintain i_dio_count ... Browse Code »

Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING.
This these filesystems to also protect truncate against direct I/O requests
by using common code. Right now the only non-DIO_LOCKING filesystem that
appears to do so is XFS, which uses an opencoded variant of the i_dio_count
scheme.

Behaviour doesn't change for filesystems never calling inode_dio_wait.
For ext4 behaviour changes when using the dioread_nonlock option, which
previously was missing any protection between truncate and direct I/O reads.
For ocfs2 that handcrafted i_dio_count manipulations are replaced with
the common code now enable.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2011-07-21 08:47:48 +0800
bd5fe6c5e fs: kill i_alloc_sem ... Browse Code »

i_alloc_sem is a rather special rw_semaphore. It's the last one that may
be released by a non-owner, and it's write side is always mirrored by
real exclusion. It's intended use it to wait for all pending direct I/O
requests to finish before starting a truncate.

Replace it with a hand-grown construct:

- exclusion for truncates is already guaranteed by i_mutex, so it can
simply fall way
- the reader side is replaced by an i_dio_count member in struct inode
that counts the number of pending direct I/O requests. Truncate can't
proceed as long as it's non-zero
- when i_dio_count reaches non-zero we wake up a pending truncate using
wake_up_bit on a new bit in i_flags
- new references to i_dio_count can't appear while we are waiting for
it to read zero because the direct I/O count always needs i_mutex
(or an equivalent like XFS's i_iolock) for starting a new operation.

This scheme is much simpler, and saves the space of a spinlock_t and a
struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
system).

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2011-07-21 08:47:46 +0800

29 Mar, 2011

2 commits

03e4970c1 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 ... Browse Code »

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (39 commits)
Treat writes as new when holes span across page boundaries
fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS.
ocfs2/dlm: Move kmalloc() outside the spinlock
ocfs2: Make the left masklogs compat.
ocfs2: Remove masklog ML_AIO.
ocfs2: Remove masklog ML_UPTODATE.
ocfs2: Remove masklog ML_BH_IO.
ocfs2: Remove masklog ML_JOURNAL.
ocfs2: Remove masklog ML_EXPORT.
ocfs2: Remove masklog ML_DCACHE.
ocfs2: Remove masklog ML_NAMEI.
ocfs2: Remove mlog(0) from fs/ocfs2/dir.c
ocfs2: remove NAMEI from symlink.c
ocfs2: Remove masklog ML_QUOTA.
ocfs2: Remove mlog(0) from quota_local.c.
ocfs2: Remove masklog ML_RESERVATIONS.
ocfs2: Remove masklog ML_XATTR.
ocfs2: Remove masklog ML_SUPER.
ocfs2: Remove mlog(0) from fs/ocfs2/heartbeat.c
ocfs2: Remove mlog(0) from fs/ocfs2/slot_map.c
...

Fix up trivial conflict in fs/ocfs2/super.c

Linus Torvalds
2011-03-29 04:03:31 +0800
272b62c1f Treat writes as new when holes span across page boundaries ... Browse Code »

When a hole spans across page boundaries, the next write forces
a read of the block. This could end up reading existing garbage
data from the disk in ocfs2_map_page_blocks. This leads to
non-zero holes. In order to avoid this, mark the writes as new
when the holes span across page boundaries.

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: jlbec

Goldwyn Rodrigues
2011-03-29 00:44:58 +0800

10 Mar, 2011

1 commit

7eaceacca block: remove per-queue plugging ... Browse Code »
86

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:07 +0800

07 Mar, 2011

1 commit

c1e8d35ef ocfs2: Remove EXIT from masklog. ... Browse Code »
43

mlog_exit is used to record the exit status of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

This patch just try to remove it or change it. So:
1. if all the error paths already use mlog_errno, it is just removed.
Otherwise, it will be replaced by mlog_errno.
2. if it is used to print some return value, it is replaced with
mlog(0,...).
mlog_exit_ptr is changed to mlog(0.
All those mlog(0,...) will be replaced with trace events later.

Signed-off-by: Tao Ma

Tao Ma
2011-03-07 16:43:21 +0800

22 Feb, 2011

1 commit

9558156bc ocfs2: Remove mlog(0) from fs/ocfs2/aops.c ... Browse Code »

Remove all the "mlog(0," in fs/ocfs2/aops.c.

Signed-off-by: Tao Ma

Tao Ma
2011-02-22 21:33:59 +0800

21 Feb, 2011

1 commit

ef6b689b6 ocfs2: Remove ENTRY from masklog. ... Browse Code »

ENTRY is used to record the entry of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

So for mlog_entry_void, we just remove it.
for mlog_entry(...), we replace it with mlog(0,...), and they
will be replace by trace event later.

Signed-off-by: Tao Ma

Tao Ma
2011-02-21 11:10:44 +0800

12 Jan, 2011

1 commit

498f7f505 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 ... Browse Code »

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits)
MAINTAINERS: Update Joel Becker's email address
ocfs2: Remove unused truncate function from alloc.c
ocfs2/cluster: dereferencing before checking in nst_seq_show()
ocfs2: fix build for OCFS2_FS_STATS not enabled
ocfs2/cluster: Show o2net timing statistics
ocfs2/cluster: Track process message timing stats for each socket
ocfs2/cluster: Track send message timing stats for each socket
ocfs2/cluster: Use ktime instead of timeval in struct o2net_sock_container
ocfs2/cluster: Replace timeval with ktime in struct o2net_send_tracking
ocfs2: Add DEBUG_FS dependency
ocfs2/dlm: Hard code the values for enums
ocfs2/dlm: Minor cleanup
ocfs2/dlm: Cleanup dlmdebug.c
ocfs2: Release buffer_head in case of error in ocfs2_double_lock.
ocfs2/cluster: Pin the local node when o2hb thread starts
ocfs2/cluster: Show pin state for each o2hb region
ocfs2/cluster: Pin/unpin o2hb regions
ocfs2/cluster: Remove dropped region from o2hb quorum region bitmap
ocfs2/cluster: Pin the remote node item in configfs
ocfs2/dlm: make existing convertion precedent over new lock
...

Linus Torvalds
2011-01-12 03:28:34 +0800

16 Dec, 2010

1 commit

50308d813 ocfs2: Try to free truncate log when meeting ENOSPC in write. ... Browse Code »

Recently, one of our colleagues meet with a problem that if we
write/delete a 32mb files repeatly, we will get an ENOSPC in
the end. And the corresponding bug is 1288.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288

The real problem is that although we have freed the clusters,
they are in truncate log and they will be summed up so that
we can free them once in a whole.

So this patch just try to resolve it. In case we see -ENOSPC
in ocfs2_write_begin_no_lock, we will check whether the truncate
log has enough clusters for our need, if yes, we will try to
flush the truncate log at that point and try again. This method
is inspired by Mark Fasheh . Thanks.

Cc: Mark Fasheh
Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-12-16 16:46:02 +0800

10 Dec, 2010

1 commit

39c99f12f Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. ... Browse Code »

Due to newly-introduced 'coherency=full' O_DIRECT writes also takes the EX
rw_lock like buffered writes did(rw_level == 1), it turns out messing the
usage of 'level' in ocfs2_dio_end_io() up, which caused i_alloc_sem being
failed to get up_read'd correctly.

This patch tries to teach ocfs2_dio_end_io to understand well on all locking
stuffs by explicitly introducing a new bit for i_alloc_sem in iocb's private
data, just like what we did for rw_lock.

Signed-off-by: Tristan Ye
Signed-off-by: Joel Becker

Tristan Ye
2010-12-10 07:36:48 +0800

26 Oct, 2010

1 commit

ebdec241d fs: kill block_prepare_write ... Browse Code »

__block_write_begin and block_prepare_write are identical except for slightly
different calling conventions. Convert all callers to the __block_write_begin
calling conventions and drop block_prepare_write.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:18:20 +0800

10 Sep, 2010

2 commits

729963a1f Merge branch 'cow_readahead' of git://oss.oracle.com/git/tma/linux-2.6 into merge-2 Browse Code »

Joel Becker
2010-09-10 23:41:04 +0800
83fd9c7f6 Reorganize data elements to reduce struct sizes ... Browse Code »

Thanks for the comments. I have incorportated them all.

CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled.
Statistics now look like -
ocfs2_write_ctxt: 2144 - 2136 = 8
ocfs2_inode_info: 1960 - 1848 = 112
ocfs2_journal: 168 - 160 = 8
ocfs2_lock_res: 336 - 304 = 32
ocfs2_refcount_tree: 512 - 472 = 40

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Joel Becker

Goldwyn Rodrigues
2010-09-10 23:39:27 +0800

12 Aug, 2010

2 commits

155027121 ocfs2: Add struct file to ocfs2_refcount_cow. ... Browse Code »

Add a new parameter 'struct file *' to ocfs2_refcount_cow
so that we can add readahead support later.

Signed-off-by: Tao Ma

Tao Ma
2010-08-12 10:39:57 +0800
0378da0fd ocfs2: pass struct file* to ocfs2_write_begin_nolock. ... Browse Code »

struct file * has file_ra_state to store the readahead state
and data. So pass this to ocfs2_write_begin_nolock so that
it can be used in ocfs2_refcount_cow.

Signed-off-by: Tao Ma

Tao Ma
2010-08-12 10:39:48 +0800

10 Aug, 2010

1 commit

eafdc7d19 sort out blockdev_direct_IO variants ... Browse Code »

Move the call to vmtruncate to get rid of accessive blocks to the callers
in prepearation of the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
paramters is shorted than the name suffix.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-08-10 04:47:29 +0800

27 Jul, 2010

1 commit

40e2e9731 direct-io: move aio_complete into ->end_io ... Browse Code »

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited. That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Alex Elder

Christoph Hellwig
2010-07-27 05:09:02 +0800

13 Jul, 2010

1 commit

693c241a5 ocfs2: No need to zero pages past i_size. ... Browse Code »

When ocfs2 fills a hole, it does so by allocating clusters. When a
cluster is larger than the write, ocfs2 must zero the portions of the
cluster outside of the write. If the clustersize is smaller than a
pagecache page, this is handled by the normal pagecache mechanisms, but
when the clustersize is larger than a page, ocfs2's write code will zero
the pages adjacent to the write. This makes sure the entire cluster is
zeroed correctly.

Currently ocfs2 behaves exactly the same when writing past i_size.
However, this means ocfs2 is writing zeroed pages for portions of a new
cluster that are beyond i_size. The page writeback code isn't expecting
this. It treats all pages past the one containing i_size as left behind
due to a previous truncate operation.

Thankfully, ocfs2 calculates the number of pages it will be working on
up front. The rest of the write code merely honors the original
calculation. We can simply trim the number of pages to only cover the
actual file data.

Signed-off-by: Joel Becker
Cc: stable@kernel.org

Joel Becker
2010-07-13 04:55:27 +0800

09 Jul, 2010

2 commits

5693486ba ocfs2: Zero the tail cluster when extending past i_size. ... Browse Code »

ocfs2's allocation unit is the cluster. This can be larger than a block
or even a memory page. This means that a file may have many blocks in
its last extent that are beyond the block containing i_size. There also
may be more unwritten extents after that.

When ocfs2 grows a file, it zeros the entire cluster in order to ensure
future i_size growth will see cleared blocks. Unfortunately,
block_write_full_page() drops the pages past i_size. This means that
ocfs2 is actually leaking garbage data into the tail end of that last
cluster. This is a bug.

We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect
when a write or truncate is past i_size. They will use
ocfs2_zero_extend() to ensure the data is properly zeroed.

Older versions of ocfs2_zero_extend() simply zeroed every block between
i_size and the zeroing position. This presumes three things:

1) There is allocation for all of these blocks.
2) The extents are not unwritten.
3) The extents are not refcounted.

(1) and (2) hold true for non-sparse filesystems, which used to be the
only users of ocfs2_zero_extend(). (3) is another bug.

Since we're now using ocfs2_zero_extend() for sparse filesystems as
well, we teach ocfs2_zero_extend() to check every extent between
i_size and the zeroing position. If the extent is unwritten, it is
ignored. If it is refcounted, it is CoWed. Then it is zeroed.

Signed-off-by: Joel Becker
Cc: stable@kernel.org

Joel Becker
2010-07-09 04:25:35 +0800
a4bfb4cf1 ocfs2: When zero extending, do it by page. ... Browse Code »

ocfs2_zero_extend() does its zeroing block by block, but it calls a
function named ocfs2_write_zero_page(). Let's have
ocfs2_write_zero_page() handle the page level. From
ocfs2_zero_extend()'s perspective, it is now page-at-a-time.

Signed-off-by: Joel Becker
Cc: stable@kernel.org

Joel Becker
2010-07-09 04:24:49 +0800

06 May, 2010

1 commit

4fe370afa ocfs2: use allocation reservations during file write ... Browse Code »

Add a per-inode reservations structure and pass it through to the
reservations code.

Signed-off-by: Mark Fasheh

Mark Fasheh
2010-05-06 09:17:30 +0800

06 Mar, 2010

1 commit

e213e26ab Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
quota: stop using QUOTA_OK / NO_QUOTA
dquot: cleanup dquot initialize routine
dquot: move dquot initialization responsibility into the filesystem
dquot: cleanup dquot drop routine
dquot: move dquot drop responsibility into the filesystem
dquot: cleanup dquot transfer routine
dquot: move dquot transfer responsibility into the filesystem
dquot: cleanup inode allocation / freeing routines
dquot: cleanup space allocation / freeing routines
ext3: add writepage sanity checks
ext3: Truncate allocated blocks if direct IO write fails to update i_size
quota: Properly invalidate caches even for filesystems with blocksize < pagesize
quota: generalize quota transfer interface
quota: sb_quota state flags cleanup
jbd: Delay discarding buffers in journal_unmap_buffer
ext3: quota_write cross block boundary behaviour
quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
quota: split out compat_sys_quotactl support from quota.c
quota: split out netlink notification support from quota.c
quota: remove invalid optimization from quota_sync_all
...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

Linus Torvalds
2010-03-06 05:20:53 +0800

05 Mar, 2010

1 commit

5dd4056db dquot: cleanup space allocation / freeing routines ... Browse Code »

Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not. Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara

Christoph Hellwig
2010-03-05 07:20:28 +0800

27 Feb, 2010

1 commit

cbaee472f ocfs2: Only bug out in direct io write for reflinked extent. ... Browse Code »

In ocfs2_direct_IO_get_blocks, we only need to bug out
in case of we are going to write a recounted extent rec.

What a silly bug introduced by me!

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker
Cc: stable@kernel.org

Tao Ma
2010-02-27 07:41:19 +0800

26 Jan, 2010

1 commit

2bd632165 ocfs2/trivial: Remove trailing whitespaces ... Browse Code »

Patch removes trailing whitespaces.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2010-01-26 11:20:51 +0800

16 Dec, 2009

1 commit

5fe878ae7 direct-io: cleanup blockdev_direct_IO locking ... Browse Code »

Currently the locking in blockdev_direct_IO is a mess, we have three
different locking types and very confusing checks for some of them. The
most complicated one is DIO_OWN_LOCKING for reads, which happens to not
actually be used.

This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read
case is unused anyway, and the write side is almost identical to
DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the
create argument for the get_blocks callback to zero, but we can easily
move that to the actual get_blocks callbacks. There are four users of the
DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
fine with the new version, ocfs2 only errors out if create were ever set,
and we can remove this dead code now, the block device code only ever uses
create for an error message if we are fully beyond the device which can
never happen, and last but not least XFS will need the new behavour for
writes.

Now we can replace the lock_type variable with a flags one, where no flag
means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
flag. Separate out the check for not allowing to fill holes into a
separate flag, although for now both flags always get set at the same
time.

Also revamp the documentation of the locking scheme to actually make
sense.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig
Cc: Dave Chinner
Cc: Badari Pulavarty
Cc: Jeff Moyer
Cc: Jens Axboe
Cc: Zach Brown
Cc: Al Viro
Cc: Alex Elder
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2009-12-16 23:20:13 +0800

24 Sep, 2009

1 commit

db1682636 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 ... Browse Code »

* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...

Linus Torvalds
2009-09-24 22:53:22 +0800

23 Sep, 2009

4 commits

b80474b43 ocfs2: Use buffer IO if we are appending a file. ... Browse Code »

In ocfs2_file_aio_write, we will prevent direct io if
we find that we are appending(changing i_size) and call
generic_file_aio_write_nolock. But actually O_DIRECT flag
is there and this function will call generic_file_direct_write
eventually which will update i_size and leave di->i_size
alone. The bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1173.

So this patch let ocfs2_direct_IO returns 0 directly if we
are appending so that buffered write will be called and
di->i_size get updated successfully. And this is also
what we want in ocfs2_file_aio_write.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2009-09-23 16:54:49 +0800
37f8a2bfa ocfs2: CoW a reflinked cluster when it is truncated. ... Browse Code »

When we truncate a file to a specific size which resides in a reflinked
cluster, we need to CoW it since ocfs2_zero_range_for_truncate will
zero the space after the size(just another type of write).

So we add a "max_cpos" in ocfs2_refcount_cow so that it will stop when
it hit the max cluster offset.

Signed-off-by: Tao Ma

Tao Ma
2009-09-23 11:09:38 +0800
293b2f70b ocfs2: Integrate CoW in file write. ... Browse Code »

When we use mmap, we CoW the refcountd clusters in
ocfs2_write_begin_nolock. While for normal file
io(including directio), we do CoW in
ocfs2_prepare_inode_for_write.

Signed-off-by: Tao Ma

Tao Ma
2009-09-23 11:09:37 +0800
6f70fa519 ocfs2: Add CoW support. ... Browse Code »

This patch try CoW support for a refcounted record.

the whole process will be:
1. Calculate how many clusters we need to CoW and where we start.
Extents that are not completely encompassed by the write will
be broken on 1MB boundaries.
2. Do CoW for the clusters with the help of page cache.
3. Change the b-tree structure with the new allocated clusters.

Signed-off-by: Tao Ma

Tao Ma
2009-09-23 11:09:36 +0800

16 Sep, 2009

1 commit

aa261f549 HWPOISON: Enable .remove_error_page for migration aware file systems ... Browse Code »

Enable removing of corrupted pages through truncation
for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
These should cover most server needs.

I chose the set of migration aware file systems for this
for now, assuming they have been especially audited.
But in general it should be safe for all file systems
on the data area that support read/write and truncate.

Caveat: the hardware error handler does not take i_mutex
for now before calling the truncate function. Is that ok?

Cc: tytso@mit.edu
Cc: hch@infradead.org
Cc: mfasheh@suse.com
Cc: aia21@cantab.net
Cc: hugh.dickins@tiscali.co.uk
Cc: swhiteho@redhat.com
Signed-off-by: Andi Kleen

Andi Kleen
2009-09-16 17:50:16 +0800

05 Sep, 2009

2 commits

5e404e9ed ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree(). ... Browse Code »

With this commit, extent tree operations are divorced from inodes and
rely on ocfs2_caching_info. Phew!

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:08:13 +0800
0cf2f7632 ocfs2: Pass struct ocfs2_caching_info to the journal functions. ... Browse Code »

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:50 +0800