26 Mar, 2016

2 commits

  • In the current implementation of unaligned aio+dio, the lock
    ordering behaves as follows:

    in user process context:
    -> call io_submit()
       -> get i_mutex
             <== window1
          -> get ip_unaligned_aio
             -> submit direct io to block device
       -> release i_mutex
    -> io_submit() return

    in dio work queue context (the work queue is created in __blockdev_direct_IO):
    -> release ip_unaligned_aio
          <== window2
       -> get i_mutex
          -> clear unwritten flag & change i_size
       -> release i_mutex

    There is a limit on the number of dio work queue threads, 256 by
    default. If all 256 threads are in the 'window2' stage above while a
    user process is in the 'window1' stage, the system deadlocks: the
    user process holds i_mutex while waiting for the ip_unaligned_aio
    lock, a direct bio holds the ip_unaligned_aio mutex while waiting
    for a dio work queue thread to be scheduled, and all the dio work
    queue threads are waiting for the i_mutex lock in 'window2'.

    This case only happens in tests that submit a large number (more
    than 256) of aios in a single io_submit() call.

    My design is to remove the ip_unaligned_aio lock and make the
    unaligned aio a sync io instead. Like the ip_unaligned_aio lock did,
    this serializes the unaligned aio+dio.

    [akpm@linux-foundation.org: remove OCFS2_IOCB_UNALIGNED_IO, per Junxiao Bi]
    Signed-off-by: Ryan Ding
    Reviewed-by: Junxiao Bi
    Cc: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryan Ding
     
  • Patchset: fix ocfs2 direct io code patch to support sparse file and data
    ordering semantics

    The idea is to use buffered io (more precisely, the interfaces
    ocfs2_write_begin_nolock & ocfs2_write_end_nolock) to do the zeroing
    work beyond block size, and to clear the UNWRITTEN flag only after
    the direct io data has been written to disk, which prevents data
    corruption if the system crashes during a direct write.

    We also achieve much better performance, e.g. dd direct write of a
    new file with 4KB block size: before this patchset:
    2.5 MB/s
    after this patchset:
    66.4 MB/s

    This patch (of 8):

    This is to support direct io in ocfs2_write_begin_nolock &
    ocfs2_write_end_nolock.

    Remove the unused args filp & flags, and add a new arg 'type'. The
    type is one of buffer/direct/mmap, indicating the 3 ways a write can
    be performed. The buffer and mmap types are implemented here; the
    direct type will be implemented later.

    Signed-off-by: Ryan Ding
    Reviewed-by: Junxiao Bi
    Cc: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryan Ding
     

25 Jun, 2015

1 commit

  • In ocfs2 direct read/write, the OCFS2_IOCB_SEM lock type was used to
    protect the inode->i_alloc_sem rw semaphore in earlier kernel
    versions. In the latest kernel, however, inode->i_alloc_sem is not
    used at all, so the OCFS2_IOCB_SEM lock type can be removed.

    Signed-off-by: Weiwei Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Reviewed-by: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WeiWei Wang
     

26 Mar, 2015

1 commit


04 Apr, 2014

1 commit

  • There is a problem that waitqueue_active() may check stale data and
    thus miss a wakeup of threads waiting on ip_unaligned_aio.

    The only valid values of ip_unaligned_aio are 0 and 1, so we can
    change it to a mutex, avoiding the above problem. Another benefit is
    that a mutex, which works as a FIFO, is fairer than wake_up_all().

    Signed-off-by: Wengang Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wengang Wang
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

28 Jul, 2011

1 commit

  • Fix a corruption that can happen when we have (two or more)
    outstanding aios to an overlapping unaligned region. Ext4
    (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to
    fix similar issues.

    In our case what happens is that we can have an outstanding aio on a
    region, and if a write comes in with some bytes overlapping the
    original aio, we may decide to read that region into a page before
    continuing (typically because of a buffered-io fallback). Since we
    have no ordering guarantees with the aio, we can read stale or bad
    data into the page and then write it back out.

    If the i/o is page and block aligned, then we avoid this issue as there
    won't be any need to read data from disk.

    I took the same approach as Eric in the ext4 patch and introduced some
    serialization of unaligned async direct i/o. I don't expect this to have an
    effect on the most common cases of AIO. Unaligned aio will be slower
    though, but that's far more acceptable than data corruption.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Mark Fasheh
     

31 Mar, 2011

1 commit


10 Dec, 2010

1 commit

  • Because the newly-introduced 'coherency=full' makes O_DIRECT writes
    also take the EX rw_lock the way buffered writes do (rw_level == 1),
    the use of 'level' in ocfs2_dio_end_io() got mixed up, which caused
    i_alloc_sem to not be up_read'd correctly.

    This patch teaches ocfs2_dio_end_io() to handle all the locking
    correctly by explicitly introducing a new bit for i_alloc_sem in the
    iocb's private data, just as we did for rw_lock.

    Signed-off-by: Tristan Ye
    Signed-off-by: Joel Becker

    Tristan Ye
     

26 Oct, 2010

1 commit

  • __block_write_begin and block_prepare_write are identical except for slightly
    different calling conventions. Convert all callers to the __block_write_begin
    calling conventions and drop block_prepare_write.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

12 Aug, 2010

1 commit


23 Sep, 2009

1 commit

  • This patch adds CoW support for a refcounted record.

    The whole process is:
    1. Calculate how many clusters we need to CoW and where to start.
    Extents that are not completely encompassed by the write will
    be broken on 1MB boundaries.
    2. Do CoW for the clusters with the help of page cache.
    3. Change the b-tree structure with the new allocated clusters.

    Signed-off-by: Tao Ma

    Tao Ma
     

17 Oct, 2007

1 commit

  • Plug ocfs2 into the ->write_begin and ->write_end aops.

    A bunch of custom code is now gone - the iovec iteration stuff during write
    and the ocfs2 splice write actor.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

13 Oct, 2007

2 commits

  • This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
    inode data.

    For the most part, the changes to the core write code can be relied on to do
    the heavy lifting. Any code calling ocfs2_write_begin (including shared
    writeable mmap) can count on it doing the right thing with respect to
    growing inline data to an extent tree.

    Size-reducing truncates, including UNRESVP, can simply zero the
    portion of the inode block being removed. Size-increasing truncates,
    including RESVP, have to be a little bit smarter and grow the inode
    to an extent tree if necessary.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
     
  • We'll want to reuse most of this when pushing inline data back out
    to an extent. Keeping this part as a separate patch helps keep the
    upcoming changes for write support uncluttered.

    The core portion of ocfs2_zero_cluster_pages() responsible for
    making sure a page is mapped and properly dirtied is abstracted out
    into its own function, ocfs2_map_and_dirty_page(). Actual
    functionality doesn't change, though zeroing becomes optional.

    We also turn part of ocfs2_free_write_ctxt() into a common function for
    unlocking and freeing a page array. This operation is very common (and
    uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
    to keep the code in one place.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
     

11 Jul, 2007

2 commits

  • Implement cluster consistent shared writeable mappings using the
    ->page_mkwrite() callback.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Use some ideas from the new-aops patch series and turn
    ocfs2_buffered_write_cluster() into a 2 stage operation with the caller
    copying data in between. The code now understands multiple cluster writes as
    a result of having to deal with a full page write for greater than 4k pages.

    This sets us up to easily call into the write path during ->page_mkwrite().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

27 Apr, 2007

4 commits


02 Dec, 2006

1 commit


18 May, 2006

1 commit

  • We need to take a data lock around extends to protect the pages that
    ocfs2_zero_extend is going to be pulling into the page cache. Otherwise an
    extend on one node might populate the page cache with data pages that have
    no lock coverage.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

04 Jan, 2006

1 commit