Eric Lee / smarc-fsl-linux-kernel

18 Jan, 2020

1 commit

dae87141c ocfs2: call journal flush to mark journal as empty after journal recovery when mount ... Browse Code »

[ Upstream commit 397eac17f86f404f5ba31d8c3e39ec3124b39fd3 ]

If journal is dirty when mount, it will be replayed but jbd2 sb log tail
cannot be updated to mark a new start because journal->j_flag has
already been set with JBD2_ABORT first in journal_init_common.

When a new transaction is committed, it will be recored in block 1
first(journal->j_tail is set to 1 in journal_reset). If emergency
restart happens again before journal super block is updated
unfortunately, the new recorded trans will not be replayed in the next
mount.

The following steps describe this procedure in detail.
1. mount and touch some files
2. these transactions are committed to journal area but not checkpointed
3. emergency restart
4. mount again and its journals are replayed
5. journal super block's first s_start is 1, but its s_seq is not updated
6. touch a new file and its trans is committed but not checkpointed
7. emergency restart again
8. mount and journal is dirty, but trans committed in 6 will not be
replayed.

This exception happens easily when this lun is used by only one node.
If it is used by multi-nodes, other node will replay its journal and its
journal super block will be updated after recovery like what this patch
does.

ocfs2_recover_node->ocfs2_replay_journal.

The following jbd2 journal can be generated by touching a new file after
journal is replayed, and seq 15 is the first valid commit, but first seq
is 13 in journal super block.

logdump:
Block 0: Journal Superblock
Seq: 0 Type: 4 (JBD2_SUPERBLOCK_V2)
Blocksize: 4096 Total Blocks: 32768 First Block: 1
First Commit ID: 13 Start Log Blknum: 1
Error: 0
Feature Compat: 0
Feature Incompat: 2 block64
Feature RO compat: 0
Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
FS Share Cnt: 1 Dynamic Superblk Blknum: 0
Per Txn Block Limit Journal: 0 Data: 0

Block 1: Journal Commit Block
Seq: 14 Type: 2 (JBD2_COMMIT_BLOCK)

Block 2: Journal Descriptor
Seq: 15 Type: 1 (JBD2_DESCRIPTOR_BLOCK)
No. Blocknum Flags
0. 587 none
UUID: 00000000000000000000000000000000
1. 8257792 JBD2_FLAG_SAME_UUID
2. 619 JBD2_FLAG_SAME_UUID
3. 24772864 JBD2_FLAG_SAME_UUID
4. 8257802 JBD2_FLAG_SAME_UUID
5. 513 JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
...
Block 7: Inode
Inode: 8257802 Mode: 0640 Generation: 57157641 (0x3682809)
FS Generation: 2839773110 (0xa9437fb6)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x1) InlineData
User: 0 (root) Group: 0 (root) Size: 7
Links: 1 Clusters: 0
ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019
mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
dtime: 0x0 -- Thu Jan 1 08:00:00 1970
...
Block 9: Journal Commit Block
Seq: 15 Type: 2 (JBD2_COMMIT_BLOCK)

The following is journal recovery log when recovering the upper jbd2
journal when mount again.

syslog:
ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

Due to first commit seq 13 recorded in journal super is not consistent
with the value recorded in block 1(seq is 14), journal recovery will be
terminated before seq 15 even though it is an unbroken commit, inode
8257802 is a new file and it will be lost.

Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
Signed-off-by: Kai Li
Reviewed-by: Joseph Qi
Reviewed-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Kai Li
2020-01-18 02:49:08 +0800

19 Oct, 2019

1 commit

b918c4302 ocfs2: fix panic due to ocfs2_wq is null ... Browse Code »

mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error. ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.

Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
------------[ cut here ]------------
Oops: 0002 [#1] SMP NOPTI
CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G E
4.14.148-200.ckv.x86_64 #1
Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
task: ffff967af0520000 task.stack: ffffa5f05484000
RIP: 0010:mutex_lock+0x19/0x20
Call Trace:
flush_workqueue+0x81/0x460
ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
ocfs2_dismount_volume+0x84/0x400 [ocfs2]
ocfs2_fill_super+0xa4/0x1270 [ocfs2]
? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
mount_bdev+0x17f/0x1c0
mount_fs+0x3a/0x160

Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com
Signed-off-by: Yi Li
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yi Li
2019-10-19 18:32:32 +0800

31 May, 2019

1 commit

328970de0 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145 ... Browse Code »

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 021110 1307 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 84 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Richard Fontana
Reviewed-by: Allison Randal
Reviewed-by: Kate Stewart
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-31 02:25:18 +0800

29 Dec, 2018

1 commit

d85400af7 ocfs2: clear journal dirty flag after shutdown journal ... Browse Code »

Dirty flag of the journal should be cleared at the last stage of umount,
if do it before jbd2_journal_destroy(), then some metadata in uncommitted
transaction could be lost due to io error, but as dirty flag of journal
was already cleared, we can't find that until run a full fsck. This may
cause system panic or other corruption.

Link: http://lkml.kernel.org/r/20181121020023.3034-3-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi
Reviewed-by: Yiwen Jiang
Reviewed-by: Joseph Qi
Cc: Jun Piao
Cc: Changwei Ge
Cc: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2018-12-29 04:11:45 +0800

04 Nov, 2018

1 commit

21158ca85 ocfs2: without quota support, avoid calling quota recovery ... Browse Code »

During one dead node's recovery by other node, quota recovery work will
be queued. We should avoid calling quota when it is not supported, so
check the quota flags.

Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071AC9FB@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: guozhonghua
Reviewed-by: Jan Kara
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Guozhonghua
2018-11-04 01:09:37 +0800

13 Jun, 2018

1 commit

6396bb221 treewide: kzalloc() -> kcalloc() ... Browse Code »

The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

kzalloc(a * b, gfp)

with:
kcalloc(a * b, gfp)

as well as handling cases of:

kzalloc(a * b * c, gfp)

with:

kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook

Kees Cook
2018-06-13 07:19:22 +0800

01 Feb, 2018

1 commit

d984187e3 ocfs2: return error when we attempt to access a dirty bh in jbd2 ... Browse Code »

We should not reuse the dirty bh in jbd2 directly due to the following
situation:

1. When removing extent rec, we will dirty the bhs of extent rec and
truncate log at the same time, and hand them over to jbd2.

2. The bhs are submitted to jbd2 area successfully.

3. The write-back thread of device help flush the bhs to disk but
encounter write error due to abnormal storage link.

4. After a while the storage link become normal. Truncate log flush
worker triggered by the next space reclaiming found the dirty bh of
truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
__ocfs2_journal_access():

ocfs2_truncate_log_worker
ocfs2_flush_truncate_log
__ocfs2_flush_truncate_log
ocfs2_replay_truncate_records
ocfs2_journal_access_di
__ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.

5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
extent rec is still in error state, and unfortunately nobody will
take care of it.

6. At last the space of extent rec was not reduced, but truncate log
flush worker have given it back to globalalloc. That will cause
duplicate cluster problem which could be identified by fsck.ocfs2.

Sadly we can hardly revert this but set fs read-only in case of ruining
atomicity and consistency of space reclaim.

Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
Signed-off-by: Jun Piao
Reviewed-by: Yiwen Jiang
Reviewed-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

piaojun
2018-02-01 09:18:35 +0800

07 Sep, 2017

1 commit

964f14a0d ocfs2: clean up some dead code ... Browse Code »

clean up some unused functions and parameters.

Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.com
Signed-off-by: Jun Piao
Reviewed-by: Alex Chen
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jun Piao
2017-09-07 08:27:24 +0800

13 Dec, 2016

1 commit

395627b07 ocfs2: use time64_t to represent orphan scan times ... Browse Code »

struct timespec is not y2038 safe. Use time64_t which is y2038 safe to
represent orphan scan times. time64_t is sufficient here as only the
seconds delta times are relevant.

Also use appropriate time functions that return time in time64_t format.
Time functions now return monotonic time instead of real time as only
delta scan times are relevant and these values are not persistent across
reboots.

The format string for the debug print is still using long as this is
only the time elapsed since the last scan and long is sufficient to
represent this value.

Link: http://lkml.kernel.org/r/1475365138-20567-1-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Deepa Dinamani
2016-12-13 10:55:06 +0800

27 Jul, 2016

1 commit

0b492f68b ocfs2: improve recovery performance ... Browse Code »

Journal replay will be run when performing recovery for a dead node. To
avoid the stale cache impact, all blocks of dead node's journal inode
were reloaded from disk. This hurts the performance. Check whether one
block is cached before reloading it can improve performance a lot. In
my test env, the time doing recovery was improved from 120s to 1s.

[akpm@linux-foundation.org: clean up the for loop p_blkno handling]
Link: http://lkml.kernel.org/r/1466155682-24656-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi
Reviewed-by: Joseph Qi
Cc: "Gang He"
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2016-07-27 07:19:19 +0800

26 Mar, 2016

1 commit

35ddf78e4 ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local ... Browse Code »

This patch fixes a deadlock, as follows:

Node 1 Node 2 Node 3
1)volume a and b are only mount vol a only mount vol b
mounted

2) start to mount b start to mount a

3) check hb of Node 3 check hb of Node 2
in vol a, qs_holds++ in vol b, qs_holds++

4) -------------------- all nodes' network down --------------------

5) progress of mount b the same situation as
failed, and then call Node 2
ocfs2_dismount_volume.
but the process is hung,
since there is a work
in ocfs2_wq cannot beo
completed. This work is
about vol a, because
ocfs2_wq is global wq.
BTW, this work which is
scheduled in ocfs2_wq is
ocfs2_orphan_scan_work,
and the context in this work
needs to take inode lock
of orphan_dir, because
lockres owner are Node 1 and
all nodes' nework has been down
at the same time, so it can't
get the inode lock.

6) Why can't this node be fenced
when network disconnected?
Because the process of
mount is hung what caused qs_holds
is not equal 0.

Because all works in the ocfs2_wq are relative to the super block.

The solution is to change the ocfs2_wq from global to local. In other
words, move it into struct ocfs2_super.

Signed-off-by: Yiwen Jiang
Reviewed-by: Joseph Qi
Cc: Xue jiufei
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

jiangyiwen
2016-03-26 07:37:42 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

15 Jan, 2016

1 commit

72865d923 ocfs2: clean up redundant NULL check before iput ... Browse Code »

Since iput will take care the NULL check itself, NULL check before
calling it is redundant. So clean them up.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2016-01-15 08:00:49 +0800

06 Nov, 2015

3 commits

5afc44e2e ocfs2: add uuid to ocfs2 thread name for problem analysis ... Browse Code »

A node can mount multiple ocfs2 volumes. And if thread names are same for
each volume/domain, it will bring inconvenience when analyzing problems
because we have to identify which volume/domain the messages belong to.

Since thread name will be printed to messages, so add volume uuid or dlm
name to thread name can benefit problem analysis.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Gang He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-11-06 11:34:48 +0800
93d911fcc ocfs2: only take lock if dio entry when recover orphans ... Browse Code »

We have no need to take inode mutex, rw and inode lock if it is not dio
entry when recover orphans. Optimize it by adding a flag
OCFS2_INODE_DIO_ORPHAN_ENTRY to ocfs2_inode_info to reduce contention.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-11-06 11:34:48 +0800
30edc43c7 ocfs2: do not include dio entry in case of orphan scan ... Browse Code »

dio entry will only do truncate in case of ORPHAN_NEED_TRUNCATE. So do
not include it when doing normal orphan scan to reduce contention.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-11-06 11:34:48 +0800

05 Sep, 2015

4 commits

7ecef14ab ocfs2: neaten do_error, ocfs2_error and ocfs2_abort ... Browse Code »

These uses sometimes do and sometimes don't have '\n' terminations. Make
the uses consistently use '\n' terminations and remove the newline from
the functions.

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2015-09-05 07:54:41 +0800
ad6948212 ocfs2: fix race between crashed dio and rm ... Browse Code »

There is a race case between crashed dio and rm, which will lead to
OCFS2_VALID_FL not set read-only.

N1 N2
------------------------------------------------------------------------
dd with direct flag
rm file
crashed with an dio entry left
in orphan dir
clear OCFS2_VALID_FL in
ocfs2_remove_inode
recover N1 and read the corrupted inode,
and set filesystem read-only

So we skip the inode deletion this time and wait for dio entry recovered
first.

Signed-off-by: Joseph Qi
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-09-05 07:54:41 +0800
acf8fdbe6 ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access ... Browse Code »

When storage network is unstable, it may trigger the BUG in
__ocfs2_journal_access because of buffer not uptodate. We can retry the
write in this case or return error instead of BUG.

Signed-off-by: Joseph Qi
Reported-by: Zhangguanghui
Tested-by: Zhangguanghui
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-09-05 07:54:41 +0800
512f62acb ocfs2: fix race between dio and recover orphan ... Browse Code »

During direct io the inode will be added to orphan first and then
deleted from orphan. There is a race window that the orphan entry will
be deleted twice and thus trigger the BUG when validating
OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.

ocfs2_direct_IO_write
...
ocfs2_add_inode_to_orphan
>>>>>>>> race window.
1) another node may rm the file and then down, this node
take care of orphan recovery and clear flag
OCFS2_DIO_ORPHANED_FL.
2) since rw lock is unlocked, it may race with another
orphan recovery and append dio.
ocfs2_del_inode_from_orphan

So take inode mutex lock when recovering orphans and make rw unlock at the
end of aio write in case of append dio.

Signed-off-by: Joseph Qi
Reported-by: Yiwen Jiang
Cc: Weiwei Wang
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-09-05 07:54:41 +0800

25 Jun, 2015

4 commits

b519ea6d9 ocfs2: mark local functions as static ... Browse Code »

Some functions are only used locally, so mark them as static.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-06-25 08:49:40 +0800
74e364ad1 ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger() ... Browse Code »

ocfs2_abort_trigger() use bh->b_assoc_map to get sb. But there's no
function to set bh->b_assoc_map in ocfs2, it will trigger NULL pointer
dereference while calling this function. We can get sb from
bh->b_bdev->bd_super instead of b_assoc_map.

[akpm@linux-foundation.org: update comment, per Joseph]
Signed-off-by: joyce.xue
Cc: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xue jiufei
2015-06-25 08:49:39 +0800
e272e7f0f ocfs2: do not BUG if jbd2_journal_dirty_metadata fails ... Browse Code »

jbd2_journal_dirty_metadata may fail. Currently it cannot take care of
non zero return value and just BUG in ocfs2_journal_dirty. This patch is
aborting the handle and journal instead of BUG.

Signed-off-by: Joseph Qi
Cc: joyce.xue
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-06-25 08:49:39 +0800
cf1776a9e ocfs2: fix a tiny race when truncate dio orohaned entry ... Browse Code »

Once dio crashed it will leave an entry in orphan dir. And orphan scan
will take care of the clean up. There is a tiny race case that the same
entry will be truncated twice and then trigger the BUG in
ocfs2_del_inode_from_orphan.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-06-25 08:49:39 +0800

17 Feb, 2015

2 commits

4813962be ocfs2: wait for orphan recovery first once append O_DIRECT write crash ... Browse Code »

If one node has crashed with orphan entry leftover, another node which do
append O_DIRECT write to the same file will override the
i_dio_orphaned_slot. Then the old entry won't be cleaned forever. If
this case happens, we let it wait for orphan recovery first.

Signed-off-by: Joseph Qi
Cc: Weiwei Wang
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Mark Fasheh
Cc: Xuejiufei
Cc: alex chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-02-17 09:56:05 +0800
ed460cffc ocfs2: add orphan recovery types in ocfs2_recover_orphans ... Browse Code »

Define two orphan recovery types, which indicates if need truncate file or
not.

Signed-off-by: Joseph Qi
Cc: Weiwei Wang
Cc: Junxiao Bi
Cc: Joel Becker
Cc: Mark Fasheh
Cc: Xuejiufei
Cc: alex chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-02-17 09:56:04 +0800

11 Feb, 2015

1 commit

9d6008c75 ocfs2: remove unreachable code in __ocfs2_recovery_thread() ... Browse Code »

Signed-off-by: Daeseok Youn
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daeseok Youn
2015-02-11 06:30:29 +0800

01 Nov, 2014

1 commit

ac7576f4b vfs: make first argument of dir_context.actor typed ... Browse Code »

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2014-11-01 05:48:54 +0800

05 Jun, 2014

1 commit

55b465b66 ocfs2: limit printk when journal is aborted ... Browse Code »

Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
ocfs2_commit_thread. Then it will get into a loop with mass logs. This
will meaninglessly consume a larger number of resource and may lead to
the system hanging. So limit printk in this case.

[akpm@linux-foundation.org: document the msleep]
Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2014-06-05 07:53:54 +0800

04 Apr, 2014

1 commit

7bf619c14 ocfs2: remove OCFS2_INODE_SKIP_DELETE flag ... Browse Code »

The flag was never set, delete it.

Signed-off-by: Jan Kara
Reviewed-by: Mark Fasheh
Reviewed-by: Srinivas Eeda
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2014-04-04 07:20:54 +0800

12 Sep, 2013

2 commits

f17c20dd2 ocfs2: use i_size_read() to access i_size ... Browse Code »

Though ocfs2 uses inode->i_mutex to protect i_size, there are both
i_size_read/write() and direct accesses. Clean up all direct access to
eliminate confusion.

Signed-off-by: Junxiao Bi
Cc: Jie Liu
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2013-09-12 06:56:30 +0800
2b1e55c38 ocfs2: lighten up allocate transaction ... Browse Code »

The issue scenario is as following:

When fallocating a very large disk space for a small file,
__ocfs2_extend_allocation attempts to get a very large transaction. For
some journal sizes, there may be not enough room for this transaction,
and the fallocate will fail.

The patch below extends & restarts the transaction as necessary while
allocating space, and should work with even the smallest journal. This
patch refers ext4 resize.

Test:
# mkfs.ocfs2 -b 4K -C 32K -T datafiles /dev/sdc
...(jounral size is 32M)
# mount.ocfs2 /dev/sdc /mnt/ocfs2/
# touch /mnt/ocfs2/1.log
# fallocate -o 0 -l 400G /mnt/ocfs2/1.log
fallocate: /mnt/ocfs2/1.log: fallocate failed: Cannot allocate memory
# tail -f /var/log/messages
[ 7372.278591] JBD: fallocate wants too many credits (2051 > 2048)
[ 7372.278597] (fallocate,6438,0):__ocfs2_extend_allocation:709 ERROR: status = -12
[ 7372.278603] (fallocate,6438,0):ocfs2_allocate_unwritten_extents:1504 ERROR: status = -12
[ 7372.278607] (fallocate,6438,0):__ocfs2_change_file_space:1955 ERROR: status = -12
^C
With this patch, the test works well.

Signed-off-by: Younger Liu
Cc: Jie Liu
Cc: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Younger Liu
2013-09-12 06:56:28 +0800

29 Jun, 2013

1 commit

3704412bd [readdir] convert ocfs2 ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:57:02 +0800

22 Feb, 2013

1 commit

d787ab097 ocfs2: remove kfree() redundant null checks ... Browse Code »

smatch analysis indicates a number of redundant NULL checks before
calling kfree(), eg:

fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info:
redundant null check on *tl_copy calling kfree()

fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info:
redundant null check on pages calling kfree()

etc....

[akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()]
Signed-off-by: Tim Gardner
Cc: Mark Fasheh
Acked-by: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tim Gardner
2013-02-22 09:22:19 +0800

31 Jul, 2012

1 commit

fef6925cd ocfs2: Convert to new freezing mechanism ... Browse Code »

Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze
protection. We also protect several ioctl entry points which were missing the
protection. Finally, we add freeze protection to the journaling mechanism so
that iput() of unlinked inode cannot modify a frozen filesystem.

CC: Mark Fasheh
CC: Joel Becker
CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 13:45:49 +0800

25 Jul, 2011

2 commits

a035bff6b ocfs2: Add comment about orphan scanning ... Browse Code »

Add a comment that explains the reason as to why orphan scan scans all the slots.

Signed-off-by: Sunil Mushran

Sunil Mushran
2011-07-25 01:35:54 +0800
619c200de ocfs2: Clean up messages in the fs ... Browse Code »

Convert useful messages from ML_NOTICE to KERN_NOTICE to improve readability.

Signed-off-by: Sunil Mushran

Sunil Mushran
2011-07-25 01:34:54 +0800

14 May, 2011

1 commit

10b3dd761 ocfs2: Skip mount recovery for hard-ro mounts ... Browse Code »

Patch skips mount recovery for hard-ro mounts which otherwise leads to an oops.

Signed-off-by: Sunil Mushran
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Sunil Mushran
2011-05-14 02:27:14 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings ... Browse Code »

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

24 Feb, 2011

1 commit

b41079504 ocfs2: Remove masklog ML_JOURNAL. ... Browse Code »

Remove mlog(0) from fs/ocfs2/journal.c and the masklog JOURNAL.

Signed-off-by: Tao Ma

Tao Ma
2011-02-24 14:15:35 +0800