Eric Lee / smarc-fsl-linux-kernel

13 Dec, 2016

1 commit

c62c38f6b ocfs2: replace CURRENT_TIME macro ... Browse Code »

CURRENT_TIME is not y2038 safe.

Use y2038 safe ktime_get_real_seconds() here for timestamps. struct
heartbeat_block's hb_seq and deletetion time are already 64 bits wide
and accommodate times beyond y2038.

Also use y2038 safe ktime_get_real_ts64() for on disk inode timestamps.
These are also wide enough to accommodate time64_t.

Link: http://lkml.kernel.org/r/1475365298-29236-1-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Deepa Dinamani
2016-12-13 10:55:06 +0800

11 Oct, 2016

2 commits

101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
2016-10-11 11:16:43 +0800
3873691e5 Merge remote-tracking branch 'ovl/rename2' into for-linus Browse Code »

Al Viro
2016-10-11 11:02:51 +0800

08 Oct, 2016

1 commit

fd50ecadd vfs: Remove {get,set,remove}xattr inode operations ... Browse Code »

These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
2016-10-08 09:48:36 +0800

28 Sep, 2016

1 commit

078cd8279 fs: Replace CURRENT_TIME with current_time() for inode timestamps ... Browse Code »

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Acked-by: Felipe Balbi
Acked-by: Steven Whitehouse
Acked-by: Ryusuke Konishi
Acked-by: David Sterba
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:21 +0800

27 Sep, 2016

2 commits

2773bf00a fs: rename "rename2" i_op to "rename" ... Browse Code »

Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-09-27 17:03:58 +0800
1cd66c93b fs: make remaining filesystems use .rename2 ... Browse Code »

This is trivial to do:

- add flags argument to foo_rename()
- check if flags is zero
- assign foo_rename() to .rename2 instead of .rename

This doesn't mean it's impossible to support RENAME_NOREPLACE for these
filesystems, but it is not trivial, like for local filesystems.
RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
for a file to be created on one host while it is overwritten by rename on
another host).

Filesystems converted:

9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

After this, we can get rid of the duplicate interfaces for rename.

Signed-off-by: Miklos Szeredi
Acked-by: Greg Kroah-Hartman
Acked-by: David Howells [AFS]
Acked-by: Mike Marshall
Cc: Eric Van Hensbergen
Cc: Ilya Dryomov
Cc: Jan Harkes
Cc: Tyler Hicks
Cc: Oleg Drokin
Cc: Trond Myklebust
Cc: Mark Fasheh

Miklos Szeredi
2016-09-27 17:03:58 +0800

13 May, 2016

1 commit

c25a1e067 ocfs2: fix posix_acl_create deadlock ... Browse Code »

Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
refactored code to use posix_acl_create. The problem with this function
is that it is not mindful of the cluster wide inode lock making it
unsuitable for use with ocfs2 inode creation with ACLs. For example,
when used in ocfs2_mknod, this function can cause deadlock as follows.
The parent dir inode lock is taken when calling posix_acl_create ->
get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can
cause deadlock if there is a blocked remote lock request waiting for the
lock to be downconverted. And same deadlock happened in ocfs2_reflink.
This fix is to revert back using ocfs2_init_acl.

Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Tariq Saeed
Signed-off-by: Junxiao Bi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2016-05-13 06:52:50 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

15 Jan, 2016

2 commits

074a6c655 ocfs2: access orphan dinode before delete entry in ocfs2_orphan_del ... Browse Code »

In ocfs2_orphan_del, currently it finds and deletes entry first, and
then access orphan dir dinode. This will have a problem once
ocfs2_journal_access_di fails. In this case, entry will be removed from
orphan dir, but in deed the inode hasn't been deleted successfully. In
other words, the file is missing but not actually deleted. So we should
access orphan dinode first like unlink and rename.

Signed-off-by: Joseph Qi
Reviewed-by: Jiufei Xue
Cc: Mark Fasheh
Cc: Joel Becker
Reviewed-by: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2016-01-15 08:00:49 +0800
72865d923 ocfs2: clean up redundant NULL check before iput ... Browse Code »

Since iput will take care the NULL check itself, NULL check before
calling it is redundant. So clean them up.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2016-01-15 08:00:49 +0800

12 Jan, 2016

1 commit

32fb37843 Merge branch 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs RCU symlink updates from Al Viro:
"Replacement of ->follow_link/->put_link, allowing to stay in RCU mode
even if the symlink is not an embedded one.

No changes since the mailbomb on Jan 1"

* 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch ->get_link() to delayed_call, kill ->put_link()
kill free_page_put_link()
teach nfs_get_link() to work in RCU mode
teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
teach shmem_get_link() to work in RCU mode
teach page_get_link() to work in RCU mode
replace ->follow_link() with new method that could stay in RCU mode
don't put symlink bodies in pagecache into highmem
namei: page_getlink() and page_follow_link_light() are the same thing
ufs: get rid of ->setattr() for symlinks
udf: don't duplicate page_symlink_inode_operations
logfs: don't duplicate page_symlink_inode_operations
switch befs long symlinks to page_symlink_operations

Linus Torvalds
2016-01-12 05:13:23 +0800

13 Dec, 2015

1 commit

854ee2e94 ocfs2: fix SGID not inherited issue ... Browse Code »

Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an
issue, SGID of sub dir was not inherited from its parents dir. It is
because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
is overwritten by "mode" which don't have SGID set later.

Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue")
Signed-off-by: Junxiao Bi
Cc: Mark Fasheh
Cc: Joel Becker
Acked-by: Srinivas Eeda
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2015-12-13 02:15:34 +0800

09 Dec, 2015

1 commit

21fc61c73 don't put symlink bodies in pagecache into highmem ... Browse Code »

kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases. page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:36 +0800

21 Nov, 2015

1 commit

8f1eb4875 ocfs2: fix umask ignored issue ... Browse Code »

New created file's mode is not masked with umask, and this makes umask not
work for ocfs2 volume.

Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Junxiao Bi
Cc: Gang He
Cc: Mark Fasheh
Cc: Joel Becker
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2015-11-21 08:17:32 +0800

06 Nov, 2015

2 commits

b1529a41f ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error ... Browse Code »

In ocfs2_mknod_locked if '__ocfs2_mknod_locke d' returns an error, we
should reclaim the inode successfully claimed above, otherwise, the
inode never be reused. The case is described below:

ocfs2_mknod
ocfs2_mknod_locked
ocfs2_claim_new_inode
Successfully claim the inode
__ocfs2_mknod_locked
ocfs2_journal_access_di
Failed because of -ENOMEM or other reasons, the inode
lockres has not been initialized yet.

iput(inode)
ocfs2_evict_inode
ocfs2_delete_inode
ocfs2_inode_lock
ocfs2_inode_lock_full_nested
__ocfs2_cluster_lock
Return -EINVAL because of the inode
lockres has not been initialized.

So the following operations are not performed
ocfs2_wipe_inode
ocfs2_remove_inode
ocfs2_free_dinode
ocfs2_free_suballoc_bits

Signed-off-by: Alex Chen
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

alex chen
2015-11-06 11:34:48 +0800
30edc43c7 ocfs2: do not include dio entry in case of orphan scan ... Browse Code »

dio entry will only do truncate in case of ORPHAN_NEED_TRUNCATE. So do
not include it when doing normal orphan scan to reduce contention.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-11-06 11:34:48 +0800

05 Sep, 2015

4 commits

928dda1f9 ocfs2: fix a tiny case that inode can not removed ... Browse Code »

When running dirop_fileop_racer we found a case that inode
can not removed.

Two nodes, say Node A and Node B, mount the same ocfs2 volume. Create
two dirs /race/1/ and /race/2/ in the filesystem.

Node A Node B
rm -r /race/2/
mv /race/1/ /race/2/
call ocfs2_unlink(), get
the EX mode of /race/2/
wait for B unlock /race/2/
decrease i_nlink of /race/2/ to 0,
and add inode of /race/2/ into
orphan dir, unlock /race/2/
got EX mode of /race/2/. because
/race/1/ is dir, so inc i_nlink
of /race/2/ and update into disk,
unlock /race/2/
because i_nlink of /race/2/
is not zero, this inode will
always remain in orphan dir

This patch fixes this case by test whether i_nlink of new dir is zero.

Signed-off-by: Yiwen Jiang
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Cc: Xue jiufei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yiwen Jiang
2015-09-05 07:54:41 +0800
807a79071 ocfs2: set filesytem read-only when ocfs2_delete_entry failed. ... Browse Code »

In ocfs2_rename, it will lead to an inode with two entried(old and new) if
ocfs2_delete_entry(old) failed. Thus, filesystem will be inconsistent.

The case is described below:

ocfs2_rename
-> ocfs2_start_trans
-> ocfs2_add_entry(new)
-> ocfs2_delete_entry(old)
-> __ocfs2_journal_access *failed* because of -ENOMEM
-> ocfs2_commit_trans

So filesystem should be set to read-only at the moment.

Signed-off-by: Yiwen Jiang
Cc: Joseph Qi
Cc: Joel Becker
Reviewed-by: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

jiangyiwen
2015-09-05 07:54:41 +0800
3cb2ec43f ocfs2: adjust code to match locking/unlocking order ... Browse Code »

Unlocking order in ocfs2_unlink and ocfs2_rename mismatches the
corresponding locking order, although it won't cause issues, adjust the
code so that it looks more reasonable.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-09-05 07:54:41 +0800
512f62acb ocfs2: fix race between dio and recover orphan ... Browse Code »

During direct io the inode will be added to orphan first and then
deleted from orphan. There is a race window that the orphan entry will
be deleted twice and thus trigger the BUG when validating
OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.

ocfs2_direct_IO_write
...
ocfs2_add_inode_to_orphan
>>>>>>>> race window.
1) another node may rm the file and then down, this node
take care of orphan recovery and clear flag
OCFS2_DIO_ORPHANED_FL.
2) since rw lock is unlocked, it may race with another
orphan recovery and append dio.
ocfs2_del_inode_from_orphan

So take inode mutex lock when recovering orphans and make rw unlock at the
end of aio write in case of append dio.

Signed-off-by: Joseph Qi
Reported-by: Yiwen Jiang
Cc: Weiwei Wang
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-09-05 07:54:41 +0800

24 Jul, 2015

1 commit

9c89fe0af ocfs2: Handle error from dquot_initialize() ... Browse Code »

dquot_initialize() can now return error. Handle it where possible.

Reviewed-by: Junxiao Bi
Signed-off-by: Jan Kara

Jan Kara
2015-07-24 02:59:38 +0800

25 Jun, 2015

2 commits

ab1ba0218 ocfs2: use swap() in ocfs2_double_lock() ... Browse Code »

Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick
Cc: Julia Lawall
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fabian Frederick
2015-06-25 08:49:40 +0800
cf1776a9e ocfs2: fix a tiny race when truncate dio orohaned entry ... Browse Code »

Once dio crashed it will leave an entry in orphan dir. And orphan scan
will take care of the clean up. There is a tiny race case that the same
entry will be truncated twice and then trigger the BUG in
ocfs2_del_inode_from_orphan.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-06-25 08:49:39 +0800

27 Apr, 2015

1 commit

9ec3a646f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only

Linus Torvalds
2015-04-27 08:22:07 +0800

16 Apr, 2015

1 commit

2b0143b5c VFS: normal filesystems (and lustre): d_inode() annotations ... Browse Code »

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:57 +0800

15 Apr, 2015

2 commits

62f8b1f0d ocfs2: use ENOENT instead of EEXIST when get system file fails ... Browse Code »

When ocfs2_get_system_file_inode fails, it is obscure to set the return
value to -EEXIST. So change it to -ENOENT.

Signed-off-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-04-15 07:48:58 +0800
e38a57390 ocfs2: use actual name length when find entry in ocfs2_orphan_del() ... Browse Code »

If the namelen is 20 and name only has actual length 16, it will fail in
ocfs2_find_entry because of mismatch. So use actual name length when find
entry.

Signed-off-by: Joseph Qi
Signed-off-by: Yiwen Jiang
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-04-15 07:48:58 +0800

17 Feb, 2015

2 commits

4813962be ocfs2: wait for orphan recovery first once append O_DIRECT write crash ... Browse Code »

If one node has crashed with orphan entry leftover, another node which do
append O_DIRECT write to the same file will override the
i_dio_orphaned_slot. Then the old entry won't be cleaned forever. If
this case happens, we let it wait for orphan recovery first.

Signed-off-by: Joseph Qi
Cc: Weiwei Wang
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Mark Fasheh
Cc: Xuejiufei
Cc: alex chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-02-17 09:56:05 +0800
06ee5c75b ocfs2: add functions to add and remove inode in orphan dir ... Browse Code »

Add functions to add inode to orphan dir and remove inode in orphan dir.
Here we do not call ocfs2_prepare_orphan_dir and ocfs2_orphan_add
directly. Because append O_DIRECT will add inode to orphan two and may
result in more than one orphan entry for the same inode.

[akpm@linux-foundation.org: avoid dynamic stack allocation]
Signed-off-by: Joseph Qi
Cc: Weiwei Wang
Cc: Junxiao Bi
Cc: Joel Becker
Cc: Mark Fasheh
Cc: Xuejiufei
Cc: alex chen
Cc: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-02-17 09:56:04 +0800

09 Jan, 2015

1 commit

53dc20b9a ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file ... Browse Code »

In ocfs2_link(), the parent directory inode passed to function
ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent of
new_dentry not old_dentry. We should get old_dir from old_dentry and
lookup old_dentry in old_dir in case another node remove the old dentry.

With this change, hard linking works again, when paths are relative with
at least one subdirectory. This is how the problem was reproducable:

# mkdir a
# mkdir b
# touch a/test
# ln a/test b/test
ln: failed to create hard link `b/test' => `a/test': No such file or directory

However when creating links in the same dir, it worked well.

Now the link gets created.

Fixes: 0e048316ff57 ("ocfs2: check existence of old dentry in ocfs2_link()")
Signed-off-by: joyce.xue
Reported-by: Szabo Aron - UBIT
Cc: Mark Fasheh
Cc: Joel Becker
Tested-by: Aron Szabo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xue jiufei
2015-01-09 07:10:51 +0800

30 Oct, 2014

1 commit

d3556babd ocfs2: fix d_splice_alias() return code checking ... Browse Code »

d_splice_alias() can return a valid dentry, NULL or an ERR_PTR.
Currently the code checks not for ERR_PTR and will cuase an oops in
ocfs2_dentry_attach_lock(). Fix this by using IS_ERR_OR_NULL().

Signed-off-by: Richard Weinberger
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Richard Weinberger
2014-10-30 07:33:15 +0800

24 Jun, 2014

3 commits

595297a8f ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod ... Browse Code »

When the call to ocfs2_add_entry() failed in ocfs2_symlink() and
ocfs2_mknod(), iput() will not be called during dput(dentry) because no
d_instantiate(), and this will lead to umount hung.

Signed-off-by: jiangyiwen
Cc: Joel Becker
Reviewed-by: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

jiangyiwen
2014-06-24 07:47:45 +0800
f7a14f32e ocfs2: fix a tiny race when running dirop_fileop_racer ... Browse Code »

When running dirop_fileop_racer we found a dead lock case.

2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create
/race/16/1 in the filesystem, and let the inode number of dir 16 is less
than the inode number of dir race.

Node A Node B
mv /race/16/1 /race/
right after Node A has got the
EX mode of /race/16/, and tries to
get EX mode of /race
ls /race/16/

In this case, Node A has got the EX mode of /race/16/, and wants to get EX
mode of /race/. Node B has got the PR mode of /race/, and wants to get
the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead
lock happens.

This patch fixes this case by locking in ancestor order before trying
inode number order.

Signed-off-by: Yiwen Jiang
Signed-off-by: Joseph Qi
Cc: Joel Becker
Reviewed-by: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yiwen Jiang
2014-06-24 07:47:45 +0800
5fb1beb06 ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename() ... Browse Code »

There are two files a and b in dir /mnt/ocfs2.

node A node B

mv a b
In ocfs2_rename(), after calling
ocfs2_orphan_add(), the inode of
file b will be added into orphan
dir.

If ocfs2_update_entry() fails,
ocfs2_rename return error and mv
operation fails. But file b still
exists in the parent dir.

ocfs2_queue_orphan_scan
-> ocfs2_queue_recovery_completion
-> ocfs2_complete_recovery
-> ocfs2_recover_orphans
The inode of the file b will be
put with iput().

ocfs2_evict_inode
-> ocfs2_delete_inode
-> ocfs2_wipe_inode
-> ocfs2_remove_inode
OCFS2_VALID_FL in the inode
i_flags will be cleared.

The file b still can be accessed
on node B.
ls /mnt/ocfs2
When first read the file b with
ocfs2_read_inode_block(). It will
validate the inode using
ocfs2_validate_inode_block().
Because OCFS2_VALID_FL not set in
the inode i_flags, so the file
system will be readonly.

So we should add inode into orphan dir after updating entry in
ocfs2_rename().

Signed-off-by: alex.chen
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

alex chen
2014-06-24 07:47:45 +0800

04 Apr, 2014

4 commits

6fdb702d6 ocfs2: call ocfs2_update_inode_fsync_trans when updating any inode ... Browse Code »

Ensure that ocfs2_update_inode_fsync_trans() is called any time we touch
an inode in a given transaction. This is a follow-on to the previous
patch to reduce lock contention and deadlocking during an fsync
operation.

Signed-off-by: Darrick J. Wong
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Wengang
Cc: Greg Marsden
Cc: Srinivas Eeda
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Darrick J. Wong
2014-04-04 07:20:56 +0800
f81c20158 ocfs2: fix panic on kfree(xattr->name) ... Browse Code »

Commit 9548906b2bb7 ('xattr: Constify ->name member of "struct xattr"')
missed that ocfs2 is calling kfree(xattr->name). As a result, kernel
panic occurs upon calling kfree(xattr->name) because xattr->name refers
static constant names. This patch removes kfree(xattr->name) from
ocfs2_mknod() and ocfs2_symlink().

Signed-off-by: Tetsuo Handa
Reported-by: Tariq Saeed
Tested-by: Tariq Saeed
Reviewed-by: Srinivas Eeda
Cc: Joel Becker
Cc: Mark Fasheh
Cc: [3.12+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tetsuo Handa
2014-04-04 07:20:56 +0800
466e68c43 ocfs2: __ocfs2_mknod_locked should return error when ocfs2_create_new_inode_locks() failed ... Browse Code »

When ocfs2_create_new_inode_locks() return error, inode open lock may
not be obtainted for this inode. So other nodes can remove this file
and free dinode when inode still remain in memory on this node, which is
not correct and may trigger BUG. So __ocfs2_mknod_locked should return
error when ocfs2_create_new_inode_locks() failed.

Node_1 Node_2
create fileA, call ocfs2_mknod()
-> ocfs2_get_init_inode(), allocate inodeA
-> ocfs2_claim_new_inode(), claim dinode(dinodeA)
-> call ocfs2_create_new_inode_locks(),
create open lock failed, return error
-> __ocfs2_mknod_locked return success

unlink fileA
try open lock succeed,
and free dinodeA

create another file, call ocfs2_mknod()
-> ocfs2_get_init_inode(), allocate inodeB
-> ocfs2_claim_new_inode(), as Node_2 had freed dinodeA,
so claim dinodeA and update generation for dinodeA

call __ocfs2_drop_dl_inodes()->ocfs2_delete_inode()
to free inodeA, and finally triggers BUG
on(inode->i_generation != le32_to_cpu(fe->i_generation))
in function ocfs2_inode_lock_update().

Signed-off-by: joyce.xue
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xue jiufei
2014-04-04 07:20:55 +0800
2931cdcb4 ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file ... Browse Code »

Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
transaction to complete. This isn't terribly efficient, since sync_file
really only needs to wait for the last transaction involving that inode
to complete, and this doesn't require i_mutex.

Therefore, implement the necessary bits to track the newest tid
associated with an inode, and teach sync_file to wait for that instead
of waiting for everything in the journal to commit. Furthermore, only
issue the flush request to the drive if jbd2 hasn't already done so.

This also eliminates the deadlock between ocfs2_file_aio_write() and
ocfs2_sync_file(). aio_write takes i_mutex then calls
ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
However, if that dio completion involves calling fsync, then we can get
into trouble when some ocfs2_sync_file tries to take i_mutex.

Signed-off-by: Darrick J. Wong
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Darrick J. Wong
2014-04-04 07:20:53 +0800

11 Feb, 2014

1 commit

0e048316f ocfs2: check existence of old dentry in ocfs2_link() ... Browse Code »

System call linkat first calls user_path_at(), check the existence of
old dentry, and then calls vfs_link()->ocfs2_link() to do the actual
work. There may exist a race when Node A create a hard link for file
while node B rm it.

Node A Node B
user_path_at()
->ocfs2_lookup(),
find old dentry exist
rm file, add inode say inodeA
to orphan_dir

call ocfs2_link(),create a
hard link for inodeA.

rm the link, add inodeA to orphan_dir
again

When orphan_scan work start, it calls ocfs2_queue_orphans() to do the
main work. It first tranverses entrys in orphan_dir, linking all inodes
in this orphan_dir to a list look like this:

inodeA->inodeB->...->inodeA

When tranvering this list, it will fall into loop, calling iput() again
and again. And finally trigger BUG_ON(inode->i_state & I_CLEAR).

Signed-off-by: joyce
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xue jiufei
2014-02-11 08:01:43 +0800