Eric Lee / smarc-fsl-linux-kernel

25 Sep, 2019

1 commit

5e7a3ed9f ocfs2: further debugfs cleanups

There is no need to check the return value of the debugfs_create functions,
but the last sweep through ocfs2 missed a number of places where this was
still happening. There is also no need to save the individual dentries for
the debugfs files, as everything can simply be removed at once when the
directory is removed.

By getting rid of the file dentries for the debugfs entries, a bit of
local memory can be saved as well.

[colin.king@canonical.com: ensure ret is set to zero before returning]
Link: http://lkml.kernel.org/r/20190807121929.28918-1-colin.king@canonical.com
Link: http://lkml.kernel.org/r/20190731132119.GA12603@kroah.com
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Colin Ian King
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Jia Guo
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Kroah-Hartman
2019-09-25 06:54:07 +0800
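
The cleanup relies on removing a whole debugfs directory in one call instead of keeping each file's dentry around; in the kernel, debugfs_remove_recursive() provides that. The userspace sketch below only models the idea with a hypothetical `dbg_node` tree, so none of its names are the debugfs API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of a debugfs-like tree; not the kernel API. */
struct dbg_node {
    char name[32];
    struct dbg_node *child; /* first child */
    struct dbg_node *next;  /* next sibling */
};

static int live_nodes; /* number of allocated nodes, for illustration */

/* Create a node under parent. Callers may ignore the return value, just as
 * the cleaned-up ocfs2 code no longer checks or stores created dentries. */
static struct dbg_node *dbg_create(const char *name, struct dbg_node *parent)
{
    struct dbg_node *n = calloc(1, sizeof(*n));
    if (!n)
        return NULL;
    strncpy(n->name, name, sizeof(n->name) - 1);
    live_nodes++;
    if (parent) {
        n->next = parent->child;
        parent->child = n;
    }
    return n;
}

/* Remove a node and everything beneath it in a single call. */
static void dbg_remove_recursive(struct dbg_node *n)
{
    if (!n)
        return;
    struct dbg_node *c = n->child;
    while (c) {
        struct dbg_node *next = c->next;
        dbg_remove_recursive(c);
        c = next;
    }
    free(n);
    live_nodes--;
}
```

Only the directory pointer has to be kept; the per-file pointers can be dropped, which is where the small memory saving comes from.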

13 Jul, 2019

3 commits

5da844a2c ocfs2: add first lock wait time in locking_state

The ocfs2 file system uses the locking_state file under debugfs to dump
each ocfs2 file system's dlm lock resources, but users have encountered
some hang (deadlock) problems in ocfs2. I'd like to add the first lock
wait time to the locking_state file, which can help upper-level scripts
detect these deadlock problems by comparing the first lock wait time
with the current time.

Link: http://lkml.kernel.org/r/20190611015414.27754-3-ghe@suse.com
Signed-off-by: Gang He
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2019-07-13 02:05:41 +0800
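
A script consuming the new first-lock-wait-time field could flag possible hangs roughly as sketched below. The record layout, the microsecond units and the threshold are assumptions for illustration; the actual locking_state output format is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parsed locking_state record; field names are illustrative. */
struct lockres_record {
    const char *name;
    uint64_t first_wait_us; /* 0 means no waiter is queued */
};

/* True if the oldest waiter has been blocked longer than threshold_us,
 * which an external script could report as a possible deadlock. */
static bool lockres_possibly_hung(const struct lockres_record *r,
                                  uint64_t now_us, uint64_t threshold_us)
{
    if (r->first_wait_us == 0 || now_us < r->first_wait_us)
        return false;
    return now_us - r->first_wait_us > threshold_us;
}
```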
8056773ac ocfs2: add locking filter debugfs file

Add a locking_filter debugfs file, which is used to filter the lock
resources dumped from the locking_state debugfs file. The d_filter_secs
field controls the filtering: the default value (0) filters nothing;
otherwise, only the lock resources active in the last N seconds are
dumped. This avoids dumping lots of old records. The d_filter_secs value
can be changed via the locking_filter file.

[akpm@linux-foundation.org: fix undefined reference to `__udivdi3']
Link: http://lkml.kernel.org/r/20190611015414.27754-2-ghe@suse.com
Signed-off-by: Gang He
Reviewed-by: Joseph Qi
Acked-by: Randy Dunlap [build-tested]
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2019-07-13 02:05:41 +0800
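
The d_filter_secs rule can be modelled as a small predicate. This is a hypothetical userspace sketch: it assumes a resource's "last active" time is available in microseconds (for example the latest recorded unlock time), and the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* filter_secs mirrors the d_filter_secs idea: 0 disables filtering,
 * otherwise only resources active within the last filter_secs seconds
 * are dumped. Times are in microseconds. */
static bool lockres_should_dump(uint64_t last_active_us, uint64_t now_us,
                                uint32_t filter_secs)
{
    if (filter_secs == 0)
        return true;                 /* default: dump everything */
    if (last_active_us >= now_us)
        return true;                 /* active "now" (or clock skew) */
    /* 64-bit division. In kernel code on 32-bit targets a plain '/' on
     * u64 is the usual cause of an undefined __udivdi3 reference; helpers
     * such as div_u64() avoid it. Userspace C can divide directly. */
    uint64_t age_secs = (now_us - last_active_us) / 1000000u;
    return age_secs < filter_secs;
}
```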
8a7f5f4c2 ocfs2: add last unlock times in locking_state

The ocfs2 file system uses the locking_state file under debugfs to dump
each ocfs2 file system's dlm lock resources, but the number of dlm lock
resources in memory keeps growing as the user touches more files. This
makes it difficult for upper-level scripts to analyze the records in
locking_state, since many of them belong to files that are no longer
active and were accessed long ago.

I'd therefore like to add the last PR/EX unlock times to each dlm lock
resource record in locking_state, so that upper-level scripts can use the
last unlock time to filter out inactive dlm lock resource records.

Link: http://lkml.kernel.org/r/20190611015414.27754-1-ghe@suse.com
Signed-off-by: Gang He
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2019-07-13 02:05:41 +0800
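
With separate PR and EX unlock timestamps, a script can take the later of the two as the resource's last activity. A minimal sketch, assuming microsecond timestamps where 0 means "never unlocked"; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-lockres timestamps (microseconds; 0 = never). */
struct lockres_unlock_times {
    uint64_t last_pr_unlock_us;
    uint64_t last_ex_unlock_us;
};

/* The most recent unlock in either mode. */
static uint64_t lockres_last_unlock(const struct lockres_unlock_times *t)
{
    return t->last_pr_unlock_us > t->last_ex_unlock_us ?
           t->last_pr_unlock_us : t->last_ex_unlock_us;
}

/* 1 if nothing was unlocked within idle_us of now_us. A record that was
 * never unlocked is treated as active here, since it may still be held. */
static int lockres_is_inactive(const struct lockres_unlock_times *t,
                               uint64_t now_us, uint64_t idle_us)
{
    uint64_t last = lockres_last_unlock(t);
    return last != 0 && last < now_us && now_us - last >= idle_us;
}
```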

31 May, 2019

1 commit

328970de0 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 021110 1307 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 84 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Richard Fontana
Reviewed-by: Allison Randal
Reviewed-by: Kate Stewart
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-31 02:25:18 +0800

06 Mar, 2019

1 commit

5500ab4ed ocfs2: fix the application IO timeout when fstrim is running

A user reported that upper-level application IO timed out while fstrim
was running on an ocfs2 partition. The application's monitoring resource
agent concluded that the application had stopped working, and the node
was then fenced by the cluster manager (e.g. pacemaker).

The root cause is that the fstrim thread holds the main_bm meta-file
related locks until all cluster groups are trimmed. This patch makes the
fstrim thread release the main_bm meta-file related locks after each
cluster group is trimmed, giving concurrent application IO a chance to
claim clusters from the main_bm meta-file.

Link: http://lkml.kernel.org/r/20190111090014.31645-1-ghe@suse.com
Signed-off-by: Gang He
Reviewed-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2019-03-06 13:07:13 +0800
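
The change from "hold the lock across all groups" to "take and drop it per group" can be sketched with a counting stand-in for the lock. Everything here is hypothetical: `model_lock` replaces the real main_bm cluster and inode locks, and the trimming itself is elided.

```c
#include <stddef.h>

/* Hypothetical stand-in for the main_bm lock: it records how often it was
 * taken and the most groups trimmed during a single hold. */
struct model_lock {
    int held;
    int acquisitions;
    int groups_in_current_hold;
    int max_groups_per_hold;
};

static void model_lock_take(struct model_lock *l)
{
    l->held = 1;
    l->acquisitions++;
    l->groups_in_current_hold = 0;
}

static void model_lock_release(struct model_lock *l)
{
    l->held = 0;
}

/* Trim ngroups cluster groups, taking and dropping the lock around each
 * group so that other allocators can get in between groups. */
static void trim_all_groups(struct model_lock *main_bm_lock, size_t ngroups)
{
    for (size_t g = 0; g < ngroups; g++) {
        model_lock_take(main_bm_lock);
        /* ... trim cluster group g here ... */
        main_bm_lock->groups_in_current_hold++;
        if (main_bm_lock->groups_in_current_hold >
            main_bm_lock->max_groups_per_hold)
            main_bm_lock->max_groups_per_hold =
                main_bm_lock->groups_in_current_hold;
        model_lock_release(main_bm_lock);
    }
}
```

The trade-off is more lock traffic in exchange for bounding how long any one holder blocks application IO.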

06 Apr, 2018

1 commit

5f483c4ab ocfs2: add kobject for online file check

Use an embedded kobject for the online file check feature. This avoids
using a global list to save and search per-device online file check
data, reduces the number of code lines, and makes the code logic clearer.
The changed code is based on Goldwyn Rodrigues's patches and the ext4 fs
code.

Link: http://lkml.kernel.org/r/1495611866-27360-4-git-send-email-ghe@suse.com
Signed-off-by: Gang He
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2018-04-06 12:36:22 +0800

01 Feb, 2018

1 commit

4882abebc ocfs2: add trimfs dlm lock resource

Introduce a new dlm lock resource, which cluster nodes will use to
communicate while an ocfs2 device is being fstrimmed.

Link: http://lkml.kernel.org/r/1513228484-2084-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He
Reviewed-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2018-02-01 09:18:35 +0800

07 Sep, 2017

1 commit

964f14a0d ocfs2: clean up some dead code

Clean up some unused functions and parameters.

Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.com
Signed-off-by: Jun Piao
Reviewed-by: Alex Chen
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jun Piao
2017-09-07 08:27:24 +0800

23 Feb, 2017

1 commit

439a36b8e ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

We are in a situation where we have to avoid recursive cluster locking,
but there is no way to check whether a cluster lock has already been
taken by a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that are
invoked directly by VFS code. For instance:

    const struct inode_operations ocfs2_file_iops = {
        .permission = ocfs2_permission,
        .get_acl    = ocfs2_iop_get_acl,
        .set_acl    = ocfs2_iop_set_acl,
    };

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):

    do_sys_open
     may_open
      inode_permission
       ocfs2_permission
        ocfs2_inode_lock()
Reviewed-by: Junxiao Bi
Reviewed-by: Joseph Qi
Cc: Stephen Rothwell
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Ren
2017-02-23 08:41:27 +0800
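
The tracking idea can be modelled as a per-lock-resource list of holders, so that a nested caller (for example a get_acl path reached while ocfs2_permission() already holds the lock) can detect that it is already a holder. This is a hypothetical userspace sketch; the function names and integer PIDs are assumptions, not the dlmglue interface.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOLDERS 8

/* Hypothetical lock resource that remembers which "processes" hold it. */
struct tracked_lockres {
    int holders[MAX_HOLDERS];
    size_t nholders;
    int cluster_lock_calls; /* times the underlying cluster lock was taken */
};

static bool lockres_held_by(const struct tracked_lockres *res, int pid)
{
    for (size_t i = 0; i < res->nholders; i++)
        if (res->holders[i] == pid)
            return true;
    return false;
}

/* Returns 1 if this call took the lock (the caller must unlock later),
 * 0 if pid already held it (the caller must NOT unlock), or -1 if the
 * holder table is full. */
static int lockres_lock_tracked(struct tracked_lockres *res, int pid)
{
    if (lockres_held_by(res, pid))
        return 0;
    if (res->nholders == MAX_HOLDERS)
        return -1;
    res->cluster_lock_calls++; /* stands in for the real cluster lock */
    res->holders[res->nholders++] = pid;
    return 1;
}

static void lockres_unlock_tracked(struct tracked_lockres *res, int pid)
{
    for (size_t i = 0; i < res->nholders; i++) {
        if (res->holders[i] == pid) {
            res->holders[i] = res->holders[--res->nholders];
            return;
        }
    }
}
```

The caller pattern this enables is: lock, remember whether the lock was newly acquired, and unlock at the end only in that case.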

13 Dec, 2016

1 commit

395627b07 ocfs2: use time64_t to represent orphan scan times

struct timespec is not y2038 safe. Use time64_t which is y2038 safe to
represent orphan scan times. time64_t is sufficient here as only the
seconds delta times are relevant.

Also use appropriate time functions that return time in time64_t format.
Time functions now return monotonic time instead of real time as only
delta scan times are relevant and these values are not persistent across
reboots.

The format string for the debug print is still using long as this is
only the time elapsed since the last scan and long is sufficient to
represent this value.

Link: http://lkml.kernel.org/r/1475365138-20567-1-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Deepa Dinamani
2016-12-13 10:55:06 +0800
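
Why a 64-bit seconds type matters can be shown with plain integer arithmetic. In this sketch `int64_t` stands in for the kernel's time64_t; the kernel would read the value from a monotonic source such as ktime_get_seconds(), which is not modelled here.

```c
#include <stdint.h>

/* 2^31 seconds after the Unix epoch (2038-01-19 03:14:08 UTC): one past
 * the largest value a signed 32-bit seconds counter can hold. */
static const int64_t Y2038_SECONDS = INT64_C(2147483648);

static int fits_in_int32(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Seconds elapsed since the last orphan scan. Only the delta matters, and
 * the commit notes that a long is enough to print it. */
static long seconds_since_last_scan(int64_t last_scan, int64_t now)
{
    return (long)(now - last_scan);
}
```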

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized, and it is unlikely to.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE, and it's a constant source of confusion whether the
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

A global switch to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
the script below. For some reason, coccinelle doesn't patch header files,
so I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800
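
Because the PAGE_CACHE_* macros were aliases of the PAGE_* ones (which is what "assumed to be equal" refers to), every conversion shift removed by the script was a shift by zero. A minimal model, assuming 4 KiB pages for illustration; the helper name is hypothetical.

```c
/* Model of the old aliasing; 4 KiB pages assumed for illustration. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
#define PAGE_CACHE_SIZE   PAGE_SIZE

/* Page-cache index to page index: the first rule of the script rewrites
 * "E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)" to plain "E", since the shift
 * amount is always 0. */
static unsigned long cache_index_to_page_index(unsigned long idx)
{
    return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}
```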

26 Mar, 2016

1 commit

35ddf78e4 ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local

This patch fixes a deadlock, as follows:

1) Node 1 has volumes a and b mounted; Node 2 has only vol a mounted;
   Node 3 has only vol b mounted.

2) Node 2 starts to mount b; Node 3 starts to mount a.

3) Node 2 checks the hb of Node 3 in vol a, qs_holds++;
   Node 3 checks the hb of Node 2 in vol b, qs_holds++.

4) The network of all nodes goes down.

5) On Node 2, the mount of b fails and ocfs2_dismount_volume() is
   called, but the process hangs, since a work item in ocfs2_wq cannot
   be completed. This work item is about vol a, but ocfs2_wq is a
   global workqueue. The work item scheduled in ocfs2_wq is
   ocfs2_orphan_scan_work, and its context needs to take the inode lock
   of orphan_dir. Because the lockres owner is Node 1 and all nodes'
   networks went down at the same time, it can't get the inode lock.
   Node 3 is in the same situation as Node 2.

6) Why can't this node be fenced when the network is disconnected?
   Because the hung mount process leaves qs_holds not equal to 0.

All work items in ocfs2_wq are tied to a particular super block, so the
solution is to change ocfs2_wq from global to local; in other words,
move it into struct ocfs2_super.

Signed-off-by: Yiwen Jiang
Reviewed-by: Joseph Qi
Cc: Xue jiufei
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

jiangyiwen
2016-03-26 07:37:42 +0800
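
The effect of moving the queue into the superblock can be modelled with a toy queue whose items either complete or are stuck waiting on an unobtainable lock. This is a hypothetical userspace model; `model_super` and the queue helpers are not kernel workqueue APIs.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* A queued item either completes or is stuck (e.g. waiting on a cluster
 * lock that cannot be granted). */
struct work_item { bool stuck; };

struct work_queue {
    struct work_item items[QUEUE_CAP];
    size_t n;
};

/* Per-mount superblock owning its own queue, like ocfs2_wq after the fix. */
struct model_super { struct work_queue wq; };

static bool model_queue_work(struct work_queue *q, bool stuck)
{
    if (q->n == QUEUE_CAP)
        return false;
    q->items[q->n++].stuck = stuck;
    return true;
}

/* Flush: true if every item completed, false if the flush would block
 * forever behind a stuck item. */
static bool model_flush_queue(struct work_queue *q)
{
    for (size_t i = 0; i < q->n; i++)
        if (q->items[i].stuck)
            return false;
    q->n = 0;
    return true;
}
```

With a shared queue, dismounting vol b has to flush vol a's stuck work as well; with per-superblock queues it only flushes its own.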

05 Sep, 2015

1 commit

7d0fb9148 ocfs2: add errors=continue

OCFS2 is often used in high-availability systems. However, ocfs2
converts the filesystem to read-only at the drop of a hat. This may not
be necessary, since turning the filesystem read-only affects other
running processes as well, decreasing availability.

This patch adds errors=continue, which returns EIO to the calling
process and terminates further processing so that the filesystem is not
corrupted further. However, the filesystem is not converted to
read-only.

As a future plan, I intend to create a small utility or extend
fsck.ocfs2 to fix small errors, such as those in an inode. The input to
the utility, such as the inode, can come from the kernel logs, so we
don't have to schedule downtime to fix small-enough errors.

The patch changes ocfs2_error() to return an error. The error returned
depends on the mount option set. If none is set, the default is to turn
the filesystem read-only.

Perhaps errors=continue is not the best option name. Historically it is
used for making an attempt to progress in the current process itself.
Should we call it errors=eio? or errors=killproc? Suggestions/Comments
welcome.

Sources are available at:
https://github.com/goldwynr/linux/tree/error-cont

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Goldwyn Rodrigues
2015-09-05 07:54:41 +0800
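
The policy switch described above can be sketched as follows. The `ERRORS_*` names and `model_fs` struct are hypothetical and are not ocfs2's mount-option flags; only the two behaviours the commit describes are modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical error policies modelled on the errors= mount option. */
enum error_policy {
    ERRORS_REMOUNT_RO, /* default per the commit: turn the fs read-only */
    ERRORS_CONTINUE,   /* fail only the calling operation with -EIO */
};

struct model_fs {
    enum error_policy policy;
    bool read_only;
};

/* Called when corruption is detected; returns the error the caller sees. */
static int model_fs_error(struct model_fs *fs)
{
    switch (fs->policy) {
    case ERRORS_CONTINUE:
        break;                /* filesystem stays read-write */
    case ERRORS_REMOUNT_RO:
    default:
        fs->read_only = true;
        break;
    }
    return -EIO;
}
```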

25 Jun, 2015

1 commit

ae1f08146 ocfs2: fix wrong check in ocfs2_direct_IO_get_blocks

contig_blocks returned by ocfs2_extent_map_get_blocks() cannot be
compared with clusters_to_alloc directly, so convert it to clusters first.

Signed-off-by: Joseph Qi
Reviewed-by: Weiwei Wang
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-06-25 08:49:40 +0800
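
The unit mismatch is easy to see with concrete sizes: with 4 KiB blocks and 32 KiB clusters, 8 contiguous blocks are only 1 cluster. A sketch of a block-to-cluster conversion that rounds up; the helper name is hypothetical, and whether the fix rounds up or down is not stated in the commit message.

```c
#include <stdint.h>

/* Convert a block count to clusters, rounding up. The sizes are given as
 * log2 values; clustersize_bits must be >= blocksize_bits. */
static uint32_t blocks_to_clusters_roundup(uint64_t blocks,
                                           unsigned blocksize_bits,
                                           unsigned clustersize_bits)
{
    unsigned b_to_c_bits = clustersize_bits - blocksize_bits;
    return (uint32_t)((blocks + ((UINT64_C(1) << b_to_c_bits) - 1))
                      >> b_to_c_bits);
}
```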

13 Mar, 2015

1 commit

18d585f0f ocfs2: make append_dio an incompat feature

It turns out that making this feature ro_compat isn't quite enough to
prevent accidental corruption on mount from older kernels. Ocfs2 (like
other file systems) will process orphaned inodes even when the user mounts
in 'ro' mode. So for the case of a filesystem not knowing the append_dio
feature, mounting the filesystem could result in orphaned-for-dio files
being deleted, which we clearly don't want.

So instead, turn this into an incompat flag.

Btw, this is kind of my fault - initially I asked that we add a flag to
cover the feature and even suggested that we use an ro flag. It wasn't
until I was looking through our commits for v4.0-rc1 that I realized we
actually want this to be incompat.

Signed-off-by: Mark Fasheh
Cc: Joseph Qi
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mark Fasheh
2015-03-13 09:46:07 +0800
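
The usual superblock feature-flag rule explains why ro_compat was not enough here: unknown ro_compat bits still permit a read-only mount, and per the commit even a read-only mount processes orphans. A sketch of that rule with hypothetical masks:

```c
#include <stdint.h>

/* Hypothetical feature masks that an older kernel build understands. */
#define SUPPORTED_INCOMPAT   UINT32_C(0x0003)
#define SUPPORTED_RO_COMPAT  UINT32_C(0x0001)

enum mount_decision { MOUNT_OK, MOUNT_READ_ONLY_ONLY, MOUNT_REFUSED };

/* Unknown incompat bits forbid any mount; unknown ro_compat bits only
 * allow a read-only mount. */
static enum mount_decision check_features(uint32_t incompat, uint32_t ro_compat)
{
    if (incompat & ~SUPPORTED_INCOMPAT)
        return MOUNT_REFUSED;
    if (ro_compat & ~SUPPORTED_RO_COMPAT)
        return MOUNT_READ_ONLY_ONLY;
    return MOUNT_OK;
}
```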

17 Feb, 2015

3 commits

160cc2666 ocfs2: set append dio as a ro compat feature

Introduce a bit OCFS2_FEATURE_RO_COMPAT_APPEND_DIO and check it in the
write path. If the bit is not set, fall back to the old behavior.

Signed-off-by: Joseph Qi
Cc: Weiwei Wang
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Mark Fasheh
Cc: Xuejiufei
Cc: alex chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-02-17 09:56:05 +0800
24c40b329 ocfs2: implement ocfs2_direct_IO_write

Implement ocfs2_direct_IO_write. Add the inode to the orphan dir first,
and then remove it from the orphan dir once the append O_DIRECT write
has finished.

This makes sure that block allocation and inode size stay consistent.

[akpm@linux-foundation.org: fix it for "block: Add discard flag to blkdev_issue_zeroout() function"]
Signed-off-by: Joseph Qi
Cc: Weiwei Wang
Cc: Junxiao Bi
Cc: Joel Becker
Cc: Mark Fasheh
Cc: Xuejiufei
Cc: alex chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-02-17 09:56:05 +0800
ed460cffc ocfs2: add orphan recovery types in ocfs2_recover_orphans

Define two orphan recovery types, which indicate whether the file needs
to be truncated or not.

Signed-off-by: Joseph Qi
Cc: Weiwei Wang
Cc: Junxiao Bi
Cc: Joel Becker
Cc: Mark Fasheh
Cc: Xuejiufei
Cc: alex chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2015-02-17 09:56:04 +0800
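
Taken together, the three commits above mean that an orphan entry can be either a normal unlinked inode or an interrupted append O_DIRECT write. A hypothetical sketch of the two recovery paths, assuming the truncate path trims allocation back to the committed i_size; the enum and struct names are illustrative, not ocfs2's.

```c
#include <stdint.h>

/* Hypothetical recovery types modelled on the two the commit introduces. */
enum orphan_reco_type {
    ORPHAN_RECO_DELETE,   /* classic orphan: unlinked inode, wipe it */
    ORPHAN_RECO_TRUNCATE, /* interrupted append DIO: trim past i_size */
};

struct orphan_inode {
    uint64_t i_size;          /* committed file size in bytes */
    uint64_t allocated_bytes; /* allocation, possibly beyond i_size */
    int deleted;
};

static void recover_orphan(struct orphan_inode *inode,
                           enum orphan_reco_type type)
{
    switch (type) {
    case ORPHAN_RECO_TRUNCATE:
        /* Drop space allocated past i_size so size and allocation agree. */
        if (inode->allocated_bytes > inode->i_size)
            inode->allocated_bytes = inode->i_size;
        break;
    case ORPHAN_RECO_DELETE:
        inode->deleted = 1;
        break;
    }
}
```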

11 Feb, 2015

1 commit

1dfeb7684 ocfs2: add a mount option journal_async_commit on ocfs2 filesystem

Add a mount option to support the JBD2 feature
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is enabled, the
journal commit block can be written to disk without waiting for the
descriptor blocks, which can improve journal commit performance. This
option enables 'journal_checksum' internally.

In the fs_mark benchmark, journal_async_commit shows a roughly 50%
improvement: the files per second go up from 215.2 to 317.5.

test script:
fs_mark -d /mnt/ocfs2/ -s 10240 -n 1000

default:
FSUse%  Count   Size  Files/sec  App Overhead
     0   1000  10240      215.2         17878

with journal_async_commit option:
FSUse%  Count   Size  Files/sec  App Overhead
     0   1000  10240      317.5         17881

Signed-off-by: Alex Chen
Signed-off-by: Weiwei Wang
Reviewed-by: Joseph Qi
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

alex chen
2015-02-11 06:30:29 +0800
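
As a quick check of the benchmark figures above, the relative throughput gain works out as follows (plain arithmetic on the reported numbers; App Overhead is essentially unchanged, 17878 vs 17881):

```latex
\[
\frac{317.5 - 215.2}{215.2} = \frac{102.3}{215.2} \approx 0.475
\quad\Rightarrow\quad \text{about } 47.5\%\ \text{more files per second}
\]
```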

11 Dec, 2014

1 commit

d1e782387 ocfs2: do not set OCFS2_LOCK_UPCONVERT_FINISHING if nonblocking lock can not be granted at once

ocfs2_readpages() uses the nonblocking flag to avoid page lock
inversion. This can trigger a cluster hang, because the flag
OCFS2_LOCK_UPCONVERT_FINISHING is not cleared if a nonblocking lock
cannot be granted at once. The flag prevents the dc (downconvert) thread
from downconverting, so other nodes can never acquire this lockres.

So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving the
ast if the nonblocking lock has already returned.

Signed-off-by: joyce.xue
Reviewed-by: Junxiao Bi
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xue jiufei
2014-12-11 09:41:03 +0800
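
The fix can be reduced to a guard in the grant callback: only mark the upconvert as finishing if a waiter is still around to clear the flag. This is a hypothetical model; the flag value and struct are illustrative, not dlmglue's definitions.

```c
#include <stdbool.h>

#define LOCK_UPCONVERT_FINISHING 0x1u /* hypothetical flag bit */

struct model_lockres {
    unsigned flags;
    bool nonblock_waiter_gone; /* a nonblocking request already returned */
};

/* Called when the lock is granted (the "ast"). Setting the flag with no
 * waiter left to clear it would block downconverts forever. */
static void model_on_ast(struct model_lockres *res)
{
    if (!res->nonblock_waiter_gone)
        res->flags |= LOCK_UPCONVERT_FINISHING;
}

static bool model_can_downconvert(const struct model_lockres *res)
{
    return !(res->flags & LOCK_UPCONVERT_FINISHING);
}
```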

05 Jun, 2014

1 commit

a9e9acaeb ocfs2: fix umount hang while shutting down truncate log

Revert commit 75f82eaa502c ("ocfs2: fix NULL pointer dereference when
dismount and ocfs2rec simultaneously") because it may cause a umount
hang while shutting down the truncate log.

fix NULL pointer dereference when dismount and ocfs2rec simultaneously

The situation is as follows:

    ocfs2_dismount_volume
     -> ocfs2_recovery_exit
      -> free osb->recovery_map
     -> ocfs2_truncate_shutdown
      -> lock global bitmap inode
      -> ocfs2_wait_for_recovery
       -> check whether osb->recovery_map->rm_used is zero

Because osb->recovery_map has already been freed, rm_used can hold any
value, so umount may hang.

To prevent the NULL pointer dereference while getting sys_root_inode, we
use an osb_tl_disable flag to disable scheduling osb_truncate_log_wq
after the truncate log has been shut down.

Signed-off-by: joyce.xue
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xue jiufei
2014-06-05 07:53:54 +0800

04 Apr, 2014

3 commits

43b10a203 ocfs2: avoid system inode ref confusion by adding mutex lock

The following race can leave a system inode with an extra reference:

    A thread                              B thread
    ocfs2_get_system_file_inode
     ->get_local_system_inode
      ->_ocfs2_get_system_file_inode
    because *arr == NULL,
                                          ocfs2_get_system_file_inode
                                           ->get_local_system_inode
                                            ->_ocfs2_get_system_file_inode
    gets first ref thru
    _ocfs2_get_system_file_inode,
    gets second ref thru igrab and
    sets *arr = inode
                                          at the same moment, B thread also
                                          gets two refs, which leads to one
                                          extra inode ref.

So add a mutex lock to prevent multiple threads from taking two inode
references at the same time.

Signed-off-by: jiangyiwen
Reviewed-by: Joseph Qi
Cc: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

jiangyiwen
2014-04-04 07:20:57 +0800
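
The fix makes the "is the slot empty? then install and take the cache's reference" step atomic. A hypothetical sketch with a pthread mutex; `cached_inode` and `get_cached_inode()` are illustrative, and the lookup of the backing inode is assumed to have happened already.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical refcounted inode standing in for a system inode. */
struct cached_inode {
    int refcount;
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the cached inode with one reference for the caller; the cache
 * slot keeps one more. Checking and filling the slot under one mutex
 * means two racing callers cannot both install it and leak a ref. */
static struct cached_inode *get_cached_inode(struct cached_inode **slot,
                                             struct cached_inode *backing)
{
    struct cached_inode *inode;

    pthread_mutex_lock(&cache_lock);
    if (*slot == NULL) {
        backing->refcount++; /* reference owned by the cache slot */
        *slot = backing;
    }
    inode = *slot;
    inode->refcount++;       /* reference handed to the caller */
    pthread_mutex_unlock(&cache_lock);
    return inode;
}
```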
8ed6b2370 ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock

The following patches are reverted in this patch because they caused a
performance regression in remote unlink() calls.

ea455f8ab683 - ocfs2: Push out dropping of dentry lock to ocfs2_wq
f7b1aa69be13 - ocfs2: Fix deadlock on umount
5fd131893793 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount

Previous patches in this series removed the possible deadlocks from
downconvert thread so the above patches shouldn't be needed anymore.

The regression is caused because these patches delay the iput() in the
case of dentry unlocks. This also delays the unlocking of the open
lockres. The open lock resource is required to test whether the inode
can be wiped from disk. When the deleting node does not get the open
lock, it marks the inode as an orphan (even though it is not in use by
another node/process) and causes a journal checkpoint. This delays
operations following the inode eviction. It also moves the inode to the
orphan directory, which causes more I/O and a lot of unnecessary
orphans.

The following script can be used to generate the load causing issues:

declare -a create
declare -a copy
declare -a remove
declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384)
unique="`mktemp -u XXXXX`"
script="/tmp/idontknow-${unique}.sh"
cat <<EOF > "${script}"
#!/bin/bash
for n in {1..8}; do mkdir -p test/dir\${n}
  eval touch test/dir\${n}/foo{1.."\$1"}
done
EOF
chmod 700 "${script}"

function fcreate ()
{
  exec 2>&1 /usr/bin/time --format=%E "${script}" "$1"
}

function fremove ()
{
  exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*"
}

function fcp ()
{
  exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new"
}

echo -------------------------------------------------
echo "| # files | create #s | copy #s | remove #s |"
echo -------------------------------------------------
for ((x=0; x < ${#iterations[*]} ; x++)) do
  create[$x]="`fcreate ${iterations[$x]}`"
  copy[$x]="`fcp ${iterations[$x]}`"
  remove[$x]="`fremove`"
  printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]}
done
rm "${script}"
echo -------------------------------------------------

Signed-off-by: Srinivas Eeda
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Jan Kara
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Goldwyn Rodrigues
2014-04-04 07:20:55 +0800
e3a767b60 ocfs2: implement delayed dropping of last dquot reference

We cannot drop last dquot reference from downconvert thread as that
creates the following deadlock:

    NODE 1                                  NODE 2
    holds dentry lock for 'foo'
    holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                            dquot_initialize(bar)
                                             ocfs2_dquot_acquire()
                                              ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                              ...
    downconvert thread (triggered from another
    node or a different process from NODE2)
     ocfs2_dentry_post_unlock()
      ...
      iput(foo)
       ocfs2_evict_inode(foo)
        ocfs2_clear_inode(foo)
         dquot_drop(inode)
          ...
          ocfs2_dquot_release()
           ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
            - blocks
                                              finds we need more space in
                                              quota file
                                              ...
                                              ocfs2_extend_no_holes()
                                               ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                - deadlocks waiting for
                                                  downconvert thread

We solve the problem by postponing dropping of the last dquot reference to
a workqueue if it happens from the downconvert thread.

Signed-off-by: Jan Kara
Reviewed-by: Mark Fasheh
Reviewed-by: Srinivas Eeda
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2014-04-04 07:20:54 +0800
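
The deferral can be modelled as: if the last reference would be dropped from the downconvert thread, park the dquot on a list and let a separate worker do the release later. All names here are hypothetical; `dquot_release_now()` stands in for work that takes cluster locks.

```c
#include <stdbool.h>
#include <stddef.h>

struct dquot_model {
    int refcount;
    bool released;
    struct dquot_model *next_deferred; /* link in the deferred list */
};

static void dquot_release_now(struct dquot_model *dq)
{
    dq->released = true; /* stands in for work that takes cluster locks */
}

/* Drop one reference. The last one is postponed when we are running in
 * the downconvert thread, where taking cluster locks could deadlock. */
static void dquot_put(struct dquot_model *dq, bool in_downconvert_thread,
                      struct dquot_model **deferred)
{
    if (dq->refcount > 1) {
        dq->refcount--;
        return;
    }
    if (in_downconvert_thread) {
        dq->next_deferred = *deferred;
        *deferred = dq;
        return;
    }
    dq->refcount = 0;
    dquot_release_now(dq);
}

/* Worker context: drop the postponed last references. */
static void run_deferred(struct dquot_model **deferred)
{
    while (*deferred) {
        struct dquot_model *dq = *deferred;
        *deferred = dq->next_deferred;
        dq->refcount = 0;
        dquot_release_now(dq);
    }
}
```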

22 Jan, 2014

1 commit

c74a3bdd9 ocfs2: add clustername to cluster connection

This is an effort to remove ocfs2_controld.pcmk and bring ocfs2 DLM
handling up to date with DLM (>= 4.0.1) and corosync (2.3.x). AFAIK,
cman is also being phased out in favor of a unified corosync cluster
stack.

fs/dlm performs all the functions with respect to fencing and node
management and provides the APIs to do so for ocfs2. For all future
references, DLM stands for the fs/dlm code.

The advantages are:
+ No need to run an additional userspace daemon (ocfs2_controld)
+ No controld device handling and controld protocol
+ Shifting responsibilities of node management to DLM layer

For backward compatibility, we are keeping the controld handling code.
Once enough time has passed we can remove a significant portion of the
code. This was tested by using the kernel with changes on older
unmodified tools. The kernel used ocfs2_controld as expected, and
displayed the appropriate warning message.

This feature requires modifications to the userspace ocfs2-tools. The
changes can be found at https://github.com/goldwynr/ocfs2-tools, branch
nocontrold. Currently, not many checks are present in the userspace
code, but that will change soon.

This patch (of 6):

Add clustername to cluster connection.

Signed-off-by: Goldwyn Rodrigues
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Goldwyn Rodrigues
2014-01-22 08:19:41 +0800

04 Jul, 2013

1 commit

8fa9d17f9 ocfs2: remove unecessary variable needs_checkpoint

Code cleanup: needs_checkpoint is assigned to but never used. Delete
the variable.

Signed-off-by: Goldwyn Rodrigues
Cc: Jeff Liu
Acked-by: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Goldwyn Rodrigues
2013-07-04 07:07:23 +0800

02 Dec, 2011

1 commit

939255798 ocfs2: avoid unaligned access to dqc_bitmap

The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned. The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit(). These
are wrapper macros for ext2_*_bit(), which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
addresses correctly).

So some 64-bit architectures may not be able to access the dqc_bitmap
correctly.

This avoids such unaligned access by using other wrapper functions for
ext2_*_bit(). The code is taken from fs/ext4/mballoc.c, which also needs
to handle unaligned bitmap access.

Signed-off-by: Akinobu Mita
Acked-by: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Joel Becker

Akinobu Mita
2011-12-02 06:39:32 +0800
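
The ext4-style workaround rounds the bitmap pointer down to an unsigned-long boundary and adds the skipped bytes (8 bits each) to the bit number, so word-sized bitops see an aligned address. A userspace sketch of that adjustment; the helper names are hypothetical, and the byte-wise test is only there to show both forms address the same bit.

```c
#include <stdint.h>

/* Align a bitmap pointer down to an unsigned long boundary and fold the
 * skipped bytes into the bit number (little-endian bit numbering). */
static void *align_bitmap_addr(void *addr, unsigned *bit)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t mask = sizeof(unsigned long) - 1;

    *bit += (unsigned)((a & mask) << 3);
    return (void *)(a & ~mask);
}

/* Little-endian bit test done byte by byte; valid at any alignment. */
static int test_bit_le_bytes(const void *addr, unsigned nr)
{
    const unsigned char *p = addr;
    return (p[nr >> 3] >> (nr & 7)) & 1;
}
```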

01 Jun, 2011

1 commit

730e663bd ocfs2: use proper little-endian bitops

Calls to __test_and_{set,clear}_bit_le() whose return value is ignored
can be replaced with __{set,clear}_bit_le().

Signed-off-by: Akinobu Mita
Cc: Joel Becker
Cc: Mark Fasheh
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: Joel Becker

Akinobu Mita
2011-06-01 10:03:45 +0800

29 Mar, 2011

1 commit

03e4970c1 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (39 commits)
Treat writes as new when holes span across page boundaries
fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS.
ocfs2/dlm: Move kmalloc() outside the spinlock
ocfs2: Make the left masklogs compat.
ocfs2: Remove masklog ML_AIO.
ocfs2: Remove masklog ML_UPTODATE.
ocfs2: Remove masklog ML_BH_IO.
ocfs2: Remove masklog ML_JOURNAL.
ocfs2: Remove masklog ML_EXPORT.
ocfs2: Remove masklog ML_DCACHE.
ocfs2: Remove masklog ML_NAMEI.
ocfs2: Remove mlog(0) from fs/ocfs2/dir.c
ocfs2: remove NAMEI from symlink.c
ocfs2: Remove masklog ML_QUOTA.
ocfs2: Remove mlog(0) from quota_local.c.
ocfs2: Remove masklog ML_RESERVATIONS.
ocfs2: Remove masklog ML_XATTR.
ocfs2: Remove masklog ML_SUPER.
ocfs2: Remove mlog(0) from fs/ocfs2/heartbeat.c
ocfs2: Remove mlog(0) from fs/ocfs2/slot_map.c
...

Fix up trivial conflict in fs/ocfs2/super.c

Linus Torvalds
2011-03-29 04:03:31 +0800

24 Mar, 2011

1 commit

c4354d0d6 ocfs2: use little-endian bitops

As preparation for removing the ext2 non-atomic bit operations from
asm/bitops.h, this converts the ext2 non-atomic bit operations to
little-endian bit operations.

Signed-off-by: Akinobu Mita
Acked-by: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2011-03-24 10:46:17 +0800

20 Feb, 2011

1 commit

5bc970e80 ocfs2: Use hrtimer to track ocfs2 fs lock stats

This patch makes use of the hrtimer to track times in the ocfs2 lock stats.

The patch is a bit involved to ensure no additional impact on the memory
footprint. The size of ocfs2_inode_cache remains 1280 bytes on 32-bit systems.

A related change was to modify the unit of the max wait time from nanosec to
microsec allowing us to track max time larger than 4 secs. This change
necessitated the bumping of the output version in the debugfs file,
locking_state, from 2 to 3.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2011-02-20 19:56:07 +0800
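
The 4-second limit follows from the counter width. Assuming the max wait is stored in an unsigned 32-bit field (which is what a limit of about 4 seconds in nanoseconds implies), switching the unit to microseconds extends the range as follows:

```latex
\[
\frac{2^{32}-1\ \text{ns}}{10^{9}\ \text{ns/s}} \approx 4.29\ \text{s},
\qquad
\frac{2^{32}-1\ \mu\text{s}}{10^{6}\ \mu\text{s/s}} \approx 4295\ \text{s} \approx 71.6\ \text{min}
\]
```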

16 Dec, 2010

1 commit

50308d813 ocfs2: Try to free truncate log when meeting ENOSPC in write.

Recently, one of our colleagues ran into a problem: if we repeatedly
write and delete a 32MB file, we eventually get ENOSPC. The
corresponding bug is 1288.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288

The real problem is that although we have freed the clusters, they sit
in the truncate log, where they are accumulated so that they can be
freed all at once.

This patch tries to resolve that. If we see -ENOSPC in
ocfs2_write_begin_no_lock, we check whether the truncate log has enough
clusters for our needs; if so, we flush the truncate log at that point
and try again. This method was inspired by Mark Fasheh. Thanks.

Cc: Mark Fasheh
Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-12-16 16:46:02 +0800
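
The retry logic can be sketched with a toy allocator that separates free clusters from clusters still parked in the truncate log. Names and structure are hypothetical; the real code works on ocfs2 allocator and truncate-log state.

```c
#include <errno.h>
#include <stdint.h>

/* Free clusters plus clusters freed but still held in the truncate log. */
struct model_alloc {
    uint32_t free_clusters;
    uint32_t truncate_log_clusters;
};

static int model_claim(struct model_alloc *a, uint32_t want)
{
    if (a->free_clusters < want)
        return -ENOSPC;
    a->free_clusters -= want;
    return 0;
}

static void model_flush_truncate_log(struct model_alloc *a)
{
    a->free_clusters += a->truncate_log_clusters;
    a->truncate_log_clusters = 0;
}

/* On ENOSPC, flush the truncate log and retry once, but only if the log
 * holds enough clusters to satisfy the request. */
static int model_write_begin(struct model_alloc *a, uint32_t want)
{
    int ret = model_claim(a, want);

    if (ret == -ENOSPC && a->truncate_log_clusters >= want) {
        model_flush_truncate_log(a);
        ret = model_claim(a, want);
    }
    return ret;
}
```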

19 Nov, 2010

1 commit

a2a2f5529 ocfs2: char is not always signed

Commit 1c66b360fe262 (Change some lock status member in ocfs2_lock_res
to char.) states that these fields need to be signed due to comparison
with -1, but it only changed the type from unsigned char to char.
However, whether char is signed or unsigned is up to the compiler (and
target). Change these fields to signed char so the code works with all
compilers.

Signed-off-by: Milton Miller
Signed-off-by: Joel Becker

Milton Miller
2010-11-19 06:10:56 +0800

13 Nov, 2010

1 commit

1c66b360f ocfs2: Change some lock status member in ocfs2_lock_res to char.

Commit 83fd9c7 changed l_level, l_requested and l_blocking of
ocfs2_lock_res from int to unsigned char. But these fields are
initialized to -1 (in ocfs2_lock_res_init_common), which corresponds to
255 for an unsigned char. As a result, the whole dlm lock mechanism no
longer works, which is a disaster for ocfs2.

Cc: Goldwyn Rodrigues
Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-11-13 19:15:08 +0800
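
The two commits above come down to how a byte-sized field stores -1. A small illustration: an unsigned char holding -1 becomes 255 and never compares equal to -1, while signed char works everywhere. Plain char is left out because its signedness varies by compiler and target, which is exactly the problem the second commit fixes. The struct names are hypothetical.

```c
/* Lock levels use -1 as "invalid"; storing that in a byte-sized field is
 * only portable when the field is explicitly signed. */
struct levels_unsigned { unsigned char l_level; };
struct levels_signed   { signed char   l_level; };

static int unsigned_field_sees_minus_one(void)
{
    struct levels_unsigned l = { .l_level = (unsigned char)-1 };
    return l.l_level == -1; /* promoted to int 255: never true */
}

static int signed_field_sees_minus_one(void)
{
    struct levels_signed l = { .l_level = -1 };
    return l.l_level == -1; /* true on every conforming compiler */
}
```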

16 Oct, 2010

1 commit

fc3718918 Merge branch 'globalheartbeat-2' of git://oss.oracle.com/git/smushran/linux-2.6 into ocfs2-merge-window

Conflicts:
fs/ocfs2/ocfs2.h

Joel Becker
2010-10-16 04:03:09 +0800

12 Oct, 2010

1 commit

7bdb0d18b ocfs2: Add a mount option "coherency=*" to handle cluster coherency for O_DIRECT writes.

Currently, the default behavior of O_DIRECT writes is to allow
concurrent writes among nodes to the same file, with no cluster
coherency guaranteed (no EX lock held). This can leave stale data in
the cache for buffered reads on other nodes.

The new mount option provides a choice between two different behaviors
for O_DIRECT writes:

* coherency=full, the default, disallows concurrent O_DIRECT writes by
  taking EX locks.

* coherency=buffered allows concurrent O_DIRECT writes without an EX
  lock among nodes, which gains high performance at the risk of getting
  stale data on other nodes.

Signed-off-by: Tristan Ye
Signed-off-by: Joel Becker

Tristan Ye
2010-10-12 05:14:55 +0800

10 Oct, 2010

1 commit

98f486f23 ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO

OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for
both userspace and o2cb cluster stacks. It also allows us to extend cluster
info to include stack flags.

This patch also adds stackflags to sb->s_clusterinfo. It also introduces a
clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled
global heartbeat mode.

This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The
clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack.

Signed-off-by: Sunil Mushran

Sunil Mushran
2010-10-10 01:24:46 +0800

08 Oct, 2010

1 commit

2c442719e ocfs2: Add support for heartbeat=global mount option

Add support for the heartbeat=global mount option. It ensures that the
heartbeat mode passed at mount time matches the one enabled on disk.

Signed-off-by: Sunil Mushran

Sunil Mushran
2010-10-08 06:23:50 +0800
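
The mount-time check pairs the heartbeat=global option with the on-disk clusterinfo flag introduced just before it. A hypothetical sketch: the flag value, enum and function are illustrative, not the real OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT definition or the ocfs2 mount code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit in the on-disk stack flags meaning "global heartbeat". */
#define DISK_FLAG_GLOBAL_HEARTBEAT 0x1u

enum hb_mode { HB_LOCAL, HB_GLOBAL };

/* Reject a mount whose requested heartbeat mode disagrees with the disk. */
static int check_heartbeat_mode(enum hb_mode requested, uint8_t disk_stackflags)
{
    bool disk_global = (disk_stackflags & DISK_FLAG_GLOBAL_HEARTBEAT) != 0;

    if ((requested == HB_GLOBAL) != disk_global)
        return -EINVAL;
    return 0;
}
```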

10 Sep, 2010

1 commit

b4d693fcc ocfs2: Cache system inodes of other slots.

During orphan scan, if we are slot 0 and we are replaying
orphan_dir:0001, the general process for every file in this dir is:
1. iget orphan_dir:0001. Since there is no inode for it, we have to
   create an inode and read it from the disk.
2. Do the normal work, such as delete_inode, and remove the file from
   the dir if it is allowed.
3. iput orphan_dir:0001 when we are done. In this case, since we have
   no dcache for this inode, i_count will reach 0 and the VFS will call
   clear_inode; in ocfs2_clear_inode we checkpoint the inode, which
   makes ocfs2_cmt and journald begin to work.
4. Loop back to 1 for the next file.

So for every deleted file, we have to read the orphan dir from the disk
and checkpoint the journal. This is very time consuming and causes a lot
of journal checkpoint I/O.
A better solution is to hold another reference to these inodes in
ocfs2_super. Then, if there is no race among nodes (which would make
dlmglue checkpoint the inode), clear_inode won't be called in step 3,
and in step 1 we only need to read the inode the first time. This is a
big win for us.

So this patch will try to cache system inodes of other slots so
that we will have one more reference for these inodes and avoid
the extra inode read and journal checkpoint.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-10 23:56:24 +0800
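
The benefit of the extra cached reference can be shown by counting disk reads and evictions across the scan loop described above. This is a hypothetical refcount model: `model_iget()`/`model_iput()` are not the VFS functions, and the eviction counter stands in for the clear_inode checkpoint path.

```c
/* Hypothetical in-memory inode: i_count plus counters for side effects. */
struct slot_inode {
    int i_count;
    int disk_reads; /* inode had to be read from disk */
    int evictions;  /* i_count hit 0: clear_inode / checkpoint path */
};

static void model_iget(struct slot_inode *ino)
{
    if (ino->i_count == 0)
        ino->disk_reads++;
    ino->i_count++;
}

static void model_iput(struct slot_inode *ino)
{
    if (--ino->i_count == 0)
        ino->evictions++;
}
```

Holding one long-lived reference (as ocfs2_super does after the patch) keeps i_count above zero, so the per-file iget/iput pairs no longer trigger a disk read and an eviction each time.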