08 May, 2019
2 commits
-
Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no
such thing as an "unrevoke" object; all this function does is remove
existing revoke objects plus some bookkeeping.Signed-off-by: Andreas Gruenbacher
-
Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to
sd_log_ordered: not sure what le stands for here, but it doesn't add
clarity, and if it stands for list entry, it's actually confusing as
those are both list heads but not list entries.Signed-off-by: Andreas Gruenbacher
12 Dec, 2018
1 commit
-
Field bd_ops was set but never used, so I removed it, and all
code supporting it.Signed-off-by: Bob Peterson
Acked-by: Steven Whitehouse
Signed-off-by: Andreas Gruenbacher
06 Oct, 2018
1 commit
-
Before this patch, various errors and messages were reported using
the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does
not tell you which gfs2 mount had the problem, which is often vital
to debugging. This patch changes the calls from pr_* to fs_* in
most of the messages so that the file system id is printed along
with the message.Signed-off-by: Bob Peterson
04 Jun, 2018
1 commit
-
In journaled data mode, we need to add each buffer head to the current
transaction. In ordered write mode, we only need to add the inode to
the ordered inode list. So far, both cases are handled in
gfs2_trans_add_data. This makes the code look misleading and is
inefficient for small block sizes as well. Handle both cases separately
instead.Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
23 Jan, 2018
2 commits
-
This patch just adds the capability for GFS2 to track which function
called gfs2_log_flush. This should make it easier to diagnose
problems based on the sequence of events found in the journals.Signed-off-by: Bob Peterson
Reviewed-by: Andreas Gruenbacher -
This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers
(i.e., this change is backwards compatible). Some of these are used for
debug purposes so we can backtrack when problems occur. Others are
reserved for future expansion.This patch is based on a prototype from Steve Whitehouse.
Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher
17 Jan, 2018
1 commit
-
The current transaction is being dereferenced before asserting that is
not NULL; that isn't going to help.Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
28 Nov, 2017
1 commit
-
This is a pure automated search-and-replace of the internal kernel
superblock flags.The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
31 Oct, 2017
1 commit
-
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
31 Jan, 2017
1 commit
-
This patch modifies functions gfs2_trans_add_meta and _data so that
they check whether the buffer_head is already in a transaction,
and if so, avoid taking the gfs2_log_lock.Signed-off-by: Bob Peterson
27 Jan, 2017
2 commits
-
This patch simply combines function meta_lo_add with its only
caller, trans_add_meta. This makes the code easier to read and
will make it easier to reduce contention on gfs2_log_lock.Signed-off-by: Bob Peterson
-
This patch eliminates the int variable tr_touched in favor of a
new flag in the transaction. This is a step toward reducing contention
on the gfs2_log_lock spin_lock.Signed-off-by: Bob Peterson
02 Oct, 2015
1 commit
-
This patch fixes a timing window that causes a segfault.
The problem is that bd can remain NULL throughout the function
and then reference that NULL pointer if the bh->b_private starts
out NULL, then someone sets it to non-NULL inside the locking.
In that case, bd still needs to be set.Signed-off-by: Bob Peterson
04 Sep, 2015
1 commit
-
What uniquely identifies a glock in the glock hash table is not
gl_name, but gl_name and its superblock pointer. This patch makes
the gl_name field correspond to a unique glock identifier. That will
allow us to simplify hashing with a future patch, since the hash
algorithm can then take the gl_name and hash its components in one
operation.Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher
Acked-by: Steven Whitehouse
17 Nov, 2014
1 commit
-
The current gfs2 freezing code is considerably more complicated than it
should be because it doesn't use the vfs freezing code on any node except
the one that begins the freeze. This is because it needs to acquire a
cluster glock before calling the vfs code to prevent a deadlock, and
without the new freeze_super and thaw_super hooks, that was impossible. To
deal with the issue, gfs2 had to do some hacky locking tricks to make sure
that a frozen node couldn't be holding on a lock it needed to do the
unfreeze ioctl.This patch makes use of the new hooks to simply the gfs2 locking code. Now,
all the nodes in the cluster freeze and thaw in exactly the same way. Every
node in the cluster caches the freeze glock in the shared state. The new
freeze_super hook allows the freezing node to grab this freeze glock in
the exclusive state without first calling the vfs freeze_super function.
All the nodes in the cluster see this lock change, and call the vfs
freeze_super function. The vfs locking code guarantees that the nodes can't
get stuck holding the glocks necessary to unfreeze the system. To
unfreeze, the freezing node uses the new thaw_super hook to drop the freeze
glock. Again, all the nodes notice this, reacquire the glock in shared mode
and call the vfs thaw_super function.Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse
08 Oct, 2014
1 commit
-
use macro definition
Signed-off-by: Fabian Frederick
Signed-off-by: Steven Whitehouse
14 May, 2014
1 commit
-
GFS2 has a transaction glock, which must be grabbed for every
transaction, whose purpose is to deal with freezing the filesystem.
Aside from this involving a large amount of locking, it is very easy to
make the current fsfreeze code hang on unfreezing.This patch rewrites how gfs2 handles freezing the filesystem. The
transaction glock is removed. In it's place is a freeze glock, which is
cached (but not held) in a shared state by every node in the cluster
when the filesystem is mounted. This lock only needs to be grabbed on
freezing, and actions which need to be safe from freezing, like
recovery.When a node wants to freeze the filesystem, it grabs this glock
exclusively. When the freeze glock state changes on the nodes (either
from shared to unlocked, or shared to exclusive), the filesystem does a
special log flush. gfs2_log_flush() does all the work for flushing out
the and shutting down the incore log, and then it tries to grab the
freeze glock in a shared state again. Since the filesystem is stuck in
gfs2_log_flush, no new transaction can start, and nothing can be written
to disk. Unfreezing the filesytem simply involes dropping the freeze
glock, allowing gfs2_log_flush() to grab and then release the shared
lock, so it is cached for next time.However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
shared lock on the filesystem root directory inode to check permissions.
If that glock has already been grabbed exclusively, fsfreeze will be
unable to get the shared lock and unfreeze the filesystem.In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
on the filesystem root directory during the freeze, and hold it until it
unfreezes the filesystem. The functions which need to grab a shared
lock in order to allow the unfreeze ioctl to be issued now use the lock
grabbed by the freeze code instead.The freeze and unfreeze code take care to make sure that this shared
lock will not be dropped while another process is using it.Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse
07 Mar, 2014
2 commits
-
Add pr_fmt, remove embedded "GFS2: " prefixes.
This now consistently emits lower case "gfs2: " for each message.Other miscellanea around these changes:
o Add missing newlines
o Coalesce formats
o Realign argumentsSigned-off-by: Joe Perches
Signed-off-by: Steven Whitehouse -
-All printk(KERN_foo converted to pr_foo().
-Messages updated to fit in 80 columns.
-fs_macros converted as well.
-fs_printk removed.Signed-off-by: Fabian Frederick
Signed-off-by: Steven Whitehouse
25 Feb, 2014
2 commits
-
Now we have a master transaction into which other transactions
are merged, the accounting can be done using this master
transaction. We no longer require the superblock fields which
were being used for this function.In addition, this allows for a clean up in calc_reserved()
making it rather easier understand. Also, by reducing the
number of variables used to track the buffers being added
and removed from the journal, a number of error checks are
now no longer required.Signed-off-by: Steven Whitehouse
-
Over time, we hope to be able to improve the concurrency available
in the log code. This is one small step towards that, by moving
the buffer lists from the super block, and into the transaction
structure, so that each transaction builds its own buffer lists.At transaction commit time, the buffer lists are merged into
the currently accumulating transaction. That transaction then
is passed into the before and after commit functions at journal
flush time. Thus there should be no change in overall behaviour
yet.Signed-off-by: Steven Whitehouse
21 Feb, 2014
1 commit
-
A couple of "int" fields were being used as boolean values
so we can make them bitfields of one bit, and put them in
what might otherwise be a hole in the structure with 64
bit alignment.Signed-off-by: Steven Whitehouse
20 Jun, 2013
1 commit
-
This patch fixes a warning message introduced in the recent
"GFS2: aggressively issue revokes in gfs2_log_flush" patch.Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse
19 Jun, 2013
1 commit
-
This patch looks at all the outstanding blocks in all the transactions
on the log, and moves the completed ones to the ail2 list. Then it
issues revokes for these blocks. This will hopefully speed things up
in situations where there is a lot of contention for glocks, especially
if they are acquired serially.revoke_lo_before_commit will issue at most one log block's full of these
preemptive revokes. The amount of reserved log space that
gfs2_log_reserve() ignores has been incremented to allow for this extra
block.This patch also consolidates the common revoke instructions into one
function, gfs2_add_revoke().Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse
01 May, 2013
1 commit
-
Pull GFS2 updates from Steven Whitehouse:
"There is not a whole lot of change this time - there are some further
changes which are in the works, but those will be held over until next
time.Here there are some clean ups to inode creation, the addition of an
origin (local or remote) indicator to glock demote requests, removal
of one of the remaining GFP_NOFAIL allocations during log flushes, one
minor clean up, and a one liner bug fix."* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: Flush work queue before clearing glock hash tables
GFS2: Add origin indicator to glock demote tracing
GFS2: Add origin indicator to glock callbacks
GFS2: replace gfs2_ail structure with gfs2_trans
GFS2: Remove vestigial parameter ip from function rs_deltree
GFS2: Use gfs2_dinode_out() in the inode create path
GFS2: Remove gfs2_refresh_inode from inode creation path
GFS2: Clean up inode creation path
29 Apr, 2013
1 commit
-
Use the new vsprintf extension to avoid any possible
message interleaving.Signed-off-by: Joe Perches
Acked-by: Steven Whitehouse
Signed-off-by: Jiri Kosina
08 Apr, 2013
1 commit
-
In order to allow transactions and log flushes to happen at the same
time, gfs2 needs to move the transaction accounting and active items
list code into the gfs2_trans structure. As a first step toward this,
this patch removes the gfs2_ail structure, and handles the active items
list in the gfs_trans structure. This keeps gfs2 from allocating an ail
structure on log flushes, and gives us a struture that can later be used
to store the transaction accounting outside of the gfs2 superblock
structure.With this patch, at the end of a transaction, gfs2 will add the
gfs2_trans structure to the superblock if there is not one already.
This structure now has the active items fields that were previously in
gfs2_ail. This is not necessary in the case where the transaction was
simply used to add revokes, since these are never written outside of the
journal, and thus, don't need an active items list.Also, in order to make sure that the transaction structure is not
removed while it's still in use by gfs2_trans_end, unlocking the
sd_log_flush_lock has to happen slightly later in ending the
transaction.Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse
29 Jan, 2013
5 commits
-
Instead of using a list of buffers to write ahead of the journal
flush, this now uses a list of inodes and calls ->writepages
via filemap_fdatawrite() in order to achieve the same thing. For
most use cases this results in a shorter ordered write list,
as well as much larger i/os being issued.The ordered write list is sorted by inode number before writing
in order to retain the disk block ordering between inodes as
per the previous code.The previous ordered write code used to conflict in its assumptions
about how to write out the disk blocks with mpage_writepages()
so that with this updated version we can also use mpage_writepages()
for GFS2's ordered write, writepages implementation. So we will
also send larger i/os from writeback too.Signed-off-by: Steven Whitehouse
-
The locking in gfs2_attach_bufdata() was type specific (data/meta)
which made the function rather confusing. This patch moves the core
of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata()
and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta()As a result all of the locking related to adding data and metadata to
the journal is now in these two functions. This should help to clarify
what is going on, and give us some opportunities to simplify in
some cases.Signed-off-by: Steven Whitehouse
-
This patch copies the body of gfs2_trans_add_bh into the two newly
added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can
then move the .lo_add functions from lops.c into trans.c and call
them directly.As a result of this, we no longer need to use the .lo_add functions
at all, so that is removed from the log operations structure.Signed-off-by: Steven Whitehouse
-
There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.Signed-off-by: Steven Whitehouse
-
This moves the lo_add function for revokes into trans.c, removing
a function call and making the code easier to read.Signed-off-by: Steven Whitehouse
07 Nov, 2012
1 commit
-
In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse
31 Jul, 2012
1 commit
-
We update gfs2_page_mkwrite() to use new freeze protection and the transaction
code to use freeze protection while the transaction is running. That is needed
to stop iput() of unlinked file from modifying the filesystem. The rest is
handled by the generic code.CC: cluster-devel@redhat.com
CC: Steven Whitehouse
Acked-by: Steven Whitehouse
Signed-off-by: Jan Kara
Signed-off-by: Al Viro
02 May, 2012
1 commit
-
This patch eliminates the gfs2_log_element data structure and
rolls its two components into the gfs2_bufdata. This makes the code
easier to understand and makes it easier to migrate to a rbtree
to keep the list sorted.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
24 Apr, 2012
1 commit
-
This is another clean up in the logging code. This per-transaction
list was largely unused. Its main function was to ensure that the
number of buffers in a transaction was correct, however that counter
was only used to check the number of buffers in the bd_list_tr, plus
an assert at the end of each transaction. With the assert now changed
to use the calculated buffer counts, we can remove both bd_list_tr and
its associated counter.This should make the code easier to understand as well as shrinking
a couple of structures.Signed-off-by: Steven Whitehouse
21 Oct, 2011
1 commit
-
Here is an update of Bob's original rbtree patch which, in addition, also
resolves the rather strange ref counting that was being done relating to
the bitmap blocks.Originally we had a dual system for journaling resource groups. The metadata
blocks were journaled and also the rgrp itself was added to a list. The reason
for adding the rgrp to the list in the journal was so that the "repolish
clones" code could be run to update the free space, and potentially send any
discard requests when the log was flushed. This was done by comparing the
"cloned" bitmap with what had been written back on disk during the transaction
commit.Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
until the journal had been flushed. For that reason, there was a rather
complicated set up in the ->go_lock ->go_unlock functions for rgrps involving
both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
count on the buffers.However, the journal maintains a reference count on the buffers anyway, since
they are being journaled as metadata buffers. So by moving the code which deals
with the post-journal accounting for bitmap blocks to the metadata journaling
code, we can entirely dispense with the rather strange buffer ref counting
scheme and also the requirement to journal the rgrps.The net result of all this is that the ->sd_rindex_spin is left to do exactly
one job, and that is to look after the rbtree or rgrps.This patch is designed to be a stepping stone towards using RCU for the rbtree
of resource groups, however the reduction in the number of uses of the
->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
anyway.The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also
be removed in future in favour of calling the functions directly where required
in the code. That will allow locking of resource groups without needing to
actually read them in - something that could be useful in speeding up statfs.In the mean time though it is valid to dereference ->bi_bh only when the rgrp
is locked. This is basically the same rule as before, modulo the references not
being valid until the following journal flush.Signed-off-by: Steven Whitehouse
Signed-off-by: Bob Peterson
Cc: Benjamin Marzinski
05 May, 2010
1 commit
-
This patch contains various tweaks to how log flushes and active item writeback
work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits
for gfs2_logd to do the log flushing. Multiple functions were rewritten to
remove the need to call gfs2_log_lock(). Instead of using one test to see if
gfs2_logd had work to do, there are now seperate tests to check if there
are two many buffers in the incore log or if there are two many items on the
active items list.This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
some minor changes. Since gfs2_ail1_start always submits all the active items,
it no longer needs to keep track of the first ai submitted, so this has been
removed. In gfs2_log_reserve(), the order of the calls to
prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
been switched. If it called wake_up first there was a small window for a race,
where logd could run and return before gfs2_log_reserve was ready to get woken
up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
would be left waiting for gfs2_logd to eventualy run because it timed out.
Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
out, and flushes the log, can now be set on mount with ar_commit.Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse
13 May, 2009
1 commit
-
There seems little point grabbing the transaction glock
only to have to release it again if the journal isn't
live. This moves the test earlier to avoid grabbing the lock
when we don't need it in the first place.Signed-off-by: Steven Whitehouse