04 Aug, 2012
1 commit
-
The pdflush thread is long gone, so this patch removes references to pdflush
from gfs comments.Cc: Steven Whitehouse
Signed-off-by: Artem Bityutskiy
Signed-off-by: Al Viro
02 Aug, 2012
1 commit
-
Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...
31 Jul, 2012
2 commits
-
We update gfs2_page_mkwrite() to use new freeze protection and the transaction
code to use freeze protection while the transaction is running. That is needed
to stop iput() of unlinked file from modifying the filesystem. The rest is
handled by the generic code.CC: cluster-devel@redhat.com
CC: Steven Whitehouse
Acked-by: Steven Whitehouse
Signed-off-by: Jan Kara
Signed-off-by: Al Viro -
CC: Steven Whitehouse
CC: cluster-devel@redhat.com
Acked-by: Steven Whitehouse
Signed-off-by: Jan Kara
Signed-off-by: Al Viro
25 Jul, 2012
1 commit
-
Pull GFS2 updates from Steven Whitehouse.
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: Eliminate 64-bit divides
GFS2: Reduce file fragmentation
GFS2: kernel panic with small gfs2 filesystems - 1 RG
GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs
GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve
GFS2: Add kobject release method
GFS2: Size seq_file buffer more carefully
GFS2: Use seq_vprintf for glocks debugfs file
seq_file: Add seq_vprintf function and export it
GFS2: Use lvbs for storing rgrp information with mount option
GFS2: Cache last hash bucket for glock seq_files
GFS2: Increase buffer size for glocks and glstats debugfs files
GFS2: Fix error handling when reading an invalid block from the journal
GFS2: Add "top dir" flag support
GFS2: Fold quota data into the reservations struct
GFS2: Extend the life of the reservations
23 Jul, 2012
2 commits
-
Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.CC: "Theodore Ts'o"
CC: Joel Becker
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse
Acked-by: Dave Kleikamp
Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara
Signed-off-by: Al Viro -
Split off part of dquot_quota_sync() which writes dquots into a quota file
to a separate function. In the next patch we will use the function from
filesystems and we do not want to abuse ->quota_sync quotactl callback more
than necessary.Acked-by: Steven Whitehouse
Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara
Signed-off-by: Al Viro
21 Jul, 2012
1 commit
-
This patch removes the 64-bit divides introduced in the previous patch
in favor of shifting, so that it will compile properly on 32-bit machines.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
19 Jul, 2012
1 commit
-
This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
resulting improved on disk layout greatly speeds up operations in cases
which would have resulted in interlaced allocation of blocks previously.
A typical example of this is 10 parallel dd processes, each writing to a
file in a common dirctory.The implementation uses an rbtree of reservations attached to each
resource group (and each inode).Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
18 Jul, 2012
1 commit
-
In the unlikely setup where there's only one resource group in the gfs2
filesystem, gfs2_rgrpd_get_next() returns a NULL rgd that is not dealt with
properly, causing a kernel NULL ptr dereference. This patch fixes this issue.Signed-off-by: Abhi Das
Signed-off-by: Steven Whitehouse
14 Jul, 2012
4 commits
-
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called. They could also be passed to the
compare function.Signed-off-by: David Howells
Signed-off-by: Al Viro -
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.Signed-off-by: Al Viro
-
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...Signed-off-by: Al Viro
-
Just the lookup flags. Die, bastard, die...
Signed-off-by: Al Viro
28 Jun, 2012
1 commit
-
This patch fixes buffer_head double free in following code path:
gfs2_block_map
=> gfs2_meta_inode_buffer
=> gfs2_meta_indirect_buffer
=> gfs2_meta_read
=> release_metapathgfs2_block_map calls gfs2_meta_inode_buffer with &mp.mp_bh[0]
as an argument. mp.mp_bh are filled with zero at the beginning
of gfs2_block_map.If gfs2_meta_inode_buffer returns non-zero value, gfs2_block_map
calls release_metapath to free buffers chained to mp.mp_bh.
release_metapath checks each slot of mp.mp_bh[i] and
free(with brelse) unless the slot is filled with NULL.&mp.mp_bh[0] passed to gfs2_meta_inode_buffer is filled at
gfs2_meta_read. gfs2_meta_read is filled a buffer allocated with
gfs2_getbuf even if EIO occurs. When EIO occurs, the allocated buffer
is brelse'ed though the pointer(wrong poiner) points the brelse'ed is
passed back to caller via an argument bhp.gfs2_meta_indirect_buffer, the caller also pass the wrong pointer
to its caller with EIO. Finally gfs2_block_map gets both EIO and
&mp.mp_bh[0] filled with the wrong pointer. release_metapath
calls brelse again on the wrong pointer.Signed-off-by: Masatake YAMATO
Signed-off-by: Steven Whitehouse
14 Jun, 2012
1 commit
-
This function combines rgrp functions get_local_rgrp and
gfs2_inplace_reserve so that the double retry loop is gone.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
13 Jun, 2012
1 commit
-
This patch adds a kobject release function that properly maintains
the kobject use count, so that accesses to the sysfs files do not
cause an access to freed kernel memory after an unmount.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
11 Jun, 2012
2 commits
-
This places a limit on the buffer size for archs with larger
PAGE_SIZE.Signed-off-by: Steven Whitehouse
Reported-by: Eric Dumazet -
Make use of the newly added seq_vprintf() function.
Signed-off-by: Steven Whitehouse
Reported-by: Eric Dumazet
Acked-by: Al Viro
08 Jun, 2012
2 commits
-
Instead of reading in the resource groups when gfs2 is checking
for free space to allocate from, gfs2 can store the necessary infromation
in the resource group's lvb. Also, instead of searching for unlinked
inodes in every resource group that's checked for free space, gfs2 can
store the number of unlinked but inodes in the lvb, and only check for
unlinked inodes if it will find some.The first time a resource group is locked, the lvb must initialized.
Since this involves counting the unlinked inodes in the resource group,
this takes a little extra time. But after that, if the resource group
is locked with GL_SKIP, the buffer head won't be read in unless it's
actually needed.Enabling the resource groups lvbs is done via the rgrplvb mount option. If
this option isn't set, the lvbs will still be set and updated, but they won't
be verfied or used by the filesystem. To safely turn on this option, all of
the nodes mounting the filesystem must be running code with this patch, and
the filesystem must have been completely unmounted since they were updated.Signed-off-by: Benjamin Marzinski
Signed-off-by: Steven Whitehouse -
For the glocks and glstats seq_files, which are exposed via debugfs
we should cache the most recent hash bucket, along with the offset
into that bucket. This allows us to restart from that point, rather
than having to begin at the beginning each time.This is an idea from Eric Dumazet, however I've slightly extended it
so that if the position from which we are due to start is at any
point beyond the last cached point, we start from the last cached
point, plus whatever is the appropriate offset. I don't really expect
people to be lseeking around these files, but if they did so with only
positive offsets, then we'd still get some of the benefit of using a
cached offset.With my simple test of around 200k entries in the file, I'm seeing
an approx 10x speed up.Cc: Eric Dumazet
Signed-off-by: Steven Whitehouse
07 Jun, 2012
1 commit
-
As per Al Viro's suggestion, this increases the buffer size used
for these two files. This provides a speed up of slightly less than
8x (i.e. proportional to the buffer size) for cases when we have
large numbers of glocks.Cc: Al Viro
Signed-off-by: Steven Whitehouse
06 Jun, 2012
4 commits
-
When we read an invalid block from the journal, we should not call
withdraw, but simply print a message and return an error. It is
up to the caller to then handle that error. In the case of mount
that means a failed mount, rather than a withdraw (requiring a
reboot). In the case of recovering another nodes journal then
we return an error via the uevent.Signed-off-by: Steven Whitehouse
-
This patch adds support for the "top dir" flag. Currently this is unused
but a subsequent patch is planned which will add support for the
Orlov allocation policy when allocating subdirectories in a parent
with this flag set.In order to ensure backward compatible behaviour, mkfs.gfs2 does
not currently tag the root directory with this flag, it must always be
set manually.Signed-off-by: Steven Whitehouse
-
This patch moves the ancillary quota data structures into the
block reservations structure. This saves GFS2 some time and
effort in allocating and deallocating the qadata structure.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse -
This patch lengthens the lifespan of the reservations structure for
inodes. Before, they were allocated and deallocated for every write
operation. With this patch, they are allocated when the first write
occurs, and deallocated when the last process closes the file.
It's more efficient to do it this way because it saves GFS2 a lot of
unnecessary allocates and frees. It also gives us more flexibility
for the future: (1) we can now fold the qadata structure back into
the structure and save those alloc/frees, (2) we can use this for
multi-block reservations.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
30 May, 2012
1 commit
-
pass inode + parent's inode or NULL instead of dentry + bool saying
whether we want the parent or not.NOTE: that needs ceph fix folded in.
Signed-off-by: Al Viro
29 May, 2012
1 commit
-
Pull writeback tree from Wu Fengguang:
"Mainly from Jan Kara to avoid iput() in the flusher threads."* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Avoid iput() from flusher thread
vfs: Rename end_writeback() to clear_inode()
vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
writeback: Refactor writeback_single_inode()
writeback: Remove wb->list_lock from writeback_single_inode()
writeback: Separate inode requeueing after writeback
writeback: Move I_DIRTY_PAGES handling
writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
writeback: Move clearing of I_SYNC into inode_sync_complete()
writeback: initialize global_dirty_limit
fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
mm: page-writeback.c: local functions should not be exposed globally
23 May, 2012
1 commit
-
Pull dlm updates from David Teigland:
"This set includes some minor fixes and improvements. The one large
patch addresses the special "nodir" mode, which has been a long
neglected proof of concept, but with these fixes seems to be quite
usable. It allows the resource master to be assigned statically
instead of dynamically, which can improve performance if there is
little locality and most resources are shared."* tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: NULL dereference on failure in kmem_cache_create()
gfs2: fix recovery during unmount
dlm: fixes for nodir mode
dlm: improve error and debug messages
dlm: avoid unnecessary search in search_rsb
dlm: limit rcom debug messages
dlm: fix waiter recovery
dlm: prevent connections during shutdown
22 May, 2012
1 commit
-
Pull GFS2 changes from Steven Whitehouse.
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
GFS2: Fix quota adjustment return code
GFS2: Add rgrp information to block_alloc trace point
GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
GFS2: Update glock doc to add new stats info
GFS2: Update main gfs2 doc
GFS2: Remove redundant metadata block type check
GFS2: Fix sgid propagation when using ACLs
GFS2: eliminate log elements and simplify
GFS2: Eliminate vestigial sd_log_le_rg
GFS2: Eliminate needless parameter from function gfs2_setbit
GFS2: Log code fixes
GFS2: Remove unused argument from gfs2_internal_read
GFS2: Remove bd_list_tr
GFS2: Remove duplicate log code
GFS2: Clean up log write code path
GFS2: Use variable rather than qa to determine if unstuff necessary
GFS2: Change variable blk to biblk
GFS2: Fix function parameter comments in rgrp.c
GFS2: Eliminate offset parameter to gfs2_setbit
GFS2: Use slab for block reservation memory
...
16 May, 2012
1 commit
-
This patch changes function gfs2_adjust_quota so that it properly
returns a good (zero) return code on the normal path through the code.
Without this, mounting GFS2 with -o quota=account periodically gave
this error message: GFS2: fsid=cluster:fs: gfs2_quotad: sync error -5Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
11 May, 2012
3 commits
-
This is a second attempt at a patch that adds rgrp information to the
block allocation trace point for GFS2. As suggested, the patch was
modified to list the rgrp information _after_ the fields that exist today.Again, the reason for this patch is to allow us to trace and debug
problems with the block reservations patch, which is still in the works.
We can debug problems with reservations if we can see what block allocations
result from the block reservations. It may also be handy in figuring out
if there are problems in rgrp free space accounting. In other words,
we can use it to track the rgrp and its free space along side the allocations
that are taking place.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse -
It turns out that the "new" parameter to function gfs2_meta_indirect_buffer
was always being passed in as zero. Therefore, this patch eliminates it
and simplifies the function.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse -
This allows comparing hash and len in one operation on 64-bit
architectures. Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer. This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).Signed-off-by: Linus Torvalds
08 May, 2012
1 commit
-
This patch removes a redundant metadata block check. See description below.
Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
06 May, 2012
1 commit
-
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.Signed-off-by: Jan Kara
Signed-off-by: Fengguang Wu
04 May, 2012
1 commit
-
This cleans up the mode setting code when creating inodes. The
SGID bit was being reset by setattr_copy() when the user creating a
subdirectory was not in the owning group. When ACLs are in use this
SGID bit should have been propagated if the ACL allows creation of
a subdirectory. GFS2's behaviour now matches that of the other ACL
supporting filesystems in this regard.Signed-off-by: Steven Whitehouse
03 May, 2012
2 commits
-
Journal recovery from lock_dlm should not be ignored
if there is an unmount in progress. Ignoring it will
causes the recovery to get stuck. The recovery
process will correctly handle an in-progess unmount.Signed-off-by: David Teigland
-
The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used. This commit
fixes a number of problems, making nodir much more usable.- Major change to recovery: recover all locks and restart
all in-progress operations after recovery. In some
cases it's not possible to know which in-progess locks
to recover, so recover all. (Most require recovery
in nodir mode anyway since rehashing changes most
master nodes.)- Change the way nodir mode is enabled, from a command
line mount arg passed through gfs2, into a sysfs
file managed by dlm_controld, consistent with the
other config settings.- Allow recovering MSTCPY locks on an rsb that has not
yet been turned into a master copy.- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
from a previous, aborted recovery cycle. Base this
on the local recovery status not being in the state
where any nodes should be sending LOCK messages for the
current recovery cycle.- Hold rsb lock around dlm_purge_mstcpy_locks() because it
may run concurrently with dlm_recover_master_copy().- Maintain highbast on process-copy lkb's (in addition to
the master as is usual), because the lkb can switch
back and forth between being a master and being a
process copy as the master node changes in recovery.- When recovering MSTCPY locks, flag rsb's that have
non-empty convert or waiting queues for granting
at the end of recovery. (Rename flag from LOCKS_PURGED
to RECOVER_GRANT and similar for the recovery function,
because it's not only resources with purged locks
that need grant a grant attempt.)- Replace a couple of unnecessary assertion panics with
error messages.Signed-off-by: David Teigland
02 May, 2012
1 commit
-
This patch eliminates the gfs2_log_element data structure and
rolls its two components into the gfs2_bufdata. This makes the code
easier to understand and makes it easier to migrate to a rbtree
to keep the list sorted.Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse