06 Mar, 2010
1 commit
-
Similar to the fsync issue fixed a while ago in commit
2daea67e966dc0c42067ebea015ddac6834cef88 we need to write for data to
actually hit the disk before writing out the metadata to guarantee
data integrity for filesystems that modify the inode in the data I/O
completion path. Currently XFS and NFS handle this manually, and AFS
has a write_inode method that does nothing but waiting for data, while
others are possibly missing out on this.Fortunately this change has a lot less impact than the fsync change
as none of the write_inode methods starts data writeout of any form
by itself.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
10 Dec, 2009
2 commits
-
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC.
Remove the duplicate manual invocation.Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara -
While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment. This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag. To guarantee backwards compatiblity it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actuall care need to check __O_SYNC in addition. Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set. The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op. We try to repair it by using it for
the new O_DSYNC and redefinining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Grant Grundler
Cc: "David S. Miller"
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Al Viro
Cc: Andreas Dilger
Acked-by: Trond Myklebust
Acked-by: Kyle McMartin
Acked-by: Ulrich Drepper
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Jan Kara
20 Nov, 2009
1 commit
-
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache. Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.The problem is typified by the following trace of a stuck process:
kslowd005 D 0000000000000000 0 4253 2 0x00000080
ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
Call Trace:
[] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
[] ? autoremove_wake_function+0x0/0x34
[] ? __fscache_check_page_write+0x63/0x70 [fscache]
[] nfs_fscache_release_page+0x4e/0xc4 [nfs]
[] nfs_release_page+0x3c/0x41 [nfs]
[] try_to_release_page+0x32/0x3b
[] shrink_page_list+0x316/0x4ac
[] shrink_inactive_list+0x392/0x67c
[] ? __mutex_unlock_slowpath+0x100/0x10b
[] ? trace_hardirqs_on_caller+0x10c/0x130
[] ? mutex_unlock+0x9/0xb
[] shrink_list+0x8d/0x8f
[] shrink_zone+0x278/0x33c
[] ? ktime_get_ts+0xad/0xba
[] try_to_free_pages+0x22e/0x392
[] ? isolate_pages_global+0x0/0x212
[] __alloc_pages_nodemask+0x3dc/0x5cf
[] grab_cache_page_write_begin+0x65/0xaa
[] ext3_write_begin+0x78/0x1eb
[] generic_file_buffered_write+0x109/0x28c
[] ? current_fs_time+0x22/0x29
[] __generic_file_aio_write+0x350/0x385
[] ? generic_file_aio_write+0x4a/0xae
[] generic_file_aio_write+0x60/0xae
[] do_sync_write+0xe3/0x120
[] ? autoremove_wake_function+0x0/0x34
[] ? __dentry_open+0x1a5/0x2b8
[] ? dentry_open+0x82/0x89
[] cachefiles_write_page+0x298/0x335 [cachefiles]
[] fscache_write_op+0x178/0x2c2 [fscache]
[] fscache_op_execute+0x7a/0xd1 [fscache]
[] slow_work_execute+0x18f/0x2d1
[] slow_work_thread+0x1c5/0x308
[] ? autoremove_wake_function+0x0/0x34
[] ? slow_work_thread+0x0/0x308
[] kthread+0x7a/0x82
[] child_rip+0xa/0x20
[] ? restore_args+0x0/0x30
[] ? tg_shares_up+0x171/0x227
[] ? kthread+0x0/0x82
[] ? child_rip+0x0/0x20In the above backtrace, the following is happening:
(1) A page storage operation is being executed by a slow-work thread
(fscache_write_op()).(2) FS-Cache farms the operation out to the cache to perform
(cachefiles_write_page()).(3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
standard write (do_sync_write()) under KERNEL_DS directly from the netfs
page.(4) However, for Ext3 to perform the write, it must allocate some memory, in
particular, it must allocate at least one page cache page into which it
can copy the data from the netfs page.(5) Under OOM conditions, the memory allocator can't immediately come up with
a page, so it uses vmscan to find something to discard
(try_to_free_pages()).(6) vmscan finds a clean netfs page it might be able to discard (possibly the
one it's trying to write out).(7) The netfs is called to throw the page away (nfs_release_page()) - but it's
called with __GFP_WAIT, so the netfs decides to wait for the store to
complete (__fscache_wait_on_page_write()).(8) This blocks a slow-work processing thread - possibly against itself.
The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed. This means that some data won't make it into the
cache this time. To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan". There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages. If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.Signed-off-by: David Howells
02 Oct, 2009
1 commit
-
It's just a wrapper for , so remove it.
Signed-off-by: Christoph Hellwig
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Sep, 2009
1 commit
-
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.Signed-off-by: James Morris
Acked-by: Serge Hallyn
Acked-by: Casey Schaufler
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Sep, 2009
1 commit
-
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Sep, 2009
1 commit
-
It's only set, it's never checked. Kill it.
Acked-by: Jan Kara
Signed-off-by: Jens Axboe
28 Aug, 2009
1 commit
-
kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.Signed-off-by: David Howells
Acked-by: Anton Blanchard
Signed-off-by: Linus Torvalds
13 Jul, 2009
2 commits
-
Fix the following warning:
fs/afs/dir.c: In function 'afs_d_revalidate':
fs/afs/dir.c:567: warning: 'fid.vnode' may be used uninitialized in this function
fs/afs/dir.c:567: warning: 'fid.unique' may be used uninitialized in this functionby marking the 'fid' variable as an uninitialized_var. The problem is
that gcc doesn't always manage to work out that fid is always set on the
path through the function that uses it.Cc: linux-afs@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Artem Bityutskiy
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds -
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPTThis will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
09 Jul, 2009
1 commit
-
Fix various silly problems wrt mnt_namespace.h:
- exit_mnt_ns() isn't used, remove it
- done that, sched.h and nsproxy.h inclusions aren't needed
- mount.h inclusion was need for vfsmount_lock, but no longer
- remove mnt_namespace.h inclusion from files which don't use anything
from mnt_namespace.hSigned-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
01 Jul, 2009
1 commit
-
Don't unlock on vfs_rejected_lock path in afs_do_setlk, since the lock
is unlocked after abort_attempt label.Signed-off-by: Jiri Slaby
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
17 Jun, 2009
1 commit
-
Authentication error abort codes should be translated to appropriate
Linux error codes, rather than all being translated to EREMOTEIO - which
indicates that the server had internal problems.Additionally, a server shouldn't be marked unavailable and the next
server tried if an authentication error occurs. This will quickly make
all the servers unavailable to the client. Instead the error should be
returned straight to the user.Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
12 Jun, 2009
2 commits
-
Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
Signed-off-by: Al Viro
09 May, 2009
2 commits
-
Put generic_show_options read access to s_options under rcu_read_lock,
split save_mount_options() into "we are setting it the first time"
(uses in foo_fill_super()) and "we are relacing and freeing the old one",
synchronize_rcu() before kfree() in the latter.Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
18 Apr, 2009
1 commit
-
If CONFIG_AFS_FSCACHE is not defined, the following warning is displayed when
fs/afs/file.c is compiled:fs/afs/file.c:111: warning: ‘afs_file_readpage_read_complete’ defined but not used
This occurs because all calls to this function are guarded by
CONFIG_AFS_FSCACHE. Thus, guard its definition as well.Signed-off-by: Matt Kraai
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
10 Apr, 2009
1 commit
-
Signed-off-by: Stoyan Gaydarov
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
03 Apr, 2009
1 commit
-
The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches. The kAFS filesystem will use caching
automatically if it's available.Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne
31 Mar, 2009
1 commit
-
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.->read_proc/->write_proc were just fixed to not require ->owner for
protection.rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan
28 Mar, 2009
1 commit
-
Signed-off-by: Al Viro
22 Jan, 2009
1 commit
-
Signed-off-by: Alexey Dobriyan
05 Jan, 2009
1 commit
-
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin
Reviewed-by: KOSAKI Motohiro
Cc: [2.6.28.x]
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds
31 Oct, 2008
1 commit
-
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4Signed-off-by: Harvey Harrison
Signed-off-by: David S. Miller
23 Oct, 2008
1 commit
-
With this patch all directory fops instances that have a readdir
that doesn't take the BKL are switched to generic_file_llseek.Signed-off-by: Christoph Hellwig
17 Oct, 2008
1 commit
-
Cannot assume writes will fully complete, so this conversion goes the easy
way and always brings the page uptodate before the write.[dhowells@redhat.com: style tweaks]
Signed-off-by: Nick Piggin
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Oct, 2008
1 commit
-
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.This was posted for review some time ago and I believe its been in -mm
since then.Signed-off-by: Steven Whitehouse
Cc: Alexander Viro
Signed-off-by: Linus Torvalds
05 Aug, 2008
1 commit
-
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).This also facilitates lockdeping of page lock.
Signed-off-by: Nick Piggin
Acked-by: KOSAKI Motohiro
Acked-by: Peter Zijlstra
Acked-by: Andrew Morton
Acked-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds
01 Aug, 2008
1 commit
-
Signed-off-by: Al Viro
27 Jul, 2008
2 commits
-
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
MAY_... found in mask.The obvious next target in that direction is permission(9)
folded fix for nfs_permission() breakage from Miklos Szeredi
Signed-off-by: Al Viro
-
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.cThis is flag day, yes.
Signed-off-by: Alexey Dobriyan
Acked-by: Pekka Enberg
Acked-by: Christoph Lameter
Cc: Jon Tollefson
Cc: Nick Piggin
Cc: Matt Mackall
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jun, 2008
1 commit
-
Although if people have questions about ARCnet, perhaps it's _better_
for them to be mailing dwmw2@cam.ac.uk about it...Signed-off-by: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Apr, 2008
1 commit
-
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Apr, 2008
4 commits
-
Add support for the CB.ProbeUuid cache manager RPC op. This allows a modern
OpenAFS server to quickly ask if the client has been rebooted.Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The AFS RxRPC op CBGetCapabilities is actually CBTellMeAboutYourself.
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Robert P. J. Day
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.Signed-off-by: Denis V. Lunev
Acked-by: David Howells
Cc: Alexey Dobriyan
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Apr, 2008
1 commit
-
Describe debug parameters with their names (and not their values).
Signed-off-by: Paul Bolle
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds