18 Jul, 2011
2 commits
-
It's sort of ridiculous that we've never had a working reply cache for
NFSv4.
On the other hand, we may still not: our current reply cache is likely
not very good, especially in the TCP case (which is the only case that
matters for v4). What we really need here is some serious testing.
Anyway, here's a start.
Signed-off-by: J. Bruce Fields
-
This simplifies cleanup a bit.
Signed-off-by: J. Bruce Fields
16 Jul, 2011
10 commits
-
Before the nfs41 client's RECLAIM_COMPLETE is done, the nfs server should deny any
new locks or opens.
rfc5661:
" Whenever a client establishes a new client ID and before it does
the first non-reclaim operation that obtains a lock, it MUST send a
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
locks to reclaim. If non-reclaim locking operations are done before
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. "
Signed-off-by: Mi Jinlong
Signed-off-by: J. Bruce Fields
-
From: Miklos Szeredi
Remove SLAB initialization entirely, as suggested by Bruce and Linus.
Allocate with __GFP_ZERO instead and only initialize list heads.
Signed-off-by: Miklos Szeredi
Signed-off-by: J. Bruce Fields
-
Check in SEQUENCE that the request doesn't exceed maxreq_sz for the
given session.
Signed-off-by: Mi Jinlong
Signed-off-by: J. Bruce Fields
-
According to RFC5661, 18.36.3,
"if the client selects a value for ca_maxresponsesize such that
a replier on a channel could never send a response, the server
SHOULD return NFS4ERR_TOOSMALL in the CREATE_SESSION reply."
So, error out when the client sets a maxreq_sz less than the minimum
possible SEQUENCE request size, or sets a maxresp_sz less than the
minimum possible SEQUENCE reply size.
Signed-off-by: Mi Jinlong
Signed-off-by: J. Bruce Fields
-
Stateids hold a read reference for a read open, a write reference for a
write open, and an additional one of each for each read+write open. The
latter wasn't getting put on a downgrade, so something like:
    open RW
    open R
    downgrade to R
was resulting in a file leak.
Also fix an imbalance in an error path.
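The accounting described above can be modeled in a few lines. This is a minimal sketch in Python, not the nfsd code: FileRefs, Stateid and the set of held open modes are hypothetical names chosen for illustration, and the sketch assumes that a downgrade must put every held open mode that grants more access than the target mode.

```python
READ, WRITE = 1, 2
BOTH = READ | WRITE  # a read+write open


class FileRefs:
    """Counts the read and write references held on one open file."""

    def __init__(self):
        self.readers = 0
        self.writers = 0

    def get(self, mode):
        if mode & READ:
            self.readers += 1
        if mode & WRITE:
            self.writers += 1

    def put(self, mode):
        if mode & READ:
            self.readers -= 1
        if mode & WRITE:
            self.writers -= 1
        assert self.readers >= 0 and self.writers >= 0

    def leaked(self):
        # Any reference still held after close means the file is never released.
        return self.readers != 0 or self.writers != 0


class Stateid:
    """Hypothetical stateid: remembers which open modes it holds references for."""

    def __init__(self, file_refs):
        self.file = file_refs
        self.held = set()

    def open(self, mode):
        # Each distinct open mode takes its references once.
        if mode not in self.held:
            self.held.add(mode)
            self.file.get(mode)

    def downgrade(self, to_mode):
        # Put every held mode that grants access not covered by to_mode.
        # Forgetting BOTH here is the kind of leak the message describes.
        for mode in sorted(self.held):
            if mode & ~to_mode:
                self.held.discard(mode)
                self.file.put(mode)

    def close(self):
        for mode in sorted(self.held):
            self.file.put(mode)
        self.held.clear()
```

With this model, the sequence open RW, open R, downgrade to R leaves one read reference and no write reference, and a following close drops the count to zero.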
Regression from 7d94784293096c0a46897acdb83be5abd9278ece "nfsd4: fix
downgrade/lock logic".
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields
-
Without this, for example,
    open read
    open read+write
    close
will result in a struct file leak.
Regression from 7d94784293096c0a46897acdb83be5abd9278ece "nfsd4: fix
downgrade/lock logic".
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields
-
This operation is used by the client to check the validity of a list of
stateids.
Signed-off-by: Bryan Schumaker
Signed-off-by: J. Bruce Fields
-
This operation is used by the client to tell the server to free a
stateid.
Signed-off-by: Bryan Schumaker
Signed-off-by: J. Bruce Fields
-
As promised in feature-removal-schedule.txt it is time to
remove the nfsctl system call.
Userspace has preferred to not use this call throughout 2.6 and it has been
excluded in the default configuration since 2.6.36 (9 months ago).
So this patch removes all the code that was being compiled out.
There are still references to sys_nfsctl in various arch systemcall tables
and related code. These should be cleaned out too, probably in the next
merge window.
Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
-
DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as
the client ID derived from the session ID of SEQUENCE is not the same
as the client ID to be destroyed. If the client IDs are the same,
then the server MUST return NFS4ERR_CLIENTID_BUSY.
(that's not implemented yet)
If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only
operation in the COMPOUND request (otherwise, the server MUST return
NFS4ERR_NOT_ONLY_OP).
This fixes the error return; before, we returned
NFS4ERR_OP_NOT_IN_SESSION; after this patch, we return NFS4ERR_NOTSUPP.
Signed-off-by: Benny Halevy
Signed-off-by: J. Bruce Fields
14 Jul, 2011
1 commit
-
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: Fix use of static variable in rpcb_getport_async
NFSv4.1: update nfs4_fattr_bitmap_maxsz
SUNRPC: Fix a race between work-queue and rpc_killall_tasks
pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
12 Jul, 2011
3 commits
-
Attribute IDs assigned in RFC 5661 now require three bitmaps.
Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs.
Signed-off-by: Andy Adamson
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust
-
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: drop spinlock before calling cifs_put_tlink
cifs: fix expand_dfs_referral
cifs: move bdi_setup_and_register outside of CONFIG_CIFS_DFS_UPCALL
cifs: factor smb_vol allocation out of cifs_setup_volume_info
cifs: have cifs_cleanup_volume_info not take a double pointer
cifs: fix build_unc_path_to_root to account for a prefixpath
cifs: remove bogus call to cifs_cleanup_volume_info
-
...as that function can sleep.
Signed-off-by: Jeff Layton
Signed-off-by: Steve French
10 Jul, 2011
2 commits
-
Regression introduced in commit 724d9f1cfba.
Prior to that, expand_dfs_referral would regenerate the mount data string
and then call cifs_parse_mount_options to re-parse it (klunky, but it
worked). The above commit moved cifs_parse_mount_options out of cifs_mount,
so the re-parsing of the new mount options no longer occurred. Fix it by
making expand_dfs_referral re-parse the mount options.
Signed-off-by: Jeff Layton
Signed-off-by: Steve French
-
This needs to be done regardless of whether that KConfig option is set
or not.
Reported-by: Sven-Haegar Koch
Signed-off-by: Jeff Layton
Signed-off-by: Steve French
09 Jul, 2011
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
btrfs: fix oops when doing space balance
Btrfs: don't panic if we get an error while balancing V2
btrfs: add missing options displayed in mount output
-
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: unpin stale inodes directly in IOP_COMMITTED
08 Jul, 2011
2 commits
-
Signed-off-by: Jeff Layton
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
-
Add an FS-Cache helper to bulk uncache pages on an inode. This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.
This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi->fscache to NULL but failed to
invalidate any previously mapped pages. This resulted in "Bad page
state" errors and manifested in other kinds of errors when running
fsstress. Fix it by uncaching mapped pages when we disable the inode
cookie.
This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.
------------[ cut here ]------------
kernel BUG at fs/cachefiles/namei.c:201!
invalid opcode: 0000 [#1] SMP
Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282
RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
Stack:
0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
Call Trace:
cachefiles_lookup_object+0x78/0xd4 [cachefiles]
fscache_lookup_object+0x131/0x16d [fscache]
fscache_object_work_func+0x1bc/0x669 [fscache]
process_one_work+0x186/0x298
worker_thread+0xda/0x15d
kthread+0x84/0x8c
kernel_thread_helper+0x4/0x10
RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles]
---[ end trace 1d481c9af1804caa ]---
I tested the uncaching by the following means:
(1) Create a big file on my NFS server (104857600 bytes).
(2) Read the file into the cache with md5sum on the NFS client. Look in
/proc/fs/fscache/stats:
Pages : mrk=25601 unc=0
(3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc
again:
Pages : mrk=25601 unc=25601
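The check in steps (2) and (3) can be scripted. The following is a small sketch in Python that assumes the stats file contains a line shaped like the two "Pages" lines quoted above; the helper names parse_pages_counters and all_marked_pages_uncached are made up for this example and are not part of FS-Cache.

```python
import re


def parse_pages_counters(stats_text):
    """Return the key=value counters found on the 'Pages' line as a dict of ints."""
    for line in stats_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "Pages":
            return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", rest)}
    return {}


def all_marked_pages_uncached(stats_text):
    """True when every page marked as cached has also been uncached (mrk == unc)."""
    counters = parse_pages_counters(stats_text)
    return "mrk" in counters and counters["mrk"] == counters.get("unc")
```

On a client where the file exists, the text would come from reading /proc/fs/fscache/stats before and after step (3); the second read is expected to show unc equal to mrk.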
Reported-by: Jeff Layton
Signed-off-by: David Howells
Reviewed-and-Tested-by: Suresh Jayaraman
cc: stable@kernel.org
Signed-off-by: Linus Torvalds
07 Jul, 2011
9 commits
-
We need to make sure the data relocation inode doesn't go through
the delayed metadata updates, otherwise we get an oops during balance:
kernel BUG at fs/btrfs/relocation.c:4303!
[SNIP]
Call Trace:
[] ? update_ref_for_cow+0x22d/0x330 [btrfs]
[] __btrfs_cow_block+0x451/0x5e0 [btrfs]
[] ? read_block_for_search+0x14d/0x4d0 [btrfs]
[] btrfs_cow_block+0x10b/0x240 [btrfs]
[] btrfs_search_slot+0x49e/0x7a0 [btrfs]
[] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
[] ? mutex_lock+0x1e/0x50
[] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
[] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
[] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
[] btrfs_commit_transaction+0x250/0x850 [btrfs]
[] ? find_get_pages+0x39/0x130
[] ? join_transaction+0x25/0x250 [btrfs]
[] ? wake_up_bit+0x40/0x40
[] prepare_to_relocate+0xda/0xf0 [btrfs]
[] relocate_block_group+0x4b/0x620 [btrfs]
[] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
[] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
[] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
[] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
[] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
[] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
[] ? btrfs_previous_item+0xb1/0x150 [btrfs]
[] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
[] btrfs_balance+0x21a/0x2b0 [btrfs]
[] btrfs_ioctl+0x798/0xd20 [btrfs]
[] ? handle_mm_fault+0x148/0x270
[] ? do_page_fault+0x1d8/0x4b0
[] do_vfs_ioctl+0x9a/0x540
[] sys_ioctl+0xa1/0xb0
[] system_call_fastpath+0x16/0x1b
[SNIP]
RIP [] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason
-
A user reported an error where if we try to balance an fs after a device has
been removed it will blow up. This is because we get an EIO back and this is
where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks,
Reported-by: Anand Jain
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason
-
There are three missed mount options settable by user which are not
currently displayed in mount output.
Signed-off-by: David Sterba
Signed-off-by: Chris Mason
-
When inodes are marked stale in a transaction, they are treated
specially when the inode log item is being inserted into the AIL.
It tries to avoid moving the log item forward in the AIL due to a
race condition with the writing of the underlying buffer back to disk.
This was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes
in the AIL").
To avoid moving the item forward, we return a LSN smaller than the
commit_lsn of the completing transaction, thereby trying to trick
the commit code into not moving the inode forward at all. I'm not
sure this ever worked as intended - it assumes the inode is already
in the AIL, but I don't think the returned LSN would have been small
enough to prevent moving the inode. It appears that the reason it
worked is that the lower LSN of the inodes meant they were inserted
into the AIL and flushed before the inode buffer (which was moved to
the commit_lsn of the transaction).
The big problem is that with delayed logging, the returning of the
different LSN means insertion takes the slow, non-bulk path. Worse
yet is that insertion is to a position -before- the commit_lsn so it
is doing an AIL traversal on every insertion, and has to walk over
all the items that have already been inserted into the AIL. It's
expensive.
To compound the matter further, with delayed logging inodes are
likely to go from clean to stale in a single checkpoint, which means
they aren't even in the AIL at all when we come across them at AIL
insertion time. Hence these were all getting inserted into the AIL
when they simply do not need to be as inodes marked XFS_ISTALE are
never written back.
Transactional/recovery integrity is maintained in this case by the
other items in the unlink transaction that were modified (e.g. the
AGI btree blocks) and committed in the same checkpoint.
So to fix this, simply unpin the stale inodes directly in
xfs_inode_item_committed() and return -1 to indicate that the AIL
insertion code does not need to do any further processing of these
inodes.
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder
-
...as that makes for a cumbersome interface. Make it take a regular
smb_vol pointer and rely on the caller to zero it out if needed.
Signed-off-by: Jeff Layton
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
-
Regression introduced by commit f87d39d9513.
Signed-off-by: Jeff Layton
Reviewed-by: Pavel Shilovsky
Signed-off-by: Steve French
-
This call to cifs_cleanup_volume_info is clearly wrong. As soon as it's
called the following call to cifs_get_tcp_session will oops as the
volume_info pointer will then be NULL.
The caller of cifs_mount should clean up this data since it passed it
in. There's no need for us to call this here.
Regression introduced by commit 724d9f1cfba.
Reported-by: Adam Williamson
Cc: Pavel Shilovsky
Signed-off-by: Jeff Layton
Signed-off-by: Steve French
-
The shdr4extnum variable isn't being freed in the cleanup process of
elf_fdpic_core_dump().
Signed-off-by: Davidlohr Bueso
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
-
locks_alloc_lock() assumed that the allocated struct file_lock is
already initialized to zero members. This is only true for the first
allocation of the structure, after reuse some of the members will have
random values.
This will for example result in passing random fl_start values to
userspace in fuse for FL_FLOCK locks, which is an information leak at
best.
Fix by reinitializing those members which may be non-zero after freeing.
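The failure mode is easy to reproduce with any allocator that recycles objects. The following is a toy model in Python, not the kernel's slab allocator: LockCache and its reinit flag are hypothetical, and only fl_start is taken from the message above (fl_end is included purely to show a second member).

```python
class FileLock:
    """Stand-in for struct file_lock with just two of its members."""

    __slots__ = ("fl_start", "fl_end")

    def __init__(self):
        # A freshly constructed object starts out zeroed.
        self.fl_start = 0
        self.fl_end = 0


class LockCache:
    """Hypothetical object cache that hands back previously freed objects."""

    def __init__(self):
        self._free = []

    def alloc(self, reinit=True):
        if not self._free:
            return FileLock()
        lock = self._free.pop()
        if reinit:
            # The fix: reset members that may be non-zero after freeing.
            lock.fl_start = 0
            lock.fl_end = 0
        return lock

    def free(self, lock):
        # Freed objects keep whatever values their last user left behind.
        self._free.append(lock)
```

Calling alloc(reinit=False) after a free returns the old object with its previous fl_start still in place, which is the stale value that would reach userspace; with the default reinit=True the recycled object looks like a fresh one.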
Signed-off-by: Miklos Szeredi
CC: stable@kernel.org
Signed-off-by: Linus Torvalds
06 Jul, 2011
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix sync and dio writes across stripe boundaries
libceph: fix page calculation for non-page-aligned io
ceph: fix page alignment corrections
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
hfsplus: Fix double iput of the same inode in hfsplus_fill_super()
hfsplus: add missing call to bio_put()
02 Jul, 2011
1 commit
-
Benjamin S. reported that he was unable to suspend his machine while
it had a cifs share mounted. The freezer caused this to spew when he
tried it:
-----------------------[snip]------------------
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ...
Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
cifsd S ffff880127f7b1b0 0 1821 2 0x00800000
ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
Call Trace:
[] ? sk_reset_timer+0xf/0x19
[] ? tcp_connect+0x43c/0x445
[] ? tcp_v4_connect+0x40d/0x47f
[] ? schedule_timeout+0x21/0x1ad
[] ? _raw_spin_lock_bh+0x9/0x1f
[] ? release_sock+0x19/0xef
[] ? inet_stream_connect+0x14c/0x24a
[] ? autoremove_wake_function+0x0/0x2a
[] ? ipv4_connect+0x39c/0x3b5 [cifs]
[] ? cifs_reconnect+0x1fc/0x28a [cifs]
[] ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
[] ? perf_event_exit_task+0xb9/0x1bf
[] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
[] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
[] ? kthread+0x7a/0x82
[] ? kernel_thread_helper+0x4/0x10
[] ? kthread+0x0/0x82
[] ? kernel_thread_helper+0x0/0x10
Restarting tasks ... done.
-----------------------[snip]------------------
We do attempt to perform a try_to_freeze in cifs_reconnect, but the
connection attempt itself seems to be taking longer than 20s to time
out. The connect timeout is governed by the socket send and receive
timeouts, so we can shorten that period by setting those timeouts
before attempting the connect instead of after.
Adam Williamson tested the patch and said that it seems to have fixed
suspending on his laptop when a cifs share is mounted.
Reported-by: Benjamin S
Tested-by: Adam Williamson
Signed-off-by: Jeff Layton
Signed-off-by: Steve French
30 Jun, 2011
2 commits
-
There is a misprint in the resource deallocation code on the error path in
hfsplus_fill_super(): the sbi->alloc_file inode is iput twice,
while the root inode is not iput at all.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov
Signed-off-by: Christoph Hellwig
-
hfsplus leaks bio objects by failing to call bio_put() on the bios
it allocates. Add the missing call to fix the leak.
Signed-off-by: Seth Forshee
Cc: # .38.x, .39.x
Signed-off-by: Christoph Hellwig
29 Jun, 2011
2 commits
-
In current pnfs tree, all the layouts set mds_offset in their
.write_pagelist member.
mds_offset is only used by generic layer and should be handled by it.
This patch is for upstream. It is needed in this -rc series to fix a
bug in objects layout_commit.
I'll send patches for objects and blocks to be
squashed into current pnfs tree.
TODO: It looks like the read path needs the same patch.
Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust
-
/proc/PID/io may be used for gathering private information. E.g. for
openssh and vsftpd daemons wchars/rchars may be used to learn the
precise password length. Restrict it to processes being able to ptrace
the target process.
ptrace_may_access() is needed to prevent keeping open file descriptor of
"io" file, executing setuid binary and gathering io information of the
setuid'ed process.
Signed-off-by: Vasiliy Kulikov
Signed-off-by: Linus Torvalds
28 Jun, 2011
2 commits
-
Under heavy memory and filesystem load, users observe the assertion
mapping->nrpages == 0 in end_writeback() trigger. This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:
CPU0                              CPU1
  ...
  shrink_page_list()
    __remove_mapping()
      __delete_from_page_cache()
        radix_tree_delete()
                                  evict_inode()
                                    truncate_inode_pages()
                                      truncate_inode_pages_range()
                                        pagevec_lookup() - finds nothing
                                    end_writeback()
                                      mapping->nrpages != 0 -> BUG
        page->mapping = NULL
        mapping->nrpages--
Fix the problem by doing a reliable check of mapping->nrpages under
mapping->tree_lock in end_writeback().
Analyzed by Jay, lost in LKML, and dug out
by Miklos Szeredi.
Cc: Jay
Cc: Miklos Szeredi
Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
romfs_get_unmapped_area() checks argument `len' without considering
PAGE_ALIGN which will cause do_mmap_pgoff() to return an -EINVAL error after
commit f67d9b1576c ("nommu: add page_align to mmap").
Fix the check by changing it in the same way ramfs_nommu_get_unmapped_area()
was changed in ramfs/file-nommu.c.
Signed-off-by: Bob Liu
Cc: David Howells
Cc: Paul Mundt
Acked-by: Greg Ungerer
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds