13 Jan, 2012
2 commits
-
This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
that avoids writing back pages to backing storage. Async compaction
maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
used.

This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be a
large number of dirty pages backed by a filesystem that does not support
->writepages.

[aarcange@redhat.com: This patch is heavily based on Andrea's work]
[akpm@linux-foundation.org: fix fs/nfs/write.c build]
[akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Cc: Andrea Arcangeli
Cc: Minchan Kim
Cc: Dave Jones
Cc: Jan Kara
Cc: Andy Isaacson
Cc: Nai Xia
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Asynchronous compaction is used when allocating transparent hugepages to
avoid blocking for long periods of time. Due to reports of stalling,
there was a debate on disabling synchronous compaction but this severely
impacted allocation success rates. Part of the reason was that many dirty
pages are skipped in asynchronous compaction by the following check:

	if (PageDirty(page) && !sync &&
	    mapping->a_ops->migratepage != migrate_page)
		rc = -EBUSY;

This skips over all mapping aops using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking. This
patch updates the ->migratepage callback with a "sync" parameter. It is
the responsibility of the callback to fail gracefully if migration would
block.

Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Cc: Andrea Arcangeli
Cc: Minchan Kim
Cc: Dave Jones
Cc: Jan Kara
Cc: Andy Isaacson
Cc: Nai Xia
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
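The contract described in the commit above (a ->migratepage callback that receives a "sync" flag and fails rather than blocks) can be illustrated with a small userspace sketch. This is not kernel code: struct fake_page, its fields, and try_migrate() are hypothetical names invented for the illustration, and the real callback takes different arguments.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; the real struct page is far richer. */
struct fake_page {
	bool dirty;       /* page has data that is not yet written back */
	bool buffer_busy; /* taking the buffer lock would have to wait */
};

/*
 * Hypothetical callback modelled on the contract described above:
 * when sync is false the callback must not block, so it returns
 * -EBUSY whenever migration would have to wait for a lock.
 */
static int try_migrate(struct fake_page *page, bool sync)
{
	if (page->buffer_busy && !sync)
		return -EBUSY; /* would block: fail gracefully */

	/* Either nothing blocks, or the caller allowed us to wait. */
	page->buffer_busy = false;
	return 0;
}
```

Note that a dirty page whose buffers are not busy still migrates in the async case, which is the situation the old blanket check skipped.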
06 Jan, 2012
1 commit
-
We have no business doing any of this in the standard write release path.
Get rid of it, and put it in the pNFS layer.

Also, while we're at it, get rid of the completely bogus unlock/relock
semantics that were present in nfs_writeback_release_full(). It is
not only unnecessary, but actually dangerous to release the write lock
just in order to take it again in nfs_page_async_flush(). Better just
to open code the pgio operations in a pnfs helper.

Signed-off-by: Trond Myklebust
11 Nov, 2011
1 commit
-
pNFS-specific code belongs in the pnfs layer. It should not be
hijacking generic NFS read or write code paths.

Signed-off-by: Trond Myklebust
20 Oct, 2011
1 commit
-
It can trivially be replaced with rpc_restart_call_prepare.
Signed-off-by: Trond Myklebust
15 Jul, 2011
3 commits
-
Use nfs_pageio_reset_read_mds and nfs_pageio_reset_write_mds instead of
completely reinitialising the struct nfs_pageio_descriptor.

Signed-off-by: Trond Myklebust
-
...and ensure that we recoalesce to take into account differences in
block sizes when falling back to write through the MDS.

Signed-off-by: Trond Myklebust
-
...and ensure that we recoalesce to take into account differences in
block sizes when falling back to read through the MDS.

Signed-off-by: Trond Myklebust
13 Jul, 2011
2 commits
-
Signed-off-by: Trond Myklebust
-
If the client is using NFS v4.1, then we can use SECINFO_NO_NAME to find
the secflavor for the initial mount. If the server doesn't support
SECINFO_NO_NAME then I fall back on the "guess and check" method used
for v4.0 mounts.

Signed-off-by: Bryan Schumaker
Signed-off-by: Trond Myklebust
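The fallback order described in the commit above (ask the server first, then "guess and check") might look roughly like the following userspace sketch. Everything here is an assumption made for illustration: struct fake_server, its two hooks, find_root_flavor(), the example hooks, and the use of -EOPNOTSUPP to signal an unsupported operation are hypothetical and do not reproduce the actual NFS client code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical server model: each hook returns 0 or a negative errno. */
struct fake_server {
	/* Returns -EOPNOTSUPP if SECINFO_NO_NAME is not implemented. */
	int (*secinfo_no_name)(int *flavor_out);
	/* Returns 0 if a lookup of the root with this flavor succeeds. */
	int (*try_flavor)(int flavor);
};

/*
 * Pick a security flavor for the initial mount: ask the server first,
 * and fall back to "guess and check" over candidates[] only when the
 * server does not support the query.
 */
static int find_root_flavor(const struct fake_server *srv,
			    const int *candidates, size_t n, int *flavor_out)
{
	int err = srv->secinfo_no_name(flavor_out);

	if (err != -EOPNOTSUPP)
		return err; /* success, or a real error to report */

	for (size_t i = 0; i < n; i++) {
		if (srv->try_flavor(candidates[i]) == 0) {
			*flavor_out = candidates[i];
			return 0;
		}
	}
	return -EPERM; /* no candidate flavor was accepted */
}

/* Example hooks: a server without SECINFO_NO_NAME that accepts flavor 1 only. */
static int no_secinfo(int *flavor_out)
{
	(void)flavor_out;
	return -EOPNOTSUPP;
}

static int accept_only_one(int flavor)
{
	return flavor == 1 ? 0 : -EACCES;
}
```

The flavor numbers passed in candidates[] are arbitrary integers in this sketch; only the ordering of the two strategies is the point.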
15 Jun, 2011
1 commit
-
Commit 28331a46d88459788c8fca72dbb0415cd7f514c9 ("Ensure we request the
ordinary fileid when doing readdirplus") changed the meaning of
NFS_ATTR_FATTR_FILEID, which used to be set when
FATTR4_WORD1_MOUNTED_ON_FILEID was requested.

Allow nfs_fhget to succeed with only a mounted on fileid when crossing
a mountpoint or a referral.

Ask for the fileid of the absent file system if mounted_on_fileid is not
supported.

Signed-off-by: Andy Adamson
cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust
30 May, 2011
2 commits
-
* 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
pnfs-obj: pg_test check for max_io_size
NFSv4.1: define nfs_generic_pg_test
NFSv4.1: use pnfs_generic_pg_test directly by layout driver
NFSv4.1: change pg_test return type to bool
NFSv4.1: unify pnfs_pageio_init functions
pnfs-obj: objlayout_encode_layoutcommit implementation
pnfs: encode_layoutcommit
pnfs-obj: report errors and .encode_layoutreturn Implementation.
pnfs: encode_layoutreturn
pnfs: layoutret_on_setattr
pnfs: layoutreturn
pnfs-obj: osd raid engine read/write implementation
pnfs: support for non-rpc layout drivers
pnfs-obj: define per-inode private structure
pnfs: alloc and free layout_hdr layoutdriver methods
pnfs-obj: objio_osd device information retrieval and caching
pnfs-obj: decode layout, alloc/free lseg
pnfs-obj: pnfs_osd XDR client implementation
pnfs-obj: pnfs_osd XDR definitions
pnfs-obj: objlayoutdriver module skeleton
...
-
Non-RPC layout drivers, such as those for objects and blocks,
implement their own I/O path and error handling logic.
Therefore bypass NFS-based error handling for these layout drivers.

[fix lseg ref-count bugs, and null de-refs]
[Fall out from: non-rpc layout drivers]
Signed-off-by: Boaz Harrosh
[get rid of PNFS_USE_RPC_CODE]
[get rid of __nfs4_write_done_cb]
[revert useless change in nfs4_write_done_cb]
Signed-off-by: Benny Halevy
25 May, 2011
1 commit
-
Change each shrinker's API by consolidating the existing parameters into
a shrink_control struct. This will simplify adding further features without
touching every shrinker's file.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han
Cc: KOSAKI Motohiro
Cc: Minchan Kim
Acked-by: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Acked-by: Rik van Riel
Cc: Johannes Weiner
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Steven Whitehouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
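The idea behind the shrinker change above (passing one struct instead of a growing parameter list) can be sketched in userspace as follows. The two fields shown are my recollection of the kernel struct of that era and should be treated as an assumption; gfp_t is a stand-in typedef, and struct fake_cache and fake_cache_shrink() are hypothetical.

```c
/* Userspace stand-in for the kernel's gfp_t. */
typedef unsigned int gfp_t;

/* All per-call arguments travel in one struct (field names assumed). */
struct shrink_control {
	gfp_t gfp_mask;
	unsigned long nr_to_scan;
};

/* Hypothetical cache that a shrinker would trim under memory pressure. */
struct fake_cache {
	unsigned long nr_objects;
};

/*
 * Shrinker-style callback: frees up to sc->nr_to_scan objects and
 * reports how many remain. With nr_to_scan == 0 it is a pure query.
 */
static unsigned long fake_cache_shrink(struct fake_cache *cache,
				       const struct shrink_control *sc)
{
	unsigned long n = sc->nr_to_scan < cache->nr_objects ?
			  sc->nr_to_scan : cache->nr_objects;

	cache->nr_objects -= n;
	return cache->nr_objects;
}
```

Adding a new field to struct shrink_control later leaves the signature of every such callback unchanged, which is the benefit the commit message points at.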
25 Mar, 2011
3 commits
-
A submount may use different security than the parent
mount does. We should figure out what sec flavor the
submount uses at mount time.

Signed-off-by: Bryan Schumaker
Signed-off-by: Trond Myklebust
-
A later patch will need to perform a lookup using an
alternate client with a different security flavor.
This patch adds support for doing that on NFS v4.

Signed-off-by: Bryan Schumaker
Signed-off-by: Trond Myklebust
24 Mar, 2011
1 commit
-
Implement all the hooks created in the previous patches.
This requires exporting quite a few functions and adding a few
structure fields.

Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust
18 Mar, 2011
1 commit
-
* 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits)
RPC: killing RPC tasks races fixed
xprt: remove redundant check
SUNRPC: Convert struct rpc_xprt to use atomic_t counters
SUNRPC: Ensure we always run the tk_callback before tk_action
sunrpc: fix printk format warning
xprt: remove redundant null check
nfs: BKL is no longer needed, so remove the include
NFS: Fix a warning in fs/nfs/idmap.c
Cleanup: Factor out some cut-and-paste code.
cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
NFS: account direct-io into task io accounting
gss:krb5 only include enctype numbers in gm_upcall_enctypes
RPCRDMA: Fix FRMR registration/invalidate handling.
RPCRDMA: Fix to XDR page base interpretation in marshalling logic.
NFSv4: Send unmapped uid/gids to the server when using auth_sys
NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
NFSv4: cleanup idmapper functions to take an nfs_server argument
NFSv4: Send unmapped uid/gids to the server if the idmapper fails
NFSv4: If the server sends us a numeric uid/gid then accept it
NFSv4.1: reject zero layout with zeroed stripe unit
...
17 Mar, 2011
3 commits
-
It's always equal to dentry->d_sb
Signed-off-by: Al Viro
-
part 3: now we have everything to get nfs_path() just by dentry -
just follow to (disconnected) root and pick the rest of the thing
there.

Start killing propagation of struct vfsmount * on the paths that
used to bring it to nfs_path().

Signed-off-by: Al Viro
-
step 1 of ->mnt_devname fixes: make sure we have the value of devname
available in ..._get_root().

Signed-off-by: Al Viro
12 Mar, 2011
5 commits
-
Allows the pnfs filelayout driver to write to the data servers.
Note that COMMIT to data servers will be implemented in a future
patch. To avoid improper behavior, for the moment any WRITE to a data
server that would also require a COMMIT to the data server is sent
NFS_FILE_SYNC.

Signed-off-by: Andy Adamson
Signed-off-by: Dean Hildebrand
Signed-off-by: Fred Isaman
Signed-off-by: Mingyang Guo
Signed-off-by: Oleg Drokin
Signed-off-by: Ricardo Labiaga
Signed-off-by: Andy Adamson
Signed-off-by: Benny Halevy
Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust
-
Use our own async error handler.
Mark the layout as failed and retry i/o through the MDS on specified errors.

Update the mds_offset in nfs_readpage_retry so that a failed short-read retry
to a DS gets correctly resent through the MDS.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
-
Attempt a pNFS file layout read by setting up the nfs_read_data struct and
calling nfs_initiate_read with the data server rpc client and the
filelayout rpc call ops.

Error handling is implemented in a subsequent patch.
Signed-off-by: Andy Adamson
Signed-off-by: Dean Hildebrand
Signed-off-by: Fred Isaman
Signed-off-by: Fred Isaman
Signed-off-by: Mingyang Guo
Signed-off-by: Oleg Drokin
Signed-off-by: Ricardo Labiaga
Tested-by: Guo Mingyang
Signed-off-by: Andy Adamson
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust
-
Introduce a data server set_client and init session following the
nfs4_set_client and nfs4_init_session convention.

Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state
serializes access to creating an nfs_client struct with matching properties.

Use the new nfs_get_client() that initializes new clients.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
-
Now nfs_get_client returns an nfs_client that is ready to be used, whether it
was found or created.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
26 Jan, 2011
1 commit
-
The information required to find the nfs_client corresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
16 Jan, 2011
1 commit
-
Make NFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells
Acked-by: Trond Myklebust
Acked-by: Ian Kent
Signed-off-by: Al Viro
07 Jan, 2011
2 commits
-
Fixes a bug where the nfs_client could be freed during callback processing.
Refactor nfs_find_client to use minorversion specific means to locate the
correct nfs_client structure.

In the NFS layer, V4.0 clients are found using the callback_ident field in the
CB_COMPOUND header. V4.1 clients are found using the sessionID in the
CB_SEQUENCE operation which is also compared against the sessionID associated
with the back channel thread after a successful CREATE_SESSION.

Each of these methods finds the one and only nfs_client associated
with the incoming callback request, so nfs_find_client_next is not needed.

In the RPC layer, the pg_authenticate call needs to find the nfs_client. For
the v4.0 callback service, the callback identifier has not been decoded so a
search by address, version, and minorversion is used. The sessionid for the
sessions based callback service has (usually) not been set for the
pg_authenticate on a CB_NULL call which can be sent prior to the return
of a CREATE_SESSION call, so the sessionid associated with the back channel
thread is not used to find the client in pg_authenticate for CB_NULL calls.

Pass the referenced nfs_client to each CB_COMPOUND operation being processed
via the new cb_process_state structure. The reference is held across
cb_compound processing.

Use the new cb_process_state struct to move the NFS4ERR_RETRY_UNCACHED_REP
processing from process_op into nfs4_callback_sequence where it belongs.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
-
Use the small id to pointer translator service to provide a unique callback
identifier per SETCLIENTID call used to identify the v4.0 callback service
associated with the clientid.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
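To make the "small id to pointer translator" idea from the commit above concrete, here is a deliberately tiny userspace sketch. It is an illustration only: struct id_table and the id_table_*() helpers are hypothetical, the table has a fixed size, and none of this mirrors the kernel's actual idr interface or its locking.

```c
#include <stddef.h>

#define ID_TABLE_SIZE 8

/* Tiny fixed-size id-to-pointer table (hypothetical, no locking). */
struct id_table {
	void *slots[ID_TABLE_SIZE];
};

/* Store a non-NULL ptr in the first free slot; return its id, or -1 if full. */
static int id_table_alloc(struct id_table *t, void *ptr)
{
	for (int id = 0; id < ID_TABLE_SIZE; id++) {
		if (t->slots[id] == NULL) {
			t->slots[id] = ptr;
			return id;
		}
	}
	return -1;
}

/* Translate an id back to its pointer; NULL for unknown or released ids. */
static void *id_table_find(const struct id_table *t, int id)
{
	if (id < 0 || id >= ID_TABLE_SIZE)
		return NULL;
	return t->slots[id];
}

/* Release an id so that a later allocation can reuse it. */
static void id_table_remove(struct id_table *t, int id)
{
	if (id >= 0 && id < ID_TABLE_SIZE)
		t->slots[id] = NULL;
}
```

In the scenario the commit describes, the small integer would be handed out with the SETCLIENTID call and later looked up when a callback arrives carrying the same identifier.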
17 Dec, 2010
3 commits
-
Clean up.
The pointer returned by ->decode_dirent() is no longer used as a
pointer. The only call site (xdr_decode() in fs/nfs/dir.c) simply
extracts the errno value encoded in the pointer. Replace the
returned pointer with a standard integer errno return value.

Also, pass the "server" argument as part of the nfs_entry instead of
as a separate parameter. It's faster to derive "server" in
nfs_readdir_xdr_to_array() since we already have the directory's inode
handy. "server" ought to be invariant for a set of entries in the
same directory, right?

The legacy versions of decode_dirent() don't use "server" anyway, so
it's wasted work for them to derive and pass "server" for each entry.

Signed-off-by: Chuck Lever
Tested-by: J. Bruce Fields
Signed-off-by: Trond Myklebust
-
We'd like to prevent local buffer overflows caused by malicious or
broken servers. New xdr_stream style decoders can do that.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_decode() to all XDR decoding functions, rather than building
an xdr_stream in every XDR decoding function in the kernel.

nfs_decode_dirent() is renamed to follow the naming convention of the
other two dirent decoders.

Static helper functions are left without the "inline" directive. This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever
Tested-by: J. Bruce Fields
Signed-off-by: Trond Myklebust
-
Clean up.
To distinguish more clearly between the on-the-wire NFSERR_ value and
our local errno values, use the proper type for the argument of
nfs_stat_to_errno().

Add a documenting comment appropriate for a global function shared
outside this source file.

Signed-off-by: Chuck Lever
Tested-by: J. Bruce Fields
Signed-off-by: Trond Myklebust
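The distinction drawn in the commit above, between an on-the-wire status and a local errno, can be sketched with a dedicated enum type for the wire value. The names enum wire_stat, wire_errtbl, and wire_stat_to_errno() are hypothetical; the numeric status values are the ones defined for NFSv3 in RFC 1813, and mapping unknown codes to -EIO is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* A few on-the-wire status codes (numeric values as in RFC 1813). */
enum wire_stat {
	WIRE_OK = 0,
	WIRE_ERR_PERM = 1,
	WIRE_ERR_NOENT = 2,
	WIRE_ERR_IO = 5,
	WIRE_ERR_ACCES = 13,
	WIRE_ERR_STALE = 70,
};

/* Translation table from wire status to negative local errno. */
static const struct {
	enum wire_stat stat;
	int errno_val;
} wire_errtbl[] = {
	{ WIRE_OK,        0 },
	{ WIRE_ERR_PERM,  -EPERM },
	{ WIRE_ERR_NOENT, -ENOENT },
	{ WIRE_ERR_IO,    -EIO },
	{ WIRE_ERR_ACCES, -EACCES },
	{ WIRE_ERR_STALE, -ESTALE },
};

/*
 * Convert a wire status to a negative local errno. Taking an enum
 * rather than a plain int documents which number space the argument
 * lives in. Unknown codes map to -EIO (an assumption of this sketch).
 */
static int wire_stat_to_errno(enum wire_stat status)
{
	for (size_t i = 0; i < sizeof(wire_errtbl) / sizeof(wire_errtbl[0]); i++) {
		if (wire_errtbl[i].stat == status)
			return wire_errtbl[i].errno_val;
	}
	return -EIO;
}
```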
23 Nov, 2010
1 commit
-
Store the dirent->d_type in the struct nfs_cache_array_entry so that we
can use it in getdents() calls.

This fixes a regression with the new readdir code.
Signed-off-by: Trond Myklebust
24 Oct, 2010
3 commits
-
By requesting more attributes during a readdir, we can mimic the readdir plus
operation that was in NFSv3.

To test, I ran the command `ls -lU --color=none` on directories with various
numbers of files. Without readdir plus, I see this:

n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
access  |         3 |         1 |         1 |         4 |        31
getattr |         2 |         1 |         1 |         1 |         1
lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
readdir |         2 |        16 |       158 |     1,575 |    15,749
total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784

With readdir plus enabled, I see this:

n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
access  |         3 |         1 |         1 |         1 |         7
getattr |         2 |         1 |         1 |         1 |         1
lookup  |         4 |         3 |         3 |         3 |         3
readdir |         6 |        62 |       630 |     6,300 |    62,993
total   |        15 |        67 |       635 |     6,305 |    63,004

With readdir plus disabled, there are about 16 times as many RPC calls and
the listing is 4 to 5 times slower on large directories.

Signed-off-by: Bryan Schumaker
Signed-off-by: Trond Myklebust
-
We can use vmapped pages to read more information from the network at once.
This will reduce the number of calls needed to complete a readdir.

Signed-off-by: Bryan Schumaker
[trondmy: Added #include for <linux/vmalloc.h> in fs/nfs/dir.c]
Signed-off-by: Trond Myklebust
-
Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
kernel oops that has been occurring when reading a vmapped page.

Signed-off-by: Bryan Schumaker
Signed-off-by: Trond Myklebust
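The reason a stream-style decoder avoids overruns, as the commits above argue, is that every read is checked against the end of the buffer before the cursor moves. The following userspace sketch shows that pattern under stated assumptions: struct mini_xdr and both helpers are hypothetical and only loosely modelled on the kernel's xdr_stream, and XDR padding rules are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal stream: a cursor plus the end of the valid data. */
struct mini_xdr {
	const uint8_t *p;
	const uint8_t *end;
};

/*
 * Return a pointer to the next nbytes and advance the cursor, or
 * return NULL (leaving the cursor untouched) if that would run past
 * the end of the buffer.
 */
static const uint8_t *mini_xdr_inline_decode(struct mini_xdr *xdr, size_t nbytes)
{
	const uint8_t *q = xdr->p;

	if (nbytes > (size_t)(xdr->end - xdr->p))
		return NULL;
	xdr->p += nbytes;
	return q;
}

/* Decode one big-endian 32-bit word; 0 on success, -1 on a short buffer. */
static int mini_xdr_decode_u32(struct mini_xdr *xdr, uint32_t *out)
{
	const uint8_t *q = mini_xdr_inline_decode(xdr, 4);

	if (q == NULL)
		return -1;
	*out = ((uint32_t)q[0] << 24) | ((uint32_t)q[1] << 16) |
	       ((uint32_t)q[2] << 8) | (uint32_t)q[3];
	return 0;
}
```

A decoder built only from helpers like these cannot read beyond the received data, even if a broken or malicious server sends a truncated or oversized entry.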
11 Aug, 2010
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...

Fix up trivial conflicts in fs/nilfs2/super.c
10 Aug, 2010
1 commit
-
Signed-off-by: Al Viro