02 Dec, 2016
1 commit
-
The parameter is already present in the "args" structure.
Signed-off-by: Trond Myklebust
05 Apr, 2016
1 commit
-
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.Let's stop pretending that pages in page cache are special. They are
not.The changes are pretty straight-forward:
- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
13 Dec, 2015
1 commit
-
Commit 42cb14b110a5 ("mm: migrate dirty page without
clear_page_dirty_for_io etc") simplified the migration of a PageDirty
pagecache page: one stat needs moving from zone to zone and that's about
all.It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate. This is all done while both pages are
isolated and locked, just as before; and just as before, there's a
moment when the new page is visible in the radix_tree, but not yet
PageUptodate. What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?Signed-off-by: Hugh Dickins
Tested-by: Boaz Harrosh
Cc: Benny Halevy
Cc: Trond Myklebust
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Sep, 2015
1 commit
-
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.Signed-off-by: Viresh Kumar
Reviewed-by: Jeff Layton
Reviewed-by: David Howells
Reviewed-by: Steve French
Signed-off-by: Jiri Kosina
28 Mar, 2015
2 commits
-
The LAYOUTCOMMIT operation means different things to different layout types.
For blocks and objects, it is both a data and metadata consistency operation.
For files and flexfiles, it is only a metadata consistency operation.This patch separates out the 2 cases, allowing the files/flexfiles layout
drivers to optimise away the data consistency calls to layoutcommit.Signed-off-by: Trond Myklebust
-
Use of synchronize_rcu() when unmounting and potentially freeing a lot
of deviceids is problematic. There really is no reason why we can't just
use kfree_rcu() here.Signed-off-by: Trond Myklebust
04 Feb, 2015
3 commits
-
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from through out the
IO stack.Signed-off-by: Peng Tao
Signed-off-by: Tom Haynes -
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.Signed-off-by: Weston Andros Adamson
-
This is needed to support mirrored writes - the first write can't just
trash the lseg, we need to keep it around until all mirrors have
written.Signed-off-by: Weston Andros Adamson
22 Oct, 2014
1 commit
-
Pull email address change from Boaz Harrosh.
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
Boaz Harrosh - fix email in Documentation
Boaz Harrosh - Fix broken email address
MAINTAINERS: Change Boaz Harrosh's email
20 Oct, 2014
1 commit
-
I no longer have access to the Panasas email.
So change to an email that can always reach me.Signed-off-by: Boaz Harrosh
13 Sep, 2014
1 commit
-
The kbuild test robot complained about a new sparse warning in
objio_alloc_deviceid_node, but it turns out that this was just a moved
reference to an existing variable. Fix it to have the right big endian
annotated type.Note that there are some other endianess issues in this file that I didn't
bother to sort out as they involve global headers.Reported-by: kbuild test robot
Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust
11 Sep, 2014
1 commit
-
Add support to the common pNFS core to issue GETDEVICEINFO calls on
a device ID cache miss. The code is taken from the well debugged
file layout implementation and calls out to the layoutdriver through
a new alloc_deviceid_node method. The calling conventions for
nfs4_find_get_deviceid are changed so that all information needed to
send a GETDEVICEINFO request is passed to the common code.Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust
25 Jun, 2014
3 commits
-
Remove duplicate writeverf structure from merge of nfs_pgio_header and
nfs_pgio_data and remove writeverf related flags and logic to handle
more than one RPC per nfs_pgio_header.Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust -
struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is
passed around everywhere, because there used to be multiple _data structs
per _header. Many of these functions then use the _data to find a pointer
to the _header. This patch cleans this up by merging the nfs_pgio_data
structure into nfs_pgio_header and passing nfs_pgio_header around instead.Reviewed-by: Christoph Hellwig
Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust -
Rename "verf" to "writeverf" and "pages" to "page_array" to prepare for
merge of nfs_pgio_data and nfs_pgio_header.Reviewed-by: Christoph Hellwig
Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust
30 May, 2014
1 commit
-
Return the NULL pointer when the allocation fails.
Cc: Boaz Harrosh
Signed-off-by: Trond Myklebust
29 May, 2014
3 commits
-
Now that pg_test can change the size of the request (by returning a non-zero
size smaller than the request), pg_test functions that call other
pg_test functions must return the minimum of the result - or 0 if any fail.Also clean up the logic of some pg_test functions so that all checks are
for contitions where coalescing is not possible.Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust -
This is a step toward allowing pg_test to inform the the
coalescing code to reduce the size of requests so they may fit in
whatever scheme the pg_test callback wants to define.For now, just return the size of the request if there is space, or 0
if there is not. This shouldn't change any behavior as it acts
the same as when the pg_test functions returned bool.Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust -
At this point, the only difference between nfs_read_data and
nfs_write_data is the write verifier.Signed-off-by: Anna Schumaker
Signed-off-by: Trond Myklebust
29 Jun, 2013
1 commit
-
Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
07 Jun, 2013
1 commit
-
This is not strictly needed, since get_deviceinfo is not allowed to
return NFS4ERR_ACCESS or NFS4ERR_WRONG_CRED, but lets do it anyway
for consistency with other pNFS operations.Signed-off-by: Trond Myklebust
12 Apr, 2013
1 commit
-
Correct spelling typos in printk and comments.
Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina
18 Feb, 2013
1 commit
-
now pnfs client uses block layout, maybe we can remove
blocklayoutdriver first. if we umount later,
it can cause oops in unset_pnfs_layoutdriver.
because nfss->pnfs_curr_ld->clear_layoutdriver is invalid.reproduce it:
modprobe blocklayoutdriver
mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/
rmmod blocklayoutdriver
umount /mntthen you can see following
CPU 0
Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[] [] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286
RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001
RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800
RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400
R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550
FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0)
Stack:
ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce
ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7
ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400
Call Trace:
[] nfs4_destroy_server+0x1e/0x30 [nfsv4]
[] nfs_free_server+0xb7/0x150 [nfs]
[] nfs_kill_super+0x35/0x40 [nfs]
[] deactivate_locked_super+0x45/0x70
[] deactivate_super+0x4a/0x70
[] mntput_no_expire+0xd2/0x130
[] sys_umount+0x72/0xe0
[] system_call_fastpath+0x16/0x1b
Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2
RIP [] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP
CR2: ffffffffa04a1b38
---[ end trace 29f75aaedda058bf ]---Signed-off-by: fanchaoting
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org
05 Nov, 2012
1 commit
-
Signed-off-by: Trond Myklebust
17 Oct, 2012
1 commit
-
Signed-off-by: Trond Myklebust
09 Oct, 2012
1 commit
-
For buffer write, block layout client scan inode mapping to find
next hole and use offset-to-hole as layoutget length. Object
layout client uses offset-to-isize as layoutget length.For direct write, both block layout and object layout use dreq->bytes_left.
Signed-off-by: Peng Tao
Signed-off-by: Trond Myklebust
03 Aug, 2012
1 commit
-
Depending on layout and ARCH, ORE has some limits on max IO sizes
which is communicated on (what else) ore_layout->max_io_length,
which is always stripe aligned.
This was considered as the pg_test boundary for splitting and starting
a new IO.But in the case of a long IO where the start offset is not aligned
what would happen is that both end of IO[N] and start of IO[N+1]
would be unaligned, causing each IO boundary parity unit to be
calculated and written twice.So what we do in this patch is split the very start of an unaligned
IO, up to a stripe boundary, and then next IO's can continue fully
aligned til the end.We might be sacrificing the case where the full unaligned IO would
fit within a single max_io_length, but the sacrifice is well worth
the elimination of double calculation and parity units IO.
Actually the sacrificing is marginal and is almost unmeasurable.TODO:
If we know the total expected linear segment that will
be received, at pg_init, we could use that information
in many places:
1. blocks-layout get_layout write segment size
2. Better mds-threshold
3. In above situation for a better clean splitI will do this in future submission.
Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust
20 Jul, 2012
2 commits
-
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.Fix both birds, by returning a global zero_page, if offset is beyond
i_size.TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree
Signed-off-by: Boaz Harrosh -
[Bug since 3.2 Kernel]
CC: Stable Tree
Signed-off-by: Boaz Harrosh
05 May, 2012
1 commit
-
Fix the following sparse warnings:
fs/nfs/direct.c:221:6: warning: symbol 'nfs_direct_readpage_release' was
not declared. Should it be static?
fs/nfs/read.c:38:43: warning: non-ANSI function declaration of function
'nfs_readhdr_alloc'
fs/nfs/objlayout/objio_osd.c:214:5: warning: symbol '__alloc_objio_seg'
was not declared. Should it be static?Reported-by: Dan Carpenter
Signed-off-by: Trond Myklebust
Cc: Fred Isaman
Cc: Boaz Harrosh
28 Apr, 2012
1 commit
-
In order to avoid duplicating all the data in nfs_read_data whenever we
split it up into multiple RPC calls (either due to a short read result
or due to rsize < PAGE_SIZE), we split out the bits that are the same
per RPC call into a separate "header" structure.The goal this patch moves towards is to have a single header
refcounted by several rpc_data structures. Thus, want to always refer
from rpc_data to the header, and not the other way. This patch comes
close to that ideal, but the directio code currently needs some
special casing, isolated in the nfs_direct_[read_write]hdr_release()
functions. This will be dealt with in a future patch.Signed-off-by: Fred Isaman
Signed-off-by: Trond Myklebust
27 Apr, 2012
1 commit
-
Local variable 'sb' was not being used in objlayout_get_deviceinfo().
Signed-off-by: Sachin Bhamare
Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust
21 Mar, 2012
1 commit
-
The pnfs-objects protocol mandates that we autologin into devices not
present in the system, according to information specified in the
get_device_info returned from the server.The Protocol specifies two login hints.
1. An IP address:port combination
2. A string URI which is constructed as a URL with a protocol prefix
followed by :// and a string as address. For each protocol prefix
the string-address format might be different.We only support the second option. The first option is just redundant
to the second one.
NOTE: The Kernel part of autologin does not parse the URI string. It
just channels it to a user-mode script. So any new login protocols should
only update the user-mode script which is a part of the nfs-utils package,
but the Kernel need not change.We implement the autologin by using the call_usermodehelper() API.
(Thanks to Steve Dickson for pointing it out)
So there is no running daemon needed, and/or special setup.We Add the osd_login_prog Kernel module parameters which defaults to:
/sbin/osd_loginKernel try's to upcall the program specified in osd_login_prog. If the file is
not found or the execution fails Kernel will disable any farther upcalls, by
zeroing out osd_login_prog, Until Admin re-enables it by setting the
osd_login_prog parameter to a proper program.Also add text about the osd_login program command line API to:
Documentation/filesystems/nfs/pnfs.txt
and documentation of the new osd_login_prog module parameter to:
Documentation/kernel-parameters.txtTODO: Add timeout option in the case osd_login program gets
stuckSigned-off-by: Sachin Bhamare
Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust
14 Mar, 2012
1 commit
-
At some past instance Linus Trovalds wrote:
> From: Linus Torvalds
> commit a84a79e4d369a73c0130b5858199e949432da4c6 upstream.
>
> The size is always valid, but variable-length arrays generate worse code
> for no good reason (unless the function happens to be inlined and the
> compiler sees the length for the simple constant it is).
>
> Also, there seems to be some code generation problem on POWER, where
> Henrik Bakken reports that register r28 can get corrupted under some
> subtle circumstances (interrupt happening at the wrong time?). That all
> indicates some seriously broken compiler issues, but since variable
> length arrays are bad regardless, there's little point in trying to
> chase it down.
>
> "Just don't do that, then".Since then any use of "variable length arrays" has become blasphemous.
Even in perfectly good, beautiful, perfectly safe code like the one
below where the variable length arrays are only used as a sizeof()
parameter, for type-safe dynamic structure allocations. GCC is not
executing any stack allocation code.I have produced a small file which defines two functions main1(unsigned numdevs)
and main2(unsigned numdevs). main1 uses code as before with call to malloc
and main2 uses code as of after this patch. I compiled it as:
gcc -O2 -S see_asm.c
and here is what I get:main1:
.LFB7:
.cfi_startproc
mov %edi, %edi
leaq 4(%rdi,%rdi), %rdi
salq $3, %rdi
jmp malloc
.cfi_endproc
.LFE7:
.size main1, .-main1
.p2align 4,,15
.globl main2
.type main2, @function
main2:
.LFB8:
.cfi_startproc
mov %edi, %edi
addq $2, %rdi
salq $4, %rdi
jmp malloc
.cfi_endproc
.LFE8:
.size main2, .-main2
.section .text.startup,"ax",@progbits
.p2align 4,,15*Exact* same code !!!
So please seriously consider not accepting this patch and leave the
perfectly good code intact.CC: Linus Torvalds
Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust
12 Mar, 2012
1 commit
-
Fix a number of "warning: symbol 'foo' was not declared. Should it be
static?" conditions.Fix 2 cases of "warning: Using plain integer as NULL pointer"
fs/nfs/delegation.c:263:31: warning: restricted fmode_t degrades to integer
- We want to allow upgrades to a WRITE delegation, but should otherwise
consider servers that hand out duplicate delegations to be borken.Signed-off-by: Trond Myklebust
07 Feb, 2012
1 commit
-
This patch addresses printks that have some context to show that they are
from fs/nfs/, but for the sake of consistency now start with NFS:Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust
06 Jan, 2012
2 commits
-
As mandated by the standard. In case of an IO error, a pNFS
objects layout driver must return it's layout. This is because
all device errors are reported to the server as part of the
layout return buffer.This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done, through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by the layout driver that wants a
layout_return preformed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
because this code is never called outside of pnfs.c and pnfs IO
paths)Without this patch 3.[0-2] Kernels leak memory and have an annoying
WARN_ON after every IO error utilizing the pnfs-obj driver.[This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
CC: Stable Tree
Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust -
Some time along the way pNFS IO errors were switched to
communicate with a special iodata->pnfs_error member instead
of the regular RPC members. But objlayout was not switched
over.Fix that!
Without this fix any IO error is hanged, because IO is not
switched to MDS and pages are never cleared or read.[Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels]
CC: Stable Tree
Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust
03 Nov, 2011
1 commit
-
The ore need suplied a r4w_get_page/r4w_put_page API
from Filesystem so it can get cache pages to read-into when
writing parial stripes.Signed-off-by: Boaz Harrosh
Signed-off-by: Trond Myklebust