Eric Lee / smarc-fsl-linux-kernel

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with chunks bigger than PAGE_SIZE.

This promise never materialized, and it is unlikely that it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
the script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
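
The effect of the rewrite rules above can be sketched in userspace C. This is only an illustration, assuming a 4 KiB page (PAGE_SHIFT of 12) and the old definition PAGE_CACHE_SHIFT == PAGE_SHIFT; the function names are hypothetical and nothing here is kernel code. It shows why dropping the shift and renaming the constant does not change any computed value:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the kernel takes these from arch headers. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)
#define PAGE_CACHE_SHIFT  PAGE_SHIFT   /* assumed: the two shifts were always equal */
#define PAGE_CACHE_SIZE   (1UL << PAGE_CACHE_SHIFT)

/* Old style: scale a page-cache index by the (always zero) shift difference. */
static unsigned long old_style_index(unsigned long index)
{
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style: the semantic patch rewrites the expression to the bare operand. */
static unsigned long new_style_index(unsigned long index)
{
	return index;
}

/* Verify that both styles agree over a range of indices. */
static void check_rewrite_is_noop(void)
{
	for (unsigned long i = 0; i < 1024; i++)
		assert(old_style_index(i) == new_style_index(i));
	assert(PAGE_CACHE_SIZE == PAGE_SIZE);
}
```

Calling check_rewrite_is_noop() from a test harness passes silently, which is the whole point: under the stated assumption the old macros carried no extra information.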

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

25 Mar, 2016

1 commit

8b306a2e7 Merge tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux

Pull more nfsd updates from Bruce Fields:
"Apologies for the previous request, which omitted the top 8 commits
from my for-next branch (including the SCSI layout commits). Thanks
to Trond for spotting my error!"

This actually includes the new layout types, so here's that part of
the pull message repeated:

"Support for a new pnfs layout type from Christoph Hellwig. The new
layout type is a variant of the block layout which uses SCSI features
to offer improved fencing and device identification.

Note this pull request also includes the client side of SCSI layout,
with Trond's permission"

* tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux:
nfsd: use short read as well as i_size to set eof
nfsd: better layoutupdate bounds-checking
nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK
nfsd: add SCSI layout support
nfsd: move some blocklayout code
nfsd: add a new config option for the block layout driver
nfs/blocklayout: add SCSI layout support
nfs4.h: add SCSI layout definitions

Linus Torvalds
2016-03-25 10:50:32 +0800

22 Mar, 2016

1 commit

f35592a97 nfs/blocklayout: make sure to make an aligned read request

Only treat a write that goes up to the inode size as an aligned request,
because a write always covers PAGE_CACHE_SIZE, while a read has a dynamic size.

Signed-off-by: Kinglong Mee
Signed-off-by: Trond Myklebust

Kinglong Mee
2016-03-22 00:39:46 +0800

18 Mar, 2016

1 commit

d9186c039 nfs/blocklayout: add SCSI layout support

This is a trivial extension to the block layout driver to support the
new SCSI layouts draft. There are three changes:

- device identification through the SCSI VPD page. This allows us to
directly use the udev generated persistent device names instead of
requiring an expensive lookup by crawling every block device node
in /dev and reading a signature for it.
- use of SCSI persistent reservations to protect device access and
allow for robust fencing. On the client side this just means
registering and unregistering a server supplied key.
- an optimized LAYOUTCOMMIT payload that doesn't send unnecessary
fields to the server.

Signed-off-by: Christoph Hellwig
Acked-by: Trond Myklebust
Signed-off-by: J. Bruce Fields

Christoph Hellwig
2016-03-18 23:38:17 +0800

18 Feb, 2016

1 commit

c89757061 pnfs/blocklayout: fix a memory leak when using vmalloc_to_page

unreferenced object 0xffffc90000abf000 (size 16900):
comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] __vmalloc_node_range+0x231/0x280
[] __vmalloc+0x4a/0x50
[] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
[] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
[] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
[] pnfs_generic_sync+0x13/0x20 [nfsv4]
[] nfs4_file_fsync+0x58/0x150 [nfsv4]
[] vfs_fsync_range+0x4b/0xb0
[] do_fsync+0x3d/0x70
[] SyS_fsync+0x10/0x20
[] entry_SYSCALL_64_fastpath+0x12/0x76
[] 0xffffffffffffffff

v2, add missing include header

Signed-off-by: Kinglong Mee
Signed-off-by: Trond Myklebust

Kinglong Mee
2016-02-18 00:44:45 +0800

22 Oct, 2015

1 commit

15ae2c7bd nfs/blocklayout: Fix bad using of page offset in bl_read_pagelist

Blocklayout uses the file offset as the in-page offset of the first page when
reading data back. That is definitely wrong: it writes data to a bad address in
the page, which causes the userspace application to segfault. The offset must be
the page base stored in header->args.pgbase.

Also, pg_offset has no influence on isect or the extent length.

Note: The offset of every page after the first is always zero.

PS: A test program that segfaults at read():
#define _GNU_SOURCE

#include <stdio.h>      /* printf */
#include <unistd.h>     /* lseek, read, close */
#include <fcntl.h>      /* open, O_RDONLY, O_DIRECT */
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	char buf[2049];
	char *filename = NULL;
	int fd = -1;

	if (argc < 2) {
		printf("Usage: %s filename\n", argv[0]);
		return 0;
	}

	filename = argv[1];
	fd = open(filename, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		printf("Open %s fail: %m\n", filename);
		return 1;
	}

	/* Read 2048 bytes starting at a 2048-byte offset into the file. */
	lseek(fd, 2048, SEEK_SET);
	if (read(fd, buf, sizeof(buf) - 1) != (sizeof(buf) - 1))
		printf("Read 2048 bytes of data from %s fail: %m\n", filename);

	close(fd);
	return 0;
}

Signed-off-by: Kinglong Mee
Signed-off-by: Trond Myklebust

Kinglong Mee
2015-10-22 04:55:47 +0800

08 Sep, 2015

1 commit

4e4adb2f4 Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
"Highlights include:

Stable patches:
- Fix atomicity of pNFS commit list updates
- Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY)
- nfs_set_pgio_error sometimes misses errors
- Fix a thinko in xs_connect()
- Fix borkage in _same_data_server_addrs_locked()
- Fix a NULL pointer dereference of migration recovery ops for v4.2
client
- Don't let the ctime override attribute barriers.
- Revert "NFSv4: Remove incorrect check in can_open_delegated()"
- Ensure flexfiles pNFS driver updates the inode after write finishes
- flexfiles must not pollute the attribute cache with attributes from
the DS
- Fix a protocol error in layoutreturn
- Fix a protocol issue with NFSv4.1 CLOSE stateids

Bugfixes + cleanups
- pNFS blocks bugfixes from Christoph
- Various cleanups from Anna
- More fixes for delegation corner cases
- Don't fsync twice for O_SYNC/IS_SYNC files
- Fix pNFS and flexfiles layoutstats bugs
- pnfs/flexfiles: avoid duplicate tracking of mirror data
- pnfs: Fix layoutget/layoutreturn/return-on-close serialisation
issues
- pnfs/flexfiles: error handling retries a layoutget before fallback
to MDS

Features:
- Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from
Kinglong
- More RDMA client transport improvements from Chuck
- Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr()
verbs from the SUNRPC, Lustre and core infiniband tree.
- Optimise away the close-to-open getattr if there is no cached data"

* tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits)
NFSv4: Respect the server imposed limit on how many changes we may cache
NFSv4: Express delegation limit in units of pages
Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"
NFS: Optimise away the close-to-open getattr if there is no cached data
NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb
NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()
nfs: Remove unneeded checking of the return value from scnprintf
nfs: Fix truncated client owner id without proto type
NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid
NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid
NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()
NFSv4.1/flexfiles: Fix freeing of mirrors
NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file
NFSv4.1/pnfs: Handle LAYOUTGET return values correctly
NFSv4.1/pnfs: Don't ask for a read layout for an empty file.
NFSv4.1: Fix a protocol issue with CLOSE stateids
NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors
SUNRPC: Prevent SYN+SYNACK+RST storms
SUNRPC: xs_reset_transport must mark the connection as disconnected
NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload
...

Linus Torvalds
2015-09-08 05:02:24 +0800

18 Aug, 2015

5 commits

8bb289758 pnfs: move common blocklayout XDR definitions to nfs4.h

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2015-08-18 02:22:49 +0800
513d6d7a9 pnfs/blocklayout: pass proper file mode to blkdev_get/put

We generally want to read and write to a block device that's used by
the pNFS block layout client (and even if it's read only the server
has no way of telling us). Add FMODE_WRITE to the mode argument
so that we don't incorrectly tell the block driver that we want a
read-only open.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2015-08-18 02:22:49 +0800
2bd3c63a3 pnfs/blocklayout: reject too long signatures

Instead of overwriting kernel memory reject too long signatures.
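
The kind of bounds check described here can be sketched as follows. This is a userspace illustration only: the structure, the 128-byte limit, the function name, and the choice of -EIO as the error code are all assumptions for the example, not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SIG_MAX 128u   /* hypothetical capacity of the signature buffer */

struct sig_buf_sketch {
	size_t len;
	unsigned char data[SKETCH_SIG_MAX];
};

/*
 * Copy a signature of src_len bytes into dst.
 * Returns 0 on success, or a negative errno if the signature does not fit.
 */
static int copy_signature_sketch(struct sig_buf_sketch *dst,
				 const unsigned char *src, size_t src_len)
{
	assert(dst != NULL && src != NULL);

	/* Reject oversized input before touching the destination buffer. */
	if (src_len > sizeof(dst->data))
		return -EIO;

	memcpy(dst->data, src, src_len);
	dst->len = src_len;
	return 0;
}
```

The check runs before the copy, so an oversized length supplied by the peer results in an error return rather than a write past the end of the buffer.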

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2015-08-18 02:22:49 +0800
68596bd18 pnfs/blocklayout: set up layoutupdate_pages properly

We need to replace the __be32 pointer with a void pointer to do proper
arithmetic on the virtual addresses so that we can get the right page pointers.
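
Why the pointer type matters can be shown with a small userspace sketch. It assumes a 4096-byte page and uses uint32_t in place of __be32 and char * in place of the kernel's void * arithmetic (a GNU extension); all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Backing storage: SKETCH_PAGE_SIZE 32-bit words, i.e. four pages of bytes. */
static uint32_t sketch_buf[SKETCH_PAGE_SIZE];

/* Bytes covered when a 32-bit-word pointer is advanced by n. */
static size_t bytes_stepped_as_words(size_t n)
{
	uint32_t *p = sketch_buf;

	assert(n <= SKETCH_PAGE_SIZE);
	return (size_t)((char *)(p + n) - (char *)p);
}

/* Bytes covered when a byte pointer is advanced by n. */
static size_t bytes_stepped_as_bytes(size_t n)
{
	char *p = (char *)sketch_buf;

	assert(n <= sizeof(sketch_buf));
	return (size_t)((p + n) - p);
}
```

Adding SKETCH_PAGE_SIZE to the word pointer moves four pages forward, while the same addition on a byte pointer moves exactly one page, which is the behavior wanted when walking a buffer page by page.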

Reported-by: Dan Carpenter
Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2015-08-18 02:22:49 +0800
29662fa64 pnfs/blocklayout: calculate layoutupdate size correctly

We need to include the first u32 for the number of entries. Add a helper
for the calculation instead of opencoding it so that it's in one place.
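
A helper of the kind described might look like the sketch below. The per-entry size of 44 bytes and the function name are assumptions made for illustration; only the idea of adding one 32-bit count word in front of the entries comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXTENT_SIZE 44u   /* assumed encoded size of one entry, in bytes */

/* Total payload size: one 32-bit entry count followed by count entries. */
static size_t layoutupdate_size_sketch(size_t count)
{
	/* Guard against overflow of the multiplication below. */
	assert(count <= (SIZE_MAX - sizeof(uint32_t)) / SKETCH_EXTENT_SIZE);

	return sizeof(uint32_t) + SKETCH_EXTENT_SIZE * count;
}
```

With the calculation in one function, every caller that sizes or bounds-checks the buffer agrees on whether the leading count word is included.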

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2015-08-18 02:22:49 +0800

29 Jul, 2015

1 commit

4246a0b63 block: add a bi_error field to struct bio

Currently we have two different ways to signal an I/O error on a BIO:

(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
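
The idea of carrying the error inside the bio itself can be sketched with a heavily simplified userspace model. The struct and functions below are hypothetical stand-ins, not the kernel's struct bio or bio_endio(); real chaining also tracks outstanding children, which is omitted here.

```c
#include <assert.h>
#include <errno.h>    /* errno constants for callers that set bi_error */
#include <stddef.h>

struct bio_sketch {
	int bi_error;                            /* 0 on success, negative errno on failure */
	struct bio_sketch *parent;               /* next bio up the chain, or NULL */
	void (*bi_end_io)(struct bio_sketch *);  /* completion callback, may be NULL */
};

static int last_seen_error;

/* Example completion callback: remember the error stored in the bio. */
static void record_error(struct bio_sketch *bio)
{
	last_seen_error = bio->bi_error;
}

/* Complete a bio and hand its error, if any, up to each parent in turn. */
static void bio_endio_sketch(struct bio_sketch *bio)
{
	while (bio) {
		struct bio_sketch *parent = bio->parent;

		assert(bio->bi_error <= 0);
		if (bio->bi_end_io)
			bio->bi_end_io(bio);
		/* Keep the first error the parent sees; do not overwrite it. */
		if (parent && bio->bi_error && !parent->bi_error)
			parent->bi_error = bio->bi_error;
		bio = parent;
	}
}
```

Because the errno lives in the structure, the callback needs no separate error argument, and a failure in a child is still visible when the parent completes.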

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: NeilBrown
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-07-29 22:55:15 +0800

28 Mar, 2015

2 commits

5bb89b470 NFSv4.1/pnfs: Separate out metadata and data consistency for pNFS

The LAYOUTCOMMIT operation means different things to different layout types.
For blocks and objects, it is both a data and metadata consistency operation.
For files and flexfiles, it is only a metadata consistency operation.

This patch separates out the 2 cases, allowing the files/flexfiles layout
drivers to optimise away the data consistency calls to layoutcommit.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-03-28 00:39:38 +0800
84a80f62f NFSv4.1: Convert pNFS deviceid to use kfree_rcu()

Use of synchronize_rcu() when unmounting and potentially freeing a lot
of deviceids is problematic. There really is no reason why we can't just
use kfree_rcu() here.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-03-28 00:32:24 +0800

04 Feb, 2015

1 commit

180bb5ec0 pnfs: release lseg in pnfs_generic_pg_cleanup

This is needed to support mirrored writes - the first write can't just
trash the lseg, we need to keep it around until all mirrors have
written.

Signed-off-by: Weston Andros Adamson

Weston Andros Adamson
2015-02-04 03:06:44 +0800

11 Dec, 2014

1 commit

cbfe0de30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:

- unification of d_splice_alias()/d_materialize_unique()

- iov_iter rewrite

- killing a bunch of ->f_path.dentry users (and f_dentry macro).

Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.

Still not complete, but much closer now.

- crapectomy in lustre (dead code removal, mostly)

- "let's make seq_printf return nothing" preparations

- assorted cleanups and fixes

There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...

Linus Torvalds
2014-12-11 08:10:49 +0800

09 Dec, 2014

1 commit

ba00410b8 Merge branch 'iov_iter' into for-next

Al Viro
2014-12-09 09:39:29 +0800

25 Nov, 2014

1 commit

6a74c0c94 pnfs/blocklayout: fix end calculation in pnfs_num_cont_bytes

Use the number of pages in the pagecache mapping instead of the
number of pnfs requests which is only slightly related.

Reported-by: Weston Andros Adamson
Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-11-25 06:00:41 +0800

20 Nov, 2014

1 commit

32a59234a rpc_pipefs.c: get rid of f_dentry

Signed-off-by: Al Viro

Al Viro
2014-11-20 02:01:23 +0800

13 Nov, 2014

2 commits

b283f9445 nfs: Remove bogus assignment

Commit 3a6fd1f004fc (pnfs/blocklayout: remove read-modify-write handling
in bl_write_pagelist) introduced a bogus assignment pg_index = pg_index
in variable initialization. AFAICS it's just a typo so remove it.
Spotted by Coverity (id 1248711).

CC: Christoph Hellwig
Signed-off-by: Jan Kara
Reviewed-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Jan Kara
2014-11-13 03:22:53 +0800
e0d4ed71c pnfs/blocklayout: serialize GETDEVICEINFO calls

The rpc_pipefs code isn't thread safe, leading to occasional use after
frees when running xfstests generic/241 (dbench).

Signed-off-by: Christoph Hellwig
Link: http://lkml.kernel.org/r/1411740170-18611-2-git-send-email-hch@lst.de
Cc: stable@vger.kernel.org # 3.17.x
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-11-13 03:22:52 +0800

13 Oct, 2014

1 commit

faafcba3b Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:

- Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
Hansen)

- Various sched/idle refinements for better idle handling (Nicolas
Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

- sched/numa updates and optimizations (Rik van Riel)

- sysbench speedup (Vincent Guittot)

- capacity calculation cleanups/refactoring (Vincent Guittot)

- Various cleanups to thread group iteration (Oleg Nesterov)

- Double-rq-lock removal optimization and various refactorings
(Kirill Tkhai)

- various sched/deadline fixes

... and lots of other changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
sched/fair: Delete resched_cpu() from idle_balance()
sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
sched: Improve sysbench performance by fixing spurious active migration
sched/x86: Fix up typo in topology detection
x86, sched: Add new topology for multi-NUMA-node CPUs
sched/rt: Use resched_curr() in task_tick_rt()
sched: Use rq->rd in sched_setaffinity() under RCU read lock
sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
sched: Use dl_bw_of() under RCU read lock
sched/fair: Remove duplicate code from can_migrate_task()
sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
sched: print_rq(): Don't use tasklist_lock
sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
sched: Fix the task-group check in tg_has_rt_tasks()
sched/fair: Leverage the idle state info when choosing the "idlest" cpu
sched: Let the scheduler see CPU idle states
sched/deadline: Fix inter- exclusive cpusets migrations
sched/deadline: Clear dl_entity params when setscheduling to different class
sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
...

Linus Torvalds
2014-10-13 22:23:15 +0800

22 Sep, 2014

1 commit

5466112f0 pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe

kbuild test robot reports:

fs/built-in.o: In function `bl_map_stripe':
>> :(.text+0x965b4): undefined reference to `__aeabi_uldivmod'
>> :(.text+0x965cc): undefined reference to `__aeabi_uldivmod'
>> :(.text+0x96604): undefined reference to `__aeabi_uldivmod'

Fixes: 5c83746a0cf2 (pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing)
Cc: Stephen Rothwell
Cc: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Trond Myklebust
2014-09-22 02:20:20 +0800
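
The `__aeabi_uldivmod` link errors reported in commit 5466112f0 arise because, on 32-bit ARM, the compiler lowers 64-bit `/` and `%` into calls to a library routine that the kernel does not link against. Kernel code normally goes through helpers such as div_u64() and div_u64_rem() from <linux/math64.h> instead. The sketch below is a userspace stand-in modeled on that calling pattern; whether the fix used exactly these helpers is an assumption, and the stripe-mapping structure and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in with the same shape as the kernel's div_u64_rem(). */
static uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor,
				   uint32_t *remainder)
{
	assert(divisor != 0);
	*remainder = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}

struct stripe_pos_sketch {
	uint64_t chunk;            /* index of the chunk containing the offset */
	uint32_t child;            /* which child device holds that chunk */
	uint32_t offset_in_chunk;  /* byte offset inside the chunk */
};

/* Map a byte offset onto a striped set of nr_children devices. */
static struct stripe_pos_sketch map_stripe_sketch(uint64_t offset,
						  uint32_t chunk_size,
						  uint32_t nr_children)
{
	struct stripe_pos_sketch pos;

	pos.chunk = div_u64_rem_sketch(offset, chunk_size, &pos.offset_in_chunk);
	(void)div_u64_rem_sketch(pos.chunk, nr_children, &pos.child);
	return pos;
}
```

Routing every 64-bit division through one helper keeps the open-coded operators out of the driver, so the compiler never emits the unresolved library call.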

19 Sep, 2014

1 commit

f139caf2e sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule()

schedule(), io_schedule() and schedule_timeout() always return
with TASK_RUNNING state set, so one more setting is unnecessary.

(All places in the patch are visibly fine; the only exception is
kiblnd_scheduler() from:

drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c

Its schedule() is one line above the standard 3 lines of unified diff context.)

There are no places where set_current_state() is used for mb().

Signed-off-by: Kirill Tkhai
Signed-off-by: Peter Zijlstra (Intel)
Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
Cc: Alasdair Kergon
Cc: Anil Belur
Cc: Arnd Bergmann
Cc: Dave Kleikamp
Cc: David Airlie
Cc: David Howells
Cc: Dmitry Eremin
Cc: Frank Blaschka
Cc: Greg Kroah-Hartman
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Isaac Huang
Cc: James E.J. Bottomley
Cc: James E.J. Bottomley
Cc: J. Bruce Fields
Cc: Jeff Dike
Cc: Jesper Nilsson
Cc: Jiri Slaby
Cc: Laura Abbott
Cc: Liang Zhen
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Cc: Masaru Nomura
Cc: Michael Opdenacker
Cc: Mikael Starvik
Cc: Mike Snitzer
Cc: Neil Brown
Cc: Oleg Drokin
Cc: Peng Tao
Cc: Richard Weinberger
Cc: Robert Love
Cc: Steven Rostedt
Cc: Trond Myklebust
Cc: Ursula Braun
Cc: Zi Shen Lim
Cc: devel@driverdev.osuosl.org
Cc: dm-devel@redhat.com
Cc: dri-devel@lists.freedesktop.org
Cc: fcoe-devel@open-fcoe.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-afs@lists.infradead.org
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-raid@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: qla2xxx-upstream@qlogic.com
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: user-mode-linux-user@lists.sourceforge.net
Signed-off-by: Ingo Molnar

Kirill Tkhai
2014-09-19 18:35:17 +0800

16 Sep, 2014

1 commit

b262b35c2 pnfs/blocklayout: include vmalloc.h for __vmalloc

Signed-off-by: Stephen Rothwell
Reviewed-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Stephen Rothwell
2014-09-16 07:33:28 +0800

13 Sep, 2014

8 commits

164ae58c3 pNFS/blocklayout: Remove a couple of unused variables

Cc: Christoph Hellwig
Signed-off-by: Trond Myklebust

Trond Myklebust
2014-09-13 01:34:54 +0800
5c83746a0 pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing

This patch moves parsing of the GETDEVICEINFO XDR to kernel space, as well
as the management of complex devices. The reason for that is we might have
multiple outstanding complex devices after a NOTIFY_DEVICEID4_CHANGE, which
device mapper or md can't handle as they claim devices exclusively.

But as it turns out simple striping / concatenation is fairly trivial to
implement anyway, so we make our life simpler by reducing the reliance
on blkmapd. For now we still use blkmapd by feeding it synthetic SIMPLE
device XDR to translate device signatures to device numbers, but in the
long run I have plans to eliminate it entirely.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-13 01:33:50 +0800
871760ce9 pnfs/blocklayout: move all rpc_pipefs related code into a single file

Create a file to house all the rpc_pipefs boilerplate code instead of
sprinkling it over a few files.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-13 01:33:50 +0800
ca0fe1dfa pnfs/blocklayout: refactor extent processing

Factor out a helper for all per-extent work, and merge the now trivial
functions for lseg allocation and parsing.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-13 01:33:49 +0800
9cc475411 pnfs/blocklayout: move extent processing to blocklayout.c

This isn't device(id) related, so move it into the main file. Simple move
for now, the next commit will clean it up a bit.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-13 01:33:49 +0800
34dc93c2f pnfs/blocklayout: allocate separate pages for the layoutcommit payload

Instead of overflowing the XDR send buffer with our extent list, allocate
pages and pre-encode the layoutupdate payload into them. We optimistically
allocate a single page using alloc_page and only switch to vmalloc when we
have more extents outstanding. Currently there is only a single testcase
(xfstests generic/113) which can reproduce large enough extent lists for
this to occur.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-13 01:22:45 +0800
d4b18c3e0 pnfs: remove GETDEVICELIST implementation

The current GETDEVICELIST implementation is buggy in that it doesn't handle
cursors correctly, and in that it returns an error if the server returns
NFSERR_NOTSUPP. Given that there is no actual need for GETDEVICELIST, that
it has various issues, and that it might get removed for NFSv4.2, stop using
it in the blocklayout driver, and thus in the Linux NFS client as a whole.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-13 01:20:54 +0800
3e3f6b4e2 pnfs/blocklayout: remove some debugging

The kbuild test robot complained that we got the printk format wrong.
Let's just kill these printks instead of fixing them as there is no
point after the initial tree algorithm debugging.

Reported-by: kbuild test robot
Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-13 01:20:35 +0800

11 Sep, 2014

6 commits

20d655d61 pnfs/blocklayout: use the device id cache

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-11 03:47:04 +0800
848746bd2 pnfs/blocklayout: return layouts on setattr

This speeds up truncate-heavy workloads like fsx by multiple orders of
magnitude.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-11 03:47:03 +0800
71d5b7630 pnfs/blocklayout: implement the return_range method

This allows removing extents from the extent tree, especially on truncate
operations, and thus fixes reads from truncated and re-extended files that
previously returned stale data.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-11 03:47:03 +0800
8067253c8 pnfs/blocklayout: rewrite extent tracking

Currently the block layout driver tracks extents in three separate
data structures:

- the two lists of pnfs_block_extent structures returned by the server
- the list of sectors that were in invalid state but have been written to
- a list of pnfs_block_short_extent structures for LAYOUTCOMMIT

All of these share the property that they are not only highly inefficient
data structures, but also that operations on them are even more inefficient
than necessary.

In addition there are various implementation defects like:

- using an int to track sectors, causing corruption for large offsets
- incorrect normalization of page or block granularity ranges
- insufficient error handling
- incorrect synchronization as extents can be modified while they are in
use
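
The first defect in the list, tracking sectors in an int, can be illustrated with a short userspace sketch. It assumes 512-byte sectors and models the narrow type as uint32_t so that the truncation is well defined in C; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT_SKETCH 9   /* assumed: 512-byte sectors */

static_assert(SECTOR_SHIFT_SKETCH == 9, "sketch assumes 512-byte sectors");

/* Sector number kept in a 64-bit type: no information is lost. */
static uint64_t offset_to_sector64(uint64_t byte_offset)
{
	return byte_offset >> SECTOR_SHIFT_SKETCH;
}

/* Sector number squeezed into 32 bits: the high bits are silently dropped. */
static uint32_t offset_to_sector32(uint64_t byte_offset)
{
	return (uint32_t)(byte_offset >> SECTOR_SHIFT_SKETCH);
}
```

At a byte offset of 2 TiB the 64-bit result is 4294967296 while the 32-bit result wraps to 0, so I/O aimed far into a large device would land at its start.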

This patch replaces all three data structures with a single unified rbtree
structure tracking all extents, as well as their in-memory state, although we
still need two instances, for read-only and read-write extents, due to the
arcane client side COW feature in the block layouts spec.

To fix the problem of extents possibly being modified while in use we make
sure to return a copy of the extent for use in the write path - the
extent can only be invalidated by a layout recall or return which has
to wait until the I/O operations have finished due to refcounts on the
layout segment.

The new extent tree works similarly to the schemes used by block based
filesystems like XFS or ext4.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-11 03:47:03 +0800
8c792ea94 pnfs/blocklayout: don't set pages uptodate

The core nfs code handles setting pages uptodate on reads, no need to mess
with the pageflags ourselves. Also remove a debug function to dump page
flags.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-11 03:47:03 +0800
3a6fd1f00 pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist

Use the new PNFS_READ_WHOLE_PAGE flag to offload read-modify-write
handling to core nfs code, and remove a huge chunk of deadlock prone
mess from the block layout writeback path.

Signed-off-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Christoph Hellwig
2014-09-11 03:47:03 +0800