13 Jan, 2012
7 commits
-
As discussed earlier, it is better for the block client to allocate memory for
tracking extent state before submitting the bio. So the patch allocates
a short_extent for every INVALID extent touched by the write pagelist and for
every zeroing page we create, saving them in the layout header. Then in end_io we
can just use them to create commit list items and avoid memory allocation there.
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust -
block layout can just make use of generic read/write_done.
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust -
Also avoid unnecessary lock_page if page is handled by others.
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust -
There is no need to manipulate partially initialized blocks.
Writeback code takes care of them.
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust -
One bio can have at most BIO_MAX_PAGES pages. We should limit it because otherwise
bio_alloc will fail when there are many pages in one read/write_pagelist.
Cc: #3.1+
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust -
bl_free_block_dev() may sleep. We cannot call it with a spinlock held.
Besides, there is no need to take bm_lock as we are the last user freeing bm_devlist.
Cc: #3.1+
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust -
To pass the IO status to upper layer.
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Trond Myklebust
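The BIO_MAX_PAGES fix in the batch above can be pictured with a small userspace sketch. This is not the driver's code: `MAX_PAGES_PER_BIO`, `next_bio_pages()` and `split_into_bios()` are hypothetical names, and the cap of 256 is an assumption standing in for the kernel constant. The point is only that a long page list is carved into several requests, none larger than the cap, rather than one oversized allocation that would fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BIO_MAX_PAGES (value assumed). */
#define MAX_PAGES_PER_BIO 256

/* Pages to place in the next request when `remaining` pages are unsent. */
static size_t next_bio_pages(size_t remaining)
{
    return remaining < MAX_PAGES_PER_BIO ? remaining : MAX_PAGES_PER_BIO;
}

/*
 * Walk a page list of length npages, "submitting" capped requests.
 * Returns how many requests were needed; the largest request size
 * seen is stored in *largest so callers can confirm the cap held.
 */
static size_t split_into_bios(size_t npages, size_t *largest)
{
    size_t nbios = 0;

    assert(largest != NULL);
    *largest = 0;
    while (npages > 0) {
        size_t n = next_bio_pages(npages);

        if (n > *largest)
            *largest = n;
        npages -= n;   /* these pages are now owned by the request */
        nbios++;
    }
    return nbios;
}
```

With these assumed numbers, a 600-page list becomes three requests of 256, 256 and 88 pages.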
19 Oct, 2011
6 commits
-
We should check if the sector is already initialized before
trying to grab the page from the page cache. Otherwise, when two
pages of the same block are written back by two threads each
calling from writepage_locked, it can cause a deadlock like the one below.

[ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds.
[ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.972812] kswapd0 D ffff88000c4926c0 0 25 2 0x00000000
[ 1080.972816] ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7
[ 1080.972821] ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440
[ 1080.972824] ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8
[ 1080.972828] Call Trace:
[ 1080.972860] [] ? read_tsc+0x9/0x19
[ 1080.972877] [] ? lock_page+0x2b/0x2b
[ 1080.972899] [] io_schedule+0x63/0x7e
[ 1080.972902] [] sleep_on_page+0xe/0x12
[ 1080.972905] [] __wait_on_bit_lock+0x46/0x8f
[ 1080.972916] [] ? lock_release_holdtime.part.7+0x6b/0x72
[ 1080.972919] [] __lock_page+0x66/0x68
[ 1080.972928] [] ? autoremove_wake_function+0x3d/0x3d
[ 1080.972932] [] lock_page+0x27/0x2b
[ 1080.972934] [] find_lock_page+0x34/0x57
[ 1080.972937] [] find_or_create_page+0x34/0x8a
[ 1080.972947] [] bl_write_pagelist+0x205/0x6da [blocklayoutdriver]
[ 1080.972951] [] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver]
[ 1080.972995] [] ? nfs_write_rpcsetup+0x118/0x123 [nfs]
[ 1080.973033] [] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs]
[ 1080.973089] [] nfs_pageio_doio+0x1a/0x43 [nfs]
[ 1080.973098] [] nfs_pageio_complete+0x16/0x2d [nfs]
[ 1080.973108] [] nfs_writepage_locked+0xa0/0xbf [nfs]
[ 1080.973119] [] nfs_writepage+0x16/0x2b [nfs]
[ 1080.973122] [] ? clear_page_dirty_for_io+0x87/0x9a
[ 1080.973133] [] shrink_page_list+0x39b/0x6c8
[ 1080.973139] [] shrink_inactive_list+0x22c/0x39e
[ 1080.973144] [] ? lock_release_holdtime.part.7+0x6b/0x72
[ 1080.973148] [] shrink_zone+0x445/0x588
[ 1080.973152] [] balance_pgdat+0x2c2/0x56b
[ 1080.973170] [] ? __bitmap_weight+0x34/0x80
[ 1080.973175] [] kswapd+0x2be/0x2fa
[ 1080.973179] [] ? __init_waitqueue_head+0x4b/0x4b
[ 1080.973183] [] ? balance_pgdat+0x56b/0x56b
[ 1080.973187] [] kthread+0xa8/0xb0
[ 1080.973200] [] kernel_thread_helper+0x4/0x10
[ 1080.973205] [] ? __init_kthread_worker+0x5a/0x5a
[ 1080.973210] [] ? gs_change+0x13/0x13
[ 1080.973213] no locks held by kswapd0/25.
Signed-off-by: Peng Tao
Signed-off-by: Jim Rees
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust -
bl_add_page_to_bio returns an error pointer. bio should be reset to
NULL in failure cases as the out path always calls bl_submit_bio.
Signed-off-by: Peng Tao
Signed-off-by: Jim Rees
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust -
File layout and block layout both use it to set the layout I/O failure
bit. So make it generic.
Signed-off-by: Peng Tao
Signed-off-by: Jim Rees
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust -
Reviewed-by: Jeff Layton
Signed-off-by: Peng Tao
Signed-off-by: Jim Rees
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust -
The same function is used by idmap, gss and blocklayout code. Make it
generic.
Signed-off-by: Peng Tao
Signed-off-by: Jim Rees
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust -
Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and
nfs4_blk_decode_device.

Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling
nfs4_blk_get_deviceinfo.
Signed-off-by: Jim Rees
Signed-off-by: Benny Halevy
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust
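Two of the fixes in this batch rely on the kernel's error-pointer convention, where a failing function returns a pointer that encodes a negative errno instead of NULL. The following is a simplified userspace sketch of that convention. The three helpers imitate the kernel's `ERR_PTR`, `PTR_ERR` and `IS_ERR` but are reimplemented here as an assumption, and `struct fake_dev`, `get_deviceinfo()` and `set_layoutdriver()` are hypothetical stand-ins for the driver functions named in the commit messages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified imitation of the kernel helpers (assumed, not copied). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_dev { int id; };              /* hypothetical device record */
static struct fake_dev the_dev = { 1 };

/* Hypothetical lookup: on failure it returns an error pointer, never NULL. */
static struct fake_dev *get_deviceinfo(int ok)
{
    if (!ok)
        return ERR_PTR(-ENOMEM);
    return &the_dev;
}

/* Caller tests IS_ERR, not NULL. Returns 0 or a negative errno. */
static int set_layoutdriver(int ok)
{
    struct fake_dev *dev = get_deviceinfo(ok);

    if (IS_ERR(dev))
        return (int)PTR_ERR(dev);
    return 0;
}
```

A `dev == NULL` test in `set_layoutdriver()` would let the error pointer through as if it were a valid device, which is the class of bug the commits describe.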
04 Aug, 2011
1 commit
-
Fix this compile error on s390:
CC [M] fs/nfs/blocklayout/blocklayout.o
fs/nfs/blocklayout/blocklayout.c: In function 'bl_end_io_read':
fs/nfs/blocklayout/blocklayout.c:201:4: error: implicit declaration of function 'prefetchw'

Introduced with 9549ec01 "pnfsblock: bl_read_pagelist".
Cc: Fred Isaman
Signed-off-by: Heiko Carstens
Signed-off-by: Trond Myklebust
01 Aug, 2011
13 commits
-
For invalid extents, find other pages in the same fsblock and write them out.
[pnfsblock: write_begin]
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
Signed-off-by: Peng Tao
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Signed-off-by: Peng Tao
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Note: When the upper layer's read/write request cannot be fulfilled, the block
layout driver shouldn't silently mark the page as error. It should do
what can be done and leave the rest to the upper layer. To do so, we
should set rdata/wdata->res.count properly.

When the upper layer re-sends the read/write request to finish the rest
of the request, pgbase is the position where we should start.
[pnfsblock: bl_write_pagelist support functions]
[pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
Signed-off-by: Fred Isaman
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang
[pnfs-block: use new write_pagelist api]
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
[SQUASHME: pnfsblock: mds_offset is set in the generic layer]
Signed-off-by: Boaz Harrosh
Signed-off-by: Benny Halevy
[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman
[pnfsblock: fixup blksize alignment in bl_setup_layoutcommit]
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
[pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
Signed-off-by: Fred Isaman
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang
[pnfs-block: use new write_pagelist api]
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Note: When the upper layer's read/write request cannot be fulfilled, the block
layout driver shouldn't silently mark the page as error. It should do
what can be done and leave the rest to the upper layer. To do so, we
should set rdata/wdata->res.count properly.

When the upper layer re-sends the read/write request to finish the rest
of the request, pgbase is the position where we should start.
[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao
[pnfsblock: read path error handling]
Signed-off-by: Fred Isaman
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang
[pnfs-block: use new read_pagelist api]
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
In the blocklayout driver, two things happen
during layoutcommit/cleanup:
1. The modified extents are encoded.
2. On cleanup the extents are put back on the layout rw
extents list, for reads.

In the new system where actual xdr encoding is done in
encode_layoutcommit() directly into the xdr buffer, these are
the new commit stages:

1. On setup_layoutcommit, the range is adjusted as before
and a structure is allocated for communication with
bl_encode_layoutcommit && bl_cleanup_layoutcommit
(Generic layer provides a void-star to hang it on)

2. bl_encode_layoutcommit is called to do the actual
encoding directly into xdr. The commit-extent-list is not
freed and is stored on the above structure.
FIXME: The code is not yet converted to the new XDR cleanup

3. On cleanup the commit-extent-list is put back by a call
to set_to_rw() as before, but with no need for XDR decoding
of the list as before. And the commit-extent-list is freed.
Finally the allocated structure is freed.
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees
[pnfsblock: introduce bl_committing list]
Signed-off-by: Peng Tao
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo
[pnfsblock: cleanup_layoutcommit wants a status parameter]
Signed-off-by: Boaz Harrosh
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
In the blocklayout driver, two things happen
during layoutcommit/cleanup:
1. The modified extents are encoded.
2. On cleanup the extents are put back on the layout rw
extents list, for reads.

In the new system where actual xdr encoding is done in
encode_layoutcommit() directly into the xdr buffer, these are
the new commit stages:

1. On setup_layoutcommit, the range is adjusted as before
and a structure is allocated for communication with
bl_encode_layoutcommit && bl_cleanup_layoutcommit
(Generic layer provides a void-star to hang it on)

2. bl_encode_layoutcommit is called to do the actual
encoding directly into xdr. The commit-extent-list is not
freed and is stored on the above structure.
FIXME: The code is not yet converted to the new XDR cleanup

3. On cleanup the commit-extent-list is put back by a call
to set_to_rw() as before, but with no need for XDR decoding
of the list as before. And the commit-extent-list is freed.
Finally the allocated structure is freed.
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees
Signed-off-by: Peng Tao
Signed-off-by: Fred Isaman
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo
[pnfsblock: prevent commit list corruption]
[pnfsblock: fix layoutcommit with an empty opaque]
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Adds working implementations of various support functions
to handle INVAL extents, needed by writes, such as
bl_mark_sectors_init and bl_is_sector_init.
[pnfsblock: fix 64-bit compiler warnings for extent manipulation]
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
[Implement release_inval_marks]
Signed-off-by: Zhang Jingwang
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO
for each device returned.
[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees
[pnfsblock: fix pnfs_deviceid references]
Signed-off-by: Fred Isaman
[pnfsblock: fix print format warnings for sector_t and size_t]
[pnfs-block: #include ]
[pnfsblock: no PNFS_NFS_SERVER]
Signed-off-by: Benny Halevy
[pnfsblock: fix bug determining size of striped volume]
[pnfsblock: fix oops when using multiple devices]
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
[pnfsblock: get rid of vmap and deviceid->area structure]
Signed-off-by: Peng Tao
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Signed-off-by: Fred Isaman
[pnfsblock: fix bug getting pnfs_layout_type in translate_devid().]
Signed-off-by: Tao Guo
Signed-off-by: Benny Halevy
Signed-off-by: Zhang Jingwang
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Signed-off-by: Jim Rees
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
[upcall bugfixes]
Signed-off-by: Peng Tao
Signed-off-by: Trond Myklebust -
Adds structures and basic create/delete code for extents.
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
Signed-off-by: Zhang Jingwang
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
[pnfsblock: use pnfs_generic_pg_init_read/write]
Signed-off-by: Peng Tao
Signed-off-by: Benny Halevy
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust -
Define a configuration variable to enable/disable compilation of the
block driver code.

Add the minimal structure for a pnfs block layout driver, and empty
list-heads that will hold the extent data
[pnfsblock: make NFS_V4_1 select PNFS_BLOCK]
Signed-off-by: Peng Tao
Signed-off-by: Fred Isaman
Signed-off-by: Benny Halevy
[pnfs-block: fix CONFIG_PNFS_BLOCK dependencies]
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
[pnfsblock: layout alloc and free]
Signed-off-by: Fred Isaman
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy
Signed-off-by: Benny Halevy
[pnfsblock: define module alias]
Signed-off-by: Peng Tao
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees
Signed-off-by: Trond Myklebust
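The read and write pagelist commits in this batch ask the driver to report how much of a request it completed, so the upper layer can resend the remainder instead of treating the whole request as failed. The following userspace sketch illustrates that contract only. `struct fake_req`, `do_io()` and `finish_request()` are hypothetical, `res_count` merely mirrors the role of `res.count`, and `pgbase` is simplified to a plain byte offset, which is an assumption rather than the kernel's exact meaning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: `count` bytes wanted, starting at `pgbase`. */
struct fake_req {
    size_t pgbase;     /* where this attempt starts (simplified) */
    size_t count;      /* bytes still requested */
    size_t res_count;  /* bytes actually completed by the last attempt */
};

/*
 * Pretend lower layer: completes at most `limit` bytes per attempt and
 * records the partial progress instead of failing the entire request.
 */
static void do_io(struct fake_req *req, size_t limit)
{
    req->res_count = req->count < limit ? req->count : limit;
}

/*
 * Pretend upper layer: resend from the point reached until the request
 * is finished. Returns the number of attempts made, or 0 if an attempt
 * made no progress (or there was nothing to do).
 */
static unsigned finish_request(size_t total, size_t limit)
{
    struct fake_req req = { 0, total, 0 };
    unsigned attempts = 0;

    while (req.count > 0) {
        do_io(&req, limit);
        if (req.res_count == 0)
            return 0;                 /* no progress: give up */
        attempts++;
        req.pgbase += req.res_count;  /* next attempt starts here */
        req.count -= req.res_count;
    }
    return attempts;
}
```

With a 4096-byte limit per attempt, a 10000-byte request finishes in three attempts, each one starting where the previous one stopped.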