28 Sep, 2013

1 commit

  • Use i_writecount to control whether to get an fscache cookie in nfs_open() as
    NFS does not do write caching yet. I *think* this is the cause of a problem
    encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
    pointer dereference because cookie->def is NULL:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: [] __fscache_uncache_page+0x23/0x160
    PGD 0
    Thread overran stack, or stack corrupted
    Oops: 0000 [#1] SMP
    Modules linked in: ...
    CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
    Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
    task: ffff8804203460c0 ti: ffff880420346640
    RIP: 0010:[] __fscache_uncache_page+0x23/0x160
    RSP: 0018:ffff8801053af878 EFLAGS: 00210286
    RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
    RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
    RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
    R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
    CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
    CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
    Stack:
    ...
    Call Trace:
    [] __nfs_fscache_invalidate_page+0x42/0x70
    [] nfs_invalidate_page+0x75/0x90
    [] truncate_inode_page+0x8e/0x90
    [] truncate_inode_pages_range.part.12+0x14d/0x620
    [] ? __mutex_lock_slowpath+0x1fd/0x2e0
    [] truncate_inode_pages_range+0x53/0x70
    [] truncate_inode_pages+0x2d/0x40
    [] truncate_pagecache+0x4f/0x70
    [] nfs_setattr_update_inode+0xa0/0x120
    [] nfs3_proc_setattr+0xc4/0xe0
    [] nfs_setattr+0xc8/0x150
    [] notify_change+0x1cb/0x390
    [] do_truncate+0x7b/0xc0
    [] do_last+0xa4c/0xfd0
    [] path_openat+0xcc/0x670
    [] do_filp_open+0x4e/0xb0
    [] do_sys_open+0x13f/0x2b0
    [] compat_SyS_open+0x36/0x50
    [] sysenter_dispatch+0x7/0x24

    The code at the instruction pointer was disassembled:

    > (gdb) disas __fscache_uncache_page
    > Dump of assembler code for function __fscache_uncache_page:
    > ...
    > 0xffffffff812a18ff : mov 0x48(%rbx),%rax
    > 0xffffffff812a1903 : cmpb $0x0,0x10(%rax)
    > 0xffffffff812a1907 : je 0xffffffff812a19cd

    These instructions make up:

    ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);

    That cmpb is the faulting instruction (%rax is 0), so cookie->def is NULL,
    which presumably means that the cookie has already gone at least partway
    through __fscache_relinquish_cookie().

    What I think may be happening is something like a three-way race on the same
    file:

    PROCESS 1                           PROCESS 2                           PROCESS 3
    ==================================  ==================================  ==================================
    open(O_TRUNC|O_WRONLY)
                                        open(O_RDONLY)
                                                                            open(O_WRONLY)
    -->nfs_open()
    -->nfs_fscache_set_inode_cookie()
    nfs_fscache_inode_lock()
    nfs_fscache_disable_inode_cookie()
    __fscache_relinquish_cookie()
    nfs_inode->fscache = NULL
                                        -->nfs_open()
                                        -->nfs_fscache_set_inode_cookie()
                                        nfs_fscache_inode_lock()
                                        nfs_fscache_enable_inode_cookie()
                                        __fscache_acquire_cookie()
                                        nfs_inode->fscache = cookie
    -->nfs_setattr()
    ...
    ...
    -->nfs_invalidate_page()
    -->__nfs_fscache_invalidate_page()
    cookie = nfsi->fscache
                                                                            -->nfs_open()
                                                                            -->nfs_fscache_set_inode_cookie()
                                                                            nfs_fscache_inode_lock()
                                                                            nfs_fscache_disable_inode_cookie()
                                                                            -->__fscache_relinquish_cookie()
    -->__fscache_uncache_page(cookie)

                                                                            nfs_inode->fscache = NULL
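
    The i_writecount rule can be illustrated with a small userspace model. This
    is only a sketch under the assumption that a cookie is wanted exactly while
    nobody has the file open for writing; struct model_inode, model_wants_cookie()
    and model_open() are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of the inode that matter here. */
struct model_inode {
	int  i_writecount;	/* number of writable opens, as tracked by the VFS */
	bool have_cookie;	/* models nfs_inode->fscache != NULL */
};

/* Assumption: caching is wanted only while there are no writers. */
static bool model_wants_cookie(const struct model_inode *inode)
{
	assert(inode->i_writecount >= 0);
	return inode->i_writecount == 0;
}

/* Model of an open: account for a writer first, then decide on the cookie. */
static void model_open(struct model_inode *inode, bool for_write)
{
	if (for_write)
		inode->i_writecount++;
	inode->have_cookie = model_wants_cookie(inode);
}
```

    In this model a read-only open that arrives while a writer still holds the
    file does not re-enable caching, because the decision depends on the writer
    count rather than on the flags of the most recent open.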

    Signed-off-by: David Howells

    David Howells
     

22 Dec, 2012

1 commit

  • Provide a stub nfs_fscache_wait_on_invalidate() function for when
    CONFIG_NFS_FSCACHE=n lest the following error appear:

    fs/nfs/inode.c: In function 'nfs_invalidate_mapping':
    fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration]
    cc1: some warnings being treated as errors
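
    The usual shape of such a stub is sketched below. The function name comes
    from the error message above; its signature, the cut-down struct inode and
    invalidate_mapping_model() are assumptions made so that the example compiles
    on its own outside the kernel.

```c
/* Hypothetical stand-in for the kernel's struct inode. */
struct inode {
	int waits;	/* counts how often the cache was waited on */
};

#ifdef CONFIG_NFS_FSCACHE
/* Caching built in: placeholder for the real wait (here it only counts). */
static inline void nfs_fscache_wait_on_invalidate(struct inode *inode)
{
	inode->waits++;
}
#else
/* Caching compiled out: empty stub so that callers still compile. */
static inline void nfs_fscache_wait_on_invalidate(struct inode *inode)
{
	(void)inode;
}
#endif

/* A caller that is built regardless of the config option. */
static int invalidate_mapping_model(struct inode *inode)
{
	nfs_fscache_wait_on_invalidate(inode);
	return inode->waits;
}
```

    Built without -DCONFIG_NFS_FSCACHE the call compiles to nothing; built with
    it, the placeholder runs. Either way the caller needs no #ifdef of its own.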

    Reported-by: kbuild test robot
    Reported-by: Vineet Gupta
    Reported-by: Borislav Petkov
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

21 Dec, 2012

1 commit

  • Use the new FS-Cache invalidation facility from NFS to deal with foreign
    changes being detected on the server rather than attempting to retire the old
    cookie and get a new one.

    The problem with the old method was that NFS did not wait for all outstanding
    storage and retrieval ops on the cache to complete. There was no automatic
    wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
    as the latter can only wait on locked pages that have been added to the
    pagecache (which they haven't yet on entry to ->readpages()).

    This was leading to oopses like the one below when an outstanding read got cut
    off from its cookie by a premature release.

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
    PGD 15889067 PUD 15890067 PMD 0
    Oops: 0000 [#1] SMP
    CPU 0
    Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

    Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY
    RIP: 0010:[] [] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
    RSP: 0018:ffff8800158799e8 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
    RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
    RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
    R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
    R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
    FS: 00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
    Stack:
    ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
    ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
    ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
    Call Trace:
    [] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
    [] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
    [] ? __alloc_pages_nodemask+0x156/0x662
    [] nfs_readpages+0xee/0x187 [nfs]
    [] __do_page_cache_readahead+0x1be/0x267
    [] ? __do_page_cache_readahead+0xa2/0x267
    [] ra_submit+0x1c/0x20
    [] ondemand_readahead+0x28b/0x29a
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_aio_read+0x2ab/0x67e
    [] nfs_file_read+0xa4/0xc9 [nfs]
    [] do_sync_read+0xba/0xfa
    [] ? might_fault+0x4e/0x9e
    [] ? security_file_permission+0x7b/0x84
    [] ? rw_verify_area+0xab/0xc8
    [] vfs_read+0xaa/0x13a
    [] sys_read+0x45/0x6c
    [] system_call_fastpath+0x16/0x1b

    Reported-by: Mark Moseley
    Signed-off-by: David Howells

    David Howells
     

17 May, 2012

1 commit


15 May, 2012

1 commit


24 Sep, 2009

1 commit

  • Propagate the NFS 'fsc' mount option through NFS automounts of various types.

    This is now required as commit:

    commit c02d7adf8c5429727a98bad1d039bccad4c61c50
    Author: Trond Myklebust
    Date: Mon Jun 22 15:09:14 2009 -0400

    NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace

    uses VFS-driven automounting to reach all submounts barring the root, thus
    preventing fscaching from being enabled on any submount other than the root.

    This patch gets around that by propagating the NFS_OPTION_FSCACHE flag across
    automounts. If a uniquifier is supplied to a mount then this is propagated to
    all automounts of that mount too.
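
    A rough userspace model of the propagation step follows. It assumes the
    caching state is a flag plus an optional uniquifier string; MODEL_OPTION_FSCACHE,
    struct model_mount and model_propagate_fsc() are hypothetical stand-ins and
    do not reflect the actual NFS data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_OPTION_FSCACHE 0x1u	/* stand-in for NFS_OPTION_FSCACHE */

/* Hypothetical per-mount state. */
struct model_mount {
	unsigned int options;
	char *uniquifier;	/* NULL if none was given; owned by the mount */
};

/*
 * Copy the caching flag and any uniquifier from the parent mount to an
 * automounted child.  Returns false only if memory allocation fails.
 */
static bool model_propagate_fsc(struct model_mount *child,
				const struct model_mount *parent)
{
	child->uniquifier = NULL;
	child->options &= ~MODEL_OPTION_FSCACHE;

	if (!(parent->options & MODEL_OPTION_FSCACHE))
		return true;	/* parent is not cached, nothing to inherit */

	child->options |= MODEL_OPTION_FSCACHE;
	if (parent->uniquifier) {
		size_t len = strlen(parent->uniquifier) + 1;

		child->uniquifier = malloc(len);
		if (!child->uniquifier)
			return false;
		memcpy(child->uniquifier, parent->uniquifier, len);
	}
	return true;
}
```

    The child receives its own copy of the uniquifier so that the two mounts can
    be torn down independently.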

    Signed-off-by: David Howells
    [Trond: Fixed up the definition of nfs_fscache_get_super_cookie for the
    case of #undef CONFIG_NFS_FSCACHE]
    Signed-off-by: Trond Myklebust

    David Howells
     

03 Apr, 2009

9 commits

  • Display the local caching state in /proc/fs/nfsfs/volumes.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Store pages from an NFS inode into the cache data storage object associated
    with that inode.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Read pages from an FS-Cache data storage object representing an inode into an
    NFS inode.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • FS-Cache page management for NFS. This includes hooking the releasing and
    invalidation of pages marked with PG_fscache (aka PG_private_2) and waiting for
    completion of the write-to-cache flag (PG_fscache_write aka PG_owner_priv_2).

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Bind data storage objects in the local cache to NFS inodes.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Define and create inode-level cache data storage objects (as managed by
    nfs_inode structs).

    Each inode-level object is created in a superblock-level index object and is
    itself a data storage object into which pages from the inode are stored.

    The inode object key is the NFS file handle for the inode.

    The inode object is given coherency data to carry in the auxiliary data
    permitted by the cache. This is a sequence made up of:

    (1) i_mtime from the NFS inode.

    (2) i_ctime from the NFS inode.

    (3) i_size from the NFS inode.

    (4) change_attr from the NFSv4 attribute data.

    As the cache is a persistent cache, the auxiliary data is checked when a new
    NFS in-memory inode is set up that matches an already existing data storage
    object in the cache. If the coherency data is the same, the on-disk object is
    retained and used; if not, it is scrapped and a new one created.
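
    The coherency check described above might look roughly like this in
    userspace C. The field list follows items (1) to (4); the struct layout,
    the type names and check_auxdata() are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical auxiliary data blob, in the order listed above. */
struct inode_auxdata {
	struct timespec mtime;		/* (1) i_mtime */
	struct timespec ctime;		/* (2) i_ctime */
	int64_t		size;		/* (3) i_size */
	uint64_t	change_attr;	/* (4) NFSv4 change attribute */
};

enum aux_verdict { AUX_RETAIN, AUX_DISCARD };

static bool ts_equal(const struct timespec *a, const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/*
 * Compare the data stored with the on-disk object against the data from the
 * freshly set up in-memory inode.  Any difference means the cached object
 * is stale and should be discarded.
 */
static enum aux_verdict check_auxdata(const struct inode_auxdata *on_disk,
				      const struct inode_auxdata *live)
{
	if (ts_equal(&on_disk->mtime, &live->mtime) &&
	    ts_equal(&on_disk->ctime, &live->ctime) &&
	    on_disk->size == live->size &&
	    on_disk->change_attr == live->change_attr)
		return AUX_RETAIN;
	return AUX_DISCARD;
}
```

    The fields are compared one by one rather than with memcmp() so that any
    padding inside the struct cannot influence the result.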

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Define and create superblock-level cache index objects (as managed by
    nfs_server structs).

    Each superblock object is created in a server level index object and is itself
    an index into which inode-level objects are inserted.

    Ideally there would be one superblock-level object per server, and the former
    would be folded into the latter; however, since the "nosharecache" option
    exists this isn't possible.

    The superblock object key is a sequence consisting of:

    (1) Certain superblock s_flags.

    (2) Various connection parameters that serve to distinguish superblocks for
    sget().

    (3) The volume FSID.

    (4) The security flavour.

    (5) The uniquifier length.

    (6) The uniquifier text. This is normally an empty string, unless the fsc=xyz
    mount option was used to explicitly specify a uniquifier.

    The key blob is of variable length, depending on the length of (6).
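
    As an illustration of a variable-length key of this kind, the sketch below
    packs a fixed header followed by the uniquifier text. The field widths,
    struct sb_key_fixed and sb_key_build() are assumptions; the connection
    parameters in (2) are reduced to a single placeholder value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size part of the key, items (1) to (5). */
struct sb_key_fixed {
	uint64_t s_flags;	/* (1) selected superblock flags */
	uint64_t conn_params;	/* (2) placeholder for connection parameters */
	uint64_t fsid[2];	/* (3) volume FSID */
	uint32_t sec_flavour;	/* (4) security flavour */
	uint32_t uniq_len;	/* (5) uniquifier length, set by sb_key_build() */
};

/*
 * Write the key into buf: the fixed part followed by the uniquifier text (6)
 * without a trailing NUL.  Returns the key length, or 0 if buf is too small.
 */
static size_t sb_key_build(uint8_t *buf, size_t buflen,
			   struct sb_key_fixed fixed, const char *uniquifier)
{
	size_t ulen = strlen(uniquifier);
	size_t total = sizeof(fixed) + ulen;

	if (total > buflen)
		return 0;

	fixed.uniq_len = (uint32_t)ulen;
	memcpy(buf, &fixed, sizeof(fixed));
	memcpy(buf + sizeof(fixed), uniquifier, ulen);
	return total;
}
```

    With an empty uniquifier the key is just the fixed part; each additional
    character of uniquifier text lengthens it by one byte.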

    The superblock object is given no coherency data to carry in the auxiliary data
    permitted by the cache. It is assumed that the superblock is always coherent.

    This patch also adds uniquification handling such that two otherwise identical
    superblocks, at least one of which is marked "nosharecache", won't end up
    trying to share the on-disk cache. A later patch will add a mount option for
    supplying a uniquifier manually, which avoids the error that would otherwise
    be produced.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Define and create server-level cache index objects (as managed by nfs_client
    structs).

    Each server object is created in the NFS top-level index object and is itself
    an index into which superblock-level objects are inserted.

    Ideally there would be one superblock-level object per server, and the former
    would be folded into the latter; however, since the "nosharecache" option
    exists this isn't possible.

    The server object key is a sequence consisting of:

    (1) NFS version.

    (2) Server address family (e.g. AF_INET or AF_INET6).

    (3) Server port.

    (4) Server IP address.

    The key blob is of variable length, depending on the length of (4).
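
    A sketch of how such a key could be filled in from a socket address is shown
    below. struct server_key and server_key_fill() are hypothetical names and
    the field widths are assumptions; only IPv4 and IPv6 are handled.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical key layout, items (1) to (4). */
struct server_key {
	uint16_t nfs_version;	/* (1) */
	uint16_t family;	/* (2) AF_INET or AF_INET6 */
	uint16_t port;		/* (3) kept in network byte order */
	uint8_t  addr[16];	/* (4) 4 bytes used for IPv4, 16 for IPv6 */
};

/*
 * Fill in the key from a socket address.  Returns the number of key bytes
 * in use, which depends on the address family, or 0 if it is unsupported.
 */
static size_t server_key_fill(struct server_key *key, unsigned int version,
			      const struct sockaddr *sa)
{
	memset(key, 0, sizeof(*key));
	key->nfs_version = (uint16_t)version;
	key->family = (uint16_t)sa->sa_family;

	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		key->port = sin->sin_port;
		memcpy(key->addr, &sin->sin_addr, sizeof(sin->sin_addr));
		return offsetof(struct server_key, addr) + sizeof(sin->sin_addr);
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		key->port = sin6->sin6_port;
		memcpy(key->addr, &sin6->sin6_addr, sizeof(sin6->sin6_addr));
		return offsetof(struct server_key, addr) + sizeof(sin6->sin6_addr);
	}
	default:
		return 0;
	}
}
```

    Only the returned number of bytes would be handed to the cache as the key,
    so an IPv4 server yields a shorter key than an IPv6 one.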

    The server object is given no coherency data to carry in the auxiliary data
    permitted by the cache.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Register NFS for caching and retrieve the top-level cache index object cookie.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells