13 Jan, 2012

2 commits

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Asynchronous compaction is used when allocating transparent hugepages to
    avoid blocking for long periods of time. Due to reports of stalling,
    there was a debate on disabling synchronous compaction but this severely
    impacted allocation success rates. Part of the reason was that many dirty
    pages are skipped in asynchronous compaction by the following check;

    if (PageDirty(page) && !sync &&
    mapping->a_ops->migratepage != migrate_page)
    rc = -EBUSY;

    This skips over all mapping aops using buffer_migrate_page() even though
    it is possible to migrate some of these pages without blocking. This
    patch updates the ->migratepage callback with a "sync" parameter. It is
    the responsibility of the callback to fail gracefully if migration would
    block.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

06 Jan, 2012

1 commit

  • We have no business doing any this in the standard write release path.
    Get rid of it, and put it in the pNFS layer.

    Also, while we're at it, get rid of the completely bogus unlock/relock
    semantics that were present in nfs_writeback_release_full(). It is
    not only unnecessary, but actually dangerous to release the write lock
    just in order to take it again in nfs_page_async_flush(). Better just
    to open code the pgio operations in a pnfs helper.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

11 Nov, 2011

1 commit


20 Oct, 2011

1 commit


15 Jul, 2011

3 commits


13 Jul, 2011

2 commits


15 Jun, 2011

1 commit

  • Commit 28331a46d88459788c8fca72dbb0415cd7f514c9 "Ensure we request the
    ordinary fileid when doing readdirplus"
    changed the meaning of NFS_ATTR_FATTR_FILEID which used to be set when
    FATTR4_WORD1_MOUNTED_ON_FILED was requested.

    Allow nfs_fhget to succeed with only a mounted on fileid when crossing
    a mountpoint or a referral.

    Ask for the fileid of the absent file system if mounted_on_fileid is not
    supported.

    Signed-off-by: Andy Adamson
    cc:stable@kernel.org [2.6.39]
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

30 May, 2011

2 commits

  • * 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
    pnfs-obj: pg_test check for max_io_size
    NFSv4.1: define nfs_generic_pg_test
    NFSv4.1: use pnfs_generic_pg_test directly by layout driver
    NFSv4.1: change pg_test return type to bool
    NFSv4.1: unify pnfs_pageio_init functions
    pnfs-obj: objlayout_encode_layoutcommit implementation
    pnfs: encode_layoutcommit
    pnfs-obj: report errors and .encode_layoutreturn Implementation.
    pnfs: encode_layoutreturn
    pnfs: layoutret_on_setattr
    pnfs: layoutreturn
    pnfs-obj: osd raid engine read/write implementation
    pnfs: support for non-rpc layout drivers
    pnfs-obj: define per-inode private structure
    pnfs: alloc and free layout_hdr layoutdriver methods
    pnfs-obj: objio_osd device information retrieval and caching
    pnfs-obj: decode layout, alloc/free lseg
    pnfs-obj: pnfs_osd XDR client implementation
    pnfs-obj: pnfs_osd XDR definitions
    pnfs-obj: objlayoutdriver module skeleton
    ...

    Linus Torvalds
     
  • Non-rpc layout driver such as for objects and blocks
    implement their own I/O path and error handling logic.
    Therefore bypass NFS-based error handling for these layout drivers.

    [fix lseg ref-count bugs, and null de-refs]
    [Fall out from: non-rpc layout drivers]
    Signed-off-by: Boaz Harrosh
    [get rid of PNFS_USE_RPC_CODE]
    [get rid of __nfs4_write_done_cb]
    [revert useless change in nfs4_write_done_cb]
    Signed-off-by: Benny Halevy

    Benny Halevy
     

25 May, 2011

1 commit

  • Change each shrinker's API by consolidating the existing parameters into
    shrink_control struct. This will simplify any further features added w/o
    touching each file of shrinker.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix warning]
    [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
    [akpm@linux-foundation.org: fix xfs warning]
    [akpm@linux-foundation.org: update gfs2]
    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     

25 Mar, 2011

3 commits


24 Mar, 2011

1 commit


18 Mar, 2011

1 commit

  • * 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits)
    RPC: killing RPC tasks races fixed
    xprt: remove redundant check
    SUNRPC: Convert struct rpc_xprt to use atomic_t counters
    SUNRPC: Ensure we always run the tk_callback before tk_action
    sunrpc: fix printk format warning
    xprt: remove redundant null check
    nfs: BKL is no longer needed, so remove the include
    NFS: Fix a warning in fs/nfs/idmap.c
    Cleanup: Factor out some cut-and-paste code.
    cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
    NFS: account direct-io into task io accounting
    gss:krb5 only include enctype numbers in gm_upcall_enctypes
    RPCRDMA: Fix FRMR registration/invalidate handling.
    RPCRDMA: Fix to XDR page base interpretation in marshalling logic.
    NFSv4: Send unmapped uid/gids to the server when using auth_sys
    NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
    NFSv4: cleanup idmapper functions to take an nfs_server argument
    NFSv4: Send unmapped uid/gids to the server if the idmapper fails
    NFSv4: If the server sends us a numeric uid/gid then accept it
    NFSv4.1: reject zero layout with zeroed stripe unit
    ...

    Linus Torvalds
     

17 Mar, 2011

3 commits


12 Mar, 2011

5 commits

  • Allows the pnfs filelayout driver to write to the data servers.

    Note that COMMIT to data servers will be implemented in a future
    patch. To avoid improper behavior, for the moment any WRITE to a data
    server that would also require a COMMIT to the data server is sent
    NFS_FILE_SYNC.

    Signed-off-by: Andy Adamson
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Mingyang Guo
    Signed-off-by: Oleg Drokin
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Use our own async error handler.
    Mark the layout as failed and retry i/o through the MDS on specified errors.

    Update the mds_offset in nfs_readpage_retry so that a failed short-read retry
    to a DS gets correctly resent through the MDS.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Attempt a pNFS file layout read by setting up the nfs_read_data struct and
    calling nfs_initiate_read with the data server rpc client and the
    filelayout rpc call ops.

    Error handling is implemented in a subsequent patch.

    Signed-off-by: Andy Adamson
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Fred Isaman
    Signed-off-by: Mingyang Guo
    Signed-off-by: Oleg Drokin
    Signed-off-by: Ricardo Labiaga
    Tested-by: Guo Mingyang
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Introduce a data server set_client and init session following the
    nfs4_set_client and nfs4_init_session convention.

    Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state
    serializes access to creating an nfs_client struct with matching properties.

    Use the new nfs_get_client() that initializes new clients.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Now nfs_get_client returns an nfs_client ready to be used no matter if it was
    found or created.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

26 Jan, 2011

1 commit

  • The information required to find the nfs_client cooresponding to the incoming
    back channel request is contained in the NFS layer. Perform minimal checking
    in the RPC layer pg_authenticate method, and push more detailed checking into
    the NFS layer where the nfs_client can be found.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

16 Jan, 2011

1 commit


07 Jan, 2011

2 commits

  • Fixes a bug where the nfs_client could be freed during callback processing.
    Refactor nfs_find_client to use minorversion specific means to locate the
    correct nfs_client structure.

    In the NFS layer, V4.0 clients are found using the callback_ident field in the
    CB_COMPOUND header. V4.1 clients are found using the sessionID in the
    CB_SEQUENCE operation which is also compared against the sessionID associated
    with the back channel thread after a successful CREATE_SESSION.

    Each of these methods finds the one an only nfs_client associated
    with the incoming callback request - so nfs_find_client_next is not needed.

    In the RPC layer, the pg_authenticate call needs to find the nfs_client. For
    the v4.0 callback service, the callback identifier has not been decoded so a
    search by address, version, and minorversion is used. The sessionid for the
    sessions based callback service has (usually) not been set for the
    pg_authenticate on a CB_NULL call which can be sent prior to the return
    of a CREATE_SESSION call, so the sessionid associated with the back channel
    thread is not used to find the client in pg_authenticate for CB_NULL calls.

    Pass the referenced nfs_client to each CB_COMPOUND operation being proceesed
    via the new cb_process_state structure. The reference is held across
    cb_compound processing.

    Use the new cb_process_state struct to move the NFS4ERR_RETRY_UNCACHED_REP
    processing from process_op into nfs4_callback_sequence where it belongs.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Use the small id to pointer translator service to provide a unique callback
    identifier per SETCLIENTID call used to identify the v4.0 callback service
    associated with the clientid.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

17 Dec, 2010

3 commits

  • Clean up.

    The pointer returned by ->decode_dirent() is no longer used as a
    pointer. The only call site (xdr_decode() in fs/nfs/dir.c) simply
    extracts the errno value encoded in the pointer. Replace the
    returned pointer with a standard integer errno return value.

    Also, pass the "server" argument as part of the nfs_entry instead of
    as a separate parameter. It's faster to derive "server" in
    nfs_readdir_xdr_to_array() since we already have the directory's inode
    handy. "server" ought to be invariant for a set of entries in the
    same directory, right?

    The legacy versions of decode_dirent() don't use "server" anyway, so
    it's wasted work for them to derive and pass "server" for each entry.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • We'd like to prevent local buffer overflows caused by malicious or
    broken servers. New xdr_stream style decoders can do that.

    For efficiency, we also eventually want to be able to pass xdr_streams
    from call_decode() to all XDR decoding functions, rather than building
    an xdr_stream in every XDR decoding function in the kernel.

    nfs_decode_dirent() is renamed to follow the naming convention of the
    other two dirent decoders.

    Static helper functions are left without the "inline" directive. This
    allows the compiler to choose automatically how to optimize these for
    size or speed.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    To distinguish more clearly between the on-the-wire NFSERR_ value and
    our local errno values, use the proper type for the argument of
    nfs_stat_to_errno().

    Add a documenting comment appropriate for a global function shared
    outside this source file.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

23 Nov, 2010

1 commit


24 Oct, 2010

3 commits

  • By requsting more attributes during a readdir, we can mimic the readdir plus
    operation that was in NFSv3.

    To test, I ran the command `ls -lU --color=none` on directories with various
    numbers of files. Without readdir plus, I see this:

    n files | 100 | 1,000 | 10,000 | 100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
    user | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
    sys | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
    access | 3 | 1 | 1 | 4 | 31
    getattr | 2 | 1 | 1 | 1 | 1
    lookup | 104 | 1,003 | 10,003 | 100,003 | 1,000,003
    readdir | 2 | 16 | 158 | 1,575 | 15,749
    total | 111 | 1,021 | 10,163 | 101,583 | 1,015,784

    With readdir plus enabled, I see this:

    n files | 100 | 1,000 | 10,000 | 100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
    user | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
    sys | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
    access | 3 | 1 | 1 | 1 | 7
    getattr | 2 | 1 | 1 | 1 | 1
    lookup | 4 | 3 | 3 | 3 | 3
    readdir | 6 | 62 | 630 | 6,300 | 62,993
    total | 15 | 67 | 635 | 6,305 | 63,004

    Readdir plus disabled has about a 16x increase in the number of rpc calls and
    is 4 - 5 times slower on large directories.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • We can use vmapped pages to read more information from the network at once.
    This will reduce the number of calls needed to complete a readdir.

    Signed-off-by: Bryan Schumaker
    [trondmy: Added #include for linux/vmalloc.h> in fs/nfs/dir.c]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
    kernel oops that has been occuring when reading a vmapped page.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     

11 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

1 commit