30 May, 2011

40 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
    eCryptfs: Remove ecryptfs_header_cache_2
    eCryptfs: Cleanup and optimize ecryptfs_lookup_interpose()
    eCryptfs: Return useful code from contains_ecryptfs_marker
    eCryptfs: Fix new inode race condition
    eCryptfs: Cleanup inode initialization code
    eCryptfs: Consolidate inode functions into inode.c

    Linus Torvalds
     
  • * 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
    pnfs-obj: pg_test check for max_io_size
    NFSv4.1: define nfs_generic_pg_test
    NFSv4.1: use pnfs_generic_pg_test directly by layout driver
    NFSv4.1: change pg_test return type to bool
    NFSv4.1: unify pnfs_pageio_init functions
    pnfs-obj: objlayout_encode_layoutcommit implementation
    pnfs: encode_layoutcommit
    pnfs-obj: report errors and .encode_layoutreturn Implementation.
    pnfs: encode_layoutreturn
    pnfs: layoutret_on_setattr
    pnfs: layoutreturn
    pnfs-obj: osd raid engine read/write implementation
    pnfs: support for non-rpc layout drivers
    pnfs-obj: define per-inode private structure
    pnfs: alloc and free layout_hdr layoutdriver methods
    pnfs-obj: objio_osd device information retrieval and caching
    pnfs-obj: decode layout, alloc/free lseg
    pnfs-obj: pnfs_osd XDR client implementation
    pnfs-obj: pnfs_osd XDR definitions
    pnfs-obj: objlayoutdriver module skeleton
    ...

    Linus Torvalds
     
  • Now that ecryptfs_lookup_interpose() is no longer using
    ecryptfs_header_cache_2 to read in metadata, the kmem_cache can be
    removed and the ecryptfs_header_cache_1 kmem_cache can be renamed to
    ecryptfs_header_cache.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • ecryptfs_lookup_interpose() has turned into spaghetti code over the
    years. This is an effort to clean it up.

    - Shorten overly descriptive variable names such as ecryptfs_dentry
    - Simplify gotos and error paths
    - Create helper function for reading plaintext i_size from metadata

    It also includes an optimization when reading i_size from the metadata.
    A complete page-sized kmem_cache_alloc() was being done to read in 16
    bytes of metadata. The buffer for that is now statically declared.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • Instead of having the calling functions translate the true/false return
    code to either 0 or -EINVAL, have contains_ecryptfs_marker() return 0 or
    -EINVAL so that the calling functions can just reuse the return code.

    Also, rename the function to ecryptfs_validate_marker() to avoid callers
    mistakenly thinking that it returns true/false codes.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
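
The return-code convention described above can be sketched in user-space C. This is a minimal illustration, not the eCryptfs source: the `sketch_` names, the 8-byte marker layout, and the magic value are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical magic value, chosen for illustration only. */
#define SKETCH_MARKER_MAGIC 0x3c81b7f5u

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t sketch_get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if the assumed 8-byte marker is valid, -EINVAL otherwise. */
static int sketch_validate_marker(const unsigned char *data)
{
    uint32_t m_1 = sketch_get_be32(data);
    uint32_t m_2 = sketch_get_be32(data + 4);

    if ((m_1 ^ SKETCH_MARKER_MAGIC) == m_2)
        return 0;
    return -EINVAL;
}

/* A caller can propagate the result without translating true/false. */
static int sketch_read_header(const unsigned char *hdr)
{
    int rc = sketch_validate_marker(hdr);

    if (rc)
        return rc;
    /* ... further header parsing would go here ... */
    return 0;
}
```

Because the helper already returns an errno-style value, the caller needs only a single `if (rc) return rc;` check.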
     
  • Only unlock and d_add() new inodes after the plaintext inode size has
    been read from the lower filesystem. This fixes a race condition that
    was sometimes seen during a multi-job kernel build in an eCryptfs mount.

    https://bugzilla.kernel.org/show_bug.cgi?id=36002

    Signed-off-by: Tyler Hicks
    Reported-by: David
    Tested-by: David

    Tyler Hicks
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    arch/tile: more /proc and /sys file support

    Linus Torvalds
     
  • * 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits)
    nfsd: make local functions static
    NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()
    NFSD: Check status from nfsd4_map_bcts_dir()
    NFSD: Remove setting unused variable in nfsd_vfs_read()
    nfsd41: error out on repeated RECLAIM_COMPLETE
    nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence
    nfsd v4.1 lOCKT clientid field must be ignored
    nfsd41: add flag checking for create_session
    nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
    nfsd4: fix wrongsec handling for PUTFH + op cases
    nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
    nfsd4: introduce OPDESC helper
    nfsd4: allow fh_verify caller to skip pseudoflavor checks
    nfsd: distinguish functions of NFSD_MAY_* flags
    svcrpc: complete svsk processing on cb receive failure
    svcrpc: take advantage of tcp autotuning
    SUNRPC: Don't wait for full record to receive tcp data
    svcrpc: copy cb reply instead of pages
    svcrpc: close connection if client sends short packet
    svcrpc: note network-order types in svc_process_calldir
    ...

    Linus Torvalds
     
  • * 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    SUNRPC: Support for RPC over AF_LOCAL transports
    SUNRPC: Remove obsolete comment
    SUNRPC: Use AF_LOCAL for rpcbind upcalls
    SUNRPC: Clean up use of curly braces in switch cases
    NFS: Revert NFSROOT default mount options
    SUNRPC: Rename xs_encode_tcp_fragment_header()
    nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
    nfs41: Correct offset for LAYOUTCOMMIT
    NFS: nfs_update_inode: print current and new inode size in debug output
    NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
    NFSv4: Handle expired stateids when the lease is still valid
    SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
    Squashfs: Fix sanity check patches on big-endian systems

    Linus Torvalds
     
  • Commit 1495f230fa77 ("vmscan: change shrinker API by passing
    shrink_control struct") changed the API of ->shrink(), but missed ubifs
    and cifs instances.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Implement the pg_test vector to test for max IO sizes. We calculate
    the max_io_size member only once and cache it in the lseg, so that
    it is not recomputed on every page insert.

    Signed-off-by: Boaz Harrosh
    [simplify logic]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • By default, unless pnfs is used, coalesce pages until pg_bsize
    (rsize or wsize) is reached.

    pnfs layout drivers define their own pg_test methods that use
    pnfs_generic_pg_test and need to define their own I/O size
    limits (e.g. based on the file stripe size).

    [Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test]
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
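
The coalescing rule above can be sketched in user-space C. The struct and function names below are hypothetical stand-ins, not the kernel's types, and the stripe-based limit is only one assumed example of a layout driver's own I/O size limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page I/O descriptor. */
struct sketch_pgio {
    size_t pg_count;  /* bytes already coalesced */
    size_t pg_bsize;  /* rsize or wsize */
};

/* Default test: keep coalescing until pg_bsize would be exceeded. */
static bool sketch_generic_pg_test(const struct sketch_pgio *desc,
                                   size_t req_bytes)
{
    return desc->pg_count + req_bytes <= desc->pg_bsize;
}

/*
 * Assumed layout-driver variant: apply a stripe-size limit first,
 * then fall back to the generic size check.
 */
static bool sketch_striped_pg_test(const struct sketch_pgio *desc,
                                   size_t req_bytes, size_t stripe_size)
{
    if (desc->pg_count + req_bytes > stripe_size)
        return false;
    return sketch_generic_pg_test(desc, req_bytes);
}
```

A request is added to the current I/O only while the test returns true; otherwise the caller would flush what has been coalesced and start a new I/O.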
     
  • Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Use common code for pnfs_pageio_init_{read,write} and use
    a common generic pg_test function.

    Note that this function always assumes the layout driver's
    pg_test method is implemented.

    [Fix BUG]
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • * Define API for io-engines to report delta_space_used in IOs
    * Encode the osd-layout specific information of the layoutcommit
    XDR buffer.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • Add a layout driver method to encode the layout type specific
    opaque part of layout commit in-line in the xdr stream.

    Currently, the pnfs-objects layout driver uses it to encode metadata hints
    to the MDS and the blocks layout driver to commit provisionally allocated
    extents to the file.

    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • An io_state pre-allocates an error information structure for each
    possible osd-device that might error during IO. When the IO is done
    and all went well, the io_state is freed (as today). If the I/O ended
    with an error, the io_state is queued on a per-layout err_list. When
    encode_layoutreturn() is eventually called, each error is encoded in
    the XDR buffer, and only then is the io_state removed from err_list
    and de-allocated.

    It is up to the io_engine to fill in the segment that faulted and the
    type of osd_error that occurred, by calling objlayout_io_set_result()
    for each failing device.

    In objio_osd:
    * Allocate io-error descriptors space as part of io_state
    * Use generic objlayout error reporting at end of io.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • Add a layout driver method to encode the layout type specific
    opaque part of layout return in-line in the xdr stream.

    Currently the pnfs-objects layout driver uses it to encode i/o error
    information on LAYOUTRETURN.

    Signed-off-by: Andy Adamson
    [fixup layout header pointer for encode_layoutreturn]
    Signed-off-by: Benny Halevy

    Andy Adamson
     
  • With the objects layout security model, we have object capabilities
    that are associated with the layout and we anticipate that the server
    will issue a cb_layoutrecall for any setattr that changes security
    related attributes (user/group/mode/acl) or truncates the file.

    Therefore, the layout is returned before issuing the setattr to avoid
    the anticipated cb_layoutrecall.

    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • NFSv4.1 LAYOUTRETURN implementation

    This currently does not support layout-type payload encoding.

    Signed-off-by: Alexandros Batsakis
    Signed-off-by: Andy Adamson
    Signed-off-by: Andy Adamson
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Fred Isaman
    Signed-off-by: Marc Eshel
    Signed-off-by: Zhang Jingwang
    [call pnfs_return_layout right before pnfs_destroy_layout]
    [remove assert_spin_locked from pnfs_clear_lseg_list]
    [remove wait parameter from the layoutreturn path.]
    [remove return_type field from nfs4_layoutreturn_args]
    [remove range from nfs4_layoutreturn_args]
    [no need to send layoutcommit from _pnfs_return_layout]
    [don't wait on sync layoutreturn]
    [fix layout stateid in layoutreturn args]
    [fixed NULL deref in _pnfs_return_layout]
    [removed reclaim member of nfs4_layoutreturn_args]
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Using the in-kernel osd library, implement read/write
    of data from/to osd-objects according to information specified
    in the objects-layout.

    Support for striping over mirrors with a received stripe_unit.
    There are, however, a few constraints; layouts that violate them
    are not supported:
    1. Stripe Unit must be a multiple of PAGE_SIZE
    2. Stripe length (stripe_unit * number_of_stripes) cannot be
    bigger than 32 bits.

    Also support raid-groups and partial-layout. Partial-layout is
    when not all the groups are received on the line, addressing
    only a partial range of the file.

    TODO:
    Only raid0! raid 4/5/6 support will come at a later stage

    An unsupported layout will send IO through the MDS

    [Important fallout from the last rebase]
    Signed-off-by: Boaz Harrosh
    [gfp_flags]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
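
The two striping constraints listed above can be sketched as a validation helper. This is a user-space illustration under assumptions: the function name, the fixed 4096-byte page size, and the choice of -EINVAL as the error code are hypothetical and not taken from the driver.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Returns 0 if the striping parameters satisfy both constraints,
 * -EINVAL otherwise (a caller would then fall back to MDS I/O).
 */
static int sketch_check_stripe(uint64_t stripe_unit, uint32_t num_stripes)
{
    /* 1. Stripe unit must be a non-zero multiple of the page size. */
    if (stripe_unit == 0 || stripe_unit % SKETCH_PAGE_SIZE != 0)
        return -EINVAL;

    /*
     * 2. stripe_unit * num_stripes must fit in 32 bits. Dividing
     *    instead of multiplying avoids overflowing the product.
     */
    if (num_stripes != 0 && stripe_unit > UINT32_MAX / num_stripes)
        return -EINVAL;

    return 0;
}
```

For example, a 64 KiB stripe unit with 65535 stripes passes, while the same unit with 65536 stripes yields a stripe length of exactly 2^32 bytes and is rejected.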
     
  • Non-rpc layout drivers, such as those for objects and blocks,
    implement their own I/O path and error handling logic.
    Therefore, bypass NFS-based error handling for these layout drivers.

    [fix lseg ref-count bugs, and null de-refs]
    [Fall out from: non-rpc layout drivers]
    Signed-off-by: Boaz Harrosh
    [get rid of PNFS_USE_RPC_CODE]
    [get rid of __nfs4_write_done_cb]
    [revert useless change in nfs4_write_done_cb]
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • allocate and deallocate per-inode private pnfs_layout_hdr
    in preparation for I/O implementation.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • [gfp_flags]
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • When a new layout is received in objio_alloc_lseg, all referenced
    device_ids are retrieved. The device information is queried from the
    MDS and then the osd_device is looked up in the osd-initiator library.
    The devices are cached in a per-mount-point list for later use. At
    unmount all devices are "put" back to the library.

    objlayout_get_deviceinfo() and objlayout_put_deviceinfo() are the
    middleware API for retrieving device information given a device_id.

    TODO: The device cache can get big. Cap its size. Keep an LRU and start
    to return devices which were not used when the list gets too big, or
    when allocation of new entries fails.

    [pnfs-obj: Bugs in new global-device-cache code]
    Signed-off-by: Boaz Harrosh
    [gfp_flags]
    [use global device cache]
    [use layout driver in global device cache]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • objlayout_alloc_lseg prepares an xdr_stream and calls the
    raid engine's objio_alloc_lseg() to allocate a private
    pnfs_layout_segment.

    objio_osd.c::objio_alloc_lseg() uses the passed xdr_stream to
    decode and store the layout_segment information in an
    objio_segment struct, using the pnfs_osd_xdr.h API for
    the actual parsing of the layout xdr.

    objlayout_free_lseg calls objio_free_lseg() to free the
    allocated space.

    Signed-off-by: Boaz Harrosh
    [gfp_flags]
    [removed "extern" from function definitions]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • * Add the fs/nfs/objlayout/pnfs_osd_xdr_cli.c file, which will
    include the XDR encode/decode implementations for the pNFS
    client objlayout driver.

    [Wrong type in comments]
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • * Define the PNFS_OBJLAYOUT Kconfig option in the nfs
    master Kconfig file.
    * Add the objlayout driver to the Kernel's Kbuild system.
    * Add the fs/nfs/objlayout/Kbuild file for building the
    objlayoutdriver.ko driver
    * Define fs/nfs/objlayout/objio_osd.c, register the driver on module
    initialization and unregister on exit.

    [pnfs-obj: remove of CONFIG_PNFS fallout]
    Signed-off-by: Boaz Harrosh
    [added "unsure" clause]
    [depend on NFS_V4_1]
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • A pNFS client auto-negotiates a lot of features (minorversion level,
    pNFS layout type, etc.). This is convenient, but makes certain kinds of
    failures hard for a user to detect.

    For example, if the client falls back on 4.0, or falls back to MDS IO
    because the user didn't connect to the right iscsi disks before
    mounting, the only symptoms may be reduced performance, which may not be
    noticed till long after the actual failure, and may be difficult for a
    user to diagnose.

    However, such "failures" may also be perfectly normal in some cases, so
    we don't want to spam the system logs with them.

    One approach would be to put some more information into
    /proc/self/mountstats.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Benny Halevy
    [pnfs: add commit client stats]
    [fixup data types for "ret" variables in pnfs_try_to* inline funcs.]
    Signed-off-by: Benny Halevy
    [fix definition of show_pnfs for !CONFIG_PNFS]
    Signed-off-by: Benny Halevy
    [nfs41: Fix show_sessions in the not CONFIG_NFS_V4_1 case]
    There is a build error when CONFIG_NFS_V4 is set but
    CONFIG_NFS_V4_1 is *not* set. show_sessions() prototype
    was unbalanced between the two cases.
    Signed-off-by: Boaz Harrosh
    [pnfs: super.c remove CONFIG_PNFS]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy

    J. Bruce Fields
     
  • Use recalled range to invalidate particular layout segments in the layout cache.

    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Add offset and count parameters to pnfs_update_layout and use them to get
    the layout in the pageio path.

    Order cache layout segments in the following order:
    * offset (ascending)
    * length (descending)
    * iomode (RW before READ)

    Test the byte range against the layout segment in use in pnfs_{read,write}_pg_test
    so as not to coalesce pages that do not use the same layout segment.

    [fix lseg ordering]
    [clean up pnfs_find_lseg lseg arg]
    [remove unnecessary FIXME]
    [fix ordering in pnfs_insert_layout]
    [clean up pnfs_insert_layout]
    Signed-off-by: Benny Halevy

    Benny Halevy
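
The cache ordering listed above (offset ascending, then length descending, then RW before READ) can be sketched as a comparison function. The struct, enum, and function names are hypothetical, and a `qsort()`-style comparator is used only for illustration; the commit itself inserts segments into an ordered list.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; the values are arbitrary for the sketch. */
enum sketch_iomode { SKETCH_IOMODE_READ = 1, SKETCH_IOMODE_RW = 2 };

struct sketch_lseg_range {
    uint64_t offset;
    uint64_t length;
    enum sketch_iomode iomode;
};

/* Negative result means *pa sorts before *pb. */
static int sketch_lseg_cmp(const void *pa, const void *pb)
{
    const struct sketch_lseg_range *a = pa;
    const struct sketch_lseg_range *b = pb;

    if (a->offset != b->offset)            /* offset: ascending */
        return a->offset < b->offset ? -1 : 1;
    if (a->length != b->length)            /* length: descending */
        return a->length > b->length ? -1 : 1;
    if (a->iomode != b->iomode)            /* iomode: RW before READ */
        return a->iomode == SKETCH_IOMODE_RW ? -1 : 1;
    return 0;
}
```

With this ordering, a lookup that walks the list from the front meets the longest, most permissive segment covering a given offset first.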
     
  • Initialize xdr_stream and xdr_buf using an array of page pointers
    and length of buffer.

    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • pnfs deviceids are unique per server, per layout type.
    struct nfs_client is currently used to distinguish deviceids from
    different nfs servers, yet these may clash between different layout
    types on the same server. Therefore, use the layout driver associated
    with each deviceid at insertion time to look it up, unhash, or
    delete it.

    Signed-off-by: Benny Halevy

    Benny Halevy
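
The lookup rule described above, where a deviceid is qualified by both the server and the layout driver, can be sketched as follows. All type and function names are hypothetical stand-ins, and a simple linked list replaces whatever cache structure the kernel uses.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_DEVICEID_SIZE 16

/* Hypothetical stand-ins for the layout driver and nfs_client. */
struct sketch_layoutdriver { const char *name; };
struct sketch_client { const char *hostname; };

struct sketch_deviceid_node {
    const struct sketch_layoutdriver *ld;   /* recorded at insertion */
    const struct sketch_client *client;
    unsigned char id[SKETCH_DEVICEID_SIZE];
    struct sketch_deviceid_node *next;
};

/* A node matches only if driver, client, and id bytes all match. */
static struct sketch_deviceid_node *
sketch_lookup_deviceid(struct sketch_deviceid_node *head,
                       const struct sketch_layoutdriver *ld,
                       const struct sketch_client *client,
                       const unsigned char *id)
{
    struct sketch_deviceid_node *d;

    for (d = head; d != NULL; d = d->next) {
        if (d->ld == ld && d->client == client &&
            memcmp(d->id, id, SKETCH_DEVICEID_SIZE) == 0)
            return d;
    }
    return NULL;
}
```

Two layout types on the same server can then hold identical deviceid bytes without clashing, because the driver pointer is part of the key.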
     
  • Note: This functionality is incomplete, as all layout segments referring to
    the 'to be removed device id' need to be reaped, and all in-flight I/O drained.

    [use be32 res in nfs4_callback_devicenotify]
    [use nfs_client to qualify deviceid for cb_notify_deviceid]
    [use global deviceid cache for CB_NOTIFY_DEVICEID]
    [refactor device cache _lookup_deviceid]
    [refactor device cache _find_get_deviceid]
    Signed-off-by: Benny Halevy
    [Bug in new global-device-cache code]
    [layout_driver MUST set free_deviceid_node if using dev-cache]
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Marc Eshel
     
  • The eCryptfs inode get, initialization, and dentry interposition code
    has two separate paths. One is for when dentry interposition is needed
    after doing things like a mkdir in the lower filesystem and the other
    is needed after a lookup. Unlocking new inodes and doing a d_add() needs
    to happen at different times, depending on which type of dentry
    interposing is being done.

    This patch cleans up the inode get and initialization code paths and
    splits them up so that the locking and d_add() differences mentioned
    above can be handled appropriately in a later patch.

    Signed-off-by: Tyler Hicks
    Tested-by: David

    Tyler Hicks
     
  • Use the pnfs_layoutdriver_type both as a qualifier for the deviceid,
    distinguishing deviceid from different layout types on the server,
    and for freeing the layout-driver allocated structure containing the
    nfs4_deviceid_node.

    [BUG in _deviceid_purge_client]
    [layout_driver MUST set free_deviceid_node if using dev-cache]
    [let ver < 4.1 compile]
    Signed-off-by: Boaz Harrosh
    [removed EXPORT_SYMBOL_GPL(nfs4_deviceid_purge_client)]
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • These functions should live in inode.c since their focus is on inodes
    and they're primarily used by functions in inode.c.

    Also does a simple cleanup of ecryptfs_inode_test() and rolls
    ecryptfs_init_inode() into ecryptfs_inode_set().

    Signed-off-by: Tyler Hicks
    Tested-by: David

    Tyler Hicks