06 Jan, 2012

2 commits

  • As mandated by the standard, in case of an IO error a pNFS
    objects layout driver must return its layout. This is because
    all device errors are reported to the server as part of the
    layout return buffer.

    This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
    is done, through a bit flag on the pnfs_layoutdriver_type->flags
    member. The flag is set by the layout driver that wants a
    layout_return performed at pnfs_ld_{write,read}_done in case
    of an error.
    (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
    because this code is never called outside of pnfs.c and pnfs IO
    paths)

    Without this patch 3.[0-2] Kernels leak memory and have an annoying
    WARN_ON after every IO error utilizing the pnfs-obj driver.

    [This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
    CC: Stable Tree
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
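
The flag mechanism described in this commit can be sketched as below. This is a minimal user-space illustration, not the kernel code: the `sketch_` types, the flag names, and the helper function are all hypothetical stand-ins for the `pnfs_layoutdriver_type->flags` bit and the check made on the IO-done path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the layout-driver flag bits (not the kernel's definitions). */
enum sketch_ld_flags {
	SKETCH_LAYOUTRET_ON_SETATTR = 1 << 0,
	SKETCH_LAYOUTRET_ON_ERROR   = 1 << 1,
};

/* Hypothetical stand-in for a layout driver type with a flags member. */
struct sketch_layoutdriver {
	const char *name;
	unsigned int flags;
};

/*
 * True when the IO-done path should return the layout: the IO failed
 * and the driver asked for a layout return on error.
 */
static bool sketch_layoutret_on_error(const struct sketch_layoutdriver *ld,
				      int io_error)
{
	assert(ld != NULL);
	return io_error != 0 && (ld->flags & SKETCH_LAYOUTRET_ON_ERROR) != 0;
}
```

A driver that sets the bit gets a layout return on a failed IO; a driver that leaves it clear keeps its layout.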
     
  • Some time along the way pNFS IO errors were switched to
    communicate with a special iodata->pnfs_error member instead
    of the regular RPC members. But objlayout was not switched
    over.

    Fix that!
    Without this fix any IO error hangs, because IO is not
    switched to MDS and pages are never cleared or read.

    [Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels]
    CC: Stable Tree
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
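
A rough sketch of the error hand-off this commit describes, assuming a simplified model: the struct, its field names other than `pnfs_error`, and both functions are hypothetical and only illustrate why the layout driver has to store its status where the generic layer looks for it.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_next_step { SKETCH_IO_COMPLETE, SKETCH_RESEND_VIA_MDS };

/* Hypothetical stand-in for the per-IO data shared with the generic layer. */
struct sketch_iodata {
	int pnfs_error;  /* 0 on success, negative errno-style value on failure */
	int rpc_status;  /* stand-in for the regular RPC members */
};

/* Layout-driver side: record the outcome in pnfs_error, not in the RPC member. */
static void sketch_ld_io_done(struct sketch_iodata *data, int status)
{
	assert(data != NULL);
	data->pnfs_error = status;
}

/* Generic side (assumed behavior): a non-zero pnfs_error redirects the IO to the MDS. */
static enum sketch_next_step sketch_generic_done(const struct sketch_iodata *data)
{
	assert(data != NULL);
	return data->pnfs_error ? SKETCH_RESEND_VIA_MDS : SKETCH_IO_COMPLETE;
}
```

If the driver wrote its status only to `rpc_status`, the generic side in this model would never see the failure, which matches the hang described above.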
     

03 Nov, 2011

8 commits

  • The ORE needs an r4w_get_page/r4w_put_page API supplied
    by the filesystem so it can get cache pages to read into when
    writing partial stripes.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • Finally remove all the old raid engine, which is by now
    dead code.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • In this patch we are actually moving to the ORE.
    (Object Raid Engine).

    objio_state holds a pointer to an ore_io_state. Once
    we have an ore_io_state at hand we can call the ore
    for reading/writing. We register on the done path
    to kick off the nfs io_done mechanism.

    Again for Ease of reviewing the old code is "#if 0"
    but is not removed so the diff command works better.
    The old code will be removed in the next patch.

    fs/exofs/Kconfig::ORE is modified to also be auto-included
    if PNFS_OBJLAYOUT is set, since we now depend on ORE.
    (See comments in fs/exofs/Kconfig)

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • For Ease of reviewing I split the move to ore into 3 parts
    move to ore 01: ore_layout & ore_components
    move to ore 02: move to ORE
    move to ore 03: Remove old raid engine

    This patch modifies the objio_lseg, layout-segment level
    and devices and components arrays to use the ORE types.

    Though it will be removed soon, also the raid engine
    is modified to actually compile, possibly run, with
    the new types. So it is the same old raid engine but
    with some new ORE types.

    For Ease of reviewing, some of the old code is
    "#if 0" but is not removed so the diff command works
    better. The old code will be removed in the 3rd patch.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • * All instances of objlayout_io_state => objlayout_io_res
    * All instances of state => oir;
    * All instances of ol_state => oir;

    Big but nothing to it

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • This is part of moving objio_osd to use the ORE.

    objlayout_io_state had two functions:
    1. It was used in the error reporting mechanism at layout_return.
    This function is kept intact.
    (Later patch will rename objlayout_io_state => objlayout_io_res)
    2. Carrier of rw io members into the objio_read/write_pagelist API.
    This is removed in this patch.

    The {r,w}data received from NFS are passed directly to the
    objio_{read,write}_pagelist API. The io_engine is now allocating
    its own IO state as part of the read/write. The minimal
    functionality that was part of the generic allocation is passed
    to the io_engine.

    So part of this patch is rename of:
    ios->ol_state.foo => ios->foo

    At objlayout_{read,write}_done an objlayout_io_state is passed that
    denotes the result of the IO. (Hence the later name change).
    If the IO is successful objlayout calls an objio_free_result() API
    immediately (Which for objio_osd causes the release of the io_state).
    If the IO ended in an error it is held onto until reported in
    layout_return and is released later through the objio_free_result()
    API. (All this is not new just renamed and cleaned)

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • The objlayout driver was always returning PNFS_ATTEMPTED from its
    read/write_pagelist operations, even on error. Fix that.

    Start by establishing an error return API from the io-engine:
    instead of returning ssize_t (length-or-error), return "int",
    where 0 means OK and a negative value means error. And clean up
    all return types in the io-engine.

    Then, if the io-engine returned an error, return PNFS_NOT_ATTEMPTED
    to the generic layer. (With a dprint)

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
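
The return-code mapping in this commit might look roughly like the sketch below. The enum values, the function-pointer type, and the two toy engines are hypothetical; only the rule (the engine returns 0 or a negative error, and an error becomes "not attempted") comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for PNFS_ATTEMPTED / PNFS_NOT_ATTEMPTED. */
enum sketch_try_status { SKETCH_ATTEMPTED = 0, SKETCH_NOT_ATTEMPTED = 1 };

/* Hypothetical io-engine entry point: returns 0 on success, negative on error. */
typedef int (*sketch_engine_fn)(void *io);

/* Wrapper in the role of a read/write_pagelist operation. */
static enum sketch_try_status sketch_pagelist(sketch_engine_fn engine, void *io)
{
	int err;

	assert(engine != NULL);
	err = engine(io);
	if (err < 0) {
		/* Stand-in for the debug print mentioned in the commit. */
		fprintf(stderr, "sketch: io-engine failed, err=%d\n", err);
		return SKETCH_NOT_ATTEMPTED;
	}
	return SKETCH_ATTEMPTED;
}

/* Two toy engines used only to exercise the wrapper. */
static int sketch_engine_ok(void *io)   { (void)io; return 0; }
static int sketch_engine_fail(void *io) { (void)io; return -12; }
```

Separating "length" from "error" this way means the caller never has to guess whether a returned value is a byte count or a failure code.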
     
  • The EOF calculation was done on .read_pagelist(), cached
    in objlayout_io_state->eof, and set in objlayout_read_done()
    into nfs_read_data->res.eof.

    So set it directly into nfs_read_data->res.eof and avoid
    the extra member.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     

04 Aug, 2011

2 commits

  • There were bugs in the case of partial layout where olo_comp_index
    is not zero. This used to work and was tested but one of the later
    cleanup SQUASHMEs broke it and was not tested since.

    Also add a dprint that specifies those received layout parameters.
    Everything else was already printed.

    [Needed in v3.0]
    CC: Stable Tree
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • When the number of pages we want to encode is bigger than
    the size of the bio (which can currently happen only when all
    IO is going to a single device, e.g. group_width==1), the IO
    is submitted short and we report back only the number of bytes
    we actually wrote/read, and all is fine. BUT ...

    There was a bug where the current length counter was advanced
    before the failed attempt to add the extra page, so the CDB
    length ended up one page longer than the actual bio size,
    which is of course rejected by the osd-target.

    While here, also fix the bio size calculation in the case
    that we received more than one group of devices.

    CC: Stable Tree
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
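
The ordering bug described in this commit can be shown with a small model. Everything here is hypothetical (a toy bio with a fixed page capacity and a fixed 4096-byte page size); the point is only that the length counter advances after a page was successfully added, so the reported length always matches what the bio holds.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the sketch */

/* Toy stand-in for a bio with limited capacity. */
struct sketch_bio {
	unsigned int max_pages;
	unsigned int nr_pages;
};

/* Returns the bytes added: SKETCH_PAGE_SIZE on success, 0 when the bio is full. */
static unsigned int sketch_bio_add_page(struct sketch_bio *bio)
{
	if (bio->nr_pages >= bio->max_pages)
		return 0;
	bio->nr_pages++;
	return SKETCH_PAGE_SIZE;
}

/* Returns the length to place in the command: only bytes the bio really holds. */
static size_t sketch_fill_bio(struct sketch_bio *bio, unsigned int want_pages)
{
	size_t length = 0;
	unsigned int i;

	assert(bio != NULL);
	for (i = 0; i < want_pages; i++) {
		unsigned int added = sketch_bio_add_page(bio);

		if (added == 0)
			break;       /* bio is full: submit a short IO */
		length += added;     /* advance only after a successful add */
	}
	return length;
}
```

Asking for three pages from a two-page bio yields a length of exactly two pages. Moving `length += ...` above the add call would reproduce the off-by-one-page mismatch.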
     

16 Jul, 2011

1 commit


15 Jul, 2011

2 commits


13 Jul, 2011

3 commits


21 Jun, 2011

1 commit

  • 1. If the intention is to coalesce requests 'prev' and 'req' then we
    have to ensure at least that we have a layout starting at
    req_offset(prev).

    2. If we're only requesting a minimal layout of length desc->pg_count,
    we need to test the length actually returned by the server before
    we allow the coalescing to occur.

    3. We need to deal correctly with (pgio->lseg == NULL)

    4. Fixup the test guarding the pnfs_update_layout.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
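
The coalescing conditions listed in this commit can be read as a range check, sketched below under simplifying assumptions. The struct and function are hypothetical, and the handling of a missing segment is a guess: the commit only says the NULL case must be dealt with correctly, so this sketch conservatively refuses to coalesce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the byte range covered by a layout segment. */
struct sketch_lseg {
	uint64_t offset;
	uint64_t length;
};

/*
 * May 'req' be coalesced with 'prev' under this segment?
 * - no segment: refuse (assumption made for this sketch only);
 * - the segment must start at or before prev's offset;
 * - the length actually granted must cover the end of req.
 */
static bool sketch_can_coalesce(const struct sketch_lseg *lseg,
				uint64_t prev_offset,
				uint64_t req_offset, uint64_t req_len)
{
	uint64_t end;

	if (lseg == NULL)
		return false;
	if (lseg->offset > prev_offset)
		return false;
	end = lseg->offset + lseg->length;
	if (end < lseg->offset)         /* overflow: treat as "to end of file" */
		end = UINT64_MAX;
	return req_len <= end && req_offset <= end - req_len;
}
```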
     

20 Jun, 2011

1 commit

  • Andy's last device_cache patches already take an extra
    reference on the newly inserted device_id. So we can remove it
    from obj-io.

    Without this patch the device_ids are leaked.

    Andy's patches are not in Linus tree yet. So I'm not sure if they are
    scheduled for this Kernel or the next. This patch should be added as
    part of these.

    CC: Andy Adamson
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     

15 Jun, 2011

1 commit

  • (d)printks should use %zd for ssize_t arguments not %ld, otherwise they might
    get a warning. I see the following with MN10300.

    fs/nfs/objlayout/objlayout.c: In function 'objlayout_read_done':
    fs/nfs/objlayout/objlayout.c:294: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'

    Signed-off-by: David Howells
    cc: Trond Myklebust
    cc: linux-nfs@vger.kernel.org
    Signed-off-by: Trond Myklebust

    David Howells
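
A small user-space illustration of the format-specifier point in this commit. The helper name is hypothetical; `snprintf` and the `z` length modifier are standard C99, and `ssize_t` comes from POSIX `<sys/types.h>`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/*
 * Formats a ssize_t with %zd. Using %ld instead triggers a format warning
 * on targets where ssize_t is not 'long int'.
 */
static int sketch_format_status(char *buf, size_t len, ssize_t status)
{
	assert(buf != NULL);
	return snprintf(buf, len, "status=%zd", status);
}
```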
     

30 May, 2011

12 commits

  • Implement pg_test vector to test for max IO sizes. We calculate
    a max_io_size member only once, and cache it in lseg so as not
    to recompute it on every page insert.

    Signed-off-by: Boaz Harrosh
    [simplify logic]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
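
The caching idea in this commit could be sketched as follows. The structs and function names are hypothetical, and how `max_io_size` is derived from the layout is not shown because the commit message does not describe it; the sketch only illustrates computing the limit once per segment and doing a cheap comparison per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout segment carrying a limit computed once at creation. */
struct sketch_lseg {
	uint64_t max_io_size;
};

/* Hypothetical page-IO descriptor: bytes queued so far under this segment. */
struct sketch_pageio {
	const struct sketch_lseg *lseg;
	uint64_t count;
};

/* Per-page test: a single comparison against the cached limit. */
static bool sketch_pg_test(const struct sketch_pageio *pgio, uint64_t req_bytes)
{
	assert(pgio != NULL && pgio->lseg != NULL);
	return pgio->count + req_bytes <= pgio->lseg->max_io_size;
}
```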
     
  • Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Signed-off-by: Benny Halevy

    Benny Halevy
     
  • * Define API for io-engines to report delta_space_used in IOs
    * Encode the osd-layout specific information of the layoutcommit
    XDR buffer.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • An io_state pre-allocates an error information structure for each
    possible osd-device that might error during IO. When IO is done if all
    was well the io_state is freed. (as today). If the I/O has ended with an
    error, the io_state is queued on a per-layout err_list. When eventually
    encode_layoutreturn() is called, each error is properly encoded on the
    XDR buffer and only then the io_state is removed from err_list and
    de-allocated.

    It is up to the io_engine to fill in the segment that faulted and the
    type of osd_error that occurred, by calling objlayout_io_set_result()
    for each failing device.

    In objio_osd:
    * Allocate io-error descriptors space as part of io_state
    * Use generic objlayout error reporting at end of io.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
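
The lifetime rule described in this commit (free on success, queue on error, release only once the error has been encoded) can be modeled in user space as below. All names are hypothetical, the per-layout list is a plain singly linked list, and "encoding" is reduced to copying error codes into an array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical IO state carrying one error code (0 means success). */
struct sketch_io_state {
	int osd_error;
	struct sketch_io_state *next;
};

/* Hypothetical per-layout container holding the pending error list. */
struct sketch_layout {
	struct sketch_io_state *err_list;
};

static struct sketch_io_state *sketch_io_alloc(void)
{
	return calloc(1, sizeof(struct sketch_io_state));
}

/* IO completion: free on success, otherwise keep the state for later reporting. */
static void sketch_io_done(struct sketch_layout *lo, struct sketch_io_state *ios)
{
	if (ios->osd_error == 0) {
		free(ios);
		return;
	}
	ios->next = lo->err_list;
	lo->err_list = ios;
}

/*
 * Stand-in for encode_layoutreturn(): copies up to 'max' errors into 'out',
 * then removes and frees every queued state. Returns the number copied;
 * errors beyond 'max' are dropped in this sketch.
 */
static size_t sketch_encode_layoutreturn(struct sketch_layout *lo,
					 int *out, size_t max)
{
	size_t n = 0;

	while (lo->err_list != NULL) {
		struct sketch_io_state *ios = lo->err_list;

		lo->err_list = ios->next;
		if (n < max)
			out[n++] = ios->osd_error;
		free(ios);
	}
	return n;
}
```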
     
  • With the objects layout security model, we have object capabilities
    that are associated with the layout and we anticipate that the server
    will issue a cb_layoutrecall for any setattr that changes security
    related attributes (user/group/mode/acl) or truncates the file.

    Therefore, the layout is returned before issuing the setattr to avoid
    the anticipated cb_layoutrecall.

    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Using the in-kernel osd library, implement read/write
    of data from/to osd-objects according to information specified
    in the objects-layout.

    Support for striping over mirrors with a received stripe_unit.
    There are however a few constraints:
    1. Stripe Unit must be a multiple of PAGE_SIZE
    2. stripe length (stripe_unit * number_of_stripes) can not be
    bigger than 32bit.

    Also support raid-groups and partial-layout. Partial-layout is
    when not all the groups are received on the line, addressing
    only a partial range of the file.

    TODO:
    Only raid0! raid 4/5/6 support will come at a later stage

    An unsupported layout will send IO through the MDS

    [Important fallout from the last rebase]
    Signed-off-by: Boaz Harrosh
    [gfp_flags]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
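
The two layout constraints listed in this commit can be expressed as a small validation routine. This is a sketch only: the function name is hypothetical, the page size is fixed at 4096 for illustration, and rejecting zero values is an added assumption not stated in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the sketch */

/*
 * Returns true when a striping layout satisfies both constraints:
 *   1. stripe_unit is a multiple of the page size;
 *   2. stripe_unit * number_of_stripes fits in 32 bits.
 * A layout failing the check would have its IO sent through the MDS.
 */
static bool sketch_layout_supported(uint64_t stripe_unit,
				    uint64_t number_of_stripes)
{
	/* Assumption: zero values are treated as unsupported. */
	if (stripe_unit == 0 || number_of_stripes == 0)
		return false;
	if (stripe_unit % SKETCH_PAGE_SIZE != 0)
		return false;
	/* Division form avoids overflowing the 64-bit multiplication. */
	return stripe_unit <= UINT32_MAX / number_of_stripes;
}
```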
     
  • allocate and deallocate per-inode private pnfs_layout_hdr
    in preparation for I/O implementation.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • When a new layout is received in objio_alloc_lseg all device_ids
    referenced are retrieved. The device information is queried for from MDS
    and then the osd_device is looked-up from the osd-initiator library. The
    devices are cached in a per-mount-point list, for later use. At unmount
    all devices are "put" back to the library.

    objlayout_get_deviceinfo(), objlayout_put_deviceinfo() middleware
    API for retrieving device information given a device_id.

    TODO: The device cache can get big. Cap its size. Keep an LRU and start
    to return devices which were not used, when the list gets too big, or
    when allocation of new entries fails.

    [pnfs-obj: Bugs in new global-device-cache code]
    Signed-off-by: Boaz Harrosh
    [gfp_flags]
    [use global device cache]
    [use layout driver in global device cache]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • objlayout_alloc_lseg prepares an xdr_stream and calls the
    raid engine's objio_alloc_lseg() to allocate a private
    pnfs_layout_segment.

    objio_osd.c::objio_alloc_lseg() uses the passed xdr_stream to
    decode and store the layout_segment information in an
    objio_segment struct, using the pnfs_osd_xdr.h API for
    the actual parsing of the layout xdr.

    objlayout_free_lseg calls objio_free_lseg() to free the
    allocated space.

    Signed-off-by: Boaz Harrosh
    [gfp_flags]
    [removed "extern" from function definitions]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • * Add the fs/nfs/objlayout/pnfs_osd_xdr_cli.c file, which will
    include the XDR encode/decode implementations for the pNFS
    client objlayout driver.

    [Wrong type in comments]
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Boaz Harrosh
     
  • * Define the PNFS_OBJLAYOUT Kconfig option in the nfs
    master Kconfig file.
    * Add the objlayout driver to the Kernel's Kbuild system.
    * Add the fs/nfs/objlayout/Kbuild file for building the
    objlayoutdriver.ko driver
    * Define fs/nfs/objlayout/objio_osd.c, register the driver on module
    initialization and unregister on exit.

    [pnfs-obj: remove of CONFIG_PNFS fallout]
    Signed-off-by: Boaz Harrosh
    [added "unsure" clause]
    [depend on NFS_V4_1]
    Signed-off-by: Benny Halevy

    Benny Halevy