10 Nov, 2006

2 commits

  • XMSTATE_SOL_HDR could be set when the xmit thread tests it, but there may
    not be anything on the r2tqueue yet. Move the XMSTATE_SOL_HDR set
    before the addition to the queue to make sure that when we pull something
    off it, it is valid. This does not add locks around the xmstate test or make
    that an atomic_t because this is a fast path and if it is set when we test it
    we can handle it there without the overhead. Later on we check the xmitqueue
    for all requests with the session lock so we will not miss it.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • Unconditionally free crypto state, as it is always allocated during
    TCP connection creation. Without this, crypto structures leak and
    crc32c module refcounts grow as connections are created and
    destroyed.

    Signed-off-by: Pete Wyckoff
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Pete Wyckoff
     

24 Sep, 2006

1 commit


21 Sep, 2006

1 commit


03 Sep, 2006

7 commits

  • When a digest is spread across two network buffers, we currently
    ignore this and try to check the digest with the partial buffer.
    Of course this fails. This patch has us use iscsi_tcp_copy to
    copy the whole digest before testing it.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
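
A minimal userspace sketch of the idea in the commit above, not the driver's actual code: collect the 4-byte digest across however many buffers it arrives in, and compare only once every byte is present. The names `digest_acc`, `digest_feed`, `digest_complete` and `digest_matches` are hypothetical; the commit itself only says that iscsi_tcp_copy is used for the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 4 /* iSCSI header and data digests are 4-byte CRC32C values */

/* Hypothetical accumulator for a digest that may arrive in pieces. */
struct digest_acc {
    uint8_t bytes[DIGEST_LEN];
    size_t have; /* number of digest bytes collected so far */
};

/*
 * Copy as many of the still-missing digest bytes as buf can supply.
 * Returns the number of bytes consumed from buf.
 */
static size_t digest_feed(struct digest_acc *acc, const uint8_t *buf, size_t len)
{
    size_t want = DIGEST_LEN - acc->have;
    size_t n = len < want ? len : want;

    memcpy(acc->bytes + acc->have, buf, n);
    acc->have += n;
    return n;
}

/* Returns 1 once all DIGEST_LEN bytes have been collected, else 0. */
static int digest_complete(const struct digest_acc *acc)
{
    return acc->have == DIGEST_LEN;
}

/* Compare against the computed digest; only valid on a complete digest. */
static int digest_matches(const struct digest_acc *acc,
                          const uint8_t expected[DIGEST_LEN])
{
    assert(digest_complete(acc));
    return memcmp(acc->bytes, expected, DIGEST_LEN) == 0;
}
```

A caller would feed each incoming buffer to `digest_feed` and skip the comparison until `digest_complete` returns 1, which avoids testing a partial digest.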
     
  • When we relogin to a target, we have not yet negotiated digests
    so we must reset the hdr_size var.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • This patch, built over the last ones, fixes a bug in the partial header
    resend code, where we add on another 4 bytes to the send length on the resend.
    We want just the header plus digest.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • We currently allocate separate tfms for data and header digests. There
    is no reason for this since we can never calculate an rx header and
    data digest at the same time. Same for sends. So this patch removes the data
    tfms and has the send and recv sides use the rx_tfm or tx_tfm.

    I also made the connection creation code preallocate the tfms because I
    thought I hit a bug where I changed the digests settings during a
    relogin but could not allocate the tfm and then we just failed.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • iscsi_tcp calculates padding by using the expected transfer length. This
    has the problem where if we have immediate data = no and initial R2T =
    yes, and the transfer length ended up needing padding then we send:

    1. header
    2. padding which should have gone after data
    3. data

    Besides this bug, we also assume the target will always ask for nice
    transfer lengths and the first burst length will always be a nice value.
    As far as I can tell from the RFC this is not a requirement. It would be
    silly to do this, but if someone did it we will end up doing bad things.

    Finally the last bug in that bit of code is in our handling of the
    recalculation of data digests when we do not send a whole iscsi_buf in
    one try. The bug here is that we call crypto_digest_final on an
    iscsi_sendpage error, then when we send the rest of the iscsi_buf, we
    do iscsi_data_digest_init and this causes the previous data digest to be
    lost.

    And to make matters worse, some of these bugs are replicated over and
    over and over again for immediate data, solicited data and unsolicited
    data. So the attached patch made over the iscsi git tree (see
    kernel.org/git for details) which I updated today to include the patches
    I said I merged, consolidates the sending of data, padding and digests
    and calculation of data digests and fixes the above bugs.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
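
A minimal sketch of the padding rule described in the commit above, assuming the padding is derived from the data actually carried in each PDU rather than from the command's expected transfer length. `iscsi_pad_bytes`, `pdu_layout` and `pdu_plan` are hypothetical names for illustration, and the 48-byte header length in the usage note is the basic header segment size with no digests or additional header segments.

```c
#include <assert.h>
#include <stddef.h>

#define ISCSI_PAD_LEN 4 /* data segments are padded to a 4-byte boundary */

/* Padding bytes needed after a data segment of seg_len bytes (0..3). */
static size_t iscsi_pad_bytes(size_t seg_len)
{
    return (ISCSI_PAD_LEN - (seg_len % ISCSI_PAD_LEN)) % ISCSI_PAD_LEN;
}

/* Hypothetical description of one PDU, fields listed in send order. */
struct pdu_layout {
    size_t hdr_len;  /* 1. header */
    size_t data_len; /* 2. data carried by this PDU */
    size_t pad_len;  /* 3. padding, always after the data */
};

static struct pdu_layout pdu_plan(size_t hdr_len, size_t data_len)
{
    struct pdu_layout l;

    l.hdr_len = hdr_len;
    l.data_len = data_len;
    /*
     * A PDU that carries no data (for example a command sent with
     * immediate data = no) gets no padding, whatever the total
     * transfer length of the command is.
     */
    l.pad_len = iscsi_pad_bytes(data_len);
    assert((l.data_len + l.pad_len) % ISCSI_PAD_LEN == 0);
    return l;
}
```

With this rule, `pdu_plan(48, 0)` plans no padding for a header-only command PDU, while `pdu_plan(48, 513)` plans 3 bytes of padding that follow the 513 data bytes.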
     
  • A couple of targets, like string bean and MDS, send r2ts with
    a data len greater than the max burst we agreed to. We
    were being strict in our enforcing of the iscsi rfc in that
    code path, but there is no driver limitation that prevents
    us from fulfilling the request. To allow those targets
    to work we will ignore the max_burst length and send as
    much data as the target asks for, assuming it has consciously
    decided to override its max burst length.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • The iSCSI RFC states that the first burst length must not exceed the
    max burst length. We currently assume targets will be good, but that may
    not be the case, so this patch adds a check.

    This patch also moves the unsol data out offset to the lib so the LLDs
    do not have to track it.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
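
A sketch of the kind of check the commit above describes, assuming the rule being enforced is that the negotiated first burst length may not be larger than the max burst length. The struct and function names are hypothetical and are not the driver's identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two negotiated burst limits, in bytes. */
struct burst_params {
    uint32_t first_burst;
    uint32_t max_burst;
};

/*
 * Returns 0 when the negotiated values are consistent, -1 when the
 * target handed us a first burst length larger than the max burst.
 */
static int burst_params_check(const struct burst_params *p)
{
    assert(p != NULL);
    if (p->first_burst > p->max_burst)
        return -1;
    return 0;
}
```

A caller would run the check once the parameters are negotiated and refuse to start the session on failure, instead of trusting the target to send sane values.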
     

29 Jul, 2006

7 commits

  • The version info is useful for iscsi tcp, iser and qla4xxx so move it to
    the transport class.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • Must pass ISCSI_ERR values from the recv path and propagate them
    upwards.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • We currently try to allocate a max_recv_data_segment_length
    which can be very large (default is 64K), and common uses
    are up to 1MB. It is very very difficult to allocate this
    much contiguous memory and it turns out we never even use it.
    We really only need a couple of pages, so this patch has us
    allocate just what we know we need today.

    Later if vendors start adding vendor specific data and
    we need to handle large buffers we can do this, but for
    the last 4 years we have not seen anyone do this or request
    it.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • When we enter recovery and flush the running commands
    we cannot free the connection before flushing the commands.
    Some commands may have a reference to the connection
    that needs to be released first. iscsi_stop was forcing
    the term and suspend too early and was causing an oops
    in iser, so this patch removes those callbacks altogether
    and allows the LLD to handle that detail.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • If iscsi_data_rsp fails we must bail out. Since the pdu values like
    data length are invalid we cannot continue to process the data since
    it could overrun buffers.

    This fixes a bug with cisco 5428s where that target is sending
    too much data.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
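
A minimal sketch of the bail-out described in the commit above, assuming the failure in question is a Data-In PDU whose offset and length do not fit the command's buffer. `data_in_check` and `data_in_copy` are hypothetical helpers, not functions from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Validate the offset and length claimed by a Data-In PDU against the
 * size of the command's buffer. Written so that offset + data_len is
 * never computed directly and cannot wrap around.
 * Returns 0 if the data fits, -1 otherwise.
 */
static int data_in_check(uint32_t buf_len, uint32_t offset, uint32_t data_len)
{
    if (offset > buf_len || data_len > buf_len - offset)
        return -1;
    return 0;
}

/* Copy the PDU payload only after the header values have been validated. */
static int data_in_copy(uint8_t *buf, uint32_t buf_len, uint32_t offset,
                        const uint8_t *data, uint32_t data_len)
{
    int rc = data_in_check(buf_len, offset, data_len);

    if (rc)
        return rc; /* bail out: the buffer is left untouched */
    memcpy(buf + offset, data, data_len);
    return 0;
}
```

The point of the ordering is that a target sending too much data produces an error return rather than a write past the end of the buffer.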
     
  • The iscsi tcp code can pluck multiple r2ts from the task's r2tqueue
    in the xmit code. This can result in the task being queued on the xmit queue
    but getting completed at the same time.

    This patch fixes the above bug by making the fifo a list so
    we always remove the entry on the list del.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • In the xmit path we are sending a -EXXX value to iscsi_conn_failure
    which is causing userspace to get confused.

    We should be sending an ISCSI_ERR_* value that userspace understands.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     

29 Jun, 2006

1 commit


06 Jun, 2006

3 commits

  • Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • We can race and mis-set the suspend bit if iscsi_write_space is
    called then iscsi_send returns with a failure indicating
    there is no space.

    To handle this, this patch returns an error upwards allowing xmitworker
    to decide if we need to try and transmit again. For the no
    write space case xmitworker will not retry, and instead
    let iscsi_write_space queue it back up if needed (this relies
    on the work queue code to properly requeue us if needed).

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • If recovery failed or we are in recovery only overwrite the state
    if we are going to terminate the session or if we logged back in.

    STOP_CONN_SUSPEND and conn_cnt are not used. We only support
    a single connection session ATM, so clean up that code while
    we are working around it.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     

20 May, 2006

3 commits

  • update version

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • Discovered by steven@hayter.me.uk and patch by michaelc@cs.wisc.edu

    The dtask mempool is reserving 261120 items per session! Since we are now
    sending headers with sendmsg there is no reason for the mempool and that
    was causing us to use crazy amounts of mem. We can preallocate a header in
    the r2t and task struct and reuse them.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • From Zhen and ported by Mike:

    Don't use sendpage for the headers. sendpage for the pdu headers
    does not seem to have a performance impact, makes life harder
    for multiple data pdus to be in flight and still trips up some
    network cards when it is from slab mem.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     

10 May, 2006

4 commits

  • debugged by wrwhitehead@novell.com
    patch and analysis by fujita.tomonori@lab.ntt.co.jp

    Only tcp_read_sock and recv_actor (iscsi_tcp_data_recv for us) see
    desc.count. It is used just for permitting tcp_read_sock to read
    the portion of data in the socket.

    When iscsi_tcp_data_recv sees a partial header, it sets
    desc.count. However, it is possible that the next skb (containing the
    rest of the header) still does not come. So I'm not sure that this
    scheme is completely correct.

    Ideally, we should use the exact length of the data in the socket for
    desc.count. However, it is not so simple (see SIOCINQ in
    tcp_ioctl). So I think that iscsi_tcp_data_recv can just stop playing
    with desc.count and tell tcp_read_sock to read all the skbs. As
    proposed already, if iscsi_tcp_data_ready sets desc.count to
    non-zero, tcp_read_sock does that.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • debugged by Ming and Rohan:

    The problem Ming and Rohan debugged was that during a normal session
    login, open-iscsi is not incrementing the exp_statsn counter. It was
    stuck at zero. From the RFC, it looks like if the login response PDU has
    a successful status then we should be incrementing that value. Also from
    the RFC, it looks like when we drop a connection then reconnect, we
    should be using the exp_statsn from the old connection in the next
    relogin attempt.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
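
A small sketch of the two behaviours described in the commit above, under stated assumptions: a login response with status class 0 counts as successful and advances exp_statsn to one past the StatSN it carried, and a relogin starts from the old connection's exp_statsn. The struct and function names are hypothetical and are not taken from open-iscsi.

```c
#include <assert.h>
#include <stdint.h>

#define LOGIN_STATUS_SUCCESS 0 /* status class of a successful login response */

/* Hypothetical per-connection state; only the field discussed here. */
struct conn_state {
    uint32_t exp_statsn; /* next StatSN we expect from the target */
};

/* Advance exp_statsn only when the login response reports success. */
static void login_rsp_update(struct conn_state *c, uint8_t status_class,
                             uint32_t statsn)
{
    if (status_class == LOGIN_STATUS_SUCCESS)
        c->exp_statsn = statsn + 1; /* unsigned, so wraparound is well defined */
}

/* On reconnect, carry the old connection's exp_statsn into the relogin. */
static void conn_relogin_init(struct conn_state *new_conn,
                              const struct conn_state *old_conn)
{
    new_conn->exp_statsn = old_conn->exp_statsn;
}
```

In this sketch a failed login response leaves exp_statsn where it was, so the counter no longer stays stuck at zero after a successful login but also does not move on errors.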
     
  • align printk output

    Signed-off-by: Or Gerlitz
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Or Gerlitz
     
  • add transport end point callbacks so iscsi drivers that cannot connect
    from userspace using sockets, the way iscsi tcp does, do not have to
    implement their own socket infrastructure.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Or Gerlitz
     

15 Apr, 2006

4 commits

  • This just converts iscsi_tcp to the lib

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • The current iscsi_tcp eh is not nicely setup for dm-multipath
    and performs some extra task management functions when they
    are not needed.

    The attached patch:

    - Fixes the TMF issues. If a session is rebuilt
    then we do not send aborts.

    - Fixes the problem where if the host reset fired, we would
    return SUCCESS even though we had not really done anything
    yet. This ends up causing problems with scsi_error.c's TUR.

    - If someone has turned on the userspace nop daemon code to try
    and detect network problems before the scsi command timeout
    we can now drop and clean up the session before the scsi command
    times out and fires the eh, speeding up the time it takes for a
    command to go from one path to another. For network problems
    we fail the command with DID_BUS_BUSY so if failfast is set
    scsi_decide_disposition fails the command up to dm for it to
    try on another path.

    - And we had to add some basic iscsi session block code. Previously
    if we were trying to repair a session we would return a MLQUEUE code
    in the queuecommand. This worked but it was not the most efficient
    or pretty thing to do since it would take a while to relogin
    to the target. Since for iscsi_tcp/open-iscsi a lot of the iscsi error
    handler is in userspace, the block code is pretty bare. We will be
    adding to that for qla4xxx.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • For iscsi boot when going from initramfs to the real root we
    need to stop the userspace iscsi daemon. To later restart it
    iscsid needs to be able to rebuild itself and part of that
    process is matching a session running in the kernel with the
    iscsid representation. To do this the attached patch
    adds several required iscsi values. If the LLD does not provide
    them because login is done in userspace, then the transport
    class and userspace set this up for the LLD.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     
  • from hare@suse.de and michaelc@cs.wisc.edu

    hw iscsi like qla4xxx does not allocate a host per session and
    for userspace it is difficult to restart iscsid using the
    "iscsi handles" for the session and connection, so this
    patch just has the class or userspace allocate the id for
    the session and connection.

    Note: this breaks userspace and requires users to upgrade to the newest
    open-iscsi tools. Sorry about this but open-iscsi is still too new to
    say we have a stable user-kernel api and we were not good enough
    designers to know that other hw iscsi drivers and iscsid itself would
    need such changes. Actually we sorta did but at the time we did not
    have the HW available to us so we could only guess.

    Luckily, the only tools hooking into the class are the open-iscsi ones;
    other tools like iscsistart hook into the open-iscsi engine from
    userspace, and programs like anaconda call our tools, so they are not affected.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     

27 Mar, 2006

1 commit

  • Modify well over a dozen mempool users to call mempool_create_slab_pool()
    rather than calling mempool_create() with extra arguments, saving about 30
    lines of code and increasing readability.

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     

22 Mar, 2006

1 commit

  • SLAB_NO_REAP is documented as an option that will cause this slab not to be
    reaped under memory pressure. However, that is not what happens. The only
    thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
    slab elements that were allocated in batch in cache_reap(). Cache_reap()
    is run every few seconds independently of memory pressure.

    Could we remove the whole thing? It's only used by three slabs anyway and
    I cannot find a reason for having this option.

    There is an additional problem with SLAB_NO_REAP. If set then the recovery
    of objects from alien caches is switched off. Objects not freed on the
    same node where they were initially allocated will only be reused if a
    certain amount of objects accumulates from one alien node (not very likely)
    or if the cache is explicitly shrunk. (Strangely __cache_shrink does not
    check for SLAB_NO_REAP)

    Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Manfred Spraul
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

05 Feb, 2006

4 commits

  • From ogerlitz@voltaire.com:

    mgmtpool should be freed in the immdata_alloc_fail label.

    Signed-off-by: Mike Christie
    Signed-off-by: Alex Aizman
    Signed-off-by: Dmitry Yusupov
    Signed-off-by: James Bottomley

    Mike Christie
     
  • From erezz@voltaire.com:

    We are still in ISCSI_STATE_FREE state at create time. The addition
    of the first connection puts us in ISCSI_STATE_LOGGED_IN.

    Signed-off-by: Mike Christie
    Signed-off-by: Alex Aizman
    Signed-off-by: Dmitry Yusupov
    Signed-off-by: James Bottomley

    Mike Christie
     
  • From erezz@voltaire.com:

    rm conn->lock since it is not used anymore. The dataqueue is protected
    by the session lock and xmitmutex.

    Signed-off-by: Mike Christie
    Signed-off-by: Alex Aizman
    Signed-off-by: Dmitry Yusupov
    Signed-off-by: James Bottomley

    Mike Christie
     
  • From:
    michaelc@cs.wisc.edu
    fujita.tomonori@lab.ntt.co.jp
    da-x@monatomic.org

    and err path fixup from:
    ogerlitz@voltaire.com

    This patch cleans up that interface by having the lld and class
    pass a iscsi_cls_session or iscsi_cls_conn between each other when
    the function is used by HW and SW iscsi llds. This way the lld
    does not have to remember if it has to send a handle or pointer
    and a handle or pointer to connection, session or host.

    This also has the class verify the session handle that gets passed from
    userspace instead of using the pointer passed into the kernel directly.

    Signed-off-by: Mike Christie
    Signed-off-by: Alex Aizman
    Signed-off-by: Dmitry Yusupov
    Signed-off-by: James Bottomley

    Mike Christie
     

15 Jan, 2006

1 commit