24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this sctp implementation is free software you can redistribute it
    and or modify it under the terms of the gnu general public license
    as published by the free software foundation either version 2 or at
    your option any later version this sctp implementation is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with gnu cc see the file copying if not see
    http www gnu org licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 42 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190523091649.683323110@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 Apr, 2019

1 commit

  • sk_forward_alloc is also updated on the rx path, but for consistency
    we change sctp_skb_set_owner_r() to use sk_mem_charge().

    In sctp_eat_data(), checking sctp_memory_pressure alone is not
    enough, as it does not cover the mem_cgroup_sockets_enabled case, so
    we switch to sk_under_memory_pressure().

    When under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
    should be called on both the RENEGE and the CHUNK DELIVERY paths, so
    that the memory pressure state is exited as soon as possible (see
    the sketch after this entry).

    Note that sk_rmem_schedule() uses datalen to keep things simple
    there.

    Reported-by: Matteo Croce
    Tested-by: Matteo Croce
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
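    As a rough, compilable illustration of the decision described above
    (a minimal sketch, not kernel code: the helper names mirror the
    kernel's, but the struct and bodies are stand-ins, and the real
    sk_rmem_schedule() also takes the skb):

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in; the real struct sock lives in include/net/sock.h. */
    struct sock {
        long forward_alloc;     /* bytes accounted but not yet used */
        bool pressure;          /* stand-in for global/memcg pressure */
    };

    static bool sk_under_memory_pressure(const struct sock *sk)
    {
        /* The kernel helper also honors mem_cgroup_sockets_enabled,
         * which is why it replaces the bare sctp_memory_pressure
         * check. */
        return sk->pressure;
    }

    static void sk_mem_reclaim(struct sock *sk)
    {
        /* Would return spare forward_alloc to the protocol's pool,
         * helping the stack leave the pressure state sooner. */
        (void)sk;
    }

    static bool sk_rmem_schedule(struct sock *sk, int datalen)
    {
        return sk->forward_alloc >= datalen;    /* crude stand-in */
    }

    /* Shape of the sctp_eat_data() check: under pressure, reclaim
     * first, then see whether the chunk payload (datalen) can still
     * be accounted; if not, the caller takes the renege path. */
    static bool can_accept_chunk(struct sock *sk, int datalen)
    {
        if (sk_under_memory_pressure(sk)) {
            sk_mem_reclaim(sk);
            if (!sk_rmem_schedule(sk, datalen))
                return false;
        }
        return true;
    }

    int main(void)
    {
        struct sock sk = { .forward_alloc = 4096, .pressure = true };

        printf("accept 1500-byte chunk: %s\n",
               can_accept_chunk(&sk, 1500) ? "yes" : "renege");
        return 0;
    }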
     

20 Nov, 2018

2 commits

  • The member subscribe should be per asoc, so that the SCTP_EVENT
    sockopt in the next patch can subscribe to an event on one asoc only.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • The member subscribe in sctp_sock indicates which of the events the
    socket is subscribed to; it is really a group of flags, so it is
    better defined as a __u16 (2 bytes) instead of as struct
    sctp_event_subscribe (13 bytes).

    Note that sctp_event_subscribe is a UAPI struct used on sockopt
    calls, and thus it will not be removed. This patch only changes the
    internal storage of the flags (a bitmask sketch follows this entry).

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
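    As a toy, self-contained illustration of the storage change (the
    event names and the EV_BIT helper below are made up for the example
    and are not the UAPI names):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical event indexes; the real UAPI names differ. */
    enum { EV_DATA_IO, EV_ASSOC_CHANGE, EV_PEER_ADDR_CHANGE /* ... */ };

    #define EV_BIT(ev)  ((uint16_t)(1u << (ev)))

    static int ev_enabled(uint16_t subscribe, int ev)
    {
        return !!(subscribe & EV_BIT(ev));
    }

    int main(void)
    {
        uint16_t subscribe = 0;     /* all 13 flags fit in 2 bytes */

        subscribe |= EV_BIT(EV_ASSOC_CHANGE);           /* subscribe   */
        subscribe &= (uint16_t)~EV_BIT(EV_DATA_IO);     /* unsubscribe */

        printf("assoc_change enabled: %d\n",
               ev_enabled(subscribe, EV_ASSOC_CHANGE));
        return 0;
    }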
     

23 Dec, 2017

1 commit

  • Lots of overlapping changes. Also, on the net-next side the XDP
    state management is handled more in the generic layers, so undo the
    'net' nfp fix, which isn't applicable in net-next.

    Include a necessary change by Jakub Kicinski, with log message:

    ====================
    cls_bpf no longer takes care of offload tracking. Make sure
    netdevsim performs necessary checks. This fixes a warning
    caused by TC trying to remove a filter it has not added.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Dec, 2017

1 commit

  • When reneging events in sctp_ulpq_renege(), the variable 'freed' can
    be increased by a __u16 value twice, yet 'freed' is itself of __u16
    type, so it may overflow on the second addition (demonstrated after
    this entry).

    This patch fixes it by using a __u32 for 'freed'. While at it, it
    also removes the 'if (chunk)' check, as all renege commands are
    generated in sctp_eat_data() and chunk cannot be NULL.

    Reported-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
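    The overflow is easy to demonstrate in isolation; a small standalone
    program (the values are illustrative, this is not kernel code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t freed16 = 0;   /* the old __u16 'freed' */
        uint32_t freed32 = 0;   /* the fixed __u32 'freed' */
        uint16_t ev1 = 40000, ev2 = 40000;  /* two reneged events */

        freed16 += ev1;
        freed16 += ev2;         /* wraps: 80000 mod 65536 = 14464 */
        freed32 += ev1;
        freed32 += ev2;

        printf("__u16 sum: %u (overflowed)\n", (unsigned)freed16);
        printf("__u32 sum: %u (correct)\n", (unsigned)freed32);
        return 0;
    }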
     

12 Dec, 2017

4 commits

  • Unordered idata processing is more complicated than unordered data:

    - It has to add mid into sctp_stream_out to save the next mid value,
    which is kept separate from ordered idata's.

    - To support pd for unordered idata, another mid and pd_mode need to
    be added to save the message id and pd state in sctp_stream_in.

    - To make unordered idata reasm easier, it adds a new event queue
    to save frags for idata.

    The patch mostly adds reasm functions for unordered idata similar
    to ordered idata's, and also adjusts some other code in assign_mid,
    abort_pd and ulpevent_data for idata.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • abort_pd is added as a member of sctp_stream_interleave, used to
    abort partial delivery for data or idata; it is called in
    sctp_cmd_assoc_failed.

    Since stream interleaving allows partial delivery on each stream at
    the same time, sctp_intl_abort_pd for idata is very different from
    the old function sctp_ulpq_abort_pd for data.

    Note that sctp_ulpevent_make_pdapi gains per-stream support in this
    patch by adding pdapi_stream and pdapi_seq to sctp_pdapi_event, as
    described in section 6.1.7 of RFC 6458.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • renege_events is added as a member of sctp_stream_interleave, used
    to properly renege some old data or idata in the reasm or lobby
    queue, freeing memory for new data when there is memory pressure.

    It defines sctp_renege_events for idata, and leaves sctp_ulpq_renege
    as it is for data.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • ulpevent_data is added as a member of sctp_stream_interleave, used
    to do most of the processing in the ulpq: converting data or idata
    chunks to events, reassembling them in the reasm queue, placing them
    in the lobby queue in the right order, and delivering them up to the
    socket's rx queue.

    This procedure is described in section 2.2.3 of RFC 8260.

    It adds functions for idata that mirror the old functions for data.
    But since the details differ considerably between them, the old
    functions cannot be reused for idata. (An ops-table sketch follows
    this entry.)

    event->ssn and event->ppid settings are moved from
    sctp_ulpevent_make_rcvmsg to ulpevent_data, so that
    sctp_ulpevent_make_rcvmsg can work for both data and idata.

    Note that since mid is added to sctp_ulpevent for idata, __packed
    has to be used when defining sctp_ulpevent, or it would exceed the
    skb cb area that holds a sctp_ulpevent for ULP-layer processing.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
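    The four Dec 2017 entries above all add members to
    sctp_stream_interleave. A compilable sketch of the ops-table idea
    (member names follow the commit messages; the signatures and types
    below are simplified stand-ins, not the kernel's):

    #include <stddef.h>

    struct ulpq;    /* opaque stand-ins */
    struct chunk;

    /* One callback table per chunk variant (DATA vs I-DATA), so shared
     * call sites can dispatch without caring which variant is in use. */
    struct stream_interleave {
        int  (*ulpevent_data)(struct ulpq *q, struct chunk *c);
        int  (*renege_events)(struct ulpq *q, struct chunk *c);
        void (*abort_pd)(struct ulpq *q);
    };

    static int data_ulpevent(struct ulpq *q, struct chunk *c) { return 0; }
    static int data_renege(struct ulpq *q, struct chunk *c)   { return 0; }
    static void data_abort_pd(struct ulpq *q)                 { }

    static const struct stream_interleave data_ops = {
        .ulpevent_data = data_ulpevent,
        .renege_events = data_renege,
        .abort_pd      = data_abort_pd,
    };
    /* ...a second table, for idata, would hold the sctp_intl_*
     * versions. */

    int main(void)
    {
        const struct stream_interleave *si = &data_ops; /* per asoc */

        return si->ulpevent_data(NULL, NULL);   /* one shared call site */
    }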
     

09 Sep, 2017

1 commit

  • Commit fb586f25300f ("sctp: delay calls to sk_data_ready() as much
    as possible") minimized the number of wake ups triggered when the
    association receives a packet with multiple data chunks on it and/or
    when io_events are enabled, and then commit 0970f5b36659 ("sctp:
    signal sk_data_ready earlier on data chunks reception") moved the
    wake up to as soon as possible. It thus relies on the state machine
    running later to clear the flag indicating that the event was
    already generated.

    The issue is that there are two call paths that call
    sctp_ulpq_tail_event() outside of the state machine, causing the
    flag to linger and possibly omitting a needed wake up in the
    sequence.

    One of the call paths is when enabling SCTP_SENDER_DRY_EVENTS via
    setsockopt(SCTP_EVENTS), as noticed by Harald Welte. The other is when
    partial reliability triggers removal of chunks from the send queue when
    the application calls sendmsg().

    This commit fixes it by not setting the flag when the socket is
    owned by the user, as in that case the state machine is not running
    and would not clear it later. This works for user-initiated calls
    and also for rx path processing (a sketch follows this entry).

    Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
    Reported-by: Harald Welte
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
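    A toy model of the resulting guard (stand-in names, not the verbatim
    kernel code; see sctp_ulpq_tail_event() for the real thing):

    #include <stdbool.h>
    #include <stdio.h>

    struct sock {
        bool owned_by_user;         /* a process holds the socket lock */
        bool data_ready_signalled;
        int wakeups;
    };

    static void sk_data_ready(struct sock *sk) { sk->wakeups++; }

    static void tail_event(struct sock *sk)
    {
        if (!sk->data_ready_signalled) {
            /* Remember the signal only when the state machine will run
             * and clear the mark; from user-initiated paths (socket
             * owned by the user), leave the mark clear. */
            if (!sk->owned_by_user)
                sk->data_ready_signalled = true;
            sk_data_ready(sk);
        }
    }

    int main(void)
    {
        struct sock sk = { .owned_by_user = true };

        tail_event(&sk);    /* e.g. setsockopt(SCTP_EVENTS) path */
        tail_event(&sk);    /* a later event must still wake the reader */
        printf("wakeups: %d\n", sk.wakeups);    /* 2, no lost wake up */
        return 0;
    }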
     

03 Jun, 2017

1 commit

  • Per Marcelo's suggestion, stream is a fixed-size member of the asoc
    and does not grow with more streams. To avoid an allocation for it,
    this patch defines it as an object instead of a pointer and updates
    the places using it; it also creates sctp_stream_update(), called in
    sctp_assoc_update(), to migrate the stream info from one stream to
    another.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

07 Jan, 2017

1 commit

  • sctp stream reconf, described in RFC 6525, needs a structure to
    save per-stream information in the assoc, like stream state.

    In the future, the sctp stream scheduler will also need it to save
    some stream scheduler parameters and queues.

    This patchset prepares the stream array in the assoc for stream
    reconf. It defines sctp_stream, which includes the stream arrays, to
    replace ssnmap (see the layout sketch after this entry).

    Note that we use different structures for IN and OUT streams, as the
    members of a per-OUT stream will grow more and more different from
    those of a per-IN stream.

    v1->v2:
    - put these patches into a smaller group.
    v2->v3:
    - define sctp_stream to contain the stream arrays, and create
    stream.c to hold stream-related functions.
    - merge 3 patches into 1, as the new sctp_stream has the same name
    as before.

    Signed-off-by: Xin Long
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
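    A compilable sketch of the layout this entry describes (only ssn is
    shown per stream; the field choices are illustrative, not the full
    kernel definition):

    #include <stdint.h>
    #include <stdlib.h>

    /* Separate IN and OUT per-stream types: they are expected to
     * diverge over time (e.g. OUT later gains scheduler state). */
    struct stream_out { uint16_t ssn; };
    struct stream_in  { uint16_t ssn; };

    /* Container replacing ssnmap: plain arrays, one slot per stream. */
    struct stream {
        struct stream_out *out;
        struct stream_in  *in;
        uint16_t outcnt, incnt;
    };

    static int stream_init(struct stream *s, uint16_t outcnt,
                           uint16_t incnt)
    {
        s->out = calloc(outcnt, sizeof(*s->out));
        s->in  = calloc(incnt,  sizeof(*s->in));
        if (!s->out || !s->in)
            return -1;      /* toy: a real version would free on error */
        s->outcnt = outcnt;
        s->incnt  = incnt;
        return 0;
    }

    int main(void)
    {
        struct stream s;

        return stream_init(&s, 10, 10);
    }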
     

19 Sep, 2016

1 commit

  • In commit 311b21774f13 ("sctp: simplify sk_receive_queue locking"),
    a call to 'skb_queue_splice_tail_init()' was made explicit.
    Previously it was hidden in 'sctp_skb_list_tail()'.

    Now the code around it looks redundant: the '_init()' part of
    'skb_queue_splice_tail_init()' already does the same (see the toy
    splice after this entry).

    Signed-off-by: Christophe JAILLET
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Christophe Jaillet
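    A userspace toy mirroring the semantics of
    skb_queue_splice_tail_init() (this is not the kernel
    implementation): move every node of one list to the tail of another
    in O(1) and leave the source re-initialized, which is what makes a
    separate re-initialization redundant.

    #include <stdio.h>

    struct node  { struct node *next; int id; };
    struct queue { struct node *head, *tail; };

    static void queue_init(struct queue *q) { q->head = q->tail = NULL; }

    /* Splice 'src' onto the tail of 'dst', then re-init 'src'; the
     * trailing _init is the part the removed code was duplicating. */
    static void queue_splice_tail_init(struct queue *src, struct queue *dst)
    {
        if (!src->head)
            return;
        if (dst->tail)
            dst->tail->next = src->head;
        else
            dst->head = src->head;
        dst->tail = src->tail;
        queue_init(src);
    }

    int main(void)
    {
        struct node a = { NULL, 1 }, b = { NULL, 2 };
        struct queue src, dst;

        queue_init(&src);
        queue_init(&dst);
        src.head = &a;
        a.next = &b;
        src.tail = &b;

        queue_splice_tail_init(&src, &dst);
        for (struct node *n = dst.head; n; n = n->next)
            printf("%d\n", n->id);      /* 1, 2; src is now empty */
        return 0;
    }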
     

31 Jul, 2016

1 commit

  • Prior to this patch, once sctp received SHUTDOWN or the socket was
    shut down for reading, sk->sk_shutdown would be set with
    RCV_SHUTDOWN, and all events would be dropped in
    sctp_ulpq_tail_event(). This caused two problems:

    1. Some notifications could not be received by users, like
    SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C().

    2. sctp would also never trigger sk_data_ready when the association
    was closed, making it harder to identify the end of the association
    by calling recvmsg() and getting an EOF. That was inconvenient for
    kernel users.

    The check should instead stop delivering DATA chunks after receiving
    SHUTDOWN, and stop delivering ANY chunks after sctp_close().

    So this patch allows notifications to be enqueued into the receive
    queue even if sk->sk_shutdown is set to RCV_SHUTDOWN in
    sctp_ulpq_tail_event(); but if sk->sk_shutdown ==
    RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all events (sketched after
    this entry).

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
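    The resulting check is easy to sketch standalone (the two shutdown
    bits are defined locally here with the kernel's values from
    include/linux/net.h):

    #include <stdbool.h>
    #include <stdio.h>

    #define RCV_SHUTDOWN  1
    #define SEND_SHUTDOWN 2

    /* Drop everything once both directions are shut down
     * (sctp_close()), but after a bare RCV_SHUTDOWN drop only DATA,
     * not notifications. */
    static bool drop_event(int sk_shutdown, bool is_notification)
    {
        if (!(sk_shutdown & RCV_SHUTDOWN))
            return false;
        if (sk_shutdown & SEND_SHUTDOWN)
            return true;
        return !is_notification;
    }

    int main(void)
    {
        printf("%d\n", drop_event(RCV_SHUTDOWN, true));   /* 0: deliver */
        printf("%d\n", drop_event(RCV_SHUTDOWN, false));  /* 1: drop    */
        printf("%d\n", drop_event(RCV_SHUTDOWN | SEND_SHUTDOWN, true));
        return 0;                                         /* last: 1    */
    }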
     

02 May, 2016

1 commit

  • Dave Miller pointed out that fb586f25300f ("sctp: delay calls to
    sk_data_ready() as much as possible") may add latency, especially if
    the receiving application is running on another CPU, and that it
    would be better if we signalled as early as possible.

    This patch thus basically inverts the logic of fb586f25300f and
    signals as early as possible, similar to what we had before.

    Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
    Reported-by: Dave Miller
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

16 Apr, 2016

1 commit

  • SCTP already serializes access to rcvbuf through its sock lock:
    sctp_recvmsg() takes it right at the start and releases it at the
    end, while the rx path also takes the lock before doing any socket
    processing. In sctp_rcv() it checks whether a user is using the
    socket and, if so, queues incoming packets to the backlog. The
    backlog processing does the same. Even timers do such a check and
    re-schedule if a user is using the socket.

    Simplifying this allows us to remove sctp_skb_list_tail() and get
    rid of some expensive locking. The lists it is used on are also
    manipulated with functions like __skb_queue_tail and __skb_unlink in
    the same context, as in sctp_ulpq_tail_event() and sctp_clear_pd().
    sctp_close() also purges those while holding only the sock lock.

    Therefore the locking performed by sctp_skb_list_tail() is not
    necessary. This patch removes the function and replaces its calls
    with skb_queue_splice_tail_init() instead.

    The biggest gain is in sctp_ulpq_tail_event(), because the events
    always contain a list, even when queueing a single skb, and this was
    triggering expensive spin_lock_irqsave/_irqrestore calls for every
    data chunk received.

    As SCTP delivers each data chunk in a corresponding recvmsg() call,
    the smaller the chunks, the more effective the change.
    Before this patch, with 30-byte chunks:

    netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 \
            -S 400000 400000 -s 400000 400000

    on a 10Gbit link with 1500 MTU:

    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

    425984 425984     30    60.00      137.45    7.34     7.36     52.504  52.608

    With it:

    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

    425984 425984     30    60.00      179.10    7.97     6.70     43.740  36.788

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

14 Apr, 2016

1 commit

  • Currently, processing multiple chunks in a single SCTP packet leads
    to multiple calls to sk_data_ready, causing multiple wake up signals
    which are costly and don't make it wake up any faster.

    With this patch it will note that a wake up is pending and do it
    before leaving the state machine interpreter, the latest place it
    can be done reliably and cleanly (modeled after this entry).

    Note that sk_data_ready events are not dependent on asocs, unlike waking
    up writers.

    v2: series re-checked
    v3: use local vars to clean up the code, suggested by Jakub Sitnicki
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
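    A toy model of the batching described above (not kernel code):
    per-chunk processing only marks a wake up as pending, and a single
    sk_data_ready call happens when the interpreter exits.

    #include <stdbool.h>
    #include <stdio.h>

    struct sock { bool pending; int wakeups; };

    static void data_ready(struct sock *sk) { sk->wakeups++; }

    static void process_chunk(struct sock *sk)
    {
        /* queue the event ... then just note that a wake up is owed */
        sk->pending = true;
    }

    static void run_interpreter(struct sock *sk, int nr_chunks)
    {
        for (int i = 0; i < nr_chunks; i++)
            process_chunk(sk);
        if (sk->pending) {          /* latest reliable point to signal */
            sk->pending = false;
            data_ready(sk);
        }
    }

    int main(void)
    {
        struct sock sk = { false, 0 };

        run_interpreter(&sk, 8);                /* 8 chunks, one packet */
        printf("wakeups: %d\n", sk.wakeups);    /* 1, not 8 */
        return 0;
    }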
     

12 Nov, 2014

1 commit

  • An alternative to RPS/RFS is to use hardware support for multiple
    queues.

    Then split a set of millions of sockets among worker threads, each
    one using epoll() to manage events on its own socket pool.

    Ideally, we want one thread per RX/TX queue/cpu, but we have no way
    to know after accept() or connect() on which queue/cpu a socket is
    managed.

    We normally use one cpu per RX queue (IRQ smp_affinity being
    properly set), so remembering in the socket structure which cpu
    delivered the last packet is enough to solve the problem.

    After accept(), connect(), or even file descriptor passing between
    processes, applications can use:

    int cpu;
    socklen_t len = sizeof(cpu);

    getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

    and use this information to put the socket into the right silo for
    optimal performance, as the whole networking stack then runs on the
    appropriate cpu, without the need to send IPIs (RPS/RFS). A
    self-contained version of this snippet follows this entry.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
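    Fleshed out into a self-contained helper (the fallback define guards
    against older headers; 49 is the asm-generic value of the option):

    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef SO_INCOMING_CPU
    #define SO_INCOMING_CPU 49  /* asm-generic value, for older headers */
    #endif

    /* Returns the cpu that handled the last packet for this socket,
     * or -1 on error; fd is an accepted or connected socket. */
    static int incoming_cpu(int fd)
    {
        int cpu;
        socklen_t len = sizeof(cpu);

        if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0) {
            perror("getsockopt(SO_INCOMING_CPU)");
            return -1;
        }
        return cpu;
    }

    A worker would then hash the returned cpu into its pool of epoll
    threads so the whole stack for that socket stays on one cpu.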
     

21 Apr, 2014

1 commit

  • The busy polling socket option adds support for sockets to busy-wait
    on data arriving on the napi queue from which they most recently
    received a frame. Currently only tcp and udp support this feature,
    but there's no reason sctp can't do so as well. Add it so
    applications can take advantage of it.

    Signed-off-by: Neil Horman
    CC: Vlad Yasevich
    CC: "David S. Miller"
    CC: Daniel Borkmann
    CC: netdev@vger.kernel.org
    Acked-by: Vlad Yasevich
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Neil Horman
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up, so this skb->len access is potentially
    a read of freed memory.

    Furthermore, skb->len can be modified by the consumer, so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback uses the
    length argument. Since nobody cared about its value, lots of call
    sites pass arbitrary values such as '0' and even '1'.

    So just remove the length argument from the callback; that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect (the before/after shape is sketched after
    this entry).

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
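    The shape of the change, as a small compilable before/after sketch
    (the struct and names here are stand-ins):

    #include <stddef.h>

    struct sock;    /* stand-in; callbacks only pass the pointer */

    /* Before: the queued skb's length was passed along, but by the time
     * the callback ran the skb could already be consumed and freed:
     *
     *     void (*sk_data_ready)(struct sock *sk, int bytes);
     *
     * After: no length argument; readers inspect the receive queue
     * under the proper lock instead. */
    typedef void (*sk_data_ready_t)(struct sock *sk);

    static void my_data_ready(struct sock *sk)
    {
        /* wake up whoever is blocked in recvmsg() */
        (void)sk;
    }

    int main(void)
    {
        sk_data_ready_t cb = my_data_ready;

        cb(NULL);   /* no stale skb->len to misread */
        return 0;
    }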
     

07 Dec, 2013

1 commit

  • Several files refer to an old address for the Free Software
    Foundation in the file header comment. Resolve this by replacing the
    address with the URL <http://www.gnu.org/licenses/>, so that we do
    not have to keep updating the header comments anytime the address
    changes.

    CC: Vlad Yasevich
    CC: Neil Horman
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jeff Kirsher
     

10 Aug, 2013

1 commit

  • With the restructuring of the lksctp.org site, we only allow bug
    reports through the SCTP mailing list linux-sctp@vger.kernel.org,
    not via SF, as SF is only used for web hosting and nothing more.
    While at it, also remove the obvious statement that bugs will be
    fixed and incorporated into the kernel.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

25 Jul, 2013

1 commit

  • The SCTP mailing list address to send patches or questions to is
    linux-sctp@vger.kernel.org, not
    lksctp-developers@lists.sourceforge.net anymore. Therefore, update
    all occurrences.

    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

01 Mar, 2013

3 commits

  • In sctp_ulpq_tail_data(), use return values 0 and 1 to indicate
    whether a complete event (with MSG_EOR set) was delivered. A return
    value of -ENOMEM continues to indicate that an out-of-memory
    condition was encountered.

    In sctp_ulpq_retrieve_partial() and sctp_ulpq_retrieve_first(),
    correct the message reassembly logic for SCTP partial delivery.
    Change the logic to ensure that as much data as possible is sent
    with the initial partial delivery and that subsequent partial
    deliveries contain all available data.

    In sctp_ulpq_partial_delivery(), attempt partial delivery only if
    the data at the head of the reassembly queue is at or before the
    cumulative TSN ACK point.

    In sctp_ulpq_renege(), use the modified return values from
    sctp_ulpq_tail_data() to choose whether to attempt partial delivery
    or to drain the reassembly queue as a means of reducing memory
    pressure (see the sketch after this entry). Remove the call to
    sctp_tsnmap_mark(), as this is handled correctly in the call to
    sctp_ulpq_tail_data().

    Signed-off-by: Lee A. Roberts
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman

    Lee A. Roberts
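    A sketch of the renege-path decision described above, with stand-in
    helpers (the mapping of return values to actions follows the text):

    #include <stdio.h>

    static void try_partial_delivery(void) { puts("partial delivery"); }
    static void drain_reassembly(void)     { puts("drain reasm queue"); }

    /* retval models sctp_ulpq_tail_data(): 1 when a complete event
     * (MSG_EOR) was delivered, 0 when not, -ENOMEM on failure. */
    static void renege_tail(int retval)
    {
        if (retval <= 0)
            try_partial_delivery();  /* nothing complete went up */
        else
            drain_reassembly();      /* keep relieving memory pressure */
    }

    int main(void)
    {
        renege_tail(1);
        renege_tail(0);
        return 0;
    }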
     
  • In sctp_ulpq_renege_list(), events being reneged from the ordering
    queue may correspond to multiple TSNs. Identify all affected
    packets; sum the freed space and renege from the tsnmap.

    Signed-off-by: Lee A. Roberts
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman

    Lee A. Roberts
     
  • In sctp_ulpq_renege_list(), do not renege packets below the
    cumulative TSN ACK point.

    Signed-off-by: Lee A. Roberts
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman

    Lee A. Roberts
     

04 Nov, 2012

1 commit

  • Lots of points in the sctp_cmd_interpreter function treat the
    sctp_cmd_t arg as a void pointer, even though they are written as
    various other types. There's no need for this, as it just leads to
    possible type-punning issues that could cause crashes; if we remain
    type-consistent we can actually remove the void * member of the
    union entirely.

    Change notes:

    v2)
    * Dropped the chunk that modified SCTP_NULL to create a marker
    pattern should anyone try to use a SCTP_NULL()-assigned sctp_arg_t;
    assigning to .zero provides the same effect and should be faster,
    per Vlad Y.

    v3)
    * Reverted part of v2, opting to use memset instead of .zero, so
    that the entire union is initialized, thus avoiding the ia64
    speculative load problems previously encountered, per Dave M. (a
    standalone illustration follows this entry). Also rewrote
    SCTP_[NO]FORCE to use common infrastructure a little more.

    Signed-off-by: Neil Horman
    CC: "David S. Miller"
    CC: linux-sctp@vger.kernel.org
    Signed-off-by: David S. Miller

    Neil Horman
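    A standalone illustration of the approach settled on in v3 (the
    union below is a simplified analogue of sctp_arg_t, not its real
    definition):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Simplified analogue of sctp_arg_t: one union, typed views. */
    typedef union {
        uint32_t u32;
        uint16_t u16;
        uint8_t  u8;
        void    *ptr;
    } cmd_arg_t;

    /* Zeroing the whole union leaves no stale bytes in any member,
     * unlike assigning just one (possibly narrower) member; per the
     * changelog this also sidesteps the ia64 speculative-load issue. */
    static cmd_arg_t arg_null(void)
    {
        cmd_arg_t arg;

        memset(&arg, 0, sizeof(arg));
        return arg;
    }

    int main(void)
    {
        cmd_arg_t a = arg_null();

        printf("%u %p\n", (unsigned)a.u32, a.ptr);
        return 0;
    }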
     

01 Jul, 2012

1 commit

  • It was noticed recently that when we send data on a transport, it's
    possible that we might bundle a sack that arrived on a different
    transport. While this isn't a major problem, it does go against the
    SHOULD requirement in section 6.4 of RFC 2960:

    An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
    etc.) to the same destination transport address from which it
    received the DATA or control chunk to which it is replying. This
    rule should also be followed if the endpoint is bundling DATA chunks
    together with the reply chunk.

    This patch seeks to correct that. It restricts the bundling of sack
    operations to only those transports which have moved the ctsn of the
    association forward since the last sack. By doing this we guarantee
    that we only bundle outbound sacks on a transport that has received
    a chunk since the last sack. This brings us into stricter compliance
    with the RFC.

    Vlad had initially suggested that we strictly allow sack bundling
    only on the transport that last moved the ctsn forward. While this
    makes sense, I was concerned that doing so would prevent us from
    bundling in the case where we had received chunks that moved the
    ctsn on multiple transports. In those cases, the RFC allows us to
    select any of the transports having received chunks to bundle the
    sack on. So I've modified the approach to allow for that, by adding
    a state variable to each transport that tracks whether it has moved
    the ctsn since the last sack (a toy model follows this entry). This,
    I think, keeps our behavior (and performance) close enough to our
    current profile that we can do this without a sysctl knob to
    enable/disable it.

    Signed-off-by: Neil Horman
    CC: Vlad Yasevich
    CC: David S. Miller
    CC: linux-sctp@vger.kernel.org
    Reported-by: Michele Baldessari
    Reported-by: sorin serban
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Neil Horman
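    A toy model of the per-transport state described above (the field
    and function names are hypothetical, not the kernel's):

    #include <stdbool.h>
    #include <stdio.h>

    struct transport {
        bool moved_ctsn;    /* advanced the ctsn since the last sack */
    };

    /* Data that advances the association's ctsn marks the transport it
     * arrived on as eligible to carry the next bundled sack. */
    static void rx_moved_ctsn(struct transport *t)
    {
        t->moved_ctsn = true;
    }

    static bool may_bundle_sack(const struct transport *t)
    {
        return t->moved_ctsn;
    }

    /* Once a sack actually goes out, every transport starts over. */
    static void sack_sent(struct transport *ts, int n)
    {
        for (int i = 0; i < n; i++)
            ts[i].moved_ctsn = false;
    }

    int main(void)
    {
        struct transport t[2] = { { false }, { false } };

        rx_moved_ctsn(&t[0]);
        printf("t0 %d, t1 %d\n", may_bundle_sack(&t[0]),
               may_bundle_sack(&t[1]));          /* t0 1, t1 0 */
        sack_sent(t, 2);
        printf("t0 %d after sack\n", may_bundle_sack(&t[0]));  /* 0 */
        return 0;
    }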