02 Aug, 2016

1 commit

  • Prior to this patch, sctp defined TCP_CLOSING as SCTP_SS_CLOSING.
    TCP_CLOSING is such a special sk state in TCP that the common inet
    code even excludes it.

    For instance, inet_accept assumes the accepted sk's state is never
    TCP_CLOSING, and triggers a WARN_ON otherwise. TCP works well with
    that, while SCTP may trigger the call trace, as the CLOSING state in
    SCTP has a different meaning from TCP's.

    This fix uses TCP_CLOSE_WAIT as SCTP_SS_CLOSING instead of
    TCP_CLOSING. Some side effects could be expected, even though this
    state was not used before. inet_accept will accept it now.

    I ran all the func_tests in lksctp-tools and the sctp codenomicon
    fuzzer tests against this patch; no regression or failure was found.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

14 Jul, 2016

4 commits

  • Identifying address family operations during rx path is not
    expensive, but it's ugly to have it done multiple times, especially
    when we already validated it during initial rx processing.

    This patch takes advantage of the now shared sctp_input_cb and makes the
    pointer to the operations readily available.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • SCTP will try to access original IP headers on sctp_recvmsg in order to
    copy the addresses used. There are also other places that do similar access
    to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO
    support") they aren't always there because they are only present in the
    header skb.

    SCTP handles the queueing of incoming data by cloning the incoming skb
    and limiting to only the relevant payload. This clone has its cb updated
    to something different and is then queued on the socket rx queue. Thus
    we need to fix this in two places.

    For rx path, not related to socket queue yet, this patch uses a
    partially copied sctp_input_cb to such GSO frags. This restores the
    ability to access the headers for this part of the code.

    Regarding the socket rx queue, it removes the iif member from sctp_event
    and also adds a chunk pointer to it.

    With these changes we're always able to reach the headers again.

    The biggest change here is that now the sctp_chunk struct and the
    original skb are only freed after the application consumed the buffer.
    Note however that the original payload was already like this due to the
    skb cloning.

    For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
    IPv6 now can fetch it directly from original's IPv6 CB as the original
    skb is still accessible.

    In the future we can probably simplify the sctp_v*_skb_iif() stuff, as
    sctp_v4_skb_iif() was called but its return value was not used, and now
    it's not even called, but such cleanup is out of scope for this change.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • The next patch needs 8 bytes in there. sctp_ulpevent has a hole due to
    bad alignment. msg_flags takes 4 bytes while it actually uses only 2,
    so we shrink it. The iif member (4 bytes) can easily be fetched from
    another place once the next patch is there, so we remove it, thus
    creating space for 8 bytes.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • We process input path in other files too and having access to it is
    nice, so move it to a header where it's shared.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

12 Jul, 2016

4 commits

  • prsctp PRIO policy is a policy to abandon lower priority chunks when
    asoc doesn't have enough snd buffer, so that the current chunk with
    higher priority can be queued successfully.

    Similar to TTL/RTX policy, we will set the priority of the chunk to
    prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
    So if PRIO policy is enabled, msg->expires_at won't work.

    asoc->sent_cnt_removable records how many chunks can be checked for
    removal. If priority policy is enabled, when the chunk is queued into
    the out_queue, we increase sent_cnt_removable. When the chunk is
    moved to abandon_queue or dequeued and freed, we decrease
    sent_cnt_removable.

    In sctp_sendmsg, we check whether there is enough snd buffer for the
    current msg and whether sent_cnt_removable is not 0. If there is not
    enough buffer and sent_cnt_removable is not 0, sctp_prune_prsctp tries
    to abandon chunks from the retransmit/transmitted queue and to free
    chunks from out_queue, in the right order, until the abandon+free
    size > msg_len - sctp_wfree. For the abandoned size, we have to wait
    until it sends FORWARD TSN, receives the sack and the chunks are
    really freed.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • prsctp TTL policy is a policy to abandon chunks when they expire
    at the specific time in local stack. It's similar to expires_at
    in struct sctp_datamsg.

    This patch uses sinfo->sinfo_timetolive to set the specific time for
    TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at.
    So if prsctp_enable or TTL policy is not enabled, msg->expires_at
    still works as before.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
    to dump the prsctp statistics info from the asoc. The prsctp statistics
    include abandoned_sent/unsent from the asoc. abandoned_sent is the
    count of the packets we drop from the retransmit/transmitted queue,
    and abandoned_unsent is the count of the packets we drop from out_queue
    according to the policy.

    Note: another option for prsctp statistics dump described in the rfc is
    SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
    info from each stream. But for now, linux doesn't have per-stream
    statistics info yet; that needs rfc6525 to be implemented. As the prsctp
    statistics for each stream have to be based on per-stream statistics,
    we will delay it until rfc6525 is done in linux.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
    We will add prsctp_enable to both asoc and ep, and replace the places
    that used net.sctp->prsctp_enable with asoc->prsctp_enable.

    ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
    asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
    also modify its value through sockopt SCTP_PR_SUPPORTED.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

04 Jun, 2016

1 commit

  • SCTP has this peculiarity that its packets cannot be just segmented to
    (P)MTU. Its chunks must be contained in IP segments, padding respected.
    So we can't just generate a big skb, set gso_size to the fragmentation
    point and deliver it to IP layer.

    This patch takes a different approach. SCTP will now build a skb as it
    would be if it was received using GRO. That is, there will be a cover
    skb with protocol headers and children ones containing the actual
    segments, already segmented to a way that respects SCTP RFCs.

    With that, we can tell skb_segment() to just split based on frag_list,
    trusting its sizes are already in accordance.

    This way SCTP can benefit from GSO and instead of passing several
    packets through the stack, it can pass a single large packet.

    v2:
    - Added support for receiving GSO frames, as requested by Dave Miller.
    - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
    - Added heuristics similar to what we have in TCP for not generating
    single GSO packets that fill cwnd.
    v3:
    - consider sctphdr size in skb_gso_transport_seglen()
    - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for
    unsupported GSO")

    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

02 May, 2016

1 commit

  • Dave Miller pointed out that fb586f25300f ("sctp: delay calls to
    sk_data_ready() as much as possible") may insert latency, especially if
    the receiving application is running on another CPU, and that it would
    be better if we signalled as early as possible.

    This patch thus basically inverts the logic on fb586f25300f and signals
    it as early as possible, similar to what we had before.

    Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
    Reported-by: Dave Miller
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

28 Apr, 2016

3 commits

  • There is nothing related to BH in SNMP counters anymore,
    since linux-3.0.

    Rename helpers to use __ prefix instead of _BH prefix,
    for contexts where preemption is disabled.

    This more closely matches convention used to update
    percpu variables.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS()

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In the old days (before linux-3.0), SNMP counters were duplicated,
    one for user context, and one for BH context.

    After commit 8f0ea0fe3a03 ("snmp: reduce percpu needs by 50%")
    we have a single copy, and what really matters is preemption being
    enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
    respectively.

    We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
    NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
    SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
    UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()

    Following patches will rename __BH helpers to make clear their
    usage is not tied to BH being disabled.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Apr, 2016

1 commit


16 Apr, 2016

3 commits

  • Some main variables in sctp.ko can't be exported to other modules,
    so we have to define some APIs to access them.

    These include traversal of sctp transports and endpoints.

    The transport traversal functions added for sctp_diag can also be
    used by sctp_proc, because it needs to traverse transports in a
    similar way.

    v2->v3:
    - rhashtable_walk_init needs the gfp parameter, because of a recent
    upstream update

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • sctp_diag will dump some important details of sctp's assoc or ep. We use
    sctp_info to describe them and sctp_get_sctp_info to get them, and export
    it to sctp_diag.ko.

    v2->v3:
    - we will not use list_for_each_safe in sctp_get_sctp_info, because
    all of its callers will use lock_sock.

    - fix the holes in struct sctp_info with __reserved* fields,
    because sctp_diag is a new feature and sctp_info is just for now;
    it may be changed in the future.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • SCTP already serializes access to rcvbuf through its sock lock:
    sctp_recvmsg takes it right at the start and releases it at the end,
    while the rx path will also take the lock before doing any socket
    processing. In sctp_rcv() it will check if there is a user using the
    socket and, if there is, it will queue incoming packets to the backlog.
    The backlog processing will do the same. Even timers will do such a
    check and re-schedule if a user is using the socket.

    Simplifying this will allow us to remove sctp_skb_list_tail and get rid
    of some expensive locking. The lists that it is used on are also
    mangled with functions like __skb_queue_tail and __skb_unlink in the
    same context, like in sctp_ulpq_tail_event() and sctp_clear_pd().
    sctp_close() will also purge those while using only the sock lock.

    Therefore the locking performed by sctp_skb_list_tail() is not
    necessary. This patch removes this function and replaces its calls with
    just skb_queue_splice_tail_init() instead.

    The biggest gain is at sctp_ulpq_tail_event(), because the events always
    contain a list, even when queueing a single skb, and this was
    triggering expensive calls to spin_lock_irqsave/_irqrestore for every
    data chunk received.

    As SCTP will deliver each data chunk on a corresponding recvmsg, the
    smaller the data chunks, the more effective the change will be.
    Before this patch, with chunks with 30 bytes:
    netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
    400000 -s 400000 400000
    on a 10Gbit link with 1500 MTU:

    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
    Recv   Send   Send                         Utilization     Service Demand
    Socket Socket Message Elapsed              Send   Recv     Send    Recv
    Size   Size   Size    Time    Throughput   local  remote   local   remote
    bytes  bytes  bytes   secs.   10^6bits/s   % S    % S      us/KB   us/KB

    425984 425984 30      60.00   137.45       7.34   7.36     52.504  52.608

    With it:

    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
    Recv   Send   Send                         Utilization     Service Demand
    Socket Socket Message Elapsed              Send   Recv     Send    Recv
    Size   Size   Size    Time    Throughput   local  remote   local   remote
    bytes  bytes  bytes   secs.   10^6bits/s   % S    % S      us/KB   us/KB

    425984 425984 30      60.00   179.10       7.97   6.70     43.740  36.788

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

14 Apr, 2016

2 commits

  • Currently processing of multiple chunks in a single SCTP packet leads to
    multiple calls to sk_data_ready, causing multiple wake up signals which
    are costly and don't make it wake up any faster.

    With this patch it will note that the wake up is pending and will do it
    before leaving the state machine interpreter, the latest place possible
    to do it reliably and cleanly.

    Note that sk_data_ready events are not dependent on asocs, unlike waking
    up writers.

    v2: series re-checked
    v3: use local vars to cleanup the code, suggested by Jakub Sitnicki
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • It wastes space and gets worse as we add new flags, so convert bit-wide
    flags to a bitfield.

    Currently it already saves 4 bytes in sctp_sock, which are left as holes
    in it for now. The whole struct needs packing, which should be done in
    another patch.

    Note that do_auto_asconf cannot be merged, as explained in the comment
    before it.

    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
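
The space saving from converting one-bit flags to a bitfield can be sketched in plain userspace C. This is only an illustration: the two structs below are hypothetical stand-ins, not the real sctp_sock layout, the flag names are borrowed for flavour, and the exact sizes depend on the compiler and ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "before" layout: one full byte per on/off flag. */
struct opts_bytes {
	uint8_t nodelay;
	uint8_t disable_fragments;
	uint8_t v4mapped;
	uint8_t frag_interleave;
	uint8_t recvrcvinfo;
	uint8_t recvnxtinfo;
};

/* Hypothetical "after" layout: the same flags packed one bit each. */
struct opts_bits {
	unsigned int nodelay:1,
		     disable_fragments:1,
		     v4mapped:1,
		     frag_interleave:1,
		     recvrcvinfo:1,
		     recvnxtinfo:1;
};
```

On common ABIs the first struct takes 6 bytes and the second 4, and the second does not grow as more one-bit flags are added until the storage unit is full. A flag whose address must be taken, or that is updated under different locking, cannot share a bitfield word.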
     

11 Apr, 2016

1 commit

  • Currently on high rate SCTP streams the heartbeat timer refresh can
    consume quite a lot of resources as timer updates are costly and it
    contains a random factor, which a) is also costly and b) invalidates
    mod_timer() optimization for not editing a timer to the same value.
    It may even cause the timer to be slightly advanced, for no good reason.

    As suggested by David Laight this patch now removes this timer update
    from hot path by leaving the timer on and re-evaluating upon its
    expiration if the heartbeat is still needed or not, similarly to what is
    done for TCP. If it's not needed anymore the timer is re-scheduled to
    the new timeout, considering the time already elapsed.

    For this, we now record the last tx timestamp per transport, updated in
    the same spots as hb timer was restarted on tx. Also split up
    sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
    sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
    heartbeat one.

    On loopback with MTU of 65535 and data chunks with 1636, so that we
    have a considerable amount of chunks without stressing system calls,
    netperf -t SCTP_STREAM -l 30, perf looked like this before:

    Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
    Overhead Command Shared Object Symbol
    + 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
    - 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore
    - _raw_write_unlock_irqrestore
    - 96,54% _raw_spin_unlock_irqrestore
    - 36,14% mod_timer
    + 97,24% sctp_transport_reset_timers
    + 2,76% sctp_do_sm
    + 33,65% __wake_up_sync_key
    + 28,77% sctp_ulpq_tail_event
    + 1,40% del_timer
    - 1,84% mod_timer
    + 99,03% sctp_transport_reset_timers
    + 0,97% sctp_do_sm
    + 1,50% sctp_ulpq_tail_event

    And after this patch, now with netperf -l 60:

    Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
    Overhead Command Shared Object Symbol
    + 5,65% netperf [kernel.vmlinux] [k] memcpy_erms
    + 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
    - 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
    - _raw_spin_unlock_irqrestore
    + 49,89% __wake_up_sync_key
    + 45,68% sctp_ulpq_tail_event
    - 2,85% mod_timer
    + 76,51% sctp_transport_reset_t3_rtx
    + 23,49% sctp_do_sm
    + 1,55% del_timer
    + 2,50% netperf [sctp] [k] sctp_datamsg_from_user
    + 2,26% netperf [sctp] [k] sctp_sendmsg

    Throughput-wise, from 6800 Mbps without the patch to 7050 Mbps with it,
    ~3.7%.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
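
The idea of leaving the heartbeat timer alone on the hot path and re-evaluating on expiry can be sketched as below. This is a simplified userspace model under stated assumptions: struct hb_state, on_tx() and on_hb_timer() are hypothetical names, time is an abstract tick counter, and the random factor mentioned in the commit is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-transport state for this sketch. */
struct hb_state {
	uint64_t last_tx;     /* tick of the last transmission */
	uint64_t hb_interval; /* idle ticks after which a heartbeat is due */
};

/* Hot path: only record the timestamp, never touch the timer. */
static void on_tx(struct hb_state *s, uint64_t now)
{
	s->last_tx = now;
}

/*
 * Timer expiry: returns 0 when a heartbeat should be sent now,
 * otherwise the number of ticks to re-arm the timer for, taking
 * the time already elapsed since the last transmission into account.
 */
static uint64_t on_hb_timer(const struct hb_state *s, uint64_t now)
{
	uint64_t idle = now - s->last_tx;

	if (idle >= s->hb_interval)
		return 0;
	return s->hb_interval - idle;
}
```

With an interval of 100 ticks and a transmission at tick 40, an expiry at tick 100 re-arms the timer for another 40 ticks instead of sending a heartbeat.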
     

06 Apr, 2016

1 commit


21 Mar, 2016

2 commits

  • If the user supplies a different fragmentation point or if there is a
    network header that causes it to not be aligned, force it to be aligned.

    A fragmentation point at a value that is not aligned is not optimal. It
    causes extra padding to be used and has no benefits.

    v2:
    - Make use of the new WORD_TRUNC macro

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
    MTU can sometimes return values that are not aligned, like for loopback,
    which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
    will cause the last non-aligned bytes to never be used and can cause
    issues with congestion control.

    So it's better to just consider a lower MTU and keep congestion control
    calcs saner as they are based on PMTU.

    Same applies to icmp frag needed messages, which is also fixed by this
    patch.

    One other effect of this is the inability to send an MTU-sized packet
    without queueing or fragmentation and without hitting Nagle, due to
    the check performed at sctp_packet_can_append_data():

    if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
            /* Enough data queued to fill a packet */
            return SCTP_XMIT_OK;

    With the above example of MTU, if there are no other messages queued,
    one cannot send data that just fits one packet (65532 bytes)
    without causing DATA chunk fragmentation or a delay.

    v2:
    - Added WORD_TRUNC macro

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
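
The word alignment discussed in the two commits above can be illustrated with a small userspace sketch. The helpers word_trunc() and word_round() are hypothetical names written for this example; they are not the kernel's WORD_TRUNC macro, only the same arithmetic on 4-byte words.

```c
#include <assert.h>
#include <stdint.h>

/* Round s down to a multiple of 4 bytes (one SCTP word). */
static inline uint32_t word_trunc(uint32_t s)
{
	return s & ~(uint32_t)3;
}

/* Round s up to a multiple of 4 bytes, as chunk padding requires. */
static inline uint32_t word_round(uint32_t s)
{
	return (s + 3) & ~(uint32_t)3;
}
```

For the loopback case in the commit message, an MTU of 65535 truncates to 65532, while an already aligned MTU such as 1500 is left unchanged.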
     

20 Mar, 2016

2 commits

  • David S. Miller
     
  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a message-based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
     

18 Mar, 2016

1 commit

  • Pull crypto update from Herbert Xu:
    "Here is the crypto update for 4.6:

    API:
    - Convert remaining crypto_hash users to shash or ahash, also convert
    blkcipher/ablkcipher users to skcipher.
    - Remove crypto_hash interface.
    - Remove crypto_pcomp interface.
    - Add crypto engine for async cipher drivers.
    - Add akcipher documentation.
    - Add skcipher documentation.

    Algorithms:
    - Rename crypto/crc32 to avoid name clash with lib/crc32.
    - Fix bug in keywrap where we zero the wrong pointer.

    Drivers:
    - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
    - Add PIC32 hwrng driver.
    - Support BCM6368 in bcm63xx hwrng driver.
    - Pack structs for 32-bit compat users in qat.
    - Use crypto engine in omap-aes.
    - Add support for sama5d2x SoCs in atmel-sha.
    - Make atmel-sha available again.
    - Make sahara hashing available again.
    - Make ccp hashing available again.
    - Make sha1-mb available again.
    - Add support for multiple devices in ccp.
    - Improve DMA performance in caam.
    - Add hashing support to rockchip"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
    crypto: qat - remove redundant arbiter configuration
    crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
    crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
    crypto: qat - Change the definition of icp_qat_uof_regtype
    hwrng: exynos - use __maybe_unused to hide pm functions
    crypto: ccp - Add abstraction for device-specific calls
    crypto: ccp - CCP versioning support
    crypto: ccp - Support for multiple CCPs
    crypto: ccp - Remove check for x86 family and model
    crypto: ccp - memset request context to zero during import
    lib/mpi: use "static inline" instead of "extern inline"
    lib/mpi: avoid assembler warning
    hwrng: bcm63xx - fix non device tree compatibility
    crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
    crypto: qat - The AE id should be less than the maximal AE number
    lib/mpi: Endianness fix
    crypto: rockchip - add hash support for crypto engine in rk3288
    crypto: xts - fix compile errors
    crypto: doc - add skcipher API documentation
    crypto: doc - update AEAD AD handling
    ...

    Linus Torvalds
     

14 Mar, 2016

1 commit

  • Currently sctp_sendmsg() triggers some calls that will allocate memory
    with GFP_ATOMIC even when not necessary. In the case of
    sctp_packet_transmit it will allocate a linear skb that will be used to
    construct the packet and this may cause sends to fail due to ENOMEM more
    often than anticipated, especially with big MTUs.

    This patch thus allows it to inherit gfp flags from upper calls so that
    it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
    similar. All others, like retransmits or flushes started from BH, are
    still allocated using GFP_ATOMIC.

    In netperf tests this didn't result in any performance drawbacks when
    memory is not too fragmented and made it trigger ENOMEM way less often.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

09 Mar, 2016

1 commit

  • Dmitry reported that sctp_add_bind_addr may read more bytes than
    expected in case the parameter is an IPv4 addr supplied by the user
    through calls such as sctp_bindx_add(), because it always copies
    sizeof(union sctp_addr) while the buffer may be just a struct
    sockaddr_in, which is smaller.

    This patch fixes it by limiting the memcpy to the minimum of the
    union size and a (new parameter) provided addr size. Where possible this
    parameter is still the size of that union, except when reading from
    user-provided buffers, in which case it accounts for the protocol type.

    Reported-by: Dmitry Vyukov
    Tested-by: Dmitry Vyukov
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
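
The bounded copy described above can be sketched in userspace as follows. This is an illustration under assumptions: union addr_any is a hypothetical stand-in for the kernel's union sctp_addr, and copy_addr() is a made-up helper, not the patched sctp_add_bind_addr().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for a union large enough for v4 and v6. */
union addr_any {
	struct sockaddr     sa;
	struct sockaddr_in  v4;
	struct sockaddr_in6 v6;
};

/*
 * Copy an address into dst, reading no more than src_len bytes from
 * src and writing no more than sizeof(*dst) bytes. Returns the number
 * of bytes copied; the rest of dst is zeroed.
 */
static size_t copy_addr(union addr_any *dst, const void *src, size_t src_len)
{
	size_t n = src_len < sizeof(*dst) ? src_len : sizeof(*dst);

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, n);
	return n;
}
```

A caller holding only a struct sockaddr_in passes sizeof(struct sockaddr_in) as src_len, so the copy never reads past the end of the smaller buffer.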
     

18 Feb, 2016

1 commit

  • Since commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg
    after sending a msg") used sctp_datamsg_put in sctp_sendmsg, instead of
    sctp_datamsg_free, this function has no use in sctp.

    So we will remove it.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

29 Jan, 2016

2 commits

  • Now that we use refcnt to check whether a transport is alive, the dead
    field can be removed from sctp_transport.

    The traversal of transport_addr_list in the procfs dump uses
    list_for_each_entry_rcu, so there is no need to check whether it has
    been freed.

    sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event are
    protected by the sock lock, so it's not necessary to check dead there
    either. Also, the timers are cancelled when sctp_transport_free() is
    called; it doesn't wait for refcnt to reach 0 to cancel them.

    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Now when __sctp_lookup_association is running in BH, it will try to
    check if t->dead is set, but meanwhile other CPUs may be freeing this
    transport and this assoc and if it happens that
    __sctp_lookup_association checked t->dead a bit too early, it may think
    that the association is still good while it was already freed.

    So we fix this race by using atomic_add_unless in sctp_transport_hold.
    After we get one transport from hashtable, we will hold it only when
    this transport's refcnt is not 0, so that we can make sure t->asoc
    cannot be freed before we hold the asoc again.

    Note that sctp association is not freed using RCU so we can't use
    atomic_add_unless() with it as it may just be too late for that either.

    Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path")
    Reported-by: Vlad Yasevich
    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
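
The "hold only if the refcount is not 0" pattern from the race fix above can be sketched with C11 atomics. This is a userspace analogue, not kernel code: struct transport, transport_hold() and transport_put() are hypothetical names, and the compare-and-swap loop plays the role that atomic_add_unless(&refcnt, 1, 0) plays in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object for this sketch. */
struct transport {
	atomic_int refcnt;
};

/*
 * Take a reference only if the object is still alive (refcnt != 0).
 * Returns true when the reference was taken.
 */
static bool transport_hold(struct transport *t)
{
	int old = atomic_load(&t->refcnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&t->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool transport_put(struct transport *t)
{
	return atomic_fetch_sub(&t->refcnt, 1) == 1;
}
```

Once the count has reached 0, transport_hold() fails instead of resurrecting an object that is already being torn down, which is the property the lookup path relies on.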
     

27 Jan, 2016

1 commit


06 Jan, 2016

2 commits

  • The transport hashtable replaces the association hashtable, so the
    association hashtable is not used in sctp any more. Drop the code
    related to it.

    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • The transport hashtable will replace the association hashtable to do the
    lookup for a transport, and then get the association by t->asoc. The
    rhashtable APIs will be used because rhashtable is resizable, scalable
    and uses RCU.

    lport + rport + paddr will be the base hashkey to locate the chain,
    with net to protect one netns from another, then plus the laddr to
    compare to get the target.

    This patch provides the lookup functions:
    - sctp_epaddr_lookup_transport
    - sctp_addrs_lookup_transport

    hash/unhash functions:
    - sctp_hash_transport
    - sctp_unhash_transport

    init/destroy functions:
    - sctp_transport_hashtable_init
    - sctp_transport_hashtable_destroy

    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

07 Dec, 2015

1 commit

  • When A sends data to B, then A calls close() and enters the
    SHUTDOWN_PENDING state, if B neither claims its rwnd is 0 nor sends a
    SACK for this data, A will keep retransmitting this data until the t5
    timeout; the Max.Retrans limit no longer takes effect, which is bad.

    If B's rwnd is not 0, A should send an abort after Max.Retrans
    retransmissions. Only when B's rwnd == 0 and A's retransmissions go
    beyond Max.Retrans will A start the t5 timer, which is also what commit
    f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
    means, but it lacks the condition peer rwnd == 0.

    So fix it by adding a bit (zero_window_announced) in peer to record
    whether the last rwnd was 0. If it was, zero_window_announced will be
    set. Use this bit to decide whether to start the t5 timer when
    local.state is SHUTDOWN_PENDING.

    Fixes: commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    lucien
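
The decision described in this commit message can be written down as a small pure function. This is a simplified model under assumptions: enum rtx_action and shutdown_pending_rtx() are hypothetical names, and the real state machine has more inputs than the three shown here.

```c
#include <assert.h>
#include <stdbool.h>

enum rtx_action {
	RTX_CONTINUE, /* limit not reached: keep retransmitting */
	RTX_START_T5, /* peer announced a zero window: arm the t5 timer */
	RTX_ABORT     /* peer window is open but data is not acked: abort */
};

/*
 * What to do on a retransmission timeout while in SHUTDOWN_PENDING,
 * given how many retransmissions happened and whether the peer's
 * last advertised rwnd was 0.
 */
static enum rtx_action shutdown_pending_rtx(unsigned int rtx_count,
					    unsigned int max_retrans,
					    bool zero_window_announced)
{
	if (rtx_count <= max_retrans)
		return RTX_CONTINUE;
	return zero_window_announced ? RTX_START_T5 : RTX_ABORT;
}
```

The zero_window_announced input is what the earlier commit lacked: without it, exceeding the limit always led to the t5 path.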
     

03 Dec, 2015

1 commit


15 Jun, 2015

1 commit

  • ->auto_asconf_splist is per namespace and mangled by functions like
    sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

    Also, the call to inet_sk_copy_descendant() was backing up
    ->auto_asconf_list through the copy but was not honoring
    ->do_auto_asconf, which could lead to list corruption if it was
    different between both sockets.

    This commit thus fixes the list handling by using ->addr_wq_lock
    spinlock to protect the list. A special handling is done upon socket
    creation and destruction for that. Error handling on sctp_init_sock()
    will never return an error after having initialized asconf, so
    sctp_destroy_sock() can be called without addr_wq_lock. The lock now
    will be taken on sctp_close_sock(), before locking the socket, so we
    don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

    Instead of taking the lock on sctp_sock_migrate() for copying and
    restoring the list values, it's preferred to avoid rewriting it by
    implementing sctp_copy_descendant().

    Issue was found with a test application that kept flipping sysctl
    default_auto_asconf on and off, but one could trigger it by issuing
    simultaneous setsockopt() calls on multiple sockets or by
    creating/destroying sockets fast enough. This is only triggerable
    locally.

    Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
    Reported-by: Ji Jianwen
    Suggested-by: Neil Horman
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

28 May, 2015

1 commit

  • sctp_v4_map_v6 was subtly writing and reading from members
    of a union in a way that clobbered data it needed to read before
    it read it.

    Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
    that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
    result.

    Reorder things to guarantee correct behaviour no matter what the
    union layout is.

    This impacts user space clients that open an IPv6 SCTP socket and
    receive IPv4 connections. Prior to 299ee user space would see a
    sockaddr with AF_INET and a correct address, after 299ee the sockaddr
    is AF_INET6, but the address is wrong.

    Fixes: 299ee123e198 (sctp: Fixup v4mapped behaviour to comply with Sock API)
    Signed-off-by: Jason Gunthorpe
    Acked-by: Daniel Borkmann
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Jason Gunthorpe
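
The reordering described above, reading everything from the v4 view before writing any v6 member, can be sketched in userspace. This is an illustration only: union addr_any and v4_map_v6() are hypothetical names for this example and not the kernel's sctp_v4_map_v6().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical union whose v4 and v6 views share storage. */
union addr_any {
	struct sockaddr_in  v4;
	struct sockaddr_in6 v6;
};

/* Convert the v4 address held in *a into a v4-mapped v6 address in place. */
static void v4_map_v6(union addr_any *a)
{
	/* Read every v4 field we need before touching the v6 view. */
	in_port_t port = a->v4.sin_port;
	struct in_addr v4addr = a->v4.sin_addr;

	memset(&a->v6, 0, sizeof(a->v6)); /* also clears flowinfo and scope */
	a->v6.sin6_family = AF_INET6;
	a->v6.sin6_port = port;
	/* Build ::ffff:a.b.c.d */
	a->v6.sin6_addr.s6_addr[10] = 0xff;
	a->v6.sin6_addr.s6_addr[11] = 0xff;
	memcpy(&a->v6.sin6_addr.s6_addr[12], &v4addr, sizeof(v4addr));
}
```

Because the v4 fields are copied to locals first, the result does not depend on which v6 members happen to overlap sin_addr in the union layout.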
     

25 Mar, 2015

1 commit