10 Nov, 2015

1 commit

  • Switch everything to the new and more capable implementation of abs().
    Mainly to give the new abs() a bit of a workout.

    Cc: Michal Nazarewicz
    Cc: John Stultz
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Nov, 2015

2 commits

  • Merge second patch-bomb from Andrew Morton:

    - most of the rest of MM

    - procfs

    - lib/ updates

    - printk updates

    - bitops infrastructure tweaks

    - checkpatch updates

    - nilfs2 update

    - signals

    - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
    dma-debug, dma-mapping, ...

    * emailed patches from Andrew Morton : (102 commits)
    ipc,msg: drop dst nil validation in copy_msg
    include/linux/zutil.h: fix usage example of zlib_adler32()
    panic: release stale console lock to always get the logbuf printed out
    dma-debug: check nents in dma_sync_sg*
    dma-mapping: tidy up dma_parms default handling
    pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
    kexec: use file name as the output message prefix
    fs, seqfile: always allow oom killer
    seq_file: reuse string_escape_str()
    fs/seq_file: use seq_* helpers in seq_hex_dump()
    coredump: change zap_threads() and zap_process() to use for_each_thread()
    coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
    signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
    signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
    signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
    signals: kill block_all_signals() and unblock_all_signals()
    nilfs2: fix gcc uninitialized-variable warnings in powerpc build
    nilfs2: fix gcc unused-but-set-variable warnings
    MAINTAINERS: nilfs2: add header file for tracing
    nilfs2: add tracepoints for analyzing reading and writing metadata files
    ...

    Linus Torvalds
     
  • Pull trivial updates from Jiri Kosina:
    "Trivial stuff from trivial tree that can be trivially summed up as:

    - treewide drop of spurious unlikely() before IS_ERR() from Viresh
    Kumar

    - cosmetic fixes (that don't really affect basic functionality of the
    driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

    - various comment / printk fixes and updates all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    bcache: Really show state of work pending bit
    hwmon: applesmc: fix comment typos
    Kconfig: remove comment about scsi_wait_scan module
    class_find_device: fix reference to argument "match"
    debugfs: document that debugfs_remove*() accepts NULL and error values
    net: Drop unlikely before IS_ERR(_OR_NULL)
    mm: Drop unlikely before IS_ERR(_OR_NULL)
    fs: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
    UBI: Update comments to reflect UBI_METAONLY flag
    pktcdvd: drop null test before destroy functions

    Linus Torvalds
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.
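
    The flag composition described above can be sketched in user space as
    follows. This is only an illustration: the SKETCH_ names and bit values
    are hypothetical stand-ins, the composition of the atomic mask is an
    assumption, and the helper merely mirrors the idea behind
    gfpflags_allow_blocking() (test the direct reclaim bit only).

```c
#include <stdbool.h>

typedef unsigned int sketch_gfp_t;

/* Placeholder bit values for illustration; the kernel's real values differ. */
#define SKETCH_GFP_HIGH            0x01u  /* may use the high priority reserve */
#define SKETCH_GFP_ATOMIC          0x02u  /* truly atomic, no alternative */
#define SKETCH_GFP_DIRECT_RECLAIM  0x04u  /* caller may sleep in direct reclaim */
#define SKETCH_GFP_KSWAPD_RECLAIM  0x08u  /* caller wants kswapd woken */

/* __GFP_WAIT as redefined in the message: both kinds of reclaim. */
#define SKETCH_GFP_WAIT \
    (SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_KSWAPD_RECLAIM)

/* Assumed composition of an atomic request: high priority, atomic, wake kswapd. */
#define SKETCH_GFP_ATOMIC_MASK \
    (SKETCH_GFP_HIGH | SKETCH_GFP_ATOMIC | SKETCH_GFP_KSWAPD_RECLAIM)

/* Only the direct reclaim bit says whether the caller may block. */
static bool sketch_allow_blocking(sketch_gfp_t flags)
{
    return (flags & SKETCH_GFP_DIRECT_RECLAIM) != 0;
}

/* The historical test: any bit of the (now two-bit) wait mask is set. */
static bool sketch_legacy_wait_check(sketch_gfp_t flags)
{
    return (flags & SKETCH_GFP_WAIT) != 0;
}
```

    With these definitions the legacy check reports an atomic request as
    able to wait, because the kswapd bit alone matches the two-bit mask.
    That is the false positive the helper avoids.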

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

05 Oct, 2015

1 commit

  • We want to avoid using time_t in the kernel because of the y2038
    overflow problem. The use in sctp is not for storing seconds at
    all, but instead uses microseconds and is passed as 32-bit
    on all machines.

    This patch changes the type to u32, which better fits the use.

    Signed-off-by: Arnd Bergmann
    Cc: Vlad Yasevich
    Cc: Neil Horman
    Cc: linux-sctp@vger.kernel.org
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

29 Sep, 2015

4 commits

  • IS_ERR(_OR_NULL) already contains an 'unlikely' compiler hint, so there
    is no need to repeat it in its callers. Drop it.
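
    A simplified user-space model of the idea, assuming a GCC or Clang
    compiler for __builtin_expect. The macros are modeled loosely on the
    kernel's err.h helpers and are not the exact kernel definitions; the
    check_before()/check_after() callers are hypothetical.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095

/* Branch prediction hint, as a compiler builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* The hint already lives inside the error test itself. */
#define IS_ERR_VALUE(x) unlikely((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
    return IS_ERR_VALUE((unsigned long)ptr);
}

/* Before: the caller wraps the test in a second, redundant hint. */
static int check_before(const void *p)
{
    if (unlikely(IS_ERR(p)))
        return -1;
    return 0;
}

/* After: same behaviour, the hint comes from IS_ERR() alone. */
static int check_after(const void *p)
{
    if (IS_ERR(p))
        return -1;
    return 0;
}
```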

    Acked-by: Neil Horman
    Signed-off-by: Viresh Kumar
    Signed-off-by: Jiri Kosina

    Viresh Kumar
     
  • Seemingly innocuous sctp_trans_state_to_prio_map[] array
    is way bigger than it looks, since
    "[SCTP_UNKNOWN] = 2" expands into "[0xffff] = 2" !

    This patch replaces it with switch() statement.
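
    A small sketch of why the array is so large. The STATE_ names and the
    priorities other than "[unknown] = 2" are hypothetical; only the 0xffff
    index comes from the message above. A designated initializer sizes the
    array to the highest index plus one, while a switch needs no table.

```c
#include <stddef.h>

/* Hypothetical state values; only STATE_UNKNOWN = 0xffff is from the message. */
enum {
    STATE_INACTIVE = 0,
    STATE_PF       = 1,
    STATE_ACTIVE   = 3,
    STATE_UNKNOWN  = 0xffff
};

/* Table form: the highest designated index decides the array length. */
static const unsigned char prio_map[] = {
    [STATE_ACTIVE]   = 3,
    [STATE_UNKNOWN]  = 2,
    [STATE_PF]       = 1,
    [STATE_INACTIVE] = 0,
};

static size_t prio_map_size(void)
{
    return sizeof(prio_map);
}

/* Switch form: same mapping, no 64 KiB of mostly zero data. */
static unsigned char prio_switch(int state)
{
    switch (state) {
    case STATE_ACTIVE:
        return 3;
    case STATE_UNKNOWN:
        return 2;
    case STATE_PF:
        return 1;
    default:
        return 0;
    }
}
```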

    Signed-off-by: Denys Vlasenko
    CC: Vlad Yasevich
    CC: Neil Horman
    CC: Marcelo Ricardo Leitner
    CC: linux-sctp@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: linux-kernel@vger.kernel.org
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Denys Vlasenko
     
  • A case can occur when sctp_accept() is called by the user during
    a heartbeat timeout event after the 4-way handshake. Since
    sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
    bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
    the listening socket but released with the new association socket.
    The result is a deadlock on any future attempts to take the listening
    socket lock.

    Note that this race can occur with other SCTP timeouts that take
    the bh_lock_sock() in the event sctp_accept() is called.

    BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
    ...
    RIP: 0010:[] [] _spin_lock+0x1e/0x30
    RSP: 0018:ffff880028323b20 EFLAGS: 00000206
    RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
    RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
    R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
    FS: 0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
    Stack:
    ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
    0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
    ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
    Call Trace:

    [] ? sctp_rcv+0x492/0xa10 [sctp]
    [] ? nf_iterate+0x69/0xb0
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [] ? nf_hook_slow+0x76/0x120
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [] ? ip_local_deliver_finish+0xdd/0x2d0
    [] ? ip_local_deliver+0x98/0xa0
    [] ? ip_rcv_finish+0x12d/0x440
    [] ? ip_rcv+0x275/0x350
    [] ? __netif_receive_skb+0x4ab/0x750
    ...

    With lockdep debugging:

    =====================================
    [ BUG: bad unlock balance detected! ]
    -------------------------------------
    CslRx/12087 is trying to release lock (slock-AF_INET) at:
    [] sctp_generate_timeout_event+0x40/0xe0 [sctp]
    but there are no more locks to release!

    other info that might help us debug this:
    2 locks held by CslRx/12087:
    #0: (&asoc->timers[i]){+.-...}, at: [] run_timer_softirq+0x16f/0x3e0
    #1: (slock-AF_INET){+.-...}, at: [] sctp_generate_timeout_event+0x23/0xe0 [sctp]

    Ensure the socket taken is also the same one that is released by
    saving a copy of the socket before entering the timeout event
    critical section.
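
    The pattern can be sketched in user space as below. All types and
    functions are hypothetical stand-ins (a counter plays the role of the
    lock, and sketch_migrate() plays the role of the concurrent
    sctp_assoc_migrate()); this is not the kernel code.

```c
/* A counter stands in for the socket lock: +1 on lock, -1 on unlock. */
struct sketch_sock {
    int lock_depth;
};

struct sketch_assoc {
    struct sketch_sock *sk;
};

static void sketch_lock(struct sketch_sock *sk)   { sk->lock_depth++; }
static void sketch_unlock(struct sketch_sock *sk) { sk->lock_depth--; }

/* Stand-in for the accept path moving the association to a new socket. */
static void sketch_migrate(struct sketch_assoc *a, struct sketch_sock *newsk)
{
    a->sk = newsk;
}

/* Racy: re-reads a->sk on unlock, so it releases a different socket. */
static void timeout_event_racy(struct sketch_assoc *a, struct sketch_sock *newsk)
{
    sketch_lock(a->sk);
    sketch_migrate(a, newsk);   /* happens while the lock is held */
    sketch_unlock(a->sk);       /* unlocks newsk, not the locked socket */
}

/* Fixed: save the socket once and use the copy for lock and unlock. */
static void timeout_event_fixed(struct sketch_assoc *a, struct sketch_sock *newsk)
{
    struct sketch_sock *sk = a->sk;

    sketch_lock(sk);
    sketch_migrate(a, newsk);
    sketch_unlock(sk);
}
```

    In the racy variant the listening socket stays locked and the new
    socket sees an unbalanced unlock, which matches the two symptoms
    reported above.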

    Signed-off-by: Karl Heiss
    Signed-off-by: David S. Miller

    Karl Heiss
     
  • Fix indentation in sctp_generate_heartbeat_event.

    Signed-off-by: Karl Heiss
    Signed-off-by: David S. Miller

    Karl Heiss
     

12 Sep, 2015

1 commit

  • Consider the case where the sctp module is not loaded and is being
    requested because a user is creating an sctp socket.

    During initialization, sctp will add the new protocol type and then
    initialize pernet subsys:

        status = sctp_v4_protosw_init();
        if (status)
                goto err_protosw_init;

        status = sctp_v6_protosw_init();
        if (status)
                goto err_v6_protosw_init;

        status = register_pernet_subsys(&sctp_net_ops);

    The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
    is possible for userspace to create SCTP sockets as if the module were
    already fully loaded. If that happens, one of the possible effects is
    that we will have readers for net->sctp.local_addr_list list earlier
    than expected and sctp_net_init() does not take precautions while
    dealing with that list, leading to a potential panic but not limited to
    that, as sctp_sock_init() will copy a bunch of blank/partially
    initialized values from net->sctp.

    The race happens like this:

     CPU 0                             | CPU 1
    socket()                           |
     __sock_create                     | socket()
      inet_create                      |  __sock_create
       list_for_each_entry_rcu(        |
          answer, &inetsw[sock->type], |
          list) {                      |   inet_create
        /* no hits */                  |
       if (unlikely(err)) {            |
        ...                            |
        request_module()               |
        /* socket creation is blocked  |
         * the module is fully loaded  |
         */                            |
         sctp_init                     |
          sctp_v4_protosw_init         |
           inet_register_protosw       |
            list_add_rcu(&p->list,     |
                         last_perm);   |
                                       |  list_for_each_entry_rcu(
                                       |     answer, &inetsw[sock->type],
          sctp_v6_protosw_init         |     list) {
                                       |     /* hit, so assumes protocol
                                       |      * is already loaded
                                       |      */
                                       |  /* socket creation continues
                                       |   * before netns is initialized
                                       |   */
          register_pernet_subsys       |

    Simply inverting the initialization order between
    register_pernet_subsys() and sctp_v4_protosw_init() is not possible
    because register_pernet_subsys() will create a control sctp socket, so
    the protocol must be already visible by then. Deferring the socket
    creation to a work-queue is not good, especially because we lose the
    ability to handle its errors.

    So, as suggested by Vlad, the fix is to split netns initialization into
    two stages: defaults and control socket, so that the defaults are
    already loaded by the time we register the protocol, while control socket
    initialization is kept at the same point it is today.
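
    The ordering constraint can be modeled with a few flags. Everything
    here is a hypothetical user-space sketch with invented names; it only
    shows why defaults must come before the protocol becomes visible and
    why the control socket must come after.

```c
#include <stdbool.h>

struct sketch_init_state {
    bool defaults_ready;   /* per-netns defaults and lists initialized */
    bool proto_visible;    /* protocol registered, socket() can find it */
    bool ctl_sock_ready;   /* control socket created */
};

static void sketch_init_defaults(struct sketch_init_state *s)
{
    s->defaults_ready = true;
}

static void sketch_register_protosw(struct sketch_init_state *s)
{
    s->proto_visible = true;
}

/* The control socket is itself a socket, so the protocol must be visible. */
static bool sketch_init_ctl_sock(struct sketch_init_state *s)
{
    if (!s->proto_visible)
        return false;
    s->ctl_sock_ready = true;
    return true;
}

/* What a concurrent socket() call risks: protocol found, defaults missing. */
static bool sketch_user_socket_unsafe(const struct sketch_init_state *s)
{
    return s->proto_visible && !s->defaults_ready;
}
```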

    Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
    Signed-off-by: Vlad Yasevich
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

04 Sep, 2015

2 commits

  • Commit 0ca50d12fe46 added a restriction that the address must belong to
    the output interface, so that sctp will use the right interface even
    when using secondary addresses.

    But it breaks IPVS setups, in which people are used to attaching VIP
    addresses to the loopback interface on real servers. It's preferred to
    attach to the interface actually in use, but this is a very common setup
    that used to work.

    This patch therefore saves the first good routing result, even if it would
    be going out through an interface that doesn't have that address. If no
    better hit is found, it's then used. This effectively restores the original
    behavior if no better interface could be found.
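
    The selection rule reads roughly like the sketch below. The candidate
    structure and function name are hypothetical and the real code works on
    routing results rather than flags; the point is only the "prefer a
    match, else fall back to the first good result" logic.

```c
/* One bound address after a routing attempt (hypothetical model). */
struct sketch_candidate {
    int routable;       /* routing lookup succeeded for this source */
    int on_output_if;   /* address belongs to the output interface */
};

/* Returns the index of the chosen candidate, or -1 if none is routable. */
static int sketch_pick_source(const struct sketch_candidate *c, int n)
{
    int fallback = -1;
    int i;

    for (i = 0; i < n; i++) {
        if (!c[i].routable)
            continue;
        if (c[i].on_output_if)
            return i;           /* best case: address is on the interface */
        if (fallback < 0)
            fallback = i;       /* remember the first good routing result */
    }
    return fallback;            /* restores the original behaviour */
}
```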

    Fixes: 0ca50d12fe46 ("sctp: fix src address selection if using secondary addresses")
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • Commit 0ca50d12fe46 failed to release the reference to dst entries that
    it decided to skip.

    Fixes: 0ca50d12fe46 ("sctp: fix src address selection if using secondary addresses")
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

31 Aug, 2015

1 commit


29 Aug, 2015

2 commits

  • When removing a non-primary transport during ASCONF
    processing, we end up traversing the transport list
    twice: once in sctp_cmd_del_non_primary, and once in
    sctp_assoc_del_peer. We can avoid the second
    search and call sctp_assoc_rm_peer() instead.
    Found by code inspection during code reviews.

    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • RFC 5061:
    This is an opaque integer assigned by the sender to identify each
    request parameter. The receiver of the ASCONF Chunk will copy this
    32-bit value into the ASCONF Response Correlation ID field of the
    ASCONF-ACK response parameter. The sender of the ASCONF can use this
    same value in the ASCONF-ACK to find which request the response is
    for. Note that the receiver MUST NOT change this 32-bit value.

    Address Parameter: TLV

    This field contains an IPv4 or IPv6 address parameter, as described
    in Section 3.3.2.1 of [RFC4960].

    ASCONF chunk with Error Cause Indication Parameter (Unresolvable Address)
    should be sent if the Delete IP Address is not part of the association.

    Endpoint A Endpoint B
    (ESTABLISHED) (ESTABLISHED)

    ASCONF ----------------->
    (Delete IP Address)

    Acked-by: Vlad Yasevich
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    lucien
     

28 Aug, 2015

3 commits

  • David S. Miller
     
  • Commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
    fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING state by not
    resetting the association overall_error_count. This allowed the association
    to better enforce the assoc.max_retrans limit.

    However, the same issue still exists when the association is in the SHUTDOWN_RECEIVED
    state. In this state, HB-ACKs will continue to reset the overall_error_count
    for the association, which extends the lifetime of the association unnecessarily.

    This patch solves this by resetting the overall_error_count whenever the current
    state is smaller than SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we
    end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT
    states, but they are not really impacted because we disable Heartbeats in those
    states.
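
    A minimal sketch of the state comparison. The enum ordering below is an
    assumption made for illustration (the shutdown states sort at or after
    the pending state), and the structure and function names are
    hypothetical.

```c
/* Assumed ordering: every shutdown state compares >= SK_SHUTDOWN_PENDING. */
enum sketch_state {
    SK_CLOSED,
    SK_COOKIE_WAIT,
    SK_COOKIE_ECHOED,
    SK_ESTABLISHED,
    SK_SHUTDOWN_PENDING,
    SK_SHUTDOWN_SENT,
    SK_SHUTDOWN_RECEIVED,
    SK_SHUTDOWN_ACK_SENT
};

struct sketch_hb_assoc {
    enum sketch_state state;
    int overall_error_count;
};

/* On a heartbeat ACK, clear the error count only before shutdown starts. */
static void sketch_on_hb_ack(struct sketch_hb_assoc *a)
{
    if (a->state < SK_SHUTDOWN_PENDING)
        a->overall_error_count = 0;
}
```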

    Fixes: Commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    lucien
     
  • In sctp_process_asconf(), we get the address parameter from the beginning
    of the addip params, but we never check whether it's really there. If the
    addr param is not there, the chunk can still pass sctp_verify_asconf() and
    then be handled by sctp_process_asconf(), which is not safe.

    So add a check in sctp_verify_asconf() that the address parameter is at
    the beginning, or return false to send an abort.

    Note that this also detects multiple address parameters and rejects them.
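
    The rule can be sketched over a plain list of parameter types. This is a
    simplification: the function name is hypothetical and the real check
    walks TLV headers inside the chunk. The type values 5 and 6 are the
    IPv4 and IPv6 address parameter types from RFC 4960.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PARAM_IPV4_ADDRESS 5
#define SK_PARAM_IPV6_ADDRESS 6

static bool sketch_is_addr_param(uint16_t type)
{
    return type == SK_PARAM_IPV4_ADDRESS || type == SK_PARAM_IPV6_ADDRESS;
}

/*
 * Accept only if the first parameter is an address parameter and no
 * further address parameter follows it.
 */
static bool sketch_verify_param_order(const uint16_t *types, size_t n)
{
    size_t i;

    if (n == 0 || !sketch_is_addr_param(types[0]))
        return false;
    for (i = 1; i < n; i++) {
        if (sketch_is_addr_param(types[i]))
            return false;
    }
    return true;
}
```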

    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    lucien
     

01 Aug, 2015

1 commit


27 Jul, 2015

1 commit

  • Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from
    RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the
    (as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 ("net:
    sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1].

    Imho, it was not a good idea, and we should just revert that message
    for a couple of reasons:

    1) It's uapi and therefore set in stone forever.

    2) To be able to run on older and newer kernels, an SCTP application
    would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/
    SCTP_RCVINFO support, so that on older kernels, it can make use
    of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO.
    In my (limited) experience, a lot of SCTP appliances are migrating
    to newer kernels only ve(ee)ry slowly.

    3) Some people don't have the chance to change their applications,
    f.e. due to proprietary legacy stuff. So, they'll hit this warning
    in fast path and are stuck with older kernels.

    But, e.g., due to point 1) I really fail to see the benefit of a warning.
    So just revert that for now; the issue was reported by Jamal.

    [1] http://thread.gmane.org/gmane.linux.network/321960/

    Reported-by: Jamal Hadi Salim
    Signed-off-by: Daniel Borkmann
    Cc: Michael Tuexen
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

21 Jul, 2015

3 commits

  • Cookie ACK is always received by the association initiator, so fix the
    comment to avoid confusion.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • In short, sctp is likely to incorrectly choose src address if socket is
    bound to secondary addresses. This patch fixes it by adding a new check
    that checks if such src address belongs to the interface that routing
    identified as output.

    This is enough to avoid rp_filter drops on remote peer.

    Details:

    Currently, sctp will do a routing attempt without specifying the src
    address and compare the returned value (preferred source) with the
    addresses that the socket is bound to. When using secondary addresses,
    this will not match.

    Then it will try specifying each of the addresses that the socket is
    bound to and re-routing, checking if that address is valid as src for
    that dst. Thing is, this check alone is weak:

    # ip r l
    192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.149
    192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.147

    # ip a l
    1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0
           valid_lft 2160sec preferred_lft 2160sec
        inet 192.168.122.148/24 scope global secondary eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::5054:ff:fe15:186a/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff
        inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1
           valid_lft 2162sec preferred_lft 2162sec
        inet 192.168.100.148/24 scope global secondary eth1
           valid_lft forever preferred_lft forever
        inet6 fe80::5054:ff:feb3:9146/64 scope link
           valid_lft forever preferred_lft forever
    4: ens9: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff
        inet6 fe80::5054:ff:fe05:47ee/64 scope link
           valid_lft forever preferred_lft forever

    # ip r g 192.168.100.193 from 192.168.122.148
    192.168.100.193 from 192.168.122.148 dev eth1
    cache

    Even if you specify an interface:

    # ip r g 192.168.100.193 from 192.168.122.148 oif eth1
    192.168.100.193 from 192.168.122.148 dev eth1
    cache

    Although this would be valid, peers using rp_filter will drop such
    packets as their src doesn't match the routes for that interface.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • Paves the way for the next patch. Functionality stays untouched.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

30 Jun, 2015

1 commit

  • A NULL pointer dereference is possible during statistics update if the route
    used for an OOTB response is removed at an unfortunate time. If the route exists when
    we receive an OOTB packet and we finally jump into sctp_packet_transmit() to send
    ABORT, but in the meantime the route is removed under our feet, we take the "no_route"
    path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).

    But sctp_ootb_pkt_new(), used to prepare the response packet, doesn't call
    sctp_transport_set_owner() and therefore there is no asoc associated with this
    packet. A temporary asoc just for OOTB responses is probably overkill, so just
    introduce a check like in all other places in sctp_packet_transmit() where
    "asoc" is dereferenced.

    To reproduce this, one needs to
    0. ensure that sctp module is loaded (otherwise ABORT is not generated)
    1. remove default route on the machine
    2. while true; do
         ip route del [interface-specific route]
         ip route add [interface-specific route]
       done
    3. send enough OOTB packets (e.g. HB REQs) from another host to trigger ABORT
       response
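
    The guard introduced for the "no_route" path can be sketched as below.
    The structures and the counter are hypothetical user-space stand-ins for
    the association, its socket and the per-namespace statistics; only the
    shape of the NULL check reflects the fix described above.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_net   { unsigned long out_no_routes; };
struct sketch_sk    { struct sketch_net *net; };
struct sketch_asoc  { struct sketch_sk *sk; };

/*
 * Bump the "no route" counter only when an association exists.
 * OOTB responses carry no association, so asoc may be NULL here.
 */
static bool sketch_account_no_route(const struct sketch_asoc *asoc)
{
    if (!asoc)
        return false;           /* nothing to account against */
    asoc->sk->net->out_no_routes++;
    return true;
}
```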

    On x86_64 the crash looks like this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: [] sctp_packet_transmit+0x63c/0x730 [sctp]
    PGD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: ...
    CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
    Hardware name: ...
    task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
    RIP: 0010:[] [] sctp_packet_transmit+0x63c/0x730 [sctp]
    RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
    RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
    RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
    R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
    R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
    FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
    Stack:
    ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
    ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
    0000000000000000 0000000000000001 0000000000000000 0000000000000000
    Call Trace:

    [] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
    [] ? sctp_transport_put+0x52/0x80 [sctp]
    [] sctp_do_sm+0xb8c/0x19a0 [sctp]
    [] ? trigger_load_balance+0x90/0x210
    [] ? update_process_times+0x59/0x60
    [] ? timerqueue_add+0x60/0xb0
    [] ? enqueue_hrtimer+0x29/0xa0
    [] ? read_tsc+0x9/0x10
    [] ? put_page+0x55/0x60
    [] ? clockevents_program_event+0x6d/0x100
    [] ? skb_free_head+0x58/0x80
    [] ? chksum_update+0x1b/0x27 [crc32c_generic]
    [] ? crypto_shash_update+0xce/0xf0
    [] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
    [] sctp_inq_push+0x46/0x60 [sctp]
    [] sctp_rcv+0x880/0x910 [sctp]
    [] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
    [] ? sctp_csum_update+0x20/0x20 [sctp]
    [] ? ip_route_input_noref+0x235/0xd30
    [] ? ack_ioapic_level+0x7b/0x150
    [] ip_local_deliver_finish+0xae/0x210
    [] ip_local_deliver+0x35/0x90
    [] ip_rcv_finish+0xf5/0x370
    [] ip_rcv+0x2b8/0x3a0
    [] __netif_receive_skb_core+0x763/0xa50
    [] __netif_receive_skb+0x18/0x60
    [] netif_receive_skb_internal+0x40/0xd0
    [] napi_gro_receive+0xe8/0x120
    [] rtl8169_poll+0x2da/0x660 [r8169]
    [] net_rx_action+0x21a/0x360
    [] __do_softirq+0xe1/0x2d0
    [] irq_exit+0xad/0xb0
    [] do_IRQ+0x58/0xf0
    [] common_interrupt+0x6d/0x6d

    [] ? hrtimer_start+0x18/0x20
    [] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
    [] ? mwait_idle+0x60/0xa0
    [] arch_cpu_idle+0xf/0x20
    [] cpu_startup_entry+0x3ec/0x480
    [] rest_init+0x85/0x90
    [] start_kernel+0x48b/0x4ac
    [] ? early_idt_handlers+0x120/0x120
    [] x86_64_start_reservations+0x2a/0x2c
    [] x86_64_start_kernel+0x161/0x184
    Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
    RIP [] sctp_packet_transmit+0x63c/0x730 [sctp]
    RSP
    CR2: 0000000000000020
    ---[ end trace 5aec7fd2dc983574 ]---
    Kernel panic - not syncing: Fatal exception in interrupt
    Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
    drm_kms_helper: panic occurred, switching back to text console
    ---[ end Kernel panic - not syncing: Fatal exception in interrupt

    Signed-off-by: Alexander Sverdlin
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Alexander Sverdlin
     

29 Jun, 2015

1 commit


24 Jun, 2015

1 commit


15 Jun, 2015

1 commit

  • ->auto_asconf_splist is per namespace and mangled by functions like
    sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

    Also, the call to inet_sk_copy_descendant() was backing up
    ->auto_asconf_list through the copy but was not honoring
    ->do_auto_asconf, which could lead to list corruption if it was
    different between both sockets.

    This commit thus fixes the list handling by using ->addr_wq_lock
    spinlock to protect the list. Special handling is done upon socket
    creation and destruction for that. Error handling in sctp_init_sock()
    will never return an error after having initialized asconf, so
    sctp_destroy_sock() can be called without addr_wq_lock. The lock now
    will be taken in sctp_close_sock(), before locking the socket, so we
    don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

    Instead of taking the lock in sctp_sock_migrate() for copying and
    restoring the list values, it's preferred to avoid rewriting it by
    implementing sctp_copy_descendant().

    Issue was found with a test application that kept flipping sysctl
    default_auto_asconf on and off, but one could trigger it by issuing
    simultaneous setsockopt() calls on multiple sockets or by
    creating/destroying sockets fast enough. This is only triggerable
    locally.

    Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
    Reported-by: Ji Jianwen
    Suggested-by: Neil Horman
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

14 Jun, 2015

1 commit


13 Jun, 2015

1 commit

  • Currently, we can ask to authenticate DATA chunks and we can send DATA
    chunks on the same packet as COOKIE_ECHO, but if you try to combine
    both, the DATA chunk will be sent unauthenticated and peer won't accept
    it, leading to a communication failure.

    This happens because even though the data was queued after it was
    requested to authenticate DATA chunks, it was also queued before we
    could know that remote peer can handle authenticating, so
    sctp_auth_send_cid() returns false.

    The fix is whenever we set up an active key, re-check send queue for
    chunks that now should be authenticated. As a result, such packet will
    now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.

    Reported-by: Liu Wei
    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

26 May, 2015

2 commits

  • Instead of doing the rt6->rt6i_node check whenever we need
    to get the route's cookie, refactor it into rt6_get_cookie().
    This is prep work to handle FLOWI_FLAG_KNOWN_NH and also
    percpu rt6_info later.
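
    The shape of such a helper, as a user-space sketch. The structures are
    cut-down hypothetical stand-ins, and using a serial number field of the
    node as the cookie is an assumption for illustration; the message only
    states that the rt6i_node check moves into one place.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for the routing structures (hypothetical). */
struct sketch_fib6_node {
    uint32_t sernum;                    /* assumed source of the cookie */
};

struct sketch_rt6_info {
    struct sketch_fib6_node *rt6i_node; /* NULL when not linked into a tree */
};

/* One helper instead of an open-coded NULL check at every call site. */
static inline uint32_t sketch_rt6_get_cookie(const struct sketch_rt6_info *rt)
{
    return rt->rt6i_node ? rt->rt6i_node->sernum : 0;
}
```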

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This patch removes the assumptions that the returned rt is always
    a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
    destination and source address. The dst and src can be recovered from
    the calling site.

    We may consider renaming (rt6i_dst, rt6i_src) to
    (rt6i_key_dst, rt6i_key_src) later.

    Signed-off-by: Martin KaFai Lau
    Reviewed-by: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

11 May, 2015

1 commit


25 Mar, 2015

1 commit


03 Mar, 2015

1 commit

  • Now that TIPC no longer depends on the iocb argument in its internal
    implementations of the sendmsg() and recvmsg() hooks defined in the proto
    structure, no user of the iocb argument remains in them.
    We can therefore drop the redundant iocb argument completely from all
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

02 Mar, 2015

1 commit


06 Feb, 2015

1 commit

  • Conflicts:
    drivers/net/vxlan.c
    drivers/vhost/net.c
    include/linux/if_vlan.h
    net/core/dev.c

    The net/core/dev.c conflict was the overlap of one commit marking an
    existing function static whilst another was adding a new function.

    In the include/linux/if_vlan.h case, the type used for a local
    variable was changed in 'net', whereas the function got rewritten
    to fix a stacked vlan bug in 'net-next'.

    In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
    overlapped with an endianness fix for VHOST 1.0 in 'net'.

    In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
    in 'net-next' whereas in 'net' there was a bug fix to pass in the
    correct network namespace pointer in calls to this function.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Feb, 2015

1 commit


31 Jan, 2015

1 commit

  • When making use of RFC5061, section 4.2.4. for setting the primary IP
    address, we're passing a wrong parameter header to param_type2af(),
    always resulting in NULL being returned.

    At this point, param.p points to a sctp_addip_param struct, containing
    a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
    id. Following that, as also presented in RFC5061 section 4.2.4., comes
    the actual sctp_addr_param, which also contains a sctp_paramhdr, but
    this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
    param_type2af() can make use of. Since we already hold a pointer to
    addr_param from the previous line, just reuse it for param_type2af().
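
    The nesting described above can be sketched in user space as follows.
    All struct and function names here are hypothetical simplifications
    (type2af() returns an address family or 0, where the kernel helper
    returns a pointer or NULL); only the parameter type values 5, 6 and
    0xc004 are taken from the SCTP specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>		/* htons(), ntohs() */
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

#define PARAM_IPV4_ADDRESS	5	/* IPv4 address parameter */
#define PARAM_IPV6_ADDRESS	6	/* IPv6 address parameter */
#define PARAM_SET_PRIMARY	0xc004	/* Set Primary Address */

/* Hypothetical simplified parameter header; fields in network byte order. */
struct paramhdr {
	uint16_t type;
	uint16_t length;
};

/* Outer ASCONF parameter: header plus correlation id, address follows. */
struct addip_param {
	struct paramhdr hdr;
	uint32_t crr_id;
};

/* Map an address parameter type to an address family, 0 if unknown. */
static int type2af(uint16_t type_be)
{
	switch (ntohs(type_be)) {
	case PARAM_IPV4_ADDRESS:
		return AF_INET;
	case PARAM_IPV6_ADDRESS:
		return AF_INET6;
	default:
		return 0;
	}
}

/* The address parameter starts right after the outer header + crr_id. */
static const struct paramhdr *inner_addr_param(const struct addip_param *p)
{
	return (const struct paramhdr *)((const unsigned char *)p + sizeof(*p));
}
```

    Passing the outer header's type (0xc004) to type2af() never matches an
    address family, whereas the header returned by inner_addr_param() does.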

    Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
    Signed-off-by: Saran Maruti Ramanara
    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Saran Maruti Ramanara
     

27 Jan, 2015

1 commit

  • When hitting an INIT collision case during the 4WHS with AUTH enabled, as
    already described in detail in commit 1be9a950c646 ("net: sctp: inherit
    auth_capable on INIT collisions"), it can happen that we occasionally
    still remotely trigger the following panic on server side which seems to
    have been uncovered after the fix from commit 1be9a950c646 ...

    [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
    [ 533.913657] IP: [] __kmalloc+0x95/0x230
    [ 533.940559] PGD 5030f2067 PUD 0
    [ 533.957104] Oops: 0000 [#1] SMP
    [ 533.974283] Modules linked in: sctp mlx4_en [...]
    [ 534.939704] Call Trace:
    [ 534.951833] [] ? crypto_init_shash_ops+0x60/0xf0
    [ 534.984213] [] crypto_init_shash_ops+0x60/0xf0
    [ 535.015025] [] __crypto_alloc_tfm+0x6d/0x170
    [ 535.045661] [] crypto_alloc_base+0x4c/0xb0
    [ 535.074593] [] ? _raw_spin_lock_bh+0x12/0x50
    [ 535.105239] [] sctp_inet_listen+0x161/0x1e0 [sctp]
    [ 535.138606] [] SyS_listen+0x9d/0xb0
    [ 535.166848] [] system_call_fastpath+0x16/0x1b

    ... or depending on the application, for example this one:

    [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
    [ 1370.026506] IP: [] kmem_cache_alloc+0x75/0x1d0
    [ 1370.054568] PGD 633c94067 PUD 0
    [ 1370.070446] Oops: 0000 [#1] SMP
    [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
    [ 1370.963431] Call Trace:
    [ 1370.974632] [] ? SyS_epoll_ctl+0x53f/0x960
    [ 1371.000863] [] SyS_epoll_ctl+0x53f/0x960
    [ 1371.027154] [] ? anon_inode_getfile+0xd3/0x170
    [ 1371.054679] [] ? __alloc_fd+0xa7/0x130
    [ 1371.080183] [] system_call_fastpath+0x16/0x1b

    With slab debugging enabled, we can see that the poison has been overwritten:

    [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten
    [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
    [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
    [ 669.826424] __slab_alloc+0x4bf/0x566
    [ 669.826433] __kmalloc+0x280/0x310
    [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp]
    [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
    [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
    [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...]
    [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
    [ 669.826635] __slab_free+0x39/0x2a8
    [ 669.826643] kfree+0x1d6/0x230
    [ 669.826650] kzfree+0x31/0x40
    [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp]
    [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp]
    [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp]

    Since this only triggers in some collision-cases with AUTH, the problem at
    heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
    when having refcnt 1, once directly in sctp_assoc_update() and yet again
    from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
    the already kzfree'd memory, which is also consistent with the observation
    of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
    at a later point in time when poison is checked on new allocation).

    Reference counting of auth keys revisited:

    Shared keys for AUTH chunks are being stored in endpoints and associations
    in endpoint_shared_keys list. On endpoint creation, a null key is being
    added; on association creation, all endpoint shared keys are being cached
    and thus cloned over to the association. struct sctp_shared_key only holds
    a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
    keeps track of users internally through refcounting. Naturally, on assoc
    or endpoint destruction, sctp_shared_key are being destroyed directly and
    the reference on sctp_auth_bytes dropped.

    User space can add keys to either list via setsockopt(2) through struct
    sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
    adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
    with refcount 1 and in case of replacement drops the reference on the old
    sctp_auth_bytes. A key can be set active from user space through setsockopt()
    on the id via sctp_auth_set_active_key(), which iterates through either
    endpoint_shared_keys list and, in case of an assoc, invokes (as one of
    various places) sctp_auth_asoc_init_active_key().

    sctp_auth_asoc_init_active_key() computes the actual secret from local's
    and peer's random, hmac and shared key parameters and returns a new key
    directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
    the reference if there was a previous one. The secret, on which we
    eventually double drop the ref, comes from sctp_auth_asoc_set_secret() with
    initial refcount of 1, which also stays unchanged eventually in
    sctp_assoc_update(). This key is later being used for crypto layer to
    set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

    To close the loop: asoc->asoc_shared_key is freshly allocated secret
    material and independent of the sctp_shared_key management keeping track
    of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76
    ("net: sctp: fix memory leak in auth key management") is independent of
    this bug here since it concerns a different layer (though same structures
    being used eventually). asoc->asoc_shared_key is reference dropped correctly
    on assoc destruction in sctp_association_free() and when active keys are
    being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
    of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
    to remove that sctp_auth_key_put() from there which fixes these panics.
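
    The ownership rule described above can be illustrated with a small
    user-space model. Every name below is a hypothetical stand-in for the
    kernel code, and the keys_freed counter exists only so that the frees
    can be observed; it is a sketch of the fixed behaviour, not the actual
    patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sctp_auth_bytes. */
struct key_bytes {
	int refcnt;
	size_t len;
};

/* Hypothetical stand-in for the association holding the active secret. */
struct assoc {
	struct key_bytes *shared_key;
};

static int keys_freed;	/* demo-only counter of actual frees */

static struct key_bytes *key_create(size_t len)
{
	struct key_bytes *k = calloc(1, sizeof(*k));

	if (!k)
		return NULL;
	k->refcnt = 1;	/* the creator holds the only reference */
	k->len = len;
	return k;
}

static void key_put(struct key_bytes *k)
{
	if (k && --k->refcnt == 0) {
		keys_freed++;
		free(k);
	}
}

/* Computes a fresh secret and drops the reference on the previous one. */
static int assoc_init_active_key(struct assoc *a)
{
	struct key_bytes *secret = key_create(16);

	if (!secret)
		return -1;
	key_put(a->shared_key);	/* the single, legitimate put */
	a->shared_key = secret;
	return 0;
}

/* Fixed update path: no key_put() here. An extra put at this point would
 * free the refcnt-1 key and leave a->shared_key dangling, so the put in
 * assoc_init_active_key() would then hit already freed memory. */
static int assoc_update(struct assoc *a)
{
	return assoc_init_active_key(a);
}
```

    With a single owner and a refcount of 1, exactly one put per replaced
    key is correct; the model frees each key once across an init followed
    by an update.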

    Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

18 Jan, 2015

1 commit

  • I.e. one-to-many sockets in SCTP are not required to explicitly
    call into connect(2) or sctp_connectx(2) prior to data exchange.
    Instead, they can directly invoke sendmsg(2) and the SCTP stack
    will automatically trigger connection establishment through 4WHS
    via sctp_primitive_ASSOCIATE(). However, this in its current
    implementation is racy: INIT is being sent out immediately (as
    it cannot be bundled anyway) and the rest of the DATA chunks are
    queued up for later xmit when connection is established, meaning
    sendmsg(2) will return successfully. This behaviour can result
    in an undesired side-effect that the kernel made the application
    think the data has already been transmitted, although none of it
    has actually left the machine, worst case even after close(2)'ing
    the socket.

    Instead, when the association from client side has been shut down
    e.g. first gracefully through SCTP_EOF and then close(2), the
    client could afterwards still receive the server's INIT_ACK due
    to a connection with higher latency. This INIT_ACK is then considered
    out of the blue and hence responded with ABORT as there was no
    alive assoc found anymore. This can be easily reproduced, e.g.
    with the sctp_test application from lksctp. One way to fix this race
    is to wait for the handshake to actually complete.

    The fix defers waiting after sctp_primitive_ASSOCIATE() and
    sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
    from sctp_sendmsg() have already been placed into the output
    queue through the side-effect interpreter, and therefore can then
    be bundled together with COOKIE_ECHO control chunks.

    strace from example application (shortened):

    socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
    sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
    msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
    sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
    msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
    sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
    msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
    sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
    msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
    sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
    msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
    msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
    close(3) = 0

    tcpdump before patch (fooling the application):

    22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
    22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
    22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]

    tcpdump after patch:

    14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
    14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
    14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
    14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
    14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
    14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
    14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
    14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
    14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
    14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
    14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]

    Looks like this bug is from the pre-git history museum. ;)

    Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann