01 Oct, 2013

1 commit

  • - Move sysctl_local_ports from a global variable into struct netns_ipv4.
    - Modify inet_get_local_port_range to take a struct net, and update all
    of the callers.
    - Move the initialization of sysctl_local_ports into
    sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

    v2:
    - Ensure indentation used tabs
    - Fixed ip.h so it applies cleanly to todays net-next

    v3:
    - Compile fixes of strange callers of inet_get_local_port_range.
    This patch now successfully passes an allmodconfig build.
    Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

    Originally-by: Samya
    Acked-by: Nicolas Dichtel
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

17 Sep, 2013

1 commit

  • Adapt the same behaviour for SCTP as present in TCP for ICMP redirect
    messages. For IPv6, RFC4443, section 2.4. says:

    ...
    (e) An ICMPv6 error message MUST NOT be originated as a result of
    receiving the following:
    ...
    (e.2) An ICMPv6 redirect message [IPv6-DISC].
    ...

    Therefore, do not report an error to user space, just invoke dst's redirect
    callback and leave, same for IPv4 as done in TCP as well. The implication
    w/o having this patch could be that the reception of such packets would
    generate a poll notification and in worst case it could even tear down the
    whole connection. Therefore, stop updating sk_err on redirects.

    Reported-by: Duan Jiong
    Reported-by: Hannes Frederic Sowa
    Suggested-by: Vlad Yasevich
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

13 Sep, 2013

1 commit

  • Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not
    being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport
    does not seem to have the desired effect:

    SCTP + IPv4:

    22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116)
    192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72
    22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340)
    192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1):

    SCTP + IPv6:

    22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364)
    fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp
    1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10]

    Moreover, Alan says:

    This problem was seen with both Racoon and Racoon2. Other people have seen
    this with OpenSwan. When IPsec is configured to encrypt all upper layer
    protocols the SCTP connection does not initialize. After using Wireshark to
    follow packets, this is because the SCTP packet leaves Box A unencrypted and
    Box B believes all upper layer protocols are to be encrypted so it drops
    this packet, causing the SCTP connection to fail to initialize. When IPsec
    is configured to encrypt just SCTP, the SCTP packets are observed unencrypted.

    In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext"
    string on the other end, results in cleartext on the wire where SCTP eventually
    does not report any errors, thus in the latter case that Alan reports, the
    non-paranoid user might think he's communicating over an encrypted transport on
    SCTP although he's not (tcpdump ... -X):

    ...
    0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000 ]p.......}.l....
    0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000 ....plaintext...

    Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the
    receiver side. Initial follow-up analysis from Alan's bug report was done by
    Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this.

    SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit().
    This has the implication that it probably never really got updated along with
    changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers.

    SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since
    a call to inet6_csk_xmit() would solve this problem, but result in unecessary
    route lookups, let us just use the cached flowi6 instead that we got through
    sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(),
    we do the route lookup / flow caching in sctp_transport_route(), hold it in
    tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in
    sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect
    of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst()
    instead to get the correct source routed dst entry, which we assign to the skb.

    Also source address routing example from 625034113 ("sctp: fix sctp to work with
    ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095
    it is actually 'recommended' to not use that anyway due to traffic amplification [1].
    So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if
    we overwrite the flow destination here, the lower IPv6 layer will be unable to
    put the correct destination address into IP header, as routing header is added in
    ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside,
    result of this patch is that we do not have any XfrmInTmplMismatch increase plus on
    the wire with this patch it now looks like:

    SCTP + IPv6:

    08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba:
    AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72
    08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a:
    AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296

    This fixes Kernel Bugzilla 24412. This security issue seems to be present since
    2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have
    its fun with that. lksctp-tools IPv6 regression test suite passes as well with
    this patch.

    [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf

    Reported-by: Alan Chester
    Reported-by: Alexey Dobriyan
    Signed-off-by: Daniel Borkmann
    Cc: Steffen Klassert
    Cc: Hannes Frederic Sowa
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

12 Sep, 2013

2 commits

  • This was originally reported in [1] and posted by Neil Horman [2], he said:

    Fix up a missed null pointer check in the asconf code. If we don't find
    a local address, but we pass in an address length of more than 1, we may
    dereference a NULL laddr pointer. Currently this can't happen, as the only
    users of the function pass in the value 1 as the addrcnt parameter, but
    its not hot path, and it doesn't hurt to check for NULL should that ever
    be the case.

    The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered
    from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1
    while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR
    so that we do *not* find a single address in the association's bind address
    list that is not in the packed array of addresses. If this happens when we
    have an established association with ASCONF-capable peers, then we could get
    a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1
    and call later sctp_make_asconf_update_ip() with NULL laddr.

    BUT: this actually won't happen as sctp_bindx_rem() will catch such a case
    and return with an error earlier. As this is incredably unintuitive and error
    prone, add a check to catch at least future bugs here. As Neil says, its not
    hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the
    single-homed host").

    [1] http://www.spinics.net/lists/linux-sctp/msg02132.html
    [2] http://www.spinics.net/lists/linux-sctp/msg02133.html

    Reported-by: Dan Carpenter
    Signed-off-by: Neil Horman
    Signed-off-by: Daniel Borkmann
    Cc: Michio Honda
    Acked-By: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • If we do not add braces around ...

    mask |= POLLERR |
    sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0;

    ... then this condition always evaluates to true as POLLERR is
    defined as 8 and binary or'd with whatever result comes out of
    sock_flag(). Hence instead of (X | Y) ? A : B, transform it into
    X | (Y ? A : B). Unfortunatelty, commit 8facd5fb73 ("net: fix
    smatch warnings inside datagram_poll") forgot about SCTP. :-(

    Introduced by 7d4c04fc170 ("net: add option to enable error queue
    packets waking select").

    Signed-off-by: Daniel Borkmann
    Cc: Jacob Keller
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Acked-by: Jacob Keller
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

05 Sep, 2013

1 commit

  • net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4

    Initially the problem was observed with ipsec, but later it became clear that
    SCTP data chunk fragmentation algorithm has problems with MTU values which are
    not multiple of 4. Test program was used which just transmits 2000 bytes long
    packets to other host. tcpdump was used to observe re-fragmentation in IP layer
    after SCTP already fragmented data chunks.

    With MTU 1500:
    12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
    10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
    12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
    10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
    12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

    With MTU 1499:
    13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
    10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
    13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
    10.151.38.153 > 10.151.24.91: ip-proto-132
    13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
    10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
    13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

    Here problem in data portion limit calculation leads to re-fragmentation in IP,
    which is sub-optimal. The problem is max_data initial value, which doesn't take
    into account the fact, that data chunk must be padded to 4-bytes boundary.
    It's enough to correct max_data, because all later adjustments are correctly
    aligned to 4-bytes boundary.

    After the fix is applied, everything is fragmented correctly for uneven MTUs:
    15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
    10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
    15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
    10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
    15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

    The bug was there for years already, but
    - is a performance issue, the packets are still transmitted
    - doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)

    Signed-off-by: Alexander Sverdlin
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Alexander Sverdlin
     

04 Sep, 2013

1 commit

  • This is a follow-up commit for commit b1dcdc68b1f4 ("net: tcp_probe:
    allow more advanced ingress filtering by mark") that allows for
    advanced SCTP probe module filtering based on skb mark (for a more
    detailed description and advantages using mark, refer to b1dcdc68b1f4).
    The current option to filter by a given port is still being preserved.

    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

30 Aug, 2013

1 commit


23 Aug, 2013

1 commit


17 Aug, 2013

1 commit


16 Aug, 2013

1 commit


14 Aug, 2013

1 commit

  • This patch adds a base infrastructure that allows SCTP to do
    memory accounting for control chunks. Real accounting code will
    follow.

    This patch alos fixes the following triggered bug ...

    [ 553.109742] kernel BUG at include/linux/skbuff.h:1813!
    [ 553.109766] invalid opcode: 0000 [#1] SMP
    [ 553.109789] Modules linked in: sctp libcrc32c rfcomm [...]
    [ 553.110259] uinput i915 i2c_algo_bit drm_kms_helper e1000e drm ptp
    pps_core i2c_core wmi video sunrpc
    [ 553.110320] CPU: 0 PID: 1636 Comm: lt-test_1_to_1_ Not tainted
    3.11.0-rc3+ #2
    [ 553.110350] Hardware name: LENOVO 74597D6/74597D6, BIOS 6DET60WW
    (3.10 ) 09/17/2009
    [ 553.110381] task: ffff88020a01dd40 ti: ffff880204ed0000 task.ti:
    ffff880204ed0000
    [ 553.110411] RIP: 0010:[] []
    skb_orphan.part.9+0x4/0x6 [sctp]
    [ 553.110459] RSP: 0018:ffff880204ed1bb8 EFLAGS: 00010286
    [ 553.110483] RAX: ffff8802086f5a40 RBX: ffff880204303300 RCX:
    0000000000000000
    [ 553.110487] RDX: ffff880204303c28 RSI: ffff8802086f5a40 RDI:
    ffff880202158000
    [ 553.110487] RBP: ffff880204ed1bb8 R08: 0000000000000000 R09:
    0000000000000000
    [ 553.110487] R10: ffff88022f2d9a04 R11: ffff880233001600 R12:
    0000000000000000
    [ 553.110487] R13: ffff880204303c00 R14: ffff8802293d0000 R15:
    ffff880202158000
    [ 553.110487] FS: 00007f31b31fe740(0000) GS:ffff88023bc00000(0000)
    knlGS:0000000000000000
    [ 553.110487] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 553.110487] CR2: 000000379980e3e0 CR3: 000000020d225000 CR4:
    00000000000407f0
    [ 553.110487] Stack:
    [ 553.110487] ffff880204ed1ca8 ffffffffa068d7fc 0000000000000000
    0000000000000000
    [ 553.110487] 0000000000000000 ffff8802293d0000 ffff880202158000
    ffffffff81cb7900
    [ 553.110487] 0000000000000000 0000400000001c68 ffff8802086f5a40
    000000000000000f
    [ 553.110487] Call Trace:
    [ 553.110487] [] sctp_sendmsg+0x6bc/0xc80 [sctp]
    [ 553.110487] [] ? sock_has_perm+0x75/0x90
    [ 553.110487] [] inet_sendmsg+0x63/0xb0
    [ 553.110487] [] ? selinux_socket_sendmsg+0x23/0x30
    [ 553.110487] [] sock_sendmsg+0xa6/0xd0
    [ 553.110487] [] ? _raw_spin_unlock_bh+0x15/0x20
    [ 553.110487] [] SYSC_sendto+0x128/0x180
    [ 553.110487] [] ? SYSC_connect+0xdb/0x100
    [ 553.110487] [] ? sctp_inet_listen+0x71/0x1f0
    [sctp]
    [ 553.110487] [] SyS_sendto+0xe/0x10
    [ 553.110487] [] system_call_fastpath+0x16/0x1b
    [ 553.110487] Code: e0 48 c7 c7 00 22 6a a0 e8 67 a3 f0 e0 48 c7 [...]
    [ 553.110487] RIP [] skb_orphan.part.9+0x4/0x6
    [sctp]
    [ 553.110487] RSP
    [ 553.121578] ---[ end trace 46c20c5903ef5be2 ]---

    The approach taken here is to split data and control chunks
    creation a bit. Data chunks already have memory accounting
    so noting needs to happen. For control chunks, add stubs handlers.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

13 Aug, 2013

2 commits

  • Probably this one is quite unlikely to be triggered, but it's more safe
    to do the call_rcu() at the end after we have dropped the reference on
    the asoc and freed sctp packet chunks. The reason why is because in
    sctp_transport_destroy_rcu() the transport is being kfree()'d, and if
    we're unlucky enough we could run into corrupted pointers. Probably
    that's more of theoretical nature, but it's safer to have this simple fix.

    Introduced by commit 8c98653f ("sctp: sctp_close: fix release of bindings
    for deferred call_rcu's"). I also did the 8c98653f regression test and
    it's fine that way.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • The SCTP Quick failover draft [1] section 5.1, point 5 says that the cwnd
    should be 1 MTU. So, instead of 1, set it to 1 MTU.

    [1] https://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

    Reported-by: Karl Heiss
    Signed-off-by: Daniel Borkmann
    Cc: Neil Horman
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Aug, 2013

3 commits


03 Aug, 2013

1 commit

  • When sctp sits on IPv6, sctp_transport_dst_check pass cookie as ZERO,
    as a result ip6_dst_check always fail out. This behaviour makes
    transport->dst useless, because every sctp_packet_transmit must look
    for valid dst.

    Add a dst_cookie into sctp_transport, and set the cookie whenever we
    get new dst for sctp_transport. So dst validness could be checked
    against it.

    Since I have split genid for IPv4 and IPv6, also delete/add IPv6 address
    will also bump IPv6 genid. So issues we discussed in:
    http://marc.info/?l=linux-netdev&m=137404469219410&w=4
    have all been sloved for this patch.

    Signed-off-by: Fan Du
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    fan.du
     

28 Jul, 2013

1 commit


25 Jul, 2013

1 commit

  • The SCTP mailing list address to send patches or questions
    to is linux-sctp@vger.kernel.org and not
    lksctp-developers@lists.sourceforge.net anymore. Therefore,
    update all occurences.

    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Jul, 2013

1 commit

  • This fix has been proposed originally by Vlad Yasevich. He says:

    When SCTP makes forward progress (receives a SACK that acks new chunks,
    renegs, or answeres 0-window probes) or when HB-ACK arrives, mark
    the route as confirmed so we don't unnecessarily send NUD probes.

    Having a simple SCTP client/server that exchange data chunks every 1sec,
    without this patch ARP requests are sent periodically every 40-60sec.
    With this fix applied, an ARP request is only done once right at the
    "session" beginning. Also, when clearing the related ARP cache entry
    manually during the session, a new request is correctly done. I have
    only "backported" this to net-next and tested that it works, so full
    credit goes to Vlad.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Jul, 2013

1 commit

  • Macro get_unused_fd() is used to allocate a file descriptor with
    default flags. Those default flags (0) can be "unsafe":
    O_CLOEXEC must be used by default to not leak file descriptor
    across exec().

    Instead of macro get_unused_fd(), functions anon_inode_getfd()
    or get_unused_fd_flags() should be used with flags given by userspace.
    If not possible, flags should be set to O_CLOEXEC to provide userspace
    with a default safe behavor.

    In a further patch, get_unused_fd() will be removed so that
    new code start using anon_inode_getfd() or get_unused_fd_flags()
    with correct flags.

    This patch replaces calls to get_unused_fd() with equivalent call to
    get_unused_fd_flags(0) to preserve current behavor for existing code.

    The hard coded flag value (0) should be reviewed on a per-subsystem basis,
    and, if possible, set to O_CLOEXEC.

    Signed-off-by: Yann Droneaud
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Yann Droneaud
     

02 Jul, 2013

2 commits

  • After having reworked the debugging framework, Neil and Vlad agreed to
    get rid of the leftover SCTP_DBG_TSNS code for a couple of reasons:

    We can use systemtap scripts to investigate these things, we now have
    pr_debug() helpers that make life easier, and if we really need anything
    else besides those tools, we will be forced to come up with something
    better than we have there. Therefore, get rid of this ifdef debugging
    code entirely for now.

    Signed-off-by: Daniel Borkmann
    CC: Vlad Yasevich
    CC: Neil Horman
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • We should get rid of all own SCTP debug printk macros and use the ones
    that the kernel offers anyway instead. This makes the code more readable
    and conform to the kernel code, and offers all the features of dynamic
    debbuging that pr_debug() et al has, such as only turning on/off portions
    of debug messages at runtime through debugfs. The runtime cost of having
    CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
    is negligible [1]. If kernel debugging is completly turned off, then these
    statements will also compile into "empty" functions.

    While we're at it, we also need to change the Kconfig option as it /now/
    only refers to the ifdef'ed code portions in outqueue.c that enable further
    debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
    was enabled with this Kconfig option and has now been removed, we
    transform those code parts into WARNs resp. where appropriate BUG_ONs so
    that those bugs can be more easily detected as probably not many people
    have SCTP debugging permanently turned on.

    To turn on all SCTP debugging, the following steps are needed:

    # mount -t debugfs none /sys/kernel/debug
    # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control

    This can be done more fine-grained on a per file, per line basis and others
    as described in [2].

    [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
    [2] Documentation/dynamic-debug-howto.txt

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

26 Jun, 2013

4 commits

  • No need to have an extra ret variable when we directly can return
    the value of sctp_get_port_local().

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Rather instead of having the endpoint clean the garbage from the
    socket, use a sk_destruct handler sctp_destruct_sock(), that does
    the job for that when there are no more references on the socket.
    At least do this for our crypto transform through crypto_free_hash()
    that is allocated when in listening state.

    Also, perform sctp_put_port() only when sk is valid. At a later
    point in time we can still determine if there's an option of
    placing this into sk_prot->unhash() or sctp_endpoint_free() without
    any races. For now, leave it in sctp_endpoint_destroy() though.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • A trailing newline has been forgotten to add into the WARN().

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Currently, SCTP code defines its own timeval functions (since timeval
    is rarely used inside the kernel by others), namely tv_lt() and
    TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.

    We might as well remove all those, and operate directly on ktime
    structures for a couple of reasons: ktime is available on all archs;
    complexity of ktime calculations depending on the arch is less than
    (reduces to a simple arithmetic operations on archs with
    BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
    functions (other archs); code becomes more readable; macros can be
    thrown out.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

20 Jun, 2013

2 commits

  • Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones
     
  • Conflicts:
    drivers/net/wireless/ath/ath9k/Kconfig
    drivers/net/xen-netback/netback.c
    net/batman-adv/bat_iv_ogm.c
    net/wireless/nl80211.c

    The ath9k Kconfig conflict was a change of a Kconfig option name right
    next to the deletion of another option.

    The xen-netback conflict was overlapping changes involving the
    handling of the notify list in xen_netbk_rx_action().

    Batman conflict resolution provided by Antonio Quartulli, basically
    keep everything in both conflict hunks.

    The nl80211 conflict is a little more involved. In 'net' we added a
    dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
    Linus reported. Meanwhile in 'net-next' the handlers were converted
    to use pre and post doit handlers which use a flag to determine
    whether to hold the RTNL mutex around the operation.

    However, the dump handlers to not use this logic. Instead they have
    to explicitly do the locking. There were apparent bugs in the
    conversion of nl80211_dump_wiphy() in that we were not dropping the
    RTNL mutex in all the return paths, and it seems we very much should
    be doing so. So I fixed that whilst handling the overlapping changes.

    To simplify the initial returns, I take the RTNL mutex after we try
    to allocate 'tb'.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Jun, 2013

2 commits

  • SCTP_STATIC is just another define for the static keyword. It's use
    is inconsistent in the SCTP code anyway and it was introduced in the
    initial implementation of SCTP in 2.5. We have a regression suite in
    lksctp-tools, but this is for user space only, so noone makes use of
    this macro anymore. The kernel test suite for 2.5 is incompatible with
    the current SCTP code anyway.

    So simply Remove it, to be more consistent with the rest of the kernel
    code.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • t_new rather obfuscates things where everyone else is using actual
    function names instead of that macro, so replace it with kzalloc,
    which is the function t_new wraps.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Jun, 2013

4 commits

  • In case we need to bail out for whatever reason during assoc
    init, we call sctp_endpoint_put() and then sock_put(), however,
    we've hold both refs in reverse, non-symmetric order, so first
    sctp_endpoint_hold() and then sock_hold().

    Reverse this, so that in an error case we have sock_put() and then
    sctp_endpoint_put(). Actually shouldn't matter too much, since both
    cleanup paths do the right thing, but that way, it is more consistent
    with the rest of the code.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • It's only used at this one time, so we could remove it as well.
    This is valid and also makes it more explicit/obvious that in case
    of error the sp->ep is NULL here, i.e. for the sctp_destroy_sock()
    check that was recently added.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • While this currently cannot trigger any NULL pointer dereference in
    sctp_seq_dump_local_addrs(), better change the order of commands to
    prevent a future bug to happen. Although we first add SCTP_CMD_NEW_ASOC
    and then set the SCTP_CMD_INIT_CHOOSE_TRANSPORT, it is okay for now,
    since this primitive is only called by sctp_connect() or sctp_sendmsg()
    with sctp_assoc_add_peer() set first. However, lets do this precaution
    and first set the transport and then add it to the association hashlist
    to prevent in future something to possibly triggering this.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This clearly states a BUG somewhere in the SCTP code as e.g. fixed once
    in f28156335 ("sctp: Use correct sideffect command in duplicate cookie
    handling"). If this ever happens, throw a trace in the sideeffect engine
    where assocs clearly must have a primary_path assigned.

    When in sctp_seq_dump_local_addrs() also throw a WARN and bail out since
    we do not need to panic for printing this one asterisk. Also, it will
    avoid the not so obvious case when primary != NULL test passes and at a
    later point in time triggering a NULL ptr dereference caused by primary.
    While at it, also fix up the white space.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

14 Jun, 2013

1 commit

  • In commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86
    (refactor sctp_outq_teardown to insure proper re-initalization)
    we modified sctp_outq_teardown to use sctp_outq_init to fully re-initalize the
    outq structure. Steve West recently asked me why I removed the q->error = 0
    initalization from sctp_outq_teardown. I did so because I was operating under
    the impression that sctp_outq_init would properly initalize that value for us,
    but it doesn't. sctp_outq_init operates under the assumption that the outq
    struct is all 0's (as it is when called from sctp_association_init), but using
    it in __sctp_outq_teardown violates that assumption. We should do a memset in
    sctp_outq_init to ensure that the entire structure is in a known state there
    instead.

    Signed-off-by: Neil Horman
    Reported-by: "West, Steve (NSN - US/Fort Worth)"
    CC: Vlad Yasevich
    CC: netdev@vger.kernel.org
    CC: davem@davemloft.net
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Neil Horman
     

13 Jun, 2013

2 commits


11 Jun, 2013

1 commit

  • While stress testing sctp sockets, I hit the following panic:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: [] sctp_endpoint_free+0xe/0x40 [sctp]
    PGD 7cead067 PUD 7ce76067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: sctp(F) libcrc32c(F) [...]
    CPU: 7 PID: 2950 Comm: acc Tainted: GF 3.10.0-rc2+ #1
    Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011
    task: ffff88007ce0e0c0 ti: ffff88007b568000 task.ti: ffff88007b568000
    RIP: 0010:[] [] sctp_endpoint_free+0xe/0x40 [sctp]
    RSP: 0018:ffff88007b569e08 EFLAGS: 00010292
    RAX: 0000000000000000 RBX: ffff88007db78a00 RCX: dead000000200200
    RDX: ffffffffa049fdb0 RSI: ffff8800379baf38 RDI: 0000000000000000
    RBP: ffff88007b569e18 R08: ffff88007c230da0 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: ffff880077990d00 R14: 0000000000000084 R15: ffff88007db78a00
    FS: 00007fc18ab61700(0000) GS:ffff88007fc60000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000020 CR3: 000000007cf9d000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffff88007b569e38 ffff88007db78a00 ffff88007b569e38 ffffffffa049fded
    ffffffff81abf0c0 ffff88007db78a00 ffff88007b569e58 ffffffff8145b60e
    0000000000000000 0000000000000000 ffff88007b569eb8 ffffffff814df36e
    Call Trace:
    [] sctp_destroy_sock+0x3d/0x80 [sctp]
    [] sk_common_release+0x1e/0xf0
    [] inet_create+0x2ae/0x350
    [] __sock_create+0x11f/0x240
    [] sock_create+0x30/0x40
    [] SyS_socket+0x4c/0xc0
    [] ? do_page_fault+0xe/0x10
    [] ? page_fault+0x22/0x30
    [] system_call_fastpath+0x16/0x1b
    Code: 0c c9 c3 66 2e 0f 1f 84 00 00 00 00 00 e8 fb fe ff ff c9 c3 66 0f
    1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90
    8b 47 20 48 89 fb c6 47 1c 01 c6 40 12 07 e8 9e 68 01 00 48
    RIP [] sctp_endpoint_free+0xe/0x40 [sctp]
    RSP
    CR2: 0000000000000020
    ---[ end trace e0d71ec1108c1dd9 ]---

    I did not hit this with the lksctp-tools functional tests, but with a
    small, multi-threaded test program, that heavily allocates, binds,
    listens and waits in accept on sctp sockets, and then randomly kills
    some of them (no need for an actual client in this case to hit this).
    Then, again, allocating, binding, etc, and then killing child processes.

    This panic then only occurs when ``echo 1 > /proc/sys/net/sctp/auth_enable''
    is set. The cause for that is actually very simple: in sctp_endpoint_init()
    we enter the path of sctp_auth_init_hmacs(). There, we try to allocate
    our crypto transforms through crypto_alloc_hash(). In our scenario,
    it then can happen that crypto_alloc_hash() fails with -EINTR from
    crypto_larval_wait(), thus we bail out and release the socket via
    sk_common_release(), sctp_destroy_sock() and hit the NULL pointer
    dereference as soon as we try to access members in the endpoint during
    sctp_endpoint_free(), since endpoint at that time is still NULL. Now,
    if we have that case, we do not need to do any cleanup work and just
    leave the destruction handler.

    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann