24 Aug, 2018

1 commit

  • [ Upstream commit a9ba23d48dbc6ffd08426bb10f05720e0b9f5c14 ]

    At present the ipv6_renew_options_kern() function ends up calling into
    access_ok() which is problematic if done from inside an interrupt as
    access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
    (x86-64 is affected). Example warning/backtrace is shown below:

    WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
    ...
    Call Trace:

    ipv6_renew_option+0xb2/0xf0
    ipv6_renew_options+0x26a/0x340
    ipv6_renew_options_kern+0x2c/0x40
    calipso_req_setattr+0x72/0xe0
    netlbl_req_setattr+0x126/0x1b0
    selinux_netlbl_inet_conn_request+0x80/0x100
    selinux_inet_conn_request+0x6d/0xb0
    security_inet_conn_request+0x32/0x50
    tcp_conn_request+0x35f/0xe00
    ? __lock_acquire+0x250/0x16c0
    ? selinux_socket_sock_rcv_skb+0x1ae/0x210
    ? tcp_rcv_state_process+0x289/0x106b
    tcp_rcv_state_process+0x289/0x106b
    ? tcp_v6_do_rcv+0x1a7/0x3c0
    tcp_v6_do_rcv+0x1a7/0x3c0
    tcp_v6_rcv+0xc82/0xcf0
    ip6_input_finish+0x10d/0x690
    ip6_input+0x45/0x1e0
    ? ip6_rcv_finish+0x1d0/0x1d0
    ipv6_rcv+0x32b/0x880
    ? ip6_make_skb+0x1e0/0x1e0
    __netif_receive_skb_core+0x6f2/0xdf0
    ? process_backlog+0x85/0x250
    ? process_backlog+0x85/0x250
    ? process_backlog+0xec/0x250
    process_backlog+0xec/0x250
    net_rx_action+0x153/0x480
    __do_softirq+0xd9/0x4f7
    do_softirq_own_stack+0x2a/0x40

    ...

    While not present in the backtrace, ipv6_renew_option() ends up calling
    access_ok() via the following chain:

    access_ok()
    _copy_from_user()
    copy_from_user()
    ipv6_renew_option()

    The fix presented in this patch is to perform the userspace copy
    earlier in the call chain such that it is only called when the option
    data is actually coming from userspace; that place is
    do_ipv6_setsockopt(). Not only does this solve the problem seen in
    the backtrace above, it also allows us to simplify the code quite a
    bit by removing ipv6_renew_options_kern() completely. We also take
    this opportunity to clean up ipv6_renew_options()/ipv6_renew_option()
    a little.

    This patch is heavily based on a rough patch by Al Viro. I've taken
    his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
    to a memdup_user() call, made better use of the e_inval jump target in
    the same function, and cleaned up the use of ipv6_renew_option() by
    ipv6_renew_options().

    CC: Al Viro
    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
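
The approach in the commit above (copy the caller's data once at the entry point, then hand only private memory to inner helpers) can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: `dup_ext_header()` and `struct ext_hdr` are hypothetical names, and `malloc()`/`memcpy()` stand in for the role `memdup_user()` plays in `do_ipv6_setsockopt()`. The length check assumes the usual IPv6 extension header encoding of `(hdrlen + 1) * 8` bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* First two bytes shared by IPv6 extension headers. */
struct ext_hdr {
    uint8_t nexthdr;
    uint8_t hdrlen;   /* length in 8-byte units, not counting the first 8 bytes */
};

/*
 * Hypothetical helper: duplicate `optlen` bytes from the caller-supplied
 * buffer, then validate the private copy so later code never has to touch
 * the caller's memory again.
 * Returns a heap copy the caller must free(), or NULL with errno set.
 */
void *dup_ext_header(const void *src, size_t optlen)
{
    struct ext_hdr hdr;
    void *copy;

    if (src == NULL || optlen < sizeof(hdr)) {
        errno = EINVAL;
        return NULL;
    }

    copy = malloc(optlen);
    if (copy == NULL) {
        errno = ENOMEM;
        return NULL;
    }
    memcpy(copy, src, optlen);          /* the single copy from the caller */

    /* Validate the copy, not the original, which could change under us. */
    memcpy(&hdr, copy, sizeof(hdr));
    if (((size_t)hdr.hdrlen + 1) * 8 > optlen) {
        free(copy);
        errno = EINVAL;
        return NULL;
    }
    return copy;
}
```

Inner helpers that receive the returned pointer can use plain `memcpy()` on it, which is the property that lets the same code run from a context where touching user memory is not allowed.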
     

28 Feb, 2018

1 commit

  • commit 01ea306f2ac2baff98d472da719193e738759d93 upstream.

    The Syzbot reported a possible deadlock in the netfilter area caused by
    rtnl lock, xt lock and socket lock being acquired with a different order
    on different code paths, leading to the following backtrace:
    Reviewed-by: Xin Long

    ======================================================
    WARNING: possible circular locking dependency detected
    4.15.0+ #301 Not tainted
    ------------------------------------------------------
    syzkaller233489/4179 is trying to acquire lock:
    (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    but task is already holding lock:
    (&xt[i].mutex){+.+.}, at: []
    xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041

    which lock already depends on the new lock.
    ===

    Since commit 3f34cfae1238 ("netfilter: on sockopt() acquire sock lock
    only in the required scope"), we already acquire the socket lock in
    the innermost scope, where needed. In that commit I forgot to remove
    the outermost socket lock from the getsockopt() path; this commit
    addresses the issue by dropping it now.

    v1 -> v2: fix bad subj, added relevant 'fixes' tag

    Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
    Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
    Fixes: 3f34cfae1238 ("netfilter: on sockopt() acquire sock lock only in the required scope")
    Reported-by: syzbot+ddde1c7b7ff7442d7f2d@syzkaller.appspotmail.com
    Suggested-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Krzysztof Piotr Oledzki
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

25 Feb, 2018

1 commit

  • commit 3f34cfae1238848fd53f25e5c8fd59da57901f4b upstream.

    Syzbot reported several deadlocks in the netfilter area caused by
    rtnl lock and socket lock being acquired with a different order on
    different code paths, leading to backtraces like the following one:

    ======================================================
    WARNING: possible circular locking dependency detected
    4.15.0-rc9+ #212 Not tainted
    ------------------------------------------------------
    syzkaller041579/3682 is trying to acquire lock:
    (sk_lock-AF_INET6){+.+.}, at: [] lock_sock
    include/net/sock.h:1463 [inline]
    (sk_lock-AF_INET6){+.+.}, at: []
    do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167

    but task is already holding lock:
    (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (rtnl_mutex){+.+.}:
    __mutex_lock_common kernel/locking/mutex.c:756 [inline]
    __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
    mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
    rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
    register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
    tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
    xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
    check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
    find_check_entry.isra.7+0x935/0xcf0
    net/ipv6/netfilter/ip6_tables.c:580
    translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
    do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
    do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
    nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
    nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
    ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
    udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
    sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
    SYSC_setsockopt net/socket.c:1849 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1828
    entry_SYSCALL_64_fastpath+0x29/0xa0

    -> #0 (sk_lock-AF_INET6){+.+.}:
    lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
    lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
    lock_sock include/net/sock.h:1463 [inline]
    do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
    ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
    udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
    sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
    SYSC_setsockopt net/socket.c:1849 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1828
    entry_SYSCALL_64_fastpath+0x29/0xa0

    other info that might help us debug this:

    Possible unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(rtnl_mutex);
                            lock(sk_lock-AF_INET6);
                            lock(rtnl_mutex);
    lock(sk_lock-AF_INET6);

    *** DEADLOCK ***

    1 lock held by syzkaller041579/3682:
    #0: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    The problem, as Florian noted, is that nf_setsockopt() is always
    called with the socket lock held, even if the lock itself is required
    only for very tight scopes and only for some operations.

    This patch addresses the issue by moving the lock_sock() call to
    where it is really needed, namely ipv*_getorigdst(), so that
    nf_setsockopt() no longer needs to acquire both locks.

    Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
    Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
    Suggested-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
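
The report above comes down to two code paths taking the same pair of locks in opposite orders. A minimal sketch of that kind of check, assuming a toy dependency graph over three lock classes: `record_acquire()`, `struct lock_graph` and the enum names are hypothetical, and the logic is far simpler than the kernel's lockdep.

```c
#include <stdbool.h>
#include <string.h>

enum lock_class { RTNL_MUTEX, XT_MUTEX, SK_LOCK, NR_CLASSES };

struct lock_graph {
    /* dep[a][b] is true once lock b has been taken while a was held */
    bool dep[NR_CLASSES][NR_CLASSES];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_CLASSES; next++) {
        if (g->dep[from][next] && !seen[next] && reachable(g, next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "acquire `next` while holding `held`".
 * Returns false, and records nothing, if the new edge would close a cycle,
 * i.e. some earlier path already took `held` after `next`.
 */
bool record_acquire(struct lock_graph *g, int held, int next)
{
    bool seen[NR_CLASSES] = { false };

    if (reachable(g, next, held, seen))
        return false;
    g->dep[held][next] = true;
    return true;
}
```

Feeding it the two paths from the report, rtnl_mutex taken under the socket lock and then the socket lock taken under rtnl_mutex, makes the second call return false, which corresponds to the "possible circular locking dependency" warning.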
     

31 Jan, 2018

1 commit

  • [ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]

    Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
    sysctl setting") removed the initialisation of
    ipv6_pinfo::autoflowlabel and added a second flag to indicate
    whether this field or the net namespace default should be used.

    The getsockopt() handling for this case was not updated, so it
    currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
    not explicitly enabled. Fix it to return the effective value, whether
    that has been set at the socket or net namespace level.

    Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
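
From userspace, the value this fix concerns can be read back with getsockopt(). The following is a small sketch, assuming a Linux system; `query_autoflowlabel()` is a hypothetical helper name, and the fallback definition of `IPV6_AUTOFLOWLABEL` (70, as in linux/in6.h) is only there for libc headers that may not provide it.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPV6_AUTOFLOWLABEL
#define IPV6_AUTOFLOWLABEL 70   /* value from linux/in6.h, fallback for older libc headers */
#endif

/*
 * Hypothetical helper: create a fresh UDP/IPv6 socket and read the
 * IPV6_AUTOFLOWLABEL setting without ever setting it on the socket.
 * Returns the reported value, or -1 if the socket or the query fails.
 */
int query_autoflowlabel(void)
{
    int val = 0;
    socklen_t len = sizeof(val);
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_AUTOFLOWLABEL, &val, &len) < 0)
        val = -1;
    close(fd);
    return val;
}
```

On a kernel with this fix, the result for an untouched socket should follow the namespace default rather than always being 0.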
     

03 Jan, 2018

1 commit

  • [ Upstream commit 513674b5a2c9c7a67501506419da5c3c77ac6f08 ]

    sysctl.ipv6.auto_flowlabels defaults to 1. On our hosts, we set it to 2.
    If the sockopt doesn't set autoflowlabel, outgoing packets from the hosts
    are supposed to not include a flowlabel. This is true for normal packets,
    but not for reset packets.

    The reason is that ipv6_pinfo.autoflowlabel is set at sock creation.
    If we later change sysctl.ipv6.auto_flowlabels, ipv6_pinfo.autoflowlabel
    isn't changed, so the sock keeps the old auto flowlabel behavior. Reset
    packets suffer from this problem because they are sent from a special
    control socket, which is created at boot time. Since
    sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will
    always have its ipv6_pinfo.autoflowlabel set, even after the user sets
    sysctl.ipv6.auto_flowlabels to 2, so reset packets will always have a
    flowlabel. Normal socks created before the sysctl change suffer from
    the same issue. We can't even turn off autoflowlabel unless we kill all
    socks on the hosts.

    To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
    autoflowlabel setting from user, otherwise we always call
    ip6_default_np_autolabel() which has the new settings of sysctl.

    Note, this changes behavior a little bit. Before commit 42240901f7c4
    (ipv6: Implement different admin modes for automatic flow labels), the
    autoflowlabel behavior of a sock wasn't sticky, e.g., if the sysctl
    changed, existing connections changed autoflowlabel behavior. After that
    commit, autoflowlabel behavior is sticky for the whole life of the sock.
    With this patch, the behavior is no longer sticky.

    Cc: Martin KaFai Lau
    Cc: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: Shaohua Li
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Shaohua Li
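
The rule described above (an explicit socket option wins, otherwise the current sysctl decides at use time) can be modelled in a few lines. This is a simplified sketch, not the kernel code: `struct sock_model` and both function names are hypothetical. The commit text only confirms that sysctl value 1 enables labels by default and value 2 leaves them off unless the socket opts in; the treatment of values 0 and 3 here is an assumption.

```c
#include <stdbool.h>

/* Simplified stand-in for the per-socket state. */
struct sock_model {
    bool autoflowlabel_set;  /* true once IPV6_AUTOFLOWLABEL was set explicitly */
    bool autoflowlabel;      /* value chosen through the socket option */
};

/*
 * Namespace default derived from the auto_flowlabels sysctl.
 * Assumed meanings: 0 off, 1 on unless opted out, 2 off unless opted in,
 * 3 forced on. Only values 1 and 2 are confirmed by the commit text.
 */
static bool namespace_default(int auto_flowlabels)
{
    return auto_flowlabels == 1 || auto_flowlabels == 3;
}

/* Evaluated at use time, so a later sysctl change affects existing sockets. */
bool effective_autoflowlabel(const struct sock_model *sk, int auto_flowlabels)
{
    if (sk->autoflowlabel_set)
        return sk->autoflowlabel;
    return namespace_default(auto_flowlabels);
}
```

Because nothing is cached at socket creation, a long-lived socket such as the control socket used for resets picks up a sysctl change on its next packet.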
     

30 Aug, 2017

1 commit

  • ChunYu found a kernel warn_on during syzkaller fuzzing:

    [40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
    [40226.144849] Call Trace:
    [40226.147590]
    [40226.149859] dump_stack+0xe2/0x186
    [40226.176546] __warn+0x1a4/0x1e0
    [40226.180066] warn_slowpath_null+0x31/0x40
    [40226.184555] inet_sock_destruct+0x78d/0x9a0
    [40226.246355] __sk_destruct+0xfa/0x8c0
    [40226.290612] rcu_process_callbacks+0xaa0/0x18a0
    [40226.336816] __do_softirq+0x241/0x75e
    [40226.367758] irq_exit+0x1f6/0x220
    [40226.371458] smp_apic_timer_interrupt+0x7b/0xa0
    [40226.376507] apic_timer_interrupt+0x93/0xa0

    The warn_on happened because sk->sk_rmem_alloc wasn't 0 in
    inet_sock_destruct. Since commit f970bd9e3a06 ("udp: implement memory
    accounting helpers"), UDP uses udp_destruct_sock as sk_destruct, which
    calls udp_rmem_release to release all rmem.

    But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after
    changing family to PF_INET. If rmem is not 0 at that time, and there is
    no place to release rmem before calling inet_sock_destruct, the warn_on
    will be triggered.

    This patch fixes it by no longer setting sk_destruct in the
    IPV6_ADDRFORM sockopt. IPV6_ADDRFORM only works for TCP and UDP: a TCP
    sock already has its sk_destruct set to inet_sock_destruct, and a UDP
    sock has it set to udp_destruct_sock, from the time they are created.

    Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers")
    Reported-by: ChunYu Wang
    Signed-off-by: Xin Long
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Xin Long
     

06 Jul, 2017

1 commit

  • Pull memdup_user() conversions from Al Viro:
    "A fairly self-contained series - hunting down open-coded memdup_user()
    and memdup_user_nul() instances"

    * 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    bpf: don't open-code memdup_user()
    kimage_file_prepare_segments(): don't open-code memdup_user()
    ethtool: don't open-code memdup_user()
    do_ip_setsockopt(): don't open-code memdup_user()
    do_ipv6_setsockopt(): don't open-code memdup_user()
    irda: don't open-code memdup_user()
    xfrm_user_policy(): don't open-code memdup_user()
    ima_write_policy(): don't open-code memdup_user_nul()
    sel_write_validatetrans(): don't open-code memdup_user_nul()

    Linus Torvalds
     

04 Jul, 2017

1 commit


30 Jun, 2017

1 commit


31 Dec, 2016

1 commit

  • IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index
    does not match it. e.g.,

    ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument

    Relax the check in setsockopt to allow setting mc_index to an L3 slave if
    sk_bound_dev_if points to an L3 master.

    Make a similar change for IPv6. In this case change the device lookup to
    take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for
    the lookup of a potential L3 master device.

    This really only silences a setsockopt failure since uses of mc_index are
    secondary to sk_bound_dev_if if it is set. In both cases, if either index
    is an L3 slave or master, lookups are directed to the same FIB table so
    relaxing the check at setsockopt time causes no harm.

    Patch is based on a suggested change by Darwin for a problem noted in
    their code base.

    Suggested-by: Darwin Dingel
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

25 Dec, 2016

1 commit


10 Nov, 2016

1 commit


04 Nov, 2016

1 commit

  • When reading a datagram or raw packet that arrived fragmented, expose
    the maximum fragment size if recorded to allow applications to
    estimate receive path MTU.

    At this point, the field is only recorded when ipv6 connection
    tracking is enabled. A follow-up patch will record this field also
    in the ipv6 input path.

    Tested using the test for IP_RECVFRAGSIZE plus

    ip netns exec to ip addr add dev veth1 fc07::1/64
    ip netns exec from ip addr add dev veth0 fc07::2/64

    ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
    ip netns exec from nc -q 1 -u fc07::1 6000 < payload

    Both with and without enabling connection tracking

    ip6tables -A INPUT -m state --state NEW -p udp -j LOG

    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Willem de Bruijn
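
An application would consume this value by enabling the option and then walking the control messages returned by recvmsg(). The sketch below assumes a Linux system and that the value arrives as an `int` in a control message of type `IPV6_RECVFRAGSIZE` at the IPv6 level; `enable_recvfragsize()` and `max_fragment_size()` are hypothetical helper names, and the fallback constant 77 is the value from linux/in6.h.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef IPV6_RECVFRAGSIZE
#define IPV6_RECVFRAGSIZE 77   /* value from linux/in6.h, fallback for older libc headers */
#endif

/* Ask the kernel to attach the largest fragment size to received datagrams. */
int enable_recvfragsize(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVFRAGSIZE, &on, sizeof(on));
}

/*
 * Walk the control messages of a msghdr filled in by recvmsg().
 * Returns the recorded maximum fragment size, or -1 if no such
 * control message is present (e.g. the datagram was not fragmented).
 */
int max_fragment_size(struct msghdr *msg)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IPV6 &&
            cmsg->cmsg_type == IPV6_RECVFRAGSIZE &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
            int size;

            memcpy(&size, CMSG_DATA(cmsg), sizeof(size));
            return size;
        }
    }
    return -1;
}
```

A receiver would call `enable_recvfragsize()` once after creating the socket and `max_fragment_size()` after each recvmsg(), treating -1 as "no estimate available".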
     

21 Oct, 2016

1 commit

  • Baozeng reported this deadlock case:

    CPU0                    CPU1
    ----                    ----
    lock(sk_lock-AF_INET6);
                            lock(rtnl_mutex);
                            lock(sk_lock-AF_INET6);
    lock(rtnl_mutex);

    Similar to commit 87e9f0315952
    ("ipv4: fix a potential deadlock in mcast getsockopt() path"),
    this happens because we still have a case, ipv6_sock_mc_close(),
    where we acquire sk_lock before rtnl_lock. Close this deadlock
    with a similar solution: always acquire the rtnl lock first.

    Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
    Reported-by: Baozeng Ding
    Tested-by: Baozeng Ding
    Cc: Marcelo Ricardo Leitner
    Signed-off-by: Cong Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    WANG Cong
     

07 Jul, 2016

1 commit


28 Jun, 2016

1 commit


04 May, 2016

1 commit

  • In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
    variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
    functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
    This is not good practice and makes it hard to add new parameters.
    This fix introduces a new struct ipcm6_cookie, similar to ipcm_cookie in
    ipv4, that includes the above-mentioned variables. Only a pointer to
    this structure is passed to the corresponding functions, which makes it
    easier to add new parameters in the future and makes the functions cleaner.

    Signed-off-by: Wei Wang
    Signed-off-by: David S. Miller

    Wei Wang
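
The refactoring described above is the familiar "parameter object" pattern. A minimal sketch, assuming a userspace model: `struct ipcm6_cookie_model`, `cookie_init()` and `effective_hlimit()` are hypothetical names, the field list follows the variables named in the commit text, and the use of -1 for "not specified" is an assumption.

```c
#include <stddef.h>

struct tx_options;   /* opaque stand-in for the per-packet IPv6 options */

/* One struct instead of four loose arguments threaded through every call. */
struct ipcm6_cookie_model {
    int hlimit;                    /* hop limit, -1 means not specified */
    int tclass;                    /* traffic class, -1 means not specified */
    int dontfrag;                  /* don't-fragment flag, -1 means not specified */
    const struct tx_options *opt;  /* extension header options, may be NULL */
};

/* Start from "nothing specified" so callers only fill in what they have. */
void cookie_init(struct ipcm6_cookie_model *c)
{
    c->hlimit = -1;
    c->tclass = -1;
    c->dontfrag = -1;
    c->opt = NULL;
}

/*
 * Example consumer: its signature does not change when a new field is
 * added to the cookie later.
 */
int effective_hlimit(const struct ipcm6_cookie_model *c, int socket_default)
{
    return c->hlimit < 0 ? socket_default : c->hlimit;
}
```

Adding a new per-packet parameter then means adding one field and one initializer line, rather than touching the signature of every function along the send path.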
     

08 Apr, 2016

1 commit


05 Apr, 2016

1 commit

  • Process socket-level control messages by invoking
    __sock_cmsg_send in ip6_datagram_send_ctl for control messages on
    the SOL_SOCKET layer.

    This makes sure whenever ip6_datagram_send_ctl is called for
    udp and raw, we also process socket-level control messages.

    This is a bit uglier than IPv4, since IPv6 does not have
    something like ipcm_cookie. Perhaps we can later create
    a control message cookie for IPv6?

    Note that this commit interprets new control messages that
    were ignored before. As such, this commit does not change
    the behavior of IPv6 control messages.

    Signed-off-by: Soheil Hassas Yeganeh
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
     

03 Dec, 2015

1 commit

  • This patch addresses multiple problems :

    UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
    while socket is not locked : Other threads can change np->opt
    concurrently. Dmitry posted a syzkaller
    (http://github.com/google/syzkaller) program demonstrating a
    use-after-free.

    Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
    and dccp_v6_request_recv_sock() also need to use RCU protection
    to dereference np->opt once (before calling ipv6_dup_options())

    This patch adds full RCU protection to np->opt

    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Apr, 2015

1 commit

  • The ipv6 code uses a mixture of coding styles. In some instances check for NULL
    pointer is done as x == NULL and sometimes as !x. !x is preferred according to
    checkpatch and this patch makes the code consistent by adopting the latter
    form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

21 Mar, 2015

1 commit

  • Commit baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
    failed to update two setsockopt options, IPV6_JOIN_ANYCAST and
    IPV6_LEAVE_ANYCAST, causing a lock inversion with respect to the updated
    ones.

    As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
    do_ipv6_setsockopt, we can simply move the rtnl lock up into it.

    Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
    Reported-by: Ying Huang
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
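
The ordering rule behind this fix (decide up front whether an option needs rtnl, and if so take it before the socket lock) can be sketched as below. The function and enum names are hypothetical, and the list of options is an assumption for illustration: the commit text names only the two anycast options, while the multicast membership options are included here as examples of the "updated ones".

```c
#include <netinet/in.h>
#include <stdbool.h>

enum lock_id { LOCK_RTNL, LOCK_SOCK };

/* Hypothetical predicate: does handling this option require rtnl? */
bool option_needs_rtnl(int optname)
{
    switch (optname) {
    case IPV6_ADD_MEMBERSHIP:
    case IPV6_DROP_MEMBERSHIP:
    case IPV6_JOIN_ANYCAST:
    case IPV6_LEAVE_ANYCAST:
        return true;
    default:
        return false;
    }
}

/*
 * Fill `order` with the locks in acquisition order and return how many
 * there are. The coarse lock (rtnl) always comes before the socket lock,
 * so no path ever takes them the other way round.
 */
int plan_locks(int optname, enum lock_id order[2])
{
    int n = 0;

    if (option_needs_rtnl(optname))
        order[n++] = LOCK_RTNL;
    order[n++] = LOCK_SOCK;
    return n;
}
```

An option that is missing from the predicate, as the anycast options were before this fix, ends up taking rtnl later while the socket lock is already held, which is the inversion being reported.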
     

19 Mar, 2015

2 commits

  • in favor of their inner __ ones, which doesn't grab rtnl.

    As these functions need to operate on a locked socket, we can't be
    grabbing rtnl by then. It's too late and doing so causes reversed
    locking.

    So this patch:
    - moves rtnl handling to the callers instead, fixing some reversed
      locking situations along the way, like in the vxlan and ipvs code.
    - renames the __ ones to drop the __ mark:
    __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
    __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}

    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • There are some setsockopt operations in ipv4 and ipv6 that are grabbing
    rtnl after having grabbed the socket lock. Yet this makes it impossible
    to do operations that have to lock the socket when already within a rtnl
    protected scope, like ndo dev_open and dev_stop.

    We normally take coarse grained locks first but setsockopt inverted that.

    So this patch inverts the lock logic for these operations and makes
    setsockopt grab rtnl, if it will be needed, prior to grabbing the socket lock.

    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

26 Jan, 2015

1 commit

  • IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r()
    to charge the skb to socket.

    It means that destructor must be called while socket is locked.

    Therefore, we cannot use skb_get() or atomic_inc(&skb->users)
    to protect ourselves : kfree_skb() might race with other users
    manipulating sk->sk_forward_alloc

    Fix this race by holding socket lock for the duration of
    ip6_datagram_recv_ctl()

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Dec, 2014

1 commit


25 Aug, 2014

2 commits

  • This patch makes no changes to the logic of the code but simply addresses
    coding style issues as detected by checkpatch.

    Both objdump and diff -w show no differences.

    This patch removes some blank lines between the end of a function
    definition and the EXPORT_SYMBOL_GPL macro in order to prevent
    checkpatch warning that EXPORT_SYMBOL must immediately follow
    a function.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     
  • This patch makes no changes to the logic of the code but simply addresses
    coding style issues as detected by checkpatch.

    Both objdump and diff -w show no differences.

    A number of items are addressed in this patch:
    * Multiple spaces converted to tabs
    * Spaces before tabs removed.
    * Spaces in pointer typing cleansed (char *)foo etc.
    * Remove space after sizeof
    * Ensure spacing around comparators such as if statements.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

16 Jul, 2014

1 commit


08 Jul, 2014

1 commit

  • Automatically generate flow labels for IPv6 packets on transmit.
    The flow label is computed based on skb_get_hash. The flow label will
    only automatically be set when it is zero otherwise (i.e. flow label
    manager hasn't set one). This supports the transmit side functionality
    of RFC 6438.

    Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
    system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
    functionality per socket.

    By default, auto flowlabels are disabled to avoid possible conflicts
    with flow label manager, however if this feature proves useful we
    may want to enable it by default.

    It should also be noted that FreeBSD has already implemented automatic
    flow labels (including the sysctl and socket option). In FreeBSD,
    automatic flow labels default to enabled.

    Performance impact:

    Running super_netperf with 200 flows for TCP_RR and UDP_RR for
    IPv6. Note that in the UDP case, __skb_get_hash will be called for
    every packet, which explains the slight regression. In the TCP case
    the hash is saved in the socket so there is no regression.

    Automatic flow labels disabled:

    TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

    UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

    Automatic flow labels enabled:

    TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06 tps

    UDP_RR:
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06 tps

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

02 Jul, 2014

1 commit

  • When a UDP application switches from AF_INET to AF_INET6 sockets, we
    have a small performance degradation for IPv4 communications because of
    extra cache line misses to access ipv6only information.

    This can also be noticed for TCP listeners, as ipv6_only_sock() is also
    used from __inet_lookup_listener()->compute_score()

    This is magnified when SO_REUSEPORT is used.

    Move ipv6only into struct sock_common so that it is available at
    no extra cost in lookups.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Feb, 2014

1 commit

  • This option has the same semantics as IP_PMTUDISC_OMIT for IPv4, which
    was recently introduced. It doesn't honor the path MTU discovered by the
    host but, in contrast to IPV6_PMTUDISC_INTERFACE, allows the generation
    of fragments if the packet size exceeds the MTU of the outgoing
    interface.

    Fixes: 93b36cf3425b9b ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
    Cc: Florian Weimer
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

20 Jan, 2014

2 commits

  • We currently don't report IPV6_RECVPKTINFO ancillary data (cmsg)
    for IPv4 datagrams on IPv6 sockets.

    This patch splits the ip6_datagram_recv_ctl into two functions, one
    which handles both protocol families, AF_INET and AF_INET6, while the
    ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

    ip6_datagram_recv_*_ctl never reported back any errors, so we can make
    them return void. Also provide a helper for protocols which don't offer dual
    personality to further use ip6_datagram_recv_ctl, which is exported to
    modules.

    I needed to shuffle the code for ping around a bit to make it easier to
    implement dual personality for ping ipv6 sockets in future.

    Reported-by: Gert Doering
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • This information is already available via IPV6_FLOWINFO
    of IPV6_2292PKTOPTIONS, followed by filtering to extract the flow
    label. But it is probably more logical and easier for users to add it
    here, and to control both sent and received flow label values with the
    IPV6_FLOWLABEL_MGR option.

    Signed-off-by: Florent Fourcot
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     

16 Jan, 2014

1 commit


19 Dec, 2013

1 commit

  • IPV6_PMTUDISC_INTERFACE is the same as IPV6_PMTUDISC_PROBE for ipv6. Add
    it nonetheless for symmetry with IPv4 sockets. Also drop incoming MTU
    information if this mode is enabled.

    The additional bit in ipv6_pinfo just eats in the padding behind the
    bitfield. There are no changes to the layout of the struct at all.

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

13 Dec, 2013

1 commit

  • Introduced by 1397ed35f22d7c30d0b89ba74b6b7829220dfcfd
    "ipv6: add flowinfo for tcp6 pkt_options for all cases"

    Reported-by: kbuild test robot

    V2: fix the title, add empty line after the declaration (Sergei Shtylyov
    feedbacks)

    Signed-off-by: David S. Miller

    Florent Fourcot
     

10 Dec, 2013

2 commits

  • tclass information is now already stored in rcv_flowinfo.
    We do not need to store the same information twice.

    Signed-off-by: Florent Fourcot
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     
  • The current implementation of IPV6_FLOWINFO only gives a
    result if pktoptions is available (thanks to the
    ip6_datagram_recv_ctl function).
    It gives inconsistent results to user space, sometimes
    there is a result for getsockopt(IPV6_FLOWINFO), sometimes
    not.

    This patch adds rcv_flowinfo to store it, and returns it to
    userspace in the same way as other pkt_options.

    Signed-off-by: Florent Fourcot
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     

09 Nov, 2013

1 commit

  • It is already possible to set/put/renew a label
    with IPV6_FLOWLABEL_MGR and setsockopt. This patch
    adds the ability to get information about this
    label (current value, time before expiration, etc).

    It helps applications decide whether to renew
    or release the label.

    v2:
    * Add spin_lock to prevent race condition
    * return -ENOENT if no result found
    * check if flr_action is GET

    v3:
    * move the spin_lock to protect only the
    relevant code

    Signed-off-by: Florent Fourcot
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot