29 Aug, 2016

1 commit


30 Jul, 2016

1 commit

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - TPM core and driver updates/fixes
    - IPv6 security labeling (CALIPSO)
    - Lots of AppArmor fixes
    - Seccomp: remove 2-phase API, close hole where ptrace can change
    syscall #"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
    apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
    tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
    tpm: Factor out common startup code
    tpm: use devm_add_action_or_reset
    tpm2_i2c_nuvoton: add irq validity check
    tpm: read burstcount from TPM_STS in one 32-bit transaction
    tpm: fix byte-order for the value read by tpm2_get_tpm_pt
    tpm_tis_core: convert max timeouts from msec to jiffies
    apparmor: fix arg_size computation for when setprocattr is null terminated
    apparmor: fix oops, validate buffer size in apparmor_setprocattr()
    apparmor: do not expose kernel stack
    apparmor: fix module parameters can be changed after policy is locked
    apparmor: fix oops in profile_unpack() when policy_db is not present
    apparmor: don't check for vmalloc_addr if kvzalloc() failed
    apparmor: add missing id bounds check on dfa verification
    apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
    apparmor: use list_next_entry instead of list_entry_next
    apparmor: fix refcount race when finding a child profile
    apparmor: fix ref count leak when profile sha1 hash is read
    apparmor: check that xindex is in trans_table bounds
    ...

    Linus Torvalds
     

07 Jul, 2016

1 commit


28 Jun, 2016

1 commit

  • CALIPSO is a packet labelling protocol for IPv6 which is very similar
    to CIPSO. It is specified in RFC 5570. Much of the code is based on
    the current CIPSO code.

    This adds support for adding passthrough-type CALIPSO DOIs through the
    NLBL_CALIPSO_C_ADD command. It requires attributes:

    NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS.
    NLBL_CALIPSO_A_DOI.

    In passthrough mode the CALIPSO engine will map MLS secattr levels
    and categories directly to the packet label.

    At this stage, the major difference between this and the CIPSO
    code is that IPv6 may be compiled as a module. To allow for
    this the CALIPSO functions are registered at module init time.

    Signed-off-by: Huw Davies
    Signed-off-by: Paul Moore

    Huw Davies
     

10 Jun, 2016

1 commit

  • Frank Kellermann reported a kernel crash with 4.5.0 when IPv6 is
    disabled at boot using the kernel option ipv6.disable=1. Using
    current net-next with the boot option:

    $ ip link add red type vrf table 1001

    Generates:
    [12210.919584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000748
    [12210.921341] IP: [] fib6_get_table+0x2c/0x5a
    [12210.922537] PGD b79e3067 PUD bb32b067 PMD 0
    [12210.923479] Oops: 0000 [#1] SMP
    [12210.924001] Modules linked in: ipvlan 8021q garp mrp stp llc
    [12210.925130] CPU: 3 PID: 1177 Comm: ip Not tainted 4.7.0-rc1+ #235
    [12210.926168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    [12210.928065] task: ffff8800b9ac4640 ti: ffff8800bacac000 task.ti: ffff8800bacac000
    [12210.929328] RIP: 0010:[] [] fib6_get_table+0x2c/0x5a
    [12210.930697] RSP: 0018:ffff8800bacaf888 EFLAGS: 00010202
    [12210.931563] RAX: 0000000000000748 RBX: ffffffff81a9e280 RCX: ffff8800b9ac4e28
    [12210.932688] RDX: 00000000000000e9 RSI: 0000000000000002 RDI: 0000000000000286
    [12210.933820] RBP: ffff8800bacaf898 R08: ffff8800b9ac4df0 R09: 000000000052001b
    [12210.934941] R10: 00000000657c0000 R11: 000000000000c649 R12: 00000000000003e9
    [12210.936032] R13: 00000000000003e9 R14: ffff8800bace7800 R15: ffff8800bb3ec000
    [12210.937103] FS: 00007faa1766c700(0000) GS:ffff88013ac00000(0000) knlGS:0000000000000000
    [12210.938321] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [12210.939166] CR2: 0000000000000748 CR3: 00000000b79d6000 CR4: 00000000000406e0
    [12210.940278] Stack:
    [12210.940603] ffff8800bb3ec000 ffffffff81a9e280 ffff8800bacaf8c8 ffffffff814b3135
    [12210.941818] ffff8800bb3ec000 ffffffff81a9e280 ffffffff81a9e280 ffff8800bace7800
    [12210.943040] ffff8800bacaf8f0 ffffffff81397c88 ffff8800bb3ec000 ffffffff81a9e280
    [12210.944288] Call Trace:
    [12210.944688] [] fib6_new_table+0x24/0x8a
    [12210.945516] [] vrf_dev_init+0xd4/0x162
    [12210.946328] [] register_netdevice+0x100/0x396
    [12210.947209] [] vrf_newlink+0x40/0xb3
    [12210.948001] [] rtnl_newlink+0x5d3/0x6d5
    ...

    The problem above is due to the fact that the fib hash table is not
    allocated when IPv6 is disabled at boot.

    As for the VRF driver it should not do any IPv6 initializations if IPv6
    is disabled, so it needs to know if IPv6 is disabled at boot. The disable
    parameter is private to the IPv6 module, so provide an accessor for
    modules to determine if IPv6 was disabled at boot time.
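
    The accessor pattern described above can be sketched in user-space C.
    This is an illustration only, not the kernel code: disable_ipv6_mod,
    ipv6_enabled_at_boot() and vrf_like_dev_init() are hypothetical names
    chosen for this sketch, and the "table count" merely stands in for the
    real initialization work.

    ```c
    #include <stdbool.h>

    /* Stand-in for the module-private "disable" parameter (ipv6.disable=1).
     * It is static, so code outside this file cannot read it directly. */
    static int disable_ipv6_mod;

    /* Hypothetical accessor: lets other modules ask whether IPv6 is usable
     * without exposing the parameter itself. */
    bool ipv6_enabled_at_boot(void)
    {
            return disable_ipv6_mod == 0;
    }

    /* Hypothetical caller modelled on a driver init path. It returns the
     * number of routing tables it set up. */
    int vrf_like_dev_init(void)
    {
            int tables = 1;                 /* the IPv4 table is always set up */

            if (ipv6_enabled_at_boot())
                    tables++;               /* touch IPv6 state only when enabled */
            return tables;
    }
    ```

    With the flag set, the caller skips the IPv6 branch entirely, so it never
    reaches the table lookup that dereferenced the unallocated hash table.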

    Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

08 Apr, 2016

1 commit

  • This patch adds GRO functions (gro_receive and gro_complete) to UDP
    sockets. udp_gro_receive is changed to perform socket lookup on a
    packet. If a socket is found the related GRO functions are called.

    This feature obsoletes using the UDP offload infrastructure for GRO
    (udp_offload). The advantage is that offload is no longer limited to
    a per-port basis: GRO is now applied to whatever individual UDP
    sockets are bound to. This also allows the possibility of
    "application defined GRO", that is, we can attach something like
    a BPF program to a UDP socket to perform GRO on an application
    layer protocol.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

06 Apr, 2016

1 commit

  • Enable peeking at UDP datagrams at the offset specified with socket
    option SOL_SOCKET/SO_PEEK_OFF. Peek at any datagram in the queue, up
    to the end of the given datagram.

    Implement the SO_PEEK_OFF semantics introduced in commit ef64a54f6e55
    ("sock: Introduce the SO_PEEK_OFF sock option"). Increase the offset
    on peek, decrease it on regular reads.

    When peeking, always checksum the packet immediately, to avoid
    recomputation on subsequent peeks and final read.

    The socket lock is not held for the duration of udp_recvmsg, so
    peek and read operations can run concurrently. Only the last store
    to sk_peek_off is preserved.
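
    The offset bookkeeping ("increase on peek, decrease on regular reads")
    can be modelled in a few lines of user-space C. This is a simplified
    sketch, not the kernel implementation: struct peek_state and both helper
    names are hypothetical, and clamping at zero on read is an assumption of
    this model. A negative offset is taken to mean that SO_PEEK_OFF is not
    enabled on the socket.

    ```c
    /* Simplified model of a socket's peek offset.
     * peek_off < 0 means SO_PEEK_OFF is disabled. */
    struct peek_state {
            int peek_off;
    };

    /* A MSG_PEEK read of 'len' bytes moves the offset forward. */
    void peek_off_fwd(struct peek_state *s, int len)
    {
            if (s->peek_off >= 0)
                    s->peek_off += len;
    }

    /* A regular read that consumes 'len' bytes moves the offset back.
     * Assumption of this sketch: the offset never drops below zero. */
    void peek_off_bwd(struct peek_state *s, int len)
    {
            if (s->peek_off < 0)
                    return;
            s->peek_off = (s->peek_off >= len) ? s->peek_off - len : 0;
    }
    ```

    Because the model has no locking, two callers updating the same state
    concurrently would race, which matches the note above that only the last
    store to sk_peek_off is preserved.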

    Signed-off-by: Sam Kumar
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    samanthakumar
     

11 Feb, 2016

1 commit

  • In order to support fast reuseport lookups in TCP, the hash function
    defined in struct proto must be capable of returning an error code.
    This patch changes the function signature of all related hash functions
    to return an integer and handles or propagates this return value at
    all call sites.

    Signed-off-by: Craig Gallek
    Signed-off-by: David S. Miller

    Craig Gallek
     

15 Dec, 2015

1 commit

  • 郭永刚 reported that one could simply crash the kernel as root by
    using a simple program:

    int socket_fd;
    struct sockaddr_in addr;
    addr.sin_port = 0;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_family = 10;                        /* AF_INET6 */

    socket_fd = socket(10, 3, 0x40000000);       /* AF_INET6, SOCK_RAW, oversized protocol */
    connect(socket_fd, (struct sockaddr *)&addr, 16);

    AF_INET, AF_INET6 sockets actually only support 8-bit protocol
    identifiers. inet_sock's skc_protocol field thus is sized accordingly,
    thus larger protocol identifiers simply cut off the higher bits and
    store a zero in the protocol fields.

    This could lead to e.g. NULL function pointer because as a result of
    the cut off inet_num is zero and we call down to inet_autobind, which
    is NULL for raw sockets.
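
    The truncation described above can be reproduced in plain user-space C.
    The sketch below is an illustration under stated assumptions, not the
    kernel patch: stored_protocol() and check_protocol() are hypothetical
    names, and PROTO_MAX is assumed to be 256 to match the 8-bit field width.

    ```c
    #include <errno.h>
    #include <stdint.h>

    #define PROTO_MAX 256   /* assumption: one past the largest 8-bit value */

    /* What an 8-bit protocol field ends up holding: the higher bits of the
     * requested protocol are silently cut off. */
    uint8_t stored_protocol(int protocol)
    {
            return (uint8_t)protocol;
    }

    /* Range check that rejects identifiers which do not fit in 8 bits,
     * instead of letting them collapse to a different protocol number. */
    int check_protocol(int protocol)
    {
            if (protocol < 0 || protocol >= PROTO_MAX)
                    return -EINVAL;
            return 0;
    }
    ```

    For the value used in the report, stored_protocol(0x40000000) yields 0,
    while check_protocol(0x40000000) rejects the request up front.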

    kernel: Call Trace:
    kernel: [] ? inet_autobind+0x2e/0x70
    kernel: [] inet_dgram_connect+0x54/0x80
    kernel: [] SYSC_connect+0xd9/0x110
    kernel: [] ? ptrace_notify+0x5b/0x80
    kernel: [] ? syscall_trace_enter_phase2+0x108/0x200
    kernel: [] SyS_connect+0xe/0x10
    kernel: [] tracesys_phase2+0x84/0x89

    I found no particular commit which introduced this problem.

    CVE: CVE-2015-8543
    Cc: Cong Wang
    Reported-by: 郭永刚
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

04 Dec, 2015

1 commit

  • While testing the np->opt RCU conversion, I found that UDP/IPv6 was
    using a mixture of xchg() and sk_dst_lock to protect concurrent changes
    to sk->sk_dst_cache, leading to possible corruptions and crashes.

    ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
    way to fix the mess is to remove sk_dst_lock completely, as we did for
    IPv4.

    __ip6_dst_store() and ip6_dst_store() share the same implementation.

    Because sk_setup_caps() can be called with or without the socket lock
    held, we have to use sk_dst_set() instead of __sk_dst_set().

    Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
    in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

    This is because ip6_dst_store() can be called from process context,
    without any lock held.

    As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
    from another cpu doing a concurrent ip6_dst_store()

    Doing the dst dereference before doing the install is needed to make
    sure no use after free would trigger.

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Dec, 2015

1 commit

  • This patch addresses multiple problems :

    UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
    while socket is not locked : Other threads can change np->opt
    concurrently. Dmitry posted a syzkaller
    (http://github.com/google/syzkaller) program demonstrating
    a use-after-free.

    Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
    and dccp_v6_request_recv_sock() also need to use RCU protection
    to dereference np->opt once (before calling ipv6_dup_options())

    This patch adds full RCU protection to np->opt

    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Aug, 2015

2 commits

  • Per RFC6437 stateful flow labels (e.g. labels set by flow label manager)
    cannot "disturb" nodes taking part in stateless flow labels. While the
    ranges only reduce the flow label entropy by one bit, it is conceivable
    that this might bias the algorithm on some routers causing a load
    imbalance. For best results on the Internet we really need the full
    20 bits.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
    automatic flow labels generation. There are four modes:

    0: flow labels are disabled
    1: flow labels are enabled, sockets can opt-out
    2: flow labels are allowed, sockets can opt-in
    3: flow labels are enabled and enforced, no opt-out for sockets

    np->autoflowlabel is initialized according to the sysctl value.
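
    The four modes can be read as a small decision table. The following C
    sketch is an interpretation of the list above, not the kernel source:
    the enum and both function names are hypothetical, and the per-socket
    flag stands in for np->autoflowlabel.

    ```c
    #include <stdbool.h>

    /* Hypothetical names for the four sysctl values listed above. */
    enum auto_flowlabels_mode {
            AUTO_FL_OFF    = 0,     /* flow labels are disabled */
            AUTO_FL_OPTOUT = 1,     /* enabled, sockets can opt out */
            AUTO_FL_OPTIN  = 2,     /* allowed, sockets can opt in */
            AUTO_FL_FORCED = 3      /* enabled and enforced */
    };

    /* Initial per-socket setting derived from the sysctl value. */
    bool default_autoflowlabel(int mode)
    {
            return mode == AUTO_FL_OPTOUT || mode == AUTO_FL_FORCED;
    }

    /* Whether a label is generated, given the sysctl mode and the socket's
     * current setting (which the application may have changed). */
    bool should_autolabel(int mode, bool sock_autolabel)
    {
            if (mode == AUTO_FL_OFF)
                    return false;           /* globally disabled */
            if (mode == AUTO_FL_FORCED)
                    return true;            /* socket preference is ignored */
            return sock_autolabel;          /* modes 1 and 2 honour the socket */
    }
    ```

    Modes 1 and 2 differ only in the default a new socket starts with; once
    the socket has made a choice, both modes respect it.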

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

10 Jul, 2015

2 commits

  • Add support to allow non-local binds similar to how this was done for IPv4.
    Non-local binds are very useful in emulating the Internet in a box, etc.

    This adds the ip_nonlocal_bind sysctl under ipv6.

    Testing:

    Set up nonlocal binding and receive routing on a host, e.g.:

    ip -6 rule add from ::/0 iif eth0 lookup 200
    ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
    sysctl -w net.ipv6.ip_nonlocal_bind=1

    Set up routing to 2001:0:0:1::/64 on peer to go to first host

    ping6 -I 2001:0:0:1::1 peer-address -- to verify

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Hop was always either 0 or sizeof(struct ipv6hdr).

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

07 Jun, 2015

1 commit

  • When an application needs to force a source IP on an active TCP socket
    it has to use bind(IP, port=x).

    As most applications do not want to deal with already used ports, x is
    often set to 0, meaning the kernel is in charge to find an available
    port.
    But kernel does not know yet if this socket is going to be a listener or
    be connected.
    It has very limited choices (no full knowledge of final 4-tuple for a
    connect())

    With limited ephemeral port range (about 32K ports), it is very easy to
    fill the space.

    This patch adds a new SOL_IP socket option, asking kernel to ignore
    the 0 port provided by application in bind(IP, port=0) and only
    remember the given IP address.

    The port will be automatically chosen at connect() time, in a way
    that allows sharing a source port as long as the 4-tuples are unique.
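
    From an application's point of view, the sequence described above might
    look like the following C sketch. bind_addr_no_port() and local_port()
    are hypothetical helper names. The fallback definition of
    IP_BIND_ADDRESS_NO_PORT (24) is only used when the system headers do not
    provide it, and the option requires a kernel that includes this patch.

    ```c
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #ifndef SOL_IP
    #define SOL_IP IPPROTO_IP
    #endif
    #ifndef IP_BIND_ADDRESS_NO_PORT
    #define IP_BIND_ADDRESS_NO_PORT 24      /* value used by the Linux headers */
    #endif

    /* Bind a TCP socket to a source address only; the port stays unassigned
     * until connect(). Returns the socket fd, or -1 on error. */
    int bind_addr_no_port(const char *ip)
    {
            struct sockaddr_in addr;
            int one = 1;
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            if (fd < 0)
                    return -1;

            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_port = htons(0);       /* "any port", deferred by the option */

            if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
                setsockopt(fd, SOL_IP, IP_BIND_ADDRESS_NO_PORT,
                           &one, sizeof(one)) < 0 ||
                bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;
    }

    /* Local port in host byte order, or -1 on error. */
    int local_port(int fd)
    {
            struct sockaddr_in addr;
            socklen_t len = sizeof(addr);

            if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
                    return -1;
            return ntohs(addr.sin_port);
    }
    ```

    Calling local_port() right after bind_addr_no_port("127.0.0.2") should
    report port 0; a non-zero port is expected only after a successful
    connect().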

    This new feature is available for both IPv4 and IPv6 (Thanks Neal)

    Tested:

    Wrote a test program and checked its behavior on IPv4 and IPv6.

    strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
    connect().
    Also getsockname() shows that the port is still 0 right after bind()
    but properly allocated after connect().

    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
    setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
    bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
    getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
    connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
    getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

    IPv6 test :

    socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
    setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
    bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
    getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
    connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
    getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

    I was able to bind()/connect() a million concurrent IPv4 sockets,
    instead of ~32000 before patch.

    lpaa23:~# ulimit -n 1000010
    lpaa23:~# ./bind --connect --num-flows=1000000 &
    1000000 sockets

    lpaa23:~# grep TCP /proc/net/sockstat
    TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

    Check that a given source port is indeed used by many different
    connections :

    lpaa23:~# ss -t src :40000 | head -10
    State Recv-Q Send-Q Local Address:Port Peer Address:Port
    ESTAB 0 0 127.0.0.2:40000 127.0.202.33:44983
    ESTAB 0 0 127.0.0.2:40000 127.2.27.240:44983
    ESTAB 0 0 127.0.0.2:40000 127.2.98.5:44983
    ESTAB 0 0 127.0.0.2:40000 127.0.124.196:44983
    ESTAB 0 0 127.0.0.2:40000 127.2.139.38:44983
    ESTAB 0 0 127.0.0.2:40000 127.1.59.80:44983
    ESTAB 0 0 127.0.0.2:40000 127.3.6.228:44983
    ESTAB 0 0 127.0.0.2:40000 127.0.38.53:44983
    ESTAB 0 0 127.0.0.2:40000 127.1.197.10:44983

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 May, 2015

1 commit


04 May, 2015

1 commit

  • This patch divides the IPv6 flow label space into two ranges:
    0-7ffff is reserved for flow label manager, 80000-fffff will be
    used for creating auto flow labels (per RFC6438). This only affects how
    labels are set on transmit, it does not affect receive. This range split
    can be disabled by sysctl.
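
    The range split amounts to forcing the top bit of the 20-bit label on
    for automatically generated labels. A minimal C sketch, assuming labels
    are handled in host byte order and using hypothetical names for the
    constants and helpers:

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define FLOWLABEL_MASK          0xFFFFFu        /* flow labels are 20 bits */
    #define FLOWLABEL_STATELESS_BIT 0x80000u        /* start of the auto range */

    /* Derive an automatic flow label from a packet hash. With the split
     * enabled, the result always lands in 80000-fffff. */
    uint32_t make_auto_flowlabel(uint32_t hash, bool split_ranges)
    {
            uint32_t label = hash & FLOWLABEL_MASK;

            if (split_ranges)
                    label |= FLOWLABEL_STATELESS_BIT;
            return label;
    }

    /* True if the label falls in the range reserved for the flow label
     * manager (0-7ffff). */
    bool in_manager_range(uint32_t label)
    {
            return (label & FLOWLABEL_MASK) < FLOWLABEL_STATELESS_BIT;
    }
    ```

    Setting the top bit leaves 19 bits of hash in the label, so the split
    costs one bit of entropy in exchange for never colliding with a label
    from the manager range.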

    Background:

    IPv6 flow labels have been an unmitigated disappointment thus far
    in the lifetime of IPv6. Support in HW devices to use them for ECMP
    is lacking, and OSes don't turn them on by default. If we had these
    we could get much better hashing in IPv6 networks without resorting
    to DPI, possibly eliminating some of the motivations to define new
    encaps in UDP just for getting ECMP.

    Unfortunately, the initial specifications of IPv6 did not clarify
    how they are to be used. There has always been a vague concept that
    these can be used for ECMP, flow hashing, etc. and we do now have a
    good standard for how to do this in RFC6438. The problem is that flow labels
    can be either stateful or stateless (as in RFC6438), and we are
    presented with the possibility that a stateless label may collide
    with a stateful one. Attempts to split the flow label space were
    rejected in IETF. When we added support in Linux for RFC6438, we
    could not turn on flow labels by default due to this conflict.

    This patch splits the flow label space and should give us
    a path to enabling auto flow labels by default for all IPv6 packets.
    This is an API change so we need to consider compatibility with
    existing deployment. The stateful range is chosen to be the lower
    values in hopes that most uses would have chosen small numbers.

    Once we resolve the stateless/stateful issue, we can proceed to
    look at enabling RFC6438 flow labels by default (starting with
    scaled testing).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

01 Apr, 2015

2 commits

  • The ipv6 code uses a mixture of coding styles. In some instances check for NULL
    pointer is done as x != NULL and sometimes as x. x is preferred according to
    checkpatch and this patch makes the code consistent by adopting the latter
    form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     
  • The ipv6 code uses a mixture of coding styles. In some instances check for NULL
    pointer is done as x == NULL and sometimes as !x. !x is preferred according to
    checkpatch and this patch makes the code consistent by adopting the latter
    form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

24 Mar, 2015

1 commit


02 Mar, 2015

1 commit


07 Oct, 2014

1 commit

  • Try to reduce the number of possible fn_sernum mutations by constraining them
    to their namespace.

    Also remove rt_genid which I forgot to remove in 705f1c869d577c ("ipv6:
    remove rt6i_genid").

    Cc: YOSHIFUJI Hideaki
    Cc: Martin Lau
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

29 Sep, 2014

1 commit


10 Sep, 2014

1 commit


25 Aug, 2014

1 commit

  • This patch makes no changes to the logic of the code but simply addresses
    coding style issues as detected by checkpatch.

    Both objdump and diff -w show no differences.

    A number of items are addressed in this patch:
    * Multiple spaces converted to tabs
    * Spaces before tabs removed.
    * Spaces in pointer typing cleansed (char *)foo etc.
    * Remove space after sizeof
    * Ensure spacing around comparators such as if statements.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

08 Jul, 2014

1 commit

  • Automatically generate flow labels for IPv6 packets on transmit.
    The flow label is computed based on skb_get_hash. The flow label will
    only automatically be set when it is zero otherwise (i.e. flow label
    manager hasn't set one). This supports the transmit side functionality
    of RFC 6438.

    Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
    system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
    functionality per socket.

    By default, auto flowlabels are disabled to avoid possible conflicts
    with flow label manager, however if this feature proves useful we
    may want to enable it by default.

    It should also be noted that FreeBSD has already implemented automatic
    flow labels (including the sysctl and socket option). In FreeBSD,
    automatic flow labels default to enabled.

    Performance impact:

    Running super_netperf with 200 flows for TCP_RR and UDP_RR for
    IPv6. Note that in UDP case, __skb_get_hash will be called for
    every packet, which explains the slight regression. In the TCP case
    the hash is saved in the socket so there is no regression.

    Automatic flow labels disabled:

    TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

    UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

    Automatic flow labels enabled:

    TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06

    UDP_RR
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

02 Jul, 2014

1 commit

  • When a UDP application switches from AF_INET to AF_INET6 sockets, we
    have a small performance degradation for IPv4 communications because of
    extra cache line misses to access ipv6only information.

    This can also be noticed for TCP listeners, as ipv6_only_sock() is also
    used from __inet_lookup_listener()->compute_score()

    This is magnified when SO_REUSEPORT is used.

    Move ipv6only into struct sock_common so that it is available at
    no extra cost in lookups.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 May, 2014

1 commit

  • It doesn't seem like any protocols are setting anything other
    than the default, and allowing to arbitrarily disable checksums
    for a whole protocol seems dangerous. This can be done on a per
    socket basis.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

08 May, 2014

1 commit

  • commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%)
    reduced snmp array size to 1, so technically it doesn't have to be
    an array any more. What's more, after the following commit:

    commit 933393f58fef9963eac61db8093689544e29a600
    Date: Thu Dec 22 11:58:51 2011 -0600

    percpu: Remove irqsafe_cpu_xxx variants

    We simply say that regular this_cpu use must be safe regardless of
    preemption and interrupt state. That has no material change for x86
    and s390 implementations of this_cpu operations. However, arches that
    do not provide their own implementation for this_cpu operations will
    now get code generated that disables interrupts instead of preemption.

    probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
    almost 3 years, no one complains.

    So, just convert the array to a single pointer and remove snmp_mib_init()
    and snmp_mib_free() as well.

    Cc: Christoph Lameter
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

20 Jan, 2014

1 commit

  • With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
    flow label uniqueness. This patch introduces a new sysctl to protect the old
    behaviour, enabled by default.

    Changelog of V3:
    * rename ip6_flowlabel_consistency to flowlabel_consistency
    * use net_info_ratelimited()
    * checkpatch cleanups

    Signed-off-by: Florent Fourcot
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     

20 Dec, 2013

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2013-12-19

    1) Use the user supplied policy index instead of a generated one
    if present. From Fan Du.

    2) Make xfrm migration namespace aware. From Fan Du.

    3) Make the xfrm state and policy locks namespace aware. From Fan Du.

    4) Remove ancient sleeping when the SA is in acquire state,
    we now queue packets to the policy instead. This replaces the
    sleeping code.

    5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
    possibility to sleep. The sleeping code is gone, so remove it.

    6) Check user specified spi for IPComp. The spi for IPComp is only
    16 bit wide, so check for a valid value. From Fan Du.

    7) Export verify_userspi_info to check for valid user supplied spi ranges
    with pfkey and netlink. From Fan Du.

    8) RFC3173 states that if the total size of a compressed payload and the IPComp
    header is not smaller than the size of the original payload, the IP datagram
    must be sent in the original non-compressed form. These packets are dropped
    by the inbound policy check because they are not transformed. Document the need
    to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Dec, 2013

1 commit


10 Dec, 2013

1 commit


06 Dec, 2013

1 commit


20 Nov, 2013

1 commit

  • Pull networking fixes from David Miller:
    "Mostly these are fixes for fallout due to merge window changes, as
    well as cures for problems that have been with us for a much longer
    period of time"

    1) Johannes Berg noticed two major deficiencies in our genetlink
    registration. Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

    2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports. From Ben Hutchings.

    3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

    4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet(). From Alexei Starovoitov.

    5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
    Makita.

    6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

    7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease. One particularly problematic area is 802.11 AMPDU in
    wireless. From Eric Dumazet.

    8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here. Fix from Eric Dumazet, reported by Dave Jones.

    9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

    10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account. From Michael Dalton.

    11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
    bumping, this one has been with us for a while. From Eric Dumazet.

    12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

    13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

    14) macvlan has the same issue as normal vlans did wrt. propagating LRO
    disabling down to the real device, fix it the same way. From Michal
    Kubecek.

    15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

    16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

    17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

    18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little. Fix
    from Vlad Yasevich.

    19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace. From Hannes Frederic Sowa. There is another fix in the
    works from Hannes which will prevent future problems of this nature.

    20) Fix route leak in VTI tunnel transmit, from Fan Du.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    genetlink: make multicast groups const, prevent abuse
    genetlink: pass family to functions using groups
    genetlink: add and use genl_set_err()
    genetlink: remove family pointer from genl_multicast_group
    genetlink: remove genl_unregister_mc_group()
    hsr: don't call genl_unregister_mc_group()
    quota/genetlink: use proper genetlink multicast APIs
    drop_monitor/genetlink: use proper genetlink multicast APIs
    genetlink: only pass array to genl_register_family_with_ops()
    tcp: don't update snd_nxt, when a socket is switched from repair mode
    atm: idt77252: fix dev refcnt leak
    xfrm: Release dst if this dst is improper for vti tunnel
    netlink: fix documentation typo in netlink_set_err()
    be2net: Delete secondary unicast MAC addresses during be_close
    be2net: Fix unconditional enabling of Rx interface options
    net, virtio_net: replace the magic value
    ping: prevent NULL pointer dereference on write to msg_name
    bnx2x: Prevent "timeout waiting for state X"
    bnx2x: prevent CFC attention
    bnx2x: Prevent panic during DMAE timeout
    ...

    Linus Torvalds
     

19 Nov, 2013

1 commit

  • Commit 6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67
    net: ipv6: Add IPv6 support to the ping socket

    introduced a change in the cleanup logic of inet6_init and
    has a bug in that ipv6_packet_cleanup() may not be called.
    Fix the cleanup ordering.

    CC: Hannes Frederic Sowa
    CC: Lorenzo Colitti
    CC: Fabio Estevam
    Signed-off-by: Vlad Yasevich
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

14 Nov, 2013

1 commit

  • Pull core locking changes from Ingo Molnar:
    "The biggest changes:

    - add lockdep support for seqcount/seqlocks structures, this
    unearthed both bugs and required extra annotation.

    - move the various kernel locking primitives to the new
    kernel/locking/ directory"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    block: Use u64_stats_init() to initialize seqcounts
    locking/lockdep: Mark __lockdep_count_forward_deps() as static
    lockdep/proc: Fix lock-time avg computation
    locking/doc: Update references to kernel/mutex.c
    ipv6: Fix possible ipv6 seqlock deadlock
    cpuset: Fix potential deadlock w/ set_mems_allowed
    seqcount: Add lockdep functionality to seqcount/seqlock structures
    net: Explicitly initialize u64_stats_sync structures for lockdep
    locking: Move the percpu-rwsem code to kernel/locking/
    locking: Move the lglocks code to kernel/locking/
    locking: Move the rwsem code to kernel/locking/
    locking: Move the rtmutex code to kernel/locking/
    locking: Move the semaphore core to kernel/locking/
    locking: Move the spinlock code to kernel/locking/
    locking: Move the lockdep code to kernel/locking/
    locking: Move the mutex code to kernel/locking/
    hung_task debugging: Add tracepoint to report the hang
    x86/locking/kconfig: Update paravirt spinlock Kconfig description
    lockstat: Report avg wait and hold times
    lockdep, x86/alternatives: Drop ancient lockdep fixup message
    ...

    Linus Torvalds
     

06 Nov, 2013

1 commit

  • In order to enable lockdep on seqcount/seqlock structures, we
    must explicitly initialize any locks.

    The u64_stats_sync structure uses a seqcount, and thus we need
    to introduce a u64_stats_init() function and use it to initialize
    the structure.

    This unfortunately adds a lot of fairly trivial initialization code
    to a number of drivers. But the benefit of ensuring correctness makes
    this worthwhile.

    Because these changes are required for lockdep to be enabled, and the
    changes are quite trivial, I've not yet split this patch out into 30-some
    separate patches, as I figured it would be better to get the various
    maintainers' thoughts on how to best merge this change along with
    the seqcount lockdep enablement.

    Feedback would be appreciated!

    Signed-off-by: John Stultz
    Acked-by: Julian Anastasov
    Signed-off-by: Peter Zijlstra
    Cc: Alexey Kuznetsov
    Cc: "David S. Miller"
    Cc: Eric Dumazet
    Cc: Hideaki YOSHIFUJI
    Cc: James Morris
    Cc: Jesse Gross
    Cc: Mathieu Desnoyers
    Cc: "Michael S. Tsirkin"
    Cc: Mirko Lindner
    Cc: Patrick McHardy
    Cc: Roger Luethi
    Cc: Rusty Russell
    Cc: Simon Horman
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Thomas Petazzoni
    Cc: Wensong Zhang
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     

22 Oct, 2013

1 commit