08 May, 2019

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

    2) Add fib_sync_mem to control the amount of dirty memory we allow to
    queue up between synchronize RCU calls, from David Ahern.

    3) Make flow classifier more lockless, from Vlad Buslov.

    4) Add PHY downshift support to aquantia driver, from Heiner
    Kallweit.

    5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
    contention on SLAB spinlocks in heavy RPC workloads.

    6) Partial GSO offload support in XFRM, from Boris Pismenny.

    7) Add fast link down support to ethtool, from Heiner Kallweit.

    8) Use siphash for IP ID generator, from Eric Dumazet.

    9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
    entries, from David Ahern.

    10) Move skb->xmit_more into a per-cpu variable, from Florian
    Westphal.

    11) Improve eBPF verifier speed and increase maximum program size,
    from Alexei Starovoitov.

    12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
    spinlocks. From Neil Brown.

    13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

    14) Improve link partner cap detection in generic PHY code, from
    Heiner Kallweit.

    15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
    Maguire.

    16) Remove SKB list implementation assumptions in SCTP, yours truly.

    17) Various cleanups, optimizations, and simplifications in r8169
    driver. From Heiner Kallweit.

    18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

    19) Switch PHY drivers over to use dynamic feature detection, from
    Heiner Kallweit.

    20) Support flow steering without masking in dpaa2-eth, from Ioana
    Ciocoi.

    21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
    Pirko.

    22) Increase the strict parsing of current and future netlink
    attributes, also export such policies to userspace. From Johannes
    Berg.

    23) Allow DSA tag drivers to be modular, from Andrew Lunn.

    24) Remove legacy DSA probing support, also from Andrew Lunn.

    25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
    Haabendal.

    26) Add a generic tracepoint for TX queue timeouts to ease debugging,
    from Cong Wang.

    27) More indirect call optimizations, from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
    cxgb4: Fix error path in cxgb4_init_module
    net: phy: improve pause mode reporting in phy_print_status
    dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
    net: macb: Change interrupt and napi enable order in open
    net: ll_temac: Improve error message on error IRQ
    net/sched: remove block pointer from common offload structure
    net: ethernet: support of_get_mac_address new ERR_PTR error
    net: usb: smsc: fix warning reported by kbuild test robot
    staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
    net: dsa: support of_get_mac_address new ERR_PTR error
    net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
    vrf: sit mtu should not be updated when vrf netdev is the link
    net: dsa: Fix error cleanup path in dsa_init_module
    l2tp: Fix possible NULL pointer dereference
    taprio: add null check on sched_nest to avoid potential null pointer dereference
    net: mvpp2: cls: fix less than zero check on a u32 variable
    net_sched: sch_fq: handle non connected flows
    net_sched: sch_fq: do not assume EDT packets are ordered
    net: hns3: use devm_kcalloc when allocating desc_cb
    net: hns3: some cleanup for struct hns3_enet_ring
    ...

    Linus Torvalds
     

07 May, 2019

1 commit

  • Pull timer updates from Ingo Molnar:
    "This cycle had the following changes:

    - Timer tracing improvements (Anna-Maria Gleixner)

    - Continued tasklet reduction work: remove the hrtimer_tasklet
    (Thomas Gleixner)

    - Fix CPU hotplug remove race in the tick-broadcast mask handling
    code (Thomas Gleixner)

    - Force upper bound for setting CLOCK_REALTIME, to fix ABI
    inconsistencies with handling values that are close to the maximum
    supported and the vagueness of when uptime related wraparound might
    occur. Make the consistent maximum the year 2232 across all
    relevant ABIs and APIs. (Thomas Gleixner)

    - various cleanups and smaller fixes"
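
    The year 2232 bound mentioned above can be reproduced with a little
    arithmetic. The sketch below assumes a signed 64-bit nanosecond time
    value and a 30-year reserve for uptime; the reserve, the macro names
    and the helper functions are assumptions made for illustration and
    are not taken from the patch text.

```c
#include <stdint.h>

/* Assumed constants for illustration; names are hypothetical. */
#define SKETCH_NSEC_PER_SEC   1000000000LL
#define SKETCH_KTIME_SEC_MAX  (INT64_MAX / SKETCH_NSEC_PER_SEC)  /* 64-bit ns limit */
#define SKETCH_UPTIME_SEC_MAX (30LL * 365 * 24 * 3600)           /* assumed 30-year reserve */
#define SKETCH_SETTOD_SEC_MAX (SKETCH_KTIME_SEC_MAX - SKETCH_UPTIME_SEC_MAX)

/* Returns 1 if a requested CLOCK_REALTIME value (in seconds) is within the bound. */
static int sketch_settod_sec_valid(int64_t sec)
{
    return sec >= 0 && sec <= SKETCH_SETTOD_SEC_MAX;
}

/* Approximate calendar year for seconds since 1970 (365.2425-day years). */
static int sketch_approx_year(int64_t sec)
{
    return 1970 + (int)(sec / 31556952LL);
}
```

    With these assumptions the raw 64-bit limit falls in 2262, and
    subtracting the reserve gives 2232; the exact comparison used by the
    kernel may differ from this sketch.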

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tick: Fix typos in comments
    tick/broadcast: Fix warning about undefined tick_broadcast_oneshot_offline()
    timekeeping: Force upper bound for setting CLOCK_REALTIME
    timer/trace: Improve timer tracing
    timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
    timer: Move trace point to get proper index
    tick/sched: Update tick_sched struct documentation
    tick: Remove outgoing CPU from broadcast masks
    timekeeping: Consistently use unsigned int for seqcount snapshot
    softirq: Remove tasklet_hrtimer
    xfrm: Replace hrtimer tasklet with softirq hrtimer
    mac80211_hwsim: Replace hrtimer tasklet with softirq hrtimer

    Linus Torvalds
     

03 May, 2019

1 commit


30 Apr, 2019

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2019-04-30

    1) A lot of work to remove indirections from the xfrm code.
    From Florian Westphal.

    2) Support ESP offload in combination with gso partial.
    From Boris Pismenny.

    3) Remove some duplicated code from vti4.
    From Jeremy Sowden.

    Please note that there is a merge conflict

    between commit:

    8742dc86d0c7 ("xfrm4: Fix uninitialized memory read in _decode_session4")

    from the ipsec tree and commit:

    c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy")

    from the ipsec-next tree. The merge conflict will appear
    when those trees get merged during the merge window.
    The conflict can be solved as it is done in linux-next:

    https://lkml.org/lkml/2019/4/25/1207

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Apr, 2019

1 commit

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().
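
    The split into independent strictness options can be pictured as a
    set of flag bits that a validator consults one by one. The following
    is a minimal userspace sketch; all identifiers (SKETCH_*,
    sketch_attr_ok) are hypothetical, and "expected_len == 0" is used
    here only as a stand-in for an NLA_UNSPEC policy entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names mirroring the four options listed above. */
enum sketch_validate_flags {
    SKETCH_TRAILING     = 1u << 0, /* reject trailing data after attributes */
    SKETCH_MAXTYPE      = 1u << 1, /* reject attribute types above max */
    SKETCH_UNSPEC       = 1u << 2, /* reject attrs with an unspecified policy */
    SKETCH_STRICT_ATTRS = 1u << 3, /* require the exact attribute length */
};

#define SKETCH_LIBERAL           0u
#define SKETCH_DEPRECATED_STRICT (SKETCH_TRAILING | SKETCH_MAXTYPE)
#define SKETCH_STRICT            (SKETCH_TRAILING | SKETCH_MAXTYPE | \
                                  SKETCH_UNSPEC | SKETCH_STRICT_ATTRS)

struct sketch_attr {
    unsigned type; /* attribute type from the message */
    size_t len;    /* payload length from the message */
};

struct sketch_pol {
    size_t expected_len; /* 0 stands in for an unspecified entry */
};

/* Check a single attribute against a toy policy under the given flags. */
static bool sketch_attr_ok(const struct sketch_attr *a,
                           const struct sketch_pol *pol,
                           unsigned maxtype, unsigned flags)
{
    if (a->type > maxtype)
        return !(flags & SKETCH_MAXTYPE);
    if (pol[a->type].expected_len == 0)
        return !(flags & SKETCH_UNSPEC);
    if (a->len < pol[a->type].expected_len)
        return false; /* too short is rejected at every level */
    if (a->len > pol[a->type].expected_len)
        return !(flags & SKETCH_STRICT_ATTRS);
    return true;
}
```

    In this sketch the liberal mode accepts oversized and unknown
    attributes, the deprecated strict mode additionally rejects unknown
    types, and the full set rejects everything that is not an exact match.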

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, while going forward
    it would prevent forgetting attribute entries. The same can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

23 Apr, 2019

4 commits


15 Apr, 2019

1 commit

  • When CONFIG_INET is not enabled, the build fails:
    net/xfrm/xfrm_output.c: In function ‘xfrm4_tunnel_encap_add’:
    net/xfrm/xfrm_output.c:234:2: error: implicit declaration of function ‘ip_select_ident’ [-Werror=implicit-function-declaration]
    ip_select_ident(dev_net(dst->dev), skb, NULL);

    XFRM only supports ipv4 and ipv6 so change dependency to INET and place
    user-visible options (pfkey sockets, migrate support and the like)
    under 'if INET' guard as well.

    Fixes: 1de70830066b7 ("xfrm: remove output2 indirection from xfrm_mode")
    Reported-by: Randy Dunlap
    Signed-off-by: Florian Westphal
    Acked-by: Randy Dunlap
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

08 Apr, 2019

10 commits

  • This structure is now only 4 bytes, so it's more efficient
    to cache a copy rather than its address.

    No significant size difference in allmodconfig vmlinux.

    With non-modular kernel that has all XFRM options enabled, this
    series reduces vmlinux image size by ~11kb. All xfrm_mode
    indirections are gone and all modes are built-in.

    before (ipsec-next master):
    text data bss dec filename
    21071494 7233140 11104324 39408958 vmlinux.master

    after this series:
    21066448 7226772 11104324 39397544 vmlinux.patched

    With allmodconfig kernel, the size increase is only 362 bytes,
    even though all the xfrm config options removed in this series are
    modular.

    before:
    text data bss dec filename
    15731286 6936912 4046908 26715106 vmlinux.master

    after this series:
    15731492 6937068 4046908 26715468 vmlinux

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • After previous changes, xfrm_mode contains no function pointers anymore
    and all modules defining such a struct contain no code except init/exit
    functions to register the xfrm_mode struct with the xfrm core.

    Just place the xfrm modes in the core and remove the modules;
    the run-time xfrm_mode register/unregister functionality is removed.

    Before:

    text data bss dec filename
    7523 200 2364 10087 net/xfrm/xfrm_input.o
    40003 628 440 41071 net/xfrm/xfrm_state.o
    15730338 6937080 4046908 26714326 vmlinux

    After:

    7389 200 2364 9953 net/xfrm/xfrm_input.o
    40574 656 440 41670 net/xfrm/xfrm_state.o
    15730084 6937068 4046908 26714060 vmlinux

    The xfrm*_mode_{transport,tunnel,beet} modules are gone.

    v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
    ones rather than removing them.

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from
    ipv6 in case of CONFIG_IPV6=m.

    This change has virtually no effect on vmlinux size, but it reduces
    afinfo size and allows followup patch to make xfrm modes const.

    v2: mark if (afinfo) tests as likely (Sabrina)
    re-fetch afinfo according to inner_mode in xfrm_prepare_input().

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • similar to previous patch: no external module dependencies,
    so we can avoid the indirection by placing this in the core.

    This change removes the last indirection from xfrm_mode and the
    xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore.

    Before:
    text data bss dec hex filename
    3957 136 0 4093 ffd net/xfrm/xfrm_output.o
    587 44 0 631 277 net/ipv4/xfrm4_mode_beet.o
    649 32 0 681 2a9 net/ipv4/xfrm4_mode_tunnel.o
    625 44 0 669 29d net/ipv6/xfrm6_mode_beet.o
    599 32 0 631 277 net/ipv6/xfrm6_mode_tunnel.o
    After:
    text data bss dec hex filename
    5359 184 0 5543 15a7 net/xfrm/xfrm_output.o
    171 24 0 195 c3 net/ipv4/xfrm4_mode_beet.o
    171 24 0 195 c3 net/ipv4/xfrm4_mode_tunnel.o
    172 24 0 196 c4 net/ipv6/xfrm6_mode_beet.o
    172 24 0 196 c4 net/ipv6/xfrm6_mode_tunnel.o

    v2: fold the *encap_add functions into xfrm*_prepare_output
    preserve (move) output2 comment (Sabrina)
    use x->outer_mode->encap, not inner
    fix a build breakage on ppc (kbuild robot)

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • No external dependencies on any module, place this in the core.
    Increase is about 1800 bytes for xfrm_input.o.

    The beet helpers get added to internal header, as they can be reused
    from xfrm_output.c in the next patch (kernel contains several
    copies of them in the xfrm{4,6}_mode_beet.c files).

    Before:
    text data bss dec filename
    5578 176 2364 8118 net/xfrm/xfrm_input.o
    1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o
    171 40 0 211 net/ipv4/xfrm4_mode_transport.o
    1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o
    1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o
    172 40 0 212 net/ipv6/xfrm6_mode_ro.o
    172 40 0 212 net/ipv6/xfrm6_mode_transport.o
    1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o

    After:
    text data bss dec filename
    7373 200 2364 9937 net/xfrm/xfrm_input.o
    587 44 0 631 net/ipv4/xfrm4_mode_beet.o
    171 32 0 203 net/ipv4/xfrm4_mode_transport.o
    649 32 0 681 net/ipv4/xfrm4_mode_tunnel.o
    625 44 0 669 net/ipv6/xfrm6_mode_beet.o
    172 32 0 204 net/ipv6/xfrm6_mode_ro.o
    172 32 0 204 net/ipv6/xfrm6_mode_transport.o
    599 32 0 631 net/ipv6/xfrm6_mode_tunnel.o

    v2: pass inner_mode to xfrm_inner_mode_encap_remove to fix
    AF_UNSPEC selector breakage (bisected by Benedict Wong)

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • There are only two versions (tunnel and transport). The ip/ipv6 versions
    only differ in sizeof(iphdr) vs ipv6hdr.

    Place this in the core and use x->outer_mode->encap type to call the
    correct adjustment helper.

    Before:
    text data bss dec filename
    15730311 6937008 4046908 26714227 vmlinux

    After:
    15730428 6937008 4046908 26714344 vmlinux

    (about 117 bytes increase)

    v2: use family from x->outer_mode, not inner

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • Same as input indirection. Only exception: we need to export
    xfrm_outer_mode_output for pktgen.

    Increases size of vmlinux by about 163 bytes:
    Before:
    text data bss dec filename
    15730208 6936948 4046908 26714064 vmlinux

    After:
    15730311 6937008 4046908 26714227 vmlinux

    xfrm_inner_extract_output has no more external callers, make it static.

    v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output
    add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca)
    add WARN_ON_ONCE for 'call AF_INET6 related output function, but
    CONFIG_IPV6=n' case.
    make xfrm_inner_extract_output static

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • No need for any indirection or abstraction here, both functions
    are pretty much the same and quite small, they also have no external
    dependencies.

    xfrm_prepare_input can then be made static.

    With allmodconfig build, size increase of vmlinux is 25 bytes:

    Before:
    text data bss dec filename
    15730207 6936924 4046908 26714039 vmlinux

    After:
    15730208 6936948 4046908 26714064 vmlinux

    v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca)
    change copied comment to refer to transport and network header,
    not skb->{h,nh}, which don't exist anymore. (Sabrina)
    make xfrm_prepare_input static (Eyal Birger)

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • Now that we have the family available directly in the
    xfrm_mode struct, we can use that and avoid one extra dereference.

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • This will be useful to know if we're supposed to decode ipv4 or ipv6.

    While at it, make the unregister function return void, all module_exit
    functions did just BUG(); there is never a point in doing error checks
    if there is no way to handle such error.

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

27 Mar, 2019

1 commit

  • If an xfrmi is associated to a vrf layer 3 master device,
    xfrm_policy_check() fails after traffic decapsulation. The input
    interface is replaced by the layer 3 master device, and hence
    xfrmi_decode_session() can't match the xfrmi anymore to satisfy
    policy checking.

    Extend ingress xfrmi lookup to honor the original layer 3 slave
    device, allowing xfrm interfaces to operate within a vrf domain.

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Signed-off-by: Martin Willi
    Signed-off-by: Steffen Klassert

    Martin Willi
     

26 Mar, 2019

1 commit

  • In commit 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
    I introduced a check for xfrm protocol, but according to Herbert
    IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
    it should be removed from validate_tmpl().

    Also, IPSEC_PROTO_ANY is expected to match only the 3 IPsec-specific
    protocols, which is why xfrm_state_flush() could still miss
    IPPROTO_ROUTING, leaving those entries in
    net->xfrm.state_all before net exit. Fix this by replacing
    IPSEC_PROTO_ANY with zero.

    This patch also extracts the check from validate_tmpl() to
    xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
    With this, no other protocols should be added into xfrm.

    Fixes: 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
    Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
    Cc: Steffen Klassert
    Cc: Herbert Xu
    Signed-off-by: Cong Wang
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Cong Wang
     

24 Mar, 2019

1 commit


22 Mar, 2019

1 commit

  • Switch the timer to HRTIMER_MODE_SOFT, which executes the timer
    callback in softirq context, and remove the hrtimer_tasklet.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Acked-by: David S. Miller
    Cc: netdev@vger.kernel.org
    Cc: Steffen Klassert
    Cc: Herbert Xu
    Link: https://lkml.kernel.org/r/20190301224821.29843-3-bigeasy@linutronix.de

    Thomas Gleixner
     

21 Mar, 2019

2 commits


08 Mar, 2019

1 commit

  • For rcu protected pointers, we should add the '__rcu' annotation.

    Once the '__rcu' tag is added to the rcu protected pointer, the sparse
    tool reports warnings.

    net/xfrm/xfrm_user.c:1198:39: sparse: expected struct sock *sk
    net/xfrm/xfrm_user.c:1198:39: sparse: got struct sock [noderef] *nlsk
    [...]

    So introduce a new wrapper function of nlmsg_unicast to handle type
    conversions.

    This patch also fixes a direct access of a rcu protected socket.

    Fixes: be33690d8fcf ("[XFRM]: Fix aevent related crash")
    Signed-off-by: Su Yanjun
    Signed-off-by: Steffen Klassert

    Su Yanjun
     

01 Mar, 2019

1 commit

  • UBSAN reports this:

    UBSAN: Undefined behaviour in net/xfrm/xfrm_policy.c:1289:24
    index 6 is out of range for type 'unsigned int [6]'
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.162-514.55.6.9.x86_64+ #13
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    0000000000000000 1466cf39b41b23c9 ffff8801f6b07a58 ffffffff81cb35f4
    0000000041b58ab3 ffffffff83230f9c ffffffff81cb34e0 ffff8801f6b07a80
    ffff8801f6b07a20 1466cf39b41b23c9 ffffffff851706e0 ffff8801f6b07ae8
    Call Trace:
    [] __dump_stack lib/dump_stack.c:15 [inline]
    [] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
    [] ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
    [] __ubsan_handle_out_of_bounds+0x16e/0x1b2 lib/ubsan.c:382
    [] __xfrm_policy_unlink+0x3dd/0x5b0 net/xfrm/xfrm_policy.c:1289
    [] xfrm_policy_delete+0x52/0xb0 net/xfrm/xfrm_policy.c:1309
    [] xfrm_policy_timer+0x30b/0x590 net/xfrm/xfrm_policy.c:243
    [] call_timer_fn+0x237/0x990 kernel/time/timer.c:1144
    [] __run_timers kernel/time/timer.c:1218 [inline]
    [] run_timer_softirq+0x6ce/0xb80 kernel/time/timer.c:1401
    [] __do_softirq+0x299/0xe10 kernel/softirq.c:273
    [] invoke_softirq kernel/softirq.c:350 [inline]
    [] irq_exit+0x216/0x2c0 kernel/softirq.c:391
    [] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
    [] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
    [] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:735
    [] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:52
    [] arch_safe_halt arch/x86/include/asm/paravirt.h:111 [inline]
    [] default_idle+0x27/0x430 arch/x86/kernel/process.c:446
    [] arch_cpu_idle+0x15/0x20 arch/x86/kernel/process.c:437
    [] default_idle_call+0x53/0x90 kernel/sched/idle.c:92
    [] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
    [] cpu_idle_loop kernel/sched/idle.c:251 [inline]
    [] cpu_startup_entry+0x60d/0x9a0 kernel/sched/idle.c:299
    [] start_secondary+0x3c9/0x560 arch/x86/kernel/smpboot.c:245

    The issue is triggered as follows:

    xfrm_add_policy
    -->verify_newpolicy_info //check the index provided by user with XFRM_POLICY_MAX
    //In my case, the index is 0x6E6BB6, so it passes the check.
    -->xfrm_policy_construct //copy the user's policy and set xfrm_policy_timer
    -->xfrm_policy_insert
    --> __xfrm_policy_link //use the original dir, in my case 2
    --> xfrm_gen_index //generate policy index, here 0x6E6BB6

    Then xfrm_policy_timer is fired:

    xfrm_policy_timer
    --> xfrm_policy_id2dir //get dir from (policy index & 7), in my case is 6
    --> xfrm_policy_delete
    --> __xfrm_policy_unlink //access policy_count[dir], trigger out of range access

    Add xfrm_policy_id2dir check in verify_newpolicy_info, make sure the computed dir is
    valid, to fix the issue.
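
    The mismatch described above comes down to the low three bits of the
    user-supplied index. The sketch below shows one possible form of such
    a check in plain C; the function names, the value 3 for the number of
    directions, and the rule "encoded direction must equal the requested
    one" are assumptions for illustration, not the patch itself.

```c
#include <stdint.h>

/* Assumed number of policy directions (in, out, fwd) for illustration. */
#define SKETCH_POLICY_MAX 3

/* Direction encoded in the low three bits of the index, as described above. */
static unsigned sketch_id2dir(uint32_t index)
{
    return index & 7;
}

/*
 * Hypothetical validation: a non-zero user-supplied index is accepted
 * only if the direction it encodes equals the requested direction.
 * Returns 0 on success, -1 on rejection.
 */
static int sketch_verify_index(uint32_t index, unsigned dir)
{
    if (dir >= SKETCH_POLICY_MAX)
        return -1;
    if (index && sketch_id2dir(index) != dir)
        return -1;
    return 0;
}
```

    For the reported case, index 0x6E6BB6 encodes direction 6 while the
    requested direction is 2, so a check of this shape would reject it
    before the timer path can index past the end of the counter array.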

    Reported-by: Hulk Robot
    Fixes: e682adf021be ("xfrm: Try to honor policy index if it's supplied by user")
    Signed-off-by: YueHaibing
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    YueHaibing
     

18 Feb, 2019

1 commit

  • After moving an XFRM interface to another namespace it stays associated
    with the original namespace (net in `struct xfrm_if` and the list keyed
    with `xfrmi_net_id`), allowing processes in the new namespace to use
    SAs/policies that were created in the original namespace. For instance,
    this allows a keying daemon in one namespace to establish IPsec SAs for
    other namespaces without processes there having access to the keys or IKE
    credentials.

    This worked fine for outbound traffic, however, for inbound traffic the
    lookup for the interfaces and the policies used the incorrect namespace
    (the one the XFRM interface was moved to).

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Signed-off-by: Tobias Brunner
    Signed-off-by: Steffen Klassert

    Tobias Brunner
     

05 Feb, 2019

1 commit

  • xfrm_state_put() moves struct xfrm_state to the GC list
    and schedules the GC work to clean it up. On net exit call
    path, xfrm_state_flush() is called to clean up and
    xfrm_flush_gc() is called to wait for the GC work to complete
    before exit.

    However, this doesn't work because one of the ->destructor(),
    ipcomp_destroy(), schedules the same GC work again inside
    the GC work. It is hard to wait for such a nested async
    callback. This is also why syzbot still reports the following
    warning:

    WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
    ...
    ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
    cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
    process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
    worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
    kthread+0x357/0x430 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

    In fact, it is perfectly fine to bypass GC and destroy xfrm_state
    synchronously on net exit call path, because it is in process context
    and doesn't need a work struct to do any blocking work.

    This patch introduces xfrm_state_put_sync() which simply bypasses
    GC, and lets its callers decide whether to use this synchronous
    version. On net exit path, xfrm_state_fini() and
    xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
    blocking, it can use xfrm_state_put_sync() directly too.

    Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
    reflect this change.

    Fixes: b48c05ab5d32 ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
    Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
    Cc: Steffen Klassert
    Signed-off-by: Cong Wang
    Signed-off-by: Steffen Klassert

    Cong Wang
     

16 Jan, 2019

1 commit

  • Fixes 9b42c1f179a6, which changed the default route lookup behavior for
    tunnel mode SAs in the outbound direction to use the skb mark, whereas
    previously mark=0 was used if the output mark was unspecified. In
    mark-based routing schemes such as Android’s, this change in default
    behavior causes routing loops or lookup failures.

    This patch restores the default behavior of using a 0 mark while still
    incorporating the skb mark if the SET_MARK (and SET_MARK_MASK) is
    specified.
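
    The restored behaviour can be sketched as a small helper that
    combines a value/mask pair with the skb mark. This is only an
    illustration: the struct, the function name, and the convention that
    "value and mask both zero" means the attributes were not supplied are
    assumptions, not code from the patch.

```c
#include <stdint.h>

/* Hypothetical value/mask pair standing in for SET_MARK and SET_MARK_MASK. */
struct sketch_mark {
    uint32_t v; /* value to set */
    uint32_t m; /* mask selecting which bits come from v */
};

/* Mark to use for the route lookup of an outbound tunnel mode SA. */
static uint32_t sketch_lookup_mark(uint32_t skb_mark, struct sketch_mark sm)
{
    /* Nothing configured: keep the old default of mark 0. */
    if (sm.v == 0 && sm.m == 0)
        return 0;
    /* Masked bits come from the configured value, the rest from the skb. */
    return (sm.v & sm.m) | (skb_mark & ~sm.m);
}
```

    With no mark configured the lookup uses 0 regardless of the skb mark,
    which is the default that mark-based routing setups relied on.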

    Tested with additions to Android's kernel unit test suite:
    https://android-review.googlesource.com/c/kernel/tests/+/860150

    Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking")
    Signed-off-by: Benedict Wong
    Signed-off-by: Steffen Klassert

    Benedict Wong
     

10 Jan, 2019

1 commit

  • The check assumes that in transport mode, the first template's family
    must match the address family of the policy selector.

    Syzkaller managed to build a template using MODE_ROUTEOPTIMIZATION,
    with ipv4-in-ipv6 chain, leading to following splat:

    BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x1db/0x1854
    Read of size 4 at addr ffff888063e57aa0 by task a.out/2050
    xfrm_state_find+0x1db/0x1854
    xfrm_tmpl_resolve+0x100/0x1d0
    xfrm_resolve_and_create_bundle+0x108/0x1000 [..]

    Problem is that addresses point into flowi4 struct, but xfrm_state_find
    treats them as being ipv6 because it uses templ->encap_family
    (AF_INET6 in case of reproducer) rather than family (AF_INET).

    This patch inverts the logic: Enforce 'template family must match
    selector' EXCEPT for tunnel and BEET mode.

    In BEET and Tunnel mode, xfrm_tmpl_resolve_one will have remote/local
    address pointers changed to point at the addresses found in the template,
    rather than the flowi ones, so no oob read will occur.

    Reported-by: 3ntr0py1337@gmail.com
    Reported-by: Daniel Borkmann
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

09 Jan, 2019

5 commits

  • With a very small change to the test script we can trigger a softlockup
    due to a bogus assignment of 'p' (policy to be examined) on restart.

    Previously the two to-be-merged nodes had same address/prefixlength pair,
    so no erase/reinsert was necessary, we only had to append the list from
    node a to b.

    If prefix lengths are different, the node has to be deleted and re-inserted
    into the tree, with the updated prefix length. This was broken; due to
    bogus update to 'p' this loops forever.

    Add a 'restart' label and use that instead.

    While at it, don't perform the unneeded reinserts of the policies that
    are already sorted into the 'new' node.

    A previous patch in this series made xfrm_policy_inexact_list_reinsert()
    use the relative position indicator to sort policies according to age in
    case priorities are identical.

    Fixes: 6ac098b2a9d30 ("xfrm: policy: add 2nd-level saddr trees for inexact policies")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • "newpos" has wrong scope. It must be NULL on each iteration of the loop.
    Otherwise, when a policy is to be inserted at the start, we would instead
    insert at the point found by the previous loop iteration.

    Also, we need to unlink the policy before we reinsert it to the new node,
    else we can get next-points-to-self loops.

    Because policies are only ordered by priority it is irrelevant which policy
    is "more recent" except when two policies have same priority.
    (the more recent one is placed after the older one).

    In these cases, we can use the ->pos id number to know which one is the
    'older': the higher the id, the more recent the policy.

    So we only need to unlink all policies from the node that is about to be
    removed, and insert them to the replacement node.
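
    The ordering rule described above (priority first, then age via the
    ->pos id when priorities are equal) can be written as a comparison
    function. The sketch below uses a plain array and qsort() instead of
    the kernel's linked lists; the struct and function names are
    hypothetical, and placing numerically lower priorities first is an
    assumption made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a policy: only the two ordering keys. */
struct sketch_pol_entry {
    uint32_t priority; /* primary key */
    uint32_t pos;      /* higher value means more recently added */
};

/* Order by priority; on a tie the older policy (lower pos) comes first. */
static int sketch_pol_cmp(const void *a, const void *b)
{
    const struct sketch_pol_entry *pa = a;
    const struct sketch_pol_entry *pb = b;

    if (pa->priority != pb->priority)
        return pa->priority < pb->priority ? -1 : 1;
    if (pa->pos != pb->pos)
        return pa->pos < pb->pos ? -1 : 1;
    return 0;
}

/* Sort policies collected from a removed node together with the survivors. */
static void sketch_pol_sort(struct sketch_pol_entry *v, size_t n)
{
    qsort(v, n, sizeof(*v), sketch_pol_cmp);
}
```

    Because pos breaks every tie, the result does not depend on the order
    in which policies were unlinked from the node being removed.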

    Fixes: 9cf545ebd591da ("xfrm: policy: store inexact policies in a tree ordered by destination address")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • An xfrm hash rebuild has to reset the inexact policy list before the
    policies get re-inserted: A change of hash thresholds will result in
    policies being moved from the inexact tree to the policy hash table.

    If the thresholds are increased again later, they get moved from hash
    table to inexact tree.

    We must unlink all policies from the inexact tree before re-insertion.

    Otherwise 'migrate' may find policies that are in main hash table a
    second time, when it searches the inexact lists.

    Furthermore, re-insertion without deletion can cause an element's ->next
    to point back to itself, causing soft lockups or double-frees.

    Reported-by: syzbot+9d971dd21eb26567036b@syzkaller.appspotmail.com
    Fixes: 9cf545ebd591da ("xfrm: policy: store inexact policies in a tree ordered by destination address")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • Hash rebuild will re-set all the inexact entries, then re-insert them.
    Lookups that can occur in parallel will therefore not find any policies.

    This was safe when lookups were still guarded by rwlock.
    After rcu-ification, lookups check the hash_generation seqcount to detect
    when a hash resize takes place. Hash rebuild missed the needed increment.

    Hash resizes and hash rebuilds cannot occur in parallel (both acquire
    hash_resize_mutex), so just increment xfrm_hash_generation, like resize.
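
    The role of the generation counter can be illustrated with a minimal
    userspace sequence counter built on C11 atomics. This is a sketch of
    the general technique only; the names are hypothetical and it is not
    the kernel's seqcount implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical generation counter: odd while a rebuild is in progress. */
struct sketch_seqcount {
    atomic_uint seq;
};

/* Writer side: bump before and after modifying the tables. */
static void sketch_write_begin(struct sketch_seqcount *s)
{
    atomic_fetch_add(&s->seq, 1);
}

static void sketch_write_end(struct sketch_seqcount *s)
{
    atomic_fetch_add(&s->seq, 1);
}

/* Reader side: snapshot the counter before the lookup. */
static unsigned sketch_read_begin(struct sketch_seqcount *s)
{
    return atomic_load(&s->seq);
}

/* Reader must redo the lookup if a writer was active or has run since. */
static bool sketch_read_retry(struct sketch_seqcount *s, unsigned start)
{
    return (start & 1) || atomic_load(&s->seq) != start;
}
```

    A writer that empties and refills the tables without touching the
    counter leaves readers with no way to notice, so a lookup can return
    "no policy" as if it were a valid result; incrementing the counter
    around the rebuild makes such lookups retry.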

    Fixes: a7c44247f704e3 ("xfrm: policy: make xfrm_policy_lookup_bytype lockless")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • This function was modeled on the 'exact' insert one, which did not use
    the rcu variant either.

    When I fixed the 'exact' insert I forgot to propagate this to my
    development tree, so the inexact variant retained the bug.

    Fixes: 9cf545ebd591d ("xfrm: policy: store inexact policies in a tree ordered by destination address")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

21 Dec, 2018

2 commits


20 Dec, 2018

1 commit

  • Remove skb->sp and allocate secpath storage via extension
    infrastructure. This also reduces sk_buff by 8 bytes on x86_64.

    Total size of allyesconfig kernel is reduced slightly, as there is
    less inlined code (one conditional atomic op instead of two on
    skb_clone).

    No differences in throughput in following ipsec performance tests:
    - transport mode with aes on 10GB link
    - tunnel mode between two network namespaces with aes and null cipher

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal