10 Nov, 2020

1 commit

  • If xfrm_get_translator() fails, xfrm_user_policy() returns without
    freeing 'data', which was allocated by memdup_sockptr().

    Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
    Reported-by: Hulk Robot
    Signed-off-by: Yu Kuai
    Signed-off-by: Steffen Klassert

    Yu Kuai
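
The leak described above can be modelled in userspace. This is a minimal sketch, not the kernel code: dup_buf(), free_buf(), live_allocs, set_user_policy() and the translator_loaded flag are hypothetical stand-ins, and the -EOPNOTSUPP return value is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter that makes a leak observable in this userspace model. */
static int live_allocs;

/* Stand-in for memdup_sockptr(): returns a heap copy of the caller's buffer. */
static void *dup_buf(const void *src, size_t len)
{
	void *p = malloc(len);

	if (!p)
		return NULL;
	memcpy(p, src, len);
	live_allocs++;
	return p;
}

static void free_buf(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/*
 * Model of the corrected control flow. translator_loaded stands in for
 * "xfrm_get_translator() returned non-NULL".
 */
static int set_user_policy(const void *optval, size_t optlen,
			   int translator_loaded)
{
	void *data;

	assert(optlen > 0);
	data = dup_buf(optval, optlen);
	if (!data)
		return -ENOMEM;

	if (!translator_loaded) {
		free_buf(data);	/* the early return must not leak 'data' */
		return -EOPNOTSUPP;
	}

	/* ... translate and apply the policy here ... */
	free_buf(data);
	return 0;
}
```

Calling set_user_policy() with translator_loaded set to 0 should leave live_allocs at zero, which is the property the fix restores.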
     

09 Nov, 2020

3 commits

  • The 32-bit to 64-bit message translator zeroes the needed padding in
    the translation; the rest is the actual payload.
    Don't allocate zeroed pages, as they are not needed.

    Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     
  • 32-bit messages translated by xfrm_compat can have attributes attached.
    For all attributes except XFRMA_SA and XFRMA_POLICY, the payload size is
    the same in the 32-bit UABI and the 64-bit UABI. For XFRMA_SA (struct
    xfrm_usersa_info) and XFRMA_POLICY (struct xfrm_userpolicy_info), the
    only difference is tail padding that is present in the 64-bit payload
    but not in the 32-bit one.
    The proper size for the destination nlattr is already calculated by
    xfrm_user_rcv_calculate_len64() and allocated with kvmalloc().

    xfrm_attr_cpy32() copies copy_len bytes of the 32-bit payload into the
    translated 64-bit attribute, zero-filling possible padding for SA/POLICY.
    Due to a typo, *pos has already been advanced by the 64-bit payload size,
    so the following memset(0) is applied to the memory after the translated
    attribute, not to its tail padding.

    Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
    Reported-by: syzbot+c43831072e7df506a646@syzkaller.appspotmail.com
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
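
The intended copy-then-pad behaviour can be sketched as follows. This is an illustrative userspace model rather than the kernel's xfrm_attr_cpy32(): the function name attr_copy_padded() and its parameters are hypothetical. The point is that the write position advances only by the bytes actually copied before the padding is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy copy_len payload bytes to dst + *pos, then zero the remaining
 * (payload - copy_len) bytes of tail padding and advance *pos past them.
 */
static void attr_copy_padded(unsigned char *dst, size_t *pos,
			     const unsigned char *src,
			     size_t copy_len, size_t payload)
{
	assert(copy_len <= payload);

	memcpy(dst + *pos, src, copy_len);
	*pos += copy_len;	/* advance by what was copied, not by payload */

	memset(dst + *pos, 0, payload - copy_len);
	*pos += payload - copy_len;
}
```

If *pos were advanced by payload before the memset(), the zeroing would land just past the attribute, which is the misplacement the commit message describes.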
     
  • xfrm_xlate32() translates a 64-bit message provided by the kernel so it
    can be sent to a 32-bit listener (acknowledge or monitor). The translator
    code doesn't expect an XFRMA_UNSPEC attribute, as it doesn't know its
    payload. The kernel never attaches such an attribute, but a user can.

    I've searched for any open-source code that does this and found none:
    nothing on GitHub, and Google finds only tfcproject, which has such code
    commented out.

    What happens if a user sends a netlink message with an XFRMA_UNSPEC
    attribute? The IPsec code ignores this attribute. But if there is a
    monitor process, or a 32-bit user requested an ack, the kernel will try
    to translate the message and will hit WARN_ONCE() in xfrm_xlate64_attr().

    Deal with XFRMA_UNSPEC by copying the attribute payload with
    xfrm_nla_cpy(). As a result, the default switch-case in
    xfrm_xlate64_attr() becomes unused code. Leave those 3 lines in place in
    case a new xfrm attribute is added.

    Fixes: 5461fc0c8d9f ("xfrm/compat: Add 64=>32-bit messages translator")
    Reported-by: syzbot+a7e701c8385bd8543074@syzkaller.appspotmail.com
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     

05 Nov, 2020

1 commit


23 Oct, 2020

1 commit

  • We found that the following race condition exists in the
    xfrm_alloc_userspi flow:

    user thread                               state_hash_work thread
    ----                                      ----
    xfrm_alloc_userspi()
    __find_acq_core()
    /*alloc new xfrm_state:x*/
    xfrm_state_alloc()
    /*schedule state_hash_work thread*/
    xfrm_hash_grow_check()                    xfrm_hash_resize()
    xfrm_alloc_spi                            /*hold lock*/
    x->id.spi = htonl(spi)                    spin_lock_bh(&net->xfrm.xfrm_state_lock)
    /*waiting lock release*/                  xfrm_hash_transfer()
    spin_lock_bh(&net->xfrm.xfrm_state_lock)  /*add x into hlist:net->xfrm.state_byspi*/
                                              hlist_add_head_rcu(&x->byspi)
                                              spin_unlock_bh(&net->xfrm.xfrm_state_lock)

    /*add x into hlist:net->xfrm.state_byspi 2 times*/
    hlist_add_head_rcu(&x->byspi)

    1. a new state x is allocated in xfrm_state_alloc() and added into the
    bydst hlist in __find_acq_core() on the LHS;
    2. on the RHS, the state_hash_work thread traverses the old bydst and
    transfers every xfrm_state (including x) into the new bydst hlist and
    the new byspi hlist;
    3. the user thread on the LHS gets the lock and adds x into the new byspi
    hlist again.

    So the same xfrm_state (x) is added into the same list_hash
    (net->xfrm.state_byspi) 2 times, which turns the list_hash into an
    infinite loop.

    To fix the race, x->id.spi = htonl(spi) in xfrm_alloc_spi() is moved
    after spin_lock_bh(), so that the state_hash_work thread no longer adds
    x, whose id.spi is still zero, into the hash_list.

    Fixes: f034b5d4efdf ("[XFRM]: Dynamic xfrm_state hash table sizing.")
    Signed-off-by: zhuoliang zhang
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    zhuoliang zhang
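
The ordering problem can be illustrated with a deterministic, single-threaded model. This is only a sketch under stated assumptions: struct state, hash_transfer(), alloc_spi_old() and alloc_spi_new() are hypothetical names, the lock is represented by comments, and the resize_runs_first flag simply forces the interleaving from the diagram above.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for struct xfrm_state: only what the race needs. */
struct state {
	uint32_t spi;		/* 0 means "no SPI assigned yet" */
	int byspi_links;	/* how many times x was added to the by-SPI list */
};

/* Models the resize path: only states with a nonzero SPI are rehashed by SPI. */
static void hash_transfer(struct state *x)
{
	if (x->spi)
		x->byspi_links++;
}

/* Old ordering: the SPI becomes visible before the lock is taken. */
static void alloc_spi_old(struct state *x, uint32_t spi, int resize_runs_first)
{
	assert(spi != 0);
	x->spi = spi;
	if (resize_runs_first)
		hash_transfer(x);	/* resizer holds the lock and sees the SPI */
	/* lock acquired by the user thread */
	x->byspi_links++;
}

/* New ordering: the SPI is assigned only once the lock is held. */
static void alloc_spi_new(struct state *x, uint32_t spi, int resize_runs_first)
{
	assert(spi != 0);
	if (resize_runs_first)
		hash_transfer(x);	/* SPI is still 0, so x is skipped */
	/* lock acquired by the user thread */
	x->spi = spi;
	x->byspi_links++;
}
```

With the old ordering and the resize running first, byspi_links ends at 2 (the double insertion); with the new ordering it ends at 1.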
     

14 Oct, 2020

1 commit


09 Oct, 2020

1 commit

  • As Nicolas noticed in his case, when the xfrm_interface module is
    installed, the standard IP tunnels stop receiving packets.

    This is caused by the IP tunnel handlers in xfrm interface having a
    higher priority and processing incoming packets with xfrm_input(), which
    drops the packets and returns 0 when anything goes wrong, so no later
    handler gets to see them.

    Rather than changing xfrm_input(), this patch adjusts the priority of
    the IP tunnel handlers in xfrm interface, so that packets reach xfrmi's
    handlers later than the others', as the others do not drop packets
    that their handlers cannot process.

    Note that IPCOMP also defines its own IPIP tunnel handler and it calls
    xfrm_input() as well, so we must make its priority lower than xfrmi's,
    which means having xfrmi loaded would still break IPCOMP. We may seek
    another way to fix it in xfrm_input() in the future.

    Reported-by: Nicolas Dichtel
    Tested-by: Nicolas Dichtel
    Fixes: da9bbf0598c9 ("xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler")
    Fixes: d7b360c2869f ("xfrm: interface: support IP6IP6 and IP6IP tunnels processing with .cb_handler")
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert

    Xin Long
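
The effect of handler ordering can be sketched with a small dispatch model. Everything here is hypothetical and for illustration only: struct handler, ipip_rcv(), xfrmi_rcv(), deliver() and the priority values are not the kernel's definitions, and the model assumes that a lower priority value runs earlier and that a return value of 0 ends the walk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8

struct handler {
	int (*rcv)(int pkt);	/* returns 0 when the packet was consumed */
	int priority;		/* lower value runs earlier in this model */
};

static int ipip_seen;

/* Hypothetical plain tunnel handler: takes pkt == 1, declines anything else. */
static int ipip_rcv(int pkt)
{
	if (pkt != 1)
		return -1;
	ipip_seen++;
	return 0;
}

/* Models a handler built on xfrm_input(): drops what it cannot use, returns 0. */
static int xfrmi_rcv(int pkt)
{
	(void)pkt;
	return 0;
}

/* Offer pkt to the handlers in ascending priority order; stop at the first 0. */
static void deliver(const struct handler *h, size_t n, int pkt)
{
	struct handler sorted[MAX_HANDLERS];
	size_t i, j;

	assert(n <= MAX_HANDLERS);
	for (i = 0; i < n; i++) {	/* insertion sort by priority */
		struct handler cur = h[i];

		for (j = i; j > 0 && sorted[j - 1].priority > cur.priority; j--)
			sorted[j] = sorted[j - 1];
		sorted[j] = cur;
	}
	for (i = 0; i < n; i++)
		if (sorted[i].rcv(pkt) == 0)
			return;
}
```

When the always-consuming handler sorts first, the plain handler never sees the packet; moving it behind the plain handler lets the plain tunnel receive again.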
     

06 Oct, 2020

2 commits


29 Sep, 2020

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2020-09-28

    1) Fix a build warning in ip_vti if CONFIG_IPV6 is not set.
    From YueHaibing.

    2) Restore IPCB on espintcp before handing the packet to xfrm
    as the information there is still needed.
    From Sabrina Dubroca.

    3) Fix pmtu updating for xfrm interfaces.
    From Sabrina Dubroca.

    4) Some xfrm state information was not cloned with xfrm_do_migrate.
    Fixes to clone the full xfrm state, from Antony Antony.

    5) Use the correct address family in xfrm_state_find. The struct
    flowi must always be interpreted along with the original
    address family. This got lost over the years.
    Fix from Herbert Xu.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Sep, 2020

1 commit

  • The struct flowi must never be interpreted by itself as its size
    depends on the address family. Therefore it must always be grouped
    with its original family value.

    In this particular instance, the original family value is lost in
    the function xfrm_state_find. Therefore we get a bogus read when
    it's coupled with the wrong family which would occur with inter-
    family xfrm states.

    This patch fixes it by keeping the original family value.

    Note that the same bug could potentially occur in LSM through
    the xfrm_state_pol_flow_match hook. I checked the current code
    there and it seems to be safe for now as only secid is used which
    is part of struct flowi_common. But that API should be changed
    so that we don't get new bugs in the future. We could
    do that by replacing fl with just secid or adding a family field.

    Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com
    Fixes: 48b8d78315bf ("[XFRM]: State selection update to use inner...")
    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
     

24 Sep, 2020

5 commits

  • Provide compat_xfrm_userpolicy_info translation for xfrm setsockopt().
    Reallocate the buffer and add the missing padding for the 64-bit message.

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     
  • Provide the user-to-kernel translator under XFRM_USER_COMPAT, which
    creates a 64-bit translation of a 32-bit xfrm-user message.
    The translation is afterwards reused by the xfrm_user code just as if
    userspace had sent a 64-bit message.

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     
  • Currently nlmsg_unicast() is used by functions that dump structures that
    can be different in size for compat tasks, see dump_one_state() and
    dump_one_policy().

    The following nlmsg_unicast() users exist today in xfrm:

    Function                                   | Message can be different
                                               | in size on compat
    -------------------------------------------|------------------------------
    xfrm_get_spdinfo()                         | N
    xfrm_get_sadinfo()                         | N
    xfrm_get_sa()                              | Y
    xfrm_alloc_userspi()                       | Y
    xfrm_get_policy()                          | Y
    xfrm_get_ae()                              | N

    Besides, dump_one_state() and dump_one_policy() can be used by filtered
    netlink dump for XFRM_MSG_GETSA, XFRM_MSG_GETPOLICY.

    Just as for xfrm multicast, allocate frag_list for compat skb journey
    down to recvmsg() which will give user the desired skb according to
    syscall bitness.

    Signed-off-by: Dmitry Safonov

    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     
  • Provide the kernel-to-user translator under XFRM_USER_COMPAT, which
    creates a 32-bit translation of a 64-bit xfrm-user message and puts it
    in the skb's frag_list. The net/compat.c layer provides MSG_CMSG_COMPAT
    to decide whether the message should be taken from the skb or from
    frag_list (also used by wext-core, which has an ABI difference too).

    Kernel sends 64-bit xfrm messages to the userspace for:
    - multicast (monitor events)
    - netlink dumps

    Wire up the translator to xfrm_nlmsg_multicast().

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     
  • Add a skeleton for xfrm_compat module and provide API to register it in
    xfrm_state.ko. struct xfrm_translator will have function pointers to
    translate messages received from 32-bit userspace or to be sent to it
    from 64-bit kernel.
    module_get()/module_put() are used instead of rcu_read_lock() as the
    module will vmalloc() memory for translation.
    The new API is registered with xfrm_state module, not with xfrm_user as
    the former needs translator for user_policy set by setsockopt() and
    xfrm_user already uses functions from xfrm_state.

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     

07 Sep, 2020

3 commits

  • When we clone a state, only add_time was cloned. Values like bytes and
    packets were missed. Now clone all members of the structure.

    v1->v3:
    - use memcpy to copy the entire structure

    Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert

    Antony Antony
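
Copying the whole counter structure instead of a single member can be sketched as below. This is a userspace illustration under assumptions: struct lifetime_cur, struct state and clone_lifetime() are hypothetical stand-ins, and the field list is chosen for the example rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-state "current lifetime" counters. */
struct lifetime_cur {
	uint64_t bytes;
	uint64_t packets;
	uint64_t add_time;
	uint64_t use_time;
};

struct state {
	struct lifetime_cur curlft;
};

/* Copy the whole counter block so no member is silently left at zero. */
static void clone_lifetime(struct state *dst, const struct state *src)
{
	assert(dst != src);
	memcpy(&dst->curlft, &src->curlft, sizeof(dst->curlft));
}
```

Copying the structure as a unit also means members added later are cloned without touching the clone path again.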
     
  • XFRMA_SEC_CTX was not cloned from the old state to the new one.
    Migrate this attribute during XFRM_MSG_MIGRATE.

    v1->v2:
    - return -ENOMEM on error
    v2->v3:
    - fix return type to int

    Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert

    Antony Antony
     
  • XFRMA_SET_MARK and XFRMA_SET_MARK_MASK were not cloned from the old
    state to the new one. Migrate these two attributes during
    XFRM_MSG_MIGRATE.

    Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.")
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert

    Antony Antony
     

27 Aug, 2020

1 commit

  • xfrm interfaces currently test for !skb->ignore_df when deciding
    whether to update the pmtu on the skb's dst. Because of this, no pmtu
    exception is created when we do something like:

    ping -s 1438

    By dropping this check, the pmtu exception will be created and the
    next ping attempt will work.

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Reported-by: Xiumei Mu
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants
    with the new pseudo-keyword macro fallthrough[1]. Also, remove
    unnecessary fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
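
Outside the kernel tree, the pseudo-keyword can be approximated as shown below. This is a sketch for userspace builds, assuming a compiler that offers __has_attribute; the header_bytes() function and its constants are hypothetical and exist only to show the macro in a switch.

```c
#include <assert.h>

/* Use the statement attribute where the compiler has it, else a no-op. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each level adds its own bytes plus all lower levels. */
static int header_bytes(int level)
{
	int n = 0;

	switch (level) {
	case 2:
		n += 8;
		fallthrough;
	case 1:
		n += 4;
		fallthrough;
	case 0:
		n += 1;
		break;
	default:
		n = -1;
	}
	return n;
}
```

Unlike a comment, the macro is visible to the compiler, so an accidental missing break can still be reported by -Wimplicit-fallthrough.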
     

17 Aug, 2020

1 commit

  • Xiumei reported a bug with espintcp over IPv6 in transport mode,
    because xfrm6_transport_finish expects to find IP6CB data (struct
    inet6_skb_cb). Currently, espintcp zeroes the CB, but the relevant
    part is actually preserved by previous layers (first set up by tcp,
    then strparser only zeroes a small part of tcp_skb_cb), so we can just
    relocate it to the start of skb->cb.

    Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
    Reported-by: Xiumei Mu
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
     

11 Aug, 2020

1 commit

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers contrary to seqlock writers must be externally
    serialized, which usually happens via locking - except for strict
    per CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    sequence count has now lock type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well known reader preempts writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

02 Aug, 2020

1 commit


01 Aug, 2020

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2020-07-31

    1) Fix policy matching with mark and mask on userspace interfaces.
    From Xin Long.

    2) Several fixes for the new ESP in TCP encapsulation.
    From Sabrina Dubroca.

    3) Fix crash when the hold queue is used. The assumption that
    xdst->path and dst->child are not a NULL pointer only if dst->xfrm
    is not a NULL pointer is true with the exception of using the
    hold queue. Fix this by checking for hold queue usage before
    dereferencing xdst->path or dst->child.

    4) Validate pfkey_dump parameter before sending them.
    From Mark Salyzyn.

    5) Fix the location of the transport header with ESP in UDPv6
    encapsulation. From Sabrina Dubroca.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Jul, 2020

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2020-07-30

    Please note that I did the first time now --no-ff merges
    of my testing branch into the master branch to include
    the [PATCH 0/n] message of a patchset. Please let me
    know if this is desirable, or if I should do it any
    different.

    1) Introduce an oseq-may-wrap flag to disable anti-replay
    protection for manually distributed ICVs as suggested
    in RFC 4303. From Petr Vaněk.

    2) Patchset to fully support IPCOMP for vti4, vti6 and
    xfrm interfaces. From Xin Long.

    3) Switch from a linear list to a hash list for xfrm interface
    lookups. From Eyal Birger.

    4) Fixes to not register one xfrm(6)_tunnel object twice.
    From Xin Long.

    5) Fix two compile errors that were introduced with the
    IPCOMP support for vti and xfrm interfaces.
    Also from Xin Long.

    6) Make the policy hold queue work with VTI. This was
    forgotten when VTI was implemented.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Jul, 2020

2 commits

  • Currently, espintcp_rcv drops packets silently, which makes debugging
    issues difficult. Count packets as either XfrmInHdrError (when the
    packet was too short or contained invalid data) or XfrmInError (for
    other issues).

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
     
  • Currently, short messages (less than 4 bytes after the length header)
    will break the stream of messages. This is unnecessary, since we can
    still parse messages even if they're too short to contain any usable
    data. This is also bogus, as keepalive messages (a single 0xff byte),
    though not needed with TCP encapsulation, should be allowed.

    This patch changes the stream parser so that short messages are
    accepted and dropped in the kernel. Messages that contain a valid SPI
    or non-ESP header are processed as before.

    Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
    Reported-by: Andrew Cagney
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
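
The parsing rule described above can be sketched as a frame classifier. This is an illustrative model, not the kernel's stream parser: classify_frame() and enum frame_kind are hypothetical, and the sketch assumes a big-endian 16-bit length prefix that counts its own two bytes and a four-zero-byte marker for non-ESP messages.

```c
#include <assert.h>
#include <stddef.h>

enum frame_kind {
	FRAME_INCOMPLETE,	/* need more bytes from the stream */
	FRAME_INVALID,		/* length prefix cannot be right */
	FRAME_SHORT,		/* fewer than 4 payload bytes: drop, keep parsing */
	FRAME_NON_ESP,		/* payload starts with four zero bytes */
	FRAME_ESP		/* payload starts with a nonzero SPI */
};

static enum frame_kind classify_frame(const unsigned char *buf, size_t avail,
				      size_t *frame_len)
{
	size_t len;

	assert(buf && frame_len);
	*frame_len = 0;
	if (avail < 2)
		return FRAME_INCOMPLETE;

	len = ((size_t)buf[0] << 8) | buf[1];	/* big-endian 16-bit prefix */
	if (len < 2)
		return FRAME_INVALID;
	if (avail < len)
		return FRAME_INCOMPLETE;

	*frame_len = len;	/* caller skips this many bytes in every case */
	if (len - 2 < 4)
		return FRAME_SHORT;
	if (!buf[2] && !buf[3] && !buf[4] && !buf[5])
		return FRAME_NON_ESP;
	return FRAME_ESP;
}
```

A single 0xff keepalive byte is classified as FRAME_SHORT here, so the caller can discard it and continue with the next frame instead of tearing down the stream.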
     

29 Jul, 2020

1 commit

  • A sequence counter write side critical section must be protected by some
    form of locking to serialize writers. If the serialization primitive is
    not disabling preemption implicitly, preemption has to be explicitly
    disabled before entering the sequence counter write side critical
    section.

    A plain seqcount_t does not contain the information of which lock must
    be held when entering a write side critical section.

    Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
    which allow associating a lock with the sequence counter. This enables
    lockdep to verify that the lock used for writer serialization is held
    when the write side critical section is entered.

    If lockdep is disabled this lock association is compiled out and has
    neither storage size nor runtime overhead.

    Signed-off-by: Ahmed S. Darwish
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200720155530.1173732-17-a.darwish@linutronix.de

    Ahmed S. Darwish
     

25 Jul, 2020

1 commit


21 Jul, 2020

1 commit


17 Jul, 2020

4 commits

  • kernel test robot reported some compile errors:

    ia64-linux-ld: net/xfrm/xfrm_interface.o: in function `xfrmi4_fini':
    net/xfrm/xfrm_interface.c:900: undefined reference to `xfrm4_tunnel_deregister'
    ia64-linux-ld: net/xfrm/xfrm_interface.c:901: undefined reference to `xfrm4_tunnel_deregister'
    ia64-linux-ld: net/xfrm/xfrm_interface.o: in function `xfrmi4_init':
    net/xfrm/xfrm_interface.c:873: undefined reference to `xfrm4_tunnel_register'
    ia64-linux-ld: net/xfrm/xfrm_interface.c:876: undefined reference to `xfrm4_tunnel_register'
    ia64-linux-ld: net/xfrm/xfrm_interface.c:885: undefined reference to `xfrm4_tunnel_deregister'

    This happened when CONFIG_XFRM_INTERFACE=y and CONFIG_INET_TUNNEL=m were
    set. We don't really want xfrm_interface to depend on inet_tunnel
    completely, but only to disable the tunnel code when inet_tunnel is not
    reachable.

    So instead of adding "select INET_TUNNEL" for XFRM_INTERFACE, this patch
    only changes the check to IS_REACHABLE to avoid these compile errors.

    Reported-by: kernel test robot
    Fixes: da9bbf0598c9 ("xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler")
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert

    Xin Long
     
  • In case we're compiling espintcp support only for IPv6, we should
    still initialize the common code.

    Fixes: 26333c37fc28 ("xfrm: add IPv6 support for espintcp")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
     
  • man 2 recv says:

    RETURN VALUE

    When a stream socket peer has performed an orderly shutdown, the
    return value will be 0 (the traditional "end-of-file" return).

    Currently, this works for blocking reads, but non-blocking reads will
    return -EAGAIN. This patch overwrites that return value when the peer
    won't send us any more data.

    Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
    Reported-by: Andrew Cagney
    Tested-by: Andrew Cagney
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
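
The return-value rule can be sketched in a few lines. This is a simplified illustration: recv_result() and its peer_shutdown parameter are hypothetical, standing in for whatever check the real code uses to learn that the peer will send no more data.

```c
#include <assert.h>
#include <errno.h>

/*
 * err is the result of the underlying non-blocking read; peer_shutdown is
 * nonzero once the peer has closed its sending side.
 */
static int recv_result(int err, int peer_shutdown)
{
	if (err == -EAGAIN && peer_shutdown)
		return 0;	/* orderly shutdown: report end-of-file */
	return err;
}
```

Only the "would block" case is rewritten; real data and other errors pass through unchanged.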
     
  • Currently, non-blocking sends from userspace result in EOPNOTSUPP.

    To support this, we need to tell espintcp_sendskb_locked() and
    espintcp_sendskmsg_locked() that non-blocking operation was requested
    from espintcp_sendmsg().

    Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
    Reported-by: Andrew Cagney
    Tested-by: Andrew Cagney
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
     

14 Jul, 2020

1 commit

  • As we did in the last 2 patches for vti(6), this patch is to define a
    new xfrm_tunnel object 'xfrmi_ipip6_handler' to register for AF_INET6,
    and a new xfrm6_tunnel object 'xfrmi_ip6ip_handler' to register for
    AF_INET.

    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert

    Xin Long
     

13 Jul, 2020

2 commits


11 Jul, 2020

1 commit