25 Aug, 2020

1 commit


25 Jul, 2020

8 commits


20 Jul, 2020

6 commits

  • Add setsockopt SOL_IP/IP_RECVERR_RFC4884 to return the offset to an
    extension struct if present.

    ICMP messages may include an extension structure after the original
    datagram. RFC 4884 standardized this behavior. It stores the offset
    in words to the extension header in u8 icmphdr.un.reserved[1].

    The field is valid only for ICMP types destination unreachable, time
    exceeded and parameter problem, if length is at least 128 bytes and
    entire packet does not exceed 576 bytes.

    Return the offset to the start of the extension struct when reading an
    ICMP error from the error queue, if it matches the above constraints.

    Do not return the raw u8 field. Return the offset from the start of
    the user buffer, in bytes. The kernel does not return the network and
    transport headers, so subtract those.

    Also validate the headers. Return the offset regardless of validation,
    as an invalid extension must still not be misinterpreted as part of
    the original datagram. Note that !invalid does not imply valid. If
    the extension version does not match, no validation can take place,
    for instance.

    For backward compatibility, make this optional, set by setsockopt
    SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see
    github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c

    For forward compatibility, reserve only setsockopt value 1, leaving
    other bits for additional icmp extensions.

    Changes
    v1->v2:
    - convert word offset to byte offset from start of user buffer
    - return in ee_data as u8 may be insufficient
    - define extension struct and object header structs
    - return len only if constraints met
    - if returning len, also validate

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
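
    The offset conversion described in this commit can be sketched in user
    space as follows. This is a minimal illustration, not the kernel code:
    the helper name, its parameters and the enum names are hypothetical, and
    it assumes the length field counts 32-bit words, as RFC 4884 specifies
    for ICMPv4.

```c
#include <stddef.h>
#include <stdint.h>

/* ICMP types for which the length field is valid, per the commit text. */
enum {
    ICMP_TYPE_DEST_UNREACH  = 3,
    ICMP_TYPE_TIME_EXCEEDED = 11,
    ICMP_TYPE_PARAM_PROBLEM = 12,
};

/*
 * Hypothetical helper, not the kernel implementation.
 *
 * len_words:        raw u8 length field (assumed to be in 32-bit words)
 * packet_len:       length of the entire ICMP packet, in bytes
 * stripped_hdr_len: bytes of network and transport headers that are
 *                   not returned to user space
 *
 * Returns the byte offset of the extension struct from the start of the
 * user buffer, or 0 if the constraints are not met.
 */
size_t rfc4884_ext_offset(uint8_t icmp_type, uint8_t len_words,
                          size_t packet_len, size_t stripped_hdr_len)
{
    size_t off = (size_t)len_words * 4; /* words -> bytes */

    if (icmp_type != ICMP_TYPE_DEST_UNREACH &&
        icmp_type != ICMP_TYPE_TIME_EXCEEDED &&
        icmp_type != ICMP_TYPE_PARAM_PROBLEM)
        return 0;

    /* Original datagram at least 128 bytes, whole packet at most 576. */
    if (off < 128 || packet_len > 576)
        return 0;

    /* The stripped headers cannot be longer than the original datagram. */
    if (off < stripped_hdr_len)
        return 0;

    return off - stripped_hdr_len;
}
```

    For example, a length field of 32 words (128 bytes) with 28 bytes of
    stripped headers gives an offset of 100 bytes into the user buffer.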
     
  • Handle the few cases that need special treatment in-line using
    in_compat_syscall().

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Factor out one helper each for setting the native and compat
    version of the MCAST_MSFILTER option.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Factor out one helper each for setting the native and compat
    version of the MCAST_MSFILTER option.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Factor out one helper each for getting the native and compat
    version of the MCAST_MSFILTER option.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • All instances handle compat sockopts via in_compat_syscall() now, so
    remove the compat_{get,set} methods as well as the
    compat_nf_{get,set}sockopt wrappers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

29 May, 2020

5 commits


26 May, 2020

1 commit

  • The value of "n" is capped at 0x1ffffff but it is not checked for
    negative values. I don't think this causes a problem but I'm not
    certain and it's harmless to prevent it.

    Fixes: 2e04172875c9 ("ipv4: do compat setsockopt for MCAST_MSFILTER directly")
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
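
    A two-sided range check of the kind described might look like the sketch
    below. The helper and macro names are hypothetical; only the 0x1ffffff
    cap comes from the commit message.

```c
#include <stdbool.h>

#define MAX_NUMSRC 0x1ffffff /* upper cap quoted in the commit message */

/*
 * Hypothetical helper: accept a signed count only if it is neither
 * negative nor above the cap.
 */
bool numsrc_in_range(int n)
{
    return n >= 0 && n <= MAX_NUMSRC;
}
```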
     

21 May, 2020

8 commits


12 May, 2020

1 commit

  • The msg_control field in struct msghdr can either contain a user
    pointer when used with the recvmsg system call, or a kernel pointer
    when used with sendmsg. To complicate things further kernel_recvmsg
    can stuff a kernel pointer in and then use set_fs to make the uaccess
    helpers accept it.

    Replace it with a union of a kernel pointer msg_control field and
    a user pointer msg_control_user one, and allow kernel_recvmsg to operate
    on a proper kernel pointer, using a bitfield to override the normal
    choice of a user pointer for recvmsg.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
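
    The layout described above can be sketched in user space as follows.
    This is an illustration only: the struct name, the bitfield name and
    the two setter helpers are assumptions, and the kernel's __user
    annotation is reduced to a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the control fields of struct msghdr. */
struct msghdr_sketch {
    union {
        void *msg_control;      /* kernel pointer */
        void *msg_control_user; /* user pointer (__user in the kernel) */
    };
    bool msg_control_is_user : 1; /* assumed name for the bitfield */
    size_t msg_controllen;
};

/* Hypothetical setter for the in-kernel (kernel_recvmsg-like) case. */
void set_kernel_control(struct msghdr_sketch *m, void *buf, size_t len)
{
    m->msg_control = buf;
    m->msg_control_is_user = false;
    m->msg_controllen = len;
}

/* Hypothetical setter for the system call (user pointer) case. */
void set_user_control(struct msghdr_sketch *m, void *ubuf, size_t len)
{
    m->msg_control_user = ubuf;
    m->msg_control_is_user = true;
    m->msg_controllen = len;
}
```

    Both pointers share storage, so the bitfield is the only record of
    which interpretation is valid.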
     

26 May, 2019

1 commit

  • In ip_ra_control(), the pointer new_ra is allocated via kmalloc() and
    then used in the code that follows. If the allocation fails, kmalloc()
    returns NULL, and the subsequent null pointer dereference can crash
    the kernel. Check the return value and handle the error.

    Signed-off-by: Gen Zhang
    Signed-off-by: David S. Miller

    Gen Zhang
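
    The check-then-use pattern can be sketched in user space with malloc()
    standing in for kmalloc(). The struct, the function names and the
    injectable allocator are hypothetical and exist only so that the
    failure path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure being allocated. */
struct ra_entry {
    int on;
};

typedef void *(*alloc_fn)(size_t);

/*
 * Allocate and initialise an entry. Returns 0 on success and -ENOMEM
 * if the allocator fails; *out is written only on success.
 */
int ra_entry_create(struct ra_entry **out, int on, alloc_fn alloc)
{
    struct ra_entry *new_ra = alloc(sizeof(*new_ra));

    if (!new_ra)
        return -ENOMEM; /* bail out before any dereference */

    new_ra->on = on;
    *out = new_ra;
    return 0;
}

/* Allocator that always fails, used to simulate memory pressure. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```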
     

10 Jan, 2019

1 commit

  • Commit 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call
    pskb_may_pull") avoided a read beyond the end of the skb linear
    segment by calling pskb_may_pull.

    That function can trigger a BUG_ON in pskb_expand_head if the skb is
    shared, which it is when peeking. It can also return ENOMEM.

    Avoid both by switching to safer skb_header_pointer.

    Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
    Reported-by: syzbot
    Suggested-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
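
    The idea behind reading through a helper that copies instead of
    reshaping the buffer can be sketched in user space. This is an analogy
    to skb_header_pointer, not its implementation: the two-part buffer
    type and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read-only buffer split into a linear head and one frag. */
struct split_buf {
    const uint8_t *head;
    size_t head_len;
    const uint8_t *frag;
    size_t frag_len;
};

/*
 * Return a pointer to len bytes at offset off without modifying the
 * buffer. If the range lies entirely in the head, point straight into
 * it; otherwise copy the bytes into tmp and return tmp. Returns NULL if
 * the range is out of bounds. tmp must hold at least len bytes.
 */
const void *split_buf_pointer(const struct split_buf *b, size_t off,
                              size_t len, void *tmp)
{
    size_t total = b->head_len + b->frag_len;
    size_t from_head;

    if (off > total || len > total - off)
        return NULL;

    if (off + len <= b->head_len)
        return b->head + off;

    /* Copy whatever part of the range is still in the head. */
    from_head = off < b->head_len ? b->head_len - off : 0;
    if (from_head)
        memcpy(tmp, b->head + off, from_head);

    /* Copy the remainder from the frag. */
    memcpy((uint8_t *)tmp + from_head,
           b->frag + (off + from_head - b->head_len),
           len - from_head);
    return tmp;
}
```

    Because the buffer is never reallocated or rearranged, this style of
    access works on a shared buffer and has no allocation failure path.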
     

06 Nov, 2018

1 commit

  • When an iptables command is executed, ip_{set/get}sockopt() try to load
    bpfilter.ko if bpfilter is enabled. If bpfilter.ko cannot be found, the
    command fails.
    bpfilter.ko is built only if CONFIG_BPFILTER_UMH is enabled, but
    ip_{set/get}sockopt() only check CONFIG_BPFILTER.
    So if CONFIG_BPFILTER is enabled and CONFIG_BPFILTER_UMH is disabled,
    iptables commands always fail.

    test config:
    CONFIG_BPFILTER=y
    # CONFIG_BPFILTER_UMH is not set

    test command:
    %iptables -L
    iptables: No chain/target/match by that name.

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

03 Oct, 2018

1 commit


25 Jul, 2018

1 commit

  • Syzbot reported a read beyond the end of the skb head when returning
    IPV6_ORIGDSTADDR:

    BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
    CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x5ef/0x860 net/core/scm.c:242
    ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
    ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
    rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
    [..]

    This logic and its ipv4 counterpart read the destination port from
    the packet at skb_transport_offset(skb) + 4.

    With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
    packet that stores headers exactly up to skb_transport_offset(skb) in
    the head and the remainder in a frag.

    Call pskb_may_pull before accessing the pointer to ensure that it lies
    in skb head.

    Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
    Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
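
    The underlying rule, that the bytes being read must lie inside the
    linear head before they are dereferenced, can be sketched in user
    space. The function name and parameters are hypothetical, and the
    layout assumed here (16-bit source port followed by 16-bit destination
    port at the start of the transport header) is that of UDP and TCP.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read the destination port from a linear buffer.
 *
 * head/head_len: the contiguous part of the packet
 * transport_off: offset of the transport header within head
 *
 * Returns 0 and stores the port in host byte order, or -1 if the first
 * 4 bytes of the transport header are not fully inside the head.
 */
int read_dest_port(const uint8_t *head, size_t head_len,
                   size_t transport_off, uint16_t *dport)
{
    if (transport_off > head_len || head_len - transport_off < 4)
        return -1;

    /* Bytes 2..3 hold the destination port in network byte order. */
    *dport = (uint16_t)((head[transport_off + 2] << 8) |
                        head[transport_off + 3]);
    return 0;
}
```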
     

17 Jul, 2018

1 commit

  • Based on RFC3376 5.1
    If no interface
    state existed for that multicast address before the change (i.e., the
    change consisted of creating a new per-interface record), or if no
    state exists after the change (i.e., the change consisted of deleting
    a per-interface record), then the "non-existent" state is considered
    to have a filter mode of INCLUDE and an empty source list.

    Which means a new multicast group should start with state IN().

    Function ip_mc_join_group() works correctly for IGMP ASM (Any-Source Multicast)
    mode. It adds a group with state EX() and inits crcount to mc_qrv,
    so the kernel will send a TO_EX() report message after adding the group.

    But for IGMPv3 SSM (Source-Specific Multicast) JOIN_SOURCE_GROUP mode, we
    split the group joining into two steps. First we join the group like ASM,
    i.e. via ip_mc_join_group(). So the state changes from IN() to EX().

    Then we add the source-specific address with INCLUDE mode. So the state
    changes from EX() to IN(A).

    The second step finishes before the first step has sent its group change
    record, so we only send the second change record, i.e. TO_IN(A).

    According to the RFC, we should actually send an ALLOW(A) message for
    SSM JOIN_SOURCE_GROUP as the state should mimic the 'IN() to IN(A)'
    transition.

    The issue was exposed by commit a052517a8ff65 ("net/multicast: should not
    send source list records when have filter mode change"). Before this change,
    we used to send both ALLOW(A) and TO_IN(A). After this change we only send
    TO_IN(A).

    Fix it by adding a new parameter to init group mode. Also add new wrapper
    functions so we don't need to change too much code.

    v1 -> v2:
    In my first version I only cleared the group change record, but this is not
    enough. When a new group is joined, it is initialized as EXCLUDE and triggers
    a filter mode change in ip/ip6_mc_add_src(), which clears all source
    addresses' sf_crcount. This prevents earlier joined addresses from sending
    state change records if multiple source addresses are joined at the same time.

    In the v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM
    JOIN_SOURCE_GROUP. I also split the original patch into two separate patches
    for IPv4 and IPv6.

    Fixes: a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change")
    Reviewed-by: Stefano Brivio
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
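
    The difference between the two reports can be sketched as a small
    decision function. The enum and function names are hypothetical; the
    numeric record types are those assigned in RFC 3376. The sketch only
    covers the case discussed here, a source A being added in INCLUDE mode.

```c
/* Per-interface filter mode of a multicast group. */
enum filter_mode {
    MODE_INCLUDE,
    MODE_EXCLUDE,
};

/* IGMPv3 group record types relevant here (values from RFC 3376). */
enum record_type {
    CHANGE_TO_INCLUDE_MODE = 3, /* TO_IN(A) */
    CHANGE_TO_EXCLUDE_MODE = 4, /* TO_EX(A) */
    ALLOW_NEW_SOURCES      = 5, /* ALLOW(A) */
    BLOCK_OLD_SOURCES      = 6, /* BLOCK(A) */
};

/*
 * Which record describes "source A joined in INCLUDE mode", given the
 * filter mode the group had before the change.
 */
enum record_type record_for_source_join(enum filter_mode old_mode)
{
    /* IN() -> IN(A) is a source list change: ALLOW(A). */
    if (old_mode == MODE_INCLUDE)
        return ALLOW_NEW_SOURCES;

    /* EX() -> IN(A) is a filter mode change: TO_IN(A). */
    return CHANGE_TO_INCLUDE_MODE;
}
```

    Initializing a new SSM group as INCLUDE therefore selects the first
    branch, which matches the ALLOW(A) report that the commit expects.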
     

27 May, 2018

1 commit


25 May, 2018

1 commit

  • A precondition check in ip_recv_error triggered on an otherwise benign
    race. Remove the warning.

    The warning triggers when passing an ipv6 socket to this ipv4 error
    handling function. RaceFuzzer was able to trigger it due to a race
    in setsockopt IPV6_ADDRFORM.

    ---
    CPU0
    do_ipv6_setsockopt
    sk->sk_socket->ops = &inet_dgram_ops;

    ---
    CPU1
    sk->sk_prot->recvmsg
    udp_recvmsg
    ip_recv_error
    WARN_ON_ONCE(sk->sk_family == AF_INET6);

    ---
    CPU0
    do_ipv6_setsockopt
    sk->sk_family = PF_INET;

    This socket option converts a v6 socket that is connected to a v4 peer
    to a v4 socket. It updates the socket on the fly, changing fields in
    sk as well as other structs. This is inherently non-atomic. It races
    with the lockless udp_recvmsg path.

    No other code makes an assumption that these fields are updated
    atomically. It is benign here, too, as ip_recv_error cares only about
    the protocol of the skbs enqueued on the error queue, for which
    sk_family is not a precise predictor (thanks to another issue with
    IPV6_ADDRFORM).

    Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr
    Fixes: 7ce875e5ecb8 ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error")
    Reported-by: DaeRyong Jeong
    Suggested-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

24 May, 2018

1 commit

  • bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
    and user mode helper code that is embedded into bpfilter.ko.

    The steps to build bpfilter.ko are the following:
    - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
    - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
      is converted into the bpfilter_umh.o object file
      with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
      Example:
      $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
      0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
      0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
      0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
    - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko

    bpfilter_kern.c is normal kernel module code that calls
    the fork_usermode_blob() helper to execute part of its own data
    as a user mode process.

    Notice that _binary_net_bpfilter_bpfilter_umh_start - end
    is placed into the .init.rodata section, so it's freed as soon as the __init
    function of bpfilter.ko is finished.
    As part of __init, bpfilter.ko does a first request/reply action
    via two unix pipes provided by the fork_usermode_blob() helper to
    make sure that the umh is healthy. If not, it will kill it via pid.

    Later bpfilter_process_sockopt() will be called from bpfilter hooks
    in get/setsockopt() to pass iptables commands into the umh via bpfilter.ko.

    If the admin does 'rmmod bpfilter', the __exit code of bpfilter.ko will
    kill the umh as well.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

23 Mar, 2018

1 commit