24 Feb, 2019

1 commit

  • I thought header search paths to tools/include(/uapi) were unneeded,
    but it looks like a build error occurs depending on the compiler.

    Commit 303a339f30a9 ("bpfilter: remove extra header search paths for
    bpfilter_umh") reintroduced the build error fixed by commit ae40832e53c3
    ("bpfilter: fix a build err").

    Apology for the breakage, and thanks to Guenter for reporting this.

    Fixes: 303a339f30a9 ("bpfilter: remove extra header search paths for bpfilter_umh")
    Reported-by: Guenter Roeck
    Signed-off-by: Masahiro Yamada
    Tested-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Masahiro Yamada
     

04 Feb, 2019

1 commit


17 Jan, 2019

1 commit

  • The bpfilter UMH blob is placed in the ".bpfilter_umh" section, but this
    is not an explicitly declared section, so a linker warning occurs at
    build time on powerpc.
    This patch therefore uses ".rodata" instead of ".bpfilter_umh".

    Config condition:

    CONFIG_BPFILTER=y
    CONFIG_BPFILTER_UMH=y

    Result:

    ld: warning: orphan section `.bpfilter_umh' from
    `net/bpfilter/bpfilter_umh_blob.o' being placed in section `.bpfilter_umh'

    Fixes: 61fbf5933d42 ("net: bpfilter: restart bpfilter_umh when error occurred")
    Reported-by: Stephen Rothwell
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

12 Jan, 2019

3 commits

  • The bpfilter.ko module can be removed while functions of bpfilter.ko
    are executing, so a panic can occur. Locks can be used to protect
    against that. A bpfilter_lock protects the routines in
    __bpfilter_process_sockopt(), but that is not enough because the __exit
    routine can be executed concurrently.

    Now, the bpfilter_umh cannot run in parallel.
    So, the module is not removed while it is being used, and it does not
    double-create the UMH process.
    The members of umh_info and bpfilter_umh_ops are protected by
    bpfilter_umh_ops.lock.
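
    The idea of guarding start and stop with one lock can be sketched in
    userspace C. This is only an illustration under stated assumptions:
    struct demo_umh_ops and its functions are hypothetical stand-ins, not
    the kernel's bpfilter_umh_ops, and a pthread mutex plays the role of
    the kernel mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; NOT the kernel's bpfilter_umh_ops. */
struct demo_umh_ops {
    pthread_mutex_t lock;   /* protects every field below */
    bool running;           /* true while the helper process exists */
    int creations;          /* how many times the helper was created */
};

static void demo_umh_init(struct demo_umh_ops *ops)
{
    pthread_mutex_init(&ops->lock, NULL);
    ops->running = false;
    ops->creations = 0;
}

/* Create the helper unless it already runs. Returns 1 if it was created. */
static int demo_umh_start(struct demo_umh_ops *ops)
{
    int created = 0;

    pthread_mutex_lock(&ops->lock);
    if (!ops->running) {            /* refuse to double-create */
        ops->running = true;
        ops->creations++;
        created = 1;
    }
    pthread_mutex_unlock(&ops->lock);
    return created;
}

/* Stop the helper. The exit path takes the same lock as the start path,
 * so the two can never interleave. Returns 1 if a helper was stopped. */
static int demo_umh_stop(struct demo_umh_ops *ops)
{
    int stopped = 0;

    pthread_mutex_lock(&ops->lock);
    if (ops->running) {
        ops->running = false;
        stopped = 1;
    }
    pthread_mutex_unlock(&ops->lock);
    return stopped;
}
```

    Calling demo_umh_start() twice in a row creates the helper only once,
    which mirrors the "do not double-create" property described above.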

    test commands:
    while :
    do
    iptables -I FORWARD -m string --string ap --algo kmp &
    modprobe -rv bpfilter &
    done

    splat looks like:
    [ 298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b
    [ 298.628512] #PF error: [normal kernel read fault]
    [ 298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0
    [ 298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154
    [ 298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0
    [ 298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6
    [ 298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202
    [ 298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000
    [ 298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000
    [ 298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000
    [ 298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c
    [ 298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000
    [ 298.638859] FS: 00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
    [ 298.638859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0
    [ 298.638859] Call Trace:
    [ 298.638859] ? mutex_lock_io_nested+0x1560/0x1560
    [ 298.638859] ? kasan_kmalloc+0xa0/0xd0
    [ 298.638859] ? kmem_cache_alloc+0x1c2/0x260
    [ 298.638859] ? __alloc_file+0x92/0x3c0
    [ 298.638859] ? alloc_empty_file+0x43/0x120
    [ 298.638859] ? alloc_file_pseudo+0x220/0x330
    [ 298.638859] ? sock_alloc_file+0x39/0x160
    [ 298.638859] ? __sys_socket+0x113/0x1d0
    [ 298.638859] ? __x64_sys_socket+0x6f/0xb0
    [ 298.638859] ? do_syscall_64+0x138/0x560
    [ 298.638859] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 298.638859] ? __alloc_file+0x92/0x3c0
    [ 298.638859] ? init_object+0x6b/0x80
    [ 298.638859] ? cyc2ns_read_end+0x10/0x10
    [ 298.638859] ? cyc2ns_read_end+0x10/0x10
    [ 298.638859] ? hlock_class+0x140/0x140
    [ 298.638859] ? sched_clock_local+0xd4/0x140
    [ 298.638859] ? sched_clock_local+0xd4/0x140
    [ 298.638859] ? check_flags.part.37+0x440/0x440
    [ 298.638859] ? __lock_acquire+0x4f90/0x4f90
    [ 298.638859] ? set_rq_offline.part.89+0x140/0x140
    [ ... ]

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     
  • The bpfilter_umh is stopped via __stop_umh() when a bpfilter error
    occurs.
    The bpfilter_umh could not start again because there was no restart
    routine.

    The section of bpfilter_umh_{start/end} is no longer .init.rodata
    because this area must be reused by the restart routine. Hence
    the section name is changed to .bpfilter_umh.

    The bpfilter_ops->start() is the restart callback. It is called when
    bpfilter_umh is stopped.
    The stop bit means bpfilter_umh is stopped. This bit is set by both
    the start and stop routines.
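
    A minimal sketch of the restart-callback pattern, assuming a request
    path that checks the stop bit first. The names demo_ops, demo_start(),
    demo_stop() and demo_request() are hypothetical and do not come from
    the kernel source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops structure; the real one lives in the kernel. */
struct demo_ops {
    bool stop;                        /* true while the helper is stopped */
    int (*start)(struct demo_ops *);  /* restart callback */
    int restarts;                     /* bookkeeping for this demo only */
};

/* Start routine: clears the stop bit. */
static int demo_start(struct demo_ops *ops)
{
    ops->stop = false;
    ops->restarts++;
    return 0;
}

/* Stop routine: sets the stop bit. */
static void demo_stop(struct demo_ops *ops)
{
    ops->stop = true;
}

/* Request path: restart the helper first if it is stopped.
 * Returns 0 on success, -1 if the helper could not be restarted. */
static int demo_request(struct demo_ops *ops)
{
    if (ops->stop) {
        if (ops->start == NULL || ops->start(ops) != 0)
            return -1;
    }
    /* a real implementation would now write the request to the helper */
    return 0;
}
```

    With this shape, a request issued after the helper was killed triggers
    one restart and then proceeds, instead of failing forever.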

    Before this patch,
    Test commands:
    $ iptables -vnL
    $ kill -9
    $ iptables -vnL
    [ 480.045136] bpfilter: write fail -32
    $ iptables -vnL

    All iptables commands will fail.

    After this patch,
    Test commands:
    $ iptables -vnL
    $ kill -9
    $ iptables -vnL
    $ iptables -vnL

    Now, all iptables commands will work.

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     
  • Now, when the UMH process is killed, do_exit() calls the umh_info->cleanup
    callback to release members of the umh_info.
    This patch makes bpfilter_umh's cleanup routine use the
    umh_info->cleanup callback.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

23 Oct, 2018

1 commit


18 Oct, 2018

1 commit

  • pid_task() dereferences the RCU-protected tasks array.
    But there is no rcu_read_lock() in the shutdown_umh() routine, so
    rcu_read_lock() is needed.
    get_pid_task() is a wrapper function of pid_task(). It holds
    rcu_read_lock() and then calls pid_task(). If the task isn't NULL, it
    increases the reference count of the task.
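
    The "look up under a lock, then pin the result" pattern can be shown
    with a userspace analogy. This is a sketch under loud assumptions: a
    pthread mutex stands in for the RCU read-side critical section, and
    demo_lookup() / demo_get() are hypothetical analogues of pid_task() /
    get_pid_task(), not kernel code.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_TABLE_SIZE 4

struct demo_task {
    int refcount;   /* number of users that pinned this task */
};

static pthread_mutex_t demo_table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_task *demo_table[DEMO_TABLE_SIZE];

/* Unlocked lookup: the caller must hold demo_table_lock
 * (analogous to calling pid_task() inside rcu_read_lock()). */
static struct demo_task *demo_lookup(unsigned int id)
{
    return id < DEMO_TABLE_SIZE ? demo_table[id] : NULL;
}

/* Locked lookup that also takes a reference, so the result stays valid
 * after the lock is dropped (analogous to get_pid_task()). */
static struct demo_task *demo_get(unsigned int id)
{
    struct demo_task *task;

    pthread_mutex_lock(&demo_table_lock);
    task = demo_lookup(id);
    if (task)
        task->refcount++;
    pthread_mutex_unlock(&demo_table_lock);
    return task;
}
```

    The caller of demo_get() is expected to drop the reference when done;
    that release step is omitted here for brevity.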

    test commands:
    %modprobe bpfilter
    %modprobe -rv bpfilter

    splat looks like:
    [15102.030932] =============================
    [15102.030957] WARNING: suspicious RCU usage
    [15102.030985] 4.19.0-rc7+ #21 Not tainted
    [15102.031010] -----------------------------
    [15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage!
    [15102.031063]
    other info that might help us debug this:

    [15102.031332]
    rcu_scheduler_active = 2, debug_locks = 1
    [15102.031363] 1 lock held by modprobe/1570:
    [15102.031389] #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter]
    [15102.031552]
    stack backtrace:
    [15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21
    [15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [15102.031628] Call Trace:
    [15102.031676] dump_stack+0xc9/0x16b
    [15102.031723] ? show_regs_print_info+0x5/0x5
    [15102.031801] ? lockdep_rcu_suspicious+0x117/0x160
    [15102.031855] pid_task+0x134/0x160
    [15102.031900] ? find_vpid+0xf0/0xf0
    [15102.032017] shutdown_umh.constprop.1+0x1e/0x53 [bpfilter]
    [15102.032055] stop_umh+0x46/0x52 [bpfilter]
    [15102.032092] __x64_sys_delete_module+0x47e/0x570
    [ ... ]

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Taehee Yoo
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Taehee Yoo
     

06 Oct, 2018

1 commit


16 Aug, 2018

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    - Gustavo A. R. Silva keeps working on the implicit switch fallthru
    changes.

    - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
    Luca Coelho.

    - Re-enable ASPM in r8169, from Kai-Heng Feng.

    - Add virtual XFRM interfaces, which avoids all of the limitations of
    existing IPSEC tunnels. From Steffen Klassert.

    - Convert GRO over to use a hash table, so that when we have many
    flows active we don't traverse a long list during accumulation.

    - Many new self tests for routing, TC, tunnels, etc. Too many
    contributors to mention them all, but I'm really happy to keep
    seeing this stuff.

    - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

    - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

    - Add IPSEC offload support to netdevsim, from Shannon Nelson.

    - Add support for slotting with non-uniform distribution to netem
    packet scheduler, from Yousuk Seung.

    - Add UDP GSO support to mlx5e, from Boris Pismenny.

    - Support offloading of Team LAG in NFP, from John Hurley.

    - Allow to configure TX queue selection based upon RX queue, from
    Amritha Nambiar.

    - Support ethtool ring size configuration in aquantia, from Anton
    Mikaev.

    - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

    - Support list based batching and stack traversal of SKBs, this is
    very exciting work. From Edward Cree.

    - Busyloop optimizations in vhost_net, from Toshiaki Makita.

    - Introduce the ETF qdisc, which allows time based transmissions. IGB
    can offload this in hardware. From Vinicius Costa Gomes.

    - Add parameter support to devlink, from Moshe Shemesh.

    - Several multiplication and division optimizations for BPF JIT in
    nfp driver, from Jiong Wang.

    - Lots of preparatory work to make more of the packet scheduler layer
    lockless, when possible, from Vlad Buslov.

    - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
    Toke Høiland-Jørgensen.

    - Support regions and region snapshots in devlink, from Alex Vesker.

    - Allow to attach XDP programs to both HW and SW at the same time on
    a given device, with initial support in nfp. From Jakub Kicinski.

    - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

    - Use PHYLIB in r8169 driver, from Heiner Kallweit.

    - All sorts of changes to support Spectrum 2 in mlxsw driver, from
    Ido Schimmel.

    - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

    - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
    Maxwell.

    - Support for templates in packet scheduler classifier, from Jiri
    Pirko.

    - IPV6 support in RDS, from Ka-Cheong Poon.

    - Native tproxy support in nf_tables, from Máté Eckl.

    - Maintain IP fragment queue in an rbtree, but optimize properly for
    in-order frags. From Peter Oskolkov.

    - Improve handling of ACKs on hole repairs, from Yuchung Cheng"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
    bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
    hv/netvsc: Fix NULL dereference at single queue mode fallback
    net: filter: mark expected switch fall-through
    xen-netfront: fix warn message as irq device name has '/'
    cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
    net: dsa: mv88e6xxx: missing unlock on error path
    rds: fix building with IPV6=m
    inet/connection_sock: prefer _THIS_IP_ to current_text_addr
    net: dsa: mv88e6xxx: bitwise vs logical bug
    net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
    ieee802154: hwsim: using right kind of iteration
    net: hns3: Add vlan filter setting by ethtool command -K
    net: hns3: Set tx ring' tc info when netdev is up
    net: hns3: Remove tx ring BD len register in hns3_enet
    net: hns3: Fix desc num set to default when setting channel
    net: hns3: Fix for phy link issue when using marvell phy driver
    net: hns3: Fix for information of phydev lost problem when down/up
    net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
    net: hns3: Add support for serdes loopback selftest
    bnxt_en: take coredump_record structure off stack
    ...

    Linus Torvalds
     

25 Jul, 2018

1 commit


18 Jul, 2018

2 commits


28 Jun, 2018

2 commits


21 Jun, 2018

1 commit


20 Jun, 2018

2 commits

  • net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
    enabled. Add it to .gitignore to avoid committing it.

    Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Matteo Croce
    Signed-off-by: David S. Miller

    Matteo Croce
     
  • The bpfilter Makefile assumes that the system locale is en_US, so the
    parsing of objdump output fails under other locales.
    Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
    only 2 processes instead of 7.

    Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Matteo Croce
    Signed-off-by: David S. Miller

    Matteo Croce
     

08 Jun, 2018

2 commits

  • syzbot reported the following crash
    [ 338.293946] bpfilter: read fail -512
    [ 338.304515] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 338.311863] general protection fault: 0000 [#1] SMP KASAN
    [ 338.344360] RIP: 0010:__vfs_write+0x4a6/0x960
    [ 338.426363] Call Trace:
    [ 338.456967] __kernel_write+0x10c/0x380
    [ 338.460928] __bpfilter_process_sockopt+0x1d8/0x35b
    [ 338.487103] bpfilter_mbox_request+0x4d/0xb0
    [ 338.491492] bpfilter_ip_get_sockopt+0x6b/0x90

    This can happen when multiple cpus are trying to talk to the user mode
    process via bpfilter_mbox_request(). One cpu grabs the mutex while another
    goes to sleep on the same mutex. Then the former cpu sees that the umh pipe
    is down and shuts down the pipes. The latter cpu finally acquires the mutex
    and crashes on the freed pipe.
    Fix the race by using info.pid as an indicator that umh and pipes are healthy
    and check it after acquiring the mutex.
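
    The fix pattern (re-check a health indicator after the mutex is
    acquired) can be sketched in userspace C. The struct, the function
    name demo_mbox_request() and the pipe_ok parameter are assumptions
    made for this illustration; only the idea of testing the pid under
    the lock comes from the commit message.

```c
#include <pthread.h>

/* Hypothetical state; pid == 0 means the helper and its pipes are gone. */
struct demo_umh_state {
    pthread_mutex_t lock;
    int pid;
    int writes;     /* successful requests, for demonstration only */
};

static void demo_state_init(struct demo_umh_state *s, int pid)
{
    pthread_mutex_init(&s->lock, NULL);
    s->pid = pid;
    s->writes = 0;
}

/* pipe_ok simulates whether the write to the helper would succeed.
 * Returns 0 on success, -1 if the helper is (or just went) down. */
static int demo_mbox_request(struct demo_umh_state *s, int pipe_ok)
{
    int ret = -1;

    pthread_mutex_lock(&s->lock);
    if (s->pid == 0)
        goto out;       /* an earlier caller already shut the helper down */
    if (!pipe_ok) {
        s->pid = 0;     /* shut down; later callers must not touch the pipes */
        goto out;
    }
    s->writes++;        /* safe: the pipes are known to be alive here */
    ret = 0;
out:
    pthread_mutex_unlock(&s->lock);
    return ret;
}
```

    A caller that slept on the mutex while another caller tore the pipes
    down now sees pid == 0 and returns an error instead of using a freed
    pipe.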

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Reported-by: syzbot+7ade6c94abb2774c0fee@syzkaller.appspotmail.com
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • CONFIG_OUTPUT_FORMAT is an x86-only macro.
    Use objdump to extract the ELF file format instead.

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Reported-by: David S. Miller
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

05 Jun, 2018

1 commit


30 May, 2018

1 commit

  • gcc-7.3.0 reports the following error:

    HOSTCC net/bpfilter/main.o
    In file included from net/bpfilter/main.c:9:0:
    ./include/uapi/linux/bpf.h:12:10: fatal error: linux/bpf_common.h: No such file or directory
    #include

    Fix it by adding an include path.
    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")

    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

29 May, 2018

1 commit

  • bpfilter_process_sockopt is a callback that gets called from
    ip_setsockopt() and ip_getsockopt(). However, when CONFIG_INET is
    disabled, it never gets called at all, and assigning a function to the
    callback pointer results in a link failure:

    net/bpfilter/bpfilter_kern.o: In function `__stop_umh':
    bpfilter_kern.c:(.text.unlikely+0x3): undefined reference to `bpfilter_process_sockopt'
    net/bpfilter/bpfilter_kern.o: In function `load_umh':
    bpfilter_kern.c:(.init.text+0x73): undefined reference to `bpfilter_process_sockopt'

    Since there is no caller in this configuration, I assume we can
    simply make the assignment conditional.
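
    A minimal sketch of a conditional assignment, assuming a preprocessor
    guard. DEMO_HAVE_INET, demo_sockopt_hook and the functions below are
    hypothetical stand-ins; the actual kernel change may use a different
    mechanism to make the assignment conditional.

```c
#include <stddef.h>

/* Define DEMO_HAVE_INET to mimic a build with CONFIG_INET enabled.
 * The hook pointer only exists in that configuration. */
#ifdef DEMO_HAVE_INET
int (*demo_sockopt_hook)(int optname);
#endif

/* The handler is always built. */
int demo_process_sockopt(int optname)
{
    return optname;
}

/* Install the hook only when the pointer exists in this configuration.
 * Returns 1 if the hook was installed, 0 if the assignment was skipped. */
int demo_install_hook(void)
{
#ifdef DEMO_HAVE_INET
    demo_sockopt_hook = demo_process_sockopt;
    return 1;
#else
    return 0;   /* no caller exists, so there is nothing to assign */
#endif
}
```

    Without the guard, the assignment would reference a symbol that is
    never defined in the reduced configuration, which is the kind of link
    failure shown above.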

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

24 May, 2018

3 commits

  • Passing O_CREAT (00000100) to open means we should also pass file
    mode as the third parameter. Creating /dev/console as a regular
    file may not be helpful anyway, so simply drop the flag when
    opening debug_fd.
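
    The rule about O_CREAT can be illustrated with the POSIX open() call.
    The helper names below are hypothetical; the only facts relied on are
    that open() takes a mode argument when O_CREAT is set and fails on a
    missing path when it is not.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open an existing file or device read/write without creating it.
 * No O_CREAT, so no mode argument is needed; a missing path fails. */
static int demo_open_existing(const char *path)
{
    return open(path, O_RDWR);
}

/* If creation is really wanted, O_CREAT must come with an explicit mode,
 * otherwise the permission bits of the new file are undefined. */
static int demo_open_or_create(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}
```

    For a device node such as /dev/console, the first form is the safer
    choice: it never leaves a stray regular file behind when the node is
    missing.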

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Jakub Kicinski
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • BPFILTER could have been enabled without INET causing this build error:
    ERROR: "bpfilter_process_sockopt" [net/bpfilter/bpfilter.ko] undefined!

    Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
    Reported-by: Jakub Kicinski
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
    and user mode helper code that is embedded into bpfilter.ko

    The steps to build bpfilter.ko are the following:
    - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
    - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
      is converted into bpfilter_umh.o object file
      with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
      Example:
      $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
      0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
      0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
      0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
    - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko

    bpfilter_kern.c is normal kernel module code that calls
    the fork_usermode_blob() helper to execute part of its own data
    as a user mode process.

    Notice that _binary_net_bpfilter_bpfilter_umh_start - end
    is placed into .init.rodata section, so it's freed as soon as __init
    function of bpfilter.ko is finished.
    As part of __init, bpfilter.ko performs a first request/reply action
    via the two unix pipes provided by the fork_usermode_blob() helper to
    make sure that umh is healthy. If not, it will kill it via pid.

    Later bpfilter_process_sockopt() will be called from bpfilter hooks
    in get/setsockopt() to pass iptables commands into umh via bpfilter.ko

    If admin does 'rmmod bpfilter', the __exit code of bpfilter.ko will
    kill umh as well.
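
    How code can treat an embedded blob through its start and end symbols
    is sketched below. This is a userspace illustration only: a small
    static array stands in for the objcopy-generated data, and demo_blob,
    demo_blob_len() and demo_looks_like_elf() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in blob. In the real build the bytes and the start/end symbols
 * are produced by objcopy from the bpfilter_umh executable. */
static const char demo_blob[] = { 0x7f, 'E', 'L', 'F', 0x02, 0x01 };
static const char *const demo_blob_start = demo_blob;
static const char *const demo_blob_end = demo_blob + sizeof(demo_blob);

/* The blob length is the distance between the two symbols. */
static size_t demo_blob_len(const char *start, const char *end)
{
    return (size_t)(end - start);
}

/* Sanity check: an executable blob should begin with the ELF magic. */
static int demo_looks_like_elf(const char *start, size_t len)
{
    return len >= 4 && memcmp(start, "\177ELF", 4) == 0;
}
```

    A loader would pass the start pointer and the computed length to
    whatever routine executes the blob.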

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov