30 May, 2018

1 commit

  • [ Upstream commit f29cdfbe33d6915ba8056179b0041279a67e3647 ]

    tcf_skbmod_init() can fail after the idr has been successfully reserved.
    When this happens, every subsequent attempt to configure skbmod rules
    using the same idr value will systematically fail with -ENOSPC, unless
    the first attempt was done using the 'replace' keyword:

    # tc action add action skbmod swap mac index 100
    RTNETLINK answers: Cannot allocate memory
    We have an error talking to the kernel
    # tc action add action skbmod swap mac index 100
    RTNETLINK answers: No space left on device
    We have an error talking to the kernel
    # tc action add action skbmod swap mac index 100
    RTNETLINK answers: No space left on device
    We have an error talking to the kernel
    ...

    Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
    on the error path when the idr has been reserved, but not yet inserted.
    Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
    implicitly become a 'delete' that leaks refcount in act_skbmod module:

    # rmmod act_skbmod; modprobe act_skbmod
    # tc action add action skbmod swap mac index 100
    # tc action add action skbmod swap mac continue index 100
    RTNETLINK answers: File exists
    We have an error talking to the kernel
    # tc action replace action skbmod swap mac continue index 100
    RTNETLINK answers: Cannot allocate memory
    We have an error talking to the kernel
    # tc action list action skbmod
    #
    # rmmod act_skbmod
    rmmod: ERROR: Module act_skbmod is in use

    Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
    Acked-by: Jamal Hadi Salim
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     

19 May, 2018

1 commit

  • [ Upstream commit a52956dfc503f8cc5cfe6454959b7049fddb4413 ]

    When application fails to pass flags in netlink TLV when replacing
    existing skbmod action, the kernel will leak refcnt:

    $ tc actions get action skbmod index 1
    total acts 0

    action order 0: skbmod pipe set smac 00:11:22:33:44:55
    index 1 ref 1 bind 0

    For example, at this point a buggy application replaces the action with
    index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags,
    however refcnt gets bumped:

    $ tc actions get actions skbmod index 1
    total acts 0

    action order 0: skbmod pipe set smac 00:11:22:33:44:55
    index 1 ref 2 bind 0
    $

    Tha patch fixes this by calling tcf_idr_release() on existing actions.

    Fixes: 86da71b57383d ("net_sched: Introduce skbmod action")
    Signed-off-by: Roman Mashak
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Roman Mashak
     

12 Apr, 2018

1 commit

  • [ Upstream commit 2d433610176d6569e8b3a28f67bc72235bf69efc ]

    when the following command

    # tc action replace action skbmod swap mac index 100

    is run for the first time, and tcf_skbmod_init() fails to allocate struct
    tcf_skbmod_params, tcf_skbmod_cleanup() calls kfree_rcu(NULL), thus
    causing the following error:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: __call_rcu+0x23/0x2b0
    PGD 8000000034057067 P4D 8000000034057067 PUD 74937067 PMD 0
    Oops: 0002 [#1] SMP PTI
    Modules linked in: act_skbmod(E) psample ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache jbd2 crc32_pclmul snd_hda_core ghash_clmulni_intel snd_hwdep pcbc snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd glue_helper snd cryptd virtio_balloon joydev soundcore pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_net virtio_blk ata_piix libata crc32c_intel virtio_pci serio_raw virtio_ring virtio i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_skbmod]
    CPU: 3 PID: 3144 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:__call_rcu+0x23/0x2b0
    RSP: 0018:ffffbd2e403e7798 EFLAGS: 00010246
    RAX: ffffffffc0872080 RBX: ffff981d34bff780 RCX: 00000000ffffffff
    RDX: ffffffff922a5f00 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000021f
    R10: 000000003d003000 R11: 0000000000aaaaaa R12: 0000000000000000
    R13: ffffffff922a5f00 R14: 0000000000000001 R15: ffff981d3b698c2c
    FS: 00007f3678292740(0000) GS:ffff981d3fd80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 000000007c57a006 CR4: 00000000001606e0
    Call Trace:
    __tcf_idr_release+0x79/0xf0
    tcf_skbmod_init+0x1d1/0x210 [act_skbmod]
    tcf_action_init_1+0x2cc/0x430
    tcf_action_init+0xd3/0x1b0
    tc_ctl_action+0x18b/0x240
    rtnetlink_rcv_msg+0x29c/0x310
    ? _cond_resched+0x15/0x30
    ? __kmalloc_node_track_caller+0x1b9/0x270
    ? rtnl_calcit.isra.28+0x100/0x100
    netlink_rcv_skb+0xd2/0x110
    netlink_unicast+0x17c/0x230
    netlink_sendmsg+0x2cd/0x3c0
    sock_sendmsg+0x30/0x40
    ___sys_sendmsg+0x27a/0x290
    ? filemap_map_pages+0x34a/0x3a0
    ? __handle_mm_fault+0xbfd/0xe20
    __sys_sendmsg+0x51/0x90
    do_syscall_64+0x6e/0x1a0
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x7f36776a3ba0
    RSP: 002b:00007fff4703b618 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007fff4703b740 RCX: 00007f36776a3ba0
    RDX: 0000000000000000 RSI: 00007fff4703b690 RDI: 0000000000000003
    RBP: 000000005aaaba36 R08: 0000000000000002 R09: 0000000000000000
    R10: 00007fff4703b0a0 R11: 0000000000000246 R12: 0000000000000000
    R13: 00007fff4703b754 R14: 0000000000000001 R15: 0000000000669f60
    Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
    RIP: __call_rcu+0x23/0x2b0 RSP: ffffbd2e403e7798
    CR2: 0000000000000008

    Fix it in tcf_skbmod_cleanup(), ensuring that kfree_rcu(p, ...) is called
    only when p is not NULL.

    Fixes: 86da71b57383 ("net_sched: Introduce skbmod action")
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     

09 Nov, 2017

1 commit

  • This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
    If we hold that refcnt, the netns can never be destroyed until
    all actions are destroyed by user, this breaks our netns design
    which we expect all actions are destroyed when we destroy the
    whole netns.

    Cc: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

03 Nov, 2017

1 commit

  • TC actions have been destroyed asynchronously for a long time,
    previously in a RCU callback and now in a workqueue. If we
    don't hold a refcnt for its netns, we could use the per netns
    data structure, struct tcf_idrinfo, after it has been freed by
    netns workqueue.

    Hold refcnt to ensure netns destroy happens after all actions
    are gone.

    Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
    Reported-by: Lucas Bates
    Tested-by: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

31 Aug, 2017

1 commit

  • Typically, each TC filter has its own action. All the actions of the
    same type are saved in its hash table. But the hash buckets are too
    small that it degrades to a list. And the performance is greatly
    affected. For example, it takes about 0m11.914s to insert 64K rules.
    If we convert the hash table to IDR, it only takes about 0m1.500s.
    The improvement is huge.

    But please note that the test result is based on previous patch that
    cls_flower uses IDR.

    Signed-off-by: Chris Mi
    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Chris Mi
     

14 Apr, 2017

1 commit


08 Mar, 2017

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

16 Sep, 2016

1 commit

  • This action is intended to be an upgrade from a usability perspective
    from pedit (as well as operational debugability).
    Compare this:

    sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action pedit munge offset -14 u8 set 0x02 \
    munge offset -13 u8 set 0x15 \
    munge offset -12 u8 set 0x15 \
    munge offset -11 u8 set 0x15 \
    munge offset -10 u16 set 0x1515 \
    pipe

    to:

    sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action skbmod dmac 02:15:15:15:15:15

    Also try to do a MAC address swap with pedit or worse
    try to debug a policy with destination mac, source mac and
    etherype. Then make few rules out of those and you'll get my point.

    In the future common use cases on pedit can be migrated to this action
    (as an example different fields in ip v4/6, transports like tcp/udp/sctp
    etc). For this first cut, this allows modifying basic ethernet header.

    The most important ethernet use case at the moment is when redirecting or
    mirroring packets to a remote machine. The dst mac address needs a re-write
    so that it doesnt get dropped or confuse an interconnecting (learning) switch
    or dropped by a target machine (which looks at the dst mac). And at times
    when flipping back the packet a swap of the MAC addresses is needed.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim