23 Sep, 2016

1 commit


20 Sep, 2016

1 commit


16 Sep, 2016

1 commit

  • This action is intended as an upgrade over pedit in usability
    (as well as in operational debuggability).
    Compare this:

    sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action pedit munge offset -14 u8 set 0x02 \
    munge offset -13 u8 set 0x15 \
    munge offset -12 u8 set 0x15 \
    munge offset -11 u8 set 0x15 \
    munge offset -10 u16 set 0x1515 \
    pipe

    to:

    sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action skbmod dmac 02:15:15:15:15:15

    Also, try to do a MAC address swap with pedit or, worse,
    try to debug a policy with destination MAC, source MAC and
    ethertype. Then make a few rules out of those and you'll get my point.

    In the future common use cases on pedit can be migrated to this action
    (as an example different fields in ip v4/6, transports like tcp/udp/sctp
    etc). For this first cut, this allows modifying basic ethernet header.

    The most important ethernet use case at the moment is when redirecting or
    mirroring packets to a remote machine. The dst MAC address needs a rewrite
    so that the packet doesn't get dropped by, or confuse, an interconnecting
    (learning) switch, and doesn't get dropped by the target machine (which
    looks at the dst MAC). And at times, when flipping the packet back, a swap
    of the MAC addresses is needed.
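
    The two header operations described above (rewriting the dst MAC and
    swapping the MACs) can be sketched on a raw ethernet frame. This is a
    minimal illustration in Python, not the kernel code; the helper names
    parse_mac, set_dmac and swap_macs are hypothetical.

```python
def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) != 6:
        raise ValueError("a MAC address has exactly 6 octets")
    return raw


def set_dmac(frame, mac):
    """Return a copy of frame with bytes 0..5 (destination MAC) replaced."""
    return parse_mac(mac) + frame[6:]


def swap_macs(frame):
    """Return a copy of frame with destination and source MACs exchanged."""
    return frame[6:12] + frame[0:6] + frame[12:]


# Broadcast dst, locally administered src, IPv4 ethertype, dummy payload.
frame = (parse_mac("ff:ff:ff:ff:ff:ff") + parse_mac("02:00:00:00:00:01")
         + b"\x08\x00" + b"payload")
rewritten = set_dmac(frame, "02:15:15:15:15:15")
swapped = swap_macs(rewritten)
```

    Only the first 12 bytes are touched; the ethertype and payload are left
    alone in both helpers.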

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

11 Sep, 2016

1 commit

  • This action can be used before redirecting packets to a shared tunnel
    device, or when redirecting packets arriving from such a device.

    The action will release the metadata created by the tunnel device
    (decap), or set the metadata with the specified values for encap
    operation.

    For example, the following flower filter will forward all ICMP packets
    destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
    redirecting, metadata for the vxlan tunnel is created using the
    tunnel_key action and its arguments:

    $ tc filter add dev net0 protocol ip parent ffff: \
    flower \
    ip_proto 1 \
    dst_ip 11.11.11.2 \
    action tunnel_key set \
    src_ip 11.11.0.1 \
    dst_ip 11.11.0.2 \
    id 11 \
    action mirred egress redirect dev vxlan0

    Signed-off-by: Amir Vadai
    Signed-off-by: Hadar Hen Zion
    Reviewed-by: Shmulik Ladkani
    Acked-by: Jamal Hadi Salim
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Amir Vadai
     

19 Aug, 2016

1 commit

  • The current vlan push action supports only vid and protocol options.
    Add priority option.

    Example script that adds vlan push action with vid and
    priority:

    tc filter add dev veth0 protocol ip parent ffff: \
    flower \
    indev veth0 \
    action vlan push id 100 priority 5
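
    For orientation, the vid and priority given above end up in the 16-bit
    802.1Q Tag Control Information field: 3 bits of priority, 1 drop-eligible
    bit and 12 bits of VLAN id. A small Python sketch of that packing follows;
    the function name vlan_tci is hypothetical and this is not the kernel code.

```python
def vlan_tci(vid, priority=0, dei=0):
    """Pack priority (3 bits), DEI (1 bit) and VLAN id (12 bits) into a TCI."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN id must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    return (priority << 13) | (dei << 12) | vid


# Values from the example: id 100, priority 5.
tci = vlan_tci(100, priority=5)
```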

    Signed-off-by: Hadar Hen Zion
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Hadar Hen Zion
     

26 Jul, 2016

2 commits

  • After the previous patch, struct tc_action should be enough
    to represent the generic tc action, tcf_common is not necessary
    any more. This patch gets rid of it to make tc action code
    more readable.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • struct tc_action is confusing, currently we use it for two purposes:
    1) Pass in arguments and carry out results from helper functions
    2) A generic representation for tc actions

    The first one is error-prone, since we need to make sure we don't
    miss anything. This patch aims to get rid of this use, by moving
    tc_action into tcf_common, so that they are allocated together
    in hashtable and can be cast'ed easily.

    And together with the following patch, we could really make
    tc_action a generic representation for all tc actions and each
    type of action can inherit from it.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

25 Jul, 2016

1 commit


05 Jul, 2016

1 commit

  • Extremely useful for setting the packet type to host, so I don't
    have to modify the dst MAC address using pedit (which requires
    that I know the MAC address).

    Example usage:
    tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \
    match ip src 5.5.5.5/32 \
    flowid 1:5 action skbedit ptype host

    This will tag all packets incoming from 5.5.5.5 with type
    PACKET_HOST
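
    Why this helps can be seen from a rough model of how a received frame
    gets its packet type from the destination MAC. The sketch below is
    Python, simplified, and not the kernel's code; the function names are
    hypothetical, while the numeric values follow the PACKET_* constants in
    linux/if_packet.h.

```python
PACKET_HOST = 0
PACKET_BROADCAST = 1
PACKET_MULTICAST = 2
PACKET_OTHERHOST = 3


def default_pkt_type(dst_mac, dev_mac):
    """Simplified classification of a frame by its destination MAC."""
    if dst_mac == b"\xff" * 6:
        return PACKET_BROADCAST
    if dst_mac[0] & 0x01:          # group bit set: multicast
        return PACKET_MULTICAST
    if dst_mac == dev_mac:
        return PACKET_HOST
    return PACKET_OTHERHOST


def skbedit_ptype_host(pkt_type):
    """Model of 'skbedit ptype host': force the type, ignore the dst MAC."""
    return PACKET_HOST


dev = bytes.fromhex("020000000001")
foreign_dst = bytes.fromhex("020000000099")   # frame addressed to someone else
before = default_pkt_type(foreign_dst, dev)
after = skbedit_ptype_host(before)
```

    The frame keeps its original dst MAC; only the type attached to the
    packet changes.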

    Signed-off-by: Jamal Hadi Salim
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

30 Jun, 2016

1 commit


24 Jun, 2016

1 commit

  • Alexey reported that we do a GFP_KERNEL allocation while
    holding the spinlock tcf_lock. Actually we don't have
    to take that spinlock in all cases, especially not
    for a new action we have just created. To modify existing
    actions, we still need this spinlock to make sure
    the whole update is atomic.

    For net-next, we can get rid of this spinlock because
    we already hold the RTNL lock on slow path, and on fast
    path we can use RCU to protect the metalist.

    Joint work with Jamal.

    Reported-by: Alexey Khoroshilov
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

08 Jun, 2016

1 commit


04 May, 2016

1 commit


11 Mar, 2016

2 commits


02 Mar, 2016

1 commit

  • This action allows a sending side to encapsulate arbitrary metadata,
    which is decapsulated by the receiving end.
    The sender runs in encode mode and the receiver in decode mode.
    Both sender and receiver must specify the same ethertype.
    At some point we hope to have a registered ethertype, and we'll
    then provide a default so the user doesn't have to specify it.
    For now we require the user to specify it.

    Let's show example usage where we encode icmp from a sender towards
    a receiver with an skbmark of 17; both sender and receiver use an
    ethertype of 0xdead to interoperate.

    YYYY: Lets start with Receiver-side policy config:
    xxx: add an ingress qdisc
    sudo tc qdisc add dev $ETH ingress

    xxx: any packets with ethertype 0xdead will be subjected to ife decoding
    xxx: we then restart the classification so we can match on icmp at prio 3
    sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
    u32 match u32 0 0 flowid 1:1 \
    action ife decode reclassify

    xxx: on restarting the classification from above if it was an icmp
    xxx: packet, then match it here and continue to the next rule at prio 4
    xxx: which will match based on skb mark of 17
    sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
    u32 match ip protocol 1 0xff flowid 1:1 \
    action continue

    xxx: match on skbmark of 0x11 (decimal 17) and accept
    sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
    handle 0x11 fw flowid 1:1 \
    action ok

    xxx: Lets show the decoding policy
    sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
    xxx:
    filter pref 2 u32
    filter pref 2 u32 fh 800: ht divisor 1
    filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 0 success 0)
    match 00000000/00000000 at 0 (success 0 )
    action order 1: ife decode action reclassify
    index 1 ref 1 bind 1 installed 14 sec used 14 sec
    type: 0x0
    Metadata: allow mark allow hash allow prio allow qmap
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    xxx:
    Observe that the above lists all the metadata it can decode. Typically
    these submodules will already be compiled into a monolithic kernel or
    loaded as modules.

    YYYY: Lets show the sender side now ..

    xxx: Add an egress qdisc on the sender netdev
    sudo tc qdisc add dev $ETH root handle 1: prio
    xxx:
    xxx: Match all icmp packets to 192.168.122.237/24, then
    xxx: tag the packet with skb mark of decimal 17, then
    xxx: Encode it with:
    xxx: ethertype 0xdead
    xxx: add skb->mark to whitelist of metadatum to send
    xxx: rewrite target dst MAC address to 02:15:15:15:15:15
    xxx:
    sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 u32 \
    match ip dst 192.168.122.237/24 \
    match ip protocol 1 0xff \
    flowid 1:2 \
    action skbedit mark 17 \
    action ife encode \
    type 0xDEAD \
    allow mark \
    dst 02:15:15:15:15:15

    xxx: Lets show the encoding policy
    sudo tc -s filter ls dev $ETH parent 1: protocol ip
    xxx:
    filter pref 10 u32
    filter pref 10 u32 fh 800: ht divisor 1
    filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2 (rule hit 0 success 0)
    match c0a87aed/ffffffff at 16 (success 0 )
    match 00010000/00ff0000 at 8 (success 0 )

    action order 1: skbedit mark 17
    index 6 ref 1 bind 1
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    action order 2: ife encode action pipe
    index 3 ref 1 bind 1
    dst MAC: 02:15:15:15:15:15 type: 0xDEAD
    Metadata: allow mark
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    xxx:

    Test by sending a ping from the sender to the destination.
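
    To make the encode/decode idea concrete, here is a Python sketch of
    carrying an skb mark of 17 as a type-length-value item and reading it
    back. This message does not spell out the wire format, so the layout used
    here (16-bit type, 16-bit length covering header plus value, value padded
    to 4 bytes) and the type id META_SKBMARK = 1 are assumptions made for
    illustration, as are the function names.

```python
import struct

META_SKBMARK = 1  # assumed metadata type id, for illustration only


def encode_tlv(meta_type, value):
    """Encode one metadata item as type, length, value, zero padding."""
    length = 4 + len(value)              # 4-byte header plus the value
    padding = (-length) % 4              # pad the item to a 4-byte boundary
    return struct.pack("!HH", meta_type, length) + value + b"\x00" * padding


def decode_tlvs(blob):
    """Decode a run of TLVs into a {type: value_bytes} dict."""
    items = {}
    offset = 0
    while offset + 4 <= len(blob):
        meta_type, length = struct.unpack_from("!HH", blob, offset)
        if length < 4 or offset + length > len(blob):
            raise ValueError("malformed TLV")
        items[meta_type] = blob[offset + 4:offset + length]
        offset += length + (-length) % 4
    return items


encoded = encode_tlv(META_SKBMARK, struct.pack("!I", 17))   # sender side
decoded = decode_tlvs(encoded)                              # receiver side
mark = struct.unpack("!I", decoded[META_SKBMARK])[0]
```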

    Signed-off-by: Jamal Hadi Salim
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

17 Feb, 2016

1 commit


19 Sep, 2015

1 commit


27 Aug, 2015

1 commit

  • Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
    with extra care taken to free bpf programs after rcu grace period.
    Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
    and final destruction is done from tc_action_ops->cleanup() callback that is
    called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
    bind and refcnt reach zero which is only possible when classifier is destroyed.
    Previous two patches fixed the last two classifiers (tcindex and rsvp) to
    call tcf_exts_destroy() from rcu callback.

    Similar to gact/mirred, there is a race between prog->filter and
    prog->tcf_action, meaning that the program being replaced may use the
    previous default action if it happened to return TC_ACT_UNSPEC.
    The act_mirred race between tcf_action and tcfm_dev is similar.
    In all cases the race is harmless.
    Long term we may want to improve the situation by replacing the whole
    tc_action->priv as a single pointer instead of updating inner fields one by one.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

09 Jul, 2015

2 commits

  • Like act_gact, act_mirred can be lockless in packet processing

    1) Use percpu stats
    2) update lastuse only every clock tick to avoid false sharing
    3) use rcu to protect tcfm_dev
    4) Remove spinlock usage, as it is no longer needed.

    Next step : add multi queue capability to ifb device

    Signed-off-by: Eric Dumazet
    Cc: Alexei Starovoitov
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Acked-by: Jamal Hadi Salim
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Second step for gact RCU operation :

    We want to get rid of the spinlock protecting gact operations.
    Stats (packets/bytes) will soon be per cpu.

    gact_determ() would not work without a central packet counter,
    so let's add one for this mode.
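
    The need for a single shared counter can be illustrated with a small
    Python model of a deterministic "every Nth packet" decision. It is a
    sketch under assumptions, not the kernel function; the class and
    attribute names are hypothetical.

```python
import itertools


class DetermGact:
    """Take alt_action on every Nth packet, default_action otherwise."""

    def __init__(self, default_action, alt_action, every):
        if every < 1:
            raise ValueError("every must be at least 1")
        self.default_action = default_action
        self.alt_action = alt_action
        self.every = every
        self._packets = itertools.count(1)   # one central packet counter

    def decide(self):
        seen = next(self._packets)
        if seen % self.every == 0:
            return self.alt_action
        return self.default_action


gact = DetermGact("pass", "drop", every=3)
decisions = [gact.decide() for _ in range(6)]
```

    If each CPU kept its own packet count, each would hit the Nth packet on
    its own schedule and the overall 1-in-N pattern would be lost, which is
    why this mode keeps a central counter even when the stats go per cpu.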

    Signed-off-by: Eric Dumazet
    Cc: Alexei Starovoitov
    Acked-by: Jamal Hadi Salim
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Mar, 2015

1 commit

  • This work extends the "classic" BPF programmable tc action by extending
    its scope also to native eBPF code!

    Together with commit e2e9b6541dd4 ("cls_bpf: add initial eBPF support
    for programmable classifiers") this adds the facility to implement fully
    flexible classifiers and actions for tc that can be written in a C
    subset in user space, "safely" loaded into the kernel, and run at
    native speed when JITed.

    Also, since eBPF maps can be shared between eBPF programs, it offers the
    possibility that cls_bpf and act_bpf can share data 1) between themselves
    and 2) between user space applications. That means that, f.e. customized
    runtime statistics can be collected in user space, but also more importantly
    classifier and action behaviour could be altered based on map input from
    the user space application.

    For the remaining details on the workflow and integration, see the cls_bpf
    commit e2e9b6541dd4. Preliminary iproute2 part can be found under [1].

    [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act

    Signed-off-by: Daniel Borkmann
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Acked-by: Jiri Pirko
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

20 Jan, 2015

1 commit

  • This tc action allows you to retrieve the connection tracking mark.
    This action has been used heavily by openwrt for a few years now.

    There are known limitations currently:

    - doesn't work for initial packets, since we only query the ct table.
      Fine given use case is for returning packets
    - no implicit defrag.
      frags should be rare so fix later..
    - won't work for more complex tasks, e.g. lookup of other extensions
      since we have no means to store results
    - we still have a 2nd lookup later on via normal conntrack path.
      This shouldn't break anything though since skb->nfct isn't altered.

    V2:
    remove unnecessary braces (Jiri)
    change the action identifier to 14 (Jiri)
    Fix some stylistic issues caught by checkpatch
    V3:
    Move module params to bottom (Cong)
    Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong)

    Acked-by: Jiri Pirko
    Signed-off-by: Felix Fietkau
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Felix Fietkau
     

18 Jan, 2015

1 commit


25 Nov, 2014

1 commit


22 Nov, 2014

1 commit

  • This tc action allows working with vlan tagged skbs. The two supported
    sub-actions are header pop and header push.
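
    What push and pop mean for the frame as seen on the wire can be sketched
    as follows. This is a Python illustration operating on raw bytes, not the
    kernel implementation; vlan_push and vlan_pop are hypothetical names, and
    0x8100 is the standard 802.1Q tag protocol id.

```python
import struct

ETH_P_8021Q = 0x8100


def vlan_push(frame, vid):
    """Insert a 4-byte 802.1Q tag after the two MAC addresses."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN id must fit in 12 bits")
    tag = struct.pack("!HH", ETH_P_8021Q, vid)
    return frame[:12] + tag + frame[12:]


def vlan_pop(frame):
    """Remove the outer 802.1Q tag; return (untagged_frame, vid)."""
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != ETH_P_8021Q:
        raise ValueError("frame carries no 802.1Q tag")
    return frame[:12] + frame[16:], tci & 0x0FFF


untagged = bytes(12) + b"\x08\x00" + b"payload"   # zeroed MACs, IPv4 ethertype
tagged = vlan_push(untagged, 100)
restored, vid = vlan_pop(tagged)
```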

    Signed-off-by: Jiri Pirko
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

13 Feb, 2014

1 commit

  • Now we can totally hide it from modules. The tcf_hash_*() APIs
    will operate on struct tc_action; modules don't need to care about
    the details.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

07 Dec, 2013

1 commit

  • Several files refer to an old address for the Free Software Foundation
    in the file header comment. Resolve by replacing the address with
    the URL so that we do not have to keep
    updating the header comments anytime the address changes.

    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jeff Kirsher
     

20 Aug, 2010

1 commit

  • net/sched: add ACT_CSUM action to update packets checksums

    ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some
    altered checksums in IPv4 and IPv6 packets. The following checksums are
    supported by this patch:
    - IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite
    - IPv6: ICMPv6, TCP, UDP & UDPLite
    A single action can be asked to update several kinds of checksums,
    for example when the packet flow mixes TCP, UDP and UDPLite.

    A usage example is given in the associated iproute2 patch.
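
    As a reminder of what "re-compute" means for the IPv4 header case, the
    standard Internet checksum (RFC 1071) is the one's complement of the
    one's-complement sum of the header's 16-bit words. The Python sketch below
    recomputes it after a header byte is altered, as an edit by pedit would
    do; it is an illustration only, not the code from this patch.

```python
def internet_checksum(data):
    """RFC 1071 checksum over data, returned as a 16-bit integer."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# A 20-byte IPv4 header with the checksum field (bytes 10..11) zeroed.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
original = internet_checksum(header)

header[8] = 0x3F                              # alter the TTL, as an edit would
recomputed = internet_checksum(header)
header[10:12] = recomputed.to_bytes(2, "big") # store the fresh checksum
```

    A header that carries a correct checksum sums to zero when the function is
    run over it again, which gives a quick sanity check.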

    Version 3 changes:
    - remove useless goto instructions
    - improve IPv6 hop options decoding

    Version 2 changes:
    - coding style correction
    - remove useless arguments of some functions
    - use stack in tcf_csum_dump()
    - add tcf_csum_skb_nextlayer() to factor code

    Signed-off-by: Gregoire Baron
    Acked-by: jamal
    Signed-off-by: David S. Miller

    Grégoire Baron
     

25 Jul, 2010

1 commit

  • This fixes a hang when the target device of a mirred packet classifier
    action is removed.

    If a mirror or redirection action is configured to cause packets
    to go to another device, the classifier holds a ref count, but it assumed
    the administrator had cleaned up all redirections before removing the
    device. The fix is to add a notifier and clean up during unregister.

    The new list is implicitly protected by RTNL mutex because
    it is held during filter add/delete as well as notifier.

    Signed-off-by: Stephen Hemminger
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    stephen hemminger
     

23 Oct, 2009

1 commit


13 Sep, 2008

1 commit


11 Oct, 2007

1 commit

  • Stateless NAT is useful in controlled environments where restrictions are
    placed on through traffic such that we don't need connection tracking to
    correctly NAT protocol-specific data.

    In particular, this is of interest when the number of flows or the number
    of addresses being NATed is large, or if connection tracking information
    has to be replicated and where it is not practical to do so.

    Previously we had stateless NAT functionality which was integrated into
    the IPv4 routing subsystem. This was a great solution as long as the NAT
    worked on a subnet to subnet basis such that the number of NAT rules was
    relatively small. The reason is that for SNAT the routing based system
    had to perform a linear scan through the rules.

    If the number of rules is large then major renovations would have to take
    place in the routing subsystem to make this practical.

    For the time being, the least intrusive way of achieving this is to use
    the u32 classifier written by Alexey Kuznetsov along with the actions
    infrastructure implemented by Jamal Hadi Salim.

    The following patch is an attempt at this problem by creating a new nat
    action that can be invoked from u32 hash tables which would allow large
    number of stateless NAT rules that can be used/updated in constant time.

    The actual NAT code is mostly based on the previous stateless NAT code
    written by Alexey. In future we might be able to utilise the protocol
    NAT code from netfilter to improve support for other protocols.
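
    The stateless idea can be sketched as a pure function of the address and
    the rule, with no per-flow table consulted. The Python below assumes a
    rule maps one prefix onto another of the same length while keeping the
    host bits; that rule shape and the name stateless_nat are assumptions for
    illustration, not a description of the action's actual interface.

```python
import ipaddress


def stateless_nat(addr, old_net, new_net):
    """Map addr from old_net into new_net, keeping the host bits."""
    addr = ipaddress.IPv4Address(addr)
    old = ipaddress.IPv4Network(old_net)
    new = ipaddress.IPv4Network(new_net)
    if old.prefixlen != new.prefixlen:
        raise ValueError("prefixes must have the same length")
    if addr not in old:
        return addr                           # rule does not apply
    host_bits = int(addr) & int(old.hostmask)
    return ipaddress.IPv4Address(int(new.network_address) | host_bits)


translated = stateless_nat("10.0.0.5", "10.0.0.0/24", "192.168.1.0/24")
untouched = stateless_nat("10.0.1.5", "10.0.0.0/24", "192.168.1.0/24")
```

    Because the result depends only on the packet and the rule, the same
    translation can be computed on any machine without replicating state.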

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

23 Sep, 2006

1 commit

  • This was simply making templates of functions and mostly causing a lot
    of code duplication in the classifier action modules.

    We solve this more cleanly by having a common "struct tcf_common" that
    hash worker functions contained once in act_api.c can work with.

    Callers work with real action objects that have the common struct
    plus their module specific struct members. You go from a common
    object to the higher level one using a "to_foo()" macro which makes
    use of container_of() to do the dirty work.
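
    The container_of() step can be imitated in user space with Python's
    ctypes, which exposes field offsets in the same way. This is only a
    sketch of the pointer arithmetic: the struct fields below are
    hypothetical, and TcfFoo stands in for any module-specific action struct.

```python
import ctypes


class TcfCommon(ctypes.Structure):
    """Stand-in for the common part shared by all actions."""
    _fields_ = [("index", ctypes.c_uint32), ("refcnt", ctypes.c_int32)]


class TcfFoo(ctypes.Structure):
    """Module-specific action: its own members plus the embedded common."""
    _fields_ = [("foo_flags", ctypes.c_uint32), ("common", TcfCommon)]


def to_foo(common):
    """container_of(): step back from the member to the enclosing struct."""
    base = ctypes.addressof(common) - TcfFoo.common.offset
    return TcfFoo.from_address(base)


foo = TcfFoo(foo_flags=7, common=TcfCommon(index=3, refcnt=1))
recovered = to_foo(foo.common)     # generic code only ever saw the common part
recovered.foo_flags = 9            # writes through to the same memory as foo
```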

    This also kills off act_generic.h, which was only used by act_simple.c;
    keeping it around was more work than it was worth.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Mar, 2006

1 commit


25 Apr, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds