20 Oct, 2011

1 commit

  • This is just a cleanup.

    My testing version of Smatch warns about this:
    net/core/filter.c +380 check_load_and_stores(6)
    warn: check 'flen' for negative values

    flen comes from the user. We try to clamp the values here between 1
    and BPF_MAXINSNS but the clamp doesn't work because it could be
    negative. This is a bug, but it's not exploitable.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
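The reason a range check can miss a negative length is easiest to see in isolation. The following is a minimal sketch, assuming the check has the form "reject if zero or above the limit"; `DEMO_MAXINSNS`, `len_ok_signed()` and `len_ok_unsigned()` are hypothetical names used only for illustration and are not taken from the kernel source.

```c
#include <stdbool.h>

/* Illustrative limit for this sketch; deliberately not named BPF_MAXINSNS. */
#define DEMO_MAXINSNS 4096

/* Signed length: a negative value is neither zero nor above the limit,
 * so it passes the range check. */
bool len_ok_signed(int flen)
{
    return !(flen == 0 || flen > DEMO_MAXINSNS);
}

/* Unsigned length: the same bit pattern becomes a very large positive
 * number, so the upper bound rejects it. */
bool len_ok_unsigned(unsigned int flen)
{
    return !(flen == 0 || flen > DEMO_MAXINSNS);
}
```

With these definitions, `len_ok_signed(-1)` accepts the value while `len_ok_unsigned((unsigned int)-1)` rejects it, which is the kind of gap the Smatch warning points at.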
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

23 May, 2011

1 commit


28 Apr, 2011

1 commit

  • In order to speed up packet filtering, here is an implementation of a
    JIT compiler for x86_64

    It is disabled by default, and must be enabled by the admin.

    echo 1 >/proc/sys/net/core/bpf_jit_enable

    It uses module_alloc() and module_free() to get memory in the 2GB text
    kernel range since we call helper functions from the generated code.

    EAX : BPF A accumulator
    EBX : BPF X index register
    RDI : pointer to skb (first argument given to JIT function)
    RBP : frame pointer (even if CONFIG_FRAME_POINTER=n)
    r9d : skb->len - skb->data_len (headlen)
    r8 : skb->data

    To get a trace of generated code, use :

    echo 2 >/proc/sys/net/core/bpf_jit_enable

    Example of generated code :

    # tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24

    flen=18 proglen=147 pass=3 image=ffffffffa00b5000
    JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
    JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
    JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00
    JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be
    JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0
    JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
    JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00
    JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24
    JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
    JIT code: ffffffffa00b5090: c0 c9 c3

    The BPF program is 144 bytes long, so the native program is almost the same size ;)

    (000) ldh [12]
    (001) jeq #0x800 jt 2 jf 8
    (002) ld [26]
    (003) and #0xffffff00
    (004) jeq #0xc0a81400 jt 16 jf 5
    (005) ld [30]
    (006) and #0xffffff00
    (007) jeq #0xc0a81400 jt 16 jf 17
    (008) jeq #0x806 jt 10 jf 9
    (009) jeq #0x8035 jt 10 jf 17
    (010) ld [28]
    (011) and #0xffffff00
    (012) jeq #0xc0a81400 jt 16 jf 13
    (013) ld [38]
    (014) and #0xffffff00
    (015) jeq #0xc0a81400 jt 16 jf 17
    (016) ret #65535
    (017) ret #0

    Signed-off-by: Eric Dumazet
    Cc: Arnaldo Carvalho de Melo
    Cc: Ben Hutchings
    Cc: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Eric Dumazet
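The constants in the BPF listing above follow from the tcpdump expression: `0xffffff00` is the /24 netmask and `0xc0a81400` is 192.168.20.0. Offsets 26 and 30 are the IPv4 source and destination addresses behind a 14-byte Ethernet header, and 28 and 38 are the ARP sender and target addresses. A small sketch of the comparison, with hypothetical helper names `ipv4()` and `in_192_168_20_0_24()`:

```c
#include <stdint.h>

/* Pack four dotted-quad octets into a 32-bit value with the first octet
 * in the most significant byte, matching what a BPF word load yields. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Mirrors one "and #0xffffff00" / "jeq #0xc0a81400" pair from the listing. */
int in_192_168_20_0_24(uint32_t addr)
{
    return (addr & 0xffffff00u) == 0xc0a81400u;
}
```

The filter applies this test up to four times (IPv4 source and destination, ARP/RARP sender and target) and accepts the packet as soon as one of them matches.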
     

09 Dec, 2010

1 commit


07 Dec, 2010

1 commit

  • Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism,
    to be able to build advanced filters.

    This can help spread packets across several sockets with a fast
    selection, after RPS dispatch to N cpus for example, or catch a
    percentage of flows in one queue.

    tcpdump -s 500 "cpu = 1" :

    [0] ld CPU
    [1] jeq #1 jt 2 jf 3
    [2] ret #500
    [3] ret #0

    # take 12.5 % of flows (average)
    tcpdump -s 1000 "rxhash & 7 = 2" :

    [0] ld RXHASH
    [1] and #7
    [2] jeq #2 jt 3 jf 4
    [3] ret #1000
    [4] ret #0

    Signed-off-by: Eric Dumazet
    Cc: Rui
    Acked-by: Changli Gao
    Signed-off-by: David S. Miller

    Eric Dumazet
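The 12.5 % figure follows from the mask: `rxhash & 7` has eight possible values and only one of them (2) is accepted. A user-space sketch of the second filter, assuming uniformly distributed hash values; `rxhash_filter()` and `count_accepted()` are illustrative names, not kernel functions:

```c
#include <stdint.h>

/* Snap length returned for accepted packets, as in "ret #1000". */
#define DEMO_SNAPLEN 1000u

/* Step-by-step equivalent of the five-instruction filter. */
uint32_t rxhash_filter(uint32_t rxhash)
{
    uint32_t a = rxhash;               /* [0] ld RXHASH */
    a &= 7;                            /* [1] and #7    */
    return a == 2 ? DEMO_SNAPLEN : 0;  /* [2] jeq, [3] ret #1000, [4] ret #0 */
}

/* Count how many of the hash values 0 .. n-1 the filter accepts. */
uint32_t count_accepted(uint32_t n)
{
    uint32_t hits = 0;
    for (uint32_t h = 0; h < n; h++)
        if (rxhash_filter(h) != 0)
            hits++;
    return hits;
}
```

Over any run of hash values whose length is a multiple of eight, exactly one in eight is accepted.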
     

20 Nov, 2010

1 commit

  • Remove pc variable to avoid arithmetic to compute fentry at each filter
    instruction. Jumps directly manipulate fentry pointer.

    As the last instruction of filter[] is guaranteed to be a RETURN, and
    all jumps are before the last instruction, we don't need to check filter
    bounds (number of instructions in filter array) at each iteration, so we
    remove it from sk_run_filter() params.

    On x86_32 remove f_k var introduced in commit 57fe93b374a6b871
    (filter: make sure filters dont read uninitialized memory)

    Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
    avoid too many ifdefs in this code.

    This helps the compiler use CPU registers to hold fentry and the A
    accumulator.

    On x86_32, this saves 401 bytes and, more importantly, sk_run_filter()
    runs much faster because of lower register pressure (one less conditional
    branch per BPF instruction).

    # size net/core/filter.o net/core/filter_pre.o
       text    data     bss     dec     hex filename
       2948       0       0    2948     b84 net/core/filter.o
       3349       0       0    3349     d15 net/core/filter_pre.o

    on x86_64 :
    # size net/core/filter.o net/core/filter_pre.o
       text    data     bss     dec     hex filename
       5173       0       0    5173    1435 net/core/filter.o
       5224       0       0    5224    1468 net/core/filter_pre.o

    Signed-off-by: Eric Dumazet
    Acked-by: Changli Gao
    Signed-off-by: David S. Miller

    Eric Dumazet
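The change from an index-driven loop to a pointer-driven loop can be sketched in user space. The instruction set below (`struct demo_insn` and the `OP_*` codes) is a made-up four-opcode subset, not the kernel's `struct sock_filter`; jump offsets are relative to the following instruction, as in classic BPF. The pointer version is only safe under the assumption stated in the commit message: a prior check guarantees that every path ends in a return instruction.

```c
#include <stdint.h>

enum demo_op { OP_LD_IMM, OP_AND_K, OP_JEQ_K, OP_RET_K };

struct demo_insn {
    enum demo_op code;
    uint8_t jt, jf;     /* relative jump offsets (true / false) */
    uint32_t k;         /* immediate operand */
};

/* Index-based loop: needs the program length and tests it every iteration. */
uint32_t run_indexed(const struct demo_insn *filter, int flen)
{
    uint32_t A = 0;
    for (int pc = 0; pc < flen; pc++) {
        const struct demo_insn *fentry = &filter[pc];
        switch (fentry->code) {
        case OP_LD_IMM: A = fentry->k; break;
        case OP_AND_K:  A &= fentry->k; break;
        case OP_JEQ_K:  pc += (A == fentry->k) ? fentry->jt : fentry->jf; break;
        case OP_RET_K:  return fentry->k;
        }
    }
    return 0;
}

/* Pointer-based loop: no length parameter and no bounds test; jumps move
 * the instruction pointer directly. Requires a validated program. */
uint32_t run_pointer(const struct demo_insn *fentry)
{
    uint32_t A = 0;
    for (;; fentry++) {
        switch (fentry->code) {
        case OP_LD_IMM: A = fentry->k; continue;
        case OP_AND_K:  A &= fentry->k; continue;
        case OP_JEQ_K:  fentry += (A == fentry->k) ? fentry->jt : fentry->jf; continue;
        case OP_RET_K:  return fentry->k;
        default:        return 0;   /* unknown opcode: stop rather than run on */
        }
    }
}
```

Both functions give the same result for a well-formed program; the second simply has one variable and one comparison fewer in the hot loop.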
     

19 Nov, 2010

1 commit


26 Jun, 2010

1 commit

  • Gcc is currently unable to optimize the switch statement in
    sk_run_filter() because the case labels are sparse. This patch replaces the
    OR'd labels with ordered, sequential case labels. The sk_chk_filter()
    function is modified to patch/replace the original OPCODES with an
    ordered but equivalent form. Gcc is now able to transform the
    switch statement in sk_run_filter() into a jump table of complexity O(1).

    Before this patch, gcc generated a sequence of conditional branches (O(n)) with a
    .text segment size of 567 bytes (arch x86_64):

    7ff: 8b 06                 mov    (%rsi),%eax
    801: 66 83 f8 35           cmp    $0x35,%ax
    805: 0f 84 d0 02 00 00     je     adb
    80b: 0f 87 07 01 00 00     ja     918
    811: 66 83 f8 15           cmp    $0x15,%ax
    815: 0f 84 c5 02 00 00     je     ae0
    81b: 77 73                 ja     890
    81d: 66 83 f8 04           cmp    $0x4,%ax
    821: 0f 84 17 02 00 00     je     a3e
    827: 77 29                 ja     852
    829: 66 83 f8 01           cmp    $0x1,%ax
    [...]

    With the modification, the compiler translates the switch statement into
    the following jump table fragment:

    7ff: 66 83 3e 2c           cmpw   $0x2c,(%rsi)
    803: 0f 87 1f 02 00 00     ja     a28
    809: 0f b7 06              movzwl (%rsi),%eax
    80c: ff 24 c5 00 00 00 00  jmpq   *0x0(,%rax,8)
    813: 44 89 e3              mov    %r12d,%ebx
    816: e9 43 03 00 00        jmpq   b5e
    81b: 41 89 dc              mov    %ebx,%r12d
    81e: e9 3b 03 00 00        jmpq   b5e

    Furthermore, I reordered the instructions to reduce cache line misses by
    placing the most common instructions first.

    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
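The translation step can be sketched as a lookup table that maps the OR'd opcode values to small sequential codes once, when the filter is checked, so that the interpreter's switch only ever sees a dense range. The `RAW_*` values follow the classic BPF encoding (class, operation and source bits OR'd together); the `D_*` numbering and the `to_dense()` helper are assumptions made for this example and do not reproduce the kernel's own table.

```c
#include <stdint.h>

/* Sparse opcode values as they arrive from user space. */
#define RAW_ALU_ADD_K 0x04
#define RAW_RET_K     0x06
#define RAW_JMP_JEQ_K 0x15
#define RAW_ALU_AND_K 0x54

/* Dense codes for the interpreter; 0 is reserved for "not supported". */
enum dense_code { D_INVALID = 0, D_ALU_ADD_K, D_ALU_AND_K, D_JMP_JEQ_K, D_RET_K };

/* Entries that are not listed are zero-initialised, i.e. D_INVALID. */
static const uint8_t dense_of[] = {
    [RAW_ALU_ADD_K] = D_ALU_ADD_K,
    [RAW_ALU_AND_K] = D_ALU_AND_K,
    [RAW_JMP_JEQ_K] = D_JMP_JEQ_K,
    [RAW_RET_K]     = D_RET_K,
};

/* One-time translation of a raw opcode; unknown values map to D_INVALID. */
enum dense_code to_dense(uint16_t raw)
{
    if (raw >= sizeof(dense_of) / sizeof(dense_of[0]))
        return D_INVALID;
    return (enum dense_code)dense_of[raw];
}
```

A switch over `enum dense_code` has consecutive labels starting at 1, which is the shape a compiler can typically turn into a single bounds check plus an indexed jump.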
     

23 Apr, 2010

1 commit

  • Add an SKF_AD_HATYPE field to the packet ancillary data area, giving
    access to skb->dev->type, as reported in the sll_hatype field.

    When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
    interfaces, there doesn't appear to be a way for the filter program to
    actually find out the underlying hardware type the packet was captured
    on. This patch adds such ability.

    This patch also handles the case where skb->dev can be NULL, such as on
    netlink sockets.

    Signed-off-by: Paul Evans
    Signed-off-by: David S. Miller

    Paul LeoNerd Evans
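The NULL-device case mentioned above amounts to a guard in front of the load. A minimal sketch with stand-in types: `struct demo_dev`, `struct demo_skb` and `load_hatype()` are hypothetical, and the choice to report failure to the caller when no device is attached is an assumption of this example rather than something the commit message spells out.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's net_device and sk_buff; only the fields
 * needed for the illustration are present. */
struct demo_dev { uint16_t type; };
struct demo_skb { const struct demo_dev *dev; };

/* Load the hardware type into *A. Returns 1 on success, or 0 (leaving *A
 * untouched) when the buffer has no device, e.g. on a netlink socket. */
int load_hatype(const struct demo_skb *skb, uint32_t *A)
{
    if (skb->dev == NULL)
        return 0;
    *A = skb->dev->type;
    return 1;
}
```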
     

05 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces
    on the first line to ease grep games.

    struct something
    {

    becomes :

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Oct, 2009

2 commits

  • It can help to be able to filter packets on their queue_mapping.

    If filter performance is not good, we could add a "numqueue" field
    in struct packet_type, so that netif_nit_deliver() and other functions
    can directly ignore packets with an unexpected queue number.

    Let's experiment with this simple filter extension first.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Allow bpf to set a filter to drop packets that don't
    match a specific mark

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    jamal
     

20 Nov, 2008

1 commit

  • SKF_AD_NLATTR allows us to find the first matching attribute in a
    stream of netlink attributes from one offset to the end of the
    netlink message. This is not suitable for finding a specific
    match inside a set of nested attributes.

    For example, in ctnetlink messages, if we look for the CTA_V6_SRC
    attribute in a message that talks about an IPv4 connection,
    SKF_AD_NLATTR returns the offset of CTA_STATUS, which has the same
    value as CTA_V6_SRC but lies outside the nest. To differentiate
    CTA_STATUS and CTA_V6_SRC, we would have to make assumptions about the
    size of the attribute and the usual offset, resulting in horrible
    BPF code.

    This patch adds SKF_AD_NLATTR_NEST, which is a variant of
    SKF_AD_NLATTR that looks for an attribute inside the limits of
    a nested attribute, but not further.

    This patch validates that we have enough room to look for the
    nested attributes - based on a suggestion from Patrick McHardy.

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
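The difference between the flat and the nested search can be reproduced on a hand-built buffer. This is a user-space sketch, not the kernel code: `struct demo_attr`, `put_attr()`, `find_attr()` and `find_attr_nested()` are hypothetical helpers, and the only thing borrowed from netlink is the attribute layout (16-bit length including the header, 16-bit type, payload padded to 4 bytes). Offset 0 doubles as "not found", as in the commit messages.

```c
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN 4u                         /* 16-bit length + 16-bit type */
#define ATTR_ALIGN(len) (((len) + 3u) & ~3u)   /* attributes are 4-byte aligned */

struct demo_attr {
    uint16_t len;    /* header plus payload, padding excluded */
    uint16_t type;
};

/* Write an attribute header at buf[off]; returns the offset that follows it. */
uint32_t put_attr(uint8_t *buf, uint32_t off, uint16_t type, uint16_t payload_len)
{
    struct demo_attr a = { (uint16_t)(ATTR_HDRLEN + payload_len), type };
    memcpy(buf + off, &a, sizeof(a));
    return off + ATTR_ALIGN(a.len);
}

/* Flat search over buf[start, end): offset of the first match, or 0. */
uint32_t find_attr(const uint8_t *buf, uint32_t start, uint32_t end, uint16_t type)
{
    uint32_t off = start;
    while (off <= end && end - off >= ATTR_HDRLEN) {
        struct demo_attr a;
        memcpy(&a, buf + off, sizeof(a));
        if (a.len < ATTR_HDRLEN || a.len > end - off)
            break;                  /* malformed or truncated attribute */
        if (a.type == type)
            return off;
        off += ATTR_ALIGN(a.len);
    }
    return 0;
}

/* Nested search: check that the nest header fits in the message, then
 * search only inside the nest's own payload. */
uint32_t find_attr_nested(const uint8_t *buf, uint32_t nest_off,
                          uint32_t msg_end, uint16_t type)
{
    struct demo_attr nest;
    if (nest_off > msg_end || msg_end - nest_off < ATTR_HDRLEN)
        return 0;
    memcpy(&nest, buf + nest_off, sizeof(nest));
    if (nest.len < ATTR_HDRLEN || nest.len > msg_end - nest_off)
        return 0;
    return find_attr(buf, nest_off + ATTR_HDRLEN, nest_off + nest.len, type);
}
```

If a nest at offset 4 holds a single inner attribute of type 1 and is followed by a top-level attribute of type 3 at offset 16, a flat search for type 3 started inside the nest runs past its end and returns 16, whereas the nested search stops at the nest boundary and returns 0.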
     

10 Apr, 2008

3 commits

  • SKF_AD_NLATTR searches for a netlink attribute, which avoids manually
    parsing and walking attributes. It takes the offset at which to start
    searching in the 'A' register and the attribute type in the 'X' register
    and returns the offset in the 'A' register. When the attribute is not
    found it returns zero.

    A top-level attribute can be located using a filter like this
    (example for nfnetlink, using struct nfgenmsg):

    ...
    {
            /* A = offset of first attribute */
            .code = BPF_LD | BPF_IMM,
            .k = sizeof(struct nlmsghdr) + sizeof(struct nfgenmsg)
    },
    {
            /* X = CTA_PROTOINFO */
            .code = BPF_LDX | BPF_IMM,
            .k = CTA_PROTOINFO,
    },
    {
            /* A = netlink attribute offset */
            .code = BPF_LD | BPF_B | BPF_ABS,
            .k = SKF_AD_OFF + SKF_AD_NLATTR
    },
    {
            /* Exit if not found */
            .code = BPF_JMP | BPF_JEQ | BPF_K,
            .k = 0,
            .jt =
    },
    ...

    A nested attribute below the CTA_PROTOINFO attribute would then
    be parsed like this:

    ...
    {
            /* A += sizeof(struct nlattr) */
            .code = BPF_ALU | BPF_ADD | BPF_K,
            .k = sizeof(struct nlattr),
    },
    {
            /* X = CTA_PROTOINFO_TCP */
            .code = BPF_LDX | BPF_IMM,
            .k = CTA_PROTOINFO_TCP,
    },
    {
            /* A = netlink attribute offset */
            .code = BPF_LD | BPF_B | BPF_ABS,
            .k = SKF_AD_OFF + SKF_AD_NLATTR
    },
    ...

    The data of an attribute can be loaded into 'A' like this:

    ...
    {
            /* X = A (attribute offset) */
            .code = BPF_MISC | BPF_TAX,
    },
    {
            /* A = skb->data[X + k] */
            .code = BPF_LD | BPF_B | BPF_IND,
            .k = sizeof(struct nlattr),
    },
    ...

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The sk_filter function is too big to be inlined. This saves 2296 bytes
    of text on allyesconfig.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Some minor style cleanups:
    * Move __KERNEL__ definitions to one place in filter.h
    * Use const for sk_filter_len
    * Line wrapping
    * Put EXPORT_SYMBOL next to function definition

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

18 Oct, 2007

1 commit


23 Sep, 2006

1 commit

  • Function sk_filter() is called from the tcp_v{4,6}_rcv() functions with arg
    needlock = 0, while the socket is not locked at that moment. In order to avoid
    this and similar issues in the future, use RCU to protect reads of the
    sk->sk_filter field.

    Signed-off-by: Dmitry Mishin
    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Kirill Korotaev

    Dmitry Mishin
     

07 Jan, 2006

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds