17 Nov, 2015

1 commit

  • During review I noticed that the icache range we're flushing should
    start at header already and not at ctx.image.

    Reason is that after 55309dd3d4cd ("net: bpf: arm: address randomize
    and write protect JIT code"), we also want to make sure to flush the
    random-sized trap in front of the start of the actual program (analogous
    to x86). No operational differences from user side.

    Signed-off-by: Daniel Borkmann
    Tested-by: Nicolas Schichan
    Cc: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

20 Oct, 2015

1 commit


05 Oct, 2015

2 commits

  • For ARMv7 with UDIV instruction support, generate an UDIV instruction
    followed by an MLS instruction.

    For other ARM variants, generate code calling a C wrapper similar to
    the jit_udiv() function used for BPF_ALU | BPF_DIV instructions.

    Some performance numbers reported by the test_bpf module (the duration
    per filter run is reported in nanoseconds, between "jited:N" and
    "PASS"):

    ARMv7 QEMU nojit: test_bpf: #3 DIV_MOD_KX jited:0 2196 PASS
    ARMv7 QEMU jit: test_bpf: #3 DIV_MOD_KX jited:1 104 PASS
    ARMv5 QEMU nojit: test_bpf: #3 DIV_MOD_KX jited:0 2176 PASS
    ARMv5 QEMU jit: test_bpf: #3 DIV_MOD_KX jited:1 1104 PASS
    ARMv5 kirkwood nojit: test_bpf: #3 DIV_MOD_KX jited:0 1103 PASS
    ARMv5 kirkwood jit: test_bpf: #3 DIV_MOD_KX jited:1 311 PASS

    Signed-off-by: Nicolas Schichan
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Nicolas Schichan
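
A minimal sketch of the arithmetic behind the UDIV + MLS pair described above: UDIV produces the quotient and MLS subtracts quotient times divisor from the dividend, leaving the remainder. The function name is hypothetical and this is a user-space model, not the JIT's emitter code.

```c
#include <stdint.h>

/*
 * Illustrative model only (hypothetical name). UDIV yields the quotient;
 * MLS computes n - q * d, which is the remainder.
 * The caller must ensure d != 0.
 */
uint32_t mod_via_udiv_mls(uint32_t n, uint32_t d)
{
    uint32_t q = n / d;   /* udiv q, n, d    */
    return n - q * d;     /* mls  r, q, d, n */
}
```

For example, mod_via_udiv_mls(17, 5) computes q = 3 and returns 17 - 15 = 2.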
     
  • Without this patch, if the only instructions using r_X are of the
    BPF_LD | BPF_IND type, r_X would not be reset to 0, using whatever
    value was there when entering the jited code. With this patch, r_X
    will be correctly marked as used so it will be reset to 0 in the
    prologue code.

    This fix also makes the test "LD_IND byte default X" pass in the
    test_bpf module when the ARM JIT is enabled.

    Signed-off-by: Nicolas Schichan
    Signed-off-by: David S. Miller

    Nicolas Schichan
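
A rough sketch of the idea, assuming a pre-pass that scans the classic BPF opcodes and decides whether the prologue has to zero X. The opcode field values match the uapi definitions; both function names are hypothetical, and the sketch only covers the BPF_LD | BPF_IND case discussed above, not every use of X the JIT tracks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Classic BPF opcode fields (values as in the uapi filter headers). */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_LD          0x00
#define BPF_MODE(code)  ((code) & 0xe0)
#define BPF_IND         0x40

/* Hypothetical helper: does this opcode read X as an index register? */
bool insn_reads_x_as_index(uint16_t code)
{
    return BPF_CLASS(code) == BPF_LD && BPF_MODE(code) == BPF_IND;
}

/* Hypothetical scan: true if the prologue should reset X to 0. */
bool prologue_must_clear_x(const uint16_t *codes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (insn_reads_x_as_index(codes[i]))
            return true;
    }
    return false;
}
```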
     

03 Oct, 2015

1 commit

  • As we need to add further flags to the bpf_prog structure, let's migrate
    both bools to a bitfield representation. The size of the base structure
    (excluding insns) remains unchanged at 40 bytes.

    Also add tags for kmemcheck, so that it doesn't throw false
    positives. Even in case gcc would generate suboptimal code, it's not
    being accessed in performance critical paths.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

28 Jul, 2015

3 commits


22 Jul, 2015

3 commits

  • This makes BPF_ANC | SKF_AD_VLAN_TAG and BPF_ANC | SKF_AD_VLAN_TAG_PRESENT
    have the same behaviour as the in kernel VM and makes the test_bpf LD_VLAN_TAG
    and LD_VLAN_TAG_PRESENT tests pass.

    Signed-off-by: Nicolas Schichan
    Signed-off-by: David S. Miller

    Nicolas Schichan
     
  • Previously, the JIT would reject negative offsets known during code
    generation and mishandle negative offsets provided at runtime.

    Fix that by calling bpf_internal_load_pointer_neg_helper()
    appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing
    the execution flow to the slow path helpers when the offset is
    negative.

    Signed-off-by: Nicolas Schichan
    Signed-off-by: David S. Miller

    Nicolas Schichan
     
  • To check whether the load should take the fast path or not, the code
    would check that (r_skb_hlen - load_order) is greater than the offset
    of the access using an "Unsigned higher or same" condition. For
    halfword accesses and an skb length of 1 at offset 0, that test
    passes, as we end up comparing 0xffffffff (-1) and 0, so the fast path
    is taken and the filter allows the load to wrongly succeed. A similar
    issue exists for word loads at offset 0 and an skb length of less than
    4.

    Fix that by using the "Signed greater than or equal" condition
    for the fast path code for load orders greater than 0.

    Signed-off-by: Nicolas Schichan
    Signed-off-by: David S. Miller

    Nicolas Schichan
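
The difference between the two conditions can be modeled in plain C. This sketch assumes the quantity subtracted from the header length is the access size in bytes (2 for a halfword), which is what produces the 0xffffffff value quoted above; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: "unsigned higher or same" on (hlen - size) versus offset. */
bool fast_path_unsigned(uint32_t hlen, uint32_t size, uint32_t off)
{
    return (uint32_t)(hlen - size) >= off;
}

/*
 * Fixed check: "signed greater than or equal". The casts rely on the
 * usual two's-complement conversion, so 0xffffffff becomes -1.
 */
bool fast_path_signed(uint32_t hlen, uint32_t size, uint32_t off)
{
    return (int32_t)(hlen - size) >= (int32_t)off;
}
```

With hlen = 1, size = 2 and off = 0, the unsigned variant compares 0xffffffff with 0 and wrongly selects the fast path, while the signed variant compares -1 with 0 and falls back to the slow path.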
     

14 May, 2015

1 commit

  • Four minor merge conflicts:

    1) qca_spi.c renamed the local variable used for the SPI device
    from spi_device to spi, meanwhile the spi_set_drvdata() call
    got moved further up in the probe function.

    2) Two changes were both adding new members to codel params
    structure, and thus we had overlapping changes to the
    initializer function.

    3) 'net' was making a fix to sk_release_kernel() which is
    completely removed in 'net-next'.

    4) In net_namespace.c, the rtnl_net_fill() call for GET operations
    had the command value fixed, meanwhile 'net-next' adjusted the
    argument signature a bit.

    This also matches example merge resolutions posted by Stephen
    Rothwell over the past two days.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 May, 2015

1 commit


11 May, 2015

2 commits

  • …an't fit into 12bits.

    The ARM JIT code emits "ldr rX, [pc, #offset]" to access the literal
    pool. #offset maximum value is 4095 and if the generated code is too
    large, the #offset value can overflow and not point to the expected
    slot in the literal pool. Additionally, when overflow occurs, bits of
    the overflow can end up changing the destination register of the ldr
    instruction.

    Fix that by detecting the overflow in imm_offset() and setting a flag
    that is checked for each BPF instruction converted in
    build_body(). As of now it can only be detected in the second pass. As
    a result the second build_body() call can now fail, so add the
    corresponding cleanup code in that case.

    Using multiple literal pools in the JITed code is going to require
    lots of intrusive changes to the JIT code (which would better be done
    as a feature instead of a fix). Just delegating to the kernel BPF
    interpreter in that case is a more straightforward, minimal fix that
    is easy to backport.

    Fixes: ddecdfcea0ae ("ARM: 7259/3: net: JIT compiler for packet filters")
    Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    Nicolas Schichan
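
To illustrate the failure mode, here is a small model of the "ldr rd, [pc, #imm12]" encoding. The base opcode 0xe59f0000 is the ARM encoding with condition AL and a positive offset; the function names are hypothetical, and the sketch ignores how the JIT actually derives the offset.

```c
#include <stdbool.h>
#include <stdint.h>

#define LDR_PC_BASE 0xe59f0000u  /* ldr rd, [pc, #imm12], condition AL */

/* The imm12 field can only hold 0..4095. */
bool imm12_fits(uint32_t offset)
{
    return offset <= 0xfffu;
}

/* Naive encoder with no range check, to show what an overflow does. */
uint32_t encode_ldr_pc_unchecked(uint32_t rd, uint32_t offset)
{
    return LDR_PC_BASE | (rd << 12) | offset;
}

/* Extract the destination register field (bits 15:12). */
uint32_t ldr_rd_field(uint32_t insn)
{
    return (insn >> 12) & 0xfu;
}
```

With rd = 0 and an offset of 4096, bit 12 of the offset lands in the register field, so the instruction decodes as a load into r1 with an immediate of 0.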
     
  • In that case, emit_udiv() will be called with rn == ARM_R0 (r_scratch)
    and loading rm first into ARM_R0 will result in the jit_udiv() function
    being called with the same dividend and divisor. Fix that by loading rn
    first into ARM_R1 and then rm into ARM_R0.

    Signed-off-by: Nicolas Schichan
    Cc: # v3.13+
    Fixes: aee636c4809f (bpf: do not use reciprocal divide)
    Acked-by: Mircea Gherzan
    Signed-off-by: David S. Miller

    Nicolas Schichan
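
The register clobbering can be reproduced with a toy register file. This is a sketch under the assumption that rm (the dividend) lives in some register other than r0, here r4, while rn (the divisor) is r0; the function names are hypothetical.

```c
#include <stdint.h>

enum { R0 = 0, R1 = 1, R4 = 4 };

/* Buggy order: r0 is overwritten before it is read as the source for r1. */
void load_div_args_buggy(uint32_t regs[16], int rm, int rn)
{
    regs[R0] = regs[rm];
    regs[R1] = regs[rn];
}

/* Fixed order: copy rn into r1 first, then rm into r0. */
void load_div_args_fixed(uint32_t regs[16], int rm, int rn)
{
    regs[R1] = regs[rn];
    regs[R0] = regs[rm];
}
```

Starting from r4 = 100 and r0 = 7 with rm = R4 and rn = R0, the buggy order leaves r0 = r1 = 100, whereas the fixed order leaves r0 = 100 and r1 = 7.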
     

24 Sep, 2014

1 commit

  • Will Deacon pointed out, that the currently used opcode for filling holes,
    that is 0xe7ffffff, seems not robust enough ...

    $ echo 0xffffffe7 | xxd -r > test.bin
    $ arm-linux-gnueabihf-objdump -m arm -D -b binary test.bin
    ...
    0: e7ffffff udf #65535 ; 0xffff

    ... while for Thumb, it ends up as ...

    0: ffff e7ff vqshl.u64 q15, , #63

    ... which is a bit fragile. The ARM specification defines some *permanently*
    guaranteed undefined instruction (UDF) space, for example for ARM in ARMv7-AR,
    section A5.4 and for Thumb in ARMv7-M, section A5.2.6.

    Similarly, ptrace, kprobes, kgdb, bug and uprobes make use of such instruction
    as well to trap. Given mentioned section from the specification, we can find
    such a universe as (where 'x' denotes 'don't care'):

    ARM: xxxx 0111 1111 xxxx xxxx xxxx 1111 xxxx
    Thumb: 1101 1110 xxxx xxxx

    We therefore should use a more robust opcode that fits both. Russell King
    suggested that we can even reuse a single 32-bit word, that is, 0xe7fddef1
    which will fault if executed in ARM *or* Thumb mode as done in f928d4f2a86f
    ("ARM: poison the vectors page"). That will still hold our requirements:

    $ echo 0xf1defde7 | xxd -r > test.bin
    $ arm-unknown-linux-gnueabi-objdump -m arm -D -b binary test.bin
    ...
    0: e7fddef1 udf #56801 ; 0xdde1
    $ echo 0xf1defde7f1defde7f1defde7 | xxd -r > test.bin
    $ arm-unknown-linux-gnueabi-objdump -marm -Mforce-thumb -D -b binary test.bin
    ...
    0: def1 udf #241 ; 0xf1
    2: e7fd b.n 0x0
    4: def1 udf #241 ; 0xf1
    6: e7fd b.n 0x4
    8: def1 udf #241 ; 0xf1
    a: e7fd b.n 0x8

    So on ARM 0xe7fddef1 conforms to the above UDF pattern, and the low 16 bits
    likewise correspond to UDF in the Thumb case. The 0xe7fd part is an unconditional
    branch back to the UDF instruction.

    Signed-off-by: Daniel Borkmann
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Mircea Gherzan
    Cc: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
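
The two bit patterns quoted above translate directly into mask-and-compare checks. The sketch below uses hypothetical function names and assumes the little-endian layout shown in the dumps, where the low halfword of the word is what Thumb decodes first.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM: xxxx 0111 1111 xxxx xxxx xxxx 1111 xxxx */
bool is_arm_udf(uint32_t insn)
{
    return (insn & 0x0ff000f0u) == 0x07f000f0u;
}

/* Thumb: 1101 1110 xxxx xxxx */
bool is_thumb_udf(uint16_t insn)
{
    return (insn & 0xff00u) == 0xde00u;
}

/*
 * A fill word is considered robust here if it is UDF in ARM state and
 * its low halfword is UDF in Thumb state.
 */
bool fill_word_is_robust(uint32_t word)
{
    return is_arm_udf(word) && is_thumb_udf((uint16_t)(word & 0xffffu));
}
```

Under these checks 0xe7fddef1 passes both tests, while 0xe7ffffff passes only the ARM test because its low halfword 0xffff is outside the Thumb UDF space.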
     

10 Sep, 2014

2 commits

  • Reported by Mikulas Patocka, kmemcheck currently barks out a
    false positive since we don't have special kmemcheck annotation
    for bitfields used in bpf_prog structure.

    We currently have jited:1, len:31 and thus when accessing len
    while CONFIG_KMEMCHECK is enabled, kmemcheck throws a warning that
    we're reading uninitialized memory.

    As we don't need the whole bit universe for pages member, we
    can just split it to u16 and use a bool flag for jited instead
    of a bitfield.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This is the ARM variant for 314beb9bcab ("x86: bpf_jit_comp: secure bpf
    jit against spraying attacks").

    It is now possible to implement it due to commits 75374ad47c64 ("ARM: mm:
    Define set_memory_* functions for ARM") and dca9aa92fc7c ("ARM: add
    DEBUG_SET_MODULE_RONX option to Kconfig") which added infrastructure for
    this facility.

    Thus, this patch makes sure the BPF generated JIT code is marked RO, as
    other kernel text sections, and also lets the generated JIT code start
    at a pseudo random offset instead of on a page boundary. The holes are
    filled with illegal instructions.

    JIT tested on armv7hl with BPF test suite.

    Reference: http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Acked-by: Mircea Gherzan
    Signed-off-by: David S. Miller

    Daniel Borkmann
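
A simplified user-space sketch of the placement idea: fill the whole buffer with a trapping word, then start the program at a pseudo random word index that still leaves room for it. The function name, the rnd parameter (standing in for a kernel random source) and the fill parameter are assumptions; the header in front of the image and the read-only marking are left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper. Fills buf with the trapping word 'fill' and
 * returns the word index at which a program of prog_words words
 * should start. Returns (size_t)-1 if the program does not fit.
 */
size_t place_program(uint32_t *buf, size_t buf_words, size_t prog_words,
                     uint32_t fill, uint32_t rnd)
{
    if (prog_words > buf_words)
        return (size_t)-1;

    /* Every word not later overwritten by the program stays a trap. */
    for (size_t i = 0; i < buf_words; i++)
        buf[i] = fill;

    /* Any start in [0, buf_words - prog_words] keeps the program inside. */
    return rnd % (buf_words - prog_words + 1);
}
```

The caller would then copy the generated instructions to buf + start, so the words before and after the program remain filled with the trapping value.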
     

06 Sep, 2014

1 commit

  • With eBPF being extended further and exposure to user space on its way,
    hardening the memory range the interpreter uses to steer its command flow
    seems appropriate. This patch moves the to be interpreted bytecode to
    read-only pages.

    In case we execute a corrupted BPF interpreter image for some reason e.g.
    caused by an attacker which got past a verifier stage, it would not only
    provide arbitrary read/write memory access but arbitrary function calls
    as well. After setting up the BPF interpreter image, its contents do not
    change until destruction time, thus we can set up the image on pages
    made immutable in order to mitigate modifications to that code. The idea
    is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
    against spraying attacks").

    This is possible because bpf_prog is not part of sk_filter anymore.
    After setup bpf_prog cannot be altered during its life-time. This prevents
    any modifications to the entire bpf_prog structure (incl. function/JIT
    image pointer).

    Every eBPF program (including migrated classic BPF programs) has to call
    bpf_prog_select_runtime() to select either interpreter or a JIT image
    as a last setup step, and they all are being freed via bpf_prog_free(),
    including non-JIT. Therefore, we can easily integrate this into the
    eBPF life-time, plus since we directly allocate a bpf_prog, we have no
    performance penalty.

    Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
    inspection of kernel_page_tables. Brad Spengler proposed the same idea
    via Twitter during development of this patch.

    Joint work with Hannes Frederic Sowa.

    Suggested-by: Brad Spengler
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Hannes Frederic Sowa
    Cc: Alexei Starovoitov
    Cc: Kees Cook
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Aug, 2014

1 commit

  • clean up names related to socket filtering and bpf in the following way:
    - everything that deals with sockets keeps 'sk_*' prefix
    - everything that is pure BPF is changed to 'bpf_*' prefix

    split 'struct sk_filter' into
    struct sk_filter {
    atomic_t refcnt;
    struct rcu_head rcu;
    struct bpf_prog *prog;
    };
    and
    struct bpf_prog {
    u32 jited:1,
    len:31;
    struct sock_fprog_kern *orig_prog;
    unsigned int (*bpf_func)(const struct sk_buff *skb,
    const struct bpf_insn *filter);
    union {
    struct sock_filter insns[0];
    struct bpf_insn insnsi[0];
    struct work_struct work;
    };
    };
    so that 'struct bpf_prog' can be used independent of sockets and cleans up
    'unattached' bpf use cases

    split SK_RUN_FILTER macro into:
    SK_RUN_FILTER to be used with 'struct sk_filter *' and
    BPF_PROG_RUN to be used with 'struct bpf_prog *'

    __sk_filter_release(struct sk_filter *) gains
    __bpf_prog_release(struct bpf_prog *) helper function

    also perform related renames for the functions that work
    with 'struct bpf_prog *', since they're on the same lines:

    sk_filter_size -> bpf_prog_size
    sk_filter_select_runtime -> bpf_prog_select_runtime
    sk_filter_free -> bpf_prog_free
    sk_unattached_filter_create -> bpf_prog_create
    sk_unattached_filter_destroy -> bpf_prog_destroy
    sk_store_orig_filter -> bpf_prog_store_orig_filter
    sk_release_orig_filter -> bpf_release_orig_filter
    __sk_migrate_filter -> bpf_migrate_filter
    __sk_prepare_filter -> bpf_prepare_filter

    API for attaching classic BPF to a socket stays the same:
    sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
    and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
    which is used by sockets, tun, af_packet

    API for 'unattached' BPF programs becomes:
    bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
    and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
    which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

02 Jun, 2014

1 commit

  • This patch finally allows us to get rid of the BPF_S_* enum.
    Currently, the code performs unnecessary encode and decode
    workarounds in seccomp and filter migration itself when a filter
    is being attached in order to overcome BPF_S_* encoding which
    is not used anymore by the new interpreter resp. JIT compilers.

    Keeping it around would mean that also in future we would need
    to extend and maintain this enum and related encoders/decoders.
    We can get rid of all that and save us these operations during
    filter attaching. Naturally, also JIT compilers need to be updated
    by this.

    Before JIT conversion is being done, each compiler checks if A
    is being loaded at startup to obtain information if it needs to
    emit instructions to clear A first. Since BPF extensions are a
    subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
    for extensions can be removed at that point. To ease and minimize
    code changes in the classic JITs, we have introduced bpf_anc_helper().

    Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
    arm (JIT, int), i386 (int), ppc64 (JIT, int); for sparc we
    unfortunately didn't have access, but changes are analogous to
    the rest.

    Joint work with Alexei Starovoitov.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Mircea Gherzan
    Cc: Kees Cook
    Acked-by: Chema Gonzalez
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

31 Mar, 2014

1 commit

  • This patch adds a jited flag into sk_filter struct in order to indicate
    whether a filter is currently jited or not. The size of sk_filter is
    not being expanded as the 32 bit 'len' member allows upper bits to be
    reused since a filter can currently only grow as large as BPF_MAXINSNS.

    Therefore, there's also enough room for other flags needed in the
    future to reuse the 'len' field if necessary. The jited flag also allows
    for having alternative interpreter functions, since currently we can
    only detect JIT-compiled filters by testing that fp->bpf_func does not
    equal the address of sk_run_filter().

    Joint work with Alexei Starovoitov.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Cc: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Daniel Borkmann
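
A small sketch of why the flag costs no extra space: a 1-bit flag and a 31-bit length share a single 32-bit word, and 31 bits are far more than BPF_MAXINSNS (4096) needs. The struct and function names are hypothetical stand-ins, not the kernel's sk_filter definition, and bit-field layout is compiler-defined.

```c
#define BPF_MAXINSNS 4096

/* Hypothetical stand-in for the relevant part of the filter structure. */
struct filter_hdr {
    unsigned int jited:1,   /* set when the filter was JIT compiled */
                 len:31;    /* number of instructions               */
};

/* Initialize both fields; returns 0 on success, -1 if len is too large. */
int filter_hdr_init(struct filter_hdr *h, unsigned int len, int jited)
{
    if (len > BPF_MAXINSNS)
        return -1;
    h->len = len;
    h->jited = jited ? 1 : 0;
    return 0;
}
```

On common compilers sizeof(struct filter_hdr) equals sizeof(unsigned int), so adding the flag leaves the structure size unchanged.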
     

27 Mar, 2014

1 commit

  • The packet hash can be considered a property of the packet, not just
    on RX path.

    This patch changes name of rxhash and l4_rxhash skbuff fields to be
    hash and l4_hash respectively. This includes changing uses of the
    field in the code which don't call the access functions.

    Signed-off-by: Tom Herbert
    Signed-off-by: Eric Dumazet
    Cc: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Tom Herbert
     

16 Jan, 2014

1 commit

  • At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
    were not correct. (off by one in some cases)
    http://www.wireshark.org/~darkjames/reciprocal-buggy.c

    He could also show this with BPF:
    http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

    The reciprocal divide in linux kernel is not generic enough,
    let's remove its use in BPF, as it is not worth the pain with
    current cpus.

    Signed-off-by: Eric Dumazet
    Reported-by: Jakub Zawadzki
    Cc: Mircea Gherzan
    Cc: Daniel Borkmann
    Cc: Hannes Frederic Sowa
    Cc: Matt Evans
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Signed-off-by: David S. Miller

    Eric Dumazet
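
The off-by-one can be reproduced in user space. The two helpers below are modeled on my recollection of the reciprocal divide helpers of that era (a rounded-up 32-bit reciprocal followed by a multiply and shift); treat their exact form as an assumption rather than a copy of the kernel source.

```c
#include <stdint.h>

/* Assumed form of the old helper: roughly ceil(2^32 / k). */
uint32_t old_reciprocal_value(uint32_t k)
{
    uint64_t val = (1ULL << 32) + (k - 1);
    return (uint32_t)(val / k);
}

/* Assumed form of the old helper: high 32 bits of a * r. */
uint32_t old_reciprocal_divide(uint32_t a, uint32_t r)
{
    return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

For a = 4294967291 and k = 7, plain division gives 613566755, while old_reciprocal_divide(a, old_reciprocal_value(7)) gives 613566756, one too many.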
     

12 Nov, 2013

1 commit


20 Oct, 2013

1 commit


08 Oct, 2013

1 commit

  • on x86 system with net.core.bpf_jit_enable = 1

    sudo tcpdump -i eth1 'tcp port 22'

    causes the warning:
    [ 56.766097] Possible unsafe locking scenario:
    [ 56.766097]
    [ 56.780146] CPU0
    [ 56.786807] ----
    [ 56.793188] lock(&(&vb->lock)->rlock);
    [ 56.799593]
    [ 56.805889] lock(&(&vb->lock)->rlock);
    [ 56.812266]
    [ 56.812266] *** DEADLOCK ***
    [ 56.812266]
    [ 56.830670] 1 lock held by ksoftirqd/1/13:
    [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [] vm_unmap_aliases+0x8c/0x380
    [ 56.849757]
    [ 56.849757] stack backtrace:
    [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
    [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
    [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
    [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
    [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
    [ 56.923006] Call Trace:
    [ 56.929532] [] dump_stack+0x55/0x76
    [ 56.936067] [] print_usage_bug+0x1f7/0x208
    [ 56.942445] [] ? save_stack_trace+0x2f/0x50
    [ 56.948932] [] ? check_usage_backwards+0x150/0x150
    [ 56.955470] [] mark_lock+0x282/0x2c0
    [ 56.961945] [] __lock_acquire+0x45d/0x1d50
    [ 56.968474] [] ? __lock_acquire+0x2de/0x1d50
    [ 56.975140] [] ? cpumask_next_and+0x55/0x90
    [ 56.981942] [] lock_acquire+0x92/0x1d0
    [ 56.988745] [] ? vm_unmap_aliases+0x16a/0x380
    [ 56.995619] [] _raw_spin_lock+0x41/0x50
    [ 57.002493] [] ? vm_unmap_aliases+0x16a/0x380
    [ 57.009447] [] vm_unmap_aliases+0x16a/0x380
    [ 57.016477] [] ? vm_unmap_aliases+0x8c/0x380
    [ 57.023607] [] change_page_attr_set_clr+0xc0/0x460
    [ 57.030818] [] ? trace_hardirqs_on+0xd/0x10
    [ 57.037896] [] ? kmem_cache_free+0xb0/0x2b0
    [ 57.044789] [] ? free_object_rcu+0x93/0xa0
    [ 57.051720] [] set_memory_rw+0x2f/0x40
    [ 57.058727] [] bpf_jit_free+0x2c/0x40
    [ 57.065577] [] sk_filter_release_rcu+0x1a/0x30
    [ 57.072338] [] rcu_process_callbacks+0x202/0x7c0
    [ 57.078962] [] __do_softirq+0xf7/0x3f0
    [ 57.085373] [] run_ksoftirqd+0x35/0x70

    cannot reuse jited filter memory, since it's readonly,
    so use original bpf insns memory to hold work_struct

    defer kfree of sk_filter until jit completed freeing

    tested on x86_64 and i386

    Signed-off-by: Alexei Starovoitov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

21 May, 2013

1 commit


22 Mar, 2013

1 commit

  • If bpf_jit_enable > 1, then we dump the emitted JIT compiled image
    after creation. Currently, only SPARC and PowerPC have output similar
    to the reference implementation on x86_64. Make a small helper
    function in order to reduce duplicated code and make the dump output
    uniform across architectures x86_64, SPARC, PPC, ARM (e.g. on ARM
    flen, pass and proglen are currently not shown, but would be
    interesting to know as well), also for future BPF JIT implementations
    on other archs.

    Cc: Mircea Gherzan
    Cc: Matt Evans
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

12 Mar, 2013

1 commit


15 Feb, 2013

1 commit


13 Dec, 2012

1 commit

  • Pull networking changes from David Miller:

    1) Allow to dump, monitor, and change the bridge multicast database
    using netlink. From Cong Wang.

    2) RFC 5961 TCP blind data injection attack mitigation, from Eric
    Dumazet.

    3) Networking user namespace support from Eric W. Biederman.

    4) tuntap/virtio-net multiqueue support by Jason Wang.

    5) Support for checksum offload of encapsulated packets (basically,
    tunneled traffic can still be checksummed by HW). From Joseph
    Gasparakis.

    6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
    Daniel Borkmann.

    7) Bridge port parameters over netlink and BPDU blocking support
    from Stephen Hemminger.

    8) Improve data access patterns during inet socket demux by rearranging
    socket layout, from Eric Dumazet.

    9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
    Jon Maloy.

    10) Update TCP socket hash sizing to be more in line with current day
    realities. The existing heuristics were chosen a decade ago.
    From Eric Dumazet.

    11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

    12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

    13) Add "oops_only" mode to netconsole, from Amerigo Wang.

    14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace. From John Fastabend.

    15) Support PTP in the Tigon3 driver, from Matt Carlson.

    16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

    17) Support per-association statistics in SCTP, from Michele
    Baldessari.

    And many, many, driver updates, cleanups, and improvements. Too
    numerous to mention individually.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    net/mlx4_en: Add support for destination MAC in steering rules
    net/mlx4_en: Use generic etherdevice.h functions.
    net: ethtool: Add destination MAC address to flow steering API
    bridge: add support of adding and deleting mdb entries
    bridge: notify mdb changes via netlink
    ndisc: Unexport ndisc_{build,send}_skb().
    uapi: add missing netconf.h to export list
    pkt_sched: avoid requeues if possible
    solos-pci: fix double-free of TX skb in DMA mode
    bnx2: Fix accidental reversions.
    bna: Driver Version Updated to 3.1.2.1
    bna: Firmware update
    bna: Add RX State
    bna: Rx Page Based Allocation
    bna: TX Intr Coalescing Fix
    bna: Tx and Rx Optimizations
    bna: Code Cleanup and Enhancements
    ath9k: check pdata variable before dereferencing it
    ath5k: RX timestamp is reported at end of frame
    ath9k_htc: RX timestamp is reported at end of frame
    ...

    Linus Torvalds
     

11 Dec, 2012

2 commits

  • The offset must be multiplied by 4 to be sure to access the correct
    32bit word in the stack scratch space.

    For instance, a store at scratch memory cell #1 was generating the
    following:

    st r4, [sp, #1]

    While the correct code for this is:

    st r4, [sp, #4]

    To reproduce the bug (assuming your system has a NIC with the mac
    address 52:54:00:12:34:56):

    echo 0 > /proc/sys/net/core/bpf_jit_enable
    tcpdump -ni eth0 "ether[1] + ether[2] - ether[3] * ether[4] - ether[5] \
    == -0x3AA" # this will capture packets as expected

    echo 1 > /proc/sys/net/core/bpf_jit_enable
    tcpdump -ni eth0 "ether[1] + ether[2] - ether[3] * ether[4] - ether[5] \
    == -0x3AA" # this will not.

    This bug was present since the original inclusion of bpf_jit for ARM
    (ddecdfce: ARM: 7259/3: net: JIT compiler for packet filters).

    Signed-off-by: Nicolas Schichan
    Signed-off-by: Russell King

    Schichan Nicolas
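
The fix boils down to scaling the scratch cell index by the cell size. A minimal sketch, assuming the scratch area starts at sp with no extra base offset; the function name is hypothetical.

```c
#include <stdint.h>

/* Byte offset from sp of 32-bit scratch memory cell k. */
uint32_t scratch_byte_offset(uint32_t k)
{
    return k * (uint32_t)sizeof(uint32_t);
}
```

Cell #1 therefore maps to offset 4 rather than 1, matching the corrected store shown above.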
     
  • Official prototype for kzalloc is:

    void *kzalloc(size_t, gfp_t);

    The ARM bpf_jit code assumed that it was:

    void *kzalloc(gfp_t, size);

    This resulted in the use of essentially random GFP flags, depending
    on the size requested, and in overflows once the size really needed
    was larger than the value of GFP_KERNEL.

    This bug was present since the original inclusion of bpf_jit for ARM
    (ddecdfce: ARM: 7259/3: net: JIT compiler for packet filters).

    Signed-off-by: Nicolas Schichan
    Signed-off-by: Russell King

    Schichan Nicolas
     

14 Nov, 2012

2 commits


14 Jun, 2012

1 commit


24 Mar, 2012

1 commit

  • Based on Matt Evans's PPC64 implementation.

    The compiler generates ARM instructions but interworking is
    supported for Thumb2 kernels.

    Supports both little and big endian. Unaligned loads are emitted
    for ARMv6+. Not all the BPF opcodes that deal with ancillary data
    are supported. The scratch memory of the filter lives on the stack.
    Hardware integer division is used if it is available.

    Enabled in the same way as for x86-64 and PPC64:

    echo 1 > /proc/sys/net/core/bpf_jit_enable

    A value greater than 1 enables opcode output.

    Signed-off-by: Mircea Gherzan
    Acked-by: David S. Miller
    Acked-by: Eric Dumazet
    Signed-off-by: Russell King

    Mircea Gherzan