20 Oct, 2011

1 commit

  • This is just a cleanup.

    My testing version of Smatch warns about this:
    net/core/filter.c +380 check_load_and_stores(6)
    warn: check 'flen' for negative values

    flen comes from the user. We try to clamp it here between 1 and
    BPF_MAXINSNS, but the clamp doesn't catch negative values. This is
    a bug, but it's not exploitable.
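
    A minimal userspace sketch of the failure mode (helper names here are
    hypothetical; the point is that making the length unsigned lets the
    single upper-bound test reject everything out of range):

    #include <stdio.h>

    #define BPF_MAXINSNS 4096

    /* Broken: a negative flen slips past "flen > BPF_MAXINSNS". */
    static int chk_signed(int flen)
    {
            if (flen == 0 || flen > BPF_MAXINSNS)
                    return -1;
            return 0;
    }

    /* Fixed: with an unsigned type, -1 becomes a huge value and is
     * rejected by the same comparison. */
    static int chk_unsigned(unsigned int flen)
    {
            if (flen == 0 || flen > BPF_MAXINSNS)
                    return -1;
            return 0;
    }

    int main(void)
    {
            printf("signed:   %d\n", chk_signed(-1));    /* 0: accepted! */
            printf("unsigned: %d\n", chk_unsigned(-1));  /* -1: rejected */
            return 0;
    }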

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
     

02 Aug, 2011

1 commit

  • When assigning a NULL value to an RCU protected pointer, no barrier
    is needed. rcu_assign_pointer() used to handle that special case,
    but will soon change to no longer do so.

    Convert all rcu_assign_pointer of NULL value.

    // <smpl>
    @@ expression P; @@

    - rcu_assign_pointer(P, NULL)
    + RCU_INIT_POINTER(P, NULL)

    // </smpl>
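
    For illustration only, a userspace analogue of the distinction, with
    C11 atomics standing in for the kernel primitives: publishing a real
    pointer needs release ordering so readers see the pointee's
    initialization, while publishing NULL orders nothing, so a plain
    store suffices:

    #include <stdatomic.h>
    #include <stddef.h>

    struct foo { int data; };

    /* Analogue of rcu_assign_pointer(): the release barrier makes the
     * store to f->data visible before the pointer itself. */
    static void publish(struct foo *_Atomic *slot, struct foo *f)
    {
            atomic_store_explicit(slot, f, memory_order_release);
    }

    /* Analogue of RCU_INIT_POINTER(p, NULL): NULL carries no pointed-to
     * data, so there is nothing to order; a relaxed store is enough. */
    static void unpublish(struct foo *_Atomic *slot)
    {
            atomic_store_explicit(slot, NULL, memory_order_relaxed);
    }

    int main(void)
    {
            static struct foo *_Atomic slot;
            static struct foo f = { .data = 42 };

            publish(&slot, &f);
            unpublish(&slot);
            return 0;
    }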

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

27 May, 2011

1 commit

  • As reported by Ingo Molnar, we still have configuration combinations
    where use of the WARN_RATELIMIT interfaces breaks the build because
    dependencies don't get met.

    Instead of going down the long road of trying to make it so that
    ratelimit.h can get included by kernel.h or asm-generic/bug.h,
    just move the interface into ratelimit.h and make users have
    to include that.

    Reported-by: Ingo Molnar
    Signed-off-by: David S. Miller
    Acked-by: Randy Dunlap

    David S. Miller
     

24 May, 2011

1 commit

  • A mis-configured filter can spam the logs with lots of stack traces.

    Rate-limit the warnings and add printout of the bogus filter information.
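
    A userspace sketch of the underlying idea (names hypothetical; the
    kernel uses its ratelimit infrastructure rather than this logic):
    allow a burst of messages per interval, then suppress until the
    interval expires:

    #include <stdio.h>
    #include <time.h>

    /* Allow a burst of 10 messages every 5 seconds; drop the rest. */
    static int ratelimit(void)
    {
            static time_t begin;
            static int count;
            time_t now = time(NULL);

            if (now - begin >= 5) {
                    begin = now;
                    count = 0;
            }
            return count++ < 10;
    }

    int main(void)
    {
            for (int i = 0; i < 20; i++)
                    if (ratelimit())
                            printf("warning: bogus filter (insn %d)\n", i);
            return 0;
    }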

    Original-patch-by: Ben Greear
    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

28 Apr, 2011

1 commit

  • In order to speed up packet filtering, here is an implementation of
    a JIT compiler for x86_64.

    It is disabled by default, and must be enabled by the admin.

    echo 1 >/proc/sys/net/core/bpf_jit_enable

    It uses module_alloc() and module_free() to get memory in the 2GB text
    kernel range since we call helper functions from the generated code.

    EAX : BPF A accumulator
    EBX : BPF X accumulator
    RDI : pointer to skb (first argument given to JIT function)
    RBP : frame pointer (even if CONFIG_FRAME_POINTER=n)
    r9d : skb->len - skb->data_len (headlen)
    r8 : skb->data

    To get a trace of generated code, use :

    echo 2 >/proc/sys/net/core/bpf_jit_enable

    Example of generated code :

    # tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24

    flen=18 proglen=147 pass=3 image=ffffffffa00b5000
    JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
    JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
    JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00
    JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be
    JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0
    JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
    JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00
    JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24
    JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
    JIT code: ffffffffa00b5090: c0 c9 c3

    The BPF program is 144 bytes long, so the native program is almost the
    same size ;)

    (000) ldh [12]
    (001) jeq #0x800 jt 2 jf 8
    (002) ld [26]
    (003) and #0xffffff00
    (004) jeq #0xc0a81400 jt 16 jf 5
    (005) ld [30]
    (006) and #0xffffff00
    (007) jeq #0xc0a81400 jt 16 jf 17
    (008) jeq #0x806 jt 10 jf 9
    (009) jeq #0x8035 jt 10 jf 17
    (010) ld [28]
    (011) and #0xffffff00
    (012) jeq #0xc0a81400 jt 16 jf 13
    (013) ld [38]
    (014) and #0xffffff00
    (015) jeq #0xc0a81400 jt 16 jf 17
    (016) ret #65535
    (017) ret #0
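
    For context, a minimal sketch of how such a filter gets installed in
    the first place (the JIT then compiles it transparently on attach).
    This is an illustration, not part of the patch; error handling is
    trimmed and the packet socket needs root:

    #include <arpa/inet.h>
    #include <linux/filter.h>
    #include <linux/if_ether.h>
    #include <stdio.h>
    #include <sys/socket.h>

    int main(void)
    {
            /* ldh [12]; jeq #0x800 ? accept : drop  (IPv4 only) */
            struct sock_filter insns[] = {
                    { BPF_LD  | BPF_H   | BPF_ABS, 0, 0, 12     },
                    { BPF_JMP | BPF_JEQ | BPF_K,   0, 1, 0x0800 },
                    { BPF_RET | BPF_K,             0, 0, 0xffff },
                    { BPF_RET | BPF_K,             0, 0, 0      },
            };
            struct sock_fprog prog = {
                    .len    = sizeof(insns) / sizeof(insns[0]),
                    .filter = insns,
            };
            int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

            if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                                     &prog, sizeof(prog)) < 0)
                    perror("attach");
            return 0;
    }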

    Signed-off-by: Eric Dumazet
    Cc: Arnaldo Carvalho de Melo
    Cc: Ben Hutchings
    Cc: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 Mar, 2011

1 commit


19 Jan, 2011

1 commit


10 Jan, 2011

1 commit

  • Fix new kernel-doc notation warning in net/core/filter.c:

    Warning(net/core/filter.c:172): No description found for parameter 'fentry'
    Warning(net/core/filter.c:172): Excess function parameter 'filter' description in 'sk_run_filter'

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     

22 Dec, 2010

1 commit

  • We can translate pseudo load instructions at filter check time to
    dedicated instructions to speed up filtering and avoid one switch().
    libpcap currently uses SKF_AD_PROTOCOL, but custom filters probably use
    other ancillary accesses.

    Note : I made the assumption that ancillary data is always accessed
    with BPF_LD|BPF_?|BPF_ABS instructions, not with BPF_LD|BPF_?|BPF_IND
    ones (offset given by the K constant, not by K + X register).

    On x86_64, this saves a few bytes of text :

    # size net/core/filter.o.*
    text data bss dec hex filename
    4864 0 0 4864 1300 net/core/filter.o.new
    4944 0 0 4944 1350 net/core/filter.o.old
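
    A rough sketch of the idea with made-up internal opcode names (not
    the kernel's actual encodings): at sk_chk_filter() time, an ancillary
    load is retagged with a dedicated code, so the run-time loop
    dispatches on it directly instead of re-testing K for every packet:

    #include <stdio.h>

    #define SKF_AD_OFF       (-0x1000)
    #define SKF_AD_PROTOCOL  0
    #define SKF_AD_PKTTYPE   4

    /* Dedicated internal codes, assigned at filter-check time. */
    enum { ANC_NONE, ANC_PROTOCOL, ANC_PKTTYPE };

    struct insn { unsigned short code; int k; };

    static void translate_ancillary(struct insn *f)
    {
            switch (f->k) {
            case SKF_AD_OFF + SKF_AD_PROTOCOL: f->code = ANC_PROTOCOL; break;
            case SKF_AD_OFF + SKF_AD_PKTTYPE:  f->code = ANC_PKTTYPE;  break;
            }
    }

    int main(void)
    {
            struct insn f = { ANC_NONE, SKF_AD_OFF + SKF_AD_PROTOCOL };

            translate_ancillary(&f);
            printf("code=%d\n", f.code);   /* 1: dispatched directly now */
            return 0;
    }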

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Dec, 2010

1 commit

  • __load_pointer() checks that the data we fetch from the skb lies in
    the head portion, but assumes we fetch one byte, when we may fetch
    up to four.

    This won't crash because we have extra bytes (struct skb_shared_info)
    after the head, but it can read uninitialized bytes.

    Fix this using size of the data (1, 2, 4 bytes) in the test.
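
    A sketch of the shape of the fix (illustrative, not the kernel
    function): validate offset+size against the head length, not just
    the offset, before handing back a pointer into the buffer:

    #include <stdio.h>
    #include <string.h>

    static void *load_pointer(unsigned char *head, unsigned int headlen,
                              unsigned int offset, unsigned int size)
    {
            if (offset + size <= headlen)   /* was, in effect: offset < headlen */
                    return head + offset;
            return NULL;
    }

    int main(void)
    {
            unsigned char head[14];

            memset(head, 0, sizeof(head));
            /* A 4-byte load at offset 12 overruns a 14-byte head ... */
            printf("%p\n", load_pointer(head, sizeof(head), 12, 4)); /* NULL */
            /* ... while a 2-byte load at the same offset is fine. */
            printf("%p\n", load_pointer(head, sizeof(head), 12, 2));
            return 0;
    }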

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Dec, 2010

2 commits


07 Dec, 2010

3 commits

  • We added some security checks in commit 57fe93b374a6
    (filter: make sure filters dont read uninitialized memory) to close a
    potential leak of kernel information to the user.

    This added a potential extra cost at run time, while we can perform a
    check of the filter itself, to make sure a malicious user doesn't try
    to abuse us.

    This patch adds a check_loads() function, whose unique purpose is to
    make this check, allocating a temporary array of masks. We scan the
    filter and propagate bitmask information, telling us if a load M(K) is
    allowed because a previous store M(K) is guaranteed. (So that
    sk_run_filter() cannot read uninitialized memory.)

    Note: this can uncover application bugs, denying a filter attach that
    was previously allowed.
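
    A simplified sketch of such a check for straight-line code (the real
    check_loads() also has to propagate the masks across jumps, hence the
    temporary per-instruction array):

    #include <stdbool.h>
    #include <stdio.h>

    enum { OP_ST, OP_LD_MEM, OP_OTHER };
    struct insn { int op; unsigned int k; };

    /* A load from scratch slot K is legal only if some earlier store to
     * slot K is guaranteed to have executed. */
    static bool check_loads(const struct insn *f, int flen)
    {
            unsigned int written = 0;   /* bit K set => M(K) initialized */

            for (int i = 0; i < flen; i++) {
                    if (f[i].op == OP_ST)
                            written |= 1u << f[i].k;
                    else if (f[i].op == OP_LD_MEM &&
                             !(written & (1u << f[i].k)))
                            return false;   /* reads uninitialized M(K) */
            }
            return true;
    }

    int main(void)
    {
            struct insn bad[]  = { { OP_LD_MEM, 3 } };
            struct insn good[] = { { OP_ST, 3 }, { OP_LD_MEM, 3 } };

            printf("%d %d\n", check_loads(bad, 1), check_loads(good, 2));
            return 0;
    }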

    Signed-off-by: Eric Dumazet
    Cc: Dan Rosenberg
    Cc: Changli Gao
    Acked-by: Changli Gao
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Add SKF_AD_RXHASH and SKF_AD_CPU to the filter ancillary mechanism,
    to be able to build advanced filters.

    This can help spread packets across several sockets with a fast
    selection, after RPS dispatch to N cpus for example, or to catch a
    percentage of flows in one queue.

    tcpdump -s 500 "cpu = 1" :

    [0] ld CPU
    [1] jeq #1 jt 2 jf 3
    [2] ret #500
    [3] ret #0

    # take 12.5 % of flows (average)
    tcpdump -s 1000 "rxhash & 7 = 2" :

    [0] ld RXHASH
    [1] and #7
    [2] jeq #2 jt 3 jf 4
    [3] ret #1000
    [4] ret #0

    Signed-off-by: Eric Dumazet
    Cc: Rui
    Acked-by: Changli Gao
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
    sk_clone() in commit 47e958eac280c263397

    The problem is we can have several clones sharing a common sk_filter,
    and these clones might want to sk_filter_attach() their own filters at
    the same time, overwriting old_filter->rcu and corrupting RCU queues.

    We cannot use filter->rcu without being sure no other thread could do
    the same thing.

    Switch the code to a more conventional ref-counting technique: do the
    atomic decrement immediately and queue one RCU callback when the last
    reference is released.
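
    A userspace analogue of the scheme (C11 atomics; in the kernel the
    actual free is deferred through call_rcu(), which a direct free()
    stands in for here):

    #include <stdatomic.h>
    #include <stdlib.h>

    struct filter {
            atomic_int refcnt;
            /* ... filter body ... */
    };

    /* Decrement immediately; exactly one caller sees the count hit zero
     * and queues the single deferred free, so concurrent releasers can
     * never both reuse the rcu_head. */
    static void filter_release(struct filter *fp)
    {
            if (atomic_fetch_sub(&fp->refcnt, 1) == 1)
                    free(fp);   /* kernel: call_rcu(&fp->rcu, ...) */
    }

    int main(void)
    {
            struct filter *fp = malloc(sizeof(*fp));

            atomic_init(&fp->refcnt, 2);
            filter_release(fp);   /* still one reference left */
            filter_release(fp);   /* last reference: freed */
            return 0;
    }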

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Nov, 2010

5 commits

  • Conflicts:
    drivers/net/bonding/bond_main.c
    net/core/net-sysfs.c
    net/ipv6/addrconf.c

    David S. Miller
     
  • At compile time, we can replace the DIV_K instruction (divide by a
    constant value) by a reciprocal divide.

    At exec time, the expensive divide is replaced by a multiply, a less
    expensive operation on most processors.
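
    A sketch of the technique, mirroring (to my understanding) the old
    reciprocal_value()/reciprocal_divide() pair: precompute R ~= 2^32/d
    once at check time, then each run-time divide becomes a multiply and
    a shift. The rounding is delicate, so the demo cross-checks against
    a true divide:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t reciprocal_value(uint32_t d)
    {
            return (uint32_t)(((1ULL << 32) + d - 1) / d); /* ceil(2^32/d) */
    }

    static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
    {
            return (uint32_t)(((uint64_t)a * r) >> 32);
    }

    int main(void)
    {
            uint32_t d = 10, r = reciprocal_value(d);

            for (uint32_t a = 0; a < 1000; a++)
                    if (reciprocal_divide(a, r) != a / d)
                            printf("mismatch at %u\n", a);
            return 0;
    }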

    Signed-off-by: Eric Dumazet
    Acked-by: Changli Gao
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Starting the translated instructions at 1 instead of 0 allows us to
    remove one decrement at check time and makes the codes[] array init
    cleaner.

    Signed-off-by: Eric Dumazet
    Acked-by: Changli Gao
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Remove pc variable to avoid arithmetic to compute fentry at each filter
    instruction. Jumps directly manipulate fentry pointer.

    As the last instruction of filter[] is guaranteed to be a RETURN, and
    all jumps are before the last instruction, we don't need to check
    filter bounds (number of instructions in filter array) at each
    iteration, so we remove it from sk_run_filter() params.

    On x86_32 remove f_k var introduced in commit 57fe93b374a6b871
    (filter: make sure filters dont read uninitialized memory)

    Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
    avoid too many ifdefs in this code.

    This helps the compiler use cpu registers to hold fentry and the A
    accumulator.

    On x86_32, this saves 401 bytes, and more importantly, sk_run_filter()
    runs much faster because of reduced register pressure (one less
    conditional branch per BPF instruction).

    # size net/core/filter.o net/core/filter_pre.o
    text data bss dec hex filename
    2948 0 0 2948 b84 net/core/filter.o
    3349 0 0 3349 d15 net/core/filter_pre.o

    on x86_64 :
    # size net/core/filter.o net/core/filter_pre.o
    text data bss dec hex filename
    5173 0 0 5173 1435 net/core/filter.o
    5224 0 0 5224 1468 net/core/filter_pre.o
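
    A toy interpreter sketch of the dispatch (made-up opcodes, not real
    BPF): fentry itself walks the program and jumps simply add to it,
    with no pc arithmetic and no per-instruction bounds check, since the
    checker already guaranteed a terminating RET:

    #include <stdio.h>

    enum { OP_LD_IMM, OP_ADD_K, OP_JA, OP_RET };
    struct insn { int code; unsigned int k; };

    static unsigned int run(const struct insn *fentry)
    {
            unsigned int A = 0;

            for (;; fentry++) {
                    switch (fentry->code) {
                    case OP_LD_IMM: A = fentry->k;       continue;
                    case OP_ADD_K:  A += fentry->k;      continue;
                    case OP_JA:     fentry += fentry->k; continue;
                    case OP_RET:    return A;
                    }
            }
    }

    int main(void)
    {
            struct insn prog[] = {
                    { OP_LD_IMM, 5 }, { OP_JA, 1 },
                    { OP_ADD_K, 100 },              /* skipped by the jump */
                    { OP_ADD_K, 2 },  { OP_RET, 0 },
            };

            printf("%u\n", run(prog));   /* 7 */
            return 0;
    }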

    Signed-off-by: Eric Dumazet
    Acked-by: Changli Gao
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Fix kernel-doc warning for sk_filter_rcu_release():

    Warning(net/core/filter.c:586): missing initial short description on line:
    * sk_filter_rcu_release: Release a socket filter by rcu_head

    Signed-off-by: Randy Dunlap
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Randy Dunlap
     

19 Nov, 2010

2 commits


11 Nov, 2010

1 commit

  • There is a possibility malicious users can get limited information
    about the uninitialized stack mem[] array. Even if the sk_run_filter()
    result is bound to the packet length (0 .. 65535), we could imagine
    this being used by a hostile user.

    Initializing the mem[] array, like Dan Rosenberg suggested in his
    patch, is expensive since most filters don't even use this array.

    It's hard to do the filter validation in sk_chk_filter(), because of
    the jumps. This might be done later.

    In this patch, I use a bitmap (a single long var) so that only filters
    using mem[] loads/stores pay the price of added security checks.

    For other filters, the additional cost is a single instruction.

    [ Since we access fentry->k a lot now, cache it in a local variable
    and mark filter entry pointer as const. -DaveM ]
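
    A condensed sketch of the run-time guard (illustrative, not the
    kernel code): one long serves as a bitmap of written scratch slots,
    so a load from an unwritten slot yields 0 instead of stack garbage,
    and filters that never touch mem[] pay almost nothing:

    #include <stdio.h>

    #define MEMWORDS 16

    static unsigned int run_filter(void)
    {
            unsigned int mem[MEMWORDS];   /* deliberately uninitialized */
            unsigned long masks = 0;
            unsigned int A;

            /* ST M[3] <- 42 */
            mem[3] = 42;
            masks |= 1UL << 3;

            /* LD A <- M[3]: slot 3 was written, real value is returned */
            A = (masks & (1UL << 3)) ? mem[3] : 0;

            /* LD A <- M[7]: never written, forced to 0, no info leak */
            A += (masks & (1UL << 7)) ? mem[7] : 0;
            return A;
    }

    int main(void)
    {
            printf("%u\n", run_filter());   /* 42 */
            return 0;
    }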

    Reported-by: Dan Rosenberg
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David S. Miller
     

26 Oct, 2010

1 commit


28 Sep, 2010

1 commit

  • sk_attach_filter() and sk_detach_filter() are run with socket locked.

    Use the appropriate rcu_dereference_protected() instead of blocking BH
    and using rcu_dereference_bh().
    There is no point adding BH prevention and a memory barrier.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 Jun, 2010

1 commit

  • Gcc is currently not able to optimize the switch statement in
    sk_run_filter() because the case labels are not contiguous. This patch
    replaces the OR'd labels with ordered, sequential case labels. The
    sk_chk_filter() function is modified to patch/replace the original
    opcodes with an ordered but equivalent form. gcc is now able to
    transform the switch statement in sk_run_filter() into a jump table
    of complexity O(1).

    Until this patch gcc generated a sequence of conditional branches
    (O(n)), at 567 bytes of .text segment size (arch x86_64):

    7ff: 8b 06 mov (%rsi),%eax
    801: 66 83 f8 35 cmp $0x35,%ax
    805: 0f 84 d0 02 00 00 je adb
    80b: 0f 87 07 01 00 00 ja 918
    811: 66 83 f8 15 cmp $0x15,%ax
    815: 0f 84 c5 02 00 00 je ae0
    81b: 77 73 ja 890
    81d: 66 83 f8 04 cmp $0x4,%ax
    821: 0f 84 17 02 00 00 je a3e
    827: 77 29 ja 852
    829: 66 83 f8 01 cmp $0x1,%ax
    [...]

    With the modification the compiler translates the switch statement
    into the following jump table fragment:

    7ff: 66 83 3e 2c cmpw $0x2c,(%rsi)
    803: 0f 87 1f 02 00 00 ja a28
    809: 0f b7 06 movzwl (%rsi),%eax
    80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
    813: 44 89 e3 mov %r12d,%ebx
    816: e9 43 03 00 00 jmpq b5e
    81b: 41 89 dc mov %ebx,%r12d
    81e: e9 3b 03 00 00 jmpq b5e

    Furthermore, I reordered the instructions to reduce cache line misses
    by ordering the most common instructions toward the start.
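
    A sketch of the re-encoding trick (the values happen to match the
    classic BPF combos named, but treat them as illustrative): sparse
    OR'd codes are mapped once, at check time, onto a dense 1..N range,
    so the interpreter's switch on the new code compiles to a jump table:

    #include <stdio.h>

    #define BPF_LD_W_ABS  0x20   /* BPF_LD|BPF_W|BPF_ABS */
    #define BPF_ALU_ADD_K 0x04   /* BPF_ALU|BPF_ADD|BPF_K */
    #define BPF_RET_K     0x06   /* BPF_RET|BPF_K */

    enum { SEQ_LD_W_ABS = 1, SEQ_ALU_ADD_K, SEQ_RET_K };

    static const unsigned char codes[256] = {
            [BPF_LD_W_ABS]  = SEQ_LD_W_ABS,
            [BPF_ALU_ADD_K] = SEQ_ALU_ADD_K,
            [BPF_RET_K]     = SEQ_RET_K,
    };

    int main(void)
    {
            /* check time: f->code = codes[f->code];
             * run time: switch on the dense code, whose case labels
             * 1..N are consecutive, so gcc emits a jump table. */
            printf("%u\n", codes[BPF_RET_K]);   /* 3 */
            return 0;
    }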

    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     

23 Apr, 2010

1 commit

  • Add an SKF_AD_HATYPE field to the packet ancillary data area, giving
    access to skb->dev->type, as reported in the sll_hatype field.

    When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
    interfaces, there doesn't appear to be a way for the filter program to
    actually find out the underlying hardware type the packet was captured
    on. This patch adds such ability.

    This patch also handles the case where skb->dev can be NULL, such as on
    netlink sockets.
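
    For example, a filter using the new field might look like this sketch
    (accepting only ARPHRD_ETHER frames; attach via SO_ATTACH_FILTER as
    usual):

    #include <linux/filter.h>
    #include <stdio.h>

    int main(void)
    {
            struct sock_filter insns[] = {
                    /* A = skb->dev->type, via the ancillary area */
                    { BPF_LD  | BPF_W   | BPF_ABS, 0, 0,
                      SKF_AD_OFF + SKF_AD_HATYPE },
                    /* accept if ARPHRD_ETHER (1), else drop */
                    { BPF_JMP | BPF_JEQ | BPF_K,   0, 1, 1 },
                    { BPF_RET | BPF_K,             0, 0, 0xffff },
                    { BPF_RET | BPF_K,             0, 0, 0 },
            };

            printf("%zu instructions\n", sizeof(insns) / sizeof(insns[0]));
            return 0;
    }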

    Signed-off-by: Paul Evans
    Signed-off-by: David S. Miller

    Paul LeoNerd Evans
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

01 Mar, 2010

1 commit


25 Feb, 2010

1 commit

  • Update rcu_dereference() primitives to use new lockdep-based
    checking. The rcu_dereference() in __in6_dev_get() may be
    protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
    The rcu_dereference() in __sk_free() is protected by the fact
    that it is never reached if an update could change it. Check
    for this by using rcu_dereference_check() to verify that the
    struct sock's ->sk_wmem_alloc counter is zero.

    Acked-by: Eric Dumazet
    Acked-by: David S. Miller
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

18 Feb, 2010

1 commit


20 Oct, 2009

2 commits

  • It can be helpful to filter packets based on their queue_mapping.

    If filter performance is not good, we could add a "numqueue" field
    in struct packet_type, so that netif_nit_deliver() and other functions
    can directly ignore packets with an unexpected queue number.

    Let's experiment with this simple filter extension first.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Allow bpf to set a filter to drop packets that don't
    match a specific mark.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    jamal
     

20 Nov, 2008

1 commit

  • SKF_AD_NLATTR allows us to find the first matching attribute in a
    stream of netlink attributes from one offset to the end of the
    netlink message. This is not suitable to look for a specific match
    inside a set of nested attributes.

    For example, in ctnetlink messages, if we look for the CTA_V6_SRC
    attribute in a message that talks about an IPv4 connection,
    SKF_AD_NLATTR returns the offset of CTA_STATUS, which has the same
    value as CTA_V6_SRC but outside the nest. To differentiate
    CTA_STATUS and CTA_V6_SRC, we would have to make assumptions on the
    size of the attribute and the usual offset, resulting in horrible
    BPF code.

    This patch adds SKF_AD_NLATTR_NEST, which is a variant of
    SKF_AD_NLATTR that looks for an attribute inside the limits of a
    nested attribute, but not further.

    This patch validates that we have enough room to look for the
    nested attributes - based on a suggestion from Patrick McHardy.

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

02 Jul, 2008

1 commit


03 May, 2008

1 commit


10 Apr, 2008

3 commits

  • SKF_AD_NLATTR searches for a netlink attribute, which avoids manually
    parsing and walking attributes. It takes the offset at which to start
    searching in the 'A' register and the attribute type in the 'X'
    register, and returns the offset in the 'A' register. When the
    attribute is not found it returns zero.

    A top-level attribute can be located using a filter like this
    (example for nfnetlink, using struct nfgenmsg):

    ...
    {
            /* A = offset of first attribute */
            .code = BPF_LD | BPF_IMM,
            .k = sizeof(struct nlmsghdr) + sizeof(struct nfgenmsg)
    },
    {
            /* X = CTA_PROTOINFO */
            .code = BPF_LDX | BPF_IMM,
            .k = CTA_PROTOINFO,
    },
    {
            /* A = netlink attribute offset */
            .code = BPF_LD | BPF_B | BPF_ABS,
            .k = SKF_AD_OFF + SKF_AD_NLATTR
    },
    {
            /* Exit if not found */
            .code = BPF_JMP | BPF_JEQ | BPF_K,
            .k = 0,
            .jt =
    },
    ...

    A nested attribute below the CTA_PROTOINFO attribute would then
    be parsed like this:

    ...
    {
            /* A += sizeof(struct nlattr) */
            .code = BPF_ALU | BPF_ADD | BPF_K,
            .k = sizeof(struct nlattr),
    },
    {
            /* X = CTA_PROTOINFO_TCP */
            .code = BPF_LDX | BPF_IMM,
            .k = CTA_PROTOINFO_TCP,
    },
    {
            /* A = netlink attribute offset */
            .code = BPF_LD | BPF_B | BPF_ABS,
            .k = SKF_AD_OFF + SKF_AD_NLATTR
    },
    ...

    The data of an attribute can be loaded into 'A' like this:

    ...
    {
            /* X = A (attribute offset) */
            .code = BPF_MISC | BPF_TAX,
    },
    {
            /* A = skb->data[X + k] */
            .code = BPF_LD | BPF_B | BPF_IND,
            .k = sizeof(struct nlattr),
    },
    ...

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The sk_filter function is too big to be inlined. This saves 2296 bytes
    of text on allyesconfig.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Some minor style cleanups:
    * Move __KERNEL__ definitions to one place in filter.h
    * Use const for sk_filter_len
    * Line wrapping
    * Put EXPORT_SYMBOL next to function definition

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

19 Oct, 2007

1 commit

  • Looks like this might be causing problems, at least for me on ppc.
    This happened during a normal boot, right around the first interface
    config/dhcp run:

    cpu 0x0: Vector: 300 (Data Access) at [c00000000147b820]
    pc: c000000000435e5c: .sk_filter_delayed_uncharge+0x1c/0x60
    lr: c0000000004360d0: .sk_attach_filter+0x170/0x180
    sp: c00000000147baa0
    msr: 9000000000009032
    dar: 4
    dsisr: 40000000
    current = 0xc000000004780fa0
    paca = 0xc000000000650480
    pid = 1295, comm = dhclient3
    0:mon> t
    [c00000000147bb20] c0000000004360d0 .sk_attach_filter+0x170/0x180
    [c00000000147bbd0] c000000000418988 .sock_setsockopt+0x788/0x7f0
    [c00000000147bcb0] c000000000438a74 .compat_sys_setsockopt+0x4e4/0x5a0
    [c00000000147bd90] c00000000043955c .compat_sys_socketcall+0x25c/0x2b0
    [c00000000147be30] c000000000007508 syscall_exit+0x0/0x40
    --- Exception: c01 (System Call) at 000000000ff618d8
    SP (fffdf040) is in userspace
    0:mon>

    I.e. null pointer deref at sk_filter_delayed_uncharge+0x1c:

    0:mon> di $.sk_filter_delayed_uncharge
    c000000000435e40 7c0802a6 mflr r0
    c000000000435e44 fbc1fff0 std r30,-16(r1)
    c000000000435e48 7c8b2378 mr r11,r4
    c000000000435e4c ebc2cdd0 ld r30,-12848(r2)
    c000000000435e50 f8010010 std r0,16(r1)
    c000000000435e54 f821ff81 stdu r1,-128(r1)
    c000000000435e58 380300a4 addi r0,r3,164
    c000000000435e5c 81240004 lwz r9,4(r4)

    That's the deref of fp:

    static void sk_filter_delayed_uncharge(struct sock *sk, struct sk_filter *fp)
    {
            unsigned int size = sk_filter_len(fp);
    ...

    That is called from sk_attach_filter():

    ...
            rcu_read_lock_bh();
            old_fp = rcu_dereference(sk->sk_filter);
            rcu_assign_pointer(sk->sk_filter, fp);
            rcu_read_unlock_bh();

            sk_filter_delayed_uncharge(sk, old_fp);
            return 0;
    ...

    So, looks like rcu_dereference() returned NULL. I don't know the
    filter code at all, but it seems like it might be a valid case?
    sk_detach_filter() seems to handle a NULL sk_filter, at least.

    So, this needs review by someone who knows the filter, but it fixes the
    problem for me:

    Signed-off-by: Olof Johansson
    Signed-off-by: David S. Miller

    Olof Johansson