27 Aug, 2014

1 commit


16 Jul, 2014

1 commit


10 Dec, 2013

1 commit


20 Nov, 2013

4 commits

  • Register generic netlink multicast groups as an array with
    the family and give them contiguous group IDs. Then instead
    of passing the global group ID to the various functions that
    send messages, pass the ID relative to the family; for most
    families that's just 0 because they only have one group.

    This avoids the list_head and ID in each group, adding a new
    field for the mcast group ID offset to the family.

    At the same time, this allows us to prevent abusing groups
    again like the quota and dropmon code did, since we can now
    check that a family only uses a group it owns.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This doesn't really change anything, but prepares for the
    next patch that will change the APIs to pass the group ID
    within the family, rather than the global group ID.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • The drop monitor code is abusing the genetlink API and is
    statically using the generic netlink multicast group 1, even
    if that group belongs to somebody else (which it invariably
    will, since it's not reserved.)

    Make the drop monitor code use the proper APIs to reserve a
    group ID, but also reserve group ID 1 in the generic netlink
    code to preserve the userspace API. Since drop monitor can
    be a module, don't clear the bit for it on unregistration.

    Acked-by: Neil Horman
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • As suggested by David Miller, make genl_register_family_with_ops()
    a macro and pass only the array, evaluating ARRAY_SIZE() in the
    macro, this is a little safer.

    The openvswitch code has some indirection; assign ops/n_ops directly in
    that code. This might ultimately just assign the pointers in the
    family initializations, saving the struct genl_family_and_ops and
    code (once mcast groups are handled differently.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
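The multicast-group changes in this series can be sketched as a small userspace model in C. This is a hypothetical illustration, not the kernel's genetlink code. The following are all assumptions:

- `struct family`, `register_family()` and `family_group_id()`
- `DROPMON_GROUP_ID`, the 32-ID limit and the first-fit search
- the drop monitor family name being "NET_DM"

The model shows four things:

- family-relative group IDs resolved through a per-family offset
- the ownership check
- the reserved ID 1, which survives unregistration
- the ARRAY_SIZE()-inside-a-macro pattern (applied here to the group array rather than to the ops array)

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model of the scheme; not the kernel's genetlink API. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define MAX_GROUP_IDS 32   /* assumed size of the global ID space */
#define DROPMON_GROUP_ID 1 /* reserved so the old userspace ABI keeps working */

struct mcast_group {
	const char *name;
};

struct family {
	const char *name;
	const struct mcast_group *mcgrps; /* groups owned by this family */
	unsigned int n_mcgrps;
	unsigned int mcgrp_offset;        /* global ID of the family's group 0 */
};

/* ID 0 is never handed out; ID 1 is pre-reserved for the drop monitor. */
static unsigned char id_used[MAX_GROUP_IDS] = { [0] = 1, [DROPMON_GROUP_ID] = 1 };

static int register_family(struct family *fam, const char *name,
			   const struct mcast_group *grps, unsigned int n)
{
	fam->name = name;
	fam->mcgrps = grps;
	fam->n_mcgrps = n;
	if (strcmp(name, "NET_DM") == 0 && n == 1) {
		fam->mcgrp_offset = DROPMON_GROUP_ID; /* fixed, legacy ID */
		return 0;
	}
	/* First fit: find n contiguous free IDs so groups are contiguous. */
	for (unsigned int start = 1; start + n <= MAX_GROUP_IDS; start++) {
		unsigned int i = 0;
		while (i < n && !id_used[start + i])
			i++;
		if (i == n) {
			for (i = 0; i < n; i++)
				id_used[start + i] = 1;
			fam->mcgrp_offset = start;
			return 0;
		}
	}
	fam->n_mcgrps = 0;
	return -1;
}

/* The array length is computed inside the macro, so callers cannot pass a wrong count. */
#define register_family_with_groups(fam, name, grps) \
	register_family((fam), (name), (grps), ARRAY_SIZE(grps))

/* Translate a family-relative group to a global ID; reject groups the family does not own. */
static int family_group_id(const struct family *fam, unsigned int group)
{
	if (group >= fam->n_mcgrps)
		return -1;
	return (int)(fam->mcgrp_offset + group);
}

static void unregister_family(struct family *fam)
{
	for (unsigned int i = 0; i < fam->n_mcgrps; i++) {
		unsigned int id = fam->mcgrp_offset + i;
		if (id != DROPMON_GROUP_ID) /* the reserved ID stays taken */
			id_used[id] = 0;
	}
	fam->n_mcgrps = 0;
}
```

In this model, consider a family with two groups that lands at offset 2. It sends to global IDs 2 and 3 by passing 0 and 1. Passing 2 is rejected because the family does not own a third group.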
     

15 Nov, 2013

1 commit

  • Now that genl_ops are no longer modified in place when
    registering, they can be made const. This patch was done
    mostly with spatch:

    @@
    identifier ops;
    @@
    +const
    struct genl_ops ops[] = {
    ...
    };

    (except the struct thing in net/openvswitch/datapath.c)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

29 May, 2013

1 commit

  • So far, only net_device * could be passed along with a netdevice notifier
    event. This patch makes it possible to pass a custom structure that can
    carry whatever information the event listener needs.

    Signed-off-by: Jiri Pirko

    v2->v3: fix typo on simeth
    shortened dev_getter
    shortened notifier_info struct name
    v1->v2: fix notifier_call parameter in call_netdevice_notifier()
    Signed-off-by: David S. Miller

    Jiri Pirko
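Here is a minimal userspace sketch of the notifier-info idea. It uses hypothetical names rather than the kernel's own: `struct notifier_info`, `struct change_info`, `notifier_info_to_dev()` and `EV_CHANGE`.

- Every event passes a pointer to a small info struct.
- Listeners fetch the device through a getter.
- Event-specific data is recovered by embedding the base struct and using container_of().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; names only mirror the idea in the commit. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct net_device {
	const char *name;
};

/* Base info that every listener can rely on. */
struct notifier_info {
	struct net_device *dev;
};

/* Event-specific info embeds the base struct. */
struct change_info {
	struct notifier_info info;
	unsigned int flags_changed;
};

enum { EV_UP = 1, EV_CHANGE = 2 };

static struct net_device *notifier_info_to_dev(const struct notifier_info *info)
{
	return info->dev;
}

static const char *last_dev_name;
static unsigned int last_flags_changed;

/* Listener: reads the device via the getter, and the extra data only for events that carry it. */
static int listener(unsigned long event, void *ptr)
{
	struct notifier_info *info = ptr;

	assert(info != NULL && info->dev != NULL);
	last_dev_name = notifier_info_to_dev(info)->name;
	if (event == EV_CHANGE)
		last_flags_changed =
			container_of(info, struct change_info, info)->flags_changed;
	return 0;
}
```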
     

04 Jun, 2012

1 commit

  • drop_monitor calls several sleeping functions while in atomic context.

    BUG: sleeping function called from invalid context at mm/slub.c:943
    in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
    Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
    Call Trace:
    [] __might_sleep+0xca/0xf0
    [] kmem_cache_alloc_node+0x1b3/0x1c0
    [] ? queue_delayed_work_on+0x11c/0x130
    [] __alloc_skb+0x4b/0x230
    [] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
    [] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
    [] send_dm_alert+0x4b/0xb0 [drop_monitor]
    [] process_one_work+0x130/0x4c0
    [] worker_thread+0x159/0x360
    [] ? manage_workers.isra.27+0x240/0x240
    [] kthread+0x93/0xa0
    [] kernel_thread_helper+0x4/0x10
    [] ? kthread_freezable_should_stop+0x80/0x80
    [] ? gs_change+0xb/0xb

    Rework the logic to call the sleeping functions in the right context.

    Use the standard timer/workqueue API to let the system choose any CPU to
    perform the allocation and netlink send.

    Also avoid a loop if reset_per_cpu_data() cannot allocate memory:
    use mod_timer() to wait 1/10 second before the next try.

    Signed-off-by: Eric Dumazet
    Cc: Neil Horman
    Reviewed-by: Neil Horman
    Signed-off-by: David S. Miller

    Eric Dumazet
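The rework can be modelled roughly as follows. This is a single-threaded sketch under stated assumptions, not the drop_monitor code:

- `struct dm_state`, `dm_timer_fn()` and `dm_work_fn()` are hypothetical.
- The `alloc_ok` flag simulates an allocation failure.
- `RETRY_DELAY_MS` stands in for re-arming the timer 1/10 second later with mod_timer().

The timer side only queues work. The allocation happens in the worker, which re-arms the timer instead of looping when memory is short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-threaded model; the timer path must not allocate. */
#define RETRY_DELAY_MS 100 /* 1/10 second, as with mod_timer(..., jiffies + HZ / 10) */

struct dm_state {
	bool work_pending; /* set by the timer, consumed by the worker */
	long next_timer_ms; /* -1 means no timer armed */
	void *buf;          /* current buffer collecting drop records */
	int swaps;          /* buffers successfully replaced (old one "sent") */
};

/* Timer callback: atomic context, so it only queues the work item. */
static void dm_timer_fn(struct dm_state *s)
{
	s->work_pending = true;
}

/* Worker: process context, so allocation is allowed here. */
static void dm_work_fn(struct dm_state *s, long now_ms, bool alloc_ok)
{
	void *fresh;

	assert(s->work_pending); /* only runs after being queued */
	s->work_pending = false;
	fresh = alloc_ok ? malloc(64) : NULL;
	if (!fresh) {
		/* Do not spin: keep the old buffer and retry later. */
		s->next_timer_ms = now_ms + RETRY_DELAY_MS;
		return;
	}
	free(s->buf); /* stands in for sending and releasing the old buffer */
	s->buf = fresh;
	s->swaps++;
	s->next_timer_ms = -1;
}
```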
     

30 May, 2012

1 commit


18 May, 2012

1 commit

  • When I first wrote drop monitor I wrote it to just build monolithically. There
    is no reason it can't be built modularly as well, so let's give it that
    flexibility.

    I've tested this by building it both as a module and monolithically, and it
    seems to work quite well.

    Change notes:

    v2)
    * fixed for_each_present_cpu loops to be more correct as per Eric D.
    * Converted exit path failures to BUG_ON as per Ben H.

    v3)
    * Converted del_timer to del_timer_sync to close race noted by Ben H.

    Signed-off-by: Neil Horman
    CC: "David S. Miller"
    CC: Eric Dumazet
    CC: Ben Hutchings
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Neil Horman
     

17 May, 2012

1 commit

  • Use the current logging style.

    This enables use of dynamic debugging as well.

    Convert printk(KERN_ to pr_.
    Add pr_fmt. Remove embedded prefixes, use
    %s, __func__ instead.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
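The pr_fmt convention can be illustrated in userspace C. Some parts differ from the kernel:

- `pr_info_buf()` is a hypothetical stand-in that formats into a buffer instead of the kernel log.
- `KBUILD_MODNAME` is defined by hand here, although kbuild normally supplies it.
- The `##__VA_ARGS__` form is a GNU extension, as used in the kernel.

Every message picks up the module prefix from pr_fmt(), and `__func__` replaces a hand-written function name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Normally supplied by kbuild; defined here so the sketch stands alone. */
#define KBUILD_MODNAME "drop_monitor"
/* Kernel convention: define pr_fmt() before using the pr_* helpers. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical helper: like pr_info(), but writes into buf so it can be inspected. */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

static int report_alloc_failure(char *buf, size_t len)
{
	assert(buf != NULL);
	/* "%s", __func__ instead of an embedded prefix, as the commit describes */
	return pr_info_buf(buf, len, "%s: failed to allocate\n", __func__);
}
```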
     

03 May, 2012

1 commit

  • I just noticed after some recent updates that the init path for the drop
    monitor protocol has a minor error. Drop monitor maintains a per-cpu structure
    that gets initialized from a single cpu. Normally this is fine, as the protocol
    isn't in use yet, but I recently made a change that causes a failed skb
    allocation to reschedule itself. Given the current code, the implication is
    that this workqueue reschedule will take place on the wrong cpu. If drop
    monitor is used early during the boot process, it's possible that two cpus will
    access a single per-cpu structure in parallel, possibly leading to data
    corruption.

    This patch fixes the situation by storing the cpu number that a given instance
    of this per-cpu data should be accessed from. When a reschedule is needed,
    it is assigned to the cpu stored in the struct, rather than to the
    currently executing cpu.

    Tested successfully by myself.

    Signed-off-by: Neil Horman
    CC: David Miller
    Signed-off-by: David S. Miller

    Neil Horman
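Here is a sketch of the fix. It assumes a hypothetical `struct per_cpu_dm_data` with only the fields needed here, a fixed `NR_CPUS` of 4, and a `schedule_on()` helper standing in for schedule_work_on(). The owning CPU is recorded at init time, and a reschedule targets that recorded CPU instead of whichever CPU happens to be running.

```c
#include <assert.h>

#define NR_CPUS 4 /* assumed for the model */

/* Hypothetical model of the per-CPU drop monitor data. */
struct per_cpu_dm_data {
	int cpu;       /* CPU that owns this instance, recorded at init */
	int queued_on; /* CPU targeted by the last reschedule, -1 if none */
};

static struct per_cpu_dm_data dm_cpu_data[NR_CPUS];

/* Init may run entirely on one CPU, so record each owner explicitly. */
static void dm_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		dm_cpu_data[cpu].cpu = cpu;
		dm_cpu_data[cpu].queued_on = -1;
	}
}

/* Stand-in for schedule_work_on(): just remembers the target CPU. */
static void schedule_on(int cpu, struct per_cpu_dm_data *data)
{
	data->queued_on = cpu;
}

/* After a failed allocation, reschedule on the owning CPU, not the current one. */
static void dm_reschedule(struct per_cpu_dm_data *data, int current_cpu)
{
	(void)current_cpu; /* deliberately ignored: may differ from data->cpu */
	schedule_on(data->cpu, data);
}
```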
     

28 Apr, 2012

2 commits

  • Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
    its SMP protections. Specifically, it's possible to replace data->skb while it's
    being written. This patch corrects that by making data->skb an RCU-protected
    variable. That will prevent it from being overwritten while a tracepoint is
    modifying it.

    Signed-off-by: Neil Horman
    Reported-by: Eric Dumazet
    CC: David Miller
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
     
  • Eric Dumazet pointed out this warning in the drop_monitor protocol to me:

    [ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
    [ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
    [ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
    [ 38.352582] Call Trace:
    [ 38.352592] [] ? trace_napi_poll_hit+0xd0/0xd0
    [ 38.352599] [] __might_sleep+0xca/0xf0
    [ 38.352606] [] mutex_lock+0x26/0x50
    [ 38.352610] [] ? trace_napi_poll_hit+0xd0/0xd0
    [ 38.352616] [] tracepoint_probe_register+0x29/0x90
    [ 38.352621] [] set_all_monitor_traces+0x105/0x170
    [ 38.352625] [] net_dm_cmd_trace+0x2a/0x40
    [ 38.352630] [] genl_rcv_msg+0x21a/0x2b0
    [ 38.352636] [] ? zone_statistics+0x99/0xc0
    [ 38.352640] [] ? genl_rcv+0x30/0x30
    [ 38.352645] [] netlink_rcv_skb+0xa9/0xd0
    [ 38.352649] [] genl_rcv+0x20/0x30
    [ 38.352653] [] netlink_unicast+0x1ae/0x1f0
    [ 38.352658] [] netlink_sendmsg+0x2b6/0x310
    [ 38.352663] [] sock_sendmsg+0x10f/0x130
    [ 38.352668] [] ? move_addr_to_kernel+0x60/0xb0
    [ 38.352673] [] ? verify_iovec+0x64/0xe0
    [ 38.352677] [] __sys_sendmsg+0x386/0x390
    [ 38.352682] [] ? handle_mm_fault+0x139/0x210
    [ 38.352687] [] ? do_page_fault+0x1ec/0x4f0
    [ 38.352693] [] ? set_next_entity+0x9d/0xb0
    [ 38.352699] [] ? tty_ldisc_deref+0x9/0x10
    [ 38.352703] [] ? pick_next_task_fair+0x63/0x140
    [ 38.352708] [] sys_sendmsg+0x44/0x80
    [ 38.352713] [] system_call_fastpath+0x16/0x1b

    It stems from holding a spinlock (trace_state_lock) while attempting to register
    or unregister tracepoint hooks, making in_atomic() true in this context, leading
    to the warning when the tracepoint code calls might_sleep() while it's taking a mutex.
    Since we only use the trace_state_lock to prevent trace protocol state races, as
    well as hardware stat list updates on an RCU write side, we can just convert the
    spinlock to a mutex to avoid this problem.

    Signed-off-by: Neil Horman
    Reported-by: Eric Dumazet
    CC: David Miller
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
     

22 Apr, 2012

1 commit

  • It seems there is a logic error in trace_drop_common(), since we store
    only 64 drops, even if they are from the same location.

    This fix is a one-liner, but we probably need more work to avoid useless
    atomic dec/inc.

    Now I can watch 1 Mpps drops through dropwatch...

    Signed-off-by: Eric Dumazet
    Cc: Neil Horman
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Eric Dumazet
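The intended accounting can be sketched as below. This models the behavior the commit describes, not the actual one-line patch; `struct drop_alert`, `record_drop()` and `DM_MAX_ENTRIES` are hypothetical. A repeated drop at a known location only increments that entry's counter, so only new locations use up one of the 64 slots.

```c
#include <assert.h>
#include <stddef.h>

#define DM_MAX_ENTRIES 64 /* the "64 drops" limit mentioned in the commit */

struct drop_point {
	const void *pc;     /* drop location */
	unsigned int count; /* drops seen at that location */
};

/* Hypothetical stand-in for one pending alert message. */
struct drop_alert {
	struct drop_point points[DM_MAX_ENTRIES];
	unsigned int n_points;
};

/*
 * Record a drop at pc. Repeated drops at a known location only bump its
 * counter; only a new location consumes a slot.
 * Returns 0 on success, -1 if no slot is left for a new location.
 */
static int record_drop(struct drop_alert *msg, const void *pc)
{
	assert(msg != NULL);
	for (unsigned int i = 0; i < msg->n_points; i++) {
		if (msg->points[i].pc == pc) {
			msg->points[i].count++;
			return 0;
		}
	}
	if (msg->n_points == DM_MAX_ENTRIES)
		return -1;
	msg->points[msg->n_points].pc = pc;
	msg->points[msg->n_points].count = 1;
	msg->n_points++;
	return 0;
}
```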
     

08 May, 2011

1 commit


22 Mar, 2011

1 commit


27 Jul, 2010

1 commit


21 Jul, 2010

1 commit

  • Patch to add an -EAGAIN error to the dropwatch netlink message handling code.
    -EAGAIN will be returned any time userspace attempts to transition the state of
    the drop monitor service to a state that it's already in. That allows user space
    to detect this condition, so it doesn't wait for a success ACK that will never
    arrive. Tested successfully by me.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
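Here is a minimal sketch of the state check. It uses a hypothetical `set_trace_state()` and a two-value state enum. A request for the state the service is already in returns -EAGAIN instead of succeeding silently, so the caller is told that no transition happened.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the drop monitor's on/off state. */
enum dm_trace_state { TRACE_OFF, TRACE_ON };

static enum dm_trace_state trace_state = TRACE_OFF;

/* 0 on a real transition, -EAGAIN if already in the requested state. */
static int set_trace_state(enum dm_trace_state want)
{
	assert(want == TRACE_OFF || want == TRACE_ON);
	if (trace_state == want)
		return -EAGAIN;
	trace_state = want;
	return 0;
}
```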
     

14 May, 2010

1 commit

  • This patch adds data to be passed to tracepoint callbacks.

    The created functions from DECLARE_TRACE() now need a mandatory data
    parameter. For example:

    DECLARE_TRACE(mytracepoint, int value, value)

    Will create the register function:

    int register_trace_mytracepoint(void (*probe)(void *data, int value),
    void *data);

    As the first argument, all callbacks (probes) must take a (void *data)
    parameter. So a callback for the above tracepoint will look like:

    void myprobe(void *data, int value)
    {
    }

    The callback may choose to ignore the data parameter.

    This change allows callbacks to register a private data pointer along
    with the function probe.

    void mycallback(void *data, int value);

    register_trace_mytracepoint(mycallback, mydata);

    Then the mycallback() will receive the "mydata" as the first parameter
    before the args.

    A more detailed example:

    DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    /* In the C file */

    DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    [...]

    trace_mytracepoint(status);

    /* In a file registering this tracepoint */

    void my_callback(void *data, int status)
    {
    struct my_struct *my_data = data;
    [...]
    }

    [...]
    my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
    init_my_data(my_data);
    register_trace_mytracepoint(my_callback, my_data);

    The same callback can also be registered to the same tracepoint as long
    as the data registered is different. Note, the data must also be used
    to unregister the callback:

    unregister_trace_mytracepoint(my_callback, my_data);

    Because of the data parameter, tracepoints declared this way cannot have
    zero arguments. That is:

    DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

    will cause an error.

    If no arguments are needed, a new macro can be used instead:

    DECLARE_TRACE_NOARGS(mytracepoint);

    Since there are no arguments, the proto and args fields are left out.

    This is part of a series to make the tracepoint footprint smaller:

    text     data     bss     dec      hex     filename
    4913961  1088356  861512  6863829  68bbd5  vmlinux.orig
    4914025  1088868  861512  6864405  68be15  vmlinux.class
    4918492  1084612  861512  6864616  68bee8  vmlinux.tracepoint

    Again, this patch also increases the size of the kernel, but
    lays the ground work for decreasing it.

    v5: Fixed net/core/drop_monitor.c to handle these updates.

    v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
    #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
    cases. The __DECLARE_TRACE() is what changes.
    Thanks to Frederic Weisbecker for pointing this out.

    v3: Made all register_* functions require data to be passed and
    all callbacks to take a void * parameter as its first argument.
    This makes the calling functions comply with C standards.

    Also added more comments to the modifications of DECLARE_TRACE().

    v2: Made the DECLARE_TRACE() have the ability to pass arguments
    and added a new DECLARE_TRACE_NOARGS() for tracepoints that
    do not need any arguments.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Neil Horman
    Cc: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
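The registration model described in this commit can be sketched in userspace C. Everything here is hypothetical: `register_trace_mytp()`, `unregister_trace_mytp()`, `trace_mytp()`, the fixed probe table, and returning -1 on errors. The sketch only illustrates four points:

- Each probe is stored together with its data pointer.
- The probe receives that pointer as its first argument.
- The same probe may be registered more than once with different data.
- A probe is unregistered by the same (probe, data) pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a tracepoint with per-probe private data. */
typedef void (*mytp_probe_fn)(void *data, int value);

struct probe_entry {
	mytp_probe_fn fn;
	void *data;
};

#define MAX_PROBES 8
static struct probe_entry probes[MAX_PROBES];
static int n_probes;

/* The same function may be registered again as long as its data differs. */
static int register_trace_mytp(mytp_probe_fn fn, void *data)
{
	assert(fn != NULL);
	if (n_probes == MAX_PROBES)
		return -1;
	for (int i = 0; i < n_probes; i++)
		if (probes[i].fn == fn && probes[i].data == data)
			return -1; /* exact duplicate */
	probes[n_probes++] = (struct probe_entry){ fn, data };
	return 0;
}

/* Unregistering needs the same (fn, data) pair that was registered. */
static int unregister_trace_mytp(mytp_probe_fn fn, void *data)
{
	for (int i = 0; i < n_probes; i++) {
		if (probes[i].fn == fn && probes[i].data == data) {
			probes[i] = probes[--n_probes];
			return 0;
		}
	}
	return -1;
}

/* The call site passes only the tracepoint args; each probe gets its data first. */
static void trace_mytp(int value)
{
	for (int i = 0; i < n_probes; i++)
		probes[i].fn(probes[i].data, value);
}

/* Example probe: adds each value to the int counter passed as data. */
static void sum_probe(void *data, int value)
{
	int *sum = data;
	*sum += value;
}
```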
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. gfp.h if only gfp is used,
    slab.h if slab is used.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    7, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

17 Feb, 2010

1 commit


07 Nov, 2009

1 commit


03 Sep, 2009

1 commit

  • It was recently pointed out to me that the last_rx field of the
    net_device structure wasn't updated regularly. In fact only the
    bonding driver really uses it currently. Since the drop_monitor code
    relies on the last_rx field to detect drops on receive in hardware, we
    need to find a more reliable way to rate limit our drop checks (so
    that we don't check for drops on every frame received, which would be
    inefficient). This patch makes a last_rx timestamp that is private to
    the drop monitor code and is updated for every device that we track.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
     

02 Sep, 2009

1 commit


15 Jun, 2009

1 commit


22 May, 2009

1 commit

  • Patch to add the ability to detect drops in hardware interfaces via dropwatch.
    Adds a tracepoint to net_rx_action to signal every time a napi instance is
    polled. The dropmon code then periodically checks to see if the rx_frames
    counter has changed, and if so, adds a drop notification to the netlink
    protocol, using the reserved all-0's vector to indicate the drop location was in
    hardware, rather than somewhere in the code.

    Signed-off-by: Neil Horman

    include/linux/net_dropmon.h | 8 ++
    include/trace/napi.h | 11 +++
    net/core/dev.c | 5 +
    net/core/drop_monitor.c | 124 ++++++++++++++++++++++++++++++++++++++++++--
    net/core/net-traces.c | 4 +
    net/core/netpoll.c | 2
    6 files changed, 149 insertions(+), 5 deletions(-)
    Signed-off-by: David S. Miller

    Neil Horman
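Here is a sketch of the periodic hardware check, under these assumptions:

- `struct hw_drop_stat`, `struct drop_record`, `check_hw_drops()` and the 8-byte location field are hypothetical.
- The `counter` argument stands for whichever per-device statistic the hook samples.

When the counter has moved since the last check, the sketch emits a record whose location is all zeros. That is the reserved value meaning "dropped in hardware".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-device bookkeeping. */
struct hw_drop_stat {
	unsigned long last_seen; /* counter value at the previous check */
};

/* Hypothetical notification record; an all-zero location means "in hardware". */
struct drop_record {
	unsigned char location[8];
	unsigned long count;
};

/* Returns 1 and fills rec if the counter moved since the last check, else 0. */
static int check_hw_drops(struct hw_drop_stat *stat, unsigned long counter,
			  struct drop_record *rec)
{
	unsigned long delta = counter - stat->last_seen;

	assert(rec != NULL);
	stat->last_seen = counter;
	if (delta == 0)
		return 0;
	memset(rec->location, 0, sizeof(rec->location)); /* reserved all-zero location */
	rec->count = delta;
	return 1;
}
```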
     

27 Apr, 2009

1 commit

  • When I initially implemented this protocol, I disregarded the use of netlink
    attribute headers, thinking for my purposes they weren't needed. I've come to
    find out that, as I'm starting to work with sending down messages with
    associated data (like config messages), the kernel code spits out warnings about
    trailing data in a netlink skb that doesn't have an associated header on it. As
    such, I'm going to start including attribute headers in my netlink transactions,
    and so for completeness, I should likely include them on messages bound from the
    kernel to user space. This patch adds that header to the kernel, and bumps the
    protocol version accordingly.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
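For reference, a netlink attribute is a small TLV: a 4-byte header carrying a 16-bit length and a 16-bit type, followed by the payload, padded to a 4-byte boundary. The sketch redefines the header as `struct nlattr_hdr`, mirroring the layout of struct nlattr in <linux/netlink.h>, so it builds without kernel headers. `put_attr()` is a hypothetical helper, not the kernel's nla_put().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr; redefined so the sketch builds anywhere. */
struct nlattr_hdr {
	uint16_t nla_len;  /* header + payload, excluding trailing padding */
	uint16_t nla_type;
};

#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN ((int)NLA_ALIGN(sizeof(struct nlattr_hdr)))

/* Append one attribute at buf + off; returns the new offset, or -1 if it does not fit. */
static int put_attr(unsigned char *buf, int size, int off, uint16_t type,
		    const void *payload, int plen)
{
	int total;
	struct nlattr_hdr hdr;

	assert(plen >= 0);
	total = NLA_ALIGN(NLA_HDRLEN + plen);
	if (off + total > size)
		return -1;
	hdr.nla_len = (uint16_t)(NLA_HDRLEN + plen);
	hdr.nla_type = type;
	memset(buf + off, 0, total); /* zero the padding bytes too */
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + NLA_HDRLEN, payload, plen);
	return off + total;
}
```

A 4-byte payload therefore occupies 8 bytes. A 3-byte payload also occupies 8 bytes on the wire, but its header records `nla_len` as 7.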
     

15 Apr, 2009

1 commit

  • Impact: clean up

    Create a sub directory in include/trace called events to keep the
    trace point headers in their own separate directory. Only headers that
    declare trace points should be defined in this directory.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Neil Horman
    Cc: Zhao Lei
    Cc: Eduard - Gabriel Munteanu
    Cc: Pekka Enberg
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 Mar, 2009

1 commit