25 Jul, 2010

1 commit

  • This fixes a hang when the target device of a mirred packet classifier
    action is removed.

    If a mirror or redirection action is configured to cause packets
    to go to another device, the classifier holds a ref count, but it assumed
    the administrator cleaned up all redirections before removing the device.
    The fix is to add a notifier and clean up during unregister.

    The new list is implicitly protected by the RTNL mutex because
    it is held during filter add/delete as well as during the notifier.

    Signed-off-by: Stephen Hemminger
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    stephen hemminger
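
The cleanup described above can be sketched in userspace C. This is only an illustration of the idea, not the kernel code: `struct device`, `struct mirred`, `mirred_add()` and `mirred_device_unregister()` are hypothetical stand-ins, and the locking that RTNL provides in the kernel is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a network device with a reference count. */
struct device {
    int refcnt;
};

/* Hypothetical stand-in for one mirror/redirect action. */
struct mirred {
    struct device *dev;     /* target device, NULL once released */
    struct mirred *next;    /* link in the list of all mirred actions */
};

/* List of every configured action (RTNL would protect this in the kernel). */
static struct mirred *mirred_list;

/* Configure an action: take a reference on the target and track the action. */
void mirred_add(struct mirred *m, struct device *dev)
{
    dev->refcnt++;
    m->dev = dev;
    m->next = mirred_list;
    mirred_list = m;
}

/*
 * Unregister notifier: walk the list and drop every reference that still
 * points at dev, so the device removal no longer waits forever.
 * Returns the number of actions that were detached.
 */
int mirred_device_unregister(struct device *dev)
{
    int released = 0;

    for (struct mirred *m = mirred_list; m != NULL; m = m->next) {
        if (m->dev == dev) {
            dev->refcnt--;
            m->dev = NULL;
            released++;
        }
    }
    return released;
}
```

After the notifier runs, an action whose `dev` is NULL has nothing to forward to; how such packets are handled is outside this sketch.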
     

13 Jul, 2010

1 commit

  • Not all ICMP packets need an IP header payload, so we check the length
    of the skbs only when the packets should have an IP header payload.

    Based upon analysis and initial patch by Rodrigo Partearroyo González.

    Signed-off-by: Changli Gao
    Acked-by: Herbert Xu
    ----
    net/sched/act_nat.c | 5 ++++-
    1 file changed, 4 insertions(+), 1 deletion(-)
    Signed-off-by: David S. Miller

    Changli Gao
     

17 Jun, 2010

1 commit

  • https://bugzilla.kernel.org/show_bug.cgi?id=16183

    The sch_teql module, which can be used to load balance over a set of
    underlying interfaces, stopped working after 2.6.30 and has been
    broken in all kernels since then for any underlying interface which
    requires the addition of link level headers.

    The problem is that the transmit routine relies on being able to
    access the destination address in the skb in order to do address
    resolution once it has decided which underlying interface it is going
    to transmit through.

    In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by
    default for all interfaces, which causes the destination address to be
    released before the transmit routine for the interface is called.

    The solution is to clear that flag for teql interfaces.

    Signed-off-by: Tom Hughes
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Hughes
     

03 Jun, 2010

1 commit

  • access skb->data safely

    we should use skb_header_pointer() and skb_store_bits() to access skb->data to
    handle small or non-linear skbs.

    Signed-off-by: Changli Gao
    ----
    net/sched/act_pedit.c | 24 ++++++++++++++----------
    1 file changed, 14 insertions(+), 10 deletions(-)
    Signed-off-by: David S. Miller

    Changli Gao
     

02 Jun, 2010

2 commits

  • use skb_header_pointer() to dereference data safely

    the original skb->data dereference isn't safe, as there isn't any skb->len or
    skb_is_nonlinear() check. skb_header_pointer() is used instead in this patch.
    And when the skb isn't long enough, we terminate the function u32_classify()
    immediately with -1.

    Signed-off-by: Changli Gao
    Signed-off-by: David S. Miller

    Changli Gao
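
The bounds-checked access pattern can be sketched in userspace C. This is an illustration under assumptions, not the kernel helper: `struct packet` (a linear head plus one fragment) and `header_pointer()` are hypothetical names that mimic the idea of skb_header_pointer(), which either points into contiguous data or copies into a caller-supplied buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical two-part packet: a linear head plus one fragment. */
struct packet {
    const unsigned char *head;
    size_t head_len;
    const unsigned char *frag;
    size_t frag_len;
};

/*
 * Return a pointer to len bytes starting at offset, or NULL when the packet
 * is too short. Bytes that are not contiguous in the head are copied into
 * buf, which must hold at least len bytes.
 */
const void *header_pointer(const struct packet *p, size_t offset,
                           size_t len, void *buf)
{
    size_t total = p->head_len + p->frag_len;
    unsigned char *out = buf;
    size_t copied = 0;

    if (offset > total || len > total - offset)
        return NULL;                    /* packet too short */

    if (offset + len <= p->head_len)
        return p->head + offset;        /* fully inside the linear head */

    if (offset < p->head_len) {         /* range starts in the head */
        copied = p->head_len - offset;
        memcpy(out, p->head + offset, copied);
        offset = p->head_len;
    }
    memcpy(out + copied, p->frag + (offset - p->head_len), len - copied);
    return buf;
}
```

A caller in the style of u32_classify() would treat a NULL return as "packet too short" and stop classification with -1 instead of dereferencing.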
     
  • fix the wrong checksum when addr isn't in old_addr/mask

    For TCP and UDP packets, when addr isn't in old_addr/mask we don't do SNAT or
    DNAT, and we should not update layer 4 checksum.

    Signed-off-by: Changli Gao
    ----
    net/sched/act_nat.c | 4 ++++
    1 file changed, 4 insertions(+)
    Signed-off-by: David S. Miller

    Changli Gao
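
A minimal sketch of the rule, assuming a hypothetical `struct nat_rule` and helper names that are not taken from act_nat.c: the address is rewritten only when it falls inside old_addr/mask, and the caller touches the layer 4 checksum only when a rewrite actually happened.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical NAT rule: map addresses in old_addr/mask onto new_addr. */
struct nat_rule {
    uint32_t old_addr;
    uint32_t new_addr;
    uint32_t mask;
};

/* True when addr falls inside old_addr/mask. */
bool nat_matches(const struct nat_rule *r, uint32_t addr)
{
    return ((addr ^ r->old_addr) & r->mask) == 0;
}

/*
 * Rewrite *addr if it matches the rule, keeping the host bits.
 * The return value tells the caller whether the TCP/UDP checksum needs
 * updating; a packet that did not match must be left untouched.
 */
bool nat_translate(const struct nat_rule *r, uint32_t *addr)
{
    if (!nat_matches(r, *addr))
        return false;

    *addr = (*addr & ~r->mask) | (r->new_addr & r->mask);
    return true;
}
```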
     

24 May, 2010

2 commits

  • Up until now cls_cgroup has relied on fetching the classid out of
    the currently executing thread. This runs into trouble when packet
    processing is delayed, in which case it may execute out of another
    thread's context.

    Furthermore, even when a packet is not delayed we may fail to
    classify it if soft IRQs have been disabled, because this scenario
    is indistinguishable from one where a packet unrelated to the
    current thread is processed by a real soft IRQ.

    In fact, the current semantics is inherently broken, as a single
    skb may be constructed out of the writes of two different tasks.
    A different manifestation of this problem is when the TCP stack
    transmits in response to an incoming ACK. This is currently
    unclassified.

    As we already have a concept of packet ownership for accounting
    purposes in the skb->sk pointer, this is a natural place to store
    the classid in a persistent manner.

    This patch adds the cls_cgroup classid in struct sock, filling up
    an existing hole on 64-bit :)

    The value is set at socket creation time. So all sockets created
    via socket(2) automatically gain the ID of the thread creating them.
    Whenever another process touches the socket by either reading or
    writing to it, we will change the socket classid to that of the
    process if it has a valid (non-zero) classid.

    For sockets created on inbound connections through accept(2), we
    inherit the classid of the original listening socket through
    sk_clone, possibly preceding the actual accept(2) call.

    In order to minimise risks, I have not made this the authoritative
    classid. For now it is only used as a backup when we execute
    with soft IRQs disabled. Once we're completely happy with its
    semantics we can use it as the sole classid.

    Footnote: I have rearranged the error path on cls_cgroup module
    creation. If we didn't do this, then there is a window where
    someone could create a tc rule using cls_cgroup before the cgroup
    subsystem has been registered.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
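
The classid handling described above can be sketched roughly as follows. The names `struct sock_sketch`, `sock_update_classid_sketch()` and `pick_classid()` are hypothetical, and returning 0 for "unclassified" is an assumption of this sketch rather than something the commit states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the classid field added to struct sock. */
struct sock_sketch {
    uint32_t classid;
};

/*
 * Called when a task reads from or writes to the socket: adopt the task's
 * classid only if that classid is valid (non-zero).
 */
void sock_update_classid_sketch(struct sock_sketch *sk, uint32_t task_classid)
{
    if (task_classid != 0)
        sk->classid = task_classid;
}

/*
 * Choose the classid for a packet. The current task is used when its context
 * can be trusted; otherwise the value stored in the socket is the backup.
 * A result of 0 means "unclassified" in this sketch.
 */
uint32_t pick_classid(bool task_context_reliable, uint32_t task_classid,
                      const struct sock_sketch *sk)
{
    if (task_context_reliable)
        return task_classid;
    return sk != NULL ? sk->classid : 0;
}
```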
     
  • Ben Pfaff reported a kernel oops and provided a test program to
    reproduce it.

    https://kerneltrap.org/mailarchive/linux-netdev/2010/5/21/6277805

    tc_fill_qdisc() should not be called for a builtin qdisc, or it
    dereferences a NULL pointer to get the device ifindex.

    Fix is to always use tc_qdisc_dump_ignore() before calling
    tc_fill_qdisc().

    Reported-by: Ben Pfaff
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 May, 2010

6 commits

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • If the user has a bad classification configuration and gets a packet
    that goes through too many steps, chances are more packets will arrive,
    and the message spew will overrun syslog because it is not rate limited.
    And because it is not tagged with an appropriate priority, it cannot be screened.

    Added the qdisc to the message to try and give some more context when
    the message does arrive.

    Signed-off-by: Stephen Hemminger
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    stephen hemminger
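
Rate limiting of a repeated warning can be sketched as a burst counter per time window. This is a generic illustration, not the kernel's limiter: `struct ratelimit` and `ratelimit_allow()` are hypothetical, and time is passed in as a plain integer tick.

```c
#include <stdbool.h>

/* Hypothetical limiter: at most burst messages per interval ticks. */
struct ratelimit {
    long interval;  /* window length in ticks */
    int burst;      /* messages allowed per window */
    long begin;     /* start of the current window */
    int printed;    /* messages emitted in the current window */
};

/* Return true when a message may be printed at time now. */
bool ratelimit_allow(struct ratelimit *rl, long now)
{
    if (now - rl->begin >= rl->interval) {
        /* A new window starts: forget the old count. */
        rl->begin = now;
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return false;   /* suppressed */

    rl->printed++;
    return true;
}
```

With a limiter like this, a misconfigured classifier that hits the loop limit on every packet produces a bounded number of log lines per window instead of one per packet.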
     
  • The previous patch encouraged me to go look at all the messages in
    the network scheduler and fix them. Many messages were missing
    any severity level. Some serious ones that should never happen
    were turned into WARN(), and the random noise messages for conditions
    that were already handled were changed to pr_debug().

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • When attaching filters to a class pointing to a class higher up in the
    hierarchy, classification may enter an endless loop. Currently this is
    prevented for filters that are already resolved, but not for filters
    resolved at runtime.

    Only allow filters to point downwards in the hierarchy, similar to what
    CBQ does.

    Reported-by: Pawel Staszewski
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
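
The "only point downwards" rule can be sketched as a level comparison at bind time. This assumes a hypothetical `struct sched_class` in which leaves have level 0 and every parent has a higher level than its children; the function name is illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class node: leaves are level 0, parents are higher. */
struct sched_class {
    unsigned int level;
};

/*
 * Decide whether a filter attached at parent may point at target.
 * Binding is refused unless target sits strictly below parent, which rules
 * out classification loops. A NULL parent means the filter is attached at
 * the root, where any target is acceptable.
 */
bool filter_bind_allowed(const struct sched_class *parent,
                         const struct sched_class *target)
{
    if (parent != NULL && parent->level <= target->level)
        return false;
    return true;
}
```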
     
  • Several netem users use TBF for rate control. But every time the parameters
    of TBF are changed it destroys the child qdisc, requiring reconfiguration.
    It is better to keep the child qdisc and just notify it of the changed limit.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • Use low order bit of skb->_skb_dst to tell dst is not refcounted.

    Change _skb_dst to _skb_refdst to make sure all uses are caught.

    skb_dst() returns the dst, regardless of noref bit set or not, but
    with a lockdep check to make sure a noref dst is not given if current
    user is not rcu protected.

    New skb_dst_set_noref() helper to set a non-refcounted dst on a skb
    (with lockdep check).

    skb_dst_drop() drops a reference only if skb dst was refcounted.

    skb_dst_force() helper is used to force a refcount on dst, when skb
    is queued and no longer RCU protected.

    Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
    !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
    sock_queue_rcv_skb(), in __nf_queue().

    Use skb_dst_force() in dev_requeue_skb().

    Note: dst_use_noref() still dirties dst, we might transform it
    later to do one dirtying per jiffies.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
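
The low-order-bit trick can be sketched in userspace C. The names below (`struct dst`, `struct buf`, `buf_dst_*`) are hypothetical stand-ins for the skb helpers named in the commit, the lockdep/RCU checks are omitted, and the sketch assumes `struct dst` objects are at least 2-byte aligned so bit 0 of their address is always free.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DST_NOREF ((uintptr_t)1)    /* bit 0 set: no reference is held */

struct dst {
    int refcnt;
};

struct buf {
    uintptr_t refdst;               /* dst pointer plus the NOREF flag */
};

/* Return the dst regardless of whether the NOREF bit is set. */
struct dst *buf_dst(const struct buf *b)
{
    return (struct dst *)(b->refdst & ~DST_NOREF);
}

/* Attach a dst without taking a reference. */
void buf_dst_set_noref(struct buf *b, struct dst *d)
{
    b->refdst = (uintptr_t)d | DST_NOREF;
}

bool buf_dst_is_noref(const struct buf *b)
{
    return (b->refdst & DST_NOREF) != 0 && buf_dst(b) != NULL;
}

/* Before queueing: turn a borrowed dst into a real reference. */
void buf_dst_force(struct buf *b)
{
    if (buf_dst_is_noref(b)) {
        buf_dst(b)->refcnt++;
        b->refdst &= ~DST_NOREF;
    }
}

/* Detach the dst, dropping a reference only if one was actually held. */
void buf_dst_drop(struct buf *b)
{
    if (b->refdst != 0) {
        if (!(b->refdst & DST_NOREF))
            buf_dst(b)->refcnt--;
        b->refdst = 0;
    }
}
```

The point of the design is that the common fast path never touches the shared reference counter; only buffers that outlive the protected section pay for `buf_dst_force()`.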
     

12 May, 2010

2 commits


11 May, 2010

1 commit


03 May, 2010

1 commit

  • Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
    without any protection. And enqueue_to_backlog should update the netdev_rx_stat
    of the target CPU.

    This patch renames softnet_data.total to softnet_data.processed: the number of
    packets processed in upper levels (IP stacks).

    softnet_stat data is moved into softnet_data.

    Signed-off-by: Changli Gao
    ----
    include/linux/netdevice.h | 17 +++++++----------
    net/core/dev.c | 26 ++++++++++++--------------
    net/sched/sch_generic.c | 2 +-
    3 files changed, 20 insertions(+), 25 deletions(-)
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Changli Gao
     

21 Apr, 2010

1 commit


20 Apr, 2010

1 commit


12 Apr, 2010

1 commit


07 Apr, 2010

1 commit


02 Apr, 2010

1 commit

  • One of my test machines got a deadlock during "tc" sessions,
    adding/deleting classes & filters, using traffic estimators.

    After some analysis, I believe we have a potential use after free case
    in est_timer() :

    spin_lock(e->stats_lock); << HERE >>
    read_lock(&est_lock);
    if (e->bstats == NULL) << TEST >>
            goto skip;

    The test is done a bit late, because after the estimator is killed, and before
    the rcu grace period has elapsed, we might already have freed/reused the memory
    that e->stats_lock points to (some qdisc->q.lock).

    A possible fix is to respect a rcu grace period at Qdisc dismantle time.

    On 64bit, sizeof(struct Qdisc) is exactly 192 bytes. Adding 16 bytes to
    it (for struct rcu_head) is a problem because it might change
    performance, given QDISC_ALIGNTO is 32 bytes.

    This is why I also change QDISC_ALIGNTO to 64 bytes, to satisfy most
    current alignment requirements.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
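
The size arithmetic behind the alignment change can be checked with a round-up helper. `align_up()` is a hypothetical name; the 192-byte and 16-byte figures come from the commit message, while how the rounded size is used by the allocator is not shown here.

```c
#include <stddef.h>

/* Round size up to the next multiple of align (align is a power of two). */
size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

Under these numbers, 192 bytes is already a multiple of both 32 and 64. Growing the structure to 208 bytes rounds up to 224 with 32-byte alignment and to 256 with 64-byte alignment.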
     

31 Mar, 2010

1 commit

  • These changes were suggested by Alexey Dobriyan :

    - psched_show() does not use any private data so just pass NULL to
    psched_open()

    - remove unnecessary return statement

    Signed-off-by: Tom Goff
    Signed-off-by: David S. Miller

    Tom Goff
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

2 commits


24 Mar, 2010

1 commit

  • Allows the net_cls cgroup subsystem to be compiled as a module

    This patch modifies net/sched/cls_cgroup.c to allow the net_cls subsystem
    to be optionally compiled as a module instead of builtin. The
    cgroup_subsys struct is moved around a bit to allow the subsys_id to be
    either declared as a compile-time constant by the cgroup_subsys.h include
    in cgroup.h, or, if it's a module, initialized within the struct by
    cgroup_load_subsys.

    Signed-off-by: Ben Blum
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: "David S. Miller"
    Cc: KAMEZAWA Hiroyuki
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Ben Blum
     

23 Mar, 2010

1 commit


10 Feb, 2010

1 commit


09 Feb, 2010

1 commit


29 Jan, 2010

1 commit

  • This adds an additional queuing strategy, called pfifo_head_drop,
    which on queue overflow removes the oldest skb (the head element)
    instead of the newest skb (the tail). Removing the oldest
    skb in congested situations is useful for sensor network environments
    where newer packets reflect the superior information.

    Reviewed-by: Florian Westphal
    Acked-by: Patrick McHardy
    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
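
Head-drop behaviour can be sketched with a small ring buffer. This is an illustration only: `struct fifo`, `QLIMIT` and the function names are hypothetical, and packets are represented by plain integers.

```c
#include <stddef.h>

#define QLIMIT 3    /* hypothetical queue limit, in packets */

struct fifo {
    int slot[QLIMIT];
    size_t head;    /* index of the oldest packet */
    size_t len;     /* number of queued packets */
};

/*
 * Enqueue pkt. When the queue is full the oldest packet is removed first;
 * it is stored in *dropped and 1 is returned. Otherwise 0 is returned.
 */
int fifo_enqueue_head_drop(struct fifo *q, int pkt, int *dropped)
{
    int did_drop = 0;

    if (q->len == QLIMIT) {
        *dropped = q->slot[q->head];
        q->head = (q->head + 1) % QLIMIT;
        q->len--;
        did_drop = 1;
    }
    q->slot[(q->head + q->len) % QLIMIT] = pkt;
    q->len++;
    return did_drop;
}

/* Dequeue the oldest packet into *pkt; returns 0 when the queue is empty. */
int fifo_dequeue(struct fifo *q, int *pkt)
{
    if (q->len == 0)
        return 0;

    *pkt = q->slot[q->head];
    q->head = (q->head + 1) % QLIMIT;
    q->len--;
    return 1;
}
```

A tail-drop FIFO would instead reject the incoming packet when full, so stale readings would stay queued ahead of fresh ones.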
     

10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

30 Nov, 2009

1 commit


26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

21 Nov, 2009

1 commit


19 Nov, 2009

1 commit


17 Nov, 2009

2 commits

  • move checking if eaction is valid in tcf_mirred_init()

    Signed-off-by: Changli Gao
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Changli Gao
     
  • 1. don't jump backwards using goto.
    2. don't call skb_act_clone() until it is necessary.
    3. use a single exit from the critical section.

    Signed-off-by: Changli Gao
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Changli Gao
     

16 Nov, 2009

1 commit

  • Recent changes in the TX error propagation require additional checking
    and masking of values returned from hard_start_xmit(), mainly to
    separate cases where skb was consumed. This aim can be simplified by
    changing the order of NETDEV_TX and NET_XMIT codes, because the latter
    are treated similarly to negative (ERRNO) values.

    After this change much simpler dev_xmit_complete() is also used in
    sch_direct_xmit(), so it is moved to netdevice.h.

    Additionally NET_RX definitions in netdevice.h are moved up from
    between TX codes to avoid confusion while reading the TX comment.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
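
Why reordering the codes simplifies the check can be sketched with a single comparison. The constant names and numeric values below are assumptions chosen for this sketch (queueing codes in the low nibble, driver "not consumed" codes above it), not a copy of netdevice.h.

```c
#include <stdbool.h>

/* Assumed queueing-layer codes: small non-negative values. */
enum {
    XMIT_SUCCESS = 0x00,
    XMIT_DROP    = 0x01,
    XMIT_CN      = 0x02,
    XMIT_MASK    = 0x0f
};

/* Assumed driver codes: "busy" and "locked" sit above the mask. */
enum {
    TX_OK     = 0x00,
    TX_BUSY   = 0x10,
    TX_LOCKED = 0x20
};

/*
 * True when the skb was consumed: successful transmit (0), a negative
 * errno, or a queueing code below the mask. Only the driver codes above
 * the mask mean the caller still owns the skb.
 */
bool xmit_complete(int rc)
{
    return rc < XMIT_MASK;
}
```

With this ordering, negative errno values and queueing codes fall on the same side of one comparison, so no separate masking step is needed.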