10 Apr, 2013

1 commit


26 Oct, 2012

3 commits


15 Sep, 2012

3 commits

  • WARNING: With this change it is no longer possible to load externally
    built controllers.

    When CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m are set, the
    corresponding subsys_id should also be a constant. Up to now,
    net_prio_subsys_id and net_cls_subsys_id were of type int and their
    values were assigned at runtime.

    By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
    to IS_ENABLED, all *_subsys_id will have a constant value. That means we
    need to remove all the code which assumes a value can be assigned to
    net_prio_subsys_id and net_cls_subsys_id.

    A close look is necessary at the RCU part, which was introduced by
    the following patch:

    commit f845172531fb7410c7fb7780b1a6e51ee6df7d52
    Author: Herbert Xu Mon May 24 09:12:34 2010
    Committer: David S. Miller Mon May 24 09:12:34 2010

    cls_cgroup: Store classid in struct sock

    This code was added to init_cgroup_cls()

    /* We can't use rcu_assign_pointer because this is an int. */
    smp_wmb();
    net_cls_subsys_id = net_cls_subsys.subsys_id;

    respectively to exit_cgroup_cls()

    net_cls_subsys_id = -1;
    synchronize_rcu();

    and in module version of task_cls_classid()

    rcu_read_lock();
    id = rcu_dereference(net_cls_subsys_id);
    if (id >= 0)
            classid = container_of(task_subsys_state(p, id),
                                   struct cgroup_cls_state, css)->classid;
    rcu_read_unlock();

    That patch gave no explicit explanation of why the RCU part is needed.
    (The rcu_dereference() was later replaced with
    rcu_dereference_index_check() in a separate commit, but that is a minor
    detail.)

    So here is my reasoning about why it was introduced and why it is safe
    to remove it now. Note that this code was copied over to net_prio, so
    the reasoning holds for that subsystem too.

    The idea behind the RCU use for net_cls_subsys_id is to make sure we
    get a valid pointer back from task_subsys_state(). task_subsys_state()
    is just blindly accessing the subsys array and returning the
    pointer. Obviously, passing in -1 as id into task_subsys_state()
    returns an invalid value (out of lower bound).

    So this code makes sure that the id is assigned only after the module
    is loaded and the subsystem is registered.

    Before unregistering the module, all old readers must have left the
    critical section. This is done by assigning -1 to the id and issuing a
    synchronize_rcu(). Any new readers won't call task_subsys_state()
    anymore and therefore it is safe to unregister the subsystem.

    The new code relies on the same trick, but it looks at the subsys
    pointer returned by task_subsys_state() (remember the id is constant
    and therefore we always have a valid index into the subsys
    array).

    No precautions need to be taken during module loading. Eventually,
    all CPUs will get a valid pointer back from task_subsys_state()
    because rebind_subsystem(), which is called after the module init()
    function, will assign subsys[net_cls_subsys_id] the newly loaded
    module subsystem pointer.

    When the subsystem is about to be removed, rebind_subsystem() will be
    called before the module exit() function. In this case,
    rebind_subsystem() will assign subsys[net_cls_subsys_id] a NULL pointer
    and then call synchronize_rcu(). By then all old readers have left
    the critical section. Any new reader won't access the subsystem
    anymore. At this point we are safe to unregister the subsystem. No
    synchronize_rcu() call is needed.

    Signed-off-by: Daniel Wagner
    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Neil Horman
    Cc: "David S. Miller"
    Cc: "Paul E. McKenney"
    Cc: Andrew Morton
    Cc: Eric Dumazet
    Cc: Gao feng
    Cc: Glauber Costa
    Cc: Herbert Xu
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Cc: Kamezawa Hiroyuki
    Cc: netdev@vger.kernel.org
    Cc: cgroups@vger.kernel.org

    Daniel Wagner
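
The reader-side idea described in the commit above (a constant index, with the slot pointer itself checked for NULL) can be sketched in user-space C. This is only a rough model under stated assumptions: `struct task`, `struct css`, `struct cls_state`, `task_init()` and `bind_cls()` are hypothetical stand-ins, C11 atomics stand in for the kernel's RCU publish and read primitives, and grace periods (synchronize_rcu()) are not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not the kernel's definitions. */
enum { NET_CLS_SUBSYS_ID = 0, SUBSYS_COUNT = 1 };  /* id is a compile-time constant */

struct css { int placeholder; };                   /* models cgroup_subsys_state */

struct cls_state {                                 /* models cgroup_cls_state */
    struct css css;
    uint32_t classid;
};

struct task {
    _Atomic(struct css *) subsys[SUBSYS_COUNT];    /* models the per-task subsys array */
};

/* Start with every slot empty, as if no module were loaded. */
static void task_init(struct task *t)
{
    assert(t != NULL);
    for (int i = 0; i < SUBSYS_COUNT; i++)
        atomic_init(&t->subsys[i], NULL);
}

/* Models rebinding: publish the state on load, or store NULL on unload. */
static void bind_cls(struct task *t, struct cls_state *s)
{
    atomic_store_explicit(&t->subsys[NET_CLS_SUBSYS_ID],
                          s ? &s->css : NULL, memory_order_release);
}

/* Reader: the index is always valid, so only the pointer needs checking. */
static uint32_t task_cls_classid(struct task *t)
{
    struct css *css = atomic_load_explicit(&t->subsys[NET_CLS_SUBSYS_ID],
                                           memory_order_acquire);
    if (css == NULL)
        return 0;                                  /* subsystem not bound */
    struct cls_state *s =
        (struct cls_state *)((char *)css - offsetof(struct cls_state, css));
    return s->classid;
}
```

In this model a reader sees either NULL (and returns 0) or a fully published state, which mirrors why the id no longer needs to be flipped to -1 around module load and unload.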
     
  • task_cls_classid() should not be defined in case the configuration is
    CONFIG_NET_CLS_CGROUP=n. The reason is that in a following patch the
    net_cls_subsys_id will only be defined if CONFIG_NET_CLS_CGROUP!=n.
    When net_cls is not built at all, a caller should only get an empty
    task_cls_classid() without any references to net_cls_subsys_id.

    Signed-off-by: Daniel Wagner
    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Neil Horman
    Cc: Gao feng
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Cc: netdev@vger.kernel.org
    Cc: cgroups@vger.kernel.org

    Daniel Wagner
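
A minimal sketch of the stub pattern this commit describes, assuming a hypothetical preprocessor switch `SKETCH_NET_CLS_ENABLED` in place of the real Kconfig test and a simplified `struct task` in place of the kernel's task structure. It is not the actual header layout; it only illustrates that the disabled branch never mentions net_cls_subsys_id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch standing in for the Kconfig symbol test;
 * set to 1 to model a built-in or modular net_cls. */
#ifndef SKETCH_NET_CLS_ENABLED
#define SKETCH_NET_CLS_ENABLED 0
#endif

struct task { uint32_t classid; };   /* simplified stand-in, not task_struct */

#if SKETCH_NET_CLS_ENABLED
/* Only this branch is allowed to know about the subsystem id. */
enum { net_cls_subsys_id = 0 };

static inline uint32_t task_cls_classid(const struct task *p)
{
    assert(p != NULL);
    return p->classid;
}
#else
/* net_cls not built: callers get an empty stub with no id references. */
static inline uint32_t task_cls_classid(const struct task *p)
{
    (void)p;
    return 0;
}
#endif
```

With the switch left at 0, callers still compile and always see classid 0.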
     
  • The only user of sock_update_classid() is net/socket.c which happens
    to include cls_cgroup.h directly.

    tj: Fix build breakage due to missing cls_cgroup.h inclusion in
    drivers/net/tun.c reported in linux-next by Stephen.

    Signed-off-by: Daniel Wagner
    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Neil Horman
    Reported-by: Stephen Rothwell
    Cc: Gao feng
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Cc: netdev@vger.kernel.org
    Cc: cgroups@vger.kernel.org

    Daniel Wagner
     

07 Oct, 2010

1 commit


04 Sep, 2010

1 commit

  • Dave reported an RCU lockdep warning on a 2.6.35.4 kernel.

    task->cgroups and task->cgroups->subsys[i] are protected by RCU.
    So we avoid accessing invalid pointers here. This might happen,
    for example, when you are dereferencing those pointers while someone
    moves @task from one cgroup to another.

    Reported-by: Dave Jones
    Signed-off-by: Li Zefan
    Signed-off-by: David S. Miller

    Li Zefan
     

20 Aug, 2010

1 commit

  • The task_cls_classid() function applies rcu_dereference() to integers,
    which does not work with the shiny new sparse-based checking in
    rcu_dereference(). This commit therefore moves to the new RCU API
    rcu_dereference_index_check().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Acked-by: David S. Miller
    Acked-by: Herbert Xu

    Paul E. McKenney
     

26 May, 2010

1 commit


25 May, 2010

1 commit


24 May, 2010

1 commit

  • Up until now cls_cgroup has relied on fetching the classid out of
    the currently executing thread. This runs into trouble when packet
    processing is delayed, in which case it may execute out of another
    thread's context.

    Furthermore, even when a packet is not delayed we may fail to
    classify it if soft IRQs have been disabled, because this scenario
    is indistinguishable from one where a packet unrelated to the
    current thread is processed by a real soft IRQ.

    In fact, the current semantics are inherently broken, as a single
    skb may be constructed out of the writes of two different tasks.
    A different manifestation of this problem is when the TCP stack
    transmits in response to an incoming ACK. This is currently
    unclassified.

    As we already have a concept of packet ownership for accounting
    purposes in the skb->sk pointer, this is a natural place to store
    the classid in a persistent manner.

    This patch adds the cls_cgroup classid in struct sock, filling up
    an existing hole on 64-bit :)

    The value is set at socket creation time. So all sockets created
    via socket(2) automatically gain the ID of the thread creating them.
    Whenever another process touches the socket by either reading or
    writing to it, we will change the socket classid to that of the
    process if it has a valid (non-zero) classid.

    For sockets created on inbound connections through accept(2), we
    inherit the classid of the original listening socket through
    sk_clone, possibly preceding the actual accept(2) call.

    In order to minimise risks, I have not made this the authoritative
    classid. For now it is only used as a backup when we execute
    with soft IRQs disabled. Once we're completely happy with its
    semantics we can use it as the sole classid.

    Footnote: I have rearranged the error path on cls_cgroup module
    creation. If we didn't do this, then there is a window where
    someone could create a tc rule using cls_cgroup before the cgroup
    subsystem has been registered.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
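
The classid rules described in the commit above (set at creation, overwritten only by a toucher with a non-zero classid, inherited on clone, used as a backup when soft IRQs are disabled) can be modeled roughly as follows. The structures and helper names (`sock_create_for`, `sock_touch`, `sock_clone_from`, `pick_classid`) are hypothetical simplifications for illustration and do not reproduce the kernel's functions or its soft IRQ test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real structures carry far more state. */
struct task { uint32_t classid; };
struct sock { uint32_t sk_classid; };

/* socket(2): the new socket gains the creating thread's classid. */
static void sock_create_for(struct sock *sk, const struct task *creator)
{
    assert(sk != NULL && creator != NULL);
    sk->sk_classid = creator->classid;
}

/* Read or write by another task: adopt its classid only if non-zero. */
static void sock_touch(struct sock *sk, const struct task *toucher)
{
    if (toucher->classid != 0)
        sk->sk_classid = toucher->classid;
}

/* Inbound connection: the child inherits from the listening socket. */
static void sock_clone_from(struct sock *child, const struct sock *listener)
{
    child->sk_classid = listener->sk_classid;
}

/* The socket classid is only a backup, used when soft IRQs are disabled
 * (modeled here as a plain flag, which is an assumption). */
static uint32_t pick_classid(int softirqs_disabled,
                             const struct task *current,
                             const struct sock *sk)
{
    return softirqs_disabled ? sk->sk_classid : current->classid;
}
```

Keeping the zero check in `sock_touch()` means an unclassified task that merely reads from a socket does not erase the classid the socket already carries.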