25 Jan, 2021

1 commit


06 Jan, 2021

1 commit

  • commit 2d18e54dd8662442ef5898c6bdadeaf90b3cebbc upstream.

    A memory leak is found in cgroup1_parse_param() when multiple source
    parameters overwrite fc->source in the fs_context struct without
    freeing the previous value.

    unreferenced object 0xffff888100d930e0 (size 16):
    comm "mount", pid 520, jiffies 4303326831 (age 152.783s)
    hex dump (first 16 bytes):
    74 65 73 74 6c 65 61 6b 00 00 00 00 00 00 00 00 testleak........
    backtrace:
    [] kmemdup_nul+0x2d/0xa0
    [] vfs_parse_fs_string+0xc0/0x150
    [] generic_parse_monolithic+0x15a/0x1d0
    [] path_mount+0xee1/0x1820
    [] do_mount+0xea/0x100
    [] __x64_sys_mount+0x14b/0x1f0

    Fix this bug by permitting a single source parameter and rejecting with
    an error all subsequent ones.
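
    A minimal userspace sketch of this rule, under simplifying assumptions:
    struct sketch_ctx and sketch_set_source() are hypothetical stand-ins and
    not the kernel's fs_context API. It only illustrates that a second source
    is rejected instead of overwriting (and leaking) the first copy.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the part of fs_context that holds the source. */
struct sketch_ctx {
	char *source;	/* owned copy of the "source" parameter, or NULL */
};

/*
 * Accept the first source and reject every later one. Without the
 * early return, a second call would overwrite ctx->source and leak
 * the first copy.
 */
int sketch_set_source(struct sketch_ctx *ctx, const char *value)
{
	size_t len;
	char *copy;

	assert(ctx != NULL && value != NULL);

	if (ctx->source)
		return -EINVAL;	/* a source was already supplied */

	len = strlen(value) + 1;
	copy = malloc(len);
	if (!copy)
		return -ENOMEM;
	memcpy(copy, value, len);
	ctx->source = copy;
	return 0;
}
```

    The caller still owns ctx->source and frees it once when the context is
    torn down.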

    Fixes: 8d2451f4994f ("cgroup1: switch to option-by-option parsing")
    Reported-by: Hulk Robot
    Signed-off-by: Qinglang Miao
    Reviewed-by: Zefan Li
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Qinglang Miao
     

08 Apr, 2020

1 commit


16 Mar, 2020

1 commit


13 Mar, 2020

1 commit

  • cgrp->root->release_agent_path is protected by both cgroup_mutex and
    release_agent_path_lock and readers can hold either one. The
    dual-locking scheme was introduced while breaking a locking dependency
    issue around cgroup_mutex but doesn't make sense anymore given that
    the only remaining reader which uses cgroup_mutex is
    cgroup1_release_agent().

    This patch updates cgroup1_release_agent() to use
    release_agent_path_lock so that release_agent_path is always protected
    only by release_agent_path_lock.

    While at it, convert strlen() based empty string checks to direct
    tests on the first character as suggested by Linus.

    Signed-off-by: Tejun Heo
    Cc: Linus Torvalds

    Tejun Heo
     

05 Mar, 2020

1 commit

  • Older (and maybe current) versions of systemd set release_agent to "" when
    shutting down, but do not set notify_on_release to 0.

    Since 64e90a8acb85 ("Introduce STATIC_USERMODEHELPER to mediate
    call_usermodehelper()"), we filter out such calls when the user mode helper
    path is "". However, when used in conjunction with an actual (i.e. non "")
    STATIC_USERMODEHELPER, the path is never "", so the real usermode helper
    will be called with argv[0] == "".

    Let's avoid this by not invoking the release_agent when it is "".

    Signed-off-by: Tycho Andersen
    Signed-off-by: Tejun Heo

    Tycho Andersen
     

13 Feb, 2020

1 commit

    If a seq_file .next function does not change the position index,
    a read after some lseek can generate unexpected output.

    # mount | grep cgroup
    # dd if=/mnt/cgroup.procs bs=1 # normal output
    ...
    1294
    1295
    1296
    1304
    1382
    584+0 records in
    584+0 records out
    584 bytes copied

    dd: /mnt/cgroup.procs: cannot skip to specified offset
    83 <<< generates end of last line
    1383 <<< ... and whole last line once again
    0+1 records in
    0+1 records out
    8 bytes copied

    dd: /mnt/cgroup.procs: cannot skip to specified offset
    1386 <<< generates last line anyway
    0+1 records in
    0+1 records out
    5 bytes copied
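
    A minimal sketch of the expected iterator contract, assuming a simple
    array-backed table. struct sketch_table, sketch_start() and sketch_next()
    are hypothetical names and do not use the real seq_file API; the point is
    that next() advances the position even on the call that runs off the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table standing in for the records behind a seq_file. */
struct sketch_table {
	const int *items;
	size_t count;
};

/* Return the record at *pos, or NULL when *pos is past the end. */
const int *sketch_start(const struct sketch_table *t, long *pos)
{
	assert(t != NULL && pos != NULL);

	if (*pos < 0 || (size_t)*pos >= t->count)
		return NULL;
	return &t->items[*pos];
}

/*
 * Advance to the next record. The position index is bumped
 * unconditionally, including on the call that runs off the end, so a
 * later start() at the saved position finds nothing instead of
 * repeating the last record.
 */
const int *sketch_next(const struct sketch_table *t, long *pos)
{
	assert(t != NULL && pos != NULL);

	++*pos;
	return sketch_start(t, pos);
}
```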

    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Tejun Heo

    Vasily Averin
     

11 Feb, 2020

1 commit


08 Feb, 2020

4 commits


07 Jan, 2020

1 commit

  • Android expects system_server to be able to move tasks between different
    cgroups/cpusets, but does not want to be running as root. Let's relax
    permission check so that processes can move other tasks if they have
    CAP_SYS_NICE in the affected task's user namespace.

    BUG=b:31790445,chromium:647994
    Bug: 147109865
    TEST=Boot android container, examine logcat

    Signed-off-by: Dmitry Torokhov
    Reviewed-on: https://chromium-review.googlesource.com/394927
    Reviewed-by: Ricky Zhou
    [AmitP: Refactored original changes to align with upstream commit
    201af4c0fab0 ("cgroup: move cgroup files under kernel/cgroup/")]
    Change-Id: Ia919c66ab6ed6a6daf7c4cf67feb38b13b1ad09b
    Signed-off-by: Amit Pundir
    (cherry picked from commit ec54762b84a1d06de188bc846655305d3f7acf75)

    Dmitry Torokhov
     

07 Oct, 2019

1 commit

  • There are reports of users who use thread migrations between cgroups and
    they report performance drop after d59cfc09c32a ("sched, cgroup: replace
    signal_struct->group_rwsem with a global percpu_rwsem"). The effect is
    pronounced on machines with more CPUs.

    The migration is affected by forking noise happening in the background:
    after the mentioned commit, a migrating thread must wait for all
    (forking) processes on the system, not only those of its threadgroup.

    There are several places that need to synchronize with migration:
    a) do_exit,
    b) de_thread,
    c) copy_process,
    d) cgroup_update_dfl_csses,
    e) parallel migration (cgroup_{proc,thread}s_write).

    In the case of self-migrating thread, we relax the synchronization on
    cgroup_threadgroup_rwsem to avoid the cost of waiting. d) and e) are
    excluded with cgroup_mutex, c) does not matter in case of single thread
    migration and the executing thread cannot exec(2) or exit(2) while it is
    writing into cgroup.threads. In case of do_exit because of signal
    delivery, we either exit before the migration or finish the migration
    (of not yet PF_EXITING thread) and die afterwards.

    This patch handles only the case of self-migration by writing "0" into
    cgroup.threads. For simplicity, we always take cgroup_threadgroup_rwsem
    with numeric PIDs.
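
    The resulting locking decision can be sketched as a small predicate. The
    function name is hypothetical and the sketch leaves out the actual
    rwsem calls; it only encodes the rule stated above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a migration request must take the global threadgroup
 * lock. Only the single-thread self-migration case (PID 0 written to
 * cgroup.threads) skips it; any numeric PID or whole-process
 * migration still takes it.
 */
bool sketch_needs_threadgroup_lock(long pid, bool whole_threadgroup)
{
	assert(pid >= 0);

	return pid != 0 || whole_threadgroup;
}
```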

    This change improves the performance of migration-dependent workloads
    to a level similar to the per-signal_struct state.

    Signed-off-by: Michal Koutný
    Signed-off-by: Tejun Heo

    Michal Koutný
     

08 Aug, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

20 Apr, 2019

1 commit

  • The helper is identical to the existing cgroup_task_count()
    except it doesn't take the css_set_lock by itself, assuming
    that the caller does.

    Also, move cgroup_task_count() implementation into
    kernel/cgroup/cgroup.c, as there is nothing specific to cgroup v1.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Tejun Heo
    Cc: kernel-team@fb.com

    Roman Gushchin
     

28 Feb, 2019

9 commits


18 Jan, 2019

1 commit

  • * make the reference from superblock to cgroup_root counting -
    do cgroup_put() in cgroup_kill_sb() whether we'd done
    percpu_ref_kill() or not; matching grab is done when we allocate
    a new root. That gives the same refcounting rules for all callers
    of cgroup_do_mount() - a reference to cgroup_root has been grabbed
    by caller and it either is transferred to new superblock or dropped.

    * have cgroup_kill_sb() treat an already killed refcount as "just
    don't bother killing it, then".

    * after successful cgroup_do_mount() have cgroup1_mount() recheck
    if we'd raced with mount/umount from somebody else and cgroup_root
    got killed. In that case we drop the superblock and bugger off
    with -ERESTARTSYS, same as if we'd found it in the list already
    dying.

    * don't bother with delayed initialization of refcount - it's
    unreliable and not needed. No need to prevent attempts to bump
    the refcount if we find cgroup_root of another mount in progress -
    sget will reuse an existing superblock just fine and if the
    other sb manages to die before we get there, we'll catch
    that immediately after cgroup_do_mount().

    * don't bother with kernfs_pin_sb() - no need for doing that
    either.

    Signed-off-by: Al Viro

    Al Viro
     

29 Dec, 2018

1 commit

  • It can be useful to inhibit all cgroup1 hierarchies especially during
    transition and for debugging. cgroup_no_v1 can block hierarchies with
    controllers which leaves out the named hierarchies. Expand it to
    cover the named hierarchies so that "cgroup_no_v1=all,named" disables
    all cgroup1 hierarchies.
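
    A userspace sketch of how such a comma-separated value might be parsed.
    struct sketch_no_v1 and sketch_parse_no_v1() are hypothetical, and
    individual controller names are deliberately ignored here; only the
    "all" and "named" keywords from the text are recognized.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of parsing a "cgroup_no_v1=" style value. */
struct sketch_no_v1 {
	bool all_controllers;	/* "all": block every controller hierarchy */
	bool named;		/* "named": block named hierarchies too */
};

/* Compare the token str[0..len) against a NUL-terminated keyword. */
static bool token_is(const char *str, size_t len, const char *keyword)
{
	return strlen(keyword) == len && strncmp(str, keyword, len) == 0;
}

/* Walk the comma-separated list and record the keywords that were seen. */
struct sketch_no_v1 sketch_parse_no_v1(const char *value)
{
	struct sketch_no_v1 out = { false, false };

	assert(value != NULL);

	while (*value) {
		size_t len = strcspn(value, ",");

		if (token_is(value, len, "all"))
			out.all_controllers = true;
		else if (token_is(value, len, "named"))
			out.named = true;
		/* Individual controller names are ignored in this sketch. */

		value += len;
		if (*value == ',')
			value++;
	}
	return out;
}
```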

    Signed-off-by: Tejun Heo
    Suggested-by: Marcin Pawlowski
    Signed-off-by: Tejun Heo

    Tejun Heo
     

12 Jul, 2018

1 commit

    It is unwise to take spin locks from the handlers of trace events,
    mainly because doing so can introduce lockups: it adds locks in places
    that are normally not tested. Worse yet, because trace events are
    tucked away in the include/trace/events/ directory, locks that are
    taken there are forgotten about.

    As a general rule, I tell people never to take any locks in a trace
    event handler.

    Several cgroup trace event handlers call cgroup_path() which eventually
    takes the kernfs_rename_lock spinlock. This injects the spinlock in the
    code without people realizing it. It also can cause issues for the
    PREEMPT_RT patch, as the spinlock becomes a mutex, and the trace event
    handlers are called with preemption disabled.

    By moving the calculation of the cgroup_path() out of the trace event
    handlers and into a macro (surrounded by a
    trace_cgroup_##type##_enabled() check), we can place the cgroup_path
    into a string and pass that to the trace event. Not only does this
    remove the taking of the spinlock from the trace event handler, but
    it also means that cgroup_path() only needs to be called once. It
    is currently called twice: once to get the length to reserve the
    buffer for, and once again to get the path itself.
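
    The shape of such a macro can be sketched in userspace C. Everything
    named sketch_* or SKETCH_* is hypothetical and merely models the idea:
    the path is built once, outside the handler, and only when the
    tracepoint is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical tracepoint state: an on/off switch and the last event text. */
bool sketch_trace_enabled;
int sketch_path_calls;
char sketch_last_event[128];

/* Stand-in for cgroup_path(): may take a lock, so call it sparingly. */
void sketch_cgroup_path(int cgroup_id, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	sketch_path_calls++;
	snprintf(buf, len, "/cgroup%d", cgroup_id);
}

/* The handler receives a finished string and takes no locks itself. */
void sketch_trace_handler(const char *event, const char *path)
{
	snprintf(sketch_last_event, sizeof(sketch_last_event),
		 "%s %s", event, path);
}

/* Build the path once, and only when the tracepoint is enabled. */
#define SKETCH_TRACE(event, cgroup_id) \
	do { \
		if (sketch_trace_enabled) { \
			char path_[64]; \
			sketch_cgroup_path((cgroup_id), path_, sizeof(path_)); \
			sketch_trace_handler((event), path_); \
		} \
	} while (0)
```

    With the switch off, SKETCH_TRACE("mkdir", 7) never calls the path
    helper; with it on, the helper runs exactly once per event.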

    Reported-by: Sebastian Andrzej Siewior
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Tejun Heo

    Steven Rostedt (VMware)
     

13 Jun, 2018

2 commits

  • The vmalloc() function has no 2-factor argument form, so multiplication
    factors need to be wrapped in array_size(). This patch replaces cases of:

    vmalloc(a * b)

    with:

    vmalloc(array_size(a, b))

    as well as handling cases of:

    vmalloc(a * b * c)

    with:

    vmalloc(array3_size(a, b, c))

    This does, however, attempt to ignore constant size factors like:

    vmalloc(4 * 1024)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
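
    A userspace sketch of what the size helpers provide: a product that
    saturates to SIZE_MAX on overflow, so the allocation fails rather than
    returning a short buffer. The sketch_* names are hypothetical and this is
    not the kernel's implementation, only an illustration of the behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply two size factors, saturating to SIZE_MAX on overflow so
 * that the allocator sees an impossible request and fails instead of
 * allocating a short buffer.
 */
size_t sketch_array_size(size_t a, size_t b)
{
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return a * b;
}

/* Three-factor variant: check a * b first, then multiply by c. */
size_t sketch_array3_size(size_t a, size_t b, size_t c)
{
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return sketch_array_size(a * b, c);
}
```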

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vmalloc(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vmalloc(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vmalloc(C1 * C2 * C3, ...)
    |
    vmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vmalloc(C1 * C2, ...)
    |
    vmalloc(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     
  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:

    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().
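
    The benefit of the two-factor form can be sketched in userspace with
    malloc(). sketch_malloc_array() is a hypothetical name and takes no gfp
    flags; it only illustrates that an overflowing product yields NULL
    instead of an undersized allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of the given size. If n * size would
 * overflow, return NULL instead of allocating a buffer that is too
 * small for the caller's indexing.
 */
void *sketch_malloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```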

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

16 May, 2018

1 commit


19 Dec, 2017

1 commit

  • Deadlock during cgroup migration from cpu hotplug path when a task T is
    being moved from source to destination cgroup.

    kworker/0:0
    cpuset_hotplug_workfn()
    cpuset_hotplug_update_tasks()
    hotplug_update_tasks_legacy()
    remove_tasks_in_empty_cpuset()
    cgroup_transfer_tasks() // stuck in iterator loop
    cgroup_migrate()
    cgroup_migrate_add_task()

    cgroup_migrate_add_task() checks the PF_EXITING flag of task T, so
    task T is not migrated to the destination cgroup. css_task_iter_start()
    keeps pointing to task T in the loop, waiting for task T's cg_list node
    to be removed.

    Task T
    do_exit()
    exit_signals() // sets PF_EXITING
    exit_task_namespaces()
    switch_task_namespaces()
    free_nsproxy()
    put_mnt_ns()
    drop_collected_mounts()
    namespace_unlock()
    synchronize_rcu()
    _synchronize_rcu_expedited()
    schedule_work() // on cpu0 low priority worker pool
    wait_event() // waiting for work item to execute

    Task T inserted a work item in the worklist of cpu0 low priority
    worker pool. It is waiting for expedited grace period work item
    to execute. This work item will only be executed once kworker/0:0
    completes execution of cpuset_hotplug_workfn().

    kworker/0:0 ==> Task T ==>kworker/0:0

    In case of PF_EXITING task being migrated from source to destination
    cgroup, migrate next available task in source cgroup.
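
    The selection rule can be sketched over a plain array. struct sketch_task
    and sketch_next_migratable() are hypothetical; the "exiting" field stands
    in for the PF_EXITING flag and no real task iterator is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record; "exiting" models the PF_EXITING flag. */
struct sketch_task {
	int pid;
	bool exiting;
};

/*
 * Return the first task that can still be migrated, skipping tasks
 * that are already exiting, or NULL when none is left. Retrying on an
 * exiting task would spin forever, because that task never moves.
 */
const struct sketch_task *
sketch_next_migratable(const struct sketch_task *tasks, size_t count)
{
	assert(tasks != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (!tasks[i].exiting)
			return &tasks[i];
	}
	return NULL;
}
```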

    Signed-off-by: Prateek Sood
    Signed-off-by: Tejun Heo

    Prateek Sood
     

18 Aug, 2017

1 commit


21 Jul, 2017

3 commits

  • This patch implements cgroup v2 thread support. The goal of the
    thread mode is supporting hierarchical accounting and control at
    thread granularity while staying inside the resource domain model
    which allows coordination across different resource controllers and
    handling of anonymous resource consumptions.

    A cgroup is always created as a domain and can be made threaded by
    writing to the "cgroup.type" file. When a cgroup becomes threaded, it
    becomes a member of a threaded subtree which is anchored at the
    closest ancestor which isn't threaded.

    The threads of the processes which are in a threaded subtree can be
    placed anywhere without being restricted by process granularity or
    no-internal-process constraint. Note that the threads aren't allowed
    to escape to a different threaded subtree. To be used inside a
    threaded subtree, a controller should explicitly support threaded mode
    and be able to handle internal competition in the way which is
    appropriate for the resource.

    The root of a threaded subtree, the nearest ancestor which isn't
    threaded, is called the threaded domain and serves as the resource
    domain for the whole subtree. This is the last cgroup where domain
    controllers are operational and where all the domain-level resource
    consumptions in the subtree are accounted. This allows threaded
    controllers to operate at thread granularity when requested while
    staying inside the scope of system-level resource distribution.
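
    Finding the threaded domain can be sketched as a walk up the parent
    chain. struct sketch_cgroup and sketch_dom_cgroup() are hypothetical and
    far simpler than the kernel's structures; the sketch assumes each node
    only records its parent and whether it is threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cgroup node: a parent link and a "threaded" type flag. */
struct sketch_cgroup {
	const struct sketch_cgroup *parent;	/* NULL for the root */
	bool threaded;
};

/*
 * Return the resource domain of a cgroup: the cgroup itself if it is a
 * domain, otherwise the nearest ancestor that isn't threaded.
 */
const struct sketch_cgroup *sketch_dom_cgroup(const struct sketch_cgroup *cgrp)
{
	assert(cgrp != NULL);

	while (cgrp->threaded && cgrp->parent)
		cgrp = cgrp->parent;
	return cgrp;
}
```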

    As the root cgroup is exempt from the no-internal-process constraint,
    it can serve as both a threaded domain and a parent to normal cgroups,
    so, unlike non-root cgroups, the root cgroup can have both domain and
    threaded children.

    Internally, in a threaded subtree, each css_set has its ->dom_cset
    pointing to a matching css_set which belongs to the threaded domain.
    This ensures that thread root level cgroup_subsys_state for all
    threaded controllers are readily accessible for domain-level
    operations.

    This patch enables threaded mode for the pids and perf_events
    controllers. Neither has to worry about domain-level resource
    consumptions and it's enough to simply set the flag.

    For more details on the interface and behavior of the thread mode,
    please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
    by this patch.

    v5: - Dropped silly no-op ->dom_cgrp init from cgroup_create().
    Spotted by Waiman.
    - Documentation updated as suggested by Waiman.
    - cgroup.type content slightly reformatted.
    - Mark the debug controller threaded.

    v4: - Updated to the general idea of marking specific cgroups
    domain/threaded as suggested by PeterZ.

    v3: - Dropped "join" and always make mixed children join the parent's
    threaded subtree.

    v2: - After discussions with Waiman, support for mixed thread mode is
    added. This should address the issue that Peter pointed out
    where any nesting should be avoided for thread subtrees while
    coexisting with other domain cgroups.
    - Enabling / disabling thread mode now piggy backs on the existing
    control mask update mechanism.
    - Bug fixes and cleanup.

    Signed-off-by: Tejun Heo
    Cc: Waiman Long
    Cc: Peter Zijlstra

    Tejun Heo
     
  • css_task_iter currently always walks all tasks. With the scheduled
    cgroup v2 thread support, the iterator would need to handle multiple
    types of iteration. As a preparation, add @flags to
    css_task_iter_start() and implement CSS_TASK_ITER_PROCS. If the flag
    is not specified, it walks all tasks as before. When asserted, the
    iterator only walks the group leaders.
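
    A sketch of a flag-controlled iterator over a plain array. All names
    (SKETCH_ITER_PROCS, struct sketch_thread, struct sketch_iter) are
    hypothetical and only model the behavior described above: without the
    flag every task is visited, with it only group leaders are.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ITER_PROCS 0x1u	/* hypothetical flag: group leaders only */

struct sketch_thread {
	int pid;
	int tgid;	/* thread group id; equals pid for the group leader */
};

struct sketch_iter {
	const struct sketch_thread *threads;
	size_t count;
	size_t next;
	unsigned int flags;
};

/* Begin a walk over the given threads with the requested flags. */
void sketch_iter_start(struct sketch_iter *it,
		       const struct sketch_thread *threads, size_t count,
		       unsigned int flags)
{
	assert(it != NULL && (threads != NULL || count == 0));

	it->threads = threads;
	it->count = count;
	it->next = 0;
	it->flags = flags;
}

/* Return the next matching thread, or NULL when the walk is done. */
const struct sketch_thread *sketch_iter_next(struct sketch_iter *it)
{
	assert(it != NULL);

	while (it->next < it->count) {
		const struct sketch_thread *t = &it->threads[it->next++];

		/* With the flag set, skip every non-leader thread. */
		if ((it->flags & SKETCH_ITER_PROCS) && t->pid != t->tgid)
			continue;
		return t;
	}
	return NULL;
}
```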

    For now, the only user of the flag is cgroup v2 "cgroup.procs" file
    which no longer needs to skip non-leader tasks in cgroup_procs_next().
    Note that cgroup v1 "cgroup.procs" can't use the group leader walk as
    v1 "cgroup.procs" doesn't mean "list all thread group leaders in the
    cgroup" but "list all thread group id's with any threads in the
    cgroup".

    While at it, update cgroup_procs_show() to use task_pid_vnr() instead
    of task_tgid_vnr(). As the iteration guarantees that the function
    only sees group leaders, this doesn't change the output and will allow
    sharing the function for thread iteration.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Currently, writes to the "cgroup.procs" and "cgroup.tasks" files are all
    handled by __cgroup_procs_write() on both v1 and v2. This patch
    reorganizes the write path so that there are common helper functions
    that different write paths use.

    While this somewhat increases LOC, the different paths are no longer
    intertwined and each path has more flexibility to implement different
    behaviors which will be necessary for the planned v2 thread support.

    v3: - Restructured so that cgroup_procs_write_permission() takes
    @src_cgrp and @dst_cgrp.

    v2: - Rolled in Waiman's task reference count fix.
    - Updated on top of nsdelegate changes.

    Signed-off-by: Tejun Heo
    Cc: Waiman Long

    Tejun Heo
     

15 Jun, 2017

2 commits

  • The debug cgroup currently resides within cgroup-v1.c and is enabled
    only for v1 cgroup. To enable the debug cgroup also for v2, it makes
    sense to put the code into its own file as it will no longer be v1
    specific. There is no change to the debug cgroup specific code.

    Signed-off-by: Waiman Long
    Signed-off-by: Tejun Heo

    Waiman Long
     
  • The reference count in the css_set data structure was used as a
    proxy of the number of tasks attached to that css_set. However, that
    count is actually not an accurate measure especially with thread mode
    support. So a new variable nr_tasks is added to the css_set to keep
    track of the actual task count. This new variable is protected by
    the css_set_lock. Functions that require the actual task count are
    updated to use the new variable.

    tj: s/task_count/nr_tasks/ for consistency with cgroup_root->nr_cgrps.
    Refreshed on top of cgroup/for-v4.13 which dropped on
    css_set_populated() -> nr_tasks conversion.

    Signed-off-by: Waiman Long
    Signed-off-by: Tejun Heo

    Waiman Long
     

02 May, 2017

1 commit

  • Pull cgroup updates from Tejun Heo:
    "Nothing major. Two notable fixes are Li's second stab at fixing the
    long-standing race condition in the mount path and suppression of
    spurious warning from cgroup_get(). All other changes are trivial"

    * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: mark cgroup_get() with __maybe_unused
    cgroup: avoid attaching a cgroup root to two different superblocks, take 2
    cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()
    cgroup: move cgroup_subsys_state parent field for cache locality
    cpuset: Remove cpuset_update_active_cpus()'s parameter.
    cgroup: switch to BUG_ON()
    cgroup: drop duplicate header nsproxy.h
    kernel: convert css_set.refcount from atomic_t to refcount_t
    kernel: convert cgroup_namespace.count from atomic_t to refcount_t

    Linus Torvalds