22 Sep, 2015

1 commit

  • commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

    Many file systems that implement the show_options hook fail to correctly
    escape their output which could lead to unescaped characters (e.g. new
    lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
    could lead to confusion, spoofed entries (resulting in things like
    systemd issuing false d-bus "mount" notifications), and who knows what
    else. This looks like it would only be the root user stepping on
    themselves, but it's possible weird things could happen in containers or
    in other situations with delegated mount privileges.

    Here's an example using overlay with setuid fusermount trusting the
    contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
    of "sudo" is something more sneaky:

    $ BASE="ovl"
    $ MNT="$BASE/mnt"
    $ LOW="$BASE/lower"
    $ UP="$BASE/upper"
    $ WORK="$BASE/work/ 0 0
    none /proc fuse.pwn user_id=1000"
    $ mkdir -p "$LOW" "$UP" "$WORK"
    $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
    $ cat /proc/mounts
    none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
    none /proc fuse.pwn user_id=1000 0 0
    $ fusermount -u /proc
    $ cat /proc/mounts
    cat: /proc/mounts: No such file or directory

    This fixes the problem by adding new seq_show_option and
    seq_show_option_n helpers, and updating the vulnerable show_option
    handlers to use them as needed. Some, like SELinux, need to be open
    coded due to unusual existing escape mechanisms.

    [akpm@linux-foundation.org: add lost chunk, per Kees]
    [keescook@chromium.org: seq_show_option should be using const parameters]
    Signed-off-by: Kees Cook
    Acked-by: Serge Hallyn
    Acked-by: Jan Kara
    Acked-by: Paul Moore
    Cc: J. R. Okajima
    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
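
The escaping idea in the commit above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel helper itself: `escape_option()` is a hypothetical name, and the octal `\ooo` output format and the escape set passed by the caller are assumptions chosen to mirror how /proc/mounts conventionally encodes whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not the kernel's seq_show_option()):
 * copy src into dst, replacing every byte found in esc with a
 * three-digit octal escape such as \012 for a newline.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or (size_t)-1 if dst is too small.
 */
static size_t escape_option(char *dst, size_t dstlen,
                            const char *src, const char *esc)
{
    size_t out = 0;

    if (dstlen == 0)
        return (size_t)-1;

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(esc, c)) {
            /* need room for 4 bytes plus the terminator */
            if (out + 4 >= dstlen)
                return (size_t)-1;
            dst[out++] = '\\';
            dst[out++] = (char)('0' + ((c >> 6) & 7));
            dst[out++] = (char)('0' + ((c >> 3) & 7));
            dst[out++] = (char)('0' + (c & 7));
        } else {
            /* need room for 1 byte plus the terminator */
            if (out + 1 >= dstlen)
                return (size_t)-1;
            dst[out++] = (char)c;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Passing the spoofed workdir value from the example through such a helper with the escape set `", \t\n\\"` would keep the whole value on one line of output, so a parser of /proc/mounts would no longer see a second, forged entry.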
     

22 Jul, 2015

1 commit

  • commit f9bb48825a6b5d02f4cabcc78967c75db903dcdc upstream.

    This allows for better documentation in the code and
    it allows for a simpler and fully correct version of
    fs_fully_visible to be written.

    The mount points converted and their filesystems are:
    /sys/hypervisor/s390/ s390_hypfs
    /sys/kernel/config/ configfs
    /sys/kernel/debug/ debugfs
    /sys/firmware/efi/efivars/ efivarfs
    /sys/fs/fuse/connections/ fusectl
    /sys/fs/pstore/ pstore
    /sys/kernel/tracing/ tracefs
    /sys/fs/cgroup/ cgroup
    /sys/kernel/security/ securityfs
    /sys/fs/selinux/ selinuxfs
    /sys/fs/smackfs/ smackfs

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

16 Apr, 2015

2 commits

  • The seq_printf return value, because it's frequently misused,
    will eventually be converted to void.

    See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
    seq_has_overflowed() and make public")

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • mem_cgroup_lookup() is a wrapper around mem_cgroup_from_id(), which
    checks that id != 0 before issuing the function call. Today, there is
    no point in this additional check apart from optimization, because
    css_from_id() has no css with id 0 to find.

    Signed-off-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

03 Mar, 2015

2 commits

  • The wrapper already calls the appropriate free
    function; use it instead of rolling our own.

    Signed-off-by: Bandan Das
    Acked-by: Zefan Li
    Signed-off-by: Tejun Heo

    Bandan Das
     
  • Currently, we call cgroup_subsys->bind only on unmount, remount, and
    when creating a new root on mount. Since the default hierarchy root is
    created in cgroup_init, we will not call cgroup_subsys->bind if the
    default hierarchy is freshly mounted. As a result, some controllers will
    behave incorrectly (most notably, the "memory" controller will not
    enable hierarchy support). Fix this by calling cgroup_subsys->bind right
    after initializing a cgroup subsystem.

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Tejun Heo

    Vladimir Davydov
     

14 Feb, 2015

1 commit

  • When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
    making a separate copy of its name. It's currently only used for sysfs
    attributes whose filenames are required to stay accessible and unchanged.
    There are rare exceptions where these names are allocated and formatted
    dynamically but for the vast majority of cases they're consts in the
    rodata section.

    Now that kernfs is converted to use kstrdup_const() and kfree_const(),
    there's little point in keeping KERNFS_STATIC_NAME around. Remove it.

    Signed-off-by: Tejun Heo
    Cc: Andrzej Hajda
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

13 Feb, 2015

1 commit

  • Currently, we release css->id in css_release_work_fn, right before calling
    css_free callback, so that when css_free is called, the id may have
    already been reused for a new cgroup.

    I am going to use css->id to create unique names for per memcg kmem
    caches. Since kmem caches are destroyed only on css_free, I need css->id
    to be freed after css_free was called to avoid name clashes. This patch
    therefore moves css->id removal to css_free_work_fn. To prevent
    css_from_id from returning a pointer to a stale css, it makes
    css_release_work_fn replace the css ptr at css_idr:css->id with NULL.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Acked-by: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
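
The two-step lifetime described in the commit above (hide the pointer at release time, give the id back only at free time) can be sketched with a toy id table. This is an illustrative userspace model, not the kernel's idr: `struct id_table`, `id_alloc()`, `id_hide()` and `id_free()` are hypothetical names, and the fixed-size array is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 8

/* Toy stand-in for an id allocator that maps ids to object pointers. */
struct id_table {
    int   used[MAX_IDS];  /* 1 while the id is still reserved */
    void *ptr[MAX_IDS];   /* object visible to lookups, or NULL */
};

/* Reserve the lowest free id (starting at 1) and publish obj under it. */
static int id_alloc(struct id_table *t, void *obj)
{
    for (int id = 1; id < MAX_IDS; id++) {
        if (!t->used[id]) {
            t->used[id] = 1;
            t->ptr[id] = obj;
            return id;
        }
    }
    return -1;
}

/* Lookup returns NULL for unknown ids and for hidden (released) objects. */
static void *id_lookup(const struct id_table *t, int id)
{
    if (id <= 0 || id >= MAX_IDS || !t->used[id])
        return NULL;
    return t->ptr[id];
}

/* Release step: lookups stop finding the object, the id stays reserved. */
static void id_hide(struct id_table *t, int id)
{
    t->ptr[id] = NULL;
}

/* Free step: only now may the id be handed out again. */
static void id_free(struct id_table *t, int id)
{
    t->used[id] = 0;
    t->ptr[id] = NULL;
}
```

Between `id_hide()` and `id_free()` the id cannot be reused, which is the property the commit relies on to avoid name clashes for anything that is named after the id and torn down only at free time.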
     

22 Jan, 2015

1 commit

  • Since b2052564e66d ("mm: memcontrol: continue cache reclaim from
    offlined groups"), re-mounting the memory controller after using it is
    very likely to hang.

    The cgroup core assumes that any remaining references after deleting a
    cgroup are temporary in nature, and synchronously waits for them, but
    the above-mentioned commit has left-over page cache pin its css until
    it is reclaimed naturally. That being said, swap entries and charged
    kernel memory have been doing the same indefinite pinning forever; the
    bug is just more likely to trigger with left-over page cache.

    Reparenting kernel memory is highly impractical, which leaves changing
    the cgroup assumptions to reflect this: once a controller has been
    mounted and used, it has internal state that is independent from mount
    and cgroup lifetime. It can be unmounted and remounted, but it can't
    be reconfigured during subsequent mounts.

    Don't offline the controller root as long as there are any children,
    dead or alive. A remount will no longer wait for these old references
    to drain; it will simply mount the persistent controller state again.

    Reported-by: "Suzuki K. Poulose"
    Reported-by: Will Deacon
    Signed-off-by: Johannes Weiner
    Signed-off-by: Tejun Heo

    Johannes Weiner
     

18 Nov, 2014

6 commits

  • Implement cgroup_get_e_css() which finds and gets the effective css
    for the specified cgroup and subsystem combination. This function
    always returns a valid pinned css. This will be used by cgroup
    writeback support.

    While at it, add a comment to cgroup_e_css() to explain why that
    function is different from cgroup_get_e_css() and has to test
    cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss).

    Signed-off-by: Tejun Heo
    Acked-by: Zefan Li

    Tejun Heo
     
  • Add a new cgroup_subsys operation ->css_e_css_changed(). This is
    invoked if any of the effective csses seen from the css's cgroup may
    have changed. This will be used to implement cgroup writeback
    support.

    Signed-off-by: Tejun Heo
    Acked-by: Zefan Li

    Tejun Heo
     
  • Add a new cgroup subsys callback css_released(). This is called when
    the reference count of the css (cgroup_subsys_state) reaches zero,
    before the RCU-deferred free is scheduled.

    Signed-off-by: Tejun Heo
    Acked-by: Zefan Li

    Tejun Heo
     
  • When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared
    asynchronously. If cgroup_subtree_control_write() is requested to
    enable the subsystem again before the entry is cleared, it has to wait
    for the previous offlining to finish and clear the @cgrp->subsys[]
    entry before trying to enable the subsystem again.

    This is currently done while verifying the input enable / disable
    parameters. This used to be correct but f63070d350e3 ("cgroup: make
    interface files visible iff enabled on cgroup->subtree_control")
    breaks it. The commit is one of the commits implementing subsystem
    dependency.

    Through subsystem dependency, some subsystems may be enabled and
    disabled implicitly in addition to the explicitly requested ones. The
    actual subsystems to be enabled and disabled are determined during
    @css_enable/disable calculation. The current offline wait logic skips
    the ones which are already implicitly enabled and then waits for
    subsystems in @enable; however, this misses the subsystems which may
    be implicitly enabled through dependency from @enable. If such an
    implicitly enabled subsystem hasn't finished offlining yet, the
    function ends up trying to create a css when its @cgrp->subsys[] slot
    is already occupied, triggering BUG_ON() in init_and_link_css().

    Fix it by moving the wait logic after @css_enable is calculated and
    waiting for all the subsystems in @css_enable. This fixes the above
    bug as the mask contains all subsystems which are to be enabled
    including the ones enabled through dependencies.

    Signed-off-by: Tejun Heo
    Fixes: f63070d350e3 ("cgroup: make interface files visible iff enabled on cgroup->subtree_control")
    Acked-by: Zefan Li

    Tejun Heo
     
  • Make cgroup_subtree_control_write() first calculate new
    subtree_control (new_sc), child_subsys_mask (new_ss) and
    css_enable/disable masks before applying them to the cgroup. Also,
    store the original subtree_control (old_sc) and child_subsys_mask
    (old_ss) and use them to restore the original state after failure.

    This patch shouldn't cause any behavior changes. This prepares for a
    fix for a bug in the async css offline wait logic.

    Signed-off-by: Tejun Heo
    Acked-by: Zefan Li

    Tejun Heo
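
The "calculate first, apply afterwards" flow from the commit above, together with the dependency handling it prepares for, can be sketched with plain bitmasks. This is a simplified userspace model under stated assumptions: `NR_SUBSYS`, the `depends_on[]` table, `struct sc_update` and `plan_update()` are hypothetical, and no real cgroup state is touched.

```c
#include <assert.h>

#define NR_SUBSYS 3

/*
 * Expand subtree_control with everything it depends on, repeating until
 * the mask stops growing. depends_on[i] is a hypothetical bitmask of the
 * subsystems that subsystem i requires.
 */
static unsigned int calc_child_subsys_mask(unsigned int subtree_control,
                                           const unsigned int depends_on[NR_SUBSYS])
{
    unsigned int cur = subtree_control;

    for (;;) {
        unsigned int next = cur;

        for (int i = 0; i < NR_SUBSYS; i++)
            if (cur & (1u << i))
                next |= depends_on[i];
        if (next == cur)
            return cur;
        cur = next;
    }
}

/* Everything that is computed before any state is modified. */
struct sc_update {
    unsigned int new_sc;       /* requested subtree_control */
    unsigned int new_ss;       /* new_sc plus implicit dependencies */
    unsigned int css_enable;   /* subsystems that need a new css */
    unsigned int css_disable;  /* subsystems whose css goes away */
};

static struct sc_update plan_update(unsigned int old_sc, unsigned int old_ss,
                                    unsigned int enable, unsigned int disable,
                                    const unsigned int depends_on[NR_SUBSYS])
{
    struct sc_update u;

    u.new_sc = (old_sc | enable) & ~disable;
    u.new_ss = calc_child_subsys_mask(u.new_sc, depends_on);
    u.css_enable = ~old_ss & u.new_ss;
    u.css_disable = old_ss & ~u.new_ss;
    return u;
}
```

Because `css_enable` is derived from `new_ss`, it also contains subsystems pulled in only through dependencies, which is why waiting on that mask (rather than on the explicit enable request) covers the implicitly enabled ones. Keeping `old_sc` and `old_ss` around is what makes restoring the previous state on failure straightforward.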
     
  • cgroup_refresh_child_subsys_mask() calculates and updates the
    effective @cgrp->child_subsys_mask according to the current
    @cgrp->subtree_control. Separate out the calculation part into
    cgroup_calc_child_subsys_mask(). This will be used to fix a bug in
    the async css offline wait logic.

    Signed-off-by: Tejun Heo
    Acked-by: Zefan Li

    Tejun Heo
     

10 Oct, 2014

2 commits

  • Pull percpu updates from Tejun Heo:
    "A lot of activities on percpu front. Notable changes are...

    - percpu allocator now can take @gfp. If @gfp doesn't contain
    GFP_KERNEL, it tries to allocate from what's already available to
    the allocator and a work item tries to keep the reserve around
    certain level so that these atomic allocations usually succeed.

    This will replace the ad-hoc percpu memory pool used by
    blk-throttle and also be used by the planned blkcg support for
    writeback IOs.

    Please note that I noticed a bug in how @gfp is interpreted while
    preparing this pull request and applied the fix 6ae833c7fe0c
    ("percpu: fix how @gfp is interpreted by the percpu allocator")
    just now.

    - percpu_ref now uses longs for percpu and global counters instead of
    ints. It leads to more sparse packing of the percpu counters on
    64bit machines but the overhead should be negligible and this
    allows using percpu_ref for refcnting pages and in-memory objects
    directly.

    - The switching between percpu and single counter modes of a
    percpu_ref is made independent of putting the base ref and a
    percpu_ref can now optionally be initialized in single or killed
    mode. This allows avoiding percpu shutdown latency for cases where
    the refcounted objects may be synchronously created and destroyed
    in rapid succession with only a fraction of them reaching fully
    operational status (SCSI probing does this when combined with
    blk-mq support). It's also planned to be used to implement forced
    single mode to detect underflow more timely for debugging.

    There's a separate branch percpu/for-3.18-consistent-ops which cleans
    up the duplicate percpu accessors. That branch causes a number of
    conflicts with s390 and other trees. I'll send a separate pull
    request w/ resolutions once other branches are merged"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
    percpu: fix how @gfp is interpreted by the percpu allocator
    blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
    percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
    percpu_ref: add PERCPU_REF_INIT_* flags
    percpu_ref: decouple switching to percpu mode and reinit
    percpu_ref: decouple switching to atomic mode and killing
    percpu_ref: add PCPU_REF_DEAD
    percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
    percpu_ref: replace pcpu_ prefix with percpu_
    percpu_ref: minor code and comment updates
    percpu_ref: relocate percpu_ref_reinit()
    Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
    Revert "percpu: free percpu allocation info for uniprocessor system"
    percpu-refcount: make percpu_ref based on longs instead of ints
    percpu-refcount: improve WARN messages
    percpu: fix locking regression in the failure path of pcpu_alloc()
    percpu-refcount: add @gfp to percpu_ref_init()
    proportions: add @gfp to init functions
    percpu_counter: add @gfp to percpu_counter_init()
    percpu_counter: make percpu_counters_lock irq-safe
    ...

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:
    "Nothing too interesting. Just a handful of cleanup patches"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    Revert "cgroup: remove redundant variable in cgroup_mount()"
    cgroup: remove redundant variable in cgroup_mount()
    cgroup: fix missing unlock in cgroup_release_agent()
    cgroup: remove CGRP_RELEASABLE flag
    perf/cgroup: Remove perf_put_cgroup()
    cgroup: remove redundant check in cgroup_ino()
    cpuset: simplify proc_cpuset_show()
    cgroup: simplify proc_cgroup_show()
    cgroup: use a per-cgroup work for release agent
    cgroup: remove bogus comments
    cgroup: remove redundant code in cgroup_rmdir()
    cgroup: remove some useless forward declarations
    cgroup: fix a typo in comment.

    Linus Torvalds
     

26 Sep, 2014

1 commit

  • This reverts commit 0c7bf3e8cab7900e17ce7f97104c39927d835469.

    If there are child cgroups in the cgroupfs and then we umount it,
    the superblock will be destroyed but the cgroup_root will be kept
    around. When we mount it again, cgroup_mount() will find this
    cgroup_root and allocate a new sb for it.

    So with this commit we will be trapped in a dead loop in the case
    described above, because kernfs_pin_sb() keeps returning NULL.

    Currently I don't see how we can avoid using both pinned_sb and
    new_sb, so just revert it.

    Cc: Al Viro
    Reported-by: Andrey Wagin
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo

    Zefan Li
     

25 Sep, 2014

2 commits

  • With the recent addition of percpu_ref_reinit(), percpu_ref now can be
    used as a persistent switch which can be turned on and off repeatedly
    where turning off maps to killing the ref and waiting for it to drain;
    however, there currently isn't a way to initialize a percpu_ref in its
    off (killed and drained) state, which can be inconvenient for certain
    persistent switch use cases.

    Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
    selection of operation mode; however, currently a newly initialized
    percpu_ref is always in percpu mode making it impossible to avoid the
    latency overhead of switching to atomic mode.

    This patch adds @flags to percpu_ref_init() and implements the
    following flags.

    * PERCPU_REF_INIT_ATOMIC : start ref in atomic mode
    * PERCPU_REF_INIT_DEAD : start ref killed and drained

    These flags should be able to serve the above two use cases.

    v2: target_core_tpg.c conversion was missing. Fixed.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Johannes Weiner

    Tejun Heo
     
  • …linux-block into for-3.18

    This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a
    kludge for SCSI blk-mq stall during probe") which implements
    __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The
    commit will be reverted and patches implementing a proper fix will be
    added.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Kent Overstreet <kmo@daterainc.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Christoph Hellwig <hch@lst.de>

    Tejun Heo
     

21 Sep, 2014

2 commits

  • Both pinned_sb and new_sb indicate if a new superblock is needed,
    so we can just remove new_sb.

    Note now we must check if kernfs_tryget_sb() returns NULL, because
    when it returns NULL, kernfs_mount() may still re-use an existing
    superblock, which was just allocated by another concurrent mount.

    Suggested-by: Tejun Heo
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo

    Zefan Li
     
  • The patch 971ff4935538: "cgroup: use a per-cgroup work for release
    agent" from Sep 18, 2014, leads to the following static checker
    warning:

    kernel/cgroup.c:5310 cgroup_release_agent()
    warn: 'mutex:&cgroup_mutex' is sometimes locked here and sometimes unlocked.

    Reported-by: Dan Carpenter
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo

    Zefan Li
     

19 Sep, 2014

4 commits

  • We call put_css_set() after setting the CGRP_RELEASABLE flag in
    cgroup_task_migrate(), but in other places we call it without setting
    the flag. I don't see the necessity of this flag.

    Moreover, once the flag is set, it will never be cleared except by
    writing to the notify_on_release control file, so it can be quite
    confusing if we look at the output of debug.releasable.

    # mount -t cgroup -o debug xxx /cgroup
    # mkdir /cgroup/child
    # cat /cgroup/child/debug.releasable
    0
    # echo $$ > /cgroup/child/tasks
    # cat /cgroup/child/debug.releasable
    0
    # echo $$ > /cgroup/tasks && echo $$ > /cgroup/child/tasks
    # cat /cgroup/child/debug.releasable
    1

    Signed-off-by: Tejun Heo

    Zefan Li
     
  • Use the ONE macro instead of REG, and we can simplify proc_cgroup_show().

    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo

    Zefan Li
     
  • Instead of using a global work to schedule release agent on removable
    cgroups, we change to use a per-cgroup work to do this, which makes
    the code much simpler.

    v2: use a dedicated work instead of reusing css->destroy_work. (Tejun)

    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo

    Zefan Li
     
  • cgroup_pidlist_start() holds cgrp->pidlist_mutex and then calls
    pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex.

    It is wrong that we release the mutex in the failure path in
    pidlist_array_load(), because cgroup_pidlist_stop() will be called
    whether or not cgroup_pidlist_start() returns an error.

    Fixes: 4bac00d16a8760eae7205e41d2c246477d42a210
    Cc: # 3.14+
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo
    Acked-by: Cong Wang

    Zefan Li
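
The locking rule behind the fix above is that a start/stop pair has exactly one unlock site. Here is a minimal sketch of that rule, assuming a simplified model in which a counter stands in for the mutex; `struct seq_ctx`, `load_step()`, `seq_start()` and `seq_stop()` are hypothetical names and not the kernel functions.

```c
#include <assert.h>
#include <errno.h>

struct seq_ctx {
    int lock_depth;  /* stands in for the mutex: 1 = held, 0 = free */
    int fail_load;   /* test knob: force the load step to fail */
};

/* The load step must NOT unlock on failure, because stop() always runs. */
static int load_step(struct seq_ctx *c)
{
    return c->fail_load ? -ENOMEM : 0;
}

/* Takes the lock; on error the lock is deliberately left held. */
static int seq_start(struct seq_ctx *c)
{
    c->lock_depth++;
    return load_step(c);
}

/* The only unlock site, reached after both success and failure. */
static void seq_stop(struct seq_ctx *c)
{
    c->lock_depth--;
}
```

If `load_step()` also dropped the lock on failure, `lock_depth` would end at -1 after `seq_stop()`, which corresponds to the double unlock the commit removes.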
     

18 Sep, 2014

4 commits


08 Sep, 2014

1 commit

  • Percpu allocator now supports allocation mask. Add @gfp to
    percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
    with percpu_refs too.

    This patch doesn't make any functional difference.

    v2: blk-mq conversion was missing. Updated.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Benjamin LaHaise
    Cc: Li Zefan
    Cc: Nicholas A. Bellinger
    Cc: Jens Axboe

    Tejun Heo
     

05 Sep, 2014

2 commits

  • When cgroup_kn_lock_live() is called through some kernfs operation and
    another thread is calling cgroup_rmdir(), we'll trigger the warning in
    cgroup_get().

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0()
    ...
    Call Trace:
    [] dump_stack+0x41/0x52
    [] warn_slowpath_common+0x7f/0xa0
    [] warn_slowpath_null+0x1d/0x20
    [] cgroup_get+0x89/0xa0
    [] cgroup_kn_lock_live+0x28/0x70
    [] __cgroup_procs_write.isra.26+0x51/0x230
    [] cgroup_tasks_write+0x12/0x20
    [] cgroup_file_write+0x40/0x130
    [] kernfs_fop_write+0xd1/0x160
    [] vfs_write+0x98/0x1e0
    [] SyS_write+0x4d/0xa0
    [] sysenter_do_call+0x12/0x12
    ---[ end trace 6f2e0c38c2108a74 ]---

    Fix this by calling css_tryget() instead of cgroup_get().

    v2:
    - move cgroup_tryget() right below cgroup_get() definition. (Tejun)

    Cc: # 3.15+
    Reported-by: Toralf Förster
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo

    Li Zefan
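
The difference between an unconditional "get" and a "tryget" that the fix above depends on can be shown with a toy reference counter. This is a single-threaded sketch under stated assumptions: `struct ref` and its helpers are hypothetical, and a real implementation would need atomic operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, non-atomic reference counter; illustration only. */
struct ref {
    long count;
};

/* Unconditional get: only valid while the caller already holds a reference. */
static void ref_get(struct ref *r)
{
    assert(r->count > 0);
    r->count++;
}

/* Tryget: refuses to revive an object whose count already dropped to zero. */
static bool ref_tryget(struct ref *r)
{
    if (r->count == 0)
        return false;
    r->count++;
    return true;
}

static void ref_put(struct ref *r)
{
    r->count--;
}
```

A caller that may race with removal checks the result of `ref_tryget()` and backs out on failure, instead of calling `ref_get()` on an object that might already be dead.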
     
  • Run these two scripts concurrently:

    for ((; ;))
    {
    mkdir /cgroup/sub
    rmdir /cgroup/sub
    }

    for ((; ;))
    {
    echo $$ > /cgroup/sub/cgroup.procs
    echo $$ > /cgroup/cgroup.procs
    }

    A kernel bug will be triggered:

    BUG: unable to handle kernel NULL pointer dereference at 00000038
    IP: [] cgroup_put+0x9/0x80
    ...
    Call Trace:
    [] cgroup_kn_unlock+0x39/0x50
    [] cgroup_kn_lock_live+0x61/0x70
    [] __cgroup_procs_write.isra.26+0x51/0x230
    [] cgroup_tasks_write+0x12/0x20
    [] cgroup_file_write+0x40/0x130
    [] kernfs_fop_write+0xd1/0x160
    [] vfs_write+0x98/0x1e0
    [] SyS_write+0x4d/0xa0
    [] sysenter_do_call+0x12/0x12

    We clear cgrp->kn->priv at the end of cgroup_rmdir(), but another
    concurrent thread can access kn->priv after the clearing.

    We should move the clearing to css_release_work_fn(). At that time
    no one is holding reference to the cgroup and no one can gain a new
    reference to access it.

    v2:
    - move RCU_INIT_POINTER() into the else block. (Tejun)
    - remove the cgroup_parent() check. (Tejun)
    - update the comment in css_tryget_online_from_dir().

    Cc: # 3.15+
    Reported-by: Toralf Förster
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo

    Li Zefan
     

25 Aug, 2014

1 commit


23 Aug, 2014

1 commit

  • Kernel command line parameter cgroup__DEVEL__legacy_files_on_dfl forces
    legacy cgroup files to show up on the default hierarchy if a subsystem
    does not have any files defined for the default hierarchy.

    But this seems to be working only if legacy files are defined in
    ss->legacy_cftypes. If one adds some cftypes later using
    cgroup_add_legacy_cftypes(), these files don't show up on default
    hierarchy. Update the function accordingly so that the dynamically
    added legacy files also show up in the default hierarchy if the target
    subsystem is also using the base legacy files for the default
    hierarchy.

    tj: Patch description and comment updates.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Tejun Heo

    Vivek Goyal
     

18 Aug, 2014

1 commit


05 Aug, 2014

2 commits

  • Pull cgroup changes from Tejun Heo:
    "Mostly changes to get the v2 interface ready. The core features are
    mostly ready now and I think it's reasonable to expect to drop the
    devel mask in one or two devel cycles at least for a subset of
    controllers.

    - cgroup added a controller dependency mechanism so that block cgroup
    can depend on memory cgroup. This will be used to finally support
    IO provisioning on the writeback traffic, which is currently being
    implemented.

    - The v2 interface now uses a separate table so that the interface
    files for the new interface are explicitly declared in one place.
    Each controller will explicitly review and add the files for the
    new interface.

    - cpuset is getting ready for the hierarchical behavior which is in
    the similar style with other controllers so that an ancestor's
    configuration change doesn't change the descendants' configurations
    irreversibly and processes aren't silently migrated when a CPU or
    node goes down.

    All the changes are to the new interface and no behavior changed for
    the multiple hierarchies"

    * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
    cpuset: fix the WARN_ON() in update_nodemasks_hier()
    cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
    cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
    cgroup: distinguish the default and legacy hierarchies when handling cftypes
    cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
    cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
    cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
    cpuset: export effective masks to userspace
    cpuset: allow writing offlined masks to cpuset.cpus/mems
    cpuset: enable onlined cpu/node in effective masks
    cpuset: refactor cpuset_hotplug_update_tasks()
    cpuset: make cs->{cpus, mems}_allowed as user-configured masks
    cpuset: apply cs->effective_{cpus,mems}
    cpuset: initialize top_cpuset's configured masks at mount
    cpuset: use effective cpumask to build sched domains
    cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
    cpuset: update cs->effective_{cpus, mems} when config changes
    cpuset: update cpuset->effective_{cpus,mems} at hotplug
    cpuset: add cs->effective_cpus and cs->effective_mems
    cgroup: clean up sane_behavior handling
    ...

    Linus Torvalds
     
  • Pull percpu updates from Tejun Heo:

    - Major reorganization of percpu header files which I think makes
    things a lot more readable and logical than before.

    - percpu-refcount is updated so that it requires explicit destruction
    and can be reinitialized if necessary. This was pulled into the
    block tree to replace the custom percpu refcnting implemented in
    blk-mq.

    - In the process, percpu and percpu-refcount got cleaned up a bit

    * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
    percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
    percpu-refcount: require percpu_ref to be exited explicitly
    percpu-refcount: use unsigned long for pcpu_count pointer
    percpu-refcount: add helpers for ->percpu_count accesses
    percpu-refcount: one bit is enough for REF_STATUS
    percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
    workqueue: stronger test in process_one_work()
    workqueue: clear POOL_DISASSOCIATED in rebind_workers()
    percpu: Use ALIGN macro instead of hand coding alignment calculation
    percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
    percpu: preffity percpu header files
    percpu: use raw_cpu_*() to define __this_cpu_*()
    percpu: reorder macros in percpu header files
    percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
    percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
    percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
    percpu: reorganize include/linux/percpu-defs.h
    percpu: move accessors from include/linux/percpu.h to percpu-defs.h
    percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
    percpu: introduce arch_raw_cpu_ptr()
    ...

    Linus Torvalds
     

15 Jul, 2014

2 commits

  • cgrp_dfl_root_inhibit_ss_mask determines which subsystems are not
    supported on the default hierarchy and is currently initialized
    statically and just includes the debug subsystem. Now that there's
    cgroup_subsys->dfl_files, we can easily tell which subsystems support
    the default hierarchy or not.

    Let's initialize cgrp_dfl_root_inhibit_ss_mask by testing whether
    cgroup_subsys->dfl_files is NULL. After all, subsystems with NULL
    ->dfl_files aren't useable on the default hierarchy anyway.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo
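
Deriving the inhibit mask from a NULL test, as the commit above describes, amounts to one pass over the subsystem table. The following is a minimal userspace sketch: `struct subsys` and `calc_inhibit_mask()` are hypothetical stand-ins, and only the idea of testing a `dfl_files` pointer comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down subsystem descriptor. */
struct subsys {
    const char *name;
    const void *dfl_files;  /* NULL when no default-hierarchy files exist */
};

/* Set bit i for every subsystem that has no default-hierarchy files. */
static unsigned int calc_inhibit_mask(const struct subsys *ss, size_t n)
{
    unsigned int mask = 0;

    for (size_t i = 0; i < n; i++)
        if (!ss[i].dfl_files)
            mask |= 1u << i;
    return mask;
}
```

Computing the mask at init time this way means a subsystem stops being inhibited as soon as it gains default-hierarchy files, without anyone having to maintain a static list.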
     
  • cgroup now distinguishes cftypes for the default and legacy
    hierarchies more explicitly by using separate arrays and
    CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only
    inside cgroup core proper. Let's make it clear that the flags are
    internal by prefixing them with double underscores.

    CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency. The
    two flags are also collected and assigned bits >= 16 so that they
    aren't mixed with the published flags.

    v2: Convert the extra ones in cgroup_exit_cftypes() which are added by
    revision to the previous patch.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan

    Tejun Heo