Eric Lee / smarc-fsl-linux-kernel

08 Nov, 2011

1 commit

5f1116167 Documentation: Fix typo in freezer-subsystem.txt ... Browse Code »

Fix a typo in Documentation/cgroups/freezer-subsystem.txt.

Signed-off-by: Rafael J. Wysocki
Reviewed-by: Srivatsa S. Bhat
Acked-by: Randy Dunlap

Rafael J. Wysocki
2011-11-08 06:02:25 +0800

05 Nov, 2011

1 commit

5fe69d7e2 Documentation: update cgroups notes ... Browse Code »

- ns cgroup has been removed.
- it's true moving a task to another cgroup can fail.

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Li Zefan
2011-11-05 03:01:46 +0800

03 Nov, 2011

1 commit

9b272977e memcg: skip scanning active lists based on individual size ... Browse Code »

Reclaim decides to skip scanning an active list when the corresponding
inactive list is above a certain size in comparison to leave the assumed
working set alone while there are still enough reclaim candidates around.

The memcg implementation of comparing those lists instead reports whether
the whole memcg is low on the requested type of inactive pages,
considering all nodes and zones.

This can lead to an oversized active list not being scanned because of the
state of the other lists in the memcg, as well as an active list being
scanned while its corresponding inactive list has enough pages.

Not only is this wrong, it's also a scalability hazard, because the global
memory state over all nodes and zones has to be gathered for each memcg
and zone scanned.

Make these calculations purely based on the size of the two LRU lists
that are actually affected by the outcome of the decision.

Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
Cc: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Reviewed-by: Minchan Kim
Reviewed-by: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-11-03 07:07:00 +0800

15 Sep, 2011

1 commit

185efc0f9 memcg: Revert "memcg: add memory.vmscan_stat" ... Browse Code »

Revert the post-3.0 commit 82f9d486e59f5 ("memcg: add
memory.vmscan_stat").

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between. Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now,
we can try again later.

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Michal Hocko
Cc: Ying Han
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-09-15 09:09:38 +0800

27 Jul, 2011

1 commit

82f9d486e memcg: add memory.vmscan_stat ... Browse Code »

The commit log of 0ae5e89c60c9 ("memcg: count the soft_limit reclaim
in...") says it adds scanning stats to memory.stat file. But it doesn't
because we considered we needed to make a concensus for such new APIs.

This patch is a trial to add memory.scan_stat. This shows
- the number of scanned pages(total, anon, file)
- the number of rotated pages(total, anon, file)
- the number of freed pages(total, anon, file)
- the number of elaplsed time (including sleep/pause time)

for both of direct/soft reclaim.

The biggest difference with oringinal Ying's one is that this file
can be reset by some write, as

# echo 0 ...../memory.scan_stat

Example of output is here. This is a result after make -j 6 kernel
under 300M limit.

[kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
[kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
scanned_pages_by_limit 9471864
scanned_anon_pages_by_limit 6640629
scanned_file_pages_by_limit 2831235
rotated_pages_by_limit 4243974
rotated_anon_pages_by_limit 3971968
rotated_file_pages_by_limit 272006
freed_pages_by_limit 2318492
freed_anon_pages_by_limit 962052
freed_file_pages_by_limit 1356440
elapsed_ns_by_limit 351386416101
scanned_pages_by_system 0
scanned_anon_pages_by_system 0
scanned_file_pages_by_system 0
rotated_pages_by_system 0
rotated_anon_pages_by_system 0
rotated_file_pages_by_system 0
freed_pages_by_system 0
freed_anon_pages_by_system 0
freed_file_pages_by_system 0
elapsed_ns_by_system 0
scanned_pages_by_limit_under_hierarchy 9471864
scanned_anon_pages_by_limit_under_hierarchy 6640629
scanned_file_pages_by_limit_under_hierarchy 2831235
rotated_pages_by_limit_under_hierarchy 4243974
rotated_anon_pages_by_limit_under_hierarchy 3971968
rotated_file_pages_by_limit_under_hierarchy 272006
freed_pages_by_limit_under_hierarchy 2318492
freed_anon_pages_by_limit_under_hierarchy 962052
freed_file_pages_by_limit_under_hierarchy 1356440
elapsed_ns_by_limit_under_hierarchy 351386416101
scanned_pages_by_system_under_hierarchy 0
scanned_anon_pages_by_system_under_hierarchy 0
scanned_file_pages_by_system_under_hierarchy 0
rotated_pages_by_system_under_hierarchy 0
rotated_anon_pages_by_system_under_hierarchy 0
rotated_file_pages_by_system_under_hierarchy 0
freed_pages_by_system_under_hierarchy 0
freed_anon_pages_by_system_under_hierarchy 0
freed_file_pages_by_system_under_hierarchy 0
elapsed_ns_by_system_under_hierarchy 0

total_xxxx is for hierarchy management.

This will be useful for further memcg developments and need to be
developped before we do some complicated rework on LRU/softlimit
management.

This patch adds a new struct memcg_scanrecord into scan_control struct.
sc->nr_scanned at el is not designed for exporting information. For
example, nr_scanned is reset frequentrly and incremented +2 at scanning
mapped pages.

To avoid complexity, I added a new param in scan_control which is for
exporting scanning score.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Michal Hocko
Cc: Ying Han
Cc: Andrew Bresticker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2011-07-27 07:49:42 +0800

24 Jul, 2011

2 commits

9fd615f46 Documentation: fix ambigous text for root cpuset ... Browse Code »

Only the root cpuset has cpuset.memory_pressure_enabled flag,
but not the only one.

Signed-off-by: Wanlong Gao
Signed-off-by: Randy Dunlap
Acked-by: Paul Menage
Signed-off-by: Linus Torvalds

Wanlong Gao
2011-07-24 01:58:08 +0800
e47f9d84e Documentation: fix echo command in cgroups/cpuacct.txt ... Browse Code »

Must echo a task id to the cgroups' tasks file, but not to a directory.

Signed-off-by: Wanlong Gao
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Wanlong Gao
2011-07-24 01:58:08 +0800

07 Jul, 2011

1 commit

9b61fc4cf Documentation: fix cgroup blkio throttle filenames ... Browse Code »

All the blkio.throttle.* file names are incorrectly reported without
".throttle" in the documentation. Fix it.

Signed-off-by: Andrea Righi
Signed-off-by: Randy Dunlap
Acked-by: Vivek Goyal
Signed-off-by: Linus Torvalds

Andrea Righi
2011-07-07 04:17:51 +0800

16 Jun, 2011

3 commits

67de0162f Documentation: fix cgroup typos and formatting ... Browse Code »

Fix format and spelling.

Signed-off-by: Jörg Sommer
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Jörg Sommer
2011-06-16 12:52:50 +0800
f6e07d380 Documentation: update cgroupfs mount point ... Browse Code »

According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup. Hence, this should be used in the documentation.

Signed-off-by: Jörg Sommer
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Jörg Sommer
2011-06-16 12:52:50 +0800
50c35e5ba memcg: add documentation for the memory.numastat API ... Browse Code »

[akpm@linux-foundation.org: rework text, fit it into 80-cols]
Signed-off-by: Ying Han
Reviewed-by: KOSAKI Motohiro
Acked-by: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ying Han
2011-06-16 11:03:59 +0800

27 May, 2011

3 commits

a77aea920 cgroup: remove the ns_cgroup ... Browse Code »

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

* cgroup creation is out-of-control
* cgroup name can conflict when pids are looping
* it is not possible to have a single process handling a lot of
namespaces without falling in a exponential creation time
* we may want to create a namespace without creating a cgroup

The ns_cgroup was replaced by a compatibility flag 'clone_children',
where a newly created cgroup will copy the parent cgroup values.
The userspace has to manually create a cgroup and add a task to
the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal. Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano
Signed-off-by: Serge E. Hallyn
Cc: Eric W. Biederman
Cc: Jamal Hadi Salim
Reviewed-by: Li Zefan
Acked-by: Paul Menage
Acked-by: Matt Helsley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Lezcano
2011-05-27 08:12:34 +0800
74a1166df cgroups: make procs file writable ... Browse Code »

Make procs file writable to move all threads by tgid at once.

Add functionality that enables users to move all threads in a threadgroup
at once to a cgroup by writing the tgid to the 'cgroup.procs' file. This
current implementation makes use of a per-threadgroup rwsem that's taken
for reading in the fork() path to prevent newly forking threads within the
threadgroup from "escaping" while the move is in progress.

Signed-off-by: Ben Blum
Cc: "Eric W. Biederman"
Cc: Li Zefan
Cc: Matt Helsley
Reviewed-by: Paul Menage
Cc: Oleg Nesterov
Cc: David Rientjes
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2011-05-27 08:12:34 +0800
f780bdb7c cgroups: add per-thread subsystem callbacks ... Browse Code »

Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for cgroups's subsystem interface. Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.

Also, the old "bool threadgroup" interface is removed, as replaced by
this. All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.

This is a pre-patch for cgroup-procs-writable.patch.

Signed-off-by: Ben Blum
Cc: "Eric W. Biederman"
Cc: Li Zefan
Cc: Matt Helsley
Reviewed-by: Paul Menage
Cc: Oleg Nesterov
Cc: David Rientjes
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2011-05-27 08:12:34 +0800

29 Apr, 2011

1 commit

a111c966a memcg: update documentation to describe usage_in_bytes ... Browse Code »

Since 569b846d ("memcg: coalesce uncharge during unmap/truncate"), we do
batched (delayed) uncharge at truncation/unmap. And since cdec2e42(memcg:
coalesce charging via percpu storage), we have percpu cache for
res_counter.

These changes improved performance of memory cgroup very much, but made
res_counter->usage usually have a bigger value than the actual value of
memory usage. So, *.usage_in_bytes, which show res_counter->usage, are
not desirable for precise values of memory(and swap) usage anymore.

Instead of removing these files completely(because we cannot know
res_counter->usage without them), this patch updates the meaning of those
files.

Signed-off-by: Daisuke Nishimura
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Michal Hocko
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke Nishimura
2011-04-29 02:28:21 +0800

05 Apr, 2011

1 commit

6ad85239d Documentation: update cgroups info user groups names ... Browse Code »

Update suitable words to explain / understand cgroups contents.

Signed-off-by: Geunsik Lim
Cc: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Geunsik Lim
2011-04-05 08:51:47 +0800

25 Mar, 2011

1 commit

6c5103890 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...

Fix up conflicts in fs/{aio.c,super.c}

Linus Torvalds
2011-03-25 01:16:26 +0800

19 Mar, 2011

1 commit

e16b396ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
Update cpuset info & webiste for cgroups
dcdbas: force SMI to happen when expected
arch/arm/Kconfig: remove one to many l's in the word.
asm-generic/user.h: Fix spelling in comment
drm: fix printk typo 'sracth'
Remove one to many n's in a word
Documentation/filesystems/romfs.txt: fixing link to genromfs
drivers:scsi Change printk typo initate -> initiate
serial, pch uart: Remove duplicate inclusion of linux/pci.h header
fs/eventpoll.c: fix spelling
mm: Fix out-of-date comments which refers non-existent functions
drm: Fix printk typo 'failled'
coh901318.c: Change initate to initiate.
mbox-db5500.c Change initate to initiate.
edac: correct i82975x error-info reported
edac: correct i82975x mci initialisation
edac: correct commented info
fs: update comments to point correct document
target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
...

Trivial conflict in fs/eventpoll.c (spelling vs addition)

Linus Torvalds
2011-03-19 01:37:40 +0800

17 Mar, 2011

1 commit

bb6405eab Documentation: update cgroup pid and cpuset information ... Browse Code »

The cgroup documentation does not specify how a process can be removed
from a particular group. This patch adds a note at the end of the
simple example about how this is done. Also, some cgroups (like
cpusets) require user input before a new group can be used. This is
noted in the patch as well.

Signed-off-by: Eric B Munson
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Eric B Munson
2011-03-17 01:47:04 +0800

09 Mar, 2011

2 commits

8671139b1 Update cpuset info & webiste for cgroups ... Browse Code »

cpuset related websited is changed.
and, update list of cpuset using cgroup(controller group).

Signed-off-by: Geunsik Lim
Acked-by: Paul Menage
Signed-off-by: Jiri Kosina

GeunSik Lim
2011-03-09 06:12:39 +0800
df457f845 blk-cgroup: Lower minimum weight from 100 to 10. ... Browse Code »

We've found that we still get good, useful isolation at weights this
low. I'd like to adjust the minimum so that any other changes can take
these values into account.

Signed-off-by: Justin TerAvest
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe

Justin TerAvest
2011-03-09 02:45:00 +0800

02 Mar, 2011

1 commit

0bbfeb832 cfq-iosched: Always provide group isolation. ... Browse Code »

Effectively, make group_isolation=1 the default and remove the tunable.
The setting group_isolation=0 was because by default we idle on
sync-noidle tree and on fast devices, this can be very harmful for
throughput.

However, this problem can also be addressed by tuning slice_idle and
possibly group_idle on faster storage devices.

This change simplifies the CFQ code by removing the feature entirely.

Signed-off-by: Justin TerAvest
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe

Justin TerAvest
2011-03-02 04:05:08 +0800

17 Feb, 2011

1 commit

689bca3bb memcg: clarify use_hierarchy documentation ... Browse Code »

The memcg code does not allow changing memory.use_hierarchy if the
parent cgroup has enabled use_hierarchy. Update documentation to match
the code.

Signed-off-by: Greg Thelen
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Jiri Kosina

Greg Thelen
2011-02-17 16:17:21 +0800

14 Jan, 2011

5 commits

11ff26c88 revert documentaion update for memcg's dirty ratio. ... Browse Code »

Subjct: Revert memory cgroup dirty_ratio Documentation.

The commit ece72400c2a27a3d726cb0854449f991d9fcd2da adds documentation
for memcg's dirty ratio. But the function is not implemented yet.
Remove the documentation for avoiding confusing users.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Greg Thelen
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2011-01-14 23:52:02 +0800
ece72400c memcg: document cgroup dirty memory interfaces ... Browse Code »

Document cgroup dirty memory interfaces and statistics.

[akpm@linux-foundation.org: fix use_hierarchy description]
Signed-off-by: Andrea Righi
Signed-off-by: Greg Thelen
Cc: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Cc: Minchan Kim
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Thelen
2011-01-14 09:32:50 +0800
275220f0f Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...

Linus Torvalds
2011-01-14 02:45:01 +0800
008d23e48 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
Documentation/trace/events.txt: Remove obsolete sched_signal_send.
writeback: fix global_dirty_limits comment runtime -> real-time
ppc: fix comment typo singal -> signal
drivers: fix comment typo diable -> disable.
m68k: fix comment typo diable -> disable.
wireless: comment typo fix diable -> disable.
media: comment typo fix diable -> disable.
remove doc for obsolete dynamic-printk kernel-parameter
remove extraneous 'is' from Documentation/iostats.txt
Fix spelling milisec -> ms in snd_ps3 module parameter description
Fix spelling mistakes in comments
Revert conflicting V4L changes
i7core_edac: fix typos in comments
mm/rmap.c: fix comment
sound, ca0106: Fix assignment to 'channel'.
hrtimer: fix a typo in comment
init/Kconfig: fix typo
anon_inodes: fix wrong function name in comment
fix comment typos concerning "consistent"
poll: fix a typo in comment
...

Fix up trivial conflicts in:
- drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
- fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

Linus Torvalds
2011-01-14 02:05:56 +0800
1bdcd78e2 cgroups: remove deprecated subsystem from examples. ... Browse Code »

The 'ns' cgroup is considered deprecated. Change the cgroup subsystem
used in the examples of the cgroup documentation from 'ns' to 'blkio'.

Signed-off-by: Trevor Woerner
Acked-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Trevor Woerner
2011-01-14 00:03:16 +0800

16 Nov, 2010

1 commit

bdc85df7a blk-cgroup: Allow creation of hierarchical cgroups ... Browse Code »

o Allow hierarchical cgroup creation for blkio controller

o Currently we disallow it as both the io controller policies (throttling
as well as proportion bandwidth) do not support hierarhical accounting
and control. But the flip side is that blkio controller can not be used with
libvirt as libvirt creates a cgroup hierarchy deeper than 1 level.

//libvirt/qemu/

o So this patch will allow creation of cgroup hierarhcy but at the backend
everything will be treated as flat. So if somebody created a an hierarchy
like as follows.

root
/ \
test1 test2
|
test3

CFQ and throttling will practically treat all groups at same level.

pivot
/ | \ \
root test1 test2 test3

o Once we have actual support for hierarchical accounting and control
then we can introduce another cgroup tunable file "blkio.use_hierarchy"
which will be 0 by default but if user wants to enforce hierarhical
control then it can be set to 1. This way there should not be any
ABI problems down the line.

o The only not so pretty part is introduction of extra file "use_hierarchy"
down the line. Kame-san had mentioned that hierarhical accounting is
expensive in memory controller hence they keep it off by default. I
suspect same will be the case for IO controller also as for each IO
completion we shall have to account IO through hierarchy up to the root.
if yes, then it probably is not a very bad idea to introduce this extra
file so that it will be used only when somebody needs it and some people
might enable hierarchy only in part of the hierarchy.

o This is how basically memory controller also uses "use_hierarhcy" and
they also allowed creation of hierarchies when actual backend support
was not available.

Signed-off-by: Vivek Goyal
Acked-by: Balbir Singh
Reviewed-by: Gui Jianfeng
Reviewed-by: Ciju Rajan K
Tested-by: Ciju Rajan K
Signed-off-by: Jens Axboe

Vivek Goyal
2010-11-16 02:37:36 +0800

02 Nov, 2010

1 commit

b595076a1 tree-wide: fix comment/printk typos ... Browse Code »

"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König
Signed-off-by: Jiri Kosina

Uwe Kleine-König
2010-11-02 03:38:34 +0800

28 Oct, 2010

1 commit

97978e6d1 cgroup: add clone_children control file ... Browse Code »

The ns_cgroup is a control group interacting with the namespaces. When a
new namespace is created, a corresponding cgroup is automatically created
too. The cgroup name is the pid of the process who did 'unshare' or the
child of 'clone'.

This cgroup is tied with the namespace because it prevents a process to
escape the control group and use the post_clone callback, so the child
cgroup inherits the values of the parent cgroup.

Unfortunately, the more we use this cgroup and the more we are facing
problems with it:

(1) when a process unshares, the cgroup name may conflict with a
previous cgroup with the same pid, so unshare or clone return -EEXIST

(2) the cgroup creation is out of control because there may have an
application creating several namespaces where the system will
automatically create several cgroups in his back and let them on the
cgroupfs (eg. a vrf based on the network namespace).

(3) the mix of (1) and (2) force an administrator to regularly check
and clean these cgroups.

This patchset removes the ns_cgroup by adding a new flag to the cgroup and
the cgroupfs mount option. It enables the copy of the parent cgroup when
a child cgroup is created. We can then safely remove the ns_cgroup as
this flag brings a compatibility. We have now to manually create and add
the task to a cgroup, which is consistent with the cgroup framework.

This patch:

Sent as an answer to a previous thread around the ns_cgroup.

https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

It adds a control file 'clone_children' for a cgroup. This control file
is a boolean specifying if the child cgroup should be a clone of the
parent cgroup or not. The default value is 'false'.

This flag makes the child cgroup to call the post_clone callback of all
the subsystem, if it is available.

At present, the cpuset is the only one which had implemented the
post_clone callback.

The option can be set at mount time by specifying the 'clone_children'
mount option.

Signed-off-by: Daniel Lezcano
Signed-off-by: Serge E. Hallyn
Cc: Eric W. Biederman
Acked-by: Paul Menage
Reviewed-by: Li Zefan
Cc: Jamal Hadi Salim
Cc: Matt Helsley
Acked-by: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Lezcano
2010-10-28 09:03:09 +0800

23 Oct, 2010

1 commit

e9dd2b683 Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
cfq-iosched: Fix a gcc 4.5 warning and put some comments
block: Turn bvec_k{un,}map_irq() into static inline functions
block: fix accounting bug on cross partition merges
block: Make the integrity mapped property a bio flag
block: Fix double free in blk_integrity_unregister
block: Ensure physical block size is unsigned int
blkio-throttle: Fix possible multiplication overflow in iops calculations
blkio-throttle: limit max iops value to UINT_MAX
blkio-throttle: There is no need to convert jiffies to milli seconds
blkio-throttle: Fix link failure failure on i386
blkio: Recalculate the throttled bio dispatch time upon throttle limit change
blkio: Add root group to td->tg_list
blkio: deletion of a cgroup was causes oops
blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: revert bad fix for memory hotplug causing bounces
Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: Prevent hang_check firing during long I/O
cfq: improve fsync performance for small files
...

Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h

Linus Torvalds
2010-10-23 08:00:32 +0800

16 Sep, 2010

1 commit

2786c4e5e blkio: Documentation Update ... Browse Code »

o Documentation update

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-09-16 14:45:03 +0800

23 Aug, 2010

1 commit

6d6ac1c1a cfq-iosched: Documentation help for new tunables ... Browse Code »

Some documentation to provide help with tunables.

Signed-off-by: Vivek Goyal
Acked-by: Jeff Moyer
Signed-off-by: Jens Axboe

Vivek Goyal
2010-08-23 18:25:29 +0800

04 Aug, 2010

1 commit

0ea6e6112 Documentation: update broken web addresses. ... Browse Code »

Below you will find an updated version from the original series bunching all patches into one big patch
updating broken web addresses that are located in Documentation/*
Some of the addresses date as far far back as 1995 etc... so searching became a bit difficult,
the best way to deal with these is to use web.archive.org to locate these addresses that are outdated.
Now there are also some addresses pointing to .spec files some are located, but some(after searching
on the companies site)where still no where to be found. In this case I just changed the address
to the company site this way the users can contact the company and they can locate them for the users.

Signed-off-by: Justin P. Mattock
Signed-off-by: Thomas Weber
Signed-off-by: Mike Frysinger
Cc: Paulo Marques
Cc: Randy Dunlap
Cc: Michael Neuling
Signed-off-by: Jiri Kosina

Justin P. Mattock
2010-08-04 21:21:40 +0800

28 May, 2010

5 commits

dc10e281f memcg: update documentation ... Browse Code »

Some information are old, and I think current document doesn't work as "a
guide for users". We need summary of all of our controls, at least.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Randy Dunlap
Cc: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-28 00:12:43 +0800
87946a722 memcg: move charge of file pages ... Browse Code »

This patch adds support for moving charge of file pages, which include
normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting
bit 1 of /memory.move_charge_at_immigrate.

Unlike the case of anonymous pages, file pages(and swaps) in the range
mmapped by the task will be moved even if the task hasn't done page fault,
i.e. they might not be the task's "RSS", but other task's "RSS" that maps
the same file. And mapcount of the page is ignored(the page can be moved
even if page_mapcount(page) > 1). So, conditions that the page/swap
should be met to be moved is that it must be in the range mmapped by the
target task and it must be charged to the old cgroup.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Daisuke Nishimura
Acked-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke Nishimura
2010-05-28 00:12:43 +0800
3c11ecf44 memcg: oom kill disable and oom status ... Browse Code »

This adds a feature to disable oom-killer for memcg, if disabled, of
course, tasks under memcg will stop.

But now, we have oom-notifier for memcg. And the world around memcg is
not under out-of-memory. memcg's out-of-memory just shows memcg hits
limit. Then, administrator or management daemon can recover the situation
by

- kill some process
- enlarge limit, add more swap.
- migrate some tasks
- remove file cache on tmps (difficult ?)

Unlike oom-killer, you can take enough information before killing tasks.
(by gcore, or, ps etc.)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-28 00:12:43 +0800
9490ff275 memcg: oom notifier ... Browse Code »

Considering containers or other resource management softwares in userland,
event notification of OOM in memcg should be implemented. Now, memcg has
"threshold" notifier which uses eventfd, we can make use of it for oom
notification.

This patch adds oom notification eventfd callback for memcg. The usage is
very similar to threshold notifier, but control file is memory.oom_control
and no arguments other than eventfd is required.

% cgroup_event_notifier /cgroup/A/memory.oom_control dummy
(About cgroup_event_notifier, see Documentation/cgroup/)

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: David Rientjes
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-28 00:12:43 +0800
595f4b694 Documentation/cgroups/cgroups.txt: fix reference to "numtasks" ... Browse Code »

Signed-off-by: Trevor Woerner
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Trevor Woerner
2010-05-28 00:12:43 +0800