Eric Lee / smarc-fsl-linux-kernel

12 Jul, 2013

1 commit

36805aaea Merge branch 'for-3.11/core' of git://git.kernel.dk/linux-block

Pull core block IO updates from Jens Axboe:
"Here are the core IO block bits for 3.11. It contains:

- A tweak to the reserved tag logic from Jan, for weirdo devices with
just 3 free tags. But for those it improves things substantially
for random writes.

- Periodic writeback fix from Jan. Marked for stable as well.

- Fix for a race condition in IO scheduler switching from Jianpeng.

- The hierarchical blk-cgroup support from Tejun. This is the grunt
of the series.

- blk-throttle fix from Vivek.

Just a note that I'm in the middle of a relocation, whole family is
flying out tomorrow. Hence I will be away the remainder of this week,
but back at work again on Monday the 15th. CC'ing Tejun, since any
potential "surprises" will most likely be from the blk-cgroup work.
But it's been brewing for a while and sitting in my tree and
linux-next for a long time, so should be solid."

* 'for-3.11/core' of git://git.kernel.dk/linux-block: (36 commits)
elevator: Fix a race in elevator switching
block: Reserve only one queue tag for sync IO if only 3 tags are available
writeback: Fix periodic writeback after fs mount
blk-throttle: implement proper hierarchy support
blk-throttle: implement throtl_grp->has_rules[]
blk-throttle: Account for child group's start time in parent while bio climbs up
blk-throttle: add throtl_qnode for dispatch fairness
blk-throttle: make throtl_pending_timer_fn() ready for hierarchy
blk-throttle: make tg_dispatch_one_bio() ready for hierarchy
blk-throttle: make blk_throtl_bio() ready for hierarchy
blk-throttle: make blk_throtl_drain() ready for hierarchy
blk-throttle: dispatch from throtl_pending_timer_fn()
blk-throttle: implement dispatch looping
blk-throttle: separate out throtl_service_queue->pending_timer from throtl_data->dispatch_work
blk-throttle: set REQ_THROTTLED from throtl_charge_bio() and gate stats update with it
blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log()
blk-throttle: add throtl_service_queue->parent_sq
blk-throttle: generalize update_disptime optimization in blk_throtl_bio()
blk-throttle: dispatch to throtl_data->service_queue.bio_lists[]
blk-throttle: move bio_lists[] and friends to throtl_service_queue
...

Linus Torvalds
2013-07-12 04:03:24 +0800

05 Jul, 2013

1 commit

80cc38b16 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull trivial tree updates from Jiri Kosina:
"The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
treewide: relase -> release
Documentation/cgroups/memory.txt: fix stat file documentation
sysctl/net.txt: delete reference to obsolete 2.4.x kernel
spinlock_api_smp.h: fix preprocessor comments
treewide: Fix typo in printk
doc: device tree: clarify stuff in usage-model.txt.
open firmware: "/aliasas" -> "/aliases"
md: bcache: Fixed a typo with the word 'arithmetic'
irq/generic-chip: fix a few kernel-doc entries
frv: Convert use of typedef ctl_table to struct ctl_table
sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
doc: clk: Fix incorrect wording
Documentation/arm/IXP4xx fix a typo
Documentation/networking/ieee802154 fix a typo
Documentation/DocBook/media/v4l fix a typo
Documentation/video4linux/si476x.txt fix a typo
Documentation/virtual/kvm/api.txt fix a typo
Documentation/early-userspace/README fix a typo
Documentation/video4linux/soc-camera.txt fix a typo
lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
...

Linus Torvalds
2013-07-05 02:40:58 +0800

04 Jul, 2013

1 commit

f968ef1c5 memcg: update TODO list in Documentation

hugetlb cgroup has already been implemented.

Signed-off-by: Li Zefan
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Rob Landley
Cc: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2013-07-04 07:07:31 +0800

24 Jun, 2013

1 commit

a15e41909 Documentation/cgroups/memory.txt: fix stat file documentation

Documentation for inactive_anon / active_anon was mixed up. Fix that.

Signed-off-by: Aaro Koskinen
Acked-by: Rob Landley
Signed-off-by: Jiri Kosina

Aaro Koskinen
2013-06-24 17:22:44 +0800

19 Jun, 2013

1 commit

0a0fca9d8 sched: Rename sched.c as sched/core.c in comments and Documentation

Most of the stuff from kernel/sched.c was moved to kernel/sched/core.c long time
back and the comments/Documentation never got updated.

I figured it out when I was going through sched-domains.txt and so thought of
fixing it globally.

I haven't cross-checked whether the code that these files reference in
sched/core.c is still present and unchanged, as that wasn't the motive behind
this patch.

Signed-off-by: Viresh Kumar
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar

Viresh Kumar
2013-06-19 18:58:42 +0800

28 May, 2013

1 commit

f884ab15a doc: fix misspellings with 'codespell' tool

Signed-off-by: Anatol Pomozov
Signed-off-by: Jiri Kosina

Anatol Pomozov
2013-05-28 18:02:12 +0800

15 May, 2013

1 commit

9138125be blk-throttle: implement proper hierarchy support

With the recent updates, blk-throttle is finally ready for proper
hierarchy support. Dispatching now honors service_queue->parent_sq
and propagates correctly. The only thing missing is setting
->parent_sq correctly so that throtl_grp hierarchy matches the cgroup
hierarchy.

This patch updates throtl_pd_init() such that service_queues form the
same hierarchy as the cgroup hierarchy if sane_behavior is enabled.
As this concludes proper hierarchy support for blkcg, the shameful
.broken_hierarchy tag is removed from blkio_subsys.

v2: Updated blkio-controller.txt as suggested by Vivek.

Signed-off-by: Tejun Heo
Acked-by: Vivek Goyal
Cc: Li Zefan

Tejun Heo
2013-05-15 04:52:38 +0800

08 May, 2013

1 commit

b070e65c0 mm, memcg: add rss_huge stat to memory.stat

This exports the amount of anonymous transparent hugepages for each
memcg via the new "rss_huge" stat in memory.stat. The units are in
bytes.

This is helpful to determine the hugepage utilization for individual
jobs on the system in comparison to rss and opportunities where
MADV_HUGEPAGE may be helpful.

The amount of anonymous transparent hugepages is also included in "rss"
for backwards compatibility.
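
As a rough illustration of how a monitoring script might use the new field, the sketch below parses memory.stat-style text and computes the share of rss that is backed by transparent hugepages. The sample values and the helper names (parse_memory_stat, hugepage_share) are made up for illustration; only the "rss" and "rss_huge" field names and the byte units come from the description above.

```python
# Hypothetical sample in the "name value" per-line layout of memory.stat.
# The numbers are invented for illustration, not measured.
SAMPLE = """\
cache 1048576
rss 8388608
rss_huge 4194304
"""


def parse_memory_stat(text):
    """Turn 'name value' lines into a dict of integer byte counts."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def hugepage_share(stats):
    """Fraction of rss made up of anonymous transparent hugepages.

    rss_huge is already included in rss, so the ratio is at most 1.
    """
    rss = stats.get("rss", 0)
    return stats.get("rss_huge", 0) / rss if rss else 0.0


if __name__ == "__main__":
    print(hugepage_share(parse_memory_stat(SAMPLE)))
```

On a real system the text would presumably be read from the memcg's memory.stat file instead of the SAMPLE string.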

Signed-off-by: David Rientjes
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2013-05-08 09:38:26 +0800

02 May, 2013

1 commit

73287a43c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:
"Highlights (1721 non-merge commits, this has to be a record of some
sort):

1) Add 'random' mode to team driver, from Jiri Pirko and Eric
Dumazet.

2) Make it so that any driver that supports configuration of multiple
MAC addresses can provide the forwarding database add and del
calls by providing a default implementation and hooking that up if
the driver doesn't have an explicit set of handlers. From Vlad
Yasevich.

3) Support GSO segmentation over tunnels and other encapsulating
devices such as VXLAN, from Pravin B Shelar.

4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
Dukkipati.

6) In the PHY layer, allow supporting wake-on-lan in situations where
the PHY registers have to be written for it to be configured.

Use it to support wake-on-lan in mv643xx_eth.

From Michael Stapelberg.

7) Significantly improve firewire IPV6 support, from YOSHIFUJI
Hideaki.

8) Allow multiple packets to be sent in a single transmission using
network coding in batman-adv, from Martin Hundebøll.

9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

10) Generalize the VXLAN forwarding tables so that there is more
flexibility in configuring various aspects of the endpoints.
From David Stevens.

11) Support RSS and TSO in hardware over GRE tunnels in bnx2x driver,
from Dmitry Kravkov.

12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo
Neira Ayuso.

13) Start adding networking selftests.

14) In situations of overload on the same AF_PACKET fanout socket, or
per-cpu packet receive queue, minimize drop by distributing the
load to other cpus/fanouts. From Willem de Bruijn and Eric
Dumazet.

15) Add support for new payload offset BPF instruction, from Daniel
Borkmann.

16) Convert several drivers over to module_platform_driver(), from
Sachin Kamat.

17) Provide a minimal BPF JIT image disassembler userspace tool, from
Daniel Borkmann.

18) Rewrite F-RTO implementation in TCP to match the final
specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
you like netlink, so I implemented netlink dumping of netlink
sockets.") From Andrey Vagin.

20) Remove ugly passing of rtnetlink attributes into rtnl_doit
functions, from Thomas Graf.

21) Allow userspace to be able to see if a configuration change occurs
in the middle of an address or device list dump, from Nicolas
Dichtel.

22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
Frederic Sowa.

23) Increase accuracy of packet length used by packet scheduler, from
Jason Wang.

24) Beginning set of changes to make ipv4/ipv6 fragment handling more
scalable and less susceptible to overload and locking contention,
from Jesper Dangaard Brouer.

25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
instead. From Hong Zhiguo.

26) Optimize route usage in IPVS by avoiding reference counting where
possible, from Julian Anastasov.

27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
Eitzenberger.

29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
nfnetlink_log, and nfnetlink_queue. From Gao feng.

30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

31) Support several new r8169 chips, from Hayes Wang.

32) Support tokenized interface identifiers in ipv6, from Daniel
Borkmann.

33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

34) Add 802.1ad vlan offload support, from Patrick McHardy.

35) Support mmap() based netlink communication, also from Patrick
McHardy.

36) Support HW timestamping in mlx4 driver, from Amir Vadai.

37) Rationalize AF_PACKET packet timestamping when transmitting, from
Willem de Bruijn and Daniel Borkmann.

38) Bring parity to what's provided by /proc/net/packet socket dumping
and the info provided by netlink socket dumping of AF_PACKET
sockets. From Nicolas Dichtel.

39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
filter: fix va_list build error
af_unix: fix a fatal race with bit fields
bnx2x: Prevent memory leak when cnic is absent
bnx2x: correct reading of speed capabilities
net: sctp: attribute printl with __printf for gcc fmt checks
netlink: kconfig: move mmap i/o into netlink kconfig
netpoll: convert mutex into a semaphore
netlink: Fix skb ref counting.
net_sched: act_ipt forward compat with xtables
mlx4_en: fix a build error on 32bit arches
Revert "bnx2x: allow nvram test to run when device is down"
bridge: avoid OOPS if root port not found
drivers: net: cpsw: fix kernel warn on cpsw irq enable
sh_eth: use random MAC address if no valid one supplied
3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
tg3: fix to append hardware time stamping flags
unix/stream: fix peeking with an offset larger than data in queue
unix/dgram: fix peeking with an offset larger than data in queue
unix/dgram: peek beyond 0-sized skbs
openvswitch: Remove unneeded ovs_netdev_get_ifindex()
...

Linus Torvalds
2013-05-02 05:08:52 +0800

01 May, 2013

1 commit

5d434fcb2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull trivial tree updates from Jiri Kosina:
"Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
mm: Convert print_symbol to %pSR
gfs2: Convert print_symbol to %pSR
m32r: Convert print_symbol to %pSR
iostats.txt: add easy-to-find description for field 6
x86 cmpxchg.h: fix wrong comment
treewide: Fix typo in printk and comments
doc: devicetree: Fix various typos
docbook: fix 8250 naming in device-drivers
pata_pdc2027x: Fix compiler warning
treewide: Fix typo in printks
mei: Fix comments in drivers/misc/mei
treewide: Fix typos in kernel messages
pm44xx: Fix comment for "CONFIG_CPU_IDLE"
doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
mmzone: correct "pags" to "pages" in comment.
kernel-parameters: remove outdated 'noresidual' parameter
Remove spurious _H suffixes from ifdef comments
sound: Remove stray pluses from Kconfig file
radio-shark: Fix printk "CONFIG_LED_CLASS"
doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
...

Linus Torvalds
2013-05-01 00:36:50 +0800

30 Apr, 2013

2 commits

191a71209 Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

- Fixes and a lot of cleanups. Locking cleanup is finally complete.
cgroup_mutex is no longer exposed to individual controllers which
used to cause nasty deadlock issues. Li fixed and cleaned up quite a
bit including long standing ones like racy cgroup_path().

- device cgroup now supports proper hierarchy thanks to Aristeu.

- perf_event cgroup now supports proper hierarchy.

- A new mount option "__DEVEL__sane_behavior" is added. As indicated
by the name, this option is to be used for development only at this
point and generates a warning message when used. Unfortunately,
cgroup interface currently has too many breakages and inconsistencies
to implement a consistent and unified hierarchy on top. The new flag
is used to collect the behavior changes which are necessary to
implement consistent unified hierarchy. It's likely that this flag
won't be used verbatim when it becomes ready but will be enabled
implicitly along with unified hierarchy.

The option currently disables some of the broken behaviors in cgroup core
and also .use_hierarchy switch in memcg (will be routed through -mm),
which can be used to make very unusual hierarchy where nesting is
partially honored. It will also be used to implement hierarchy
support for blk-throttle which would be impossible otherwise without
introducing a full separate set of control knobs.

This is essentially versioning of interface which isn't very nice but
at this point I can't see any other options which would allow keeping
the interface the same while moving towards hierarchy behavior which
is at least somewhat sane. The planned unified hierarchy is likely
to require some level of adaptation from userland anyway, so I think
it'd be best to take the chance and update the interface such that
it's supportable in the long term.

Maintaining the existing interface does complicate cgroup core but
shouldn't put too much strain on individual controllers and I think
it'd be manageable for the foreseeable future. Maybe we'll be able
to drop it in a decade.

Fix up conflicts (including a semantic one adding a new #include to ppc
that was uncovered by the header file changes) as per Tejun.

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits)
cpuset: fix compile warning when CONFIG_SMP=n
cpuset: fix cpu hotplug vs rebuild_sched_domains() race
cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn()
cgroup: restore the call to eventfd->poll()
cgroup: fix use-after-free when umounting cgroupfs
cgroup: fix broken file xattrs
devcg: remove parent_cgroup.
memcg: force use_hierarchy if sane_behavior
cgroup: remove cgrp->top_cgroup
cgroup: introduce sane_behavior mount option
move cgroupfs_root to include/linux/cgroup.h
cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
cgroup: make cgroup_path() not print double slashes
Revert "cgroup: remove bind() method from cgroup_subsys."
perf: make perf_event cgroup hierarchical
cgroup: implement cgroup_is_descendant()
cgroup: make sure parent won't be destroyed before its children
cgroup: remove bind() method from cgroup_subsys.
devcg: remove broken_hierarchy tag
cgroup: remove cgroup_lock_is_held()
...

Linus Torvalds
2013-04-30 10:14:20 +0800

70ddf637e memcg: add memory.pressure_level events

With this patch userland applications that want to maintain the
interactivity/memory allocation cost can use the pressure level
notifications. The levels are defined like this:

The "low" level means that the system is reclaiming memory for new
allocations. Monitoring this reclaiming activity might be useful for
maintaining cache level. Upon notification, the program (typically
"Activity Manager") might analyze vmstat and act in advance (i.e.
prematurely shut down unimportant services).

The "medium" level means that the system is experiencing medium memory
pressure, the system might be making swap, paging out active file
caches, etc. Upon this event applications may decide to further analyze
vmstat/zoneinfo/memcg or internal memory usage statistics and free any
resources that can be easily reconstructed or re-read from a disk.

The "critical" level means that the system is actively thrashing, it is
about to run out of memory (OOM) or even the in-kernel OOM killer is on its
way to trigger. Applications should do whatever they can to help the
system. It might be too late to consult with vmstat or any other
statistics, so it's advisable to take an immediate action.

The events are propagated upward until the event is handled, i.e. the
events are not pass-through. Here is what this means: for example you
have three cgroups: A->B->C. Now you set up an event listener on
cgroups A, B and C, and suppose group C experiences some pressure. In
this situation, only group C will receive the notification, i.e. groups
A and B will not receive it. This is done to avoid excessive
"broadcasting" of messages, which disturbs the system and which is
especially bad if we are low on memory or thrashing. So, organize the
cgroups wisely, or propagate the events manually (or, ask us to
implement the pass-through events, explaining why you would need them.)
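
The propagation rule can be modelled in a few lines. The following is a toy simulation, not kernel code: the PARENT map and the deliver() helper are hypothetical names, and the model only captures the "walk upward until some group has a listener" behaviour described above for the A->B->C example.

```python
# Toy model of the hierarchy A -> B -> C (child: parent).
PARENT = {"A": None, "B": "A", "C": "B"}


def deliver(origin, listeners, parent=PARENT):
    """Return the cgroup whose listener receives the event, or None.

    The event starts at the group under pressure and climbs toward the
    root; the first group with a listener handles it and the walk stops,
    so ancestors above that group never see the event.
    """
    node = origin
    while node is not None:
        if node in listeners:
            return node
        node = parent[node]
    return None


if __name__ == "__main__":
    # Listeners on A, B and C, pressure in C: only C is notified.
    print(deliver("C", {"A", "B", "C"}))
    # No listener on C or B: the event climbs up to A.
    print(deliver("C", {"A"}))
```

Under this model, a listener on a parent group is only reached when no group between it and the source of the pressure has a listener of its own.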

Performance wise, the memory pressure notifications feature itself is
lightweight and does not require much of bookkeeping, in contrast to the
rest of memcg features. Unfortunately, as of current memcg
implementation, pages accounting is an inseparable part and cannot be
turned off. The good news is that there are some efforts[1] to improve
the situation; plus, implementing the same, fully API-compatible[2]
interface for CONFIG_MEMCG=n case (e.g. embedded) is also a viable
option, so it will not require any changes on the userland side.

[1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291
[2] http://lkml.org/lkml/2013/2/21/454

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_CGROUPS=n warnings]
Signed-off-by: Anton Vorontsov
Acked-by: Kirill A. Shutemov
Acked-by: KAMEZAWA Hiroyuki
Cc: Tejun Heo
Cc: David Rientjes
Cc: Pekka Enberg
Cc: Mel Gorman
Cc: Glauber Costa
Cc: Michal Hocko
Cc: Luiz Capitulino
Cc: Greg Thelen
Cc: Leonid Moiseichuk
Cc: KOSAKI Motohiro
Cc: Minchan Kim
Cc: Bartlomiej Zolnierkiewicz
Cc: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Vorontsov
2013-04-30 06:54:38 +0800

13 Apr, 2013

1 commit

26d5bbe5b Revert "cgroup: remove bind() method from cgroup_subsys."

This reverts commit 84cfb6ab484b442d5115eb3baf9db7d74a3ea626. There
are scheduled changes which make use of the removed callback.

Signed-off-by: Tejun Heo
Cc: Rami Rosen
Cc: Li Zefan

Tejun Heo
2013-04-13 01:29:04 +0800

11 Apr, 2013

1 commit

84cfb6ab4 cgroup: remove bind() method from cgroup_subsys.

The bind() method of cgroup_subsys is not used in any of the
controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio,
devices, perf, hugetlb, cpu and cpuacct)

tj: Removed the entry on ->bind() from
Documentation/cgroups/cgroups.txt. Also updated a couple
paragraphs which were suggesting that dynamic re-binding may be
implemented. It's not gonna.

Signed-off-by: Rami Rosen
Signed-off-by: Tejun Heo

Rami Rosen
2013-04-11 01:46:59 +0800

09 Apr, 2013

1 commit

077f02f1b Documentation: cgroup: add documentation for net_cls cgroups.

This patch adds a new file, Documentation/cgroups/net_cls.txt, with info
about net_cls cgroups, and updates the 00-INDEX accordingly.

Signed-off-by: Rami Rosen
Signed-off-by: David S. Miller

Rami Rosen
2013-04-09 04:55:28 +0800

04 Apr, 2013

1 commit

1ae65ae92 cgroups: Documentation/cgroup/cgroup.txt - a trivial fix.

This trivial patch removes a word which appears twice in
Documentation/cgroup/cgroup.txt.

Signed-off-by: Rami Rosen
Signed-off-by: Tejun Heo

Rami Rosen
2013-04-04 05:03:30 +0800

27 Mar, 2013

1 commit

df7c6b992 doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"

Signed-off-by: Paul Bolle
Acked-by: Rob Landley
Signed-off-by: Jiri Kosina

Paul Bolle
2013-03-27 21:14:02 +0800

20 Mar, 2013

1 commit

bd2953ebb devcg: propagate local changes down the hierarchy

This patch makes exception changes propagate down the hierarchy, respecting
local exceptions where possible.

New exceptions allowing additional access to devices won't be propagated, but
it'll be possible to add an exception to access all or part of the newly
allowed device(s).

New exceptions disallowing access to devices will be propagated down and the
local group's exceptions will be revalidated for the new situation.
Example:
      A
     / \
        B

group  behavior  exceptions
A      allow     "b 8:* rwm", "c 116:1 rw"
B      deny      "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a new exception is added to group A:
# echo "c 116:* r" > A/devices.deny
it'll propagate down and after revalidating B's local exceptions, the exception
"c 116:2 rwm" will be removed.

In case parent's exceptions change and local exceptions are not allowed anymore,
they'll be deleted.
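
The revalidation step in the example can be sketched as a simplified model. This is not the kernel's algorithm: the Exc type and the parse(), overlaps() and revalidate() helpers are hypothetical, the "a" (all) device type is ignored, and the model assumes that A's exceptions act as deny rules (since A's behavior is allow) while B's exceptions act as allow rules (since B's behavior is deny).

```python
from typing import NamedTuple


class Exc(NamedTuple):
    dev_type: str       # "b" (block) or "c" (char)
    major: str          # a number or "*"
    minor: str          # a number or "*"
    access: frozenset   # subset of {"r", "w", "m"}


def parse(rule):
    """Parse a rule such as 'c 116:* r' into an Exc."""
    dev_type, dev, access = rule.split()
    major, minor = dev.split(":")
    return Exc(dev_type, major, minor, frozenset(access))


def overlaps(a, b):
    """True if both rules can name the same device and share an access bit."""
    def same(x, y):
        return x == "*" or y == "*" or x == y

    return (a.dev_type == b.dev_type
            and same(a.major, b.major)
            and same(a.minor, b.minor)
            and bool(a.access & b.access))


def revalidate(parent_denied, child_allowed):
    """Keep only the child's allow rules that no parent deny rule touches."""
    return [c for c in child_allowed
            if not any(overlaps(parse(c), parse(d)) for d in parent_denied)]


if __name__ == "__main__":
    a_denied = ["b 8:* rwm", "c 116:1 rw"]
    b_allowed = ["c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"]
    # Equivalent of: echo "c 116:* r" > A/devices.deny
    a_denied.append("c 116:* r")
    print(revalidate(a_denied, b_allowed))
```

In this model B's "c 116:2 rwm" is dropped because it requests read access that A now denies for every minor of major 116, which matches the outcome described in the example.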

v7:
- do not allow behavior change when the cgroup has children
- update documentation

v6: fixed issues pointed by Serge Hallyn
- only copy parent's exceptions while propagating behavior if the local
behavior is different
- while propagating exceptions, do not clear and copy parent's: it'd be against
the premise we don't propagate access to more devices

v5: fixed issues pointed by Serge Hallyn
- updated documentation
- not propagating when an exception is written to devices.allow
- when propagating a new behavior, clean the local exceptions list if they're
for a different behavior

v4: fixed issues pointed by Tejun Heo
- separated function to walk the tree and collect valid propagation targets

v3: fixed issues pointed by Tejun Heo
- update documentation
- move css_online/css_offline changes to a new patch
- use cgroup_for_each_descendant_pre() instead of own descendant walk
- move exception_copy rework to a separate patch
- move exception_clean rework to a separated patch

v2: fixed issues pointed by Tejun Heo
- instead of keeping the local settings that won't apply anymore, remove them

Cc: Tejun Heo
Cc: Serge Hallyn
Signed-off-by: Aristeu Rozanski
Signed-off-by: Tejun Heo

Aristeu Rozanski
2013-03-20 22:50:21 +0800

13 Mar, 2013

1 commit

d7eeac191 cgroup: hold cgroup_mutex before calling css_offline()

cpuset no longer nests cgroup_mutex inside cpu_hotplug lock, so
we don't have to release cgroup_mutex before calling css_offline().

Signed-off-by: Li Zefan
Signed-off-by: Tejun Heo

Li Zefan
2013-03-13 06:35:59 +0800

01 Mar, 2013

1 commit

ee89f8125 Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block

Pull block IO core bits from Jens Axboe:
"Below are the core block IO bits for 3.9. It was delayed a few days
since my workstation kept crashing every 2-8h after pulling it into
current -git, but turns out it is a bug in the new pstate code (divide
by zero, will report separately). In any case, it contains:

- The big cfq/blkcg update from Tejun and Vivek.

- Additional block and writeback tracepoints from Tejun.

- Improvement of the should sort (based on queues) logic in the plug
flushing.

- _io() variants of the wait_for_completion() interface, using
io_schedule() instead of schedule() to contribute to io wait
properly.

- Various little fixes.

You'll get two trivial merge conflicts, which should be easy enough to
fix up"

Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42ca: "hlist: drop the node parameter from iterators").

* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
block: remove redundant check to bd_openers()
block: use i_size_write() in bd_set_size()
cfq: fix lock imbalance with failed allocations
drivers/block/swim3.c: fix null pointer dereference
block: don't select PERCPU_RWSEM
block: account iowait time when waiting for completion of IO request
sched: add wait_for_completion_io[_timeout]
writeback: add more tracepoints
block: add block_{touch|dirty}_buffer tracepoint
buffer: make touch_buffer() an exported function
block: add @req to bio_{front|back}_merge tracepoints
block: add missing block_bio_complete() tracepoint
block: Remove should_sort judgement when flush blk_plug
block,elevator: use new hashtable implementation
cfq-iosched: add hierarchical cfq_group statistics
cfq-iosched: collect stats from dead cfqgs
cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
block: RCU free request_queue
blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
...

Linus Torvalds
2013-03-01 04:52:24 +0800

28 Feb, 2013

1 commit

52b233c86 Documentation/cgroups/blkio-controller.txt: fix typo

Signed-off-by: Warren Turkal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Warren Turkal
2013-02-28 11:10:11 +0800

10 Jan, 2013

1 commit

d02f7aa8d cfq-iosched: enable full blkcg hierarchy support

With the previous two patches, all cfqg scheduling decisions are based
on vfraction and ready for hierarchy support. The only thing which
keeps the behavior flat is cfqg_flat_parent() which makes vfraction
calculation consider all non-root cfqgs children of the root cfqg.

Replace it with cfqg_parent() which returns the real parent. This
enables full blkcg hierarchy support for cfq-iosched. For example,
consider the following hierarchy.

                root
              /      \
           A:500      B:250
          /     \
       AA:500  AB:1000

For simplicity, let's say all the leaf nodes have active tasks and are
on service tree. For each leaf node, vfraction would be

AA: (500 / 1500) * (500 / 750) =~ 0.2222
AB: (1000 / 1500) * (500 / 750) =~ 0.4444
B: (250 / 750) =~ 0.3333

and vdisktime will be distributed accordingly. For more detail,
please refer to Documentation/block/cfq-iosched.txt.

v2: cfq-iosched.txt updated to describe group scheduling as suggested
by Vivek.

v3: blkio-controller.txt updated.

Signed-off-by: Tejun Heo
Acked-by: Vivek Goyal

Tejun Heo
2013-01-10 00:05:11 +0800

08 Jan, 2013

1 commit

92e015b1c cgroups: move cgroup_event_listener.c to tools/cgroup

Move the cgroup_event_listener.c tool from Documentation into the new
tools/cgroup directory.

This change involves wiring cgroup_event_listener.c into the tools/
make system so that it can be built with:
$ make tools/cgroup

Signed-off-by: Greg Thelen
Signed-off-by: Tejun Heo

Greg Thelen
2013-01-08 01:41:28 +0800

19 Dec, 2012

3 commits

92e793495 kmem: add slab-specific documentation about the kmem controller

Signed-off-by: Glauber Costa
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: JoonSoo Kim
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Pekka Enberg
Cc: Rik van Riel
Cc: Suleiman Souhlal
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-12-19 07:02:15 +0800

d5bdae7d5 memcg: add documentation about the kmem controller

Signed-off-by: Glauber Costa
Acked-by: Kamezawa Hiroyuki
Acked-by: Michal Hocko
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: JoonSoo Kim
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Rik van Riel
Cc: Suleiman Souhlal
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-12-19 07:02:13 +0800

50bdd430c res_counter: return amount of charges after res_counter_uncharge()

It is useful to know how many charges are still left after a call to
res_counter_uncharge. While it is possible to issue a res_counter_read
after uncharge, this can be racy.

If we need, for instance, to take some action when the counters drop down
to 0, only one of the callers should see it. This is the same semantics
as the atomic variables in the kernel.

Since the current return value is void, we don't need to worry about
anything breaking due to this change: nobody relied on that, and only
users appearing from now on will be checking this value.
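
The race being avoided can be illustrated with a user-space analogy. This is a Python model rather than the kernel's res_counter: the Counter class and the run() helper are hypothetical, and a threading.Lock stands in for the counter's internal locking. The point is that the remaining value is returned from inside the same critical section that performs the decrement, so exactly one caller observes zero.

```python
import threading


class Counter:
    """Toy counter whose uncharge() reports what is left, atomically."""

    def __init__(self, usage):
        self._usage = usage
        self._lock = threading.Lock()

    def uncharge(self, amount):
        # Decrement and read under one lock hold, so the returned value
        # is the state produced by this call and by no other.
        with self._lock:
            self._usage -= amount
            return self._usage

    def read(self):
        with self._lock:
            return self._usage


def run(n_threads=8):
    """Have each thread uncharge 1; count how many of them saw zero."""
    counter = Counter(n_threads)
    saw_zero = []

    def worker():
        if counter.uncharge(1) == 0:
            saw_zero.append(threading.get_ident())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(saw_zero), counter.read()


if __name__ == "__main__":
    print(run())
```

Had each worker called uncharge() and then a separate read(), several workers could have seen zero, or none could have seen it at the moment it mattered, which is the racy pattern the commit message describes.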

Signed-off-by: Glauber Costa
Reviewed-by: Michal Hocko
Acked-by: Kamezawa Hiroyuki
Acked-by: David Rientjes
Cc: Johannes Weiner
Cc: Suleiman Souhlal
Cc: Tejun Heo
Cc: Christoph Lameter
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: JoonSoo Kim
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-12-19 07:02:12 +0800

13 Dec, 2012

2 commits

38d7bee9d cpuset: use N_MEMORY instead N_HIGH_MEMORY

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan
Acked-by: Hillf Danton
Signed-off-by: Wen Congyang
Cc: Christoph Lameter
Cc: Lin Feng
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2012-12-13 09:38:32 +0800

d206e0903 Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup changes from Tejun Heo:
"A lot of activities on cgroup side. The big changes are focused on
making cgroup hierarchy handling saner.

- cgroup_rmdir() had peculiar semantics - it allowed cgroup
destruction to be vetoed by individual controllers and tried to
drain refcnt synchronously. The vetoing never worked properly and
caused a good deal of contortions in cgroup. memcg was the last
remaining user. Michal Hocko removed the usage and cgroup_rmdir()
path has been simplified significantly. This was done in a
separate branch so that the memcg people can base further memcg
changes on top.

- The above allowed cleaning up cgroup lifecycle management and
implementation of generic cgroup iterators which are used to
improve hierarchy support.

- cgroup_freezer updated to allow migration in and out of a frozen
cgroup and handle hierarchy. If a cgroup is frozen, all descendant
cgroups are frozen.

- netcls_cgroup and netprio_cgroup updated to handle hierarchy
properly.

- Various fixes and cleanups.

- Two merge commits. One to pull in memcg and rmdir cleanups (needed
to build iterators). The other pulled in cgroup/for-3.7-fixes for
device_cgroup fixes so that further device_cgroup patches can be
stacked on top."

Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)

* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
cgroup: update Documentation/cgroups/00-INDEX
cgroup_rm_file: don't delete the uncreated files
cgroup: remove subsystem files when remounting cgroup
cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
cgroup: warn about broken hierarchies only after css_online
cgroup: list_del_init() on removed events
cgroup: fix lockdep warning for event_control
cgroup: move list add after list head initilization
netprio_cgroup: allow nesting and inherit config on cgroup creation
netprio_cgroup: implement netprio[_set]_prio() helpers
netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
netprio_cgroup: reimplement priomap expansion
netprio_cgroup: shorten variable names in extend_netdev_table()
netprio_cgroup: simplify write_priomap()
netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
cgroup: remove obsolete guarantee from cgroup_task_migrate.
cgroup: add cgroup->id
cgroup, cpuset: remove cgroup_subsys->post_clone()
cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
...

Linus Torvalds
2012-12-13 00:18:24 +0800

12 Dec, 2012

1 commit

348b46553 Documentation/cgroups/memory.txt: s/mem_cgroup_charge/mem_cgroup_change_common/

mem_cgroup_charge_common() is invoked as the entry point for cgroup limits
charge rather than mem_cgroup_charge(), as the latter was removed years
ago. Update cgroups/memory.txt to reflect this change.

Signed-off-by: Jie Liu
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Liu
2012-12-12 09:22:24 +0800

09 Dec, 2012

1 commit

15ef4ffaa cgroup: update Documentation/cgroups/00-INDEX

There are new files added to cgroup documentation. Let's update the
index file to include the new files.

Signed-off-by: Namjae Jeon
Signed-off-by: Amit Sahrawat
Signed-off-by: Tejun Heo

Namjae Jeon
2012-12-09 21:52:58 +0800

22 Nov, 2012

1 commit

811d8d6ff netprio_cgroup: allow nesting and inherit config on cgroup creation

Inherit netprio configuration from ->css_online(), allow nesting and
remove .broken_hierarchy marking. This makes netprio_cgroup's
behavior match netcls_cgroup's.

Note that this patch changes userland-visible behavior. Nesting is
allowed and the first level cgroups below the root cgroup behave
differently - they inherit priorities from the root cgroup on creation
instead of starting with 0. This is unfortunate but not doing so is
much crazier.
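
The inherit-on-creation behavior described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the NetprioCgroup class, its priomap dictionary and the prio() helper are hypothetical names, and the model assumes the parent's configuration is copied once at creation time and not tracked afterwards.

```python
class NetprioCgroup:
    """Hypothetical stand-in for a netprio cgroup (not a kernel structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Interface name -> priority. A new cgroup starts from a copy of its
        # parent's map instead of an empty (all-zero) map.
        self.priomap = dict(parent.priomap) if parent is not None else {}

    def prio(self, ifname):
        # Interfaces without an explicit entry default to priority 0.
        return self.priomap.get(ifname, 0)


root = NetprioCgroup("root")
root.priomap["eth0"] = 5

child = NetprioCgroup("child", parent=root)       # first level below root
grandchild = NetprioCgroup("grandchild", child)   # nesting is allowed

# Assumption of this model: inheritance is a one-time copy, so later
# changes in the parent do not reach existing children.
root.priomap["eth0"] = 7
```

In this model the child and grandchild both report priority 5 for eth0, where the pre-patch behavior described in the commit message would have started a first-level cgroup at 0.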

Signed-off-by: Tejun Heo
Tested-and-Acked-by: Daniel Wagner
Acked-by: David S. Miller

Tejun Heo
2012-11-22 23:32:47 +0800

20 Nov, 2012

3 commits

033fa1c5f cgroup, cpuset: remove cgroup_subsys->post_clone()

Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone(). Now
that clone_children is cpuset specific, there's no reason to have this
rather odd option activation mechanism in cgroup core. cpuset can
check the flag from its ->css_alloc() and take the necessary
action.

Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
remove cgroup_subsys->post_clone().

Loosely based on Glauber's "generalize post_clone into post_create"
patch.

Signed-off-by: Tejun Heo
Original-patch-by: Glauber Costa
Acked-by: Serge E. Hallyn
Acked-by: Li Zefan
Cc: Glauber Costa

Tejun Heo
2012-11-20 00:13:39 +0800
2260e7fc1 cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/

clone_children is only meaningful for cpuset and will stay that way.
Rename the flag to reflect that and update documentation. Also, drop
clone_children() wrapper in cgroup.c. The thin wrapper is used only a
few times and one of them will go away soon.

Signed-off-by: Tejun Heo
Acked-by: Serge E. Hallyn
Acked-by: Li Zefan
Cc: Glauber Costa

Tejun Heo
2012-11-20 00:13:38 +0800
92fb97487 cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()

Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are. Also, update documentation.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2012-11-20 00:13:38 +0800

17 Nov, 2012

1 commit

9a5a8f19b memcg: oom: fix totalpages calculation for memory.swappiness==0

oom_badness() takes a totalpages argument which says how many pages are
available and it uses it as a base for the score calculation. The value
is calculated by mem_cgroup_get_limit which considers both limit and
total_swap_pages (resp. memsw portion of it).

This is usually correct but since fe35004fbf9e ("mm: avoid swapping out
with swappiness==0") we do not swap when swappiness is 0, which means
that we cannot really use up all the totalpages pages. This in turn
confuses oom score calculation if the memcg limit is much smaller than
the available swap, because the used memory (capped by the limit) is
negligible compared to totalpages, so the resulting score is too small
if adj!=0 (typically a task with CAP_SYS_ADMIN or non-zero oom_score_adj).
A wrong process might be selected as a result.

The problem can be worked around by checking mem_cgroup_swappiness==0
and not considering swap at all in such a case.
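
The effect can be seen with some rough arithmetic. The sketch below is a simplified user-space model, not the kernel implementation: memcg_totalpages() and badness() are hypothetical helpers, the memsw limit is ignored, the score is assumed to be the used pages plus adj scaled by totalpages / 1000, and the adj value of -30 and the page counts are illustrative only.

```python
def memcg_totalpages(limit_pages, swap_pages, swappiness, fixed):
    """Pages assumed usable by the memcg (hypothetical helper)."""
    if fixed and swappiness == 0:
        # With the workaround, swap is not counted when it cannot be used.
        return limit_pages
    return limit_pages + swap_pages


def badness(used_pages, adj, totalpages):
    """Simplified score: used pages plus adj scaled to totalpages."""
    points = used_pages + adj * (totalpages // 1000)
    # A task that is still eligible never scores below 1 in this model.
    return points if points > 0 else 1


LIMIT = 25_600       # memcg limit: 100 MB in 4 KB pages
SWAP = 2_621_440     # available swap: 10 GB in 4 KB pages
USED = 25_000        # pages used by the task, capped by the limit
ADJ = -30            # illustrative negative adjustment

before = badness(USED, ADJ, memcg_totalpages(LIMIT, SWAP, 0, fixed=False))
after = badness(USED, ADJ, memcg_totalpages(LIMIT, SWAP, 0, fixed=True))
print(before, after)  # prints "1 24250"
```

Under these assumptions the adjustment term is scaled by swap that can never be used, which drowns out the task's real memory usage and collapses its score to the minimum; leaving swap out restores a score close to the actual usage.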

Signed-off-by: Michal Hocko
Acked-by: David Rientjes
Acked-by: Johannes Weiner
Acked-by: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2012-11-17 06:33:04 +0800

10 Nov, 2012

1 commit

ef9fe980c cgroup_freezer: implement proper hierarchy support

Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved. They all operated
separately.

This patch implements proper hierarchy support. If a cgroup is
frozen, all its descendants are frozen. A cgroup is thawed iff it and
all its ancestors are THAWED. freezer.self_freezing shows the current
freezing state for the cgroup itself. freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.

freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state. update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.
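
The top-down half of this scheme can be sketched as a small user-space model. It is not the kernel code: the Cgroup class, descendants_pre() and change_state() are hypothetical stand-ins for the structures and helpers named above, locking is omitted entirely, and the bottom-up FROZEN propagation is left out.

```python
class Cgroup:
    """Hypothetical stand-in for a freezer cgroup (not a kernel structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.self_freezing = False    # models freezer.self_freezing
        self.parent_freezing = False  # models freezer.parent_freezing
        if parent is not None:
            parent.children.append(self)
            # A new cgroup inherits its parent's effective state on creation.
            self.parent_freezing = parent.freezing

    @property
    def freezing(self):
        return self.self_freezing or self.parent_freezing


def descendants_pre(root):
    """Pre-order walk: each parent is yielded before its children."""
    for child in root.children:
        yield child
        yield from descendants_pre(child)


def change_state(cg, freeze):
    """Set cg's own state, then apply the effective state top-down."""
    cg.self_freezing = freeze
    for pos in descendants_pre(cg):
        # The immediate parent was visited first, so its state is current.
        pos.parent_freezing = pos.parent.freezing


root = Cgroup("root")
a = Cgroup("a", root)
b = Cgroup("b", a)

change_state(a, True)    # freezing a also freezes its descendant b
frozen_b = b.freezing
change_state(b, True)    # b is now freezing on its own account as well
change_state(a, False)   # thawing a does not thaw b
late = Cgroup("late", b) # created under a freezing cgroup
```

Because the walk is pre-order and each node reads only its immediate parent, a descendant can never end up thawed while any ancestor is still freezing, which matches the "thawed iff it and all its ancestors are THAWED" rule.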

Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations. A finer-grained approach was used
because it wasn't too difficult for cgroup_freezer, and I think it's
beneficial to have an example implementation; cgroup_freezer is
rather simple and can serve as a good one.

As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.

Note that this patch changes userland-visible behavior - freezing a
cgroup now freezes all its descendants too. This behavior change is
intended and has been warned about via .broken_hierarchy.

v2: Michal spotted a bug in freezer_change_state() - descendants were
inheriting from the wrong ancestor. Fixed.

v3: Documentation/cgroups/freezer-subsystem.txt updated.

Signed-off-by: Tejun Heo
Reviewed-by: Michal Hocko

Tejun Heo
2012-11-10 02:52:30 +0800

09 Oct, 2012

1 commit

1939c557b memcg: trivial fixes for Documentation/cgroups/memory.txt

While reading through Documentation/cgroups/memory.txt, I found a number
of minor wordos and typos. The patch below is a conservative handling of
some of these: it provides just a number of "obviously correct" fixes to
the English that improve the readability of the document somewhat.
Obviously some more significant fixes need to be made to the document, but
some of those may not be in the "obviously correct" category.

Signed-off-by: Michael Kerrisk
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk
2012-10-09 15:22:54 +0800

14 Sep, 2012

1 commit

83b061fc0 cgroup: trivial fixes for Documentation/cgroups/cgroups.txt

While reading through Documentation/cgroups/cgroups.txt, I found a
number of minor wordos and typos. The patch below is a conservative
handling of some of these: it provides just a number of "obviously
correct" fixes to the English that improve the readability
of the document somewhat. Obviously some more significant
fixes could be made to the document, but some of those
may not be in the "obviously correct" category.

Signed-off-by: Michael Kerrisk
Signed-off-by: Tejun Heo

Michael Kerrisk
2012-09-14 02:10:54 +0800

13 Sep, 2012

1 commit

19ec2567e cgroup: add documentation on extended attributes usage

v2: update cgroups.txt instead of creating a new file

Cc: Tejun Heo
Cc: Hugh Dickins
Cc: Hillf Danton
Cc: Lennart Poettering
Acked-by: Li Zefan
Signed-off-by: Aristeu Rozanski
Signed-off-by: Tejun Heo

Aristeu Rozanski
2012-09-13 02:39:50 +0800

01 Aug, 2012

1 commit

05a73ed29 mm/memcg: complete documentation for tcp memcg files

Signed-off-by: Wanpeng Li
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wanpeng Li
2012-08-01 09:42:43 +0800