Eric Lee / smarc-fsl-linux-kernel

18 Feb, 2016

1 commit

bd4508e85 core: remove unneeded headers for net cgroup controllers.

commit 3ed80a6 (cgroup: drop module support) made including
module.h redundant in the net cgroup controllers,
netclassid_cgroup.c and netprio_cgroup.c. This patch
removes them.

Signed-off-by: Rami Rosen
Acked-by: Tejun Heo
Signed-off-by: David S. Miller

Rosen, Rami
9 years ago

18 Dec, 2015

1 commit

b3e0d3d7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller

David S. Miller
10 years ago

09 Dec, 2015

3 commits

bd1060a1d sock, cgroup: add sock->sk_cgroup

In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbounded. As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pulling in configuration knobs from other subsystems so
that cgroup membership tests can be avoided.

net_cls and net_prio controllers are examples of the latter. They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.

Both net_cls and net_prio aren't properly hierarchical. Both inherit
configuration from the parent on creation but there's no interaction
afterwards. An ancestor doesn't restrict the behavior in its subtree
in any way and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration is
implemented at the system level. net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.

While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.

In preparation, this patch updates sock->sk_cgrp_data handling so that
it points to the v2 cgroup that sock was created in until either
net_prio or net_cls is used. Once either of the two is used,
sock->sk_cgrp_data reverts to its previous role of carrying prioidx
and classid. This is to avoid adding yet another cgroup related field
to struct sock.

As the mode switching can happen at most once per boot, the switching
mechanism is aimed at lowering hot path overhead. It may leak a
finite, likely small, number of cgroup refs and report spurious
prioidx or classid on switching; however, dynamic updates of prioidx
and classid have always been racy and lossy - socks between creation
and fd installation are never updated, config changes don't update
existing sockets at all, and prioidx may index with dead and recycled
cgroup IDs. Non-critical inaccuracies from small race windows won't
make any noticeable difference.
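The overloading described above can be illustrated with a small user-space
sketch. This is not the kernel's definition of struct sock_cgroup_data: the
names (cgroup_sketch, sk_cgrp_sketch, sketch_*), the use of bit 0 as the mode
tag, the field positions and the fallback values are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a v2 cgroup; not the kernel's struct cgroup. */
struct cgroup_sketch { int id; };

/*
 * One 64-bit word holds either a cgroup pointer or packed prioidx/classid.
 * Bit 0 is the assumed mode tag: pointers to aligned objects always have it
 * clear, so a set bit means "data mode".
 */
struct sk_cgrp_sketch { uint64_t val; };

#define SKETCH_DATA_MODE 1ULL

static void sketch_set_cgroup(struct sk_cgrp_sketch *d, struct cgroup_sketch *cgrp)
{
	d->val = (uint64_t)(uintptr_t)cgrp;	/* low bit clear: pointer mode */
}

static void sketch_set_data(struct sk_cgrp_sketch *d, uint16_t prioidx, uint32_t classid)
{
	/* Assumed layout: tag in bit 0, prioidx in bits 16-31, classid in bits 32-63. */
	d->val = SKETCH_DATA_MODE | ((uint64_t)prioidx << 16) | ((uint64_t)classid << 32);
}

static int sketch_is_data(const struct sk_cgrp_sketch *d)
{
	return (d->val & SKETCH_DATA_MODE) != 0;
}

static struct cgroup_sketch *sketch_cgroup(const struct sk_cgrp_sketch *d)
{
	return sketch_is_data(d) ? NULL : (struct cgroup_sketch *)(uintptr_t)d->val;
}

/* Assumed fallback of 1 while in pointer mode (illustrative only). */
static uint16_t sketch_prioidx(const struct sk_cgrp_sketch *d)
{
	return sketch_is_data(d) ? (uint16_t)(d->val >> 16) : 1;
}

/* Assumed fallback of 0 while in pointer mode (illustrative only). */
static uint32_t sketch_classid(const struct sk_cgrp_sketch *d)
{
	return sketch_is_data(d) ? (uint32_t)(d->val >> 32) : 0;
}
```

Because both modes share one word, readers only pay for a single load and a
bit test, which fits the stated aim of keeping hot path overhead low.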

This patch doesn't make use of the pointer yet. The following patch
will implement netfilter match for cgroup2 membership.

v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
cgroup specific field.

v3: Add comments explaining why sock_data_prioidx() and
sock_data_classid() use different fallback values.

Signed-off-by: Tejun Heo
Cc: Daniel Borkmann
Cc: Daniel Wagner
CC: Neil Horman
Signed-off-by: David S. Miller

Tejun Heo
10 years ago
2a56a1fec net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct

Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data.
->sk_cgrp_prioidx and ->sk_classid are moved into it. The struct
and its accessors are defined in cgroup-defs.h. This is to prepare
for overloading the fields with a cgroup pointer.

This patch mostly performs equivalent conversions but the following
are noteworthy.

* Equality test before updating classid is removed from
sock_update_classid(). This shouldn't make any noticeable
difference and a similar test will be implemented on the helper side
later.

* sock_update_netprioidx() now takes struct sock_cgroup_data and can
be moved to netprio_cgroup.h without causing include dependency
loop. Moved.

* The dummy version of sock_update_netprioidx() is converted to a static
inline function while at it.

Signed-off-by: Tejun Heo
Signed-off-by: David S. Miller

Tejun Heo
10 years ago
297dbde19 netprio_cgroup: limit the maximum css->id to USHRT_MAX

netprio builds per-netdev contiguous priomap array which is indexed by
css->id. The array is allocated using kzalloc() effectively limiting
the maximum ID supported to some thousand range. This patch caps the
maximum supported css->id to USHRT_MAX which should be way above what
is actually useable.

This allows reducing sock->sk_cgrp_prioidx to u16 from u32. The freed
up part will be used to overload the cgroup related fields.
sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the
two cgroup related fields are adjacent.
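The cap can be pictured as a simple range check before an ID is narrowed to
16 bits. The sketch below is an assumption-laden illustration, not the
patch's code: sketch_prio_online() is a hypothetical name, and the choice of
-ENOSPC as the error value is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical online-time check: refuse IDs that would not fit in the
 * 16-bit prioidx field, otherwise store the narrowed value.
 */
static int sketch_prio_online(unsigned int css_id, uint16_t *prioidx_out)
{
	if (css_id > USHRT_MAX)
		return -ENOSPC;		/* assumed error code */

	*prioidx_out = (uint16_t)css_id;	/* lossless after the check */
	return 0;
}
```

Rejecting the ID up front keeps the later narrowing to u16 from silently
truncating and aliasing two different cgroups onto one priomap slot.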

Signed-off-by: Tejun Heo
Acked-by: Daniel Wagner
Cc: Daniel Borkmann
CC: Neil Horman
Signed-off-by: David S. Miller

Tejun Heo
10 years ago

03 Dec, 2015

1 commit

1f7dd3e5a cgroup: fix handling of multi-destination migration from subtree_control enabling

Consider the following v2 hierarchy.

P0 (+memory) --- P1 (-memory) --- A
                               \- B

P0 has memory enabled in its subtree_control while P1 doesn't. If
both A and B contain processes, they would belong to the memory css of
P1. Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter. IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses. pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
Modules linked in:
CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
...
ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
Call Trace:
[] dump_stack+0x4e/0x82
[] warn_slowpath_common+0x82/0xc0
[] warn_slowpath_null+0x1a/0x20
[] pids_cancel.constprop.6+0x31/0x40
[] pids_can_attach+0x6d/0xf0
[] cgroup_taskset_migrate+0x6c/0x330
[] cgroup_migrate+0xf5/0x190
[] cgroup_attach_task+0x176/0x200
[] __cgroup_procs_write+0x2ad/0x460
[] cgroup_procs_write+0x14/0x20
[] cgroup_file_write+0x35/0x1c0
[] kernfs_fop_write+0x141/0x190
[] __vfs_write+0x28/0xe0
[] vfs_write+0xac/0x1a0
[] SyS_write+0x49/0xb0
[] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing the @css parameter from the three
migration methods, ->can_attach(), ->cancel_attach() and ->attach(), and
updating the cgroup_taskset iteration helpers to also return the destination
css in addition to the task being migrated. All controllers are
updated accordingly.

* Controllers which don't care whether there are one or multiple
target csses can be converted trivially. cpu, io, freezer, perf,
netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's a single source
and destination and thus doesn't support v2 hierarchy already. The
only change made by this patchset is how that single destination css
is obtained.

* memory migration path already doesn't do anything on v2. How the
single destination css is obtained is updated and the prep stage of
mem_cgroup_can_attach() is reordered to accommodate the change.

* pids is the only controller which was affected by this bug. It now
correctly handles multi-destination migrations and no longer causes
counter underflow from incorrect accounting.
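The idea of iterating over (task, destination css) pairs, rather than
assuming one destination for the whole set, can be sketched in user-space C.
The types and the function below (css_sketch, task_sketch, migr_entry,
sketch_attach) are hypothetical and only model a pids-style counter; they are
not the kernel's cgroup_taskset API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-controller state with a pids-style charge counter. */
struct css_sketch { const char *name; long counter; };

/* Hypothetical task that is currently charged to one css. */
struct task_sketch { int pid; struct css_sketch *cur; };

/* One migration entry: the task plus its own destination css. */
struct migr_entry { struct task_sketch *task; struct css_sketch *dst; };

/* Move each task's charge from its current css to its own destination. */
static void sketch_attach(struct migr_entry *set, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct task_sketch *t = set[i].task;
		struct css_sketch *dst = set[i].dst;

		t->cur->counter--;	/* uncharge the source css */
		dst->counter++;		/* charge this task's destination */
		t->cur = dst;
	}
}
```

With the P1/A/B example, two tasks charged to P1 end up with one charge on A
and one on B, and P1's counter returns to zero instead of going negative.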

Signed-off-by: Tejun Heo
Reported-and-tested-by: Daniel Wagner
Cc: Aleksa Sarai

Tejun Heo
10 years ago

15 Jul, 2014

1 commit

5577964e6 cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes

Currently, cgroup_subsys->base_cftypes is used for both the unified
default hierarchy and legacy ones and subsystems can mark each file
with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
only on one of them. This is quite hairy and error-prone. Also, we
may end up exposing interface files to the default hierarchy without
thinking it through.

cgroup_subsys will grow two separate cftype arrays and apply each only
on the hierarchies of the matching type. This will allow organizing
cftypes in a lot clearer way and encourage subsystems to scrutinize
the interface which is being exposed in the new default hierarchy.

In preparation, this patch renames cgroup_subsys->base_cftypes to
cgroup_subsys->legacy_cftypes. This patch is pure rename.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vivek Goyal
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Aristeu Rozanski
Cc: Aneesh Kumar K.V

Tejun Heo
11 years ago

17 May, 2014

1 commit

5c9d535b8 cgroup: remove css_parent()

cgroup in general is moving towards using cgroup_subsys_state as the
fundamental structural component and css_parent() was introduced to
convert from using cgroup->parent to css->parent. It was quite some
time ago and we're moving forward with making css more prominent.

This patch drops the trivial wrapper css_parent() and lets the users
dereference css->parent. While at it, explicitly mark fields of css
which are public and immutable.

v2: New usage from device_cgroup.c converted.

Signed-off-by: Tejun Heo
Acked-by: Michal Hocko
Acked-by: Neil Horman
Acked-by: "David S. Miller"
Acked-by: Li Zefan
Cc: Vivek Goyal
Cc: Jens Axboe
Cc: Peter Zijlstra
Cc: Johannes Weiner

Tejun Heo
11 years ago

14 May, 2014

1 commit

451af504d cgroup: replace cftype->write_string() with cftype->write()

Convert all cftype->write_string() users to the new cftype->write()
which maps directly to kernfs write operation and has full access to
kernfs and cgroup contexts. The conversions are mostly mechanical.

* @css and @cft are accessed using of_css() and of_cft() accessors
respectively instead of being specified as arguments.

* Should return @nbytes on success instead of 0.

* @buf is not trimmed automatically. Trim if necessary. Note that
blkcg and netprio don't need this as the parsers already handle
whitespaces.
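The two behavioral points above (return @nbytes on success, trim the buffer
yourself) can be shown with a user-space sketch of a write handler. The
function names sketch_strstrip() and sketch_write() and the integer-parsing
body are hypothetical; the real ->write() method takes kernfs arguments that
are not modeled here.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *sketch_strstrip(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Hypothetical write-style handler: the buffer arrives untrimmed, so it is
 * stripped first; on success the full byte count is returned, not 0.
 */
static ssize_t sketch_write(char *buf, size_t nbytes, long *out)
{
	char *end;
	long v;

	buf = sketch_strstrip(buf);
	if (*buf == '\0')
		return -EINVAL;

	errno = 0;
	v = strtol(buf, &end, 10);
	if (errno || *end != '\0')
		return -EINVAL;

	*out = v;
	return (ssize_t)nbytes;	/* report the whole write as consumed */
}
```

Returning the byte count matters because a write that reports 0 bytes
consumed would look like no progress to the caller.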

cftype->write_string() has no user left after the conversions and is
removed.

While at it, remove unnecessary local variable @p in
cgroup_subtree_control_write() and stale comment about
CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c.

This patch doesn't introduce any visible behavior changes.

v2: netprio was missing from conversion. Converted.

Signed-off-by: Tejun Heo
Acked-by: Aristeu Rozanski
Acked-by: Vivek Goyal
Acked-by: Li Zefan
Cc: Jens Axboe
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Neil Horman
Cc: "David S. Miller"

Tejun Heo
11 years ago

19 Mar, 2014

1 commit

4d3bb511b cgroup: drop const from @buffer of cftype->write_string()

cftype->write_string() just passes on the writeable buffer from kernfs
and there's no reason to add const restriction on the buffer. The
only thing const achieves is unnecessarily complicating parsing of the
buffer. Drop const from @buffer.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Daniel Borkmann
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki

Tejun Heo
11 years ago

13 Feb, 2014

1 commit

924f0d9a2 cgroup: drop @skip_css from cgroup_taskset_for_each()

If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching
css. The intention of the interface is to make it easy to skip css's
(cgroup_subsys_states) which already match the migration target;
however, this is entirely unnecessary as migration taskset doesn't
include tasks which are already in the target cgroup. Drop @skip_css
from cgroup_taskset_for_each().

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Daniel Borkmann

Tejun Heo
11 years ago

08 Feb, 2014

3 commits

073219e99 cgroup: clean up cgroup_subsys names and initialization

cgroup_subsys is a bit messier than it needs to be.

* The name of a subsys can be different from its internal identifier
defined in cgroup_subsys.h. Most subsystems use the matching name
but three - cpu, memory and perf_event - use different ones.

* cgroup_subsys_id enums are postfixed with _subsys_id and each
cgroup_subsys is postfixed with _subsys. As cgroup.h is widely
included throughout various subsystems, it doesn't and shouldn't
have a claim on such generic names which don't have any qualifier
indicating that they belong to cgroup.

* cgroup_subsys->subsys_id should always equal the matching
cgroup_subsys_id enum; however, we require each controller to
initialize it and then BUG if they don't match, which is a bit
silly.

This patch cleans up cgroup_subsys names and initialization by doing
the following.

* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
cgroup_subsys with _cgrp_subsys.

* With the above, renaming subsys identifiers to match the userland
visible names doesn't cause any naming conflicts. All non-matching
identifiers are renamed to match the official names.

cpu_cgroup -> cpu
mem_cgroup -> memory
perf -> perf_event

* controllers no longer need to initialize ->subsys_id and ->name.
They're generated in cgroup core and set automatically during boot.

* Redundant cgroup_subsys declarations removed.

* While updating BUG_ON()s in cgroup_init_early(), convert them to
WARN()s. BUGging that early during boot is stupid - the kernel
can't print anything, even through serial console and the trap
handler doesn't even link stack frame properly for back-tracing.

This patch doesn't introduce any behavior changes.

v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core").

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: "David S. Miller"
Acked-by: "Rafael J. Wysocki"
Acked-by: Michal Hocko
Acked-by: Peter Zijlstra
Acked-by: Aristeu Rozanski
Acked-by: Ingo Molnar
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Cc: Serge E. Hallyn
Cc: Vivek Goyal
Cc: Thomas Graf

Tejun Heo
11 years ago
3ed80a62b cgroup: drop module support

With module support dropped from net_prio, no controller is using
cgroup module support. None of actual resource controllers can be
built as a module and we aren't gonna add new controllers which don't
control resources. This patch drops module support from cgroup.

* cgroup_[un]load_subsys() and cgroup_subsys->module removed.

* As there's no point in distinguishing IS_BUILTIN() and IS_MODULE(),
cgroup_subsys.h now uses IS_ENABLED() directly.

* enum cgroup_subsys_id now exactly matches the list of enabled
controllers as ordered in cgroup_subsys.h.

* cgroup_subsys[] is now a contiguously occupied array. Size
specification is no longer necessary and dropped.

* for_each_builtin_subsys() is removed and for_each_subsys() is
updated to not require any locking.

* module ref handling is removed from rebind_subsystems().

* Module related comments dropped.

v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core").

v3: Added {} around the if (need_forkexit_callback) block in
cgroup_post_fork() for readability as suggested by Li.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
11 years ago
af6363374 cgroup: make CONFIG_CGROUP_NET_PRIO bool and drop unnecessary init_netclassid_cgroup()

net_prio is the only cgroup which is allowed to be built as a module.
The savings from allowing one controller to be built as a module are
tiny especially given that cgroup module support itself adds quite a
bit of complexity.

Given that none of the other controllers has much chance of being made a
module and that we're unlikely to add new modular controllers, the
added complexity is simply not justifiable.

As a first step to drop cgroup module support, this patch changes the
config option to bool from tristate and drops module related code from
it.

Also, while an earlier commit fe1217c4f3f7 ("net: net_cls: move
cgroupfs classid handling into core") dropped module support from
net_cls cgroup, it retained a call to cgroup_load_subsys(), which is
noop for built-in controllers. Drop it along with
init_netclassid_cgroup().

v2: Removed modular version of task_netprioidx() in
include/net/netprio_cgroup.h as suggested by Li Zefan.

v3: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core"). net_cls cgroup part is mostly
dropped except for removal of init_netclassid_cgroup().

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: "David S. Miller"
Acked-by: Li Zefan
Cc: Thomas Graf

Tejun Heo
11 years ago

26 Jan, 2014

1 commit

4ba9920e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

1) BPF debugger and asm tool by Daniel Borkmann.

2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.

4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.

5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.

6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.

7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.

9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.

19) Allow openvswitch to use mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.

21) Support 10G in generic phylib. From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...

Linus Torvalds
11 years ago

11 Dec, 2013

1 commit

8e3bff96a net: more spelling fixes

Various spelling fixes in networking stack

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

stephen hemminger
12 years ago

06 Dec, 2013

2 commits

2da8ca822 cgroup: replace cftype->read_seq_string() with cftype->seq_show()

In preparation of conversion to kernfs, cgroup file handling is
updated so that it can be easily mapped to kernfs. This patch
replaces cftype->read_seq_string() with cftype->seq_show() which is
not limited to single_open() operation and will map directly to
kernfs seq_file interface.

The conversions are mechanical. As ->seq_show() doesn't have @css and
@cft, the functions which make use of them are converted to use
seq_css() and seq_cft() respectively. On several occasions, e.g. if
it has seq_string in its name, the function name is updated to fit the
new method better.

This patch does not introduce any behavior changes.

Signed-off-by: Tejun Heo
Acked-by: Aristeu Rozanski
Acked-by: Vivek Goyal
Acked-by: Michal Hocko
Acked-by: Daniel Wagner
Acked-by: Li Zefan
Cc: Jens Axboe
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Cc: Neil Horman

Tejun Heo
12 years ago
e92e113ca netprio_cgroup: convert away from cftype->read_map()

In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.

cftype->read_map() doesn't add any value and is being replaced with
->read_seq_string(). Update read_priomap() to use ->read_seq_string()
instead.

This patch doesn't make any visible behavior changes.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Daniel Wagner
Acked-by: Li Zefan

Tejun Heo
12 years ago

09 Oct, 2013

1 commit

e1af5e445 cgroup: netprio: remove unnecessary task_netprioidx

Since the tasks have been migrated to the cgroup,
there is no need to call task_netprioidx to get
task's cgroup id.

Signed-off-by: Gao feng
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Gao feng
12 years ago

09 Aug, 2013

5 commits

d99c8727e cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroup

cgroup is in the process of converting to css (cgroup_subsys_state)
from cgroup as the principal subsystem interface handle. This is
mostly to prepare for the unified hierarchy support where css's will
be created and destroyed dynamically but also helps cleaning up
subsystem implementations as css is usually what they are interested
in anyway.

cgroup_taskset which is used by the subsystem attach methods is the
last cgroup subsystem API which isn't using css as the handle. Update
cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.

The conversions are pretty mechanical. One exception is
cpuset::cgroup_cs(), which lost its last user and got removed.

This patch shouldn't introduce any functional changes.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Daniel Wagner
Cc: Ingo Molnar
Cc: Matt Helsley
Cc: Steven Rostedt

Tejun Heo
12 years ago
182446d08 cgroup: pass around cgroup_subsys_state instead of cgroup in file methods

cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.

This patch converts all cftype file operations to take @css instead of
@cgroup. cftypes for the cgroup core files don't have their subsystem
pointer set. These will automatically use the dummy_css added by the
previous patch and can be converted the same way.

Most subsystem conversions are straightforward but there are some
interesting ones.

* freezer: update_if_frozen() is also converted to take @css instead
of @cgroup for consistency. This will make the code look simpler
too once iterators are converted to use css.

* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
vmpressure while mem_cgroup_from_cont() can be made static.
Updated accordingly.

* cpu: cgroup_tg() doesn't have any user left. Removed.

* cpuacct: cgroup_ca() doesn't have any user left. Removed.

* hugetlb: hugetlb_cgroup_from_cgroup() doesn't have any user left.
Removed.

* net_cls: cgrp_cls_state() doesn't have any user left. Removed.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Acked-by: Vivek Goyal
Acked-by: Aristeu Rozanski
Acked-by: Daniel Wagner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: Matt Helsley
Cc: Jens Axboe
Cc: Steven Rostedt

Tejun Heo
12 years ago
eb95419b0 cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods

cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.

* With unified hierarchy, subsystems will be dynamically bound and
unbound from cgroups and thus css's (cgroup_subsys_state) may be
created and destroyed dynamically over the lifetime of a cgroup,
which is different from the current state where all css's are
allocated and destroyed together with the associated cgroup. This
in turn means that cgroup_css() should be synchronized and may
return NULL, making it more cumbersome to use.

* Differing levels of per-subsystem granularity in the unified
hierarchy means that the task and descendant iterators should behave
differently depending on the specific subsystem the iteration is
being performed for.

* In the majority of cases, a subsystem only cares about its part in the
cgroup hierarchy - i.e. the hierarchy of css's. Subsystem methods
often obtain the matching css pointer from the cgroup and don't
bother with the cgroup pointer itself. Passing around css fits
much better.

This patch converts all cgroup_subsys methods to take @css instead of
@cgroup. The conversions are mostly straight-forward. A few
noteworthy changes are

* ->css_alloc() now takes css of the parent cgroup rather than the
pointer to the new cgroup as the css for the new cgroup doesn't
exist yet. Knowing the parent css is enough for all the existing
subsystems.

* In kernel/cgroup.c::offline_css(), unnecessary open coded css
dereference is replaced with local variable access.

This patch shouldn't cause any behavior differences.

v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
with local variable @css as suggested by Li Zefan.

Rebased on top of new for-3.12 which includes for-3.11-fixes so
that ->css_free() invocation added by da0a12caff ("cgroup: fix a
leak when percpu_ref_init() fails") is converted too. Suggested
by Li Zefan.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Acked-by: Vivek Goyal
Acked-by: Aristeu Rozanski
Acked-by: Daniel Wagner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: Matt Helsley
Cc: Jens Axboe
Cc: Steven Rostedt

Tejun Heo
12 years ago
6d37b9742 netprio_cgroup: pass around @css instead of @cgroup and kill struct cgroup_netprio_state

cgroup controller API will be converted to primarily use struct
cgroup_subsys_state instead of struct cgroup. In preparation, make
the internal functions of netprio_cgroup pass around @css instead of
@cgrp.

While at it, kill struct cgroup_netprio_state which only contained
struct cgroup_subsys_state without serving any purpose. All functions
are converted to deal with @css directly.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Neil Horman
Acked-by: David S. Miller

Tejun Heo
12 years ago
8af01f56a cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/

The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.

We're about to revamp a large portion of cgroup API, so, let's rename
them so that they're less awkward. Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.

Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css(). This patch is pure rename.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
12 years ago

29 May, 2013

1 commit

351638e7d net: pass info struct via netdevice notifier

So far, only net_device * could be passed along with a netdevice notifier
event. This patch makes it possible to pass a custom structure
carrying the info that the event listener needs to know.

Signed-off-by: Jiri Pirko

v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller

Jiri Pirko
12 years ago

07 Feb, 2013

1 commit

62b5942aa net: core: Remove unnecessary alloc/OOM messages

alloc failures already get standardized OOM
messages and a dump_stack.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
12 years ago

13 Dec, 2012

1 commit

6be35c700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heuristics were chosen a decade ago.
From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
Baldessari.

And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...

Linus Torvalds
13 years ago

22 Nov, 2012

6 commits

811d8d6ff netprio_cgroup: allow nesting and inherit config on cgroup creation

Inherit netprio configuration from ->css_online(), allow nesting and
remove .broken_hierarchy marking. This makes netprio_cgroup's
behavior match netcls_cgroup's.

Note that this patch changes userland-visible behavior. Nesting is
allowed and the first level cgroups below the root cgroup behave
differently - they inherit priorities from the root cgroup on creation
instead of starting with 0. This is unfortunate but not doing so is
much crazier.
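
As an illustration of the inherit-on-creation behavior described above,
here is a minimal userspace sketch. It is not the kernel code: the
toy_netprio_cgroup structure, the fixed TOY_NR_DEVS device count and
toy_css_online() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_DEVS 3   /* assumed fixed number of network devices */

/* Hypothetical stand-in for a netprio cgroup: one priority per device. */
struct toy_netprio_cgroup {
    struct toy_netprio_cgroup *parent;  /* NULL for the root cgroup */
    unsigned int prio[TOY_NR_DEVS];
};

/*
 * Bring a new cgroup online.  A cgroup with a parent starts with a copy
 * of the parent's priorities; the root starts with all priorities at 0.
 * The copy is taken once: later changes to the parent are not propagated.
 */
static void toy_css_online(struct toy_netprio_cgroup *cg,
                           struct toy_netprio_cgroup *parent)
{
    assert(cg != NULL);

    cg->parent = parent;
    if (parent)
        memcpy(cg->prio, parent->prio, sizeof(cg->prio));
    else
        memset(cg->prio, 0, sizeof(cg->prio));
}
```

In this sketch a first-level cgroup created under a root whose priority
for device 0 is 3 starts with 3 rather than 0, and a cgroup nested below
it inherits from its own parent in the same way.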

Signed-off-by: Tejun Heo
Tested-and-Acked-by: Daniel Wagner
Acked-by: David S. Miller

Tejun Heo
13 years ago
666b0ebe2 netprio_cgroup: implement netprio[_set]_prio() helpers

Introduce two helpers - netprio_prio() and netprio_set_prio() - which
hide the details of priomap access and expansion. This will help
implementing hierarchy support.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Tested-and-Acked-by: Daniel Wagner
Acked-by: David S. Miller

Tejun Heo
13 years ago
88d642fa2 netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx

With priomap expansion no longer depending on knowing max id
allocated, netprio_cgroup can use cgroup->id instead of cs->prioidx.
Drop prioidx alloc/free logic and convert all uses to cgroup->id.

* In cgrp_css_alloc(), parent->id test is moved above @cs allocation
to simplify error path.

* In cgrp_css_free(), @cs assignment is made initialization.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Tested-and-Acked-by: Daniel Wagner
Acked-by: David S. Miller

Tejun Heo
13 years ago
4a6ee25c7 netprio_cgroup: reimplement priomap expansion

netprio kept track of the highest prioidx allocated and resized
priomaps accordingly when necessary. This makes it necessary to keep
track of prioidx allocation and may end up resizing on every new
prioidx.

Update extend_netdev_table() such that it takes @target_idx which the
priomap should be able to accommodate. If the priomap is large enough,
nothing happens; otherwise, the size is doubled until @target_idx can
be accommodated.

This makes max_prioidx and write_update_netdev_table() unnecessary.
write_priomap() now calls extend_netdev_table() directly.
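
The doubling strategy described above can be sketched in userspace C as
follows. This is only an illustration under stated assumptions, not the
kernel's extend_netdev_table(): struct prio_map, PRIOMAP_MIN_LEN and
prio_map_extend() are hypothetical names, and realloc() stands in for
the kernel's allocation and RCU handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PRIOMAP_MIN_LEN 4   /* assumed starting size, illustrative only */

/* Hypothetical stand-in for a per-device priority map. */
struct prio_map {
    size_t len;           /* number of slots in prio[] */
    unsigned int *prio;   /* must be NULL while len is 0 */
};

/*
 * Make sure map->prio[target_idx] is a valid slot.
 * Returns 0 on success, -1 on overflow or allocation failure
 * (the map is left untouched in that case).
 */
static int prio_map_extend(struct prio_map *map, size_t target_idx)
{
    size_t new_len;
    unsigned int *new_prio;

    assert(map != NULL);

    /* Large enough already: nothing happens. */
    if (target_idx < map->len)
        return 0;

    /* Otherwise double the size until target_idx fits. */
    new_len = map->len ? map->len : PRIOMAP_MIN_LEN;
    while (new_len <= target_idx) {
        if (new_len > SIZE_MAX / 2)
            return -1;
        new_len *= 2;
    }
    if (new_len > SIZE_MAX / sizeof(*new_prio))
        return -1;

    new_prio = realloc(map->prio, new_len * sizeof(*new_prio));
    if (!new_prio)
        return -1;

    /* Newly added slots start at priority 0. */
    memset(new_prio + map->len, 0, (new_len - map->len) * sizeof(*new_prio));
    map->prio = new_prio;
    map->len = new_len;
    return 0;
}
```

Because the size only ever doubles, a run of increasing indices triggers
a logarithmic number of reallocations instead of one per new index, and
the caller no longer needs to track the highest index handed out.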

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Tested-and-Acked-by: Daniel Wagner
Acked-by: David S. Miller

Tejun Heo
13 years ago
52bca930c netprio_cgroup: shorten variable names in extend_netdev_table()

The function is about to go through a rewrite. In preparation,
shorten the variable names so that we don't repeat "priomap" so often.

This patch is cosmetic.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Tested-and-Acked-by: Daniel Wagner
Acked-by: David S. Miller

Tejun Heo
13 years ago
6d5759dd0 netprio_cgroup: simplify write_priomap()

sscanf() doesn't bite.
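
For illustration, parsing a "<ifname> <prio>" line with a single
sscanf() call might look like the userspace sketch below. The helper
name toy_parse_priomap() and the TOY_IFNAMSIZ constant are assumptions
made for this example; this is not the kernel's write_priomap().

```c
#include <assert.h>
#include <stdio.h>

#define TOY_IFNAMSIZ 16   /* assumed name buffer size, including the NUL */

/*
 * Parse "<ifname> <prio>" from buf.
 * Returns 0 and fills devname/prio on success, -1 on malformed input.
 */
static int toy_parse_priomap(const char *buf, char devname[TOY_IFNAMSIZ],
                             unsigned int *prio)
{
    assert(buf != NULL && devname != NULL && prio != NULL);

    /* "%15s" reads at most TOY_IFNAMSIZ - 1 characters, then adds the NUL. */
    if (sscanf(buf, "%15s %u", devname, prio) != 2)
        return -1;
    return 0;
}
```

The field width in the format string bounds the write into devname, and
checking that exactly two items were converted rejects input that lacks
either the name or the priority.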

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Tested-and-Acked-by: Daniel Wagner
Acked-by: David S. Miller

Tejun Heo
13 years ago

20 Nov, 2012

1 commit

92fb97487 cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()

Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are. Also, update documentation.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
13 years ago

26 Oct, 2012

1 commit

c658f19db cgroup: net_prio: Mark local used function static

net_prio_attach() is only accessed via cgroup_subsys callbacks,
therefore we can reduce the visibility of this function.

Signed-off-by: Daniel Wagner
Cc: "David S. Miller"
Cc: John Fastabend
Cc: Li Zefan
Cc: Neil Horman
Cc: Tejun Heo
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Daniel Wagner
13 years ago

03 Oct, 2012

3 commits

aab174f0d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs update from Al Viro:

- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c

(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).

A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.

- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).

- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.

- Alex's fs/coredump.c splitoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.

- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree).

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...

Linus Torvalds
13 years ago
aecdc33e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking changes from David Miller:

1) GRE now works over ipv6, from Dmitry Kozlov.

2) Make SCTP more network namespace aware, from Eric Biederman.

3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

4) Make openvswitch network namespace aware, from Pravin B Shelar.

5) IPV6 NAT implementation, from Patrick McHardy.

6) Server side support for TCP Fast Open, from Jerry Chu and others.

7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.

8) Increase the loopback default MTU to 64K, from Eric Dumazet.

9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.

From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.

From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...

Linus Torvalds
13 years ago
68d47a137 Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup hierarchy update from Tejun Heo:
"Currently, different cgroup subsystems handle nested cgroups
completely differently. There's no consistency among subsystems and
the behaviors often are outright broken.

People at least seem to agree that the broken hierarchy behaviors need
to be weeded out if any progress is gonna be made on this front and
that the fallouts from deprecating the broken behaviors should be
acceptable especially given that the current behaviors don't make much
sense when nested.

This patch makes cgroup emit warning messages if cgroups for
subsystems with broken hierarchy behavior are nested to prepare for
fixing them in the future. This was put in a separate branch because
more related changes were expected (didn't make it this round) and the
memory cgroup wanted to pull in this and make changes on top."

* 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them

Linus Torvalds
13 years ago

27 Sep, 2012

1 commit

c3c073f80 new helper: iterate_fd()

iterates through the opened files in given descriptor table,
calling a supplied function; we stop once non-zero is returned.
Callback gets struct file *, descriptor number and const void *
argument passed to iterator. It is called with files->file_lock
held, so it is not allowed to block.
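
The iteration contract described above can be sketched in userspace C.
This is a toy model, not the kernel helper: toy_file, toy_fdtable,
toy_iterate_fd() and match_flags() are hypothetical names, and there is
no locking here, whereas the kernel calls the callback with
files->file_lock held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct file and the descriptor table. */
struct toy_file {
    int flags;
};

struct toy_fdtable {
    unsigned int max_fds;
    struct toy_file **fd;   /* NULL entries are unused descriptors */
};

/* Callback: gets the opaque argument, the file and its descriptor number. */
typedef int (*toy_fd_visitor)(const void *arg, struct toy_file *file,
                              unsigned int fd);

/*
 * Walk the open files starting at descriptor 'start' and call fn on each.
 * Stops as soon as fn returns non-zero and passes that value back;
 * returns 0 if every callback returned 0.
 */
static int toy_iterate_fd(struct toy_fdtable *table, unsigned int start,
                          toy_fd_visitor fn, const void *arg)
{
    int res = 0;
    unsigned int n;

    assert(table != NULL && fn != NULL);

    for (n = start; n < table->max_fds; n++) {
        struct toy_file *file = table->fd[n];

        if (!file)
            continue;
        res = fn(arg, file, n);
        if (res)
            break;
    }
    return res;
}

/* Example callback: returns fd + 1 for the first file whose flags match. */
static int match_flags(const void *arg, struct toy_file *file,
                       unsigned int fd)
{
    const int *wanted = arg;

    return file->flags == *wanted ? (int)fd + 1 : 0;
}
```

Returning fd + 1 from the callback is one way to distinguish "found at
descriptor 0" from "nothing found", since 0 is reserved for "keep going".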

tty_io, netprio_cgroup and selinux flush_unauthorized_files()
converted to its use.

Signed-off-by: Al Viro

Al Viro
13 years ago

15 Sep, 2012

1 commit

8c7f6edbd cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them

Currently, cgroup hierarchy support is a mess. cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children. blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup. Others show yet different behaviors.

These differing interpretations of cgroup hierarchy make using cgroup
confusing and make it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.

Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy. Users using separate hierarchies
expecting completely different behaviors depending on the mounted
subsystem is detrimental to making any progress on this front.

This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support. The goal of
this patch is two-fold.

* Move users away from using hierarchy on currently non-hierarchical
subsystems, so that implementing proper hierarchy support on those
doesn't surprise them.

* Keep track of which controllers are broken how and nudge the
subsystems to implement proper hierarchy support.

For now, start with a single warning message. We can whine louder
later on.
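
A rough userspace sketch of the "warn once when nested" idea follows.
It is an assumption-laden illustration rather than the patch itself:
toy_subsys, toy_cgroup and toy_warn_broken_hierarchy() are hypothetical
names, the warning text is invented, and "nested" is taken here to mean
that the new cgroup's parent is not the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_subsys {
    const char *name;
    bool broken_hierarchy;  /* controller lacks proper hierarchy support */
    bool warned;            /* set once the warning has been printed */
};

struct toy_cgroup {
    struct toy_cgroup *parent;  /* NULL for the root cgroup */
};

/*
 * Meant to run after a new cgroup has been fully created.
 * Prints one warning per broken subsystem the first time a nested
 * cgroup appears.  Returns the number of warnings printed by this call.
 */
static int toy_warn_broken_hierarchy(const struct toy_cgroup *cgrp,
                                     struct toy_subsys *subsys, size_t nr)
{
    int warnings = 0;
    size_t i;

    assert(cgrp != NULL && subsys != NULL);

    /* The root itself, or a direct child of the root, is not nested. */
    if (!cgrp->parent || !cgrp->parent->parent)
        return 0;

    for (i = 0; i < nr; i++) {
        if (!subsys[i].broken_hierarchy || subsys[i].warned)
            continue;
        /* Message wording is illustrative only. */
        fprintf(stderr, "%s: nested cgroups are not properly supported\n",
                subsys[i].name);
        subsys[i].warned = true;
        warnings++;
    }
    return warnings;
}
```

Keeping a per-subsystem "warned" flag limits the output to a single
message per controller, which matches the stated intent of starting with
one warning and leaving louder complaints for later.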

v2: Fixed a typo spotted by Michal. Warning message updated.

v3: Updated memcg part so that it doesn't generate warning in the
cases where .use_hierarchy=false doesn't make the behavior
different from root.use_hierarchy=true. Fixed a typo spotted by
Glauber.

v4: Check ->broken_hierarchy after cgroup creation is complete so that
->create() can affect the result per Michal. Dropped unnecessary
memcg root handling per Michal.

Signed-off-by: Tejun Heo
Acked-by: Michal Hocko
Acked-by: Li Zefan
Acked-by: Serge E. Hallyn
Cc: Glauber Costa
Cc: Peter Zijlstra
Cc: Paul Turner
Cc: Johannes Weiner
Cc: Thomas Graf
Cc: Vivek Goyal
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Neil Horman
Cc: Aneesh Kumar K.V

Tejun Heo
13 years ago