13 Dec, 2011
12 commits
-
This patch introduces the kmem.tcp.max_usage_in_bytes file, living in the
kmem_cgroup filesystem. The root cgroup will display a value equal
to RESOURCE_MAX. This is to avoid introducing any locking schemes in
the network paths when cgroups are not being actively used.

All other cgroups will see the maximum memory ever used by that cgroup.
Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
This patch introduces the kmem.tcp.failcnt file, living in the
kmem_cgroup filesystem. Following the pattern in the other
memcg resources, this file keeps a counter of how many times
allocation failed due to limits being hit in this cgroup.
The root cgroup will always show a failcnt of 0.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
This patch introduces the kmem.tcp.usage_in_bytes file, living in the
kmem_cgroup filesystem. It is a simple read-only file that displays the
amount of kernel memory currently consumed by the cgroup.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to
effectively control the amount of kernel memory pinned by a cgroup.

This value is ignored in the root cgroup, and in all others,
caps the value specified by the admin in the net namespaces'
view of tcp_sysctl_mem.

If namespaces are being used, the admin is allowed to set a
value bigger than the cgroup's maximum, the same way it is allowed
to set pretty much unlimited values in a real box.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
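The four kmem.tcp.* entries above (limit_in_bytes, usage_in_bytes, max_usage_in_bytes, failcnt) describe a limit, a current value, a high-water mark and a failure counter. A minimal user-space model of that bookkeeping might look like the sketch below. It is not the kernel code: the class name, the method names and the numeric value used for RESOURCE_MAX are assumptions made for illustration, and the root-cgroup special cases are left out.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the kernel's "unlimited" value (RESOURCE_MAX);
# the real constant is not given in the commit messages.
RESOURCE_MAX = 2**63 - 1


@dataclass
class TcpMemCounter:
    """User-space model of one cgroup's kmem.tcp.* counters (illustrative names)."""
    limit: int = RESOURCE_MAX   # models kmem.tcp.limit_in_bytes
    usage: int = 0              # models kmem.tcp.usage_in_bytes
    max_usage: int = 0          # models kmem.tcp.max_usage_in_bytes
    failcnt: int = 0            # models kmem.tcp.failcnt

    def charge(self, nbytes: int) -> bool:
        """Try to account nbytes; count a failure if the limit would be exceeded."""
        if self.usage + nbytes > self.limit:
            self.failcnt += 1
            return False
        self.usage += nbytes
        self.max_usage = max(self.max_usage, self.usage)
        return True

    def uncharge(self, nbytes: int) -> None:
        """Release previously charged bytes; the high-water mark is kept."""
        self.usage = max(0, self.usage - nbytes)
```

In this model a rejected charge leaves usage untouched and only bumps failcnt, while max_usage never decreases when memory is released.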
This patch allows each namespace to independently set up
its levels for tcp memory pressure thresholds. This patch
alone does not buy much: we need to make these values
per group of processes somehow. This is achieved in the
patches that follow in this patchset.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
The goal of this work is to move the memory pressure tcp
controls to a cgroup, instead of just relying on global
conditions.

To avoid excessive overhead in the network fast paths,
the code that accounts allocated memory to a cgroup is
hidden inside a static_branch(). This branch is patched out
until the first non-root cgroup is created. So when nobody
is using cgroups, even if it is mounted, no significant performance
penalty should be seen.

This patch handles the generic part of the code, and has nothing
tcp-specific.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Kirill A. Shutemov
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller
-
This patch replaces all uses of struct sock fields' memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where cgroups are disabled.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller
-
This patch lays down the foundation for the kernel memory component
of the Memory Controller.

As of today, I am only laying down the following files:
* memory.independent_kmem_limit
* memory.kmem.limit_in_bytes (currently ignored)
* memory.kmem.usage_in_bytes (always zero)

Signed-off-by: Glauber Costa
CC: Kirill A. Shutemov
CC: Paul Menage
CC: Greg Thelen
CC: Johannes Weiner
CC: Michal Hocko
Signed-off-by: David S. Miller
-
After a guest is live migrated, the xen-netfront driver emits a gratuitous
ARP message, so that networking hardware on the target host's subnet can
take notice, and public routing to the guest is re-established. However,
if the packet appears on the backend interface before the backend is added
to the target host's bridge, the packet is lost, and the migrated guest's
peers become unable to talk to the guest.

A sufficient two-part condition to prevent the above is:

(1) ensure that the backend only moves to Connected xenbus state after its
hotplug scripts have completed, i.e. the netback interface got added to the
bridge; and

(2) ensure the frontend only queues the gARP when it sees the backend move
to Connected.

These two together provide complete ordering. Sub-condition (1) is already
satisfied by commit f942dc2552b8 in Linus' tree, based on commit
6b0b80ca7165 from [1].

In general, the full condition is sufficient, not necessary, because,
according to [2], live migration has been working for a long time without
satisfying sub-condition (2). However, after 6b0b80ca7165 was backported
to the RHEL-5 host to ensure (1), (2) still proved necessary in the RHEL-6
guest. This patch intends to provide (2) for upstream.

The Reviewed-by line comes from [3].
[1] git://xenbits.xen.org/people/ianc/linux-2.6.git#upstream/dom0/backend/netback-history
[2] http://old-list-archives.xen.org/xen-devel/2011-06/msg01969.html
[3] http://old-list-archives.xen.org/xen-devel/2011-07/msg00484.html

Signed-off-by: Laszlo Ersek
Reviewed-by: Ian Campbell
Signed-off-by: David S. Miller
-
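Sub-condition (2) of the xen-netfront entry above amounts to deferring the gratuitous ARP until the backend is seen in the Connected state. The toy model below sketches only that ordering. The class, its method names and the state strings are illustrative assumptions, not the driver's actual interface.

```python
class Frontend:
    """Toy model of a netfront-like frontend (names are illustrative only)."""

    def __init__(self):
        self.sent = []             # log of packets "sent" by the model
        self.garp_pending = False

    def after_migration(self):
        # Do not announce immediately: the backend may not be bridged yet.
        self.garp_pending = True

    def on_backend_state(self, state: str):
        # Sub-condition (2): queue the gARP only on seeing Connected.
        if state == "Connected" and self.garp_pending:
            self.sent.append("gARP")
            self.garp_pending = False
```

With this ordering, a state change to anything other than Connected sends nothing, and a repeated Connected notification does not send a second gARP.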
…wireless-next into for-davem
12 Dec, 2011
6 commits
-
Don't write more than the requested number of bytes of a batman-adv icmp
packet to the userspace buffer. Otherwise unrelated userspace memory might get
overwritten by the kernel.

Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
-
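The batman-adv fix above boils down to clamping the copy length to what the reader asked for. A minimal user-space sketch of that rule follows. The function name and the bytearray standing in for the userspace buffer are assumptions for illustration, not the driver's code.

```python
def copy_packet_to_user(packet: bytes, user_buf: bytearray, count: int) -> int:
    """Copy at most `count` bytes of `packet` into `user_buf`; return bytes copied."""
    # Never exceed the requested count, the packet size, or the buffer size.
    n = min(count, len(packet), len(user_buf))
    user_buf[:n] = packet[:n]
    return n
```

Bytes of the destination beyond the requested count are left untouched, which is the property the commit message asks for.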
The access_ok read check can be directly done in copy_from_user since a failure
of access_ok is handled the same way as an error in __copy_from_user.

Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
-
Writing an icmp_packet_rr and then reading icmp_packet can lead to kernel
memory corruption, if __user *buf is just below TASK_SIZE.

Signed-off-by: Paul Kot
[sven@narfation.org: made it checkpatch clean]
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
-
Use IS_ENABLED(CONFIG_IPV6) instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE).
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Disable Tx VLAN offloading in certain cases.
Signed-off-by: Ajit Khaparde
Signed-off-by: David S. Miller
-
Update the pmem_fifo_overflow_drop and rx_priority_pause_frames counters.
Signed-off-by: Ajit Khaparde
Signed-off-by: David S. Miller
11 Dec, 2011
2 commits
-
Wrap the udp6 lookup into the proper ifdef-s.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Eric Dumazet reported that when inet_diag is built-in, udp_diag also goes
built-in, and when ipv6 is a module the udp6 lookup symbol is not found.

LD .tmp_vmlinux1
net/built-in.o: In function `udp_dump_one':
udp_diag.c:(.text+0xa2b40): undefined reference to `__udp6_lib_lookup'
make: *** [.tmp_vmlinux1] Erreur 1

Fix this by making udp diag build mode depend on both -- inet diag and ipv6.
Reported-by: Eric Dumazet
Signed-off-by: Pavel Emelyanov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Dec, 2011
17 commits
-
It looks like the regression was introduced between 20111202 and
20111205 (linux-next tree). Symptoms: the connection to the AP seems to be
established, but no data goes through it in any way. Tested on intel
5300.
A look at the changes shows that at least part of
the code wasn't merged properly. It was originally committed into
iwl_agn.c but the code in question was moved to iwl-mac80211.c.
This patch puts the code in place and my card works again.

Signed-off-by: Nikolay Martynov
Signed-off-by: John W. Linville
-
CC [M] drivers/net/wireless/wl12xx/tx.o
drivers/net/wireless/wl12xx/tx.c: In function ‘wl1271_tx_fill_hdr’:
drivers/net/wireless/wl12xx/tx.c:288:6: warning: ‘tx_attr’ may be used uninitialized in this function

Signed-off-by: John W. Linville
-
Copy-s/tcp/udp/-paste from TCP bits.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Do the same as TCP does -- iterate the given udp_table, filter
sockets with bytecode and dump sockets into reply message.

The same filtering as for TCP applies, though only some of the
state bits really matter.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Do the same as TCP does -- lookup a socket in the given udp_table,
check cookie, fill the reply message with existing inet socket dumping
helper and send one back.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Introduce the transport level diag handler module for UDP (and UDP-lite)
sockets and register (empty for now) callbacks in the inet_diag module.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The UDP diag get_exact handler will require them to find a
socket by provided net, [sd]addr-s, [sd]ports and device.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Introduce two callbacks in inet_diag_handler -- one for dumping all
sockets (with filters) and the other one for dumping a single sk.

Replace direct calls to icsk handlers with indirect calls to callbacks
provided by handlers.

Make existing TCP and DCCP handlers use provided helpers for icsk-s.
The UDP diag module will provide its own.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
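The two-callback split described in the inet_diag_handler entry above (one callback that dumps all sockets through a filter, one that dumps a single socket) can be modelled in user space as a small dispatch table. This is only a sketch of the pattern: the DiagHandler class, the registry, the dict-based sockets and the helper names are all hypothetical and do not reproduce the kernel structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sock = dict  # a socket is modelled as a plain dict in this sketch


@dataclass
class DiagHandler:
    """Two callbacks per protocol, mirroring the split described above."""
    dump: Callable[[Callable[[Sock], bool]], List[Sock]]   # all sockets, filtered
    dump_one: Callable[[int], Optional[Sock]]              # a single socket


handlers: Dict[str, DiagHandler] = {}


def register(proto: str, handler: DiagHandler) -> None:
    handlers[proto] = handler


def make_table_handler(table: List[Sock]) -> DiagHandler:
    """Build a handler over an in-memory socket table (hypothetical helper)."""
    def dump(flt):
        return [sk for sk in table if flt(sk)]

    def dump_one(port):
        return next((sk for sk in table if sk["sport"] == port), None)

    return DiagHandler(dump=dump, dump_one=dump_one)


def diag_dump(proto: str, flt) -> List[Sock]:
    # The core never calls protocol code directly; it goes through the handler.
    return handlers[proto].dump(flt)


def diag_dump_one(proto: str, port: int) -> Optional[Sock]:
    return handlers[proto].dump_one(port)
```

A protocol module would call register() once, after which the generic code only ever reaches it through the two callbacks.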
The existing inet_csk_diag_fill dumps the inet connection sock info
into the netlink inet_diag_message. Prepare this routine to be able
to dump only the inet_sock part of a socket if the icsk part is missing.

This will be used by UDP diag module when dumping UDP sockets.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The upcoming UDP module will require exactly this ability, so just
move the existing code to provide one.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Similar to previous patch: the 1st part locks the inet handler
and will get generalized and the 2nd one dumps icsk-s and will
be used by TCP and DCCP handlers.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The 1st part locks the inet handler and the 2nd one dumps the
inet connection sock.

In the next patches the 1st part will be generalized to call
the socket dumping routine indirectly (i.e. TCP/UDP/DCCP) and
the 2nd part will be used by TCP and DCCP handlers.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The netlink diag subsys stores sk address bits in the nl message
as a "cookie" and uses it when dumping details about a particular
socket.

The same will be required for the udp diag module, so introduce a helper
in the inet_diag module.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
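The cookie idea in the entry above (address bits of the socket stored in the message, then checked when one particular socket is requested) can be sketched as follows. The split into two 32-bit words and the all-ones "no cookie" marker are assumptions of this sketch, as are the function names.

```python
NOCOOKIE = 0xFFFFFFFF  # assumed "no cookie supplied" marker for each word


def make_cookie(sk_addr: int) -> tuple:
    """Split an address-sized integer into two 32-bit words (low, high)."""
    return (sk_addr & 0xFFFFFFFF, (sk_addr >> 32) & 0xFFFFFFFF)


def check_cookie(sk_addr: int, cookie: tuple) -> bool:
    """Accept if no cookie was supplied, or if it matches the socket's address bits."""
    if cookie == (NOCOOKIE, NOCOOKIE):
        return True
    return cookie == make_cookie(sk_addr)
```

The check lets a requester that learned the cookie from an earlier dump make sure it is still talking about the same socket object.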
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
There's an info_size value stored on inet_diag_handler, but for existing
code this value is effectively constant, so just use sizeof(struct tcp_info)
where required.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Now that RED uses a Q0.32 number to store max_p (max probability), allow
RED/GRED/CHOKE to use/report full resolution at config/dump time.

Old tc binaries are not aware of the new attributes, and still set/get Plog.
New tc binaries set/get both Plog and max_p for backward compatibility,
and display the "probability value" if they get max_p from new kernels.

# tc -d qdisc show dev ...
...
qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
probability 0.09 Scell_log 15

Make sure we avoid potential divides by 0 in reciprocal_value(), if
(max_th - min_th) is big.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
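As an illustration of the Q0.32 representation mentioned in the RED entry above: a probability p is stored as the integer p * 2^32 in an unsigned 32-bit word. The sketch below shows the encoding, the decoding, and one plausible form of the divide-by-zero guard. The function names are hypothetical, and the assumption that the value later handed to reciprocal_value() is max_p // (max_th - min_th) is an inference from the commit message rather than something it states.

```python
Q32_ONE = 1 << 32  # 2**32; Q0.32 stores value * 2**32 in an unsigned 32-bit word


def prob_to_q032(p: float) -> int:
    """Encode a probability in [0, 1] as Q0.32, saturating at the largest code."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    return min(int(p * Q32_ONE), Q32_ONE - 1)


def q032_to_prob(max_p: int) -> float:
    """Decode a Q0.32 value back to a float probability."""
    return max_p / Q32_ONE


def safe_divisor(max_p: int, max_th: int, min_th: int) -> int:
    """Clamp max_p // (max_th - min_th) to at least 1 (assumes max_th > min_th),
    so a later reciprocal computation never divides by zero."""
    return max(max_p // (max_th - min_th), 1)
```

For the 0.09 probability shown in the tc output, the round trip through Q0.32 loses less than 2^-32, and a very large (max_th - min_th) yields a divisor of 1 instead of 0.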
This reverts commit 865d9f9f748fdc1943679ea65d9ee1dc55e4a6ae.
This commit breaks the build with CONFIG_NETPRIO_CGROUP=y so
revert it. It does build as a module though. The SUBSYS macro
in the cgroup core code automatically defines a subsys structure
as extern. Long term we should fix the macro. And I need to
fully build test things.

Tested with CONFIG_NETPRIO_CGROUP={y|m|n} with and without
CONFIG_CGROUPS defined.

Signed-off-by: John Fastabend
CC: Neil Horman
Reported-By: Eric Dumazet
Signed-off-by: David S. Miller
09 Dec, 2011
3 commits
-
These tests are off by one because sock_diag_handlers[] only has AF_MAX
elements.

Signed-off-by: Dan Carpenter
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
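The off-by-one in the sock_diag_handlers[] entry above is the classic confusion between "index <= size" and "index < size". The sketch below contrasts the two checks on a table of AF_MAX slots. The numeric value of AF_MAX, the exact shape of the buggy comparison and the function names are assumptions for illustration.

```python
AF_MAX = 40  # illustrative size only; the real AF_MAX depends on the kernel version

sock_handlers = [None] * AF_MAX  # valid indices are 0 .. AF_MAX - 1


def family_ok_buggy(family: int) -> bool:
    # Off by one: lets family == AF_MAX through, one past the end of the table.
    return 0 <= family <= AF_MAX


def family_ok_fixed(family: int) -> bool:
    # Correct bound for a table with exactly AF_MAX elements.
    return 0 <= family < AF_MAX
```

Only the fixed check rejects family == AF_MAX, the first index past the end of the table.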
net_prio_subsys can be made static; this removes the sparse
warning it was throwing.

Signed-off-by: John Fastabend
Acked-by: Neil Horman
Signed-off-by: David S. Miller
-
The code is missing initialization of NO_FCOE_FLAG and NO_ISCSI*FLAGS
when CONFIG_CNIC is not selected.
This causes a panic during driver load since commit
1d187b34daaecbb87aa523ba46b92930a388cb21, where NO_FCOE is tested
unconditionally (outside the #ifdef BCM_CNIC structure) and
fp[FCOE_IDX], which is not allocated, is accessed.

Reported-by: Eric Dumazet
Signed-off-by: Dmitry Kravkov
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller