Eric Lee / smarc-fsl-linux-kernel

24 Mar, 2011

40 commits

bfdc0b497 sysctl: restrict write access to dmesg_restrict

When dmesg_restrict is set to 1, CAP_SYS_ADMIN is needed to read the kernel
ring buffer. But a root user without CAP_SYS_ADMIN is able to reset
dmesg_restrict to 0.

This is an issue when e.g. LXC (Linux Containers) is used and the complete
user space is running without CAP_SYS_ADMIN. An unprivileged and jailed
root user can bypass the dmesg_restrict protection.

With this patch writing to dmesg_restrict is only allowed when root has
CAP_SYS_ADMIN.
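
The rule can be modelled in user space. This is a minimal sketch, not the kernel code: struct task_model, write_dmesg_restrict() and the boolean capability flag are hypothetical stand-ins for the real sysctl handler and its capability check, and the accepted range of 0 or 1 is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task: uid 0 does not imply CAP_SYS_ADMIN. */
struct task_model {
    unsigned int uid;
    bool cap_sys_admin;
};

/* Models the sysctl value; 1 means the ring buffer is restricted. */
static int dmesg_restrict_model = 1;

/* Write handler sketch: refuse the write unless the writer holds the capability. */
int write_dmesg_restrict(const struct task_model *task, int value)
{
    assert(task != NULL);
    if (!task->cap_sys_admin)
        return -EPERM;          /* a jailed root user ends up here */
    if (value != 0 && value != 1)
        return -EINVAL;         /* assumed valid range: 0 or 1 */
    dmesg_restrict_model = value;
    return 0;
}
```

A writer with uid 0 but without the capability gets -EPERM and the setting stays at 1, which is the behaviour the patch describes for a jailed root user.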

Signed-off-by: Richard Weinberger
Acked-by: Dan Rosenberg
Acked-by: Serge E. Hallyn
Cc: Eric Paris
Cc: Kees Cook
Cc: James Morris
Cc: Eugene Teo
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Richard Weinberger
2011-03-24 10:46:54 +0800
cb16e95fa sysctl: add some missing input constraint checks

Add boundaries of allowed input ranges for: dirty_expire_centisecs,
drop_caches, overcommit_memory, page-cluster and panic_on_oom.
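
A bounded write of this kind can be sketched as follows. The names struct sysctl_model and sysctl_model_write() are hypothetical, and the concrete bounds for each sysctl are not given in this message, so any bounds passed in are assumptions of the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a sysctl entry with inclusive bounds. */
struct sysctl_model {
    const char *name;
    int value;
    int min;
    int max;
};

/* Store new_value only if it lies within [min, max];
 * otherwise leave the entry untouched and report an error. */
int sysctl_model_write(struct sysctl_model *entry, int new_value)
{
    assert(entry != NULL);
    if (new_value < entry->min || new_value > entry->max)
        return -EINVAL;
    entry->value = new_value;
    return 0;
}
```

For example, an entry declared with min 0 and max 2 (an illustrative choice for a three-mode setting) accepts 2 and rejects both 3 and -1 without changing the stored value.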

Signed-off-by: Petr Holasek
Acked-by: Dave Young
Cc: David Rientjes
Cc: Wu Fengguang
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Holasek
2011-03-24 10:46:51 +0800
256c53a65 sysctl_check: drop dead code

Drop dead code.

Signed-off-by: Denis Kirjanov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denis Kirjanov
2011-03-24 10:46:51 +0800
814ecf6e5 sysctl_check: drop table->procname checks

Since the for loop already checks table->procname, drop the useless
table->procname checks inside the loop body.

Signed-off-by: Denis Kirjanov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denis Kirjanov
2011-03-24 10:46:50 +0800
ad4ac17ad rapidio: fix potential null deref on failure path

If rio is not a switch then "rswitch" is null.

Signed-off-by: Dan Carpenter
Cc: Matt Porter
Cc: Kumar Gala
Signed-off-by: Alexandre Bounine
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2011-03-24 10:46:44 +0800
c1256ebe6 rapidio: remove mport resource reservation from common RIO code

Removes resource reservation from the common subsystem initialization code
and makes it part of mport driver initialization. This resolves a conflict
with resource reservation by device-specific mport drivers.

Signed-off-by: Alexandre Bounine
Cc: Kumar Gala
Cc: Matt Porter
Cc: Li Yang
Cc: Thomas Moll
Cc: Micha Nelissen
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-03-24 10:46:43 +0800
569fccb6b rapidio: modify mport ID assignment

Changes mport ID and host destination ID assignment to implement a unified
method common to all mport drivers. Makes the "riohdid=" kernel command line
parameter common to all architectures, with support for more than one host
destination ID assignment.

Signed-off-by: Alexandre Bounine
Cc: Kumar Gala
Cc: Matt Porter
Cc: Li Yang
Cc: Thomas Moll
Cc: Micha Nelissen
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-03-24 10:46:43 +0800
2f809985d rapidio: modify subsystem and driver initialization sequence

Subsystem initialization sequence modified to support presence of multiple
RapidIO controllers in the system. The new sequence is compatible with
initialization of PCI devices.

Signed-off-by: Alexandre Bounine
Cc: Kumar Gala
Cc: Matt Porter
Cc: Li Yang
Cc: Thomas Moll
Cc: Micha Nelissen
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-03-24 10:46:42 +0800
388b78adc rapidio: modify configuration to support PCI-SRIO controller

1. Add an option to include RapidIO support if PCI is available.
2. Add FSL_RIO configuration option to enable controller selection.
3. Add RapidIO support option into x86 and MIPS architectures.

Signed-off-by: Alexandre Bounine
Acked-by: Kumar Gala
Cc: Matt Porter
Cc: Li Yang
Cc: Thomas Moll
Cc: Micha Nelissen
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-03-24 10:46:42 +0800
f8f062698 rapidio: add architecture specific callbacks

This set of patches eliminates RapidIO dependency on PowerPC architecture
and makes it available to other architectures (x86 and MIPS). It also
enables support of new platform independent RapidIO controllers such as
PCI-to-SRIO and PCI Express-to-SRIO.

This patch:

Extend number of mport callback functions to eliminate direct linking of
architecture specific mport operations.

Signed-off-by: Alexandre Bounine
Cc: Kumar Gala
Cc: Matt Porter
Cc: Li Yang
Cc: Thomas Moll
Cc: Micha Nelissen
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-03-24 10:46:41 +0800
e15b4d687 rapidio: add RapidIO documentation

Add RapidIO documentation files, as discussed earlier (see thread
http://marc.info/?l=linux-kernel&m=129202338918062&w=2).

Signed-off-by: Alexandre Bounine
Cc: Kumar Gala
Cc: Matt Porter
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-03-24 10:46:41 +0800
cd8b974fa rapidio: add new sysfs attributes

Add new sysfs attributes.

1. Routing information required to reach the RIO device:
destid - device destination ID (real for endpoint, route for switch)
hopcount - hopcount for maintenance requests (switches only)

2. Device linking information:
lprev - name of device that precedes the given device in the enumeration
or discovery order (displayed along with the port to which it
is attached).
lnext - names of devices (with corresponding port numbers) that are
attached to the given device as next in the enumeration or
discovery order (switches only)

Signed-off-by: Alexandre Bounine
Cc: Kumar Gala
Cc: Matt Porter
Cc: Li Yang
Cc: Thomas Moll
Cc: Micha Nelissen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-03-24 10:46:41 +0800
cfaf346cb drivers/char/mem.c: clean up the code

Reduce the lines of code and simplify the logic.

Signed-off-by: Changli Gao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Changli Gao
2011-03-24 10:46:40 +0800
115bcd156 drivers/staging/tty/specialix.c: convert func_enter to func_exit

Convert calls to func_enter on leaving a function to func_exit.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

//
@@
@@

- func_enter();
+ func_exit();
return...;
//

Signed-off-by: Julia Lawall
Cc: Roger Wolff
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Julia Lawall
2011-03-24 10:46:39 +0800
d9d691f58 drivers/tty/bfin_jtag_comm.c: avoid calling put_tty_driver on NULL

put_tty_driver calls tty_driver_kref_put on its argument, and then
tty_driver_kref_put calls kref_put on the address of a field of this
argument. kref_put checks for NULL, but in this case the field is likely
to have some offset and so the result of taking its address will not be
NULL. Labels are added to be able to skip over the call to put_tty_driver
when the argument will be NULL.
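
The offset argument can be illustrated outside the kernel. In this minimal sketch, struct kref_model, struct driver_model and both functions are hypothetical stand-ins, and the guard is written as an early return rather than the labels the patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kref and struct tty_driver. */
struct kref_model {
    int refcount;
};

struct driver_model {
    int magic;                  /* a leading member pushes kref to a nonzero offset */
    struct kref_model kref;
};

/* Distance in bytes from the start of the driver to its kref member.
 * The address of drv->kref is "drv plus this offset", so when drv is NULL
 * the resulting address is not NULL and a NULL check on it cannot fire. */
size_t kref_offset(void)
{
    return offsetof(struct driver_model, kref);
}

/* Guarded release: only touch the reference count when a driver exists.
 * Returns 1 if a reference was dropped, 0 if there was nothing to do. */
int put_driver_model(struct driver_model *drv)
{
    if (drv == NULL)
        return 0;
    assert(drv->kref.refcount > 0);
    drv->kref.refcount--;
    return 1;
}
```

Because kref is not the first member, kref_offset() is nonzero, which is why the NULL check has to happen on the driver pointer itself before the put call.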

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

//
@@
expression *x;
@@

*if (x == NULL)
{ ...
* put_tty_driver(x);
...
return ...;
}
//

Signed-off-by: Julia Lawall
Cc: Torben Hohn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Julia Lawall
2011-03-24 10:46:39 +0800
73210a135 drivers/char: add MSM smd_pkt driver

Add smd_pkt driver which provides device interface to smd packet ports.

Signed-off-by: Niranjana Vishwanathapura
Cc: Brian Swetland
Cc: Greg KH
Cc: Alan Cox
Cc: David Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Niranjana Vishwanathapura
2011-03-24 10:46:38 +0800
0dcf334c4 drivers/char/ipmi/ipmi_si_intf.c: fix cleanup_one_si section mismatch

commit d2478521afc2022 ("char/ipmi: fix OOPS caused by
pnp_unregister_driver on unregistered driver") introduced a section
mismatch by calling __exit cleanup_ipmi_si from __devinit init_ipmi_si.

Remove __exit annotation from cleanup_ipmi_si.

Signed-off-by: Sergey Senozhatsky
Acked-by: Corey Minyard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sergey Senozhatsky
2011-03-24 10:46:38 +0800
5883f57ca proc: protect mm start_code/end_code in /proc/pid/stat

While mm->start_stack was protected from cross-uid viewing (commit
f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged
processes")), the start_code and end_code values were not. This would
allow the text location of a PIE binary to leak, defeating ASLR.

Note that the value "1" is used instead of "0" for a protected value since
"ps", "killall", and likely other readers of /proc/pid/stat, take
start_code of "0" to mean a kernel thread and will misbehave. Thanks to
Brad Spengler for pointing this out.
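
The choice of placeholder can be sketched as below. struct mm_model, the function names and the permitted flag are hypothetical; the flag stands for whatever access check the kernel performs, which this message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the address fields reported in /proc/pid/stat. */
struct mm_model {
    unsigned long start_code;
    unsigned long end_code;
};

/* Report the real address to permitted readers and the placeholder 1 to
 * everyone else. 0 is avoided because readers treat it as "kernel thread". */
unsigned long stat_address(unsigned long real, bool permitted)
{
    return permitted ? real : 1UL;
}

/* How a reader such as ps is described as interpreting the field. */
bool looks_like_kernel_thread(unsigned long reported_start_code)
{
    return reported_start_code == 0;
}

/* Fill out[0] and out[1] with the reported start_code and end_code. */
void report_code_range(const struct mm_model *mm, bool permitted,
                       unsigned long out[2])
{
    assert(mm != NULL);
    out[0] = stat_address(mm->start_code, permitted);
    out[1] = stat_address(mm->end_code, permitted);
}
```

With the placeholder 1, an unprivileged reader learns nothing about the text location, and a tool that tests start_code against 0 still classifies the task as a user process.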

Addresses CVE-2011-0726

Signed-off-by: Kees Cook
Cc:
Cc: Alexey Dobriyan
Cc: David Howells
Cc: Eugene Teo
Cc: Martin Schwidefsky
Cc: Brad Spengler
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2011-03-24 10:46:37 +0800
312ec7e50 proc: make struct proc_dir_entry::namelen unsigned int

1. namelen is declared "unsigned short", which hints at "maybe space savings".
Indeed, in 2.4 struct proc_dir_entry looked like:

struct proc_dir_entry {
unsigned short low_ino;
unsigned short namelen;

Now low_ino is "unsigned int", so any savings have been gone for a long time.
There are not enough "struct proc_dir_entry" instances to worry about its
size, anyway.

2. Converting from unsigned short to int/unsigned int can only create
problems, so we had better play it safe.

Space is not really conserved, because of natural alignment for the next
field. sizeof(struct proc_dir_entry) remains the same.
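
The alignment argument can be checked with two reduced layouts. This is a sketch under assumptions: only three members are modelled, the pointer member following namelen is an assumed layout, and the result holds on ABIs where pointers are aligned to at least 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced layout with the old member type. */
struct pde_short {
    unsigned int low_ino;
    unsigned short namelen;     /* followed by padding up to the pointer's alignment */
    const char *name;
};

/* Hypothetical reduced layout with the new member type. */
struct pde_int {
    unsigned int low_ino;
    unsigned int namelen;       /* takes over the bytes that were padding */
    const char *name;
};

/* Returns 1 when widening namelen moves neither the next member
 * nor the end of the structure. */
int layouts_match(void)
{
    assert(sizeof(unsigned int) >= sizeof(unsigned short));
    return offsetof(struct pde_short, name) == offsetof(struct pde_int, name)
        && sizeof(struct pde_short) == sizeof(struct pde_int);
}
```

On common 32-bit and 64-bit ABIs layouts_match() returns 1, since the two bytes after the short were padding before the change.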

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2011-03-24 10:46:37 +0800
fc3d8767b procfs: fix some wrong error code usage

[root@wei 1]# cat /proc/1/mem
cat: /proc/1/mem: No such process

The error code -ESRCH is wrong in this situation. Return -EPERM instead.

Signed-off-by: Jovi Zhang
Reviewed-by: KOSAKI Motohiro
Cc: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jovi Zhang
2011-03-24 10:46:36 +0800
0db0c01b5 procfs: fix /proc/<pid>/maps heap check

The current code fails to print the "[heap]" marking if the heap is split
into multiple mappings.

Fix the check so that the marking is displayed in all possible cases:
1. vma matches exactly the heap
2. the heap vma is merged e.g. with bss
3. the heap vma is split e.g. due to locked pages
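
The three cases can be sketched as an interval test. This is not the kernel check: struct range_model and both functions are hypothetical, and covers_heap() models one reading of the failure described above, namely a test that requires a single mapping to span the whole heap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical half-open address range [start, end). */
struct range_model {
    unsigned long start;
    unsigned long end;
};

/* Cover test: true only when one mapping spans the whole heap,
 * so every piece of a split heap is missed. */
bool covers_heap(struct range_model vma, struct range_model heap)
{
    return vma.start <= heap.start && vma.end >= heap.end;
}

/* Overlap test: true for exact, merged and split mappings alike. */
bool overlaps_heap(struct range_model vma, struct range_model heap)
{
    assert(heap.start <= heap.end);
    return vma.start < heap.end && vma.end > heap.start;
}
```

With a heap of 0x11000-0x13000 split into two mappings of one page each, covers_heap() is false for both halves while overlaps_heap() is true for both; a text mapping at 0x8000-0x9000 matches neither test.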

Test cases. In all cases, the process should have mapping(s) with
[heap] marking:

(1) vma matches exactly the heap

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main (void)
{
    if (sbrk(4096) != (void *)-1) {
        printf("check /proc/%d/maps\n", (int)getpid());
        while (1)
            sleep(1);
    }
    return 0;
}

# ./test1
check /proc/553/maps
[1] + Stopped ./test1
# cat /proc/553/maps | head -4
00008000-00009000 r-xp 00000000 01:00 3113640 /test1
00010000-00011000 rw-p 00000000 01:00 3113640 /test1
00011000-00012000 rw-p 00000000 00:00 0 [heap]
4006f000-40070000 rw-p 00000000 00:00 0

(2) the heap vma is merged

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

char foo[4096] = "foo";
char bar[4096];

int main (void)
{
    if (sbrk(4096) != (void *)-1) {
        printf("check /proc/%d/maps\n", (int)getpid());
        while (1)
            sleep(1);
    }
    return 0;
}

# ./test2
check /proc/556/maps
[2] + Stopped ./test2
# cat /proc/556/maps | head -4
00008000-00009000 r-xp 00000000 01:00 3116312 /test2
00010000-00012000 rw-p 00000000 01:00 3116312 /test2
00012000-00014000 rw-p 00000000 00:00 0 [heap]
4004a000-4004b000 rw-p 00000000 00:00 0

(3) the heap vma is split (this fails without the patch)

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int main (void)
{
    if ((sbrk(4096) != (void *)-1) && !mlockall(MCL_FUTURE) &&
        (sbrk(4096) != (void *)-1)) {
        printf("check /proc/%d/maps\n", (int)getpid());
        while (1)
            sleep(1);
    }
    return 0;
}

# ./test3
check /proc/559/maps
[1] + Stopped ./test3
# cat /proc/559/maps|head -4
00008000-00009000 r-xp 00000000 01:00 3119108 /test3
00010000-00011000 rw-p 00000000 01:00 3119108 /test3
00011000-00012000 rw-p 00000000 00:00 0 [heap]
00012000-00013000 rw-p 00000000 00:00 0 [heap]

It looks like the bug has been there forever, and since it only results in
some information missing from a procfile, it does not fulfil the -stable
"critical issue" criteria.

Signed-off-by: Aaro Koskinen
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aaro Koskinen
2011-03-24 10:46:36 +0800
51e031496 proc: hide kernel addresses via %pK in /proc/<pid>/stack

This file is readable by the task owner. Hide kernel addresses from
unprivileged users, leaving them function names and offsets.

Signed-off-by: Konstantin Khlebnikov
Acked-by: Kees Cook
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2011-03-24 10:46:36 +0800
523fb486b cpuset: hold callback_mutex in cpuset_post_clone()

Changing cpuset->mems/cpuset->cpus should be protected under
callback_mutex.

cpuset_post_clone() doesn't follow this rule. It's ok because it's
called when creating and initializing a cgroup, but we'd better
hold the lock to avoid subtle breakage in the future.

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Acked-by: David Rientjes
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2011-03-24 10:46:35 +0800
ee24d3797 cpuset: fix unchecked calls to NODEMASK_ALLOC()

Those functions that use NODEMASK_ALLOC() can't propagate errno
to users, but will fail silently.

Fix it by using a static nodemask_t variable for each function, and
those variables are protected by cgroup_mutex.

[akpm@linux-foundation.org: fix comment spelling, strengthen cgroup_lock comment]
Signed-off-by: Li Zefan
Cc: Paul Menage
Acked-by: David Rientjes
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2011-03-24 10:46:35 +0800
c8163ca8a cpuset: remove unneeded NODEMASK_ALLOC() in cpuset_attach()

oldcs->mems_allowed is not modified during cpuset_attach(), so we don't
have to copy it to a buffer allocated by NODEMASK_ALLOC(). Just pass it
to cpuset_migrate_mm().

Signed-off-by: Li Zefan
Cc: Paul Menage
Acked-by: David Rientjes
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2011-03-24 10:46:34 +0800
9303e0c48 cpuset: remove unneeded NODEMASK_ALLOC() in cpuset_sprintf_memlist()

It's not necessary to copy cpuset->mems_allowed to a buffer allocated by
NODEMASK_ALLOC(). Just pass it to nodelist_scnprintf().

As spotted by Paul, a side effect is that we fix a bug where the function can
return -ENOMEM but the caller doesn't expect a negative return value.
Therefore change the return value of cpuset_sprintf_cpulist() and
cpuset_sprintf_memlist() from int to size_t.

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Acked-by: David Rientjes
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2011-03-24 10:46:34 +0800
f9434ad15 memcg: give current access to memory reserves if it's trying to die

When a memcg is oom and current has already received a SIGKILL, then give
it access to memory reserves with a higher scheduling priority so that it
may quickly exit and free its memory.

This is identical to the global oom killer and is done even before
checking for panic_on_oom: a pending SIGKILL here while panic_on_oom is
selected is guaranteed to have come from userspace; the thread only needs
access to memory reserves to exit and thus we don't unnecessarily panic
the machine until the kernel has no last resort to free memory.

Signed-off-by: David Rientjes
Cc: Balbir Singh
Cc: Daisuke Nishimura
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2011-03-24 10:46:33 +0800
5a6475a4e memcg: fix leak on wrong LRU with FUSE

fs/fuse/dev.c::fuse_try_move_page() does

(1) remove a page by ->steal()
(2) re-add the page to page cache
(3) link the page to LRU if it was not on LRU at (1)

This implies the page is _on_ LRU when it's added to radix-tree. So, the
page is added to memory cgroup while it's on LRU, because LRU is lazy and
no one flushes it.

This is the same behavior as SwapCache and needs special care:
- remove page from LRU before overwriting pc->mem_cgroup.
- add page to LRU after overwriting pc->mem_cgroup.

And we need to take care of pagevec.

If PageLRU(page) is set before we add PCG_USED bit, the page will not be
added to memcg's LRU (for a short period). So, regardless of PageLRU(page)
value before commit_charge(), we need to check PageLRU(page) after
commit_charge().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30432

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Johannes Weiner
Acked-by: Daisuke Nishimura
Cc: Miklos Szeredi
Cc: Balbir Singh
Reported-by: Daniel Poelzleithner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2011-03-24 10:46:33 +0800
6cfddb261 memcg: page_cgroup array is never stored on reserved pages

KAMEZAWA Hiroyuki noted that free_pages_cgroup doesn't have to check for
PageReserved because we never store the array on reserved pages (neither
alloc_pages_exact nor vmalloc use those pages).

So we can replace the check by a BUG_ON.

Signed-off-by: Michal Hocko
Acked-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-03-24 10:46:33 +0800
dde79e005 page_cgroup: reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM

Currently we are allocating a single page_cgroup array per memory section
(stored in mem_section->base) when CONFIG_SPARSEMEM is selected. This is
a correct but memory-inefficient solution because the allocated memory
(unless we fall back to vmalloc) is not kmalloc friendly:

- 32b - 16384 entries (20B per entry) need 327680B, so the
524288B slab cache is used
- 32b with PAE - 131072 entries need 2621440B, so the 4194304B cache is used
- 64b - 32768 entries (40B per entry) need 1310720B, so the 2097152B cache is used

This is ~37% wasted space per memory section and it sums up for the whole
memory. On an x86_64 machine it is something like 6MB per 1GB of RAM.
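
The waste figures can be reproduced with a short calculation. This sketch assumes power-of-two cache sizes, as the list above implies; pow2_bucket() and wasted_bytes() are hypothetical helpers, and the figure of 8 memory sections per 1GB used in the note below is an assumption not stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= n, modelling a power-of-two slab cache size. */
size_t pow2_bucket(size_t n)
{
    size_t bucket = 1;

    /* Reject 0 and values whose bucket would not fit in size_t. */
    assert(n > 0 && n <= (((size_t)-1) >> 1) + 1);
    while (bucket < n)
        bucket <<= 1;
    return bucket;
}

/* Bytes lost to rounding when n bytes are served from such a cache. */
size_t wasted_bytes(size_t n)
{
    return pow2_bucket(n) - n;
}
```

For the 64-bit case, 32768 entries of 40B need 1310720B and land in the 2097152B cache, so 786432B (37.5%) are lost per section. With 8 sections per 1GB that adds up to 6MB, matching the figure quoted above.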

We can reduce the internal fragmentation by using alloc_pages_exact which
allocates PAGE_SIZE aligned blocks so we will get down to
Cc: Dave Hansen
Acked-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-03-24 10:46:32 +0800
4be4489fe mm/memcontrol.c: suppress uninitialized-var warning with older gcc's

mm/memcontrol.c: In function 'mem_cgroup_force_empty':
mm/memcontrol.c:2280: warning: 'flags' may be used uninitialized in this function

It's a false positive.

Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2011-03-24 10:46:32 +0800
7a159cc9d memcg: use native word page statistics counters

The statistic counters are in units of pages, so there is no reason to make
them 64-bit wide on 32-bit machines.

Make them native words. Since they are signed, this leaves 31 bits on
32-bit machines, which can represent roughly 8TB assuming a page size of
4k.
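
The 8TB figure follows from simple arithmetic, sketched here with a hypothetical helper. It returns 2^value_bits pages times the page size, which is one page more than the largest value a signed counter with that many value bits can hold, hence "roughly".

```c
#include <assert.h>
#include <stdint.h>

/* Approximate number of bytes a page counter with value_bits usable bits
 * can account for: 2^value_bits pages of page_size bytes each.
 * The caller must keep the product within 64 bits. */
uint64_t max_tracked_bytes(unsigned int value_bits, uint64_t page_size)
{
    assert(value_bits < 64);
    return (UINT64_C(1) << value_bits) * page_size;
}
```

With 31 value bits and 4096-byte pages this gives 2^31 * 2^12 = 2^43 bytes, which is 8TB.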

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Johannes Weiner
Signed-off-by: Greg Thelen
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:31 +0800
e9f8974f2 memcg: break out event counters from other stats

For increasing and decreasing per-cpu cgroup usage counters it makes sense
to use signed types, as single per-cpu values might go negative during
updates. But this is not the case for only-ever-increasing event
counters.

All the counters have been signed 64-bit so far, which was enough to count
events even with the sign bit wasted.

This patch:
- divides s64 counters into signed usage counters and unsigned
monotonically increasing event counters.
- converts unsigned event counters into 'unsigned long' rather than
'u64'. This matches the type used by the /proc/vmstat event counters.

The next patch narrows the signed usage counters type (on 32-bit CPUs,
that is).

Signed-off-by: Johannes Weiner
Signed-off-by: Greg Thelen
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:31 +0800
7ec99d621 memcg: unify charge/uncharge quantities to units of pages

There is no clear pattern when we pass a page count and when we pass a
byte count that is a multiple of PAGE_SIZE.

We never charge or uncharge subpage quantities, so convert it all to page
counts.

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:30 +0800
7ffd4ca7a memcg: convert uncharge batching from bytes to page granularity

We never uncharge subpage quantities.

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:30 +0800
11c9ea4e8 memcg: convert per-cpu stock from bytes to page granularity

We never keep subpage quantities in the per-cpu stock.

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:29 +0800
e7018b8d2 memcg: keep only one charge cancelling function

We have two charge cancelling functions: one takes a page count, the other
a page size. The second one just divides the parameter by PAGE_SIZE and
then calls the first one. This is trivial, no need for an extra function.

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:29 +0800
bf1ff2635 memcg: remove memcg->reclaim_param_lock

The reclaim_param_lock is only taken around single reads and writes to
integer variables and is thus superfluous. Drop it.

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Reviewed-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:29 +0800
4dc03de1b memcg: charged pages always have valid per-memcg zone info

page_cgroup_zoneinfo() will never return NULL for a charged page, remove
the check for it in mem_cgroup_get_reclaim_stat_from_page().

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Reviewed-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:28 +0800
6b3ae58ef memcg: remove direct page_cgroup-to-page pointer

In struct page_cgroup, we have a full word for flags but only a few are
reserved. Use the remaining upper bits to encode, depending on
configuration, the node or the section, to enable page_cgroup-to-page
lookups without a direct pointer.

This saves a full word for every page in a system with memory cgroups
enabled.
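
Packing an identifier into the unused upper bits of a flags word can be sketched as follows. The PCG_MODEL_* constants, the 10-bit id width and the function names are hypothetical and chosen for illustration; the real bit layout is not given in this message.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical split of the flags word: low bits for flags, high bits for an id. */
#define PCG_MODEL_ID_BITS   10u
#define PCG_MODEL_WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define PCG_MODEL_ID_SHIFT  (PCG_MODEL_WORD_BITS - PCG_MODEL_ID_BITS)
#define PCG_MODEL_FLAG_MASK ((1UL << PCG_MODEL_ID_SHIFT) - 1)

/* Store id in the upper bits while preserving the flag bits below. */
unsigned long pcg_model_set_id(unsigned long word, unsigned long id)
{
    assert(id < (1UL << PCG_MODEL_ID_BITS));
    return (word & PCG_MODEL_FLAG_MASK) | (id << PCG_MODEL_ID_SHIFT);
}

/* Recover the id (for example a node or section number) from the word. */
unsigned long pcg_model_get_id(unsigned long word)
{
    return word >> PCG_MODEL_ID_SHIFT;
}

/* Recover the flag bits, ignoring the id stored above them. */
unsigned long pcg_model_get_flags(unsigned long word)
{
    return word & PCG_MODEL_FLAG_MASK;
}
```

Once the id is recoverable from the flags word, the page can be looked up from the node or section it names, so the separate pointer member is no longer needed.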

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Cc: Minchan Kim
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-03-24 10:46:28 +0800