05 Dec, 2011

1 commit

  • Commit 30765b92 ("slab, lockdep: Annotate the locks before using
    them") moves the init_lock_keys() call from after g_cpucache_up =
    FULL to before it, but overlooks the fact that init_node_lock_keys()
    tests for that state and ignores everything !FULL.

    Introduce a LATE stage and change the lockdep test to be < LATE.
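
    A minimal sketch of the resulting gating, assuming the slab bootstrap
    enum and init_node_lock_keys() described above (enum members other
    than LATE/FULL are illustrative):

        /* sketch: add a LATE state that is reached before FULL */
        static enum { NONE, PARTIAL, EARLY, LATE, FULL } g_cpucache_up;

        static void init_node_lock_keys(int node)
        {
                /* was: if (g_cpucache_up != FULL) return; */
                if (g_cpucache_up < LATE)
                        return;
                /* ... annotate the per-node list_lock classes ... */
        }
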
    Cc: Pekka Enberg
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 Sep, 2011

1 commit

  • Historically /proc/slabinfo and the files under /sys/kernel/slab/*
    have been world-readable. slabinfo contains rather private
    information related both to the kernel and to userspace tasks.
    Depending on the situation, it might reveal either private
    information per se or information useful for mounting another
    targeted attack. Some examples of what can be learned by
    reading/watching /proc/slabinfo entries:

    1) dentry (and the various *inode*) counts might reveal other processes'
    fs activity. The number of dentry "active objects" doesn't strictly show
    the number of files opened/touched by a process; however, there is a good
    correlation between them. The patch "proc: force dcache drop on
    unauthorized access" relies on the privacy of the dentry count.

    2) the various inode entries might reveal the same information as (1), but
    these are more finely grained counters. If a filesystem is mounted in a
    private mount point (or even a private namespace) and its fs type differs
    from other mounted fs types, fs activity in this mount point/namespace is
    revealed. If there is a single ecryptfs mount point, the whole fs
    activity of a single user is revealed. The number of files in an ecryptfs
    mount point is private information per se.

    3) fuse_* reveals the number of files / fs activity of a user in a
    user-private mount point. It is of approximately the same severity as the
    ecryptfs infoleak in (2).

    4) sysfs_dir_cache, similar to (2), reveals device addition/removal,
    which could otherwise be hidden by "chmod 0700 /sys/". With a 0444
    slabinfo the precise number of sysfs files is known to the world.

    5) buffer_head might reveal some kernel activity. With other
    information leaks an attacker might identify what specific kernel
    routines generate buffer_head activity.

    6) *kmalloc* infoleaks are very situational. An attacker should watch
    the specific kmalloc size entry and filter out the noise from unrelated
    kernel activity. If the victim system is relatively quiet, the attacker
    might get rather precise counters.

    Additional information sources might significantly increase the slabinfo
    infoleak benefits. E.g. if an attacker knows that the processes
    activity on the system is very low (only core daemons like syslog and
    cron), he may run setxid binaries / trigger local daemon activity /
    trigger network services activity / await sporadic cron jobs activity
    / etc. and get rather precise counters for fs and network activity of
    these privileged tasks, which is unknown otherwise.

    Also, hiding slabinfo and /sys/kernel/slab/* is one step toward
    complicating exploitation of kernel heap overflows (and possibly other
    bugs). The related discussion:

    http://thread.gmane.org/gmane.linux.kernel/1108378

    To keep compatibility with the old permission model, where a non-root
    monitoring daemon could watch for kernel memory leaks through slabinfo,
    one should do:

    groupadd slabinfo
    usermod -a -G slabinfo $MONITOR_USER

    And add the following commands to init scripts (to mountall.conf in
    Ubuntu's upstart case):

    chmod g+r /proc/slabinfo /sys/kernel/slab/*/*
    chgrp slabinfo /proc/slabinfo /sys/kernel/slab/*/*

    Signed-off-by: Vasiliy Kulikov
    Reviewed-by: Kees Cook
    Reviewed-by: Dave Hansen
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    CC: Valdis.Kletnieks@vt.edu
    CC: Linus Torvalds
    CC: Alan Cox
    Signed-off-by: Pekka Enberg

    Vasiliy Kulikov
     

19 Sep, 2011

1 commit


04 Aug, 2011

2 commits

  • Fernando found we hit the regular OFF_SLAB 'recursion' before we
    annotate the locks, cure this.

    The relevant portion of the stack-trace:

    > [ 0.000000] [] rt_spin_lock+0x50/0x56
    > [ 0.000000] [] __cache_free+0x43/0xc3
    > [ 0.000000] [] kmem_cache_free+0x6c/0xdc
    > [ 0.000000] [] slab_destroy+0x4f/0x53
    > [ 0.000000] [] free_block+0x94/0xc1
    > [ 0.000000] [] do_tune_cpucache+0x10b/0x2bb
    > [ 0.000000] [] enable_cpucache+0x7b/0xa7
    > [ 0.000000] [] kmem_cache_init_late+0x1f/0x61
    > [ 0.000000] [] start_kernel+0x24c/0x363
    > [ 0.000000] [] i386_start_kernel+0xa9/0xaf

    Reported-by: Fernando Lopez-Lezcano
    Acked-by: Pekka Enberg
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311888176.2617.379.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Lockdep thinks there's lock recursion through:

    kmem_cache_free()
      cache_flusharray()
        spin_lock(&l3->list_lock)        <--.
          ...                               |
            spin_lock(&l3->list_lock)     --'

    Now debug objects doesn't use SLAB_DESTROY_BY_RCU and hence there is no
    actual possibility of recursing. Luckily debug objects marks its slab
    with SLAB_DEBUG_OBJECTS so we can identify the thing.

    Mark all SLAB_DEBUG_OBJECTS (all one!) slab caches with a special
    lockdep key so that lockdep sees it's a different cachep.

    Also add a WARN on trying to create a SLAB_DESTROY_BY_RCU |
    SLAB_DEBUG_OBJECTS cache, to avoid possible future trouble.
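
    A rough sketch of the two changes (the key and variable names here are
    hypothetical, not the exact patch):

        /* a dedicated lockdep class for the debug-objects cache */
        static struct lock_class_key debugobj_l3_key;   /* hypothetical name */

        if (cachep->flags & SLAB_DEBUG_OBJECTS)
                lockdep_set_class(&l3->list_lock, &debugobj_l3_key);

        /* warn if anyone ever creates the combination that could recurse */
        WARN_ON_ONCE((flags & SLAB_DESTROY_BY_RCU) &&
                     (flags & SLAB_DEBUG_OBJECTS));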

    Reported-and-tested-by: Sebastian Siewior
    [ fixes to the initial patch ]
    Reported-by: Thomas Gleixner
    Acked-by: Pekka Enberg
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311341165.27400.58.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

01 Aug, 2011

1 commit

  • Less code and the advantage of ascii dump.

    before:
    | Slab corruption: names_cache start=c5788000, len=4096
    | 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00
    | 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    | 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff
    | 030: ff ff ff ff e2 b4 17 18 c7 e4 08 06 00 01 08 00
    | 040: 06 04 00 01 e2 b4 17 18 c7 e4 0a 00 00 01 00 00
    | 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b

    after:
    | Slab corruption: size-4096 start=c38a9000, len=4096
    | 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00 kk....V...$...*.
    | 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    | 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ................
    | 030: ff ff ff ff d2 56 5f aa db 9c 08 06 00 01 08 00 .....V_.........
    | 040: 06 04 00 01 d2 56 5f aa db 9c 0a 00 00 01 00 00 .....V_.........
    | 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b ........kkkkkkkk

    Acked-by: Christoph Lameter
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Pekka Enberg

    Sebastian Andrzej Siewior
     

31 Jul, 2011

1 commit


28 Jul, 2011

1 commit


23 Jul, 2011

1 commit

  • * 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slab: fix DEBUG_SLAB warning
    slab: shrink sizeof(struct kmem_cache)
    slab: fix DEBUG_SLAB build
    SLUB: Fix missing include
    slub: reduce overhead of slub_debug
    slub: Add method to verify memory is not freed
    slub: Enable backtrace for create/delete points
    slab allocators: Provide generic description of alignment defines
    slab, slub, slob: Unify alignment definition
    slob/lockdep: Fix gfp flags passed to lockdep

    Linus Torvalds
     

22 Jul, 2011

1 commit

  • In commit c225150b "slab: fix DEBUG_SLAB build",
    "if ((unsigned long)objp & (ARCH_SLAB_MINALIGN-1))" is always true if
    ARCH_SLAB_MINALIGN == 0. Do not print warning if ARCH_SLAB_MINALIGN == 0.
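
    The resulting check looks roughly like this (sketch; the message text is
    illustrative):

        /* skip the alignment warning entirely when ARCH_SLAB_MINALIGN is 0 */
        if (ARCH_SLAB_MINALIGN &&
            ((unsigned long)objp & (ARCH_SLAB_MINALIGN - 1)))
                printk(KERN_ERR "0x%p: not aligned to ARCH_SLAB_MINALIGN=%d\n",
                       objp, (int)ARCH_SLAB_MINALIGN);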

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Pekka Enberg

    Tetsuo Handa
     

21 Jul, 2011

1 commit

  • Reduce high order allocations for some setups.
    (NR_CPUS=4096 -> we need 64KB per kmem_cache struct)

    We now allocate the exact size needed (using nr_cpu_ids and nr_node_ids).

    This also makes the code a bit smaller on x86_64, since some field offsets
    now fit below the 127-byte (8-bit displacement) limit:

    Before patch :
    # size mm/slab.o
    text data bss dec hex filename
    22605 361665 32 384302 5dd2e mm/slab.o

    After patch :
    # size mm/slab.o
    text data bss dec hex filename
    22349 353473 8224 384046 5dc2e mm/slab.o
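
    A hedged sketch of the sizing idea (the layout shown is illustrative, not
    the exact struct): the descriptor is sized by what this machine actually
    has, rather than by compile-time maxima:

        /* sketch: size by nr_cpu_ids/nr_node_ids, not NR_CPUS/MAX_NUMNODES */
        size_t sz = sizeof(struct kmem_cache)
                  + nr_cpu_ids  * sizeof(struct array_cache *)
                  + nr_node_ids * sizeof(struct kmem_list3 *);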

    CC: Andrew Morton
    Reported-by: Konstantin Khlebnikov
    Signed-off-by: Eric Dumazet
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Eric Dumazet
     

18 Jul, 2011

1 commit

  • Fix CONFIG_SLAB=y CONFIG_DEBUG_SLAB=y build error and warnings.

    Now that ARCH_SLAB_MINALIGN defaults to __alignof__(unsigned long long),
    it is always defined (when slab.h is included), but cannot be used in #if:
    mm/slab.c: In function `cache_alloc_debugcheck_after':
    mm/slab.c:3156:5: warning: "__alignof__" is not defined
    mm/slab.c:3156:5: error: missing binary operator before token "("
    make[1]: *** [mm/slab.o] Error 1

    So just remove the #if and #endif lines, but then 64-bit build warns:
    mm/slab.c: In function `cache_alloc_debugcheck_after':
    mm/slab.c:3156:6: warning: cast from pointer to integer of different size
    mm/slab.c:3158:10: warning: format `%d' expects type `int', but argument
    3 has type `long unsigned int'
    Fix those with casts, whatever the actual type of ARCH_SLAB_MINALIGN.

    Acked-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Pekka Enberg

    Hugh Dickins
     

04 Jun, 2011

1 commit

  • Currently, when using CONFIG_DEBUG_SLAB, we put in kfree() or
    kmem_cache_free() as the last user of free objects, which is not
    very useful, so change it to the caller of those functions instead.

    Acked-by: David Rientjes
    Acked-by: Christoph Lameter
    Signed-off-by: Suleiman Souhlal
    Signed-off-by: Pekka Enberg

    Suleiman Souhlal
     

21 May, 2011

1 commit

  • Commit e66eed651fd1 ("list: remove prefetching from regular list
    iterators") removed the include of prefetch.h from list.h, which
    uncovered several cases that had apparently relied on that rather
    obscure header file dependency.

    So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

    to guide us in finding files that either need <linux/prefetch.h>
    inclusion, or have it despite not needing it.

    There are more of them around (mostly network drivers), but this gets
    many core ones.

    Reported-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • While looking at some other notifier callbacks I noticed this code could
    use a simple cleanup.

    The caller of notifier_from_errno() no longer needs the if (ret)/else
    conditional; that same conditional is now done inside notifier_from_errno()
    itself.
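
    In other words, the cleanup is of this shape (sketch):

        /* before */
        if (err)
                ret = notifier_from_errno(err);
        else
                ret = NOTIFY_OK;

        /* after: notifier_from_errno(0) already returns NOTIFY_OK */
        ret = notifier_from_errno(err);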

    Signed-off-by: Prarit Bhargava
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
     

12 Mar, 2011

3 commits


14 Feb, 2011

1 commit

  • This reverts commit 5c5e3b33b7cb959a401f823707bee006caadd76e.

    The commit breaks ARM thusly:

    | Mount-cache hash table entries: 512
    | slab error in verify_redzone_free(): cache `idr_layer_cache': memory outside object was overwritten
    | Backtrace:
    | [] (dump_backtrace+0x0/0x110) from [] (dump_stack+0x18/0x1c)
    | [] (dump_stack+0x0/0x1c) from [] (__slab_error+0x28/0x30)
    | [] (__slab_error+0x0/0x30) from [] (cache_free_debugcheck+0x1c0/0x2b8)
    | [] (cache_free_debugcheck+0x0/0x2b8) from [] (kmem_cache_free+0x3c/0xc0)
    | [] (kmem_cache_free+0x0/0xc0) from [] (ida_get_new_above+0x19c/0x1c0)
    | [] (ida_get_new_above+0x0/0x1c0) from [] (alloc_vfsmnt+0x54/0x144)
    | [] (alloc_vfsmnt+0x0/0x144) from [] (vfs_kern_mount+0x30/0xec)
    | [] (vfs_kern_mount+0x0/0xec) from [] (kern_mount_data+0x1c/0x20)
    | [] (kern_mount_data+0x0/0x20) from [] (sysfs_init+0x68/0xc8)
    | [] (sysfs_init+0x0/0xc8) from [] (mnt_init+0x90/0x1b0)
    | [] (mnt_init+0x0/0x1b0) from [] (vfs_caches_init+0x100/0x140)
    | [] (vfs_caches_init+0x0/0x140) from [] (start_kernel+0x2e8/0x368)
    | [] (start_kernel+0x0/0x368) from [] (__enable_mmu+0x0/0x2c)
    | c0113268: redzone 1:0xd84156c5c032b3ac, redzone 2:0xd84156c5635688c0.
    | slab error in cache_alloc_debugcheck_after(): cache `idr_layer_cache': double free, or memory outside object was overwritten
    | ...
    | c011307c: redzone 1:0x9f91102ffffffff, redzone 2:0x9f911029d74e35b
    | slab: Internal list corruption detected in cache 'idr_layer_cache'(24), slabp c0113000(16). Hexdump:
    |
    | 000: 20 4f 10 c0 20 4f 10 c0 7c 00 00 00 7c 30 11 c0
    | 010: 10 00 00 00 10 00 00 00 00 00 c9 17 fe ff ff ff
    | 020: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
    | 030: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
    | 040: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
    | 050: fe ff ff ff fe ff ff ff fe ff ff ff 11 00 00 00
    | 060: 12 00 00 00 13 00 00 00 14 00 00 00 15 00 00 00
    | 070: 16 00 00 00 17 00 00 00 c0 88 56 63
    | kernel BUG at /home/rmk/git/linux-2.6-rmk/mm/slab.c:2928!

    Reference: https://lkml.org/lkml/2011/2/7/238
    Cc: stable@kernel.org # 2.6.35.y and later
    Reported-and-analyzed-by: Russell King
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     

24 Jan, 2011

1 commit


15 Jan, 2011

1 commit


11 Jan, 2011

1 commit


08 Jan, 2011

2 commits

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
    as per Tejun.

    Linus Torvalds
     
  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits)
    usb: don't use flush_scheduled_work()
    speedtch: don't abuse struct delayed_work
    media/video: don't use flush_scheduled_work()
    media/video: explicitly flush request_module work
    ioc4: use static work_struct for ioc4_load_modules()
    init: don't call flush_scheduled_work() from do_initcalls()
    s390: don't use flush_scheduled_work()
    rtc: don't use flush_scheduled_work()
    mmc: update workqueue usages
    mfd: update workqueue usages
    dvb: don't use flush_scheduled_work()
    leds-wm8350: don't use flush_scheduled_work()
    mISDN: don't use flush_scheduled_work()
    macintosh/ams: don't use flush_scheduled_work()
    vmwgfx: don't use flush_scheduled_work()
    tpm: don't use flush_scheduled_work()
    sonypi: don't use flush_scheduled_work()
    hvsi: don't use flush_scheduled_work()
    xen: don't use flush_scheduled_work()
    gdrom: don't use flush_scheduled_work()
    ...

    Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c
    as per Tejun.

    Linus Torvalds
     

07 Jan, 2011

1 commit


17 Dec, 2010

1 commit

  • __get_cpu_var() can be replaced with this_cpu_read and will then use a
    single read instruction with implied address calculation to access the
    correct per cpu instance.

    However, the address of a per cpu variable passed to __this_cpu_read()
    cannot be determined (since it's an implied address conversion through
    segment prefixes). Therefore apply this only to uses of __get_cpu_var
    where the address of the variable is not used.
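
    For example (the per-cpu variable here is purely illustrative):

        DEFINE_PER_CPU(int, slab_counter);      /* hypothetical variable */

        /* before: computes the per-cpu address, then loads through it */
        int v = __get_cpu_var(slab_counter);

        /* after: a single segment-prefixed load on x86, no address math */
        int w = __this_cpu_read(slab_counter);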

    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

15 Dec, 2010

1 commit

  • cancel_rearming_delayed_work[queue]() was superseded by
    cancel_delayed_work_sync() quite some time ago. Convert all the
    in-kernel users. The conversions are completely equivalent and
    trivial.
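
    The conversion is mechanical, e.g. (sketch with a hypothetical work item):

        static struct delayed_work my_work;     /* hypothetical */

        /* before */
        cancel_rearming_delayed_work(&my_work);

        /* after: equivalent, and the preferred interface */
        cancel_delayed_work_sync(&my_work);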

    Signed-off-by: Tejun Heo
    Acked-by: "David S. Miller"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Evgeniy Polyakov
    Cc: Jeff Garzik
    Cc: Benjamin Herrenschmidt
    Cc: Mauro Carvalho Chehab
    Cc: netdev@vger.kernel.org
    Cc: Anton Vorontsov
    Cc: David Woodhouse
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Cc: Alex Elder
    Cc: xfs-masters@oss.sgi.com
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Andrew Morton
    Cc: netfilter-devel@vger.kernel.org
    Cc: Trond Myklebust
    Cc: linux-nfs@vger.kernel.org

    Tejun Heo
     

29 Nov, 2010

1 commit


27 Oct, 2010

1 commit

  • Use the new {max,min}3 macros to save some cycles and bytes on the stack.
    This patch substitutes trivial nested max()/min() macros with their
    three-argument counterparts.
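
    I.e., substitutions of this shape (sketch):

        /* before: nested two-argument macros */
        x = max(a, max(b, c));
        y = min(a, min(b, c));

        /* after: the new three-argument forms */
        x = max3(a, b, c);
        y = min3(a, b, c);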

    Signed-off-by: Hagen Paul Pfeifer
    Cc: Joe Perches
    Cc: Ingo Molnar
    Cc: Hartley Sweeten
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hagen Paul Pfeifer
     

23 Aug, 2010

1 commit


10 Aug, 2010

1 commit


09 Aug, 2010

1 commit

  • This patch fixes alignment of slab objects in case CONFIG_DEBUG_PAGEALLOC is
    active.
    Before this spot in kmem_cache_create, we have this situation:
    - align contains the required alignment of the object
    - cachep->obj_offset is 0 or equals align in case of CONFIG_DEBUG_SLAB
    - size equals the size of the object, or object plus trailing redzone in case
    of CONFIG_DEBUG_SLAB

    This spot tries to fill one page per object if the object is within
    certain size limits; however, setting obj_offset to PAGE_SIZE - size
    breaks the object alignment, since size may not be a multiple of the
    required alignment. This patch simply adds an ALIGN(size, align) to the
    equation and fixes the object size detection accordingly.
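
    The essence of the change is (sketch of the affected line):

        /* before: may leave the object start misaligned when size is not
         * a multiple of the requested alignment */
        cachep->obj_offset += PAGE_SIZE - size;

        /* after: round size up so the end-of-page placement stays aligned */
        cachep->obj_offset += PAGE_SIZE - ALIGN(size, align);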

    This code in drivers/s390/cio/qdio_setup_init has led to incorrectly
    aligned slab objects (sizeof(struct qdio_q) equals 1792):
    qdio_q_cache = kmem_cache_create("qdio_q", sizeof(struct qdio_q),
    256, 0, NULL);

    Acked-by: Christoph Lameter
    Signed-off-by: Carsten Otte
    Signed-off-by: Pekka Enberg

    Carsten Otte
     

07 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slub: Allow removal of slab caches during boot
    Revert "slub: Allow removal of slab caches during boot"
    slub numa: Fix rare allocation from unexpected node
    slab: use deferable timers for its periodic housekeeping
    slub: Use kmem_cache flags to detect if slab is in debugging mode.
    slub: Allow removal of slab caches during boot
    slub: Check kasprintf results in kmem_cache_init()
    SLUB: Constants need UL
    slub: Use a constant for a unspecified node.
    SLOB: Free objects to their own list
    slab: fix caller tracking on !CONFIG_DEBUG_SLAB && CONFIG_TRACING

    Linus Torvalds
     

20 Jul, 2010

1 commit

  • slab has a "once every 2 seconds" timer for its housekeeping.
    As the number of logical processors grows, it's more and more
    common that this 2 second timer becomes the primary wakeup source.

    This patch turns this housekeeping timer into a deferrable timer,
    which means that the timer does not interrupt idle, but just runs
    at the next event that wakes the cpu up.

    The impact is that the timer likely runs a bit later, but during the
    delay no code is running so there's not all that much reason for
    a difference in housekeeping to occur because of this delay.
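
    With the workqueue API of that era, the change amounts to roughly this
    (sketch; the work item name is an assumption):

        /* before: a regular delayed work whose timer can wake an idle cpu */
        INIT_DELAYED_WORK(&reap_work, cache_reap);

        /* after: deferrable, so the timer waits for the next real wakeup */
        INIT_DELAYED_WORK_DEFERRABLE(&reap_work, cache_reap);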

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Pekka Enberg

    Arjan van de Ven
     

09 Jun, 2010

1 commit

  • We have been resisting new ftrace plugins and removing existing
    ones, and kmemtrace has been superseded by kmem trace events
    and perf-kmem, so we remove it.

    Signed-off-by: Li Zefan
    Acked-by: Pekka Enberg
    Acked-by: Eduard - Gabriel Munteanu
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    [ remove kmemtrace from the makefile, handle slob too ]
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     

28 May, 2010

3 commits

  • Example usage of generic "numa_mem_id()":

    The mainline slab code, since ~2.6.19, does not handle memoryless nodes
    well. Specifically, the "fast path", ____cache_alloc(), will never
    succeed, as slab doesn't cache off-node objects on the per-cpu queues, and
    for memoryless nodes, all memory will be "off node" relative to
    numa_node_id(). This adds significant overhead to all kmem cache
    allocations, incurring a significant regression relative to earlier
    kernels [from before slab.c was reorganized].

    This patch uses the generic topology function "numa_mem_id()" to return
    the "effective local memory node" for the calling context. This is the
    first node in the local node's generic fallback zonelist-- the same node
    that "local" mempolicy-based allocations would use. This lets slab cache
    these "local" allocations and avoid fallback/refill on every allocation.
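
    Conceptually, the change in the allocation path is (sketch):

        /* before: on a memoryless node this never names a node with memory */
        nid = numa_node_id();

        /* after: the nearest node in the fallback list that has memory */
        nid = numa_mem_id();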

    N.B.: Slab will need to handle node and memory hotplug events that could
    change the value returned by numa_mem_id() for any given node if recent
    changes to address memory hotplug don't already address this. E.g., flush
    all per cpu slab queues before rebuilding the zonelists while the
    "machine" is held in the stopped state.

    Performance impact on "hackbench 400 process 200"

    2.6.34-rc3-mmotm-100405-1609            no-patch   this-patch
    ia64 no memoryless nodes [avg of 10]:     11.713       11.637   ~0.65 diff
    ia64 cpus all on memless nodes  [10]:    228.259       26.484   ~8.6x speedup

    The slowdown of the patched kernel from ~12 sec to ~28 seconds when
    configured with memoryless nodes is the result of all cpus allocating from
    a single node's mm pagepool. The cache lines of the single node are
    distributed/interleaved over the memory of the real physical nodes, but
    the zone lock, list heads, ... of the single node with memory still each
    live in a single cache line that is accessed from all processors.

    x86_64 [8x6 AMD] [avg of 40]: 2.883 2.845

    Signed-off-by: Lee Schermerhorn
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Nick Piggin
    Cc: David Rientjes
    Cc: Eric Whitney
    Cc: KAMEZAWA Hiroyuki
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: "Luck, Tony"
    Cc: Pekka Enberg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • With the previous modification, the cpu notifier can return an
    encapsulated errno value. This converts the cpu notifiers for slab
    accordingly.

    Signed-off-by: Akinobu Mita
    Cc: Christoph Lameter
    Acked-by: Pekka Enberg
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • We have observed several workloads running on multi-node systems where
    memory is assigned unevenly across the nodes in the system. There are
    numerous reasons for this but one is the round-robin rotor in
    cpuset_mem_spread_node().

    For example, a simple test that writes a multi-page file will allocate
    pages on nodes 0 2 4 6 ... Odd nodes are skipped. (Sometimes it
    allocates on odd nodes & skips even nodes).

    An example is shown below. The program "lfile" writes a file consisting
    of 10 pages. The program then mmaps the file & uses get_mempolicy(...,
    MPOL_F_NODE) to determine the nodes where the file pages were allocated.
    The output is shown below:

    # ./lfile
    allocated on nodes: 2 4 6 0 1 2 6 0 2

    There is a single rotor that is used for allocating both file pages & slab
    pages. Writing the file allocates both a data page & a slab page
    (buffer_head). This advances the RR rotor 2 nodes for each page
    allocated.

    A quick test seems to confirm this is the cause of the uneven
    allocation:

    # echo 0 >/dev/cpuset/memory_spread_slab
    # ./lfile
    allocated on nodes: 6 7 8 9 0 1 2 3 4 5

    This patch introduces a second rotor that is used for slab allocations.
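
    Sketch of the effect on slab's node choice (the helper name shown is an
    assumption based on this description):

        /* before: slab shared the page-cache spread rotor */
        nid = cpuset_mem_spread_node();

        /* after: slab advances its own rotor (assumed helper name) */
        nid = cpuset_slab_spread_node();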

    Signed-off-by: Jack Steiner
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Paul Menage
    Cc: Jack Steiner
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jack Steiner
     

25 May, 2010

1 commit

  • Before applying this patch, cpuset updates task->mems_allowed and
    mempolicy by setting all new bits in the nodemask first, and clearing all
    old unallowed bits later. But along the way, the allocator may find that
    there is no node from which to allocate memory.

    The reason is that when cpuset rebinds the task's mempolicy, it clears the
    nodes on which the allocator can allocate pages, for example:

    (mpol: mempolicy)
    task1                         task1's mpol    task2
    alloc page                    1
      alloc on node0? NO          1
                                  1               change mems from 1 to 0
                                  1               rebind task1's mpol
                                  0-1               set new bits
                                  0                 clear disallowed bits
      alloc on node1? NO          0
      ...
      can't alloc page
        goto oom

    This patch fixes this problem by expanding the nodes range first (setting
    the newly allowed bits) and shrinking it lazily (clearing the newly
    disallowed bits). We use a variable to tell the write-side task that a
    read-side task is reading the nodemask, and the write-side task clears the
    newly disallowed nodes only after the read-side task ends its current
    memory allocation.

    [akpm@linux-foundation.org: fix spello]
    Signed-off-by: Miao Xie
    Cc: David Rientjes
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie