10 Jan, 2012

1 commit

  • Including trace/events/*.h TRACE_EVENT() macro headers in other headers
    can cause strange side effects if another trace/events/*.h header
    includes that header. Having trace/events/kmem.h inside slab_def.h
    caused a compile error on sparc64 when changes were made to some header
    files. Moving the kmem.h trace header out of slab.h and into slab.c
    fixes the problem.

    Note, both slub.c and slob.c already include the trace/events/kmem.h
    file. Only slab.c had it missing.
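
    A minimal sketch of the resulting shape (not the verbatim diff): the
    tracepoint header is pulled in by the translation unit itself rather
    than by a widely-included header.

        /* mm/slab.c */
        #include <trace/events/kmem.h>  /* kmem tracepoints used by slab.c */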

    Link: http://lkml.kernel.org/r/20120105190405.1e3191fb5a43b2a0f1655e1f@canb.auug.org.au

    Reported-by: Stephen Rothwell
    Signed-off-by: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

05 Dec, 2011

1 commit

  • Commit 30765b92 ("slab, lockdep: Annotate the locks before using
    them") moved the init_lock_keys() call from after g_cpucache_up =
    FULL to before it, overlooking the fact that init_node_lock_keys()
    tests for that state and ignores everything !FULL.

    Introduce a LATE stage and change the lockdep test to be <LATE.
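
    A hedged sketch of the shape of the fix (enum and test reconstructed
    from the description above, not the verbatim patch):

        static enum {
                NONE, PARTIAL_AC, PARTIAL_L3, EARLY, LATE, FULL
        } g_cpucache_up;

        static void init_node_lock_keys(int node)
        {
                if (g_cpucache_up < LATE)       /* was: != FULL */
                        return;
                /* annotate the per-node l3->list_lock lockdep classes */
        }
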
    Cc: Pekka Enberg
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Nov, 2011

2 commits

  • Introduce new slab_max_order kernel parameter which is the equivalent of
    slub_max_order.

    For immediate purposes, allows users to override the heuristic that sets
    the max order to 1 by default if they have more than 32MB of RAM. This
    may result in page allocation failures if there is substantial
    fragmentation.

    Another use case would be to increase the max order for better
    performance.
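
    For example (value chosen for illustration), booting with the following
    appended to the kernel command line caps slab page orders at 2:

        slab_max_order=2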

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     
  • slab_break_gfp_order is more appropriately named slab_max_order since it
    enforces the maximum order size of slabs as long as a single object will
    still fit.

    Also rename BREAK_GFP_ORDER_{LO,HI} accordingly.

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     

28 Sep, 2011

1 commit

  • Historically, /proc/slabinfo and the files under /sys/kernel/slab/*
    have had world-readable permissions. slabinfo contains rather private
    information about both the kernel and userspace tasks. Depending on the
    situation, it might reveal either private information per se or
    information useful for mounting another, targeted attack. Some examples
    of what can be learned by reading/watching /proc/slabinfo entries:

    1) dentry (and various *inode*) counts might reveal other processes' fs
    activity. The number of dentry "active objects" doesn't strictly equal
    the number of files opened/touched by a process, but there is a good
    correlation between them. The patch "proc: force dcache drop on
    unauthorized access" relies on the privacy of the dentry count.

    2) different inode entries might reveal the same information as (1), but
    as more fine-grained counters. If a filesystem is mounted at a private
    mount point (or even in a private namespace) and its fs type differs
    from the other mounted fs types, fs activity in that mount
    point/namespace is revealed. If there is a single ecryptfs mount point,
    the whole fs activity of a single user is revealed. The number of files
    in an ecryptfs mount point is private information per se.

    3) fuse_* reveals the number of files and the fs activity of a user in a
    private mount point. It is of approximately the same severity as the
    ecryptfs infoleak in (2).

    4) sysfs_dir_cache, similarly to (2), reveals device addition/removal,
    which can otherwise be hidden by "chmod 0700 /sys/". With a 0444
    slabinfo, the precise number of sysfs files is known to the world.

    5) buffer_head might reveal some kernel activity. Combined with other
    information leaks, an attacker might identify which specific kernel
    routines generate buffer_head activity.

    6) *kmalloc* infoleaks are very situational. An attacker has to watch
    the specific kmalloc size entry and filter out the noise from unrelated
    kernel activity. On a relatively quiet victim system, he might get
    rather precise counters.

    Additional information sources might significantly increase the value of
    the slabinfo infoleak. E.g. if an attacker knows that process activity
    on the system is very low (only core daemons like syslog and cron), he
    may run setxid binaries, trigger local daemon activity, trigger network
    service activity, or await sporadic cron job activity, etc., and obtain
    rather precise counters for the fs and network activity of these
    privileged tasks, which would otherwise be unknown.

    Also, hiding slabinfo and /sys/kernel/slab/* is one step toward
    complicating the exploitation of kernel heap overflows (and possibly
    other bugs). The related discussion:

    http://thread.gmane.org/gmane.linux.kernel/1108378

    To keep compatibility with the old permission model, where a non-root
    monitoring daemon could watch for kernel memleaks through slabinfo, one
    should do:

    groupadd slabinfo
    usermod -a -G slabinfo $MONITOR_USER

    And add the following commands to init scripts (to mountall.conf in
    Ubuntu's upstart case):

    chmod g+r /proc/slabinfo /sys/kernel/slab/*/*
    chgrp slabinfo /proc/slabinfo /sys/kernel/slab/*/*
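
    The kernel-side substance of the change is just the mode bits on the
    proc entry; a hedged sketch (not the verbatim hunk):

        /* mm/slab.c: was S_IWUSR | S_IRUGO, i.e. world-readable */
        proc_create("slabinfo", S_IWUSR | S_IRUSR, NULL,
                    &proc_slabinfo_operations);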

    Signed-off-by: Vasiliy Kulikov
    Reviewed-by: Kees Cook
    Reviewed-by: Dave Hansen
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    CC: Valdis.Kletnieks@vt.edu
    CC: Linus Torvalds
    CC: Alan Cox
    Signed-off-by: Pekka Enberg

    Vasiliy Kulikov
     

04 Aug, 2011

2 commits

  • Fernando found that we hit the regular OFF_SLAB 'recursion' before we
    annotate the locks; cure this.

    The relevant portion of the stack-trace:

    > [ 0.000000] [] rt_spin_lock+0x50/0x56
    > [ 0.000000] [] __cache_free+0x43/0xc3
    > [ 0.000000] [] kmem_cache_free+0x6c/0xdc
    > [ 0.000000] [] slab_destroy+0x4f/0x53
    > [ 0.000000] [] free_block+0x94/0xc1
    > [ 0.000000] [] do_tune_cpucache+0x10b/0x2bb
    > [ 0.000000] [] enable_cpucache+0x7b/0xa7
    > [ 0.000000] [] kmem_cache_init_late+0x1f/0x61
    > [ 0.000000] [] start_kernel+0x24c/0x363
    > [ 0.000000] [] i386_start_kernel+0xa9/0xaf

    Reported-by: Fernando Lopez-Lezcano
    Acked-by: Pekka Enberg
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311888176.2617.379.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Lockdep thinks there's lock recursion through:

    kmem_cache_free()
      cache_flusharray()
        spin_lock(&l3->list_lock)  <----------------.
        flush_array()                               |
          slab_destroy()                            |
            call_rcu()                              |
              debug_object_activate()               |
                debug_object_init()                 |
                  __debug_object_init()             |
                    kmem_cache_alloc()              |
                      cache_alloc_refill()          |
                        spin_lock(&l3->list_lock) --'

    Now debug objects doesn't use SLAB_DESTROY_BY_RCU and hence there is no
    actual possibility of recursing. Luckily, debug objects marks its slab
    with SLAB_DEBUG_OBJECTS, so we can identify the offending cache.

    Mark all SLAB_DEBUG_OBJECTS slab caches (there is exactly one!) with a
    special lockdep key so that lockdep sees it is a different cachep.

    Also add a WARN on trying to create a SLAB_DESTROY_BY_RCU |
    SLAB_DEBUG_OBJECTS cache, to avoid possible future trouble.
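
    A hedged sketch of the idea (the lockdep key name is illustrative):

        static struct lock_class_key debugobj_l3_key;   /* illustrative name */

        if (cachep->flags & SLAB_DEBUG_OBJECTS) {
                /* RCU-freed + debug objects would make the recursion real */
                WARN_ON_ONCE(cachep->flags & SLAB_DESTROY_BY_RCU);
                lockdep_set_class(&l3->list_lock, &debugobj_l3_key);
        }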

    Reported-and-tested-by: Sebastian Siewior
    [ fixes to the initial patch ]
    Reported-by: Thomas Gleixner
    Acked-by: Pekka Enberg
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311341165.27400.58.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

01 Aug, 2011

1 commit

  • Less code and the advantage of ascii dump.

    before:
    | Slab corruption: names_cache start=c5788000, len=4096
    | 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00
    | 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    | 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff
    | 030: ff ff ff ff e2 b4 17 18 c7 e4 08 06 00 01 08 00
    | 040: 06 04 00 01 e2 b4 17 18 c7 e4 0a 00 00 01 00 00
    | 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b

    after:
    | Slab corruption: size-4096 start=c38a9000, len=4096
    | 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00 kk....V...$...*.
    | 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    | 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ................
    | 030: ff ff ff ff d2 56 5f aa db 9c 08 06 00 01 08 00 .....V_.........
    | 040: 06 04 00 01 d2 56 5f aa db 9c 0a 00 00 01 00 00 .....V_.........
    | 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b ........kkkkkkkk
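
    The changelog's "less code" comes from reusing the kernel's generic hex
    dumper rather than an open-coded loop; a hedged sketch (argument values
    illustrative):

        /* one call prints both the hex and the trailing ASCII column */
        print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1,
                       realobj, size, true);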

    Acked-by: Christoph Lameter
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Pekka Enberg

    Sebastian Andrzej Siewior
     

23 Jul, 2011

1 commit

  • * 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slab: fix DEBUG_SLAB warning
    slab: shrink sizeof(struct kmem_cache)
    slab: fix DEBUG_SLAB build
    SLUB: Fix missing include
    slub: reduce overhead of slub_debug
    slub: Add method to verify memory is not freed
    slub: Enable backtrace for create/delete points
    slab allocators: Provide generic description of alignment defines
    slab, slub, slob: Unify alignment definition
    slob/lockdep: Fix gfp flags passed to lockdep

    Linus Torvalds
     

22 Jul, 2011

1 commit

  • In commit c225150b ("slab: fix DEBUG_SLAB build"),
    "if ((unsigned long)objp & (ARCH_SLAB_MINALIGN-1))" is always true if
    ARCH_SLAB_MINALIGN == 0. Do not print the warning if
    ARCH_SLAB_MINALIGN == 0.
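
    A hedged sketch of the resulting guard (shape only):

        if (ARCH_SLAB_MINALIGN &&
            ((unsigned long)objp & (ARCH_SLAB_MINALIGN - 1)))
                printk(KERN_ERR "0x%p: not aligned to ARCH_SLAB_MINALIGN=%d\n",
                       objp, (int)ARCH_SLAB_MINALIGN);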

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Pekka Enberg

    Tetsuo Handa
     

21 Jul, 2011

1 commit

  • Reduce high order allocations for some setups.
    (NR_CPUS=4096 -> we need 64KB per kmem_cache struct)

    We now allocate exactly the needed size (using nr_cpu_ids and
    nr_node_ids).

    This also makes the code a bit smaller on x86_64, since some field
    offsets now fall below the 127-byte displacement limit:

    Before patch:
    # size mm/slab.o
    text data bss dec hex filename
    22605 361665 32 384302 5dd2e mm/slab.o

    After patch:
    # size mm/slab.o
    text data bss dec hex filename
    22349 353473 8224 384046 5dc2e mm/slab.o
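
    A hedged sketch of the sizing idea (names illustrative; the struct used
    to carry arrays dimensioned by NR_CPUS and MAX_NUMNODES):

        /* allocate only what this boot can use, not the compile-time max */
        size_t size = offsetof(struct kmem_cache, array) +
                      nr_cpu_ids * sizeof(struct array_cache *) +
                      nr_node_ids * sizeof(struct kmem_list3 *);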

    CC: Andrew Morton
    Reported-by: Konstantin Khlebnikov
    Signed-off-by: Eric Dumazet
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Eric Dumazet
     

18 Jul, 2011

1 commit

  • Fix CONFIG_SLAB=y CONFIG_DEBUG_SLAB=y build error and warnings.

    Now that ARCH_SLAB_MINALIGN defaults to __alignof__(unsigned long long),
    it is always defined (when slab.h is included), but cannot be used in #if:
    mm/slab.c: In function `cache_alloc_debugcheck_after':
    mm/slab.c:3156:5: warning: "__alignof__" is not defined
    mm/slab.c:3156:5: error: missing binary operator before token "("
    make[1]: *** [mm/slab.o] Error 1

    So just remove the #if and #endif lines, but then the 64-bit build warns:
    mm/slab.c: In function `cache_alloc_debugcheck_after':
    mm/slab.c:3156:6: warning: cast from pointer to integer of different size
    mm/slab.c:3158:10: warning: format `%d' expects type `int', but argument
    3 has type `long unsigned int'
    Fix those with casts, whatever the actual type of ARCH_SLAB_MINALIGN.
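
    A hedged sketch of the check once the #if is gone and the casts are in
    (shape only):

        /* casts keep this correct whatever the type of ARCH_SLAB_MINALIGN */
        if ((unsigned long)objp & (ARCH_SLAB_MINALIGN - 1))
                printk(KERN_ERR "0x%p: not aligned to ARCH_SLAB_MINALIGN=%d\n",
                       objp, (int)ARCH_SLAB_MINALIGN);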

    Acked-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Pekka Enberg

    Hugh Dickins
     

04 Jun, 2011

1 commit

  • Currently, when using CONFIG_DEBUG_SLAB, we record kfree() or
    kmem_cache_free() as the last user of freed objects, which is not very
    useful; record the caller of those functions instead.
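
    A hedged sketch of the technique (the exact plumbing differs):

        void kfree(const void *objp)
        {
                /* ... resolve objp to its kmem_cache ... */
                /* record the caller of kfree(), not kfree() itself */
                __cache_free(cachep, (void *)objp, __builtin_return_address(0));
        }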

    Acked-by: David Rientjes
    Acked-by: Christoph Lameter
    Signed-off-by: Suleiman Souhlal
    Signed-off-by: Pekka Enberg

    Suleiman Souhlal
     

21 May, 2011

1 commit

  • Commit e66eed651fd1 ("list: remove prefetching from regular list
    iterators") removed the include of prefetch.h from list.h, which
    uncovered several cases that had apparently relied on that rather
    obscure header file dependency.

    So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

    to guide us in finding the files that either need the inclusion or have
    it despite not needing it.

    There are more of them around (mostly network drivers), but this gets
    many core ones.
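
    The fix itself is mechanical: each flagged file gains an explicit
    include, e.g.:

        #include <linux/prefetch.h>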

    Reported-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Mar, 2011

1 commit

  • While looking at some other notifier callbacks I noticed this code could
    use a simple cleanup.

    The callback no longer needs the if (ret)/else conditional: that same
    check is already done inside notifier_from_errno().
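
    A hedged sketch of the cleanup (representative shape, not the exact
    hunk):

        /* before */
        if (ret)
                return notifier_from_errno(ret);
        return NOTIFY_OK;

        /* after: notifier_from_errno(0) already returns NOTIFY_OK */
        return notifier_from_errno(ret);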

    Signed-off-by: Prarit Bhargava
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
     

14 Feb, 2011

1 commit

  • This reverts commit 5c5e3b33b7cb959a401f823707bee006caadd76e.

    The commit breaks ARM thusly:

    | Mount-cache hash table entries: 512
    | slab error in verify_redzone_free(): cache `idr_layer_cache': memory outside object was overwritten
    | Backtrace:
    | [] (dump_backtrace+0x0/0x110) from [] (dump_stack+0x18/0x1c)
    | [] (dump_stack+0x0/0x1c) from [] (__slab_error+0x28/0x30)
    | [] (__slab_error+0x0/0x30) from [] (cache_free_debugcheck+0x1c0/0x2b8)
    | [] (cache_free_debugcheck+0x0/0x2b8) from [] (kmem_cache_free+0x3c/0xc0)
    | [] (kmem_cache_free+0x0/0xc0) from [] (ida_get_new_above+0x19c/0x1c0)
    | [] (ida_get_new_above+0x0/0x1c0) from [] (alloc_vfsmnt+0x54/0x144)
    | [] (alloc_vfsmnt+0x0/0x144) from [] (vfs_kern_mount+0x30/0xec)
    | [] (vfs_kern_mount+0x0/0xec) from [] (kern_mount_data+0x1c/0x20)
    | [] (kern_mount_data+0x0/0x20) from [] (sysfs_init+0x68/0xc8)
    | [] (sysfs_init+0x0/0xc8) from [] (mnt_init+0x90/0x1b0)
    | [] (mnt_init+0x0/0x1b0) from [] (vfs_caches_init+0x100/0x140)
    | [] (vfs_caches_init+0x0/0x140) from [] (start_kernel+0x2e8/0x368)
    | [] (start_kernel+0x0/0x368) from [] (__enable_mmu+0x0/0x2c)
    | c0113268: redzone 1:0xd84156c5c032b3ac, redzone 2:0xd84156c5635688c0.
    | slab error in cache_alloc_debugcheck_after(): cache `idr_layer_cache': double free, or memory outside object was overwritten
    | ...
    | c011307c: redzone 1:0x9f91102ffffffff, redzone 2:0x9f911029d74e35b
    | slab: Internal list corruption detected in cache 'idr_layer_cache'(24), slabp c0113000(16). Hexdump:
    |
    | 000: 20 4f 10 c0 20 4f 10 c0 7c 00 00 00 7c 30 11 c0
    | 010: 10 00 00 00 10 00 00 00 00 00 c9 17 fe ff ff ff
    | 020: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
    | 030: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
    | 040: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
    | 050: fe ff ff ff fe ff ff ff fe ff ff ff 11 00 00 00
    | 060: 12 00 00 00 13 00 00 00 14 00 00 00 15 00 00 00
    | 070: 16 00 00 00 17 00 00 00 c0 88 56 63
    | kernel BUG at /home/rmk/git/linux-2.6-rmk/mm/slab.c:2928!

    Reference: https://lkml.org/lkml/2011/2/7/238
    Cc: # 2.6.35.y and later
    Reported-and-analyzed-by: Russell King
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     

08 Jan, 2011

2 commits

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
    as per Tejun.

    Linus Torvalds
     
  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits)
    usb: don't use flush_scheduled_work()
    speedtch: don't abuse struct delayed_work
    media/video: don't use flush_scheduled_work()
    media/video: explicitly flush request_module work
    ioc4: use static work_struct for ioc4_load_modules()
    init: don't call flush_scheduled_work() from do_initcalls()
    s390: don't use flush_scheduled_work()
    rtc: don't use flush_scheduled_work()
    mmc: update workqueue usages
    mfd: update workqueue usages
    dvb: don't use flush_scheduled_work()
    leds-wm8350: don't use flush_scheduled_work()
    mISDN: don't use flush_scheduled_work()
    macintosh/ams: don't use flush_scheduled_work()
    vmwgfx: don't use flush_scheduled_work()
    tpm: don't use flush_scheduled_work()
    sonypi: don't use flush_scheduled_work()
    hvsi: don't use flush_scheduled_work()
    xen: don't use flush_scheduled_work()
    gdrom: don't use flush_scheduled_work()
    ...

    Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c
    as per Tejun.

    Linus Torvalds
     

17 Dec, 2010

1 commit

  • __get_cpu_var() can be replaced with this_cpu_read and will then use a
    single read instruction with implied address calculation to access the
    correct per cpu instance.

    However, the address of a per cpu variable passed to __this_cpu_read()
    cannot be determined (since it's an implied address conversion through
    segment prefixes). Therefore apply this only to uses of __get_cpu_var
    where the address of the variable is not used.
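
    A representative conversion (the per-cpu variable is hypothetical):

        DEFINE_PER_CPU(int, foo);       /* hypothetical per-cpu variable */

        /* before: computes the per-cpu address, then dereferences it */
        n = __get_cpu_var(foo);

        /* after: a single read instruction with an implied segment base */
        n = __this_cpu_read(foo);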

    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

15 Dec, 2010

1 commit

  • cancel_rearming_delayed_work[queue]() has been superseded by
    cancel_delayed_work_sync() quite some time ago. Convert all the
    in-kernel users. The conversions are completely equivalent and
    trivial.
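
    The conversion is one-for-one; a representative (illustrative) call
    site:

        /* before */
        cancel_rearming_delayed_work(&reap_work);
        /* after */
        cancel_delayed_work_sync(&reap_work);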

    Signed-off-by: Tejun Heo
    Acked-by: "David S. Miller"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Evgeniy Polyakov
    Cc: Jeff Garzik
    Cc: Benjamin Herrenschmidt
    Cc: Mauro Carvalho Chehab
    Cc: netdev@vger.kernel.org
    Cc: Anton Vorontsov
    Cc: David Woodhouse
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Cc: Alex Elder
    Cc: xfs-masters@oss.sgi.com
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Andrew Morton
    Cc: netfilter-devel@vger.kernel.org
    Cc: Trond Myklebust
    Cc: linux-nfs@vger.kernel.org

    Tejun Heo
     

27 Oct, 2010

1 commit

  • Use the new {max,min}3 macros to save some cycles and bytes on the
    stack. This patch substitutes trivial nested min()/max() calls with
    their three-argument counterparts.
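
    A representative substitution (operands illustrative):

        /* before: nested two-argument macros */
        limit = max(max(a, b), c);
        /* after */
        limit = max3(a, b, c);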

    Signed-off-by: Hagen Paul Pfeifer
    Cc: Joe Perches
    Cc: Ingo Molnar
    Cc: Hartley Sweeten
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hagen Paul Pfeifer
     

09 Aug, 2010

1 commit

  • This patch fixes the alignment of slab objects when
    CONFIG_DEBUG_PAGEALLOC is active.

    Before this spot in kmem_cache_create(), we have this situation:
    - align contains the required alignment of the object
    - cachep->obj_offset is 0, or equals align in the CONFIG_DEBUG_SLAB case
    - size equals the size of the object, or of the object plus a trailing
    redzone in the CONFIG_DEBUG_SLAB case

    This spot tries to fill one page per object if the object is within
    certain size limits; however, setting obj_offset to PAGE_SIZE - size
    breaks the object alignment, since size may not be a multiple of the
    required alignment. This patch simply adds an ALIGN(size, align) to the
    equation and fixes the object size detection accordingly.

    This code in drivers/s390/cio/qdio_setup_init has led to incorrectly
    aligned slab objects (sizeof(struct qdio_q) equals 1792):

        qdio_q_cache = kmem_cache_create("qdio_q", sizeof(struct qdio_q),
                                         256, 0, NULL);
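
    A hedged sketch of the corrected computation (condition elided; not the
    verbatim hunk):

        /* the "one page per object" DEBUG_PAGEALLOC case */
        cachep->obj_offset += PAGE_SIZE - ALIGN(size, align);
        size = PAGE_SIZE;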

    Acked-by: Christoph Lameter
    Signed-off-by: Carsten Otte
    Signed-off-by: Pekka Enberg

    Carsten Otte
     

07 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slub: Allow removal of slab caches during boot
    Revert "slub: Allow removal of slab caches during boot"
    slub numa: Fix rare allocation from unexpected node
    slab: use deferable timers for its periodic housekeeping
    slub: Use kmem_cache flags to detect if slab is in debugging mode.
    slub: Allow removal of slab caches during boot
    slub: Check kasprintf results in kmem_cache_init()
    SLUB: Constants need UL
    slub: Use a constant for a unspecified node.
    SLOB: Free objects to their own list
    slab: fix caller tracking on !CONFIG_DEBUG_SLAB && CONFIG_TRACING

    Linus Torvalds