26 Jul, 2011

1 commit

  • RED_INACTIVE is a slab poison value, and reusing it for memblock was
    inappropriate, because memblock deals with phys_addr_t's, whose sizeof()
    depends on Kconfig.

    Create a new poison type for this use. This fixes the sparse warning:

    warning: cast truncates bits from constant value (9f911029d74e35b becomes 9d74e35b)
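
    As a standalone illustration (plain userspace C, not kernel code), the
    constant below reproduces the value from the warning, and the two
    typedefs stand in for the two possible Kconfig-selected widths of
    phys_addr_t; casting the 64-bit poison into the 32-bit type silently
    drops the upper half, which is exactly what sparse complains about:

        #include <stdint.h>
        #include <stdio.h>

        /* 64-bit slab-style poison constant (value taken from the warning). */
        #define POISON_64 0x09F911029D74E35BULL

        typedef uint32_t phys_addr_demo32_t;  /* phys_addr_t without 64-bit support */
        typedef uint64_t phys_addr_demo64_t;  /* phys_addr_t with 64-bit support */

        int main(void)
        {
                phys_addr_demo32_t narrow = (phys_addr_demo32_t)POISON_64;
                phys_addr_demo64_t wide   = (phys_addr_demo64_t)POISON_64;

                /* Prints 0x9f911029d74e35b and the truncated 0x9d74e35b. */
                printf("64-bit poison value:  %#llx\n", (unsigned long long)wide);
                printf("truncated to 32 bits: %#x\n", (unsigned)narrow);
                return 0;
        }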

    Reported-by: H Hartley Sweeten
    Tested-by: H Hartley Sweeten
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

11 Aug, 2010

1 commit

  • This patch adds a reverse-mapping feature for hugepages by introducing a
    mapcount for shared/privately mapped hugepages and an anon_vma for
    privately mapped hugepages.

    While hugepages are not currently swappable, reverse mapping can be useful
    for the memory error handler.

    Without this patch, the memory error handler can neither identify the
    processes using a bad hugepage nor unmap it from them. That is:
    - for a shared hugepage:
    we can collect the processes using it through the page cache,
    but cannot unmap it because of the lack of a mapcount.
    - for a privately mapped hugepage:
    we can neither collect the processes nor unmap it.
    This patch solves these problems.
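
    As a rough conceptual model only (standalone C with made-up names, not
    the kernel's rmap code), the bookkeeping added here is what lets an
    error handler enumerate every mapper of a bad hugepage and drop each
    mapping while keeping a mapcount consistent:

        #include <stdio.h>

        /* Hypothetical miniature model: each hugepage remembers who maps it. */
        struct hugepage_mapping { int pid; unsigned long vaddr; int live; };

        struct hugepage {
                int mapcount;
                struct hugepage_mapping maps[4];
        };

        /* What a memory-error handler needs: walk the mappers and unmap each. */
        static void unmap_poisoned_hugepage(struct hugepage *hp)
        {
                for (int i = 0; i < 4; i++) {
                        if (!hp->maps[i].live)
                                continue;
                        printf("unmapping from pid %d at %#lx\n",
                               hp->maps[i].pid, hp->maps[i].vaddr);
                        hp->maps[i].live = 0;
                        hp->mapcount--;
                }
        }

        int main(void)
        {
                struct hugepage bad = {
                        .mapcount = 2,
                        .maps = { { 101, 0x10000000UL, 1 },
                                  { 202, 0x20000000UL, 1 } },
                };

                unmap_poisoned_hugepage(&bad);
                printf("mapcount is now %d\n", bad.mapcount);
                return 0;
        }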

    This patch includes the bug fix from commit 23be7468e8, so it reverts that
    commit.

    Dependency:
    "hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"

    ChangeLog since May 24.
    - create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
    - move functions setting up anon_vma for hugepage into mm/rmap.c.

    ChangeLog since May 13.
    - rebased to 2.6.34
    - fix logic error (in case that private mapping and shared mapping coexist)
    - move is_vm_hugetlb_page() into include/linux/mm.h to use this function
    from linear_page_index()
    - define and use linear_hugepage_index() instead of compound_order()
    - use page_move_anon_rmap() in hugetlb_cow()
    - copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
    - revert commit 23be7468 completely

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Acked-by: Fengguang Wu
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

25 Apr, 2010

1 commit

  • If a futex key happens to be located within a huge page mapped
    MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
    page->mapping that will never exist.

    See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
    about the problem.

    This patch sets the page->mapping of a hugepage mapped MAP_PRIVATE to a
    poisoned value that includes PAGE_MAPPING_ANON. This is enough for futex
    to continue, but because of PAGE_MAPPING_ANON the poisoned value is never
    dereferenced or otherwise used by futex. No other part of the VM should be
    dereferencing the page->mapping of a hugetlbfs page, as its page cache is
    not on the LRU.
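
    A small standalone sketch of the trick (illustrative C, not the kernel
    code; the anon bit and the poison value shown are assumptions chosen for
    the demo): because the poison has the anon bit set, an "is this
    anonymous?" check succeeds without the pointer ever being dereferenced:

        #include <stdint.h>
        #include <stdio.h>

        /* Assumed for illustration: anonymous pages are marked by setting
         * the low bit of page->mapping. */
        #define PAGE_MAPPING_ANON_DEMO 0x1ULL
        #define POISON_MAPPING_DEMO    (0xdead000000000400ULL | PAGE_MAPPING_ANON_DEMO)

        struct demo_page { uint64_t mapping; };

        static int demo_page_anon(const struct demo_page *p)
        {
                /* Only the tag bit is inspected; the "pointer" itself is
                 * never dereferenced, so a poison value is safe here. */
                return (p->mapping & PAGE_MAPPING_ANON_DEMO) != 0;
        }

        int main(void)
        {
                struct demo_page hugepage = { .mapping = POISON_MAPPING_DEMO };

                if (demo_page_anon(&hugepage))
                        printf("looks anonymous; futex can make progress without "
                               "dereferencing the poisoned pointer\n");
                return 0;
        }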

    This patch fixes the problem with the test case described in the bugzilla.

    [akpm@linux-foundation.org: mel cant spel]
    Signed-off-by: Mel Gorman
    Acked-by: Peter Zijlstra
    Acked-by: Darren Hart
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

12 Jan, 2010

1 commit

  • The list macros use LIST_POISON1 and LIST_POISON2 as non-dereferenceable
    pointers in order to trap erroneous use of freed list_heads. Unfortunately
    userspace can arrange for those pointers to actually be dereferenceable,
    potentially turning an oops into an exploit.

    To avoid this, allow architectures (currently x86_64 only) to override
    the default values for these pointers with truly non-dereferenceable
    values. This is easy on x86_64, as the virtual address space is large and
    contains areas that cannot be mapped.

    Other 64-bit architectures will likely find similar unmapped ranges.

    [ingo: switch to 0xdead000000000000 as the unmapped area]
    [ingo: add comments, cleanup]
    [jaswinder: eliminate sparse warnings]
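
    A minimal standalone sketch of the idea (using the classic
    0x00100100/0x00200200 offsets and the 0xdead000000000000 base from the
    note above; treat the exact numbers as illustrative):

        #include <stdio.h>

        /* Per-architecture delta shifting the poisons into a range that can
         * never be mapped (non-canonical on x86_64). */
        #define POISON_POINTER_DELTA_DEMO 0xdead000000000000ULL

        #define LIST_POISON1_CLASSIC 0x00100100ULL
        #define LIST_POISON2_CLASSIC 0x00200200ULL

        int main(void)
        {
                printf("classic LIST_POISON1: %#llx (low; userspace might map it)\n",
                       (unsigned long long)LIST_POISON1_CLASSIC);
                printf("shifted LIST_POISON1: %#llx (never mappable on x86_64)\n",
                       (unsigned long long)(LIST_POISON1_CLASSIC +
                                            POISON_POINTER_DELTA_DEMO));
                printf("classic LIST_POISON2: %#llx\n",
                       (unsigned long long)LIST_POISON2_CLASSIC);
                printf("shifted LIST_POISON2: %#llx\n",
                       (unsigned long long)(LIST_POISON2_CLASSIC +
                                            POISON_POINTER_DELTA_DEMO));
                return 0;
        }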

    Acked-by: Linus Torvalds
    Signed-off-by: Jaswinder Singh Rajput
    Signed-off-by: Ingo Molnar
    Signed-off-by: Avi Kivity
    Signed-off-by: Linus Torvalds

    Avi Kivity
     

22 Sep, 2009

1 commit

  • Newly initialized flex_arrays and flex_array_parts are now poisoned with a
    new poison value, FLEX_ARRAY_FREE. Its value is similar to POISON_FREE,
    used in the various slab allocators, but is distinct so that flex_array
    poisoned kmem can be distinguished from slab allocator poisoned kmem.

    This will allow us to identify flex_array_parts that only contain free
    elements (and free them with an addition to the flex_array API). This
    could also be extended in the future to identify `get' uses on elements
    that have not been `put'.

    If __GFP_ZERO is passed for a part's gfp mask, the poisoning is avoided.
    These elements are considered to be in-use since they have been
    initialized.
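
    A rough standalone model of the poisoning rule above (plain C, not the
    flex_array code; the 0x6c byte is an assumption meant to echo
    FLEX_ARRAY_FREE):

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        /* Assumed byte value, distinct from the slab allocators' free
         * poison so the two kinds of poisoned kmem can be told apart. */
        #define FLEX_ARRAY_FREE_DEMO 0x6c
        #define PART_SIZE_DEMO 64

        static unsigned char *alloc_part(int zeroed)
        {
                unsigned char *part = malloc(PART_SIZE_DEMO);

                if (!part)
                        return NULL;
                if (zeroed)
                        memset(part, 0, PART_SIZE_DEMO);  /* __GFP_ZERO: counts as in-use */
                else
                        memset(part, FLEX_ARRAY_FREE_DEMO, PART_SIZE_DEMO);
                return part;
        }

        /* A part whose every byte is still poison contains only free elements. */
        static int part_is_all_free(const unsigned char *part)
        {
                for (int i = 0; i < PART_SIZE_DEMO; i++)
                        if (part[i] != FLEX_ARRAY_FREE_DEMO)
                                return 0;
                return 1;
        }

        int main(void)
        {
                unsigned char *part = alloc_part(0);    /* no __GFP_ZERO: poisoned */

                if (!part)
                        return 1;
                printf("freshly poisoned part all free? %d\n", part_is_all_free(part));
                part[3] = 42;                           /* a 'put' of one element */
                printf("after one put, all free?        %d\n", part_is_all_free(part));
                free(part);
                return 0;
        }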

    Signed-off-by: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

01 Apr, 2009

1 commit

  • CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and
    s390. This patch implements it for the rest of the architectures by
    filling the pages with poison byte patterns after free_pages() and
    verifying the poison patterns before alloc_pages().

    This generic implementation cannot detect an invalid page access
    immediately; an invalid read may only show up later as a bad dereference
    through the poisoned memory, and an invalid write can only be detected
    after a (possibly long) delay, when the poison pattern is next verified.
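
    A minimal standalone sketch of the scheme (userspace C modelling a
    single page; the 0xaa byte is an assumed stand-in for the kernel's page
    poison pattern):

        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE_DEMO   4096
        #define PAGE_POISON_DEMO 0xaa   /* assumed poison byte pattern */

        static unsigned char page_buf[PAGE_SIZE_DEMO];

        /* On "free": fill the page with the poison pattern. */
        static void poison_page(void)
        {
                memset(page_buf, PAGE_POISON_DEMO, sizeof(page_buf));
        }

        /* On "alloc": verify the pattern; a stray write to the free page is
         * only caught here, possibly long after it happened. */
        static void check_page(void)
        {
                for (size_t i = 0; i < sizeof(page_buf); i++) {
                        if (page_buf[i] != PAGE_POISON_DEMO) {
                                printf("corruption at offset %zu: %#x\n",
                                       i, page_buf[i]);
                                return;
                        }
                }
                printf("page clean\n");
        }

        int main(void)
        {
                poison_page();
                check_page();
                page_buf[123] = 0x00;   /* simulated write to a free page */
                check_page();
                return 0;
        }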

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

30 Apr, 2008

1 commit

  • Add calls to the generic object debugging infrastructure and provide fixup
    functions which allow the system to be kept alive when recoverable
    problems are detected by the object debugging core code.
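
    Purely as a conceptual model (standalone C with hypothetical names, not
    the real debugobjects API): the core idea is a per-object state machine
    whose fixup hook lets the caller recover, for example by initializing an
    object that is being activated before it was ever set up:

        #include <stdio.h>

        /* Hypothetical object-lifetime tracking: states plus a fixup
         * callback invoked when a transition looks wrong. */
        enum obj_state { OBJ_NONE, OBJ_INITIALIZED, OBJ_ACTIVE };

        struct tracked_obj {
                enum obj_state state;
                /* Returns 1 if the problem was repaired. */
                int (*fixup_activate)(struct tracked_obj *obj);
        };

        static int demo_fixup_activate(struct tracked_obj *obj)
        {
                printf("fixup: activating an uninitialized object, initializing it\n");
                obj->state = OBJ_INITIALIZED;
                return 1;
        }

        static void obj_activate(struct tracked_obj *obj)
        {
                if (obj->state == OBJ_NONE &&
                    (!obj->fixup_activate || !obj->fixup_activate(obj))) {
                        printf("unrecoverable: refusing to activate\n");
                        return;
                }
                obj->state = OBJ_ACTIVE;
                printf("object active\n");
        }

        int main(void)
        {
                struct tracked_obj obj = {
                        .state = OBJ_NONE,              /* never initialized */
                        .fixup_activate = demo_fixup_activate,
                };

                obj_activate(&obj);     /* fixup keeps the "system" alive */
                return 0;
        }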

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

09 May, 2007

1 commit

  • There are two problems with the existing redzone implementation.

    Firstly, it's causing misalignment of structures which contain a 64-bit
    integer, such as netfilter's 'struct ipt_entry' -- causing netfilter
    modules to fail to load because of the misalignment. (In particular, the
    first check in
    net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks())

    On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8.

    With slab debugging, we use 32-bit redzones. And allocated slab objects
    aren't sufficiently aligned to hold a structure containing a uint64_t.

    By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable
    redzone checks on those architectures. By using 64-bit redzones we avoid that
    loss of debugging, and also fix the other problem while we're at it.

    When investigating this, I noticed that on 64-bit platforms we're using a
    32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set
    aside for the redzone. Which means that the four bytes immediately before
    or after the allocated object are 0x00,0x00,0x00,0x00 for LE and BE
    machines, respectively. Which is probably not the most useful choice of
    poison value.

    One way to fix both of those at once is just to switch to 64-bit
    redzones in all cases.
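
    A standalone illustration of the alignment half of the problem (a
    simplification, not the allocator's real layout): with a 4-byte redzone
    in front of the payload, the object lands on a 4-byte boundary even
    though the alignment required for a uint64_t may be 8; an 8-byte redzone
    keeps it aligned:

        #include <stdint.h>
        #include <stdio.h>

        /* Simplified model of a debug slab: the buffer is 8-byte aligned and
         * the object starts immediately after the leading redzone word. */
        static _Alignas(8) unsigned char slab_buf[64];

        int main(void)
        {
                uintptr_t after_u32_rz = (uintptr_t)slab_buf + sizeof(uint32_t);
                uintptr_t after_u64_rz = (uintptr_t)slab_buf + sizeof(uint64_t);

                printf("alignment required for uint64_t: %lu\n",
                       (unsigned long)_Alignof(uint64_t));
                printf("object misalignment with 32-bit redzone: %lu bytes\n",
                       (unsigned long)(after_u32_rz % 8));
                printf("object misalignment with 64-bit redzone: %lu bytes\n",
                       (unsigned long)(after_u64_rz % 8));
                return 0;
        }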

    Signed-off-by: David Woodhouse
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     

08 May, 2007

1 commit

  • This is a new slab allocator which was motivated by the complexity of the
    existing code in mm/slab.c. It attempts to address a variety of concerns
    with the existing implementation.

    A. Management of object queues

    A particular concern was the complex management of the numerous object
    queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
    each allocating CPU and use objects from a slab directly instead of
    queueing them up.

    B. Storage overhead of object queues

    SLAB object queues exist per node and per CPU. The alien cache queue even
    has a queue array that contains a queue for each processor on each
    node. For very large systems the number of queues, and the number of
    objects that may be caught in those queues, grows exponentially. On our
    systems with 1k nodes / processors we have several gigabytes tied up just
    for storing references to objects in those queues. This does not include
    the objects that could be on those queues. One fears that the whole
    memory of the machine could one day be consumed by those queues.

    C. SLAB meta data overhead

    SLAB has overhead at the beginning of each slab. This means that data
    cannot be naturally aligned at the beginning of a slab block. SLUB keeps
    all metadata in the corresponding struct page. Objects can be naturally
    aligned in the slab. For example, a 128 byte object will be aligned at
    128 byte boundaries and can fit tightly into a 4k page with no bytes left
    over. SLAB cannot do this.

    D. SLAB has a complex cache reaper

    SLUB does not need a cache reaper for UP systems. On SMP systems
    the per-CPU slab may be pushed back into the partial list, but that
    operation is simple and does not require an iteration over a list
    of objects. SLAB expires per-CPU, shared and alien object queues
    during cache reaping, which may cause strange holdoffs.

    E. SLAB has complex NUMA policy layer support

    SLUB pushes NUMA policy handling into the page allocator. This means that
    allocation is coarser (SLUB interleaves at the page level), but that
    situation was also present before 2.6.13. SLAB's application of
    policies to individual slab objects is certainly a performance
    concern, due to the frequent references to memory policies, and
    may lead to a sequence of objects coming from one node after
    another. SLUB will get a slab full of objects from one node and
    then will switch to the next.

    F. Reduction of the size of partial slab lists

    SLAB has per-node partial lists. This means that over time a large
    number of partial slabs may accumulate on those lists. These can
    only be reused if allocations occur on those specific nodes. SLUB has a
    global pool of partial slabs and will consume slabs from that pool to
    decrease fragmentation.

    G. Tunables

    SLAB has sophisticated tuning abilities for each slab cache. One can
    manipulate the queue sizes in detail. However, filling the queues still
    requires the use of a spin lock to check out slabs. SLUB has a global
    parameter (slub_min_order) for tuning. Increasing the minimum slab
    order can decrease the locking overhead. The bigger the slab order, the
    less movement of pages between the per-CPU and partial lists, and the
    better SLUB will scale.

    H. Slab merging

    We often have slab caches with similar parameters. SLUB detects those
    at boot and merges them into the corresponding general caches. This
    leads to more effective memory use. About 50% of all caches can
    be eliminated through slab merging. This will also decrease
    slab fragmentation because partially allocated slabs can be filled
    up again. Slab merging can be switched off by specifying
    slub_nomerge at boot.

    Note that merging can expose heretofore unknown bugs in the kernel
    because corrupted objects may now be placed differently and corrupt
    different neighboring objects. Enable sanity checks to find those.

    I. Diagnostics

    The current slab diagnostics are difficult to use and require a
    recompilation of the kernel. SLUB contains debugging code that
    is always available (but is kept out of the hot code paths).
    SLUB diagnostics can be enabled via the "slub_debug" option.
    Parameters can be specified to select a single slab cache or a
    group of slab caches for diagnostics. This means that the system
    runs with its usual performance, and it is much more likely that
    race conditions can be reproduced.

    J. Resiliency

    If basic sanity checks are on, then SLUB is capable of detecting
    common error conditions and recovering as well as possible to allow
    the system to continue.

    K. Tracing

    Tracing can be enabled via the slub_debug=T, option
    during boot. SLUB will then log all actions on that slab cache
    and dump the object contents on free.

    L. On demand DMA cache creation.

    Generally, DMA caches are not needed. If a kmalloc with __GFP_DMA is
    used, then only the single slab cache that is needed is created.
    For systems that have no ZONE_DMA requirement, the support is
    eliminated completely.

    M. Performance increase

    Some benchmarks have shown speed improvements on kernbench in the
    range of 5-10%. The locking overhead of SLUB is based on the
    underlying base allocation size. If we can reliably allocate
    larger-order pages then it is possible to increase SLUB's
    performance much further. The anti-fragmentation patches may
    enable further performance increases.

    Tested on:
    i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

    SLUB Boot options

    slub_nomerge Disable merging of slabs
    slub_min_order=x Require a minimum order for slab caches. This
    increases the managed chunk size and therefore
    reduces meta data and locking overhead.
    slub_min_objects=x Minimum objects per slab. Default is 8.
    slub_max_order=x Avoid generating slabs larger than order specified.
    slub_debug Enable all diagnostics for all caches
    slub_debug= Enable selective options for all caches
    slub_debug=, Enable selective options for a certain set of
    caches

    Available Debug options
    F Double Free checking, sanity and resiliency
    R Red zoning
    P Object / padding poisoning
    U Track last free / alloc
    T Trace all allocs / frees (only use for individual slabs).
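
    For example, combining the options above (the cache name is purely
    hypothetical, shown only to illustrate the syntax):

        slub_debug=FR            Double-free checking plus red zoning for all caches
        slub_debug=FRU,mycache   The same plus last-alloc/free tracking, restricted
                                 to a single cache ("mycache" is a hypothetical name)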

    To use SLUB: Apply this patch and then select SLUB as the default slab
    allocator.

    [hugh@veritas.com: fix an oops-causing locking error]
    [akpm@linux-foundation.org: various stupid cleanups and small fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

03 May, 2007

1 commit

  • On x86-64, kernel memory freed after init can be entirely unmapped instead
    of just getting 'poisoned' by overwriting with a debug pattern.

    On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table
    can also be write-protected.

    Compared to the first version, this one prevents re-creating deleted
    mappings in the kernel image range on x86-64 if those were removed
    previously. This, together with the original changes, prevents temporarily
    having inconsistent mappings when cacheability attributes are being
    changed on such pages (e.g. from AGP code). While such duplicate
    mappings don't exist on i386, the same change is made there too, both for
    consistency and because checking pte_present() before using various other
    pte_XXX functions is a requirement anyway. At the same time, the i386 code
    is adjusted to use pte_huge() instead of open-coding it.

    AK: split out cpa() changes

    Signed-off-by: Jan Beulich
    Signed-off-by: Andi Kleen

    Jan Beulich
     

28 Jun, 2006

3 commits

  • Add more poison values to include/linux/poison.h. It's not clear to me
    whether some others should be added or not, so I haven't added any of
    these:

    ./include/linux/libata.h:#define ATA_TAG_POISON 0xfafbfcfdU
    ./arch/ppc/8260_io/fcc_enet.c:1918: memset((char *)(&(immap->im_dprambase[(mem_addr+64)])), 0x88, 32);
    ./drivers/usb/mon/mon_text.c:429: memset(mem, 0xe5, sizeof(struct mon_event_text));
    ./drivers/char/ftape/lowlevel/ftape-ctl.c:738: memset(ft_buffer[i]->address, 0xAA, FT_BUFF_SIZE);
    ./drivers/block/sx8.c:/* 0xf is just arbitrary, non-zero noise; this is sorta like poisoning */

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Update two drivers to use poison.h.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Localize poison values into one header file for better documentation,
    easier and quicker debugging, and so that the same values won't be used
    for multiple purposes.

    Use these constants in core arch., mm, driver, and fs code.
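
    As a sketch of what that buys (standalone C; the 0x6b byte mirrors the
    slab free-poison pattern and should be treated as illustrative), callers
    name the pattern they mean instead of repeating a magic number:

        #include <stdio.h>
        #include <string.h>

        /* One shared, documented definition instead of ad-hoc magic bytes
         * scattered through each subsystem. */
        #define POISON_FREE_DEMO 0x6b

        int main(void)
        {
                unsigned char buf[16];

                /* Before consolidation: memset(buf, 0x6b, sizeof(buf));
                 * after: the intent is spelled out by the constant. */
                memset(buf, POISON_FREE_DEMO, sizeof(buf));
                printf("buf[0] = %#x\n", buf[0]);
                return 0;
        }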

    Signed-off-by: Randy Dunlap
    Acked-by: Matt Mackall
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap