29 Nov, 2011

1 commit

  • Conflicts & resolutions:

    * arch/x86/xen/setup.c

    dc91c728fd "xen: allow extra memory to be in multiple regions"
    24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

    conflicted on xen_add_extra_mem() updates. The resolution is
    trivial, as the latter just wants to replace
    memblock_x86_reserve_range() with memblock_reserve().

    * drivers/pci/intel-iommu.c

    166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
    5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."

    conflicted as the former moved the file under drivers/iommu/.
    Resolved by applying the changes from the latter to the moved
    file.

    * mm/Kconfig

    6661672053a "memblock: add NO_BOOTMEM config symbol"
    c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

    conflicted trivially. Both added config options. Just
    letting both add their own options resolves the conflict.

    * mm/memblock.c

    d1f0ece6cdc "mm/memblock.c: small function definition fixes"
    ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"

    conflicted. The former updates a function that the latter
    removes; the resolution is trivial.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

01 Nov, 2011

1 commit

  • With the NO_BOOTMEM symbol added, architectures may now use the following
    syntax to tell that they do not need bootmem:

    select NO_BOOTMEM

    This is much more convenient than adding a new kconfig symbol, which was
    otherwise required.

    Adding this symbol does not conflict with the architectures that already
    define their own symbol.
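
    A minimal sketch of what an architecture's Kconfig entry might now look
    like (the symbol name here is hypothetical):

```kconfig
# hypothetical architecture symbol, for illustration only
config MYARCH
	def_bool y
	# arch has its own early allocator and does not need bootmem
	select NO_BOOTMEM
```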

    Signed-off-by: Sam Ravnborg
    Cc: Yinghai Lu
    Acked-by: Tejun Heo
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sam Ravnborg
     

15 Jul, 2011

2 commits

  • From 6839454ae63f1eb21e515c10229ca95c22955fec Mon Sep 17 00:00:00 2001
    From: Tejun Heo
    Date: Thu, 14 Jul 2011 11:22:17 +0200

    Make ARCH_DISCARD_MEMBLOCK a config option so that it can be handled
    together with other MEMBLOCK options.
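
    A sketch of the resulting arrangement (the exact wording in mm/Kconfig
    may differ): the option becomes an ordinary symbol that interested
    architectures select.

```kconfig
# in mm/Kconfig: a plain symbol, grouped with the other MEMBLOCK options
config ARCH_DISCARD_MEMBLOCK
	bool

# in an arch Kconfig (illustrative):
#	select ARCH_DISCARD_MEMBLOCK
```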

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/20110714094603.GH3455@htj.dyndns.org
    Cc: Yinghai Lu
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: H. Peter Anvin

    Tejun Heo
     
  • From 83103b92f3234ec830852bbc5c45911bd6cbdb20 Mon Sep 17 00:00:00 2001
    From: Tejun Heo
    Date: Thu, 14 Jul 2011 11:22:16 +0200

    Add optional region->nid which can be enabled by arch using
    CONFIG_HAVE_MEMBLOCK_NODE_MAP. When enabled, memblock also carries
    NUMA node information and replaces early_node_map[].

    Newly added memblocks have MAX_NUMNODES as nid. Arch can then call
    memblock_set_node() to set node information. memblock takes care of
    merging and node affine allocations w.r.t. node information.

    When MEMBLOCK_NODE_MAP is enabled, early_node_map[], related data
    structures and functions to manipulate and iterate it are disabled.
    memblock version of __next_mem_pfn_range() is provided such that
    for_each_mem_pfn_range() behaves the same and its users don't have to
    be updated.

    -v2: Yinghai spotted section mismatch caused by missing
    __init_memblock in memblock_set_node(). Fixed.

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/20110714094342.GF3455@htj.dyndns.org
    Cc: Yinghai Lu
    Cc: Benjamin Herrenschmidt
    Signed-off-by: H. Peter Anvin

    Tejun Heo
     

27 May, 2011

1 commit

  • This third patch of eight in this cleancache series provides
    the core code for cleancache that interfaces between the hooks in
    VFS and individual filesystems and a cleancache backend. It also
    includes build and config patches.

    Two new files are added: mm/cleancache.c and include/linux/cleancache.h.

    Note that CONFIG_CLEANCACHE can default to on; in systems that do
    not provide a cleancache backend, all hooks devolve to a simple
    check of a global enable flag, so performance impact should
    be negligible but can be reduced to zero impact if config'ed off.
    However for this first commit, it defaults to off.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt

    Credits: Cleancache_ops design derived from Jeremy Fitzhardinge
    design for tmem

    [v8: dan.magenheimer@oracle.com: fix exportfs call affecting btrfs]
    [v8: akpm@linux-foundation.org: use static inline function, not macro]
    [v7: dan.magenheimer@oracle.com: cleanup sysfs and remove cleancache prefix]
    [v6: JBeulich@novell.com: robustly handle buggy fs encode_fh actor definition]
    [v5: jeremy@goop.org: clean up global usage and static var names]
    [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
    [v5: hch@infradead.org: cleaner non-global interface for ops registration]
    [v4: adilger@sun.com: interface must support exportfs FS's]
    [v4: hch@infradead.org: interface must support 64-bit FS on 32-bit kernel]
    [v3: akpm@linux-foundation.org: use one ops struct to avoid pointer hops]
    [v3: akpm@linux-foundation.org: document and ensure PageLocked reqts are met]
    [v3: ngupta@vflare.org: fix success/fail codes, change funcs to void]
    [v2: viro@ZenIV.linux.org.uk: use sane types]
    Signed-off-by: Dan Magenheimer
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Al Viro
    Acked-by: Andrew Morton
    Acked-by: Nitin Gupta
    Acked-by: Minchan Kim
    Acked-by: Andreas Dilger
    Acked-by: Jan Beulich
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Chris Mason
    Cc: Ted Ts'o
    Cc: Mark Fasheh
    Cc: Joel Becker

    Dan Magenheimer
     

26 Jan, 2011

1 commit

  • Commit 5d6892407 ("thp: select CONFIG_COMPACTION if TRANSPARENT_HUGEPAGE
    enabled") causes this warning during the configuration process:

    warning: (TRANSPARENT_HUGEPAGE) selects COMPACTION which has unmet
    direct dependencies (EXPERIMENTAL && HUGETLB_PAGE && MMU)

    COMPACTION doesn't depend on HUGETLB_PAGE, and it doesn't depend on THP
    either; it is also useful for regular alloc_pages(order > 0), including
    the very kernel stack during fork (THREAD_ORDER = 1). It's always
    better to enable COMPACTION.

    The warning should be an error because we would end up with MIGRATION
    not selected, and COMPACTION wouldn't work without migration (despite it
    seems to build with an inline migrate_pages returning -ENOSYS).

    I'd also like to remove EXPERIMENTAL: compaction has been in the kernel
    for some releases (for full safety the default remains disabled which I
    think is enough).
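
    The fixed dependency reads approximately like this (a sketch, not the
    verbatim mm/Kconfig text):

```kconfig
config COMPACTION
	bool "Allow for memory compaction"
	select MIGRATION
	depends on MMU
```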

    Signed-off-by: Andrea Arcangeli
    Reported-by: Luca Tettamanti
    Tested-by: Luca Tettamanti
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

14 Jan, 2011

4 commits

  • With transparent hugepage support we need compaction for the "defrag"
    sysfs controls to be effective.

    At the moment THP hangs the system if COMPACTION isn't selected, as
    without COMPACTION lumpy reclaim wouldn't be entirely disabled. So at the
    moment it's not orthogonal. When lumpy will be removed from the VM I can
    remove the select COMPACTION in theory, but then 99% of THP users would be
    still doing a mistake in disabling compaction, even if the mistake won't
    return in fatal runtime but just slightly degraded performance. So from a
    theoretical standpoint forcing the below select is not needed (the
    dependency is strict neither at compile time nor at runtime), but from a
    practical standpoint it is safer.

    If anybody really wants THP to run without compaction, it'd be such a
    weird setup that editing the Kconfig file to allow it will surely not be a
    problem.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Allow choosing between the always|madvise default for page faults and
    khugepaged at config time. madvise guarantees zero risk of higher memory
    footprint for applications (applications using madvise(MADV_HUGEPAGE)
    won't risk using any more memory by backing their virtual regions with
    hugepages).

    Initially set the default to N and don't depend on EMBEDDED.
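
    A sketch of how such a compile-time default can be expressed with a
    Kconfig choice (symbol names are approximate):

```kconfig
choice
	prompt "Transparent Hugepage Support sysfs defaults"
	depends on TRANSPARENT_HUGEPAGE
	default TRANSPARENT_HUGEPAGE_ALWAYS

config TRANSPARENT_HUGEPAGE_ALWAYS
	bool "always"

config TRANSPARENT_HUGEPAGE_MADVISE
	bool "madvise"

endchoice
```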

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Add support for transparent hugepages to x86 32bit.

    Share the same VM_ bitflag for VM_MAPPED_COPY. mm/nommu.c will never
    support transparent hugepages.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add config option.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

23 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: update comments to reflect that percpu allocations are always zero-filled
    percpu: Optimize __get_cpu_var()
    x86, percpu: Optimize this_cpu_ptr
    percpu: clear memory allocated with the km allocator
    percpu: fix build breakage on s390 and cleanup build configuration tests
    percpu: use percpu allocator on UP too
    percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
    vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

    Fixed up trivial conflicts in include/linux/percpu.h

    Linus Torvalds
     

10 Sep, 2010

1 commit

  • COMPACTION enables MIGRATION, but MIGRATION spawns a warning if NUMA or
    memory hotplug aren't selected. However MIGRATION doesn't depend on them. I
    guess it's just trying to be strict doing a double check on who's enabling
    it, but it doesn't know that compaction also enables MIGRATION.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

08 Sep, 2010

1 commit

  • On UP, percpu allocations were redirected to kmalloc. This has the
    following problems.

    * For a certain number of allocations (determined by
    PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), the percpu
    allocator can be used before the usual kernel memory allocator is
    brought online. On SMP, this is used to initialize the kernel
    memory allocator.

    * percpu allocator honors alignment up to PAGE_SIZE but kmalloc()
    doesn't. For example, workqueue makes use of larger alignments for
    cpu_workqueues.

    Currently, users of percpu allocators need to handle UP differently,
    which is somewhat fragile and ugly. Other than a small amount of
    memory, there isn't much to lose by enabling the percpu allocator on UP.
    It can simply use kernel memory based chunk allocation which was added
    for SMP archs w/o MMUs.

    This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
    makes UP build use percpu-km. As percpu addresses and kernel
    addresses are always identity mapped and static percpu variables don't
    need any special treatment, nothing is arch dependent and mm/percpu.c
    implements generic setup_per_cpu_areas() for UP.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Acked-by: Pekka Enberg

    Tejun Heo
     

14 Jul, 2010

1 commit

  • via the following scripts

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
        -e 's/lmb/memblock/g' \
        -e 's/LMB/MEMBLOCK/g' \
        $FILES

    for N in $(find . -name lmb.[ch]); do
        M=$(echo $N | sed 's/lmb/memblock/g')
        mv $N $M
    done

    and remove some wrong changes, like lmbench and dlmb etc.

    also move memblock.c from lib/ to mm/

    Suggested-by: Ingo Molnar
    Acked-by: "H. Peter Anvin"
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Linus Torvalds
    Signed-off-by: Yinghai Lu
    Signed-off-by: Benjamin Herrenschmidt

    Yinghai Lu
     

25 May, 2010

1 commit

  • CONFIG_MIGRATION currently depends on CONFIG_NUMA or on the architecture
    being able to hot-remove memory. The main users of page migration such as
    sys_move_pages(), sys_migrate_pages() and cpuset process migration are
    only beneficial on NUMA so it makes sense.

    As memory compaction will operate within a zone and is useful on both NUMA
    and non-NUMA systems, this patch allows CONFIG_MIGRATION to be set if the
    user selects CONFIG_COMPACTION as an option.
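
    The relaxed dependency reads roughly as follows (a sketch of the
    mm/Kconfig change):

```kconfig
config MIGRATION
	bool "Page migration"
	def_bool y
	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
```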

    [akpm@linux-foundation.org: Depend on CONFIG_HUGETLB_PAGE]
    Signed-off-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Reviewed-by: Rik van Riel
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

04 Mar, 2010

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'x86-bootmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
    early_res: Need to save the allocation name in drop_range_partial()
    sparsemem: Fix compilation on PowerPC
    early_res: Add free_early_partial()
    x86: Fix non-bootmem compilation on PowerPC
    core: Move early_res from arch/x86 to kernel/
    x86: Add find_fw_memmap_area
    Move round_up/down to kernel.h
    x86: Make 32bit support NO_BOOTMEM
    early_res: Enhance check_and_double_early_res
    x86: Move back find_e820_area to e820.c
    x86: Add find_early_area_size
    x86: Separate early_res related code from e820.c
    x86: Move bios page reserve early to head32/64.c
    sparsemem: Put mem map for one node together.
    sparsemem: Put usemap for one node together
    x86: Make 64 bit use early_res instead of bootmem before slab
    x86: Only call dma32_reserve_bootmem 64bit !CONFIG_NUMA
    x86: Make early_node_mem get mem > 4 GB if possible
    x86: Dynamically increase early_res array size
    x86: Introduce max_early_res and early_res_count
    ...

    Linus Torvalds
     

13 Feb, 2010

1 commit

  • Add vmemmap_alloc_block_buf for mem map only.

    It will fallback to the old way if it cannot get a block that big.

    Before this patch, when a node has 128GB of RAM installed, its memmap is
    split into two or more parts.
    [ 0.000000] [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1
    [ 0.000000] [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1
    [ 0.000000] [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0
    [ 0.000000] [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0
    [ 0.000000] [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0
    [ 0.000000] [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2
    [ 0.000000] [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2
    [ 0.000000] [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2
    [ 0.000000] [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3
    [ 0.000000] [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3
    [ 0.000000] [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4
    [ 0.000000] [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4
    [ 0.000000] [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5
    [ 0.000000] [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5
    [ 0.000000] [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5
    [ 0.000000] [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6
    [ 0.000000] [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6
    [ 0.000000] [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6
    [ 0.000000] [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7
    [ 0.000000] [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7

    After the patch we get:
    [ 0.000000] [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0
    [ 0.000000] [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1
    [ 0.000000] [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2
    [ 0.000000] [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3
    [ 0.000000] [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4
    [ 0.000000] [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5
    [ 0.000000] [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6
    [ 0.000000] [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7

    -v2: change buf to vmemmap_buf instead according to Ingo
    also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo
    -v3: according to Andrew, use sizeof(name) instead of hard coded 15

    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Cc: Christoph Lameter
    Acked-by: Christoph Lameter
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

05 Jan, 2010

1 commit

  • We previously had 2 quicklists, one for the PGD case and one for PTEs.
    Now that the PGD/PMD cases are handled through slab caches due to the
    multi-level configurability, only the PTE quicklist remains. As such,
    reduce NR_QUICK to its appropriate size and bump down the PTE quicklist
    index.

    Signed-off-by: Paul Mundt

    Paul Mundt
     

22 Dec, 2009

1 commit

  • The injector filter requires stable_page_flags() which is supplied
    by procfs. So make it dependent on that.

    Also add ifdefs around the filter code in memory-failure.c so that
    when the filter is disabled due to missing dependencies the whole
    code still builds.

    Reported-by: Ingo Molnar
    Signed-off-by: Andi Kleen

    Andi Kleen
     

17 Dec, 2009

1 commit

  • In NOMMU mode clamp dac_mmap_min_addr to zero to cause the tests on it to be
    skipped by the compiler. We do this as the minimum mmap address doesn't make
    any sense in NOMMU mode.

    mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Signed-off-by: James Morris

    David Howells
     

16 Dec, 2009

5 commits

  • Signed-off-by: Andi Kleen

    Andi Kleen
     
  • When specified, only poison pages if ((page_flags & mask) == value).

    - corrupt-filter-flags-mask
    - corrupt-filter-flags-value

    This allows stress testing of many kinds of pages.

    Strictly speaking, the buddy pages requires taking zone lock, to avoid
    setting PG_hwpoison on a "was buddy but now allocated to someone" page.
    However we can just do nothing because we set PG_locked in the beginning,
    this prevents the page allocator from allocating it to someone. (It will
    BUG() on the unexpected PG_locked, which is fine for hwpoison testing.)

    [AK: Add select PROC_PAGE_MONITOR to satisfy dependency]

    CC: Nick Piggin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andi Kleen

    Wu Fengguang
     
  • Now that ksm pages are swappable, and the known holes plugged, remove
    mention of unswappable kernel pages from KSM documentation and comments.

    Remove the totalram_pages/4 initialization of max_kernel_pages. In fact,
    remove max_kernel_pages altogether - we can reinstate it if removal turns
    out to break someone's script; but if we later want to limit KSM's memory
    usage, limiting the stable nodes would not be an effective approach.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • CONFIG_DEBUG_SPINLOCK adds 12 or 16 bytes to a 32- or 64-bit spinlock_t,
    and CONFIG_DEBUG_LOCK_ALLOC adds another 12 or 24 bytes to it: lockdep
    enables both of those, and CONFIG_LOCK_STAT adds 8 or 16 bytes to that.

    When 2.6.15 placed the split page table lock inside struct page (usually
    sized 32 or 56 bytes), only CONFIG_DEBUG_SPINLOCK was a possibility, and
    we ignored the enlargement (but fitted in CONFIG_GENERIC_LOCKBREAK's 4 by
    letting the spinlock_t occupy both page->private and page->mapping).

    Should these debugging options be allowed to double the size of a struct
    page, when only one minority use of the page (as a page table) needs to
    fit a spinlock in there? Perhaps not.

    Take the easy way out: switch off SPLIT_PTLOCK_CPUS when DEBUG_SPINLOCK or
    DEBUG_LOCK_ALLOC is in force. I've sometimes tried to be cleverer,
    kmallocing a cacheline for the spinlock when it doesn't fit, but given up
    each time. Falling back to mm->page_table_lock (as we do when ptlock is
    not split) lets lockdep check out the strictest path anyway.

    And now that some arches allow 8192 cpus, use 999999 for infinity.
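
    The resulting default stack looks roughly like this (trimmed to the lines
    relevant here):

```kconfig
config SPLIT_PTLOCK_CPUS
	int
	# debug options can double the size of struct page: fall back to
	# mm->page_table_lock by pushing the threshold out of reach
	default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
	default "4"
```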

    (What has this got to do with KSM swapping? It doesn't care about the
    size of struct page, but may care about random junk in page->mapping - to
    be explained separately later.)

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Nick Piggin
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove three degrees of obfuscation, left over from when we had
    CONFIG_UNEVICTABLE_LRU. MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT is
    CONFIG_HAVE_MLOCK is CONFIG_MMU. rmap.o (and memory-failure.o) are only
    built when CONFIG_MMU, so don't need such conditions at all.

    Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK* lines from
    169 defconfigs: leave those to evolve in due course.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Nick Piggin
    Reviewed-by: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

18 Nov, 2009

1 commit

  • Allow memory hotplug and hibernation in the same kernel

    Memory hotplug and hibernation were exclusive in Kconfig. This is
    obviously a problem for distribution kernels that want to support both in
    the same image.

    After some discussions with Rafael and others the only problem is with
    parallel memory hotadd or removal while a hibernation operation is in
    process. It was also working for s390 before.

    This patch removes the Kconfig level exclusion, and simply makes the
    memory add / remove functions grab the pm_mutex to exclude against
    hibernation.

    Fixes a regression - old kernels didn't exclude memory hotadd and
    hibernation.

    Signed-off-by: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

29 Oct, 2009

2 commits

  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule
    powerpc: Minor cleanup to lib/Kconfig.debug
    powerpc: Minor cleanup to sound/ppc/Kconfig
    powerpc: Minor cleanup to init/Kconfig
    powerpc: Limit memory hotplug support to PPC64 Book-3S machines
    powerpc: Limit hugetlbfs support to PPC64 Book-3S machines
    powerpc: Fix compile errors found by new ppc64e_defconfig
    powerpc: Add a Book-3E 64-bit defconfig
    powerpc/booke: Fix xmon single step on PowerPC Book-E
    powerpc: Align vDSO base address
    powerpc: Fix segment mapping in vdso32
    powerpc/iseries: Remove compiler version dependent hack
    powerpc/perf_events: Fix priority of MSR HV vs PR bits
    powerpc/5200: Update defconfigs
    drivers/serial/mpc52xx_uart.c: Use UPIO_MEM rather than SERIAL_IO_MEM
    powerpc/boot/dts: drop obsolete 'fsl5200-clocking'
    of: Remove nested function
    mpc5200: support for the MAN mpc5200 based board mucmc52
    mpc5200: support for the MAN mpc5200 based board uc101

    Linus Torvalds
     
  • Currently, sparsemem is only available if EXPERIMENTAL is enabled.
    However, it hasn't ever been marked experimental.

    It's been about four years since sparsemem was merged, and we have
    platforms which depend on it; allow architectures to decide whether
    sparsemem should be the default memory model.

    Signed-off-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     

08 Oct, 2009

1 commit

  • Adjust the max_kernel_pages default to a quarter of totalram_pages,
    instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from
    highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be
    pinning 32MB of lowmem with their rmap_items, so no need for the more
    obscure calculation (nor for its own special init function).

    There is no way for the user to switch KSM on if CONFIG_SYSFS is not
    enabled, so in that case default run to KSM_RUN_MERGE.

    Update KSM Documentation and Kconfig to reflect the new defaults.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

27 Sep, 2009

1 commit

  • This build failure triggers:

    In file included from include/linux/suspend.h:8,
    from arch/x86/kernel/asm-offsets_32.c:11,
    from arch/x86/kernel/asm-offsets.c:2:
    include/linux/mm.h:503:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS

    Because, due to the hwpoison page flag, we ran out of page
    flags on 32-bit.

    Don't turn on hwpoison on 32-bit NUMA (it's rare in any
    case).

    Also clean up the Kconfig dependencies in the generic MM
    code by introducing ARCH_SUPPORTS_MEMORY_FAILURE.
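
    The cleanup makes the generic option gate on an arch capability symbol,
    roughly:

```kconfig
# in mm/Kconfig (sketch, not the verbatim text)
config MEMORY_FAILURE
	depends on MMU
	depends on ARCH_SUPPORTS_MEMORY_FAILURE
	bool "Enable recovery from hardware memory errors"

# an architecture then advertises support by selecting
# ARCH_SUPPORTS_MEMORY_FAILURE (on x86, only when the page flags fit)
```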

    Signed-off-by: Linus Torvalds
    Signed-off-by: Ingo Molnar

    Linus Torvalds
     

24 Sep, 2009

1 commit

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
    HWPOISON: Enable error_remove_page on btrfs
    HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
    HWPOISON: Add madvise() based injector for hardware poisoned pages v4
    HWPOISON: Enable error_remove_page for NFS
    HWPOISON: Enable .remove_error_page for migration aware file systems
    HWPOISON: The high level memory error handler in the VM v7
    HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
    HWPOISON: shmem: call set_page_dirty() with locked page
    HWPOISON: Define a new error_remove_page address space op for async truncation
    HWPOISON: Add invalidate_inode_page
    HWPOISON: Refactor truncate to allow direct truncating of page v2
    HWPOISON: check and isolate corrupted free pages v2
    HWPOISON: Handle hardware poisoned pages in try_to_unmap
    HWPOISON: Use bitmask/action code for try_to_unmap behaviour
    HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
    HWPOISON: Add poison check to page fault handling
    HWPOISON: Add basic support for poisoned pages in fault handler v3
    HWPOISON: Add new SIGBUS error codes for hardware poison signals
    HWPOISON: Add support for poison swap entries v2
    HWPOISON: Export some rmap vma locking to outside world
    ...

    Linus Torvalds
     

22 Sep, 2009

2 commits

  • Add Documentation/vm/ksm.txt: how to use the Kernel Samepage Merging feature

    Signed-off-by: Hugh Dickins
    Cc: Michael Kerrisk
    Cc: Randy Dunlap
    Acked-by: Izik Eidus
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • This patch presents the mm interface to a dummy version of ksm.c, for
    better scrutiny of that interface: the real ksm.c follows later.

    When CONFIG_KSM is not set, madvise(2) rejects MADV_MERGEABLE and
    MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
    pretending that they can be serviced. But when CONFIG_KSM=y, accept them
    even if KSM is not currently running, and even on areas which KSM will not
    touch (e.g. hugetlb or shared file or special driver mappings).

    Like other madvise calls, report ENOMEM despite success if any area in the
    range is unmapped, and use EAGAIN to report out of memory.

    Define vma flag VM_MERGEABLE to identify an area on which KSM may try
    merging pages: leave it to ksm_madvise() to decide whether to set it.
    Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
    VM_MERGEABLE areas, to minimize callouts when forking or exiting.

    Based upon earlier patches by Chris Wright and Izik Eidus.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Chris Wright
    Signed-off-by: Izik Eidus
    Cc: Michael Kerrisk
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Wu Fengguang
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Avi Kivity
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

16 Sep, 2009

3 commits

  • Useful for some testing scenarios, although specific testing is often
    done better through MADV_POISON

    This can be done with the x86-level MCE injector too, but this interface
    allows doing it independently of low-level x86 changes.

    v2: Add module license (Haicheng Li)

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Add the high level memory handler that poisons pages
    that got corrupted by hardware (typically by a two bit flip in a DIMM
    or a cache) on the Linux level. The goal is to prevent everyone
    from accessing these pages in the future.

    This is done at the VM level by marking a page hwpoisoned
    and taking the appropriate action based on the type of page
    it is.

    The code that does this is portable and lives in mm/memory-failure.c

    To quote the overview comment:

    High level machine check handler. Handles pages reported by the
    hardware as being corrupted usually due to a 2bit ECC memory or cache
    failure.

    This focuses on pages detected as corrupted in the background.
    When the current CPU tries to consume corruption the currently
    running process can just be killed directly instead. This implies
    that if the error cannot be handled for some reason it's safe to
    just ignore it because no corruption has been consumed yet. Instead
    when that happens another machine check will happen.

    Handles page cache pages in various states. The tricky part
    here is that we can access any page asynchronous to other VM
    users, because memory failures could happen anytime and anywhere,
    possibly violating some of their assumptions. This is why this code
    has to be extremely careful. Generally it tries to use normal locking
    rules, as in get the standard locks, even if that means the
    error handling takes potentially a long time.

    Some of the operations here are somewhat inefficient and have
    non-linear algorithmic complexity, because the data structures have
    not been optimized for this case. This is particularly the case
    for the mapping from a vma to a process. Since this case is expected
    to be rare, we hope we can get away with this.

    There are in principle two strategies to kill processes on poison:
    - just unmap the data and wait for an actual reference before
      killing
    - kill as soon as corruption is detected.
    Both have advantages and disadvantages and should be used
    in different situations. Right now both are implemented and can
    be switched with a new sysctl, vm.memory_failure_early_kill.
    The default is early kill.
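    The switch between the two strategies can be sketched as a config
    fragment (assuming the sysctl named above is available on the running
    kernel):

    ```shell
    # Early kill (the default): processes mapping the poisoned page are
    # killed as soon as the corruption is detected in the background.
    sysctl -w vm.memory_failure_early_kill=1

    # Late kill: just unmap the poisoned page, and only kill a process
    # if it actually tries to reference the lost data again.
    sysctl -w vm.memory_failure_early_kill=0
    ```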

    The patch does some rmap data structure walking on its own to collect
    processes to kill. This is unusual because normally all rmap data
    structure knowledge is in rmap.c only. I put it here for now to keep
    everything together, and rmap knowledge has been seeping out anyway.

    Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
    Nick Piggin (who did a lot of great work) and others.

    Cc: npiggin@suse.de
    Cc: riel@redhat.com
    Signed-off-by: Andi Kleen
    Acked-by: Rik van Riel
    Reviewed-by: Hidehiro Kawai

    Andi Kleen
     
  • * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, pat: Fix cacheflush address in change_page_attr_set_clr()
    mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
    x86: Fix earlyprintk=dbgp for machines without NX
    x86, pat: Sanity check remap_pfn_range for RAM region
    x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
    x86, pat: Add lookup_memtype to get the current memtype of a paddr
    x86, pat: Use page flags to track memtypes of RAM pages
    x86, pat: Generalize the use of page flag PG_uncached
    x86, pat: Add rbtree to do quick lookup in memtype tracking
    x86, pat: Add PAT reserve free to io_mapping* APIs
    x86, pat: New i/f for driver to request memtype for IO regions
    x86, pat: ioremap to follow same PAT restrictions as other PAT users
    x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
    x86, mtrr: make mtrr_aps_delayed_init static bool
    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
    generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
    x86: Fix an incorrect argument of reserve_bootmem()
    x86: Fix system crash when loading with "reservetop" parameter

    Linus Torvalds
     

01 Sep, 2009

1 commit

  • CONFIG_PAGEFLAGS_EXTENDED disables a trick to conserve page flags.
    This trick is intended to be enabled when the pressure on page flags
    is very high.

    The previous condition was:

    - depends on 64BIT || SPARSEMEM_VMEMMAP || !NUMA || !SPARSEMEM

    ... however, the sparsemem code already has a way to crowd out the
    node number from the pageflags, which means that !NUMA actually
    doesn't contribute to hard pageflags exhaustion.

    This is required for the new PG_uncached flag to not cause pageflags
    exhaustion on x86_32 + PAE + SPARSEMEM + !NUMA.
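    The resulting dependency line, with the !NUMA term dropped, can be
    sketched as a Kconfig fragment (the surrounding symbol definition is
    abbreviated and its exact form may differ):

    ```kconfig
    config PAGEFLAGS_EXTENDED
    	def_bool y
    	depends on 64BIT || SPARSEMEM_VMEMMAP || !SPARSEMEM
    ```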

    Signed-off-by: H. Peter Anvin
    LKML-Reference:
    Cc: Venkatesh Pallipadi
    Cc: Suresh Siddha

    H. Peter Anvin
     

17 Aug, 2009

1 commit

  • Currently SELinux enforcement of controls on the ability to map low
    memory is determined by the mmap_min_addr tunable. This patch causes
    SELinux to ignore the tunable and instead use a separate Kconfig
    option specific to how much space the LSM should protect.

    The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
    permissions will always protect the amount of low memory designated by
    CONFIG_LSM_MMAP_MIN_ADDR.

    This allows users who need to disable the mmap_min_addr controls (the
    usual reason being that they run WINE as a non-root user) to do so and
    still have SELinux controls preventing confined domains (like a web
    server) from being able to map some area of low memory.
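    A config-fragment sketch of the resulting split (the 65536 value is an
    assumption for illustration; the CONFIG_LSM_MMAP_MIN_ADDR default
    varies by distribution and is fixed at kernel build time):

    ```shell
    # Drop the generic restriction so an unprivileged WINE process can
    # map low addresses without needing CAP_SYS_RAWIO ...
    sysctl -w vm.mmap_min_addr=0

    # ... while SELinux still denies confined domains mappings below the
    # compile-time limit, e.g. in the kernel .config:
    # CONFIG_LSM_MMAP_MIN_ADDR=65536
    ```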

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris