16 May, 2018

1 commit

  • commit 27227c733852f71008e9bf165950bb2edaed3a90 upstream.

    Memory hotplug and hotremove operate with per-block granularity. If the
    machine has a large amount of memory (more than 64G), the size of a
    memory block can span multiple sections. By mistake, during hotremove
    we set only the first section to offline state.

    The bug was discovered because a kernel selftest started to fail:
    https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop

    It started failing after commit "mm/memory_hotplug: optimize probe
    routine", which added a check that sections are in a proper state
    during hotplug operations, but the bug itself is older than that
    commit.
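
    A minimal sketch of the loop being fixed, assuming the upstream helper
    names (pfn_to_section_nr(), __nr_to_section()) and the SECTION_IS_ONLINE
    flag; the point is that the section number must be derived from the loop
    iterator, not from the loop start:

        void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
        {
            unsigned long pfn;

            for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
                /* was pfn_to_section_nr(start_pfn): only the first
                 * section of a multi-section block went offline */
                unsigned long section_nr = pfn_to_section_nr(pfn);
                struct mem_section *ms = __nr_to_section(section_nr);

                ms->section_mem_map &= ~SECTION_IS_ONLINE;
            }
        }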

    Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
    Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
    Signed-off-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Steven Sistare
    Cc: Daniel Jordan
    Cc: "Kirill A. Shutemov"
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Pavel Tatashin
     

10 Jan, 2018

1 commit

  • commit d09cfbbfa0f761a97687828b5afb27b56cbf2e19 upstream.

    In commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime
    for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to
    save memory.

    It allocates the first dimension of the array with sizeof(struct
    mem_section), which costs extra memory; it should be sizeof(struct
    mem_section *).

    Fix it.
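
    The shape of the fix, sketched against the allocation site under
    CONFIG_SPARSEMEM_EXTREME (surrounding names as in the upstream code):

        if (unlikely(!mem_section)) {
            unsigned long size, align;

            /* was: sizeof(struct mem_section) * NR_SECTION_ROOTS; the
             * first dimension holds pointers to roots, so each slot
             * only needs pointer size */
            size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
            align = 1 << INTERNODE_CACHE_SHIFT;
            mem_section = memblock_virt_alloc(size, align);
        }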

    Link: http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com
    Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
    Signed-off-by: Baoquan He
    Tested-by: Dave Young
    Acked-by: Kirill A. Shutemov
    Cc: Kirill A. Shutemov
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Atsushi Kumagai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Baoquan He
     

25 Dec, 2017

2 commits

  • commit 629a359bdb0e0652a8227b4ff3125431995fec6e upstream.

    Since commit:

    83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")

    we allocate the mem_section array dynamically in
    sparse_memory_present_with_active_regions(), but some architectures,
    like arm64, don't call that routine to initialize sparsemem.

    Let's move the initialization into memory_present(); it should cover
    all architectures.
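
    A sketch of where the allocation ends up, per the description above
    (abridged; note the sizeof here reflects this commit's state and was
    corrected later, see the 10 Jan 2018 entry above):

        void __init memory_present(int nid, unsigned long start, unsigned long end)
        {
        #ifdef CONFIG_SPARSEMEM_EXTREME
            if (unlikely(!mem_section)) {
                unsigned long size, align;

                size = sizeof(struct mem_section) * NR_SECTION_ROOTS;
                align = 1 << INTERNODE_CACHE_SHIFT;
                mem_section = memblock_virt_alloc(size, align);
            }
        #endif
            /* ... mark the sections in [start, end) present, as before ... */
        }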

    Reported-and-tested-by: Sudeep Holla
    Tested-by: Bjorn Andersson
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-mm@kvack.org
    Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
    Link: http://lkml.kernel.org/r/20171107083337.89952-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar
    Cc: Dan Rue
    Cc: Naresh Kamboju
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     
  • commit 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4 upstream.

    Size of the mem_section[] array depends on the size of the physical address space.

    In preparation for boot-time switching between paging modes on x86-64
    we need to make the allocation of mem_section[] dynamic, because otherwise
    we waste a lot of RAM: with CONFIG_NODES_SHIFT=10, mem_section[] size is 32kB
    for 4-level paging and 2MB for 5-level paging mode.
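
    The quoted sizes can be verified with a little arithmetic; a
    self-contained model, assuming x86-64 sparsemem defaults (128MB
    sections, 4kB pages, a 32-byte struct mem_section, 8-byte pointers):

        #include <stdio.h>

        int main(void)
        {
            unsigned long section_bits = 27;          /* 128MB sections */
            unsigned long sections_per_root = 4096 / 32;

            /* 4-level paging: MAX_PHYSMEM_BITS = 46; 5-level: 52 */
            for (int bits = 46; bits <= 52; bits += 6) {
                unsigned long sections = 1UL << (bits - section_bits);
                unsigned long roots = sections / sections_per_root;

                printf("MAX_PHYSMEM_BITS=%d -> %lu kB\n",
                       bits, roots * sizeof(void *) / 1024);
            }
            return 0;    /* prints 32 kB and 2048 kB (2MB) */
        }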

    The patch allocates the array on the first call to sparse_memory_present_with_active_regions().

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Cyrill Gorcunov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170929140821.37654-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing
    SPDX tag:value files, created by Philippe Ombredanne. Philippe
    prepared the base worksheet and did an initial spot review of a few
    thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if <5
    lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 Sep, 2017

1 commit

  • online_mem_sections() accidentally marks online only the first
    section in the given range. This is a typo which hasn't been noticed
    because I haven't tested large 2GB blocks previously. All users of
    pfn_to_online_page() would get confused on the rest of the pfn range
    in the block.

    All we need to fix this is to use the iterator (pfn) rather than
    start_pfn.
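
    The whole fix is that one-line change, sketched here with the loop
    shape of online_mem_sections() (upstream helper names assumed):

        for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
            unsigned long section_nr = pfn_to_section_nr(pfn); /* was: start_pfn */
            struct mem_section *ms = __nr_to_section(section_nr);

            ms->section_mem_map |= SECTION_IS_ONLINE;
        }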

    Link: http://lkml.kernel.org/r/20170904112210.3401-1-mhocko@kernel.org
    Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Anshuman Khandual
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

07 Sep, 2017

1 commit

  • Commit f52407ce2dea ("memory hotplug: alloc page from other node in
    memory online") has introduced N_HIGH_MEMORY checks to only use NUMA
    aware allocations when there is some memory present because the
    respective node might not have any memory yet at the time and so it
    could fail or even OOM.

    Things have changed since then though. Zonelists are now always
    initialized before we do any allocations even for hotplug (see
    959ecc48fc75 ("mm/memory_hotplug.c: fix building of node hotplug
    zonelist")).

    Therefore these checks are not really needed. In fact, the caller of
    the allocator should never care about whether the node is populated,
    because that might change at any time.

    Link: http://lkml.kernel.org/r/20170721143915.14161-10-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Shaohua Li
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

07 Jul, 2017

3 commits

  • The current memory hotplug implementation relies on having all the
    struct pages associate with a zone/node during the physical hotplug
    phase (arch_add_memory->__add_pages->__add_section->__add_zone). In the
    vast majority of cases this means that they are added to ZONE_NORMAL.
    This has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory
    hotadd without sparsemem") and it wasn't a big deal back then because
    movable onlining didn't exist yet.

    Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
    onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable
    memory and portion memory") and then things got more complicated.
    Rather than reconsidering the zone association which was no longer
    needed (because the memory hotplug already depended on SPARSEMEM) a
    convoluted semantic of zone shifting has been developed. Only the
    currently last memblock or the one adjacent to the zone_movable can be
    onlined movable. This essentially means that the online type changes as
    the new memblocks are added.

    Let's simulate memory hot online manually
    $ echo 0x100000000 > /sys/devices/system/memory/probe
    $ grep . /sys/devices/system/memory/memory32/valid_zones
    Normal Movable

    $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
    $ grep . /sys/devices/system/memory/memory3?/valid_zones
    /sys/devices/system/memory/memory32/valid_zones:Normal
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable

    $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
    $ grep . /sys/devices/system/memory/memory3?/valid_zones
    /sys/devices/system/memory/memory32/valid_zones:Normal
    /sys/devices/system/memory/memory33/valid_zones:Normal
    /sys/devices/system/memory/memory34/valid_zones:Normal Movable

    $ echo online_movable > /sys/devices/system/memory/memory34/state
    $ grep . /sys/devices/system/memory/memory3?/valid_zones
    /sys/devices/system/memory/memory32/valid_zones:Normal
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable
    /sys/devices/system/memory/memory34/valid_zones:Movable Normal

    This is an awkward semantic because a udev event is sent as soon as the
    block is onlined and a udev handler might want to online it based on
    some policy (e.g. association with a node), but it will inherently race
    with new blocks showing up.

    This patch changes the physical online phase to not associate pages with
    any zone at all. All the pages are just marked reserved and wait for
    the onlining phase to be associated with the zone as per the online
    request. There are only two requirements:

    - existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap

    - ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses

    The latter is not an inherent requirement and can be changed in the
    future; for now it preserves the current behavior and makes the code
    slightly simpler.
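
    The first requirement reduces to a range-overlap test; a sketch close
    to the zone_intersects() helper credited in the notes below
    (zone_is_empty() and zone_end_pfn() are existing zone helpers):

        static bool zone_intersects(struct zone *zone,
                                    unsigned long start_pfn, unsigned long nr_pages)
        {
            if (zone_is_empty(zone))
                return false;
            if (start_pfn >= zone_end_pfn(zone) ||
                start_pfn + nr_pages <= zone->zone_start_pfn)
                return false;

            return true;
        }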

    This means that the same physical online steps as above will lead to the
    following state:

    Normal Movable

    /sys/devices/system/memory/memory32/valid_zones:Normal Movable
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable

    /sys/devices/system/memory/memory32/valid_zones:Normal Movable
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable
    /sys/devices/system/memory/memory34/valid_zones:Normal Movable

    /sys/devices/system/memory/memory32/valid_zones:Normal Movable
    /sys/devices/system/memory/memory33/valid_zones:Normal Movable
    /sys/devices/system/memory/memory34/valid_zones:Movable

    Implementation:
    The current move_pfn_range is reimplemented to check the above
    requirements (allow_online_pfn_range) and then updates the respective
    zone (move_pfn_range_to_zone), the pgdat, and links all the pages in
    the pfn range with the zone/node. __add_pages is updated to not
    require the zone and only initializes sections in the range. This
    allowed us to simplify the arch_add_memory code (s390 could get rid of
    quite some code).

    devm_memremap_pages is the only user of arch_add_memory which relies on
    the zone association, because it hooks into memory hotplug only half
    way. It uses it to associate the new memory with ZONE_DEVICE but
    doesn't allow it to be {on,off}lined via sysfs. This means that this
    particular code path has to call move_pfn_range_to_zone explicitly.

    The original zone shifting code is kept in place and will be removed in
    the follow up patch for an easier review.

    Please note that this patch also changes the original behavior:
    offlining a memory block adjacent to another zone (Normal vs. Movable)
    used to allow changing its movable type. This will be handled later.

    [richard.weiyang@gmail.com: simplify zone_intersects()]
    Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
    [richard.weiyang@gmail.com: remove duplicate call for set_page_links]
    Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
    [akpm@linux-foundation.org: remove unused local `i']
    Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Wei Yang
    Tested-by: Dan Williams
    Tested-by: Reza Arbab
    Acked-by: Heiko Carstens # For s390 bits
    Acked-by: Vlastimil Babka
    Cc: Martin Schwidefsky
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Balbir Singh
    Cc: Daniel Kiper
    Cc: David Rientjes
    Cc: Igor Mammedov
    Cc: Jerome Glisse
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Tobias Regnery
    Cc: Toshi Kani
    Cc: Vitaly Kuznetsov
    Cc: Xishi Qiu
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • __pageblock_pfn_to_page has two users currently: set_zone_contiguous,
    which checks whether the given zone contains holes, and
    pageblock_pfn_to_page, which then carefully returns the first valid
    page from the given pfn range for the given zone. This doesn't
    handle zones which are not fully populated, though. Memory
    pageblocks can be offlined or might not have been onlined yet. In
    such a case the zone should be considered to have holes, otherwise
    pfn walkers can touch and play with offline pages.

    Current callers of pageblock_pfn_to_page in compaction seem to work
    properly right now because they only isolate PageBuddy
    (isolate_freepages_block) or PageLRU resp. __PageMovable
    (isolate_migratepages_block) pages, which will always be false for
    these pages. It would be safer to skip these pages altogether, though.

    In order to do this, the patch adds a new memory section state
    (SECTION_IS_ONLINE), which is set in memory_present (during boot) or
    in online_pages_range during memory hotplug. Similarly,
    offline_mem_sections clears the bit and is called when the memory
    range is offlined.

    A pfn_to_online_page helper is then added which checks the mem section
    and only returns a page if it is already online.

    Use the new helper in __pageblock_pfn_to_page and skip the whole page
    block in such a case.
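
    A sketch of the helper, close to the upstream shape (online_section_nr()
    tests the SECTION_IS_ONLINE bit; pfn_valid_within() handles holes inside
    a valid section):

        static inline struct page *pfn_to_online_page(unsigned long pfn)
        {
            unsigned long nr = pfn_to_section_nr(pfn);

            if (nr < NR_MEM_SECTIONS && online_section_nr(nr) &&
                pfn_valid_within(pfn))
                return pfn_to_page(pfn);

            return NULL;
        }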

    [mhocko@suse.com: check valid section number in pfn_to_online_page (Vlastimil),
    mark sections online after all struct pages are initialized in
    online_pages_range (Vlastimil)]
    Link: http://lkml.kernel.org/r/20170518164210.GD18333@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20170515085827.16474-8-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Balbir Singh
    Cc: Dan Williams
    Cc: Daniel Kiper
    Cc: David Rientjes
    Cc: Heiko Carstens
    Cc: Igor Mammedov
    Cc: Jerome Glisse
    Cc: Joonsoo Kim
    Cc: Martin Schwidefsky
    Cc: Mel Gorman
    Cc: Reza Arbab
    Cc: Tobias Regnery
    Cc: Toshi Kani
    Cc: Vitaly Kuznetsov
    Cc: Xishi Qiu
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • There are a number of times that we loop over NR_MEM_SECTIONS, looking
    for section_present() on each section. But, when we have very large
    physical address spaces (large MAX_PHYSMEM_BITS), NR_MEM_SECTIONS
    becomes very large, making the loops quite long.

    With MAX_PHYSMEM_BITS=46 and a section size of 128MB, the current loops
    are 512k iterations, which we barely notice on modern hardware. But
    raising MAX_PHYSMEM_BITS higher (as we will see on systems that
    support 5-level paging) makes this 64x longer and we start to notice,
    especially on slower systems like simulators. A 10-second delay for
    512k iterations is annoying, but a 640-second delay is crippling.

    This does not help if we have extremely sparse physical address spaces,
    but those are quite rare. We expect that most of the "slow" systems
    where this matters will also be quite small and non-sparse.

    To fix this, we track the highest section we've ever encountered. This
    lets us know when we will *never* see another section_present(), and
    lets us break out of the loops earlier.

    Doing the whole for_each_present_section_nr() macro is probably
    overkill, but it will ensure that any future loop iterations that we
    grow are more likely to be correct.
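
    A sketch of the bookkeeping and the macro, assuming the helper names
    used upstream (next_present_section_nr() scans forward and returns -1
    when nothing is left):

        static int __highest_present_section_nr;

        static void section_mark_present(struct mem_section *ms)
        {
            int section_nr = __section_nr(ms);

            if (section_nr > __highest_present_section_nr)
                __highest_present_section_nr = section_nr;

            ms->section_mem_map |= SECTION_MARKED_PRESENT;
        }

        /* stop scanning once we pass the highest present section */
        #define for_each_present_section_nr(start, section_nr)        \
            for (section_nr = next_present_section_nr((start) - 1);   \
                 section_nr != -1 &&                                  \
                 section_nr <= __highest_present_section_nr;          \
                 section_nr = next_present_section_nr(section_nr))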

    Kirill said: "It shaved almost 40 seconds from boot time in qemu with
    5-level paging enabled for me".

    Link: http://lkml.kernel.org/r/20170504174434.C45A4735@viggo.jf.intel.com
    Signed-off-by: Dave Hansen
    Tested-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

04 May, 2017

1 commit

  • The current implementation calculates usemap_size in two steps:
    * calculate the number of bytes to cover these bits
    * calculate the number of "unsigned long" to cover these bytes

    It would be clearer to:
    * calculate the number of "unsigned long" to cover these bits
    * multiply it by sizeof(unsigned long)

    This patch refines usemap_size() a little to make it easier to
    understand.
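
    A self-contained model of the two calculations (the bit count here is
    an arbitrary stand-in for SECTION_BLOCKFLAGS_BITS; both routes round
    up to whole unsigned longs, so the result is identical):

        #include <stdio.h>

        #define BITS 100    /* stand-in for SECTION_BLOCKFLAGS_BITS */
        #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

        int main(void)
        {
            /* old: bits -> bytes, then bytes -> longs */
            unsigned long bytes = DIV_ROUND_UP(BITS, 8);
            unsigned long old = DIV_ROUND_UP(bytes, sizeof(long)) * sizeof(long);

            /* new: bits -> longs directly, scaled by sizeof(long) */
            unsigned long new = DIV_ROUND_UP(BITS, 8 * sizeof(long)) * sizeof(long);

            printf("old=%lu new=%lu\n", old, new);    /* both 16 on LP64 */
            return 0;
        }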

    Link: http://lkml.kernel.org/r/20170310043713.96871-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     

23 Feb, 2017

2 commits

  • To identify that page-table pages are allocated from the bootmem
    allocator, a magic number is set in page->lru.next.

    But the page->lru list is initialized in reserve_bootmem_region(), so
    when calling free_pagetable() the function cannot find the magic
    number of the pages, and free_pagetable() frees the pages via
    free_reserved_page(), not put_page_bootmem().

    But if the pages are allocated from the bootmem allocator and used as
    page tables, the pages have the private flag set. So before freeing
    the pages, we should clear the private flag via put_page_bootmem().

    Before applying the commit 7bfec6f47bb0 ("mm, page_alloc: check multiple
    page fields with a single branch"), we could find the following visible
    issue:

    BUG: Bad page state in process kworker/u1024:1
    page:ffffea103cfd8040 count:0 mapcount:0 mappi
    flags: 0x6fffff80000800(private)
    page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
    bad because of flags: 0x800(private)

    Call Trace:
    [...] dump_stack+0x63/0x87
    [...] bad_page+0x114/0x130
    [...] free_pages_prepare+0x299/0x2d0
    [...] free_hot_cold_page+0x31/0x150
    [...] __free_pages+0x25/0x30
    [...] free_pagetable+0x6f/0xb4
    [...] remove_pagetable+0x379/0x7ff
    [...] vmemmap_free+0x10/0x20
    [...] sparse_remove_one_section+0x149/0x180
    [...] __remove_pages+0x2e9/0x4f0
    [...] arch_remove_memory+0x63/0xc0
    [...] remove_memory+0x8c/0xc0
    [...] acpi_memory_device_remove+0x79/0xa5
    [...] acpi_bus_trim+0x5a/0x8d
    [...] acpi_bus_trim+0x38/0x8d
    [...] acpi_device_hotplug+0x1b7/0x418
    [...] acpi_hotplug_work_fn+0x1e/0x29
    [...] process_one_work+0x152/0x400
    [...] worker_thread+0x125/0x4b0
    [...] kthread+0xd8/0xf0
    [...] ret_from_fork+0x22/0x40

    And the issue still silently occurs.

    Until the page-table pages allocated from the bootmem allocator are
    freed, page->freelist is never used. So the patch sets the magic
    number in page->freelist instead of page->lru.next.
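
    A sketch of the change in the shape of get_page_bootmem() (assumed
    here; the key line is the store into page->freelist):

        void get_page_bootmem(unsigned long info, struct page *page,
                              unsigned long type)
        {
            page->freelist = (void *)type;    /* was: page->lru.next */
            SetPagePrivate(page);
            set_page_private(page, info);
            page_ref_inc(page);
        }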

    [isimatu.yasuaki@jp.fujitsu.com: fix merge issue]
    Link: http://lkml.kernel.org/r/722b1cc4-93ac-dd8b-2be2-7a7e313b3b0b@gmail.com
    Link: http://lkml.kernel.org/r/2c29bd9f-5b67-02d0-18a3-8828e78bbb6f@gmail.com
    Signed-off-by: Yasuaki Ishimatsu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • free_map_bootmem() uses page->private directly to set the
    removing_section_nr argument, but the page_private() accessor is
    provided for reading page->private.

    So free_map_bootmem() should use page_private() instead of
    page->private.
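
    The accessor is a trivial wrapper (shape as in include/linux/mm.h), so
    the change is mechanical:

        #define page_private(page)    ((page)->private)

        /* in free_map_bootmem(): */
        removing_section_nr = page_private(page);    /* was: page->private */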

    Link: http://lkml.kernel.org/r/1d34eaa5-a506-8b7a-6471-490c345deef8@gmail.com
    Signed-off-by: Yasuaki Ishimatsu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     

03 Aug, 2016

1 commit

  • There was only one use each of __initdata_refok and __exit_refok.

    __init_refok was used 46 times, against 82 uses of __ref.

    Those definitions are obsolete since commit 312b1485fb50 ("Introduce new
    section reference annotations tags: __ref, __refdata, __refconst")

    This patch removes the following compatibility definitions and replaces
    them treewide.

    /* compatibility defines */
    #define __init_refok __ref
    #define __initdata_refok __refdata
    #define __exit_refok __ref

    I can also provide separate patches if necessary.
    (One patch per tree and check in 1 month or 2 to remove old definitions)

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Cc: Ingo Molnar
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

29 Jul, 2016

1 commit

  • When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr can get the
    section number with a subtraction directly.
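
    A sketch of the shortcut: without SPARSEMEM_EXTREME, mem_section is a
    flat static array, so the descriptor's index is plain pointer
    arithmetic:

        #ifndef CONFIG_SPARSEMEM_EXTREME
        int __section_nr(struct mem_section *ms)
        {
            return (int)(ms - mem_section[0]);
        }
        #endif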

    Link: http://lkml.kernel.org/r/1468988310-11560-1-git-send-email-zhouchengming1@huawei.com
    Signed-off-by: Zhou Chengming
    Cc: Dave Hansen
    Cc: Tejun Heo
    Cc: Hanjun Guo
    Cc: Li Bin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhou Chengming
     

18 Mar, 2016

2 commits

  • Most of the mm subsystem uses pr_<level>, so make it consistent.

    Miscellanea:

    - Realign arguments
    - Add missing newline to format
    - kmemleak-test.c has a "kmemleak: " prefix added to the
    "Kmemleak testing" logging message via pr_fmt

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Kernel style prefers a single string over split strings when the string is
    'user-visible'.

    Miscellanea:

    - Add a missing newline
    - Realign arguments

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

16 Jan, 2016

1 commit

  • In support of providing struct page for large persistent memory
    capacities, use struct vmem_altmap to change the default policy for
    allocating memory for the memmap array. The default vmemmap_populate()
    allocates page table storage area from the page allocator. Given
    persistent memory capacities relative to DRAM it may not be feasible to
    store the memmap in 'System Memory'. Instead vmem_altmap represents
    pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
    requests.
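
    The shape of the policy structure, as introduced by this series
    (fields abridged):

        struct vmem_altmap {
            const unsigned long base_pfn;    /* first pfn of the device range */
            const unsigned long reserve;     /* pages to skip at the start */
            unsigned long free;              /* device pages still available */
            unsigned long align;
            unsigned long alloc;             /* device pages already handed out */
        };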

    Signed-off-by: Dan Williams
    Reported-by: kbuild test robot
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

08 Apr, 2014

1 commit

  • To increase compiler portability there is <linux/compiler.h>, which
    provides convenience macros for various gcc constructs, e.g. __weak
    for __attribute__((weak)). I've replaced all instances of gcc
    attributes with the right macro in the memory management (/mm)
    subsystem.

    [akpm@linux-foundation.org: while-we're-there consistency tweaks]
    Signed-off-by: Gideon Israel Dsouza
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gideon Israel Dsouza
     


22 Jan, 2014

1 commit

  • Switch to memblock interfaces for the early memory allocator instead
    of the bootmem allocator. No functional change in behavior from the
    bootmem users' point of view.

    Archs already converted to NO_BOOTMEM now directly use memblock
    interfaces instead of bootmem wrappers built on top of memblock. For
    the archs which still use bootmem, these new APIs just fall back to
    the existing bootmem APIs.
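
    An illustrative before/after of one such call site (hedged: exact
    sites and wrapper names vary across the series;
    memblock_virt_alloc_node_nopanic() is one of the wrappers it adds):

        /* before: bootmem wrapper */
        usemap = alloc_bootmem_node(NODE_DATA(nid), size);

        /* after: memblock API directly */
        usemap = memblock_virt_alloc_node_nopanic(size, nid);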

    Signed-off-by: Santosh Shilimkar
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: Grygorii Strashko
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tejun Heo
    Cc: Tony Lindgren
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Santosh Shilimkar
     

13 Nov, 2013

2 commits

  • We pass the number of pages which hold the page structs of a memory
    section to free_map_bootmem(). This is right when
    !CONFIG_SPARSEMEM_VMEMMAP, but wrong when CONFIG_SPARSEMEM_VMEMMAP:
    in that case we should pass the number of pages of a memory section
    to free_map_bootmem().

    So the fix is removing the nr_pages parameter. When
    CONFIG_SPARSEMEM_VMEMMAP, we directly use the predefined macro
    PAGES_PER_SECTION in free_map_bootmem(). When
    !CONFIG_SPARSEMEM_VMEMMAP, we calculate the number of pages needed to
    hold the page structs for a memory section and use that value in
    free_map_bootmem().

    This was found by reading the code, and I have no machine that
    supports memory hot-remove to test the bug on right now.

    Signed-off-by: Zhang Yanfei
    Reviewed-by: Wanpeng Li
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Toshi Kani
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • The following functions are always invoked to operate on one memory
    section:

    - sparse_add_one_section()
    - kmalloc_section_memmap()
    - __kmalloc_section_memmap()
    - __kfree_section_memmap()

    So it is redundant to always pass an nr_pages parameter, which is the
    number of pages in one section. We can directly use the predefined
    macro PAGES_PER_SECTION instead of passing the parameter.
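
    A sketch of the signature change for one of them (the slab-backed
    variant; the vmalloc fallback in the real function is omitted):

        /* was: __kmalloc_section_memmap(unsigned long nr_pages) */
        static struct page *__kmalloc_section_memmap(void)
        {
            return kmalloc(sizeof(struct page) * PAGES_PER_SECTION, GFP_KERNEL);
        }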

    Signed-off-by: Zhang Yanfei
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Toshi Kani
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     

12 Sep, 2013

1 commit

  • After commit 9bdac9142407 ("sparsemem: Put mem map for one node
    together."), vmemmap for one node is allocated together; its logic
    is similar to that of memory allocation for pageblock flags. This
    patch introduces alloc_usemap_and_memmap to extract the shared
    memory allocation logic for pageblock flags and vmemmap.

    Signed-off-by: Wanpeng Li
    Cc: Dave Hansen
    Cc: Rik van Riel
    Cc: Fengguang Wu
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Cc: Yasuaki Ishimatsu
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Jiri Kosina
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

10 Jul, 2013

1 commit

  • With CONFIG_MEMORY_HOTREMOVE unset, there is a compile warning:

    mm/sparse.c:755: warning: `clear_hwpoisoned_pages' defined but not used

    Bisecting ended up pointing to commit 4edd7ceff ("mm, hotplug: avoid
    compiling memory hotremove functions when disabled").

    This is because the commit above put sparse_remove_one_section() within
    the protection of CONFIG_MEMORY_HOTREMOVE, but the only user of
    clear_hwpoisoned_pages() is sparse_remove_one_section(), and it is not
    within the protection of CONFIG_MEMORY_HOTREMOVE.

    So putting clear_hwpoisoned_pages() within CONFIG_MEMORY_HOTREMOVE
    should fix the warning.

    Signed-off-by: Zhang Yanfei
    Cc: David Rientjes
    Acked-by: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     

05 Jul, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "The usual stuff from trivial tree"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    treewide: relase -> release
    Documentation/cgroups/memory.txt: fix stat file documentation
    sysctl/net.txt: delete reference to obsolete 2.4.x kernel
    spinlock_api_smp.h: fix preprocessor comments
    treewide: Fix typo in printk
    doc: device tree: clarify stuff in usage-model.txt.
    open firmware: "/aliasas" -> "/aliases"
    md: bcache: Fixed a typo with the word 'arithmetic'
    irq/generic-chip: fix a few kernel-doc entries
    frv: Convert use of typedef ctl_table to struct ctl_table
    sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
    doc: clk: Fix incorrect wording
    Documentation/arm/IXP4xx fix a typo
    Documentation/networking/ieee802154 fix a typo
    Documentation/DocBook/media/v4l fix a typo
    Documentation/video4linux/si476x.txt fix a typo
    Documentation/virtual/kvm/api.txt fix a typo
    Documentation/early-userspace/README fix a typo
    Documentation/video4linux/soc-camera.txt fix a typo
    lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
    ...

    Linus Torvalds
     

04 Jul, 2013

1 commit

  • Instead of leaving a hidden trap for the next person who comes along and
    wants to add something to mem_section, add a big fat warning about it
    needing to be a power-of-2, and insert a BUILD_BUG_ON() in sparse_init()
    to catch mistakes.

    Right now non-power-of-2 mem_sections cause a number of WARNs at boot
    (which don't clearly point to the size of mem_section as an issue), but
    the system limps on (temporarily, at least).
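
    The guard itself is a single line in sparse_init() (sketch;
    is_power_of_2() comes from <linux/log2.h>):

        /* fail the build, not the boot, on a non-power-of-2 mem_section */
        BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));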

    This is based upon Dave Hansen's earlier RFC where he ran into the same
    issue:
    "sparsemem: fix boot when SECTIONS_PER_ROOT is not power-of-2"
    http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/03077.html

    Signed-off-by: Cody P Schafer
    Acked-by: Dave Hansen
    Cc: Jiang Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     


30 Apr, 2013

2 commits

  • __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC
    pseries will return -EOPNOTSUPP if unsupported.

    Adding an #ifdef causes several other functions it depends on to also
    become unnecessary, which saves .text space when disabled (it's
    disabled in most defconfigs besides powerpc, including x86).
    remove_memory_block() becomes static since it is not referenced
    outside of drivers/base/memory.c.

    Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
    and disabled.

    Signed-off-by: David Rientjes
    Acked-by: Toshi Kani
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Greg Kroah-Hartman
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The sparse code, when asking the architecture to populate the vmemmap,
    specifies the section range as a starting page and a number of pages.

    This is an awkward interface, because none of the arch-specific code
    actually thinks of the range in terms of 'struct page' units and always
    translates it to bytes first.

    In addition, later patches mix huge page and regular page backing for
    the vmemmap. For this, they need to call vmemmap_populate_basepages()
    on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these
    are not necessarily multiples of the 'struct page' size and so this unit
    is too coarse.

    Just translate the section range into bytes once in the generic sparse
    code, then pass byte ranges down the stack.
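
    A sketch of the resulting call shape in the generic sparse code,
    assuming the upstream names (vmemmap_populate() now takes a [start,
    end) virtual address range):

        struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
        {
            struct page *map = pfn_to_page(pnum * PAGES_PER_SECTION);
            unsigned long start = (unsigned long)map;
            unsigned long end = (unsigned long)(map + PAGES_PER_SECTION);

            if (vmemmap_populate(start, end, nid))
                return NULL;

            return map;
        }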

    Signed-off-by: Johannes Weiner
    Cc: Ben Hutchings
    Cc: Bernhard Schmidt
    Cc: Johannes Weiner
    Cc: Russell King
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Cc: Heiko Carstens
    Acked-by: David S. Miller
    Tested-by: David S. Miller
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

24 Feb, 2013

4 commits

  • Since MCE is an x86 concept, and this code is in mm/, it would be better
    to use the name num_poisoned_pages instead of mce_bad_pages.

    [akpm@linux-foundation.org: fix mm/sparse.c]
    Signed-off-by: Xishi Qiu
    Signed-off-by: Jiang Liu
    Suggested-by: Borislav Petkov
    Reviewed-by: Wanpeng Li
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • The usemap could also be allocated as compound pages, so we should
    also consider compound pages when freeing the memmap.

    If we don't fix it, there could be problems when we free vmemmap
    pagetables which are stored in compound pages: the old pagetables will
    not be freed properly, and when we add the memory again, no new
    pagetable will be created. Then, when the old pagetable entry is
    used, the kernel will panic.

    The call trace is like the following:

    BUG: unable to handle kernel paging request at ffffea0040000000
    IP: [] sparse_add_one_section+0xef/0x166
    PGD 7ff7d4067 PUD 78e035067 PMD 78e11d067 PTE 0
    Oops: 0002 [#1] SMP
    Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg lpc_ich mfd_core i2c_i801 i2c_core i7core_edac edac_core ioatdma e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
    CPU 0
    Pid: 4, comm: kworker/0:0 Tainted: G W 3.8.0-rc3-phy-hot-remove+ #3 FUJITSU-SV PRIMEQUEST 1800E/SB
    RIP: 0010:[] [] sparse_add_one_section+0xef/0x166
    RSP: 0018:ffff8807bdcb35d8 EFLAGS: 00010006
    RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000200000
    RDX: ffff88078df01148 RSI: 0000000000000282 RDI: ffffea0040000000
    RBP: ffff8807bdcb3618 R08: 4cf05005b019467a R09: 0cd98fa09631467a
    R10: 0000000000000000 R11: 0000000000030e20 R12: 0000000000008000
    R13: ffffea0040000000 R14: ffff88078df66248 R15: ffff88078ea13b10
    FS: 0000000000000000(0000) GS:ffff8807c1a00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffea0040000000 CR3: 0000000001c0c000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kworker/0:0 (pid: 4, threadinfo ffff8807bdcb2000, task ffff8807bde18000)
    Call Trace:
    __add_pages+0x85/0x120
    arch_add_memory+0x71/0xf0
    add_memory+0xd6/0x1f0
    acpi_memory_device_add+0x170/0x20c
    acpi_device_probe+0x50/0x18a
    really_probe+0x6c/0x320
    driver_probe_device+0x47/0xa0
    __device_attach+0x53/0x60
    bus_for_each_drv+0x6c/0xa0
    device_attach+0xa8/0xc0
    bus_probe_device+0xb0/0xe0
    device_add+0x301/0x570
    device_register+0x1e/0x30
    acpi_device_register+0x1d8/0x27c
    acpi_add_single_object+0x1df/0x2b9
    acpi_bus_check_add+0x112/0x18f
    acpi_ns_walk_namespace+0x105/0x255
    acpi_walk_namespace+0xcf/0x118
    acpi_bus_scan+0x5b/0x7c
    acpi_bus_add+0x2a/0x2c
    container_notify_cb+0x112/0x1a9
    acpi_ev_notify_dispatch+0x46/0x61
    acpi_os_execute_deferred+0x27/0x34
    process_one_work+0x20e/0x5c0
    worker_thread+0x12e/0x370
    kthread+0xee/0x100
    ret_from_fork+0x7c/0xb0
    Code: 00 00 48 89 df 48 89 45 c8 e8 3e 71 b1 ff 48 89 c2 48 8b 75 c8 b8 ef ff ff ff f6 02 01 75 4b 49 63 cc 31 c0 4c 89 ef 48 c1 e1 06 aa 48 8b 02 48 83 c8 01 48 85 d2 48 89 02 74 29 a8 01 74 25
    RIP [] sparse_add_one_section+0xef/0x166
    RSP
    CR2: ffffea0040000000
    ---[ end trace e7f94e3a34c442d4 ]---
    Kernel panic - not syncing: Fatal exception

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • Introduce a new API vmemmap_free() to free and remove vmemmap
    pagetables. Since pagetable implementations differ, each architecture
    has to provide its own version of vmemmap_free(), just like
    vmemmap_populate().

    Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.

    [mhocko@suse.cz: fix implicit declaration of remove_pagetable]
    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Jianguo Wu
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • In __remove_section(), we locked pgdat_resize_lock when calling
    sparse_remove_one_section(). This lock disables irqs, but we don't
    need to lock the whole function. If we do some work to free
    pagetables in free_section_usemap(), we need to call flush_tlb_all(),
    which needs irqs enabled; otherwise the WARN_ON_ONCE() in
    smp_call_function_many() will be triggered.

    If we lock the whole of sparse_remove_one_section(), we get this call
    trace:

    ------------[ cut here ]------------
    WARNING: at kernel/smp.c:461 smp_call_function_many+0xbd/0x260()
    Hardware name: PRIMEQUEST 1800E
    ......
    Call Trace:
    smp_call_function_many+0xbd/0x260
    smp_call_function+0x3b/0x50
    on_each_cpu+0x3b/0xc0
    flush_tlb_all+0x1c/0x20
    remove_pagetable+0x14e/0x1d0
    vmemmap_free+0x18/0x20
    sparse_remove_one_section+0xf7/0x100
    __remove_section+0xa2/0xb0
    __remove_pages+0xa0/0xd0
    arch_remove_memory+0x6b/0xc0
    remove_memory+0xb8/0xf0
    acpi_memory_device_remove+0x53/0x96
    acpi_device_remove+0x90/0xb2
    __device_release_driver+0x7c/0xf0
    device_release_driver+0x2f/0x50
    acpi_bus_remove+0x32/0x6d
    acpi_bus_trim+0x91/0x102
    acpi_bus_hot_remove_device+0x88/0x16b
    acpi_os_execute_deferred+0x27/0x34
    process_one_work+0x20e/0x5c0
    worker_thread+0x12e/0x370
    kthread+0xee/0x100
    ret_from_fork+0x7c/0xb0
    ---[ end trace 25e85300f542aa01 ]---

    Signed-off-by: Tang Chen
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

12 Dec, 2012

2 commits

  • If sparse memory vmemmap is enabled, we can't free the memory used to
    store struct pages when a memory device is hot-removed, because we may
    store struct pages in that memory to manage memory which doesn't
    belong to this device. When we hot-add this memory device again, we
    will reuse this memory to store struct pages, and the struct pages may
    contain some obsolete information, so we will get bad-page state:

    init_memory_mapping: [mem 0x80000000-0x9fffffff]
    Built 2 zonelists in Node order, mobility grouping on. Total pages: 547617
    Policy zone: Normal
    BUG: Bad page state in process bash pfn:9b6dc
    page:ffffea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd
    page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
    Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
    Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
    Call Trace:
    [] ? bad_page+0xb0/0x100
    [] ? free_pages_prepare+0xb3/0x100
    [] ? free_hot_cold_page+0x48/0x1a0
    [] ? online_pages_range+0x68/0xa0
    [] ? __online_page_increment_counters+0x10/0x10
    [] ? walk_system_ram_range+0x101/0x110
    [] ? online_pages+0x1a5/0x2b0
    [] ? __memory_block_change_state+0x20d/0x270
    [] ? store_mem_state+0xb6/0xf0
    [] ? sysfs_write_file+0xd2/0x160
    [] ? vfs_write+0xaa/0x160
    [] ? sys_write+0x47/0x90
    [] ? async_page_fault+0x25/0x30
    [] ? system_call_fastpath+0x16/0x1b
    Disabling lock debugging due to kernel taint

    This patch clears the memory used to store struct pages to avoid such
    unexpected errors.

    Signed-off-by: Wen Congyang
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Minchan Kim
    Acked-by: KOSAKI Motohiro
    Cc: Yasuaki Ishimatsu
    Reported-by: Vasilis Liaskovitis
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • When we hot-remove a memory device, we will free the memory used to
    store struct pages. If a page is a hwpoisoned page, we should
    decrease mce_bad_pages.

    [akpm@linux-foundation.org: cleanup ifdefs]
    Signed-off-by: Wen Congyang
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Len Brown
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Christoph Lameter
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Yasuaki Ishimatsu
    Cc: Dave Hansen
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     

01 Dec, 2012

1 commit

  • With CONFIG_DEBUG_VIRTUAL and CONFIG_SPARSEMEM_VMEMMAP enabled, doing
    memory hotremove hits a kernel BUG at arch/x86/mm/physaddr.c:20.

    It is caused by free_section_usemap()->virt_to_page(): virt_to_page()
    is only valid for kernel direct-mapping addresses, but sparse-vmemmap
    uses vmemmap addresses, so it goes wrong here.

    ------------[ cut here ]------------
    kernel BUG at arch/x86/mm/physaddr.c:20!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: acpihp_drv acpihp_slot edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse vfat fat loop dm_mod coretemp kvm crc32c_intel ipv6 ixgbe igb iTCO_wdt i7core_edac edac_core pcspkr iTCO_vendor_support ioatdma microcode joydev sr_mod i2c_i801 dca lpc_ich mfd_core mdio tpm_tis i2c_core hid_generic tpm cdrom sg tpm_bios rtc_cmos button ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh ata_generic ata_piix libata megaraid_sas scsi_mod
    CPU 39
    Pid: 6454, comm: sh Not tainted 3.7.0-rc1-acpihp-final+ #45 QCI QSSC-S4R/QSSC-S4R
    RIP: 0010:[] [] __phys_addr+0x88/0x90
    RSP: 0018:ffff8804440d7c08 EFLAGS: 00010006
    RAX: 0000000000000006 RBX: ffffea0012000000 RCX: 000000000000002c
    ...

    Signed-off-by: Jianguo Wu
    Signed-off-by: Jiang Liu
    Reviewed-by: Wen Congyang
    Acked-by: Johannes Weiner
    Reviewed-by: Yasuaki Ishimatsu
    Reviewed-by: Michal Hocko
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
     

01 Aug, 2012

3 commits

  • sparse_index_init() uses the index_init_lock spinlock to protect root
    mem_section assignment. The lock is not necessary anymore because the
    function is called only during boot (during paging init, which is
    executed on a single CPU) and from the hotplug code (by add_memory()
    via arch_add_memory()), which uses mem_hotplug_mutex.

    The lock was introduced by 28ae55c9 ("sparsemem extreme: hotplug
    preparation") and sparse_index_init() was used only during boot at that
    time.

    Later, when the hotplug code (and add_memory()) was introduced, there
    was no synchronization, so it was probably possible to online more
    sections from the same root (though I am not 100% sure about that).
    The first synchronization was added by 6ad696d2 ("mm: allow memory
    hotplug and hibernation in the same kernel"), which was later replaced
    by the mem_hotplug_mutex - 20d6c96b ("mem-hotplug: introduce
    {un}lock_memory_hotplug()").

    Let's remove the lock, as it is not needed and only makes the code
    more confusing.

    [mhocko@suse.cz: changelog]
    Signed-off-by: Gavin Shan
    Reviewed-by: Michal Hocko
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • __section_nr() was implemented to retrieve the corresponding memory
    section number from its descriptor. It's possible that the specified
    memory section descriptor doesn't exist in the global array, so add
    more checking on that and report an error for the failing case.

    Signed-off-by: Gavin Shan
    Acked-by: David Rientjes
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • With CONFIG_SPARSEMEM_EXTREME, the two levels of memory section
    descriptors are allocated from slab or bootmem. When allocating from
    slab, the slab/bootmem allocator already clears the memory chunk; we
    needn't clear it explicitly.
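
    A sketch of the slab branch of sparse_index_alloc() after the cleanup;
    the zeroing allocator makes an explicit memset redundant:

        if (slab_is_available()) {
            /* kzalloc_node() already returns zeroed memory ... */
            section = kzalloc_node(array_size, GFP_KERNEL, nid);
        } else {
            /* ... as does the bootmem allocator */
            section = alloc_bootmem_node(NODE_DATA(nid), array_size);
        }
        /* removed: if (section) memset(section, 0, array_size); */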

    Signed-off-by: Gavin Shan
    Reviewed-by: Michal Hocko
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan