10 Jul, 2013

2 commits


04 Jul, 2013

8 commits

  • Merge first patch-bomb from Andrew Morton:
    - various misc bits
    - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
    distracted. There has been quite a bit of activity.
    - About half the MM queue
    - Some backlight bits
    - Various lib/ updates
    - checkpatch updates
    - zillions more little rtc patches
    - ptrace
    - signals
    - exec
    - procfs
    - rapidio
    - nbd
    - aoe
    - pps
    - memstick
    - tools/testing/selftests updates

    * emailed patches from Andrew Morton : (445 commits)
    tools/testing/selftests: don't assume the x bit is set on scripts
    selftests: add .gitignore for kcmp
    selftests: fix clean target in kcmp Makefile
    selftests: add .gitignore for vm
    selftests: add hugetlbfstest
    self-test: fix make clean
    selftests: exit 1 on failure
    kernel/resource.c: remove the unneeded assignment in function __find_resource
    aio: fix wrong comment in aio_complete()
    drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
    drivers/memstick/host/r592.c: convert to module_pci_driver
    drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
    pps-gpio: add device-tree binding and support
    drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
    drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
    drivers/parport/share.c: use kzalloc
    Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
    aoe: update internal version number to v83
    aoe: update copyright date
    aoe: perform I/O completions in parallel
    ...

    Linus Torvalds
     
  • Prepare for removing num_physpages.

    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Enhance adjust_managed_page_count() to adjust totalhigh_pages for
    highmem pages. And change code which directly adjusts totalram_pages to
    use adjust_managed_page_count() because it adjusts totalram_pages,
    totalhigh_pages and zone->managed_pages altogether in a safe way.

    Remove inc_totalhigh_pages() and dec_totalhigh_pages() from xen/balloon
    driver bacause adjust_managed_page_count() has already adjusted
    totalhigh_pages.

    This patch also fixes two bugs:

    1) enhances virtio_balloon driver to adjust totalhigh_pages when
    reserve/unreserve pages.
    2) enhance memory_hotplug.c to adjust totalhigh_pages when hot-removing
    memory.

    We still need to deal with modifications of totalram_pages in file
    arch/powerpc/platforms/pseries/cmm.c, but need help from PPC experts.

    [akpm@linux-foundation.org: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei]
    [akpm@linux-foundation.org: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c]
    Signed-off-by: Jiang Liu
    Cc: Chris Metcalf
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: "H. Peter Anvin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yinghai Lu
    Cc: Russell King
    Cc: Sergei Shtylyov
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • In order to simpilify management of totalram_pages and
    zone->managed_pages, make __free_pages_bootmem() only available at boot
    time. With this change applied, __free_pages_bootmem() will only be
    used by bootmem.c and nobootmem.c at boot time, so mark it as __init.
    Other callers of __free_pages_bootmem() have been converted to use
    free_reserved_page(), which handles totalram_pages and
    zone->managed_pages in a safer way.

    This patch also fix a bug in free_pagetable() for x86_64, which should
    increase zone->managed_pages instead of zone->present_pages when freeing
    reserved pages.

    And now we have managed_pages_count_lock to protect totalram_pages and
    zone->managed_pages, so remove the redundant ppb_lock lock in
    put_page_bootmem(). This greatly simplifies the locking rules.

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Will Deacon
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Fix some trivial typos in comments.

    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Marek Szyprowski
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • During early boot-up, iomem_resource is set up from the boot descriptor
    table, such as EFI Memory Table and e820. Later,
    acpi_memory_device_add() calls add_memory() for each ACPI memory device
    object as it enumerates ACPI namespace. This add_memory() call is
    expected to fail in register_memory_resource() at boot since
    iomem_resource has been set up from EFI/e820. As a result, add_memory()
    returns -EEXIST, which acpi_memory_device_add() handles as the normal
    case.

    This scheme works fine, but the following error message is logged for
    every ACPI memory device object during boot-up.

    "System RAM resource %pR cannot be added\n"

    This patch changes register_memory_resource() to use pr_debug() for the
    message as it shows up under the normal case.

    Signed-off-by: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as
    follows:

    * Must be held any time you expect node_start_pfn, node_present_pages
    * or node_spanned_pages stay constant. [...]

    So actually hold it when we update node_present_pages in __offline_pages().

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Cody P Schafer
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as
    follows:

    * Must be held any time you expect node_start_pfn, node_present_pages
    * or node_spanned_pages stay constant. [...]

    So actually hold it when we update node_present_pages in online_pages().

    Signed-off-by: Cody P Schafer
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     

28 Jun, 2013

1 commit

  • * acpi-hotplug:
    ACPI: Do not use CONFIG_ACPI_HOTPLUG_MEMORY_MODULE
    ACPI / cpufreq: Add ACPI processor device IDs to acpi-cpufreq
    Memory hotplug: Move alternative function definitions to header
    ACPI / processor: Fix potential NULL pointer dereference in acpi_processor_add()
    Memory hotplug / ACPI: Simplify memory removal
    ACPI / scan: Add second pass of companion offlining to hot-remove code
    Driver core / MM: Drop offline_memory_block()
    ACPI / processor: Pass processor object handle to acpi_bind_one()
    ACPI: Drop removal_type field from struct acpi_device
    Driver core / memory: Simplify __memory_block_change_state()
    ACPI / processor: Initialize per_cpu(processors, pr->id) properly
    CPU: Fix sysfs cpu/online of offlined CPUs
    Driver core: Introduce offline/online callbacks for memory blocks
    ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
    ACPI / processor: Use common hotplug infrastructure
    ACPI / hotplug: Use device offline/online for graceful hot-removal
    Driver core: Use generic offline/online for CPU offline/online
    Driver core: Add offline/online device operations

    Rafael J. Wysocki
     

02 Jun, 2013

3 commits

  • Move the definitions of offline_pages() and remove_memory()
    for CONFIG_MEMORY_HOTREMOVE to memory_hotplug.h, where they belong,
    and make them static inline.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Now that the memory offlining should be taken care of by the
    companion device offlining code in acpi_scan_hot_remove(), the
    ACPI memory hotplug driver doesn't need to offline it in
    remove_memory() any more. Moreover, since the return value of
    remove_memory() is not used, it's better to make it be a void
    function and trigger a BUG() if the memory scheduled for removal is
    not offline.

    Change the code in accordance with the above observations.

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Toshi Kani

    Rafael J. Wysocki
     
  • Since offline_memory_block(mem) is functionally equivalent to
    device_offline(&mem->dev), make the only caller of the former use
    the latter instead and drop offline_memory_block() entirely.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman
    Acked-by: Toshi Kani

    Rafael J. Wysocki
     

25 May, 2013

1 commit

  • Fix printk format warnings in mm/memory_hotplug.c by using "%pa":

    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat]
    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat]

    Signed-off-by: Randy Dunlap
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

12 May, 2013

1 commit

  • During ACPI memory hotplug configuration bind memory blocks residing
    in modules removable through the standard ACPI mechanism to struct
    acpi_device objects associated with ACPI namespace objects
    representing those modules. Accordingly, unbind those memory blocks
    from the struct acpi_device objects when the memory modules in
    question are being removed.

    When "offline" operation for devices representing memory blocks is
    introduced, this will allow the ACPI core's device hot-remove code to
    use it to carry out remove_memory() for those memory blocks and check
    the results of that before it actually removes the modules holding
    them from the system.

    Since walk_memory_range() is used for accessing all memory blocks
    corresponding to a given ACPI namespace object, it is exported from
    memory_hotplug.c so that the code in acpi_memhotplug.c can use it.

    Signed-off-by: Rafael J. Wysocki
    Tested-by: Vasilis Liaskovitis
    Reviewed-by: Toshi Kani

    Rafael J. Wysocki
     

30 Apr, 2013

4 commits

  • PFN_PHYS() is a phys_addr_t, which can be u32 or u64.
    Fix the build warning when phys_addr_t is u32.

    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'unsigned int' [-Wformat]: => 1685:3
    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'unsigned int' [-Wformat]: => 1685:3

    Signed-off-by: Randy Dunlap
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC
    pseries will return -EOPNOTSUPP if unsupported.

    Adding an #ifdef causes several other functions it depends on to also
    become unnecessary, which saves in .text when disabled (it's disabled in
    most defconfigs besides powerpc, including x86). remove_memory_block()
    becomes static since it is not referenced outside of
    drivers/base/memory.c.

    Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
    and disabled.

    Signed-off-by: David Rientjes
    Acked-by: Toshi Kani
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Greg Kroah-Hartman
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Change __remove_pages() to call release_mem_region_adjustable(). This
    allows a requested memory range to be released from the iomem_resource
    table even if it does not match exactly to an resource entry but still
    fits into. The resource entries initialized at bootup usually cover the
    whole contiguous memory ranges and may not necessarily match with the
    size of memory hot-delete requests.

    If release_mem_region_adjustable() failed, __remove_pages() emits a
    warning message and continues to proceed as it was the case with
    release_mem_region(). release_mem_region(), which is defined to
    __release_region(), emits a warning message and returns no error since a
    void function.

    Signed-off-by: Toshi Kani
    Reviewed-by : Yasuaki Ishimatsu
    Acked-by: David Rientjes
    Cc: Ram Pai
    Cc: T Makphaibulchoke
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Jiang Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Fix a typo "end_pft" in the comment of walk_memory_range().

    Signed-off-by: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

23 Mar, 2013

1 commit


14 Mar, 2013

1 commit

  • remove_memory() calls walk_memory_range() with [start_pfn, end_pfn), where
    end_pfn is exclusive in this range. Therefore, end_pfn needs to be set to
    the next page of the end address.

    Signed-off-by: Toshi Kani
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

24 Feb, 2013

18 commits

  • Replace open coded pgdat_end_pfn() with helper function.

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Remove open coding of ensure_zone_is_initialzied().

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • ensure_zone_is_initialized() checks if a zone is in a empty & not
    initialized state (typically occuring after it is created in memory
    hotplugging), and, if so, calls init_currently_empty_zone() to
    initialize the zone.

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Add 2 helpers (zone_end_pfn() and zone_spans_pfn()) to reduce code
    duplication.

    This also switches to using them in compaction (where an additional
    variable needed to be renamed), page_alloc, vmstat, memory_hotplug, and
    kmemleak.

    Note that in compaction.c I avoid calling zone_end_pfn() repeatedly
    because I expect at some point the sycronization issues with start_pfn &
    spanned_pages will need fixing, either by actually using the seqlock or
    clever memory barrier usage.

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • No functional change, but the only purpose of the offlining argument to
    migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a
    KSM page for memory hotremove (which took ksm_thread_mutex) but not for
    other callers. Now all cases are safe, remove the arg.

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Cc: Petr Holasek
    Cc: Andrea Arcangeli
    Cc: Izik Eidus
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Function put_page_bootmem() is used to free pages allocated by bootmem
    allocator, so it should increase totalram_pages when freeing pages into
    the buddy system.

    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Maciej Rutecki
    Cc: Chris Clayton
    Cc: "Rafael J . Wysocki"
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Jianguo Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • When the node is offlined, there is no memory/cpu on the node. If a
    sleep task runs on a cpu of this node, it will be migrated to the cpu on
    the other node. So we can clear cpu-to-node mapping.

    [akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit]
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • try_offline_node() will be needed in the tristate
    drivers/acpi/processor_driver.c.

    The node will be offlined when all memory/cpu on the node have been
    hotremoved. So we need the function try_offline_node() in cpu-hotplug
    path.

    If the memory-hotplug is disabled, and cpu-hotplug is enabled

    1. no memory no the node
    we don't online the node, and cpu's node is the nearest node.

    2. the node contains some memory
    the node has been onlined, and cpu's node is still needed
    to migrate the sleep task on the cpu to the same node.

    So we do nothing in try_offline_node() in this case.

    [rientjes@google.com: export the function try_offline_node() fix]
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Len Brown
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • Since there is no way to guarentee the address of pgdat/zone is not on
    stack of any kernel threads or used by other kernel objects without
    reference counting or other symchronizing method, we cannot reset
    node_data and free pgdat when offlining a node. Just reset pgdat to 0
    and reuse the memory when the node is online again.

    The problem is suggested by Kamezawa Hiroyuki. The idea is from Wen
    Congyang.

    NOTE: If we don't reset pgdat to 0, the WARN_ON in free_area_init_node()
    will be triggered.

    [akpm@linux-foundation.org: fix warning when CONFIG_NEED_MULTIPLE_NODES=n]
    [akpm@linux-foundation.org: fix the warning again again]
    Signed-off-by: Tang Chen
    Reviewed-by: Wen Congyang
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • We call hotadd_new_pgdat() to allocate memory to store node_data. So we
    should free it when removing a node.

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Reviewed-by: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • Introduce a new function try_offline_node() to remove sysfs file of node
    when all memory sections of this node are removed. If some memory
    sections of this node are not removed, this function does nothing.

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • When memory is added, we update zone's and pgdat's start_pfn and
    spanned_pages in __add_zone(). So we should revert them when the memory
    is removed.

    The patch adds a new function __remove_zone() to do this.

    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • Currently __remove_section for SPARSEMEM_VMEMMAP does nothing. But even
    if we use SPARSEMEM_VMEMMAP, we can unregister the memory_section.

    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • In __remove_section(), we locked pgdat_resize_lock when calling
    sparse_remove_one_section(). This lock will disable irq. But we don't
    need to lock the whole function. If we do some work to free pagetables
    in free_section_usemap(), we need to call flush_tlb_all(), which need
    irq enabled. Otherwise the WARN_ON_ONCE() in smp_call_function_many()
    will be triggered.

    If we lock the whole sparse_remove_one_section(), then we come to this call trace:

    ------------[ cut here ]------------
    WARNING: at kernel/smp.c:461 smp_call_function_many+0xbd/0x260()
    Hardware name: PRIMEQUEST 1800E
    ......
    Call Trace:
    smp_call_function_many+0xbd/0x260
    smp_call_function+0x3b/0x50
    on_each_cpu+0x3b/0xc0
    flush_tlb_all+0x1c/0x20
    remove_pagetable+0x14e/0x1d0
    vmemmap_free+0x18/0x20
    sparse_remove_one_section+0xf7/0x100
    __remove_section+0xa2/0xb0
    __remove_pages+0xa0/0xd0
    arch_remove_memory+0x6b/0xc0
    remove_memory+0xb8/0xf0
    acpi_memory_device_remove+0x53/0x96
    acpi_device_remove+0x90/0xb2
    __device_release_driver+0x7c/0xf0
    device_release_driver+0x2f/0x50
    acpi_bus_remove+0x32/0x6d
    acpi_bus_trim+0x91/0x102
    acpi_bus_hot_remove_device+0x88/0x16b
    acpi_os_execute_deferred+0x27/0x34
    process_one_work+0x20e/0x5c0
    worker_thread+0x12e/0x370
    kthread+0xee/0x100
    ret_from_fork+0x7c/0xb0
    ---[ end trace 25e85300f542aa01 ]---

    Signed-off-by: Tang Chen
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • For removing memmap region of sparse-vmemmap which is allocated bootmem,
    memmap region of sparse-vmemmap needs to be registered by
    get_page_bootmem(). So the patch searches pages of virtual mapping and
    registers the pages by get_page_bootmem().

    NOTE: register_page_bootmem_memmap() is not implemented for ia64,
    ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
    and revert register_page_bootmem_info_node() when platform doesn't
    support it.

    It's implemented by adding a new Kconfig option named
    CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
    by memory-hotplug feature fully supported archs(currently only on
    x86_64).

    Since we have 2 config options called MEMORY_HOTPLUG and
    MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
    and codes in function register_page_bootmem_info_node() are only
    used for collecting infomation for hot-remove, so reside it under
    MEMORY_HOTREMOVE.

    Besides page_isolation.c selected by MEMORY_ISOLATION under
    MEMORY_HOTPLUG is also such case, move it too.

    [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
    [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
    [mhocko@suse.cz: remove the arch specific functions without any implementation]
    [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
    [rientjes@google.com: fix defined but not used warning]
    Signed-off-by: Wen Congyang
    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Tang Chen
    Reviewed-by: Wu Jianguo
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Michal Hocko
    Signed-off-by: Lin Feng
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • For removing memory, we need to remove page tables. But it depends on
    architecture. So the patch introduce arch_remove_memory() for removing
    page table. Now it only calls __remove_pages().

    Note: __remove_pages() for some archtecuture is not implemented
    (I don't know how to implement it for s390).

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start,
    type} sysfs files are created. But there is no code to remove these
    files. This patch implements the function to remove them.

    We cannot free firmware_map_entry which is allocated by bootmem because
    there is no way to do so when the system is up. But we can at least
    remember the address of that memory and reuse the storage when the
    memory is added next time.

    This patch also introduces a new list map_entries_bootmem to link the
    map entries allocated by bootmem when they are removed, and a lock to
    protect it. And these entries will be reused when the memory is
    hot-added again.

    The idea is suggestted by Andrew Morton.

    NOTE: It is unsafe to return an entry pointer and release the
    map_entries_lock. So we should not hold the map_entries_lock
    separately in firmware_map_find_entry() and
    firmware_map_remove_entry(). Hold the map_entries_lock across find
    and remove /sys/firmware/memmap/X operation.

    And also, users of these two functions need to be careful to
    hold the lock when using these two functions.

    [tangchen@cn.fujitsu.com: Hold spinlock across find|remove /sys operation]
    [tangchen@cn.fujitsu.com: fix the wrong comments of map_entries]
    [tangchen@cn.fujitsu.com: reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem]
    [tangchen@cn.fujitsu.com: fix section mismatch problem]
    [tangchen@cn.fujitsu.com: fix the doc format in drivers/firmware/memmap.c]
    Signed-off-by: Wen Congyang
    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Tang Chen
    Reviewed-by: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Tang Chen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Julian Calaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • offlining memory blocks and checking whether memory blocks are offlined
    are very similar. This patch introduces a new function to remove
    redundant codes.

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Reviewed-by: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang