11 Jun, 2015

1 commit

  • Izumi found the following oops when hot re-adding a node:

    BUG: unable to handle kernel paging request at ffffc90008963690
    IP: __wake_up_bit+0x20/0x70
    Oops: 0000 [#1] SMP
    CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
    Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
    task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
    RIP: 0010:[] [] __wake_up_bit+0x20/0x70
    RSP: 0018:ffff880017b97be8 EFLAGS: 00010246
    RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
    RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
    RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
    R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
    FS: 00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
    Call Trace:
    unlock_page+0x6d/0x70
    generic_write_end+0x53/0xb0
    xfs_vm_write_end+0x29/0x80 [xfs]
    generic_perform_write+0x10a/0x1e0
    xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
    xfs_file_write_iter+0x79/0x120 [xfs]
    __vfs_write+0xd4/0x110
    vfs_write+0xac/0x1c0
    SyS_write+0x58/0xd0
    system_call_fastpath+0x12/0x76
    Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
    RIP [] __wake_up_bit+0x20/0x70
    RSP
    CR2: ffffc90008963690

    Reproduce method (re-add a node):
    Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic)

    This appears to be a use-after-free problem, and the root cause is that
    zone->wait_table was not set to *NULL* after being freed in
    try_offline_node().

    When hot re-adding a node, we reuse its pgdat, and with it the zone
    structs. When pages are added to the target zone, the zone is first
    initialized (including the wait_table) if it is not already initialized.
    Whether a zone counts as initialized is judged from zone->wait_table:

    static inline bool zone_is_initialized(struct zone *zone)
    {
            return !!zone->wait_table;
    }

    So if we do not set zone->wait_table to *NULL* after freeing it, the
    memory hotplug routine skips the initialization of the new zone when the
    node is hot re-added, and wait_table still points to the freed memory.
    We then access an invalid address when trying to wake up waiters after
    I/O on a page completes, as in the oops above.
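
    The fix can be sketched in userspace C. The struct and helpers below are
    simplified mocks of the kernel's (the real definitions live in
    include/linux/mmzone.h), not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal mock of the relevant zone field. */
struct zone {
	void *wait_table;
};

/* Mirrors the kernel's zone_is_initialized(): a zone counts as
 * initialized iff its wait_table pointer is non-NULL. */
static bool zone_is_initialized(struct zone *zone)
{
	return !!zone->wait_table;
}

/* Buggy offline path: frees the table but leaves the stale pointer
 * behind, so a later hot re-add skips re-initialization and ends up
 * using freed memory. */
static void offline_zone_buggy(struct zone *zone)
{
	free(zone->wait_table);
}

/* Fixed offline path, in the spirit of the patch: clear the pointer so
 * the next hot-add sees the zone as uninitialized and rebuilds the
 * wait_table. */
static void offline_zone_fixed(struct zone *zone)
{
	free(zone->wait_table);
	zone->wait_table = NULL;
}
```

    With the fixed path, zone_is_initialized() returns false after offline,
    so the re-add correctly reinitializes the zone.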

    Signed-off-by: Gu Zheng
    Reported-by: Taku Izumi
    Reviewed-by: Yasuaki Ishimatsu
    Cc: KAMEZAWA Hiroyuki
    Cc: Tang Chen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gu Zheng
     

16 Apr, 2015

1 commit

    Now that we have easy access to hugepages' activeness, the existing
    helpers for getting that information can be cleaned up.

    [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/]
    Signed-off-by: Naoya Horiguchi
    Cc: Hugh Dickins
    Reviewed-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

15 Apr, 2015

2 commits

  • There's a deadlock when concurrently hot-adding memory through the probe
    interface and switching a memory block from offline to online.

    When hot-adding memory via the probe interface, add_memory() first takes
    mem_hotplug_begin() and then device_lock() is later taken when registering
    the newly initialized memory block. This creates a lock dependency of (1)
    mem_hotplug.lock (2) dev->mutex.

    When switching a memory block from offline to online, dev->mutex is first
    grabbed in device_online() when the write(2) transitions an existing
    memory block from offline to online, and then online_pages() will take
    mem_hotplug_begin().

    This creates a lock inversion between mem_hotplug.lock and dev->mutex.
    Vitaly reports that this deadlock can happen when kworker handling a probe
    event races with systemd-udevd switching a memory block's state.

    This patch requires the state transition to take mem_hotplug_begin()
    before dev->mutex. Hot-adding memory via the probe interface creates a
    memory block while holding mem_hotplug_begin(), so there is no way to
    take dev->mutex first in that case.

    online_pages() and offline_pages() are only called when transitioning
    memory block state. We now require that mem_hotplug_begin() is taken
    before calling them -- this requires exporting the mem_hotplug_begin() and
    mem_hotplug_done() to generic code. In all hot-add and hot-remove cases,
    mem_hotplug_begin() is done prior to device_online(). This is all that is
    needed to avoid the deadlock.
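
    The ordering the patch establishes can be sketched in userspace C. The
    ints stand in for mem_hotplug.lock and dev->mutex, and the assertion
    inside dev_mutex_lock() encodes the invariant both paths must now
    satisfy; this is an illustration, not the kernel implementation:

```c
#include <assert.h>

static int mem_hotplug_held;
static int dev_mutex_held;

static void mem_hotplug_begin(void) { mem_hotplug_held = 1; }
static void mem_hotplug_done(void)  { mem_hotplug_held = 0; }

/* Taking dev->mutex now requires mem_hotplug.lock to be held already,
 * i.e. the (1) mem_hotplug.lock (2) dev->mutex order. */
static void dev_mutex_lock(void)
{
	assert(mem_hotplug_held);
	dev_mutex_held = 1;
}
static void dev_mutex_unlock(void) { dev_mutex_held = 0; }

/* Probe-path hot-add: already took the hotplug lock first. */
static int add_memory_path(void)
{
	mem_hotplug_begin();
	dev_mutex_lock();    /* register the new memory block */
	dev_mutex_unlock();
	mem_hotplug_done();
	return 0;
}

/* Fixed state transition: hotplug lock now precedes dev->mutex here too,
 * so the two paths can no longer deadlock against each other. */
static int device_online_path(void)
{
	mem_hotplug_begin();
	dev_mutex_lock();
	/* online_pages() work happens here */
	dev_mutex_unlock();
	mem_hotplug_done();
	return 0;
}
```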

    Signed-off-by: David Rientjes
    Reported-by: Vitaly Kuznetsov
    Tested-by: Vitaly Kuznetsov
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: "K. Y. Srinivasan"
    Cc: Yasuaki Ishimatsu
    Cc: Tang Chen
    Cc: Vlastimil Babka
    Cc: Zhang Zhen
    Cc: Vladimir Davydov
    Cc: Wang Nan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Use macro section_nr_to_pfn() to switch between section and pfn, instead
    of open-coding it. No semantic changes.
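
    For reference, the macro pair being switched to can be sketched as
    below. The shift values assume x86_64 defaults (SECTION_SIZE_BITS = 27,
    PAGE_SHIFT = 12); other architectures differ:

```c
#include <assert.h>

/* PFN_SECTION_SHIFT is (SECTION_SIZE_BITS - PAGE_SHIFT); with the x86_64
 * defaults assumed here that is 15, i.e. 32768 pages per section. */
#define PFN_SECTION_SHIFT (27 - 12)

#define section_nr_to_pfn(sec) ((sec) << PFN_SECTION_SHIFT)
#define pfn_to_section_nr(pfn) ((pfn) >> PFN_SECTION_SHIFT)
```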

    Signed-off-by: Sheng Yong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sheng Yong
     

26 Mar, 2015

1 commit

  • Qiu Xishi reported the following BUG when testing hot-add/hot-remove node under
    stress condition:

    BUG: unable to handle kernel paging request at 0000000000025f60
    IP: next_online_pgdat+0x1/0x50
    PGD 0
    Oops: 0000 [#1] SMP
    ACPI: Device does not support D3cold
    Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
    CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1
    Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
    Workqueue: events vmstat_update
    task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
    RIP: 0010: next_online_pgdat+0x1/0x50
    RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286
    RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
    RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
    RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
    R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
    R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
    FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    refresh_cpu_vm_stats+0xd0/0x140
    vmstat_update+0x11/0x50
    process_one_work+0x194/0x3d0
    worker_thread+0x12b/0x410
    kthread+0xc6/0xd0
    ret_from_fork+0x7c/0xb0

    The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of
    try_offline_node(), which resets all of the pgdat's contents to 0.
    Because the pgdat is accessed lock-free, users still holding a reference
    to it, such as the vmstat_update routine, will panic.

    process A:                             offline node XX:

    vmstat_update()
      refresh_cpu_vm_stats()
        for_each_populated_zone()
          find online node XX
          cond_resched()
                                           offline cpu and memory, then try_offline_node()
                                           node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
          zone = next_zone(zone)
            pg_data_t *pgdat = zone->zone_pgdat;  // here pgdat is NULL now
              next_online_pgdat(pgdat)
                next_online_node(pgdat->node_id);  // NULL pointer access

    So the solution here is to postpone the reset of the obsolete pgdat from
    try_offline_node() to hotadd_new_pgdat(), and to reset only
    pgdat->nr_zones and pgdat->classzone_idx to 0 rather than memsetting the
    whole structure, so that pointer information in the pgdat is not
    destroyed.
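
    The difference can be sketched with a mock pgdat; the fields are
    simplified stand-ins for the real pg_data_t, and the pointer member
    represents whatever lock-free readers may still be following:

```c
#include <assert.h>
#include <string.h>

struct pglist_data {
	int nr_zones;
	int classzone_idx;
	void *per_cpu_nodestats;   /* stand-in for pointer state */
};

/* Old behaviour: wipes everything, including pointers that lock-free
 * readers such as vmstat_update() may still dereference. */
static void reset_pgdat_memset(struct pglist_data *pgdat)
{
	memset(pgdat, 0, sizeof(*pgdat));
}

/* New behaviour per the patch: clear only the counters that must start
 * over; pointer state in the reused pgdat survives. */
static void pgdat_reinit(struct pglist_data *pgdat)
{
	pgdat->nr_zones = 0;
	pgdat->classzone_idx = 0;
}
```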

    Signed-off-by: Gu Zheng
    Reported-by: Xishi Qiu
    Suggested-by: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: Yasuaki Ishimatsu
    Cc: Taku Izumi
    Cc: Tang Chen
    Cc: Xie XiuQi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gu Zheng
     

11 Dec, 2014

2 commits

    Memory hotplug and the memory failure mechanism have several places
    where pcplists are drained so that pages are returned to the buddy
    allocator and can e.g. be prepared for offlining. This is always done in
    the context of a single zone, so we can restrict the pcplists drain to
    that single zone, which is now possible.

    The change should make memory offlining due to hot-remove or failure
    faster, and it no longer disturbs unrelated pcplists.

    Signed-off-by: Vlastimil Babka
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Cc: Joonsoo Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The functions for draining per-cpu pages back to buddy allocators
    currently always operate on all zones. There are however several cases
    where the drain is only needed in the context of a single zone, and
    spilling other pcplists is a waste of time both due to the extra
    spilling and later refilling.

    This patch introduces a new zone pointer parameter to drain_all_pages()
    and changes the dummy parameter of drain_local_pages() to also be a zone
    pointer. When NULL is passed, the functions operate on all zones as
    usual. Passing a specific zone pointer reduces the work to that single
    zone.

    All callers are updated to pass the NULL pointer in this patch.
    Conversion to single zone (where appropriate) is done in further
    patches.
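
    The new calling convention can be sketched as follows; the struct and
    counts are mocks, not the kernel's per-cpu machinery:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES 4

struct zone { int pcp_pages; };        /* mock pcplist page count */
static struct zone zones[MAX_ZONES];

/* NULL zone drains every zone's pcplist (old behaviour); a specific zone
 * pointer drains only that one. Returns the number of pages drained. */
static int drain_all_pages(struct zone *zone)
{
	int drained = 0;

	if (zone) {
		drained = zone->pcp_pages;
		zone->pcp_pages = 0;
		return drained;
	}
	for (int i = 0; i < MAX_ZONES; i++) {
		drained += zones[i].pcp_pages;
		zones[i].pcp_pages = 0;
	}
	return drained;
}
```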

    Signed-off-by: Vlastimil Babka
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Cc: Joonsoo Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

14 Nov, 2014

2 commits

    When memory is hot-added, all of it is in the offline state. So clear
    all zones' present_pages, because they will be updated in online_pages()
    and offline_pages(). Otherwise, /proc/zoneinfo will show corrupt values:

    When the memory of node2 is offline:

    # cat /proc/zoneinfo
    ......
    Node 2, zone Movable
    ......
    spanned 8388608
    present 8388608
    managed 0

    When we online memory on node2:

    # cat /proc/zoneinfo
    ......
    Node 2, zone Movable
    ......
    spanned 8388608
    present 16777216
    managed 8388608

    Signed-off-by: Tang Chen
    Reviewed-by: Yasuaki Ishimatsu
    Cc: [3.16+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • In free_area_init_core(), zone->managed_pages is set to an approximate
    value for lowmem, and will be adjusted when the bootmem allocator frees
    pages into the buddy system.

    But free_area_init_core() is also called by hotadd_new_pgdat() when
    hot-adding memory. As a result, zone->managed_pages of the newly added
    node's pgdat is set to an approximate value in the very beginning.

    Even if the memory on that node has not been onlined,
    /sys/devices/system/node/nodeXXX/meminfo shows wrong values:

    hot-add node2 (memory not onlined)
    cat /sys/devices/system/node/node2/meminfo
    Node 2 MemTotal: 33554432 kB
    Node 2 MemFree: 0 kB
    Node 2 MemUsed: 33554432 kB
    Node 2 Active: 0 kB

    This patch fixes the problem by resetting the node's managed pages to 0
    after hot-adding a new node.

    1. Move reset_managed_pages_done from reset_node_managed_pages() to
    reset_all_zones_managed_pages()
    2. Make reset_node_managed_pages() non-static
    3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
    is initialized
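
    Step 3 can be sketched with a mock pgdat; the types are simplified
    stand-ins for the real ones, and online_pages() is assumed to account
    the real numbers later:

```c
#include <assert.h>

#define MAX_NR_ZONES 4

struct zone { unsigned long managed_pages; };
struct pglist_data { struct zone node_zones[MAX_NR_ZONES]; };

/* After hotadd_new_pgdat() has filled in approximate values, walk the
 * node's zones and zero managed_pages; onlining accounts the real
 * numbers afterwards. */
static void reset_node_managed_pages(struct pglist_data *pgdat)
{
	for (int i = 0; i < MAX_NR_ZONES; i++)
		pgdat->node_zones[i].managed_pages = 0;
}
```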

    Signed-off-by: Tang Chen
    Signed-off-by: Yasuaki Ishimatsu
    Cc: [3.16+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

30 Oct, 2014

1 commit

  • When hot adding the same memory after hot removal, the following
    messages are shown:

    WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426()
    ...
    Call Trace:
    dump_stack+0x46/0x58
    warn_slowpath_common+0x81/0xa0
    warn_slowpath_null+0x1a/0x20
    free_area_init_node+0x3fe/0x426
    hotadd_new_pgdat+0x90/0x110
    add_memory+0xd4/0x200
    acpi_memory_device_add+0x1aa/0x289
    acpi_bus_attach+0xfd/0x204
    acpi_bus_attach+0x178/0x204
    acpi_bus_scan+0x6a/0x90
    acpi_device_hotplug+0xe8/0x418
    acpi_hotplug_work_fn+0x1f/0x2b
    process_one_work+0x14e/0x3f0
    worker_thread+0x11b/0x510
    kthread+0xe1/0x100
    ret_from_fork+0x7c/0xb0

    The detailed explanation is as follows:

    When hot-removing memory, the pgdat is set to 0 in try_offline_node().
    But if the pgdat was allocated by the bootmem allocator, the clearing
    step is skipped.

    And when hot-adding the same memory, the uninitialized pgdat is reused.
    But free_area_init_node() checks whether the pgdat is set to zero. As a
    result, free_area_init_node() hits the WARN_ON().

    This patch makes try_offline_node() also clear a pgdat that was
    allocated by the bootmem allocator.

    Signed-off-by: Yasuaki Ishimatsu
    Cc: Zhang Zhen
    Cc: Wang Nan
    Cc: Tang Chen
    Reviewed-by: Toshi Kani
    Cc: Dave Hansen
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     

10 Oct, 2014

1 commit

  • Currently memory-hotplug has two limits:

    1. If the memory block is in ZONE_NORMAL, you can change it to
    ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.

    2. If the memory block is in ZONE_MOVABLE, you can change it to
    ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.

    With this patch, we can easily see which zone a memory block can be
    onlined to, and no longer need to know the above two limits.

    Updated the related Documentation.

    [akpm@linux-foundation.org: use conventional comment layout]
    [akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n]
    [akpm@linux-foundation.org: remove unused local zone_prev]
    Signed-off-by: Zhang Zhen
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Toshi Kani
    Cc: Yasuaki Ishimatsu
    Cc: Naoya Horiguchi
    Cc: Wang Nan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Zhen
     

07 Aug, 2014

3 commits

    This series of patches fixes a problem that occurs when adding memory in
    a bad manner. For example: on an x86_64 machine booted with "mem=400M"
    and with 2 GiB of memory installed, the following commands cause the
    problem:

    # echo 0x40000000 > /sys/devices/system/memory/probe
    [ 28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
    # echo 0x48000000 > /sys/devices/system/memory/probe
    [ 28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
    # echo online_movable > /sys/devices/system/memory/memory9/state
    # echo 0x50000000 > /sys/devices/system/memory/probe
    [ 29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
    # echo 0x58000000 > /sys/devices/system/memory/probe
    [ 29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
    # echo online_movable > /sys/devices/system/memory/memory11/state
    # echo online> /sys/devices/system/memory/memory8/state
    # echo online> /sys/devices/system/memory/memory10/state
    # echo offline> /sys/devices/system/memory/memory9/state
    [ 30.558819] Offlined Pages 32768
    # free
    total used free shared buffers cached
    Mem: 780588 18014398509432020 830552 0 0 51180
    -/+ buffers/cache: 18014398509380840 881732
    Swap: 0 0 0

    This is because the above commands probe higher memory after online a
    section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL
    for systems without ZONE_HIGHMEM) overlaps ZONE_MOVABLE.

    After the second online_movable, the problem can be observed from
    zoneinfo:

    # cat /proc/zoneinfo
    ...
    Node 0, zone Movable
    pages free 65491
    min 250
    low 312
    high 375
    scanned 0
    spanned 18446744073709518848
    present 65536
    managed 65536
    ...

    This series of patches solves the problem by checking ZONE_MOVABLE when
    choosing a zone for new memory: if the new memory is inside or higher
    than ZONE_MOVABLE, it goes there instead.

    After applying this series of patches, following are free and zoneinfo
    result (after offlining memory9):

    bash-4.2# free
    total used free shared buffers cached
    Mem: 780956 80112 700844 0 0 51180
    -/+ buffers/cache: 28932 752024
    Swap: 0 0 0

    bash-4.2# cat /proc/zoneinfo

    Node 0, zone DMA
    pages free 3389
    min 14
    low 17
    high 21
    scanned 0
    spanned 4095
    present 3998
    managed 3977
    nr_free_pages 3389
    ...
    start_pfn: 1
    inactive_ratio: 1
    Node 0, zone DMA32
    pages free 73724
    min 341
    low 426
    high 511
    scanned 0
    spanned 98304
    present 98304
    managed 92958
    nr_free_pages 73724
    ...
    start_pfn: 4096
    inactive_ratio: 1
    Node 0, zone Normal
    pages free 32630
    min 120
    low 150
    high 180
    scanned 0
    spanned 32768
    present 32768
    managed 32768
    nr_free_pages 32630
    ...
    start_pfn: 262144
    inactive_ratio: 1
    Node 0, zone Movable
    pages free 65476
    min 241
    low 301
    high 361
    scanned 0
    spanned 98304
    present 65536
    managed 65536
    nr_free_pages 65476
    ...
    start_pfn: 294912
    inactive_ratio: 1

    This patch (of 7):

    Introduce zone_for_memory() in arch independent code for
    arch_add_memory() use.

    Many arch_add_memory() implementations simply select ZONE_HIGHMEM or
    ZONE_NORMAL and add the new memory to it. However, with the existence of
    ZONE_MOVABLE, the selection must be considered carefully: if new, higher
    memory is added after ZONE_MOVABLE is set up, the default zone and
    ZONE_MOVABLE may overlap each other.

    should_add_memory_movable() checks the status of ZONE_MOVABLE. If it
    already contains memory, the address of the new memory is compared with
    that of the movable memory. If the new memory is higher, it should be
    added to ZONE_MOVABLE instead of the default zone.
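
    The decision described above can be sketched like this; the struct
    fields are simplified stand-ins for the real zone span fields, and the
    comparison is an illustration of the idea rather than the exact kernel
    code:

```c
#include <assert.h>
#include <stdbool.h>

struct zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/* If ZONE_MOVABLE already contains memory and the new range starts at or
 * above it, the new memory belongs in ZONE_MOVABLE rather than the
 * default zone, so the two zones never overlap. */
static bool should_add_memory_movable(struct zone *movable,
				      unsigned long start_pfn)
{
	return movable->spanned_pages && start_pfn >= movable->start_pfn;
}
```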

    Signed-off-by: Wang Nan
    Cc: Zhang Yanfei
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: "Mel Gorman"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Nan
     
  • In store_mem_state(), we have:

    ...
    334         else if (!strncmp(buf, "offline", min_t(int, count, 7)))
    335                 online_type = -1;
    ...
    355         case -1:
    356                 ret = device_offline(&mem->dev);
    357                 break;
    ...

    Here, "offline" is hard coded as -1.

    This patch does the following renaming:

    ONLINE_KEEP -> MMOP_ONLINE_KEEP
    ONLINE_KERNEL -> MMOP_ONLINE_KERNEL
    ONLINE_MOVABLE -> MMOP_ONLINE_MOVABLE

    and introduces MMOP_OFFLINE = -1 to avoid hard coding.
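
    The resulting constants can be written as a single enum, with
    MMOP_OFFLINE anchoring the -1 that store_mem_state() previously hard
    coded:

```c
#include <assert.h>

enum {
	MMOP_OFFLINE = -1,
	MMOP_ONLINE_KEEP,     /* 0: online, keep current zone */
	MMOP_ONLINE_KERNEL,   /* 1: online to ZONE_NORMAL */
	MMOP_ONLINE_MOVABLE,  /* 2: online to ZONE_MOVABLE */
};
```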

    Signed-off-by: Tang Chen
    Cc: Hu Tao
    Cc: Greg Kroah-Hartman
    Cc: Lai Jiangshan
    Cc: Yasuaki Ishimatsu
    Cc: Gu Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
    grow_zone_span() and grow_pgdat_span() are only called by
    __meminit __add_zone().

    Signed-off-by: Fabian Frederick
    Cc: Toshi Kani
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

05 Jun, 2014

3 commits

  • Memory migration uses a callback defined by the caller to determine how to
    allocate destination pages. When migration fails for a source page,
    however, it frees the destination page back to the system.

    This patch adds a memory migration callback defined by the caller to
    determine how to free destination pages. If a caller, such as memory
    compaction, builds its own freelist for migration targets, this can reuse
    already freed memory instead of scanning additional memory.

    If the caller provides a function to handle freeing of destination pages,
    it is called when page migration fails. If the caller passes NULL then
    freeing back to the system will be handled as usual. This patch
    introduces no functional change.

    Signed-off-by: David Rientjes
    Reviewed-by: Naoya Horiguchi
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Replace ((x) >> PAGE_SHIFT) with the pfn macro.

    Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • kmem_cache_{create,destroy,shrink} need to get a stable value of
    cpu/node online mask, because they init/destroy/access per-cpu/node
    kmem_cache parts, which can be allocated or destroyed on cpu/mem
    hotplug. To protect against cpu hotplug, these functions use
    {get,put}_online_cpus. However, they do nothing to synchronize with
    memory hotplug - taking the slab_mutex does not eliminate the
    possibility of race as described in patch 2.

    What we need there is something like get_online_cpus, but for memory.
    We already have lock_memory_hotplug, which serves for the purpose, but
    it's a bit of a hammer right now, because it's backed by a mutex. As a
    result, it imposes some limitations to locking order, which are not
    desirable, and can't be used just like get_online_cpus. That's why in
    patch 1 I substitute it with get/put_online_mems, which work exactly
    like get/put_online_cpus except they block not cpu, but memory hotplug.

    [ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it
    myself, because it used an rw semaphore for get/put_online_mems,
    making them deadlock prone. ]

    This patch (of 2):

    {un}lock_memory_hotplug, which is used to synchronize against memory
    hotplug, is currently backed by a mutex, which makes it a bit of a
    hammer - threads that only want to get a stable value of online nodes
    mask won't be able to proceed concurrently. Also, it imposes some
    strong locking ordering rules on it, which narrows down the set of its
    usage scenarios.

    This patch introduces get/put_online_mems, which are the same as
    get/put_online_cpus but for memory hotplug: code executed inside a
    get/put_online_mems section is guaranteed a stable view of online
    nodes, present pages, etc.

    lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether.
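
    The counting scheme can be sketched single-threaded; the asserts mark
    where the real implementation would block on a wait queue, and the mutex
    and wakeup machinery of the actual kernel code are omitted:

```c
#include <assert.h>

static int mem_hotplug_refcount;   /* active get_online_mems() holders */
static int mem_hotplug_active;     /* a hotplug operation is running */

/* Readers nest freely and run concurrently; they only exclude hotplug. */
static void get_online_mems(void)
{
	assert(!mem_hotplug_active);   /* kernel would block here */
	mem_hotplug_refcount++;
}

static void put_online_mems(void)
{
	mem_hotplug_refcount--;
}

/* Hotplug waits until every reader has dropped its reference. */
static void mem_hotplug_begin(void)
{
	assert(mem_hotplug_refcount == 0);  /* kernel would wait here */
	mem_hotplug_active = 1;
}

static void mem_hotplug_done(void)
{
	mem_hotplug_active = 0;
}
```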

    Signed-off-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tang Chen
    Cc: Zhang Yanfei
    Cc: Toshi Kani
    Cc: Xishi Qiu
    Cc: Jiang Liu
    Cc: Rafael J. Wysocki
    Cc: David Rientjes
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

24 Jan, 2014

2 commits

  • We don't need to do register_memory_resource() under
    lock_memory_hotplug() since it has its own lock and doesn't make any
    callbacks.

    Also, register_memory_resource() returns NULL on failure, so we have
    nothing to clean up at this point.

    The reason for this rfc is I was doing some experiments with hotplugging
    of memory on some of our larger systems. While it seems to work, it can
    be quite slow. With some preliminary digging I found that
    lock_memory_hotplug is clearly ripe for breakup.

    It could be broken up per nid or something but it also covers the
    online_page_callback. The online_page_callback shouldn't be very hard
    to break out.

    Also there is the issue of various structures (watermarks come to mind)
    that are only updated under lock_memory_hotplug and would need to be
    dealt with.

    Cc: Tang Chen
    Cc: Wen Congyang
    Cc: Kamezawa Hiroyuki
    Reviewed-by: Yasuaki Ishimatsu
    Cc: "Rafael J. Wysocki"
    Cc: Hedi
    Cc: Mike Travis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Zimmer
     
  • bad_page() is cool in that it prints out a bunch of data about the page.
    But, I can never remember which page flags are good and which are bad,
    or whether ->index or ->mapping is required to be NULL.

    This patch allows bad/dump_page() callers to specify a string about why
    they are dumping the page and adds explanation strings to a number of
    places. It also adds a 'bad_flags' argument to bad_page(), which it
    then dumps out separately from the flags which are actually set.

    This way, the messages will show specifically why the page was bad and
    exactly which flags are being complained about, when a page-flag
    combination was the problem.

    [akpm@linux-foundation.org: switch to pr_alert]
    Signed-off-by: Dave Hansen
    Reviewed-by: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

22 Jan, 2014

3 commits

  • Correct ensure_zone_is_initialized() function description according to
    the introduced memblock APIs for early memory allocations.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tejun Heo
    Cc: Tony Lindgren
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Santosh Shilimkar
     
    Clean-up to remove the dependency on bootmem headers.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Reviewed-by: Tejun Heo
    Cc: Yinghai Lu
    Cc: Arnd Bergmann
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Christoph Lameter
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     
    The Linux kernel cannot migrate pages used by the kernel. As a result,
    hotpluggable memory used by the kernel cannot be hot-removed. To solve
    this problem, the basic idea is to prevent memblock from allocating
    hotpluggable memory for the kernel at early boot time, and to arrange
    all hotpluggable memory described in the ACPI SRAT (System Resource
    Affinity Table) as ZONE_MOVABLE when initializing zones.

    In the previous patches, we have marked hotpluggable memory regions with
    MEMBLOCK_HOTPLUG flag in memblock.memory.

    In this patch, we make memblock skip these hotpluggable memory regions
    in the default top-down allocation function if movable_node boot option
    is specified.
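
    The allocator change can be sketched with a mock region array; the flag
    value and helper are illustrative stand-ins for memblock's real data
    structures:

```c
#include <assert.h>
#include <stdbool.h>

#define MEMBLOCK_HOTPLUG 0x1

struct memblock_region {
	unsigned long base, size;
	unsigned long flags;
};

/* Top-down search over regions sorted by address: when movable_node is
 * set, regions carrying MEMBLOCK_HOTPLUG are passed over so the kernel
 * never lands in hotpluggable RAM. Returns the chosen base, 0 if none. */
static unsigned long find_free_region_top_down(struct memblock_region *regs,
					       int nr, bool movable_node)
{
	for (int i = nr - 1; i >= 0; i--) {
		if (movable_node && (regs[i].flags & MEMBLOCK_HOTPLUG))
			continue;
		return regs[i].base;
	}
	return 0;
}
```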

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Tang Chen
    Signed-off-by: Zhang Yanfei
    Cc: "H. Peter Anvin"
    Cc: "Rafael J . Wysocki"
    Cc: Chen Tang
    Cc: Gong Chen
    Cc: Ingo Molnar
    Cc: Jiang Liu
    Cc: Johannes Weiner
    Cc: Lai Jiangshan
    Cc: Larry Woodman
    Cc: Len Brown
    Cc: Liu Jiang
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Taku Izumi
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Thomas Renninger
    Cc: Toshi Kani
    Cc: Vasilis Liaskovitis
    Cc: Wanpeng Li
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

13 Nov, 2013

6 commits

    The Hot-Pluggable field in the SRAT specifies which memory is hotpluggable.
    As we mentioned before, if hotpluggable memory is used by the kernel, it
    cannot be hot-removed. So memory hotplug users may want to set all
    hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

    Memory hotplug users may also set a node as movable node, which has
    ZONE_MOVABLE only, so that the whole node can be hot-removed.

    But the kernel cannot use memory in ZONE_MOVABLE, so with this approach
    the kernel cannot use memory in movable nodes. This degrades NUMA
    performance, and other users may be unhappy.

    So we need a way to let users enable and disable this functionality. In
    this patch, we introduce the movable_node boot option to allow users to
    choose not to consume hotpluggable memory at early boot time, so that it
    can later be set as ZONE_MOVABLE.

    To achieve this, the movable_node boot option controls the memblock
    allocation direction. That is, after memblock is ready but before SRAT
    is parsed, we should allocate memory near the kernel image, as explained
    in the previous patches. So if the movable_node boot option is set, the
    kernel does the following:

    1. After memblock is ready, make memblock allocate memory bottom up.
    2. After SRAT is parsed, make memblock behave as default, allocate memory
    top down.

    Users can specify "movable_node" in kernel commandline to enable this
    functionality. For those who don't use memory hotplug or who don't want
    to lose their NUMA performance, just don't specify anything. The kernel
    will work as before.

    Signed-off-by: Tang Chen
    Signed-off-by: Zhang Yanfei
    Suggested-by: Kamezawa Hiroyuki
    Suggested-by: Ingo Molnar
    Acked-by: Tejun Heo
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Wanpeng Li
    Cc: Thomas Renninger
    Cc: Yinghai Lu
    Cc: Jiang Liu
    Cc: Wen Congyang
    Cc: Lai Jiangshan
    Cc: Yasuaki Ishimatsu
    Cc: Taku Izumi
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • For below functions,

    - sparse_add_one_section()
    - kmalloc_section_memmap()
    - __kmalloc_section_memmap()
    - __kfree_section_memmap()

    they are always invoked to operate on one memory section, so it is
    redundant to always pass an nr_pages parameter, which is the number of
    pages in one section. We can directly use the predefined macro
    PAGES_PER_SECTION instead of passing the parameter.
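
    The predefined macro in question can be sketched as below; the concrete
    values assume x86_64 defaults (128 MiB sections, 4 KiB pages), and other
    architectures define different shifts:

```c
#include <assert.h>

#define SECTION_SIZE_BITS 27   /* assumed x86_64 default: 128 MiB */
#define PAGE_SHIFT        12   /* assumed x86_64 default: 4 KiB */

/* Pages in one memory section: the constant the listed helpers can use
 * directly instead of taking an nr_pages parameter. */
#define PAGES_PER_SECTION (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
```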

    Signed-off-by: Zhang Yanfei
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Toshi Kani
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call
    mem_online_node() to put its node online if offlined and then call
    build_all_zonelists() to initialize the zone list.

    These steps are specific to memory hotplug and should be managed in
    mm/memory_hotplug.c. lock_memory_hotplug() should also be held across
    all of these steps.

    For this reason, this patch replaces mem_online_node() with
    try_online_node(), which performs all of the steps with
    lock_memory_hotplug() held. try_online_node() is named after
    try_offline_node() as the two have similar purposes.

    There is no functional change in this patch.

    Signed-off-by: Toshi Kani
    Reviewed-by: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Use "pfn_to_nid(pfn)" instead of "page_to_nid(pfn_to_page(pfn))".

    Signed-off-by: Xishi Qiu
    Acked-by: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • is_memblock_offlined() returning 1 means the memory block is offlined,
    but is_memblock_offlined_cb() returning 1 means the memory block is not
    offlined. This inversion is confusing, so rename the function.

    Signed-off-by: Xishi Qiu
    Acked-by: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn +
    pgdat->node_spanned_pages". Simplify the code, no functional change.

    Signed-off-by: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

13 Sep, 2013

1 commit

  • Pull ACPI and power management fixes from Rafael Wysocki:
    "All of these commits are fixes that have emerged recently and some of
    them fix bugs introduced during this merge window.

    Specifics:

    1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events

    After the recent ACPIPHP changes we've seen some interesting
    breakage on a system that triggers device check notifications
    during boot for non-existing devices. Although those
    notifications are really spurious, we should be able to deal with
    them nevertheless and that shouldn't introduce too much overhead.
    Four commits to make that work properly.

    2) Memory hotplug and hibernation mutual exclusion rework

    This was meant to be a cleanup, but it happens to fix a classical
    ABBA deadlock between system suspend/hibernation and ACPI memory
    hotplug which is possible if they are started roughly at the same
    time. Three commits rework memory hotplug so that it doesn't
    acquire pm_mutex and make hibernation use device_hotplug_lock
    which prevents it from racing with memory hotplug.

    3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix

    The ACPI LPSS driver crashes during boot on Apple Macbook Air with
    Haswell that has slightly unusual BIOS configuration in which one
    of the LPSS device's _CRS method doesn't return all of the
    information expected by the driver. Fix from Mika Westerberg, for
    stable.

    4) ACPICA fix related to Store->ArgX operation

    AML interpreter fix for obscure breakage that causes AML to be
    executed incorrectly on some machines (observed in practice).
    From Bob Moore.

    5) ACPI core fix for PCI ACPI device objects lookup

    There still are cases in which there is more than one ACPI device
    object matching a given PCI device and we don't choose the one
    that the BIOS expects us to choose, so this makes the lookup take
    more criteria into account in those cases.

    6) Fix to prevent cpuidle from crashing in some rare cases

    If the result of cpuidle_get_driver() is NULL, which can happen on
    some systems, cpuidle_driver_ref() will crash trying to use that
    pointer, and Daniel Fu's fix prevents that from happening.

    7) cpufreq fixes related to CPU hotplug

    Stephen Boyd reported a number of concurrency problems with
    cpufreq related to CPU hotplug which are addressed by a series of
    fixes from Srivatsa S Bhat and Viresh Kumar.

    8) cpufreq fix for time conversion in time_in_state attribute

    Time conversion carried out by cpufreq when user space attempts to
    read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state
    won't work correctly if cputime_t doesn't map directly to jiffies.
    Fix from Andreas Schwab.

    9) Revert of a troublesome cpufreq commit

    Commit 7c30ed5 (cpufreq: make sure frequency transitions are
    serialized) was intended to address some known concurrency
    problems in cpufreq related to the ordering of transitions, but
    unfortunately it introduced several problems of its own, so I
    decided to revert it now and address the original problems later
    in a more robust way.

    10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle.

    11) cpufreq fixes related to system suspend/resume

    The recent cpufreq changes that made it preserve CPU sysfs
    attributes over suspend/resume cycles introduced a possible NULL
    pointer dereference that caused it to crash during the second
    attempt to suspend. Three commits from Srivatsa S Bhat fix that
    problem and a couple of related issues.

    12) cpufreq locking fix

    cpufreq_policy_restore() should acquire the lock for reading, but
    it acquires it for writing. Fix from Lan Tianyu"

    * tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
    cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
    cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
    cpufreq: Restructure if/else block to avoid unintended behavior
    cpufreq: Fix crash in cpufreq-stats during suspend/resume
    intel_pstate: Add Haswell CPU models
    Revert "cpufreq: make sure frequency transitions are serialized"
    cpufreq: Use signed type for 'ret' variable, to store negative error values
    cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
    cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
    cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
    cpufreq: Split __cpufreq_remove_dev() into two parts
    cpufreq: Fix wrong time unit conversion
    cpufreq: serialize calls to __cpufreq_governor()
    cpufreq: don't allow governor limits to be changed when it is disabled
    ACPI / bind: Prefer device objects with _STA to those without it
    ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks
    ACPI / hotplug / PCI: Use _OST to notify firmware about notify status
    ACPI / hotplug / PCI: Avoid doing too much for spurious notifies
    ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.
    ACPI / hotplug / PCI: Don't trim devices before scanning the namespace
    ...

    Linus Torvalds
     

12 Sep, 2013

7 commits

  • Until now we could not offline memory blocks which contain hugepages,
    because a hugepage was considered an unmovable page. With this patch
    series a hugepage becomes movable, so by using hugepage migration we
    can offline such memory blocks.

    What's different from other users of hugepage migration is that we need to
    decompose all the hugepages inside the target memory block into free buddy
    pages after hugepage migration, because otherwise free hugepages remaining
    in the memory block interfere with the memory offlining. For this reason we
    introduce new functions dissolve_free_huge_page() and
    dissolve_free_huge_pages().

    Other than that, this patch straightforwardly adds the hugepage
    migration code: hugepage handling in the functions that scan over pfns
    and collect pages to be migrated, and a hugepage allocation case in
    alloc_migrate_target().

    As for larger hugepages (1GB for x86_64), hot-remove is not easy to do
    over them because they are larger than a memory block, so for now we
    simply let it fail.

    [yongjun_wei@trendmicro.com.cn: remove duplicated include]
    Signed-off-by: Naoya Horiguchi
    Acked-by: Andi Kleen
    Cc: Hillf Danton
    Cc: Wanpeng Li
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: KOSAKI Motohiro
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: "Aneesh Kumar K.V"
    Signed-off-by: Wei Yongjun
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • lock_device_hotplug() serializes hotplug & online/offline operations. The
    lock is held in common sysfs online/offline interfaces and ACPI hotplug
    code paths.

    And here are the code paths:

    - CPU & Mem online/offline via sysfs online
    store_online()->lock_device_hotplug()

    - Mem online via sysfs state:
    store_mem_state()->lock_device_hotplug()

    - ACPI CPU & Mem hot-add:
    acpi_scan_bus_device_check()->lock_device_hotplug()

    - ACPI CPU & Mem hot-delete:
    acpi_scan_hot_remove()->lock_device_hotplug()

    try_offline_node() off-lines a node if all memory sections and cpus are
    removed on the node. It is called from acpi_processor_remove() and
    acpi_memory_remove_memory()->remove_memory() paths, both of which are in
    the ACPI hotplug code.

    try_offline_node() calls stop_machine() to stop all cpus while checking
    all cpu status with the assumption that the caller is not protected from
    CPU hotplug or CPU online/offline operations. However, the caller is
    always serialized with lock_device_hotplug(). Also, the code needs to be
    properly serialized with a lock, not by stopping all cpus at a random
    place with stop_machine().

    This patch removes the use of stop_machine() in try_offline_node() and
    adds comments to try_offline_node() and remove_memory() that
    lock_device_hotplug() is required.

    Signed-off-by: Toshi Kani
    Acked-by: Rafael J. Wysocki
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • add_memory() and remove_memory() can only handle a memory range aligned
    with a section. There are problems when an unaligned range is added and
    then deleted as follows:

    - add_memory() with an unaligned range succeeds, but __add_pages()
    called from add_memory() adds a whole section of pages even though
    a given memory range is less than the section size.
    - remove_memory() to the added unaligned range hits BUG_ON() in
    __remove_pages().

    This patch changes add_memory() and remove_memory() to check at the
    beginning whether a given memory range is aligned with a section. As a
    result, add_memory() fails with -EINVAL when a given range is unaligned
    and does not add such a memory range. This also prevents remove_memory()
    from being called with an unaligned range. Note that remove_memory() has
    to use BUG_ON() since this function cannot fail.

    [akpm@linux-foundation.org: avoid printk warnings]
    Signed-off-by: Toshi Kani
    Acked-by: KOSAKI Motohiro
    Reviewed-by: Tang Chen
    Reviewed-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Use "zone_is_initialized()" instead of "if (zone->wait_table)".
    Simplify the code, no functional change.

    Signed-off-by: Xishi Qiu
    Cc: Cody P Schafer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • Use "zone_is_empty()" instead of "if (zone->spanned_pages)".
    Simplify the code, no functional change.

    Signed-off-by: Xishi Qiu
    Cc: Cody P Schafer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • Use "zone_end_pfn()" instead of "zone->zone_start_pfn + zone->spanned_pages".
    Simplify the code, no functional change.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Xishi Qiu
    Cc: Cody P Schafer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • I think we can remove "BUG_ON(start_pfn >= end_pfn)" in __offline_pages(),
    because in memory_block_action() "nr_pages = PAGES_PER_SECTION * sections_per_block"
    is always greater than 0.

    memory_block_action()
    offline_pages()
    __offline_pages()
    BUG_ON(start_pfn >= end_pfn)

    In v2.6.32, if info->length == 0, this path could hit the BUG_ON():
    acpi_memory_disable_device()
    remove_memory(info->start_addr, info->length)
    offline_pages()

    A later Fujitsu patch renamed this function and the BUG_ON() is
    unnecessary.

    Signed-off-by: Xishi Qiu
    Reviewed-by: Dave Hansen
    Cc: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

31 Aug, 2013

1 commit

  • Since all of the memory hotplug operations have to be carried out
    under device_hotplug_lock, they won't need to acquire pm_mutex if
    device_hotplug_lock is held around hibernation.

    For this reason, make the hibernation code acquire
    device_hotplug_lock after freezing user space processes and
    release it before thawing them. At the same time, drop the
    lock_system_sleep() and unlock_system_sleep() calls from
    lock_memory_hotplug() and unlock_memory_hotplug(), respectively.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Toshi Kani

    Rafael J. Wysocki
     

10 Jul, 2013

2 commits


04 Jul, 2013

1 commit

  • Merge first patch-bomb from Andrew Morton:
    - various misc bits
    - I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been
    distracted. There has been quite a bit of activity.
    - About half the MM queue
    - Some backlight bits
    - Various lib/ updates
    - checkpatch updates
    - zillions more little rtc patches
    - ptrace
    - signals
    - exec
    - procfs
    - rapidio
    - nbd
    - aoe
    - pps
    - memstick
    - tools/testing/selftests updates

    * emailed patches from Andrew Morton : (445 commits)
    tools/testing/selftests: don't assume the x bit is set on scripts
    selftests: add .gitignore for kcmp
    selftests: fix clean target in kcmp Makefile
    selftests: add .gitignore for vm
    selftests: add hugetlbfstest
    self-test: fix make clean
    selftests: exit 1 on failure
    kernel/resource.c: remove the unneeded assignment in function __find_resource
    aio: fix wrong comment in aio_complete()
    drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
    drivers/memstick/host/r592.c: convert to module_pci_driver
    drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
    pps-gpio: add device-tree binding and support
    drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
    drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
    drivers/parport/share.c: use kzalloc
    Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
    aoe: update internal version number to v83
    aoe: update copyright date
    aoe: perform I/O completions in parallel
    ...

    Linus Torvalds