13 Jan, 2012

1 commit

  • Currently no udev events for memory hotplug "online" and "offline" are
    generated:

    # udevadm monitor
    # echo offline > /sys/devices/system/memory/memory4/state
    ==> No event

    When kdump is loaded, kexec detects the current memory configuration and
    stores it in the pre-allocated ELF core header. Therefore, for kdump it
    is necessary to reload the kdump kernel with kexec when the memory
    configuration changes (e.g. for online/offline hotplug memory).

    In order to do this automatically, udev rules should be used. This kernel
    patch adds udev events for "online" and "offline". Together with this
    kernel patch, the following udev rules for online/offline have to be added
    to "/etc/udev/rules.d/98-kexec.rules":

    SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/etc/init.d/kdump restart"
    SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/etc/init.d/kdump restart"
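
    As a rough illustration of what the two rules above match, the
    following Python sketch models the decision. The helper name
    should_reload_kdump is hypothetical (it is not part of udev or kexec),
    and the event is assumed to arrive as a plain dict of uevent properties.

```python
# Hypothetical model of the two udev rules: reload kdump only for
# "online"/"offline" events that come from the "memory" subsystem.
RELOAD_ACTIONS = {"online", "offline"}


def should_reload_kdump(event):
    """Return True if a uevent (dict of properties) matches either rule."""
    return (
        event.get("SUBSYSTEM") == "memory"
        and event.get("ACTION") in RELOAD_ACTIONS
    )
```

    A caller would run the kdump restart only when this returns True.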

    [sfr@canb.auug.org.au: fixups for class to subsystem conversion]
    Signed-off-by: Michael Holzheu
    Cc: Heiko Carstens
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Kay Sievers
    Cc: Dave Hansen
    Cc: Martin Schwidefsky
    Cc: Greg KH
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     

22 Dec, 2011

1 commit

  • This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

19 Oct, 2011

2 commits

  • (Resending as I am not seeing it in -next so maybe it got lost)

    mm: memory hotplug: Check if pages are correctly reserved on a per-section basis

    It is expected that memory being brought online is PageReserved
    similar to what happens when the page allocator is being brought up.
    Memory is onlined in "memory blocks" which consist of one or more
    sections. Unfortunately, the code that verifies PageReserved is
    currently assuming that the memmap backing all these pages is virtually
    contiguous which is only the case when CONFIG_SPARSEMEM_VMEMMAP is set.
    As a result, memory hot-add is failing on those configurations with
    the message:

    kernel: section number XXX page number 256 not reserved, was it already online?

    This patch updates the PageReserved check to lookup struct page once
    per section to guarantee the correct struct page is being checked.
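
    The idea can be sketched with a toy model in Python. This is not the
    kernel code: the memmap is assumed to be a dict that maps a section
    number to its own list of "reserved" flags, which mimics a memmap that
    is not virtually contiguous across sections. Treating a missing
    section as a failure is also an assumption of this sketch.

```python
# Toy model: each section owns its own memmap, so the lookup is repeated
# once per section instead of indexing past the first section's memmap.
def pages_correctly_reserved(memmaps, start_section, sections_per_block):
    """memmaps maps section number -> list of bools (True = PageReserved)."""
    for nr in range(start_section, start_section + sections_per_block):
        section_pages = memmaps.get(nr)
        if section_pages is None:
            return False  # assumed: a section without a memmap fails the check
        for page_nr, reserved in enumerate(section_pages):
            if not reserved:
                print(f"section number {nr} page number {page_nr} not reserved")
                return False
    return True
```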

    [Check pages within sections properly: rientjes@google.com]
    [original patch by: nfont@linux.vnet.ibm.com]
    Signed-off-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Tested-by: Nathan Fontenot
    Signed-off-by: Greg Kroah-Hartman

    Mel Gorman
     
  • This reverts commit 54f23eb7ba7619de85d8edca6e5336bc33072dbd.

    Turns out this patch is wrong, another correct one will follow it.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

27 Sep, 2011

2 commits

  • The check to ensure that pages of recently added memory sections are correctly
    marked as reserved before trying to online the memory is broken. The request
    to online the memory fails with the following:

    kernel: section number XXX page number 256 not reserved, was it already online?

    This updates the page reservation checking to check the pages of each memory
    section of the memory block being onlined individually.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Greg Kroah-Hartman

    Nathan Fontenot
     
  • The sysfs memory probe interface allows unaligned regions
    to be added:

    # echo 0xffffff > /sys/devices/system/memory/probe

    # cat /proc/iomem
    00ffffff-01fffffe : System RAM
    01ffffff-02fffffe : System RAM
    02ffffff-03fffffe : System RAM
    03ffffff-04fffffe : System RAM
    04ffffff-05fffffe : System RAM

    Return -EINVAL instead of creating these bad regions.
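
    A minimal sketch of such an alignment check, written in Python for
    illustration only. The 16MB block size is an assumption derived from
    the size of the regions listed above, and probe() here is a stand-in
    name, not the kernel function.

```python
import errno

BLOCK_SIZE = 16 * 1024 * 1024  # assumed 16MB; must be a power of two


def probe(phys_addr, block_size=BLOCK_SIZE):
    """Return 0 if phys_addr is block-aligned, otherwise -EINVAL."""
    if phys_addr & (block_size - 1):
        return -errno.EINVAL
    return 0
```

    With these assumptions, probe(0xffffff) is rejected while
    probe(0x1000000) is accepted.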

    Signed-off-by: Anton Blanchard
    Signed-off-by: Greg Kroah-Hartman

    Anton Blanchard
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

12 Jul, 2011

1 commit


24 May, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    b43: fix comment typo reqest -> request
    Haavard Skinnemoen has left Atmel
    cris: typo in mach-fs Makefile
    Kconfig: fix copy/paste-ism for dell-wmi-aio driver
    doc: timers-howto: fix a typo ("unsgined")
    perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
    md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
    treewide: fix a few typos in comments
    regulator: change debug statement be consistent with the style of the rest
    Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
    audit: acquire creds selectively to reduce atomic op overhead
    rtlwifi: don't touch with treewide double semicolon removal
    treewide: cleanup continuations and remove logging message whitespace
    ath9k_hw: don't touch with treewide double semicolon removal
    include/linux/leds-regulator.h: fix syntax in example code
    tty: fix typo in descripton of tty_termios_encode_baud_rate
    xtensa: remove obsolete BKL kernel option from defconfig
    m68k: fix comment typo 'occcured'
    arch:Kconfig.locks Remove unused config option.
    treewide: remove extra semicolons
    ...

    Linus Torvalds
     

13 May, 2011

1 commit


12 May, 2011

1 commit

  • On ppc64 the minimum memory section for hotplug is 16MB but most
    recent machines have a memory block size of 256MB. This means
    memory_block_change_state does 16 separate calls to
    memory_section_action.

    This also means we call the notifiers 16 times and the hook
    in the ehea network driver is quite costly. To offline one 256MB
    region takes:

    # time echo offline > /sys/devices/system/memory/memory32/state
    7.9s

    This patch removes the loop and calls online_pages or
    remove_memory once for the entire region and in doing so makes
    the logic simpler since we don't have to back out if things fail
    part way through.

    The same test to offline one region now takes:

    # time echo offline > /sys/devices/system/memory/memory32/state
    0.67s

    Over 11 times faster.
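
    The figures quoted above can be checked with a few lines of Python;
    every number comes from the text and nothing here was measured
    independently.

```python
MB = 1024 * 1024

section_size = 16 * MB   # minimum hotplug section on ppc64
block_size = 256 * MB    # memory block size on most recent machines

# One memory_section_action call per section before the patch.
calls_before = block_size // section_size

# Ratio of the two timings quoted in the commit message.
speedup = 7.9 / 0.67

print(calls_before)
print(round(speedup, 1))
```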

    Signed-off-by: Anton Blanchard
    Signed-off-by: Greg Kroah-Hartman

    Anton Blanchard
     

26 Apr, 2011

1 commit


10 Apr, 2011

1 commit


04 Feb, 2011

3 commits

  • As a follow-on to the recent patches I submitted that allowed for a sysfs
    memory block to span multiple memory sections, we should also update the
    probe routine to online all of the memory sections in a memory block. Without
    this patch the current code will only add a single memory section. I think
    the probe routine should add all of the memory sections in the specified memory
    block so that its behavior is in line with memory hotplug actions through
    the sysfs interfaces.

    This patch applies on top of the previous sysfs memory updates to allow
    a sysfs directory to span multiple memory sections.

    https://lkml.org/lkml/2011/1/20/245

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Greg Kroah-Hartman

    Nathan Fontenot
     
  • Update the 'phys_index' property of the memory_block struct to be
    called start_section_nr, and add an end_section_nr property. The
    data tracked here is the same but the updated naming is more in line
    with what is stored here, namely the first and last section number
    that the memory block spans.

    The names presented to userspace remain the same, phys_index for
    start_section_nr and end_phys_index for end_section_nr, to avoid breaking
    anything in userspace.

    This also updates the node sysfs code to be aware of the new capability for
    a memory block to contain multiple memory sections and be aware of the memory
    block structure name changes (start_section_nr). This requires an additional
    parameter to unregister_mem_sect_under_nodes so that we know which memory
    section of the memory block to unregister.

    Signed-off-by: Nathan Fontenot
    Reviewed-by: Robin Holt
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Greg Kroah-Hartman

    Nathan Fontenot
     
  • Update the memory sysfs code such that each sysfs memory directory is now
    considered a memory block that can span multiple memory sections per
    memory block. The default size of each memory block is SECTION_SIZE_BITS
    to maintain the current behavior of having a single memory section per
    memory block (i.e. one sysfs directory per memory section).

    For architectures that want to have memory blocks span multiple
    memory sections they need only define their own memory_block_size_bytes()
    routine.
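
    The relationship between block size and section size can be sketched
    as follows. This is Python written for illustration; both helper
    names are hypothetical, and the mapping from a block number to its
    section range assumes that blocks are numbered consecutively from
    section 0.

```python
def sections_per_block(block_size_bytes, section_size_bits):
    """Number of memory sections covered by one sysfs memory block."""
    return block_size_bytes // (1 << section_size_bits)


def block_section_range(block_id, per_block):
    """First and last section number spanned by the given memory block."""
    start = block_id * per_block
    return start, start + per_block - 1
```

    With the default block size of 1 << SECTION_SIZE_BITS the first helper
    returns 1, which is the "one sysfs directory per memory section" case.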

    Update the memory hotplug documentation to reflect the new behaviors of
    memory blocks reflected in sysfs.

    Signed-off-by: Nathan Fontenot
    Reviewed-by: Robin Holt
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Greg Kroah-Hartman

    Nathan Fontenot
     

23 Oct, 2010

4 commits


10 Apr, 2010

1 commit

  • This reverts commit ba168fc37dea145deeb8fa9e7e71c748d2e00d74.

    It changes user-visible sysfs interfaces, and breaks some existing user
    space applications which apparently rely on the fact that the output
    does not contain the "0x" prefix.
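
    A userspace tool that wants to tolerate both output forms could parse
    the value as sketched below. This is an illustration in Python; the
    function name is hypothetical and the value is assumed to be a
    hexadecimal number such as the content of block_size_bytes.

```python
def parse_hex_value(text):
    """Parse a hexadecimal sysfs value with or without a "0x" prefix."""
    # int() with base 16 accepts an optional "0x" prefix.
    return int(text.strip(), 16)
```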

    Requested-by: Heiko Carstens
    Acked-by: KOSAKI Motohiro
    Acked-by: Wu Fengguang
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
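
    The first step (deciding which header a file needs) could look
    roughly like the sketch below. This is not the slabh-sweep.py script
    itself: the identifier lists and the function name are assumptions
    made for illustration, and the real script may detect usage
    differently.

```python
import re

# Hypothetical identifier lists; the real script may use different ones.
SLAB_RE = re.compile(r"\b(kmalloc|kzalloc|kcalloc|kfree|kmem_cache_\w+)\b")
GFP_RE = re.compile(r"\b(GFP_[A-Z_]+|__GFP_[A-Z_]+|gfp_t)\b")


def needed_include(source):
    """Return the single header the file should include, or None."""
    if SLAB_RE.search(source):
        return "linux/slab.h"  # slab.h in turn includes gfp.h
    if GFP_RE.search(source):
        return "linux/gfp.h"
    return None
```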

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

18 Mar, 2010

1 commit

  • /sys/devices/system/memory/memoryX/phys_device is supposed to contain the
    number of the physical device that the corresponding piece of memory
    belongs to.

    In case a physical device should be replaced or taken offline for whatever
    reason it is necessary to set all corresponding memory pieces offline.
    The current implementation always sets phys_device to '0' and there is no
    way or hook to change that. Seems like there was a plan to implement that
    but it wasn't finished for whatever reason.

    So add a weak function which architectures can override to actually set
    the phys_device from within add_memory_block().

    Signed-off-by: Heiko Carstens
    Cc: Dave Hansen
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

08 Mar, 2010

3 commits

  • Constify struct kset_uevent_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • Passing the attribute to the low level IO functions allows all kinds
    of cleanups, by sharing low level IO code without requiring
    a separate function for every piece of data.

    Also drivers can extend the attributes with their own data fields
    and use those in the low level function.

    This makes the class attributes the same as sysdev_class attributes
    and plain attributes.

    This will allow further cleanups in drivers.

    Full tree sweep converting all users.

    Signed-off-by: Andi Kleen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     
  • This attribute is really a sysdev_class attribute, not a plain class attribute.

    They are identical in layout currently, but this might not always be
    the case.

    Signed-off-by: Andi Kleen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     

21 Jan, 2010

1 commit


17 Jan, 2010

2 commits

  • The function prototypes mismatch in this call stack:

    [] print_block_size+0x58/0x60
    [] sysdev_class_show+0x1f/0x30
    [] sysfs_read_file+0xcb/0x1f0
    [] vfs_read+0xc8/0x180

    Due to prototype mismatch, print_block_size() will sprintf() into
    *attribute instead of *buf, hence user space will read the initial
    zeros from *buf:
    $ hexdump /sys/devices/system/memory/block_size_bytes
    0000000 0000 0000 0000 0000
    0000008

    After patch:
    cat /sys/devices/system/memory/block_size_bytes
    0x8000000

    This complements commits c29af9636 and 4a0b2b4dbe.

    Signed-off-by: Wu Fengguang
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: "Zheng, Shaohui"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Signed-off-by: Wu Fengguang
    Cc: Andi Kleen
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

18 Dec, 2009

1 commit

  • Memory balloon drivers can allocate a large amount of memory which is not
    movable but could be freed to accommodate memory hotplug remove.

    Prior to calling the memory hotplug notifier chain the memory in the
    pageblock is isolated. Currently, if the migrate type is not
    MIGRATE_MOVABLE the isolation will not proceed, causing the memory removal
    for that page range to fail.

    Rather than failing pageblock isolation if the migratetype is not
    MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock,
    and not on the LRU, are owned by a registered balloon driver (or other
    entity) using a notifier chain. If all of the non-movable pages are owned
    by a balloon, they can be freed later through the memory notifier chain
    and the range can still be isolated in set_migratetype_isolate().
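
    The decision described above can be modelled as follows. This Python
    sketch is only an approximation of the logic in the text: the page
    representation, the function name and the way the balloon-owned page
    count is passed in are all assumptions, not the kernel interface.

```python
def can_isolate(migratetype, pages, balloon_owned):
    """
    pages: list of dicts with 'count' (reference count) and 'lru' (bool).
    balloon_owned: number of pages that the notifier chain reports as
    owned by registered balloon drivers (or similar entities).
    """
    if migratetype == "MIGRATE_MOVABLE":
        return True
    # Pages that are in use and not on the LRU cannot simply be migrated.
    immobile = sum(1 for p in pages if p["count"] > 0 and not p["lru"])
    # Isolation may proceed only if every such page belongs to a balloon.
    return balloon_owned > 0 and balloon_owned == immobile
```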

    Signed-off-by: Robert Jennings
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Cc: Brian King
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Benjamin Herrenschmidt

    Robert Jennings
     

16 Dec, 2009

1 commit

  • This is a simpler, gentler variant of memory_failure() for soft page
    offlining controlled from user space. It doesn't kill anything, just
    tries to invalidate and if that doesn't work migrate the
    page away.

    This is useful for predictive failure analysis, where a page has
    a high rate of corrected errors, but hasn't gone bad yet. Instead
    it can be offlined early and avoided.

    The offlining is controlled from sysfs, including a new generic
    entry point for hard page offlining for symmetry too.

    We use the page isolate facility to prevent re-allocation
    race. Normally this is only used by memory hotplug. To avoid
    races with memory allocation I am using lock_system_sleep().
    This avoids the situation where memory hotplug is about
    to isolate a page range and then hwpoison undoes that work.
    This is a big hammer, but it is currently the simplest
    solution.

    When the page is not free or LRU we try to free pages
    from slab and other caches. The slab freeing is currently
    quite dumb and does not try to focus on the specific slab
    cache which might own the page. This could be potentially
    improved later.

    Thanks to Fengguang Wu and Haicheng Li for some fixes.

    [Added fix from Andrew Morton to adapt to new migrate_pages prototype]
    Signed-off-by: Andi Kleen

    Andi Kleen
     

07 Jan, 2009

1 commit

  • Show node to memory section relationship with symlinks in sysfs

    Add /sys/devices/system/node/nodeX/memoryY symlinks for all
    the memory sections located on nodeX. For example:
    /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
    indicates that memory section 135 resides on node1.
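
    A userspace script could use these symlinks roughly as sketched below
    (Python, for illustration). The function name is hypothetical; it
    takes the entry names of a nodeX directory, which a caller might
    obtain with os.listdir("/sys/devices/system/node/node1").

```python
import re

MEMORY_LINK_RE = re.compile(r"^memory(\d+)$")


def sections_on_node(entries):
    """Return the sorted memory section numbers that have a memoryY
    entry among the given nodeX directory entry names."""
    numbers = []
    for name in entries:
        match = MEMORY_LINK_RE.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)
```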

    Also revises documentation to cover this change as well as updating
    Documentation/ABI/testing/sysfs-devices-memory to include descriptions
    of memory hotremove files 'phys_device', 'phys_index', and 'state'
    that were previously not described there.

    In addition to it always being a good policy to provide users with
    the maximum possible amount of physical location information for
    resources that can be hot-added and/or hot-removed, the following
    are some (but likely not all) of the user benefits provided by
    this change.
    Immediate:
    - Provides information needed to determine the specific node
    on which a defective DIMM is located. This will reduce system
    downtime when the node or defective DIMM is swapped out.
    - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM. This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node. The consequences of reintroducing the defective memory
    could be ugly.
    - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
    Future:
    - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

    Symlink creation during boot was tested on 2-node x86_64, 2-node
    ppc64, and 2-node ia64 systems. Symlink creation during physical
    memory hot-add tested on a 2-node x86_64 system.

    Signed-off-by: Gary Hade
    Signed-off-by: Badari Pulavarty
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gary Hade
     

20 Oct, 2008

1 commit


28 Jul, 2008

1 commit


27 Jul, 2008

1 commit

  • Use WARN() instead of a printk+WARN_ON() pair; this way the message
    becomes part of the warning section for better reporting/collection.

    Signed-off-by: Arjan van de Ven
    Cc: Greg KH
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

25 Jul, 2008

1 commit

  • Memory may be hot-removed on a per-memory-block basis, particularly on
    POWER where the SPARSEMEM section size often matches the memory-block
    size. A user-level agent must be able to identify which sections of
    memory are likely to be removable before attempting the potentially
    expensive operation. This patch adds a file called "removable" to the
    memory directory in sysfs to help such an agent. In this patch, a memory
    block is considered removable if:

    o It contains only MOVABLE pageblocks
    o It contains only pageblocks with free pages regardless of pageblock type

    On the other hand, a memory block starting with a PageReserved() page will
    never be considered removable. Without this patch, the user-agent is
    forced to choose a memory block to remove randomly.

    Sample output of the sysfs files:

    ./memory/memory0/removable: 0
    ./memory/memory1/removable: 0
    ./memory/memory2/removable: 0
    ./memory/memory3/removable: 0
    ./memory/memory4/removable: 0
    ./memory/memory5/removable: 0
    ./memory/memory6/removable: 0
    ./memory/memory7/removable: 1
    ./memory/memory8/removable: 0
    ./memory/memory9/removable: 0
    ./memory/memory10/removable: 0
    ./memory/memory11/removable: 0
    ./memory/memory12/removable: 0
    ./memory/memory13/removable: 0
    ./memory/memory14/removable: 0
    ./memory/memory15/removable: 0
    ./memory/memory16/removable: 0
    ./memory/memory17/removable: 1
    ./memory/memory18/removable: 1
    ./memory/memory19/removable: 1
    ./memory/memory20/removable: 1
    ./memory/memory21/removable: 1
    ./memory/memory22/removable: 1
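
    A user-level agent could pick candidate blocks from output in this
    form with a small parser such as the one below. It is a sketch in
    Python; the function name is hypothetical and the input is assumed to
    be lines formatted exactly like the sample above.

```python
import re

LINE_RE = re.compile(r"memory(\d+)/removable:\s*([01])\s*$")


def removable_blocks(lines):
    """Return the memory block numbers whose 'removable' value is 1."""
    result = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(2) == "1":
            result.append(int(match.group(1)))
    return result
```

    Applied to the sample above, this selects memory7 and memory17
    through memory22.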

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

22 Jul, 2008

1 commit

  • This allows attributes to be generated dynamically and show/store
    functions to be shared between attributes. Right now most attributes are
    generated by special macros and a lot of duplicated code. With the attribute
    passed it's instead possible to attach some data to the attribute
    and then use that in shared low level functions to do different things.

    I need this for the dynamically generated bank attributes in the x86
    machine check code, but it'll allow some further cleanups.

    I converted all users in tree to the new show/store prototype. It's a single
    huge patch to avoid unbisectable sections.

    Runtime tested: x86-32, x86-64
    Compiled only: ia64, powerpc
    Not compile tested/only grep converted: sh, arm, avr32

    Signed-off-by: Andi Kleen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     

13 May, 2008

1 commit


20 Apr, 2008

2 commits