26 Jun, 2015

1 commit


10 Oct, 2014

1 commit

  • The following commits prevent allocating more than one firmware_map_entry
    for the same memory range:
    f0093ede: drivers/firmware/memmap.c: don't allocate firmware_map_entry
    of same memory range
    49c8b24d: drivers/firmware/memmap.c: pass the correct argument to
    firmware_map_find_entry_bootmem()

    But that is not enough. When a PNP0C80 device is added by acpi_scan_init(),
    the memmap sysfs entries of the same firmware_map_entry are created twice,
    as follows:

    # cat /sys/firmware/memmap/*/start
    0x40000000000
    0x60000000000
    0x4a837000
    0x4a83a000
    0x4a8b5000
    ...
    0x40000000000
    0x60000000000
    ...

    The issue arises as follows:

    1. e820_reserve_resources() allocates a firmware_map_entry for every
    memory range defined in e820 and links these entries into the
    map_entries list.

    map_entries -> entry 1 -> ... -> entry N

    2. When the memory of a PNP0C80 device is limited by the mem= boot
    option, acpi_scan_init() adds the memory device. In this case,
    firmware_map_add_hotplug() allocates a firmware_map_entry and creates
    its memmap sysfs entry.

    map_entries -> entry 1 -> ... -> entry N -> entry N+1
                                                    |
                                                 memmap 1

    3. firmware_memmap_init() creates memmap sysfs entries for all
    firmware_map_entry objects linked into map_entries, including
    entry N+1, which already has one.

    map_entries -> entry 1 -> ... -> entry N -> entry N+1
                     |                 |            |
                  memmap 2         memmap N+1    memmap 1
                                                 memmap N+2

    So a kernel panic occurs when the PNP0C80 device is hot-removed:

    BUG: unable to handle kernel paging request at 00000001003e000b
    IP: sysfs_open_file+0x46/0x2b0
    PGD 203a89fe067 PUD 0
    Oops: 0000 [#1] SMP
    ...
    Call Trace:
    do_dentry_open+0x1ef/0x2a0
    finish_open+0x31/0x40
    do_last+0x57c/0x1220
    path_openat+0xc2/0x4c0
    do_filp_open+0x4b/0xb0
    do_sys_open+0xf3/0x1f0
    SyS_open+0x1e/0x20
    system_call_fastpath+0x16/0x1b

    This patch adds a check for whether the memmap sysfs entry of a
    firmware_map_entry has already been created, and does not create it a
    second time.
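
    The check described above can be illustrated with a small userspace
    model. This is only a sketch: the struct, the sysfs_created flag and
    both function names are hypothetical stand-ins and are not taken from
    drivers/firmware/memmap.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a map entry; only the fields this sketch needs. */
struct fw_map_entry {
    unsigned long long start;
    unsigned long long end;          /* inclusive */
    const char *type;
    bool sysfs_created;              /* hypothetical flag: set once memmap X exists */
    struct fw_map_entry *next;
};

/* Export one entry, refusing to export the same entry twice. */
int add_sysfs_entry(struct fw_map_entry *entry)
{
    if (entry->sysfs_created)
        return -EEXIST;              /* already exported, e.g. by the hotplug path */
    entry->sysfs_created = true;
    return 0;
}

/* Late-init pass: export every entry that has not been exported yet.
 * Returns the number of entries exported by this pass. */
int memmap_init(struct fw_map_entry *head)
{
    int created = 0;

    for (struct fw_map_entry *e = head; e != NULL; e = e->next)
        if (add_sysfs_entry(e) == 0)
            created++;
    return created;
}
```

    With the guard in place, an entry exported during hot-add is skipped by
    the later init pass instead of being exported a second time.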

    Signed-off-by: Yasuaki Ishimatsu
    Cc: Santosh Shilimkar
    Cc: Toshi Kani
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     

07 Aug, 2014

2 commits

  • When memory is limited by mem= and the ACPI DSDT table has a PNP0C80
    device, several firmware_map_entry objects are allocated for the same
    memory range, and memmap X sysfs entries with the same memory range are
    created as follows:

    # cat /sys/firmware/memmap/0/*
    0x407ffffffff
    0x40000000000
    System RAM
    # cat /sys/firmware/memmap/33/*
    0x407ffffffff
    0x40000000000
    System RAM
    # cat /sys/firmware/memmap/35/*
    0x407ffffffff
    0x40000000000
    System RAM

    In this case, hot-removing memory causes a kernel panic with the
    following call trace:

    BUG: unable to handle kernel paging request at 00000001003e000b
    IP: sysfs_open_file+0x46/0x2b0
    PGD 203a89fe067 PUD 0
    Oops: 0000 [#1] SMP
    ...
    Call Trace:
    do_dentry_open+0x1ef/0x2a0
    finish_open+0x31/0x40
    do_last+0x57c/0x1220
    path_openat+0xc2/0x4c0
    do_filp_open+0x4b/0xb0
    do_sys_open+0xf3/0x1f0
    SyS_open+0x1e/0x20
    system_call_fastpath+0x16/0x1b

    The problem occurs as follows:

    When e820_reserve_resources() is called, a firmware_map_entry is
    allocated for every range in the e820 memory map, and all of these
    entries are added to the map_entries list as follows:

    map_entries
     -> +--- entry A --------+ -> ...
        | start 0x40000000000|
        | end   0x407ffffffff|
        | type  System RAM   |
        +--------------------+

    After that, if the ACPI DSDT table has a PNP0C80 device and its memory
    range is limited by mem=, the PNP0C80 device is hot-added. A
    firmware_map_entry for the PNP0C80 device is then allocated and added to
    the map_entries list as follows:

    map_entries
     -> +--- entry A --------+ -> ... -> +--- entry B --------+
        | start 0x40000000000|           | start 0x40000000000|
        | end   0x407ffffffff|           | end   0x407ffffffff|
        | type  System RAM   |           | type  System RAM   |
        +--------------------+           +--------------------+

    Then the memmap 0 sysfs entry is created for entry B.

    After that, firmware_memmap_init() creates memmap sysfs entries for all
    firmware_map_entry objects in the map_entries list. As a result, memmap 33
    is created for entry A and memmap 35 for entry B. But the kobject of
    entry B is already in use by memmap 0, so creating memmap 35 corrupts
    the kobject.

    When the memory is hot-removed, memmap 0 is destroyed and its kobject is
    freed. But the kobject can still be reached via memmap 35, so opening
    memmap 35 causes a kernel panic.

    This patch checks whether the map_entries list already contains a
    firmware_map_entry for the same memory range and, if so, does not
    allocate another one.
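
    The idea of looking up a range before allocating can be sketched in
    userspace C as follows. The list handling is simplified, and
    find_entry() and add_entry_once() are hypothetical names used only for
    this illustration.

```c
#include <stdlib.h>
#include <string.h>

struct fw_map_entry {
    unsigned long long start;
    unsigned long long end;          /* inclusive */
    const char *type;
    struct fw_map_entry *next;
};

static struct fw_map_entry *map_entries;   /* head of the model list */

/* Return the entry that describes exactly this range and type, or NULL. */
struct fw_map_entry *find_entry(unsigned long long start,
                                unsigned long long end, const char *type)
{
    for (struct fw_map_entry *e = map_entries; e != NULL; e = e->next)
        if (e->start == start && e->end == end && strcmp(e->type, type) == 0)
            return e;
    return NULL;
}

/* Return the existing entry for the range, or allocate and link a new one.
 * Returns NULL only if allocation fails. */
struct fw_map_entry *add_entry_once(unsigned long long start,
                                    unsigned long long end, const char *type)
{
    struct fw_map_entry *e = find_entry(start, end, type);

    if (e != NULL)
        return e;                    /* same range already present: reuse it */

    e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    e->start = start;
    e->end = end;
    e->type = type;
    e->next = map_entries;           /* the model prepends for brevity */
    map_entries = e;
    return e;
}
```

    Adding the same range twice then yields a single list node, so only one
    sysfs entry can ever refer to it.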

    Signed-off-by: Yasuaki Ishimatsu
    Cc: Santosh Shilimkar
    Cc: Toshi Kani
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • firmware_map_add_hotplug() calls firmware_map_find_entry_bootmem() to
    get a free firmware_map_entry. But the end argument is not correct, so
    firmware_map_find_entry_bootmem() cannot find the firmware_map_entry.

    This patch passes the correct end argument to firmware_map_find_entry_bootmem().

    Signed-off-by: Yasuaki Ishimatsu
    Cc: Santosh Shilimkar
    Cc: Toshi Kani
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     

22 Jan, 2014

1 commit

  • Switch to memblock interfaces for the early memory allocator instead of
    the bootmem allocator. There is no functional change in behavior from
    the point of view of bootmem users.

    Archs already converted to NO_BOOTMEM now use memblock interfaces
    directly instead of bootmem wrappers built on top of memblock. For
    archs which still use bootmem, these new APIs just fall back to the
    existing bootmem APIs.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tejun Heo
    Cc: Tony Lindgren
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Santosh Shilimkar
     

30 Apr, 2013

1 commit

  • When hot-removing memory, the firmware_map_entry covering that memory
    range is released by release_firmware_map_entry(). If the entry was
    allocated by bootmem, release_firmware_map_entry() adds it to the
    map_entries_bootmem list only when firmware_map_find_entry() finds it
    in the map_entries list. But firmware_map_find_entry() never finds the
    entry, since it has already been removed from the map_entries list. So
    the entry just leaks.

    Here are the steps that leak the firmware_map_entry:
    firmware_map_remove()
    -> firmware_map_find_entry()
       Find the entry to release in the map_entries list
    -> firmware_map_remove_entry()
       Delete the entry from the map_entries list
    -> remove_sysfs_fw_map_entry()
       ...
       -> release_firmware_map_entry()
          -> firmware_map_find_entry()
             Look for the entry in the map_entries list, but the entry has
             already been deleted from that list. So the entry is not added
             to map_entries_bootmem. Thus the entry leaks.

    release_firmware_map_entry() should not call firmware_map_find_entry(),
    since the released entry has already been deleted from the map_entries
    list. So this patch removes the firmware_map_find_entry() call from
    release_firmware_map_entry().
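
    The difference between the leaking and the corrected release path can
    be modelled in a few lines of userspace C. All names below
    (entry_list, release_buggy(), release_fixed()) are hypothetical; the
    sketch only mirrors the control flow described in this message.

```c
#include <stdbool.h>
#include <stddef.h>

struct fw_map_entry {
    int id;
    bool from_bootmem;               /* storage cannot be freed at runtime */
    struct fw_map_entry *next;
};

struct entry_list {
    struct fw_map_entry *head;
};

static bool list_contains(const struct entry_list *l, const struct fw_map_entry *e)
{
    for (const struct fw_map_entry *p = l->head; p != NULL; p = p->next)
        if (p == e)
            return true;
    return false;
}

static void list_push(struct entry_list *l, struct fw_map_entry *e)
{
    e->next = l->head;
    l->head = e;
}

static void list_remove(struct entry_list *l, struct fw_map_entry *e)
{
    for (struct fw_map_entry **pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            return;
        }
    }
}

/* Old flow: park the entry only if it is still on the live list. The caller
 * has already unlinked it, so the lookup fails and the entry is lost. */
bool release_buggy(struct entry_list *live, struct entry_list *parked,
                   struct fw_map_entry *e)
{
    if (e->from_bootmem && list_contains(live, e)) {
        list_push(parked, e);
        return true;
    }
    return false;
}

/* Fixed flow: a bootmem entry is parked without consulting the live list. */
bool release_fixed(struct entry_list *parked, struct fw_map_entry *e)
{
    if (e->from_bootmem) {
        list_push(parked, e);
        return true;
    }
    return false;                    /* heap entries would simply be freed */
}
```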

    Signed-off-by: Yasuaki Ishimatsu
    Reviewed-by: Wanpeng Li
    Reviewed-by: Tang Chen
    Acked-by: Toshi Kani
    Cc: Wen Congyang
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     

24 Feb, 2013

1 commit

  • When memory is (hot-)added to the system, the /sys/firmware/memmap/X/{end,
    start, type} sysfs files are created. But there is no code to remove
    these files. This patch implements the function to remove them.

    We cannot free firmware_map_entry which is allocated by bootmem because
    there is no way to do so when the system is up. But we can at least
    remember the address of that memory and reuse the storage when the
    memory is added next time.

    This patch also introduces a new list map_entries_bootmem to link the
    map entries allocated by bootmem when they are removed, and a lock to
    protect it. And these entries will be reused when the memory is
    hot-added again.
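
    A minimal userspace sketch of this reuse scheme follows. It assumes
    that a parked entry is matched by its memory range; park_entry(),
    take_parked() and get_entry_for_hotadd() are hypothetical names, and
    the locking discussed in the NOTE below is left out.

```c
#include <stdlib.h>

struct fw_map_entry {
    unsigned long long start;
    unsigned long long end;          /* inclusive */
    struct fw_map_entry *next;
};

static struct fw_map_entry *parked;  /* model of map_entries_bootmem */

/* Remember an entry whose storage cannot be freed. */
void park_entry(struct fw_map_entry *e)
{
    e->next = parked;
    parked = e;
}

/* Unlink and return a parked entry with the same range, or NULL. */
struct fw_map_entry *take_parked(unsigned long long start, unsigned long long end)
{
    for (struct fw_map_entry **pp = &parked; *pp != NULL; pp = &(*pp)->next) {
        struct fw_map_entry *e = *pp;

        if (e->start == start && e->end == end) {
            *pp = e->next;
            e->next = NULL;
            return e;
        }
    }
    return NULL;
}

/* On hot-add, prefer parked storage and fall back to a fresh allocation. */
struct fw_map_entry *get_entry_for_hotadd(unsigned long long start,
                                          unsigned long long end)
{
    struct fw_map_entry *e = take_parked(start, end);

    if (e == NULL) {
        e = calloc(1, sizeof(*e));
        if (e == NULL)
            return NULL;
        e->start = start;
        e->end = end;
    }
    return e;
}
```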

    The idea was suggested by Andrew Morton.

    NOTE: It is unsafe to return an entry pointer and release the
    map_entries_lock. So we should not hold the map_entries_lock
    separately in firmware_map_find_entry() and
    firmware_map_remove_entry(). Hold the map_entries_lock across find
    and remove /sys/firmware/memmap/X operation.

    And also, users of these two functions need to be careful to
    hold the lock when using these two functions.

    [tangchen@cn.fujitsu.com: Hold spinlock across find|remove /sys operation]
    [tangchen@cn.fujitsu.com: fix the wrong comments of map_entries]
    [tangchen@cn.fujitsu.com: reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem]
    [tangchen@cn.fujitsu.com: fix section mismatch problem]
    [tangchen@cn.fujitsu.com: fix the doc format in drivers/firmware/memmap.c]
    Signed-off-by: Wen Congyang
    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Tang Chen
    Reviewed-by: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Tang Chen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Julian Calaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     

20 Oct, 2012

1 commit


31 Jul, 2012

1 commit

  • There are two ways to create a /sys/firmware/memmap/X sysfs entry:

    - firmware_map_add_early
      When the system starts, it is called from e820_reserve_resources()
    - firmware_map_add_hotplug
      When memory is hot-plugged, it is called from add_memory()

    But these functions are called with inconsistent values for the end
    argument:

    - end argument of firmware_map_add_early() : start + size - 1
    - end argument of firmware_map_add_hotplug() : start + size

    This patch unifies them to "start + size". Applying the patch does not
    change the content of the /sys/firmware/memmap/X/end file.
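
    The convention can be shown with a short C sketch. It assumes that the
    value shown in the end file is the inclusive last address, so the
    exclusive "start + size" passed by callers is reduced by one when it is
    stored. The type and function names are hypothetical.

```c
#include <stdint.h>

struct fw_range {
    uint64_t start;
    uint64_t end;                    /* inclusive, as exported to user space */
};

/* Callers pass the exclusive end (start + size); the stored end is inclusive. */
struct fw_range make_range(uint64_t start, uint64_t end_exclusive)
{
    struct fw_range r = { start, end_exclusive - 1 };
    return r;
}

/* Convenience wrapper for callers that know the size of the range. */
struct fw_range range_from_size(uint64_t start, uint64_t size)
{
    return make_range(start, start + size);
}
```

    Because both call paths now go through the same conversion, the
    exported end value is the same regardless of which path created the
    entry.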

    [akpm@linux-foundation.org: clarify comments]
    Signed-off-by: Yasuaki Ishimatsu
    Reviewed-by: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

08 Mar, 2010

1 commit

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing
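
    As an illustration of the pattern, the following userspace sketch
    declares an operations table const and refers to it through a
    pointer-to-const. The demo_* types are hypothetical and much simpler
    than struct sysfs_ops.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified operations table. */
struct demo_ops {
    int (*show)(char *buf, size_t len);
};

static int show_type(char *buf, size_t len)
{
    return snprintf(buf, len, "System RAM\n");
}

/* const lets the toolchain place the table in read-only data. */
static const struct demo_ops memmap_ops = {
    .show = show_type,
};

/* Users keep a pointer-to-const, so they cannot modify the shared table. */
struct demo_ktype {
    const struct demo_ops *ops;
};

const struct demo_ktype demo_memmap_ktype = {
    .ops = &memmap_ops,
};
```

    An assignment such as demo_memmap_ktype.ops->show = NULL is then
    rejected at compile time.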

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     

07 Mar, 2010

1 commit

  • A memmap is a directory in sysfs which includes 3 text files: start, end
    and type. For example:

    start: 0x100000
    end: 0x7e7b1cff
    type: System RAM

    The interface firmware_map_add was not called explicitly. Remove it and
    add the function firmware_map_add_hotplug as the hotplug interface of
    memmap.

    Each memory entry has a memmap in sysfs. When we hot-add new memory, sysfs
    does not export a memmap entry for it. We add a call to
    firmware_map_add_hotplug in the function add_memory.

    Add a new function add_sysfs_fw_map_entry() to create a memmap entry; it
    is called when initializing memmap and when hot-adding memory.

    [akpm@linux-foundation.org: un-kerneldoc a no longer kerneldoc comment]
    Signed-off-by: Shaohui Zheng
    Acked-by: Andi Kleen
    Acked-by: Yasunori Goto
    Reviewed-by: Wu Fengguang
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     

22 Sep, 2009

1 commit

  • Since alloc_bootmem() will never return inaccessible (via virtual
    addressing) memory anyway, using the ..._low() variant only makes sense
    when the physical address range of the allocated memory must fulfill
    further constraints, especially since on 64-bit (or more generally in all
    cases where the pools the two variants allocate from are than the full
    available range.

    Probably the use in alloc_tce_table() could also be eliminated (based on
    code inspection of pci-calgary_64.c), but that seems too risky given I
    know nothing about that hardware and have no way to test it.

    Signed-off-by: Jan Beulich
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

17 Jun, 2009

1 commit

  • Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13484

    Peer reported:
    | The bug is introduced from kernel 2.6.27, if E820 table reserve the memory
    | above 4G in 32bit OS(BIOS-e820: 00000000fff80000 - 0000000120000000
    | (reserved)), system will report Int 6 error and hang up. The bug is caused by
    | the following code in drivers/firmware/memmap.c, the resource_size_t is 32bit
    | variable in 32bit OS, the BUG_ON() will be invoked to result in the Int 6
    | error. I try the latest 32bit Ubuntu and Fedora distributions, all hit this
    | bug.
    |======
    |static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
    | const char *type,
    | struct firmware_map_entry *entry)

    This only happens when CONFIG_PHYS_ADDR_T_64BIT is not set.

    It turns out we need to pass u64 instead of resource_size_t for that.
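
    The truncation can be reproduced in userspace with the range from the
    report. The sketch assumes that the failing check compares start
    against end; narrow_addr_t is a hypothetical stand-in for a 32-bit
    resource_size_t.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t narrow_addr_t;      /* stand-in for a 32-bit resource_size_t */

/* Range check after both bounds have been squeezed into 32 bits. */
bool sane_after_truncation(uint64_t start, uint64_t end)
{
    narrow_addr_t s = (narrow_addr_t)start;
    narrow_addr_t e = (narrow_addr_t)end;   /* bits above 32 are discarded */

    return s <= e;
}

/* The same check with the bounds kept at full 64-bit width. */
bool sane_as_u64(uint64_t start, uint64_t end)
{
    return start <= end;
}
```

    For the reported range, 0x120000000 truncates to 0x20000000, which is
    below the start address 0xfff80000, so the 32-bit check fails while the
    64-bit check passes.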

    [akpm@linux-foundation.org: add comment]
    Reported-and-tested-by: Peer Chen
    Signed-off-by: Yinghai Lu
    Cc: Ingo Molnar
    Acked-by: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

19 Feb, 2009

1 commit

  • Since I don't work for SUSE any more and the bwalle@suse.de address is
    invalid, correct it in the copyright headers and documentation.

    Signed-off-by: Bernhard Walle
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     

09 Jan, 2009

1 commit

  • Building an allnoconfig kernel, sparse asked whether these could be
    static, so I checked, and they are only used in the file where they are
    declared.

    Signed-off-by: Roel Kluin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roel Kluin
     

13 Aug, 2008

1 commit

  • Various cleanups in drivers/firmware/memmap (after review by AKPM):

    - fix kdoc to conform to the standard
    - move kdoc from header to implementation files
    - remove superfluous WARN_ON() after kmalloc()
    - WARN_ON(x); if (!x) -> if(!WARN_ON(x))
    - improve some comments

    Signed-off-by: Bernhard Walle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     

27 Jul, 2008

1 commit

  • Fix firmware/memmap printk format warnings:

    drivers/firmware/memmap.c:156: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
    drivers/firmware/memmap.c:161: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
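
    A common way to avoid this kind of warning is to cast the argument to
    the type the format expects. The sketch below uses snprintf() in
    userspace rather than printk, and demo_resource_size_t is a
    hypothetical typedef whose width is assumed to vary by configuration.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t demo_resource_size_t;   /* may be 32 or 64 bits wide */

/* Format an address as hex; the cast makes the argument match %llx
 * regardless of how wide the typedef happens to be. */
int format_addr(char *buf, size_t len, demo_resource_size_t addr)
{
    return snprintf(buf, len, "0x%llx\n", (unsigned long long)addr);
}
```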

    Signed-off-by: Randy Dunlap
    Cc: Bernhard Walle
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

08 Jul, 2008

1 commit

  • This patch adds the /sys/firmware/memmap interface that represents the
    BIOS (or Firmware) provided memory map. The tree looks like:

    /sys/firmware/memmap/0/start (hex number)
                           end   (hex number)
                           type  (string)
    ...                 /1/start
                           end
                           type

    With the following shell snippet one can print the memory map in the same form
    the kernel prints itself when booting on x86 (the E820 map).

    --------- 8< --------------------------
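    #!/bin/sh
    # Print the firmware memory map in the format the kernel uses at boot.
    cd /sys/firmware/memmap || exit 1
    for dir in * ; do
        start=$(cat "$dir/start")
        end=$(cat "$dir/end")
        type=$(cat "$dir/type")
        # "end" is inclusive, so add 1 to print the exclusive upper bound
        printf "%016x-%016x (%s)\n" "$start" "$((end + 1))" "$type"
    done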
    --------- >8 --------------------------

    This patch only provides the needed interface:

    1. The sysfs interface.
    2. The structure and enumeration definition.
    3. The functions firmware_map_add() and firmware_map_add_early()
    that should be called from architecture code (E820/EFI, for
    example) to add the contents to the interface.

    If the kernel is compiled without CONFIG_FIRMWARE_MEMMAP, the interface does
    nothing without cluttering the architecture-specific code with #ifdef's.
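
    The usual way to get this behaviour is to provide empty inline stubs
    when the option is disabled. The sketch below shows the pattern in
    plain C; CONFIG_FIRMWARE_MEMMAP_DEMO and the demo_* functions are
    hypothetical names chosen for this illustration.

```c
#include <stdint.h>

#ifdef CONFIG_FIRMWARE_MEMMAP_DEMO
/* Real implementation lives elsewhere when the option is enabled. */
int demo_firmware_map_add(uint64_t start, uint64_t end, const char *type);
#else
/* Option disabled: the call compiles to a no-op that reports success. */
static inline int demo_firmware_map_add(uint64_t start, uint64_t end,
                                        const char *type)
{
    (void)start;
    (void)end;
    (void)type;
    return 0;
}
#endif

/* Architecture code can call the interface unconditionally, without #ifdef. */
int demo_arch_register_map(void)
{
    return demo_firmware_map_add(0x100000ULL, 0x7e7b1d00ULL, "System RAM");
}
```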

    The purpose of the new interface is kexec: While /proc/iomem represents
    the *used* memory map (e.g. modified via kernel parameters like 'memmap'
    and 'mem'), the /sys/firmware/memmap tree represents the unmodified memory
    map provided via the firmware. So kexec can:

    - use the original memory map for rebooting,
    - use /proc/iomem for setting up the ELF core headers in the kdump
    case, which should only represent the memory of the system.

    The patch has been tested on i386 and x86_64.

    Signed-off-by: Bernhard Walle
    Acked-by: Greg KH
    Acked-by: Vivek Goyal
    Cc: kexec@lists.infradead.org
    Cc: yhlu.kernel@gmail.com
    Signed-off-by: Ingo Molnar

    Bernhard Walle