06 Oct, 2012

1 commit

  • Using recursive calls to add a non-conflicting region in
    __reserve_region_with_split() could result in a stack overflow when
    the recursion gets too deep. Convert the recursive calls to an
    iterative loop to avoid the problem.

    Tested on a machine containing 135 regions. The kernel no longer panicked
    with stack overflow.

    Also tested with code arbitrarily adding regions with no conflict,
    embedding two consecutive conflicts and embedding two non-consecutive
    conflicts.

    Signed-off-by: T Makphaibulchoke
    Reviewed-by: Ram Pai
    Cc: Paul Gortmaker
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    T Makphaibulchoke
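
The recursion-to-loop conversion described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `struct range`, `first_conflict()` and `reserve_with_split()` are hypothetical names, and the busy regions are assumed to live in a sorted, non-overlapping array rather than in the kernel's resource tree.

```c
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* inclusive bounds */

/* Return the lowest busy range overlapping [start, end], or NULL. */
static const struct range *first_conflict(const struct range *busy, size_t n,
                                          uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < n; i++)
        if (busy[i].start <= end && busy[i].end >= start)
            return &busy[i];
    return NULL;
}

/*
 * Collect the free pieces of [start, end] into out[] and return how many
 * were written. busy[] must be sorted by start and non-overlapping.
 * Each conflict advances 'start' inside one loop, so stack usage stays
 * constant no matter how many conflicts there are.
 */
static size_t reserve_with_split(const struct range *busy, size_t n,
                                 uint64_t start, uint64_t end,
                                 struct range *out, size_t max_out)
{
    size_t count = 0;

    while (start <= end && count < max_out) {
        const struct range *c = first_conflict(busy, n, start, end);

        if (!c) {                        /* nothing in the way: take it all */
            out[count++] = (struct range){ start, end };
            break;
        }
        if (c->start > start)            /* free piece below the conflict */
            out[count++] = (struct range){ start, c->start - 1 };
        if (c->end >= end)               /* conflict covers the rest */
            break;
        start = c->end + 1;              /* continue above the conflict */
    }
    return count;
}
```

With busy ranges 10-19 and 30-39 and a request for 0-49, the loop yields the three pieces 0-9, 20-29 and 40-49.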
     

31 Jul, 2012

1 commit

  • When the requested range is outside of the root range, the logic in
    __reserve_region_with_split will cause an infinite recursion which will
    overflow the stack, as seen in the warning below.

    This particular stack overflow was caused by requesting the
    (100000000-107ffffff) range while the root range was (0-ffffffff). In
    this case __request_resource would return the whole root range as
    conflict range (i.e. 0-ffffffff). Then, the logic in
    __reserve_region_with_split would continue the recursion requesting the
    new range as (conflict->end+1, end) which incidentally in this case
    equals the originally requested range.

    This patch aborts looking for a usable range when the request does not
    intersect with the root range. When the request partially overlaps with
    the root range, it adjusts the request to fall in the root range and then
    continues with the new request.

    When the request is modified or aborted, errors and a stack trace are
    logged to allow catching the errors in the upper layers.

    [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
    [ 5.975150] Modules linked in:
    [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
    [ 5.985324] Call Trace:
    [ 5.987759] [] ? console_unlock+0x17b/0x18d
    [ 5.992891] [] warn_slowpath_common+0x48/0x5d
    [ 5.998194] [] ? sub_preempt_count+0x63/0x89
    [ 6.003412] [] warn_slowpath_null+0xf/0x13
    [ 6.008453] [] sub_preempt_count+0x63/0x89
    [ 6.013499] [] _raw_spin_unlock+0x27/0x3f
    [ 6.018453] [] add_partial+0x36/0x3b
    [ 6.022973] [] deactivate_slab+0x96/0xb4
    [ 6.027842] [] __slab_alloc.isra.54.constprop.63+0x204/0x241
    [ 6.034456] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.039842] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.045232] [] kmem_cache_alloc_trace+0x51/0xb0
    [ 6.050710] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.056100] [] kzalloc.constprop.5+0x29/0x38
    [ 6.061320] [] __reserve_region_with_split+0x1c/0xd1
    [ 6.067230] [] __reserve_region_with_split+0xc6/0xd1
    ...
    [ 7.179057] [] __reserve_region_with_split+0xc6/0xd1
    [ 7.184970] [] reserve_region_with_split+0x30/0x42
    [ 7.190709] [] e820_reserve_resources_late+0xd1/0xe9
    [ 7.196623] [] pcibios_resource_survey+0x23/0x2a
    [ 7.202184] [] pcibios_init+0x23/0x35
    [ 7.206789] [] pci_subsys_init+0x3f/0x44
    [ 7.211659] [] do_one_initcall+0x72/0x122
    [ 7.216615] [] ? pci_legacy_init+0x3d/0x3d
    [ 7.221659] [] kernel_init+0xa6/0x118
    [ 7.226265] [] ? start_kernel+0x334/0x334
    [ 7.231223] [] kernel_thread_helper+0x6/0x10

    Signed-off-by: Octavian Purdila
    Signed-off-by: Ram Pai
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Octavian Purdila
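
The abort-or-adjust behaviour described in this commit can be sketched as a small clamping helper. This is a userspace illustration under assumed names (`struct range`, `clamp_to_root()`), not the kernel's code, and it leaves out the logging the patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* inclusive bounds */

/*
 * Returns false if [*start, *end] does not intersect root at all, leaving
 * the request untouched. Otherwise trims the request so that it falls
 * inside root and returns true.
 */
static bool clamp_to_root(const struct range *root,
                          uint64_t *start, uint64_t *end)
{
    if (*start > root->end || *end < root->start)
        return false;                /* no intersection: give up */
    if (*start < root->start)
        *start = root->start;        /* trim the low side */
    if (*end > root->end)
        *end = root->end;            /* trim the high side */
    return true;
}
```

With the values from the report, a root of 0-ffffffff and a request of 100000000-107ffffff, the helper returns false instead of retrying the same range forever.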
     

14 Jun, 2012

1 commit


01 Jun, 2012

1 commit

  • In the comment of allocate_resource(), the explanation of the
    parameters min and max is not correct.

    Actually, these two parameters are used to specify the range of the
    resource that will be allocated, not the min/max size that will be
    allocated.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     

04 Feb, 2012

1 commit


31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

30 Sep, 2011

1 commit

  • __find_resource() incorrectly returns a resource window which overlaps
    an existing allocated window. This happens when the parent's
    resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
    to all its children resource-windows.

    __find_resource() looks for gaps in resource allocation among the
    children resource windows. When it encounters the last child window it
    blindly tries the range next to the one allocated to the last child. Since
    the last child's window ends at 0xffffffff the calculation overflows,
    leading the algorithm to believe that any window in the range 0x00000000
    to 0xffffffff is available for allocation. This leads to a conflicting
    window allocation.

    Michal Ludvig reported this issue seen on his platform. The following
    patch fixes the problem and has been verified by Michal. I believe this
    bug has been there for ages. It got exposed by git commit 2bbc6942273b
    ("PCI : ability to relocate assigned pci-resources")

    Signed-off-by: Ram Pai
    Tested-by: Michal Ludvig
    Signed-off-by: Linus Torvalds

    Ram Pai
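
The wrap-around described above can be modelled in userspace with 32-bit arithmetic. The sketch below is an assumption-laden illustration, not the actual fix: `struct window`, `fits()` and `find_gap()` are hypothetical names, children are assumed to be sorted and non-overlapping, and alignment is ignored. The key point is that `kids[i].end + 1` is never computed when the child already reaches the end of the parent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct window { uint32_t start, end; };   /* inclusive bounds */

/* True if 'size' bytes fit in [lo, hi]; written to avoid computing hi + 1. */
static bool fits(uint32_t lo, uint32_t hi, uint32_t size)
{
    return size != 0 && lo <= hi && hi - lo >= size - 1;
}

/* Find the lowest gap between sorted children that can hold 'size' bytes. */
static bool find_gap(const struct window *parent,
                     const struct window *kids, size_t n,
                     uint32_t size, struct window *out)
{
    uint32_t lo = parent->start;

    for (size_t i = 0; i < n; i++) {
        if (kids[i].start > lo && fits(lo, kids[i].start - 1, size)) {
            *out = (struct window){ lo, lo + (size - 1) };
            return true;
        }
        if (kids[i].end >= parent->end)
            return false;            /* nothing above; end + 1 could wrap */
        lo = kids[i].end + 1;
    }
    if (fits(lo, parent->end, size)) {
        *out = (struct window){ lo, lo + (size - 1) };
        return true;
    }
    return false;
}
```

When the children cover 0x00000000 to 0xffffffff completely, the search reports failure rather than wrapping to 0 and handing out an already allocated window.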
     

31 Jul, 2011

1 commit


07 Jul, 2011

1 commit


18 Dec, 2010

2 commits


29 Oct, 2010

1 commit

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits)
    x86: allocate space within a region top-down
    x86: update iomem_resource end based on CPU physical address capabilities
    x86/PCI: allocate space from the end of a region, not the beginning
    PCI: allocate bus resources from the top down
    resources: support allocating space within a region from the top down
    resources: handle overflow when aligning start of available area
    resources: ensure callback doesn't allocate outside available space
    resources: factor out resource_clip() to simplify find_resource()
    resources: add a default alignf to simplify find_resource()
    x86/PCI: MMCONFIG: fix region end calculation
    PCI: Add support for polling PME state on suspended legacy PCI devices
    PCI: Export some PCI PM functionality
    PCI: fix message typo
    PCI: log vendor/device ID always
    PCI: update Intel chipset names and defines
    PCI: use new ccflags variable in Makefile
    PCI: add PCI_MSIX_TABLE/PBA defines
    PCI: add PCI vendor id for STmicroelectronics
    x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs
    PCI: OLPC: Only enable PCI configuration type override on XO-1
    ...

    Linus Torvalds
     

28 Oct, 2010

1 commit

  • If the same resource is inserted into the resource tree (maybe not on
    purpose), a dead loop will be created. In this situation, the kernel does
    not report any warning or error :(

    The command below will print endlessly:
    #cat /proc/iomem

    [akpm@linux-foundation.org: add WARN_ON()]
    Signed-off-by: Huang Shijie
    Cc: Jesse Barnes
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     

27 Oct, 2010

5 commits

  • Allocate space from the top of a region first, then work downward,
    if an architecture desires this.

    When we allocate space from a resource, we look for gaps between children
    of the resource. Previously, we always looked at gaps from the bottom up.
    For example, given this:

    [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00
    [mem 0xbff00000-0xbfffffff] gap -- available
    [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02
    [mem 0xe0000000-0xf7ffffff] gap -- available

    we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first,
    then the [mem 0xe0000000-0xf7ffffff] gap.

    With this patch an architecture can choose to allocate from the top gap
    [mem 0xe0000000-0xf7ffffff] first.

    We can't do this across the board because iomem_resource.end is initialized
    to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't
    address the entire 64-bit physical address space. Therefore, we only
    allocate top-down if the arch requests it by clearing
    "resource_alloc_from_bottom".

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
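
The bottom-up versus top-down choice can be sketched with the two gaps from the example above. This is a simplified userspace model under assumed names (`struct range`, `alloc_from_gaps()`), with the gaps precomputed and alignment ignored. The `from_bottom` flag stands in for the "resource_alloc_from_bottom" switch mentioned in the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* inclusive bounds */

/*
 * Pick a start address for 'size' bytes from gaps[] (sorted by address).
 * Bottom-up takes the lowest fitting gap and its lowest addresses;
 * top-down takes the highest fitting gap and its highest addresses.
 */
static bool alloc_from_gaps(const struct range *gaps, size_t n,
                            uint64_t size, bool from_bottom,
                            uint64_t *start_out)
{
    if (size == 0)
        return false;

    for (size_t k = 0; k < n; k++) {
        size_t i = from_bottom ? k : n - 1 - k;

        if (gaps[i].end - gaps[i].start < size - 1)
            continue;                /* this gap is too small */
        *start_out = from_bottom ? gaps[i].start
                                 : gaps[i].end - (size - 1);
        return true;
    }
    return false;
}
```

For a 1 MiB request, bottom-up returns 0xbff00000 from the first gap, while top-down returns 0xf7f00000 from the end of the second gap.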
     
  • If tmp.start is near ~0, ALIGN(tmp.start) may overflow, which would
    make us think there's more available space than there really is. We
    would likely return something that conflicts with a previous resource,
    which would cause a failure when allocate_resource() requests the newly-
    allocated region.

    Reference: https://bugzilla.redhat.com/show_bug.cgi?id=646027
    Reported-by: Fabrice Bellet
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
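
The overflow described above is easy to reproduce in isolation. The helper below is a hypothetical `align_up()` with an explicit wrap check, assuming a power-of-two alignment. It is meant as an illustration of the failure mode, not as the check the patch actually adds.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Round x up to a multiple of 'align' (which must be a power of two).
 * Returns false if rounding up would wrap past the top of the address
 * space, in which case *out is left untouched.
 */
static bool align_up(uint64_t x, uint64_t align, uint64_t *out)
{
    uint64_t aligned = (x + align - 1) & ~(align - 1);

    if (aligned < x)
        return false;                /* wrapped around to a low address */
    *out = aligned;
    return true;
}
```

Without the `aligned < x` test, a start address near ~0 would align to a small value and make the available area look far larger than it is.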
     
  • The alignment callback returns a proposed location, which may have been
    adjusted to avoid ISA aliases or for other architecture-specific reasons.

    We already had a check ("tmp.start < tmp.end") to make sure the callback
    doesn't return an area that extends past the available area. This patch
    reworks the check to make sure it doesn't return an area that extends
    either below or above the available area.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
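
A two-sided check of this kind can be sketched as a containment test. The names `struct range` and `range_contains()` are assumptions made for this illustration; the point is only that the proposal is compared against both ends of the available area.

```c
#include <stdbool.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* inclusive bounds */

/* True if 'inner' is a valid range lying entirely inside 'outer'. */
static bool range_contains(const struct range *outer,
                           const struct range *inner)
{
    return inner->start <= inner->end &&
           inner->start >= outer->start &&
           inner->end <= outer->end;
}
```

A callback result that starts below the available area is rejected here, even though its start is still less than its end.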
     
  • This factors out the min/max clipping to simplify find_resource().
    No functional change.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • This removes a test from find_resource(), which is getting cluttered.
    No functional change.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     

12 May, 2010

1 commit

  • SuperIO devices share regions and use lock/unlock operations to chip
    select. We therefore need to be able to request a resource and wait for
    it to be freed by whichever other SuperIO device currently hogs it.
    Right now you have to poll which is horrible.

    Add a MUXED field to IO port resources. If the MUXED field is set on the
    resource and on the request (via request_muxed_region) then we block
    until the previous owner of the muxed resource releases their region.

    This allows us to implement proper resource sharing and locking for
    superio chips using code of the form

    enable_my_superio_dev() {
            request_muxed_region(0x44, 0x02, "superio:watchdog");
            outb() .. sequence to enable chip
    }

    disable_my_superio_dev() {
            outb() .. sequence to disable chip
            release_region(0x44, 0x02);
    }

    Signed-off-by: Giel van Schijndel
    Signed-off-by: Alan Cox
    Signed-off-by: Jesse Barnes

    Alan Cox
     

24 Mar, 2010

1 commit

  • request_resource() and insert_resource() only return success or failure,
    with no information about what existing resource conflicted with the
    proposed new reservation. This patch adds request_resource_conflict()
    proposed new reservation. This patch adds request_resource_conflict()
    and insert_resource_conflict(), which return the conflicting resource.

    Callers may use this for better error messages or to adjust the new
    resource and retry the request.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
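
The idea of returning the conflicting entry instead of a bare error code can be shown with a sorted sibling list. This is a userspace sketch with assumed names (`struct res`, `request_conflict()`); it models one level of siblings only and does not check the request against a parent range.

```c
#include <stddef.h>
#include <stdint.h>

struct res {
    uint64_t start, end;             /* inclusive bounds */
    const char *name;
    struct res *sibling;             /* next entry, sorted by address */
};

/*
 * Insert 'new_res' into the sorted list at *head.
 * Returns NULL on success, or the existing entry that overlaps it.
 */
static struct res *request_conflict(struct res **head, struct res *new_res)
{
    struct res **p = head;

    for (;;) {
        struct res *tmp = *p;

        if (!tmp || tmp->start > new_res->end) {
            new_res->sibling = tmp;  /* fits before tmp (or at the tail) */
            *p = new_res;
            return NULL;
        }
        p = &tmp->sibling;
        if (tmp->end < new_res->start)
            continue;                /* tmp is entirely below: keep going */
        return tmp;                  /* overlap: report who is in the way */
    }
}
```

A caller can then print the name and bounds of the returned entry, or shrink its request around it and try again.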
     

04 Mar, 2010

1 commit


03 Mar, 2010

1 commit


02 Mar, 2010

1 commit

  • The System RAM walk shall skip partial RAM pages and avoid calling
    func() on them, so that page_is_ram() returns 0 for a partial RAM page.

    In particular, it shall not call func() with len=0.
    This fixes a boot time bug reported by Sachin and root caused by Thomas:

    > >>> WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x169/0x2f1()
    > >>> Hardware name: BladeCenter LS21 -[79716AA]-
    > >>> Modules linked in:
    > >>> Pid: 0, comm: swapper Not tainted 2.6.33-git6-autotest #1
    > >>> Call Trace:
    > >>> [] ? __ioremap_caller+0x169/0x2f1
    > >>> [] warn_slowpath_common+0x77/0xa4
    > >>> [] warn_slowpath_null+0xf/0x11
    > >>> [] __ioremap_caller+0x169/0x2f1
    > >>> [] ? acpi_os_map_memory+0x12/0x1b
    > >>> [] ioremap_nocache+0x12/0x14
    > >>> [] acpi_os_map_memory+0x12/0x1b
    > >>> [] acpi_tb_verify_table+0x29/0x5b
    > >>> [] acpi_load_tables+0x39/0x15a
    > >>> [] acpi_early_init+0x60/0xf5
    > >>> [] start_kernel+0x397/0x3a7
    > >>> [] x86_64_start_reservations+0xa5/0xa9
    > >>> [] x86_64_start_kernel+0xe1/0xe8
    > >>> ---[ end trace 4eaa2a86a8e2da22 ]---
    > >>> ioremap reserve_memtype failed -22

    The return code is -EINVAL, so it failed in the is_ram check, which is
    not too surprising

    > BIOS-provided physical RAM map:
    > BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
    > BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
    > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
    > BIOS-e820: 0000000000100000 - 00000000cffa3900 (usable)
    > BIOS-e820: 00000000cffa3900 - 00000000cffa7400 (ACPI data)

    The ACPI data is not starting on a page boundary and neither does the
    usable RAM area end on a page boundary. Very useful !

    > ACPI: DSDT 00000000cffa3900 036CE (v01 IBM SERLEWIS 00001000 INTL 20060912)

    ACPI is trying to map DSDT at cffa3900, which results in a check
    vs. cffa3000 which is the relevant page boundary. The generic is_ram
    check correctly identifies that as RAM because it's in the usable
    resource area. The old e820 based is_ram check does not take
    overlapping resource areas into account. That's why it works.

    CC: Sachin Sant
    CC: Thomas Gleixner
    CC: KAMEZAWA Hiroyuki
    Signed-off-by: Wu Fengguang
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Wu Fengguang
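
The rounding that skips partial pages can be worked through with the addresses from the report. The sketch below assumes 4 KiB pages and uses a hypothetical helper, `full_pages()`, which rounds the start up and the end down to page boundaries and reports nothing when no whole page remains. It ignores the corner case of a range ending at the very top of the 64-bit address space.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/*
 * Given an inclusive byte range [start, end], compute the page frames
 * that lie completely inside it. Returns false if there are none, so a
 * caller never invokes its callback with a length of zero.
 */
static bool full_pages(uint64_t start, uint64_t end,
                       uint64_t *first_pfn, uint64_t *nr_pages)
{
    uint64_t pfn = (start + PAGE_SIZE - 1) >> PAGE_SHIFT;   /* round up */
    uint64_t end_pfn = (end + 1) >> PAGE_SHIFT;             /* round down */

    if (end_pfn <= pfn)
        return false;
    *first_pfn = pfn;
    *nr_pages = end_pfn - pfn;
    return true;
}
```

For the usable area 0x100000 to 0xcffa38ff, the last full page frame is 0xcffa2, so the page at cffa3000 that ACPI maps is no longer reported as RAM.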
     

01 Mar, 2010

1 commit

  • * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, mm: Unify kernel_physical_mapping_init() API
    x86, mm: Allow highmem user page tables to be disabled at boot time
    x86: Do not reserve brk for DMI if it's not going to be used
    x86: Convert tlbstate_lock to raw_spinlock
    x86: Use the generic page_is_ram()
    x86: Remove BIOS data range from e820
    Move page_is_ram() declaration to mm.h
    Generic page_is_ram: use __weak
    resources: introduce generic page_is_ram()

    Linus Torvalds
     

23 Feb, 2010

3 commits


18 Feb, 2010

1 commit


02 Feb, 2010

2 commits

  • Use __weak instead of __attribute__((weak)).

    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: H. Peter Anvin

    Andrew Morton
     
  • It's based on walk_system_ram_range(), for archs that don't have
    their own page_is_ram().

    The static versions in MIPS and SCORE are also made global.

    v4: prefer plain 1 instead of PAGE_IS_RAM (H. Peter Anvin)
    v3: add comment (KAMEZAWA Hiroyuki)
    "AFAIK, this "System RAM" information has been used for kdump to
    grab valid memory area and seems good for the kernel itself."
    v2: add PAGE_IS_RAM macro (Américo Wang)

    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Américo Wang
    Cc: linux-mips@linux-mips.org
    Cc: Yinghai Lu
    Acked-by: Ralf Baechle
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Wu Fengguang
    LKML-Reference:
    Cc: Andrew Morton
    Signed-off-by: H. Peter Anvin

    Wu Fengguang
     

22 Dec, 2009

1 commit

  • The second parameter to alignf() in allocate_resource() must
    reflect what new resource is attempted to be allocated, else
    functions like pcibios_align_resource() (at least on x86) or
    pcmcia_align() can't work correctly.

    Commit 1e5ad9679016275d422e36b12a98b0927d76f556 broke this by not
    setting the "new" resource until we're about to return success.
    To keep the resource untouched when allocate_resource() fails,
    a "tmp" resource is introduced.

    Signed-off-by: Dominik Brodowski
    Acked-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Dominik Brodowski
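
The "tmp" approach can be sketched in userspace as follows. All names here (`struct range`, `alignf_t`, `align_to_4k()`, `allocate_in()`) are hypothetical; the sketch only shows the pattern of handing the candidate to the callback through a scratch copy and committing it to the caller's resource on success.

```c
#include <stdint.h>

struct range { uint64_t start, end; };   /* inclusive bounds */

/* Hypothetical alignment callback: inspects the candidate, returns a start. */
typedef uint64_t (*alignf_t)(const struct range *candidate, uint64_t size);

/* Sample callback: round the candidate start up to a 4 KiB boundary. */
static uint64_t align_to_4k(const struct range *candidate, uint64_t size)
{
    (void)size;
    return (candidate->start + 0xfffULL) & ~0xfffULL;
}

/* Returns 0 and fills *new_res on success; -1 leaves *new_res untouched. */
static int allocate_in(const struct range *avail, struct range *new_res,
                       uint64_t size, alignf_t alignf)
{
    struct range tmp = *new_res;     /* scratch copy of the caller's data */

    if (size == 0)
        return -1;
    tmp.start = avail->start;
    tmp.end = avail->end;
    tmp.start = alignf(&tmp, size);  /* callback sees the real candidate */
    if (tmp.start < avail->start || tmp.start > avail->end ||
        avail->end - tmp.start < size - 1)
        return -1;                   /* failure: caller's resource intact */
    tmp.end = tmp.start + size - 1;
    *new_res = tmp;                  /* commit only on success */
    return 0;
}
```

This keeps both properties the two commits care about: the callback is given the range actually being attempted, and a failed call leaves the caller's original start and end available for error messages.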
     

05 Nov, 2009

1 commit

  • When "allocate_resource(root, new, size, ...)" fails, we currently
    clobber "new". This is inconvenient for the caller, who might care
    about the original contents of the resource.

    For example, when pci_bus_alloc_resource() fails, the "can't allocate
    mem resource %pR" message from pci_assign_resources() currently contains
    junk for the resource start/end.

    This patch delays the "new" update until we're about to return success.

    Acked-by: Linus Torvalds
    Acked-by: Yinghai Lu
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     

23 Sep, 2009

1 commit

  • Originally, walk_memory_resource() was introduced to traverse all memory
    of "System RAM" for detecting memory hotplug/unplug ranges. For doing so,
    the flags IORESOURCE_MEM | IORESOURCE_BUSY were used, and this was enough
    for memory hotplug.

    But when used for another purpose, /proc/kcore, this may include some
    firmware areas marked as IORESOURCE_BUSY | IORESOURCE_MEM. This patch makes
    the check strict so that it finds only busy "System RAM".

    Note: PPC64 keeps its own walk_memory_resource(), which walks through
    ppc64's lmb information. Because the old kclist_add() is called per lmb,
    this patch makes no difference in behavior there.

    And this patch removes the CONFIG_MEMORY_HOTPLUG check from this function.
    Because pfn_valid() just shows "there is memmap or not" and cannot be used
    for "there is physical memory or not", this function is generally useful
    for scanning physical memory ranges.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: Américo Wang
    Cc: David Rientjes
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

01 Jul, 2009

1 commit

  • When the 32-bit signed quantities get assigned to the u64 resource_size_t,
    they are incorrectly sign-extended.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13253
    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9905

    Signed-off-by: Zhang Rui
    Reported-by: Leann Ogasawara
    Cc: Pierre Ossman
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Rui
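
The sign-extension problem is a plain C conversion rule and can be shown without any kernel code. The two helpers below are hypothetical names for this illustration; `uint64_t` stands in for a 64-bit resource_size_t.

```c
#include <stdint.h>

/* Direct conversion: a negative 32-bit value is sign-extended to 64 bits. */
static uint64_t widen_sign_extended(int32_t v)
{
    return (uint64_t)v;
}

/* Going through uint32_t first keeps the upper 32 bits clear. */
static uint64_t widen_zero_extended(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}
```

A 32-bit value with the bit pattern 0x80000000 becomes 0xffffffff80000000 with the first helper and 0x0000000080000000 with the second, which is the address the caller meant.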
     

19 Apr, 2009

1 commit

  • This function is not actually used right now, since the original use
    case for it was done with insert_resource_expand_to_fit() instead.

    However, we now have another usage case that wants to basically do a
    "reserve IO resource, splitting around existing resources", however that
    one doesn't actually want the "recurse into the conflicting resource"
    logic at all.

    And since recursing into the conflicting resource was the most complex
    part, and isn't wanted, just remove it. Maybe we'll some day want both
    versions, but we can just resurrect the logic then.

    Tested-by: Yinghai Lu
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

16 Jan, 2009

1 commit


08 Jan, 2009

1 commit

  • Device drivers that use pci_request_regions() (and similar APIs) have a
    reasonable expectation that they are the only ones accessing their device.
    As part of the e1000e hunt, we were afraid that some userland (X or some
    bootsplash stuff) was mapping the MMIO region that the driver thought it
    had exclusively via /dev/mem or via various sysfs resource mappings.

    This patch adds the option for device drivers to add their reserved
    regions to the "banned from /dev/mem use" list, so now both kernel memory
    and device-exclusive MMIO regions are banned.
    NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.

    In addition to the config option, a kernel parameter iomem=relaxed is
    provided for the cases where developers want to diagnose, in the field,
    driver issues from userspace.

    Reviewed-by: Matthew Wilcox
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Jesse Barnes

    Arjan van de Ven
     

17 Dec, 2008

1 commit

  • Impact: reduce false positives in iomem_map_sanity_check()

    Some drivers (vesafb) only map/reserve a portion of a resource.
    If then some other driver comes in and maps the whole resource,
    the current code WARN_ON's. This is not the intent of the checks
    in iomem_map_sanity_check(); rather these checks want to
    warn when crossing *hardware* resources only.

    This patch skips BUSY resources as suggested by Linus.

    Note: having two drivers talk to the same hardware at the same
    time is obviously not optimal behavior, but that's a separate story.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

02 Nov, 2008

1 commit


29 Oct, 2008

1 commit

  • Impact: avoid false-positive WARN_ON()

    Andi Kleen reported:
    > When running x86info on a 2.6.27-git8 system I get
    >
    > resource map sanity check conflict: 0x9e000 0x9efff 0x10000 0x9e7ff System RAM
    > ------------[ cut here ]------------
    > WARNING: at /home/lsrc/linux/arch/x86/mm/ioremap.c:226 __ioremap_caller+0xf2/0x2d6()
    > ...

    Some of the pages below the 1MB ISA addresses will be shared typically by both
    BIOS and system usable RAM. For example:
    BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
    BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)

    x86info reads the low physical address using /dev/mem, which internally
    uses ioremap() for accessing non RAM pages. ioremap() of such low
    pages conflicts with multiple resource entities leading to the
    above warning.

    Change the iomem_map_sanity_check() to allow mapping a page spanning multiple
    resource entities (minimum granularity that one can map is a page anyhow).

    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar

    Suresh Siddha
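
The effect of checking at page granularity can be worked through with the numbers from Andi Kleen's report. The sketch assumes 4 KiB pages and uses hypothetical helpers (`within_bytes()`, `within_pages()`); it compares a byte-exact containment test with one that first reduces every address to its page frame number.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                /* assumption: 4 KiB pages */
#define PFN_DOWN(x) ((uint64_t)(x) >> PAGE_SHIFT)

struct range { uint64_t start, end; };   /* inclusive bounds */

/* Byte-exact test: is [addr, addr + size - 1] inside res? */
static bool within_bytes(const struct range *res, uint64_t addr, uint64_t size)
{
    return res->start <= addr && res->end >= addr + size - 1;
}

/* Same test after reducing every address to its page frame number. */
static bool within_pages(const struct range *res, uint64_t addr, uint64_t size)
{
    return PFN_DOWN(res->start) <= PFN_DOWN(addr) &&
           PFN_DOWN(res->end) >= PFN_DOWN(addr + size - 1);
}
```

Mapping the page 0x9e000-0x9efff against System RAM at 0x10000-0x9e7ff fails the byte-exact test, because the resource ends mid-page, but passes the page-granular one, so no warning would be raised for it.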