16 Apr, 2015

1 commit

  • All users of __check_region(), check_region(), and check_mem_region() are
    gone. We got rid of the last user in v4.0-rc1. Remove them.

    bloat-o-meter on x86_64 shows:

    add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102)
    function old new delta
    __kstrtab___check_region 15 - -15
    __ksymtab___check_region 16 - -16
    __check_region 71 - -71

    Signed-off-by: Jakub Sitnicki
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jakub Sitnicki
     

05 Feb, 2015

1 commit

  • Currently ACPI, PCI and pnp all implement the same resource list
    management with different data structures, so resources must be
    converted from one data structure to another whenever they are passed
    between subsystems. Move struct resource_list_entry from ACPI into
    the resource core and rename it to resource_entry, so that different
    subsystems can reuse it and avoid the data structure conversion.

    Introduce a dedicated header file, resource_ext.h, instead of embedding
    it into ioport.h, to avoid header file inclusion order issues.

    Signed-off-by: Jiang Liu
    Acked-by: Vinod Koul
    Signed-off-by: Rafael J. Wysocki

    Jiang Liu
     

14 Oct, 2014

1 commit

  • We have a large university system in the UK that is experiencing very long
    delays modprobing the driver for a specific I/O device. The delay is from
    8-10 minutes per device and there are 31 devices in the system. This 4 to
    5 hour delay in starting up those I/O devices is very much a burden on the
    customer.

    There are two causes for requiring a restart/reload of the drivers. First
    is periodic preventive maintenance (PM) and the second is if any of the
    devices experience a fatal error. Both of these trigger this excessively
    long delay in bringing the system back up to full capability.

    The problem was tracked down to a very slow ioremap operation and the
    excessively long ioresource lookup to ensure that the user is not
    attempting to ioremap RAM. These patches provide a speed up to that
    function.

    The modprobe time appears to be affected quite a bit by previous activity
    on the ioresource list, which I suspect is due to cache preloading. While
    the overall improvement is impacted by other overhead of starting the
    devices, this drastically improves the modprobe time.

    Also our system is considerably smaller so the percentages gained will not
    be the same. Best case improvement with the modprobe on our 20 device
    smallish system was from 'real 5m51.913s' to 'real 0m18.275s'.

    This patch (of 2):

    Since the ioremap operation is verifying that the specified address range
    is NOT RAM, it will search the entire ioresource list if the condition is
    true. To make matters worse, it does this one 4k page at a time. For a
    128M BAR region this is 32768 passes to determine the entire region does
    not contain any RAM addresses.

    This patch provides another resource lookup function, region_is_ram, that
    searches for the entire region specified, verifying that it is completely
    contained within the resource region. If it is found, then it is checked
    to be RAM or not, within a single pass.

    The return value is -1 if the region was not found, 1 if it was found
    and is RAM, and 0 if it was found and is not RAM. This allows the caller
    to fall back to the previous page by page search if it was not found.
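
    The single-pass lookup and its three-way return value can be sketched in
    user space as follows. This is a minimal illustration, not the kernel
    code: struct res, the flat table and region_is_ram_sketch() are
    hypothetical stand-ins for the resource list and region_is_ram().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct res {
	uint64_t start, end;	/* inclusive bounds */
	int is_ram;		/* 1 if the entry describes system RAM */
};

/*
 * Single pass over the table: return 1 if [start, start + size - 1] lies
 * entirely inside a RAM entry, 0 if it lies entirely inside a non-RAM
 * entry, and -1 if no single entry contains the whole range.
 */
int region_is_ram_sketch(const struct res *tbl, size_t n,
			 uint64_t start, uint64_t size)
{
	uint64_t end;
	size_t i;

	if (size == 0)
		return -1;
	end = start + size - 1;

	for (i = 0; i < n; i++) {
		if (start >= tbl[i].start && end <= tbl[i].end)
			return tbl[i].is_ram ? 1 : 0;
	}
	return -1;	/* caller falls back to the page by page search */
}
```

    A caller would only run the per-page check when the sketch returns -1.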

    [akpm@linux-foundation.org: fix spellos and typos in comment]
    Signed-off-by: Mike Travis
    Acked-by: Alex Thorlton
    Reviewed-by: Cliff Wickman
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Mark Salter
    Cc: Dave Young
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Travis
     

10 Oct, 2014

1 commit

  • Pull PCI updates from Bjorn Helgaas:
    "The interesting things here are:

    - Turn on Config Request Retry Status Software Visibility. This
    caused hangs last time, but we included a fix this time.
    - Rework PCI device configuration to use _HPP/_HPX more aggressively
    - Allow PCI devices to be put into D3cold during system suspend
    - Add arm64 PCI support
    - Add APM X-Gene host bridge driver
    - Add TI Keystone host bridge driver
    - Add Xilinx AXI host bridge driver

    More detailed summary:

    Enumeration
    - Check Vendor ID only for Config Request Retry Status (Rajat Jain)
    - Enable Config Request Retry Status when supported (Rajat Jain)
    - Add generic domain handling (Catalin Marinas)
    - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado)

    Resource management
    - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu)
    - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr)

    PCI device hotplug
    - Prevent NULL dereference during pciehp probe (Andreas Noever)
    - Move _HPP & _HPX handling into core (Bjorn Helgaas)
    - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas)
    - Apply _HPP/_HPX to display devices (Bjorn Helgaas)
    - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas)
    - Fix wait time in pciehp timeout message (Yinghai Lu)
    - Add more pciehp Slot Control debug output (Yinghai Lu)
    - Stop disabling pciehp notifications during init (Yinghai Lu)

    MSI
    - Remove arch_msi_check_device() (Alexander Gordeev)
    - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev)
    - Move D0 check into pci_msi_check_device() (Alexander Gordeev)
    - Remove unused kobject from struct msi_desc (Yijing Wang)
    - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang)
    - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang)
    - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang)
    - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang)
    - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang)

    Power management
    - Drop unused runtime PM support code for PCIe ports (Rafael J. Wysocki)
    - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki)

    AER
    - Add additional AER error strings (Gong Chen)
    - Make standalone includable (Thierry Reding)

    Virtualization
    - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson)
    - Add ACS quirk for Intel 10G NICs (Alex Williamson)
    - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp)
    - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson)
    - Add device flag helpers (Ethan Zhao)
    - Assume all Mellanox devices have broken INTx masking (Gavin Shan)

    Generic host bridge driver
    - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau)
    - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau)
    - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau)
    - Fix the conversion of IO ranges into IO resources (Liviu Dudau)
    - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau)
    - Add support for parsing PCI host bridge resources from DT (Liviu Dudau)
    - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau)
    - Add arm64 architectural support for PCI (Liviu Dudau)

    APM X-Gene
    - Add APM X-Gene PCIe driver (Tanmay Inamdar)
    - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar)

    Freescale i.MX6
    - Probe in module_init(), not fs_initcall() (Lucas Stach)
    - Delay enabling reference clock for SS until it stabilizes (Tim Harvey)

    Marvell MVEBU
    - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni)

    NVIDIA Tegra
    - Make sure the PCIe PLL is really reset (Eric Yuen)
    - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang)
    - Fix extended configuration space mapping (Peter Daifuku)
    - Implement resource hierarchy (Thierry Reding)
    - Clear CLKREQ# enable on port disable (Thierry Reding)
    - Add Tegra124 support (Thierry Reding)

    ST Microelectronics SPEAr13xx
    - Pass config resource through reg property (Pratyush Anand)

    Synopsys DesignWare
    - Use NULL instead of false (Fabio Estevam)
    - Parse bus-range property from devicetree (Lucas Stach)
    - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach)
    - Remove pci_assign_unassigned_resources() (Lucas Stach)
    - Check private_data validity in single place (Lucas Stach)
    - Setup and clear exactly one MSI at a time (Lucas Stach)
    - Remove open-coded bitmap operations (Lucas Stach)
    - Fix configuration base address when using 'reg' (Minghuan Lian)
    - Fix IO resource end address calculation (Minghuan Lian)
    - Rename get_msi_data() to get_msi_addr() (Minghuan Lian)
    - Add get_msi_data() to pcie_host_ops (Minghuan Lian)
    - Add support for v3.65 hardware (Murali Karicheri)
    - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand)

    TI Keystone
    - Add TI Keystone PCIe driver (Murali Karicheri)
    - Limit MRSS for all downstream devices (Murali Karicheri)
    - Assume controller is already in RC mode (Murali Karicheri)
    - Set device ID based on SoC to support multiple ports (Murali Karicheri)

    Xilinx AXI
    - Add Xilinx AXI PCIe driver (Srikanth Thokala)
    - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter)

    Miscellaneous
    - Clean up whitespace (Quentin Lambert)
    - Remove assignments from "if" conditions (Quentin Lambert)
    - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri)
    - x86: Mark DMI tables as initialization data (Mathias Krause)
    - x86: Move __init annotation to the correct place (Mathias Krause)
    - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause)
    - x86: Constify pci_mmcfg_probes[] array (Mathias Krause)
    - x86: Mark PCI BIOS initialization code as such (Mathias Krause)
    - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya)
    - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser)"

    * tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (109 commits)
    arm64: dts: Add APM X-Gene PCIe device tree nodes
    PCI: Add ACS quirk for AMD A88X southbridge devices
    PCI: xgene: Add APM X-Gene PCIe driver
    PCI: designware: Remove open-coded bitmap operations
    PCI/MSI: Remove unnecessary temporary variable
    PCI/MSI: Use __write_msi_msg() instead of write_msi_msg()
    MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg()
    PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg()
    PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints
    PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib
    PCI/MSI: Remove unused kobject from struct msi_desc
    PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported()
    PCI/MSI: Move D0 check into pci_msi_check_device()
    PCI/MSI: Remove arch_msi_check_device()
    irqchip: armada-370-xp: Remove arch_msi_check_device()
    PCI/MSI/PPC: Remove arch_msi_check_device()
    arm64: Add architectural support for PCI
    PCI: Add pci_remap_iospace() to map bus I/O resources
    of/pci: Add support for parsing PCI host bridge resources from DT
    of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr()
    ...

    Conflicts:
    arch/arm64/boot/dts/apm-storm.dtsi

    Linus Torvalds
     

05 Sep, 2014

1 commit

  • Provide device-managed implementations of the request_resource() and
    release_resource() functions. Upon failure to request a resource, the new
    devm_request_resource() function will output an error message for
    consistent error reporting.

    Signed-off-by: Thierry Reding
    Signed-off-by: Bjorn Helgaas
    Acked-by: Tejun Heo

    Thierry Reding
     

30 Aug, 2014

1 commit

  • Richard and Daniel reported that UML is broken due to changes to the
    resource traversal functions. The problem is that iomem_resource.child
    can be null and the new code does not consider that possibility. The old
    code used a for loop, which does not execute at all if p is null.

    Revert to the for() loop logic and bail out if p is null.

    I also moved sibling_only check out of resource_lock. There is no
    reason to keep it inside the lock.
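
    Why the for() form is safe with an empty list can be seen in a small
    user-space sketch. struct node and first_at_or_after() are hypothetical
    names used only for illustration; this is not the patched kernel
    function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal node: one level of siblings only. */
struct node {
	uint64_t start, end;	/* inclusive bounds */
	struct node *sibling;
};

/*
 * Return the first top-level entry that ends at or after 'start', or NULL.
 * The loop condition tests 'p' before the body runs, so an empty list
 * (first_child == NULL) never dereferences a null pointer.
 */
const struct node *first_at_or_after(const struct node *first_child,
				     uint64_t start)
{
	const struct node *p;

	for (p = first_child; p; p = p->sibling) {
		if (p->end >= start)
			break;
	}
	return p;	/* NULL means: nothing found, bail out */
}
```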

    The backtrace of the UML crash follows.

    RIP: 0033:[]
    RSP: 0000000081459da0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: 00000000219b3fff RCX: 000000006010d1d9
    RDX: 0000000000000001 RSI: 00000000602dfb94 RDI: 0000000081459df8
    RBP: 0000000081459de0 R08: 00000000601b59f4 R09: ffffffff0000ff00
    R10: ffffffff0000ff00 R11: 0000000081459e88 R12: 0000000081459df8
    R13: 00000000219b3fff R14: 00000000602dfb94 R15: 0000000000000000
    Kernel panic - not syncing: Segfault with no mm
    CPU: 0 PID: 1 Comm: swapper Not tainted 3.16.0-10454-g58d08e3 #13
    Stack:
    00000000 000080d0 81459df0 219b3fff
    81459e70 6010d1d9 ffffffff 6033e010
    81459e50 6003a269 81459e30 00000000
    Call Trace:
    [] ? kclist_add_private+0x0/0xe7
    [] walk_system_ram_range+0x61/0xb7
    [] ? proc_kcore_init+0x0/0xf1
    [] kcore_update_ram+0x4c/0x168
    [] ? kclist_add+0x0/0x2e
    [] proc_kcore_init+0xea/0xf1
    [] ? proc_kcore_init+0x0/0xf1
    [] ? proc_kcore_init+0x0/0xf1
    [] do_one_initcall+0x13c/0x204
    [] ? parse_args+0x1df/0x2e0
    [] ? parameq+0x0/0x3a
    [] ? strcpy+0x0/0x18
    [] kernel_init_freeable+0x240/0x31e
    [] kernel_init+0x12/0x148
    [] new_thread_handler+0x81/0xa3

    Fixes 8c86e70acead629aacb4a ("resource: provide new functions to walk
    through resources").

    Reported-by: Daniel Walter
    Tested-by: Richard Weinberger
    Tested-by: Toralf Förster
    Tested-by: Daniel Walter
    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     

09 Aug, 2014

1 commit

  • I have added two more functions to walk through resources.

    Currently walk_system_ram_range() deals in pfns, while /proc/iomem can
    contain partial pages. By dealing in pfns, the callback function loses
    the information that the last page of a memory range is a partial page
    rather than a full page. So I implemented walk_system_ram_res(), which
    passes u64 values to callback functions and now properly returns the
    start and end addresses.

    walk_system_ram_range() uses find_next_system_ram() to find the next RAM
    resource. This in turn only travels through the siblings of the top-level
    child and does not traverse all the nodes of the resource tree. I also
    need another function that can walk through all the resources, for
    example to figure out where the "GART" aperture is or where ACPI memory
    is.

    So I wrote another function, walk_iomem_res(), which walks through all
    /proc/iomem resources and returns matches as requested by the caller. The
    caller can specify the "name" of the resource, start and end, and flags.

    Got rid of find_next_system_ram_res() and instead implemented the more
    generic find_next_iomem_res(), which can be restricted to top-level
    children only based on an argument.
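
    The difference between a siblings-only walk and a full tree walk can be
    sketched as below. This is an illustrative user-space model: struct node,
    next_node() and count_named() are hypothetical, and matching is reduced
    to a name comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node with the same three links as a resource tree. */
struct node {
	const char *name;
	struct node *parent, *sibling, *child;
};

/* Pre-order successor; with sibling_only set, stay on the current level. */
static struct node *next_node(struct node *p, bool sibling_only)
{
	if (sibling_only)
		return p->sibling;
	if (p->child)
		return p->child;
	while (!p->sibling && p->parent)
		p = p->parent;
	return p->sibling;
}

/* Count the entries below 'root' whose name matches 'name'. */
int count_named(struct node *root, const char *name, bool sibling_only)
{
	struct node *p;
	int hits = 0;

	for (p = root->child; p; p = next_node(p, sibling_only)) {
		if (strcmp(p->name, name) == 0)
			hits++;
	}
	return hits;
}
```

    With sibling_only set, an entry nested below a top-level child (such as
    an aperture inside a bus window) is never visited.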

    Signed-off-by: Vivek Goyal
    Cc: Yinghai Lu
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     

24 May, 2014

1 commit

  • The resource map sanity check message is a bit confusing. Change it to be
    more readable:

    -resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
    +resource sanity check: requesting [mem 0xfed10000-0xfed15fff], which spans more than pnp 00:01 [mem 0xfed10000-0xfed13fff]

    Signed-off-by: Bjorn Helgaas

    Bjorn Helgaas
     

04 Apr, 2014

1 commit


20 Mar, 2014

1 commit

  • We don't set the type (I/O, memory, etc.) of resources added by
    __request_region(), which leads to confusing messages like this:

    address space collision: [io 0x1000-0x107f] conflicts with ACPI CPU throttle [??? 0x00001010-0x00001015 flags 0x80000000]

    Set the type of a new resource added by __request_region() (used by
    request_region() and request_mem_region()) to the type of its parent. This
    makes the resource tree internally consistent and fixes messages like the
    above, where the ACPI CPU throttle resource really is an I/O port region,
    but request_region() didn't fill in the type, so %pR didn't know how to
    print it.

    Sample dmesg showing the issue at the link below.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=71611
    Reported-by: Paul Bolle
    Signed-off-by: Bjorn Helgaas

    Bjorn Helgaas
     

27 Feb, 2014

1 commit

  • We have two identical copies of resource_contains() already, and more
    places that could use it. This moves it to ioport.h where it can be
    shared.

    resource_contains(struct resource *r1, struct resource *r2) returns true
    iff r1 and r2 are the same type (most callers already checked this
    separately) and the r1 address range completely contains r2.

    In addition, the new resource_contains() checks that both r1 and r2 have
    addresses assigned to them. If a resource is IORESOURCE_UNSET, it doesn't
    have a valid address and can't contain or be contained by another resource.
    Some callers already check this or for res->start.
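
    The three conditions described above can be modelled in user space as
    follows. This is a sketch only: struct res, res_contains() and the RES_*
    flag values are illustrative assumptions, not the definitions from
    ioport.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the kernel defines its own in ioport.h. */
#define RES_TYPE_MASK	0x00001f00UL
#define RES_IO		0x00000100UL
#define RES_MEM		0x00000200UL
#define RES_UNSET	0x20000000UL

struct res {
	uint64_t start, end;	/* inclusive bounds */
	unsigned long flags;
};

/* True iff r1 and r2 have the same type, both are set, and r1 covers r2. */
bool res_contains(const struct res *r1, const struct res *r2)
{
	if ((r1->flags & RES_TYPE_MASK) != (r2->flags & RES_TYPE_MASK))
		return false;
	if ((r1->flags | r2->flags) & RES_UNSET)
		return false;
	return r1->start <= r2->start && r1->end >= r2->end;
}
```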

    No functional change.

    Signed-off-by: Bjorn Helgaas

    Bjorn Helgaas
     

04 Jul, 2013

1 commit


07 Jun, 2013

1 commit

  • When param1 is enabled in EINJ but not assigned a valid value, it
    sometimes causes an error like the one below:

    APEI: Can not request [mem 0x7aaa7000-0x7aaa7007] for APEI EINJ Trigger registers

    This happens because some firmware accesses the target address
    specified in param1 to trigger the error when injecting a memory
    error. This causes a resource conflict with regular memory, so the
    address must be removed from the trigger table resources, but an
    incorrect param1/param2 combination prevents this removal. Add an
    extra check to avoid this kind of error.

    Signed-off-by: Chen Gong
    Signed-off-by: Tony Luck

    Chen Gong
     

30 Apr, 2013

3 commits

  • When hot removing memory presented at boot time, following messages are shown:

    kernel BUG at mm/slub.c:3409!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc ipmi_devintf ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core e1000e ptp pps_core tpm_infineon ioatdma dca sr_mod cdrom sd_mod crc_t10dif usb_storage megaraid_sas lpfc scsi_transport_fc scsi_tgt scsi_mod
    CPU 0
    Pid: 5091, comm: kworker/0:2 Tainted: G W 3.9.0-rc6+ #15
    RIP: kfree+0x232/0x240
    Process kworker/0:2 (pid: 5091, threadinfo ffff88084678c000, task ffff88083928ca80)
    Call Trace:
    __release_region+0xd4/0xe0
    __remove_pages+0x52/0x110
    arch_remove_memory+0x89/0xd0
    remove_memory+0xc4/0x100
    acpi_memory_device_remove+0x6d/0xb1
    acpi_device_remove+0x89/0xab
    __device_release_driver+0x7c/0xf0
    device_release_driver+0x2f/0x50
    acpi_bus_device_detach+0x6c/0x70
    acpi_ns_walk_namespace+0x11a/0x250
    acpi_walk_namespace+0xee/0x137
    acpi_bus_trim+0x33/0x7a
    acpi_bus_hot_remove_device+0xc4/0x1a1
    acpi_os_execute_deferred+0x27/0x34
    process_one_work+0x1f7/0x590
    worker_thread+0x11a/0x370
    kthread+0xee/0x100
    ret_from_fork+0x7c/0xb0
    RIP [] kfree+0x232/0x240
    RSP

    These messages appear because a resource structure allocated by
    bootmem is released with kfree(). So when we release a resource
    structure, we should check whether it was allocated by bootmem or
    not.

    But even if we know a resource structure was allocated by bootmem, we
    cannot free it, since SLxB cannot handle it. So, in order to reuse such
    a resource structure, this patch remembers it in bootmem_resource as
    follows:

    When releasing a resource structure, free_resource() checks whether
    the resource structure was allocated by bootmem or not. If it was
    allocated by bootmem, free_resource() adds it to bootmem_resource.
    If it was not allocated by bootmem, free_resource() releases it with
    kfree().

    And when getting a new resource structure, get_resource() checks
    whether bootmem_resource holds any released resource structures. If
    there is a released resource structure, get_resource() returns it. If
    there is no released resource structure, get_resource() returns a new
    resource structure allocated by kzalloc().
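
    The recycle-instead-of-free scheme can be sketched in user space as
    below. This is a model under stated assumptions: the from_bootmem field
    is a hypothetical stand-in for however the kernel recognises a
    bootmem-allocated structure, and free()/calloc() stand in for
    kfree()/kzalloc().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct res {
	struct res *sibling;	/* doubles as the free-list link */
	bool from_bootmem;	/* assumption: marks non-heap structures */
	unsigned long start, end;
};

/* Released structures that must never be handed to free(). */
static struct res *bootmem_free_list;

void free_res(struct res *r)
{
	if (!r)
		return;
	if (r->from_bootmem) {
		/* Park it for reuse instead of freeing it. */
		r->sibling = bootmem_free_list;
		bootmem_free_list = r;
	} else {
		free(r);
	}
}

struct res *alloc_res(void)
{
	struct res *r = bootmem_free_list;

	if (r) {
		/* Reuse a parked structure, zeroed like a fresh one. */
		bootmem_free_list = r->sibling;
		memset(r, 0, sizeof(*r));
		r->from_bootmem = true;
	} else {
		r = calloc(1, sizeof(*r));
	}
	return r;
}
```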

    [akpm@linux-foundation.org: s/get_resource/alloc_resource/]
    Signed-off-by: Yasuaki Ishimatsu
    Reviewed-by: Toshi Kani
    Cc: Johannes Weiner
    Cc: Ram Pai
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • Add release_mem_region_adjustable(), which releases a requested region
    from a currently busy memory resource. This interface adjusts the
    matched memory resource accordingly even if the requested region does
    not match it exactly but still fits inside it.

    This new interface is intended for memory hot-delete. During bootup,
    memory resources are inserted from the boot descriptor table, such as
    the EFI Memory Table and e820. Each memory resource entry usually covers
    the whole contiguous memory range. A memory hot-delete request, on the
    other hand, may target a particular range of a memory resource, and its
    size can be much smaller than the whole contiguous memory. Since the
    existing release interfaces like __release_region() require a requested
    region to match a resource entry exactly, they do not allow a
    partial resource to be released.
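
    The possible outcomes of such a partial release (whole entry freed, head
    or tail trimmed, or entry split in two) can be sketched on a single
    range. This is an illustration only: struct range and release_adjust()
    are hypothetical and ignore locking and the resource tree.

```c
#include <stdint.h>

struct range {
	uint64_t start, end;	/* inclusive bounds */
};

/*
 * Remove [start, end] from 'busy'. The surviving pieces are written to
 * out[] and their count (0, 1 or 2) is returned; -1 is returned if the
 * request does not fit entirely inside 'busy'.
 */
int release_adjust(struct range busy, uint64_t start, uint64_t end,
		   struct range out[2])
{
	int n = 0;

	if (start > end || start < busy.start || end > busy.end)
		return -1;

	if (start > busy.start) {	/* part below the hole survives */
		out[n].start = busy.start;
		out[n].end = start - 1;
		n++;
	}
	if (end < busy.end) {		/* part above the hole survives */
		out[n].start = end + 1;
		out[n].end = busy.end;
		n++;
	}
	return n;
}
```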

    This new interface is restrictive (i.e. release under certain
    conditions), which is consistent with other release interfaces,
    __release_region() and __release_resource(). Additional release
    conditions, such as an overlapping region to a resource entry, can be
    supported after they are confirmed as valid cases.

    There is no change to the existing interfaces since their restriction is
    valid for I/O resources.

    [akpm@linux-foundation.org: use GFP_ATOMIC under write_lock()]
    [akpm@linux-foundation.org: switch back to GFP_KERNEL, less buggily]
    [akpm@linux-foundation.org: remove unneeded and wrong kfree(), per Toshi]
    Signed-off-by: Toshi Kani
    Reviewed-by: Yasuaki Ishimatsu
    Cc: David Rientjes
    Reviewed-by: Ram Pai
    Cc: T Makphaibulchoke
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Jiang Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Add __adjust_resource(), which is called by adjust_resource() internally
    after the resource_lock is held. There is no interface change to
    adjust_resource(). This change allows other functions to call
    __adjust_resource() internally while the resource_lock is held.

    Signed-off-by: Toshi Kani
    Reviewed-by: Yasuaki Ishimatsu
    Acked-by: David Rientjes
    Cc: Ram Pai
    Cc: T Makphaibulchoke
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Jiang Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

06 Oct, 2012

1 commit

  • Using recursive calls to add non-conflicting regions in
    __reserve_region_with_split() could result in a stack overflow if the
    recursion gets too deep. Convert the recursive calls to an
    iterative loop to avoid the problem.
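
    The idea of walking past each conflict in a loop, rather than recursing
    on the ranges below and above it, can be sketched as follows. This is a
    simplified model, not the kernel function: it assumes the conflicts are
    given as a sorted, non-overlapping array, and struct range and
    reserve_with_split() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start, end;	/* inclusive bounds */
};

/*
 * Collect the parts of [start, end] not covered by busy[] (sorted and
 * non-overlapping) without recursion. Returns the number of free pieces
 * written to out[], at most 'max'.
 */
size_t reserve_with_split(const struct range *busy, size_t n,
			  uint64_t start, uint64_t end,
			  struct range *out, size_t max)
{
	size_t i, found = 0;

	for (i = 0; i < n && start <= end; i++) {
		if (busy[i].end < start)
			continue;	/* conflict lies below the request */
		if (busy[i].start > end)
			break;		/* no further conflicts */
		if (busy[i].start > start && found < max) {
			out[found].start = start;
			out[found].end = busy[i].start - 1;
			found++;
		}
		if (busy[i].end >= end)
			return found;	/* conflict covers the rest */
		start = busy[i].end + 1;	/* continue above the conflict */
	}
	if (start <= end && found < max) {
		out[found].start = start;
		out[found].end = end;
		found++;
	}
	return found;
}
```

    Stack usage stays constant no matter how many conflicts the request
    spans.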

    Tested on a machine containing 135 regions. The kernel no longer panicked
    with stack overflow.

    Also tested with code arbitrarily adding regions with no conflict,
    embedding two consecutive conflicts and embedding two non-consecutive
    conflicts.

    Signed-off-by: T Makphaibulchoke
    Reviewed-by: Ram Pai
    Cc: Paul Gortmaker
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    T Makphaibulchoke
     

31 Jul, 2012

1 commit

  • When the requested range is outside of the root range, the logic in
    __reserve_region_with_split() causes an infinite recursion, which
    overflows the stack as seen in the warning below.

    This particular stack overflow was caused by requesting the
    (100000000-107ffffff) range while the root range was (0-ffffffff). In
    this case __request_resource would return the whole root range as
    conflict range (i.e. 0-ffffffff). Then, the logic in
    __reserve_region_with_split would continue the recursion requesting the
    new range as (conflict->end+1, end) which incidentally in this case
    equals the originally requested range.

    This patch aborts looking for a usable range when the request does not
    intersect with the root range. When the request partially overlaps with
    the root range, it adjusts the request to fall within the root range and
    then continues with the new request.
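
    The abort-or-clip decision can be sketched in user space as below.
    struct range and clip_to_root() are hypothetical names for illustration;
    the sketch leaves out the error logging.

```c
#include <stdbool.h>
#include <stdint.h>

struct range {
	uint64_t start, end;	/* inclusive bounds */
};

/*
 * Return false if the request does not intersect 'root' at all, in which
 * case the caller should abort. Otherwise trim *req so that it lies
 * inside 'root' and return true.
 */
bool clip_to_root(struct range root, struct range *req)
{
	if (req->start > root.end || req->end < root.start)
		return false;
	if (req->start < root.start)
		req->start = root.start;
	if (req->end > root.end)
		req->end = root.end;
	return true;
}
```

    With a root of 0-ffffffff, the 100000000-107ffffff request from the
    report above is rejected immediately instead of being retried forever.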

    When the request is modified or aborted, errors and a stack trace are
    logged to allow catching the errors in the upper layers.

    [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
    [ 5.975150] Modules linked in:
    [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
    [ 5.985324] Call Trace:
    [ 5.987759] [] ? console_unlock+0x17b/0x18d
    [ 5.992891] [] warn_slowpath_common+0x48/0x5d
    [ 5.998194] [] ? sub_preempt_count+0x63/0x89
    [ 6.003412] [] warn_slowpath_null+0xf/0x13
    [ 6.008453] [] sub_preempt_count+0x63/0x89
    [ 6.013499] [] _raw_spin_unlock+0x27/0x3f
    [ 6.018453] [] add_partial+0x36/0x3b
    [ 6.022973] [] deactivate_slab+0x96/0xb4
    [ 6.027842] [] __slab_alloc.isra.54.constprop.63+0x204/0x241
    [ 6.034456] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.039842] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.045232] [] kmem_cache_alloc_trace+0x51/0xb0
    [ 6.050710] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.056100] [] kzalloc.constprop.5+0x29/0x38
    [ 6.061320] [] __reserve_region_with_split+0x1c/0xd1
    [ 6.067230] [] __reserve_region_with_split+0xc6/0xd1
    ...
    [ 7.179057] [] __reserve_region_with_split+0xc6/0xd1
    [ 7.184970] [] reserve_region_with_split+0x30/0x42
    [ 7.190709] [] e820_reserve_resources_late+0xd1/0xe9
    [ 7.196623] [] pcibios_resource_survey+0x23/0x2a
    [ 7.202184] [] pcibios_init+0x23/0x35
    [ 7.206789] [] pci_subsys_init+0x3f/0x44
    [ 7.211659] [] do_one_initcall+0x72/0x122
    [ 7.216615] [] ? pci_legacy_init+0x3d/0x3d
    [ 7.221659] [] kernel_init+0xa6/0x118
    [ 7.226265] [] ? start_kernel+0x334/0x334
    [ 7.231223] [] kernel_thread_helper+0x6/0x10

    Signed-off-by: Octavian Purdila
    Signed-off-by: Ram Pai
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Octavian Purdila
     

14 Jun, 2012

1 commit


01 Jun, 2012

1 commit

  • In the comment of allocate_resource(), the explanation of the
    parameters max and min is not correct.

    Actually, these two parameters specify the address range from which
    the resource will be allocated, not the min/max size that will be
    allocated.

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     

04 Feb, 2012

1 commit


31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

30 Sep, 2011

1 commit

  • __find_resource() incorrectly returns a resource window which overlaps
    an existing allocated window. This happens when the parent's
    resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
    to all its children resource-windows.

    __find_resource() looks for gaps in resource allocation among the
    children resource windows. When it encounters the last child window it
    blindly tries the range next to one allocated to the last child. Since
    the last child's window ends at 0xffffffff the calculation overflows,
    leading the algorithm to believe that any window in the range 0x00000000
    to 0xffffffff is available for allocation. This leads to a conflicting
    window allocation.
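
    The wrap-around can be reproduced with a 32-bit address type. The sketch
    below shows one way to guard the "end + 1" step; it is not the actual
    patch, and addr_t and next_start_after() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t addr_t;	/* assumption: 32-bit resource addresses */

/*
 * Compute the first address after a child window ending at child_end.
 * Returns false when child_end is the highest representable address:
 * child_end + 1 would wrap to 0, and no space remains above the child.
 */
bool next_start_after(addr_t child_end, addr_t *next)
{
	if (child_end == UINT32_MAX)
		return false;
	*next = child_end + 1;
	return true;
}
```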

    Michal Ludvig reported this issue seen on his platform. The following
    patch fixes the problem and has been verified by Michal. I believe this
    bug has been there for ages. It got exposed by git commit 2bbc6942273b
    ("PCI : ability to relocate assigned pci-resources")

    Signed-off-by: Ram Pai
    Tested-by: Michal Ludvig
    Signed-off-by: Linus Torvalds

    Ram Pai
     

31 Jul, 2011

1 commit


07 Jul, 2011

1 commit


18 Dec, 2010

2 commits


29 Oct, 2010

1 commit

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits)
    x86: allocate space within a region top-down
    x86: update iomem_resource end based on CPU physical address capabilities
    x86/PCI: allocate space from the end of a region, not the beginning
    PCI: allocate bus resources from the top down
    resources: support allocating space within a region from the top down
    resources: handle overflow when aligning start of available area
    resources: ensure callback doesn't allocate outside available space
    resources: factor out resource_clip() to simplify find_resource()
    resources: add a default alignf to simplify find_resource()
    x86/PCI: MMCONFIG: fix region end calculation
    PCI: Add support for polling PME state on suspended legacy PCI devices
    PCI: Export some PCI PM functionality
    PCI: fix message typo
    PCI: log vendor/device ID always
    PCI: update Intel chipset names and defines
    PCI: use new ccflags variable in Makefile
    PCI: add PCI_MSIX_TABLE/PBA defines
    PCI: add PCI vendor id for STmicroelectronics
    x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs
    PCI: OLPC: Only enable PCI configuration type override on XO-1
    ...

    Linus Torvalds
     

28 Oct, 2010

1 commit

  • If the same resource is inserted into the resource tree twice (maybe
    not on purpose), an endless loop is created. In this situation, the
    kernel does not report any warning or error :(

    The command below will then print endlessly:
    #cat /proc/iomem

    [akpm@linux-foundation.org: add WARN_ON()]
    Signed-off-by: Huang Shijie
    Cc: Jesse Barnes
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     

27 Oct, 2010

5 commits

  • Allocate space from the top of a region first, then work downward,
    if an architecture desires this.

    When we allocate space from a resource, we look for gaps between children
    of the resource. Previously, we always looked at gaps from the bottom up.
    For example, given this:

    [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00
    [mem 0xbff00000-0xbfffffff] gap -- available
    [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02
    [mem 0xe0000000-0xf7ffffff] gap -- available

    we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first,
    then the [mem 0xe0000000-0xf7ffffff] gap.

    With this patch an architecture can choose to allocate from the top gap
    [mem 0xe0000000-0xf7ffffff] first.

    We can't do this across the board because iomem_resource.end is initialized
    to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't
    address the entire 64-bit physical address space. Therefore, we only
    allocate top-down if the arch requests it by clearing
    "resource_alloc_from_bottom".

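    The gap search can be sketched in userspace as follows. This is an
    illustration under stated assumptions, not the kernel implementation:
    struct range and find_gap() are hypothetical names, children are
    assumed sorted and non-overlapping, alignment is ignored, and the
    top-down case places the allocation at the high end of the chosen gap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource range; both ends are inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Find a free gap of at least size bytes inside parent.  kids holds n
 * non-overlapping children sorted by start address, all inside parent.
 * With from_bottom != 0 the lowest fitting gap is used, otherwise the
 * highest.  Returns 0 and fills *out on success, -1 if nothing fits.
 */
int find_gap(const struct range *parent, const struct range *kids,
	     size_t n, uint64_t size, int from_bottom, struct range *out)
{
	size_t i;
	int found = -1;

	assert(size > 0 && parent->start <= parent->end);

	/* Gap i lies below child i; gap n lies above the last child. */
	for (i = 0; i <= n; i++) {
		uint64_t lo, hi;

		if (i == 0) {
			lo = parent->start;
		} else {
			if (kids[i - 1].end >= parent->end)
				break;		/* no room above this child */
			lo = kids[i - 1].end + 1;
		}

		if (i < n) {
			if (kids[i].start <= lo)
				continue;	/* empty gap */
			hi = kids[i].start - 1;
		} else {
			hi = parent->end;
		}

		if (hi - lo < size - 1)
			continue;		/* gap too small */

		out->start = from_bottom ? lo : hi - (size - 1);
		out->end = out->start + (size - 1);
		found = 0;
		if (from_bottom)
			break;			/* first fit from the bottom */
		/* top-down: keep scanning, the last fit is the highest */
	}
	return found;
}
```

    With the PCI Bus 0000:00 layout above and a 64 KiB request, the
    bottom-up search lands in the [mem 0xbff00000-0xbfffffff] gap and the
    top-down search in the [mem 0xe0000000-0xf7ffffff] gap.
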
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • If tmp.start is near ~0, ALIGN(tmp.start) may overflow, which would
    make us think there's more available space than there really is. We
    would likely return something that conflicts with a previous resource,
    which would cause a failure when allocate_resource() requests the newly-
    allocated region.
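
    The wraparound is easy to reproduce in userspace. The sketch below
    assumes 64-bit addresses and power-of-two alignment; ALIGN_UP() and
    align_start_checked() are hypothetical names that only mirror the
    idea of checking the result of the alignment against its input.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a power of two); the sum wraps mod 2^64. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))

/*
 * Align *start upward and detect wraparound.
 * Returns 0 and updates *start on success, -1 if the aligned value
 * wrapped past the top of the address space (in which case *start is
 * left untouched and the caller should treat the area as unusable).
 */
int align_start_checked(uint64_t *start, uint64_t align)
{
	uint64_t aligned;

	assert(align != 0 && (align & (align - 1)) == 0);

	aligned = ALIGN_UP(*start, align);
	if (aligned < *start)
		return -1;	/* wrapped: aligned start is below the input */
	*start = aligned;
	return 0;
}
```

    For a start of 0xfffffffffffff001 and 4 KiB alignment, the unchecked
    macro yields 0, which is what makes the available area look far
    larger than it is.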

    Reference: https://bugzilla.redhat.com/show_bug.cgi?id=646027
    Reported-by: Fabrice Bellet
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • The alignment callback returns a proposed location, which may have been
    adjusted to avoid ISA aliases or for other architecture-specific reasons.

    We already had a check ("tmp.start < tmp.end") to make sure the callback
    doesn't return an area that extends past the available area. This patch
    reworks the check to make sure it doesn't return an area that extends
    either below or above the available area.
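
    A two-sided check of this kind might look like the sketch below.
    struct range and fits_in_available() are hypothetical names, and the
    end test is written to avoid overflow; the patch itself may express
    the check differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource range; both ends are inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return 1 if [start, start + size - 1] lies entirely inside avail,
 * otherwise 0.  start is the location proposed by the callback.
 */
int fits_in_available(const struct range *avail, uint64_t start,
		      uint64_t size)
{
	assert(avail->start <= avail->end);

	if (size == 0)
		return 0;
	if (start < avail->start || start > avail->end)
		return 0;	/* begins below or above the available area */

	/* Compare lengths rather than computing start + size - 1. */
	return size - 1 <= avail->end - start;
}
```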

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • This factors out the min/max clipping to simplify find_resource().
    No functional change.
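
    As an illustration, a clipping helper of this shape can be written in
    a few lines. struct range and range_clip() are hypothetical userspace
    names used for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource range; both ends are inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Shrink r so that it does not extend below min or above max. */
void range_clip(struct range *r, uint64_t min, uint64_t max)
{
	assert(min <= max);

	if (r->start < min)
		r->start = min;
	if (r->end > max)
		r->end = max;
}
```

    After clipping, a caller still has to check that start <= end, since
    a range lying wholly outside [min, max] comes back inverted.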

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • This removes a test from find_resource(), which is getting cluttered.
    No functional change.
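
    One way to remove such a test is to substitute a default callback
    once, so that the search loop can call it unconditionally. The sketch
    below assumes that approach; alignf_t, default_alignf() and
    propose_start() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource range; both ends are inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Alignment callback: returns the proposed start inside avail. */
typedef uint64_t (*alignf_t)(void *data, const struct range *avail,
			     uint64_t size, uint64_t align);

/* Default callback: accept the start of the available area unchanged. */
uint64_t default_alignf(void *data, const struct range *avail,
			uint64_t size, uint64_t align)
{
	(void)data;
	(void)size;
	(void)align;
	return avail->start;
}

/* Propose a start address; a NULL callback is replaced up front. */
uint64_t propose_start(const struct range *avail, uint64_t size,
		       uint64_t align, alignf_t alignf, void *data)
{
	assert(avail != NULL);

	if (!alignf)
		alignf = default_alignf;
	return alignf(data, avail, size, align);
}
```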

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     

12 May, 2010

1 commit

  • SuperIO devices share regions and use lock/unlock operations to chip
    select. We therefore need to be able to request a resource and wait for
    it to be freed by whichever other SuperIO device currently hogs it.
    Right now you have to poll which is horrible.

    Add a MUXED field to IO port resources. If the MUXED field is set on the
    resource and on the request (via request_muxed_region) then we block
    until the previous owner of the muxed resource releases their region.

    This allows us to implement proper resource sharing and locking for
    superio chips using code of the form

    enable_my_superio_dev() {
            request_muxed_region(0x44, 0x02, "superio:watchdog");
            outb() .. sequence to enable chip
    }

    disable_my_superio_dev() {
            outb() .. sequence to disable chip
            release_region(0x44, 0x02);
    }

    Signed-off-by: Giel van Schijndel
    Signed-off-by: Alan Cox
    Signed-off-by: Jesse Barnes

    Alan Cox
     

24 Mar, 2010

1 commit

  • request_resource() and insert_resource() only return success or failure,
    with no information about what existing resource conflicted with the
    proposed new reservation. This patch adds request_resource_conflict()
    and insert_resource_conflict(), which return the conflicting resource.

    Callers may use this for better error messages or to adjust the new
    resource and retry the request.
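
    The calling pattern can be sketched in userspace as below. struct res
    and request_conflict() are hypothetical stand-ins that model a single
    sorted sibling list; they are not the kernel functions named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource; both ends are inclusive. */
struct res {
	uint64_t start;
	uint64_t end;
	const char *name;
	struct res *sibling;	/* next entry, sorted by start */
};

/*
 * Try to link new_res into the sorted list at *head.
 * Returns NULL on success, or the existing entry that overlaps it.
 */
struct res *request_conflict(struct res **head, struct res *new_res)
{
	struct res **p = head;

	assert(new_res->start <= new_res->end);

	for (;;) {
		struct res *tmp = *p;

		if (!tmp || tmp->start > new_res->end) {
			/* Everything from here on lies above new_res. */
			new_res->sibling = tmp;
			*p = new_res;
			return NULL;
		}
		p = &tmp->sibling;
		if (tmp->end < new_res->start)
			continue;	/* tmp lies wholly below new_res */
		return tmp;		/* overlap: report the conflict */
	}
}
```

    A caller that gets a non-NULL result can print the conflicting name,
    or move its range to start just past the conflict's end and try
    again.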

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     

04 Mar, 2010

1 commit


03 Mar, 2010

1 commit


02 Mar, 2010

1 commit

  • The System RAM walk shall skip partial RAM pages and avoid calling
    func() on them, so that page_is_ram() returns 0 for a partial RAM page.

    In particular, it shall not call func() with len=0.
    This fixes a boot time bug reported by Sachin and root caused by Thomas:

    > >>> WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x169/0x2f1()
    > >>> Hardware name: BladeCenter LS21 -[79716AA]-
    > >>> Modules linked in:
    > >>> Pid: 0, comm: swapper Not tainted 2.6.33-git6-autotest #1
    > >>> Call Trace:
    > >>> [] ? __ioremap_caller+0x169/0x2f1
    > >>> [] warn_slowpath_common+0x77/0xa4
    > >>> [] warn_slowpath_null+0xf/0x11
    > >>> [] __ioremap_caller+0x169/0x2f1
    > >>> [] ? acpi_os_map_memory+0x12/0x1b
    > >>> [] ioremap_nocache+0x12/0x14
    > >>> [] acpi_os_map_memory+0x12/0x1b
    > >>> [] acpi_tb_verify_table+0x29/0x5b
    > >>> [] acpi_load_tables+0x39/0x15a
    > >>> [] acpi_early_init+0x60/0xf5
    > >>> [] start_kernel+0x397/0x3a7
    > >>> [] x86_64_start_reservations+0xa5/0xa9
    > >>> [] x86_64_start_kernel+0xe1/0xe8
    > >>> ---[ end trace 4eaa2a86a8e2da22 ]---
    > >>> ioremap reserve_memtype failed -22

    The return code is -EINVAL, so it failed in the is_ram check, which is
    not too surprising:

    > BIOS-provided physical RAM map:
    > BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
    > BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
    > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
    > BIOS-e820: 0000000000100000 - 00000000cffa3900 (usable)
    > BIOS-e820: 00000000cffa3900 - 00000000cffa7400 (ACPI data)

    The ACPI data does not start on a page boundary, and neither does the
    usable RAM area end on one. Very useful!

    > ACPI: DSDT 00000000cffa3900 036CE (v01 IBM SERLEWIS 00001000 INTL 20060912)

    ACPI is trying to map DSDT at cffa3900, which results in a check
    vs. cffa3000 which is the relevant page boundary. The generic is_ram
    check correctly identifies that as RAM because it's in the usable
    resource area. The old e820 based is_ram check does not take
    overlapping resource areas into account. That's why it works.
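
    The rounding that skips partial pages can be sketched as follows,
    assuming 4 KiB pages and byte ranges with an exclusive end, as in the
    e820 listing above. whole_pages() and pfn_is_whole_ram() are
    hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/*
 * Convert the byte range [start, end) into the page frames that it
 * covers completely.  Stores the first such frame in *first_pfn and
 * returns the number of whole pages; 0 means there is nothing to walk,
 * so a callback must not be invoked.
 */
uint64_t whole_pages(uint64_t start, uint64_t end, uint64_t *first_pfn)
{
	uint64_t pfn, end_pfn;

	assert(start <= end);

	pfn = (start + PAGE_SIZE - 1) >> PAGE_SHIFT;	/* round start up */
	end_pfn = end >> PAGE_SHIFT;			/* round end down */

	*first_pfn = pfn;
	return end_pfn > pfn ? end_pfn - pfn : 0;
}

/* Return 1 if pfn is a whole page inside the usable range [start, end). */
int pfn_is_whole_ram(uint64_t pfn, uint64_t start, uint64_t end)
{
	uint64_t first;
	uint64_t n = whole_pages(start, end, &first);

	return n != 0 && pfn >= first && pfn - first < n;
}
```

    For the usable range that ends at 0xcffa3900, page frame 0xcffa3 is
    only partly RAM, so it is excluded, while frame 0xcffa2 is still
    reported as RAM.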

    CC: Sachin Sant
    CC: Thomas Gleixner
    CC: KAMEZAWA Hiroyuki
    Signed-off-by: Wu Fengguang
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Wu Fengguang
     

01 Mar, 2010

1 commit

  • * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, mm: Unify kernel_physical_mapping_init() API
    x86, mm: Allow highmem user page tables to be disabled at boot time
    x86: Do not reserve brk for DMI if it's not going to be used
    x86: Convert tlbstate_lock to raw_spinlock
    x86: Use the generic page_is_ram()
    x86: Remove BIOS data range from e820
    Move page_is_ram() declaration to mm.h
    Generic page_is_ram: use __weak
    resources: introduce generic page_is_ram()

    Linus Torvalds