06 Sep, 2016

1 commit

  • This is a trivial fix to correct upper bound addresses to always be
    inclusive. Previously, the majority of ranges specified were inclusive with a
    small minority specifying an exclusive upper bound. This patch fixes this
    inconsistency.

    Signed-off-by: Lorenzo Stoakes
    Signed-off-by: Jonathan Corbet

    Lorenzo Stoakes
     

07 Aug, 2016

1 commit

  • Pull documentation fixes from Jonathan Corbet:
    "Three fixes for the docs build, including removing an annoying warning
    on 'make help' if sphinx isn't present"

    * tag 'doc-4.8-fixes' of git://git.lwn.net/linux:
    DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1
    Documentation: update cgroup's document path
    Documentation/sphinx: do not warn about missing tools in 'make help'

    Linus Torvalds
     

04 Aug, 2016

1 commit


26 Jul, 2016

1 commit

  • Pull x86 boot updates from Ingo Molnar:
    "The main changes:

    - add initial commits to randomize kernel memory section virtual
    addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
    (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

    - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
    Cook)

    - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

    - misc cleanups/fixes"

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/boot: Simplify EBDA-vs-BIOS reservation logic
    x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
    x86/boot: Reorganize and clean up the BIOS area reservation code
    x86/mm: Do not reference phys addr beyond kernel
    x86/mm: Add memory hotplug support for KASLR memory randomization
    x86/mm: Enable KASLR for vmalloc memory regions
    x86/mm: Enable KASLR for physical mapping memory regions
    x86/mm: Implement ASLR for kernel memory regions
    x86/mm: Separate variable for trampoline PGD
    x86/mm: Add PUD VA support for physical mapping
    x86/mm: Update physical mapping variable names
    x86/mm: Refactor KASLR entropy functions
    x86/KASLR: Fix boot crash with certain memory configurations
    x86/boot/64: Add forgotten end of function marker
    x86/KASLR: Allow randomization below the load address
    x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
    x86/KASLR: Randomize virtual address separately
    x86/KASLR: Clarify identity map interface
    x86/boot: Refuse to build with data relocations
    x86/KASLR, x86/power: Remove x86 hibernation restrictions

    Linus Torvalds
     

08 Jul, 2016

1 commit

  • Randomizes the virtual address space of kernel memory regions for
    x86_64. This first patch adds the infrastructure and does not randomize
    any region. The following patches will randomize the physical memory
    mapping, vmalloc and vmemmap regions.

    This security feature mitigates exploits relying on predictable kernel
    addresses. These addresses can be used to disclose the kernel modules
    base addresses or corrupt specific structures to elevate privileges
    bypassing the current implementation of KASLR. This feature can be
    enabled with the CONFIG_RANDOMIZE_MEMORY option.

    The order of each memory region is not changed. The feature looks at the
    available space for the regions based on different configuration options
    and randomizes the base and space between each. The size of the physical
    memory mapping is the available physical memory. No performance impact
    was detected while testing the feature.

    Entropy is generated using the KASLR early boot functions now shared in
    the lib directory (originally written by Kees Cook). Randomization is
    done on PGD & PUD page table levels to increase possible addresses. The
    physical memory mapping code was adapted to support PUD level virtual
    addresses. In the best configuration, this implementation provides
    on average 30,000 possible virtual addresses for each memory region.
    An additional low memory page is used to ensure each CPU can start
    with a PGD-aligned virtual address (for realmode).

    x86/dump_pagetable was updated to correctly display each region.

    Updated documentation on x86_64 memory layout accordingly.

    Performance data, after all patches in the series:

    Kernbench shows almost no difference (+/- less than 1%):

    Before:

    Average Optimal load -j 12 Run (std deviation):
      Elapsed Time     102.63   (1.2695)
      User Time        1034.89  (1.18115)
      System Time      87.056   (0.456416)
      Percent CPU      1092.9   (13.892)
      Context Switches 199805   (3455.33)
      Sleeps           97907.8  (900.636)

    After:

    Average Optimal load -j 12 Run (std deviation):
      Elapsed Time     102.489  (1.10636)
      User Time        1034.86  (1.36053)
      System Time      87.764   (0.49345)
      Percent CPU      1095     (12.7715)
      Context Switches 199036   (4298.1)
      Sleeps           97681.6  (1031.11)

    Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):

    attempt,before,after
    1,0.076,0.069
    2,0.072,0.069
    3,0.066,0.066
    4,0.066,0.068
    5,0.066,0.067
    6,0.066,0.069
    7,0.067,0.066
    8,0.063,0.067
    9,0.067,0.065
    10,0.068,0.071
    average,0.0677,0.0677

    Signed-off-by: Thomas Garnier
    Signed-off-by: Kees Cook
    Cc: Alexander Kuleshov
    Cc: Alexander Popov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Aneesh Kumar K.V
    Cc: Baoquan He
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Christian Borntraeger
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: Jan Beulich
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Lv Zheng
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Stephen Smalley
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: Xiao Guangrong
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: linux-doc@vger.kernel.org
    Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Thomas Garnier
     

01 Jul, 2016

1 commit


22 Apr, 2016

1 commit

  • Correct the size of the module mapping space and the maximum available
    physical memory size of current processors.

    Signed-off-by: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: corbet@lwn.net
    Cc: linux-doc@vger.kernel.org
    Link: http://lkml.kernel.org/r/1461310504-15977-1-git-send-email-jgross@suse.com
    Signed-off-by: Ingo Molnar

    Juergen Gross
     

21 Mar, 2016

1 commit

  • Pull EFI updates from Ingo Molnar:
    "The main changes are:

    - Use separate EFI page tables when executing EFI firmware code.
    This isolates the EFI context from the rest of the kernel, which
    has security and general robustness advantages. (Matt Fleming)

    - Run regular UEFI firmware with interrupts enabled. This is already
    the status quo under other OSs. (Ard Biesheuvel)

    - Various x86 EFI enhancements, such as the use of non-executable
    attributes for EFI memory mappings. (Sai Praneeth Prakhya)

    - Various arm64 UEFI enhancements. (Ard Biesheuvel)

    - ... various fixes and cleanups.

    The separate EFI page tables feature got delayed twice already,
    because it's an intrusive change and we didn't feel confident about
    it - third time's the charm we hope!"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    x86/mm/pat: Fix boot crash when 1GB pages are not supported by the CPU
    x86/efi: Only map kernel text for EFI mixed mode
    x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables
    x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd()
    efi/arm*: Perform hardware compatibility check
    efi/arm64: Check for h/w support before booting a >4 KB granular kernel
    efi/arm: Check for LPAE support before booting a LPAE kernel
    efi/arm-init: Use read-only early mappings
    efi/efistub: Prevent __init annotations from being used
    arm64/vmlinux.lds.S: Handle .init.rodata.xxx and .init.bss sections
    efi/arm64: Drop __init annotation from handle_kernel_image()
    x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings
    efi/runtime-wrappers: Run UEFI Runtime Services with interrupts enabled
    efi: Reformat GUID tables to follow the format in UEFI spec
    efi: Add Persistent Memory type name
    efi: Add NV memory attribute
    x86/efi: Show actual ending addresses in efi_print_memmap
    x86/efi/bgrt: Don't ignore the BGRT if the 'valid' bit is 0
    efivars: Use to_efivar_entry
    efi: Runtime-wrapper: Get rid of the rtc_lock spinlock
    ...

    Linus Torvalds
     

18 Feb, 2016

1 commit

  • The Intel Software Developer Manual describes bit 24 in the MCG_CAP
    MSR:

    MCG_SER_P (software error recovery support present) flag,
    bit 24 — Indicates (when set) that the processor supports
    software error recovery

    But only some models with this capability bit set will actually
    generate recoverable machine checks.

    Check the model name and set a synthetic capability bit. Provide
    a command line option to set this bit anyway in case the kernel
    doesn't recognise the model name.

    Signed-off-by: Tony Luck
    Reviewed-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/2e5bfb23c89800a036fb8a45fa97a74bb16bc362.1455732970.git.tony.luck@intel.com
    Signed-off-by: Ingo Molnar

    Tony Luck
     

29 Nov, 2015

1 commit

    Make it clear that the EFI page tables are only available during
    EFI runtime calls since that subject has come up a fair number
    of times in the past.

    Additionally, add the EFI region start and end addresses to the
    table so that it's possible to see at a glance where they fall
    in relation to other regions.

    Signed-off-by: Matt Fleming
    Reviewed-by: Borislav Petkov
    Acked-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Jones
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sai Praneeth Prakhya
    Cc: Stephen Smalley
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1448658575-17029-7-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     

23 Jun, 2015

1 commit

  • Pull x86 core updates from Ingo Molnar:
    "There were so many changes in the x86/asm, x86/apic and x86/mm topics
    in this cycle that the topical separation of -tip broke down somewhat -
    so the result is a more traditional architecture pull request,
    collected into the 'x86/core' topic.

    The topics were still maintained separately as far as possible, so
    bisectability and conceptual separation should still be pretty good -
    but there were a handful of merge points to avoid excessive
    dependencies (and conflicts) that would have been poorly tested in the
    end.

    The next cycle will hopefully be much more quiet (or at least will
    have fewer dependencies).

    The main changes in this cycle were:

    * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
    Gleixner)

    - This is the second and most intrusive part of changes to the x86
    interrupt handling - full conversion to hierarchical interrupt
    domains:

    [IOAPIC domain]   -----
                           |
    [MSI domain]      --------[Remapping domain] ----- [Vector domain]
                           |       (optional)              |
    [HPET MSI domain] -----                                |
                                                           |
    [DMAR domain]     --------------------------------------
                                                           |
    [Legacy domain]   --------------------------------------

    This now reflects the actual hardware and allowed us to disentangle
    the domain specific code from the underlying parent domain, which
    can be optional in the case of interrupt remapping. It's a clear
    separation of functionality and removes quite some duct tape
    constructs which plugged the remap code between ioapic/msi/hpet
    and the vector management.

    - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
    injection into guests (Feng Wu)

    * x86/asm changes:

    - Tons of cleanups and small speedups, micro-optimizations. This
    is in preparation to move a good chunk of the low level entry
    code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
    Brian Gerst)

    - Moved all system entry related code to a new home under
    arch/x86/entry/ (Ingo Molnar)

    - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
    Conversion to C will reintroduce many of them - but meanwhile
    they are only getting in the way, and the upstream kernel does
    not rely on them (Ingo Molnar)

    - NOP handling refinements. (Borislav Petkov)

    * x86/mm changes:

    - Big PAT and MTRR rework: making the code more robust and
    preparing to phase out exposing direct MTRR interfaces to drivers -
    in favor of using PAT driven interfaces (Toshi Kani, Luis R
    Rodriguez, Borislav Petkov)

    - New ioremap_wt()/set_memory_wt() interfaces to support
    Write-Through cached memory mappings. This is especially
    important for good performance on NVDIMM hardware (Toshi Kani)

    * x86/ras changes:

    - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

    This is an important RAS feature which adds hardware support for
    poisoned data. That means roughly that the hardware marks data
    which it has detected as corrupted but wasn't able to correct, as
    poisoned data and raises an APIC interrupt to signal that in the
    form of a deferred error. It is the OS's responsibility then to
    take proper recovery action and thus prolong system lifetime as
    far as possible.

    - Add support for Intel "Local MCE"s: upcoming CPUs will support
    CPU-local MCE interrupts, as opposed to the traditional system-
    wide broadcasted MCE interrupts (Ashok Raj)

    - Misc cleanups (Borislav Petkov)

    * x86/platform changes:

    - Intel Atom SoC updates

    ... and lots of other cleanups, fixlets and other changes - see the
    shortlog and the Git log for details"

    * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
    x86/hpet: Use proper hpet device number for MSI allocation
    x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
    x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
    x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
    x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
    genirq: Prevent crash in irq_move_irq()
    genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
    iommu, x86: Properly handle posted interrupts for IOMMU hotplug
    iommu, x86: Provide irq_remapping_cap() interface
    iommu, x86: Setup Posted-Interrupts capability for Intel iommu
    iommu, x86: Add cap_pi_support() to detect VT-d PI capability
    iommu, x86: Avoid migrating VT-d posted interrupts
    iommu, x86: Save the mode (posted or remapped) of an IRTE
    iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
    iommu: dmar: Provide helper to copy shared irte fields
    iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
    iommu: Add new member capability to struct irq_remap_ops
    x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
    x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
    x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
    ...

    Linus Torvalds
     

07 Jun, 2015

1 commit

  • Initialize and prepare for handling LMCEs. Add a boot-time
    option to disable LMCEs.

    Signed-off-by: Ashok Raj
    [ Simplify stuff, align statements for better readability, reflow comments; kill
    unused lmce_clear(); save us an MSR write if LMCE is already enabled. ]
    Signed-off-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Ashok Raj
     

27 May, 2015

1 commit

  • ... to Documentation/x86/ as it is going to collect more and not
    only 64-bit specific info.

    Signed-off-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/1432628901-18044-16-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

14 Feb, 2015

1 commit

  • This patch adds arch specific code for kernel address sanitizer.

    16 TB of virtual address space is used for the shadow memory. It is
    located in the range [ffffec0000000000 - fffffc0000000000], between
    the vmemmap and %esp fixup stacks.

    At an early stage we map the whole shadow region with the zero page.
    Later, after pages are mapped into the direct mapping address range,
    we unmap the zero pages from the corresponding shadow (see
    kasan_map_shadow()) and allocate and map real shadow memory, reusing
    the vmemmap_populate() function.

    Also, __pa is replaced with __pa_nodebug before the shadow is
    initialized: with CONFIG_DEBUG_VIRTUAL=y, __pa makes an external
    function call (__phys_addr), and __phys_addr is instrumented, so
    __asan_load could be called before the shadow area is initialized.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Jim Davis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

03 Jan, 2015

1 commit

  • This causes all non-NMI, non-double-fault kernel entries from
    userspace to run on the normal kernel stack. Double-fault is
    exempt to minimize confusion if we double-fault directly from
    userspace due to a bad kernel stack.

    This is, surprisingly, simpler and shorter than the current code. It
    removes the IMO rather frightening paranoid_userspace path, and it
    makes sync_regs much simpler.

    There is no risk of stack overflow due to this change -- the kernel
    stack that we switch to is empty.

    This will also enable us to create non-atomic sections within
    machine checks from userspace, which will simplify memory failure
    handling. It will also allow the upcoming fsgsbase code to be
    simplified, because it doesn't need to worry about usergs when
    scheduling in paranoid_exit, as that code no longer exists.

    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Tony Luck
    Acked-by: Borislav Petkov
    Signed-off-by: Andy Lutomirski

    Andy Lutomirski
     

19 Sep, 2014

1 commit

  • Peter Anvin says:

    > 0xffff880000000000 is the lowest usable address because we have
    > agreed to leave 0xffff800000000000-0xffff880000000000 for the
    > hypervisor or other non-OS uses.

    Let's call this out in the documentation.

    This came up during the kernel address sanitizer discussions
    where it was proposed to use this area for other kernel things.

    Signed-off-by: Dave Hansen
    Cc: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Link: http://lkml.kernel.org/r/20140918195606.841389D2@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

01 May, 2014

1 commit

  • The IRET instruction, when returning to a 16-bit segment, only
    restores the bottom 16 bits of the user space stack pointer. This
    causes some 16-bit software to break, but it also leaks kernel state
    to user space. We have a software workaround for that ("espfix") for
    the 32-bit kernel, but it relies on a nonzero stack segment base which
    is not available in 64-bit mode.

    In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

    we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
    the logic that 16-bit support is crippled on 64-bit kernels anyway (no
    V86 support), but it turns out that people are doing stuff like
    running old Win16 binaries under Wine and expect it to work.

    This patch works around the problem by creating percpu "ministacks", each of which
    is mapped 2^16 times 64K apart. When we detect that the return SS is
    on the LDT, we copy the IRET frame to the ministack and use the
    relevant alias to return to userspace. The ministacks are mapped
    readonly, so if IRET faults we promote #GP to #DF which is an IST
    vector and thus has its own stack; we then do the fixup in the #DF
    handler.

    (Making #GP an IST exception would make the msr_safe functions unsafe
    in NMI/MC context, and quite possibly have other effects.)

    Special thanks to:

    - Andy Lutomirski, for the suggestion of using very small stack slots
    and copy (as opposed to map) the IRET frame there, and for the
    suggestion to mark them readonly and let the fault promote to #DF.
    - Konrad Wilk for paravirt fixup and testing.
    - Borislav Petkov for testing help and useful comments.

    Reported-by: Brian Gerst
    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
    Cc: Konrad Rzeszutek Wilk
    Cc: Borislav Petkov
    Cc: Andrew Lutomriski
    Cc: Linus Torvalds
    Cc: Dirk Hohndel
    Cc: Arjan van de Ven
    Cc: comex
    Cc: Alexander van Heukelum
    Cc: Boris Ostrovsky
    Cc: # consider after upstream merge

    H. Peter Anvin
     

23 Jan, 2014

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual rocket science stuff from trivial.git"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    neighbour.h: fix comment
    sched: Fix warning on make htmldocs caused by wait.h
    slab: struct kmem_cache is protected by slab_mutex
    doc: Fix typo in USB Gadget Documentation
    of/Kconfig: Spelling s/one/once/
    mkregtable: Fix sscanf handling
    lp5523, lp8501: comment improvements
    thermal: rcar: comment spelling
    treewide: fix comments and printk msgs
    IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart()
    Documentation: update /proc/uptime field description
    Documentation: Fix size parameter for snprintf
    arm: fix comment header and macro name
    asm-generic: uaccess: Spelling s/a ny/any/
    mtd: onenand: fix comment header
    doc: driver-model/platform.txt: fix a typo
    drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text
    doc: Fix typo (acces_process_vm -> access_process_vm)
    treewide: Fix typos in printk
    drivers/gpu/drm/qxl/Kconfig: reformat the help text
    ...

    Linus Torvalds
     

02 Dec, 2013

1 commit


26 Nov, 2013

1 commit


02 Nov, 2013

1 commit

  • We map the EFI regions needed for runtime services non-contiguously,
    with preserved alignment on virtual addresses starting from -4G down
    for a total max space of 64G. This way, we provide for stable runtime
    services addresses across kernels so that a kexec'd kernel can still use
    them.

    Thus, they're mapped in a separate pagetable so that we don't pollute
    the kernel namespace.

    Add an efi= kernel command line parameter for passing miscellaneous
    options and chicken bits from the command line.

    While at it, add a chicken bit called "efi=old_map" which can be used as
    a fallback to the old runtime services mapping method in case there's
    some b0rkage with a particular EFI implementation (haha, it is hard to
    hold up the sarcasm here...).

    Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Matt Fleming

    Borislav Petkov
     

09 Jul, 2013

1 commit


30 Apr, 2013

1 commit


11 Apr, 2013

1 commit

  • Documentation/kernel-parameters.txt and
    Documentation/x86/x86_64/boot-options.txt contain virtually
    identical text describing earlyprintk.

    This consolidates the two copies and updates the documentation a
    bit. No one ever documented the:

    earlyprintk=serial,0x1008,115200

    syntax, nor mentioned that ARM is now a supported earlyprintk
    arch.

    Signed-off-by: Dave Hansen
    Cc: Rob Landley
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Link: http://lkml.kernel.org/r/20130410210338.E2930E98@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

03 Apr, 2013

1 commit


23 Jan, 2013

1 commit


28 Sep, 2012

1 commit

  • The ACPI spec doesn't provide for a way for the bios to pass down
    recommended thresholds to the OS on a _per-bank_ basis. This patch adds
    a new boot option, which if passed, tells Linux to use CMCI thresholds
    set by the bios.

    As a fail-safe, we initialize the threshold to 1 if some banks have
    not been initialized by the bios, and warn the user.

    Signed-off-by: Naveen N. Rao
    Signed-off-by: Tony Luck

    Naveen N. Rao
     

10 May, 2011

1 commit


23 Mar, 2011

1 commit


18 Mar, 2011

1 commit


29 Jun, 2010

1 commit

  • IRQ stacks provide much better safety against unexpected stack use from
    interrupts, at the minimal downside of slightly higher memory usage.
    Enable irq stacks also for the default 8k stack on 32-bit kernels to
    minimize the problem of stack overflows through interrupt activity.

    This is what the 64-bit kernel and various other architectures already do.

    Signed-off-by: Christoph Hellwig
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
     

16 Feb, 2010

2 commits

  • Now that numa=fake=[MG] is implemented, it is possible to remove
    configurable node size support. The command-line parsing was already
    broken (numa=fake=*128, for example, would not work) and since fake nodes
    are now interleaved over physical nodes, this support is no longer
    required.

    Signed-off-by: David Rientjes
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    David Rientjes
     
  • numa=fake=N specifies the number of fake nodes, N, to partition the
    system into and then allocates them by interleaving over physical nodes.
    This requires knowledge of the system capacity when attempting to
    allocate nodes of a certain size: either very large nodes to benchmark
    scalability of code that operates on individual nodes, or very small
    nodes to find bugs in the VM.

    This patch introduces numa=fake=[MG] so it is possible to specify
    the size of each node to allocate. When used, nodes of the size
    specified will be allocated and interleaved over the set of physical
    nodes.

    FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
    include/asm/numa_64.h.

    Signed-off-by: David Rientjes
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    David Rientjes
     

12 Jun, 2009

1 commit


11 Jun, 2009

1 commit

  • This patch introduces three boot options (no_cmci, dont_log_ce
    and ignore_ce) to control handling for corrected errors.

    The "mce=no_cmci" boot option disables the CMCI feature.

    Since CMCI is a new feature, having boot controls to disable it
    will be a help if the hardware is misbehaving.

    The "mce=dont_log_ce" boot option disables logging for corrected
    errors. All reported corrected errors will be cleared silently.
    This option will be useful if you never care about corrected
    errors.

    The "mce=ignore_ce" boot option disables all handling of corrected
    errors, i.e. both the polling timer and CMCI. Corrected events are
    not cleared and are kept in the bank MSRs.

    This disablement is usually not recommended, but it can help if
    there is a conflict with the BIOS or with hardware monitoring
    applications, etc., that clear corrected events in the banks
    instead of the OS.

    [ And trivial cleanup (space -> tab) for doc is included. ]

    Signed-off-by: Hidetoshi Seto
    Reviewed-by: Andi Kleen
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     

04 Jun, 2009

1 commit

  • On Intel platforms machine check exceptions are always broadcast to
    all CPUs. This patch makes the machine check handler synchronize all
    these machine checks, elect a Monarch to handle the event and collect
    the worst event from all CPUs and then process it first.

    This has some advantages:

    - When there is a truly data corrupting error the system panics as
    quickly as possible. This improves containment of corrupted
    data and makes sure the corrupted data never hits stable storage.

    - The panics are synchronized and do not reenter the panic code
    on multiple CPUs (which currently does not handle this well).

    - All the errors are reported. Currently it often happens that
    another CPU happens to do the panic first, but reports useless
    information (empty machine check) because the real error
    happened on another CPU which came in later.
    This is a big advantage on Nehalem where the 8 threads per CPU
    lead to often the wrong CPU winning the race and dumping
    useless information on a machine check. The problem also occurs
    in a less severe form on older CPUs.

    - The system can detect when no CPUs detected a machine check
    and shut down the system. This can happen when one CPU is so
    badly hung that it cannot process a machine check anymore
    or when some external agent wants to stop the system by
    asserting the machine check pin. This follows Intel hardware
    recommendations.

    - This matches the recommended error model by the CPU designers.

    - The events can be output in true severity order

    - When a panic happens on another CPU it makes sure to be actually
    be able to process the stop IPI by enabling interrupts.

    The code is extremely careful to handle timeouts while waiting
    for other CPUs. It can't rely on the normal timing mechanisms
    (jiffies, ktime_get) because of its asynchronous/lockless nature,
    so it implements its own timeouts using ndelay() and a "SPINUNIT"
    granularity.

    The timeout is configurable. By default it waits up to one
    second for the other CPUs. This can also be disabled.

    From some informal testing, AMD systems do not seem to broadcast
    machine checks, so right now the feature is always disabled by
    default on non-Intel CPUs and on very old Intel systems.

    Includes fixes from Ying Huang
    Fixed a "ecception" in a comment (H.Seto)
    Moved global_nwo reset later based on suggestion from H.Seto
    v2: Avoid duplicate messages

    [ Impact: feature, fixes long standing problems. ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     

29 May, 2009

1 commit

  • Document that check_interval set to 0 means no polling.
    Noticed by Hidetoshi Seto

    Also add a reference from boot options to the sysfs tunables

    Acked-by: Hidetoshi Seto
    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     

18 May, 2009

1 commit

  • after:

    | commit b263295dbffd33b0fbff670720fa178c30e3392a
    | Author: Christoph Lameter
    | Date: Wed Jan 30 13:30:47 2008 +0100
    |
    | x86: 64-bit, make sparsemem vmemmap the only memory model

    we don't have MEMORY_HOTPLUG_RESERVE anymore.

    Historically, x86-64 had an architecture-specific method for memory hotplug
    whereby it scanned the SRAT for physical memory ranges that could be
    potentially used for memory hot-add later. By reserving those ranges
    without physical memory, the memmap would be allocated and left dormant
    until needed. This depended on the DISCONTIG memory model which has been
    removed so the code implementing HOTPLUG_RESERVE is now dead.

    This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.

    (Changelog authored by Mel.)

    v2: updated changelog, and remove hotadd= in doc

    [ Impact: remove dead code ]

    Signed-off-by: Yinghai Lu
    Reviewed-by: Christoph Lameter
    Reviewed-by: Mel Gorman
    Workflow-found-OK-by: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

06 May, 2009

2 commits

  • Fix a trivial typo in Documentation/x86/x86_64/mm.txt.

    [ Impact: documentation only ]

    Signed-off-by: H. Peter Anvin
    Cc: Rik van Riel

    H. Peter Anvin
     
  • Extend the maximum addressable memory on x86-64 from 2^44 to
    2^46 bytes. This requires some shuffling around of the vmalloc
    and virtual memmap memory areas, to keep them away from the
    direct mapping of up to 64TB of physical memory.

    This patch also introduces a guard hole between the vmalloc
    area and the virtual memory map space. There's really no
    good reason why we wouldn't have a guard hole there.

    [ Impact: future hardware enablement ]

    Signed-off-by: Rik van Riel
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Rik van Riel