02 Nov, 2011

23 commits


24 Oct, 2011

1 commit

  • Commit 4b239f458 ("x86-64, mm: Put early page table high") causes a S4
    regression since 2.6.39, namely the machine reboots occasionally at S4
    resume. It doesn't happen always, overall rate is about 1/20. But,
    like other bugs, once when this happens, it continues to happen.

    This patch fixes the problem by essentially reverting the memory
    assignment in the older way.

    Signed-off-by: Takashi Iwai
    Cc:
    Cc: Rafael J. Wysocki
    Cc: Yinghai Lu
    [ We'll hopefully find the real fix, but that's too late for 3.1 now ]
    Signed-off-by: Linus Torvalds

    Takashi Iwai
     

14 Oct, 2011

2 commits


11 Oct, 2011

1 commit

  • This UML breakage:

    linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
    linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790

    Is caused by commit 3ae36655 ("x86-64: Rework vsyscall emulation and add
    vsyscall= parameter") - the vsyscall emulation code is not fully cooked
    yet as UML relies on some rather fragile SIGSEGV semantics.

    Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default
    to vsyscall=native for now, this patch implements that.

    Signed-off-by: Adrian Bunk
    Acked-by: Andrew Lutomirski
    Cc: H. Peter Anvin
    Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi
    Signed-off-by: Ingo Molnar

    Adrian Bunk
     

07 Oct, 2011

1 commit

  • In summary, this DMI quirk uses the _CRS info by default for the ASUS
    M2V-MX SE by turning on `pci=use_crs` and is similar to the quirk
    added by commit 2491762cfb47 ("x86/PCI: use host bridge _CRS info on
    ASRock ALiveSATA2-GLAN") whose commit message should be read for further
    information.

    Since commit 3e3da00c01d0 ("x86/pci: AMD one chain system to use pci
    read out res") Linux gives the following oops:

    parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
    HDA Intel 0000:20:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
    HDA Intel 0000:20:01.0: setting latency timer to 64
    BUG: unable to handle kernel paging request at ffffc90011c08000
    IP: [] azx_probe+0x3ad/0x86b [snd_hda_intel]
    PGD 13781a067 PUD 13781b067 PMD 1300ba067 PTE 800000fd00000173
    Oops: 0009 [#1] SMP
    last sysfs file: /sys/module/snd_pcm/initstate
    CPU 0
    Modules linked in: snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event tpm_tis tpm snd_seq tpm_bios psmouse parport_pc snd_timer snd_seq_device parport processor evdev snd i2c_viapro thermal_sys amd64_edac_mod k8temp i2c_core soundcore shpchp pcspkr serio_raw asus_atk0110 pci_hotplug edac_core button snd_page_alloc edac_mce_amd ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod raid1 md_mod usbhid hid sg sd_mod crc_t10dif sr_mod cdrom ata_generic uhci_hcd sata_via pata_via libata ehci_hcd usbcore scsi_mod via_rhine mii nls_base [last unloaded: scsi_wait_scan]
    Pid: 1153, comm: work_for_cpu Not tainted 2.6.37-1-amd64 #1 M2V-MX SE/System Product Name
    RIP: 0010:[] [] azx_probe+0x3ad/0x86b [snd_hda_intel]
    RSP: 0018:ffff88013153fe50 EFLAGS: 00010286
    RAX: ffffc90011c08000 RBX: ffff88013029ec00 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffff88013341d000 R08: 0000000000000000 R09: 0000000000000040
    R10: 0000000000000286 R11: 0000000000003731 R12: ffff88013029c400
    R13: 0000000000000000 R14: 0000000000000000 R15: ffff88013341d090
    FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:00000000f7610ab0
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffc90011c08000 CR3: 0000000132f57000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process work_for_cpu (pid: 1153, threadinfo ffff88013153e000, task ffff8801303c86c0)
    Stack:
    0000000000000005 ffffffff8123ad65 00000000000136c0 ffff88013029c400
    ffff8801303c8998 ffff88013341d000 ffff88013341d090 ffff8801322d9dc8
    ffff88013341d208 0000000000000000 0000000000000000 ffffffff811ad232
    Call Trace:
    [] ? __pm_runtime_set_status+0x162/0x186
    [] ? local_pci_probe+0x49/0x92
    [] ? do_work_for_cpu+0x0/0x1b
    [] ? do_work_for_cpu+0x0/0x1b
    [] ? do_work_for_cpu+0xb/0x1b
    [] ? kthread+0x7a/0x82
    [] ? kernel_thread_helper+0x4/0x10
    [] ? kthread+0x0/0x82
    [] ? kernel_thread_helper+0x0/0x10
    Code: f4 01 00 00 ef 31 f6 48 89 df e8 29 dd ff ff 85 c0 0f 88 2b 03 00 00 48 89 ef e8 b4 39 c3 e0 8b 7b 40 e8 fc 9d b1 e0 48 8b 43 38 8b 10 66 89 14 24 8b 43 14 83 e8 03 83 f8 01 77 32 31 d2 be
    RIP [] azx_probe+0x3ad/0x86b [snd_hda_intel]
    RSP
    CR2: ffffc90011c08000
    ---[ end trace 8d1f3ebc136437fd ]---

    Trusting the ACPI _CRS information (`pci=use_crs`) fixes this problem.

    $ dmesg | grep -i crs # with the quirk
    PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug

    The match has to be against the DMI board entries though since the vendor entries are not populated.

    DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304 10/30/2007

    This quirk should be removed when `pci=use_crs` is enabled for machines
    from 2006 or earlier or some other solution is implemented.

    Using coreboot [1] with this board the problem does not exist but this
    quirk also does not affect it either. To be safe though the check is
    tightened to only take effect when the BIOS from American Megatrends is
    used.

    15:13 < ruik> but coreboot does not need that
    15:13 < ruik> because i have there only one root bus
    15:13 < ruik> the audio is behind a bridge

    $ sudo dmidecode
    BIOS Information
    Vendor: American Megatrends Inc.
    Version: 0304
    Release Date: 10/30/2007

    [1] http://www.coreboot.org/

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=30552

    Cc: stable@kernel.org (2.6.34)
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: x86@kernel.org
    Signed-off-by: Paul Menzel
    Signed-off-by: Bjorn Helgaas
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Paul Menzel
     

01 Oct, 2011

1 commit

  • …for-linus' of git://tesla.tglx.de/git/linux-2.6-tip

    * 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
    irq: Fix check for already initialized irq_domain in irq_domain_add
    irq: Add declaration of irq_domain_simple_ops to irqdomain.h

    * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
    x86/rtc: Don't recursively acquire rtc_lock

    * 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
    posix-cpu-timers: Cure SMP wobbles
    sched: Fix up wchan borkage
    sched/rt: Migrate equal priority tasks to available CPUs

    Linus Torvalds
     

26 Sep, 2011

2 commits


21 Sep, 2011

1 commit

  • A deadlock was introduced on x86 in commit ef68c8f87ed1 ("x86:
    Serialize EFI time accesses on rtc_lock") because efi_get_time()
    and friends can be called with rtc_lock already held by
    read_persistent_time(), e.g.:

    timekeeping_init()
    read_persistent_clock()

    To fix this let's push the locking down into the get_wallclock()
    and set_wallclock() implementations. Only the clock
    implementations that access the x86 RTC directly need to acquire
    rtc_lock, so it makes sense to push the locking down into the
    rtc, vrtc and efi code.

    The virtualization implementations don't require rtc_lock to be
    held because they provide their own serialization.

    Signed-off-by: Matt Fleming
    Acked-by: Jan Beulich
    Acked-by: Avi Kivity [for the virtualization aspect]
    Cc: "H. Peter Anvin"
    Cc: Zhang Rui
    Cc: Josh Boyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Matt Fleming
     

17 Sep, 2011

1 commit


16 Sep, 2011

1 commit

  • On x86-64, they were just wasteful: with the explicitly added (now
    unnecessary) padding, the size of the alternatives structure was 16
    bytes, and an alignment of 8 bytes didn't hurt much.

    However, it was still silly, since the natural size and alignment for
    the structure is actually just 12 bytes, 4-byte aligned since commit
    59e97e4d6fbc ("x86: Make alternative instruction pointers relative").
    So removing the padding, and removing the extra alignment is just a good
    idea.

    On x86-32, the alignment of 4 bytes was correct, but was incorrectly
    hardcoded as 8 bytes in . That header file had
    used to be an x86-64 only header file, but various unification efforts
    have made it be used for x86-32 too (ie the unification of rwlock and
    rwsem).

    That in turn caused x86-32 boot failures, because the extra alignment
    would result in random zero-filled words in the altinstructions section,
    causing oopses early at boot when doing alternative instruction
    replacement.

    So just remove all the alignment noise entirely. It's wrong, and it's
    unnecessary. The section itself is already properly aligned by the
    linker scripts, and all additions to the section had better be of the
    proper 12-byte format, keeping it aligned. So if the align directive
    were to ever make a difference, that would be an indication of a serious
    bug to begin with.

    Reported-by: Werner Landgraf
    Acked-by: Andrew Lutomirski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

15 Sep, 2011

1 commit


13 Sep, 2011

2 commits

  • The patch "xen: use maximum reservation to limit amount of usable RAM"
    (d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
    do not use 'dom0_mem=' argument with:

    reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
    (XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
    (XEN) domain_crash_sync called from entry.S
    (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
    ...

    The reason being that the last E820 entry is created using the
    'extra_pages' (which is based on how many pages have been freed).
    The mentioned git commit sets the initial value of 'extra_pages'
    using a hypercall which returns the number of pages (if dom0_mem
    has been used) or -1 otherwise. If the later we return with
    MAX_DOMAIN_PAGES as basis for calculation:

    return min(max_pages, MAX_DOMAIN_PAGES);

    and use it:

    extra_limit = xen_get_max_pages();
    if (extra_limit >= max_pfn)
    extra_pages = extra_limit - max_pfn;
    else
    extra_pages = 0;

    which means we end up with extra_pages = 128GB in PFNs (33554432)
    - 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
    and then we add that value to the E820 making it:

    Xen: 00000000ff000000 - 0000000100000000 (reserved)
    Xen: 0000000100000000 - 000000133f2e2000 (usable)

    which is clearly wrong. It should look as so:

    Xen: 00000000ff000000 - 0000000100000000 (reserved)
    Xen: 0000000100000000 - 000000027fbda000 (usable)

    Naturally this problem does not present itself if dom0_mem=max:X
    is used.

    CC: stable@kernel.org
    Signed-off-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    David Vrabel
     
  • * 'upstream/bugfix' of git://github.com/jsgf/linux-xen:
    xen: use non-tracing preempt in xen_clocksource_read()

    Linus Torvalds
     

10 Sep, 2011

1 commit

  • Commit b03e7495a862 ("PCI: Set PCI-E Max Payload Size on fabric")
    introduced a potential NULL pointer dereference in calls to
    pcie_bus_configure_settings due to attempts to access pci_bus self
    variables when the self pointer is NULL.

    To correct this, verify that the self pointer in pci_bus is non-NULL
    before dereferencing it.

    Reported-by: Stanislaw Gruszka
    Signed-off-by: Shyam Iyer
    Signed-off-by: Jon Mason
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Shyam Iyer
     

09 Sep, 2011

1 commit

  • PV spinlocks cannot possibly work with the current code because they are
    enabled after pvops patching has already been done, and because PV
    spinlocks use a different data structure than native spinlocks so we
    cannot switch between them dynamically. A spinlock that has been taken
    once by the native code (__ticket_spin_lock) cannot be taken by
    __xen_spin_lock even after it has been released.

    Reported-and-Tested-by: Stefan Bader
    Signed-off-by: Stefano Stabellini
    Signed-off-by: Konrad Rzeszutek Wilk

    Stefano Stabellini
     

08 Sep, 2011

1 commit