12 Apr, 2008

2 commits

  • * 'docs' of git://git.lwn.net/linux-2.6:
    Add additional examples in Documentation/spinlocks.txt
    Move sched-rt-group.txt to scheduler/
    Documentation: move rpc-cache.txt to filesystems/
    Documentation: move nfsroot.txt to filesystems/
    Spell out behavior of atomic_dec_and_lock() in kerneldoc
    Fix a typo in highres.txt
    Fixes to the seq_file document
    Fill out information on patch tags in SubmittingPatches
    Add the seq_file documentation

    Linus Torvalds
     
  • Documentation/ is a little large, and filesystems/ seems an obvious
    place for this file.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Jonathan Corbet

    J. Bruce Fields
     

05 Apr, 2008

1 commit

  • The effects of cgroup_disable=foo are:

    - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
    - foo isn't visible as an individually mountable subsystem

    As a result there will only ever be one call to foo->create(), at init time;
    all processes will stay in this group, and the group will never be mounted on
    a visible hierarchy. Any additional effects (e.g. not allocating metadata)
    are up to the foo subsystem.

    This doesn't handle early_init subsystems (their "disabled" bit isn't set),
    but it could easily be extended to do so if any of the early_init subsystems
    wanted it; I think it would just involve some nastier parameter processing,
    since it would occur before the command-line argument parser had been run.
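    The disabling step described above can be modelled as a small user-space
    sketch. It assumes cgroup_disable= takes a comma-separated list of
    subsystem names; struct subsys and cgroup_disable_list() are hypothetical
    stand-ins, not the kernel's cgroup code. early_init subsystems are
    deliberately skipped, matching the limitation stated above.

```c
#include <string.h>

/* Hypothetical, simplified subsystem descriptor: only the fields needed
 * to illustrate cgroup_disable= handling are modelled. */
struct subsys {
	const char *name;
	int early_init;	/* early_init subsystems are not handled */
	int disabled;
};

/* Mark every subsystem named in a comma-separated list as disabled.
 * The list is modified in place (commas become NULs). Returns the
 * number of subsystems that were newly disabled. */
static int cgroup_disable_list(char *list, struct subsys *ss, int nss)
{
	int count = 0;
	char *tok = list;

	while (tok && *tok) {
		char *comma = strchr(tok, ',');

		if (comma)
			*comma = '\0';
		for (int i = 0; i < nss; i++) {
			if (!ss[i].early_init && !ss[i].disabled &&
			    strcmp(ss[i].name, tok) == 0) {
				ss[i].disabled = 1;
				count++;
			}
		}
		tok = comma ? comma + 1 : NULL;
	}
	return count;
}
```

    A disabled subsystem would then still get its single create() call at
    init time, but would never be offered for mounting.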

    Hugh said:

    Ballpark figures, I'm trying to get this question out rather than
    processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
    to the affected paths, booting with cgroup_disable=memory cuts that back to
    1% overhead (due to slightly bigger struct page).

    I'm no expert on distros, they may have no interest whatever in
    CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
    without it, or apply the cgroup_disable=memory patches.

    UnixBench's execl test results on x86_64:

    == just after boot, without mounting any cgroup fs ==
                                         BASELINE    RESULT    INDEX
    mem_cgroup=off : Execl Throughput    43.0    3150.1    732.6
    mem_cgroup=on  : Execl Throughput    43.0    2932.6    682.0
    ==
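    As a rough reading of the numbers above, assuming the three columns are
    UnixBench's baseline, raw result and index (the index values are
    consistent with index = 10 * result / baseline), enabling the memory
    controller costs about 7% of Execl throughput in this run:

```latex
\[
\frac{3150.1 - 2932.6}{3150.1} = \frac{217.5}{3150.1} \approx 0.069 \approx 6.9\%
\]
\[
10 \times \frac{3150.1}{43.0} \approx 732.6, \qquad
10 \times \frac{2932.6}{43.0} = 682.0
\]
```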

    [lizf@cn.fujitsu.com: fix boot option parsing]
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Sudhir Kumar
    Cc: YAMAMOTO Takashi
    Cc: David Rientjes
    Signed-off-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

02 Apr, 2008

1 commit

  • Some time ago it turned out that our suspend code ordering broke some
    NVidia-based systems that hung if _PTS was executed with one of the PCI
    devices, specifically a USB controller, in a low power state.

    Then, it was noticed that the suspend code ordering was not compliant
    with ACPI 1.0, although it was compliant with ACPI 2.0 (and later), and
    it was argued that the code had to be changed for that reason (ref.
    http://bugzilla.kernel.org/show_bug.cgi?id=9528).

    So we did, but evidently we got it wrong, because it's now turning out that
    some systems have been broken by this change. Refs:
    http://bugzilla.kernel.org/show_bug.cgi?id=10340
    https://bugzilla.novell.com/show_bug.cgi?id=374217#c16

    [ I said at that time that something like this might happen, but the
    majority of people involved thought that it was improbable due to the
    necessity to preserve the compliance of hardware with ACPI 1.0. ]

    This is actually quite a serious regression from 2.6.24.

    Moreover, the ACPI 1.0 ordering of suspend code introduced another issue
    that I have only noticed recently. Namely, if the suspend of one of the
    devices fails, the already suspended devices will be resumed without
    executing _WAK first, which leads to problems on some systems (for
    example, in such situations thermal management is broken on my HP
    nx6325). Consequently, it also breaks suspend debugging on the affected
    systems.

    Note also that the requirement to execute _PTS before suspending
    devices does not really make sense, because the device in question may
    be put into a low power state at run time for a reason unrelated to a
    system-wide suspend.

    For the reasons outlined above, the change of the suspend ordering
    should be reverted, which is done by the patch below.

    [ Felix Möller: "I am the reporter from the original Novell Bug:

    https://bugzilla.novell.com/show_bug.cgi?id=374217

    I just tried current git head (two hours ago) with the patch (the one
    from the beginning of this thread) from Rafael and without it. With
    the patch my MacBook does suspend without it does not." ]

    Signed-off-by: Rafael J. Wysocki
    Tested-by: Felix Möller
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

25 Mar, 2008

1 commit


18 Mar, 2008

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: ALPS - fix forward/back buttons reversed on Acer 5520-5290
    Input: ALPS - put secondary device in proper place in sysfs
    Input: wacom - add support for Bamboo1, BambooFun, and Cintiq 12WX
    Input: document i8042.noloop
    Input: add keyboard notifier documentation
    Input: ads7846 - fix uninitialized var warning
    Input: i8042 - add SNI RM support
    Input: i8042 - add Lenovo 3000 N100 to nomux blacklist
    Input: i8042 - fix warning on non-x86 builds
    Input: cobalt_btns - assorted fixes

    Linus Torvalds
     

16 Mar, 2008

1 commit

  • This essentially reverts commit 71fc47a9adf8ee89e5c96a47222915c5485ac437
    ("ACPI: basic initramfs DSDT override support"), because the code simply
    isn't ready.

    It did ugly things to the init sequence to populate the rootfs image
    early, but that just ended up showing other problems with the whole
    approach. The fact is, the VFS layer simply isn't initialized this
    early, and the relevant ACPI code should either run much later, or this
    shouldn't be done at all.

    For 2.6.25, we'll just pick the latter option. We can revisit this
    concept later if necessary.

    Cc: Dave Hansen
    Cc: Tilman Schmidt
    Cc: Andrew Morton
    Cc: Thomas Renninger
    Cc: Eric Piel
    Cc: Len Brown
    Cc: Christoph Hellwig
    Cc: Markus Gaugusch
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Mar, 2008

1 commit


13 Mar, 2008

1 commit


08 Mar, 2008

1 commit


21 Feb, 2008

1 commit

    This patch implements the libata.force module parameter, which can
    selectively override ATA port, link and device configurations
    including cable type, SATA PHY SPD limit, transfer mode and NCQ.

    For example, you can say "use 1.5Gbps for all fan-out ports attached
    to the second port but allow 3.0Gbps for the PMP device itself, oh,
    the device attached to the third fan-out port chokes on NCQ and
    shouldn't go over UDMA4" with the following:

    libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4
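    A minimal sketch of how such a string could be split into per-ID
    settings, written as user-space C. struct force_ent and parse_force()
    are hypothetical names, not libata's parser, and the rule that an entry
    without an ID inherits the previous entry's ID is inferred from the
    "2.03:noncq,udma4" part of the example, where udma4 applies to the same
    device as noncq. Values such as "1.5g" are kept as strings; interpreting
    them is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one libata.force entry.
 * port/device == -1 means "not specified", i.e. match everything. */
struct force_ent {
	int port;
	int device;
	char val[16];
};

/* Parse "[PORT[.DEVICE]:]VAL[,...]" into ents[]. Returns the number of
 * entries, or -1 on a malformed string or too many entries. An entry
 * without an ID reuses the previous entry's ID (assumption, see above). */
static int parse_force(const char *s, struct force_ent *ents, int max)
{
	int n = 0, port = -1, device = -1;

	while (*s) {
		const char *end = strchr(s, ',');
		size_t len = end ? (size_t)(end - s) : strlen(s);
		const char *colon = memchr(s, ':', len);
		const char *val = s;

		if (n == max)
			return -1;
		if (colon) {
			char *p;

			port = (int)strtol(s, &p, 10);
			if (p == s)
				return -1;		/* missing PORT */
			device = -1;
			if (*p == '.') {
				char *q;

				device = (int)strtol(p + 1, &q, 10);
				if (q == p + 1)
					return -1;	/* missing DEVICE */
				p = q;
			}
			if (p != colon)
				return -1;		/* junk inside the ID */
			val = colon + 1;
		}
		len -= (size_t)(val - s);	/* length of VAL only */
		if (len == 0 || len >= sizeof(ents[n].val))
			return -1;
		ents[n].port = port;
		ents[n].device = device;
		memcpy(ents[n].val, val, len);
		ents[n].val[len] = '\0';
		n++;
		s = end ? end + 1 : val + len;	/* next entry or end of string */
	}
	return n;
}
```

    Parsing the example string yields four entries: port 2 with any device
    and "1.5g", 2.15 with "3.0g", 2.03 with "noncq", and 2.03 with "udma4".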

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

19 Feb, 2008

1 commit

  • This patch removes the mca-pentium boot option that was a noop.

    Besides the source code cleanup factor, this saves some text as well:

    arch/x86/kernel/cpu/bugs.o:
        text    data     bss     dec     hex  filename
         651      77       4     732     2dc  bugs.o.before
         631      53       4     688     2b0  bugs.o.after

    Signed-off-by: Adrian Bunk
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Adrian Bunk
     

09 Feb, 2008

1 commit


07 Feb, 2008

3 commits


06 Feb, 2008

1 commit

    With the module_param macro, the __setup code can now be removed:
    const __setup("all-generic-ide", ide_generic_all_on);

    The module name "generic.ko" does not describe its functionality; it
    can be changed in the Makefile, and "ide-pci-generic.ko" is better.

    The ide-pci-generic.all-generic-ide parameter is also documented
    in Documentation/kernel-parameters.txt.

    Signed-off-by: Denis Cheng
    Cc: Greg Kroah-Hartman
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Denis Cheng
     

03 Feb, 2008

3 commits


02 Feb, 2008

1 commit

  • The ACPI 1.0 specification wants us to put devices into low power
    states after executing the _PTS global control method, while ACPI
    2.0 and later want us to do that in the reverse order. The current
    suspend code follows ACPI 2.0 in that respect, which causes some
    ACPI 1.0x systems to hang during suspend (ref.
    http://bugzilla.kernel.org/show_bug.cgi?id=9528).

    Make the suspend code execute _PTS before putting devices into low
    power states (i.e. in accordance with ACPI 1.0x) and provide a command
    line option to override the default if need be.
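    The ordering choice can be sketched as follows. The names are
    hypothetical, and the real code runs ACPI methods and device callbacks
    rather than filling an array; the sketch only shows which step comes
    first by default and what the command-line override changes.

```c
/* Hypothetical identifiers for the two steps whose order differs. */
enum suspend_step { STEP_PTS, STEP_SUSPEND_DEVICES };

/* Fill seq[] with the step order. By default (acpi2_order == 0) _PTS runs
 * before devices are put into low power states (ACPI 1.0x ordering); the
 * override sets acpi2_order to get the ACPI 2.0 order instead. */
static void suspend_order(int acpi2_order, enum suspend_step seq[2])
{
	seq[0] = acpi2_order ? STEP_SUSPEND_DEVICES : STEP_PTS;
	seq[1] = acpi2_order ? STEP_PTS : STEP_SUSPEND_DEVICES;
}
```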

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Rafael J. Wysocki
     

31 Jan, 2008

1 commit


30 Jan, 2008

11 commits

    The new "mfgptfix" boot command line option may be used to fix MFGPT
    timers on AMD Geode platforms when the BIOS has incorrectly applied
    a workaround. TinyBIOS version 0.98 is known to be affected; 0.99
    fixes the problem by letting the user disable the workaround.

    Signed-off-by: Willy Tarreau
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Willy Tarreau
     
    When the MTRRs do not cover the whole e820 table, we need to trim the
    RAM and update e820.

    Reuse some of the code on 64-bit as well.

    This requires adding early_get_cap, using it in early_cpu_detect, and
    moving mtrr_bp_init earlier.

    The code successfully trimmed the memory map on Justin's system:

    from:

    [ 0.000000] BIOS-e820: 0000000100000000 - 000000022c000000 (usable)

    to:

    [ 0.000000] modified: 0000000100000000 - 0000000228000000 (usable)
    [ 0.000000] modified: 0000000228000000 - 000000022c000000 (reserved)
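    The trimming shown in the log above amounts to splitting one usable
    e820 entry at the end of write-back MTRR coverage. A minimal sketch,
    assuming a single usable range and a known WB coverage end; struct
    range, the type names and trim_to_wb() are hypothetical, not the
    kernel's e820 code.

```c
#include <stdint.h>

/* Hypothetical e820-style entry covering [start, end). */
enum range_type { RANGE_USABLE, RANGE_RESERVED };
struct range {
	uint64_t start, end;
	enum range_type type;
};

/* Split r at wb_end, the end of write-back MTRR coverage. Writes one or
 * two entries to out[] and returns how many: memory below wb_end stays
 * usable, memory above it is marked reserved. */
static int trim_to_wb(struct range r, uint64_t wb_end, struct range out[2])
{
	if (r.type != RANGE_USABLE || wb_end >= r.end) {
		out[0] = r;		/* not RAM, or fully covered: unchanged */
		return 1;
	}
	if (wb_end <= r.start) {
		out[0] = r;		/* no WB coverage at all */
		out[0].type = RANGE_RESERVED;
		return 1;
	}
	out[0] = (struct range){ r.start, wb_end, RANGE_USABLE };
	out[1] = (struct range){ wb_end, r.end, RANGE_RESERVED };
	return 2;
}
```

    With the values from the log (usable 0x100000000-0x22c000000, WB
    coverage ending at 0x228000000) this produces the same two entries as
    the "modified:" lines.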

    According to Justin it makes quite a difference:

    | When I boot the box without any trimming it acts like a 286 or 386,
    | takes about 10 minutes to boot (using raptor disks).

    Signed-off-by: Yinghai Lu
    Tested-by: Justin Piszcz
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Yinghai Lu
     
  • Add a generic option to clear any cpuid bit. I added it because it was
    very easy to add with the new generic cpuid disable bitmap and perhaps
    it will be useful in the future.
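    A rough model of a generic "clear this cpuid bit" option on top of a
    capability bitmap. The option's name is not given above, so the sketch
    only takes the bit number as a decimal string; CAP_WORDS, struct
    cpu_caps, clear_cpuid_bit() and has_cap() are hypothetical. Bit N lives
    in word N / 32 at position N % 32.

```c
#include <stdlib.h>

#define CAP_WORDS 8	/* hypothetical number of 32-bit capability words */

/* Hypothetical capability bitmap plus a mask of bits disabled by the user. */
struct cpu_caps {
	unsigned int cap[CAP_WORDS];
	unsigned int disabled[CAP_WORDS];
};

/* Parse a decimal bit number and disable it; returns 0, or -1 if invalid. */
static int clear_cpuid_bit(struct cpu_caps *c, const char *arg)
{
	char *end;
	long bit = strtol(arg, &end, 10);

	if (end == arg || *end != '\0' || bit < 0 || bit >= CAP_WORDS * 32)
		return -1;
	c->disabled[bit / 32] |= 1u << (bit % 32);
	return 0;
}

/* A capability counts as present only if it is set and not disabled. */
static int has_cap(const struct cpu_caps *c, int bit)
{
	unsigned int word = c->cap[bit / 32] & ~c->disabled[bit / 32];

	return (word >> (bit % 32)) & 1u;
}
```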

    Signed-off-by: Andi Kleen
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Andi Kleen
     
  • To disable CLFLUSH usage, especially in change_page_attr().

    Signed-off-by: Andi Kleen
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Andi Kleen
     
    On some machines, buggy BIOSes don't properly set up WB MTRRs to cover all
    available RAM, meaning the last few megs (or even gigs) of memory will be
    marked uncached. Since Linux tends to allocate from high memory addresses
    first, this causes the machine to be unusably slow as soon as the kernel
    starts really using memory (i.e. right around init time).

    This patch works around the problem by scanning the MTRRs at boot and
    figuring out whether the current end_pfn value (set up by early e820 code)
    goes beyond the highest WB MTRR range, and if so, trimming it to match. A
    fairly obnoxious KERN_WARNING is printed too, letting the user know that
    not all of their memory is available due to a likely BIOS bug.
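    The end_pfn check described above, reduced to its core: find the
    highest end of any WB range and clamp end_pfn to it. This sketch
    assumes the WB ranges cover memory contiguously from address 0, which
    real firmware setups need not guarantee; struct mtrr_range and
    trim_end_pfn() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical variable-range MTRR description, in page frame numbers. */
enum memtype { MEMTYPE_UC, MEMTYPE_WB };
struct mtrr_range {
	uint64_t base_pfn, size_pfn;
	enum memtype type;
};

/* Return end_pfn clamped to the highest page covered by a WB range.
 * If no WB range exists, end_pfn is returned unchanged. */
static uint64_t trim_end_pfn(uint64_t end_pfn, const struct mtrr_range *m, int n)
{
	uint64_t highest_wb = 0;

	for (int i = 0; i < n; i++) {
		uint64_t end = m[i].base_pfn + m[i].size_pfn;

		if (m[i].type == MEMTYPE_WB && end > highest_wb)
			highest_wb = end;
	}
	if (highest_wb && end_pfn > highest_wb)
		return highest_wb;	/* per the text, a KERN_WARNING is printed here */
	return end_pfn;
}
```

    For example, with RAM ending at page 0x22c000 and WB coverage ending at
    page 0x228000 (4K pages), end_pfn is trimmed to 0x228000.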

    Something similar could be done on i386 if needed, but the boot ordering
    would be slightly different, since the MTRR code on i386 depends on the
    boot_cpu_data structure being set up.

    This patch fixes a bug in the last patch that caused the code to run on
    non-Intel machines (AMD machines apparently don't need it and it's untested
    on other non-Intel machines, so best keep it off).

    Further enhancements and fixes from:

    Yinghai Lu
    Andi Kleen

    Signed-off-by: Jesse Barnes
    Tested-by: Justin Piszcz
    Cc: Andi Kleen
    Cc: "Eric W. Biederman"
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Jesse Barnes
     
    On K8 systems with 4G of RAM and memory hole remapping enabled, or with
    more than 4G of RAM installed:

    When kexec is used to boot a second kernel and the first kernel doesn't
    include gart_shutdown, the second kernel could choose a different aperture
    position than the first kernel, and could then use that hole as RAM even
    though it is still in use by the GART set up by the first kernel. This
    happens especially when kexec'ing 2.6.24 with sparsemem enabled from a
    previous kernel (from RHEL 5 or SLES 10): the new kernel uses the aperture
    set up by the first kernel's GART for vmemmap, and after the new kernel
    sets up a new GART, that position becomes real RAM and the _mapcount
    setting is lost.

    Bad page state in process 'swapper'
    page:ffffe2000e600020 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0
    Trying to fix it up, but a reboot is needed
    Backtrace:
    Pid: 0, comm: swapper Not tainted 2.6.24-rc7-smp-gcdf71a10-dirty #13

    Call Trace:
    [] bad_page+0x63/0x8d
    [] __free_pages_ok+0x7c/0x2a5
    [] free_all_bootmem_core+0xd0/0x198
    [] numa_free_all_bootmem+0x3b/0x76
    [] mem_init+0x3b/0x152
    [] start_kernel+0x236/0x2c2
    [] _sinittext+0x11a/0x121

    and
    [ffffe2000e600000-ffffe2000e7fffff] PMD ->ffff81001c200000 on node 0
    phys addr is : 0x1c200000

    RHEL 5.1 kernel -53 said:
    PCI-DMA: aperture base @ 1c000000 size 65536 KB

    new kernel said:
    Mapping aperture over 65536 KB of RAM @ 3c000000

    So we could try to disable that GART if possible.

    According to Ingo:

    > hm, i'm wondering, instead of modifying the GART, why dont we simply
    > _detect_ whatever GART settings we have inherited, and propagate that
    > into our e820 maps? I.e. if there's inconsistency, then punch that out
    > from the memory maps and just dont use that memory.
    >
    > that way it would not matter whether the GART settings came from a [old
    > or crashing] Linux kernel that has not called gart_iommu_shutdown(), or
    > whether it's a BIOS that has set up an aperture hole inconsistent with
    > the memory map it passed. (or the memory map we _think_ i tried to pass
    > us)
    >
    > it would also be more robust to only read and do a memory map quirk
    > based on that, than actively trying to change the GART so early in the
    > bootup. Later on we have to re-enable the GART _anyway_ and have to
    > punch a hole for it.
    >
    > and as a bonus, we would have shored up our defenses against crappy
    > BIOSes as well.

    Add an e820 modification for inconsistent GART settings.

    gart_fix_e820=off can be used to disable the e820 fix.
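    Ingo's suggestion (detect the inherited aperture and punch it out of
    the memory map instead of reprogramming the GART) boils down to a range
    intersection. A minimal user-space sketch under that reading; struct
    phys_range and aperture_conflict() are hypothetical names, not the
    patch's code. The test data uses the aperture reported by the RHEL 5.1
    kernel above (base 0x1c000000, 65536 KB).

```c
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct phys_range {
	uint64_t start, end;
};

/* Return the part of usable RAM covered by the inherited GART aperture,
 * or an empty range (start == end) if they do not overlap. A non-empty
 * result is what would be marked reserved in the e820 map. */
static struct phys_range aperture_conflict(struct phys_range ap,
					   struct phys_range ram)
{
	struct phys_range r;

	r.start = ap.start > ram.start ? ap.start : ram.start;
	r.end = ap.end < ram.end ? ap.end : ram.end;
	if (r.start >= r.end)
		r.start = r.end = 0;	/* no overlap */
	return r;
}
```

    65536 KB is 0x4000000 bytes, so that aperture spans 0x1c000000 to
    0x20000000.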

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Yinghai Lu
     
    The 32 bit x86 tree has a very useful feature that prints the Code: line
    for the code even before the trapping instruction (and the start of the
    trapping instruction is then denoted with a <>). Unfortunately, the 64 bit
    x86 tree does not yet have this feature, making diagnosing backtraces harder
    than needed.

    This patch adds this feature in the same way as the 32 bit tree has it
    (including the same kernel boot parameter), and includes a bugfix
    to make the code use probe_kernel_address() rather than a buggy (deadlocking)
    __get_user.
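    The <> marker convention can be illustrated with a small formatter. This
    is a hypothetical sketch, not the kernel's oops printer: it does not read
    kernel memory (the patch uses probe_kernel_address() for that) and simply
    takes the bytes and the offset of the trapping instruction as arguments.

```c
#include <stddef.h>
#include <stdio.h>

/* Format code bytes as space-separated hex, wrapping the byte at ip_off
 * (the first byte of the trapping instruction) in <>. Returns the number
 * of characters written, or -1 if buf is too small. */
static int format_code_line(char *buf, size_t len, const unsigned char *code,
			    size_t n, size_t ip_off)
{
	size_t pos = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		const char *sep = i ? " " : "";
		int w = (i == ip_off)
			? snprintf(buf + pos, len - pos, "%s<%02x>", sep, (unsigned)code[i])
			: snprintf(buf + pos, len - pos, "%s%02x", sep, (unsigned)code[i]);

		if (w < 0 || (size_t)w >= len - pos)
			return -1;	/* output would be truncated */
		pos += (size_t)w;
	}
	return (int)pos;
}
```

    For the bytes 55 48 89 e5 with the trapping instruction starting at
    offset 2, it produces "55 48 <89> e5".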

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     
  • support according to fixes of x86_64 support.

    - Delete efi_rt_lock because it is used during early system boot,
    before SMP is initialized.

    - Change local_flush_tlb() to __flush_tlb_all() to flush global page
    mapping.

    - Clean up includes.

    - Revise Kconfig description.

    - Enable noefi kernel parameter on i386.

    Signed-off-by: Huang Ying
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Huang, Ying
     
  • This makes x86_64's ia32 emulation support share the sources used in the
    32-bit kernel for the 32-bit vDSO and much of its setup code.

    The 32-bit vDSO mapping now behaves the same on x86_64 as on native 32-bit.
    The abi.syscall32 sysctl on x86_64 now takes the same values that
    vm.vdso_enabled takes on the 32-bit kernel. That is, 1 means a randomized
    vDSO location, 2 means the fixed old address. The CONFIG_COMPAT_VDSO
    option is now available to make this the default setting, the same meaning
    it has for the 32-bit kernel. (This does not affect the 64-bit vDSO.)

    The argument vdso32=[012] can be used on both 32-bit and 64-bit kernels to
    set this parameter at boot time. The vdso=[012] argument still does this
    same thing on the 32-bit kernel.
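    A sketch of validating the vdso32= value. The text above only spells out
    1 (randomized location) and 2 (fixed old address); treating 0 as "no
    32-bit vDSO mapping" is an assumption here, as are the enum and
    parse_vdso32() names.

```c
#include <stdlib.h>

/* Values accepted by vdso32=; VDSO32_OFF's meaning is assumed. */
enum vdso32_mode {
	VDSO32_OFF = 0,
	VDSO32_RANDOMIZED = 1,	/* randomized vDSO location */
	VDSO32_FIXED_OLD = 2,	/* fixed old address */
};

/* Parse the value of vdso32=; returns the mode, or -1 if it is not 0, 1 or 2. */
static int parse_vdso32(const char *s)
{
	char *end;
	long v = strtol(s, &end, 10);

	if (end == s || *end != '\0' || v < VDSO32_OFF || v > VDSO32_FIXED_OLD)
		return -1;
	return (int)v;
}
```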

    Signed-off-by: Roland McGrath
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Roland McGrath
     
    Various changes to the in_p/out_p delay details:

    - add the io_delay=none method
    - make each method selectable from the kernel config
    - simplify the delay code a bit by getting rid of an indirect function call
    - add the /proc/sys/kernel/io_delay_type sysctl
    - change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed
    - make the io delay config not depend on CONFIG_DEBUG_KERNEL
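    A sketch of mapping the io_delay= strings to methods after this change.
    The enum and parse_io_delay() are hypothetical; "udelay" is assumed to
    remain available as the command-line choice mentioned in the DMI commit,
    and the old "standard"/"alternate" spellings are rejected because this
    change replaces them.

```c
#include <string.h>

/* Hypothetical identifiers for the io_delay= methods. */
enum io_delay_type {
	IO_DELAY_0X80,
	IO_DELAY_0XED,
	IO_DELAY_UDELAY,
	IO_DELAY_NONE,
	IO_DELAY_INVALID,
};

/* Translate the io_delay= value into a method; unknown strings are invalid. */
static enum io_delay_type parse_io_delay(const char *s)
{
	static const struct {
		const char *name;
		enum io_delay_type type;
	} map[] = {
		{ "0x80", IO_DELAY_0X80 },
		{ "0xed", IO_DELAY_0XED },
		{ "udelay", IO_DELAY_UDELAY },
		{ "none", IO_DELAY_NONE },
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strcmp(s, map[i].name) == 0)
			return map[i].type;
	return IO_DELAY_INVALID;
}
```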

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Tested-by: "David P. Reed"

    Ingo Molnar
     
  • x86: provide a DMI based port 0x80 I/O delay override.

    Certain (HP) laptops experience trouble from our port 0x80 I/O delay
    writes. This patch provides for a DMI based switch to the "alternate
    diagnostic port" 0xed (as used by some BIOSes as well) for these.

    David P. Reed confirmed that port 0xed works for him and provides a
    proper delay. The symptoms of _not_ working are a hanging machine,
    with "hwclock" use being a direct trigger.

    Earlier versions of this attempted to simply use udelay(2), with the
    2 being a value tested to be a nicely conservative upper bound with
    help from many on the linux-kernel mailing list, but that approach has
    two problems.

    First, before loops_per_jiffy calibration (which happens after PIT init,
    while some implementations of the PIT are actually among the historically
    problematic devices that need the delay), udelay() isn't particularly
    well-defined. We could initialise loops_per_jiffy conservatively (and
    based on CPU family so as to not unduly delay old machines) which
    would sort of work, but...

    Second, delaying isn't the only effect that a write to port 0x80 has.
    It's also a PCI posting barrier which some devices may be explicitly
    or implicitly relying on. Alan Cox did a survey and found evidence
    that additionally some drivers may be racy on SMP without the bus
    locking outb.

    Switching to an inb() makes the timing too unpredictable and as such,
    this DMI based switch should be the safest approach for now. Any more
    invasive changes should get more rigorous testing first. Moreover, only
    very few machines have the problem, and a DMI based hack seems to fit
    that situation.

    This also introduces a command-line parameter "io_delay" to override
    the DMI based choice again:

    io_delay=standard|alternate

    where "standard" means using the standard port 0x80 and "alternate"
    port 0xed.
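    The precedence described above (a DMI match picks the default port, the
    command line overrides it) can be sketched like this. Matching on a
    single product-name string is a simplification; the table entry and
    choose_io_delay_port() are hypothetical and do not reproduce the patch's
    actual DMI match fields.

```c
#include <string.h>

enum io_port { PORT_0X80 = 0x80, PORT_0XED = 0xed };

/* Hypothetical DMI product names that need the alternate port. */
static const char *const alt_port_products[] = { "HP Pavilion dv9000z" };

/* dmi_product may be NULL; cmdline is NULL if io_delay= was not given,
 * else "standard" or "alternate". Returns the port to use. */
static int choose_io_delay_port(const char *dmi_product, const char *cmdline)
{
	int port = PORT_0X80;	/* default: the standard diagnostic port */

	for (size_t i = 0; i < sizeof(alt_port_products) / sizeof(alt_port_products[0]); i++)
		if (dmi_product && strcmp(dmi_product, alt_port_products[i]) == 0)
			port = PORT_0XED;	/* DMI-based switch */

	/* An explicit command-line choice overrides the DMI decision. */
	if (cmdline && strcmp(cmdline, "standard") == 0)
		port = PORT_0X80;
	else if (cmdline && strcmp(cmdline, "alternate") == 0)
		port = PORT_0XED;
	return port;
}
```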

    This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and
    command-line ("io_delay=udelay") choice for testing purposes as well.

    This does not change the io_delay() in the boot code, which uses the
    same port 0x80 I/O delay, but those calls do not appear to be a problem:
    David P. Reed reported that the problem was already gone after using the
    udelay version. He moreover reported that booting with "acpi=off" also
    fixed things, and seeing as ACPI isn't touched until after this DMI
    based I/O port switch, I believe it's safe to leave the ones in the boot
    code be.

    The DMI strings from David's HP Pavilion dv9000z are in there already
    and we need to get/verify the DMI info from other machines with the
    problem, notably the HP Pavilion dv6000z.

    This patch is partly based on earlier patches from Pavel Machek and
    David P. Reed.

    Signed-off-by: Rene Herman
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Rene Herman
     

26 Jan, 2008

3 commits

  • Information about a ccw device will be dumped in
    case of a ccw timeout. This can be enabled with
    the kernel parameter ccw_timeout_log.

    Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     
  • Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
    [SCSI] usbstorage: use last_sector_bug flag universally
    [SCSI] libsas: abstract STP task status into a function
    [SCSI] ultrastor: clean up inline asm warnings
    [SCSI] aic7xxx: fix firmware build
    [SCSI] aacraid: fib context lock for management ioctls
    [SCSI] ch: remove forward declarations
    [SCSI] ch: fix device minor number management bug
    [SCSI] ch: handle class_device_create failure properly
    [SCSI] NCR5380: fix section mismatch
    [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
    [SCSI] IB/iSER: add logical unit reset support
    [SCSI] don't use __GFP_DMA for sense buffers if not required
    [SCSI] use dynamically allocated sense buffer
    [SCSI] scsi.h: add macro for enclosure bit of inquiry data
    [SCSI] sd: add fix for devices with last sector access problems
    [SCSI] fix pcmcia compile problem
    [SCSI] aacraid: add Voodoo Lite class of cards.
    [SCSI] aacraid: add new driver features flags
    [SCSI] qla2xxx: Update version number to 8.02.00-k7.
    [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
    ...

    Linus Torvalds
     

25 Jan, 2008

1 commit

    Change the NMI handler to use the die notifier chain to signal anyone
    who cares. Add a simple "nmi debugger" which hooks into this chain and
    may dump registers, task state, etc. when an NMI happens.
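    A simplified user-space model of a notifier chain, to show the shape of
    the mechanism: subscribers register a callback and the event path calls
    each of them in turn. The kernel's die notifier chain has its own API,
    priorities and return-code conventions; struct notifier,
    chain_register(), chain_call() and count_cb() here are hypothetical.

```c
#include <stddef.h>

/* One subscriber: a callback plus a link to the next subscriber. */
struct notifier {
	int (*call)(struct notifier *self, unsigned long event, void *data);
	struct notifier *next;
};

/* Add a subscriber at the head of the chain. */
static void chain_register(struct notifier **head, struct notifier *n)
{
	n->next = *head;
	*head = n;
}

/* Invoke every subscriber; stop early if one returns nonzero
 * (a hypothetical "stop" convention). Returns the last value seen. */
static int chain_call(struct notifier *head, unsigned long event, void *data)
{
	int ret = 0;

	for (struct notifier *n = head; n; n = n->next) {
		ret = n->call(n, event, data);
		if (ret)
			break;
	}
	return ret;
}

/* Example subscriber in the spirit of the "nmi debugger": it only
 * counts notifications instead of dumping state. */
static int count_cb(struct notifier *self, unsigned long event, void *data)
{
	(void)self;
	(void)event;
	++*(int *)data;
	return 0;
}
```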

    Signed-off-by: Haavard Skinnemoen

    Haavard Skinnemoen
     

24 Jan, 2008

1 commit


17 Jan, 2008

1 commit

    This adds the hugepagesz boot-time parameter for ppc64. It lets one
    pick the size for huge pages. The choices available are 64K and 16M
    when the base page size is 4k. It defaults to 16M (previously the
    only choice) if nothing or an invalid choice is specified.

    Tested 64K huge pages successfully with libhugetlbfs 1.2.
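    The selection rule above, as a user-space sketch assuming a 4K base
    page. parse_size() and pick_hugepage_size() are hypothetical helpers
    (only K and M suffixes are handled); the fallback to 16M for a missing
    or invalid value follows the description.

```c
#include <stdlib.h>

/* Parse "<number>[K|M]" into bytes; returns 0 if the string is malformed. */
static unsigned long parse_size(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 10);

	if (end == s)
		return 0;
	if (*end == 'K' || *end == 'k') {
		v <<= 10;
		end++;
	} else if (*end == 'M' || *end == 'm') {
		v <<= 20;
		end++;
	}
	return *end == '\0' ? v : 0;
}

/* Choose the huge page size, assuming a 4K base page: 64K and 16M are
 * accepted; a missing or unsupported value falls back to 16M. */
static unsigned long pick_hugepage_size(const char *arg)
{
	unsigned long sz = arg ? parse_size(arg) : 0;

	if (sz == (64UL << 10) || sz == (16UL << 20))
		return sz;
	return 16UL << 20;
}
```

    pick_hugepage_size("64K") gives 65536, while "1M" or no argument at all
    falls back to 16777216 (16M).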

    Signed-off-by: Jon Tollefson
    Signed-off-by: Paul Mackerras

    Jon Tollefson
     

12 Jan, 2008

1 commit