08 May, 2007

1 commit

  • The nr_cpu_ids value is currently only calculated in smp_init. However, it
    may be needed before (SLUB needs it on kmem_cache_init!) and other kernel
    components may also want to allocate dynamically sized per cpu array before
    smp_init. So move the determination of possible cpus into sched_init()
    where we already loop over all possible cpus early in boot.

    Also initialize both nr_node_ids and nr_cpu_ids with the highest value they
    could take. If we have accidental users before these values are determined
    then the current value of 0 may cause too small per cpu and per node arrays
    to be allocated. If it is set to the maximum possible then we only waste
    some memory for early boot users.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
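
    A minimal userspace sketch of the idea in this commit, not the kernel
    code itself: the counter starts at its compile-time maximum, so an early
    caller over-allocates instead of getting a zero-sized array, and is
    narrowed once the possible CPUs are known. The NR_CPUS value and the
    helper names determine_nr_cpu_ids() and alloc_per_cpu_counters() are
    assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 32  /* assumed compile-time maximum for this sketch */

/* Starts at the maximum so early users over-allocate rather than under-allocate. */
static unsigned int nr_cpu_ids = NR_CPUS;

/*
 * Hypothetical early-boot step: derive nr_cpu_ids from a bitmask of
 * possible CPUs (bit n set means CPU n is possible). An empty mask is
 * treated as "boot CPU only".
 */
static void determine_nr_cpu_ids(unsigned long possible_mask)
{
    unsigned int cpu, highest = 0;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        if (possible_mask & (1UL << cpu))
            highest = cpu;
    nr_cpu_ids = highest + 1;
}

/* Allocate one zeroed counter per CPU id; the caller frees the result. */
static long *alloc_per_cpu_counters(void)
{
    assert(nr_cpu_ids > 0 && nr_cpu_ids <= NR_CPUS);
    return calloc(nr_cpu_ids, sizeof(long));
}
```

    Calling alloc_per_cpu_counters() before determine_nr_cpu_ids() yields an
    array of NR_CPUS entries, which wastes some memory but is never too small.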
     

06 May, 2007

1 commit

  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
    [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
    [PATCH] i386: type may be unused
    [PATCH] i386: Some additional chipset register values validation.
    [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
    [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
    [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
    [PATCH] i386: white space fixes in i387.h
    [PATCH] i386: Drop noisy e820 debugging printks
    [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
    [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
    [PATCH] x86-64: Share identical video.S between i386 and x86-64
    [PATCH] x86-64: Remove CONFIG_REORDER
    [PATCH] x86-64: Print type and size correctly for unknown compat ioctls
    [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
    [PATCH] i386: Little cleanups in smpboot.c
    [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
    [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
    [PATCH] i386: Add X86_FEATURE_RDTSCP
    [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
    [PATCH] i386: Implement alternative_io for i386
    ...

    Fix up trivial conflict in include/linux/highmem.h manually.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 May, 2007

1 commit


03 May, 2007

7 commits

  • set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
    it at some point in their setup routine, and then the generic code sets up the
    reverse mapping from the msi_desc back to the irq.

    set_irq_msi() should do both connections, making it the one and only call
    required to connect an irq with its MSI desc and vice versa.

    The arch code MUST call set_irq_msi(), and it must do so only once it's sure
    it's not going to fail the irq allocation.

    Given that there's no need for the arch to return the irq anymore, the return
    value from the arch setup routine just becomes 0 for success and anything else
    for failure.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
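
    The two-way connection described above can be sketched in userspace as
    follows. This is only an illustration under simplified assumptions: the
    structures are cut down to the two linking fields, NR_IRQS is an
    arbitrary table size, and the _sketch function names are hypothetical
    stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_IRQS 16  /* assumed table size for this sketch */

struct msi_desc { unsigned int irq; };           /* back-pointer to the irq */
struct irq_desc { struct msi_desc *msi_desc; };  /* forward pointer to the MSI desc */

static struct irq_desc irq_table[NR_IRQS];

/* Connect irq -> msi_desc and msi_desc -> irq in a single call. */
static int set_irq_msi_sketch(unsigned int irq, struct msi_desc *entry)
{
    if (irq >= NR_IRQS)
        return -EINVAL;
    irq_table[irq].msi_desc = entry;
    if (entry)
        entry->irq = irq;
    return 0;
}

/*
 * Hypothetical arch setup routine: it no longer returns the irq, only
 * 0 for success or nonzero for failure, and it links the descriptors
 * as its last step, once nothing else can fail.
 */
static int arch_setup_msi_irq_sketch(struct msi_desc *entry, unsigned int irq)
{
    assert(entry != NULL);
    /* ... allocation steps that may fail would go here ... */
    return set_irq_msi_sketch(irq, entry);
}
```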
     
  • We need to work on cleaning up the relationship between kobjects, ksets and
    ktypes. The removal of 'struct subsystem' is the first step of this,
    especially as it is not really needed at all.

    Thanks to Kay for fixing the bugs in this patch.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Add hooks to allow a paravirt implementation to track the lifetime of
    an mm. Paravirtualization requires three hooks, but only two are
    needed in common code. They are:

    arch_dup_mmap, which is called when a new mmap is created at fork

    arch_exit_mmap, which is called when the last process reference to an
    mm is dropped, which typically happens on exit and exec.

    The third hook is activate_mm, which is called from the arch-specific
    activate_mm() macro/function, and so doesn't need stub versions for
    other architectures. It's called when an mm is first used.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: linux-arch@vger.kernel.org
    Cc: James Bottomley
    Acked-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
    Ingo suggested KVM as well).

    Because larger alignments can use more room, we increase the max per-cpu
    memory to 64k rather than 32k: it's getting a little tight.

    Signed-off-by: Rusty Russell
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Acked-by: Ingo Molnar
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Jeremy Fitzhardinge
     
  • Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
    the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common
    to all architectures; if an architecture wants to set
    PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
    the only one which does).

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell
    Cc: Eric W. Biederman
    Cc: Andi Kleen

    Jeremy Fitzhardinge
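
    As a rough worked example of the sum described above, the sketch below
    computes the room as the kernel's own per-cpu size plus a module
    reserve. The 64-byte rounding and the 8192-byte reserve are assumed
    values chosen for illustration, and percpu_enough_room() is a
    hypothetical helper, not the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_BYTES    64    /* assumed cache line size */
#define SKETCH_MODULE_RESERVE 8192  /* assumed reserve for module per-cpu data */

/* Round n up to a multiple of a, where a is a power of two. */
static size_t align_up(size_t n, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* Per-cpu room = (aligned) kernel per-cpu data + space reserved for modules. */
static size_t percpu_enough_room(size_t kernel_percpu_bytes)
{
    return align_up(kernel_percpu_bytes, SKETCH_CACHE_BYTES)
           + SKETCH_MODULE_RESERVE;
}
```

    With these assumed numbers, 1000 bytes of kernel per-cpu data rounds up
    to 1024, giving 1024 + 8192 = 9216 bytes per CPU.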
     
  • o virt_to_page() call should be used on kernel linear addresses and not
    on kernel text and data addresses. Swsusp code uses it on kernel data
    (statically allocated swsusp_header).

    o Allocate swsusp_header dynamically so that virt_to_page() can be used
    safely.

    o I am changing this because in the next few patches __pa() on x86_64 will
    no longer support kernel text and data addresses, and hibernation would break.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andi Kleen

    Vivek Goyal
     
  • o __pa() should be used only on kernel linearly mapped virtual addresses
    and not on kernel text and data addresses.

    o Hibernation code needs to determine the physical address associated
    with a kernel symbol to mark a section boundary which contains pages which
    don't have to be saved and restored during hibernate/resume operation.

    o Move this piece of code into the arch-dependent section, so that architectures
    which don't have kernel text/data mapped into the kernel linearly mapped
    region can come up with their own ways of determining the physical addresses
    associated with kernel text.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andi Kleen

    Vivek Goyal
     

01 May, 2007

5 commits

  • This patch changes the docs and behaviour from "all states valid" to "no
    states valid" if no .valid callback is assigned. Users of pm_ops that only
    need mem sleep can assign pm_valid_only_mem without any overhead, others
    will require more elaborate callbacks.

    Now that all users of pm_ops have a .valid callback this is a safe thing to
    do and prevents things from getting messy again as they were before.

    Signed-off-by: Johannes Berg
    Acked-by: Pavel Machek
    Looks-okay-to: Rafael J. Wysocki
    Cc:
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • Almost all users of pm_ops only support mem sleep, don't check in .valid and
    don't reject any others in .prepare so users can be confused if they check
    /sys/power/state, especially when new states are added (these would then
    result in s-t-r although they're supposed to be something different).

    This patch implements a generic pm_valid_only_mem function that is then
    exported for users and puts it to use in almost all existing pm_ops.

    Signed-off-by: Johannes Berg
    Cc: David Brownell
    Acked-by: Pavel Machek
    Cc: linux-pm@lists.linux-foundation.org
    Cc: Len Brown
    Acked-by: Russell King
    Cc: Greg KH
    Cc: "Rafael J. Wysocki"
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
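
    Taken together, the two pm_ops changes above amount to the following
    behaviour, sketched here in plain C. The structure is reduced to the
    single .valid callback, the numeric state values are assumptions, and
    the _sketch names are hypothetical stand-ins for the kernel functions.

```c
#include <assert.h>
#include <stddef.h>

typedef int suspend_state_t;

/* Assumed state numbering for this sketch. */
enum { PM_SUSPEND_ON = 0, PM_SUSPEND_STANDBY = 1, PM_SUSPEND_MEM = 3 };

struct pm_ops_sketch {
    int (*valid)(suspend_state_t state);
};

/* Generic callback for platforms that only support suspend to RAM. */
static int pm_valid_only_mem_sketch(suspend_state_t state)
{
    return state == PM_SUSPEND_MEM;
}

/*
 * A state is valid only if ops are registered, a .valid callback is
 * assigned, and that callback accepts the state. A missing callback
 * now means "no states valid" rather than "all states valid".
 */
static int valid_state_sketch(const struct pm_ops_sketch *ops,
                              suspend_state_t state)
{
    if (!ops || !ops->valid)
        return 0;
    return ops->valid(state) != 0;
}
```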
     
  • This patch removes the firmware disk suspend mode, which is the wrong approach:
    it is supposed to be used for implementing firmware-based disk suspend but
    cannot actually be used for that.

    Signed-off-by: Johannes Berg
    Acked-by: Pavel Machek
    Cc:
    Cc: David Brownell
    Cc: Len Brown
    Acked-by: Russell King
    Cc: Greg KH
    Cc: "Rafael J. Wysocki"
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • This patch series cleans up some misconceptions about pm_ops. Some users of
    the pm_ops structure attempt to use it to stop the user from entering suspend
    to disk. This, however, is not possible, since the user can always use
    "shutdown" in /sys/power/disk and then the pm_ops are never invoked. Also,
    platforms that don't support suspend to disk simply should not allow
    configuring SOFTWARE_SUSPEND (read the help text on it: it only selects
    suspend to disk and nothing else; all the other stuff depends on PM).

    The pm_ops structure is actually intended to provide a way to enter
    platform-defined sleep states (currently supported states are "standby" and
    "mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured)
    allows a platform to support a platform specific way to enter low-power mode
    once everything has been saved to disk. This is currently only used by ACPI
    (S4).

    This patch:

    The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really
    seems to understand what it actually does.

    This patch clarifies the pm_disk_mode description.

    It also removes all the arm and sh users that think they can veto suspend to
    disk via pm_ops. That is not so, since the user can always do echo shutdown >
    /sys/power/disk; they need to find a better way involving Kconfig or such.

    ACPI is the only user left with a non-zero pm_disk_mode.

    The patch also sets the default mode to shutdown again, but when a new pm_ops
    is registered its pm_disk_mode is selected as default; that way the default
    stays for ACPI, where it is apparently required.

    Signed-off-by: Johannes Berg
    Cc: David Brownell
    Acked-by: Pavel Machek
    Cc:
    Cc: Len Brown
    Acked-by: Russell King
    Cc: Greg KH
    Cc: "Rafael J. Wysocki"
    Acked-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • Today's print_symbol function dumps a kernel symbol with printk. This
    patch extends the functionality of kallsyms.c so that the symbol lookup
    function may be used without the printk. This is useful for modules that
    want to dump symbols elsewhere, for example, to debugfs. I intend to use
    the new function call in the GFS2 file system (which will be a separate
    patch).

    [akpm@linux-foundation.org: build fix]
    [clameter@sgi.com: sprint_symbol should return length of string like sprintf]
    Signed-off-by: Robert Peterson
    Cc: Rusty Russell
    Cc: Roman Zippel
    Cc: "Randy.Dunlap"
    Cc: Sam Ravnborg
    Acked-by: Paulo Marques
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Peterson
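
    A small userspace sketch of the "format into a buffer instead of
    printing" idea, returning the string length as sprintf does. The symbol
    table, its addresses, and the name sprint_symbol_sketch() are invented
    for illustration; the real lookup walks the kernel's symbol data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct sym {
    unsigned long start;  /* first address covered by the symbol */
    unsigned long size;   /* size of the symbol in bytes */
    const char *name;
};

/* Made-up symbol table for this sketch. */
static const struct sym sym_table[] = {
    { 0x1000UL, 0x80UL, "start_kernel" },
    { 0x2000UL, 0x40UL, "sched_init" },
};

/*
 * Write "name+0xoffset/0xsize" for addr into buf, or the raw address if
 * no symbol covers it. Returns the length snprintf reports, i.e. the
 * length of the full string even if buf was too small to hold it.
 */
static int sprint_symbol_sketch(char *buf, size_t len, unsigned long addr)
{
    size_t i;

    assert(buf != NULL && len > 0);
    for (i = 0; i < sizeof(sym_table) / sizeof(sym_table[0]); i++) {
        const struct sym *s = &sym_table[i];

        if (addr >= s->start && addr - s->start < s->size)
            return snprintf(buf, len, "%s+0x%lx/0x%lx",
                            s->name, addr - s->start, s->size);
    }
    return snprintf(buf, len, "0x%lx", addr);
}
```

    Because the result lands in a caller-supplied buffer, the caller is free
    to send it anywhere, for example to a debugfs file rather than the log.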
     

29 Apr, 2007

1 commit

  • Both old-IDE and libata should be able to handle all controllers and
    devices found using normal resource reservation methods.

    This eliminates the awful, low-performing split-driver configuration
    where old-IDE drove the PATA portion of a PCI device, in PIO-only mode,
    and libata drove the SATA portion of the /same/ PCI device, in DMA mode.
    Typically vendors would ship SATA hard drive / PATA optical
    configuration, which would lend itself to slow (PIO-only) CD-ROM
    performance.

    For Intel users running in combined mode, it is now wholly dependent on
    your driver choice (potentially link order, if you compile both drivers
    in) whether old-IDE or libata will drive your hardware.

    In either case, you will get full performance from both SATA and PATA
    ports now, without having to pass a kernel command line parameter.

    Signed-off-by: Jeff Garzik

    Jeff Garzik
     

28 Apr, 2007

5 commits

  • Fix miscellaneous networking compilation errors.

    (*) Export ktime_add_ns() for modules.

    (*) wext_proc_init() should have an ANSI declaration.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits)
    dev_dbg: check dev_dbg() arguments
    drivers/base/attribute_container.c: use mutex instead of binary semaphore
    mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs
    s2ram: add arch irq disable/enable hooks
    define platform wakeup hook, use in pci_enable_wake()
    security: prevent permission checking of file removal via sysfs_remove_group()
    device_schedule_callback() needs a module reference
    s390: cio: Delay uevents for subchannels
    sysfs: bin.c printk fix
    Driver core: use mutex instead of semaphore in DMA pool handler
    driver core: bus_add_driver should return an error if no bus
    debugfs: Add debugfs_create_u64()
    the overdue removal of the mount/umount uevents
    kobject: Comment and warning fixes to kobject.c
    Driver core: warn when userspace writes to the uevent file in a non-supported way
    Driver core: make uevent-environment available in uevent-file
    kobject core: remove rwsem from struct subsystem
    qeth: Remove usage of subsys.rwsem
    PHY: remove rwsem use from phy core
    IEEE1394: remove rwsem use from ieee1394 core
    ...

    Linus Torvalds
     
  • mod_sysfs_setup() doesn't return an errno when kobject_add_dir() for the module
    "holders" directory fails. So the caller of mod_sysfs_setup() will keep going
    and oops.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Akinobu Mita
     
  • After some more discussion this patch replaces it:

    From: Johannes Berg
    Subject: suspend: add arch irq disable/enable hooks

    For powermac, we need to do some things between suspending devices and
    device_power_off, for example setting the decrementer. This patch
    allows architectures to define arch_s2ram_{en,dis}able_irqs in their
    asm/suspend.h to have control over this step.

    Signed-off-by: Johannes Berg
    Acked-by: Pavel Machek
    Cc: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     
  • show_state() (SysRq-T) developed the buggy habit of not showing
    TASK_RUNNING tasks. This was due to the mistaken belief that state_filter
    == -1 would be a pass-through filter - while in reality it did not let
    TASK_RUNNING == 0 p->state values through.

    Fix this by restoring the original '!state_filter means all tasks'
    special-case i had in the original version. Test-built and test-booted on
    i686, SysRq-T now works as intended.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
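
    The filter bug is easy to reproduce outside the kernel. In the sketch
    below the task state values mirror the ones named in the message
    (TASK_RUNNING is 0); the other two values and the function names are
    assumptions for illustration.

```c
#include <assert.h>

#define TASK_RUNNING         0UL
#define TASK_INTERRUPTIBLE   1UL  /* assumed bit value */
#define TASK_UNINTERRUPTIBLE 2UL  /* assumed bit value */

/*
 * Buggy variant: a filter of all ones (-1) was believed to pass every
 * task, but 0 & anything is 0, so TASK_RUNNING tasks are never shown.
 */
static int shown_buggy(unsigned long state, unsigned long state_filter)
{
    return (state & state_filter) != 0;
}

/* Fixed variant: a zero filter is the special case meaning "all tasks". */
static int shown_fixed(unsigned long state, unsigned long state_filter)
{
    return !state_filter || (state & state_filter) != 0;
}
```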
     

27 Apr, 2007

1 commit


26 Apr, 2007

6 commits

  • Switch cb_lock to mutex and allow netlink kernel users to override it
    with a subsystem specific mutex for consistent locking in dump callbacks.
    All netlink_dump_start users have been audited not to rely on any
    side-effects of the previously used spinlock.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Change tcp_probe to use ktime (needed to add one export).
    Add option to only get events when cwnd changes - from Doug Leith

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
    number of direct accesses to skb->data and for consistency with all the other
    cast skb member helpers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
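
    To illustrate why offsets from skb->head are convenient, here is a
    cut-down userspace model. The struct and the _sketch helpers are
    hypothetical and keep only the fields needed for the example; they are
    not the kernel's sk_buff definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct skb_sketch {
    unsigned char *head;        /* start of the buffer */
    uint32_t transport_header;  /* 4-byte offset from head, not a pointer */
    uint32_t network_header;    /* 4-byte offset from head, not a pointer */
};

/* Turn the stored offset back into a pointer when one is needed. */
static unsigned char *skb_transport_header_sketch(const struct skb_sketch *skb)
{
    return skb->head + skb->transport_header;
}

/* Record a header position as an offset from head. */
static void skb_set_transport_header_sketch(struct skb_sketch *skb,
                                            unsigned char *p)
{
    assert(p >= skb->head);
    skb->transport_header = (uint32_t)(p - skb->head);
}

/* Header length computed directly from offsets, with no pointer conversion. */
static uint32_t network_header_len_sketch(const struct skb_sketch *skb)
{
    return skb->transport_header - skb->network_header;
}
```

    Each offset takes 4 bytes regardless of pointer width, which is where
    the saving on 64bit architectures comes from.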
     
  • Get rid of the manual clock source selection mess and use ktime. Also
    use a scalar representation, which allows cleaning up pkt_sched.h a bit
    more and results in fewer ktime_to_ns() calls in most cases.

    The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
    inefficiently by this patch; following patches will convert all qdiscs
    to hrtimers and get rid of them entirely.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • We currently use a special structure (struct skb_timeval) and plain
    'struct timeval' to store packet timestamps in sk_buffs and struct
    sock.

    This has some drawbacks:
    - Fixed resolution of one microsecond.
    - Waste of space on 64bit platforms where sizeof(struct timeval)=16

    I suggest using ktime_t that is a nice abstraction of high resolution
    time services, currently capable of nanosecond resolution.

    As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
    an 8 byte shrink of this structure on 64bit architectures. Some other
    structures also benefit from this size reduction (struct ipq in
    ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

    Once this ktime infrastructure is adopted, we can more easily provide
    nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
    SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

    Note : this patch includes a bug correction in
    compat_sock_get_timestamp() where a "err = 0;" was missing (so this
    syscall returned -ENOENT instead of 0)

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: John find
    Signed-off-by: David S. Miller

    Eric Dumazet
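
    The size and resolution argument can be seen in a small userspace model.
    Here the timestamp is modelled as a plain 64-bit count of nanoseconds,
    which is an assumption made for this sketch; the type and function names
    are hypothetical and only non-negative times are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct timeval: two longs, 16 bytes on typical 64bit ABIs. */
struct timeval_sketch {
    long tv_sec;
    long tv_usec;
};

/* Stand-in for ktime_t: a single 8-byte nanosecond count. */
typedef int64_t ktime_sketch_t;

static ktime_sketch_t timeval_to_ktime_sketch(struct timeval_sketch tv)
{
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000L);
    return (int64_t)tv.tv_sec * 1000000000LL + (int64_t)tv.tv_usec * 1000LL;
}

/* Converting back truncates to microseconds, losing any nanosecond part. */
static struct timeval_sketch ktime_to_timeval_sketch(ktime_sketch_t kt)
{
    struct timeval_sketch tv;

    tv.tv_sec = (long)(kt / 1000000000LL);
    tv.tv_usec = (long)((kt % 1000000000LL) / 1000LL);
    return tv;
}
```

    Storing the nanosecond count and converting only at the user-visible
    boundary is what leaves room for nanosecond interfaces later on.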
     

24 Apr, 2007

1 commit

  • The commit 34f5a39899f3f3e815da64f48ddb72942d86c366 restricted reading
    of the tainted value. The attached patch changes this back to a
    write-only check and restores the read behaviour of older versions.

    Signed-off-by: Bastian Blank
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bastian Blank
     

13 Apr, 2007

1 commit


08 Apr, 2007

4 commits

  • Getting rid of the p->children printout in show_task() left behind an
    unused variable.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • the p->parent PID printout gives us all the information about the
    task tree that we need - the eldest_child()/older_sibling()/
    younger_sibling() printouts are mostly historic and i do not
    remember ever having used those fields. (IMO in fact they confuse
    the SysRq-T output.) So remove them.

    This code has sentimental value though, those fields and
    printouts are one of the oldest ones still surviving from
    Linux v0.95's kernel/sched.c:

    if (p->p_ysptr || p->p_osptr)
            printk(" Younger sib=%d, older sib=%d\n\r",
                    p->p_ysptr ? p->p_ysptr->pid : -1,
                    p->p_osptr ? p->p_osptr->pid : -1);
    else
            printk("\n\r");

    written 15 years ago, in early 1992.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus 'snif' Torvalds

    Ingo Molnar
     
  • devres should be deallocated with devres_free() not kfree(). This bug
    corrupts slab on IRQ request failure. Fix it.

    Signed-off-by: Tejun Heo
    Cc: Andrew Morton
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Soeren Sonnenburg reported that upon resume he is getting
    this backtrace:

    [] smp_apic_timer_interrupt+0x57/0x90
    [] retrigger_next_event+0x0/0xb0
    [] apic_timer_interrupt+0x28/0x30
    [] retrigger_next_event+0x0/0xb0
    [] __kfifo_put+0x8/0x90
    [] on_each_cpu+0x35/0x60
    [] clock_was_set+0x18/0x20
    [] timekeeping_resume+0x7c/0xa0
    [] __sysdev_resume+0x11/0x80
    [] sysdev_resume+0x47/0x80
    [] device_power_up+0x5/0x10

    it turns out that on resume we mistakenly re-enable interrupts too
    early. Do the timer retrigger only on the current CPU.

    Signed-off-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Soeren Sonnenburg
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

05 Apr, 2007

3 commits

  • In debugging a problem w/ the -rt tree, I noticed that on systems that mark
    the tsc as unstable before it is registered, the TSC would still be
    selected and used for a short period of time. Digging in, it looks to be a
    result of the mix of the clocksource list changes and my clocksource
    initialization changes.

    With the -rt tree, using a bad TSC, even for a short period of time, can
    result in a hang at boot. I was not able to reproduce this hang w/
    mainline, but I'm not completely certain that someone won't trip on it.

    This patch resolves the issue by initializing the jiffies clocksource
    earlier so a bad TSC won't get selected just because nothing else is yet
    registered.

    Signed-off-by: John Stultz
    Acked-by: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Fix a bug in the swsusp's memory shrinker that causes some systems using
    highmem to refuse to suspend to disk if image_size is set above 1/2 of
    available RAM.

    Special thanks to Jiri Slaby for reporting the problem and assistance in
    debugging it.

    Signed-off-by: Rafael J. Wysocki
    Cc: Jiri Slaby
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • This patch adds 2 missing symbol exports: jiffies_to_timeval() and
    timeval_to_jiffies(). The (not yet merged) dm-raid4-5 module will need
    them, and they used to be indirectly exported by virtue of being inline
    functions.

    Commit 8b9365d753d9870bb6451504c13570b81923228f ("[PATCH] Uninline
    jiffies.h functions") uninlined them, and thus modules now need them
    explicitly exported to use them.

    Signed-off-by: Thomas Bittermann
    Acked-by: Andrew Morton
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: john stultz
    Signed-off-by: Linus Torvalds

    Thomas Bittermann
     

03 Apr, 2007

2 commits

  • Fix the regression resulting from the recent change of suspend code
    ordering that causes systems based on Intel x86 CPUs using the microcode
    driver to hang during the resume.

    The problem occurs since the microcode driver uses request_firmware() in
    its CPU hotplug notifier, which is called after tasks have been frozen and
    hangs. It can be fixed by telling the microcode driver to use the
    microcode stored in memory during the resume instead of trying to load it
    from disk.

    Signed-off-by: Rafael J. Wysocki
    Adrian Bunk
    Cc: Tigran Aivazian
    Cc: Pavel Machek
    Cc: Maxim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • built-in drivers had broken sysfs links that caused bootup hangs for
    certain driver unregistry sequences.

    Signed-off-by: Ingo Molnar
    Acked-by: Kay Sievers
    Signed-off-by: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kay Sievers
     

29 Mar, 2007

1 commit

  • In commit 0475ac0845f9295bc5f69af45f58dff2c104c8d1 when converting the
    orphaned process group handling to use struct pid I made a small
    mistake. I accidentally replaced an == with a !=.

    Besides just being a dumb thing to do, apparently this has a bad side
    effect. The improper orphaned process group detection causes kwin to
    die after a suspend/resume cycle.

    I'm amazed this patch has been around as long as it has without anyone
    else noticing something funny going on.

    And the following people deserve credit for spotting and helping
    to reproduce this.

    Thanks to: Sid Boyce
    Thanks to: "Michael Wu"

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman