19 Oct, 2007

40 commits

  • * ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
    hrtimer: hook compat_sys_nanosleep up to high res timer code
    hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
    sched: reduce schedstat variable overhead a bit
    sched: add KERN_CONT annotation
    sched: cleanup, make struct rq comments more consistent
    sched: cleanup, fix spacing
    sched: fix return value of wait_for_completion_interruptible()

    Linus Torvalds
     
  • Get rid of sparse-related warnings from places that use an integer as a
    NULL pointer.
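
    A minimal illustration in plain C. sparse reports "Using plain integer as
    NULL pointer" when a literal 0 is used where a pointer is expected, and the
    fix is simply to write NULL. The helper find_colon() is hypothetical and
    exists only to show the pattern.

```c
#include <stddef.h>

/* Hypothetical helper: pointer to the first ':' in s, or NULL if none. */
static const char *find_colon(const char *s)
{
	for (; *s != '\0'; s++)
		if (*s == ':')
			return s;
	return NULL;	/* writing "return 0;" here is what sparse warns about */
}
```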

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Stephen Hemminger
    Cc: Andi Kleen
    Cc: Jeff Garzik
    Cc: Matt Mackall
    Cc: Ian Kent
    Cc: Arnd Bergmann
    Cc: Davide Libenzi
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Hemminger
     
  • This adds items to the taskstats struct to account for user and system
    time based on scaling the CPU frequency and instruction issue rates.

    Adds account_(user|system)_time_scaled callbacks which architectures
    can use to account for time using this mechanism.
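
    A sketch of the idea, assuming the simplest possible scaling rule (time
    charged in proportion to current versus maximum CPU frequency). The commit
    leaves the actual rule to each architecture, and the struct, the function
    and the kHz inputs below are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-task counters: raw user time plus a scaled copy. */
struct task_times {
	uint64_t utime;
	uint64_t utimescaled;
};

/* Hypothetical hook: charge delta of user time, plus a copy scaled by how
 * fast the CPU was running relative to its maximum frequency. */
static void account_user_time_scaled_sketch(struct task_times *t, uint64_t delta,
					    uint32_t cur_khz, uint32_t max_khz)
{
	t->utime += delta;
	if (max_khz == 0)
		t->utimescaled += delta;	/* no frequency data: no scaling */
	else
		t->utimescaled += delta * cur_khz / max_khz;
}
```

    With the CPU at half its maximum frequency, 1000 units of user time are
    charged as 500 scaled units.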

    Signed-off-by: Michael Neuling
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Neuling
     
  • Signed-off-by: Daniel Walker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Just removing white space at the end of lines.

    Signed-off-by: Daniel Walker
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Large chunks of 5 spaces instead of tabs.

    Signed-off-by: Daniel Walker
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Cc: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Lots of spaces converted to tabs.

    Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Signed-off-by: Daniel Walker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1,
    can change the capabilities of another process, p2. This is not the
    meaning that was intended for this capability at all, and this
    implementation came about purely because, without filesystem capabilities,
    there was no way to use capabilities without one process bestowing them on
    another.

    Since we now have filesystem support for capabilities, we can fix the
    implementation of CAP_SETPCAP.

    The most significant thing about this change is that, with it in effect, no
    process can set the capabilities of another process.

    The capabilities of a program are set via the capability convolution
    rules:

    pI(post-exec) = pI(pre-exec)
    pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI)
    pE(post-exec) = fE ? pP(post-exec) : 0

    at exec() time. As such, the only influence the pre-exec() program can
    have on the post-exec() program's capabilities is through the pI
    capability set.
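
    The rules above translate directly into bitmask arithmetic. This is a
    simplified model rather than kernel code: each set is a 64-bit mask, fE is
    a single flag, bset plays the role of X (cap_bset), and the types and
    function name are hypothetical. CAP_NET_RAW is capability number 13.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capset;	/* simplified: one bit per capability */

#define CAP_MASK(n)	((capset)1 << (n))
#define CAP_NET_RAW	13

struct proc_caps { capset pI, pP, pE; };
struct file_caps { capset fI, fP; bool fE; };

/* Apply the exec() rules: pI is kept, pP combines the bounded fP with
 * pI & fI, and pE is either all of pP or nothing. */
static struct proc_caps caps_after_exec(struct proc_caps pre,
					struct file_caps f, capset bset)
{
	struct proc_caps post;

	post.pI = pre.pI;
	post.pP = (bset & f.fP) | (post.pI & f.fI);
	post.pE = f.fE ? post.pP : 0;
	return post;
}
```

    A shell whose pI holds CAP_NET_RAW passes it on to a program with that bit
    in fI (and fE set); a shell without it gains nothing from fI.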

    The correct implementation of CAP_SETPCAP (and the one enabled by this
    patch) is that it can be used to add extra pI capabilities to the current
    process, to be picked up by subsequent exec()s when the above convolution
    rules are applied.

    Here is how it works:

    Let's say we have a process, p. It has capability sets, pE, pP and pI.
    Generally, p can change the value of its own pI to pI' where

    (pI' & ~pI) & ~pP = 0.

    That is, the only new things in pI' that were not present in pI need to
    be present in pP.

    The role of CAP_SETPCAP is basically to permit changes to pI beyond
    the above:

    if (pE & CAP_SETPCAP) {
        pI' = anything; /* i.e., even (pI' & ~pI) & ~pP != 0 */
    }
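
    The same check written as C, under a simplified model (64-bit masks and a
    hypothetical function name; CAP_SETPCAP is capability number 8). It returns
    whether a process may replace its pI with new_pI:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capset;	/* simplified: one bit per capability */

#define CAP_MASK(n)	((capset)1 << (n))
#define CAP_SETPCAP	8
#define CAP_NET_RAW	13

static bool inheritable_change_ok(capset pI, capset pP, capset pE, capset new_pI)
{
	if (pE & CAP_MASK(CAP_SETPCAP))
		return true;	/* CAP_SETPCAP: any new pI is allowed */
	/* otherwise every bit newly added to pI must already be in pP */
	return ((new_pI & ~pI) & ~pP) == 0;
}
```

    This is the login case: without CAP_SETPCAP in pE, adding CAP_NET_RAW to
    pI is refused unless pP already holds it.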

    This capability is useful for things like login, which (say, via
    pam_cap) might want to raise certain inheritable capabilities for use
    by the children of the logged-in user's shell, but those capabilities
    are not useful to or needed by the login program itself.

    One such use might be to limit who can run ping. You set the
    capabilities of the 'ping' program to be "= cap_net_raw+i", and then
    only shells that have (pI & CAP_NET_RAW) will be able to run
    it. Without CAP_SETPCAP implemented as described above, login(pam_cap)
    would have to also have (pP & CAP_NET_RAW) in order to raise this
    capability and pass it on through the inheritable set.

    Signed-off-by: Andrew Morgan
    Signed-off-by: Serge E. Hallyn
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morgan
     
  • After adding checks to register_sysctl_table and finding a whole new set
    of bugs, missed by countless code reviews and testers, I have finally lost
    patience with the binary sysctl interface.

    The binary sysctl interface has been sort of deprecated for years, and
    finding a user space program that uses the syscall is more difficult than
    finding a needle in a haystack. Problems continue to crop up with the
    in-kernel implementation. So, since supporting something that no one uses
    is silly, deprecate sys_sysctl with a sufficient grace period and notice
    that the handful of user space applications that care can be fixed or
    replaced.

    The /proc/sys sysctl interface that people use will continue to be
    supported indefinitely.

    This patch moves the tested warning about sysctls out of the main
    sys_sysctl path into a separate path called from both implementations of
    sys_sysctl, and it adds a proper entry to
    Documentation/feature-removal-schedule.

    This allows us to revisit this in a couple of years' time and actually
    kill sys_sysctl.
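
    The gist of the change is that one helper, shared by both implementations
    of sys_sysctl, emits the warning. A rough user-space sketch of such a
    helper, assuming the warning is rate-limited to a fixed number of messages;
    the limit, the message text and the signature are illustrative, not the
    kernel's:

```c
#include <stdbool.h>
#include <stdio.h>

#define SYSCTL_WARN_LIMIT 5	/* hypothetical cap on how often we warn */

/* Hypothetical shared helper: writes a warning naming the process and the
 * binary sysctl path into buf and returns true, or returns false once the
 * limit has been reached. */
static bool deprecated_sysctl_warning(const char *comm, const int *name,
				      int nlen, char *buf, size_t len)
{
	static int msg_count;
	size_t used;
	int i;

	if (msg_count >= SYSCTL_WARN_LIMIT)
		return false;
	msg_count++;

	used = (size_t)snprintf(buf, len, "process `%s' used deprecated sysctl ", comm);
	for (i = 0; i < nlen && used < len; i++)
		used += (size_t)snprintf(buf + used, len - used, "%d.", name[i]);
	return true;
}
```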

    [lethal@linux-sh.org: sysctl: Fix syscall disabled build]
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • It turns out that the net/irda code didn't register any of its binary paths
    in the global sysctl.h header file, so I missed them completely when making
    an authoritative list of binary sysctl paths in the kernel. So add them to
    the list of valid binary sysctl paths.

    Signed-off-by: Eric W. Biederman
    Acked-by: Samuel Ortiz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Well, it turns out, after I dug into the problems a little more, that I was
    returning a few false positives, so this patch updates my logic to remove
    them.

    - Don't complain about 0 ctl_names in sysctl_check_binary_path.
      It is valid for someone to remove the sysctl binary interface
      and still keep the same sysctl proc interface.

    - Treat two ctl_names (or two procnames) as matching when both are
      absent.

    - Only warn about missing min and max when the generic functions care.
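
    The three refinements boil down to small predicates. A sketch, assuming
    simplified inputs: the function names and the handler enum are
    hypothetical, and warning when either bound is missing is an assumption
    about what "missing min and max" means here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Rule 1: a ctl_name of 0 means "no binary interface", which is valid as
 * long as the proc interface remains, so there is nothing to check. */
static bool should_check_binary_path(int ctl_name)
{
	return ctl_name != 0;
}

/* Rule 2: two optional names match if both are absent, or if both are
 * present and equal. */
static bool procnames_match(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

/* Rule 3: only a handler that enforces bounds needs min and max. */
enum handler_kind { HANDLER_DOINTVEC, HANDLER_DOINTVEC_MINMAX };

static bool warn_missing_bounds(enum handler_kind h, const void *min, const void *max)
{
	return h == HANDLER_DOINTVEC_MINMAX && (min == NULL || max == NULL);
}
```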

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • After going through the kernel's sysctl tables several times, it has become
    clear that code review and testing are just not effective in preventing
    problematic sysctl tables from being used in the stable kernel. I certainly
    can't seem to fix the problems as fast as they are introduced.

    Therefore this patch adds sysctl_check_table which is called when a sysctl
    table is registered and checks to see if we have a problematic sysctl table.

    The biggest part of the code is the table of valid binary sysctl entries, but
    since we have frozen our set of binary sysctls this table should not need to
    change, and it makes it much easier to detect when someone unintentionally
    adds a new binary sysctl value.
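
    A sketch of the table-driven check, assuming a simple array of
    (ctl_name, procname) pairs. The entries are modelled on CTL_KERN names but
    chosen for illustration, and check_binary() is hypothetical, not the
    kernel's sysctl_check_table():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt of a frozen table of valid binary sysctls. */
struct bin_entry {
	int ctl_name;
	const char *procname;
};

static const struct bin_entry valid_kern[] = {
	{ 1, "ostype" },
	{ 2, "osrelease" },
	{ 4, "version" },
};

/* NULL if (ctl_name, procname) is a known binary sysctl, else a reason. */
static const char *check_binary(int ctl_name, const char *procname)
{
	size_t i;

	for (i = 0; i < sizeof(valid_kern) / sizeof(valid_kern[0]); i++) {
		if (valid_kern[i].ctl_name != ctl_name)
			continue;
		if (procname != NULL && strcmp(valid_kern[i].procname, procname) == 0)
			return NULL;
		return "procname does not match ctl_name";
	}
	return "unknown binary sysctl number";	/* e.g. a newly added one */
}
```

    Because the table is frozen, any new binary number falls through to the
    "unknown" case and gets reported.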

    As best I can determine, all of the several hundred errors now spewed at
    boot are legitimate.

    [bunk@kernel.org: kernel/sysctl_check.c must #include ]
    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • It looks like we inadvertently killed the cad_pid binary sysctl support when
    cad_pid was changed to be a struct pid. Since no one has complained, just
    remove the binary path.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Instead of having a bunch of ifdefs in sysctl.c, move all of the pty sysctl
    logic into drivers/char/pty.c.

    As well as cleaning up the logic this prevents sysctl_check_table from
    complaining that the root table has a NULL data pointer on something with
    generic methods.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • aio-nr, aio-max-nr and acpi_video_flags are unsigned long values, which
    sysctl does not handle properly with a 64-bit kernel and a 32-bit user
    space.

    Since no one is likely to be using the binary sysctl values and the ASCII
    interface still works, this patch just removes support for the binary
    sysctl interface to them from the kernel.
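
    The underlying mismatch is that unsigned long is 8 bytes in a 64-bit kernel
    but 4 bytes in a 32-bit process. A small illustration of one way this goes
    wrong, assuming a value wider than 32 bits (the helper names are
    hypothetical, and this is not the kernel's copy path): squeezing it into a
    32-bit slot loses the upper bits, while the text form has no fixed width.

```c
#include <stdint.h>
#include <stdio.h>

/* What fits in a 32-bit process's unsigned long: the upper half is lost. */
static uint32_t as_32bit_ulong(uint64_t kernel_val)
{
	return (uint32_t)kernel_val;
}

/* The ASCII form has no fixed width, so every bit survives. */
static int as_text(uint64_t kernel_val, char *buf, size_t len)
{
	return snprintf(buf, len, "%llu", (unsigned long long)kernel_val);
}
```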

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Cc: Benjamin LaHaise
    Cc: Zach Brown
    Cc: Badari Pulavarty
    Cc: Len Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • These functions are all wrapper functions for the proc interface that are
    needed for them to work correctly.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Acked-by: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • There has been no easy way to wrap the default sysctl strategy routine
    except by returning 0, which is not always what we want. The few instances
    I have seen that want different behaviour have written their own version
    of sysctl_data. While not too hard, it is unnecessary code and has the
    potential for extra bugs.

    So to make these situations easier and make that part of sysctl more
    symmetric, I have factored sysctl_data out of do_sysctl_strategy and
    exported it as a function everyone can use.

    Further, having sysctl_data be an explicit function makes checking for
    badly formed sysctl tables much easier.
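
    A simplified user-space sketch of why an exported generic routine helps: a
    custom strategy can call it and then do extra work afterwards, which
    returning 0 to fall back on the default cannot do. The table struct, both
    routines and the return convention (1 meaning "handled") are stand-ins
    chosen for illustration, not the kernel's sysctl_data() signature.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a sysctl table entry. */
struct table_entry {
	void *data;
	size_t maxlen;
};

/* Generic routine: copy the old value out and the new value in.
 * Returns 1 ("handled") or a negative errno. */
static int generic_data(struct table_entry *t, void *oldval, size_t *oldlenp,
			const void *newval, size_t newlen)
{
	if (oldval != NULL && oldlenp != NULL) {
		size_t len = *oldlenp < t->maxlen ? *oldlenp : t->maxlen;

		memcpy(oldval, t->data, len);
		*oldlenp = len;
	}
	if (newval != NULL) {
		if (newlen > t->maxlen)
			return -EINVAL;
		memcpy(t->data, newval, newlen);
	}
	return 1;
}

static int timeout_secs = 10;
static int timeout_ms = 10000;	/* derived from timeout_secs */

/* Custom strategy: reuse the generic copy, then update the derived value. */
static int timeout_strategy(struct table_entry *t, void *oldval, size_t *oldlenp,
			    const void *newval, size_t newlen)
{
	int ret = generic_data(t, oldval, oldlenp, newval, newlen);

	if (ret > 0 && newval != NULL)
		timeout_ms = timeout_secs * 1000;
	return ret;
}
```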

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • In sysctl.h, the typedef struct ctl_table ctl_table violates coding style,
    isn't needed, and is a bit of a nuisance because it makes it harder to
    recognize that ctl_table is a type name.

    So this patch removes it from the generic sysctl code. Hopefully I will
    have enough energy to send the rest of my patches and remove it from the
    rest of the kernel as well.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The functions in a CPU notifier chain are called with the CPU_UP_PREPARE
    event before the CPU is brought online. If one of the callbacks returns
    NOTIFY_BAD, delivery of CPU_UP_PREPARE stops and the CPU online operation
    is canceled. Then the CPU_UP_CANCELED event is delivered to the functions
    in the CPU notifier chain again.

    This CPU_UP_CANCELED event is delivered to the functions that have been
    called with CPU_UP_PREPARE, and not to the functions that haven't been
    called with CPU_UP_PREPARE.

    The problem that makes existing CPU hotplug error handling complex is that
    the CPU_UP_CANCELED event is delivered to the function that returned
    NOTIFY_BAD, too.

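    Usually we don't expect to call a destructor on an object that has failed
    to initialize. Delivering CPU_UP_CANCELED to the callback that returned
    NOTIFY_BAD is like writing:

    err = register_something();
    if (err) {
        unregister_something();  /* undoes a registration that never happened */
        return err;
    }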

    So it is natural to deliver the CPU_UP_CANCELED event only to the functions
    that returned NOTIFY_OK for CPU_UP_PREPARE, and not to the function that
    returned NOTIFY_BAD. This is what this patch does.
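
    The rule can be modelled in a few lines. In this sketch the event names,
    return codes and callback type are hypothetical stand-ins for the real
    notifier API: the chain stops at the first veto, and the cancel event goes
    only to the callbacks that had already accepted the prepare event. The same
    shape applies to the down direction.

```c
#include <stddef.h>

/* Hypothetical event and return codes for this sketch only. */
enum { EV_UP_PREPARE, EV_UP_CANCELED };
enum { CB_OK, CB_BAD };

typedef int (*cpu_callback)(int event, int cpu);

/* Deliver EV_UP_PREPARE in order; if callback k vetoes, send
 * EV_UP_CANCELED only to callbacks 0..k-1. Returns 0 or -1. */
static int cpu_up_sketch(const cpu_callback *chain, size_t n, int cpu)
{
	size_t nr_calls, i;

	for (nr_calls = 0; nr_calls < n; nr_calls++)
		if (chain[nr_calls](EV_UP_PREPARE, cpu) == CB_BAD)
			break;
	if (nr_calls == n)
		return 0;	/* every callback agreed; the CPU could go online */

	for (i = 0; i < nr_calls; i++)	/* skips the callback that failed */
		chain[i](EV_UP_CANCELED, cpu);
	return -1;
}

/* Demo callbacks: count how often each one is told to cancel. */
static int cancel_seen[3];

static int note(int idx, int event)
{
	if (event == EV_UP_CANCELED)
		cancel_seen[idx]++;
	return CB_OK;
}

static int cb0(int event, int cpu) { (void)cpu; return note(0, event); }
static int cb2(int event, int cpu) { (void)cpu; return note(2, event); }

static int cb1(int event, int cpu)	/* vetoes every prepare */
{
	(void)cpu;
	return event == EV_UP_PREPARE ? CB_BAD : note(1, event);
}
```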

    Otherwise, every CPU hotplug notifier has to track, for each CPU, whether
    its notifier event failed or not (drivers/base/topology.c does this with
    topology_dev_map).

    Similarly, this patch does the same for the CPU_DOWN_PREPARE and
    CPU_DOWN_FAILED events.

    Acked-by: Rusty Russell
    Signed-off-by: Akinobu Mita
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • If the memchr() length argument is larger than strlen(kp->name), memchr()
    can read past the end of the name and return a bogus result.

    This causes duplicate filenames in sysfs for "nousb". The kernel warning
    messages are as below:

    sysfs: duplicate filename 'usbcore' can not be created
    WARNING: at fs/sysfs/dir.c:416 sysfs_add_one()
    [] sysfs_add_one+0xa0/0xe0
    [] create_dir+0x48/0xb0
    [] sysfs_create_dir+0x29/0x50
    [] create_dir+0x1b/0x50
    [] kobject_add+0x46/0x150
    [] kobject_init+0x3a/0x80
    [] kernel_param_sysfs_setup+0x50/0xb0
    [] param_sysfs_builtin+0xee/0x130
    [] param_sysfs_init+0x23/0x60
    [] __next_cpu+0x12/0x20
    [] kernel_init+0x0/0xb0
    [] kernel_init+0x0/0xb0
    [] do_initcalls+0x46/0x1e0
    [] create_proc_entry+0x52/0x90
    [] register_irq_proc+0x9c/0xc0
    [] proc_mkdir_mode+0x34/0x50
    [] kernel_init+0x0/0xb0
    [] kernel_init+0x62/0xb0
    [] kernel_thread_helper+0x7/0x14
    =======================
    kobject_add failed for usbcore with -EEXIST, don't try to register things with the same name in the same directory.
    [] kobject_add+0xf6/0x150
    [] kernel_param_sysfs_setup+0x50/0xb0
    [] param_sysfs_builtin+0xee/0x130
    [] param_sysfs_init+0x23/0x60
    [] __next_cpu+0x12/0x20
    [] kernel_init+0x0/0xb0
    [] kernel_init+0x0/0xb0
    [] do_initcalls+0x46/0x1e0
    [] create_proc_entry+0x52/0x90
    [] register_irq_proc+0x9c/0xc0
    [] proc_mkdir_mode+0x34/0x50
    [] kernel_init+0x0/0xb0
    [] kernel_init+0x62/0xb0
    [] kernel_thread_helper+0x7/0x14
    =======================
    Module 'usbcore' failed to be added to sysfs, error number -17
    The system will be unstable now.
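
    memchr() does not stop at a NUL byte, so a length larger than the string
    lets it wander into whatever follows. A sketch of the bounded lookup,
    assuming a fixed module-name limit (MAX_MODNAME_LEN and modname_len() are
    hypothetical; the kernel has its own constant):

```c
#include <string.h>

#define MAX_MODNAME_LEN 56	/* hypothetical upper bound on a module name */

/* Length of the "module." prefix of a parameter name, or 0 if there is no
 * '.' within the string itself. */
static size_t modname_len(const char *name)
{
	size_t max = strlen(name);
	const char *dot;

	if (max > MAX_MODNAME_LEN)
		max = MAX_MODNAME_LEN;
	dot = memchr(name, '.', max);	/* bounded by strlen: never past the NUL */
	return dot ? (size_t)(dot - name) : 0;
}
```

    With "nousb" followed in memory by another parameter name, an unbounded
    memchr() would find that name's '.' and derive a module name from it.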

    Signed-off-by: Dave Young
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
    include powerpc), it is possible for the vdso to get out of sync if a user
    makes the (admittedly unusual) call settimeofday(NULL, ptr).

    This patch adds a hook for architectures that set
    CONFIG_GENERIC_TIME_VSYSCALL to ensure that when sys_tz is updated they can
    also update their copy in the vdso.
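
    A toy model of the hook, assuming a kernel-side sys_tz and a separate copy
    exported to the vdso; every name below is hypothetical. The point is only
    that the timezone-only path must refresh the copy itself:

```c
#include <stddef.h>

/* Hypothetical timezone type, kernel copy and vdso copy. */
struct tz_sketch {
	int minuteswest;
	int dsttime;
};

static struct tz_sketch sys_tz_sketch;
static struct tz_sketch vdso_tz_sketch;

/* Hypothetical arch hook: refresh the vdso's copy of the timezone. */
static void arch_update_vdso_tz(void)
{
	vdso_tz_sketch = sys_tz_sketch;
}

/* Timezone half of settimeofday(tv, tz); tv handling is omitted. */
static int settimeofday_tz_sketch(const struct tz_sketch *tz)
{
	if (tz != NULL) {
		sys_tz_sketch = *tz;
		arch_update_vdso_tz();	/* without this, the vdso copy goes stale */
	}
	return 0;
}
```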

    Signed-off-by: Tony Breeds
    Cc: Andi Kleen
    Cc: Tony Luck
    Acked-by: John Stultz
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Breeds
     
  • Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b
    during 2.6.9 development. That commit introduced the io_wait field, which
    was write-only then and still remains write-only.

    Also garbage collect macros which "use" io_wait.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Make hibernation_platform_enter() execute the enter-a-sleep-state sequence
    instead of the mixed shutdown-with-entering-S4 thing.

    Before entering the ACPI S4 sleep state, suspend devices instead of shutting
    them down with kernel_shutdown_prepare(), and call
    device_power_down(PMSG_SUSPEND) instead of shutting down sysdevs (just as
    before entering S1 or S3, except that the target state is now S4). Also,
    disable the nonboot CPUs before entering the sleep state (S4), which is
    generally a good idea.

    This is known to fix the "double disk spin down during hibernation" problem
    on some machines, e.g. HPC nx6325 (ref. http://lkml.org/lkml/2007/8/7/316 and the
    following thread).  Moreover, it has been reported to make
    /sys/class/rtc/rtc0/wakealarm work correctly with hibernation for some users.
    It also generally causes the hibernation state (ACPI S4) to be entered faster.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The following scenario leads to total confusion of the platform firmware on
    some boxes (e.g. HPC nx6325):
    * Hibernate with ACPI enabled
    * Resume passing "acpi=off" to the boot kernel

    To prevent this from happening, it's necessary to check if ACPI is enabled
    (and enable it if that's not the case) _right_ _after_ control has been
    transferred from the boot kernel to the image kernel, before
    device_power_up() is called (i.e. with interrupts disabled). Enabling ACPI
    after calling device_power_up() turns out to be insufficient.

    For this reason, introduce a new hibernation callback, ->leave(), that will
    be executed before device_power_up() by the restored image kernel. To make
    it work, it is also necessary to move swsusp_suspend() from swsusp.c to
    disk.c (its name is changed to "create_image", which is more to the point).
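
    A toy model of the ordering constraint. Only the ->leave() hook is
    modelled; the ops structure, the stub steps and the trace buffer are
    hypothetical and simply record which step ran first:

```c
#include <stddef.h>
#include <string.h>

struct hib_ops_sketch {
	void (*leave)(void);	/* optional; called by the image kernel, IRQs off */
};

static char trace[64];	/* records the order in which steps ran */

static void device_power_up_stub(void)
{
	strcat(trace, "power_up;");
}

static void acpi_leave_stub(void)
{
	strcat(trace, "enable_acpi;");	/* e.g. re-enable ACPI if it is off */
}

/* Resume path in the restored image kernel, after control is transferred. */
static void resume_in_image_kernel(const struct hib_ops_sketch *ops)
{
	/* interrupts would still be disabled at this point */
	if (ops != NULL && ops->leave != NULL)
		ops->leave();
	device_power_up_stub();
}
```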

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Add the bits needed for supporting arbitrary boot kernels to the common
    hibernation code.

    To support arbitrary boot kernels, make it possible to replace the 'struct
    new_utsname' and the kernel version in the hibernation image header with
    some architecture-specific data that will be used to verify whether the
    image is valid and to restore the image.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop
    the serial console from being suspended when the rest of the machine goes
    to sleep. This is incredibly useful for debugging power management-related
    things; however, having it as a compile-time option has proved to be
    incredibly inconvenient for us (OLPC). There are plenty of times when we
    want the serial console not to suspend, but for the most part we'd like
    the serial console to be suspended.

    This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel
    boot parameter (no_console_suspend). By default, the serial console will
    be suspended along with the rest of the system; by passing
    'no_console_suspend' to the kernel during boot, serial console will remain
    alive during suspend.

    For now, this is pretty serial console specific; further fixes could be
    applied to make this work for things like netconsole.
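
    A user-space sketch of the switch, assuming a simple space-separated
    command line; the parser and suspend_console_sketch() are hypothetical
    stand-ins for the kernel's boot-parameter handling:

```c
#include <stdbool.h>
#include <string.h>

static bool console_suspend_enabled = true;	/* default: console is suspended */

/* Does the space-separated cmdline contain word as a whole token? */
static bool cmdline_has(const char *cmdline, const char *word)
{
	size_t wlen = strlen(word);
	const char *p = cmdline;

	if (wlen == 0)
		return false;
	while ((p = strstr(p, word)) != NULL) {
		bool starts = (p == cmdline || p[-1] == ' ');
		bool ends = (p[wlen] == '\0' || p[wlen] == ' ');

		if (starts && ends)
			return true;
		p += wlen;
	}
	return false;
}

static void parse_boot_params(const char *cmdline)
{
	if (cmdline_has(cmdline, "no_console_suspend"))
		console_suspend_enabled = false;
}

/* Returns true if the console was actually suspended. */
static bool suspend_console_sketch(void)
{
	if (!console_suspend_enabled)
		return false;	/* keep the serial console alive for debugging */
	/* real code would stop console output here */
	return true;
}
```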

    Signed-off-by: Andres Salomon
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Pavel Machek
    Cc: Nigel Cunningham
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • Measure the time taken to freeze tasks, even if freezing doesn't fail.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki