17 Oct, 2007

5 commits

  • Kconfig.preempt is not included on some archs (for example, m68k). On those
    archs, the Kconfig machinery complains that KVM selects an undefined symbol
    PREEMPT_NOTIFIERS (which lives in Kconfig.preempt).

    So move the offending symbol into a Kconfig file which is included by
    everyone.

    Cc: Roman Zippel
    Cc: Geert Uytterhoeven
    Signed-off-by: Avi Kivity
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Avi Kivity
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
    kbuild: introduce ccflags-y, asflags-y and ldflags-y
    kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP
    kbuild: enable use of AFLAGS and CFLAGS on commandline
    kbuild: enable 'make AFLAGS=...' to add additional options to AS
    kbuild: fix AFLAGS use in h8300 and m68knommu
    kbuild: check for wrong use of CFLAGS
    kbuild: enable 'make CFLAGS=...' to add additional options to CC
    kbuild: fix up CFLAGS usage
    kbuild: make modpost detect unterminated device id lists
    kbuild: call export_report from the Makefile
    kbuild: move Kai Germaschewski to CREDITS
    kconfig/menuconfig: distinguish between selected-by-another options and comments
    kconfig: tristate choices with mixed tristate and boolean values
    include/linux/Kbuild: remove duplicate entries
    kbuild: kill backward compatibility checks
    kbuild: kill EXTRA_ARFLAGS
    kbuild: fix documentation in makefiles.txt
    kbuild: call make once for all targets when O=.. is used
    kbuild: pass -g to assembler under CONFIG_DEBUG_INFO
    kbuild: update _shipped files for kconfig syntax cleanup
    ...

    Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.

    Linus Torvalds
     
  • Grouping pages by mobility can be disabled at compile-time. This was
    considered undesirable by a number of people. However, in the current stack of
    patches, it is not a simple case of just dropping the configurable patch as it
    would cause merge conflicts. This patch backs out the configuration option.

    Signed-off-by: Mel Gorman
    Acked-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The grouping mechanism has some memory overhead and a more complex allocation
    path. This patch allows the strategy to be disabled for small memory systems
    or if it is known the workload is suffering because of the strategy. It also
    acts to show where the page groupings strategy interacts with the standard
    buddy allocator.

    Signed-off-by: Mel Gorman
    Signed-off-by: Joel Schopp
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Optionally add a boot delay after each kernel printk() call, crudely
    measured in milliseconds, with a maximum delay of 10 seconds per printk.

    Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.):
    "lpj=loops_per_jiffy boot_delay=100"
    to the kernel command line.

    It has been useful in cases like "during boot, my machine just reboots or the
    screen goes black" by slowing down printk, (and adding initcall_debug), we can
    usually see the last thing that happened before the lights went out which is
    usually a valuable clue.

    [akpm@linux-foundation.org: not all architectures implement CONFIG_HZ]
    [akpm@linux-foundation.org: fix lots of stuff]
    [bunk@stusta.de: kernel/printk.c: make 2 variables static]
    [heiko.carstens@de.ibm.com: fix slow down printk on boot compile error]
    Signed-off-by: Randy Dunlap
    Signed-off-by: Dave Jones
    Signed-off-by: Adrian Bunk
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

15 Oct, 2007

7 commits

  • Fix coding style issues reported by Randy Dunlap and others

    Signed-off-by: Dhaval Giani
    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • enable CONFIG_FAIR_GROUP_SCHED=y by default.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • fair-group sched, cleanups.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • Enable user-id based fair group scheduling. This is useful for anyone
    who wants to test the group scheduler w/o having to enable
    CONFIG_CGROUPS.

    A separate scheduling group (i.e struct task_grp) is automatically created for
    every new user added to the system. Upon uid change for a task, it is made to
    move to the corresponding scheduling group.

    A /proc tunable (/proc/root_user_share) is also provided to tune root
    user's quota of cpu bandwidth.

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • With the view of supporting user-id based fair scheduling (and not just
    container-based fair scheduling), this patch renames several functions
    and makes them independent of whether they are being used for container
    or user-id based fair scheduling.

    Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating
    less-sized array for tg->cfs_rq[] and tf->se[]).

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Srivatsa Vaddagiri
     
  • Add interface to control cpu bandwidth allocation to task-groups.

    (not yet configurable, due to missing CONFIG_CONTAINERS)

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Srivatsa Vaddagiri
     
  • The variable CFLAGS is a wellknown variable and the usage by
    kbuild may result in unexpected behaviour.
    On top of that several people over time has asked for a way to
    pass in additional flags to gcc.

    This patch replace use of CFLAGS with KBUILD_CFLAGS all over the
    tree and enabling one to use:
    make CFLAGS=...
    to specify additional gcc commandline options.

    One usecase is when trying to find gcc bugs but other
    use cases has been requested too.

    Patch was tested on following architectures:
    alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k

    Test was simple to do a defconfig build, apply the patch and check
    that nothing got rebuild.

    Signed-off-by: Sam Ravnborg

    Sam Ravnborg
     

20 Sep, 2007

2 commits

  • There is still some confusion and disagreement over what this interface should
    actually do. So it is best that we disable it in 2.6.23 until we get that
    fully sorted out.

    (sys_timerfd() was present in 2.6.22 but it was apparently broken, so here we
    assume that nobody is using it yet).

    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Commit 831441862956fffa17b9801db37e6ea1650b0f69 (Freezer: make kernel
    threads nonfreezable by default) breaks freezing when attempting to resume
    from an initrd, because the init (which is freezeable) spins while waiting
    for another thread to run /linuxrc, but doesn't check whether it has been
    told to enter the refrigerator. The original patch replaced a call to
    try_to_freeze() with a call to yield(). I believe a simple reversion is
    wrong because if !CONFIG_PM_SLEEP, try_to_freeze() is a noop. It should
    still yield.

    Signed-off-by: Nigel Cunningham
    Acked-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nigel Cunningham
     

31 Aug, 2007

1 commit

  • Alexey Dobriyan reports that maxcpus=1 is still broken in 2.6.23-rc4:
    if CONFIG_HOTPLUG_CPU is not set, x86_64 bootup oopses in show_stat() -
    for_each_possible_cpu accesses a per-cpu area which was never set up.

    Alexey identified commit 61ec7567db103d537329b0db9a887db570431ff4
    (ACPI: boot correctly with "nosmp" or "maxcpus=0") as the origin;
    but it's not really to blame, just exposes a bug in 2.6.23-rc1's commit
    8b3b295502444340dd0701855ac422fbf32e161d (Especially when !CONFIG_HOTPLUG_CPU,
    avoid needlessy allocating resources for CPUs that can never become available).

    rc1's test for max_cpus < 2 in start_kernel() wasn't working because
    max_cpus was still NR_CPUS at that point: until rc4 moved the maxcpus
    parsing earlier. Now it sets cpu_possible_map to 1 before allocating
    all possible per-cpu areas; then smp_init() expands cpu_possible_map
    to cpu_present_map (0xf in my case) later on.

    rc1's commit has good intentions, but expects cpu_present_map to be
    limited by maxcpus, which is only the case on i386. cpus_and(possible,
    possible,present) might be good, but needs an audit of cpu_present_map
    uses - there may well be assumptions that any cpu present is possible.

    So stay safe for now and just revert those #ifndef CONFIG_HOTPLUG_CPU
    optimizations in rc1's commit.

    Signed-off-by: Hugh Dickins
    Cc: Alexey Dobriyan
    Cc: Len Brown
    Cc: Andrew Morton
    Cc: Jan Beulich
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Aug, 2007

1 commit

  • Commit 61ec7567db103d537329b0db9a887db570431ff4 ('ACPI: boot correctly
    with "nosmp" or "maxcpus=0"') broke 'maxcpus=' handling on x86[-64].

    maxcpus=N is now having no effect on x86_64, and freezing bootup on i386
    (because of inconsistency with the separate maxcpus parsing down in
    arch/i386, I guess). That's because early_param parsing is a little
    different from __setup parsing, and needs the "=" omitted: then it seems
    to work as the original commit intended (no mention of IO-APIC in
    /proc/interrupts when maxcpus=0).

    Signed-off-by: Hugh Dickins
    Cc: Andrew Morton
    Cc: Len Brown
    Cc: Andi Kleen
    Cc: Rusty Russell
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

21 Aug, 2007

1 commit

  • In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled.
    However, in ACPI mode, these parameters didn't completely disable
    the IO APIC initialization code and boot failed.

    init/main.c:
    Disable the IO_APIC if "nosmp" or "maxcpus=0"
    undefine disable_ioapic_setup() when it doesn't apply.

    i386:
    delete ioapic_setup(), it was a duplicate of parse_noapic()
    delete undefinition of disable_ioapic_setup()

    x86_64:
    rename disable_ioapic_setup() to parse_noapic() to match i386
    define disable_ioapic_setup() in header to match i386

    http://bugzilla.kernel.org/show_bug.cgi?id=1641

    Acked-by: Andi Kleen
    Signed-off-by: Len Brown

    Len Brown
     

01 Aug, 2007

2 commits


31 Jul, 2007

1 commit

  • * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6.23:
    sh: Fix fs.h removal from mm.h regressions.
    sh: fix get_wchan() for SH kernels without framepointers
    sh: arch/sh/boot - fix shell usage
    rtc: rtc-sh: Correct sh_rtc_set_time() for some SH-3 parts.
    sh: remove support for sh7300 and solution engine 7300
    sh: Add sh to the CC_OPTIMIZE_FOR_SIZE dependencies.
    sh: Kill off virt_to_bus()/bus_to_virt().
    sh: sh-sci - fix SH7708 support
    sh: Restrict DSP support to specific CPUs.
    sh: Silence sq compile warning on sh4 nommu.
    sh: Kill the rest of the SE73180 cruft.
    sh: remove support for sh73180 and solution engine 73180
    sh: remove old broken pint code
    sh: Reclaim beginning of P3 space for vmalloc area.
    sh: Fix Dreamcast DMA issues.
    sh: Add kmap_coherent()/kunmap_coherent() interface for SH-4.

    Linus Torvalds
     

27 Jul, 2007

1 commit


26 Jul, 2007

1 commit


18 Jul, 2007

2 commits

  • Currently, the freezer treats all tasks as freezable, except for the kernel
    threads that explicitly set the PF_NOFREEZE flag for themselves. This
    approach is problematic, since it requires every kernel thread to either
    set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
    care for the freezing of tasks at all.

    It seems better to only require the kernel threads that want to or need to
    be frozen to use some freezer-related code and to remove any
    freezer-related code from the other (nonfreezable) kernel threads, which is
    done in this patch.

    The patch causes all kernel threads to be nonfreezable by default (ie. to
    have PF_NOFREEZE set by default) and introduces the set_freezable()
    function that should be called by the freezable kernel threads in order to
    unset PF_NOFREEZE. It also makes all of the currently freezable kernel
    threads call set_freezable(), so it shouldn't cause any (intentional)
    change of behaviour to appear. Additionally, it updates documentation to
    describe the freezing of tasks more accurately.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Cc: Pavel Machek
    Cc: Oleg Nesterov
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • There are some reports that 2.6.22 has SLUB as the default. Not
    true!

    This will make SLUB the default for 2.6.23.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Jul, 2007

8 commits

  • Especially when !CONFIG_HOTPLUG_CPU, avoid needlessy allocating resources for
    CPUs that can never become available.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • It's useful sometimes to disable the softlockup checker at boottime.
    Especially if it triggers during a distro install.

    Signed-off-by: Dave Jones
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • Basically, it will allow a process to unshare its user_struct table,
    resetting at the same time its own user_struct and all the associated
    accounting.

    A new root user (uid == 0) is added to the user namespace upon creation.
    Such root users have full privileges and it seems that theses privileges
    should be controlled through some means (process capabilities ?)

    The unshare is not included in this patch.

    Changes since [try #4]:
    - Updated get_user_ns and put_user_ns to accept NULL, and
    get_user_ns to return the namespace.

    Changes since [try #3]:
    - moved struct user_namespace to files user_namespace.{c,h}

    Changes since [try #2]:
    - removed struct user_namespace* argument from find_user()

    Changes since [try #1]:
    - removed struct user_namespace* argument from find_user()
    - added a root_user per user namespace

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
    deactivate the unshare of the uts and ipc namespaces and do not improve
    performance.

    Signed-off-by: Cedric Le Goater
    Acked-by: "Serge E. Hallyn"
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Some buses (e.g. USB and MMC) do their scanning of devices in the
    background, causing a race between them and prepare_namespace(). In order
    to be able to use these buses without an initrd, we now wait for the device
    specified in root= to actually show up.

    If the device never shows up than we will hang in an infinite loop. In
    order to not mess with setups that reboot on panic, the feature must be
    turned on via the command line option "rootwait".

    [bunk@stusta.de: root_wait can become static]
    Signed-off-by: Pierre Ossman
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Ossman
     
  • Change menuconfig objects from "menu, config" into "menuconfig" so that the
    user can disable the whole feature without entering its menu first.

    Signed-off-by: Jan Engelhardt
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Engelhardt
     
  • Currently slob is disabled if we're using sparsemem, due to an earlier
    patch from Goto-san. Slob and static sparsemem work without any trouble as
    it is, and the only hiccup is a missing slab_is_available() in the case of
    sparsemem extreme. With this, we're rid of the last set of restrictions
    for slob usage.

    Signed-off-by: Paul Mundt
    Acked-by: Pekka Enberg
    Acked-by: Matt Mackall
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     
  • Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and
    include/asm-x86_64/serial.h. the serial8250_ports need to be probed late in
    serial initializing stage. the console_init=>serial8250_console_init=>
    register_console=>serial8250_console_setup will return -ENDEV, and console
    ttyS0 can not be enabled at that time. need to wait till uart_add_one_port in
    drivers/serial/serial_core.c to call register_console to get console ttyS0.
    that is too late.

    Make early_uart to use early_param, so uart console can be used earlier. Make
    it to be bootconsole with CON_BOOT flag, so can use console handover feature.
    and it will switch to corresponding normal serial console automatically.

    new command line will be:
    console=uart8250,io,0x3f8,9600n8
    console=uart8250,mmio,0xff5e0000,115200n8
    or
    earlycon=uart8250,io,0x3f8,9600n8
    earlycon=uart8250,mmio,0xff5e0000,115200n8

    it will print in very early stage:
    Early serial console at I/O port 0x3f8 (options '9600n8')
    console [uart0] enabled
    later for console it will print:
    console handover: boot [uart0] -> real [ttyS0]

    Signed-off-by:
    Cc: Andi Kleen
    Cc: Bjorn Helgaas
    Cc: Russell King
    Cc: Gerd Hoffmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

10 Jul, 2007

2 commits


19 May, 2007

1 commit


17 May, 2007

2 commits

  • No arch sets ARCH_USES_SLAB_PAGE_STRUCT anymore.

    Remove the experimental dependency as well since we want to have it as
    a real alternative to SLAB.

    It all comes down to killing a single line from init/Kconfig.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The SLOB allocator should implement SLAB_DESTROY_BY_RCU correctly, because
    even on UP, RCU freeing semantics are not equivalent to simply freeing
    immediately. This also allows SLOB to be used on SMP.

    Signed-off-by: Nick Piggin
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

11 May, 2007

3 commits

  • This is a very simple and light file descriptor, that can be used as event
    wait/dispatch by userspace (both wait and dispatch) and by the kernel
    (dispatch only). It can be used instead of pipe(2) in all cases where those
    would simply be used to signal events. Their kernel overhead is much lower
    than pipes, and they do not consume two fds. When used in the kernel, it can
    offer an fd-bridge to enable, for example, functionalities like KAIO or
    syslets/threadlets to signal to an fd the completion of certain operations.
    But more in general, an eventfd can be used by the kernel to signal readiness,
    in a POSIX poll/select way, of interfaces that would otherwise be incompatible
    with it. The API is:

    int eventfd(unsigned int count);

    The eventfd API accepts an initial "count" parameter, and returns an eventfd
    fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

    The POLLIN flag is raised when the internal counter is greater than zero.

    The POLLOUT flag is raised when at least a value of "1" can be written to the
    internal counter.

    The POLLERR flag is raised when an overflow in the counter value is detected.

    The write(2) operation can never overflow the counter, since it blocks (unless
    O_NONBLOCK is set, in which case -EAGAIN is returned).

    But the eventfd_signal() function can do it, since it's supposed to not sleep
    during its operation.

    The read(2) function reads the __u64 counter value, and reset the internal
    value to zero. If the value read is equal to (__u64) -1, an overflow happened
    on the internal counter (due to 2^64 eventfd_signal() posts that has never
    been retired - unlickely, but possible).

    The write(2) call writes an __u64 count value, and adds it to the current
    counter. The eventfd fd supports O_NONBLOCK also.

    On the kernel side, we have:

    struct file *eventfd_fget(int fd);
    int eventfd_signal(struct file *file, unsigned int n);

    The eventfd_fget() should be called to get a struct file* from an eventfd fd
    (this is an fget() + check of f_op being an eventfd fops pointer).

    The kernel can then call eventfd_signal() every time it wants to post an event
    to userspace. The eventfd_signal() function can be called from any context.
    An eventfd() simple test and bench is available here:

    http://www.xmailserver.org/eventfd-bench.c

    This is the eventfd-based version of pipetest-4 (pipe(2) based):

    http://www.xmailserver.org/pipetest-4.c

    Not that performance matters much in the eventfd case, but eventfd-bench
    shows almost as double as performance than pipetest-4.

    [akpm@linux-foundation.org: fix i386 build]
    [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch introduces a new system call for timers events delivered though
    file descriptors. This allows timer event to be used with standard POSIX
    poll(2), select(2) and read(2). As a consequence of supporting the Linux
    f_op->poll subsystem, they can be used with epoll(2) too.

    The system call is defined as:

    int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

    The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
    w/out going through the close/open cycle (same as signalfd). If "ufd" is -1,
    s new file descriptor will be created, otherwise the existing "ufd" will be
    re-programmed.

    The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
    specified in the "utmr->it_value" parameter is the expiry time for the timer.

    If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
    otherwise it's a relative time.

    If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
    tv_nsec == 0), this is the period at which the following ticks should be
    generated.

    The "utmr->it_interval" should be set to zero if only one tick is requested.
    Setting the "utmr->it_value" to zero will disable the timer, or will create a
    timerfd without the timer enabled.

    The function returns the new (or same, in case "ufd" is a valid timerfd
    descriptor) file, or -1 in case of error.

    As stated before, the timerfd file descriptor supports poll(2), select(2) and
    epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be
    returned.

    The read(2) call can be used, and it will return a u32 variable holding the
    number of "ticks" that happened on the interface since the last call to
    read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
    be returned if no ticks happened.

    A quick test program, shows timerfd working correctly on my amd64 box:

    http://www.xmailserver.org/timerfd-test.c

    [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch series implements the new signalfd() system call.

    I took part of the original Linus code (and you know how badly it can be
    broken :), and I added even more breakage ;) Signals are fetched from the same
    signal queue used by the process, so signalfd will compete with standard
    kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
    the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
    seems to be working fine on my Dual Opteron machine. I made a quick test
    program for it:

    http://www.xmailserver.org/signafd-test.c

    The signalfd() system call implements signal delivery into a file descriptor
    receiver. The signalfd file descriptor if created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

    The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
    to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
    signalfd file.

    The "mask" allows to specify the signal mask of signals that we are interested
    in. The "masksize" parameter is the size of "mask".

    The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
    will return POLLIN when signals are available to be dequeued. As a direct
    consequence of supporting the Linux poll subsystem, the signalfd fd can use
    used together with epoll(2) too.

    The read(2) system call will return a "struct signalfd_siginfo" structure in
    the userspace supplied buffer. The return value is the number of bytes copied
    in the supplied buffer, or -1 in case of error. The read(2) call can also
    return 0, in case the sighand structure to which the signalfd was attached,
    has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
    return -EAGAIN in case no signal is available.

    If the size of the buffer passed to read(2) is lower than sizeof(struct
    signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
    return -ERESTARTSYS in case a signal hits the process. The format of the
    struct signalfd_siginfo is, and the valid fields depends of the (->code &
    __SI_MASK) value, in the same way a struct siginfo would:

    struct signalfd_siginfo {
    __u32 signo; /* si_signo */
    __s32 err; /* si_errno */
    __s32 code; /* si_code */
    __u32 pid; /* si_pid */
    __u32 uid; /* si_uid */
    __s32 fd; /* si_fd */
    __u32 tid; /* si_fd */
    __u32 band; /* si_band */
    __u32 overrun; /* si_overrun */
    __u32 trapno; /* si_trapno */
    __s32 status; /* si_status */
    __s32 svint; /* si_int */
    __u64 svptr; /* si_ptr */
    __u64 utime; /* si_utime */
    __u64 stime; /* si_stime */
    __u64 addr; /* si_addr */
    };

    [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi