06 Feb, 2008

1 commit

  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
                        const struct itimerspec *utmr,
                        struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an unprogrammed timerfd file descriptor.
    The "clockid" parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API applies new settings to the timer referred to by
    the timerfd fd, optionally retrieving the previous expiration time (when
    the "otmr" parameter is not NULL).

    The time value specified in "utmr" is absolute if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter; otherwise it is relative.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
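A minimal sketch of the three calls from user space, assuming the glibc wrappers in <sys/timerfd.h> that follow the prototypes above. The helper names wait_ms() and unarmed_reads_zero() are hypothetical. The sketch arms a one-shot CLOCK_MONOTONIC timer, either relative or with TFD_TIMER_ABSTIME, and blocks in read(2), which returns the number of expirations as an 8-byte integer. It also checks the {0, 0} result of timerfd_gettime() on a timer that was never set.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: arm a one-shot CLOCK_MONOTONIC timerfd that fires
 * after `ms` milliseconds (ms must be > 0; an all-zero it_value disarms
 * the timer instead), block in read(2) until it expires and return the
 * expiration count, or -1 on error. With `absolute` set, the deadline is
 * computed from the current time and passed with TFD_TIMER_ABSTIME. */
long wait_ms(long ms, int absolute)
{
    struct itimerspec utmr = { 0 };   /* it_interval {0, 0}: no reload */
    utmr.it_value.tv_sec = ms / 1000;
    utmr.it_value.tv_nsec = (ms % 1000) * 1000000L;

    if (absolute) {
        struct timespec now;
        if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
            return -1;
        utmr.it_value.tv_sec += now.tv_sec;
        utmr.it_value.tv_nsec += now.tv_nsec;
        if (utmr.it_value.tv_nsec >= 1000000000L) {  /* normalize */
            utmr.it_value.tv_sec += 1;
            utmr.it_value.tv_nsec -= 1000000000L;
        }
    }

    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;
    if (timerfd_settime(fd, absolute ? TFD_TIMER_ABSTIME : 0, &utmr, NULL) < 0) {
        close(fd);
        return -1;
    }

    uint64_t expirations = 0;   /* read(2) fills in an 8-byte count */
    ssize_t n = read(fd, &expirations, sizeof expirations);
    close(fd);
    return n == (ssize_t)sizeof expirations ? (long)expirations : -1;
}

/* Returns 1 if timerfd_gettime() reports {0, 0} for a never-armed timer. */
int unarmed_reads_zero(void)
{
    struct itimerspec cur;
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return 0;
    int rc = timerfd_gettime(fd, &cur);
    close(fd);
    return rc == 0 && cur.it_value.tv_sec == 0 && cur.it_value.tv_nsec == 0;
}
```

Because the timer is an ordinary file descriptor, the same fd could instead be handed to poll(2) alongside other descriptors; the blocking read(2) here just keeps the sketch short.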
     

24 Jan, 2008

1 commit

  • Using 64k pages on 64-bit PowerPC systems makes life difficult for
    emulators that are trying to emulate an ISA, such as x86, that uses a
    smaller page size, since the emulator can no longer use the MMU and
    the normal system calls for controlling page protections. Of course,
    the emulator can emulate the MMU by checking and possibly remapping
    the address for each memory access in software, but that is pretty
    slow.

    This provides a facility for such programs to control the access
    permissions on individual 4k sub-pages of 64k pages. The idea is
    that the emulator supplies an array of protection masks to apply to a
    specified range of virtual addresses. These masks are applied at the
    level where hardware PTEs are inserted into the hardware page table
    based on the Linux PTEs, so the Linux PTEs are not affected. Note
    that this new mechanism does not allow any access that would otherwise
    be prohibited; it can only prohibit accesses that would otherwise be
    allowed. This new facility is only available on 64-bit PowerPC and
    only when the kernel is configured for 64k pages.

    The masks are supplied using a new subpage_prot system call, which
    takes a starting virtual address and length, and a pointer to an array
    of protection masks in memory. The array has a 32-bit word per 64k
    page to be protected; each 32-bit word consists of sixteen 2-bit fields,
    one per 4k sub-page, in which 0 allows any access (that is otherwise
    allowed), 1 prevents write accesses, and 2 or 3 prevent any access.

    Implicit in this is that the regions of the address space that are
    protected are switched to use 4k hardware pages rather than 64k
    hardware pages (on machines with hardware 64k page support). In fact
    the whole process is switched to use 4k hardware pages when the
    subpage_prot system call is used, but this could be improved in the
    future to switch only the affected segments.

    The subpage protection bits are stored in a three-level tree akin to the
    page table tree. The top level of this tree is stored in a structure
    that is appended to the top level of the page table tree, i.e., the
    pgd array. Since it will often only be 32-bit addresses (below 4GB)
    that are protected, the pointers to the first four bottom level pages
    are also stored in this structure (each bottom level page contains the
    protection bits for 1GB of address space), so the protection bits for
    addresses below 4GB can be accessed with one fewer load than those
    for higher addresses.

    Signed-off-by: Paul Mackerras

    Paul Mackerras
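The subpage_prot call itself exists only on 64-bit PowerPC kernels built with 64k pages, so this sketch stays portable: it only packs and decodes the mask words described above and locates the word for an address below 4GB. The helper names and SPP_* constants are hypothetical. Two details are assumptions the text does not state: that sub-page 0 (the lowest 4k of the 64k page) occupies the two most significant bits of its word, and that each bottom-level page of the tree is one 64k page of 32-bit words. The second assumption is consistent with the 1GB figure above: 64k / 4 = 16384 words, and 16384 × 64k = 1GB.

```c
#include <stdint.h>

/* Field values from the description: 0 = no extra restriction,
 * 1 = no writes, 2 or 3 = no access. */
enum { SPP_RW = 0, SPP_NO_WRITE = 1, SPP_NO_ACCESS = 2 };

/* ASSUMPTION: sub-page 0 uses bits 31..30, sub-page 15 uses bits 1..0. */
uint32_t spp_set(uint32_t word, unsigned subpage, unsigned prot)
{
    unsigned shift = 30 - 2 * (subpage & 0xf);
    word &= ~(UINT32_C(3) << shift);         /* clear the old 2-bit field */
    word |= (uint32_t)(prot & 3) << shift;   /* store the new one */
    return word;
}

/* Decode the protection for `addr` from the word covering its 64k page. */
unsigned spp_get(uint32_t word, uint64_t addr)
{
    unsigned subpage = (unsigned)((addr >> 12) & 0xf);  /* 4k slice index */
    return (word >> (30 - 2 * subpage)) & 3;
}

/* Where the word for an address below 4GB lives, under the assumed
 * layout: one of four bottom-level pages (1GB each), then one word per
 * 64k page within that 1GB. */
struct spp_loc {
    unsigned low_page;    /* 0..3: which directly stored bottom-level page */
    unsigned word_index;  /* 0..16383: which 64k page within that 1GB */
};

struct spp_loc spp_locate_low(uint64_t addr)
{
    struct spp_loc loc;
    loc.low_page = (unsigned)(addr >> 30);
    loc.word_index = (unsigned)((addr >> 16) & 0x3fff);
    return loc;
}
```

For example, marking sub-page 0 read-only and sub-page 15 inaccessible yields the word 0x40000002 under the assumed bit order.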
     

08 Nov, 2007

1 commit


18 Jul, 2007

1 commit

  • fallocate() is a new system call being proposed here which will allow
    applications to preallocate space for any file(s) in a file system.
    Each file system implementation that wants to use this feature will need
    to support an inode operation called ->fallocate().
    Applications can use this feature to reduce fragmentation to some
    extent and thus get faster access. With preallocation, applications
    also get a guarantee of space for particular file(s), even if the
    file system later becomes full.

    Currently, glibc provides an interface called posix_fallocate() which
    can be used for a similar purpose. Although it has the advantage of
    working on all file systems, it is quite slow (since it writes zeroes to
    each block that has to be preallocated). Without a doubt, file systems
    can do this more efficiently within the kernel by implementing
    the proposed fallocate() system call. It is expected that
    posix_fallocate() will be modified to call this new system call first
    and, in case the kernel/file system does not implement it, fall
    back to the current implementation of writing zeroes to the new blocks.
    ToDos:
    1. Implementation on other architectures (other than i386, x86_64,
    and ppc). Patches for s390(x) and ia64 are already available from
    previous posts, but it was decided that they should be added later
    once fallocate is in the mainline. Hence those patches are not
    included in this take.
    2. Changes to glibc:
    a) to support fallocate() system call
    b) to make posix_fallocate() and posix_fallocate64() call fallocate()

    Signed-off-by: Amit Arora

    Amit Arora
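The fallback strategy described above can be sketched in C, assuming the fallocate() wrapper that glibc exposes in <fcntl.h> under _GNU_SOURCE. preallocate() is a hypothetical helper, and treating EOPNOTSUPP and ENOSYS as "not implemented by this kernel/file system" is an assumption of the sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try the fallocate() system call first and, only if
 * the kernel or file system does not implement it, fall back to
 * posix_fallocate(), which may write zeroes block by block.
 * Returns 0 on success or an errno value on failure. */
int preallocate(int fd, off_t offset, off_t len)
{
    /* mode 0: allocate the range and extend the file size if needed */
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;

    int err = errno;
    if (err != EOPNOTSUPP && err != ENOSYS)
        return err;   /* a real failure (e.g. ENOSPC), not "unsupported" */

    /* posix_fallocate() reports errors via its return value, not errno. */
    return posix_fallocate(fd, offset, len);
}
```

On current glibc, posix_fallocate() already tries the system call internally before falling back, so the explicit first step here mainly illustrates the ordering the proposal describes.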
     

29 Jun, 2007

1 commit

  • Not all the world is an i386. Many architectures need 64-bit arguments to be
    aligned in suitable pairs of registers, and the original
    sys_sync_file_range(int, loff_t, loff_t, int) was therefore wasting an
    argument register for padding after the first integer. Since we don't
    normally have more than 6 arguments for system calls, that left no room for
    the final argument on some architectures.

    Fix this by introducing sys_sync_file_range2(int, int, loff_t, loff_t) which
    all fits nicely. In fact, ARM already had that, but called it
    sys_arm_sync_file_range. Move it to fs/sync.c and rename it, then implement
    the needed compatibility routine. And stop the missing syscall check from
    bitching about the absence of sys_sync_file_range() if we've implemented
    sys_sync_file_range2() instead.

    Tested on PPC32 and with 32-bit and 64-bit userspace on PPC64.

    Signed-off-by: David Woodhouse
    Acked-by: Russell King
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
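The register arithmetic behind this change can be checked with a small model. This is a simplified illustration, not any specific ABI document: it assumes 32-bit argument registers, 64-bit arguments that occupy an aligned (even-numbered) register pair, and a budget of six argument registers. regs_needed() and the ARG32/ARG64 tags are hypothetical names.

```c
#include <stddef.h>

/* Argument widths in bytes for the simplified model. */
enum { ARG32 = 4, ARG64 = 8 };

/* Count the argument registers a signature consumes, including any
 * padding register skipped so that a 64-bit argument starts at an
 * even-numbered register. */
unsigned regs_needed(const int *sizes, size_t n)
{
    unsigned reg = 0;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] == ARG64) {
            reg += reg & 1;   /* skip one register to reach an even slot */
            reg += 2;         /* the 64-bit value takes a register pair */
        } else {
            reg += 1;
        }
    }
    return reg;
}
```

Under this model, sys_sync_file_range(int, loff_t, loff_t, int) needs seven registers (one is padding after the first int), one more than the six available, while sys_sync_file_range2(int, int, loff_t, loff_t) fits exactly in six.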
     

17 May, 2007

1 commit


10 May, 2007

1 commit


18 Apr, 2007

1 commit


12 Mar, 2007

2 commits


13 Feb, 2007

1 commit


16 Nov, 2006

1 commit


04 Nov, 2006

1 commit


21 Jun, 2006

1 commit