11 Dec, 2006

1 commit

  • Currently, each fdtable supports three dynamically-sized arrays of data: the
    fdarray and two fdsets. The code allows the number of fds supported by the
    fdarray (fdtable->max_fds) to differ from the number of fds supported by each
    of the fdsets (fdtable->max_fdset).

    In practice, it is wasteful for these two sizes to differ: whenever we hit a
    limit on the smaller-capacity structure, we will reallocate the entire fdtable
    and all the dynamic arrays within it, so any delta in the memory used by the
    larger-capacity structure will never be touched at all.

    Rather than hogging this excess, we should not allocate it in the first
    place; instead, keep the capacities of the fdarray and the fdsets equal.
    This patch removes fdtable->max_fdset. As an added bonus, most of the
    supporting code becomes simpler.

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
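
A minimal userspace sketch of the single-capacity idea (not kernel code). The struct and function names are hypothetical; the field names echo struct fdtable, and plain malloc'd bitmaps stand in for the kernel's fd_set allocations. Because one function sizes all three arrays from the same max_fds, there is no second limit to track or to waste memory on.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Bits per bitmap word in this model. */
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of an fdtable with ONE capacity, max_fds, shared by
   the fd array and both bitmaps (there is no separate max_fdset). */
struct fdtable_model {
    unsigned int max_fds;
    void **fd;                    /* max_fds slots */
    unsigned long *open_fds;      /* max_fds bits */
    unsigned long *close_on_exec; /* max_fds bits */
};

/* Number of unsigned longs needed to hold nbits bits. */
static size_t bitmap_longs(size_t nbits)
{
    return (nbits + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;
}

/* Grows all three arrays to at least new_max entries in one step, so they
   can never disagree about capacity. Returns 0 on success, -1 on failure. */
static int expand_fdtable_model(struct fdtable_model *t, size_t new_max)
{
    if (new_max <= t->max_fds)
        return 0;
    /* Round up to whole bitmap words so no allocated bit goes unused. */
    new_max = bitmap_longs(new_max) * MODEL_BITS_PER_LONG;

    void **fd = calloc(new_max, sizeof *fd);
    unsigned long *open_bits = calloc(bitmap_longs(new_max), sizeof *open_bits);
    unsigned long *cloexec_bits = calloc(bitmap_longs(new_max), sizeof *cloexec_bits);
    if (!fd || !open_bits || !cloexec_bits) {
        free(fd);
        free(open_bits);
        free(cloexec_bits);
        return -1;
    }
    if (t->max_fds > 0) {
        size_t words = bitmap_longs(t->max_fds);
        memcpy(fd, t->fd, t->max_fds * sizeof *fd);
        memcpy(open_bits, t->open_fds, words * sizeof *open_bits);
        memcpy(cloexec_bits, t->close_on_exec, words * sizeof *cloexec_bits);
    }
    free(t->fd);
    free(t->open_fds);
    free(t->close_on_exec);
    t->fd = fd;
    t->open_fds = open_bits;
    t->close_on_exec = cloexec_bits;
    t->max_fds = (unsigned int)new_max;
    return 0;
}
```

Reallocating everything together is exactly what happens anyway when the smaller structure runs out, so keeping one size loses nothing.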
     

30 Sep, 2006

1 commit

  • POSIX states that poll() shall fail with EINVAL if nfds > OPEN_MAX. In
    this context, POSIX is referring to sysconf(OPEN_MAX), which is the value
    of current->signal->rlim[RLIMIT_NOFILE].rlim_cur in the Linux kernel, not
    the compile-time constant which happens to also be named OPEN_MAX. In the
    current code, an application may poll up to max_fdset file descriptors,
    even if this exceeds RLIMIT_NOFILE. The current code also breaks
    applications that poll more than max_fdset descriptors; such calls worked
    circa 2.4.18, when the check was against NR_OPEN (1024*1024). This patch
    enforces the limit exactly as POSIX defines it, even if RLIMIT_NOFILE has
    been changed at run time with ulimit -n.

    To elaborate on the rationale for this, there are three cases:

    1) RLIMIT_NOFILE is at the default value of 1024

    In this (default) case, the patch changes nothing. Calls with nfds > 1024
    fail with EINVAL both before and after the patch, and calls with nfds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Snook
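
The POSIX rule can be observed from userspace. This is a hedged test harness, not the patch itself: poll_with_nofile_limit() is a hypothetical helper that temporarily lowers the soft RLIMIT_NOFILE with setrlimit() and polls entries whose fd is -1 (which poll() ignores). On a kernel that enforces the check, polling exactly the limit should succeed, and polling one entry more should fail with EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Lowers the soft RLIMIT_NOFILE to `limit`, polls `nfds` ignored entries
   with a zero timeout, then restores the old limit. Returns poll()'s result
   (with errno copied to *err), or -2 if the test setup itself failed. */
static int poll_with_nofile_limit(rlim_t limit, nfds_t nfds, int *err)
{
    struct rlimit old, rl;
    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return -2;
    rl = old;
    rl.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -2;

    struct pollfd *fds = calloc(nfds, sizeof *fds);
    if (fds == NULL) {
        setrlimit(RLIMIT_NOFILE, &old);
        return -2;
    }
    for (nfds_t i = 0; i < nfds; i++)
        fds[i].fd = -1;                 /* negative fds are ignored by poll() */

    errno = 0;
    int ret = poll(fds, nfds, 0);       /* zero timeout: never blocks */
    *err = errno;

    free(fds);
    setrlimit(RLIMIT_NOFILE, &old);     /* restore the original soft limit */
    return ret;
}
```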
     

26 Jun, 2006

1 commit

  • If you call poll() with a timeout of -1, the wait is a large but finite
    number of jiffies (depending on HZ) instead of an infinite wait, because
    -1 is passed straight to msecs_to_jiffies().

    Signed-off-by: Frode Isaksen
    Acked-by: Nishanth Aravamudan
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frode Isaksen
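
A sketch of the intended conversion, under stated assumptions: the helper names are hypothetical, the tick rate `hz` is passed explicitly, and MODEL_MAX_SCHEDULE_TIMEOUT stands in for the kernel's MAX_SCHEDULE_TIMEOUT (which is LONG_MAX). The point is that a negative timeout must be mapped to "sleep forever" before any millisecond-to-tick conversion happens.

```c
#include <limits.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (LONG_MAX). */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Converts a non-negative millisecond count to ticks, rounding up so the
   caller never sleeps less than requested; clamped below "forever". */
static long model_msecs_to_ticks(int ms, long hz)
{
    long long ticks = ((long long)ms * hz + 999) / 1000;
    if (ticks >= MODEL_MAX_SCHEDULE_TIMEOUT)
        return MODEL_MAX_SCHEDULE_TIMEOUT - 1;
    return (long)ticks;
}

/* poll()-style timeout: any negative value means "wait indefinitely". */
static long poll_timeout_to_ticks(int timeout_ms, long hz)
{
    if (timeout_ms < 0)
        return MODEL_MAX_SCHEDULE_TIMEOUT;
    return model_msecs_to_ticks(timeout_ms, hz);
}
```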
     

23 Jun, 2006

1 commit

  • The "count" and "pt" variables are declared and modified by do_poll(), as
    well as accessed and written indirectly in the do_pollfd() subroutine.

    This patch pulls all handling of these variables into the do_poll()
    function, thereby eliminating the odd use of indirection in do_pollfd().
    This is done by pulling the "struct pollfd" traversal loop from do_pollfd()
    into its only caller do_poll(). As an added bonus, the patch saves a few
    clock cycles, and also adds comments to make the code easier to follow.

    Signed-off-by: Vadim Lobanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
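
The shape of the refactoring can be sketched in userspace. All names here are hypothetical models rather than the kernel's types, and the poll_table pointer ("pt") that the real do_poll() also manages is omitted. The per-entry helper only computes and stores revents; the caller owns the traversal loop and the ready count, so nothing is updated through pointers.

```c
#include <stddef.h>

#define MODEL_POLLIN 0x0001  /* hypothetical event bit for the sketch */

/* Hypothetical stand-in for struct pollfd. */
struct pollfd_model {
    int fd;
    short events;
    short revents;
};

/* Reports which events are currently ready on fd. */
typedef short (*ready_fn)(int fd);

/* Per-entry helper: stores and returns the ready mask for one pollfd. */
static short do_pollfd_model(struct pollfd_model *p, ready_fn ready)
{
    short mask = 0;
    if (p->fd >= 0)              /* negative fds are skipped */
        mask = (short)(ready(p->fd) & p->events);
    p->revents = mask;
    return mask;
}

/* The caller owns both the traversal loop and the ready count. */
static int do_poll_model(struct pollfd_model *fds, size_t n, ready_fn ready)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (do_pollfd_model(&fds[i], ready))
            count++;
    }
    return count;
}

/* Example readiness source: odd-numbered fds are readable. */
static short odd_fds_readable(int fd)
{
    return (fd % 2) ? MODEL_POLLIN : 0;
}
```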
     

11 Apr, 2006

2 commits

  • If SELECT_STACK_ALLOC is not a multiple of sizeof(long), then stack_fds[]
    would be shorter than SELECT_STACK_ALLOC bytes and could overflow later in
    the function. Fixed by simply rearranging the later test to work on
    sizeof(stack_fds). Currently SELECT_STACK_ALLOC is 256, so this doesn't
    happen, but it's nasty to have things like this hidden in the code. What
    if someone later decides to change SELECT_STACK_ALLOC to 300?

    Signed-off-by: Mitchell Blank Jr
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mitchell Blank Jr
     
  • fs/select.c: In function `core_sys_select':
    fs/select.c:339: warning: assignment from incompatible pointer type
    fs/select.c:376: warning: comparison of distinct pointer types lacks a cast

    By using a void* we can remove lots of casts rather than adding more.

    Cc: Jes Sorensen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
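
The SELECT_STACK_ALLOC hazard can be shown with a sketch. SELECT_STACK_ALLOC is set to 300 here purely to reproduce the commit's hypothetical, and use_select_buffer() is an illustrative name. Where sizeof(long) is 8, the array holds 37 longs (296 bytes), so a request of 297 to 300 bytes would pass a check against SELECT_STACK_ALLOC yet overflow the buffer; checking against sizeof(stack_fds) sends it to the heap instead.

```c
#include <stdlib.h>
#include <string.h>

#define SELECT_STACK_ALLOC 300   /* the commit's hypothetical non-multiple */

/* Returns 1 if the stack buffer was used, 0 if the heap was used,
   and -1 if the heap allocation failed. */
static int use_select_buffer(size_t size)
{
    long stack_fds[SELECT_STACK_ALLOC / sizeof(long)];
    void *bits = stack_fds;
    int on_stack = 1;

    /* Test against the array's real size: integer division may have made
       it smaller than SELECT_STACK_ALLOC. Testing size <= SELECT_STACK_ALLOC
       here could let the memset() below run past the end of stack_fds. */
    if (size > sizeof(stack_fds)) {
        bits = malloc(size);
        if (bits == NULL)
            return -1;
        on_stack = 0;
    }
    memset(bits, 0, size);       /* stands in for filling the fd bitmaps */
    if (!on_stack)
        free(bits);
    return on_stack;
}
```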
     

01 Apr, 2006

1 commit

  • Commit 70674f95c0a2ea694d5c39f4e514f538a09be36f:

    [PATCH] Optimize select/poll by putting small data sets on the stack

    resulted in the poll stack being 4-byte aligned on 64-bit architectures,
    causing misaligned accesses to elements in the array.

    This patch fixes it by declaring the stack in terms of 'long' instead
    of 'char'.

    Force alignment of poll and select stacks to long to avoid unaligned
    access on 64 bit architectures.

    Signed-off-by: Jes Sorensen
    Signed-off-by: Linus Torvalds

    Jes Sorensen
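
A small check of the alignment guarantee behind this fix, assuming a C11 compiler for _Alignof. POLL_STACK_ALLOC and long_stack_is_aligned() are hypothetical names. The C standard only guarantees 1-byte alignment for a char array, whereas an array of long is always aligned for long (typically 8 bytes on 64-bit targets).

```c
#include <stdint.h>

#define POLL_STACK_ALLOC 256  /* hypothetical size for this sketch */

/* Declaring the on-stack buffer as long[] rather than char[] guarantees it
   has long's alignment. Returns 1 if the buffer address is aligned for long. */
static int long_stack_is_aligned(void)
{
    long stack_pps[POLL_STACK_ALLOC / sizeof(long)];
    uintptr_t addr = (uintptr_t)(void *)stack_pps;
    return addr % _Alignof(long) == 0;
}
```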
     

29 Mar, 2006

3 commits

  • Mark the f_ops members of inodes as const, and fix the resulting
    ripple-through in places that copy f_ops and then "do stuff" with it.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Optimize select and poll by using stack space for small fd sets

    This brings back an old optimization from Linux 2.0. Using the stack is
    faster than kmalloc. On an Intel P4 system it speeds up a select of a
    single pty fd by about 13% (~4000 cycles -> ~3500).

    It also saves memory, because a daemon blocked in select or poll will
    usually use one or two fewer pages. This can add up: e.g. if you have 10
    daemons blocking in poll/select, you save 40KB of memory.

    I did a patch for this long ago, but it was never applied. This version is
    a reimplementation of the old patch that tries to be less intrusive. I
    only did the minimal changes needed for the stack allocation.

    The cut-off point before external memory is allocated is currently at
    832 bytes. The system calls always allocate this much memory on the stack.

    These 832 bytes are divided into 256 bytes of frontend data (for the select
    bitmaps or the pollfds) and the rest of the space for the wait queues used
    by the low level drivers. There are some extreme cases where this won't
    work out for select and it falls back to allocating memory too early -
    especially with very sparse large select bitmaps - but the majority of
    processes who only have a small number of file descriptors should be ok.
    [TBD: 832/256 might not be the best split for select or poll]

    I suspect more optimizations might be possible, but they would be more
    complicated. One way would be to cache the select/poll context over
    multiple system calls because typically the input values should be similar.
    The problem is deciding when to flush the file descriptors out, though.

    Signed-off-by: Andi Kleen
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
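
The stack-first frontend pattern can be sketched in userspace. This is an illustration, not the patch: copy_pollfds_stack_first() and FRONTEND_STACK_ALLOC are hypothetical names, with the 256-byte figure taken from the frontend share quoted above. Entries that fit are copied into an on-stack buffer, and only larger requests fall back to malloc() (kmalloc() in the kernel).

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdlib.h>
#include <string.h>

/* Frontend share of the on-stack area, as quoted in the commit message. */
#define FRONTEND_STACK_ALLOC 256

/* Copies n pollfds into working storage, preferring an on-stack buffer.
   Returns 1 if the stack buffer sufficed, 0 if it fell back to the heap,
   and -1 if the heap allocation failed. */
static int copy_pollfds_stack_first(const struct pollfd *src, size_t n)
{
    /* Declared as long[] so the buffer is suitably aligned for pollfds. */
    long stack_buf[FRONTEND_STACK_ALLOC / sizeof(long)];
    struct pollfd *work = (struct pollfd *)(void *)stack_buf;
    int on_stack = 1;

    if (n > sizeof(stack_buf) / sizeof(struct pollfd)) {
        work = malloc(n * sizeof *work);
        if (work == NULL)
            return -1;
        on_stack = 0;
    }
    memcpy(work, src, n * sizeof *work);
    /* ... a real implementation would now poll each entry in work[] ... */
    if (!on_stack)
        free(work);
    return on_stack;
}
```

Small, common requests never touch the allocator; the cost of a large request is unchanged apart from one size comparison.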
     

18 Feb, 2006

1 commit

  • I got all of these backwards. We want to return

    min(input timeout, new timeout)

    to userspace to prevent increasing the time-remaining value.

    Thanks to Ernst Herzberg for reporting and diagnosing.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
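
A minimal sketch of the min() rule, assuming normalized struct timespec values (0 <= tv_nsec < 1000000000). timespec_cmp() and timeout_to_report() are hypothetical names and not necessarily the helpers these patches added.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Returns -1, 0 or 1 as a is less than, equal to, or greater than b. */
static int timespec_cmp(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/* Value to write back to userspace: min(input timeout, new timeout), so
   the time-remaining value can never increase. */
static struct timespec timeout_to_report(struct timespec input,
                                         struct timespec remaining)
{
    return timespec_cmp(&remaining, &input) > 0 ? input : remaining;
}
```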
     

12 Feb, 2006

1 commit

  • With David Woodhouse

    select() presently has a habit of increasing the value of the user's
    `timeout' argument on return.

    We were writing back a timeout larger than the original. We _deliberately_
    round up, since we know we must wait at _least_ as long as the caller asks
    us to.

    The patch adds a couple of helper functions for magnitude comparison of
    timespecs and of timevals, and uses them to prevent the various poll and
    select functions from returning a timeout which is larger than the one which
    was passed in.

    The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
    timeout value to the old one and was returning that. It should just return
    the new timeout value.

    (We have various handy timespec/timeval-to-from-nsec conversion functions in
    time.h. But this code open-codes it all).

    Cc: "David S. Miller"
    Cc: Andi Kleen
    Cc: Ulrich Drepper
    Cc: Thomas Gleixner
    Cc: george anzinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
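
The rounding problem is easy to reproduce with plain arithmetic. In this sketch (hypothetical helpers, with the tick rate passed as `hz`), a 1 ms request at HZ=250 rounds up to one tick, and converting that tick back yields 4 ms, larger than what the caller passed in. That is why the written-back value has to be compared against the original.

```c
/* Hypothetical conversions at hz ticks per second. */
static long long usecs_to_ticks_roundup(long long usec, long hz)
{
    return (usec * hz + 999999) / 1000000;  /* round up: never sleep short */
}

static long long ticks_to_usecs(long long ticks, long hz)
{
    return ticks * 1000000 / hz;
}
```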
     

08 Feb, 2006

1 commit


19 Jan, 2006

1 commit

  • The following implementation of ppoll() and pselect() system calls
    depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
    thread_info.

    These system calls have to change the signal mask during their
    operation, and signal handlers must be invoked using the new, temporary
    signal mask. The old signal mask must be restored either upon successful
    exit from the system call, or upon returning from the invoked signal
    handler if the system call is interrupted. We can't simply restore the
    original signal mask and return to userspace, since the restored signal
    mask may actually block the signal which interrupted the system call.

    The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
    path to trap into do_signal() just as TIF_SIGPENDING does, and by
    causing do_signal() to use the saved signal mask instead of the current
    signal mask when setting up the stack frame for the signal handler -- or
    by causing do_signal() to simply restore the saved signal mask in the
    case where there is no handler to be invoked.

    The first patch implements the sys_pselect() and sys_ppoll() system
    calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
    #ifdef should go away in time when all architectures have implemented
    it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
    kernel (in the -mm tree), and the third patch then removes the
    arch-specific implementations of sys_rt_sigsuspend() and replaces them
    with generic versions using the same trick.

    The fourth and fifth patches, provided by David Howells, implement
    TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
    adds the syscalls to the i386 syscall table.

    This patch:

    Add the pselect() and ppoll() system calls, providing core routines usable by
    the original select() and poll() system calls and also the new calls (with
    their semantics w.r.t. timeouts).

    Signed-off-by: David Woodhouse
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
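
The signal-mask semantics can be observed from userspace through the POSIX pselect() wrapper. In this sketch (the helper and handler names are hypothetical), SIGUSR1 is blocked and made pending, then pselect() is called with a mask that unblocks it. The signal should be delivered while pselect() sleeps, so the call fails with EINTR, and once it returns SIGUSR1 is blocked again because the original mask has been restored. This assumes a kernel and C library that implement pselect() atomically, as these patches provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig) { (void)sig; got_usr1 = 1; }

/* Returns pselect()'s result (or -2 on setup failure); *err receives errno
   and *blocked_after reports whether SIGUSR1 is blocked after the call. */
static int pselect_sigmask_demo(int *err, int *blocked_after)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -2;

    /* Block SIGUSR1 in the normal mask, then make it pending. */
    sigset_t blocked, old, during, now;
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGUSR1);
    sigprocmask(SIG_BLOCK, &blocked, &old);
    raise(SIGUSR1);                  /* stays pending: SIGUSR1 is blocked */

    /* Temporary mask used only while pselect() waits: SIGUSR1 unblocked. */
    during = old;
    sigdelset(&during, SIGUSR1);

    struct timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
    int ret = pselect(0, NULL, NULL, NULL, &ts, &during);
    *err = errno;

    sigprocmask(SIG_BLOCK, NULL, &now);   /* query the current mask */
    *blocked_after = sigismember(&now, SIGUSR1);

    sigprocmask(SIG_SETMASK, &old, NULL); /* restore the caller's mask */
    return ret;
}
```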
     

10 Sep, 2005

2 commits

  • With the use of RCU in the files structure, the lookup of files by fd can
    now be lock-free. The lookup is protected by rcu_read_lock()/rcu_read_unlock().
    This patch changes the readers to use lock-free lookup.

    Signed-off-by: Maneesh Soni
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     
  • In order for RCU to work, the file table array, sets and their sizes must
    be updated atomically. Instead of ensuring this through too many memory
    barriers, we put the arrays and their sizes in a separate structure. This
    patch takes the first step of putting the file table elements in a separate
    structure, fdtable, that is embedded within files_struct. It also changes all
    the users to refer to the file table using the files_fdtable() macro.
    Subsequent application of RCU becomes easier after this.

    Signed-off-by: Dipankar Sarma
    Signed-Off-By: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
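
A userspace analogue of the "one structure, one pointer" idea, written with C11 atomics rather than the kernel's RCU API (a deliberate substitution, since RCU is not available in plain C). All names are hypothetical. Bundling max_fds with the fd array means a reader that does one acquire load of the table pointer can never pair a new size with an old, smaller array. Freeing the old table is left out: that is the part RCU's grace periods solve.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: the array and its size live in ONE structure. */
struct fdtable_rcu_model {
    unsigned int max_fds;
    void **fd;
};

static _Atomic(struct fdtable_rcu_model *) cur_fdt;

/* Lock-free reader: one acquire load gives a consistent (max_fds, fd). */
static void *lookup_fd(unsigned int n)
{
    struct fdtable_rcu_model *t =
        atomic_load_explicit(&cur_fdt, memory_order_acquire);
    if (t == NULL || n >= t->max_fds)
        return NULL;
    return t->fd[n];
}

/* Writer (callers assumed to serialize on a lock): copy into a larger
   table, then publish it with a single release store. */
static int grow_fdt(unsigned int new_max)
{
    struct fdtable_rcu_model *old =
        atomic_load_explicit(&cur_fdt, memory_order_relaxed);
    if (old != NULL && new_max <= old->max_fds)
        return 0;

    struct fdtable_rcu_model *nt = malloc(sizeof *nt);
    if (nt == NULL)
        return -1;
    nt->fd = calloc(new_max, sizeof *nt->fd);
    if (nt->fd == NULL) {
        free(nt);
        return -1;
    }
    nt->max_fds = new_max;
    if (old != NULL)
        memcpy(nt->fd, old->fd, old->max_fds * sizeof *old->fd);

    atomic_store_explicit(&cur_fdt, nt, memory_order_release);
    /* The old table is deliberately NOT freed: RCU would free it only after
       a grace period, once no reader can still hold a pointer to it. */
    return 0;
}

/* Writer: installs a pointer in an existing slot (lock assumed held). A
   concurrent version would also need to publish each slot atomically. */
static int set_fd(unsigned int n, void *file)
{
    struct fdtable_rcu_model *t =
        atomic_load_explicit(&cur_fdt, memory_order_relaxed);
    if (t == NULL || n >= t->max_fds)
        return -1;
    t->fd[n] = file;
    return 0;
}
```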
     

06 May, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds