21 Mar, 2011

1 commit


14 Jan, 2011

1 commit

  • On some architectures __kernel_suseconds_t is int. On these archs struct
    timeval has padding bytes at the end. This struct is copied to userspace
    with these padding bytes uninitialized, which leaks the contents of
    kernel stack memory.

    This bug was added with v2.6.27-rc5-286-gb773ad4.

    [akpm@linux-foundation.org: avoid the memset on architectures which don't need it]
    Signed-off-by: Vasiliy Kulikov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
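A userspace sketch of the leak and the fix pattern, assuming an LP64 target where a long/int struct ends in 4 padding bytes. All demo_* names are hypothetical stand-ins, not kernel code. The memset runs only when the struct is larger than the sum of its members, in the spirit of the "avoid the memset on architectures which don't need it" note above.

```c
#include <string.h>

/* Hypothetical stand-in (demo_*) for struct timeval on an arch where
 * __kernel_suseconds_t is int: on LP64 it ends in 4 padding bytes. */
struct demo_timeval {
	long tv_sec;
	int  tv_usec;
};

#define DEMO_NSEC_PER_USEC 1000L

/* Nonzero when the struct is larger than its members, i.e. has padding. */
static int demo_timeval_has_padding(void)
{
	return sizeof(struct demo_timeval) >
	       sizeof(((struct demo_timeval *)0)->tv_sec) +
	       sizeof(((struct demo_timeval *)0)->tv_usec);
}

/* Fill *out before it would be copied to userspace. Clearing the whole
 * struct first means the padding holds zeros, not stale stack bytes. */
static void demo_fill_timeval(struct demo_timeval *out, long sec, long nsec)
{
	if (demo_timeval_has_padding())
		memset(out, 0, sizeof(*out));
	out->tv_sec = sec;
	out->tv_usec = (int)(nsec / DEMO_NSEC_PER_USEC);
}
```

On a 32-bit target, long and int are both 4 bytes, so the check is false and the memset is skipped.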
     

28 Oct, 2010

2 commits


13 Mar, 2010

1 commit

  • Add a generic implementation of the old select() syscall, which expects
    its argument in a memory block and switch all architectures over to use
    it.

    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Reviewed-by: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Acked-by: Andreas Schwab
    Acked-by: Russell King
    Acked-by: Greg Ungerer
    Acked-by: David Howells
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
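The "argument in a memory block" convention can be sketched in userspace: all five select() arguments are packed into one structure, and the entry point unpacks them and forwards to the normal five-argument call. The struct and function names below are hypothetical (demo_*), and memcpy() stands in for copy_from_user().

```c
#include <string.h>
#include <sys/select.h>

/* Hypothetical (demo_*) argument block: every select() argument,
 * passed through a single pointer. */
struct demo_sel_arg {
	unsigned long n;
	fd_set *inp, *outp, *exp;
	struct timeval *tvp;
};

/* Unpack the block and forward to the ordinary five-argument select(). */
static int demo_old_select(const struct demo_sel_arg *arg)
{
	struct demo_sel_arg a;

	memcpy(&a, arg, sizeof(a));   /* the kernel would use copy_from_user() */
	return select((int)a.n, a.inp, a.outp, a.exp, a.tvp);
}
```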
     

07 Mar, 2010

1 commit

  • Make sure the compiler won't do weird things with limits. For example,
    fetching them twice may return two different values once writable limits
    are implemented.

    That is, use either the rlimit helpers added in commit 3e10e716abf3
    ("resource: add helpers for fetching rlimits") or ACCESS_ONCE where they
    are not applicable.

    Signed-off-by: Jiri Slaby
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
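The "fetch once" rule can be illustrated with a volatile access shaped like the kernel's ACCESS_ONCE(), which uses the GCC/Clang __typeof__ extension. The shared limit and the checker below are hypothetical (demo_*). Copying the limit into a local guarantees that the comparison and any later use see the same value, even if another thread changes the limit in between.

```c
#include <errno.h>

/* Volatile access forces exactly one load (GCC/Clang __typeof__). */
#define DEMO_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical (demo_*) limit that another thread may change at any time. */
static unsigned long demo_nofile_limit = 1024;

/* Read the limit once into a local, then use only the local copy. */
static long demo_check_nfds(unsigned long nfds)
{
	unsigned long limit = DEMO_ACCESS_ONCE(demo_nofile_limit);

	if (nfds > limit)
		return -EINVAL;
	return (long)nfds;
}
```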
     

05 Oct, 2009

1 commit


23 Sep, 2009

1 commit

  • __estimate_accuracy() was prone to integer overflow, for example if *tv ==
    {2147, 483648000} on a 32-bit computer (or even for delays as small as
    {429, 500000000} if the task is niced).

    Because the result was already clamped to between 0 and 100ms, the effect
    of the overflow was not too problematic, but the use of the hrtimer range
    feature was not optimal in overflow cases.

    This patch ensures that there cannot be an integer overflow in this
    function.

    Signed-off-by: Guillaume Knispel
    Cc: Alexander Viro
    Cc: Arjan van de Ven
    Cc: Thomas Gleixner
    Cc: Heiko Carstens
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
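A sketch of an overflow-safe slack estimate, assuming a 32-bit long (modeled with int32_t), a 100 ms cap, and a divisor of 1000 that drops to 200 for niced tasks. These constants are inferred from the numbers above, and the demo_* names are hypothetical. With the divisor at 1000, 2147 s contributes 2147 * 1,000,000 = 2,147,000,000 ns, and adding 483648000 / 1000 gives exactly 2^31. The guard rejects large second counts before the multiplication can overflow.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000L
#define DEMO_MAX_SLACK    (100 * 1000000)   /* 100 ms in ns */

/* Hypothetical (demo_*) model: int32_t mimics a 32-bit 'long'. */
static int32_t demo_estimate_accuracy(int32_t sec, int32_t nsec, int niced)
{
	int32_t divfactor = niced ? 1000 / 5 : 1000;
	int32_t per_sec = (int32_t)(DEMO_NSEC_PER_SEC / divfactor);
	int32_t slack;

	if (sec < 0)
		return 0;
	/* Bail out before sec * per_sec can exceed the 32-bit range. */
	if (sec > DEMO_MAX_SLACK / per_sec)
		return DEMO_MAX_SLACK;

	slack = nsec / divfactor + sec * per_sec;
	return slack > DEMO_MAX_SLACK ? DEMO_MAX_SLACK : slack;
}
```

Once sec passes the guard, sec * per_sec is at most 100,000,000, and nsec / divfactor adds at most about 5,000,000, so the sum stays well inside 32 bits.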
     

16 Aug, 2009

1 commit

  • The triggered field of struct poll_wqueues was introduced in commit
    5f820f648c92a5ecc771a96b3c29aa6e90013bba ("poll: allow f_op->poll to
    sleep").

    It was first set to 1 in pollwake() (now __pollwake()), tested, and
    later set to 0 in poll_schedule_timeout(), but it was not initialized
    beforehand.

    As a result, when the process needed to sleep, triggered was likely to be
    non-zero even if pollwake() had not been called before the first
    poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be
    called and an extra loop calling all ->poll() methods would be done.

    This patch initializes triggered to 0 in poll_initwait() so the ->poll()
    methods are not called twice before the process goes to sleep when it
    needs to.

    Signed-off-by: Guillaume Knispel
    Acked-by: Thomas Gleixner
    Acked-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
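A single-threaded model of the triggered protocol described above: the wake-up path sets the flag, the timeout path sleeps only if the flag is clear and then re-arms it, and initialization clears it. The demo_* names are hypothetical. Without demo_poll_initwait(), the flag would start with whatever was on the stack, which is the bug this commit fixes.

```c
#include <stdbool.h>

/* Hypothetical (demo_*) model; the real struct also carries the poll
 * table and its wait-queue entries. */
struct demo_poll_wqueues {
	int triggered;   /* set to 1 by the wake-up path */
};

static void demo_poll_initwait(struct demo_poll_wqueues *pwq)
{
	pwq->triggered = 0;   /* the missing initialization */
}

static void demo_pollwake(struct demo_poll_wqueues *pwq)
{
	pwq->triggered = 1;
}

/* Returns true if the caller would actually go to sleep. */
static bool demo_poll_schedule_timeout(struct demo_poll_wqueues *pwq)
{
	bool sleeps = !pwq->triggered;

	pwq->triggered = 0;   /* re-arm for the next pass over the fds */
	return sleeps;
}
```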
     

17 Jun, 2009

1 commit

  • After the introduction of keyed wakeups, which Davide Libenzi did for
    epoll, we are able to avoid spurious wakeups in poll()/select() code too.

    For example, a typical use of poll()/select() is to wait for incoming
    network frames on many sockets. But TX completion for UDP/TCP frames calls
    sock_wfree(), which in turn schedules the thread.

    When scheduled, the thread does a full scan of all polled fds and can
    sleep again, because nothing is really available. If the number of fds is
    large, this causes significant load.

    This patch makes select()/poll() aware of keyed wakeups, so useless
    wakeups are avoided. This reduces the number of context switches by about
    50% on some setups, as well as the work performed by softirq handlers.

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Acked-by: Andi Kleen
    Acked-by: Ingo Molnar
    Acked-by: Davide Libenzi
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
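A sketch of the keyed-wakeup filter, assuming the wake-up carries an event mask as its "key", and a key of 0 means an old-style wake-up that carries no mask. The demo_should_wake() name is hypothetical. A waiter interested only in POLLIN is not woken for a POLLOUT-only event such as TX completion.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical (demo_*) filter: 'wanted' is the waiter's event mask,
 * 'key' is the mask the waker reports (0 = no information). */
static bool demo_should_wake(unsigned long wanted, unsigned long key)
{
	if (key && !(key & wanted))
		return false;   /* event is irrelevant to this waiter */
	return true;
}
```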
     

14 Jan, 2009

4 commits

  • Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Not a single architecture has wired up sys_pselect7, and it is also the
    only system call with seven parameters. Just make it static and rename it
    to do_pselect, which will do the work for sys_pselect6.

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
    seen occasional 5-second hangs from telnet. telnetd calls select with a
    NULL timeout, but with the new kernel, the system call occasionally
    returns 0, which causes telnet to call sleep(5). This did not happen
    with earlier kernels.

    The code in sys_pselect7 looks a bit strange; in particular, the variable
    "to" is initialized to NULL, then changed if a non-null timeout was
    passed in, but not used further. It needs to be passed to
    core_sys_select instead of &end_time.

    This bug was introduced by 8ff3e8e85fa6c312051134b3953e397feb639f51
    ("select: switch select() and poll() over to hrtimers").

    Signed-off-by: Bernd Schmidt
    Reviewed-by: Ulrich Drepper
    Tested-by: Robin Getz
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Bernd Schmidt
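A deterministic model of the bug: the core routine treats a NULL end time as "block forever" and an all-zero end time as "don't wait". Passing &end_time unconditionally loses the NULL. The demo_* names are hypothetical, and end_time is zero-initialized here only so that the wrong path gives a repeatable result.

```c
#include <stddef.h>
#include <time.h>

enum demo_wait_mode { DEMO_BLOCK, DEMO_POLL_ONLY, DEMO_TIMED };

/* Hypothetical (demo_*) stand-in for core_sys_select(). */
static enum demo_wait_mode demo_core_select(const struct timespec *end_time)
{
	if (!end_time)
		return DEMO_BLOCK;
	if (end_time->tv_sec == 0 && end_time->tv_nsec == 0)
		return DEMO_POLL_ONLY;   /* returns 0 at once if nothing is ready */
	return DEMO_TIMED;
}

static enum demo_wait_mode demo_pselect(const struct timespec *user_ts, int buggy)
{
	struct timespec end_time = { 0, 0 };   /* zeroed only for determinism */
	struct timespec *to = NULL;

	if (user_ts) {
		end_time = *user_ts;   /* real code converts to an absolute end time */
		to = &end_time;
	}
	/* Buggy path: &end_time is never NULL, so "no timeout" is lost. */
	return demo_core_select(buggy ? &end_time : to);
}
```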
     

07 Jan, 2009

1 commit

  • f_op->poll is the only vfs operation which is not allowed to sleep. This
    is because the poll and select implementations used task state to
    synchronize against wake ups, which doesn't have to be the case anymore
    as the wait/wake interface can now use custom wake up functions. The
    non-sleep restriction can be a bit tricky because ->poll is not called
    from an atomic context and the result of accidentally sleeping in ->poll
    only shows up as temporary busy looping when the timing is right, or
    rather wrong.

    This patch converts poll/select to use a custom wake up function and a
    separate triggered variable to synchronize against wake up events. The
    only added overhead is an extra function call during wake up, which is
    negligible.

    This patch removes the one non-sleep exception from vfs locking rules and
    is beneficial to userland filesystem implementations like FUSE, 9p or
    peculiar fs like spufs as it's very difficult for those to implement
    non-sleeping poll method.

    While at it, make the following cosmetic changes to make poll.h and
    select.c checkpatch friendly.

    * s/type * symbol/type *symbol/ : three places in poll.h
    * remove blank line before EXPORT_SYMBOL() : two places in select.c

    Oleg: spotted missing barrier in poll_schedule_timeout()
    Davide: spotted missing write barrier in pollwake()

    Signed-off-by: Tejun Heo
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Signed-off-by: Miklos Szeredi
    Cc: Davide Libenzi
    Cc: Brad Boyer
    Cc: Al Viro
    Cc: Roland McGrath
    Cc: Mauro Carvalho Chehab
    Signed-off-by: Andrew Morton
    Cc: Davide Libenzi
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
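A userspace sketch of the custom-wake-function idea: each wait entry carries its own callback, and the waker walks the queue and calls each one. The poll-style callback records the event in a triggered flag instead of relying on task state. The demo_* names are hypothetical. C11 release/acquire ordering stands in for the kernel's explicit barriers; it is not the kernel's exact barrier choice.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_wait_entry;
typedef int (*demo_wake_fn)(struct demo_wait_entry *entry, unsigned long key);

/* Hypothetical (demo_*) wait-queue entry with a per-entry wake function. */
struct demo_wait_entry {
	demo_wake_fn func;
	atomic_int *triggered;   /* the poller's flag */
	struct demo_wait_entry *next;
};

/* Poll-style wake function: record the event; release ordering plays the
 * role of the write barrier mentioned for pollwake(). */
static int demo_pollwake(struct demo_wait_entry *entry, unsigned long key)
{
	(void)key;
	atomic_store_explicit(entry->triggered, 1, memory_order_release);
	return 1;
}

/* Waker side: walk the queue and invoke each entry's own function. */
static int demo_wake_up(struct demo_wait_entry *head, unsigned long key)
{
	int woken = 0;

	for (struct demo_wait_entry *e = head; e; e = e->next)
		woken += e->func(e, key);
	return woken;
}

/* Sleeper side: consume the flag; acquire pairs with the release above. */
static int demo_consume_trigger(atomic_int *triggered)
{
	return atomic_exchange_explicit(triggered, 0, memory_order_acquire);
}
```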
     

27 Oct, 2008

1 commit

  • Some userland apps seem to pass in a "0" for the seconds and several
    seconds' worth of usecs to select(). The old kernels accepted this just
    fine, so the new kernels must too.

    However, due to the upscaling of the microseconds to nanoseconds we had
    some cases where we got math overflow, and depending on the GCC version
    (due to inlining decisions) that actually resulted in an -EINVAL return.

    This patch fixes this by adding the excess microseconds to the seconds
    field.

    Also with thanks to Marcin Slusarz for spotting some implementation bugs
    in the diagnostics patches.

    Reported-by: Carlos R. Mafra
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
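The "add the excess microseconds to the seconds field" fix amounts to splitting usec into whole seconds and a remainder before scaling to nanoseconds, so the nanosecond value always stays below one second. The struct and function names below are hypothetical (demo_*).

```c
#define DEMO_USEC_PER_SEC  1000000L
#define DEMO_NSEC_PER_USEC 1000L

/* Hypothetical (demo_*) (seconds, nanoseconds) pair. */
struct demo_ts {
	long sec;
	long nsec;
};

/* Fold whole seconds out of usec first, then scale only the remainder. */
static struct demo_ts demo_timeval_to_ts(long sec, long usec)
{
	struct demo_ts ts;

	ts.sec = sec + usec / DEMO_USEC_PER_SEC;
	ts.nsec = (usec % DEMO_USEC_PER_SEC) * DEMO_NSEC_PER_USEC;
	return ts;
}
```

For example, {0, 3500000} becomes 3 s plus 500,000,000 ns instead of 3,500,000,000 ns, which does not fit in a 32-bit long.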
     

08 Sep, 2008

2 commits


06 Sep, 2008

3 commits

  • This patch makes the select() and poll() hrtimers use the new range
    feature and settings from the task struct.

    In addition, this includes the estimate_accuracy() function that Linus
    posted to lkml, but changed entirely based on other people's lkml feedback.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     
  • With lots of help, input and cleanups from Thomas Gleixner

    This patch switches select() and poll() over to hrtimers.

    The core of the patch is replacing the "s64 timeout" with a
    "struct timespec end_time" in all the plumbing.

    But most of the diffstat comes from using the just introduced helpers:
    poll_select_set_timeout
    poll_select_copy_remaining
    timespec_add_safe
    which make manipulating the timespec easier and less error-prone.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     
  • This patch adds 2 helpers that will be used for the hrtimer based select/poll:

    poll_select_set_timeout() is a helper that takes a timeout (as a seconds,
    nanoseconds pair) and turns it into a "struct timespec" that represents
    the absolute end time. This is a common operation in the many select()
    and poll() variants and needs several common sanity checks.

    poll_select_copy_remaining() is a helper that takes care of copying the
    remaining time to userspace, as select(), pselect() and ppoll() do. This
    function comes in both a native and a compat implementation (due to data
    structure differences).

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Arjan van de Ven

    Thomas Gleixner
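A userspace sketch of what the two helpers do, assuming CLOCK_MONOTONIC as the time base. The first converts a relative (sec, nsec) pair into an absolute end time, validating it and special-casing a zero timeout. The second turns that end time back into the remaining interval, clamped at zero. The demo_* names are hypothetical, and unlike the real code (which uses timespec_add_safe()), this sketch does not saturate on seconds overflow.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Hypothetical (demo_*): relative (sec, nsec) -> absolute end time. */
static int demo_set_timeout(struct timespec *to, long sec, long nsec)
{
	if (sec < 0 || nsec < 0 || nsec >= DEMO_NSEC_PER_SEC)
		return -EINVAL;
	if (!sec && !nsec) {   /* zero timeout: plain poll, no clock read */
		to->tv_sec = 0;
		to->tv_nsec = 0;
		return 0;
	}
	clock_gettime(CLOCK_MONOTONIC, to);
	to->tv_sec += sec;
	to->tv_nsec += nsec;
	if (to->tv_nsec >= DEMO_NSEC_PER_SEC) {
		to->tv_sec++;
		to->tv_nsec -= DEMO_NSEC_PER_SEC;
	}
	return 0;
}

/* Hypothetical (demo_*): time left until *end, never negative. */
static struct timespec demo_remaining(const struct timespec *end)
{
	struct timespec now, rem;

	clock_gettime(CLOCK_MONOTONIC, &now);
	rem.tv_sec = end->tv_sec - now.tv_sec;
	rem.tv_nsec = end->tv_nsec - now.tv_nsec;
	if (rem.tv_nsec < 0) {
		rem.tv_sec--;
		rem.tv_nsec += DEMO_NSEC_PER_SEC;
	}
	if (rem.tv_sec < 0)
		rem.tv_sec = rem.tv_nsec = 0;
	return rem;
}
```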
     

23 Jun, 2008

1 commit

  • Christian Borntraeger reported that reinstating cond_resched() with
    CONFIG_PREEMPT caused a performance regression on lmbench:

    For example, select file 500:
    23 microseconds (before)
    32 microseconds (after)

    and that's really because we totally unnecessarily do the cond_resched()
    in the innermost loop of select(), which is just silly.

    This moves it out of the innermost loop (which only ever loops over the
    bits in a single "unsigned long" anyway), which makes the performance
    regression go away.

    Reported-and-tested-by: Christian Borntraeger
    Signed-off-by: Linus Torvalds

    Linus Torvalds
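A model of the loop structure after the change: the inner loop walks the bits of one unsigned long, and the reschedule point sits in the outer loop, so it runs once per word instead of once per bit. demo_cond_resched() is a hypothetical counter standing in for cond_resched().

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long demo_resched_calls;

/* Hypothetical (demo_*) stand-in for cond_resched(): counts calls. */
static void demo_cond_resched(void)
{
	demo_resched_calls++;
}

/* Scan an fd bitmap word by word; returns the number of set bits. */
static size_t demo_scan(const unsigned long *bits, size_t nwords)
{
	size_t ready = 0;

	for (size_t i = 0; i < nwords; i++) {
		unsigned long word = bits[i];

		for (size_t j = 0; j < DEMO_BITS_PER_LONG; j++) {
			if (word & (1UL << j))
				ready++;   /* the real loop calls ->poll() here */
		}
		demo_cond_resched();   /* once per word, not once per bit */
	}
	return ready;
}
```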
     

02 May, 2008

2 commits


30 Apr, 2008

2 commits

  • Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
    #ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic
    set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This adds the set_restore_sigmask() inline in and
    replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
    change, but abstracts the details of the flag protocol from all the calls.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
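The two commits above follow a common header pattern: a generic inline helper hides the flag protocol, and it is only defined when the architecture hasn't already provided its own (signalled by the HAVE_ macro). The model below uses hypothetical demo_* names and an arbitrary bit number, with a plain variable in place of thread_info flags.

```c
#include <stdbool.h>

/* Hypothetical (demo_*) thread-flag model; the bit number is arbitrary. */
#define DEMO_TIF_RESTORE_SIGMASK 9
static unsigned long demo_thread_flags;

static void demo_set_thread_flag(int flag)
{
	demo_thread_flags |= 1UL << flag;
}

static bool demo_test_thread_flag(int flag)
{
	return demo_thread_flags & (1UL << flag);
}

/* Generic helper, defined only if the "arch" did not define it first. */
#if defined(DEMO_TIF_RESTORE_SIGMASK) && !defined(DEMO_HAVE_SET_RESTORE_SIGMASK)
#define DEMO_HAVE_SET_RESTORE_SIGMASK 1
static inline void demo_set_restore_sigmask(void)
{
	demo_set_thread_flag(DEMO_TIF_RESTORE_SIGMASK);
}
#endif
```

Call sites then test `#ifdef DEMO_HAVE_SET_RESTORE_SIGMASK` rather than the flag name itself, which mirrors the switch from TIF_RESTORE_SIGMASK to HAVE_SET_RESTORE_SIGMASK.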
     

22 Apr, 2008

1 commit

  • These are small cleanups all over the tree.

    Trivial style and comment changes to
    fs/select.c, kernel/signal.c, kernel/stop_machine.c & mm/pdflush.c

    Signed-off-by: Pavel Machek
    Signed-off-by: Jesper Juhl

    Pavel Machek
     

07 Feb, 2008

1 commit

  • schedule_timeout(jiffies) waits for at least jiffies - 1. Add 1 jiffy to
    the timeout_jiffies calculated in sys_poll() to wait at least
    timeout_msecs, as the poll() man page says.

    Signed-off-by: Karsten Wiese
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Karsten Wiese
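A worked example, assuming HZ=250 (4 ms per jiffy) and a round-up ms-to-jiffies conversion. Both are illustrative choices, and the demo_* names are hypothetical. A 10 ms timeout becomes 3 jiffies. If schedule_timeout(3) can return after only 2 full ticks (8 ms), that is less than 10 ms, so one extra jiffy guarantees at least 3 full ticks (12 ms).

```c
#define DEMO_HZ 250   /* illustrative tick rate: 4 ms per jiffy */

/* Hypothetical (demo_*) ms -> jiffies conversion, rounding up. */
static unsigned long demo_msecs_to_jiffies(unsigned long ms)
{
	return (ms * DEMO_HZ + 999) / 1000;
}

/* schedule_timeout(j) may return after only j - 1 full ticks, so add one. */
static unsigned long demo_poll_timeout_jiffies(unsigned long ms)
{
	return demo_msecs_to_jiffies(ms) + 1;
}
```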
     

20 Oct, 2007

1 commit


17 Oct, 2007

3 commits

  • Lomesh reported poll returning EINTR during a suspend/resume cycle. This
    is caused by the STOP/CONT cycle that the freezer uses, generating a
    pending signal for what in effect is an ignored signal. In general, poll
    is a little eager in returning EINTR when it could try not to bother
    userspace and simply restart the syscall. Both select and ppoll use
    ERESTARTNOHAND to restart the syscall. Oleg points out that simply using
    ERESTARTNOHAND would cause poll to restart with the original timeout
    value, which could ultimately lead to the process never returning to
    userspace. Instead, use ERESTART_RESTARTBLOCK and restart poll with an
    updated timeout value. Inspired by Manfred's patch to use ERESTARTNOHAND
    in poll.

    [bunk@kernel.org: do_restart_poll() can become static]
    Cc: Manfred Spraul
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: "Agarwal, Lomesh"
    Signed-off-by: Chris Wright
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wright
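A sketch of why the restart must carry the remaining time. If each interrupted attempt records only what is left of the timeout, then a wait that keeps getting interrupted still converges. Restarting with the original value could loop forever. The record and helper are hypothetical (demo_*) and work in milliseconds, with a negative value meaning "wait forever".

```c
/* Hypothetical (demo_*) restart record saved when the call is interrupted. */
struct demo_restart_block {
	long timeout_ms;   /* negative means "wait forever" */
};

/* Save the time still left, not the original timeout. */
static void demo_save_restart(struct demo_restart_block *rb,
			      long timeout_ms, long elapsed_ms)
{
	if (timeout_ms < 0) {
		rb->timeout_ms = timeout_ms;   /* infinite stays infinite */
		return;
	}
	rb->timeout_ms = timeout_ms > elapsed_ms ? timeout_ms - elapsed_ms : 0;
}
```

With a 1000 ms timeout interrupted every 300 ms, the restarts see 700, 400 and then 100 ms, so the wait ends on schedule.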
     
  • do_poll() checks signal_pending() but returns 0 when interrupted. This means
    the caller has to check signal_pending() again.

    Change it to return -EINTR when signal_pending() and count == 0.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Davide Libenzi
    Cc: Vadim Lobanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
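The return-value rule above, as a small model: when no descriptor is ready and a signal is pending, report -EINTR directly, so callers do not have to re-check signal_pending(). demo_do_poll_result() is a hypothetical name for the tail of do_poll().

```c
#include <errno.h>

/* Hypothetical (demo_*) model: 'count' is the number of ready fds. */
static int demo_do_poll_result(int count, int signal_pending)
{
	if (!count && signal_pending)
		return -EINTR;
	return count;   /* ready fds win over a pending signal */
}
```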
     
  • Cleanup. Shrinks both the source and the compiled code (by 100 bytes)
    and, imho, makes the code much more understandable.

    With this patch "struct poll_list *head" always points to the on-stack
    stack_pps, so we can remove all "is it on-stack" and "was it initialized"
    checks.

    Also, move poll_initwait/poll_freewait and -EINTR detection closer to
    do_poll()'s call site.

    [akpm@linux-foundation.org: fix warning (size_t != uint)]
    Signed-off-by: Oleg Nesterov
    Looks-good-to: Andi Kleen
    Cc: Davide Libenzi
    Cc: Vadim Lobanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
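A simplified model of the list layout described above: the first chunk is always the caller's on-stack node, and only overflow chunks are heap-allocated. Freeing can therefore start at head->next with no "is it on-stack" checks. The fixed chunk size and the demo_* names are hypothetical. The kernel sizes its first chunk from a stack buffer rather than a fixed array.

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CHUNK 4   /* hypothetical entries per node */

struct demo_poll_list {
	struct demo_poll_list *next;
	int len;
	struct pollfd entries[DEMO_CHUNK];
};

/* Copy nfds entries into a chain whose first node is always 'head'. */
static int demo_build_list(struct demo_poll_list *head,
			   const struct pollfd *ufds, int nfds)
{
	struct demo_poll_list *walk = head;
	int done = 0;

	head->next = NULL;
	if (nfds < 0)
		return -1;
	for (;;) {
		int left = nfds - done;

		walk->next = NULL;
		walk->len = left < DEMO_CHUNK ? left : DEMO_CHUNK;
		if (walk->len)
			memcpy(walk->entries, ufds + done,
			       (size_t)walk->len * sizeof(struct pollfd));
		done += walk->len;
		if (done == nfds)
			return 0;
		walk->next = malloc(sizeof(*walk));
		if (!walk->next)
			return -1;   /* caller still frees what was built */
		walk = walk->next;
	}
}

/* Free every node after head; head itself lives on the caller's stack. */
static void demo_free_list(struct demo_poll_list *head)
{
	struct demo_poll_list *walk = head->next;

	while (walk) {
		struct demo_poll_list *next = walk->next;

		free(walk);
		walk = next;
	}
}
```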
     

12 Sep, 2007

1 commit

  • Taneli Vähäkangas reported that commit
    786d7e1612f0b0adb6046f19b906609e4fe8b1ba aka "Fix rmmod/read/write races
    in /proc entries" broke the SBCL + SLIME combination.

    The old code in do_select() used DEFAULT_POLLMASK if it couldn't find a
    ->poll handler. The new code makes ->poll always present and returns 0 by
    default, which is not correct. Return DEFAULT_POLLMASK instead.

    Steps to reproduce:

    install emacs, SBCL, SLIME
    emacs
    M-x slime in *inferior-lisp* buffer
    [watch it doing "Connecting to Swank on port X.."]

    Please, apply before 2.6.23.

    P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.

    Signed-off-by: Alexey Dobriyan
    Cc: T Taneli Vahakangas
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
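A model of the fallback: a file without a ->poll handler is reported as always readable and writable, rather than "no events" (0), which would make select() wait forever. The file struct and helpers are hypothetical (demo_*). The mask is written out as POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM, which is how the kernel's DEFAULT_POLLMASK is defined.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for POLLRDNORM / POLLWRNORM */
#endif
#include <poll.h>
#include <stddef.h>

#define DEMO_DEFAULT_POLLMASK (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM)

/* Hypothetical (demo_*) file with an optional poll method. */
struct demo_file;
typedef unsigned int (*demo_poll_fn)(struct demo_file *f);
struct demo_file {
	demo_poll_fn poll;
};

/* A ->poll that reports nothing, like the broken /proc default. */
static unsigned int demo_no_events(struct demo_file *f)
{
	(void)f;
	return 0;
}

static unsigned int demo_file_poll(struct demo_file *f)
{
	if (!f->poll)
		return DEMO_DEFAULT_POLLMASK;   /* always ready, like regular files */
	return f->poll(f);
}
```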
     

09 May, 2007

3 commits


11 Dec, 2006

1 commit

  • Currently, each fdtable supports three dynamically-sized arrays of data: the
    fdarray and two fdsets. The code allows the number of fds supported by the
    fdarray (fdtable->max_fds) to differ from the number of fds supported by each
    of the fdsets (fdtable->max_fdset).

    In practice, it is wasteful for these two sizes to differ: whenever we hit a
    limit on the smaller-capacity structure, we will reallocate the entire fdtable
    and all the dynamic arrays within it, so any delta in the memory used by the
    larger-capacity structure will never be touched at all.

    Rather than hogging this excess, we shouldn't even allocate it in the first
    place, and keep the capacities of the fdarray and the fdsets equal. This
    patch removes fdtable->max_fdset. As an added bonus, most of the supporting
    code becomes simpler.

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
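A simplified sketch of the resulting layout, where a single capacity, max_fds, sizes the descriptor array and both bitmaps. There is no separate max_fdset to drift out of step. The struct and functions are hypothetical (demo_*) and omit the real fdtable's RCU and locking details.

```c
#include <limits.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical (demo_*) fdtable: one capacity for all three arrays. */
struct demo_fdtable {
	unsigned int max_fds;
	void **fd;                     /* max_fds slots */
	unsigned long *open_fds;       /* max_fds bits */
	unsigned long *close_on_exec;  /* max_fds bits */
};

static int demo_alloc_fdtable(struct demo_fdtable *t, unsigned int nr)
{
	size_t words = (nr + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;

	t->max_fds = nr;
	t->fd = calloc(nr, sizeof(*t->fd));
	t->open_fds = calloc(words, sizeof(unsigned long));
	t->close_on_exec = calloc(words, sizeof(unsigned long));
	if (!t->fd || !t->open_fds || !t->close_on_exec) {
		free(t->fd);
		free(t->open_fds);
		free(t->close_on_exec);
		return -1;
	}
	return 0;
}

static void demo_free_fdtable(struct demo_fdtable *t)
{
	free(t->fd);
	free(t->open_fds);
	free(t->close_on_exec);
}
```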
     

30 Sep, 2006

1 commit

  • POSIX states that poll() shall fail with EINVAL if nfds > OPEN_MAX. In
    this context, POSIX is referring to sysconf(OPEN_MAX), which is the value
    of current->signal->rlim[RLIMIT_NOFILE].rlim_cur in the linux kernel, not
    the compile-time constant which happens to also be named OPEN_MAX. In the
    current code, an application may poll up to max_fdset file descriptors,
    even if this exceeds RLIMIT_NOFILE. The current code also breaks
    applications which poll more than max_fdset descriptors, which worked circa
    2.4.18 when the check was against NR_OPEN, which is 1024*1024. This patch
    enforces the limit precisely as POSIX defines, even if RLIMIT_NOFILE has
    been changed at run time with ulimit -n.

    To elaborate on the rationale for this, there are three cases:

    1) RLIMIT_NOFILE is at the default value of 1024

    In this (default) case, the patch changes nothing. Calls with nfds > 1024
    fail with EINVAL both before and after the patch, and calls with nfds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Snook
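A userspace model of the rule described above: reject nfds when it exceeds the current RLIMIT_NOFILE soft limit (the value sysconf(_SC_OPEN_MAX) reports), not a compile-time constant. Because the limit is read at call time, a later `ulimit -n` change is honoured. demo_check_poll_nfds() is a hypothetical name.

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical (demo_*): EINVAL when nfds exceeds the current soft limit. */
static int demo_check_poll_nfds(unsigned long nfds)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -errno;
	if (rl.rlim_cur != RLIM_INFINITY && nfds > rl.rlim_cur)
		return -EINVAL;
	return 0;
}
```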
     

26 Jun, 2006

1 commit

  • If you do a poll() call with timeout -1, the wait will be a large but
    finite number of jiffies (depending on HZ) instead of an infinite wait,
    since -1 is passed to the msecs_to_jiffies function.

    Signed-off-by: Frode Isaksen
    Acked-by: Nishanth Aravamudan
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frode Isaksen
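A sketch of the intended mapping, assuming HZ=250 and a round-up conversion (both illustrative). Any negative timeout maps to the "sleep forever" sentinel, which is LONG_MAX in the kernel, before the ms-to-jiffies conversion can misread -1 as a huge unsigned value. The demo_* names are hypothetical, and the product is computed in long long so it cannot overflow a 32-bit long.

```c
#include <limits.h>

#define DEMO_HZ 250
#define DEMO_MAX_SCHEDULE_TIMEOUT LONG_MAX   /* "sleep forever" sentinel */

/* Hypothetical (demo_*): poll() timeout in ms -> jiffies to wait. */
static long demo_poll_timeout(int timeout_msecs)
{
	if (timeout_msecs < 0)
		return DEMO_MAX_SCHEDULE_TIMEOUT;   /* infinite wait */
	return (long)(((long long)timeout_msecs * DEMO_HZ + 999) / 1000);
}
```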
     

23 Jun, 2006

1 commit

  • The "count" and "pt" variables are declared and modified by do_poll(), as
    well as accessed and written indirectly in the do_pollfd() subroutine.

    This patch pulls all handling of these variables into the do_poll()
    function, thereby eliminating the odd use of indirection in do_pollfd().
    This is done by pulling the "struct pollfd" traversal loop from do_pollfd()
    into its only caller do_poll(). As an added bonus, the patch saves a few
    clock cycles, and also adds comments to make the code easier to follow.

    Signed-off-by: Vadim Lobanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
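A model of the restructured loop: the per-descriptor helper handles exactly one struct pollfd and just returns its mask, while the caller owns the traversal and the count. No counters are passed by pointer. demo_fake_poll() is a hypothetical stand-in for a file's ->poll (it pretends even-numbered fds are readable). As with poll(), negative fds are skipped with revents set to 0.

```c
#include <poll.h>

/* Hypothetical (demo_*): pretend even-numbered fds are readable. */
static short demo_fake_poll(int fd)
{
	return (fd % 2 == 0) ? POLLIN : 0;
}

/* Handles exactly one pollfd and returns its mask. */
static short demo_do_pollfd(struct pollfd *pfd)
{
	short mask = 0;

	if (pfd->fd >= 0)
		mask = demo_fake_poll(pfd->fd) & (pfd->events | POLLERR | POLLHUP);
	pfd->revents = mask;
	return mask;
}

/* The traversal and the count live in the caller. */
static int demo_do_poll(struct pollfd *fds, int nfds)
{
	int count = 0;

	for (int i = 0; i < nfds; i++)
		if (demo_do_pollfd(&fds[i]))
			count++;
	return count;
}
```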