08 Feb, 2013

1 commit


27 Sep, 2012

2 commits


27 Jul, 2012

1 commit

  • Recently, glibc made a change to suppress sign-conversion warnings in
    FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the
    kernel's definition of __NFDBITS if applications #include
    after including . A build failure would
    be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
    flags to gcc.

    It was suggested that the kernel should either match the glibc
    definition of __NFDBITS or remove that entirely. The current in-kernel
    uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
    uses of the related __FDELT and __FDMASK defines. Given that, we'll
    continue the cleanup that was started with commit 8b3d1cda4f5f
    ("posix_types: Remove fd_set macros") and drop the remaining unused
    macros.

    Additionally, linux/time.h has similar macros defined that expand to
    nothing so we'll remove those at the same time.

    Reported-by: Jeff Law
    Suggested-by: Linus Torvalds
    CC:
    Signed-off-by: Josh Boyer
    [ .. and fix up whitespace as per akpm ]
    Signed-off-by: Linus Torvalds

    Josh Boyer
     

02 Jun, 2012

1 commit


30 Mar, 2012

1 commit

  • Pull x32 support for x86-64 from Ingo Molnar:
    "This tree introduces the X32 binary format and execution mode for x86:
    32-bit data space binaries using 64-bit instructions and 64-bit kernel
    syscalls.

    This allows applications whose working set fits into a 32 bits address
    space to make use of 64-bit instructions while using a 32-bit address
    space with shorter pointers, more compressed data structures, etc."

    Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

    * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    x32: Fix alignment fail in struct compat_siginfo
    x32: Fix stupid ia32/x32 inversion in the siginfo format
    x32: Add ptrace for x32
    x32: Switch to a 64-bit clock_t
    x32: Provide separate is_ia32_task() and is_x32_task() predicates
    x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
    x86/x32: Fix the binutils auto-detect
    x32: Warn and disable rather than error if binutils too old
    x32: Only clear TIF_X32 flag once
    x32: Make sure TS_COMPAT is cleared for x32 tasks
    fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
    fs: Fix close_on_exec pointer in alloc_fdtable
    x32: Drop non-__vdso weak symbols from the x32 VDSO
    x32: Fix coding style violations in the x32 VDSO code
    x32: Add x32 VDSO support
    x32: Allow x32 to be configured
    x32: If configured, add x32 system calls to system call tables
    x32: Handle process creation
    x32: Signal-related system calls
    x86: Add #ifdef CONFIG_COMPAT to
    ...

    Linus Torvalds
     

25 Mar, 2012

1 commit

  • Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
    "Fix up files in fs/ and lib/ dirs to only use module.h if they really
    need it.

    These are trivial in scope vs the work done previously. We now have
    things where any few remaining cleanups can be farmed out to arch or
    subsystem maintainers, and I have done so when possible. What is
    remaining here represents the bits that don't clearly lie within a
    single arch/subsystem boundary, like the fs dir and the lib dir.

    Some duplicate includes arising from overlapping fixes from
    independent subsystem maintainer submissions are also quashed."

    Fix up trivial conflicts due to clashes with other include file cleanups
    (including some due to the previous bug.h cleanup pull).

    * tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    lib: reduce the use of module.h wherever possible
    fs: reduce the use of module.h wherever possible
    includecheck: delete any duplicate instances of module.h

    Linus Torvalds
     

24 Mar, 2012

1 commit

  • In some cases the poll() implementation in a driver has to do different
    things depending on the events the caller wants to poll for. An example
    is when a driver needs to start a DMA engine if the caller polls for
    POLLIN, but doesn't want to do that if POLLIN is not requested but instead
    only POLLOUT or POLLPRI is requested. This is something that can happen
    in the video4linux subsystem among others.

    Unfortunately, the current epoll/poll/select implementation doesn't
    provide that information reliably. The poll_table_struct does have it: it
    has a key field with the event mask. But once a poll() call matches one
    or more bits of that mask any following poll() calls are passed a NULL
    poll_table pointer.

    Also, the eventpoll implementation always left the key field at ~0 instead
    of using the requested events mask.

    This was changed in eventpoll.c so the key field now contains the actual
    events that should be polled for as set by the caller.

    The solution to the NULL poll_table pointer is to set the qproc field to
    NULL in poll_table once poll() matches the events, not the poll_table
    pointer itself. That way drivers can obtain the mask through a new
    poll_requested_events inline.

    The poll_table_struct can still be NULL since some kernel code calls it
    internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In
    that case poll_requested_events() returns ~0 (i.e. all events).

    Very rarely drivers might want to know whether poll_wait will actually
    wait. If another earlier file descriptor in the set already matched the
    events the caller wanted to wait for, then the kernel will return from the
    select() call without waiting. This might be useful information in order
    to avoid doing expensive work.

    A new helper function poll_does_not_wait() is added that drivers can use
    to detect this situation. This is now used in sock_poll_wait() in
    include/net/sock.h. This was the only place in the kernel that needed
    this information.

    Drivers should no longer access any of the poll_table internals, but use
    the poll_requested_events() and poll_does_not_wait() access functions
    instead. In order to enforce that the poll_table fields are now prepended
    with an underscore and a comment was added warning against using them
    directly.

    This required a change in unix_dgram_poll() in unix/af_unix.c which used
    the key field to get the requested events. It's been replaced by a call
    to poll_requested_events().

    For qproc it was especially important to change its name since the
    behavior of that field changes with this patch since this function pointer
    can now be NULL when that wasn't possible in the past.

    Any driver accessing the qproc or key fields directly will now fail to compile.

    Some notes regarding the correctness of this patch: the driver's poll()
    function is called with a 'struct poll_table_struct *wait' argument. This
    pointer may or may not be NULL, drivers can never rely on it being one or
    the other as that depends on whether or not an earlier file descriptor in
    the select()'s fdset matched the requested events.

    There are only three things a driver can do with the wait argument:

    1) obtain the key field:

    events = wait ? wait->key : ~0;

    This will still work although it should be replaced with the new
    poll_requested_events() function (which does exactly the same).
    This will now even work better, since wait is no longer set to NULL
    unnecessarily.

    2) use the qproc callback. This could be deadly since qproc can now be
    NULL. Renaming qproc should prevent this from happening. There are no
    kernel drivers that actually access this callback directly, BTW.

    3) test whether wait == NULL to determine whether poll would return without
    waiting. This is no longer sufficient as the correct test is now
    wait == NULL || wait->_qproc == NULL.

    However, the worst that can happen here is a slight performance hit in
    the case where wait != NULL and wait->_qproc == NULL. In that case the
    driver will assume that poll_wait() will actually add the fd to the set
    of waiting file descriptors. Of course, poll_wait() will not do that
    since it tests for wait->_qproc. This will not break anything, though.

    There is only one place in the whole kernel where this happens
    (sock_poll_wait() in include/net/sock.h) and that code will be replaced
    by a call to poll_does_not_wait() in the next patch.

    Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
    actually waiting. The next file descriptor from the set might match the
    event mask and thus any possible waits will never happen.

    Signed-off-by: Hans Verkuil
    Reviewed-by: Jonathan Corbet
    Reviewed-by: Al Viro
    Cc: Davide Libenzi
    Signed-off-by: Hans de Goede
    Cc: Mauro Carvalho Chehab
    Cc: David Miller
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans Verkuil
     

29 Feb, 2012

1 commit


22 Feb, 2012

1 commit

  • The 'poll()' system call timeout parameter is supposed to be 'int', not
    'long'.

    Now, the reason this matters is that right now 32-bit compat mode is
    broken on at least x86-64, because the 32-bit code just calls
    'sys_poll()' directly on x86-64, and the 32-bit argument will have been
    zero-extended, turning a signed 'int' into a large unsigned 'long'
    value.

    We could just introduce a 'compat_sys_poll()' function for this, and
    that may eventually be what we have to do, but since the actual standard
    poll() semantics is *supposed* to be 'int', and since at least on x86-64
    glibc sign-extends the argument before invocing the system call (so
    nobody can actually use a 64-bit timeout value in user space _anyway_,
    even in 64-bit binaries), the simpler solution would seem to be to just
    fix the definition of the system call to match what it should have been
    from the very start.

    If it turns out that somebody somehow circumvents the user-level libc
    64-bit sign extension and actually uses a large unsigned 64-bit timeout
    despite that not being how poll() is supposed to work, we will need to
    do the compat_sys_poll() approach.

    Reported-by: Thomas Meyer
    Acked-by: Eric Dumazet
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

20 Feb, 2012

1 commit

  • Replace the fd_sets in struct fdtable with an array of unsigned longs and then
    use the standard non-atomic bit operations rather than the FD_* macros.

    This:

    (1) Removes the abuses of struct fd_set:

    (a) Since we don't want to allocate a full fd_set the vast majority of the
    time, we actually, in effect, just allocate a just-big-enough array of
    unsigned longs and cast it to an fd_set type - so why bother with the
    fd_set at all?

    (b) Some places outside of the core fdtable handling code (such as
    SELinux) want to look inside the array of unsigned longs hidden inside
    the fd_set struct for more efficient iteration over the entire set.

    (2) Eliminates the use of FD_*() macros in the kernel completely.

    (3) Permits the __FD_*() macros to be deleted entirely where not exposed to
    userspace.

    Signed-off-by: David Howells
    Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
    Signed-off-by: H. Peter Anvin
    Cc: Al Viro

    David Howells
     

21 Mar, 2011

1 commit


14 Jan, 2011

1 commit

  • On some architectures __kernel_suseconds_t is int. On these archs struct
    timeval has padding bytes at the end. This struct is copied to userspace
    with these padding bytes uninitialized. This leads to leaking of contents
    of kernel stack memory.

    This bug was added with v2.6.27-rc5-286-gb773ad4.

    [akpm@linux-foundation.org: avoid the memset on architectures which don't need it]
    Signed-off-by: Vasiliy Kulikov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

28 Oct, 2010

2 commits


13 Mar, 2010

1 commit

  • Add a generic implementation of the old select() syscall, which expects
    its argument in a memory block and switch all architectures over to use
    it.

    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Reviewed-by: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Acked-by: Andreas Schwab
    Acked-by: Russell King
    Acked-by: Greg Ungerer
    Acked-by: David Howells
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

07 Mar, 2010

1 commit

  • Make sure compiler won't do weird things with limits. E.g. fetching them
    twice may return 2 different values after writable limits are implemented.

    I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
    add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.

    Signed-off-by: Jiri Slaby
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

05 Oct, 2009

1 commit


23 Sep, 2009

1 commit

  • __estimate_accuracy() was prone to integer overflow, for example if *tv ==
    {2147, 483648000} on a 32 bit computer (or even for delays as small as
    {429, 500000000} if the task is niced).

    Because the result was already forced between 0 and 100ms, the effect of
    the overflow was not too problematic, but the use of the hrtimer range
    feature was not optimal in overflow cases.

    This patch ensures that there can not be an integer overflow in this
    function.

    Signed-off-by: Guillaume Knispel
    Cc: Alexander Viro
    Cc: Arjan van de Ven
    Cc: Thomas Gleixner
    Cc: Heiko Carstens
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
     

16 Aug, 2009

1 commit

  • The triggered field of struct poll_wqueues introduced in commit
    5f820f648c92a5ecc771a96b3c29aa6e90013bba ("poll: allow f_op->poll to
    sleep").

    It was first set to 1 in pollwake() (now __pollwake() ), tested and
    later set to 0 in poll_schedule_timeout(), but not initialized before.

    As a result when the process needs to sleep, triggered was likely to be
    non-zero even if pollwake() is not called before the first
    poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be
    called and an extra loop calling all ->poll() would be done.

    This patch initialize triggered to 0 in poll_initwait() so the ->poll()
    are not called twice before the process goes to sleep when it needs to.

    Signed-off-by: Guillaume Knispel
    Acked-by: Thomas Gleixner
    Acked-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Guillaume Knispel
     

17 Jun, 2009

1 commit

  • After introduction of keyed wakeups Davide Libenzi did on epoll, we are
    able to avoid spurious wakeups in poll()/select() code too.

    For example, typical use of poll()/select() is to wait for incoming
    network frames on many sockets. But TX completion for UDP/TCP frames call
    sock_wfree() which in turn schedules thread.

    When scheduled, thread does a full scan of all polled fds and can sleep
    again, because nothing is really available. If number of fds is large,
    this cause significant load.

    This patch makes select()/poll() aware of keyed wakeups and useless
    wakeups are avoided. This reduces number of context switches by about 50%
    on some setups, and work performed by sofirq handlers.

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Acked-by: Andi Kleen
    Acked-by: Ingo Molnar
    Acked-by: Davide Libenzi
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

14 Jan, 2009

4 commits

  • Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Not a single architecture has wired up sys_pselect7 plus it is the
    only system call with seven parameters. Just make it static and
    rename it to do_pselect which will do the work for sys_pselect6.

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
    seen occasional 5-second hangs from telnet. telnetd calls select with a
    NULL timeout, but with the new kernel, the system call occasionally
    returns 0, which causes telnet to call sleep (5). This did not happen
    with earlier kernels.

    The code in sys_pselect7 looks a bit strange, in particular the variable
    "to" is initialized to NULL, then changed if a non-null timeout was
    passed in, but not used further. It needs to be passed to
    core_sys_select instead of &end_time.

    This bug was introduced by 8ff3e8e85fa6c312051134b3953e397feb639f51
    ("select: switch select() and poll() over to hrtimers").

    Signed-off-by: Bernd Schmidt
    Reviewed-by: Ulrich Drepper
    Tested-by: Robin Getz
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Bernd Schmidt
     

07 Jan, 2009

1 commit

  • f_op->poll is the only vfs operation which is not allowed to sleep. It's
    because poll and select implementation used task state to synchronize
    against wake ups, which doesn't have to be the case anymore as wait/wake
    interface can now use custom wake up functions. The non-sleep restriction
    can be a bit tricky because ->poll is not called from an atomic context
    and the result of accidentally sleeping in ->poll only shows up as
    temporary busy looping when the timing is right or rather wrong.

    This patch converts poll/select to use custom wake up function and use
    separate triggered variable to synchronize against wake up events. The
    only added overhead is an extra function call during wake up and
    negligible.

    This patch removes the one non-sleep exception from vfs locking rules and
    is beneficial to userland filesystem implementations like FUSE, 9p or
    peculiar fs like spufs as it's very difficult for those to implement
    non-sleeping poll method.

    While at it, make the following cosmetic changes to make poll.h and
    select.c checkpatch friendly.

    * s/type * symbol/type *symbol/ : three places in poll.h
    * remove blank line before EXPORT_SYMBOL() : two places in select.c

    Oleg: spotted missing barrier in poll_schedule_timeout()
    Davide: spotted missing write barrier in pollwake()

    Signed-off-by: Tejun Heo
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Signed-off-by: Miklos Szeredi
    Cc: Davide Libenzi
    Cc: Brad Boyer
    Cc: Al Viro
    Cc: Roland McGrath
    Cc: Mauro Carvalho Chehab
    Signed-off-by: Andrew Morton
    Cc: Davide Libenzi
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

27 Oct, 2008

1 commit

  • Some userland apps seem to pass in a "0" for the seconds, and several
    seconds worth of usecs to select(). The old kernels accepted this just
    fine, so the new kernels must too.

    However, due to the upscaling of the microseconds to nanoseconds we had
    some cases where we got math overflow, and depending on the GCC version
    (due to inlining decisions) that actually resulted in an -EINVAL return.

    This patch fixes this by adding the excess microseconds to the seconds
    field.

    Also with thanks to Marcin Slusarz for spotting some implementation bugs
    in the diagnostics patches.

    Reported-by: Carlos R. Mafra
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

08 Sep, 2008

2 commits


06 Sep, 2008

3 commits

  • This patch makes the select() and poll() hrtimers use the new range
    feature and settings from the task struct.

    In addition, this includes the estimate_accuracy() function that Linus
    posted to lkml, but changed entirely based on other peoples lkml feedback.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     
  • With lots of help, input and cleanups from Thomas Gleixner

    This patch switches select() and poll() over to hrtimers.

    The core of the patch is replacing the "s64 timeout" with a
    "struct timespec end_time" in all the plumbing.

    But most of the diffstat comes from using the just introduced helpers:
    poll_select_set_timeout
    poll_select_copy_remaining
    timespec_add_safe
    which make manipulating the timespec easier and less error-prone.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     
  • This patch adds 2 helpers that will be used for the hrtimer based select/poll:

    poll_select_set_timeout() is a helper that takes a timeout (as a second, nanosecond
    pair) and turns that into a "struct timespec" that represents the absolute end time.
    This is a common operation in the many select() and poll() variants and needs various,
    common, sanity checks.

    poll_select_copy_remaining() is a helper that takes care of copying the remaining
    time to userspace, as select(), pselect() and ppoll() do. This function comes in
    both a natural and a compat implementation (due to datastructure differences).

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Arjan van de Ven

    Thomas Gleixner
     

23 Jun, 2008

1 commit

  • Christian Borntraeger reported that reinstating cond_resched() with
    CONFIG_PREEMPT caused a performance regression on lmbench:

    For example select file 500:
    23 microseconds
    32 microseconds

    and that's really because we totally unnecessarily do the cond_resched()
    in the innermost loop of select(), which is just silly.

    This moves it out from the innermost loop (which only ever loops ove the
    bits in a single "unsigned long" anyway), which makes the performance
    regression go away.

    Reported-and-tested-by: Christian Borntraeger
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

02 May, 2008

2 commits


30 Apr, 2008

2 commits

  • Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
    #ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic
    set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This adds the set_restore_sigmask() inline in and
    replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
    change, but abstracts the details of the flag protocol from all the calls.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

22 Apr, 2008

1 commit

  • These are small cleanups all over the tree.

    Trivial style and comment changes to
    fs/select.c, kernel/signal.c, kernel/stop_machine.c & mm/pdflush.c

    Signed-off-by: Pavel Machek
    Signed-off-by: Jesper Juhl

    Pavel Machek
     

07 Feb, 2008

1 commit

  • schedule_timeout(jiffies) waits for at least jiffies - 1. Add 1 jiffie to
    the timeout_jiffies calculated in sys_poll() to wait at least
    timeout_msecs, like poll() manpage says.

    Signed-off-by: Karsten Wiese
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Karsten Wiese
     

20 Oct, 2007

1 commit