13 Nov, 2013
1 commit
-
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixesplus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
30 Oct, 2013
1 commit
-
This reverts commit 9745cdb36da8 (select: use freezable blocking call)
that triggers problems during resume from suspend to RAM on Paul Bolle's
32-bit x86 machines. Paul says:Ever since I tried running (release candidates of) v3.11 on the two
working i686s I still have lying around I ran into issues on resuming
from suspend. Reverting 9745cdb36da8 (select: use freezable blocking
call) resolves those issues.Resuming from suspend on i686 on (release candidates of) v3.11 and
later triggers issues like:traps: systemd[1] general protection ip:b738e490 sp:bf882fc0 error:0 in libc-2.16.so[b731c000+1b0000]
and
traps: rtkit-daemon[552] general protection ip:804d6e5 sp:b6cb32f0 error:0 in rtkit-daemon[8048000+d000]
Once I hit the systemd error I can only get out of the mess that the
system is at that point by power cycling it.Since we are reverting another freezer-related change causing similar
problems to happen, this one should be reverted as well.References: https://lkml.org/lkml/2013/10/29/583
Reported-by: Paul Bolle
Fixes: 9745cdb36da8 (select: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki
Cc: 3.11+ # 3.11+
25 Oct, 2013
1 commit
-
Signed-off-by: Al Viro
11 Jul, 2013
1 commit
-
Rename the file and correct all the places where it is included.
Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
10 Jul, 2013
2 commits
-
Pull networking updates from David Miller:
"This is a re-do of the net-next pull request for the current merge
window. The only difference from the one I made the other day is that
this has Eliezer's interface renames and the timeout handling changes
made based upon your feedback, as well as a few bug fixes that have
trickeled in.Highlights:
1) Low latency device polling, eliminating the cost of interrupt
handling and context switches. Allows direct polling of a network
device from socket operations, such as recvmsg() and poll().Currently ixgbe, mlx4, and bnx2x support this feature.
Full high level description, performance numbers, and design in
commit 0a4db187a999 ("Merge branch 'll_poll'")From Eliezer Tamir.
2) With the routing cache removed, ip_check_mc_rcu() gets exercised
more than ever before in the case where we have lots of multicast
addresses. Use a hash table instead of a simple linked list, from
Eric Dumazet.3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
Marek Puzyniak, Michal Kazior, and Sujith Manoharan.4) Support reporting the TUN device persist flag to userspace, from
Pavel Emelyanov.5) Allow controlling network device VF link state using netlink, from
Rony Efraim.6) Support GRE tunneling in openvswitch, from Pravin B Shelar.
7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
Daniel Borkmann and Eric Dumazet.8) Allow controlling of TCP quickack behavior on a per-route basis,
from Cong Wang.9) Several bug fixes and improvements to vxlan from Stephen
Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
support receiving on multiple UDP ports.10) Major cleanups, particular in the area of debugging and cookie
lifetime handline, to the SCTP protocol code. From Daniel
Borkmann.11) Allow packets to cross network namespaces when traversing tunnel
devices. From Nicolas Dichtel.12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
manner akin to how we monitor real network traffic via ptype_all.
From Daniel Borkmann.13) Several bug fixes and improvements for the new alx device driver,
from Johannes Berg.14) Fix scalability issues in the netem packet scheduler's time queue,
by using an rbtree. From Eric Dumazet.15) Several bug fixes in TCP loss recovery handling, from Yuchung
Cheng.16) Add support for GSO segmentation of MPLS packets, from Simon
Horman.17) Make network notifiers have a real data type for the opaque
pointer that's passed into them. Use this to properly handle
network device flag changes in arp_netdev_event(). From Jiri
Pirko and Timo Teräs.18) Convert several drivers over to module_pci_driver(), from Peter
Huewe.19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
O(1) calculation instead. From Eric Dumazet.20) Support setting of explicit tunnel peer addresses in ipv6, just
like ipv4. From Nicolas Dichtel.21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.
22) Prevent a single high rate flow from overruning an individual cpu
during RX packet processing via selective flow shedding. From
Willem de Bruijn.23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
Dumazet.24) Don't just drop GSO packets which are above the TBF scheduler's
burst limit, chop them up so they are in-bounds instead. Also
from Eric Dumazet.25) VLAN offloads are missed when configured on top of a bridge, fix
from Vlad Yasevich.26) Support IPV6 in ping sockets. From Lorenzo Colitti.
27) Receive flow steering targets should be updated at poll() time
too, from David Majnemer.28) Fix several corner case regressions in PMTU/redirect handling due
to the routing cache removal, from Timo Teräs.29) We have to be mindful of ipv4 mapped ipv6 sockets in
upd_v6_push_pending_frames(). From Hannes Frederic Sowa.30) Fix L2TP sequence number handling bugs, from James Chapman."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
drivers/net: caif: fix wrong rtnl_is_locked() usage
drivers/net: enic: release rtnl_lock on error-path
vhost-net: fix use-after-free in vhost_net_flush
net: mv643xx_eth: do not use port number as platform device id
net: sctp: confirm route during forward progress
virtio_net: fix race in RX VQ processing
virtio: support unlocked queue poll
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Documentation: Fix references to defunct linux-net@vger.kernel.org
net/fs: change busy poll time accounting
net: rename low latency sockets functions to busy poll
bridge: fix some kernel warning in multicast timer
sfc: Fix memory leak when discarding scattered packets
sit: fix tunnel update via netlink
dt:net:stmmac: Add dt specific phy reset callback support.
dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
dt:net:stmmac: Allocate platform data only if its NULL.
net:stmmac: fix memleak in the open method
ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
net: ipv6: fix wrong ping_v6_sendmsg return value
... -
Suggested by Linus:
Changed time accounting for busy-poll:
- Make it microsecond based.
- Use unsigned longs.
- Revert back to use time_after instead of time_in_range.
Reorder poll/select busy loop conditions:
- Clear busy_flag after one time we can't busy-poll.
- Only init busy_end if we actually are going to busy-poll.
Added one more missing need_resched() test.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
09 Jul, 2013
1 commit
-
Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
03 Jul, 2013
1 commit
-
Time in range will fail safely if we move to a different cpu with an
extremely large clock skew.
Add time_in_range64() and convert lls to use it.changelog:
v2
- fixed double call to sched_clock in can_poll_ll
- fixed checkpatchismsSigned-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
02 Jul, 2013
1 commit
-
Change Low Latency Sockets code for select and poll so that
when LLS is disabled sched_clock() is never called.Also, avoid sending POLL_LL to sockets if disabled.
Reported-by: Andi Kleen
Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
26 Jun, 2013
1 commit
-
select/poll busy-poll support.
Split sysctl value into two separate ones, one for read and one for poll.
updated Documentation/sysctl/net.txtAdd a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
12 May, 2013
1 commit
-
Avoid waking up every thread sleeping in a select call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.Acked-by: Tejun Heo
Signed-off-by: Colin Cross
Signed-off-by: Rafael J. Wysocki
08 Feb, 2013
1 commit
-
Move rt scheduler definitions out of include/linux/sched.h into
new file include/linux/sched/rt.hSigned-off-by: Clark Williams
Cc: Peter Zijlstra
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar
27 Sep, 2012
2 commits
-
Signed-off-by: Al Viro
-
simplifies a bunch of callers...
Signed-off-by: Al Viro
27 Jul, 2012
1 commit
-
Recently, glibc made a change to suppress sign-conversion warnings in
FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the
kernel's definition of __NFDBITS if applications #include
after including . A build failure would
be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
flags to gcc.It was suggested that the kernel should either match the glibc
definition of __NFDBITS or remove that entirely. The current in-kernel
uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
uses of the related __FDELT and __FDMASK defines. Given that, we'll
continue the cleanup that was started with commit 8b3d1cda4f5f
("posix_types: Remove fd_set macros") and drop the remaining unused
macros.Additionally, linux/time.h has similar macros defined that expand to
nothing so we'll remove those at the same time.Reported-by: Jeff Law
Suggested-by: Linus Torvalds
CC:
Signed-off-by: Josh Boyer
[ .. and fix up whitespace as per akpm ]
Signed-off-by: Linus Torvalds
02 Jun, 2012
1 commit
-
Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK
and picks default set_restore_sigmask() in linux/thread_info.h. Kill the
ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones
get merged.Signed-off-by: Al Viro
30 Mar, 2012
1 commit
-
Pull x32 support for x86-64 from Ingo Molnar:
"This tree introduces the X32 binary format and execution mode for x86:
32-bit data space binaries using 64-bit instructions and 64-bit kernel
syscalls.This allows applications whose working set fits into a 32 bits address
space to make use of 64-bit instructions while using a 32-bit address
space with shorter pointers, more compressed data structures, etc."Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}
* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
x32: Fix alignment fail in struct compat_siginfo
x32: Fix stupid ia32/x32 inversion in the siginfo format
x32: Add ptrace for x32
x32: Switch to a 64-bit clock_t
x32: Provide separate is_ia32_task() and is_x32_task() predicates
x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
x86/x32: Fix the binutils auto-detect
x32: Warn and disable rather than error if binutils too old
x32: Only clear TIF_X32 flag once
x32: Make sure TS_COMPAT is cleared for x32 tasks
fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
fs: Fix close_on_exec pointer in alloc_fdtable
x32: Drop non-__vdso weak symbols from the x32 VDSO
x32: Fix coding style violations in the x32 VDSO code
x32: Add x32 VDSO support
x32: Allow x32 to be configured
x32: If configured, add x32 system calls to system call tables
x32: Handle process creation
x32: Signal-related system calls
x86: Add #ifdef CONFIG_COMPAT to
...
25 Mar, 2012
1 commit
-
Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
"Fix up files in fs/ and lib/ dirs to only use module.h if they really
need it.These are trivial in scope vs the work done previously. We now have
things where any few remaining cleanups can be farmed out to arch or
subsystem maintainers, and I have done so when possible. What is
remaining here represents the bits that don't clearly lie within a
single arch/subsystem boundary, like the fs dir and the lib dir.Some duplicate includes arising from overlapping fixes from
independent subsystem maintainer submissions are also quashed."Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
lib: reduce the use of module.h wherever possible
fs: reduce the use of module.h wherever possible
includecheck: delete any duplicate instances of module.h
24 Mar, 2012
1 commit
-
In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for. An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested. This is something that can happen
in the video4linux subsystem among others.Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably. The poll_table_struct does have it: it
has a key field with the event mask. But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table pointer.Also, the eventpoll implementation always left the key field at ~0 instead
of using the requested events mask.This was changed in eventpoll.c so the key field now contains the actual
events that should be polled for as set by the caller.The solution to the NULL poll_table pointer is to set the qproc field to
NULL in poll_table once poll() matches the events, not the poll_table
pointer itself. That way drivers can obtain the mask through a new
poll_requested_events inline.The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In
that case poll_requested_events() returns ~0 (i.e. all events).Very rarely drivers might want to know whether poll_wait will actually
wait. If another earlier file descriptor in the set already matched the
events the caller wanted to wait for, then the kernel will return from the
select() call without waiting. This might be useful information in order
to avoid doing expensive work.A new helper function poll_does_not_wait() is added that drivers can use
to detect this situation. This is now used in sock_poll_wait() in
include/net/sock.h. This was the only place in the kernel that needed
this information.Drivers should no longer access any of the poll_table internals, but use
the poll_requested_events() and poll_does_not_wait() access functions
instead. In order to enforce that the poll_table fields are now prepended
with an underscore and a comment was added warning against using them
directly.This required a change in unix_dgram_poll() in unix/af_unix.c which used
the key field to get the requested events. It's been replaced by a call
to poll_requested_events().For qproc it was especially important to change its name since the
behavior of that field changes with this patch since this function pointer
can now be NULL when that wasn't possible in the past.Any driver accessing the qproc or key fields directly will now fail to compile.
Some notes regarding the correctness of this patch: the driver's poll()
function is called with a 'struct poll_table_struct *wait' argument. This
pointer may or may not be NULL, drivers can never rely on it being one or
the other as that depends on whether or not an earlier file descriptor in
the select()'s fdset matched the requested events.There are only three things a driver can do with the wait argument:
1) obtain the key field:
events = wait ? wait->key : ~0;
This will still work although it should be replaced with the new
poll_requested_events() function (which does exactly the same).
This will now even work better, since wait is no longer set to NULL
unnecessarily.2) use the qproc callback. This could be deadly since qproc can now be
NULL. Renaming qproc should prevent this from happening. There are no
kernel drivers that actually access this callback directly, BTW.3) test whether wait == NULL to determine whether poll would return without
waiting. This is no longer sufficient as the correct test is now
wait == NULL || wait->_qproc == NULL.However, the worst that can happen here is a slight performance hit in
the case where wait != NULL and wait->_qproc == NULL. In that case the
driver will assume that poll_wait() will actually add the fd to the set
of waiting file descriptors. Of course, poll_wait() will not do that
since it tests for wait->_qproc. This will not break anything, though.There is only one place in the whole kernel where this happens
(sock_poll_wait() in include/net/sock.h) and that code will be replaced
by a call to poll_does_not_wait() in the next patch.Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
actually waiting. The next file descriptor from the set might match the
event mask and thus any possible waits will never happen.Signed-off-by: Hans Verkuil
Reviewed-by: Jonathan Corbet
Reviewed-by: Al Viro
Cc: Davide Libenzi
Signed-off-by: Hans de Goede
Cc: Mauro Carvalho Chehab
Cc: David Miller
Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Feb, 2012
1 commit
-
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.Signed-off-by: Paul Gortmaker
22 Feb, 2012
1 commit
-
The 'poll()' system call timeout parameter is supposed to be 'int', not
'long'.Now, the reason this matters is that right now 32-bit compat mode is
broken on at least x86-64, because the 32-bit code just calls
'sys_poll()' directly on x86-64, and the 32-bit argument will have been
zero-extended, turning a signed 'int' into a large unsigned 'long'
value.We could just introduce a 'compat_sys_poll()' function for this, and
that may eventually be what we have to do, but since the actual standard
poll() semantics is *supposed* to be 'int', and since at least on x86-64
glibc sign-extends the argument before invocing the system call (so
nobody can actually use a 64-bit timeout value in user space _anyway_,
even in 64-bit binaries), the simpler solution would seem to be to just
fix the definition of the system call to match what it should have been
from the very start.If it turns out that somebody somehow circumvents the user-level libc
64-bit sign extension and actually uses a large unsigned 64-bit timeout
despite that not being how poll() is supposed to work, we will need to
do the compat_sys_poll() approach.Reported-by: Thomas Meyer
Acked-by: Eric Dumazet
Signed-off-by: Linus Torvalds
20 Feb, 2012
1 commit
-
Replace the fd_sets in struct fdtable with an array of unsigned longs and then
use the standard non-atomic bit operations rather than the FD_* macros.This:
(1) Removes the abuses of struct fd_set:
(a) Since we don't want to allocate a full fd_set the vast majority of the
time, we actually, in effect, just allocate a just-big-enough array of
unsigned longs and cast it to an fd_set type - so why bother with the
fd_set at all?(b) Some places outside of the core fdtable handling code (such as
SELinux) want to look inside the array of unsigned longs hidden inside
the fd_set struct for more efficient iteration over the entire set.(2) Eliminates the use of FD_*() macros in the kernel completely.
(3) Permits the __FD_*() macros to be deleted entirely where not exposed to
userspace.Signed-off-by: David Howells
Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin
Cc: Al Viro
21 Mar, 2011
1 commit
-
Remove the leftover from the commit 8ff3e8e85fa6 ("select:
switch select() and poll() over to hrtimers").Signed-off-by: Namhyung Kim
Acked-by: Arjan van de Ven
Signed-off-by: Al Viro
14 Jan, 2011
1 commit
-
On some architectures __kernel_suseconds_t is int. On these archs struct
timeval has padding bytes at the end. This struct is copied to userspace
with these padding bytes uninitialized. This leads to leaking of contents
of kernel stack memory.This bug was added with v2.6.27-rc5-286-gb773ad4.
[akpm@linux-foundation.org: avoid the memset on architectures which don't need it]
Signed-off-by: Vasiliy Kulikov
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Oct, 2010
2 commits
-
This make epoll use hrtimers for the timeout value which prevents
epoll_wait() from timing out up to a millisecond early.This mirrors the behavior of select() and poll().
Signed-off-by: Shawn Bohrer
Cc: Al Viro
Acked-by: Davide Libenzi
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make it a subsystem-specific identifier because we wish to amke it
non-static in the next patch ("epoll: make epoll_wait() use the hrtimer
range feature").Cc: Shawn Bohrer
Cc: Al Viro
Cc: Davide Libenzi
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Mar, 2010
1 commit
-
Add a generic implementation of the old select() syscall, which expects
its argument in a memory block and switch all architectures over to use
it.Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Reviewed-by: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Acked-by: Andreas Schwab
Acked-by: Russell King
Acked-by: Greg Ungerer
Acked-by: David Howells
Cc: Andreas Schwab
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Mar, 2010
1 commit
-
Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.Signed-off-by: Jiri Slaby
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Oct, 2009
1 commit
-
Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
23 Sep, 2009
1 commit
-
__estimate_accuracy() was prone to integer overflow, for example if *tv ==
{2147, 483648000} on a 32 bit computer (or even for delays as small as
{429, 500000000} if the task is niced).Because the result was already forced between 0 and 100ms, the effect of
the overflow was not too problematic, but the use of the hrtimer range
feature was not optimal in overflow cases.This patch ensures that there can not be an integer overflow in this
function.Signed-off-by: Guillaume Knispel
Cc: Alexander Viro
Cc: Arjan van de Ven
Cc: Thomas Gleixner
Cc: Heiko Carstens
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Aug, 2009
1 commit
-
The triggered field of struct poll_wqueues introduced in commit
5f820f648c92a5ecc771a96b3c29aa6e90013bba ("poll: allow f_op->poll to
sleep").It was first set to 1 in pollwake() (now __pollwake() ), tested and
later set to 0 in poll_schedule_timeout(), but not initialized before.As a result when the process needs to sleep, triggered was likely to be
non-zero even if pollwake() is not called before the first
poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be
called and an extra loop calling all ->poll() would be done.This patch initialize triggered to 0 in poll_initwait() so the ->poll()
are not called twice before the process goes to sleep when it needs to.Signed-off-by: Guillaume Knispel
Acked-by: Thomas Gleixner
Acked-by: Tejun Heo
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
17 Jun, 2009
1 commit
-
After introduction of keyed wakeups Davide Libenzi did on epoll, we are
able to avoid spurious wakeups in poll()/select() code too.For example, typical use of poll()/select() is to wait for incoming
network frames on many sockets. But TX completion for UDP/TCP frames call
sock_wfree() which in turn schedules thread.When scheduled, thread does a full scan of all polled fds and can sleep
again, because nothing is really available. If number of fds is large,
this cause significant load.This patch makes select()/poll() aware of keyed wakeups and useless
wakeups are avoided. This reduces number of context switches by about 50%
on some setups, and work performed by sofirq handlers.Signed-off-by: Eric Dumazet
Acked-by: David S. Miller
Acked-by: Andi Kleen
Acked-by: Ingo Molnar
Acked-by: Davide Libenzi
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Jan, 2009
4 commits
-
Signed-off-by: Heiko Carstens
-
Signed-off-by: Heiko Carstens
-
Not a single architecture has wired up sys_pselect7 plus it is the
only system call with seven parameters. Just make it static and
rename it to do_pselect which will do the work for sys_pselect6.Signed-off-by: Heiko Carstens
-
Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
seen occasional 5-second hangs from telnet. telnetd calls select with a
NULL timeout, but with the new kernel, the system call occasionally
returns 0, which causes telnet to call sleep (5). This did not happen
with earlier kernels.The code in sys_pselect7 looks a bit strange, in particular the variable
"to" is initialized to NULL, then changed if a non-null timeout was
passed in, but not used further. It needs to be passed to
core_sys_select instead of &end_time.This bug was introduced by 8ff3e8e85fa6c312051134b3953e397feb639f51
("select: switch select() and poll() over to hrtimers").Signed-off-by: Bernd Schmidt
Reviewed-by: Ulrich Drepper
Tested-by: Robin Getz
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
07 Jan, 2009
1 commit
-
f_op->poll is the only vfs operation which is not allowed to sleep. It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions. The non-sleep restriction
can be a bit tricky because ->poll is not called from an atomic context
and the result of accidentally sleeping in ->poll only shows up as
temporary busy looping when the timing is right or rather wrong.This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events. The
only added overhead is an extra function call during wake up and
negligible.This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.* s/type * symbol/type *symbol/ : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.cOleg: spotted missing barrier in poll_schedule_timeout()
Davide: spotted missing write barrier in pollwake()Signed-off-by: Tejun Heo
Cc: Eric Van Hensbergen
Cc: Ron Minnich
Cc: Ingo Molnar
Cc: Christoph Hellwig
Signed-off-by: Miklos Szeredi
Cc: Davide Libenzi
Cc: Brad Boyer
Cc: Al Viro
Cc: Roland McGrath
Cc: Mauro Carvalho Chehab
Signed-off-by: Andrew Morton
Cc: Davide Libenzi
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Oct, 2008
1 commit
-
Some userland apps seem to pass in a "0" for the seconds, and several
seconds worth of usecs to select(). The old kernels accepted this just
fine, so the new kernels must too.However, due to the upscaling of the microseconds to nanoseconds we had
some cases where we got math overflow, and depending on the GCC version
(due to inlining decisions) that actually resulted in an -EINVAL return.This patch fixes this by adding the excess microseconds to the seconds
field.Also with thanks to Marcin Slusarz for spotting some implementation bugs
in the diagnostics patches.Reported-by: Carlos R. Mafra
Signed-off-by: Arjan van de Ven
Signed-off-by: Linus Torvalds
08 Sep, 2008
2 commits
-
the slack estimator used unsigned math; however for very short delay it's
possible that by the time you calculate the timeout, it's already passed and
you get a negative time/slack... in an unsigned variable... which then gets
turned into a 100 msec delay rather than zero.This patch fixes this by using a signed typee in the right places.
Signed-off-by: Arjan van de Ven
-
(based on lkml review)
* use rt_task()
* task_nice() has a signSigned-off-by: Arjan van de Ven