Eric Lee / linux-smarc-t335x-v3.2

02 Dec, 2008

1 commit

7ef9964e6 epoll: introduce resource usage limits

It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface. Vegard Nossum reported a very simple program (a modified
version attached) that lets a normal user request a pretty large
amount of kernel memory, well within its maximum number of fds. To
solve this problem, default limits are now imposed, and /proc based
configuration has been introduced. A new directory has been created,
named /proc/sys/fs/epoll/, and inside it there are two configuration
points:

max_user_instances = Maximum number of devices - per user

max_user_watches = Maximum number of "watched" fds - per user

The current default for "max_user_watches" limits the memory used by epoll
to store "watches" to 1/32 of the amount of low RAM. As an example, a
256MB 32bit machine will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users. The
default value for "max_user_instances" is set to 128, which should be
enough too.
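
The figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper name is hypothetical, and the per-watch cost of about 93 bytes is an assumption inferred from the "256MB gives roughly 90000" example, not a number stated in the commit.

```c
/* Hypothetical helper: estimate the default max_user_watches from the
 * amount of low RAM, following the 1/32 rule described above.
 * per_watch_bytes is an ASSUMED kernel-side cost of a single watch. */
static unsigned long estimate_max_user_watches(unsigned long long low_ram_bytes,
                                               unsigned long per_watch_bytes)
{
    /* Budget: 1/32 of low RAM may be spent on watches. */
    unsigned long long budget = low_ram_bytes / 32;

    return (unsigned long)(budget / per_watch_bytes);
}
```

With 256MB of low RAM the budget is 8MB, so an assumed cost of 93 bytes per watch lands close to the 90000 quoted in the commit message.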

This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already
listed, so that should be ok.
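
From the userspace side, the new error code mostly means one more errno value to check after EPOLL_CTL_ADD. A minimal sketch, assuming a hypothetical wrapper and return codes (neither is part of the patch):

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical result codes for this sketch. */
enum watch_result { WATCH_OK = 0, WATCH_LIMIT = 1, WATCH_ERROR = -1 };

/* Register fd for read events on epfd and tell the caller whether a
 * failure was caused by the per-user watch limit (ENOSPC). */
static enum watch_result add_watch(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
        return WATCH_OK;
    if (errno == ENOSPC)
        return WATCH_LIMIT;   /* max_user_watches would be exceeded */
    return WATCH_ERROR;       /* EEXIST, EBADF, ENOMEM, ... */
}
```

A caller that gets WATCH_LIMIT back can stop adding descriptors or report that /proc/sys/fs/epoll/max_user_watches may need raising, rather than treating it as a generic failure.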

[akpm@linux-foundation.org: use get_current_user()]
Signed-off-by: Davide Libenzi
Cc: Michael Kerrisk
Cc: Cyrill Gorcunov
Reported-by: Vegard Nossum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2008-12-02 11:55:24 +0800

27 Oct, 2008

1 commit

9ce209d64 epoll: avoid double-inserts in case of EFAULT

In commit f337b9c58332bdecde965b436e47ea4c94d30da0 ("epoll: drop
unnecessary test") Thomas found that there is an unnecessary (always
true) test in ep_send_events(). The callback never inserts into
->rdllink while the send loop is performed, and also does the
~EP_PRIVATE_BITS test. Given we're holding the mutex during this time,
the conditions tested inside the loop are always true.

HOWEVER.

The test "!ep_is_linked(&epi->rdllink)" wasn't there because we insert
into ->rdllink, but because the send-events loop might terminate before
the whole list is scanned (-EFAULT).

In such cases, when the loop terminates early, and when a (leftover)
file received an event while we're performing the lockless loop, we need
such a test to avoid double-inserting the epoll items. The list_splice()
done a few steps below will correctly re-insert the ones that were left
on "txlist".

This should fix the kernel.org bugzilla entry 11831.

Signed-off-by: Davide Libenzi
Signed-off-by: Linus Torvalds

Davide Libenzi
2008-10-27 03:09:49 +0800

17 Oct, 2008

1 commit

f337b9c58 epoll: drop unnecessary test

Thomas found that there is an unnecessary (always true) test in
ep_send_events(). The callback never inserts into ->rdllink while the
send loop is performed, and also does the ~EP_PRIVATE_BITS test. Given
we're holding the mutex during this time, the conditions tested inside the
loop are always true. This patch drops the test done inside the
re-insertion loop.

Signed-off-by: Davide Libenzi
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2008-10-17 02:21:32 +0800

13 Aug, 2008

1 commit

523723bb5 fs/eventpoll.c: fix sys_epoll_create1() comment

The `size' argument was removed.

Reported-by: Geert Uytterhoeven
Cc: Ulrich Drepper
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2008-08-13 07:07:30 +0800

25 Jul, 2008

4 commits

9fe5ad9c8 flag parameters add-on: remove epoll_create size param

Removes the size parameter from the new epoll_create syscall and renames the
syscall itself. The updated test program follows.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_epoll_create2
# ifdef __x86_64__
# define __NR_epoll_create2 291
# elif defined __i386__
# define __NR_epoll_create2 329
# else
# error "need __NR_epoll_create2"
# endif
#endif

#define EPOLL_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd = syscall (__NR_epoll_create2, 0);
  if (fd == -1)
    {
      puts ("epoll_create2(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("epoll_create2(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC);
  if (fd == -1)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) did not set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:29 +0800

e38b36f32 flag parameters: check magic constants

This patch adds tests that ensure the boundary conditions for the various
constants introduced in the previous patches are met. No code is generated.

[akpm@linux-foundation.org: fix alpha]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:29 +0800

a0998b50c flag parameters: epoll_create

This patch adds the new epoll_create2 syscall. It extends the old epoll_create
syscall by one parameter which is meant to hold a flag value. In this
patch the only flag supported is EPOLL_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.

A new name EPOLL_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_epoll_create2
# ifdef __x86_64__
# define __NR_epoll_create2 291
# elif defined __i386__
# define __NR_epoll_create2 329
# else
# error "need __NR_epoll_create2"
# endif
#endif

#define EPOLL_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd = syscall (__NR_epoll_create2, 1, 0);
  if (fd == -1)
    {
      puts ("epoll_create2(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("epoll_create2(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC);
  if (fd == -1)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) did not set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:28 +0800

7d9dbca34 flag parameters: anon_inode_getfd extension

This patch just extends the anon_inode_getfd interface to take an additional
parameter with a flag value. The flag value is passed on to
get_unused_fd_flags in anticipation of its use with the O_CLOEXEC flag.

No actual semantic changes here, the changed callers all pass 0 for now.

[akpm@linux-foundation.org: KVM fix]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:27 +0800

02 May, 2008

1 commit

2030a42ce [PATCH] sanitize anon_inode_getfd()

a) none of the callers even looks at inode or file returned by anon_inode_getfd()
b) any caller that would try to look at those would be racy, since by the time
it returns we might have raced with close() from another thread and that
file would be pining for fjords.

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:50 +0800

30 Apr, 2008

2 commits

f3de272b8 signals: use HAVE_SET_RESTORE_SIGMASK

Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
#ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic
set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-04-30 23:29:37 +0800

4e4c22c71 signals: add set_restore_sigmask

This adds the set_restore_sigmask() inline in and
replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
change, but abstracts the details of the flag protocol from all the calls.

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-04-30 23:29:37 +0800

29 Apr, 2008

1 commit

cdac75e6f epoll: avoid kmemcheck warning

Epoll calls rb_set_parent(n, n) to initialize the rb-tree node, but
rb_set_parent() accesses node's pointer in its code. This creates a
warning in kmemcheck (reported by Vegard Nossum) about an uninitialized
memory access. The warning is harmless since the following rb-tree node
insert is going to overwrite the node data. In any case I think it's
better to not have that happening at all, and fix it by simplifying the
code to get rid of a few lines that became superfluous after the previous
epoll changes.
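
Why a "set parent" helper would read memory at all can be seen in a simplified user-space model. The sketch assumes the rb-tree layout in which the parent pointer and the node colour share one word; the type and function names are hypothetical and only mirror the idea, not the kernel source.

```c
#include <stdint.h>

/* Simplified model of an rb-tree node: the parent pointer and the
 * colour are packed into one word, with the colour in the low bits. */
struct rb_node_model {
    uintptr_t parent_color;
    struct rb_node_model *left, *right;
};

/* Setting the parent has to preserve the colour bits, so it READS the
 * old word first. On a freshly allocated node that word holds garbage,
 * which is the kind of access a memory checker flags. */
static void model_set_parent(struct rb_node_model *n, struct rb_node_model *p)
{
    n->parent_color = (n->parent_color & 3) | (uintptr_t)p;
}

/* Strip the colour bits to recover the parent pointer. */
static struct rb_node_model *model_parent(const struct rb_node_model *n)
{
    return (struct rb_node_model *)(n->parent_color & ~(uintptr_t)3);
}
```

Calling `model_set_parent(n, n)` on an uninitialized node is harmless if the word is overwritten right afterwards, but the read still happens, which matches the warning described above.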

Signed-off-by: Davide Libenzi
Cc: Vegard Nossum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2008-04-29 23:05:59 +0800

06 Feb, 2008

1 commit

0ccf831cb lockdep: annotate epoll

On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote:

> I remember I talked with Arjan about this time ago. Basically, since 1)
> you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
> are used, you can see a wake_up() from inside another wake_up(), but they
> will never refer to the same lock instance.
> Think about:
>
> dfd = socket(...);
> efd1 = epoll_create();
> efd2 = epoll_create();
> epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
> epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
>
> When a packet arrives to the device underneath "dfd", the net code will
> issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
> callback wakeup entry on that queue, and the wake_up() performed by the
> "dfd" net code will end up in ep_poll_callback(). At this point epoll
> (efd1) notices that it may have some event ready, so it needs to wake up
> the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
> that ends up in another wake_up(), after having checked about the
> recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
> avoid stack blasting. Never hit the same queue, to avoid loops like:
>
> epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
> epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
> epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
> epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
>
> The code "if (tncur->wq == wq || ..." prevents re-entering the same
> queue/lock.

Since the epoll code is very careful not to nest same-instance locks,
allow the recursion.
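
The nesting scenario from the quoted mail can be reproduced from userspace. The following is a sketch under a few assumptions: it uses a pipe instead of a socket for "dfd", uses epoll_create1() from the C library, and both function names are hypothetical.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of events seen on the OUTER epoll fd (efd2) after
 * data is written to a pipe watched only by the INNER one (efd1), or
 * -1 on a setup error. */
static int nested_epoll_demo(void)
{
    int pfd[2], efd1, efd2, n = -1;
    struct epoll_event ev = { .events = EPOLLIN }, out;

    if (pipe(pfd) == -1)
        return -1;
    efd1 = epoll_create1(0);
    efd2 = epoll_create1(0);
    if (efd1 == -1 || efd2 == -1)
        goto done;

    ev.data.fd = pfd[0];
    if (epoll_ctl(efd1, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto done;
    ev.data.fd = efd1;
    if (epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, &ev) == -1)
        goto done;

    /* The wakeup on the pipe reaches efd1, which in turn wakes efd2. */
    if (write(pfd[1], "x", 1) != 1)
        goto done;
    n = epoll_wait(efd2, &out, 1, 1000);

done:
    if (efd1 != -1)
        close(efd1);
    if (efd2 != -1)
        close(efd2);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}

/* An epoll fd cannot be added to itself; returns the errno observed. */
static int self_add_errno(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int efd = epoll_create1(0), err = 0;

    if (efd == -1)
        return errno;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, efd, &ev) == -1)
        err = errno;
    close(efd);
    return err;
}
```

The first function exercises the wake_up() inside wake_up() path that lockdep complained about; the second shows the simplest case of the "never hit the same queue" rule, which is rejected with EINVAL.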

Signed-off-by: Peter Zijlstra
Tested-by: Stefan Richter
Acked-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2008-02-06 01:44:07 +0800

07 Dec, 2007

1 commit

4a6e9e2ce Use wake_up_locked() in eventpoll

Replace the uses of __wake_up_locked with wake_up_locked

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2007-12-07 06:07:16 +0800

20 Oct, 2007

1 commit

b70c39409 fs/eventpoll.c: use list_for_each_entry() instead of list_for_each()

fs/eventpoll.c: use list_for_each_entry() instead of list_for_each()
in ep_poll_safewake()
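
The difference between the two iterators is easiest to see side by side. This is a user-space sketch with simplified macro definitions modelled on the kernel list API; `struct wake_task` and the two sum functions are hypothetical, and `__typeof__` is a GCC/Clang extension.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Older style: the cursor is a raw list_head pointer. */
#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

/* Newer style: the cursor is the containing structure itself. */
#define list_for_each_entry(pos, head, member)                       \
    for (pos = list_entry((head)->next, __typeof__(*pos), member);   \
         &pos->member != (head);                                     \
         pos = list_entry(pos->member.next, __typeof__(*pos), member))

struct wake_task { int id; struct list_head llink; };

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* list_for_each(): needs an explicit list_entry() in the loop body. */
static int sum_ids_old(struct list_head *head)
{
    struct list_head *lnk;
    int sum = 0;

    list_for_each(lnk, head) {
        struct wake_task *t = list_entry(lnk, struct wake_task, llink);
        sum += t->id;
    }
    return sum;
}

/* list_for_each_entry(): the conversion is folded into the macro. */
static int sum_ids_new(struct list_head *head)
{
    struct wake_task *t;
    int sum = 0;

    list_for_each_entry(t, head, llink)
        sum += t->id;
    return sum;
}
```

Both functions walk the same list and return the same result; the second simply drops the temporary list_head cursor.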

Signed-off-by: Matthias Kaehlcke
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthias Kaehlcke
2007-10-20 02:53:38 +0800

19 Oct, 2007

1 commit

c80544dc0 sparse pointer use of zero as null

Get rid of sparse related warnings from places that use integer as NULL
pointer.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Hemminger
Cc: Andi Kleen
Cc: Jeff Garzik
Cc: Matt Mackall
Cc: Ian Kent
Cc: Arnd Bergmann
Cc: Davide Libenzi
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Hemminger
2007-10-19 05:37:31 +0800

20 Jul, 2007

1 commit

20c2df83d mm: Remove slab destructors from kmem_cache_create().

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt

Paul Mundt
2007-07-20 09:11:58 +0800

15 May, 2007

4 commits

f0ee9aabb epoll: move kfree inside ep_free

Move the kfree() call inside the ep_free() function.

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-15 23:54:00 +0800

67647d0fb epoll: fix some comments

Fixes some epoll code comments.

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-15 23:54:00 +0800

c7ea76302 epoll locks changes and cleanups

Changes the rwlock to a spinlock, and drops the use-count variable.
Operations are always bound by the mutex now, so the use-count is no longer
needed. For the same reason, the rwlock can become a simple spinlock.

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-15 23:53:59 +0800

d47de16c7 fix epoll single pass code and add wait-exclusive flag

Fixes the epoll single pass code. During the unlocked event delivery (to
userspace) code, the poll callback can re-issue new events, and we must
receive them correctly. Since we loop in a lockless fashion, want to stay
O(nready), and don't want to toggle the spinlock for every event, we
have the poll callback use a secondary list to queue events while we're
inside the event delivery loop. The rw_semaphore has been turned into a
mutex. This patch also adds the wait-exclusive flag, as suggested by Davi
Arnaut.

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-15 23:53:59 +0800

11 May, 2007

3 commits

7699acd13 epoll cleanups: epoll remove static pre-declarations and akpm-ize the code

Re-arrange epoll code to avoid static functions pre-declarations, and apply
akpm-filter on it.

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:37 +0800

cea692418 epoll cleanups: epoll no module

Epoll is either compiled in, or not (if EMBEDDED). Remove the module code
and use fs_initcall().

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:37 +0800

da66f7cb0 epoll: use anonymous inodes

Cut out lots of code from epoll, by reusing the anonymous inode source
patch (fs/anon_inodes.c).

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-11 23:29:37 +0800

09 May, 2007

3 commits

b5e618181 Introduce a handy list_first_entry macro

There are many places in the kernel where a construction like

foo = list_entry(head->next, struct foo_struct, list);

is used.
The code might look more descriptive and neat if using the macro

list_first_entry(head, type, member) \
list_entry((head)->next, type, member)

Here is the macro itself and the examples of its usage in the generic code.
If it turns out to be useful, I can prepare the set of patches to
inject it into arch-specific code, drivers, networking, etc.
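
For readers without the kernel headers at hand, the macro can be tried in a small user-space sketch. The list_head and list_entry definitions here are simplified stand-ins, and `first_value()` is a hypothetical helper; only the list_first_entry definition itself follows the text above.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified stand-in for the kernel's list_entry()/container_of(). */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The macro introduced by this commit. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

struct foo_struct { int value; struct list_head list; };

/* Returns the value of the first element. The caller must make sure
 * the list is not empty, as with the open-coded form. */
static int first_value(struct list_head *head)
{
    struct foo_struct *foo = list_first_entry(head, struct foo_struct, list);

    return foo->value;
}
```

The macro expands to exactly the open-coded `list_entry(head->next, ...)` form, so behaviour is unchanged; the gain is only in readability.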

Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Zach Brown
Cc: Davide Libenzi
Cc: John McCutchan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Cc: Ram Pai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelianov
2007-05-09 02:15:11 +0800

e63340ae6 header cleaning: don't include smp_lock.h when not used

Remove includes of smp_lock.h where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-05-09 02:15:07 +0800

6192bd536 epoll: optimizations and cleanups

Epoll is doing multiple passes over the ready set at the moment, because of
the constraints over the f_op->poll() call. Looking at the code again, I
noticed that we already hold the epoll semaphore in read, and this
(together with other locking conditions that hold while doing an
epoll_wait()) can lead to a smarter way [1] to "ship" events to userspace
(in a single pass).

This is a stress application that can be used to test the new code. It
spawns multiple threads and calls epoll_wait() and epoll_ctl() from many
threads. Stress tested on my dual Opteron 254 w/out any problems.

http://www.xmailserver.org/totalmess.c

This is not a benchmark, just something that tries to stress and exploit
possible problems with the new code.
Also, I made a stupid micro-benchmark:

http://www.xmailserver.org/epwbench.c

[1] Considering that epoll must be thread-safe, there are five ways we can
be hit during an epoll_wait() transfer loop (ep_send_events()):

1) The epoll fd going away and calling ep_free
This just can't happen, since we did an fget() in sys_epoll_wait

2) An epoll_ctl(EPOLL_CTL_DEL)
This can't happen because epoll_ctl() gets ep->sem in write, and
we're holding it in read during ep_send_events()

3) An fd stored inside the epoll fd going away
This can't happen because in eventpoll_release_file() we get
ep->sem in write, and we're holding it in read during
ep_send_events()

4) Another epoll_wait() happening on another thread
They both can be inside ep_send_events() at the same time, we get
(splice) the ready-list under the spinlock, so each one will get
its own ready list. Note that an fd cannot be at the same time
inside more than one ready list, because ep_poll_callback() will
not re-queue it if it sees it already linked:

if (ep_is_linked(&epi->rdllink))
goto is_linked;

Another case that can happen, is two concurrent epoll_wait(),
coming in with a userspace event buffer of size, say, ten.
Suppose there are 50 events ready in the list. The first
epoll_wait() will "steal" the whole list, while the second, seeing
no events, will go to sleep. But at the end of ep_send_events() in
the first epoll_wait(), we will re-inject surplus ready fds, and we
will trigger the proper wake_up to the second epoll_wait().

5) ep_poll_callback() hitting us asynchronously
This is the tricky part. As I said above, the ep_is_linked() test
done inside ep_poll_callback() guarantees that as long as the
item remains linked to a list, ep_poll_callback() will not try
to re-queue it (read, write data on any of its members). When
we do a list_del() in ep_send_events(), the item will still satisfy
the ep_is_linked() test (whatever data is written in prev/next,
it'll never be its own pointer), so ep_poll_callback() will still
leave us alone. It's only after the eventual smp_mb()+INIT_LIST_HEAD(&epi->rdllink)
that it'll become visible to ep_poll_callback(), but at that point
we're already past it.
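
The argument in case 5 rests on how "linked" is defined. The single-threaded sketch below illustrates it under stated assumptions: `is_linked()` is assumed to be a "does not point at itself" test, the stale pointers left by the delete are modelled as NULL, and the memory barrier is left out because nothing here runs concurrently. All names are hypothetical user-space stand-ins.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* An initialized, unlinked node points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h->prev = h;
}

/* Assumed meaning of "linked": the node does not point at itself. */
static int is_linked(const struct list_head *h)
{
    return h->next != h;
}

static void list_add(struct list_head *item, struct list_head *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Unlink WITHOUT re-initializing: the item's own pointers are left
 * stale (modelled as NULL here), so they are never "itself". */
static void list_del(struct list_head *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->next = item->prev = NULL;
}
```

After `list_del(&item)` the item is off the list yet `is_linked(&item)` still holds, so a callback using that test keeps its hands off; only a later `init_list_head(&item)` makes the item eligible for queueing again.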

[akpm@osdl.org: 80 cols]
Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2007-05-09 02:15:01 +0800

09 Dec, 2006

1 commit

0f7fc9e4d [PATCH] VFS: change struct file to use struct path

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.
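
How two transitional #define's can keep old code compiling is sketched below with mock types. Everything here is a user-space stand-in: the struct contents are invented for illustration, and the exact spelling of the two macros is an assumption based on the field names mentioned above.

```c
/* Mock types; the real kernel structures carry far more state. */
struct dentry   { const char *name; };
struct vfsmount { const char *devname; };

struct path {
    struct vfsmount *mnt;
    struct dentry *dentry;
};

struct file {
    struct path f_path;   /* replaces separate dentry/vfsmount pointers */
};

/* Transitional aliases (assumed spelling): old field names map onto
 * the members of the embedded struct path. */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* Unconverted code keeps using the old field name... */
static const char *old_style_name(const struct file *f)
{
    return f->f_dentry->name;
}

/* ...while converted code spells out the new path. */
static const char *new_style_name(const struct file *f)
{
    return f->f_path.dentry->name;
}
```

Both accessors resolve to the same member, which is what lets callers be converted gradually instead of in one sweep.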

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800

08 Dec, 2006

2 commits

e18b890bb [PATCH] slab: remove kmem_cache_t

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:25 +0800

e94b17660 [PATCH] slab: remove SLAB_KERNEL

SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:24 +0800

12 Oct, 2006

1 commit

b611967de [PATCH] epoll_pwait()

Implement the epoll_pwait system call, which extends the event wait mechanism
with the same logic that ppoll and pselect use. The definition of epoll_pwait
is:

int epoll_pwait(int epfd, struct epoll_event *events, int maxevents,
int timeout, const sigset_t *sigmask, size_t sigsetsize);

The difference between the vanilla epoll_wait and epoll_pwait is that the
latter allows the caller to specify a signal mask to be set while waiting
for events. Hence epoll_pwait will wait until either a monitored event
or an unmasked signal occurs. If sigmask is NULL, the epoll_pwait system
call will act exactly like epoll_wait. For the POSIX definition of
pselect, information is available here:

http://www.opengroup.org/onlinepubs/009695399/functions/select.html
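
A minimal usage sketch, assuming the C library wrapper for epoll_pwait() (which takes five arguments and supplies sigsetsize to the raw system call itself). The helper name and the choice of SIGINT are illustrative only.

```c
#include <signal.h>
#include <sys/epoll.h>

/* Wait for at most one event on epfd while SIGINT is blocked for the
 * duration of the wait only. The mask is installed and the previous
 * one restored atomically around the wait. Returns what epoll_pwait()
 * returns: number of events, 0 on timeout, -1 on error. */
static int wait_blocking_sigint(int epfd, struct epoll_event *ev, int timeout_ms)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    return epoll_pwait(epfd, ev, 1, timeout_ms, &mask);
}
```

Passing NULL instead of `&mask` would make the call behave like plain epoll_wait(), as described above.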

Signed-off-by: Davide Libenzi
Cc: David Woodhouse
Cc: Andi Kleen
Cc: Michael Kerrisk
Cc: Ulrich Drepper
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2006-10-12 02:14:21 +0800

03 Oct, 2006

1 commit

c3b657138 [PATCH] fs/eventpoll: error handling micro-cleanup

While reviewing the 'may be used uninitialized' bogus gcc warnings, I
noticed that an error code assignment was only needed if an error had
actually occurred.

Signed-off-by: Jeff Garzik
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Garzik
2006-10-03 23:03:41 +0800

27 Sep, 2006

1 commit

ba52de123 [PATCH] inode-diet: Eliminate i_blksize from the inode structure

This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.

Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.

[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Theodore Ts'o
2006-09-27 23:26:18 +0800

28 Aug, 2006

1 commit

45f17e0c2 [PATCH] eventpoll.c compile fix

Fix two compile failures in eventpoll.c code which would happen if
DEBUG_EPOLL is bigger than zero.

Signed-off-by: Masoud Sharbiani
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masoud Asgharifard Sharbiani
2006-08-28 02:01:31 +0800

04 Jul, 2006

1 commit

36c8b5868 [PATCH] sched: cleanup, remove task_t, convert to struct task_struct

cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.

Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.

Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-07-04 06:27:11 +0800

26 Jun, 2006

1 commit

3419b23a9 [PATCH] epoll: use unlocked wqueue operations

A few days ago Arjan signaled a lockdep red flag on epoll locks, and
precisely between the epoll's device structure lock (->lock) and the wait
queue head lock (->lock).

Like I explained in another email, and directly to Arjan, this can't happen
in reality because of the explicit check at eventpoll.c:592, that does not
allow to drop an epoll fd inside the same epoll fd. Since lockdep is
working on per-structure locks, it will never be able to know of policies
enforced in other parts of the code.

It was decided some time ago to allow dropping epoll fds inside
other epoll fds, which triggers very tricky wakeup operations (due to
possibly reentrant callback-driven wakeups) handled by the
ep_poll_safewake() function. While looking again at the code though, I
noticed that all the operations done on the epoll's main structure wait
queue head (->wq) are already protected by the epoll lock (->lock), so that
locked-style functions can be used to manipulate the ->wq member. This
both saves a lock acquisition and makes lockdep happy.

Running totalmess on my dual opteron for a while did not reveal any problem
so far:

http://www.xmailserver.org/totalmess.c

Signed-off-by: Davide Libenzi
Cc: Arjan van de Ven
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2006-06-26 01:01:13 +0800

23 Jun, 2006

1 commit

454e2398b [PATCH] VFS: Permit filesystem to override root dentry on mount

Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.

(*) If one of the convenience functions is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().

(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().

This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.

However, with the way NFS superblock sharing is currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.

[*] Anonymous until discovered from another tree.

(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

21 Apr, 2006

1 commit

c569882b2 [RBTREE] Update eventpoll.c to use rb_parent() accessor macro.

Signed-off-by: David Woodhouse

David Woodhouse
2006-04-21 20:17:24 +0800

11 Apr, 2006

1 commit

2395140ee [PATCH] uniform POLLRDHUP handling between epoll and poll/select

As reported by Michael Kerrisk, POLLRDHUP handling was not consistent
between epoll and poll/select, since in epoll it was unmaskable. This
patch brings uniformity in POLLRDHUP handling.
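
With uniform handling, the read-hangup condition is reported by epoll only when it is part of the interest mask. The sketch below tries to show that from userspace; it assumes a Unix-domain stream socket pair, uses the userspace constant EPOLLRDHUP, and the function name is hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the event mask reported for one end of a socket pair after
 * the peer shuts down its writing side, or 0 on any setup failure.
 * `interest` is the mask registered with epoll_ctl(). */
static uint32_t events_after_peer_shutdown(uint32_t interest)
{
    int sv[2], efd;
    uint32_t seen = 0;
    struct epoll_event ev = { .events = interest }, out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    efd = epoll_create1(0);
    if (efd != -1) {
        ev.data.fd = sv[0];
        if (epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
            shutdown(sv[1], SHUT_WR) == 0 &&
            epoll_wait(efd, &out, 1, 1000) == 1)
            seen = out.events;
        close(efd);
    }
    close(sv[0]);
    close(sv[1]);
    return seen;
}
```

Registering `EPOLLIN | EPOLLRDHUP` should make the hangup bit appear in the result, while registering only `EPOLLIN` should leave it masked out, the same way poll() only reports POLLRDHUP when it was requested.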

Signed-off-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2006-04-11 21:18:42 +0800

29 Mar, 2006

1 commit

4b6f5d20b [PATCH] Make most file operations structs in fs/ const

This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups.

The goal is both to increase correctness (harder to accidentally write to
shared data structures) and to reduce the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean).

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-03-29 01:16:06 +0800