19 May, 2007

2 commits

  • The timerfd was using the unlocked waitqueue operations, but it was
    using a different lock, so poll_wait() would race with it.

    This makes timerfd directly use the waitqueue lock.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • The eventfd was using the unlocked waitqueue operations, but it was
    using a different lock, so poll_wait() would race with it.

    This makes eventfd directly use the waitqueue lock.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

17 May, 2007

10 commits

  • Trond Myklebust
     
  • grow_dev_page() simply passes GFP_NOFS to find_or_create_page. This means
    the allocation of radix tree nodes is done with GFP_NOFS and the allocation
    of a new page is done using GFP_NOFS.

    The mapping has a flags field that contains the necessary allocation flags
    for the page cache allocation. These need to be consulted in order to get
    DMA and HIGHMEM allocations etc right. And yes a blockdev could be
    allowing Highmem allocations if its a ramdisk.

    Cc: Hugh Dickins
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
    acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
    comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
    file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
    call under dqonoff_mutex and save some problems with races...

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Use zero_user_page() instead of open-coding it.

    Signed-off-by: Nate Diller
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nate Diller
     
  • Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core
    filename size and change it to 128, so that extensive patterns such as
    '/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename
    generation.

    Signed-off-by: Dan Aloni
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Aloni
     
  • Trond Myklebust
     
  • Just thought this is easier to read.

    Acked-by: Davide Libenzi
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • afs_prepare_write() should not mark a page up to date if it only partially
    fills it in, in expectation of the caller filling in the rest prior to calling
    commit_write(). commit_write(), however, should mark the page up to date.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix AFS to write back dirty on unmounting. This didn't happen because
    afs_super_ops.drop_inode was pointing to generic_delete_inode. Now this
    pointer is left set to NULL so that the default behaviour occurs instead.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

16 May, 2007

1 commit


15 May, 2007

11 commits

  • Move the kfree() call inside the ep_free() function.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Fixes some epoll code comments.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Changes the rwlock to a spinlock, and drops the use-count variable.
    Operations are always bound by the mutex now, so the use-count is no more
    needed. For the same reason, the rwlock can become a simple spinlock.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Fixes the epoll single pass code. During the unlocked event delivery (to
    userspace) code, the poll callback can re-issue new events, and we must
    receive them correctly. Since we loop in a lockless fashion, we want to be
    O(nready), and we don't want to flash on/off the spinlock for every event, we
    have the poll callback to use a secondary list to queue events while we're
    inside the event delivery loop. The rw_semaphore has been turned into a
    mutex. This patch also adds the wait-exclusive flag, as suggested by Davi
    Arnaut.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • - fs/lockd/xdr4.c:140:27: warning: incorrect type in argument 2 (different
    explicit signedness)
    - fs/lockd/xdr4.c:141:27: warning: incorrect type in argument 2 (different
    explicit signedness)
    - fs/lockd/xdr4.c:432:28: warning: incorrect type in argument 2 (different
    explicit signedness)
    - fs/lockd/xdr4.c:433:28: warning: incorrect type in argument 2 (different
    explicit signedness)
    - fs/lockd/xdr4.c:587:20: warning: symbol 'nlm_version4' was not declared.
    Should it be static?

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • - fs/nfs/nfs4xdr.c:2499:42: warning: incorrect type in argument 2
    (different signedness)
    - fs/nfs/nfs4xdr.c:2658:49: warning: incorrect type in argument 4
    (different explicit signedness)
    - fs/nfs/nfs4xdr.c:2683:50: warning: incorrect type in argument 4
    (different explicit signedness)
    - fs/nfs/nfs4xdr.c:3063:68: warning: incorrect type in argument 4
    (different explicit signedness)
    - fs/nfs/nfs4xdr.c:3065:68: warning: incorrect type in argument 4
    (different explicit signedness)

    - fs/nfs/callback_xdr.c:138:31: warning: incorrect type in argument 2
    (different signedness)

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • - fs/nfs/dir.c:610:8: warning: symbol 'nfs_llseek_dir' was not declared.
    Should it be static?
    - fs/nfs/dir.c:636:5: warning: symbol 'nfs_fsync_dir' was not declared.
    Should it be static?
    - fs/nfs/write.c:925:19: warning: symbol 'req' shadows an earlier one
    - fs/nfs/write.c:61:6: warning: symbol 'nfs_commit_rcu_free' was not
    declared. Should it be static?
    - fs/nfs/nfs4proc.c:793:5: warning: symbol 'nfs4_recover_expired_lease'
    was not declared. Should it be static?

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The XDR code should not depend on the physical allocation size of
    structures like nfs4_stateid and nfs4_verifier since those may have to
    change at some future date. We therefore replace all uses of
    sizeof() with constants like NFS4_VERIFIER_SIZE and NFS4_STATEID_SIZE.

    This also has the side-effect of fixing some warnings of the type
    format ‘%u’ expects type ‘unsigned int’, but argument X has type
    ‘long unsigned int’
    on 64-bit systems

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Use zero_user_page() instead of the newly deprecated memclear_highpage_flush().

    Signed-off-by: Nate Diller
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Trond Myklebust

    Nate Diller
     
  • reclaimer() calls allow_signal() which plays with parent process's ->sighand.

    Signed-off-by: Oleg Nesterov
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Trond Myklebust

    Oleg Nesterov
     
  • nlmsvc_timeout is already in units of HZ...

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

13 May, 2007

1 commit

  • Use zero_user_page() instead of open-coding it.

    [akpm@linux-foundation.org: kmap-type fixes]
    Signed-off-by: Nate Diller
    Acked-by: Anton Altaparmakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nate Diller
     

12 May, 2007

1 commit


11 May, 2007

14 commits

  • Re-arrange epoll code to avoid static functions pre-declarations, and apply
    akpm-filter on it.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Epoll is either compiled it, or not (if EMBEDDED). Remove the module code
    and use fs_initcall().

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Cut out lots of code from epoll, by reusing the anonymous inode source
    patch (fs/anon_inodes.c).

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This is an example about how to add eventfd support to the current KAIO code,
    in order to enable KAIO to post readiness events to a pollable fd (hence
    compatible with POSIX select/poll). The KAIO code simply signals the eventfd
    fd when events are ready, and this triggers a POLLIN in the fd. This patch
    uses a reserved for future use member of the struct iocb to pass an eventfd
    file descriptor, that KAIO will use to post events every time a request
    completes. At that point, an aio_getevents() will return the completed result
    to a struct io_event. I made a quick test program to verify the patch, and it
    runs fine here:

    http://www.xmailserver.org/eventfd-aio-test.c

    The test program uses poll(2), but it'd, of course, work with select and epoll
    too.

    This can allow to schedule both block I/O and other poll-able devices
    requests, and wait for results using select/poll/epoll. In a typical
    scenario, an application would submit KAIO request using aio_submit(), and
    will also use epoll_ctl() on the whole other class of devices (that with the
    addition of signals, timers and user events, now it's pretty much complete),
    and then would:

    epoll_wait(...);
    for_each_event {
    if (curr_event_is_kaiofd) {
    aio_getevents();
    dispatch_aio_events();
    } else {
    dispatch_epoll_event();
    }
    }

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This is a very simple and light file descriptor, that can be used as event
    wait/dispatch by userspace (both wait and dispatch) and by the kernel
    (dispatch only). It can be used instead of pipe(2) in all cases where those
    would simply be used to signal events. Their kernel overhead is much lower
    than pipes, and they do not consume two fds. When used in the kernel, it can
    offer an fd-bridge to enable, for example, functionalities like KAIO or
    syslets/threadlets to signal to an fd the completion of certain operations.
    But more in general, an eventfd can be used by the kernel to signal readiness,
    in a POSIX poll/select way, of interfaces that would otherwise be incompatible
    with it. The API is:

    int eventfd(unsigned int count);

    The eventfd API accepts an initial "count" parameter, and returns an eventfd
    fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

    The POLLIN flag is raised when the internal counter is greater than zero.

    The POLLOUT flag is raised when at least a value of "1" can be written to the
    internal counter.

    The POLLERR flag is raised when an overflow in the counter value is detected.

    The write(2) operation can never overflow the counter, since it blocks (unless
    O_NONBLOCK is set, in which case -EAGAIN is returned).

    But the eventfd_signal() function can do it, since it's supposed to not sleep
    during its operation.

    The read(2) function reads the __u64 counter value, and reset the internal
    value to zero. If the value read is equal to (__u64) -1, an overflow happened
    on the internal counter (due to 2^64 eventfd_signal() posts that has never
    been retired - unlickely, but possible).

    The write(2) call writes an __u64 count value, and adds it to the current
    counter. The eventfd fd supports O_NONBLOCK also.

    On the kernel side, we have:

    struct file *eventfd_fget(int fd);
    int eventfd_signal(struct file *file, unsigned int n);

    The eventfd_fget() should be called to get a struct file* from an eventfd fd
    (this is an fget() + check of f_op being an eventfd fops pointer).

    The kernel can then call eventfd_signal() every time it wants to post an event
    to userspace. The eventfd_signal() function can be called from any context.
    An eventfd() simple test and bench is available here:

    http://www.xmailserver.org/eventfd-bench.c

    This is the eventfd-based version of pipetest-4 (pipe(2) based):

    http://www.xmailserver.org/pipetest-4.c

    Not that performance matters much in the eventfd case, but eventfd-bench
    shows almost as double as performance than pipetest-4.

    [akpm@linux-foundation.org: fix i386 build]
    [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch implements the necessary compat code for the timerfd system call.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch introduces a new system call for timers events delivered though
    file descriptors. This allows timer event to be used with standard POSIX
    poll(2), select(2) and read(2). As a consequence of supporting the Linux
    f_op->poll subsystem, they can be used with epoll(2) too.

    The system call is defined as:

    int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

    The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
    w/out going through the close/open cycle (same as signalfd). If "ufd" is -1,
    s new file descriptor will be created, otherwise the existing "ufd" will be
    re-programmed.

    The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
    specified in the "utmr->it_value" parameter is the expiry time for the timer.

    If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
    otherwise it's a relative time.

    If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
    tv_nsec == 0), this is the period at which the following ticks should be
    generated.

    The "utmr->it_interval" should be set to zero if only one tick is requested.
    Setting the "utmr->it_value" to zero will disable the timer, or will create a
    timerfd without the timer enabled.

    The function returns the new (or same, in case "ufd" is a valid timerfd
    descriptor) file, or -1 in case of error.

    As stated before, the timerfd file descriptor supports poll(2), select(2) and
    epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be
    returned.

    The read(2) call can be used, and it will return a u32 variable holding the
    number of "ticks" that happened on the interface since the last call to
    read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
    be returned if no ticks happened.

    A quick test program, shows timerfd working correctly on my amd64 box:

    http://www.xmailserver.org/timerfd-test.c

    [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch implements the necessary compat code for the signalfd system call.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch series implements the new signalfd() system call.

    I took part of the original Linus code (and you know how badly it can be
    broken :), and I added even more breakage ;) Signals are fetched from the same
    signal queue used by the process, so signalfd will compete with standard
    kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
    the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
    seems to be working fine on my Dual Opteron machine. I made a quick test
    program for it:

    http://www.xmailserver.org/signafd-test.c

    The signalfd() system call implements signal delivery into a file descriptor
    receiver. The signalfd file descriptor if created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

    The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
    to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
    signalfd file.

    The "mask" allows to specify the signal mask of signals that we are interested
    in. The "masksize" parameter is the size of "mask".

    The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
    will return POLLIN when signals are available to be dequeued. As a direct
    consequence of supporting the Linux poll subsystem, the signalfd fd can use
    used together with epoll(2) too.

    The read(2) system call will return a "struct signalfd_siginfo" structure in
    the userspace supplied buffer. The return value is the number of bytes copied
    in the supplied buffer, or -1 in case of error. The read(2) call can also
    return 0, in case the sighand structure to which the signalfd was attached,
    has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
    return -EAGAIN in case no signal is available.

    If the size of the buffer passed to read(2) is lower than sizeof(struct
    signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
    return -ERESTARTSYS in case a signal hits the process. The format of the
    struct signalfd_siginfo is, and the valid fields depends of the (->code &
    __SI_MASK) value, in the same way a struct siginfo would:

    struct signalfd_siginfo {
    __u32 signo; /* si_signo */
    __s32 err; /* si_errno */
    __s32 code; /* si_code */
    __u32 pid; /* si_pid */
    __u32 uid; /* si_uid */
    __s32 fd; /* si_fd */
    __u32 tid; /* si_fd */
    __u32 band; /* si_band */
    __u32 overrun; /* si_overrun */
    __u32 trapno; /* si_trapno */
    __s32 status; /* si_status */
    __s32 svint; /* si_int */
    __u64 svptr; /* si_ptr */
    __u64 utime; /* si_utime */
    __u64 stime; /* si_stime */
    __u64 addr; /* si_addr */
    };

    [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch add an anonymous inode source, to be used for files that need
    and inode only in order to create a file*. We do not care of having an
    inode for each file, and we do not even care of having different names in
    the associated dentries (dentry names will be same for classes of file*).
    This allow code reuse, and will be used by epoll, signalfd and timerfd
    (and whatever else there'll be).

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Make autofs container-friendly by caching struct pid reference rather than
    pid_t and using pid_nr() to retreive a task's pid_t.

    ChangeLog:
    - Fix Eric Biederman's comments - Use find_get_pid() to hold a
    reference to oz_pgrp and release while unmounting; separate out
    changes to autofs and autofs4.
    - Fix Cedric's comments: retain old prototype of parse_options()
    and move necessary change to its caller.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Eric Biederman
    Cc: containers@lists.osdl.org
    Acked-by: Eric W. Biederman
    Cc: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Fix coding style errors (extra spaces, long lines) in autofs and autofs4 files
    being modified for container/pidspace issues.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc:
    Cc: Eric W. Biederman
    Cc: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • attach_pid() currently takes a pid_t and then uses find_pid() to find the
    corresponding struct pid. Sometimes we already have the struct pid. We can
    then skip find_pid() if attach_pid() were to take a struct pid parameter.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Clean up massive code duplication between mpage_writepages() and
    generic_writepages().

    The new generic function, write_cache_pages() takes a function pointer
    argument, which will be called for each page to be written.

    Maybe cifs_writepages() too can use this infrastructure, but I'm not
    touching that with a ten-foot pole.

    The upcoming page writeback support in fuse will also want this.

    Signed-off-by: Miklos Szeredi
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi