11 May, 2007

4 commits

  • This is a very simple and light file descriptor that can be used as an event
    wait/dispatch mechanism by userspace (both wait and dispatch) and by the
    kernel (dispatch only). It can be used instead of pipe(2) in all cases where
    a pipe would simply be used to signal events. An eventfd has much lower
    kernel overhead than a pipe, and it does not consume two fds. When used in
    the kernel, it can offer an fd-bridge that lets, for example, facilities
    like KAIO or syslets/threadlets signal the completion of certain operations
    to an fd. More generally, the kernel can use an eventfd to signal readiness,
    in a POSIX poll/select way, for interfaces that would otherwise be
    incompatible with poll/select. The API is:

    int eventfd(unsigned int count);

    The eventfd API accepts an initial "count" parameter, and returns an eventfd
    fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

    The POLLIN flag is raised when the internal counter is greater than zero.

    The POLLOUT flag is raised when at least a value of "1" can be written to the
    internal counter.

    The POLLERR flag is raised when an overflow in the counter value is detected.

    The write(2) operation can never overflow the counter, since it blocks (unless
    O_NONBLOCK is set, in which case -EAGAIN is returned).

    But the eventfd_signal() function can do it, since it's supposed to not sleep
    during its operation.

    The read(2) function reads the __u64 counter value, and resets the internal
    value to zero. If the value read is equal to (__u64) -1, an overflow happened
    on the internal counter (due to 2^64 eventfd_signal() posts that have never
    been retired: unlikely, but possible).

    The write(2) call writes an __u64 count value, and adds it to the current
    counter. The eventfd fd supports O_NONBLOCK also.

    On the kernel side, we have:

    struct file *eventfd_fget(int fd);
    int eventfd_signal(struct file *file, unsigned int n);

    The eventfd_fget() should be called to get a struct file* from an eventfd fd
    (this is an fget() + check of f_op being an eventfd fops pointer).

    The kernel can then call eventfd_signal() every time it wants to post an event
    to userspace. The eventfd_signal() function can be called from any context.
    A simple eventfd() test and benchmark is available here:

    http://www.xmailserver.org/eventfd-bench.c

    This is the eventfd-based version of pipetest-4 (pipe(2) based):

    http://www.xmailserver.org/pipetest-4.c

    Not that performance matters much in the eventfd case, but eventfd-bench
    shows almost double the performance of pipetest-4.

    [akpm@linux-foundation.org: fix i386 build]
    [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
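
The counter semantics described in the eventfd commit above can be tried from userspace. What follows is a minimal sketch, not the author's test program. It assumes a current glibc, where the wrapper is eventfd(initval, flags) from <sys/eventfd.h> (Linux 2.6.27 or later) rather than the single-argument form quoted in the commit message. The helper names event_post(), event_collect() and eventfd_demo() are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Post n events: write(2) adds the 8-byte value to the internal counter. */
int event_post(int efd, uint64_t n)
{
    return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Collect events: read(2) returns the counter and resets it to zero. */
int event_collect(int efd, uint64_t *total)
{
    return read(efd, total, sizeof(*total)) == (ssize_t)sizeof(*total) ? 0 : -1;
}

/*
 * Hypothetical self-check: post 3 + 4 events, collect them with one read,
 * then confirm that a second non-blocking read fails with EAGAIN because
 * the counter is back at zero.
 * Returns the collected total, or -1 on any unexpected result.
 */
long eventfd_demo(void)
{
    uint64_t total = 0, again = 0;
    long result = -1;
    int efd = eventfd(0, EFD_NONBLOCK);   /* initial count of 0 */

    if (efd < 0)
        return -1;
    if (event_post(efd, 3) == 0 && event_post(efd, 4) == 0 &&
        event_collect(efd, &total) == 0 &&
        read(efd, &again, sizeof(again)) == -1 && errno == EAGAIN)
        result = (long)total;
    close(efd);
    return result;
}
```

Because several posts collapse into one counter, a single read(2) after two writes reports their sum; this is the property that makes an eventfd cheaper than a pipe for pure signalling.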
     
  • This patch introduces a new system call for timer events delivered through
    file descriptors. This allows timer events to be used with standard POSIX
    poll(2), select(2) and read(2). As a consequence of supporting the Linux
    f_op->poll subsystem, they can be used with epoll(2) too.

    The system call is defined as:

    int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

    The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
    without going through the close/open cycle (same as signalfd). If "ufd" is -1,
    a new file descriptor will be created, otherwise the existing "ufd" will be
    re-programmed.

    The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
    specified in the "utmr->it_value" parameter is the expiry time for the timer.

    If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
    otherwise it's a relative time.

    If the time specified in "utmr->it_interval" is not zero (zero meaning
    tv_sec == 0 and tv_nsec == 0), this is the period at which the following
    ticks should be generated.

    The "utmr->it_interval" should be set to zero if only one tick is requested.
    Setting the "utmr->it_value" to zero will disable the timer, or will create a
    timerfd without the timer enabled.

    The function returns the new (or same, in case "ufd" is a valid timerfd
    descriptor) file descriptor, or -1 in case of error.

    As stated before, the timerfd file descriptor supports poll(2), select(2) and
    epoll(2). When a timer event happens on the timerfd, a POLLIN mask will be
    returned.

    The read(2) call can be used, and it will return a u32 variable holding the
    number of "ticks" that happened on the interface since the last call to
    read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will
    be returned if no ticks happened.

    A quick test program shows timerfd working correctly on my amd64 box:

    http://www.xmailserver.org/timerfd-test.c

    [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch series implements the new signalfd() system call.

    I took part of the original Linus code (and you know how badly it can be
    broken :), and I added even more breakage ;) Signals are fetched from the same
    signal queue used by the process, so signalfd will compete with standard
    kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
    the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
    seems to be working fine on my Dual Opteron machine. I made a quick test
    program for it:

    http://www.xmailserver.org/signafd-test.c

    The signalfd() system call implements signal delivery into a file descriptor
    receiver. The signalfd file descriptor is created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

    The "ufd" parameter allows changing the sigmask of an existing signalfd
    without going through a close/create cycle (Linus' idea). Use "ufd" == -1 if
    you want a brand new signalfd file.

    The "mask" parameter specifies the set of signals that we are interested in.
    The "masksize" parameter is the size of "mask".

    The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
    will return POLLIN when signals are available to be dequeued. As a direct
    consequence of supporting the Linux poll subsystem, the signalfd fd can be
    used together with epoll(2) too.

    The read(2) system call will return a "struct signalfd_siginfo" structure in
    the userspace supplied buffer. The return value is the number of bytes copied
    in the supplied buffer, or -1 in case of error. The read(2) call can also
    return 0, in case the sighand structure to which the signalfd was attached,
    has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
    return -EAGAIN in case no signal is available.

    If the size of the buffer passed to read(2) is lower than sizeof(struct
    signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
    return -ERESTARTSYS in case a signal hits the process. The format of struct
    signalfd_siginfo is shown below; which fields are valid depends on the
    (->code & __SI_MASK) value, in the same way as for a struct siginfo:

    struct signalfd_siginfo {
            __u32 signo;    /* si_signo */
            __s32 err;      /* si_errno */
            __s32 code;     /* si_code */
            __u32 pid;      /* si_pid */
            __u32 uid;      /* si_uid */
            __s32 fd;       /* si_fd */
            __u32 tid;      /* si_tid */
            __u32 band;     /* si_band */
            __u32 overrun;  /* si_overrun */
            __u32 trapno;   /* si_trapno */
            __s32 status;   /* si_status */
            __s32 svint;    /* si_int */
            __u64 svptr;    /* si_ptr */
            __u64 utime;    /* si_utime */
            __u64 stime;    /* si_stime */
            __u64 addr;     /* si_addr */
    };

    [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch adds an anonymous inode source, to be used for files that need
    an inode only in order to create a file*. We do not care about having an
    inode for each file, and we do not even care about having different names in
    the associated dentries (dentry names will be the same for classes of file*).
    This allows code reuse, and will be used by epoll, signalfd and timerfd
    (and whatever else there'll be).

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

18 Feb, 2007

1 commit


09 Dec, 2006

1 commit

  • Introduce several fsstack_copy_* functions which allow stackable filesystems
    (such as eCryptfs and Unionfs) to easily copy over (currently only) inode
    attributes. This prevents code duplication and allows for code reuse.

    [akpm@osdl.org: Remove unneeded wrapper]
    [bunk@stusta.de: fs/stack.c should #include ]
    Signed-off-by: Josef "Jeff" Sipek
    Cc: Michael Halcrow
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

12 Oct, 2006

2 commits


05 Oct, 2006

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6: (292 commits)
    [GFS2] Fix endian bug for de_type
    [GFS2] Initialize SELinux extended attributes at inode creation time.
    [GFS2] Move logging code into log.c (mostly)
    [GFS2] Mark nlink cleared so VFS sees it happen
    [GFS2] Two redundant casts removed
    [GFS2] Remove uneeded endian conversion
    [GFS2] Remove duplicate sb reading code
    [GFS2] Mark metadata reads for blktrace
    [GFS2] Remove iflags.h, use FS_
    [GFS2] Fix code style/indent in ops_file.c
    [GFS2] streamline-generic_file_-interfaces-and-filemap gfs fix
    [GFS2] Remove readv/writev methods and use aio_read/aio_write instead (gfs bits)
    [GFS2] inode-diet: Eliminate i_blksize from the inode structure
    [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs)
    [GFS2] Fix typo in last patch
    [GFS2] Fix direct i/o logic in filemap.c
    [GFS2] Fix bug in Makefiles for lock modules
    [GFS2] Remove (extra) fs_subsys declaration
    [GFS2/DLM] Fix trailing whitespace
    [GFS2] Tidy up meta_io code
    ...

    Linus Torvalds
     

04 Oct, 2006

1 commit

  • eCryptfs is a stacked cryptographic filesystem for Linux. It is derived from
    Erez Zadok's Cryptfs, implemented through the FiST framework for generating
    stacked filesystems. eCryptfs extends Cryptfs to provide advanced key
    management and policy features. eCryptfs stores cryptographic metadata in the
    header of each file written, so that encrypted files can be copied between
    hosts; the file will be decryptable with the proper key, and there is no need
    to keep track of any additional information aside from what is already in the
    encrypted file itself.

    [akpm@osdl.org: updates for ongoing API changes]
    [bunk@stusta.de: cleanups]
    [akpm@osdl.org: alpha build fix]
    [akpm@osdl.org: cleanups]
    [tytso@mit.edu: inode-diet updates]
    [pbadari@us.ibm.com: generic_file_*_read/write() interface updates]
    [rdunlap@xenotime.net: printk format fixes]
    [akpm@osdl.org: make slab creation and teardown table-driven]
    Signed-off-by: Phillip Hellewell
    Signed-off-by: Michael Halcrow
    Signed-off-by: Erez Zadok
    Signed-off-by: Adrian Bunk
    Signed-off-by: Stephan Mueller
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     

02 Oct, 2006

1 commit


01 Oct, 2006

2 commits

  • * fs/open.c is getting a bit crowded
    * preparation to lutimes(2)

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

        (*) Block I/O tracing.

        (*) Disk partition code.

        (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

        (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
            block layer to do scheduling. Some drivers that use SCSI facilities -
            such as USB storage - end up disabled indirectly from this.

        (*) Various block-based device drivers, such as IDE and the old CDROM
            drivers.

        (*) MTD blockdev handling and FTL.

        (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
            taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

        (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

        (*) /proc/devices does not list any blockdevs.

        (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

30 Sep, 2006

1 commit

  • The patches solve the following problem: We want to grant access to devices
    based on who is logged in from where, etc. This includes switching back and
    forth between multiple user sessions, etc.

    Using ACLs to define device access for logged-in users gives us all the
    flexibility we need in order to fully solve the problem.

    Device special files nowadays usually live on tmpfs, hence tmpfs ACLs.

    Different distros have come up with solutions that solve the problem to
    different degrees: SUSE uses a resource manager which tracks login sessions
    and sets ACLs on device inodes as appropriate. RedHat uses pam_console, which
    changes the primary file ownership to the logged-in user. Others use a set of
    groups that users must be in in order to be granted the appropriate accesses.

    The freedesktop.org project plans to implement a combination of a
    console-tracker and a HAL-device-list based solution to grant access to
    devices to users, and more distros will likely follow this approach.

    These patches have first been posted here on 2 February 2005, and again
    on 8 January 2006. We have been shipping them in SLES9 and SLES10 with
    no problems reported. The previous submission is archived here:

    http://lkml.org/lkml/2006/1/8/229
    http://lkml.org/lkml/2006/1/8/230
    http://lkml.org/lkml/2006/1/8/231

    This patch:

    Add some infrastructure for access control lists on in-memory
    filesystems such as tmpfs.

    Signed-off-by: Andreas Gruenbacher
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Gruenbacher
     

03 Jul, 2006

1 commit


27 Jun, 2006

1 commit


20 Jun, 2006

1 commit

  • The following series of patches introduces a kernel API for inotify,
    making it possible for kernel modules to benefit from inotify's
    mechanism for watching inodes. With these patches, inotify will
    maintain for each caller a list of watches (via an embedded struct
    inotify_watch), where each inotify_watch is associated with a
    corresponding struct inode. The caller registers an event handler and
    specifies for which filesystem events their event handler should be
    called per inotify_watch.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     

26 May, 2006

1 commit


18 May, 2006

1 commit


03 Apr, 2006

1 commit


01 Apr, 2006

2 commits

  • Steven Whitehouse
     
  • Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
    fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
    Reasons:

    - It's more flexible. Things which would require two or three syscalls with
    fadvise() can be done in a single syscall.

    - Using fadvise() in this manner is something not covered by POSIX.

    The patch wires up the syscall for x86.

    The syscall is implemented in the new fs/sync.c. The intention is that we can
    move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.

    Documentation for the syscall is in fs/sync.c.

    A test app (sync_file_range.c) is in
    http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

    The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
    say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
    NFS_DATA_SYNC which is hopefully the more common."

    Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
    the queue is congested. This is trivial to fix: add a new flag bit, set
    wbc->nonblocking. But I'm not sure that we want to expose implementation
    details down to that level.

    Note: it's notable that we can sync an fd which wasn't opened for writing.
    (Same with fsync() and fdatasync().)

    Note: the code takes some care to handle attempts to sync file contents
    outside the 16TB offset on 32-bit machines. It makes such attempts appear to
    succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
    requests fail...

    Cc: Nick Piggin
    Cc: Michael Kerrisk
    Cc: Ulrich Drepper
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
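
The "single syscall instead of two or three fadvise() calls" point in the commit above can be illustrated from userspace. This is a sketch under stated assumptions: it relies on the glibc wrapper sync_file_range() (enabled by _GNU_SOURCE) and on the SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER flags documented in the sync_file_range(2) manual page; only SYNC_FILE_RANGE_WRITE is named in the commit message itself. The helper names flush_range() and sync_file_range_demo() and the /tmp path are hypothetical.

```c
#define _GNU_SOURCE   /* sync_file_range() is a Linux-specific extension */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Wait for writeout already in flight, start writeout of the dirty pages in
 * [offset, offset + nbytes), and wait for that too, all in one call.
 * Only file data is covered; unlike fsync(), metadata is not flushed.
 */
int flush_range(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/*
 * Hypothetical self-check: write 4 KiB to a temporary file and flush exactly
 * that range. Returns 0 on success, -1 otherwise.
 */
int sync_file_range_demo(void)
{
    char path[] = "/tmp/sfr-demo-XXXXXX";
    char buf[4096];
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf))
        rc = flush_range(fd, 0, (off_t)sizeof(buf));
    close(fd);
    unlink(path);
    return rc;
}
```

Passing SYNC_FILE_RANGE_WRITE alone would request the asynchronous writeout mode that the commit's first note discusses.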
     

31 Mar, 2006

1 commit

  • This adds support for the sys_splice system call. Using a pipe as a
    transport, it can connect to files or sockets (latter as output only).

    From the splice.c comments:

    "splice": joining two ropes together by interweaving their strands.

    This is the "extended pipe" functionality, where a pipe is used as
    an arbitrary in-memory buffer. Think of a pipe as a small kernel
    buffer that you can use to transfer data from one end to the other.

    The traditional unix read/write is extended with a "splice()" operation
    that transfers data buffers to or from a pipe buffer.

    Named by Larry McVoy, original implementation from Linus, extended by
    Jens to support splicing to files and fixing the initial implementation
    bugs.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
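
The "pipe as an in-memory buffer" idea in the splice commit above can be sketched with a file-to-file copy that never brings the data into a userspace buffer. This is an illustration, not code from the patch: it assumes the glibc splice() wrapper (enabled by _GNU_SOURCE) and a filesystem that supports splicing in both directions. The helper names splice_copy() and splice_demo() and the /tmp paths are hypothetical.

```c
#define _GNU_SOURCE   /* splice() is a Linux-specific extension */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Copy up to len bytes from in_fd to out_fd, using a pipe as the transport.
 * Returns the number of bytes copied, or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;
    while (len > 0) {
        /* file -> pipe: moves at most one pipe's worth of data */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (in <= 0) {
            if (in < 0)
                total = -1;
            break;                      /* 0 means end of input */
        }
        len -= (size_t)in;
        /* pipe -> file: drain everything that was just queued */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in,
                                 SPLICE_F_MOVE);
            if (out <= 0) {
                total = -1;
                goto done;
            }
            in -= out;
            total += out;
        }
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}

/* Hypothetical self-check: copy a short message and compare the result. */
int splice_demo(void)
{
    static const char msg[] = "joining two ropes";
    char src[] = "/tmp/splice-src-XXXXXX", dst[] = "/tmp/splice-dst-XXXXXX";
    char back[sizeof(msg)] = {0};
    int in_fd = mkstemp(src), out_fd = mkstemp(dst), ok = -1;

    if (in_fd >= 0 && out_fd >= 0 &&
        write(in_fd, msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
        lseek(in_fd, 0, SEEK_SET) == 0 &&
        splice_copy(in_fd, out_fd, sizeof(msg)) == (ssize_t)sizeof(msg) &&
        pread(out_fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
        memcmp(msg, back, sizeof(msg)) == 0)
        ok = 0;
    if (in_fd >= 0) { close(in_fd); unlink(src); }
    if (out_fd >= 0) { close(out_fd); unlink(dst); }
    return ok;
}
```

One end of every splice() call must be a pipe, which is why the copy takes two calls per chunk.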
     

24 Mar, 2006

1 commit


18 Jan, 2006

1 commit

  • This is the core of the distributed lock manager which is required
    to use GFS2 as a cluster filesystem. It is also used by CLVM and
    can be used as a standalone lock manager independently of either
    of these two projects.

    It implements VAX-style locking modes.

    Signed-off-by: David Teigland
    Signed-off-by: Steve Whitehouse

    David Teigland
     

17 Jan, 2006

1 commit


11 Jan, 2006

1 commit

  • Now that all these entries in the arch ioctl32.c files are gone [1], we can
    build fs/compat_ioctl.c as a normal object and kill tons of cruft. We need a
    special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
    This is not needed but harmless on all other architectures. Also remove some
    superfluous includes in fs/compat_ioctl.c.

    Tested on ppc64.

    [1] parisc still had its PPP handler left, which is not fully correct
    for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
    kick in for all netdevice users. We can introduce a proper handler
    in one of the next patch series by adding a compat_ioctl method to
    struct net_device but for now let's just kill it - parisc doesn't
    compile in mainline anyway and I don't want this to block this
    patchset.

    Signed-off-by: Christoph Hellwig
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

09 Jan, 2006

1 commit

  • Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
    discard as much pagecache and/or reclaimable slab objects as it can. This
    operation requires root permissions.

    It won't drop dirty data, so the user should run `sync' first.

    Caveats:

    a) Holds inode_lock for exorbitant amounts of time.

    b) Needs to be taught about NUMA nodes: propagate these all the way through
    so the discarding can be controlled on a per-node basis.

    This is a debugging feature: useful for getting consistent results between
    filesystem benchmarks. We could possibly put it under a config option, but
    it's less than 300 bytes.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

04 Jan, 2006

2 commits


08 Nov, 2005

1 commit


10 Sep, 2005

2 commits

  • This patch adds FUSE filesystem to MAINTAINERS, fs/Kconfig and
    fs/Makefile.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • OVERVIEW

    V9FS is a distributed file system for Linux which provides an
    implementation of the Plan 9 resource sharing protocol 9P. It can be
    used to share all sorts of resources: static files, synthetic file servers
    (such as /proc or /sys), devices, and application file servers (such as
    FUSE).

    BACKGROUND

    Plan 9 (http://plan9.bell-labs.com/plan9) is a research operating
    system and associated applications suite developed by the Computing
    Science Research Center of AT&T Bell Laboratories (now a part of
    Lucent Technologies), the same group that developed UNIX, C, and C++.
    Plan 9 was initially released in 1993 to universities, and then made
    generally available in 1995. Its core operating systems code laid the
    foundation for the Inferno Operating System released as a product by
    Lucent Bell-Labs in 1997. The Inferno venture was the only commercial
    embodiment of Plan 9 and is currently maintained as a product by Vita
    Nuova (http://www.vitanuova.com). After updated releases in 2000 and
    2002, Plan 9 was open-sourced under the OSI approved Lucent Public
    License in 2003.

    The Plan 9 project was started by Ken Thompson and Rob Pike in 1985.
    Their intent was to explore potential solutions to some of the
    shortcomings of UNIX in the face of the widespread use of high-speed
    networks to connect machines. In UNIX, networking was an afterthought
    and UNIX clusters became little more than a network of stand-alone
    systems. Plan 9 was designed from first principles as a seamless
    distributed system with integrated secure network resource sharing.
    Applications and services were architected in such a way as to allow
    for implicit distribution across a cluster of systems. Configuring an
    environment to use remote application components or services in place
    of their local equivalent could be achieved with a few simple command
    line instructions. For the most part, application implementations
    operated independent of the location of their actual resources.

    Commercial operating systems haven't changed much in the 20 years
    since Plan 9 was conceived. Network and distributed systems support is
    provided by a patchwork of middle-ware, with an endless number of
    packages supplying pieces of the puzzle. Matters are complicated by
    the use of different complicated protocols for individual services,
    and separate implementations for kernel and application resources.
    The V9FS project (http://v9fs.sourceforge.net) is an attempt to bring
    Plan 9's unified approach to resource sharing to Linux and other
    operating systems via support for the 9P2000 resource sharing
    protocol.

    V9FS HISTORY

    V9FS was originally developed by Ron Minnich and Maya Gokhale at Los
    Alamos National Labs (LANL) in 1997. In November of 2001, Greg Watson
    set up a SourceForge project as a public repository for the code which
    supported the Linux 2.4 kernel.

    About a year ago, I picked up the initial attempt Ron Minnich had
    made to provide 2.6 support and got the code integrated into a 2.6.5
    kernel. I then went through a line-for-line re-write attempting to
    clean-up the code while more closely following the Linux Kernel style
    guidelines. I co-authored a paper with Ron Minnich on the V9FS Linux
    support including performance comparisons to NFSv3 using Bonnie and
    PostMark - this paper appeared at the USENIX/FREENIX 2005
    conference in April 2005:
    ( http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html ).

    CALL FOR PARTICIPATION/REQUEST FOR COMMENTS

    Our 2.6 kernel support is stabilizing and we'd like to begin pursuing
    its integration into the official kernel tree. We would appreciate any
    review, comments, critiques, and additions from this community and are
    actively seeking people to join our project and help us produce
    something that would be acceptable and useful to the Linux community.

    STATUS

    The code is reasonably stable, although there are no doubt corner cases
    our regression tests haven't discovered yet. It is in regular use by several
    of the developers and has been tested on x86 and PowerPC
    (32-bit and 64-bit) in both small and large (LANL cluster) deployments.
    Our current regression tests include fsx, bonnie, and postmark.

    It was our intention to keep things as simple as possible for this
    release -- trying to focus on correctness within the core of the
    protocol support versus a rich set of features. For example: a more
    complete security model and cache layer are in the road map, but
    excluded from this release. Additionally, we have removed support for
    mmap operations at Al Viro's request.

    PERFORMANCE

    Detailed performance numbers and analysis are included in the FREENIX
    paper, but we show comparable performance to NFSv3 for large file
    operations based on the Bonnie benchmark, and superior performance for
    many small file operations based on the PostMark benchmark. Somewhat
    preliminary graphs (from the FREENIX paper) are available
    (http://v9fs.sourceforge.net/perf/index.html).

    RESOURCES

    The source code is available in a few different forms:

    tarballs: http://v9fs.sf.net
    CVSweb: http://cvs.sourceforge.net/viewcvs.py/v9fs/linux-9p/
    CVS: :pserver:anonymous@cvs.sourceforge.net:/cvsroot/v9fs/linux-9p
    Git: rsync://v9fs.graverobber.org/v9fs (webgit: http://v9fs.graverobber.org)
    9P: tcp!v9fs.graverobber.org!6564

    The user-level server is available from either the Plan 9 distribution
    or from http://v9fs.sf.net
    Other support applications are still being developed, but preliminary
    versions can be downloaded from sourceforge.

    Documentation on the protocol has historically been the Plan 9 Man
    pages (http://plan9.bell-labs.com/sys/man/5/INDEX.html), but there is
    an effort under way to write a more complete Internet-Draft style
    specification (http://v9fs.sf.net/rfc).

    There are a couple of mailing lists supporting v9fs, but the most used
    is v9fs-developer@lists.sourceforge.net -- please direct/cc your
    comments there so the other v9fs contributors can participate in the
    conversation. There is also an IRC channel: irc://freenode.net/#v9fs

    This part of the patch contains Documentation, Makefiles, and configuration
    file changes.

    Signed-off-by: Eric Van Hensbergen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Van Hensbergen
     

08 Sep, 2005

1 commit

  • Here's the latest version of relayfs, against linux-2.6.11-mm2. I'm hoping
    you'll consider putting this version back into your tree - the previous
    rounds of comment seem to have shaken out all the API issues and the number
    of comments on the code itself has also steadily dwindled.

    This patch is essentially the same as the relayfs redux part 5 patch, with
    some minor changes based on reviewer comments. Thanks again to Pekka
    Enberg for those. The patch size without documentation is now a little
    smaller at just over 40k. Here's a detailed list of the changes:

    - removed the attribute_flags in relay open and changed it to a
    boolean specifying either overwrite or no-overwrite mode, and removed
    everything referencing the attribute flags.
    - added a check for NULL names in relayfs_create_entry()
    - got rid of the unnecessary multiple labels in relay_create_buf()
    - some minor simplification of relay_alloc_buf() which got rid of a
    couple params
    - updated the Documentation

    In addition, this version (through code contained in the relay-apps tarball
    linked to below, not as part of the relayfs patch) tries to make it as easy
    as possible to create the cooperating kernel/user pieces of a typical and
    common type of logging application, one where kernel logging is kicked off
    when a user space data collection app starts and stops when the collection
    app exits, with the data being automatically logged to disk in between. To
    create this type of application, you basically just include a header file
    (relay-app.h, included in the relay-apps tarball) in your kernel module,
    define a couple of callbacks and call an initialization function, and on
    the user side call a single function that sets up and continuously monitors
    the buffers, and writes data to files as it becomes available. Channels
    are created when the collection app is started and destroyed when it exits,
    not when the kernel module is inserted, so different channel buffer sizes
    can be specified for each separate run via command-line options. See the
    README in the relay-apps tarball for details.

    Also included in the relay-apps tarball are a couple examples
    demonstrating how you can use this to create quick and dirty kernel
    logging/debugging applications. They are:

    - tprintk, short for 'tee printk', which temporarily puts a kprobe on
    printk() and writes a duplicate stream of printk output to a relayfs
    channel. This could be used anywhere there's printk() debugging code
    in the kernel which you'd like to exercise, but would rather not have
    your system logs cluttered with debugging junk. You'd probably want
    to kill klogd while you do this, otherwise there wouldn't be much
    point (since putting a kprobe on printk() doesn't change the output
    of printk()). I've used this method to temporarily divert the packet
    logging output of the iptables LOG target from the system logs to
    relayfs files instead, for instance.

    - klog, which just provides a printk-like formatted logging function
    on top of relayfs. Again, you can use this to keep stuff out of your
    system logs if used in place of printk.

    The example applications can be found here:

    http://prdownloads.sourceforge.net/dprobes/relay-apps.tar.gz?download

    From: Christoph Hellwig

    avoid lookup_hash usage in relayfs

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     

13 Jul, 2005

1 commit

  • inotify is intended to correct the deficiencies of dnotify, particularly
    its inability to scale and its terrible user interface:

    * dnotify requires the opening of one fd per each directory
    that you intend to watch. This quickly results in too many
    open files and pins removable media, preventing unmount.
    * dnotify is directory-based. You only learn about changes to
    directories. Sure, a change to a file in a directory affects
    the directory, but you are then forced to keep a cache of
    stat structures.
    * dnotify's interface to user-space is awful. Signals?

    inotify provides a more usable, simple, powerful solution to file change
    notification:

    * inotify's interface is a system call that returns a fd, not SIGIO.
    You get a single fd, which is select()-able.
    * inotify has an event that says "the filesystem that the item
    you were watching is on was unmounted."
    * inotify can watch directories or files.

    Inotify is currently used by Beagle (a desktop search infrastructure),
    Gamin (a FAM replacement), and other projects.

    See Documentation/filesystems/inotify.txt.

    Signed-off-by: Robert Love
    Cc: John McCutchan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Love
     

28 Jun, 2005

1 commit

  • This updates the CFQ io scheduler to the new time sliced design (cfq
    v3). It provides full process fairness, while giving excellent
    aggregate system throughput even for many competing processes. It
    supports io priorities, either inherited from the cpu nice value or set
    directly with the ioprio_get/set syscalls. The latter closely mimic
    set/getpriority.

    This import is based on my latest from -mm.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
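
Setting an io priority directly, as the CFQ commit above mentions, can be sketched from userspace. glibc provides no wrapper for ioprio_get/ioprio_set, so this example goes through syscall(2). The IOP_* constants are hypothetical local names; their values are an assumption taken from the kernel's ioprio header (class in the bits above a 13-bit shift, IOPRIO_WHO_PROCESS equal to 1), and the helper names are hypothetical as well.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of the kernel's ioprio encoding (assumed values, see above). */
#define IOP_CLASS_SHIFT 13
#define IOP_VALUE(io_class, level) (((io_class) << IOP_CLASS_SHIFT) | (level))
enum { IOP_CLASS_NONE, IOP_CLASS_RT, IOP_CLASS_BE, IOP_CLASS_IDLE };
enum { IOP_WHO_PROCESS = 1, IOP_WHO_PGRP, IOP_WHO_USER };

/* Set the io priority of the calling thread (who == 0 means "self"). */
int io_prio_set_self(int io_class, int level)
{
    return (int)syscall(SYS_ioprio_set, IOP_WHO_PROCESS, 0,
                        IOP_VALUE(io_class, level));
}

/* Return the encoded io priority of the calling thread, or -1 on error. */
int io_prio_get_self(void)
{
    return (int)syscall(SYS_ioprio_get, IOP_WHO_PROCESS, 0);
}

/*
 * Hypothetical self-check: switch to best-effort level 6 and read the value
 * back. Returns 0 on success, -1 otherwise.
 * Side effect: the caller keeps the new io priority.
 */
int ioprio_demo(void)
{
    int want = IOP_VALUE(IOP_CLASS_BE, 6);

    if (io_prio_set_self(IOP_CLASS_BE, 6) < 0)
        return -1;
    return io_prio_get_self() == want ? 0 : -1;
}
```

The (which, who, value) argument order mirrors setpriority(2), which is the resemblance the commit message points out.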
     

23 Jun, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds