13 Dec, 2005

1 commit


23 Nov, 2005

1 commit

  • Correct lots of URLs in Documentation/. Also a few minor whitespace cleanups
    and typo/spello fixes. Sadly there are still a lot of bad URLs remaining.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

09 Nov, 2005

1 commit


07 Nov, 2005

4 commits


03 Nov, 2005

1 commit


11 Oct, 2005

1 commit

  • file operations ->write(), ->aio_write(), and ->writev() for regular
    files. This replaces the old use of generic_file_write(), et al and
    the address space operations ->prepare_write and ->commit_write.
    This means that both sparse and non-sparse (unencrypted and
    uncompressed) files can now be extended using the normal write(2)
    code path. There are two limitations at present and these are that
    we never create sparse files and that we only have limited support
    for highly fragmented files, i.e. ones whose data attribute is split
    across multiple extents. When such a case is encountered,
    EOPNOTSUPP is returned.

    Signed-off-by: Anton Altaparmakov

    Anton Altaparmakov
     

18 Sep, 2005

1 commit


10 Sep, 2005

6 commits

  • Make data caching behavior selectable on a per-open basis instead of
    per-mount. Compatibility for the old mount options 'kernel_cache' and
    'direct_io' is retained in the userspace library (version 2.4.0-pre1 or
    later).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This adds the FUSE device handling functions.

    This contains the following files:

    o dev.c
    - fuse device operations (read, write, release, poll)
    - registers misc device
    - support for sending requests to userspace

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch brings the now out-of-date Documentation/filesystems/vfs.txt
    back to life. Thanks to Carsten Otte, Trond Myklebust, and Anton
    Altaparmakov for their help on updating this documentation.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka J Enberg
     
  • Someone complained about the docs for vm_overcommit_memory being wrong.
    This patch copies the text from the vm documentation into procfs.

    Signed-off-by: Chuck Ebbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     
  • OVERVIEW

    V9FS is a distributed file system for Linux which provides an
    implementation of the Plan 9 resource sharing protocol 9P. It can be
    used to share all sorts of resources: static files, synthetic file servers
    (such as /proc or /sys), devices, and application file servers (such as
    FUSE).

    BACKGROUND

    Plan 9 (http://plan9.bell-labs.com/plan9) is a research operating
    system and associated applications suite developed by the Computing
    Science Research Center of AT&T Bell Laboratories (now a part of
    Lucent Technologies), the same group that developed UNIX, C, and C++.
    Plan 9 was initially released in 1993 to universities, and then made
    generally available in 1995. Its core operating systems code laid the
    foundation for the Inferno Operating System released as a product by
    Lucent Bell-Labs in 1997. The Inferno venture was the only commercial
    embodiment of Plan 9 and is currently maintained as a product by Vita
    Nuova (http://www.vitanuova.com). After updated releases in 2000 and
    2002, Plan 9 was open-sourced under the OSI approved Lucent Public
    License in 2003.

    The Plan 9 project was started by Ken Thompson and Rob Pike in 1985.
    Their intent was to explore potential solutions to some of the
    shortcomings of UNIX in the face of the widespread use of high-speed
    networks to connect machines. In UNIX, networking was an afterthought
    and UNIX clusters became little more than a network of stand-alone
    systems. Plan 9 was designed from first principles as a seamless
    distributed system with integrated secure network resource sharing.
    Applications and services were architected in such a way as to allow
    for implicit distribution across a cluster of systems. Configuring an
    environment to use remote application components or services in place
    of their local equivalent could be achieved with a few simple command
    line instructions. For the most part, application implementations
    operated independent of the location of their actual resources.

    Commercial operating systems haven't changed much in the 20 years
    since Plan 9 was conceived. Network and distributed systems support is
    provided by a patchwork of middle-ware, with an endless number of
    packages supplying pieces of the puzzle. Matters are complicated by
    the use of different complicated protocols for individual services,
    and separate implementations for kernel and application resources.
    The V9FS project (http://v9fs.sourceforge.net) is an attempt to bring
    Plan 9's unified approach to resource sharing to Linux and other
    operating systems via support for the 9P2000 resource sharing
    protocol.

    V9FS HISTORY

    V9FS was originally developed by Ron Minnich and Maya Gokhale at Los
    Alamos National Labs (LANL) in 1997. In November of 2001, Greg Watson
    set up a SourceForge project as a public repository for the code which
    supported the Linux 2.4 kernel.

    About a year ago, I picked up the initial attempt Ron Minnich had
    made to provide 2.6 support and got the code integrated into a 2.6.5
    kernel. I then went through a line-for-line re-write attempting to
    clean-up the code while more closely following the Linux Kernel style
    guidelines. I co-authored a paper with Ron Minnich on the V9FS Linux
    support including performance comparisons to NFSv3 using Bonnie and
    PostMark - this paper appeared at the USENIX/FREENIX 2005
    conference in April 2005:
    ( http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html ).

    CALL FOR PARTICIPATION/REQUEST FOR COMMENTS

    Our 2.6 kernel support is stabilizing and we'd like to begin pursuing
    its integration into the official kernel tree. We would appreciate any
    review, comments, critiques, and additions from this community and are
    actively seeking people to join our project and help us produce
    something that would be acceptable and useful to the Linux community.

    STATUS

    The code is reasonably stable, although there are no doubt corner cases
    our regression tests haven't discovered yet. It is in regular use by several
    of the developers and has been tested on x86 and PowerPC
    (32-bit and 64-bit) in both small and large (LANL cluster) deployments.
    Our current regression tests include fsx, bonnie, and postmark.

    It was our intention to keep things as simple as possible for this
    release -- trying to focus on correctness within the core of the
    protocol support versus a rich set of features. For example: a more
    complete security model and cache layer are in the road map, but
    excluded from this release. Additionally, we have removed support for
    mmap operations at Al Viro's request.

    PERFORMANCE

    Detailed performance numbers and analysis are included in the FREENIX
    paper, but we show comparable performance to NFSv3 for large file
    operations based on the Bonnie benchmark, and superior performance for
    many small file operations based on the PostMark benchmark. Somewhat
    preliminary graphs (from the FREENIX paper) are available
    (http://v9fs.sourceforge.net/perf/index.html).

    RESOURCES

    The source code is available in a few different forms:

    tarballs: http://v9fs.sf.net
    CVSweb: http://cvs.sourceforge.net/viewcvs.py/v9fs/linux-9p/
    CVS: :pserver:anonymous@cvs.sourceforge.net:/cvsroot/v9fs/linux-9p
    Git: rsync://v9fs.graverobber.org/v9fs (webgit: http://v9fs.graverobber.org)
    9P: tcp!v9fs.graverobber.org!6564

    The user-level server is available from either the Plan 9 distribution
    or from http://v9fs.sf.net
    Other support applications are still being developed, but preliminary
    versions can be downloaded from SourceForge.

    Documentation on the protocol has historically been the Plan 9 Man
    pages (http://plan9.bell-labs.com/sys/man/5/INDEX.html), but there is
    an effort under way to write a more complete Internet-Draft style
    specification (http://v9fs.sf.net/rfc).

    There are a couple of mailing lists supporting v9fs, but the most used
    is v9fs-developer@lists.sourceforge.net -- please direct/cc your
    comments there so the other v9fs contributors can participate in the
    conversation. There is also an IRC channel: irc://freenode.net/#v9fs

    This part of the patch contains Documentation, Makefiles, and configuration
    file changes.

    Signed-off-by: Eric Van Hensbergen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Van Hensbergen
     
  • Add documentation describing the new locking scheme for file descriptor table.

    Signed-off-by: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     

09 Sep, 2005

1 commit


08 Sep, 2005

1 commit

  • Here's the latest version of relayfs, against linux-2.6.11-mm2. I'm hoping
    you'll consider putting this version back into your tree - the previous
    rounds of comment seem to have shaken out all the API issues and the number
    of comments on the code itself has also steadily dwindled.

    This patch is essentially the same as the relayfs redux part 5 patch, with
    some minor changes based on reviewer comments. Thanks again to Pekka
    Enberg for those. The patch size without documentation is now a little
    smaller at just over 40k. Here's a detailed list of the changes:

    - removed the attribute_flags in relay open and changed it to a
    boolean specifying either overwrite or no-overwrite mode, and removed
    everything referencing the attribute flags.
    - added a check for NULL names in relayfs_create_entry()
    - got rid of the unnecessary multiple labels in relay_create_buf()
    - some minor simplification of relay_alloc_buf() which got rid of a
    couple params
    - updated the Documentation

    In addition, this version (through code contained in the relay-apps tarball
    linked to below, not as part of the relayfs patch) tries to make it as easy
    as possible to create the cooperating kernel/user pieces of a typical and
    common type of logging application, one where kernel logging is kicked off
    when a user space data collection app starts and stops when the collection
    app exits, with the data being automatically logged to disk in between. To
    create this type of application, you basically just include a header file
    (relay-app.h, included in the relay-apps tarball) in your kernel module,
    define a couple of callbacks and call an initialization function, and on
    the user side call a single function that sets up and continuously monitors
    the buffers, and writes data to files as it becomes available. Channels
    are created when the collection app is started and destroyed when it exits,
    not when the kernel module is inserted, so different channel buffer sizes
    can be specified for each separate run via command-line options. See the
    README in the relay-apps tarball for details.

    Also included in the relay-apps tarball are a couple examples
    demonstrating how you can use this to create quick and dirty kernel
    logging/debugging applications. They are:

    - tprintk, short for 'tee printk', which temporarily puts a kprobe on
    printk() and writes a duplicate stream of printk output to a relayfs
    channel. This could be used anywhere there's printk() debugging code
    in the kernel which you'd like to exercise, but would rather not have
    your system logs cluttered with debugging junk. You'd probably want
    to kill klogd while you do this, otherwise there wouldn't be much
    point (since putting a kprobe on printk() doesn't change the output
    of printk()). I've used this method to temporarily divert the packet
    logging output of the iptables LOG target from the system logs to
    relayfs files instead, for instance.

    - klog, which just provides a printk-like formatted logging function
    on top of relayfs. Again, you can use this to keep stuff out of your
    system logs if used in place of printk.

    The example applications can be found here:

    http://prdownloads.sourceforge.net/dprobes/relay-apps.tar.gz?download

    From: Christoph Hellwig

    avoid lookup_hash usage in relayfs

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     

06 Sep, 2005

2 commits


05 Sep, 2005

1 commit

  • Add a "smaps" entry to /proc/pid: show how much memory is resident in each
    mapping.

    People who want to analyse memory consumption can use it, mainly to
    figure out which libraries can be trimmed for embedded systems. The new
    fields report the physical size of each mapping's pages that are shared
    and clean [or dirty], and private and clean [or dirty].

    Take a look at the example below:

    # cat /proc/4576/smaps

    08048000-080dc000 r-xp /bin/bash
    Size: 592 KB
    Rss: 500 KB
    Shared_Clean: 500 KB
    Shared_Dirty: 0 KB
    Private_Clean: 0 KB
    Private_Dirty: 0 KB
    080dc000-080e2000 rw-p /bin/bash
    Size: 24 KB
    Rss: 24 KB
    Shared_Clean: 0 KB
    Shared_Dirty: 0 KB
    Private_Clean: 0 KB
    Private_Dirty: 24 KB
    080e2000-08116000 rw-p
    Size: 208 KB
    Rss: 208 KB
    Shared_Clean: 0 KB
    Shared_Dirty: 0 KB
    Private_Clean: 0 KB
    Private_Dirty: 208 KB
    b7e2b000-b7e34000 r-xp /lib/tls/libnss_files-2.3.2.so
    Size: 36 KB
    Rss: 12 KB
    Shared_Clean: 12 KB
    Shared_Dirty: 0 KB
    Private_Clean: 0 KB
    Private_Dirty: 0 KB
    ...
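As a rough illustration of how such output might be consumed, the sketch below parses text in the layout shown above and sums one field per mapped file. It assumes exactly the line shapes in the example (an address-range header followed by `Name: N KB` lines); the function names `parse_smaps` and `total_by_path` are made up for this sketch, and smaps output on other kernels may carry extra columns that this simple parser does not split out.

```python
import re
from collections import defaultdict

# Header line as shown in the commit message: "start-end perms [path]".
_HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s*(.*)$")
# Field line as shown in the commit message: "Name: <number> KB".
_FIELD = re.compile(r"^(\w+):\s+(\d+)\s+[kK][bB]$")

SAMPLE = """\
08048000-080dc000 r-xp /bin/bash
Size: 592 KB
Rss: 500 KB
Shared_Clean: 500 KB
Shared_Dirty: 0 KB
Private_Clean: 0 KB
Private_Dirty: 0 KB
080dc000-080e2000 rw-p /bin/bash
Size: 24 KB
Rss: 24 KB
Shared_Clean: 0 KB
Shared_Dirty: 0 KB
Private_Clean: 0 KB
Private_Dirty: 24 KB
"""


def parse_smaps(text):
    """Return a list of mappings, each with its path and a dict of KB fields."""
    mappings = []
    for raw in text.splitlines():
        line = raw.strip()
        header = _HEADER.match(line)
        if header:
            mappings.append({
                "start": int(header.group(1), 16),
                "end": int(header.group(2), 16),
                "perms": header.group(3),
                "path": header.group(4),
                "fields": {},
            })
            continue
        field = _FIELD.match(line)
        if field and mappings:
            mappings[-1]["fields"][field.group(1)] = int(field.group(2))
    return mappings


def total_by_path(mappings, field):
    """Sum one field (in KB) over all mappings that share a path."""
    totals = defaultdict(int)
    for mapping in mappings:
        path = mapping["path"] or "[anon]"
        totals[path] += mapping["fields"].get(field, 0)
    return dict(totals)


if __name__ == "__main__":
    for path, kb in total_by_path(parse_smaps(SAMPLE), "Rss").items():
        print(f"{path}: {kb} KB resident")
```

Grouping by path is one possible choice for the "which libraries can be reduced" use case; grouping by individual mapping would work equally well with the same parsed data.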

    (Includes a cleanup from Richard Purdie.)

    From: Torsten Foertsch

    show_smap calls first show_map and then prints its additional information to
    the seq_file. show_map checks if all it has to print fits into the buffer and
    if yes marks the current vma as written. While that is correct for show_map
    it is not for show_smap. Here the vma should be marked as written only after
    the additional information is also written.

    The attached patch cures the problem. It moves the functionality of the
    show_map function to a new function show_map_internal that is called with an
    additional struct mem_size_stats* argument. Then show_map calls
    show_map_internal with NULL as struct mem_size_stats* whereas show_smap calls
    it with a real pointer. Now the final

        if (m->count < m->size)  /* vma is copied successfully */
                m->version = (vma != get_gate_vma(task)) ? vma->vm_start : 0;

    is done only if the whole entry fits into the buffer.
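The control flow described here can be modelled outside the kernel. The following is a toy sketch, not the kernel code: `SeqFile` is a hypothetical stand-in for the seq_file buffer whose `printf` marks the buffer full on overflow, the gate-vma special case is left out, and a vma is reduced to a `(start, end, name)` tuple. It only illustrates why the version must be recorded after the optional statistics have been printed.

```python
class SeqFile:
    """Toy bounded output buffer, loosely modelled on seq_file (assumption)."""

    def __init__(self, size):
        self.size = size
        self.count = 0
        self.buf = ""
        self.version = 0

    def printf(self, text):
        # On overflow, mark the buffer as full so later checks can see it.
        if self.count + len(text) < self.size:
            self.buf += text
            self.count += len(text)
        else:
            self.count = self.size


def show_map_internal(m, vma, stats=None):
    """Print the mapping line, then the optional stats, then mark as written."""
    start, end, name = vma
    m.printf(f"{start:08x}-{end:08x} {name}\n")
    if stats is not None:
        for key, kb in stats.items():
            m.printf(f"{key}: {kb} KB\n")
    # Only record the vma as written if the *whole* entry fit in the buffer.
    if m.count < m.size:
        m.version = start


def show_map(m, vma):
    show_map_internal(m, vma, None)


def show_smap(m, vma, stats):
    show_map_internal(m, vma, stats)


if __name__ == "__main__":
    vma = (0x08048000, 0x080DC000, "/bin/bash")
    small = SeqFile(32)
    show_smap(small, vma, {"Rss": 500})
    print("written" if small.version else "entry did not fit, will be retried")
```

In this model a buffer that holds the mapping line but not the statistics leaves `version` untouched, so the caller can retry the same entry with a larger buffer.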

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mauricio Lin
     

17 Jul, 2005

1 commit


16 Jul, 2005

1 commit


14 Jul, 2005

1 commit


13 Jul, 2005

1 commit

  • inotify is intended to correct the deficiencies of dnotify, particularly
    its inability to scale and its terrible user interface:

    * dnotify requires the opening of one fd per each directory
    that you intend to watch. This quickly results in too many
    open files and pins removable media, preventing unmount.
    * dnotify is directory-based. You only learn about changes to
    directories. Sure, a change to a file in a directory affects
    the directory, but you are then forced to keep a cache of
    stat structures.
    * dnotify's interface to user-space is awful. Signals?

    inotify provides a more usable, simple, powerful solution to file change
    notification:

    * inotify's interface is a system call that returns a fd, not SIGIO.
    You get a single fd, which is select()-able.
    * inotify has an event that says "the filesystem that the item
    you were watching is on was unmounted."
    * inotify can watch directories or files.

    Inotify is currently used by Beagle (a desktop search infrastructure),
    Gamin (a FAM replacement), and other projects.

    See Documentation/filesystems/inotify.txt.
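To make the "single select()-able fd" point concrete, here is a minimal user-space sketch. It assumes a Linux system whose C library exports `inotify_init()` and `inotify_add_watch()` and reaches them through `ctypes`; the helper names `watch` and `read_events` are invented for this example, and the event layout (`int wd`, then three 32-bit fields, then the name) follows `struct inotify_event` as documented in inotify(7).

```python
import ctypes
import os
import select
import struct
import tempfile

# Event mask bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
_EVENT_HEADER = struct.Struct("iIII")

# Symbols of the already-loaded C library (Linux only).
_libc = ctypes.CDLL(None, use_errno=True)


def watch(path, mask):
    """Create an inotify instance and watch one file or directory."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    wd = _libc.inotify_add_watch(fd, os.fsencode(path), mask)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd, wd


def read_events(fd, timeout=1.0):
    """Wait with select() on the single fd, then decode any queued events."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return []
    data = os.read(fd, 4096)
    events, offset = [], 0
    while offset < len(data):
        wd, mask, _cookie, length = _EVENT_HEADER.unpack_from(data, offset)
        offset += _EVENT_HEADER.size
        name = data[offset:offset + length].rstrip(b"\0").decode()
        offset += length
        events.append((wd, mask, name))
    return events


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    fd, _wd = watch(directory, IN_CREATE | IN_DELETE)
    open(os.path.join(directory, "example.txt"), "w").close()
    for _wd, mask, name in read_events(fd):
        print(f"mask={mask:#x} name={name}")
    os.close(fd)
```

One descriptor covers every watch added to it, which is the contrast with dnotify's one open fd per directory drawn above.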

    Signed-off-by: Robert Love
    Cc: John McCutchan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Love
     

27 Jun, 2005

1 commit

  • The situation: VFS inode X on a mounted ntfs volume is dirty. For the
    same inode X, the ntfs_inode is dirty, and thus so is the corresponding
    on-disk inode, i.e. the mft record, which is in a dirty PAGE_CACHE_PAGE
    belonging to the table of inodes, i.e. $MFT, inode 0.

    What happens:

    Process 1: sys_sync()/umount()/whatever... calls
    __sync_single_inode() for $MFT -> do_writepages() -> writepage for
    the dirty page containing the on-disk inode X, the page is now locked
    -> ntfs_write_mst_block() which clears PageUptodate() on the page to
    prevent anyone else getting hold of it whilst it does the write out.
    This is necessary as the on-disk inode needs "fixups" applied before
    the write to disk which are removed again after the write and
    PageUptodate is then set again. It then analyses the page looking
    for dirty on-disk inodes and when it finds one it calls
    ntfs_may_write_mft_record() to see if it is safe to write this
    on-disk inode. This then calls ilookup5() to check if the
    corresponding VFS inode is in the icache. This in turn calls ifind()
    which waits on the inode lock via wait_on_inode whilst holding the
    global inode_lock.

    Process 2: pdflush results in a call to __sync_single_inode for the
    same VFS inode X on the ntfs volume. This locks the inode (I_LOCK)
    then calls write_inode -> ntfs_write_inode -> map_mft_record() ->
    read_cache_page() for the page (in page cache of table of inodes
    $MFT, inode 0) containing the on-disk inode. This page has
    PageUptodate() clear because of Process 1 (see above) so
    read_cache_page() blocks when it tries to take the page lock for the
    page so it can call ntfs_readpage().

    Thus Process 1 is holding the page lock on the page containing the
    on-disk inode X and it is waiting on the inode X to be unlocked in
    ifind() so it can write the page out and then unlock the page.
    And Process 2 is holding the inode lock on inode X and is waiting for
    the page to be unlocked so it can call ntfs_readpage() or discover
    that Process 1 set PageUptodate() again and use the page.
    Thus we have a deadlock due to ifind() waiting on the inode lock.

    The solution: The fix is to use the newly introduced
    ilookup5_nowait() which does not wait on the inode's lock and hence
    avoids the deadlock. This is safe as we do not care about the VFS
    inode and only use the fact that it is in the VFS inode cache and the
    fact that the vfs and ntfs inodes are one struct in memory to find
    the ntfs inode in memory if present. Also, the ntfs inode has its
    own locking so it does not matter if the vfs inode is locked.
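The lock-ordering problem can be reproduced in miniature with ordinary threads. The sketch below is a user-space model only: `Inode`, `lookup_wait`, `lookup_nowait`, and `simulate` are hypothetical names, a `threading.Lock` stands in for I_LOCK and for the page lock, and a timeout replaces the real hang so the waiting variant reports failure instead of blocking forever.

```python
import threading


class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.lock = threading.Lock()  # stands in for I_LOCK


def lookup_wait(icache, ino, timeout):
    """Like the waiting lookup: find the inode, then wait until it is unlocked."""
    inode = icache.get(ino)
    if inode is not None:
        if not inode.lock.acquire(timeout=timeout):
            raise TimeoutError("still waiting on the inode lock")
        inode.lock.release()
    return inode


def lookup_nowait(icache, ino):
    """Like the non-waiting lookup: report cache presence, never touch the lock."""
    return icache.get(ino)


def simulate(use_nowait, timeout=0.2):
    """Return True if both processes finish without the page writer stalling."""
    inode = Inode(42)
    icache = {42: inode}
    page_lock = threading.Lock()
    page_locked = threading.Event()
    inode_locked = threading.Event()
    result = {}

    def process1():
        # Writes out the table-of-inodes page: holds the page lock, then
        # looks up the VFS inode.
        with page_lock:
            page_locked.set()
            inode_locked.wait()
            try:
                if use_nowait:
                    lookup_nowait(icache, 42)
                else:
                    lookup_wait(icache, 42, timeout)
                result["p1"] = True
            except TimeoutError:
                result["p1"] = False

    def process2():
        # Writes inode X: holds the inode lock, then needs the page lock.
        page_locked.wait()
        with inode.lock:
            inode_locked.set()
            got_page = page_lock.acquire(timeout=5 * timeout)
            if got_page:
                page_lock.release()
            result["p2"] = got_page

    threads = [threading.Thread(target=process1),
               threading.Thread(target=process2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["p1"] and result["p2"]


if __name__ == "__main__":
    print("waiting lookup completes:", simulate(use_nowait=False))
    print("non-waiting lookup completes:", simulate(use_nowait=True))
```

The events force the interleaving described in the message: the page is locked first, then the inode, and only then does the page writer perform its lookup.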

    Signed-off-by: Anton Altaparmakov

    Anton Altaparmakov
     

26 Jun, 2005

1 commit


25 Jun, 2005

1 commit


24 Jun, 2005

1 commit


23 Jun, 2005

1 commit


22 Jun, 2005

2 commits

  • The current isofs treatment of hidden files is flawed in two ways. First,
    it does not provide sufficient granularity; it hides both 'hidden' files
    and 'associated' files (resource fork for Mac files). Second, the default
    behavior to completely strip hidden files, while an admirable
    implementation of the spec, is a poor choice given the real world use of
    hidden files as a poor man's copy protection scheme for MSDOS and Windows
    based systems. A longer description of this is available here:

    http://www.uwsg.iu.edu/hypermail/linux/kernel/0205.3/0267.html

    This patch was originally built after a few private conversations with Alan
    Cox; I shamefully failed to persist in seeing it go forward, I hope to make
    amends now.

    This patch introduces granularity by allowing explicit control for both
    hidden and associated files. It also reverses the default so that by
    default, hidden files are treated as regular files on the iso9660 file
    system.

    This allows Wine to process Windows CDs, including hybrid Mac/Windows
    CDs, properly and completely, without our having to go muck up people's
    fstabs as we do now. (I have tested this with such a hybrid + hidden CD
    and have verified that this patch works as claimed.)

    Signed-off-by: Jeremy White
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy White
     
  • To improve shmem scalability, we allowed tmpfs instances which don't need
    their blocks or inodes limited not to count them, and not to allocate any
    sbinfo. Which was okay when the only use for the sbinfo was accounting
    blocks and inodes; but since then a couple of unrelated projects extending
    tmpfs want to store other data in the sbinfo. Whether either extension
    reaches mainline is beside the point: I'm guilty of a bad design decision,
    and should restore sbinfo to make any such future extensions easier.

    So, once again allocate a shmem_sb_info for every shmem/tmpfs instance, and
    now let max_blocks 0 indicate unlimited blocks, and max_inodes 0 unlimited
    inodes. Brent Casavant verified (many months ago) that this does not
    perceptibly impact the scalability (since the unlimited sbinfo cacheline is
    repeatedly accessed but only once dirtied).

    And merge shmem_set_size into its sole caller shmem_remount_fs.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

21 Jun, 2005

1 commit


22 May, 2005

1 commit


18 May, 2005

1 commit

  • The driver model has a "detach_state" mechanism that:

    - Has never been used by any in-kernel driver;
    - Is superfluous, since driver remove() methods can do the same thing;
    - Became buggy when the suspend() parameter changed semantics and type;
    - Could self-deadlock when called from certain suspend contexts;
    - Is effectively wasted documentation, object code, and headspace.

    This removes that "detach_state" mechanism; net code shrink, as well
    as a per-device saving in the driver model and sysfs.

    Signed-off-by: David Brownell
    Signed-off-by: Greg Kroah-Hartman

    David Brownell
     

05 May, 2005

1 commit


01 May, 2005

2 commits


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds