18 Jul, 2007

2 commits

  • Signed-off-by: Josef 'Jeff' Sipek
    Acked-by: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • This patch adds the kernelcore= parameter for x86.

    Once all patches are applied, a new command-line parameter and a new
    sysctl exist. This patch adds the necessary documentation.

    From: Yasunori Goto

    When the "kernelcore" boot option is specified, the kernel can't boot on
    ia64 because of an infinite loop. In addition, the parsing code can be
    handled in an architecture-independent manner.

    This patch uses common code to handle the kernelcore= parameter. It is
    only available to architectures that support arch-independent zone-sizing
    (i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
    ignore the boot parameter.

    [bunk@stusta.de: make cmdline_parse_kernelcore() static]
    Signed-off-by: Mel Gorman
    Signed-off-by: Yasunori Goto
    Acked-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

17 Jul, 2007

4 commits

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits)
    [PATCH] ocfs2: zero_user_page conversion
    ocfs2: Support xfs style space reservation ioctls
    ocfs2: support for removing file regions
    ocfs2: update truncate handling of partial clusters
    ocfs2: btree support for removal of arbitrary extents
    ocfs2: Support creation of unwritten extents
    ocfs2: support writing of unwritten extents
    ocfs2: small cleanup of ocfs2_write_begin_nolock()
    ocfs2: btree changes for unwritten extents
    ocfs2: abstract btree growing calls
    ocfs2: use all extent block suballocators
    ocfs2: plug truncate into cached dealloc routines
    ocfs2: simplify deallocation locking
    ocfs2: harden buffer check during mapping of page blocks
    ocfs2: shared writeable mmap
    ocfs2: factor out write aops into nolock variants
    ocfs2: rework ocfs2_buffered_write_cluster()
    ocfs2: take ip_alloc_sem during entire truncate
    ocfs2: Add "preferred slot" mount option
    [KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
    ...

    Linus Torvalds
     
  • Update Documentation/filesystems/vfs.txt

    Signed-off-by: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Update the description of struct file_system_type and get_sb() in
    Documentation/filesystems/vfs.txt to match the current code.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Documentation for the /proc/$pid/stat file.

    Signed-off-by: Kees Cook
    Cc: Rob Landley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

11 Jul, 2007

3 commits

  • Sometimes other drivers depend on particular configfs items. For
    example, ocfs2 mounts depend on a heartbeat region item. If that
    region item is removed with rmdir(2), the ocfs2 mount must BUG or go
    readonly. Not happy.

    This provides two additional API calls: configfs_depend_item() and
    configfs_undepend_item(). A client driver can call
    configfs_depend_item() on an existing item to tell configfs that it is
    depended on. configfs will then return -EBUSY from rmdir(2) for that
    item. When the item is no longer depended on, the client driver calls
    configfs_undepend_item() on it.

    These API calls cannot be made underneath any configfs callbacks, as
    they will conflict. They can block and allocate. A client driver
    probably shouldn't be calling them of its own gumption. Rather it should
    be providing an API that external subsystems call.

    How does this work? Imagine the ocfs2 mount process. When it mounts,
    it asks for a heartbeat region item. This is done via a call into the
    heartbeat code. Inside the heartbeat code, the region item is looked
    up. Here, the heartbeat code calls configfs_depend_item(). If it
    succeeds, then heartbeat knows the region is safe to give to ocfs2.
    If it fails, it was being torn down anyway, and heartbeat can gracefully
    pass up an error.
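
    The pinning behaviour described above can be illustrated with a small
    userspace toy model. This is only a sketch: the names ToyConfigfs, Item,
    depend_item() and undepend_item() are hypothetical stand-ins, not the kernel
    API, and the -ENOENT returned for a missing or dying item is an assumption
    (the text above only says the call fails).

```python
import errno


class Item:
    """Toy stand-in for a configfs item; not the kernel structure."""

    def __init__(self, name):
        self.name = name
        self.dependents = 0  # number of clients currently depending on the item


class ToyConfigfs:
    """Toy model of a directory tree whose items can be pinned."""

    def __init__(self):
        self.items = {}

    def mkdir(self, name):
        self.items[name] = Item(name)
        return self.items[name]

    def depend_item(self, name):
        """Pin an item; fail if it no longer exists (assumed error code)."""
        item = self.items.get(name)
        if item is None:
            return -errno.ENOENT
        item.dependents += 1
        return 0

    def undepend_item(self, name):
        """Drop one dependency taken earlier with depend_item()."""
        self.items[name].dependents -= 1

    def rmdir(self, name):
        """Remove an item unless somebody still depends on it."""
        item = self.items.get(name)
        if item is None:
            return -errno.ENOENT
        if item.dependents > 0:
            return -errno.EBUSY
        del self.items[name]
        return 0
```

    In this model, a heartbeat-like subsystem would call depend_item() before
    handing a region to a mount and undepend_item() at unmount; rmdir() in
    between returns -EBUSY.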

    [ Fixed some bad whitespace in configfs.txt. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Add a notification callback, ops->disconnect_notify(). It has the same
    prototype as ->drop_item(), but it will be called just before the item
    linkage is broken. This way, configfs users who want to do work while
    the object is still in the hierarchy have a chance.

    Client drivers will still need to config_item_put() in their
    ->drop_item(), if they implement it. They need do nothing in
    ->disconnect_notify(). They don't have to provide it if they don't
    care. But someone who wants to be notified before ci_parent is set to
    NULL can now be notified.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Convert the su_sem member of struct configfs_subsystem to a struct
    mutex, as that's what it is. Also convert all the users and update
    Documentation/configfs.txt and Documentation/configfs_example.c
    accordingly.

    [ Conflict in fs/dlm/config.c with commit
    3168b0780d06ace875696f8a648d04d6089654e5 manually resolved. --Mark ]

    Inspired-by: Satyam Sharma
    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     

09 Jun, 2007

1 commit

  • Randy Dunlap reports that a tmpfs mounted with NUMA mpol= specifying an
    offline node crashes as soon as data is allocated on it. Now restrict mpol=
    to online nodes, where before it was only restricted to MAX_NUMNODES.

    Signed-off-by: Hugh Dickins
    Cc: Robin Holt
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Tested-and-acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

25 May, 2007

1 commit


09 May, 2007

7 commits

  • This patch substitutes i_sem by i_mutex in
    Documentation/filesystems/Locking.
    The patch also removes a couple of trailing white-spaces.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Adrian Bunk

    Artem Bityutskiy
     
  • Fix various typos in kernel docs and Kconfigs, 2.6.21-rc4.

    Signed-off-by: Matt LaPlante
    Signed-off-by: Adrian Bunk

    Matt LaPlante
     
  • Signed-off-by: Randy Dunlap
    Signed-off-by: Adrian Bunk

    Randy Dunlap
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
    JFS: Fix race waking up jfsIO kernel thread
    JFS: use __set_current_state()
    Copy i_flags to jfs inode flags on write
    JFS: document uid, gid, and umask mount options in jfs.txt

    Linus Torvalds
     
  • It seems that recent versions of Windows changed the specification, and the
    change is undocumented. Windows doesn't update ->free_clusters correctly.

    This patch stops using ->free_clusters by default. (Instead, add the
    "usefree" option to force its use.)

    Signed-off-by: OGAWA Hirofumi
    Cc: Juergen Beisert
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • 1) Introduces a new method in 'struct dentry_operations'. This method,
    called d_dname(), may be called from d_path() to build a pathname for
    special filesystems. It is called without locks.

    Future patches (if we succeed in having one common dentry for all
    pipes/sockets) may need to change the prototype of this method, but for now
    we use: char *d_dname(struct dentry *dentry, char *buffer, int buflen);

    2) Adds a dynamic_dname() helper function that eases d_dname() implementations.

    3) Defines a d_dname method for sockets: no more sprintf() at socket
    creation. This is delayed until the moment someone accesses
    /proc/pid/fd/...

    4) Defines a d_dname method for pipes: no more sprintf() at pipe
    creation. This is delayed until the moment someone accesses
    /proc/pid/fd/...
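
    The idea of deferring name construction can be sketched in userspace as
    follows. This is a toy model, not kernel code: ToyDentry and the simplified
    signatures are hypothetical, a raised OverflowError stands in for whatever
    error the real helper reports when the buffer is too small, and the
    "pipe:[<inode>]" text is the format commonly seen in /proc/<pid>/fd
    listings.

```python
class ToyDentry:
    """Toy stand-in for a dentry whose name is built only on demand."""

    def __init__(self, ino, d_dname=None, name=""):
        self.ino = ino
        self.name = name        # static name, used when no d_dname hook is set
        self.d_dname = d_dname  # optional hook, called as d_dname(dentry, buflen)


def dynamic_dname(buflen, fmt, *args):
    """Format a name, failing if it (plus a terminator) would not fit."""
    text = fmt % args
    if len(text) + 1 > buflen:
        raise OverflowError("name too long for buffer")
    return text


def pipefs_dname(dentry, buflen):
    """Build the pipe's display name at lookup time instead of at creation."""
    return dynamic_dname(buflen, "pipe:[%d]", dentry.ino)


def d_path(dentry, buflen=64):
    """Use the d_dname hook when present, else fall back to the stored name."""
    if dentry.d_dname is not None:
        return dentry.d_dname(dentry, buflen)
    return dentry.name
```

    Creating a ToyDentry does no string formatting at all; the cost is paid
    only when d_path() is called, which mirrors where the saving in the patch
    comes from.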

    A benchmark consisting of 1,000,000 calls to pipe()/close()/close() gives a
    *nice* speedup on my Pentium(M) 1.6 GHz:

    3.090 s instead of 3.450 s

    Signed-off-by: Eric Dumazet
    Acked-by: Christoph Hellwig
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
    information about the memory location and usage of processes. Issues:

    - maps should not be world-readable, especially if programs expect any
    kind of ASLR protection from local attackers.
    - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
    check the maps when %n is in a *printf call, and a setuid(getuid())
    process wouldn't be able to read its own maps file. (For reference
    see http://lkml.org/lkml/2006/1/22/150)
    - a system-wide toggle is needed to allow prior behavior in the case of
    non-root applications that depend on access to the maps contents.

    This change implements a check using "ptrace_may_attach" before allowing
    access to read the maps contents. To control this protection, the new knob
    /proc/sys/kernel/maps_protect has been added, with corresponding updates to
    the procfs documentation.

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: New sysctl numbers are old hat]
    Signed-off-by: Kees Cook
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

08 May, 2007

1 commit

  • Adds /proc/pid/clear_refs. When any non-zero number is written to this file,
    pte_mkold() and ClearPageReferenced() are called for each pte and its
    corresponding page, respectively, in that task's VMAs. This file is only
    writable by the user who owns the task.

    It is now possible to measure _approximately_ how much memory a task is using
    by clearing the reference bits with

    echo 1 > /proc/pid/clear_refs

    and checking the reference count for each VMA from the /proc/pid/smaps output
    at a measured time interval. For example, to observe the approximate change
    in memory footprint for a task, write a script that clears the references
    (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and
    extracts the size in kB. Add the sizes for each VMA together for the total
    referenced footprint. Moments later, repeat the process and observe the
    difference.
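
    A script along those lines might look like the sketch below. It is an
    illustration only: the function names are made up here, and the smaps field
    label is passed in by the caller because this text calls it Pgs_Referenced
    while a given kernel may print a different label (for example
    "Referenced").

```python
import re
import time


def referenced_kb(smaps_text, field):
    """Sum every '<field>: N kB' line (one per VMA) in an smaps dump."""
    pattern = re.compile(
        r"^%s:\s+(\d+)\s+kB\s*$" % re.escape(field), re.MULTILINE
    )
    return sum(int(m.group(1)) for m in pattern.finditer(smaps_text))


def sample(pid, field, interval_s=1.0):
    """Clear the reference bits, wait, then return the referenced total in kB.

    Needs write access to /proc/<pid>/clear_refs, i.e. the task's owner.
    """
    with open("/proc/%d/clear_refs" % pid, "w") as f:
        f.write("1\n")
    time.sleep(interval_s)
    with open("/proc/%d/smaps" % pid) as f:
        return referenced_kb(f.read(), field)
```

    Calling sample() repeatedly and comparing the returned totals gives the kind
    of time series shown in the example that follows.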

    For example, using an efficient Mozilla:

    accumulated time   referenced memory
    ----------------   -----------------
     0 s                 408 kB
     1 s                 408 kB
     2 s                 556 kB
     3 s                1028 kB
     4 s                 872 kB
     5 s                1956 kB
     6 s                 416 kB
     7 s                1560 kB
     8 s                2336 kB
     9 s                1044 kB
    10 s                 416 kB

    This is a valuable tool to get an approximate measurement of the memory
    footprint for a task.

    Cc: Hugh Dickins
    Cc: Paul Mundt
    Cc: Christoph Lameter
    Signed-off-by: David Rientjes
    [akpm@linux-foundation.org: build fixes]
    [mpm@selenic.com: rename for_each_pmd]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

27 Apr, 2007

1 commit


26 Apr, 2007

1 commit


10 Mar, 2007

1 commit


05 Mar, 2007

1 commit


21 Feb, 2007

1 commit

  • simple_prepare_write leaks uninitialised kernel data. This happens because
    it leaves an uninitialised "hole" over the part of the page that the
    write is expected to go to. This is fine, but it then marks the page
    uptodate, which means a concurrent read can come in and copy the
    uninitialised memory into userspace before it is written to.

    Fix it by simply marking it uptodate in simple_commit_write instead, after
    the hole has been filled in. This could theoretically break an fs that
    uses simple_prepare_write and not simple_commit_write, and that relies on
    the incorrect simple_prepare_write behaviour. Luckily, none of those
    exists in the tree.
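
    The ordering problem can be shown with a toy userspace model. Everything
    here is hypothetical scaffolding (ToyPage, the tiny PAGE_SIZE, the buggy
    flag); it only mimics the sequence of events described above, not the
    kernel's page cache.

```python
PAGE_SIZE = 16  # tiny page so the example is easy to follow


class ToyPage:
    def __init__(self):
        # Pretend the page was recycled and still holds stale secret bytes.
        self.data = bytearray(b"S" * PAGE_SIZE)
        self.uptodate = False


def read_page(page):
    """A reader only copies the contents once the page is marked uptodate."""
    return bytes(page.data) if page.uptodate else None


def prepare_write(page, start, end, buggy=False):
    """Zero everything outside [start, end); the hole itself is left alone."""
    page.data[:start] = bytes(start)
    page.data[end:] = bytes(PAGE_SIZE - end)
    if buggy:
        page.uptodate = True  # old behaviour: marked before the hole is filled


def commit_write(page, start, payload):
    """Fill the hole, and only then mark the page uptodate."""
    page.data[start:start + len(payload)] = payload
    page.uptodate = True
```

    With buggy=True a read between prepare_write() and commit_write() returns
    the stale "S" bytes in the hole; with the fixed ordering the same read gets
    nothing until the write has been committed.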

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

20 Feb, 2007

1 commit


19 Feb, 2007

1 commit

  • While caching is generally frowned upon in the 9p world, it has its
    place -- particularly in situations where the remote file system is
    exclusive and/or read-only. The vacfs views of venti content addressable
    store are a real-world instance of such a situation. To facilitate higher
    performance for these workloads (and eventually use the fscache patches),
    we have enabled a "loose" cache mode which does not attempt to maintain
    any form of consistency on the page-cache or dcache. This results in over
    two orders of magnitude performance improvement for cacheable block reads
    in the Bonnie benchmark. The more aggressive use of the dcache also seems
    to improve metadata operational performance.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
     

18 Feb, 2007

1 commit


13 Feb, 2007

1 commit

  • This series of patches adds UFS2 write support. UFS2 is the default file
    system for recent versions of FreeBSD.

    The main differences from UFS1, from the write-support point of view,
    are:
    1) Not all inodes are allocated when the disk is formatted.
    2) All metadata (pointers to data blocks) are 64-bit (in UFS1 they
    are 32-bit).

    So the patch series consists of:
    1) making it possible to mount UFS2 in read-write mode
    2) code to write UFS2 inodes and code to initialize inode chunks
    3) support for 64-bit metadata

    I did simple testing such as creating/deleting/writing/reading/truncating; I
    also ran fsx-linux and untarred and built a kernel on UFS1 and UFS2, after
    which FreeBSD fsck did not find any errors in the fs.

    This patch makes it possible to mount ufs2 "rw" and updates the UFS2
    documentation: it removes the note about the bug (fixed by the "reallocate
    blocks on the fly" patch) and adds me to the list of people who want to
    receive bug reports.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

12 Feb, 2007

1 commit

  • Mathieu originally needed to add this for tracing Xen, but it's something
    that's needed for any application that can be tracing while cpus are added.

    Unplug isn't supported by this patch. The thought was that at minimum a new
    buffer needs to be added when a cpu comes up, but it wasn't worth the effort
    to remove buffers on cpu down since they'd be freed soon anyway when the
    channel was closed.

    [zanussi@us.ibm.com: avoid lock_cpu_hotplug deadlock]
    Signed-off-by: Mathieu Desnoyers
    Cc: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

27 Jan, 2007

1 commit


18 Jan, 2007

1 commit


12 Jan, 2007

1 commit

  • NFS: Fix race in nfs_release_page()

    invalidate_inode_pages2() may find the dirty bit has been set on a page
    owing to the fact that the page may still be mapped after it was locked.
    Only after the call to unmap_mapping_range() are we sure that the page
    can no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however, leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    The fix is to add a new address_space_operation, launder_page(), which will
    attempt to write out a dirty page without releasing the page lock.

    Signed-off-by: Trond Myklebust

    Also, the bare SetPageDirty() can skew all sorts of accounting, leading to
    other nasties.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Peter Zijlstra
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

31 Dec, 2006

1 commit


14 Dec, 2006

1 commit


08 Dec, 2006

5 commits

  • We forgot to document the atime_quantum mount option in ocfs2.txt. This adds
    a proper description of how it works.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • Remove two different changelog files from fs/sysv/ and merge the INTRO
    file into Documentation/filesystems/sysv-fs.txt

    Signed-off-by: Adrian Bunk
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Fixed long-lived typo: remount_fs() needs BKL

    Signed-off-by: Vasily Averin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Add 'blksize' option for block device based filesystems. During
    initialization this is used to set the block size on the device and the super
    block. The default block size is 512 bytes.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • I never intended this, but people started using fuse to implement block device
    based "real" filesystems (ntfs-3g, zfs).

    The following four patches add better support for these kinds of filesystems.
    Unlike "normal" fuse filesystems, using this feature should require superuser
    privileges (enforced by the fusermount utility).

    Thanks to Szabolcs Szakacsits for the input and testing.

    This patch adds a 'fuseblk' filesystem type, which is only different from the
    'fuse' filesystem type in how the 'dev_name' mount argument is interpreted.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

30 Nov, 2006

1 commit