20 Oct, 2007

19 commits

  • Use task_pid() to get leader's 'struct pid' and avoid the find_pid().

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelianov
    Cc: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Herbert Poetzel
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Rename the child_reaper() function to task_child_reaper() to be similar to
    other task_* functions and to distinguish the function from 'struct
    pid_namespace.child_reaper'.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Pavel Emelianov
    Cc: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Herbert Poetzel
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • With multiple pid namespaces, a process is known by some pid_t in every
    ancestor pid namespace. Every time the process forks, the child process also
    gets a pid_t in every ancestor pid namespace.

    While a process is visible in >=1 pid namespaces, it can see pid_t's in only
    one pid namespace. We call this pid namespace its "active pid namespace",
    and it is always the youngest pid namespace in which the process is known.

    This patch defines and uses a wrapper to find the active pid namespace of a
    process. The implementation of the wrapper will be changed when support
    for multiple pid namespaces is added.

    Changelog:
    2.6.22-rc4-mm2-pidns1:
    - [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
      task_active_pid_ns() in child_reaper() since task->nsproxy
      can be NULL during task exit (so child_reaper() continues to
      use init_pid_ns).

    2.6.21-rc6-mm1:
    - Rename task_pid_ns() to task_active_pid_ns() to reflect that a
    process can have multiple pid namespaces.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelianov
    Cc: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Herbert Poetzel
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
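The "active pid namespace" rule from the entry above can be pictured with a small userspace model. This is only a sketch: `struct demo_pid_ns`, `struct demo_upid`, `struct demo_task` and both helpers are hypothetical names invented for illustration, not the kernel's data structures. The model assumes a process stores one pid number per namespace level and that the deepest level is its youngest namespace.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_LEVELS 4

/* Hypothetical model of a pid namespace: only its nesting depth matters here. */
struct demo_pid_ns {
    int level;                      /* 0 = initial namespace */
};

/* One (number, namespace) pair: how the process is known at one level. */
struct demo_upid {
    int nr;
    const struct demo_pid_ns *ns;
};

/* A process is known by one pid number in every ancestor namespace. */
struct demo_task {
    int level;                      /* depth of the youngest namespace */
    struct demo_upid numbers[DEMO_MAX_LEVELS];
};

/* The active namespace is the youngest one in which the process is known. */
static const struct demo_pid_ns *demo_active_pid_ns(const struct demo_task *t)
{
    return t->numbers[t->level].ns;
}

/* Pid number of the process as seen from ns, or 0 if it is not visible there. */
static int demo_pid_nr_in_ns(const struct demo_task *t,
                             const struct demo_pid_ns *ns)
{
    assert(ns->level >= 0 && ns->level < DEMO_MAX_LEVELS);
    if (ns->level <= t->level && t->numbers[ns->level].ns == ns)
        return t->numbers[ns->level].nr;
    return 0;
}
```

In this model a process created inside a child namespace has two numbers: one in the initial namespace and one in the child namespace, and only the latter namespace is "active" for it.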
     
  • The set of functions process_session, task_session, process_group and
    task_pgrp is confusing, as the names are easily mixed up when looking at
    the code for a long time.

    The proposals are to
    * give the functions that return an integer an _nr suffix to
      represent that fact,
    * and make all functions work with a task (not a process) by giving
      them a common prefix.

    For consistency, the routines signal_session() and set_signal_session() are
    replaced with task_session_nr() and set_task_session(), especially since they
    are only used with the explicit task->signal dereference.

    Signed-off-by: Pavel Emelianov
    Acked-by: Serge E. Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Cedric Le Goater
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Remove the filesystem support logic from the cpusets system and make cpusets
    a cgroup subsystem.

    The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
    passed through to the cgroup filesystem with the appropriate options to
    emulate the old cpuset filesystem behaviour.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Add:

    /proc/cgroups - general system info

    /proc/*/cgroup - per-task cgroup membership info

    [a.p.zijlstra@chello.nl: cgroups: bdi init hooks]
    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Implement support for file systems larger than 8 TiB.

    The reiserfs superblock contains a 16 bit value for counting the number of
    bitmap blocks. The rest of the disk format supports file systems up to 2^32
    blocks, but the bitmap block limitation artificially limits this to 8 TiB with
    a 4KiB block size.

    Rather than trust the superblock's 16-bit bitmap block count, we calculate it
    dynamically based on the number of blocks in the file system. When an
    incorrect value is observed in the superblock, it is zeroed out, ensuring that
    older kernels will not be able to mount the file system.

    Userspace support has already been implemented and shipped in reiserfsprogs
    3.6.20.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
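The arithmetic behind the 8 TiB limit in the entry above can be checked with a short sketch. It assumes one bitmap bit per block, so that a single bitmap block tracks `block_size * 8` blocks; that assumption is consistent with the 16-bit and 4 KiB figures quoted in the commit message. Both helper names are hypothetical and are not taken from the reiserfs source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bitmap blocks needed for a file system of
 * block_count blocks, assuming one bit per block (so one bitmap block
 * covers block_size * 8 blocks). This is the value that can be derived
 * dynamically instead of being read from a 16-bit superblock field.
 */
static uint32_t demo_bitmap_count(uint32_t block_count, uint32_t block_size)
{
    assert(block_count > 0 && block_size > 0);
    uint32_t blocks_per_bitmap = block_size * 8;
    return (block_count - 1) / blocks_per_bitmap + 1;
}

/*
 * Hypothetical helper: largest file system size in bytes whose bitmap block
 * count still fits in an unsigned 16-bit field.
 */
static uint64_t demo_max_bytes_16bit(uint32_t block_size)
{
    return (uint64_t)UINT16_MAX * ((uint64_t)block_size * 8) * block_size;
}
```

With 4 KiB blocks, a file system using all 2^32 - 1 block numbers needs 131072 bitmap blocks, which does not fit in 16 bits, while the 16-bit field tops out just below 2^43 bytes (8 TiB).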
     
  • The first_zero_hint metadata caching was never actually used, and it's of
    dubious optimization quality. This patch removes it.

    It doesn't actually shrink the size of the reiserfs_bitmap_info struct, since
    that doesn't work with block sizes larger than 8K. There was a big fixme in
    there, and with all the work lately in allowing block size > page size, I
    might as well kill the fixme as well.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • Do a quick signedness check for block numbers. There are a number of places
    where signed integers are used for block numbers, which limits the usable file
    system size to 8 TiB. The disk format, excepting a problem which will be
    fixed in the following patch, supports file systems up to 16 TiB in size.
    This patch cleans up those sites so that we can enable the full usable size.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
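The 8 TiB and 16 TiB figures in the entry above follow directly from the width of the block number. A minimal sketch, assuming 32-bit block numbers and a 4 KiB block size (the block size is not stated in this commit message and is taken from the neighbouring entry); `demo_max_fs_bytes` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes addressable when block numbers run from 0 to
 * max_block_nr inclusive.
 */
static uint64_t demo_max_fs_bytes(uint64_t max_block_nr, uint32_t block_size)
{
    assert(block_size > 0);
    return (max_block_nr + 1) * block_size;
}
```

Passing `INT32_MAX` as the highest block number gives 2^43 bytes (8 TiB), and passing `UINT32_MAX` gives 2^44 bytes (16 TiB).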
     
  • Correct the memset in reiserfs_resize to clear the memory allocated for the
    new bitmap info structs. Previously, it would clear the memory used by the
    old size. Depending on the contents of memory, this could cause incorrect
    caching behavior for bitmap blocks in the newly allocated area.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
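The class of bug fixed in the entry above can be illustrated in userspace. This is a sketch of the pattern only, not the reiserfs_resize code: `struct demo_bitmap_info` and `demo_grow_bitmap_info` are hypothetical, and `malloc` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-bitmap-block info record. */
struct demo_bitmap_info {
    unsigned int free_count;
};

/*
 * Allocate a larger info array and carry over the old entries.
 * The memset must be sized by new_nr: sizing it by old_nr would leave the
 * entries for the newly added bitmap blocks holding whatever was in memory.
 */
static struct demo_bitmap_info *
demo_grow_bitmap_info(const struct demo_bitmap_info *old,
                      size_t old_nr, size_t new_nr)
{
    struct demo_bitmap_info *bitmap;

    assert(new_nr > 0 && new_nr >= old_nr);
    bitmap = malloc(new_nr * sizeof(*bitmap));
    if (bitmap == NULL)
        return NULL;

    memset(bitmap, 0, new_nr * sizeof(*bitmap));        /* whole new array */
    if (old_nr > 0)
        memcpy(bitmap, old, old_nr * sizeof(*bitmap));  /* keep old entries */
    return bitmap;
}
```

The caller owns the returned array and releases it with `free()`.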
     
  • Build in is_reusable() unconditionally and use it to catch corruption before
    it reaches the block freeing paths.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • Change reiserfs_panic() to use panic() initially instead of BUG(). Using
    BUG() ignores the configurable panic behavior, so systems that should be
    failing and rebooting are left hanging. This causes problems in
    active/standby HA scenarios.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • Add I_MUTEX_XATTR annotations to the inode locking in the reiserfs xattr code.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • Note from Mingming's JBD2 fix:

    Noticed that all the warnings occur when the debug level is 0. Then found
    that the "jbd2: Move jbd2-debug file to debugfs" patch
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b

    changed jbd2_journal_enable_debug from int to u8, which makes the
    jbd_debug comparison always true when the debugging level is 0. Hence
    the compile warning.

    Thought about changing the jbd2_journal_enable_debug data type back to int,
    but can't, because jbd2-debug has moved to debugfs, where calling
    debugfs_create_u8() to create the debugfs entry requires the value to be
    of type u8.

    Even if we changed the data type back to int, the code would still be
    buggy: the kernel should not print jbd2 debug messages if
    jbd2_journal_enable_debug is set to 0, but it does.

    The fix is to change the level of debugging to 1. The same should be fixed
    in ext3/JBD, but currently the ext3 jbd-debug file via /proc is broken, so
    we should probably fix it all together.

    Signed-off-by: Jose R. Santos
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jose R. Santos
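The always-true comparison described in the entry above can be reproduced with a stand-in function. This sketch assumes the debug check has the form "message level <= enabled level"; `demo_debug_enabled` is a hypothetical helper and not the jbd_debug macro itself.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the debug check: a message is printed when its
 * level does not exceed the enabled level. Because enable_level is unsigned,
 * a message at level 0 always passes, even when debugging is switched off
 * (enable_level == 0).
 */
static int demo_debug_enabled(unsigned int msg_level, uint8_t enable_level)
{
    return msg_level <= enable_level;
}
```

Under that assumption, level-0 messages are emitted regardless of the tunable, whereas moving them to level 1 makes them obey it: they stay silent at 0 and appear once the tunable is raised to 1.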
     
  • We should really call journal_abort() and not __journal_abort_hard() in
    case of errors. The latter call does not record the error in the journal
    superblock, so the filesystem won't be marked as having errors later (and
    the user could happily mount it without any warning).

    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • The jbd-debug file used to be located in /proc/sys/fs/jbd-debug, but
    create_proc_entry() does not do lookups on file names that are more than
    one directory deep. This causes the entry creation to fail and hence no
    proc file is created.

    Instead of fixing this in procfs, we might as well move the jbd-debug file
    to debugfs, which would be the preferred location for this kind of tunable.
    The new location is now /sys/kernel/debug/jbd/jbd-debug.

    [akpm@linux-foundation.org: zillions of cleanups]
    Signed-off-by: Jose R. Santos
    Acked-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jose R. Santos
     
  • Convert kmalloc to kzalloc() and get rid of the memset().

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • This patch uses vm_get_page_prot() to set up vma->vm_page_prot.

    Though inside vm_get_page_prot() the protection flags are ANDed with
    (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), it does not hurt correct code.

    Signed-off-by: Coly Li
    Cc: Hugh Dickins
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Coly Li
     
  • Declarations go into headers.

    Signed-off-by: Miklos Szeredi
    Cc: Ram Pai
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

19 Oct, 2007

21 commits

  • Get rid of sparse-related warnings from places that use an integer as a NULL
    pointer.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Stephen Hemminger
    Cc: Andi Kleen
    Cc: Jeff Garzik
    Cc: Matt Mackall
    Cc: Ian Kent
    Cc: Arnd Bergmann
    Cc: Davide Libenzi
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Hemminger
     
  • There are cases when the filesystem will be passed the buffer from a single
    read or write call, namely:

    1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go
    through the page cache, but go directly to the userspace fs

    2) currently buffered writes are done with single page requests, but
    if Nick's ->perform_write() patch goes in, it will be possible to
    do larger write requests. But only if the original write() was
    also bigger than a page.

    In these cases the filesystem might want to give a hint to the app
    about the optimal I/O size.

    Allow the userspace filesystem to supply a blksize value to be returned by
    stat() and friends. If the field is zero, it defaults to the old
    PAGE_CACHE_SIZE value.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
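From the application side, the hint described in the entry above is read through the `st_blksize` field of `struct stat`. A minimal sketch of such a reader follows; `demo_preferred_io_size` is a hypothetical helper, and what value a particular FUSE filesystem reports is outside the scope of the example.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: return the preferred I/O size that stat() reports
 * for path, or -1 if the call fails.
 */
static long demo_preferred_io_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}
```

An application could use the returned value to size its read and write buffers.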
     
  • For mandatory locking the userspace filesystem needs to know the lock
    ownership for read, write and truncate operations.

    This patch adds the necessary fields to the protocol.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch adds a new helper function fuse_write_fill() which makes it
    possible to send WRITE requests asynchronously.

    A new flag for WRITE requests is also added, which indicates that this is a
    write from the page cache, and not a "normal" file write.

    This patch is in preparation for writable mmap support.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Each WRITE request must carry a valid file descriptor. When a page is written
    back from a memory mapping, the file through which the page was dirtied is not
    available, so a new mechanism is needed to find a suitable file in
    ->writepage(s).

    A list of fuse_files is added to fuse_inode. The file is removed from the
    list in fuse_release().

    This patch is in preparation for writable mmap support.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • It is trivial to add support for flock(2) semantics to the existing protocol,
    by setting the lock owner field to the file pointer, and passing a new
    FUSE_LK_FLOCK flag with the locking request.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch allows fuse filesystems to implement open(..., O_TRUNC) as a single
    request, instead of separate truncate and open requests.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add two new flags for setattr: FATTR_ATIME_NOW and FATTR_MTIME_NOW. These
    mean that atime or mtime should be changed to the current time.

    Also it is now possible to update atime or mtime individually, not just
    together.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
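A userspace request of the shape described in the entry above (set one timestamp to "now", leave the other alone) can be expressed with utimensat(2) and its `UTIME_NOW` and `UTIME_OMIT` markers. The sketch below shows only the application side; whether a given kernel and FUSE version translate this call into the new flags is an assumption, and `demo_touch_atime_only` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* utimensat(), UTIME_NOW, UTIME_OMIT */
#include <time.h>       /* struct timespec */

/*
 * Hypothetical helper: set the access time of path to the current time and
 * leave the modification time untouched. Returns 0 on success, -1 on error.
 */
static int demo_touch_atime_only(const char *path)
{
    struct timespec times[2];

    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_NOW;    /* atime: use the current time */
    times[1].tv_sec = 0;
    times[1].tv_nsec = UTIME_OMIT;   /* mtime: do not change */

    return utimensat(AT_FDCWD, path, times, 0);
}
```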
     
  • Add a new attribute flag ATTR_OPEN, with the meaning: "truncation was
    initiated by open() due to the O_TRUNC flag".

    This way filesystems wanting to implement truncation within their ->open()
    method can ignore such truncate requests.

    This is a quick & dirty hack, but it comes for free.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Clean up supplying open file to the setattr operation. In addition to being a
    cleanup it prepares for the changes in the way the open file is passed to the
    setattr method.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add necessary protocol changes for supplying a file handle with the getattr
    operation. Step the API version to 7.9.

    This patch doesn't actually supply the file handle, because that needs some
    kind of VFS support, which we haven't yet been able to agree upon.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Getattr and lookup operations can be running in parallel to attribute changing
    operations, such as write and setattr.

    This means that if, for example, getattr was slower than a write, the cached
    size attribute could be set to a stale value.

    To prevent this race, introduce a per-filesystem attribute version counter.
    This counter is incremented whenever cached attributes are modified, and the
    incremented value stored in the inode.

    Before storing new attributes in the cache, getattr and lookup check, using
    the version number, whether the attributes have been modified during the
    request's lifetime. If so, the returned attributes are not cached, because
    they might be stale.

    Thanks to Jakub Bogusz for the bug report and test program.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Miklos Szeredi
    Cc: Jakub Bogusz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
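The version-counter scheme from the entry above can be modelled in a few lines. This is a single-threaded sketch with hypothetical names (`struct demo_conn`, `struct demo_inode` and the three helpers); the real code would also need locking, which is left out here.

```c
#include <stdint.h>

/* Hypothetical per-filesystem state: one global attribute version counter. */
struct demo_conn {
    uint64_t attr_version;
};

/* Hypothetical cached inode attributes plus the version they were set at. */
struct demo_inode {
    uint64_t attr_version;
    uint64_t size;
};

/* Taken before a getattr/lookup request is sent to userspace. */
static uint64_t demo_request_begin(const struct demo_conn *fc)
{
    return fc->attr_version;
}

/* A write or setattr changes cached attributes and bumps the counter. */
static void demo_local_modify(struct demo_conn *fc, struct demo_inode *inode,
                              uint64_t new_size)
{
    inode->size = new_size;
    inode->attr_version = ++fc->attr_version;
}

/*
 * Called when the getattr/lookup reply arrives. If the inode was modified
 * after the snapshot was taken, the reply may be stale and is not cached.
 * Returns 1 if the reply was cached, 0 if it was discarded.
 */
static int demo_request_finish(struct demo_conn *fc, struct demo_inode *inode,
                               uint64_t snapshot, uint64_t reply_size)
{
    if (inode->attr_version > snapshot)
        return 0;
    inode->size = reply_size;
    inode->attr_version = ++fc->attr_version;
    return 1;
}
```

If a write lands between `demo_request_begin()` and `demo_request_finish()`, the inode's version is newer than the snapshot and the slower getattr reply is dropped instead of overwriting the fresh size.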
     
  • The following operations didn't check if sending the request was allowed:

    setattr
    listxattr
    statfs

    Some other operations don't explicitly do the check, but VFS calls
    ->permission() which checks this.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • setup_new_group_blocks() manipulates the group descriptor block bh under
    the block_bitmap bh's lock. It shouldn't matter since nobody but resize
    should be touching these blocks, but it's worth fixing up.

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • This patch set supports large block size(>4k,
    Signed-off-by: Mingming Cao
    Cc:
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Takashi Sato
     
  • Remove the hardcoded value 256 in fs/cramfs/inode.c and replace it with
    CRAMFS_MAXPATHLEN.

    Tested on an i386 box.
    Signed-off-by: Andi Drebes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Drebes
     
  • Remove a variable that is never read.

    Signed-off-by: Andi Drebes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Drebes
     
  • If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing
    the setuid/setgid bits. For CIFS, skip the mode change and let the server
    handle it.

    Signed-off-by: Jeff Layton
    Cc: Steven French
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing
    the setuid/setgid bits. For NFS, skip the mode change and let the server
    handle it.

    Signed-off-by: Jeff Layton
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • When an unprivileged process attempts to modify a file that has the setuid or
    setgid bits set, the VFS will attempt to clear these bits. The VFS will set
    the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid mask, and then call
    notify_change to clear these bits and set the mode accordingly.

    With a networked filesystem (NFS and CIFS in particular but likely others),
    the client machine or process may not have credentials that allow for setting
    the mode. In some situations, this can lead to file corruption, an operation
    failing outright because the setattr fails, or to races that lead to a mode
    change being reverted.

    In this situation, we'd like to just leave the handling of this to the server
    and ignore these bits. The problem is that by the time the setattr op is
    called, the VFS has already reinterpreted the ATTR_KILL_* bits into a mode
    change. The setattr operation has no way to know its intent.

    The following patch fixes this by making notify_change no longer clear the
    ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before handing it off
    to the setattr inode op. setattr can then check for the presence of these
    bits, and if they're set it can assume that the mode change was only for the
    purposes of clearing these bits.

    This means that we now have an implicit assumption that notify_change is never
    called with ATTR_MODE and either ATTR_KILL_S*ID bit set. Nothing currently
    enforces that, so this patch also adds a BUG() if that occurs.

    Signed-off-by: Jeff Layton
    Cc: Michael Halcrow
    Cc: Christoph Hellwig
    Cc: Neil Brown
    Cc: "J. Bruce Fields"
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Cc: "Vladimir V. Saveliev"
    Cc: Josef 'Jeff' Sipek
    Cc: Trond Myklebust
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
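The check that a networked filesystem's setattr can make after the change in the entry above can be sketched as follows. The `DEMO_ATTR_*` constants are local stand-ins whose bit values are chosen for illustration, and `demo_netfs_filter_valid` is a hypothetical helper, not code from NFS or CIFS.

```c
/* Local stand-ins for the ia_valid bits; the values are illustrative only. */
#define DEMO_ATTR_MODE      (1u << 0)
#define DEMO_ATTR_KILL_SUID (1u << 11)
#define DEMO_ATTR_KILL_SGID (1u << 12)

/*
 * Hypothetical helper for a networked filesystem: if either kill bit is
 * still present in ia_valid, the mode change exists only to clear the
 * setuid/setgid bits, so drop it and leave that job to the server.
 */
static unsigned int demo_netfs_filter_valid(unsigned int ia_valid)
{
    if (ia_valid & (DEMO_ATTR_KILL_SUID | DEMO_ATTR_KILL_SGID))
        ia_valid &= ~DEMO_ATTR_MODE;
    return ia_valid;
}
```

An explicit chmod arrives with the mode bit alone and passes through unchanged, while a mode change that accompanies a kill bit is filtered out before the request goes to the server.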
     
  • reiserfs_setattr can call notify_change recursively using the same
    iattr struct. This could cause it to trip the BUG() in notify_change.
    Fix reiserfs to clear those bits near the beginning of the function.

    Signed-off-by: Jeff Layton
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Cc: "Vladimir V. Saveliev"
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton