20 Jul, 2007

40 commits

  • Bruce and David's patches clashed.

    fs/afs/flock.c: In function 'afs_do_getlk':
    fs/afs/flock.c:459: error: void value not ignored as it ought to be

    Cc: "J. Bruce Fields"
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Share a little common code, reverse the arguments for consistency, drop the
    unnecessary "inline", and lowercase the name.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • EX_RDONLY is only called in one place; just put it there.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • We can now assume that rqst_exp_get_by_name() does not return NULL; so clean
    up some unnecessary checks.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • I converted the various export-returning functions to return -ENOENT instead
    of NULL, but missed a few cases.

    This particular case could cause actual bugs in the case of a krb5 client that
    doesn't match any ip-based client and that is trying to access a filesystem
    not exported to krb5 clients.

    Signed-off-by: "J. Bruce Fields"
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • The value of nperbucket calculated here is too small--we should be rounding up
    instead of down--with the result that the index j in the following loop can
    overflow the raparm_hash array. At least in my case, the next thing in memory
    turns out to be export_table, so the symptoms I see are crashes caused by the
    appearance of four zeroed-out export entries in the first bucket of the hash
    table of exports (which were actually entries in the readahead cache, a
    pointer to which had been written to the export table in this initialization
    code).

    It looks like the bug was probably introduced with commit
    fce1456a19f5c08b688c29f00ef90fdfa074c79b ("knfsd: make the readahead params
    cache SMP-friendly").

    Cc:
    Cc: Greg Banks
    Signed-off-by: "J. Bruce Fields"
    Acked-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
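
The rounding problem described in the nperbucket fix above can be illustrated with a small userspace sketch. The bucket count, the entry count, and both function names are hypothetical and are not taken from the nfsd code; the loop only mimics the shape "start a new bucket every nperbucket entries".

```c
#include <assert.h>

#define HASH_SIZE 16 /* hypothetical number of hash buckets */

/* Count how many bucket heads would be assigned by a loop of the form
 * "if (i % nperbucket == 0) hash[j++] = &entry[i]". */
unsigned int bucket_heads_used(unsigned int cache_size,
                               unsigned int nperbucket)
{
    unsigned int i, j = 0;

    assert(nperbucket > 0);
    for (i = 0; i < cache_size; i++)
        if (i % nperbucket == 0)
            j++;
    return j;
}

/* Integer division that rounds up instead of down. */
unsigned int div_round_up(unsigned int n, unsigned int d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}
```

With 70 entries (an arbitrary illustrative value) and 16 buckets, rounding down gives 4 entries per bucket and the loop asks for 18 bucket heads, two more than the array holds. Rounding up gives 5 per bucket and only 14 heads.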
     
  • Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

    Here is a short excerpt of the semantic patch performing
    this transformation:

    @@
    type T2;
    expression x;
    identifier f,fld;
    expression E;
    expression E1,E2;
    expression e1,e2,e3,y;
    statement S;
    @@

    x =
    - kmalloc
    + kzalloc
    (E1,E2)
    ... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
    - memset((T2)x,0,E1);

    @@
    expression E1,E2,E3;
    @@

    - kzalloc(E1 * E2,E3)
    + kcalloc(E1,E2,E3)

    [akpm@linux-foundation.org: get kcalloc args the right way around]
    Signed-off-by: Yoann Padioleau
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Acked-by: Russell King
    Cc: Bryan Wu
    Acked-by: Jiri Slaby
    Cc: Dave Airlie
    Acked-by: Roland Dreier
    Cc: Jiri Kosina
    Acked-by: Dmitry Torokhov
    Cc: Benjamin Herrenschmidt
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Pierre Ossman
    Cc: Jeff Garzik
    Cc: "David S. Miller"
    Acked-by: Greg KH
    Cc: James Bottomley
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yoann Padioleau
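
As a rough userspace analogue of the transformation above, the sketch below uses malloc/memset and calloc in place of the kernel's kmalloc/memset and kcalloc. The function names are hypothetical. Like kcalloc, calloc takes the element count first and the element size second, which is the argument order the akpm fixup refers to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear by hand (the kmalloc + memset pattern). */
int *alloc_counters_old(size_t n)
{
    int *p;

    assert(n > 0);
    p = malloc(n * sizeof(int));
    if (p)
        memset(p, 0, n * sizeof(int));
    return p;
}

/* After: a single call that returns zeroed memory (the kcalloc pattern).
 * Argument order: number of elements, then size of each element. */
int *alloc_counters_new(size_t n)
{
    assert(n > 0);
    return calloc(n, sizeof(int));
}
```

Both functions return either NULL or a zero-filled array of n ints; the second form is shorter and leaves the size multiplication to the allocator.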
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Similar information can easily be obtained with strace -c.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • The sb_info structure only contains a single pointer to the character device,
    there is no need for the added indirection.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Venus returns an ENOENT error on open, so we shouldn't try to grab the
    filehandle for the returned fd.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • We ignore signals for about 30 seconds to give userspace a chance to see the
    upcall. Because we did not block signals, we ended up in a busy loop for the
    remainder of that period once a signal was received.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Making the code that processes upcall responses more straightforward
    uncovered at least one bad assumption. We trusted that vc_inuse would be 0
    when upcalls are aborted; however, the device may have been reopened.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • - Make sure device index is not a negative number.
    - Unlink queued requests when the device is closed to avoid passing them
    to the next opener.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Set MS_NOATIME flag to avoid unnecessary calls when the coda inode is
    accessed.

    Also, set statfs.f_bsize to 4k. 1k is obviously too small for the suggested
    IO size.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • A directory without children may still be busy when it is the cwd for some
    process. We can safely remove such a directory because the VFS prevents
    further operations. Also we don't need to call d_delete as it is already
    called in vfs_rmdir.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • The Coda client sets the directory link count to 1 when it isn't sure how many
    subdirectories we have. In this case we shouldn't change the link count in
    the kernel when a subdirectory is created or removed.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Change the epoch value to force a refresh instead of clearing the cached
    rights mask and blocking all further accesses to the object.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • When open fails, the fd in the response is uninitialized, and we ended up
    taking a reference on the file struct and never releasing it.

    Signed-off-by: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Harkes
     
  • Looking at the current linus-git tree jbd_debug() define in
    include/linux/jbd2.h

    extern u8 journal_enable_debug;

    #define jbd_debug(n, f, a...) \
    do { \
    if ((n) <= journal_enable_debug) { \
    ... \
    } \
    } while (0)

    > fs/ext4/inode.c: In function ‘ext4_write_inode’:
    > fs/ext4/inode.c:2906: warning: comparison is always true due to limited
    > range of data type
    >
    > fs/jbd2/recovery.c: In function ‘jbd2_journal_recover’:
    > fs/jbd2/recovery.c:254: warning: comparison is always true due to
    > limited range of data type
    > fs/jbd2/recovery.c:257: warning: comparison is always true due to
    > limited range of data type
    >
    > fs/jbd2/recovery.c: In function ‘jbd2_journal_skip_recovery’:
    > fs/jbd2/recovery.c:301: warning: comparison is always true due to
    > limited range of data type
    >
    Noticed that all the warnings occur when the debug level is 0. Then found
    the "jbd2: Move jbd2-debug file to debugfs" patch
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b

    changed jbd2_journal_enable_debug from int type to u8, which makes the
    jbd_debug comparison always true when the debugging level is 0. Thus
    the compile warning occurs.

    Thought about changing the jbd2_journal_enable_debug data type back to
    int, but can't, because the jbd2-debug is moved to debug fs, where
    calling debugfs_create_u8() to create the debugfs entry needs the value
    to be u8 type.

    Even if we changed the data type back to int, the code is still buggy,
    kernel should not print jbd2 debug message if the
    jbd2_journal_enable_debug is set to 0. But this is not the case.

    The fix is to change the level of debugging to 1. The same should be fixed
    in ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we
    probably should fix it all together.

    Signed-off-by: Mingming Cao
    Cc: Jeff Garzik
    Cc: Theodore Tso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
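
The "always true" comparison discussed in the jbd_debug fix above can be reproduced in a few lines. This is a sketch with a hypothetical helper name; it assumes the macro tests the message level against the u8 debug level with `<=`, which is what the warning text implies.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when a message at level n would be printed, given an
 * unsigned 8-bit debug level like the one exported through debugfs. */
int debug_enabled(unsigned int n, uint8_t level)
{
    /* For n == 0 this can never be false: an unsigned 8-bit value is
     * always >= 0, hence "comparison is always true". */
    return n <= level;
}
```

A level-0 message is therefore printed even when debugging is set to 0, while a level-1 message is suppressed until the level is raised, which is why moving the messages to level 1 both silences the warning and restores the ability to turn them off.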
     
  • This patch enables core dump filtering for ELF-FDPIC-formatted core file.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch removes an unused argument from elf_fdpic_dump_segments().

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch enables core dump filtering for ELF-formatted core file.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch adds an interface to set/reset flags which determine whether
    each memory segment should be dumped when a core file is generated.

    The /proc/<pid>/coredump_filter file is provided to access the flags. You
    can read or change the flag status for a particular process by reading from
    or writing to the file.

    The flag status is inherited by the child process when it is created.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch changes mm_struct.dumpable to a pair of bit flags.

    set_dumpable() converts the three-value dumpable to two flags and stores
    them in the lower two bits of mm_struct.flags instead of
    mm_struct.dumpable. get_dumpable() performs the reverse conversion.

    [akpm@linux-foundation.org: export set_dumpable]
    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
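
The conversion between the three-value dumpable setting and two bit flags, as described in the commit above, could look roughly like the userspace sketch below. The bit names, the function names, and the exact encoding of the value 2 are assumptions made for illustration; the commit text only says that two low bits of the flags word are used.

```c
#include <assert.h>

#define DUMPABLE_BIT      0x1UL /* hypothetical name: core dumps allowed */
#define DUMP_SECURELY_BIT 0x2UL /* hypothetical name: restricted dump */
#define DUMPABLE_MASK     (DUMPABLE_BIT | DUMP_SECURELY_BIT)

/* Store a dumpable value (0, 1 or 2) in the low two bits of flags,
 * leaving all other bits untouched. */
unsigned long set_dumpable_bits(unsigned long flags, int value)
{
    assert(value >= 0 && value <= 2);
    flags &= ~DUMPABLE_MASK;
    if (value >= 1)
        flags |= DUMPABLE_BIT;
    if (value == 2)
        flags |= DUMP_SECURELY_BIT;
    return flags;
}

/* Recover the three-value setting from the low two bits. */
int get_dumpable_value(unsigned long flags)
{
    unsigned long bits = flags & DUMPABLE_MASK;

    return bits >= 2 ? 2 : (int)bits;
}
```

Packing the setting into a shared flags word leaves the remaining bits free for other per-mm flags, such as the core dump filter bits added later in this series.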
     
  • Signed-off-by: Josef 'Jeff' Sipek
    Cc: Al Viro
    Acked-by: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Neil Brown
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • Signed-off-by: Josef 'Jeff' Sipek
    Cc: Al Viro
    Acked-by: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Neil Brown
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • use vfs_path_lookup instead of open-coding the necessary functionality.

    Signed-off-by: Josef 'Jeff' Sipek
    Acked-by: NeilBrown
    Cc: Al Viro
    Acked-by: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • Stackable file systems, among others, frequently need to lookup paths or
    path components starting from an arbitrary point in the namespace
    (identified by a dentry and a vfsmount). Currently, such file systems use
    lookup_one_len, which is frowned upon [1] as it does not pass the lookup
    intent along; not passing a lookup intent, for example, can trigger BUG_ON's
    when stacking on top of NFSv4.

    The first patch introduces a new lookup function to allow lookup starting
    from an arbitrary point in the namespace. This approach has been suggested
    by Christoph Hellwig [2].

    The second patch changes sunrpc to use vfs_path_lookup.

    The third patch changes nfsctl.c to use vfs_path_lookup.

    The fourth patch marks link_path_walk static.

    The fifth and last patch unexports path_walk because it is no longer
    necessary to call it directly, and using the new vfs_path_lookup is
    cleaner.

    For example, the following snippet of code looks up "some/path/component"
    in a directory pointed to by parent_{dentry,vfsmnt}:

    err = vfs_path_lookup(parent_dentry, parent_vfsmnt,
                          "some/path/component", 0, &nd);
    if (!err) {
        /* exists */

        ...

        /* once done, release the references */
        path_release(&nd);
    } else if (err == -ENOENT) {
        /* doesn't exist */
    } else {
        /* other error */
    }

    VFS functions such as lookup_create can be used on the nameidata structure
    to pass the create intent to the file system.

    Signed-off-by: Josef 'Jeff' Sipek
    Cc: Al Viro
    Acked-by: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Neil Brown
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
    the old mm into the new mm.

    We create the new mm before the binfmt code runs, and place the new stack at
    the very top of the address space. Once the binfmt code runs and figures out
    where the stack should be, we move it downwards.

    It is a bit peculiar in that we have one task with two mm's, one of which is
    inactive.

    [a.p.zijlstra@chello.nl: limit stack size]
    Signed-off-by: Ollie Wild
    Signed-off-by: Peter Zijlstra
    Cc:
    Cc: Hugh Dickins
    [bunk@stusta.de: unexport bprm_mm_init]
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ollie Wild
     
  • The purpose of audit_bprm() is to log the argv array to a userspace daemon at
    the end of the execve system call. Since user-space hasn't had time to run,
    this array is still in pristine state on the process' stack; so no need to
    copy it, we can just grab it from there.

    In order to minimize the damage to audit_log_*(), copy each string into a
    temporary kernel buffer first.

    Currently the audit code requires that the full argument vector fits in a
    single packet. So currently it does clip the argv size to a (sysctl) limit,
    but only when execve auditing is enabled.

    If the audit protocol gets extended to allow for multiple packets this check
    can be removed.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ollie Wild
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Split ondemand readahead interface into two functions. I think this makes it
    a little clearer for non-readahead experts (like Rusty).

    Internally they both call ondemand_readahead(), but the page argument is
    changed to an obvious boolean flag.

    Signed-off-by: Rusty Russell
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Pass real splice size to page_cache_readahead_ondemand().

    The splice code works in chunks of 16 pages internally. The readahead code
    should be told of the overall splice size, instead of the internal chunk size.
    Otherwise bad things may happen. Imagine some 17-page random splice reads.
    The code before this patch will result in two readahead calls: readahead(16);
    readahead(1); That leads to one 16-page I/O and one 32-page I/O: one extra I/O
    and 31 readahead miss pages.

    Signed-off-by: Fengguang Wu
    Cc: Jens Axboe
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
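
The 17-page example in the commit above comes down to simple arithmetic, sketched here with a hypothetical helper. The I/O sizes are taken directly from the commit message (16 and 32 pages); the helper does not model the readahead algorithm itself.

```c
#include <assert.h>
#include <stddef.h>

/* Pages fetched by readahead but not needed by the caller:
 * the sum of all issued I/O sizes minus the pages actually wanted. */
unsigned int readahead_miss_pages(const unsigned int *io_sizes,
                                  size_t n_ios, unsigned int wanted)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < n_ios; i++)
        total += io_sizes[i];
    assert(total >= wanted);
    return total - wanted;
}
```

For the chunked case, I/Os of 16 and 32 pages against a 17-page read give 16 + 32 - 17 = 31 wasted pages, matching the figure in the message. If a single 17-page request were issued instead, this simple count would be zero.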
     
  • Move synchronous page_cache_readahead_ondemand() call out of splice loop.

    This avoids one pointless page allocation/insertion in case of non-zero
    ra_pages, or many pointless readahead calls in case of zero ra_pages.

    Note that if a user sets ra_pages to less than PIPE_BUFFERS=16 pages, he will
    not get expected readahead behavior anyway. The splice code works in batches
    of 16 pages, which can be taken as another form of synchronous readahead.

    Signed-off-by: Fengguang Wu
    Cc: Jens Axboe
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • Convert ext3/ext4 dir reads to use on-demand readahead.

    Readahead for dirs operates _not_ on file level, but on blockdev level. This
    makes a difference when the data blocks are not contiguous. The read
    routine is also somewhat opaque: there's no handy info about the status of
    the current page. So a simplified call scheme is employed: call into
    readahead whenever the current page falls outside the readahead window.

    Signed-off-by: Fengguang Wu
    Cc: Steven Pratt
    Cc: Ram Pai
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu