22 May, 2010

1 commit

  • Don't put struct file on the stack as it takes up quite a lot of space
    and violates lifetime rules for struct file.

    Rather than calling afs_readpage() indirectly from the directory routines by
    way of read_mapping_page(), split afs_readpage() so that there is an
    afs_page_filler() that's given a key instead of a file, and call
    read_cache_page(), specifying the new function directly. Use it in
    afs_readpages() as well.

    Also make use of this in afs_mntpt_check_symlink() too for the same reason.
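
    As a rough sketch of the shape of this change (simplified; the key is
    assumed to live in file->private_data, and this is not the exact patch):

        /* The filler takes a key, not a file, as its opaque data cookie. */
        static int afs_page_filler(void *data, struct page *page)
        {
                struct key *key = data;

                /* ... fetch the page contents from the server using key ... */
                return 0;
        }

        static int afs_readpage(struct file *file, struct page *page)
        {
                struct key *key = file->private_data;

                return afs_page_filler(key, page);
        }

        /* Directory code can now read a page without fabricating a file: */
        page = read_cache_page(dir->i_mapping, index, afs_page_filler, key);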

    Reported-by: Al Viro
    Signed-off-by: Al Viro
    Signed-off-by: David Howells

    Al Viro
     

30 Mar, 2010

1 commit

  • Update gfp.h and slab.h includes to prepare for breaking implicit
    slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming their availability. As this
    conversion needs to touch a large number of source files, the following
    script was used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    just gfp.h; if slab is used, slab.h (see the sketch after this list).

    * When the script inserts a new include, it looks at the include
    blocks and tries to place the new include so that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, reverse Christmas tree - or at the end
    if there doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
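
    For illustration, this is the kind of edit the script produces (a
    hypothetical driver file, not taken from the actual sweep):

        /* Before: kmalloc() and GFP_KERNEL arrived implicitly via percpu.h. */
        #include <linux/percpu.h>

        /* After: the facilities actually used are included explicitly. */
        #include <linux/gfp.h>          /* GFP_KERNEL, alloc_pages() */
        #include <linux/slab.h>         /* kmalloc(), kfree() */
        #include <linux/percpu.h>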

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others it was more appropriate
    to add it to an implementation .h or the embedding .c file instead.
    This step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them, as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored, as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on the arch to make things
    build (like ipr on powerpc/64, which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as a bisection point.

    Given that I had only a couple of failures from the tests in step 7,
    I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

20 Nov, 2009

1 commit

  • Handle netfs pages that the vmscan algorithm wants to evict from the
    pagecache under OOM conditions, but that are waiting to be written to the
    cache. Under these conditions, vmscan calls the releasepage() function of
    the netfs, asking whether a page can be discarded.

    The problem is typified by the following trace of a stuck process:

    kslowd005 D 0000000000000000 0 4253 2 0x00000080
    ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
    0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
    000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
    Call Trace:
    [] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
    [] ? autoremove_wake_function+0x0/0x34
    [] ? __fscache_check_page_write+0x63/0x70 [fscache]
    [] nfs_fscache_release_page+0x4e/0xc4 [nfs]
    [] nfs_release_page+0x3c/0x41 [nfs]
    [] try_to_release_page+0x32/0x3b
    [] shrink_page_list+0x316/0x4ac
    [] shrink_inactive_list+0x392/0x67c
    [] ? __mutex_unlock_slowpath+0x100/0x10b
    [] ? trace_hardirqs_on_caller+0x10c/0x130
    [] ? mutex_unlock+0x9/0xb
    [] shrink_list+0x8d/0x8f
    [] shrink_zone+0x278/0x33c
    [] ? ktime_get_ts+0xad/0xba
    [] try_to_free_pages+0x22e/0x392
    [] ? isolate_pages_global+0x0/0x212
    [] __alloc_pages_nodemask+0x3dc/0x5cf
    [] grab_cache_page_write_begin+0x65/0xaa
    [] ext3_write_begin+0x78/0x1eb
    [] generic_file_buffered_write+0x109/0x28c
    [] ? current_fs_time+0x22/0x29
    [] __generic_file_aio_write+0x350/0x385
    [] ? generic_file_aio_write+0x4a/0xae
    [] generic_file_aio_write+0x60/0xae
    [] do_sync_write+0xe3/0x120
    [] ? autoremove_wake_function+0x0/0x34
    [] ? __dentry_open+0x1a5/0x2b8
    [] ? dentry_open+0x82/0x89
    [] cachefiles_write_page+0x298/0x335 [cachefiles]
    [] fscache_write_op+0x178/0x2c2 [fscache]
    [] fscache_op_execute+0x7a/0xd1 [fscache]
    [] slow_work_execute+0x18f/0x2d1
    [] slow_work_thread+0x1c5/0x308
    [] ? autoremove_wake_function+0x0/0x34
    [] ? slow_work_thread+0x0/0x308
    [] kthread+0x7a/0x82
    [] child_rip+0xa/0x20
    [] ? restore_args+0x0/0x30
    [] ? tg_shares_up+0x171/0x227
    [] ? kthread+0x0/0x82
    [] ? child_rip+0x0/0x20

    In the above backtrace, the following is happening:

    (1) A page storage operation is being executed by a slow-work thread
    (fscache_write_op()).

    (2) FS-Cache farms the operation out to the cache to perform
    (cachefiles_write_page()).

    (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
    standard write (do_sync_write()) under KERNEL_DS directly from the netfs
    page.

    (4) However, for Ext3 to perform the write, it must allocate some memory, in
    particular, it must allocate at least one page cache page into which it
    can copy the data from the netfs page.

    (5) Under OOM conditions, the memory allocator can't immediately come up with
    a page, so it uses vmscan to find something to discard
    (try_to_free_pages()).

    (6) vmscan finds a clean netfs page it might be able to discard (possibly the
    one it's trying to write out).

    (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
    called with __GFP_WAIT, so the netfs decides to wait for the store to
    complete (__fscache_wait_on_page_write()).

    (8) This blocks a slow-work processing thread - possibly against itself.

    The system ends up stuck because it can't write out any netfs pages to the
    cache without allocating more memory.

    To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
    actually being performed. This means that some data won't make it into the
    cache this time. To support this, a new FS-Cache function,
    fscache_maybe_release_page(), is added that replaces what the netfs
    releasepage() functions used to do with respect to the cache.
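
    A hedged sketch of how a netfs releasepage() might now defer to this
    function (simplified from the NFS case; the usual writeback handling is
    elided, and the cookie field name is an assumption):

        /* Let FS-Cache decide whether the page may be released. It refuses
         * (returns false) only if the page is actively being written to the
         * cache and the allocation context can't wait for it.
         */
        static int nfs_release_page(struct page *page, gfp_t gfp)
        {
                struct nfs_inode *nfsi = NFS_I(page->mapping->host);

                if (!fscache_maybe_release_page(nfsi->fscache, page, gfp))
                        return 0;       /* can't release it yet */

                /* ... normal release path ... */
                return 1;
        }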

    The decisions fscache_maybe_release_page() makes are counted and displayed
    through /proc/fs/fscache/stats on a line labelled "VmScan". There are four
    counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
    pages that were pending storage when we first looked, but weren't by the time
    we got the object lock; "bsy=N" - pages that we ignored as they were actively
    being written when we looked; and "can=N" - pages that we cancelled the storage
    of.

    What I'd really like to do is alter the behaviour of the cancellation
    heuristics, depending on how necessary it is to expel pages. If there are
    plenty of other pages that aren't waiting to be written to the cache that
    could be ejected first, then it would be nice to hold up on immediate
    cancellation of cache writes - but I don't see a way of doing that.

    Signed-off-by: David Howells

    David Howells
     

28 Aug, 2009

1 commit

  • kAFS crashes when asked to read a symbolic link because page_getlink()
    passes a NULL file pointer to read_mapping_page(), but afs_readpage()
    expects a file pointer from which to extract a key.

    Modify afs_readpage() to request the appropriate key from the calling
    process's keyrings if a file struct with a key attached is not supplied.
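
    A minimal sketch of the fix (the exact accessors are assumptions from
    context, not the verbatim patch):

        static int afs_readpage(struct file *file, struct page *page)
        {
                struct inode *inode = page->mapping->host;
                struct key *key;

                if (file) {
                        key = file->private_data;  /* key attached at open() */
                } else {
                        /* No file: ask the caller's keyrings instead. */
                        key = afs_request_key(AFS_FS_S(inode->i_sb)->volume->cell);
                        if (IS_ERR(key))
                                return PTR_ERR(key);
                }

                /* ... perform the read using key ... */

                if (!file)
                        key_put(key);
                return 0;
        }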

    Signed-off-by: David Howells
    Acked-by: Anton Blanchard
    Signed-off-by: Linus Torvalds

    David Howells
     

18 Apr, 2009

1 commit

  • If CONFIG_AFS_FSCACHE is not defined, the following warning is displayed when
    fs/afs/file.c is compiled:

    fs/afs/file.c:111: warning: ‘afs_file_readpage_read_complete’ defined but not used

    This occurs because all calls to this function are guarded by
    CONFIG_AFS_FSCACHE. Thus, guard its definition as well.
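
    The fix amounts to wrapping the definition in the same guard (a sketch;
    the body shown is what such a read-completion callback typically does):

        #ifdef CONFIG_AFS_FSCACHE
        /* Only referenced from CONFIG_AFS_FSCACHE code, so guard it too. */
        static void afs_file_readpage_read_complete(struct page *page,
                                                    void *data,
                                                    int error)
        {
                if (!error)
                        SetPageUptodate(page);
                unlock_page(page);
        }
        #endif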

    Signed-off-by: Matt Kraai
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Matt Kraai
     

03 Apr, 2009

1 commit

  • The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
    through it any attached caches. The kAFS filesystem will use caching
    automatically if it's available.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     

17 Oct, 2008

1 commit

  • Cannot assume writes will fully complete, so this conversion goes the easy
    way and always brings the page uptodate before the write.

    [dhowells@redhat.com: style tweaks]
    Signed-off-by: Nick Piggin
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Jul, 2007

1 commit


10 Jul, 2007

1 commit


11 May, 2007

1 commit

  • Fix a couple of problems with unlinking AFS files.

    (1) The parent directory wasn't being updated properly between unlink() and
    the following lookup().

    It seems that, for some reason, invalidate_remote_inode() wasn't
    discarding the directory contents correctly, so this patch calls
    invalidate_inode_pages2() instead on non-regular files (a sketch
    follows this list).

    (2) afs_vnode_deleted_remotely() should handle vnodes that don't have a
    source server recorded without oopsing.
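
    A sketch of the invalidation choice described in (1) (simplified; the
    surrounding context is assumed):

        /* Use the stronger invalidation for directories and symlinks. */
        if (S_ISREG(inode->i_mode))
                invalidate_remote_inode(inode);
        else
                invalidate_inode_pages2(inode->i_mapping);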

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

10 May, 2007

2 commits

  • Implement support for writing to regular AFS files, including:

    (1) write

    (2) truncate

    (3) fsync, fdatasync

    (4) chmod, chown, chgrp, utime.

    AFS writeback attempts to batch writes into chunks as large as it can
    manage, up to the point that it writes back 65535 pages in one chunk or it
    meets a locked page.

    Furthermore, if a page has been written to using a particular key, then should
    another write to that page use some other key, the first write will be flushed
    before the second is allowed to take place. If the first write fails due to a
    security error, then the page will be scrapped and reread before the second
    write takes place.

    If a page is dirty and the callback on it is broken by the server, then the
    dirty data is not discarded (same behaviour as NFS).

    Shared-writable mappings are not supported by this patch.

    [akpm@linux-foundation.org: fix a bunch of warnings]
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Make some miscellaneous changes to the AFS filesystem:

    (1) Assert RCU barriers on module exit to make sure RCU has finished with
    callbacks in this module.

    (2) Correctly handle the AFS server returning a zero-length read.

    (3) Split out data zapping calls into one function (afs_zap_data).

    (4) Rename some afs_file_*() functions to afs_*() where they apply to
    non-regular files too.

    (5) Be consistent about the presentation of volume ID:vnode ID in debugging
    output.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

27 Apr, 2007

4 commits

  • Add support for the create, link, symlink, unlink, mkdir, rmdir and
    rename VFS operations to the in-kernel AFS filesystem.

    Also:

    (1) Fix dentry and inode revalidation. d_revalidate should only look at the
    state of the dentry. Revalidation of the contents of an inode pointed to
    by a dentry is now separate.

    (2) Fix afs_lookup() to hash negative dentries as well as positive ones.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Add security support to the AFS filesystem. Kerberos IV tickets are added
    as RxRPC keys to the session keyring by the klog program. open() and other
    VFS operations then find this ticket with request_key() and either use it
    immediately (e.g. mkdir, unlink) or attach it to a file descriptor (open).
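
    For instance, a VFS operation might look the ticket up roughly like this
    (the cell-based description string is an assumption about the key naming):

        /* Search the calling process's keyrings for the rxrpc ticket. */
        struct key *key = request_key(&key_type_rxrpc, "afs@EXAMPLE.COM", NULL);
        if (IS_ERR(key))
                return PTR_ERR(key);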

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • Clean up the AFS sources.

    Also remove references to AFS keys. RxRPC keys are used instead.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

13 Feb, 2007

1 commit

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.
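
    The change itself is mechanical; using an AFS-flavoured example (the
    field entries here are illustrative, not the exact table):

        /* Before: writable, so it lives in .data. */
        static struct inode_operations afs_file_inode_operations = { ... };

        /* After: const moves it to .rodata, and any accidental write to it
         * becomes a compile-time error.
         */
        static const struct inode_operations afs_file_inode_operations = {
                .getattr        = afs_getattr,
                .permission     = afs_permission,
        };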

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

01 Oct, 2006

2 commits


29 Jun, 2006

1 commit


27 Mar, 2006

1 commit

  • The return value of the invalidatepage() address_space operation is never
    used, so let's be honest and declare it as void.

    In some places where invalidatepage returned 0, I have inserted comments
    suggesting a BUG_ON.
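
    The signature change in struct address_space_operations, as a sketch:

        /* Before: */
        int  (*invalidatepage)(struct page *, unsigned long);

        /* After: the result was never checked, so be honest about it. */
        void (*invalidatepage)(struct page *, unsigned long);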

    [akpm@osdl.org: JBD BUG fix]
    [akpm@osdl.org: rework for git-nfs]
    [akpm@osdl.org: don't go BUG in block_invalidate_page()]
    Signed-off-by: Neil Brown
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

07 Nov, 2005

2 commits

  • This is the fs/ part of the big kfree cleanup patch.

    Remove pointless checks for NULL prior to calling kfree() in fs/.
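
    The pattern being removed, in sketch form:

        /* Before: a redundant NULL check. */
        if (ptr)
                kfree(ptr);

        /* After: kfree(NULL) is defined to be a no-op, so just call it. */
        kfree(ptr);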

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • AFS actually had a write method that returned different errors depending on
    whether some flag was set; better to return the standard EINVAL errno.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

30 Oct, 2005

1 commit

  • Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
    a many-threaded application which concurrently initializes different parts of
    a large anonymous area.

    This patch corrects that, by using a separate spinlock per page table page, to
    guard the page table entries in that page, instead of using the mm's single
    page_table_lock. (But even then, page_table_lock is still used to guard page
    table allocation, and anon_vma allocation.)

    In this implementation, the spinlock is tucked inside the struct page of the
    page table page: with a BUILD_BUG_ON in case it overflows - which it would in
    the case of 32-bit PA-RISC with spinlock debugging enabled.

    Splitting the lock is not quite for free: another cacheline access. Ideally,
    I suppose we would use split ptlock only for multi-threaded processes on
    multi-cpu machines; but deciding that dynamically would have its own costs.
    So for now enable it by config, at some number of cpus - since the Kconfig
    language doesn't support inequalities, let preprocessor compare that with
    NR_CPUS. But I don't think it's worth being user-configurable: for good
    testing of both split and unsplit configs, split now at 4 cpus, and perhaps
    change that to 8 later.
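
    A sketch of the mechanism (simplified from the real header; details may
    differ):

        /* Kconfig can't express ">=", so the preprocessor does the compare. */
        #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
        #define pte_lockptr(mm, pmd)    (&pmd_page(*(pmd))->ptl)
        #else
        #define pte_lockptr(mm, pmd)    (&(mm)->page_table_lock)
        #endif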

    There is a benefit even for singly threaded processes: kswapd can be attacking
    one part of the mm while another part is busy faulting.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Oct, 2005

1 commit

  • - ->releasepage() annotated (s/int/gfp_t), instances updated
    - missing gfp_t in fs/* added
    - fixed misannotation from the original sweep caught by bitwise checks:
    XFS used __nocast both for gfp_t and for flags used by XFS allocator.
    The latter left with unsigned int __nocast; we might want to add a
    different type for those but for now let's leave them alone. That,
    BTW, is a case when __nocast use had been actively confusing - it had
    been used in the same code for two different and similar types, with
    no way to catch misuses. Switch of gfp_t to bitwise had caught that
    immediately...

    One tricky bit is left alone to be dealt with later - mapping->flags is
    a mix of gfp_t and error indications. Left alone for now.
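
    The annotated hook in struct address_space_operations, as a sketch:

        /* Before: the gfp mask was passed as a bare int. */
        int (*releasepage)(struct page *, int);

        /* After: gfp_t lets sparse catch misuse of the mask. */
        int (*releasepage)(struct page *, gfp_t);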

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

01 May, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds