20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

17 Jul, 2007

1 commit


10 Jul, 2007

1 commit


17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 May, 2007

1 commit


08 May, 2007

2 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that
    manipulates the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Ensure pages are uptodate after returning from read_cache_page, which allows
    us to cut out most of the filesystem-internal PageUptodate calls.

    I didn't have a great look down the call chains, but this appears to fix 7
    possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
    ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
    block2mtd. All depending on whether the filler is async and/or can return
    with a !uptodate page.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

18 Apr, 2007

1 commit

  • This patch should fix or partly fix this bug:
    http://bugzilla.kernel.org/show_bug.cgi?id=8276

    The problem is:

    - if we see the "zero link case" while reading an inode, we call
    ufs_error (which remounts the fs read-only), but do not "mark" the inode as bad (1)

    - in the read-only case we do not fill some data structures that are used in
    the read-write case (2)

    - the VFS calls ufs_delete_inode if the link count is zero (3)

    so (1)->(3)->(2) causes an oops; this patch should fix that scenario

    Signed-off-by: Evgeniy Dushistov
    Cc: Jim Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

17 Mar, 2007

4 commits

  • During modification of code to support UFS2 writing, the case with
    "three indirect" blocks in truncate path was missed, this patch fixes
    this situation.

    Signed-off-by: Evgeniy Dushistov
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • This patch fixes the behaviour in the following test scenario:

    lseek(fd, BIG_OFFSET)
    write(fd, buf, sizeof(buf))
    truncate(BIG_OFFSET)
    truncate(BIG_OFFSET + sizeof(buf))
    read(fd, buf...)

    Once a file is big enough (BIG_OFFSET), we start allocating space by whole
    blocks, and the block size is ordinarily larger than the page size, so
    truncate should zero the rest of the block (except the last fragment, which
    the VFS should take care of) so that we do not get garbage when we extend
    the file.

    The patch also corrects the conversion from a pointer to a block to a
    physical block number, which helps with the less commonly used UFS types.

    It also adds the inode number to the debug output.
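
The test scenario in this message can be sketched as a small userspace check. This is an illustration only and not part of the patch: the function name check_truncate_zeroing, the 4096-byte buffer and the file path are assumptions, and the read position is reset explicitly before reading back.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write past big_offset, truncate down to big_offset, extend again, and
 * read the extended region back.
 * Returns 0 if the region reads back as zeros, 1 if stale data is visible,
 * -1 on an I/O error. The file at `path` is created and removed. */
static int check_truncate_zeroing(const char *path, off_t big_offset)
{
    char buf[4096];
    char back[sizeof(buf)];
    int result = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'A', sizeof(buf));

    if (lseek(fd, big_offset, SEEK_SET) == (off_t)-1)
        goto out;
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto out;
    if (ftruncate(fd, big_offset) != 0)
        goto out;
    if (ftruncate(fd, big_offset + (off_t)sizeof(buf)) != 0)
        goto out;
    /* Go back to the start of the re-extended region before reading. */
    if (lseek(fd, big_offset, SEEK_SET) == (off_t)-1)
        goto out;
    if (read(fd, back, sizeof(back)) != (ssize_t)sizeof(back))
        goto out;

    result = 0;
    for (size_t i = 0; i < sizeof(back); i++) {
        if (back[i] != 0) {
            result = 1;   /* old 'A' bytes leaked through */
            break;
        }
    }
out:
    close(fd);
    unlink(path);
    return result;
}
```

To exercise a particular filesystem, point `path` at a file on that mount; a return value of 1 would indicate the garbage-after-extend behaviour described above.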

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • This fixes "change block numbers on the fly" for the case when "prepare
    write page" is in the call chain: some buffers may then be neither
    uptodate nor mapped, so we must take care to map them and load them from disk.

    This patch was tested with:
    - ufs regressions simple tests
    - fsx-linux
    - ltp(20060306)
    - untar and build kernel

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • This patch corrects the handling of time in the UFS2 case.

    1) According to the UFS2 disk layout, each modification/access/etc. "time"
    should be held in two variables: one 64-bit for seconds and another 32-bit
    for nanoseconds.

    At present, for some unknown reason, we assume that the "inode time" is
    held in three variables: 32-bit for seconds, 32-bit for milliseconds and
    32-bit for nanoseconds.

    2) We set the number of nanoseconds in the "VFS inode" to 0 during read,
    instead of getting the values from the "on disk inode" (this should close
    http://bugzilla.kernel.org/show_bug.cgi?id=7991).
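
A minimal sketch of the layout described in point 1 and the read behaviour in point 2, for illustration only. The struct and function names here (disk_time, mem_time, read_time) are hypothetical and are not the kernel's definitions; byte-order conversion of the on-disk fields is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk timestamp as the message describes it for UFS2:
 * one 64-bit field for seconds, one 32-bit field for nanoseconds. */
struct disk_time {
    int64_t  sec;
    uint32_t nsec;
};

/* Hypothetical in-memory timestamp standing in for the VFS inode fields. */
struct mem_time {
    int64_t sec;
    long    nsec;
};

/* Copy both fields on read; the old behaviour left nsec at 0. */
static struct mem_time read_time(const struct disk_time *d)
{
    struct mem_time m = { d->sec, (long)d->nsec };
    return m;
}

/* Self-check: seconds beyond 32 bits and the nanoseconds both survive. */
static void read_time_selfcheck(void)
{
    struct disk_time d = { INT64_C(5000000000), 123456789u };
    struct mem_time m = read_time(&d);

    assert(m.sec == INT64_C(5000000000));
    assert(m.nsec == 123456789L);
}
```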

    Signed-off-by: Evgeniy Dushistov
    Cc: Bjoern Jacke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

13 Feb, 2007

5 commits

  • This patch is inspired by Arjan's "Patch series to mark struct
    file_operations and struct inode_operations const".

    Compile tested with gcc & sparse.

    Signed-off-by: Josef 'Jeff' Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.
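
The effect of the const qualifier can be sketched in plain C. This is an illustration under assumptions, not kernel code: node, node_operations and plain_ops are hypothetical stand-ins for an inode and its operations table.

```c
#include <assert.h>
#include <stddef.h>

struct node;

/* Hypothetical operations table: a struct of function pointers. */
struct node_operations {
    int (*getsize)(const struct node *n);
};

struct node {
    int size;
    const struct node_operations *ops;   /* pointer to a read-only table */
};

static int plain_getsize(const struct node *n)
{
    return n->size;
}

/* Declaring the table const lets the toolchain place it in read-only data
 * (typically .rodata), and an assignment such as
 * `plain_ops.getsize = NULL;` is rejected at compile time. */
static const struct node_operations plain_ops = {
    .getsize = plain_getsize,
};

/* Dispatch through the shared table. */
static int node_size(const struct node *n)
{
    assert(n->ops != NULL && n->ops->getsize != NULL);
    return n->ops->getsize(n);
}
```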

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • This patch adds the ability to work with 64-bit metadata, by replacing direct
    work with 32-bit pointers with inline functions.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • This patch adds a function to write a UFS2 inode to the write inode path, and
    modifies the allocate inode path to allocate and init additional inode chunks.

    Also some cleanups:
    - remove not used parameters in some functions
    - remove i_gen field from ufs_inode_info structure,
    there is i_generation in inode structure with same purposes.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • This series of patches adds UFS2 write support. UFS2 is the default file system
    for recent versions of FreeBSD.

    The main differences from UFS1, from the write-support point of view,
    are:
    1) Not all inodes are allocated when the disk is formatted.
    2) All metadata (pointers to data blocks) is 64-bit (in UFS1 it
    is 32-bit).

    So the patch series consists of:
    1) making it possible to mount UFS2 in read-write mode
    2) code to write ufs2 inodes and code to initialize inode chunks
    3) work with 64-bit metadata

    I did simple testing such as creating/deleting/writing/reading/truncating; I also
    ran fsx-linux, and untarred and built a kernel on UFS1 and UFS2, after which FreeBSD
    fsck did not find any errors in the fs.

    This patch makes it possible to mount ufs2 "rw", and updates the UFS2 documentation:
    it removes the note about the bug (fixed by the "reallocate blocks on the fly" patch)
    and adds me to the list of people who want to receive bug reports.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

10 Feb, 2007

1 commit

  • This is a fix for a regression introduced around 2.6.16.

    The patch named ufs-directory-and-page-cache-from-blocks-to-pages.patch, in
    addition to the conversion from the block to the page cache mechanism, added new
    checks of directory integrity, one of them being that a directory entry does not
    cross directory chunks.

    But some kinds of UFS, OpenStep UFS and Apple UFS (these look like the
    same filesystem), have a different directory chunk size than the common
    UFSes (BSD and Solaris UFS).

    So this patch adds the ability to work with a variable directory chunk size,
    and sets it to the right size for ufstype=openstep.

    Tested on darwin ufs.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

31 Jan, 2007

3 commits

  • The block reallocation function sometimes does not update some of the
    buffer_head::b_blocknr values, which may cause data damage.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • During ufs_trunc_direct, which is a subroutine of ufs::truncate, we first try to
    free parts of a block and then whole blocks. But we calculate the size of the
    block's part to free in the wrong way.

    This may cause a bad update of the used blocks and fragments statistics, and you can
    get a report that you have 32T free on a 1Gb partition.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • This series of patches is the result of UFS1 write support stress testing, such as
    running fsx-linux, untarring and building the linux kernel, etc.

    We pass a pointer to the current page from ufs::get_block_t to the levels below, to
    make things like reallocation of blocks on the fly possible, and we also use
    this pointer to indicate whether we are allocating a data block or a metadata
    block. But currently we decide what we are allocating at the wrong
    level, which may cause an oops if we allocate blocks in some special order.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

06 Jan, 2007

1 commit

  • This looks like the problem that Al Viro pointed out some time ago:

    ufs's get_block callback allocates 16k of disk at a time, and links that
    entire 16k into the file's metadata. But because get_block is called for only
    a single buffer_head (a 2k buffer_head in this case?) we are only able to tell
    the VFS that this 2k is buffer_new().

    So when ufs_getfrag_block() is later called to map some more data in the file,
    and when that data resides within the remaining 14k of this fragment,
    ufs_getfrag_block() will incorrectly return a !buffer_new() buffer_head.

    I don't see the _right_ way to zero the whole block: if we use the inode
    page cache, some pages may be outside the inode limits (inode size) and
    will be lost; if we use the blockdev page cache, it is possible to zero real data
    if the inode page cache is used later.

    The simplest way, as far as I can see, is to use the block device page cache, but not
    only mark it dirty, but also sync it during "nullification". I use my simple test
    collection, which I used to check that create, open, write, read, close work on
    ufs, and I see that this patch makes the ufs code 18% slower than before.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

09 Dec, 2006

1 commit


08 Dec, 2006

4 commits


11 Oct, 2006

1 commit


01 Oct, 2006

2 commits

  • When a filesystem decrements i_nlink to zero, it means that a write must be
    performed in order to drop the inode from the filesystem.

    We're shortly going to have to keep filesystems from being remounted r/o between
    the time that this i_nlink decrement and that write occurs.

    So, add a little helper function to do the decrements. We'll tie into it in a
    bit to note when i_nlink hits zero.
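
The idea of routing every decrement through one helper can be sketched as follows. This is a userspace illustration, not the kernel helper itself: inode_sketch, drop_link and the needs_final_write flag are hypothetical, and the flag merely stands in for the "note when i_nlink hits zero" hook that the message says will come later.

```c
#include <assert.h>

/* Hypothetical stand-in for the parts of an inode this sketch needs. */
struct inode_sketch {
    unsigned int nlink;
    int needs_final_write;   /* hypothetical marker set when nlink reaches 0 */
};

/* Single place where the link count is decremented, so that reaching
 * zero can be observed in one spot instead of at every call site. */
static void drop_link(struct inode_sketch *inode)
{
    assert(inode->nlink > 0);   /* decrementing past zero is a caller bug */
    inode->nlink--;
    if (inode->nlink == 0)
        inode->needs_final_write = 1;
}
```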

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This patch cleans up generic_file_*_read/write() interfaces. Christoph
    Hellwig gave me the idea for these cleanups.

    In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
    do_sync_read/ do_sync_write() as their .read/.write methods. This allows us
    to cleanup all variants of generic_file_* routines.

    Final available interfaces:

    generic_file_aio_read() - read handler
    generic_file_aio_write() - write handler
    generic_file_aio_write_nolock() - no lock write handler

    __generic_file_aio_write_nolock() - internal worker routine

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

27 Sep, 2006

3 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • * Roughly half of the callers already do it by not checking the return value
    * Code in drivers/acpi/osl.c does the following to be sure:

    (void)kmem_cache_destroy(cache);

    * Those who check it printk something; however, slab_error has already printed
    the name of the failed cache.
    * XFS BUGs on a failed kmem_cache_destroy, which is not a decision
    a low-level filesystem driver should make. Converted to ignore.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Conversions from kmalloc+memset to kzalloc.
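
    The shape of such a conversion can be sketched in userspace C, with malloc, memset and calloc standing in for kmalloc, memset and kzalloc. This is an analogy only and not code from the patch; struct record and both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure being allocated. */
struct record {
    int  id;
    char name[16];
};

/* Before: allocate, then clear by hand in a second step. */
static struct record *record_alloc_two_step(void)
{
    struct record *r = malloc(sizeof(*r));

    if (r)
        memset(r, 0, sizeof(*r));
    return r;
}

/* After: one call that returns zeroed memory (calloc plays the role
 * that kzalloc plays in the kernel). */
static struct record *record_alloc_zeroed(void)
{
    return calloc(1, sizeof(struct record));
}
```

    Both functions return NULL on allocation failure and a fully zeroed object otherwise; the second form removes the chance of forgetting the memset or passing it the wrong size.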

    Signed-off-by: Panagiotis Issaris
    Jffs2-bit-acked-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Panagiotis Issaris
     

28 Aug, 2006

2 commits

  • 1) When we allocate the last fragment in ufs_truncate, we read the page, check
    whether the block is mapped to an address, and if not, try to allocate it. This is
    wrong behaviour: a fragment may be NOT allocated but mapped. This
    happens because the "block map" function does not check whether the fragment is
    allocated or not; it just takes the address of the first fragment in the block, adds
    the offset of the fragment and returns the result. This is correct behaviour in
    almost all situations except the call from ufs_truncate.

    2) Almost all implementations of UFS that I could investigate have this
    "defect": if you have a full disk and try to truncate a file, for example from 3GB
    to 2MB, and have a hole in this region, truncate returns -ENOSPC. I tried to
    avoid this problem, but the "block allocation" algorithm is tied to the
    right value of i_lastfrag, and a fix for this corner case may slow down
    ordinary scenarios, so this patch makes the behavior of "truncate"
    operations similar to what other UFS implementations do.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • On UFS, this scenario:
    open(O_TRUNC)
    lseek(1024 * 1024 * 80)
    write("A")
    lseek(1024 * 2)
    write("A")

    may cause access to an invalid address.

    This happens because "goal" is calculated in the wrong way in the block
    allocation path; as far as I can see, this problem also exists in 2.4.

    We use a construction like i_data[lastfrag], where i_data is an array of pointers
    to direct blocks, indirect blocks and so on. It has a certain size, ~20 elements,
    while lastfrag may have a value of, for example, 40000.

    This patch also fixes issues related to handling this scenario: wrong
    zeroing of metadata in the case of block (not fragment) allocation, and wrong goal
    calculation when we allocate a block.
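
The failure mode described here is a small fixed table indexed with a fragment number. A minimal sketch of guarding such a lookup follows; it is not the patch's actual fix, and N_SLOTS and slot_or_zero are hypothetical names (the message only says the table has about 20 elements).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; the message gives only "~20 elements". */
#define N_SLOTS 20

/* Return the pointer stored in slot `index`, or 0 when the index does
 * not name a slot in the table (for example a fragment number such as
 * 40000 used by mistake as a slot index). */
static uint64_t slot_or_zero(const uint64_t *slots, size_t n_slots,
                             size_t index)
{
    if (index >= n_slots)
        return 0;
    return slots[index];
}
```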

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

06 Aug, 2006

2 commits

  • ufs_get_locked_page is called twice in the ufs code: once in the ufs_truncate
    path (we allocate the last block), and once when fragments are
    reallocated. In an ideal world, in the second case, the allocation/free block
    layer should not know that things like `truncate' exist, but now, with
    a crutch like ufs_get_locked_page, we can (or should?) skip truncated
    pages.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • As discussed earlier:
    http://lkml.org/lkml/2006/6/28/136
    this patch fixes the following issue:

    `ufs_get_locked_page' takes a page from the cache;
    after that, `vmtruncate' takes the page and deletes it from the cache;
    `ufs_get_locked_page' locks the page and reports an EIO error.

    Also, because find_lock_page always returns a valid page or NULL, we have no
    need to check the page if it is not NULL.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     

01 Aug, 2006

1 commit


04 Jul, 2006

1 commit

  • The quota code plays interesting games with the lock ordering; to quote Jan:

    | i_mutex of inode containing quota file is acquired after all other
    | quota locks. i_mutex of all other inodes is acquired before quota
    | locks. Quota code makes sure (by resetting inode operations and
    | setting special flag on inode) that noone tries to enter quota code
    | while holding i_mutex on a quota file...

    The good news is that all of this special case i_mutex grabbing happens in the
    (per filesystem) low level quota write function. For this special case we
    need a new I_MUTEX_* nesting level, since this is just entirely outside any of
    the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that
    this is not ever going to deadlock; and based on that the patch below is what
    it takes to inform lockdep of these very interesting new locking rules.

    The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
    deepest possible level of nesting for i_mutex, and that this only should be
    used in quota write (and possibly read) function of filesystems. This makes
    the lock ordering of the I_MUTEX_* levels:

    I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA

    Has no effect on non-lockdep kernels.

    Signed-off-by: Arjan van de Ven
    Acked-by: Ingo Molnar
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven