26 Mar, 2009

2 commits


12 Feb, 2009

1 commit

  • For a reason that I was unable to understand in three months of debugging,
    mount ext2 -o remount stopped working properly when remounting from
    regular operation to xip, or the other way around. According to a git
    bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
    rework in the vm:

    commit 70688e4dd1647f0ceb502bbd5964fa344c5eb411
    Author: Nick Piggin
    Date: Mon Apr 28 02:13:02 2008 -0700

    xip: support non-struct page backed memory

    In the failing scenario, the filesystem is mounted read only via root=
    kernel parameter on s390x. During remount (in rc.sysinit), the inodes of
    the bash binary and its libraries are busy and cannot be invalidated (the
    bash which is running rc.sysinit resides on subject filesystem).
    Afterwards, another bash process (running ifup-eth) recurses into a
    subshell, runs dup_mm (via fork). Some of the mappings in this bash
    process were created from inodes that could not be invalidated during
    remount.

    Both parent and child process crash some time later due to inconsistencies
    in their address spaces. The issue seems to be timing sensitive, various
    attempts to recreate it have failed.

    This patch refuses to change the xip flag during remount in case some
    inodes cannot be invalidated. This patch keeps users from running into
    that issue.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Carsten Otte
    Cc: Nick Piggin
    Cc: Jared Hulbert
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     

16 Jan, 2009

1 commit

  • We used to just write changed page for IS_DIRSYNC inodes. But we also
    have to update the directory inode itself just for the case that we've
    allocated a new block and changed i_size.

    [akpm@linux-foundation.org: still sync the data page]
    Signed-off-by: Jan Kara
    Tested-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

09 Jan, 2009

4 commits

  • At the moment there are few restrictions on which flags may be set on
    which inodes. Specifically DIRSYNC may only be set on directories and
    IMMUTABLE and APPEND may not be set on links. Tighten that to disallow
    TOPDIR being set on non-directories and only NODUMP and NOATIME to be set
    on non-regular file, non-directories.

    Introduces a flags masking function which masks flags based on mode and
    use it during inode creation and when flags are set via the ioctl to
    facilitate future consistency.

    Signed-off-by: Duane Griffin
    Acked-by: Andreas Dilger
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Duane Griffin
     
  • At present BTREE/INDEX is the only flag that new ext2 inodes do NOT
    inherit from their parent. In addition prevent the flags DIRTY, ECOMPR,
    INDEX, IMAGIC and TOPDIR from being inherited. List inheritable flags
    explicitly to prevent future flags from accidentally being inherited.

    This fixes the TOPDIR flag inheritance bug reported at
    http://bugzilla.kernel.org/show_bug.cgi?id=9866.

    Signed-off-by: Duane Griffin
    Acked-by: Andreas Dilger
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Duane Griffin
     
  • As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes on 64-bit
    which makes it a very bad fit for SLAB allocators. The culprit of the
    wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when
    NR_CPUS >= 32.

    To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2
    page in the worst case, separately. This shinks down struct ext2_sb_info
    enough to fit a 1 KB slab cache so now we allocate 16 KB + 1 KB instead of
    32 KB saving 15 KB of memory.

    Acked-by: Andreas Dilger
    Signed-off-by: Pekka Enberg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka J Enberg
     
  • There is no argument named @chain in ext2_splice_branch, remove references
    to it.

    Signed-off-by: Qinghuang Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qinghuang Feng
     

01 Jan, 2009

2 commits

  • * make ext2_new_inode() put the inode into icache in locked state
    * do not unlock until the inode is fully set up; otherwise nfsd
    might pick it in half-baked state.
    * make sure that ext2_new_inode() does *not* lead to two inodes with the
    same inumber hashed at the same time; otherwise a bogus fhandle coming
    from nfsd might race with inode creation:

    nfsd: iget_locked() creates inode
    nfsd: try to read from disk, block on that.
    ext2_new_inode(): allocate inode with that inumber
    ext2_new_inode(): insert it into icache, set it up and dirty
    ext2_write_inode(): get the relevant part of inode table in cache,
    set the entry for our inode (and start writing to disk)
    nfsd: get CPU again, look into inode table, see nice and sane on-disk
    inode, set the in-core inode from it

    oops - we have two in-core inodes with the same inumber live in icache,
    both used for IO. Welcome to fs corruption...

    Signed-off-by: Al Viro

    Al Viro
     
  • Ensure fast symlink targets are NUL-terminated, even if corrupted
    on-disk.

    Cc: Andrew Morton
    Signed-off-by: Duane Griffin
    Signed-off-by: Al Viro

    Duane Griffin
     

14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: linux-ext4@vger.kernel.org
    Signed-off-by: James Morris

    David Howells
     

24 Oct, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
    [PATCH] kill the rest of struct file propagation in block ioctls
    [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
    [PATCH] get rid of blkdev_locked_ioctl()
    [PATCH] get rid of blkdev_driver_ioctl()
    [PATCH] sanitize blkdev_get() and friends
    [PATCH] remember mode of reiserfs journal
    [PATCH] propagate mode through swsusp_close()
    [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
    [PATCH] pass fmode_t to blkdev_put()
    [PATCH] kill the unused bsize on the send side of /dev/loop
    [PATCH] trim file propagation in block/compat_ioctl.c
    [PATCH] end of methods switch: remove the old ones
    [PATCH] switch sr
    [PATCH] switch sd
    [PATCH] switch ide-scsi
    [PATCH] switch tape_block
    [PATCH] switch dcssblk
    [PATCH] switch dasd
    [PATCH] switch mtd_blkdevs
    [PATCH] switch mmc
    ...

    Linus Torvalds
     

23 Oct, 2008

2 commits


21 Oct, 2008

2 commits


17 Oct, 2008

2 commits

  • A very large directory with many read failures (either due to storage
    problems, or due to invalid size & blocks from corruption) will generate a
    printk storm as the filesystem continues to try to read all the blocks.
    This flood of messages can tie up the box until it is complete - which may
    be a very long time, especially for very large corrupted values.

    This is fixed by only reporting the corruption once each time we try to
    read the directory.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: Eugene Teo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • We could run into ENOSPC error on ext2, even when there is free blocks on
    the filesystem.

    The problem is triggered in the case the goal block group has 0 free
    blocks , and the rest block groups are skipped due to the check of
    "free_blocks < windowsz/2". Current code could fall back to non
    reservation allocation to prevent early ENOSPC after examing all the block
    groups with reservation on , but this code was bypassed if the reservation
    window is turned off already, which is true in this case.

    This patch fixed two issues:
    1) We don't need to turn off block reservation if the goal block group has
    0 free blocks left and continue search for the rest of block groups.

    Current code the intention is to turn off the block reservation if the
    goal allocation group has a few (some) free blocks left (not enough for
    make the desired reservation window),to try to allocation in the goal
    block group, to get better locality. But if the goal blocks have 0 free
    blocks, it should leave the block reservation on, and continues search for
    the next block groups,rather than turn off block reservation completely.

    2) we don't need to check the window size if the block reservation is off.

    The problem was originally found and fixed in ext4.

    Signed-off-by: Mingming Cao
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

14 Oct, 2008

1 commit

  • This is a much better version of a previous patch to make the parser
    tables constant. Rather than changing the typedef, we put the "const" in
    all the various places where its required, allowing the __initconst
    exception for nfsroot which was the cause of the previous trouble.

    This was posted for review some time ago and I believe its been in -mm
    since then.

    Signed-off-by: Steven Whitehouse
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Steven Whitehouse
     

04 Oct, 2008

1 commit

  • Any block based fs (this patch includes ext3) just has to declare its own
    fiemap() function and then call this generic function with its own
    get_block_t. This works well for block based filesystems that will map
    multiple contiguous blocks at one time, but will work for filesystems that
    only map one block at a time, you will just end up with an "extent" for each
    block. One gotcha is this will not play nicely where there is hole+data
    after the EOF. This function will assume its hit the end of the data as soon
    as it hits a hole after the EOF, so if there is any data past that it will
    not pick that up. AFAIK no block based fs does this anyway, but its in the
    comments of the function anyway just in case.

    Signed-off-by: Josef Bacik
    Signed-off-by: Mark Fasheh
    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org

    Josef Bacik
     

29 Jul, 2008

1 commit

  • When we read some part of a file through pagecache, if there is a
    pagecache of corresponding index but this page is not uptodate, read IO
    is issued and this page will be uptodate.

    I think this is good for pagesize == blocksize environment but there is
    room for improvement on pagesize != blocksize environment. Because in
    this case a page can have multiple buffers and even if a page is not
    uptodate, some buffers can be uptodate.

    So I suggest that when all buffers which correspond to a part of a file
    that we want to read are uptodate, use this pagecache and copy data from
    this pagecache to user buffer even if a page is not uptodate. This can
    reduce read IO and improve system throughput.

    I wrote a benchmark program and got result number with this program.

    This benchmark do:

    1: mount and open a test file.

    2: create a 512MB file.

    3: close a file and umount.

    4: mount and again open a test file.

    5: pwrite randomly 300000 times on a test file. offset is aligned
    by IO size(1024bytes).

    6: measure time of preading randomly 100000 times on a test file.

    The result was:
    2.6.26
    330 sec

    2.6.26-patched
    226 sec

    Arch:i386
    Filesystem:ext3
    Blocksize:1024 bytes
    Memory: 1GB

    On ext3/4, a file is written through buffer/block. So random read/write
    mixed workloads or random read after random write workloads are optimized
    with this patch under pagesize != blocksize environment. This test result
    showed this.

    The benchmark program is as follows:

    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    #define LEN 1024
    #define LOOP 1024*512 /* 512MB */

    main(void)
    {
    unsigned long i, offset, filesize;
    int fd;
    char buf[LEN];
    time_t t1, t2;

    if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
    perror("cannot mount\n");
    exit(1);
    }
    memset(buf, 0, LEN);
    fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
    if (fd < 0) {
    perror("cannot open file\n");
    exit(1);
    }
    for (i = 0; i < LOOP; i++)
    write(fd, buf, LEN);
    close(fd);
    if (umount("/root/test1/") < 0) {
    perror("cannot umount\n");
    exit(1);
    }
    if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
    perror("cannot mount\n");
    exit(1);
    }
    fd = open("/root/test1/testfile", O_RDWR);
    if (fd < 0) {
    perror("cannot open file\n");
    exit(1);
    }

    filesize = LEN * LOOP;
    for (i = 0; i < 300000; i++){
    offset = (random() % filesize) & (~(LEN - 1));
    pwrite(fd, buf, LEN, offset);
    }
    printf("start test\n");
    time(&t1);
    for (i = 0; i < 100000; i++){
    offset = (random() % filesize) & (~(LEN - 1));
    pread(fd, buf, LEN, offset);
    }
    time(&t2);
    printf("%ld sec\n", t2-t1);
    close(fd);
    if (umount("/root/test1/") < 0) {
    perror("cannot umount\n");
    exit(1);
    }
    }

    Signed-off-by: Hisashi Hifumi
    Cc: Nick Piggin
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     

27 Jul, 2008

2 commits

  • * kill nameidata * argument; map the 3 bits in ->flags anybody cares
    about to new MAY_... ones and pass with the mask.
    * kill redundant gfs2_iop_permission()
    * sanitize ecryptfs_permission()
    * fix remaining places where ->permission() instances might barf on new
    MAY_... found in mask.

    The obvious next target in that direction is permission(9)

    folded fix for nfs_permission() breakage from Miklos Szeredi

    Signed-off-by: Al Viro

    Al Viro
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
    passed kmem cache in non-trivial way, so pass only pointer to object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

2 commits

  • Move declarations of some macros, which should be in fact functions to
    quotaops.h. This way they can be later converted to inline functions
    because we can now use declarations from quota.h. Also add necessary
    includes of quotaops.h to a few files.

    [akpm@linux-foundation.org: fix JFS build]
    [akpm@linux-foundation.org: fix UFS build]
    [vegard.nossum@gmail.com: fix QUOTA=n build]
    Signed-off-by: Jan Kara
    Cc: Vegard Nossum
    Cc: Arjen Pool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • remove the definitions of macros:
    XATTR_TRUSTED_PREFIX
    XATTR_USER_PREFIX
    since they are defined in linux/xattr.h

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shen Feng
     

28 Apr, 2008

10 commits

  • If the block allocator gets blocks out of system zone ext2 calls ext2_error.
    But if the file system is mounted with errors=continue retry block allocation.
    We need to mark the system zone blocks as in use to make sure retry don't
    pick them again

    System zone is the block range mapping block bitmap, inode bitmap and inode
    table.

    [akpm@linux-foundation.org: fix typo in comment]
    Signed-off-by: Aneesh Kumar K.V
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
    side-effects to allow a definition of BUG_ON that drops the code completely.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    //
    @ disable unlikely @ expression E,f; @@

    (
    if () { BUG(); }
    |
    - if (unlikely(E)) { BUG(); }
    + BUG_ON(E);
    )

    @@ expression E,f; @@

    (
    if () { BUG(); }
    |
    - if (E) { BUG(); }
    + BUG_ON(E);
    )
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
     
  • Use ext2_fsblk_t type for filesystem-wide blocks number

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Use ext2_group_first_block_no() and assign the return values to
    ext2_fsblk_t variables.

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Improve ext2_readdir() return value for ext2_get_page() failure by using the
    actual result of ext2_get_page().

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Convert byte order of constant instead of variable which can be done at
    compile time (vs run time).

    Signed-off-by: Marcin Slusarz
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Slusarz
     
  • replace all:
    little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
    expression_in_cpu_byteorder);
    with:
    leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
    generated with semantic patch

    Signed-off-by: Marcin Slusarz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Slusarz
     
  • Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for
    the user mappings.

    This requires the get_xip_page API to be changed to an address based one.
    Improve the API layering a little bit too, while we're here.

    This is required in order to support XIP filesystems on memory that isn't
    backed with struct page (but memory with struct page is still supported too).

    Signed-off-by: Nick Piggin
    Acked-by: Carsten Otte
    Cc: Jared Hulbert
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Alter the block device ->direct_access() API to work with the new
    get_xip_mem() API (that requires both kaddr and pfn are returned).

    Some architectures will not do the right thing in their virt_to_page() for use
    by XIP (to translate from the kernel virtual address returned by
    direct_access(), to a user mappable pfn in XIP's page fault handler.

    However, we can't switch it to just return the pfn and not the kaddr, because
    we have no good way to get a kva from a pfn, and XIP requires the kva for its
    read(2) and write(2) handlers. So we have to return both.

    Signed-off-by: Jared Hulbert
    Signed-off-by: Nick Piggin
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: linux-mm@kvack.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jared Hulbert
     

22 Apr, 2008

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial: (24 commits)
    DOC: A couple corrections and clarifications in USB doc.
    Generate a slightly more informative error msg for bad HZ
    fix typo "is" -> "if" in Makefile
    ext*: spelling fix prefered -> preferred
    DOCUMENTATION: Use newer DEFINE_SPINLOCK macro in docs.
    KEYS: Fix the comment to match the file name in rxrpc-type.h.
    RAID: remove trailing space from printk line
    DMA engine: typo fixes
    Remove unused MAX_NODES_SHIFT
    MAINTAINERS: Clarify access to OCFS2 development mailing list.
    V4L: Storage class should be before const qualifier (sn9c102)
    V4L: Storage class should be before const qualifier
    sonypi: Storage class should be before const qualifier
    intel_menlow: Storage class should be before const qualifier
    DVB: Storage class should be before const qualifier
    arm: Storage class should be before const qualifier
    ALSA: Storage class should be before const qualifier
    acpi: Storage class should be before const qualifier
    firmware_sample_driver.c: fix coding style
    MAINTAINERS: Add ati_remote2 driver
    ...

    Fixed up trivial conflicts in firmware_sample_driver.c

    Linus Torvalds
     
  • Spelling fix: prefered -> preferred

    Signed-off-by: Benoit Boissinot
    Signed-off-by: Jesper Juhl

    Benoit Boissinot
     

19 Apr, 2008

1 commit


16 Apr, 2008

1 commit

  • mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But
    filesystems are calling this function while holding xattr_sem so possible
    recursion into the fs violates locking ordering of xattr_sem and transaction
    start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems
    can specify desired gfp mask and use GFP_NOFS from all of them.

    Signed-off-by: Jan Kara
    Reported-by: Dave Jones
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

09 Feb, 2008

1 commit