27 Jul, 2007

1 commit

  • ext[234]_check_descriptors sanity checks block group descriptor geometry at
    mount time, testing whether the block bitmap, inode bitmap, and inode table
    reside wholly within the blockgroup. However, the inode table test is off
    by one so that if the last block in the inode table resides on the last
    block of the block group, the test incorrectly fails. This is because it
    tests the last block as (start + length) rather than (start + length - 1).

    This can be seen by trying to mount a filesystem made such as:

    mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024

    which yields:

    EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)!
    EXT2-fs: group descriptors corrupted!

    There is a similar bug in e2fsprogs, patch already sent for that.

    (I wonder if inside(), outside(), and/or in_range() should someday be
    used in this and other tests throughout the ext filesystems...)

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     

20 Jul, 2007

2 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

    Here is a short excerpt of the semantic patch performing
    this transformation:

    @@
    type T2;
    expression x;
    identifier f,fld;
    expression E;
    expression E1,E2;
    expression e1,e2,e3,y;
    statement S;
    @@

    x =
    - kmalloc
    + kzalloc
    (E1,E2)
    ... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
    - memset((T2)x,0,E1);

    @@
    expression E1,E2,E3;
    @@

    - kzalloc(E1 * E2,E3)
    + kcalloc(E1,E2,E3)

    [akpm@linux-foundation.org: get kcalloc args the right way around]
    Signed-off-by: Yoann Padioleau
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Acked-by: Russell King
    Cc: Bryan Wu
    Acked-by: Jiri Slaby
    Cc: Dave Airlie
    Acked-by: Roland Dreier
    Cc: Jiri Kosina
    Acked-by: Dmitry Torokhov
    Cc: Benjamin Herrenschmidt
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Pierre Ossman
    Cc: Jeff Garzik
    Cc: "David S. Miller"
    Acked-by: Greg KH
    Cc: James Bottomley
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yoann Padioleau
     

18 Jul, 2007

2 commits

  • Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
    users to it. This is done because we want to avoid bugs in the future
    where we check for only effective fsuid of the current task against a
    file's owning uid, without simultaneously checking for CAP_FOWNER as
    well, thus violating its semantics.
    [ XFS uses special macros and structures, and in general looked ...
    untouchable, so we leave it alone -- but it has been looked over. ]

    The (current->fsuid != inode->i_uid) check in generic_permission() and
    exec_permission_lite() is left alone, because those operations are
    covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
    falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

    Signed-off-by: Satyam Sharma
    Cc: Al Viro
    Acked-by: Serge E. Hallyn
    Signed-off-by: Linus Torvalds

    Satyam Sharma
     
  • currently the export_operation structure and helpers related to it are in
    fs.h. fs.h is already far too large and there are very few places needing the
    export bits, so split them off into a separate header.

    [akpm@linux-foundation.org: fix cifs build]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Neil Brown
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

17 Jul, 2007

2 commits

  • This is a patch that speeds up statfs. It is very simple - the "overhead"
    calculation, which takes a huge amount of time for large filesystems, never
    changes unless the size of the filesystem itself changes. That means we can
    store it in memory and only recalculate if the filesystem has been resized
    (almost never).

    It also fixes a minor problem that we never update the on-disk superblock free
    blocks/inodes counts until the filesystem is unmounted. While not fatal, we
    may as well update that on disk when we have the information, and it makes
    things like debugfs and dumpe2fs report a bit more accurate info.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Signed-off-by: Jan Kara
    Acked-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

10 Jul, 2007

2 commits

  • This patch removes xip_file_sendfile, the sendfile implementation for
    xip without replacement. Those customers that use xip on s390 are not
    using sendfile() as far as we know, and so far s390 is the only platform
    this could potentially be used on so far.
    Having sendfile is not a popular feature for execute in place file
    systems, however we have a working implementation of splice_read() based
    on fs/splice.c if anyone asks for it.
    At this point in time, it does not seem preferable to merge
    splice_read() for xip because it causes extra maintenence effort due to
    code duplication and it requires struct page behind the xip memory
    segment. We'd like to get rid of that in favor of supporting flash based
    embedded platforms (Monta Vista work) soon.

    Signed-off-by: Carsten Otte
    Signed-off-by: Jens Axboe

    Carsten Otte
     
  • They can use generic_file_splice_read() instead. Since sys_sendfile() now
    prefers that, there should be no change in behaviour.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

29 Jun, 2007

1 commit


24 Jun, 2007

1 commit

  • Yan Zheng pointed out that ext2_remount lacks checking if -o xip should be
    enabled or not. This patch checks for presence of direct_access on the
    backing block device and if the blocksize meets the requirements.

    Signed-off-by: Carsten Otte
    Cc: Yan Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     

17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 May, 2007

3 commits

  • Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
    ext2-specific i_flags. Hence, when someone sets these flags via a different
    interface than ioctl, they are stored correctly.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Remove includes of where it is not used/needed.
    Suggested by Al Viro.

    Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
    sparc64, and arm (all 59 defconfigs).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079

    signed long ranges from -2.147.483.648 to 2.147.483.647 on x86 32bit

    10000011110110100100111110111101 .. -2,082,844,739
    10000011110110100100111110111101 .. 2,212,122,557
    Cc:

    Andreas says:

    This patch is now treating timestamps with the high bit set as negative
    times (before Jan 1, 1970). This means we lose 1/2 of the possible range
    of timestamps (lopping off 68 years before unix timestamp overflow -
    now only 30 years away :-) to handle the extremely rare case of setting
    timestamps into the distant past.

    If we are only interested in fixing the underflow case, we could just
    limit the values to 0 instead of storing negative values. At worst this
    will skew the timestamp by a few hours for timezones in the far east
    (files would still show Jan 1, 1970 in "ls -l" output).

    That said, it seems 32-bit systems (mine at least) allow files to be set
    into the past (01/01/1907 works fine) so it seems this patch is bringing
    the x86_64 behaviour into sync with other kernels.

    On the plus side, we have a patch that is ready to add nanosecond timestamps
    to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so
    this extends the maximum date to 2242.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Rechberger
     

08 May, 2007

2 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code object
    manipulation of the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Ensure pages are uptodate after returning from read_cache_page, which allows
    us to cut out most of the filesystem-internal PageUptodate calls.

    I didn't have a great look down the call chains, but this appears to fixes 7
    possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
    ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
    block2mtd. All depending on whether the filler is async and/or can return
    with a !uptodate page.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

21 Feb, 2007

1 commit


13 Feb, 2007

2 commits

  • This patch is inspired by Arjan's "Patch series to mark struct
    file_operations and struct inode_operations const".

    Compile tested with gcc & sparse.

    Signed-off-by: Josef 'Jeff' Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

12 Feb, 2007

2 commits

  • Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or
    ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a
    kernel built without ACL support, then umask was ignored when creating
    inodes - though root or user has umask 022, touch creates files as 0666,
    and mkdir creates directories as 0777.

    This appears to have worked right until 2.6.11, when a fix to the default
    mode on symlinks (always 0777) assumed VFS applies umask: which it does,
    unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
    s_flags according to s_mount_opt set according to def_mount_opts.

    We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
    but other filesystems only set MS_POSIXACL when ACLs are configured. We
    could fix this at another level; but it seems most robust to avoid setting
    the s_mount_opt flag in the first place (at the expense of more ifdefs).

    Likewise don't set the XATTR_USER flag when built without XATTR support.

    Signed-off-by: Hugh Dickins
    Cc: Tigran Aivazian
    Cc:
    Cc: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • This one was pointed out on the MOKB site:
    http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html

    If a directory's i_size is corrupted, ext2_find_entry() will keep
    processing pages until the i_size is reached, even if there are no more
    blocks associated with the directory inode. This patch puts in some
    minimal sanity-checking so that we don't keep checking pages (and issuing
    errors) if we know there can be no more data to read, based on the block
    count of the directory inode.

    This is somewhat similar in approach to the ext3 patch I sent earlier this
    year.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     

09 Dec, 2006

2 commits

  • This facility provides three entry points:

    ilog2() Log base 2 of unsigned long
    ilog2_u32() Log base 2 of u32
    ilog2_u64() Log base 2 of u64

    These facilities can either be used inside functions on dynamic data:

    int do_something(long q)
    {
    ...;
    y = ilog2(x)
    ...;
    }

    Or can be used to statically initialise global variables with constant values:

    unsigned n = ilog2(27);

    When performing static initialisation, the compiler will report "error:
    initializer element is not constant" if asked to take a log of zero or of
    something not reducible to a constant. They treat negative numbers as
    unsigned.

    When not dealing with a constant, they fall back to using fls() which permits
    them to use arch-specific log calculation instructions - such as BSR on
    x86/x86_64 or SCAN on FRV - if available.

    [akpm@osdl.org: MMC fix]
    Signed-off-by: David Howells
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Herbert Xu
    Cc: David Howells
    Cc: Wojtek Kaniewski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the ext2
    filesystem.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

08 Dec, 2006

5 commits

  • Port commit a090d9132c1e53e3517111123680c15afb25c0a4 into ext2:

    All modifications of ->i_flags in inodes that might be visible to somebody
    else must be under ->i_mutex. That patch fixes ext2 ioctl() setting S_APPEND.

    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • lock_super() is unnecessary for setting super-block feature flags. Use the
    provided *_SET_COMPAT_FEATURE() macros as well.

    Signed-off-by: Andreas Gruenbacher
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Gruenbacher
     
  • Update ext2_statfs to return an FSID that is a 64 bit XOR of the 128 bit
    filesystem UUID as suggested by Andreas Dilger. See the following Bugzilla
    entry for details:

    http://bugzilla.kernel.org/show_bug.cgi?id=136

    Cc: Andreas Dilger
    Cc: Stephen Tweedie
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

12 Oct, 2006

1 commit

  • Current error behaviour for ext2 and ext3 filesystems does not fully
    correspond to the documentation and should be fixed.

    According to man 8 mount, ext2 and ext3 file systems allow to set one of 3
    different on-errors behaviours:

    ---- start of quote man 8 mount ----

    errors=continue / errors=remount-ro / errors=panic

    Define the behaviour when an error is encountered. (Either ignore
    errors and just mark the file system erroneous and continue, or remount
    the file system read-only, or panic and halt the system.) The default is
    set in the filesystem superblock, and can be changed using tune2fs(8).

    ---- end of quote ----

    However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
    ERRORS_CONT is not saved on the sbi->s_mount_opt. It leads to the incorrect
    handle of errors on ext3.

    Then we've checked corresponding code in ext2 and discovered that it is buggy
    as well:

    - EXT2_ERRORS_CONTINUE is not read from the superblock (the same);

    - parse_option() does not clean the alternative values and thus something
    like (ERRORS_CONT|ERRORS_RO) can be set;

    - if options are omitted, parse_option() does not set any of these options.

    Therefore it is possible to set any combination of these options on the ext2:

    - none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
    options;

    - any of them may be set using mount options;

    - 2 any options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
    superblock and other value in mount options;

    - and finally all three options may be set by adding third option in remount.

    Currently ext2 uses these values only in ext2_error() and it is not leading to
    any noticeable troubles. However somebody may be discouraged when he will try
    to workaround EXT2_ERRORS_PANIC on the superblock by using errors=continue in
    mount options.

    This patch:

    EXT2_ERRORS_CONTINUE should be read from the superblock as default value for
    error behaviour. parse_option() should clean the alternative options and
    should not change default value taken from the superblock.

    Signed-off-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     

01 Oct, 2006

4 commits

  • When a filesystem decrements i_nlink to zero, it means that a write must be
    performed in order to drop the inode from the filesystem.

    We're shortly going to have keep filesystems from being remounted r/o between
    the time that this i_nlink decrement and that write occurs.

    So, add a little helper function to do the decrements. We'll tie into it in a
    bit to note when i_nlink hits zero.

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This patch cleans up generic_file_*_read/write() interfaces. Christoph
    Hellwig gave me the idea for this clean ups.

    In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
    do_sync_read/ do_sync_write() as their .read/.write methods. This allows us
    to cleanup all variants of generic_file_* routines.

    Final available interfaces:

    generic_file_aio_read() - read handler
    generic_file_aio_write() - write handler
    generic_file_aio_write_nolock() - no lock write handler

    __generic_file_aio_write_nolock() - internal worker routine

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch removes readv() and writev() methods and replaces them with
    aio_read()/aio_write() methods.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Move the Ext2 device ioctl compat stuff from fs/compat_ioctl.c to the Ext2
    driver so that the Ext2 header file doesn't need to be included.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

27 Sep, 2006

5 commits


19 Sep, 2006

1 commit

  • Fix a performance degradation introduced in 2.6.17. (30% degradation
    running dbench with 16 threads)

    Commit 21730eed11de42f22afcbd43f450a1872a0b5ea1, which claims to make
    EXT2_DEBUG work again, moves the taking of the kernel lock out of
    debug-only code in ext2_count_free_inodes and ext2_count_free_blocks and
    into ext2_statfs.

    The same problem was fixed in ext3 by removing the lock completely (commit
    5b11687924e40790deb0d5f959247ade82196665)

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp