03 Nov, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
    jbd2: Unify log messages in jbd2 code
    jbd/jbd2: validate sb->s_first in journal_get_superblock()
    ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
    ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
    ext4: fix a typo in struct ext4_allocation_context
    ext4: Don't normalize an falloc request if it can fit in 1 extent.
    ext4: remove comments about extent mount option in ext4_new_inode()
    ext4: let ext4_discard_partial_buffers handle unaligned range correctly
    ext4: return ENOMEM if find_or_create_pages fails
    ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
    ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
    ext4: optimize locking for end_io extent conversion
    ext4: remove unnecessary call to waitqueue_active()
    ext4: Use correct locking for ext4_end_io_nolock()
    ext4: fix race in xattr block allocation path
    ext4: trace punch_hole correctly in ext4_ext_map_blocks
    ext4: clean up AGGRESSIVE_TEST code
    ext4: move variables to their scope
    ext4: fix quota accounting during migration
    ext4: migrate cleanup
    ...

    Linus Torvalds
     

28 Oct, 2011

1 commit


25 Oct, 2011

1 commit

  • In ext4_file_open, the filesystem records the mountpoint of the first
    file that is opened after mounting the filesystem. It does this by
    allocating a 64-byte stack buffer, calling d_path() to grab the mount
    point through which this file was accessed, and then memcpy()ing 64
    bytes into the superblock's s_last_mounted field, starting from the
    return value of d_path(), which is stored as "cp". However, if cp >
    buf (which it frequently is, since path components are prepended
    starting at the end of buf), then we can end up copying stack data into
    the superblock.

    Writing stack variables into the superblock doesn't sound like a great
    idea, so use strlcpy instead. Andi Kleen suggested using strlcpy
    instead of strncpy.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"

    Darrick J. Wong
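To illustrate the bug and the fix, here is a minimal userspace sketch. `prepend_path()` is a hypothetical stand-in for d_path() that builds its result at the end of a 64-byte buffer and returns a pointer into the middle of it, and `bounded_strlcpy()` is a local reimplementation of strlcpy semantics (strlcpy is not available in every C library). Neither is kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for d_path(): builds the string at the END of buf
 * and returns a pointer to its first character, so the result usually
 * starts somewhere after buf[0]. Returns NULL if the path does not fit. */
static char *prepend_path(char *buf, size_t buflen, const char *path)
{
    size_t len = strlen(path);
    if (len + 1 > buflen)
        return NULL;
    char *start = buf + buflen - len - 1;
    memcpy(start, path, len + 1);  /* includes the terminating NUL */
    return start;
}

/* strlcpy semantics: copy at most size-1 bytes, always NUL-terminate
 * (when size > 0), and return strlen(src). */
static size_t bounded_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);
    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Record the mount point into a 64-byte field. Copying a fixed 64 bytes
 * with memcpy() starting at cp would read past the end of buf whenever
 * cp > buf; the bounded string copy stops at the terminator instead. */
static void record_last_mounted(char last_mounted[64], const char *path)
{
    char buf[64];
    char *cp = prepend_path(buf, sizeof(buf), path);
    if (cp)
        bounded_strlcpy(last_mounted, cp, 64);
}
```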
     

26 Jul, 2011

1 commit

  • Replace the ->check_acl method with a ->get_acl method that simply reads an
    ACL from disk after having a cache miss. This means we can replace the ACL
    checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
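A minimal sketch of the split described above, using hypothetical and simplified types (`struct sketch_inode`, `sketch_get_cached_acl()` and the rest are illustrations, not the VFS API): the filesystem hook only reads the ACL, while a single generic routine handles the cache lookup and the permission check.

```c
#include <stddef.h>

/* Hypothetical, simplified types for illustration only. */
struct sketch_acl { int mode_bits; };

struct sketch_inode;
struct sketch_inode_ops {
    /* Filesystem hook: read the ACL from disk; no permission logic here. */
    struct sketch_acl *(*get_acl)(struct sketch_inode *inode);
};

struct sketch_inode {
    const struct sketch_inode_ops *ops;
    /* NULL doubles as "not cached" to keep the sketch short; a real cache
     * would need a separate marker so that "no ACL" is cached as well. */
    struct sketch_acl *cached_acl;
};

/* Generic code: use the cached ACL, calling ->get_acl() only on a miss. */
static struct sketch_acl *sketch_get_cached_acl(struct sketch_inode *inode)
{
    if (inode->cached_acl == NULL && inode->ops->get_acl != NULL)
        inode->cached_acl = inode->ops->get_acl(inode);
    return inode->cached_acl;
}

/* Single generic permission check shared by every filesystem. */
static int sketch_acl_allows(struct sketch_inode *inode, int want)
{
    struct sketch_acl *acl = sketch_get_cached_acl(inode);
    if (acl == NULL)
        return 0;
    return (acl->mode_bits & want) == want;
}
```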
     

21 Jul, 2011

1 commit

  • Since ext4 has its own lseek, we need to make sure it handles
    SEEK_HOLE/SEEK_DATA. For now, just do the same thing that is done in the generic
    case; somebody else can come along and make it do fancy things later.

    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
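The generic behaviour referred to above treats the whole file as data, with a single virtual hole at end of file. A small sketch of that rule, using hypothetical whence constants and a hypothetical function name rather than the kernel's own:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical whence values for this sketch only. */
enum sketch_whence { SK_SEEK_DATA, SK_SEEK_HOLE };

/* Generic rule: all of [0, i_size) counts as data and the only hole is
 * the virtual one at end of file. Returns the new offset, or -ENXIO when
 * offset is negative or at/after EOF. */
static int64_t sketch_seek_data_hole(int64_t offset, int64_t i_size,
                                     enum sketch_whence whence)
{
    if (offset < 0 || offset >= i_size)
        return -ENXIO;
    if (whence == SK_SEEK_DATA)
        return offset;  /* the requested offset is already data */
    return i_size;      /* SK_SEEK_HOLE: the next hole starts at EOF */
}
```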
     

26 May, 2011

1 commit

  • Trivial conversion to the new truncate calling convention. Fix up one
    error-handling case that called vmtruncate(), and remove the ->truncate
    callback. We also fix a bug where IS_IMMUTABLE and IS_APPEND files could
    not be truncated during failed writes. In fact, the test can be removed
    completely, as upper layers do the necessary permission checks for
    truncate in do_sys_[f]truncate() and may_open() anyway.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

12 Feb, 2011

1 commit

  • ext4 has a data corruption case when doing non-block-aligned
    asynchronous direct IO into a sparse file, as demonstrated
    by xfstest 240.

    The root cause is that while ext4 preallocates space in the
    hole, mappings of that space still look "new" and
    dio_zero_block() will zero out the unwritten portions. When
    more than one AIO thread is going, they both find this "new"
    block and race to zero out their portion; this is uncoordinated
    and causes data corruption.

    Dave Chinner fixed this for xfs by simply serializing all
    unaligned asynchronous direct IO. I've done the same here.
    The difference is that we only wait on conversions, not all IO.
    This is a very big hammer, and I'm not very pleased with
    stuffing this into ext4_file_write(). But since ext4 is
    DIO_LOCKING, we need to serialize it at this high level.

    I tried to move this into ext4_ext_direct_IO, but by then
    we have the i_mutex already, and we will wait on the
    work queue to do conversions - which must also take the
    i_mutex. So that won't work.

    This was originally exposed by qemu-kvm installing to
    a raw disk image with a normal sector-63 alignment. I've
    tested a backport of this patch with qemu, and it does
    avoid the corruption. It is also quite a lot slower
    (14 min for package installs, vs. 8 min for well-aligned)
    but I'll take slow correctness over fast corruption any day.

    Mingming suggested that we can track outstanding
    conversions, and wait on those so that non-sparse
    files won't be affected, and I've implemented that here;
    unaligned AIO to nonsparse files won't take a perf hit.

    [tytso@mit.edu: Keep the mutex as a hashed array instead
    of bloating the ext4 inode]

    [tytso@mit.edu: Fix up namespace issues so that global
    variables are protected with an "ext4_" prefix.]

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
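Two pieces of this approach can be sketched in isolation, under stated assumptions: a check for whether an async direct write is unaligned to the filesystem block size (only writes that start inside the current file size are treated as candidates here), and the choice of a lock from a fixed, hashed array instead of adding a mutex to every inode. The function names and the table size of 37 are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_LOCKS 37  /* hypothetical size of the hashed lock array */

/* True if an async direct write of len bytes at pos starts or ends in the
 * middle of a filesystem block inside the current file size, i.e. the case
 * that would be serialized. blocksize must be a power of two. */
static bool unaligned_aio(uint64_t pos, uint64_t len, uint64_t i_size,
                          uint64_t blocksize)
{
    uint64_t mask = blocksize - 1;
    if (pos >= i_size)  /* writes starting at/after EOF are not flagged */
        return false;
    return (pos & mask) != 0 || ((pos + len) & mask) != 0;
}

/* Pick one of a fixed set of locks by hashing the inode's address, so no
 * per-inode lock field is needed. Unrelated inodes may share a lock. */
static size_t lock_index(const void *inode)
{
    return (size_t)((uintptr_t)inode % SKETCH_NR_LOCKS);
}
```

Sharing a small array of locks trades some false contention between unrelated inodes for not growing every in-memory inode.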
     

17 Jan, 2011

1 commit

  • Currently all filesystems except XFS implement fallocate asynchronously,
    while XFS forced a commit. Both of these are suboptimal: in the case of O_SYNC
    I/O we really want our allocation on disk, especially for the !KEEP_SIZE
    case where we actually grow the file with user-visible zeroes. On the
    other hand, always committing the transaction is a bad idea for fast-path
    uses of fallocate, for example in recent Samba versions. Given
    that block allocation is a data plane operation anyway, change it from
    an inode operation to a file operation so that we have the file structure
    available that lets us check for O_SYNC.

    This also includes moving the code around for a few of the filesystems,
    and removing the already unneeded S_ISDIR checks, given that we only wire
    up fallocate for regular files.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

11 Jan, 2011

1 commit


28 Oct, 2010

1 commit

  • The llseek system call should return EINVAL if passed a seek offset
    which results in a write error. What this maximum offset should be
    depends on whether or not the huge_file file system feature is set,
    and whether the file is extent-based.

    If the file does not have the "EXT4_EXTENTS_FL" flag, the maximum size
    which can be written (via the write system call) differs from the maximum
    offset which can be sought (via the lseek system call).

    For example, the following 2 cases demonstrate the differences
    between the maximum size which can be written, versus the seek offset
    allowed by the llseek system call:

    #1: mkfs.ext3 ; mount -t ext4
    #2: mkfs.ext3 ; tune2fs -Oextent,huge_file ; mount -t ext4

    Table: the maximum file size that can be written or sought, for each
    filesystem feature setting and file flag

    +======+========================+========================+
    | Case | !EXT4_EXTENTS_FL       | EXT4_EXTENTS_FL        |
    +------+------------------------+------------------------+
    | #1   | write: 2194719883264   | write: n/a             |
    |      | seek:  2199023251456   | seek:  n/a             |
    +------+------------------------+------------------------+
    | #2   | write: 4402345721856   | write: 17592186044415  |
    |      | seek:  17592186044415  | seek:  17592186044415  |
    +------+------------------------+------------------------+

    The differences exist because ext4 has 2 maxbytes values: sb->s_maxbytes
    (= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped
    maxbytes). However, generic_file_llseek uses only the extent-mapped maxbytes
    (the llseek of ext4_file_operations is generic_file_llseek, which uses
    sb->s_maxbytes).

    Therefore we create an ext4 llseek function which uses both maxbytes values.

    The new function is derived from generic_file_llseek().
    If the file flag "EXT4_EXTENTS_FL" is not set, the function uses
    EXT4_SB(inode->i_sb)->s_bitmap_maxbytes in place of inode->i_sb->s_maxbytes.

    Signed-off-by: Toshiyuki Okajima
    Signed-off-by: "Theodore Ts'o"
    Cc: Andreas Dilger

    Toshiyuki Okajima
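A minimal sketch of the idea, with hypothetical simplified types and names: the seek arithmetic follows the usual SEEK_SET/SEEK_CUR/SEEK_END rules, and the only ext4-specific step is choosing the maximum offset from the file's extent flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-filesystem limits. */
struct sketch_sb {
    int64_t s_maxbytes;         /* limit for extent-mapped files */
    int64_t s_bitmap_maxbytes;  /* limit for block-mapped files */
};

/* Hypothetical whence values for this sketch only. */
enum { SK_SEEK_SET, SK_SEEK_CUR, SK_SEEK_END };

/* Compute the new offset, rejecting anything beyond the limit that
 * applies to this file (chosen from its EXT4_EXTENTS_FL state). */
static int64_t sketch_llseek(const struct sketch_sb *sb, bool extents_fl,
                             int64_t cur, int64_t i_size,
                             int64_t offset, int whence)
{
    int64_t maxbytes = extents_fl ? sb->s_maxbytes : sb->s_bitmap_maxbytes;

    switch (whence) {
    case SK_SEEK_SET:
        break;
    case SK_SEEK_CUR:
        offset += cur;
        break;
    case SK_SEEK_END:
        offset += i_size;
        break;
    default:
        return -EINVAL;
    }
    if (offset < 0 || offset > maxbytes)
        return -EINVAL;
    return offset;
}
```

For example, using the case #2 write limits from the table as illustrative values (17592186044415 and 4402345721856), seeking a block-mapped file to 4402345721857 returns -EINVAL, while the same offset is accepted for an extent-mapped file.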
     

27 Jul, 2010

1 commit

  • By running the following reproducer, we can confirm that the write
    system call returns 0 when it should fail with EFBIG.

    #!/bin/sh

    # Create a sparse ~1 GiB image and format it as ext3 (no extents).
    /bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1
    /sbin/mkfs.ext3 -Fq ./img
    # Mount it with the ext4 driver and create an empty file.
    /bin/mount -o loop -t ext4 ./img /mnt
    /bin/touch /mnt/file
    # Write 1 KiB at byte offset 2194719883264 and show what write() returns.
    strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = "
    /bin/umount /mnt
    exit

    Signed-off-by: Toshiyuki Okajima
    Signed-off-by: "Theodore Ts'o"
    Cc: Eric Sandeen

    Toshiyuki Okajima
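The message above shows only the reproducer. One plausible shape for the missing check, offered as an assumption rather than the actual patch, is to compare the write position against the block-mapped limit for files without EXT4_EXTENTS_FL: fail with EFBIG at or past the limit, and shorten a write that crosses it. The function name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns how many bytes of a len-byte write at pos may proceed, or
 * -EFBIG if none can. Only block-mapped files are limited here. */
static int64_t sketch_write_limit(bool extents_fl, int64_t bitmap_maxbytes,
                                  int64_t pos, int64_t len)
{
    if (extents_fl)
        return len;  /* extent-mapped files are not limited by this check */
    if (pos > bitmap_maxbytes || (pos == bitmap_maxbytes && len > 0))
        return -EFBIG;
    if (pos + len > bitmap_maxbytes)
        return bitmap_maxbytes - pos;  /* short write up to the limit */
    return len;
}
```

With 2194719883264 (the offset used by the reproducer) as the limit, a 1 KiB write at exactly that offset yields -EFBIG instead of a silent 0.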
     

12 Jun, 2010

1 commit

  • We don't need to set s_dirt in most of the ext4 code when journaling
    is enabled. In ext3/4, some of the summary statistics for the number of free
    inodes, blocks, and directories are calculated from the per-block-group
    statistics when the file system is mounted or unmounted. As a
    result the superblock doesn't have to be updated, either via the
    journal or by setting s_dirt. There are a few exceptions, most
    notably when resizing the file system, where the superblock needs to
    be modified --- and in that case it should be done as a journalled
    operation if possible, and s_dirt set only in no-journal mode.

    This patch will optimize out some unneeded disk writes when using ext4
    with a journal.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

17 May, 2010

1 commit


06 Mar, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (36 commits)
    ext4: fix up rb_root initializations to use RB_ROOT
    ext4: Code cleanup for EXT4_IOC_MOVE_EXT ioctl
    ext4: Fix the NULL reference in double_down_write_data_sem()
    ext4: Fix insertion point of extent in mext_insert_across_blocks()
    ext4: consolidate in_range() definitions
    ext4: cleanup to use ext4_grp_offs_to_block()
    ext4: cleanup to use ext4_group_first_block_no()
    ext4: Release page references acquired in ext4_da_block_invalidatepages
    ext4: Fix ext4_quota_write cross block boundary behaviour
    ext4: Convert BUG_ON checks to use ext4_error() instead
    ext4: Use direct_IO_no_locking in ext4 dio read
    ext4: use ext4_get_block_write in buffer write
    ext4: mechanical rename some of the direct I/O get_block's identifiers
    ext4: make "offset" consistent in ext4_check_dir_entry()
    ext4: Handle non empty on-disk orphan link
    ext4: explicitly remove inode from orphan list after failed direct io
    ext4: fix error handling in migrate
    ext4: deprecate obsoleted mount options
    ext4: Fix fencepost error in chosing choosing group vs file preallocation.
    jbd2: clean up an assertion in jbd2_journal_commit_transaction()
    ...

    Linus Torvalds
     

05 Mar, 2010

2 commits

  • Get rid of the initialize dquot operation: it is now always called from
    the filesystem, and if a filesystem really needs its own (which none
    currently does), it can just call into its own routine directly.

    Rename the now static low-level dquot_initialize helper to __dquot_initialize
    and vfs_dq_init to dquot_initialize to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently various places in the VFS call vfs_dq_init directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the initialization. For most metadata operations
    this is a straightforward move into the methods, but for truncate and
    open it's a bit more complicated.

    For truncate, we currently only call vfs_dq_init for the sys_truncate case,
    because open already takes care of it for ftruncate and open(O_TRUNC); the
    new code causes an additional vfs_dq_init for those cases, which is harmless.

    For open the initialization is moved from do_filp_open into the open method,
    which means it happens slightly earlier now, and only for regular files.
    The latter is fine because we don't need to initialize it for operations
    on special files, and we already do it as part of the namespace operations
    for directories.

    Add a dquot_file_open helper that filesystems that support generic quotas
    can use to fill in ->open.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
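The ->open helper described above can be sketched with hypothetical stand-ins for generic_file_open() and dquot_initialize(). The sketch assumes that quota structures only need setting up when the file is opened for writing; all names and the flag value are illustrative.

```c
#define SK_FMODE_WRITE 0x2  /* hypothetical stand-in for FMODE_WRITE */

struct sketch_inode { int dquot_initialized; };
struct sketch_file { unsigned int f_mode; };

/* Stand-in for generic_file_open(); always succeeds in this sketch. */
static int sketch_generic_file_open(struct sketch_inode *inode,
                                    struct sketch_file *file)
{
    (void)inode;
    (void)file;
    return 0;
}

/* Stand-in for dquot_initialize(): just records that it ran. */
static void sketch_dquot_initialize(struct sketch_inode *inode)
{
    inode->dquot_initialized = 1;
}

/* Shape of a ->open helper: do the generic open, then set up quota
 * structures only if it succeeded and the file is opened for writing. */
static int sketch_dquot_file_open(struct sketch_inode *inode,
                                  struct sketch_file *file)
{
    int error = sketch_generic_file_open(inode, file);
    if (!error && (file->f_mode & SK_FMODE_WRITE))
        sketch_dquot_initialize(inode);
    return error;
}
```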
     

04 Mar, 2010

1 commit


25 Jan, 2010

1 commit

  • At several places we modify EXT4_I(inode)->i_state without holding
    i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
    ext4_do_update_inode, ...). These modifications are racy and we can
    lose updates to i_state. So convert handling of i_state to use bitops
    which are atomic.

    Cc: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
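A userspace analogue of the bitops approach, using C11 atomics in place of the kernel's set_bit()/clear_bit()/test_bit(); the state bit names are hypothetical. Each update is a single atomic read-modify-write, so two threads changing different bits cannot lose each other's update the way a plain `state |= bit` on a shared word can.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits, for illustration only. */
enum { ST_JDATA = 0, ST_NEW = 1, ST_XATTR = 2 };

/* Atomically set one bit. */
static void state_set(atomic_ulong *state, int bit)
{
    atomic_fetch_or(state, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
static void state_clear(atomic_ulong *state, int bit)
{
    atomic_fetch_and(state, ~(1UL << bit));
}

/* Read the current value of one bit. */
static bool state_test(atomic_ulong *state, int bit)
{
    return (atomic_load(state) >> bit) & 1UL;
}
```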
     

28 Sep, 2009

1 commit


14 Sep, 2009

1 commit


09 Sep, 2009

1 commit


13 Jun, 2009

1 commit

  • This field (the superblock's s_last_mounted) can be very helpful when a
    system administrator is trying to sort through large numbers of block
    devices or filesystem images. What is stored in this field can be
    ambiguous if multiple filesystem namespaces are in play; what we store in
    practice is the mountpoint as seen from the namespace of the process that
    first opens a file in the filesystem.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

28 Mar, 2009

1 commit

  • With delayed allocation we should not/cannot discard inode prealloc
    space during file close. We would still have dirty pages for which we
    haven't allocated blocks yet. With this fix, after each get_blocks
    request we check whether we have zero reserved blocks; if so, and if
    there are no writers on the file, we discard the inode prealloc space.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
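The discard condition described above reduces to a simple predicate. This sketch uses a hypothetical per-inode structure holding a reserved-block count and a writer count:

```c
#include <stdbool.h>

/* Hypothetical per-inode counters for illustration only. */
struct sketch_ei {
    unsigned int reserved_data_blocks;  /* delalloc blocks not yet allocated */
    int writers;                        /* opens with write access */
};

/* Checked after a get_blocks request: prealloc space may be discarded only
 * when no delayed-allocation reservations remain and nobody has the file
 * open for writing. */
static bool may_discard_prealloc(const struct sketch_ei *ei)
{
    return ei->reserved_data_blocks == 0 && ei->writers == 0;
}
```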
     

24 Feb, 2009

1 commit

  • When closing a file that had been previously truncated, force any
    delayed-allocation blocks to be allocated so that if the filesystem
    is mounted with data=ordered, the data blocks will be pushed out to
    disk along with the journal commit. Many application programs expect
    this, so we do this to avoid zero length files if the system crashes
    unexpectedly.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

23 Nov, 2008

1 commit

  • * Change EXT4_HAS_*_FEATURE to return a boolean
    * Add a function prototype for ext4_fiemap() in ext4.h
    * Make ext4_ext_fiemap_cb() and ext4_xattr_fiemap() be static functions
    * Add lock annotations to mb_free_blocks()

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

11 Oct, 2008

1 commit


10 Oct, 2008

1 commit


07 Oct, 2008

1 commit

  • ext4_ext_walk_space() was reinstated to be used for iterating over file
    extents with a callback; it is used by the ext4 fiemap implementation.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org

    Eric Sandeen
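The callback-driven walk can be sketched over an in-memory array standing in for the extent tree. The types, the callback contract (a nonzero return stops the walk) and the function name are assumptions for illustration, not the ext4 interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent, sorted by logical block. */
struct sketch_extent { uint32_t lblk; uint32_t len; uint64_t pblk; };

/* Callback returns 0 to continue, or nonzero to stop the walk. */
typedef int (*extent_cb)(const struct sketch_extent *ex, void *data);

/* Call cb for each extent overlapping [block, block + num), in order.
 * Returns 0 after a full walk, or the callback's nonzero return value. */
static int sketch_walk_space(const struct sketch_extent *exts, size_t n,
                             uint32_t block, uint32_t num,
                             extent_cb cb, void *data)
{
    uint64_t end = (uint64_t)block + num;
    for (size_t i = 0; i < n; i++) {
        uint64_t ex_end = (uint64_t)exts[i].lblk + exts[i].len;
        if (ex_end <= block || exts[i].lblk >= end)
            continue;  /* no overlap with the requested range */
        int ret = cb(&exts[i], data);
        if (ret)
            return ret;
    }
    return 0;
}
```

A fiemap-style caller would pass a callback that copies each visited extent into the user's result buffer and stops once that buffer is full.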
     

09 Sep, 2008

1 commit


12 Jul, 2008

2 commits

  • Right now i_blocks is not getting updated until the blocks are actually
    allocated on disk. This means that with delayed allocation, right after files
    are copied, "ls -sF" shows the file as taking 0 blocks on disk. "du"
    also shows the files taking zero space, which is highly confusing to the
    user.

    Since delayed allocation already keeps track of the per-inode total
    number of blocks that are subject to delayed allocation, this patch fixes
    this by using that count to adjust the value returned by stat(2). When real
    block allocation is done, i_blocks will get updated. Since the reserved
    blocks for delayed allocation will be decreased at the same time, this
    keeps the value returned by stat(2) consistent.

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
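Since stat(2) reports st_blocks in 512-byte units while the delayed-allocation reservation is counted in filesystem blocks, the adjustment needs a unit conversion. A sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Add the blocks reserved for delayed allocation (in filesystem blocks of
 * 2^blocksize_bits bytes) to the allocated count (already in 512-byte
 * units), returning the value to report as st_blocks. */
static uint64_t sketch_stat_blocks(uint64_t i_blocks_512,
                                   uint64_t reserved_fs_blocks,
                                   unsigned int blocksize_bits)
{
    return i_blocks_512 + ((reserved_fs_blocks << blocksize_bits) >> 9);
}
```

With 4 KiB blocks (blocksize_bits = 12), three reserved blocks add 24 to st_blocks.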
     
  • We would like to get notified when we are doing a write on an mmap section.
    This is needed with respect to preallocated areas. We split the preallocated
    area into an initialized extent and an uninitialized extent in the callback. This
    lets us handle ENOSPC better. Otherwise we get ENOSPC in the writepage and
    that would result in data loss. The changes are also needed to handle ENOSPC
    when writing to an mmap section of files with holes.

    Acked-by: Jan Kara
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

30 Apr, 2008

2 commits


29 Jan, 2008

2 commits


18 Jul, 2007

1 commit

  • This patch implements the ->fallocate() inode operation in ext4. With this
    patch, users of ext4 file systems will be able to use the fallocate() system
    call for persistent preallocation. The current implementation only supports
    preallocation for regular files (directories are not supported yet)
    with extent maps. This patch does not currently support block-mapped files.
    Only the FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are supported for
    now.

    Signed-off-by: Amit Arora

    Amit Arora
     

10 Jul, 2007

1 commit


13 Feb, 2007

1 commit

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition, it'll catch accidental writes to these shared
    resources at compile time.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
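A small sketch of the pattern with a hypothetical operations table: the table is defined `const` and dispatched through a pointer-to-const. Such objects are typically placed in .rodata, although the exact placement is up to the compiler and linker; the compile-time protection against writes is guaranteed by the language.

```c
/* Hypothetical operations table in the style of struct inode_operations. */
struct sketch_inode_operations {
    int (*getattr)(int ino);
};

static int sketch_getattr(int ino)
{
    return ino * 2;
}

/* const: the table can live in read-only data, and an assignment such as
 *     sketch_ops.getattr = 0;
 * is rejected by the compiler. */
static const struct sketch_inode_operations sketch_ops = {
    .getattr = sketch_getattr,
};

/* Callers only need a pointer-to-const to dispatch through the table. */
static int call_getattr(const struct sketch_inode_operations *ops, int ino)
{
    return ops->getattr(ino);
}
```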
     

09 Dec, 2006

1 commit