04 Jul, 2013

1 commit

  • Page reclaim keeps track of dirty and under-writeback pages and uses
    this to determine whether wait_iff_congested() should stall or whether
    kswapd should begin writing back pages. This fails to account for buffer
    pages that can be under writeback without PageWriteback being set, which
    is the case for filesystems like ext3 in ordered mode. Furthermore,
    PageDirty buffer pages can have all of their buffers clean, in which case
    writepage does no IO, so such pages should not be accounted as congested.

    This patch adds an address_space operation that filesystems may
    optionally use to check whether a page is really dirty or really under
    writeback. An implementation for buffer_heads is added and used for
    block operations and for ext3 in ordered mode. By default, the page
    flags are obeyed.

    Credit goes to Jan Kara for identifying that the page flags alone are
    not sufficient for ext3 and for sanity checking a number of ideas on how
    the problem could be addressed.

    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: Jiri Slaby
    Cc: Valdis Kletnieks
    Cc: Zlatko Calusic
    Cc: dormando
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
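The distinction between page flags and buffer state can be sketched as a userspace C model. This is not kernel code: struct model_page, struct model_buffer and the model_* functions are hypothetical stand-ins for struct page, buffer_heads and the new aop. The model assumes a buffer-backed page counts as dirty only if some buffer is dirty, and as under writeback if PageWriteback is set or some buffer is locked for I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct buffer_head. */
struct model_buffer {
    bool dirty;   /* models buffer_dirty() */
    bool locked;  /* models buffer_locked(): I/O in flight */
};

/* Hypothetical userspace stand-in for struct page. */
struct model_page {
    bool page_dirty;      /* models PageDirty() */
    bool page_writeback;  /* models PageWriteback() */
    struct model_buffer *buffers;
    size_t nr_buffers;
    /* Optional per-filesystem hook modelling the new aop;
     * NULL means "obey the page flags". */
    void (*is_dirty_writeback)(const struct model_page *, bool *, bool *);
};

/* Buffer-aware check: dirty if any buffer is dirty; under writeback if the
 * page flag is set or any buffer is locked for I/O. */
static void model_buffer_check(const struct model_page *page,
                               bool *dirty, bool *writeback)
{
    *dirty = false;
    *writeback = page->page_writeback;
    for (size_t i = 0; i < page->nr_buffers; i++) {
        if (page->buffers[i].dirty)
            *dirty = true;
        if (page->buffers[i].locked)
            *writeback = true;
    }
}

/* What reclaim would consult: the page flags by default, the hook if the
 * page has buffers and the filesystem provides one. */
static void model_check_dirty_writeback(const struct model_page *page,
                                        bool *dirty, bool *writeback)
{
    *dirty = page->page_dirty;
    *writeback = page->page_writeback;
    if (page->buffers != NULL && page->is_dirty_writeback != NULL)
        page->is_dirty_writeback(page, dirty, writeback);
}
```

With the hook installed, a PageDirty page whose buffers are all clean is no longer reported as dirty, and a page with a buffer under I/O is reported as under writeback even without PageWriteback; without the hook, the page flags are used unchanged.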
     

22 May, 2013

2 commits

  • The ->invalidatepage() aop now accepts a range to invalidate, so we can
    make use of it in journal_invalidatepage() and all its users in the ext3
    file system. Also update the ext3 tracepoint to print out the length
    argument.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara

    Lukas Czerner
     
  • Currently there is no way to truncate a partial page where the end
    truncate point is not at the end of the page. This is because it was not
    needed: the existing functionality was enough for the file system
    truncate operation to work properly. However, more file systems now
    support the punch hole feature, and they can benefit from mm supporting
    truncating a page only up to a certain point.

    Specifically, with this functionality truncate_inode_pages_range() can
    be changed so that it supports truncating a partial page at the end of
    the range (currently it will BUG_ON() if 'end' is not at the end of the
    page).

    This commit changes the invalidatepage() address space operation
    prototype to accept the range to be invalidated and updates all of its
    instances.

    We also change block_invalidatepage() in the same way and actually
    make use of the new length argument to implement range invalidation.

    Actual file system implementations will follow, except for the file
    systems where the changes are really simple and should not change the
    behaviour in any way. An implementation of truncate_page_range() that
    can accept page-unaligned ranges will follow as well.

    Signed-off-by: Lukas Czerner
    Cc: Andrew Morton
    Cc: Hugh Dickins

    Lukas Czerner
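Range invalidation within a single page, as described above for block_invalidatepage(), can be sketched as follows. This is a hypothetical userspace model (model_invalidate_range is not a kernel function): it assumes a page split into equal-sized buffers, discards only the buffers lying entirely inside [offset, offset + length), and keeps partially covered ones because they may still hold live data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of range invalidation inside one page. Marks which
 * buffers are discarded and returns how many, or -1 if the range does not
 * fit inside the page. */
static int model_invalidate_range(unsigned int page_size, unsigned int blocksize,
                                  unsigned int offset, unsigned int length,
                                  bool discarded[])
{
    unsigned int stop = offset + length;
    unsigned int nr = page_size / blocksize;
    int count = 0;

    /* Reject ranges past the end of the page, and unsigned wraparound. */
    if (stop > page_size || stop < length)
        return -1;

    for (unsigned int i = 0; i < nr; i++) {
        unsigned int curr_off = i * blocksize;
        unsigned int next_off = curr_off + blocksize;
        /* Only buffers fully covered by [offset, stop) are discarded. */
        discarded[i] = curr_off >= offset && next_off <= stop;
        if (discarded[i])
            count++;
    }
    return count;
}
```

Passing length = page_size - offset reproduces the old "invalidate to the end of the page" behaviour, while a smaller length leaves the tail buffers alone, which is what a punched hole ending mid-page needs.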
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

20 Mar, 2013

1 commit

  • In data=journal mode, if we unmount the file system before a
    transaction has a chance to complete, when the journal inode is being
    evicted, we can end up calling into log_wait_commit() for the
    last transaction, after the journalling machinery has been shut down.
    That triggers the WARN_ONCE in __log_start_commit().

    Arguably we should adjust ext3_should_journal_data() to return FALSE
    for the journal inode, but the only place it matters is
    ext3_evict_inode(). So, to save a bit of CPU time and to make the
    patch much more obviously correct by inspection(tm), we'll fix it by
    explicitly not waiting for a journal commit when we are evicting the
    journal inode, since that wait is guaranteed never to succeed in this
    case.

    This can be easily replicated via:

    mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb

    This is a port of the ext4 fix from Ted Ts'o.

    Signed-off-by: Jan Kara

    Jan Kara
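The eviction-time decision can be sketched as a small predicate. This is a hypothetical userspace model, not ext3's code: struct model_inode and model_should_wait_for_commit() are invented for illustration, the journal inode number is passed in rather than hard-coded, and the model assumes the wait only matters for linked inodes whose data is journalled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode model: only the fields the eviction decision needs. */
struct model_inode {
    unsigned long ino;
    bool journals_data;   /* e.g. data=journal mode */
    unsigned int nlink;
};

/* Should eviction wait for the commit holding this inode's data? Never for
 * the journal inode itself: when it is evicted at umount, the journalling
 * machinery is already shut down, so the wait could never succeed. */
static bool model_should_wait_for_commit(const struct model_inode *inode,
                                         unsigned long journal_ino)
{
    if (inode->ino == journal_ino)
        return false;
    return inode->journals_data && inode->nlink > 0;
}
```

Called with 8, the inode number ext3 reserves for its journal, the predicate returns false for the journal inode regardless of the data mode, and true for an ordinary linked inode with journalled data.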
     

21 Jan, 2013

3 commits


13 Dec, 2012

1 commit

  • Just use WARN_ON rather than an if containing only WARN_ON(1).

    A simplified version of the semantic patch that makes this transformation
    is as follows: (http://coccinelle.lip6.fr/)

    //
    @@
    expression e;
    @@
    - if (e) WARN_ON(1);
    + WARN_ON(e);
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Jan Kara

    Julia Lawall
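The equivalence the semantic patch relies on can be checked with a userspace stand-in for WARN_ON(). MODEL_WARN_ON and model_warn_on() are hypothetical; like the kernel macro they mimic, they warn only when the condition is true and evaluate to its truth value, so `if (e) WARN_ON(1);` and `WARN_ON(e);` warn in exactly the same cases.

```c
#include <assert.h>
#include <stdio.h>

/* Counts warnings so the two forms can be compared. */
static int warnings_emitted;

/* Hypothetical userspace stand-in for WARN_ON(): print a warning whenever
 * the condition is true, and return the condition's truth value. */
static int model_warn_on(int condition, const char *file, int line)
{
    if (condition) {
        warnings_emitted++;
        fprintf(stderr, "WARNING at %s:%d\n", file, line);
    }
    return condition != 0;
}

#define MODEL_WARN_ON(cond) model_warn_on(!!(cond), __FILE__, __LINE__)

/* Before the transformation: an if whose body is only WARN_ON(1). */
static void check_before(int e)
{
    if (e)
        MODEL_WARN_ON(1);
}

/* After the transformation: the condition moves into WARN_ON itself. */
static void check_after(int e)
{
    MODEL_WARN_ON(e);
}
```

Because the macro also returns the condition, the transformed form can still be used inside an if when the caller needs to react to the failure.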
     

02 Oct, 2012

1 commit

  • Pull the trivial tree from Jiri Kosina:
    "Tiny usual fixes all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    doc: fix old config name of kprobetrace
    fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
    btrfs: fix the commment for the action flags in delayed-ref.h
    btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
    vfs: fix kerneldoc for generic_fh_to_parent()
    treewide: fix comment/printk/variable typos
    ipr: fix small coding style issues
    doc: fix broken utf8 encoding
    nfs: comment fix
    platform/x86: fix asus_laptop.wled_type module parameter
    mfd: printk/comment fixes
    doc: getdelays.c: remember to close() socket on error in create_nl_socket()
    doc: aliasing-test: close fd on write error
    mmc: fix comment typos
    dma: fix comments
    spi: fix comment/printk typos in spi
    Coccinelle: fix typo in memdup_user.cocci
    tmiofb: missing NULL pointer checks
    tools: perf: Fix typo in tools/perf
    tools/testing: fix comment / output typos
    ...

    Linus Torvalds
     

04 Sep, 2012

1 commit

  • The code tracking when a transaction needs to be committed on
    fdatasync(2) forgets to handle the situation when only the inode's
    i_size is changed. Thus in such situations fdatasync(2) doesn't force
    the transaction with the new i_size to disk, and that can result in a
    wrong i_size after a crash.

    Fix the issue by updating the inode's i_datasync_tid whenever its size
    is updated.

    CC: # >= 2.6.32
    Reported-by: Kristian Nielsen
    Signed-off-by: Jan Kara

    Jan Kara
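The i_datasync_tid fix can be sketched as a userspace model. The structures and model_* functions are hypothetical, transaction IDs are compared without the wraparound handling the real journal code needs, and the model assumes fdatasync(2) only has to force a commit when the inode's i_datasync_tid is newer than the last committed transaction.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_tid_t;

/* Hypothetical journal model. */
struct model_journal {
    model_tid_t running_tid;    /* transaction currently accepting changes */
    model_tid_t committed_tid;  /* last transaction safely on disk */
};

/* Hypothetical inode model. */
struct model_inode {
    long long i_size;
    model_tid_t i_sync_tid;      /* last transaction fsync must wait for */
    model_tid_t i_datasync_tid;  /* last transaction fdatasync must wait for */
};

/* Any metadata change (e.g. a timestamp) is recorded for fsync only. */
static void model_touch_metadata(struct model_inode *inode,
                                 const struct model_journal *j)
{
    inode->i_sync_tid = j->running_tid;
}

/* A size change matters for data integrity, so it is also recorded for
 * fdatasync: this is the step the fix adds. */
static void model_set_size(struct model_inode *inode,
                           const struct model_journal *j, long long size)
{
    inode->i_size = size;
    model_touch_metadata(inode, j);
    inode->i_datasync_tid = j->running_tid;
}

/* fdatasync forces a commit only if its transaction is not yet on disk
 * (plain comparison; tid wraparound is ignored in this model). */
static bool model_fdatasync_needs_commit(const struct model_inode *inode,
                                         const struct model_journal *j)
{
    return inode->i_datasync_tid > j->committed_tid;
}
```

A timestamp-only change bumps i_sync_tid but not i_datasync_tid, so fdatasync(2) may skip the commit; a size change now bumps i_datasync_tid as well, so the new i_size reaches disk.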
     

02 Sep, 2012

1 commit


04 Aug, 2012

1 commit


29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

16 May, 2012

1 commit


06 May, 2012

1 commit

  • After we moved inode_sync_wait() out of end_writeback(), it no longer
    makes sense to call the function end_writeback(). Rename it to
    clear_inode(), which describes well what the function really does: set
    the I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

01 Apr, 2012

1 commit


01 Mar, 2012

1 commit


10 Jan, 2012

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2/3/4: delete unneeded includes of module.h
    ext{3,4}: Fix potential race when setversion ioctl updates inode
    udf: Mark LVID buffer as uptodate before marking it dirty
    ext3: Don't warn from writepage when readonly inode is spotted after error
    jbd: Remove j_barrier mutex
    reiserfs: Force inode evictions before umount to avoid crash
    reiserfs: Fix quota mount option parsing
    udf: Treat symlink component of type 2 as /
    udf: Fix deadlock when converting file from in-ICB one to normal one
    udf: Cleanup calling convention of inode_getblk()
    ext2: Fix error handling on inode bitmap corruption
    ext3: Fix error handling on inode bitmap corruption
    ext3: replace ll_rw_block with other functions
    ext3: NULL dereference in ext3_evict_inode()
    jbd: clear revoked flag on buffers before a new transaction started
    ext3: call ext3_mark_recovery_complete() when recovery is really needed

    Linus Torvalds
     

09 Jan, 2012

3 commits

  • Delete any instances of include module.h that were not strictly
    required. In the case of ext2, the declarations of MODULE_LICENSE
    etc. were in inode.c but the module_init/exit were in super.c, so
    relocate the MODULE_LICENSE/AUTHOR block to super.c, which makes it
    consistent with ext3 and ext4 at the same time.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Jan Kara

    Paul Gortmaker
     
  • WARN_ON_ONCE(IS_RDONLY(inode)) tends to trip when the filesystem hits an
    error and is remounted read-only. This unnecessarily scares users (well,
    they should be scared because of the filesystem error, but the stack trace
    distracts them from the right source of their fear ;-). We could just
    remove the WARN_ON, but it's not hard to fix it so that it does not trip on
    a filesystem with errors and does not use more cycles in the common case,
    so that's what we do.

    CC: stable@kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     
  • ll_rw_block() is deprecated. Thus we replace it with other functions.

    CC: Jan Kara
    Signed-off-by: Zheng Liu
    Signed-off-by: Jan Kara

    Zheng Liu
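The refined WARN_ON_ONCE(IS_RDONLY(inode)) condition from the second of these commits can be sketched as a predicate. This is a hypothetical model: the field and function names are invented, and it assumes the fix tests whether the filesystem has recorded an error in addition to IS_RDONLY(). Because the error check is only evaluated when the read-only test is already true, writable filesystems pay nothing extra.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical superblock state for the model. */
struct model_sb {
    bool read_only;  /* models IS_RDONLY() */
    bool error_fs;   /* an error was recorded and the fs went read-only */
};

/* Warn about writepage seeing a read-only inode only when the filesystem is
 * read-only for some reason other than an error. The && short-circuits, so
 * the common writable case does a single test. */
static bool model_should_warn_rdonly(const struct model_sb *sb)
{
    return sb->read_only && !sb->error_fs;
}
```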
     

02 Dec, 2011

1 commit


22 Nov, 2011

1 commit

  • This is an fsfuzzer bug. ->s_journal is set at the end of
    ext3_load_journal() but we try to use it in the error handling from
    ext3_get_journal() while it's still NULL.

    [ 337.039041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
    [ 337.040380] IP: [] _raw_spin_lock+0x9/0x30
    [ 337.041687] PGD 0
    [ 337.043118] Oops: 0002 [#1] SMP
    [ 337.044483] CPU 3
    [ 337.044495] Modules linked in: ecb md4 cifs fuse kvm_intel kvm brcmsmac brcmutil crc8 cordic r8169 [last unloaded: scsi_wait_scan]
    [ 337.047633]
    [ 337.049259] Pid: 8308, comm: mount Not tainted 3.2.0-rc2-next-20111121+ #24 SAMSUNG ELECTRONICS CO., LTD. RV411/RV511/E3511/S3511 /RV411/RV511/E3511/S3511
    [ 337.051064] RIP: 0010:[] [] _raw_spin_lock+0x9/0x30
    [ 337.052879] RSP: 0018:ffff8800b1d11ae8 EFLAGS: 00010282
    [ 337.054668] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffff8800b77c2000
    [ 337.056400] RDX: ffff8800a97b5c00 RSI: 0000000000000000 RDI: 0000000000000024
    [ 337.058099] RBP: ffff8800b1d11ae8 R08: 6000000000000000 R09: e018000000000000
    [ 337.059841] R10: ff67366cc2607c03 R11: 00000000110688e6 R12: 0000000000000000
    [ 337.061607] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800a78f06e8
    [ 337.063385] FS: 00007f9d95652800(0000) GS:ffff8800b7180000(0000) knlGS:0000000000000000
    [ 337.065110] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 337.066801] CR2: 0000000000000024 CR3: 00000000aef2c000 CR4: 00000000000006e0
    [ 337.068581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 337.070321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 337.072105] Process mount (pid: 8308, threadinfo ffff8800b1d10000, task ffff8800b1d02be0)
    [ 337.073800] Stack:
    [ 337.075487] ffff8800b1d11b08 ffffffff811f48cf ffff88007ac9b158 0000000000000000
    [ 337.077255] ffff8800b1d11b38 ffffffff8119405d ffff88007ac9b158 ffff88007ac9b250
    [ 337.078851] ffffffff8181bda0 ffffffff8181bda0 ffff8800b1d11b68 ffffffff81131e31
    [ 337.080284] Call Trace:
    [ 337.081706] [] log_start_commit+0x1f/0x40
    [ 337.083107] [] ext3_evict_inode+0x1fd/0x2a0
    [ 337.084490] [] evict+0xa1/0x1a0
    [ 337.085857] [] iput+0x101/0x210
    [ 337.087220] [] iget_failed+0x21/0x30
    [ 337.088581] [] ext3_iget+0x15c/0x450
    [ 337.089936] [] ? ext3_rsv_window_add+0x81/0x100
    [ 337.091284] [] ext3_get_journal+0x15/0xde
    [ 337.092641] [] ext3_fill_super+0xf2b/0x1c30
    [ 337.093991] [] ? register_shrinker+0x4d/0x60
    [ 337.095332] [] mount_bdev+0x1a2/0x1e0
    [ 337.096680] [] ? ext3_setup_super+0x210/0x210
    [ 337.098026] [] ext3_mount+0x10/0x20
    [ 337.099362] [] mount_fs+0x3e/0x1b0
    [ 337.100759] [] ? __alloc_percpu+0xb/0x10
    [ 337.102330] [] vfs_kern_mount+0x65/0xc0
    [ 337.103889] [] do_kern_mount+0x4f/0x100
    [ 337.105442] [] do_mount+0x19c/0x890
    [ 337.106989] [] ? memdup_user+0x46/0x90
    [ 337.108572] [] ? strndup_user+0x53/0x70
    [ 337.110114] [] sys_mount+0x8b/0xe0
    [ 337.111617] [] system_call_fastpath+0x16/0x1b
    [ 337.113133] Code: 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 0f b6 07 38
    [ 337.116588] RIP [] _raw_spin_lock+0x9/0x30
    [ 337.118260] RSP
    [ 337.119998] CR2: 0000000000000024
    [ 337.188701] ---[ end trace c36d790becac1615 ]---

    Signed-off-by: Dan Carpenter
    Signed-off-by: Jan Kara

    Dan Carpenter
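The NULL guard needed in the eviction path can be sketched in userspace. The commit message does not show the exact check, so this is a hypothetical model: struct model_sb_info mirrors only the idea that s_journal stays NULL until ext3_load_journal() finishes, and model_evict_start_commit() is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical journal and per-superblock info. */
struct model_journal {
    int commits_started;
};

struct model_sb_info {
    struct model_journal *s_journal;  /* NULL until journal loading finishes */
};

/* Eviction-time commit request that tolerates a journal which has not been
 * set up yet, as when the journal inode lookup itself fails during mount.
 * Returns 1 if a commit was started, 0 otherwise. */
static int model_evict_start_commit(struct model_sb_info *sbi)
{
    if (sbi->s_journal == NULL)
        return 0;
    sbi->s_journal->commits_started++;
    return 1;
}
```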
     

02 Nov, 2011

1 commit


23 Aug, 2011

2 commits


27 Jul, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t
    ext3.txt: update the links in the section "useful links" to the latest ones
    ext3: Fix data corruption in inodes with journalled data
    ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
    ext3: Fix compilation with -DDX_DEBUG
    quota: Remove unused declaration
    jbd: Use WRITE_SYNC in journal checkpoint.
    jbd: Fix oops in journal_remove_journal_head()
    ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()
    ext3/ioctl.c: silence sparse warnings about different address spaces
    ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
    ext3: Improve truncate error handling
    ext3: use proper little-endian bitops
    ext2: include fs.h into ext2_fs.h
    ext3: Fix oops in ext3_try_to_allocate_with_rsv()
    jbd: fix a bug of leaking jh->b_jcount
    jbd: remove dependency on __GFP_NOFAIL
    ext3: Convert ext3 to new truncate calling convention
    jbd: Add fixed tracepoints
    ext3: Add fixed tracepoints

    Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and
    new fixed tracepoints.

    Linus Torvalds
     

23 Jul, 2011

1 commit

  • When journalling data for an inode (either because it is a symlink or
    because the filesystem is mounted in data=journal mode), ext3_evict_inode()
    can discard unwritten data by calling truncate_inode_pages(). This is
    because we don't mark the buffer / page dirty when journalling data but only
    add the buffer to the running transaction, and thus mm does not know there
    is still unwritten data.

    Fix the problem by carefully tracking the transaction containing the
    inode's data, committing this transaction, and writing uncheckpointed
    buffers when the inode should be reaped.

    Signed-off-by: Jan Kara

    Jan Kara
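Why journalled data must be committed before the page cache is dropped can be sketched as a model. The types and functions are hypothetical, tids are compared without wraparound handling, and the sketch leaves out the writing of uncheckpointed buffers that the real fix also performs.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_tid_t;

/* Hypothetical journal model. */
struct model_journal {
    model_tid_t committed_tid;
    int commits_forced;
};

/* Hypothetical inode model. */
struct model_inode {
    bool journals_data;        /* symlink or data=journal */
    model_tid_t data_tid;      /* transaction holding the latest data */
    bool page_cache_dropped;
};

/* Force the journal to commit up to and including tid (no wraparound). */
static void model_commit_and_wait(struct model_journal *j, model_tid_t tid)
{
    if (tid > j->committed_tid) {
        j->committed_tid = tid;
        j->commits_forced++;
    }
}

/* Journalled data lives only in the transaction, not in dirty pages, so
 * mm cannot see it; commit that transaction before the page cache is
 * thrown away, otherwise the data is silently lost. */
static void model_evict(struct model_inode *inode, struct model_journal *j)
{
    if (inode->journals_data)
        model_commit_and_wait(j, inode->data_tid);
    inode->page_cache_dropped = true;
}
```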
     

21 Jul, 2011

2 commits

  • Simple filesystems always pass inode->i_sb_bdev as the block device
    argument, and never need an end_io handler. Let's simplify things for
    them, and for my grepping activity, by dropping these arguments. The
    only thing not falling into that scheme is ext4, which passes an
    end_io handler without needing special flags (yet), but given how
    messy the direct I/O code there is, using __blockdev_direct_IO in one
    instead of two out of three cases isn't going to make a large
    difference anyway.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Let filesystems handle waiting for direct I/O requests themselves instead
    of doing it beforehand. This means filesystem-specific locks that prevent
    new dio references from appearing can be held. This is important to allow
    generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
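The API simplification from the first of these two commits amounts to a thin wrapper over the full entry point. A hypothetical sketch: model_dio_full() stands in for __blockdev_direct_IO(), model_dio_simple() for the simplified helper used by simple filesystems, and the argument list is reduced to the block device, end_io handler and flags; the real functions take more parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for block devices, superblocks and inodes. */
struct model_bdev { const char *name; };
struct model_sb { struct model_bdev *s_bdev; };
struct model_inode { struct model_sb *i_sb; };
typedef void (*model_end_io_t)(void *private_data);

enum { MODEL_DIO_LOCKING = 1 };

/* Records the arguments the full entry point received. */
struct model_dio_call {
    struct model_bdev *bdev;
    model_end_io_t end_io;
    int flags;
};

/* Full-featured entry point: every argument is explicit. */
static struct model_dio_call model_dio_full(struct model_inode *inode,
                                            struct model_bdev *bdev,
                                            model_end_io_t end_io, int flags)
{
    (void)inode;
    struct model_dio_call call = { bdev, end_io, flags };
    return call;
}

/* Simple-filesystem wrapper: the block device always comes from the inode's
 * superblock and no end_io handler is needed, so callers omit both. */
static struct model_dio_call model_dio_simple(struct model_inode *inode)
{
    return model_dio_full(inode, inode->i_sb->s_bdev, NULL, MODEL_DIO_LOCKING);
}
```

Filesystems with special needs, such as ext4's end_io handler, keep calling the full entry point directly.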
     

25 Jun, 2011

3 commits

  • The new truncate calling convention allows us to handle errors from
    ext3_block_truncate_page(). So reorganize the code so that
    ext3_block_truncate_page() is called before we change the inode size.

    This also removes unnecessary block zeroing from error recovery after failed
    buffered writes (zeroing isn't needed because we could never have written
    non-zero data to disk). We have to be careful and keep zeroing in direct IO
    write error recovery, because there we might have already overwritten the
    end of the last file block.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Mostly trivial conversion. While changing the code, we fix a bug where
    IS_IMMUTABLE and IS_APPEND files could not be truncated during failed
    writes. In fact the test is not needed at all, because both IS_IMMUTABLE
    and IS_APPEND are tested in upper layers in do_sys_[f]truncate(),
    may_write(), etc.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • This commit adds fixed tracepoints to the ext3 code. It is based on the
    ext4 tracepoints; however, due to the differences between the two file
    systems, some tracepoints are missing (those for delalloc and for the
    multi-block allocator) and there are some ext3-specific ones as well (for
    reservation windows).

    Here is a list:

    ext3_free_inode
    ext3_request_inode
    ext3_allocate_inode
    ext3_evict_inode
    ext3_drop_inode
    ext3_mark_inode_dirty
    ext3_write_begin
    ext3_ordered_write_end
    ext3_writeback_write_end
    ext3_journalled_write_end
    ext3_ordered_writepage
    ext3_writeback_writepage
    ext3_journalled_writepage
    ext3_readpage
    ext3_releasepage
    ext3_invalidatepage
    ext3_discard_blocks
    ext3_request_blocks
    ext3_allocate_blocks
    ext3_free_blocks
    ext3_sync_file_enter
    ext3_sync_file_exit
    ext3_sync_fs
    ext3_rsv_window_add
    ext3_discard_reservation
    ext3_alloc_new_reservation
    ext3_reserved
    ext3_forget
    ext3_read_block_bitmap
    ext3_direct_IO_enter
    ext3_direct_IO_exit
    ext3_unlink_enter
    ext3_unlink_exit
    ext3_truncate_enter
    ext3_truncate_exit
    ext3_get_blocks_enter
    ext3_get_blocks_exit
    ext3_load_inode

    Signed-off-by: Lukas Czerner
    Cc: Jan Kara
    Signed-off-by: Jan Kara

    Lukas Czerner
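The reordering in the first of these three commits (calling ext3_block_truncate_page() before changing the inode size) can be sketched as follows. This is a hypothetical model: model_block_truncate_page() only simulates success or failure and does no zeroing, and all names are invented.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical inode model for the truncate ordering. */
struct model_inode {
    long long i_size;
    int fail_zeroing;   /* simulation hook: make partial-block zeroing fail */
};

/* Stand-in for zeroing the tail of the last partial block. */
static int model_block_truncate_page(struct model_inode *inode, long long newsize)
{
    (void)newsize;
    return inode->fail_zeroing ? -EIO : 0;
}

/* Do the step that can fail first, and commit the size change only after it
 * succeeds, so an error leaves i_size untouched and reaches the caller. */
static int model_setsize(struct model_inode *inode, long long newsize)
{
    if (newsize < inode->i_size) {
        int err = model_block_truncate_page(inode, newsize);
        if (err)
            return err;
    }
    inode->i_size = newsize;
    return 0;
}
```

Under the old ordering the size would already have been changed when the zeroing failed, leaving nothing sensible to report; here a failure is simply returned.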
     

27 May, 2011

1 commit

  • Tell the filesystem whether we just updated the timestamp (I_DIRTY_SYNC)
    or anything else, so that the filesystem can track internally whether it
    needs to push out a transaction for fdatasync or not.

    This is just the prototype change with no user for it yet. I plan
    to push large XFS changes for the next merge window, and getting
    this trivial infrastructure in this window would help a lot to avoid
    tree interdependencies.

    Also remove incorrect comments saying that ->dirty_inode can't block. That
    was changed a long time ago, and many implementations rely on it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
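How a filesystem might use the new flags argument to ->dirty_inode can be sketched as below. This is hypothetical: the commit only changes the prototype, the MODEL_I_DIRTY_* values are local to this example (they borrow the kernel's I_DIRTY_SYNC and I_DIRTY_DATASYNC names), and the tracking field is invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names borrowed from the kernel's inode dirty bits; the numeric
 * values are local to this model. */
enum {
    MODEL_I_DIRTY_SYNC     = 1 << 0,  /* only timestamps changed */
    MODEL_I_DIRTY_DATASYNC = 1 << 1,  /* change that fdatasync must persist */
};

/* Hypothetical filesystem-private inode state. */
struct model_inode {
    bool datasync_needed;
};

/* Hypothetical ->dirty_inode(inode, flags): with the flags available, the
 * filesystem can note whether fdatasync will have to push a transaction. */
static void model_dirty_inode(struct model_inode *inode, int flags)
{
    if (flags & MODEL_I_DIRTY_DATASYNC)
        inode->datasync_needed = true;
}

static bool model_fdatasync_must_commit(const struct model_inode *inode)
{
    return inode->datasync_needed;
}
```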
     

08 Apr, 2011

1 commit


31 Mar, 2011

1 commit


24 Mar, 2011

1 commit


10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
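Explicit on-stack plugging means the caller brackets a batch of submissions and owns the plug in its own stack frame (in the kernel, a struct blk_plug passed to blk_start_plug() and blk_finish_plug()). The model below mimics only the batching idea; its types and functions are hypothetical, and the real block layer also flushes plugs in other situations, for example when the task sleeps.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PLUG_MAX 16
#define MODEL_LOG_MAX 64

/* Hypothetical on-stack plug: requests submitted while it is active are held
 * in this caller-owned batch instead of being dispatched one by one. */
struct model_plug {
    int pending[MODEL_PLUG_MAX];
    size_t nr_pending;
};

/* Record of dispatched requests, in dispatch order. */
static int dispatched[MODEL_LOG_MAX];
static size_t nr_dispatched;

static void model_dispatch(int req)
{
    if (nr_dispatched < MODEL_LOG_MAX)
        dispatched[nr_dispatched++] = req;
}

static void model_flush_plug(struct model_plug *plug)
{
    for (size_t i = 0; i < plug->nr_pending; i++)
        model_dispatch(plug->pending[i]);
    plug->nr_pending = 0;
}

static void model_start_plug(struct model_plug *plug)
{
    plug->nr_pending = 0;
}

/* With a plug, queue the request (flushing first if the batch is full);
 * without one, dispatch it immediately. */
static void model_submit(struct model_plug *plug, int req)
{
    if (plug == NULL) {
        model_dispatch(req);
        return;
    }
    if (plug->nr_pending == MODEL_PLUG_MAX)
        model_flush_plug(plug);
    plug->pending[plug->nr_pending++] = req;
}

/* Ending the plug dispatches everything that was batched. */
static void model_finish_plug(struct model_plug *plug)
{
    model_flush_plug(plug);
}
```

A caller declares a struct model_plug on its stack, calls model_start_plug(), submits several requests and then calls model_finish_plug(); nothing reaches model_dispatch() until the plug is finished or the batch fills up, which is why a separate per-page ->sync_page() kick is no longer needed in this scheme.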
     

11 Jan, 2011

1 commit