08 Apr, 2014

3 commits

  • Merge second patch-bomb from Andrew Morton:
    - the rest of MM
    - zram updates
    - zswap updates
    - exit
    - procfs
    - exec
    - wait
    - crash dump
    - lib/idr
    - rapidio
    - adfs, affs, bfs, ufs
    - cris
    - Kconfig things
    - initramfs
    - small amount of IPC material
    - percpu enhancements
    - early ioremap support
    - various other misc things

    * emailed patches from Andrew Morton: (156 commits)
    MAINTAINERS: update Intel C600 SAS driver maintainers
    fs/ufs: remove unused ufs_super_block_third pointer
    fs/ufs: remove unused ufs_super_block_second pointer
    fs/ufs: remove unused ufs_super_block_first pointer
    fs/ufs/super.c: add __init to init_inodecache()
    doc/kernel-parameters.txt: add early_ioremap_debug
    arm64: add early_ioremap support
    arm64: initialize pgprot info earlier in boot
    x86: use generic early_ioremap
    mm: create generic early_ioremap() support
    x86/mm: sparse warning fix for early_memremap
    lglock: map to spinlock when !CONFIG_SMP
    percpu: add preemption checks to __this_cpu ops
    vmstat: use raw_cpu_ops to avoid false positives on preemption checks
    slub: use raw_cpu_inc for incrementing statistics
    net: replace __this_cpu_inc in route.c with raw_cpu_inc
    modules: use raw_cpu_write for initialization of per cpu refcount.
    mm: use raw_cpu ops for determining current NUMA node
    percpu: add raw_cpu_ops
    slub: fix leak of 'name' in sysfs_slab_add
    ...

    Linus Torvalds
     
  • filemap_map_pages() is a generic implementation of ->map_pages() for
    filesystems that use the page cache.

    It should be safe to use filemap_map_pages() for ->map_pages() if the
    filesystem uses filemap_fault() for ->fault().

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Alexander Viro
    Cc: Dave Chinner
    Cc: Ning Qu
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Pull f2fs updates from Jaegeuk Kim:
    "This patch-set includes the following major enhancement patches.
    - introduce large directory support
    - introduce f2fs_issue_flush to merge redundant flush commands
    - merge write IOs as much as possible aligned to the segment
    - add sysfs entries to tune the f2fs configuration
    - use radix_tree for the free_nid_list to reduce in-memory operations
    - remove costly bit operations in f2fs_find_entry
    - enhance the readahead flow for CP/NAT/SIT/SSA blocks

    The other bug fixes are as follows:
    - recover xattr node blocks correctly after sudden-power-cut
    - fix to calculate the maximum number of node ids
    - enhance to handle many error cases

    And, there are a bunch of cleanups"

    * tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits)
    f2fs: fix wrong statistics of inline data
    f2fs: check the acl's validity before setting
    f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
    f2fs: fix to cover io->bio with io_rwsem
    f2fs: fix error path when fail to read inline data
    f2fs: use list_for_each_entry{_safe} for simplyfying code
    f2fs: avoid free slab cache under spinlock
    f2fs: avoid unneeded lookup when xattr name length is too long
    f2fs: avoid unnecessary bio submit when wait page writeback
    f2fs: return -EIO when node id is not matched
    f2fs: avoid RECLAIM_FS-ON-W warning
    f2fs: skip unnecessary node writes during fsync
    f2fs: introduce fi->i_sem to protect fi's info
    f2fs: change reclaim rate in percentage
    f2fs: add missing documentation for dir_level
    f2fs: remove unnecessary threshold
    f2fs: throttle the memory footprint with a sysfs entry
    f2fs: avoid to drop nat entries due to the negative nr_shrink
    f2fs: call f2fs_wait_on_page_writeback instead of native function
    f2fs: introduce nr_pages_to_write for segment alignment
    ...

    Linus Torvalds
     

07 Apr, 2014

3 commits

  • If we remove a file that has inline data after mount, our statistics become
    inaccurate.

    cat /sys/kernel/debug/f2fs/status
    - Inline_data Inode: 4294967295

    Let's add stat_inc_inline_inode() to count the file's inline inode at lookup
    time.

    Change log from v1:
    o count in f2fs_lookup() instead of in do_read_inode() to exclude wrong stats.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
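
    The value 4294967295 in the status output above is what an unsigned 32-bit
    counter shows after being decremented below zero. A minimal sketch,
    assuming the counter is 32 bits wide (inferred from the printed value, not
    stated in the commit message); dec_u32 is a hypothetical helper name:

```python
U32_MASK = 0xFFFFFFFF  # assumption: the statistic is a 32-bit unsigned counter


def dec_u32(counter: int) -> int:
    """Decrement a counter with 32-bit unsigned wraparound."""
    return (counter - 1) & U32_MASK


# Removing a file whose inline inode was never counted decrements from 0.
print(dec_u32(0))  # 4294967295
```

    Counting the inode at lookup time means the later decrement has a matching
    increment, so the counter no longer wraps.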
     
  • Before setting the acl, call posix_acl_valid() to check if it is
    valid or not.

    Signed-off-by: zhangzhen
    Signed-off-by: Jaegeuk Kim

    ZhangZhen
     
  • Some storage devices show relatively high latencies when completing
    cache_flush commands, even though their normal IO speed is quite high. In
    such cases, cache_flush commands should be merged as much as possible to
    avoid issuing them redundantly.
    So, this patch introduces a mount option, "-o flush_merge", to mitigate this
    overhead.

    If this option is enabled by the user, F2FS merges the cache_flush commands
    and then issues just one cache_flush on behalf of them. Once the single
    command has finished, F2FS sends a completion signal to all the pending
    threads.

    Note that this option is intended for workloads consisting of very intensive
    concurrent fsync calls on storage that handles cache_flush commands slowly.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
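
    The merging idea described above can be illustrated with a small,
    single-threaded model. This is only a sketch: FlushMerger, submit() and
    dispatch() are hypothetical names, and the real implementation works with
    kernel threads and completions rather than Python dictionaries.

```python
class FlushMerger:
    """Toy model: many waiters, one device flush per batch."""

    def __init__(self):
        self.pending = []        # requests waiting for a flush
        self.device_flushes = 0  # cache_flush commands actually issued

    def submit(self, name):
        """Queue a flush request on behalf of one fsync caller."""
        request = {"name": name, "done": False}
        self.pending.append(request)
        return request

    def dispatch(self):
        """Issue one cache_flush for the whole batch, then complete everyone."""
        if not self.pending:
            return 0
        batch, self.pending = self.pending, []
        self.device_flushes += 1      # a single command covers the batch
        for request in batch:
            request["done"] = True    # completion signal to each waiter
        return len(batch)


merger = FlushMerger()
requests = [merger.submit(f"thread-{i}") for i in range(4)]
merger.dispatch()
print(merger.device_flushes, all(r["done"] for r in requests))  # 1 True
```

    Four callers are satisfied by one flush command, which is where the saving
    comes from when the device handles cache_flush slowly.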
     

05 Apr, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
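
    The ordering between the flag and the final truncate can be modelled in a
    few lines. This is a sketch under simplifying assumptions: the tree is a
    plain dictionary, the tree lock is omitted, and Mapping, reclaim_page() and
    evict_inode() are hypothetical names.

```python
class Mapping:
    """Toy page-cache mapping: slot index -> page or shadow entry."""

    def __init__(self):
        self.tree = {}
        self.exiting = False  # stands in for AS_EXITING


def reclaim_page(mapping, index, shadow):
    """Evict a page; leave a shadow entry only if the inode is not exiting."""
    mapping.tree.pop(index, None)
    if not mapping.exiting:
        mapping.tree[index] = shadow


def evict_inode(mapping):
    """Set the flag first, then truncate, so the tree stays empty afterwards."""
    mapping.exiting = True
    mapping.tree.clear()
```

    Once evict_inode() has run, a late reclaim_page() call removes nothing and
    installs nothing, which is the guarantee the inode freeing code needs.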
     

02 Apr, 2014

4 commits


01 Apr, 2014

3 commits


20 Mar, 2014

8 commits

  • This patch should resolve the following possible bug.

    RECLAIM_FS-ON-W at:
    mark_held_locks+0xb9/0x140
    lockdep_trace_alloc+0x85/0xf0
    __kmalloc+0x53/0x1d0
    read_all_xattrs+0x3d1/0x3f0 [f2fs]
    f2fs_getxattr+0x4f/0x100 [f2fs]
    f2fs_get_acl+0x4c/0x290 [f2fs]
    get_acl+0x4f/0x80
    posix_acl_create+0x72/0x180
    f2fs_init_acl+0x29/0xcc [f2fs]
    __f2fs_add_link+0x259/0x710 [f2fs]
    f2fs_create+0xad/0x1c0 [f2fs]
    vfs_create+0xed/0x150
    do_last+0xd36/0xed0
    path_openat+0xc5/0x680
    do_filp_open+0x43/0xa0
    do_sys_open+0x13c/0x230
    SyS_creat+0x1e/0x20
    system_call_fastpath+0x16/0x1b

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If multiple redundant fsync calls are triggered, we don't need to keep
    writing the node pages with the fsync mark.

    So, this patch adds FI_NEED_FSYNC to track whether the latest node block is
    written with the fsync mark or not.
    If the mark was set, a new fsync doesn't need to write a node block.
    Otherwise, we should write a new node block with the mark for roll-forward
    recovery.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
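
    A toy model of the skip logic described above. The attribute
    latest_node_has_fsync_mark is a hypothetical stand-in for the state that the
    flag tracks; the sketch does not reproduce the actual f2fs code.

```python
class Inode:
    """Toy inode that counts how many node blocks fsync actually writes."""

    def __init__(self):
        self.latest_node_has_fsync_mark = False
        self.node_writes = 0

    def modify(self):
        """Any change means the latest node block no longer carries the mark."""
        self.latest_node_has_fsync_mark = False

    def fsync(self):
        """Write a marked node block only if the latest one lacks the mark."""
        if self.latest_node_has_fsync_mark:
            return False              # redundant fsync: nothing to write
        self.node_writes += 1
        self.latest_node_has_fsync_mark = True
        return True
```

    Calling fsync() three times in a row writes one node block; a modify() in
    between makes the next fsync() write again.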
     
  • This patch introduces fi->i_sem to protect fi's info, which includes
    xattr_ver, pino, and i_nlink.
    This makes it possible to drop i_mutex during f2fs_sync_file, resulting in a
    performance improvement when a number of fsync calls are triggered from many
    concurrent threads.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • It is more reasonable to determine the reclaiming rate of prefree segments
    as a percentage of the volume size, which is set to 5% by default.
    For example, if the volume is 128GB, the prefree segments are reclaimed
    when their total size reaches 6.4GB.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
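
    The arithmetic in the example above, written out. The function name is
    hypothetical and the calculation is done in gigabytes for readability,
    whereas the kernel counts segments.

```python
def prefree_reclaim_threshold_gb(volume_gb, percent=5):
    """Size of prefree space at which reclaim starts, as a share of the volume."""
    return volume_gb * percent / 100


print(prefree_reclaim_threshold_gb(128))  # 6.4
```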
     
  • NM_WOUT_THRESHOLD is now obsolete, since f2fs now controls the caches on the
    basis of the memory footprint.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch introduces ram_thresh, a sysfs entry, which controls the memory
    footprint used by the free nid list and the nat cache.

    Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat
    cache was managed by NM_WOUT_THRESHOLD.
    However, this approach cannot adapt dynamically to the system.

    So, this patch adds ram_thresh, which lets users specify the threshold in
    units of 1 / 1024 of the total RAM.
    For example, if the total RAM size is 4GB and the value is left at its
    default of 10, f2fs tries to keep the free nids and nat caches from
    consuming more than 10 * (4GB / 1024) = 40MB.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
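
    The threshold formula can be checked with a short calculation. The function
    name is hypothetical; the formula is the one given in the commit message,
    total RAM times ram_thresh divided by 1024.

```python
GIB = 1 << 30
MIB = 1 << 20


def ram_threshold_bytes(total_ram_bytes, ram_thresh=10):
    """Memory budget for free nids and nat caches: ram_thresh / 1024 of RAM."""
    return total_ram_bytes * ram_thresh // 1024


print(ram_threshold_bytes(4 * GIB) // MIB)  # 40
```

    With the default of 10, the budget works out to 10MB per 1GB of RAM.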
     
  • try_to_free_nats should not receive a negative nr_shrink.
    Otherwise, its while loop can drop all the nat entries.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
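
    Why a negative count drains everything: a sketch assuming the loop runs
    while nr_shrink is non-zero and entries remain. The loop shape is an
    assumption made for illustration, and both function names are hypothetical.

```python
def free_nats_unguarded(entries, nr_shrink):
    """Drop entries while nr_shrink is non-zero (assumed loop condition)."""
    dropped = 0
    while nr_shrink != 0 and entries:
        entries.pop()
        nr_shrink -= 1   # a negative count moves away from zero
        dropped += 1
    return dropped


def free_nats_guarded(entries, nr_shrink):
    """Refuse to shrink at all when the requested count is not positive."""
    if nr_shrink <= 0:
        return 0
    return free_nats_unguarded(entries, nr_shrink)
```

    With ten entries and nr_shrink = -3, the unguarded version drops all ten,
    while the guarded version drops none.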
     
  • If a page is under writeback, f2fs can deadlock inside writepages.
    This is caused by IOs being merged inside f2fs, so when this case is
    detected, let's submit the merged IOs, which is what
    f2fs_wait_on_page_writeback does.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

18 Mar, 2014

8 commits

  • This patch introduces nr_pages_to_write to align page writes to the segment
    or other operational unit size, which can be tuned according to the system
    environment.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch increases pages_skipped when skipping writepages.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch introduces nr_pages_to_skip(sbi, type) to determine whether
    writepages can be skipped.
    The dentry, node, and meta pages can be controlled by F2FS without breaking
    the FS consistency.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • The get_dirty_dents gives us the number of dirty dentry pages.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Previously, 'background_gc={on***,off***}' was parsed as a correct option;
    with this patch we can fix this trivial bug in the mount process.

    Change log from v1:
    o check the length of the parameter, as suggested by Jaegeuk Kim.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
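
    The bug is a prefix comparison that ignores the length of the value. A
    sketch of the two behaviours, assuming the original code compared only the
    leading characters (as a C strncmp() with a fixed length would); both
    function names are hypothetical.

```python
def parse_background_gc_prefix(value):
    """Buggy shape: only the prefix is compared, so 'on***' is accepted."""
    if value.startswith("on"):
        return True
    if value.startswith("off"):
        return False
    raise ValueError(f"invalid background_gc value: {value!r}")


def parse_background_gc_exact(value):
    """Fixed shape: the whole value, and therefore its length, must match."""
    if value == "on":
        return True
    if value == "off":
        return False
    raise ValueError(f"invalid background_gc value: {value!r}")
```

    parse_background_gc_prefix("onxyz") returns True, while
    parse_background_gc_exact("onxyz") raises ValueError.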
     
  • We should return the error number from read_normal_summaries instead of
    -EINVAL when read_normal_summaries fails.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch introduces a helper function, f2fs_has_xattr_block, for better
    readability.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The original segment_info output is poorly formatted:
    cat /proc/fs/f2fs/loop0/segment_info
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 512
    512 512 512 512 512 512 512 0 0 512
    348 0 263 0 0 512 0 0 512 512
    512 512 0 512 512 512 512 512 512 512
    512 512 511 328 512 512 512 512 512 512
    512 512 512 512 512 512 512 0 0 175

    Let's fix this and show the type of each segment.
    cat /proc/fs/f2fs/loop0/segment_info
    format: segment_type|valid_blocks
    segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)
    0 2|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
    10 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
    20 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
    30 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
    40 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
    50 3|0 3|0 3|0 3|0 3|0 3|0 3|0 0|0 3|0 3|0
    60 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|512
    70 3|512 3|512 3|512 3|512 3|512 3|512 3|512 3|0 3|0 3|512
    80 3|0 3|0 3|0 3|0 3|0 3|512 3|0 3|0 3|512 3|512
    90 3|512 0|512 3|274 0|512 0|512 0|512 0|512 0|512 0|512 3|512
    100 3|512 0|512 3|511 0|328 3|512 0|512 0|512 3|512 0|512 0|512
    110 0|512 0|512 0|512 0|512 0|512 0|512 0|512 5|0 4|0 3|512

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
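
    The new layout shown above (a starting segment number per row, then ten
    "segment_type|valid_blocks" cells) can be reproduced with a short formatter.
    This is a sketch: format_segment_info is a hypothetical name, and single
    spaces are assumed between cells, which may differ from the kernel's column
    padding.

```python
def format_segment_info(segments, per_row=10):
    """Format (segment_type, valid_blocks) tuples in rows of `per_row` cells."""
    lines = [
        "format: segment_type|valid_blocks",
        "segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)",
    ]
    for start in range(0, len(segments), per_row):
        row = segments[start:start + per_row]
        cells = " ".join(f"{seg_type}|{valid}" for seg_type, valid in row)
        lines.append(f"{start} {cells}")  # row is prefixed by its first segno
    return "\n".join(lines) + "\n"
```

    Each row starts with the number of its first segment, so a cell can be
    located without counting from the top of the file.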
     

13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    not documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
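
    The policy suggested above for most file systems, syncing only on a
    read-write to read-only transition, can be modelled as follows. This is a
    sketch: the dictionary layout and the remount() function are hypothetical,
    and the counter stands in for a call to sync_filesystem().

```python
def remount(fs, new_read_only):
    """Apply a remount; sync only when going from read-write to read-only."""
    if not fs["read_only"] and new_read_only:
        fs["syncs"] += 1  # stands in for sync_filesystem()
    fs["read_only"] = new_read_only


fs = {"read_only": False, "syncs": 0}
remount(fs, False)   # no-op rw -> rw remount: no sync
remount(fs, True)    # rw -> ro: sync once
print(fs["syncs"])   # 1
```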
     

12 Mar, 2014

2 commits


11 Mar, 2014

1 commit


10 Mar, 2014

4 commits

  • Previously, we used ra_sum_pages to pre-read as many contiguous pages as
    possible, and if we failed to allocate more pages, an ENOMEM error was
    reported upstream even though some pages had already been allocated.
    In fact, we can use the available pages to do part of the job and
    continue with the rest in the next round, reporting ENOMEM upstream
    only if we really cannot allocate any page.

    Another fix is to skip the remaining pages if an EIO occurs when
    reading a page from page_list.

    Signed-off-by: Gu Zheng
    Reviewed-by: Chao Yu
    [Jaegeuk Kim: modify the flow for neater code]
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
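
    The partial-allocation strategy described above, as a toy model. The names
    alloc_batch and read_blocks are hypothetical, the allocator is passed in as
    a callable that returns None on failure, and the EIO handling is left out.

```python
import errno


def alloc_batch(want, alloc_page):
    """Allocate up to `want` pages, stopping at the first failure."""
    pages = []
    for _ in range(want):
        page = alloc_page()
        if page is None:
            break
        pages.append(page)
    return pages


def read_blocks(count, alloc_page, batches):
    """Read `count` blocks in rounds; `batches` records each round's size."""
    remaining = count
    while remaining > 0:
        pages = alloc_batch(remaining, alloc_page)
        if not pages:
            return -errno.ENOMEM    # not even one page could be allocated
        batches.append(len(pages))  # use what we got, retry the rest next round
        remaining -= len(pages)
    return 0
```

    If the allocator fails on its third call, a request for five blocks is
    served as a round of two followed by a round of three instead of failing
    outright.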
     
  • The original segment_info output is slightly misformatted:

    [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
    0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    ......
    0 0 0 0 0 0 0 0 0 0
    0 0 1 0 0 1 [root@guz Demoes]#

    so we fix it here for better legibility.
    [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    ......
    0 0 0 0 0 0 0 0 0 0
    0 0 1 0 0 1
    [root@guz Demoes]#

    Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • Integrated a couple of minor changes for better readability suggested by
    Chao Yu.

    Signed-off-by: Gu Zheng
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     

05 Mar, 2014

1 commit