08 Apr, 2014

1 commit

  • filemap_map_pages() is a generic implementation of ->map_pages() for
    filesystems that use the page cache.

    It should be safe to use filemap_map_pages() for ->map_pages() if the
    filesystem uses filemap_fault() for ->fault().

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Alexander Viro
    Cc: Dave Chinner
    Cc: Ning Qu
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

05 Apr, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     

04 Apr, 2014

6 commits

  • Add code to check sizes of on-disk data of metadata files such as inode
    size, segment usage size, DAT entry size, and checkpoint size. Although
    these sizes are read from disk, the current implementation doesn't check
    them.

    If these sizes are not sane on disk, it can cause out-of-range access to
    metadata or memory access overrun on metadata block buffers due to
    overflow in sundry calculations.

    Both lower limit and upper limit of metadata sizes are verified to
    prevent these issues.
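    A minimal sketch of the kind of bounds check described above, assuming a hypothetical helper check_entry_size() rather than the patch's actual code: an entry size read from disk is rejected if it is smaller than the in-core structure needs or larger than one block, before it is used to compute entries per block.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the nilfs2 code: validate an entry size read
 * from a metadata file's on-disk header before it is used in any offset
 * or "entries per block" calculation.
 */
static int check_entry_size(size_t entry_size, size_t min_size,
			    size_t block_size)
{
	if (entry_size < min_size)
		return -EINVAL;	/* lower limit: fields would not fit */
	if (entry_size > block_size)
		return -EINVAL;	/* upper limit: no entry fits in a block */
	return 0;
}

/* Only call this after check_entry_size() has succeeded. */
static size_t entries_per_block(size_t entry_size, size_t block_size)
{
	assert(entry_size != 0);
	return block_size / entry_size;
}
```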

    Signed-off-by: Ryusuke Konishi
    Cc: Andreas Rohner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Add support for the FITRIM ioctl, which enables user space tools to
    issue TRIM/DISCARD requests to the underlying device. Every clean
    segment within the specified range will be discarded.
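    For reference, a user-space caller could drive FITRIM roughly as follows. FITRIM and struct fstrim_range come from <linux/fs.h>; trim_whole_fs() is a hypothetical wrapper, and it relies on the usual FITRIM convention (the one fstrim(8) reports from) that the kernel writes the number of trimmed bytes back into range.len.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/*
 * Hypothetical wrapper: ask the file system mounted at `mountpoint` to
 * discard its unused space. Returns 0 and stores the trimmed byte count
 * in *trimmed on success, -1 on failure.
 */
static int trim_whole_fs(const char *mountpoint, uint64_t *trimmed)
{
	struct fstrim_range range = {
		.start = 0,
		.len = UINT64_MAX,	/* the whole file system */
		.minlen = 0,		/* no minimum extent length requested */
	};
	int fd, ret;

	assert(trimmed != NULL);
	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, FITRIM, &range);
	close(fd);
	if (ret < 0)
		return -1;
	*trimmed = range.len;	/* kernel reports bytes trimmed here */
	return 0;
}
```

    Trimming usually requires CAP_SYS_ADMIN, so an unprivileged caller should expect the ioctl to fail.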

    Signed-off-by: Andreas Rohner
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Rohner
     
  • Add nilfs_sufile_trim_fs(), which takes an fstrim_range structure and
    calls blkdev_issue_discard for every clean segment in the specified
    range. The range is truncated to file system block boundaries.
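    One plausible reading of "truncated to file system block boundaries" is to round the start up and the end down, so only whole blocks inside the request are considered. The sketch below assumes that reading; bytes_to_blocks() and struct block_range are hypothetical, and the exact rounding in nilfs_sufile_trim_fs() may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Whole blocks covered by a byte range: [first, end) in block numbers. */
struct block_range {
	uint64_t first;
	uint64_t end;
};

/*
 * Hypothetical helper: round the start up and the end down to block
 * boundaries, so a partially covered block at either end is skipped.
 */
static struct block_range bytes_to_blocks(uint64_t start, uint64_t len,
					  unsigned int blkbits)
{
	struct block_range r = { 0, 0 };
	uint64_t blocksize, end_byte;

	assert(blkbits < 64);
	blocksize = UINT64_C(1) << blkbits;

	/* Clamp instead of wrapping when len means "to the end". */
	end_byte = (len > UINT64_MAX - start) ? UINT64_MAX : start + len;

	/* Rounding start up would wrap: no whole block fits, range is empty. */
	if (start > UINT64_MAX - (blocksize - 1))
		return r;

	r.first = (start + blocksize - 1) >> blkbits;
	r.end = end_byte >> blkbits;
	if (r.end < r.first)
		r.end = r.first;	/* empty range */
	return r;
}
```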

    Signed-off-by: Andreas Rohner
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Rohner
     
  • With this ioctl the segment usage entries in the SUFILE can be updated
    from userspace.

    This is useful because it allows the userspace GC to modify and update
    segment usage entries for specific segments, which enables it to avoid
    unnecessary write operations.

    If a segment needs to be cleaned, but there is little or no
    reclaimable space in it, the cleaning operation basically degrades to a
    useless moving operation. In the end the only thing that changes is the
    location of the data and a timestamp in the segment usage information.
    With this ioctl the GC can skip the cleaning and update the segment
    usage entries directly instead.

    This is basically a shortcut to cleaning the segment. It is still
    necessary to read the segment summary information, but the writing of
    the live blocks can be skipped if it's not worth it.
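    The shortcut can be illustrated with a small decision function. Everything here is hypothetical (struct seg_usage, clean_or_update(), the min_reclaimable threshold); it only shows the idea that, below some threshold, the GC refreshes the usage entry instead of copying live blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory view of one segment usage entry. */
struct seg_usage {
	uint64_t lastmod;	/* time of last modification */
	uint32_t nblocks;	/* blocks counted as in use */
};

/*
 * Returns how many live blocks the GC has to rewrite for this segment.
 * If cleaning would reclaim fewer than min_reclaimable blocks, skip the
 * copy and update the usage entry directly instead.
 */
static uint32_t clean_or_update(struct seg_usage *su, uint32_t live_blocks,
				uint32_t min_reclaimable, uint64_t now)
{
	uint32_t reclaimable;

	assert(live_blocks <= su->nblocks);
	reclaimable = su->nblocks - live_blocks;

	if (reclaimable < min_reclaimable) {
		/* Shortcut: no data is moved, only the metadata changes. */
		su->lastmod = now;
		su->nblocks = live_blocks;
		return 0;
	}
	/* Normal path: live blocks are copied to a new segment. */
	return live_blocks;
}
```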

    [konishi.ryusuke@lab.ntt.co.jp: add description of NILFS_IOCTL_SET_SUINFO ioctl]
    Signed-off-by: Andreas Rohner
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Rohner
     
  • Introduce nilfs_sufile_set_suinfo(), which expects an array of
    nilfs_suinfo_update structures and updates the segment usage information
    accordingly.

    This is basically a helper function for the newly introduced
    NILFS_IOCTL_SET_SUINFO ioctl.
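    A sketch of applying such an array of updates, assuming each update names a segment and carries a bit mask of which fields to change. The struct and flag names are hypothetical stand-ins for nilfs_suinfo_update and its flags; validating the whole batch before applying any of it is a design choice of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field-selection bits. */
enum {
	SU_UPDATE_LASTMOD = 1u << 0,
	SU_UPDATE_NBLOCKS = 1u << 1,
};

struct seg_usage {
	uint64_t lastmod;
	uint32_t nblocks;
};

struct su_update {
	uint64_t segnum;	/* which entry to change */
	uint32_t flags;		/* which fields of val are valid */
	struct seg_usage val;	/* new values */
};

/* Validate the whole batch first, then apply it. Returns 0 or -1. */
static int apply_su_updates(struct seg_usage *table, size_t nsegs,
			    const struct su_update *ups, size_t nups)
{
	size_t i;

	assert(ups != NULL || nups == 0);
	for (i = 0; i < nups; i++)
		if (ups[i].segnum >= nsegs)
			return -1;	/* reject before touching anything */

	for (i = 0; i < nups; i++) {
		struct seg_usage *su = &table[ups[i].segnum];

		if (ups[i].flags & SU_UPDATE_LASTMOD)
			su->lastmod = ups[i].val.lastmod;
		if (ups[i].flags & SU_UPDATE_NBLOCKS)
			su->nblocks = ups[i].val.nblocks;
	}
	return 0;
}
```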

    [konishi.ryusuke@lab.ntt.co.jp: use put_bh() instead of brelse() because we know bh != NULL]
    Signed-off-by: Andreas Rohner
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Rohner
     
  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.
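    The ordering can be modelled in user space with a mutex standing in for the tree lock. struct toy_mapping and its functions are hypothetical; the point is that the flag is set and tested under the same lock, so once teardown has started, reclaim can no longer add entries behind its back.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for an address_space: a lock, a flag and a count. */
struct toy_mapping {
	pthread_mutex_t tree_lock;
	bool exiting;		/* plays the role of AS_EXITING */
	int nr_shadows;		/* shadow entries left in the tree */
};

/* Inode teardown: set the flag under the lock, then do the final truncate. */
static void toy_evict(struct toy_mapping *m)
{
	pthread_mutex_lock(&m->tree_lock);
	assert(!m->exiting);	/* an inode is torn down only once */
	m->exiting = true;
	pthread_mutex_unlock(&m->tree_lock);

	pthread_mutex_lock(&m->tree_lock);
	m->nr_shadows = 0;	/* "final truncate" empties the tree */
	pthread_mutex_unlock(&m->tree_lock);
}

/* Reclaim: only leave a shadow entry if teardown has not started. */
static bool toy_install_shadow(struct toy_mapping *m)
{
	bool installed = false;

	pthread_mutex_lock(&m->tree_lock);
	if (!m->exiting) {
		m->nr_shadows++;
		installed = true;
	}
	pthread_mutex_unlock(&m->tree_lock);
	return installed;
}
```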

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    not documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).
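    The rule "sync only on a read-write to read-only transition" reduces to a flag test. remount_needs_sync() is a hypothetical helper; MS_RDONLY is the standard mount flag from <sys/mount.h>, and how each file system's remount_fs() applies the rule is up to that file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mount.h>	/* MS_RDONLY */

/* Hypothetical helper: flush only when a rw mount is becoming ro. */
static bool remount_needs_sync(unsigned long old_flags,
			       unsigned long new_flags)
{
	bool was_rw = !(old_flags & MS_RDONLY);
	bool becomes_ro = (new_flags & MS_RDONLY) != 0;

	return was_rw && becomes_ro;
}
```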

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
     

31 Jan, 2014

1 commit

  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable bio_vec series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible and on making splitting more efficient. Related fixes to
    immutable bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

24 Jan, 2014

2 commits

  • Add comments for ioctls in fs/nilfs2/ioctl.c file and describe NILFS2
    specific ioctls in Documentation/filesystems/nilfs2.txt.

    Signed-off-by: Vyacheslav Dubeyko
    Reviewed-by: Ryusuke Konishi
    Cc: Wenliang Fan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • The local variable 'pos' in nilfs_ioctl_wrap_copy function can overflow if
    a large number was passed to argv->v_index from userspace and the sum of
    argv->v_index and argv->v_nmembs exceeds the maximum value of __u64 type
    integer (= ~(__u64)0 = 18446744073709551615).

    Here, argv->v_index is a 64-bit width argument to specify the start
    position of target data items (such as segment number, checkpoint number,
    or virtual block address of nilfs), and argv->v_nmembs gives the total
    number of the items that userland programs (such as lssu, lscp, or
    cleanerd) want to get information about, which also gives the maximum
    element count of argv->v_base[] array.

    nilfs_ioctl_wrap_copy() calls dofunc() repeatedly and increments the
    position variable 'pos' at the end of each iteration if dofunc() itself
    didn't update 'pos':

    if (pos == ppos)
            pos += n;

    This patch prevents the overflow by rejecting pairs of a start
    position (argv->v_index) and a total count (argv->v_nmembs) that would
    lead to the overflow.
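    A minimal sketch of an overflow-safe check, assuming v_index is a 64-bit unsigned value and v_nmembs a 32-bit unsigned count; check_index_range() is a hypothetical name. Comparing against UINT64_MAX - v_nmembs avoids computing the sum that could itself wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical check: reject a start position and element count whose
 * sum would wrap past UINT64_MAX (the ~(__u64)0 limit in the text).
 * The comparison is written as a subtraction so it cannot overflow.
 */
static int check_index_range(uint64_t v_index, uint32_t v_nmembs)
{
	if (v_index > UINT64_MAX - v_nmembs)
		return -EINVAL;
	return 0;
}
```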

    [konishi.ryusuke@lab.ntt.co.jp: fix signedness issue]
    Signed-off-by: Wenliang Fan
    Cc: Vyacheslav Dubeyko
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wenliang Fan
     

15 Jan, 2014

1 commit

  • There is a bug in the function nilfs_segctor_collect, which results in
    active data being written to a segment that is marked as clean. It is
    possible that this segment is selected for a later segment
    construction, whereby the old data is overwritten.

    The problem shows itself with the following kernel log message:

    nilfs_sufile_do_cancel_free: segment 6533 must be clean

    Usually a few hours later the file system gets corrupted:

    NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
    NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

    The issue can be reproduced with a file system that is nearly full,
    with the cleaner running while some I/O-intensive task is running,
    although it is quite hard to reproduce.

    This is what happens:

    1. The cleaner starts the segment construction
    2. nilfs_segctor_collect is called
    3. sc_stage is on NILFS_ST_SUFILE and segments are freed
    4. sc_stage is on NILFS_ST_DAT and the current segment is full
    5. nilfs_segctor_extend_segments is called, which
       allocates a new segment
    6. The new segment is one of the segments freed in step 3
    7. nilfs_sufile_cancel_freev is called and produces an error message
    8. Loop around and the collection starts again
    9. sc_stage is on NILFS_ST_SUFILE and segments are freed,
       including the newly allocated segment, which will contain active
       data and can be allocated at a later time
    10. A few hours later another segment construction allocates the
        segment and causes file system corruption

    This can be prevented by simply reordering the statements. If
    nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments,
    the freed segments are marked as dirty and cannot be allocated any more.
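    The effect of the reordering can be shown with a toy segment table. This is a hypothetical model, not nilfs2 code: toy_cancel_free() plays the role of nilfs_sufile_cancel_freev, toy_alloc() the role of nilfs_segctor_extend_segments, and the allocator simply takes the first clean segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSEGS 8

enum seg_state { SEG_CLEAN, SEG_DIRTY };

/* Hypothetical model, not nilfs2 code: a tiny segment usage table. */
struct toy_fs {
	enum seg_state state[NSEGS];
};

/* All segments in use except `free_seg`, which is genuinely free. */
static void toy_init(struct toy_fs *fs, size_t free_seg)
{
	for (size_t i = 0; i < NSEGS; i++)
		fs->state[i] = (i == free_seg) ? SEG_CLEAN : SEG_DIRTY;
}

/* SUFILE stage: tentatively free a segment by marking it clean. */
static void toy_free(struct toy_fs *fs, size_t seg)
{
	assert(seg < NSEGS);
	fs->state[seg] = SEG_CLEAN;
}

/* Role of nilfs_sufile_cancel_freev(): undo the tentative free. */
static void toy_cancel_free(struct toy_fs *fs, size_t seg)
{
	assert(seg < NSEGS);
	fs->state[seg] = SEG_DIRTY;
}

/* Role of nilfs_segctor_extend_segments(): take the first clean segment. */
static long toy_alloc(struct toy_fs *fs)
{
	for (size_t i = 0; i < NSEGS; i++) {
		if (fs->state[i] == SEG_CLEAN) {
			fs->state[i] = SEG_DIRTY;
			return (long)i;
		}
	}
	return -1;	/* no clean segment left */
}

/*
 * One retry of the collection loop; returns the segment used to extend
 * the log. In the old order the allocator can hand out the segment that
 * was only tentatively freed; cancelling first closes that window.
 */
static long toy_retry(struct toy_fs *fs, size_t freed, bool cancel_first)
{
	long seg;

	if (cancel_first) {
		toy_cancel_free(fs, freed);
		seg = toy_alloc(fs);
	} else {
		seg = toy_alloc(fs);
		toy_cancel_free(fs, freed);	/* real code logs "must be clean" */
	}
	return seg;
}
```

    With segment 6 genuinely free and segment 3 tentatively freed, the old order hands out segment 3, while the fixed order hands out segment 6.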

    Signed-off-by: Andreas Rohner
    Reviewed-by: Ryusuke Konishi
    Tested-by: Andreas Rohner
    Signed-off-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Rohner
     

24 Nov, 2013

1 commit

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.
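    To illustrate what an explicit iterator buys, here is a user-space model in which the segment vector is never modified and all progress lives in the iterator. The field names echo bi_idx, bi_size and bi_bvec_done, but struct toy_iter and toy_iter_advance() are hypothetical, not the kernel's bvec iterator helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one segment of a bio's vector. */
struct toy_bvec {
	size_t len;		/* bytes in this segment */
};

/* All progress lives here; the vector itself is never modified. */
struct toy_iter {
	size_t idx;		/* current segment, like bi_idx */
	size_t size;		/* bytes remaining, like bi_size */
	size_t bvec_done;	/* bytes consumed in bv[idx], like bi_bvec_done */
};

static void toy_iter_advance(const struct toy_bvec *bv, struct toy_iter *it,
			     size_t bytes)
{
	if (bytes > it->size)
		bytes = it->size;
	it->size -= bytes;

	while (bytes) {
		size_t left, step;

		assert(it->bvec_done <= bv[it->idx].len);
		left = bv[it->idx].len - it->bvec_done;
		step = bytes < left ? bytes : left;
		bytes -= step;
		it->bvec_done += step;
		if (it->bvec_done == bv[it->idx].len) {
			it->idx++;	/* segment finished, move to the next */
			it->bvec_done = 0;
		}
	}
}
```

    Because the vector is read-only, two copies of an iterator can walk the same bio independently, which is what makes splitting cheap.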

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     

01 Oct, 2013

1 commit

  • Many NILFS2 users have reported strange file system corruption, for
    example:

    NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
    NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

    However, such error messages are a consequence of a file system issue
    that takes place earlier. Fortunately, Jerome Poulin and Anton
    Eliasson reported another issue not long ago. These reports describe a
    crash of the segctor thread:

    BUG: unable to handle kernel paging request at 0000000000004c83
    IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

    Call Trace:
    nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
    nilfs_segctor_construct+0x17b/0x290 [nilfs2]
    nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
    kthread+0xc0/0xd0
    ret_from_fork+0x7c/0xb0

    These two issues have the same root cause, which can also produce a
    third issue: the segctor thread hangs while consuming 100% CPU.

    REPRODUCING PATH:

    One possible way of reproducing the issue was described by
    Jerome Poulin:

    1. init S to get to single user mode.
    2. sysrq+E to make sure only my shell is running
    3. start network-manager to get my wifi connection up
    4. login as root and launch "screen"
    5. cd /boot/log/nilfs which is an ext3 mount point and can log when NILFS dies.
    6. lscp | xz -9e > lscp.txt.xz
    7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
    8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
    9. start a screen and launch strace -f -o find-cat.log -t find
       /mnt/nilfs -type f -exec cat {} > /dev/null \;
    10. start a screen and launch strace -f -o apt-get.log -t apt-get update
    11. launch the last command again as it did not crash the first time
    12. apt-get crashes
    13. ps aux > ps-aux-crashed.log
    14. sysrq+W
    15. sysrq+E wait for everything to terminate
    16. sysrq+SUSB

    A simpler way to reproduce the issue is to run a kernel compilation
    task and "apt-get update" in parallel.

    REPRODUCIBILITY:

    The issue does not reproduce reliably (in 60% - 80% of attempts). A
    proper environment is essential for reproducing it. The critical
    conditions are:

    (1) A big file must be modified via mmap().

    (2) From time to time during processing, the number of dirty blocks in
    this file must exceed the size of several segments (for example, two
    or three).

    (3) There must be intensive background file modification activity in
    another thread.

    INVESTIGATION:

    First of all, the crash is caused by an invalid page address:

    NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
    NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783

    Moreover, the value of b_page (0x1a82) is 6786 in decimal, which looks
    like a segment number, and the b_blocknr and b_size values look like
    block numbers. So, the buffer_head pointer points to an invalid address.

    A detailed investigation of the issue revealed the following picture:

    [-----------------------------SEGMENT 6783-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783

    [-----------------------------SEGMENT 6784-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
    NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
    NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
    [----------] ditto
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15

    [-----------------------------SEGMENT 6785-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
    NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
    [----------] ditto
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12

    NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785

    NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82

    BUG: unable to handle kernel paging request at 0000000000001a82
    IP: [] nilfs_end_page_io+0x12/0xd0 [nilfs2]

    Normally, for every segment we collect the dirty files in a list. Then,
    the dirty blocks of every dirty file are gathered, prepared for write,
    and submitted by means of nilfs_segbuf_submit_bh() calls. Finally, the
    complete-write phase takes place after the block layer has called
    nilfs_end_bio_write(). Buffers/pages are marked as not dirty in this
    final phase, and the processed files are removed from the list of
    dirty files.

    As can be seen, there were three prepare_write and submit_bio phases
    before the segbuf_wait and complete_write phases. Moreover, the
    segments compete with each other for dirty blocks, because on every
    iteration of segment processing the dirty buffer_heads are added to
    several payload_buffers lists:

    [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
    [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8

    The next pointer is the same, but the prev pointer has changed. This
    means that the buffer_head has its next pointer from one list but its
    prev pointer from another. Such a modification can happen several
    times, and it can finally result in various issues: (1) segctor
    hanging, (2) segctor crashing, (3) file system metadata corruption.
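    A minimal model of the failure, using a circular doubly linked list shaped like the kernel's list_head (all names here are hypothetical, and the sequence is simplified relative to nilfs2): if a node that is still on one list is added to a second list without being removed first, the first list's forward walk never returns to its head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, shaped like the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	struct list_node *prev = head->prev;

	n->next = head;
	n->prev = prev;
	prev->next = n;
	head->prev = n;
}

/* Walk forward at most `limit` steps; true if we get back to `head`. */
static bool list_is_closed(const struct list_node *head, size_t limit)
{
	const struct list_node *p = head->next;

	while (limit--) {
		if (p == head)
			return true;
		p = p->next;
	}
	return false;
}
```

    After adding the same node to a second list, the first list's head still has its prev pointing at the node, but the node's next now leads into the second list, so a walk of the first list cycles forever (a hang) or follows stale pointers (a crash).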

    FIX:
    This patch adds:

    (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
    for every processed dirty block;

    (2) checking of BH_Async_Write flag in
    nilfs_lookup_dirty_data_buffers() and
    nilfs_lookup_dirty_node_buffers();

    (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
    nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
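    The protocol of the fix can be sketched with two flags per buffer. struct toy_bh and the toy_* functions are hypothetical stand-ins for the buffer_head state and the nilfs2 functions listed above; the sketch only shows that a buffer claimed by an in-flight segment is skipped by the next lookup until its write completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer model: only the two flags that matter here. */
struct toy_bh {
	bool dirty;
	bool async_write;	/* plays the role of BH_Async_Write */
};

/* Lookup: collect dirty buffers not owned by an in-flight segment. */
static size_t toy_lookup_dirty(struct toy_bh *bhs, size_t n,
			       struct toy_bh **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (bhs[i].dirty && !bhs[i].async_write)
			out[count++] = &bhs[i];
	return count;
}

/* prepare_write: claim the collected buffers for this segment. */
static void toy_prepare_write(struct toy_bh **bhs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(!bhs[i]->async_write);
		bhs[i]->async_write = true;
	}
}

/* complete_write: I/O finished, buffers are clean and unclaimed again. */
static void toy_complete_write(struct toy_bh **bhs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bhs[i]->dirty = false;
		bhs[i]->async_write = false;
	}
}
```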

    Reported-by: Jerome Poulin
    Reported-by: Anton Eliasson
    Cc: Paul Fertser
    Cc: ARAI Shun-ichi
    Cc: Piotr Szymaniak
    Cc: Juan Barry Manuel Canham
    Cc: Zahid Chowdhury
    Cc: Elmer Zhang
    Cc: Kenneth Langga
    Signed-off-by: Vyacheslav Dubeyko
    Acked-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     

13 Sep, 2013

1 commit


05 Sep, 2013

1 commit

  • Pull vfs pile 1 from Al Viro:
    "Unfortunately, this merge window it'll have to be a lot of small piles -
    my fault, actually, for not keeping #for-next in anything that would
    resemble a sane shape ;-/

    This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
    cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
    components) + several long-standing patches from various folks.

    There definitely will be a lot more (starting with Miklos'
    check_submount_and_drop() series)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    direct-io: Handle O_(D)SYNC AIO
    direct-io: Implement generic deferred AIO completions
    add formats for dentry/file pathnames
    kvm eventfd: switch to fdget
    powerpc kvm: use fdget
    switch fchmod() to fdget
    switch epoll_ctl() to fdget
    switch copy_module_from_fd() to fdget
    git simplify nilfs check for busy subtree
    ibmasmfs: don't bother passing superblock when not needed
    don't pass superblock to hypfs_{mkdir,create*}
    don't pass superblock to hypfs_diag_create_files
    don't pass superblock to hypfs_vm_create_files()
    oprofile: get rid of pointless forward declarations of struct super_block
    oprofilefs_create_...() do not need superblock argument
    oprofilefs_mkdir() doesn't need superblock argument
    don't bother with passing superblock to oprofile_create_stats_files()
    oprofile: don't bother with passing superblock to ->create_files()
    don't bother passing sb to oprofile_create_files()
    coh901318: don't open-code simple_read_from_buffer()
    ...

    Linus Torvalds
     

04 Sep, 2013

1 commit


24 Aug, 2013

2 commits

  • Fix improper counting of in-flight bio requests in the BIO_EOPNOTSUPP
    error detection case.

    The sb_nbio must be incremented exactly the same number of times as
    the complete() function was called (or will be called), because
    nilfs_segbuf_wait() will call wait_for_completion() the number of
    times set in sb_nbio:

    do {
            wait_for_completion(&segbuf->sb_bio_event);
    } while (--segbuf->sb_nbio > 0);

    The two functions complete() and wait_for_completion() must be called
    the same number of times for the same sb_bio_event. Otherwise,
    wait_for_completion() will hang or leak.
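    The pairing rule can be checked with a single-threaded model. toy_completion and toy_segbuf_wait() are hypothetical; a non-blocking wait (in the spirit of try_wait_for_completion()) reports where a real wait_for_completion() would block, and leftover completions count as a leak. The zero-count guard before the loop is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy completion: counts complete() calls not yet consumed by a waiter. */
struct toy_completion {
	unsigned int done;
};

static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/* Non-blocking wait: false where a real wait_for_completion() would block. */
static bool toy_try_wait(struct toy_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

enum wait_result { WAIT_OK, WAIT_WOULD_HANG, WAIT_LEAKED };

/*
 * `nbio` is what sb_nbio says; `ncompleted` is how many times the end_io
 * handler actually called complete(). Mirrors the do/while loop above.
 */
static enum wait_result toy_segbuf_wait(unsigned int nbio,
					unsigned int ncompleted)
{
	struct toy_completion ev = { 0 };

	for (unsigned int i = 0; i < ncompleted; i++)
		toy_complete(&ev);

	if (!nbio)
		return ev.done ? WAIT_LEAKED : WAIT_OK;
	do {
		if (!toy_try_wait(&ev))
			return WAIT_WOULD_HANG;
	} while (--nbio > 0);

	return ev.done ? WAIT_LEAKED : WAIT_OK;
}
```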

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Dan Carpenter
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Remove the double call of bio_put() in nilfs_end_bio_write() for the
    case of BIO_EOPNOTSUPP error detection. The issue was found by Dan
    Carpenter, who also suggested the first version of the fix.

    Signed-off-by: Vyacheslav Dubeyko
    Reported-by: Dan Carpenter
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     

05 Jul, 2013

1 commit


04 Jul, 2013

2 commits

  • The cp_inodes_count and cp_blocks_count fields are represented as
    __le64 in the on-disk structure (struct nilfs_checkpoint), but the
    analogous fields in the in-core structure (struct nilfs_root) are
    represented by the atomic_t type.

    This patch replaces atomic_t with atomic64_t in the representation of
    the inodes_count and blocks_count fields in struct nilfs_root.
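    A short user-space illustration of why the width matters. It uses C11 atomics and unsigned types so the truncation is well defined; the kernel's atomic_t wraps a signed int, but the high 32 bits of a large on-disk count are lost in the same way. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Load a 64-bit on-disk counter into a 32-bit in-memory atomic. */
static uint32_t load_into_atomic32(uint64_t disk_value)
{
	_Atomic uint32_t counter;

	/* The conversion keeps only the low 32 bits (modulo 2^32). */
	atomic_init(&counter, (uint32_t)disk_value);
	return atomic_load(&counter);
}

/* The same load into a 64-bit atomic keeps the full value. */
static uint64_t load_into_atomic64(uint64_t disk_value)
{
	_Atomic uint64_t counter;

	atomic_init(&counter, disk_value);
	return atomic_load(&counter);
}
```

    For example, a count of 5000000000 comes back as 705032704 from the 32-bit version (5000000000 - 2^32) but unchanged from the 64-bit one.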

    Signed-off-by: Vyacheslav Dubeyko
    Acked-by: Ryusuke Konishi
    Acked-by: Joern Engel
    Cc: Clemens Eisserer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Currently, NILFS2 returns 0 as the free inodes count (f_ffree) and the
    current used inodes count as the total number of file nodes in the file
    system (f_files):

    df -i
    Filesystem  Inodes  IUsed  IFree  IUse%  Mounted on
    /dev/loop0       2      2      0   100%  /mnt/nilfs2

    This patch implements a real calculation of the free inodes count.
    First, the total number of file nodes in the file system is calculated
    as (desc_blocks_count * groups_per_desc_block * entries_per_group).
    Then, the free inodes count is calculated as the difference between the
    total file nodes and the used inodes count. As a result, we get the
    following output for NILFS2:

    df -i
    Filesystem   Inodes    IUsed    IFree  IUse%  Mounted on
    /dev/loop0  4194304  2114701  2079603    51%  /mnt/nilfs2
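    The calculation described above, as a small sketch. count_inodes() and struct inode_counts are hypothetical, and clamping the free count at zero is an added safety assumption rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

struct inode_counts {
	uint64_t total;	/* reported as f_files */
	uint64_t free;	/* reported as f_ffree */
};

static struct inode_counts count_inodes(uint64_t desc_blocks_count,
					uint64_t groups_per_desc_block,
					uint64_t entries_per_group,
					uint64_t used)
{
	struct inode_counts c;

	c.total = desc_blocks_count * groups_per_desc_block * entries_per_group;
	/* Never report a negative free count if more are in use than counted. */
	c.free = used < c.total ? c.total - used : 0;
	return c;
}
```

    As an illustration, count_inodes(2, 256, 8192, 2114701) gives a total of 4194304 and 2079603 free, matching the df output; those three geometry values are just one hypothetical combination that yields 4194304 and are not taken from the patch.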

    Reported-by: Clemens Eisserer
    Signed-off-by: Vyacheslav Dubeyko
    Signed-off-by: Ryusuke Konishi
    Tested-by: Vyacheslav Dubeyko
    Cc: Joern Engel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     

29 Jun, 2013

1 commit


25 May, 2013

1 commit

  • nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary

    DESCRIPTION:
    There are use cases in which a NILFS2 file system (formatted with a
    block size smaller than 4 KB) can be remounted read-only because it
    encounters a "broken bmap" issue.

    The issue was reported by Anthony Doggett:
    "The machine I've been trialling nilfs on is running Debian Testing,
    Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
    version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
    also reproduced it (identically) with Debian Unstable amd64 and Debian
    Experimental (using the 3.8-trunk kernel). The problematic partitions
    were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

    SYMPTOMS:
    (1) The system log contains error messages like these:

    [63102.496756] nilfs_direct_assign: invalid pointer: 0
    [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
    [63102.496798]
    [63102.524403] Remounting filesystem read-only

    (2) The NILFS2 file system is remounted in RO mode.

    REPRODUCING PATH:
    (1) Create a volume group named "unencrypted" by means of the vgcreate utility.
    (2) Run the script (prepared by Anthony Doggett):

    ----------------[BEGIN SCRIPT]--------------------

    VG=unencrypted
    lvcreate --size 2G --name ntest $VG
    mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
    mkdir /var/tmp/n
    mkdir /var/tmp/n/ntest
    mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
    mkdir /var/tmp/n/ntest/thedir
    cd /var/tmp/n/ntest/thedir
    sleep 2
    date
    darcs init
    sleep 2
    dmesg|tail -n 5
    date
    darcs whatsnew || true
    date
    sleep 2
    dmesg|tail -n 5
    ----------------[END SCRIPT]--------------------

    REPRODUCIBILITY: 100%

    INVESTIGATION:
    Investigation showed that the issue takes place during segment
    construction after the following sequence of user-space operations:

    open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
    fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    ftruncate(7, 60)

    The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
    bmap (inode number=28)" appears because of an attempt to get the block
    number for the block at logical offset 3072 bytes (block #3, counting
    from 0, with 1 KB blocks). As the output above shows, the whole file is
    only 60 bytes long, so a single 1 KB block is enough for the whole file.
    Several blocks are processed instead of one because
    nilfs_segctor_scan_file() discovers several dirty buffers for this file.
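    The block arithmetic behind this analysis can be checked with a small
    userspace sketch. It assumes the 1 KB block size used by the reproducer
    (mkfs.nilfs2 -b 1024); blocks_for_size() and block_index() are
    hypothetical helpers, not NILFS2 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of blocks needed to hold i_size bytes. */
static uint64_t blocks_for_size(uint64_t i_size, uint32_t blocksize)
{
    assert(blocksize > 0);
    return (i_size + blocksize - 1) / blocksize; /* round up */
}

/* Hypothetical helper: index (from 0) of the block containing offset. */
static uint64_t block_index(uint64_t offset, uint32_t blocksize)
{
    assert(blocksize > 0);
    return offset / blocksize;
}
```

    For the 60-byte file, blocks_for_size(60, 1024) is 1, while
    block_index(3072, 1024) is 3, so that block lies entirely beyond EOF.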

    The root cause of this issue is in the nilfs_set_page_dirty() function,
    which is called just before writing to an mmapped page.

    When the nilfs_page_mkwrite() function handles a page at the EOF
    boundary, it fills hole blocks only inside EOF through
    __block_page_mkwrite().

    The __block_page_mkwrite() function calls set_page_dirty() after filling
    hole blocks, so nilfs_set_page_dirty() (= a_ops->set_page_dirty) is
    called. However, the current implementation of nilfs_set_page_dirty()
    wrongly marks all buffers dirty, even for a page at the EOF boundary.

    As a result, buffers outside EOF are inconsistently marked dirty and
    queued for write even though they are not mapped with the
    nilfs_get_block() function.

    FIX:
    This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
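    The effect of the fix can be modelled in userspace. This is a simplified
    sketch under stated assumptions, not the actual nilfs_set_page_dirty()
    code: struct model_buffer is a hypothetical stand-in for a buffer head,
    and a buffer counts as a hole when it is not mapped. With a 4 KB page of
    1 KB blocks and a 60-byte file, only the first buffer is mapped, so only
    that one becomes dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a buffer head. */
struct model_buffer {
    bool mapped; /* block has been mapped, i.e. it is not a hole */
    bool dirty;
};

/*
 * Mark the buffers of one page dirty, skipping hole (unmapped) buffers.
 * Returns how many buffers were newly marked dirty, i.e. the count a
 * filesystem would add to its per-inode dirty-block accounting.
 */
static size_t model_set_page_dirty(struct model_buffer *bufs, size_t nr)
{
    size_t nr_dirty = 0;

    assert(bufs != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        /* Already dirty, or a hole outside EOF: leave it alone. */
        if (bufs[i].dirty || !bufs[i].mapped)
            continue;
        bufs[i].dirty = true;
        nr_dirty++;
    }
    return nr_dirty;
}
```

    The segment constructor then sees one dirty block for the file instead
    of four.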

    Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
    for this issue.

    Signed-off-by: Ryusuke Konishi
    Reported-by: Anthony Doggett
    Reported-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

01 May, 2013

3 commits

  • page->mapping->host cannot be NULL in nilfs_writepage(), so remove the
    unneeded test.

    This fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage()
    error: we previously assumed 'inode' could be null (see line 195)".
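    The warning flags a common pattern: a pointer is dereferenced first and
    NULL-checked later, so either the check is dead code or the dereference
    is a bug. A minimal userspace sketch of the cleaned-up shape follows; the
    model_* types and function are hypothetical stand-ins, not the
    nilfs_writepage() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct model_inode { int ino; };
struct model_mapping { struct model_inode *host; };
struct model_page { struct model_mapping *mapping; };

/*
 * Before the cleanup the shape was:
 *     inode = page->mapping->host;
 *     ... inode->field ...      (dereference)
 *     if (inode && ...)         (late NULL check, never false here)
 * Since host cannot be NULL at this point, the late check is dropped and
 * the invariant is documented once (the assert compiles out with NDEBUG).
 */
static int model_writepage_ino(struct model_page *page)
{
    struct model_inode *inode = page->mapping->host;

    assert(inode != NULL);
    return inode->ino;
}
```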

    Reported-by: Dan Carpenter
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Change test_bit(PG_locked, &page->flags) to PageLocked().
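    The change replaces an open-coded bit test with the named page-flag
    accessor. The userspace sketch below shows the difference; the bit
    number, struct and function names are hypothetical, and the real kernel
    accessors are generated by macros in include/linux/page-flags.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number and page structure, for illustration only. */
enum { MODEL_PG_locked = 0 };
struct model_page { unsigned long flags; };

/* Generic bit test, in the spirit of the kernel's test_bit(). */
static bool model_test_bit(int nr, const unsigned long *addr)
{
    assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
    return (*addr >> nr) & 1UL;
}

/* Named accessor that hides which bit encodes "locked". */
static bool model_PageLocked(const struct model_page *page)
{
    return model_test_bit(MODEL_PG_locked, &page->flags);
}
```

    Callers then ask model_PageLocked(page) instead of knowing both the bit
    name and where the flags word lives.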

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • …river's internal error or metadata corruption

    The NILFS2 driver remounts itself in RO mode when it discovers metadata
    corruption (for example, a broken bmap). Usually, however, this happens
    after file system operations have already been performed, before the
    remount in RO mode.

    As a result, the NILFS2 driver can be in RO mode while dirty pages are
    still present in the address spaces of modified inodes. This makes the
    flush kernel thread try to flush those dirty pages endlessly in RO mode.
    The visible side effects are: (1) the flush kernel thread occupies
    50% - 99% of CPU time; (2) the system can't be shut down without
    manually switching off the power.

    SYMPTOMS:
    (1) The system log contains the error message: "Remounting filesystem read-only".
    (2) The flush kernel thread occupies 50% - 99% of CPU time.
    (3) The system can't be shut down without manually switching off the power.

    REPRODUCTION PATH:
    (1) Create a volume group named "unencrypted" with the vgcreate utility.
    (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

    ----------------[BEGIN SCRIPT]--------------------
    #!/bin/bash

    VG=unencrypted
    #apt-get install nilfs-tools darcs
    lvcreate --size 2G --name ntest $VG
    mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
    mkdir /var/tmp/n
    mkdir /var/tmp/n/ntest
    mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
    mkdir /var/tmp/n/ntest/thedir
    cd /var/tmp/n/ntest/thedir
    sleep 2
    date
    darcs init
    sleep 2
    dmesg|tail -n 5
    date
    darcs whatsnew || true
    date
    sleep 2
    dmesg|tail -n 5
    ----------------[END SCRIPT]--------------------

    (3) Try to shut down the system.

    REPRODUCIBILITY: 100%

    FIX:

    This patch adds a check of the NILFS2 driver's mount state to the
    nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page()
    methods. If the RO mount state is detected, all dirty pages are
    simply discarded and a warning message is written to the system log.
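    The decision made in the write-back path can be sketched in userspace as
    below. It is a model, not the patched NILFS2 code: model_sb, model_page
    and model_writepage() are hypothetical, and returning -EROFS for a
    discarded page is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; not the NILFS2 data structures. */
struct model_sb   { bool read_only; };
struct model_page { bool dirty; unsigned long index; };

/*
 * Write back one dirty page. If the filesystem has been remounted
 * read-only (e.g. after metadata corruption), the page can never be
 * written, so drop its dirty state and warn instead of leaving it for
 * the flusher to retry forever.
 */
static int model_writepage(const struct model_sb *sb, struct model_page *page)
{
    assert(page->dirty);
    if (sb->read_only) {
        page->dirty = false; /* discard: stops endless retries */
        fprintf(stderr, "warning: discarding dirty page %lu on RO mount\n",
                page->index);
        return -EROFS;
    }
    /* ... normal writeback would be issued here ... */
    page->dirty = false;
    return 0;
}
```

    The key property is that the page leaves the dirty state in both
    branches, so the flusher has nothing left to retry.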

    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
    Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Anthony Doggett <Anthony2486@interfaces.org.uk>
    Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
    Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
    Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
    Cc: Elmer Zhang <freeboy6716@gmail.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Vyacheslav Dubeyko
     

04 Mar, 2013

1 commit

  • Modify the request_module() call to prefix the file system type with
    "fs-" and add aliases to all of the filesystems that can be built as
    modules to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with an fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems, trivially
    making things safer at no real cost.
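    In the kernel, get_fs_type() requests the module "fs-" followed by the
    type name, and each filesystem declares the matching alias with
    MODULE_ALIAS_FS(). The userspace sketch below only models the name
    mapping; fs_module_name() and is_fs_alias() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the module name requested for a filesystem type, e.g.
 * "nilfs2" -> "fs-nilfs2". Returns 0 on success, -1 if buf is too small.
 */
static int fs_module_name(char *buf, size_t len, const char *fstype)
{
    int n;

    assert(buf != NULL && fstype != NULL);
    n = snprintf(buf, len, "fs-%s", fstype);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Only names carrying the "fs-" prefix can match a filesystem alias. */
static int is_fs_alias(const char *modname)
{
    return strncmp(modname, "fs-", 3) == 0;
}
```

    An arbitrary string such as "nilfs2" without the prefix can no longer
    name a non-filesystem module through this path.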

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. This allows simple, safe,
    well-understood workarounds for known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as its module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases, the most significant being autofs, which lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach
    request_module() in get_fs_type() without having any special
    permissions, and people get uncomfortable when a user-specified string
    (in this case the filesystem type) goes all the way to request_module().

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regard to the user's permissions. In general, all a filesystem
    module does once loaded is call register_filesystem() and go to sleep,
    which means there is not much attack surface exposed by loading a
    filesystem module unless the filesystem is mounted. In a user
    namespace, filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

1 commit


23 Feb, 2013

1 commit


22 Feb, 2013

3 commits

  • Merge misc patches from Andrew Morton:

    - Florian has vanished so I appear to have become fbdev maintainer
    again :(

    - Joel and Mark are distracted, so welcome to the new OCFS2 maintainer

    - The backlight queue

    - Small core kernel changes

    - lib/ updates

    - The rtc queue

    - Various random bits

    * akpm: (164 commits)
    rtc: rtc-davinci: use devm_*() functions
    rtc: rtc-max8997: use devm_request_threaded_irq()
    rtc: rtc-max8907: use devm_request_threaded_irq()
    rtc: rtc-da9052: use devm_request_threaded_irq()
    rtc: rtc-wm831x: use devm_request_threaded_irq()
    rtc: rtc-tps80031: use devm_request_threaded_irq()
    rtc: rtc-lp8788: use devm_request_threaded_irq()
    rtc: rtc-coh901331: use devm_clk_get()
    rtc: rtc-vt8500: use devm_*() functions
    rtc: rtc-tps6586x: use devm_request_threaded_irq()
    rtc: rtc-imxdi: use devm_clk_get()
    rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()
    rtc: rtc-pcf8583: use dev_warn() instead of printk()
    rtc: rtc-sun4v: use pr_warn() instead of printk()
    rtc: rtc-vr41xx: use dev_info() instead of printk()
    rtc: rtc-rs5c313: use pr_err() instead of printk()
    rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()
    rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()
    rtc: rtc-ds2404: use dev_err() instead of printk()
    rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()
    ...

    Linus Torvalds
     
  • Create a helper function that checks whether a backing device requires
    stable page writes and, if so, performs the necessary wait. Then, make it so
    that all points in the memory manager that handle making pages writable
    use the helper function. This should provide stable page write support
    to most filesystems, while eliminating unnecessary waiting for devices
    that don't require the feature.

    Before this patchset, all filesystems would block, regardless of whether
    or not it was necessary. ext3 would wait, but still generate occasional
    checksum errors. The network filesystems were left to do their own
    thing, so they'd wait too.

    After this patchset, all the disk filesystems except ext3 and btrfs will
    wait only if the hardware requires it. ext3 (if necessary) snapshots
    pages instead of blocking, and btrfs provides its own bdi so the mm will
    never wait. Network filesystems haven't been touched, so either they
    provide their own stable page guarantees or they don't block at all.
    The blocking behavior is back to what it was before 3.0 if you don't
    have a disk requiring stable page writes.

    Here's the result of using dbench to test latency on ext2
    (AvgLat and MaxLat in ms):

    3.8.0-rc3:
    Operation      Count    AvgLat    MaxLat
    ----------------------------------------
    WriteX        109347     0.028    59.817
    ReadX         347180     0.004     3.391
    Flush          15514    29.828   287.283

    Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms

    3.8.0-rc3 + patches:
    Operation      Count    AvgLat    MaxLat
    ----------------------------------------
    WriteX        105556     0.029     4.273
    ReadX         335004     0.005     4.112
    Flush          14982    30.540   298.634

    Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms

    As you can see, the maximum write (WriteX) latency drops considerably
    with this patch enabled, from 59.817 ms to 4.273 ms. The other
    filesystems (ext3/ext4/xfs/btrfs) behave similarly, but see the cover
    letter for those results.
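    To put the ext2 figures in proportion, the relative changes can be
    computed directly from the reported values. This is plain arithmetic on
    the numbers above, not additional measurements; rel_change() is a
    hypothetical helper.

```c
#include <assert.h>

/* Relative change from before to after, as a fraction (-0.1 == -10%). */
static double rel_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before;
}

/* Figures copied from the ext2 dbench runs above. */
static const double writex_maxlat_before = 59.817;  /* ms */
static const double writex_maxlat_after  = 4.273;   /* ms */
static const double throughput_before    = 57.429;  /* MB/sec */
static const double throughput_after     = 55.4496; /* MB/sec */
```

    The WriteX MaxLat change works out to roughly -93% (about 14 times
    lower), while throughput changes by roughly -3.4%.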

    Signed-off-by: Darrick J. Wong
    Acked-by: Steven Whitehouse
    Reviewed-by: Jan Kara
    Cc: Adrian Hunter
    Cc: Andy Lutomirski
    Cc: Artem Bityutskiy
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: Jens Axboe
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     

05 Feb, 2013

1 commit

  • There is a situation in which GC can work in the background alone,
    without any other filesystem activity, for a significant time.

    The nilfs_clean_segments() method calls nilfs_segctor_construct(), which
    updates the superblocks when the NILFS_SC_SUPER_ROOT and
    THE_NILFS_DISCONTINUED flags are set. But when GC is working alone,
    nilfs_clean_segments() is called with the THE_NILFS_DISCONTINUED flag
    unset. As a result, the superblocks are not updated during this whole
    period, and in the case of SPOR (sudden power-off recovery) they keep
    very old values for the last super root placement.

    SYMPTOMS:

    Trying to mount a NILFS2 volume after SPOR in such an environment ends
    with a very long mounting time (it can reach several hours in some
    cases).

    REPRODUCING PATH:

    1. Use an external USB HDD, disable automount, and do not perform any
    other filesystem activity on the NILFS2 volume.

    2. Generate a temporary file of about 100 - 500 GB (for example,
    dd if=/dev/zero of= bs=1073741824 count=200). The size of the
    file determines how long GC runs.

    3. Delete the file.

    4. Start GC manually with the command "nilfs-clean -p 0". When GC is
    started this way, the superblocks are updated only once, at the end.
    So, to simulate SPOR, wait for some time (15 - 40 minutes) and then
    simply switch off the USB HDD manually.

    5. Switch the USB HDD on again and try to mount the NILFS2 volume. As a
    result, mounting the NILFS2 volume takes a very long time.

    REPRODUCIBILITY: 100%

    FIX:

    This patch adds a check of whether the superblocks need to be updated
    and, if so, sets the THE_NILFS_DISCONTINUED flag before calling
    nilfs_clean_segments().

    Reported-by: Sergey Alexandrov
    Signed-off-by: Vyacheslav Dubeyko
    Tested-by: Vyacheslav Dubeyko
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     

12 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: KONISHI Ryusuke
    Signed-off-by: Kees Cook
    Acked-by: Ryusuke Konishi

    Kees Cook
     

21 Dec, 2012

1 commit


12 Dec, 2012

1 commit

  • Overhaul struct address_space.assoc_mapping by renaming it to
    address_space.private_data and redefining its type as void *. This
    consistently names the .private_* elements of struct address_space and
    also allows extended usage of address_space association with other data
    structures through ->private_data.

    Also, all users of the old ->assoc_mapping element are converted to
    reflect its new name and type (->private_data).
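    Because the field is now untyped, a mapping can point either at another
    address_space (the old assoc_mapping use) or at some other structure,
    with the owner casting it back. A minimal sketch with hypothetical
    model_* types; this is not the real struct address_space layout.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct address_space. */
struct model_address_space {
    const char *name;
    void *private_data; /* formerly a typed assoc_mapping pointer */
};

/* Hypothetical owner data that some subsystem might attach. */
struct model_owner_info { int nr_pages; };

/* The owner knows what it stored and casts the result back itself. */
static void *model_get_private(const struct model_address_space *m)
{
    assert(m != NULL);
    return m->private_data;
}
```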

    Signed-off-by: Rafael Aquini
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini