02 Oct, 2013

2 commits


01 Oct, 2013

4 commits

  • Pull NFS client bugfixes from Trond Myklebust:
    - Stable fix for Oopses in the pNFS files layout driver
    - Fix a regression when doing a non-exclusive file create on NFSv4.x
    - NFSv4.1 security negotiation fixes when looking up the root
    filesystem
    - Fix a memory ordering issue in the pNFS files layout driver

    * tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFS: Give "flavor" an initial value to fix a compile warning
    NFSv4.1: try SECINFO_NO_NAME flavs until one works
    NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
    NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
    NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton.

    * emailed patches from Andrew Morton: (22 commits)
    pidns: fix free_pid() to handle the first fork failure
    ipc,msg: prevent race with rmid in msgsnd,msgrcv
    ipc/sem.c: update sem_otime for all operations
    mm/hwpoison: fix the lack of one reference count against poisoned page
    mm/hwpoison: fix false report on 2nd attempt at page recovery
    mm/hwpoison: fix test for a transparent huge page
    mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
    block: change config option name for cmdline partition parsing
    mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
    mm: avoid reinserting isolated balloon pages into LRU lists
    arch/parisc/mm/fault.c: fix uninitialized variable usage
    include/asm-generic/vtime.h: avoid zero-length file
    nilfs2: fix issue with race condition of competition between segments for dirty blocks
    Documentation/kernel-parameters.txt: replace kernelcore with Movable
    mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
    kernel/kmod.c: check for NULL in call_usermodehelper_exec()
    ipc/sem.c: synchronize the proc interface
    ipc/sem.c: optimize sem_lock()
    ipc/sem.c: fix race in sem_lock()
    mm/compaction.c: periodically schedule when freeing pages
    ...

    Linus Torvalds
     
  • Many NILFS2 users have reported strange file system corruption
    (for example):

    NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
    NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

    But such error messages are the consequence of a file system issue that
    took place much earlier. Fortunately, Jerome Poulin
    and Anton Eliasson reported another
    issue not long ago. These reports describe a crash of the segctor
    thread:

    BUG: unable to handle kernel paging request at 0000000000004c83
    IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

    Call Trace:
    nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
    nilfs_segctor_construct+0x17b/0x290 [nilfs2]
    nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
    kthread+0xc0/0xd0
    ret_from_fork+0x7c/0xb0

    These two issues have the same cause. This cause can trigger a third
    issue too: the segctor thread hangs while consuming
    100% CPU.

    REPRODUCING PATH:

    One possible way of reproducing the issue was described by
    Jerome Poulin:

    1. init S to get to single user mode.
    2. sysrq+E to make sure only my shell is running
    3. start network-manager to get my wifi connection up
    4. login as root and launch "screen"
    5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
    6. lscp | xz -9e > lscp.txt.xz
    7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
    8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
    9. start a screen and launch strace -f -o find-cat.log -t find
    /mnt/nilfs -type f -exec cat {} > /dev/null \;
    10. start a screen and launch strace -f -o apt-get.log -t apt-get update
    11. launch the last command again as it did not crash the first time
    12. apt-get crashes
    13. ps aux > ps-aux-crashed.log
    14. sysrq+W
    15. sysrq+E wait for everything to terminate
    16. sysrq+SUSB

    A simpler way to reproduce the issue is to start a kernel compilation
    task and "apt-get update" in parallel.

    REPRODUCIBILITY:

    The issue does not reproduce reliably [60% - 80%]. It is very important to
    have a proper environment for reproducing it. The critical
    conditions for successful reproduction are:

    (1) There should be a big file modified by means of mmap().

    (2) From time to time during processing, this file should have a count
    of dirty blocks greater than several segments in size (for example, two
    or three).

    (3) There should be intensive background file modification activity
    in another thread.

    INVESTIGATION:

    First of all, it is possible to see that the cause of the crash is an
    invalid page address:

    NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
    NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783

    Moreover, the value of b_page (0x1a82) is 6786. This value looks like a
    segment number. And the b_blocknr and b_size values look like block
    numbers. So, the buffer_head pointer points to an invalid address.

    Detailed investigation of the issue revealed the following picture:

    [-----------------------------SEGMENT 6783-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783

    [-----------------------------SEGMENT 6784-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
    NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
    NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
    [----------] ditto
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15

    [-----------------------------SEGMENT 6785-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
    NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
    [----------] ditto
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12

    NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785

    NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82

    BUG: unable to handle kernel paging request at 0000000000001a82
    IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

    Usually, for every segment we collect dirty files in a list. Then, dirty
    blocks are gathered for every dirty file, prepared for write and
    submitted by means of a nilfs_segbuf_submit_bh() call. Finally, the
    complete write phase takes place after nilfs_end_bio_write() is called
    by the block layer. In this final phase, buffers/pages are marked as not
    dirty and processed files are removed from the list of dirty files.

    It is possible to see that we had three prepare_write and submit_bio
    phases before the segbuf_wait and complete_write phases. Moreover,
    segments compete with each other for dirty blocks because on every
    iteration of segment processing dirty buffer_heads are added to several
    payload_buffers lists:

    [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
    [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8

    The next pointer is the same but the prev pointer has changed. It means
    that the buffer_head has a next pointer from one list but a prev pointer
    from another. Such a modification can be made several times. And,
    finally, it can result in various issues: (1) segctor hanging, (2)
    segctor crashing, (3) file system metadata corruption.

    FIX:
    This patch adds:

    (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
    for every processed dirty block;

    (2) checking of BH_Async_Write flag in
    nilfs_lookup_dirty_data_buffers() and
    nilfs_lookup_dirty_node_buffers();

    (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
    nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
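
    The three steps above can be modelled in user space. The following is
    a minimal sketch, not the nilfs2 code: struct buf, collect_dirty(),
    prepare_write() and complete_write() are hypothetical names, and the
    "queued" counter stands in for membership in a payload_buffers list.
    It only illustrates why a buffer that is flagged as in flight cannot
    be picked up by a second segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: only the state this sketch needs. */
struct buf {
    bool dirty;        /* models the buffer's dirty bit */
    bool async_write;  /* models BH_Async_Write */
    int  queued;       /* number of payload lists the buffer was added to */
};

/* Models the lookup step: queue dirty buffers that no segment owns yet. */
static size_t collect_dirty(struct buf *bufs, size_t n)
{
    size_t taken = 0;

    for (size_t i = 0; i < n; i++) {
        if (!bufs[i].dirty || bufs[i].async_write)
            continue;              /* step (2): skip buffers already in flight */
        bufs[i].queued++;
        taken++;
    }
    return taken;
}

/* Models prepare_write: mark every queued buffer as in flight (step (1)). */
static void prepare_write(struct buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].queued > 0)
            bufs[i].async_write = true;
}

/* Models complete_write: the write finished, release the buffers (step (3)). */
static void complete_write(struct buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].queued == 0)
            continue;
        assert(bufs[i].queued == 1);   /* never on two lists at once */
        bufs[i].queued = 0;
        bufs[i].dirty = false;
        bufs[i].async_write = false;
    }
}
```

    In this model, a second collect_dirty() pass issued before
    complete_write() finds nothing to take, which is the behaviour the
    flag check is meant to give the next segment.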

    Reported-by: Jerome Poulin
    Reported-by: Anton Eliasson
    Cc: Paul Fertser
    Cc: ARAI Shun-ichi
    Cc: Piotr Szymaniak
    Cc: Juan Barry Manuel Canham
    Cc: Zahid Chowdhury
    Cc: Elmer Zhang
    Cc: Kenneth Langga
    Signed-off-by: Vyacheslav Dubeyko
    Acked-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • A high setting of max_map_count, and a process core-dumping with a large
    enough vm_map_count could result in an NT_FILE note not being written,
    and the kernel crashing immediately later because it has assumed
    otherwise.

    Reproduction of the oops-causing bug described here:

    https://lkml.org/lkml/2013/8/30/50

    The issue originated in commit 2aa362c49c31 ("coredump: extend core dump
    note section to contain file names of mapped file") from Oct 4, 2012.

    This patch makes that section optional in that case. fill_files_note()
    should signify the error, and also let the info struct in
    elf_core_dump() be zero-initialized so that we can check for the
    optionally written note.
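
    The idea of an optional note guarded by a zero-initialized struct can
    be sketched as follows. This is an illustration under assumptions, not
    the binfmt_elf code: struct note, struct dump_info, MAX_NOTE_SIZE and
    count_notes() are hypothetical, and only the name fill_files_note() is
    borrowed from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NOTE_SIZE 64   /* hypothetical cap, chosen only for the sketch */

struct note {
    const void *data;      /* NULL means "this note was not filled in" */
    size_t size;
};

struct dump_info {
    struct note prstatus;  /* always present in this sketch */
    struct note files;     /* optional */
};

/* Returns 0 on success, -1 if the payload is too large (note left untouched). */
static int fill_files_note(struct note *n, const char *names, size_t size)
{
    assert(n != NULL && names != NULL);
    if (size > MAX_NOTE_SIZE)
        return -1;
    n->data = names;
    n->size = size;
    return 0;
}

/* Returns how many notes would be written for the given file-name payload. */
static int count_notes(const char *names, size_t size)
{
    struct dump_info info;
    int written = 0;

    /* Zero-initialize so that an unfilled note can be detected later. */
    memset(&info, 0, sizeof(info));

    info.prstatus.data = "status";
    info.prstatus.size = 6;
    written++;

    /* A failure here is tolerated: the files note simply stays empty. */
    if (fill_files_note(&info.files, names, size) != 0)
        assert(info.files.data == NULL);

    if (info.files.data != NULL)   /* write the optional note only if filled */
        written++;
    return written;
}
```

    With a payload under the cap, count_notes() reports two notes; with an
    oversized payload it reports one and never touches the unfilled entry.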

    [akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
    [akpm@linux-foundation.org: fix sparse warning]
    Signed-off-by: Dan Aloni
    Cc: Al Viro
    Cc: Denys Vlasenko
    Reported-by: Martin MOKREJS
    Tested-by: Martin MOKREJS
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Aloni
     

30 Sep, 2013

7 commits


29 Sep, 2013

1 commit

  • Pull xfs bugfixes from Ben Myers:
    - fix for directory node collapse regression
    - fix for recovery over stale on disk structures
    - fix for eofblocks ioctl
    - fix asserts in xfs_inode_free
    - lock the ail before removing an item from it

    * tag 'xfs-for-linus-v3.12-rc3' of git://oss.sgi.com/xfs/xfs:
    xfs: fix node forward in xfs_node_toosmall
    xfs: log recovery lsn ordering needs uuid check
    xfs: fix XFS_IOC_FREE_EOFBLOCKS definition
    xfs: asserting lock not held during freeing not valid
    xfs: lock the AIL before removing the buffer item

    Linus Torvalds
     

28 Sep, 2013

1 commit


26 Sep, 2013

2 commits

  • Commit f5ea1100 cleans up the disk to host conversions for
    node directory entries, but because a variable is reused in
    xfs_node_toosmall() the next node is not correctly found.
    If the original node is small enough (< BBTOB(bp->b_length),
    file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569

    Keep the original node header to get the correct forward node.

    (When a node is considered for a merge with a sibling, it overwrites the
    sibling pointers of the original incore nodehdr with the sibling's
    pointers. This leads the loop to consider the original node as a merge
    candidate with itself in the second pass, and so it incorrectly
    determines a merge should occur.)

    Signed-off-by: Mark Tinguely
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    [v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
    cleaned up whitespace. -bpm]

    Mark Tinguely
     
  • Determine if we've created a new file by examining the directory change
    attribute and/or the O_EXCL flag.

    This fixes a regression when doing a non-exclusive create of a new file.
    If the FILE_CREATED flag is not set, the atomic_open() command will
    perform full file access permissions checks instead of just checking
    for MAY_OPEN.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

25 Sep, 2013

7 commits

  • Merge fixes from Andrew Morton:
    "Bunch of fixes.

    And a reversion of mhocko's "Soft limit rework" patch series. This is
    actually your fault for opening the merge window when I was off racing ;)

    I didn't read the email thread before sending everything off.
    Johannes Weiner raised significant issues:

    http://www.spinics.net/lists/cgroups/msg08813.html

    and we agreed to back it all out"

    I clearly need to be more aware of Andrew's racing schedule.

    * akpm:
    MAINTAINERS: update mach-bcm related email address
    checkpatch: make extern in .h prototypes quieter
    cciss: fix info leak in cciss_ioctl32_passthru()
    cpqarray: fix info leak in ida_locked_ioctl()
    kernel/reboot.c: re-enable the function of variable reboot_default
    audit: fix endless wait in audit_log_start()
    revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code"
    revert "memcg: get rid of soft-limit tree infrastructure"
    revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim"
    revert "memcg: enhance memcg iterator to support predicates"
    revert "memcg: track children in soft limit excess to improve soft limit"
    revert "memcg, vmscan: do not attempt soft limit reclaim if it would not scan anything"
    revert "memcg: track all children over limit in the root"
    revert "memcg, vmscan: do not fall into reclaim-all pass too quickly"
    fs/ocfs2/super.c: use a bigger nodestr in ocfs2_dismount_volume
    watchdog: update watchdog_thresh properly
    watchdog: update watchdog attributes atomically

    Linus Torvalds
     
  • While printing 32-bit node numbers, an 8-byte string is not enough.
    Increase the size of the string to 12 chars.

    This got left out in commit 49fa8140e487 ("fs/ocfs2/super.c: Use bigger
    nodestr to accomodate 32-bit node numbers").
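
    The buffer sizes can be checked with a little arithmetic: the largest
    32-bit value has 10 decimal digits, so it needs 11 bytes including the
    terminating NUL. A small sketch, with hypothetical helper names
    node_num_len() and fits():

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Characters needed to print node_num in decimal, excluding the NUL. */
static int node_num_len(uint32_t node_num)
{
    /* snprintf() with a zero-sized buffer only reports the needed length. */
    return snprintf(NULL, 0, "%u", (unsigned int)node_num);
}

/* Returns nonzero if node_num plus its NUL terminator fits in buf_size bytes. */
static int fits(uint32_t node_num, size_t buf_size)
{
    int len = node_num_len(node_num);

    assert(len > 0);
    return (size_t)len + 1 <= buf_size;
}
```

    fits(UINT32_MAX, 8) is false while fits(UINT32_MAX, 12) is true, which
    matches the move from an 8-byte string to 12 chars.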

    Signed-off-by: Goldwyn Rodrigues
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     
  • The memcpy() in bio_copy_data() was using the wrong offset vars, leading
    to data corruption in weird unusual setups.
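
    For illustration, copying between two scatter lists needs an
    independent offset on each side. The sketch below is a user-space
    model, not bio_copy_data() itself: struct seg and copy_segments() are
    hypothetical, and the segments are plain byte ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment: a byte range, standing in for one vector entry. */
struct seg {
    unsigned char *base;
    size_t len;
};

/* Copies until either side runs out of space; returns the bytes copied. */
static size_t copy_segments(struct seg *dst, size_t ndst,
                            const struct seg *src, size_t nsrc)
{
    size_t di = 0, si = 0;       /* current segment on each side */
    size_t doff = 0, soff = 0;   /* offset inside the current segment */
    size_t copied = 0;

    assert(dst != NULL && src != NULL);
    while (di < ndst && si < nsrc) {
        size_t dleft = dst[di].len - doff;
        size_t sleft = src[si].len - soff;
        size_t n = dleft < sleft ? dleft : sleft;

        /* Each side must be indexed with its own offset. */
        memcpy(dst[di].base + doff, src[si].base + soff, n);
        copied += n;
        doff += n;
        soff += n;

        if (doff == dst[di].len) { di++; doff = 0; }
        if (soff == src[si].len) { si++; soff = 0; }
    }
    return copied;
}
```

    Swapping doff and soff in the memcpy() call goes unnoticed when both
    sides have identical segment layouts and only corrupts data when the
    layouts differ, which fits the "unusual setups" remark above.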

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: linux-stable # >= v3.9
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • After a fair number of xfstests runs, xfs/182 started to fail
    regularly with a corrupted directory - a directory read verifier was
    failing after recovery because it found a block with a XARM magic
    number (remote attribute block) rather than a directory data block.

    The first time I saw this repeated failure I did /something/ and the
    problem went away, so I was never able to find the underlying
    problem. Test xfs/182 failed again today, and I found the root
    cause before I did /something else/ that made it go away.

    Tracing indicated that the block in question was being correctly
    logged, the log was being flushed by sync, but the buffer was not
    being written back before the shutdown occurred. Tracing also
    indicated that log recovery was also reading the block, but then
    never writing it before log recovery invalidated the cache,
    indicating that it was not modified by log recovery.

    More detailed analysis of the corpse indicated that the filesystem
    had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
    block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
    indicated it was a block from an older filesystem. The reason that
    log recovery didn't replay it was that the LSN in the XARM block was
    larger than the LSN of the transaction being replayed, and so the
    block was not overwritten by log recovery.

    Hence, log recovery can't blindly trust the magic number and LSN in
    the block - it must verify that it belongs to the filesystem being
    recovered before using the LSN. i.e. if the UUIDs don't match, we
    need to unconditionally recover the change held in the log.
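
    The decision described above can be sketched as a small predicate.
    This is a simplified model, not the XFS recovery code: struct blk_hdr
    and should_replay() are hypothetical, and treating an equal LSN as
    "already up to date" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header: just the two fields the check needs. */
struct blk_hdr {
    unsigned char uuid[16];
    uint64_t lsn;
};

/* Returns true if the logged change at trans_lsn should be replayed. */
static bool should_replay(const struct blk_hdr *blk,
                          const unsigned char fs_uuid[16],
                          uint64_t trans_lsn)
{
    assert(blk != NULL && fs_uuid != NULL);

    /* A block from another filesystem: its LSN means nothing here. */
    if (memcmp(blk->uuid, fs_uuid, 16) != 0)
        return true;

    /* Same filesystem: replay only if the block is older than the log.
     * (Assumption: an equal LSN counts as already up to date.) */
    return blk->lsn < trans_lsn;
}
```

    A stale block with a large LSN is therefore overwritten, while a block
    that belongs to the filesystem and is newer than the transaction is
    left alone.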

    This patch was first tested on a block device that was repeatedly
    causing xfs/182 to fail with the same failure on the same block with
    the same directory read corruption signature (i.e. XARM block). It
    did not fail, and hasn't failed since.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The XFS_IOC_FREE_EOFBLOCKS definition uses a kernel internal structure
    rather than the user visible structure that is passed to the ioctl.
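
    Why the structure named in the definition matters: on Linux the
    _IOR()/_IOW() macros encode the size of the argument type into the
    command number, so naming a different struct yields a different
    command. A sketch with made-up structures and a made-up command
    ('D', 1); none of these names come from XFS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical user-visible argument structure. */
struct demo_user_args {
    uint32_t version;
    uint32_t flags;
};

/* Hypothetical kernel-internal structure with an extra field. */
struct demo_kernel_args {
    uint32_t version;
    uint32_t flags;
    uint64_t internal_state;
};

/* Same type letter and number, different argument type. */
#define DEMO_IOC_USER   _IOR('D', 1, struct demo_user_args)
#define DEMO_IOC_KERNEL _IOR('D', 1, struct demo_kernel_args)

/* Returns true if both definitions produce the same command number. */
static bool demo_same_command(void)
{
    return (unsigned long)DEMO_IOC_USER == (unsigned long)DEMO_IOC_KERNEL;
}
```

    Because the two structures differ in size, demo_same_command() returns
    false: a program built against one definition would issue a command
    that a kernel built against the other does not recognise.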

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • When we free an inode, we do so via RCU. As an RCU lookup can occur
    at any time before we free an inode, and that lookup takes the inode
    flags lock, we cannot safely assert that the flags lock is not held
    just before marking it dead and running call_rcu() to free the
    inode.

    We check on allocation of a new inode structure that the lock is not
    held, so we still have protection against locks being leaked and
    hence not correctly initialised when allocated out of the slab.
    Hence just remove the assert...

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Regression introduced by commit 46f9d2e ("xfs: aborted buf items can
    be in the AIL") which fails to lock the AIL before removing the
    item. Spinlock debugging throws a warning about this.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

24 Sep, 2013

3 commits

  • There are two locks involved in managing the journal lists. The general
    reiserfs_write_lock and the journal->j_flush_mutex.

    While flush_journal_list is sleeping to acquire the j_flush_mutex or to
    submit a block for write, it will drop the write lock. This allows
    another thread to acquire the write lock and ultimately call
    flush_used_journal_lists to traverse the list of journal lists and
    select one for flushing. It can select the journal_list that has just
    had flush_journal_list called on it in the original thread and call it
    again with the same journal_list.

    The second thread then drops the write lock to acquire j_flush_mutex and
    the first thread reacquires it and continues execution and eventually
    clears and frees the journal list before dropping j_flush_mutex and
    returning.

    The second thread acquires j_flush_mutex and ends up operating on a
    journal_list that has already been released. If the memory hasn't
    been reused, we'll soon after hit a BUG_ON because the transaction id
    has already been cleared. If it's been reused, we'll crash in other
    fun ways.

    Since flush_journal_list will synchronize on j_flush_mutex, we can fix
    the race by taking a proper reference in flush_used_journal_lists
    and checking to see if it's still valid after the mutex is taken. It's
    safe to iterate the list of journal lists and pick a list with
    just the write lock as long as a reference is taken on the journal list
    before we drop the lock. We already have code to handle whether a
    transaction has been flushed already so we can use that to handle the
    race and get rid of the trans_id BUG_ON.
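
    The "take a reference, then revalidate under the mutex" pattern can be
    walked through in a single-threaded model. This is only a sketch:
    struct journal_list here is a hypothetical two-field stand-in, and
    get_list(), put_list() and flush_list() are illustrative names, not
    the reiserfs functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a journal list. */
struct journal_list {
    int refcount;
    unsigned long trans_id;   /* 0 means "already flushed" in this sketch */
};

/* Pin the list; in the scenario above this happens under the write lock. */
static void get_list(struct journal_list *jl)
{
    jl->refcount++;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool put_list(struct journal_list *jl)
{
    assert(jl->refcount > 0);
    return --jl->refcount == 0;
}

/* Called with the flush mutex held; returns true if this call flushed. */
static bool flush_list(struct journal_list *jl, unsigned long expected_id)
{
    assert(expected_id != 0);
    if (jl->trans_id != expected_id)
        return false;         /* someone else flushed it while we waited */
    jl->trans_id = 0;
    return true;
}
```

    If a second caller pins the list and remembers its trans_id before the
    first caller flushes, the first put_list() does not free the list, and
    the second caller's flush_list() returns false instead of touching
    released memory.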

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Jan Kara

    Jeff Mahoney
     
  • Commit a3172027 introduced test_transaction as a requirement for
    flushing old lists -- but it can never return 1 unless the transaction
    has already been flushed.

    As a result, we have a routine that iterates the j_realblocks list but
    doesn't actually do anything. Since it's been this way since 2006 and
    the latency numbers were what Chris expected, let's just rip it out.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Jan Kara

    Jeff Mahoney
     
  • A user has reported an oops in udf_statfs() that was caused by
    numOfPartitions entry in LVID structure being corrupted. Fix the problem
    by verifying whether numOfPartitions makes sense at least to the extent
    that LVID fits into a single block as it should.
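
    A bounds check of this kind might look like the sketch below. It is
    not the UDF code: lvid_fits() is a hypothetical helper, the header
    size is passed in rather than taken from a real structure, and the
    layout (two 32-bit table entries per partition plus an
    implementation-use area) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the descriptor described by these fields fits in a block. */
static bool lvid_fits(uint32_t hdr_size, uint32_t num_parts,
                      uint32_t imp_use_len, uint32_t block_size)
{
    uint64_t need;

    assert(block_size > 0);

    /* Reject obviously corrupt fields before doing any arithmetic. */
    if (hdr_size > block_size || num_parts >= block_size ||
        imp_use_len >= block_size)
        return false;

    /* Assumed layout: header, two u32 table entries per partition,
     * then the implementation-use bytes. 64-bit math avoids overflow. */
    need = (uint64_t)hdr_size + imp_use_len +
           2ULL * num_parts * sizeof(uint32_t);
    return need <= block_size;
}
```

    A corrupted partition count such as 0xFFFFFFFF is rejected up front,
    and a merely oversized count fails the final size comparison.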

    Reported-by: Juergen Weigert
    Signed-off-by: Jan Kara

    Jan Kara
     

23 Sep, 2013

2 commits

  • Pull block IO fixes from Jens Axboe:
    "After merge window, no new stuff this time only a collection of neatly
    confined and simple fixes"

    * 'for-3.12/core' of git://git.kernel.dk/linux-block:
    cfq: explicitly use 64bit divide operation for 64bit arguments
    block: Add nr_bios to block_rq_remap tracepoint
    If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
    bio-integrity: Fix use of bs->bio_integrity_pool after free
    blkcg: relocate root_blkg setting and clearing
    block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
    block: trace all devices plug operation

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "These are mostly bug fixes and two small performance fixes. The
    most important of the bunch are Josef's fix for a snapshotting
    regression and Mark's update to fix compile problems on arm"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
    Btrfs: create the uuid tree on remount rw
    btrfs: change extent-same to copy entire argument struct
    Btrfs: dir_inode_operations should use btrfs_update_time also
    btrfs: Add btrfs: prefix to kernel log output
    btrfs: refuse to remount read-write after abort
    Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
    Btrfs: don't leak transaction in btrfs_sync_file()
    Btrfs: add the missing mutex unlock in write_all_supers()
    Btrfs: iput inode on allocation failure
    Btrfs: remove space_info->reservation_progress
    Btrfs: kill delay_iput arg to the wait_ordered functions
    Btrfs: fix worst case calculator for space usage
    Revert "Btrfs: rework the overcommit logic to be based on the total size"
    Btrfs: improve replacing nocow extents
    Btrfs: drop dir i_size when adding new names on replay
    Btrfs: replay dir_index items before other items
    Btrfs: check roots last log commit when checking if an inode has been logged
    Btrfs: actually log directory we are fsync()'ing
    Btrfs: actually limit the size of delalloc range
    Btrfs: allocate the free space by the existed max extent size when ENOSPC
    ...

    Linus Torvalds
     

21 Sep, 2013

11 commits

  • Users have been complaining of the uuid tree stuff warning that there is no uuid
    root when trying to do snapshot operations. This is because if you mount -o ro
    we will not create the uuid tree. But then if you mount -o rw,remount we will
    still not create it and then any subsequent snapshot/subvol operations you try
    to do will fail gloriously. Fix this by creating the uuid_root on remount rw if
    it was not already there. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data
    back to its argument struct. Unfortunately, not all architectures provide
    __put_user_unaligned(), so compiles break on them if btrfs is selected.

    Instead, just copy the whole struct in / out at the start and end of
    operations, respectively.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Mark Fasheh
     
  • Commit 2bc5565286121d2a77ccd728eb3484dff2035b58 (Btrfs: don't update atime on
    RO subvolumes) ensures that the access time of an inode is not updated when
    the inode lives in a read-only subvolume.
    However, if a directory on a read-only subvolume is accessed, the atime is
    updated. This results in a write operation to a read-only subvolume. I
    believe that access times should never be updated on read-only subvolumes.

    To reproduce:

    # mkfs.btrfs -f /dev/dm-3
    (...)
    # mount /dev/dm-3 /mnt
    # btrfs subvol create /mnt/sub
    Create subvolume '/mnt/sub'
    # mkdir /mnt/sub/dir
    # echo "abc" > /mnt/sub/dir/file
    # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap
    Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap'
    # stat /mnt/rosnap/dir
    File: `/mnt/rosnap/dir'
    Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:21:49.389157126 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400
    # ls /mnt/rosnap/dir
    file
    # stat /mnt/rosnap/dir
    File: `/mnt/rosnap/dir'
    Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:22:56.797151670 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400

    Reported-by: Koen De Wit
    Signed-off-by: Guangyu Sun
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Guangyu Sun
     
  • The kernel log entries for device label %s and device fsid %pU
    are missing the btrfs: prefix. Add those here.

    Signed-off-by: Frank Holton
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Frank Holton
     
  • It's still possible to flip the filesystem into RW mode after it's
    remounted RO due to an abort. There are lots of places that check for
    the superblock error bit and will not write data, but we should not let
    the filesystem appear read-write.

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
     
  • This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default
    subvolume by passing a subvolume id of 0.

    Signed-off-by: chandan
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    chandan
     
  • In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns
    a negative error (e.g. -ENOMEM via btrfs_log_inode()), we would
    return without ending/freeing the transaction.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • The BUG() was replaced by btrfs_error() and return -EIO with the
    patch "get rid of one BUG() in write_all_supers()", but the missing
    mutex_unlock() was overlooked.

    The 0-DAY kernel build service from Intel reported the missing
    unlock which was found by the coccinelle tool:

    fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374
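
    The class of bug reported here, an early return that skips the
    unlock, is commonly avoided by funnelling every exit through one
    label. A user-space sketch, with a hypothetical struct fake_mutex and
    write_supers() standing in for the real lock and function:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock that only records whether it is held. */
struct fake_mutex {
    int held;
};

static void fake_lock(struct fake_mutex *m)
{
    assert(!m->held);
    m->held = 1;
}

static void fake_unlock(struct fake_mutex *m)
{
    assert(m->held);
    m->held = 0;
}

/* Returns 0 on success or -EIO when too many errors were seen. */
static int write_supers(struct fake_mutex *m, int total_errors, int max_errors)
{
    int ret = 0;

    fake_lock(m);
    if (total_errors > max_errors) {
        ret = -EIO;
        goto out;          /* error path still reaches the unlock below */
    }
    /* ... the actual work would happen here ... */
out:
    fake_unlock(m);
    return ret;
}
```

    Both the success path and the -EIO path leave the lock released,
    which is the property the coccinelle report was checking for.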

    Signed-off-by: Stefan Behrens
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Stefan Behrens
     
  • We don't do the iput when we fail to allocate our delayed delalloc work in
    __start_delalloc_inodes, fix this.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This isn't used for anything anymore, just remove it.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This is a leftover of how we used to wait for ordered extents, which was to
    grab the inode and then run filemap flush on it. However if we have an ordered
    extent then we already are holding a ref on the inode, and we just use
    btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
    the inode to start work on the ordered extent. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik