09 Oct, 2014

1 commit

  • Some UDF media have special inodes (like VAT or metadata partition
    inodes) whose link_count is 0. Thus commit 4071b9136223 (udf: Properly
    detect stale inodes) broke loading these inodes because udf_iget()
    started returning -ESTALE for them. Since we still need to properly
    detect stale inodes queried by NFS, create two variants of udf_iget() -
    one which is used for looking up special inodes (which ignores
    link_count == 0) and one which is used for other cases which return
    ESTALE when link_count == 0.

    Fixes: 4071b913622316970d0e1919f7d82b4403fec5f2
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     

29 Sep, 2014

1 commit


17 Sep, 2014

1 commit

  • Currently write(2) updating i_size and close(2) of the file can race in
    such a way that udf_truncate_tail_extent() called from
    udf_file_release() sees old i_size but already new extents added by the
    running write call. This results in complaints like:
    UDF-fs: warning (device vdb2): udf_truncate_tail_extent: Too long extent
    after EOF in inode 877: i_size: 0 lbcount: 1073739776 extent 0+1073739776
    UDF-fs: error (device vdb2): udf_truncate_tail_extent: Extent after EOF
    in inode 877

    Fix the problem by grabbing i_mutex in udf_file_release() to be sure
    i_size is consistent with current state of extent list. Also avoid
    truncating tail extent unnecessarily when the file is still open for
    writing.

    Signed-off-by: Jan Kara

    Jan Kara
     

05 Sep, 2014

6 commits

  • Signed-off-by: Al Viro
    Signed-off-by: Jan Kara

    Al Viro
     
  • Currently udf_iget() (triggered by NFS) can race with udf_new_inode()
    leading to two inode structures with the same inode number:

    nfsd: iget_locked() creates inode
    nfsd: try to read from disk, block on that.
    udf_new_inode(): allocate inode with that inumber
    udf_new_inode(): insert it into icache, set it up and dirty
    udf_write_inode(): write inode into buffer cache
    nfsd: get CPU again, look into buffer cache, see nice and sane on-disk
    inode, set the in-core inode from it

    Fix the problem by putting inode into icache in locked state (I_NEW set)
    and unlocking it only after it's fully set up.

    Signed-off-by: Al Viro
    Signed-off-by: Jan Kara

    Al Viro
     
  • boilerplate code in udf_{create,mknod,symlink} taken to new helper

    symlink case converted to unique id calculated by udf_new_inode() - no
    point finding a new one.

    Signed-off-by: Al Viro
    Signed-off-by: Jan Kara

    Al Viro
     
  • Currently UDF doesn't initialize i_generation in any way and thus NFS
    can easily get reallocated inodes from stale file handles. Luckily UDF
    already has a unique object identifier associated with each inode -
    i_unique. Use that for initialization of i_generation.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • NFS can easily ask for inodes that are already deleted. Currently UDF
    happily returns such inodes which is a bug. Return -ESTALE if
    udf_read_inode() is asked to read deleted inode.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Currently __udf_read_inode() wasn't returning anything and we found out
    whether we succeeded reading inode by checking whether inode is bad or
    not. udf_iget() returned NULL on failure and inode pointer otherwise.
    Make these two functions properly propagate errors up the call stack and
    use the return value in callers.

    Signed-off-by: Jan Kara

    Jan Kara
     

04 Sep, 2014

3 commits


20 Aug, 2014

1 commit


16 Jul, 2014

3 commits


07 May, 2014

5 commits


13 Apr, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencie between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     

08 Apr, 2014

1 commit

  • Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
    "various cleanups for ext2, ext3, udf, isofs, a documentation update
    for quota, and a fix of a race in reiserfs readdir implementation"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    reiserfs: fix race in readdir
    ext2: acl: remove unneeded include of linux/capability.h
    ext3: explicitly remove inode from orphan list after failed direct io
    fs/isofs/inode.c add __init to init_inodecache()
    ext3: Speedup WB_SYNC_ALL pass
    fs/quota/Kconfig: Update filesystems
    ext3: Update outdated comment before ext3_ordered_writepage()
    ext3: Update PF_MEMALLOC handling in ext3_write_inode()
    ext2/3: use prandom_u32() instead of get_random_bytes()
    ext3: remove an unneeded check in ext3_new_blocks()
    ext3: remove unneeded check in ext3_ordered_writepage()
    fs: Mark function as static in ext3/xattr_security.c
    fs: Mark function as static in ext3/dir.c
    fs: Mark function as static in ext2/xattr_security.c
    ext3: Add __init macro to init_inodecache
    ext2: Add __init macro to init_inodecache
    udf: Add __init macro to init_inodecache
    fs: udf: parse_options: blocksize check

    Linus Torvalds
     

05 Apr, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

02 Apr, 2014

1 commit


13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o mount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    documented or guaraunteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
     

03 Mar, 2014

2 commits


21 Feb, 2014

1 commit

  • UDF has two types of files - files with data stored in inode (ICB in
    UDF terminology) and files with data stored in external data blocks. We
    convert file from in-inode format to external format in
    udf_file_aio_write() when we find out data won't fit into inode any
    longer. However the following race between two O_APPEND writes can happen:

    CPU1 CPU2
    udf_file_aio_write() udf_file_aio_write()
    down_write(&iinfo->i_data_sem);
    checks that i_size + count1 fits within inode
    => no need to convert
    up_write(&iinfo->i_data_sem);
    down_write(&iinfo->i_data_sem);
    checks that i_size + count2 fits
    within inode => no need to convert
    up_write(&iinfo->i_data_sem);
    generic_file_aio_write()
    - extends file by count1 bytes
    generic_file_aio_write()
    - extends file by count2 bytes

    Clearly if count1 + count2 doesn't fit into the inode, we overwrite
    kernel buffers beyond inode, possibly corrupting the filesystem as well.

    Fix the problem by acquiring i_mutex before checking whether write fits
    into the inode and using __generic_file_aio_write() afterwards which
    puts check and write into one critical section.

    Reported-by: Al Viro
    Signed-off-by: Jan Kara

    Jan Kara
     

24 Dec, 2013

1 commit

  • Lockdep is complaining about UDF:
    =============================================
    [ INFO: possible recursive locking detected ]
    3.12.0+ #16 Not tainted
    ---------------------------------------------
    ln/7386 is trying to acquire lock:
    (&ei->i_data_sem){+.+...}, at: [] udf_get_block+0x8d/0x130

    but task is already holding lock:
    (&ei->i_data_sem){+.+...}, at: [] udf_symlink+0x8d/0x690

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&ei->i_data_sem);
    lock(&ei->i_data_sem);

    *** DEADLOCK ***

    This is because we hold i_data_sem of the symlink inode while calling
    udf_add_entry() for the directory. I don't think this can ever lead to
    deadlocks since we never hold i_data_sem for two inodes in any other
    place.

    The fix is simple - move unlock of i_data_sem for symlink inode up. We
    don't need it for anything when linking symlink inode to directory.

    Reported-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

19 Oct, 2013

1 commit

  • The UDF driver was not strict enough about checking the IDs in the
    VSDs when mounting, which resulted in reading through all the sectors
    of the block device in some unfortunate cases. Eg, trying to mount my
    uninitialized 200G SSD partition (all 0xFF bytes) took ~350 minutes to
    fail, because the code expected some of the valid IDs or a zero byte.
    During this, the mount couldn't be killed, sync from the cmdline
    blocked, and the machine froze into the shutdown. Valid filesystems
    (extX, btrfs, ntfs) were rejected by the mere accident of having a
    zero byte at just the right place in some of their sectors, close
    enough to the beginning not to generate excess I/O. The fix adds a
    hard limit on the VSD sector offset, adds the two missing VSD IDs, and
    stops scanning when encountering an invalid ID. Also replaced the
    magic number 32768 with a more meaningful #define, and supressed the
    bogus message about failing to read the first sector if no UDF fs was
    detected.

    Signed-off-by: Peter A. Felvegi
    Signed-off-by: Jan Kara

    Peter A. Felvegi
     

24 Sep, 2013

1 commit

  • A user has reported an oops in udf_statfs() that was caused by
    numOfPartitions entry in LVID structure being corrupted. Fix the problem
    by verifying whether numOfPartitions makes sense at least to the extent
    that LVID fits into a single block as it should.

    Reported-by: Juergen Weigert
    Signed-off-by: Jan Kara

    Jan Kara
     

14 Sep, 2013

1 commit

  • Pull aio changes from Ben LaHaise:
    "First off, sorry for this pull request being late in the merge window.
    Al had raised a couple of concerns about 2 items in the series below.
    I addressed the first issue (the race introduced by Gu's use of
    mm_populate()), but he has not provided any further details on how he
    wants to rework the anon_inode.c changes (which were sent out months
    ago but have yet to be commented on).

    The bulk of the changes have been sitting in the -next tree for a few
    months, with all the issues raised being addressed"

    * git://git.kvack.org/~bcrl/aio-next: (22 commits)
    aio: rcu_read_lock protection for new rcu_dereference calls
    aio: fix race in ring buffer page lookup introduced by page migration support
    aio: fix rcu sparse warnings introduced by ioctx table lookup patch
    aio: remove unnecessary debugging from aio_free_ring()
    aio: table lookup: verify ctx pointer
    staging/lustre: kiocb->ki_left is removed
    aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
    aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
    aio: convert the ioctx list to table lookup v3
    aio: double aio_max_nr in calculations
    aio: Kill ki_dtor
    aio: Kill ki_users
    aio: Kill unneeded kiocb members
    aio: Kill aio_rw_vect_retry()
    aio: Don't use ctx->tail unnecessarily
    aio: io_cancel() no longer returns the io_event
    aio: percpu ioctx refcount
    aio: percpu reqs_available
    aio: reqs_active -> reqs_available
    aio: fix build when migration is disabled
    ...

    Linus Torvalds
     

13 Sep, 2013

1 commit


01 Aug, 2013

2 commits

  • Refuse RW mount of udf filesystem. So far we just silently changed it
    to RO mount but when the media is writeable, block layer won't notice
    this change and thus will think device is used RW and will block eject
    button of the drive. That is unexpected by users because for
    non-writeable media eject button works just fine.

    Userspace mount(8) command handles this just fine and retries mounting
    with MS_RDONLY set so userspace shouldn't see any regression. Plus any
    tool mounting udf is likely confronted with the case of read-only
    media where block layer already refuses to mount the filesystem without
    MS_RDONLY set so our behavior shouldn't be anything new for it.

    Reported-by: Hui Wang
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Change all function used in filesystem discovery during mount to user
    standard kernel return values - -errno on error, 0 on success instead
    of 1 on failure and 0 on success. This allows us to pass error number
    (not just failure / success) so we can abort device scanning earlier
    in case of errors like EIO or ENOMEM . Also we will be able to return
    EROFS in case writeable mount is requested but writing isn't supported.

    Signed-off-by: Jan Kara

    Jan Kara
     

30 Jul, 2013

1 commit

  • This code doesn't serve any purpose anymore, since the aio retry
    infrastructure has been removed.

    This change should be safe because aio_read/write are also used for
    synchronous IO, and called from do_sync_read()/do_sync_write() - and
    there's no looping done in the sync case (the read and write syscalls).

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Signed-off-by: Benjamin LaHaise

    Kent Overstreet
     

29 Jun, 2013

2 commits