11 Jan, 2012

4 commits

  • Nothing requires that we lock the filesystem until the root inode is
    provided.

    Also iget5_locked() triggers a warning because we are holding the
    filesystem lock while allocating the inode, which result in a lockdep
    suspicion that we have a lock inversion against the reclaim path:

    [ 1986.896979] =================================
    [ 1986.896990] [ INFO: inconsistent lock state ]
    [ 1986.896997] 3.1.1-main #8
    [ 1986.897001] ---------------------------------
    [ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    [ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
    [ 1986.897023] (&REISERFS_SB(s)->lock){+.+.?.}, at: [] reiserfs_write_lock+0x20/0x2a
    [ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
    [ 1986.897050] [] mark_held_locks+0xae/0xd0
    [ 1986.897060] [] lockdep_trace_alloc+0x7d/0x91
    [ 1986.897068] [] kmem_cache_alloc+0x1a/0x93
    [ 1986.897078] [] reiserfs_alloc_inode+0x13/0x3d
    [ 1986.897088] [] alloc_inode+0x14/0x5f
    [ 1986.897097] [] iget5_locked+0x62/0x13a
    [ 1986.897106] [] reiserfs_fill_super+0x410/0x8b9
    [ 1986.897114] [] mount_bdev+0x10b/0x159
    [ 1986.897123] [] get_super_block+0x10/0x12
    [ 1986.897131] [] mount_fs+0x59/0x12d
    [ 1986.897138] [] vfs_kern_mount+0x45/0x7a
    [ 1986.897147] [] do_kern_mount+0x2f/0xb0
    [ 1986.897155] [] do_mount+0x5c2/0x612
    [ 1986.897163] [] sys_mount+0x61/0x8f
    [ 1986.897170] [] sysenter_do_call+0x12/0x32
    [ 1986.897181] irq event stamp: 7509691
    [ 1986.897186] hardirqs last enabled at (7509691): [] kmem_cache_alloc+0x6e/0x93
    [ 1986.897197] hardirqs last disabled at (7509690): [] kmem_cache_alloc+0x24/0x93
    [ 1986.897209] softirqs last enabled at (7508896): [] __do_softirq+0xee/0xfd
    [ 1986.897222] softirqs last disabled at (7508859): [] do_softirq+0x50/0x9d
    [ 1986.897234]
    [ 1986.897235] other info that might help us debug this:
    [ 1986.897242] Possible unsafe locking scenario:
    [ 1986.897244]
    [ 1986.897250] CPU0
    [ 1986.897254] ----
    [ 1986.897257] lock(&REISERFS_SB(s)->lock);
    [ 1986.897265]
    [ 1986.897269] lock(&REISERFS_SB(s)->lock);
    [ 1986.897276]
    [ 1986.897277] *** DEADLOCK ***
    [ 1986.897278]
    [ 1986.897286] no locks held by kswapd0/16.
    [ 1986.897291]
    [ 1986.897292] stack backtrace:
    [ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
    [ 1986.897306] Call Trace:
    [ 1986.897314] [] ? printk+0xf/0x11
    [ 1986.897324] [] print_usage_bug+0x20e/0x21a
    [ 1986.897332] [] ? print_irq_inversion_bug+0x172/0x172
    [ 1986.897341] [] mark_lock+0x27f/0x483
    [ 1986.897349] [] __lock_acquire+0x628/0x1472
    [ 1986.897358] [] lock_acquire+0x47/0x5e
    [ 1986.897366] [] ? reiserfs_write_lock+0x20/0x2a
    [ 1986.897384] [] ? reiserfs_write_lock+0x20/0x2a
    [ 1986.897397] [] mutex_lock_nested+0x35/0x26f
    [ 1986.897409] [] ? reiserfs_write_lock+0x20/0x2a
    [ 1986.897421] [] reiserfs_write_lock+0x20/0x2a
    [ 1986.897433] [] map_block_for_writepage+0xc9/0x590
    [ 1986.897448] [] ? create_empty_buffers+0x33/0x8f
    [ 1986.897461] [] ? get_parent_ip+0xb/0x31
    [ 1986.897472] [] ? sub_preempt_count+0x81/0x8e
    [ 1986.897485] [] ? _raw_spin_unlock+0x27/0x3d
    [ 1986.897496] [] ? get_parent_ip+0xb/0x31
    [ 1986.897508] [] reiserfs_writepage+0x1b9/0x3e7
    [ 1986.897521] [] ? clear_page_dirty_for_io+0xcb/0xde
    [ 1986.897533] [] ? trace_hardirqs_on_caller+0x108/0x138
    [ 1986.897546] [] ? trace_hardirqs_on+0xb/0xd
    [ 1986.897559] [] shrink_page_list+0x34f/0x5e2
    [ 1986.897572] [] shrink_inactive_list+0x172/0x22c
    [ 1986.897585] [] shrink_zone+0x303/0x3b1
    [ 1986.897597] [] ? _raw_spin_unlock+0x27/0x3d
    [ 1986.897611] [] kswapd+0x3b7/0x5f2

    The deadlock shouldn't happen since we are doing that allocation in the
    mount path, the filesystem is not available for any reclaim. Still the
    warning is annoying.

    To solve this, acquire the lock later only where we need it, right before
    calling reiserfs_read_locked_inode() that wants to lock to walk the tree.

    Reported-by: Knut Petersen
    Signed-off-by: Frederic Weisbecker
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Jeff Mahoney
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • journal_init() doesn't need the lock since no operation on the filesystem
    is involved there. journal_read() and get_list_bitmap() have yet to be
    reviewed carefully though before removing the lock there. Just keep the
    it around these two calls for safety.

    Signed-off-by: Frederic Weisbecker
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Jeff Mahoney
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • In the mount path, transactions that are made before journal
    initialization don't involve the filesystem. We can delay the reiserfs
    lock until we play with the journal.

    Signed-off-by: Frederic Weisbecker
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Jeff Mahoney
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • Signed-off-by: Davidlohr Bueso
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

10 Jan, 2012

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2/3/4: delete unneeded includes of module.h
    ext{3,4}: Fix potential race when setversion ioctl updates inode
    udf: Mark LVID buffer as uptodate before marking it dirty
    ext3: Don't warn from writepage when readonly inode is spotted after error
    jbd: Remove j_barrier mutex
    reiserfs: Force inode evictions before umount to avoid crash
    reiserfs: Fix quota mount option parsing
    udf: Treat symlink component of type 2 as /
    udf: Fix deadlock when converting file from in-ICB one to normal one
    udf: Cleanup calling convention of inode_getblk()
    ext2: Fix error handling on inode bitmap corruption
    ext3: Fix error handling on inode bitmap corruption
    ext3: replace ll_rw_block with other functions
    ext3: NULL dereference in ext3_evict_inode()
    jbd: clear revoked flag on buffers before a new transaction started
    ext3: call ext3_mark_recovery_complete() when recovery is really needed

    Linus Torvalds
     

09 Jan, 2012

2 commits

  • This patch fixes a crash in reiserfs_delete_xattrs during umount.

    When shrink_dcache_for_umount clears the dcache from
    generic_shutdown_super, delayed evictions are forced to disk. If an
    evicted inode has extended attributes associated with it, it will
    need to walk the xattr tree to locate and remove them.

    But since shrink_dcache_for_umount will BUG if it encounters active
    dentries, the xattr tree must be released before it's called or it will
    crash during every umount.

    This patch forces the evictions to occur before generic_shutdown_super
    by calling shrink_dcache_sb first. The additional evictions caused
    by the removal of each associated xattr file and dir will be automatically
    handled as they're added to the LRU list.

    CC: reiserfs-devel@vger.kernel.org
    CC: stable@kernel.org
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Jan Kara

    Jeff Mahoney
     
  • When jqfmt mount option is not specified on remount, we mistakenly clear
    s_jquota_fmt value stored in superblock. Fix the problem.

    CC: stable@kernel.org
    CC: reiserfs-devel@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     

07 Jan, 2012

2 commits


04 Jan, 2012

7 commits


02 Nov, 2011

2 commits


25 Oct, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
    MAINTAINERS: linux-m32r is moderated for non-subscribers
    linux@lists.openrisc.net is moderated for non-subscribers
    Drop default from "DM365 codec select" choice
    parisc: Kconfig: cleanup Kernel page size default
    Kconfig: remove redundant CONFIG_ prefix on two symbols
    cris: remove arch/cris/arch-v32/lib/nand_init.S
    microblaze: add missing CONFIG_ prefixes
    h8300: drop puzzling Kconfig dependencies
    MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
    tty: drop superfluous dependency in Kconfig
    ARM: mxc: fix Kconfig typo 'i.MX51'
    Fix file references in Kconfig files
    aic7xxx: fix Kconfig references to READMEs
    Fix file references in drivers/ide/
    thinkpad_acpi: Fix printk typo 'bluestooth'
    bcmring: drop commented out line in Kconfig
    btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
    doc: raw1394: Trivial typo fix
    CIFS: Don't free volume_info->UNC until we are entirely done with it.
    treewide: Correct spelling of successfully in comments
    ...

    Linus Torvalds
     

15 Sep, 2011

2 commits


09 Aug, 2011

1 commit


01 Aug, 2011

2 commits


26 Jul, 2011

9 commits

  • * Merge akpm patch series: (122 commits)
    drivers/connector/cn_proc.c: remove unused local
    Documentation/SubmitChecklist: add RCU debug config options
    reiserfs: use hweight_long()
    reiserfs: use proper little-endian bitops
    pnpacpi: register disabled resources
    drivers/rtc/rtc-tegra.c: properly initialize spinlock
    drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
    drivers/rtc: add support for Qualcomm PMIC8xxx RTC
    drivers/rtc/rtc-s3c.c: support clock gating
    drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
    init: skip calibration delay if previously done
    misc/eeprom: add eeprom access driver for digsy_mtc board
    misc/eeprom: add driver for microwire 93xx46 EEPROMs
    checkpatch.pl: update $logFunctions
    checkpatch: make utf-8 test --strict
    checkpatch.pl: add ability to ignore various messages
    checkpatch: add a "prefer __aligned" check
    checkpatch: validate signature styles and To: and Cc: lines
    checkpatch: add __rcu as a sparse modifier
    checkpatch: suggest using min_t or max_t
    ...

    Did this as a merge because of (trivial) conflicts in
    - Documentation/feature-removal-schedule.txt
    - arch/xtensa/include/asm/uaccess.h
    that were just easier to fix up in the merge than in the patch series.

    Linus Torvalds
     
  • Use hweight_long() to count free bits in the bitmap.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Using __test_and_{set,clear}_bit_le() with ignoring its return value can
    be replaced with __{set,clear}_bit_le().

    This introduces reiserfs_{set,clear}_le_bit for __{set,clear}_bit_le and
    does the above change with them.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    fs: take the ACL checks to common code
    bury posix_acl_..._masq() variants
    kill boilerplates around posix_acl_create_masq()
    generic_acl: no need to clone acl just to push it to set_cached_acl()
    kill boilerplate around posix_acl_chmod_masq()
    reiserfs: cache negative ACLs for v1 stat format
    xfs: cache negative ACLs if there is no attribute fork
    9p: do no return 0 from ->check_acl without actually checking
    vfs: move ACL cache lookup into generic code
    CIFS: Fix oops while mounting with prefixpath
    xfs: Fix wrong return value of xfs_file_aio_write
    fix devtmpfs race
    caam: don't pass bogus S_IFCHR to debugfs_create_...()
    get rid of create_proc_entry() abuses - proc_mkdir() is there for purpose
    asus-wmi: ->is_visible() can't return negative
    fix jffs2 ACLs on big-endian with 16bit mode_t
    9p: close ACL leaks
    ocfs2_init_acl(): fix a leak
    VFS : mount lock scalability for internal mounts

    Linus Torvalds
     
  • Replace the ->check_acl method with a ->get_acl method that simply reads an
    ACL from disk after having a cache miss. This means we can replace the ACL
    checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • new helper: posix_acl_create(&acl, gfp, mode_p). Replaces acl with
    modified clone, on failure releases acl and replaces with NULL.
    Returns 0 or -ve on error. All callers of posix_acl_create_masq()
    switched.

    Signed-off-by: Al Viro

    Al Viro
     
  • new helper: posix_acl_chmod(&acl, gfp, mode). Replaces acl with modified
    clone or with NULL if that has failed; returns 0 or -ve on error. All
    callers of posix_acl_chmod_masq() switched to that - they'd been doing
    exactly the same thing.

    Signed-off-by: Al Viro

    Al Viro
     
  • Always set up a negative ACL cache entry if the inode can't have ACLs.
    That behaves much better than doing this check inside ->check_acl.

    Also remove the left over MAY_NOT_BLOCK check.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • * 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits)
    block: strict rq_affinity
    backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu
    block: fix patch import error in max_discard_sectors check
    block: reorder request_queue to remove 64 bit alignment padding
    CFQ: add think time check for group
    CFQ: add think time check for service tree
    CFQ: move think time check variables to a separate struct
    fixlet: Remove fs_excl from struct task.
    cfq: Remove special treatment for metadata rqs.
    block: document blk_plug list access
    block: avoid building too big plug list
    compat_ioctl: fix make headers_check regression
    block: eliminate potential for infinite loop in blkdev_issue_discard
    compat_ioctl: fix warning caused by qemu
    block: flush MEDIA_CHANGE from drivers on close(2)
    blk-throttle: Make total_nr_queued unsigned
    block: Add __attribute__((format(printf...) and fix fallout
    fs/partitions/check.c: make local symbols static
    block:remove some spare spaces in genhd.c
    block:fix the comment error in blkdev.h
    ...

    Linus Torvalds
     

21 Jul, 2011

5 commits

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2. For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     
  • Change the default reiserfs mount option to barrier=flush. Based on a patch
    from Jeff Mahoney in the SuSE tree.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Simple filesystems always pass inode->i_sb_bdev as the block device
    argument, and never need a end_io handler. Let's simply things for
    them and for my grepping activity by dropping these arguments. The
    only thing not falling into that scheme is ext4, which passes and
    end_io handler without needing special flags (yet), but given how
    messy the direct I/O code there is use of __blockdev_direct_IO
    in one instead of two out of three cases isn't going to make a large
    difference anyway.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Let filesystems handle waiting for direct I/O requests themselves instead
    of doing it beforehand. This means filesystem-specific locks to prevent
    new dio referenes from appearing can be held. This is important to allow
    generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • i_alloc_sem is a rather special rw_semaphore. It's the last one that may
    be released by a non-owner, and it's write side is always mirrored by
    real exclusion. It's intended use it to wait for all pending direct I/O
    requests to finish before starting a truncate.

    Replace it with a hand-grown construct:

    - exclusion for truncates is already guaranteed by i_mutex, so it can
    simply fall way
    - the reader side is replaced by an i_dio_count member in struct inode
    that counts the number of pending direct I/O requests. Truncate can't
    proceed as long as it's non-zero
    - when i_dio_count reaches non-zero we wake up a pending truncate using
    wake_up_bit on a new bit in i_flags
    - new references to i_dio_count can't appear while we are waiting for
    it to read zero because the direct I/O count always needs i_mutex
    (or an equivalent like XFS's i_iolock) for starting a new operation.

    This scheme is much simpler, and saves the space of a spinlock_t and a
    struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
    system).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

20 Jul, 2011

2 commits