15 Aug, 2012

1 commit

  • The BKL push-down for reiserfs made lock recursion a special case that needs
    to be handled explicitly. One of the cases that was unhandled is dropping
    the quota during inode eviction. Both reiserfs_evict_inode and
    reiserfs_write_dquot take the write lock, but when the journal lock is
    taken it only drops one the references. The locking rules are that the journal
    lock be acquired before the write lock so leaving the reference open leads
    to a ABBA deadlock.

    This patch pushes the unlock up before clear_inode and avoids the recursive
    locking.

    Another ABBA situation can occur when the write lock is dropped while reading
    the bitmap buffer while in the quota code. When the lock is reacquired, it
    will deadlock against dquot->dq_lock and dqopt->dqio_mutex in the dquot_acquire
    path. It's safe to retain the lock across the read and should be cached under
    write load.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Jan Kara

    Jeff Mahoney
     

23 Jul, 2012

2 commits

  • d_instantiate(dentry, inode);
    unlock_new_inode(inode);

    is a bad idea; do it the other way round...

    Signed-off-by: Al Viro

    Al Viro
     
  • Since the moment writes to quota files are using block device page cache and
    space for quota structures is reserved at the moment they are first accessed we
    have no reason to sync quota before inode writeback. In fact this order is now
    only harmful since quota information can easily change during inode writeback
    (either because conversion of delayed-allocated extents or simply because of
    allocation of new blocks for simple filesystems not using page_mkwrite).

    So move syncing of quota information after writeback of inodes into ->sync_fs
    method. This way we do not have to use ->quota_sync callback which is primarily
    intended for use by quotactl syscall anyway and we get rid of calling
    ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
    proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
    also support legacy non-journalled quotas) and thus there are no dirty quota
    structures.

    CC: "Theodore Ts'o"
    CC: Joel Becker
    CC: reiserfs-devel@vger.kernel.org
    Acked-by: Steven Whitehouse
    Acked-by: Dave Kleikamp
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

14 Jul, 2012

4 commits


01 Jun, 2012

5 commits

  • This patch stops reiserfs using the VFS 'write_super()' method along with the
    s_dirt flag, because they are on their way out.

    The whole "superblock write-out" VFS infrastructure is served by the
    'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
    writes out all dirty superblock using the '->write_super()' call-back. But the
    problem with this thread is that it wastes power by waking up the system every
    5 seconds, even if there are no diry superblocks, or there are no client
    file-systems which would need this (e.g., btrfs does not use
    '->write_super()'). So we want to kill it completely and thus, we need to make
    file-systems to stop using the '->write_super()' VFS service, and then remove
    it together with the kernel thread.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The 'journal_mark_dirty()' function currently first marks the superblock as
    dirty by setting 's_dirt' to 1, then does various sanity checks and returns,
    then actuall does all the magic with the journal.

    This is not an ideal order, though. It makes more sense to first do all the
    checks, then do all the internal stuff, and at the end notify the VFS that the
    superblock is now dirty.

    This patch moves the 's_dirt = 1' assignment from the very beginning of this
    function to the very end.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The 'reiserfs_resize()' function marks the superblock as dirty by assigning 1
    to 's_dirt' and then calls 'journal_mark_dirty()' which does the same. Thus,
    we can remove the assignment from 'reiserfs_resize()'.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • Turn 'reiserfs_flush_old_commits()' into a void function because the callers
    do not cares about what it returns anyway.

    We are going to remove the 'sb->s_dirt' field completely and this patch is a
    small step towards this direction.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • We have the reiserfs superblock pointer in the 'sbi' variable in this
    function, no need to use the 'REISERFS_SB(s)' macro which is the same.
    This is jut a small clean-up.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     

30 May, 2012

1 commit

  • pass inode + parent's inode or NULL instead of dentry + bool saying
    whether we want the parent or not.

    NOTE: that needs ceph fix folded in.

    Signed-off-by: Al Viro

    Al Viro
     

29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

16 May, 2012

1 commit


06 May, 2012

1 commit

  • After we moved inode_sync_wait() from end_writeback() it doesn't make sense
    to call the function end_writeback() anymore. Rename it to clear_inode()
    which well says what the function really does - set I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

29 Mar, 2012

2 commits

  • …m/linux/kernel/git/dhowells/linux-asm_system

    Pull "Disintegrate and delete asm/system.h" from David Howells:
    "Here are a bunch of patches to disintegrate asm/system.h into a set of
    separate bits to relieve the problem of circular inclusion
    dependencies.

    I've built all the working defconfigs from all the arches that I can
    and made sure that they don't break.

    The reason for these patches is that I recently encountered a circular
    dependency problem that came about when I produced some patches to
    optimise get_order() by rewriting it to use ilog2().

    This uses bitops - and on the SH arch asm/bitops.h drags in
    asm-generic/get_order.h by a circuituous route involving asm/system.h.

    The main difficulty seems to be asm/system.h. It holds a number of
    low level bits with no/few dependencies that are commonly used (eg.
    memory barriers) and a number of bits with more dependencies that
    aren't used in many places (eg. switch_to()).

    These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

    Move memory barriers here. This already done for MIPS and Alpha.

    (2) asm/switch_to.h

    Move switch_to() and related stuff here.

    (3) asm/exec.h

    Move arch_align_stack() here. Other process execution related bits
    could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

    Move xchg() and cmpxchg() here as they're full word atomic ops and
    frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

    Move die() and related bits.

    (6) asm/auxvec.h

    Move AT_VECTOR_SIZE_ARCH here.

    Other arch headers are created as needed on a per-arch basis."

    Fixed up some conflicts from other header file cleanups and moving code
    around that has happened in the meantime, so David's testing is somewhat
    weakened by that. We'll find out anything that got broken and fix it..

    * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
    Delete all instances of asm/system.h
    Remove all #inclusions of asm/system.h
    Add #includes needed to permit the removal of asm/system.h
    Move all declarations of free_initmem() to linux/mm.h
    Disintegrate asm/system.h for OpenRISC
    Split arch_align_stack() out from asm-generic/system.h
    Split the switch_to() wrapper out of asm-generic/system.h
    Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
    Create asm-generic/barrier.h
    Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
    Disintegrate asm/system.h for Xtensa
    Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
    Disintegrate asm/system.h for Tile
    Disintegrate asm/system.h for Sparc
    Disintegrate asm/system.h for SH
    Disintegrate asm/system.h for Score
    Disintegrate asm/system.h for S390
    Disintegrate asm/system.h for PowerPC
    Disintegrate asm/system.h for PA-RISC
    Disintegrate asm/system.h for MN10300
    ...

    Linus Torvalds
     
  • Remove all #inclusions of asm/system.h preparatory to splitting and killing
    it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*.*\n!!' `grep -Irl '^#\s*include\s*' *`

    Signed-off-by: David Howells

    David Howells
     

25 Mar, 2012

1 commit

  • Pull cleanup from Paul Gortmaker:
    "The changes shown here are to unify linux's BUG support under the one
    file. Due to historical reasons, we have some BUG code
    in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in
    linux/kernel.h predates the addition of linux/bug.h, but old code in
    kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h
    was including to pseudo link them.

    This has caused confusion[1] and general yuck/WTF[2] reactions. Here
    is an example that violates the principle of least surprise:

    CC lib/string.o
    lib/string.c: In function 'strlcat':
    lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
    make[2]: *** [lib/string.o] Error 1
    $
    $ grep linux/bug.h lib/string.c
    #include
    $

    We've included for the BUG infrastructure and yet we
    still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh -
    very confusing for someone who is new to kernel development.

    With the above in mind, the goals of this changeset are:

    1) find and fix any include/*.h files that were relying on the
    implicit presence of BUG code.
    2) find and fix any C files that were consuming kernel.h and hence
    relying on implicitly getting some/all BUG code.
    3) Move the BUG related code living in kernel.h to
    4) remove the asm/bug.h from kernel.h to finally break the chain.

    During development, the order was more like 3-4, build-test, 1-2. But
    to ensure that git history for bisect doesn't get needless build
    failures introduced, the commits have been reorderd to fix the problem
    areas in advance.

    [1] https://lkml.org/lkml/2012/1/3/90
    [2] https://lkml.org/lkml/2012/1/17/414"

    Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
    and linux-next.

    * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    kernel.h: doesn't explicitly use bug.h, so don't include it.
    bug: consolidate BUILD_BUG_ON with other bug code
    BUG: headers with BUG/BUG_ON etc. need linux/bug.h
    bug.h: add include of it to various implicit C users
    lib: fix implicit users of kernel.h for TAINT_WARN
    spinlock: macroize assert_spin_locked to avoid bug.h dependency
    x86: relocate get/set debugreg fcns to include/asm/debugreg.

    Linus Torvalds
     

22 Mar, 2012

2 commits

  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     
  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds
     

21 Mar, 2012

6 commits


20 Mar, 2012

1 commit


04 Feb, 2012

2 commits


11 Jan, 2012

4 commits

  • Nothing requires that we lock the filesystem until the root inode is
    provided.

    Also iget5_locked() triggers a warning because we are holding the
    filesystem lock while allocating the inode, which result in a lockdep
    suspicion that we have a lock inversion against the reclaim path:

    [ 1986.896979] =================================
    [ 1986.896990] [ INFO: inconsistent lock state ]
    [ 1986.896997] 3.1.1-main #8
    [ 1986.897001] ---------------------------------
    [ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    [ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
    [ 1986.897023] (&REISERFS_SB(s)->lock){+.+.?.}, at: [] reiserfs_write_lock+0x20/0x2a
    [ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
    [ 1986.897050] [] mark_held_locks+0xae/0xd0
    [ 1986.897060] [] lockdep_trace_alloc+0x7d/0x91
    [ 1986.897068] [] kmem_cache_alloc+0x1a/0x93
    [ 1986.897078] [] reiserfs_alloc_inode+0x13/0x3d
    [ 1986.897088] [] alloc_inode+0x14/0x5f
    [ 1986.897097] [] iget5_locked+0x62/0x13a
    [ 1986.897106] [] reiserfs_fill_super+0x410/0x8b9
    [ 1986.897114] [] mount_bdev+0x10b/0x159
    [ 1986.897123] [] get_super_block+0x10/0x12
    [ 1986.897131] [] mount_fs+0x59/0x12d
    [ 1986.897138] [] vfs_kern_mount+0x45/0x7a
    [ 1986.897147] [] do_kern_mount+0x2f/0xb0
    [ 1986.897155] [] do_mount+0x5c2/0x612
    [ 1986.897163] [] sys_mount+0x61/0x8f
    [ 1986.897170] [] sysenter_do_call+0x12/0x32
    [ 1986.897181] irq event stamp: 7509691
    [ 1986.897186] hardirqs last enabled at (7509691): [] kmem_cache_alloc+0x6e/0x93
    [ 1986.897197] hardirqs last disabled at (7509690): [] kmem_cache_alloc+0x24/0x93
    [ 1986.897209] softirqs last enabled at (7508896): [] __do_softirq+0xee/0xfd
    [ 1986.897222] softirqs last disabled at (7508859): [] do_softirq+0x50/0x9d
    [ 1986.897234]
    [ 1986.897235] other info that might help us debug this:
    [ 1986.897242] Possible unsafe locking scenario:
    [ 1986.897244]
    [ 1986.897250] CPU0
    [ 1986.897254] ----
    [ 1986.897257] lock(&REISERFS_SB(s)->lock);
    [ 1986.897265]
    [ 1986.897269] lock(&REISERFS_SB(s)->lock);
    [ 1986.897276]
    [ 1986.897277] *** DEADLOCK ***
    [ 1986.897278]
    [ 1986.897286] no locks held by kswapd0/16.
    [ 1986.897291]
    [ 1986.897292] stack backtrace:
    [ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
    [ 1986.897306] Call Trace:
    [ 1986.897314] [] ? printk+0xf/0x11
    [ 1986.897324] [] print_usage_bug+0x20e/0x21a
    [ 1986.897332] [] ? print_irq_inversion_bug+0x172/0x172
    [ 1986.897341] [] mark_lock+0x27f/0x483
    [ 1986.897349] [] __lock_acquire+0x628/0x1472
    [ 1986.897358] [] lock_acquire+0x47/0x5e
    [ 1986.897366] [] ? reiserfs_write_lock+0x20/0x2a
    [ 1986.897384] [] ? reiserfs_write_lock+0x20/0x2a
    [ 1986.897397] [] mutex_lock_nested+0x35/0x26f
    [ 1986.897409] [] ? reiserfs_write_lock+0x20/0x2a
    [ 1986.897421] [] reiserfs_write_lock+0x20/0x2a
    [ 1986.897433] [] map_block_for_writepage+0xc9/0x590
    [ 1986.897448] [] ? create_empty_buffers+0x33/0x8f
    [ 1986.897461] [] ? get_parent_ip+0xb/0x31
    [ 1986.897472] [] ? sub_preempt_count+0x81/0x8e
    [ 1986.897485] [] ? _raw_spin_unlock+0x27/0x3d
    [ 1986.897496] [] ? get_parent_ip+0xb/0x31
    [ 1986.897508] [] reiserfs_writepage+0x1b9/0x3e7
    [ 1986.897521] [] ? clear_page_dirty_for_io+0xcb/0xde
    [ 1986.897533] [] ? trace_hardirqs_on_caller+0x108/0x138
    [ 1986.897546] [] ? trace_hardirqs_on+0xb/0xd
    [ 1986.897559] [] shrink_page_list+0x34f/0x5e2
    [ 1986.897572] [] shrink_inactive_list+0x172/0x22c
    [ 1986.897585] [] shrink_zone+0x303/0x3b1
    [ 1986.897597] [] ? _raw_spin_unlock+0x27/0x3d
    [ 1986.897611] [] kswapd+0x3b7/0x5f2

    The deadlock shouldn't happen since we are doing that allocation in the
    mount path, the filesystem is not available for any reclaim. Still the
    warning is annoying.

    To solve this, acquire the lock later only where we need it, right before
    calling reiserfs_read_locked_inode() that wants to lock to walk the tree.

    Reported-by: Knut Petersen
    Signed-off-by: Frederic Weisbecker
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Jeff Mahoney
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • journal_init() doesn't need the lock since no operation on the filesystem
    is involved there. journal_read() and get_list_bitmap() have yet to be
    reviewed carefully though before removing the lock there. Just keep the
    it around these two calls for safety.

    Signed-off-by: Frederic Weisbecker
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Jeff Mahoney
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • In the mount path, transactions that are made before journal
    initialization don't involve the filesystem. We can delay the reiserfs
    lock until we play with the journal.

    Signed-off-by: Frederic Weisbecker
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Jeff Mahoney
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • Signed-off-by: Davidlohr Bueso
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

10 Jan, 2012

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2/3/4: delete unneeded includes of module.h
    ext{3,4}: Fix potential race when setversion ioctl updates inode
    udf: Mark LVID buffer as uptodate before marking it dirty
    ext3: Don't warn from writepage when readonly inode is spotted after error
    jbd: Remove j_barrier mutex
    reiserfs: Force inode evictions before umount to avoid crash
    reiserfs: Fix quota mount option parsing
    udf: Treat symlink component of type 2 as /
    udf: Fix deadlock when converting file from in-ICB one to normal one
    udf: Cleanup calling convention of inode_getblk()
    ext2: Fix error handling on inode bitmap corruption
    ext3: Fix error handling on inode bitmap corruption
    ext3: replace ll_rw_block with other functions
    ext3: NULL dereference in ext3_evict_inode()
    jbd: clear revoked flag on buffers before a new transaction started
    ext3: call ext3_mark_recovery_complete() when recovery is really needed

    Linus Torvalds
     

09 Jan, 2012

2 commits

  • This patch fixes a crash in reiserfs_delete_xattrs during umount.

    When shrink_dcache_for_umount clears the dcache from
    generic_shutdown_super, delayed evictions are forced to disk. If an
    evicted inode has extended attributes associated with it, it will
    need to walk the xattr tree to locate and remove them.

    But since shrink_dcache_for_umount will BUG if it encounters active
    dentries, the xattr tree must be released before it's called or it will
    crash during every umount.

    This patch forces the evictions to occur before generic_shutdown_super
    by calling shrink_dcache_sb first. The additional evictions caused
    by the removal of each associated xattr file and dir will be automatically
    handled as they're added to the LRU list.

    CC: reiserfs-devel@vger.kernel.org
    CC: stable@kernel.org
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Jan Kara

    Jeff Mahoney
     
  • When jqfmt mount option is not specified on remount, we mistakenly clear
    s_jquota_fmt value stored in superblock. Fix the problem.

    CC: stable@kernel.org
    CC: reiserfs-devel@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     

07 Jan, 2012

2 commits


04 Jan, 2012

1 commit