07 Jan, 2012

1 commit


04 Jan, 2012

1 commit

  • Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
    it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
    the cost of taking it into inode_init_always() will be negligible for pipes
    and sockets and negative for everything else. Not to mention the removal of
    boilerplate code from ->destroy_inode() instances...
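    A minimal userspace sketch of the idea, assuming a toy inode with a single list head. The list is reset on every allocation (the inode_init_always() role) rather than once in a slab constructor (the inode_init_once() role), so the destructor needs no INIT_LIST_HEAD() boilerplate. The commit does not name the list here, so `i_list`, `toy_inode` and all helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct list_head helpers. */
struct toy_list_head { struct toy_list_head *next, *prev; };

static void toy_init_list_head(struct toy_list_head *l)
{
    l->next = l;
    l->prev = l;
}

static int toy_list_empty(const struct toy_list_head *l)
{
    return l->next == l;
}

/* Hypothetical, heavily trimmed inode: only the list the commit moves. */
struct toy_inode {
    unsigned long i_ino;
    struct toy_list_head i_list;   /* placeholder name for that list */
};

/* Plays the inode_init_always() role: runs on every allocation, so the
 * list head is always valid and no destructor has to reset it. */
static struct toy_inode *toy_alloc_inode(unsigned long ino)
{
    struct toy_inode *inode = malloc(sizeof(*inode));

    if (!inode)
        return NULL;
    inode->i_ino = ino;
    toy_init_list_head(&inode->i_list);
    return inode;
}

/* Plays the ->destroy_inode() role: no INIT_LIST_HEAD() boilerplate. */
static void toy_destroy_inode(struct toy_inode *inode)
{
    free(inode);
}
```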

    Signed-off-by: Al Viro

    Al Viro
     

10 May, 2011

3 commits


09 Mar, 2011

6 commits


24 Feb, 2011

1 commit

  • Michael Leun reported that running parallel opens on a fuse filesystem
    can trigger a "kernel BUG at mm/truncate.c:475"

    Gurudas Pai reported the same bug on NFS.

    The reason is that unmap_mapping_range() is not prepared for more than
    one concurrent invocation per inode. For example:

    thread1: going through a big range, stops in the middle of a vma and
    stores the restart address in vm_truncate_count.

    thread2: comes in with a small (e.g. single page) unmap request on
    the same vma, somewhere before restart_address, finds that the
    vma was already unmapped up to the restart address and happily
    returns without doing anything.

    Another scenario would be two big unmap requests, both having to
    restart the unmapping and each one setting vm_truncate_count to its
    own value. This could go on forever without any of them being able to
    finish.

    Truncate and hole punching already serialize with i_mutex. Other
    callers of unmap_mapping_range() do not, and it's difficult to get
    i_mutex protection for all callers. In particular ->d_revalidate(),
    which calls invalidate_inode_pages2_range() in fuse, may be called
    with or without i_mutex.

    This patch adds a new mutex to 'struct address_space' to prevent
    running multiple concurrent unmap_mapping_range() on the same mapping.

    [ We'll hopefully get rid of all this with the upcoming mm
    preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
    lockbreak" patch in particular. But that is for 2.6.39 ]
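    A minimal pthread sketch of the serialization described above, assuming a toy mapping whose unmap pass keeps restart state in the mapping itself (as vm_truncate_count did). `toy_mapping`, the `unmap_mutex` field name and the helpers are assumptions for illustration, not the kernel's struct address_space.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, trimmed address_space: only what the sketch needs. */
struct toy_mapping {
    pthread_mutex_t unmap_mutex;      /* the new per-mapping mutex */
    unsigned long truncate_restart;   /* stands in for the restart state */
    unsigned long calls;              /* completed unmap passes */
};

static void toy_mapping_init(struct toy_mapping *m)
{
    pthread_mutex_init(&m->unmap_mutex, NULL);
    m->truncate_restart = 0;
    m->calls = 0;
}

/* Unmap body: NOT safe to run concurrently on one mapping, because the
 * restart address lives in the shared mapping, not in the caller. */
static void toy_unmap_range_locked(struct toy_mapping *m,
                                   unsigned long start, unsigned long len)
{
    m->truncate_restart = start;
    for (unsigned long off = start; off < start + len; off++)
        m->truncate_restart = off + 1;   /* restart point moves forward */
    m->calls++;
}

/* Entry point: every caller, with or without i_mutex, is serialized on
 * the mapping's own mutex, so restart state is never clobbered. */
static void toy_unmap_mapping_range(struct toy_mapping *m,
                                    unsigned long start, unsigned long len)
{
    pthread_mutex_lock(&m->unmap_mutex);
    toy_unmap_range_locked(m, start, len);
    pthread_mutex_unlock(&m->unmap_mutex);
}
```

    The design choice mirrors the text: rather than making every caller take i_mutex, the lock lives in the mapping, so callers such as fuse's ->d_revalidate() path are covered whatever locks they already hold.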

    Signed-off-by: Miklos Szeredi
    Reported-by: Michael Leun
    Reported-by: Gurudas Pai
    Tested-by: Gurudas Pai
    Acked-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

22 Jan, 2011

1 commit

  • Fixes the following kernel oops in nilfs_setup_super() which could
    arise if one of two super-blocks is unavailable.

    > BUG: unable to handle kernel NULL pointer dereference at (null)
    > Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 /
    > EIP: 0060:[] EFLAGS: 00010202 CPU: 3
    > EIP is at memcpy+0xc/0x1b
    > Call Trace:
    > [] ? nilfs_setup_super+0x6c/0xa5 [nilfs2]
    > [] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2]
    > [] ? nilfs_mount+0x4f9/0x62c [nilfs2]
    > [] ? kstrdup+0x36/0x3f
    > [] ? nilfs_mount+0x0/0x62c [nilfs2]
    > [] ? vfs_kern_mount+0x4d/0x12c
    > [] ? get_fs_type+0x76/0x8f
    > [] ? do_kern_mount+0x33/0xbf
    > [] ? do_mount+0x2ed/0x714
    > [] ? copy_mount_options+0x28/0xfc
    > [] ? sys_mount+0x72/0xaf
    > [] ? syscall_call+0x7/0xb
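    A minimal sketch of the kind of guard that avoids this oops, assuming the crash came from mirroring the primary super block into a secondary copy that failed to load (the trace ends in memcpy() called from nilfs_setup_super()). `toy_super_block` and `toy_setup_super()` are hypothetical, and the readable copy is assumed to be in `sbp[0]`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical on-disk super block stand-in. */
struct toy_super_block {
    unsigned int s_mnt_count;
    unsigned int s_max_mnt_count;
};

/* Update the primary super block and mirror it into the secondary one.
 * sbp[1] may be NULL when that copy could not be read; copying into it
 * unconditionally is the NULL dereference this sketch guards against. */
static int toy_setup_super(struct toy_super_block *sbp[2])
{
    if (!sbp[0])
        return -1;                 /* no usable super block at all */
    sbp[0]->s_mnt_count++;
    if (sbp[1])                    /* only mirror when a second copy exists */
        memcpy(sbp[1], sbp[0], sizeof(*sbp[0]));
    return 0;
}
```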

    Reported-by: Wakko Warner
    Signed-off-by: Ryusuke Konishi
    Tested-by: Wakko Warner
    Cc: stable [2.6.37, 2.6.36]
    LKML-Reference:

    Ryusuke Konishi
     

14 Jan, 2011

1 commit

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

11 Jan, 2011

1 commit


10 Jan, 2011

2 commits


07 Jan, 2011

2 commits

  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking in, this increases to about 20%.

    In cases where inode lifetimes are longer (i.e. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.
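    A single-threaded userspace sketch of the deferred-free pattern, assuming a toy call_rcu() that only queues callbacks and a `toy_grace_period()` that simply runs them; real RCU invokes callbacks only after all pre-existing readers have finished. The embedded `i_rcu` head recovered with container_of() follows the usual kernel idiom, but every `toy_` name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct rcu_head. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct toy_rcu_head *pending;   /* callbacks awaiting a grace period */

/* Queue func(head) to run once no reader can still see the object. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *))
{
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Pretend a grace period has elapsed; returns the number of callbacks run. */
static int toy_grace_period(void)
{
    int n = 0;

    while (pending) {
        struct toy_rcu_head *h = pending;

        pending = h->next;
        h->func(h);
        n++;
    }
    return n;
}

struct toy_inode {
    unsigned long i_ino;
    struct toy_rcu_head i_rcu;   /* embedded head, as in the kernel idiom */
};

static void toy_i_callback(struct toy_rcu_head *head)
{
    free(toy_container_of(head, struct toy_inode, i_rcu));
}

/* ->destroy_inode(): defer the free instead of freeing immediately, so a
 * lockless path walker may still dereference the inode meanwhile. */
static void toy_destroy_inode(struct toy_inode *inode)
{
    toy_call_rcu(&inode->i_rcu, toy_i_callback);
}
```

    The deferral is also where the cost described above comes from: a freed inode cannot be reused while it sits on the pending list, so cache-hot objects go cold before the allocator sees them again.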

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
    0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
    we start protecting many other dentry members with d_lock.
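    A minimal sketch of the locking rule, assuming a pthread mutex in place of the d_lock spinlock: since d_count is only read and written with d_lock held, a caller holding d_lock who sees a count of 0 knows it stays 0. `toy_dentry` and the helpers are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, trimmed dentry: a plain counter guarded by d_lock. */
struct toy_dentry {
    pthread_mutex_t d_lock;   /* a spinlock in the kernel; a mutex here */
    unsigned int d_count;     /* not atomic: only touched under d_lock */
};

static void toy_dget(struct toy_dentry *d)
{
    pthread_mutex_lock(&d->d_lock);
    d->d_count++;
    pthread_mutex_unlock(&d->d_lock);
}

/* Returns 1 if this call dropped the last reference. */
static int toy_dput(struct toy_dentry *d)
{
    int last;

    pthread_mutex_lock(&d->d_lock);
    last = (--d->d_count == 0);
    pthread_mutex_unlock(&d->d_lock);
    return last;
}

/* Take a reference only if the dentry is still referenced. The check and
 * the increment happen under one lock, so a dead (0) dentry is never
 * revived, and no global dcache_lock is needed to guarantee that. */
static int toy_dget_if_live(struct toy_dentry *d)
{
    int ok;

    pthread_mutex_lock(&d->d_lock);
    ok = d->d_count > 0;
    if (ok)
        d->d_count++;
    pthread_mutex_unlock(&d->d_lock);
    return ok;
}
```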

    Signed-off-by: Nick Piggin

    Nick Piggin
     

13 Nov, 2010

2 commits

  • After recent blkdev_get() modifications, open_by_devnum() and
    open_bdev_exclusive() are simple wrappers around blkdev_get().
    Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

    blkdev_get_by_dev() is identical to open_by_devnum().
    blkdev_get_by_path() is slightly different in that it doesn't
    automatically add %FMODE_EXCL to @mode.

    All users are converted. Most conversions are mechanical and don't
    introduce any behavior difference. There are several exceptions.

    * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
    reason to OR it explicitly on blkdev_put().

    * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
    sb->s_mode.

    * With the above changes, sb->s_mode should now always contain
    FMODE_EXCL. WARN_ON_ONCE() is added to kill_block_super() to detect
    errors.

    The new blkdev_get_*() functions come with proper docbook comments.
    While at it, add function description to blkdev_get() too.

    Signed-off-by: Tejun Heo
    Cc: Philipp Reisner
    Cc: Neil Brown
    Cc: Mike Snitzer
    Cc: Joern Engel
    Cc: Chris Mason
    Cc: Jan Kara
    Cc: "Theodore Ts'o"
    Cc: KONISHI Ryusuke
    Cc: reiserfs-devel@vger.kernel.org
    Cc: xfs-masters@oss.sgi.com
    Cc: Alexander Viro

    Tejun Heo
     
  • Over time, block layer has accumulated a set of APIs dealing with bdev
    open, close, claim and release.

    * blkdev_get/put() are the primary open and close functions.

    * bd_claim/release() deal with exclusive open.

    * open_bdev_exclusive() combines open and claim, and
    close_bdev_exclusive() combines release and close.

    * bd_link/unlink_disk_holder() to create and remove holder/slave
    symlinks.

    * open_by_devnum() wraps bdget() + blkdev_get().

    The interface is a bit confusing, and the decoupling of open and claim
    makes it impossible to properly guarantee exclusive access, as an
    in-kernel open + claim sequence can disturb an existing exclusive
    open before the block layer even knows that the current open is for
    another exclusive access. Reorganize the interface such that:

    * blkdev_get() is extended to include exclusive access management.
    A @holder argument is added and, if @FMODE_EXCL is specified, the
    caller gains exclusive access atomically w.r.t. other exclusive accesses.

    * blkdev_put() is similarly extended. It now takes @mode argument and
    if @FMODE_EXCL is set, it releases an exclusive access. Also, when
    the last exclusive claim is released, the holder/slave symlinks are
    removed automatically.

    * bd_claim/release() and close_bdev_exclusive() are no longer
    necessary and either made static or removed.

    * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
    is no longer necessary and removed.

    * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
    and blkdev_get(). It also has an unexpected extra bdev_read_only()
    test which probably should be moved into blkdev_get().

    * open_by_devnum() is modified to take @holder argument and pass it to
    blkdev_get().

    Most bdev open/close operations are unified into blkdev_get/put(),
    and most exclusive accesses are tested atomically at open time (as
    they should be). This cleans up the code and removes some corner
    cases that, valid or invalid, were unnecessary all the same.
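    A minimal sketch of the reorganized open/claim semantics, assuming a toy block device protected by one mutex: the exclusive claim is taken in the same critical section as the open and dropped by the matching put. `TOY_FMODE_EXCL`, `toy_bdev` and both helpers are hypothetical stand-ins for FMODE_EXCL, struct block_device, blkdev_get() and blkdev_put(); the flag value is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_FMODE_EXCL 0x80u   /* arbitrary stand-in flag value */

/* Hypothetical block device: open count plus exclusive-holder state. */
struct toy_bdev {
    pthread_mutex_t lock;
    int openers;
    void *holder;    /* current exclusive holder, NULL if none */
    int holders;     /* exclusive claims held by that holder */
};

/* Open and, with TOY_FMODE_EXCL, claim in one critical section, so no
 * other exclusive opener can slip in between "open" and "claim". */
static int toy_blkdev_get(struct toy_bdev *b, unsigned int mode, void *holder)
{
    int ret = 0;

    pthread_mutex_lock(&b->lock);
    if (mode & TOY_FMODE_EXCL) {
        if (b->holder && b->holder != holder) {
            ret = -EBUSY;          /* someone else holds it exclusively */
            goto out;
        }
        b->holder = holder;
        b->holders++;
    }
    b->openers++;
out:
    pthread_mutex_unlock(&b->lock);
    return ret;
}

/* Close; with TOY_FMODE_EXCL the matching exclusive claim is dropped too. */
static void toy_blkdev_put(struct toy_bdev *b, unsigned int mode)
{
    pthread_mutex_lock(&b->lock);
    if ((mode & TOY_FMODE_EXCL) && --b->holders == 0)
        b->holder = NULL;   /* last claim gone; holder links would go here */
    b->openers--;
    pthread_mutex_unlock(&b->lock);
}
```

    In this sketch a second exclusive opener with a different holder is refused with -EBUSY, while plain opens still succeed and a repeat claim by the same holder is allowed.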

    open_bdev_exclusive() and open_by_devnum() can use further cleanup -
    rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
    special features. Well, let's leave them for another day.

    Most conversions are straightforward. The drbd conversion is a bit more
    involved as there was some reordering, but the logic should stay the
    same.

    Signed-off-by: Tejun Heo
    Acked-by: Neil Brown
    Acked-by: Ryusuke Konishi
    Acked-by: Mike Snitzer
    Acked-by: Philipp Reisner
    Cc: Peter Osterlund
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: "Theodore Ts'o"
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Cc: dm-devel@redhat.com
    Cc: drbd-dev@lists.linbit.com
    Cc: Leo Chen
    Cc: Scott Branden
    Cc: Chris Mason
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: reiserfs-devel@vger.kernel.org
    Cc: Alexander Viro

    Tejun Heo
     

29 Oct, 2010

1 commit


23 Oct, 2010

18 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (36 commits)
    nilfs2: eliminate sparse warning - "context imbalance"
    nilfs2: eliminate sparse warnings - "symbol not declared"
    nilfs2: get rid of bdi from nilfs object
    nilfs2: change license of exported header file
    nilfs2: add bdev freeze/thaw support
    nilfs2: accept 64-bit checkpoint numbers in cp mount option
    nilfs2: remove own inode allocator and destructor for metadata files
    nilfs2: get rid of back pointer to writable sb instance
    nilfs2: get rid of mi_nilfs back pointer to nilfs object
    nilfs2: see state of root dentry for mount check of snapshots
    nilfs2: use iget for all metadata files
    nilfs2: get rid of GCDAT inode
    nilfs2: add routines to redirect access to buffers of DAT file
    nilfs2: add routines to roll back state of DAT file
    nilfs2: add routines to save and restore bmap state
    nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes
    nilfs2: allow nilfs_clear_inode to clear metadata file inodes
    nilfs2: get rid of snapshot mount flag
    nilfs2: simplify life cycle management of nilfs object
    nilfs2: do not allocate multiple super block instances for a device
    ...

    Linus Torvalds
     
    Make nilfs_dat_commit_free and nilfs_inode_cachep static
    to fix the following warnings:

    fs/nilfs2/super.c:72:19: warning: symbol 'nilfs_inode_cachep' was not declared. Should it be static?
    fs/nilfs2/dat.c:106:6: warning: symbol 'nilfs_dat_commit_free' was not declared. Should it be static?

    Signed-off-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi

    Jiro SEKIBA
     
    Nilfs can now use sb->s_bdi to get backing_dev_info, so use it
    instead of ns_bdi on the nilfs object and remove ns_bdi.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Nilfs hasn't supported the freeze/thaw feature because it didn't work
    due to the peculiar design that multiple super block instances could
    be allocated for a device. This limitation was removed by the patch
    "nilfs2: do not allocate multiple super block instances for a device".

    So now this adds the freeze/thaw support to nilfs.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
    The current implementation doesn't mount snapshots with checkpoint
    numbers larger than INT_MAX since it uses match_int() to parse the
    "cp=" mount option.

    This uses simple_strtoull() for the conversion to resolve the issue.
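    A userspace sketch of 64-bit "cp=" parsing, using the C library's strtoull() in place of the kernel's simple_strtoull() and adding the range and trailing-character checks a careful parser needs. Accepting only decimal digits and rejecting cp=0 are assumptions of this sketch, not documented nilfs behavior; `toy_parse_cp()` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a "cp=<n>" option into a 64-bit checkpoint number.
 * Returns 0 on success or -EINVAL if the option is malformed. */
static int toy_parse_cp(const char *opt, unsigned long long *cno)
{
    const char *val;
    char *end;
    unsigned long long v;

    if (strncmp(opt, "cp=", 3) != 0)
        return -EINVAL;
    val = opt + 3;
    /* Require a leading digit: strtoull() would otherwise skip spaces
     * and silently wrap a value such as "-1" to ULLONG_MAX. */
    if (*val < '0' || *val > '9')
        return -EINVAL;
    errno = 0;
    v = strtoull(val, &end, 10);
    if (errno == ERANGE || *end != '\0' || v == 0)
        return -EINVAL;   /* overflow, trailing junk, or cp=0 */
    *cno = v;
    return 0;
}
```

    For example, "cp=4294967296" (above INT_MAX, which match_int() could not represent) parses to 4294967296, while "cp=12abc" and "cp=-1" are rejected.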

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
    This finally removes nilfs' own inode allocator and destructor functions
    for metadata files. Several routines, nilfs_mdt_new(),
    nilfs_mdt_new_common(), nilfs_mdt_clear(), nilfs_mdt_destroy(), and
    nilfs_alloc_inode_common(), are removed.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
    After applying the patch that unified sb instances, the root dentries of
    snapshots can be left in dcache even after their trees are unmounted.

    The orphan root dentry/inode keeps a root object, and this causes
    false positives in the nilfs_checkpoint_is_mounted() function.

    This resolves the issue by having nilfs_checkpoint_is_mounted() test
    whether the root dentry is busy or not.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
    This uses iget5_locked() to allocate or look up inodes for metadata
    files, so that nilfs stops using its own inode allocator.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
    This stops pre-allocating the nilfs object in the nilfs_get_sb() routine,
    and stops managing its life cycle by reference counting.

    nilfs_find_or_create_nilfs() function, nilfs->ns_mount_mutex,
    nilfs_objects list, and the reference counter will be removed through
    the simplification.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This stops allocating multiple super block instances for a device.

    All snapshots and a current mode mount (i.e. latest tree) will be
    controlled with nilfs_root objects that are kept within an sb
    instance.

    nilfs_get_sb() is rewritten so that it always has a root object for
    the latest tree and snapshots make additional root objects.

    The root dentry of the latest tree is bound to sb->s_root even if it
    isn't attached to a directory. Root dentries of snapshots or the
    latest tree are bound to mnt->mnt_root on which they are mounted.

    With this patch, nilfs_find_sbinfo() function, nilfs->ns_supers list,
    and nilfs->ns_current back pointer, are deleted. In addition,
    init_nilfs() and load_nilfs() are simplified since they will be called
    once for a device, not repeatedly called for mount points.
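    A minimal sketch of per-tree root objects kept inside one sb instance, assuming a simple linked list with reference counts; using checkpoint number 0 for the latest tree is an assumption here. `toy_root`, `toy_sb` and the helpers are hypothetical and not the nilfs_root API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-tree root object: one per mounted tree. */
struct toy_root {
    unsigned long long cno;   /* checkpoint number; 0 = latest (assumed) */
    int count;                /* references from mounts and inodes */
    struct toy_root *next;
};

/* One sb instance per device, holding every tree's root object. */
struct toy_sb {
    struct toy_root *roots;
};

/* Return the root for checkpoint cno, creating it on first use. */
static struct toy_root *toy_find_or_create_root(struct toy_sb *sb,
                                                unsigned long long cno)
{
    struct toy_root *r;

    for (r = sb->roots; r; r = r->next) {
        if (r->cno == cno) {
            r->count++;
            return r;
        }
    }
    r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->cno = cno;
    r->count = 1;
    r->next = sb->roots;
    sb->roots = r;
    return r;
}

/* Drop a reference; unlink and free the root once nobody uses it. */
static void toy_put_root(struct toy_sb *sb, struct toy_root *root)
{
    struct toy_root **pp;

    if (--root->count > 0)
        return;
    for (pp = &sb->roots; *pp; pp = &(*pp)->next) {
        if (*pp == root) {
            *pp = root->next;
            break;
        }
    }
    free(root);
}
```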

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
    This splits the code to attach snapshots into a separate routine for
    convenience's sake.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This splits the code to allocate root dentry into a separate routine
    for convenience in successive changes.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root
    object.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This rewrites functions using ifile so that they get ifile from
    nilfs_root object, and will remove sbi->s_ifile. Some functions that
    don't know the root object are extended to receive it from caller.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The previous export operations cannot handle multiple versions of
    a filesystem if they belong to the same sb instance.

    This adds a new type of file handle and extends export operations so
    that they can get the inode specified by a checkpoint number as well
    as an inode number and a generation number.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
    This puts a pointer to the nilfs_root object in the private part of
    the in-memory inode, and makes the nilfs_iget function pick up the inode
    with the same root object.

    Non-root inodes inherit their nilfs_root object from the parent inode.
    That of the root inode is allocated through the nilfs_attach_checkpoint()
    function.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This uses iget5_locked instead of iget_locked so that gc cache can
    look up inodes with an inode number and an optional checkpoint number.
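    A minimal sketch of the lookup that iget5_locked() makes possible: its test() callback can compare more than i_ino, so here an inode number plus an optional checkpoint number form the key. The cache is a plain list, treating 0 as "no checkpoint" is an assumption, and all `toy_` names are hypothetical; the real iget5_locked() also hashes, locks, and returns newly created inodes in the I_NEW state.

```c
#include <assert.h>
#include <stdlib.h>

/* Lookup key: inode number plus optional checkpoint number (0 = none). */
struct toy_key {
    unsigned long ino;
    unsigned long long cno;
};

struct toy_inode {
    unsigned long i_ino;
    unsigned long long cno;
    struct toy_inode *next;
};

/* In the spirit of iget5_locked()'s test() callback. */
static int toy_test(const struct toy_inode *inode, const struct toy_key *key)
{
    return inode->i_ino == key->ino && inode->cno == key->cno;
}

/* In the spirit of iget5_locked()'s set() callback. */
static void toy_set(struct toy_inode *inode, const struct toy_key *key)
{
    inode->i_ino = key->ino;
    inode->cno = key->cno;
}

/* Find a cached inode matching the full key, or allocate and insert one.
 * Matching on i_ino alone (the iget_locked() behavior) would merge two
 * checkpoint versions of the same inode number into one entry. */
static struct toy_inode *toy_iget5(struct toy_inode **cache,
                                   const struct toy_key *key)
{
    struct toy_inode *inode;

    for (inode = *cache; inode; inode = inode->next)
        if (toy_test(inode, key))
            return inode;
    inode = malloc(sizeof(*inode));
    if (!inode)
        return NULL;
    toy_set(inode, key);
    inode->next = *cache;
    *cache = inode;
    return inode;
}
```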

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The current nilfs_destroy_inode() doesn't handle metadata file inodes
    including gc inodes (dummy inodes used for garbage collection).

    This allows nilfs_destroy_inode() to destroy inodes of metadata files.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi