10 May, 2011

1 commit

  • After commit 9954e7af14868b8b ("nilfs2: add free entries count only
    if clear bit operation succeeded") was applied, a free routine of
    nilfs fell into an infinite loop, printing the same message
    endlessly:

    nilfs_palloc_freev: entry number 29497 already freed
    nilfs_palloc_freev: entry number 29497 already freed
    nilfs_palloc_freev: entry number 29497 already freed
    nilfs_palloc_freev: entry number 29497 already freed
    nilfs_palloc_freev: entry number 29497 already freed ...

    That patch broke the routine so that a loop counter is never updated
    in an abnormal state. This fixes the regression.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
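
The failure mode described above (a loop index that stops advancing on the error path) can be sketched in user space. This is only an illustration under stated assumptions: demo_test_and_clear() and demo_freev() are hypothetical names, and the bitmap layout is invented for the example; it is not the nilfs2 code.

```c
#include <stddef.h>

/* Hypothetical helper: clears bit nr and reports whether it was set. */
static int demo_test_and_clear(unsigned char *map, size_t nr)
{
	unsigned char mask = (unsigned char)(1u << (nr & 7));
	int was_set = (map[nr >> 3] & mask) != 0;

	map[nr >> 3] &= (unsigned char)~mask;
	return was_set;
}

/*
 * Frees every entry listed in nrs[0..n-1] and returns how many were
 * actually freed. The index i advances on every iteration, including
 * the "already freed" path. If it advanced only on success, one
 * already-clear bit would make the loop spin on the same entry forever.
 */
static size_t demo_freev(unsigned char *map, const size_t *nrs, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (demo_test_and_clear(map, nrs[i]))
			freed++;
		/* else: entry already freed; warn once and move on */
	}
	return freed;
}
```

Passing the same entry number twice exercises the abnormal path: the second occurrence is skipped and the loop still terminates.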
     

06 Apr, 2011

1 commit

  • With the ->sync_page() hook gone, we have a few users that
    add their own static address_space_operations without any
    functions defined.

    fs/inode.c already has an empty_aops that it uses for init
    purposes. Let's export that and use it in the places where
    an otherwise empty aops was defined.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

30 Mar, 2011

3 commits

  • Fixes whitespace coding style issues.

    Signed-off-by: Nicolas Kaiser
    Signed-off-by: Ryusuke Konishi

    Nicolas Kaiser
     
  • Nilfs in 2.6.39-rc1 hit the following oops:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    IP: [] try_to_release_page+0x2a/0x3d
    PGD 234cb6067 PUD 234c72067 PMD 0
    Oops: 0000 [#1] SMP

    Process truncate (pid: 10995, threadinfo ffff8802353c2000, task ffff880234cfa000)
    Stack:
    ffff8802333c77b8 ffffffff810b64b0 0000000000003802 ffffffffa0052cca
    0000000000000000 ffff8802353c3b58 0000000000000000 ffff8802353c3b58
    0000000000000001 0000000000000000 ffffea0007b92308 ffffea0007b92308
    Call Trace:
    [] ? invalidate_inode_pages2_range+0x15f/0x273
    [] ? nilfs_palloc_get_block+0x2d/0xaf [nilfs2]
    [] ? bit_waitqueue+0x14/0xa1
    [] ? wake_up_bit+0x10/0x20
    [] ? nilfs_forget_buffer+0x66/0x7a [nilfs2]
    [] ? nilfs_btree_concat_left+0x5c/0x77 [nilfs2]
    [] ? nilfs_btree_delete+0x395/0x3cf [nilfs2]
    [] ? nilfs_bmap_do_delete+0x6e/0x79 [nilfs2]
    [] ? nilfs_btree_last_key+0x14b/0x15e [nilfs2]
    [] ? nilfs_bmap_truncate+0x2f/0x83 [nilfs2]
    [] ? nilfs_bmap_last_key+0x35/0x62 [nilfs2]
    [] ? nilfs_truncate_bmap+0x6b/0xc7 [nilfs2]
    [] ? nilfs_truncate+0x79/0xe4 [nilfs2]
    [] ? vmtruncate+0x33/0x3b
    [] ? nilfs_setattr+0x4d/0x8c [nilfs2]
    [] ? do_page_fault+0x31b/0x356
    [] ? notify_change+0x17d/0x262
    [] ? do_truncate+0x65/0x80
    [] ? sys_ftruncate+0xf1/0xf6
    [] ? system_call_fastpath+0x16/0x1b
    Code: c3 48 83 ec 08 48 8b 17 48 8b 47 18 80 e2 01 75 04 0f 0b eb fe 48 8b 17 80 e6 20 74 05 31 c0 41 59 c3 48 85 c0 74 11 48 8b 40 58
    8b 40 48 48 85 c0 74 04 41 58 ff e0 59 e9 b1 b5 05 00 41 54
    RIP [] try_to_release_page+0x2a/0x3d
    RSP
    CR2: 0000000000000048

    This oops was introduced by the change "block: remove per-queue
    plugging" (commit: 7eaceaccab5f40bb). It initializes mapping->a_ops
    with a NULL pointer for some pages in nilfs (e.g. btree node pages),
    but mm code does not check mapping->a_ops itself for NULL (the
    check is done for each callback function).

    This corrects the aops initialization and fixes the oops.

    Signed-off-by: Ryusuke Konishi
    Acked-by: Jens Axboe

    Ryusuke Konishi
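
The convention described above (callers test each callback for NULL but dereference the operations table unconditionally) can be sketched as follows. All demo_* names are hypothetical stand-ins, and the return value 1 is an arbitrary choice for the example rather than kernel behaviour.

```c
#include <stddef.h>

/* Hypothetical operations table with a single optional callback. */
struct demo_aops {
	int (*releasepage)(void *page);
};

struct demo_mapping {
	const struct demo_aops *a_ops;
};

/* A shared table whose callbacks are all NULL; safe to point at. */
static const struct demo_aops demo_empty_aops = { NULL };

/* Initializes the mapping with the empty table instead of NULL. */
static void demo_mapping_init(struct demo_mapping *m)
{
	m->a_ops = &demo_empty_aops;
}

/*
 * Caller side: a_ops is dereferenced without a check, only the
 * callback is tested. A NULL a_ops would crash here.
 */
static int demo_try_release(const struct demo_mapping *m, void *page)
{
	if (m->a_ops->releasepage)
		return m->a_ops->releasepage(page);
	return 1; /* assumed fallback when no callback is installed */
}
```

With the empty table in place, the caller falls through to its default path instead of dereferencing a NULL pointer.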
     
  • A functional test of mmap showed that mmap writes to shared pages
    are broken for hole blocks. The filled blocks are not written out
    and the data is lost after umount. This is due to a bug where
    the target file is not queued for the log writer when hole blocks
    are filled.

    Also, the nilfs_page_mkwrite function leaves the normal code path even
    after successfully filling hole blocks, due to a change in the
    block_page_mkwrite function; just after nilfs was merged into the
    mainline, block_page_mkwrite() started to return VM_FAULT_LOCKED
    instead of zero with the patch "mm: close page_mkwrite races" (commit:
    b827e496c893de0c). The current nilfs_page_mkwrite() does not handle
    this value properly.

    This corrects nilfs_page_mkwrite() and will resolve the data loss
    problem in mmap write.

    [This should be applied to every kernel since 2.6.30 but a fix is
    needed for 2.6.37 and prior kernels]

    Signed-off-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc: stable [2.6.38]

    Ryusuke Konishi
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

24 Mar, 2011

2 commits

  • And give it a kernel-doc comment.

    [akpm@linux-foundation.org: btrfs changed in linux-next]
    Signed-off-by: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • As preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h, this converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Acked-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
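
As a rough illustration of what "little-endian bit operations" means for an on-disk bitmap: bit nr lives in byte nr / 8 at bit position nr % 8, regardless of the host's word size or endianness. The demo_* helpers below are hypothetical user-space stand-ins, not the kernel's helpers.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if little-endian bit nr is set in the bitmap at addr. */
static int demo_test_bit_le(size_t nr, const void *addr)
{
	const unsigned char *p = addr;

	return (p[nr / CHAR_BIT] >> (nr % CHAR_BIT)) & 1;
}

/* Sets little-endian bit nr (non-atomic). */
static void demo_set_bit_le(size_t nr, void *addr)
{
	unsigned char *p = addr;

	p[nr / CHAR_BIT] |= (unsigned char)(1u << (nr % CHAR_BIT));
}

/* Clears little-endian bit nr (non-atomic). */
static void demo_clear_bit_le(size_t nr, void *addr)
{
	unsigned char *p = addr;

	p[nr / CHAR_BIT] &= (unsigned char)~(1u << (nr % CHAR_BIT));
}
```

Because the helpers address the bitmap byte by byte, the same disk image is interpreted identically on big-endian and little-endian hosts.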
     

10 Mar, 2011

3 commits


09 Mar, 2011

8 commits


08 Mar, 2011

10 commits

  • This records the number of used blocks per checkpoint in each
    checkpoint entry of cpfile. Even though userland tools can get the
    block count via nilfs_get_cpinfo ioctl, it was not updated by the
    nilfs2 kernel code. This fixes the issue and makes it available for
    userland tools to calculate used amount per checkpoint.

    Signed-off-by: Ryusuke Konishi
    Cc: Jiro SEKIBA

    Ryusuke Konishi
     
  • This is a similar change to those in the ext2/ext3 codebases (commits
    40a063f6691ce937 and a4ae3094869f18e2, respectively).

    The addition of 64k block capability in the rec_len_from_disk and
    rec_len_to_disk functions added a bit of math overhead which slows
    down file create workloads needlessly when the architecture cannot
    even support 64k blocks. This will cut the corner.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
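
One way to read the change above is that the 64k special case is only compiled in when the page size can actually reach 64k. The sketch below assumes that reading; DEMO_PAGE_SIZE, DEMO_MAX_REC_LEN and both function names are hypothetical, and byte-order conversion of the on-disk value is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEMO_PAGE_SIZE
#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */
#endif

#define DEMO_MAX_REC_LEN ((1u << 16) - 1)

/* Decodes a 16-bit on-disk record length. */
static unsigned demo_rec_len_from_disk(uint16_t dlen)
{
	unsigned len = dlen;

#if DEMO_PAGE_SIZE >= 65536
	/* Only 64k blocks need the "max value means 65536" encoding. */
	if (len == DEMO_MAX_REC_LEN)
		return 1u << 16;
#endif
	return len;
}

/* Encodes a record length into its 16-bit on-disk form. */
static uint16_t demo_rec_len_to_disk(unsigned len)
{
#if DEMO_PAGE_SIZE >= 65536
	if (len == (1u << 16))
		return (uint16_t)DEMO_MAX_REC_LEN;
#endif
	assert(len <= DEMO_MAX_REC_LEN);
	return (uint16_t)len;
}
```

With the default 4096-byte page size assumed here, both functions reduce to a plain copy and the comparison disappears from the hot path.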
     
  • At present, the same warning message can be output twice when nilfs
    detects a problem on super blocks:

    NILFS warning: broken superblock. using spare superblock.
    NILFS warning: broken superblock. using spare superblock.
    ...

    This is because these super blocks are reloaded with the block size
    written in a super block if it differs from the first block size, but
    this repetition looks somewhat confusing. So, we hint at what is
    going on by appending block size information to those messages.

    Reported-by: Wakko Warner
    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The current FS_IOC_GETFLAGS/SETFLAGS/GETVERSION will fail if the
    application is 32-bit and the kernel is 64-bit.

    This issue is avoidable by adding a compat_ioctl method.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Add support for the standard attributes set via chattr and read via
    lsattr. These attributes are already in the flags value in the nilfs2
    inode, but currently we don't have any ioctl commands that expose them
    to the userland.

    Collaterally, this adds the FS_IOC_GETVERSION ioctl for getting
    i_generation, which allows users to list the file's generation number
    with "lsattr -v".

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Nilfs has few restrictions on which flags may be set on which inodes,
    as the ext2/3/4 filesystems used to. Specifically, DIRSYNC may only
    be set on directories and IMMUTABLE and APPEND may not be set on
    links. Tighten that to disallow TOPDIR being set on non-directories
    and to allow only NODUMP and NOATIME on inodes that are neither
    regular files nor directories.

    This introduces a flags masking function like those of extN and uses
    it during inode creation.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
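
A minimal sketch of such a flags masking function, following the rules stated above. The DEMO_* flag values and the demo_kind enum are assumptions made for this example (the enum stands in for the S_ISDIR/S_ISREG checks on the inode mode); the real nilfs2 and extN definitions may differ.

```c
/* Hypothetical flag bits for the sketch. */
#define DEMO_NODUMP_FL	0x00000040u
#define DEMO_NOATIME_FL	0x00000080u
#define DEMO_DIRSYNC_FL	0x00010000u
#define DEMO_TOPDIR_FL	0x00020000u

/* Regular files may carry anything except directory-only flags. */
#define DEMO_REG_FLMASK   (~(DEMO_DIRSYNC_FL | DEMO_TOPDIR_FL))
/* Other inode types may carry only NODUMP and NOATIME. */
#define DEMO_OTHER_FLMASK (DEMO_NODUMP_FL | DEMO_NOATIME_FL)

enum demo_kind { DEMO_KIND_DIR, DEMO_KIND_REG, DEMO_KIND_OTHER };

/* Returns the subset of flags that is allowed for the inode kind. */
static unsigned demo_mask_flags(enum demo_kind kind, unsigned flags)
{
	if (kind == DEMO_KIND_DIR)
		return flags;
	if (kind == DEMO_KIND_REG)
		return flags & DEMO_REG_FLMASK;
	return flags & DEMO_OTHER_FLMASK;
}
```

Applied at inode creation, a mask like this lets a new inode inherit its parent's flags without picking up flags that make no sense for its type.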
     
  • At present, nilfs sets the S_NOATIME flag on all inodes. This restricts
    the nilfs_set_inode_flags function so that it sets S_NOATIME only if a
    given inode has the FS_NOATIME_FL flag.

    Although nilfs does not support atime yet, touch_atime() still safely
    returns on IS_NOATIME check since MS_NOATIME is always set on sb.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Replaces uses of nilfs-specific inode flags (i.e. NILFS_SECRM_FL,
    NILFS_UNRM_FL, NILFS_COMPR_FL, and so forth) with the common inode
    flags, and removes the nilfs-specific flag declarations.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Three functions of the current persistent object allocator,
    nilfs_palloc_commit_free_entry, nilfs_palloc_abort_alloc_entry, and
    nilfs_palloc_freev, unconditionally increment a counter after doing a
    clear bit operation on a bitmap block.

    If clear bit operations overlap, the counter no longer adds up.
    This fixes the issue by making the counter operations conditional.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
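
The conditional counter update can be illustrated with a small user-space model. The demo_group structure and demo_free_entry() are hypothetical; they only show the idea of counting a free when the bit was actually set beforehand.

```c
#include <limits.h>

#define DEMO_ENTRIES 64u

/* Hypothetical allocation group: a bitmap plus a free-entry counter. */
struct demo_group {
	unsigned char bitmap[DEMO_ENTRIES / CHAR_BIT];	/* 1 = allocated */
	unsigned nfree;
};

/*
 * Frees entry nr. Returns 0 on success, or -1 if the entry was already
 * free. The counter is bumped only when the bit really changed, so a
 * repeated or overlapping free cannot inflate it.
 */
static int demo_free_entry(struct demo_group *g, unsigned nr)
{
	unsigned char mask = (unsigned char)(1u << (nr % CHAR_BIT));
	unsigned char *byte = &g->bitmap[nr / CHAR_BIT];

	if (!(*byte & mask))
		return -1;	/* already freed: leave nfree untouched */

	*byte &= (unsigned char)~mask;
	g->nfree++;
	return 0;
}
```

Freeing the same entry twice leaves the counter at one, which is the invariant the unconditional version would break.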
     
  • This fixes the issue that the inode count will not add up after removal
    of raw inodes fails. Hence, this prevents possible underflow of the
    inode count.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

04 Mar, 2011

1 commit


03 Mar, 2011

1 commit


02 Mar, 2011

1 commit

  • According to the report from Jiro SEKIBA titled "regression in
    2.6.37?" (Message-Id: ), on 2.6.37 and
    later kernels, lscp command no longer displays "i" flag on checkpoints
    that snapshot operations or garbage collection created.

    This is a regression of nilfs2 checkpointing function, and it's
    critical since it broke behavior of a part of nilfs2 applications.
    For instance, snapshot manager of TimeBrowse gets to create
    meaningless snapshots continuously; snapshot creation triggers another
    checkpoint, but applications cannot distinguish whether the new
    checkpoint contains meaningful changes or not without the i-flag.

    This patch fixes the regression and brings that application behavior
    back to normal.

    Reported-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Tested-by: Jiro SEKIBA
    Cc: stable [2.6.37]

    Ryusuke Konishi
     

24 Feb, 2011

1 commit

  • Michael Leun reported that running parallel opens on a fuse filesystem
    can trigger a "kernel BUG at mm/truncate.c:475"

    Gurudas Pai reported the same bug on NFS.

    The reason is, unmap_mapping_range() is not prepared for more than
    one concurrent invocation per inode. For example:

    thread1: going through a big range, stops in the middle of a vma and
    stores the restart address in vm_truncate_count.

    thread2: comes in with a small (e.g. single page) unmap request on
    the same vma, somewhere before restart_address, finds that the
    vma was already unmapped up to the restart address and happily
    returns without doing anything.

    Another scenario would be two big unmap requests, both having to
    restart the unmapping and each one setting vm_truncate_count to its
    own value. This could go on forever without any of them being able to
    finish.

    Truncate and hole punching already serialize with i_mutex. Other
    callers of unmap_mapping_range() do not, and it's difficult to get
    i_mutex protection for all callers. In particular ->d_revalidate(),
    which calls invalidate_inode_pages2_range() in fuse, may be called
    with or without i_mutex.

    This patch adds a new mutex to 'struct address_space' to prevent
    running multiple concurrent unmap_mapping_range() on the same mapping.

    [ We'll hopefully get rid of all this with the upcoming mm
    preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
    lockbreak" patch in particular. But that is for 2.6.39 ]

    Signed-off-by: Miklos Szeredi
    Reported-by: Michael Leun
    Reported-by: Gurudas Pai
    Tested-by: Gurudas Pai
    Acked-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

22 Jan, 2011

1 commit

  • Fixes the following kernel oops in nilfs_setup_super() which could
    arise if one of two super-blocks is unavailable.

    > BUG: unable to handle kernel NULL pointer dereference at (null)
    > Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 /
    > EIP: 0060:[] EFLAGS: 00010202 CPU: 3
    > EIP is at memcpy+0xc/0x1b
    > Call Trace:
    > [] ? nilfs_setup_super+0x6c/0xa5 [nilfs2]
    > [] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2]
    > [] ? nilfs_mount+0x4f9/0x62c [nilfs2]
    > [] ? kstrdup+0x36/0x3f
    > [] ? nilfs_mount+0x0/0x62c [nilfs2]
    > [] ? vfs_kern_mount+0x4d/0x12c
    > [] ? get_fs_type+0x76/0x8f
    > [] ? do_kern_mount+0x33/0xbf
    > [] ? do_mount+0x2ed/0x714
    > [] ? copy_mount_options+0x28/0xfc
    > [] ? sys_mount+0x72/0xaf
    > [] ? syscall_call+0x7/0xb

    Reported-by: Wakko Warner
    Signed-off-by: Ryusuke Konishi
    Tested-by: Wakko Warner
    Cc: stable [2.6.37, 2.6.36]
    LKML-Reference:

    Ryusuke Konishi
     

14 Jan, 2011

1 commit

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

11 Jan, 2011

1 commit


10 Jan, 2011

4 commits