10 Jan, 2012

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2/3/4: delete unneeded includes of module.h
    ext{3,4}: Fix potential race when setversion ioctl updates inode
    udf: Mark LVID buffer as uptodate before marking it dirty
    ext3: Don't warn from writepage when readonly inode is spotted after error
    jbd: Remove j_barrier mutex
    reiserfs: Force inode evictions before umount to avoid crash
    reiserfs: Fix quota mount option parsing
    udf: Treat symlink component of type 2 as /
    udf: Fix deadlock when converting file from in-ICB one to normal one
    udf: Cleanup calling convention of inode_getblk()
    ext2: Fix error handling on inode bitmap corruption
    ext3: Fix error handling on inode bitmap corruption
    ext3: replace ll_rw_block with other functions
    ext3: NULL dereference in ext3_evict_inode()
    jbd: clear revoked flag on buffers before a new transaction started
    ext3: call ext3_mark_recovery_complete() when recovery is really needed

    Linus Torvalds
     

09 Jan, 2012

2 commits

  • The j_barrier mutex is used for serializing different journal lock operations.
    The problem is that, e.g., the FIFREEZE ioctl results in a process leaving the
    kernel with the j_barrier mutex held, which makes lockdep complain. The
    hibernation code also wants to freeze the filesystem, but it cannot do so
    because it then cannot hibernate the system while the mutex is locked.

    So we remove j_barrier mutex and use direct wait on j_barrier_count instead.
    Since locking journal is a rare operation we don't have to care about fairness
    or such things.

    CC: Andrew Morton
    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
    Kconfig: acpi: Fix typo in comment.
    misc latin1 to utf8 conversions
    devres: Fix a typo in devm_kfree comment
    btrfs: free-space-cache.c: remove extra semicolon.
    fat: Spelling s/obsolate/obsolete/g
    SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
    tools/power turbostat: update fields in manpage
    mac80211: drop spelling fix
    types.h: fix comment spelling for 'architectures'
    typo fixes: aera -> area, exntension -> extension
    devices.txt: Fix typo of 'VMware'.
    sis900: Fix enum typo 'sis900_rx_bufer_status'
    decompress_bunzip2: remove invalid vi modeline
    treewide: Fix comment and string typo 'bufer'
    hyper-v: Update MAINTAINERS
    treewide: Fix typos in various parts of the kernel, and fix some comments.
    clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
    gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
    leds: Kconfig: Fix typo 'D2NET_V2'
    sound: Kconfig: drop unknown symbol ARCH_CLPS7500
    ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
    kconfig additions, close to removed commented-out old ones)

    Linus Torvalds
     

06 Dec, 2011

1 commit


22 Nov, 2011

2 commits

  • Currently, we clear the revoked flag only when a block is reused. However,
    this can trigger a false journal error. Consider a situation where a block
    is used as a meta block and is deleted (revoked) in ordered mode, then the
    block is allocated as a data block to a file. At this moment, the user changes
    the file's journal mode from ordered to journaled and truncates the file.
    The block will be considered re-revoked by the journal because it still has
    the revoked flag pending from the last transaction, and an assertion triggers.

    We fix the problem by keeping the revoked status more up to date: we clear
    the revoked flag when switching revoke tables, to reflect that there are no
    revoked buffers in the current transaction any more.

    Signed-off-by: Yongqiang Yang
    Signed-off-by: Jan Kara

    Yongqiang Yang
     
  • There is no reason to export two functions for entering the
    refrigerator. Calling refrigerator() instead of try_to_freeze()
    neither saves anything noticeable nor removes any race condition.

    * Rename refrigerator() to __refrigerator() and make it return bool
    indicating whether it scheduled out for freezing.

    * Update try_to_freeze() to return bool and relay the return value of
    __refrigerator() if freezing().

    * Convert all refrigerator() users to try_to_freeze().

    * Update documentation accordingly.

    * While at it, add might_sleep() to try_to_freeze().

    Signed-off-by: Tejun Heo
    Cc: Samuel Ortiz
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: KONISHI Ryusuke
    Cc: Christoph Hellwig

    Tejun Heo
     

02 Nov, 2011

1 commit

  • I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
    mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
    image has s_first = 0 in the journal superblock. The 0 is passed to
    journal->j_head in journal_reset(), then to blocknr in
    cleanup_journal_tail(), where the J_ASSERT finally fails.

    So validate s_first after reading journal superblock from disk in
    journal_get_superblock() to ensure s_first is valid.

    The following script could reproduce it:

    fstype=ext3
    blocksize=1024
    img=$fstype.img
    offset=0
    found=0
    magic="c0 3b 39 98"

    dd if=/dev/zero of=$img bs=1M count=8
    mkfs -t $fstype -b $blocksize -F $img
    filesize=`stat -c %s $img`
    while [ $offset -lt $filesize ]
    do
        if od -j $offset -N 4 -t x1 $img | grep -i "$magic"; then
            echo "Found journal: $offset"
            found=1
            break
        fi
        offset=`echo "$offset+$blocksize" | bc`
    done

    if [ $found -ne 1 ]; then
        echo "Magic \"$magic\" not found"
        exit 1
    fi

    dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1

    mkdir -p ./mnt
    mount -o loop $img ./mnt

    Cc: Jan Kara
    Signed-off-by: Eryu Guan
    Signed-off-by: "Theodore Ts'o"

    Eryu Guan
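
    The validation described in the entry above can be sketched roughly as
    follows. This is a minimal illustration, not the kernel patch: the struct
    toy_journal_sb, read_be32() and journal_first_block_valid() are hypothetical
    names, the sketch assumes s_first is stored big-endian in bytes 20 to 23 of
    the journal superblock (which is what the script's "+23" offset suggests),
    and the upper bound against s_maxlen is an extra assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two superblock fields used here. */
struct toy_journal_sb {
    uint32_t s_maxlen; /* total number of blocks in the journal */
    uint32_t s_first;  /* first block of log information */
};

/* Decode a 32-bit big-endian on-disk value. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return 1 if s_first can be used as a log start block, 0 otherwise.
 * Rejecting 0 is what the commit message asks for; the comparison
 * against s_maxlen is an assumption of this sketch.
 */
static int journal_first_block_valid(const struct toy_journal_sb *sb)
{
    return sb->s_first != 0 && sb->s_first < sb->s_maxlen;
}
```

    With such a check in place, a superblock whose s_first has been zeroed is
    rejected when it is read, instead of tripping an assertion later.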
     

28 Jun, 2011

1 commit

  • In journal checkpoint, we write the buffer and wait for it to finish.
    But in cfq, the async queue has a very low priority, and in our test,
    if there are too many sync queues and every queue is filled up with
    requests, the process will hang waiting for log space.

    So this patch uses WRITE_SYNC in __flush_batch so that the request will
    be moved into the sync queue and handled by cfq in a timely manner. We also
    use the new plug, so that all the WRITE_SYNC requests can be submitted as a
    whole when we unplug it.

    Reported-by: Robin Dong
    Signed-off-by: Tao Ma
    Signed-off-by: Jan Kara

    Tao Ma
     

27 Jun, 2011

1 commit

  • journal_remove_journal_head() can oops when trying to access journal_head
    returned by bh2jh(). This is caused for example by the following race:

    TASK1                                   TASK2
    journal_commit_transaction()
      ...
      processing t_forget list
        __journal_refile_buffer(jh);
        if (!jh->b_transaction) {
          jbd_unlock_bh_state(bh);
                                            journal_try_to_free_buffers()
                                              journal_grab_journal_head(bh)
                                              jbd_lock_bh_state(bh)
                                              __journal_try_to_free_buffer()
                                              journal_put_journal_head(jh)
          journal_remove_journal_head(bh);

    journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not
    part of any transaction and thus frees journal_head before TASK1 gets to doing
    so. Note that even buffer_head can be released by try_to_free_buffers() after
    journal_put_journal_head() which adds even larger opportunity for oops (but I
    didn't see this happen in reality).

    Fix the problem by making transactions hold their own journal_head reference
    (in b_jcount). That way we don't have to remove journal_head explicitly via
    journal_remove_journal_head() and instead just remove journal_head when
    b_jcount drops to zero. The result of this is that [__]journal_refile_buffer(),
    [__]journal_unfile_buffer(), and __journal_remove_checkpoint() can free
    journal_head which needs modification of a few callers. Also we have to be
    careful because once journal_head is removed, buffer_head might be freed as
    well. So we have to get our own buffer_head reference where it matters.

    Signed-off-by: Jan Kara

    Jan Kara
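
    The reference-counting idea in the entry above can be illustrated with a
    small toy model. This is only a sketch under assumptions: struct
    toy_journal_head and the toy_* helpers are hypothetical, there is no
    locking, and "freeing" is recorded in a flag so the effect can be observed.

```c
#include <assert.h>

/* Hypothetical stand-in for a journal_head with a reference count. */
struct toy_journal_head {
    int b_jcount; /* reference count */
    int on_list;  /* 1 while filed on a transaction list */
    int freed;    /* set instead of calling free(), so it can be checked */
};

/* Take a temporary reference. */
static void toy_get(struct toy_journal_head *jh)
{
    jh->b_jcount++;
}

/* Drop a reference; the last one to go releases the journal_head. */
static void toy_put(struct toy_journal_head *jh)
{
    if (--jh->b_jcount == 0)
        jh->freed = 1;
}

/* Filing on a transaction takes the transaction's own reference. */
static void toy_file_buffer(struct toy_journal_head *jh)
{
    jh->on_list = 1;
    toy_get(jh);
}

/* Unfiling drops the transaction's reference and may free the head. */
static void toy_unfile_buffer(struct toy_journal_head *jh)
{
    jh->on_list = 0;
    toy_put(jh);
}
```

    In this model a second task that grabs and then drops its own reference
    cannot free the structure while the transaction still holds one; it is
    released only by whichever put brings the count to zero.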
     

25 Jun, 2011

3 commits

  • journal_get_create_access should drop jh->b_jcount in error handling path

    Signed-off-by: Ding Dinghua
    Signed-off-by: Jan Kara

    Ding Dinghua
     
  • The callers of start_this_handle() (or better ext3_journal_start()) are not
    really prepared to handle allocation failures. Such failures can for example
    result in silent data loss when they happen in ext3_..._writepage(). OTOH
    __GFP_NOFAIL is going away so we just retry allocation in start_this_handle().

    This loop is potentially dangerous because the oom killer cannot be invoked
    for GFP_NOFS allocation, so there is a potential for looping forever.
    But still this is better than silent data loss.

    Signed-off-by: Jan Kara

    Jan Kara
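
    The retry-on-failure pattern from the entry above looks roughly like the
    userspace sketch below. Everything here is hypothetical: toy_flaky_alloc()
    simulates an allocator that fails a set number of times, and
    toy_alloc_with_retry() is not the kernel function. A real implementation
    would presumably wait or yield between attempts rather than spin.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of simulated allocation failures still to come (test knob). */
static int toy_failures_left;

/* Hypothetical allocator that fails while toy_failures_left > 0. */
static void *toy_flaky_alloc(size_t size)
{
    if (toy_failures_left > 0) {
        toy_failures_left--;
        return NULL;
    }
    return malloc(size);
}

/*
 * Keep retrying until the allocation succeeds, so callers never see a
 * failure. As the commit message notes, this can loop forever if memory
 * never becomes available. *attempts reports how many tries were needed.
 */
static void *toy_alloc_with_retry(size_t size, unsigned *attempts)
{
    void *p;

    *attempts = 0;
    for (;;) {
        (*attempts)++;
        p = toy_flaky_alloc(size);
        if (p)
            return p;
        /* A real implementation would back off here before retrying. */
    }
}
```

    The trade-off is the one stated in the commit message: callers never have
    to handle a failed allocation, at the price of an unbounded loop.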
     
  • This commit adds fixed tracepoints for jbd. They are based on the fixed
    tracepoints for jbd2; however, those for collecting statistics are
    missing, since I think they will require a more intrusive patch and so
    should have their own commit, if someone decides that they are needed.
    Also there are new tracepoints in __journal_drop_transaction() and
    journal_update_superblock().

    The list of jbd tracepoints:

    jbd_checkpoint
    jbd_start_commit
    jbd_commit_locking
    jbd_commit_flushing
    jbd_commit_logging
    jbd_drop_transaction
    jbd_end_commit
    jbd_do_submit_data
    jbd_cleanup_journal_tail
    jbd_update_superblock_end

    Signed-off-by: Lukas Czerner
    Cc: Jan Kara
    Signed-off-by: Jan Kara

    Lukas Czerner
     

24 May, 2011

1 commit


17 May, 2011

3 commits

  • summarise_journal_usage seems to have been obsolete for a long time,
    so remove it.

    Cc: Jan Kara
    Signed-off-by: Tao Ma
    Signed-off-by: Jan Kara

    Tao Ma
     
  • In do_get_write_access() we wait on the BH_Unshadow bit for the buffer to
    leave the shadow state. The waking code in journal_commit_transaction() has
    a bug because it does not issue a memory barrier after the buffer is moved
    from the shadow state and before wake_up_bit() is called. Thus a waitqueue
    check can happen before the buffer is actually moved from the shadow state
    and the waiting process may never be woken. Fix the problem by issuing a
    proper barrier.

    CC: stable@kernel.org
    Reported-by: Tao Ma
    Signed-off-by: Jan Kara

    Jan Kara
     
  • If an application program does not make any changes to the indirect
    blocks or extent tree, i_datasync_tid will not get updated. If there
    are enough commits (i.e., 2**31) such that tid_geq()'s calculations
    wrap, and there isn't a currently active transaction at the time of
    the fdatasync() call, this can end up triggering a BUG_ON in
    fs/jbd/commit.c:

    J_ASSERT(journal->j_running_transaction != NULL);

    It's pretty rare that this can happen, since it requires the use of
    fdatasync() plus *very* frequent and excessive use of fsync(). But
    with the right workload, it can.

    We fix this by replacing the use of tid_geq() with an equality test,
    since there's only one transaction id that is valid for us to
    start: namely, the currently running transaction (if it exists).

    CC: stable@kernel.org
    Reported-by: Martin_Zielinski@McAfee.com
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Jan Kara

    Ted Ts'o
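
    The wraparound described in the entry above can be reproduced with plain
    32-bit arithmetic. The sketch below is an assumption-laden illustration:
    tid_geq() is written from the description (a signed difference of two
    32-bit ids), and toy_is_startable_tid() is a hypothetical helper showing
    the equality-based test, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tid_t; /* transaction ids modelled as 32-bit counters */

/*
 * Wraparound-aware "x is at or after y": compare the signed difference.
 * The conversion to int32_t wraps on common compilers; strictly speaking
 * that conversion is implementation-defined in C.
 */
static int tid_geq(tid_t x, tid_t y)
{
    int32_t difference = (int32_t)(x - y);
    return difference >= 0;
}

/*
 * Equality-based test: a tid may only be started if a transaction is
 * running and the tid is exactly that transaction's id.
 */
static int toy_is_startable_tid(tid_t target, int have_running,
                                tid_t running_tid)
{
    return have_running && target == running_tid;
}
```

    Once two ids are 2**31 apart, tid_geq() reports the newer one as not
    being at or after the older one, so a stale id can look current. The
    equality test has no such ambiguity and also refuses to start anything
    when no transaction is running.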
     

31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

17 Mar, 2011

1 commit


10 Mar, 2011

1 commit

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Mar, 2011

1 commit


10 Dec, 2010

1 commit


28 Oct, 2010

11 commits


23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

20 Sep, 2010

1 commit

  • Fsync performance for small files achieved by cfq on high-end disks is
    lower than what deadline can achieve, due to idling introduced between
    the sync write happening in process context and the journal commit.

    Moreover, when competing with a sequential reader, a process writing
    small files and fsync-ing them is starved.

    This patch fixes the two problems by:
    - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
    flag set,
    - forcing all queues that have REQ_NOIDLE requests to be put in the noidle
    tree.

    Having the queue associated to the fsync-ing process and the one associated
    to journal commits in the noidle tree allows:
    - switching between them without idling,
    - fairness vs. competing idling queues, since they will be serviced only
    after the noidle tree expires its slice.

    Acked-by: Vivek Goyal
    Reviewed-by: Jeff Moyer
    Tested-by: Jeff Moyer
    Signed-off-by: Corrado Zoccolo
    Signed-off-by: Jens Axboe

    Corrado Zoccolo
     

10 Sep, 2010

1 commit


18 Aug, 2010

2 commits

  • These flags aren't real I/O types, but tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
    this patch fixes.

    In the ufs code clean up the helper that used to call ll_rw_block
    to mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Instead of abusing a buffer_head flag just add a variant of
    sync_dirty_buffer which allows passing the exact type of write
    flag required.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

21 Jul, 2010

1 commit


22 May, 2010

1 commit

  • log_start_commit() returns 1 only when it started a transaction
    commit. Thus in case transaction commit is already running, we
    fail to wait for the commit to finish. Fix the issue by always
    waiting for the commit regardless of the log_start_commit return
    value.

    Signed-off-by: Jan Kara

    Jan Kara
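
    The bug pattern in the entry above can be shown with a toy model. This is
    a hedged sketch only: struct toy_journal and the toy_* functions are
    hypothetical, and "waiting" is simulated by completing the commit
    immediately.

```c
#include <assert.h>

/* Hypothetical, minimal journal state. */
struct toy_journal {
    int commit_running; /* 1 while a commit is in progress */
    int committed;      /* 1 once the commit has finished */
};

/* Returns 1 only if this call started the commit, 0 if one already ran. */
static int toy_log_start_commit(struct toy_journal *j)
{
    if (j->commit_running)
        return 0;
    j->commit_running = 1;
    return 1;
}

/* Stand-in for waiting: the toy commit completes right away. */
static void toy_log_wait_commit(struct toy_journal *j)
{
    j->commit_running = 0;
    j->committed = 1;
}

/* Buggy pattern: waits only if this caller started the commit. */
static void toy_sync_buggy(struct toy_journal *j)
{
    if (toy_log_start_commit(j))
        toy_log_wait_commit(j);
}

/* Fixed pattern: always waits, whatever the start call returned. */
static void toy_sync_fixed(struct toy_journal *j)
{
    toy_log_start_commit(j);
    toy_log_wait_commit(j);
}
```

    When a commit is already in flight, the buggy variant returns without
    waiting for it, while the fixed variant waits in both cases.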