09 Jan, 2012

1 commit

  • The j_barrier mutex is used to serialize journal lock operations. The
    problem with it is that, e.g., the FIFREEZE ioctl results in the process
    leaving the kernel with the j_barrier mutex held, which makes lockdep freak
    out. Also, hibernation code wants to freeze the filesystem, but it cannot do
    so, because it then cannot hibernate the system while the mutex is held.

    So we remove the j_barrier mutex and wait directly on j_barrier_count
    instead. Since locking the journal is a rare operation, we don't have to
    care about fairness or such things.

    CC: Andrew Morton
    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
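A minimal userspace sketch of the "wait directly on a counter" idea, assuming pthreads as a stand-in for the kernel's j_state_lock spinlock and wait queue. All names (journal_model, journal_model_lock_updates, ...) are hypothetical, and the real journal_lock_updates() also waits for running handles to drain, which is omitted here. What it shows: the short-term lock is released before returning, so no sleeping lock stays held while the caller is back in user space.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of the journal fields touched by this change;
 * the real journal_t lives in include/linux/jbd.h. */
struct journal_model {
	pthread_mutex_t state_lock;   /* stands in for j_state_lock */
	pthread_cond_t barrier_wait;  /* stands in for the journal wait queue */
	int barrier_count;            /* stands in for j_barrier_count */
};

void journal_model_init(struct journal_model *j)
{
	pthread_mutex_init(&j->state_lock, NULL);
	pthread_cond_init(&j->barrier_wait, NULL);
	j->barrier_count = 0;
}

/* Wait until no other locker holds the barrier, then take it.
 * The state lock is dropped before returning: only the counter
 * records that the journal is locked. */
void journal_model_lock_updates(struct journal_model *j)
{
	pthread_mutex_lock(&j->state_lock);
	while (j->barrier_count > 0)
		pthread_cond_wait(&j->barrier_wait, &j->state_lock);
	j->barrier_count++;
	pthread_mutex_unlock(&j->state_lock);
}

void journal_model_unlock_updates(struct journal_model *j)
{
	pthread_mutex_lock(&j->state_lock);
	assert(j->barrier_count > 0);
	j->barrier_count--;
	/* No fairness: every waiter wakes up and re-checks the counter. */
	pthread_cond_broadcast(&j->barrier_wait);
	pthread_mutex_unlock(&j->state_lock);
}
```

Because waiters simply re-check barrier_count after a broadcast, there is no FIFO ordering among them, which matches the remark that fairness does not matter for this rare operation.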
     

27 Jun, 2011

1 commit

  • journal_remove_journal_head() can oops when trying to access the journal_head
    returned by bh2jh(). This can be caused, for example, by the following race:

    TASK1                                   TASK2
    journal_commit_transaction()
      ...
      processing t_forget list
        __journal_refile_buffer(jh);
        if (!jh->b_transaction) {
          jbd_unlock_bh_state(bh);
                                            journal_try_to_free_buffers()
                                              journal_grab_journal_head(bh)
                                              jbd_lock_bh_state(bh)
                                              __journal_try_to_free_buffer()
                                              journal_put_journal_head(jh)
          journal_remove_journal_head(bh);

    journal_put_journal_head() in TASK2 sees that b_jcount == 0 and that the
    buffer is not part of any transaction, and thus frees the journal_head
    before TASK1 gets to do so. Note that even the buffer_head can be released
    by try_to_free_buffers() after journal_put_journal_head(), which opens an
    even larger window for an oops (but I didn't see this happen in reality).

    Fix the problem by making transactions hold their own journal_head reference
    (in b_jcount). That way we don't have to remove the journal_head explicitly
    via journal_remove_journal_head(); instead, the journal_head is simply
    removed when b_jcount drops to zero. As a result, [__]journal_refile_buffer(),
    [__]journal_unfile_buffer(), and __journal_remove_checkpoint() can free the
    journal_head, which requires modifying a few callers. We also have to be
    careful because once the journal_head is removed, the buffer_head might be
    freed as well, so we have to take our own buffer_head reference where it
    matters.

    Signed-off-by: Jan Kara

    Jan Kara
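A sketch of the refcounting rule described above, assuming simplified stand-ins for buffer_head and journal_head (every name here is hypothetical, not the jbd API). Filing a buffer on a transaction takes the transaction's own reference, and the journal_head is freed only when the last reference goes away, so another task dropping its reference cannot free it while the transaction still uses it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for journal_head and buffer_head. */
struct jh_model {
	int jcount;          /* models b_jcount */
	int on_transaction;  /* models b_transaction != NULL */
};

struct bh_model {
	struct jh_model *jh; /* models the bh -> jh link used by bh2jh() */
};

/* Attach a journal_head if needed and take a reference to it. */
struct jh_model *model_grab_jh(struct bh_model *bh)
{
	if (!bh->jh) {
		bh->jh = calloc(1, sizeof(*bh->jh));
		if (!bh->jh)
			return NULL;
	}
	bh->jh->jcount++;
	return bh->jh;
}

/* Drop a reference; free the journal_head only when the last one goes. */
void model_put_jh(struct bh_model *bh)
{
	struct jh_model *jh = bh->jh;

	assert(jh && jh->jcount > 0);
	if (--jh->jcount == 0) {
		bh->jh = NULL;
		free(jh);
	}
}

/* Filing a buffer on a transaction takes the transaction's own reference... */
void model_file_buffer(struct bh_model *bh)
{
	struct jh_model *jh = model_grab_jh(bh);

	if (jh)
		jh->on_transaction = 1;
}

/* ...and unfiling drops it, which may free the journal_head. */
void model_unfile_buffer(struct bh_model *bh)
{
	assert(bh->jh);
	bh->jh->on_transaction = 0;
	model_put_jh(bh);
}
```

Since unfiling may free the journal_head, a caller in this model must not touch bh->jh afterwards unless it holds a reference of its own, which mirrors why the patch had to adjust a few callers.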
     

25 Jun, 2011

2 commits

  • journal_get_create_access() should drop jh->b_jcount in the error handling path.

    Signed-off-by: Ding Dinghua
    Signed-off-by: Jan Kara

    Ding Dinghua
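The general pattern behind this fix, as a hypothetical sketch (the struct, function, and the use of -EROFS for an aborted handle are illustrative assumptions): once a journal_head reference has been taken, every exit path, including the error path, must drop it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a journal_head's reference count. */
struct model_jh3 {
	int jcount;  /* models b_jcount */
};

/* Take a reference, do the work, and release the reference on
 * every path through a single exit label. */
int model_get_create_access(struct model_jh3 *jh, int handle_aborted)
{
	int err = 0;

	jh->jcount++;                  /* like journal_add_journal_head() */
	if (handle_aborted) {
		err = -EROFS;          /* error path must not leak the ref */
		goto out;
	}
	/* ... file the buffer on the transaction here ... */
out:
	jh->jcount--;                  /* like journal_put_journal_head() */
	return err;
}
```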
     
  • The callers of start_this_handle() (or rather ext3_journal_start()) are not
    really prepared to handle allocation failures. Such failures can, for
    example, result in silent data loss when they happen in
    ext3_..._writepage(). On the other hand, __GFP_NOFAIL is going away, so we
    just retry the allocation in start_this_handle().

    This loop is potentially dangerous because the OOM killer cannot be invoked
    for GFP_NOFS allocations, so there is a potential for looping forever. But
    this is still better than silent data loss.

    Signed-off-by: Jan Kara

    Jan Kara
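A userspace sketch of the "retry instead of failing" loop, assuming a pluggable allocator so the behaviour can be exercised. The names are hypothetical, and sched_yield() is only a stand-in for whatever back-off the kernel uses between attempts. Like the kernel loop, it never gives up, which is exactly the infinite-loop risk the commit accepts.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical allocator hook so failures can be injected. */
typedef void *(*alloc_fn)(size_t size, void *ctx);

/* Retry until the allocation succeeds rather than returning an error
 * to callers that cannot handle one. May loop forever under sustained
 * memory pressure. */
void *alloc_retry_forever(alloc_fn alloc, size_t size, void *ctx,
			  unsigned *attempts)
{
	void *p;

	*attempts = 0;
	for (;;) {
		(*attempts)++;
		p = alloc(size, ctx);
		if (p)
			return p;
		sched_yield(); /* back off briefly before trying again */
	}
}

/* Demonstration allocator: fails the first *ctx calls, then succeeds. */
void *fail_first_n_alloc(size_t size, void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return NULL;
	}
	return malloc(size);
}
```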
     

24 May, 2011

1 commit


31 Mar, 2011

1 commit


10 Dec, 2010

1 commit


28 Oct, 2010

3 commits


08 Mar, 2010

1 commit


05 Mar, 2010

1 commit

  • Delay discarding buffers in journal_unmap_buffer() until we know that the
    "add to orphan" operation has definitely been committed; otherwise the log
    space of the committing transaction may be freed and reused before the
    truncate gets committed, and updates may be lost if a crash happens.

    This patch is a backport of a JBD2 fix by dingdinghua.

    Signed-off-by: Jan Kara

    Jan Kara
     

09 Feb, 2010

1 commit

  • In particular, several occurrences of funny versions of 'success',
    'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
    'beginning', 'desirable', 'separate' and 'necessary' are fixed.

    Signed-off-by: Daniel Mack
    Cc: Joe Perches
    Cc: Junio C Hamano
    Signed-off-by: Jiri Kosina

    Daniel Mack
     

16 Sep, 2009

2 commits


16 Jul, 2009

1 commit

  • The following race can happen:

    CPU1                                    CPU2
    checkpointing code checks the buffer,
      adds it to an array for writeback
                                            do_get_write_access()
                                            ...
                                            lock_buffer()
                                            unlock_buffer()
    flush_batch() submits the buffer for IO
                                            __jbd_journal_file_buffer()

    So a buffer under writeout is returned from do_get_write_access(). Since
    the filesystem code relies on the fact that journaled buffers cannot be
    written out, it does not take the buffer lock, and so it can modify the
    buffer while it is under writeout. That can lead to filesystem corruption
    if we crash at the right moment. A similar problem can happen with the
    journal_get_create_access() path.

    We fix the problem by clearing the buffer dirty bit under the buffer lock
    even if the buffer is on the BJ_None list. Actually, we clear the dirty bit
    regardless of the list the buffer is on, and warn if the buffer is already
    journalled.

    Thanks to dingdinghua for spotting the problem.

    Reported-by: dingdinghua
    Signed-off-by: Jan Kara

    Jan Kara
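A simplified model of the fix, assuming plain booleans for the buffer state bits (all names hypothetical; the "locked" flag only marks the critical section and does not provide real mutual exclusion). It shows the two rules from the text: the dirty bit is cleared with the buffer lock held whatever list the buffer is on, and a warning is emitted if an already-journalled buffer turns out to be dirty. Transferring the dirtiness to a separate jbddirty bit is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct model_buffer {
	bool locked;      /* marks the buffer-lock critical section */
	bool dirty;       /* models BH_Dirty: eligible for normal writeback */
	bool jbddirty;    /* assumed: journal now owns the dirtiness */
	bool journalled;  /* models the buffer already being on a transaction */
	int warnings;
};

/* Claim a buffer for journalling: clear the dirty bit under the buffer
 * lock, so checkpoint writeback cannot submit it later. */
void model_take_write_access(struct model_buffer *b)
{
	b->locked = true;               /* lock_buffer(): waits for writeout */
	if (b->dirty) {
		if (b->journalled) {
			fprintf(stderr, "warning: journalled buffer found dirty\n");
			b->warnings++;
		}
		b->dirty = false;       /* clear the dirty bit */
		b->jbddirty = true;     /* keep track that it still needs logging */
	}
	b->locked = false;              /* unlock_buffer() */
}
```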
     

19 Jun, 2009

1 commit

  • This removes the following patch:
    "commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
    Author: Mingming Cao
    Date: Fri Jul 25 01:46:22 2008 -0700

    jbd: fix race between free buffer and commit transaction"

    This patch is no longer needed because, if the race between freeing a
    buffer and committing a transaction occurs and dio gets an error, dio now
    falls back to buffered IO thanks to the following patch:

    commit 6ccfa806a9cfbbf1cd43d5b6aa47ef2c0eb518fd
    Author: Hisashi Hifumi
    Date: Tue Sep 2 14:35:40 2008 -0700

    VFS: fix dio write returning EIO when try_to_release_page fails

    Signed-off-by: Hisashi Hifumi
    Cc: Theodore Tso
    Cc: Mingming Cao
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     

28 Mar, 2009

1 commit


09 Jan, 2009

2 commits

  • Remove excess kernel-doc from fs/jbd/transaction.c:

    Warning(linux-2.6.28-git5//fs/jbd/transaction.c:764): Excess function parameter 'credits' description in 'journal_get_write_access'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • There is a flaw in the way jbd handles fsync batching. If we fsync() a
    file and we were not the last person to run fsync() on this fs, then we
    automatically sleep for 1 jiffy in order to wait for new writers to join
    the transaction before forcing the commit. The problem with this is that
    with really fast storage (e.g., a Clariion) the time it takes to commit a
    transaction to disk is far shorter than 1 jiffy in most cases, so sleeping
    means waiting longer with nothing to do than if we just committed the
    transaction and kept going. Ric Wheeler noticed this when using fs_mark
    with more than one thread: the throughput would plummet as he added more
    threads.

    This patch attempts to fix this problem by recording the average time in
    nanoseconds that it takes to commit a transaction to disk, and the time at
    which we started the transaction. If we run an fsync() and the transaction
    has been running for less time than it takes to commit a transaction to
    disk, we sleep for the delta and then commit to disk. We achieve sub-jiffy
    sleeping using schedule_hrtimeout(). This means that the wait time is
    auto-tuned to the speed of the underlying disk, instead of being a static
    timeout. I weighted the average according to somebody's comments (Andreas
    Dilger, I think) in order to help normalize random outliers where we take
    way longer or way less time to commit than the average. I also have a
    min() check in there to make sure we don't sleep longer than a jiffy in
    case our storage is super slow; this was requested by Andrew.

    I unfortunately do not have access to a Clariion, so I had to use a
    ramdisk to represent a super fast array. I tested with a SATA drive with
    barrier=1 to make sure there was no regression with local disks, with a
    4-way multipathed Apple Xserve RAID array, and of course with the ramdisk.
    I ran the following command:

    fs_mark -d /mnt/ext3-test -s 4096 -n 2000 -D 64 -t $i

    where $i was 2, 4, 8, 16 and 32. I mkfs'ed the fs each time. Here are my
    results:

    type      threads   with patch   without patch
    sata            2         24.6            26.3
    sata            4         49.2            48.1
    sata            8         70.1            67.0
    sata           16        104.0            94.1
    sata           32        153.6           142.7

    xserve          2        246.4           222.0
    xserve          4        480.0           440.8
    xserve          8        829.5           730.8
    xserve         16       1172.7          1026.9
    xserve         32       1816.3          1650.5

    ramdisk         2       2538.3          1745.6
    ramdisk         4       2942.3           661.9
    ramdisk         8       2882.5           999.8
    ramdisk        16       2738.7          1801.9
    ramdisk        32       2541.9          2394.0

    Signed-off-by: Josef Bacik
    Cc: Andreas Dilger
    Cc: Arjan van de Ven
    Cc: Ric Wheeler
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
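The adaptive wait described above reduces to two small calculations, sketched here in userspace C. The 3:1 weighting toward the previous average is an assumption chosen to illustrate "normalizing random outliers"; the function names are hypothetical. In the kernel the resulting delay would be slept with schedule_hrtimeout(); here the functions only compute it.

```c
#include <assert.h>
#include <stdint.h>

/* Fold one commit's duration into the running average. The 3:1
 * weighting toward the old average is an assumption that damps
 * outliers; the very first sample seeds the average directly. */
uint64_t update_avg_commit_ns(uint64_t avg_ns, uint64_t this_commit_ns)
{
	if (avg_ns == 0)
		return this_commit_ns;
	return (avg_ns * 3 + this_commit_ns) / 4;
}

/* How long an fsync() caller should wait for more writers to join
 * before forcing the commit: the remaining part of an average commit,
 * capped at one jiffy, or 0 if the transaction is already old enough. */
uint64_t fsync_batch_wait_ns(uint64_t avg_commit_ns,
			     uint64_t trans_running_ns, uint64_t jiffy_ns)
{
	uint64_t commit_ns = avg_commit_ns < jiffy_ns ? avg_commit_ns : jiffy_ns;

	if (trans_running_ns >= commit_ns)
		return 0;
	return commit_ns - trans_running_ns; /* sleep for the delta */
}
```

On fast storage the average commit time is tiny, so the computed wait shrinks well below a jiffy; on slow storage the min() cap keeps it from exceeding the old fixed 1-jiffy sleep.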
     

31 Oct, 2008

1 commit

  • Delete excess kernel-doc notation in fs/ subdirectory:

    Warning(linux-2.6.27-git10//fs/jbd/transaction.c:886): Excess function parameter or struct member 'credits' description in 'journal_get_undo_access'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

20 Oct, 2008

1 commit

  • In ordered mode, if a file data buffer being dirtied exists in the
    committing transaction, we write the buffer to the disk, move it from the
    committing transaction to the running transaction, then dirty it. But we
    must not remove the buffer from the committing transaction when the buffer
    couldn't be written out; otherwise the error would be missed and the
    committing transaction would not abort.

    This patch adds an error check before removing the buffer from the
    committing transaction.

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
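A hypothetical model of the added error check (names and the exact behaviour on failure are assumptions of this sketch): the buffer is moved to the running transaction only if its synchronous write succeeded; on failure it stays on the committing transaction so the commit can see the error and abort.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_data_buf {
	int transaction;   /* id of the transaction the buffer is filed on */
	bool write_error;  /* models the outcome of the synchronous write */
	bool dirty;
};

/* Dirty a data buffer in ordered mode. If it belongs to the committing
 * transaction, it is written out first (simulated by write_error) and
 * only moved to the running transaction when that write succeeded. */
int model_dirty_data(struct model_data_buf *b, int committing, int running)
{
	if (b->transaction == committing) {
		if (b->write_error)
			return -EIO;   /* keep it on the committing transaction */
		b->transaction = running;
	}
	b->dirty = true;
	return 0;
}
```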
     

11 Aug, 2008

2 commits


26 Jul, 2008

1 commit

  • journal_try_to_free_buffers() could race with the jbd transaction commit
    when the latter is holding the buffer reference while waiting for the data
    buffer to flush to disk. If the caller of journal_try_to_free_buffers()
    tries hard to release the buffers, it will treat the failure as an error
    and return it to its caller. We have seen direct IO fail due to this race.
    Some callers of releasepage() also expect the buffer to be dropped when
    they pass the GFP_KERNEL mask to
    releasepage()->journal_try_to_free_buffers().

    With this patch, if the caller passes __GFP_WAIT and __GFP_FS to indicate
    that this call may wait, and try_to_free_buffers() fails, we wait for
    journal_commit_transaction() to finish committing the current committing
    transaction, then try to free those buffers again.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Mingming Cao
    Reviewed-by: Badari Pulavarty
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
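A small model of the new control flow, assuming local flag values in place of __GFP_WAIT and __GFP_FS and a simulated commit (all names hypothetical). Only callers whose mask permits both sleeping and filesystem recursion wait for the commit and retry; everyone else still gets the immediate failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for __GFP_WAIT and __GFP_FS. */
#define MODEL_GFP_WAIT 0x1u
#define MODEL_GFP_FS   0x2u

struct free_model {
	bool commit_holds_ref;  /* commit is waiting on data IO, holding a ref */
	int commits_waited;     /* how many times we waited for a commit */
};

/* Fails while the committing transaction still holds a reference. */
static bool model_try_to_free_buffers(struct free_model *m)
{
	return !m->commit_holds_ref;
}

/* Simulated wait for the commit to finish; it then drops its reference. */
static void model_wait_for_commit(struct free_model *m)
{
	m->commits_waited++;
	m->commit_holds_ref = false;
}

bool model_journal_try_to_free(struct free_model *m, unsigned gfp_mask)
{
	if (model_try_to_free_buffers(m))
		return true;
	/* Only callers that may sleep and enter the fs wait and retry. */
	if ((gfp_mask & MODEL_GFP_WAIT) && (gfp_mask & MODEL_GFP_FS)) {
		model_wait_for_commit(m);
		return model_try_to_free_buffers(m);
	}
	return false;
}
```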
     

28 Apr, 2008

3 commits

  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
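For illustration: C99's __func__ is a predefined identifier available in every function body with any conforming compiler, so it is a drop-in replacement for GCC's __FUNCTION__ in messages like the ones jbd prints. The function names below are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* __func__ behaves like a static const char array holding the name
 * of the enclosing function. */
const char *current_function_name(void)
{
	return __func__;
}

void log_here(void)
{
	printf("%s: called\n", __func__); /* instead of __FUNCTION__ */
}
```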
     
  • There are several cases where the running transaction can get buffers added to
    its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers
    counter end up larger than its t_outstanding_credits counter.

    This will cause issues when starting new transactions: while we are logging
    buffers we decrement t_outstanding_credits, so when t_outstanding_credits
    goes negative, we will report that we need less space in the journal than
    we actually need, and transactions will be started even though there may
    not be enough room for them. In the worst case (which admittedly is almost
    impossible to reproduce), this will result in the journal running out of
    space.

    The fix is to refile buffers from the committing transaction to the
    running transaction's BJ_Metadata list only when b_modified is set on the
    buffer's journal_head, which is the only way to be sure that the running
    transaction has modified that buffer.

    This patch also fixes an accounting error in journal_forget(): it is
    possible to call journal_forget() on a buffer without having modified it,
    only having gotten write access to it, so instead of always freeing a
    credit, we only do so if the buffer was modified. The added assert will
    help catch this problem if it occurs. Without these two patches I could
    hit this assert within minutes of running postmark; with them, this issue
    no longer arises.

    Signed-off-by: Josef Bacik
    Cc:
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
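A sketch of the credit accounting described above, assuming simplified stand-ins for the handle and journal_head (names hypothetical; using a "reserved" list for unmodified buffers is an assumption). A buffer consumes one credit the first time it is modified in a transaction, journal_forget() gives a credit back only if the buffer was modified, and refiling puts only modified buffers on the metadata list.

```c
#include <assert.h>
#include <stdbool.h>

enum model_list { MODEL_BJ_RESERVED, MODEL_BJ_METADATA };

struct model_handle {
	int buffer_credits;  /* models h_buffer_credits */
};

struct model_jh {
	bool modified;       /* models b_modified */
	enum model_list jlist;
};

/* The first modification in a transaction consumes exactly one credit. */
void model_dirty_metadata(struct model_handle *h, struct model_jh *jh)
{
	if (!jh->modified) {
		jh->modified = true;
		assert(h->buffer_credits > 0);
		h->buffer_credits--;
	}
	jh->jlist = MODEL_BJ_METADATA;
}

/* journal_forget(): return a credit only if one was actually consumed;
 * mere write access never consumed one. */
void model_forget(struct model_handle *h, struct model_jh *jh)
{
	if (jh->modified)
		h->buffer_credits++;
	jh->modified = false;
	jh->jlist = MODEL_BJ_RESERVED;
}

/* Refiling onto the running transaction: only modified buffers go on
 * the metadata list, so t_nr_buffers cannot exceed the credits used. */
enum model_list model_refile_list(const struct model_jh *jh)
{
	return jh->modified ? MODEL_BJ_METADATA : MODEL_BJ_RESERVED;
}
```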
     
  • Currently, at the start of a journal commit we loop through all of the
    buffers on the committing transaction and clear the b_modified flag (the
    flag that is set when a transaction modifies the buffer) under the
    j_list_lock.

    The problem is that everywhere else this flag is modified only under the
    jbd buffer state lock (jbd_lock_bh_state()), so this races with a running
    transaction, which could set the flag only to have it cleared by the
    committing transaction.

    This is also a big waste: there can be several thousand buffers whose
    modified flag we clear when we may not need to. This patch removes this
    code and instead clears the b_modified flag upon entering
    do_get_write_access()/journal_get_create_access(), so if that transaction
    does indeed use the buffer then it will be accounted for properly, and if
    it does not then we know we didn't use it.

    That will be important for the next patch in this series. Tested thoroughly
    by myself using postmark/iozone/bonnie++.

    Signed-off-by: Josef Bacik
    Cc:
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
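A hypothetical sketch of the lazy reset (field and function names are illustrative): instead of clearing b_modified for every buffer when a commit starts, the flag is cleared the first time a given transaction asks for write access to the buffer, and left alone on repeated access by the same transaction.

```c
#include <assert.h>
#include <stdbool.h>

struct model_jh2 {
	int transaction;       /* models b_transaction (0 = none) */
	int next_transaction;  /* models b_next_transaction (0 = none) */
	bool modified;         /* models b_modified */
};

/* Reset b_modified on the first touch by transaction txn only. */
void model_get_write_access(struct model_jh2 *jh, int txn)
{
	if (jh->transaction == txn || jh->next_transaction == txn)
		return;                     /* already ours: keep the flag */
	jh->modified = false;               /* first touch by this transaction */
	if (jh->transaction == 0)
		jh->transaction = txn;
	else
		jh->next_transaction = txn; /* still owned by the committing one */
}
```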
     

20 Mar, 2008

2 commits

  • Fix kernel-doc notation warnings in fs/.

    Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
    * mark_files_ro
    Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
    * lease_get_mtime
    Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
    * lookup_one_len: filesystem helper to lookup single pathname component
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
    * bh_uptodate_or_lock: Test whether the buffer is uptodate
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
    * bh_submit_read: Submit a locked buffer for reading
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
    * writeback_acquire: attempt to get exclusive writeback access to a device
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
    * writeback_in_progress: determine whether there is writeback in progress
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
    * writeback_release: relinquish exclusive writeback access against a device.
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
    Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
    * void journal_invalidatepage()

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fix kernel-doc notation in jbd.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

04 Mar, 2008

1 commit


18 Jan, 2008

1 commit

  • This likely fixes the oops in __lock_acquire reported as:

    http://www.kerneloops.org/raw.php?rawid=2753&msgid=
    http://www.kerneloops.org/raw.php?rawid=2749&msgid=

    In these reported oopses, start_this_handle is returning -EROFS.

    Signed-off-by: Jonas Bonn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonas Bonn
     

20 Oct, 2007

1 commit


19 Oct, 2007

1 commit

  • Get rid of sparse related warnings from places that use integer as NULL
    pointer.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Stephen Hemminger
    Cc: Andi Kleen
    Cc: Jeff Garzik
    Cc: Matt Mackall
    Cc: Ian Kent
    Cc: Arnd Bergmann
    Cc: Davide Libenzi
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Hemminger
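For illustration, the kind of change this commit makes (the struct and function are made up): sparse reports "Using plain integer as NULL pointer" when a literal 0 is used where a pointer is expected, so pointer-returning code should spell the null value as NULL.

```c
#include <assert.h>
#include <stddef.h>

struct item {
	int value;
};

/* Return a pointer to the first matching item, or NULL if none. */
struct item *find_item(struct item *items, size_t n, int value)
{
	for (size_t i = 0; i < n; i++)
		if (items[i].value == value)
			return &items[i];
	return NULL; /* rather than "return 0;", which sparse flags */
}
```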
     

18 Oct, 2007

2 commits


12 Oct, 2007

1 commit


09 May, 2007

2 commits


11 Dec, 2006

1 commit

  • This patch introduces a user of the round_jiffies() function: the "5 second"
    ext3/jbd wakeup.

    While "every 5 seconds" doesn't sound like a problem, there can be many of
    these (and these timers do add up across the whole kernel). The "5 second"
    wakeup isn't really timing-sensitive; in addition, even with rounding it'll
    still happen every 5 seconds (with the exception of the very first time,
    which is likely to be rounded up to somewhere closer to 6 seconds).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
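A simplified, hypothetical model of what rounding a timer to a whole second means. The quarter-second round-down threshold follows the kernel's rounding rule as I understand it, and the per-CPU skew the kernel adds is omitted, so treat this as an illustration rather than the real round_jiffies(). Aligning many periodic timers to second boundaries lets them expire together, so the CPU wakes up once instead of several times.

```c
#include <assert.h>

/* Align an absolute expiry time (in ticks) to a whole-second boundary.
 * hz is ticks per second; now is the current tick count. */
unsigned long model_round_jiffies(unsigned long expires, unsigned long now,
				  unsigned long hz)
{
	unsigned long original = expires;
	unsigned long rem = expires % hz;

	if (rem < hz / 4)
		expires -= rem;       /* just past a second: round down */
	else
		expires += hz - rem;  /* otherwise round up */

	/* Never move the timer into the past; keep the original instead. */
	if (expires <= now)
		return original;
	return expires;
}
```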