10 Sep, 2011

9 commits


07 Sep, 2011

2 commits

  • While running extended fsx tests to verify the preceding patches,
    a similar bug was also found in the write operation.

    Whenever a write operation begins or ends in a hole,
    or extends EOF, the partial page contained in the hole
    or beyond EOF needs to be zeroed out.

    To correct this, the new ext4_discard_partial_page_buffers_no_lock
    routine is used to zero out the partial page, but only for buffer
    heads that are already unmapped.

    Signed-off-by: Allison Henderson
    Signed-off-by: "Theodore Ts'o"

    Allison Henderson
     
  • While running extended fsx tests to verify the first
    two patches, a similar bug was also found in the
    truncate operation.

    This bug happens because the truncate routine only zeros
    the non-block-aligned portion of the last page. This means
    that the block aligned portions of the page appearing after
    i_size are left unzeroed, and the buffer heads still mapped.

    This bug is corrected by using ext4_discard_partial_page_buffers
    in the truncate routine to zero the partial page and unmap
    the buffer heads.

    Signed-off-by: Allison Henderson
    Signed-off-by: "Theodore Ts'o"

    Allison Henderson
     

06 Sep, 2011

1 commit

  • In delayed allocation mode, it's important to only call
    ext4_jbd2_file_inode when the file has been extended. This is
    necessary to avoid a race which first got introduced in commit
    678aaf481, but which was made much more common with the introduction
    of the "punch hole" functionality. (Especially when dioread_nolock
    was enabled, in which case I could reliably reproduce this problem
    with xfstests #74.)

    The race is this: If while trying to writeback a delayed allocation
    inode, there is a need to map delalloc blocks, and we run out of space
    in the journal, *and* at the same time the inode is already on the
    committing transaction's t_inode_list (because for example while doing
    the punch hole operation, ext4_jbd2_file_inode() is called), then the
    commit operation will wait for the inode to finish all of its pending
    writebacks by calling filemap_fdatawait(), but since that inode has
    one or more pages with the PageWriteback flag set, the commit
    operation will wait forever, and so the writeback of the inode can
    never take place, and the kjournald thread and the writeback thread
    end up waiting for each other --- forever.

    It's important at this point to recall why an inode is placed on the
    t_inode_list; it is to provide the data=ordered guarantees that we
    don't end up exposing stale data. In the case where we are truncating
    or punching a hole in the inode, there is no possibility that stale
    data could be exposed in the first place, so we don't need to put the
    inode on the t_inode_list!

    The right long-term fix is to get rid of data=ordered mode altogether,
    and only update the extent tree or indirect blocks after the data has
    been written. Until then, this change will also avoid some
    unnecessary waiting in the commit operation.

    Signed-off-by: "Theodore Ts'o"
    Cc: Allison Henderson
    Cc: Jan Kara

    Theodore Ts'o
     

04 Sep, 2011

3 commits

  • This silences some Sparse warnings:
    fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types)
    fs/jbd2/transaction.c:135:69: expected restricted gfp_t [usertype] flags
    fs/jbd2/transaction.c:135:69: got int [signed] gfp_mask

    Signed-off-by: Dan Carpenter
    Signed-off-by: "Theodore Ts'o"

    Dan Carpenter
     
  • Add debugging information in case jbd2_journal_dirty_metadata() is
    called with a buffer_head which didn't have
    jbd2_journal_get_write_access() called on it, or if the journal_head
    has the wrong transaction in it. In addition, return an error code.
    This won't change anything for ocfs2, which will BUG_ON() the non-zero
    exit code.

    For ext4, the caller of this function is ext4_handle_dirty_metadata(),
    and on seeing a non-zero return code, will call __ext4_journal_stop(),
    which will print the function and line number of the (buggy) calling
    function and abort the journal. This will allow us to recover instead
    of bug halting, which is better from a robustness and reliability
    point of view.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • If the user explicitly specifies conflicting mount options for
    delalloc or dioread_nolock and data=journal, fail the mount, instead
    of printing a warning and continuing (since many users won't look at
    dmesg and notice the warning).

    Also, print a single warning that data=journal implies that delayed
    allocation is not on by default (since it's not supported), and
    furthermore that O_DIRECT is not supported. Improve the text in
    Documentation/filesystems/ext4.txt so this is clear there as well.

    Similarly, if the dioread_nolock mount option is specified when the
    file system block size != PAGE_SIZE, fail the mount instead of
    printing a warning message and ignoring the mount option.
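
The rules described above can be sketched as a small user-space check. This is only an illustration of the decision logic, not the ext4 option parser: the struct, its fields, and the function name are hypothetical, and -1 simply stands for "fail the mount".

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the options the commit talks about. */
struct sketch_mount_opts {
    bool data_journal;     /* data=journal was requested */
    bool delalloc;         /* delalloc was explicitly requested */
    bool dioread_nolock;   /* dioread_nolock was explicitly requested */
    unsigned block_size;   /* file system block size in bytes */
    unsigned page_size;    /* PAGE_SIZE of the running system */
};

/* Returns 0 if the combination is acceptable, -1 if the mount should fail. */
static int sketch_validate_mount_opts(const struct sketch_mount_opts *o)
{
    /* Explicit delalloc or dioread_nolock conflicts with data=journal. */
    if (o->data_journal && (o->delalloc || o->dioread_nolock))
        return -1;

    /* dioread_nolock requires block size == page size. */
    if (o->dioread_nolock && o->block_size != o->page_size)
        return -1;

    return 0;
}
```

Returning an error instead of logging and continuing is the point of the change: the caller sees the failure immediately rather than having to read dmesg.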

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

03 Sep, 2011

3 commits

  • This patch fixes a second punch hole bug found by xfstests 127.

    This bug happens because punch hole needs to flush the pages
    of the hole to avoid race conditions. But if the end of the
    hole is in the same page as i_size, the buffer heads beyond
    i_size need to be unmapped and the page needs to be zeroed
    after it is flushed.

    To correct this, the new ext4_discard_partial_page_buffers
    routine is used to zero and unmap the partial page
    beyond i_size if the end of the hole appears in the same
    page as i_size.

    The code has also been optimized to set the end of the hole
    to the page after i_size if the specified hole exceeds i_size,
    and the code that flushes the pages has been simplified.

    Signed-off-by: Allison Henderson

    Allison Henderson
     
  • This patch addresses a bug found by xfstests 75, 112, 127
    when blocksize = 1k.

    This bug happens because the punch hole code only zeros
    out non block aligned regions of the page. This means that if the
    blocks are smaller than a page, then the block aligned regions of
    the page inside the hole are left un-zeroed, and their buffer heads
    are still mapped. This bug is corrected by using
    ext4_discard_partial_page_buffers to properly zero the partial page
    at the head and tail of the hole, and unmap the corresponding buffer
    heads.

    This patch also addresses a bug reported by Lukas while working on a
    new patch to add discard support for loop devices using punch hole.
    The bug happened because the first and last block numbers
    needed to be cast to a larger data type before calculating the
    byte offset, but since now we only need the byte offsets of the
    pages, we no longer even need to be calculating the byte offsets
    of the blocks. The code to do the block offset calculations is
    removed in this patch.

    Signed-off-by: Allison Henderson

    Allison Henderson
     
  • This patch adds two new routines: ext4_discard_partial_page_buffers
    and ext4_discard_partial_page_buffers_no_lock.

    The ext4_discard_partial_page_buffers routine is a wrapper
    function to ext4_discard_partial_page_buffers_no_lock.
    The wrapper function locks the page and passes it to
    ext4_discard_partial_page_buffers_no_lock.
    Calling functions that already have the page locked can call
    ext4_discard_partial_page_buffers_no_lock directly.

    The ext4_discard_partial_page_buffers_no_lock function
    zeros a specified range in a page, and unmaps the
    corresponding buffer heads. Only block aligned regions of the
    page will have their buffer heads unmapped. Non-block-aligned regions
    will be mapped if needed so that they can be updated with the
    partial zero out. This function is meant to
    be used to update a page and its buffer heads to be zeroed
    and unmapped when the corresponding blocks have been released
    or will be released.

    This routine is used in the following scenarios:
    * A hole is punched and the non page aligned regions
    of the head and tail of the hole need to be discarded

    * The file is truncated and the partial page beyond EOF needs
    to be discarded

    * The end of a hole is in the same page as EOF. After the
    page is flushed, the partial page beyond EOF needs to be
    discarded.

    * A write operation begins or ends inside a hole and the partial
    page appearing before or after the write needs to be discarded

    * A write operation extends EOF and the partial page beyond EOF
    needs to be discarded

    This function takes a flag EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
    which is used when a write operation begins or ends in a hole.
    When the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED flag is used, only
    buffer heads that are already unmapped will have the corresponding
    regions of the page zeroed.
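
The behaviour described above can be modelled in user space. The following is a minimal sketch, not the kernel routine: the page and block sizes, the struct, the flag value, and every `sketch_` name are assumptions made for illustration, and the "map the block if needed" step for partially covered blocks is left out (their mapping state is simply left unchanged).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE       4096u
#define SKETCH_BLOCK_SIZE      1024u
#define SKETCH_BLOCKS_PER_PAGE (SKETCH_PAGE_SIZE / SKETCH_BLOCK_SIZE)

/* Stand-in for EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED (value is arbitrary). */
#define SKETCH_ZERO_UNMAPPED   0x1u

/* One page plus one "buffer head" mapping bit per block. */
struct sketch_page {
    unsigned char data[SKETCH_PAGE_SIZE];
    bool mapped[SKETCH_BLOCKS_PER_PAGE];
};

/* Zero [from, from + len) in the page; unmap only fully covered blocks. */
static int sketch_discard_partial_page(struct sketch_page *pg, size_t from,
                                       size_t len, unsigned flags)
{
    if (from > SKETCH_PAGE_SIZE || len > SKETCH_PAGE_SIZE - from)
        return -1;

    size_t end = from + len;
    for (size_t b = 0; b < SKETCH_BLOCKS_PER_PAGE; b++) {
        size_t blk_start = b * SKETCH_BLOCK_SIZE;
        size_t blk_end = blk_start + SKETCH_BLOCK_SIZE;
        size_t lo = from > blk_start ? from : blk_start;
        size_t hi = end < blk_end ? end : blk_end;

        if (lo >= hi)
            continue;   /* block lies outside the requested range */

        /* With the flag set, only already-unmapped blocks are zeroed. */
        if ((flags & SKETCH_ZERO_UNMAPPED) && pg->mapped[b])
            continue;

        memset(pg->data + lo, 0, hi - lo);

        /* Only block-aligned (fully covered) regions are unmapped. */
        if (lo == blk_start && hi == blk_end)
            pg->mapped[b] = false;
    }
    return 0;
}
```

With 1k blocks in a 4k page, discarding bytes 1536 to 4095 zeroes the tail of block 1 but leaves it mapped, while blocks 2 and 3 are zeroed and unmapped.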

    Signed-off-by: Allison Henderson
    Signed-off-by: "Theodore Ts'o"

    Allison Henderson
     

01 Sep, 2011

2 commits

  • ext4_dx_add_entry manipulates bh2 and frames[0].bh, which are two buffer_heads
    that point to directory blocks assigned to the directory inode. However, the
    function calls ext4_handle_dirty_metadata with the inode of the file that's
    being added to the directory, not the directory inode itself. Therefore,
    correct the code to dirty the directory buffers with the directory inode, not
    the file inode.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     
  • ext4_mkdir calls ext4_handle_dirty_metadata with dir_block and the inode "dir".
    Unfortunately, dir_block belongs to the newly created directory (which is
    "inode"), not the parent directory (which is "dir"). Fix the incorrect
    association.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Darrick J. Wong
     

31 Aug, 2011

4 commits

  • When ext4_rename performs a directory rename (move), dir_bh is a
    buffer that is modified to update the '..' link in the directory being
    moved (old_inode). However, ext4_handle_dirty_metadata is called with
    the old parent directory inode (old_dir) and dir_bh, which is
    incorrect because dir_bh does not belong to the parent inode. Fix
    this error.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Darrick J. Wong
     
  • Currently, attempts to open a file with O_DIRECT in data=journal mode
    cause the open to fail with -EINVAL. This makes it very hard to test
    data=journal mode. So we will let the open succeed, but then always
    fall back to O_DSYNC buffered writes.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This doesn't make much sense, and it exposes a bug in the kernel where
    attempts to create a new file in an append-only directory using
    O_CREAT will fail (but still leave a zero-length file). This was
    discovered when xfstests #79 was generalized so it could run on all
    file systems.

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     
  • The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
    in ext4_evict_inode() cause lockdep to complain about potential
    deadlock in several places. In most/all of these LOCKDEP complaints
    it looks like it's a false positive, since many of the potential
    circular locking cases can't take place by the time the
    ext4_evict_inode() is called; but since at the very least it may mask
    real problems, we need to address this.

    This change removes the flush_completed_IO() and i_mutex lock in
    ext4_evict_inode(). Instead, we take a different approach to resolve
    the software lockup that commit 2581fdc810 intends to fix. Rather
    than having the ext4-dio-unwritten thread wait to grab the i_mutex
    lock of an inode, we use mutex_trylock() instead, and simply requeue
    the work item if we fail to grab the inode's i_mutex lock.

    This should speed up work queue processing in general and also
    prevents the following deadlock scenario: During page fault,
    shrink_icache_memory is called that in turn evicts another inode B.
    Inode B has some pending io_end work so it calls ext4_ioend_wait()
    that waits for inode B's i_ioend_count to become zero. However, inode
    B's ioend work was queued behind some of inode A's ioend work on the
    same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
    thread on that cpu is processing inode A's ioend work, it tries to
    grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
    still held from before the page fault happened, we enter a deadlock.
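
The try-lock-and-requeue idea can be sketched in user space as follows. This is an illustration only: a C11 `atomic_flag` stands in for the inode's i_mutex (the kernel code uses mutex_trylock()), and the struct, enum, and all `sketch_` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode with pending end_io work. */
struct sketch_inode {
    atomic_flag i_mutex_held;   /* models the inode's i_mutex */
    int pending_end_io;         /* models outstanding ioend work */
};

enum sketch_result { SKETCH_DONE, SKETCH_REQUEUE };

static void sketch_inode_init(struct sketch_inode *inode, int pending)
{
    atomic_flag_clear(&inode->i_mutex_held);
    inode->pending_end_io = pending;
}

/* Non-blocking lock attempt: true if we now hold the lock. */
static bool sketch_trylock(struct sketch_inode *inode)
{
    return !atomic_flag_test_and_set(&inode->i_mutex_held);
}

static void sketch_unlock(struct sketch_inode *inode)
{
    atomic_flag_clear(&inode->i_mutex_held);
}

/*
 * Work item body: never sleeps on the lock. If the lock is busy, the
 * caller is told to put the item back on the queue and move on, so work
 * queued behind it (for other inodes) is not blocked.
 */
static enum sketch_result sketch_end_io_work(struct sketch_inode *inode)
{
    if (!sketch_trylock(inode))
        return SKETCH_REQUEUE;

    inode->pending_end_io = 0;   /* complete the pending conversions */
    sketch_unlock(inode);
    return SKETCH_DONE;
}
```

Because the worker never blocks on inode A's lock, inode B's work item queued behind it can still run, which is what breaks the cycle in the scenario above.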

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Jiaying Zhang
     

23 Aug, 2011

3 commits


22 Aug, 2011

3 commits


21 Aug, 2011

4 commits

  • This fixes a regression introduced by commit cdcb725c05fe ("Btrfs: check
    if there is enough space for balancing smarter"). We can't do 64-bit
    divides on 32-bit architectures.

    In cases where we need to divide/multiply by 2 we should just right/left
    shift respectively, and in cases where there are N devices use
    do_div. Also make the counters u64 to match up with rw_devices.
    Thanks,
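
A user-space sketch of the two techniques follows. In the kernel, do_div(n, base) divides the 64-bit value n in place by a 32-bit base and returns the remainder; `sketch_do_div` below only mimics that calling convention (in user space the plain `/` and `%` operators work). All `sketch_` names are hypothetical.

```c
#include <stdint.h>

/*
 * Stand-in for the kernel's do_div(): divides *n in place by a 32-bit
 * base and returns the remainder.
 */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Dividing by 2 is a right shift; no 64-bit divide is needed. */
static uint64_t sketch_half(uint64_t bytes)
{
    return bytes >> 1;
}

/* Multiplying by 2 is a left shift. */
static uint64_t sketch_double(uint64_t bytes)
{
    return bytes << 1;
}

/* Splitting across N devices uses the do_div-style helper. */
static uint64_t sketch_per_device(uint64_t bytes, uint32_t num_devices)
{
    sketch_do_div(&bytes, num_devices);
    return bytes;
}
```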

    Signed-off-by: Josef Bacik
    Acked-and-tested-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Josef Bacik
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: flush any pending end_io requests before DIO reads w/dioread_nolock
    ext4: fix nomblk_io_submit option so it correctly converts uninit blocks
    ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.
    ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode
    ext4: Fix ext4_should_writeback_data() for no-journal mode

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: sound/aoa/fabrics/layout.c: remove unneeded kfree
    ALSA: hda - Fix error check from snd_hda_get_conn_index() in patch_cirrus.c
    ALSA: hda - Don't spew too many ELD errors
    ALSA: usb-audio - Fix missing mixer dB information
    ALSA: hda - Add "PCM" volume to vmaster slave list
    ALSA: hda - Fix duplicated capture-volume creation for ALC268 models
    ALSA: ac97: Add HP Compaq dc5100 SFF(PT003AW) to Headphone Jack Sense whitelist
    ALSA: snd_usb_caiaq: track submitted output urbs

    Linus Torvalds
     
  • Fix new kernel-doc warning in pci.c:

    Warning(drivers/pci/pci.c:3259): No description found for parameter 'mps'
    Warning(drivers/pci/pci.c:3259): Excess function parameter 'rq' description in 'pcie_set_mps'

    Signed-off-by: Randy Dunlap
    Cc: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

20 Aug, 2011

6 commits

  • The label outnodev is only used when kzalloc has not yet taken place or has
    failed, so there is no need for the call for kfree under this label.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @@
    identifier x;
    expression E1!=0,E2,E3,E4;
    statement S;
    iterator I;
    @@

    (
    if (...) { ... when != kfree(x)
    when != x = E3
    when != E3 = x
    * return ...;
    }
    ... when != x = E2
    when != I(...,x,...) S
    if (...) { ... when != x = E4
    kfree(x); ... return ...; }
    )
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Takashi Iwai

    Julia Lawall
     
  • snd_hda_get_conn_index() returns a negative value while the current code
    stores it in an unsigned int. It must be stored in a signed integer.
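
The effect can be reproduced with a small stand-alone sketch. The lookup function below is a hypothetical stand-in that returns -1 when nothing is found; it is not snd_hda_get_conn_index() itself.

```c
/* Hypothetical lookup: index of 'wanted' in 'list', or -1 if absent. */
static int sketch_find_index(const int *list, int n, int wanted)
{
    for (int i = 0; i < n; i++)
        if (list[i] == wanted)
            return i;
    return -1;
}

/*
 * Problem pattern: the result is stored in an unsigned int, so -1 turns
 * into a huge positive value and an "idx < 0" check can never fire.
 */
static unsigned int sketch_index_as_unsigned(const int *list, int n, int wanted)
{
    return (unsigned int)sketch_find_index(list, n, wanted);
}

/* Fixed pattern: keep the result signed and test it before use. */
static int sketch_lookup(const int *list, int n, int wanted, int *out)
{
    int idx = sketch_find_index(list, n, wanted);

    if (idx < 0)
        return idx;   /* propagate the error */
    *out = idx;
    return 0;
}
```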

    Reported-by: Jesper Juhl
    Signed-off-by: Takashi Iwai

    Takashi Iwai
     
  • Currently the HD-audio driver reports every invalid ELD byte as an error
    in the kernel log. This is annoying when the video driver doesn't
    set the correct ELD from the beginning. For example, radeon sends
    zero-byte data, but because we still read the ELD with the fixed
    128-byte size as a workaround for some broken devices, it spews 128
    errors.

    To avoid this, the driver now aborts reading when the first byte is
    invalid. In such a case, the whole data is certainly invalid.
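
A minimal sketch of that early abort, under stated assumptions: each raw value carries a "valid" bit plus one data byte, and the bit position, the function, and the error counter are all hypothetical rather than taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed "this byte is valid" marker in each raw value (hypothetical). */
#define SKETCH_ELD_VALID 0x80000000u

/*
 * Copy 'size' ELD bytes from 'raw' into 'buf'. *errors_logged counts how
 * many error messages would have been printed. Returns -1 as soon as the
 * first byte turns out to be invalid, 0 otherwise.
 */
static int sketch_read_eld(const uint32_t *raw, size_t size, uint8_t *buf,
                           unsigned *errors_logged)
{
    *errors_logged = 0;

    for (size_t i = 0; i < size; i++) {
        uint32_t val = raw[i];

        if (!(val & SKETCH_ELD_VALID)) {
            (*errors_logged)++;
            if (i == 0)
                return -1;   /* first byte invalid: give up at once */
            val = 0;         /* later bytes: substitute zero, continue */
        }
        buf[i] = (uint8_t)(val & 0xffu);
    }
    return 0;
}
```

For an all-zero 128-entry input this reports one error and stops, instead of reporting 128.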

    Signed-off-by: Takashi Iwai

    Takashi Iwai
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/drm-intel:
    drm/i915: set GFX_MODE to pre-Ivybridge default value even on Ivybridge

    Linus Torvalds
     
  • There is a race between ext4 buffer write and direct_IO read with
    dioread_nolock mount option enabled. The problem is that we clear
    PageWriteback flag during end_io time but will do
    uninitialized-to-initialized extent conversion later with dioread_nolock.
    If an O_DIRECT read request comes in during this period, ext4 will return
    zero instead of the recently written data.

    This patch checks whether there are any pending uninitialized-to-initialized
    extent conversion requests before doing an O_DIRECT read to close the race.
    Note that this is just a bandaid fix. The fundamental issue is that we
    clear PageWriteback flag before we really complete an IO, which is
    error-prone. To fix the fundamental issue, we may need to implement an
    extent tree cache that we can use to look up pending to-be-converted extents.

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Jiaying Zhang
     
  • Prior to Ivybridge, the GFX_MODE would default to 0x800, meaning that
    MI_FLUSH would flush the TLBs in addition to the rest of the caches
    indicated in the MI_FLUSH command. However starting with Ivybridge, the
    register defaults to 0x2800 out of reset, meaning that to invalidate the
    TLB we need to use PIPE_CONTROL. Since we're not doing that yet, go
    back to the old default so things work.

    v2: don't forget to actually *clear* the new bit
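
The two defaults quoted above differ in a single bit (0x2800 ^ 0x800 = 0x2000). As a plain arithmetic sketch, going back to the old default means clearing that bit. The names below are hypothetical, and this shows only the bit manipulation on a value, not how the driver actually programs the register.

```c
#include <stdint.h>

#define SKETCH_GFX_MODE_PRE_IVB   0x0800u   /* default before Ivybridge */
#define SKETCH_GFX_MODE_IVB_RESET 0x2800u   /* Ivybridge value out of reset */

/* The one bit that differs between the two defaults. */
#define SKETCH_GFX_MODE_NEW_BIT \
    (SKETCH_GFX_MODE_PRE_IVB ^ SKETCH_GFX_MODE_IVB_RESET)

/* Clear the new bit so the value matches the pre-Ivybridge default. */
static uint32_t sketch_restore_old_default(uint32_t gfx_mode)
{
    return gfx_mode & ~SKETCH_GFX_MODE_NEW_BIT;
}
```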

    Reviewed-by: Eric Anholt
    Reviewed-by: Chris Wilson
    Tested-by: Kenneth Graunke
    Signed-off-by: Jesse Barnes

    Jesse Barnes