09 Jan, 2012

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
    Kconfig: acpi: Fix typo in comment.
    misc latin1 to utf8 conversions
    devres: Fix a typo in devm_kfree comment
    btrfs: free-space-cache.c: remove extra semicolon.
    fat: Spelling s/obsolate/obsolete/g
    SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
    tools/power turbostat: update fields in manpage
    mac80211: drop spelling fix
    types.h: fix comment spelling for 'architectures'
    typo fixes: aera -> area, exntension -> extension
    devices.txt: Fix typo of 'VMware'.
    sis900: Fix enum typo 'sis900_rx_bufer_status'
    decompress_bunzip2: remove invalid vi modeline
    treewide: Fix comment and string typo 'bufer'
    hyper-v: Update MAINTAINERS
    treewide: Fix typos in various parts of the kernel, and fix some comments.
    clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
    gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
    leds: Kconfig: Fix typo 'D2NET_V2'
    sound: Kconfig: drop unknown symbol ARCH_CLPS7500
    ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
    kconfig additions, close to removed commented-out old ones)

    Linus Torvalds
     

02 Dec, 2011

2 commits

  • The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
    Please let me know if I missed anything, and I will try to get it changed and resent.

    Signed-off-by: Justin P. Mattock
    Acked-by: Randy Dunlap
    Signed-off-by: Jiri Kosina

    Justin P. Mattock
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
    ocfs2: avoid unaligned access to dqc_bitmap
    ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
    ocfs2: honor O_(D)SYNC flag in fallocate
    ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
    ocfs2: send correct UUID to cleancache initialization
    ocfs2: Commit transactions in error cases -v2
    ocfs2: make direntry invalid when deleting it
    fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
    ocfs2: Avoid livelock in ocfs2_readpage()
    ocfs2: serialize unaligned aio
    ocfs2: Implement llseek()
    ocfs2: Fix ocfs2_page_mkwrite()
    ocfs2: Add comment about orphan scanning
    ocfs2: Clean up messages in the fs
    ocfs2/cluster: Cluster up now includes network connections too
    ocfs2/cluster: Add new function o2net_fill_node_map()
    ocfs2/cluster: Fix output in file elapsed_time_in_ms
    ocfs2/dlm: dlmlock_remote() needs to account for remastery
    ocfs2/dlm: Take inflight reference count for remotely mastered resources too
    ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
    ...

    Linus Torvalds
     

17 Nov, 2011

1 commit


22 Aug, 2011

1 commit


28 Jul, 2011

1 commit

  • Fix a corruption that can happen when we have (two or more) outstanding
    aio's to an overlapping unaligned region. Ext4
    (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix
    similar issues.

    In our case what happens is that we can have an outstanding aio on a region
    and if a write comes in with some bytes overlapping the original aio we may
    decide to read that region into a page before continuing (typically because
    of buffered-io fallback). Since we have no ordering guarantees with the
    aio, we can read stale or bad data into the page and then write it back out.

    If the i/o is page and block aligned, then we avoid this issue as there
    won't be any need to read data from disk.

    I took the same approach as Eric in the ext4 patch and introduced some
    serialization of unaligned async direct i/o. I don't expect this to have an
    effect on the most common cases of AIO. Unaligned aio will be slower
    though, but that's far more acceptable than data corruption.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Mark Fasheh
     

26 Jul, 2011

2 commits

  • ocfs2 implements its own llseek() to provide the SEEK_HOLE/SEEK_DATA
    functionality.

    SEEK_HOLE sets the file pointer to the start of either a hole or an unwritten
    (preallocated) extent, that is greater than or equal to the supplied offset.

    SEEK_DATA sets the file pointer to the start of an allocated extent (not
    unwritten) that is greater than or equal to the supplied offset.

    If the supplied offset is on a desired region, then the file pointer is set
    to it. Offsets greater than or equal to the file size return -ENXIO.

    Unwritten (preallocated) extents are considered holes because the file system
    treats reads to such regions in the same way as it does to holes.

    Signed-off-by: Sunil Mushran

    Sunil Mushran
     
  • Replace the ->check_acl method with a ->get_acl method that simply reads an
    ACL from disk after having a cache miss. This means we can replace the ACL
    checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

21 Jul, 2011

4 commits

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2. For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     
  • Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING.
    This these filesystems to also protect truncate against direct I/O requests
    by using common code. Right now the only non-DIO_LOCKING filesystem that
    appears to do so is XFS, which uses an opencoded variant of the i_dio_count
    scheme.

    Behaviour doesn't change for filesystems never calling inode_dio_wait.
    For ext4 behaviour changes when using the dioread_nonlock option, which
    previously was missing any protection between truncate and direct I/O reads.
    For ocfs2 that handcrafted i_dio_count manipulations are replaced with
    the common code now enable.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Let filesystems handle waiting for direct I/O requests themselves instead
    of doing it beforehand. This means filesystem-specific locks to prevent
    new dio referenes from appearing can be held. This is important to allow
    generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • i_alloc_sem is a rather special rw_semaphore. It's the last one that may
    be released by a non-owner, and it's write side is always mirrored by
    real exclusion. It's intended use it to wait for all pending direct I/O
    requests to finish before starting a truncate.

    Replace it with a hand-grown construct:

    - exclusion for truncates is already guaranteed by i_mutex, so it can
    simply fall way
    - the reader side is replaced by an i_dio_count member in struct inode
    that counts the number of pending direct I/O requests. Truncate can't
    proceed as long as it's non-zero
    - when i_dio_count reaches non-zero we wake up a pending truncate using
    wake_up_bit on a new bit in i_flags
    - new references to i_dio_count can't appear while we are waiting for
    it to read zero because the direct I/O count always needs i_mutex
    (or an equivalent like XFS's i_iolock) for starting a new operation.

    This scheme is much simpler, and saves the space of a spinlock_t and a
    struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
    system).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

20 Jul, 2011

3 commits


26 May, 2011

1 commit


14 May, 2011

1 commit

  • In the case of removing a partial extent record which covers a hole, current
    punching-hole logic will try to remove more than the length of whole extent
    record, which leads to the failure of following assert(fs/ocfs2/alloc.c):

    5507 BUG_ON(cpos < le32_to_cpu(rec->e_cpos) || trunc_range > rec_range);

    This patch tries to skip existing hole at the last attempt of removing a partial
    extent record, what's more, it also adds some necessary comments for better
    understanding of punching-hole codes.

    Signed-off-by: Tristan Ye
    Signed-off-by: Joel Becker

    Tristan Ye
     

07 Mar, 2011

1 commit

  • mlog_exit is used to record the exit status of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So actually no one can open it for a production system or even
    for a test.

    This patch just try to remove it or change it. So:
    1. if all the error paths already use mlog_errno, it is just removed.
    Otherwise, it will be replaced by mlog_errno.
    2. if it is used to print some return value, it is replaced with
    mlog(0,...).
    mlog_exit_ptr is changed to mlog(0.
    All those mlog(0,...) will be replaced with trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

22 Feb, 2011

1 commit


21 Feb, 2011

1 commit

  • ENTRY is used to record the entry of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So actually no one can open it for a production system or even
    for a test.

    So for mlog_entry_void, we just remove it.
    for mlog_entry(...), we replace it with mlog(0,...), and they
    will be replace by trace event later.

    Signed-off-by: Tao Ma

    Tao Ma
     

17 Jan, 2011

2 commits

  • Currently all filesystems except XFS implement fallocate asynchronously,
    while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
    I/O we really want our allocation on disk, especially for the !KEEP_SIZE
    case where we actually grow the file with user-visible zeroes. On the
    other hand always commiting the transaction is a bad idea for fast-path
    uses of fallocate like for example in recent Samba versions. Given
    that block allocation is a data plane operation anyway change it from
    an inode operation to a file operation so that we have the file structure
    available that lets us check for O_SYNC.

    This also includes moving the code around for a few of the filesystems,
    and remove the already unnedded S_ISDIR checks given that we only wire
    up fallocate for regular files.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Instead of various home grown checks that might need updates for new
    flags just check for any bit outside the mask of the features supported
    by the filesystem. This makes the check future proof for any newly
    added flag.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

13 Jan, 2011

1 commit


07 Jan, 2011

1 commit


10 Dec, 2010

1 commit

  • Due to newly-introduced 'coherency=full' O_DIRECT writes also takes the EX
    rw_lock like buffered writes did(rw_level == 1), it turns out messing the
    usage of 'level' in ocfs2_dio_end_io() up, which caused i_alloc_sem being
    failed to get up_read'd correctly.

    This patch tries to teach ocfs2_dio_end_io to understand well on all locking
    stuffs by explicitly introducing a new bit for i_alloc_sem in iocb's private
    data, just like what we did for rw_lock.

    Signed-off-by: Tristan Ye
    Signed-off-by: Joel Becker

    Tristan Ye
     

26 Oct, 2010

1 commit

  • __block_write_begin and block_prepare_write are identical except for slightly
    different calling conventions. Convert all callers to the __block_write_begin
    calling conventions and drop block_prepare_write.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

23 Oct, 2010

1 commit

  • Commit dd3932eddf42 ("block: remove BLKDEV_IFL_WAIT") had removed the
    flag argument to blkdev_issue_flush(), but the ocfs2 merge brought in a
    new one. It didn't cause a merge conflict, so the merges silently
    worked out fine, but the result didn't actually compile.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Oct, 2010

1 commit

  • Currently, the default behavior of O_DIRECT writes was allowing
    concurrent writing among nodes to the same file, with no cluster
    coherency guaranteed (no EX lock held). This can leave stale data in
    the cache for buffered reads on other nodes.

    The new mount option introduce a chance to choose two different
    behaviors for O_DIRECT writes:

    * coherency=full, as the default value, will disallow
    concurrent O_DIRECT writes by taking
    EX locks.

    * coherency=buffered, allow concurrent O_DIRECT writes
    without EX lock among nodes, which
    gains high performance at risk of
    getting stale data on other nodes.

    Signed-off-by: Tristan Ye
    Signed-off-by: Joel Becker

    Tristan Ye
     

16 Sep, 2010

1 commit


10 Sep, 2010

3 commits


08 Sep, 2010

3 commits

  • The patch is to fix the regression bug brought from commit 6b933c8...( 'ocfs2:
    Avoid direct write if we fall back to buffered I/O'):

    http://oss.oracle.com/bugzilla/show_bug.cgi?id=1285

    The commit 6b933c8e6f1a2f3118082c455eef25f9b1ac7b45 changed __generic_file_aio_write
    to generic_file_buffered_write, which didn't call filemap_{write,wait}_range to flush
    the pagecaches when we were falling O_DIRECT writes back to buffered ones. it did hurt
    the O_DIRECT semantics somehow in extented odirect writes.

    This patch tries to guarantee O_DIRECT writes of 'fall back to buffered' to be correctly
    flushed.

    Signed-off-by: Tristan Ye
    Signed-off-by: Tao Ma

    Tristan Ye
     
  • We cannot call grab_cache_page() when holding filesystem locks or with
    a transaction started as grab_cache_page() calls page allocation with
    GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem
    causing deadlocks or various assertion failures. We have to use
    find_or_create_page() instead and pass it GFP_NOFS as we do with other
    allocations.

    Acked-by: Mark Fasheh
    Signed-off-by: Jan Kara
    Signed-off-by: Tao Ma

    Jan Kara
     
  • When 'barrier' mount option is specified, we have to issue a cache flush
    during fdatasync(2). We have to do this even if inode doesn't have
    I_DIRTY_DATASYNC set because we still have to get written *data* to disk so
    that they are not lost in case of crash.

    Acked-by: Tao Ma
    Signed-off-by: Jan Kara
    Singed-off-by: Tao Ma

    Jan Kara
     

12 Aug, 2010

2 commits


10 Aug, 2010

2 commits

  • Make sure we check the truncate constraints early on in ->setattr by adding
    those checks to inode_change_ok. Also clean up and document inode_change_ok
    to make this obvious.

    As a fallout we don't have to call inode_newsize_ok from simple_setsize and
    simplify it down to a truncate_setsize which doesn't return an error. This
    simplifies a lot of setattr implementations and means we use truncate_setsize
    almost everywhere. Get rid of fat_setsize now that it's trivial and mark
    ext2_setsize static to make the calling convention obvious.

    Keep the inode_newsize_ok in vmtruncate for now as all callers need an
    audit for its removal anyway.

    Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
    needs a deeper audit, but that is left for later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Replace inode_setattr with opencoded variants of it in all callers. This
    moves the remaining call to vmtruncate into the filesystem methods where it
    can be replaced with the proper truncate sequence.

    In a few cases it was obvious that we would never end up calling vmtruncate
    so it was left out in the opencoded variant:

    spufs: explicitly checks for ATTR_SIZE earlier
    btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
    ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

    In addition to that ncpfs called inode_setattr with handcrafted iattrs,
    which allowed to trim down the opencoded variant.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Jul, 2010

1 commit