14 Oct, 2009

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cciss: Add cciss_allow_hpsa module parameter
    cciss: Fix multiple calls to pci_release_regions
    blk-settings: fix function parameter kernel-doc notation
    writeback: kill space in debugfs item name
    writeback: account IO throttling wait as iowait
    elv_iosched_store(): fix strstrip() misuse
    cfq-iosched: avoid probable slice overrun when idling
    cfq-iosched: apply bool value where we return 0/1
    cfq-iosched: fix think time allowed for seekers
    cfq-iosched: fix the slice residual sign
    cfq-iosched: abstract out the 'may this cfqq dispatch' logic
    block: use proper BLK_RW_ASYNC in blk_queue_start_tag()
    block: Seperate read and write statistics of in_flight requests v2
    block: get rid of kblock_schedule_delayed_work()
    cfq-iosched: fix possible problem with jiffies wraparound
    cfq-iosched: fix issue with rq-rq merging and fifo list ordering

    Linus Torvalds
     

13 Oct, 2009

2 commits

  • This avoids updating the superblock write time when we are mounting
    the root file system read/only but we need to replay the journal. At
    that point, for people who are east of GMT and who make their clock
    tick in localtime for Windows bug-for-bug compatibility, the system
    time has not yet been corrected, and the resulting write time will
    cause e2fsck to complain and force a full file system check.

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Jan Kara

    Theodore Ts'o
     
  • struct sockaddr_storage * can safely be used as struct sockaddr *.
    Suppress an "incompatible pointer type" warning.
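
    A minimal userspace sketch of the pattern, assuming the POSIX socket
    headers. The helper names generic_family() and storage_family() are
    hypothetical and are not taken from the patch, which touches kernel code
    not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: read the address family through the generic view. */
static sa_family_t generic_family(const struct sockaddr *sa)
{
    assert(sa != NULL);
    return sa->sa_family;
}

/* Hypothetical helper: pass a sockaddr_storage where a sockaddr is expected. */
static sa_family_t storage_family(const struct sockaddr_storage *ss)
{
    assert(ss != NULL);
    /*
     * sockaddr_storage is large enough and suitably aligned for any
     * socket address type, so viewing it as a struct sockaddr is safe.
     * The explicit cast documents the intent and avoids the
     * "incompatible pointer type" warning.
     */
    return generic_family((const struct sockaddr *)ss);
}
```

    Without the cast, passing ss directly to generic_family() is what
    triggers the warning.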

    Signed-off-by: Stefan Richter
    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Stefan Richter
     

12 Oct, 2009

3 commits

  • An interestingly corrupted romfs file system exposed a problem with the
    romfs_dev_strnlen function: it's passing the wrong value to its helpers.
    Rather than limit the string to the length passed in by the callers, it
    uses the size of the device as the limit.
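
    The intended behaviour can be sketched in userspace as follows. This is
    not the romfs code: struct image and image_strnlen() are hypothetical
    stand-ins that model the device as a flat byte array, and the point is
    only that the scan is bounded by the caller's limit as well as by the
    end of the device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the backing device: a flat byte image. */
struct image {
    const unsigned char *data;
    size_t size;
};

/*
 * Length of the string at offset pos, examining no more than maxlen
 * bytes and never reading past the end of the image. Returns the
 * effective limit if no terminator is found within it.
 */
static size_t image_strnlen(const struct image *img, size_t pos, size_t maxlen)
{
    assert(img != NULL && img->data != NULL);
    if (pos >= img->size)
        return 0;

    size_t limit = img->size - pos;   /* bytes left on the "device" */
    if (limit > maxlen)
        limit = maxlen;               /* honour the caller's limit */

    const unsigned char *start = img->data + pos;
    const unsigned char *nul = memchr(start, '\0', limit);
    return nul ? (size_t)(nul - start) : limit;
}
```

    Using only img->size - pos as the limit would reproduce the problem
    described above: a corrupted image with no terminator nearby would be
    scanned far beyond what the caller asked for.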

    Signed-off-by: Bernd Schmidt
    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Bernd Schmidt
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: fix file clone ioctl for bookend extents
    Btrfs: fix uninit compiler warning in cow_file_range_nocow
    Btrfs: constify dentry_operations
    Btrfs: optimize back reference update during btrfs_drop_snapshot
    Btrfs: remove negative dentry when deleting subvolumne
    Btrfs: optimize fsync for the single writer case
    Btrfs: async delalloc flushing under space pressure
    Btrfs: release delalloc reservations on extent item insertion
    Btrfs: delay clearing EXTENT_DELALLOC for compressed extents
    Btrfs: cleanup extent_clear_unlock_delalloc flags
    Btrfs: fix possible softlockup in the allocator
    Btrfs: fix deadlock on async thread startup

    Linus Torvalds
     
  • Now that m68k's task_thread_info() no longer refers to current,
    it's possible to remove sched.h from interrupt.h and not break m68k!
    Many thanks to Heiko Carstens for allowing this.

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     

10 Oct, 2009

2 commits


09 Oct, 2009

22 commits

  • The file clone ioctl was incorrectly taking the offset into the
    extent on disk into account when calculating the length of the
    cloned extent.

    The length never changes based on the offset into the physical extent.

    Test case:

    fallocate -l 1g image
    mke2fs image
    bcp image image2
    e2fsck -f image2

    (errors on image2)

    The math bug ends up wrapping the length of the extent, and things
    go wrong from there.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The extent_type variable was exposed uninit via a goto. It should be
    impossible to trigger because it is protected by a check on another
    variable, but this makes sure.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Signed-off-by: Chris Mason

    Alexey Dobriyan
     
  • This patch avoids reading level 0 tree blocks that already use full backrefs.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • btrfs_dentry_delete is used to remove dentries from the dcache
    when deleting a subvolume, but it ignores negative dentries.
    This is incorrect: if we don't remove the negative dentry, its
    parent dentry can't be removed.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFSv4: Kill nfs4_renewd_prepare_shutdown()
    NFSv4: Fix the referral mount code
    nfs: Avoid overrun when copying client IP address string
    NFS: Fix port initialisation in nfs_remount()
    NFS: Fix port and mountport display in /proc/self/mountinfo
    NFS: Fix a default mount regression...

    Linus Torvalds
     
  • This patch optimizes the tree logging code so it doesn't always wait 1 jiffy
    for new writers to join the logging transaction if there is only ever 1 writer.
    This helps a little with latency in workloads such as RPM, which calls
    fdatasync on every file it writes, so waiting 1 jiffy for every
    fdatasync really starts to add up.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This patch moves the delalloc flushing that occurs when we are under space
    pressure off to an async thread pool. This helps since we only free up
    metadata space when we actually insert the extent item, which means it takes
    quite a while for space to be freed up if we wait on all ordered extents.
    However, if space is freed up due to inline extents being inserted, we can
    wake people who are waiting up early, and they can finish their work.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This patch fixes an issue with the delalloc metadata space reservation
    code. We used to free the reservation as soon as we allocated the
    delalloc region. But if we are not inserting an inline extent, we don't
    actually insert the extent item until after the ordered extent is
    written out. This patch does 3 things:

    1) It moves the reservation clearing stuff into the ordered code, so when
    we remove the ordered extent we remove the reservation.
    2) It adds an EXTENT_DO_ACCOUNTING flag that gets passed when we clear
    delalloc bits in the cases where we want to clear the metadata reservation
    when we clear the delalloc extent, in the case that we do an inline extent
    or we invalidate the page.
    3) It adds another waitqueue to the space info so that when we start a fs
    wide delalloc flush, anybody else who also hits that area will simply wait
    for the flush to finish and then try to make their allocation.

    This has been tested thoroughly to make sure we did not regress on
    performance.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • When compression is on, the cow_file_range code is farmed off to
    worker threads. This allows us to do significant CPU work in parallel
    on SMP machines.

    But it is a delicate balance around when we clear flags and how. In
    the past we cleared the delalloc flag immediately, which was safe
    because the pages stayed locked.

    But this is causing problems with the newest ENOSPC code, and with the
    recent extent state cleanups we can now clear the delalloc bit at the
    same time the uncompressed code does.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • extent_clear_unlock_delalloc has a growing set of ugly parameters
    that is very difficult to read and maintain.

    This switches to a flag field and well named flag defines.
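
    As a rough illustration of the design choice, a flag field lets each
    call site name the operations it wants instead of passing a row of
    0/1 arguments. All names below (OP_*, struct range_state,
    clear_unlock_range) are hypothetical and are not the defines added by
    this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation flags, one bit each. */
enum {
    OP_UNLOCK_PAGES   = 1u << 0,
    OP_CLEAR_DELALLOC = 1u << 1,
    OP_CLEAR_DIRTY    = 1u << 2,
    OP_SET_WRITEBACK  = 1u << 3,
    OP_END_WRITEBACK  = 1u << 4,
};

#define OP_ALL (OP_UNLOCK_PAGES | OP_CLEAR_DELALLOC | OP_CLEAR_DIRTY | \
                OP_SET_WRITEBACK | OP_END_WRITEBACK)

/* Toy model of the state of one range of pages. */
struct range_state {
    bool locked;
    bool delalloc;
    bool dirty;
    bool writeback;
};

/* Apply the requested operations; unknown bits are a caller bug. */
static void clear_unlock_range(struct range_state *st, unsigned long op)
{
    assert(st != NULL);
    assert((op & ~(unsigned long)OP_ALL) == 0);

    if (op & OP_UNLOCK_PAGES)
        st->locked = false;
    if (op & OP_CLEAR_DELALLOC)
        st->delalloc = false;
    if (op & OP_CLEAR_DIRTY)
        st->dirty = false;
    if (op & OP_SET_WRITEBACK)
        st->writeback = true;
    if (op & OP_END_WRITEBACK)
        st->writeback = false;
}
```

    A call such as clear_unlock_range(&st, OP_UNLOCK_PAGES |
    OP_CLEAR_DELALLOC) reads on its own, whereas a call with five
    positional 0/1 arguments requires looking up the prototype.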

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Alex Elder
     
  • Now that the VFS actually waits for the data I/O to complete before
    calling into ->fsync we can stop doing it ourselves.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • This is for bug #850,
    http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
    XFS file system segfaults, repeatedly and 100% reproducible in 2.6.30, 2.6.31

    The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because
    xfs_bmapi() ASSERTs that it has been asked for at least one map,
    and it was getting 0.

    The root cause is that our guesstimated "bufsize" from xfs_file_readdir
    was fairly small, and the

    bufsize -= length;

    in the loop was going negative - except bufsize is a size_t, so it
    was wrapping to a very large number.

    Then when we did

    ra_want = howmany(bufsize + mp->m_dirblksize,
                      mp->m_sb.sb_blocksize) - 1;


    with that very large number, the (int) ra_want was coming out
    negative, and a subsequent compare:

    if (1 + ra_want > map_blocks ...

    was coming out -true- (negative int compare w/ uint) and we went
    back to xfs_bmapi() for more, even though we did not need more,
    and asked for 0 maps, and hit the ASSERT.

    We have kind of a type mess here, but just keeping bufsize from
    going negative is probably sufficient to avoid the problem.
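
    Both effects described above are standard C behaviour and can be
    reproduced in isolation. The following sketch is not the XFS code; the
    function names are hypothetical, and the clamp is only one way of
    "keeping bufsize from going negative".

```c
#include <stddef.h>

/* Unguarded: wraps to a very large value when length > bufsize. */
static size_t consume_unguarded(size_t bufsize, size_t length)
{
    return bufsize - length;
}

/* Guarded: stops at zero instead of wrapping. */
static size_t consume_clamped(size_t bufsize, size_t length)
{
    return length < bufsize ? bufsize - length : 0;
}

/*
 * Comparing an int with an unsigned int converts the int to unsigned
 * first. The cast below spells out what the usual arithmetic
 * conversions do implicitly, so a negative value compares as huge.
 */
static int int_greater_than_uint(int a, unsigned int b)
{
    return (unsigned int)a > b;
}
```

    For example, consume_unguarded(4, 16) wraps to SIZE_MAX - 11, while
    consume_clamped(4, 16) returns 0, and int_greater_than_uint(-5, 3)
    reports true.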

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Eric Sandeen
     
  • We want to always cover the log after writing out the superblock, and
    in case of a synchronous writeout make sure we actually wait for the
    log to be covered. That way a filesystem that has been sync()ed can
    be considered clean by log recovery.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Eric Sandeen
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • To make sure they get properly waited on in sync when I/O is in flight and
    we later need to update the inode size. Requires a new helper to check if an
    ioend structure is beyond the current EOF.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Sort out ->sync_fs to not perform a superblock writeback for the wait = 0 case
    as that is just an optional first pass and the superblock will be written back
    properly in the next call with wait = 1. Instead perform an opportunistic
    quota writeback to have less work later. Also remove the freeze special case
    as we do a proper wait = 1 call in the freeze code anyway.

    Also rename the function to xfs_fs_sync_fs to match the normal naming
    convention, update comments and avoid calling into the laptop_mode logic on
    an error.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • We need to do a synchronous xfs_sync_fsdata to make sure the superblock
    actually is on disk when we return.

    Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular
    flag is never checked.

    Move xfs_filestream_flush call later to only release inodes after they
    have been written out.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • This is picking up on Felix's repost of Dave's patch to implement a
    .dirty_inode method. We really need this notification because
    the VFS keeps writing directly into the inode structure instead
    of going through methods to update this state. In addition to
    the long-known atime issue we now also have a caller in VM code
    that updates c/mtime that way for shared writeable mmaps. And
    I found another one that no one has noticed in practice in the FIFO
    code.

    So implement ->dirty_inode to set i_update_core whenever the
    inode gets externally dirtied, and switch the c/mtime handling to
    the same scheme we already use for atime (always picking up
    the value from the Linux inode).

    Note that this patch also removes the xfs_synchronize_atime call
    in xfs_reclaim; it was superfluous as we already synchronize the time
    when writing the inode via the log (xfs_inode_item_format) or the
    normal buffers (xfs_iflush_int).

    In addition also remove the I_CLEAR check before copying the Linux
    timestamps - now that we always have the Linux inode available
    we can always use the timestamps in it.

    Also switch to just using file_update_time for regular reads/writes -
    that will get us all optimizations done to it for free and make
    sure we notice early when it breaks.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Felix Blyakher
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • The unencrypted files are being measured. Update the counters to get
    rid of the ecryptfs imbalance message. (http://bugzilla.redhat.com/519737)

    Reported-by: Sachin Garg
    Cc: Eric Paris
    Cc: Dustin Kirkland
    Cc: James Morris
    Cc: David Safford
    Cc: stable@kernel.org
    Signed-off-by: Mimi Zohar
    Signed-off-by: Tyler Hicks

    Mimi Zohar
     
  • eCryptfs no longer uses a netlink interface to communicate with
    ecryptfsd, so NET is not a valid dependency anymore.

    MD5 is required and must be built for eCryptfs to be of any use.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • ecryptfs uses crypto APIs so it should depend on CRYPTO.
    Otherwise many build errors occur. [63 lines not pasted]

    Signed-off-by: Randy Dunlap
    Cc: Andrew Morton
    Cc: ecryptfs-devel@lists.launchpad.net
    Signed-off-by: Tyler Hicks

    Randy Dunlap
     

08 Oct, 2009

3 commits

  • The NFSv4 renew daemon is shared between all active super blocks that refer
    to a particular NFS server, so it is wrong to be shutting it down in
    nfs4_kill_super every time a super block is destroyed.

    This patch therefore kills nfs4_renewd_prepare_shutdown altogether, and
    leaves it up to nfs4_shutdown_client() to also shut down the renew daemon
    by means of the existing call to nfs4_kill_renewd().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • This flag indicates a hardware detected memory corruption on the page.
    Any future access of the page data may bring down the machine.

    Signed-off-by: Wu Fengguang
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • fix the following 'make includecheck' warning:

    fs/proc/kcore.c: linux/mm.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     

07 Oct, 2009

6 commits

  • Fix a typo which causes try_location() to use the wrong length argument
    when calling nfs_parse_server_name(). This, in turn, causes the
    initialisation of the mount's sockaddr structure to fail.

    Also ensure that if nfs4_pathname_string() returns an error, then we pass
    that error back up the stack instead of ENOENT.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • As seen in , nfs4_init_client() can
    overrun the source string when copying the client IP address from
    nfs_parsed_mount_data::client_address to nfs_client::cl_ipaddr. Since
    these are both treated as null-terminated strings elsewhere, the copy
    should be done with strlcpy() not memcpy().
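
    The difference can be sketched in userspace. Since strlcpy() is not
    available in every C library, the example uses a hypothetical helper,
    bounded_copy(), with strlcpy()-like semantics; it is not the kernel
    implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy()-like semantics: copies at most
 * size - 1 bytes, always NUL-terminates when size > 0, never reads past
 * the source terminator, and returns strlen(src) so that truncation can
 * be detected.
 */
static size_t bounded_copy(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

    By contrast, memcpy(dst, src, sizeof(dst)) always reads sizeof(dst)
    bytes from the source, which overruns it whenever the source string
    is shorter than the destination buffer.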

    Signed-off-by: Ben Hutchings
    Signed-off-by: Trond Myklebust

    Ben Hutchings
     
  • The recent changeset 53a0b9c4c99ab0085a06421f71592722e5b3fd5f (NFS: Replace
    nfs_parse_ip_address() with rpc_pton()) broke nfs_remount, since the call
    to rpc_pton() will zero out the port number in data->nfs_server.address.

    This is actually due to a bug in nfs_remount: it should be looking at the
    port number in nfs_server.port instead...

    This fixes bug
    http://bugzilla.kernel.org/show_bug.cgi?id=14276

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Currently, the port and mount port will both display as 65535 if you do not
    specify a port number. That would be wrong...
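
    The value 65535 is what a 16-bit port field shows when it holds -1.
    The sketch below assumes that "no port specified" is represented by a
    -1 sentinel; the macro UNSPEC_PORT and both functions are hypothetical,
    and omitting the field is only one possible way to handle the unset
    case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define UNSPEC_PORT (-1)   /* assumed sentinel for "no port given" */

/* Naive display: the sentinel is printed as a 16-bit port, i.e. 65535. */
static int show_port_naive(char *buf, size_t len, int port)
{
    assert(buf != NULL);
    return snprintf(buf, len, "port=%u", (unsigned int)(unsigned short)port);
}

/* Checked display: emit nothing when the port was never specified. */
static int show_port_checked(char *buf, size_t len, int port)
{
    assert(buf != NULL);
    if (port == UNSPEC_PORT) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "port=%u", (unsigned int)port);
}
```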

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • With the recent spate of changes, the nfs protocol version will now default
    to 2 instead of 3, while the mount protocol version defaults to 3.

    The following patch should ensure the defaults are consistent with the
    previous defaults of vers=3,proto=tcp,mountvers=3,mountproto=tcp.

    This fixes the bug
    http://bugzilla.kernel.org/show_bug.cgi?id=14259

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Commit a9327cac440be4d8333bba975cbbf76045096275 added separate read
    and write statistics of in_flight requests, and exported the number
    of read and write requests in progress separately through sysfs.

    But Corrado Zoccolo reported getting strange output from
    "iostat -kx 2". Global values for service time and utilization were
    garbage. For interval values, utilization was always 100%, and
    service time was higher than normal.

    So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b.

    The problem was in part_round_stats_single(), where I missed the following:

        if (now == part->stamp)
            return;

    -   if (part->in_flight) {
    +   if (part_in_flight(part)) {
            __part_stat_add(cpu, part, time_in_queue,
                    part_in_flight(part) * (now - part->stamp));
            __part_stat_add(cpu, part, io_ticks, (now - part->stamp));

    With this chunk included, the reported regression gets fixed.
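
    A simplified userspace model of why the check matters. Once in_flight
    is split into a two-element array (reads and writes), testing the array
    itself is always true, so the helper that sums both counters has to be
    used. The struct below is a hypothetical stand-in without per-CPU
    counters; only the control flow mirrors the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified partition statistics. */
struct part {
    unsigned int in_flight[2];   /* [0] = reads, [1] = writes */
    unsigned long stamp;         /* time of the last update */
    unsigned long time_in_queue;
    unsigned long io_ticks;
};

/* Total number of requests in flight, reads plus writes. */
static unsigned int part_in_flight(const struct part *p)
{
    return p->in_flight[0] + p->in_flight[1];
}

/* Account the time elapsed since the last update. */
static void round_stats(struct part *p, unsigned long now)
{
    assert(p != NULL);
    if (now == p->stamp)
        return;

    /* Only charge queue time and busy time while requests are in flight. */
    if (part_in_flight(p)) {
        p->time_in_queue += part_in_flight(p) * (now - p->stamp);
        p->io_ticks += now - p->stamp;
    }
    p->stamp = now;
}
```

    If the condition were always true, io_ticks would grow on every
    update even when the device is idle, which is consistent with the
    reported 100% utilization.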

    Signed-off-by: Nikanth Karthikesan

    --
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
     

06 Oct, 2009

1 commit

  • Like the cluster allocating stuff, we can lock up the box with the normal
    allocation path. This happens when we

    1) Start to cache a block group that is severely fragmented, but has a decent
    amount of free space.
    2) Start to commit a transaction
    3) Have the commit try and empty out some of the delalloc inodes with extents
    that are relatively large.

    The inodes will not be able to make the allocations because they will ask for
    allocations larger than a contiguous area in the free space cache. So we will
    wait for more progress to be made on the block group, but since we're in a
    commit the caching kthread won't make any more progress and it already has
    enough free space that wait_block_group_cache_progress will just return. So,
    if we wait and fail to make the allocation the next time around, just loop and
    go to the next block group. This keeps us from getting stuck in a softlockup.
    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik