23 Jan, 2007

4 commits

  • For large DIO requests that need multiple bios, one full page worth of
    data was lost at the boundary of a bio's maximum sector or segment
    limits. After a bio is full and has been submitted, the outer
    while (nbytes) { ... } loop allocates a new bio and simply marches on to
    index into the next page, forgetting the page that bio_add_page()
    rejected when the previous bio was full. Fix it by putting the rejected
    page back into pvec so we pick it up again for the next bio.

    Signed-off-by: Ken Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
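
A simplified user-space model of the loop described above may help; it is not the fs/direct-io.c code. BIO_CAP, model_add_page() and model_dio() are hypothetical stand-ins for the bio size limit, bio_add_page() and the outer while (nbytes) loop. With retry_rejected off, every page that hits a full bio is skipped; with it on, the rejected page goes into the next bio.

```c
#include <stdbool.h>

/* Hypothetical model: a "bio" holds at most BIO_CAP pages. */
#define BIO_CAP 3

struct bio_model {
    int nr_pages;
};

static int submitted;              /* pages that actually reached "disk" */

/* Returns false when the bio is full, like bio_add_page() rejecting a page. */
static bool model_add_page(struct bio_model *bio)
{
    if (bio->nr_pages == BIO_CAP)
        return false;
    bio->nr_pages++;
    return true;
}

static void model_submit(struct bio_model *bio)
{
    submitted += bio->nr_pages;
    bio->nr_pages = 0;             /* start a fresh bio */
}

/* Walk npages pages; with retry_rejected set, a page rejected by a full
 * bio is retried in the next bio instead of being skipped. */
int model_dio(int npages, bool retry_rejected)
{
    struct bio_model bio = { 0 };
    int page = 0;

    submitted = 0;
    while (page < npages) {
        if (model_add_page(&bio)) {
            page++;                /* page accepted, move on */
            continue;
        }
        model_submit(&bio);        /* bio full: send it */
        if (!retry_rejected)
            page++;                /* old behaviour: rejected page is lost */
        /* fixed behaviour: keep `page` so it goes into the new bio */
    }
    model_submit(&bio);
    return submitted;
}
```

With 10 pages and room for 3 per bio, the old behaviour loses two pages (one at each bio boundary that falls mid-request), while the fixed loop submits all 10.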
     
  • size_t is unsigned, so a negative error code stored in a size_t never
    tests as less than zero. IO errors aren't getting through.

    Cc: "Chen, Kenneth W"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
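
A minimal user-space illustration of the pitfall, not the actual kernel change: fake_io() is a hypothetical stand-in for a routine that returns -EIO on failure. Stored in a size_t, -EIO wraps to a huge positive value and the "< 0" check can never fire; stored in a signed ssize_t, it stays negative.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical helper: returns -EIO on failure, a byte count otherwise. */
static long fake_io(bool fail)
{
    return fail ? -EIO : 4096;
}

/* Buggy pattern: an unsigned size_t can never be negative. */
bool error_seen_with_size_t(bool fail)
{
    size_t ret = fake_io(fail);    /* -EIO wraps to a huge positive value */
    return ret < 0;                /* always false: the error is lost */
}

/* Fixed pattern: a signed type keeps the error visible. */
bool error_seen_with_ssize_t(bool fail)
{
    ssize_t ret = fake_io(fail);   /* -EIO stays negative */
    return ret < 0;
}
```

Compilers will usually warn (for example with -Wextra) that the size_t comparison is always false, which is a cheap way to catch this class of bug.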
     
  • * git://git.infradead.org/mtd-2.6: (84 commits)
    [JFFS2] debug.h: include for current->pid
    [MTD] OneNAND: Handle DDP chip boundary during read-while-load
    [MTD] OneNAND: return ecc error code only when 2-bit ecc occurs
    [MTD] OneNAND: Implement read-while-load
    [MTD] OneNAND: fix onenand_wait bug in read ecc error
    [MTD] OneNAND: release CPU in cycles
    [MTD] OneNAND: add subpage write support
    [MTD] OneNAND: fix onenand_wait bug
    [JFFS2] use the ref_offset macro
    [JFFS2] Reschedule in loops
    [JFFS2] Fix error-path leak in summary scan
    [JFFS2] add cond_resched() when garbage collecting deletion dirent
    [MTD] Nuke IVR leftovers
    [MTD] OneNAND: fix oob handling in recent oob patch
    [MTD] Fix ssfdc blksize typo
    [JFFS2] replace kmalloc+memset with kzalloc
    [MTD] Fix SSFDC build for variable blocksize.
    [MTD] ESB2ROM uses PCI
    [MTD] of_device-based physmap driver
    [MTD] Support combined RedBoot FIS directory and configuration area
    ...

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
    ocfs2: Add backup superblock info to ocfs2_fs.h
    ocfs2: cleanup ocfs2_iget() errors
    ocfs2: Directory c/mtime update fixes
    ocfs2: Don't print errors when following symlinks

    Linus Torvalds
     

22 Jan, 2007

4 commits


18 Jan, 2007

3 commits


13 Jan, 2007

1 commit


12 Jan, 2007

2 commits

  • Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
    xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

    (XFS unlocks the semaphore from a different task, by design. The mutex
    code warns about this.)

    Signed-off-by: Dave Chinner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     
  • NFS: Fix race in nfs_release_page()

    invalidate_inode_pages2() may find the dirty bit has been set on a page
    owing to the fact that the page may still be mapped after it was locked.
    Only after the call to unmap_mapping_range() are we sure that the page
    can no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however, leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    The fix is to add a new address_space_operation, launder_page(), which
    will attempt to write out a dirty page without releasing the page lock.

    Signed-off-by: Trond Myklebust

    Also, the bare SetPageDirty() can skew all sorts of accounting, leading
    to other nasties.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Peter Zijlstra
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
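
A heavily simplified model of the invalidation order described above, assuming the sequence lock, unmap, launder if still dirty, then remove. The structs and model_* functions are hypothetical; real pages, locking and writeback are far more involved. The point is that launder_page() runs while the caller still holds the page lock, so reclaim never has to re-lock a page it no longer has references for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page state. */
struct page_model {
    bool locked;
    bool dirty;
    bool mapped;
    bool in_cache;
};

/* Hypothetical subset of address_space_operations. */
struct aops_model {
    /* Writes a dirty page back while the caller keeps it locked. */
    int (*launder_page)(struct page_model *page);
};

static int model_launder(struct page_model *page)
{
    /* Page lock stays held throughout; only the dirty bit changes. */
    page->dirty = false;
    return 0;
}

/* Invalidate one page: lock, unmap (no new dirtying possible),
 * launder if still dirty, then drop it from the cache if clean. */
bool model_invalidate(struct page_model *page, const struct aops_model *aops)
{
    page->locked = true;
    page->mapped = false;                /* like unmap_mapping_range() */
    if (page->dirty && aops->launder_page)
        aops->launder_page(page);        /* write out without unlocking */
    if (!page->dirty)
        page->in_cache = false;          /* like remove_mapping() */
    page->locked = false;
    return !page->in_cache;
}
```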
     

11 Jan, 2007

1 commit

  • Revert previous attempts at messing with the linux banner string and
    simply use a separate format string for proc.

    Signed-off-by: Roman Zippel
    Acked-by: Olaf Hering
    Acked-by: Jean Delvare
    Cc: Andrey Borzenkov
    Cc: Andrew Morton
    Cc: Andy Whitcroft
    Cc: Herbert Poetzl
    Signed-off-by: Linus Torvalds

    Roman Zippel
     

10 Jan, 2007

2 commits


07 Jan, 2007

1 commit

  • This reverts commit 59287c0913cc9a6c75712a775f6c1c1ef418ef3b.

    Hugh Dickins reports that it causes random failures on x86 with SuSE
    10.2, and points out:

    "Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE,
    sure to place the ET_DYN from time to time just where the comment
    says it's trying to avoid? I assume that somehow results in the error
    reported."

    (where the comment in question is the existing comment in the source
    code about mmap/brk clashes).

    Suggested-by: Hugh Dickins
    Acked-by: Marcus Meissner
    Cc: Andrew Morton
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Dave Jones
    Cc: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Jan, 2007

3 commits

  • This looks like the problem Al Viro pointed out some time ago:

    ufs's get_block callback allocates 16k of disk at a time, and links that
    entire 16k into the file's metadata. But because get_block is called for only
    a single buffer_head (a 2k buffer_head in this case?) we are only able to tell
    the VFS that this 2k is buffer_new().

    So when ufs_getfrag_block() is later called to map some more data in the file,
    and when that data resides within the remaining 14k of this fragment,
    ufs_getfrag_block() will incorrectly return a !buffer_new() buffer_head.

    I don't see a _right_ way to zero out the whole block. If we use the
    inode page cache, some pages may lie beyond the inode size and will be
    lost; if we use the block device page cache, we may zero real data if
    the inode page cache is used later.

    The simplest way, as far as I can see, is to use the block device page
    cache, and not only mark the buffers dirty but also sync them during
    "nullification". Using my simple collection of tests, which I used to
    check that create, open, write, read and close work on ufs, I see that
    this patch makes the ufs code 18% slower than before.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
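
The arithmetic behind the bug, as a small hypothetical model (not the ufs code): a 16k fragment holds 16k / 2k = 8 buffers. The first get_block call allocates the whole fragment but can report only its own buffer as new; the other 7 buffers (the remaining 14k) are later found already mapped, come back as !buffer_new(), and so are never zeroed.

```c
#include <stdbool.h>

#define FRAG_SIZE     16384   /* ufs allocates 16k at a time (from the text) */
#define BUF_SIZE       2048   /* one buffer_head (from the text) */
#define BUFS_PER_FRAG (FRAG_SIZE / BUF_SIZE)

static bool frag_allocated;

/* Hypothetical get_block: allocating links all 16k into the file, but
 * only the buffer being asked about can be reported as new. */
static void model_get_block(int buf_index, bool *is_new)
{
    (void)buf_index;
    if (!frag_allocated) {
        frag_allocated = true;
        *is_new = true;
    } else {
        *is_new = false;      /* already mapped, so reported "not new" */
    }
}

/* Counts buffers in a freshly allocated fragment that were never reported
 * as new and therefore would not be zeroed by the caller. */
int model_unzeroed_buffers(void)
{
    int unzeroed = 0;

    frag_allocated = false;
    for (int i = 0; i < BUFS_PER_FRAG; i++) {
        bool is_new;
        model_get_block(i, &is_new);
        if (!is_new)
            unzeroed++;
    }
    return unzeroed;
}
```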
     
  • CVE-2006-5753 is for a case where an inode can be marked bad, switching
    the ops to bad_inode_ops, which are all connected as:

    static int return_EIO(void)
    {
            return -EIO;
    }

    #define EIO_ERROR ((void *) (return_EIO))

    static struct inode_operations bad_inode_ops =
    {
            .create = EIO_ERROR,
            ...etc...

    The problem here is that the void cast causes return types to not be
    promoted, and for ops such as listxattr which expect more than 32 bits of
    return value, the 32-bit -EIO is interpreted as a large positive 64-bit
    number, i.e. 0x00000000fffffffb instead of 0xfffffffffffffffb.

    This goes particularly badly when the return value is taken as a number of
    bytes to copy into, say, a user's buffer for example...

    I originally coded up the fix by creating a return_EIO_<type> function
    and EIO_ERROR_<type> macro for each return type, like this:

    static int return_EIO_int(void)
    {
            return -EIO;
    }
    #define EIO_ERROR_INT ((void *) (return_EIO_int))

    static struct inode_operations bad_inode_ops =
    {
            .create = EIO_ERROR_INT,
            ...etc...

    but Al felt that it was probably better to create an EIO-returner for each
    actual op signature. Since so few ops share a signature, I just went ahead
    and created an EIO function for each individual file and inode op that
    returns a value.

    Signed-off-by: Eric Sandeen
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
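
This sketch does not reproduce the undefined behaviour of calling through a mis-cast function pointer. It only shows the two 64-bit values involved (assuming Linux, where EIO is 5): the 32-bit -EIO with a zeroed upper half, as a caller can observe it, versus the properly sign-extended value. bad_listxattr_model() is a hypothetical example of the per-signature EIO-returner approach, with ssize_t standing in for listxattr's return type.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* What the caller should see: -EIO sign-extended to 64 bits. */
int64_t eio_sign_extended(void)
{
    int32_t ret32 = -EIO;
    return (int64_t)ret32;               /* 0xfffffffffffffffb, i.e. -5 */
}

/* What the caller can see when only the low 32 bits were set and the
 * upper half is zero: a large positive "byte count". */
uint64_t eio_zero_extended(void)
{
    int32_t ret32 = -EIO;
    return (uint64_t)(uint32_t)ret32;    /* 0x00000000fffffffb */
}

/* Per-signature EIO-returner (hypothetical name): because the declared
 * return type matches the op, the compiler returns a correct 64-bit -EIO. */
ssize_t bad_listxattr_model(char *buf, size_t size)
{
    (void)buf;
    (void)size;
    return -EIO;
}
```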
     
  • Fix filenames on adfs discs being terminated at the first character greater
    than 128 (adfs filenames are Latin 1). I saw this problem when using a
    loopback adfs image on a 2.6.17-rc5 x86_64 machine, and the patch fixed it
    there.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Bursa
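
The commit message does not spell out the mechanism. One common way this symptom arises, assumed here purely for illustration, is a name-copy loop that stops at the first byte below ' ' while reading Latin 1 bytes through a signed char: a byte such as 0xE9 becomes negative and ends the name early. Both functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical copier: stops at the first byte below ' ' or after maxlen
 * bytes. dst must have room for maxlen + 1 bytes. */
size_t copy_name_signed(char *dst, const signed char *src, size_t maxlen)
{
    size_t n = 0;
    while (n < maxlen && src[n] >= ' ') {   /* 0xE9 reads as -23: stops */
        dst[n] = (char)src[n];
        n++;
    }
    dst[n] = '\0';
    return n;
}

/* Same loop reading bytes as unsigned: Latin 1 characters survive. */
size_t copy_name_unsigned(char *dst, const unsigned char *src, size_t maxlen)
{
    size_t n = 0;
    while (n < maxlen && src[n] >= ' ') {   /* 0xE9 reads as 233: kept */
        dst[n] = (char)src[n];
        n++;
    }
    dst[n] = '\0';
    return n;
}
```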
     

03 Jan, 2007

1 commit


31 Dec, 2006

3 commits

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
    ocfs2: export heartbeat thread pid via configfs
    ocfs2: always unmap in ocfs2_data_convert_worker()
    ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime()
    ocfs2: Allow direct I/O read past end of file
    ocfs2: don't print error in ocfs2_permission()

    Linus Torvalds
     
  • ramfs doesn't provide the .set_page_dirty a_op, and when the BLOCK layer is
    not configured in, 'set_page_dirty' makes a call via a NULL pointer.

    Signed-off-by: Dimitri Gorokhovik
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dimitri Gorokhovik
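
A hypothetical user-space model of the failure mode and the presumed fix direction: the caller invokes the operation through the table without checking for NULL, so a filesystem that leaves the slot empty crashes it. Supplying a handler that just marks the page dirty, with no writeback bookkeeping, is enough for an in-memory filesystem. All names here are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

struct page_model { bool dirty; };

/* Hypothetical, trimmed-down address_space_operations. */
struct aops_model {
    int (*set_page_dirty)(struct page_model *page);
};

/* Marks the page dirty with no writeback accounting; returns 1 if the
 * page was newly dirtied. */
static int model_set_page_dirty_no_writeback(struct page_model *page)
{
    bool was_dirty = page->dirty;
    page->dirty = true;
    return !was_dirty;
}

/* Caller that assumes the hook exists: with a NULL slot this would be a
 * call through a NULL pointer. */
int model_set_page_dirty(const struct aops_model *aops, struct page_model *page)
{
    return aops->set_page_dirty(page);
}

/* A ramfs-like table that fills the slot in. */
const struct aops_model ramfs_like_aops = {
    .set_page_dirty = model_set_page_dirty_no_writeback,
};
```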
     
  • lockdep found an AB-BC-CA lock inversion in retry-based AIO:

    1) The task struct's alloc_lock (A) is acquired in process context with
    interrupts enabled. An interrupt might arrive and call wake_up() which
    grabs the wait queue's q->lock (B).

    2) When performing retry-based AIO, the AIO core registers
    aio_wake_function() as the wake function for iocb->ki_wait. It is called
    with the wait queue's q->lock (B) held and then tries to add the iocb to
    the run list after acquiring the ctx_lock (C).

    3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
    alloc_lock (A) via lock_task() and unuse_mm(). Lockdep emits a warning
    saying that we're trying to connect the irq-safe q->lock to the
    irq-unsafe alloc_lock via ctx_lock.

    This fixes the inversion by calling unuse_mm() in the AIO kick handling
    path after we've released the ctx_lock. As Ben LaHaise pointed out,
    __put_ioctx could set ctx->mm to NULL, so we must only access ctx->mm
    while we hold the lock.

    Signed-off-by: Zach Brown
    Signed-off-by: Suparna Bhattacharya
    Acked-by: Benjamin LaHaise
    Cc: "Chen, Kenneth W"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
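
A user-space sketch of the reordering, with pthread mutexes standing in for the kernel spinlocks; the irq-safe/irq-unsafe distinction that lockdep actually complains about is not modelled. ctx->mm is read only while ctx_lock (C) is held, the lock is dropped, and only then does the unuse step take alloc_lock (A), so the C -> A edge never forms. All model_* names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_model { int users; };

struct ioctx_model {
    pthread_mutex_t ctx_lock;        /* lock C: protects ctx->mm */
    struct mm_model *mm;
};

static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;  /* lock A */
static bool ctx_lock_held;           /* tracked by hand for the check */
static bool order_violated;

static void model_unuse_mm(struct mm_model *mm)
{
    if (ctx_lock_held)
        order_violated = true;       /* would create the C -> A edge */
    pthread_mutex_lock(&alloc_lock);
    if (mm)
        mm->users--;
    pthread_mutex_unlock(&alloc_lock);
}

/* Fixed ordering: snapshot ctx->mm under ctx_lock, drop the lock, then
 * run the unuse step. Returns true if no C -> A ordering occurred. */
bool model_kick_handler(struct ioctx_model *ctx)
{
    struct mm_model *mm;

    order_violated = false;
    pthread_mutex_lock(&ctx->ctx_lock);
    ctx_lock_held = true;
    /* ... run queued iocbs here ... */
    mm = ctx->mm;                    /* ctx->mm may be cleared elsewhere */
    ctx_lock_held = false;
    pthread_mutex_unlock(&ctx->ctx_lock);

    model_unuse_mm(mm);              /* alloc_lock taken without ctx_lock */
    return !order_violated;
}
```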
     

29 Dec, 2006

5 commits


24 Dec, 2006

2 commits


23 Dec, 2006

5 commits

  • In the current jbd code, if a buffer on BJ_SyncData list is dirty and not
    locked, the buffer is refiled to BJ_Locked list, submitted to the IO and
    waited for IO completion.

    But fsstress testing showed that when a buffer had already been
    submitted for IO just before the buffer_dirty(bh) check, nobody waited
    for its IO to complete.

    The following patch solves this problem: if a buffer may have been
    submitted for IO before the buffer_dirty(bh) check and is still being
    written to disk, it is refiled to the BJ_Locked list.

    Signed-off-by: Hisashi Hifumi
    Cc: Jan Kara
    Cc: "Stephen C. Tweedie"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
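
A simplified, hypothetical model of the per-buffer decision during commit, not the jbd code itself (UNFILED stands in for dropping the buffer from the data lists). The race case is a buffer that is no longer dirty but still locked because its IO was started just before the check: the old logic lets it go unwaited, the fixed logic files it on BJ_Locked so commit waits for it.

```c
#include <stdbool.h>

enum jlist { BJ_SyncData, BJ_Locked, UNFILED };

/* Decide where one BJ_SyncData buffer goes; *submitted reports new IO. */
enum jlist model_sync_data_step(bool dirty, bool locked, bool fixed,
                                bool *submitted)
{
    *submitted = false;
    if (dirty) {
        /* the real code locks the buffer first; then submit and wait */
        *submitted = true;
        return BJ_Locked;
    }
    if (fixed && locked)
        return BJ_Locked;            /* IO already in flight: still wait */
    return UNFILED;                  /* old code: this IO is never waited on */
}
```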
     
  • Christoph Hellwig has expressed concerns that the recent fdtable changes
    expose the details of the RCU methodology used to release no-longer-used
    fdtable structures to the rest of the kernel. The trivial patch below
    addresses these concerns by introducing the appropriate free_fdtable()
    calls, which simply wrap the RCU-based release. Since free_fdtable() is a
    one-liner, it makes sense to promote it to an inline helper.

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
     
  • Mark JFFS as broken and provide a warning to users that it is deprecated
    and scheduled for removal in 2.6.21.

    Signed-off-by: Josh Boyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Boyer
     
  • Trevor found a file size problem in eCryptfs in recent kernels, and he
    tracked it down to an fsstack change.

    This was the eCryptfs copy_attr_all:

    > -void ecryptfs_copy_attr_all(struct inode *dest, const struct inode *src)
    > -{
    > - dest->i_mode = src->i_mode;
    > - dest->i_nlink = src->i_nlink;
    > - dest->i_uid = src->i_uid;
    > - dest->i_gid = src->i_gid;
    > - dest->i_rdev = src->i_rdev;
    > - dest->i_atime = src->i_atime;
    > - dest->i_mtime = src->i_mtime;
    > - dest->i_ctime = src->i_ctime;
    > - dest->i_blkbits = src->i_blkbits;
    > - dest->i_flags = src->i_flags;
    > -}

    This is the fsstack copy_attr_all:

    > +void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
    > + int (*get_nlinks)(struct inode *))
    > +{
    > + if (!get_nlinks)
    > + dest->i_nlink = src->i_nlink;
    > + else
    > + dest->i_nlink = (*get_nlinks)(dest);
    > +
    > + dest->i_mode = src->i_mode;
    > + dest->i_uid = src->i_uid;
    > + dest->i_gid = src->i_gid;
    > + dest->i_rdev = src->i_rdev;
    > + dest->i_atime = src->i_atime;
    > + dest->i_mtime = src->i_mtime;
    > + dest->i_ctime = src->i_ctime;
    > + dest->i_blkbits = src->i_blkbits;
    > + dest->i_flags = src->i_flags;
    > +
    > + fsstack_copy_inode_size(dest, src);
    > +}

    The addition of copy_inode_size breaks eCryptfs, since eCryptfs needs to
    interpolate the file sizes (eCryptfs has extra space in the lower file for
    the header). The setting of the upper inode size occurs elsewhere in
    eCryptfs, and the new copy_attr_all now undoes what eCryptfs was doing
    right beforehand.

    I see three ways of going forward from here. (1) Something like this patch
    needs to go in (assuming it jibes with Unionfs), (2) we need to make a
    change to the fsstack API for more fine-grained control over copying
    attributes (e.g., by also including a callback function for calculating the
    right file size, which will require some more work on both eCryptfs and
    Unionfs), or (3) the fsstack patch on eCryptfs (commit
    0cc72dc7f050188d8d7344b1dd688cbc68d3cd30 made on Fri Dec 8 02:36:31 2006
    -0800) needs to be yanked in 2.6.20.

    I think the simplest solution, from eCryptfs' perspective, is to just
    remove the inode size copy.

    Remove the inode size copy from the general fsstack attr copy code.
    Stacked filesystems may need to interpolate the inode size, since the
    file size in the lower file may be different from the file size in the
    stacked layer.

    Signed-off-by: Michael Halcrow
    Acked-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
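
A hypothetical model of why the generic copy must leave the size alone: the stacked layer first sets its own interpolated size and then calls the generic attribute copy, which would otherwise overwrite it with the lower file's size. MODEL_HEADER_SIZE is an assumption for illustration only; eCryptfs's real size calculation depends on its header and extent layout.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down inode: just the fields this example needs. */
struct inode_model {
    unsigned int mode;
    int64_t size;
};

/* Generic attribute copy after the fix: no size copy. */
void model_copy_attr_all(struct inode_model *dest, const struct inode_model *src)
{
    dest->mode = src->mode;
    /* dest->size deliberately untouched: the stacked fs owns it */
}

/* Hypothetical fixed header size, for illustration only. */
#define MODEL_HEADER_SIZE 8192

void model_stacked_update(struct inode_model *upper, const struct inode_model *lower)
{
    upper->size = lower->size - MODEL_HEADER_SIZE;  /* interpolated size */
    model_copy_attr_all(upper, lower);              /* must not undo it */
}
```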
     
  • Add proper prototypes for sysv_{init,destroy}_icache() in sysv.h

    Signed-off-by: Adrian Bunk
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

22 Dec, 2006

3 commits

  • XFS appears to call clear_page_dirty to get the mapping tree dirty tag
    set correctly at the same time the page dirty flag is cleared. I note
    that this can be done by set_page_writeback() if we clear the dirty flag
    on the page first when we are writing back the entire page.

    Hence it seems to me that the XFS call to clear_page_dirty() could
    easily be substituted by clear_page_dirty_for_io() followed by a call to
    set_page_writeback() to get the mapping tree tags set correctly after
    the page has been marked clean.

    Signed-off-by: Linus Torvalds

    David Chinner
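
A hypothetical model of the proposed sequence, based only on the description above: clearing the page's dirty flag first leaves the mapping tree's dirty tag in place, and marking the page for writeback then notices the page is clean and drops that tag while setting the writeback tag. The struct and model_* functions are illustrative, not the kernel's implementations.

```c
#include <stdbool.h>

/* Hypothetical model of one page and its mapping tree tags. */
struct page_model {
    bool pg_dirty;        /* page flag */
    bool pg_writeback;    /* page flag */
    bool tag_dirty;       /* mapping tree tag */
    bool tag_writeback;   /* mapping tree tag */
};

/* Clear the page dirty flag ahead of I/O; the tree tag is left alone. */
bool model_clear_page_dirty_for_io(struct page_model *p)
{
    bool was_dirty = p->pg_dirty;
    p->pg_dirty = false;
    return was_dirty;
}

/* Mark writeback and fix up the tags: a page that is no longer dirty
 * loses its dirty tag here. */
void model_set_page_writeback(struct page_model *p)
{
    p->pg_writeback = true;
    p->tag_writeback = true;
    if (!p->pg_dirty)
        p->tag_dirty = false;
}
```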
     
  • The use by FUSE was just a remnant of an optimization from the time
    when writable mappings were supported.

    Now FUSE never actually allows the creation of dirty pages, so this
    invocation of clear_page_dirty() is effectively a no-op.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch removes some questionable code that attempted to make a
    no-longer-used page easier to reclaim.

    Calling metapage_writepage against such a page will not result in any
    I/O being performed, so removing this code shouldn't be a big deal.

    [ It's likely that we could have just replaced the "clear_page_dirty()"
    call with a call to "cancel_dirty_page()" instead, but in the
    meantime this is cleaner and simpler anyway, so unless there is some
    overriding reason (and Dave implies there isn't) I'll just use this
    patch as-is. - Linus ]

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp