27 Jan, 2010

1 commit

  • Commit 703625118069f9f8960d356676662d3db5a9d116 exposed that f_modown()
    should call write_lock_irqsave instead of just write_lock_irq, because
    a caller could already hold a spinlock with interrupts disabled, and it
    would not be good to re-enable interrupts in that case.

    Cc: Eric W. Biederman
    Cc: Al Viro
    Cc: Alan Cox
    Cc: Tavis Ormandy
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Greg Kroah-Hartman
     

26 Jan, 2010

1 commit


25 Jan, 2010

1 commit

  • KVM needs a way to atomically remove itself from the eventfd ->poll()
    wait queue head, in order to correctly handle its IRQfd deassign
    operation.

    This patch introduces such an API, plus a way to read an eventfd from its
    context.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Avi Kivity

    Davide Libenzi
     

21 Jan, 2010

5 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
    tty: fix race in tty_fasync
    serial: serial_cs: oxsemi quirk breaks resume
    serial: imx: bit &/| confusion
    serial: Fix crash if the minimum rate of the device is > 9600 baud
    serial-core: resume serial hardware with no_console_suspend
    serial: 8250_pnp: use wildcard for serial Wacom tablets
    nozomi: quick fix for the close/close bug
    compat_ioctl: Supress "unknown cmd" message on serial /dev/console

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    fs/bio.c: fix shadows sparse warning
    drbd: The kernel code is now equivalent to out of tree release 8.3.7
    drbd: Allow online resizing of DRBD devices while peer not reachable (needs to be explicitly forced)
    drbd: Don't go into StandAlone mode when authentification failes because of network error
    drivers/block/drbd/drbd_receiver.c: correct NULL test
    cfq-iosched: Respect ioprio_class when preempting
    genhd: overlapping variable definition
    block: removed unused as_io_context
    DM: Fix device mapper topology stacking
    block: bdev_stack_limits wrapper
    block: Fix discard alignment calculation and printing
    block: Correct handling of bottom device misaligment
    drbd: check on CONFIG_LBDAF, not LBD
    drivers/block/drbd: Correct NULL test
    drbd: Silenced an assert that could triggered after changing write ordering method
    drbd: Kconfig fix
    drbd: Fix for a race between IO and a detach operation [Bugz 262]
    drbd: Use drbd_crypto_is_hash() instead of an open coded check

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
    ecryptfs: use after free
    ecryptfs: Eliminate useless code
    ecryptfs: fix interpose/interpolate typos in comments
    ecryptfs: pass matching flags to interpose as defined and used there
    ecryptfs: remove unnecessary d_drop calls in ecryptfs_link
    ecryptfs: don't ignore return value from lock_rename
    ecryptfs: initialize private persistent file before dereferencing pointer
    eCryptfs: Remove mmap from directory operations
    eCryptfs: Add getattr function
    eCryptfs: Use notify_change for truncating lower inodes

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: fix possible panic on unmount
    Btrfs: deal with NULL acl sent to btrfs_set_acl
    Btrfs: fix regression in orphan cleanup
    Btrfs: Fix race in btrfs_mark_extent_written
    Btrfs, fix memory leaks in error paths
    Btrfs: align offsets for btrfs_ordered_update_i_size
    btrfs: fix missing last-entry in readdir(3)

    Linus Torvalds
     
  • After the commit fb07a5f8 ("compat_ioctl: remove all VT ioctl
    handling"), I got this error message on a 64-bit MIPS kernel with a
    32-bit busybox userland:

    ioctl32(init:1): Unknown cmd fd(0) cmd(00005600){t:'V';sz:0} arg(7fd76480) on /dev/console

    The cmd 5600 is VT_OPENQRY. Busybox's init issues this ioctl to find out
    whether the console is a VT console or a serial console. If it is a
    serial console, the serial driver does not handle VT ioctls.

    A quick search also turned up some programs that use VT_GETMODE to
    check whether a VT console is available.

    Signed-off-by: Atsushi Nemoto
    Cc: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Atsushi Nemoto
     

20 Jan, 2010

10 commits

  • The "full_alg_name" variable is used on a couple of error paths, so we
    shouldn't free it until the end.

    Signed-off-by: Dan Carpenter
    Cc: stable@kernel.org
    Signed-off-by: Tyler Hicks

    Dan Carpenter
     
  • The variable lower_dentry is initialized twice to the same (side effect-free)
    expression. Drop one initialization.

    A simplified version of the semantic match that finds this problem is as
    follows (http://coccinelle.lip6.fr/):

    // <smpl>
    @forall@
    idexpression *x;
    identifier f!=ERR_PTR;
    @@

    x = f(...)
    ... when != x
    (
    x = f(...,<+...x...+>,...)
    |
    * x = f(...)
    )
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Tyler Hicks

    Julia Lawall
     
  • Signed-off-by: Erez Zadok
    Acked-by: Dustin Kirkland
    Signed-off-by: Tyler Hicks

    Erez Zadok
     
  • ecryptfs_interpose checks if one of the flags passed is
    ECRYPTFS_INTERPOSE_FLAG_D_ADD, defined as 0x00000001 in ecryptfs_kernel.h.
    But the only caller of ecryptfs_interpose that passes a non-zero flag
    hard-codes the value as "1". This could spell trouble if the flag's
    value ever changes in the future.

    Signed-off-by: Erez Zadok
    Cc: Dustin Kirkland
    Cc: Al Viro
    Signed-off-by: Tyler Hicks

    Erez Zadok
     
  • These d_drop calls are unnecessary because they would unhash perfectly
    valid dentries, forcing them to be looked up again the next time they're
    needed, which is presumably right afterward.

    Signed-off-by: Aseem Rastogi
    Signed-off-by: Shrikar archak
    Signed-off-by: Erez Zadok
    Cc: Saumitra Bhanage
    Cc: Al Viro
    Signed-off-by: Tyler Hicks

    Erez Zadok
     
  • Signed-off-by: Erez Zadok
    Cc: Dustin Kirkland
    Cc: Andrew Morton
    Cc: Al Viro
    Signed-off-by: Tyler Hicks

    Erez Zadok
     
  • ecryptfs_open dereferences a pointer to the private lower file (the one
    stored in the ecryptfs inode) without checking whether the pointer is
    NULL. Right afterward, it initializes that pointer if it is NULL. Swap
    the order of the statements so that the pointer is initialized first.
    Bug discovered by Duckjin Kang.

    Signed-off-by: Duckjin Kang
    Signed-off-by: Erez Zadok
    Cc: Dustin Kirkland
    Cc: Al Viro
    Cc:
    Signed-off-by: Tyler Hicks

    Erez Zadok
     
  • Adrian reported that mkfontscale didn't work inside of eCryptfs mounts.
    Strace revealed the following:

    open("./", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
    fcntl64(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
    open("./fonts.scale", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
    getdents(3, /* 80 entries */, 32768) = 2304
    open("./.", O_RDONLY) = 5
    fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
    fstat64(5, {st_mode=S_IFDIR|0755, st_size=16384, ...}) = 0
    mmap2(NULL, 16384, PROT_READ, MAP_PRIVATE, 5, 0) = 0xb7fcf000
    close(5) = 0
    --- SIGBUS (Bus error) @ 0 (0) ---
    +++ killed by SIGBUS +++

    The mmap2() on a directory succeeded, resulting in a SIGBUS signal
    later. This patch removes mmap() from ecryptfs_dir_fops so that mmap()
    isn't possible on eCryptfs directory files.

    https://bugs.launchpad.net/ecryptfs/+bug/400443

    Reported-by: Adrian C.
    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • The i_blocks field of an eCryptfs inode cannot be trusted, but
    generic_fillattr() uses it to instantiate the blocks field of a stat()
    syscall when a filesystem doesn't implement its own getattr(). Users
    have noticed that the output of du is incorrect on newly created files.

    This patch creates ecryptfs_getattr() which calls into the lower
    filesystem's getattr() so that eCryptfs can use its kstat.blocks value
    after calling generic_fillattr(). It is important to note that the
    block count includes the eCryptfs metadata stored in the beginning of
    the lower file plus any padding used to fill an extent before
    encryption.

    https://bugs.launchpad.net/ecryptfs/+bug/390833

    Reported-by: Dominic Sacré
    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • When truncating inodes in the lower filesystem, eCryptfs directly
    invoked vmtruncate(). As Christoph Hellwig pointed out, vmtruncate() is
    a filesystem helper function, but filesystems may need to do more than
    just a call to vmtruncate().

    This patch moves the lower inode truncation out of ecryptfs_truncate()
    and renames the function to truncate_upper(). truncate_upper() updates
    an iattr for the lower inode to indicate if the lower inode needs to be
    truncated upon return. ecryptfs_setattr() then calls notify_change(),
    using the updated iattr for the lower inode, to complete the truncation.

    For eCryptfs functions needing to truncate, ecryptfs_truncate() is
    reintroduced as a simple way to truncate the upper inode to a specified
    size and then truncate the lower inode accordingly.

    https://bugs.launchpad.net/bugs/451368

    Reported-by: Christoph Hellwig
    Acked-by: Dustin Kirkland
    Cc: ecryptfs-devel@lists.launchpad.net
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Tyler Hicks

    Tyler Hicks
     

19 Jan, 2010

2 commits

  • fs/bio.c:81:33: warning: symbol 'bslab' shadows an earlier one
    fs/bio.c:74:25: originally declared here

    Signed-off-by: Thiago Farina
    Signed-off-by: Jens Axboe

    Thiago Farina
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: xfs_swap_extents needs to handle dynamic fork offsets
    xfs: fix missing error check in xfs_rtfree_range
    xfs: fix stale inode flush avoidance
    xfs: Remove inode iolock held check during allocation
    xfs: reclaim all inodes by background tree walks
    xfs: Avoid inodes in reclaim when flushing from inode cache
    xfs: reclaim inodes under a write lock

    Linus Torvalds
     

18 Jan, 2010

8 commits

  • We can race between the unmount of an fs and the stopping of a kthread,
    and free the block group before we're done using it. This happens
    because we do not hold a reference on the block group while it's caching,
    since the allocator drops its reference once it exits or moves on to the
    next block group. This patch fixes the problem by taking a reference to
    the block group before we start caching and dropping it when we're done,
    to make sure all accesses to the block group are safe. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • It is legal for btrfs_set_acl to be sent a NULL acl. This
    makes sure we don't dereference it. A similar patch was sent by
    Johannes Hirte.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Currently orphan cleanup only ever gets triggered if we cross subvolumes during
    a lookup, which means that if we just mount a plain-jane fs that has orphans in
    it, they will never get cleaned up. This results in panics like this one:

    http://www.kerneloops.org/oops.php?number=1109085

    where adding an orphan entry returns -EEXIST and we panic. To fix this,
    we check on lookup whether our root has had orphan cleanup done, and if
    not, we go ahead and do it. This is easily reproducible by running this
    test case

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char data[4096];
        char newdata[4096];
        int fd1, fd2;

        memset(data, 'a', 4096);
        memset(newdata, 'b', 4096);

        while (1) {
            int i;

            /* Write 2 MiB to file1 and make it durable. */
            fd1 = creat("file1", 0666);
            if (fd1 < 0)
                break;

            for (i = 0; i < 512; i++)
                write(fd1, data, 4096);

            fsync(fd1);
            close(fd1);

            /* Create file2, size it with ftruncate, then overwrite it. */
            fd2 = creat("file2", 0666);
            if (fd2 < 0)
                break;

            ftruncate(fd2, 4096 * 512);

            for (i = 0; i < 512; i++)
                write(fd2, newdata, 4096);
            close(fd2);

            /* Replace file1 with file2, then delete it. */
            i = rename("file2", "file1");
            unlink("file1");
        }

        return 0;
    }

    and then pulling the power on the box, and then trying to run that test again
    when the box comes back up. I've tested this locally and it fixes the problem.
    Thanks to Tomas Carnecky for helping me track this down initially.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Fix a bug reported by Johannes Hirte. The bug occurs because
    btrfs_del_items is called after btrfs_duplicate_item, and
    btrfs_del_items triggers a tree balance. The fix is to check for
    that case and call btrfs_search_slot when needed.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • Stanse found 2 memory leaks in relocate_block_group and
    __btrfs_map_block. cluster and multi are not freed/assigned on all
    paths. Fix that.

    Signed-off-by: Jiri Slaby
    Cc: linux-btrfs@vger.kernel.org
    Signed-off-by: Chris Mason

    Jiri Slaby
     
  • Some callers of btrfs_ordered_update_i_size can now pass in
    a NULL for the ordered extent to update against. This makes
    sure we properly align the offset they pass in when deciding
    how much to bump the on disk i_size.

    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd)
    commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d
    Author: Jan Engelhardt
    Date: Wed Dec 9 22:57:36 2009 +0100

    Btrfs: fix missing last-entry in readdir(3)

    When one does a 32-bit readdir(3), the last entry of a directory is
    missing. This is however not due to passing a large value to filldir,
    but it seems to have to do with glibc doing telldir or something
    quirky. In any case, this patch fixes it in practice.

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Chris Mason

    Jan Engelhardt
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    do_add_mount() should sanitize mnt_flags
    CIFS shouldn't make mountpoints shrinkable
    mnt_flags fixes in do_remount()
    attach_recursive_mnt() needs to hold vfsmount_lock over set_mnt_shared()
    may_umount() needs namespace_sem
    Fix configfs leak
    Fix the -ESTALE handling in do_filp_open()
    ecryptfs: Fix refcnt leak on ecryptfs_follow_link() error path
    Fix ACC_MODE() for real
    Unrot uml mconsole a bit
    hppfs: handle ->put_link()
    Kill 9p readlink()
    fix autofs/afs/etc. magic mountpoint breakage

    Linus Torvalds
     

17 Jan, 2010

7 commits

  • Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen
    over the end of a truncation. The problem is that
    ramfs_nommu_check_mappings() checks the reduced file size against the
    VMA tree, but not against the vm_region tree.

    The following sequence of events can cause the problem:

    fd = open("/tmp/x", O_RDWR|O_TRUNC|O_CREAT, 0600);
    ftruncate(fd, 32 * 1024);
    a = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    munmap(a, 32 * 1024);
    ftruncate(fd, 16 * 1024);
    c = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

    Mapping 'a' creates a vm_region covering 32KB of the file. Mapping 'b'
    sees that the vm_region from 'a' is covering the region it wants and so
    shares it, pinning it in memory.

    Mapping 'a' then goes away and the file is truncated to the end of VMA
    'b'. However, the region allocated by 'a' is still in effect, and has
    _not_ been reduced.

    Mapping 'c' is then created, and because there's a vm_region covering the
    desired region, get_unmapped_area() is _not_ called to repeat the check,
    and the mapping is granted, even though the pages from the latter half of
    the mapping have been discarded.

    However:

    d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

    Mapping 'd' should work, and should end up sharing the region allocated by
    'a'.

    To deal with this, we shrink the vm_region struct during the truncation,
    lest do_mmap_pgoff() take it as licence to share the full region
    automatically without calling the get_unmapped_area() file op again.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix the race between the truncation of a ramfs file and an attempt to make
    a shared mmap of a region of that file.

    The problem is that do_mmap_pgoff() calls f_op->get_unmapped_area() to
    verify that the file region is made of contiguous pages and to find its
    base address - but there isn't any locking to guarantee this region until
    vma_prio_tree_insert() is called by add_vma_to_mm().

    Note that moving the functionality into f_op->mmap() doesn't help as that
    is also called before vma_prio_tree_insert().

    Instead make ramfs_nommu_check_mappings() grab nommu_region_sem whilst it
    does its checks. This means that this function will wait whilst mmaps
    take place.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • MNT_WRITE_HOLD shouldn't leak into new vfsmount and neither
    should MNT_SHARED (the latter will be set properly, along with
    the rest of shared-subtree data structures)

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • * need vfsmount_lock over modifying it
    * need to preserve MNT_SHARED/MNT_UNBINDABLE

    Signed-off-by: Al Viro

    Al Viro
     
  • race in mnt_flags update

    Signed-off-by: Al Viro

    Al Viro
     
  • otherwise it races with clone_mnt() changing mnt_share/mnt_slaves

    Signed-off-by: Al Viro

    Al Viro
     

16 Jan, 2010

5 commits

  • inotify will WARN() if it finds that the idr and the fsnotify internals
    somehow got out of sync. It was only supposed to do this once but due
    to this stupid bug it would warn every single time a problem was
    detected.

    Signed-off-by: Eric Paris
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Since commit 7e790dd5fc937bc8d2400c30a05e32a9e9eef276 ("inotify: fix
    error paths in inotify_update_watch") inotify changed the manner in which
    it gave watch descriptors back to userspace. Prior to this commit,
    inotify acted like the following:

    inotify_add_watch(X, Y, Z) = 1
    inotify_rm_watch(X, 1);
    inotify_add_watch(X, Y, Z) = 2

    but after this patch inotify would return watch descriptors like so:

    inotify_add_watch(X, Y, Z) = 1
    inotify_rm_watch(X, 1);
    inotify_add_watch(X, Y, Z) = 1

    which I saw as equivalent to opening an fd where

    open(file) = 1;
    close(1);
    open(file) = 1;

    seemed perfectly reasonable. The issue is that quite a bit of userspace
    apparently relies on the behavior in which watch descriptors will not be
    quickly reused. KDE relies on it, I know some selinux packages rely on
    it, and I have heard complaints from other random sources such as debian
    bug 558981.

    Although the man page implies what we do is ok, we broke userspace, so
    this patch almost reverts us to the old behavior. It is still slightly
    racy, and I have patches that would fix that, but they are rather large
    and this will fix it for all real-world cases. The race is as follows:

    - task1 creates a watch and blocks in idr_new_watch() before it updates
    the hint.
    - task2 creates a watch and updates the hint.
    - task1 updates the hint with its older wd.
    - a task removes the watch created by task2.
    - a task adds a new watch and will reuse the wd originally given to task2.

    Fixing this requires moving some locking around the hint (last_wd), but
    this patch should solve it for the real world and be -stable safe.

    As a side effect, this patch papers over a bug in the lib/idr code that
    is causing a large number of WARNs to pop up on people's systems and many
    reports on kerneloops.org. I'm working on the root cause of that idr
    bug separately, but this should make inotify immune to that issue.

    Signed-off-by: Eric Paris
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • When swapping extents, we can corrupt inodes by swapping data forks
    that are in incompatible formats. This is caused by the two inodes
    having different fork offsets due to the presence of an attribute
    fork on an attr2 filesystem. xfs_fsr tries to be smart about
    setting the fork offset, but the trick it plays only works on attr1
    (old fixed format attribute fork) filesystems.

    Changing the way xfs_fsr sets up the attribute fork will prevent
    this situation from ever occurring, so in the kernel code we can get
    by with a preventative fix - check that the data fork in the
    defragmented inode is in a format valid for the inode it is being
    swapped into. This will lead to files that will silently and
    potentially repeatedly fail defragmentation, so issue a warning to
    the log when this particular failure occurs to let us know that
    xfs_fsr needs updating/fixing.

    To help identify how to improve xfs_fsr to avoid this issue, add
    trace points for the inodes being swapped so that we can determine
    why the swap was rejected and to confirm that the code is making the
    right decisions and modifications when swapping forks.

    A further complication is that, even when the swap is allowed to
    proceed, if the fork offset differs between the two inodes then the
    value for the maximum number of extents the data fork can hold can be
    wrong. Make sure these are also set correctly after the swap occurs.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • When xfs_rtfind_forw() returns an error, the block is returned
    uninitialised. xfs_rtfree_range() is not checking the error return,
    so could be using an uninitialised block number for modifying bitmap
    summary info.

    The problem was found by gcc when compiling the *userspace* libxfs
    code - it is a copy of the kernel code with the exact same bug.
    gcc gives an uninitialised variable warning on the userspace code
    but not on the kernel code. You gotta love the consistency (Mmmm,
    slightly chewy today!).

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • When reclaiming stale inodes, we need to guarantee that inodes are
    unpinned before returning with a "clean" status. If we don't, we can
    reclaim inodes that are pinned, leading to use after free in the
    transaction subsystem as transactions complete.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner