28 Aug, 2006

11 commits

  • None of the other /proc/meminfo lines have a space in the identifier. This
    post-2.6.17 addition has the potential to break existing parsers, so use an
    underscore instead (like Committed_AS).

    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This fixes the locking error noticed by lockdep:

    =============================================
    [ INFO: possible recursive locking detected ]
    ---------------------------------------------
    init/1 is trying to acquire lock:
    (&sighand->siglock){....}, at: [] flush_old_exec+0x3ae/0x859

    but task is already holding lock:
    (&sighand->siglock){....}, at: [] flush_old_exec+0x39e/0x859

    other info that might help us debug this:
    2 locks held by init/1:
    #0: (tasklist_lock){..--}, at: [] flush_old_exec+0x38e/0x859
    #1: (&sighand->siglock){....}, at: [] flush_old_exec+0x39e/0x859

    stack backtrace:
    [] show_trace_log_lvl+0x54/0xfd
    [] show_trace+0xd/0x10
    [] dump_stack+0x19/0x1b
    [] __lock_acquire+0x773/0x997
    [] lock_acquire+0x4b/0x6c
    [] _spin_lock+0x19/0x28
    [] flush_old_exec+0x3ae/0x859
    [] load_elf_binary+0x4aa/0x1628
    [] search_binary_handler+0xa7/0x24e
    [] do_execve+0x15b/0x1f9
    [] sys_execve+0x29/0x4d
    [] syscall_call+0x7/0xb

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • reiserfs seems to have another locking level layer for the i_mutex due to the
    xattrs-are-a-directory thing.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • JBD currently allocates commit and frozen buffers from slabs. With
    CONFIG_SLAB_DEBUG, it's possible for an allocation to cross a page
    boundary, causing IO problems.

    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127

    So, instead of allocating these from the regular slabs, manage their
    allocation from JBD's own slabs and disable slab debugging for those slabs.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Fix two compile failures in the eventpoll.c code which occur if
    DEBUG_EPOLL is greater than zero.

    Signed-off-by: Masoud Sharbiani
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masoud Asgharifard Sharbiani
     
  • 1) When allocating the last fragment in ufs_truncate, we read the page,
    check whether a block is mapped to the address, and if not, try to
    allocate it. This is wrong: a fragment may be mapped but NOT allocated.
    This happens because the "block map" function does not check whether
    the fragment is allocated; it just takes the address of the first
    fragment in the block, adds the fragment's offset and returns the
    result. That is correct in almost all situations except the call from
    ufs_truncate.

    2) Almost all UFS implementations I could investigate have the same
    "defect": if the disk is full and you try to truncate a file, for
    example from 3GB to 2MB, and there is a hole in this region, truncate
    returns -ENOSPC. I tried to avoid this problem, but the "block
    allocation" algorithm depends on a correct value of i_lastfrag, and
    fixing this corner case may slow down ordinary scenarios, so this patch
    makes the behavior of "truncate" operations similar to what other UFS
    implementations do.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • On UFS, this scenario:
    open(O_TRUNC)
    lseek(1024 * 1024 * 80)
    write("A")
    lseek(1024 * 2)
    write("A")

    may cause an access to an invalid address.

    This happens because the "goal" is calculated incorrectly in the block
    allocation path; as far as I can see, this problem also exists in 2.4.

    We use a construction like i_data[lastfrag], where i_data is the array
    of pointers to direct blocks, indirect blocks and so on. It has a fixed
    size of ~20 elements, while lastfrag may have a value of, for example,
    40000.

    This patch also fixes related issues in handling this scenario: wrong
    zeroing of metadata in the case of block (not fragment) allocation, and
    wrong goal calculation when we allocate a block.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • To handle the earlier bogus ENOSPC error caused by a filesystem full of
    block reservations, the current code falls back to allocating without
    a reservation, starting from the goal block group as if there were no
    block reservation.

    In this case, the code needs to reload the block group descriptor for
    the initial goal block group, but it does not. This patch fixes that.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Mounting an ext2 filesystem with zero s_inodes_per_group will cause a
    divide error.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andries Brouwer
     
  • Mounting a (corrupt) minix filesystem with zero s_zmap_blocks
    gives a spectacular crash on my 2.6.17.8 system, no doubt
    because minix/inode.c does an unconditional
    minix_set_bit(0,sbi->s_zmap[0]->b_data);

    [akpm@osdl.org: make labels consistent while we're there]

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andries Brouwer
     
  • On Wed, 2006-08-09 at 07:57 +0200, Rolf Eike Beer wrote:
    > =============================================
    > [ INFO: possible recursive locking detected ]
    > ---------------------------------------------
    > parted/7929 is trying to acquire lock:
    > (&bdev->bd_mutex){--..}, at: [] __blkdev_put+0x1e/0x13c
    >
    > but task is already holding lock:
    > (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
    >
    > other info that might help us debug this:
    > 1 lock held by parted/7929:
    > #0: (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
    > stack backtrace:
    > [] show_trace_log_lvl+0x58/0x15b
    > [] show_trace+0xd/0x10
    > [] dump_stack+0x17/0x1a
    > [] __lock_acquire+0x753/0x99c
    > [] lock_acquire+0x4a/0x6a
    > [] mutex_lock_nested+0xc8/0x20c
    > [] __blkdev_put+0x1e/0x13c
    > [] blkdev_put+0xa/0xc
    > [] do_open+0x336/0x3a8
    > [] blkdev_open+0x1f/0x4c
    > [] __dentry_open+0xc7/0x1aa
    > [] nameidata_to_filp+0x1c/0x2e
    > [] do_filp_open+0x2e/0x35
    > [] do_sys_open+0x38/0x68
    > [] sys_open+0x16/0x18
    > [] sysenter_past_esp+0x56/0x8d

    OK, I'm having a look here; it's all new to me, so bear with me.

    blkdev_open() calls
    do_open(bdev, ...,BD_MUTEX_NORMAL) and takes
    mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_NORMAL)

    then something fails, and we're thrown to:

    out_first: where
    if (bdev != bdev->bd_contains)
        blkdev_put(bdev->bd_contains) which is
    __blkdev_put(bdev->bd_contains, BD_MUTEX_NORMAL) which does
    mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_NORMAL)

    bd_contains is either bdev or whole, and since we take the branch, it
    must be whole. So it seems to me the following patch would be the right
    one:

    [akpm@osdl.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Cc: Arjan van de Ven
    Acked-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
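
The first commit above notes that a space in a /proc/meminfo identifier can break existing parsers. A minimal sketch of why, assuming a naive parser that splits each "Name: value kB" line on whitespace with sscanf(); parse_meminfo_line() and the sample name "Foo Bar" are hypothetical, used only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical naive parser for one /proc/meminfo line of the form
 * "Name:   value kB". Returns 1 and fills key/value on success, 0 otherwise.
 * It assumes the identifier contains no whitespace, which is exactly the
 * assumption a name with an embedded space violates.
 */
static int parse_meminfo_line(const char *line, char key[64], unsigned long *value)
{
    assert(line != NULL && key != NULL && value != NULL);

    /* %63s stops at the first whitespace, so "Foo Bar:" yields only "Foo". */
    if (sscanf(line, "%63s %lu", key, value) != 2)
        return 0;

    /* A well-formed identifier token ends with ':'; strip it. */
    size_t len = strlen(key);
    if (len == 0 || key[len - 1] != ':')
        return 0;
    key[len - 1] = '\0';
    return 1;
}
```

With this parser, "Committed_AS:   1234 kB" yields the key Committed_AS and the value 1234, while a line such as "Foo Bar:   123 kB" is rejected, because %63s stops at the space and %lu then fails on "Bar:". An underscore keeps the identifier a single token.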
     

27 Aug, 2006

1 commit

  • The current sun disklabel code uses a signed int for the sector count.
    When partitions larger than 1 TB are used, the cast to a sector_t causes
    the partition sizes to be invalid:

    # cat /proc/partitions | grep sdan
    66 112 2146435072 sdan
    66 115 9223372036853660736 sdan3
    66 120 9223372036853660736 sdan8

    This patch switches the sector count to an unsigned int to fix this.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Jeff Mahoney
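
The arithmetic behind the bogus sizes can be reproduced in a small sketch. It assumes a 64-bit sector_t and 512-byte sectors; the helper names are hypothetical, and the raw count 4292737152 is back-computed from the sdan3 line (/proc/partitions reports 1 KiB blocks, i.e. sectors / 2), so treat it as illustrative rather than the actual on-disk value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit sector_t, as on kernels built for large block devices. */
typedef uint64_t sector_t;
static_assert(sizeof(sector_t) == 8, "sketch assumes a 64-bit sector_t");

/*
 * Old behaviour (sketch): the 32-bit on-disk count is held in a signed int,
 * so counts >= 2^31 sectors (1 TiB with 512-byte sectors) become negative
 * and are sign-extended when widened to sector_t. Converting an out-of-range
 * value to int32_t is implementation-defined; GCC and Clang wrap it modulo
 * 2^32.
 */
static sector_t nr_sects_signed(uint32_t raw)
{
    int32_t n = (int32_t)raw;
    return (sector_t)n;
}

/* Fixed behaviour (sketch): an unsigned 32-bit count is zero-extended. */
static sector_t nr_sects_unsigned(uint32_t raw)
{
    return (sector_t)raw;
}
```

Here nr_sects_signed(4292737152u) / 2 is 9223372036853660736, the figure reported for sdan3, whereas nr_sects_unsigned(4292737152u) / 2 is 2146368576, which fits within the 2146435072 blocks of the whole disk.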
     

25 Aug, 2006

12 commits

  • Greg Kroah-Hartman
     
  • The check in open_exec() for inode->i_mode & 0111 has been made
    redundant by the fix to permission().

    Signed-off-by: Trond Myklebust
    (cherry picked from 1d3741c5d991686699f100b65b9956f7ee7ae0ae commit)

    Trond Myklebust
     
  • The check in prepare_binfmt() for inode->i_mode & 0111 is redundant,
    since open_exec() will already have done that.

    Signed-off-by: Trond Myklebust
    (cherry picked from 822dec482ced07af32c378cd936d77345786572b commit)

    Trond Myklebust
     
  • Currently, the access() call will return incorrect information on NFS if
    there exists an ACL that grants execute access to the user on a regular
    file. The reason the information is incorrect is that the VFS overrides
    this execute access in open_exec() by checking (inode->i_mode & 0111).

    This patch propagates the VFS execute bit check back into the generic
    permission() call.

    Signed-off-by: Trond Myklebust
    (cherry picked from 64cbae98848c4c99851cb0a405f0b4982cd76c1e commit)

    Trond Myklebust
     
  • This is needed in order to handle any NFS4ERR_DELAY errors that might be
    returned by the server. It also ensures that we map the NFSv4 errors before
    they are returned to userland.

    Signed-off-by: Trond Myklebust
    (cherry picked from 71c12b3f0abc7501f6ed231a6d17bc9c05a238dc commit)

    Trond Myklebust
     
  • Check the bounds of length specifiers more thoroughly in the XDR decoding of
    NFS4 readdir reply data.

    Currently, if the server returns a bitmap or attr length that causes the
    current decode point pointer to wrap, this could go undetected (consider a
    small "negative" length on a 32-bit machine).

    Also add a check into the main XDR decode handler to make sure that the amount
    of data is a multiple of four bytes (as specified by RFC-1014). This makes
    sure that we can do u32* pointer subtraction in the NFS client without risking
    an undefined result (the result is undefined if the pointers are not correctly
    aligned with respect to one another).

    Signed-Off-By: David Howells
    Signed-off-by: Trond Myklebust
    (cherry picked from 5861fddd64a7eaf7e8b1a9997455a24e7f688092 commit)

    David Howells
     
  • The problem is that we may be caching writes that would extend the file and
    create a hole in the region that we are reading. In this case, we need to
    detect the eof from the server, ensure that we zero out the pages that
    are part of the hole and mark them as up to date.

    Signed-off-by: Trond Myklebust
    (cherry picked from 856b603b01b99146918c093969b6cb1b1b0f1c01 commit)

    Trond Myklebust
     
  • nlm_traverse_files() is not allowed to hold the nlm_file_mutex while
    calling nlm_inspect_file(), since it may end up calling
    nlm_release_file() when releasing the blocks.

    Signed-off-by: Trond Myklebust
    (cherry picked from e558d3cde986e04f68afe8c790ad68ef4b94587a commit)

    Trond Myklebust
     
  • rpc_unlink() and rpc_rmdir() will dput the dentry reference for you.

    Signed-off-by: Trond Myklebust
    (cherry picked from a05a57effa71a1f67ccbfc52335c10c8b85f3f6a commit)

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust
    (cherry picked from 88bf6d811b01a4be7fd507d18bf5f1c527989089 commit)

    Trond Myklebust
     
  • I'm trying to speed up mkdir(2) for network file systems. A typical
    mkdir(2) calls two inode_operations: lookup and mkdir. In the common
    case, the lookup operation fails with ENOENT. I think the lookup is
    unnecessary, because the subsequent mkdir operation can check for that.
    In the case of creat(2), the lookup operation is called with the
    LOOKUP_CREATE flag, so an individual filesystem can omit the real lookup,
    e.g. nfs_lookup().

    Here is a sample patch which uses LOOKUP_CREATE and O_EXCL on mkdir,
    symlink and mknod. This reuses the mechanism already used for creat(2).

    And here is the result of a benchmark on NFSv3.
    mkdir(2) 10,000 times:
    original 50.5 sec
    patched 29.0 sec

    Signed-off-by: ASANO Masahiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Trond Myklebust
    (cherry picked from fab7bf44449b29f9d5572a5dd8adcf7c91d5bf0f commit)

    ASANO Masahiro
     
  • nfs_wb_page() waits on request completion and, as a result, is not safe
    to call from nfs_release_page() when it is invoked by the VM scanner as
    part of a GFP_NOFS allocation. Fix the possible deadlock by checking the
    gfp mask and refusing to release the page if __GFP_FS is not set.

    Signed-off-by: Nikita Danilov
    Signed-off-by: Trond Myklebust
    (cherry picked from 374d969debfb290bafcb41d28918dc6f7e43ce31 commit)

    Nikita Danilov
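
The NFS4 readdir bounds fix above comes down to comparing a decoded length with the space that remains, rather than with a pointer that may already have wrapped, and to treating the data as whole 4-byte units. A minimal userspace sketch of that idea follows; struct xdr_cursor, xdr_len_ok() and xdr_skip_opaque() are hypothetical names, not the NFS client's real helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decode cursor over reply data viewed as 32-bit XDR units. */
struct xdr_cursor {
    const uint32_t *p;    /* current decode point */
    const uint32_t *end;  /* one past the last word of reply data */
};

/* XDR data is made of 4-byte units, so a valid length is a multiple of 4. */
static int xdr_len_ok(size_t nbytes)
{
    return (nbytes & 3) == 0;
}

/*
 * Skip an opaque item of len bytes, padded up to a 4-byte boundary.
 * Returns 0 on success, or -1 (leaving the cursor unchanged) if the item
 * would run past the end of the data.
 *
 * The bound is checked by comparing the item size with the number of words
 * remaining. Computing p + nwords first and comparing it with end would be
 * undefined behaviour once it passes end, and in practice a huge length can
 * make that pointer wrap around on a 32-bit machine and pass the check.
 */
static int xdr_skip_opaque(struct xdr_cursor *xdr, uint32_t len)
{
    assert(xdr->p <= xdr->end);

    uint64_t remaining = (uint64_t)(xdr->end - xdr->p);  /* words left */
    uint64_t nwords = ((uint64_t)len + 3) / 4;           /* round up, no overflow */

    if (nwords > remaining)
        return -1;
    xdr->p += nwords;
    return 0;
}
```

For example, with 4 words left, skipping 5 bytes consumes 2 words, while a length of 0xFFFFFFFC is rejected without moving the cursor.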
     

23 Aug, 2006

1 commit


21 Aug, 2006

3 commits


16 Aug, 2006

1 commit


15 Aug, 2006

4 commits

  • fcntl(F_SETSIG) no longer works on leases because
    lease_release_private_callback() gets called as the lease is copied in
    order to initialise it.

    The problem is that lease_alloc() performs an unnecessary initialisation,
    which sets the lease_manager_ops. Avoid the problem by allocating the
    target lease structure using locks_alloc_lock().

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • Don't let fuse_readpages() leave the @pages list non-empty when exiting
    on error.

    [akpm@osdl.org: kernel-doc fixes]
    Signed-off-by: Alexander Zarochentsev
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Alexander Zarochentsev
     
  • Eric says:

    > I saw an oops down this path when trying to create a new file on a UDF
    > filesystem which was internally marked as readonly, but mounted rw:
    >
    > udf_create
    > udf_new_inode
    > new_inode
    > alloc_inode
    > udf_alloc_inode
    > udf_new_block
    > returns EIO due to readonlyness
    > iput (on error)

    I ran into the same issue today, but when listing a directory with
    invalid/corrupt entries:

    udf_lookup
    udf_iget
    get_new_inode_fast
    alloc_inode
    udf_alloc_inode
    __udf_read_inode
    fails for any reason
    iput (on error)
    ...

    The following patch to udf_alloc_inode() should take care of both (and
    other similar) cases, but I've only tested it with udf_lookup().

    Signed-off-by: Dan Bastone
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Dan Bastone
     
  • Don't use NULL as a printf control string. Fixes bug #6889.

    Cc: Ralph Corderoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Andrew Morton
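
The NULL control string fix above follows a general rule: never pass a possibly-NULL or untrusted string as the format argument of a printf-family call. The commit does not say which call site was involved, so this is only a generic sketch; format_msg() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy an optional message into buf.
 * snprintf(buf, n, msg) would be undefined behaviour when msg is NULL, and
 * would misinterpret any '%' in msg as a conversion. A constant "%s" format
 * avoids both problems; a NULL message is replaced by an empty string.
 */
static int format_msg(char *buf, size_t n, const char *msg)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%s", msg ? msg : "");
}
```

The return value is snprintf()'s count of characters that would have been written, so a NULL message yields 0 and an empty buffer.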
     

10 Aug, 2006

2 commits

  • Greg Kroah-Hartman
     
  • We recently fixed an out-of-space deadlock in XFS, and part of that fix
    involved the addition of the XFS_ALLOC_FLAG_FREEING flag to some of the
    space allocator calls to indicate they're freeing space, not allocating
    it. There was a missed xfs_alloc_fix_freelist condition test that did not
    correctly test "flags". The same test would also test an uninitialised
    structure field (args->userdata) and depending on its value either would
    or would not return early with a critical buffer pointer set to NULL.

    This fixes that up, adds asserts in several places to catch future
    botches of this nature, and skips sections of xfs_alloc_fix_freelist
    that are irrelevant for the space-freeing case.

    SGI-PV: 955303
    SGI-Modid: xfs-linux-melb:xfs-kern:26743a

    Signed-off-by: Nathan Scott

    Nathan Scott
     

08 Aug, 2006

5 commits