09 Jun, 2006

40 commits

  • Respond to a moved error on NFS lookup by setting up the referral.
    Note: We don't actually follow the referral during lookup/getattr, but
    later when we detect fsid mismatch in inode revalidation (similar to the
    processing done for cloning submounts). Referrals will have fake attributes
    until they are actually followed or traversed.

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • Set up mountpoint when hitting a referral on moved error by getting
    fs_locations.

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • Move existing code into a separate function so that it can also be used by
    the referral code.

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • This bitmap is similar to the getattr bitmap, but also includes the
    fs_locations and mounted_on_fileid attributes. Use this bitmap for encoding
    fs_locations requests.
    Note: We can probably do better by requesting locations as part of fsinfo
    itself.

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • Per referral draft, only fs_locations, fsid, and mounted_on_fileid can be
    requested in a GETATTR on referrals.

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • It is ignored if fileid is also requested. This will be used on referrals
    (fs_locations).

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • Use component4-style formats for decoding list of servers and pathnames in
    fs_locations.

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik
     
  • NFSv4 allows for the fact that filesystems may be replicated across
    several servers or that they may be migrated to a backup server in case of
    failure of the primary server.
    fs_locations is an NFSv4 operation for retrieving information about the
    location of migrated and/or replicated filesystems.

    Based on an initial implementation by Jiaying Zhang
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Make automounted partitions expire using the mark_mounts_for_expiry()
    function. The timeout is controlled via a sysctl.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • This should enable us to detect if we are crossing a mountpoint in the
    case where the server is exporting "nohide" mounts.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Allow filesystems to decide to perform pre-umount processing whether or not
    MNT_FORCE is set.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Allow a submount to be marked as being 'shrinkable' by means of the
    vfsmount->mnt_flags, and then add a function 'shrink_submounts()' which
    attempts to recursively unmount these submounts.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Replace all module uses with the new vfs_kern_mount() interface, and fix up
    simple_pin_fs().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • do_kern_mount() does not allow the kernel to use private mount interfaces
    without exposing the same interfaces to userland. The problem is that the
    filesystem is referenced by name, which means that it and its mount
    interface must be registered in the global filesystem list.

    vfs_kern_mount() passes the struct file_system_type as an explicit
    parameter in order to overcome this limitation.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
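
The difference between the two interfaces can be illustrated with a small userspace sketch. Everything here (struct fs_type, register_fs(), mount_by_name(), mount_by_type(), demo_mount()) is a hypothetical stand-in and not the kernel API; it only shows why a name-based lookup forces registration in a global list, while passing the type explicitly does not.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file_system_type. */
struct fs_type {
    const char *name;
    int (*mount)(const char *dev);
    struct fs_type *next;
};

/* Global list: anything placed here is reachable by name from anywhere. */
static struct fs_type *registered;

void register_fs(struct fs_type *t)
{
    t->next = registered;
    registered = t;
}

/* do_kern_mount()-style: the type is found by name, so it must be registered. */
int mount_by_name(const char *name, const char *dev)
{
    for (struct fs_type *t = registered; t != NULL; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t->mount(dev);
    return -1; /* unknown filesystem name */
}

/* vfs_kern_mount()-style: the caller hands over the type directly. */
int mount_by_type(struct fs_type *t, const char *dev)
{
    return t->mount(dev);
}

/* Hypothetical private filesystem that is never registered by default. */
int demo_mount(const char *dev)
{
    return dev != NULL ? 0 : -1;
}

struct fs_type private_fs = { "private", demo_mount, NULL };
```

In this sketch, mount_by_type(&private_fs, "dev0") works without private_fs ever appearing in the global list, whereas mount_by_name("private", "dev0") fails until register_fs() has been called.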
     
  • Now that we have a real nfs_invalidate_page() to ensure that
    truncate_inode_pages() does the right thing when there are pending dirty
    pages, we can get rid of nfs_delete_inode().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • In the case of a call to truncate_inode_pages(), we should really try to
    cancel any pending writes on the page.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • We just set *acl_len to zero, and attrlen is unsigned, so this comparison
    is clearly bogus. I have no idea what I was thinking.

    Fixes a bug that caused getacl to fail over krb5p.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    J. Bruce Fields
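
The commit message does not show the comparison itself, so the following is only a sketch of the class of bug it describes, under the assumption that an unsigned length was compared against an output length that had just been zeroed. The function names and the buflen parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed buggy shape: the bound is the output length that was just zeroed,
 * so for an unsigned attrlen the test only passes when attrlen == 0. */
int decode_len_buggy(uint32_t attrlen, size_t *acl_len)
{
    *acl_len = 0;
    if (attrlen <= *acl_len) {
        *acl_len = attrlen;
        return 0;
    }
    return -1; /* every non-empty ACL is rejected */
}

/* One possible repair: compare against the real buffer size instead. */
int decode_len_fixed(uint32_t attrlen, size_t buflen, size_t *acl_len)
{
    *acl_len = 0;
    if (attrlen > buflen)
        return -1;
    *acl_len = attrlen;
    return 0;
}
```

With these definitions, decode_len_buggy() accepts only a zero-length attribute, which is the kind of failure an "unsigned compared against zero" check produces.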
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Trond Myklebust

    Alexey Dobriyan
     
  • Fix two errors in the client-side acl cache: First, when nfs3_proc_getacl
    requests only the default acl of a file and the access acl is not cached
    already, a NULL access acl entry is cached instead of ERR_PTR(-EAGAIN)
    ("not cached").

    Second, update the cached acls in nfs3_proc_setacls: nfs_refresh_inode does
    not always invalidate the cached acls, and when it does not, the cached acls
    get out of sync.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Trond Myklebust

    Andreas Gruenbacher
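
The distinction between "not cached" and "cached, but the file has no ACL" can be sketched in userspace as follows. The err_ptr() helper imitates the kernel's ERR_PTR(); struct acl_sketch, struct acl_cache, and the function names are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct acl_sketch { int entries; };

/* Userspace imitation of ERR_PTR(): encode a negative errno in a pointer. */
void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

#define ACL_NOT_CACHED ((struct acl_sketch *)err_ptr(-EAGAIN))

struct acl_cache {
    struct acl_sketch *access; /* ACL_NOT_CACHED, NULL (no ACL), or an ACL */
    struct acl_sketch *dflt;   /* same three states */
};

void acl_cache_init(struct acl_cache *c)
{
    c->access = ACL_NOT_CACHED;
    c->dflt = ACL_NOT_CACHED;
}

/* Update only the slots that were actually fetched; an unfetched slot must
 * keep its sentinel rather than being overwritten with NULL. */
void acl_cache_store(struct acl_cache *c,
                     int got_access, struct acl_sketch *access,
                     int got_default, struct acl_sketch *dflt)
{
    if (got_access)
        c->access = access;
    if (got_default)
        c->dflt = dflt;
}

int acl_is_cached(const struct acl_sketch *slot)
{
    return slot != ACL_NOT_CACHED;
}
```

If only the default ACL is requested, the access slot stays at the sentinel, so a later lookup knows it still has to ask the server.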
     
  • Currently, we are accounting for all calls to nfs_revalidate_inode(), but not
    to nfs_revalidate_mapping(), or nfs_lookup_verify_inode(), etc...

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Separate out the function of revalidating the inode metadata, and
    revalidating the mapping. The former may be called by lookup(),
    and only really needs to check that permissions, ctime, etc. haven't
    changed, whereas the latter need only be done when we want to read data
    from the page cache, and may need to sync and then invalidate the mapping.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Whenever the directory changes, we want to make sure that we always
    invalidate its page cache. Fix up update_changeattr() and
    nfs_mark_for_revalidate() so that they do so.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Fix up a bug in the handling of NFS_INO_REVAL_PAGECACHE: make sure that
    nfs_update_inode() clears it when we're sure we're not racing with other
    updates.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Clean up use of page_array, and fix an off-by-one error noticed by Tom
    Talpey which causes kmalloc calls in cases where using the page_array
    is sufficient.

    Test plan:
    Normal client functional testing with r/wsize=32768.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The XID generator uses get_random_bytes to generate an initial XID.
    NFS_ROOT starts up before the random driver, though, so get_random_bytes
    doesn't set a random XID for NFS_ROOT. This causes NFS_ROOT mount points
    to reuse XIDs every time the client is booted. If the client boots often
    enough, the server will start serving old replies out of its DRC.

    Use net_random() instead.

    Test plan:
    I/O intensive workloads should perform well and generate no errors. Traces
    taken during client reboots should show that NFS_ROOT mounts use unique
    XIDs after every reboot.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Make the RPC client select privileged ephemeral source ports at
    random. This improves DRC behavior on the server by using the
    same port when reconnecting for the same mount point, but using
    a different port for fresh mounts.

    The Linux TCP implementation already does this for nonprivileged
    ports. Note that TCP sockets in TIME_WAIT will prevent quick reuse
    of a random ephemeral port number by leaving the port INUSE until
    the connection transitions out of TIME_WAIT.

    Test plan:
    Connectathon against every known server implementation using multiple
    mount points. Locking especially.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
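
A rough userspace sketch of the port policy described above: pick a random reserved port once per transport and keep it for reconnects. The port range, struct xprt_sketch, and pick_port() are assumptions made for this example, and rand() stands in for whatever random source the kernel uses.

```c
#include <stdlib.h>

/* Assumed reserved-port range for this sketch. */
enum { MIN_RESVPORT = 665, MAX_RESVPORT = 1023 };

/* Hypothetical per-mount transport state. */
struct xprt_sketch {
    unsigned short port; /* 0 means no port has been chosen yet */
};

/* Choose a random reserved port the first time, then keep reusing it so a
 * reconnect for the same mount point presents the same source port. */
unsigned short pick_port(struct xprt_sketch *x)
{
    if (x->port == 0) {
        unsigned range = MAX_RESVPORT - MIN_RESVPORT + 1;
        x->port = (unsigned short)(MIN_RESVPORT + (unsigned)rand() % range);
    }
    return x->port;
}
```

A fresh mount starts with port == 0 and therefore draws a new random port, while a reconnect on an existing transport gets the remembered one.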
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The Linux NFSv4 server violates RFC3530 in that the change attribute is not
    guaranteed to be updated for every change to the inode. Our optimisation
    for checking whether or not the inode metadata has changed is broken
    too. Grr....

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The code that is supposed to zero the uninitialised partial pages when the
    server returns a short read is currently broken: it looks at the nfs_page
    wb_pgbase and wb_bytes fields instead of the equivalent nfs_read_data
    values when deciding where to start truncating the page.

    Also ensure that we are more careful about setting PG_uptodate
    before retrying a short read: the retry will change the nfs_read_data
    args.pgbase and args.count.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
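
The zeroing step can be sketched for a single page as below. This is a simplified illustration, not the NFS client code: struct read_req, SKETCH_PAGE_SIZE, and zero_short_read() are hypothetical, and the offsets are taken from the request as it was sent, which mirrors the point of the fix.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical descriptor of the read as actually sent to the server. */
struct read_req {
    size_t pgbase; /* offset within the page where the read starts */
    size_t count;  /* number of bytes requested */
};

/* Zero the part of the request that the server did not return, so that no
 * uninitialised bytes remain in the requested range of the page. */
void zero_short_read(unsigned char *page, const struct read_req *req,
                     size_t returned)
{
    if (returned >= req->count)
        return; /* not a short read */

    size_t start = req->pgbase + returned;
    size_t end = req->pgbase + req->count;

    if (end > SKETCH_PAGE_SIZE)
        end = SKETCH_PAGE_SIZE; /* this sketch handles one page only */
    if (start < end)
        memset(page + start, 0, end - start);
}
```

Because a retry changes pgbase and count, the zeroing has to use the values of the request that was actually issued, not the original page-level bookkeeping.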
     
  • * 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
    e1000: remove risky prefetch on next_skb->data
    e1000: fix ethtool test irq alloc as "probe"
    [PATCH] bcm43xx: add DMA rx poll workaround to DMA4

    Linus Torvalds
     
  • From: Martin Schwidefsky

    __futex_atomic_op needs to do an atomic operation in the user address space,
    not the kernel address space. Add the missing sacf 256/sacf 0 to switch to
    the secondary mode before doing the compare-and-swap. In addition, add
    another fixup to catch specification exceptions if the compare-and-swap
    address is not aligned.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     
  • Looking at the reiser4 crash, I found a leak in debugfs. In
    debugfs_mknod(), we create the inode before checking if the dentry
    already has one attached. We don't free it if that is the case.

    These bugs happen quite often; I'm starting to think we should disallow
    such coding in CodingStyle.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
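
The leak pattern and its usual cure (check first, allocate afterwards) can be sketched in userspace like this. The types and mknod_sketch() are hypothetical stand-ins, and returning -EEXIST for an already populated dentry is an assumption of this example.

```c
#include <errno.h>
#include <stdlib.h>

struct inode_sketch { int mode; };
struct dentry_sketch { struct inode_sketch *inode; };

/* Check for an existing inode before allocating a new one, so the
 * "already exists" path has nothing to free and nothing to leak. */
int mknod_sketch(struct dentry_sketch *d, int mode)
{
    if (d->inode != NULL)
        return -EEXIST;

    struct inode_sketch *inode = malloc(sizeof(*inode));
    if (inode == NULL)
        return -ENOMEM;

    inode->mode = mode;
    d->inode = inode;
    return 0;
}
```

Allocating before the check would require an explicit free on the early-return path, which is exactly the step that is easy to forget.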
     
  • There's a race between shutting down one io scheduler and firing up the
    next, in which a new io could enter and cause the io scheduler to be
    invoked with bad or NULL data.

    To fix this, we need to maintain the queue lock for a bit longer.
    Unfortunately we cannot do that, since the elevator init must be
    run without the lock held. This isn't easily fixable without also
    changing the mempool API. So split the initialization into two parts,
    an alloc-init operation and an attach operation. Then we can
    preallocate the io scheduler and related structures, and run the attach
    inside the lock after we detach the old one.

    This patch has survived 30 minutes of 1 second io scheduler switching
    with a very busy io load.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
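
The alloc-init/attach split can be sketched in userspace with a mutex standing in for the queue lock. The structures and function names are hypothetical; the point is only that the allocation happens before the lock is taken, and the swap of old and new scheduler happens entirely inside it.

```c
#include <pthread.h>
#include <stdlib.h>

struct sched_sketch {
    const char *name;
    void *data; /* per-scheduler private state */
};

struct queue_sketch {
    pthread_mutex_t lock;
    struct sched_sketch *sched;
};

/* Phase 1: allocate and initialise. May block, so it runs without the lock. */
struct sched_sketch *sched_alloc_init(const char *name)
{
    struct sched_sketch *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->name = name;
    s->data = calloc(1, 64);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    return s;
}

void sched_free(struct sched_sketch *s)
{
    if (s != NULL) {
        free(s->data);
        free(s);
    }
}

/* Phase 2: detach the old scheduler and attach the preallocated one while
 * holding the lock, so no request can observe a half-initialised state. */
int queue_switch_sched(struct queue_sketch *q, const char *name)
{
    struct sched_sketch *fresh = sched_alloc_init(name);
    if (fresh == NULL)
        return -1;

    pthread_mutex_lock(&q->lock);
    struct sched_sketch *old = q->sched;
    q->sched = fresh;
    pthread_mutex_unlock(&q->lock);

    sched_free(old); /* teardown of the old one also happens outside the lock */
    return 0;
}
```

Any code path that reads q->sched under the same lock sees either the old scheduler or the fully initialised new one, never NULL or partially set up data.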
     
  • From: Malcom Parsons

    When scrolling up in SCROLL_PAN_REDRAW mode with a large limited scroll
    region, the bottom few lines have to be redrawn. Without this patch, the
    wrong text is drawn into these lines, corrupting the display.

    Observed in 2.6.14 when running an IRC client in the Nintendo DS linux
    port.

    I haven't tested if scrolling down has the same problem.

    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Malcom Parsons
     
  • From: Ralf Baechle

    <linux/mempolicy.h> uses struct mm_struct and relies on a definition or
    declaration somehow magically being dragged in, which may result in a
    build error:

    [...]
    CC mm/mempolicy.o
    In file included from mm/mempolicy.c:69:
    include/linux/mempolicy.h:150: warning: ‘struct mm_struct’ declared inside parameter list
    include/linux/mempolicy.h:150: warning: its scope is only this definition or declaration, which is probably not what you want
    include/linux/mempolicy.h:175: warning: ‘struct mm_struct’ declared inside parameter list
    mm/mempolicy.c:622: error: conflicting types for ‘do_migrate_pages’
    include/linux/mempolicy.h:175: error: previous declaration of ‘do_migrate_pages’ was here
    mm/mempolicy.c:1661: error: conflicting types for ‘mpol_rebind_mm’
    include/linux/mempolicy.h:150: error: previous declaration of ‘mpol_rebind_mm’ was here
    make[1]: *** [mm/mempolicy.o] Error 1
    make: *** [mm] Error 2
    [ralf@denk linux-ip35]$

    Including the header that defines struct mm_struct would be a step in the
    direction of include hell, so this is fixed by adding a forward
    declaration of struct mm_struct instead.

    Signed-off-by: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralf Baechle
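
A forward declaration is enough whenever a header only passes pointers to a structure around. The sketch below uses hypothetical names (struct mm_sketch, rebind_sketch()) rather than the kernel's, and puts the "header" and "implementation" parts in one file for brevity.

```c
/* "Header" part: a forward declaration gives the struct tag file scope, so
 * prototypes taking a pointer to it compile without the full definition. */
struct mm_sketch;

int rebind_sketch(struct mm_sketch *mm, int node);

/* "Implementation" part: the full definition is only needed where members
 * are actually accessed. */
struct mm_sketch {
    int preferred_node;
};

/* Set a new preferred node and return the previous one. */
int rebind_sketch(struct mm_sketch *mm, int node)
{
    int old = mm->preferred_node;
    mm->preferred_node = node;
    return old;
}
```

Without the forward declaration, the struct tag in the prototype would be scoped to the parameter list alone, which produces the "declared inside parameter list" warning and the later "conflicting types" error.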
     
  • From: Lennert Buytenhek

    The recent renaming of m48t86's ->readb() and ->writeb() platform driver
    methods (2d7b20c1884777e66009be1a533641c19c4705f6) to ->readbyte() and
    ->writebyte() to fix the ia64 build broke the build of the cirrus ep93xx
    ARM platform. This patch fixes it up.

    Signed-off-by: Lennert Buytenhek
    Cc: Alessandro Zummo
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lennert Buytenhek
     
  • From: "Andy Currid"

    This patch fixes a kernel panic during boot that occurs on NVIDIA platforms
    that have HPET enabled.

    When HPET is enabled, the standard timer IRQ is routed to IOAPIC pin 2 and is
    advertised as such in the ACPI APIC table - but an earlier workaround in the
    kernel was ignoring this override. The fix is to honor timer IRQ overrides
    from ACPI when HPET is detected on an NVIDIA platform.

    Signed-off-by: Andy Currid
    Cc: "Brown, Len"
    Cc: "Yu, Luming"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Currid