18 Jul, 2011

2 commits


16 Jul, 2011

10 commits

  • Before the nfs41 client's RECLAIM_COMPLETE is done, the nfs server should
    deny any new locks or opens.

    rfc5661:

    " Whenever a client establishes a new client ID and before it does
    the first non-reclaim operation that obtains a lock, it MUST send a
    RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
    locks to reclaim. If non-reclaim locking operations are done before
    the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. "

    Signed-off-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    Mi Jinlong
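
The rule quoted from rfc5661 can be sketched as a small gate in front of any operation that creates new state. This is only an illustration, not the nfsd code: the struct, the function names, and the status enum are hypothetical stand-ins, and the real server tracks far more per-client state.

```c
#include <stdbool.h>

/* Stand-in status codes for this sketch; not the kernel's nfserr_* values. */
enum grace_status { GRACE_OK = 0, GRACE_ERR_GRACE };

/* Hypothetical per-client state: only the flag this rule depends on. */
struct grace_client {
    bool reclaim_complete_done;
};

/* RECLAIM_COMPLETE handler: record that the client has sent it. */
static void grace_reclaim_complete(struct grace_client *clp)
{
    clp->reclaim_complete_done = true;
}

/*
 * Gate for OPEN/LOCK: this check lets reclaims through, but a request
 * for new state is refused until RECLAIM_COMPLETE has been seen.
 */
static enum grace_status grace_check_new_state(const struct grace_client *clp,
                                               bool is_reclaim)
{
    if (!is_reclaim && !clp->reclaim_complete_done)
        return GRACE_ERR_GRACE;
    return GRACE_OK;
}
```

A client that has nothing to reclaim still has to call the RECLAIM_COMPLETE path once before its first ordinary open or lock passes the gate.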
     
  • From: Miklos Szeredi

    Remove SLAB initialization entirely, as suggested by Bruce and Linus.
    Allocate with __GFP_ZERO instead and only initialize list heads.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: J. Bruce Fields

    Miklos Szeredi
     
  • Check in SEQUENCE that the request doesn't exceed maxreq_sz for the
    given session.

    Signed-off-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    Mi Jinlong
     
  • According to RFC5661, 18.36.3,

    "if the client selects a value for ca_maxresponsesize such that
    a replier on a channel could never send a response, the server
    SHOULD return NFS4ERR_TOOSMALL in the CREATE_SESSION reply."

    So, error out when the client sets a maxreq_sz less than the minimum
    possible SEQUENCE request size, or sets a maxresp_sz less than the
    minimum possible SEQUENCE reply size.

    Signed-off-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    Mi Jinlong
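
The two size checks in this commit and the previous one might look roughly like the sketch below. The minimum sizes are passed in as parameters because their real values are not given here; the function names and the status enum are hypothetical, and the "too big" status is a stand-in for whatever error the server actually returns.

```c
#include <stdint.h>

/* Stand-in status codes for this sketch; not the kernel's nfserr_* values. */
enum sess_status { SESS_OK = 0, SESS_ERR_TOOSMALL, SESS_ERR_TOO_BIG };

/* Hypothetical subset of the negotiated channel attributes. */
struct sess_channel {
    uint32_t maxreq_sz;
    uint32_t maxresp_sz;
};

/*
 * CREATE_SESSION: refuse limits so small that not even a bare SEQUENCE
 * request or reply could fit. min_req and min_resp are assumptions
 * supplied by the caller.
 */
static enum sess_status sess_check_create(const struct sess_channel *ca,
                                          uint32_t min_req, uint32_t min_resp)
{
    if (ca->maxreq_sz < min_req || ca->maxresp_sz < min_resp)
        return SESS_ERR_TOOSMALL;
    return SESS_OK;
}

/* SEQUENCE: refuse a request larger than the session's negotiated limit. */
static enum sess_status sess_check_sequence(const struct sess_channel *ca,
                                            uint32_t request_len)
{
    if (request_len > ca->maxreq_sz)
        return SESS_ERR_TOO_BIG;
    return SESS_OK;
}
```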
     
  • Stateids hold a read reference for a read open, a write reference for a
    write open, and an additional one of each for each read+write open. The
    latter wasn't getting put on a downgrade, so something like:

    open RW
    open R
    downgrade to R

    was resulting in a file leak.

    Also fix an imbalance in an error path.

    Regression from 7d94784293096c0a46897acdb83be5abd9278ece "nfsd4: fix
    downgrade/lock logic".

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Without this, for example,

    open read
    open read+write
    close

    will result in a struct file leak.

    Regression from 7d94784293096c0a46897acdb83be5abd9278ece "nfsd4: fix
    downgrade/lock logic".

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
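
The reference counting behind these two fixes can be modelled in a few lines. This is a simplified userspace model, not the nfsd4 data structures: the type and function names are hypothetical, and the only behaviour it shows is that a read+write open takes one read and one write reference, and that downgrade and close must give both back.

```c
/* Share access modes; ACC_BOTH is a read+write open. */
enum { ACC_READ = 1, ACC_WRITE = 2, ACC_BOTH = 3 };

/* Hypothetical file object: counts of outstanding read/write references. */
struct ref_file {
    int read_refs;
    int write_refs;
};

/* Hypothetical stateid: bit N of 'held' is set while mode N is held. */
struct ref_stateid {
    struct ref_file *file;
    unsigned int held;
};

static void ref_get_access(struct ref_file *f, int mode)
{
    if (mode & ACC_READ)
        f->read_refs++;
    if (mode & ACC_WRITE)
        f->write_refs++;
}

static void ref_put_access(struct ref_file *f, int mode)
{
    if (mode & ACC_READ)
        f->read_refs--;
    if (mode & ACC_WRITE)
        f->write_refs--;
}

/* Open: take references only the first time a given mode is seen. */
static void ref_open(struct ref_stateid *st, int mode)
{
    if (st->held & (1u << mode))
        return;
    ref_get_access(st->file, mode);
    st->held |= 1u << mode;
}

/* Downgrade: drop every held mode that needs access outside 'to'. */
static void ref_downgrade(struct ref_stateid *st, int to)
{
    for (int mode = ACC_READ; mode <= ACC_BOTH; mode++) {
        if (!(st->held & (1u << mode)))
            continue;
        if ((mode & ~to) == 0)
            continue;
        ref_put_access(st->file, mode);
        st->held &= ~(1u << mode);
    }
}

/* Close: equivalent to downgrading to no access at all. */
static void ref_close(struct ref_stateid *st)
{
    ref_downgrade(st, 0);
}
```

In this model both sequences from the commit messages end with zero references after the final close, which is the invariant the leaks violated.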
     
  • This operation is used by the client to check the validity of a list of
    stateids.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: J. Bruce Fields

    Bryan Schumaker
     
  • This operation is used by the client to tell the server to free a
    stateid.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: J. Bruce Fields

    Bryan Schumaker
     
  • As promised in feature-removal-schedule.txt it is time to
    remove the nfsctl system call.

    Userspace has preferred to not use this call throughout 2.6 and it has been
    excluded in the default configuration since 2.6.36 (9 months ago).

    So this patch removes all the code that was being compiled out.

    There are still references to sys_nfsctl in various arch systemcall tables
    and related code. These should be cleaned out too, probably in the next
    merge window.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as
    the client ID derived from the session ID of SEQUENCE is not the same
    as the client ID to be destroyed. If the client IDs are the same,
    then the server MUST return NFS4ERR_CLIENTID_BUSY.

    (that's not implemented yet)

    If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only
    operation in the COMPOUND request (otherwise, the server MUST return
    NFS4ERR_NOT_ONLY_OP).

    This fixes the error return; before, we returned
    NFS4ERR_OP_NOT_IN_SESSION; after this patch, we return NFS4ERR_NOTSUPP.

    Signed-off-by: Benny Halevy
    Signed-off-by: J. Bruce Fields

    Benny Halevy
     

14 Jul, 2011

1 commit


12 Jul, 2011

3 commits


10 Jul, 2011

2 commits

  • Regression introduced in commit 724d9f1cfba.

    Prior to that, expand_dfs_referral would regenerate the mount data string
    and then call cifs_parse_mount_options to re-parse it (klunky, but it
    worked). The above commit moved cifs_parse_mount_options out of cifs_mount,
    so the re-parsing of the new mount options no longer occurred. Fix it by
    making expand_dfs_referral re-parse the mount options.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • This needs to be done regardless of whether that KConfig option is set
    or not.

    Reported-by: Sven-Haegar Koch
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     

09 Jul, 2011

2 commits


08 Jul, 2011

2 commits

  • Signed-off-by: Jeff Layton
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Jeff Layton
     
  • Add an FS-Cache helper to bulk uncache pages on an inode. This will
    only work for the circumstance where the pages in the cache correspond
    1:1 with the pages attached to an inode's page cache.

    This is required for CIFS and NFS: When disabling inode cookie, we were
    returning the cookie and setting cifsi->fscache to NULL but failed to
    invalidate any previously mapped pages. This resulted in "Bad page
    state" errors and manifested in other kinds of errors when running
    fsstress. Fix it by uncaching mapped pages when we disable the inode
    cookie.

    This patch should fix the following oops and "Bad page state" errors
    seen during fsstress testing.

    ------------[ cut here ]------------
    kernel BUG at fs/cachefiles/namei.c:201!
    invalid opcode: 0000 [#1] SMP
    Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
    RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
    RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282
    RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
    RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
    R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
    R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
    FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
    Stack:
    0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
    ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
    ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
    Call Trace:
    cachefiles_lookup_object+0x78/0xd4 [cachefiles]
    fscache_lookup_object+0x131/0x16d [fscache]
    fscache_object_work_func+0x1bc/0x669 [fscache]
    process_one_work+0x186/0x298
    worker_thread+0xda/0x15d
    kthread+0x84/0x8c
    kernel_thread_helper+0x4/0x10
    RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles]
    ---[ end trace 1d481c9af1804caa ]---

    I tested the uncaching by the following means:

    (1) Create a big file on my NFS server (104857600 bytes).

    (2) Read the file into the cache with md5sum on the NFS client. Look in
    /proc/fs/fscache/stats:

    Pages : mrk=25601 unc=0

    (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc
    again:

    Pages : mrk=25601 unc=25601

    Reported-by: Jeff Layton
    Signed-off-by: David Howells
    Reviewed-and-Tested-by: Suresh Jayaraman
    cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    David Howells
     

07 Jul, 2011

9 commits

  • We need to make sure the data relocation inode doesn't go through
    the delayed metadata updates, otherwise we get an oops during balance:

    kernel BUG at fs/btrfs/relocation.c:4303!
    [SNIP]
    Call Trace:
    ? update_ref_for_cow+0x22d/0x330 [btrfs]
    __btrfs_cow_block+0x451/0x5e0 [btrfs]
    ? read_block_for_search+0x14d/0x4d0 [btrfs]
    btrfs_cow_block+0x10b/0x240 [btrfs]
    btrfs_search_slot+0x49e/0x7a0 [btrfs]
    btrfs_lookup_inode+0x2f/0xa0 [btrfs]
    ? mutex_lock+0x1e/0x50
    btrfs_update_delayed_inode+0x71/0x160 [btrfs]
    ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
    btrfs_run_delayed_items+0xe8/0x120 [btrfs]
    btrfs_commit_transaction+0x250/0x850 [btrfs]
    ? find_get_pages+0x39/0x130
    ? join_transaction+0x25/0x250 [btrfs]
    ? wake_up_bit+0x40/0x40
    prepare_to_relocate+0xda/0xf0 [btrfs]
    relocate_block_group+0x4b/0x620 [btrfs]
    ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
    btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
    ? btrfs_tree_unlock+0x50/0x50 [btrfs]
    btrfs_relocate_chunk+0x8b/0x670 [btrfs]
    ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
    ? read_extent_buffer+0xd8/0x1d0 [btrfs]
    ? btrfs_previous_item+0xb1/0x150 [btrfs]
    ? read_extent_buffer+0xd8/0x1d0 [btrfs]
    btrfs_balance+0x21a/0x2b0 [btrfs]
    btrfs_ioctl+0x798/0xd20 [btrfs]
    ? handle_mm_fault+0x148/0x270
    ? do_page_fault+0x1d8/0x4b0
    do_vfs_ioctl+0x9a/0x540
    sys_ioctl+0xa1/0xb0
    system_call_fastpath+0x16/0x1b
    [SNIP]
    RIP btrfs_reloc_cow_block+0x22c/0x270 [btrfs]

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • A user reported an error where if we try to balance an fs after a device has
    been removed it will blow up. This is because we get an EIO back and this is
    where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks,

    Reported-by: Anand Jain
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • There are three user-settable mount options that are not currently
    displayed in mount output.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • When inodes are marked stale in a transaction, they are treated
    specially when the inode log item is being inserted into the AIL.
    It tries to avoid moving the log item forward in the AIL due to a
    race condition with writing the underlying buffer back to disk.
    This was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes
    in the AIL").

    To avoid moving the item forward, we return a LSN smaller than the
    commit_lsn of the completing transaction, thereby trying to trick
    the commit code into not moving the inode forward at all. I'm not
    sure this ever worked as intended - it assumes the inode is already
    in the AIL, but I don't think the returned LSN would have been small
    enough to prevent moving the inode. It appears that the reason it
    worked is that the lower LSN of the inodes meant they were inserted
    into the AIL and flushed before the inode buffer (which was moved to
    the commit_lsn of the transaction).

    The big problem is that with delayed logging, the returning of the
    different LSN means insertion takes the slow, non-bulk path. Worse
    yet is that insertion is to a position -before- the commit_lsn so it
    is doing an AIL traversal on every insertion, and has to walk over
    all the items that have already been inserted into the AIL. It's
    expensive.

    To compound the matter further, with delayed logging inodes are
    likely to go from clean to stale in a single checkpoint, which means
    they aren't even in the AIL at all when we come across them at AIL
    insertion time. Hence these were all getting inserted into the AIL
    when they simply do not need to be as inodes marked XFS_ISTALE are
    never written back.

    Transactional/recovery integrity is maintained in this case by the
    other items in the unlink transaction that were modified (e.g. the
    AGI btree blocks) and committed in the same checkpoint.

    So to fix this, simply unpin the stale inodes directly in
    xfs_inode_item_committed() and return -1 to indicate that the AIL
    insertion code does not need to do any further processing of these
    inodes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • ...as that makes for a cumbersome interface. Make it take a regular
    smb_vol pointer and rely on the caller to zero it out if needed.

    Signed-off-by: Jeff Layton
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Jeff Layton
     
  • Regression introduced by commit f87d39d9513.

    Signed-off-by: Jeff Layton
    Reviewed-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Jeff Layton
     
  • This call to cifs_cleanup_volume_info is clearly wrong. As soon as it's
    called the following call to cifs_get_tcp_session will oops as the
    volume_info pointer will then be NULL.

    The caller of cifs_mount should clean up this data since it passed it
    in. There's no need for us to call this here.

    Regression introduced by commit 724d9f1cfba.

    Reported-by: Adam Williamson
    Cc: Pavel Shilovsky
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • The shdr4extnum variable isn't being freed in the cleanup process of
    elf_fdpic_core_dump().

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • locks_alloc_lock() assumed that the allocated struct file_lock is
    already initialized with all members zeroed. This is only true for the
    first allocation of the structure; after reuse, some of the members will
    have random values.

    This will for example result in passing random fl_start values to
    userspace in fuse for FL_FLOCK locks, which is an information leak at
    best.

    Fix by reinitializing those members which may be non-zero after freeing.

    Signed-off-by: Miklos Szeredi
    CC: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
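
The failure mode is easy to reproduce with any allocator that recycles objects. The sketch below is a userspace model with a hypothetical free list standing in for the slab cache; the names are invented for illustration, and resetting the fields at allocation time is just one way to express "reinitialize the members that may be non-zero after freeing".

```c
#include <stdlib.h>

/* Hypothetical lock object with a few fields that callers overwrite. */
struct lk_lock {
    long start;
    long end;
    int type;
    struct lk_lock *next_free;   /* links recycled objects together */
};

/* Recycled objects wait here; they still carry their old field values. */
static struct lk_lock *lk_free_list;

/* Reset the fields that may be left non-zero by a previous user. */
static void lk_init_always(struct lk_lock *fl)
{
    fl->start = 0;
    fl->end = 0;
    fl->type = 0;
}

/* Allocate: reuse a recycled object if possible, and always reset it. */
static struct lk_lock *lk_alloc(void)
{
    struct lk_lock *fl = lk_free_list;

    if (fl)
        lk_free_list = fl->next_free;
    else
        fl = calloc(1, sizeof(*fl));

    if (fl) {
        fl->next_free = NULL;
        lk_init_always(fl);
    }
    return fl;
}

/* Free: push the object back without clearing it, as a cache would. */
static void lk_free(struct lk_lock *fl)
{
    fl->next_free = lk_free_list;
    lk_free_list = fl;
}
```

Without the lk_init_always() call, the second allocation would hand back the previous user's start and type values, which is the kind of stale data the commit describes leaking to userspace.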
     

06 Jul, 2011

2 commits


02 Jul, 2011

1 commit

  • Benjamin S. reported that he was unable to suspend his machine while
    it had a cifs share mounted. The freezer caused this to spew when he
    tried it:

    -----------------------[snip]------------------
    PM: Syncing filesystems ... done.
    Freezing user space processes ... (elapsed 0.01 seconds) done.
    Freezing remaining freezable tasks ...
    Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
    cifsd S ffff880127f7b1b0 0 1821 2 0x00800000
    ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
    ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
    ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
    Call Trace:
    ? sk_reset_timer+0xf/0x19
    ? tcp_connect+0x43c/0x445
    ? tcp_v4_connect+0x40d/0x47f
    ? schedule_timeout+0x21/0x1ad
    ? _raw_spin_lock_bh+0x9/0x1f
    ? release_sock+0x19/0xef
    ? inet_stream_connect+0x14c/0x24a
    ? autoremove_wake_function+0x0/0x2a
    ? ipv4_connect+0x39c/0x3b5 [cifs]
    ? cifs_reconnect+0x1fc/0x28a [cifs]
    ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
    ? perf_event_exit_task+0xb9/0x1bf
    ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
    ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
    ? kthread+0x7a/0x82
    ? kernel_thread_helper+0x4/0x10
    ? kthread+0x0/0x82
    ? kernel_thread_helper+0x0/0x10

    Restarting tasks ... done.
    -----------------------[snip]------------------

    We do attempt to perform a try_to_freeze in cifs_reconnect, but the
    connection attempt itself seems to be taking longer than 20s to time
    out. The connect timeout is governed by the socket send and receive
    timeouts, so we can shorten that period by setting those timeouts
    before attempting the connect instead of after.

    Adam Williamson tested the patch and said that it seems to have fixed
    suspending on his laptop when a cifs share is mounted.

    Reported-by: Benjamin S
    Tested-by: Adam Williamson
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
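
The ordering change can be illustrated from userspace with the ordinary sockets API. This is a sketch of the idea only, not the cifs code: the helper names are hypothetical, and it relies on the behaviour stated in the commit message, that the connect timeout follows the socket's send and receive timeouts.

```c
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: apply one timeout to both send and receive. */
static int sock_set_timeouts(int fd, long seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    return 0;
}

/*
 * Set the timeouts first and connect second, so the shorter timeout is
 * already in place while connect() is blocked.
 */
static int sock_connect_with_timeout(int fd, const struct sockaddr *addr,
                                     socklen_t addrlen, long seconds)
{
    if (sock_set_timeouts(fd, seconds) < 0)
        return -1;
    return connect(fd, addr, addrlen);
}
```

Setting the options after connect() would leave the connection attempt running under the default timeout, which matches the slow path seen in the freezer output above.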
     

30 Jun, 2011

2 commits


29 Jun, 2011

2 commits

  • In current pnfs tree, all the layouts set mds_offset in their
    .write_pagelist member.
    mds_offset is only used by generic layer and should be handled by it.

    This patch is for upstream. It is needed in this -rc series to fix a
    bug in objects layout_commit.

    I'll send patches for objects and blocks to be
    squashed into current pnfs tree.

    TODO: It looks like the read path needs the same patch.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • /proc/PID/io may be used for gathering private information. E.g. for
    openssh and vsftpd daemons wchars/rchars may be used to learn the
    precise password length. Restrict it to processes being able to ptrace
    the target process.

    ptrace_may_access() is needed to prevent keeping an open file descriptor
    of the "io" file, executing a setuid binary, and gathering the io
    information of the setuid'ed process.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

28 Jun, 2011

2 commits

  • Under heavy memory and filesystem load, users observe the assertion
    mapping->nrpages == 0 in end_writeback() trigger. This can be caused by
    page reclaim reclaiming the last page from a mapping in the following
    race:

    CPU0                              CPU1
      ...
      shrink_page_list()
        __remove_mapping()
          __delete_from_page_cache()
            radix_tree_delete()
                                      evict_inode()
                                        truncate_inode_pages()
                                          truncate_inode_pages_range()
                                            pagevec_lookup() - finds nothing
                                        end_writeback()
                                          mapping->nrpages != 0 -> BUG
            page->mapping = NULL
            mapping->nrpages--

    Fix the problem by doing a reliable check of mapping->nrpages under
    mapping->tree_lock in end_writeback().

    Analyzed by Jay, lost in LKML, and dug out
    by Miklos Szeredi.

    Cc: Jay
    Cc: Miklos Szeredi
    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • romfs_get_unmapped_area() checks argument `len' without considering
    PAGE_ALIGN, which will cause do_mmap_pgoff() to return an -EINVAL error
    after commit f67d9b1576c ("nommu: add page_align to mmap").

    Fix the check by changing it in the same way that
    ramfs_nommu_get_unmapped_area() was changed in ramfs/file-nommu.c.

    Signed-off-by: Bob Liu
    Cc: David Howells
    Cc: Paul Mundt
    Acked-by: Greg Ungerer
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
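
The difference between a byte-based and a page-based range check can be shown with a little arithmetic. This is a sketch under assumptions: the 4096-byte page size, the helper names, and the exact shape of the byte-based check are all illustrative and not taken from romfs.

```c
#include <stdbool.h>

/* Assumed page size for the sketch (4096 bytes). */
#define ROMFS_SKETCH_PAGE_SHIFT 12
#define ROMFS_SKETCH_PAGE_SIZE  (1UL << ROMFS_SKETCH_PAGE_SHIFT)

/* Number of pages needed to hold 'bytes', rounding up. */
static unsigned long romfs_sketch_pages(unsigned long bytes)
{
    return (bytes + ROMFS_SKETCH_PAGE_SIZE - 1) >> ROMFS_SKETCH_PAGE_SHIFT;
}

/*
 * Assumed shape of a byte-based check: it rejects a length that has
 * already been rounded up to a page multiple past the file size.
 */
static bool romfs_sketch_fits_bytes(unsigned long isize, unsigned long pgoff,
                                    unsigned long len)
{
    unsigned long offset = pgoff << ROMFS_SKETCH_PAGE_SHIFT;

    return offset <= isize && len <= isize - offset;
}

/* Page-based check: compare page counts so a rounded-up len still fits. */
static bool romfs_sketch_fits_pages(unsigned long isize, unsigned long pgoff,
                                    unsigned long len)
{
    unsigned long lpages = romfs_sketch_pages(len);
    unsigned long maxpages = romfs_sketch_pages(isize);

    return pgoff < maxpages && maxpages - pgoff >= lpages;
}
```

For a 5000-byte file mapped from offset 0, a caller that page-aligns the length asks for 8192 bytes. The byte-based check refuses that request, while the page-based check accepts it because both sides round up to two pages.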