25 Oct, 2011

8 commits

  • * 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (26 commits)
    Check validity of cl_rpcclient in nfs_server_list_show
    NFS: Get rid of the nfs_rdata_mempool
    NFS: Don't rely on PageError in nfs_readpage_release_partial
    NFS: Get rid of unnecessary calls to ClearPageError() in read code
    NFS: Get rid of nfs_restart_rpc()
    NFS: Get rid of the unused nfs_write_data->flags field
    NFS: Get rid of the unused nfs_read_data->flags field
    NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookup
    NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
    NFS: Use the inode->i_version to cache NFSv4 change attribute information
    SUNRPC: Remove unnecessary export of rpc_sockaddr2uaddr
    SUNRPC: Fix rpc_sockaddr2uaddr
    nfs/super.c: local functions should be static
    pnfsblock: fix writeback deadlock
    pnfsblock: fix NULL pointer dereference
    pnfs: recoalesce when ld read pagelist fails
    pnfs: recoalesce when ld write pagelist fails
    pnfs: make _set_lo_fail generic
    pnfsblock: add missing rpc_put_mount and path_put
    SUNRPC/NFS: make rpc pipe upcall generic
    ...

    Linus Torvalds
     
  • * 'for-3.2' of git://linux-nfs.org/~bfields/linux: (103 commits)
    nfs41: implement DESTROY_CLIENTID operation
    nfsd4: typo logical vs bitwise negate for want_mask
    nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED
    nfsd4: seq->status_flags may be used unitialized
    nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid
    nfsd4: implement new 4.1 open reclaim types
    nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround
    nfsd4: warn on open failure after create
    nfsd4: preallocate open stateid in process_open1()
    nfsd4: do idr preallocation with stateid allocation
    nfsd4: preallocate nfs4_file in process_open1()
    nfsd4: clean up open owners on OPEN failure
    nfsd4: simplify process_open1 logic
    nfsd4: make is_open_owner boolean
    nfsd4: centralize renew_client() calls
    nfsd4: typo logical vs bitwise negate
    nfs: fix bug about IPv6 address scope checking
    nfsd4: more robust ignoring of WANT bits in OPEN
    nfsd4: move name-length checks to xdr
    nfsd4: move access/deny validity checks to xdr code
    ...

    Linus Torvalds
     
    In commit 8a9ea3237e7e ("Merge git://.../davem/net-next"), where my sysfs
    changes from the net tree were merged with the sysfs rbtree changes from
    Mikulas Patocka, the conflict resolution failed to preserve the
    simplified property that was the point of my changes.

    That is, sysfs_find_dirent can now say something is a match if and only
    if s_name and s_ns match what we are looking for, and sysfs_readdir can
    simply return all of the directory entries where s_ns matches the
    directory that we should be returning.

    Now that we are back to exact matches, we can tweak sysfs_find_dirent and
    the name rb_tree to order sysfs_dirents by s_ns and then s_name, and remove
    the second loop in sysfs_find_dirent. However, that change seems a bit much
    for a conflict resolution, so it can come later.
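    A minimal userspace sketch of the ordering described above, assuming a
    hypothetical struct sd_sketch that keeps only the two lookup fields (it is
    not the kernel's struct sysfs_dirent). Entries are ordered by namespace tag
    first and then by name; a sorted array with binary search stands in for the
    rb_tree, so a single exact (s_ns, s_name) search replaces the second loop.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a sysfs directory entry: only the two
 * fields used for lookup are modelled. A NULL s_ns means "untagged". */
struct sd_sketch {
	const void *s_ns;   /* namespace tag, compared by identity */
	const char *s_name; /* entry name */
};

/* Order by namespace tag first, then by name. Tags are opaque
 * pointers, so they are compared by address via uintptr_t. */
static int sd_cmp(const void *ns, const char *name,
		  const struct sd_sketch *sd)
{
	uintptr_t a = (uintptr_t)ns, b = (uintptr_t)sd->s_ns;

	if (a != b)
		return a < b ? -1 : 1;
	return strcmp(name, sd->s_name);
}

/* Exact-match lookup over an array kept sorted with sd_cmp: a hit
 * requires both the tag and the name to be equal. */
static const struct sd_sketch *sd_find(const struct sd_sketch *v, size_t n,
				       const void *ns, const char *name)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = sd_cmp(ns, name, &v[mid]);

		if (c == 0)
			return &v[mid];
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```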

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
    dp83640: free packet queues on remove
    dp83640: use proper function to free transmit time stamping packets
    ipv6: Do not use routes from locally generated RAs
    |PATCH net-next] tg3: add tx_dropped counter
    be2net: don't create multiple RX/TX rings in multi channel mode
    be2net: don't create multiple TXQs in BE2
    be2net: refactor VF setup/teardown code into be_vf_setup/clear()
    be2net: add vlan/rx-mode/flow-control config to be_setup()
    net_sched: cls_flow: use skb_header_pointer()
    ipv4: avoid useless call of the function check_peer_pmtu
    TCP: remove TCP_DEBUG
    net: Fix driver name for mdio-gpio.c
    ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
    rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
    ipv4: fix ipsec forward performance regression
    jme: fix irq storm after suspend/resume
    route: fix ICMP redirect validation
    net: hold sock reference while processing tx timestamps
    tcp: md5: add more const attributes
    Add ethtool -g support to virtio_net
    ...

    Fix up conflicts in:
     - drivers/net/Kconfig:
         The split-up generated a trivial conflict with removal of a
         stale reference to Documentation/networking/net-modules.txt.
         Remove it from the new location instead.
     - fs/sysfs/dir.c:
         Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
         with Eric Biederman's changes for tagged directories.

    Linus Torvalds
     
  • * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
    mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
    Revert "memory hotplug: Correct page reservation checking"
    Update email address for stable patch submission
    dynamic_debug: fix undefined reference to `__netdev_printk'
    dynamic_debug: use a single printk() to emit messages
    dynamic_debug: remove num_enabled accounting
    dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
    uio: Support physical addresses >32 bits on 32-bit systems
    sysfs: add unsigned long cast to prevent compile warning
    drivers: base: print rejected matches with DEBUG_DRIVER
    memory hotplug: Correct page reservation checking
    memory hotplug: Refuse to add unaligned memory regions
    remove the messy code file Documentation/zh_CN/SubmitChecklist
    ARM: mxc: convert device creation to use platform_device_register_full
    new helper to create platform devices with dma mask
    docs/driver-model: Update device class docs
    docs/driver-model: Document device.groups
    kobj_uevent: Ignore if some listeners cannot handle message
    dynamic_debug: make netif_dbg() call __netdev_printk()
    dynamic_debug: make netdev_dbg() call __netdev_printk()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
    MAINTAINERS: linux-m32r is moderated for non-subscribers
    linux@lists.openrisc.net is moderated for non-subscribers
    Drop default from "DM365 codec select" choice
    parisc: Kconfig: cleanup Kernel page size default
    Kconfig: remove redundant CONFIG_ prefix on two symbols
    cris: remove arch/cris/arch-v32/lib/nand_init.S
    microblaze: add missing CONFIG_ prefixes
    h8300: drop puzzling Kconfig dependencies
    MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
    tty: drop superfluous dependency in Kconfig
    ARM: mxc: fix Kconfig typo 'i.MX51'
    Fix file references in Kconfig files
    aic7xxx: fix Kconfig references to READMEs
    Fix file references in drivers/ide/
    thinkpad_acpi: Fix printk typo 'bluestooth'
    bcmring: drop commented out line in Kconfig
    btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
    doc: raw1394: Trivial typo fix
    CIFS: Don't free volume_info->UNC until we are entirely done with it.
    treewide: Correct spelling of successfully in comments
    ...

    Linus Torvalds
     
  • * 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits)
    TOMOYO: Fix incomplete read after seek.
    Smack: allow to access /smack/access as normal user
    TOMOYO: Fix unused kernel config option.
    Smack: fix: invalid length set for the result of /smack/access
    Smack: compilation fix
    Smack: fix for /smack/access output, use string instead of byte
    Smack: domain transition protections (v3)
    Smack: Provide information for UDS getsockopt(SO_PEERCRED)
    Smack: Clean up comments
    Smack: Repair processing of fcntl
    Smack: Rule list lookup performance
    Smack: check permissions from user space (v2)
    TOMOYO: Fix quota and garbage collector.
    TOMOYO: Remove redundant tasklist_lock.
    TOMOYO: Fix domain transition failure warning.
    TOMOYO: Remove tomoyo_policy_memory_lock spinlock.
    TOMOYO: Simplify garbage collector.
    TOMOYO: Fix make namespacecheck warnings.
    target: check hex2bin result
    encrypted-keys: check hex2bin result
    ...

    Linus Torvalds
     
  • David S. Miller
     

24 Oct, 2011

5 commits


21 Oct, 2011

1 commit


20 Oct, 2011

7 commits

    sysfs is a core piece of infrastructure that many people use and
    few people have all of the rules in their head on how to use
    it correctly. Add warnings for people using tagged directories
    improperly so that any misuses can be caught and diagnosed quickly.

    A single inexpensive test in sysfs_find_dirent is almost sufficient
    to catch all possible misuses. An additional warning is needed
    in sysfs_add_dirent so that we actually fail when attempting to
    add an untagged dirent in a tagged directory.
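    A userspace sketch of the kind of check described above, under stated
    assumptions: struct dir_sketch, ns_usage_ok(), find_allowed() and
    add_entry() are hypothetical, and a fprintf() to stderr stands in for the
    kernel warning. The one cheap test is that a directory's taggedness must
    agree with whether the caller passed a namespace; the add path repeats it
    so that the operation actually fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: a tagged directory requires a namespace tag on
 * every child, an untagged directory forbids one. */
struct dir_sketch {
	const char *name;
	bool tagged;
};

/* The single inexpensive test: taggedness must match ns != NULL. */
static bool ns_usage_ok(const struct dir_sketch *dir, const void *ns)
{
	return dir->tagged == (ns != NULL);
}

/* Lookup side: warn and report "not found" on misuse. */
static bool find_allowed(const struct dir_sketch *dir, const void *ns,
			 const char *name)
{
	if (!ns_usage_ok(dir, ns)) {
		fprintf(stderr, "sysfs sketch: ns %s in '%s' for '%s'\n",
			ns ? "set" : "NULL", dir->name, name);
		return false;
	}
	return true;
}

/* Add side: the additional check, so adding an untagged entry to a
 * tagged directory (or the reverse) fails instead of succeeding. */
static int add_entry(const struct dir_sketch *dir, const void *ns,
		     const char *name)
{
	if (!ns_usage_ok(dir, ns)) {
		fprintf(stderr, "sysfs sketch: refusing to add '%s' to '%s'\n",
			name, dir->name);
		return -EINVAL;
	}
	return 0;
}
```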

    Signed-off-by: Eric W. Biederman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Now that /sys/class/net/bonding_masters is implemented as a tagged sysfs
    file we can remove support for untagged files in tagged directories.

    This change removes any ambiguity of what a NULL namespace value
    means. A NULL namespace parameter after this patch means
    that we are talking about an untagged sysfs dirent.

    This makes the sysfs code much less prone to mistakes during
    maintenance.

    Signed-off-by: Eric W. Biederman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
    Looking up files in sysfs is hard to understand and analyze because we
    currently allow placing untagged files in tagged directories. In the
    implementation of that, we have two subtly different meanings of NULL:
    NULL meaning there is no tag on a directory entry, and NULL meaning
    we don't care which namespace the lookup is performed for. These
    multiple uses of NULL have resulted in subtle bugs (since fixed)
    in the code.

    Currently it is only the bonding driver that needs to have an untagged
    file in a tagged directory.

    To untangle this mess, I am adding support for tagged files to sysfs,
    modifying the bonding driver to implement bonding_masters as a tagged
    file registered once for each network namespace, and then removing
    support for untagged entries in tagged sysfs directories.

    The result is code that is much easier to reason about.
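    To make the two meanings of NULL concrete, here is a hedged sketch
    contrasting a lookup rule that overloads NULL (roughly the old behaviour as
    the message describes it) with the exact-match rule. struct ent_sketch,
    match_old() and match_new() are hypothetical illustrations, not sysfs code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: s_ns == NULL means the entry is untagged. */
struct ent_sketch {
	const void *s_ns;
	const char *s_name;
};

/* Ambiguous rule: a NULL lookup ns means "don't care", and a NULL
 * entry tag means "untagged, visible everywhere". */
static bool match_old(const struct ent_sketch *e, const void *ns,
		      const char *name)
{
	if (strcmp(e->s_name, name) != 0)
		return false;
	return ns == NULL || e->s_ns == NULL || e->s_ns == ns;
}

/* Exact rule: NULL only ever means "untagged", so the tag and the
 * name must both match. */
static bool match_new(const struct ent_sketch *e, const void *ns,
		      const char *name)
{
	return e->s_ns == ns && strcmp(e->s_name, name) == 0;
}
```

    With exact matching, a tagged bonding_masters file has to be registered
    once per namespace, because a lookup from one namespace no longer falls
    back to an untagged entry.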

    Signed-off-by: Eric W. Biederman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • We don't need a mempool in order to guarantee reliable NFS read performance.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Don't rely on the PageError flag to tell us if one of the partial reads of
    the page failed. Instead, replace that with a dedicated flag in the
    struct nfs_page.

    Then clean out redundant uses of the PageError flag: the VM no longer
    checks it for reads.
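    A minimal sketch of the idea, assuming a hypothetical per-page struct in
    place of the real flag in struct nfs_page: each partial read records its
    own failure, and the page is marked up to date only when the last partial
    read completes and none of them failed. No PageError bit is involved.

```c
#include <stdbool.h>

/* Hypothetical per-page read state; field names are invented for
 * illustration and do not match the kernel's struct nfs_page. */
struct page_read_sketch {
	unsigned int pending;   /* partial reads still outstanding */
	bool partial_failed;    /* dedicated "a partial read failed" flag */
	bool uptodate;          /* stands in for PageUptodate */
};

/* Called once per completed partial read of the page. */
static void partial_read_done(struct page_read_sketch *p, bool ok)
{
	if (!ok)
		p->partial_failed = true;
	if (--p->pending == 0 && !p->partial_failed)
		p->uptodate = true;   /* every piece succeeded */
}
```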

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The generic file read code does that for us anyway.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • It can trivially be replaced with rpc_restart_call_prepare.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

19 Oct, 2011

19 commits

  • Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • 0c12eaffdf09466f36a9ffe970dda8f4aeb6efc0 "nfsd: don't break lease on
    CLAIM_DELEGATE_CUR" was a temporary workaround for a problem fixed
    properly in the vfs layer by 778fc546f749c588aa2f6cd50215d2715c374252
    "locks: fix tracking of inprogress lease breaks", so we can revert that
    change (but keeping some minor cleanup from that commit).

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
    Both LOOKUP and OPEN operations may return NFS4ERR_BADNAME if we send
    an invalid name as a filename argument. As far as the application is
    concerned, it just has to know that the file doesn't exist, and so
    ENOENT would be the appropriate reply. We should only return EINVAL
    if the filename is being used to _create_ a new object on the
    remote filesystem.
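    The rule above can be sketched as a small mapping function. The function
    name is hypothetical, and the default branch is only a placeholder for the
    client's real status-to-errno table; the NFS4ERR_BADNAME value is taken
    from the NFSv4 protocol specification.

```c
#include <errno.h>
#include <stdbool.h>

#define NFS4ERR_BADNAME 10041   /* NFSv4 protocol error code */

/* Hypothetical helper: a bad name means "no such file" for lookups,
 * and "invalid argument" only when creating a new object. */
static int nfs4_status_to_errno_sketch(int status, bool creating)
{
	switch (status) {
	case 0:
		return 0;
	case NFS4ERR_BADNAME:
		return creating ? -EINVAL : -ENOENT;
	default:
		return -EIO;   /* placeholder for the full mapping table */
	}
}
```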

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • ...and also remove the associated nfs_v4_clientops entry.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • commit ae50c0b5 "pnfs: client stats" added additional information to
    the output of /proc/self/mountstats. The new functions introduced are
    only used in this file and should be marked static.

    If CONFIG_NFS_V4_1 is not defined, empty stub functions are used. If
    CONFIG_NFS_V4 is not defined these stub functions are not used at all.
    Marking the functions static then results in compile warnings:

    fs/nfs/super.c:743: warning: 'show_sessions' defined but not used
    fs/nfs/super.c:756: warning: 'show_pnfs' defined but not used

    Fix this by adding a #ifdef CONFIG_NFS_V4 guard around the two
    show_ functions.
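    A compilable sketch of the guard layout, under assumptions: the
    signatures are simplified to write into a char buffer (the real functions
    take a seq_file and an nfs_server), the output strings are invented, and
    the CONFIG_ symbols are defined by hand to simulate one kernel config. The
    outer CONFIG_NFS_V4 guard means that when NFSv4 is off, neither the real
    functions nor the stubs exist, so no "defined but not used" warning fires.

```c
#include <string.h>

/* Simulated Kconfig state: NFSv4 on, NFSv4.1 off. */
#define CONFIG_NFS_V4 1
/* #define CONFIG_NFS_V4_1 1 */

#ifdef CONFIG_NFS_V4
#ifdef CONFIG_NFS_V4_1
static void show_sessions(char *buf) { strcat(buf, ",sessions"); }
static void show_pnfs(char *buf) { strcat(buf, ",pnfs"); }
#else
/* Empty static stubs, only compiled when NFSv4 is enabled. */
static void show_sessions(char *buf) { (void)buf; }
static void show_pnfs(char *buf) { (void)buf; }
#endif
#endif /* CONFIG_NFS_V4 */

/* Hypothetical caller standing in for the mountstats output code. */
static void show_stats_sketch(char *buf)
{
	strcpy(buf, "nfs");
#ifdef CONFIG_NFS_V4
	show_sessions(buf);
	show_pnfs(buf);
#endif
}
```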

    Signed-off-by: H Hartley Sweeten
    Cc: Trond Myklebust
    Signed-off-by: Trond Myklebust

    H Hartley Sweeten
     
    We should check whether the sector is already initialized before
    trying to grab the page from the page cache. Otherwise, when two
    pages of the same block are written back by two threads, each
    calling from writepage_locked, they can deadlock as shown below.

    [ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds.
    [ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1080.972812] kswapd0 D ffff88000c4926c0 0 25 2 0x00000000
    [ 1080.972816] ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7
    [ 1080.972821] ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440
    [ 1080.972824] ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8
    [ 1080.972828] Call Trace:
    [ 1080.972860] [] ? read_tsc+0x9/0x19
    [ 1080.972877] [] ? lock_page+0x2b/0x2b
    [ 1080.972899] [] io_schedule+0x63/0x7e
    [ 1080.972902] [] sleep_on_page+0xe/0x12
    [ 1080.972905] [] __wait_on_bit_lock+0x46/0x8f
    [ 1080.972916] [] ? lock_release_holdtime.part.7+0x6b/0x72
    [ 1080.972919] [] __lock_page+0x66/0x68
    [ 1080.972928] [] ? autoremove_wake_function+0x3d/0x3d
    [ 1080.972932] [] lock_page+0x27/0x2b
    [ 1080.972934] [] find_lock_page+0x34/0x57
    [ 1080.972937] [] find_or_create_page+0x34/0x8a
    [ 1080.972947] [] bl_write_pagelist+0x205/0x6da [blocklayoutdriver]
    [ 1080.972951] [] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver]
    [ 1080.972995] [] ? nfs_write_rpcsetup+0x118/0x123 [nfs]
    [ 1080.973033] [] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs]
    [ 1080.973089] [] nfs_pageio_doio+0x1a/0x43 [nfs]
    [ 1080.973098] [] nfs_pageio_complete+0x16/0x2d [nfs]
    [ 1080.973108] [] nfs_writepage_locked+0xa0/0xbf [nfs]
    [ 1080.973119] [] nfs_writepage+0x16/0x2b [nfs]
    [ 1080.973122] [] ? clear_page_dirty_for_io+0x87/0x9a
    [ 1080.973133] [] shrink_page_list+0x39b/0x6c8
    [ 1080.973139] [] shrink_inactive_list+0x22c/0x39e
    [ 1080.973144] [] ? lock_release_holdtime.part.7+0x6b/0x72
    [ 1080.973148] [] shrink_zone+0x445/0x588
    [ 1080.973152] [] balance_pgdat+0x2c2/0x56b
    [ 1080.973170] [] ? __bitmap_weight+0x34/0x80
    [ 1080.973175] [] kswapd+0x2be/0x2fa
    [ 1080.973179] [] ? __init_waitqueue_head+0x4b/0x4b
    [ 1080.973183] [] ? balance_pgdat+0x56b/0x56b
    [ 1080.973187] [] kthread+0xa8/0xb0
    [ 1080.973200] [] kernel_thread_helper+0x4/0x10
    [ 1080.973205] [] ? __init_kthread_worker+0x5a/0x5a
    [ 1080.973210] [] ? gs_change+0x13/0x13
    [ 1080.973213] no locks held by kswapd0/25.
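    A sketch of the reordering only, not of the locking itself: the
    hypothetical struct bl_sketch tracks which sectors are already initialized,
    and page_grabs counts how often find_or_create_page() would have been
    called. Consulting the initialized state first means a writeback thread
    does not try to lock a neighbouring page that needs no work.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTORS 64

/* Hypothetical model of block-layout writeback state. */
struct bl_sketch {
	bool initialized[SECTORS];
	unsigned int page_grabs;   /* would-be find_or_create_page() calls */
};

/* Check the sector first; only grab (and lock) a page when the
 * sector really has to be filled in. */
static void prepare_sector(struct bl_sketch *bl, size_t sector)
{
	if (bl->initialized[sector])
		return;
	bl->page_grabs++;          /* stands in for find_or_create_page() */
	bl->initialized[sector] = true;
}
```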

    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
    bl_add_page_to_bio returns an error pointer on failure. bio should be
    reset to NULL in failure cases, as the out path always calls bl_submit_bio.

    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
    For pNFS pagelist read failures, we need to pg_recoalesce and resend the
    I/O to the MDS.

    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
    For pNFS pagelist write failures, we need to pg_recoalesce and resend the
    I/O to the MDS.
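    A hedged sketch of the fallback flow: all names are hypothetical, and the
    pg_recoalesce step plus the resend are collapsed into a single
    resend_through_mds() call. When the layout driver's pagelist operation
    fails, the failure is recorded and the same I/O is sent through the MDS
    instead of being dropped.

```c
#include <stdbool.h>

/* Which path finally carried the I/O. */
enum io_path { IO_NONE, IO_LAYOUT, IO_MDS };

/* Hypothetical page-I/O descriptor. */
struct pageio_sketch {
	bool layout_failed;   /* stands in for the layout I/O failure bit */
	enum io_path done_via;
};

/* Stand-in for the layout driver; ld_ok simulates success. */
static bool ld_write_pagelist(struct pageio_sketch *io, bool ld_ok)
{
	if (ld_ok)
		io->done_via = IO_LAYOUT;
	return ld_ok;
}

/* Stand-in for pg_recoalesce followed by a resend to the MDS. */
static void resend_through_mds(struct pageio_sketch *io)
{
	io->done_via = IO_MDS;
}

static void pnfs_write_sketch(struct pageio_sketch *io, bool ld_ok)
{
	if (io->layout_failed || !ld_write_pagelist(io, ld_ok)) {
		io->layout_failed = true;   /* later I/O skips the layout */
		resend_through_mds(io);
	}
}
```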

    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
    The file layout and block layout drivers both use it to set the layout
    I/O failure bit, so make it generic.
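    A sketch of what a shared helper of this kind could look like, assuming
    that the failure bit is chosen by the segment's I/O mode; the flag names,
    enum and structs are hypothetical. The point is that each layout driver
    calls the one helper instead of carrying its own copy.

```c
/* Hypothetical I/O modes and layout failure bits. */
enum iomode_sketch { IOMODE_READ = 1, IOMODE_RW = 2 };
#define LO_RO_FAILED (1u << 0)
#define LO_RW_FAILED (1u << 1)

struct layout_sketch { unsigned int flags; };
struct lseg_sketch {
	struct layout_sketch *lo;
	enum iomode_sketch iomode;
};

/* Generic helper: set the failure bit matching the segment's mode. */
static void set_lo_fail_sketch(struct lseg_sketch *lseg)
{
	if (lseg->iomode == IOMODE_RW)
		lseg->lo->flags |= LO_RW_FAILED;
	else
		lseg->lo->flags |= LO_RO_FAILED;
}
```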

    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • Reviewed-by: Jeff Layton
    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • The same function is used by idmap, gss and blocklayout code. Make it
    generic.
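    A userspace sketch of what such a shared upcall routine does, under
    assumptions: struct pipe_msg_sketch and the function name are
    hypothetical, and memcpy() stands in for the kernel's copy to the user
    buffer. It copies as much of the remaining message as fits and advances a
    cursor, so a short read resumes where it left off.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message queued for a userspace daemon. */
struct pipe_msg_sketch {
	const char *data;
	size_t len;
	size_t copied;   /* bytes already handed to the reader */
};

/* Copy min(buflen, remaining) bytes and advance; returns bytes copied. */
static size_t pipe_generic_upcall_sketch(struct pipe_msg_sketch *msg,
					 char *dst, size_t buflen)
{
	size_t left = msg->len - msg->copied;
	size_t n = buflen < left ? buflen : left;

	memcpy(dst, msg->data + msg->copied, n);
	msg->copied += n;
	return n;
}
```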

    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • Make the status field explicitly 32 bits. "...it's unlikely that the kernel
    and userspace would differ on the size of an int here, but it might be a
    good idea to go ahead and make that explicitly 32 bits in case we end up
    dealing with more exotic arches at some point in the future."

    Suggested-by: Jeff Layton
    Signed-off-by: Jim Rees
    Signed-off-by: Benny Halevy
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Jim Rees
     
  • Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and
    nfs4_blk_decode_device.

    Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling
    nfs4_blk_get_deviceinfo.
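    A userspace sketch of the error-pointer convention: ERR_PTR, PTR_ERR and
    IS_ERR are re-implemented here to mirror the kernel helpers, and the two
    *_sketch functions are hypothetical analogues of the functions named
    above. Because every failure is encoded as ERR_PTR(-errno), a caller that
    tests for NULL would miss errors; it must test IS_ERR().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins mirroring the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct dev_sketch { int id; };

/* Every failure is reported as ERR_PTR(-errno), never as bare NULL. */
static struct dev_sketch *get_deviceinfo_sketch(int id)
{
	struct dev_sketch *d;

	if (id < 0)
		return ERR_PTR(-EINVAL);
	d = malloc(sizeof(*d));
	if (!d)
		return ERR_PTR(-ENOMEM);
	d->id = id;
	return d;
}

/* Caller: check IS_ERR(), not NULL. */
static int set_layoutdriver_sketch(int id)
{
	struct dev_sketch *d = get_deviceinfo_sketch(id);

	if (IS_ERR(d))
		return (int)PTR_ERR(d);
	free(d);
	return 0;
}
```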

    Signed-off-by: Jim Rees
    Signed-off-by: Benny Halevy
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Jim Rees
     
    nfs_find_and_lock_request will take a reference to the nfs_page and
    will then put it if the req is already locked. It's possible, though,
    that this reference will be the last one. That put can then kick off
    a whole series of reference puts:

    nfs_page
    nfs_open_context
    dentry
    inode

    If the inode ends up being deleted, then the VFS will call
    truncate_inode_pages. That function will try to take the page lock, but
    it was already locked when migrate_page was called. The code
    deadlocks.

    Fix this by simply refusing the migration request if PagePrivate is
    already set, indicating that the page is already associated with an
    active read or write request.
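    A minimal sketch of the fix's decision, with a hypothetical page model in
    which a private flag stands in for PagePrivate. The -EBUSY return value is
    an assumption for illustration. The migration is refused up front, so no
    reference to the request is taken and dropped while the page is locked.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page: 'private' models PagePrivate being set while an
 * NFS read or write request is attached. */
struct page_sketch {
	bool private;
	bool migrated;
};

static int migrate_page_sketch(struct page_sketch *page)
{
	if (page->private)
		return -EBUSY;   /* active request: refuse to migrate */
	page->migrated = true;
	return 0;
}
```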

    We've had a customer test a backported version of this patch and
    the preliminary results seem good.

    Cc: stable@kernel.org
    Cc: Andrea Arcangeli
    Reported-by: Harshula Jayasuriya
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
    The result from ipv6_addr_scope() is not always a single scope value,
    so we can't use an equality test to compare the result with
    IPV6_ADDR_SCOPE_LINKLOCAL in nfs_sockaddr_match_ipaddr6.

    This patch fixes the problem and checks the address before the scope_id.
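    A userspace analogue of the corrected comparison. The standard
    IN6_IS_ADDR_LINKLOCAL macro is swapped in for the kernel's own address
    classification helpers, and both function names are hypothetical. The
    addresses are compared first, and sin6_scope_id is consulted only when the
    address is link-local, tested as a property rather than by equality with
    one scope value.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Compare addresses, then scope ids only for link-local addresses. */
static bool match_ipaddr6_sketch(const struct sockaddr_in6 *a,
				 const struct sockaddr_in6 *b)
{
	if (memcmp(&a->sin6_addr, &b->sin6_addr, sizeof(a->sin6_addr)) != 0)
		return false;
	if (IN6_IS_ADDR_LINKLOCAL(&a->sin6_addr))
		return a->sin6_scope_id == b->sin6_scope_id;
	return true;
}

/* Helper: build a sockaddr_in6 from text and a scope id. */
static struct sockaddr_in6 make_sin6(const char *text, uint32_t scope)
{
	struct sockaddr_in6 s;

	memset(&s, 0, sizeof(s));
	s.sin6_family = AF_INET6;
	s.sin6_scope_id = scope;
	inet_pton(AF_INET6, text, &s.sin6_addr);
	return s;
}
```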

    Signed-off-by: Mi Jinlong
    Signed-off-by: Trond Myklebust

    Mi Jinlong
     
  • commit 420e3646 allowed the kernel to reduce the number of unnecessary
    commit calls by skipping the commit when there are a large number of
    outstanding pages.

    However, the current test in nfs_commit_unstable_pages does not handle
    the edge condition properly. When ncommit == 0, the kernel doesn't
    need to do anything more for the inode. In the WB_SYNC_NONE case,
    though, the current test will return true, and the inode will end
    up being marked dirty. Once that happens, the inode will never be clean
    until there's a WB_SYNC_ALL flush.

    Fix this by immediately returning from nfs_commit_unstable_pages when
    ncommit == 0.
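    A hedged sketch of the decision logic. The names are hypothetical, and
    the "ncommit <= npages / 2" threshold is an assumption standing in for the
    heuristic from commit 420e3646. Without the first check, ncommit == 0
    trivially passes the threshold test in the WB_SYNC_NONE case and the inode
    is re-marked dirty.

```c
#include <stdbool.h>

enum commit_action { NOTHING_TO_DO, MARK_DIRTY, SEND_COMMIT };

/* ncommit: pages awaiting COMMIT; npages: pages cached for the inode. */
static enum commit_action commit_unstable_sketch(unsigned long ncommit,
						 unsigned long npages,
						 bool sync_none)
{
	if (ncommit == 0)
		return NOTHING_TO_DO;   /* the fix: leave the inode clean */
	if (sync_none && ncommit <= npages / 2)
		return MARK_DIRTY;      /* defer the COMMIT to a later flush */
	return SEND_COMMIT;
}
```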

    Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which
    has a backported version of 420e3646. The inode cache there was growing
    very large. The inode cache was unable to be shrunk since the inodes
    were all marked dirty. Calling sync() would essentially "fix" the
    problem -- the WB_SYNC_ALL flush would result in the inodes all being
    marked clean.

    What I'm not clear on is how big a problem this is in mainline kernels
    as the writeback code there is very different. Either way, it seems
    incorrect to re-mark the inode dirty in this case.

    Reported-by: Mike McLean
    Signed-off-by: Jeff Layton
    Cc: stable@kernel.org [2.6.34+]
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • This reverts commit b80c3cb628f0ebc241b02e38dd028969fb8026a2.

    The reverted commit was rendered obsolete by a VFS fix: commit
    5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags in
    two steps). We no longer need to worry about writeback_single_inode()
    missing our marking of the inode for COMMIT in the 'do_writepages()' call.

    Reverting this patch fixes a performance regression in which the inode
    would continuously get queued to the dirty list, causing the writeback
    code to unnecessarily try to send a COMMIT.

    Signed-off-by: Trond Myklebust
    Tested-by: Simon Kirby
    Cc: stable@kernel.org [2.6.35+]

    Trond Myklebust