08 Jan, 2015

1 commit

  • Currently, nfs4_set_delegation takes a reference to an existing
    delegation and then checks to see if there is a conflict. If there is
    one, then it doesn't release that reference.

    Change the code to take the reference after the check and only if there
    is no conflict.

    Signed-off-by: Jeff Layton
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Jeff Layton
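
The ordering problem described in this commit can be sketched in plain C. This is a minimal illustration, not the nfsd code: `struct deleg`, `deleg_get()` and both `set_delegation_*` functions are hypothetical stand-ins, and the `recalled` flag merely models "there is a conflict".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object; not the kernel's delegation struct. */
struct deleg {
    int refcount;
    bool recalled;   /* models a conflicting state */
};

struct deleg *deleg_get(struct deleg *d)
{
    d->refcount++;
    return d;
}

/* Leaky ordering: the reference is taken before the conflict check,
 * and the conflict path returns without dropping it. */
struct deleg *set_delegation_leaky(struct deleg *existing)
{
    struct deleg *d = deleg_get(existing);

    if (d->recalled)
        return NULL;            /* the extra reference is never released */
    return d;
}

/* Fixed ordering: check first, take the reference only when there is
 * no conflict, so the error path has nothing to undo. */
struct deleg *set_delegation_fixed(struct deleg *existing)
{
    assert(existing != NULL);
    if (existing->recalled)
        return NULL;
    return deleg_get(existing);
}
```

With a conflicting object whose count starts at 1, the leaky variant leaves the count at 2 while the fixed variant leaves it at 1.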
     

17 Dec, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "A comparatively quieter cycle for nfsd this time, but still with two
    larger changes:

    - RPC server scalability improvements from Jeff Layton (using RCU
    instead of a spinlock to find idle threads).

    - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
    Schumaker, enabling fallocate on new clients"

    * 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd4: fix xdr4 count of server in fs_location4
    nfsd4: fix xdr4 inclusion of escaped char
    sunrpc/cache: convert to use string_escape_str()
    sunrpc: only call test_bit once in svc_xprt_received
    fs: nfsd: Fix signedness bug in compare_blob
    sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
    sunrpc: convert to lockless lookup of queued server threads
    sunrpc: fix potential races in pool_stats collection
    sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
    sunrpc: require svc_create callers to pass in meaningful shutdown routine
    sunrpc: have svc_wake_up only deal with pool 0
    sunrpc: convert sp_task_pending flag to use atomic bitops
    sunrpc: move rq_cachetype field to better optimize space
    sunrpc: move rq_splice_ok flag into rq_flags
    sunrpc: move rq_dropme flag into rq_flags
    sunrpc: move rq_usedeferral flag to rq_flags
    sunrpc: move rq_local field to rq_flags
    sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
    nfsd: minor off by one checks in __write_versions()
    sunrpc: release svc_pool_map reference when serv allocation fails
    ...

    Linus Torvalds
     

12 Dec, 2014

1 commit

  • Pull networking updates from David Miller:

    1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

    2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers. Thanks to Al Viro
    and Herbert Xu.

    3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

    4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

    5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

    6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

    7) Add a variant of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

    8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

    9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets. From Alexei
    Starovoitov.

    10) Support TSO/LSO in sunvnet driver, from David L Stevens.

    11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

    12) Remote checksum offload, from Tom Herbert.

    13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

    14) Add MPLS support to openvswitch, from Simon Horman.

    15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

    16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet. This tries to resolve the conflicting goals between the
    desired handling of bulk vs. RPC-like traffic.

    17) Allow userspace to ask for the CPU on which a packet was
    received/steered, via SO_INCOMING_CPU. From Eric Dumazet.

    18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

    19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

    20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

    21) Add VLAN packet scheduler action, from Jiri Pirko.

    22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
    Fix race condition between vxlan_sock_add and vxlan_sock_release
    net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
    net/mlx4: Add support for A0 steering
    net/mlx4: Refactor QUERY_PORT
    net/mlx4_core: Add explicit error message when rule doesn't meet configuration
    net/mlx4: Add A0 hybrid steering
    net/mlx4: Add mlx4_bitmap zone allocator
    net/mlx4: Add a check if there are too many reserved QPs
    net/mlx4: Change QP allocation scheme
    net/mlx4_core: Use tasklet for user-space CQ completion events
    net/mlx4_core: Mask out host side virtualization features for guests
    net/mlx4_en: Set csum level for encapsulated packets
    be2net: Export tunnel offloads only when a VxLAN tunnel is created
    gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
    cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
    net: fec: only enable mdio interrupt before phy device link up
    net: fec: clear all interrupt events to support i.MX6SX
    net: fec: reset fep link status in suspend function
    net: sock: fix access via invalid file descriptor
    net: introduce helper macro for_each_cmsghdr
    ...

    Linus Torvalds
     

11 Dec, 2014

2 commits

  • Pull VFS changes from Al Viro:
    "First pile out of several (there _definitely_ will be more). Stuff in
    this one:

    - unification of d_splice_alias()/d_materialize_unique()

    - iov_iter rewrite

    - killing a bunch of ->f_path.dentry users (and f_dentry macro).

    Getting that completed will make life much simpler for
    unionmount/overlayfs, since then we'll be able to limit the places
    sensitive to file _dentry_ to reasonably few. Which allows to have
    file_inode(file) pointing to inode in a covered layer, with dentry
    pointing to (negative) dentry in union one.

    Still not complete, but much closer now.

    - crapectomy in lustre (dead code removal, mostly)

    - "let's make seq_printf return nothing" preparations

    - assorted cleanups and fixes

    There _definitely_ will be more piles"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    copy_from_iter_nocache()
    new helper: iov_iter_kvec()
    csum_and_copy_..._iter()
    iov_iter.c: handle ITER_KVEC directly
    iov_iter.c: convert copy_to_iter() to iterate_and_advance
    iov_iter.c: convert copy_from_iter() to iterate_and_advance
    iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
    iov_iter.c: convert iov_iter_zero() to iterate_and_advance
    iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
    iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
    iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
    iov_iter.c: iterate_and_advance
    iov_iter.c: macros for iterating over iov_iter
    kill f_dentry macro
    dcache: fix kmemcheck warning in switch_names
    new helper: audit_file()
    nfsd_vfs_write(): use file_inode()
    ncpfs: use file_inode()
    kill f_dentry uses
    lockd: get rid of ->f_path.dentry->d_sb
    ...

    Linus Torvalds
     
  • This patch effectively reverts commit 500f80872645 ("net: ovs: use CRC32
    accelerated flow hash if available"), and other remaining arch_fast_hash()
    users such as from nfsd via commit 6282cd565553 ("NFSD: Don't hand out
    delegations for 30 seconds after recalling them.") where it has been used
    as a hash function for bloom filtering.

    While we think that these users are actually not much of concern, it has
    been requested to remove the arch_fast_hash() library bits that arose
    from [1] entirely as per recent discussion [2]. The main argument is that
    using it as a hash may introduce bias due to its linearity (see avalanche
    criterion) and thus makes it less clear (though we tried to document that)
    when this security/performance trade-off is actually acceptable for a
    general purpose library function.

    Let's therefore avoid any further confusion on this matter and remove it to
    prevent any future accidental misuse of it. For the time being, this is
    going to make hashing of flow keys a bit more expensive in the ovs case,
    but future work could reevaluate a different hashing discipline.

    [1] https://patchwork.ozlabs.org/patch/299369/
    [2] https://patchwork.ozlabs.org/patch/418756/

    Cc: Neil Brown
    Cc: Francesco Fusco
    Cc: Jesse Gross
    Cc: Thomas Graf
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
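
The linearity concern raised above can be illustrated with a small software CRC. This is a sketch under assumptions: `crc32c_sw()` is a plain bitwise CRC-32C written for this note and is not the removed arch_fast_hash() code, and `crc_xor_relation_holds()` is a hypothetical helper. For equal-length inputs, the CRC of `a ^ b ^ c` equals the XOR of the three individual CRCs, which is the kind of structure a hash with good avalanche behaviour would not expose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (reflected Castagnoli polynomial), software only. */
uint32_t crc32c_sw(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns 1 when crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c) for three
 * equal-length inputs; the fixed init and final-XOR constants cancel
 * when an odd number of CRCs are XORed together. */
int crc_xor_relation_holds(const uint8_t *a, const uint8_t *b,
                           const uint8_t *c, size_t len)
{
    uint8_t x[64];

    assert(len <= sizeof(x));
    for (size_t i = 0; i < len; i++)
        x[i] = a[i] ^ b[i] ^ c[i];

    return crc32c_sw(x, len) ==
           (crc32c_sw(a, len) ^ crc32c_sw(b, len) ^ crc32c_sw(c, len));
}
```

The relation holds for any three inputs of the same length, so an attacker who controls keys can predict collisions without knowing anything beyond the polynomial.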
     

10 Dec, 2014

8 commits


09 Dec, 2014

1 commit


02 Dec, 2014

1 commit

  • My static checker complains that if "len == remaining" then it means we
    have truncated the last character off the version string.

    The intent of the code is that we print as many versions as we can
    without truncating a version. Then we put a newline at the end. If the
    newline can't fit we return -EINVAL.

    Signed-off-by: Dan Carpenter
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Dan Carpenter
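
The truncation rule in this commit follows from how snprintf() reports its result. The sketch below is a userspace approximation, not the nfsd __write_versions() code: `write_versions()`, its token format and its parameters are hypothetical. It treats `len >= remaining` as truncation (when `len == remaining`, the terminating NUL has displaced the last character), drops any token that does not fit whole, and returns -EINVAL when the final newline cannot fit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "+2 +3 ...\n" into buf.
 * Returns the number of characters written, or -EINVAL if even the
 * trailing newline does not fit. */
int write_versions(char *buf, size_t size, const int *vers, size_t n)
{
    char *p = buf;
    size_t remaining = size;
    const char *sep = "";
    int len;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        len = snprintf(p, remaining, "%s+%d", sep, vers[i]);
        /* snprintf returns the untruncated length, so len == remaining
         * already means the last character was cut off. */
        if (len < 0 || (size_t)len >= remaining) {
            *p = '\0';          /* discard the partial token */
            break;
        }
        p += len;
        remaining -= (size_t)len;
        sep = " ";
    }

    len = snprintf(p, remaining, "\n");
    if (len < 0 || (size_t)len >= remaining)
        return -EINVAL;
    return (int)(p + len - buf);
}
```

With versions 2, 3 and 4, a 7-byte buffer yields "+2 +3" plus a newline, while a 6-byte buffer has no room left for the newline and fails.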
     

20 Nov, 2014

7 commits


08 Nov, 2014

3 commits

  • The global state_lock protects the file_hashtbl, and that has the
    potential to be a scalability bottleneck.

    Address this by making the file_hashtbl use RCU. Add a rcu_head to the
    nfs4_file and use that when freeing ones that have been hashed. In order
    to conserve space, we union the fi_rcu field with the fi_delegations
    list_head which must be clear by the time the last reference to the file
    is dropped.

    Convert find_file_locked to use RCU lookup primitives and not to require
    that the state_lock be held, and convert find_file to do a lockless
    lookup. Convert find_or_add_file to attempt a lockless lookup first, and
    then fall back to doing a locked search and insert if that fails to find
    anything.

    Also, minimize the number of times we need to calculate the hash value
    by passing it in as an argument to the search and insert functions, and
    optimize the order of arguments in nfsd4_init_file.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • DEALLOCATE only returns a status value, meaning we can use the noop()
    xdr encoder to reply to the client.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • The ALLOCATE operation is used to preallocate space in a file. I can do
    this by using vfs_fallocate() to do the actual preallocation.

    ALLOCATE only returns a status indicator, so we don't need to write a
    special encode() function.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     

01 Nov, 2014

1 commit


24 Oct, 2014

4 commits

  • They're a bit outdated with respect to some recent changes.

    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • A client may not want to use the back channel on a transport it sent
    CREATE_SESSION on, in which case it clears SESSION4_BACK_CHAN.

    However, cl_cb_addr should be populated anyway, to be used if the
    client binds other connections to this session. If cl_cb_addr is
    not initialized, rpc_create() fails when the server attempts to
    set up a back channel on such secondary transports.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The vfs_fsync_range() call during write processing got the end of the
    range off by one. The range is inclusive, not exclusive. The error has
    nfsd sync more data than requested -- it's correct but unnecessary
    overhead.

    The call during commit processing is correct so I copied that pattern in
    write processing. Maybe a helper would be nice but I kept it trivial.

    This is untested. I found it while reviewing code for something else
    entirely.

    Signed-off-by: Zach Brown
    Signed-off-by: J. Bruce Fields

    Zach Brown
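
The off-by-one is easiest to see as arithmetic. In this sketch, `struct byte_range` and `sync_range_for_write()` are hypothetical names; the only fact taken from the commit is that the end of the range is inclusive, so a write of `count` bytes at `offset` ends at `offset + count - 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive byte range: both start and end are part of the range. */
struct byte_range {
    int64_t start;
    int64_t end;
};

/* Range covering exactly the bytes touched by a write.
 * Using offset + count as the end would include one extra byte. */
struct byte_range sync_range_for_write(int64_t offset, int64_t count)
{
    struct byte_range r;

    assert(offset >= 0 && count > 0);
    r.start = offset;
    r.end = offset + count - 1;
    return r;
}
```

For example, a 4096-byte write at offset 4096 covers bytes 4096 through 8191; passing 8192 as the end would pull one more byte, and possibly one more page, into the sync.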
     
  • Unknown operation numbers are caught in nfsd4_decode_compound() which
    sets op->opnum to OP_ILLEGAL and op->status to nfserr_op_illegal. The
    error causes the main loop in nfsd4_proc_compound() to skip most
    processing. But nfsd4_proc_compound also peeks ahead at the next
    operation in one case and doesn't take similar precautions there.

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
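
The peek-ahead hazard can be sketched as a table lookup that must be guarded. Everything here is hypothetical and much simpler than nfsd4_proc_compound(): the three-entry descriptor table, the `sketch_` names and the `uses_current_fh` property are assumptions made for illustration. The point is only that an operation already marked illegal must be filtered out before its number is used as an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode space, with values chosen for the sketch. */
#define SKETCH_OP_COUNT   3u
#define SKETCH_OP_ILLEGAL 10044u

struct sketch_op_desc { bool uses_current_fh; };
struct sketch_op { unsigned int opnum; };

static const struct sketch_op_desc sketch_op_table[SKETCH_OP_COUNT] = {
    { false }, { true }, { false },
};

/* Peek at the operation after ops[cur]. An operation the decoder has
 * marked illegal has no descriptor, so it is rejected before the
 * table is indexed. */
bool next_op_uses_current_fh(const struct sketch_op *ops, size_t nops,
                             size_t cur)
{
    const struct sketch_op *next;

    if (cur + 1 >= nops)
        return false;

    next = &ops[cur + 1];
    if (next->opnum == SKETCH_OP_ILLEGAL)
        return false;

    assert(next->opnum < SKETCH_OP_COUNT);
    return sketch_op_table[next->opnum].uses_current_fh;
}
```

Without the illegal-opcode check, the lookup for the last operation in the test sequence `{0, 1, SKETCH_OP_ILLEGAL}` would read far outside the table.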
     

21 Oct, 2014

1 commit

  • We added this new estimator function but forgot to hook it up. The
    effect is that NFSv4.1 (and greater) won't do zero-copy reads.

    The estimate was also wrong by 8 bytes.

    Fixes: ccae70a9ee41 "nfsd4: estimate sequence response size"
    Cc: stable@vger.kernel.org
    Reported-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

13 Oct, 2014

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds
     

12 Oct, 2014

2 commits

  • Pull security subsystem updates from James Morris.

    Mostly ima, selinux, smack and key handling updates.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
    integrity: do zero padding of the key id
    KEYS: output last portion of fingerprint in /proc/keys
    KEYS: strip 'id:' from ca_keyid
    KEYS: use swapped SKID for performing partial matching
    KEYS: Restore partial ID matching functionality for asymmetric keys
    X.509: If available, use the raw subjKeyId to form the key description
    KEYS: handle error code encoded in pointer
    selinux: normalize audit log formatting
    selinux: cleanup error reporting in selinux_nlmsg_perm()
    KEYS: Check hex2bin()'s return when generating an asymmetric key ID
    ima: detect violations for mmaped files
    ima: fix race condition on ima_rdwr_violation_check and process_measurement
    ima: added ima_policy_flag variable
    ima: return an error code from ima_add_boot_aggregate()
    ima: provide 'ima_appraise=log' kernel option
    ima: move keyring initialization to ima_init()
    PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
    PKCS#7: Better handling of unsupported crypto
    KEYS: Overhaul key identification when searching for asymmetric keys
    KEYS: Implement binary asymmetric key ID handling
    ...

    Linus Torvalds
     
  • Pull file locking related changes from Jeff Layton:
    "This release is a little more busy for file locking changes than the
    last:

    - a set of patches from Kinglong Mee to fix the lockowner handling in
    knfsd
    - a pile of cleanups to the internal file lease API. This should get
    us a bit closer to allowing for setlease methods that can block.

    There are some dependencies between mine and Bruce's trees this cycle,
    and I based my tree on top of the requisite patches in Bruce's tree"

    * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
    locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
    locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
    locks: set fl_owner for leases to filp instead of current->files
    locks: give lm_break a return value
    locks: __break_lease cleanup in preparation of allowing direct removal of leases
    locks: remove i_have_this_lease check from __break_lease
    locks: move freeing of leases outside of i_lock
    locks: move i_lock acquisition into generic_*_lease handlers
    locks: define a lm_setup handler for leases
    locks: plumb a "priv" pointer into the setlease routines
    nfsd: don't keep a pointer to the lease in nfs4_file
    locks: clean up vfs_setlease kerneldoc comments
    locks: generic_delete_lease doesn't need a file_lock at all
    nfsd: fix potential lease memory leak in nfs4_setlease
    locks: close potential race in lease_get_mtime
    security: make security_file_set_fowner, f_setown and __f_setown void return
    locks: consolidate "nolease" routines
    locks: remove lock_may_read and lock_may_write
    lockd: rip out deferred lock handling from testlock codepath
    NFSD: Get reference of lockowner when coping file_lock
    ...

    Linus Torvalds
     

09 Oct, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - support the NFSv4.2 SEEK operation (allowing clients to support
    SEEK_HOLE/SEEK_DATA), thanks to Anna.
    - end the grace period early in a number of cases, mitigating a
    long-standing annoyance, thanks to Jeff
    - improve SMP scalability, thanks to Trond"

    * 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
    nfsd: eliminate "to_delegation" define
    NFSD: Implement SEEK
    NFSD: Add generic v4.2 infrastructure
    svcrdma: advertise the correct max payload
    nfsd: introduce nfsd4_callback_ops
    nfsd: split nfsd4_callback initialization and use
    nfsd: introduce a generic nfsd4_cb
    nfsd: remove nfsd4_callback.cb_op
    nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
    nfsd: fix nfsd4_cb_recall_done error handling
    nfsd4: clarify how grace period ends
    nfsd4: stop grace_time update at end of grace period
    nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
    nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
    nfsd: serialize nfsdcltrack upcalls for a particular client
    nfsd: pass extra info in env vars to upcalls to allow for early grace period end
    nfsd: add a v4_end_grace file to /proc/fs/nfsd
    lockd: add a /proc/fs/lockd/nlm_end_grace file
    nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
    nfsd: remove redundant boot_time parm from grace_done client tracking op
    ...

    Linus Torvalds
     

08 Oct, 2014

5 commits

  • Christoph suggests:

    "Add a return value to lm_break so that the lock manager can tell the
    core code "you can delete this lease right now". That gets rid of
    the games with the timeout which require all kinds of race avoidance
    code in the users."

    Do that here and have the nfsd lease break routine use it when it detects
    that there was a race between setting up the lease and it being broken.

    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig

    Jeff Layton
     
  • There was only one place where we still could free a file_lock while
    holding the i_lock -- lease_modify. Add a new list_head argument to the
    lm_change operation, pass in a private list when calling it, and fix
    those callers to dispose of the list once the lock has been dropped.

    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig

    Jeff Layton
     
  • ...and move the fasync setup into it for fcntl lease calls. At the same
    time, change the semantics of how the file_lock double-pointer is
    handled. Up until now, on a successful lease return you got a pointer to
    the lock on the list. This is bad, since that pointer can no longer be
    relied on as valid once the inode->i_lock has been released.

    Change the code to instead just zero out the pointer if the lease we
    passed in ended up being used. Then the callers can just check to see
    if it's NULL after the call and free it if it isn't.

    The priv argument has the same semantics. The lm_setup function can
    zero the pointer out to signal to the caller that it should not be
    freed after the function returns.

    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig

    Jeff Layton
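
The double-pointer convention described above is a general ownership-transfer idiom. This sketch assumes hypothetical names throughout (`struct sketch_lease`, `struct lease_slot`, `set_lease()`, `install_lease()`) and models only the rule from the commit: the callee zeroes the caller's pointer when it keeps the object, and the caller frees whatever is still non-NULL afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_lease { int type; };

/* Hypothetical holder for at most one installed lease. */
struct lease_slot { struct sketch_lease *installed; };

/* If the passed-in lease ends up being used, *lp is set to NULL to
 * signal that the caller must not free it. If an existing lease is
 * updated instead, *lp is left alone and the caller still owns it. */
int set_lease(struct lease_slot *slot, struct sketch_lease **lp)
{
    assert(slot != NULL && lp != NULL && *lp != NULL);

    if (slot->installed != NULL) {
        slot->installed->type = (*lp)->type;
        return 0;
    }
    slot->installed = *lp;
    *lp = NULL;
    return 0;
}

/* Caller pattern: allocate, hand over, free whatever was not taken. */
int install_lease(struct lease_slot *slot, int type)
{
    struct sketch_lease *new_lease = malloc(sizeof(*new_lease));
    int err;

    if (new_lease == NULL)
        return -1;
    new_lease->type = type;

    err = set_lease(slot, &new_lease);
    free(new_lease);            /* no-op when ownership was transferred */
    return err;
}
```

The caller never keeps a pointer into the callee's data structure, which avoids relying on a pointer whose validity ends when the protecting lock is dropped.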
     
  • In later patches, we're going to add a new lock_manager_operation to
    finish setting up the lease while still holding the i_lock. To do
    this, we'll need to pass a little bit of info in the fcntl setlease
    case (primarily an fasync structure). Plumb the extra pointer into
    there in advance of that.

    We declare this pointer as a void ** to make it clear that this is
    private info, and that the caller isn't required to set this unless
    the lm_setup specifically requires it.

    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig

    Jeff Layton
     
  • Now that we don't need to pass in an actual lease pointer to
    vfs_setlease on unlock, we can stop tracking a pointer to the lease in
    the nfs4_file.

    Switch all of the places that check the fi_lease to check fi_deleg_file
    instead. We always set that at the same time so it will have the same
    semantics.

    Cc: J. Bruce Fields
    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig

    Jeff Layton