12 Jul, 2013

2 commits

  • Pull second set of NFS client updates from Trond Myklebust:
    "This mainly contains some small readdir optimisations that had
    dependencies on Al Viro's readdir rewrite. There is also a fix for a
    nasty deadlock which surfaced earlier in this merge window.

    Highlights include:
    - Fix an rpc_pipefs regression that causes a deadlock on mount
    - Readdir optimisations by Scott Mayhew and Jeff Layton
    - clean up the rpc_pipefs dentry operation setup"

    * tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    SUNRPC: Fix a deadlock in rpc_client_register()
    rpc_pipe: rpc_dir_inode_operations can be static
    NFS: Allow nfs_updatepage to extend a write under additional circumstances
    NFS: Make nfs_readdir revalidate less often
    NFS: Make nfs_attribute_cache_expired() non-static
    rpc_pipe: set dentry operations at d_alloc time
    nfs: set verifier on existing dentries in nfs_prime_dcache

    Linus Torvalds
     
  • Pull nfsd changes from Bruce Fields:
    "Changes this time include:

    - 4.1 enabled on the server by default: the last 4.1-specific issues
    I know of are fixed, so we're not going to find the rest of the
    bugs without more exposure.
    - Experimental support for NFSv4.2 MAC Labeling (to allow running
    selinux over NFS), from Dave Quigley.
    - Fixes for some delicate cache/upcall races that could cause rare
    server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
    debugging persistence.
    - Fixes for some bugs found at the recent NFS bakeathon, mostly v4
    and v4.1-specific, but also a generic bug handling fragmented rpc
    calls"

    * 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
    nfsd4: support minorversion 1 by default
    nfsd4: allow destroy_session over destroyed session
    svcrpc: fix failures to handle -1 uid's
    sunrpc: Don't schedule an upcall on a replaced cache entry.
    net/sunrpc: xpt_auth_cache should be ignored when expired.
    sunrpc/cache: ensure items removed from cache do not have pending upcalls.
    sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
    sunrpc/cache: remove races with queuing an upcall.
    nfsd4: return delegation immediately if lease fails
    nfsd4: do not throw away 4.1 lock state on last unlock
    nfsd4: delegation-based open reclaims should bypass permissions
    svcrpc: don't error out on small tcp fragment
    svcrpc: fix handling of too-short rpc's
    nfsd4: minor read_buf cleanup
    nfsd4: fix decoding of compounds across page boundaries
    nfsd4: clean up nfs4_open_delegation
    NFSD: Don't give out read delegations on creates
    nfsd4: allow client to send no cb_sec flavors
    nfsd4: fail attempts to request gss on the backchannel
    nfsd4: implement minimal SP4_MACH_CRED
    ...

    Linus Torvalds
     

11 Jul, 2013

1 commit

  • Commit 384816051ca9125cd54750e59c780c2a2655fa4f (SUNRPC: fix races on
    PipeFS MOUNT notifications) introduces a regression when we call
    rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour.

    By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we
    end up deadlocking in gss_pipes_dentries_create_net().
    Fix is to register the client and release the mutex before calling
    rpcauth_create().
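
    The lock ordering described above can be sketched in user space. This is
    a toy model, not the SUNRPC code: toy_mutex, create_auth() and both
    setup_client_*() functions are hypothetical stand-ins, and a re-lock
    attempt is reported as an error instead of hanging.

```c
#include <stdbool.h>

/* Toy non-recursive mutex: a second lock attempt by the holder would
 * block forever in real code; here it is reported as -1 instead. */
struct toy_mutex { bool held; };

static int toy_lock(struct toy_mutex *m)
{
    if (m->held)
        return -1;              /* would deadlock */
    m->held = true;
    return 0;
}

static void toy_unlock(struct toy_mutex *m)
{
    m->held = false;
}

/* Stand-in for the auth setup path, assumed to need the same lock. */
static int create_auth(struct toy_mutex *sb_lock)
{
    if (toy_lock(sb_lock) < 0)
        return -1;
    /* ... create pipe dentries ... */
    toy_unlock(sb_lock);
    return 0;
}

/* Regressed ordering: auth is created while the lock is still held. */
static int setup_client_buggy(struct toy_mutex *sb_lock)
{
    int err;

    toy_lock(sb_lock);
    /* ... register client ... */
    err = create_auth(sb_lock); /* tries to re-take sb_lock */
    toy_unlock(sb_lock);
    return err;
}

/* Fixed ordering: register, release the lock, then create auth. */
static int setup_client_fixed(struct toy_mutex *sb_lock)
{
    toy_lock(sb_lock);
    /* ... register client ... */
    toy_unlock(sb_lock);
    return create_auth(sb_lock);
}
```

    With a fresh toy_mutex, setup_client_buggy() returns -1 (the modelled
    deadlock) while setup_client_fixed() returns 0.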

    Reported-by: Weston Andros Adamson
    Tested-by: Weston Andros Adamson
    Cc: Stanislav Kinsbursky
    Cc: # : 3848160: SUNRPC: fix races on PipeFS MOUNT
    Cc: # : e73f4cc: SUNRPC: split client creation
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

10 Jul, 2013

4 commits

  • Hi Jeff,

    FYI, new sparse warnings show up in

    tree: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git nfs-for-next
    head: 296afe1f58d55fd56ed85daaafafcfee39f59ece
    commit: 76fa66657900071016f2bae61de28f059f3f2abf [2/5] rpc_pipe: set dentry operations at d_alloc time

    >> net/sunrpc/rpc_pipe.c:496:31: sparse: symbol 'rpc_dir_inode_operations' was not declared. Should it be static?

    Please consider folding the attached diff :-)

    Signed-off-by: Fengguang Wu
    Signed-off-by: Trond Myklebust

    Fengguang Wu
     
  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie
    lifetime handling, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overrunning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in
    udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     
  • Currently the way these get set is a little convoluted. If the dentry is
    allocated via lookup from userland, then it gets set by simple_lookup.
    If it gets allocated when the kernel is populating the directory, then
    it gets set via __rpc_lookup_create_exclusive, which has to check
    whether they might already be set. Between both of these, this ensures
    that all dentries have their d_op pointer set.

    Instead of doing that, just have them set at d_alloc time by pointing
    sb->s_d_op at them. With that change, we no longer want the lookup op
    to set them, so we must move to using our own lookup routine.
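
    The idea of setting the operations once at allocation time can be
    sketched as follows. The toy_* types are hypothetical, simplified
    stand-ins for the VFS structures, shown only to illustrate how a
    per-superblock default removes the need for "already set?" checks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a dentry operations table. */
struct toy_dentry_ops {
    const char *name;
};

struct toy_super_block {
    const struct toy_dentry_ops *s_d_op;    /* default ops for new dentries */
};

struct toy_dentry {
    const struct toy_dentry_ops *d_op;
    struct toy_super_block *d_sb;
};

/* Allocation copies the superblock default, so neither the lookup path
 * nor the kernel-side create path has to set or check d_op later. */
static struct toy_dentry *toy_d_alloc(struct toy_super_block *sb)
{
    struct toy_dentry *d = calloc(1, sizeof(*d));

    if (!d)
        return NULL;
    d->d_sb = sb;
    d->d_op = sb->s_d_op;
    return d;
}
```

    Every dentry returned by toy_d_alloc() already points at the default
    operations, whichever code path asked for the allocation.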

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Pull NFS client updates from Trond Myklebust:
    "Feature highlights include:
    - Add basic client support for NFSv4.2
    - Add basic client support for Labeled NFS (selinux for NFSv4.2)
    - Fix the use of credentials in NFSv4.1 stateful operations, and add
    support for NFSv4.1 state protection.

    Bugfix highlights:
    - Fix another NFSv4 open state recovery race
    - Fix an NFSv4.1 back channel session regression
    - Various rpc_pipefs races
    - Fix another issue with NFSv3 auth negotiation

    Please note that Labeled NFS does require some additional support from
    the security subsystem. The relevant changesets have all been
    reviewed and acked by James Morris."

    * tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
    NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
    NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
    nfs: have NFSv3 try server-specified auth flavors in turn
    nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
    nfs: move server_authlist into nfs_try_mount_request
    nfs: refactor "need_mount" code out of nfs_try_mount
    SUNRPC: PipeFS MOUNT notification optimization for dying clients
    SUNRPC: split client creation routine into setup and registration
    SUNRPC: fix races on PipeFS UMOUNT notifications
    SUNRPC: fix races on PipeFS MOUNT notifications
    NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
    NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
    NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
    NFS: Improve legacy idmapping fallback
    NFSv4.1 end back channel session draining
    NFS: Apply v4.1 capabilities to v4.2
    NFSv4.1: Clean up layout segment comparison helper names
    NFSv4.1: layout segment comparison helpers should take 'const' parameters
    NFSv4: Move the DNS resolver into the NFSv4 module
    rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
    ...

    Linus Torvalds
     

09 Jul, 2013

1 commit

  • As of f025adf191924e3a75ce80e130afcd2485b53bb8 "sunrpc: Properly decode
    kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
    (0xffffffff) uid or gid would fail with a badcred error.

    Commit afe3c3fd5392b2f0066930abc5dbd3f4b14a0f13 "svcrpc: fix failures to
    handle -1 uid's and gid's" fixed part of the problem, but overlooked the
    gid upcall--the kernel can request supplementary gid's for the -1 uid,
    but mountd's attempt to write a response will get -EINVAL.

    Symptoms were nfsd failing to reply to the first attempt to use a newly
    negotiated krb5 context.

    Reported-by: Sven Geggus
    Tested-by: Sven Geggus
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

04 Jul, 2013

3 commits

  • Merge first patch-bomb from Andrew Morton:
    - various misc bits
    - I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been
    distracted. There has been quite a bit of activity.
    - About half the MM queue
    - Some backlight bits
    - Various lib/ updates
    - checkpatch updates
    - zillions more little rtc patches
    - ptrace
    - signals
    - exec
    - procfs
    - rapidio
    - nbd
    - aoe
    - pps
    - memstick
    - tools/testing/selftests updates

    * emailed patches from Andrew Morton: (445 commits)
    tools/testing/selftests: don't assume the x bit is set on scripts
    selftests: add .gitignore for kcmp
    selftests: fix clean target in kcmp Makefile
    selftests: add .gitignore for vm
    selftests: add hugetlbfstest
    self-test: fix make clean
    selftests: exit 1 on failure
    kernel/resource.c: remove the unneeded assignment in function __find_resource
    aio: fix wrong comment in aio_complete()
    drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
    drivers/memstick/host/r592.c: convert to module_pci_driver
    drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
    pps-gpio: add device-tree binding and support
    drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
    drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
    drivers/parport/share.c: use kzalloc
    Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
    aoe: update internal version number to v83
    aoe: update copyright date
    aoe: perform I/O completions in parallel
    ...

    Linus Torvalds
     
  • Calling kthread_run with a single name parameter causes it to be handled
    as a format string. Many callers are passing potentially dynamic string
    content, so use "%s" in those cases to avoid any potential accidents.
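
    A user-space sketch of the hazard, using a hypothetical
    set_thread_name() helper that, like kthread_run(), treats its name
    argument as a printf-style format. The demo input contains only "%%" so
    that the unsafe call stays well defined; a stray "%s" or "%n" in dynamic
    text would instead read or write through missing arguments.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for an API that formats a thread name. */
static void set_thread_name(char *out, size_t len, const char *namefmt, ...)
{
    va_list ap;

    va_start(ap, namefmt);
    vsnprintf(out, len, namefmt, ap);
    va_end(ap);
}

/* Risky: the dynamic text itself is interpreted as the format string. */
static void name_unsafe(char *out, size_t len, const char *dynamic)
{
    set_thread_name(out, len, dynamic);
}

/* Safe: the dynamic text is passed as data to a constant "%s" format. */
static void name_safe(char *out, size_t len, const char *dynamic)
{
    set_thread_name(out, len, "%s", dynamic);
}
```

    Given the input "load-50%%", name_unsafe() produces "load-50%" because
    the text was parsed as a format, while name_safe() preserves it as
    "load-50%%".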

    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Pull power management and ACPI updates from Rafael Wysocki:
    "This time the total number of ACPI commits is slightly greater than
    the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
    remains the most active patch submitter.

    To me, the most significant change is the addition of offline/online
    device operations to the driver core (with Greg's blessing) and
    the related modifications of the ACPI core hotplug code. Next are the
    freezer updates from Colin Cross that should make the freezing of
    tasks a bit less heavy weight.

    We also have a couple of regression fixes, a number of fixes for
    issues that have not been identified as regressions, two new drivers
    and a bunch of cleanups all over.

    Highlights:

    - Hotplug changes to support graceful hot-removal failures.

    It sometimes is necessary to fail device hot-removal operations
    gracefully if they cannot be carried out completely. For example,
    if memory from a memory module being hot-removed has been allocated
    for the kernel's own use and cannot be moved elsewhere, it's
    desirable to fail the hot-removal operation in a graceful way
    rather than to crash the kernel, but currently a success or a kernel
    crash are the only possible outcomes of an attempted memory
    hot-removal. Needless to say, that is not a very attractive
    alternative and it had to be addressed.

    However, in order to make it work for memory, I first had to make
    it work for CPUs and for this purpose I needed to modify the ACPI
    processor driver. It's been split into two parts, a resident one
    handling the low-level initialization/cleanup and a modular one
    playing the actual driver's role (but it binds to the CPU system
    device objects rather than to the ACPI device objects representing
    processors). That's been sort of like a live brain surgery on a
    patient who's riding a bike.

    So this is a little scary, but since we found and fixed a couple of
    regressions it caused to happen during the early linux-next testing
    (a month ago), nobody has complained.

    As a bonus we remove some duplicated ACPI hotplug code, because the
    ACPI-based CPU hotplug is now going to use the common ACPI hotplug
    code.

    - Lighter weight freezing of tasks.

    These changes from Colin Cross and Mandeep Singh Baines are
    targeted at making the freezing of tasks a somewhat lighter-weight
    operation. They reduce the number of tasks woken up every time
    during the freezing, by using the observation that the freezer
    simply doesn't need to wake up some of them and wait for them all
    to call refrigerator(). The time needed for the freezer to decide
    to report a failure is reduced too.

    Also reintroduced is the check causing a lockdep warning to
    trigger when try_to_freeze() is called with locks held (which is
    generally unsafe and shouldn't happen).

    - cpufreq updates

    First off, a commit from Srivatsa S Bhat fixes a resume regression
    introduced during the 3.10 cycle causing some cpufreq sysfs
    attributes to return wrong values to user space after resume. The
    fix is kind of fresh, but also it's pretty obvious once Srivatsa
    has identified the root cause.

    Second, we have a new freqdomain_cpus sysfs attribute for the
    acpi-cpufreq driver to provide information previously available via
    related_cpus. From Lan Tianyu.

    Finally, we fix a number of issues, mostly related to the
    CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
    up some code. The majority of changes from Viresh Kumar with bits
    from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
    Arnd Bergmann, and Tang Yuantian.

    - ACPICA update

    A usual bunch of updates from the ACPICA upstream.

    During the 3.4 cycle we introduced support for ACPI 5 extended
    sleep registers, but they are only supposed to be used if the
    HW-reduced mode bit is set in the FADT flags and the code attempted
    to use them without checking that bit. That caused suspend/resume
    regressions to happen on some systems. Fix from Lv Zheng causes
    those registers to be used only if the HW-reduced mode bit is set.

    Apart from this some other ACPICA bugs are fixed and code cleanups
    are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
    Zhang Rui.

    - cpuidle updates

    New driver for Xilinx Zynq processors is added by Michal Simek.

    Multidriver support simplification, addition of some missing
    kerneldoc comments and Kconfig-related fixes come from Daniel
    Lezcano.

    - ACPI power management updates

    Changes to make suspend/resume work correctly in Xen guests from
    Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
    cleanups and fixes of the ACPI device power state selection
    routine.

    - ACPI documentation updates

    Some previously missing pieces of ACPI documentation are added by
    Lv Zheng and Aaron Lu (hopefully, that will help people to
    understand how the ACPI subsystem works) and one outdated doc is
    updated by Hanjun Guo.

    - Assorted ACPI updates

    We finally nailed down the IA-64 issue that was the reason for
    reverting commit 9f29ab11ddbf ("ACPI / scan: do not match drivers
    against objects having scan handlers"), so we can fix it and move
    the ACPI scan handler check added to the ACPI video driver back to
    the core.

    A mechanism for adding CMOS RTC address space handlers is
    introduced by Lan Tianyu to allow some EC-related breakage to be
    fixed on some systems.

    A spec-compliant implementation of acpi_os_get_timer() is added by
    Mika Westerberg.

    The evaluation of _STA is added to do_acpi_find_child() to avoid
    situations in which a pointer to a disabled device object is
    returned instead of an enabled one with the same _ADR value. From
    Jeff Wu.

    Intel BayTrail PCH (Platform Controller Hub) support is added to
    the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
    driver is modified to work around a couple of known BIOS issues.
    Changes from Mika Westerberg and Heikki Krogerus.

    The EC driver is fixed by Vasiliy Kulikov to use get_user() and
    put_user() instead of dereferencing user space pointers blindly.

    Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
    Kani.

    - Assorted power management updates

    The "runtime idle" helper routine is changed to take the return
    values of the callbacks executed by it into account and to call
    rpm_suspend() if they return 0, which allows us to reduce the
    overall code bloat a bit (by dropping some code that's not
    necessary any more after that modification).

    The runtime PM documentation is updated by Alan Stern (to reflect
    the "runtime idle" behavior change).

    New trace points for PM QoS are added by Sahara.

    PM QoS documentation is updated by Lan Tianyu.

    Code cleanups are made and minor issues are addressed by Bernie
    Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

    - devfreq updates

    New driver for the Exynos5-bus device from Abhilash Kesavan.

    Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
    Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

    - OMAP power management updates

    Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
    updates from Andrii Tseglytskyi and Nishanth Menon."

    * tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
    cpufreq: Fix cpufreq regression after suspend/resume
    ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
    PM / Sleep: Warn about system time after resume with pm_trace
    cpufreq: don't leave stale policy pointer in cdbs->cur_policy
    acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
    cpufreq: make sure frequency transitions are serialized
    ACPI: implement acpi_os_get_timer() according the spec
    ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
    ACPI: Add CMOS RTC Operation Region handler support
    ACPI / processor: Drop unused variable from processor_perflib.c
    cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
    ...

    Linus Torvalds
     

02 Jul, 2013

10 commits

  • When a cache entry is replaced, the "expiry_time" gets set to
    zero by a call to "cache_fresh_locked(..., 0)" at the end of
    "sunrpc_cache_update".

    This low expiry time makes cache_check() think that the 'refresh_age'
    is negative, so the 'age' is comparatively large and a refresh is
    triggered.
    However, refreshing a replaced entry is pointless; it cannot achieve
    anything useful.

    So teach cache_check to ignore a low refresh_age when expiry_time
    is zero.
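
    The arithmetic can be illustrated with a simplified model. The struct
    and both functions below are hypothetical, and the "refresh once the
    entry is older than half its lifetime" rule is an assumption made for
    this sketch rather than a statement of what cache_check() does.

```c
#include <stdbool.h>

/* Hypothetical, simplified cache entry; times are in seconds. */
struct toy_cache_entry {
    long expiry_time;   /* 0 once the entry has been replaced */
    long last_refresh;
};

/* Before: a replaced entry (expiry_time == 0) yields a negative
 * refresh_age, so almost any age looks large enough to refresh. */
static bool wants_refresh_old(const struct toy_cache_entry *e, long now)
{
    long refresh_age = e->expiry_time - e->last_refresh;
    long age = now - e->last_refresh;

    return age > refresh_age / 2;
}

/* After: a zero expiry_time is never treated as a reason to refresh. */
static bool wants_refresh_new(const struct toy_cache_entry *e, long now)
{
    long refresh_age = e->expiry_time - e->last_refresh;
    long age = now - e->last_refresh;

    return e->expiry_time != 0 && age > refresh_age / 2;
}
```

    For a replaced entry with last_refresh = 1000 checked at now = 1001,
    the old test asks for a refresh (1 > -500) and the new one does not.
    Entries with a real expiry time behave the same under both.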

    Reported-by: Bodo Stroesser
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • commit d202cce8963d9268ff355a386e20243e8332b308
    sunrpc: never return expired entries in sunrpc_cache_lookup

    moved the 'entry is expired' test from cache_check to
    sunrpc_cache_lookup, so that it happened early and some races could
    safely be ignored.

    However the ip_map (in svcauth_unix.c) has a separate single-item
    cache which allows quick lookup without locking. An entry in this
    case would not be subject to the expiry test and so could be used
    well after it has expired.

    This is not normally a big problem because the first time it is used
    after it is expired an up-call will be scheduled to refresh the entry
    (if it hasn't been scheduled already) and the old entry will then
    be invalidated. So on the second attempt to use it after it has
    expired, ip_map_cached_get will discard it.

    However that is subtle and not ideal, so replace the "!cache_valid"
    test with "cache_is_expired".
    In doing this we drop the test on the "CACHE_VALID" bit. This is
    unnecessary as the bit is never cleared, and an entry will only
    be cached if the bit is set.

    Reported-by: Bodo Stroesser
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • It is possible for a race to set CACHE_PENDING after cache_clean()
    has removed a cache entry from the cache.
    If CACHE_PENDING is still set when the entry is finally 'put',
    the cache_dequeue() will never happen and we can leak memory.

    So set a new flag 'CACHE_CLEANED' when we remove something from
    the cache, and don't queue any upcall if it is set.

    If CACHE_PENDING is set before CACHE_CLEANED, the call that
    cache_clean() makes to cache_fresh_unlocked() will free memory
    as needed. If CACHE_PENDING is set after CACHE_CLEANED, the
    test in sunrpc_cache_pipe_upcall will ensure that the memory
    is not allocated.

    Reported-by:
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • cache_fresh_unlocked() is called when a cache entry
    has been updated and ensures that if there were any
    pending upcalls, they are cleared.

    So every time we update a cache entry, we should call this,
    and this should be the only way that we try to clear
    pending calls (that sort of uniformity makes code sooo much
    easier to read).

    try_to_negate_entry() will (possibly) mark an entry as
    negative. If it doesn't, it is because the entry already
    is VALID.
    So the entry will be valid on exit, so it is appropriate to
    call cache_fresh_unlocked().
    So tidy up try_to_negate_entry() to do that, and remove
    partial open-coded cache_fresh_unlocked() from the one
    call-site of try_to_negate_entry().

    In the other branch of the 'switch(cache_make_upcall())',
    we again have a partial open-coded version of cache_fresh_unlocked().
    Replace that with a real call.

    And again in cache_clean(), use a real call to cache_fresh_unlocked().

    These call sites might previously have called
    cache_revisit_request() if CACHE_PENDING wasn't set.
    This is never necessary because cache_revisit_request() can
    only do anything if the item is in the cache_defer_hash.
    However, any time that an item is added to the cache_defer_hash
    (setup_deferral), the code immediately tests CACHE_PENDING,
    and removes the entry again if it is clear. So in all other
    places we only need to 'cache_revisit_request' if we've
    just cleared CACHE_PENDING.

    Reported-by: Bodo Stroesser
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • We currently queue an upcall after setting CACHE_PENDING,
    and dequeue after clearing CACHE_PENDING.
    So a request should only be present when CACHE_PENDING is set.

    However we don't combine the test and the enqueue/dequeue in
    a protected region, so it is possible (if unlikely) for a race
    to result in a request being queued without CACHE_PENDING set,
    or a request to be absent despite CACHE_PENDING.

    So: include a test for CACHE_PENDING inside the regions of
    enqueue and dequeue where queue_lock is held, and abort
    the operation if the value is not as expected.

    Also remove the early 'return' from cache_dequeue() to ensure that it
    always removes all entries: As there is no locking between setting
    CACHE_PENDING and calling sunrpc_cache_pipe_upcall it is not
    inconceivable for some other thread to clear CACHE_PENDING and then
    someone else to set it and call sunrpc_cache_pipe_upcall, both before
    the original threads completed the call.

    With this, it is perfectly safe and correct to:
    - call cache_dequeue() if and only if we have just
    cleared CACHE_PENDING
    - call sunrpc_cache_pipe_upcall() (via cache_make_upcall)
    if and only if we have just set CACHE_PENDING.

    Reported-by: Bodo Stroesser
    Signed-off-by: NeilBrown
    Signed-off-by: Bodo Stroesser
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • Though clients we care about mostly don't do this, it is possible for
    rpc requests to be sent in multiple fragments. Here we have a sanity
    check to ensure that the final received rpc isn't too small--except that
    the number we're actually checking is the length of just the final
    fragment, not of the whole rpc. So a perfectly legal rpc that's
    unluckily fragmented could cause the server to close the connection
    here.
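
    A minimal sketch of the difference between the two checks, working on
    the 4-byte record-marking headers of one RPC record (top bit marks the
    last fragment, the low 31 bits carry the fragment length). The function
    names and the MIN_RPC_LEN value are hypothetical and chosen only for
    illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RM_LAST_FRAG 0x80000000u    /* record-marking "last fragment" bit */
#define RM_LEN_MASK  0x7fffffffu    /* fragment length, low 31 bits */
#define MIN_RPC_LEN  8u             /* hypothetical minimum rpc size */

/* Buggy check: looks only at the final fragment's length. */
static bool rpc_len_ok_buggy(const uint32_t *markers, size_t n)
{
    return n > 0 && (markers[n - 1] & RM_LEN_MASK) >= MIN_RPC_LEN;
}

/* Fixed check: sums every fragment of the record before comparing. */
static bool rpc_len_ok_fixed(const uint32_t *markers, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += markers[i] & RM_LEN_MASK;
    return n > 0 && total >= MIN_RPC_LEN;
}
```

    A record split as { 100, RM_LAST_FRAG | 4 } is 104 bytes long and
    perfectly legal, yet the buggy check rejects it because the final
    fragment is only 4 bytes.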

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • If we detect that an rpc is too short, we abort and close the
    connection. Except, there's a bug here: we're leaving sk_datalen
    nonzero without leaving any pages in the sk_pages array. The most
    likely result of the inconsistency is a subsequent crash in
    svc_tcp_clear_pages.

    Also demote the BUG_ON in svc_tcp_clear_pages to a WARN.

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Store a pointer to the gss mechanism used in the rq_cred and cl_cred.
    This will make it easier to enforce SP4_MACH_CRED, which needs to
    compare the mechanism used on the exchange_id with that used on
    protected operations.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Common helper to zero out fields of the svc_cred.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Merge bugfixes into my for-3.11 branch.

    J. Bruce Fields
     

29 Jun, 2013

5 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • No need to create pipes for a dying client, so just skip them.

    Note: we can safely dereference the client structure, because notification
    caller is holding sn->pipefs_sb_lock.

    Signed-off-by: Stanislav Kinsbursky
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • This patch moves all "registration" code to the new rpc_client_register()
    helper.
    This helper will be used later in the series to synchronize against PipeFS
    MOUNT/UMOUNT events.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • CPU#0                                   CPU#1
    -----------------------------           -----------------------------
    rpc_kill_sb
    sn->pipefs_sb = NULL                    rpc_release_client
    (UMOUNT_EVENT)                          rpc_free_auth
    rpc_pipefs_event
    rpc_get_client_for_event
    !atomic_inc_not_zero(cl_count)

                                            atomic_inc(cl_count)
                                            rpc_free_client
                                            rpc_clnt_remove_pipedir

    To fix this, this patch does the following:

    1) Calls the RPC_PIPEFS_UMOUNT notification with sn->pipefs_sb_lock held.
    2) Removes the SUNRPC client from the list only AFTER its pipes are destroyed.
    3) Doesn't hold a reference to the RPC client on notification: if the client
    is in the list, then it can't be destroyed while sn->pipefs_sb_lock is held
    by the notification caller.

    Signed-off-by: Stanislav Kinsbursky
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • Below is a race in which an RPC client can be created without PipeFS dentries:

    CPU#0                                   CPU#1
    -----------------------------           -----------------------------
    rpc_new_client                          rpc_fill_super
    rpc_setup_pipedir
    mutex_lock(&sn->pipefs_sb_lock)
    rpc_get_sb_net == NULL
    (no per-net PipeFS superblock)
                                            sn->pipefs_sb = sb;
                                            notifier_call_chain(MOUNT)
                                            (client is not in the list)
    rpc_register_client
    (client without pipes dentries)

    To fix this, this patch:
    1) makes the PipeFS mount notification call with pipefs_sb_lock held.
    2) releases pipefs_sb_lock on new SUNRPC client creation only after
    registration.

    Signed-off-by: Stanislav Kinsbursky
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     

28 Jun, 2013

1 commit

  • * freezer:
    af_unix: use freezable blocking calls in read
    sigtimedwait: use freezable blocking call
    nanosleep: use freezable blocking call
    futex: use freezable blocking call
    select: use freezable blocking call
    epoll: use freezable blocking call
    binder: use freezable blocking calls
    freezer: add new freezable helpers using freezer_do_not_count()
    freezer: convert freezable helpers to static inline where possible
    freezer: convert freezable helpers to freezer_do_not_count()
    freezer: skip waking up tasks with PF_FREEZER_SKIP set
    freezer: shorten freezer sleep time using exponential backoff
    lockdep: check that no locks held at freeze time
    lockdep: remove task argument from debug_check_no_locks_held
    freezer: add unsafe versions of freezable helpers for CIFS
    freezer: add unsafe versions of freezable helpers for NFS

    Rafael J. Wysocki
     

19 Jun, 2013

1 commit

  • We had a report of a reproducible WARNING:

    [ 1360.039358] ------------[ cut here ]------------
    [ 1360.043978] WARNING: at fs/dcache.c:1355 d_set_d_op+0x8d/0xc0()
    [ 1360.049880] Hardware name: HP Z200 Workstation
    [ 1360.054308] Modules linked in: nfsv4 nfs dns_resolver fscache nfsd
    auth_rpcgss nfs_acl lockd sunrpc sg acpi_cpufreq mperf coretemp kvm_intel kvm
    snd_hda_codec_realtek snd_hda_intel snd_hda_codec hp_wmi crc32c_intel
    snd_hwdep e1000e snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd
    sparse_keymap rfkill soundcore serio_raw ptp iTCO_wdt pps_core pcspkr
    iTCO_vendor_support mei microcode lpc_ich mfd_core wmi xfs libcrc32c sr_mod
    sd_mod cdrom crc_t10dif radeon i2c_algo_bit drm_kms_helper ttm ahci libahci
    drm i2c_core libata dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
    auth_rpcgss]
    [ 1360.107406] Pid: 8814, comm: mount.nfs4 Tainted: G I -------------- 3.9.0-0.55.el7.x86_64 #1
    [ 1360.116771] Call Trace:
    [ 1360.119219] [] warn_slowpath_common+0x70/0xa0
    [ 1360.125208] [] warn_slowpath_null+0x1a/0x20
    [ 1360.131025] [] d_set_d_op+0x8d/0xc0
    [ 1360.136159] [] __rpc_lookup_create_exclusive+0x4f/0x80 [sunrpc]
    [ 1360.143710] [] rpc_mkpipe_dentry+0x86/0x170 [sunrpc]
    [ 1360.150311] [] nfs_idmap_new+0x96/0x130 [nfsv4]
    [ 1360.156475] [] nfs4_init_client+0xad/0x2d0 [nfsv4]
    [ 1360.162902] [] ? idr_get_empty_slot+0x16f/0x3c0
    [ 1360.169062] [] ? idr_mark_full+0x52/0x60
    [ 1360.174615] [] ? idr_alloc+0x79/0xe0
    [ 1360.179826] [] ? __rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc]
    [ 1360.187635] [] ? rpc_init_wait_queue+0x13/0x20 [sunrpc]
    [ 1360.194493] [] nfs_get_client+0x27a/0x350 [nfs]
    [ 1360.200666] [] nfs4_set_client.isra.8+0x78/0x100 [nfsv4]
    [ 1360.207624] [] nfs4_create_server+0xf3/0x3a0 [nfsv4]
    [ 1360.214222] [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
    [ 1360.220644] [] mount_fs+0x39/0x1b0
    [ 1360.225691] [] ? __alloc_percpu+0x10/0x20
    [ 1360.231348] [] vfs_kern_mount+0x5f/0xf0
    [ 1360.236822] [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [ 1360.243246] [] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [ 1360.249410] [] ? get_nfs_version+0x27/0x80 [nfs]
    [ 1360.255659] [] nfs_fs_mount+0x5c5/0xd10 [nfs]
    [ 1360.261650] [] ? nfs_clone_super+0x140/0x140 [nfs]
    [ 1360.268074] [] ? param_set_portnr+0x60/0x60 [nfs]
    [ 1360.274406] [] mount_fs+0x39/0x1b0
    [ 1360.279443] [] ? __alloc_percpu+0x10/0x20
    [ 1360.285088] [] vfs_kern_mount+0x5f/0xf0
    [ 1360.290556] [] do_mount+0x1fd/0xa00
    [ 1360.295677] [] ? __get_free_pages+0xe/0x50
    [ 1360.301405] [] ? copy_mount_options+0x36/0x170
    [ 1360.307479] [] sys_mount+0x83/0xc0
    [ 1360.312515] [] system_call_fastpath+0x16/0x1b
    [ 1360.318503] ---[ end trace 8fa1f4cbc36094a7 ]---

    The problem is that we're ending up in __rpc_lookup_create_exclusive
    with a negative dentry that already has d_op set. A little debugging
    has shown that when we hit this, the d_ops are already set to
    simple_dentry_operations.

    I believe that what's happening is that during a mount, idmapd is racing
    in and doing a lookup of /var/lib/nfs/rpc_pipefs/nfs/clnt???/idmap.
    Before that dentry reference is released, the kernel races in to create
    that file and finds the new negative dentry, which already has the
    d_op set.

    This patch just avoids setting the d_op if it's already set.
    simple_dentry_operations and rpc_dentry_operations are functionally
    equivalent so it shouldn't matter which one it's set to.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
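
The approach described in the commit above, setting the dentry operations only when none are set yet, can be sketched in plain user-space C. All type and function names below (`dentry_model`, `dentry_ops_model`, `set_ops_if_unset()`) are hypothetical and only model the idea; this is not the kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a dentry operations table. */
struct dentry_ops_model {
	const char *name;
};

/* Two tables assumed to be functionally equivalent, as in the commit text. */
static const struct dentry_ops_model simple_ops_model = { "simple" };
static const struct dentry_ops_model rpc_ops_model = { "rpc" };

/* Hypothetical, minimal dentry: only the field of interest. */
struct dentry_model {
	const struct dentry_ops_model *d_op;
};

/*
 * Install ops only when none are set, so a dentry that a racing
 * lookup already initialised is left untouched instead of being
 * set a second time.
 */
static void set_ops_if_unset(struct dentry_model *dentry,
			     const struct dentry_ops_model *ops)
{
	if (dentry->d_op == NULL)
		dentry->d_op = ops;
}
```

A fresh dentry ends up with the requested table, while one that already carries a table keeps it, which is acceptable only under the stated assumption that the two tables behave the same.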
     

13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

07 Jun, 2013

3 commits


31 May, 2013

1 commit

  • Pull nfsd fixes from Bruce Fields:
    "A couple minor fixes for the (new to 3.10) gss-proxy code.

    And one regression from user-namespace changes. (XBMC clients were
    doing something admittedly weird--sending -1 gid's--but something that
    we used to allow.)"

    * 'for-3.10' of git://linux-nfs.org/~bfields/linux:
    svcrpc: fix failures to handle -1 uid's and gid's
    svcrpc: implement O_NONBLOCK behavior for use-gss-proxy
    svcauth_gss: fix error code in use_gss_proxy()

    Linus Torvalds
     

29 May, 2013

2 commits

  • As of f025adf191924e3a75ce80e130afcd2485b53bb8 "sunrpc: Properly decode
    kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
    (0xffffffff) uid or gid would fail with a badcred error.

    Reported symptoms were XBMC clients failing on upgrade of the NFS
    server; examination of the network trace showed them sending -1 as the
    gid.

    Reported-by: Julian Sikorski
    Tested-by: Julian Sikorski
    Cc: "Eric W. Biederman"
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
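
To make the regression above concrete: a 32-bit id of -1 is the all-ones value 0xffffffff. The sketch below contrasts a decoder that rejects that value with one that lets it through. The names `decode_id_strict()`, `decode_id_lenient()`, and `WIRE_ID_MINUS_ONE` are hypothetical, and what the server does with such an id afterwards is outside the scope of this sketch.

```c
#include <stdint.h>

/* -1 stored in a 32-bit unsigned id is the all-ones pattern. */
#define WIRE_ID_MINUS_ONE ((uint32_t)-1) /* 0xffffffff */

enum decode_status { DECODE_OK, DECODE_BADCRED };

/* Models the regressed behavior: a -1 id fails the whole credential. */
static enum decode_status decode_id_strict(uint32_t wire_id, uint32_t *out)
{
	if (wire_id == WIRE_ID_MINUS_ONE)
		return DECODE_BADCRED;
	*out = wire_id;
	return DECODE_OK;
}

/* Models the earlier, more permissive behavior: any id is accepted. */
static enum decode_status decode_id_lenient(uint32_t wire_id, uint32_t *out)
{
	*out = wire_id;
	return DECODE_OK;
}
```

Ordinary ids decode identically under both functions; only the all-ones value distinguishes them.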
     
  • Somebody noticed LTP was complaining about O_NONBLOCK opens of
    /proc/net/rpc/use-gss-proxy succeeding and then a following read
    hanging.

    I'm not convinced LTP really has any business opening random proc files
    and expecting them to behave a certain way. Maybe this isn't really a
    bug.

    But in any case the O_NONBLOCK behavior could be useful for someone that
    wants to test whether gss-proxy is up without waiting.

    Reported-by: Jan Stancek
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

23 May, 2013

1 commit

  • The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to
    be careful about ordering the calls to rpc_test_and_set_running(task) and
    rpc_clear_queued(task). If we get the order wrong, then we may end up
    testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped
    and changed the state of the rpc_task.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
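
The ordering constraint described above can be modelled with C11 atomics. This is a single-threaded sketch that shows the intended order of operations, not a reproduction of the race; `struct task_model`, the bit names, and `make_runnable()` are hypothetical and do not mirror the kernel's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TASK_RUNNING_BIT (1u << 0)
#define TASK_QUEUED_BIT  (1u << 1)

/* Hypothetical task with a single atomic state word. */
struct task_model {
	atomic_uint state;
};

/* Set RUNNING and report whether it was already set. */
static bool test_and_set_running(struct task_model *t)
{
	unsigned int old = atomic_fetch_or(&t->state, TASK_RUNNING_BIT);
	return (old & TASK_RUNNING_BIT) != 0;
}

/* Clear QUEUED; a lockless reader may act on the task after this. */
static void clear_queued(struct task_model *t)
{
	atomic_fetch_and(&t->state, ~TASK_QUEUED_BIT);
}

/*
 * Sample and set RUNNING first, and only then clear QUEUED. Once
 * QUEUED is clear, another context may change the task's state, so
 * the RUNNING test must not come afterwards.
 * Returns true if the caller is responsible for starting the task.
 */
static bool make_runnable(struct task_model *t)
{
	bool need_wakeup = !test_and_set_running(t);

	clear_queued(t);
	return need_wakeup;
}
```

Swapping the two calls in `make_runnable()` would give the same result in this single-threaded model, which is exactly why the bug is easy to miss: the difference only shows up when another context observes QUEUED being cleared.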
     

21 May, 2013

1 commit


16 May, 2013

3 commits

  • This seems to have been overlooked when we did the namespace
    conversion. If a container is running a legacy version of rpc.gssd
    then it will be disrupted if the global 'pipe_version' is set by a
    container running the new version of rpc.gssd.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Recent changes to the NFS security flavour negotiation mean that
    we have a stronger dependency on rpc.gssd. If the latter is not
    running, because the user failed to start it, then we time out
    and mark the container as not having an instance. We then
    use that information to time out faster the next time.

    If, on the other hand, the rpc.gssd successfully binds to an rpc_pipe,
    then we mark the container as having an rpc.gssd instance.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If wait_event_interruptible_timeout() is successful, it returns
    the number of jiffies remaining until the timeout. In that
    case, we should be retrying the upcall.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
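
The return-value handling discussed above can be summarised in a small classifier. This is a user-space sketch: it takes the value that wait_event_interruptible_timeout() would have returned as a plain `long`, and the names `classify_wait_result()` and `should_retry_upcall()` are hypothetical.

```c
#include <stdbool.h>

enum wait_outcome {
	WAIT_TIMED_OUT,     /* 0: timeout elapsed, condition still false */
	WAIT_CONDITION_MET, /* > 0: condition became true in time */
	WAIT_INTERRUPTED    /* < 0: interrupted by a signal */
};

/* Map the raw return value onto the three possible outcomes. */
static enum wait_outcome classify_wait_result(long ret)
{
	if (ret < 0)
		return WAIT_INTERRUPTED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_CONDITION_MET;
}

/* A positive return means success, so the upcall should be retried. */
static bool should_retry_upcall(long ret)
{
	return classify_wait_result(ret) == WAIT_CONDITION_MET;
}
```

Treating every nonzero value as an error would wrongly fold the success case into the failure path, which is the mistake the commit addresses.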