24 Dec, 2008

16 commits

  • When we call update_open_stateid(), we need to be certain that we don't
    race with a delegation return. While we could do this by grabbing the
    nfs_client->cl_lock, a dedicated spin lock in the delegation structure
    will scale better.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
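
A rough userspace illustration of the locking idea in the commit above: each delegation carries its own lock, so an open-stateid update and a delegation return serialize on that one object rather than on a client-wide lock. Everything here is hypothetical. The `deleg` structure, its fields, and the `atomic_flag` spin loop are stand-ins, not the kernel's delegation structure or spinlock type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a delegation with a dedicated lock. */
struct deleg {
    atomic_flag lock;      /* protects 'returning' and 'stateid' */
    bool returning;        /* set once a delegation return has started */
    unsigned int stateid;  /* simplified stand-in for the open stateid */
};

static void deleg_init(struct deleg *d)
{
    assert(d);
    atomic_flag_clear(&d->lock);
    d->returning = false;
    d->stateid = 0;
}

static void deleg_lock(struct deleg *d)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
        ;
}

static void deleg_unlock(struct deleg *d)
{
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Returns true if the stateid was updated, false if a return got there first. */
static bool deleg_update_stateid(struct deleg *d, unsigned int new_stateid)
{
    bool ok;

    deleg_lock(d);
    ok = !d->returning;
    if (ok)
        d->stateid = new_stateid;
    deleg_unlock(d);
    return ok;
}

/* Marks the delegation as being returned; later updates are refused. */
static void deleg_begin_return(struct deleg *d)
{
    deleg_lock(d);
    d->returning = true;
    deleg_unlock(d);
}
```

Because the lock lives in the object, two unrelated delegations never contend with each other, which is the scaling argument the commit message makes.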
     
  • If the admin has specified the "noresvport" option for an NFS mount
    point, the kernel's NFS client uses an unprivileged source port for
    the main NFS transport. The kernel's lockd client should use an
    unprivileged port in this case as well.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • If the admin has specified the "noresvport" option for an NFS mount
    point, the kernel's NFS client uses an unprivileged source port for
    the main NFS transport. The kernel's mountd client should use an
    unprivileged port in this case as well.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The standard default security setting for NFS is AUTH_SYS. An NFS
    client connects to NFS servers via a privileged source port and a
    fixed standard destination port (2049). The client sends raw uid and
    gid numbers to identify users making NFS requests, and the server
    assumes an appropriate authority on the client has vetted these
    values because the source port is privileged.

    On Linux, by default in-kernel RPC services use a privileged port in
    the range between 650 and 1023 to avoid using source ports of well-
    known IP services. Using such a small range limits the number of NFS
    mount points and the number of unique NFS servers to which a client
    can connect concurrently.

    An NFS client can use unprivileged source ports to expand the range of
    source port numbers, allowing more concurrent server connections and
    more NFS mount points. Servers must explicitly allow NFS connections
    from unprivileged ports for this to work.

    In the past, bumping the value of the sunrpc.max_resvport sysctl on
    the client would permit the NFS client to use unprivileged ports.
    Bumping this setting also changes the maximum port number used by
    other in-kernel RPC services, some of which still required a port
    number less than 1023.

    This is exacerbated by the way source port numbers are chosen by the
    Linux RPC client, which starts at the top of the range and works
    downwards. Bumping the maximum therefore means that all RPC services
    requesting a source port will likely get an unprivileged port instead
    of a privileged one.

    Changing this setting affects all NFS mount points on a client. A
    sysadmin could not selectively choose which mount points would use
    non-privileged ports and which would not.

    Lastly, this mechanism of expanding the limit on the number of NFS
    mount points was entirely undocumented.

    To address the need for the NFS client to use a large range of source
    ports without interfering with the activity of other in-kernel RPC
    services, we introduce a new NFS mount option. This option explicitly
    tells only the NFS client to use a non-privileged source port when
    communicating with the NFS server for one specific mount point.

    This new mount option is called "resvport," like the similar NFS mount
    option on FreeBSD and Mac OS X. A sister patch for nfs-utils will be
    submitted that documents this new option in nfs(5).

    The default setting for this new mount option requires the NFS client
    to use a privileged port, as before. Explicitly specifying the
    "noresvport" mount option allows the NFS client to use an unprivileged
    source port for this mount point when connecting to the NFS server
    port.

    This mount option is supported only for text-based NFS mounts.

    [ Sidebar: it is widely known that security mechanisms based on the
    use of privileged source ports are ineffective. However, the NFS
    client can combine the use of unprivileged ports with the use of
    secure authentication mechanisms, such as Kerberos. This allows a
    large number of connections and mount points while ensuring a useful
    level of security.

    Eventually we may change the default setting for this option
    depending on the security flavor used for the mount. For example,
    if the mount is using only AUTH_SYS, then the default setting will
    be "resvport;" if the mount is using a strong security flavor such
    as krb5, the default setting will be "noresvport." ]

    Signed-off-by: Chuck Lever
    [Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client()
    was being called with incorrect arguments.]
    Signed-off-by: Trond Myklebust

    Chuck Lever
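
The port arithmetic described in the commit above can be sketched as follows. This is a simplified userspace model, not the RPC client's actual code: `pick_source_port()`, `is_privileged()`, and the `in_use` table are hypothetical names, and the range 650 to 1023 is taken from the commit message.

```c
#include <assert.h>

#define PRIV_PORT_LIMIT 1024  /* ports below this value are privileged */

static int is_privileged(int port)
{
    return port > 0 && port < PRIV_PORT_LIMIT;
}

/* Number of candidate source ports in the inclusive range [min, max]. */
static int port_range_size(int min, int max)
{
    assert(min <= max);
    return max - min + 1;
}

/*
 * Scan from the top of the range downward and return the first port
 * not marked in use, or -1 if the range is exhausted. 'in_use' is
 * indexed by port number and must have at least max + 1 entries.
 */
static int pick_source_port(int min, int max, const unsigned char *in_use)
{
    int port;

    assert(min >= 1 && min <= max && max <= 65535);
    for (port = max; port >= min; port--)
        if (!in_use[port])
            return port;
    return -1;
}
```

With the range 650 to 1023 there are only 374 candidates, all privileged. Raising the maximum to, say, 2000 makes the first candidate 2000, which is unprivileged for every caller of the scan; that is the side effect a per-mount option avoids.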
     
  • Make it possible for the NFSv4 mount set up logic to pass mount option
    flags down the stack to nfs_create_rpc_client().

    This is immediately useful if we want NFS mount options to modulate
    settings of the underlying RPC transport, but it may be useful at some
    later point if other parts of the NFSv4 mount initialization logic
    want to know what the mount options are.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The nfs_create_rpc_client() function sets up an RPC client for an NFS
    mount point. Add an option that allows it to set up an RPC transport
    from an unprivileged port.

    Instead of having nfs_create_rpc_client()'s callers retain local
    knowledge about how to set up an RPC client, create a couple of flag
    arguments to control the use of RPC_CLNT_CREATE flags.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
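
A minimal sketch of the "flag arguments" idea from the commit above: the caller passes plain booleans and a single helper translates them into creation flags. The names `SKETCH_CREATE_DISCRTRY`, `SKETCH_CREATE_NONPRIVPORT`, and `sketch_create_flags()` are hypothetical stand-ins, and the bit values are arbitrary; they are not the kernel's RPC_CLNT_CREATE definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RPC client creation flags. */
enum {
    SKETCH_CREATE_DISCRTRY    = 1u << 0, /* disconnect before retrying */
    SKETCH_CREATE_NONPRIVPORT = 1u << 1, /* use an unprivileged source port */
};

/* Translate the caller's boolean arguments into a creation-flag mask. */
static unsigned int sketch_create_flags(bool discrtry, bool noresvport)
{
    unsigned int flags = 0;

    if (discrtry)
        flags |= SKETCH_CREATE_DISCRTRY;
    if (noresvport)
        flags |= SKETCH_CREATE_NONPRIVPORT;
    return flags;
}
```

Keeping the translation in one place means callers only state what they want, and the knowledge of which flag bits exist stays with the function that creates the client.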
     
  • Clean up: convert nfs_mount() to take a single data structure argument to make
    it simpler to add more arguments.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
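
The clean-up pattern in the commit above, replacing a long positional argument list with one structure, might look roughly like this. The structure and field names (`sketch_mount_request`, `hostname`, `dirpath`, `version`, `noresvport`) and the validation logic are illustrative assumptions, not the real nfs_mount() interface.

```c
#include <stddef.h>

/* Hypothetical request structure; field names are illustrative only. */
struct sketch_mount_request {
    const char *hostname;
    const char *dirpath;
    unsigned int version;
    int noresvport;  /* a later addition: no call site gains a new positional argument */
};

/* Validates the request; returns 0 on success, -1 on a malformed request. */
static int sketch_mount(const struct sketch_mount_request *req)
{
    if (req == NULL || req->hostname == NULL || req->dirpath == NULL)
        return -1;
    if (req->dirpath[0] != '/')
        return -1;  /* export paths are expected to be absolute */
    return 0;
}
```

Callers that use designated initializers leave any field they do not mention zeroed, so adding a field such as `noresvport` does not require touching existing call sites.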
     
  • Clean up: The nfs_mount() function is not to be used outside of the
    NFS client. Move its public declaration to fs/nfs/internal.h.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up: I'm about to move the declaration of nfs_mount into
    fs/nfs/internal.h and include it in fs/nfs/nfsroot.c. There's a
    conflicting definition of nfs_path in fs/nfs/internal.h and
    fs/nfs/nfsroot.c, so rename the private one.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • My understanding is that there is a push to turn the kernel_thread
    interface into a non-exported symbol and move all kernel threads to use
    the kthread API. This patch changes lockd to use kthread_run to spawn
    the reclaimer thread.

    I've made the assumption here that the extra module references taken
    when we spawn this thread are unnecessary and removed them. I've also
    added a KERN_ERR printk that pops if the thread can't be spawned to warn
    the admin that the locks won't be reclaimed.

    In the future, it would be nice to be able to notify userspace that
    locks have been lost (probably by implementing SIGLOST), and to add some
    good policies about how long we should keep trying to reclaim the locks.

    Finally, I removed a comment about memory leaks that I believe is
    obsolete and added a new one to clarify the result of sending a SIGKILL
    to the reclaimer thread. As best I can tell, doing so doesn't actually
    cause a memory leak.

    I consider this patch 2.6.29 material.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Again, this has never been intended as a public abi for out-of-tree
    modules.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • We've never considered the sunrpc code as part of any ABI to be used by
    out-of-tree modules.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Somehow, this escaped the previous purge. There should be no need to keep
    any extra locks in the XDR callbacks.

    The NFS client XDR code only writes into private objects, whereas all reads
    of shared objects are confined to fields that do not change, such as
    filehandles...

    Ditto for lockd, the NFSv2/v3 client mount code, and rpcbind.

    The nfsd XDR code may require the BKL, but since it does a synchronous RPC
    call from a thread that already holds the lock, that issue is moot.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • aops->readpages() and its NFS helper readpage_async_filler() will only
    be called to do readahead I/O for newly allocated pages. So it's not
    necessary to test for the always 0 dirty/uptodate page flags.

    Removing the nfs_wb_page() call also fixes a readahead bug: NFS
    readahead has been synchronous since 2.6.23, because that call will
    clear PG_readahead, which is the reminder for asynchronous readahead.

    More background: the PG_readahead page flag is shared with PG_reclaim,
    one for the read path and the other for the write path.
    clear_page_dirty_for_io() unconditionally clears PG_readahead to prevent
    possible readahead residuals, assuming that it is always called in the
    write path. However, NFS is the one and only exception in that it
    _always_ calls clear_page_dirty_for_io() in the read path, i.e. for
    readpages()/readpage().

    Cc: Trond Myklebust
    Signed-off-by: Wu Fengguang
    Signed-off-by: Trond Myklebust

    Wu Fengguang
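
The flag aliasing described in the commit above can be modelled in a few lines. This is a toy model under stated assumptions: the `SK_` constants, their bit positions, and `sk_clear_page_dirty_for_io()` are hypothetical and only mimic the behaviour the commit message describes.

```c
/* Hypothetical page flag bits; READAHEAD and RECLAIM share one bit. */
enum {
    SK_PG_DIRTY     = 1u << 0,
    SK_PG_RECLAIM   = 1u << 1,
    SK_PG_READAHEAD = SK_PG_RECLAIM  /* same bit, meaning depends on the path */
};

/*
 * Models the write-path helper: clears the dirty bit and unconditionally
 * clears the shared reclaim/readahead bit. Returns nonzero if the page
 * was dirty.
 */
static int sk_clear_page_dirty_for_io(unsigned int *flags)
{
    int was_dirty = (*flags & SK_PG_DIRTY) != 0;

    *flags &= ~(unsigned int)(SK_PG_DIRTY | SK_PG_READAHEAD);
    return was_dirty;
}
```

If a read path calls this helper on a clean page that carries the readahead marker, the marker is lost even though nothing was dirty, which is the effect that turned the readahead synchronous.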
     

21 Dec, 2008

3 commits

  • Impact: Prevent kernel crash with posix timer clockid CLOCK_MONOTONIC_RAW

    commit 2d42244ae71d6c7b0884b5664cf2eda30fb2ae68 (clocksource:
    introduce CLOCK_MONOTONIC_RAW) introduced a new clockid, which is only
    available to read out the raw, non-NTP-adjusted system time.

    The above commit did not prevent a posix timer from being created
    with that clockid. The timer_create() syscall succeeds and initializes
    the timer to a nonexistent hrtimer base. When the timer is deleted,
    either by timer_delete() or by the exit() cleanup, the kernel crashes.

    Prevent the creation of timers for CLOCK_MONOTONIC_RAW by setting the
    posix clock function to no_timer_create which returns an error code.

    Reported-and-tested-by: Eric Sesterhenn
    Signed-off-by: Thomas Gleixner
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
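
The shape of the fix above, a per-clock function table in which one clock refuses timer creation, can be sketched in userspace. All names here (`sk_clock`, `sk_timer_create()`, the `SK_CLOCK_` ids) are hypothetical, and the choice of `EOPNOTSUPP` as the refusal code is an assumption; the commit message only says that an error code is returned.

```c
#include <errno.h>

/* Hypothetical per-clock operations table entry. */
struct sk_clock {
    const char *name;
    int (*timer_create)(void);
};

static int sk_common_timer_create(void)
{
    return 0;  /* timer set up successfully in this model */
}

static int sk_no_timer_create(void)
{
    return -EOPNOTSUPP;  /* assumed error code; this clock is read-only */
}

enum { SK_CLOCK_MONOTONIC, SK_CLOCK_MONOTONIC_RAW, SK_CLOCK_COUNT };

static const struct sk_clock sk_clocks[SK_CLOCK_COUNT] = {
    [SK_CLOCK_MONOTONIC]     = { "monotonic",     sk_common_timer_create },
    [SK_CLOCK_MONOTONIC_RAW] = { "monotonic_raw", sk_no_timer_create },
};

/* Dispatch timer creation through the table; unknown ids are rejected. */
static int sk_timer_create(int clockid)
{
    if (clockid < 0 || clockid >= SK_CLOCK_COUNT)
        return -EINVAL;
    return sk_clocks[clockid].timer_create();
}
```

Refusing at creation time means no timer object ever points at a missing base, so the later delete or exit path has nothing invalid to tear down.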
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: change simple_strtol to simple_strtoul
    9p: convert d_iname references to d_name.name
    9p: Remove potentially bad parameter from function entry debug print.

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: fix resume (S2R) broken by Intel microcode module, on A110L
    x86 gart: don't complain if no AMD GART found
    AMD IOMMU: panic if completion wait loop fails
    AMD IOMMU: set cmd buffer pointers to zero manually
    x86: re-enable MCE on secondary CPUS after suspend/resume
    AMD IOMMU: allocate rlookup_table with __GFP_ZERO

    Linus Torvalds
     

20 Dec, 2008

10 commits

  • Impact: fix deadlock

    This is in response to the following bug report:

    Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100
    Subject : resume (S2R) broken by Intel microcode module, on A110L
    Submitter : Andreas Mohr
    Date : 2008-11-25 08:48 (19 days old)
    Handled-By : Dmitry Adamushko

    [ The deadlock scenario has been discovered by Andreas Mohr ]

    I think I might have a logical explanation why the system:

    (http://bugzilla.kernel.org/show_bug.cgi?id=12100)

    might hang upon resuming; OTOH, it should likely have hung each and every time.

    (1) possible deadlock in microcode_resume_cpu() if either 'if' section is
    taken;

    (2) now, I don't see it in spec. and can't experimentally verify it (newer
    ucodes don't seem to be available for my Core2duo)... but logically, I'd
    think that when read upon resuming, the 'microcode revision' (MSR 0x8B) should
    be back to its original one (we need to reload ucode anyway so it doesn't seem
    logical if a cpu doesn't drop the version)... if so, the comparison with
    memcmp() for the full 'struct cpu_signature' is wrong... and that's how one of
    the aforementioned 'if' sections might have been triggered - leading to a
    deadlock.

    Obviously, in my tests I simulated loading/resuming with the ucode of the same
    version (just to see that the file is loaded/re-loaded upon resuming) so this
    issue has never popped up.

    I'd appreciate it if someone with an appropriate system could try the
    2nd patch (titled "fix a comparison && deadlock...").

    In any case, the deadlock situation is a must-have fix.

    Reported-by: Andreas Mohr
    Signed-off-by: Dmitry Adamushko
    Tested-by: Andreas Mohr
    Signed-off-by: Ingo Molnar

    Signed-off-by: Ingo Molnar

    Dmitry Adamushko
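
Point (2) of the message above argues that a whole-structure memcmp() is the wrong test if the revision resets across suspend. A small sketch of that argument follows, assuming a signature made of three fields named `sig`, `pf`, and `rev`; the layout, the helper names, and the idea of comparing only `sig` and `pf` are assumptions for illustration, not the patch itself.

```c
#include <string.h>

/* Assumed layout: CPU signature, platform flags, microcode revision. */
struct sk_cpu_signature {
    unsigned int sig;
    unsigned int pf;
    unsigned int rev;
};

/* Whole-structure comparison: also sensitive to the revision field. */
static int sk_same_bytes(const struct sk_cpu_signature *a,
                         const struct sk_cpu_signature *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* Identity comparison: ignores the revision, which may reset on resume. */
static int sk_same_cpu(const struct sk_cpu_signature *a,
                       const struct sk_cpu_signature *b)
{
    return a->sig == b->sig && a->pf == b->pf;
}
```

If the saved signature records a loaded revision and the resumed CPU reports its original one, the byte comparison reports a mismatch for what is physically the same CPU, while the field comparison does not.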
     
  • Since v9ses->uid is unsigned, it would seem better to use simple_strtoul than
    simple_strtol.

    A simplified version of the semantic patch that makes this change is as
    follows: (http://www.emn.fr/x-info/coccinelle/)

    //
    @r2@
    long e;
    position p;
    @@

    e = simple_strtol@p(...)

    @@
    position p != r2.p;
    type T;
    T e;
    @@

    e =
    - simple_strtol@p
    + simple_strtoul
    (...)
    //

    Signed-off-by: Julia Lawall
    Acked-by: Eric Van Hensbergen

    Julia Lawall
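
The reasoning behind the change above (parse into an unsigned type when the destination is unsigned) has a rough userspace analogue using the standard C `strtoul()`. This is not the kernel's simple_strtoul(); `sk_parse_uid()` is a hypothetical helper, and the extra range and sign checks are additions for the sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal uid string; returns 0 and stores the value on success. */
static int sk_parse_uid(const char *s, unsigned int *out)
{
    char *end;
    unsigned long v;

    /* strtoul() accepts a leading '-' and wraps the result, so reject it. */
    if (s[0] < '0' || s[0] > '9')
        return -1;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT_MAX)
        return -1;

    *out = (unsigned int)v;
    return 0;
}
```

Values in the upper half of the unsigned range, such as 4294967294, parse cleanly this way without passing through a signed intermediate.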
     
  • d_iname is rubbish for long file names.
    Use d_name.name in printks instead.

    Signed-off-by: Wu Fengguang
    Acked-by: Eric Van Hensbergen

    Wu Fengguang
     
  • Signed-off-by: Duane Griffin
    Signed-off-by: Eric Van Hensbergen

    Duane Griffin
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
    [SCSI] mpt fusion: clear list of outstanding commands on host reset
    [SCSI] scsi_lib: only call scsi_unprep_request() under queue lock
    [SCSI] ibmvstgt: move crq_queue_create to the end of initialization
    [SCSI] libiscsi REGRESSION: fix passthrough support with older iscsi tools
    [SCSI] aacraid: disable Dell Percraid quirk on Adaptec 2200S and 2120S

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: Fix a Oops bug in omap soc driver.
    ALSA: hda - Remove non-working headphone control for Dell laptops
    ALSA: hda - Add no-jd model for IDT 92HD73xx
    ALSA: Revert "ALSA: hda: removed unneeded hp_nid references"
    ALSA: hda - Add quirk for Dell Studio 17
    ALSA: hda - Fix silent HP output on D975

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cciss: fix problem that deleting multiple logical drives could cause a panic

    Linus Torvalds
     
  • * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
    drm/i915: GEM on PAE has problems - disable it for now.
    drm/i915: Don't return busy for buffers left on the flushing list.

    Linus Torvalds
     
  • * 'for-linus' of git://neil.brown.name/md:
    md: Don't read past end of bitmap when reading bitmap.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
    PCI hotplug: ibmphp: Fix module ref count underflow
    PCI hotplug: acpiphp wants a 64-bit _SUN
    PCI: pciehp: fix unexpected power off with pciehp_force
    PCI: fix aer resume sanity check

    Linus Torvalds
     

19 Dec, 2008

11 commits