18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus is an unsigned entity. Using a negative value is an
    out-of-bounds access by definition.

    2)
    On x86_64, unsigned 32-bit data that are mixed with pointers, via
    array indexing or via offsets added to or subtracted from pointers,
    are preferred over signed 32-bit data.

    An "int" used as an array index needs to be sign-extended
    to 64 bits before being used.

    void g(long);

    void f(long *p, int i)
    {
        g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction, which isn't necessary if the variable
    is unsigned, because on x86_64 writing a 32-bit register implicitly
    zero-extends it to 64 bits.

    Now, there is the net_generic() function which, you guessed it, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
        ...
        ptr = ng->ptr[id - 1];
        ...
    }

    And this function is used a lot, so those sign extensions add up.

    The patch snipes ~1730 bytes off an allyesconfig kernel (without all
    the junk messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately, some functions actually grow bigger.
    This is a seemingly random artefact of code generation, with the
    register allocator being used differently. gcc decides that some
    variable needs to live in the new r8+ registers, and every access now
    requires a REX prefix. Or it is shifted into r12, so the [r12+0]
    addressing mode has to be used, which is longer than [r8].

    However, the overall balance is in the negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function                     old     new  delta
    nfsd4_lock                  3886    3959    +73
    tipc_link_build_proto_msg   1096    1140    +44
    mac80211_hwsim_new_radio    2776    2808    +32
    tipc_mon_rcv                1032    1058    +26
    svcauth_gss_legacy_init     1413    1429    +16
    tipc_bcbase_select_primary   379     392    +13
    nfsd4_exchange_id           1247    1260    +13
    nfsd4_setclientid_confirm    782     793    +11
    ...
    put_client_renew_locked      494     480    -14
    ip_set_sockfn_get            730     716    -14
    geneve_sock_add              829     813    -16
    nfsd4_sequence_done          721     703    -18
    nlmclnt_lookup_host          708     686    -22
    nfsd4_lockt                 1085    1063    -22
    nfs_get_client              1077    1050    -27
    tcf_bpf_init                1106    1076    -30
    nfsd4_encode_fattr          5997    5930    -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
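
The two technical points of the commit above, sign versus zero extension and an unsigned id indexing ng->ptr[id - 1], can be sketched in userspace C. This is a minimal model, not kernel code: widen_signed_index(), widen_unsigned_index(), struct net_generic_model and net_generic_model_get() are hypothetical names, RCU protection is left out, and the sketch assumes (as the quoted ng->ptr[id - 1] implies) that valid ids start at 1 and that the patch turns id into an unsigned int. Whether a compiler actually emits MOVSX depends on the target and optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* Widening a signed 32-bit index copies its sign bit into the upper
 * 32 bits (sign extension, the job MOVSX does on x86_64). */
static int64_t widen_signed_index(int32_t i)
{
	return (int64_t)i;
}

/* Widening an unsigned 32-bit index leaves the upper 32 bits zero
 * (zero extension, which x86_64 performs implicitly whenever a 32-bit
 * register is written). */
static uint64_t widen_unsigned_index(uint32_t i)
{
	return (uint64_t)i;
}

#define NET_GENERIC_MODEL_SLOTS 8

/* Hypothetical stand-in for struct net_generic: a count of valid slots
 * plus the per-subsystem pointers. RCU protection is left out. */
struct net_generic_model {
	unsigned int len;
	void *ptr[NET_GENERIC_MODEL_SLOTS];
};

/* Lookup with an unsigned id, assuming ids start at 1. There is no
 * negative id to sign-extend; id 0 would wrap id - 1 to UINT_MAX, so
 * the assertion rejects it before the array is touched. */
static void *net_generic_model_get(const struct net_generic_model *ng,
				   unsigned int id)
{
	assert(id >= 1 && id <= ng->len);
	return ng->ptr[id - 1];
}
```

The same bit pattern 0xFFFFFFFF widens to -1 as a signed index but to 4294967295 as an unsigned one, which is why an unsigned id can never address memory before the start of the array.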
     

05 Oct, 2016

1 commit

  • boot_time is represented as a struct timespec.
    struct timespec and CURRENT_TIME are not y2038 safe.
    Overall, the plan is to use timespec64 and ktime_t for
    all internal kernel representations of timestamps.
    CURRENT_TIME will also be removed.

    boot_time is used to construct the nfs client boot verifier.

    Use ktime_t to represent boot_time and ktime_get_real() for
    the boot_time value.

    Following Trond's request (https://lkml.org/lkml/2016/6/9/22),
    use ktime_t instead of converting to struct timespec64.

    Use the upper and lower 32-bit halves of ktime_t for the boot
    verifier.

    Use the lower 32-bit half of ktime_t for the authsys_parms
    stamp field.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Cc: Trond Myklebust
    Cc: Anna Schumaker
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Anna Schumaker

    Deepa Dinamani
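
The split of a 64-bit boot time into verifier words, as described in the commit above, can be sketched as follows. This is an illustration under stated assumptions, not the NFS client's code: ktime_t is modeled as a plain signed 64-bit nanosecond count (the hypothetical ktime_model_t), the upper half is assumed to go into the first verifier word, and conversion to network byte order is omitted. boot_verifier_words() and authsys_stamp() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in: ktime_t modeled as a signed 64-bit count of
 * nanoseconds, like the value returned by ktime_get_real(). */
typedef int64_t ktime_model_t;

/* Split the 64-bit boot time into two 32-bit verifier words, upper
 * half first (an assumption). Byte-order conversion is left out. */
static void boot_verifier_words(ktime_model_t boot_time, uint32_t verf[2])
{
	/* Converting to uint64_t is well defined even for negative values. */
	uint64_t t = (uint64_t)boot_time;

	verf[0] = (uint32_t)(t >> 32);
	verf[1] = (uint32_t)t;
}

/* The authsys_parms stamp keeps only the lower 32 bits. */
static uint32_t authsys_stamp(ktime_model_t boot_time)
{
	return (uint32_t)(uint64_t)boot_time;
}
```

Because the full 64-bit value is kept, two boots whose times differ only in the upper bits still produce different verifiers, which would not hold for the truncated 32-bit stamp alone.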
     

13 Nov, 2014

1 commit

  • The rpc_pipefs code isn't thread-safe, leading to occasional use-after-free
    bugs when running xfstests generic/241 (dbench).

    Signed-off-by: Christoph Hellwig
    Link: http://lkml.kernel.org/r/1411740170-18611-2-git-send-email-hch@lst.de
    Cc: stable@vger.kernel.org # 3.17.x
    Signed-off-by: Trond Myklebust

    Christoph Hellwig
     

05 Aug, 2014

1 commit

  • The usage of pid_ns->child_reaper->nsproxy->net_ns in
    nfs_server_list_open and nfs_client_list_open is not safe.

    /proc for a pid namespace can remain mounted after all of the
    processes in that pid namespace have exited. There are also times
    before the initial process in a pid namespace has started, or after
    the initial process in a pid namespace has exited, where
    pid_ns->child_reaper can be NULL or stale, making the idiom
    pid_ns->child_reaper->nsproxy a double whammy of problems.

    Luckily, all that needs to happen is to move /proc/fs/nfsfs/servers
    and /proc/fs/nfsfs/volumes under /proc/net, as /proc/net/nfsfs/servers
    and /proc/net/nfsfs/volumes, add a symlink from the original location,
    and use seq_open_net() as it was designed to be used.

    Cc: stable@vger.kernel.org
    Cc: Trond Myklebust
    Cc: Stanislav Kinsbursky
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

02 Oct, 2012

4 commits


31 Jul, 2012

1 commit

  • This patch exports symbols needed by the v4 module. In addition, I
    switch over to using IS_ENABLED() to check whether CONFIG_NFS_V4 or
    CONFIG_NFS_V4_MODULE is set.

    The module (nfs4.ko) will be created in the same directory as nfs.ko and
    will be automatically loaded the first time you try to mount over NFS v4.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
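
IS_ENABLED() evaluates to 1 when a Kconfig option is either built in (CONFIG_FOO defined as 1) or built as a module (CONFIG_FOO_MODULE defined as 1), which is why one check can replace testing CONFIG_NFS_V4 and CONFIG_NFS_V4_MODULE separately. The sketch below reproduces the placeholder trick under hypothetical DEMO_ names; it is modeled on, not copied from, the kernel's include/linux/kconfig.h, and the extra trailing 0 is an addition of this sketch.

```c
/*
 * Hypothetical DEMO_ versions of the placeholder trick behind the
 * kernel's IS_ENABLED(). Kconfig defines CONFIG_FOO as 1 for a built-in
 * option and CONFIG_FOO_MODULE as 1 for a modular one.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second(ignored, val, ...) val

/* Expands to 1 if x is a macro defined as 1, and to 0 otherwise. */
#define demo_is_set(x) demo_is_set_(x)
#define demo_is_set_(val) demo_is_set__(DEMO_ARG_PLACEHOLDER_##val)

/* A defined option turns the placeholder into "0," which shifts the 1
 * into second position; an undefined one leaves a junk token glued to
 * the 1, so 0 ends up second. The trailing 0 keeps the variadic part
 * non-empty for strict C99/C11 compilers. */
#define demo_is_set__(arg1_or_junk) demo_take_second(arg1_or_junk 1, 0, 0)

/* True for =y (CONFIG_FOO) as well as =m (CONFIG_FOO_MODULE). */
#define DEMO_IS_ENABLED(option) \
	(demo_is_set(option) || demo_is_set(option##_MODULE))
```

Because the result is an ordinary integer constant expression, it can also sit in a plain if (...) statement, so both branches are still compiled and type-checked in every configuration.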
     

23 May, 2012

1 commit

  • Currently our NFS client assigns a unique SETCLIENTID boot verifier
    for each server IP address it knows about. It's set to CURRENT_TIME
    when the struct nfs_client for that server IP is created.

    During the SETCLIENTID operation, our client also presents an
    nfs_client_id4 string to servers, as an identifier on which the server
    can hang all of this client's NFSv4 state. Our client's
    nfs_client_id4 string is unique for each server IP address.

    An NFSv4 server is obligated to wipe all NFSv4 state associated with
    an nfs_client_id4 string when the client presents the same
    nfs_client_id4 string along with a changed SETCLIENTID boot verifier.

    When our client unmounts the last of a server's shares, it destroys
    that server's struct nfs_client. The next time the client mounts that
    NFS server, it creates a fresh struct nfs_client with a fresh boot
    verifier. On seeing the fresh verifier, the server wipes any previous
    NFSv4 state associated with that nfs_client_id4.

    However, NFSv4.1 clients are supposed to present the same
    nfs_client_id4 string to all servers. And, to support Transparent
    State Migration, the same nfs_client_id4 string should be presented
    to all NFSv4.0 servers so they recognize that migrated state for this
    client belongs with state a server may already have for this client.
    (This is known as the Uniform Client String model).

    If the nfs_client_id4 string is the same but the boot verifier changes
    for each server IP address, SETCLIENTID and EXCHANGE_ID operations
    from such a client could unintentionally result in a server wiping a
    client's previously obtained lease.

    Thus, if our NFS client is going to use a fixed nfs_client_id4 string,
    either for NFSv4.0 or NFSv4.1 mounts, our NFS client should use a
    boot verifier that does not change depending on server IP address.
    Replace our current per-nfs_client boot verifier with a per-nfs_net
    boot verifier.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
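
The failure mode described in the commit above can be modeled in a few lines. The sketch assumes a server that keeps one record per nfs_client_id4 string and applies only the rule quoted in the commit (same string, changed boot verifier, state wiped); struct server_client_record and server_establish() are hypothetical, and a real server's SETCLIENTID/EXCHANGE_ID handling is far richer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record a server keeps for one nfs_client_id4 string. */
struct server_client_record {
	char id[64];
	uint64_t verifier;
	bool has_lease;
};

/*
 * Models only the rule quoted in the commit: the same nfs_client_id4
 * string presented with a different boot verifier is treated as a
 * client reboot, so the previous lease is wiped. The single record is
 * then overwritten with the new id and verifier.
 * Returns true when prior state was discarded.
 */
static bool server_establish(struct server_client_record *rec,
			     const char *id, uint64_t verifier)
{
	bool wiped = rec->has_lease &&
		     strcmp(rec->id, id) == 0 &&
		     rec->verifier != verifier;

	strncpy(rec->id, id, sizeof(rec->id) - 1);
	rec->id[sizeof(rec->id) - 1] = '\0';
	rec->verifier = verifier;
	rec->has_lease = true;
	return wiped;
}
```

With a uniform client string and per-nfs_client verifiers, a second contact (for example through another server IP address) presents a different verifier and the lease is wiped; with a per-nfs_net verifier both contacts present the same value and the lease survives.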
     

11 Mar, 2012

2 commits

  • This queue is used for sleeping in the kernel, and it has to be per-net since
    we don't want to wake any waiters except those in our network namespace.
    Moving the wq to per-net data is easy, but some way to handle upcall timeouts
    has to be provided: when a message is destroyed because of a timeout, the
    tasks waiting for that message to be delivered should be awakened. Thus, some
    data is required to locate the right wait queue. The chosen solution replaces
    the rpc_pipe_msg object with a newly introduced bl_pipe_msg object,
    containing the rpc_pipe_msg and the proper wq.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • This global variable is used for the blocklayout downcall and thus can be
    corrupted if multiple network namespaces exist.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
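
The bl_pipe_msg change in the first commit of this group relies on embedding: when a timed-out message is destroyed, the code holds only the rpc_pipe_msg, and recovers the enclosing bl_pipe_msg (and with it the per-net wait queue) from the embedded member. A userspace sketch of that container_of-style recovery follows; the *_model structures, the bl_wq field name and wq_for_msg() are hypothetical, and the real structures carry more fields.

```c
#include <stddef.h>

/* Userspace stand-ins for the generic pipe message and a wait queue. */
struct rpc_pipe_msg_model {
	const void *data;
	size_t len;
};

struct wait_queue_model {
	int waiters;
};

/* The wrapper described in the commit: the generic pipe message plus a
 * pointer to the per-net wait queue that must be woken on timeout. */
struct bl_pipe_msg_model {
	struct rpc_pipe_msg_model msg;
	struct wait_queue_model *bl_wq;
};

/* container_of-style recovery of the wrapper from its embedded member. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Given only the embedded message (as a timeout handler would be),
 * find the wait queue of the namespace that sent it. */
static struct wait_queue_model *wq_for_msg(struct rpc_pipe_msg_model *msg)
{
	struct bl_pipe_msg_model *bl_msg =
		model_container_of(msg, struct bl_pipe_msg_model, msg);

	return bl_msg->bl_wq;
}
```

Embedding keeps the generic pipe code unchanged: it still passes around the plain message, while blocklayout code can always get back to its own per-net data.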
     

07 Feb, 2012

4 commits

  • This patch makes nfs_client_lock allocated per network namespace. All items
    it protects are already network-namespace aware.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • This patch makes the ID infrastructure network-namespace aware. This was done
    mainly because of nfs_client_lock, which should be per network namespace but
    protects NFS client IDs.

    NOTE: the NFS client's net pointer has to be set prior to ID initialization,
    so the assignment was moved accordingly.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • This patch splits the global list of NFS servers into a per-net-ns array of
    lists. This is stricter and clearer.
    This patch also makes the content of "/proc/fs/nfsfs/volumes" depend on the
    pid namespace of the /proc mount owner. See below for details.

    NOTE: a few words about how the per-network-namespace content of the
    /proc/fs/nfsfs/ entries is shown. This is a little bit tricky and not the
    best it could be, but it's cheap (a proper fix for /proc containerization is
    a hard nut to crack). The idea is simple: take the proper network namespace
    from the nsproxy of the child reaper of the pid namespace of the /proc mount
    creator.
    This actually means that if there are 2 containers with different net
    namespaces sharing a pid namespace, then reading the /proc/fs/nfsfs/ entries
    will always return content taken from the net namespace of the pid namespace
    creator task (and thus the second namespace's set will be invisible).

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • This patch splits the global list of NFS clients into a per-net-ns array of
    lists. This is stricter and clearer.
    This patch also makes the content of the "/proc/fs/nfsfs/servers" entry depend
    on the pid namespace of the /proc mount owner. See below for details.

    NOTE: a few words about how the per-network-namespace content of the
    /proc/fs/nfsfs/ entries is shown. This is a little bit tricky and not the
    best it could be, but it's cheap (a proper fix for /proc containerization is
    a hard nut to crack). The idea is simple: take the proper network namespace
    from the nsproxy of the child reaper of the pid namespace of the /proc mount
    creator.
    This actually means that if there are 2 containers with different net
    namespaces sharing a pid namespace, then reading the /proc/fs/nfsfs/ entries
    will always return content taken from the net namespace of the pid namespace
    creator task (and thus the second namespace's set will be invisible).

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
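
The per-namespace split described in these commits can be sketched as one container per network namespace holding its own client list and its own id allocator, so clients in different namespaces never share either. All names are hypothetical; a simple counter stands in for the kernel's id allocator, and the sketch is single-threaded, so the per-net nfs_client_lock that guards these items in the kernel is left out.

```c
#include <stddef.h>

/* Hypothetical client with only the fields needed here. */
struct nfs_client_model {
	int cl_id;                      /* unique within one namespace */
	struct nfs_client_model *next;  /* link in the per-net list */
};

/* Hypothetical per-network-namespace container: its own client list
 * and its own id counter. No locking in this single-threaded model. */
struct nfs_net_model {
	struct nfs_client_model *clients;
	int next_id;
	size_t nr_clients;
};

/* Link a client into the list of the namespace it belongs to and give
 * it an id from that namespace's allocator. */
static void nfs_net_model_add(struct nfs_net_model *nn,
			      struct nfs_client_model *clp)
{
	clp->cl_id = ++nn->next_id;
	clp->next = nn->clients;
	nn->clients = clp;
	nn->nr_clients++;
}
```

Each namespace hands out ids starting from 1 and walks only its own list, which is what lets the lock protecting them become per-net as well.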
     

01 Feb, 2012

2 commits

  • This patch implements blocklayout pipe creation and registration for each
    existing network namespace.
    This was achieved by registering NFS per-net operations responsible for
    blocklayout pipe allocation/registration and unregistration/destruction,
    instead of initializing and destroying the static "bl_device_pipe" pipe
    (which was removed).
    Note that the pointer to the network's blocklayout pipe is stored in the
    per-net "nfs_net" structure, because allocating one more per-net structure
    for the blocklayout module looks redundant.
    This patch also changes the dev_remove() function prototype (and all its
    callers, where required) by adding a network namespace pointer parameter,
    which is used to find the proper blocklayout pipe for the
    rpc_queue_upcall() call.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
     
  • This patch implements DNS resolver cache creation and registration for each
    live network namespace context.
    This was done by registering NFS per-net operations responsible for DNS
    cache allocation/registration and unregistration/destruction, instead of
    initializing and destroying the static "nfs_dns_resolve" cache detail
    (which was removed).
    A pointer to the network's DNS resolver cache is stored in the new per-net
    "nfs_net" structure.
    This patch also changes the nfs_dns_resolve_name() function prototype (and
    its callers) by adding a network pointer parameter, which is used to get the
    proper DNS resolver cache pointer for the do_cache_lookup_wait() call.

    Note: the empty nfs_dns_resolver_init() and nfs_dns_resolver_destroy()
    functions will be used in the next patch in the series.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Trond Myklebust

    Stanislav Kinsbursky
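
Both commits above follow the same pattern: per-net operations whose init hook builds the per-namespace object (a blocklayout pipe or a DNS resolver cache) and whose exit hook tears it down, with a pointer kept in the per-net nfs_net structure. The sketch below models that pattern in userspace; struct pernet_ops_model, register_pernet_model() and the other names are hypothetical, loosely in the spirit of the kernel's struct pernet_operations. The real registration also covers namespaces created later and can let the core allocate per-net storage through its id/size fields instead of the pointer slot used here.

```c
#include <stddef.h>

#define MAX_NETS 4

/* Hypothetical namespace with a slot for this subsystem's data. */
struct net_model {
	void *nfs_net;
};

/* Hypothetical ops table with init/exit hooks. */
struct pernet_ops_model {
	int (*init)(struct net_model *net);
	void (*exit)(struct net_model *net);
};

/* Namespaces that already exist when the ops are registered. */
static struct net_model *alive_nets[MAX_NETS];
static size_t nr_alive;

/* Example per-net payload standing in for a pipe or a DNS cache.
 * Storage is a fixed pool that this sketch never reclaims. */
struct nfs_net_payload {
	int registered;
};

static struct nfs_net_payload payloads[MAX_NETS];
static size_t nr_payloads;

static int nfs_net_model_init(struct net_model *net)
{
	struct nfs_net_payload *p;

	if (nr_payloads == MAX_NETS)
		return -1;	/* out of model storage */
	p = &payloads[nr_payloads++];
	p->registered = 1;
	net->nfs_net = p;
	return 0;
}

static void nfs_net_model_exit(struct net_model *net)
{
	struct nfs_net_payload *p = net->nfs_net;

	p->registered = 0;
	net->nfs_net = NULL;
}

static const struct pernet_ops_model nfs_net_ops_model = {
	.init = nfs_net_model_init,
	.exit = nfs_net_model_exit,
};

/* Run ->init for every existing namespace; on failure, run ->exit for
 * the namespaces that were already initialized and report the error. */
static int register_pernet_model(const struct pernet_ops_model *ops)
{
	for (size_t i = 0; i < nr_alive; i++) {
		int err = ops->init(alive_nets[i]);

		if (err) {
			while (i-- > 0)
				ops->exit(alive_nets[i]);
			return err;
		}
	}
	return 0;
}

static void unregister_pernet_model(const struct pernet_ops_model *ops)
{
	for (size_t i = 0; i < nr_alive; i++)
		ops->exit(alive_nets[i]);
}
```

Keeping the per-net object behind a pointer in one shared per-net structure matches the commit's reasoning that a second per-net structure just for blocklayout would be redundant.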