06 Aug, 2016

1 commit

  • Pull virtio/vhost updates from Michael Tsirkin:

    - new vsock device support in host and guest

    - platform IOMMU support in host and guest, including compatibility
    quirks for legacy systems.

    - misc fixes and cleanups.

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    VSOCK: Use kvfree()
    vhost: split out vringh Kconfig
    vhost: detect 32 bit integer wrap around
    vhost: new device IOTLB API
    vhost: drop vringh dependency
    vhost: convert pre sorted vhost memory array to interval tree
    vhost: introduce vhost memory accessors
    VSOCK: Add Makefile and Kconfig
    VSOCK: Introduce vhost_vsock.ko
    VSOCK: Introduce virtio_transport.ko
    VSOCK: Introduce virtio_vsock_common.ko
    VSOCK: defer sock removal to transports
    VSOCK: transport-specific vsock_transport functions
    vhost: drop vringh dependency
    vop: pull in vhost Kconfig
    virtio: new feature to detect IOMMU device quirk
    balloon: check the number of available pages in leak balloon
    vhost: lockless enqueuing
    vhost: simplify work flushing

    Linus Torvalds
     

05 Aug, 2016

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Trond made a change to the server's tcp logic that allows a fast
    client to better take advantage of high bandwidth networks, but may
    increase the risk that a single client could starve other clients;
    a new sunrpc.svc_rpc_per_connection_limit parameter should help
    mitigate this in the (hopefully unlikely) event this becomes a
    problem in practice.

    - Tom Haynes added a minimal flex-layout pnfs server, which is of no
    use in production for now--don't build it unless you're doing
    client testing or further server development"

    * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: remove some dead code in nfsd_create_locked()
    nfsd: drop unnecessary MAY_EXEC check from create
    nfsd: clean up bad-type check in nfsd_create_locked
    nfsd: remove unnecessary positive-dentry check
    nfsd: reorganize nfsd_create
    nfsd: check d_can_lookup in fh_verify of directories
    nfsd: remove redundant zero-length check from create
    nfsd: Make creates return EEXIST instead of EACCES
    SUNRPC: Detect immediate closure of accepted sockets
    SUNRPC: accept() may return sockets that are still in SYN_RECV
    nfsd: allow nfsd to advertise multiple layout types
    nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
    nfsd/blocklayout: Make sure calculate signature/designator length aligned
    xfs: abstract block export operations from nfsd layouts
    SUNRPC: Remove unused callback xpo_adjust_wspace()
    SUNRPC: Change TCP socket space reservation
    SUNRPC: Add a server side per-connection limit
    SUNRPC: Micro optimisation for svc_data_ready
    SUNRPC: Call the default socket callbacks instead of open coding
    SUNRPC: lock the socket while detaching it
    ...

    Linus Torvalds
     

04 Aug, 2016

1 commit

  • The use of config_enabled() against config options is ambiguous. In
    practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
    author might have used it for the meaning of IS_ENABLED(). Using
    IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
    clearer.

    This commit replaces config_enabled() with IS_ENABLED() where possible.
    This commit is only touching bool config options.

    I noticed two cases where config_enabled() is used against a tristate
    option:

    - config_enabled(CONFIG_HWMON)
    [ drivers/net/wireless/ath/ath10k/thermal.c ]

    - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
    [ drivers/gpu/drm/gma500/opregion.c ]

    I did not touch them because they should be converted to IS_BUILTIN()
    in order to keep the logic, but I was not sure it was the authors'
    intention.
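
    To make the distinction concrete, here is a minimal userspace sketch of
    the preprocessor trick these helpers rely on. It is adapted from the
    idea in include/linux/kconfig.h, not copied from it: the DEMO_* helper
    macros and the CONFIG_DEMO_* options are hypothetical, and it assumes
    the usual convention that =y defines CONFIG_X and =m defines
    CONFIG_X_MODULE.

```c
#include <assert.h>

/* If `val` expands to 1, DEMO_ARG_PLACEHOLDER_1 injects an extra "0,"
 * argument, which shifts the literal 1 into the second position. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_2(x)
#define DEMO_IS_DEFINED_2(val) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED_3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define IS_MODULE(option)  DEMO_IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Hypothetical configuration: one bool set to y, one tristate set to m. */
#define CONFIG_DEMO_BOOL 1
#define CONFIG_DEMO_TRISTATE_MODULE 1

static void demo_kconfig_selfcheck(void)
{
	/* For a bool option the two helpers agree. */
	assert(IS_BUILTIN(CONFIG_DEMO_BOOL) == 1);
	assert(IS_ENABLED(CONFIG_DEMO_BOOL) == 1);

	/* For a tristate built as a module they do not. */
	assert(IS_BUILTIN(CONFIG_DEMO_TRISTATE) == 0);
	assert(IS_MODULE(CONFIG_DEMO_TRISTATE) == 1);
	assert(IS_ENABLED(CONFIG_DEMO_TRISTATE) == 1);

	/* An option that is not set at all is 0 everywhere. */
	assert(IS_ENABLED(CONFIG_DEMO_UNSET) == 0);
}
```

    For a tristate option set to m, IS_BUILTIN() (and therefore
    config_enabled()) evaluates to 0 while IS_ENABLED() evaluates to 1,
    which is why the two tristate users above cannot be converted blindly.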

    Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Acked-by: Kees Cook
    Cc: Stas Sergeev
    Cc: Matt Redfearn
    Cc: Joshua Kinard
    Cc: Jiri Slaby
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Markos Chandras
    Cc: "Dmitry V. Levin"
    Cc: yu-cheng yu
    Cc: James Hogan
    Cc: Brian Gerst
    Cc: Johannes Berg
    Cc: Peter Zijlstra
    Cc: Al Viro
    Cc: Will Drewry
    Cc: Nikolay Martynov
    Cc: Huacai Chen
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Daniel Borkmann
    Cc: Leonid Yegoshin
    Cc: Rafal Milecki
    Cc: James Cowgill
    Cc: Greg Kroah-Hartman
    Cc: Ralf Baechle
    Cc: Alex Smith
    Cc: Adam Buchbinder
    Cc: Qais Yousef
    Cc: Jiang Liu
    Cc: Mikko Rapeli
    Cc: Paul Gortmaker
    Cc: Denys Vlasenko
    Cc: Brian Norris
    Cc: Hidehiro Kawai
    Cc: "Luis R. Rodriguez"
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Dave Hansen
    Cc: "Kirill A. Shutemov"
    Cc: Roland McGrath
    Cc: Paul Burton
    Cc: Kalle Valo
    Cc: Viresh Kumar
    Cc: Tony Wu
    Cc: Huaitong Han
    Cc: Sumit Semwal
    Cc: Alexei Starovoitov
    Cc: Juergen Gross
    Cc: Jason Cooper
    Cc: "David S. Miller"
    Cc: Oleg Nesterov
    Cc: Andrea Gelmini
    Cc: David Woodhouse
    Cc: Marc Zyngier
    Cc: Rabin Vincent
    Cc: "Maciej W. Rozycki"
    Cc: David Daney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

03 Aug, 2016

2 commits

  • Pull networking fixes from David Miller:

    1) Fix several cases of missing of_node_put() calls in various
    networking drivers. From Peter Chen.

    2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
    Mintz.

    3) Unbalanced locking in TIPC error handling, from Wei Yongjun.

    4) Fix lockups in CPDMA driver, from Grygorii Strashko.

    5) More MACSEC refcount et al fixes, from Sabrina Dubroca.

    6) Fix MAC address setting in r8169 during runtime suspend, from
    Chun-Hao Lin.

    7) Various printf format specifier fixes, from Heinrich Schuchardt.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
    qed: Fail driver load in 100g MSI mode.
    ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
    ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
    ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
    ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
    ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
    ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
    ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
    ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
    ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
    ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
    ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
    ethernet: altera: add missing of_node_put
    8139too: fix system hang when there is a tx timeout event.
    qed: Fix error return code in qed_resc_alloc()
    net: qlcnic: avoid superfluous assignement
    dsa: b53: remove redundant if
    ...

    Linus Torvalds
     
  • Pull Ceph updates from Ilya Dryomov:
    "The highlights are:

    - RADOS namespace support in libceph and CephFS (Zheng Yan and
    myself). The stopgaps added in 4.5 to deny access to inodes in
    namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
    bit is now fully supported

    - A large rework of the MDS cap flushing code (Zheng Yan)

    - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
    overly pessimistic before, bailing at the first sight of LOOKUP_RCU

    On top of that we've got a few CephFS bug fixes, a couple of cleanups
    and Arnd's workaround for a weird genksyms issue"

    * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
    ceph: fix symbol versioning for ceph_monc_do_statfs
    ceph: Correctly return NXIO errors from ceph_llseek
    ceph: Mark the file cache as unreclaimable
    ceph: optimize cap flush waiting
    ceph: cleanup ceph_flush_snaps()
    ceph: kick cap flushes before sending other cap message
    ceph: introduce an inode flag to indicates if snapflush is needed
    ceph: avoid sending duplicated cap flush message
    ceph: unify cap flush and snapcap flush
    ceph: use list instead of rbtree to track cap flushes
    ceph: update types of some local varibles
    ceph: include 'follows' of pending snapflush in cap reconnect message
    ceph: update cap reconnect message to version 3
    ceph: mount non-default filesystem by name
    libceph: fsmap.user subscription support
    ceph: handle LOOKUP_RCU in ceph_d_revalidate
    ceph: allow dentry_lease_is_valid to work under RCU walk
    ceph: clear d_fsinfo pointer under d_lock
    ceph: remove ceph_mdsc_lease_release
    ceph: don't use ->d_time
    ...

    Linus Torvalds
     

02 Aug, 2016

7 commits

  • Enable virtio-vsock and vhost-vsock.

    Signed-off-by: Asias He
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Asias He
     
  • VM sockets virtio transport implementation. This driver runs in the
    guest.

    Signed-off-by: Asias He
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Asias He
     
  • This module contains the common code and header files for the following
    virtio_transport and vhost_vsock kernel modules.

    Signed-off-by: Asias He
    Signed-off-by: Claudio Imbrenda
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Asias He
     
  • The virtio transport will implement graceful shutdown and the related
    SO_LINGER socket option. This requires orphaning the sock but keeping
    it in the table of connections after .release().

    This patch adds the vsock_remove_sock() function and leaves it up to the
    transport when to remove the sock.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Stefan Hajnoczi
     
  • struct vsock_transport contains function pointers called by AF_VSOCK
    core code. The transport may want its own transport-specific function
    pointers and they can be added after struct vsock_transport.

    Allow the transport to fetch vsock_transport. It can downcast it to
    access transport-specific function pointers.

    The virtio transport will use this.
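
    The downcast described above can be sketched in plain C as follows.
    This is an illustration only: every demo_* name is hypothetical, the
    structs are stand-ins rather than the real struct vsock_transport, and
    the getter merely plays the role of the accessor this patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the generic struct: only what the core code calls. */
struct demo_transport {
	int (*connect)(int cid);
};

/* Transport-specific struct: generic part first, extras after it. */
struct demo_virtio_transport {
	struct demo_transport transport;
	int (*send_pkt)(int len);
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int demo_connect(int cid) { return cid >= 0 ? 0 : -1; }
static int demo_send_pkt(int len) { return len; }

static struct demo_virtio_transport demo_virtio = {
	.transport = { .connect = demo_connect },
	.send_pkt = demo_send_pkt,
};

/* The core only ever hands out the generic pointer. */
static struct demo_transport *demo_get_transport(void)
{
	return &demo_virtio.transport;
}

/* The transport downcasts it to reach its own function pointers. */
static int demo_send(int len)
{
	struct demo_transport *t = demo_get_transport();
	struct demo_virtio_transport *vt =
		demo_container_of(t, struct demo_virtio_transport, transport);

	return vt->send_pkt(len);
}

static void demo_transport_selfcheck(void)
{
	assert(demo_get_transport()->connect(3) == 0);
	assert(demo_send(42) == 42);
}
```

    The downcast is only valid when the generic pointer really is embedded
    in the transport-specific struct, so each transport should apply it
    only to its own registered instance.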

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Stefan Hajnoczi
     
  • This modification is useful for debugging issues that happen while
    the socket is being initialised.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • We're seeing traces of the following form:

    [10952.396347] svc: transport ffff88042ba4a000 dequeued, inuse=2
    [10952.396351] svc: tcp_accept ffff88042ba4a000 sock ffff88042a6e4c80
    [10952.396362] nfsd: connect from 10.2.6.1, port=187
    [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
    [10952.396368] setting up TCP socket for reading
    [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
    [10952.396373] svc: transport ffff8803eb10a000 put into queue
    [10952.396375] svc: transport ffff88042ba4a000 put into queue
    [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
    [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
    [10952.396381] svc_recv: found XPT_CLOSE
    [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
    [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
    [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
    [10952.396412] svc: svc_sock_free(ffff8803eb10a000)

    i.e. an immediate close of the socket after initialisation.

    The culprit appears to be the test at the end of svc_tcp_init, which
    checks if the newly created socket is in the TCP_ESTABLISHED state,
    and immediately closes it if not. The evidence appears to suggest that
    the socket might still be in the SYN_RECV state at this time.

    The fix is to check for both states, and then to add a check in
    svc_tcp_state_change() to ensure we don't close the socket when
    it transitions into TCP_ESTABLISHED.
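
    As a rough model of the two checks, under the assumption that the
    description above is read literally, the logic looks like this. The
    enum and the demo_* functions are hypothetical and do not reproduce
    the sunrpc code.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_tcp_state {
	DEMO_TCP_ESTABLISHED,
	DEMO_TCP_SYN_RECV,
	DEMO_TCP_CLOSE,
};

/* Check at the end of socket setup, before the fix. */
static bool demo_init_closes_old(enum demo_tcp_state s)
{
	return s != DEMO_TCP_ESTABLISHED;
}

/* After the fix: a handshake still in SYN_RECV is left alone. */
static bool demo_init_closes_new(enum demo_tcp_state s)
{
	return s != DEMO_TCP_ESTABLISHED && s != DEMO_TCP_SYN_RECV;
}

/* State-change callback: entering ESTABLISHED must not close the socket. */
static bool demo_state_change_closes(enum demo_tcp_state new_state)
{
	return new_state != DEMO_TCP_ESTABLISHED;
}

static void demo_state_selfcheck(void)
{
	/* The socket from the trace: accepted while still in SYN_RECV. */
	assert(demo_init_closes_old(DEMO_TCP_SYN_RECV));
	assert(!demo_init_closes_new(DEMO_TCP_SYN_RECV));
	/* ... and it survives the later move to ESTABLISHED. */
	assert(!demo_state_change_closes(DEMO_TCP_ESTABLISHED));
	/* A socket that really went away is still closed. */
	assert(demo_init_closes_new(DEMO_TCP_CLOSE));
	assert(demo_state_change_closes(DEMO_TCP_CLOSE));
}
```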

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     

31 Jul, 2016

7 commits

  • Commit 141ddefce7c8 ("sctp: change sk state to CLOSED instead of
    CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the
    assoc is closed when sctp_accept clones a new sk.

    If there is still data in sk receive queue, users will not be able
    to read it any more, as sctp_recvmsg returns directly if sk state
    is CLOSED.

    This patch adds a CLOSED state check to sctp_recvmsg to allow
    reading data from a TCP-style sk in the CLOSED state, as TCP does.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Prior to this patch, once sctp received SHUTDOWN or shutdown with RD,
    sk->sk_shutdown would be set with RCV_SHUTDOWN, and all events would
    be dropped in sctp_ulpq_tail_event(). It would cause:

    1. some notifications couldn't be received by users, such as
    SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C().

    2. sctp would also never trigger sk_data_ready when the association
    was closed, making it harder to identify the end of the association
    by calling recvmsg() and getting an EOF. It was not convenient for
    kernel users.

    The check here should stop delivering DATA chunks after receiving
    SHUTDOWN, and stop delivering ANY chunks after sctp_close().

    So this patch is to allow notifications to enqueue into receive queue
    even if sk->sk_shutdown is set to RCV_SHUTDOWN in sctp_ulpq_tail_event,
    but if sk->sk_shutdown == RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all
    events.
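
    The resulting rule can be written as a small predicate. This is a
    sketch of the behaviour described above, not the sctp code: the
    DEMO_* flag values and demo_should_queue_event() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_RCV_SHUTDOWN  1u
#define DEMO_SEND_SHUTDOWN 2u
#define DEMO_SHUTDOWN_MASK (DEMO_RCV_SHUTDOWN | DEMO_SEND_SHUTDOWN)

/* Decide whether an event may be placed on the receive queue. */
static bool demo_should_queue_event(unsigned int sk_shutdown,
				    bool is_notification)
{
	/* Both directions shut down (socket closed): drop everything. */
	if ((sk_shutdown & DEMO_SHUTDOWN_MASK) == DEMO_SHUTDOWN_MASK)
		return false;

	/* Receive side shut down: drop DATA, let notifications through. */
	if ((sk_shutdown & DEMO_RCV_SHUTDOWN) && !is_notification)
		return false;

	return true;
}

static void demo_shutdown_selfcheck(void)
{
	assert(demo_should_queue_event(0, false));
	assert(demo_should_queue_event(DEMO_RCV_SHUTDOWN, true));
	assert(!demo_should_queue_event(DEMO_RCV_SHUTDOWN, false));
	assert(!demo_should_queue_event(DEMO_SHUTDOWN_MASK, true));
	assert(!demo_should_queue_event(DEMO_SHUTDOWN_MASK, false));
}
```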

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • sctp needs to queue auth chunk back when we know that we are going
    to generate another segment. But commit f1533cce60d1 ("sctp: fix
    panic when sending auth chunks") requeues the last chunk processed
    which is probably not the auth chunk.

    This causes a panic when calculating the MAC in sctp_auth_calculate_hmac(),
    because the offset of the auth chunk in skb->data is incorrect.

    This fix is to requeue it by using packet->auth.

    Fixes: f1533cce60d1 ("sctp: fix panic when sending auth chunks")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • tcp_select_initial_window() intends to advertise a window
    scaling for the maximum possible window size. To do so,
    it considers the maximum of net.ipv4.tcp_rmem[2] and
    net.core.rmem_max as the only possible upper-bounds.
    However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
    to set the socket's receive buffer size to values
    larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
    Thus, SO_RCVBUFFORCE is effectively ignored by
    tcp_select_initial_window().

    To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
    net.core.rmem_max and socket's initial buffer space.
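
    The effect on the advertised window scale can be illustrated with a
    simplified calculation. This sketch assumes the scale is the smallest
    shift that fits the chosen space into a 16-bit window, capped at 14
    (the RFC 7323 maximum). It ignores the other inputs the real function
    takes, and demo_rcv_wscale() with its parameters is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t demo_max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Smallest shift that lets the largest possible buffer fit in 16 bits. */
static unsigned int demo_rcv_wscale(uint32_t tcp_rmem_max,
				    uint32_t core_rmem_max,
				    uint32_t sock_space)
{
	uint32_t space = demo_max_u32(tcp_rmem_max, core_rmem_max);
	unsigned int wscale = 0;

	/* The term added by the fix: the socket's own buffer space. */
	space = demo_max_u32(space, sock_space);

	while (space > 65535 && wscale < 14) {
		space >>= 1;
		wscale++;
	}
	return wscale;
}

static void demo_wscale_selfcheck(void)
{
	/* Small buffers need no scaling at all. */
	assert(demo_rcv_wscale(1000, 1000, 1000) == 0);
	/* 6 MiB sysctl limit, no larger socket buffer: shift of 7. */
	assert(demo_rcv_wscale(6291456, 212992, 0) == 7);
	/* A forced 64 MiB socket buffer raises the shift to 11. */
	assert(demo_rcv_wscale(6291456, 212992, 67108864) == 11);
}
```

    With only the two sysctl values considered, the 64 MiB case would have
    been advertised with the smaller shift, so the forced buffer could not
    be fully used.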

    Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
    Signed-off-by: Soheil Hassas Yeganeh
    Suggested-by: Neal Cardwell
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
     
  • Use list_move() instead of list_del() + list_add().
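
    For readers unfamiliar with the helper, the equivalence can be shown
    with a tiny userspace list. The demo_list type and functions below are
    hypothetical stand-ins modelled on a circular doubly linked list, not
    the kernel's list.h.

```c
#include <assert.h>

struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert `entry` right after `head`. */
static void demo_list_add(struct demo_list *entry, struct demo_list *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink `entry` from whatever list it is on. */
static void demo_list_del(struct demo_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* One call that does the delete and the add together. */
static void demo_list_move(struct demo_list *entry, struct demo_list *head)
{
	demo_list_del(entry);
	demo_list_add(entry, head);
}

static int demo_list_empty(const struct demo_list *head)
{
	return head->next == head;
}

/* Returns 1 if moving a node from list a to list b behaves as expected. */
static int demo_list_move_check(void)
{
	struct demo_list a, b, node;

	demo_list_init(&a);
	demo_list_init(&b);
	demo_list_add(&node, &a);
	demo_list_move(&node, &b);

	return demo_list_empty(&a) && b.next == &node && b.prev == &node;
}
```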

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • In the error handling path taken when nla_nest_start() fails,
    read_unlock_bh() is called to release a lock that has not been taken
    yet. sparse warns about the context imbalance as follows:

    net/tipc/monitor.c:799:23: warning:
    context imbalance in '__tipc_nl_add_monitor' - different lock contexts for basic block

    Fixes: cf6f7e1d5109 ('tipc: dump monitor attributes')
    Signed-off-by: Wei Yongjun
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable bugfixes:
    - nfs: don't create zero-length requests

    - several LAYOUTGET bugfixes

    Features:
    - several performance related features

    - more aggressive caching when we can rely on close-to-open
    cache consistency

    - remove serialisation of O_DIRECT reads and writes

    - optimise several code paths to not flush to disk unnecessarily.

    However allow for the idiosyncrasies of pNFS for those layout
    types that need to issue a LAYOUTCOMMIT before the metadata can
    be updated on the server.

    - SUNRPC updates to the client data receive path

    - pNFS/SCSI support RH/Fedora dm-mpath device nodes

    - pNFS files/flexfiles can now use unprivileged ports when
    the generic NFS mount options allow it.

    Bugfixes:
    - Don't use RDMA direct data placement together with data
    integrity or privacy security flavours

    - Remove the RDMA ALLPHYSICAL memory registration mode as
    it has potential security holes.

    - Several layout recall fixes to improve NFSv4.1 protocol
    compliance.

    - Fix an Oops in the pNFS files and flexfiles connection
    setup to the DS

    - Allow retry of operations that used a returned delegation
    stateid

    - Don't mark the inode as revalidated if a LAYOUTCOMMIT is
    outstanding

    - Fix writeback races in nfs4_copy_range() and
    nfs42_proc_deallocate()"

    * tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits)
    pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding
    NFSv4: Clean up lookup of SECINFO_NO_NAME
    NFSv4.2: Fix warning "variable ‘stateids’ set but not used"
    NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’"
    SUNRPC: Fix a compiler warning in fs/nfs/clnt.c
    pNFS: Remove redundant smp_mb() from pnfs_init_lseg()
    pNFS: Cleanup - do layout segment initialisation in one place
    pNFS: Remove redundant stateid invalidation
    pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()
    pNFS: Clear the layout metadata if the server changed the layout stateid
    pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()
    NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id
    pNFS: Do not set plh_return_seq for non-callback related layoutreturns
    pNFS: Ensure layoutreturn acts as a completion for layout callbacks
    pNFS: Fix CB_LAYOUTRECALL stateid verification
    pNFS: Always update the layout barrier seqid on LAYOUTGET
    pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set
    pNFS: Clear the layout return tracking on layout reinitialisation
    pNFS: LAYOUTRETURN should only update the stateid if the layout is valid
    nfs: don't create zero-length requests
    ...

    Linus Torvalds
     

30 Jul, 2016

2 commits

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - TPM core and driver updates/fixes
    - IPv6 security labeling (CALIPSO)
    - Lots of Apparmor fixes
    - Seccomp: remove 2-phase API, close hole where ptrace can change
    syscall #"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
    apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
    tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
    tpm: Factor out common startup code
    tpm: use devm_add_action_or_reset
    tpm2_i2c_nuvoton: add irq validity check
    tpm: read burstcount from TPM_STS in one 32-bit transaction
    tpm: fix byte-order for the value read by tpm2_get_tpm_pt
    tpm_tis_core: convert max timeouts from msec to jiffies
    apparmor: fix arg_size computation for when setprocattr is null terminated
    apparmor: fix oops, validate buffer size in apparmor_setprocattr()
    apparmor: do not expose kernel stack
    apparmor: fix module parameters can be changed after policy is locked
    apparmor: fix oops in profile_unpack() when policy_db is not present
    apparmor: don't check for vmalloc_addr if kvzalloc() failed
    apparmor: add missing id bounds check on dfa verification
    apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
    apparmor: use list_next_entry instead of list_entry_next
    apparmor: fix refcount race when finding a child profile
    apparmor: fix ref count leak when profile sha1 hash is read
    apparmor: check that xindex is in trans_table bounds
    ...

    Linus Torvalds
     
  • Pull userns vfs updates from Eric Biederman:
    "This tree contains some very long awaited work on generalizing the
    user namespace support for mounting filesystems to include filesystems
    with a backing store. The real world target is fuse but the goal is
    to update the vfs to allow any filesystem to be supported. This
    patchset is based on a lot of code review and testing to approach that
    goal.

    While looking at what is needed to support the fuse filesystem it
    became clear that there were things like xattrs for security modules
    that needed special treatment, that the resolution of those concerns
    would not be fuse specific, and that sorting out these general issues
    made most sense at the generic level, where the right people could be
    drawn into the conversation, and the issues could be solved for
    everyone.

    At a high level this patchset does a couple of simple things:

    - Add a user namespace owner (s_user_ns) to struct super_block.

    - Teach the vfs to handle filesystem uids and gids not mapping into
    kuids and kgids and being reported as INVALID_UID and
    INVALID_GID in vfs data structures.

    By assigning a user namespace owner filesystems that are mounted with
    only user namespace privilege can be detected. This allows security
    modules and the like to know which mounts may not be trusted. This
    also allows the set of uids and gids that are communicated to the
    filesystem to be capped at the set of kuids and kgids that are in the
    owning user namespace of the filesystem.

    One of the crazier corner cases this handles is the case of inodes
    whose i_uid or i_gid are not mapped into the vfs. Most of the code
    simply doesn't care but it is easy to confuse the inode writeback path
    so no operation that could cause an inode write-back is permitted for
    such inodes (aka only reads are allowed).

    This set of changes starts out by cleaning up the code paths involved
    in user namespace permitted mounts. Then when things are clean enough
    adds code that cleanly sets s_user_ns. Then additional restrictions
    are added that are possible now that the filesystem superblock
    contains owner information.

    These changes should not affect anyone in practice, but there are some
    parts of these restrictions that are changes in behavior.

    - Andy's restriction on suid executables that does not honor the
    suid bit when the path is from another mount namespace (think
    /proc/[pid]/fd/) or when the filesystem was mounted by a less
    privileged user.

    - The replacement of the user namespace implicit setting of MNT_NODEV
    with implicitly setting SB_I_NODEV on the filesystem superblock
    instead.

    Using SB_I_NODEV is a stronger form that happens to make this state
    user invisible. The user visibility can be managed but it caused
    problems when it was introduced from applications reasonably
    expecting mount flags to be what they were set to.

    There is a little bit of work remaining before it is safe to support
    mounting filesystems with backing store in user namespaces, beyond
    what is in this set of changes.

    - Verifying the mounter has permission to read/write the block device
    during mount.

    - Teaching the integrity modules IMA and EVM to handle filesystems
    mounted with only user namespace root and to reduce trust in their
    security xattrs accordingly.

    - Capturing the mounter's credentials and using that for permission
    checks in d_automount and the like. (Given that overlayfs already
    does this, and we need the work in d_automount it makes sense to
    generalize this case).

    Furthermore there are a few changes that are on the wishlist:

    - Get all filesystems supporting posix acls using the generic posix
    acls so that posix_acl_fix_xattr_from_user and
    posix_acl_fix_xattr_to_user may be removed. [Maintainability]

    - Reducing the permission checks in places such as remount to allow
    the superblock owner to perform them.

    - Allowing the superblock owner to chown files with unmapped uids and
    gids to something that is mapped so the files may be treated
    normally.

    I am not considering even obvious relaxations of permission checks
    until it is clear there are no more corner cases that need to be
    locked down and handled generically.

    Many thanks to Seth Forshee for keeping this code alive, for putting
    up with me rewriting substantial portions of what he did to handle
    more corner cases, and for his diligent testing and reviewing of my
    changes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
    fs: Call d_automount with the filesystems creds
    fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
    evm: Translate user/group ids relative to s_user_ns when computing HMAC
    dquot: For now explicitly don't support filesystems outside of init_user_ns
    quota: Handle quota data stored in s_user_ns in quota_setxquota
    quota: Ensure qids map to the filesystem
    vfs: Don't create inodes with a uid or gid unknown to the vfs
    vfs: Don't modify inodes with a uid or gid unknown to the vfs
    cred: Reject inodes with invalid ids in set_create_file_as()
    fs: Check for invalid i_uid in may_follow_link()
    vfs: Verify acls are valid within superblock's s_user_ns.
    userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
    fs: Refuse uid/gid changes which don't map into s_user_ns
    selinux: Add support for unprivileged mounts from user namespaces
    Smack: Handle labels consistently in untrusted mounts
    Smack: Add support for unprivileged mounts from user namespaces
    fs: Treat foreign mounts as nosuid
    fs: Limit file caps to the user namespace of the super block
    userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
    userns: Remove implicit MNT_NODEV fragility.
    ...

    Linus Torvalds
     

29 Jul, 2016

1 commit

  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.
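
    The shape of the extended interface can be sketched as below. Note
    that the mixing function here is a generic FNV-1a style loop chosen
    for brevity; it is an assumption for illustration and is not the hash
    the dcache uses. demo_salted_hash() and the demo_dir_* objects are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two objects whose addresses act as salts (think: parent directories). */
static int demo_dir_a, demo_dir_b;

/* Hash a string, folding the salt pointer in at the very beginning. */
static uint32_t demo_salted_hash(const void *salt, const char *name,
				 size_t len)
{
	uint32_t h = 2166136261u ^ (uint32_t)(uintptr_t)salt;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;
	}
	return h;
}

static void demo_hash_selfcheck(void)
{
	/* Same salt and same name always give the same hash. */
	assert(demo_salted_hash(&demo_dir_a, "foo", 3) ==
	       demo_salted_hash(&demo_dir_a, "foo", 3));

	/* The same name under a different salt hashes differently. */
	assert(demo_salted_hash(&demo_dir_a, "foo", 3) !=
	       demo_salted_hash(&demo_dir_b, "foo", 3));

	/* A NULL salt gives a hash that depends on the string alone. */
	assert(demo_salted_hash(NULL, "foo", 3) ==
	       demo_salted_hash(NULL, "foo", 3));
}
```

    Because the salt only affects the initial state, any part of the work
    that depends on it can be done once per salt rather than once per
    lookup, which is the saving the message above refers to.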

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
     

28 Jul, 2016

7 commits

  • Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Add pool namespace pointer to struct ceph_file_layout and struct
    ceph_object_locator. Pool namespace is used when mapping an object
    to a PG; it's also used when composing an OSD request.

    The namespace pointer in struct ceph_file_layout is RCU protected,
    so libceph can read the namespace without taking a lock.

    Signed-off-by: Yan, Zheng
    [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • The data structure is for storing the namespace string. It allows the
    namespace string to be shared between cephfs inodes with the same
    layout. This data structure can also be referenced by OSD requests.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Define new ceph_file_layout structure and rename old ceph_file_layout
    to ceph_file_layout_legacy. This is preparation for adding namespace
    to ceph_file_layout structure.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • - decode.h needs slab.h for kmalloc()
    - osd_client.h needs msgpool.h for struct ceph_msgpool
    - msgpool.h doesn't need messenger.h

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Pull networking updates from David Miller:

    1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

    2) Make DSA binding more sane, from Andrew Lunn.

    3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

    4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

    5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface. From Brenden Blanco and
    others.

    6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

    7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

    8) Simplify netlink conntrack entry layout, from Florian Westphal.

    9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

    10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

    11) Support qdisc packet injection in pktgen, from John Fastabend.

    12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

    13) Add NV congestion control support to TCP, from Lawrence Brakmo.

    14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

    15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

    16) Support MPLS over IPV4, from Simon Horman.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    xgene: Fix build warning with ACPI disabled.
    be2net: perform temperature query in adapter regardless of its interface state
    l2tp: Correctly return -EBADF from pppol2tp_getname.
    net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
    net: ipmr/ip6mr: update lastuse on entry change
    macsec: ensure rx_sa is set when validation is disabled
    tipc: dump monitor attributes
    tipc: add a function to get the bearer name
    tipc: get monitor threshold for the cluster
    tipc: make cluster size threshold for monitoring configurable
    tipc: introduce constants for tipc address validation
    net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
    MAINTAINERS: xgene: Add driver and documentation path
    Documentation: dtb: xgene: Add MDIO node
    dtb: xgene: Add MDIO node
    drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
    drivers: net: xgene: Use exported functions
    drivers: net: xgene: Enable MDIO driver
    drivers: net: xgene: Add backward compatibility
    drivers: net: phy: xgene: Add MDIO driver
    ...

    Linus Torvalds
     

27 Jul, 2016

11 commits