27 Mar, 2016

1 commit

  • Pull Ceph updates from Sage Weil:
    "There is quite a bit here, including some overdue refactoring and
    cleanup on the mon_client and osd_client code from Ilya, scattered
    writeback support for CephFS and a pile of bug fixes from Zheng, and a
    few random cleanups and fixes from others"

    [ I already decided not to pull this because of it having been rebased
    recently, but ended up changing my mind after all. Next time I'll
    really hold people to it. Oh well. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
    libceph: use KMEM_CACHE macro
    ceph: use kmem_cache_zalloc
    rbd: use KMEM_CACHE macro
    ceph: use lookup request to revalidate dentry
    ceph: kill ceph_get_dentry_parent_inode()
    ceph: fix security xattr deadlock
    ceph: don't request vxattrs from MDS
    ceph: fix mounting same fs multiple times
    ceph: remove unnecessary NULL check
    ceph: avoid updating directory inode's i_size accidentally
    ceph: fix race during filling readdir cache
    libceph: use sizeof_footer() more
    ceph: kill ceph_empty_snapc
    ceph: fix a wrong comparison
    ceph: replace CURRENT_TIME by current_fs_time()
    ceph: scattered page writeback
    libceph: add helper that duplicates last extent operation
    libceph: enable large, variable-sized OSD requests
    libceph: osdc->req_mempool should be backed by a slab pool
    libceph: make r_request msg_size calculation clearer
    ...

    Linus Torvalds
     

26 Mar, 2016

17 commits

  • Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.

    Signed-off-by: Geliang Tang
    Signed-off-by: Ilya Dryomov

    Geliang Tang
     
  • Don't open-code sizeof_footer() in read_partial_message() and
    ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now
    possible to use con_out_kvec_add() in prepare_write_message_footer().

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • This helper duplicates last extent operation in OSD request, then
    adjusts the new extent operation's offset and length. The helper
    is for scattered page writeback, which adds nonconsecutive dirty
    pages to a single OSD request.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Turn r_ops into a flexible array member to enable large OSD requests
    consisting of up to 16 ops. The use case is scattered writeback in
    cephfs and, as far as the kernel client is concerned, 16 is just a made
    up number.

    r_ops had size 3 for copyup+hint+write, but copyup is really a special
    case - it can only happen once. ceph_osd_request_cache is therefore
    stuffed with num_ops=2 requests, anything bigger than that is allocated
    with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which
    means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
    users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
    that.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
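
    The allocation scheme described in this commit can be illustrated with a
    userspace sketch. Everything named sketch_* or SKETCH_* below is
    hypothetical and is not the kernel's code: calloc() stands in for both
    the slab cache and kmalloc(), and the from_cache flag only records which
    of the two paths a real implementation would presumably have taken.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
#define SKETCH_MAX_OPS    16 /* the "made up" upper bound from the message */
#define SKETCH_CACHED_OPS 2  /* op count the fixed-size cache is sized for */

struct sketch_op {
    unsigned int opcode;
    unsigned long long offset;
    unsigned long long length;
};

struct sketch_request {
    unsigned int num_ops;
    int from_cache;          /* 1 if a cache-sized object was used */
    struct sketch_op ops[];  /* flexible array member, must come last */
};

/* Bytes needed for a request header plus num_ops trailing ops. */
static size_t sketch_request_size(unsigned int num_ops)
{
    return sizeof(struct sketch_request) + num_ops * sizeof(struct sketch_op);
}

/*
 * Small requests get a cache-sized object (room for SKETCH_CACHED_OPS ops);
 * anything bigger is sized exactly. Userspace has no slab cache, so both
 * paths use calloc() here.
 */
static struct sketch_request *sketch_request_alloc(unsigned int num_ops)
{
    struct sketch_request *req;
    unsigned int alloc_ops;

    if (num_ops == 0 || num_ops > SKETCH_MAX_OPS)
        return NULL;

    alloc_ops = num_ops <= SKETCH_CACHED_OPS ? SKETCH_CACHED_OPS : num_ops;
    req = calloc(1, sketch_request_size(alloc_ops));
    if (!req)
        return NULL;

    req->num_ops = num_ops;
    req->from_cache = num_ops <= SKETCH_CACHED_OPS;
    return req;
}
```

    A caller would request, say, sketch_request_alloc(16) for a scattered
    writeback and release the result with free(); requests with one or two
    ops share a single object size, which is what makes a fixed-size cache
    usable for them.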
     
  • ceph_osd_request_cache was introduced a long time ago. Also, osd_req
    is about to get a flexible array member, which ceph_osd_request_cache
    is going to be aware of.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Although msg_size is calculated correctly, the terms are grouped in
    a misleading way - snaps appears to not have room for a u32 length.
    Move calculation closer to its use and regroup terms.

    No functional change.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • This avoids defining a large array of r_reply_op_{len,result} in
    struct ceph_osd_request.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Follow userspace nomenclature on this - the next commit adds
    outdata_len.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • This can happen if __close_session() in ceph_monc_stop() races with
    a connection reset. We need to ignore such faults, otherwise it's
    likely we would take !hunting, call __schedule_delayed() and end up
    with delayed_work() executing on invalid memory, among other things.

    The (two!) con->private tests are useless, as nothing ever clears
    con->private. Nuke them.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Doing __schedule_delayed() in the hunting branch is pointless, as the
    tick will have already been scheduled by then.

    What we need to do instead is *reschedule* it in the !hunting branch,
    after reopen_session() changes hunt_mult, which affects the delay.
    This helps with spacing out connection attempts and avoiding things
    like two back-to-back attempts followed by a longer period of waiting
    around.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • hunting is now set in __open_session() and cleared in finish_hunting(),
    instead of all around. The "session lost" message is printed not only
    on connection resets, but also on keepalive timeouts.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Unless we are in the process of setting up a client (i.e. connecting to
    the monitor cluster for the first time), apply a backoff: every time we
    want to reopen a session, increase our timeout by a multiple (currently
    2); when we complete the connection, reduce that multiplier by 50%.

    Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
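
    The backoff rule above can be sketched in a few lines of userspace C.
    This is an illustration under stated assumptions, not the mon_client
    code: the sketch_* names, the upper cap of 10 and the lower bound of 1
    are all assumptions made for the example; only the factor of 2 and the
    halving on success come from the commit message.

```c
/* Assumed constants; only the factor of 2 is taken from the message. */
#define SKETCH_BACKOFF  2.0
#define SKETCH_MAX_MULT 10.0 /* hypothetical cap on the multiplier */

struct sketch_monc {
    int had_session;  /* 0 while connecting for the very first time */
    double hunt_mult; /* timeout multiplier, starts at 1.0 */
};

/* Called whenever a session has to be reopened. */
static void sketch_reopen_session(struct sketch_monc *m)
{
    if (!m->had_session)
        return; /* no backoff during initial client setup */

    m->hunt_mult *= SKETCH_BACKOFF;
    if (m->hunt_mult > SKETCH_MAX_MULT)
        m->hunt_mult = SKETCH_MAX_MULT;
}

/* Called when a connection attempt completes successfully. */
static void sketch_finish_hunting(struct sketch_monc *m)
{
    m->had_session = 1;
    m->hunt_mult /= 2.0; /* reduce the multiplier by 50% */
    if (m->hunt_mult < 1.0)
        m->hunt_mult = 1.0; /* assumed floor */
}

/* Effective timeout for the next attempt. */
static double sketch_hunt_delay(const struct sketch_monc *m, double base)
{
    return base * m->hunt_mult;
}
```

    With a base interval of 3 seconds, two consecutive reopen attempts after
    an established session would stretch the delay to 6 and then 12 seconds
    in this sketch, and each successful connection would halve it again.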
     
  • Split ping interval and ping timeout: ping interval is 10s; keepalive
    timeout is 30s.

    Make monc_ping_timeout a constant while at it - it's not actually
    exported as a mount option (and the rest of tick-related settings won't
    be either), so it's got no place in ceph_options.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Don't try to reconnect to the same monitor when we fail to establish
    a session within a timeout or it's lost.

    For that, pick_new_mon() needs to see the old value of cur_mon, so
    don't clear it in __close_session() - all calls to __close_session()
    but one are followed by __open_session() anyway. __open_session() is
    only called when a new session needs to be established, so the "already
    open?" branch, which is now in the way, is simply dropped.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • It is currently hard-coded in the mon_client that mdsmap and monmap
    subs are continuous, while osdmap sub is always "onetime". To better
    handle full clusters/pools in the osd_client, we need to be able to
    issue continuous osdmap subs. Revamp subs code to allow us to specify
    for each sub whether it should be continuous or not.

    Although not strictly required for the above, switch to SUBSCRIBE2
    protocol while at it, eliminating the ambiguity between a request for
    "every map since X" and a request for "just the latest" when we don't
    have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now
    required - it's been supported since pre-argonaut (2010).

    Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
    it before we validate the epoch and successfully install the new map
    can mess up mon_client sub state.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Coupling hunting state with subscribe state is not a good idea. Clear
    hunting when we complete the authentication handshake.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Our debugfs dir name is a concatenation of cluster fsid and client
    unique ID ("global_id"). It used to be the case that we learned
    global_id first, nowadays we always learn fsid first - the monmap is
    sent before any auth replies are. ceph_debugfs_client_init() call in
    ceph_monc_handle_map() is therefore never executed and can be removed.

    Its counterpart in handle_auth_reply() doesn't really belong there
    either: having to do monc->client and unlocking early to work around
    lockdep is a testament to that. Move it into __ceph_open_session(),
    where it can be called unconditionally.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

25 Mar, 2016

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Various bugfixes, a RDMA update from Chuck Lever, and support for a
    new pnfs layout type from Christoph Hellwig. The new layout type is a
    variant of the block layout which uses SCSI features to offer improved
    fencing and device identification.

    (Also: note this pull request also includes the client side of SCSI
    layout, with Trond's permission.)"

    * tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
    sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
    nfsd: recover: fix memory leak
    nfsd: fix deadlock secinfo+readdir compound
    nfsd4: resfh unused in nfsd4_secinfo
    svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
    svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
    svcrdma: Remove close_out exit path
    svcrdma: Hook up the logic to return ERR_CHUNK
    svcrdma: Use correct XID in error replies
    svcrdma: Make RDMA_ERROR messages work
    rpcrdma: Add RPCRDMA_HDRLEN_ERR
    svcrdma: svc_rdma_post_recv() should close connection on error
    svcrdma: Close connection when a send error occurs
    nfsd: Lower NFSv4.1 callback message size limit
    svcrdma: Do not send Write chunk XDR pad with inline content
    svcrdma: Do not write xdr_buf::tail in a Write chunk
    svcrdma: Find client-provided write and reply chunks once per reply
    nfsd: Update NFS server comments related to RDMA support
    nfsd: Fix a memory leak when meeting unsupported state_protect_how4
    nfsd4: fix bad bounds checking

    Linus Torvalds
     

24 Mar, 2016

3 commits

  • Pull networking bugfixes from David Miller:
    "Several bug fixes rolling in, some for changes introduced in this
    merge window, and some for problems that have existed for some time:

    1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda.

    2) The new DST_CACHE should be a silent config option, from Dave
    Jones.

    3) inet_current_timestamp() unintentionally truncates timestamps to
    16-bit, from Deepa Dinamani.

    4) Missing reference to netns in ppp, from Guillaume Nault.

    5) Free memory reference in hv_netvsc driver, from Haiyang Zhang.

    6) Missing kernel doc documentation for function arguments in various
    spots around the networking, from Luis de Bethencourt.

    7) UDP stopped receiving broadcast packets properly, due to
    overzealous multicast checks, fix from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
    net: ping: make ping_v6_sendmsg static
    hv_netvsc: Fix the order of num_sc_offered decrement
    net: Fix typos and whitespace.
    hv_netvsc: Fix the array sizes to be max supported channels
    hv_netvsc: Fix accessing freed memory in netvsc_change_mtu()
    ppp: take reference on channels netns
    net: Reset encap_level to avoid resetting features on inner IP headers
    net: mediatek: fix checking for NULL instead of IS_ERR() in .probe
    net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY
    at803x: fix reset handling
    AF_VSOCK: Shrink the area influenced by prepare_to_wait
    Revert "vsock: Fix blocking ops call in prepare_to_wait"
    macb: fix PHY reset
    ipv4: initialize flowi4_flags before calling fib_lookup()
    fsl/fman: Workaround for Errata A-007273
    ipv4: fix broadcast packets reception
    net: hns: bug fix about the overflow of mss
    net: hns: adds limitation for debug port mtu
    net: hns: fix the bug about mtu setting
    net: hns: fixes a bug of RSS
    ...

    Linus Torvalds
     
  • ping_v6_sendmsg is used only in this file, so make it static.

    The bodies of "pingv6_prot" and "pingv6_protosw" were moved to the
    middle of the file to avoid having to declare static prototypes.

    Signed-off-by: Haishuang Yan
    Signed-off-by: David S. Miller

    Haishuang Yan
     
  • This patch corrects an oversight in which we were allowing the encap_level
    value to pass from the outer headers to the inner headers. As a result we
    were incorrectly identifying UDP or GRE tunnels as also making use of ipip
    or sit when the second header actually represented a tunnel encapsulated in
    either a UDP or GRE tunnel which already had the features masked.

    Fixes: 76443456227097179c1482 ("net: Move GSO csum into SKB_GSO_CB")
    Reported-by: Tom Herbert
    Signed-off-by: Alexander Duyck
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Alexander Duyck
     

23 Mar, 2016

10 commits

  • Merge third patch-bomb from Andrew Morton:

    - more ocfs2 changes

    - a few hotfixes

    - Andy's compat cleanups

    - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
    panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

    - many rapidio updates: fixes, new drivers.

    - kcov: kernel code coverage feature. Like gcov, but not
    "prohibitively expensive".

    - extable code consolidation for various archs

    * emailed patches from Andrew Morton : (81 commits)
    ia64/extable: use generic search and sort routines
    x86/extable: use generic search and sort routines
    s390/extable: use generic search and sort routines
    alpha/extable: use generic search and sort routines
    kernel/...: convert pr_warning to pr_warn
    drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
    drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
    memremap: add MEMREMAP_WC flag
    memremap: don't modify flags
    kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
    mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
    ipc/sem: make semctl setting sempid consistent
    ubsan: fix tree-wide -Wmaybe-uninitialized false positives
    kfifo: fix sparse complaints
    scripts/gdb: account for changes in module data structure
    scripts/gdb: add cmdline reader command
    scripts/gdb: add version command
    kernel: add kcov code coverage
    profile: hide unused functions when !CONFIG_PROC_FS
    hpwdt: use nmi_panic() when kernel panics in NMI handler
    ...

    Linus Torvalds
     
  • Pull more rdma updates from Doug Ledford:
    "Round two of 4.6 merge window patches.

    This is a monster pull request. I held off on the hfi1 driver updates
    (the hfi1 driver is intimately tied to the qib driver and the new
    rdmavt software library that was created to help both of them) in my
    first pull request. The hfi1/qib/rdmavt update is probably 90% of
    this pull request. The hfi1 driver is being left in staging so that
    it can be fixed up in regards to the API that Al and yourself didn't
    like. Intel has agreed to do the work, but in the meantime, this
    clears out 300+ patches in the backlog queue and brings my tree and
    their tree closer to sync.

    This also includes about 10 patches to the core and a few to mlx5 to
    create an infrastructure for configuring SRIOV ports on IB devices.
    That series includes one patch to the net core that we sent to netdev@
    and Dave Miller with each of the three revisions to the series. We
    didn't get any response to the patch, so we took that as implicit
    approval.

    Finally, this series includes Intel's new iWARP driver for their x722
    cards. It's not nearly the beast as the hfi1 driver. It also has a
    linux-next merge issue, but that has been resolved and it now passes
    just fine.

    Summary:

    - A few minor core fixups needed for the next patch series

    - The IB SRIOV series. This has bounced around for several versions.
    Of note is the fact that the first patch in this series affects the
    net core. It was directed to netdev and DaveM for each iteration
    of the series (three versions total). Dave did not object, but did
    not respond either. I've taken this as permission to move forward
    with the series.

    - The new Intel X722 iWARP driver

    - A huge set of updates to the Intel hfi1 driver. Of particular
    interest here is that we have left the driver in staging since it
    still has an API that people object to. Intel is working on a fix,
    but getting these patches in now helps keep me sane as the upstream
    and Intel's trees were over 300 patches apart"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
    IB/ipoib: Allow mcast packets from other VFs
    IB/mlx5: Implement callbacks for manipulating VFs
    net/mlx5_core: Implement modify HCA vport command
    net/mlx5_core: Add VF param when querying vport counter
    IB/ipoib: Add ndo operations for configuring VFs
    IB/core: Add interfaces to control VF attributes
    IB/core: Support accessing SA in virtualized environment
    IB/core: Add subnet prefix to port info
    IB/mlx5: Fix decision on using MAD_IFC
    net/core: Add support for configuring VF GUIDs
    IB/{core, ulp} Support above 32 possible device capability flags
    IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
    net/mlx5_core: Introduce offload arithmetic hardware capabilities
    net/mlx5_core: Refactor device capability function
    net/mlx5_core: Fix caching ATOMIC endian mode capability
    ib_srpt: fix a WARN_ON() message
    i40iw: Replace the obsolete crypto hash interface with shash
    IB/hfi1: Add SDMA cache eviction algorithm
    IB/hfi1: Switch to using the pin query function
    IB/hfi1: Specify mm when releasing pages
    ...

    Linus Torvalds
     
  • The code wants to prevent compat code from receiving messages. Use
    in_compat_syscall for this.

    Signed-off-by: Andy Lutomirski
    Cc: Steffen Klassert
    Cc: Herbert Xu
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • SCTP unfortunately has a different ABI for SCTP_SOCKOPT_CONNECTX3 for
    32-bit and 64-bit callers. Use in_compat_syscall to correctly
    distinguish them on all architectures.

    Signed-off-by: Andy Lutomirski
    Cc: Vlad Yasevich
    Cc: Neil Horman
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • When a thread is prepared for waiting by calling prepare_to_wait, sleeping
    is not allowed until either the wait has taken place or finish_wait has
    been called. The existing code in af_vsock imposed unnecessary no-sleep
    assumptions on a broad list of backend functions.
    This patch shrinks the influence of prepare_to_wait to the area where it
    is strictly needed, therefore relaxing the no-sleep restriction there.

    Signed-off-by: Claudio Imbrenda
    Signed-off-by: David S. Miller

    Claudio Imbrenda
     
  • This reverts commit 5988818008257ca42010d6b43a3e0e48afec9898 ("vsock: Fix
    blocking ops call in prepare_to_wait")

    The commit reverted with this patch caused us to potentially miss wakeups.
    Since the condition is not checked between the prepare_to_wait and the
    schedule(), if a wakeup happens after the condition is checked but before
    the sleep happens, we will miss it. ( A description of the problem can be
    found here: http://www.makelinux.net/ldd3/chp-6-sect-2 ).

    By reverting the patch, the behaviour is still incorrect (since we
    shouldn't sleep between the prepare_to_wait and the schedule) but at least
    it will not miss wakeups.

    The next patch in the series actually fixes the behaviour.

    Signed-off-by: Claudio Imbrenda
    Signed-off-by: David S. Miller

    Claudio Imbrenda
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Features:
    - Add support for multiple NFSv4.1 callbacks in flight
    - Initial patchset for RPC multipath support
    - Adapt RPC/RDMA to use the new completion queue API

    Bugfixes and cleanups:
    - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
    - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
    - Fix RPC/RDMA credit accounting
    - Properly handle RDMA_ERROR replies
    - xprtrdma: Do not wait if ib_post_send() fails
    - xprtrdma: Segment head and tail XDR buffers on page boundaries
    - xprtrdma cleanups for dprintk, physical_op_map and unused macros"

    * tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits)
    nfs/blocklayout: make sure making a aligned read request
    nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
    nfs: remove nfs_inode_dio_wait
    nfs: remove nfs4_file_fsync
    xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
    xprtrdma: Use an anonymous union in struct rpcrdma_mw
    xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
    xprtrdma: Serialize credit accounting again
    xprtrdma: Properly handle RDMA_ERROR replies
    rpcrdma: Add RPCRDMA_HDRLEN_ERR
    xprtrdma: Do not wait if ib_post_send() fails
    xprtrdma: Segment head and tail XDR buffers on page boundaries
    xprtrdma: Clean up dprintk format string containing a newline
    xprtrdma: Clean up physical_op_map()
    xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
    NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
    pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
    SUNRPC: Allow addition of new transports to a struct rpc_clnt
    NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
    SUNRPC: Make NFS swap work with multipath
    ...

    Linus Torvalds
     
  • Field fl4.flowi4_flags is not initialized in fib_compute_spec_dst()
    before calling fib_lookup(), which means fib_table_lookup() is
    using non-deterministic data at this line:

    if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {

    Fix by initializing the entire fl4 structure, which will prevent
    similar issues as fields are added in the future by ensuring that
    all fields are initialized to zero unless explicitly initialized
    to another value.

    Fixes: 58189ca7b2741 ("net: Fix vti use case with oif in dst lookups")
    Suggested-by: David Ahern
    Signed-off-by: Lance Richardson
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Lance Richardson
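
    The effect of initializing the whole structure can be shown with a small
    userspace sketch. The struct, the flag value and the function names
    below are hypothetical stand-ins chosen for the example, and the
    designated initializer is just one common way to get every unnamed
    member zeroed; the sketch does not claim to reproduce the actual patch.

```c
/* Hypothetical flag value, for illustration only. */
#define SKETCH_FLAG_SKIP_NH_OIF 0x01

/* Simplified stand-in for a flow key; not the kernel's struct flowi4. */
struct sketch_flow {
    int oif;
    unsigned int daddr;
    unsigned int saddr;
    unsigned char flags;
};

/*
 * Build a lookup key. With a designated initializer, every member that is
 * not named (here: saddr and flags) is zero-initialized, so fields added to
 * the struct later are covered without touching this function.
 */
static struct sketch_flow sketch_build_flow(int oif, unsigned int daddr)
{
    struct sketch_flow fl = {
        .oif = oif,
        .daddr = daddr,
    };

    return fl;
}

/* The kind of test a lookup routine might perform on the flags field. */
static int sketch_skips_nh_oif(const struct sketch_flow *fl)
{
    return (fl->flags & SKETCH_FLAG_SKIP_NH_OIF) != 0;
}
```

    Had fl been declared without an initializer and only oif and daddr been
    assigned, the result of sketch_skips_nh_oif() would depend on whatever
    happened to be on the stack.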
     
  • Currently, ingress ipv4 broadcast datagrams are dropped since,
    in udp_v4_early_demux(), ip_check_mc_rcu() is invoked even on
    bcast packets.

    This patch addresses the issue, invoking ip_check_mc_rcu()
    only for mcast packets.

    Fixes: 6e5403093261 ("ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()")
    Signed-off-by: Paolo Abeni
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
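
    The control flow of this fix can be sketched as follows. The sketch_*
    functions are hypothetical and take a host-byte-order destination
    address plus a precomputed "group is ours" answer; the real code works
    on socket buffers and calls ip_check_mc_rcu(), which is not modelled
    here.

```c
#include <stdint.h>

/* IPv4 multicast is 224.0.0.0/4; the address is in host byte order. */
static int sketch_is_multicast(uint32_t daddr)
{
    return (daddr & 0xf0000000u) == 0xe0000000u;
}

/*
 * Decide whether early demux may proceed for a non-unicast datagram.
 * The group membership check is applied to multicast destinations only;
 * broadcast destinations are let through without consulting it.
 * Returns 1 to proceed, 0 to skip.
 */
static int sketch_early_demux_ok(uint32_t daddr, int group_is_ours)
{
    if (sketch_is_multicast(daddr))
        return group_is_ours;

    return 1;
}
```

    In this sketch the limited broadcast address 255.255.255.255 passes
    regardless of group membership, whereas 224.0.0.1 passes only when the
    membership check succeeds.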
     
  • By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling
    dev_ioctl(), which provides support for NIC driver ioctls, which
    includes ethtool support. This is similar to the way ioctls are handled
    in udp.c or tcp.c.

    This removes the requirement that ethtool for example be tied to the
    support of a specific L3 protocol (ethtool uses an AF_INET socket
    today).

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    David Decotigny
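
    The fallback mechanism described above can be sketched in userspace C.
    All sketch_* functions and SKETCH_* constants are hypothetical: the
    command numbers are invented, and SKETCH_ENOIOCTLCMD is a placeholder
    for the kernel-internal error code, which userspace errno.h does not
    provide.

```c
#include <errno.h>

/* Placeholder for the kernel-internal "no ioctl command" code. */
#define SKETCH_ENOIOCTLCMD 515

/* Invented command numbers, for illustration only. */
#define SKETCH_CMD_PROTO   1 /* handled by the protocol itself */
#define SKETCH_CMD_ETHTOOL 2 /* handled by the device layer */

/* Protocol handler: declines anything it does not know about. */
static int sketch_proto_ioctl(int cmd)
{
    if (cmd == SKETCH_CMD_PROTO)
        return 0;

    return -SKETCH_ENOIOCTLCMD;
}

/* Device-layer handler, standing in for dev_ioctl(). */
static int sketch_dev_ioctl(int cmd)
{
    if (cmd == SKETCH_CMD_ETHTOOL)
        return 0;

    return -EINVAL;
}

/* Dispatcher: fall back to the device layer only when the protocol declines. */
static int sketch_sock_do_ioctl(int cmd)
{
    int err = sketch_proto_ioctl(cmd);

    if (err == -SKETCH_ENOIOCTLCMD)
        err = sketch_dev_ioctl(cmd);

    return err;
}
```

    The point of the pattern is that the protocol handler never needs to
    know which device ioctls exist; returning the "not mine" code is enough
    for the dispatcher to try the next layer.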
     

22 Mar, 2016

6 commits

  • The millisecond timestamp returned by the function is
    converted to network byte order by a call to htons().
    htons() returns only __be16, while __be32 is required here.

    This was identified by the sparse warning from the buildbot:
    net/ipv4/af_inet.c:1405:16: sparse: incorrect type in return
    expression (different base types)
    net/ipv4/af_inet.c:1405:16: expected restricted __be32
    net/ipv4/af_inet.c:1405:16: got restricted __be16 [usertype]

    Change the function to use htonl() to return the correct __be32 type
    instead so that the millisecond value doesn't get truncated.

    Signed-off-by: Deepa Dinamani
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: Hideaki YOSHIFUJI
    Cc: James Morris
    Cc: Patrick McHardy
    Cc: Arnd Bergmann
    Fixes: 822c868532ca ("net: ipv4: Convert IP network timestamps to be y2038 safe")
    Reported-by: Fengguang Wu [0-day test robot]
    Signed-off-by: David S. Miller

    Deepa Dinamani
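
    The truncation is easy to reproduce in userspace. The two helpers below
    are hypothetical and exist only to contrast the 16-bit and 32-bit
    conversions from <arpa/inet.h>; they are not taken from af_inet.c.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Convert with the 16-bit helper and back: the upper 16 bits are lost. */
static uint32_t sketch_roundtrip16(uint32_t msecs)
{
    return ntohs(htons((uint16_t)msecs));
}

/* Convert with the 32-bit helper and back: the value survives intact. */
static uint32_t sketch_roundtrip32(uint32_t msecs)
{
    return ntohl(htonl(msecs));
}
```

    For a millisecond count such as 43200000 (noon, counted from midnight),
    sketch_roundtrip16() keeps only the low 16 bits, while
    sketch_roundtrip32() returns the original value.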
     
  • commit 911362c70d ("net: add dst_cache support") added a new
    kconfig option that gets selected by other networking options.
    It seems the intent wasn't to offer this as a user-selectable
    option given the lack of help text, so this patch converts it
    to a silent option.

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones
     
  • Add two new NLAs to support configuration of Infiniband node or port
    GUIDs. New applications can choose to use this interface to configure
    GUIDs with iproute2 with commands such as:

    ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
    ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

    A new ndo, ndo_set_vf_guid, is introduced to notify the net device of the
    request to change the GUID.

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
     
  • It can be useful to lower max_gso_segs on NICs with a very low
    number of TX descriptors, like bcmgenet.

    However, this is defeated by the bridge, since it does not propagate
    the lower value of max_gso_segs and max_gso_size.

    Signed-off-by: Eric Dumazet
    Cc: Petri Gynther
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • It can be useful to report dev->gso_max_segs and dev->gso_max_size
    so that "ip -d link" can display them to help debugging.

    For the moment, these attributes are read-only.

    Signed-off-by: Eric Dumazet
    Cc: Petri Gynther
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When the function dev_get_phys_port_name was added, it missed a description
    for its len argument. Add it.

    Fixes: db24a9044ee1 ("net: add support for phys_port_name")
    Signed-off-by: Luis de Bethencourt
    Signed-off-by: David S. Miller

    Luis de Bethencourt
     

21 Mar, 2016

2 commits

  • Pull x86 protection key support from Ingo Molnar:
    "This tree adds support for a new memory protection hardware feature
    that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

    There's a background article at LWN.net:

    https://lwn.net/Articles/643797/

    The gist is that protection keys allow the encoding of
    user-controllable permission masks in the pte. So instead of having a
    fixed protection mask in the pte (which needs a system call to change
    and works on a per page basis), the user can map a (handful of)
    protection mask variants and can change the masks at runtime relatively
    cheaply, without having to change every single page in the affected
    virtual memory range.

    This allows the dynamic switching of the protection bits of large
    amounts of virtual memory, via user-space instructions. It also
    allows more precise control of MMU permission bits: for example the
    executable bit is separate from the read bit (see more about that
    below).

    This tree adds the MM infrastructure and low level x86 glue needed for
    that, plus it adds a high level API to make use of protection keys -
    if a user-space application calls:

    mmap(..., PROT_EXEC);

    or

    mprotect(ptr, sz, PROT_EXEC);

    (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
    this special case, and will set a special protection key on this
    memory range. It also sets the appropriate bits in the Protection
    Keys User Rights (PKRU) register so that the memory becomes unreadable
    and unwritable.

    So using protection keys the kernel is able to implement 'true'
    PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
    PROT_READ as well. Unreadable executable mappings have security
    advantages: they cannot be read via information leaks to figure out
    ASLR details, nor can they be scanned for ROP gadgets - and they
    cannot be used by exploits for data purposes either.

    We know about no user-space code that relies on pure PROT_EXEC
    mappings today, but binary loaders could start making use of this new
    feature to map binaries and libraries in a more secure fashion.

    There is other pending pkeys work that offers more high level system
    call APIs to manage protection keys - but those are not part of this
    pull request.

    Right now there's a Kconfig that controls this feature
    (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
    (like most x86 CPU feature enablement code that has no runtime
    overhead), but it's not user-configurable at the moment. If there's
    any serious problem with this then we can make it configurable and/or
    flip the default"

    * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
    x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
    mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
    x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
    mm/core, x86/mm/pkeys: Add execute-only protection keys support
    x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
    x86/mm/pkeys: Allow kernel to modify user pkey rights register
    x86/fpu: Allow setting of XSAVE state
    x86/mm: Factor out LDT init from context init
    mm/core, x86/mm/pkeys: Add arch_validate_pkey()
    mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
    x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
    x86/mm/pkeys: Add Kconfig prompt to existing config option
    x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
    x86/mm/pkeys: Dump PKRU with other kernel registers
    mm/core, x86/mm/pkeys: Differentiate instruction fetches
    x86/mm/pkeys: Optimize fault handling in access_error()
    mm/core: Do not enforce PKEY permissions on remote mm access
    um, pkeys: Add UML arch_*_access_permitted() methods
    mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
    x86/mm/gup: Simplify get_user_pages() PTE bit handling
    ...

    Linus Torvalds
     
  • Commit 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe")
    added the argument cpu_bstats to the functions gen_new_estimator and
    gen_replace_estimator, but its description is missing from their
    documentation. Add it.

    Signed-off-by: Luis de Bethencourt
    Signed-off-by: David S. Miller

    Luis de Bethencourt