04 Mar, 2017

1 commit

  • Pull vfs 'statx()' update from Al Viro.

    This adds the new extended stat() interface that internally subsumes our
    previous stat interfaces, and allows user mode to specify in more detail
    what kind of information it wants.

    It also allows for some explicit synchronization information to be
    passed to the filesystem, which can be relevant for network filesystems:
    is the cached value ok, or do you need open/close consistency, or what?

    From David Howells.

    Andreas Dilger points out that the first version of the extended statx
    interface was posted June 29, 2010:

    https://www.spinics.net/lists/linux-fsdevel/msg33831.html

    * 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    statx: Add a system call to make enhanced file info available

    Linus Torvalds
     

03 Mar, 2017

1 commit

  • Add a system call to make extended file information available, including
    file creation and some attribute flags where available through the
    underlying filesystem.

    The getattr inode operation is altered to take two additional arguments: a
    u32 request_mask and an unsigned int flags that indicate the
    synchronisation mode. This change is propagated to the vfs_getattr*()
    function.

    Functions like vfs_stat() are now inline wrappers around new functions
    vfs_statx() and vfs_statx_fd() to reduce stack usage.

    ========
    OVERVIEW
    ========

    The idea was initially proposed as a set of xattrs that could be retrieved
    with getxattr(), but the general preference proved to be for a new syscall
    with an extended stat structure.

    A number of requests were gathered for features to be included. The
    following have been included:

    (1) Make the fields a consistent size on all arches and make them large.

    (2) Spare space, request flags and information flags are provided for
    future expansion.

    (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
    __s64).

    (4) Creation time: The SMB protocol carries the creation time, which could
    be exported by Samba, which will in turn help CIFS make use of
    FS-Cache as that can be used for coherency data (stx_btime).

    This is also specified in NFSv4 as a recommended attribute and could
    be exported by NFSD [Steve French].

    (5) Lightweight stat: Ask for just those details of interest, and allow a
    netfs (such as NFS) to approximate anything not of interest, possibly
    without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
    Dilger] (AT_STATX_DONT_SYNC).

    (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
    its cached attributes are up to date [Trond Myklebust]
    (AT_STATX_FORCE_SYNC).

    And the following have been left out for future extension:

    (7) Data version number: Could be used by userspace NFS servers [Aneesh
    Kumar].

    Can also be used to modify fill_post_wcc() in NFSD which retrieves
    i_version directly, but has just called vfs_getattr(). It could get
    it from the kstat struct if it used vfs_xgetattr() instead.

    (There's disagreement on the exact semantics of a single field, since
    not all filesystems do this the same way).

    (8) BSD stat compatibility: Including more fields from the BSD stat such
    as creation time (st_btime) and inode generation number (st_gen)
    [Jeremy Allison, Bernd Schubert].

    (9) Inode generation number: Useful for FUSE and userspace NFS servers
    [Bernd Schubert].

    (This was asked for but later deemed unnecessary with the
    open-by-handle capability available and caused disagreement as to
    whether it's a security hole or not).

    (10) Extra coherency data may be useful in making backups [Andreas Dilger].

    (No particular data were offered, but things like last backup
    timestamp, the data version number and the DOS archive bit would come
    into this category).

    (11) Allow the filesystem to indicate what it can/cannot provide: A
    filesystem can now say it doesn't support a standard stat feature if
    that isn't available, so if, for instance, inode numbers or UIDs don't
    exist or are fabricated locally...

    (This requires a separate system call - I have an fsinfo() call idea
    for this).

    (12) Store a 16-byte volume ID in the superblock that can be returned in
    struct xstat [Steve French].

    (Deferred to fsinfo).

    (13) Include granularity fields in the time data to indicate the
    granularity of each of the times (NFSv4 time_delta) [Steve French].

    (Deferred to fsinfo).

    (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
    Note that the Linux IOC flags are a mess and filesystems such as Ext4
    define flags that aren't in linux/fs.h, so translation in the kernel
    may be a necessity (or, possibly, we provide the filesystem type too).

    (Some attributes are made available in stx_attributes, but the general
    feeling was that the IOC flags were too ext[234]-specific and shouldn't
    be exposed through statx this way).

    (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
    Michael Kerrisk].

    (Deferred, probably to fsinfo. Finding out if there's an ACL or
    seclabel might require extra filesystem operations).

    (16) Femtosecond-resolution timestamps [Dave Chinner].

    (A __reserved field has been left in the statx_timestamp struct for
    this - if there proves to be a need).

    (17) A set multiple attributes syscall to go with this.

    ===============
    NEW SYSTEM CALL
    ===============

    The new system call is:

    int ret = statx(int dfd,
                    const char *filename,
                    unsigned int flags,
                    unsigned int mask,
                    struct statx *buffer);

    The dfd, filename and flags parameters indicate the file to query, in a
    similar way to fstatat(). There is no equivalent of lstat() as that can be
    emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is
    also no equivalent of fstat() as that can be emulated by passing a NULL
    filename to statx() with the fd of interest in dfd.

    Whether or not statx() synchronises the attributes with the backing store
    can be controlled by OR'ing a value into the flags argument (this typically
    only affects network filesystems):

    (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
    respect.

    (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
    its attributes with the server - which might require data writeback to
    occur to get the timestamps correct.

    (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
    network filesystem. The resulting values should be considered
    approximate.

    mask is a bitmask indicating the fields in struct statx that are of
    interest to the caller. The user should set this to STATX_BASIC_STATS to
    get the basic set returned by stat(). It should be noted that asking for
    more information may entail extra I/O operations.

    buffer points to the destination for the data. This must be 256 bytes in
    size.

    ======================
    MAIN ATTRIBUTES RECORD
    ======================

    The following structures are defined in which to return the main attribute
    set:

    struct statx_timestamp {
        __s64 tv_sec;
        __s32 tv_nsec;
        __s32 __reserved;
    };

    struct statx {
        __u32 stx_mask;
        __u32 stx_blksize;
        __u64 stx_attributes;
        __u32 stx_nlink;
        __u32 stx_uid;
        __u32 stx_gid;
        __u16 stx_mode;
        __u16 __spare0[1];
        __u64 stx_ino;
        __u64 stx_size;
        __u64 stx_blocks;
        __u64 __spare1[1];
        struct statx_timestamp stx_atime;
        struct statx_timestamp stx_btime;
        struct statx_timestamp stx_ctime;
        struct statx_timestamp stx_mtime;
        __u32 stx_rdev_major;
        __u32 stx_rdev_minor;
        __u32 stx_dev_major;
        __u32 stx_dev_minor;
        __u64 __spare2[14];
    };

    The defined bits in request_mask and stx_mask are:

    STATX_TYPE          Want/got stx_mode & S_IFMT
    STATX_MODE          Want/got stx_mode & ~S_IFMT
    STATX_NLINK         Want/got stx_nlink
    STATX_UID           Want/got stx_uid
    STATX_GID           Want/got stx_gid
    STATX_ATIME         Want/got stx_atime{,_ns}
    STATX_MTIME         Want/got stx_mtime{,_ns}
    STATX_CTIME         Want/got stx_ctime{,_ns}
    STATX_INO           Want/got stx_ino
    STATX_SIZE          Want/got stx_size
    STATX_BLOCKS        Want/got stx_blocks
    STATX_BASIC_STATS   [The stuff in the normal stat struct]
    STATX_BTIME         Want/got stx_btime{,_ns}
    STATX_ALL           [All currently available stuff]

    stx_btime is the file creation time, stx_mask is a bitmask indicating the
    data provided and __spare*[] are where as-yet undefined fields can be
    placed.

    Time fields are structures with separate seconds and nanoseconds fields
    plus a reserved field in case we want to add even finer resolution. Note
    that times will be negative if before 1970; in such a case, the nanosecond
    fields will also be negative if not zero.
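
    The sign convention above means a timestamp can be collapsed into a
    single signed nanosecond count by plain addition. A minimal sketch,
    using a hypothetical stand-in struct (ts_sketch) that mirrors the layout
    of struct statx_timestamp as given in this message, and ignoring
    overflow for extreme tv_sec values:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring struct statx_timestamp from this message. */
struct ts_sketch {
    int64_t tv_sec;
    int32_t tv_nsec;
    int32_t reserved;
};

#define NSEC_PER_SEC_SKETCH 1000000000LL

/*
 * Collapse a timestamp into signed nanoseconds relative to the epoch.
 * Relies on the stated convention that tv_nsec carries the same sign as
 * tv_sec (or is zero) for times before 1970.
 */
int64_t ts_to_ns(struct ts_sketch ts)
{
    /* Reject mixed-sign inputs, which the convention does not allow. */
    assert((ts.tv_sec >= 0 && ts.tv_nsec >= 0) ||
           (ts.tv_sec <= 0 && ts.tv_nsec <= 0));

    return ts.tv_sec * NSEC_PER_SEC_SKETCH + ts.tv_nsec;
}
```

    With this convention, half a second before the epoch is tv_sec = 0,
    tv_nsec = -500000000, and one and a half seconds before it is
    tv_sec = -1, tv_nsec = -500000000.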

    The bits defined in the stx_attributes field convey information about a
    file, how it is accessed, where it is and what it does. The following
    attributes map to FS_*_FL flags and are the same numerical value:

    STATX_ATTR_COMPRESSED   File is compressed by the fs
    STATX_ATTR_IMMUTABLE    File is marked immutable
    STATX_ATTR_APPEND       File is append-only
    STATX_ATTR_NODUMP       File is not to be dumped
    STATX_ATTR_ENCRYPTED    File requires key to decrypt in fs

    Within the kernel, the supported flags are listed by:

    KSTAT_ATTR_FS_IOC_FLAGS

    [Are any other IOC flags of sufficient general interest to be exposed
    through this interface?]

    New flags include:

    STATX_ATTR_AUTOMOUNT Object is an automount trigger

    These are for the use of GUI tools that might want to mark files specially,
    depending on what they are.

    Fields in struct statx come in a number of classes:

    (0) stx_dev_*, stx_blksize.

    These are local system information and are always available.

    (1) stx_mode, stx_nlink, stx_uid, stx_gid, stx_[amc]time, stx_ino,
    stx_size, stx_blocks.

    These will be returned whether the caller asks for them or not. The
    corresponding bits in stx_mask will be set to indicate whether they
    actually have valid values.

    If the caller didn't ask for them, then they may be approximated. For
    example, NFS won't waste any time updating them from the server,
    unless as a byproduct of updating something requested.

    If the values don't actually exist for the underlying object (such as
    UID or GID on a DOS file), then the bit won't be set in the stx_mask,
    even if the caller asked for the value. In such a case, the returned
    value will be a fabrication.

    Note that there are instances where the type might not be valid, for
    instance Windows reparse points.

    (2) stx_rdev_*.

    This will be set only if stx_mode indicates we're looking at a
    blockdev or a chardev, otherwise will be 0.

    (3) stx_btime.

    Similar to (1), except this will be set to 0 if it doesn't exist.

    =======
    TESTING
    =======

    The following test program can be used to test the statx system call:

    samples/statx/test-statx.c

    Just compile and run, passing it paths to the files you want to examine.
    The file is built automatically if CONFIG_SAMPLES is enabled.

    Here's some example output. Firstly, an NFS directory that crosses to
    another FSID. Note that the AUTOMOUNT attribute is set because transiting
    this directory will cause d_automount to be invoked by the VFS.

    [root@andromeda ~]# /tmp/test-statx -A /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:26 Inode: 1703937 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000
    Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

    Secondly, the result of automounting on that directory.

    [root@andromeda ~]# /tmp/test-statx /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:27 Inode: 2 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

02 Mar, 2017

1 commit

  • So the original intention of tsk_cpus_allowed() was to 'future-proof'
    the field - but it's pretty ineffectual at that, because half of
    the code uses ->cpus_allowed directly ...

    Also, the wrapper makes the code longer than the original expression!

    So just get rid of it. This also shrinks a bit.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

23 Feb, 2017

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
    Varadhan.

    2) Simplify classifier state on sk_buff in order to shrink it a bit.
    From Willem de Bruijn.

    3) Introduce SIPHASH and its usage for secure sequence numbers and
    syncookies. From Jason A. Donenfeld.

    4) Reduce CPU usage for ICMP replies we are going to limit or
    suppress, from Jesper Dangaard Brouer.

    5) Introduce Shared Memory Communications socket layer, from Ursula
    Braun.

    6) Add RACK loss detection and allow it to actually trigger fast
    recovery instead of just assisting after other algorithms have
    triggered it. From Yuchung Cheng.

    7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

    8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

    9) Export MPLS packet stats via netlink, from Robert Shearman.

    10) Significantly improve inet port bind conflict handling, especially
    when an application is restarted and changes its setting of
    reuseport. From Josef Bacik.

    11) Implement TX batching in vhost_net, from Jason Wang.

    12) Extend the dummy device so that VF (virtual function) features,
    such as configuration, can be more easily tested. From Phil
    Sutter.

    13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
    Dumazet.

    14) Add new bpf MAP, implementing a longest prefix match trie. From
    Daniel Mack.

    15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

    16) Add new aquantia driver, from David VomLehn.

    17) Add bpf tracepoints, from Daniel Borkmann.

    18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
    Florian Fainelli.

    19) Remove custom busy polling in many drivers; it has been done in the
    core networking code since 4.5. From Eric Dumazet.

    20) Support XDP adjust_head in virtio_net, from John Fastabend.

    21) Fix several major holes in neighbour entry confirmation, from
    Julian Anastasov.

    22) Add XDP support to bnxt_en driver, from Michael Chan.

    23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

    24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

    25) Support GRO in IPSEC protocols, from Steffen Klassert"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
    Revert "ath10k: Search SMBIOS for OEM board file extension"
    net: socket: fix recvmmsg not returning error from sock_error
    bnxt_en: use eth_hw_addr_random()
    bpf: fix unlocking of jited image when module ronx not set
    arch: add ARCH_HAS_SET_MEMORY config
    net: napi_watchdog() can use napi_schedule_irqoff()
    tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
    net/hsr: use eth_hw_addr_random()
    net: mvpp2: enable building on 64-bit platforms
    net: mvpp2: switch to build_skb() in the RX path
    net: mvpp2: simplify MVPP2_PRS_RI_* definitions
    net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
    net: mvpp2: remove unused register definitions
    net: mvpp2: simplify mvpp2_bm_bufs_add()
    net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
    net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
    net: mvpp2: release reference to txq_cpu[] entry after unmapping
    net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
    net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
    net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
    ...

    Linus Torvalds
     

22 Feb, 2017

1 commit

  • Pull security layer updates from James Morris:
    "Highlights:

    - major AppArmor update: policy namespaces & lots of fixes

    - add /sys/kernel/security/lsm node for easy detection of loaded LSMs

    - SELinux cgroupfs labeling support

    - SELinux context mounts on tmpfs, ramfs, devpts within user
    namespaces

    - improved TPM 2.0 support"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
    tpm: declare tpm2_get_pcr_allocation() as static
    tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
    tpm xen: drop unneeded chip variable
    tpm: fix misspelled "facilitate" in module parameter description
    tpm_tis: fix the error handling of init_tis()
    KEYS: Use memzero_explicit() for secret data
    KEYS: Fix an error code in request_master_key()
    sign-file: fix build error in sign-file.c with libressl
    selinux: allow changing labels for cgroupfs
    selinux: fix off-by-one in setprocattr
    tpm: silence an array overflow warning
    tpm: fix the type of owned field in cap_t
    tpm: add securityfs support for TPM 2.0 firmware event log
    tpm: enhance read_log_of() to support Physical TPM event log
    tpm: enhance TPM 2.0 PCR extend to support multiple banks
    tpm: implement TPM 2.0 capability to get active PCR banks
    tpm: fix RC value check in tpm2_seal_trusted
    tpm_tis: fix iTPM probe via probe_itpm() function
    tpm: Begin the process to deprecate user_read_timer
    tpm: remove tpm_read_index and tpm_write_index from tpm.h
    ...

    Linus Torvalds
     

21 Feb, 2017

1 commit

  • Pull perf updates from Ingo Molnar:
    "On the kernel side the main changes in this cycle were:

    - Add Intel Kaby Lake CPU support (Srinivas Pandruvada)

    - AMD uncore driver updates for fam17 (Janakarajan Natarajan)

    - Intel/PT updates and core events optimizations and cleanups
    (Alexander Shishkin)

    - cgroups events fixes (David Carrillo-Cisneros)

    - kprobes improvements (Masami Hiramatsu)

    - ... plus misc fixes and updates.

    On the tooling side the main changes were:

    - Support clang build in tools/{perf,lib/{bpf,traceevent,api}} with
    CC=clang, to, for instance, take advantage of better warnings
    (Arnaldo Carvalho de Melo):

    - Introduce the 'delta-abs' 'perf diff' compute method, that orders
    the histogram entries by the absolute value of the percentage delta
    for a function in two perf.data files, i.e. the functions that
    changed the most (increase or decrease in samples) come first
    (Namhyung Kim)

    - Add support for parsing Intel uncore vendor event files and add
    uncore vendor events for the Intel server processors (Haswell,
    Broadwell, IvyBridge), Xeon Phi (Knights Landing) and Broadwell DE
    (Andi Kleen)

    - Introduce 'perf ftrace' a perf front end to the kernel's ftrace
    function and function_graph tracer, defaulting to the
    "function_graph" tracer, more work will be done in reviving this
    effort, forward porting it from its initial patch submission
    (Namhyung Kim)

    - Add 'e' and 'c' hotkeys to expand/collapse call chains for a single
    hist entry in the 'perf report' and 'perf top' TUI (Jiri Olsa)

    - Account thread wait time (off CPU time) separately: sleep, iowait
    and preempt, based on the prev_state of the last event, show the
    breakdown when using "perf sched timehist --state" (Namhyung Kim)

    - Add more triggers to switch the output file (perf.data.TIMESTAMP).

    Now, in addition to switching to a different output file when
    receiving a SIGUSR2, one can also specify file size and time based
    triggers:

    perf record -a --switch-output=signal

    is equivalent to what we had before:

    perf record -a --switch-output

    While we can also ask for the file to be "sliced" by size, taking
    into account that that will happen only when we get woken up by the
    kernel, i.e. one has to take into account the --mmap-pages (the
    size of the perf mmap ring buffer):

    perf record -a --switch-output=2G

    will break the perf.data output into multiple files limited to 2GB
    of samples, right when generating the output.

    For time based samples, alarm() will be used, so to have 1 minute
    limited perf.data output files:

    perf record -a --switch-output=1m

    (Jiri Olsa)

    - Improve 'perf trace' (Arnaldo Carvalho de Melo)

    - 'perf kallsyms' toy tool to look for extended symbol information on
    the running kernel and demonstrate the machine/thread/symbol APIs
    for use in other tools, such as 'perf probe' (Arnaldo Carvalho de
    Melo)

    - ... plus tons of other changes, see the shortlog and Git log for
    details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (131 commits)
    perf tools: Add missing parse_events_error() prototype
    perf pmu: Fix check for unset alias->unit array
    perf tools: Be consistent on the type of map->symbols[] interator
    perf intel pt decoder: clang has no -Wno-override-init
    perf evsel: Do not put a variable sized type not at the end of a struct
    perf probe: Avoid accessing uninitialized 'map' variable
    perf tools: Do not put a variable sized type not at the end of a struct
    perf record: Do not put a variable sized type not at the end of a struct
    perf tests: Synthesize struct instead of using field after variable sized type
    perf bench numa: Make sure dprintf() is not defined
    Revert "perf bench futex: Sanitize numeric parameters"
    tools lib subcmd: Make it an error to pass a signed value to OPTION_UINTEGER
    tools: Set the maximum optimization level according to the compiler being used
    tools: Suppress request for warning options not existent in clang
    samples/bpf: Reset global variables
    samples/bpf: Ignore already processed ELF sections
    samples/bpf: Add missing header
    perf symbols: dso->name is an array, no need to check it against NULL
    perf tests record: No need to test an array against NULL
    perf symbols: No need to check if sym->name is NULL
    ...

    Linus Torvalds
     

17 Feb, 2017

1 commit


14 Feb, 2017

3 commits

  • Before loading a new ELF, clean previous kernel version, license and
    processed sections.

    Signed-off-by: Mickaël Salaün
    Acked-by: Joe Stringer
    Acked-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170208202744.16274-3-mic@digikod.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Mickaël Salaün
     
  • Add a missing check for the map fixup loop.

    Signed-off-by: Mickaël Salaün
    Acked-by: Joe Stringer
    Acked-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170208202744.16274-2-mic@digikod.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Mickaël Salaün
     
  • Include unistd.h to define __NR_getuid and __NR_getsid.

    Signed-off-by: Mickaël Salaün
    Acked-by: Joe Stringer
    Acked-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170208202744.16274-4-mic@digikod.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Mickaël Salaün
     

13 Feb, 2017

1 commit

  • If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
    to the given cgroup the descendent cgroup will be able to override
    effective bpf program that was inherited from this cgroup.
    By default it's not passed, therefore override is disallowed.

    Examples:
    1.
    prog X attached to /A with default
    prog Y fails to attach to /A/B and /A/B/C
    Everything under /A runs prog X

    2.
    prog X attached to /A with allow_override.
    prog Y fails to attach to /A/B with default (non-override)
    prog M attached to /A/B with allow_override.
    Everything under /A/B runs prog M only.

    3.
    prog X attached to /A with allow_override.
    prog Y fails to attach to /A with default.
    The user has to detach first to switch the mode.

    In the future this behavior may be extended with a chain of
    non-overridable programs.

    Also fix the bug where detach from cgroup where nothing is attached
    was not throwing error. Return ENOENT in such case.

    Add several testcases and adjust libbpf.
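
    The three examples above can be reproduced with a small userspace toy
    model. This is only a sketch of the rules as the examples state them,
    not the kernel implementation: the struct and function names are
    hypothetical, the -EPERM return value for a refused attach is an
    assumption (the message only names ENOENT for the detach case), and the
    model does not look at programs already attached to descendants.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy cgroup node with one program slot (a single attach type). */
struct cg_sketch {
    struct cg_sketch *parent;
    const char *prog;       /* NULL when nothing is attached here */
    int allow_override;     /* mode the current program was attached with */
};

/* Nearest ancestor, excluding cg itself, that has a program attached. */
static struct cg_sketch *nearest_attached_ancestor(struct cg_sketch *cg)
{
    for (cg = cg->parent; cg; cg = cg->parent)
        if (cg->prog)
            return cg;
    return NULL;
}

int cg_attach(struct cg_sketch *cg, const char *prog, int allow_override)
{
    struct cg_sketch *anc;

    assert(cg && prog);
    anc = nearest_attached_ancestor(cg);

    /*
     * Examples 1 and 2: an inherited program may be overridden only if it
     * was attached with allow_override, and only by a program that is
     * itself attached with allow_override. (-EPERM is an assumption.)
     */
    if (anc && (!anc->allow_override || !allow_override))
        return -EPERM;

    /* Example 3: changing the mode in place needs a detach first. */
    if (cg->prog && cg->allow_override != allow_override)
        return -EPERM;

    cg->prog = prog;
    cg->allow_override = allow_override;
    return 0;
}

/* Detaching where nothing is attached reports ENOENT, as described above. */
int cg_detach(struct cg_sketch *cg)
{
    assert(cg);
    if (!cg->prog)
        return -ENOENT;
    cg->prog = NULL;
    cg->allow_override = 0;
    return 0;
}

/* Effective program: the nearest one walking from cg up to the root. */
const char *cg_effective(const struct cg_sketch *cg)
{
    for (; cg; cg = cg->parent)
        if (cg->prog)
            return cg->prog;
    return NULL;
}
```

    Building a chain /A, /A/B, /A/B/C from three cg_sketch nodes and
    replaying the attach sequences from the examples gives the same
    accept/refuse pattern and the same effective program at each level.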

    Fixes: 3007098494be ("cgroup: add support for eBPF programs")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Tejun Heo
    Acked-by: Daniel Mack
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

28 Jan, 2017

1 commit


24 Jan, 2017

1 commit

  • Extend the map_perf_test_{user,kern}.c infrastructure to stress test
    lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
    the latency depending on trie size and lookup count.

    On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
    bpf program takes roughly 6.5us on my system. Lookups in empty tries
    take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192
    entries take ~7.1us (on the first _and_ any subsequent try).

    Signed-off-by: David Herrmann
    Reviewed-by: Daniel Mack
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    David Herrmann
     

21 Jan, 2017

1 commit

  • Fix build errors for samples/bpf xdp_tx_iptunnel and tc_l2_redirect,
    when dynamic debugging is enabled (CONFIG_DYNAMIC_DEBUG) by defining a
    fake KBUILD_MODNAME.

    Just like Daniel Borkmann fixed other samples/bpf in commit
    96a8eb1eeed2 ("bpf: fix samples to add fake KBUILD_MODNAME").

    Fixes: 12d8bb64e3f6 ("bpf: xdp: Add XDP example for head adjustment")
    Fixes: 90e02896f1a4 ("bpf: Add test for bpf_redirect to ipip/ip6tnl")
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

16 Jan, 2017

1 commit

  • Pull perf fixes from Ingo Molnar:
    "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
    driver fixes, plus miscellaneous tooling fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Reject non sampling events with precise_ip
    perf/x86/intel: Account interrupts for PEBS errors
    perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
    perf/core: Fix sys_perf_event_open() vs. hotplug
    perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
    perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
    perf/x86: Set pmu->module in Intel PMU modules
    perf probe: Fix to probe on gcc generated symbols for offline kernel
    perf probe: Fix --funcs to show correct symbols for offline module
    perf symbols: Robustify reading of build-id from sysfs
    perf tools: Install tools/lib/traceevent plugins with install-bin
    tools lib traceevent: Fix prev/next_prio for deadline tasks
    perf record: Fix --switch-output documentation and comment
    perf record: Make __record_options static
    tools lib subcmd: Add OPT_STRING_OPTARG_SET option
    perf probe: Fix to get correct modname from elf header
    samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
    samples/bpf sock_example: Avoid getting ethhdr from two includes
    perf sched timehist: Show total scheduling time

    Linus Torvalds
     

12 Jan, 2017

3 commits

  • We set info.count to 1 in mtty_get_irq_info() so static checkers
    complain that, "Why do we have impossible conditions?" The answer is
    that it seems to be left over dead code that can be safely removed.

    Signed-off-by: Dan Carpenter
    Reviewed-by: Kirti Wankhede
    Signed-off-by: Alex Williamson

    Dan Carpenter
     
  • This is a sample driver for documentation so the impact is probably
    pretty low. But we should check that bar_index is valid so we
    don't write beyond the end of the mdev_state->region_info[] array.

    Fixes: 9d1a546c53b4 ("docs: Sample driver to demonstrate how to use Mediated device framework.")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Kirti Wankhede
    Signed-off-by: Alex Williamson

    Dan Carpenter
     
  • The copy_to_user() function returns the number of bytes which it wasn't
    able to copy but we want to return a negative error code.

    Fixes: 9d1a546c53b4 ("docs: Sample driver to demonstrate how to use Mediated device framework.")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Kirti Wankhede
    Signed-off-by: Alex Williamson

    Dan Carpenter
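
    The last two fixes above follow two common patterns: validate an index
    before using it on a fixed-size array, and turn the "bytes not copied"
    result of copy_to_user() into a negative error code. The sketch below
    shows both in userspace. It is not the sample driver's code: every name
    is hypothetical, copy_to_user_sketch() is a stand-in that fakes a fault
    on a NULL destination, and the choice of -EINVAL for a bad index is an
    assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define REGION_COUNT_SKETCH 4   /* hypothetical array size */

struct region_sketch { unsigned long size; };

struct state_sketch {
    struct region_sketch region_info[REGION_COUNT_SKETCH];
};

/*
 * Userspace stand-in for copy_to_user(): like the kernel helper, it
 * returns the number of bytes that could NOT be copied, so 0 means
 * success. A NULL destination simulates a faulting user pointer.
 */
static unsigned long copy_to_user_sketch(void *dst, const void *src,
                                         unsigned long n)
{
    if (!dst)
        return n;
    memcpy(dst, src, n);
    return 0;
}

/* Check the index first so we never write past the end of the array. */
int set_region_size(struct state_sketch *st, unsigned int bar_index,
                    unsigned long size)
{
    assert(st);
    if (bar_index >= REGION_COUNT_SKETCH)
        return -EINVAL;     /* assumed error code for a bad index */

    st->region_info[bar_index].size = size;
    return 0;
}

/* Map a non-zero "bytes left" result to -EFAULT instead of returning it. */
int report_region_size(const struct state_sketch *st, unsigned int bar_index,
                       void *user_buf)
{
    unsigned long size;

    assert(st);
    if (bar_index >= REGION_COUNT_SKETCH)
        return -EINVAL;

    size = st->region_info[bar_index].size;
    if (copy_to_user_sketch(user_buf, &size, sizeof(size)))
        return -EFAULT;
    return 0;
}
```

    Returning the raw copy_to_user() result would hand the caller a positive
    byte count that looks like success, which is why it is mapped to a
    negative error code here.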
     

09 Jan, 2017

1 commit

  • There were some bugs in the JNE64 and JLT64 comparison macros. This fixes
    them, improves comments, and cleans up the file while we are at it.

    Reported-by: Stephen Röttger
    Signed-off-by: Mathias Svensson
    Signed-off-by: Kees Cook
    Cc: stable@vger.kernel.org
    Signed-off-by: James Morris

    Mathias Svensson
     

05 Jan, 2017

1 commit

  • …linux/kernel/git/acme/linux into perf/urgent

    Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo:

    Fixes:

    - Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)

    - Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)

    - Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)

    - Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)

    - Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)

    - Fix 'perf probe' for cross arch probing (Masami Hiramatsu)

    Improvement:

    - Show total scheduling time in 'perf sched timehist' (Namhyung Kim)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

    Ingo Molnar
     

04 Jan, 2017

1 commit


30 Dec, 2016

4 commits


28 Dec, 2016

2 commits

  • Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Joe Stringer
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-3awp0nv8tpnblatojmwjww7z@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • To avoid the following build failure on Alpine Linux 3.4, that has
    clang-3.8 with the bpf target:

    HOSTCC samples/bpf/sock_example.o
    In file included from /usr/include/net/ethernet.h:10:0,
    from /git/linux/samples/bpf/sock_example.h:7,
    from /git/linux/samples/bpf/sock_example.c:30:
    /usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct
    ethhdr'
    struct ethhdr {
    ^
    In file included from /git/linux/samples/bpf/sock_example.c:26:0:
    ./usr/include/linux/if_ether.h:144:8: note: originally defined here
    struct ethhdr {
    ^
    scripts/Makefile.host:124: recipe for target
    'samples/bpf/sock_example.o' failed
    make[2]: *** [samples/bpf/sock_example.o] Error 1
    /git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed

    So include net/if_ether.h for the needs of sock_example.h, using the
    same include that sock_example.c uses.

    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Joe Stringer
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

24 Dec, 2016

1 commit

  • Pull perf fixes from Ingo Molnar:
    "On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
    plus on the tooling side there's a number of fixes and some late
    updates"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    perf sched timehist: Fix invalid period calculation
    perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
    perf sched timehist: Enlarge default 'comm_width'
    perf sched timehist: Honour 'comm_width' when aligning the headers
    perf/x86: Fix overlap counter scheduling bug
    perf/x86/pebs: Fix handling of PEBS buffer overflows
    samples/bpf: Move open_raw_sock to separate header
    samples/bpf: Remove perf_event_open() declaration
    samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
    tools lib bpf: Add bpf_prog_{attach,detach}
    samples/bpf: Switch over to libbpf
    perf diff: Do not overwrite valid build id
    perf annotate: Don't throw error for zero length symbols
    perf bench futex: Fix lock-pi help string
    perf trace: Check if MAP_32BIT is defined (again)
    samples/bpf: Make perf_event_read() static
    uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
    samples/bpf: Make samples more libbpf-centric
    tools lib bpf: Add flags to bpf_create_map()
    tools lib bpf: use __u32 from linux/types.h
    ...

    Linus Torvalds
     

20 Dec, 2016

6 commits

  • This function was declared in libbpf.c and was the only remaining
    function in this library, but has nothing to do with BPF. Shift it out
    into a new header, sock_example.h, and include it from the relevant
    samples.
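
    A helper like this one typically opens an AF_PACKET socket and binds it
    to a named interface. The following is a minimal sketch of that idea,
    not the code that lives in sock_example.h: the error handling and the
    choice of flags are assumptions made for illustration.

    ```c
    #include <arpa/inet.h>
    #include <assert.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>
    #include <net/if.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /*
     * Illustrative sketch: open a raw packet socket bound to interface
     * 'name'. Returns the socket fd, or -1 on failure (for example when
     * the caller lacks CAP_NET_RAW).
     */
    static inline int open_raw_sock(const char *name)
    {
        struct sockaddr_ll sll;
        int sock;

        assert(name != NULL);

        sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (sock < 0) {
            perror("socket");
            return -1;
        }

        memset(&sll, 0, sizeof(sll));
        sll.sll_family = AF_PACKET;
        /* if_nametoindex() returns 0 when the interface is unknown;
         * an index of 0 binds to all interfaces. */
        sll.sll_ifindex = if_nametoindex(name);
        sll.sll_protocol = htons(ETH_P_ALL);
        if (bind(sock, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
            perror("bind");
            close(sock);
            return -1;
        }

        return sock;
    }
    ```

    A sample would call it as open_raw_sock("lo") and close() the returned
    descriptor when done.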

    Signed-off-by: Joe Stringer
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • This declaration was made in samples/bpf/libbpf.c for convenience, but
    there's already one in tools/perf/perf-sys.h. Reuse that one.

    Committer notes:

    Testing it:

    $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
    make[1]: Entering directory '/home/build/v4.9.0-rc8+'
    CHK include/config/kernel.release
    GEN ./Makefile
    CHK include/generated/uapi/linux/version.h
    Using /home/acme/git/linux as source for kernel
    CHK include/generated/utsrelease.h
    CHK include/generated/timeconst.h
    CHK include/generated/bounds.h
    CHK include/generated/asm-offsets.h
    CALL /home/acme/git/linux/scripts/checksyscalls.sh
    HOSTCC samples/bpf/test_verifier.o
    HOSTCC samples/bpf/libbpf.o
    HOSTCC samples/bpf/../../tools/lib/bpf/bpf.o
    HOSTCC samples/bpf/test_maps.o
    HOSTCC samples/bpf/sock_example.o
    HOSTCC samples/bpf/bpf_load.o

    HOSTLD samples/bpf/trace_event
    HOSTLD samples/bpf/sampleip
    HOSTLD samples/bpf/tc_l2_redirect
    make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
    $

    Also tested the offwaketime resulting from the rebuild, seems to work as
    before.

    Signed-off-by: Joe Stringer
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org
    [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • Only one of the examples declares the bpf_insn bpf proggie as a const:

    $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
    samples/bpf/fds_example.c: static const struct bpf_insn insns[] = {
    samples/bpf/sock_example.c: struct bpf_insn prog[] = {
    samples/bpf/test_cgrp2_attach2.c: struct bpf_insn prog[] = {
    samples/bpf/test_cgrp2_attach.c: struct bpf_insn prog[] = {
    samples/bpf/test_cgrp2_sock.c: struct bpf_insn prog[] = {
    $

    Which causes this warning:

    [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/

    HOSTCC samples/bpf/fds_example.o
    /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
    /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
    insns, insns_cnt, "GPL", 0,
    ^~~~~
    In file included from /git/linux/samples/bpf/libbpf.h:5:0,
    from /git/linux/samples/bpf/bpf_load.h:4,
    from /git/linux/samples/bpf/fds_example.c:15:
    /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
    int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
    ^~~~~~~~~~~~~~~~
    HOSTCC samples/bpf/sockex1_user.o

    So just ditch that 'const' to reduce build noise, leaving changing the
    bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
    adequate.
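
    The warning arises whenever a pointer to const data is passed to a
    parameter that is not const-qualified. A minimal sketch of the
    situation follows; 'struct insn' and load_program() are hypothetical
    stand-ins for illustration, not the real struct bpf_insn or
    bpf_load_program().

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for struct bpf_insn. */
    struct insn {
        unsigned char code;
        int imm;
    };

    /*
     * Hypothetical loader whose array parameter is not const-qualified,
     * mirroring the shape of the prototype quoted in the warning. It only
     * reads the array and returns the sum of the immediates.
     */
    static int load_program(struct insn *insns, size_t cnt)
    {
        int sum = 0;
        size_t i;

        assert(insns != NULL);
        for (i = 0; i < cnt; i++)
            sum += insns[i].imm;
        return sum;
    }

    /*
     * Declared as 'static const struct insn prog[]', passing this array
     * to load_program() would trigger -Wdiscarded-qualifiers. Without
     * 'const' the argument type matches the parameter type.
     */
    static struct insn prog[] = {
        { 0xb7, 1 },
        { 0x95, 0 },
    };

    static int load_prog(void)
    {
        return load_program(prog, sizeof(prog) / sizeof(prog[0]));
    }
    ```

    The alternative mentioned above, const-qualifying the loader's
    parameter, would let callers keep their arrays const.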

    Cc: Joe Stringer
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
    eBPF programs to cgroups") added these functions to samples/libbpf, but
    during this merge all of the samples libbpf functionality is shifting to
    tools/lib/bpf. Shift these functions there.

    Committer notes:

    Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value }', just
    like the other wrapper calls to sys_bpf with bpf_attr, to make this build
    in older toolchains, such as the ones in CentOS 5 and 6.
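
    The idiom described in the note can be sketched as below. The union
    'demo_attr' and fill_attach_attr() are hypothetical stand-ins, loosely
    shaped like a union of anonymous structs; they are not the real
    union bpf_attr or the libbpf wrappers.

    ```c
    #include <assert.h>
    #include <strings.h>

    /* Hypothetical stand-in: a union whose members are anonymous structs. */
    union demo_attr {
        struct {
            unsigned int target_fd;
            unsigned int attach_bpf_fd;
            unsigned int attach_type;
        };
        struct {
            unsigned int map_fd;
            unsigned long long key;
        };
    };

    /*
     * Zero the whole union, then assign the fields one by one. This
     * avoids writing 'union demo_attr attr = { .target_fd = ... }',
     * the designated-initializer form that the note says does not
     * build on the older toolchains.
     */
    static void fill_attach_attr(union demo_attr *attr, int prog_fd,
                                 int target_fd, unsigned int type)
    {
        assert(attr != NULL);
        bzero(attr, sizeof(*attr));
        attr->target_fd = target_fd;
        attr->attach_bpf_fd = prog_fd;
        attr->attach_type = type;
    }
    ```

    Zeroing first also clears any bytes that the assigned fields do not
    cover, which matters when the structure is handed to a syscall.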

    Signed-off-by: Joe Stringer
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
    Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • Now that libbpf under tools/lib/bpf/* is synced with the version from
    samples/bpf, we can get rid of most of the libbpf library here.

    Committer notes:

    Built it in a docker fedora rawhide container and ran it in the f25 host, seems
    to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
    doesn't seem to have introduced problems and Joe said he tested it with
    all the entries in samples/bpf/ and other code he found:

    [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install

    [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
    [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
    make[1]: Entering directory '/tmp/build/linux'
    CHK include/config/kernel.release
    HOSTCC scripts/basic/fixdep
    GEN ./Makefile
    CHK include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
    CHK include/generated/utsrelease.h
    HOSTCC scripts/basic/bin2c
    HOSTCC arch/x86/tools/relocs_32.o
    HOSTCC arch/x86/tools/relocs_64.o
    LD samples/bpf/built-in.o

    HOSTCC samples/bpf/fds_example.o
    HOSTCC samples/bpf/sockex1_user.o
    /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
    /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
    insns, insns_cnt, "GPL", 0,
    ^~~~~
    In file included from /git/linux/samples/bpf/libbpf.h:5:0,
    from /git/linux/samples/bpf/bpf_load.h:4,
    from /git/linux/samples/bpf/fds_example.c:15:
    /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
    int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
    ^~~~~~~~~~~~~~~~
    HOSTCC samples/bpf/sockex2_user.o

    HOSTCC samples/bpf/xdp_tx_iptunnel_user.o
    clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \
    -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
    -Wno-compare-distinct-pointer-types \
    -Wno-gnu-variable-sized-type-not-at-end \
    -Wno-address-of-packed-member -Wno-tautological-compare \
    -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
    HOSTLD samples/bpf/tc_l2_redirect

    HOSTLD samples/bpf/lwt_len_hist
    HOSTLD samples/bpf/xdp_tx_iptunnel
    make[1]: Leaving directory '/tmp/build/linux'
    [root@f5065a7d6272 linux]#

    And then, in the host:

    [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
    /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
    [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
    [root@jouet bpf]# file offwaketime
    offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
    [root@jouet bpf]# readelf -SW offwaketime
    offwaketime offwaketime_kern.o offwaketime_user.o
    [root@jouet bpf]# readelf -SW offwaketime_kern.o
    There are 11 section headers, starting at offset 0x700:

    Section Headers:
    [Nr] Name Type Address Off Size ES Flg Lk Inf Al
    [ 0] NULL 0000000000000000 000000 000000 00 0 0 0
    [ 1] .strtab STRTAB 0000000000000000 000658 0000a8 00 0 0 1
    [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
    [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8
    [ 4] .relkprobe/try_to_wake_up REL 0000000000000000 0005a8 000020 10 10 3 8
    [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8
    [ 6] .reltracepoint/sched/sched_switch REL 0000000000000000 0005c8 000090 10 10 5 8
    [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4
    [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1
    [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4
    [10] .symtab SYMTAB 0000000000000000 000488 000120 18 1 4 8
    Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
    [root@jouet bpf]# ./offwaketime | head -3
    qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
    firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
    swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
    [root@jouet bpf]#

    Signed-off-by: Joe Stringer
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Cc: netdev@vger.kernel.org
    Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
    Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
    [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/ on a
    Fedora Rawhide container with clang/llvm 3.9, I noticed this warning when
    building samples/bpf/:

    [root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/
    make[1]: Entering directory '/tmp/build/linux'
    CHK include/config/kernel.release
    GEN ./Makefile
    CHK include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel

    HOSTCC samples/bpf/trace_output_user.o
    /git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous
    prototype for 'perf_event_read' [-Wmissing-prototypes]
    void perf_event_read(print_fn fn)
    ^~~~~~~~~~~~~~~
    HOSTLD samples/bpf/trace_output
    make[1]: Leaving directory '/tmp/build/linux'

    Shut up the compiler by making that function static.
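
    A minimal sketch of why 'static' silences the warning, using
    hypothetical names (record_event(), run()) rather than the code in
    trace_output_user.c:

    ```c
    #include <assert.h>

    static int total;

    /*
     * If this function were not 'static' and had no earlier declaration,
     * building with -Wmissing-prototypes would report "no previous
     * prototype for 'record_event'". Giving it internal linkage tells the
     * compiler it is private to this file, so no prototype is expected.
     */
    static void record_event(int value)
    {
        assert(value >= 0);
        total += value;
    }

    /* Hypothetical caller in the same translation unit. */
    static int run(void)
    {
        record_event(2);
        record_event(3);
        return total;
    }
    ```

    The other way to address the warning is to declare the function in a
    header, which only makes sense when other files need to call it.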

    Acked-by: Daniel Borkmann
    Cc: Alexei Starovoitov
    Cc: Joe Stringer
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/20161215152927.GC6866@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

18 Dec, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
    about missing CRCs (Nick Piggin)

    - asm-exports fix for LTO (Nicolas Pitre)

    - thin archives improvements (Nick Piggin)

    - linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
    Piggin)

    - genksyms support for __builtin_va_list keyword

    - misc minor fixes

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    x86/kbuild: enable modversions for symbols exported from asm
    kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
    scripts/kallsyms: remove last remnants of --page-offset option
    make use of make variable CURDIR instead of calling pwd
    kbuild: cmd_export_list: tighten the sed script
    kbuild: minor improvement for thin archives build
    kbuild: modpost warn if export version crc is missing
    kbuild: keep data tables through dead code elimination
    kbuild: improve linker compatibility with lib-ksyms.o build
    genksyms: Regenerate parser
    kbuild/genksyms: handle va_list type
    kbuild: thin archives for multi-y targets
    kbuild: kallsyms allow 3-pass generation if symbols size has changed

    Linus Torvalds
     

16 Dec, 2016

2 commits

  • Pull tracing updates from Steven Rostedt:
    "This release has a few updates:

    - STM can hook into the function tracer
    - Function filtering now supports more advanced glob matching
    - Ftrace selftests updates and added tests
    - Softirq tag in traces now shows only softirqs
    - ARM nop added to non traced locations at compile time
    - New trace_marker_raw file that allows for binary input
    - Optimizations to the ring buffer
    - Removal of kmap in trace_marker
    - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
    - Other various fixes and clean ups"

    * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
    selftests: ftrace: Shift down default message verbosity
    kprobes/trace: Fix kprobe selftest for newer gcc
    tracing/kprobes: Add a helper method to return number of probe hits
    tracing/rb: Init the CPU mask on allocation
    tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
    tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
    fgraph: Handle a case where a tracer ignores set_graph_notrace
    tracing: Replace kmap with copy_from_user() in trace_marker writing
    ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
    tracing: Allow benchmark to be enabled at early_initcall()
    tracing: Have system enable return error if one of the events fail
    tracing: Do not start benchmark on boot up
    tracing: Have the reg function allow to fail
    ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
    ring-buffer: Froce rb_update_write_stamp() to be inlined
    ring-buffer: Force inline of hotpath helper functions
    tracing: Make __buffer_unlock_commit() always_inline
    tracing: Make tracepoint_printk a static_key
    ring-buffer: Always inline rb_event_data()
    ring-buffer: Make rb_reserve_next_event() always inlined
    ...

    Linus Torvalds
     
  • Switch all of the sample code to use the function names from
    tools/lib/bpf so that they're consistent with that, and to declare their
    own log buffers. This allows the next commit to be purely devoted to
    getting rid of the duplicate library in samples/bpf.

    Committer notes:

    Testing it:

    On a fedora rawhide container, with clang/llvm 3.9, sharing the host
    linux kernel git tree:

    # make O=/tmp/build/linux/ headers_install
    # make O=/tmp/build/linux -C samples/bpf/

    Since I forgot to make it privileged, just tested it outside the
    container, using what it generated:

    # uname -a
    Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
    # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
    # ls -la offwaketime
    -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
    # file offwaketime
    offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
    # readelf -SW offwaketime_kern.o | grep PROGBITS
    [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
    [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8
    [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8
    [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4
    [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1
    [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4
    # ./offwaketime | head -5
    swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
    CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
    Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
    firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
    JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
    #

    Signed-off-by: Joe Stringer
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     

15 Dec, 2016

1 commit

  • Pull security subsystem updates from James Morris:
    "Generally pretty quiet for this release. Highlights:

    Yama:
    - allow ptrace access for original parent after re-parenting

    TPM:
    - add documentation
    - many bugfixes & cleanups
    - define a generic open() method for ascii & bios measurements

    Integrity:
    - Harden against malformed xattrs

    SELinux:
    - bugfixes & cleanups

    Smack:
    - Remove unnecessary smack_known_invalid label
    - Do not apply star label in smack_setprocattr hook
    - parse mnt opts after privileges check (fixes unpriv DoS vuln)"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (56 commits)
    Yama: allow access for the current ptrace parent
    tpm: adjust return value of tpm_read_log
    tpm: vtpm_proxy: conditionally call tpm_chip_unregister
    tpm: Fix handling of missing event log
    tpm: Check the bios_dir entry for NULL before accessing it
    tpm: return -ENODEV if np is not set
    tpm: cleanup of printk error messages
    tpm: replace of_find_node_by_name() with dev of_node property
    tpm: redefine read_log() to handle ACPI/OF at runtime
    tpm: fix the missing .owner in tpm_bios_measurements_ops
    tpm: have event log use the tpm_chip
    tpm: drop tpm1_chip_register(/unregister)
    tpm: replace dynamically allocated bios_dir with a static array
    tpm: replace symbolic permission with octal for securityfs files
    char: tpm: fix kerneldoc tpm2_unseal_trusted name typo
    tpm_tis: Allow tpm_tis to be bound using DT
    tpm, tpm_vtpm_proxy: add kdoc comments for VTPM_PROXY_IOC_NEW_DEV
    tpm: Only call pm_runtime_get_sync if device has a parent
    tpm: define a generic open() method for ascii & bios measurements
    Documentation: tpm: add the Physical TPM device tree binding documentation
    ...

    Linus Torvalds
     

14 Dec, 2016

1 commit

  • Pull VFIO updates from Alex Williamson:

    - VFIO updates for v4.10 primarily include a new Mediated Device
    interface, which essentially allows software defined devices to be
    exposed to users through VFIO. The host vendor driver providing this
    virtual device polices, or mediates, user access to the device.

    These devices often incorporate portions of real devices, for
    instance the primary initial users of this interface expose vGPUs
    which allow the user to map mediated devices, or mdevs, to a portion
    of a physical GPU. QEMU composes these mdevs into PCI representations
    using the existing VFIO user API. This enables both Intel KVM-GT
    support, which is also expected to arrive into Linux mainline during
    the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O
    devices (aka CCW devices) for s390 virtualization support. (Kirti
    Wankhede, Neo Jia)

    - Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin)

    - Fixes to VFIO capability chain handling (Eric Auger)

    - Error handling fixes for fallout from mdev (Christophe JAILLET)

    - Notifiers to expose struct kvm to mdev vendor drivers (Jike Song)

    - type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia)

    * tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
    vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages
    vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP.
    vfio iommu type1: WARN_ON if notifier block is not unregistered
    kvm: set/clear kvm to/from vfio_group when group add/delete
    vfio: support notifier chain in vfio_group
    vfio: vfio_register_notifier: classify iommu notifier
    vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'
    vfio: fix vfio_info_cap_add/shift
    vfio/pci: Drop unnecessary pcibios_err_to_errno()
    MAINTAINERS: Add entry VFIO based Mediated device drivers
    docs: Sample driver to demonstrate how to use Mediated device framework.
    docs: Sysfs ABI for mediated device framework
    docs: Add Documentation for Mediated devices
    vfio: Define device_api strings
    vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
    vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
    vfio: Introduce vfio_set_irqs_validate_and_prepare()
    vfio_pci: Update vfio_pci to use vfio_info_add_capability()
    vfio: Introduce common function to add capabilities
    vfio iommu: Add blocking notifier to notify DMA_UNMAP
    ...

    Linus Torvalds
     

11 Dec, 2016

1 commit