04 Aug, 2016

1 commit

  • The use of config_enabled() against config options is ambiguous. In
    practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
    author might have used it for the meaning of IS_ENABLED(). Using
    IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
    clearer.

    This commit replaces config_enabled() with IS_ENABLED() where possible.
    This commit is only touching bool config options.

    I noticed two cases where config_enabled() is used against a tristate
    option:

    - config_enabled(CONFIG_HWMON)
    [ drivers/net/wireless/ath/ath10k/thermal.c ]

    - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
    [ drivers/gpu/drm/gma500/opregion.c ]

    I did not touch them because they should be converted to IS_BUILTIN()
    in order to keep the logic, but I was not sure it was the authors'
    intention.
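
    The difference between the three helpers can be sketched in plain C,
    outside the kernel. The macro names below (IS_DEFINED, TAKE_SECOND_ARG
    and friends) and the options CONFIG_FOO, CONFIG_BAR and CONFIG_BAZ are
    assumptions made up for this illustration, not the kernel's own
    definitions:

```c
#include <assert.h>

/* Hypothetical configuration for illustration only: FOO is built in (=y),
 * BAR is a module (=m), BAZ is not set. Kconfig emits CONFIG_<X> for =y
 * and CONFIG_<X>_MODULE for =m. */
#define CONFIG_FOO 1
#define CONFIG_BAR_MODULE 1

/* Preprocessor trick: a symbol defined to 1 turns into "0," and shifts the
 * literal 1 into the second argument slot; anything else leaves 0 there. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED(x) IS_DEFINED_2(x)

#define IS_BUILTIN(option) IS_DEFINED(option)
#define IS_MODULE(option)  IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Exercises the three cases: =y, =m and not set. */
void demo_kconfig(void)
{
	assert(IS_BUILTIN(CONFIG_FOO) && IS_ENABLED(CONFIG_FOO));
	/* A tristate option set to =m is enabled but not built in. */
	assert(!IS_BUILTIN(CONFIG_BAR) && IS_ENABLED(CONFIG_BAR));
	assert(!IS_ENABLED(CONFIG_BAZ));
}
```

    For a bool option the =m case cannot occur, so IS_BUILTIN() and
    IS_ENABLED() agree; for a tristate option they differ, which is why
    the two tristate users above were left alone.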

    Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Acked-by: Kees Cook
    Cc: Stas Sergeev
    Cc: Matt Redfearn
    Cc: Joshua Kinard
    Cc: Jiri Slaby
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Markos Chandras
    Cc: "Dmitry V. Levin"
    Cc: yu-cheng yu
    Cc: James Hogan
    Cc: Brian Gerst
    Cc: Johannes Berg
    Cc: Peter Zijlstra
    Cc: Al Viro
    Cc: Will Drewry
    Cc: Nikolay Martynov
    Cc: Huacai Chen
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Daniel Borkmann
    Cc: Leonid Yegoshin
    Cc: Rafal Milecki
    Cc: James Cowgill
    Cc: Greg Kroah-Hartman
    Cc: Ralf Baechle
    Cc: Alex Smith
    Cc: Adam Buchbinder
    Cc: Qais Yousef
    Cc: Jiang Liu
    Cc: Mikko Rapeli
    Cc: Paul Gortmaker
    Cc: Denys Vlasenko
    Cc: Brian Norris
    Cc: Hidehiro Kawai
    Cc: "Luis R. Rodriguez"
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Dave Hansen
    Cc: "Kirill A. Shutemov"
    Cc: Roland McGrath
    Cc: Paul Burton
    Cc: Kalle Valo
    Cc: Viresh Kumar
    Cc: Tony Wu
    Cc: Huaitong Han
    Cc: Sumit Semwal
    Cc: Alexei Starovoitov
    Cc: Juergen Gross
    Cc: Jason Cooper
    Cc: "David S. Miller"
    Cc: Oleg Nesterov
    Cc: Andrea Gelmini
    Cc: David Woodhouse
    Cc: Marc Zyngier
    Cc: Rabin Vincent
    Cc: "Maciej W. Rozycki"
    Cc: David Daney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

03 Aug, 2016

2 commits

  • Pull networking fixes from David Miller:

    1) Fix several cases of missing of_node_put() calls in various
    networking drivers. From Peter Chen.

    2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
    Mintz.

    3) Unbalanced locking in TIPC error handling, from Wei Yongjun.

    4) Fix lockups in CPDMA driver, from Grygorii Strashko.

    5) More MACSEC refcount et al fixes, from Sabrina Dubroca.

    6) Fix MAC address setting in r8169 during runtime suspend, from
    Chun-Hao Lin.

    7) Various printf format specifier fixes, from Heinrich Schuchardt.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
    qed: Fail driver load in 100g MSI mode.
    ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
    ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
    ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
    ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
    ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
    ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
    ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
    ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
    ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
    ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
    ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
    ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
    ethernet: altera: add missing of_node_put
    8139too: fix system hang when there is a tx timeout event.
    qed: Fix error return code in qed_resc_alloc()
    net: qlcnic: avoid superfluous assignement
    dsa: b53: remove redundant if
    ...

    Linus Torvalds
     
  • Pull Ceph updates from Ilya Dryomov:
    "The highlights are:

    - RADOS namespace support in libceph and CephFS (Zheng Yan and
    myself). The stopgaps added in 4.5 to deny access to inodes in
    namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
    bit is now fully supported

    - A large rework of the MDS cap flushing code (Zheng Yan)

    - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
    overly pessimistic before, bailing at the first sight of LOOKUP_RCU

    On top of that we've got a few CephFS bug fixes, a couple of cleanups
    and Arnd's workaround for a weird genksyms issue"

    * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
    ceph: fix symbol versioning for ceph_monc_do_statfs
    ceph: Correctly return NXIO errors from ceph_llseek
    ceph: Mark the file cache as unreclaimable
    ceph: optimize cap flush waiting
    ceph: cleanup ceph_flush_snaps()
    ceph: kick cap flushes before sending other cap message
    ceph: introduce an inode flag to indicates if snapflush is needed
    ceph: avoid sending duplicated cap flush message
    ceph: unify cap flush and snapcap flush
    ceph: use list instead of rbtree to track cap flushes
    ceph: update types of some local varibles
    ceph: include 'follows' of pending snapflush in cap reconnect message
    ceph: update cap reconnect message to version 3
    ceph: mount non-default filesystem by name
    libceph: fsmap.user subscription support
    ceph: handle LOOKUP_RCU in ceph_d_revalidate
    ceph: allow dentry_lease_is_valid to work under RCU walk
    ceph: clear d_fsinfo pointer under d_lock
    ceph: remove ceph_mdsc_lease_release
    ceph: don't use ->d_time
    ...

    Linus Torvalds
     

31 Jul, 2016

7 commits

  • Commit 141ddefce7c8 ("sctp: change sk state to CLOSED instead of
    CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the
    assoc is closed when sctp_accept clones a new sk.

    If there is still data in sk receive queue, users will not be able
    to read it any more, as sctp_recvmsg returns directly if sk state
    is CLOSED.

    This patch adds a CLOSED state check in sctp_recvmsg to allow
    reading data from a TCP-style sk in the CLOSED state, as TCP does.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Prior to this patch, once sctp received SHUTDOWN or shutdown with RD,
    sk->sk_shutdown would be set with RCV_SHUTDOWN, and all events would
    be dropped in sctp_ulpq_tail_event(). It would cause:

    1. some notifications couldn't be received by users, like
    SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C().

    2. sctp would also never trigger sk_data_ready when the association
    was closed, making it harder to identify the end of the association
    by calling recvmsg() and getting an EOF. It was not convenient for
    kernel users.

    The check here should be stopping delivering DATA chunks after receiving
    SHUTDOWN, and stopping delivering ANY chunks after sctp_close().

    So this patch is to allow notifications to enqueue into receive queue
    even if sk->sk_shutdown is set to RCV_SHUTDOWN in sctp_ulpq_tail_event,
    but if sk->sk_shutdown == RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all
    events.
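
    The resulting drop rule can be modelled as a small predicate. This is a
    sketch of the logic described above, not the kernel code:
    should_drop_event() is a hypothetical name, and the flag values are
    defined locally for the example:

```c
#include <assert.h>
#include <stdbool.h>

/* Local flag values for the sketch. */
#define RCV_SHUTDOWN  1
#define SEND_SHUTDOWN 2

/* Returns true when an event must be dropped instead of being queued. */
bool should_drop_event(unsigned int sk_shutdown, bool is_notification)
{
	if (!(sk_shutdown & RCV_SHUTDOWN))
		return false;           /* receive side still open */
	if (sk_shutdown & SEND_SHUTDOWN)
		return true;            /* both directions shut: drop everything */
	return !is_notification;        /* RCV_SHUTDOWN only: drop DATA, keep notifications */
}

void demo_shutdown_rule(void)
{
	assert(!should_drop_event(0, false));
	assert(should_drop_event(RCV_SHUTDOWN, false));
	assert(!should_drop_event(RCV_SHUTDOWN, true));
	assert(should_drop_event(RCV_SHUTDOWN | SEND_SHUTDOWN, true));
}
```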

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • sctp needs to queue auth chunk back when we know that we are going
    to generate another segment. But commit f1533cce60d1 ("sctp: fix
    panic when sending auth chunks") requeues the last chunk processed
    which is probably not the auth chunk.

    This causes a panic when calculating the MAC in sctp_auth_calculate_hmac(),
    because the offset of the auth chunk in skb->data is incorrect.

    This fix is to requeue it by using packet->auth.

    Fixes: f1533cce60d1 ("sctp: fix panic when sending auth chunks")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • tcp_select_initial_window() intends to advertise a window
    scaling for the maximum possible window size. To do so,
    it considers the maximum of net.ipv4.tcp_rmem[2] and
    net.core.rmem_max as the only possible upper-bounds.
    However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
    to set the socket's receive buffer size to values
    larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
    Thus, SO_RCVBUFFORCE is effectively ignored by
    tcp_select_initial_window().

    To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
    net.core.rmem_max and socket's initial buffer space.
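
    The effect on the advertised scale factor can be sketched as follows.
    pick_rcv_wscale() is a hypothetical stand-in that keeps only the part
    described above (it ignores the window clamp, for instance); the limit
    of 14 is the maximum TCP window scale shift:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Smallest shift that lets the largest possible window fit in 16 bits. */
unsigned int pick_rcv_wscale(uint32_t tcp_rmem_max, uint32_t core_rmem_max,
			     uint32_t sk_space)
{
	/* Taking sk_space into account is the fix: SO_RCVBUFFORCE may have
	 * raised the buffer above both sysctl limits. */
	uint32_t space = max_u32(max_u32(tcp_rmem_max, core_rmem_max), sk_space);
	unsigned int wscale = 0;

	while (space > 65535 && wscale < 14) {
		space >>= 1;
		wscale++;
	}
	return wscale;
}

void demo_wscale(void)
{
	/* Sysctl limits only: a 6 MiB maximum needs a shift of 7. */
	assert(pick_rcv_wscale(6291456, 212992, 0) == 7);
	/* A forced 64 MiB buffer needs a shift of 11. */
	assert(pick_rcv_wscale(6291456, 212992, 67108864) == 11);
}
```

    The sysctl values in the demo are arbitrary example numbers.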

    Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
    Signed-off-by: Soheil Hassas Yeganeh
    Suggested-by: Neal Cardwell
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
     
  • Using list_move() instead of list_del() + list_add().
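
    As a rough illustration of why the two forms are interchangeable, here
    is a minimal userspace model of a circular doubly linked list. The
    helpers reuse the kernel's names but are reimplemented for this sketch
    and are not the kernel's list.h:

```c
#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from whatever list it is on. */
void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* One call that expresses "unlink here, link there". */
void list_move(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add(entry, head);
}

void demo_list_move(void)
{
	struct list_head a, b, node;

	list_init(&a);
	list_init(&b);
	list_add(&node, &a);
	list_move(&node, &b);
	assert(a.next == &a && a.prev == &a);     /* a is empty again */
	assert(b.next == &node && node.prev == &b);
}
```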

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • In the error handling path taken when nla_nest_start() fails,
    read_unlock_bh() is called to release a lock that has not been taken
    yet. sparse warns about the context imbalance as follows:

    net/tipc/monitor.c:799:23: warning:
    context imbalance in '__tipc_nl_add_monitor' - different lock contexts for basic block

    Fixes: cf6f7e1d5109 ('tipc: dump monitor attributes')
    Signed-off-by: Wei Yongjun
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable bugfixes:
    - nfs: don't create zero-length requests

    - several LAYOUTGET bugfixes

    Features:
    - several performance related features

    - more aggressive caching when we can rely on close-to-open
    cache consistency

    - remove serialisation of O_DIRECT reads and writes

    - optimise several code paths to not flush to disk unnecessarily.

    However allow for the idiosyncrasies of pNFS for those layout
    types that need to issue a LAYOUTCOMMIT before the metadata can
    be updated on the server.

    - SUNRPC updates to the client data receive path

    - pNFS/SCSI support RH/Fedora dm-mpath device nodes

    - pNFS files/flexfiles can now use unprivileged ports when
    the generic NFS mount options allow it.

    Bugfixes:
    - Don't use RDMA direct data placement together with data
    integrity or privacy security flavours

    - Remove the RDMA ALLPHYSICAL memory registration mode as
    it has potential security holes.

    - Several layout recall fixes to improve NFSv4.1 protocol
    compliance.

    - Fix an Oops in the pNFS files and flexfiles connection
    setup to the DS

    - Allow retry of operations that used a returned delegation
    stateid

    - Don't mark the inode as revalidated if a LAYOUTCOMMIT is
    outstanding

    - Fix writeback races in nfs4_copy_range() and
    nfs42_proc_deallocate()"

    * tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits)
    pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding
    NFSv4: Clean up lookup of SECINFO_NO_NAME
    NFSv4.2: Fix warning "variable ‘stateids’ set but not used"
    NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’"
    SUNRPC: Fix a compiler warning in fs/nfs/clnt.c
    pNFS: Remove redundant smp_mb() from pnfs_init_lseg()
    pNFS: Cleanup - do layout segment initialisation in one place
    pNFS: Remove redundant stateid invalidation
    pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()
    pNFS: Clear the layout metadata if the server changed the layout stateid
    pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()
    NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id
    pNFS: Do not set plh_return_seq for non-callback related layoutreturns
    pNFS: Ensure layoutreturn acts as a completion for layout callbacks
    pNFS: Fix CB_LAYOUTRECALL stateid verification
    pNFS: Always update the layout barrier seqid on LAYOUTGET
    pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set
    pNFS: Clear the layout return tracking on layout reinitialisation
    pNFS: LAYOUTRETURN should only update the stateid if the layout is valid
    nfs: don't create zero-length requests
    ...

    Linus Torvalds
     

30 Jul, 2016

2 commits

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - TPM core and driver updates/fixes
    - IPv6 security labeling (CALIPSO)
    - Lots of Apparmor fixes
    - Seccomp: remove 2-phase API, close hole where ptrace can change
    syscall #"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
    apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
    tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
    tpm: Factor out common startup code
    tpm: use devm_add_action_or_reset
    tpm2_i2c_nuvoton: add irq validity check
    tpm: read burstcount from TPM_STS in one 32-bit transaction
    tpm: fix byte-order for the value read by tpm2_get_tpm_pt
    tpm_tis_core: convert max timeouts from msec to jiffies
    apparmor: fix arg_size computation for when setprocattr is null terminated
    apparmor: fix oops, validate buffer size in apparmor_setprocattr()
    apparmor: do not expose kernel stack
    apparmor: fix module parameters can be changed after policy is locked
    apparmor: fix oops in profile_unpack() when policy_db is not present
    apparmor: don't check for vmalloc_addr if kvzalloc() failed
    apparmor: add missing id bounds check on dfa verification
    apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
    apparmor: use list_next_entry instead of list_entry_next
    apparmor: fix refcount race when finding a child profile
    apparmor: fix ref count leak when profile sha1 hash is read
    apparmor: check that xindex is in trans_table bounds
    ...

    Linus Torvalds
     
  • Pull userns vfs updates from Eric Biederman:
    "This tree contains some very long awaited work on generalizing the
    user namespace support for mounting filesystems to include filesystems
    with a backing store. The real world target is fuse but the goal is
    to update the vfs to allow any filesystem to be supported. This
    patchset is based on a lot of code review and testing to approach that
    goal.

    While looking at what is needed to support the fuse filesystem it
    became clear that there were things like xattrs for security modules
    that needed special treatment. That the resolution of those concerns
    would not be fuse specific. That sorting out these general issues
    made most sense at the generic level, where the right people could be
    drawn into the conversation, and the issues could be solved for
    everyone.

    At a high level this patchset does a couple of simple things:

    - Add a user namespace owner (s_user_ns) to struct super_block.

    - Teach the vfs to handle filesystem uids and gids not mapping into
    kuids and kgids and being reported as INVALID_UID and
    INVALID_GID in vfs data structures.

    By assigning a user namespace owner filesystems that are mounted with
    only user namespace privilege can be detected. This allows security
    modules and the like to know which mounts may not be trusted. This
    also allows the set of uids and gids that are communicated to the
    filesystem to be capped at the set of kuids and kgids that are in the
    owning user namespace of the filesystem.

    One of the crazier corner cases this handles is the case of inodes
    whose i_uid or i_gid are not mapped into the vfs. Most of the code
    simply doesn't care but it is easy to confuse the inode writeback path
    so no operation that could cause an inode write-back is permitted for
    such inodes (aka only reads are allowed).

    This set of changes starts out by cleaning up the code paths involved
    in user namespace permitted mounts. Then when things are clean enough
    adds code that cleanly sets s_user_ns. Then additional restrictions
    are added that are possible now that the filesystem superblock
    contains owner information.

    These changes should not affect anyone in practice, but there are some
    parts of these restrictions that are changes in behavior.

    - Andy's restriction on suid executables that does not honor the
    suid bit when the path is from another mount namespace (think
    /proc/[pid]/fd/) or when the filesystem was mounted by a less
    privileged user.

    - The replacement of the user namespace implicit setting of MNT_NODEV
    with implicitly setting SB_I_NODEV on the filesystem superblock
    instead.

    Using SB_I_NODEV is a stronger form that happens to make this state
    user invisible. The user visibility can be managed but it caused
    problems when it was introduced from applications reasonably
    expecting mount flags to be what they were set to.

    There is a little bit of work remaining before it is safe to support
    mounting filesystems with backing store in user namespaces, beyond
    what is in this set of changes.

    - Verifying the mounter has permission to read/write the block device
    during mount.

    - Teaching the integrity modules IMA and EVM to handle filesystems
    mounted with only user namespace root and to reduce trust in their
    security xattrs accordingly.

    - Capturing the mounter's credentials and using that for permission
    checks in d_automount and the like. (Given that overlayfs already
    does this, and we need the work in d_automount it makes sense to
    generalize this case).

    Furthermore there are a few changes that are on the wishlist:

    - Get all filesystems supporting posix acls using the generic posix
    acls so that posix_acl_fix_xattr_from_user and
    posix_acl_fix_xattr_to_user may be removed. [Maintainability]

    - Reducing the permission checks in places such as remount to allow
    the superblock owner to perform them.

    - Allowing the superblock owner to chown files with unmapped uids and
    gids to something that is mapped so the files may be treated
    normally.

    I am not considering even obvious relaxations of permission checks
    until it is clear there are no more corner cases that need to be
    locked down and handled generically.

    Many thanks to Seth Forshee who kept this code alive, and put up
    with me rewriting substantial portions of what he did to handle more
    corner cases, and for his diligent testing and reviewing of my
    changes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
    fs: Call d_automount with the filesystems creds
    fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
    evm: Translate user/group ids relative to s_user_ns when computing HMAC
    dquot: For now explicitly don't support filesystems outside of init_user_ns
    quota: Handle quota data stored in s_user_ns in quota_setxquota
    quota: Ensure qids map to the filesystem
    vfs: Don't create inodes with a uid or gid unknown to the vfs
    vfs: Don't modify inodes with a uid or gid unknown to the vfs
    cred: Reject inodes with invalid ids in set_create_file_as()
    fs: Check for invalid i_uid in may_follow_link()
    vfs: Verify acls are valid within superblock's s_user_ns.
    userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
    fs: Refuse uid/gid changes which don't map into s_user_ns
    selinux: Add support for unprivileged mounts from user namespaces
    Smack: Handle labels consistently in untrusted mounts
    Smack: Add support for unprivileged mounts from user namespaces
    fs: Treat foreign mounts as nosuid
    fs: Limit file caps to the user namespace of the super block
    userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
    userns: Remove implicit MNT_NODEV fragility.
    ...

    Linus Torvalds
     

29 Jul, 2016

1 commit

  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.
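
    A minimal sketch of the idea, assuming a simple multiplicative mixing
    step: salted_name_hash() is a hypothetical function written for this
    illustration and is not the kernel's implementation. The salt pointer
    seeds the hash before any character is mixed in, and NULL gives a pure
    string hash:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint32_t salted_name_hash(const void *salt, const char *name, size_t len)
{
	/* The salt goes in first, so everything after it depends on it. */
	uint64_t hash = (uint64_t)(uintptr_t)salt;

	while (len--) {
		unsigned char c = (unsigned char)*name++;

		hash = (hash + (c << 4) + (c >> 4)) * 11;
	}
	return (uint32_t)hash;
}

void demo_salted_hash(void)
{
	static int parents[2];  /* stand-ins for two parent objects */

	/* Same name under different parents hashes differently. */
	assert(salted_name_hash(&parents[0], "foo", 3) !=
	       salted_name_hash(&parents[1], "foo", 3));
	/* A NULL salt depends on the string only. */
	assert(salted_name_hash(NULL, "ab", 2) == 205832);
}
```

    Because the salt is known before the name is, a caller can compute the
    seeded starting value once and reuse it for every lookup under the
    same parent.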

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
     

28 Jul, 2016

7 commits

  • Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Signed-off-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Add pool namespace pointer to struct ceph_file_layout and struct
    ceph_object_locator. Pool namespace is used when mapping object
    to PG, it's also used when composing OSD request.

    The namespace pointer in struct ceph_file_layout is RCU protected.
    So libceph can read namespace without taking lock.

    Signed-off-by: Yan, Zheng
    [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • The data structure is for storing namespace string. It allows namespace
    string to be shared between cephfs inodes with same layout. This data
    structure can also be referenced by OSD request.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Define new ceph_file_layout structure and rename old ceph_file_layout
    to ceph_file_layout_legacy. This is preparation for adding namespace
    to ceph_file_layout structure.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • - decode.h needs slab.h for kmalloc()
    - osd_client.h needs msgpool.h for struct ceph_msgpool
    - msgpool.h doesn't need messenger.h

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Pull networking updates from David Miller:

    1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

    2) Make DSA binding more sane, from Andrew Lunn.

    3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

    4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

    5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface. From Brenden Blanco and
    others.

    6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

    7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

    8) Simplify netlink conntrack entry layout, from Florian Westphal.

    9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

    10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

    11) Support qdisc packet injection in pktgen, from John Fastabend.

    12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

    13) Add NV congestion control support to TCP, from Lawrence Brakmo.

    14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

    15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

    16) Support MPLS over IPV4, from Simon Horman.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    xgene: Fix build warning with ACPI disabled.
    be2net: perform temperature query in adapter regardless of its interface state
    l2tp: Correctly return -EBADF from pppol2tp_getname.
    net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
    net: ipmr/ip6mr: update lastuse on entry change
    macsec: ensure rx_sa is set when validation is disabled
    tipc: dump monitor attributes
    tipc: add a function to get the bearer name
    tipc: get monitor threshold for the cluster
    tipc: make cluster size threshold for monitoring configurable
    tipc: introduce constants for tipc address validation
    net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
    MAINTAINERS: xgene: Add driver and documentation path
    Documentation: dtb: xgene: Add MDIO node
    dtb: xgene: Add MDIO node
    drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
    drivers: net: xgene: Use exported functions
    drivers: net: xgene: Enable MDIO driver
    drivers: net: xgene: Add backward compatibility
    drivers: net: phy: xgene: Add MDIO driver
    ...

    Linus Torvalds
     

27 Jul, 2016

11 commits


26 Jul, 2016

9 commits

  • After the previous patch, struct tc_action should be enough
    to represent the generic tc action, tcf_common is not necessary
    any more. This patch gets rid of it to make tc action code
    more readable.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • struct tc_action is confusing, currently we use it for two purposes:
    1) Pass in arguments and carry out results from helper functions
    2) A generic representation for tc actions

    The first one is error-prone, since we need to make sure we don't
    miss anything. This patch aims to get rid of this use, by moving
    tc_action into tcf_common, so that they are allocated together
    in hashtable and can be cast'ed easily.

    And together with the following patch, we could really make
    tc_action a generic representation for all tc actions and each
    type of action can inherit from it.
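
    The "inherit by embedding" pattern referred to here can be sketched in
    plain C. The struct and function names below are hypothetical and do
    not reflect the real tc data structures:

```c
#include <assert.h>
#include <stddef.h>

/* Generic part shared by every action type. */
struct generic_action {
	unsigned int index;
	const char *kind;
};

/* A specific action type embeds the generic part and adds its own state. */
struct police_like_action {
	struct generic_action common;
	unsigned int rate;
};

/* container_of-style conversion from the generic part back to the
 * enclosing specific action. */
struct police_like_action *to_police(struct generic_action *a)
{
	return (struct police_like_action *)
		((char *)a - offsetof(struct police_like_action, common));
}

void demo_action_embedding(void)
{
	struct police_like_action p = { { 7, "police" }, 1000 };
	struct generic_action *a = &p.common;  /* what generic code sees */

	assert(a->index == 7);
	assert(to_police(a) == &p);
	assert(to_police(a)->rate == 1000);
}
```

    Generic code only ever handles the embedded part, while each action
    type recovers its private fields with a single pointer conversion.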

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • After a612769774a3 ("udp: prevent bugcheck if filter truncates packet
    too much"), there followed various other fixes for similar cases such
    as f4979fcea7fd ("rose: limit sk_filter trim to payload").

    Latter introduced a new helper sk_filter_trim_cap(), where we can pass
    the trim limit directly to the socket filter handling. Make use of it
    here as well with sizeof(struct udphdr) as lower cap limit and drop the
    extra skb->len test in UDP's input path.

    Signed-off-by: Daniel Borkmann
    Cc: Willem de Bruijn
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Pull timer updates from Thomas Gleixner:
    "This update provides the following changes:

    - The rework of the timer wheel which addresses the shortcomings of
    the current wheel (cascading, slow search for next expiring timer,
    etc). That's the first major change of the wheel in almost 20
    years since Finn implemented it.

    - A large overhaul of the clocksource drivers init functions to
    consolidate the Device Tree initialization

    - Some more Y2038 updates

    - A capability fix for timerfd

    - Yet another clock chip driver

    - The usual pile of updates, comment improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
    tick/nohz: Optimize nohz idle enter
    clockevents: Make clockevents_subsys static
    clocksource/drivers/time-armada-370-xp: Fix return value check
    timers: Implement optimization for same expiry time in mod_timer()
    timers: Split out index calculation
    timers: Only wake softirq if necessary
    timers: Forward the wheel clock whenever possible
    timers/nohz: Remove pointless tick_nohz_kick_tick() function
    timers: Optimize collect_expired_timers() for NOHZ
    timers: Move __run_timers() function
    timers: Remove set_timer_slack() leftovers
    timers: Switch to a non-cascading wheel
    timers: Reduce the CPU index space to 256k
    timers: Give a few structs and members proper names
    hlist: Add hlist_is_singular_node() helper
    signals: Use hrtimer for sigtimedwait()
    timers: Remove the deprecated mod_timer_pinned() API
    timers, net/ipv4/inet: Initialize connection request timers as pinned
    timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
    timers, drivers/tty/metag_da: Initialize the poll timer as pinned
    ...

    Linus Torvalds
     
  • I was seeing a lot of these:

    BUG: sleeping function called from invalid context at mm/slab.h:388
    in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
    Preemption disabled at:[] rhashtable_walk_start+0x46/0x150

    [] preempt_count_add+0x1fb/0x280
    [] _raw_spin_lock+0x12/0x40
    [] console_unlock+0x2f7/0x930
    [] vprintk_emit+0x2fb/0x520
    [] vprintk_default+0x1a/0x20
    [] printk+0x94/0xb0
    [] print_stack_trace+0xe0/0x170
    [] ___might_sleep+0x3be/0x460
    [] __might_sleep+0x90/0x1a0
    [] kmem_cache_alloc+0x153/0x1e0
    [] rhashtable_walk_init+0xfe/0x2d0
    [] sctp_transport_walk_start+0x1e/0x60
    [] sctp_transport_seq_start+0x4d/0x150
    [] seq_read+0x27b/0x1180
    [] proc_reg_read+0xbc/0x180
    [] __vfs_read+0xdb/0x610
    [] vfs_read+0xea/0x2d0
    [] SyS_pread64+0x11b/0x150
    [] do_syscall_64+0x19c/0x410
    [] return_from_SYSCALL_64+0x0/0x6a
    [] 0xffffffffffffffff

    Apparently we always need to call rhashtable_walk_stop(), even when
    rhashtable_walk_start() fails:

    * rhashtable_walk_start - Start a hash table walk
    * @iter: Hash table iterator
    *
    * Start a hash table walk. Note that we take the RCU lock in all
    * cases including when we return an error. So you must always call
    * rhashtable_walk_stop to clean up.

    otherwise we never call rcu_read_unlock() and we get the splat above.
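
    The contract can be modelled in userspace with a counter standing in
    for the RCU read lock. Everything here (struct walker, walk_start(),
    walk_stop()) is a hypothetical model of the documented behaviour, not
    the rhashtable API itself:

```c
#include <assert.h>
#include <errno.h>

struct walker {
	int rcu_depth;   /* stands in for the RCU read-side nesting */
	int fail_start;  /* makes walk_start() report an error */
};

/* Takes the "lock" in all cases, including when it returns an error. */
int walk_start(struct walker *w)
{
	w->rcu_depth++;
	return w->fail_start ? -EAGAIN : 0;
}

void walk_stop(struct walker *w)
{
	w->rcu_depth--;
}

/* Buggy caller: bails out on error and leaves the lock held. */
int walk_buggy(struct walker *w)
{
	int err = walk_start(w);

	if (err)
		return err;
	walk_stop(w);
	return 0;
}

/* Fixed caller: always pairs start with stop. */
int walk_fixed(struct walker *w)
{
	int err = walk_start(w);

	/* ... iterate here only when err == 0 ... */
	walk_stop(w);
	return err;
}

void demo_walk(void)
{
	struct walker w = { 0, 1 };

	assert(walk_buggy(&w) == -EAGAIN && w.rcu_depth == 1); /* leaked */
	w.rcu_depth = 0;
	assert(walk_fixed(&w) == -EAGAIN && w.rcu_depth == 0); /* balanced */
}
```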

    Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
    See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
    See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
    Cc: Xin Long
    Cc: Herbert Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Vegard Nossum
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Vegard Nossum
     
  • Pull locking updates from Ingo Molnar:
    "The locking tree was busier in this cycle than the usual pattern - a
    couple of major projects happened to coincide.

    The main changes are:

    - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
    across all SMP architectures (Peter Zijlstra)

    - add atomic_fetch_{inc/dec}() as well, using the generic primitives
    (Davidlohr Bueso)

    - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
    Waiman Long)

    - optimize smp_cond_load_acquire() on arm64 and implement LSE based
    atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    on arm64 (Will Deacon)

    - introduce smp_acquire__after_ctrl_dep() and fix various barrier
    mis-uses and bugs (Peter Zijlstra)

    - after discovering ancient spin_unlock_wait() barrier bugs in its
    implementation and usage, strengthen its semantics and update/fix
    usage sites (Peter Zijlstra)

    - optimize mutex_trylock() fastpath (Peter Zijlstra)

    - ... misc fixes and cleanups"
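
As a rough illustration of what the "fetch" naming means, the sketch below uses the C11 `<stdatomic.h>` functions as a userspace analogue; it is not the kernel implementation. The helper names (`fetch_then_add`, `add_then_return`, `test_and_set_flag`) are made up for this example. The point is that a fetch-style operation returns the value held before the update, whereas an add-and-return style operation yields the value after it.

```c
#include <assert.h>
#include <stdatomic.h>

/* "fetch" form: returns the value the counter held BEFORE the addition. */
int fetch_then_add(atomic_int *v, int i)
{
    return atomic_fetch_add(v, i);
}

/* "return" form: the value AFTER the addition, built on the fetch form. */
int add_then_return(atomic_int *v, int i)
{
    return atomic_fetch_add(v, i) + i;
}

/*
 * A typical use of fetch_or: set a flag bit and learn, in one atomic
 * step, whether it was already set.
 */
int test_and_set_flag(atomic_int *flags, int bit)
{
    return (atomic_fetch_or(flags, bit) & bit) != 0;
}
```

The old value is the more general result for the bitwise operations: after an OR, for example, the previous state of the bit cannot be reconstructed from the new value alone.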

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
    locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
    locking/static_keys: Fix non static symbol Sparse warning
    locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
    locking/atomic, arch/tile: Fix tilepro build
    locking/atomic, arch/m68k: Remove comment
    locking/atomic, arch/arc: Fix build
    locking/Documentation: Clarify limited control-dependency scope
    locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
    locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
    locking/atomic, arch/mips: Convert to _relaxed atomics
    locking/atomic, arch/alpha: Convert to _relaxed atomics
    locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
    locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
    locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    locking/atomic: Fix atomic64_relaxed() bits
    locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    ...

    Linus Torvalds
     
  • I ran into this:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
    RIP: 0010:[] [] irttp_connect_request+0x36/0x710
    RSP: 0018:ffff880111747bb8 EFLAGS: 00010286
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
    RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
    RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
    R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
    FS: 00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
    Stack:
    0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
    ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
    ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
    Call Trace:
    [] irda_connect+0x562/0x1190
    [] SYSC_connect+0x202/0x2a0
    [] SyS_connect+0x9/0x10
    [] do_syscall_64+0x19c/0x410
    [] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
    RIP [] irttp_connect_request+0x36/0x710
    RSP
    ---[ end trace 4cda2588bc055b30 ]---

    The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
    and then irttp_connect_request() almost immediately dereferences it.
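
A minimal userspace sketch of the kind of check that avoids this, assuming the fix is simply to propagate the error from the open step. The types and functions (`struct irda_sock_model`, `open_tsap()`, `connect_request()`, `model_connect()`) are hypothetical models and not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical models of the objects involved; NOT the kernel structures. */
struct tsap {
    int connected;
};

struct irda_sock_model {
    struct tsap *tsap;
};

static struct tsap the_tsap;

/* Models irda_open_tsap(): on failure, self->tsap is left NULL. */
int open_tsap(struct irda_sock_model *self, int fail)
{
    if (fail)
        return -ENOMEM;
    self->tsap = &the_tsap;
    return 0;
}

/* Models irttp_connect_request(): dereferences its argument right away. */
int connect_request(struct tsap *t)
{
    t->connected = 1;
    return 0;
}

/* Check the open step's result before handing the pointer on. */
int model_connect(struct irda_sock_model *self, int fail)
{
    if (!self->tsap) {
        int err = open_tsap(self, fail);

        if (err)
            return err;   /* never reach connect_request() with NULL */
    }
    return connect_request(self->tsap);
}
```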

    Cc: stable@vger.kernel.org
    Signed-off-by: Vegard Nossum
    Signed-off-by: David S. Miller

    Vegard Nossum
     
  • The head skb for GSO packets won't travel through the inner depths of
    the SCTP stack, as it doesn't contain any chunks itself. That means
    skb->sk doesn't get set, and when sctp_recvmsg() calls
    sctp_inet6_skb_msgname() on the head_skb it panics, because the latter
    needs to check flags on the socket (sp->v4mapped).

    The fix is to initialize skb->sk for the head skb as soon as we are able
    to, that is, when the first chunk is processed.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • Now that the backlog processing is called with BH enabled, we have to
    disable BH before taking the socket lock via bh_lock_sock(), otherwise
    it may deadlock:

    sctp_backlog_rcv()
        bh_lock_sock(sk);

        if (sock_owned_by_user(sk)) {
            if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                sctp_chunk_free(chunk);
            else
                backloged = 1;
        } else
            sctp_inq_push(inqueue, chunk);

        bh_unlock_sock(sk);

    sctp_inq_push() disables and re-enables BH on its own, but re-enabling
    BH runs any pending softirq, which may then try to re-lock the socket
    in sctp_rcv():

    [ 219.187215]
    [ 219.187217] [] _raw_spin_lock+0x20/0x30
    [ 219.187223] [] sctp_rcv+0x48c/0xba0 [sctp]
    [ 219.187225] [] ? nf_iterate+0x62/0x80
    [ 219.187226] [] ip_local_deliver_finish+0x94/0x1e0
    [ 219.187228] [] ip_local_deliver+0x6f/0xf0
    [ 219.187229] [] ? ip_rcv_finish+0x3b0/0x3b0
    [ 219.187230] [] ip_rcv_finish+0xd8/0x3b0
    [ 219.187232] [] ip_rcv+0x282/0x3a0
    [ 219.187233] [] ? update_curr+0x66/0x180
    [ 219.187235] [] __netif_receive_skb_core+0x524/0xa90
    [ 219.187236] [] ? update_cfs_shares+0x30/0xf0
    [ 219.187237] [] ? __enqueue_entity+0x6c/0x70
    [ 219.187239] [] ? enqueue_entity+0x204/0xdf0
    [ 219.187240] [] __netif_receive_skb+0x18/0x60
    [ 219.187242] [] process_backlog+0x9e/0x140
    [ 219.187243] [] net_rx_action+0x22c/0x370
    [ 219.187245] [] __do_softirq+0x112/0x2e7
    [ 219.187247] [] do_softirq_own_stack+0x1c/0x30
    [ 219.187247]
    [ 219.187248] [] do_softirq.part.14+0x38/0x40
    [ 219.187249] [] __local_bh_enable_ip+0x7d/0x80
    [ 219.187254] [] sctp_inq_push+0x68/0x80 [sctp]
    [ 219.187258] [] sctp_backlog_rcv+0x151/0x1c0 [sctp]
    [ 219.187260] [] __release_sock+0x87/0xf0
    [ 219.187261] [] release_sock+0x30/0xa0
    [ 219.187265] [] sctp_accept+0x17d/0x210 [sctp]
    [ 219.187266] [] ? prepare_to_wait_event+0xf0/0xf0
    [ 219.187268] [] inet_accept+0x3c/0x130
    [ 219.187269] [] SYSC_accept4+0x103/0x210
    [ 219.187271] [] ? _raw_spin_unlock_bh+0x1a/0x20
    [ 219.187272] [] ? release_sock+0x8c/0xa0
    [ 219.187276] [] ? sctp_inet_listen+0x62/0x1b0 [sctp]
    [ 219.187277] [] SyS_accept+0x10/0x20
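
The ordering problem in the trace above can be modeled in userspace with a nesting counter. This is a simplified sketch, not kernel code: `struct cpu_model` and every `model_*`/`backlog_rcv_*` function are hypothetical, and a real spinlock would spin forever where this model merely sets a `deadlocked` flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; NOT kernel code. */
struct cpu_model {
    int bh_disable_depth;   /* nesting count of "BH disabled" sections */
    bool softirq_pending;   /* a packet is waiting for softirq processing */
    bool sock_locked;       /* the socket spinlock is held on this CPU */
    bool deadlocked;        /* set where a real spinlock would spin forever */
};

void model_bh_disable(struct cpu_model *c)
{
    c->bh_disable_depth++;
}

/* Models the softirq path ending in sctp_rcv() taking the socket lock. */
void model_softirq(struct cpu_model *c)
{
    c->softirq_pending = false;
    if (c->sock_locked)
        c->deadlocked = true;
}

/* Pending softirqs run only when the outermost BH-disabled section ends. */
void model_bh_enable(struct cpu_model *c)
{
    if (--c->bh_disable_depth == 0 && c->softirq_pending)
        model_softirq(c);
}

/* Models sctp_inq_push(): briefly disables and re-enables BH. */
void model_inq_push(struct cpu_model *c)
{
    model_bh_disable(c);
    model_bh_enable(c);
}

/* Lock taken with BH enabled: the inner enable runs the softirq. */
void backlog_rcv_buggy(struct cpu_model *c)
{
    c->sock_locked = true;
    model_inq_push(c);
    c->sock_locked = false;
}

/* BH disabled around the lock: the softirq is deferred until after unlock. */
void backlog_rcv_fixed(struct cpu_model *c)
{
    model_bh_disable(c);
    c->sock_locked = true;
    model_inq_push(c);      /* nested enable, depth stays above zero */
    c->sock_locked = false;
    model_bh_enable(c);     /* pending softirq runs here, lock is free */
}
```

In the buggy variant the softirq runs while `sock_locked` is still true; in the fixed variant it runs only after the lock has been released.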

    Fixes: 860fbbc343bf ("sctp: prepare for socket backlog behavior change")
    Cc: Eric Dumazet
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner