31 Jul, 2016

1 commit


30 Jul, 2016

1 commit

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - TPM core and driver updates/fixes
    - IPv6 security labeling (CALIPSO)
    - Lots of AppArmor fixes
    - Seccomp: remove 2-phase API, close hole where ptrace can change
    syscall #"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
    apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
    tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
    tpm: Factor out common startup code
    tpm: use devm_add_action_or_reset
    tpm2_i2c_nuvoton: add irq validity check
    tpm: read burstcount from TPM_STS in one 32-bit transaction
    tpm: fix byte-order for the value read by tpm2_get_tpm_pt
    tpm_tis_core: convert max timeouts from msec to jiffies
    apparmor: fix arg_size computation for when setprocattr is null terminated
    apparmor: fix oops, validate buffer size in apparmor_setprocattr()
    apparmor: do not expose kernel stack
    apparmor: fix module parameters can be changed after policy is locked
    apparmor: fix oops in profile_unpack() when policy_db is not present
    apparmor: don't check for vmalloc_addr if kvzalloc() failed
    apparmor: add missing id bounds check on dfa verification
    apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
    apparmor: use list_next_entry instead of list_entry_next
    apparmor: fix refcount race when finding a child profile
    apparmor: fix ref count leak when profile sha1 hash is read
    apparmor: check that xindex is in trans_table bounds
    ...

    Linus Torvalds
     

27 Jul, 2016

1 commit

  • Currently lastuse is updated on entry creation and cache hit, but it should
    also be updated on entry change. Since both on add and update the ttl array
    is updated we can simply update the lastuse in ipmr_update_thresholds.

    Signed-off-by: Nikolay Aleksandrov
    CC: Roopa Prabhu
    CC: Donald Sharp
    CC: David S. Miller
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

26 Jul, 2016

2 commits

  • After a612769774a3 ("udp: prevent bugcheck if filter truncates packet
    too much"), there followed various other fixes for similar cases such
    as f4979fcea7fd ("rose: limit sk_filter trim to payload").

    The latter introduced a new helper sk_filter_trim_cap(), where we can pass
    the trim limit directly to the socket filter handling. Make use of it
    here as well with sizeof(struct udphdr) as lower cap limit and drop the
    extra skb->len test in UDP's input path.

    Signed-off-by: Daniel Borkmann
    Cc: Willem de Bruijn
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Daniel Borkmann
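
    The trim-with-cap idea described above can be sketched in userspace C.
    This is only an illustration under stated assumptions: struct pkt and
    filter_trim_cap() are hypothetical stand-ins, not the kernel's
    struct sk_buff or sk_filter_trim_cap(), and the filter verdict is
    passed in as a plain length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer; not the kernel's struct sk_buff. */
struct pkt {
    size_t len;   /* bytes currently held by the packet */
};

#define UDP_HDR_LEN 8u  /* a UDP header is 8 bytes long */

/*
 * Apply a filter verdict to a packet. A verdict of 0 means "drop".
 * Otherwise the packet is trimmed to the verdict length, but never
 * below `cap`, and never grown. Returns 0 if kept, -1 if dropped.
 */
static int filter_trim_cap(struct pkt *p, size_t filter_len, size_t cap)
{
    size_t new_len;

    assert(p != NULL);
    if (filter_len == 0)
        return -1;                       /* filter rejected the packet */

    new_len = filter_len > cap ? filter_len : cap;   /* max(filter_len, cap) */
    if (new_len < p->len)
        p->len = new_len;                /* trimming only ever shrinks */
    return 0;
}
```

    With the cap set to the UDP header size, a filter that asks for fewer
    bytes than the header still leaves a full header in place, so no
    separate length test is needed afterwards.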
     
  • Default kernel behavior is to delete IPv6 addresses on link
    down, which entails deletion of the multicast and the
    subnet-router anycast addresses. These deletions do not
    happen with sysctl setting to keep global IPv6 addresses on
    link down, so every link down/up causes an increment of the
    anycast and multicast refcounts. These bogus refcounts may
    stop these addrs from being removed on subsequent calls to
    delete them. The solution is to leave the groups for the
    multicast and subnet anycast on link down for the callflow
    when global IPv6 addresses are kept.

    Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
    Signed-off-by: Mike Manning
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Mike Manning
     

25 Jul, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next,
    they are:

    1) Count pre-established connections as active in "least connection"
    schedulers to avoid overloading backend servers on peak demands,
    from Michal Kubecek via Simon Horman.

    2) Address a race condition when resizing the conntrack table by caching
    the bucket size when fully iterating over the hashtable in these
    three possible scenarios: 1) dump via /proc/net/nf_conntrack,
    2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
    From Liping Zhang.

    3) Revisit early_drop() path to perform lockless traversal on conntrack
    eviction under stress, use del_timer() as synchronization point to
    avoid two CPUs evicting the same entry, from Florian Westphal.

    4) Move NAT hlist_head to nf_conn object, this simplifies the existing
    NAT extension and it doesn't increase size since recent patches to
    align nf_conn, from Florian.

    5) Use rhashtable for the by-source NAT hashtable, also from Florian.

    6) Don't allow --physdev-is-out from the OUTPUT chain, just as
    --physdev-out is not allowed either, from Hangbin Liu.

    7) Automatically enable nf_conntrack counters if the user tries to
    match ct bytes/packets from nftables, from Liping Zhang.

    8) Remove possible_net_t fields in nf_tables set objects since we just
    simply pass the net pointer to the backend set type implementations.

    9) Fix possible off-by-one in h323, from Toby DiPasquale.

    10) early_drop() may be called from the ctnetlink path, so we must hold
    the rcu read side lock from there too, this amends Florian's patch #3
    coming in this batch, from Liping Zhang.

    11) Use binary search to validate jump offset in x_tables, this
    addresses the O(n^2) validation that was introduced recently to
    resolve security issues with unprivileged namespaces, from Florian.

    12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.

    13) Three updates for nft_log: fix log prefix leak in error path, bail
    out on loglevel larger than debug in nft_log, and set the new
    NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.

    14) Allow to filter rule dumps in nf_tables based on table and chain
    names.

    15) Simplify connlabel to always use 128 bits to store labels and
    get rid of unused function in xt_connlabel, from Florian.

    16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
    helper, by Gao Feng.

    17) Put back x_tables module reference in nft_compat on error, from
    Liping Zhang.

    18) Add a reference count to the x_tables extensions cache in
    nft_compat, so we can remove them when unused and avoid a crash
    if the extensions are removed via rmmod, again from Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
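
    Item 3 above relies on a single operation that succeeds for only one
    caller, so two CPUs cannot both evict the same entry. A minimal
    userspace sketch of that pattern follows. It swaps in a C11 atomic
    flag for the timer; struct entry, entry_init(), claim_entry() and
    try_evict() are hypothetical names, not conntrack code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry; the flag stands in for "timer still pending". */
struct entry {
    atomic_bool timer_pending;
    int evictions;              /* how many times the entry was evicted */
};

static void entry_init(struct entry *e)
{
    assert(e != NULL);
    atomic_init(&e->timer_pending, true);
    e->evictions = 0;
}

/*
 * Returns true for exactly one caller: the one that observes the flag
 * as set and clears it in the same atomic step.
 */
static bool claim_entry(struct entry *e)
{
    return atomic_exchange(&e->timer_pending, false);
}

/* Evict the entry only if this caller won the claim. */
static bool try_evict(struct entry *e)
{
    assert(e != NULL);
    if (!claim_entry(e))
        return false;           /* someone else already owns the eviction */
    e->evictions++;
    return true;
}
```

    The list traversal itself can then run without a lock, because
    ownership of each eviction is decided by the atomic claim alone.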
     

24 Jul, 2016

1 commit


19 Jul, 2016

1 commit

  • The dummy ruleset I used to test the original validation change was broken,
    most rules were unreachable and were not tested by mark_source_chains().

    In some cases rulesets that used to load in a few seconds now require
    several minutes.

    sample ruleset that shows the behaviour:

    echo "*filter"
    for i in $(seq 0 100000);do
    printf ":chain_%06x - [0:0]\n" $i
    done
    for i in $(seq 0 100000);do
    printf -- "-A INPUT -j chain_%06x\n" $i
    printf -- "-A INPUT -j chain_%06x\n" $i
    printf -- "-A INPUT -j chain_%06x\n" $i
    done
    echo COMMIT

    [ pipe result into iptables-restore ]

    This ruleset will be about 74 MB in size, with ~500k searches
    through all 500k[1] rule entries. iptables-restore will take forever
    (gave up after 10 minutes).

    Instead of always searching the entire blob for a match, fill an
    array with the start offsets of every single ipt_entry struct,
    then do a binary search to check if the jump target is present or not.

    After this change ruleset restore times get again close to what one
    gets when reverting 36472341017529e (~3 seconds on my workstation).

    [1] every user-defined chain gets an implicit RETURN, so we get
    300k jumps + 100k userchains + 100k returns -> 500k rule entries

    Fixes: 36472341017529e ("netfilter: x_tables: validate targets of jumps")
    Reported-by: Jeff Wu
    Tested-by: Jeff Wu
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
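
    The lookup described above (collect the start offset of every entry,
    then binary-search for the jump target) can be sketched as follows.
    This is an illustration only: offset_is_entry_start() is a
    hypothetical helper, and the offsets array is assumed to be sorted in
    ascending order, which holds when it is filled while walking the blob
    front to back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if `target` is the start offset of some entry.
 * `offsets` must be sorted ascending and hold `n` elements.
 */
static bool offset_is_entry_start(const unsigned int *offsets, size_t n,
                                  unsigned int target)
{
    size_t lo = 0, hi = n;      /* search the half-open range [lo, hi) */

    assert(offsets != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (offsets[mid] == target)
            return true;
        if (offsets[mid] < target)
            lo = mid + 1;       /* target can only be in the upper half */
        else
            hi = mid;           /* target can only be in the lower half */
    }
    return false;
}
```

    Each jump is then validated in O(log n) steps instead of a scan over
    the whole blob.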
     

17 Jul, 2016

1 commit

  • In preparation for hardware offloading of ipmr/ip6mr we need an
    interface that allows checking (and later updating) the age of entries.
    Relying on stats alone can show activity but not the actual age of an
    entry. Furthermore, when there are tens of thousands of entries, many
    hardware implementations only support "hit" bits which are cleared on
    read to denote that the entry was active and shouldn't be aged out.
    These can then be naturally translated into an age timestamp and will be
    compatible with the software forwarding age. Using a lastuse entry doesn't
    affect performance because the members in that cache line are written to
    along with the age.
    Since all new users are encouraged to use ipmr via netlink, this is
    exported via the RTA_EXPIRES attribute.
    Also do a minor local variable declaration style adjustment: arrange them
    longest to shortest.

    Signed-off-by: Nikolay Aleksandrov
    CC: Roopa Prabhu
    CC: Shrijeet Mukherjee
    CC: Satish Ashok
    CC: Donald Sharp
    CC: David S. Miller
    CC: Alexey Kuznetsov
    CC: James Morris
    CC: Hideaki YOSHIFUJI
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
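
    How a clear-on-read "hit" bit could be turned into an age timestamp is
    sketched below. Every name here (struct hw_entry, struct mfc_age,
    poll_entry(), entry_age()) is hypothetical, and time is an abstract
    tick counter rather than jiffies. No real driver interface is implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware entry exposing only a clear-on-read hit bit. */
struct hw_entry {
    bool hit;
};

/* Hypothetical software view of the same entry. */
struct mfc_age {
    uint64_t lastuse;   /* tick at which the entry was last seen active */
};

/* Read the hit bit and clear it, as clear-on-read hardware would. */
static bool hw_read_and_clear_hit(struct hw_entry *hw)
{
    bool hit = hw->hit;

    hw->hit = false;
    return hit;
}

/* Periodic poll: a set hit bit means "used since the last poll". */
static void poll_entry(struct hw_entry *hw, struct mfc_age *sw, uint64_t now)
{
    if (hw_read_and_clear_hit(hw))
        sw->lastuse = now;
}

/* Age of the entry in ticks, comparable with a software-updated lastuse. */
static uint64_t entry_age(const struct mfc_age *sw, uint64_t now)
{
    assert(now >= sw->lastuse);
    return now - sw->lastuse;
}
```

    The resolution of the resulting age is bounded by the polling
    interval, which is the trade-off of having only a single bit per entry.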
     

12 Jul, 2016

1 commit

  • If a socket filter truncates a UDP packet below the length of the UDP
    header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
    BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
    the kernel is configured that way) can be easily triggered by an
    unprivileged user, which was reported as CVE-2016-6162. For a reproducer,
    see http://seclists.org/oss-sec/2016/q3/8

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Reported-by: Marco Grassi
    Signed-off-by: Michal Kubecek
    Acked-by: Eric Dumazet
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Michal Kubeček
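
    The failure mode above comes from pulling a header off a buffer that
    is shorter than the header. A minimal userspace sketch of the guarded
    variant follows, assuming a hypothetical struct buf and
    pull_udp_header(). It returns an error where the kernel path would
    drop the packet.

```c
#include <assert.h>
#include <stddef.h>

#define UDP_HDR_LEN 8u  /* a UDP header is 8 bytes long */

/* Hypothetical view onto packet data; not the kernel's struct sk_buff. */
struct buf {
    const unsigned char *data;  /* first unread byte */
    size_t len;                 /* bytes remaining from data onward */
};

/*
 * Advance past the UDP header. Returns 0 on success, or -1 without
 * touching the buffer if fewer than UDP_HDR_LEN bytes remain.
 */
static int pull_udp_header(struct buf *b)
{
    assert(b != NULL);
    if (b->len < UDP_HDR_LEN)
        return -1;              /* truncated below the header: drop */

    b->data += UDP_HDR_LEN;
    b->len -= UDP_HDR_LEN;
    return 0;
}
```

    Checking the length after the filter has run matters because the
    filter may have shortened a packet that was long enough when it arrived.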
     

10 Jul, 2016

2 commits

  • All inet6_netconf_notify_devconf() callers are in process context,
    so we can use GFP_KERNEL allocations if we take care not to hold
    a rwlock when it is not needed in ip6mr (we hold RTNL there)

    Fixes: d67b8c616b48 ("netconf: advertise mc_forwarding status")
    Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status")
    Signed-off-by: Eric Dumazet
    Cc: Nicolas Dichtel
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Extend the SIT driver to support MPLS over IPv4. This implementation
    extends existing support for IPv6 over IPv4 and IPv4 over IPv4.

    Signed-off-by: Simon Horman
    Reviewed-by: Dinan Gunawardena
    Signed-off-by: David S. Miller

    Simon Horman
     

07 Jul, 2016

3 commits

  • James Morris
     
  • Conflicts:
    drivers/net/ethernet/mellanox/mlx5/core/en.h
    drivers/net/ethernet/mellanox/mlx5/core/en_main.c
    drivers/net/usb/r8152.c

    All three conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next,
    they are:

    1) Don't use userspace datatypes in bridge netfilter code, from
    Tobin Harding.

    2) Iterate only once over the expectation table when removing the
    helper module, instead of once per-netns, from Florian Westphal.

    3) Extra sanitization in xt_hook_ops_alloc() to return an error in case
    we ever pass zero hooks.

    4) Handle NFPROTO_INET from the logging core infrastructure, from
    Liping Zhang.

    5) Autoload loggers when TRACE target is used from rules, this doesn't
    change the behaviour in case the user already selected nfnetlink_log
    as preferred way to print tracing logs, also from Liping Zhang.

    6) Use SLAB_HWCACHE_ALIGN for conntrack slabs to allow rearranging fields
    by cache lines, this increases the size of each entry by 11%.
    From Florian Westphal.

    7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

    8) Remove useless defensive check in nf_logger_find_get() from Shivani
    Bhardwaj.

    9) Remove the zone extension and place the zone in the conntrack object,
    since it is always included in the hashing and we expect more intensive
    use of zones now that containers are in place. Also from Florian Westphal.

    10) Owner match now works from any namespace, from Eric Biederman.

    11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

    12) Introduce --nflog-size to indicate the number of network packet bytes
    that are copied to userspace via log message, from Vishwanath Pai.
    This obsoletes --nflog-range, which was designed to achieve this but
    has never worked.

    13) Introduce generic macros for nf_tables object generation masks.

    14) Use generation mask in table, chain and set objects in nf_tables.
    This fixes interference between the ongoing preparation phase of
    the commit protocol and object listings going on at the same time.
    This update is introduced in three patches, one per object.

    15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation, given that deactivation
    happens from the commit phase path we have to observe the future
    status of the object.

    16) Support for deletion of just added elements in the hash set type.

    17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

    18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

    19) Support for matching inverted set lookups, from Arturo Borrero.

    20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

    21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

    22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    has not existed for 10 years, from Moritz Sichert.

    23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

    24) Setting logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fix seems to
    leave the former behaviour in place, so we don't break backward
    compatibility.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
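
    Item 21 above introduces a masked comparison of Ethernet addresses.
    The idea can be sketched in userspace as below. mac_equal_masked() is
    a hypothetical stand-in written for illustration, not a copy of the
    kernel helper: two addresses are considered equal if they agree on
    every bit that is set in the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6   /* an Ethernet address is 6 bytes long */

/*
 * Compare two Ethernet addresses under a mask. Bits cleared in `mask`
 * are ignored; bits set in `mask` must match in `a` and `b`.
 */
static bool mac_equal_masked(const uint8_t *a, const uint8_t *b,
                             const uint8_t *mask)
{
    size_t i;

    assert(a != NULL && b != NULL && mask != NULL);
    for (i = 0; i < MAC_LEN; i++) {
        /* XOR yields the differing bits; the mask keeps the relevant ones. */
        if ((a[i] ^ b[i]) & mask[i])
            return false;
    }
    return true;
}
```

    With a mask of ff:ff:ff:00:00:00 this compares only the vendor prefix
    of the two addresses.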
     

06 Jul, 2016

1 commit

  • It was first reported and reproduced by Petr (thanks!) in
    https://bugzilla.kernel.org/show_bug.cgi?id=119581

    free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

    However, after fixing a deadlock bug in
    commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
    free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

    It is worth noting that rt6i_pcpu is protected by table->tb6_lock.

    kmemleak somehow did not report it. We nailed it down by
    observing the pcpu entries in /proc/vmallocinfo (first suggested
    by Hannes, thanks!).

    Signed-off-by: Martin KaFai Lau
    Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
    Reported-by: Petr Novopashenniy
    Tested-by: Petr Novopashenniy
    Acked-by: Hannes Frederic Sowa
    Cc: Hannes Frederic Sowa
    Cc: Petr Novopashenniy
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

03 Jul, 2016

1 commit

  • netfilter uses multiple FWINV #defines with identical form that hide a
    specific structure variable and dereference it with an invflags member.

    $ git grep "#define FWINV"
    include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
    net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
    net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
    net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
    net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
    net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))

    Consolidate these macros into a single NF_INVF macro.

    Miscellanea:

    o Neaten the alignment around these uses
    o A few lines are > 80 columns for intelligibility

    Signed-off-by: Joe Perches
    Signed-off-by: Pablo Neira Ayuso

    Joe Perches
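
    What a consolidated macro of this kind could look like is sketched
    below. It differs from the FWINV variants listed above in taking the
    structure pointer as an explicit argument instead of hiding it. The
    macro name INVF, its argument order, struct rule_info and the flag
    values are all assumptions made for illustration, not the definition
    that was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical consolidated macro: XOR a 0/1 condition with whether the
 * given inversion flag is set in ptr->invflags. `cond` must evaluate to
 * 0 or 1, which holds for comparison results.
 */
#define INVF(ptr, flag, cond) ((cond) ^ !!((ptr)->invflags & (flag)))

#define INV_SRC 0x01u   /* hypothetical "invert source match" flag */
#define INV_DST 0x02u   /* hypothetical "invert destination match" flag */

/* Hypothetical rule description carrying the inversion flags. */
struct rule_info {
    unsigned int invflags;
};

/*
 * A rule matches on source when the addresses are equal and the match
 * is not inverted, or when they differ and the match is inverted.
 */
static bool src_matches(const struct rule_info *info,
                        unsigned int pkt_src, unsigned int rule_src)
{
    assert(info != NULL);
    return !INVF(info, INV_SRC, pkt_src != rule_src);
}
```

    Passing the pointer explicitly lets one definition serve every table
    type, since each caller names its own info structure.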
     

01 Jul, 2016

2 commits

  • No need for a special case to handle NF_INET_POST_ROUTING, this is
    basically the same handling as for prerouting, input, forward.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Some arches have virtually mapped kernel stacks, or will soon have.

    tcp_md5_hash_header() uses an automatic variable to copy tcp header
    before mangling th->check and calling crypto function, which might
    be problematic on such arches.

    David says that using percpu storage is also problematic on non SMP
    builds.

    Just use kmalloc() to allocate scratch areas.

    Signed-off-by: Eric Dumazet
    Reported-by: Andy Lutomirski
    Signed-off-by: David S. Miller

    Eric Dumazet
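
    The approach above (copy the header into heap-allocated scratch space,
    clear the checksum in the copy, then hash the copy) is sketched here
    in userspace. malloc() stands in for kmalloc(), toy_hash() is a
    placeholder byte sum rather than MD5, and struct tcp_hdr_sketch is a
    simplified, hypothetical header layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical TCP header layout (20 bytes, no padding). */
struct tcp_hdr_sketch {
    uint16_t source, dest;
    uint32_t seq, ack;
    uint16_t flags, window, check, urg;
};

/* Placeholder hash: a plain byte sum, only to make the flow visible. */
static uint32_t toy_hash(const void *p, size_t n)
{
    const unsigned char *c = p;
    uint32_t sum = 0;

    while (n--)
        sum += *c++;
    return sum;
}

/*
 * Hash the header with its checksum field treated as zero. The caller's
 * header is never modified; the mangled copy lives on the heap, not on
 * the stack. Returns 0 on success, -1 if the allocation fails.
 */
static int hash_header(const struct tcp_hdr_sketch *th, uint32_t *out)
{
    struct tcp_hdr_sketch *scratch;

    assert(th != NULL && out != NULL);
    scratch = malloc(sizeof(*scratch));
    if (!scratch)
        return -1;

    memcpy(scratch, th, sizeof(*scratch));
    scratch->check = 0;                 /* mangle the copy only */
    *out = toy_hash(scratch, sizeof(*scratch));
    free(scratch);
    return 0;
}
```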
     

30 Jun, 2016

1 commit


28 Jun, 2016

13 commits


27 Jun, 2016

1 commit

  • With commit 8c14586fc320 ("net: ipv6: Use passed in table for
    nexthop lookups"), nexthop lookup is first performed on route creation
    in the passed-in table.
    However, device match is not enforced in table lookup, so the found
    route can be later discarded due to egress device mismatch and no
    global lookup will be performed.
    This causes the following to fail:

    ip link add dummy1 type dummy
    ip link add dummy2 type dummy
    ip link set dummy1 up
    ip link set dummy2 up
    ip route add 2001:db8:8086::/48 dev dummy1 metric 20
    ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
    ip route add 2001:db8:8086::/48 dev dummy2 metric 21
    ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
    RTNETLINK answers: No route to host

    This change fixes the issue by enforcing device lookup in
    ip6_nh_lookup_table().

    v1->v2: updated commit message title

    Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
    Reported-and-tested-by: Beniamino Galvani
    Signed-off-by: Paolo Abeni
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Paolo Abeni
     

19 Jun, 2016

4 commits

  • When receiving an ICMPv4 message containing extensions as
    defined in RFC 4884, and translating it to ICMPv6 at SIT
    or GRE tunnel, we need some extra manipulation in order
    to properly forward the extensions.

    This patch only takes care of Time Exceeded messages as they
    are the ones that typically carry information from various
    routers in a fabric during a traceroute session.

    It also avoids complex skb logic if the data_len is not
    a multiple of 8.

    RFC states :

    The "original datagram" field MUST contain at least 128 octets.
    If the original datagram did not contain 128 octets, the
    "original datagram" field MUST be zero padded to 128 octets.

    In practice routers use 128 bytes of original datagram, not more.

    Initial translation was added in commit ca15a078bd90
    ("sit: generate icmpv6 error when receiving icmpv4 error")

    Signed-off-by: Eric Dumazet
    Cc: Oussama Ghorbel
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • For better traceroute/mtr support for SIT and GRE tunnels,
    we translate IPV4 ICMP ICMP_TIME_EXCEEDED to ICMPV6_TIME_EXCEED

    We also have to translate the IPv4 source IP address of ICMP
    message to IPv6 v4mapped.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
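
    The address translation mentioned above follows the IPv4-mapped IPv6
    form ::ffff:a.b.c.d, that is, 80 zero bits, 16 one bits, then the four
    IPv4 octets. A minimal sketch follows; ipv4_to_v4mapped() is a
    hypothetical helper operating on raw bytes in network order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the IPv4-mapped IPv6 address for a 4-byte IPv4 address.
 * Both buffers hold bytes in network order.
 */
static void ipv4_to_v4mapped(const uint8_t v4[4], uint8_t v6[16])
{
    assert(v4 != NULL && v6 != NULL);
    memset(v6, 0, 10);          /* bytes 0..9: 80 zero bits */
    v6[10] = 0xff;              /* bytes 10..11: 16 one bits */
    v6[11] = 0xff;
    memcpy(v6 + 12, v4, 4);     /* bytes 12..15: the IPv4 address */
}
```

    For example, 192.0.2.1 becomes ::ffff:192.0.2.1.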
     
  • We want to use this helper from GRE as well, so this is
    the time to move it to net/ipv6/icmp.c

    Also add a @nhs parameter, since SIT and GRE have different
    values for the header(s) to skip.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • SIT or GRE tunnels might want to translate an IPV4 address
    into a v4mapped one when translating ICMP to ICMPv6.

    This patch adds the parameter to icmp6_send() but
    does not change icmpv6_send() signature.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Jun, 2016

2 commits

  • IPv6 version of 3f2fb9a834cb ("net: l3mdev: address selection should only
    consider devices in L3 domain") and the follow up commit, a17b693cdd876
    ("net: l3mdev: prefer VRF master for source address selection").

    That is, if outbound device is given then the address preference order
    is an address from that device, an address from the master device if it
    is enslaved, and then an address from a device in the same L3 domain.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
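
    The preference order described above can be expressed as a small
    ranking function. This is a sketch under assumptions: struct
    dev_sketch, its fields and the enum values are hypothetical, and real
    source address selection weighs further rules that are not modelled
    here.

```c
#include <assert.h>
#include <stddef.h>

/* Higher value means more preferred. */
enum addr_pref {
    PREF_NONE = 0,        /* device is outside the L3 domain */
    PREF_L3_DOMAIN = 1,   /* some other device in the same L3 domain */
    PREF_MASTER = 2,      /* the master of the outbound device */
    PREF_OUT_DEV = 3      /* the outbound device itself */
};

/* Hypothetical device description. */
struct dev_sketch {
    int ifindex;
    int master_ifindex;   /* 0 if the device is not enslaved */
    int l3_domain;        /* identifier of the L3 domain */
};

/* Rank the device owning a candidate address against the outbound device. */
static enum addr_pref addr_preference(const struct dev_sketch *addr_dev,
                                      const struct dev_sketch *out_dev)
{
    assert(addr_dev != NULL && out_dev != NULL);
    if (addr_dev->ifindex == out_dev->ifindex)
        return PREF_OUT_DEV;
    if (out_dev->master_ifindex != 0 &&
        addr_dev->ifindex == out_dev->master_ifindex)
        return PREF_MASTER;
    if (addr_dev->l3_domain == out_dev->l3_domain)
        return PREF_L3_DOMAIN;
    return PREF_NONE;
}
```

    A selector would pick the candidate with the highest rank and reject
    those ranked PREF_NONE.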
     
  • IPv6 source address selection needs to consider the real egress route.
    Similar to IPv4 implement a get_saddr6 method which is called if
    source address has not been set. The get_saddr6 method does a full
    lookup which means pulling a route from the VRF FIB table and properly
    considering linklocal/multicast destination addresses. Lookup failures
    (e.g., unreachable) then cause the source address selection to fail
    which gets propagated back to the caller.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern