13 Dec, 2016

2 commits

  • The comment on the name indirection suggested an issue but turned out
    to be untrue. Digging in older kernel versions showed an issue with ipw2x00
    but that is no longer true so get rid of the name indirection.

    Signed-off-by: Arend van Spriel
    Signed-off-by: Johannes Berg

    Arend Van Spriel
     
  • Pull sparc updates from David Miller:
    "Just a bunch of small cleanups and fixes here, and support for user
    probes from Allen Pais"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc: fix a building error reported by kbuild
    sparc64: fix typo in pgd_clear()
    sparc64: restore irq in error paths in iommu
    sparc: leon: Fix a retry loop in leon_init_timers()
    sparc64: make string buffers large enough
    sparc64: move dereference after check for NULL
    sparc: kernel: use builtin_platform_driver
    sparc64:Support User Probes for sparc

    Linus Torvalds
     

12 Dec, 2016

1 commit


11 Dec, 2016

4 commits

  • PPPOL2TP_MSG_* and L2TP_MSG_* are duplicates, and are being used
    interchangeably in the kernel, so let's standardize on L2TP_MSG_*
    internally, and keep PPPOL2TP_MSG_* defined in UAPI for compatibility.

    Signed-off-by: Asbjoern Sloth Toennesen
    Signed-off-by: David S. Miller

    Asbjørn Sloth Tønnesen
     
  • Move the L2TP_MSG_* definitions to UAPI, as it is part of
    the netlink API.

    Signed-off-by: Asbjoern Sloth Toennesen
    Signed-off-by: David S. Miller

    Asbjørn Sloth Tønnesen
     
  • David S. Miller
     
  • Pull networking fixes from David Miller:

    1) Limit the number of can filters to avoid > MAX_ORDER allocations.
    Fix from Marc Kleine-Budde.

    2) Limit GSO max size in netvsc driver to avoid problems with NVGRE
    configurations. From Stephen Hemminger.

    3) Return proper error when memory allocation fails in
    ser_gigaset_init(), from Dan Carpenter.

    4) Missing linkage undo in error paths of ipvlan_link_new(), from Gao
    Feng.

    5) Missing necessary SET_NETDEV_DEV in lantiq and cpmac drivers, from
    Florian Fainelli.

    6) Handle probe deferral properly in smsc911x driver.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: mlx5: Fix Kconfig help text
    net: smsc911x: back out silently on probe deferrals
    ibmveth: set correct gso_size and gso_type
    net: ethernet: cpmac: Call SET_NETDEV_DEV()
    net: ethernet: lantiq_etop: Call SET_NETDEV_DEV()
    vhost-vsock: fix orphan connection reset
    cxgb4/cxgb4vf: Assign netdev->dev_port with port ID
    driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed
    ser_gigaset: return -ENOMEM on error instead of success
    NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
    can: peak: fix bad memory access and free sequence
    phy: Don't increment MDIO bus refcount unless it's a different owner
    netvsc: reduce maximum GSO size
    drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links
    can: raw: raw_setsockopt: limit number of can_filter that can be set

    Linus Torvalds
     

10 Dec, 2016

4 commits

  • …inux/kernel/git/jberg/mac80211-next

    Johannes Berg says:

    ====================
    Three fixes:
    * fix a logic bug introduced by a previous cleanup
    * fix nl80211 attribute confusion (trying to use
    a single attribute for two purposes)
    * fix a long-standing BSS leak that happens when an
    association attempt is abandoned
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • If udp_recvmsg() constantly releases sk_rmem_alloc
    for every read packet, it gives opportunity for
    producers to immediately grab spinlocks and desperately
    try adding another packet, causing false sharing.

    We can add a simple heuristic to give the signal
    by batches of ~25 % of the queue capacity.

    This patch considerably increases performance under
    flood by about 50 %, since the thread draining the queue
    is no longer slowed by false sharing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
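
    The batching heuristic described above can be modelled in user space.
    What follows is a minimal Python sketch, not the kernel code: the class
    RmemAccount and its field names (forward_deficit, rcvbuf, releases) are
    hypothetical and chosen only for illustration. It shows the reader
    deferring the release of receive memory until roughly 25 % of the
    queue capacity is pending, so the shared counter is written less often.

```python
class RmemAccount:
    """Toy model of batched receive-memory release (all names hypothetical)."""

    def __init__(self, rcvbuf):
        self.rcvbuf = rcvbuf        # queue capacity in bytes
        self.rmem_alloc = 0         # memory charged to the socket, as producers see it
        self.forward_deficit = 0    # bytes already read but not yet released
        self.releases = 0           # number of writes to the shared counter

    def enqueue(self, size):
        """Producer side: charge a packet if it still fits in the queue."""
        if self.rmem_alloc + size > self.rcvbuf:
            return False
        self.rmem_alloc += size
        return True

    def dequeue(self, size):
        """Reader side: release memory only in batches of about 25 % of capacity."""
        self.forward_deficit += size
        if self.forward_deficit < self.rcvbuf // 4:
            return
        self.rmem_alloc -= self.forward_deficit
        self.forward_deficit = 0
        self.releases += 1


acct = RmemAccount(rcvbuf=4000)
for _ in range(8):
    acct.enqueue(500)
for _ in range(8):
    acct.dequeue(500)
print(acct.releases, acct.rmem_alloc)
```

    In this toy run eight reads of 500 bytes produce four writes to the
    shared counter instead of eight, which is the effect the patch relies
    on to reduce false sharing with the producers.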
     
  • In UDP RX handler, we currently clear skb->dev before skb
    is added to receive queue, because device pointer is no longer
    available once we exit from RCU section.

    Since this first cache line is always hot, let's reuse this space
    to store skb->truesize and thus avoid a cache line miss at
    udp_recvmsg()/udp_skb_destructor time while receive queue
    spinlock is held.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pull libnvdimm fixes from Dan Williams:
    "Several fixes to the DSM (ACPI device specific method) marshaling
    implementation.

    I consider these urgent enough to send for 4.9 consideration since
    they fix the kernel's handling of ARS (Address Range Scrub) commands.
    Especially for platforms without machine-check-recovery capabilities,
    successful execution of ARS commands enables the platform to
    potentially break out of an infinite reboot problem if a media error
    is present in the boot path. There is also a one line fix for a
    device-dax read-only mapping regression.

    Commits 9a901f5495e2 ("acpi, nfit: fix extended status translations
    for ACPI DSMs") and 325896ffdf90 ("device-dax: fix private mapping
    restriction, permit read-only") are true regression fixes for changes
    introduced this cycle.

    Commit efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden ars_status
    output length handling") fixes the kernel's handling of zero-length
    results, this never would have worked in the past, but we only just
    recently discovered a BIOS implementation that emits this arguably
    spec non-compliant result.

    The remaining two commits are additional fall out from thinking
    through the implications of a zero / truncated length result of the
    ARS Status command.

    In order to mitigate the risk that these changes introduce yet more
    regressions they are backstopped by a new unit test in commit
    a7de92dac9f0 ("tools/testing/nvdimm: unit test acpi_nfit_ctl()") that
    mocks up inputs to acpi_nfit_ctl()"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    device-dax: fix private mapping restriction, permit read-only
    tools/testing/nvdimm: unit test acpi_nfit_ctl()
    acpi, nfit: fix bus vs dimm confusion in xlat_status
    acpi, nfit: validate ars_status output buffer size
    acpi, nfit, libnvdimm: fix / harden ars_status output length handling
    acpi, nfit: fix extended status translations for ACPI DSMs

    Linus Torvalds
     

09 Dec, 2016

14 commits

  • When mac80211 abandons an association attempt, it may free
    all the data structures, but inform cfg80211 and userspace
    about it only by sending the deauth frame it received, in
    which case cfg80211 has no link to the BSS struct that was
    used and will not cfg80211_unhold_bss() it.

    Fix this by providing a way to inform cfg80211 of this with
    the BSS entry passed, so that it can clean up properly, and
    use this ability in the appropriate places in mac80211.

    This isn't ideal: some code is more or less duplicated and
    tracing is missing. However, it's a fairly small change and
    it's thus easier to backport - cleanups can come later.

    Cc: stable@vger.kernel.org
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • NL80211_ATTR_MAC was used to set both the specific BSSID to be scanned
    and the random MAC address to be used when privacy is enabled. When both
    features are enabled, the BSSID and the local MAC address were both
    getting the same value, causing Probe Request frames to go out with an
    unintended DA. Hence, this has been fixed by using a different NL80211_ATTR_BSSID
    attribute to set the specific BSSID (which was the more recent addition
    in cfg80211) for a scan.

    Backwards compatibility with old userspace software is maintained to
    some extent by allowing NL80211_ATTR_MAC to be used to set the specific
    BSSID when scanning without enabling random MAC address use.

    Scanning with random source MAC address was introduced by commit
    ad2b26abc157 ("cfg80211: allow drivers to support random MAC addresses
    for scan") and the issue was introduced with the addition of the second
    user for the same attribute in commit 818965d39177 ("cfg80211: Allow a
    scan request for a specific BSSID").

    Fixes: 818965d39177 ("cfg80211: Allow a scan request for a specific BSSID")
    Signed-off-by: Vamsi Krishna
    Signed-off-by: Jouni Malinen
    Signed-off-by: Johannes Berg

    Vamsi Krishna
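
    The attribute handling described above can be summarised as a small
    decision function. This is a hedged Python sketch of the behaviour the
    commit message describes, not the cfg80211 implementation; the function
    name resolve_scan_addresses and the dict-based attribute lookup are
    assumptions made for illustration.

```python
def resolve_scan_addresses(attrs, random_mac_enabled):
    """Return (bssid, random_mac) for a scan request.

    `attrs` maps attribute names to values. This models the behaviour
    described in the commit message only.
    """
    mac = attrs.get("NL80211_ATTR_MAC")
    bssid = attrs.get("NL80211_ATTR_BSSID")
    random_mac = None

    if random_mac_enabled:
        # With randomisation on, ATTR_MAC means the local (source) address.
        random_mac = mac
    elif bssid is None:
        # Older userspace: ATTR_MAC may still select the BSSID.
        bssid = mac
    return bssid, random_mac
```

    With both attributes present and randomisation enabled, the two
    addresses no longer collapse into one value, which is the bug being
    fixed.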
     
  • This patch allows XDP prog to extend/remove the packet
    data at the head (like adding or removing header). It is
    done by adding a new XDP helper bpf_xdp_adjust_head().

    It also renames bpf_helper_changes_skb_data() to
    bpf_helper_changes_pkt_data() to better reflect
    that XDP prog does not work on skb.

    This patch adds one "xdp_adjust_head" bit to bpf_prog for the
    XDP-capable driver to check if the XDP prog requires
    bpf_xdp_adjust_head() support. The driver can then decide
    to error out during XDP_SETUP_PROG.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Martin KaFai Lau
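
    To make the head adjustment concrete, here is a toy Python model of a
    packet buffer with headroom. It is a sketch under assumptions: the class
    XdpBuff, the function xdp_adjust_head and the exact bounds check (new
    start must stay inside the buffer and leave room for an Ethernet
    header) are illustrative and not taken from the patch.

```python
ETH_HLEN = 14  # Ethernet header length in bytes


class XdpBuff:
    """Offsets into a packet buffer (a toy stand-in for the kernel structure)."""

    def __init__(self, headroom, length):
        self.data_hard_start = 0          # first usable byte of the buffer
        self.data = headroom              # current start of packet data
        self.data_end = headroom + length  # one past the last packet byte


def xdp_adjust_head(xdp, delta):
    """Move the start of packet data by `delta` bytes.

    A negative delta grows the packet into the headroom (room for a new
    header); a positive delta strips bytes from the front.
    Returns 0 on success and -1 if the new start would be out of bounds.
    """
    new_data = xdp.data + delta
    if new_data < xdp.data_hard_start or new_data > xdp.data_end - ETH_HLEN:
        return -1
    xdp.data = new_data
    return 0
```

    For example, with 256 bytes of headroom, calling xdp_adjust_head(xdp, -20)
    makes room for a 20-byte header in front of the packet, while a request
    for more than the available headroom is rejected.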
     
  • From: Woojung Huh

    Add functions to unregister phy fixup for modules.

    int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
    Unregister phy fixup from phy_fixup_list per bus_id, phy_uid &
    phy_uid_mask

    int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
    Unregister phy fixup from phy_fixup_list.
    Use it for fixup registered by phy_register_fixup_for_uid()

    int phy_unregister_fixup_for_id(const char *bus_id)
    Unregister phy fixup from phy_fixup_list.
    Use it for fixup registered by phy_register_fixup_for_id()

    Signed-off-by: Woojung Huh
    Signed-off-by: David S. Miller

    Woojung.Huh@microchip.com
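
    The relationship between the three unregister functions can be sketched
    with a plain list. This Python model is illustrative only: the matching
    rule (equal bus_id and equal masked uid), the wildcard values and the
    -1 return for "nothing removed" are assumptions, not details stated in
    the commit message.

```python
PHY_ANY_ID = "MATCH ANY PHY"   # assumed wildcard bus id
PHY_ANY_UID = 0xFFFFFFFF       # assumed wildcard uid

phy_fixup_list = []  # entries are (bus_id, phy_uid, phy_uid_mask) tuples


def phy_register_fixup(bus_id, phy_uid, phy_uid_mask):
    phy_fixup_list.append((bus_id, phy_uid, phy_uid_mask))


def phy_unregister_fixup(bus_id, phy_uid, phy_uid_mask):
    """Remove every fixup matching bus_id and the masked uid.

    Returns 0 if something was removed, -1 otherwise.
    """
    keep = [f for f in phy_fixup_list
            if not (f[0] == bus_id
                    and (f[1] & phy_uid_mask) == (phy_uid & phy_uid_mask))]
    removed = len(keep) != len(phy_fixup_list)
    phy_fixup_list[:] = keep
    return 0 if removed else -1


def phy_unregister_fixup_for_uid(phy_uid, phy_uid_mask):
    # Counterpart of a fixup registered for any bus id.
    return phy_unregister_fixup(PHY_ANY_ID, phy_uid, phy_uid_mask)


def phy_unregister_fixup_for_id(bus_id):
    # Counterpart of a fixup registered for any uid.
    return phy_unregister_fixup(bus_id, PHY_ANY_UID, 0xFFFFFFFF)
```

    The two convenience wrappers simply fill in the wildcard that the
    matching register call would have used, so a module can undo its own
    registration on unload.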
     
  • Commits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
    and 484611357c19 ("bpf: allow access into map value arrays") by themselves
    are correct, but in combination they make state equivalence ignore 'id' field
    of the register state which can lead to accepting an invalid program.

    Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
    Fixes: 484611357c19 ("bpf: allow access into map value arrays")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • sk_drops can be a frequently written field, so do not read it unless
    the application showed interest.

    Note that sk_drops can be read via inet_diag, so applications
    can avoid getting this info from every received packet.

    In the future, 'reading' sk_drops might require folding per node or per
    cpu fields, and thus become even more expensive than today.

    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Under UDP flood, many softirq producers try to add packets to
    UDP receive queue, and one user thread is burning one cpu trying
    to dequeue packets as fast as possible.

    Two parts of the per packet cost are :
    - copying payload from kernel space to user space,
    - freeing memory pieces associated with skb.

    If socket is under pressure, softirq handler(s) can try to pull in
    skb->head the payload of the packet if it fits.

    Meaning the softirq handler(s) can free/reuse the page fragment
    immediately, instead of letting udp_recvmsg() do this hundreds of usec
    later, possibly from another node.

    Additional gains :
    - We reduce skb->truesize and thus can store more packets per SO_RCVBUF
    - We avoid cache line misses at copyout() time and consume_skb() time,
    and avoid one put_page() with potential alien freeing on NUMA hosts.

    This comes at the cost of a copy, bounded to available tail room, which
    is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
    than necessary)

    This patch gave me about 5 % increase in throughput in my tests.

    skb_condense() helper could probably be used in other contexts.

    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
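
    The condensing step can be illustrated with a toy buffer model. The
    Python sketch below is not the kernel helper: the Skb class, its field
    names and the fixed 256-byte metadata overhead are hypothetical. It
    shows the two properties the message relies on, namely that the copy is
    bounded by the tail room and that truesize shrinks once the page
    fragment is released.

```python
class Skb:
    """Toy packet buffer: a linear head plus optional page-fragment bytes."""

    def __init__(self, head_len, tailroom, frag_len, frag_truesize):
        self.head_len = head_len            # bytes already in the linear area
        self.tailroom = tailroom            # free bytes after them
        self.frag_len = frag_len            # payload bytes held in page fragments
        self.frag_truesize = frag_truesize  # memory pinned by those fragments
        # 256 is an assumed fixed overhead for buffer metadata.
        self.truesize = 256 + head_len + tailroom + frag_truesize


def skb_condense(skb):
    """Pull fragment payload into the head if it fits; return True if condensed."""
    if skb.frag_len == 0 or skb.frag_len > skb.tailroom:
        return False
    # The copy is bounded by the available tail room.
    skb.head_len += skb.frag_len
    skb.tailroom -= skb.frag_len
    # The fragment can be freed right away, so it no longer counts.
    skb.truesize -= skb.frag_truesize
    skb.frag_len = 0
    skb.frag_truesize = 0
    return True
```

    A 100-byte payload with 192 bytes of tail room is condensed and its
    2048-byte fragment stops counting against the receive buffer; a
    1400-byte payload is left untouched.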
     
  • RFS is not commonly used, so add a jump label to avoid some conditionals
    in fast path.

    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The driver currently always sets the PBLx8/PBLx4 bit, which means that
    the pbl values configured via the pbl/txpbl/rxpbl DT properties are
    always multiplied by 8/4 in the hardware.

    In order to allow the DT to configure lower pbl values, while at the
    same time not changing behavior of any existing device trees using the
    pbl/txpbl/rxpbl settings, add a property to disable the multiplication
    of the pbl by 8/4 in the hardware.

    Suggested-by: Rabin Vincent
    Signed-off-by: Niklas Cassel
    Acked-by: Alexandre Torgue
    Signed-off-by: David S. Miller

    Niklas Cassel
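
    The effect of the multiplier bit is simple arithmetic. A minimal Python
    sketch follows; the function and parameter names are hypothetical and
    the opt-out is modelled as a boolean rather than a device-tree
    property.

```python
def effective_burst_length(pbl, multiplier=8, disable_multiplier=False):
    """Burst length the DMA engine uses for a programmed `pbl` value.

    `multiplier` is 8 or 4 (the PBLx8/PBLx4 bit); `disable_multiplier`
    models the new opt-out described in the commit message.
    """
    return pbl if disable_multiplier else pbl * multiplier
```

    With the bit always set, a configured pbl of 8 yields a burst of 64 and
    nothing below the multiplier itself can be selected; with the opt-out,
    the same pbl of 8 means a burst of 8.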
     
  • GMAC and newer support independent programmable burst lengths for
    DMA tx/rx. Add new optional devicetree properties representing this.

    To be backwards compatible, snps,pbl will still be valid, but
    snps,txpbl/snps,rxpbl will override the value in snps,pbl if set.

    If the IP is synthesized to use the AXI interface, there is a register
    and a matching DT property inside the optional stmmac-axi-config DT node
    for controlling burst lengths, named snps,blen.
    However, using this register, it is not possible to control tx and rx
    independently. Also, this register is not available if the IP was
    synthesized with, e.g., the AHB interface.

    Signed-off-by: Niklas Cassel
    Acked-by: Alexandre Torgue
    Signed-off-by: David S. Miller

    Niklas Cassel
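
    The override rule for the three properties can be written down in a few
    lines. This is a sketch in Python with a dict standing in for the
    device-tree node; the function name resolve_pbl and the fallback value
    used when no property is present are assumptions.

```python
def resolve_pbl(dt_props, default_pbl=8):
    """Return (tx_pbl, rx_pbl) from a dict of device-tree properties.

    `default_pbl` is an assumed fallback, not a value from the commit.
    """
    pbl = dt_props.get("snps,pbl", default_pbl)
    # The direction-specific properties win over the shared one when set.
    tx_pbl = dt_props.get("snps,txpbl", pbl)
    rx_pbl = dt_props.get("snps,rxpbl", pbl)
    return tx_pbl, rx_pbl
```

    An existing device tree that only sets snps,pbl keeps the same value in
    both directions, which is the backwards compatibility the message
    promises.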
     
  • Telit LE922A MBIM based composition does not work properly
    with altsetting toggle done in cdc_ncm_bind_common.

    This patch adds CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk
    to avoid this procedure that, instead, is mandatory for
    other modems.

    Signed-off-by: Daniele Palmas
    Reviewed-by: Bjørn Mork
    Signed-off-by: David S. Miller

    Daniele Palmas
     
  • Support matching on ICMP type and code.

    Example usage:

    tc qdisc add dev eth0 ingress

    tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 ip_proto icmp type 8 code 0 action drop

    tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 ip_proto icmpv6 type 128 code 0 action drop

    Signed-off-by: Simon Horman
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Allow dissection of ICMP(V6) type and code. This should only occur
    if a packet is ICMP(V6) and the dissector has FLOW_DISSECTOR_KEY_ICMP set.

    There are currently no users of FLOW_DISSECTOR_KEY_ICMP.
    A follow-up patch will allow FLOW_DISSECTOR_KEY_ICMP to be used by
    the flower classifier.

    Signed-off-by: Simon Horman
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Simon Horman
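
    What the dissector extracts is small: type and code are the first two
    bytes of the ICMP or ICMPv6 header. The Python sketch below models the
    two conditions named above; the function name and the boolean
    want_icmp_key (standing in for FLOW_DISSECTOR_KEY_ICMP being set) are
    illustrative assumptions.

```python
IPPROTO_ICMP = 1
IPPROTO_ICMPV6 = 58


def dissect_icmp(ip_proto, l4_payload, want_icmp_key):
    """Return {'type': ..., 'code': ...} or None.

    Dissection happens only for ICMP/ICMPv6 packets and only when the
    caller asked for the ICMP key (modelled by `want_icmp_key`).
    """
    if not want_icmp_key:
        return None
    if ip_proto not in (IPPROTO_ICMP, IPPROTO_ICMPV6):
        return None
    if len(l4_payload) < 2:
        return None
    # Type is the first header byte and code the second, in both protocols.
    return {"type": l4_payload[0], "code": l4_payload[1]}
```

    An IPv4 echo request therefore dissects to type 8, code 0, and an IPv6
    echo request to type 128, code 0.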
     
  • Add UAPI to provide set of flags for matching, where the flags
    provided from user-space are mapped to flow-dissector flags.

    The first flag allows matching on whether the packet is an
    IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag.

    Signed-off-by: Or Gerlitz
    Reviewed-by: Paul Blakey
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Or Gerlitz
     

08 Dec, 2016

2 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains a large Netfilter update for net-next,
    to summarise:

    1) Add support for stateful objects. This series provides a nf_tables
    native alternative to the extended accounting infrastructure for
    nf_tables. Two initial stateful objects are supported: counters and
    quotas. Objects are identified by a user-defined name, you can fetch
    and reset them anytime. You can also use maps to allow fast lookups
    using any arbitrary key combination. More info at:

    http://marc.info/?l=netfilter-devel&m=148029128323837&w=2

    2) On-demand registration of nf_conntrack and defrag hooks per netns.
    Register nf_conntrack hooks if we have a stateful ruleset, ie.
    state-based filtering or NAT. The new nf_conntrack_default_on sysctl
    enables this from newly created netnamespaces. Default behaviour is not
    modified. Patches from Florian Westphal.

    3) Allocate 4k chunks and then use these for x_tables counter allocation
    requests, this improves ruleset load time and also datapath ruleset
    evaluation, patches from Florian Westphal.

    4) Add support for ebpf to the existing x_tables bpf extension.
    From Willem de Bruijn.

    5) Update layer 4 checksum if any of the pseudoheader fields is updated.
    This provides a limited form of 1:1 stateless NAT that makes sense in
    specific scenarios, e.g. load balancing.

    6) Add support to flush sets in nf_tables. This series comes with a new
    set->ops->deactivate_one() indirection given that we have to walk
    over the list of set elements, then deactivate them one by one.
    The existing set->ops->deactivate() performs an element lookup that
    we don't need.

    7) Two patches to avoid cloning packets, thus speed up packet forwarding
    via nft_fwd from ingress. From Florian Westphal.

    8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
    prevent infinite loops, patch from Dwip Banerjee. And one minor
    refactoring from Gao Feng.

    9) Revisit recent log support for nf_tables netdev families: One patch
    to ensure that we correctly handle non-ethernet packets. Another
    patch to add missing logger definition for netdev. Patches from
    Liping Zhang.

    10) Three patches for nft_fib, one to address insufficient register
    initialization and another to solve incorrect (although harmless)
    byteswap operation. Moreover update xt_rpfilter and nft_fib to match
    lbcast packets with zeronet as source, eg. DHCP Discover packets
    (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.

    11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
    Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
    been broken in many-cast mode for some little time, let's give them a
    chance by placing them at the same level as other existing protocols.
    Thus, users don't explicitly have to modprobe support for this and
    NAT rules work for them. Some people point to the lack of support in
    SOHO Linux-based routers that make deployment of new protocols harder.
    I guess other middleboxes out there on the Internet are also to blame.
    Anyway, let's see if this has any impact in the midrun.

    12) Skip software SCTP checksum calculation if the NIC comes
    with SCTP checksum offload support. From Davide Caratti.

    13) Initial core factoring to prepare conversion to hook array. Three
    patches from Aaron Conole.

    14) Gao Feng made a wrong conversion to switch in the xt_multiport
    extension in a patch coming in the previous batch. Fix it in this
    batch.

    15) Get vmalloc call in sync with kmalloc flags to avoid a warning
    and likely OOM killer intervention from x_tables. From Marcelo
    Ricardo Leitner.

    16) Update Arturo Borrero's email address in all source code headers.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This reverts commit 8ab2ae655bfe384335c5b6b0d6041e0ddce26b00.

    I loved that commit because of how it explained what the problem with
    newer versions of binutils was, but the actual patch itself turns out
    to not work very well.

    It has two problems:

    - a zero CRC value isn't actually right. It happens to work for the
    case where both sides of the equation fail at giving the symbol a
    crc, but there are cases where the users of the exported symbol get
    the right crc (due to seeing the C declarations), but the actual
    exporting itself does not (due to the whole weak asm symbol issue).

    So then the module load fails after all - we did have a crc for the
    symbol, but we couldn't match it with the loaded module.

    - it seems that the alpha assembler has special semantics for the
    '.set' directive, and on alpha it doesn't actually set the value of
    the specified symbol at all, it is instead used to set various
    assembly modes (eg ".set noat" and ".set noreorder").

    So using ".set" to set the symbol value would just cause build
    failures on alpha.

    I'm sure we'll find some other workaround for these issues (hopefully
    that involves getting rid of modversions entirely some day, but people
    are also talking about just using smarter tools). But for now we'll
    just fall back on commit faaae2a58143 ("Re-enable CONFIG_MODVERSIONS in
    a slightly weaker form") that just lets a missing crc through.

    Reported-by: Jan Stancek
    Reported-by: Philip Müller
    Reported-by: Guenter Roeck
    Cc: Arnd Bergmann
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 Dec, 2016

13 commits

  • Paolo noticed a cache line miss in UDP recvmsg() to access
    sk_rxhash, sharing a cache line with sk_drops.

    sk_drops might be heavily incremented by cpus handling a flood targeting
    this socket.

    We might place sk_drops on a separate cache line, but let's try
    to avoid wasting 64 bytes per socket just for this, since we have
    other bottlenecks to take care of.

    sock_rps_record_flow() should only access sk_rxhash for connected
    flows.

    Testing sk_state for TCP_ESTABLISHED covers most of the cases for
    connected sockets, for a zero cost, since system calls using
    sock_rps_record_flow() also access sk->sk_prot which is on the
    same cache line.

    A follow up patch will provide a static_key (Jump Label) since most
    hosts do not even use RFS.

    Signed-off-by: Eric Dumazet
    Reported-by: Paolo Abeni
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
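
    The state test can be pictured with a toy socket. In the Python sketch
    below the Sock class, the rxhash_reads counter and the set used as a
    flow table are hypothetical; only the idea of checking for
    TCP_ESTABLISHED before touching sk_rxhash comes from the message.

```python
TCP_ESTABLISHED = 1  # numeric value of the established state


class Sock:
    def __init__(self, state, rxhash):
        self.state = state
        self.rxhash = rxhash
        self.rxhash_reads = 0  # counts touches of the contended cache line


def sock_rps_record_flow(sk, flow_table):
    """Record the flow hash only for connected sockets."""
    if sk.state != TCP_ESTABLISHED:
        # Unconnected socket: skip the read that shares a line with sk_drops.
        return
    sk.rxhash_reads += 1
    flow_table.add(sk.rxhash)
```

    An unconnected UDP socket under flood never reads the hash in this
    model, so the reader no longer pulls in the cache line that the
    producers keep dirtying.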
     
  • Add support for attaching an eBPF object by file descriptor.

    The iptables binary can be called with a path to an elf object or a
    pinned bpf object. Also pass the mode and path to the kernel to be
    able to return it later for iptables dump and save.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: Pablo Neira Ayuso

    Willem de Bruijn
     
  • This patch adds support for set flushing, that consists of walking over
    the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set.
    This patch requires the following changes:

    1) Add set->ops->deactivate_one() operation: This allows us to
    deactivate an element from the set element walk path, given we can
    skip the lookup that happens in ->deactivate().

    2) Add a new nft_trans_alloc_gfp() function since we need to allocate
    transactions using GFP_ATOMIC given the set walk path happens with
    held rcu_read_lock.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
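
    The difference between the two deactivation paths can be shown with a
    toy set. This Python sketch is a model only: the NftSet class, the
    lookups counter and the tuple used as a transaction record are
    hypothetical, and the atomic-allocation detail is not modelled.

```python
class NftSet:
    """Toy set whose elements stay 'active' until a transaction deactivates them."""

    def __init__(self, keys):
        self.active = {k: True for k in keys}
        self.lookups = 0  # how many key lookups were performed

    def deactivate(self, key):
        """Lookup-based path: find the element by key, then deactivate it."""
        self.lookups += 1
        if key not in self.active:
            return False
        self.active[key] = False
        return True

    def deactivate_one(self, key):
        """Walk path: the walker already holds the element, so no lookup is counted."""
        self.active[key] = False

    def flush(self):
        """Walk all elements, deactivating each; return the pending transactions."""
        transactions = []
        for key in list(self.active):
            self.deactivate_one(key)
            transactions.append(("delsetelem", key))
        return transactions
```

    Flushing a three-element set in this model yields three pending
    transactions and performs no lookups, which is the saving the new
    operation is meant to provide.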
     
  • This patch allows us to refer to stateful object dictionaries, the
    source register indicates the key data to be used to look up the
    corresponding state object. We can refer to these maps through names or,
    alternatively, the map transaction id. This allows us to refer to both
    anonymous and named maps.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch allows you to refer to stateful objects from set elements.
    This provides the infrastructure to create maps where the right hand
    side of the mapping is a stateful object.

    This allows us to build dictionaries of stateful objects, that you can
    use to perform fast lookups using any arbitrary key combination.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag
    indicates we have reached overquota.

    Add pointer to table from nft_object, so we can use it when sending the
    depletion notification to userspace.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Introduce nf_tables_obj_notify() to notify internal state changes in
    stateful objects. This is used by the quota object to report depletion
    in a follow up patch.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic
    dump-and-reset of the stateful object. It also adds support for atomic
    dump and reset of counter and quota objects.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Add a new attribute NFTA_QUOTA_CONSUMED that displays the amount of
    quota that has been already consumed. This allows us to restore the
    internal state of the quota object between reboots as well as to monitor
    how much of it has been used.

    This patch changes the logic to account for the consumed bytes, instead
    of the bytes that remain to be consumed.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
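
    Counting consumed bytes rather than remaining bytes is what makes the
    state easy to save and restore. A minimal Python sketch, assuming a
    hypothetical Quota class; the strict "greater than" comparison and the
    dump layout are illustrative choices, not taken from the patch.

```python
class Quota:
    """Toy quota object that counts consumed bytes (names are hypothetical)."""

    def __init__(self, limit, consumed=0):
        self.limit = limit
        self.consumed = consumed  # may be restored from a saved dump

    def account(self, nbytes):
        """Charge a packet; return True once the quota is exceeded."""
        self.consumed += nbytes
        return self.consumed > self.limit

    def dump(self):
        """Expose both the limit and the consumed amount."""
        return {"bytes": self.limit, "consumed": self.consumed}
```

    A dump taken before a reboot can be fed back into the constructor
    afterwards, so the object resumes from the same consumed amount instead
    of starting from zero.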
     
  • This patch adds a check to limit the number of can_filters that can be
    set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER
    are not prevented resulting in a warning.

    Reference: https://lkml.org/lkml/2016/12/2/230

    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Cc: linux-stable
    Signed-off-by: Marc Kleine-Budde

    Marc Kleine-Budde
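
    The kind of check being added can be sketched as a length validation.
    The Python below is illustrative: the function name is hypothetical,
    the 8-byte filter size assumes two 32-bit fields per filter, and the
    cap of 512 filters is an assumption that should be checked against the
    kernel headers.

```python
CAN_FILTER_SIZE = 8        # assumed size of one filter: two 32-bit fields
CAN_RAW_FILTER_MAX = 512   # assumed cap; verify against the kernel headers


def check_filter_optlen(optlen):
    """Return the filter count for a filter setsockopt, or raise ValueError."""
    if optlen % CAN_FILTER_SIZE != 0:
        raise ValueError("optlen is not a multiple of the filter size")
    if optlen > CAN_RAW_FILTER_MAX * CAN_FILTER_SIZE:
        raise ValueError("too many filters")
    return optlen // CAN_FILTER_SIZE
```

    Rejecting oversized requests before any allocation is attempted keeps
    the allocation size bounded regardless of what userspace passes in.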
     
  • David S. Miller
     
  • Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
    field of the ARS (Address Range Scrub) Status command, a firmware
    implementation may in practice return 0, 4, or 8 to indicate that there
    is no output payload to process.

    The specification states "Size of Output Buffer in bytes, including this
    field.". However, 'Output Buffer' is also the name of the entire
    payload, and earlier in the specification it states "Max Query ARS
    Status Output Buffer Size: Maximum size of buffer (including the Status
    and Extended Status fields)".

    Without this fix if the BIOS happens to return 0 it causes memory
    corruption as evidenced by this result from the acpi_nfit_ctl() unit
    test.

    ars_status00000000: 00020000 00000000 ........
    BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
    kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
    task: ffff8803332d2ec0 task.stack: ffffc9000174c000
    RIP: 0010:[] [] __memcpy+0x12/0x20
    RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246
    RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
    RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
    RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
    R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
    FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
    Stack:
    ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
    0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
    0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
    Call Trace:
    [] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
    [] nfit_test_probe+0x670/0xb1b [nfit_test]

    Cc:
    Fixes: 747ffe11b440 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
    Signed-off-by: Dan Williams

    Dan Williams
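
    The defensive handling of the size field can be sketched as follows.
    This Python model is an assumption-laden illustration, not the kernel
    fix: the function name is hypothetical, and it assumes the reported
    size counts a 4-byte status field plus the 4-byte size field, which is
    only one reading of the ambiguous specification text quoted above.

```python
STATUS_AND_SIZE_BYTES = 8  # assumed: 4-byte status plus 4-byte size field


def ars_payload_bytes(reported_size, buffer_len):
    """Bytes of scrub records to process, never more than the buffer holds."""
    if reported_size <= STATUS_AND_SIZE_BYTES:
        # 0, 4 and 8 are all treated as "no output payload".
        return 0
    usable = min(reported_size, buffer_len)
    return max(0, usable - STATUS_AND_SIZE_BYTES)
```

    A plain "size minus header" computation would go negative for a
    reported size of 0 or 4; clamping to zero and to the buffer length
    avoids turning such a value into a copy length.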
     
  • This new expression allows us to refer to existing stateful objects from
    rules.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso