08 Jan, 2014

1 commit

  • This change makes it possible to follow a recommendation of RFC4942.

    - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
    as source addresses for ICMPv6 echo reply. This sysctl is false by default
    to preserve existing behavior.
    - Add inline check ipv6_anycast_destination().
    - Use them in icmpv6_echo_reply().
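
    The source address decision described above can be modelled in user
    space roughly as follows. This is only a sketch: struct echo_request
    and echo_reply_source() are hypothetical names, and the real code
    works on kernel socket buffers rather than on a plain struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the packet state the decision depends on. */
struct echo_request {
    const char *daddr;      /* destination address of the echo request */
    bool daddr_is_unicast;  /* destination is a local unicast address */
    bool daddr_is_anycast;  /* destination is a local anycast address */
};

/*
 * Return the address to use as source of the echo reply, or NULL to
 * leave the choice to normal source address selection.
 */
const char *echo_reply_source(const struct echo_request *req,
                              bool anycast_src_echo_reply)
{
    assert(req != NULL);

    /* Replies to unicast destinations reuse that address as source. */
    if (req->daddr_is_unicast)
        return req->daddr;

    /* Anycast destinations are reused only when the sysctl is enabled. */
    if (anycast_src_echo_reply && req->daddr_is_anycast)
        return req->daddr;

    return NULL;
}
```

    With the sysctl left at its default (false), an echo request sent to
    an anycast address yields NULL here, which corresponds to the
    existing behavior of picking some other source address.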

    Reference:
    RFC4942 - IPv6 Transition/Coexistence Security Considerations
    (http://tools.ietf.org/html/rfc4942#section-2.1.6)

    2.1.6. Anycast Traffic Identification and Security

    [...]
    To avoid exposing knowledge about the internal structure of the
    network, it is recommended that anycast servers now take advantage of
    the ability to return responses with the anycast address as the
    source address if possible.

    Signed-off-by: Francois-Xavier Le Bail
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    FX Le Bail
     

07 Jan, 2014

1 commit


06 Jan, 2014

1 commit

  • Pablo Neira Ayuso says:

    ====================
    netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next tree,
    they are:

    * Add full port randomization support. Some crazy researchers found a way
    to reconstruct the secure ephemeral ports that are allocated in random mode
    by sending off-path bursts of UDP packets to overrun the socket buffer of
    the DNS resolver to trigger retransmissions, then if the timing for the
    DNS resolution done by a client is larger than usual, then they conclude
    that the port that received the burst of UDP packets is the one that was
    opened. It seems a rather aggressive method to me, but it seems to work for
    them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
    new NAT mode to fully randomize ports using prandom.

    * Add a new classifier to x_tables based on the socket net_cls set via
    cgroups. This includes two patches to prepare the field as requested by
    Zefan Li. Also from Daniel Borkmann.

    * Use prandom instead of get_random_bytes in several locations of the
    netfilter code, from Florian Westphal.

    * Allow the use of CTA_MARK_MASK in ctnetlink when mangling the conntrack
    mark, also from Florian Westphal.

    * Fix compilation warning due to unused variable in IPVS, from Geert
    Uytterhoeven.

    * Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

    * Add IPComp extension to x_tables, from Fan Du.
    ====================
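
    The idea behind the full port randomization item can be sketched as
    below. This is an illustration under stated assumptions, not the
    netfilter code: pick_start_port() is a hypothetical helper, and
    random_value stands in for the output of a PRNG such as prandom.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the first port to try inside [min, max].
 * In a fully random mode the offset comes straight from a PRNG value,
 * so it does not depend on the addresses or ports of the connection.
 */
uint16_t pick_start_port(uint16_t min, uint16_t max, uint32_t random_value)
{
    assert(min <= max);

    /* Number of ports in the inclusive range, computed in 32 bits. */
    uint32_t range_size = (uint32_t)max - min + 1;

    return (uint16_t)(min + random_value % range_size);
}
```

    A caller would then probe ports starting at the returned value and
    wrap around inside the range until it finds one that is unused.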

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Jan, 2014

1 commit

  • It would be useful, e.g. in a server or desktop environment, to have
    a facility for fine-grained "per application" or "per application
    group" firewall policies. Probably, users in the mobile,
    embedded area (e.g. Android based) with different security policy
    requirements for application groups could have great benefit from
    that as well. For example, with a little bit of configuration effort,
    an admin could whitelist well-known applications, and thus block
    otherwise unwanted "hard-to-track" applications like [1] from a
    user's machine. Blocking is just one example; netfilter allows for
    many other scenarios/policies, e.g. fine-grained settings for
    where applications are allowed to connect/send traffic to, application
    traffic marking/conntracking, application-specific packet mangling,
    and so on.

    Implementation of PID-based matching would not be appropriate
    as they frequently change, and child tracking would make that
    even more complex and ugly. Cgroups would be a perfect candidate
    for accomplishing that as they associate a set of tasks with a
    set of parameters for one or more subsystems, in our case the
    netfilter subsystem, which, of course, can be combined with other
    cgroup subsystems into something more complex if needed.

    As mentioned, to overcome this constraint, such processes could
    be placed into one or multiple cgroups where different fine-grained
    rules can be defined depending on the application scenario, while
    e.g. everything else that is not part of that could be dropped (or
    vice versa), thus making life harder for unwanted processes to
    communicate to the outside world. So, we make use of cgroups here
    to track jobs and limit their resources in terms of iptables
    policies; in other words, limiting, tracking, etc what they are
    allowed to communicate.

    In our case we're working on outgoing traffic, based on the local
    socket it originated from. Also, one doesn't even need to have
    a priori knowledge of the application internals regarding their
    particular use of ports or protocols. Matching is *extremely*
    lightweight as we just test for the sk_classid marker of sockets,
    originating from net_cls. net_cls and netfilter do not contradict
    each other; in fact, each construct can live as standalone or they
    can be used in combination with each other, which is perfectly fine,
    plus it serves Tejun's requirement to not introduce a new cgroups
    subsystem. Through this, we result in a very minimal and efficient
    module, and don't add anything except netfilter code.
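
    To illustrate why such a match is cheap, here is a user-space sketch
    of the comparison. The struct and function names are hypothetical
    stand-ins; only the idea of comparing the socket's class id against
    the rule's id, with optional inversion, is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket carrying the net_cls class id. */
struct sock_sketch {
    uint32_t classid;   /* plays the role of sk_classid */
};

/* Hypothetical rule: class id to match, and whether "!" was given. */
struct cgroup_rule {
    uint32_t id;
    bool invert;
};

/* One integer comparison per packet, plus the inversion flag. */
bool cgroup_rule_matches(const struct cgroup_rule *rule,
                         const struct sock_sketch *sk)
{
    assert(rule != NULL);

    /* Packets without a local socket cannot be classified this way. */
    if (sk == NULL)
        return false;

    return (rule->id == sk->classid) != rule->invert;
}
```

    A rule with id 1 and invert set therefore matches every socket whose
    class id is not 1, which is what a "! --cgroup 1" rule expresses.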

    One possible, minimal usage example (many other iptables options
    can be applied obviously):

    1) Configuring cgroups if not already done, e.g.:

    mkdir /sys/fs/cgroup/net_cls
    mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
    mkdir /sys/fs/cgroup/net_cls/0
    echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
    (resp. a real flow handle id for tc)

    2) Configuring netfilter (iptables-nftables), e.g.:

    iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

    3) Running applications, e.g.:

    ping 208.67.222.222
    echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
    64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
    [...]
    ping 208.67.220.220
    ping: sendmsg: Operation not permitted
    [...]
    echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
    64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
    [...]

    Of course, real-world deployments would make use of cgroups user
    space toolsuite, or own custom policy daemons dynamically moving
    applications from/to various cgroups.
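
    A custom policy daemon of the kind mentioned above would do the
    equivalent of the "echo PID > .../tasks" steps programmatically. A
    minimal sketch, assuming the cgroup v1 layout used in the example;
    cgroup_add_task() is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Write a PID into a cgroup "tasks" file, e.g.
 * /sys/fs/cgroup/net_cls/0/tasks from the example above.
 * Returns 0 on success and -1 on any error.
 */
int cgroup_add_task(const char *tasks_path, pid_t pid)
{
    assert(tasks_path != NULL);

    FILE *f = fopen(tasks_path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%ld\n", (long)pid) > 0;

    /* The write may only be reported as failed when flushing on close. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

    Calling cgroup_add_task("/sys/fs/cgroup/net_cls/0/tasks", 1799)
    would correspond to the first "echo" step shown in the transcript.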

    [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

    Signed-off-by: Daniel Borkmann
    Cc: Tejun Heo
    Cc: cgroups@vger.kernel.org
    Acked-by: Li Zefan
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     

01 Jan, 2014

1 commit


31 Dec, 2013

1 commit


25 Dec, 2013

2 commits

  • Pull block fixes from Jens Axboe:
    - fix for a memory leak on certain unplug events
    - a collection of bcache fixes from Kent and Nicolas
    - a few null_blk fixes and updates from Matias
    - a marking of static of functions in the stec pci-e driver

    * 'for-linus' of git://git.kernel.dk/linux-block:
    null_blk: support submit_queues on use_per_node_hctx
    null_blk: set use_per_node_hctx param to false
    null_blk: corrections to documentation
    null_blk: warning on ignored submit_queues param
    null_blk: refactor init and init errors code paths
    null_blk: documentation
    null_blk: mem garbage on NUMA systems during init
    drivers: block: Mark the functions as static in skd_main.c
    bcache: New writeback PD controller
    bcache: bugfix for race between moving_gc and bucket_invalidate
    bcache: fix for gc and writeback race
    bcache: bugfix - moving_gc now moves only correct buckets
    bcache: fix for gc crashing when no sectors are used
    bcache: Fix heap_peek() macro
    bcache: Fix for can_attach_cache()
    bcache: Fix dirty_data accounting
    bcache: Use uninterruptible sleep in writeback
    bcache: kthread don't set writeback task to INTERUPTIBLE
    block: fix memory leaks on unplugging block device
    bcache: fix sparse non static symbol warning

    Linus Torvalds
     
  • Pull libata fixes from Tejun Heo:
    "There's one interesting commit - "libata, freezer: avoid block device
    removal while system is frozen". It's an ugly hack working around a
    deadlock condition between driver core resume and block layer device
    removal paths through freezer which was made more reproducible by
    writeback being converted to workqueue some releases ago. The bug has
    nothing to do with libata, but it's just a workaround which is easy to
    backport. After discussion, Rafael and I seem to agree that we don't
    really need kernel freezables - both kthread and workqueue. There are
    a few specific workqueues which constitute PM operations and require
    freezing, which will be converted to use workqueue_set_max_active()
    instead. All other kernel freezer uses are planned to be removed,
    followed by the removal of kthread and workqueue freezer support,
    hopefully.

    Others are device-specific fixes. The most notable is the addition of
    NO_NCQ_TRIM which is used to disable queued TRIM commands to Micron
    M500 SSDs which otherwise suffer data corruption"

    * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
    libata, freezer: avoid block device removal while system is frozen
    libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs
    libata: disable a disk via libata.force params
    ahci: bail out on ICH6 before using AHCI BAR
    ahci: imx: Explicitly clear IMX6Q_GPR13_SATA_MPLL_CLK_EN
    libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8

    Linus Torvalds
     

22 Dec, 2013

2 commits


21 Dec, 2013

2 commits


20 Dec, 2013

2 commits

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2013-12-19

    1) Use the user supplied policy index instead of a generated one
    if present. From Fan Du.

    2) Make xfrm migration namespace aware. From Fan Du.

    3) Make the xfrm state and policy locks namespace aware. From Fan Du.

    4) Remove ancient sleeping when the SA is in acquire state,
    we now queue packets to the policy instead. This replaces the
    sleeping code.

    5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
    possibility to sleep. The sleeping code is gone, so remove it.

    6) Check user specified spi for IPComp. The spi for IPcomp is only
    16 bit wide, so check for a valid value. From Fan Du.

    7) Export verify_userspi_info to check for valid user supplied spi ranges
    with pfkey and netlink. From Fan Du.

    8) RFC3173 states that if the total size of a compressed payload and the IPComp
    header is not smaller than the size of the original payload, the IP datagram
    must be sent in the original non-compressed form. These packets are dropped
    by the inbound policy check because they are not transformed. Document the need
    to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

    Please pull or let me know if there are problems.
    ====================
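
    The range check from items 6) and 7) can be sketched in user space
    as follows. This is an approximation under assumptions, not the
    exported kernel function: userspi_range_valid() is a hypothetical
    name, and only the 16-bit limit for IPComp comes from the text.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a user supplied [min, max] spi range for a given IP
 * protocol. IPComp identifiers are only 16 bits wide, so anything
 * at or above 0x10000 cannot be a valid upper bound there.
 */
bool userspi_range_valid(int proto, uint32_t min, uint32_t max)
{
    assert(proto >= 0 && proto <= 255);

    if (proto == IPPROTO_COMP && max >= 0x10000)
        return false;

    /* An empty range is rejected for every protocol. */
    return min <= max;
}
```

    For example, a request for IPPROTO_COMP with max set to 0x10000 is
    rejected, while the same bounds remain acceptable for IPPROTO_ESP.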

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Marc Kleine-Budde says:

    ====================
    this is a pull request of four patches for net-next/master.

    There is one patch by Markus Pargmann, which speeds up the c_can
    driver, a patch by John Whitmore which updates the in tree
    documentation. A patch by Jeff Kirsher which replaces the FSF's address
    by a link and a patch by Alexander Shiyan which converts the mcp251x
    driver to make use of managed resources.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Dec, 2013

4 commits

  • Add description of module and its parameters.

    Signed-off-by: Matias Bjorling
    Signed-off-by: Jens Axboe

    Matias Bjorling
     
  • Pull crypto key patches from David Howells:
    "There are four items:

    - A patch to fix X.509 certificate gathering. The problem was that I
    was coming up with a different path for signing_key.x509 in the
    build directory if it didn't exist to if it did exist. This meant
    that the X.509 cert container object file would be rebuilt on the
    second rebuild in a build directory and the kernel would get
    relinked.

    - Unconditionally remove files generated by SYSTEM_TRUSTED_KEYRING=y
    when doing make mrproper.

    - Actually initialise the persistent-keyring semaphore for
    init_user_ns. I have no idea why this works at all for users in
    the base user namespace unless it's something to do with systemd
    containerising the system.

    - Documentation for module signing"

    * 'keys-devel' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    Add Documentation/module-signing.txt file
    KEYS: fix uninitialized persistent_keyring_register_sem
    KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y
    X.509: Fix certificate gathering

    Linus Torvalds
     
  • This new mode discards all incoming fragmentation-needed notifications
    as I guess was originally intended with this knob. To not break backward
    compatibility too much, I only added a special case for mode 2 in the
    receiving path.

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Conflicts:
    drivers/net/ethernet/intel/i40e/i40e_main.c
    drivers/net/macvtap.c

    Both minor merge hassles, simple overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Dec, 2013

1 commit


17 Dec, 2013

2 commits

  • Changed MAINTAINERS file to add Documentation/networking/can.txt to the list of
    maintained files.

    can.txt:
    - Globally changed Socket CAN to SocketCAN
    - Removed section 3.3 from the document
    - Updated Section 7
    - Corrected a few simple typos

    Acked-by: Oliver Hartkopp
    Signed-off-by: John Whitmore
    Signed-off-by: Marc Kleine-Budde

    John Whitmore
     
  • A user on StackExchange had a failing SSD that's soldered directly
    onto the motherboard of his system. The BIOS does not give any option
    to disable it at all, so he can't just hide it from the OS via the
    BIOS.

    The old IDE layer had hdX=noprobe override for situations like this,
    but that was never ported to the libata layer.

    This patch implements a disable flag for libata.force.

    Example use:

    libata.force=2.0:disable

    [v2 of the patch, removed the nodisable flag per Tejun Heo]

    Signed-off-by: Robin H. Johnson
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org
    Link: http://unix.stackexchange.com/questions/102648/how-to-tell-linux-kernel-3-0-to-completely-ignore-a-failing-disk
    Link: http://askubuntu.com/questions/352836/how-can-i-tell-linux-kernel-to-completely-ignore-a-disk-as-if-it-was-not-even-co
    Link: http://superuser.com/questions/599333/how-to-disable-kernel-probing-for-drive

    Robin H. Johnson
     

16 Dec, 2013

2 commits

  • Create Documentation/networking/ipsec.txt to document IPsec
    corner cases and other information that will be useful to users
    deploying IPsec.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     
  • Pull networking fixes from David Miller:

    1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't
    figure out why it breaks things.

    2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it
    was basically doing "x == x", from Dave Jones.

    3) Freescale FEC driver was DMA mapping the wrong number of bytes, from
    Sebastian Siewior.

    4) Blackhole and prohibit routes in ipv6 were not doing the right thing
    because their ->input and ->output methods were not being assigned
    correctly. Now they behave properly like their ipv4 counterparts.
    From Kamala R.

    5) Several drivers advertise the NETIF_F_FRAGLIST capability, but
    really do not support this feature and will send garbage packets if
    fed fraglist SKBs. From Eric Dumazet.

    6) Fix long standing user triggerable BUG_ON over loopback in RDS
    protocol stack, from Venkat Venkatsubra.

    7) Several not so common code paths can potentially try to invoke
    packet scheduler actions that might be NULL without checking. Shore
    things up by either 1) defining a method as mandatory and erroring
    on registration if that method is NULL 2) defining a method as
    optional and the registration function hooks up a default
    implementation when NULL is seen. From Jamal Hadi Salim.

    8) Fix fragment detection in xen-netback driver, from Paul Durrant.

    9) Kill dangling enter_memory_pressure method in cg_proto ops, from
    Eric W Biederman.

    10) SKBs that traverse namespaces should have their local_df cleared,
    from Hannes Frederic Sowa.

    11) IOCB file position is not being updated by macvtap_aio_read() and
    tun_chr_aio_read(). From Zhi Yong Wu.

    12) Don't free virtio_net netdev before releasing all of the NAPI
    instances. From Andrey Vagin.

    13) Procfs entry leak in xt_hashlimit, from Sergey Popovich.

    14) IPv6 routes that are not cached routes should not count against the
    garbage collection limits. We had this almost right, but were
    missing handling addrconf generated routes properly. From Hannes
    Frederic Sowa.

    15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL
    route info when they are called, from Stefan Tomanek.

    16) TUN and MACVTAP have had broken truncated packet signalling for some time,
    fix from Jason Wang.

    17) Fix use after free in __udp4_lib_rcv(), from Eric Dumazet.

    18) xen-netback does not interpret the NAPI budget properly for TX work,
    fix from Paul Durrant.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
    igb: Fix for issue where values could be too high for udelay function.
    i40e: fix null dereference
    xen-netback: fix gso_prefix check
    net: make neigh_priv_len in struct net_device 16bit instead of 8bit
    drivers: net: cpsw: fix for cpsw crash when build as modules
    xen-netback: napi: don't prematurely request a tx event
    xen-netback: napi: fix abuse of budget
    sch_tbf: use do_div() for 64-bit divide
    udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
    net:fec: remove duplicate lines in comment about errata ERR006358
    Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"
    8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
    xen-netback: make sure skb linear area covers checksum field
    net: smc91x: Fix device tree based configuration so it's usable
    udp: ipv4: fix potential use after free in udp_v4_early_demux()
    macvtap: signal truncated packets
    tun: unbreak truncated packet signalling
    net: sched: htb: fix the calculation of quantum
    net: sched: tbf: fix the calculation of max_size
    micrel: add support for KSZ8041RNLI
    ...

    Linus Torvalds
     

14 Dec, 2013

1 commit

  • Pull device mapper fixes from Mike Snitzer:
    "A set of device-mapper fixes for 3.13.

    A fix for possible memory corruption during DM table load, fix a
    possible leak of snapshot space in case of a crash, fix a possible
    deadlock due to a shared workqueue in the delay target, fix to
    initialize read-only module parameters that are used to export metrics
    for dm stats and dm bufio.

    Quite a few stable fixes were identified for both the thin-
    provisioning and caching targets as a result of increased regression
    testing using the device-mapper-test-suite (dmts). The most notable
    of these are the reference counting fixes for the space map btree that
    is used by the dm-array interface -- without these the dm-cache
    metadata will leak, resulting in dm-cache devices running out of
    metadata blocks. Also, some important fixes related to the
    thin-provisioning target's transition to read-only mode on error"

    * tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm array: fix a reference counting bug in shadow_ablock
    dm space map: disallow decrementing a reference count below zero
    dm stats: initialize read-only module parameter
    dm bufio: initialize read-only module parameters
    dm cache: actually resize cache
    dm cache: update Documentation for invalidate_cblocks's range syntax
    dm cache policy mq: fix promotions to occur as expected
    dm thin: allow pool in read-only mode to transition to read-write mode
    dm thin: re-establish read-only state when switching to fail mode
    dm thin: always fallback the pool mode if commit fails
    dm thin: switch to read-only mode if metadata space is exhausted
    dm thin: switch to read only mode if a mapping insert fails
    dm space map metadata: return on failure in sm_metadata_new_block
    dm table: fail dm_table_create on dm_round_up overflow
    dm snapshot: avoid snapshot space leak on crash
    dm delay: fix a possible deadlock due to shared workqueue

    Linus Torvalds
     

13 Dec, 2013

3 commits

  • This patch adds the Documentation/module-signing.txt file that is
    currently missing from the Documentation directory. The init/Kconfig
    file references the Documentation/module-signing.txt file to explain
    how kernel module signing works. This patch supplies this documentation.

    Signed-off-by: James Solner
    Signed-off-by: David Howells

    James Solner
     
  • Pull media fixes from Mauro Carvalho Chehab:
    "A dvb core deadlock fix, a couple videobuf2 fixes and a series of media
    driver fixes"

    * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (30 commits)
    [media] videobuf2-dma-sg: fix possible memory leak
    [media] vb2: regression fix: always set length field.
    [media] mt9p031: Include linux/of.h header
    [media] rtl2830: add parent for I2C adapter
    [media] media: marvell-ccic: use devm to release clk
    [media] ths7303: Declare as static a private function
    [media] em28xx-video: Swap release order to avoid lock nesting
    [media] usbtv: Add support for PAL video source
    [media] media_tree: Fix spelling errors
    [media] videobuf2: Add support for file access mode flags for DMABUF exporting
    [media] radio-shark2: Mark shark_resume_leds() inline to kill compiler warning
    [media] radio-shark: Mark shark_resume_leds() inline to kill compiler warning
    [media] af9035: unlock on error in af9035_i2c_master_xfer()
    [media] af9033: fix broken I2C
    [media] v4l: omap3isp: Don't check for missing get_fmt op on remote subdev
    [media] af9035: fix broken I2C and USB I/O
    [media] wm8775: fix broken audio routing
    [media] marvell-ccic: drop resource free in driver remove
    [media] tef6862/radio-tea5764: actually assign clamp result
    [media] cx231xx: use after free on error path in probe
    ...

    Linus Torvalds
     
  • Pull misc keyrings fixes from David Howells:
    "These break down into five sets:

    - A patch to error handling in the big_key type for huge payloads.
    If the payload is larger than the "low limit" and the backing store
    allocation fails, then big_key_instantiate() doesn't clear the
    payload pointers in the key, assuming them to have been previously
    cleared - but only one of them is.

    Unfortunately, the garbage collector still calls big_key_destroy()
    when it sees one of the pointers with a weird value in it (and not
    NULL) which it then tries to clean up.

    - Three patches to fix the keyring type:

    * A patch to fix the hash function to correctly divide keyrings off
    from keys in the topology of the tree inside the associative
    array. This is only a problem if searching through nested
    keyrings - and only if the hash function incorrectly puts a
    keyring outside of the 0 branch of the root node.

    * A patch to fix keyrings' use of the associative array. The
    __key_link_begin() function initially passes a NULL key pointer
    to assoc_array_insert() on the basis that it's holding a place in
    the tree whilst it does more allocation and stuff.

    This is only a problem when a node contains 16 keys that match at
    that level and we want to add an also matching 17th. This should
    easily be manufactured with a keyring full of keyrings (without
    chucking any other sort of key into the mix) - except for (a)
    above which makes it on average adding the 65th keyring.

    * A patch to fix searching down through nested keyrings, where any
    keyring in the set has more than 16 keyrings and none of the
    first keyrings we look through has a match (before the tree
    iteration needs to step to a more distal node).

    Test in keyutils test suite:

    http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1

    - A patch to fix the big_key type's use of a shmem file as its
    backing store causing audit messages and LSM check failures. This
    is done by setting S_PRIVATE on the file to avoid LSM checks on the
    file (access to the shmem file goes through the keyctl() interface
    and so is gated by the LSM that way).

    This isn't normally a problem if a key is used by the context that
    generated it - and it's currently only used by libkrb5.

    Test in keyutils test suite:

    http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e

    - A patch to add a generated file to .gitignore.

    - A patch to fix the alignment of the system certificate data such
    that it works on s390. As I understand it, on the S390 arch,
    symbols must be 2-byte aligned because loading the address discards
    the least-significant bit"

    * tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    KEYS: correct alignment of system_certificate_list content in assembly file
    Ignore generated file kernel/x509_certificate_list
    security: shmem: implement kernel private shmem inodes
    KEYS: Fix searching of nested keyrings
    KEYS: Fix multiple key add into associative array
    KEYS: Fix the keyring hash function
    KEYS: Pre-clear struct key on allocation

    Linus Torvalds
     

12 Dec, 2013

2 commits

  • This patch significantly updates the BPF documentation and describes
    its internal architecture, Linux extensions, and handling of the
    kernel's BPF and JIT engine, plus documents how development can be
    facilitated with the help of bpf_dbg, bpf_asm, bpf_jit_disasm.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Commit 89ce376c6bdc (drivers/net: Use of_match_ptr() macro in smc91x.c)
    added minimal device tree support to smc91x, but it's not working on
    many platforms because of the lack of some key configuration bits.

    Fix the issue by parsing the necessary configuration like the
    smc911x driver is doing. As most smc91x users seem to use 16-bit
    access, let's default to that if no reg-io-width is specified.

    Cc: Nicolas Pitre
    Cc: Mark Rutland
    Cc: netdev@vger.kernel.org
    Cc: devicetree@vger.kernel.org
    Acked-by: Nishanth Menon
    Signed-off-by: Tony Lindgren
    Signed-off-by: David S. Miller

    Tony Lindgren
     

11 Dec, 2013

1 commit


10 Dec, 2013

5 commits

  • There are quite a lot of drivers touching a PHY device MII_BMCR
    register to reset the PHY without taking care of:

    1) ensuring that BMCR_RESET is cleared after a given timeout
    2) the PHY state machine resuming to the proper state and re-applying
    potentially changed settings such as auto-negotiation

    Introduce phy_poll_reset() which will take care of polling the MII_BMCR
    for the BMCR_RESET bit to be cleared after a given timeout or return a
    timeout error code.

    In order to make sure the PHY is in a correct state, phy_init_hw() first
    issues a software reset through MII_BMCR and then applies any fixups.
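
    The polling behaviour described above can be sketched outside the
    kernel like this. The names poll_reset_sketch(), mdio_read_fn,
    struct fake_phy and fake_phy_read() are hypothetical, the retry
    count is chosen by the caller, and the sleep between polls that a
    real driver needs is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MII_BMCR   0x00    /* basic mode control register */
#define BMCR_RESET 0x8000  /* software reset bit, cleared by the PHY */

/* Hypothetical accessor: register value, or a negative errno. */
typedef int (*mdio_read_fn)(void *ctx, int reg);

/* Poll MII_BMCR until BMCR_RESET clears or the retries run out. */
int poll_reset_sketch(mdio_read_fn read_reg, void *ctx, unsigned int retries)
{
    assert(read_reg != NULL);

    while (retries--) {
        int val = read_reg(ctx, MII_BMCR);

        if (val < 0)
            return val;         /* bus error, pass it on */
        if (!(val & BMCR_RESET))
            return 0;           /* reset completed */
    }
    return -ETIMEDOUT;
}

/* Simulated PHY that reports reset for a fixed number of reads. */
struct fake_phy {
    int polls_until_ready;
};

int fake_phy_read(void *ctx, int reg)
{
    struct fake_phy *phy = ctx;

    (void)reg;
    if (phy->polls_until_ready > 0) {
        phy->polls_until_ready--;
        return BMCR_RESET;
    }
    return 0;
}
```

    With a fake PHY that needs three reads, a call with twelve retries
    returns 0; one that never clears the bit ends in -ETIMEDOUT.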

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • This patch introduces a PACKET_QDISC_BYPASS socket option, that
    allows for using a similar xmit() function as in pktgen instead
    of taking the dev_queue_xmit() path. This can be very useful when
    PF_PACKET applications are required to be used in a similar
    scenario as pktgen, but with full, flexible packet payload that
    needs to be provided, for example.

    By default, nothing changes in behaviour for normal PF_PACKET
    TX users, so everything stays as is for applications. New users,
    however, can now set PACKET_QDISC_BYPASS if needed to i) prevent
    their own packets from reentering packet_rcv() and ii) push the
    frame directly to the driver.
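
    From user space, opting in would look roughly like the sketch
    below. enable_qdisc_bypass() is a hypothetical wrapper name; the
    fallback defines are only there in case older headers lack the
    constants, and the descriptor is assumed to be a PF_PACKET socket.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif
#ifndef PACKET_QDISC_BYPASS
#define PACKET_QDISC_BYPASS 20  /* value used by linux/if_packet.h */
#endif

/*
 * Ask the kernel to bypass the qdisc layer for this packet socket.
 * Returns 0 on success, -1 with errno set on failure.
 */
int enable_qdisc_bypass(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
                      &one, sizeof(one));
}
```

    An application would create its socket first, for instance with
    socket(PF_PACKET, SOCK_RAW, 0), and then call the wrapper once
    before it starts transmitting.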

    In doing so we can increase pps (here 64 byte packets) for
    PF_PACKET a bit:

    # CPUs -- QDISC_BYPASS -- qdisc path -- qdisc path[**]
    1 CPU == 1,509,628 pps -- 1,208,708 -- 1,247,436
    2 CPUs == 3,198,659 pps -- 2,536,012 -- 1,605,779
    3 CPUs == 4,787,992 pps -- 3,788,740 -- 1,735,610
    4 CPUs == 6,173,956 pps -- 4,907,799 -- 1,909,114
    5 CPUs == 7,495,676 pps -- 5,956,499 -- 2,014,422
    6 CPUs == 9,001,496 pps -- 7,145,064 -- 2,155,261
    7 CPUs == 10,229,776 pps -- 8,190,596 -- 2,220,619
    8 CPUs == 11,040,732 pps -- 9,188,544 -- 2,241,879
    9 CPUs == 12,009,076 pps -- 10,275,936 -- 2,068,447
    10 CPUs == 11,380,052 pps -- 11,265,337 -- 1,578,689
    11 CPUs == 11,672,676 pps -- 11,845,344 -- 1,297,412
    [...]
    20 CPUs == 11,363,192 pps -- 11,014,933 -- 1,245,081

    [**]: qdisc path with packet_rcv(), how probably most people
    seem to use it (hopefully not anymore if not needed)

    The test was done using a modified trafgen, sending a simple
    static 64 bytes packet, on all CPUs. The trick in the fast
    "qdisc path" case, is to avoid reentering packet_rcv() by
    setting the RAW socket protocol to zero, like:
    socket(PF_PACKET, SOCK_RAW, 0);

    Tradeoffs are documented as well in this patch, clearly, if
    queues are busy, we will drop more packets, tc disciplines are
    ignored, and these packets are not visible to taps anymore. For
    a pktgen like scenario, we argue that this is acceptable.

    The pointer to the xmit function has been placed in packet
    socket structure hole between cached_dev and prot_hook that
    is hot anyway as we're working on cached_dev in each send path.

    Done in joint work together with Jesper Dangaard Brouer.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
    Daniel's direct transmit changes depend upon.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit e40526cb20b5 introduced a cached dev pointer, that gets
    hooked into register_prot_hook(), __unregister_prot_hook() to
    update the device used for the send path.

    We need to fix this up, as otherwise this will not work with
    sockets created with protocol = 0, plus with sll_protocol = 0
    passed via sockaddr_ll when doing the bind.

    So instead, assign the pointer directly. The compiler can inline
    these helper functions automagically.

    While at it, also assume the cached dev fast-path as likely(),
    and document this variant of socket creation as it seems it is
    not widely used (seems not even the author of TX_RING was aware
    of that in his reference example [1]). Tested with reproducer
    from e40526cb20b5.

    [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example

    Fixes: e40526cb20b5 ("packet: fix use after free race in send path when dev is released")
    Signed-off-by: Daniel Borkmann
    Tested-by: Salam Noureddine
    Tested-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
    Currently it is not possible for userspace to map a DMABUF exported buffer
    with write permissions. This patch allows O_RDONLY/O_RDWR to also be passed
    when exporting the buffer, so that userspace may map it with write permissions.

    Signed-off-by: Philipp Zabel
    Signed-off-by: Sylwester Nawrocki
    Signed-off-by: Mauro Carvalho Chehab

    Philipp Zabel
     

09 Dec, 2013

1 commit

  • Pull char/misc driver fixes from Greg KH:
    "Nothing huge, just a few small bugfixes for problems reported, and a
    device id update"

    * tag 'char-misc-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    mei: add 9 series PCH mei device ids
    drivers/char/i8k.c: add Dell XPLS L421X
    MAINTAINERS: add HSI subsystem
    misc: mic: Suppress memory space sparse warnings
    misc: mic: Fix endianness issues.
    misc: mic: Fix user space namespace pollution from mic_common.h.
    misc: mic: Bug fix for sysfs poll usage.
    misc: mic: Minor bug fix in 'retry' loops.
    misc: mic: Change mic_notify(...) to return true.
    extcon: remove freed groups caused the panic or warning in unregister flow
    extcon: arizona: Get pdata from arizona structure not device

    Linus Torvalds
     

07 Dec, 2013

4 commits

    Add a new check for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to reduce
    the number of OR operations used in the ether_addr_equal comparison,
    which very slightly improves the function's performance.

    Simplify the ether_addr_equal_64bits implementation.
    Integrate and remove the zap_last_2bytes helper as it's now
    used only once.

    Remove the now unused compare_ether_addr function.

    Update the unaligned-memory-access documentation to remove the
    compare_ether_addr description and show how unaligned accesses
    could occur with ether_addr_equal.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
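
    The saving described in the commit above can be modeled outside the
    kernel. The sketch below is not the kernel implementation; it is a
    Python model, with hypothetical function names, of the two comparison
    strategies: three 16-bit XORs joined by two ORs, versus one 32-bit XOR
    and one 16-bit XOR joined by a single OR (the variant that presumes
    cheap unaligned loads).

```python
import struct


def ether_addr_equal_u16(a: bytes, b: bytes) -> bool:
    """Generic model: three 16-bit XORs combined with two ORs."""
    a0, a1, a2 = struct.unpack("=3H", a)
    b0, b1, b2 = struct.unpack("=3H", b)
    return ((a0 ^ b0) | (a1 ^ b1) | (a2 ^ b2)) == 0


def ether_addr_equal_u32(a: bytes, b: bytes) -> bool:
    """Unaligned-access model: one 32-bit and one 16-bit XOR, one OR."""
    a_lo, a_hi = struct.unpack("=IH", a)
    b_lo, b_hi = struct.unpack("=IH", b)
    return ((a_lo ^ b_lo) | (a_hi ^ b_hi)) == 0


mac1 = bytes.fromhex("001122334455")
mac2 = bytes.fromhex("001122334456")
same = ether_addr_equal_u32(mac1, mac1)
different = ether_addr_equal_u32(mac1, mac2)
```

    Both functions must agree for every pair of 6-byte addresses; the
    second one simply reaches the answer with fewer operations.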
     
  • The 'max-speed' property is optional but defined in the ePAPR
    specification and now supported by the Linux Device Tree parsing
    infrastructure.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • John W. Linville says:

    ====================
    Please pull this batch of updates intended for the 3.14 stream...

    For the mac80211 bits, Johannes says:

    "I have various improvements/cleanups/fixes all over, but the shortlog
    shows that Luis's regulatory work and mesh work from the cozybit folks
    are the biggest ones, along with the CSA fixes."

    Along with that, we have big batches of updates to brcmfmac, rtlwifi,
    and ath9k. There are updates to wcn36xx, rt2x00, and a handful of
    others as well.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • With the introduction of TCP Small Queues, TSO auto sizing, and TCP
    pacing, we can implement Automatic Corking in the kernel, to help
    applications doing small write()/sendmsg() to TCP sockets.

    The idea is to change tcp_push() to check whether the current skb
    payload is under the optimal skb size (a multiple of MSS bytes).

    If under 'size_goal', and at least one packet is still in Qdisc or
    NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
    will be delayed up to TX completion time.

    This delay might allow the application to coalesce more bytes
    in the skb in following write()/sendmsg()/sendfile() system calls.

    The exact duration of the delay depends on the dynamics
    of the system, and might be zero if no packet for this flow
    is actually held in Qdisc or NIC TX ring.

    Using FQ/pacing is a way to increase the probability of
    autocorking being triggered.

    Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
    this feature and default it to 1 (enabled).

    Add a new SNMP counter: nstat -a | grep TcpExtTCPAutoCorking
    This counter is incremented every time we detect that an skb was
    underused and its flush was deferred.

    Tested:

    Interesting effects when using line buffered commands under ssh.

    Excellent performance results in terms of CPU usage and total throughput.

    lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    9410.39

    Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

       35209.439626 task-clock              # 2.901 CPUs utilized
              2,294 context-switches        # 0.065 K/sec
                101 CPU-migrations          # 0.003 K/sec
              4,079 page-faults             # 0.116 K/sec
     97,923,241,298 cycles                  # 2.781 GHz [83.31%]
     51,832,908,236 stalled-cycles-frontend # 52.93% frontend cycles idle [83.30%]
     25,697,986,603 stalled-cycles-backend  # 26.24% backend cycles idle [66.70%]
    102,225,978,536 instructions            # 1.04 insns per cycle
                                            # 0.51 stalled cycles per insn [83.38%]
     18,657,696,819 branches                # 529.906 M/sec [83.29%]
         91,679,646 branch-misses           # 0.49% of all branches [83.40%]

    12.136204899 seconds time elapsed

    lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    6624.89

    Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

       40045.864494 task-clock              # 3.301 CPUs utilized
                171 context-switches        # 0.004 K/sec
                 53 CPU-migrations          # 0.001 K/sec
              4,080 page-faults             # 0.102 K/sec
    111,340,458,645 cycles                  # 2.780 GHz [83.34%]
     61,778,039,277 stalled-cycles-frontend # 55.49% frontend cycles idle [83.31%]
     29,295,522,759 stalled-cycles-backend  # 26.31% backend cycles idle [66.67%]
    108,654,349,355 instructions            # 0.98 insns per cycle
                                            # 0.57 stalled cycles per insn [83.34%]
     19,552,170,748 branches                # 488.244 M/sec [83.34%]
        157,875,417 branch-misses           # 0.81% of all branches [83.34%]

    12.130267788 seconds time elapsed

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
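
    The deferral rule described in the commit above can be summarized as
    a small decision function. This is a simplified model rather than the
    kernel code: the PushState type, its field names, and the
    packets_in_flight counter are hypothetical stand-ins, and the kernel's
    real test for "at least one packet is still in Qdisc or NIC TX queues"
    may be expressed differently.

```python
from dataclasses import dataclass


@dataclass
class PushState:
    skb_len: int               # bytes currently in the skb being pushed
    size_goal: int             # optimal skb size (a multiple of MSS bytes)
    packets_in_flight: int     # packets of this flow still in Qdisc / NIC TX queues
    autocorking: bool = True   # mirrors /proc/sys/net/ipv4/tcp_autocorking


def should_autocork(state: PushState) -> bool:
    """Return True when the push would be deferred until TX completion."""
    return (
        state.autocorking
        and state.skb_len < state.size_goal
        and state.packets_in_flight > 0
    )


# A 128-byte write with an earlier packet still queued: defer the push.
small_write_busy = should_autocork(PushState(128, 1448 * 10, 1))
# Same write, but nothing queued for this flow: send immediately.
small_write_idle = should_autocork(PushState(128, 1448 * 10, 0))
```

    The second case reflects the note in the commit message that the
    delay might be zero when no packet for the flow is held in the Qdisc
    or NIC TX ring.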