08 Apr, 2014

1 commit

  • If the renamed symbol is defined lib/iomap.c implements ioport_map and
    ioport_unmap and currently (nearly) all platforms define the port
    accessor functions outb/inb and friend unconditionally. So
    HAS_IOPORT_MAP is the better name for this.

    Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.

    The motivation for this change is to reintroduce a symbol HAS_IOPORT
    that signals if outb/int et al are available. I will address that at
    least one merge window later though to keep surprises to a minimum and
    catch new introductions of (HAS|NO)_IOPORT.

    The changes in this commit were done using:

    $ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/'

    Signed-off-by: Uwe Kleine-König
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     

13 Mar, 2014

1 commit

  • Many architectures have a stub cputime.h that only include the default
    cputime.h

    Lets remove the useless headers, we only need to mention that we want
    the default headers on the Kbuild files.

    Cc: Archs
    Cc: Ingo Molnar
    Cc: Marcelo Tosatti
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Acked-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

10 Feb, 2014

2 commits

  • This patch allows each architecture to add its specific assembly optimized
    arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for
    MCS lock and unlock functions.

    Signed-off-by: Tim Chen
    Cc: Scott J Norton
    Cc: Raghavendra K T
    Cc: AswinChandramouleeswaran
    Cc: George Spelvin
    Cc: Rik vanRiel
    Cc: Andrea Arcangeli
    Cc: MichelLespinasse
    Cc: Peter Hurley
    Cc: Andi Kleen
    Cc: Alex Shi
    Cc: Dave Hansen
    Cc: Tim Chen
    Cc: Arnd Bergmann
    Cc: "Figo.zhang"
    Cc: "Paul E.McKenney"
    Cc: "H. Peter Anvin"
    Cc: Davidlohr Bueso
    Cc: Waiman Long
    Cc: Ingo Molnar
    Cc: Will Deacon
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Matthew R Wilcox
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
    Signed-off-by: Ingo Molnar

    Tim Chen
     
  • We perform a clean up of the Kbuid files in each architecture.
    We order the files in each Kbuild in alphabetical order
    by running the below script.

    for i in arch/*/include/asm/Kbuild
    do
    cat $i | gawk '/^generic-y/ {
    i = 3;
    do {
    for (; i ${i}.sorted;
    mv ${i}.sorted $i;
    done

    Signed-off-by: Tim Chen
    Cc: Arnd Bergmann
    Cc: Matthew R Wilcox
    Cc: AswinChandramouleeswaran
    Cc: Dave Hansen
    Cc: "Paul E.McKenney"
    Cc: Scott J Norton
    Cc: Will Deacon
    Cc: "Figo.zhang"
    Cc: Linus Torvalds
    Cc: Rik van Riel
    Cc: Waiman Long
    Cc: Peter Hurley
    Cc: Andrea Arcangeli
    Cc: Tim Chen
    Cc: Alex Shi
    Cc: Raghavendra K T
    Cc: Andi Kleen
    Cc: George Spelvin
    Cc: MichelLespinasse
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Davidlohr Bueso
    Cc: Andrew Morton
    Signed-off-by: Peter Zijlstra
    [ Fixed build bug. ]
    Signed-off-by: Ingo Molnar

    Tim Chen
     

26 Jan, 2014

1 commit

  • Pull networking updates from David Miller:

    1) BPF debugger and asm tool by Daniel Borkmann.

    2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

    3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

    4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match. From Ben Hutchings.

    5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

    6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

    7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

    8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

    9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

    10) Fix ipv6 router reachability probing, from Jiri Benc.

    11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

    12) Support tunneling in GRO layer, from Jerry Chu.

    13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

    14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI. From Atzm Watanabe.

    15) New "Heavy Hitter" qdisc, from Terry Lam.

    16) Significantly improve the IPSEC support in pktgen, from Fan Du.

    17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
    Herbert.

    18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

    19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

    20) Key TCP metrics blobs also by source address, not just destination
    address. From Christoph Paasch.

    21) Support 10G in generic phylib. From Andy Fleming.

    22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided. From Tom Herbert.

    The wireless and netfilter folks have been busy little bees too.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
    net/cxgb4: Fix referencing freed adapter
    ipv6: reallocate addrconf router for ipv6 address when lo device up
    fib_frontend: fix possible NULL pointer dereference
    rtnetlink: remove IFLA_BOND_SLAVE definition
    rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
    qlcnic: update version to 5.3.55
    qlcnic: Enhance logic to calculate msix vectors.
    qlcnic: Refactor interrupt coalescing code for all adapters.
    qlcnic: Update poll controller code path
    qlcnic: Interrupt code cleanup
    qlcnic: Enhance Tx timeout debugging.
    qlcnic: Use bool for rx_mac_learn.
    bonding: fix u64 division
    rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
    sfc: Use the correct maximum TX DMA ring size for SFC9100
    Add Shradha Shah as the sfc driver maintainer.
    net/vxlan: Share RX skb de-marking and checksum checks with ovs
    tulip: cleanup by using ARRAY_SIZE()
    ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
    net/cxgb4: Don't retrieve stats during recovery
    ...

    Linus Torvalds
     

24 Jan, 2014

1 commit

  • Remove an outdated reference to "most personal computers" having only one
    CPU, and change the use of "singleprocessor" and "single processor" in
    CONFIG_SMP's documentation to "uniprocessor" across all arches where that
    documentation is present.

    Signed-off-by: Robert Graffham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Graffham
     

19 Jan, 2014

1 commit

  • For user space packet capturing libraries such as libpcap, there's
    currently only one way to check which BPF extensions are supported
    by the kernel, that is, commit aa1113d9f85d ("net: filter: return
    -EINVAL if BPF_S_ANC* operation is not supported"). For querying all
    extensions at once this might be rather inconvenient.

    Therefore, this patch introduces a new option which can be used as
    an argument for getsockopt(), and allows one to obtain information
    about which BPF extensions are supported by the current kernel.

    As David Miller suggests, we do not need to define any bits right
    now and status quo can just return 0 in order to state that this
    versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
    additions to BPF extensions need to add their bits to the
    bpf_tell_extensions() function, as documented in the comment.

    Signed-off-by: Michal Sekletar
    Cc: David Miller
    Reviewed-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Michal Sekletar
     

12 Jan, 2014

1 commit

  • We're going to be adding a few new barrier primitives, and in order to
    avoid endless duplication make more agressive use of
    asm-generic/barrier.h.

    Change the asm-generic/barrier.h such that it allows partial barrier
    definitions and fills out the rest with defaults.

    There are a few architectures (m32r, m68k) that could probably
    do away with their barrier.h file entirely but are kept for now due to
    their unconventional nop() implementation.

    Suggested-by: Geert Uytterhoeven
    Reviewed-by: "Paul E. McKenney"
    Reviewed-by: Mathieu Desnoyers
    Signed-off-by: Peter Zijlstra
    Cc: Michael Ellerman
    Cc: Michael Neuling
    Cc: Russell King
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Victor Kaplansky
    Cc: Tony Luck
    Cc: Oleg Nesterov
    Cc: Benjamin Herrenschmidt
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20131213150640.846368594@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

18 Dec, 2013

1 commit


20 Nov, 2013

1 commit

  • Pull irq cleanups from Ingo Molnar:
    "This is a multi-arch cleanup series from Thomas Gleixner, which we
    kept to near the end of the merge window, to not interfere with
    architecture updates.

    This series (motivated by the -rt kernel) unifies more aspects of IRQ
    handling and generalizes PREEMPT_ACTIVE"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    preempt: Make PREEMPT_ACTIVE generic
    sparc: Use preempt_schedule_irq
    ia64: Use preempt_schedule_irq
    m32r: Use preempt_schedule_irq
    hardirq: Make hardirq bits generic
    m68k: Simplify low level interrupt handling code
    genirq: Prevent spurious detection for unconditionally polled interrupts

    Linus Torvalds
     

16 Nov, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual earth-shaking, news-breaking, rocket science pile from
    trivial.git"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
    doc: add missing files to timers/00-INDEX
    timekeeping: Fix some trivial typos in comments
    mm: Fix some trivial typos in comments
    irq: Fix some trivial typos in comments
    NUMA: fix typos in Kconfig help text
    mm: update 00-INDEX
    doc: Documentation/DMA-attributes.txt fix typo
    DRM: comment: `halve' -> `half'
    Docs: Kconfig: `devlopers' -> `developers'
    doc: typo on word accounting in kprobes.c in mutliple architectures
    treewide: fix "usefull" typo
    treewide: fix "distingush" typo
    mm/Kconfig: Grammar s/an/a/
    kexec: Typo s/the/then/
    Documentation/kvm: Update cpuid documentation for steal time and pv eoi
    treewide: Fix common typo in "identify"
    __page_to_pfn: Fix typo in comment
    Correct some typos for word frequency
    clk: fixed-factor: Fix a trivial typo
    ...

    Linus Torvalds
     

15 Nov, 2013

3 commits


14 Nov, 2013

3 commits


13 Nov, 2013

1 commit

  • Pull networking updates from David Miller:

    1) The addition of nftables. No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various datastructures as
    fundamental operations. For example sets are supports, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize things before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset. In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

    2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfitler, amongst other things.

    In particular the taus88 generater is updated to taus113, and test
    cases are added.

    3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

    4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

    5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

    6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

    7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option. From Eric Dumazet.

    8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

    9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too. Implementation from Shawn
    Bohrer.

    10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets. With the main goals being able
    to use RCU lookups on even request sockets, and eliminating the
    listening lock contention. From Eric Dumazet.

    11) The bonding layer has many demuxes in it's fast path, and an RCU
    conversion was started back in 3.11, several changes here extend the
    RCU usage to even more locations. From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

    12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels. From Eric Dumazet.

    13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies. From Hannes Frederic Sowa. The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

    14) The generic driver layer takes care now to set the driver data to
    NULL on device removal, so it's no longer necessary for drivers to
    explicitly set it to NULL any more. Many drivers have been cleaned
    up in this way, from Jingoo Han.

    15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

    16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled. Also from Daniel
    Borkmann.

    17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value. This helps avoid PMTU attacks,
    particularly on DNS servers. From Hannes Frederic Sowa.

    18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net. From Jason Wang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    random32: add test cases for taus113 implementation
    random32: upgrade taus88 generator to taus113 from errata paper
    random32: move rnd_state to linux/random.h
    random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
    random32: add periodic reseeding
    random32: fix off-by-one in seeding requirement
    PHY: Add RTL8201CP phy_driver to realtek
    xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
    macmace: add missing platform_set_drvdata() in mace_probe()
    ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
    ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
    vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
    ixgbe: add warning when max_vfs is out of range.
    igb: Update link modes display in ethtool
    netfilter: push reasm skb through instead of original frag skbs
    ip6_output: fragment outgoing reassembled skb properly
    MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
    net_sched: tbf: support of 64bit rates
    ixgbe: deleting dfwd stations out of order can cause null ptr deref
    ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
    ...

    Linus Torvalds
     

14 Oct, 2013

1 commit


29 Sep, 2013

1 commit

  • As mentioned in commit afe4fd062416b ("pkt_sched: fq: Fair Queue packet
    scheduler"), this patch adds a new socket option.

    SO_MAX_PACING_RATE offers the application the ability to cap the
    rate computed by transport layer. Value is in bytes per second.

    u32 val = 1000000;
    setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));

    To be effectively paced, a flow must use FQ packet scheduler.

    Note that a packet scheduler takes into account the headers for its
    computations. The effective payload rate depends on MSS and retransmits
    if any.

    I chose to make this pacing rate a SOL_SOCKET option instead of a
    TCP one because this can be used by other protocols.

    Signed-off-by: Eric Dumazet
    Cc: Steinar H. Gunderson
    Cc: Michael Kerrisk
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Sep, 2013

1 commit

  • In order to prepare to per-arch implementations of preempt_count move
    the required bits into an asm-generic header and use this for all
    archs.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Sep, 2013

2 commits

  • After the last architecture switched to generic hard irqs the config
    options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
    for !CONFIG_GENERIC_HARDIRQS can be removed.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Unlike global OOM handling, memory cgroup code will invoke the OOM killer
    in any OOM situation because it has no way of telling faults occuring in
    kernel context - which could be handled more gracefully - from
    user-triggered faults.

    Pass a flag that identifies faults originating in user space from the
    architecture-specific fault handlers to generic code so that memcg OOM
    handling can be improved.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Michal Hocko
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: azurIt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    Note that some harmless section mismatch warnings may result, since
    notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
    are flagged as __cpuinit -- so if we remove the __cpuinit from
    arch specific callers, we will also get section mismatch warnings.
    As an intermediate step, we intend to turn the linux/init.h cpuinit
    content into no-ops as early as possible, since that will get rid
    of these warnings. In any case, they are temporary and harmless.

    This removes all the arch/m32r uses of the __cpuinit macros from
    all C files. Currently m32r does not have any __CPUINIT used in
    assembly files.

    [1] https://lkml.org/lkml/2013/5/20/589

    Cc: Hirokazu Takata
    Cc: linux-m32r@ml.linux-m32r.org
    Cc: linux-m32r-ja@ml.linux-m32r.org
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

11 Jul, 2013

1 commit

  • Rename LL_SO to BUSY_POLL_SO
    Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
    Fix up users of these variables.
    Fix documentation for sysctl.

    a patch for the socket.7 man page will follow separately,
    because of limitations of my mail setup.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

10 Jul, 2013

1 commit

  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickeled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particular in the area of debugging and cookie
    lifetime handline, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overruning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in
    upd_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     

05 Jul, 2013

2 commits

  • Merge Kconfig menu diet patches from Dave Hansen:
    "I think the "Kernel Hacking" menu has gotten a bit out of hand. It is
    over 120 lines long on my system with everything enabled and options
    are scattered around it haphazardly.

    http://sr71.net/~dave/linux/kconfig-horror.png

    Let's try to introduce some sanity. This set takes that 120 lines
    down to 55 and makes it vastly easier to find some things. It's a
    start.

    This set stands on its own, but there is plenty of room for follow-up
    patches. The arch-specific debug options still end up getting stuck
    in the top-level "kernel hacking" menu. OPTIMIZE_INLINING, for
    instance, could obviously go in to the "compiler options" menu, but
    the fact that it is defined in arch/ in a separate Kconfig file keeps
    it on its own for the moment.

    The Signed-off-by's in here look funky. I changed employers while
    working on this set, so I have signoffs from both email addresses"

    * emailed patches from Dave Hansen :
    hang and lockup detection menu
    kconfig: consolidate printk options
    group locking debugging options
    consolidate compilation option configs
    consolidate runtime testing configs
    order memory debugging Kconfig options
    consolidate per-arch stack overflow debugging options

    Linus Torvalds
     
  • Original posting:

    http://lkml.kernel.org/r/20121214184202.F54094D9@kernel.stglabs.ibm.com

    Several architectures have similar stack debugging config options.
    They all pretty much do the same thing, some with slightly
    differing help text.

    This patch changes the architectures to instead enable a Kconfig
    boolean, and then use that boolean in the generic Kconfig.debug
    to present the actual menu option. This removes a bunch of
    duplication and adds consistency across arches.

    Signed-off-by: Dave Hansen
    Reviewed-by: H. Peter Anvin
    Reviewed-by: James Hogan
    Acked-by: Chris Metcalf [for tile]
    Signed-off-by: Dave Hansen
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

04 Jul, 2013

5 commits

  • Prepare for killing free_all_bootmem_node() by using free_all_bootmem().

    Signed-off-by: Jiang Liu
    Cc: Hirokazu Takata
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Prepare for removing num_physpages and simplify mem_init().

    Signed-off-by: Jiang Liu
    Cc: Hirokazu Takata
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Concentrate code to modify totalram_pages into the mm core, so the arch
    memory initialized code doesn't need to take care of it. With these
    changes applied, only following functions from mm core modify global
    variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
    free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much more easier for us to keep
    totalram_pages and zone->managed_pages in consistence.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Address more review comments from last round of code review.
    1) Enhance free_reserved_area() to support poisoning freed memory with
    pattern '0'. This could be used to get rid of poison_init_mem()
    on ARM64.
    2) A previous patch has disabled memory poison for initmem on s390
    by mistake, so restore to the original behavior.
    3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().

    Signed-off-by: Jiang Liu
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Change signature of free_reserved_area() according to Russell King's
    suggestion to fix following build warnings:

    arch/arm/mm/init.c: In function 'mem_init':
    arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
    ^
    In file included from include/linux/mman.h:4:0,
    from arch/arm/mm/init.c:15:
    include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
    extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

    mm/page_alloc.c: In function 'free_reserved_area':
    >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
    In file included from arch/mips/include/asm/page.h:49:0,
    from include/linux/mmzone.h:20,
    from include/linux/gfp.h:4,
    from include/linux/mm.h:8,
    from mm/page_alloc.c:18:
    arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
    mm/page_alloc.c: In function 'free_area_init_nodes':
    mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

    Also address some minor code review comments.

    Signed-off-by: Jiang Liu
    Reported-by: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

03 Jul, 2013

1 commit

  • Pull voluntary preemption fixes from Ingo Molnar:
    "This tree contains a speedup which is achieved through better
    might_sleep()/might_fault() preemption point annotations for uaccess
    functions, by Michael S Tsirkin:

    1. The only reason uaccess routines might sleep is if they fault.
    Make this explicit for all architectures.

    2. A voluntary preemption point in uaccess functions means compiler
    can't inline them efficiently, this breaks assumptions that they
    are very fast and small that e.g. net code seems to make. Remove
    this preemption point so behaviour matches with what callers
    assume.

    3. Accesses (e.g through socket ops) to kernel memory with KERNEL_DS
    like net/sunrpc does will never sleep. Remove an unconditinal
    might_sleep() in the might_fault() inline in kernel.h (used when
    PROVE_LOCKING is not set).

    4. Accesses with pagefault_disable() return EFAULT but won't cause
    caller to sleep. Check for that and thus avoid might_sleep() when
    PROVE_LOCKING is set.

    These changes offer a nice speedup for CONFIG_PREEMPT_VOLUNTARY=y
    kernels, here's a network bandwidth measurement between a virtual
    machine and the host:

    before:
    incoming: 7122.77 Mb/s
    outgoing: 8480.37 Mb/s

    after:
    incoming: 8619.24 Mb/s [ +21.0% ]
    outgoing: 9455.42 Mb/s [ +11.5% ]

    I kept these changes in a separate tree, separate from scheduler
    changes, because it's a mixed MM and scheduler topic"

    * 'sched-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    mm, sched: Allow uaccess in atomic with pagefault_disable()
    mm, sched: Drop voluntary schedule from might_fault()
    x86: uaccess s/might_sleep/might_fault/
    tile: uaccess s/might_sleep/might_fault/
    powerpc: uaccess s/might_sleep/might_fault/
    mn10300: uaccess s/might_sleep/might_fault/
    microblaze: uaccess s/might_sleep/might_fault/
    m32r: uaccess s/might_sleep/might_fault/
    frv: uaccess s/might_sleep/might_fault/
    arm64: uaccess s/might_sleep/might_fault/
    asm-generic: uaccess s/might_sleep/might_fault/

    Linus Torvalds
     

29 Jun, 2013

1 commit


18 Jun, 2013

1 commit


28 May, 2013

1 commit

  • The only reason uaccess routines might sleep
    is if they fault. Make this explicit.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1369577426-26721-4-git-send-email-mst@redhat.com
    Signed-off-by: Ingo Molnar

    Michael S. Tsirkin
     

02 May, 2013

1 commit

  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configurating various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to mdoule_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     

01 May, 2013

2 commits

  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     
  • show_regs() is inherently arch-dependent but it does make sense to print
    generic debug information and some archs already do albeit in slightly
    different forms. This patch introduces a generic function to print debug
    information from show_regs() so that different archs print out the same
    information and it's much easier to modify what's printed.

    show_regs_print_info() prints out the same debug info as dump_stack()
    does plus task and thread_info pointers.

    * Archs which didn't print debug info now do.

    alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
    metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
    um, xtensa

    * Already prints debug info. Replaced with show_regs_print_info().
    The printed information is superset of what used to be there.

    arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86

    * s390 is special in that it used to print arch-specific information
    along with generic debug info. Heiko and Martin think that the
    arch-specific extra isn't worth keeping s390 specfic implementation.
    Converted to use the generic version.

    Note that now all archs print the debug info before actual register
    dumps.

    An example BUG() dump follows.

    kernel BUG at /work/os/work/kernel/workqueue.c:4841!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
    Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
    task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
    RIP: 0010:[] [] init_workqueues+0x4/0x6
    RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
    RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
    RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
    RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
    0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
    ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
    Call Trace:
    [] do_one_initcall+0x122/0x170
    [] kernel_init_freeable+0x9b/0x1c8
    [] ? rest_init+0x140/0x140
    [] kernel_init+0xe/0xf0
    [] ret_from_fork+0x7c/0xb0
    [] ? rest_init+0x140/0x140
    ...

    v2: Typo fix in x86-32.

    v3: CPU number dropped from show_regs_print_info() as
    dump_stack_print_info() has been updated to print it. s390
    specific implementation dropped as requested by s390 maintainers.

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Acked-by: Jesper Nilsson
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Bjorn Helgaas
    Cc: Fengguang Wu
    Cc: Mike Frysinger
    Cc: Vineet Gupta
    Cc: Sam Ravnborg
    Acked-by: Chris Metcalf [tile bits]
    Acked-by: Richard Kuo [hexagon bits]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo