17 May, 2015

1 commit

  • We currently have no limit on the number of elements in a hash
    table. This is a problem because some users (tipc) set a ceiling on
    the maximum table size, and when that is reached the hash table may
    degenerate. Others may encounter OOM when growing, and if we allow
    insertions when that happens the hash table performance may also
    suffer.

    This patch adds a new parameter, insecure_max_entries, which becomes
    the cap on the table. If unset, it defaults to max_size * 2. If that
    too is zero, there is no cap on the number of elements in the table.
    However, the table will grow whenever the utilisation hits 100%, and
    if that growth fails, you will get ENOMEM on insertion.

    As allowing oversubscription is potentially dangerous, the name
    contains the word insecure.

    Note that the cap is not a hard limit. This is done for performance
    reasons as enforcing a hard limit will result in use of atomic ops
    that are heavier than the ones we currently use.

    The reasoning is that we're only guarding against a gross
    oversubscription of the table, rather than a small breach of the
    limit.
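    As a rough illustration of the policy above (this is not the actual
    kernel code; the struct and helper names here are hypothetical), a
    user-space sketch of the soft-cap check might look like:

```c
#include <assert.h>

/* Hypothetical, simplified model of the insecure_max_entries policy.
 * The real rhashtable uses atomic element counts and RCU; this sketch
 * only shows how the cap is derived and applied. */
struct toy_table {
	unsigned int nelems;
	unsigned int max_size;
	unsigned int insecure_max_entries;	/* 0: fall back to max_size * 2 */
};

static unsigned int toy_cap(const struct toy_table *t)
{
	if (t->insecure_max_entries)
		return t->insecure_max_entries;
	if (t->max_size)
		return t->max_size * 2;	/* default cap */
	return 0;			/* both zero: no cap at all */
}

/* Returns 0 on success, -1 (standing in for an errno) when the soft
 * cap is reached. The check is deliberately not a hard limit, so a
 * concurrent caller could briefly overshoot it. */
static int toy_insert(struct toy_table *t)
{
	unsigned int cap = toy_cap(t);

	if (cap && t->nelems >= cap)
		return -1;
	t->nelems++;
	return 0;
}
```

    Since the check reads nelems without synchronization, a small breach
    of the cap is possible by design, matching the "not a hard limit"
    note above.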

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

07 May, 2015

1 commit


06 May, 2015

3 commits

  • The documentation shows a need for gcc > 4.9.2, but it's really >=.
    The Kconfig entries don't show the required versions, so add them.
    Correct a latter/later typo too. Also mention that gcc 5 is required
    to catch out-of-bounds accesses to global and stack variables.

    Signed-off-by: Joe Perches
    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The file lib/find_last_bit.c was no longer used and was supposed to
    be deleted by commit 8f6f19dd51 ("lib: move find_last_bit to
    lib/find_next_bit.c"), but that deletion never happened. This gets
    rid of it.

    Signed-off-by: Yury Norov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a build problem with bcm63xx and yet another fix to the
    memzero_explicit function to ensure that the memset is not elided"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    hwrng: bcm63xx - Fix driver compilation
    lib: make memzero_explicit more robust against dead store elimination

    Linus Torvalds
     

04 May, 2015

1 commit

  • In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead
    of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
    case LTO would decide to inline memzero_explicit() and eventually
    find out it could be eliminated as a dead store.

    While using barrier() works well for gcc, recent efforts from the
    LLVMLinux people suggest using llvm as an alternative to gcc, and
    there, Stephan found in a simple stand-alone user-space example that
    llvm could nevertheless optimize and thus eliminate the memset().
    A similar issue has been observed in the referenced llvm bug report,
    which is regarded as not-a-bug.

    Based on some experiments, icc is a bit special on its own: while it
    doesn't seem to eliminate the memset() itself, it could do so with
    its own implementation, and then result in similar findings as with
    llvm.

    The fix in this patch now works for all three compilers (also tested
    with more aggressive optimization levels). Arguably, in the current
    kernel tree it's more of a theoretical issue, but imho, it's better
    to be pedantic about it.

    It's clearly visible with gcc/llvm though, with the code below: had
    we used barrier() only here, llvm would have omitted the clearing;
    not so with the barrier_data() variant:

    static inline void memzero_explicit(void *s, size_t count)
    {
            memset(s, 0, count);
            barrier_data(s);
    }

    int main(void)
    {
            char buff[20];
            memzero_explicit(buff, sizeof(buff));
            return 0;
    }

    $ gcc -O2 test.c
    $ gdb a.out
    (gdb) disassemble main
    Dump of assembler code for function main:
    0x0000000000400400 : lea -0x28(%rsp),%rax
    0x0000000000400405 : movq $0x0,-0x28(%rsp)
    0x000000000040040e : movq $0x0,-0x20(%rsp)
    0x0000000000400417 : movl $0x0,-0x18(%rsp)
    0x000000000040041f : xor %eax,%eax
    0x0000000000400421 : retq
    End of assembler dump.

    $ clang -O2 test.c
    $ gdb a.out
    (gdb) disassemble main
    Dump of assembler code for function main:
    0x00000000004004f0 : xorps %xmm0,%xmm0
    0x00000000004004f3 : movaps %xmm0,-0x18(%rsp)
    0x00000000004004f8 : movl $0x0,-0x8(%rsp)
    0x0000000000400500 : lea -0x18(%rsp),%rax
    0x0000000000400505 : xor %eax,%eax
    0x0000000000400507 : retq
    End of assembler dump.

    As gcc, clang, and also icc define __GNUC__, it's sufficient to
    define this in compiler-gcc.h only for it to be picked up. For a
    fallback or otherwise unsupported compiler, we define it as a plain
    barrier. Similarly for ecc, which does not support gcc inline asm.

    Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
    Reported-by: Stephan Mueller
    Tested-by: Stephan Mueller
    Signed-off-by: Daniel Borkmann
    Cc: Theodore Ts'o
    Cc: Stephan Mueller
    Cc: Hannes Frederic Sowa
    Cc: mancha security
    Cc: Mark Charlebois
    Cc: Behan Webster
    Signed-off-by: Herbert Xu

    Daniel Borkmann
     

28 Apr, 2015

1 commit

  • Pull networking fixes from David Miller:

    1) mlx4 doesn't check fully for supported valid RSS hash function, fix
    from Amir Vadai

    2) Off by one in ibmveth_change_mtu(), from David Gibson

    3) Prevent altera chip from reporting false error interrupts in some
    circumstances, from Chee Nouk Phoon

    4) Get rid of that stupid endless loop trying to allocate a FIN packet
    in TCP, and in the process kill deadlocks. From Eric Dumazet

    5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
    Eric Dumazet

    6) Fix two bugs in async rhashtable resizing, from Thomas Graf

    7) Fix topology server listener socket namespace bug in TIPC, from Ying
    Xue

    8) Add some missing HAS_DMA kconfig dependencies, from Geert
    Uytterhoeven

    9) The bgmac driver intends to force re-polling but does so by
    returning the wrong value from its ->poll() handler. Fix from Rafał
    Miłecki

    10) When the creator of an rhashtable configures a max size for it,
    don't bark in the logs and drop insertions when that is exceeded.
    Fix from Johannes Berg

    11) Recover from out of order packets in ppp mppe properly, from Sylvain
    Rochet

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
    bnx2x: really disable TPA if 'disable_tpa' option is set
    net:treewide: Fix typo in drivers/net
    net/mlx4_en: Prevent setting invalid RSS hash function
    mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
    netfilter: Add some missing default cases to switch statements in nft_reject.
    ppp: mppe: discard late packet in stateless mode
    ppp: mppe: sanity error path rework
    net/bonding: Make DRV macros private
    net: rfs: fix crash in get_rps_cpus()
    altera tse: add support for fixed-links.
    pxa168: fix double deallocation of managed resources
    net: fix crash in build_skb()
    net: eth: altera: Resolve false errors from MSGDMA to TSE
    ehea: Fix memory hook reference counting crashes
    net/tg3: Release IRQs on permanent error
    net: mdio-gpio: support access that may sleep
    inet: fix possible panic in reqsk_queue_unlink()
    rhashtable: don't attempt to grow when at max_size
    bgmac: fix requests for extra polling calls from NAPI
    tcp: avoid looping in tcp_send_fin()
    ...

    Linus Torvalds
     

25 Apr, 2015

1 commit

  • Pull md updates from Neil Brown:
    "More updates than usual this time. A few have performance impacts
    which should mostly be positive, but RAID5 (in particular) can be
    very work-load sensitive... We'll have to wait and see.

    Highlights:

    - "experimental" code for managing md/raid1 across a cluster using
    DLM. Code is not ready for general use and triggers a WARNING if
    used. However it is looking good and mostly done and having in
    mainline will help co-ordinate development.

    - RAID5/6 can now batch multiple (4K wide) stripe_heads so as to
    handle a full (chunk wide) stripe as a single unit.

    - RAID6 can now perform read-modify-write cycles which should help
    performance on larger arrays: 6 or more devices.

    - RAID5/6 stripe cache now grows and shrinks dynamically. The value
    set is used as a minimum.

    - Resync is now allowed to go a little faster than the 'minimum' when
    there is competing IO. How much faster depends on the speed of the
    devices, so the effective minimum should scale with device speed to
    some extent"

    * tag 'md/4.1' of git://neil.brown.name/md: (58 commits)
    md/raid5: don't do chunk aligned read on degraded array.
    md/raid5: allow the stripe_cache to grow and shrink.
    md/raid5: change ->inactive_blocked to a bit-flag.
    md/raid5: move max_nr_stripes management into grow_one_stripe and drop_one_stripe
    md/raid5: pass gfp_t arg to grow_one_stripe()
    md/raid5: introduce configuration option rmw_level
    md/raid5: activate raid6 rmw feature
    md/raid6 algorithms: xor_syndrome() for SSE2
    md/raid6 algorithms: xor_syndrome() for generic int
    md/raid6 algorithms: improve test program
    md/raid6 algorithms: delta syndrome functions
    raid5: handle expansion/resync case with stripe batching
    raid5: handle io error of batch list
    RAID5: batch adjacent full stripe write
    raid5: track overwrite disk count
    raid5: add a new flag to track if a stripe can be batched
    raid5: use flex_array for scribble data
    md raid0: access mddev->queue (request queue member) conditionally because it is not set when accessed from dm-raid
    md: allow resync to go faster when there is competing IO.
    md: remove 'go_faster' option from ->sync_request()
    ...

    Linus Torvalds
     

23 Apr, 2015

2 commits

  • The current code only stops inserting rehashes into the chain when
    no resizes are currently scheduled. As long as resizes are
    scheduled, and while inserting above the utilization watermark, more
    and more rehashes will be scheduled.

    This led to a perfect DoS storm, with thousands of rehashes
    scheduled, which caused thousands of spinlocks to be taken
    sequentially.

    Instead, only allow either a series of resizes or a single rehash.
    Drop any further rehashes and return -EBUSY.
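    The scheduling policy can be sketched with a toy model (hypothetical
    names; the real locking and work-queue machinery is elided):

```c
#include <assert.h>

#define TOY_EBUSY 16	/* stand-in for -EBUSY */

/* Toy model: a series of resizes may be scheduled, but at most one
 * rehash can be outstanding; any further rehash request is dropped. */
struct toy_ht {
	int rehash_pending;
};

static int toy_schedule_rehash(struct toy_ht *ht)
{
	if (ht->rehash_pending)
		return -TOY_EBUSY;	/* drop it, one is already queued */
	ht->rehash_pending = 1;
	return 0;
}

static void toy_rehash_done(struct toy_ht *ht)
{
	ht->rehash_pending = 0;
}
```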

    Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
    Signed-off-by: Thomas Graf
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • When rhashtable_insert_rehash() fails with ENOMEM, this indicates
    that we can't allocate the necessary memory in the current context,
    but the limits as set by the user would still allow the table to
    grow.

    Thus attempt an async resize in the background where we can allocate
    using GFP_KERNEL which is more likely to succeed. The insertion itself
    will still fail to indicate pressure.

    This fixes a bug where the table would never continue growing once the
    utilization is above 100%.
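    The fallback path amounts to something like this toy sketch
    (hypothetical names; the real code schedules a work item that can
    allocate with GFP_KERNEL):

```c
#include <assert.h>

#define TOY_ENOMEM 12	/* stand-in for -ENOMEM */

static int deferred_grows;	/* counts background resizes we queued */

/* Pretend atomic-context allocation that can fail. */
static int toy_alloc_atomic(int should_fail)
{
	return should_fail ? -TOY_ENOMEM : 0;
}

/* If the immediate (atomic) allocation fails, queue an async grow that
 * may sleep and use GFP_KERNEL, but still report failure upward so the
 * caller sees the pressure. */
static int toy_insert_rehash(int alloc_fails)
{
	int err = toy_alloc_atomic(alloc_fails);

	if (err == -TOY_ENOMEM)
		deferred_grows++;	/* schedule_work() in the real code */
	return err;
}
```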

    Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
    Signed-off-by: Thomas Graf
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Thomas Graf
     

22 Apr, 2015

6 commits

  • Pull sparc fixes from David Miller:

    1) ldc_alloc_exp_dring() can be called from softints, so use
    GFP_ATOMIC. From Sowmini Varadhan.

    2) Some minor warning/build fixups for the new iommu-common code on
    certain archs and with certain debug options enabled. Also from
    Sowmini Varadhan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context
    sparc64: Use M7 PMC write on all chips T4 and onward.
    iommu-common: rename iommu_pool_hash to iommu_hash_common
    iommu-common: fix x86_64 compiler warnings

    Linus Torvalds
     
  • The second (and last) optimized XOR syndrome calculation. This
    version supports right- and left-side optimization. All CPUs with
    architectures older than Haswell will benefit from it.

    It should be noted that SSE2 movntdq kills performance for memory areas
    that are read and written simultaneously in chunks smaller than cache
    line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR
    functions.

    Signed-off-by: Markus Stockhausen
    Signed-off-by: NeilBrown

    Markus Stockhausen
     
  • Start the algorithms with the very basic one. It is left and right
    optimized. That means we can avoid all calculations for unneeded pages
    above the right stop offset. For pages below the left start offset we
    still need the syndrome multiplication but without reading data pages.

    Signed-off-by: Markus Stockhausen
    Signed-off-by: NeilBrown

    Markus Stockhausen
     
  • It is always helpful to have a test tool in place when we implement
    new data-critical algorithms. So add some test routines to the raid6
    checker that can prove whether the new xor_syndrome() works as
    expected.

    Run through all permutations of start/stop pages per algorithm and
    simulate an xor_syndrome()-assisted rmw run. After each rmw, check
    whether the recovery algorithm still confirms that the stripe is
    fine.

    Signed-off-by: Markus Stockhausen
    Signed-off-by: NeilBrown

    Markus Stockhausen
     
  • v3: s-o-b comment, explanation of performance and decision for
    the start/stop implementation

    Implementing rmw functionality for RAID6 requires optimized syndrome
    calculation. Up to now we can only generate a complete syndrome. The
    target P/Q pages are always overwritten. With this patch we provide
    a framework for inplace P/Q modification. In the first place simply
    fill those functions with NULL values.

    xor_syndrome() has two additional parameters: start & stop. These
    indicate the first and last page that change during an rmw run,
    which makes it possible to avoid several unnecessary loops and speed
    up the calculation. The caller needs to implement the following
    logic to make the functions work.

    1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
    blocks inside P/Q between (and including) start and end.

    2) modify any block with start
    Signed-off-by: NeilBrown

    Markus Stockhausen
     
  • Pull char/misc driver updates from Greg KH:
    "Here's the big char/misc driver patchset for 4.1-rc1.

    Lots of different driver subsystem updates here, nothing major, full
    details are in the shortlog.

    All of this has been in linux-next for a while"

    * tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits)
    mei: trace: remove unused TRACE_SYSTEM_STRING
    DTS: ARM: OMAP3-N900: Add lis3lv02d support
    Documentation: DT: lis302: update wakeup binding
    lis3lv02d: DT: add wakeup unit 2 and wakeup threshold
    lis3lv02d: DT: use s32 to support negative values
    Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case
    Drivers: hv: hv_balloon: correctly handle val.freeram directory
    coresight-tmc: Adding a status interface to sysfs
    coresight: remove the unnecessary configuration coresight-default-sink
    ...

    Linus Torvalds
     

21 Apr, 2015

3 commits

  • When CONFIG_DEBUG_FORCE_WEAK_PER_CPU is set, the DEFINE_PER_CPU_SECTION
    macro will define an extern __pcpu_unique_##name variable that could
    conflict with the same definition in powerpc at this time. Avoid that
    conflict by renaming iommu_pool_hash in iommu-common.c

    Thanks to Guenter Roeck for catching this, and helping to test the fix.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Guenter Roeck
    Reviewed-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Declare iommu_large_alloc as static. Remove extern definition for
    iommu_tbl_pool_init().

    Signed-off-by: Sowmini Varadhan
    Tested-by: Guenter Roeck
    Reviewed-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Pull final removal of deprecated cpus_* cpumask functions from Rusty Russell:
    "This is the final removal (after several years!) of the obsolete
    cpus_* functions, prompted by their mis-use in staging.

    With these function removed, all cpu functions should only iterate to
    nr_cpu_ids, so we finally only allocate that many bits when cpumasks
    are allocated offstack"

    * tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
    cpumask: remove __first_cpu / __next_cpu
    cpumask: resurrect CPU_MASK_CPU0
    linux/cpumask.h: add typechecking to cpumask_test_cpu
    cpumask: only allocate nr_cpumask_bits.
    Fix weird uses of num_online_cpus().
    cpumask: remove deprecated functions.
    mips: fix obsolete cpumask_of_cpu usage.
    x86: fix more deprecated cpu function usage.
    ia64: remove deprecated cpus_ usage.
    powerpc: fix deprecated CPU_MASK_CPU0 usage.
    CPU_MASK_ALL/CPU_MASK_NONE: remove from deprecated region.
    staging/lustre/o2iblnd: Don't use cpus_weight
    staging/lustre/libcfs: replace deprecated cpus_ calls with cpumask_
    staging/lustre/ptlrpc: Do not use deprecated cpus_* functions
    blackfin: fix up obsolete cpu function usage.
    parisc: fix up obsolete cpu function usage.
    tile: fix up obsolete cpu function usage.
    arm64: fix up obsolete cpu function usage.
    mips: fix up obsolete cpu function usage.
    x86: fix up obsolete cpu function usage.
    ...

    Linus Torvalds
     

20 Apr, 2015

1 commit

  • The test_data_1_le[] array is a const array of const char *. To avoid
    dropping any const information, we need to use "const char * const *",
    not just "const char **".

    I'm not sure why the different test arrays end up having different
    const'ness, but let's make the pointer we use to traverse them as const
    as possible, since we modify neither the array of pointers _or_ the
    pointers we find in the array.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

19 Apr, 2015

4 commits

  • They were for use by the deprecated first_cpu() and next_cpu() wrappers,
    but sparc used them directly.

    They're now replaced by cpumask_first / cpumask_next. And __next_cpu_nr
    is completely obsolete.

    Signed-off-by: Rusty Russell
    Acked-by: David S. Miller

    Rusty Russell
     
  • Fixes warnings due to
    - no DMA_ERROR_CODE on PARISC,
    - sizeof (unsigned long) == 4 bytes on PARISC.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Investigation of multithreaded iperf experiments on an ethernet
    interface show the iommu->lock as the hottest lock identified by
    lockstat, with something of the order of 21M contentions out of
    27M acquisitions, and an average wait time of 26 us for the lock.
    This is not efficient. A more scalable design is to follow the ppc
    model, where the iommu_map_table has multiple pools, each stretching
    over a segment of the map, and with a separate lock for each pool.
    This model allows for better parallelization of the iommu map search.

    This patch adds the iommu range alloc/free function infrastructure.
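    The pooled design can be pictured with a toy allocator like the one
    below (hypothetical names; the real code hashes on CPU id and takes
    a per-pool spinlock):

```c
#include <assert.h>

#define NPOOLS 4

/* Each pool owns a disjoint segment of the map and, in the kernel,
 * its own spinlock (elided here). Spreading callers across pools cuts
 * contention on what used to be a single iommu->lock. */
struct toy_pool {
	unsigned long start, end, hint;
};

static void toy_pools_init(struct toy_pool p[NPOOLS], unsigned long map_size)
{
	unsigned long seg = map_size / NPOOLS;

	for (int i = 0; i < NPOOLS; i++) {
		p[i].start = i * seg;
		p[i].end = (i + 1) * seg;
		p[i].hint = p[i].start;	/* next-fit search starts here */
	}
}

/* Pick a pool by hashing the caller (cpu id in the real code). */
static struct toy_pool *toy_pick_pool(struct toy_pool p[NPOOLS], int cpu)
{
	return &p[cpu % NPOOLS];
}
```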

    Signed-off-by: Sowmini Varadhan
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • I applied the wrong version of this patch series, V4 instead
    of V10, due to a patchwork bundling snafu.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Apr, 2015

3 commits

  • …k/linux-rcu into core/urgent

    Pull RCU fix from Paul E. McKenney:

    "This series contains a single change that fixes Kconfig asking pointless
    questions."

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Pull sparc updates from David Miller:
    "The PowerPC folks have a really nice scalable IOMMU pool allocator
    that we wanted to make use of for sparc. So here we have a series
    that abstracts out their code into a common layer that anyone can make
    use of.

    Sparc is converted, and the PowerPC folks have reviewed and ACK'd this
    series and plan to convert PowerPC over as well"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    iommu-common: Fix PARISC compile-time warnings
    sparc: Make LDC use common iommu poll management functions
    sparc: Make sparc64 use scalable lib/iommu-common.c functions
    sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

    Linus Torvalds
     
  • Fixes warnings due to
    - no DMA_ERROR_CODE on PARISC,
    - sizeof (unsigned long) == 4 bytes on PARISC.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

17 Apr, 2015

12 commits

  • Merge third patchbomb from Andrew Morton:

    - various misc things

    - a couple of lib/ optimisations

    - provide DIV_ROUND_CLOSEST_ULL()

    - checkpatch updates

    - rtc tree

    - befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs

    - ptrace fixes

    - fork() fixes

    - seccomp cleanups

    - more mmap_sem hold time reductions from Davidlohr

    * emailed patches from Andrew Morton : (138 commits)
    proc: show locks in /proc/pid/fdinfo/X
    docs: add missing and new /proc/PID/status file entries, fix typos
    drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic
    Documentation/spi/spidev_test.c: fix warning
    drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type
    .gitignore: ignore *.tar
    MAINTAINERS: add Mediatek SoC mailing list
    tomoyo: reduce mmap_sem hold for mm->exe_file
    powerpc/oprofile: reduce mmap_sem hold for exe_file
    oprofile: reduce mmap_sem hold for mm->exe_file
    mips: ip32: add platform data hooks to use DS1685 driver
    lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text
    x86: switch to using asm-generic for seccomp.h
    sparc: switch to using asm-generic for seccomp.h
    powerpc: switch to using asm-generic for seccomp.h
    parisc: switch to using asm-generic for seccomp.h
    mips: switch to using asm-generic for seccomp.h
    microblaze: use asm-generic for seccomp.h
    arm: use asm-generic for seccomp.h
    seccomp: allow COMPAT sigreturn overrides
    ...

    Linus Torvalds
     
  • Cc: Yalin Wang
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • cpumask_next_and() looks for cpumask_next() in src1 in a loop and
    tests whether the found cpu is also present in src2. Remove that
    loop: perform cpumask_and() of src1 and src2 first, and use the
    resulting mask to find cpumask_next().

    Apart from removing the while loop, ./bloat-o-meter on x86_64 shows:
    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8)
    function             old     new   delta
    cpumask_next_and      62      54      -8
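    On a toy single-word cpumask, the reworked approach looks like this
    (hypothetical helpers, not the kernel API):

```c
#include <assert.h>

#define TOY_NR_CPUS 64

/* First set bit strictly greater than n, or TOY_NR_CPUS if none. */
static int toy_next_bit(unsigned long mask, int n)
{
	for (int i = n + 1; i < TOY_NR_CPUS; i++)
		if (mask & (1UL << i))
			return i;
	return TOY_NR_CPUS;
}

/* New approach: AND the masks once, then do a single search, instead
 * of looping over src1 and re-testing each candidate against src2. */
static int toy_next_and(int n, unsigned long src1, unsigned long src2)
{
	return toy_next_bit(src1 & src2, n);
}
```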

    Signed-off-by: Sergey Senozhatsky
    Cc: Tejun Heo
    Cc: "David S. Miller"
    Cc: Amir Vadai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • bitmap_empty() has its own implementation. But it's clearly as simple as:

    find_first_bit(src, nbits) == nbits

    The same is true for 'bitmap_full'.
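    On a toy single-word bitmap the equivalence is easy to check
    (hypothetical helpers; assumes 64-bit unsigned long):

```c
#include <assert.h>

/* Index of the first set bit below nbits, or nbits if none is set. */
static int toy_find_first_bit(unsigned long word, int nbits)
{
	for (int i = 0; i < nbits; i++)
		if (word & (1UL << i))
			return i;
	return nbits;
}

/* bitmap_empty(src, nbits) == (find_first_bit(src, nbits) == nbits) */
static int toy_bitmap_empty(unsigned long word, int nbits)
{
	return toy_find_first_bit(word, nbits) == nbits;
}
```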

    Signed-off-by: Yury Norov
    Cc: George Spelvin
    Cc: Alexey Klimov
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • I hadn't had enough coffee when I wrote this. Currently, the final
    increment of buf depends on the value loaded from the table, and
    causes gcc to emit a cmov immediately before the return. It is smarter
    to let it depend on r, since the increment can then be computed in
    parallel with the final load/store pair. It also shaves 16 bytes of
    .text.

    Signed-off-by: Rasmus Villemoes
    Cc: Tejun Heo
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • bucket_find_contain() will search the bucket list for a dma_debug_entry.
    When the entry isn't found it needs to search other buckets too, since
    only the start address of a dma range is hashed (which might be in a
    different bucket).

    A copy of the dma_debug_entry is used to get the previous hash
    bucket, but when its list is searched, the original dma_debug_entry
    is to be used, not its modified copy.

    This fixes false "device driver tries to sync DMA memory it has not allocated"
    warnings.

    Signed-off-by: Sebastian Ott
    Cc: Florian Fainelli
    Cc: Horia Geanta
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Ott
     
  • The most expensive part of decimal conversion is the divisions by 10
    (albeit done using reciprocal multiplication with appropriately chosen
    constants). I decided to see if one could eliminate around half of
    these multiplications by emitting two digits at a time, at the cost of a
    200 byte lookup table, and it does indeed seem like there is something
    to be gained, especially on 64 bits. Microbenchmarking shows
    improvements ranging from -50% (for numbers uniformly distributed in [0,
    2^64-1]) to -25% (for numbers heavily biased toward the smaller end, a
    more realistic distribution).

    On a larger scale, perf shows that top, one of the big consumers of /proc
    data, uses 0.5-1.0% fewer cpu cycles.

    I had to jump through some hoops to get the 32 bit code to compile and run
    on my 64 bit machine, so I'm not sure how relevant these numbers are, but
    just for comparison the microbenchmark showed improvements between -30%
    and -10%.

    The bloat-o-meter costs are around 150 bytes (the generated code is a
    little smaller, so it's not the full 200 bytes) on both 32 and 64 bit.
    I'm aware that extra cache misses won't show up in a microbenchmark as
    used above, but on the other hand decimal conversions often happen in bulk
    (for example in the case of top).

    I have of course tested that the new code generates the same output as the
    old, for both the first and last 1e10 numbers in [0,2^64-1] and 4e9
    'random' numbers in-between.

    Test and verification code on github: https://github.com/Villemoes/dec.
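    A stand-alone sketch of the two-digits-at-a-time idea (hypothetical
    names; the kernel's implementation is more elaborate) might be:

```c
#include <assert.h>
#include <string.h>

/* 200-byte "00".."99" lookup table; pair k starts at two_digits + 2*k. */
static const char two_digits[201] =
	"00010203040506070809101112131415161718192021222324"
	"25262728293031323334353637383940414243444546474849"
	"50515253545556575859606162636465666768697071727374"
	"75767778798081828384858687888990919293949596979899";

/* Emit two decimal digits per division by 100, halving the number of
 * (reciprocal-multiplication) divides versus dividing by 10. Digits
 * are collected least-significant first, then reversed into buf. */
static int put_dec(char *buf, unsigned long long n)
{
	char tmp[24];
	int len = 0;

	while (n >= 100) {
		const char *d = two_digits + 2 * (n % 100);
		tmp[len++] = d[1];	/* units */
		tmp[len++] = d[0];	/* tens */
		n /= 100;
	}
	if (n >= 10) {
		const char *d = two_digits + 2 * n;
		tmp[len++] = d[1];
		tmp[len++] = d[0];
	} else {
		tmp[len++] = '0' + (char)n;
	}
	for (int i = 0; i < len; i++)
		buf[i] = tmp[len - 1 - i];
	buf[len] = '\0';
	return len;
}
```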

    Signed-off-by: Rasmus Villemoes
    Tested-by: Jeff Epler
    Cc: "Peter Zijlstra (Intel)"
    Cc: Tejun Heo
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • This file contains the implementations of all find_*_bit{,_le}
    variants, so giving it a more generic name seems reasonable.

    Signed-off-by: Yury Norov
    Reviewed-by: Rasmus Villemoes
    Reviewed-by: George Spelvin
    Cc: Alexey Klimov
    Cc: David S. Miller
    Cc: Daniel Borkmann
    Cc: Hannes Frederic Sowa
    Cc: Lai Jiangshan
    Cc: Mark Salter
    Cc: AKASHI Takahiro
    Cc: Thomas Graf
    Cc: Valentin Rothberg
    Cc: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • Currently the whole 'find_*_bit' family is located in
    lib/find_next_bit.c, except 'find_last_bit', which is in
    lib/find_last_bit.c. There seems to be no major benefit to keeping
    it separate.

    Signed-off-by: Yury Norov
    Reviewed-by: Rasmus Villemoes
    Reviewed-by: George Spelvin
    Cc: Alexey Klimov
    Cc: David S. Miller
    Cc: Daniel Borkmann
    Cc: Hannes Frederic Sowa
    Cc: Lai Jiangshan
    Cc: Mark Salter
    Cc: AKASHI Takahiro
    Cc: Thomas Graf
    Cc: Valentin Rothberg
    Cc: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • This patchset does rework to find_bit function family to achieve better
    performance, and decrease size of text. All rework is done in patch 1.
    Patches 2 and 3 are about code moving and renaming.

    It was boot-tested on x86_64 and MIPS (big-endian) machines.
    Performance tests were run in userspace with code like this:

    /* addr[] is filled from /dev/urandom */
    start = clock();
    while (ret < nbits)
            ret = find_next_bit(addr, nbits, ret + 1);

    end = clock();
    printf("%ld\t", (unsigned long) end - start);

    On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are
    (for find_next_bit, nbits is 8M; for find_first_bit, 80K):

    find_next_bit:        find_first_bit:
    new      current      new      current
    26932    43151        14777    14925
    26947    43182        14521    15423
    26507    43824        15053    14705
    27329    43759        14473    14777
    26895    43367        14847    15023
    26990    43693        15103    15163
    26775    43299        15067    15232
    27282    42752        14544    15121
    27504    43088        14644    14858
    26761    43856        14699    15193
    26692    43075        14781    14681
    27137    42969        14451    15061
    ...      ...          ...      ...

    find_next_bit performance gain is 35-40%;
    find_first_bit - no measurable difference.

    On ARM machine, there is arch-specific implementation for find_bit.

    Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
    helpful discussions.

    This patch (of 3):

    The new implementation takes less space in the source file (see
    diffstat) and in the object: for me it's 710 vs 453 bytes of text.
    It also shows better performance.

    The find_last_bit description was fixed due to an obvious typo.

    [akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
    Signed-off-by: Yury Norov
    Reviewed-by: Rasmus Villemoes
    Reviewed-by: George Spelvin
    Cc: Alexey Klimov
    Cc: David S. Miller
    Cc: Daniel Borkmann
    Cc: Hannes Frederic Sowa
    Cc: Lai Jiangshan
    Cc: Mark Salter
    Cc: AKASHI Takahiro
    Cc: Thomas Graf
    Cc: Valentin Rothberg
    Cc: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • Pull SCSI updates from James Bottomley:
    "This is the usual grab bag of driver updates (lpfc, qla2xxx, storvsc,
    aacraid, ipr) plus an assortment of minor updates. There's also a
    major update to aic1542 which moves the driver into this millennium"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (106 commits)
    change SCSI Maintainer email
    sd, mmc, virtio_blk, string_helpers: fix block size units
    ufs: add support to allow non standard behaviours (quirks)
    ufs-qcom: save controller revision info in internal structure
    qla2xxx: Update driver version to 8.07.00.18-k
    qla2xxx: Restore physical port WWPN only, when port down detected for FA-WWPN port.
    qla2xxx: Fix virtual port configuration, when switch port is disabled/enabled.
    qla2xxx: Prevent multiple firmware dump collection for ISP27XX.
    qla2xxx: Disable Interrupt handshake for ISP27XX.
    qla2xxx: Add debugging info for MBX timeout.
    qla2xxx: Add serdes read/write support for ISP27XX
    qla2xxx: Add udev notification to save fw dump for ISP27XX
    qla2xxx: Add message for successful FW dump collected for ISP27XX.
    qla2xxx: Add support to load firmware from file for ISP 26XX/27XX.
    qla2xxx: Fix beacon blink for ISP27XX.
    qla2xxx: Increase the wait time for firmware to be ready for P3P.
    qla2xxx: Fix crash due to wrong casting of reg for ISP27XX.
    qla2xxx: Fix warnings reported by static checker.
    lpfc: Update version to 10.5.0.0 for upstream patch set
    lpfc: Update copyright to 2015
    ...

    Linus Torvalds
     
  • Investigation of multithreaded iperf experiments on an ethernet
    interface show the iommu->lock as the hottest lock identified by
    lockstat, with something of the order of 21M contentions out of
    27M acquisitions, and an average wait time of 26 us for the lock.
    This is not efficient. A more scalable design is to follow the ppc
    model, where the iommu_table has multiple pools, each stretching
    over a segment of the map, and with a separate lock for each pool.
    This model allows for better parallelization of the iommu map search.

    This patch adds the iommu range alloc/free function infrastructure.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

16 Apr, 2015

1 commit

  • Merge second patchbomb from Andrew Morton:

    - the rest of MM

    - various misc bits

    - add ability to run /sbin/reboot at reboot time

    - printk/vsprintf changes

    - fiddle with seq_printf() return value

    * akpm: (114 commits)
    parisc: remove use of seq_printf return value
    lru_cache: remove use of seq_printf return value
    tracing: remove use of seq_printf return value
    cgroup: remove use of seq_printf return value
    proc: remove use of seq_printf return value
    s390: remove use of seq_printf return value
    cris fasttimer: remove use of seq_printf return value
    cris: remove use of seq_printf return value
    openrisc: remove use of seq_printf return value
    ARM: plat-pxa: remove use of seq_printf return value
    nios2: cpuinfo: remove use of seq_printf return value
    microblaze: mb: remove use of seq_printf return value
    ipc: remove use of seq_printf return value
    rtc: remove use of seq_printf return value
    power: wakeup: remove use of seq_printf return value
    x86: mtrr: if: remove use of seq_printf return value
    linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
    MAINTAINERS: CREDITS: remove Stefano Brivio from B43
    .mailmap: add Ricardo Ribalda
    CREDITS: add Ricardo Ribalda Delgado
    ...

    Linus Torvalds