20 Apr, 2008
1 commit
* Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
and MAXNODES counts.

* In some cases, the cpumask variable was initialized but then overwritten
with another value. This is the case for changes like this:

    - cpumask_t oldmask = CPU_MASK_ALL;
    + cpumask_t oldmask;

Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
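This minimal userspace sketch shows why these initializers matter when NR_CPUS is large. It assumes NR_CPUS = 4096 and uses a hypothetical bitmap type, cpumask_sketch_t, with made-up helpers (mask_clear(), mask_set_cpu(), mask_test_cpu()). None of these are the kernel's cpumask API.

At that size every mask is 512 bytes. A variable that is initialized from an "all CPUs" constant and then immediately overwritten therefore does a redundant 512-byte write. Any temporary the compiler may materialize for the constant also costs stack space.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a large configuration, as on big NUMA systems. */
#define NR_CPUS 4096
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for cpumask_t: one bit per possible CPU. */
typedef struct {
    unsigned long bits[MASK_LONGS];
} cpumask_sketch_t;

/* Write every word of the mask with zero. */
static void mask_clear(cpumask_sketch_t *m)
{
    memset(m->bits, 0, sizeof(m->bits));
}

static void mask_set_cpu(cpumask_sketch_t *m, int cpu)
{
    m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test_cpu(const cpumask_sketch_t *m, int cpu)
{
    return (int)((m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}

/* The pattern from the commit: the mask is completely rewritten before it
 * is read, so dropping the "all CPUs" initializer loses nothing. */
static int only_cpu_is_set(int cpu)
{
    cpumask_sketch_t oldmask;      /* was: = <all-ones constant> */
    mask_clear(&oldmask);          /* overwrites every word */
    mask_set_cpu(&oldmask, cpu);
    return mask_test_cpu(&oldmask, cpu) &&
           !mask_test_cpu(&oldmask, (cpu + 1) % NR_CPUS);
}
```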
05 Mar, 2008
1 commit
Some oprofile results obtained while using tbench on a 2x2 CPU machine were
very surprising.

For example, the loopback_xmit() function was using a high number of CPU
cycles to perform the statistics updates. These are supposed to be really
cheap, since they use per-cpu data:

    pcpu_lstats = netdev_priv(dev);
    lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
    lb_stats->packets++;  /* HERE : serious contention */
    lb_stats->bytes += skb->len;

struct pcpu_lstats is a small structure containing two longs (8 bytes on a
32-bit platform). It appears that on my 32-bit platform, alloc_percpu(8)
packs the per-CPU copies into a single cache line instead of giving each CPU
a separate cache line.

Using the following patch gave me an impressive boost in various benchmarks
(6% in tbench). All percpu_counters hit this bug too.

The long-term fix (i.e. >= 2.6.26) would be to let each CPU allocate its own
block of memory, so that we don't need to round sizes up to L1_CACHE_BYTES,
or of course to merge the SGI stuff...

Note: SLUB vs SLAB matters here to *show* the improvement, since they don't
have the same minimum allocation size (8 bytes vs 32 bytes). This could very
well explain the regressions some people reported when they switched to SLUB.

Signed-off-by: Eric Dumazet
Acked-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
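This small userspace C sketch illustrates the false-sharing problem and the round-up workaround. It assumes 64-byte cache lines and uses hypothetical names (struct percpu_area, percpu_area_init(), percpu_slot()). It is not the kernel's allocator: for simplicity it carves all copies from one aligned buffer.

The sketch shows only the layout. Once each copy is rounded up to a whole cache line, two CPUs' counters can never share a line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 64-byte cache lines; real code would use the platform
 * value (L1_CACHE_BYTES in the kernel). */
#define CACHE_LINE 64

/* Round x up to the next multiple of a (a > 0). */
static size_t round_up(size_t x, size_t a)
{
    return (x + a - 1) / a * a;
}

/* Hypothetical per-CPU area: CPU n's copy starts at base + n * stride,
 * and stride is a whole number of cache lines. */
struct percpu_area {
    unsigned char *base;
    size_t stride;
    int nr_cpus;
};

static int percpu_area_init(struct percpu_area *a, size_t size, int nr_cpus)
{
    if (size == 0 || nr_cpus <= 0)
        return -1;
    a->stride = round_up(size, CACHE_LINE);   /* e.g. 8 bytes -> 64 bytes */
    a->nr_cpus = nr_cpus;
    /* C11 aligned_alloc needs a size that is a multiple of the alignment;
     * stride * nr_cpus is, because stride is. */
    a->base = aligned_alloc(CACHE_LINE, a->stride * (size_t)nr_cpus);
    if (a->base == NULL)
        return -1;
    memset(a->base, 0, a->stride * (size_t)nr_cpus);
    return 0;
}

/* Address of the given CPU's private copy. */
static void *percpu_slot(const struct percpu_area *a, int cpu)
{
    return a->base + (size_t)cpu * a->stride;
}

static void percpu_area_destroy(struct percpu_area *a)
{
    free(a->base);
    a->base = NULL;
}
```

The price is memory: an 8-byte statistics block now occupies a full 64-byte line per CPU. That is why the message above treats rounding up to L1_CACHE_BYTES as a stopgap rather than the long-term fix.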
07 Feb, 2008
1 commit
Instead of allocating a fixed-size array of NR_CPUS pointers for percpu_data,
we can size it with nr_cpu_ids, which is generally smaller than NR_CPUS.

Signed-off-by: Eric Dumazet
Cc: Christoph Lameter
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
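Here is a minimal userspace sketch of the same idea. It assumes NR_CPUS = 4096 and uses hypothetical struct names; nr_cpu_ids is modelled as a plain function argument rather than the kernel variable. A C99 flexible array member lets the pointer array be sized at allocation time instead of at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: a kernel built for many more CPUs than the machine has. */
#define NR_CPUS 4096

/* Before: the pointer array is always NR_CPUS entries long. */
struct percpu_data_fixed {
    void *ptrs[NR_CPUS];
};

/* After (hypothetical layout): a flexible array member sized at run time. */
struct percpu_data_dyn {
    int nr;
    void *ptrs[];
};

/* Bytes needed for a descriptor covering nr_cpu_ids CPUs. */
static size_t percpu_data_dyn_bytes(int nr_cpu_ids)
{
    return sizeof(struct percpu_data_dyn) + (size_t)nr_cpu_ids * sizeof(void *);
}

static struct percpu_data_dyn *percpu_data_dyn_alloc(int nr_cpu_ids)
{
    struct percpu_data_dyn *pd = malloc(percpu_data_dyn_bytes(nr_cpu_ids));
    if (pd == NULL)
        return NULL;
    pd->nr = nr_cpu_ids;
    for (int i = 0; i < nr_cpu_ids; i++)
        pd->ptrs[i] = NULL;   /* no per-CPU block allocated yet */
    return pd;
}
```

On a 64-bit build with 4 possible CPUs, the pointer array shrinks from 32 KiB (4096 * 8 bytes) to a few dozen bytes.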
18 Jul, 2007
1 commit
kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
variant in the past. But with __GFP_ZERO it is now possible to zero memory
while allocating.

Use __GFP_ZERO to remove the explicit clearing of memory via memset wherever
we can.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
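The same change can be shown in userspace with calloc() standing in for a __GFP_ZERO allocation. This is an analogy, not kernel code. In the kernel, the "after" form would be a call such as kmalloc_node(size, GFP_KERNEL | __GFP_ZERO, node), replacing kmalloc_node() followed by memset(). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear the memory explicitly. */
static void *alloc_then_memset(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}

/* After: request zeroed memory from the allocator in a single call,
 * the userspace counterpart of passing __GFP_ZERO. */
static void *alloc_zeroed(size_t n)
{
    return calloc(1, n);
}
```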
08 Dec, 2006
1 commit
The patch (as824b) makes percpu_free() ignore NULL arguments, as one would
expect for a deallocation routine. (Note that free_percpu is #defined as
percpu_free in include/linux/percpu.h.) A few callers are updated to remove
now-unneeded tests for NULL. A few other callers already seem to assume
that passing a NULL pointer to percpu_free() is okay!

The patch also removes an unnecessary NULL check in percpu_depopulate().

Signed-off-by: Alan Stern
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
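The convention adopted here is the one free(NULL) already follows in C. The sketch below uses a hypothetical per-CPU container (struct percpu_blocks, with percpu_blocks_alloc() and percpu_blocks_free(); these are not the kernel's functions) and a hypothetical caller, struct netdev_sketch.

It shows two effects of a NULL-tolerant free. Callers can free unconditionally, and an allocation error path can reuse the same free routine.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-CPU container: one heap block per CPU. */
struct percpu_blocks {
    int nr;          /* number of leading ptrs[] entries that are allocated */
    void *ptrs[];
};

/* Like free(NULL), a NULL argument is simply ignored. */
static void percpu_blocks_free(struct percpu_blocks *pb)
{
    if (pb == NULL)
        return;
    for (int i = 0; i < pb->nr; i++)
        free(pb->ptrs[i]);
    free(pb);
}

static struct percpu_blocks *percpu_blocks_alloc(int nr_cpus, size_t size)
{
    struct percpu_blocks *pb = malloc(sizeof(*pb) + (size_t)nr_cpus * sizeof(void *));
    if (pb == NULL)
        return NULL;
    pb->nr = 0;
    for (int i = 0; i < nr_cpus; i++) {
        pb->ptrs[i] = calloc(1, size);
        if (pb->ptrs[i] == NULL) {
            percpu_blocks_free(pb);  /* releases the i blocks already allocated */
            return NULL;
        }
        pb->nr = i + 1;
    }
    return pb;
}

/* Hypothetical caller whose stats pointer may be NULL if setup failed. */
struct netdev_sketch {
    struct percpu_blocks *stats;
};

/* No "if (dev->stats)" guard is needed before freeing. */
static void netdev_sketch_teardown(struct netdev_sketch *dev)
{
    percpu_blocks_free(dev->stats);
    dev->stats = NULL;
}
```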
26 Sep, 2006
1 commit
The allocpercpu functions __alloc_percpu() and __free_percpu() make heavy
use of the slab allocator. However, conceptually they are not part of slab.
This also simplifies SLOB (at this point SLOB may be broken in mm; this
should fix it).

Signed-off-by: Christoph Lameter
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds