06 Dec, 2009

4 commits

  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
    tracing: Separate raw syscall from syscall tracer
    ring-buffer-benchmark: Add parameters to set produce/consumer priorities
    tracing, function tracer: Clean up strstrip() usage
    ring-buffer benchmark: Run producer/consumer threads at nice +19
    tracing: Remove the stale include/trace/power.h
    tracing: Only print objcopy version warning once from recordmcount
    tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
    ring-buffer: Move access to commit_page up into function used
    tracing: do not disable interrupts for trace_clock_local
    ring-buffer: Add multiple iterations between benchmark timestamps
    kprobes: Sanitize struct kretprobe_instance allocations
    tracing: Fix to use __always_unused attribute
    compiler: Introduce __always_unused
    tracing: Exit with error if a weak function is used in recordmcount.pl
    tracing: Move conditional into update_funcs() in recordmcount.pl
    tracing: Add regex for weak functions in recordmcount.pl
    tracing: Move mcount section search to front of loop in recordmcount.pl
    tracing: Fix objcopy revision check in recordmcount.pl
    tracing: Check absolute path of input file in recordmcount.pl
    tracing: Correct the check for number of arguments in recordmcount.pl
    ...

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    rcu: Make RCU's CPU-stall detector be default
    rcu: Add expedited grace-period support for preemptible RCU
    rcu: Enable fourth level of TREE_RCU hierarchy
    rcu: Rename "quiet" functions
    rcu: Re-arrange code to reduce #ifdef pain
    rcu: Eliminate unneeded function wrapping
    rcu: Fix grace-period-stall bug on large systems with CPU hotplug
    rcu: Eliminate __rcu_pending() false positives
    rcu: Further cleanups of use of lastcomp
    rcu: Simplify association of forced quiescent states with grace periods
    rcu: Accelerate callback processing on CPUs not detecting GP end
    rcu: Mark init-time-only rcu_bootup_announce() as __init
    rcu: Simplify association of quiescent states with grace periods
    rcu: Rename dynticks_completed to completed_fqs
    rcu: Enable synchronize_sched_expedited() fastpath
    rcu: Remove inline from forward-referenced functions
    rcu: Fix note_new_gpnum() uses of ->gpnum
    rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
    rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
    rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
    ...

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ratelimit: Make suppressed output messages more useful
    printk: Remove ratelimit.h from kernel.h
    ratelimit: Fix/allow use in atomic contexts
    ratelimit: Use per ratelimit context locking

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
    x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
    x86/amd-iommu: Remove amd_iommu_pd_table
    x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
    x86/amd-iommu: Cleanup DTE flushing code
    x86/amd-iommu: Introduce iommu_flush_device() function
    x86/amd-iommu: Cleanup attach/detach_device code
    x86/amd-iommu: Keep devices per domain in a list
    x86/amd-iommu: Add device bind reference counting
    x86/amd-iommu: Use dev->arch->iommu to store iommu related information
    x86/amd-iommu: Remove support for domain sharing
    x86/amd-iommu: Rearrange dma_ops related functions
    x86/amd-iommu: Move some pte allocation functions in the right section
    x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
    x86/amd-iommu: Use get_device_id and check_device where appropriate
    x86/amd-iommu: Move find_protection_domain to helper functions
    x86/amd-iommu: Simplify get_device_resources()
    x86/amd-iommu: Let domain_for_device handle aliases
    x86/amd-iommu: Remove iommu specific handling from dma_ops path
    x86/amd-iommu: Remove iommu parameter from __(un)map_single
    x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
    ...

    Linus Torvalds
     

03 Dec, 2009

1 commit

  • The RCU_CPU_STALL_DETECTOR costs almost nothing and has located
    some bugs that might otherwise have been difficult to track
    down. Make it be default for the TREE RCU implementations.

    The vmlinux size impact is limited (on 64-bit x86 defconfig):

       text    data    bss      dec    hex filename
    8440248 1260076 995588 10695912 a334e8 vmlinux.before
    8440774 1260060 995588 10696422 a336e6 vmlinux.after

    +526 bytes of text (+510 bytes in total) - acceptable default cost.

    For RAM starved systems, TINY_RCU does not support CPU-stall detection
    and is much smaller, but then again it is a uniprocessor...

    Signed-off-by: Paul E. McKenney
    Acked-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    [ v2: added image size calculations to the changelog ]
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

20 Nov, 2009

2 commits

  • Don't delete pending pages from the page-store tracking tree, but rather send
    them for another write as they've presumably been updated.

    Signed-off-by: David Howells

    David Howells
     
  • __fscache_write_page() attempts to load the radix tree preallocation pool for
    the CPU it is on before calling radix_tree_insert(), as the insertion must be
    done inside a pair of spinlocks.

    Use of the preallocation pool, however, is contingent on the radix tree being
    initialised without __GFP_WAIT specified. __fscache_acquire_cookie() was
    passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.

    The solution is to AND out __GFP_WAIT.

    Additionally, the banner comment to radix_tree_preload() is altered to make
    note of this prerequisite. Possibly there should be a WARN_ON() too.
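
    The masking step can be sketched in plain C. The flag values and the
    demo_* helpers below are hypothetical stand-ins, not the kernel's
    definitions; only the pattern of clearing one bit from a composite
    mask before initialising the tree is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the kernel's real values differ. */
#define DEMO_GFP_WAIT 0x10u
#define DEMO_GFP_IO   0x40u
#define DEMO_GFP_NOFS (DEMO_GFP_WAIT | DEMO_GFP_IO)

/* The problem described above: the composite mask carries the WAIT bit. */
static_assert((DEMO_GFP_NOFS & DEMO_GFP_WAIT) != 0,
              "in this sketch NOFS includes WAIT");

/* Return the mask with the WAIT bit cleared, leaving all other bits alone. */
static unsigned int demo_mask_out_wait(unsigned int gfp)
{
    return gfp & ~DEMO_GFP_WAIT;
}

/* In this sketch, a tree may use the preload pool only if its mask lacks WAIT. */
static bool demo_tree_may_use_preload(unsigned int tree_gfp)
{
    return (tree_gfp & DEMO_GFP_WAIT) == 0;
}
```

    A tree set up with demo_mask_out_wait(DEMO_GFP_NOFS) passes the
    check, while one set up with DEMO_GFP_NOFS itself does not.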

    Without this fix, I have seen the following recursive deadlock caused by
    radix_tree_insert() attempting to allocate memory inside the spinlocked
    region, which resulted in FS-Cache being called back into to release memory -
    which required the spinlock already held.

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.32-rc6-cachefs #24
    ---------------------------------------------
    nfsiod/7916 is trying to acquire lock:
    (&cookie->lock){+.+.-.}, at: [] __fscache_uncache_page+0xdb/0x160 [fscache]

    but task is already holding lock:
    (&cookie->lock){+.+.-.}, at: [] __fscache_write_page+0x15c/0x3f3 [fscache]

    other info that might help us debug this:
    5 locks held by nfsiod/7916:
    #0: (nfsiod){+.+.+.}, at: [] worker_thread+0x19a/0x2e2
    #1: (&task->u.tk_work#2){+.+.+.}, at: [] worker_thread+0x19a/0x2e2
    #2: (&cookie->lock){+.+.-.}, at: [] __fscache_write_page+0x15c/0x3f3 [fscache]
    #3: (&object->lock#2){+.+.-.}, at: [] __fscache_write_page+0x197/0x3f3 [fscache]
    #4: (&cookie->stores_lock){+.+...}, at: [] __fscache_write_page+0x19f/0x3f3 [fscache]

    stack backtrace:
    Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
    Call Trace:
    [] __lock_acquire+0x1649/0x16e3
    [] ? __lock_acquire+0x7b7/0x16e3
    [] ? dump_trace+0x248/0x257
    [] lock_acquire+0x57/0x6d
    [] ? __fscache_uncache_page+0xdb/0x160 [fscache]
    [] _spin_lock+0x2c/0x3b
    [] ? __fscache_uncache_page+0xdb/0x160 [fscache]
    [] __fscache_uncache_page+0xdb/0x160 [fscache]
    [] ? __fscache_check_page_write+0x0/0x71 [fscache]
    [] nfs_fscache_release_page+0x86/0xc4 [nfs]
    [] nfs_release_page+0x3c/0x41 [nfs]
    [] try_to_release_page+0x32/0x3b
    [] shrink_page_list+0x316/0x4ac
    [] ? mark_held_locks+0x52/0x70
    [] ? _spin_unlock_irq+0x2b/0x31
    [] shrink_inactive_list+0x392/0x67c
    [] ? mark_held_locks+0x52/0x70
    [] shrink_list+0x8d/0x8f
    [] shrink_zone+0x278/0x33c
    [] ? ktime_get_ts+0xad/0xba
    [] try_to_free_pages+0x22e/0x392
    [] ? isolate_pages_global+0x0/0x212
    [] __alloc_pages_nodemask+0x3dc/0x5cf
    [] cache_alloc_refill+0x34d/0x6c1
    [] ? radix_tree_node_alloc+0x52/0x5c
    [] kmem_cache_alloc+0xb2/0x118
    [] radix_tree_node_alloc+0x52/0x5c
    [] radix_tree_insert+0x57/0x19c
    [] __fscache_write_page+0x1e3/0x3f3 [fscache]
    [] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
    [] nfs_readpage_release+0x34/0x9b [nfs]
    [] nfs_readpage_release_full+0x32/0x4b [nfs]
    [] rpc_release_calldata+0x12/0x14 [sunrpc]
    [] rpc_free_task+0x59/0x61 [sunrpc]
    [] rpc_async_release+0x10/0x12 [sunrpc]
    [] worker_thread+0x1ef/0x2e2
    [] ? worker_thread+0x19a/0x2e2
    [] ? thread_return+0x3e/0x101
    [] ? rpc_async_release+0x0/0x12 [sunrpc]
    [] ? autoremove_wake_function+0x0/0x34
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? worker_thread+0x0/0x2e2
    [] kthread+0x7a/0x82
    [] child_rip+0xa/0x20
    [] ? restore_args+0x0/0x30
    [] ? add_wait_queue+0x15/0x44
    [] ? kthread+0x0/0x82
    [] ? child_rip+0x0/0x20

    Signed-off-by: David Howells

    David Howells
     

19 Nov, 2009

1 commit

  • Doing the strcmp return value as

    signed char __res = *cs - *ct;

    is wrong for two reasons. The subtraction can overflow because __res
    doesn't use a type big enough. Moreover the compared bytes should be
    interpreted as unsigned char as specified by POSIX.

    The same problem is fixed in strncmp.
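
    A minimal userspace sketch of a comparison that avoids both problems
    follows. The name demo_strcmp() is hypothetical, and this is not
    necessarily the exact code of the patch: it reads each byte as
    unsigned char and returns -1, 0 or 1 instead of a narrow difference.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two NUL-terminated strings, treating each byte as unsigned char. */
static int demo_strcmp(const char *cs, const char *ct)
{
    unsigned char c1, c2;

    assert(cs != NULL && ct != NULL);
    for (;;) {
        c1 = (unsigned char)*cs++;
        c2 = (unsigned char)*ct++;
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;  /* no subtraction, so nothing to overflow */
        if (c1 == '\0')
            return 0;                 /* both strings ended together */
    }
}
```

    With this version, a string starting with byte 0x01 sorts before one
    starting with byte 0xff, as the unsigned interpretation requires.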

    Signed-off-by: Uwe Kleine-König
    Cc: Michael Buesch
    Cc: Andreas Schwab
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Nov, 2009

1 commit

  • POWERPC doesn't expect it to be used.

    This fixes the linux-next build failure reported by
    Stephen Rothwell:

    lib/swiotlb.c: In function 'setup_io_tlb_npages':
    lib/swiotlb.c:114: error: 'swiotlb' undeclared (first use in this function)

    Reported-by: Stephen Rothwell
    Signed-off-by: FUJITA Tomonori
    Cc: peterz@infradead.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    FUJITA Tomonori
     

10 Nov, 2009

3 commits

  • If HW IOMMU initialization fails (Intel VT-d often does this,
    typically due to BIOS bugs), we fall back to nommu. That doesn't
    work on most systems, since machines nowadays have more than 4GB
    of memory, so we must use swiotlb instead of nommu.

    The problem is that it's too late to initialize swiotlb when HW
    IOMMU initialization fails. We need to allocate swiotlb memory
    earlier from bootmem allocator. Chris explained the issue in
    detail:

    http://marc.info/?l=linux-kernel&m=125657444317079&w=2

    The current x86 IOMMU initialization sequence is too complicated
    and handling the above issue makes it more hacky.

    This patch changes x86 IOMMU initialization sequence to handle
    the above issue cleanly.

    The new x86 IOMMU initialization sequence is:

    1. we initialize swiotlb (and set swiotlb to 1) in the case
    of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
    swiotlb_dma_ops or nommu_dma_ops. If swiotlb usage is forced by
    the boot option, we finish here.

    2. we call the detection functions of all the IOMMUs.

    3. the detection function sets x86_init.iommu.iommu_init to the
    IOMMU initialization function (so we can avoid calling the
    initialization functions of all the IOMMUs needlessly).

    4. if the IOMMU initialization function doesn't need swiotlb,
    it sets swiotlb to zero (e.g. when the initialization is
    successful).

    5. if we find that swiotlb is set to zero, we free the swiotlb
    resource.
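
    The five steps above can be sketched as a small userspace model.
    Every name below (struct demo_state, demo_iommu_setup() and the two
    demo_hw_iommu_init_* routines) is hypothetical, and treating a
    forced swiotlb as always enabled is an assumption of this sketch,
    not something the changelog states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state touched by the five steps. */
struct demo_state {
    bool swiotlb;         /* the "swiotlb is needed" flag */
    bool swiotlb_memory;  /* early bounce-buffer memory is still allocated */
    int (*iommu_init)(struct demo_state *s);  /* set by detection (step 3) */
};

/* A hardware IOMMU init that succeeds and so clears the flag (step 4). */
static int demo_hw_iommu_init_ok(struct demo_state *s)
{
    s->swiotlb = false;
    return 0;
}

/* A hardware IOMMU init that fails; the flag stays set as the fallback. */
static int demo_hw_iommu_init_fail(struct demo_state *s)
{
    (void)s;
    return -1;
}

static void demo_iommu_setup(struct demo_state *s, bool big_memory,
                             bool no_iommu, bool forced_swiotlb,
                             int (*detected_init)(struct demo_state *))
{
    assert(s != NULL);

    /* Step 1: allocate swiotlb early, while boot-time memory is available.
     * Assumption: forcing swiotlb always enables it. */
    s->swiotlb = forced_swiotlb || (big_memory && !no_iommu);
    s->swiotlb_memory = s->swiotlb;
    s->iommu_init = NULL;
    if (forced_swiotlb)
        return;

    /* Steps 2-3: detection records which init function to call, if any. */
    s->iommu_init = detected_init;

    /* Step 4: run only the detected init function. */
    if (s->iommu_init != NULL)
        s->iommu_init(s);

    /* Step 5: release the early allocation if it turned out to be unneeded. */
    if (!s->swiotlb)
        s->swiotlb_memory = false;
}
```

    In this model a failing hardware init leaves swiotlb in place, which
    is the fallback the old sequence could not provide.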

    Signed-off-by: FUJITA Tomonori
    Cc: chrisw@sous-sol.org
    Cc: dwmw2@infradead.org
    Cc: joerg.roedel@amd.com
    Cc: muli@il.ibm.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    FUJITA Tomonori
     
  • This enables us to avoid printing swiotlb memory info when we
    initialize swiotlb. After swiotlb initialization, we could find
    that we don't need swiotlb.

    This patch removes the code to print swiotlb memory info in
    swiotlb_init() and exports the function to do that.

    Signed-off-by: FUJITA Tomonori
    Cc: chrisw@sous-sol.org
    Cc: dwmw2@infradead.org
    Cc: joerg.roedel@amd.com
    Cc: muli@il.ibm.com
    Cc: tony.luck@intel.com
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    [ -v2: merge up conflict ]
    Signed-off-by: Ingo Molnar

    FUJITA Tomonori
     
  • The swiotlb_free() function frees all memory allocated for swiotlb.

    We need to initialize swiotlb before IOMMU initialization (x86
    and powerpc need to allocate memory from the bootmem allocator). If
    IOMMU initialization is successful, we need to free the swiotlb
    resource (we don't want to waste 64MB).

    Signed-off-by: FUJITA Tomonori
    Cc: chrisw@sous-sol.org
    Cc: dwmw2@infradead.org
    Cc: joerg.roedel@amd.com
    Cc: muli@il.ibm.com
    LKML-Reference:
    [ -v2: build fix for the !CONFIG_SWIOTLB case ]
    Signed-off-by: Ingo Molnar

    FUJITA Tomonori
     

06 Nov, 2009

1 commit


29 Oct, 2009

2 commits


27 Oct, 2009

1 commit


23 Oct, 2009

1 commit

  • Today I got:

    [39648.224782] Registered led device: iwl-phy0::TX
    [40676.545099] __ratelimit: 246 callbacks suppressed
    [40676.545103] abcdef[23675]: segfault at 0 ...

    As you can see, the ratelimit message contains a function prefix.
    Since this is always __ratelimit, this won't help much.

    This patch changes __ratelimit and printk_ratelimit to print the
    function name that calls ratelimit.

    This will pinpoint the responsible function, as long as several
    different places do not call ratelimit with the same ratelimit state
    at the same time. In that case we catch only one random function that
    calls ratelimit after the wait period.

    Signed-off-by: Christian Borntraeger
    Cc: Dave Young
    Cc: Linus Torvalds
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Christian Borntraeger
     

13 Oct, 2009

1 commit


12 Oct, 2009

2 commits

  • * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (21 commits)
    [S390] dasd: fix race condition in resume code
    [S390] Add EX_TABLE for addressing exception in usercopy functions.
    [S390] 64-bit register support for 31-bit processes
    [S390] hibernate: Use correct place for CPU address in lowcore
    [S390] pm: ignore time spend in suspended state
    [S390] zcrypt: Improve some comments
    [S390] zcrypt: Fix sparse warning.
    [S390] perf_counter: fix vdso detection
    [S390] ftrace: drop nmi protection
    [S390] compat: fix truncate system call wrapper
    [S390] Provide arch specific mdelay implementation.
    [S390] Fix enabled udelay for short delays.
    [S390] cio: allow setting boxed devices offline
    [S390] cio: make not operational handling consistent
    [S390] cio: make disconnected handling consistent
    [S390] Fix memory leak in /proc/cio_ignore
    [S390] cio: channel path memory leak
    [S390] module: fix memory leak in s390 module loader
    [S390] Enable kmemleak on s390.
    [S390] 3270 console build fix
    ...

    Linus Torvalds
     
  • Now that m68k's task_thread_info() no longer refers to current,
    it's possible to remove sched.h from interrupt.h and not break m68k!
    Many thanks to Heiko Carstens for allowing this.

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     

06 Oct, 2009

1 commit


02 Oct, 2009

1 commit

  • When using %*s, sscanf should honor conversion specifiers immediately
    following the %*s. For example, the following code should find the
    position of the end of the string "hello".

    int end;
    char buf[] = "hello world";
    sscanf(buf, "%*s%n", &end);
    printf("%d\n", end);

    Ideally, sscanf would advance the fmt and str pointers the same as it
    would without the *, but the code for that is rather complicated and is
    not included in the patch.
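
    The same format string can be tried against a userspace C library,
    where %*s followed by %n already behaves as described. The helper
    name demo_first_word_end() is hypothetical, and this sketch
    exercises libc's sscanf(), not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return the offset just past the first whitespace-delimited word,
 * or -1 if the input holds no word at all. */
static int demo_first_word_end(const char *buf)
{
    int end = -1;

    assert(buf != NULL);
    /* %*s consumes a word without storing it; %n then stores the number
     * of characters consumed so far. If no word is found, the scan stops
     * before %n and end keeps its initial value. */
    sscanf(buf, "%*s%n", &end);
    return end;
}
```

    For the buffer "hello world" this returns 5, the position of the
    end of "hello".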

    Signed-off-by: Andy Spencer
    Acked-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Spencer
     

01 Oct, 2009

1 commit


29 Sep, 2009

1 commit

  • Currently we are calling the bkl tracepoint callbacks just before the
    bkl lock/unlock operations, ie the tracepoint call is not inside a
    lock_kernel() function but inside a lock_kernel() macro. Hence the
    bkl trace event header must be included from smp_lock.h. This raises
    some nasty circular header dependencies:

    linux/smp_lock.h -> trace/events/bkl.h -> trace/define_trace.h
    -> trace/ftrace.h -> linux/ftrace_event.h -> linux/hardirq.h
    -> linux/smp_lock.h

    This results in incomplete event declarations, spurious event
    definitions and other kinds of odd behaviour.

    This is hardly fixable without ugly workarounds. So instead, we push
    the file name, line number and function name as lock_kernel()
    parameters, so that we only deal with the trace event header from
    lib/kernel_lock.c

    This adds two parameters to lock_kernel() and unlock_kernel() but
    it should be fine with respect to performance because this pair does
    not seem to be called in fast paths.
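
    The approach of capturing the call site in a macro and handing it
    to an out-of-line function can be sketched in userspace C. All
    demo_* names are hypothetical, and the recorded struct merely
    stands in for a trace event; the kernel's actual signatures may
    differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the most recent call site (stand-in for a trace event). */
struct demo_callsite {
    const char *file;
    int line;
    const char *func;
};

static struct demo_callsite demo_last_lock;
static int demo_lock_depth = -1;  /* -1 means "not held" in this sketch */

/* Out-of-line function: the only place that would need the trace header. */
static void demo_lock_kernel_at(const char *file, int line, const char *func)
{
    assert(file != NULL && func != NULL);
    demo_last_lock.file = file;
    demo_last_lock.line = line;
    demo_last_lock.func = func;
    demo_lock_depth++;
}

static void demo_unlock_kernel_at(const char *file, int line, const char *func)
{
    (void)file; (void)line; (void)func;  /* a real tracer would record these */
    assert(demo_lock_depth >= 0);
    demo_lock_depth--;
}

/* The macros only capture the call site; they pull in no tracing header. */
#define demo_lock_kernel()   demo_lock_kernel_at(__FILE__, __LINE__, __func__)
#define demo_unlock_kernel() demo_unlock_kernel_at(__FILE__, __LINE__, __func__)

/* Example caller: returns the depth observed while the lock was held. */
static int demo_caller(void)
{
    int depth;

    demo_lock_kernel();
    depth = demo_lock_depth;
    demo_unlock_kernel();
    return depth;
}
```

    After demo_caller() runs, demo_last_lock.func names the function
    that took the lock rather than the locking helper itself.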

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Li Zefan

    Frederic Weisbecker
     

25 Sep, 2009

1 commit


24 Sep, 2009

3 commits

  • If the lzma/gzip decompressors are called with insufficient input data
    (len > 0 && fill == NULL), they will attempt to call the fill function to
    obtain more data, leading to a kernel oops.
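
    The kind of guard this calls for can be sketched as follows. The
    struct, the callback type and demo_next_byte() are hypothetical and
    far simpler than the real decompressors; the point is only that
    running out of input with no fill callback is reported as an error
    instead of calling through a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refill callback: returns bytes written into buf, <= 0 if none. */
typedef int (*demo_fill_fn)(void *buf, unsigned int size);

struct demo_input {
    unsigned char *buf;  /* input buffer */
    unsigned int cap;    /* capacity of buf */
    unsigned int pos;    /* next byte to read */
    unsigned int len;    /* number of valid bytes in buf */
    demo_fill_fn fill;   /* NULL when all data was supplied up front */
};

/* Return the next input byte, or -1 when no more data can be obtained. */
static int demo_next_byte(struct demo_input *in)
{
    int n;

    assert(in != NULL && in->buf != NULL);
    if (in->pos < in->len)
        return in->buf[in->pos++];

    if (in->fill == NULL)
        return -1;  /* out of data and no way to fetch more */

    n = in->fill(in->buf, in->cap);
    if (n <= 0)
        return -1;  /* the callback had nothing to give */
    in->pos = 0;
    in->len = (unsigned int)n;
    return in->buf[in->pos++];
}
```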

    Signed-off-by: Phillip Lougher
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phillip Lougher
     
  • Add two events, lock_kernel() and unlock_kernel(), to trace the bkl uses.
    This opens the door for userspace tools to gather statistics about
    the callsites that use it, dependencies with other locks (by pairing
    the trace with lock events), recursive use and so on...

    The {__reacquire,release}_kernel_lock() events are not traced because
    these are called from schedule, thus the sched events are sufficient
    to trace them.

    Example of a trace:

    hald-addon-stor-4152 [000] 165.875501: unlock_kernel: depth: 0, fs/block_dev.c:1358 __blkdev_put()
    hald-addon-stor-4152 [000] 167.832974: lock_kernel: depth: 0, fs/block_dev.c:1167 __blkdev_get()

    How to get the callsites that acquire it recursively:

    cd /debug/tracing/events/bkl
    echo "lock_depth > 0" > filter

    firefox-4951 [001] 206.276967: unlock_kernel: depth: 1, fs/reiserfs/super.c:575 reiserfs_dirty_inode()

    You can also filter by file and/or line.

    v2: Use of FILTER_PTR_STRING attribute for files and lines fields to
    make them traceable.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Li Zefan

    Frederic Weisbecker
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
    Use macros for .data.page_aligned section.
    Use macros for .bss.page_aligned section.
    Use new __init_task_data macro in arch init_task.c files.
    kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
    arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
    kbuild: add static to prototypes
    kbuild: fail build if recordmcount.pl fails
    kbuild: set -fconserve-stack option for gcc 4.5
    kbuild: echo the record_mcount command
    gconfig: disable "typeahead find" search in treeviews
    kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
    checkincludes.pl: add option to remove duplicates in place
    markup_oops: use modinfo to avoid confusion with underscored module names
    checkincludes.pl: provide usage helper
    checkincludes.pl: close file as soon as we're done with it
    ctags: usability fix
    kernel hacking: move STRIP_ASM_SYMS from General
    gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
    kbuild: Check if linker supports the -X option
    kbuild: introduce ld-option
    ...

    Fix trivial conflict in scripts/basic/fixdep.c

    Linus Torvalds
     

23 Sep, 2009

1 commit


22 Sep, 2009

11 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck:
    kmemcheck: add missing braces to do-while in kmemcheck_annotate_bitfield
    kmemcheck: update documentation
    kmemcheck: depend on HAVE_ARCH_KMEMCHECK
    kmemcheck: remove useless check
    kmemcheck: remove duplicated #include

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    trivial: fix typo in aic7xxx comment
    trivial: fix comment typo in drivers/ata/pata_hpt37x.c
    trivial: typo in kernel-parameters.txt
    trivial: fix typo in tracing documentation
    trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
    trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
    trivial: remove unnecessary semicolons
    trivial: Fix duplicated word "options" in comment
    trivial: kbuild: remove extraneous blank line after declaration of usage()
    trivial: improve help text for mm debug config options
    trivial: doc: hpfall: accept disk device to unload as argument
    trivial: doc: hpfall: reduce risk that hpfall can do harm
    trivial: SubmittingPatches: Fix reference to renumbered step
    trivial: fix typos "man[ae]g?ment" -> "management"
    trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
    trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
    trivial: fix missing printk space in amd_k7_smp_check
    trivial: fix typo s/ketymap/keymap/ in comment
    trivial: fix typo "to to" in multiple files
    trivial: fix typos in comments s/DGBU/DBGU/
    ...

    Linus Torvalds
     
  • Decouple kernel.h from ratelimit.h: the global declaration of
    printk's ratelimit_state is not needed, and it leads to messy
    circular dependencies due to ratelimit.h's (new) adding of a
    spinlock_types.h include.

    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: David S. Miller
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Add kerneldoc annotations for function formals of type struct flex_array
    and gfp_t which are currently lacking.

    Signed-off-by: David Rientjes
    Cc: Dave Hansen
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • FLEX_ARRAY_INIT(element_size, total_nr_elements) cannot determine if
    either parameter is valid, so flex arrays which are statically allocated
    with this interface can easily become corrupted or reference beyond their
    allocated memory.

    This removes FLEX_ARRAY_INIT() as a struct flex_array initializer since no
    initializer may perform the required checking. Instead, the array is now
    defined with a new interface:

    DEFINE_FLEX_ARRAY(name, element_size, total_nr_elements)

    This may be prefixed with `static' for file scope.

    This interface includes compile-time checking of the parameters to ensure
    they are valid. Since the validity of both element_size and
    total_nr_elements depends on FLEX_ARRAY_BASE_SIZE and FLEX_ARRAY_PART_SIZE,
    the kernel build will fail if either of these predefined values changes
    such that the array parameters are no longer valid.

    Since BUILD_BUG_ON() requires compile time constants, several of the
    static inline functions that were once local to lib/flex_array.c had to be
    moved to include/linux/flex_array.h.
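
    A rough userspace analogue of such a checked definition, using C11
    static_assert in place of BUILD_BUG_ON(), might look like this. The
    sizes, the struct layout and the DEMO_* names are assumptions made
    for the sketch and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for illustration; not the kernel's values. */
#define DEMO_PART_SIZE  4096u  /* bytes per second-level part */
#define DEMO_MAX_PARTS  64u    /* part pointers that fit in the base */

#define DEMO_ELEMS_PER_PART(size) (DEMO_PART_SIZE / (size))
#define DEMO_MAX_ELEMS(size)      (DEMO_ELEMS_PER_PART(size) * DEMO_MAX_PARTS)

struct demo_flex_array {
    unsigned int element_size;
    unsigned int total_nr_elements;
    void *parts[DEMO_MAX_PARTS];
};

/* Define an array and reject invalid parameters at compile time.
 * The checks follow the definition so that a leading `static' applies
 * to the array itself. */
#define DEMO_DEFINE_FLEX_ARRAY(name, size, total)                  \
    struct demo_flex_array name = {                                \
        .element_size = (size),                                    \
        .total_nr_elements = (total),                              \
    };                                                             \
    static_assert((size) > 0 && (size) <= DEMO_PART_SIZE,          \
                  "element_size must fit in one part");            \
    static_assert((total) <= DEMO_MAX_ELEMS(size),                 \
                  "total_nr_elements exceeds the array's capacity")

/* 16-byte elements: 256 per part, 16384 at most, so 1000 is accepted. */
static DEMO_DEFINE_FLEX_ARRAY(demo_ok, 16, 1000);
```

    Changing the last argument to a value above 16384, or shrinking
    DEMO_PART_SIZE far enough, makes the build fail instead of
    producing an array that can be overrun.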

    Signed-off-by: David Rientjes
    Acked-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Add a new function to the flex_array API:

    int flex_array_shrink(struct flex_array *fa)

    This function will free all unused second-level pages. Since elements are
    now poisoned if they are not allocated with __GFP_ZERO, it's possible to
    identify parts that consist solely of unused elements.

    flex_array_shrink() returns the number of pages freed.

    Signed-off-by: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Newly initialized flex_array's and/or flex_array_part's are now poisoned
    with a new poison value, FLEX_ARRAY_FREE. Its value is similar to
    POISON_FREE used in the various slab allocators, but is different to
    distinguish between flex array's poisoned kmem and slab allocator poisoned
    kmem.

    This will allow us to identify flex_array_part's that only contain free
    elements (and free them with an addition to the flex_array API). This
    could also be extended in the future to identify `get' uses on elements
    that have not been `put'.

    If __GFP_ZERO is passed for a part's gfp mask, the poisoning is avoided.
    These elements are considered to be in-use since they have been
    initialized.
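
    How poisoning makes a free part recognisable can be sketched in
    userspace C. The poison byte, the part size and the demo_* helpers
    are assumptions of this sketch rather than the kernel's
    definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_POISON_FREE 0x6c  /* hypothetical poison byte for this sketch */
#define DEMO_PART_SIZE   64    /* hypothetical part size in bytes */

struct demo_part {
    unsigned char bytes[DEMO_PART_SIZE];
};

/* Initialise a part: zero it if the caller asked for zeroing, else poison it. */
static void demo_part_init(struct demo_part *p, bool zero)
{
    assert(p != NULL);
    memset(p->bytes, zero ? 0 : DEMO_POISON_FREE, sizeof(p->bytes));
}

/* A part counts as free only if every byte still carries the poison value. */
static bool demo_part_is_free(const struct demo_part *p)
{
    size_t i;

    assert(p != NULL);
    for (i = 0; i < sizeof(p->bytes); i++)
        if (p->bytes[i] != DEMO_POISON_FREE)
            return false;
    return true;
}
```

    A zeroed part never reads as free here, matching the rule that
    zero-initialized elements are considered in use.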

    Signed-off-by: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Add a new function to the flex_array API:

    int flex_array_clear(struct flex_array *fa,
    unsigned int element_nr)

    This function will zero the element at element_nr in the flex_array.

    Although this is equivalent to using flex_array_put() and passing a
    pointer to zero'd memory, flex_array_clear() does not require such a
    pointer to memory that would most likely need to be allocated on the
    caller's stack which could be significantly large depending on
    element_size.

    Signed-off-by: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Signed-off-by: Marcin Slusarz
    Reviewed-by: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Slusarz
     
  • I'd like to use printk_ratelimit() in NMI context, but it's not
    robust right now due to spinlock usage in lib/ratelimit.c. If an
    NMI is unlucky enough to hit just that spot we might lock up trying
    to take the spinlock again.

    Fix that by using a trylock variant. If we contend on that lock we
    can genuinely skip the message because the state is just being
    accessed by another CPU (or by this CPU).

    ( We could use atomics for the suppressed messages field, but
    I doubt it matters in practice and it makes the code heavier. )
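
    The trylock idea can be sketched in userspace C. Here a C11
    atomic_flag only stands in for the spinlock's trylock operation;
    the struct, its fields and demo_ratelimit() are hypothetical, and
    the time-window reset of a real rate limiter is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ratelimit {
    atomic_flag busy;  /* per-state lock, only ever taken with a try */
    int burst;         /* messages allowed before suppression starts */
    int printed;       /* messages allowed so far */
    int missed;        /* messages suppressed so far */
};

/* Return true if the caller may print. Never spins or blocks. */
static bool demo_ratelimit(struct demo_ratelimit *rs)
{
    bool ok;

    assert(rs != NULL);
    if (atomic_flag_test_and_set(&rs->busy))
        return false;  /* contended: skip the message rather than wait */

    if (rs->printed < rs->burst) {
        rs->printed++;
        ok = true;
    } else {
        rs->missed++;
        ok = false;
    }

    atomic_flag_clear(&rs->busy);
    return ok;
}
```

    Because a contended call returns without touching the counters, a
    re-entrant caller on the same CPU cannot deadlock on the state it
    interrupted.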

    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: David S. Miller
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • I'd like to use printk_ratelimit() in atomic context, but that's
    not possible right now due to the spinlock usage this commit
    introduced more than a year ago:

    717115e: printk ratelimiting rewrite

    As a first step, push the lock into the ratelimit state structure.
    This lets a locking failure be treated as an event related to that
    state being too busy.

    Also clean up the code a bit (without changing functionality):

    - tidy up the definitions

    - clean up the code flow

    This also shrinks the code a tiny bit:

    text data bss dec hex filename
     264    0   4 268 10c ratelimit.o.before
     255    0   0 255  ff ratelimit.o.after

    ( Whole-kernel data size got a bit larger, because we have
    two ratelimit-state data structures right now. )

    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: David S. Miller
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar