15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the remaining one-off uses of the __cpuinit macros
    from all C files in the drivers/* directory.

    [1] https://lkml.org/lkml/2013/5/20/589

    Cc: Greg Kroah-Hartman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as it's module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesytem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

23 Feb, 2013

1 commit


09 Oct, 2012

1 commit

  • Some security modules and oprofile still uses VM_EXECUTABLE for retrieving
    a task's executable file. After this patch they will use mm->exe_file
    directly. mm->exe_file is protected with mm->mmap_sem, so locking stays
    the same.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Chris Metcalf [arch/tile]
    Acked-by: Tetsuo Handa [tomoyo]
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Acked-by: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

27 Aug, 2012

1 commit

  • Under certain workloads we see the following warnings:

    WQ on CPU0, prefer CPU1
    WQ on CPU0, prefer CPU2
    WQ on CPU0, prefer CPU3

    It warns the user that the wq to access a per-cpu buffers runs not on
    the same cpu. This happens if the wq is rescheduled on a different cpu
    than where the buffer is located. This was probably implemented to
    detect performance issues. Not sure if there actually is one as the
    buffers are copied to a single buffer anyway which should be the
    actual bottleneck.

    We wont change WQ implementation. Since a user can do nothing the
    warning is pointless. Removing it.

    Cc: Andi Kleen
    Signed-off-by: Robert Richter

    Robert Richter
     

22 Jun, 2012

1 commit

  • This changes oprofile_perf.c to use the per-cpu framework.

    Using the per-cpu framework should avoid error like the following:

    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:28:28: error: variably modified 'perf_events' at file scope

    Reported-by: William Cohen
    Cc: Will Deacon
    Signed-off-by: Robert Richter

    Robert Richter
     

21 Jun, 2012

1 commit

  • The OProfile perf backend uses a static array to keep track of the
    perf events on the system. When compiling with CONFIG_CPUMASK_OFFSTACK=y
    && SMP, nr_cpumask_bits is not a compile-time constant and the build
    will fail with:

    oprofile_perf.c:28: error: variably modified 'perf_events' at file scope

    This patch uses NR_CPUs instead of nr_cpumask_bits for the array
    initialisation. If this causes space problems in the future, we can
    always move to dynamic allocation for the events array.

    Cc: Matt Fleming
    Reported-by: Russell King - ARM Linux
    Signed-off-by: Will Deacon
    Cc: # v2.6.37+
    Signed-off-by: Robert Richter

    Will Deacon
     

06 Apr, 2012

1 commit

  • Many users of debugfs copy the implementation of default_open() when
    they want to support a custom read/write function op. This leads to a
    proliferation of the default_open() implementation across the entire
    tree.

    Now that the common implementation has been consolidated into libfs we
    can replace all the users of this function with simple_open().

    This replacement was done with the following semantic patch:

    @ open @
    identifier open_f != simple_open;
    identifier i, f;
    @@
    -int open_f(struct inode *i, struct file *f)
    -{
    (
    -if (i->i_private)
    -f->private_data = i->i_private;
    |
    -f->private_data = i->i_private;
    )
    -return 0;
    -}

    @ has_open depends on open @
    identifier fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ...
    -.open = open_f,
    +.open = simple_open,
    ...
    };

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stephen Boyd
    Cc: Greg Kroah-Hartman
    Cc: Al Viro
    Cc: Julia Lawall
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     

21 Mar, 2012

2 commits


07 Jan, 2012

1 commit

  • * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits)
    perf kvm: Fix copy & paste error in description
    perf script: Kill script_spec__delete
    perf top: Fix a memory leak
    perf stat: Introduce get_ratio_color() helper
    perf session: Remove impossible condition check
    perf tools: Fix feature-bits rework fallout, remove unused variable
    perf script: Add generic perl handler to process events
    perf tools: Use for_each_set_bit() to iterate over feature flags
    perf tools: Unify handling of features when writing feature section
    perf report: Accept fifos as input file
    perf tools: Moving code in some files
    perf tools: Fix out-of-bound access to struct perf_session
    perf tools: Continue processing header on unknown features
    perf tools: Improve macros for struct feature_ops
    perf: builtin-record: Document and check that mmap_pages must be a power of two.
    perf: builtin-record: Provide advice if mmap'ing fails with EPERM.
    perf tools: Fix truncated annotation
    perf script: look up thread using tid instead of pid
    perf tools: Look up thread names for system wide profiling
    perf tools: Fix comm for processes with named threads
    ...

    Linus Torvalds
     

20 Dec, 2011

2 commits


07 Dec, 2011

1 commit


15 Nov, 2011

2 commits


04 Nov, 2011

3 commits

  • The legacy x86 nmi watchdog code was removed with the implementation
    of the perf based nmi watchdog. This broke Oprofile's nmi timer
    mode. To run nmi timer mode we relied on a continuous ticking nmi
    source which the nmi watchdog provided. The nmi tick was no longer
    available and current watchdog can not be used anymore since it runs
    with very long periods in the range of seconds. This patch
    reimplements the nmi timer mode using a perf counter nmi source.

    V2:
    * removing pr_info()
    * fix undefined reference to `__udivdi3' for 32 bit build
    * fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb
    * removed nmi timer setup in arch/x86
    * implemented function stubs for op_nmi_init/exit()
    * made code more readable in oprofile_init()

    V3:
    * fix architectural initialization in oprofile_init()
    * fix CONFIG_OPROFILE_NMI_TIMER dependencies

    Acked-by: Peter Zijlstra
    Signed-off-by: Robert Richter

    Robert Richter
     
  • Remove exit functions by moving init/exit code to oprofile's setup/
    shutdown functions. Doing so the oprofile module exit code will be
    easier and less error-prone.

    Signed-off-by: Robert Richter

    Robert Richter
     
  • Oprofile may crash in a KVM guest while unlaoding modules. This
    happens if oprofile_arch_init() fails and oprofile switches to the hr
    timer mode as a fallback. In this case oprofile_arch_exit() is called,
    but it never was initialized properly which causes the crash. This
    patch fixes this.

    oprofile: using timer interrupt.
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] unregister_syscore_ops+0x41/0x58
    PGD 41da3f067 PUD 41d80e067 PMD 0
    Oops: 0002 [#1] PREEMPT SMP
    CPU 5
    Modules linked in: oprofile(-)

    Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d #18 Advanced Micro Device Anaheim/Anaheim
    RIP: 0010:[] [] unregister_syscore_ops+0x41/0x58
    RSP: 0018:ffff88041de1de98 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200
    RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
    RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082
    R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080
    R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
    FS: 00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040)
    Stack:
    ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e
    ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993
    ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200
    Call Trace:
    [] op_nmi_exit+0x15/0x17 [oprofile]
    [] oprofile_arch_exit+0xe/0x10 [oprofile]
    [] oprofile_exit+0x13/0x15 [oprofile]
    [] sys_delete_module+0x1c3/0x22f
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b
    Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
    89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
    RIP [] unregister_syscore_ops+0x41/0x58
    RSP
    CR2: 0000000000000008
    ---[ end trace 06d4e95b6aa3b437 ]---

    CC: stable@kernel.org # 2.6.37+
    Signed-off-by: Robert Richter

    Robert Richter
     

13 Sep, 2011

1 commit

  • The oprofilefs_lock can be taken in atomic context (in profiling
    interrupts) and therefore cannot cannot be preempted on -rt -
    annotate it.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

22 Jul, 2011

1 commit


01 Jul, 2011

1 commit

  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and derived hardware breakpoints APIs) and storing it in the perf_event.
    The field can be accessed from the callback as event->overflow_handler_context.
    All callers are updated.

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     

31 May, 2011

2 commits

  • This fixes the A->B/B->A locking dependency, see the warning below.

    The function task_exit_notify() is called with (task_exit_notifier)
    .rwsem set and then calls sync_buffer() which locks buffer_mutex. In
    sync_start() the buffer_mutex was set to prevent notifier functions to
    be started before sync_start() is finished. But when registering the
    notifier, (task_exit_notifier).rwsem is locked too, but now in
    different order than in sync_buffer(). In theory this causes a locking
    dependency, what does not occur in practice since task_exit_notify()
    is always called after the notifier is registered which means the lock
    is already released.

    However, after checking the notifier functions it turned out the
    buffer_mutex in sync_start() is unnecessary. This is because
    sync_buffer() may be called from the notifiers even if sync_start()
    did not finish yet, the buffers are already allocated but empty. No
    need to protect this with the mutex.

    So we fix this theoretical locking dependency by removing buffer_mutex
    in sync_start(). This is similar to the implementation before commit:

    750d857 oprofile: fix crash when accessing freed task structs

    which introduced the locking dependency.

    Lockdep warning:

    oprofiled/4447 is trying to acquire lock:
    (buffer_mutex){+.+...}, at: [] sync_buffer+0x31/0x3ec [oprofile]

    but task is already holding lock:
    ((task_exit_notifier).rwsem){++++..}, at: [] __blocking_notifier_call_chain+0x39/0x67

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 ((task_exit_notifier).rwsem){++++..}:
    [] lock_acquire+0xf8/0x11e
    [] down_write+0x44/0x67
    [] blocking_notifier_chain_register+0x52/0x8b
    [] profile_event_register+0x2d/0x2f
    [] sync_start+0x47/0xc6 [oprofile]
    [] oprofile_setup+0x60/0xa5 [oprofile]
    [] event_buffer_open+0x59/0x8c [oprofile]
    [] __dentry_open+0x1eb/0x308
    [] nameidata_to_filp+0x60/0x67
    [] do_last+0x5be/0x6b2
    [] path_openat+0xc7/0x360
    [] do_filp_open+0x3d/0x8c
    [] do_sys_open+0x110/0x1a9
    [] sys_open+0x20/0x22
    [] system_call_fastpath+0x16/0x1b

    -> #0 (buffer_mutex){+.+...}:
    [] __lock_acquire+0x1085/0x1711
    [] lock_acquire+0xf8/0x11e
    [] mutex_lock_nested+0x63/0x309
    [] sync_buffer+0x31/0x3ec [oprofile]
    [] task_exit_notify+0x16/0x1a [oprofile]
    [] notifier_call_chain+0x37/0x63
    [] __blocking_notifier_call_chain+0x50/0x67
    [] blocking_notifier_call_chain+0x14/0x16
    [] profile_task_exit+0x1a/0x1c
    [] do_exit+0x2a/0x6fc
    [] do_group_exit+0x83/0xae
    [] sys_exit_group+0x17/0x1b
    [] system_call_fastpath+0x16/0x1b

    other info that might help us debug this:

    1 lock held by oprofiled/4447:
    #0: ((task_exit_notifier).rwsem){++++..}, at: [] __blocking_notifier_call_chain+0x39/0x67

    stack backtrace:
    Pid: 4447, comm: oprofiled Not tainted 2.6.39-00007-gcf4d8d4 #10
    Call Trace:
    [] print_circular_bug+0xae/0xbc
    [] __lock_acquire+0x1085/0x1711
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] lock_acquire+0xf8/0x11e
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] ? mark_lock+0x42f/0x552
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] mutex_lock_nested+0x63/0x309
    [] ? sync_buffer+0x31/0x3ec [oprofile]
    [] sync_buffer+0x31/0x3ec [oprofile]
    [] ? __blocking_notifier_call_chain+0x39/0x67
    [] ? __blocking_notifier_call_chain+0x39/0x67
    [] task_exit_notify+0x16/0x1a [oprofile]
    [] notifier_call_chain+0x37/0x63
    [] __blocking_notifier_call_chain+0x50/0x67
    [] blocking_notifier_call_chain+0x14/0x16
    [] profile_task_exit+0x1a/0x1c
    [] do_exit+0x2a/0x6fc
    [] ? retint_swapgs+0xe/0x13
    [] do_group_exit+0x83/0xae
    [] sys_exit_group+0x17/0x1b
    [] system_call_fastpath+0x16/0x1b

    Reported-by: Marcin Slusarz
    Cc: Carl Love
    Cc: # .36+
    Signed-off-by: Robert Richter

    Robert Richter
     
  • After registering the task free notifier we possibly have tasks in our
    dying_tasks list. Free them after unregistering the notifier in case
    of an error.

    Cc: # .36+
    Signed-off-by: Robert Richter

    Robert Richter
     

24 May, 2011

1 commit


15 Feb, 2011

3 commits

  • This patch is a rework of the hwsampler oprofile implementation that
    has been applied recently. Now there are less non-architectural
    changes. The only changes are:

    * introduction of oprofile_add_ext_hw_sample(), and
    * removal of section attributes of oprofile_timer_init/_exit().

    To setup hwsampler for oprofile we need to modify start()/stop()
    callbacks and additional hwsampler control files in oprofilefs. We do
    not reinitialize the timer or hwsampler mode by restarting calling
    init/exit() anymore, instead hwsampler_running is used to switch the
    mode directly in oprofile_hwsampler_start/_stop(). For locking reasons
    there is also hwsampler_file that reflects the value in oprofilefs.

    The overall diffstat of the oprofile s390 hwsampler implemenation
    shows the low impact to non-architectural code:

    arch/Kconfig | 3 +
    arch/s390/Kconfig | 1 +
    arch/s390/oprofile/Makefile | 2 +-
    arch/s390/oprofile/hwsampler.c | 1256 ++++++++++++++++++++++++++++++++++
    arch/s390/oprofile/hwsampler.h | 113 +++
    arch/s390/oprofile/hwsampler_files.c | 162 +++++
    arch/s390/oprofile/init.c | 6 +-
    drivers/oprofile/cpu_buffer.c | 24 +-
    drivers/oprofile/timer_int.c | 4 +-
    include/linux/oprofile.h | 7 +
    10 files changed, 1567 insertions(+), 11 deletions(-)

    Acked-by: Heiko Carstens
    Signed-off-by: Robert Richter

    Robert Richter
     
  • OProfile is enhanced to export all files for controlling System z's
    hardware sampling, and to invoke hwsampler exported functions to
    initialize and use System z's hardware sampling.

    The patch invokes hwsampler_setup() during oprofile init and exports
    following hwsampler files under oprofilefs if hwsampler's setup
    succeeded:

    A new directory for hardware sampling based files

    /dev/oprofile/hwsampling/

    The userland daemon must explicitly write to the following files
    to disable (or enable) hardware based sampling

    /dev/oprofile/hwsampling/hwsampler

    to modify the actual sampling rate

    /dev/oprofile/hwsampling/hw_interval

    to modify the amount of sampling memory (measured in 4K pages)

    /dev/oprofile/hwsampling/hw_sdbt_blocks

    The following files are read only and show
    the possible minimum sampling rate

    /dev/oprofile/hwsampling/hw_min_interval

    the possible maximum sampling rate

    /dev/oprofile/hwsampling/hw_max_interval

    The patch splits the oprofile_timer_[init/exit] function so that it
    can be also called through user context (oprofilefs) to avoid kernel
    oops.

    Applied with following changes:
    * whitespace changes in Makefile and timer_int.c

    Signed-off-by: Mahesh Salgaonkar
    Signed-off-by: Maran Pakkirisamy
    Signed-off-by: Heinz Graalfs
    Acked-by: Heiko Carstens
    Signed-off-by: Robert Richter

    Heinz Graalfs
     
  • This patch introduces a new oprofile sample add function
    (oprofile_add_ext_hw_sample) that can also take task_struct as an
    argument, which is used by the hwsampler kernel module when copying
    hardware samples to OProfile buffers.

    Applied with following changes:
    * removed #include
    * whitespace changes
    * removed conditional compilation (CONFIG_HAVE_HWSAMPLER)
    * modified order of functions
    * fix missing function definition in header file

    Signed-off-by: Mahesh Salgaonkar
    Signed-off-by: Maran Pakkirisamy
    Signed-off-by: Heinz Graalfs
    Acked-by: Heiko Carstens
    Signed-off-by: Robert Richter

    Heinz Graalfs
     

31 Oct, 2010

1 commit

  • …nel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    jump label: Add work around to i386 gcc asm goto bug
    x86, ftrace: Use safe noops, drop trap test
    jump_label: Fix unaligned traps on sparc.
    jump label: Make arch_jump_label_text_poke_early() optional
    jump label: Fix error with preempt disable holding mutex
    oprofile: Remove deprecated use of flush_scheduled_work()
    oprofile: Fix the hang while taking the cpu offline
    jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex
    jump label: Fix module __init section race

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Check irq_remapped instead of remapping_enabled in destroy_irq()

    Linus Torvalds
     

30 Oct, 2010

1 commit


29 Oct, 2010

3 commits

  • flush_scheduled_work() is deprecated and scheduled to be removed.
    sync_stop() currently cancels cpu_buffer works inside buffer_mutex and
    flushes the system workqueue outside. Instead, split end_cpu_work()
    into two parts - stopping further work enqueues and flushing works -
    and do the former inside buffer_mutex and latter outside.

    For stable kernels v2.6.35.y and v2.6.36.y.

    Signed-off-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Robert Richter

    Tejun Heo
     
  • The kernel build with CONFIG_OPROFILE and CPU_HOTPLUG enabled.
    The oprofile is initialised using system timer in absence of hardware
    counters supports. Oprofile isn't started from userland.

    In this setup while doing a CPU offline the kernel hangs in infinite
    for loop inside lock_hrtimer_base() function

    This happens because as part of oprofile_cpu_notify(, it tries to
    stop an hrtimer which was never started. These per-cpu hrtimers
    are started when the oprfile is started.
    echo 1 > /dev/oprofile/enable

    This problem also existwhen the cpu is booted with maxcpus parameter
    set. When bringing the remaining cpus online the timers are started
    even if oprofile is not yet enabled.

    This patch fix this issue by adding a state variable so that
    these hrtimer start/stop is only attempted when oprofile is
    started

    For stable kernels v2.6.35.y and v2.6.36.y.

    Reported-by: Jan Sebastien
    Tested-by: sricharan
    Signed-off-by: Santosh Shilimkar
    Cc: stable@kernel.org
    Signed-off-by: Robert Richter

    Santosh Shilimkar
     
  • Signed-off-by: Al Viro

    Al Viro
     

26 Oct, 2010

1 commit

  • Instead of always assigning an increasing inode number in new_inode
    move the call to assign it into those callers that actually need it.
    For now callers that need it is estimated conservatively, that is
    the call is added to all filesystems that do not assign an i_ino
    by themselves. For a few more filesystems we can avoid assigning
    any inode number given that they aren't user visible, and for others
    it could be done lazily when an inode number is actually needed,
    but that's left for later patches.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Christoph Hellwig
     

23 Oct, 2010

1 commit

  • * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    vfs: make no_llseek the default
    vfs: don't use BKL in default_llseek
    llseek: automatically add .llseek fop
    libfs: use generic_file_llseek for simple_attr
    mac80211: disallow seeks in minstrel debug code
    lirc: make chardev nonseekable
    viotape: use noop_llseek
    raw: use explicit llseek file operations
    ibmasmfs: use generic_file_llseek
    spufs: use llseek in all file operations
    arm/omap: use generic_file_llseek in iommu_debug
    lkdtm: use generic_file_llseek in debugfs
    net/wireless: use generic_file_llseek in debugfs
    drm: use noop_llseek

    Linus Torvalds
     

15 Oct, 2010

4 commits

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     
  • Make !CONFIG_PM function stubs static inline and remove section
    attribute.

    Signed-off-by: Robert Richter

    Robert Richter
     
  • Commit e9677b3ce (oprofile, ARM: Use oprofile_arch_exit() to
    cleanup on failure) caused oprofile_perf_exit to be called
    in the cleanup path of oprofile_perf_init. The __exit tag
    for oprofile_perf_exit should therefore be dropped.

    The same has to be done for exit_driverfs as well, as this
    function is called from oprofile_perf_exit. Else, we get
    the following two linker errors.

    LD .tmp_vmlinux1
    `oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
    make: *** [.tmp_vmlinux1] Error 1

    LD .tmp_vmlinux1
    `exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
    make: *** [.tmp_vmlinux1] Error 1

    Signed-off-by: Anand Gadiyar
    Cc: Will Deacon
    Signed-off-by: Robert Richter

    Anand Gadiyar
     
  • oprofile_perf.c needs to include platform_device.h
    Otherwise we get the following build break.

    CC arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.o
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: 'struct platform_device' declared inside parameter list
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: its scope is only this definition or declaration, which is probably not what you want
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:201: warning: 'struct platform_device' declared inside parameter list
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:210: error: variable 'oprofile_driver' has initializer but incomplete type
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: unknown field 'driver' specified in initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: extra brace group at end of initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: excess elements in struct initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: error: unknown field 'resume' specified in initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: excess elements in struct initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: error: unknown field 'suspend' specified in initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: excess elements in struct initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c: In function 'init_driverfs':

    Signed-off-by: Anand Gadiyar
    Cc: Matt Fleming
    Cc: Will Deacon
    Signed-off-by: Robert Richter

    Anand Gadiyar