07 Jul, 2009

2 commits

  • do_execve() and ptrace_attach() return -EINTR if
    mutex_lock_interruptible(->cred_guard_mutex) fails.

    This is not right, change the code to return ERESTARTNOINTR.

    Perhaps we should also change proc_pid_attr_write().

    Signed-off-by: Oleg Nesterov
    Cc: David Howells
    Acked-by: Roland McGrath
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • These warnings were observed on MIPS32 using 2.6.31-rc1 and gcc-4.2.0:

    mm/page_alloc.c: In function 'alloc_pages_exact':
    mm/page_alloc.c:1986: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast

    drivers/usb/mon/mon_bin.c: In function 'mon_alloc_buff':
    drivers/usb/mon/mon_bin.c:1264: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast

    [akpm@linux-foundation.org: fix kernel/perf_counter.c too]
    Signed-off-by: Kevin Cernekee
    Cc: Andi Kleen
    Cc: Ralf Baechle
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Cernekee
     

01 Jul, 2009

4 commits

  • * 'kmemleak' of git://linux-arm.org/linux-2.6:
    kmemleak: Inform kmemleak about pid_hash
    kmemleak: Do not warn if an unknown object is freed
    kmemleak: Do not report new leaked objects if the scanning was stopped
    kmemleak: Slightly change the policy on newly allocated objects
    kmemleak: Do not trigger a scan when reading the debug/kmemleak file
    kmemleak: Simplify the reports logged by the scanning thread
    kmemleak: Enable task stacks scanning by default
    kmemleak: Allow the early log buffer to be configurable.

    Linus Torvalds
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
    perf report: Add --symbols parameter
    perf report: Add --comms parameter
    perf report: Add --dsos parameter
    perf_counter tools: Adjust only prelinked symbol's addresses
    perf_counter: Provide a way to enable counters on exec
    perf_counter tools: Reduce perf stat measurement overhead/skew
    perf stat: Use percentages for scaling output
    perf_counter, x86: Update x86_pmu after WARN()
    perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
    perf stat: Improve output
    perf stat: Fix multi-run stats
    perf stat: Add -n/--null option to run without counters
    perf_counter tools: Remove dead code
    perf_counter: Complete counter swap
    perf report: Print sorted callchains per histogram entries
    perf_counter tools: Prepare a small callchain framework
    perf record: Fix unhandled io return value
    perf_counter tools: Add alias for 'l1d' and 'l1i'
    perf-report: Add bare minimum PERF_EVENT_READ parsing
    perf-report: Add modes for inherited stats and no-samples
    ...

    Linus Torvalds
     
  • The file opened in acct_on and freshly stored in the ns->bacct struct can
    be closed in acct_file_reopen by a concurrent call after we release
    acct_lock and before we call mntput(file->f_path.mnt).

    Record file->f_path.mnt in a local variable and use this variable only.

    Signed-off-by: Renaud Lottiaux
    Signed-off-by: Louis Rilling
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Renaud Lottiaux
     
  • When the 32-bit signed quantities get assigned to the u64 resource_size_t,
    they are incorrectly sign-extended.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13253
    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9905

    Signed-off-by: Zhang Rui
    Reported-by: Leann Ogasawara
    Cc: Pierre Ossman
    Reported-by:
    Tested-by:
    Cc:
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Rui
     

30 Jun, 2009

2 commits


29 Jun, 2009

3 commits

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, delay: tsc based udelay should have rdtsc_barrier
    x86, setup: correct include file in <asm/boot.h>
    x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h>
    x86, mce: percpu mcheck_timer should be pinned
    x86: Add sysctl to allow panic on IOCK NMI error
    x86: Fix uv bau sending buffer initialization
    x86, mce: Fix mce resume on 32bit
    x86: Move init_gbpages() to setup_arch()
    x86: ensure percpu lpage doesn't consume too much vmalloc space
    x86: implement percpu_alloc kernel parameter
    x86: fix pageattr handling for lpage percpu allocator and re-enable it
    x86: reorganize cpa_process_alias()
    x86: prepare setup_pcpu_lpage() for pageattr fix
    x86: rename remap percpu first chunk allocator to lpage
    x86: fix duplicate free in setup_pcpu_remap() failure path
    percpu: fix too lazy vunmap cache flushing
    x86: Set cpu_llc_id on AMD CPUs

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timer stats: Optimize by adding quick check to avoid function calls
    timers: Fix timer_migration interface which accepts any number as input

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ftrace: Fix the output of profile
    ring-buffer: Make it generally available
    ftrace: Remove duplicate newline
    tracing: Fix trace_buf_size boot option
    ftrace: Fix t_hash_start()
    ftrace: Don't manipulate @pos in t_start()
    ftrace: Don't increment @pos in g_start()
    tracing: Reset iterator in t_start()
    trace_stat: Don't increment @pos in seq start()
    tracing_bprintk: Don't increment @pos in t_start()
    tracing/events: Don't increment @pos in s_start()

    Linus Torvalds
     

26 Jun, 2009

8 commits

  • Complete the counter swap by indeed switching the times too and
    updating the userpage after modifying the counter values.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The first entry of the ftrace profile was always skipped when
    reading trace_stat/functionX.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • This patch introduces a new sysctl:

    /proc/sys/kernel/panic_on_io_nmi

    which defaults to 0 (off).

    When enabled, the kernel panics when the kernel receives an NMI
    caused by an IO error.

    The IO error triggered NMI indicates a serious system
    condition, which could result in IO data corruption. Rather
    than contiuing, panicing and dumping might be a better choice,
    so one can figure out what's causing the IO error.

    This could be especially important to companies running IO
    intensive applications where corruption must be avoided, e.g. a
    bank's databases.

    [ SuSE has been shipping it for a while, it was done at the
    request of a large database vendor, for their users. ]

    Signed-off-by: Kurt Garloff
    Signed-off-by: Roberto Angelino
    Signed-off-by: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Kurt Garloff
     
  • The PERF_EVENT_READ implementation made me realize we don't
    actually need the sample_type int the output sample, since
    we already have that in the perf_counter_attr information.

    Therefore, remove the PERF_EVENT_MISC_OVERFLOW bit and the
    event->type overloading, and imply put counter overflow
    samples in a PERF_EVENT_SAMPLE type.

    This also fixes the issue that event->type was only 32-bit
    and sample_type had 64 usable bits.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • With the introduction of PERF_EVENT_READ we have the
    possibility to provide accurate counter values for
    individual tasks in a task hierarchy.

    However, due to the lazy context switching used for similar
    counter contexts our current per task counts are way off.

    In order to maintain some of the lazy switch benefits we
    don't disable it out-right, but simply iterate the active
    counters and flip the values between the contexts.

    This only reads the counters but does not need to reprogram
    the full PMU.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide a read() like event which can be used to log the
    counter value at specific sites such as child->parent
    folding on exit.

    In order to be useful, we log the counter parent ID, not the
    actual counter ID, since userspace can only relate parent
    IDs to perf_counter_attr constructs.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Update the mmap control page with the needed information to
    use the userspace RDPMC instruction for self monitoring.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add the needed time scale to the self-profile mmap information.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Jun, 2009

6 commits

  • Yanmin noticed that fault_in_user_writeable() requests 4 pages instead
    of one.

    That's the result of blindly trusting Linus' proposal :) I even looked
    up the prototype to verify the correctness: the argument in question
    is confusingly enough named "len" while in reality it means number of
    pages.

    Pointed-out-by: Yanmin Zhang
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • In hunting down the cause for the hwlat_detector ring buffer spew in
    my failed -next builds it became obvious that folks are now treating
    ring_buffer as something that is generic independent of tracing and thus,
    suitable for public driver consumption.

    Given that there are only a few minor areas in ring_buffer that have any
    reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
    those and make it generally available.

    Signed-off-by: Paul Mundt
    Cc: Jon Masters
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mundt
     
  • Before:
    # echo 'sys_open:traceon:' > set_ftrace_filter
    # echo 'sys_close:traceoff:5' > set_ftrace_filter
    # cat set_ftrace_filter
    #### all functions enabled ####
    sys_open:traceon:unlimited

    sys_close:traceoff:count=0

    After:
    # cat set_ftrace_filter
    #### all functions enabled ####
    sys_open:traceon:unlimited
    sys_close:traceoff:count=0

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • …/{vfs-2.6,audit-current}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    another race fix in jfs_check_acl()
    Get "no acls for this inode" right, fix shmem breakage
    inline functions left without protection of ifdef (acl)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
    audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL

    Linus Torvalds
     
  • Even though one cannot make use of the audit watch code without
    CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that
    the audit rule filtering requires that it at least be compiled.

    Thus build the audit_watch code when we build auditfilter like it was
    before cfcad62c74abfef83762dc05a556d21bdf3980a2

    Clearly this is a point of potential future cleanup..

    Reported-by: Frans Pop
    Signed-off-by: Eric Paris
    Signed-off-by: Al Viro

    Eric Paris
     
  • commit 64d1304a64 (futex: setup writeable mapping for futex ops which
    modify user space data) did address only half of the problem of write
    access faults.

    The patch was made on two wrong assumptions:

    1) access_ok(VERIFY_WRITE,...) would actually check write access.

    On x86 it does _NOT_. It's a pure address range check.

    2) a RW mapped region can not go away under us.

    That's wrong as well. Nobody can prevent another thread to call
    mprotect(PROT_READ) on that region where the futex resides. If that
    call hits between the get_user_pages_fast() verification and the
    actual write access in the atomic region we are toast again.

    The solution is to not rely on access_ok and get_user() for any write
    access related fault on private and shared futexes. Instead we need to
    fault it in with verification of write access.

    There is no generic non destructive write mechanism which would fault
    the user page in trough a #PF, but as we already know that we will
    fault we can as well call get_user_pages() directly and avoid the #PF
    overhead.

    If get_user_pages() returns -EFAULT we know that we can not fix it
    anymore and need to bail out to user space.

    Remove a bunch of confusing comments on this issue as well.

    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
     

24 Jun, 2009

15 commits

  • We should be able to specify [KMG] when setting trace_buf_size
    boot option, as documented in kernel-parameters.txt

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • When the kernel is configured with CONFIG_TIMER_STATS but timer
    stats are runtime disabled we still get calls to
    __timer_stats_timer_set_start_info which initializes some
    fields in the corresponding struct timer_list.

    So add some quick checks in the the timer stats setup functions
    to avoid function calls to __timer_stats_timer_set_start_info
    when timer stats are disabled.

    In an artificial workload that does nothing but playing ping
    pong with a single tcp packet via loopback this decreases cpu
    consumption by 1 - 1.5%.

    This is part of a modified function trace output on SLES11:

    perl-2497 [00] 28630647177732388 [+ 125]: sk_reset_timer
    Cc: Andrew Morton
    Cc: Martin Schwidefsky
    Cc: Mustafa Mesanovic
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     
  • When the output of set_ftrace_filter is larger than PAGE_SIZE,
    t_hash_start() will be called the 2nd time, and then we start
    from the head of a hlist, which is wrong and causes some entries
    to be outputed twice.

    The worse is, if the hlist is large enough, reading set_ftrace_filter
    won't stop but in a dead loop.

    Reviewed-by: Liming Wang
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • It's rather confusing that in t_start(), in some cases @pos is
    incremented, and in some cases it's decremented and then incremented.

    This patch rewrites t_start() in a much more general way.

    Thus we fix a bug that if ftrace_filtered == 1, functions have tracer
    hooks won't be printed, because the branch is always unreachable:

    static void *t_start(...)
    {
    ...
    if (!p)
    return t_hash_start(m, pos);
    return p;
    }

    Before:
    # echo 'sys_open' > /mnt/tracing/set_ftrace_filter
    # echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter
    sys_open

    After:
    # echo 'sys_open' > /mnt/tracing/set_ftrace_filter
    # echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter
    sys_open
    sys_write:traceon:count=4

    Reviewed-by: Liming Wang
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • It's wrong to increment @pos in g_start(). It causes some entries
    lost when reading set_graph_function, if the output of the file
    is larger than PAGE_SIZE.

    Reviewed-by: Liming Wang
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • The iterator is m->private, but it's not reset to trace_types in
    t_start(). If the output is larger than PAGE_SIZE and t_start()
    is called the 2nd time, things will go wrong.

    Reviewed-by: Liming Wang
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • It's wrong to increment @pos in stat_seq_start(). It causes some
    stat entries lost when reading stat file, if the output of the file
    is larger than PAGE_SIZE.

    Reviewed-by: Liming Wang
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • It's wrong to increment @pos in t_start(), otherwise we'll lose
    some entries when reading printk_formats, if the output is larger
    than PAGE_SIZE.

    Reported-by: Lai Jiangshan
    Reviewed-by: Liming Wang
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • While testing syscall tracepoints posted by Jason, I found 3 entries
    were missing when reading available_events. The output size of
    available_events is < 4 pages, which means we lost 1 entry per page.

    The cause is, it's wrong to increment @pos in s_start().

    Actually there's another bug here -- reading avaiable_events/set_events
    can race with module unload:

    # cat available_events |
    s_start() |
    s_stop() |
    | # rmmod foo.ko
    s_start() |
    call = list_entry(m->private) |

    @call might be freed and accessing it will lead to crash.

    Reviewed-by: Liming Wang
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • If syscall removes the root of subtree being watched, we
    definitely do not want the rules refering that subtree
    to be destroyed without the syscall in question having
    a chance to match them.

    Signed-off-by: Al Viro

    Al Viro
     
  • A number of places in the audit system we send an op= followed by a string
    that includes spaces. Somehow this works but it's just wrong. This patch
    moves all of those that I could find to be quoted.

    Example:

    Change From: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
    subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=remove rule
    key="number2" list=4 res=0

    Change To: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
    subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="remove rule"
    key="number2" list=4 res=0

    Signed-off-by: Eric Paris

    Eric Paris
     
  • audit_get_nd() is only used by audit_watch and could be more cleanly
    implemented by having the audit watch functions call it when needed rather
    than making the generic audit rule parsing code deal with those objects.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • In preparation for converting audit to use fsnotify instead of inotify we
    seperate the inode watching code into it's own file. This is similar to
    how the audit tree watching code is already seperated into audit_tree.c

    Signed-off-by: Eric Paris

    Eric Paris
     
  • audit_receive_skb is hard to clearly parse what it is doing to the netlink
    message. Clean the function up so it is easy and clear to see what is going
    on.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • The audit handling of netlink messages is all over the place. Clean things
    up, use predetermined macros, generally make it more readable.

    Signed-off-by: Eric Paris

    Eric Paris