14 Sep, 2010

11 commits

  • The set_ftrace_filter uses seq_file and reads from two lists. The
    pointer returned by t_next() can either be of type struct dyn_ftrace
    or struct ftrace_func_probe. If there is a bug (there was one)
    the wrong pointer may be used and the reference can cause an oops.

    This patch makes t_next() and friends return only the iterator structure,
    which now has one pointer of type struct dyn_ftrace and one of type struct
    ftrace_func_probe. t_show() can now test whether each pointer is NULL;
    if a pointer is set, it is guaranteed to be of the correct type.

    Now if there's a bug, only wrong data will be shown but not an oops.

    Cc: Chris Wright
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • After the filtered functions are read, the probed functions are read
    from the hash in set_ftrace_filter. When the hashed probed functions
    are read, the *pos passed in is reset. Instead of modifying the pos
    given to the read function, just record the pos where the filtered
    functions ended and subtract from that.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • * 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Improve latencies under load by decreasing minimum scheduling granularity

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
    m68k,m68knommu: Wire up fanotify_init, fanotify_mark, and prlimit64

    Linus Torvalds
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] fix siglock

    Quoth Tony:

    "I committed the fix for this last week prior to your -rc4 announcement
    reminding us to give proper "Reported-by:" credit. This one should have
    had:

    Reported-by: Tony Ernst

    and also

    Much-useful-investigation-and-tracing-by: Hedi Berriche
    Much-useful-investigation-and-tracing-by: Petr Tesarik "

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: prevent possible memory corruption in cifs_demultiplex_thread
    cifs: eliminate some more premature cifsd exits
    cifs: prevent cifsd from exiting prematurely
    [CIFS] ntlmv2/ntlmssp remove-unused-function CalcNTLMv2_partial_mac_key
    cifs: eliminate redundant xdev check in cifs_rename
    Revert "[CIFS] Fix ntlmv2 auth with ntlmssp"
    Revert "missing changes during ntlmv2/ntlmssp auth and sign"
    Revert "Eliminate sparse warning - bad constant expression"
    Revert "[CIFS] Eliminate unused variable warning"

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: Don't use dotl version of mknod for dotu inode operations
    fs/9p: Use the correct dentry operations
    9p: Check for NULL fid in v9fs_dir_release()
    fs/9p: Fix error handling in v9fs_get_sb
    fs/9p, net/9p: memory leak fixes

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    dquot: do full inode dirty in allocating space

    Linus Torvalds
     
  • * 'next-spi' of git://git.secretlab.ca/git/linux-2.6:
    spi/pl022: move probe call to subsys_initcall()
    powerpc/5200: mpc52xx_uart.c: Add of_node_put to avoid memory leak
    spi/pl022: fix APB pclk power regression on U300
    spi/spi_s3c64xx: Warn if PIO transfers time out
    spi/s3c64xx: Fix incorrect reuse of 'val' local variable.
    spi/s3c64xx: Fix compilation warning
    spi/dw_spi: clean the cs_control code
    spi/dw_spi: Allow interrupt sharing
    spi/spi_s3c64xx: Increase dead reckoning time in wait_for_xfer()
    spi/spi_s3c64xx: Move to subsys_initcall()
    spi: free children in spi_unregister_master, not siblings
    gpiolib: Add 'struct gpio_chip' forward declaration for !GPIOLIB case
    of: Fix missing includes - ll_temac
    spi/spi_s3c64xx: Staticise non-exported functions
    spi/spi_s3c64xx: Make probe more robust against missing board config

    Linus Torvalds
     
  • Signed-off-by: Geert Uytterhoeven
    Acked-by: Greg Ungerer

    Geert Uytterhoeven
     
  • Mathieu reported bad latencies with make -j10 kind of kbuild
    workloads - which is mostly caused by us scheduling with a
    too coarse granularity.

    Reduce the minimum granularity some more, to make sure we
    can meet the latency target.

    I got the following results (make -j10 kbuild load, average of 3
    runs):

    vanilla:

    maximum latency: 38278.9 µs
    average latency: 7730.1 µs

    patched:

    maximum latency: 22702.1 µs
    average latency: 6684.8 µs

    Mathieu also measured it:

    |
    | * wakeup-latency.c (SIGEV_THREAD) with make -j10
    |
    | - Mainline 2.6.35.2 kernel
    |
    | maximum latency: 45762.1 µs
    | average latency: 7348.6 µs
    |
    | - With only Peter's smaller min_gran (shown below):
    |
    | maximum latency: 29100.6 µs
    | average latency: 6684.1 µs
    |

    Reported-by: Mathieu Desnoyers
    Reported-by: Linus Torvalds
    Acked-by: Mathieu Desnoyers
    Suggested-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
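
A minimal sketch of the set_ftrace_filter iterator idea from the first commit in this day's list, in plain userspace C. The names func_rec, probe_rec, filter_iter and iter_show are hypothetical stand-ins, not the kernel's struct dyn_ftrace, struct ftrace_func_probe or t_show(). The point is only that one typed slot per list lets the show routine test for NULL instead of guessing what a pointer refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct dyn_ftrace and struct ftrace_func_probe. */
struct func_rec  { const char *name; };
struct probe_rec { const char *name; const char *action; };

/* One typed slot per source list; the "next" step sets at most one of them. */
struct filter_iter {
    const struct func_rec  *func;
    const struct probe_rec *probe;
};

/*
 * Format the current entry into buf. Returns the snprintf() result,
 * or 0 with an empty string when neither slot is set.
 */
int iter_show(const struct filter_iter *it, char *buf, size_t len)
{
    assert(it != NULL && buf != NULL && len > 0);

    if (it->probe)
        return snprintf(buf, len, "%s:%s", it->probe->name, it->probe->action);
    if (it->func)
        return snprintf(buf, len, "%s", it->func->name);

    buf[0] = '\0';
    return 0;
}
```

With a stale or mismatched iterator, a routine shaped like this prints the wrong entry or nothing at all, but it never dereferences a pointer as the wrong type.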
     

13 Sep, 2010

6 commits


12 Sep, 2010

6 commits

  • Fix docbook templates that reference files that do not contain the
    expected kernel-doc notation.

    Fixes these warnings:

    Warning(arch/x86/include/asm/unaligned.h): no structured comments found
    Warning(lib/vsprintf.c): no structured comments found

    These cause errors in the generated html output, like below, so drop
    these lines.

    Name
    arch/x86/include/asm/unaligned.h - Document generation inconsistency
    Oops
    Warning
    The template for this document tried to insert the structured comment from the file arch/x86/include/asm/unaligned.h at this point, but none was found. This dummy section is inserted to allow generation to continue.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • When you don't use !E or !I but only !F, it's very easy to miss
    including some functions, structs etc. in the documentation. To help
    find which ones were missed, allow printing out the unused ones as
    warnings.

    For example, using this on mac80211 yields a lot of warnings like this:

    Warning: didn't use docs for DOC: mac80211 workqueue
    Warning: didn't use docs for ieee80211_max_queues
    Warning: didn't use docs for ieee80211_bss_change
    Warning: didn't use docs for ieee80211_bss_conf

    when generating the documentation for it.

    Signed-off-by: Johannes Berg
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • There are valid attributes that can contain upper case letters but that
    we still want to remove, for example
    __attribute__((aligned(NETDEV_ALIGN)))
    as encountered in the wireless code.

    Signed-off-by: Johannes Berg
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM / Hibernate: Avoid hitting OOM during preallocation of memory
    PM QoS: Correct pr_debug() misuse and improve parameter checks
    PM: Prevent waiting forever on asynchronous resume after failing suspend

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
    [SCSI] fix use-after-free in scsi_init_io()
    [SCSI] sd: fix medium-removal bug
    [SCSI] qla2xxx: Update version number to 8.03.04-k0.
    [SCSI] qla2xxx: Check for empty slot in request queue before posting Command type 6 request.
    [SCSI] qla2xxx: Cover UNDERRUN case where SCSI status is set.
    [SCSI] qla2xxx: Correctly set fw hung and complete only waiting mbx.
    [SCSI] qla2xxx: Reset seconds_since_last_heartbeat correctly.
    [SCSI] qla2xxx: make rport deletions explicit during vport removal
    [SCSI] qla2xxx: Fix vport delete issues
    [SCSI] sd, sym53c8xx: Remove warnings after vsprintf %pV introducation.
    [SCSI] Fix warning: zero-length gnu_printf format string
    [SCSI] hpsa: disable doorbell reset on reset_devices
    [SCSI] be2iscsi: Fix for Login failure
    [SCSI] fix bio.bi_rw handling

    Linus Torvalds
     
  • There is a problem in hibernate_preallocate_memory() that it calls
    preallocate_image_memory() with an argument that may be greater than
    the total number of available non-highmem memory pages. If that's
    the case, the OOM condition is guaranteed to trigger, which in turn
    can cause significant slowdown to occur during hibernation.

    To avoid that, make preallocate_image_memory() adjust its argument
    before calling preallocate_image_pages(), so that the total number of
    saveable non-highmem pages left is not less than the minimum size of
    a hibernation image. Change hibernate_preallocate_memory() to try to
    allocate from highmem if the number of pages allocated by
    preallocate_image_memory() is too low.

    Modify free_unnecessary_pages() to take all possible memory
    allocation patterns into account.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: Rafael J. Wysocki
    Tested-by: M. Vefa Bicakci

    Rafael J. Wysocki
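
The argument adjustment described in the commit above can be sketched as a simple clamp. This is a userspace illustration under assumed names (clamp_prealloc_request, avail, reserve), not the kernel's preallocate_image_memory(). It takes one reading of the message: never request more pages than remain once a reserve, here standing for the minimum image size, has been set aside.

```c
#include <assert.h>

/*
 * Hypothetical helper: limit a preallocation request so that at least
 * `reserve` of the `avail` pages are left untouched.
 */
unsigned long clamp_prealloc_request(unsigned long requested,
                                     unsigned long avail,
                                     unsigned long reserve)
{
    unsigned long allowed, granted;

    /* Nothing may be taken if the reserve already covers everything. */
    if (avail <= reserve)
        return 0;

    allowed = avail - reserve;
    granted = requested < allowed ? requested : allowed;

    /* Sanity check: the grant never exceeds the request or the allowance. */
    assert(granted <= requested && granted <= allowed);
    return granted;
}
```

A caller that receives fewer pages than it asked for would then fall back to another pool, which is the role the message gives to highmem.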
     

11 Sep, 2010

8 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (28 commits)
    ipheth: remove incorrect devtype to WWAN
    MAINTAINERS: Add CAIF
    sctp: fix test for end of loop
    KS8851: Correct RX packet allocation
    udp: add rehash on connect()
    net: blackhole route should always be recalculated
    ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
    niu: Fix kernel buffer overflow for ETHTOOL_GRXCLSRLALL
    ipvs: fix active FTP
    gro: Re-fix different skb headrooms
    via-velocity: Turn scatter-gather support back off.
    ipv4: Fix reverse path filtering with multipath routing.
    UNIX: Do not loop forever at unix_autobind().
    PATCH: b44 Handle RX FIFO overflow better (simplified)
    irda: off by one
    3c59x: Fix deadlock in vortex_error()
    netfilter: discard overlapping IPv6 fragment
    ipv6: discard overlapping fragment
    net: fix tx queue selection for bridged devices implementing select_queue
    bonding: Fix jiffies overflow problems (again)
    ...

    Fix up trivial conflicts due to the same cgroup API thinko fix going
    through both Andrew and the networking tree. However, there were small
    differences between the two, with Andrew's version generally being the
    nicer one, and the one I merged first. So pick that one.

    Conflicts in: include/linux/cgroup.h and kernel/cgroup.c

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    sparc: Kill all BKL usage.

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, tsc: Fix a preemption leak in restore_sched_clock_state()
    sched: Move sched_avg_update() to update_cpu_load()

    Linus Torvalds
     
  • Doh, a real life genuine preemption leak..

    This caused a suspend failure.

    Reported-bisected-and-tested-by-the-invaluable: Jeff Chua
    Acked-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Nico Schottelius
    Cc: Jesse Barnes
    Cc: Linus Torvalds
    Cc: Florian Pritz
    Cc: Suresh Siddha
    Cc: Len Brown
    Cc: # Greg, please apply after: cd7240c ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states")
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • * 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
    drm/i915: don't enable self-refresh on Ironlake
    drm/i915: Double check that the wait_request is not pending before warning
    Revert "drm/i915: Warn if we run out of FIFO space for a mode"
    Revert "drm/i915: Allow LVDS on pipe A on gen4+"
    Revert "drm/i915: Enable RC6 on Ironlake."

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: log IO completion workqueue is a high priority queue
    xfs: prevent reading uninitialized stack memory

    Linus Torvalds
     
  • A real life genuine preemption leak..

    Reported-and-tested-by: Jeff Chua
    Signed-off-by: Peter Zijlstra
    Acked-by: Suresh Siddha
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Correct some pr_debug() misuse and add a stronger parameter check to
    pm_qos_write() for the ASCII hex value case. Thanks to Dan Carpenter
    for pointing out the problem!

    Signed-off-by: mark gross
    Signed-off-by: Rafael J. Wysocki

    mark gross
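
For the ASCII hex case mentioned in the commit above, a strict parameter check might look like the following userspace sketch. The function name parse_hex_u32 and the accepted format (an optional 0x prefix, one to eight hex digits, and an optional trailing newline) are assumptions for illustration; the actual checks in pm_qos_write() are not shown in this log.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical strict parser: optional "0x", then 1..8 hex digits,
 * then an optional '\n'. Returns 0 and stores the value on success,
 * -1 on any malformed input.
 */
int parse_hex_u32(const char *s, uint32_t *out)
{
    uint32_t val = 0;
    size_t digits = 0;

    assert(s != NULL && out != NULL);

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        s += 2;

    while (isxdigit((unsigned char)*s)) {
        unsigned char c = (unsigned char)*s++;
        uint32_t d = isdigit(c) ? (uint32_t)(c - '0')
                                : (uint32_t)(tolower(c) - 'a' + 10);

        if (++digits > 8)
            return -1;              /* more than 32 bits of input */
        val = (val << 4) | d;
    }

    if (digits == 0)
        return -1;                  /* no digits at all */
    if (*s == '\n')
        s++;
    if (*s != '\0')
        return -1;                  /* trailing garbage */

    *out = val;
    return 0;
}
```

Rejecting over-long input and trailing characters outright, rather than silently truncating, is the kind of tightening the commit message hints at.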
     

10 Sep, 2010

9 commits

  • The workqueue implementation in 2.6.36-rcX has changed, resulting
    in the workqueues no longer having dedicated threads for work
    processing. This has caused severe livelocks under heavy parallel
    create workloads because the log IO completions have been getting
    held up behind metadata IO completions. Hence log commits would
    stall, memory allocation would stall because pages could not be
    cleaned, and lock contention on the AIL during inode IO completion
    processing was being seen to slow everything down even further.

    By making the log IO completion workqueue a high priority workqueue,
    log IO completions are queued ahead of all data/metadata IO completions
    and processed before them. Hence the log never gets stalled, and
    operations needed to clean memory can continue as quickly as possible.
    This avoids the livelock conditions and allows the system to keep
    running under heavy load as per normal.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • An execve with a very large total of argument/environment strings
    can take a really long time in the execve system call. It runs
    uninterruptibly to count and copy all the strings. This change
    makes it abort the exec quickly if sent a SIGKILL.

    Note that this is the conservative change, to interrupt only for
    SIGKILL, by using fatal_signal_pending(). It would be perfectly
    correct semantics to let any signal interrupt the string-copying in
    execve, i.e. use signal_pending() instead of fatal_signal_pending().
    We'll save that change for later, since it could have user-visible
    consequences, such as having a timer set too quickly make it so that
    an execve can never complete, though it always happened to work before.

    Signed-off-by: Roland McGrath
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This adds a preemption point during the copying of the argument and
    environment strings for execve, in copy_strings(). There is already
    a preemption point in the count() loop, so this doesn't add any new
    points in the abstract sense.

    When the total argument+environment strings are very large, the time
    spent copying them can be much more than a normal user time slice.
    So this change improves the interactivity of the rest of the system
    when one process is doing an execve with very large arguments.

    Signed-off-by: Roland McGrath
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
    check the size of the argument/environment area on the stack.
    When it is unworkably large, shift_arg_pages() hits its BUG_ON.
    This is exploitable with a very large RLIMIT_STACK limit, to
    create a crash pretty easily.

    Check that the initial stack is not too large to make it possible
    to map in any executable. We're not checking that the actual
    executable (or interpreter, for binfmt_elf) will fit. So those
    mappings might clobber part of the initial stack mapping. But
    that is just userland lossage that userland made happen, not a
    kernel problem.

    Signed-off-by: Roland McGrath
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: Perform hardware_enable in CPU_STARTING callback
    KVM: i8259: fix migration
    KVM: fix i8259 oops when no vcpus are online
    KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
    perf symbols: Fix multiple initialization of symbol system
    perf: Fix CPU hotplug
    perf, trace: Fix module leak
    tracing/kprobe: Fix handling of C-unlike argument names
    tracing/kprobes: Fix handling of argument names
    perf probe: Fix handling of arguments names
    perf probe: Fix return probe support
    tracing/kprobe: Fix a memory leak in error case
    tracing: Do not allow llseek to set_ftrace_filter

    Linus Torvalds
     
  • Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
    of the parent process's session keyring whether or not the parent has a session
    keyring [CVE-2010-2960].

    This results in the following oops:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
    IP: [] keyctl_session_to_parent+0x251/0x443
    ...
    Call Trace:
    [] ? keyctl_session_to_parent+0x67/0x443
    [] ? __do_fault+0x24b/0x3d0
    [] sys_keyctl+0xb4/0xb8
    [] system_call_fastpath+0x16/0x1b

    if the parent process has no session keyring.

    If the system is using pam_keyinit then it is mostly protected against this as all
    processes derived from a login will have inherited the session keyring created
    by pam_keyinit during the log in procedure.

    To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.

    Reported-by: Tavis Ormandy
    Signed-off-by: David Howells
    Acked-by: Tavis Ormandy
    Signed-off-by: Linus Torvalds

    David Howells
     
  • There's an unprotected access to the parent process's credentials in the middle
    of keyctl_session_to_parent(). This results in the following RCU warning:

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by keyctl-session-/2137:
    #0: (tasklist_lock){.+.+..}, at: [] keyctl_session_to_parent+0x60/0x236

    stack backtrace:
    Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] keyctl_session_to_parent+0xed/0x236
    [] sys_keyctl+0xb4/0xb6
    [] system_call_fastpath+0x16/0x1b

    The code should take the RCU read lock to make sure the parent's credentials
    don't go away, even though it's holding a spinlock and has IRQs disabled.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: Range check cpu in blk_cpu_to_group
    scatterlist: prevent invalid free when alloc fails
    writeback: Fix lost wake-up shutting down writeback thread
    writeback: do not lose wakeup events when forking bdi threads
    cciss: fix reporting of max queue depth since init
    block: switch s390 tape_block and mg_disk to elevator_change()
    block: add function call to switch the IO scheduler from a driver
    fs/bio-integrity.c: return -ENOMEM on kmalloc failure
    bio-integrity.c: remove dependency on __GFP_NOFAIL
    BLOCK: fix bio.bi_rw handling
    block: put dev->kobj in blk_register_queue fail path
    cciss: handle allocation failure
    cfq-iosched: Documentation help for new tunables
    cfq-iosched: blktrace print per slice sector stats
    cfq-iosched: Implement tunable group_idle
    cfq-iosched: Do group share accounting in IOPS when slice_idle=0
    cfq-iosched: Do not idle if slice_idle=0
    cciss: disable doorbell reset on reset_devices
    blkio: Fix return code for mkdir calls

    Linus Torvalds
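
The keyctl_session_to_parent() fix earlier in this day's list (CVE-2010-2960) comes down to not dereferencing a session keyring pointer that may be NULL. A userspace sketch of that guard follows. The names keyring_stub, cred_stub and parent_keyring_permitted() are hypothetical, and the real credential structures and permission rules are more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for keyring and credentials. */
struct keyring_stub { unsigned int uid; };
struct cred_stub {
    unsigned int euid;
    const struct keyring_stub *session_keyring;  /* may be NULL */
};

/*
 * Ownership check on the parent's session keyring. The NULL test comes
 * first, so a parent without a session keyring is never dereferenced.
 */
bool parent_keyring_permitted(const struct cred_stub *mine,
                              const struct cred_stub *parent)
{
    assert(mine != NULL && parent != NULL);

    /* A parent without a session keyring has nothing to check. */
    if (parent->session_keyring == NULL)
        return true;

    return parent->session_keyring->uid == mine->euid;
}
```

Testing the pointer before reading through it is what turns the reported NULL pointer dereference into an ordinary permission decision.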