15 Sep, 2010

2 commits

  • On 64 bits, we always, by necessity, jump through the system call
    table via %rax. For 32-bit system calls, in theory the system call
    number is stored in %eax, and the code was testing %eax for a valid
    system call number. At one point we loaded the stored value back from
    the stack to enforce zero-extension, but that was removed in checkin
    d4d67150165df8bf1cc05e532f6efca96f907cab. An actual 32-bit process
    will not be able to introduce a non-zero-extended number, but it can
    happen via ptrace.

    Instead of re-introducing the zero-extension, test what we are
    actually going to use, i.e. %rax. This only adds a handful of REX
    prefixes to the code.

    Reported-by: Ben Hawkes
    Signed-off-by: H. Peter Anvin
    Cc:
    Cc: Roland McGrath
    Cc: Andrew Morton

    H. Peter Anvin
     
  • compat_alloc_user_space() expects the caller to independently call
    access_ok() to verify the returned area. A missing call could
    introduce problems on some architectures.

    This patch incorporates the access_ok() check into
    compat_alloc_user_space() and also adds a sanity check on the length.
    The existing compat_alloc_user_space() implementations are renamed
    arch_compat_alloc_user_space() and are used as part of the
    implementation of the new global function.

    This patch assumes NULL will cause __get_user()/__put_user() to either
    fail or access userspace on all architectures. This should be
    followed by checking the return value of compat_alloc_user_space()
    for NULL in the callers, at which time the access_ok() in the callers
    can also be removed.

    Reported-by: Ben Hawkes
    Signed-off-by: H. Peter Anvin
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Chris Metcalf
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Tony Luck
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: James Bottomley
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc:

    H. Peter Anvin
     
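The %eax/%rax mismatch described in the 15 Sep system call commit can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions. The real check lives in the x86-64 entry assembly, and NR_SYSCALLS_SKETCH, nr_ok_eax_only() and nr_ok_rax() are hypothetical names chosen for illustration. A value with non-zero upper bits, such as one planted via ptrace, passes the 32-bit test but fails the 64-bit one.

```c
#include <stdint.h>

/* Hypothetical table size for illustration only; the real limit is the
 * kernel's system call count, which this sketch does not reproduce. */
#define NR_SYSCALLS_SKETCH 4u

/* Old-style check: validates only the low 32 bits (%eax), even though
 * the dispatch later indexes the table with the full 64-bit %rax. */
int nr_ok_eax_only(uint64_t rax)
{
	return (uint32_t)rax < NR_SYSCALLS_SKETCH;
}

/* Fixed check: validates exactly the value used as the table index. */
int nr_ok_rax(uint64_t rax)
{
	return rax < NR_SYSCALLS_SKETCH;
}
```

For example, nr_ok_eax_only(0xffffffff00000002) returns 1 while nr_ok_rax() returns 0 for the same value, which is why the fix tests the register that is actually used as the index.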

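The compat_alloc_user_space() change from the same day follows a "sanity-check the length, call the arch hook, verify with access_ok(), return NULL on failure" pattern. The sketch below models that in userspace with a flat 32-bit address space. COMPAT_LIMIT, arch_compat_alloc_sketch(), access_ok_sketch() and compat_alloc_sketch() are hypothetical, the length threshold (half of the 32-bit space) is an assumption for illustration rather than a quote of the patch, and the value 0 stands in for NULL.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit compat address space: valid user
 * addresses lie in [0, COMPAT_LIMIT). Real access_ok() is arch-specific. */
#define COMPAT_LIMIT 0xffffe000ull

/* Hypothetical stand-in for arch_compat_alloc_user_space(): carve `len`
 * bytes just below a fake user stack pointer, 16-byte aligned. Note that
 * it does no checking of its own and can wrap below zero. */
static uint64_t arch_compat_alloc_sketch(uint64_t user_sp, uint64_t len)
{
	return (user_sp - len) & ~(uint64_t)15;
}

/* Stand-in for access_ok(): the whole range must lie below the limit
 * without wrapping around. */
static int access_ok_sketch(uint64_t ptr, uint64_t len)
{
	return ptr <= COMPAT_LIMIT && len <= COMPAT_LIMIT - ptr;
}

/* Global wrapper in the spirit of the patch: length sanity check, arch
 * allocation, then the access_ok() check. Returns 0 (as NULL) on failure. */
uint64_t compat_alloc_sketch(uint64_t user_sp, uint64_t len)
{
	uint64_t ptr;

	/* Assumed threshold: reject requests larger than half the space. */
	if (len > (UINT32_MAX >> 1))
		return 0;

	ptr = arch_compat_alloc_sketch(user_sp, len);
	if (!access_ok_sketch(ptr, len))
		return 0;
	return ptr;
}
```

With a fake stack pointer of 0x1000, a 0x2000-byte request makes the arch hook wrap below zero; the access_ok-style check then rejects the pointer, so the caller sees 0 instead of an out-of-range address.
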
14 Sep, 2010

9 commits


13 Sep, 2010

6 commits


12 Sep, 2010

6 commits

  • Fix docbook templates that reference files that do not contain the
    expected kernel-doc notation.

    Fixes these warnings:

    Warning(arch/x86/include/asm/unaligned.h): no structured comments found
    Warning(lib/vsprintf.c): no structured comments found

    These cause errors in the generated HTML output, like the one below, so
    drop these lines.

    Name
    arch/x86/include/asm/unaligned.h - Document generation inconsistency
    Oops
    Warning
    The template for this document tried to insert the structured comment from the file arch/x86/include/asm/unaligned.h at this point, but none was found. This dummy section is inserted to allow generation to continue.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • When you don't use !E or !I but only !F, it's very easy to miss some
    functions, structs, etc. in the documentation. To help find the ones
    that were missed, allow printing the unused ones as warnings.

    For example, using this on mac80211 yields a lot of warnings like this:

    Warning: didn't use docs for DOC: mac80211 workqueue
    Warning: didn't use docs for ieee80211_max_queues
    Warning: didn't use docs for ieee80211_bss_change
    Warning: didn't use docs for ieee80211_bss_conf

    when generating the documentation for it.

    Signed-off-by: Johannes Berg
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • Some valid attributes contain upper-case letters but should still be
    removed, for example
    __attribute__((aligned(NETDEV_ALIGN)))
    as encountered in the wireless code.

    Signed-off-by: Johannes Berg
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM / Hibernate: Avoid hitting OOM during preallocation of memory
    PM QoS: Correct pr_debug() misuse and improve parameter checks
    PM: Prevent waiting forever on asynchronous resume after failing suspend

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
    [SCSI] fix use-after-free in scsi_init_io()
    [SCSI] sd: fix medium-removal bug
    [SCSI] qla2xxx: Update version number to 8.03.04-k0.
    [SCSI] qla2xxx: Check for empty slot in request queue before posting Command type 6 request.
    [SCSI] qla2xxx: Cover UNDERRUN case where SCSI status is set.
    [SCSI] qla2xxx: Correctly set fw hung and complete only waiting mbx.
    [SCSI] qla2xxx: Reset seconds_since_last_heartbeat correctly.
    [SCSI] qla2xxx: make rport deletions explicit during vport removal
    [SCSI] qla2xxx: Fix vport delete issues
    [SCSI] sd, sym53c8xx: Remove warnings after vsprintf %pV introducation.
    [SCSI] Fix warning: zero-length gnu_printf format string
    [SCSI] hpsa: disable doorbell reset on reset_devices
    [SCSI] be2iscsi: Fix for Login failure
    [SCSI] fix bio.bi_rw handling

    Linus Torvalds
     
  • hibernate_preallocate_memory() calls preallocate_image_memory() with
    an argument that may be greater than the total number of available
    non-highmem memory pages. In that case the OOM condition is
    guaranteed to trigger, which in turn can significantly slow down
    hibernation.

    To avoid that, make preallocate_image_memory() adjust its argument
    before calling preallocate_image_pages(), so that the total number of
    saveable non-highmem pages left is not less than the minimum size of
    a hibernation image. Change hibernate_preallocate_memory() to try to
    allocate from highmem if the number of pages allocated by
    preallocate_image_memory() is too low.

    Modify free_unnecessary_pages() to take all possible memory
    allocation patterns into account.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: Rafael J. Wysocki
    Tested-by: M. Vefa Bicakci

    Rafael J. Wysocki
     
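The !F unused-docs warning from the 12 Sep kernel-doc commit can be pictured as simple bookkeeping: record every kernel-doc block found in a file, mark each one a !F line pulls in, and warn about the rest. kernel-doc itself is a Perl script; this C sketch, with the hypothetical struct doc_entry, use_doc() and warn_unused_docs(), only illustrates the idea and reuses the warning text shown in the commit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one kernel-doc block found in a source file. */
struct doc_entry {
	const char *name;
	int used;
};

/* Mark the block named by a "!F" request as used; returns 1 if found. */
int use_doc(struct doc_entry *docs, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(docs[i].name, name) == 0) {
			docs[i].used = 1;
			return 1;
		}
	}
	return 0;
}

/* After the template is processed, warn about every block that was never
 * pulled in. Returns the number of warnings printed. */
int warn_unused_docs(const struct doc_entry *docs, size_t n)
{
	int warnings = 0;

	for (size_t i = 0; i < n; i++) {
		if (!docs[i].used) {
			fprintf(stderr, "Warning: didn't use docs for %s\n",
				docs[i].name);
			warnings++;
		}
	}
	return warnings;
}
```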

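The 12 Sep hibernation fix amounts to clamping a page request against what must remain free, then making up any shortfall from highmem. The sketch below uses plain counters; clamp_prealloc() and highmem_shortfall() are hypothetical helpers, not kernel functions, and real page accounting involves more state than three numbers.

```c
/* Hypothetical helper: trim a preallocation request so that at least
 * `min_image` saveable non-highmem pages remain afterwards. */
unsigned long clamp_prealloc(unsigned long nr_pages,
			     unsigned long avail_normal,
			     unsigned long min_image)
{
	unsigned long room;

	/* No headroom at all: taking any page would breach the minimum. */
	if (avail_normal <= min_image)
		return 0;

	room = avail_normal - min_image;
	return nr_pages < room ? nr_pages : room;
}

/* Hypothetical helper: how many pages still have to come from highmem
 * when the normal-zone allocation fell short of what was wanted. */
unsigned long highmem_shortfall(unsigned long wanted, unsigned long got_normal)
{
	return wanted > got_normal ? wanted - got_normal : 0;
}
```

With 800 available non-highmem pages and a 300-page minimum image, a 1000-page request is trimmed to 500, and the remaining 500 pages would be sought from highmem.
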
11 Sep, 2010

8 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (28 commits)
    ipheth: remove incorrect devtype to WWAN
    MAINTAINERS: Add CAIF
    sctp: fix test for end of loop
    KS8851: Correct RX packet allocation
    udp: add rehash on connect()
    net: blackhole route should always be recalculated
    ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
    niu: Fix kernel buffer overflow for ETHTOOL_GRXCLSRLALL
    ipvs: fix active FTP
    gro: Re-fix different skb headrooms
    via-velocity: Turn scatter-gather support back off.
    ipv4: Fix reverse path filtering with multipath routing.
    UNIX: Do not loop forever at unix_autobind().
    PATCH: b44 Handle RX FIFO overflow better (simplified)
    irda: off by one
    3c59x: Fix deadlock in vortex_error()
    netfilter: discard overlapping IPv6 fragment
    ipv6: discard overlapping fragment
    net: fix tx queue selection for bridged devices implementing select_queue
    bonding: Fix jiffies overflow problems (again)
    ...

    Fix up trivial conflicts due to the same cgroup API thinko fix going
    through both Andrew and the networking tree. However, there were small
    differences between the two, with Andrew's version generally being the
    nicer one, and the one I merged first. So pick that one.

    Conflicts in: include/linux/cgroup.h and kernel/cgroup.c

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    sparc: Kill all BKL usage.

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, tsc: Fix a preemption leak in restore_sched_clock_state()
    sched: Move sched_avg_update() to update_cpu_load()

    Linus Torvalds
     
  • Doh, a real life genuine preemption leak..

    This caused a suspend failure.

    Reported-bisected-and-tested-by-the-invaluable: Jeff Chua
    Acked-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Nico Schottelius
    Cc: Jesse Barnes
    Cc: Linus Torvalds
    Cc: Florian Pritz
    Cc: Suresh Siddha
    Cc: Len Brown
    Cc: # Greg, please apply after: cd7240c ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states")
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • * 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
    drm/i915: don't enable self-refresh on Ironlake
    drm/i915: Double check that the wait_request is not pending before warning
    Revert "drm/i915: Warn if we run out of FIFO space for a mode"
    Revert "drm/i915: Allow LVDS on pipe A on gen4+"
    Revert "drm/i915: Enable RC6 on Ironlake."

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: log IO completion workqueue is a high priority queue
    xfs: prevent reading uninitialized stack memory

    Linus Torvalds
     
  • A real life genuine preemption leak..

    Reported-and-tested-by: Jeff Chua
    Signed-off-by: Peter Zijlstra
    Acked-by: Suresh Siddha
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Correct some pr_debug() misuse and add a stronger parameter check to
    pm_qos_write() for the ASCII hex value case. Thanks to Dan Carpenter
    for pointing out the problem!

    Signed-off-by: mark gross
    Signed-off-by: Rafael J. Wysocki

    mark gross
     
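The stronger pm_qos_write() parameter check from 11 Sep can be illustrated with a parser that accepts only two exact input shapes. This is a hypothetical userspace model: parse_qos_value() is not the kernel function, and the accepted formats here (a raw 4-byte value, or an 11-byte buffer holding "0x" plus eight hex digits and a NUL) are assumptions made for the sketch, deliberately stricter than what the commit message spells out.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input check: a raw 4-byte value, or exactly "0x" followed
 * by 8 hex digits and a terminating NUL (11 bytes). Anything else fails. */
int parse_qos_value(const char *buf, size_t count, int32_t *out)
{
	if (count == sizeof(int32_t)) {
		memcpy(out, buf, sizeof(int32_t));
		return 0;
	}
	if (count == 11) {
		char ascii[11];
		unsigned long v;

		memcpy(ascii, buf, 11);
		/* The string must be NUL-terminated and exactly 10 chars long. */
		if (memchr(ascii, '\0', 11) == NULL || strlen(ascii) != 10)
			return -EINVAL;
		if (ascii[0] != '0' || (ascii[1] != 'x' && ascii[1] != 'X'))
			return -EINVAL;
		/* Check every digit explicitly; strtoul alone would accept
		 * leading whitespace or a sign. */
		for (int i = 2; i < 10; i++)
			if (!isxdigit((unsigned char)ascii[i]))
				return -EINVAL;
		v = strtoul(ascii + 2, NULL, 16); /* 8 hex digits fit in 32 bits */
		*out = (int32_t)(uint32_t)v;
		return 0;
	}
	return -EINVAL;
}
```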

10 Sep, 2010

9 commits

  • The workqueue implementation in 2.6.36-rcX has changed, resulting
    in the workqueues no longer having dedicated threads for work
    processing. This has caused severe livelocks under heavy parallel
    create workloads because the log IO completions have been getting
    held up behind metadata IO completions. Hence log commits would
    stall, memory allocation would stall because pages could not be
    cleaned, and lock contention on the AIL during inode IO completion
    processing was being seen to slow everything down even further.

    By making the log IO completion workqueue a high priority workqueue,
    log IO completions are queued ahead of all data/metadata IO
    completions and processed before them. Hence the log never gets
    stalled, and operations needed to clean memory can continue as
    quickly as possible. This avoids the livelock conditions and allows
    the system to keep running normally under heavy load.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • An execve with a very large total of argument/environment strings
    can take a really long time in the execve system call. It runs
    uninterruptibly to count and copy all the strings. This change
    makes it abort the exec quickly if sent a SIGKILL.

    Note that this is the conservative change, to interrupt only for
    SIGKILL, by using fatal_signal_pending(). It would be perfectly
    correct semantics to let any signal interrupt the string-copying in
    execve, i.e. use signal_pending() instead of fatal_signal_pending().
    We'll save that change for later, since it could have user-visible
    consequences: for example, a timer set to fire too quickly could
    make it impossible for an execve to complete, even though that
    always happened to work before.

    Signed-off-by: Roland McGrath
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This adds a preemption point during the copying of the argument and
    environment strings for execve, in copy_strings(). There is already
    a preemption point in the count() loop, so this doesn't add any new
    points in the abstract sense.

    When the total argument+environment strings are very large, the time
    spent copying them can be much more than a normal user time slice.
    So this change improves the interactivity of the rest of the system
    when one process is doing an execve with very large arguments.

    Signed-off-by: Roland McGrath
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
    check the size of the argument/environment area on the stack.
    When it is unworkably large, shift_arg_pages() hits its BUG_ON.
    With a very large RLIMIT_STACK limit, this is easy to exploit to
    trigger a crash.

    Check that the initial stack is not too large to make it possible
    to map in any executable. We're not checking that the actual
    executable (or interpreter, for binfmt_elf) will fit. So those
    mappings might clobber part of the initial stack mapping. But
    that is just userland lossage that userland made happen, not a
    kernel problem.

    Signed-off-by: Roland McGrath
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: Perform hardware_enable in CPU_STARTING callback
    KVM: i8259: fix migration
    KVM: fix i8259 oops when no vcpus are online
    KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
    perf symbols: Fix multiple initialization of symbol system
    perf: Fix CPU hotplug
    perf, trace: Fix module leak
    tracing/kprobe: Fix handling of C-unlike argument names
    tracing/kprobes: Fix handling of argument names
    perf probe: Fix handling of arguments names
    perf probe: Fix return probe support
    tracing/kprobe: Fix a memory leak in error case
    tracing: Do not allow llseek to set_ftrace_filter

    Linus Torvalds
     
  • Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
    of the parent process's session keyring whether or not the parent has a session
    keyring [CVE-2010-2960].

    This results in the following oops:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
    IP: [] keyctl_session_to_parent+0x251/0x443
    ...
    Call Trace:
    [] ? keyctl_session_to_parent+0x67/0x443
    [] ? __do_fault+0x24b/0x3d0
    [] sys_keyctl+0xb4/0xb8
    [] system_call_fastpath+0x16/0x1b

    if the parent process has no session keyring.

    If the system is using pam_keyinit, then it is mostly protected against this,
    as all processes derived from a login will have inherited the session keyring
    created by pam_keyinit during the login procedure.

    To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.

    Reported-by: Tavis Ormandy
    Signed-off-by: David Howells
    Acked-by: Tavis Ormandy
    Signed-off-by: Linus Torvalds

    David Howells
     
    There's an unprotected access to the parent process's credentials in the middle
    of keyctl_session_to_parent(). This results in the following RCU warning:

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by keyctl-session-/2137:
    #0: (tasklist_lock){.+.+..}, at: [] keyctl_session_to_parent+0x60/0x236

    stack backtrace:
    Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] keyctl_session_to_parent+0xed/0x236
    [] sys_keyctl+0xb4/0xb6
    [] system_call_fastpath+0x16/0x1b

    The code should take the RCU read lock to make sure the parent's credentials
    don't go away, even though it's holding a spinlock and has IRQs disabled.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: Range check cpu in blk_cpu_to_group
    scatterlist: prevent invalid free when alloc fails
    writeback: Fix lost wake-up shutting down writeback thread
    writeback: do not lose wakeup events when forking bdi threads
    cciss: fix reporting of max queue depth since init
    block: switch s390 tape_block and mg_disk to elevator_change()
    block: add function call to switch the IO scheduler from a driver
    fs/bio-integrity.c: return -ENOMEM on kmalloc failure
    bio-integrity.c: remove dependency on __GFP_NOFAIL
    BLOCK: fix bio.bi_rw handling
    block: put dev->kobj in blk_register_queue fail path
    cciss: handle allocation failure
    cfq-iosched: Documentation help for new tunables
    cfq-iosched: blktrace print per slice sector stats
    cfq-iosched: Implement tunable group_idle
    cfq-iosched: Do group share accounting in IOPS when slice_idle=0
    cfq-iosched: Do not idle if slice_idle=0
    cciss: disable doorbell reset on reset_devices
    blkio: Fix return code for mkdir calls

    Linus Torvalds
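
The two 10 Sep execve commits, the fatal-signal check and the added preemption point in copy_strings(), both touch the string-copying loop. A userspace sketch of that loop shape follows. fatal_pending stands in for fatal_signal_pending(current), sched_yield() stands in for cond_resched(), and copy_strings_sketch() is a hypothetical name. The sketch checks once per string and returns -EINTR on abort; the kernel's actual granularity and return code are not given in the commit messages above.

```c
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag standing in for fatal_signal_pending(current). */
volatile sig_atomic_t fatal_pending;

/* Copy `n` NUL-terminated strings back to back into `dst`, aborting early
 * if a fatal signal is pending and yielding the CPU after each string.
 * Returns the number of bytes copied, -EINTR, or -E2BIG. */
long copy_strings_sketch(char *dst, size_t dst_len,
			 const char *const *strs, size_t n)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(strs[i]) + 1; /* include the NUL */

		/* Abort quickly instead of finishing an uninterruptible copy. */
		if (fatal_pending)
			return -EINTR;
		if (len > dst_len - off)
			return -E2BIG;
		memcpy(dst + off, strs[i], len);
		off += len;
		/* Userspace stand-in for cond_resched(): give up the CPU. */
		sched_yield();
	}
	return (long)off;
}
```

Checking the flag inside the loop, rather than once up front, is what bounds how long a SIGKILLed execve keeps copying; the yield bounds how long other tasks wait on a busy CPU.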