12 Feb, 2009

20 commits

  • Fix kernel-doc-nano-HOWTO.txt to use */ as the ending marker in kernel-doc
    examples and state that */ is the preferred ending marker.

    Signed-off-by: Randy Dunlap
    Reported-by: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • page_cgroup's page allocation at init/memory hotplug uses kmalloc() and
    vmalloc(). If kmalloc() fails, vmalloc() is used.

    This is because vmalloc() is a very limited resource on 32-bit systems.
    We want to use kmalloc() first.

    But in this kind of call, __GFP_NOWARN should be specified.

    Reported-by: Heiko Carstens
    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
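
The fallback described in this entry can be sketched in user space. This is only a model under stated assumptions: `model_kmalloc()`, `model_vmalloc()`, `MODEL_NOWARN` and the size limit are hypothetical stand-ins for kmalloc(), vmalloc() and __GFP_NOWARN, not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names are kernel APIs. */
#define MODEL_NOWARN        0x1u          /* plays the role of __GFP_NOWARN */
#define MODEL_KMALLOC_LIMIT (128u * 1024) /* assumed contiguous-allocation limit */

static int model_warnings; /* counts "allocation failure" messages */

/* Models kmalloc(): fails for large requests and warns unless told not to. */
static void *model_kmalloc(size_t size, unsigned int flags)
{
    if (size > MODEL_KMALLOC_LIMIT) {
        if (!(flags & MODEL_NOWARN)) {
            model_warnings++;
            fprintf(stderr, "model: allocation failure, size %zu\n", size);
        }
        return NULL;
    }
    return malloc(size);
}

/* Models vmalloc(): backed by malloc() here; the scarcer resource in the kernel. */
static void *model_vmalloc(size_t size)
{
    return malloc(size);
}

/* Try the preferred allocator quietly first, then fall back. */
static void *model_alloc_table(size_t size)
{
    void *table;

    assert(size > 0);
    table = model_kmalloc(size, MODEL_NOWARN);
    if (!table)
        table = model_vmalloc(size);
    return table;
}
```

Because a failure of the first attempt is expected and handled, a warning would only be noise, which is why the first call passes the no-warn flag.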
     
  • Signed-off-by: Uwe Kleine-Koenig
    Signed-off-by: Mike Frysinger
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-Koenig
     
  • Update my email address.

    Signed-off-by: Marcel Selhorst
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcel Selhorst
     
  • When I tested the following program, I found that the mlocked counter
    behaves strangely: some mlocked pages are never freed.

    It is because try_to_unmap_file() doesn't check real
    page mappings in vmas.

    That is because the goal of an address_space for a file is to find all
    processes into which the file's specific interval is mapped. It is
    related to the file's interval, not to pages.

    Even if the page isn't really mapped by the vma, it returns SWAP_MLOCK
    since the vma has VM_LOCKED, then calls try_to_mlock_page. After this the
    mlocked counter is increased again.

    A COWed anon page in a file-backed vma could be such a case. This patch
    resolves it.

    -- my test program --

    #include <sys/mman.h>

    int main()
    {
            mlockall(MCL_CURRENT);
            return 0;
    }

    -- before --

    root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
    Unevictable: 0 kB
    Mlocked: 0 kB

    -- after --

    root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
    Unevictable: 8 kB
    Mlocked: 8 kB

    Signed-off-by: MinChan Kim
    Acked-by: Lee Schermerhorn
    Acked-by: KOSAKI Motohiro
    Tested-by: Lee Schermerhorn
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    MinChan Kim
     
  • This reverts commit c87591b719737b4e91eb1a9fa8fd55a4ff1886d6.

    Since journal_start_commit() is now fixed to return 1 when we started a
    transaction commit, there's some transaction waiting to be committed or
    there's a transaction already committing, we don't need to call
    ext3_force_commit() in ext3_sync_fs(). Furthermore ext3_force_commit()
    can unnecessarily create a sync transaction, which is expensive, so it's
    worthwhile to remove it when we can.

    Cc: Eric Sandeen
    Cc:
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • journal_start_commit() returns 1 if either a transaction is committing or
    the function has queued a transaction commit. But it returns 0 if we
    raced with somebody queueing the transaction commit as well. This
    resulted in ext3_sync_fs() not functioning correctly (description from
    Arthur Jones): In the case of a data=ordered umount with pending long
    symlinks which are delayed due to a long list of other I/O on the backing
    block device, this causes the buffer associated with the long symlinks to
    not be moved to the inode dirty list in the second phase of fsync_super.
    Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT
    flag and the dirty pages are never written to the backing block device,
    causing long symlink corruption and exposing new or previously freed block
    data to userspace.

    This can be reproduced with a script created by Eric Sandeen:

    #!/bin/bash

    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    rm -f /mnt/test2/*
    dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
    touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
    ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename \
        /mnt/test2/link
    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    ls /mnt/test2/

    This patch fixes journal_start_commit() to always return 1 when there's
    a transaction committing or queued for commit.

    Cc: Eric Sandeen
    Cc: Mike Snitzer
    Cc:
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • We need to pass an unsigned long as the minimum, because it gets cast
    to an unsigned long in the sysctl handler. If we pass an int, we'll
    access four more bytes on 64-bit arches, resulting in a random minimum
    value.

    [rientjes@google.com: fix type of `old_bytes']
    Signed-off-by: Sven Wegener
    Cc: Peter Zijlstra
    Cc: Dave Chinner
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sven Wegener
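
The type requirement in this entry can be illustrated with a small model. It assumes a handler that receives its lower bound through an untyped pointer and reads it as an unsigned long; `model_value_allowed()` and `model_min_bytes` are hypothetical names, and the value 8192 is an arbitrary example rather than the kernel's actual minimum.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Models a handler that is handed its minimum through a void pointer
 * and always reads sizeof(unsigned long) bytes from it.
 */
static int model_value_allowed(unsigned long value, const void *min_ptr)
{
    const unsigned long *min = min_ptr;

    assert(min != NULL);
    return value >= *min;
}

/* Correct: the object has exactly the type the handler will read. */
static const unsigned long model_min_bytes = 8192;

/*
 * Wrong (left as a comment on purpose, since running it is undefined):
 *
 *     static const int model_min_bytes = 8192;
 *
 * Where unsigned long is wider than int, the handler would read past
 * the end of the int object and see an unpredictable minimum.
 */
```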
     
  • We weren't properly allocating the cmap for depths greater than 8bpp,
    which caused pain for things like DirectFB. Also, we never freed the cmap
    memory upon module unload.

    Signed-off-by: Andres Salomon
    Cc: Marco La Porta
    Cc: Jordan Crouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • We weren't properly allocating the cmap for depths greater than 8bpp,
    which caused pain for things like DirectFB. Also, we never freed the cmap
    memory upon module unload.

    Signed-off-by: Andres Salomon
    Cc: Marco La Porta
    Cc: Jordan Crouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • We weren't properly allocating the cmap for depths greater than 8bpp,
    which caused pain for things like DirectFB. Also, we never freed the cmap
    memory upon module unload.

    [dilinger@debian.org: dropped unnecessary code and clean up patch]
    [dilinger@debian.org: add error checking and handling]
    Signed-off-by: Andres Salomon
    Cc: Jordan Crouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marco La Porta
     
  • Signed-off-by: Robert Jarzmik
    Signed-off-by: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Jarzmik
     
  • migrate_vmas() should check "vma" not "vma->vm_next" for for-loop condition.

    Signed-off-by: Daisuke Nishimura
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
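
A minimal sketch of why the loop condition matters, assuming the loop walks a singly linked list in the usual way. `struct model_vma` and both functions are hypothetical stand-ins, not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list of vmas linked through a next pointer. */
struct model_vma {
    struct model_vma *next;
};

/*
 * Condition tests the *next* element: the last node is never visited,
 * and an empty list (head == NULL) would be dereferenced.
 */
static int visit_checking_next(struct model_vma *head)
{
    struct model_vma *v;
    int visited = 0;

    assert(head != NULL); /* guard the model against the NULL dereference */
    for (v = head; v->next; v = v->next)
        visited++;
    return visited;
}

/* Condition tests the current element: every node is visited. */
static int visit_checking_current(struct model_vma *head)
{
    struct model_vma *v;
    int visited = 0;

    for (v = head; v; v = v->next)
        visited++;
    return visited;
}
```

On a three-element list the first form visits two elements and the second visits all three; only the second form is safe on an empty list.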
     
  • Commit 5a6fe125950676015f5108fb71b2a67441755003 brought hugetlbfs more
    in line with the core VM by obeying VM_NORESERVE and not reserving
    hugepages for both shared and private mappings when [SHM|MAP]_NORESERVE
    are specified. However, it is still taking filesystem quota
    unconditionally.

    At fault time, if there are no reserves, an attempt is made to allocate
    the page and account for filesystem quota. If either fails, the fault
    fails. The impact is that quota is getting accounted for twice. This
    patch partially reverts 5a6fe125950676015f5108fb71b2a67441755003. To
    help prevent this mistake happening again, it improves the documentation
    of hugetlb_reserve_pages().

    Reported-by: Andy Whitcroft
    Signed-off-by: Mel Gorman
    Acked-by: Andy Whitcroft
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: revert recent sync wakeup changes

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timers: fix TIMER_ABSTIME for process wide cpu timers
    timers: split process wide cpu clocks/timers, fix
    x86: clean up hpet timer reinit
    timers: split process wide cpu clocks/timers, remove spurious warning
    timers: split process wide cpu clocks/timers
    signal: re-add dead task accumulation stats.
    x86: fix hpet timer reinit for x86_64
    sched: fix nohz load balancer on cpu offline

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ptrace, x86: fix the usage of ptrace_fork()
    i8327: fix outb() parameter order
    x86: fix math_emu register frame access
    x86: math_emu info cleanup
    x86: include correct %gs in a.out core dump
    x86, vmi: put a missing paravirt_release_pmd in pgd_dtor
    x86: find nr_irqs_gsi with mp_ioapic_routing
    x86: add clflush before monitor for Intel 7400 series
    x86: disable intel_iommu support by default
    x86: don't apply __supported_pte_mask to non-present ptes
    x86: fix grammar in user-visible BIOS warning
    x86/Kconfig.cpu: make Kconfig help readable in the console
    x86, 64-bit: print DMI info in the oops trace

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing, x86: fix constraint for parent variable
    tracing, x86: fix fixup section to return to original code
    profiling: fix broken profiling regression

    Linus Torvalds
     
  • * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
    [S390] Update default configuration.
    [S390] dasd: fix race in dasd timer handling
    [S390] dasd: bus_id -> dev_name() conversion.
    [S390] Fix init irq proc build break.
    [S390] vdso: fix per cpu vdso pointer in lowcore

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW

    Linus Torvalds
     

11 Feb, 2009

19 commits

  • Intel reported a 10% regression (mysql+sysbench) on a 16-way machine
    with these patches:

    1596e29: sched: symmetric sync vs avg_overlap
    d942fb6: sched: fix sync wakeups

    Revert them.

    Reported-by: "Zhang, Yanmin"
    Bisected-by: Lin Ming
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The POSIX timer interface allows for absolute time expiry values through the
    TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock
    every time we start it.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
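
For readers unfamiliar with absolute expiry values, the helper below sketches how one could be computed from a clock reading. `abs_expiry()` is a hypothetical name, and the sketch assumes `now.tv_nsec` is already within [0, 1e9) and that the delay fits in a long.

```c
#include <assert.h>
#include <time.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/*
 * Returns now + delay_ns as a normalised timespec, i.e. a point on the
 * clock's own timeline rather than an interval relative to "now".
 */
static struct timespec abs_expiry(struct timespec now, long delay_ns)
{
    struct timespec t = now;

    assert(delay_ns >= 0);
    t.tv_sec += delay_ns / MODEL_NSEC_PER_SEC;
    t.tv_nsec += delay_ns % MODEL_NSEC_PER_SEC;
    if (t.tv_nsec >= MODEL_NSEC_PER_SEC) {
        t.tv_sec++;
        t.tv_nsec -= MODEL_NSEC_PER_SEC;
    }
    return t;
}
```

The result would typically be placed in the `it_value` member of a `struct itimerspec` and passed to timer_settime() with the TIMER_ABSTIME flag. Because such a value is compared against the clock itself, the timer has to agree with the clock at the moment it is started.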
     
  • To decrease the chance of a missed enable, always enable the timer when we
    sample it; we'll always disable it when we find that there are no active timers
    in the jiffy tick.

    This fixes a flood of warnings reported by Mike Galbraith.

    Reported-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • In dasd_device_set_timer and dasd_block_set_timer we interpret the
    return value of mod_timer in a wrong way. If the timer expires in
    the small window between our check of timer_pending and the call to
    mod_timer, then the timer will be set, mod_timer returns zero and
    we will call add_timer for a timer that is already pending.
    As del_timer and mod_timer do all the necessary checking themselves,
    we can simplify our code and remove the race at the same time.

    Signed-off-by: Stefan Weinhuber
    Signed-off-by: Martin Schwidefsky

    Stefan Weinhuber
     
  • bus_id usage crept in again; fix it.

    Signed-off-by: Cornelia Huck
    Signed-off-by: Heiko Carstens

    Cornelia Huck
     
  • Embed init_irq_proc(s390) within CONFIG_PROC_FS to fix a build break.

    Signed-off-by: Sachin Sant

    Sachin Sant
     
  • The vdso_per_cpu_data entry in the lowcore structure uses __u32
    instead of __u64. If the data page is above 4GB the pointer is
    truncated and the kernel crashes.

    Reported-by: Mijo Safradin
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
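
The truncation described in this entry can be shown with plain integer types. The two structs are hypothetical models of a field that is too narrow and one that is wide enough; they are not the s390 lowcore layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of the pointer field before and after the fix. */
struct model_lowcore_narrow {
    uint32_t per_cpu_data; /* too small for a 64-bit address */
};

struct model_lowcore_wide {
    uint64_t per_cpu_data;
};

/* Stores an address in the 32-bit field and reads it back. */
static uint64_t roundtrip_narrow(uint64_t addr)
{
    struct model_lowcore_narrow lc;

    lc.per_cpu_data = (uint32_t)addr; /* upper 32 bits are lost here */
    return lc.per_cpu_data;
}

/* Stores an address in the 64-bit field and reads it back. */
static uint64_t roundtrip_wide(uint64_t addr)
{
    struct model_lowcore_wide lc;

    lc.per_cpu_data = addr;
    assert(lc.per_cpu_data == addr);
    return lc.per_cpu_data;
}
```

An address just above 4GB, such as 0x100001000, comes back from the narrow field as 0x1000, which points somewhere else entirely.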
     
  • I noticed by pure accident we have ptrace_fork() and friends. This was
    added by "x86, bts: add fork and exit handling", commit
    bf53de907dfdaac178c92d774aae7370d7b97d20.

    I can't test this, ds_request_bts() returns -EOPNOTSUPP, but I strongly
    believe this needs the fix. I think something like this program

    int main(void)
    {
            int pid = fork();

            if (!pid) {
                    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
                    kill(getpid(), SIGSTOP);
                    fork();
            } else {
                    struct ptrace_bts_config bts = {
                            .flags = PTRACE_BTS_O_ALLOC,
                            .size = 4 * 4096,
                    };

                    wait(NULL);

                    ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);
                    ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts));
                    ptrace(PTRACE_CONT, pid, NULL, NULL);

                    sleep(1);
            }

            return 0;
    }

    should crash the kernel.

    If the task is traced by its natural parent ptrace_reparented() returns 0
    but we should clear ->btsxxx anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Markus Metzger
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • The constraint used for retrieving and restoring the parent function
    pointer is incorrect. The parent variable is a pointer, and the
    address of the pointer is modified by the asm statement and not
    the pointer itself. It is incorrect to pass it in as an output
    constraint since the asm will never update the pointer.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • …it/rostedt/linux-2.6-trace into tracing/urgent

    Ingo Molnar
     
  • The following commit:

    commit 64b3d0e8122b422e879b23d42f9e0e8efbbf9744
    Author: Benjamin Herrenschmidt
    Date: Thu Dec 18 19:13:51 2008 +0000

    powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED

    broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now
    actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
    out before we propagate it to the PPC HW PTE.

    Reported-by: Martyn Welch
    Signed-off-by: Kumar Gala
    Signed-off-by: Benjamin Herrenschmidt

    Kumar Gala
     
  • * master.kernel.org:/home/rmk/linux-2.6-arm:
    [ARM] AACI: timeout will reach -1
    [ARM] Storage class should be before const qualifier
    [ARM] pxa: stop and disable IRQ for each DMA channels at startup
    [ARM] pxa: make more SSCR0 bit definitions visible on multiple processors
    [ARM] pxa: fix missing of __REG() definition for ac97 registers access
    [ARM] pxa: fix NAND and MMC clock initialization for pxa3xx

    Linus Torvalds
     
  • Fix regression due to 5a6fe125950676015f5108fb71b2a67441755003,
    "Do not account for the address space used by hugetlbfs using VM_ACCOUNT"
    which added an argument to the function hugetlb_file_setup() but not to
    the macro hugetlb_file_setup().

    Reported-by: Chris Clayton
    Signed-off-by: Stefan Richter
    Acked-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Stefan Richter
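
The kind of mismatch this entry fixes can be sketched as follows. All names are hypothetical (`model_file_setup()`, `MODEL_CONFIG_HUGETLBFS`), and the stub's return value is an assumption chosen for the model; the point is only that a stub macro must accept the same argument list as the function it replaces.

```c
#include <assert.h>
#include <stddef.h>

struct model_file {
    int acctflag;
};

#ifdef MODEL_CONFIG_HUGETLBFS
/* Feature built in: the function takes three parameters. */
static struct model_file model_instance;

static struct model_file *model_file_setup(const char *name, size_t size,
                                            int acctflag)
{
    (void)name;
    assert(size > 0);
    model_instance.acctflag = acctflag;
    return &model_instance;
}
#else
/*
 * Feature compiled out: the stub must take the same three arguments.
 * A two-argument macro here would break every three-argument caller,
 * but only in configurations that use the stub.
 */
#define model_file_setup(name, size, acctflag) ((struct model_file *)NULL)
#endif

/* A caller written against the three-argument form builds either way. */
static struct model_file *model_caller(void)
{
    return model_file_setup("example", 4096, 0);
}
```

Such breakage is easy to miss because it only shows up when building with the feature disabled.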
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc: Add missing sparsemem.h include
    powerpc/pci: mmap anonymous memory when legacy_mem doesn't exist
    powerpc/cell: Add missing #include for oprofile
    powerpc/ftrace: Fix math to calculate offset in TOC
    powerpc: Don't emulate mr. instructions
    powerpc/fsl-booke: Fix mapping functions to use phys_addr_t
    arch/powerpc: Eliminate double sizeof
    powerpc/cpm2: Fix set interrupt type
    powerpc/83xx: Fix TSEC0 workability on MPC8313E-RDB boards
    powerpc/83xx: Fix missing #{address,size}-cells in mpc8313erdb.dts
    powerpc/83xx: Build breakage for CONFIG_PM but no CONFIG_SUSPEND

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    sparc64: Fix probe_kernel_{read,write}().
    sparc64: Kill .fixup section bloat.
    sparc64: Don't hook up pcr_ops on spitfire chips.
    sparc64: Call dump_stack() in die_nmi().

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
    bridge: Fix LRO crash with tun
    IPv6: fix to set device name when new IPv6 over IPv6 tunnel device is created.
    gianfar: Fix boot hangs while bringing up gianfar ethernet
    netfilter: xt_sctp: sctp chunk mapping doesn't work
    netfilter: ctnetlink: fix echo if not subscribed to any multicast group
    netfilter: ctnetlink: allow changing NAT sequence adjustment in creation
    netfilter: nf_conntrack_ipv6: don't track ICMPv6 negotiation message
    netfilter: fix tuple inversion for Node information request
    netxen: fix msi-x interrupt handling
    de2104x: force correct order when writing to rx ring
    tun: Fix unicast filter overflow
    drivers/isdn: introduce missing kfree
    drivers/atm: introduce missing kfree
    sunhme: Don't match PCI devices in SBUS probe.
    9p: fix endian issues [attempt 3]
    net_dma: call dmaengine_get only if NET_DMA enabled
    3c509: Fix resume from hibernation for PnP mode.
    sungem: Soft lockup in sungem on Netra AC200 when switching interface up
    RxRPC: Fix a potential NULL dereference
    r8169: Don't update statistics counters when interface is down
    ...

    Linus Torvalds
     
  • When overcommit is disabled, the core VM accounts for pages used by anonymous
    shared, private mappings and special mappings. It keeps track of VMAs that
    should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
    with VM_NORESERVE.

    Overcommit for hugetlbfs is much riskier than overcommit for base pages
    due to contiguity requirements. It avoids overcommitting on both shared and
    private mappings using reservation counters that are checked and updated
    during mmap(). This ensures (within limits) that hugepages exist in the
    future when faults occur; otherwise it is too easy for applications to be
    SIGKILLed.

    As hugetlbfs makes its own reservations of a different unit to the base page
    size, VM_ACCOUNT should never be set. Even if the units were correct, we would
    double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
    be set because an application can request no reserves be made for hugetlbfs
    at the risk of getting killed later.

    With commit fc8744adc870a8d4366908221508bb113d8b72ee, VM_NORESERVE and
    VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
    breaks the accounting for both the core VM and hugetlbfs, and can trigger an
    OOM storm when hugepage pools are too small, or lockups and corrupted
    counters otherwise. This patch brings hugetlbfs more in line with how the
    core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.

    Signed-off-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Impact: fix to prevent a kernel crash on fault

    If for some reason the pointer to the parent function on the
    stack takes a fault, the fix up code will not return back to
    the original faulting code. This can lead to unpredictable
    results and perhaps even a kernel panic.

    A fault should not happen, but if it does, we should simply
    disable the tracer, warn, and continue running the kernel.
    It should not lead to a kernel crash.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

10 Feb, 2009

1 commit