11 Oct, 2007

3 commits


09 Oct, 2007

2 commits


08 Oct, 2007

2 commits

  • Commit a3d384029aa304f8f3f5355d35f0ae274454f7cd aka
    "[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
    turned the rose_loopback_neigh variable into a statically allocated one.
    However, on unload it gets kfree'd, which can't work.
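    The rule being broken is plain C. kfree(), like userspace free(), may only be given memory obtained from the matching allocator, never the address of a static object. Below is a minimal userspace sketch of a teardown loop that skips the static entry. `struct neigh`, `loopback_neigh` and `rt_free()` are hypothetical stand-ins, not the actual rose code or its fix.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a ROSE neighbour entry, not the kernel struct. */
struct neigh {
	struct neigh *next;
	int number;
};

/* Statically allocated, like rose_loopback_neigh after the earlier commit. */
static struct neigh loopback_neigh;

/* Free every heap-allocated neighbour on the list but skip the static
 * loopback entry: passing its address to free() (or kfree()) is
 * undefined behaviour. Returns the number of entries freed. */
static int rt_free(struct neigh *head)
{
	int freed = 0;

	while (head != NULL) {
		struct neigh *next = head->next;

		if (head != &loopback_neigh) {
			free(head);
			freed++;
		}
		head = next;
	}
	return freed;
}
```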

    Steps to reproduce:

    modprobe rose
    rmmod rose

    BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
    printing eip:
    c014c664
    *pde = 00000000
    Oops: 0000 [#1]
    PREEMPT DEBUG_PAGEALLOC
    Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom
    CPU: 0
    EIP: 0060:[] Not tainted VLI
    EFLAGS: 00210086 (2.6.23-rc9 #3)
    EIP is at kfree+0x48/0xa1
    eax: 00000556 ebx: c1734aa0 ecx: f6a5e000 edx: f7082000
    esi: 00000000 edi: f9a55d20 ebp: 00200287 esp: f6a5ef28
    ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
    Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000)
    Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00
    00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000
    f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000
    Call Trace:
    [] rose_rt_free+0x1d/0x49 [rose]
    [] rose_rt_free+0x1d/0x49 [rose]
    [] rose_exit+0x4c/0xd5 [rose]
    [] sys_delete_module+0x15e/0x186
    [] remove_vma+0x40/0x45
    [] sysenter_past_esp+0x8f/0x99
    [] trace_hardirqs_on+0x118/0x13b
    [] sysenter_past_esp+0x5f/0x99
    =======================
    Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f
    EIP: [] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • It turns out that there are a few other five-second timers in the
    kernel, and if the timers get in sync, the load-average can get
    artificially inflated by events that just happen to coincide.

    So just offset the load average calculation by a timer tick.

    Noticed by Anders Boström, for whom the coincidence started triggering
    on one of his machines with the JBD jiffies rounding code (JBD is one of
    the subsystems that also end up using a 5-second timer by default).
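    A one-tick offset helps because it changes how often two periodic timers expire on the same tick. The sketch below counts those shared ticks. It assumes HZ=250. It models the load-average sampler and a second 5-second timer (such as JBD's) as both starting at tick 0. Both assumptions are only for illustration.

```c
/* Assumed tick rate for this illustration. */
#define HZ 250

/* Count the ticks in [1, window] on which two periodic timers with
 * periods p1 and p2 (in ticks), both started at tick 0, expire together. */
static unsigned long coincidences(unsigned long p1, unsigned long p2,
				  unsigned long window)
{
	unsigned long t, hits = 0;

	for (t = 1; t <= window; t++)
		if (t % p1 == 0 && t % p2 == 0)
			hits++;
	return hits;
}
```

    Take a window of 1,563,750 ticks, about 104 minutes at HZ=250. With identical 5*HZ periods, all 1251 expiries of the first timer coincide with the other timer. With a 5*HZ+1 period, the two timers coincide only once, because consecutive integers are coprime and their least common multiple is the whole window.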

    Tested-by: Anders Boström
    Cc: Chuck Ebbert
    Cc: Arjan van de Ven
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 Oct, 2007

1 commit

  • It is OK to call the prefetch() function with a NULL argument, as specifically
    commented in include/linux/prefetch.h. But in standard C, it is invalid
    to dereference a NULL pointer (see C99 standard 6.5.3.2 paragraph 4 and
    note #84).

    prefetch() takes a memory reference to its argument.

    Newer gcc versions (4.3 and above) will use that to conclude that the "x"
    argument is non-null, and thus wreak havoc everywhere prefetch() was
    inlined.

    Fixed by removing the cast and changing the asm constraint.

    [ It seems gcc 4.2 could in theory miscompile this too, although no
    cases are known. In 2.6.24 we should probably switch to
    __builtin_prefetch() instead, but this is a simpler fix for now.
    -- AK ]
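    Here is a sketch of a NULL-safe prefetch, assuming GCC or Clang. On x86 it passes the address in a register ("r" constraint), so the compiler sees no C-level dereference of x to reason about. On other targets it uses __builtin_prefetch(), the option mentioned in the note above. `prefetch_sketch()` and `sum_list()` are illustrative names, and the exact kernel code may differ.

```c
#include <stddef.h>

/* NULL-safe prefetch sketch. An asm "m" operand built from
 * *(unsigned long *)x is a C dereference of x, which lets the compiler
 * assume x != NULL; a register operand or __builtin_prefetch() is not. */
static inline void prefetch_sketch(const void *x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	/* Address passed in a register; prefetch instructions do not fault. */
	__asm__ __volatile__("prefetcht0 (%0)" : : "r" (x));
#elif defined(__GNUC__)
	__builtin_prefetch(x);
#else
	(void)x;
#endif
}

struct node {
	struct node *next;
	int value;
};

/* Walk a list, prefetching the next node; next is NULL at the tail. */
static long sum_list(const struct node *n)
{
	long sum = 0;

	for (; n != NULL; n = n->next) {
		prefetch_sketch(n->next);
		sum += n->value;
	}
	return sum;
}
```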

    Signed-off-by: Serge Belyshev
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Serge Belyshev
     

04 Oct, 2007

3 commits


03 Oct, 2007

3 commits


02 Oct, 2007

1 commit


01 Oct, 2007

1 commit

  • A "cleanup" almost two years ago deleted the old definition from
    , so asm-generic/fcntl.h defaulted it to the same
    value as FASYNC ... which happened to be the wrong thing.

    Signed-off-by: Ralf Baechle

    Ralf Baechle
     

30 Sep, 2007

1 commit


29 Sep, 2007

2 commits

  • * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [TCP]: Fix MD5 signature handling on big-endian.
    [NET]: Zero length write() on socket should not simply return 0.

    Linus Torvalds
     
  • Based upon a report and initial patch by Peter Lieven.

    tcp4_md5sig_key and tcp6_md5sig_key need to start with
    the exact same members as tcp_md5sig_key, because they
    are both cast to that type by tcp_v{4,6}_md5_do_lookup().

    Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
    length instead of a u8, which is what tcp_md5sig_key
    uses. This happens to work by accident on
    little-endian, but on big-endian it doesn't.

    Instead of casting, just place tcp_md5sig_key as the first member of
    the address-family specific structures, adjust the access sites, and
    kill off the ugly casts.
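    The endianness trap can be reproduced in userspace. The structures below are simplified, hypothetical stand-ins for tcp_md5sig_key and the per-family keys. Reading a u16 length through a u8 view returns the u16's first byte in memory. On little-endian that is the low byte, which is correct for lengths under 256. On big-endian it is the high byte, which is 0. Embedding the generic key as the first member, as the patch does, removes the cast altogether.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct md5sig_key {		/* generic view: u8 key length */
	uint8_t *key;
	uint8_t keylen;
};

struct md5sig_key_v4_old {	/* old layout: u16 key length, then address */
	uint8_t *key;
	uint16_t keylen;
	uint32_t addr;
};

/* What the cast effectively read: the first byte of the u16 in memory.
 * memcpy() shows that byte without relying on the bad cast itself. */
static uint8_t first_byte_of_u16(uint16_t v)
{
	uint8_t b;

	memcpy(&b, &v, 1);
	return b;
}

/* Fixed layout: embed the generic key as the first member, no casts. */
struct md5sig_key_v4 {
	struct md5sig_key base;
	uint32_t addr;
};

static struct md5sig_key *v4_to_generic(struct md5sig_key_v4 *k)
{
	return &k->base;
}
```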

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Sep, 2007

1 commit


27 Sep, 2007

4 commits

  • This reverts commit 184c44d2049c4db7ef6ec65794546954da2c6a0e.

    As noted by Dave Jones:
    "Linus, please revert the above cset. It doesn't seem to be
    necessary (it was added to fix a miscompile in 'make allnoconfig'
    which doesn't seem to be repeatable with it reverted) and actively
    breaks the ARM SA1100 framebuffer driver."

    Requested-by: Dave Jones
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Andi Kleen
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This reverts commit e66485d747505e9d960b864fc6c37f8b2afafaf0, since
    Rafael Wysocki noticed that the change only works for him in -mm, not in
    mainline (and that both "noapictimer" _and_ "apicmaintimer" are broken
    on his hardware, but that's apparently not a regression, just a symptom
    of the same issue that causes the automatic apic timer disable to not
    work).

    It turns out that it really doesn't work correctly on x86-64, since
    x86-64 doesn't use the generic clock events for timers yet.

    Thanks to Rafael for testing, and here are the ugly details on x86-64 as
    per Thomas:

    "I just looked into the code and the logic vs. noapictimer on SMP is
    completely broken.

    On i386 the noapictimer option not only disables the local APIC
    timer, it also registers the CPUs for broadcasting via IPI on SMP
    systems.

    The x86-64 code uses the broadcast only when the local apic timer is
    active, i.e. "noapictimer" is not on the command line. This defeats
    the whole purpose of "noapictimer". It should be there to make boxen
    work, where the local APIC timer actually has a hardware problem,
    e.g. the nx6325.

    The current implementation of x86_64 only fixes the ACPI c-states
    related problem where the APIC timer stops in C3(2), nothing else.

    On nx6325 and other AMD X2 equipped systems which have the C1E
    enabled we run into the following:

    PIT keeps jiffies (and the system) running, but the local APIC timer
    interrupts can get out of sync due to this C1E effect.

    I don't think this is a critical problem, but it is wrong
    nevertheless.

    I think it's safe to revert the C1E patch and postpone the fix to the
    clock events conversion."

    On further reflection, Thomas noted:

    "It's even worse than I thought on the first check:

    "noapictimer" on the command line of an SMP box prevents _ONLY_ the
    boot CPU apic timer from being used. But the secondary CPU is still
    unconditionally setting up the APIC timer and uses the uncalibrated
    variable calibration_result, which is of course 0, to set up the
    APIC timer. Wreckage guaranteed."

    so we'll just have to wait for the x86 merge to hopefully fix this up
    for x86-64.

    Tested-and-requested-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • commit 3556ddfa9284a86a59a9b78fe5894430f6ab4eef titled

    [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E

    solves a problem with AMD dual core laptops e.g. HP nx6325 (Turion 64
    X2) with C1E enabled:

    When both cores go into idle at the same time, then the system switches
    into C1E state, which is basically the same as C3. This stops the local
    apic timer.

    This was debugged right after the dyntick merge on i386 and, despite the
    patch title, it fixes only the 32-bit path.

    x86_64 is still missing this fix. It seems that mainline is not really
    affected by this issue, as the PIT is running and keeps jiffies
    incrementing, but that's just waiting for trouble.

    -mm suffers from this problem due to the x86_64 high resolution timer
    patches.

    This is a quick and dirty port of the i386 code to x86_64.

    I spent quite some time with Rafael to debug the -mm / hrt wreckage until
    someone pointed us to this. I really had forgotten that we debugged this
    half a year ago already.

    Sigh, is it just me or is there something yelling arch/x86 into my ear?

    Signed-off-by: Thomas Gleixner
    Tested-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • It gets a pointer to a fastcall function, expects a pointer to a normal
    one, and calls the sucker.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

26 Sep, 2007

4 commits

  • * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [PPP_MPPE]: Don't put InterimKey on the stack
    SCTP : Add paramters validity check for ASCONF chunk
    SCTP: Discard OOTB packetes with bundled INIT early.
    SCTP: Clean up OOTB handling and fix infinite loop processing
    SCTP: Explicitely discard OOTB chunks
    SCTP: Send ABORT chunk with correct tag in response to INIT ACK
    SCTP: Validate buffer room when processing sequential chunks
    [PATCH] mac80211: fix initialisation when built-in
    [PATCH] net/mac80211/wme.c: fix sparse warning
    [PATCH] cfg80211: fix initialisation if built-in
    [PATCH] net/wireless/sysfs.c: Shut up build warning

    Linus Torvalds
     
  • If ADDIP is enabled, receiving an ASCONF chunk with an ASCONF
    parameter length of zero causes an infinite loop.
    In addition, receiving a malformed ASCONF chunk causes processing
    to access memory without verifying it.

    Both happen because the validity of the parameters in the ASCONF
    chunk is not checked. This patch fixes that.
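    The missing check is the classic TLV walk. The sketch below validates parameters laid out as in RFC 4960: a 16-bit type, a 16-bit length that covers the 4-byte header, and padding to a 4-byte boundary. `count_params()` is a hypothetical helper, not the kernel's code. A length below 4 would leave the cursor where it is, which is exactly the infinite loop. A length past the end of the chunk would read unverified memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk SCTP-style TLV parameters (big-endian type and length, length
 * includes the 4-byte header, 4-byte padding between parameters).
 * Returns the number of parameters, or -1 if the body is malformed. */
static int count_params(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		size_t plen, padded;

		/* The parameter header itself must fit. */
		if (len - off < 4)
			return -1;
		plen = ((size_t)buf[off + 2] << 8) | buf[off + 3];
		/* A length below the header size (e.g. 0) would never advance
		 * the cursor: the infinite-loop case. */
		if (plen < 4)
			return -1;
		/* The parameter must not run past the end of the chunk. */
		if (plen > len - off)
			return -1;
		/* Skip the padding, tolerating an unpadded final parameter. */
		padded = (plen + 3) & ~(size_t)3;
		off += padded < len - off ? padded : len - off;
		count++;
	}
	return count;
}
```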

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • While processing OOTB chunks as well as chunks with an invalid
    length of 0, it was possible for SCTP to get wedged inside an
    infinite loop because we didn't catch the condition correctly,
    or didn't mark the packet for discard correctly.
    This work is based on original findings and work by
    Wei Yongjun.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • Signed-off-by: Alexey Starikovskiy
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Alexey Starikovskiy
     

25 Sep, 2007

1 commit


23 Sep, 2007

2 commits

  • device_suspend() calls ACPI suspend functions, which seem to have undesired
    side effects on lower idle C-states. It took me some time to realize that
    especially the VAIO BIOSes (both Andrew's jinxed UP and my elfstruck SMP one)
    show this effect. I'm quite sure that other bug reports against suspend/resume
    about turning the system into a brick have the same root cause.

    After fishing in the dark for quite some time, I realized that removing the ACPI
    processor module before suspend (this removes the lower C-state functionality)
    made the problem disappear. Interestingly enough, the probability of ending up
    with a bricked box is influenced by various factors (interrupts, size of the RAM
    image, ...). Even adding a bunch of printks in the wrong places made the problem
    go away. The previous periodic tick implementation simply papered over the
    problem, which explains why the dyntick / clockevents changes made this more
    prominent.

    We avoid complex functionality during the boot process and we have to do the
    same during suspend/resume. It is a similar scenario and equally fragile.

    Add suspend / resume functions to the ACPI processor code and disable the lower
    idle C-states across suspend/resume. Fall back to the default idle
    implementation (halt) instead.
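    The approach amounts to a flag that the idle path consults. Below is a minimal single-threaded sketch with hypothetical names (`processor_suspend()`, `processor_resume()`, `select_idle()`). A real driver would also need proper synchronization between the suspend path and the idle CPUs, which this model omits.

```c
#include <stdbool.h>

/* Hypothetical idle states, not the ACPI processor driver's types. */
enum idle_state { IDLE_HALT, IDLE_C2, IDLE_C3 };

static bool cstates_suspended;

/* Suspend/resume callbacks: deep C-states are off limits in between. */
static int processor_suspend(void)
{
	cstates_suspended = true;
	return 0;
}

static int processor_resume(void)
{
	cstates_suspended = false;
	return 0;
}

/* Fall back to plain halt during suspend/resume; otherwise use the
 * deepest state the (simulated) hardware offers. */
static enum idle_state select_idle(enum idle_state deepest)
{
	if (cstates_suspended)
		return IDLE_HALT;
	return deepest;
}
```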

    Signed-off-by: Thomas Gleixner
    Tested-by: Andrew Morton
    Cc: Len Brown
    Cc: Venkatesh Pallipadi
    Cc: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • When compiling the Blackfin kernel, checksyscalls.pl reports lots of missing-syscall warnings.
    This patch adds the missing syscalls that make sense on the Blackfin arch.

    After applying this patch, the toolchain should be rebuilt, and the kernel then recompiled with
    the new toolchain.

    Signed-off-by: Bryan Wu

    Bryan Wu
     

22 Sep, 2007

1 commit

  • This reverts commit 34feb2c83beb3bdf13535a36770f7e50b47ef299.

    Suresh Siddha points out that this one breaks the fundamental
    requirement that you cannot free page table pages before the TLB caches
    are flushed. The quicklists do not give the same kinds of guarantees
    that the mmu_gather structure does, at least not in NUMA configurations.
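    The ordering guarantee at stake is "flush, then free". Below is a hypothetical sketch of a batching structure in the spirit of mmu_gather, not its real layout or API. Page-table pages are queued, and nothing is released until after a simulated TLB flush. That holds even when the batch fills up.

```c
#include <stdlib.h>

#define BATCH_MAX 16

/* Hypothetical gather structure: queued pages plus a flush counter. */
struct gather {
	void *pages[BATCH_MAX];
	int nr;
	int flushes;	/* counts simulated TLB flushes */
};

/* Stand-in for the architecture's TLB shootdown. */
static void tlb_flush(struct gather *g)
{
	g->flushes++;
}

/* Flush first, then release everything that was queued. */
static void gather_finish(struct gather *g)
{
	int i;

	tlb_flush(g);
	for (i = 0; i < g->nr; i++)
		free(g->pages[i]);
	g->nr = 0;
}

/* Queue a page-table page instead of freeing it immediately; a full
 * batch forces a flush before anything is released. */
static void gather_remove_page(struct gather *g, void *page)
{
	if (g->nr == BATCH_MAX)
		gather_finish(g);
	g->pages[g->nr++] = page;
}
```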

    Requested-by: Suresh Siddha
    Acked-by: Andi Kleen
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Asit Mallick
    Cc: Tony Luck
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Sep, 2007

1 commit

  • This simplifies the signalfd code by no longer keeping the signalfd attached
    to the sighand for its whole lifetime.

    Instead, the signalfd remains attached to the sighand only during
    poll(2) (and select and epoll) and read(2). This also allows removing
    all the custom "tsk == current" checks in kernel/signal.c, since
    dequeue_signal() will only be called by "current".

    I think this is also what Ben was suggesting some time ago.

    The external effect of this is that a thread can extract only its own
    private signals and the group ones. I think this is an acceptable
    behaviour, in that those are the signals the thread would be able to
    fetch without signalfd.
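    The visible behaviour can be exercised with the real signalfd(2) interface. This sketch assumes Linux with glibc. It blocks a signal, raises it against the calling thread (a thread-private signal), and reads it back through a signalfd. `fetch_own_signal()` is an illustrative name. A multithreaded program would use pthread_sigmask() rather than sigprocmask().

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Block signo, raise it at the calling thread, then fetch it through a
 * signalfd. Returns the signal number read, or -1 on error. */
static int fetch_own_signal(int signo)
{
	sigset_t mask;
	struct signalfd_siginfo info;
	int fd, ret = -1;

	sigemptyset(&mask);
	sigaddset(&mask, signo);
	/* The signal must be blocked so it stays pending for the signalfd
	 * instead of being handled by its normal disposition. */
	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
		return -1;

	fd = signalfd(-1, &mask, 0);
	if (fd == -1)
		return -1;

	/* With glibc, raise() targets the calling thread, so this is a
	 * thread-private pending signal. */
	if (raise(signo) == 0 &&
	    read(fd, &info, sizeof(info)) == (ssize_t)sizeof(info))
		ret = (int)info.ssi_signo;

	close(fd);
	return ret;
}
```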

    Signed-off-by: Davide Libenzi
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

20 Sep, 2007

7 commits

  • Add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
    more aggressive, by moving the yielding task to the last position
    in the rbtree.

    with sched_compat_yield=0:

    PID   USER   PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+    COMMAND
    2539  mingo  20  0   1576  252  204  R  50    0.0   0:02.03  loop_yield
    2541  mingo  20  0   1576  244  196  R  50    0.0   0:02.05  loop

    with sched_compat_yield=1:

    PID   USER   PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+    COMMAND
    2584  mingo  20  0   1576  248  196  R  99    0.0   0:52.45  loop
    2582  mingo  20  0   1576  256  204  R  0     0.0   0:00.00  loop_yield
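    The sources of the loop and loop_yield test tasks are not part of the commit. The sketch below is only a guess at their shape: one task spins, and the other calls sched_yield() in a loop. Both are bounded by an iteration count here so they terminate. The compat behaviour itself is switched on by writing 1 to /proc/sys/kernel/sched_compat_yield, on kernels that have this sysctl.

```c
#include <sched.h>

/* Guess at the loop_yield task: call sched_yield() repeatedly.
 * Bounded by `iterations` here; the original presumably ran forever. */
static unsigned long loop_yield(unsigned long iterations)
{
	unsigned long done = 0;

	while (done < iterations) {
		sched_yield();
		done++;
	}
	return done;
}

/* Guess at the companion "loop" task: a busy loop that never yields. */
static unsigned long loop(unsigned long iterations)
{
	volatile unsigned long done = 0;

	while (done < iterations)
		done++;
	return done;
}
```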

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra

    Ingo Molnar
     
  • * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
    [MIPS] cpu-bugs64.c: GCC 3.3 constraint workaround
    [MIPS] DEC: Initialise ioasic_ssr_lock

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
    [POWERPC] Fix timekeeping on PowerPC 601
    [POWERPC] Don't expose clock vDSO functions when CPU has no timebase
    [POWERPC] spusched: Fix null pointer dereference in find_victim

    Linus Torvalds
     
  • Add a workaround to address warnings generated on the "n" constraint by
    GCC 3.3 and below.

    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Ralf Baechle

    Maciej W. Rozycki
     
  • This patch proposes fixes to the reference counting of memory policy in the
    page allocation paths and in show_numa_map(). Extracted from my "Memory
    Policy Cleanups and Enhancements" series as a stand-alone patch.

    Shared policy lookup [shmem] has always added a reference to the policy,
    but this was never unrefed after page allocation or after formatting the
    numa map data.

    Default system policy should not require additional ref counting, nor
    should the current task's task policy. However, show_numa_map() calls
    get_vma_policy() to examine what may be [likely is] another task's policy.
    The latter case needs protection against freeing of the policy.

    This patch adds a reference count to a mempolicy returned by
    get_vma_policy() when the policy is a vma policy or another task's
    mempolicy. Again, shared policy is already reference counted on lookup. A
    matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
    shared and vma policies, and in show_numa_map() for shared and another
    task's mempolicy. We can call __mpol_free() directly, saving an admittedly
    inexpensive inline NULL test, because we know we have a non-NULL policy.

    Handling policy ref counts for hugepages is a bit trickier.
    huge_zonelist() returns a zone list that might come from a shared or vma
    'BIND policy. In this case, we should hold the reference until after the
    huge page allocation in dequeue_hugepage(). The patch modifies
    huge_zonelist() to return a pointer to the mempolicy if it needs to be
    unref'd after allocation.

    Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:

                 w/o patch             w/ refcount patch
                 Avg       Std Devn    Avg       Std Devn
    Real:        100.59    0.38        100.63    0.43
    User:        1209.60   0.37        1209.91   0.31
    System:      81.52     0.42        81.64     0.34
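    The reference-counting rule can be modelled in a few lines. In this hypothetical sketch, which does not use the mempolicy API, the lookup takes a reference for vma, shared and other-task policies and reports whether it did. The caller then drops that reference after the allocation. Default and current-task policies are never counted.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of where a policy came from. */
enum pol_source { POL_DEFAULT, POL_CURRENT_TASK, POL_OTHER_TASK,
		  POL_VMA, POL_SHARED };

struct policy {
	int refcnt;
};

/* Take a reference when the policy could be freed under us. Returns
 * true if the caller must drop the reference afterwards. */
static bool get_policy(struct policy *p, enum pol_source src)
{
	switch (src) {
	case POL_VMA:
	case POL_OTHER_TASK:
	case POL_SHARED:
		p->refcnt++;
		return true;
	default:
		return false;
	}
}

static void put_policy(struct policy *p)
{
	p->refcnt--;	/* real code would free the policy at zero */
}

/* Caller pattern: look up, allocate, then unref only if needed. */
static void alloc_with_policy(struct policy *p, enum pol_source src)
{
	bool needs_put = get_policy(p, src);

	/* ... page allocation using the policy would go here ... */
	if (needs_put)
		put_policy(p);
}
```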

    Signed-off-by: Lee Schermerhorn
    Acked-by: Andi Kleen
    Cc: Christoph Lameter
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • It turned out that the user namespace is released during the do_exit() in
    exit_task_namespaces(), but the struct user_struct is released only during the
    put_task_struct(), i.e. MUCH later.

    On debug kernels with poisoned slabs this will cause an oops in
    uid_hash_remove() because the head of the chain, which resides inside the
    struct user_namespace, will be already freed and poisoned.

    Since the uid hash itself is required only when someone can search it, i.e.
    when the namespace is alive, we can safely unhash all the user_struct-s from
    it during the namespace exiting. The subsequent free_uid() will complete the
    user_struct destruction.

    For example, this simple program

    #define _GNU_SOURCE
    #include <sched.h>

    char stack[2 * 1024 * 1024];

    int f(void *foo)
    {
        return 0;
    }

    int main(void)
    {
        /* 0x10000000 is CLONE_NEWUSER */
        clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
        return 0;
    }

    run on a kernel with CONFIG_USER_NS turned on will oops the
    kernel immediately.

    This was spotted during OpenVZ kernel testing.
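    The order of teardown in the fix can be modelled in userspace. In this hypothetical sketch, which uses simplified names and not the kernel's uid hash code, the bucket array lives inside the namespace object. user_ns_exit() unhashes every entry. A later free_user() then finds an unhashed entry and never writes into the freed buckets.

```c
#include <stdlib.h>

#define UIDHASH_SZ 8

/* Simplified, hypothetical model of a user_struct in the uid hash. */
struct user {
	int uid;
	struct user *next;
	struct user **pprev;	/* NULL once unhashed */
};

/* The bucket heads live inside the namespace, as in the bug report. */
struct user_ns {
	struct user *hash[UIDHASH_SZ];
};

static void uid_hash_insert(struct user_ns *ns, struct user *u)
{
	struct user **head = &ns->hash[u->uid % UIDHASH_SZ];

	u->next = *head;
	if (*head)
		(*head)->pprev = &u->next;
	*head = u;
	u->pprev = head;
}

static void uid_hash_remove(struct user *u)
{
	if (!u->pprev)
		return;		/* already unhashed: do not touch the buckets */
	*u->pprev = u->next;
	if (u->next)
		u->next->pprev = u->pprev;
	u->pprev = NULL;
}

/* On namespace exit, unhash every entry before the buckets go away. */
static void user_ns_exit(struct user_ns *ns)
{
	int i;

	for (i = 0; i < UIDHASH_SZ; i++)
		while (ns->hash[i])
			uid_hash_remove(ns->hash[i]);
}

/* Late destruction, like free_uid() running from put_task_struct(). */
static void free_user(struct user *u)
{
	uid_hash_remove(u);
	free(u);
}
```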

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Acked-by: "Serge E. Hallyn"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Surprisingly (as spotted by Alexey Dobriyan), the uid hash still uses
    list_heads, thus occupying twice as much space as it could. Convert it to
    hlist_heads.
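    The saving is in the bucket array. A list_head holds two pointers and an hlist_head holds only one. Each entry's hlist_node still has two pointers (next and pprev), so deletion stays O(1). The sketch below uses minimal userspace copies of these structures. The field names follow include/linux/list.h, but this is not the kernel header; for example, it does not poison deleted nodes.

```c
#include <stddef.h>

/* Minimal userspace copies of the two kernel list shapes. */
struct list_head {		/* doubly linked, circular: two pointers */
	struct list_head *next, *prev;
};

struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {		/* hash bucket: a single pointer */
	struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* pprev points at whatever points to us, so no bucket lookup is needed. */
static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}
```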

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov