13 Oct, 2009

1 commit

  • Create a new socket level option to report number of queue overflows

    Recently I augmented the AF_PACKET protocol to report the number of frames lost
    on the socket receive queue between any two enqueued frames. This value was
    exported via a SOL_PACKET level cmsg. After I completed that work, it was
    requested that this feature be generalized so that any datagram-oriented socket
    could make use of this option. As such, I've created this patch. It creates a
    new SOL_SOCKET level option called SO_RXQ_OVFL, which, when enabled, exports a
    SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
    overflowed between any two given frames. It also augments the AF_PACKET
    protocol to take advantage of this new feature (as it previously did not touch
    sk->sk_drops, which this patch uses to record the overflow count). Tested
    successfully by me.

    Notes:

    1) Unlike my previous patch, this patch simply records the sk_drops value, which
    is not a number of drops between packets, but rather a total number of drops.
    Deltas must be computed in user space.

    2) While this patch currently works with datagram oriented protocols, it will
    also be accepted by non-datagram oriented protocols. I'm not sure if that's
    agreeable to everyone, but my argument in favor of doing so is that, for those
    protocols which aren't applicable to this option, sk_drops will always be zero,
    and reporting no drops on a receive queue that isn't used for those
    non-participating protocols seems reasonable to me. This also saves us having
    to code in a per-protocol opt in mechanism.

    3) This applies cleanly to net-next assuming that commit
    977750076d98c7ff6cbda51858bb5a5894a9d9ab (my AF_PACKET cmsg patch) is reverted.

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
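    Note 1 says the cmsg carries the socket's cumulative sk_drops total, so per-frame loss has to be derived in user space. A minimal sketch, assuming the option was enabled with setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof(one)) and that the payload is a 32-bit counter; the helper names rxq_ovfl_from_msg and drops_since are hypothetical, and the fallback value 40 is the asm-generic constant, which differs on some architectures.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40 /* assumption: asm-generic value; other arches differ */
#endif

/* Scan the control messages of a received msghdr for the SOL_SOCKET /
 * SO_RXQ_OVFL cmsg. On success store the cumulative drop count in *drops
 * and return 1; return 0 if the cmsg is absent. */
static int rxq_ovfl_from_msg(struct msghdr *msg, uint32_t *drops)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SO_RXQ_OVFL &&
            cm->cmsg_len >= CMSG_LEN(sizeof(uint32_t))) {
            /* memcpy avoids assuming CMSG_DATA is suitably aligned */
            memcpy(drops, CMSG_DATA(cm), sizeof(*drops));
            return 1;
        }
    }
    return 0;
}

/* The kernel reports a running total, so the drops between two frames are
 * the difference of consecutive samples; unsigned arithmetic handles wrap. */
static uint32_t drops_since(uint32_t prev_total, uint32_t cur_total)
{
    return cur_total - prev_total;
}
```

    Keeping the previous total per socket and calling drops_since() on each recvmsg() gives the between-frames figure that the earlier AF_PACKET-only patch reported directly.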
     

02 Oct, 2009

1 commit


28 Sep, 2009

1 commit


25 Sep, 2009

2 commits

  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    Fix build of cpm_uart due to core changes
    powerpc/8xx: Fix regression introduced by cache coherency rewrite
    powerpc/4xx: Fix erroneous xmon warning on PowerPC 4xx
    powerpc/mm: Fix 40x and 8xx vs. _PAGE_SPECIAL
    powerpc: Cleanup linker script using new linker script macros.
    powerpc: Fix ibm,client-architecture-support printout
    powerpc: Increase NODES_SHIFT on 64bit from 4 to 8
    powerpc/perf_counter: Fix vdso detection
    powerpc: Move 64bit heap above 1TB on machines with 1TB segments
    powerpc: Change archdata dma_data to a union
    powerpc: Rename get_dma_direct_offset get_dma_offset
    powerpc/mm: Remove duplicated #include
    powerpc/book3e-64: Remove duplicated #include
    powerpc: Check for unsupported relocs when using CONFIG_RELOCATABLE
    powerpc/pmc: Don't access lppaca on Book3E
    powerpc: kmalloc failure ignored in vio_build_iommu_table()
    hvc_console: Provide (un)locked version for hvc_resize()

    Linus Torvalds
     
  • Signed-off-by: Tim Abbott
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: linuxppc-dev@ozlabs.org
    Acked-by: Sam Ravnborg
    Signed-off-by: Linus Torvalds

    Tim Abbott
     

24 Sep, 2009

26 commits

  • * 'for-linus' of git://neil.brown.name/md: (97 commits)
    md: raid-1/10: fix RW bits manipulation
    md: remove unnecessary memset from multipath.
    md: report device as congested when suspended
    md: Improve name of threads created by md_register_thread
    md: remove sparse warnings about lock context.
    md: remove sparse waring "symbol xxx shadows an earlier one"
    async_tx/raid6: add missing dma_unmap calls to the async fail case
    ioat3: fix uninitialized var warnings
    drivers/dma/ioat/dma_v2.c: fix warnings
    raid6test: fix stack overflow
    ioat2: clarify ring size limits
    md/raid6: cleanup ops_run_compute6_2
    md/raid6: eliminate BUG_ON with side effect
    dca: module load should not be an error message
    ioat: driver version 4.0
    dca: registering requesters in multiple dca domains
    async_tx: remove HIGHMEM64G restriction
    dmaengine: sh: Add Support SuperH DMA Engine driver
    dmaengine: Move all map_sg/unmap_sg for slave channel to its client
    fsldma: Add DMA_SLAVE support
    ...

    Linus Torvalds
     
  • After upgrading to the latest kernel on my mpc875, userspace started
    running incredibly slowly (hours to get to a shell, even!).
    I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
    that patch removed a work-around for the 8xx. Adding it
    back makes my problem go away.

    Signed-off-by: Rex Feany
    Signed-off-by: Benjamin Herrenschmidt

    Rex Feany
     
  • The xmon code relies on MSR_RI being non-zero to indicate that an exception
    is recoverable. If it is not, it prints a warning message. However, the
    PowerPC 4xx cores do not have an MSR_RI bit and this warning is produced for
    every xmon event.

    This introduces an unrecoverable_excp function to determine if an exception
    is recoverable or not. This gets rid of the erroneous warnings on 4xx.

    Signed-off-by: Josh Boyer
    Signed-off-by: Benjamin Herrenschmidt

    Josh Boyer
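    The check described above can be sketched in user space. This is a minimal illustration, not the xmon code: struct demo_regs is a hypothetical stand-in for the saved registers, and the runtime flag cpu_has_msr_ri is an assumption used to show both the 4xx case and the normal case in one function. MSR_RI is bit 1 (mask 0x2) of the PowerPC MSR.

```c
#include <stdbool.h>

#define MSR_RI (1UL << 1) /* Recoverable Interrupt bit of the MSR */

/* Hypothetical user-space stand-in for the saved register state. */
struct demo_regs {
    unsigned long msr;
};

/* Cores without an MSR_RI bit (such as 4xx) never report an exception as
 * unrecoverable; elsewhere a clear RI bit means it cannot be resumed. */
static bool unrecoverable_excp(const struct demo_regs *regs, bool cpu_has_msr_ri)
{
    if (!cpu_has_msr_ri)
        return false;
    return (regs->msr & MSR_RI) == 0;
}
```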
     
  • The test to check whether we have _PAGE_SPECIAL defined is broken,
    since we always define it, just not always to a meaningful value :-)

    That broke 8xx and 40x under some circumstances.

    This fixes it by adding _PAGE_SPECIAL for both of these since they
    had a free PTE bit, and removing the condition around advertising
    it.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
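    The bug is the difference between testing whether a macro is defined and testing its value. A self-contained illustration, where PAGE_SPECIAL_DEMO is a hypothetical stand-in rather than the real powerpc header: a macro defined to 0 still satisfies #ifdef.

```c
#include <stdbool.h>

/* Hypothetical stand-in: defined everywhere, but 0 where no PTE bit exists. */
#define PAGE_SPECIAL_DEMO 0

/* Broken test: only checks that the macro exists. */
static bool has_special_ifdef(void)
{
#ifdef PAGE_SPECIAL_DEMO
    return true; /* taken even though the bit value is 0 */
#else
    return false;
#endif
}

/* Value test: checks that the macro carries a real bit. */
static bool has_special_value(void)
{
    return PAGE_SPECIAL_DEMO != 0;
}
```

    The patch avoids the question entirely by giving 8xx and 40x a real bit and advertising it unconditionally.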
     
  • Signed-off-by: Tim Abbott
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: linuxppc-dev@ozlabs.org
    Cc: Sam Ravnborg
    Acked-by: Sam Ravnborg
    Signed-off-by: Benjamin Herrenschmidt

    Tim Abbott
     
  • On machines without the ibm,client-architecture-support call we were missing a
    newline. We may as well print the full name in all its glory too - it's
    ibm,client-architecture-support, not ibm,client-architecture as I mistakenly
    wrote (a name only an IBM architect could love).

    For my penance I will write out ibm,client-architecture-support 100 times.

    Before:

    Calling ibm,client-architecture...command line: root=/dev/sda6 console=hvc0 quiet

    After:

    Calling ibm,client-architecture-support... not implemented
    command line: root=/dev/sda6 console=hvc0

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Some System p configurations can already have more than 16 nodes, so we
    need to increase NODES_SHIFT. I chose 8 (256 nodes) to give us some room to
    grow in the future, although we can look at something smaller if the memory
    bloat is considered too much.

    Unless we clamp MAX_ACTIVE_REGIONS we end up with 300kB of extra bloat in
    early_node_map in mm/page_alloc.c:

    < 6144 early_node_map
    > 307200 early_node_map

    due to:

    #if MAX_NUMNODES >= 32
    /* If there can be many nodes, allow up to 50 holes per node */
    #define MAX_ACTIVE_REGIONS (MAX_NUMNODES*50)
    #else
    /* By default, allow up to 256 distinct regions */
    #define MAX_ACTIVE_REGIONS 256
    #endif

    Since our memory is mostly contiguous, it seems reasonable to keep this
    at 256 for now. I also set it to 32 on 32-bit to save space (is there any
    chance a 32-bit system will have more than 32 discontiguous memory ranges?).

    Even with that fixed we have a few data structures that grow:

    < 896 bootmem_node_data
    > 14336 bootmem_node_data

    < 1280 node_devices
    > 20480 node_devices

    < 25088 kmalloc_caches
    > 59648 kmalloc_caches

    < 1632 hstates
    > 21792 hstates

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
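    The early_node_map figures quoted above follow directly from the #if rule. A minimal check, assuming each entry is two unsigned longs plus an int (24 bytes on an LP64 build, which matches both figures: 6144 / 256 = 307200 / 12800 = 24); the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Assumed entry layout: two unsigned longs plus an int, padded to 24 bytes on LP64. */
struct demo_active_region {
    unsigned long start_pfn;
    unsigned long end_pfn;
    int nid;
};

/* Same rule as the quoted #if: 50 regions per node once there are at
 * least 32 nodes, otherwise a flat 256. */
static size_t max_active_regions(unsigned int nodes_shift)
{
    size_t max_numnodes = (size_t)1 << nodes_shift;

    return max_numnodes >= 32 ? max_numnodes * 50 : 256;
}

/* Static footprint of the early_node_map array for a given NODES_SHIFT. */
static size_t early_node_map_bytes(unsigned int nodes_shift)
{
    return max_active_regions(nodes_shift) * sizeof(struct demo_active_region);
}
```

    With NODES_SHIFT 4 (16 nodes) the fallback of 256 entries applies; with NODES_SHIFT 8 the 50-per-node rule yields 12800 entries, which is why clamping MAX_ACTIVE_REGIONS matters.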
     
  • perf_counter uses arch_vma_name() to detect a vdso region which in turn uses
    current->mm->context.vdso_base. We need to initialise this before doing
    the mmap or else we fail to detect the vdso.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • If we are using 1TB segments and we are allowed to randomise the heap, we can
    put it above 1TB so it is backed by a 1TB segment. Otherwise the heap will be
    in the bottom 1TB which always uses 256MB segments and this may result in a
    performance penalty.

    This functionality is disabled when heap randomisation is turned off:

    echo 1 > /proc/sys/kernel/randomize_va_space

    which may be useful when trying to allocate the maximum amount of 16M or 16G
    pages.

    On a microbenchmark that repeatedly touches 32GB of memory with a stride of
    256MB + 4kB (designed to stress 256MB segments while still mapping nicely into
    the L1 cache), we see the improvement:

    Force malloc to use heap all the time:
    # export MALLOC_MMAP_MAX_=0 MALLOC_TRIM_THRESHOLD_=-1

    Disable heap randomization:
    # echo 1 > /proc/sys/kernel/randomize_va_space
    # time ./test
    12.51s

    Enable heap randomization:
    # echo 2 > /proc/sys/kernel/randomize_va_space
    # time ./test
    1.70s

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • The archdata dma_data field is sometimes used to hold a simple offset, and
    sometimes to hold a pointer. This patch changes it to a union containing
    void * and dma_addr_t. Get/set accessors are also provided, because it was
    getting a bit ugly to get to the actual data.

    Signed-off-by: Becky Bruce
    Signed-off-by: Benjamin Herrenschmidt

    Becky Bruce
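    A simplified user-space mirror of the union-plus-accessors pattern described above. All names here are hypothetical (demo_ prefix), and dma_addr_t is assumed to be a 64-bit integer; this is not the powerpc header itself.

```c
#include <stdint.h>

typedef uint64_t demo_dma_addr_t; /* assumption: 64-bit DMA addresses */

/* Per-device arch data holding either a plain DMA offset or a pointer. */
struct demo_dev_archdata {
    union {
        demo_dma_addr_t dma_offset;
        void *iommu_table_base;
    } dma_data;
};

/* Accessors hide which union member a given configuration uses. */
static inline demo_dma_addr_t demo_get_dma_offset(const struct demo_dev_archdata *ad)
{
    return ad->dma_data.dma_offset;
}

static inline void demo_set_dma_offset(struct demo_dev_archdata *ad, demo_dma_addr_t off)
{
    ad->dma_data.dma_offset = off;
}

static inline void *demo_get_iommu_table_base(const struct demo_dev_archdata *ad)
{
    return ad->dma_data.iommu_table_base;
}

static inline void demo_set_iommu_table_base(struct demo_dev_archdata *ad, void *base)
{
    ad->dma_data.iommu_table_base = base;
}
```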
     
  • The former name, get_dma_direct_offset, is no longer really accurate now
    that the swiotlb case is a possibility. I also move it into dma-mapping.h;
    it no longer needs to be in dma.c, and there are about to be some more
    accessors that should all end up in the same place. A comment is added to
    indicate that this function is not used in configs where there is no
    simple dma offset, such as the iommu case.

    Signed-off-by: Becky Bruce
    Signed-off-by: Benjamin Herrenschmidt

    Becky Bruce
     
  • Remove duplicated #include('s) in
    arch/powerpc/mm/tlb_low_64e.S

    Signed-off-by: Huang Weiyi
    Signed-off-by: Benjamin Herrenschmidt

    Huang Weiyi
     
  • Remove duplicated #include('s) in
    arch/powerpc/kernel/exceptions-64e.S

    Signed-off-by: Huang Weiyi
    Signed-off-by: Benjamin Herrenschmidt

    Huang Weiyi
     
  • When using CONFIG_RELOCATABLE, we build the kernel as a position
    independent executable. The kernel then uses a little bit of relocation
    code to relocate itself. That code only deals with R_PPC64_RELATIVE
    relocations though. If for some reason you use assembly constructs
    such as LOAD_REG_IMMEDIATE() to load the address of a symbol, you'll
    generate different kinds of relocations that won't be processed properly
    and bad things will happen. (We have 2 such bugs today).

    The perl script tries to filter out "known" bad ones. It's possible
    that we are missing some in the case of a weak function that nobody
    implements; we'll see if we get false positives and fix them.

    Signed-off-by: Tony Breeds
    Signed-off-by: Benjamin Herrenschmidt

    Tony Breeds
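    The check itself is a Perl script; the core idea can be sketched in C over an in-memory array of Elf64_Rela entries from <elf.h>, counting any relocation the self-relocation code could not apply. This is an illustration only: it does not read vmlinux and does not reproduce the script's filtering of "known" cases; the function name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>

#ifndef R_PPC64_NONE
#define R_PPC64_NONE 0
#endif
#ifndef R_PPC64_RELATIVE
#define R_PPC64_RELATIVE 22
#endif

/* Count relocations other than R_PPC64_RELATIVE; R_PPC64_NONE entries
 * are ignored because they require no action. */
static size_t count_unsupported_relocs(const Elf64_Rela *rela, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long type = ELF64_R_TYPE(rela[i].r_info);

        if (type != R_PPC64_RELATIVE && type != R_PPC64_NONE)
            bad++;
    }
    return bad;
}
```

    A non-zero count corresponds to the "bad things will happen" case above, for example an absolute R_PPC64_ADDR64 produced by loading a symbol address directly.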
     
  • The lppaca doesn't exist on Book3E!

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • Prevent NULL dereference if kmalloc() fails.

    Signed-off-by: Roel Kluin
    Signed-off-by: Benjamin Herrenschmidt

    Roel Kluin
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
    cpumask: Move deprecated functions to end of header.
    cpumask: remove unused deprecated functions, avoid accusations of insanity
    cpumask: use new-style cpumask ops in mm/quicklist.
    cpumask: use mm_cpumask() wrapper: x86
    cpumask: use mm_cpumask() wrapper: um
    cpumask: use mm_cpumask() wrapper: mips
    cpumask: use mm_cpumask() wrapper: mn10300
    cpumask: use mm_cpumask() wrapper: m32r
    cpumask: use mm_cpumask() wrapper: arm
    cpumask: Use accessors for cpu_*_mask: um
    cpumask: Use accessors for cpu_*_mask: powerpc
    cpumask: Use accessors for cpu_*_mask: mips
    cpumask: Use accessors for cpu_*_mask: m32r
    cpumask: remove arch_send_call_function_ipi
    cpumask: arch_send_call_function_ipi_mask: s390
    cpumask: arch_send_call_function_ipi_mask: powerpc
    cpumask: arch_send_call_function_ipi_mask: mips
    cpumask: arch_send_call_function_ipi_mask: m32r
    cpumask: arch_send_call_function_ipi_mask: alpha
    cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
    ...

    Linus Torvalds
     
  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c does not need utsname.h, however
    due to some personality stuff it _is_ needed -- cowardly leave ELF-related
    headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Use the accessors rather than frobbing bits directly (the new versions
    are const).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     
  • Now that everyone is converted to arch_send_call_function_ipi_mask, remove
    the shim and the #defines.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We're weaning the core code off handing cpumasks around on-stack.
    This introduces arch_send_call_function_ipi_mask(), and by defining
    it, the old arch_send_call_function_ipi is defined by the core code.

    Signed-off-by: Rusty Russell

    Rusty Russell
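    The transition described above, where defining the new by-pointer hook gives callers the old by-value name for free, can be sketched with a toy cpumask. Everything here is hypothetical: struct demo_cpumask, demo_ipi_sent, and the macro form of the shim are assumptions for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Toy cpumask: one bit per CPU, up to 64 CPUs. */
struct demo_cpumask {
    uint64_t bits;
};

static uint64_t demo_ipi_sent; /* records which CPUs were "sent" an IPI */

/* New-style arch hook: takes the mask by pointer, so no copy on the stack. */
static void arch_send_call_function_ipi_mask(const struct demo_cpumask *mask)
{
    demo_ipi_sent |= mask->bits;
}

/* Compatibility shim supplied once the arch defines the new hook: old
 * by-value call sites keep compiling but are routed to the pointer version. */
#define arch_send_call_function_ipi(mask) \
    arch_send_call_function_ipi_mask(&(mask))
```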
     
  • These were replaced by topology_core_cpumask and topology_thread_cpumask.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Signed-off-by: Rusty Russell

    Rusty Russell
     
  • cpumask_of_pcibus() is the new version.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
    Use macros for .data.page_aligned section.
    Use macros for .bss.page_aligned section.
    Use new __init_task_data macro in arch init_task.c files.
    kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
    arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
    kbuild: add static to prototypes
    kbuild: fail build if recordmcount.pl fails
    kbuild: set -fconserve-stack option for gcc 4.5
    kbuild: echo the record_mcount command
    gconfig: disable "typeahead find" search in treeviews
    kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
    checkincludes.pl: add option to remove duplicates in place
    markup_oops: use modinfo to avoid confusion with underscored module names
    checkincludes.pl: provide usage helper
    checkincludes.pl: close file as soon as we're done with it
    ctags: usability fix
    kernel hacking: move STRIP_ASM_SYMS from General
    gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
    kbuild: Check if linker supports the -X option
    kbuild: introduce ld-option
    ...

    Fix trivial conflict in scripts/basic/fixdep.c

    Linus Torvalds
     
  • * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    itimers: Add tracepoints for itimer
    hrtimer: Add tracepoint for hrtimers
    timers: Add tracepoints for timer_list timers
    cputime: Optimize jiffies_to_cputime(1)
    itimers: Simplify arm_timer() code a bit
    itimers: Fix periodic tics precision
    itimers: Merge ITIMER_VIRT and ITIMER_PROF

    Trivial header file include conflicts in kernel/fork.c

    Linus Torvalds
     

23 Sep, 2009

7 commits

  • For /proc/kcore, each arch registers its memory ranges with kclist_add().
    Usually, the following are registered:

    - range of physical memory
    - range of vmalloc area
    - text, etc...

    but "range of physical memory" has some problems. It isn't updated on
    memory hotplug, and it tends to include unnecessary memory holes.
    /proc/iomem (kernel/resource.c), on the other hand, includes the required
    physical memory range information and is properly updated on memory
    hotplug. So it is better to avoid maintaining separate code (duplicating
    information) and to rebuild the kclist for physical memory based on
    /proc/iomem.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Jiri Slaby
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
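    The physical memory information mentioned above is visible from user space as /proc/iomem lines of the form "start-end : name", with child resources indented. A minimal sketch that picks out top-level "System RAM" ranges from such lines; the function name is hypothetical, and the kernel itself walks the resource tree rather than parsing this text.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/iomem line such as "00100000-bfffffff : System RAM".
 * Returns 1 and fills [*start, *end] for a top-level (unindented)
 * "System RAM" entry, 0 otherwise. */
static int parse_iomem_ram(const char *line, unsigned long long *start,
                           unsigned long long *end)
{
    char name[64];

    if (line[0] == ' ') /* indented lines are nested child resources */
        return 0;
    if (sscanf(line, "%llx-%llx : %63[^\n]", start, end, name) != 3)
        return 0;
    return strcmp(name, "System RAM") == 0;
}
```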
     
  • Originally, walk_memory_resource() was introduced to traverse all memory
    of "System RAM" for detecting memory hotplug/unplug ranges. For doing so,
    the flags IORESOURCE_MEM|IORESOURCE_BUSY were used, and this was enough for
    memory hotplug.

    But for other purposes, such as /proc/kcore, this may include some firmware
    areas marked as IORESOURCE_BUSY | IORESOURCE_MEM. This patch makes the
    check stricter so that it finds only busy "System RAM".

    Note: PPC64 keeps its own walk_memory_resource(), which walks through
    ppc64's lmb information. Because the old kclist_add() is called per lmb,
    this patch ultimately makes no difference in behavior there.

    This patch also removes the CONFIG_MEMORY_HOTPLUG check from this function.
    Because pfn_valid() only shows whether there is a memmap, and cannot be used
    to tell whether there is physical memory, this function is useful in
    generic code for scanning physical memory ranges.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: Américo Wang
    Cc: David Rientjes
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
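    The difference between the old and the stricter test can be shown on a toy resource record. The struct and the DEMO_IORESOURCE_* flag values are hypothetical stand-ins for the kernel's resource flags; only the shape of the check follows the description above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values for illustration only. */
#define DEMO_IORESOURCE_MEM  0x1u
#define DEMO_IORESOURCE_BUSY 0x2u

struct demo_resource {
    unsigned long long start, end;
    unsigned int flags;
    const char *name;
};

/* Old check: any busy memory resource matches, including firmware areas. */
static bool matches_loose(const struct demo_resource *r)
{
    unsigned int want = DEMO_IORESOURCE_MEM | DEMO_IORESOURCE_BUSY;

    return (r->flags & want) == want;
}

/* Stricter check: the resource must also be named "System RAM". */
static bool matches_strict(const struct demo_resource *r)
{
    return matches_loose(r) && r->name != NULL &&
           strcmp(r->name, "System RAM") == 0;
}
```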
     
  • For /proc/kcore, vmalloc areas are registered per arch, but all of them
    register the same range, [VMALLOC_START, VMALLOC_END). This patch unifies
    them. As a result, archs that have no kclist_add() hooks can see the
    vmalloc area correctly.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, kclist_add() takes only a start address and a size as its
    arguments. To make kclist dynamically reconfigurable, it's necessary to
    know which kclists are for System RAM and which are not.

    This patch adds the following kclist types:
    KCORE_RAM
    KCORE_VMALLOC
    KCORE_TEXT
    KCORE_OTHER

    This "type" is used by a follow-up patch to detect KCORE_RAM.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
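    With a type tag on each entry, RAM ranges can be located (and later rebuilt) without disturbing vmalloc or text entries. A toy user-space registry sketching that idea; the enum mirrors the types listed above, while demo_kclist_add, the fixed-size array, and demo_count_type are hypothetical and do not reproduce the kernel's kclist_add() signature.

```c
#include <stddef.h>

/* Mirrors the kclist types listed above. */
enum kcore_type {
    KCORE_RAM,
    KCORE_VMALLOC,
    KCORE_TEXT,
    KCORE_OTHER,
};

struct demo_kcore_list {
    unsigned long addr;
    size_t size;
    enum kcore_type type;
};

#define DEMO_MAX_KCLIST 16
static struct demo_kcore_list demo_kclist[DEMO_MAX_KCLIST];
static size_t demo_nr_kclist;

/* Toy registration: records address, size and, now, the type. */
static int demo_kclist_add(unsigned long addr, size_t size, enum kcore_type type)
{
    if (demo_nr_kclist == DEMO_MAX_KCLIST)
        return -1;
    demo_kclist[demo_nr_kclist++] = (struct demo_kcore_list){ addr, size, type };
    return 0;
}

/* Count the registered entries of one type, e.g. all KCORE_RAM ranges. */
static size_t demo_count_type(enum kcore_type type)
{
    size_t n = 0;

    for (size_t i = 0; i < demo_nr_kclist; i++)
        if (demo_kclist[i].type == type)
            n++;
    return n;
}
```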
     
  • The eSDHC block in MPC837x SoCs reports an inverted write-protect state.
    The sdhci-of driver will soon look for the sdhci,wp-inverted property to
    decide whether to apply a specific quirk.

    So, document the property and add it to the device tree source files.

    Signed-off-by: Anton Vorontsov
    Cc: Pierre Ossman
    Cc: Kumar Gala
    Cc: David Vrabel
    Cc: Ben Dooks
    Cc: Sascha Hauer
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Vorontsov
     
  • Make all seq_operations structs const, to help mitigate against
    revectoring user-triggerable function pointers.

    This is derived from the grsecurity patch, although generated from scratch
    because it's simpler than extracting the changes from there.

    Signed-off-by: James Morris
    Acked-by: Serge Hallyn
    Acked-by: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morris
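    The hardening idea can be shown on a miniature ops table. struct demo_seq_ops and demo_show are hypothetical; the point is only that a const-qualified table of function pointers is typically placed in read-only data, so its pointers cannot be overwritten at run time.

```c
#include <stdio.h>

/* Hypothetical miniature of an ops table made of function pointers. */
struct demo_seq_ops {
    int (*show)(char *buf, size_t len, int item);
};

static int demo_show(char *buf, size_t len, int item)
{
    return snprintf(buf, len, "item %d", item);
}

/* const: the compiler rejects writes to the table, and the linker can
 * place it in a read-only section. */
static const struct demo_seq_ops demo_ops = {
    .show = demo_show,
};
```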
     
  • NeilBrown
     

22 Sep, 2009

2 commits

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf_event, powerpc: Fix compilation after big perf_counter rename

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    trivial: fix typo in aic7xxx comment
    trivial: fix comment typo in drivers/ata/pata_hpt37x.c
    trivial: typo in kernel-parameters.txt
    trivial: fix typo in tracing documentation
    trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
    trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
    trivial: remove unnecessary semicolons
    trivial: Fix duplicated word "options" in comment
    trivial: kbuild: remove extraneous blank line after declaration of usage()
    trivial: improve help text for mm debug config options
    trivial: doc: hpfall: accept disk device to unload as argument
    trivial: doc: hpfall: reduce risk that hpfall can do harm
    trivial: SubmittingPatches: Fix reference to renumbered step
    trivial: fix typos "man[ae]g?ment" -> "management"
    trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
    trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
    trivial: fix missing printk space in amd_k7_smp_check
    trivial: fix typo s/ketymap/keymap/ in comment
    trivial: fix typo "to to" in multiple files
    trivial: fix typos in comments s/DGBU/DBGU/
    ...

    Linus Torvalds