13 Oct, 2009

2 commits

  • Meaning receive multiple messages, reducing the number of syscalls and
    net stack entry/exit operations.

    Next patches will introduce mechanisms where protocols that want to
    optimize this operation will provide an unlocked_recvmsg operation.

    This takes into account comments made by:

    . Paul Moore: sock_recvmsg is called only for the first datagram,
    sock_recvmsg_nosec is used for the rest.

    . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
    works in the same fashion as the ppoll one.

    If the underlying protocol returns a datagram with MSG_OOB set, this
    will make recvmmsg return right away with as many datagrams (+ the OOB
    one) it has received so far.

    . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
    datagrams and then recvmsg returns an error, recvmmsg will return
    the successfully received datagrams, store the error and return it
    in the next call.

    This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
    where we will be able to acquire the lock only at batch start and end, not at
    every underlying recvmsg call.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Create a new socket level option to report number of queue overflows

    Recently I augmented the AF_PACKET protocol to report the number of frames lost
    on the socket receive queue between any two enqueued frames. This value was
    exported via a SOL_PACKET level cmsg. AFter I completed that work it was
    requested that this feature be generalized so that any datagram oriented socket
    could make use of this option. As such I've created this patch, It creates a
    new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
    SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
    overflowed between any two given frames. It also augments the AF_PACKET
    protocol to take advantage of this new feature (as it previously did not touch
    sk->sk_drops, which this patch uses to record the overflow count). Tested
    successfully by me.

    Notes:

    1) Unlike my previous patch, this patch simply records the sk_drops value, which
    is not a number of drops between packets, but rather a total number of drops.
    Deltas must be computed in user space.

    2) While this patch currently works with datagram oriented protocols, it will
    also be accepted by non-datagram oriented protocols. I'm not sure if thats
    agreeable to everyone, but my argument in favor of doing so is that, for those
    protocols which aren't applicable to this option, sk_drops will always be zero,
    and reporting no drops on a receive queue that isn't used for those
    non-participating protocols seems reasonable to me. This also saves us having
    to code in a per-protocol opt in mechanism.

    3) This applies cleanly to net-next assuming that commit
    977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
     

01 Oct, 2009

18 commits


26 Sep, 2009

1 commit

  • Commit 51b563fc93c8cb5bff1d67a0a71c374e4a4ea049 ("arm, cris, mips,
    sparc, powerpc, um, xtensa: fix build with bash 4.0") removed a few
    CPPFLAGS with vital include paths necessary to build vmlinux.lds
    on MIPS, and moved the calculation of the 'jiffies' symbol
    directly to vmlinux.lds.S but forgot to change make ifdef/... to
    cpp macros.

    Signed-off-by: Manuel Lauss
    [sam: moved assignment of CPPFLAGS arch/mips/kernel/Makefile]
    Signed-off-by: Sam Ravnborg
    Acked-by: Dmitri Vorobiev

    Manuel Lauss
     

24 Sep, 2009

10 commits

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
    cpumask: Move deprecated functions to end of header.
    cpumask: remove unused deprecated functions, avoid accusations of insanity
    cpumask: use new-style cpumask ops in mm/quicklist.
    cpumask: use mm_cpumask() wrapper: x86
    cpumask: use mm_cpumask() wrapper: um
    cpumask: use mm_cpumask() wrapper: mips
    cpumask: use mm_cpumask() wrapper: mn10300
    cpumask: use mm_cpumask() wrapper: m32r
    cpumask: use mm_cpumask() wrapper: arm
    cpumask: Use accessors for cpu_*_mask: um
    cpumask: Use accessors for cpu_*_mask: powerpc
    cpumask: Use accessors for cpu_*_mask: mips
    cpumask: Use accessors for cpu_*_mask: m32r
    cpumask: remove arch_send_call_function_ipi
    cpumask: arch_send_call_function_ipi_mask: s390
    cpumask: arch_send_call_function_ipi_mask: powerpc
    cpumask: arch_send_call_function_ipi_mask: mips
    cpumask: arch_send_call_function_ipi_mask: m32r
    cpumask: arch_send_call_function_ipi_mask: alpha
    cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
    ...

    Linus Torvalds
     
  • Makes code futureproof against the impending change to mm->cpu_vm_mask.

    It's also a chance to use the new cpumask_ ops which take a pointer
    (the older ones are deprecated, but there's no hurry for arch code).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Use the accessors rather than frobbing bits directly (the new versions
    are const).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     
  • Now everyone is converted to arch_send_call_function_ipi_mask, remove
    the shim and the #defines.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We're weaning the core code off handing cpumask's around on-stack.
    This introduces arch_send_call_function_ipi_mask(), and by defining
    it, the old arch_send_call_function_ipi is defined by the core code.

    We also take the chance to wean the implementations off the
    obsolescent for_each_cpu_mask(): making send_ipi_mask take the pointer
    seemed the most natural way to ensure all implementations used
    for_each_cpu.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • (Thanks to Al Viro for reminding me of this, via Ingo)

    CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

    #define CPU_MASK_ALL (cpumask_t) { { ... } }

    Taking the address of such a temporary is questionable at best,
    unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
    CPU_MASK_ALL_PTR:

    #define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

    Which formalizes this practice. One day gcc could bite us over this
    usage (though we seem to have gotten away with it so far).

    So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
    with the modern "cpu_all_mask" (a real struct cpumask *), and remove
    CPU_MASK_ALL_PTR altogether.

    Signed-off-by: Rusty Russell
    Acked-by: Ingo Molnar
    Reported-by: Al Viro
    Cc: Mike Travis

    Rusty Russell
     
  • Signed-off-by: Rusty Russell

    Rusty Russell
     
  • cpumask_of_pcibus() is the new version.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
    Use macros for .data.page_aligned section.
    Use macros for .bss.page_aligned section.
    Use new __init_task_data macro in arch init_task.c files.
    kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
    arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
    kbuild: add static to prototypes
    kbuild: fail build if recordmcount.pl fails
    kbuild: set -fconserve-stack option for gcc 4.5
    kbuild: echo the record_mcount command
    gconfig: disable "typeahead find" search in treeviews
    kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
    checkincludes.pl: add option to remove duplicates in place
    markup_oops: use modinfo to avoid confusion with underscored module names
    checkincludes.pl: provide usage helper
    checkincludes.pl: close file as soon as we're done with it
    ctags: usability fix
    kernel hacking: move STRIP_ASM_SYMS from General
    gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
    kbuild: Check if linker supports the -X option
    kbuild: introduce ld-option
    ...

    Fix trivial conflict in scripts/basic/fixdep.c

    Linus Torvalds
     

23 Sep, 2009

3 commits

  • For /proc/kcore, each arch registers its memory range by kclist_add().
    In usual,

    - range of physical memory
    - range of vmalloc area
    - text, etc...

    are registered but "range of physical memory" has some troubles. It
    doesn't updated at memory hotplug and it tend to include unnecessary
    memory holes. Now, /proc/iomem (kernel/resource.c) includes required
    physical memory range information and it's properly updated at memory
    hotplug. Then, it's good to avoid using its own code(duplicating
    information) and to rebuild kclist for physical memory based on
    /proc/iomem.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Jiri Slaby
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • For /proc/kcore, vmalloc areas are registered per arch. But, all of them
    registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies
    them. By this. archs which have no kclist_add() hooks can see vmalloc
    area correctly.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, kclist_add() only eats start address and size as its arguments.
    Considering to make kclist dynamically reconfigulable, it's necessary to
    know which kclists are for System RAM and which are not.

    This patch add kclist types as
    KCORE_RAM
    KCORE_VMALLOC
    KCORE_TEXT
    KCORE_OTHER

    This "type" is used in a patch following this for detecting KCORE_RAM.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

4 commits

  • Add a flag for mmap that will be used to request a huge page region that
    will look like anonymous memory to user space. This is accomplished by
    using a file on the internal vfsmount. MAP_HUGETLB is a modifier of
    MAP_ANONYMOUS and so must be specified with it. The region will behave
    the same as a MAP_ANONYMOUS region using small pages.

    The patch also adds the MAP_STACK flag, which was previously defined only
    on some architectures but not on others. Since MAP_STACK is meant to be a
    hint only, architectures can define it without assigning a specific
    meaning to it.

    Signed-off-by: Arnd Bergmann
    Cc: Eric B Munson
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
    those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.

    Contrary to how I'd imagined it, there's nothing ugly about this, just a
    zero_pfn test built into one or another block of vm_normal_page().

    But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
    my_zero_pfn() inlines. Reinstate its mremap move_pte() shuffling of
    ZERO_PAGEs we did from 2.6.17 to 2.6.19? Not unless someone shouts for
    that: it would have to take vm_flags to weed out some cases.

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Commit 96177299416dbccb73b54e6b344260154a445375 ("Drop free_pages()")
    modified nr_free_pages() to return 'unsigned long' instead of 'unsigned
    int'. This made the casts to 'unsigned long' in most callers superfluous,
    so remove them.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Christoph Lameter
    Acked-by: Ingo Molnar
    Acked-by: Russell King
    Acked-by: David S. Miller
    Acked-by: Kyle McMartin
    Acked-by: WANG Cong
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Haavard Skinnemoen
    Cc: Mikael Starvik
    Cc: "Luck, Tony"
    Cc: Hirokazu Takata
    Cc: Ralf Baechle
    Cc: David Howells
    Acked-by: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: Chris Zankel
    Cc: Michal Simek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • The out-of-tree KSM used ioctls on fds cloned from /dev/ksm to register a
    memory area for merging: we prefer now to use an madvise(2) interface.

    This patch just defines MADV_MERGEABLE (to tell KSM it may merge pages in
    this area found identical to pages in other mergeable areas) and
    MADV_UNMERGEABLE (to undo that).

    Most architectures use asm-generic, but alpha, mips, parisc, xtensa need
    their own definitions: included here for mmotm convenience, but we'll
    probably want to split this and feed pieces to arch maintainers.

    Based upon earlier patches by Chris Wright and Izik Eidus.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Chris Wright
    Signed-off-by: Izik Eidus
    Cc: Michael Kerrisk
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Helge Deller
    Cc: Chris Zankel
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Wu Fengguang
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Avi Kivity
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

21 Sep, 2009

2 commits

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in one, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks goes to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Signed-off-by: Joe Perches
    Signed-off-by: Tim Abbott
    Acked-by: Paul Mundt
    Signed-off-by: Sam Ravnborg

    Joe Perches