16 Jun, 2009

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits)
    pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US
    ipv4: Fix fib_trie rebalancing
    Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver
    Bluetooth: Fix Kconfig issue with RFKILL integration
    PIM-SM: namespace changes
    ipv4: update ARPD help text
    net: use a deferred timer in rt_check_expire
    ieee802154: fix kconfig bool/tristate muckup
    bonding: initialization rework
    bonding: use is_zero_ether_addr
    bonding: network device names are case sensative
    bonding: elminate bad refcount code
    bonding: fix style issues
    bonding: fix destructor
    bonding: remove bonding read/write semaphore
    bonding: initialize before registration
    bonding: bond_create always called with default parameters
    x_tables: Convert printk to pr_err
    netfilter: conntrack: optional reliable conntrack event delivery
    list_nulls: add hlist_nulls_add_head and hlist_nulls_del
    ...

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits)
    powerpc: Fix bug in move of altivec code to vector.S
    powerpc: Add support for swiotlb on 32-bit
    powerpc/spufs: Remove unused error path
    powerpc: Fix warning when printing a resource_size_t
    powerpc/xmon: Remove unused variable in xmon.c
    powerpc/pseries: Fix warnings when printing resource_size_t
    powerpc: Shield code specific to 64-bit server processors
    powerpc: Separate PACA fields for server CPUs
    powerpc: Split exception handling out of head_64.S
    powerpc: Introduce CONFIG_PPC_BOOK3S
    powerpc: Move VMX and VSX asm code to vector.S
    powerpc: Set init_bootmem_done on NUMA platforms as well
    powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
    powerpc/mm: Fix some SMP issues with MMU context handling
    powerpc: Add PTRACE_SINGLEBLOCK support
    fbdev: Add PLB support and cleanup DCR in xilinxfb driver.
    powerpc/virtex: Add ml510 reference design device tree
    powerpc/virtex: Add Xilinx ML510 reference design support
    powerpc/virtex: refactor intc driver and add support for i8259 cascading
    powerpc/virtex: Add support for Xilinx PCI host bridge
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits)
    nilfs2: support contiguous lookup of blocks
    nilfs2: add sync_page method to page caches of meta data
    nilfs2: use device's backing_dev_info for btree node caches
    nilfs2: return EBUSY against delete request on snapshot
    nilfs2: modify list of unsupported features in caveats
    nilfs2: enable sync_page method
    nilfs2: set bio unplug flag for the last bio in segment
    nilfs2: allow future expansion of metadata read out via get info ioctl
    NILFS2: Pagecache usage optimization on NILFS2
    nilfs2: remove nilfs_btree_operations from btree mapping
    nilfs2: remove nilfs_direct_operations from direct mapping
    nilfs2: remove bmap pointer operations
    nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
    nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
    nilfs2: move get block functions in bmap.c into btree codes
    nilfs2: remove nilfs_bmap_delete_block
    nilfs2: remove nilfs_bmap_put_block
    nilfs2: remove header file for segment list operations
    nilfs2: eliminate removal list of segments
    nilfs2: add sufile function that can modify multiple segment usages
    ...

    Linus Torvalds
     

15 Jun, 2009

5 commits

  • Conflicts:
    Documentation/feature-removal-schedule.txt
    drivers/scsi/fcoe/fcoe.c
    net/core/drop_monitor.c
    net/core/net-traces.c

    David S. Miller
     
  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits)
    .gitignore: ignore *.lzma files
    kbuild: add generic --set-str option to scripts/config
    kbuild: simplify argument loop in scripts/config
    kbuild: handle non-existing options in scripts/config
    kallsyms: generalize text region handling
    kallsyms: support kernel symbols in Blackfin on-chip memory
    documentation: make version fix
    kbuild: fix a compile warning
    gitignore: Add GNU GLOBAL files to top .gitignore
    kbuild: fix delay in setlocalversion on readonly source
    README: fix misleading pointer to the defconf directory
    vmlinux.lds.h update
    kernel-doc: cleanup perl script
    Improve vmlinux.lds.h support for arch specific linker scripts
    kbuild: fix headers_exports with boolean expression
    kbuild/headers_check: refine extern check
    kbuild: fix "Argument list too long" error for "make headers_check",
    ignore *.patch files
    Remove bashisms from scripts
    menu: fix embedded menu presentation
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits)
    trivial: remove the trivial patch monkey's name from SubmittingPatches
    trivial: Fix a typo in comment of addrconf_dad_start()
    trivial: usb: fix missing space typo in doc
    trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug
    trivial: Remove the hyphen from git commands
    trivial: fix ETIMEOUT -> ETIMEDOUT typos
    trivial: Kconfig: .ko is normally not included in module names
    trivial: SubmittingPatches: fix typo
    trivial: Documentation/dell_rbu.txt: fix typos
    trivial: Fix Pavel's address in MAINTAINERS
    trivial: ftrace:fix description of trace directory
    trivial: unnecessary (void*) cast removal in sound/oss/msnd.c
    trivial: input/misc: Fix typo in Kconfig
    trivial: fix grammo in bus_for_each_dev() kerneldoc
    trivial: rbtree.txt: fix rb_entry() parameters in sample code
    trivial: spelling fix in ppc code comments
    trivial: fix typo in bio_alloc kernel doc
    trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt
    trivial: Miscellaneous documentation typo fixes
    trivial: fix typo milisecond/millisecond for documentation and source comments.
    ...

    Linus Torvalds
     
  • * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (417 commits)
    MAINTAINERS: EB110ATX is not ebsa110
    MAINTAINERS: update Eric Miao's email address and status
    fb: add support of LCD display controller on pxa168/910 (base layer)
    [ARM] 5552/1: ep93xx get_uart_rate(): use EP93XX_SYSCON_PWRCNT and EP93XX_SYSCON_PWRCN
    [ARM] pxa/sharpsl_pm: zaurus needs generic pxa suspend/resume routines
    [ARM] 5544/1: Trust PrimeCell resource sizes
    [ARM] pxa/sharpsl_pm: cleanup of gpio-related code.
    [ARM] pxa/sharpsl_pm: drop set_irq_type calls
    [ARM] pxa/sharpsl_pm: merge pxa-specific code into generic one
    [ARM] pxa/sharpsl_pm: merge the two sharpsl_pm.c since it's now pxa specific
    [ARM] sa1100: remove unused collie_pm.c
    [ARM] pxa: fix the conflicting non-static declarations of global_gpios[]
    [ARM] 5550/1: Add default configure file for w90p910 platform
    [ARM] 5549/1: Add clock api for w90p910 platform.
    [ARM] 5548/1: Add gpio api for w90p910 platform
    [ARM] 5551/1: Add multi-function pin api for w90p910 platform.
    [ARM] Make ARM_VIC_NR depend on ARM_VIC
    [ARM] 5546/1: ARM PL022 SSP/SPI driver v3
    ARM: OMAP4: SMP: Update defconfig for OMAP4430
    ARM: OMAP4: SMP: Enable SMP support for OMAP4430
    ...

    Linus Torvalds
     
  • The Makefiles in the build directories use the internal make variable
    MAKEFILE_LIST which is available from make 3.80 only. (The patch would be
    valid back to 2.6.25)

    Signed-off-by: Adam Lackorzynski
    Signed-off-by: Andrew Morton
    Signed-off-by: Sam Ravnborg

    Adam Lackorzynski
     

14 Jun, 2009

3 commits

  • * 'next-i2c' of git://aeryn.fluff.org.uk/bjdooks/linux:
    i2c-ocores: Can add I2C devices to the bus
    i2c-s3c2410: move to using platform idtable to match devices
    i2c: OMAP3: Better noise suppression for fast/standard modes
    i2c: OMAP2/3: Fix scll/sclh calculations
    i2c: Blackfin TWI: implement I2C_FUNC_SMBUS_I2C_BLOCK functionality
    i2c: Blackfin TWI: fix transfer errors with repeat start
    i2c: Blackfin TWI: fix REPEAT START mode doesn't repeat
    i2c: Blackfin TWI: make sure we don't end up with a CLKDIV=0

    Linus Torvalds
     
  • * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits)
    x86, mce: Add boot options for corrected errors
    x86, mce: Fix mce printing
    x86, mce: fix for mce counters
    x86, mce: support action-optional machine checks
    x86, mce: define MCE_VECTOR
    x86, mce: rename mce_notify_user to mce_notify_irq
    x86: fix panic with interrupts off (needed for MCE)
    x86, mce: export MCE severities coverage via debugfs
    x86, mce: implement new status bits
    x86, mce: print header/footer only once for multiple MCEs
    x86, mce: default to panic timeout for machine checks
    x86, mce: improve mce_get_rip
    x86, mce: make non Monarch panic message "Fatal machine check" too
    x86, mce: switch x86 machine check handler to Monarch election.
    x86, mce: implement panic synchronization
    x86, mce: implement bootstrapping for machine check wakeups
    x86, mce: check early in exception handler if panic is needed
    x86, mce: add table driven machine check grading
    x86, mce: remove TSC print heuristic
    x86, mce: log corrected errors when panicing
    ...

    Linus Torvalds
     
  • * 'docs-next' of git://git.lwn.net/linux-2.6:
    Document the debugfs API
    Documentation: Add "how to write a good patch summary" to SubmittingPatches
    SubmittingPatches: fix typo
    docs: Encourage better changelogs in the development process document
    Document Reported-by in SubmittingPatches

    Linus Torvalds
     

13 Jun, 2009

17 commits


12 Jun, 2009

12 commits

  • Support the VIRTIO_RING_F_INDIRECT_DESC feature.

    This is a simple matter of changing the descriptor walking
    code to operate on a struct vring_desc* and supplying it
    with an indirect table if detected.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Rusty Russell

    Mark McLoughlin
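
    The descriptor walk described above can be sketched roughly as follows.
    This is an illustrative Python model, not the Launcher's C code: the Desc
    class, walk_chain() and the indirect_tables mapping (a stand-in for Guest
    memory) are hypothetical names. Only the three VRING_DESC_F_* flag values
    come from the virtio ring ABI.

```python
from dataclasses import dataclass

VRING_DESC_F_NEXT = 1      # chain continues at the descriptor named by `next`
VRING_DESC_F_WRITE = 2     # device writes into this buffer (otherwise reads)
VRING_DESC_F_INDIRECT = 4  # buffer holds a table of descriptors


@dataclass
class Desc:
    addr: int
    len: int
    flags: int = 0
    next: int = 0


def walk_chain(ring, head, indirect_tables):
    """Return (addr, len, writable) for each buffer in the chain at `head`.

    `indirect_tables` is a hypothetical stand-in for Guest memory: it maps
    the addr of an indirect descriptor to the list of Desc stored there.
    """
    table, i = ring, head
    # If the head is indirect, keep the same loop but walk the other table.
    if table[i].flags & VRING_DESC_F_INDIRECT:
        table, i = indirect_tables[table[i].addr], 0

    buffers = []
    while True:
        if len(buffers) >= len(table):
            raise ValueError("descriptor chain loops")
        d = table[i]
        buffers.append((d.addr, d.len, bool(d.flags & VRING_DESC_F_WRITE)))
        if not d.flags & VRING_DESC_F_NEXT:
            return buffers
        i = d.next
```

    The point of the sketch is that the loop body is unchanged: only the
    table it indexes is swapped when an indirect descriptor is detected.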
     
  • The Guest only really needs to tell us about activity when we're going
    to listen to the eventfd: normally, we don't want to know.

    So if there are no available buffers, turn on notifications, re-check,
    then wait for the Guest to notify us via the eventfd, then turn
    notifications off again.

    There's enough else going on that the differences are in the noise.

    Before:                Secs  RxKicks  TxKicks
    1G TCP Guest->Host:    3.94     4686    32815
    1M normal pings:        104   142862  1000010
    1M 1k pings (-l 120):    57   142026  1000007

    After:
    1G TCP Guest->Host:    3.76     4691    32811
    1M normal pings:        111   142859   997467
    1M 1k pings (-l 120):    55    19648   501549

    Signed-off-by: Rusty Russell

    Rusty Russell
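
    The enable, re-check, wait, disable sequence from this message can be
    modelled as below. It is a single-threaded Python sketch under stated
    assumptions: the Queue class, wait_for_buffer() and the wait_on_eventfd
    callback are hypothetical, and the real code would also need a memory
    barrier between clearing the flag and re-checking the ring.
    VRING_USED_F_NO_NOTIFY is the flag value from the virtio ring ABI.

```python
VRING_USED_F_NO_NOTIFY = 1  # Host tells the Guest: do not kick me


class Queue:
    def __init__(self):
        self.used_flags = VRING_USED_F_NO_NOTIFY  # notifications off by default
        self.avail = []                           # buffers offered by the Guest


def wait_for_buffer(vq, wait_on_eventfd):
    """Return the next available buffer, sleeping only if the ring is empty."""
    while not vq.avail:
        # Turn notifications on before we consider sleeping.
        vq.used_flags &= ~VRING_USED_F_NO_NOTIFY
        # Re-check: the Guest may have added a buffer before seeing the flag.
        # (A real implementation needs a memory barrier here.)
        if not vq.avail:
            wait_on_eventfd()  # hypothetical: blocks until the Guest kicks
        # We are awake and will poll the ring ourselves, so suppress again.
        vq.used_flags |= VRING_USED_F_NO_NOTIFY
    return vq.avail.pop(0)
```

    Without the re-check, a buffer added between the emptiness test and the
    flag change would never be announced and the thread would sleep forever.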
     
  • Rather than triggering an interrupt every time, we only trigger an
    interrupt when there are no more incoming packets (or the recv queue
    is full).

    However, the overhead of doing the select to figure this out is
    measurable: 1M pings goes from 98 to 104 seconds, and 1G Guest->Host
    TCP goes from 3.69 to 3.94 seconds. It's close to the noise though.

    I tested various timeouts, including reducing it as the number of
    pending packets increased, timing a 1 gigabyte TCP send from Guest ->
    Host and Host -> Guest (GSO disabled, to increase packet rate).

    // time tcpblast -o -s 65536 -c 16k 192.168.2.1:9999 > /dev/null

    Timeout     Guest->Host  Pkts/irq  Host->Guest  Pkts/irq
    Before            11.3s       1.0         6.3s       1.0
    0                 11.7s       1.0         6.6s      23.5
    1                 17.1s       8.8         8.6s      26.0
    1/pending         13.4s       1.9         6.6s      23.8
    2/pending         13.6s       2.8         6.6s      24.1
    5/pending         14.1s       5.0         6.6s      24.4

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • If we track how many buffers we've used, we can tell whether we really
    need to interrupt the Guest. This happens as a side effect of
    spurious notifications.

    Spurious notifications happen because it can take a while before the
    Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and
    meanwhile the Guest can send more notifications.

    A real fix would be to use wake counts, rather than a suppression
    flag, but the practical difference is generally in the noise: the
    interrupt is usually coalesced into a pending one anyway so we just
    save a system call which isn't clearly measurable.

                           Secs  Spurious IRQS
    1G TCP Guest->Host:    3.93             58
    1M normal pings:        100             72
    1M 1k pings (-l 120):    57         492904

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Rather than sending an interrupt on every buffer, we only send an interrupt
    when we're about to wait for the Guest to send us a new one. The console
    input and network input still send interrupts manually, but the block device,
    network and console output queues can simply rely on this logic to send
    interrupts to the Guest at the right time.

    The patch is cluttered by moving trigger_irq() higher in the code.

    In practice, two factors make this optimization less interesting:
    (1) we often only get one input at a time, even for networking,
    (2) triggering an interrupt rapidly tends to get coalesced anyway.

    Before:                     Secs   RxIRQS   TxIRQs
    1G TCP Guest->Host:         3.72    32784    32771
    1M normal pings:              99  1000004   995541
    100,000 1k pings (-l 120):     5    49510    49058

    After:
    1G TCP Guest->Host:         3.69    32809    32769
    1M normal pings:              99  1000004   996196
    100,000 1k pings (-l 120):     5    52435    52361

    (Note the interrupt count on 100k pings goes *up*: see next patch).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Currently lguest has three threads: the main Launcher thread, a Waker
    thread, and a thread for the block device (because synchronous block
    was simply too painful to bear).

    The Waker selects() on all the input file descriptors (eg. stdin, net
    devices, pipe to the block thread) and when one becomes readable it calls
    into the kernel to kick the Launcher thread out into userspace, which
    repeats the poll, services the device(s), and then tells the kernel to
    release the Waker before re-entering the kernel to run the Guest.

    Also, to make a slightly-decent network transmit routine, the Launcher
    would suppress further network interrupts while it set a timer: that
    signal handler would write to a pipe, which would rouse the Waker
    which would prod the Launcher out of the kernel to check the network
    device again.

    Now we can convert all our virtqueues to separate threads: each one has
    a separate eventfd for when the Guest pokes the device, and can trigger
    interrupts in the Guest directly.

    The linecount shows how much this simplifies, but to really bring it
    home, here's an strace analysis of single Guest->Host ping before:

    * Guest sends packet, notifies xmit vq, return control to Launcher
    * Launcher clears notification flag on xmit ring
    * Launcher writes packet to TUN device
    writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
    * Launcher sets up interrupt for Guest (xmit ring is empty)
    write(10, "\2\0\0\0\3\0\0\0", 8) = 0
    * Launcher sets up timer for interrupt mitigation
    setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
    * Launcher re-runs guest
    pread64(10, 0xbfa5f4d4, 4, 0) ...
    * Waker notices reply packet in tun device (it was in select)
    select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
    * Waker kicks Launcher out of guest:
    pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
    * Launcher returns from running guest:
    ... = -1 EAGAIN (Resource temporarily unavailable)
    * Launcher looks at input fds:
    select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
    * Launcher reads pong from tun device:
    readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
    * Launcher injects guest notification:
    write(10, "\2\0\0\0\2\0\0\0", 8) = 0
    * Launcher rechecks fds:
    select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
    * Launcher clears Waker:
    pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
    * Launcher reruns Guest:
    pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
    * Signal comes in, uses pipe to wake up Launcher:
    --- SIGALRM (Alarm clock) @ 0 (0) ---
    write(8, "\0", 1) = 1
    sigreturn() = ? (mask now [])
    * Waker sees write on pipe:
    select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
    * Waker kicks Launcher out of Guest:
    pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
    * Launcher exits from kernel:
    pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
    * Launcher looks to see what fd woke it:
    select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
    * Launcher reads timeout fd, sets notification flag on xmit ring
    read(6, "\0", 32) = 1
    * Launcher rechecks fds:
    select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
    * Launcher clears Waker:
    pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
    * Launcher resumes Guest:
    pread64(10, "\0p\0\4", 4, 0) ....

    strace analysis of single Guest->Host ping after:

    * Guest sends packet, notifies xmit vq, creates event on eventfd.
    * Network xmit thread wakes from read on eventfd:
    read(7, "\1\0\0\0\0\0\0\0", 8) = 8
    * Network xmit thread writes packet to TUN device
    writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
    * Network recv thread wakes up from read on tunfd:
    readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
    * Network recv thread sets up interrupt for the Guest
    write(6, "\2\0\0\0\2\0\0\0", 8) = 0
    * Network recv thread goes back to reading tunfd
    13:39:42.460285 readv(4,
    * Network xmit thread sets up interrupt for Guest (xmit ring is empty)
    write(6, "\2\0\0\0\3\0\0\0", 8) = 0
    * Network xmit thread goes back to reading from eventfd
    read(7,

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • This version requires that host and guest have the same PAE status.
    NX cap is not offered to the guest, yet.

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • I've never seen it here, but I can't find anywhere that says writev
    will write everything.

    Signed-off-by: Rusty Russell

    Rusty Russell
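
    A defensive loop around writev, of the kind this message argues for, might
    look like the following. It is a Python sketch rather than the Launcher's
    C code: writev_all() and its injectable writev parameter are hypothetical,
    while os.writev is the standard library wrapper for the system call.

```python
import os


def writev_all(fd, buffers, writev=os.writev):
    """Write every byte of `buffers` to `fd`, retrying after short writes."""
    pending = [memoryview(b) for b in buffers if len(b)]
    total = 0
    while pending:
        n = writev(fd, pending)
        if n == 0:
            raise OSError("writev made no progress")
        total += n
        # Drop buffers that were written completely...
        while pending and n >= len(pending[0]):
            n -= len(pending.pop(0))
        # ...and trim the one that was only partly written.
        if n:
            pending[0] = pending[0][n:]
    return total
```

    The writev parameter exists only so a short write can be simulated in a
    test; callers would normally leave it at its default.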
     
  • The "len" field in the used ring for virtio indicates the number of
    bytes *written* to the buffer. This means the guest doesn't have to
    zero the buffers in advance as it always knows the used length.

    Erroneously, the console and network example code puts the length
    *read* into that field. The guest ignores it, but it's wrong.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • 18 months ago 5bbf89fc260830f3f58b331d946a16b39ad1ca2d changed to loading
    bzImages directly, and no longer manually ungzipping them, so we no longer
    need libz.

    Also, -m32 is useful for those on 64-bit platforms (and harmless on
    32-bit).

    Reported-by: Ron Minnich
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • 20887611523e749d99cc7d64ff6c97d27529fbae (lguest: notify on empty) introduced
    lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on
    interrupts all the time.

    Because we always process one buffer at a time, the inflight count is always 0
    when we call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from
    the Guest.

    It should be looking to see if there are more buffers in the Guest's queue:
    if it's empty, then we force an interrupt.

    This makes little difference, since we usually have an empty queue; but
    that's the subject of another patch.

    Signed-off-by: Rusty Russell

    Rusty Russell
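
    The corrected decision can be written as a small predicate. This is a
    Python sketch with a hypothetical function name and arguments, not the
    patch itself: it assumes the Host tracks the last avail index it consumed
    and compares it with the Guest's current avail index to detect an empty
    queue. The flag value and feature bit number are the virtio ABI ones.

```python
VRING_AVAIL_F_NO_INTERRUPT = 1   # Guest asks the Host not to interrupt it
VIRTIO_F_NOTIFY_ON_EMPTY = 24    # feature bit number


def should_interrupt(avail_flags, features, last_avail_idx, avail_idx):
    """Decide whether the Host should interrupt the Guest after using a buffer."""
    # The queue is empty once we have consumed everything the Guest offered.
    queue_empty = last_avail_idx == avail_idx
    # Notify-on-empty overrides the Guest's suppression flag.
    if features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY) and queue_empty:
        return True
    return not avail_flags & VRING_AVAIL_F_NO_INTERRUPT
```

    Testing an in-flight buffer count instead of queue emptiness is what made
    the earlier code interrupt every time, since that count was always 0.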
     
  • Since the Launcher process runs the Guest, it doesn't have to be very
    serious about its barriers: the Guest isn't running while we are (Guest
    is UP).

    Before we change to use threads to service devices, we need to fix this.

    Signed-off-by: Rusty Russell

    Rusty Russell