13 Nov, 2009

2 commits


12 Nov, 2009

1 commit

  • Originally task_s/utime() were designed to return clock_t but
    later changed to return cputime_t by following commit:

    commit efe567fc8281661524ffa75477a7c4ca9b466c63
    Author: Christian Borntraeger
    Date: Thu Aug 23 15:18:02 2007 +0200

    It only changed the type of the return value, not the
    implementation. As a result, the granularity of task_s/utime()
    is still that of clock_t, not that of cputime_t.

    So using task_s/utime() in __exit_signal() causes the values
    accumulated in the signal struct to be rounded and
    coarse-grained.

    This patch removes the casts to clock_t in task_u/stime(), to
    preserve cputime_t granularity throughout the calculation.

    v2:
    Use div_u64() to avoid error "undefined reference to `__udivdi3`"
    on some 32bit systems.

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: xiyou.wangcong@gmail.com
    Cc: Spencer Candland
    Cc: Oleg Nesterov
    Cc: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
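
The rounding effect described in this commit can be illustrated with a small userspace sketch. It is not kernel code: the unit ratio `FINE_PER_TICK` and the helpers `to_coarse_ticks()`, `accumulate_coarse()`, and `accumulate_fine()` are hypothetical names chosen for illustration. It only shows how truncating each task's time to a coarser tick before summing loses time that a fine-grained sum keeps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ratio: 10 fine-grained units make up one coarse tick. */
#define FINE_PER_TICK 10u

/* Truncate a fine-grained value to whole coarse ticks. */
static uint64_t to_coarse_ticks(uint64_t fine)
{
    return fine / FINE_PER_TICK;
}

/* Sum per-task times after truncating each one; result is in fine units. */
static uint64_t accumulate_coarse(const uint64_t *times, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += to_coarse_ticks(times[i]) * FINE_PER_TICK;
    return sum;
}

/* Sum per-task times without any intermediate truncation. */
static uint64_t accumulate_fine(const uint64_t *times, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += times[i];
    return sum;
}
```

With many short-lived tasks, each contributing less than one coarse tick, the truncated sum can stay far below the real total, which is the kind of coarseness the commit message describes.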
     

11 Nov, 2009

1 commit

  • From the code in rt_mutex_setprio(), it is evident that the
    intention is that tasks with an RT 'prio' value as a consequence
    of receiving a PI boost also have their 'sched_class' field set
    to '&rt_sched_class'.

    However, Peter noticed that the code in __setscheduler() could
    result in this intention being frustrated. Fix it.

    Reported-by: Peter Williams
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
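
The invariant described in this commit (a task whose effective priority is in the RT range should also be in the RT class) can be modelled in userspace as follows. This is a sketch under stated assumptions, not the actual patch: `struct task_model`, `effective_prio()`, and `set_class_from_prio()` are hypothetical names, and `MAX_RT_PRIO` is assumed to be 100 with lower numbers meaning higher priority.

```c
/* Assumed boundary: priorities below this value are real-time. */
#define MAX_RT_PRIO 100

enum sched_class_id { CLASS_FAIR, CLASS_RT };

/* Hypothetical, simplified stand-in for the scheduler's task state. */
struct task_model {
    int normal_prio;               /* priority from the task's own policy */
    int boosted_prio;              /* priority lent by PI, or -1 if none */
    int prio;                      /* effective priority */
    enum sched_class_id sched_class;
};

/* A PI boost only applies when it raises priority (lower number). */
static int effective_prio(const struct task_model *p)
{
    if (p->boosted_prio >= 0 && p->boosted_prio < p->normal_prio)
        return p->boosted_prio;
    return p->normal_prio;
}

/* Derive the class from the effective priority, not from the policy. */
static void set_class_from_prio(struct task_model *p)
{
    p->prio = effective_prio(p);
    p->sched_class = (p->prio < MAX_RT_PRIO) ? CLASS_RT : CLASS_FAIR;
}
```

Deriving the class from the effective priority means a boosted task cannot end up with an RT 'prio' and a non-RT class at the same time.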
     

10 Nov, 2009

1 commit

  • Commit 1b9508f, "Rate-limit newidle" has been confirmed to fix
    the netperf UDP loopback regression reported by Alex Shi.

    This is a cleanup and a fix:

    - moved to a more out of the way spot

    - fix to ensure that balancing doesn't try to balance
    runqueues which haven't gone online yet, which can
    mess up CPU enumeration during boot.

    Reported-by: Alex Shi
    Reported-by: Zhang, Yanmin
    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    Cc: # .32.x: a1f84a3: sched: Check for an idle shared cache
    Cc: # .32.x: 1b9508f: sched: Rate-limit newidle
    Cc: # .32.x: fd21073: sched: Fix affinity logic
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

08 Nov, 2009

1 commit


05 Nov, 2009

3 commits

  • Ingo Molnar reported:

    [ 26.804000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
    [ 26.808000] caller is vmstat_update+0x26/0x70
    [ 26.812000] Pid: 10, comm: events/1 Not tainted 2.6.32-rc5 #6887
    [ 26.816000] Call Trace:
    [ 26.820000] [] ? printk+0x28/0x3c
    [ 26.824000] [] debug_smp_processor_id+0xf0/0x110
    [ 26.824000] mount used greatest stack depth: 1464 bytes left
    [ 26.828000] [] vmstat_update+0x26/0x70
    [ 26.832000] [] worker_thread+0x188/0x310
    [ 26.836000] [] ? worker_thread+0x127/0x310
    [ 26.840000] [] ? autoremove_wake_function+0x0/0x60
    [ 26.844000] [] ? worker_thread+0x0/0x310
    [ 26.848000] [] kthread+0x7c/0x90
    [ 26.852000] [] ? kthread+0x0/0x90
    [ 26.856000] [] kernel_thread_helper+0x7/0x10
    [ 26.860000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
    [ 26.864000] caller is vmstat_update+0x3c/0x70

    Because this commit:

    a1f84a3: sched: Check for an idle shared cache in select_task_rq_fair()

    broke ->cpus_allowed.

    Signed-off-by: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: arjan@infradead.org
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Rate limit newidle to migration_cost. It's a win for all
    stages of sysbench oltp tests.

    Signed-off-by: Mike Galbraith
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • When waking affine, check for an idle shared cache, and if
    found, wake to that CPU/sibling instead of the waker's CPU.

    This improves pgsql+oltp ramp up by roughly 8%. Possibly more
    for other loads, depending on overlap. The trade-off is a
    roughly 1% peak downturn if tasks are truly synchronous.

    Signed-off-by: Mike Galbraith
    Cc: Arjan van de Ven
    Cc: Peter Zijlstra
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
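
The "Rate limit newidle to migration_cost" commit in this group can be read as a simple threshold test. The sketch below is a userspace model only: `struct rq_model`, its `avg_idle_ns` field, and `should_newidle_balance()` are hypothetical names, and the exact comparison used by the kernel is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue's idle-time statistics. */
struct rq_model {
    uint64_t avg_idle_ns;   /* assumed: average length of recent idle periods */
};

/*
 * Skip newidle balancing when the CPU is expected to be idle for less
 * time than it costs to migrate a task.
 */
static bool should_newidle_balance(const struct rq_model *rq,
                                   uint64_t migration_cost_ns)
{
    return rq->avg_idle_ns >= migration_cost_ns;
}
```

The idea is that pulling a task to a CPU that will be busy again almost immediately costs more than it gains.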
     

04 Nov, 2009

6 commits

  • Currently partition_sched_domains() takes a 'struct cpumask
    *doms_new' which is a kmalloc'ed array of cpumask_t. You can't
    have such an array if 'struct cpumask' is undefined, as we plan
    for CONFIG_CPUMASK_OFFSTACK=y.

    So, we make this an array of cpumask_var_t instead: this is the
    same for the CONFIG_CPUMASK_OFFSTACK=n case, but requires
    multiple allocations for the CONFIG_CPUMASK_OFFSTACK=y case.
    Hence we add alloc_sched_domains() and free_sched_domains()
    functions.

    Signed-off-by: Rusty Russell
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • find_lowest_rq() wants to call pick_optimal_cpu() on the
    intersection of sched_domain_span(sd) and lowest_mask. Rather
    than doing a cpus_and into a temporary, we can open-code it.

    This actually makes the code slightly clearer, IMHO.

    Signed-off-by: Rusty Russell
    Acked-by: Gregory Haskins
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • Peter Zijlstra suggested that we remove USER_SCHED at:

    http://lkml.org/lkml/2009/3/21/67

    Removing USER_SCHED removes a lot of code from the scheduler
    and simplifies the code.

    We already have the ability to do user based classification
    which is tightened using PAM in userspace.

    Schedule USER_SCHED for removal in 2.6.34

    Signed-off-by: Dhaval Giani
    Acked-by: Peter Zijlstra
    Cc: Balbir Singh
    Cc: Bharata B Rao
    Cc: Serge E. Hallyn
    Cc: Srivatsa Vaddagiri
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dhaval Giani
     
  • cpu_nr_migrations() is not used, remove it.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     
  • time_sync_thresh had been removed.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     
  • __schedule() had been removed.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
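
The find_lowest_rq() commit in this group replaces an AND into a temporary mask with an open-coded walk over the intersection. A minimal userspace sketch of that idea follows, using a 64-bit word in place of a cpumask; `first_cpu_in_both()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Return the lowest CPU number set in both masks, or -1 if the
 * intersection is empty. Each bit is tested against both masks
 * directly, so no temporary mask holding the AND is needed.
 */
static int first_cpu_in_both(uint64_t span, uint64_t lowest)
{
    for (int cpu = 0; cpu < 64; cpu++) {
        uint64_t bit = (uint64_t)1 << cpu;

        if ((span & bit) && (lowest & bit))
            return cpu;
    }
    return -1;
}
```

For a single machine word the temporary is free, but a full cpumask can be large, which is why avoiding the copy matters in the kernel.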
     

26 Oct, 2009

2 commits

  • CPU time of a guest is always accounted in 'user' time
    without concern for the nice value of its counterpart
    process although the guest is scheduled under the nice
    value.

    This patch fixes the defect and accounts the CPU time of
    a niced guest in 'nice' time, the same as for a niced process.

    The patch also adds 'guest_nice' to cpuacct. This value
    reports niced guest CPU time; it relates to 'guest' as
    'nice' relates to 'user'.

    The original discussions can be found here:

    http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
    http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html

    Signed-off-by: Ryota Ozaki
    Acked-by: Avi Kivity
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ryota Ozaki
     
  • Conflicts:
    fs/proc/array.c

    Merge reason: resolve conflict and queue up dependent patch.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

23 Oct, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    move virtrng_remove to .devexit.text
    move virtballoon_remove to .devexit.text
    virtio_blk: Revert serial number support
    virtio: let header files include virtio_ids.h
    virtio_blk: revert QUEUE_FLAG_VIRT addition

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
    niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
    KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
    KS8851: Fix MAC address write order
    KS8851: Add soft reset at probe time
    net: fix section mismatch in fec.c
    net: Fix struct inet_timewait_sock bitfield annotation
    tcp: Try to catch MSG_PEEK bug
    net: Fix IP_MULTICAST_IF
    bluetooth: static lock key fix
    bluetooth: scheduling while atomic bug fix
    tcp: fix TCP_DEFER_ACCEPT retrans calculation
    tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
    tcp: accept socket after TCP_DEFER_ACCEPT period
    Revert "tcp: fix tcp_defer_accept to consider the timeout"
    AF_UNIX: Fix deadlock on connecting to shutdown socket
    ethoc: clear only pending irqs
    ethoc: inline regs access
    vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
    virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
    be2net: fix support for PCI hot plug
    ...

    Linus Torvalds
     

22 Oct, 2009

16 commits

  • The function virtrng_remove is used only wrapped by __devexit_p so define
    it using __devexit.

    Signed-off-by: Uwe Kleine-König
    Acked-by: Sam Ravnborg
    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Acked-by: Christian Borntraeger
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Rusty Russell

    Uwe Kleine-König
     
  • The function virtballoon_remove is used only wrapped by __devexit_p so
    define it using __devexit.

    Signed-off-by: Uwe Kleine-König
    Acked-by: Sam Ravnborg
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Uwe Kleine-König
     
  • This reverts "Add serial number support for virtio_blk, V4a".

    Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
    on virtio config space, so no one could ever use this.

    This is coming back later in a cleaner form.

    Signed-off-by: Rusty Russell
    Cc: john cooper
    Cc: Jens Axboe

    Rusty Russell
     
  • Rusty,

    commit 3ca4f5ca73057a617f9444a91022d7127041970a
    virtio: add virtio IDs file
    moved all device IDs into a single file. While the change itself is
    a very good one, it can break userspace applications. For example
    if a userspace tool wanted to get the ID of virtio_net it used to
    include virtio_net.h. This no longer works, since virtio_net.h
    does not include virtio_ids.h.
    This patch moves all "#include <linux/virtio_ids.h>" from the C
    files into the header files, making the header files compatible with
    the old ones.

    In addition, this patch exports virtio_ids.h to userspace.

    CC: Fernando Luis Vazquez Cao
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell

    Christian Borntraeger
     
  • It seems like the addition of QUEUE_FLAG_VIRT causes major performance
    regressions for Fedora users:

    https://bugzilla.redhat.com/show_bug.cgi?id=509383
    https://bugzilla.redhat.com/show_bug.cgi?id=505695

    While I can't reproduce those extreme regressions myself, I think the
    flag is wrong.

    Rationale:

    QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT, which causes the queue
    to be unplugged immediately. This is not good behaviour for at least
    qemu and kvm, where we do have significant overhead for every
    I/O operation. Even with all the latest speedups (native AIO,
    MSI support, zero copy) we can only get native speed for up to 128kb
    I/O requests; we are already down to 66% of native performance for 4kb
    requests, even on my laptop running the Intel X25-M SSD for which
    QUEUE_FLAG_NONROT was designed.
    If we ever get virtio-blk overhead low enough that this flag makes
    sense it should only be set based on a feature flag set by the host.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Rusty Russell

    Christoph Hellwig
     
  • …ied to the head buffer in the Vlan packets case

    Signed-off-by: Joyce Yu <joyce.yu@sun.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    Joyce Yu
     
  • * 'for-linus' of git://git.infradead.org/users/eparis/notify:
    dnotify: ignore FS_EVENT_ON_CHILD
    inotify: fix coalesce duplicate events into a single event in special case
    inotify: deprecate the inotify kernel interface
    fsnotify: do not set group for a mark before it is on the i_list

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: hp_sdc_rtc - fix test in hp_sdc_rtc_read_rt()
    Input: atkbd - consolidate force release quirks for volume keys
    Input: logips2pp - model 73 is actually TrackMan FX
    Input: i8042 - add Sony Vaio VGN-FZ240E to the nomux list
    Input: fix locking issue in /proc/bus/input/ handlers
    Input: atkbd - postpone restoring LED/repeat rate at resume
    Input: atkbd - restore resetting LED state at startup
    Input: i8042 - make pnp_data_busted variable boolean instead of int
    Input: synaptics - add another Protege M300 to rate blacklist

    Linus Torvalds
     
  • * 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: Prevent kvm_init from corrupting debugfs structures
    KVM: MMU: fix pointer cast
    KVM: use proper hrtimer function to retrieve expiration time

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
    dm snapshot: allow chunk size to be less than page size
    dm snapshot: use unsigned integer chunk size
    dm snapshot: lock snapshot while supplying status
    dm exception store: fix failed set_chunk_size error path
    dm snapshot: require non zero chunk size by end of ctr
    dm: dec_pending needs locking to save error value
    dm: add missing del_gendisk to alloc_dev error path
    dm log: userspace fix incorrect luid cast in userspace_ctr
    dm snapshot: free exception store on init failure
    dm snapshot: sort by chunk size to fix race

    Linus Torvalds
     
  • Increase TEST_SUSPEND_SECONDS to 10 so the warning in
    suspend_test_finish() doesn't annoy the users of slower systems so much.

    Also, make the warning print the suspend-resume cycle time, so that we
    know why the warning actually triggered.

    Patch prepared during the hacking session at the Kernel Summit in Tokyo.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • This fixes a compile bug introduced in

    6ef297f (ARM: 5720/1: Move MMCI header to amba include dir)

    That commit moved arch/arm/include/asm/mach/mmc.h to
    include/linux/amba/mmci.h. Just removing the include was enough.

    Signed-off-by: Uwe Kleine-König
    Acked-by: Linus Walleij
    Acked-by: Nicolas Ferre
    Acked-by: Bill Gatliff
    Cc: Catalin Marinas
    Cc: Russell King
    Cc: Pierre Ossman
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • * 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Kill off stray HAVE_FTRACE_SYSCALLS reference.
    sh: Remove BKL from landisk gio.
    sh: disabled cache handling fix.
    sh: Fix up single page flushing to use PAGE_SIZE.

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: aesni-intel - Fix irq_fpu_usable usage
    crypto: padlock-sha - Fix stack alignment

    Linus Torvalds
     
  • Fix a (small) memory leak in one of the error paths of the NFS mount
    options parsing code.

    Regression introduced in 2.6.30 by commit a67d18f (NFS: load the
    rpc/rdma transport module automatically).

    Reported-by: Yinghai Lu
    Reported-by: Pekka Enberg
    Signed-off-by: Ingo Molnar
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • This patch fixes a NULL pointer dereference in pipe_rdwr_open() which
    generates the stack trace:

    > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
    > [] pipe_rdwr_open+0x35/0x70
    > [] __dentry_open+0x13c/0x230
    > [] do_filp_open+0x2d/0x40
    > [] do_sys_open+0x5a/0x100
    > [] sysenter_do_call+0x1b/0x67

    The failure mode is triggered by an attempt to open an anonymous
    pipe via /proc/pid/fd/* as exemplified by this script:

    =============================================================
    while : ; do
        { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
        PID=$!
        OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
              { read PID REST ; echo $PID; } )
        OUT="${OUT%% *}"
        DELAY=$((RANDOM * 1000 / 32768))
        usleep $((DELAY * 1000 + RANDOM % 1000 ))
        echo n > /proc/$OUT/fd/1 # Trigger defect
    done
    =============================================================

    Note that the failure window is quite small and I could only
    reliably reproduce the defect by inserting a small delay
    in pipe_rdwr_open(). For example:

    static int
    pipe_rdwr_open(struct inode *inode, struct file *filp)
    {
            msleep(100);
            mutex_lock(&inode->i_mutex);

    Although the defect was observed in pipe_rdwr_open(), I think it
    makes sense to replicate the change through all the pipe_*_open()
    functions.

    The core of the change is to verify that inode->i_pipe has not
    been released before attempting to manipulate it. If inode->i_pipe
    is no longer present, return ENOENT to indicate so.

    The comment about potentially using atomic_t for i_pipe->readers
    and i_pipe->writers has also been removed because it is no longer
    relevant in this context. The inode->i_mutex lock must be used so
    that inode->i_pipe can be dealt with correctly.

    Signed-off-by: Earl Chew
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Earl Chew
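
The core check described in this commit, verifying that inode->i_pipe is still present before touching it, can be sketched in userspace as follows. The types `struct pipe_model` and `struct inode_model` and the function `pipe_rdwr_open_model()` are hypothetical simplifications; the real function also takes inode->i_mutex and looks at the file mode, which this sketch leaves to the caller.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct pipe_model {
    int readers;
    int writers;
};

struct inode_model {
    struct pipe_model *i_pipe;   /* NULL once the pipe has been released */
};

/*
 * Open for read and write. The caller is assumed to hold the lock
 * that protects inode->i_pipe for the duration of the call.
 */
static int pipe_rdwr_open_model(struct inode_model *inode)
{
    if (inode->i_pipe == NULL)
        return -ENOENT;          /* pipe already gone: report, don't crash */

    inode->i_pipe->readers++;
    inode->i_pipe->writers++;
    return 0;
}
```

The point of doing the test under the lock is that the pipe cannot be released between the check and the counter updates.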
     

21 Oct, 2009

5 commits

  • In ks8851_set_rx_mode() the case handling IFF_MULTICAST was also setting
    the RXCR1_AE bit by accident. This meant that all unicast frames were
    being accepted by the device. Remove RXCR1_AE from this case.

    Note, RXCR1_AE was also masking a problem with setting the MAC address
    properly, so this patch needs to be applied after fixing the MAC write
    order.

    Fixes a bug reported by Doong, Ping of Micrel. This version of the
    patch avoids setting RXCR1_ME for all cases.

    Signed-off-by: Ben Dooks
    Signed-off-by: David S. Miller

    Ben Dooks
     
  • The MAC address register was being written in the wrong order, so add
    a new address macro to convert mac-address byte to register address and
    a ks8851_wrreg8() function to write each byte without having to worry
    about any difficult byte swapping.

    Fixes a bug reported by Doong, Ping of Micrel.

    Signed-off-by: Ben Dooks
    Signed-off-by: David S. Miller

    Ben Dooks
     
  • Issue a full soft reset at probe time.

    This was reported by Doong Ping of Micrel, but with no explanation of
    why it is necessary or what bug it fixes. Add it as it does not seem to hurt
    the current driver and ensures that the device is in a known state when we
    start setting it up.

    Signed-off-by: Ben Dooks
    Signed-off-by: David S. Miller

    Ben Dooks
     
  • fec_enet_init is called by both fec_probe and fec_resume, so it
    shouldn't be marked as __init.

    Signed-off-by: Steven King
    Signed-off-by: David S. Miller

    Steven King
     
  • Mask off FS_EVENT_ON_CHILD in dnotify_handle_event(). Otherwise, when there
    is more than one watch on a directory and dnotify_should_send_event()
    succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause
    spurious events.

    This case was overlooked in commit e42e2773.

    #define _GNU_SOURCE

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <signal.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    /* Handler for the signal tied to the DN_CREATE watch. */
    static void create_event(int s, siginfo_t* si, void* p)
    {
        printf("create\n");
    }

    /* Handler for the signal tied to the DN_DELETE watch. */
    static void delete_event(int s, siginfo_t* si, void* p)
    {
        printf("delete\n");
    }

    int main (void) {
        struct sigaction action;
        char *tmpdir, *file;
        int fd1, fd2;

        sigemptyset (&action.sa_mask);
        action.sa_flags = SA_SIGINFO;

        action.sa_sigaction = create_event;
        sigaction (SIGRTMIN + 0, &action, NULL);

        action.sa_sigaction = delete_event;
        sigaction (SIGRTMIN + 1, &action, NULL);

    # define TMPDIR "/tmp/test.XXXXXX"
        tmpdir = malloc(strlen(TMPDIR) + 1);
        strcpy(tmpdir, TMPDIR);
        mkdtemp(tmpdir);

    # define TMPFILE "/file"
        file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1);
        /* TMPFILE already starts with '/', so no extra separator is
           added; this also keeps the result within the allocation. */
        sprintf(file, "%s%s", tmpdir, TMPFILE);

        /* First watch on the directory: create events only. */
        fd1 = open (tmpdir, O_RDONLY);
        fcntl(fd1, F_SETSIG, SIGRTMIN);
        fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE);

        /* Second watch on the same directory: delete events only. */
        fd2 = open (tmpdir, O_RDONLY);
        fcntl(fd2, F_SETSIG, SIGRTMIN + 1);
        fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE);

        if (fork()) {
            /* This triggers a create event */
            creat(file, 0600);
            /* This triggers a create and delete event (!) */
            unlink(file);
        } else {
            sleep(1);
            rmdir(tmpdir);
        }

        return 0;
    }

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Eric Paris

    Andreas Gruenbacher
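
The masking step described in this commit can be sketched in isolation. The bit values below (`EV_CREATE`, `EV_DELETE`, `EV_ON_CHILD`) and the helper `watch_matches()` are hypothetical stand-ins for illustration, not the kernel's definitions; the sketch assumes that every watch mask carries the on-child flag, so an unmasked event would match every watch through that shared bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event bits standing in for the fsnotify flags. */
#define EV_CREATE   0x0001u
#define EV_DELETE   0x0002u
#define EV_ON_CHILD 0x8000u   /* modifier flag, not an event type */

/*
 * Decide whether a watch should fire for an event. The modifier flag
 * is stripped first so that only real event types are compared.
 */
static bool watch_matches(uint32_t watch_mask, uint32_t event_mask)
{
    event_mask &= ~EV_ON_CHILD;
    return (watch_mask & event_mask) != 0;
}
```

Without the masking line, a delete event carrying the on-child flag would also match a create-only watch, which corresponds to the spurious "create" signal the test program above demonstrates.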