04 Dec, 2007

2 commits


03 Dec, 2007

1 commit

  • Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a feature that was
    useful for us: a cpu accounting resource controller. This feature is
    useful if someone wants to group tasks only for accounting purposes and
    doesn't want to exercise any control over their cpu consumption.

    The patch below reintroduces the feature. It is based on Paul Menage's
    original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with
    these differences:

    - Removed load average information. I felt it needs more thought (especially
    to deal with SMP and virtualized platforms) and can be added for
    2.6.25 after more discussions.
    - Convert group cpu usage to be nanosecond accurate (as the rest of the CFS
    stats are) and invoke cpuacct_charge() from the respective scheduler
    classes.
    - Make accounting scalable on SMP systems by splitting the usage
    counter to be per-cpu
    - Move the code from kernel/cpu_acct.c to kernel/sched.c, since the
    code is not big enough to warrant a new file and it rightly belongs
    inside the scheduler. Things like taking rq->lock while reading cpu
    usage also become easier when the code lives in kernel/sched.c.
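    The per-cpu split of the usage counter can be illustrated with a
    user-space sketch. It assumes a fixed, hypothetical NR_CPUS_SKETCH and a
    hypothetical struct cpuacct_sketch; it is not the kernel's cpuacct code.
    Each CPU charges only its own slot, so charging does not contend on one
    shared counter, and a reader sums all slots to get the group's total
    usage in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed CPU count; the kernel allocates real per-cpu data. */
#define NR_CPUS_SKETCH 4

/* Hypothetical group: one usage slot per CPU, in nanoseconds. */
struct cpuacct_sketch {
    uint64_t cpuusage[NR_CPUS_SKETCH];
};

/* Charge delta_ns to the group on one CPU; only that CPU's slot is touched. */
static void cpuacct_charge_sketch(struct cpuacct_sketch *ca, int cpu,
                                  uint64_t delta_ns)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    ca->cpuusage[cpu] += delta_ns;
}

/* Total group usage is the sum of all per-CPU slots. */
static uint64_t cpuacct_usage_sketch(const struct cpuacct_sketch *ca)
{
    uint64_t total = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        total += ca->cpuusage[cpu];
    return total;
}
```

    This sketch ignores concurrency entirely; the note above about taking
    rq->lock while reading cpu usage suggests the in-kernel version does need
    some locking around the reads.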

    The patch also modifies the cpu controller so that it no longer provides
    the same accounting information.

    Tested-by: Balbir Singh

    Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran
    some simple tests like cpuspin (spin on the cpu), ran several tasks in
    the same group and timed them. Compared their time stamps with
    cpuacct.usage.

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Balbir Singh
    Signed-off-by: Ingo Molnar

    Srivatsa Vaddagiri
     

30 Nov, 2007

4 commits

  • In wait_task_stopped(), exit_code already contains the right value for the
    si_status member of siginfo, and this is simply set in the non-WNOWAIT
    case.

    If you call waitid() with a stopped or traced process, you'll get the signal
    in siginfo.si_status as expected; however, if you call waitid(WNOWAIT) in
    the same situation, you'll get (signal << 8) | 0x7f instead.

    Pass it unchanged to wait_noreap_copyout(); we would only need to shift it
    and add 0x7f if we were returning it in the user status field and that
    isn't used for any function that permits WNOWAIT.
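    To make the (signal << 8) | 0x7f value concrete: that is the layout of
    the traditional wait status word for a stopped child, which the
    WIFSTOPPED() and WSTOPSIG() macros from <sys/wait.h> decode.
    siginfo.si_status, by contrast, should carry the bare signal number. The
    helper stopped_status_word() below is hypothetical, written only to show
    the encoding.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper: build the wait(2)-style status word for a child
 * stopped by signal sig.  A low byte of 0x7f means "stopped"; the signal
 * number lives in bits 8..15. */
static int stopped_status_word(int sig)
{
    return (sig << 8) | 0x7f;
}
```

    A waitid() caller reads info.si_status directly and expects to see
    SIGSTOP itself, not the shifted status word; that is what passing
    exit_code through unchanged restores for the WNOWAIT case.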

    Signed-off-by: Scott James Remnant
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Scott James Remnant
     
  • Fix the extern declaration of kallsyms_num_syms to indicate that the symbol
    does not reside in the small-data storage space, and so may not be accessed
    relative to the small data base register.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Commit 7d69a1f4a72b18876c99c697692b78339d491568 ("remove CONFIG_UTS_NS
    and CONFIG_IPC_NS") by Cedric Le Goater accidentally removed the code
    that prevented the uts->hostname and uts->domainname values from being
    overwritten from another namespace.

    In other words, setting hostname/domainname via sysctl (echo xxx >
    /proc/sys/kernel/(host|domain)name) caused the new value to be set in
    the init UTS namespace only.

    Restore the isolation.
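    A user-space toy model of the isolation being restored: each task points
    at its own UTS namespace, and a hostname write must land in the writer's
    namespace rather than the initial one. All names here (struct
    uts_ns_sketch, struct uts_task_sketch, set_hostname_sketch) are
    hypothetical, and the kernel's actual /proc/sys handler is not shown.

```c
#include <assert.h>
#include <string.h>

/* 64 characters plus NUL, like the fields of the kernel's new_utsname. */
#define HOSTNAME_LEN 65

struct uts_ns_sketch {
    char hostname[HOSTNAME_LEN];
};

struct uts_task_sketch {
    struct uts_ns_sketch *uts_ns;   /* namespace this task lives in */
};

/* Write the hostname into the caller's own namespace.  The regression
 * described above behaved as if every write went to the init namespace. */
static void set_hostname_sketch(struct uts_task_sketch *t, const char *name)
{
    strncpy(t->uts_ns->hostname, name, HOSTNAME_LEN - 1);
    t->uts_ns->hostname[HOSTNAME_LEN - 1] = '\0';
}
```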

    Signed-off-by: Pavel Emelyanov
    Acked-by: Cedric Le Goater
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • wait_task_stopped(WNOWAIT) calls task_pid_nr_ns() without holding
    tasklist_lock or the RCU read lock, so it can read already freed memory.
    Use the cached pid_t value instead.

    Signed-off-by: Oleg Nesterov
    Looks-good-to: Roland McGrath
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

28 Nov, 2007

5 commits


27 Nov, 2007

9 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (41 commits)
    [XFRM]: Fix leak of expired xfrm_states
    [ATM]: [he] initialize lock and tasklet earlier
    [IPV4]: Remove bogus ifdef mess in arp_process
    [SKBUFF]: Free old skb properly in skb_morph
    [IPV4]: Fix memory leak in inet_hashtables.h when NUMA is on
    [IPSEC]: Temporarily remove locks around copying of non-atomic fields
    [TCP] MTUprobe: Cleanup send queue check (no need to loop)
    [TCP]: MTUprobe: receiver window & data available checks fixed
    [MAINTAINERS]: tlan list is subscribers-only
    [SUNRPC]: Remove SPIN_LOCK_UNLOCKED
    [SUNRPC]: Make xprtsock.c:xs_setup_{udp,tcp}() static
    [PFKEY]: Sending an SADB_GET responds with an SADB_GET
    [IRDA]: Compilation for CONFIG_INET=n case
    [IPVS]: Fix compiler warning about unused register_ip_vs_protocol
    [ARP]: Fix arp reply when sender ip 0
    [IPV6] TCPMD5: Fix deleting key operation.
    [IPV6] TCPMD5: Check return value of tcp_alloc_md5sig_pool().
    [IPV4] TCPMD5: Use memmove() instead of memcpy() because we have overlaps.
    [IPV4] TCPMD5: Omit redundant NULL check for kfree() argument.
    ieee80211: Stop net_ratelimit/IEEE80211_DEBUG_DROP log pollution
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
    sched: bump version of kernel/sched_debug.c
    sched: fix minimum granularity tunings
    sched: fix RLIMIT_CPU comment
    sched: fix kernel/acct.c comment
    sched: fix prev_stime calculation
    sched: don't forget to unlock uids_mutex on error paths

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
    x86: fix APIC related bootup crash on Athlon XP CPUs
    time: add ADJ_OFFSET_SS_READ
    x86: export the symbol empty_zero_page on the 32-bit x86 architecture
    x86: fix kprobes_64.c inlining borkage
    pci: use pci=bfsort for HP DL385 G2, DL585 G2
    x86: correctly set UTS_MACHINE for "make ARCH=x86"
    lockdep: annotate do_debug() trap handler
    x86: turn off iommu merge by default
    x86: fix ACPI compile for LOCAL_APIC=n
    x86: printk kernel version in WARN_ON and other dump_stack users
    ACPI: Set max_cstate to 1 for early Opterons.
    x86: fix NMI watchdog & 'stopped time' problem

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    virtio: fix net driver loop case where we fail to restart
    module: fix and elaborate comments
    virtio: fix module/device unloading
    lguest: Fix uninitialized members in example launcher

    Linus Torvalds
     
  • bump version of kernel/sched_debug.c and remove CFS version
    information from it.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • increase the default minimum granularity some more; this gives us
    better performance in aim7 benchmarks.

    also correct some comments: we scale with ilog(ncpus) + 1.
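    The corrected comment says the granularity scales by ilog(ncpus) + 1. A
    sketch of that scaling, assuming integer floor(log2); the function names
    and the base value used in the example are hypothetical, not the
    kernel's tunables.

```c
#include <assert.h>

/* Integer floor(log2(n)) for n >= 1. */
static unsigned int ilog2_sketch(unsigned int n)
{
    unsigned int log = 0;

    assert(n >= 1);
    while (n >>= 1)
        log++;
    return log;
}

/* Scale a base granularity (in ns) by 1 + ilog2(ncpus). */
static unsigned long long scaled_granularity(unsigned long long base_ns,
                                             unsigned int ncpus)
{
    return base_ns * (1 + ilog2_sketch(ncpus));
}
```

    Growth is logarithmic: a 16-CPU machine gets a factor of 5, not 16.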

    Signed-off-by: Zou Nan hai
    Signed-off-by: Ingo Molnar

    Zou Nan hai
     
  • fix kernel/acct.c comment.

    noticed by Lin Tan. Comment suggested by Olaf Kirch.

    also see:

    http://bugzilla.kernel.org/show_bug.cgi?id=8220

    Reported-by: tammy000@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The commit

    commit 5cb350baf580017da38199625b7365b1763d7180
    Author: Dhaval Giani
    Date: Mon Oct 15 17:00:14 2007 +0200

    sched: group scheduling, sysfs tunables

    introduced the uids_mutex and the helpers to lock/unlock it.
    Unfortunately, the error paths of alloc_uid() were not patched
    to unlock it.
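    The bug class here is a lock taken at the top of a function but released
    only on the success path. A common remedy in C, and a familiar kernel
    idiom, is a single exit label that every error path jumps to. The sketch
    below uses POSIX threads and hypothetical names (alloc_thing,
    things_mutex, fail_alloc); it is not the actual alloc_uid() code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t things_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical switch so the error path can be exercised. */
static int fail_alloc;

struct thing {
    int id;
};

/* Allocate a thing under things_mutex; every return path unlocks it. */
static struct thing *alloc_thing(int id)
{
    struct thing *t = NULL;

    pthread_mutex_lock(&things_mutex);

    if (fail_alloc)
        goto out_unlock;        /* error path: must still unlock */

    t = malloc(sizeof(*t));
    if (!t)
        goto out_unlock;        /* allocation failure: same exit */
    t->id = id;

out_unlock:
    pthread_mutex_unlock(&things_mutex);
    return t;
}
```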

    Signed-off-by: Pavel Emelyanov
    Acked-by: Dhaval Giani
    Signed-off-by: Ingo Molnar

    Pavel Emelyanov
     
    Michael Kerrisk reported that a long-standing bug in the adjtimex()
    system call causes glibc's adjtime(3) function to deliver the wrong
    results if 'delta' is NULL.

    Add the ADJ_OFFSET_SS_READ flag, which glibc will use to fix this API
    compatibility bug.

    Also see: http://bugzilla.kernel.org/show_bug.cgi?id=6761

    [ mingo@elte.hu: added patch description and made it backwards compatible ]

    NOTE: the new flag is defined as 0xa001 so that it returns -EINVAL on
    older kernels; this way glibc can use it safely. Suggested by Ulrich
    Drepper.
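    Why 0xa001 fails on older kernels can be checked bit by bit. Assuming
    older kernels required that the single-shot offset mode
    (ADJ_OFFSET_SINGLESHOT, 0x8001) not be combined with any other mode
    bits, 0xa001 contains all of 0x8001 plus the extra bit 0x2000, so that
    check rejects it with -EINVAL. The constants are redefined under
    hypothetical SK_ names to avoid clashing with <sys/timex.h>, and
    old_kernel_check() only models that assumed rule; it is not a copy of
    the kernel function.

```c
#include <assert.h>
#include <errno.h>

/* Values as in Linux <linux/timex.h>, renamed to avoid header clashes. */
#define SK_ADJ_OFFSET_SINGLESHOT 0x8001u
#define SK_ADJ_OFFSET_SS_READ    0xa001u

/* Model of the assumed pre-change validation: if all single-shot bits
 * are present, no other mode bit may be set. */
static int old_kernel_check(unsigned int modes)
{
    if ((modes & SK_ADJ_OFFSET_SINGLESHOT) == SK_ADJ_OFFSET_SINGLESHOT &&
        modes != SK_ADJ_OFFSET_SINGLESHOT)
        return -EINVAL;
    return 0;
}
```

    Under that rule, a caller that gets EINVAL for ADJ_OFFSET_SS_READ is
    talking to a kernel that predates the flag, rather than having its
    request silently misinterpreted.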

    Acked-by: Ulrich Drepper
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    John Stultz
     

20 Nov, 2007

5 commits

  • Remove binary sysctls that never worked due to missing strategy functions.

    Cc: "Eric W. Biederman"
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Remove binary sysctls that never worked due to missing strategy functions.

    Cc: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Switch the remaining IPVS sysctl entries over to use CTL_UNNUMBERED.
    I strongly doubt that anyone is using the sys_sysctl interface to
    these variables.

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy

    Switch these entries over to use CTL_UNNUMBERED, as clearly
    the sys_sysctl portion wasn't working.

    This is along the same lines as Christian Borntraeger's patch that fixes
    up entries with no strategy in net/ipv4/ipvs/ip_vs_ctl.c.

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Running the latest git code I get the following messages during boot:
    sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy

    I removed the binary sysctl handler for those entries and also removed
    the definitions in ip_vs.h. The alternative would be to implement a
    proper strategy handler, but the sysctl syscall is deprecated.

    There are other sysctl definitions that are commented out or work with
    the default sysctl_data strategy. I did not touch these.

    Signed-off-by: Christian Borntraeger
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Christian Borntraeger
     

19 Nov, 2007

1 commit


17 Nov, 2007

2 commits

  • Fix a typo in ntp.c that has caused updating of the persistent (RTC)
    clock when synced to NTP to behave erratically.

    When debugging a freeze that arises on my AMD64 machines when I
    run the ntpd service, I added a number of printk's to monitor the
    sync_cmos_clock procedure. I discovered that it was not syncing to
    cmos RTC every 11 minutes as documented, but instead would keep trying
    every second for hours at a time. The reason turned out to be a typo
    in sync_cmos_clock, where it attempts to ensure that
    update_persistent_clock is called very close to 500 msec after a 1
    second boundary (required by the PC RTC's spec). That typo referred to
    "xtime" in one spot, rather than "now", which is derived from "xtime"
    but not equal to it. This makes the test erratic, creating a
    "coin-flip" that decides when update_persistent_clock is called - when
    it is called, which is rarely, it may be at any time during the one
    second period, rather than close to 500 msec, so the value written is
    needlessly incorrect, too.
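    The timing condition can be expressed as a small predicate: write the
    RTC only when the fractional second of the current time is within half
    a tick of 500 msec. This sketch takes a hypothetical tick_nsec parameter
    and a struct timespec snapshot named now; its point is that every field
    in the test comes from that one snapshot, never a mix of "now" and
    "xtime".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

#define NSEC_PER_SEC_SK 1000000000L

/* True when now is close enough to the half-second mark to write the
 * RTC.  All fields come from the single snapshot "now". */
static bool rtc_update_window(const struct timespec *now, long tick_nsec)
{
    return labs(now->tv_nsec - NSEC_PER_SEC_SK / 2) <= tick_nsec / 2;
}
```

    With one snapshot used consistently, the result is deterministic for a
    given timestamp; mixing in a second, slightly different clock reading is
    what turned the original test into a coin flip.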

    Signed-off-by: David P. Reed
    Signed-off-by: Thomas Gleixner

    David P. Reed
     
  • don't use the vgetcpu tcache; it's causing problems with migrating
    tasks: they'll see the old cache up to a jiffy after the
    migration, further increasing the costs of the migration.

    In the worst case they see completely bogus information from
    the tcache, when a sys_getcpu() call "invalidated" the cache
    info by incrementing the jiffies _and_ the cpuid info in the
    cache and the following vdso_getcpu() call happens after
    vdso_jiffies have been incremented.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Ulrich Drepper
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     

16 Nov, 2007

8 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
    sched: reorder SCHED_FEAT_ bits
    sched: make sched_nr_latency static
    sched: remove activate_idle_task()
    sched: fix __set_task_cpu() SMP race
    sched: fix SCHED_FIFO tasks & FAIR_GROUP_SCHED
    sched: fix accounting of interrupts during guest execution on s390

    Linus Torvalds
     
  • reorder SCHED_FEAT_ bits so that the used ones come first. Makes
    tuning instructions easier.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • sched_nr_latency can now become static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Ingo Molnar

    Adrian Bunk
     
  • The cpu_down() code is fine with sched_idle_next() not placing the 'idle'
    task at the beginning of the queue.

    So get rid of activate_idle_task() and use activate_task() instead.
    activate_idle_task() is the same as activate_task(), except for the
    update_rq_clock(rq) call, which is redundant.

    Code size goes down:

       text    data     bss     dec     hex filename
      47853    3934     336   52123    cb9b sched.o.before
      47828    3934     336   52098    cb82 sched.o.after

    Signed-off-by: Dmitry Adamushko
    Signed-off-by: Ingo Molnar

    Dmitry Adamushko
     
  • Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core
    system, crashes that can only be explained by runqueue corruption.

    There is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to
    a new value, task_rq_lock(p, ...) can be successfully executed on another
    CPU. We must ensure that updates of per-task data have been completed by
    this moment.

    This bug has been hiding in the Linux scheduler for an eternity (we never
    had any explicit barrier for task->cpu in set_task_cpu(), so the bug was
    introduced in 2.5.1), but only became visible via set_task_cfs_rq() being
    accidentally put after the task->cpu update. It also probably needs a
    sufficiently out-of-order CPU to trigger.
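    The ordering requirement is the classic publish pattern: finish all
    writes to the per-task data, then make the new ->cpu value visible, so
    that anyone who observes the new value also observes the completed
    data. The kernel has its own barrier primitives for this (such as
    smp_wmb()); the sketch below uses C11 atomics instead, with a
    hypothetical struct sched_task_sketch whose cfs_rq_id field stands in
    for the per-task scheduler data.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical task: cfs_rq_id is per-task data that must be complete
 * before the new cpu value becomes visible. */
struct sched_task_sketch {
    int cfs_rq_id;
    atomic_int cpu;
};

/* Writer: update the per-task data first, then publish cpu with release
 * ordering so the data write cannot be reordered after it. */
static void set_task_cpu_sketch(struct sched_task_sketch *p, int cpu,
                                int rq_id)
{
    p->cfs_rq_id = rq_id;
    atomic_store_explicit(&p->cpu, cpu, memory_order_release);
}

/* Reader: an acquire load that sees the new cpu also sees the
 * cfs_rq_id written before the matching release store. */
static int task_cpu_sketch(struct sched_task_sketch *p)
{
    return atomic_load_explicit(&p->cpu, memory_order_acquire);
}
```

    The bug described above is the reverse order: the per-task update
    (set_task_cfs_rq()) ran after the ->cpu update, so another CPU could
    act on the new ->cpu value while the task's data still pointed at the
    old runqueue.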

    Reported-by: Grant Wilson
    Signed-off-by: Dmitry Adamushko
    Signed-off-by: Ingo Molnar

    Dmitry Adamushko
     
  • Suppose that the SCHED_FIFO task does

    switch_uid(new_user);

    Now, p->se.cfs_rq and p->se.parent both point into the old
    user_struct->tg because sched_move_task() doesn't call set_task_cfs_rq()
    for !fair_sched_class case.

    Suppose that old user_struct/task_group is freed/reused, and the task
    does

    sched_setscheduler(SCHED_NORMAL);

    __setscheduler() sets fair_sched_class, but doesn't update
    ->se.cfs_rq/parent which point to the freed memory.

    This means that check_preempt_wakeup() doing

        while (!is_same_group(se, pse)) {
                se = parent_entity(se);
                pse = parent_entity(pse);
        }

    may OOPS in a similar way if rq->curr or p did something like above.

    Perhaps we need something like the patch below, note that
    __setscheduler() can't do set_task_cfs_rq().

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Currently the scheduler checks for PF_VCPU to decide if this timeslice
    has to be accounted as guest time. On s390, host interrupts are not
    disabled during guest execution. This causes these interrupts to be
    accounted as guest time if CONFIG_VIRT_CPU_ACCOUNTING is set. The solution
    is to check whether an interrupt triggered account_system_time(). As the
    tick is timer interrupt based, we have to subtract hardirq_offset.
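    The accounting decision can be sketched as a predicate: a tick counts as
    guest time only if the task is running a virtual CPU and, once the timer
    tick's own contribution (hardirq_offset) is subtracted, no interrupt
    context remains. PF_VCPU_SK and the plain integer interrupt count are
    hypothetical, simplified stand-ins for the kernel's PF_VCPU flag and
    irq_count().

```c
#include <assert.h>
#include <stdbool.h>

#define PF_VCPU_SK 0x1u /* hypothetical stand-in for PF_VCPU */

/* Returns true if this tick should be charged as guest time.
 * irq_count: simplified interrupt nesting count, including the tick.
 * hardirq_offset: the contribution of the timer tick itself. */
static bool account_as_guest(unsigned int task_flags, int irq_count,
                             int hardirq_offset)
{
    return (task_flags & PF_VCPU_SK) && (irq_count - hardirq_offset == 0);
}
```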

    I tested the patch on s390 with CONFIG_VIRT_CPU_ACCOUNTING and on
    x86_64. Seems to work.

    CC: Avi Kivity
    CC: Laurent Vivier
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Ingo Molnar

    Christian Borntraeger
     
  • The original meaning of the old test (p->state > TASK_STOPPED) was
    "not dead", since it was before TASK_TRACED existed and before the
    state/exit_state split. Commit
    14bf01bb0599c89fc7f426d20353b76e12555308 wrongly changed this test to
    check for TASK_TRACED instead. It should have been changed when
    TASK_TRACED was introduced and again when exit_state was introduced.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: Kees Cook
    Acked-by: Scott James Remnant
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

15 Nov, 2007

3 commits

  • * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [NET]: rt_check_expire() can take a long time, add a cond_resched()
    [ISDN] sc: Really, really fix warning
    [ISDN] sc: Fix sndpkt to have the correct number of arguments
    [TCP] FRTO: Clear frto_highmark only after process_frto that uses it
    [NET]: Remove notifier block from chain when register_netdevice_notifier fails
    [FS_ENET]: Fix module build.
    [TCP]: Make sure write_queue_from does not begin with NULL ptr
    [TCP]: Fix size calculation in sk_stream_alloc_pskb
    [S2IO]: Fixed memory leak when MSI-X vector allocation fails
    [BONDING]: Fix resource use after free
    [SYSCTL]: Fix warning for token-ring from sysctl checker
    [NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR
    [IWLWIFI]: Not correctly dealing with hotunplug.
    [TCP] FRTO: Plug potential LOST-bit leak
    [TCP] FRTO: Limit snd_cwnd if TCP was application limited
    [E1000]: Fix schedule while atomic when called from mii-tool.
    [NETX]: Fix build failure added by 2.6.24 statistics cleanup.
    [EP93xx_ETH]: Build fix after 2.6.24 NAPI changes.
    [PKT_SCHED]: Check subqueue status before calling hard_start_xmit

    Linus Torvalds
     
  • We must not call nlmsg_free() on a pointer containing an undefined value
    (when nothing has been allocated yet).

    Spotted by the Coverity checker.
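    The general fix pattern is to make sure a cleanup path never frees a
    pointer that was never assigned. A minimal user-space sketch, with a
    hypothetical build_reply() and with malloc()/free() standing in for the
    netlink allocation and nlmsg_free(): initialize the pointer to NULL so
    that an early failure reaches the cleanup label with a value that is
    safe to free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: build a reply buffer; returns 0 on success, -1 on error. */
static int build_reply(const char *payload, char **out)
{
    char *rep = NULL;           /* never left uninitialized */

    if (!payload)
        goto err;               /* nothing allocated yet; free(NULL) is a no-op */

    rep = malloc(strlen(payload) + 1);
    if (!rep)
        goto err;
    strcpy(rep, payload);
    *out = rep;
    return 0;

err:
    free(rep);
    return -1;
}
```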

    Signed-off-by: Adrian Bunk
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Lockdep reports a circular locking dependency in the hibernate code
    because
    - during system boot hibernate code (from an initcall) locks pm_mutex
    and then a sysfs buffer mutex via name_to_dev_t
    - during regular operation hibernate code locks pm_mutex under a
    sysfs buffer mutex because it's called from sysfs methods.

    The deadlock can never happen because during initcall invocation nothing
    can write to sysfs yet. This removes the lockdep report by marking the
    initcall locking as being in a different class.

    Signed-off-by: Johannes Berg
    Cc: "Rafael J. Wysocki"
    Cc: Alan Stern
    Acked-by: Peter Zijlstra
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg