19 Jun, 2008

24 commits

  • This patch corrects the incorrect value of per process run-queue wait
    time reported by delay statistics. The anomaly was due to the following
    reason. When a process leaves the CPU and immediately starts waiting for
    CPU on the runqueue (which means it remains in the TASK_RUNNABLE state),
    the time of re-entry into the run-queue is never recorded. Due to this,
    the waiting time on the runqueue from this point of re-entry up to the
    next time it hits the CPU is not accounted for. This is solved by
    recording the time of re-entry of a process leaving the CPU in the
    sched_info_depart() function IF the process will go back to waiting on
    the run-queue. This IF condition is verified by checking whether the
    process is still in the TASK_RUNNABLE state.

    The patch was tested on 2.6.26-rc6 using two simple CPU hog programs.
    The values noted prior to the fix did not account for the time spent on
    the runqueue waiting. After the fix, the correct values were reported
    back to user space.

    Signed-off-by: Bharath Ravi
    Signed-off-by: Madhava K R
    Cc: dhaval@linux.vnet.ibm.com
    Cc: vatsa@in.ibm.com
    Cc: balbir@in.ibm.com
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Bharath Ravi
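
As a rough illustration of the run-queue accounting fix described in the entry above, the following user-space sketch models the idea. The struct and the `model_*` functions are hypothetical stand-ins, not the kernel's `sched_info` code; timestamps are plain integers and 0 is assumed to mean "not queued".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-task accounting fields. */
struct task_model {
    bool     runnable;     /* task still wants the CPU when switched out */
    uint64_t last_queued;  /* time of (re-)entry into the run-queue, 0 = not queued */
    uint64_t run_delay;    /* accumulated run-queue wait time */
};

/* The task starts waiting on the run-queue at time `now`. */
static void model_queued(struct task_model *t, uint64_t now)
{
    if (t->last_queued == 0)
        t->last_queued = now;
}

/* The task gets the CPU at time `now`: account the time it waited. */
static void model_arrive(struct task_model *t, uint64_t now)
{
    if (t->last_queued != 0) {
        t->run_delay += now - t->last_queued;
        t->last_queued = 0;
    }
}

/* The task leaves the CPU at time `now`. */
static void model_depart(struct task_model *t, uint64_t now)
{
    /* The fix in this sketch: a task that is still runnable goes straight
     * back to waiting, so its re-entry time has to be recorded here. */
    if (t->runnable)
        model_queued(t, now);
}
```

Without the check in `model_depart()`, the second wait of a preempted CPU hog would never be added to `run_delay`, which matches the symptom the commit message describes.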
     
  • Ingo Molnar
     
  • First issue is not related to the cpusets. We're simply leaking doms_cur.
    It's allocated in arch_init_sched_domains() which is called for every
    hotplug event. So we just keep reallocating doms_cur without freeing it.
    I introduced free_sched_domains() function that cleans things up.

    Second issue is that sched domains created by the cpusets are
    completely destroyed by the CPU hotplug events. For all CPU hotplug
    events scheduler attaches all CPUs to the NULL domain and then puts
    them all into the single domain thereby destroying domains created
    by the cpusets (partition_sched_domains).
    The solution is simple: when cpusets are enabled, the scheduler should
    not create the default domain and should instead let cpusets do that,
    which is exactly what the patch does.

    Signed-off-by: Max Krasnyansky
    Cc: pj@sgi.com
    Cc: menage@google.com
    Cc: rostedt@goodmis.org
    Acked-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner

    Max Krasnyansky
     
  • In tick_task_rt() we first call update_curr_rt(), which can dequeue a runqueue
    because it ran out of runtime, and then we try to requeue it if it has also
    exhausted its RR quota. Obviously, requeueing something that is no longer
    on the runqueue will not have the expected result.

    Signed-off-by: Peter Zijlstra
    Tested-by: Daniel K.
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
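
The ordering problem described in the entry above can be sketched with a small user-space model. Everything here (the struct, the `model_*` functions, the tick-based budgets) is a hypothetical simplification, not the scheduler's actual code; it only shows why the requeue step has to check that the entity is still queued.

```c
#include <stdbool.h>

/* Hypothetical model of a round-robin real-time entity. */
struct rt_entity_model {
    bool on_rq;          /* entity is currently queued */
    int  runtime_left;   /* remaining bandwidth budget, in ticks */
    int  rr_slice_left;  /* remaining round-robin quota, in ticks */
    int  requeue_count;  /* how often it was moved to the queue tail */
};

/* Charge one tick of runtime; dequeue the entity when the budget is gone. */
static void model_update_curr(struct rt_entity_model *e)
{
    if (e->runtime_left > 0)
        e->runtime_left--;
    if (e->runtime_left == 0)
        e->on_rq = false;
}

/* Move the entity to the tail of its queue, but only if it is queued. */
static void model_requeue(struct rt_entity_model *e)
{
    if (!e->on_rq)
        return;  /* already dequeued by model_update_curr(): nothing to do */
    e->requeue_count++;
}

/* One scheduler tick for a round-robin entity. */
static void model_tick(struct rt_entity_model *e, int rr_timeslice)
{
    model_update_curr(e);
    if (--e->rr_slice_left > 0)
        return;
    e->rr_slice_left = rr_timeslice;
    model_requeue(e);
}
```

When the runtime budget and the RR quota run out on the same tick, `model_update_curr()` dequeues the entity first, and the guard in `model_requeue()` keeps the tick handler from touching a queue the entity is no longer on.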
     
  • The bandwidth throttle code dequeues a group when it runs out of quota, and
    re-queues it once the period rolls over and the quota gets refreshed.

    Sadly it failed to take the hierarchy into consideration. Share more of the
    enqueue/dequeue code with regular task operations.

    Also, some operations like sched_setscheduler() can dequeue/enqueue tasks that
    are in throttled runqueues; we should not inadvertently re-enqueue empty
    runqueues, so check for that.

    Signed-off-by: Peter Zijlstra
    Tested-by: Daniel K.
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Don't re-set the entity's runqueue to the wrong rq after we've set it
    to the right one.

    Signed-off-by: Peter Zijlstra
    Tested-by: Daniel K.
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • When CONFIG_RT_GROUP_SCHED and CONFIG_CGROUP_SCHED are enabled, with:

    echo 10000 > /proc/sys/kernel/sched_rt_period_us

    We get this:

    BUG: unable to handle kernel NULL pointer dereference at 0000008c
    [ 947.682233] IP: [] __rt_schedulable+0x12/0x160
    [ 947.683123] *pde = 00000000
    [ 947.683782] Oops: 0000 [#1]
    [ 947.684307] Modules linked in:
    [ 947.684308]
    [ 947.684308] Pid: 2359, comm: bash Not tainted (2.6.26-rc6 #8)
    [ 947.684308] EIP: 0060:[] EFLAGS: 00000246 CPU: 0
    [ 947.684308] EIP is at __rt_schedulable+0x12/0x160
    [ 947.684308] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000001
    [ 947.684308] ESI: c0521db4 EDI: 00000001 EBP: c6cc9f00 ESP: c6cc9ed0
    [ 947.684308] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
    [ 947.684308] Process bash (pid: 2359, ti=c6cc8000 task=c7a54f00 task.ti=c6cc8000)
    [ 947.684308] Stack: c0222790 00000000 080f8c08 c0521db4 c6cc9f00 00000001 00000000 00000000
    [ 947.684308] c6cc9f9c 00000000 c0521db4 00000001 c6cc9f28 c0216d40 00000000 00000000
    [ 947.684308] c6cc9f9c 000f4240 000e7ef0 ffffffff c0521db4 c79dfb60 c6cc9f58 c02af2cc
    [ 947.684308] Call Trace:
    [ 947.684308] [] ? do_proc_dointvec_conv+0x0/0x50
    [ 947.684308] [] ? sched_rt_handler+0x80/0x110
    [ 947.684308] [] ? proc_sys_call_handler+0x9c/0xb0
    [ 947.684308] [] ? proc_sys_write+0x1a/0x20
    [ 947.684308] [] ? vfs_write+0x96/0x160
    [ 947.684308] [] ? proc_sys_write+0x0/0x20
    [ 947.684308] [] ? sys_write+0x3d/0x70
    [ 947.684308] [] ? sysenter_past_esp+0x6a/0x91
    [ 947.684308] =======================
    [ 947.684308] Code: 24 04 e8 62 b1 0e 00 89 c7 89 f8 8b 5d f4 8b 75
    f8 8b 7d fc 89 ec 5d c3 90 55 89 e5 57 56 53 83 ec 24 89 45 ec 89 55 e4
    89 4d e8 b8 8c 00 00 00 85 ff 0f 84 c9 00 00 00 8b 57 24 39 55 e8
    8b
    [ 947.684308] EIP: [] __rt_schedulable+0x12/0x160 SS:ESP 0068:c6cc9ed0

    We think the following patch solves the issue.

    Signed-off-by: Dario Faggioli
    Signed-off-by: Michael Trimarchi
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Dario Faggioli
     
  • Commit 62c96b9d0917894c164aa3e474a3ff3bca1554ae ("agp/intel: cleanup
    some serious whitespace badness") didn't just fix whitespace. It also
    lost two lines.

    Noticed by Linus. No more whitespace diffs for me.

    Signed-off-by: Dave Airlie
    Signed-off-by: Linus Torvalds

    Dave Airlie
     
  • * 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
    agp/intel: cleanup some serious whitespace badness
    [AGP] intel_agp: Add support for Intel 4 series chipsets
    [AGP] intel_agp: extra stolen mem size available for IGD_GM chipset
    agp: more boolean conversions.
    drivers/char/agp - use bool
    agp: two-stage page destruction issue
    agp/via: fixup pci ids

    Linus Torvalds
     
  • Signed-off-by: Dave Airlie

    Dave Airlie
     
  • Signed-off-by: Zhenyu Wang
    Signed-off-by: Dave Airlie

    Zhenyu Wang
     
  • This adds the missing stolen memory size detection for IGD_GM, making
    sure we detect the same size as the current X intel driver (2.3.2),
    which already handles this.

    Signed-off-by: Zhenyu Wang
    Signed-off-by: Dave Airlie

    Zhenyu Wang
     
  • Signed-off-by: Dave Airlie

    Dave Airlie
     
  • Use boolean in AGP instead of having own TRUE/FALSE

    --
    Signed-off-by: Joe Perches
    Signed-off-by: Dave Airlie

    Joe Perches
     
  • besides it apparently being useful only in 2.6.24 (the changes in 2.6.25
    really mean that it could be converted back to a single-stage mechanism),
    I'm seeing an issue in Xen Dom0 kernels, which is caused by the calling
    of gart_to_virt() in the second stage invocations of the destroy function.
    I think that besides this being a real issue with Xen (where
    unmap_page_from_agp() is not just a page table attribute change), this
    also is invalid from a theoretical perspective: One should not assume that
    gart_to_virt() is still valid after unmapping a page. So minimally (keeping
    the 2-stage mechanism) a patch like the one below would be needed.

    Jan

    Signed-off-by: Dave Airlie

    Jan Beulich
     
  • Add a new PCI ID and remove an old dodgy one; include the explanation
    in the commented code so nobody re-adds it later.

    (davej also sent the pci id addition).

    Signed-off-by: Dave Airlie

    Greg KH
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
    IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
    RDMA/nes: Fix off-by-one in nes_reg_user_mr() error path

    Linus Torvalds
     
  • Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
    files") changed the way that closed files are handled in the uverbs
    code. However, after the conversion, is_closed flag is checked
    incorrectly in ib_uverbs_async_handler(). As a result, no async
    events are ever passed to applications.

    Found by: Ronni Zimmerman

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Roland Dreier

    Jack Morgenstein
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
    Revert "[WATCHDOG] hpwdt: Fix NMI handling."
    [WATCHDOG] hpwdt: Add CFLAGS to get driver working
    Revert "[WATCHDOG] make watchdog/hpwdt.c:asminline_call() static"

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
    [SCSI] dpt_i2o: Add PROC_IA64 define
    [SCSI] scsi_host regression: fix scsi host leak
    [SCSI] sr: fix corrupt CD data after media change and delay

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
    [POWERPC] Clear sub-page HPTE present bits when demoting page size
    [POWERPC] 4xx: Clear new TLB cache attribute bits in Data Storage vector

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
    udf: restore UDFFS_DEBUG to being undefined by default

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits)
    netlink: genl: fix circular locking
    Revert "mac80211: Use skb_header_cloned() on TX path."
    af_unix: fix 'poll for write'/ connected DGRAM sockets
    tun: Proper handling of IPv6 header in tun driver when TUN_NO_PI is set
    atl1: relax eeprom mac address error check
    net/enc28j60: low power mode
    net/enc28j60: section fix
    sky2: 88E8040T pci device id
    netxen: download firmware in pci probe
    netxen: cleanup debug messages
    netxen: remove global physical_port array
    netxen: fix portnum for hp mezz cards
    ibm_newemac: select CRC32 in Kconfig
    xfrm: fix fragmentation for ipv4 xfrm tunnel
    netfilter: nf_conntrack_h323: fix module unload crash
    netfilter: nf_conntrack_h323: fix memory leak in module initialization error path
    netfilter: nf_nat: fix RCU races
    atm: [he] send idle cells instead of unassigned when in SDH mode
    atm: [he] limit queries to the device's register space
    atm: [br2864] fix routed vcmux support
    ...

    Linus Torvalds
     
  • The old setup works better.

    Signed-off-by: Thomas Mingarelli
    Signed-off-by: Wim Van Sebroeck

    Wim Van Sebroeck
     

18 Jun, 2008

16 commits

  • When we demote a slice from 64k to 4k, and we are about to insert an
    HPTE for a 4k subpage and we notice that there is an existing 64k
    HPTE, we first invalidate that HPTE before inserting the new 4k
    subpage HPTE. Since the bits that encode which hash bucket the old
    HPTE was in overlap with the bits that encode which of the 16 subpages
    have HPTEs, we need to clear out the subpage HPTE-present bits before
    starting to insert HPTEs for the 4k subpages. If we don't do that, we
    can erroneously think that a subpage already has an HPTE when it
    doesn't.

    That in itself wouldn't be such a problem except that when we go to
    update the HPTE that we think is present on machines with a
    hypervisor, the hypervisor can tell us that the HPTE we think is there
    is actually there even though it isn't, which can lead to a process
    getting stuck in a loop, continually faulting. The reason for the
    confusion is that the AVPN (abbreviated virtual page number) we are
    looking for in the HPTE for a 4k subpage can actually match the AVPN
    in a stale HPTE for another 64k page. For example, the HPTE for
    the 4k subpage at 0x84000f000 will be in the same hash bucket and have
    the same AVPN as the HPTE for the 64k page at 0x8400f0000.

    This fixes the code to clear out the subpage HPTE-present bits.

    Signed-off-by: Paul Mackerras

    Paul Mackerras
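
The address example in the entry above can be checked with a little arithmetic. The sketch below assumes a simplified model in which the hash bucket depends only on the segment number and on the page index within a 256 MB segment; the helper names are hypothetical and this is not the kernel's hash function.

```c
#include <stdint.h>

#define SEGMENT_SHIFT 28u  /* assumption: 256 MB segments */

/* Segment number: the address bits above the segment offset. */
static uint64_t segment_number(uint64_t addr)
{
    return addr >> SEGMENT_SHIFT;
}

/* Index of the page within its segment, for a page of 2^page_shift bytes. */
static uint64_t page_index(uint64_t addr, unsigned page_shift)
{
    uint64_t offset = addr & ((UINT64_C(1) << SEGMENT_SHIFT) - 1);
    return offset >> page_shift;
}
```

With these helpers, `page_index(0x84000f000, 12)` (the 4k subpage) and `page_index(0x8400f0000, 16)` (the 64k page) both evaluate to 0xf, and both addresses fall in segment 0x84. Under the stated model, that is why the two HPTEs share a hash bucket and why a stale 64k HPTE can be mistaken for the 4k one.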
     
  • A recent commit added support for the new 440x6 and 464 cores that have the
    added WL1, IL1I, IL1D, IL2I, and IL2D bits for the caching attributes in the
    TLBs. The new bits were cleared in the finish_tlb_load function, however a
    similar bit of code was missed in the DataStorage interrupt vector.

    Signed-off-by: Josh Boyer
    Signed-off-by: Paul Mackerras

    Josh Boyer
     
  • genetlink has a circular locking dependency when dumping the registered
    families:

    - dump start:
    genl_rcv() : take genl_mutex
    genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
    netlink_dump_start(),
    netlink_dump() : take nlk->cb_mutex
    ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
    second time

    - dump continuance:
    netlink_rcv() : call netlink_dump
    netlink_dump : take nlk->cb_mutex
    ctrl_dumpfamily() : take genl_mutex

    Register genl_lock as callback mutex with netlink to fix this. This slightly
    widens an already existing module unload race, the genl ops used during the
    dump might go away when the module is unloaded. Thomas Graf is working on a
    separate fix for this.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • This reverts commit 608961a5eca8d3c6bd07172febc27b5559408c5d.

    The problem is that the mac80211 stack not only needs to be able to
    muck with the link-level headers, it also might need to mangle all of
    the packet data if doing sw wireless encryption.

    This fixes kernel bugzilla #10903. Thanks to Didier Raboud (for the
    bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
    bringing this bisection analysis to my attention), and Ilpo (for
    trying to analyze this purely from the TCP side).

    In 2.6.27 we can take another stab at this, by using something like
    skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
    tx->key. The ESP protocol code in the IPSEC stack can be used as a
    model for implementation.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The unix_dgram_sendmsg routine implements a (somewhat crude)
    form of receiver-imposed flow control by comparing the length of the
    receive queue of the 'peer socket' with the max_ack_backlog value
    stored in the corresponding sock structure, either blocking
    the thread which caused the send-routine to be called or returning
    EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
    sockets. The poll-implementation for these socket types is
    datagram_poll from core/datagram.c. A socket is deemed to be writeable
    by this routine when the memory presently consumed by datagrams
    owned by it is less than the configured socket send buffer size. This
    is always wrong for connected PF_UNIX non-stream sockets when the
    abovementioned receive queue is currently considered to be full.
    'poll' will then return, indicating that the socket is writeable, but
    a subsequent write results in EAGAIN, effectively causing a
    (usual) application to 'poll for writeability by repeated send request
    with O_NONBLOCK set' until it has consumed its time quantum.

    The change below uses a suitably modified variant of the datagram_poll
    routines for both types of PF_UNIX sockets, which tests if the
    recv-queue of the peer a socket is connected to is presently
    considered to be 'full' as part of the 'is this socket
    writeable'-checking code. The socket being polled is additionally
    put onto the peer_wait wait queue associated with its peer, because the
    unix_dgram_sendmsg routine does a wake up on this queue after a
    datagram was received and the 'other wakeup call' is done implicitly
    as part of skb destruction, meaning, a process blocked in poll
    because of a full peer receive queue could otherwise sleep forever
    if no datagram owned by its socket was already sitting on this queue.
    Among this change is a small (inline) helper routine named
    'unix_recvq_full', which consolidates the actual testing code (in three
    different places) into a single location.

    Signed-off-by: Rainer Weikusat
    Signed-off-by: David S. Miller

    Rainer Weikusat
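
The writeability rule described in the entry above can be sketched as follows. This is a user-space model with a hypothetical `sock_model` struct, not the kernel's `struct sock`; the exact comparison used for "full" is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a PF_UNIX datagram socket. */
struct sock_model {
    size_t recv_queue_len;          /* datagrams waiting in the receive queue */
    size_t max_ack_backlog;         /* receiver-imposed limit on that queue */
    size_t wmem_used;               /* memory consumed by datagrams it owns */
    size_t sndbuf;                  /* configured send buffer size */
    const struct sock_model *peer;  /* connected peer, or NULL */
};

/* Single place for the "receive queue is full" test. */
static bool model_recvq_full(const struct sock_model *sk)
{
    return sk->recv_queue_len > sk->max_ack_backlog;
}

/* poll()-style writeability check that also looks at the peer's queue. */
static bool model_writable(const struct sock_model *sk)
{
    if (sk->wmem_used >= sk->sndbuf)
        return false;  /* the old test: send buffer exhausted */
    if (sk->peer != NULL && model_recvq_full(sk->peer))
        return false;  /* the added test: peer cannot take more data */
    return true;
}
```

With only the first test, a connected socket whose peer queue is full would be reported as writeable and the next send would fail with EAGAIN, which is the busy loop the commit message describes.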
     
  • David S. Miller
     
  • By default, tun.c running in TUN_TUN_DEV mode will set the protocol of
    packet to IPv4 if TUN_NO_PI is set. My program failed to work when I
    assumed that the driver will check the first nibble of packet,
    determine IP version and set the appropriate protocol.

    Signed-off-by: Ang Way Chuang
    Acked-by: Max Krasnyansky
    Signed-off-by: David S. Miller

    Ang Way Chuang
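
A minimal sketch of the first-nibble check mentioned in the entry above. The function name and the `PROTO_*` constants are hypothetical (the values are the standard EtherTypes for IPv4 and IPv6); the tun driver's own code may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_IPV4    0x0800u  /* EtherType for IPv4 */
#define PROTO_IPV6    0x86DDu  /* EtherType for IPv6 */
#define PROTO_UNKNOWN 0u       /* sketch-only marker for "not an IP packet" */

/* The IP version lives in the top four bits of the first byte of the header. */
static uint16_t protocol_from_first_nibble(const uint8_t *pkt, size_t len)
{
    if (len == 0)
        return PROTO_UNKNOWN;

    switch (pkt[0] >> 4) {
    case 4:
        return PROTO_IPV4;
    case 6:
        return PROTO_IPV6;
    default:
        return PROTO_UNKNOWN;
    }
}
```

A typical IPv4 header starts with the byte 0x45 (version 4, header length 5 words) and an IPv6 header with 0x60, so the two cases are easy to tell apart without any packet-information prefix.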
     
  • The atl1 driver tries to determine the MAC address thusly:

    - If an EEPROM exists, read the MAC address from EEPROM and
    validate it.
    - If an EEPROM doesn't exist, try to read a MAC address from
    SPI flash.
    - If that fails, try to read a MAC address directly from the
    MAC Station Address register.
    - If that fails, assign a random MAC address provided by the
    kernel.

    We now have a report of a system fitted with an EEPROM containing all
    zeros where we expect the MAC address to be, and we currently handle
    this as an error condition. Turns out, on this system the BIOS writes
    a valid MAC address to the NIC's MAC Station Address register, but we
    never try to read it because we return an error when we find the all-
    zeros address in EEPROM.

    This patch relaxes the error check and continues looking for a MAC
    address even if it finds an illegal one in EEPROM.

    Signed-off-by: Radu Cristescu
    Signed-off-by: Jay Cliburn
    Signed-off-by: Jeff Garzik

    Radu Cristescu
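
The relaxed lookup described in the entry above can be sketched as a fall-through search. The types and names below are hypothetical, and the sketch assumes that every source is simply tried in order; the real driver's handling of the SPI flash step when an EEPROM is present may differ.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Where the chosen address came from. */
enum mac_origin { MAC_FROM_EEPROM, MAC_FROM_SPI, MAC_FROM_REGISTER, MAC_RANDOM };

/* Candidate addresses; a NULL pointer means the source is absent or unreadable. */
struct mac_sources {
    const uint8_t *eeprom;
    const uint8_t *spi_flash;
    const uint8_t *station_reg;
};

/* Valid here means: not all zeros and not a multicast/broadcast address. */
static bool mac_is_valid(const uint8_t *mac)
{
    static const uint8_t zero[MAC_LEN] = { 0 };

    if (mac == NULL || (mac[0] & 0x01))
        return false;
    return memcmp(mac, zero, MAC_LEN) != 0;
}

/* Try each source in turn; an invalid address no longer aborts the search. */
static enum mac_origin pick_mac(const struct mac_sources *s, uint8_t out[MAC_LEN])
{
    const uint8_t *candidates[] = { s->eeprom, s->spi_flash, s->station_reg };
    const enum mac_origin origins[] = {
        MAC_FROM_EEPROM, MAC_FROM_SPI, MAC_FROM_REGISTER
    };

    for (size_t i = 0; i < 3; i++) {
        if (mac_is_valid(candidates[i])) {
            memcpy(out, candidates[i], MAC_LEN);
            return origins[i];
        }
    }
    return MAC_RANDOM;  /* caller generates a random address; `out` is untouched */
}
```

For the reported system, the EEPROM candidate is all zeros and the register holds the address written by the BIOS, so the search ends at `MAC_FROM_REGISTER` instead of failing at the first step.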
     
  • Keep enc28j60 chips in low-power mode when they're not in use.
    At typically 120 mA, these chips run hot even when idle; this
    low power mode cuts that power usage by a factor of around 100.

    This version provides a generic routine to poll a register until
    its masked value equals some value ... e.g. bit set or cleared.
    It's basically what the previous wait_phy_ready() did, but this
    version is generalized to support the handshaking needed to
    enter and exit low power mode.

    Signed-off-by: David Brownell
    Signed-off-by: Claudio Lanconelli
    Signed-off-by: Jeff Garzik

    David Brownell
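
The generic polling routine mentioned in the entry above could look roughly like this. It is a user-space sketch with hypothetical names: the register read is passed in as a callback, and a retry count stands in for the timeout and delay handling a real driver would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback that reads one 8-bit register; `ctx` identifies the device. */
typedef uint8_t (*reg_read_fn)(void *ctx, uint8_t addr);

/* Poll `addr` until (value & mask) == want, giving up after max_tries reads. */
static bool poll_reg_until(reg_read_fn read_reg, void *ctx, uint8_t addr,
                           uint8_t mask, uint8_t want, unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        if ((read_reg(ctx, addr) & mask) == want)
            return true;
        /* A real driver would sleep or delay here between reads. */
    }
    return false;
}

/* Fake device for demonstration: bit 0 reads as set from the third read on. */
struct fake_reg {
    unsigned reads;
};

static uint8_t fake_read(void *ctx, uint8_t addr)
{
    struct fake_reg *r = ctx;

    (void)addr;  /* the fake device has a single register */
    return ++r->reads >= 3 ? 0x01 : 0x00;
}
```

Waiting for a bit to become set uses `want == mask`, and waiting for it to clear uses `want == 0`, so one helper covers both directions of a handshake such as entering and leaving low-power mode.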
     
  • Minor bugfixes to the enc28j60 driver ... wrong section marking,
    indentation, and bogus use of spi_bus_type.

    Signed-off-by: David Brownell
    Acked-by: Claudio Lanconelli
    Signed-off-by: Jeff Garzik

    David Brownell
     
  • Missed one pci id for 88E8040T.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Jeff Garzik

    Stephen Hemminger
     
  • Downloading firmware in pci probe allows recovery in case of
    firmware failure by reloading the driver.

    Also reduced delays in firmware load.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: Jeff Garzik

    Dhananjay Phadke
     
  • o Remove unnecessary debug prints and functions.
    o Explicitly specify pci class (0x020000) to avoid enabling
    management function.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: Jeff Garzik

    Dhananjay Phadke
     
  • Store physical port number in netxen_adapter structure.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: Jeff Garzik

    Dhananjay Phadke
     
  • This fixes the issue where the logical port number is set incorrectly
    for HP blade mezz cards.

    Signed-off-by: Dhananjay Phadke
    Signed-off-by: Jeff Garzik

    Dhananjay Phadke
     
  • The ibm_newemac driver requires ether_crc to be defined. Apparently it is
    possible to generate a .config without CONFIG_CRC32 set which causes the
    following link errors if IBM_NEW_EMAC is selected:

    LD .tmp_vmlinux1
    drivers/built-in.o: In function `emac_hash_mc':
    core.c:(.text+0x2f524): undefined reference to `crc32_le'
    core.c:(.text+0x2f528): undefined reference to `bitrev32'
    make: *** [.tmp_vmlinux1] Error 1

    This patch has IBM_NEW_EMAC select CRC32 so we don't hit this error.

    Signed-off-by: Josh Boyer
    Signed-off-by: Jeff Garzik

    Josh Boyer