09 Jun, 2010

7 commits

  • For people who otherwise get to write: cpu_clock(smp_processor_id()),
    there is now: local_clock().

    Also, as per suggestion from Andrew, provide some documentation on
    the various clock interfaces, and minimize the unsigned long long vs
    u64 mess.

    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Ingo Molnar
     
  • Concurrency managed workqueue needs to know when workers are going to
    sleep and when they wake up. Using these two hooks, cmwq keeps track of the
    current concurrency level and throttles execution of new works if it's
    too high and wakes up another worker from the sleep hook if it becomes
    too low.

    This patch introduces PF_WQ_WORKER to identify workqueue workers and
    adds the following two hooks.

    * wq_worker_waking_up(): called when a worker is woken up.

    * wq_worker_sleeping(): called when a worker is going to sleep and may
    return a pointer to a local task which should be woken up. The
    returned task is woken up using try_to_wake_up_local(), a simplified
    ttwu that is called under the rq lock and can only wake up local
    tasks.

    Both hooks are currently defined as no-ops in kernel/workqueue_sched.h.
    The later cmwq implementation will replace them with proper
    implementations.

    These hooks are hard coded as they'll always be enabled.
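A rough userspace model of the bookkeeping these hooks make possible follows. It is a sketch under stated assumptions, not the cmwq code: `struct model_pool`, its counters, and the target of one running worker per CPU are all hypothetical, and the sleeping hook here returns a flag where the real one returns the task to wake.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU bookkeeping; not the cmwq data structures. */
struct model_pool {
    int nr_running;   /* workers currently runnable on this CPU */
    int nr_pending;   /* queued work items not yet picked up */
    int nr_idle;      /* idle workers that could be woken */
};

/* Hypothetical target: keep one worker running per CPU. */
#define MODEL_TARGET_CONCURRENCY 1

/* Analogue of wq_worker_waking_up(): a worker is runnable again. */
static void model_worker_waking_up(struct model_pool *pool)
{
    pool->nr_running++;
}

/*
 * Analogue of wq_worker_sleeping(): a running worker is about to block.
 * Returns true when an idle worker should be woken, i.e. when concurrency
 * fell to zero while work is still pending. The woken worker is counted
 * as running once it calls model_worker_waking_up().
 */
static bool model_worker_sleeping(struct model_pool *pool)
{
    pool->nr_running--;
    if (pool->nr_running == 0 && pool->nr_pending > 0 && pool->nr_idle > 0) {
        pool->nr_idle--;
        return true;
    }
    return false;
}

/* A running worker checks this before starting another work item. */
static bool model_should_throttle(const struct model_pool *pool)
{
    return pool->nr_running > MODEL_TARGET_CONCURRENCY;
}
```

When the last running worker blocks with work queued, one idle worker is nominated; once both workers are runnable again, concurrency is above target and new work is held back.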

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Ingo Molnar

    Tejun Heo
     
  • Factor ttwu_activate() and ttwu_woken_up() out of try_to_wake_up().
    The factoring out doesn't affect try_to_wake_up() much
    code-generation-wise. Depending on configuration options, it ends up
    generating either the same object code as before or slightly different
    code due to different register assignment.

    This is to help future implementation of try_to_wake_up_local().

    Mike Galbraith suggested renaming ttwu_woken_up() to
    ttwu_post_activation() and updating the comment in try_to_wake_up().

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Ingo Molnar

    Tejun Heo
     
  • Currently, when a cpu goes down, cpu_active is cleared before
    CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
    default priority cpu notifier. When a cpu is coming up, it's set
    before CPU_ONLINE but cpuset configuration again is updated from the
    same cpu notifier.

    For cpu notifiers, this presents an inconsistent state. Threads which
    a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
    migrated to other cpus because the cpu is no longer active.

    Fix it by updating cpu_active in the highest priority cpu notifier and
    cpuset configuration in the second highest when a cpu is coming up.
    Down path is updated similarly. This guarantees that all other cpu
    notifiers see consistent cpu_active and cpuset configuration.

    The cpuset_track_online_cpus() notifier is converted to
    cpuset_update_active_cpus(), which just updates the configuration and
    is now called from the cpuset_cpu_[in]active() notifiers registered
    from sched_init_smp(). If cpuset is disabled, cpuset_update_active_cpus()
    degenerates into partition_sched_domains(), making a separate notifier
    for !CONFIG_CPUSETS unnecessary.

    This problem is triggered by cmwq. During CPU_DOWN_PREPARE, a hotplug
    callback creates a kthread and kthread_bind()s it to the target cpu,
    and the thread is expected to run on that cpu.
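The fix depends on notifier chains invoking callbacks in descending priority order. The sketch below models that ordering in userspace; the chain code, the `model_*` names, and the choice of INT_MAX and INT_MAX - 1 for the "highest" and "second highest" notifiers are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct model_notifier {
    int priority;
    void (*call)(void);
    struct model_notifier *next;
};

/* Keep the chain sorted by descending priority; equal priorities keep
 * registration order (new entry goes after existing ones). */
static void model_register(struct model_notifier **head, struct model_notifier *n)
{
    while (*head && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

static void model_call_chain(struct model_notifier *head)
{
    for (; head; head = head->next)
        head->call();
}

/* Stand-ins for cpu_active and the cpuset configuration. */
static int model_cpu_active;
static int model_cpuset_updated;
static int model_seen_consistent;

static void model_sched_active(void)  { model_cpu_active = 1; }
static void model_cpuset_active(void) { model_cpuset_updated = model_cpu_active; }

/* An ordinary notifier: both updates must already be visible to it. */
static void model_other_notifier(void)
{
    model_seen_consistent = model_cpu_active && model_cpuset_updated;
}
```

Registration order does not matter; the priorities alone guarantee that ordinary notifiers run after both state updates.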

    * Ingo's test discovered __cpuinit/exit markups were incorrect.
    Fixed.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Ingo Molnar
    Cc: Paul Menage

    Tejun Heo
     
  • Instead of hardcoding priorities 10 and 20 in sched and perf, collect
    them into CPU_PRI_* enums.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo

    Tejun Heo
     
  • PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
    typically holds rq->lock around the css rcu derefs but the generic
    cgroup code doesn't (and can't) know about that lock.

    Provide means to add extra checks to the css dereference and use that
    in the scheduler to annotate its users.

    The addition of rq->lock to these checks is correct because the
    cgroup_subsys::attach() method takes the rq->lock for each task it
    moves, therefore by holding that lock, we ensure the task is pinned to
    the current cgroup and the RCU dereference is valid.
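A minimal sketch of a dereference check that accepts an extra, caller-supplied condition, in the spirit of rcu_dereference_check(). The lock flags, the warning counter, and all `model_*` names are hypothetical stand-ins for lockdep state; the point is only that holding rq->lock is accepted as an alternative to being inside an RCU read-side section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for lockdep's view of the current context. */
static bool model_in_rcu_read_section;
static bool model_holds_rq_lock;
static int model_splats;   /* count of "suspicious dereference" reports */

/*
 * Accept the access if we are in an RCU read-side section OR the extra
 * condition c holds; otherwise count a warning. The pointer is returned
 * either way, as a debugging check would do.
 */
#define model_dereference_check(p, c) \
    ((model_in_rcu_read_section || (c)) ? (p) : (model_splats++, (p)))

struct model_group { int id; };
struct model_task { struct model_group *grp; };

/* Scheduler-side accessor: holding rq->lock pins the task's group, so it
 * is passed as the extra condition. */
static struct model_group *model_task_group(struct model_task *t)
{
    return model_dereference_check(t->grp, model_holds_rq_lock);
}
```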

    That leaves one genuine race in __sched_setscheduler() where we used
    task_group() without holding any of the required locks and thus raced
    with the cgroup code. Solve this by moving the check under the
    appropriate lock.

    Signed-off-by: Peter Zijlstra
    Cc: "Paul E. McKenney"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Jun, 2010

6 commits

  • * git://git.infradead.org/~dwmw2/mtd-2.6.35:
    jffs2: update ctime when changing the file's permission by setfacl
    jffs2: Fix NFS race by using insert_inode_locked()
    jffs2: Fix in-core inode leaks on error paths
    mtd: Fix NAND submenu
    mtd/r852: update card detect early.
    mtd/r852: Fixes in case of DMA timeout
    mtd/r852: register IRQ as last step
    drivers/mtd: Use memdup_user
    docbook: make mtd nand module init static

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
    ahci: redo stopping DMA engines on empty ports
    sata_sil24: fix kernel panic on ARM caused by unaligned access in sata_sil24
    ahci: add pci quirk for JMB362
    sata_via: explain the magic fix

    Linus Torvalds
     
  • Commit 96d60303fd (ahci: Turn off DMA engines when there's no device)
    implemented stopping DMA engines on empty ports, but it sampled the
    status registers only once to determine device presence, which led to
    DMA engines being disabled on occupied ports. Do it after all EH
    actions are complete, using the device presence state determined by EH.
    This avoids spurious disabling of DMA engines and simplifies the code.

    Signed-off-by: Tejun Heo
    Tested-by: Marc Dionne
    Cc: Matthew Garrett
    Cc: Robert Hancock
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • The sata_sil24 driver has six 16-bit registers that are initialised with
    32-bit writes. The resulting unaligned accesses cause a kernel panic on
    ARM.

    This patch changes the accesses to the correct 16-bit ones.
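To make the alignment problem concrete, here is a hypothetical model of a register window. The offsets are made up (the text does not list the sata_sil24 register offsets); it only shows that a 32-bit access to a 16-bit register at an offset that is not a multiple of four is unaligned, while a 16-bit access to the same offset is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An access of `width` bytes is naturally aligned when the offset is a
 * multiple of the width. */
static bool model_access_aligned(size_t offset, size_t width)
{
    return offset % width == 0;
}

/* Fake register window. memcpy keeps this sketch itself free of unaligned
 * accesses; the alignment check reports what a real MMIO access would do. */
static uint8_t model_regs[16];

static bool model_write16(size_t offset, uint16_t val)
{
    if (!model_access_aligned(offset, sizeof(val)))
        return false;
    memcpy(&model_regs[offset], &val, sizeof(val));
    return true;
}

static bool model_write32(size_t offset, uint32_t val)
{
    if (!model_access_aligned(offset, sizeof(val)))
        return false;   /* would be an unaligned 32-bit access */
    memcpy(&model_regs[offset], &val, sizeof(val));
    return true;
}
```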

    Signed-off-by: Colin Tuckley
    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Colin Tuckley
     
  • JMB362 is a new variant of the JMicron controller which is similar to
    JMB360 but has two SATA ports instead of one. As there is no PATA
    port, single-function AHCI mode can be used as in JMB360. Add a PCI
    quirk for JMB362.

    Signed-off-by: Tejun Heo
    Reported-by: Aries Lee
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Add Joseph Chan's explanation of the problem and workaround to the
    VT6421 magic fix.

    Signed-off-by: Tejun Heo
    Cc: Joseph Chan
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

07 Jun, 2010

2 commits

  • At the point of the call to dev_err, wm8350 is NULL.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @r exists@
    expression E,E1;
    identifier f;
    statement S1,S2,S3;
    @@

    if ((E == NULL && ...) || ...)
    {
      ... when != if (...) S1 else S2
          when != E = E1
    * E->f
      ... when any
      return ...;
    }
    else S3
    //
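In plain C, the shape the semantic match flags looks like the sketch below. All types and functions are hypothetical, and this is a generic illustration rather than the actual wm8350 fix: the error branch must not dereference the pointer it has just found to be NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical types standing in for a platform device and its parent chip. */
struct model_device { const char *name; };
struct model_chip { struct model_device *dev; };

static struct model_device model_chip_dev = { "chip" };
static struct model_chip model_the_chip = { &model_chip_dev };

/* Hypothetical lookup that returns NULL when the parent chip is missing. */
static struct model_chip *model_get_chip(int present)
{
    return present ? &model_the_chip : NULL;
}

static int model_probe(struct model_device *pdev, int present)
{
    struct model_chip *chip = model_get_chip(present);

    if (chip == NULL) {
        /*
         * Buggy shape flagged by the semantic match:
         *     fprintf(stderr, "%s: no chip\n", chip->dev->name);
         * dereferences chip inside the branch that proved it NULL.
         * Report through a device known to be valid instead.
         */
        fprintf(stderr, "%s: no parent chip\n", pdev->name);
        return -ENODEV;
    }
    return 0;
}
```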

    Signed-off-by: Julia Lawall
    Acked-by: Mark Brown
    Signed-off-by: Wim Van Sebroeck

    Julia Lawall
     
  • This reverts commit 962400e8fd29981a7b166e463dd143b6ac6a3e76, which was
    entirely bogus.

    The code used to multiply the character offset by "vc->vc_cols", and
    that's actually correct, because 'd' itself is an 'unsigned short'. So
    the pointer arithmetic already takes the size of a VGA character into
    account. Changing it to use vc_size_row (which is just "vc_cols"
    shifted up to take the size of the character into account) ends up
    multiplying with the VGA character size twice.
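The arithmetic can be checked with a small model, assuming an 80x25 screen of 2-byte character cells and a row size computed as `cols << 1`, which is how the text describes vc_size_row. Because the buffer is indexed as `unsigned short`, each index step already covers one whole cell, so multiplying by the byte-sized row length scales the offset twice.

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_COLS = 80, MODEL_ROWS = 25 };

/* Row size in bytes, as the text describes vc_size_row: cols shifted up by one. */
static const size_t model_size_row = (size_t)MODEL_COLS << 1;

/* Element index for the start of `row` in an unsigned short screen buffer:
 * multiplying by the column count, as the original code did. */
static size_t model_row_index_correct(size_t row)
{
    return row * MODEL_COLS;
}

/* What the reverted commit effectively did: a byte count used as an
 * element count, so the 2-byte cell size is applied twice. */
static size_t model_row_index_buggy(size_t row)
{
    return row * model_size_row;
}

static int model_index_in_bounds(size_t idx)
{
    return idx < (size_t)MODEL_ROWS * MODEL_COLS;
}
```

For rows in the lower part of the screen, the doubled index lands past the end of the buffer, which matches the writes into "some random other allocation" described below.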

    This got reported as bugs in various other subsystems, because what it
    actually results in is writing the 16-bit vc_video_erase_char pattern
    (usually 0x0720: 0x07 is the default attribute, 0x20 is ASCII space)
    into some random other allocation.

    So Markus ended up reporting this as an ext4 bug, while to Torsten Kaiser
    it looked like a problem with KMS or libata. Jeff Chua saw it in
    different places.

    And finally, Justin Mattock had slab poisoning enabled and saw it as
    overwritten slab poison. He then bisected it and reverted this commit to
    verify that it was the buggy one.

    Reported-by: Markus Trippelsdorf
    Reported-by: Torsten Kaiser
    Reported-by: Jeff Chua
    Reported-by: Justin P. Mattock
    Reported-bisected-and-tested-by: Justin P. Mattock
    Acked-by: Dave Airlie
    Cc: Frank Pan
    Cc: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Jun, 2010

4 commits

  • jffs2 didn't update the ctime of the file when its permission was changed.

    Steps to reproduce:
    # touch aaa
    # stat -c %Z aaa
    1275289822
    # setfacl -m 'u::x,g::x,o::x' aaa
    # stat -c %Z aaa
    1275289822   (ctime unchanged)

    Signed-off-by: Jan Kara
    Signed-off-by: David Woodhouse

    Jan Kara
     
  • Linus Torvalds
     
  • Cursors need to be in the GTT domain when being accessed by the GPU.
    Previously this was a fortuitous byproduct of userspace using pwrite()
    to upload the image data into the cursor. The redundant clflush was
    removed in commit 9b8c4a and so the image was no longer being flushed
    out of the caches into main memory. One could also devise a scenario
    where the cursor was rendered by the GPU, prior to being attached as the
    cursor, resulting in similar corruption due to the missing MI_FLUSH.

    Fixes:

    Bug 28335 - Cursor corruption caused by commit 9b8c4a0b21
    https://bugs.freedesktop.org/show_bug.cgi?id=28335

    Signed-off-by: Chris Wilson
    Reported-and-tested-by: Jeff Chua
    Tested-by: Linus Torvalds
    Reported-by: Andy Isaacson
    Signed-off-by: Linus Torvalds

    Chris Wilson
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags
    ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files

    Linus Torvalds
     

05 Jun, 2010

21 commits

  • A few functions were still modifying i_flags in a racy manner.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: improve xfs_isilocked
    xfs: skip writeback from reclaim context
    xfs: remove done roadmap item from xfs-delayed-logging-design.txt
    xfs: fix race in inode cluster freeing failing to stale inodes
    xfs: fix access to upper inodes without inode64
    xfs: fix might_sleep() warning when initialising per-ag tree
    fs/xfs/quota: Add missing mutex_unlock
    xfs: remove duplicated #include
    xfs: convert more trace events to DEFINE_EVENT
    xfs: xfs_trace.c: remove duplicated #include
    xfs: Check new inode size is OK before preallocating
    xfs: clean up xlog_align
    xfs: cleanup log reservation calculactions
    xfs: be more explicit if RT mount fails due to config
    xfs: replace E2BIG with EFBIG where appropriate

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
    X25: remove duplicated #include
    tcp: use correct net ns in cookie_v4_check()
    rps: tcp: fix rps_sock_flow_table table updates
    ppp_generic: fix multilink fragment sizes
    syncookies: remove Kconfig text line about disabled-by-default
    ixgbe: only check pfc bits in hang logic if pfc is enabled
    net: check for refcount if pop a stacked dst_entry
    ixgbe: return IXGBE_ERR_RAR_INDEX when out of range
    act_pedit: access skb->data safely
    sfc: Store port number in net_device::dev_id
    epic100: Test __BIG_ENDIAN instead of (non-existent) CONFIG_BIG_ENDIAN
    tehuti: return -EFAULT on copy_to_user errors
    isdn/kcapi: return -EFAULT on copy_from_user errors
    e1000e: change logical negate to bitwise
    sfc: Get port number from CS_PORT_NUM, not PCI function number
    cls_u32: use skb_header_pointer() to dereference data safely
    TCP: tcp_hybla: Fix integer overflow in slow start increment
    act_nat: fix the wrong checksum when addr isn't in old_addr/mask
    net/fec: fix pm to survive to suspend/resume
    korina: count RX DMA OVR as rx_fifo_error
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
    nilfs2: remove obsolete declarations of cache constructor and destructor
    nilfs2: fix style issue in nilfs_destroy_cachep

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    Minix: Clean up left over label
    fix truncate inode time modification breakage
    fix setattr error handling in sysfs, configfs
    fcntl: return -EFAULT if copy_to_user fails
    wrong type for 'magic' argument in simple_fill_super()
    fix the deadlock in qib_fs
    mqueue doesn't need make_bad_inode()

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    module: fix bne2 "gave up waiting for init of module libcrc32c"
    module: verify_export_symbols under the lock
    module: move find_module check to end
    module: make locking more fine-grained.
    module: Make module sysfs functions private.
    module: move sysfs exposure to end of load_module
    module: fix kdb's illicit use of struct module_use.
    module: Make the 'usage' lists be two-way

    Linus Torvalds
     
  • Problem: it's hard to avoid an init routine stumbling over a
    request_module these days. And it's not clear it's always a bad idea:
    for example, a module like kvm with dynamic dependencies on kvm-intel
    or kvm-amd would be neater if it could simply request_module the right
    one.

    In this particular case, it's libcrc32c:

    libcrc32c_mod_init
    crypto_alloc_shash
    crypto_alloc_tfm
    crypto_find_alg
    crypto_alg_mod_lookup
    crypto_larval_lookup
    request_module

    If another module is waiting inside resolve_symbol() for libcrc32c to
    finish initializing (i.e. bne2 depends on libcrc32c) then it does so
    holding the module lock, and our request_module() can't make progress
    until that is released.

    Waiting inside resolve_symbol() without the lock isn't all that hard:
    we just need to pass the -EBUSY up the call chain so we can sleep
    where we don't hold the lock. Error reporting is a bit trickier: we
    need to copy the name of the unfinished module before releasing the
    lock.
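A userspace sketch of that pattern with a pthread mutex standing in for the module lock. The `model_*` names and the simulated "sleep" callbacks are hypothetical; the sketch shows the lookup returning -EBUSY with the module name copied while the lock is held, and the caller waiting only after dropping it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t model_module_mutex = PTHREAD_MUTEX_INITIALIZER;

struct model_module {
    char name[32];
    bool live;   /* true once its init routine has finished */
};

/*
 * Called with model_module_mutex held, so it must not sleep. If the owner
 * is still initializing, copy its name for error reporting (it may go away
 * once the lock is dropped) and pass -EBUSY up to the caller.
 */
static int model_resolve_symbol(const struct model_module *owner,
                                char *busy_name, size_t len)
{
    if (!owner->live) {
        snprintf(busy_name, len, "%s", owner->name);
        return -EBUSY;
    }
    return 0;
}

/* Hypothetical: the owner's init completes while we sleep. It needs the
 * lock, so it must never be called while the lock is held. */
static void model_owner_finishes_init(struct model_module *owner)
{
    pthread_mutex_lock(&model_module_mutex);
    owner->live = true;
    pthread_mutex_unlock(&model_module_mutex);
}

/* Hypothetical: the owner never finishes. */
static void model_owner_stays_busy(struct model_module *owner)
{
    (void)owner;
}

static int model_load(struct model_module *owner,
                      void (*sleep_fn)(struct model_module *), int max_tries)
{
    char busy_name[32] = "";

    for (int i = 0; i < max_tries; i++) {
        pthread_mutex_lock(&model_module_mutex);
        int err = model_resolve_symbol(owner, busy_name, sizeof(busy_name));
        pthread_mutex_unlock(&model_module_mutex);
        if (err != -EBUSY)
            return err;
        sleep_fn(owner);   /* sleep where we don't hold the lock */
    }
    fprintf(stderr, "gave up waiting for init of module %s\n", busy_name);
    return -EBUSY;
}
```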

    Other notes:
    1) This also fixes a theoretical issue where a weak dependency would allow
    symbol version mismatches to be ignored.
    2) We rename use_module to ref_module to make life easier for the only
    external user (the out-of-tree ksplice patches).

    Signed-off-by: Rusty Russell
    Cc: Linus Torvalds
    Cc: Tim Abbot
    Tested-by: Brandon Philips

    Rusty Russell
     
  • It disabled preemption so it was "safe", but now that we don't hold the
    lock the whole time, nothing stops another module slipping in before
    this module is added to the global list.

    So we check this just after we check for duplicate modules, and just
    before we put the module in the global list.

    (find_symbol finds symbols in coming and going modules, too).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • I think Rusty may have made the lock a bit _too_ fine-grained there, and
    didn't add it to some places that needed it. It looks, for example, like
    PATCH 1/2 actually drops the lock in places where it's needed
    ("find_module()" is documented to need it, but load_module() no longer
    held it at all when it did the find_module()).

    Rather than adding a new "module_loading" list, I think we should be able
    to just use the existing "modules" list, and just fix up the locking a
    bit.

    In fact, maybe we could just move the "look up existing module" a bit
    later - optimistically assuming that the module doesn't exist, and then
    just undoing the work if it turns out that we were wrong, just before
    adding ourselves to the list.
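A sketch of the optimistic approach Linus describes, using a pthread mutex and a singly linked list as hypothetical stand-ins for the module lock and the "modules" list: the setup work happens without the lock, and the duplicate check and the insertion happen under one hold of the lock, with the work undone if a module of the same name already exists.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct model_mod {
    char name[32];
    struct model_mod *next;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_mod *model_modules;   /* protected by model_lock */

/* Caller must hold model_lock (mirrors find_module()'s locking rule). */
static struct model_mod *model_find(const char *name)
{
    for (struct model_mod *m = model_modules; m; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m;
    return NULL;
}

static int model_load_module(const char *name)
{
    /* Setup done optimistically, assuming no duplicate exists. */
    struct model_mod *mod = calloc(1, sizeof(*mod));
    if (!mod)
        return -ENOMEM;
    strncpy(mod->name, name, sizeof(mod->name) - 1);

    pthread_mutex_lock(&model_lock);
    if (model_find(name)) {
        pthread_mutex_unlock(&model_lock);
        free(mod);                 /* we were wrong: undo the work */
        return -EEXIST;
    }
    mod->next = model_modules;     /* check and insert under one lock hold */
    model_modules = mod;
    pthread_mutex_unlock(&model_lock);
    return 0;
}
```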

    Signed-off-by: Rusty Russell

    Linus Torvalds
     
  • Kay Sievers reports that we still have some contention over module
    loading, which is slowing boot.

    Linus also disliked a previous "drop lock and regrab" patch to fix the
    bne2 "gave up waiting for init of module libcrc32c" message.

    This is more ambitious: we only grab the lock where we need it.

    Signed-off-by: Rusty Russell
    Cc: Brandon Philips
    Cc: Kay Sievers
    Cc: Linus Torvalds

    Rusty Russell
     
  • These were placed in the header in ef665c1a06 to get the various
    SYSFS/MODULE config combinations to compile.

    That may have been necessary then, but it's not now. These functions
    are all local to module.c.

    Signed-off-by: Rusty Russell
    Cc: Randy Dunlap

    Rusty Russell
     
  • This means a little extra work, but is more logical: we don't put
    anything in sysfs until we're about to put the module into the
    global list and parse its parameters.

    This also gives us a logical place to put duplicate module detection
    in the next patch.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Linus changed the structure, and luckily this didn't compile any more.

    Reported-by: Stephen Rothwell
    Signed-off-by: Rusty Russell
    Cc: Jason Wessel
    Cc: Martin Hicks

    Rusty Russell
     
  • When adding a module that depends on another one, we used to create a
    one-way list of "modules_which_use_me", so that module unloading could
    see who needs a module.

    It's actually quite simple to make that list go both ways: so that we
    not only can see "who uses me", but also see a list of modules that are
    "used by me".

    In fact, we always wanted that list in "module_unload_free()": when we
    unload a module, we want to also release all the other modules that are
    used by that module. But because we didn't have that list, we used to
    first iterate over all modules, and then iterate over each "used by me"
    list of that module.

    By making the list two-way, we simplify module_unload_free(), and it
    allows for some trivial fixes later too.
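A sketch of the two-way bookkeeping. Each use record is linked into two lists: the used module's "who uses me" list and the using module's "used by me" list. The list helpers, field names, and `model_*` names below are hypothetical, not the kernel's module code; the sketch only shows that with both lists, unloading a module walks its own "used by me" list instead of iterating over all modules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list (hypothetical helpers). */
struct model_list { struct model_list *prev, *next; };

static void model_list_init(struct model_list *h) { h->prev = h->next = h; }

static void model_list_add(struct model_list *n, struct model_list *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void model_list_del(struct model_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static int model_list_empty(const struct model_list *h) { return h->next == h; }

#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct model_module {
    struct model_list source_list;   /* "who uses me" */
    struct model_list target_list;   /* "used by me" */
};

/* One edge "source uses target", linked into both modules' lists. */
struct model_use {
    struct model_list source_list;   /* node on target->source_list */
    struct model_list target_list;   /* node on source->target_list */
    struct model_module *source, *target;
};

static void model_module_init(struct model_module *m)
{
    model_list_init(&m->source_list);
    model_list_init(&m->target_list);
}

/* Record that a uses b. */
static int model_ref_module(struct model_module *a, struct model_module *b)
{
    struct model_use *use = malloc(sizeof(*use));
    if (!use)
        return -1;
    use->source = a;
    use->target = b;
    model_list_add(&use->source_list, &b->source_list);
    model_list_add(&use->target_list, &a->target_list);
    return 0;
}

/* On unload, walk only this module's own "used by me" list. */
static void model_unload_free(struct model_module *mod)
{
    struct model_list *pos = mod->target_list.next;

    while (pos != &mod->target_list) {
        struct model_list *next = pos->next;
        struct model_use *use =
            model_container_of(pos, struct model_use, target_list);

        model_list_del(&use->source_list);   /* leave target's "who uses me" */
        model_list_del(&use->target_list);
        free(use);
        pos = next;
    }
}
```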

    Signed-off-by: Linus Torvalds
    Signed-off-by: Rusty Russell (cleaned & rebased)

    Linus Torvalds
     
  • Remove duplicated #includes in drivers/net/wan/x25_asy.c.

    Signed-off-by: Huang Weiyi
    Signed-off-by: David S. Miller

    Huang Weiyi
     
  • It's better to do the route lookup in the appropriate namespace.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • I believe a moderate SYN flood attack can corrupt RFS flow table
    (rps_sock_flow_table), making RPS/RFS much less effective.

    Even in a normal situation, a server handling short-lived sessions
    suffers from bad steering for the first data packet of a session if
    another SYN packet is received for another session.

    We do the following in tcp_v4_rcv():

    sock_rps_save_rxhash(sk, skb->rxhash);

    We should _not_ do this if sk is a LISTEN socket, as almost every
    packet received on a LISTEN socket has a different rxhash than the
    previous one.
    -> RPS_NO_CPU markers are spread all over rps_sock_flow_table.

    Also, it makes sense to protect changes to the sk->rxhash field with
    the socket lock (we can currently change it even while a user thread
    owns the lock and might be using rxhash).

    This patch moves sock_rps_save_rxhash() into a section where the socket
    is locked, and calls it only for non-LISTEN sockets.
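A small model of how a LISTEN socket can spray RPS_NO_CPU markers. It assumes the save helper invalidates the table slot of the previous hash whenever the hash changes; the table size, the `MODEL_NO_CPU` value, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TABLE_SIZE 8        /* power of two, hypothetical */
#define MODEL_NO_CPU     0xffffu  /* stand-in for RPS_NO_CPU */

static uint16_t model_flow_table[MODEL_TABLE_SIZE];

struct model_sock {
    bool listening;
    uint32_t rxhash;
};

static void model_table_fill(uint16_t cpu)
{
    for (int i = 0; i < MODEL_TABLE_SIZE; i++)
        model_flow_table[i] = cpu;
}

static int model_count_no_cpu(void)
{
    int n = 0;
    for (int i = 0; i < MODEL_TABLE_SIZE; i++)
        n += model_flow_table[i] == MODEL_NO_CPU;
    return n;
}

/* Assumed save behaviour: a changed hash invalidates the old slot. */
static void model_save_rxhash(struct model_sock *sk, uint32_t rxhash)
{
    if (sk->rxhash != rxhash) {
        model_flow_table[sk->rxhash & (MODEL_TABLE_SIZE - 1)] = MODEL_NO_CPU;
        sk->rxhash = rxhash;
    }
}

/* Fixed caller: never record hashes for a LISTEN socket. */
static void model_save_rxhash_fixed(struct model_sock *sk, uint32_t rxhash)
{
    if (!sk->listening)
        model_save_rxhash(sk, rxhash);
}
```

With one SYN per new session, each hash change on the listener wipes a slot that established flows were steering by; skipping LISTEN sockets leaves the table intact.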

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Fix bug in multilink fragment size calculation introduced by
    commit 9c705260feea6ae329bc6b6d5f6d2ef0227eda0a
    "ppp: ppp_mp_explode() redesign"

    Signed-off-by: Ben McKeegan
    Signed-off-by: David S. Miller

    Ben McKeegan
     
  • Syncookies have defaulted to on since commit
    e994b7c901ded7200b525a707c6da71f2cf6d4bb
    (tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL).

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Only check PFC bits in the hang logic if PFC is enabled. Previously,
    if DCB was enabled but PFC was disabled, the wrong pause bits would be
    checked.

    Signed-off-by: John Fastabend
    Acked-by: Don Skidmore
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    John Fastabend
     
  • xfrm triggers a warning if dst_pop() drops a refcount
    on a noref dst. This patch changes dst_pop() to
    skb_dst_pop(). skb_dst_pop() drops the refcnt only
    on a refcounted dst. Also we don't clone the child
    dst_entry, so it is not refcounted and we can use
    skb_dst_set_noref() in xfrm_output_one().
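A sketch of the "drop the refcount only on a refcounted dst" rule, assuming a pointer whose low bit marks a borrowed (noref) entry. The types and `model_*` helpers are hypothetical stand-ins for skb_dst_set(), skb_dst_set_noref(), skb_dst_drop() and skb_dst_pop(), not their real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Low pointer bit used as a "no reference held" flag (assumes the struct
 * is at least 2-byte aligned, so bit 0 of a real pointer is always 0). */
#define MODEL_DST_NOREF ((uintptr_t)1)

struct model_dst {
    int refcnt;
    struct model_dst *child;
};

struct model_skb {
    uintptr_t refdst;   /* dst pointer, possibly tagged with MODEL_DST_NOREF */
};

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
    return (struct model_dst *)(skb->refdst & ~MODEL_DST_NOREF);
}

/* Caller hands over a reference it already holds. */
static void model_skb_dst_set(struct model_skb *skb, struct model_dst *dst)
{
    skb->refdst = (uintptr_t)dst;
}

/* Borrowed pointer: no reference is held, so none may be dropped later. */
static void model_skb_dst_set_noref(struct model_skb *skb, struct model_dst *dst)
{
    skb->refdst = (uintptr_t)dst | MODEL_DST_NOREF;
}

static void model_skb_dst_drop(struct model_skb *skb)
{
    if (skb->refdst && !(skb->refdst & MODEL_DST_NOREF))
        model_skb_dst(skb)->refcnt--;   /* stand-in for dst_release() */
    skb->refdst = 0;
}

/* Return the child without taking a reference on it, then drop the parent. */
static struct model_dst *model_skb_dst_pop(struct model_skb *skb)
{
    struct model_dst *child = model_skb_dst(skb)->child;

    model_skb_dst_drop(skb);
    return child;
}
```

Because the popped child carries no new reference, attaching it with the noref setter keeps the counts balanced.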

    Signed-off-by: Steffen Klassert
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Steffen Klassert