06 Oct, 2010

3 commits

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh
    generic-ipi: Fix deadlock in __smp_call_function_single

    Linus Torvalds
     
  • The "flags" member of "struct wait_queue_t" is used in several places in
    the kernel code without beeing initialized by init_wait(). "flags" is
    used in bitwise operations.

    If "flags" not initialized then unexpected behaviour may take place.
    Incorrect flags might used later in code.

    Added initialization of "wait_queue_t.flags" with zero value into
    "init_wait".

    Signed-off-by: Evgeny Kuznetsov
    [ The bit we care about does end up being initialized by both
    prepare_to_wait() and add_to_wait_queue(), so this doesn't seem to
    cause actual bugs, but is definitely the right thing to do -Linus ]
    Signed-off-by: Linus Torvalds

    Evgeny Kuznetsov
     
  • With all the recent module loading cleanups, we've minimized the code
    that sits under module_mutex, fixing various deadlocks and making it
    possible to do most of the module loading in parallel.

    However, that whole conversion totally missed the rather obscure code
    that adds a new module to the list for BUG() handling. That code was
    doubly obscure because (a) the code itself lives in lib/bugs.c (for
    dubious reasons) and (b) it gets called from the architecture-specific
    "module_finalize()" rather than from generic code.

    Calling it from arch-specific code makes no sense what-so-ever to begin
    with, and is now actively wrong since that code isn't protected by the
    module loading lock any more.

    So this commit moves the "module_bug_{finalize,cleanup}()" calls away
    from the arch-specific code, and into the generic code - and in the
    process protects it with the module_mutex so that the list operations
    are now safe.

    Future fixups:
    - move the module list handling code into kernel/module.c where it
    belongs.
    - get rid of 'module_bug_list' and just use the regular list of modules
    (called 'modules' - imagine that) that we already create and maintain
    for other reasons.

    Reported-and-tested-by: Thomas Gleixner
    Cc: Rusty Russell
    Cc: Adrian Bunk
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

02 Oct, 2010

1 commit


01 Oct, 2010

1 commit

  • Avoid TLB flush IPIs for the cores in deeper c-states by voluntary leave_mm()
    before entering into that state. CPUs tend to flush TLB in those c-states
    anyways.

    acpi_idle does this with C3-type states, but it was not caried over
    when intel_idle was introduced. intel_idle can apply it
    to C-states in addition to those that ACPI might export as C3...

    Signed-off-by: Suresh Siddha
    Signed-off-by: Len Brown

    Suresh Siddha
     

30 Sep, 2010

1 commit


29 Sep, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
    tcp: Fix >4GB writes on 64-bit.
    net/9p: Mount only matching virtio channels
    de2104x: fix ethtool
    tproxy: check for transparent flag in ip_route_newports
    ipv6: add IPv6 to neighbour table overflow warning
    tcp: fix TSO FACK loss marking in tcp_mark_head_lost
    3c59x: fix regression from patch "Add ethtool WOL support"
    ipv6: add a missing unregister_pernet_subsys call
    s390: use free_netdev(netdev) instead of kfree()
    sgiseeq: use free_netdev(netdev) instead of kfree()
    rionet: use free_netdev(netdev) instead of kfree()
    ibm_newemac: use free_netdev(netdev) instead of kfree()
    smsc911x: Add MODULE_ALIAS()
    net: reset skb queue mapping when rx'ing over tunnel
    br2684: fix scheduling while atomic
    de2104x: fix TP link detection
    de2104x: fix power management
    de2104x: disable autonegotiation on broken hardware
    net: fix a lockdep splat
    e1000e: 82579 do not gate auto config of PHY by hardware during nominal use
    ...

    Linus Torvalds
     

28 Sep, 2010

2 commits

  • Fixes kernel bugzilla #16603

    tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write
    zero bytes, for example.

    There is also the problem higher up of how verify_iovec() works. It
    wants to prevent the total length from looking like an error return
    value.

    However it does this using 'int', but syscalls return 'long' (and
    thus signed 64-bit on 64-bit machines). So it could trigger
    false-positives on 64-bit as written. So fix it to use 'long'.

    Reported-by: Olaf Bonorden
    Reported-by: Daniel Büse
    Reported-by: Andrew Morton
    Signed-off-by: David S. Miller

    David S. Miller
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/amd-iommu: Fix rounding-bug in __unmap_single
    x86/amd-iommu: Work around S3 BIOS bug
    x86/amd-iommu: Set iommu configuration flags in enable-loop
    x86, setup: Fix earlyprintk=serial,0x3f8,115200
    x86, setup: Fix earlyprintk=serial,ttyS0,115200

    Linus Torvalds
     

24 Sep, 2010

1 commit


23 Sep, 2010

5 commits

  • rcu_dereference_bh() doesnt know yet about hard irq being disabled, so
    lockdep can trigger in netpoll_rx() after commit f0f9deae9e7c4 (netpoll:
    Disable IRQ around RCU dereference in netpoll_rx)

    Reported-by: Miles Lane
    Signed-off-by: Eric Dumazet
    Tested-by: Miles Lane
    Signed-off-by: Paul E. McKenney

    Eric Dumazet
     
  • This patch adds a workaround for an IOMMU BIOS problem to
    the AMD IOMMU driver. The result of the bug is that the
    IOMMU does not execute commands anymore when the system
    comes out of the S3 state resulting in system failure. The
    bug in the BIOS is that is does not restore certain hardware
    specific registers correctly. This workaround reads out the
    contents of these registers at boot time and restores them
    on resume from S3. The workaround is limited to the specific
    IOMMU chipset where this problem occurs.

    Cc: stable@kernel.org
    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • This fixes the regression caused by the commit 6fee48cd330c68
    ("dma-mapping: arm: use generic pci_set_dma_mask and
    pci_set_consistent_dma_mask").

    ARM needs to clip the dma coherent mask for dmabounce devices. This
    restores the old trick.

    Note that strictly speaking, the DMA API doesn't allow architectures to do
    such but I'm not sure it's worth adding the new API to set the dma mask
    that allows architectures to clip it.

    Reported-by: Krzysztof Halasa
    Signed-off-by: FUJITA Tomonori
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     
  • Add a missing inline keyword for static function in linux/dmaengine.h to
    avoid duplicate symbol definitions.

    Signed-off-by: Mathieu Lacage
    Signed-off-by: Dan Williams

    Mathieu Lacage
     
  • This patch reduces namespace pollution by moving the "struct net" declaration
    out of the userspace-facing portion of linux/netlink.h. It has no impact on
    the kernel.

    (This came up because we have several C++ applications which use "net" as a
    namespace name.)

    Signed-off-by: Ollie Wild
    Signed-off-by: David S. Miller

    Ollie Wild
     

22 Sep, 2010

1 commit

  • The lock structs are currently protected by the BKL, but are accessed by
    code in fs/locks.c and misc file system and DLM code. These stubs will
    allow all users to switch to the new interface before the implementation
    is changed to a spinlock.

    Acked-by: Arnd Bergmann
    Signed-off-by: Sage Weil
    Signed-off-by: Linus Torvalds

    Sage Weil
     

20 Sep, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
    dca: disable dca on IOAT ver.3.0 multiple-IOH platforms
    netpoll: Disable IRQ around RCU dereference in netpoll_rx
    sctp: Do not reset the packet during sctp_packet_config().
    net/llc: storing negative error codes in unsigned short
    MAINTAINERS: move atlx discussions to netdev
    drivers/net/cxgb3/cxgb3_main.c: prevent reading uninitialized stack memory
    drivers/net/eql.c: prevent reading uninitialized stack memory
    drivers/net/usb/hso.c: prevent reading uninitialized memory
    xfrm: dont assume rcu_read_lock in xfrm_output_one()
    r8169: Handle rxfifo errors on 8168 chips
    3c59x: Remove atomic context inside vortex_{set|get}_wol
    tcp: Prevent overzealous packetization by SWS logic.
    net: RPS needs to depend upon USE_GENERIC_SMP_HELPERS
    phylib: fix PAL state machine restart on resume
    net: use rcu_barrier() in rollback_registered_many
    bonding: correctly process non-linear skbs
    ipv4: enable getsockopt() for IP_NODEFRAG
    ipv4: force_igmp_version ignored when a IGMPv3 query received
    ppp: potential NULL dereference in ppp_mp_explode()
    net/llc: make opt unsigned in llc_ui_setsockopt()
    ...

    Linus Torvalds
     

18 Sep, 2010

1 commit

  • We cannot use rcu_dereference_bh safely in netpoll_rx as we may
    be called with IRQs disabled. We could however simply disable
    IRQs as that too causes BH to be disabled and is safe in either
    case.

    Thanks to John Linville for discovering this bug and providing
    a patch.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

17 Sep, 2010

1 commit


15 Sep, 2010

3 commits

  • * ssh://master.kernel.org/home/hpa/tree/sec:
    x86-64, compat: Retruncate rax after ia32 syscall entry tracing
    x86-64, compat: Test %rax for the syscall number, not %eax
    compat: Make compat_alloc_user_space() incorporate the access_ok()

    Linus Torvalds
     
  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies
    statfs() gives ESTALE error
    NFS: Fix a typo in nfs_sockaddr_match_ipaddr6
    sunrpc: increase MAX_HASHTABLE_BITS to 14
    gss:spkm3 miss returning error to caller when import security context
    gss:krb5 miss returning error to caller when import security context
    Remove incorrect do_vfs_lock message
    SUNRPC: cleanup state-machine ordering
    SUNRPC: Fix a race in rpc_info_open
    SUNRPC: Fix race corrupting rpc upcall
    Fix null dereference in call_allocate

    Linus Torvalds
     
  • compat_alloc_user_space() expects the caller to independently call
    access_ok() to verify the returned area. A missing call could
    introduce problems on some architectures.

    This patch incorporates the access_ok() check into
    compat_alloc_user_space() and also adds a sanity check on the length.
    The existing compat_alloc_user_space() implementations are renamed
    arch_compat_alloc_user_space() and are used as part of the
    implementation of the new global function.

    This patch assumes NULL will cause __get_user()/__put_user() to either
    fail or access userspace on all architectures. This should be
    followed by checking the return value of compat_access_user_space()
    for NULL in the callers, at which time the access_ok() in the callers
    can also be removed.

    Reported-by: Ben Hawkes
    Signed-off-by: H. Peter Anvin
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Chris Metcalf
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Tony Luck
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: James Bottomley
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc:

    H. Peter Anvin
     

14 Sep, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    dquot: do full inode dirty in allocating space

    Linus Torvalds
     
  • * 'next-spi' of git://git.secretlab.ca/git/linux-2.6:
    spi/pl022: move probe call to subsys_initcall()
    powerpc/5200: mpc52xx_uart.c: Add of_node_put to avoid memory leak
    spi/pl022: fix APB pclk power regression on U300
    spi/spi_s3c64xx: Warn if PIO transfers time out
    spi/s3c64xx: Fix incorrect reuse of 'val' local variable.
    spi/s3c64xx: Fix compilation warning
    spi/dw_spi: clean the cs_control code
    spi/dw_spi: Allow interrupt sharing
    spi/spi_s3c64xx: Increase dead reckoning time in wait_for_xfer()
    spi/spi_s3c64xx: Move to subsys_initcall()
    spi: free children in spi_unregister_master, not siblings
    gpiolib: Add 'struct gpio_chip' forward declaration for !GPIOLIB case
    of: Fix missing includes - ll_temac
    spi/spi_s3c64xx: Staticise non-exported functions
    spi/spi_s3c64xx: Make probe more robust against missing board config

    Linus Torvalds
     

13 Sep, 2010

2 commits

  • Update copyright notice and add Documentation/workqueue.txt.

    Randy Dunlap, Dave Chinner: misc fixes.

    Signed-off-by: Tejun Heo
    Reviewed-By: Florian Mickler
    Cc: Ingo Molnar
    Cc: Christoph Lameter
    Cc: Randy Dunlap
    Cc: Dave Chinner

    Tejun Heo
     
  • There is a race between rpc_info_open and rpc_release_client()
    in that nothing stops a process from opening the file after
    the clnt->cl_kref goes to zero.

    Fix this by using atomic_inc_unless_zero()...

    Reported-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     

10 Sep, 2010

14 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: Range check cpu in blk_cpu_to_group
    scatterlist: prevent invalid free when alloc fails
    writeback: Fix lost wake-up shutting down writeback thread
    writeback: do not lose wakeup events when forking bdi threads
    cciss: fix reporting of max queue depth since init
    block: switch s390 tape_block and mg_disk to elevator_change()
    block: add function call to switch the IO scheduler from a driver
    fs/bio-integrity.c: return -ENOMEM on kmalloc failure
    bio-integrity.c: remove dependency on __GFP_NOFAIL
    BLOCK: fix bio.bi_rw handling
    block: put dev->kobj in blk_register_queue fail path
    cciss: handle allocation failure
    cfq-iosched: Documentation help for new tunables
    cfq-iosched: blktrace print per slice sector stats
    cfq-iosched: Implement tunable group_idle
    cfq-iosched: Do group share accounting in IOPS when slice_idle=0
    cfq-iosched: Do not idle if slice_idle=0
    cciss: disable doorbell reset on reset_devices
    blkio: Fix return code for mkdir calls

    Linus Torvalds
     
  • David S. Miller
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
    libata-sff: Reenable Port Multiplier after libata-sff remodeling.
    libata: skip EH autopsy and recovery during suspend
    ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs
    ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs
    libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()
    ahci: fix hang on failed softreset
    pata_artop: Fix device ID parity check

    Linus Torvalds
     
  • Keep track of the link on the which the current request is in progress.
    It allows support of links behind port multiplier.

    Not all libata-sff is PMP compliant. Code for native BMDMA controller
    does not take in accound PMP.

    Tested on Marvell 7042 and Sil7526.

    Signed-off-by: Gwendal Grignou
    Signed-off-by: Jeff Garzik

    Gwendal Grignou
     
  • For some mysterious reason, certain hardware reacts badly to usual EH
    actions while the system is going for suspend. As the devices won't
    be needed until the system is resumed, ask EH to skip usual autopsy
    and recovery and proceed directly to suspend.

    Signed-off-by: Tejun Heo
    Tested-by: Stephan Diestelhorst
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • …low and kswapd is awake

    Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
    cheaper than scanning a number of lists. To avoid synchronization
    overhead, counter deltas are maintained on a per-cpu basis and drained
    both periodically and when the delta is above a threshold. On large CPU
    systems, the difference between the estimated and real value of
    NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than
    number of real free page in buddy, the VM can allocate pages below min
    watermark, at worst reducing the real number of pages to zero. Even if
    the OOM killer kills some victim for freeing memory, it may not free
    memory if the exit path requires a new page resulting in livelock.

    This patch introduces a zone_page_state_snapshot() function (courtesy of
    Christoph) that takes a slightly more accurate view of an arbitrary vmstat
    counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid
    the watermark being accidentally broken. The estimate is not perfect and
    may result in cache line bounces but is expected to be lighter than the
    IPI calls necessary to continually drain the per-cpu counters while kswapd
    is awake.

    Signed-off-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Christoph Lameter
     
  • Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs
    show a shift since I last tested in December: in part because of firmware
    updates, in part because of the necessary move from barriers to awaiting
    completion at the block layer. While discard at swapon still shows as
    slightly beneficial on both, discarding 1MB swap cluster when allocating
    is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV).

    Surrender: discard as presently implemented is more hindrance than help
    for swap; but might prove useful on other devices, or with improvements.
    So continue to do the discard at swapon, but make discard while swapping
    conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using
    only the lower 16 bits of int flags).

    We can add a --discard or -d to swapon(8), and a "discard" to swap in
    /etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Nigel Cunningham
    Cc: Tejun Heo
    Cc: Jens Axboe
    Cc: James Bottomley
    Cc: "Martin K. Petersen"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf
    "hibernation: freeze swap at hibernation". It complicated matters by
    adding a second swap allocation path, just for hibernation; without in any
    way fixing the issue that it was intended to address - page reclaim after
    fixing the hibernation image might free swap from a page already imaged as
    swapcache, letting its swap be reallocated to store a different page of
    the image: resulting in data corruption if the imaged page were freed as
    clean then swapped back in. Pages freed to si->swap_map were still in
    danger of being reallocated by the alternative allocation path.

    I guess it inadvertently fixed slow SSD swap allocation for hibernation,
    as reported by Nigel Cunningham: by missing out the discards that occur on
    the usual swap allocation path; but that was unintentional, and needs a
    separate fix.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: "Rafael J. Wysocki"
    Cc: Ondrej Zary
    Cc: Andrea Gelmini
    Cc: Balbir Singh
    Cc: Andrea Arcangeli
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Replace the arbitrary software-reset call from the device-probe
    method, because:

    - It is defective. To work correctly, it should be two byte writes,
    not a single word write. As it stands, it does nothing.

    - Some devices with sx150x expanders installed have their NRESET pins
    ganged on the same line, so resetting one causes the others to reset -
    not a nice thing to do arbitrarily!

    - The probe, usually taking place at boot, implies a recent hard-reset,
    so a software reset at this point is just a waste of energy anyway.

    Therefore, make it optional, defaulting to off, as this will match the
    common case of probing at powerup and also matches the current broken
    no-op behavior.

    Signed-off-by: Gregory Bean
    Reviewed-by: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gregory Bean
     
  • The pte_same check is reliable only if the swap entry remains pinned (by
    the page lock on swapcache). We've also to ensure the swapcache isn't
    removed before we take the lock as try_to_free_swap won't care about the
    page pin.

    One of the possible impacts of this patch is that a KSM-shared page can
    point to the anon_vma of another process, which could exit before the page
    is freed.

    This can leave a page with a pointer to a recycled anon_vma object, or
    worse, a pointer to something that is no longer an anon_vma.

    [riel@redhat.com: changelog help]
    Signed-off-by: Andrea Arcangeli
    Acked-by: Hugh Dickins
    Reviewed-by: Rik van Riel
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Add cgroup_attach_task_all()

    The existing cgroup_attach_task_current_cg() API is called by a thread to
    attach another thread to all of its cgroups; this is unsuitable for cases
    where a privileged task wants to attach itself to the cgroups of a less
    privileged one, since the call must be made from the context of the target
    task.

    This patch adds a more generic cgroup_attach_task_all() API that allows
    both the source task and to-be-moved task to be specified.
    cgroup_attach_task_current_cg() becomes a specialization of the more
    generic new function.

    [menage@google.com: rewrote changelog]
    [akpm@linux-foundation.org: address reviewer comments]
    Signed-off-by: Michael S. Tsirkin
    Tested-by: Alex Williamson
    Acked-by: Paul Menage
    Cc: Li Zefan
    Cc: Ben Blum
    Cc: Sridhar Samudrala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael S. Tsirkin
     
  • Some macro parameter references inside typeof() operator are not enclosed
    with parenthesis. It should be safer to add them.

    Signed-off-by: Huang Ying
    Acked-by: Stefani Seibold
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • The introduction of support for SD combo cards breaks the initialization
    of all CSR SDIO chips. The GO_IDLE (CMD0) in mmc_sd_get_cid() causes CSR
    chips to be reset (this is non-standard behavior).

    When initializing an SDIO card check for a combo card by using the memory
    present bit in the R4 response to IO_SEND_OP_COND (CMD5). This avoids the
    call to mmc_sd_get_cid() on an SDIO-only card.

    Signed-off-by: David Vrabel
    Acked-by: Michal Mirolaw
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Vrabel
     
  • lg_lock_global() currently only acquires spinlocks for online CPUs, but
    it's meant to lock all possible CPUs. Lglock-protected resources may be
    associated with removed CPUs - and, indeed, that could happen with the
    per-superblock open files lists.

    At Nick's suggestion, change for_each_online_cpu() to
    for_each_possible_cpu() to protect accesses to those resources.

    Cc: Al Viro
    Acked-by: Nick Piggin
    Signed-off-by: Jonathan Corbet
    Signed-off-by: Linus Torvalds

    Jonathan Corbet