02 Aug, 2012

17 commits

  • Pull UML fixes from Richard Weinberger:
    "This patch set contains mostly fixes and cleanups. The UML tty driver
    now uses tty_port and is no longer broken like hell :-)"

    * 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Add arch/x86/um to MAINTAINERS
    um: pass siginfo to guest process
    um: fix ubd_file_size for read-only files
    um: pull interrupt_end() into userspace()
    um: split syscall_trace(), pass pt_regs to it
    um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs
    um: set BLK_CGROUP=y in defconfig
    um: remove count_lock
    um: fully use tty_port
    um: Remove dead code
    um: remove line_ioctl()
    TTY: um/line, use tty from tty_port
    TTY: um/line, add tty_port

    Linus Torvalds
     
  • Pull ARM DMA engine updates from Russell King:
    "This looks scary at first glance, but what it is is:
    - a rework of the sa11x0 DMA engine driver merged during the previous
    cycle, to extract a common set of helper functions for DMA engine
    implementations.
    - conversion of amba-pl08x.c to use these helper functions.
    - addition of OMAP DMA engine driver (using these helper functions),
    and conversion of some of the OMAP DMA users to use DMA engine.

    Nothing in the helper functions is ARM specific, so I hope that other
    implementations can consolidate some of their code by making use of
    these helpers.

    This has been sitting in linux-next most of the merge cycle, and has
    been tested by several OMAP folk. I've tested it on sa11x0 platforms,
    and given it my best shot on my broken platforms which have the
    amba-pl08x controller.

    The last point is the addition to feature-removal-schedule.txt, which
    will have a merge conflict. Between myself and TI, we're planning to
    remove the old TI DMA implementation next year."

    Fix up trivial add/add conflicts in Documentation/feature-removal-schedule.txt
    and drivers/dma/{Kconfig,Makefile}

    * 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm: (53 commits)
    ARM: 7481/1: OMAP2+: omap2plus_defconfig: enable OMAP DMA engine
    ARM: 7464/1: mmc: omap_hsmmc: ensure probe returns error if DMA channel request fails
    Add feature removal of old OMAP private DMA implementation
    mtd: omap2: remove private DMA API implementation
    mtd: omap2: add DMA engine support
    spi: omap2-mcspi: remove private DMA API implementation
    spi: omap2-mcspi: add DMA engine support
    ARM: omap: remove mmc platform data dma_mask and initialization
    mmc: omap: remove private DMA API implementation
    mmc: omap: add DMA engine support
    mmc: omap_hsmmc: remove private DMA API implementation
    mmc: omap_hsmmc: add DMA engine support
    dmaengine: omap: add support for cyclic DMA
    dmaengine: omap: add support for setting fi
    dmaengine: omap: add support for returning residue in tx_state method
    dmaengine: add OMAP DMA engine driver
    dmaengine: sa11x0-dma: add cyclic DMA support
    dmaengine: sa11x0-dma: fix DMA residue support
    dmaengine: PL08x: ensure all descriptors are freed when channel is released
    dmaengine: PL08x: get rid of write only pool_ctr and free_txd locking
    ...

    Linus Torvalds
     
  • Pull ARM audit/signal updates from Russell King:
    "ARM audit/signal handling updates from Al and Will. This improves on
    the work Viro did last merge window, and sorts out some of the issues
    found with that work."

    * 'audit' of git://git.linaro.org/people/rmk/linux-arm:
    ARM: 7475/1: sys_trace: allow all syscall arguments to be updated via ptrace
    ARM: 7474/1: get rid of TIF_SYSCALL_RESTARTSYS
    ARM: 7473/1: deal with handlerless restarts without leaving the kernel
    ARM: 7472/1: pull all work_pending logics into C function
    ARM: 7471/1: Revert "7442/1: Revert "remove unused restart trampoline""
    ARM: 7470/1: Revert "7443/1: Revert "new way of handling ERESTART_RESTARTBLOCK""

    Linus Torvalds
     
  • Pull ARM fixes from Russell King:
    "This fixes various issues found during July"

    * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
    ARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches
    ARM: Fix undefined instruction exception handling
    ARM: 7480/1: only call smp_send_stop() on SMP
    ARM: 7478/1: errata: extend workaround for erratum #720789
    ARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP
    ARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend
    ARM: 7468/1: ftrace: Trace function entry before updating index
    ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+
    ARM: 7466/1: disable interrupt before spinning endlessly
    ARM: 7465/1: Handle >4GB memory sizes in device tree and mem=size@start option

    Linus Torvalds
     
  • Signed-off-by: Richard Weinberger

    Richard Weinberger
     
  • UML guest processes now get correct siginfo_t for SIGTRAP, SIGFPE,
    SIGILL and SIGBUS. Specifically, si_addr and si_code are now correct
    where previously they were si_addr = NULL and si_code = 128.

    Signed-off-by: Martin Pärtel
    Signed-off-by: Richard Weinberger

    Martin Pärtel
     
  • Made ubd_file_size not request write access. Fixes use of read-only images.

    Signed-off-by: Martin Pärtel
    Signed-off-by: Richard Weinberger

    Martin Pärtel
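
The idea in the entry above, querying a file's size without asking for write access, can be illustrated in userspace. This is only a sketch: file_size_ro() is a hypothetical helper, not UML's ubd_file_size(), and it assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not UML's ubd_file_size()): report a file's size
 * through a read-only descriptor, so read-only images can be sized too. */
static int file_size_ro(const char *path, off_t *size_out)
{
    struct stat st;
    int fd;

    assert(path != NULL && size_out != NULL);

    /* O_RDONLY is enough for fstat(); asking for O_RDWR would fail on a
     * file the caller may only read. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    close(fd);
    *size_out = st.st_size;
    return 0;
}
```

A caller passes the image path and checks the return value: 0 means *size_out holds the size in bytes, -1 means the file could not be opened or inspected.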
     
  • Signed-off-by: Al Viro
    Signed-off-by: Richard Weinberger

    Al Viro
     
  • Signed-off-by: Al Viro
    [richard@nod.at: Fixed some minor build issues]
    Signed-off-by: Richard Weinberger

    Al Viro
     
  • Signed-off-by: Al Viro
    Signed-off-by: Richard Weinberger

    Al Viro
     
  • Pull fbdev updates from Florian Tobias Schandinat:
    - large updates for OMAP
      - support for LCD3 overlay manager (omap5)
      - omapdss output cleanup
      - removal of passive matrix LCD support as there are no drivers for
        such panels for DSS or DSS2 and nobody complained (cleanup)
    - large updates for SH Mobile
      - overlay support
      - separating MERAM (cache) from framebuffer driver
    - some updates for Exynos and da8xx-fb
    - various other small patches

    * tag 'fbdev-updates-for-3.6' of git://github.com/schandinat/linux-2.6: (78 commits)
    da8xx-fb: fix compile issue due to missing include
    fbdev: Make pixel_to_pat() failure mode more friendly
    da8xx-fb: do not turn ON LCD backlight unless LCDC is enabled
    fbdev: sh_mobile_lcdc: Fix vertical panning step
    video: exynos mipi dsi: Fix mipi dsi regulators handling issue
    video: da8xx-fb: do clock reset of revision 2 LCDC before enabling
    arm: da850: configure LCDC fifo threshold
    video: da8xx-fb: configure FIFO threshold to reduce underflow errors
    video: da8xx-fb: fix flicker due to 1 frame delay in updated frame
    video: da8xx-fb rev2: fix disabling of palette completion interrupt
    da8xx-fb: add missing FB_BLANK operations
    video: exynos_dp: use usleep_range instead of delay
    video: exynos_dp: check the only INTERLANE_ALIGN_DONE bit during Link Training
    fb: epson1355fb: Fix section mismatch
    video: exynos_dp: fix wrong DPCD address during Link Training
    video/smscufx: fix line counting in fb_write
    aty128fb: Fix coding style issues
    fbdev: sh_mobile_lcdc: Fix pan offset computation in YUV mode
    fbdev: sh_mobile_lcdc: Fix overlay registers update during pan operation
    fbdev: sh_mobile_lcdc: Support horizontal panning
    ...

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "A collection of small fixes that have been found recently. Most of
    the commits are regression fixes in HD-audio and some other random
    drivers."

    * tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: snd-usb: fix clock source validity index
    ALSA: hda - Fix mute-LED GPIO initialization for IDT codecs
    ALSA: hda - Add descriptions for missing IDT 92HD83x models
    ALSA: hda - Fix polarity of mute LED on HP Mini 210
    ALSA: es1688 - freeup resources on init failure
    ALSA: hda - Workaround for silent output on VAIO Z with ALC889
    ALSA: hda - Fix WARNING from HDMI/DP parser
    ALSA: hda - Detach from converter at closing in patch_hdmi.c
    ALSA: hda - Fix mute-LED GPIO setup for HP Mini 210
    ALSA: mpu401: Fix missing initialization of irq field
    ALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs

    Linus Torvalds
     
  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     
  • Pull block driver changes from Jens Axboe:

    - Making the plugging support for drivers a bit more sane from Neil.
    This supersedes the plugging change from Shaohua as well.

    - The usual round of drbd updates.

    - Using a tail add instead of a head add in the request completion for
    nbd, making us find the most completed request more quickly.

    - A few floppy changes, getting rid of a duplicated flag and also
    running the floppy init async (since it takes forever in boot terms)
    from Andi.

    * 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
    floppy: remove duplicated flag FD_RAW_NEED_DISK
    blk: pass from_schedule to non-request unplug functions.
    block: stack unplug
    blk: centralize non-request unplug handling.
    md: remove plug_cnt feature of plugging.
    block/nbd: micro-optimization in nbd request completion
    drbd: announce FLUSH/FUA capability to upper layers
    drbd: fix max_bio_size to be unsigned
    drbd: flush drbd work queue before invalidate/invalidate remote
    drbd: fix potential access after free
    drbd: call local-io-error handler early
    drbd: do not reset rs_pending_cnt too early
    drbd: reset congestion information before reporting it in /proc/drbd
    drbd: report congestion if we are waiting for some userland callback
    drbd: differentiate between normal and forced detach
    drbd: cleanup, remove two unused global flags
    floppy: Run floppy initialization asynchronous

    Linus Torvalds
     
  • Pull core block IO bits from Jens Axboe:
    "The most complicated part of this is the request allocation rework by
    Tejun, which has been queued up for a long time and has been in
    for-next ditto as well.

    There are a few commits from yesterday and today, mostly trivial and
    obvious fixes. So I'm pretty confident that it is sound. It's also
    smaller than usual."

    * 'for-3.6/core' of git://git.kernel.dk/linux-block:
    block: remove dead func declaration
    block: add partition resize function to blkpg ioctl
    block: uninitialized ioc->nr_tasks triggers WARN_ON
    block: do not artificially constrain max_sectors for stacking drivers
    blkcg: implement per-blkg request allocation
    block: prepare for multiple request_lists
    block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
    blkcg: inline bio_blkcg() and friends
    block: allocate io_context upfront
    block: refactor get_request[_wait]()
    block: drop custom queue draining used by scsi_transport_{iscsi|fc}
    mempool: add @gfp_mask to mempool_create_node()
    blkcg: make root blkcg allocation use %GFP_KERNEL
    blkcg: __blkg_lookup_create() doesn't need radix preload

    Linus Torvalds
     
  • Pull md updates from NeilBrown.

    * 'for-next' of git://neil.brown.name/md:
    DM RAID: Add support for MD RAID10
    md/RAID1: Add missing case for attempting to repair known bad blocks.
    md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
    md/raid1: don't abort a resync on the first badblock.
    md: remove duplicated test on ->openers when calling do_md_stop()
    raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
    md/raid1: prevent merging too large request
    md/raid1: read balance chooses idlest disk for SSD
    md/raid1: make sequential read detection per disk based
    MD RAID10: Export md_raid10_congested
    MD: Move macros from raid1*.h to raid1*.c
    MD RAID1: rename mirror_info structure
    MD RAID10: rename mirror_info structure
    MD RAID10: Fix compiler warning.
    raid5: add a per-stripe lock
    raid5: remove unnecessary bitmap write optimization
    raid5: lockless access raid5 overrided bi_phys_segments
    raid5: reduce chance release_stripe() taking device_lock

    Linus Torvalds
     
  • In commit 3b6e2723f32d ("locks: prevent side-effects of
    locks_release_private before file_lock is initialized") we removed the
    last user of lm_release_private without removing the field itself.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     

01 Aug, 2012

23 commits

  • * set_fs(KERNEL_DS) + getname() is probably the weirdest implementation
    of strdup() I've seen. Especially since they don't need to copy it at all...
    * filp_open() never returns NULL; it's ERR_PTR(-E...) on failure.
    * file->f_dentry is never going to be NULL, TYVM.
    * match_strdup() + snprintf() + kfree() is a bloody weird way to spell
    match_strlcpy().

    Pox on cargo-cult programmers...

    Signed-off-by: Al Viro

    Al Viro
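
The point about filp_open() in the entry above rests on the kernel's error-pointer convention. The following is a minimal userspace sketch of it: the three helpers are simplified re-implementations of ERR_PTR()/IS_ERR()/PTR_ERR(), and open_backing_file() and try_open() are hypothetical names standing in for a filp_open()-style function and its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace versions of the kernel helpers. Assumption: the
 * top MAX_ERRNO addresses are never valid pointers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a filp_open()-style function: on failure it
 * returns an encoded errno, never NULL. */
static int dummy_file;

static void *open_backing_file(const char *path)
{
    assert(path != NULL);
    if (path[0] == '\0')
        return ERR_PTR(-ENOENT);
    return &dummy_file;
}

/* Correct caller: test with IS_ERR(); a NULL check would never fire. */
static long try_open(const char *path)
{
    void *f = open_backing_file(path);

    if (IS_ERR(f))
        return PTR_ERR(f);
    return 0;
}
```

With these definitions, try_open("") yields -ENOENT while the returned pointer is non-NULL, which is why an `if (!file)` test after such a call is dead code.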
     
  • Support the MD RAID10 personality through dm-raid.c

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • Pull in pre-requisites for adding raid10 support to dm-raid.

    NeilBrown
     
  • The __generic_unplug_device() function was removed by commit
    7eaceaccab5f40bbfda044629a6298616aeaed50, which forgot to remove its
    declaration at the same time. Remove it here.

    Signed-off-by: Yuanhan Liu
    Signed-off-by: Jens Axboe

    Yuanhan Liu
     
  • Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
    allows altering the size of an existing partition, even if it is currently
    in use.

    This patch converts hd_struct->nr_sects to use a sequence counter, because
    one might extend a partition while IO is happening to it, and the update of
    nr_sects can be non-atomic on 32-bit machines with a 64-bit sector_t. This
    can lead to issues like reading an inconsistent partition size. A sequence
    counter is used so that readers don't have to take the bdev mutex lock,
    as we call sector_in_part() very frequently.

    Now all access to hd_struct->nr_sects should happen through the sequence
    counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
    There is one exception though: set_capacity()/get_capacity(). I think
    theoretically the race exists there too, but this patch does not
    modify set_capacity()/get_capacity() due to the sheer number of call sites,
    and I am afraid that the change might break something. I have left that as a
    TODO item. We can handle it later if need be. This patch does not introduce
    any new races as such w.r.t. set_capacity()/get_capacity().

    v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Phillip Susi
    Signed-off-by: Jens Axboe

    Vivek Goyal
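
The sequence-counter scheme described in the entry above can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions, not the kernel code: struct part_size, part_size_write() and part_size_read() are hypothetical names, the 64-bit count is deliberately stored as two 32-bit halves to mimic a 32-bit machine, and writers are assumed to be serialized by some outer lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of a partition size guarded by a sequence
 * counter; this is not the kernel's hd_struct or seqcount_t. */
struct part_size {
    atomic_uint seq;         /* even: stable, odd: write in progress */
    _Atomic uint32_t lo, hi; /* 64-bit sector count stored as two halves */
};

/* Writer: callers are assumed to be serialized by an outer lock. */
static void part_size_write(struct part_size *p, uint64_t nr_sects)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    assert((s & 1u) == 0); /* no other write may be in progress */
    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->lo, (uint32_t)nr_sects, memory_order_relaxed);
    atomic_store_explicit(&p->hi, (uint32_t)(nr_sects >> 32),
                          memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Reader: takes no lock; retries if a write overlapped the read. */
static uint64_t part_size_read(struct part_size *p)
{
    unsigned s1, s2;
    uint32_t lo, hi;

    do {
        s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
        lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
        hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);

    return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer; it only repeats its two loads when the counter was odd or changed underneath it, which is the property that makes the scheme cheap for a frequently called check.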
     
  • Hi,

    I'm using the old-fashioned 'dump' backup tool, and I noticed that it spews the
    below warning as of 3.5-rc1 and later (3.4 is fine):

    [ 10.886893] ------------[ cut here ]------------
    [ 10.886904] WARNING: at include/linux/iocontext.h:140 copy_process+0x1488/0x1560()
    [ 10.886905] Hardware name: Bochs
    [ 10.886906] Modules linked in:
    [ 10.886908] Pid: 2430, comm: dump Not tainted 3.5.0-rc7+ #27
    [ 10.886908] Call Trace:
    [ 10.886911] [] warn_slowpath_common+0x7a/0xb0
    [ 10.886912] [] warn_slowpath_null+0x15/0x20
    [ 10.886913] [] copy_process+0x1488/0x1560
    [ 10.886914] [] do_fork+0xb4/0x340
    [ 10.886918] [] ? recalc_sigpending+0x1a/0x50
    [ 10.886919] [] ? __set_task_blocked+0x32/0x80
    [ 10.886920] [] ? __set_current_blocked+0x3a/0x60
    [ 10.886923] [] sys_clone+0x23/0x30
    [ 10.886925] [] stub_clone+0x13/0x20
    [ 10.886927] [] ? system_call_fastpath+0x16/0x1b
    [ 10.886928] ---[ end trace 32a14af7ee6a590b ]---

    Reproducing is easy, I can hit it on a KVM system with a very basic
    config (x86_64 make defconfig + enable the drivers needed). To hit it,
    just install dump (on debian/ubuntu, not sure what the package might be
    called on Fedora), and:

    dump -o -f /tmp/foo /

    You'll see the warning in dmesg once it forks off the I/O process and
    starts dumping filesystem contents.

    I bisected it down to the following commit:

    commit f6e8d01bee036460e03bd4f6a79d014f98ba712e
    Author: Tejun Heo
    Date: Mon Mar 5 13:15:26 2012 -0800

    block: add io_context->active_ref

    Currently ioc->nr_tasks is used to decide two things - whether an ioc
    is done issuing IOs and whether it's shared by multiple tasks. This
    patch separates out the first into ioc->active_ref, which is acquired
    and released using {get|put}_io_context_active() respectively.

    This will be used to associate bio's with a given task. This patch
    doesn't introduce any visible behavior change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    It seems like the init of ioc->nr_tasks was removed in that patch,
    so it starts out at 0 instead of 1.

    Tejun, is the right thing here to add back the init, or should something else
    be done?

    The below patch removes the warning, but I haven't done any more extensive
    testing on it.

    Signed-off-by: Olof Johansson
    Acked-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Olof Johansson
     
  • blk_set_stacking_limits is intended to allow stacking drivers to build
    up the limits of the stacked device based on the underlying devices'
    limits. But defaulting 'max_sectors' to BLK_DEF_MAX_SECTORS (1024)
    doesn't allow the stacking driver to inherit a max_sectors larger than
    1024 -- due to blk_stack_limits' use of min_not_zero.

    It is now clear that this artificial limit is getting in the way so
    change blk_set_stacking_limits's max_sectors to UINT_MAX (which allows
    stacking drivers like dm-multipath to inherit 'max_sectors' from the
    underlying paths).

    Reported-by: Vijay Chauhan
    Tested-by: Vijay Chauhan
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
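
The effect described in the entry above follows directly from how a "minimum, ignoring zero" fold behaves. A small sketch, assuming nothing beyond the commit message: min_not_zero_u() is a function-form stand-in for the min_not_zero rule, and stack_max_sectors() is a hypothetical model of a stacking driver folding in its underlying devices' limits.

```c
#include <assert.h>
#include <limits.h>

/* Function form of the "smaller of two values, ignoring zero" rule that
 * the commit message attributes to min_not_zero. */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/* Hypothetical model: start from an initial stacking limit and fold in
 * the max_sectors of each underlying device. */
static unsigned int stack_max_sectors(unsigned int initial,
                                      const unsigned int *dev, int n)
{
    unsigned int limit = initial;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        limit = min_not_zero_u(limit, dev[i]);
    return limit;
}
```

Starting from 1024 with underlying paths of 2048 and 4096 sectors, the fold can never rise above 1024; starting from UINT_MAX, it settles on 2048, the smallest underlying limit.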
     
  • uac_clock_source_is_valid() uses the control selector value to access
    the bmControls bitmap of the clock source unit. This is wrong, as
    control selector values start from 1, while the bitmap uses all
    available bits.

    In other words, "Clock Validity Control" is stored in D3..2, not D5..4
    of the clock source unit's bmControls.

    Signed-off-by: Daniel Mack
    Reported-by: Andreas Koch
    Cc: stable@kernel.org
    Signed-off-by: Takashi Iwai

    Daniel Mack
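
The off-by-one in the entry above can be shown with plain bit arithmetic. This is a sketch, not the driver code: control_bits() and the two wrappers are hypothetical helpers, and the selector value 2 for the clock validity control is an assumption consistent with the D3..2 versus D5..4 statement in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 1-based control selector value for the clock validity control. */
#define CLOCK_VALID_SELECTOR 2

/* Hypothetical helper: return the 2-bit field at zero-based pair index
 * pos of an 8-bit bmControls bitmap (pos 0 is D1..0, pos 1 is D3..2). */
static unsigned int control_bits(uint8_t bm_controls, unsigned int pos)
{
    assert(pos < 4);
    return (bm_controls >> (pos * 2)) & 0x3u;
}

/* Buggy lookup: uses the 1-based selector directly, so it reads D5..4. */
static unsigned int clock_valid_bits_buggy(uint8_t bm_controls)
{
    return control_bits(bm_controls, CLOCK_VALID_SELECTOR);
}

/* Fixed lookup: converts the selector to a zero-based index, reading D3..2. */
static unsigned int clock_valid_bits_fixed(uint8_t bm_controls)
{
    return control_bits(bm_controls, CLOCK_VALID_SELECTOR - 1);
}
```

For a bitmap of 0x04, where only D2 is set, the fixed lookup sees the control while the buggy one reports nothing, because it is looking one pair too high.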
     
  • Pull irqdomain changes from Grant Likely:
    "Round of refactoring and enhancements to irq_domain infrastructure.
    This series starts the process of simplifying irqdomain. The ultimate
    goal is to merge LEGACY, LINEAR and TREE mappings into a single
    system, but had to back off from that after some last minute bugs.
    Instead it mainly reorganizes the code and ensures that the reverse
    map gets populated when the irq is mapped instead of the first time it
    is looked up.

    Merging of the irq_domain types is deferred to v3.7

    In other news, this series adds helpers for creating static mappings
    on a linear or tree mapping."

    * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
    irqdomain: Improve diagnostics when a domain mapping fails
    irqdomain: eliminate slow-path revmap lookups
    irqdomain: Fix irq_create_direct_mapping() to test irq_domain type.
    irqdomain: Eliminate dedicated radix lookup functions
    irqdomain: Support for static IRQ mapping and association.
    irqdomain: Always update revmap when setting up a virq
    irqdomain: Split disassociating code into separate function
    irq_domain: correct a minor wrong comment for linear revmap
    irq_domain: Standardise legacy/linear domain selection
    irqdomain: Make ops->map hook optional
    irqdomain: Remove unnecessary test for IRQ_DOMAIN_MAP_LEGACY
    irqdomain: Simple NUMA awareness.
    devicetree: add helper inline for retrieving a node's full name

    Linus Torvalds
     
  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton: (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: fix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • Pull VFIO core from Alex Williamson:
    "This series includes the VFIO userspace driver interface for the 3.6
    kernel merge window. This driver is intended to provide a secure
    interface for device access using IOMMU protection for applications
    like assignment of physical devices to virtual machines.

    Qemu will be the first user of this interface, enabling assignment of
    PCI devices to Qemu guests. This interface is intended to eventually
    replace the x86-specific assignment mechanism currently available in
    KVM.

    This interface has the advantage of being more secure, by working with
    IOMMU groups to ensure device isolation and providing its own
    filtered resource access mechanism, and also more flexible, in not
    being x86 or KVM specific (extensions to enable POWER are already
    working).

    This driver is originally the work of Tom Lyon, but has since been
    handed over to me and gone through a complete overhaul thanks to the
    input from David Gibson, Ben Herrenschmidt, Chris Wright, Joerg
    Roedel, and others. This driver has been available in linux-next for
    the last month."

    Paul Mackerras says:
    "I would be glad to see it go in since we want to use it with KVM on
    PowerPC. If possible we'd like the PowerPC bits for it to go in as
    well."

    * tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio:
    vfio: Add PCI device driver
    vfio: Type1 IOMMU implementation
    vfio: Add documentation
    vfio: VFIO core

    Linus Torvalds
     
  • Pull random subsystem patches from Ted Ts'o:
    "This patch series contains a major revamp of how we collect entropy
    from interrupts for /dev/random and /dev/urandom.

    The goal is to address weaknesses discussed in the paper "Mining
    your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
    by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
    which will be published in the Proceedings of the 21st Usenix Security
    Symposium, August 2012. (See https://factorable.net for more
    information and an extended version of the paper.)"

    Fix up trivial conflicts due to nearby changes in
    drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
    random: mix in architectural randomness in extract_buf()
    dmi: Feed DMI table to /dev/random driver
    random: Add comment to random_initialize()
    random: final removal of IRQF_SAMPLE_RANDOM
    um: remove IRQF_SAMPLE_RANDOM which is now a no-op
    sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
    board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
    isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
    uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
    drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
    xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
    n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
    i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
    input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
    mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
    ...

    Linus Torvalds
     
  • Pull final RDMA changes from Roland Dreier:
    - Fix IPoIB to stop using unsafe linkage between networking neighbour
    layer and private path database.
    - Small fixes for bugs found by Fengguang Wu's automated builds.

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
    IPoIB: Use a private hash table for path lookup in xmit path
    IB/qib: Fix size of cc_supported_table_entries
    RDMA/ucma: Convert open-coded equivalent to memdup_user()
    RDMA/ocrdma: Fix check of GSI CQs
    RDMA/cma: Use PTR_RET rather than if (IS_ERR(...)) + PTR_ERR

    Linus Torvalds
     
  • Pull second set of media updates from Mauro Carvalho Chehab:

    - radio API: add support to work with radio frequency bands

    - new AM/FM radio drivers: radio-shark, radio-shark2

    - new Remote Controller USB driver: iguanair

    - conversion of several drivers to the v4l2 core control framework

    - new board additions to existing drivers

    - the remaining (and vast majority of the patches) are due to
    drivers/DocBook fixes/cleanups.

    * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (154 commits)
    [media] radio-tea5777: use library for 64bits div
    [media] tlg2300: Declare MODULE_FIRMWARE usage
    [media] lgs8gxx: Declare MODULE_FIRMWARE usage
    [media] xc5000: Add MODULE_FIRMWARE statements
    [media] s2255drv: Add MODULE_FIRMWARE statement
    [media] dib8000: move dereference after check for NULL
    [media] Documentation: Update cardlists
    [media] bttv: add support for Aposonic W-DVR
    [media] cx25821: Remove bad strcpy to read-only char*
    [media] pms.c: remove duplicated include
    [media] smiapp-core.c: remove duplicated include
    [media] via-camera: pass correct format settings to sensor
    [media] rtl2832.c: minor cleanup
    [media] Add support for the IguanaWorks USB IR Transceiver
    [media] Minor cleanups for MCE USB
    [media] drivers/media/dvb/siano/smscoreapi.c: use list_for_each_entry
    [media] Use a named union in struct v4l2_ioctl_info
    [media] mceusb: Add Twisted Melon USB IDs
    [media] staging/media/solo6x10: use module_pci_driver macro
    [media] staging/media/dt3155v4l: use module_pci_driver macro
    ...

    Conflicts:
    Documentation/feature-removal-schedule.txt

    Linus Torvalds
     
  • Pull second wave of NFS client updates from Trond Myklebust:

    - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
    separate modules.

    - Fix Oopses in the NFSv4 idmapper

    - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
    up recursing into the NFS code due to memory reclaim.

    - Increase the number of permitted callback connections.

    * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    nfs: explicitly reject LOCK_MAND flock() requests
    nfs: increase number of permitted callback connections.
    SUNRPC: return negative value in case rpcbind client creation error
    NFS: Convert v4 into a module
    NFS: Convert v3 into a module
    NFS: Convert v2 into a module
    NFS: Keep module parameters in the generic NFS client
    NFS: Split out remaining NFS v4 inode functions
    NFS: Pass super operations and xattr handlers in the nfs_subversion
    NFS: Only initialize the ACL client in the v3 case
    NFS: Create a try_mount rpc op
    NFS: Remove the NFS v4 xdev mount function
    NFS: Add version registering framework
    NFS: Fix a number of bugs in the idmapper
    nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
    sunrpc: clarify comments on rpc_make_runnable
    pnfsblock: bail out partial page IO

    Linus Torvalds
     
  • Pull networking update from David S. Miller:
    "I think Eric Dumazet and I have dealt with all of the known routing
    cache removal fallout. Some other minor fixes all around.

    1) Fix RCU of cached routes, particularly of output routes which require
    liberation via call_rcu() instead of call_rcu_bh(). From Eric
    Dumazet.

    2) Make sure we purge net device references in cached routes properly.

    3) TG3 driver bug fixes from Michael Chan.

    4) Fix reported 'expires' value in ipv6 routes, from Li Wei.

    5) TUN driver ioctl leaks kernel bytes to userspace, from Mathias
    Krause."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
    ipv4: Properly purge netdev references on uncached routes.
    ipv4: Cache routes in nexthop exception entries.
    ipv4: percpu nh_rth_output cache
    ipv4: Restore old dst_free() behavior.
    bridge: make port attributes const
    ipv4: remove rt_cache_rebuild_count
    net: ipv4: fix RCU races on dst refcounts
    net: TCP early demux cleanup
    tun: Fix formatting.
    net/tun: fix ioctl() based info leaks
    tg3: Update version to 3.124
    tg3: Fix race condition in tg3_get_stats64()
    tg3: Add New 5719 Read DMA workaround
    tg3: Fix Read DMA workaround for 5719 A0.
    tg3: Request APE_LOCK_PHY before PHY access
    ipv6: fix incorrect route 'expires' value passed to userspace
    mISDN: Bugfix only few bytes are transfered on a connection
    seeq: use PTR_RET at init_module of driver
    bnx2x: remove cast around the kmalloc in bnx2x_prev_mark_path
    ipv4: clean up put_child
    ...

    Linus Torvalds
     
  • devm_kzalloc() doesn't need a matching devm_kfree(); the freeing mechanism
    triggers when the driver unloads.
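
    As a rough userspace illustration of why no explicit free is needed, the
    sketch below models a device that records its allocations and frees them
    all at once. The names fake_device, managed_zalloc() and release_all()
    are hypothetical stand-ins, not kernel APIs; the real devres code is more
    involved.

```c
#include <stdlib.h>

/* Hypothetical userspace model of device-managed memory (not kernel code). */
struct res_node {
    struct res_node *next;
    void *mem;
};

struct fake_device {
    struct res_node *resources;   /* allocations owned by this device */
};

/* Zeroed allocation recorded on the device's list, loosely like devm_kzalloc(). */
static void *managed_zalloc(struct fake_device *dev, size_t size)
{
    struct res_node *node = malloc(sizeof(*node));

    if (!node)
        return NULL;
    node->mem = calloc(1, size);
    if (!node->mem) {
        free(node);
        return NULL;
    }
    node->next = dev->resources;
    dev->resources = node;
    return node->mem;
}

/* Called once when the driver goes away: frees everything, returns the count. */
static int release_all(struct fake_device *dev)
{
    int freed = 0;

    while (dev->resources) {
        struct res_node *node = dev->resources;

        dev->resources = node->next;
        free(node->mem);
        free(node);
        freed++;
    }
    return freed;
}
```

    A caller would allocate with managed_zalloc() during setup and never free
    individual buffers; a single release_all() at teardown covers them all.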

    Signed-off-by: Devendra Naga
    Cc: Alessandro Zummo
    Cc: Ashish Jangam
    Cc: David Dajun Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Devendra Naga
     
  • In probe we assign ret the return value of PTR_ERR() right after the
    rtc_device_register() call. That assignment belongs inside the
    if (IS_ERR(ptr)) check, since that branch is taken only when the
    function fails.
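
    The pattern can be sketched in userspace as follows. ERR_PTR(), PTR_ERR()
    and IS_ERR() are re-created here to mimic the kernel helpers, while
    fake_register() and fake_probe() are hypothetical names used only for
    illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for encoded error codes. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_rtc;

/* Hypothetical registration call: returns an error pointer on request. */
static void *fake_register(int should_fail)
{
    return should_fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_rtc;
}

/* Probe-style function: the error code is extracted only on failure. */
static int fake_probe(int should_fail)
{
    void *rtc = fake_register(should_fail);

    if (IS_ERR(rtc))
        return (int)PTR_ERR(rtc);   /* assignment lives inside the check */

    return 0;
}
```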

    Signed-off-by: Devendra Naga
    Cc: Alessandro Zummo
    Cc: Ashish Jangam
    Cc: David Dajun Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Devendra Naga
     
  • If a process creates a large hugetlbfs mapping that is eligible for page
    table sharing and forks heavily with children some of whom fault and
    others which destroy the mapping then it is possible for page tables to
    get corrupted. Some teardowns of the mapping encounter a "bad pmd" and
    output a message to the kernel log. The final teardown will trigger a
    BUG_ON in mm/filemap.c.

    This was reproduced in 3.4 but is known to have existed for a long time
    and goes back at least as far as 2.6.37. It was probably introduced
    in 2.6.20 by [39dde65c: shared page table for hugetlb page]. The messages
    look like this:

    [ ..........] Lots of bad pmd messages followed by this
    [ 127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
    [ 127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
    [ 127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
    [ 127.186778] ------------[ cut here ]------------
    [ 127.186781] kernel BUG at mm/filemap.c:134!
    [ 127.186782] invalid opcode: 0000 [#1] SMP
    [ 127.186783] CPU 7
    [ 127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
    [ 127.186801]
    [ 127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
    [ 127.186804] RIP: 0010:[] [] __delete_from_page_cache+0x15e/0x160
    [ 127.186809] RSP: 0000:ffff8804144b5c08 EFLAGS: 00010002
    [ 127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
    [ 127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
    [ 127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
    [ 127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
    [ 127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
    [ 127.186815] FS: 00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
    [ 127.186816] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
    [ 127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
    [ 127.186821] Stack:
    [ 127.186822] ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
    [ 127.186824] ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
    [ 127.186825] ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
    [ 127.186827] Call Trace:
    [ 127.186829] [] delete_from_page_cache+0x3b/0x80
    [ 127.186832] [] truncate_hugepages+0x115/0x220
    [ 127.186834] [] hugetlbfs_evict_inode+0x13/0x30
    [ 127.186837] [] evict+0xa7/0x1b0
    [ 127.186839] [] iput_final+0xd3/0x1f0
    [ 127.186840] [] iput+0x39/0x50
    [ 127.186842] [] d_kill+0xf8/0x130
    [ 127.186843] [] dput+0xd2/0x1a0
    [ 127.186845] [] __fput+0x170/0x230
    [ 127.186848] [] ? rb_erase+0xce/0x150
    [ 127.186849] [] fput+0x1d/0x30
    [ 127.186851] [] remove_vma+0x37/0x80
    [ 127.186853] [] do_munmap+0x2d2/0x360
    [ 127.186855] [] sys_shmdt+0xc9/0x170
    [ 127.186857] [] system_call_fastpath+0x16/0x1b
    [ 127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
    [ 127.186868] RIP [] __delete_from_page_cache+0x15e/0x160
    [ 127.186870] RSP
    [ 127.186871] ---[ end trace 7cbac5d1db69f426 ]---

    The bug is a race and not always easy to reproduce. To reproduce it I was
    doing the following on a single socket I7-based machine with 16G of RAM.

    $ hugeadm --pool-pages-max DEFAULT:13G
    $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
    $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
    $ for i in `seq 1 9000`; do ./hugetlbfs-test; done

    On my particular machine, it usually triggers within 10 minutes but
    enabling debug options can change the timing such that it never hits.
    Once the bug is triggered, the machine is in trouble and needs to be
    rebooted. The machine will respond, but processes accessing proc, such
    as "ps aux", will hang due to the BUG_ON. Shutdown will also hang, so a
    hard reset or a sysrq-b is needed.

    The basic problem is a race between page table sharing and teardown. For
    the most part page table sharing depends on i_mmap_mutex. In some cases,
    it also takes the mm->page_table_lock for the PTE updates, but with
    shared page tables it is the i_mmap_mutex that is more important.

    Unfortunately it also appears to be insufficient. Consider the following
    situation:

    Process A                          Process B
    ---------                          ---------
    hugetlb_fault                      shmdt
                                       LockWrite(mmap_sem)
                                         do_munmap
                                           unmap_region
                                             unmap_vmas
                                               unmap_single_vma
                                                 unmap_hugepage_range
                                                   Lock(i_mmap_mutex)
                                                   Lock(mm->page_table_lock)
                                                   huge_pmd_unshare/unmap tables
                                                   Unlock(mm->page_table_lock)
                                                   Unlock(i_mmap_mutex)
    huge_pte_alloc                     ...
      Lock(i_mmap_mutex)               ...
      vma_prio_walk, find svma, spte   ...
      Lock(mm->page_table_lock)        ...
      share spte                       ...
      Unlock(mm->page_table_lock)      ...
      Unlock(i_mmap_mutex)             ...
    hugetlb_no_page < end; a += 4096)
    *a = 0;
    }

    int main(int argc, char **argv)
    {
        key_t key = IPC_PRIVATE;
        size_t sizeA = nr_huge_page_A * huge_page_size;
        size_t sizeB = nr_huge_page_B * huge_page_size;
        int shmidA, shmidB;
        void *addrA = NULL, *addrB = NULL;
        int nr_children = 300, n = 0;

        if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
            perror("shmget:");
            return 1;
        }

        if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
            perror("shmat");
            return 1;
        }
        if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
            perror("shmget:");
            return 1;
        }

        if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
            perror("shmat");
            return 1;
        }

    fork_child:
        switch (fork()) {
        case 0:
            switch (n % 3) {
            case 0:
                play(addrA, sizeA);
                break;
            case 1:
                play(addrB, sizeB);
                break;
            case 2:
                break;
            }
            break;
        case -1:
            perror("fork:");
            break;
        default:
            if (++n < nr_children)
                goto fork_child;
            play(addrA, sizeA);
            break;
        }
        shmdt(addrA);
        shmdt(addrB);
        do {
            wait(NULL);
        } while (--n > 0);
        shmctl(shmidA, IPC_RMID, NULL);
        shmctl(shmidB, IPC_RMID, NULL);
        return 0;
    }

    [akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
    Signed-off-by: Hugh Dickins
    Reviewed-by: Michal Hocko
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • When tmpfs has the interleave memory policy, it always starts allocating
    for each file from node 0 at offset 0. When there are many small files,
    the lower nodes fill up disproportionately.

    This patch spreads out node usage by starting files at nodes other than 0,
    by using the inode number to bias the starting node for interleave.
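
    A minimal sketch of the idea, assuming a simple modulo interleave over
    nr_nodes nodes. Both function names are hypothetical, and the real
    mempolicy code works on node masks rather than a plain count.

```c
/* Node chosen for a page when every file starts interleaving at node 0. */
static unsigned int interleave_node_plain(unsigned long page_index,
                                          unsigned int nr_nodes)
{
    return (unsigned int)(page_index % nr_nodes);
}

/* Same choice with the inode number added as a per-file starting bias. */
static unsigned int interleave_node_biased(unsigned long ino,
                                           unsigned long page_index,
                                           unsigned int nr_nodes)
{
    return (unsigned int)((ino + page_index) % nr_nodes);
}
```

    With the plain version, the first page of every small file lands on node
    0. With the bias, files with consecutive inode numbers start on
    consecutive nodes, so many one-page files spread across all nodes.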

    Signed-off-by: Nathan Zimmer
    Signed-off-by: Hugh Dickins
    Cc: Christoph Lameter
    Cc: Nick Piggin
    Cc: Lee Schermerhorn
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Zimmer
     
  • pg_data_t is zeroed before reaching free_area_init_core(), so remove the
    now unnecessary initializations.

    Signed-off-by: Minchan Kim
    Cc: Tejun Heo
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
    when it is allocated. Arch code and memory hotplug already initialize
    pg_data_t, so this warning should never happen. I select fields randomly
    near the beginning, middle and end of pg_data_t for checking.
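
    The check can be pictured with the userspace sketch below. struct
    fake_pgdat and its field names are hypothetical stand-ins for pg_data_t,
    and warn_if_not_zeroed() only approximates what a kernel warning macro
    would do.

```c
#include <stdio.h>

/* Hypothetical stand-in for pg_data_t; the field names are illustrative only. */
struct fake_pgdat {
    int first_field;       /* near the beginning */
    char padding_a[64];
    long middle_field;     /* near the middle */
    char padding_b[64];
    int last_field;        /* near the end */
};

/* Returns 1 (and prints a warning) if any sampled field is non-zero, else 0. */
static int warn_if_not_zeroed(const struct fake_pgdat *pgdat)
{
    if (pgdat->first_field || pgdat->middle_field || pgdat->last_field) {
        fprintf(stderr, "warning: pgdat was not zeroed by the caller\n");
        return 1;
    }
    return 0;
}
```

    Sampling a few fields is cheap and catches a caller that skipped the
    zeroing entirely, although it cannot prove that every byte is zero.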

    This patch isn't for performance but for removing initialization code
    that would otherwise have to be added whenever we add a new field to
    pg_data_t or zone.

    Initially, Andrew suggested clearing pg_data_t in the MM core, but Tejun
    didn't like that because in the future some archs may initialize some
    fields in arch code and pass them into the general MM part, so blindly
    clearing it in the MM core would be very annoying.

    Signed-off-by: Minchan Kim
    Cc: Tejun Heo
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch is preparation for the next patch, which removes the zeroing
    of the pg_data_t in core MM. All archs except MIPS already zero
    pg_data_t themselves.

    Signed-off-by: Minchan Kim
    Cc: Ralf Baechle
    Cc: Tejun Heo

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim