14 Feb, 2012

2 commits

  • commit d020283dc694c9ec31b410f522252f7a8397e67d upstream.

    Looks like change "PM QoS: Move and rename the implementation files"
    merged during the 3.2 development cycle made PM QoS depend on
    CONFIG_PM which depends on (PM_SLEEP || PM_RUNTIME).

    That breaks CPU C-states with kernels not having these CONFIGs, causing CPUs
    to spend time in Polling loop idle instead of going into deep C-states,
    consuming way way more power. This is with either acpi idle or intel idle
    enabled.

    Either CONFIG_PM should be enabled with any pm_qos users or
    the !CONFIG_PM pm_qos_request() should return sane defaults not to break
    the existing users. Here's is the patch for the latter option.

    [rjw: Modified the changelog slightly.]

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Venkatesh Pallipadi
     
  • commit 181e9bdef37bfcaa41f3ab6c948a2a0d60a268b5 upstream.

    Commit 2aede851ddf08666f68ffc17be446420e9d2a056

    PM / Hibernate: Freeze kernel threads after preallocating memory

    introduced a mechanism by which kernel threads were frozen after
    the preallocation of hibernate image memory to avoid problems with
    frozen kernel threads not responding to memory freeing requests.
    However, it overlooked the s2disk code path in which the
    SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
    which caused freeze_workqueues_begin() to BUG(), because it saw
    that worqueues had been already frozen.

    Although in principle this issue might be addressed by removing
    the relevant BUG_ON() from freeze_workqueues_begin(), that would
    reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
    attempted to avoid into that particular code path. For this reason,
    to fix the issue at hand, introduce thaw_kernel_threads() and make
    the SNAPSHOT_FREE ioctl execute it.

    Special thanks to Srivatsa S. Bhat for detailed analysis of the
    problem.

    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Srivatsa S. Bhat
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     

07 Feb, 2012

1 commit

  • commit 3c076351c4027a56d5005a39a0b518a4ba393ce2 upstream.

    Right now we forcibly clear ASPM state on all devices if the BIOS indicates
    that the feature isn't supported. Based on the Microsoft presentation
    "PCI Express In Depth for Windows Vista and Beyond", I'm starting to think
    that this may be an error. The implication is that unless the platform
    grants full control via _OSC, Windows will not touch any PCIe features -
    including ASPM. In that case clearing ASPM state would be an error unless
    the platform has granted us that control.

    This patch reworks the ASPM disabling code such that the actual clearing
    of state is triggered by a successful handoff of PCIe control to the OS.
    The general ASPM code undergoes some changes in order to ensure that the
    ability to clear the bits isn't overridden by ASPM having already been
    disabled. Further, this theoretically now allows for situations where
    only a subset of PCIe roots hand over control, leaving the others in the
    BIOS state.

    It's difficult to know for sure that this is the right thing to do -
    there's zero public documentation on the interaction between all of these
    components. But enough vendors enable ASPM on platforms and then set this
    bit that it seems likely that they're expecting the OS to leave them alone.

    Measured to save around 5W on an idle Thinkpad X220.

    Signed-off-by: Matthew Garrett
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Matthew Garrett
     

04 Feb, 2012

2 commits

  • [ Upstream commit 5ee4433efe99b9f39f6eff5052a177bbcfe72cea ]

    By definition net_generic should never be called when it can return
    NULL. Fail conspicously with a BUG_ON to make it clear when people mess
    up that a NULL return should never happen.

    Recently there was a bug in the CAIF subsystem where it was registered
    with register_pernet_device instead of register_pernet_subsys. It was
    erroneously concluded that net_generic could validly return NULL and
    that net_assign_generic was buggy (when it was just inefficient).
    Hopefully this BUG_ON will prevent people to coming to similar erroneous
    conclusions in the futrue.

    Signed-off-by: Eric W. Biederman
    Tested-by: Sasha Levin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 598781d71119827b454fd75d46f84755bca6f0c6 upstream.

    If the master tries to authenticate a client using drm_authmagic and
    that client has already closed its drm file descriptor,
    either wilfully or because it was terminated, the
    call to drm_authmagic will dereference a stale pointer into kmalloc'ed memory
    and corrupt it.

    Typically this results in a hard system hang.

    This patch fixes that problem by removing any authentication tokens
    (struct drm_magic_entry) open for a file descriptor when that file
    descriptor is closed.

    Signed-off-by: Thomas Hellstrom
    Reviewed-by: Daniel Vetter
    Signed-off-by: Dave Airlie
    Signed-off-by: Greg Kroah-Hartman

    Thomas Hellstrom
     

26 Jan, 2012

14 commits

  • commit 245132643e1cfcd145bbc86a716c1818371fcb93 upstream.

    Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in
    find_get_pages") correctly fixed an infinite loop; but left a problem
    that find_get_pages() on shmem would return 0 (appearing to callers to
    mean end of tree) when it meets a run of nr_pages swap entries.

    The only uses of find_get_pages() on shmem are via pagevec_lookup(),
    called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
    scan_mapping_unevictable_pages(). The first is already commented, and
    not worth worrying about; but the second can leave pages on the
    Unevictable list after an unusual sequence of swapping and locking.

    Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
    swap) instead of pagevec_lookup().

    But I don't want to contaminate vmscan.c with shmem internals, nor
    shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into
    shmem.c, renaming it shmem_unlock_mapping(); and rename
    check_move_unevictable_page() to check_move_unevictable_pages(), looping
    down an array of pages, oftentimes under the same lock.

    Leave out the "rotate unevictable list" block: that's a leftover from
    when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
    handling involved looking at pages at tail of LRU.

    Was there significance to the sequence first ClearPageUnevictable, then
    test page_evictable, then SetPageUnevictable here? I think not, we're
    under LRU lock, and have no barriers between those.

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Shaohua Li
    Cc: Eric Dumazet
    Cc: Johannes Weiner
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Hugh Dickins
     
  • commit cd4ca7afc61d3b18fcd635002459fb6b1d701099 upstream.

    Update xc4000 tuner definition, number 81 is already in use by
    TUNER_PARTSNIC_PTI_5NF05.

    Signed-off-by: Miroslav Slugen
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Miroslav Slugen
     
  • commit 895f3022523361e9b383cf48f51feb1f7d5e7e53 upstream.

    The target code was not setting the additional sense length field in the
    sense data it returned, which meant that at least the Linux stack
    ignored the ASC/ASCQ fields. For example, without this patch, on a
    tcm_loop device:

    # sg_raw -v /dev/sda 2 0 0 0 0 0

    gives

    cdb to send: 02 00 00 00 00 00
    SCSI Status: Check Condition

    Sense Information:
    Fixed format, current; Sense key: Illegal Request
    Raw sense data (in hex):
    70 00 05 00 00 00 00 00

    while after the patch we correctly get the following (which matches what
    a regular disk returns):

    cdb to send: 02 00 00 00 00 00
    SCSI Status: Check Condition

    Sense Information:
    Fixed format, current; Sense key: Illegal Request
    Additional sense: Invalid command operation code
    Raw sense data (in hex):
    70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00
    00 00

    Signed-off-by: Roland Dreier
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Roland Dreier
     
  • commit 8df0eb7c9d96f9e82f233ee8b74e0f0c8471f868 upstream.

    In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
    32bits for these. The new fields were reserved before.
    According to the ACPI spec, the OS must disregrard reserved fields.
    In order to know whether or not, we must know what version the SRAT
    table has.

    This patch stores the SRAT table revision for later consumption
    by arch specific __init functions.

    Signed-off-by: Kurt Garloff
    Signed-off-by: Len Brown
    Signed-off-by: Greg Kroah-Hartman

    Kurt Garloff
     
  • commit 0bfc96cb77224736dfa35c3c555d37b3646ef35e upstream.

    [ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl
    and -ENOIOCTLCMD from sd_compat_ioctl. ]

    Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
    will pass the command to the underlying block device. This is
    well-known, but it is also a large security problem when (via Unix
    permissions, ACLs, SELinux or a combination thereof) a program or user
    needs to be granted access only to part of the disk.

    This patch lets partitions forward a small set of harmless ioctls;
    others are logged with printk so that we can see which ioctls are
    actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred.
    Of course it was being sent to a (partition on a) hard disk, so it would
    have failed with ENOTTY and the patch isn't changing anything in
    practice. Still, I'm treating it specially to avoid spamming the logs.

    In principle, this restriction should include programs running with
    CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and
    /dev/sdb, it still should not be able to read/write outside the
    boundaries of /dev/sda2 independent of the capabilities. However, for
    now programs with CAP_SYS_RAWIO will still be allowed to send the
    ioctls. Their actions will still be logged.

    This patch does not affect the non-libata IDE driver. That driver
    however already tests for bd != bd->bd_contains before issuing some
    ioctl; it could be restricted further to forbid these ioctls even for
    programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.

    Cc: linux-scsi@vger.kernel.org
    Cc: Jens Axboe
    Cc: James Bottomley
    Signed-off-by: Paolo Bonzini
    [ Make it also print the command name when warning - Linus ]
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Paolo Bonzini
     
  • commit 577ebb374c78314ac4617242f509e2f5e7156649 upstream.

    Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

    The function will then be enhanced to detect partition block devices
    and, in that case, subject the ioctls to whitelisting.

    Cc: linux-scsi@vger.kernel.org
    Cc: Jens Axboe
    Cc: James Bottomley
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Paolo Bonzini
     
  • commit eaf5f9073533cde21c7121c136f1c3f072d9cf59 upstream.

    Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
    cause shrink_dcache_parent() to loop forever.

    Here's what appears to happen:

    1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1

    2 - CPU1: select_parent(P) locks P->d_lock

    3 - CPU0: shrink_dentry_list() locks C->d_lock
    dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

    4 - CPU1: select_parent(P) locks C->d_lock,
    moves C from dispose list being processed on CPU0 to the new
    dispose list, returns 1

    5 - CPU0: shrink_dentry_list() finds dispose list empty, returns

    6 - Goto 2 with CPU0 and CPU1 switched

    Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
    it found a new one, causing shrink_dentry_list() to think it's making progress
    and loop over and over.

    One way to trigger this is to make udev calls stat() on the sysfs file while it
    is going away.

    Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick:

    ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"

    Then execute the following loop:

    while true; do
    echo -bond0 > /sys/class/net/bonding_masters
    echo +bond0 > /sys/class/net/bonding_masters
    echo -bond1 > /sys/class/net/bonding_masters
    echo +bond1 > /sys/class/net/bonding_masters
    done

    One fix would be to check all callers and prevent concurrent calls to
    shrink_dcache_parent(). But I think a better solution is to stop the
    stealing behavior.

    This patch adds a new dentry flag that is set when the dentry is added to the
    dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a
    new reference just before being pruned.

    If the dentry has this flag, select_parent() will skip it and let
    shrink_dentry_list() retry pruning it. With select_parent() skipping those
    dentries there will not be the appearance of progress (new dentries found) when
    there is none, hence shrink_dcache_parent() will not loop forever.

    Set the flag is also set in prune_dcache_sb() for consistency as suggested by
    Linus.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 2fefb8a09e7ed251ae8996e0c69066e74c5aa560 upstream.

    There's no reason I can see that we need to call sv_shutdown between
    closing the two lists of sockets.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
     
  • commit 6c06108be53ca5e94d8b0e93883d534dd9079646 upstream.

    If ctrls->count is too high the multiplication could overflow and
    array_size would be lower than expected. Mauro and Hans Verkuil
    suggested that we cap it at 1024. That comes from the maximum
    number of controls with lots of room for expantion.

    $ grep V4L2_CID include/linux/videodev2.h | wc -l
    211

    Signed-off-by: Dan Carpenter
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • commit ab936cbcd02072a34b60d268f94440fd5cf1970b upstream.

    Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
    function replace_page_cache_page(). This function replaces a page in the
    radix-tree with a new page. WHen doing this, memory cgroup needs to fix
    up the accounting information. memcg need to check PCG_USED bit etc.

    In some(many?) cases, 'newpage' is on LRU before calling
    replace_page_cache(). So, memcg's LRU accounting information should be
    fixed, too.

    This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
    In that function, old pages will be unaccounted without touching
    res_counter and new page will be accounted to the memcg (of old page).
    WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid
    races with LRU handling.

    Background:
    replace_page_cache_page() is called by FUSE code in its splice() handling.
    Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
    page and may be on LRU. LRU mis-accounting will be critical for memory cgroup
    because rmdir() checks the whole LRU is empty and there is no account leak.
    If a page is on the other LRU than it should be, rmdir() will fail.

    This bug was added in March 2011, but no bug report yet. I guess there
    are not many people who use memcg and FUSE at the same time with upstream
    kernels.

    The result of this bug is that admin cannot destroy a memcg because of
    account leak. So, no panic, no deadlock. And, even if an active cgroup
    exist, umount can succseed. So no problem at shutdown.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Miklos Szeredi
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    KAMEZAWA Hiroyuki
     
  • commit 1f536b9e9f85456df93614b3c2f6a1a2b7d7cb9b upstream.

    Building an ARM target we get the following warnings:

    CC arch/arm/kernel/setup.o
    In file included from arch/arm/kernel/setup.c:39:
    arch/arm/include/asm/elf.h:102:1: warning: "vmcore_elf64_check_arch" redefined
    In file included from arch/arm/kernel/setup.c:24:
    include/linux/crash_dump.h:30:1: warning: this is the location of the previous definition

    Quoting Russell King:

    "linux/crash_dump.h makes no attempt to include asm/elf.h, but it depends
    on stuff in asm/elf.h to determine how stuff inside this file is defined
    at parse time.

    So, if asm/elf.h is included after linux/crash_dump.h or not at all, you
    get a different result from the situation where asm/elf.h is included
    before."

    So add elf.h header to crash_dump.h to avoid this problem.

    The original discussion about this can be found at:
    http://www.spinics.net/lists/arm-kernel/msg154113.html

    Signed-off-by: Fabio Estevam
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Fabio Estevam
     
  • commit 9e7860cee18241633eddb36a4c34c7b61d8cecbc upstream.

    Haogang Chen found out that:

    There is a potential integer overflow in process_msg() that could result
    in cross-domain attack.

    body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);

    When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent
    call to xb_read() would write to a zero-length buffer.

    The other end of this connection is always the xenstore backend daemon
    so there is no guest (malicious or otherwise) which can do this. The
    xenstore daemon is a trusted component in the system.

    However this seem like a reasonable robustness improvement so we should
    have it.

    And Ian when read the API docs found that:
    The payload length (len field of the header) is limited to 4096
    (XENSTORE_PAYLOAD_MAX) in both directions. If a client exceeds the
    limit, its xenstored connection will be immediately killed by
    xenstored, which is usually catastrophic from the client's point of
    view. Clients (particularly domains, which cannot just reconnect)
    should avoid this.

    so this patch checks against that instead.

    This also avoids a potential integer overflow pointed out by Haogang Chen.

    Signed-off-by: Ian Campbell
    Cc: Haogang Chen
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    Ian Campbell
     
  • commit 1830ea91c20b06608f7cdb2455ce05ba834b3214 upstream.

    Spec shows this as 1010b = 0xa

    Signed-off-by: Alex Williamson
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Alex Williamson
     
  • commit bf118a342f10dafe44b14451a1392c3254629a1f upstream.

    The NFSv4 bitmap size is unbounded: a server can return an arbitrary
    sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
    nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
    with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
    xdr length to the (cached) acl page data.

    This is a general solution to commit e5012d1f "NFSv4.1: update
    nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
    when getting ACLs.

    Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
    was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Andy Adamson
     

13 Jan, 2012

2 commits

  • commit 18b7ede5f7ee2092aedcb578d3ac30bd5d4fc23c upstream.

    [ removed the dwc3 portion of the patch as it didn't apply to
    older kernels - gregkh]

    According to USB 3.0 Specification Table 9-22, if
    bmAttributes [4:0] are set to zero, it means "no
    streams supported", but the way this helper was
    defined on Linux, we will *always* have one stream
    which might cause several problems.

    For example on DWC3, we would tell the controller
    endpoint has streams enabled and yet start transfers
    with Stream ID set to 0, which would goof up the host
    side.

    While doing that, convert the macro to an inline
    function due to the different checks we now need.

    Signed-off-by: Felipe Balbi
    Signed-off-by: Sarah Sharp
    Signed-off-by: Greg Kroah-Hartman

    Felipe Balbi
     
  • commit bc677d5b64644c399cd3db6a905453e611f402ab upstream.

    Add a new field num_mapped_sgs to struct urb so that we have a place to
    store the number of mapped entries and can also retain the original
    value of entries in num_sgs. Previously, usb_hcd_map_urb_for_dma()
    would overwrite this with the number of mapped entries, which would
    break dma_unmap_sg() because it requires the original number of entries.

    This fixes warnings like the following when using USB storage devices:
    ------------[ cut here ]------------
    WARNING: at lib/dma-debug.c:902 check_unmap+0x4e4/0x695()
    ehci_hcd 0000:00:12.2: DMA-API: device driver frees DMA sg list with different entry count [map count=4] [unmap count=1]
    Modules linked in: ohci_hcd ehci_hcd
    Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #319
    Call Trace:
    [] warn_slowpath_common+0x80/0x98
    [] warn_slowpath_fmt+0x41/0x43
    [] check_unmap+0x4e4/0x695
    [] ? trace_hardirqs_off+0xd/0xf
    [] ? _raw_spin_unlock_irqrestore+0x33/0x50
    [] debug_dma_unmap_sg+0xeb/0x117
    [] usb_hcd_unmap_urb_for_dma+0x71/0x188
    [] unmap_urb_for_dma+0x20/0x22
    [] usb_hcd_giveback_urb+0x5d/0xc0
    [] ehci_urb_done+0xf7/0x10c [ehci_hcd]
    [] qh_completions+0x429/0x4bd [ehci_hcd]
    [] ehci_work+0x95/0x9c0 [ehci_hcd]
    ...
    ---[ end trace f29ac88a5a48c580 ]---
    Mapped at:
    [] debug_dma_map_sg+0x45/0x139
    [] usb_hcd_map_urb_for_dma+0x22e/0x478
    [] usb_hcd_submit_urb+0x63f/0x6fa
    [] usb_submit_urb+0x2c7/0x2de
    [] usb_sg_wait+0x55/0x161

    Signed-off-by: Clemens Ladisch
    Signed-off-by: Greg Kroah-Hartman

    Clemens Ladisch
     

04 Jan, 2012

1 commit

  • Commit 1e39f384bb01 ("evm: fix build problems") makes the stub version
    of security_old_inode_init_security() return 0 when CONFIG_SECURITY is
    not set.

    But that makes callers such as reiserfs_security_init() assume that
    security_old_inode_init_security() has set name, value, and len
    arguments properly - but security_old_inode_init_security() left them
    uninitialized which then results in interesting failures.

    Revert security_old_inode_init_security() to the old behavior of
    returning EOPNOTSUPP since both callers (reiserfs and ocfs2) handle this
    just fine.

    [ Also fixed the S_PRIVATE(inode) case of the actual non-stub
    security_old_inode_init_security() function to return EOPNOTSUPP
    for the same reason, as pointed out by Mimi Zohar.

    It got incorrectly changed to match the new function in commit
    fb88c2b6cbb1: "evm: fix security/security_old_init_security return
    code". - Linus ]

    Reported-by: Jorge Bastos
    Acked-by: James Morris
    Acked-by: Mimi Zohar
    Signed-off-by: Jan Kara
    Signed-off-by: Linus Torvalds

    Jan Kara
     

31 Dec, 2011

1 commit


30 Dec, 2011

1 commit

  • Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
    for nohz") did not take into account that one some architectures jiffies
    and cputime use different units.

    This causes get_idle_time() to return numbers in the wrong units, making
    the idle time fields in /proc/stat wrong.

    Instead of converting the usec value returned by
    get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
    usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

    Signed-off-by: Andreas Schwab
    Acked-by: Michal Hocko
    Cc: Arnd Bergmann
    Cc: "Artem S. Tashkinov"
    Cc: Dave Jones
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Schwab
     

26 Dec, 2011

1 commit

  • Unlike all of the other cpuid bits, the TSC deadline timer bit is set
    unconditionally, regardless of what userspace wants.

    This is broken in several ways:
    - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
    deadline timer feature, a guest that uses the feature will break
    - live migration to older host kernels that don't support the TSC deadline
    timer will cause the feature to be pulled from under the guest's feet;
    breaking it
    - guests that are broken wrt the feature will fail.

    Fix by not enabling the feature automatically; instead report it to userspace.
    Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
    will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
    KVM_GET_SUPPORTED_CPUID.

    Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

    [avi: add the KVM_CAP + documentation]

    Reported-by: Alexey Zaytsev
    Tested-by: Alexey Zaytsev
    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     

24 Dec, 2011

2 commits


23 Dec, 2011

2 commits

  • skb->truesize might be big even for a small packet.

    Its even bigger after commit 87fb4b7b533 (net: more accurate skb
    truesize) and big MTU.

    We should allow queueing at least one packet per receiver, even with a
    low RCVBUF setting.

    Reported-by: Michal Simek
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Chris Boot reported crashes occurring in ipv6_select_ident().

    [ 461.457562] RIP: 0010:[] []
    ipv6_select_ident+0x31/0xa7

    [ 461.578229] Call Trace:
    [ 461.580742]
    [ 461.582870] [] ? udp6_ufo_fragment+0x124/0x1a2
    [ 461.589054] [] ? ipv6_gso_segment+0xc0/0x155
    [ 461.595140] [] ? skb_gso_segment+0x208/0x28b
    [ 461.601198] [] ? ipv6_confirm+0x146/0x15e
    [nf_conntrack_ipv6]
    [ 461.608786] [] ? nf_iterate+0x41/0x77
    [ 461.614227] [] ? dev_hard_start_xmit+0x357/0x543
    [ 461.620659] [] ? nf_hook_slow+0x73/0x111
    [ 461.626440] [] ? br_parse_ip_options+0x19a/0x19a
    [bridge]
    [ 461.633581] [] ? dev_queue_xmit+0x3af/0x459
    [ 461.639577] [] ? br_dev_queue_push_xmit+0x72/0x76
    [bridge]
    [ 461.646887] [] ? br_nf_post_routing+0x17d/0x18f
    [bridge]
    [ 461.653997] [] ? nf_iterate+0x41/0x77
    [ 461.659473] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.665485] [] ? nf_hook_slow+0x73/0x111
    [ 461.671234] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.677299] [] ?
    nf_bridge_update_protocol+0x20/0x20 [bridge]
    [ 461.684891] [] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
    [ 461.691520] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.697572] [] ? NF_HOOK.constprop.8+0x3c/0x56
    [bridge]
    [ 461.704616] [] ?
    nf_bridge_push_encap_header+0x1c/0x26 [bridge]
    [ 461.712329] [] ? br_nf_forward_finish+0x8a/0x95
    [bridge]
    [ 461.719490] [] ?
    nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
    [ 461.727223] [] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
    [ 461.734292] [] ? nf_iterate+0x41/0x77
    [ 461.739758] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.746203] [] ? nf_hook_slow+0x73/0x111
    [ 461.751950] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.758378] [] ? NF_HOOK.constprop.4+0x56/0x56
    [bridge]

    This is caused by bridge netfilter special dst_entry (fake_rtable), a
    special shared entry, where attaching an inetpeer makes no sense.

    Problem is present since commit 87c48fa3b46 (ipv6: make fragment
    identifications less predictable)

    Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and
    __ip_select_ident() fallback to the 'no peer attached' handling.

    Reported-by: Chris Boot
    Tested-by: Chris Boot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Dec, 2011

3 commits

  • Currently, the *_global_[un]lock_online() routines are not at all synchronized
    with CPU hotplug. Soft-lockups detected as a consequence of this race was
    reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
    for finding out that the root-cause of this issue is the race condition
    between br_write_[un]lock() and CPU hotplug, which results in the lock states
    getting messed up).

    Fixing this race by just adding {get,put}_online_cpus() at appropriate places
    in *_global_[un]lock_online() is not a good option, because, then suddenly
    br_write_[un]lock() would become blocking, whereas they have been kept as
    non-blocking all this time, and we would want to keep them that way.

    So, overall, we want to ensure 3 things:
    1. br_write_lock() and br_write_unlock() must remain as non-blocking.
    2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
    for different sets of CPUs.
    3. Either prevent any new CPU online operation in between this lock-unlock, or
    ensure that the newly onlined CPU does not proceed with its corresponding
    per-cpu spinlock unlocked.

    To achieve all this:
    (a) We introduce a new spinlock that is taken by the *_global_lock_online()
    routine and released by the *_global_unlock_online() routine.
    (b) We register a callback for CPU hotplug notifications, and this callback
    takes the same spinlock as above.
    (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
    initialized in the lock_init() code, all future updates to it are done in
    the callback, under the above spinlock.
    (d) The above bitmap is used (instead of cpu_online_mask) while locking and
    unlocking the per-cpu locks.

    The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
    br_write_lock-unlock sequence is in progress, the callback keeps spinning,
    thus preventing the CPU online operation till the lock-unlock sequence is
    complete. This takes care of requirement (3).

    The bitmap that we maintain remains unmodified throughout the lock-unlock
    sequence, since all updates to it are managed by the callback, which takes
    the same spinlock as the one taken by the lock code and released only by the
    unlock routine. Combining this with (d) above, satisfies requirement (2).

    Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
    operations from racing with br_write_lock-unlock, requirement (1) is also
    taken care of.

    By the way, it is to be noted that a CPU offline operation can actually run
    in parallel with our lock-unlock sequence, because our callback doesn't react
    to notifications earlier than CPU_DEAD (in order to maintain our bitmap
    properly). And this means, since we use our own bitmap (which is stale, on
    purpose) during the lock-unlock sequence, we could end up unlocking the
    per-cpu lock of an offline CPU (because we had locked it earlier, when the
    CPU was online), in order to satisfy requirement (2). But this is harmless,
    though it looks a bit awkward.

    Debugged-by: Cong Meng
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Al Viro
    Cc: stable@vger.kernel.org

    Srivatsa S. Bhat
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: Add a flow_cache_flush_deferred function
    ipv4: reintroduce route cache garbage collector
    net: have ipconfig not wait if no dev is available
    sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd
    asix: new device id
    davinci-cpdma: fix locking issue in cpdma_chan_stop
    sctp: fix incorrect overflow check on autoclose
    r8169: fix Config2 MSIEnable bit setting.
    llc: llc_cmsg_rcv was getting called after sk_eat_skb.
    net: bpf_jit: fix an off-one bug in x86_64 cond jump target
    iwlwifi: update SCD BC table for all SCD queues
    Revert "Bluetooth: Revert: Fix L2CAP connection establishment"
    Bluetooth: Clear RFCOMM session timer when disconnecting last channel
    Bluetooth: Prevent uninitialized data access in L2CAP configuration
    iwlwifi: allow to switch to HT40 if not associated
    iwlwifi: tx_sync only on PAN context
    mwifiex: avoid double list_del in command cancel path
    ath9k: fix max phy rate at rate control init
    nfc: signedness bug in __nci_request()
    iwlwifi: do not set the sequence control bit is not needed

    Linus Torvalds
     
  • flow_cach_flush() might sleep but can be called from
    atomic context via the xfrm garbage collector. So add
    a flow_cache_flush_deferred() function and use this if
    the xfrm garbage colector is invoked from within the
    packet path.

    Signed-off-by: Steffen Klassert
    Acked-by: Timo Teräs
    Signed-off-by: David S. Miller

    Steffen Klassert
     

21 Dec, 2011

3 commits

  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time/clocksource: Fix kernel-doc warnings
    rtc: m41t80: Workaround broken alarm functionality
    rtc: Expire alarms after the time is set.

    Linus Torvalds
     
  • …kernel/git/konrad/xen

    * 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"

    Linus Torvalds
     
  • * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (31 commits)
    Revert "[media] af9015: limit I2C access to keep FW happy"
    [media] s5p-fimc: Fix camera input configuration in subdev operations
    [media] m5mols: Fix logic in sanity check
    [media] ati_remote: switch to single-byte scancodes
    [media] V4L: mt9m111: fix uninitialised mutex
    [media] V4L: omap1_camera: fix missing include
    [media] V4L: mt9t112: use after free in mt9t112_probe()
    [media] V4L: soc-camera: fix compiler warnings on 64-bit platforms
    [media] s5p_mfc_enc: fix s/H264/H263/ typo
    [media] omap_vout: Fix compile error in 3.1
    [media] au0828: add missing models 72101, 72201 & 72261 to the model matrix
    [media] au0828: add missing USB ID 2040:7213
    [media] au0828: add missing USB ID 2040:7260
    [media] [trivial] omap24xxcam-dma: Fix logical test
    [media] omap_vout: fix crash if no driver for a display
    [media] media: video: s5p-tv: fix build break
    [media] omap3isp: fix compilation of ispvideo.c
    [media] m5mols: Fix set_fmt to return proper pixel format code
    [media] s5p-fimc: Use correct fourcc for RGB565 colour format
    [media] s5p-fimc: Fail driver probing when sensor configuration is wrong
    ...

    Linus Torvalds
     

20 Dec, 2011

1 commit

  • Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for
    limiting the autoclose value. If userspace passes in -1 on 32-bit
    platform, the overflow check didn't work and autoclose would be set
    to 0xffffffff.

    This patch defines a max_autoclose (in seconds) for limiting the value
    and exposes it through sysctl, with the following intentions.

    1) Avoid overflowing autoclose * HZ.

    2) Keep the default autoclose bound consistent across 32- and 64-bit
    platforms (INT_MAX / HZ in this patch).

    3) Keep the autoclose value consistent between setsockopt() and
    getsockopt() calls.

    Suggested-by: Vlad Yasevich
    Signed-off-by: Xi Wang
    Signed-off-by: David S. Miller

    Xi Wang
     

19 Dec, 2011

3 commits

  • This reverts commit ddacf5ef684a655abe2bb50c4b2a5b72ae0d5e05.
    As when booting the kernel under Amazon EC2 as an HVM guest it ends up
    hanging during startup. Reverting this we loose the fix for kexec
    booting to the crash kernels.

    Fixes Canonical BZ #901305 (http://bugs.launchpad.net/bugs/901305)

    Tested-by: Alessandro Salvatori
    Reported-by: Stefan Bader
    Acked-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • Fix various KernelDoc build warnings.

    Signed-off-by: Kusanagi Kouichi
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20111219091320.0D5AF6FC03D@msa105.auone-net.jp
    Signed-off-by: Ingo Molnar

    Kusanagi Kouichi
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (22 commits)
    [SCSI] fcoe: fix fcoe in a DCB environment by adding DCB notifiers to set skb priority
    [SCSI] bnx2i: Fixed kernel panic caused by unprotected task->sc->request deref
    [SCSI] qla4xxx: check for failed conn setup
    [SCSI] qla4xxx: a small loop fix
    [SCSI] qla4xxx: fix flash/ddb support
    [SCSI] zfcp: return early from slave_destroy if slave_alloc returned early
    [SCSI] fcoe: Fix preempt count leak in fcoe_filter_frames()
    [SCSI] qla2xxx: Update version number to 8.03.07.12-k.
    [SCSI] qla2xxx: Submit all chained IOCBs for passthrough commands on request queue 0.
    [SCSI] qla2xxx: Correct fc_host port_state display.
    [SCSI] qla2xxx: Disable generating pause frames when firmware hang detected for ISP82xx.
    [SCSI] qla2xxx: Clear mailbox busy flag during premature mailbox completion for ISP82xx.
    [SCSI] qla2xxx: Encapsulate prematurely completing mailbox commands during ISP82xx firmware hang.
    [SCSI] qla2xxx: Display IPE error message for ISP82xx.
    [SCSI] qla2xxx: Return the correct value for a mailbox command if 82xx is in reset recovery.
    [SCSI] qla2xxx: Enable Minidump by default with default capture mask 0x1f.
    [SCSI] qla2xxx: Stop unconditional completion of mailbox commands issued in interrupt mode during firmware hang.
    [SCSI] qla2xxx: Revert back the request queue mapping to request queue 0.
    [SCSI] qla2xxx: Don't call alloc_fw_dump for ISP82XX.
    [SCSI] qla2xxx: Check for SCSI status on underruns.
    ...

    Linus Torvalds
     

18 Dec, 2011

1 commit