13 Jun, 2013

31 commits

  • Pull block layer fixes from Jens Axboe:
    "Outside of bcache (which really isn't super big), these are all
    few-liners. There are a few important fixes in here:

    - Fix blk pm sleeping when holding the queue lock

    - A small collection of bcache fixes that have been done and tested
    since bcache was included in this merge window.

    - A fix for a raid5 regression introduced with the bio changes.

    - Two important fixes for mtip32xx, fixing an oops and potential data
    corruption (or hang) due to wrong bio iteration on stacked devices."

    * 'for-linus' of git://git.kernel.dk/linux-block:
    scatterlist: sg_set_buf() argument must be in linear mapping
    raid5: Initialize bi_vcnt
    pktcdvd: silence static checker warning
    block: remove refs to XD disks from documentation
    blkpm: avoid sleep when holding queue lock
    mtip32xx: Correctly handle bio->bi_idx != 0 conditions
    mtip32xx: Fix NULL pointer dereference during module unload
    bcache: Fix error handling in init code
    bcache: clarify free/available/unused space
    bcache: drop "select CLOSURES"
    bcache: Fix incompatible pointer type warning

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "Bunch of fixes and one little addition to math64.h"

    * emailed patches from Andrew Morton: (27 commits)
    include/linux/math64.h: add div64_ul()
    mm: memcontrol: fix lockless reclaim hierarchy iterator
    frontswap: fix incorrect zeroing and allocation size for frontswap_map
    kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
    mm: migration: add migrate_entry_wait_huge()
    ocfs2: add missing lockres put in dlm_mig_lockres_handler
    mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
    drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
    aio: fix io_destroy() regression by using call_rcu()
    rtc-at91rm9200: use shadow IMR on at91sam9x5
    rtc-at91rm9200: add shadow interrupt mask
    rtc-at91rm9200: refactor interrupt-register handling
    rtc-at91rm9200: add configuration support
    rtc-at91rm9200: add match-table compile guard
    fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
    swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
    drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
    cciss: fix broken mutex usage in ioctl
    audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
    drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
    ...

    Linus Torvalds
     
  • There is div64_long() to handle s64/long division, but no macro to do
    u64/ul division. It is needed in some scenarios, so add one.
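
    As a rough userspace illustration of what such a helper computes, here is
    a sketch. The _sketch names are hypothetical. It assumes that the kernel
    maps div64_ul() to div64_u64() on 64-bit builds and to div_u64() (32-bit
    divisor) on 32-bit builds. Check that assumption against
    include/linux/math64.h itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Userspace stand-in for a u64 / u32 helper (hypothetical name). */
static inline uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);   /* division by zero is undefined */
    return dividend / divisor;
}

/* Userspace stand-in for a u64 / u64 helper (hypothetical name). */
static inline uint64_t div64_u64_sketch(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}

/* Pick the helper whose divisor width matches unsigned long. */
#if ULONG_MAX > 0xffffffffUL
#define div64_ul_sketch(x, y) div64_u64_sketch((x), (y))
#else
#define div64_ul_sketch(x, y) div_u64_sketch((x), (y))
#endif
```

    On a 64-bit userspace build both helpers reduce to a plain '/'. The
    dispatch exists so that the divisor's width always follows unsigned
    long.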

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Alex Shi
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Shi
     
  • The lockless reclaim hierarchy iterator currently has a misplaced
    barrier that can lead to use-after-free crashes.

    The reclaim hierarchy iterator consists of a sequence count and a
    position pointer that are read and written locklessly, with memory
    barriers enforcing ordering.

    The write side sets the position pointer first, then updates the
    sequence count to "publish" the new position. Likewise, the read side
    must read the sequence count first, then the position. If the sequence
    count is up to date, it's guaranteed that the position is up to date as
    well:

    writer:                              reader:
    iter->position = position            if iter->sequence == expected:
    smp_wmb()                                smp_rmb()
    iter->sequence = sequence                position = iter->position

    However, the read side barrier is currently misplaced, which can lead to
    dereferencing stale position pointers that no longer point to valid
    memory. Fix this.
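
    The ordering above can be sketched with C11 atomics. A release fence
    takes the place of smp_wmb() and an acquire fence takes the place of
    smp_rmb(). This is a userspace analogy under that assumption, not the
    memcg code, and the struct and function names are hypothetical. As in
    the diagram, the read-side fence sits between the sequence load and the
    position load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the lockless reclaim iterator. */
struct reclaim_iter_sketch {
    _Atomic(void *) position;
    atomic_uint sequence;
};

static void iter_init(struct reclaim_iter_sketch *iter)
{
    atomic_init(&iter->position, NULL);
    atomic_init(&iter->sequence, 0u);
}

/* Writer: store the position first, then publish it via the sequence. */
static void iter_publish(struct reclaim_iter_sketch *iter, void *position,
                         unsigned sequence)
{
    assert(position != NULL);   /* NULL means "nothing cached" in this sketch */
    atomic_store_explicit(&iter->position, position, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* role of smp_wmb() */
    atomic_store_explicit(&iter->sequence, sequence, memory_order_relaxed);
}

/* Reader: check the sequence first, then read the position. */
static void *iter_lookup(struct reclaim_iter_sketch *iter, unsigned expected)
{
    if (atomic_load_explicit(&iter->sequence, memory_order_relaxed) != expected)
        return NULL;
    atomic_thread_fence(memory_order_acquire);   /* role of smp_rmb() */
    return atomic_load_explicit(&iter->position, memory_order_relaxed);
}
```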

    Signed-off-by: Johannes Weiner
    Reported-by: Tejun Heo
    Reviewed-by: Tejun Heo
    Acked-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Glauber Costa
    Cc: [3.10+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • A bitmap accessed by bitops must be large enough to hold the required
    number of bits rounded up to a multiple of BITS_PER_LONG, and it must
    not be zeroed with memset() if the number of bits cleared is not a
    multiple of BITS_PER_LONG.

    This fixes incorrect zeroing and allocation size for frontswap_map. The
    incorrect zeroing doesn't cause any problem because frontswap_map is
    freed just after zeroing, but the wrongly calculated allocation size
    can cause problems.

    On 32-bit systems, the allocation size of frontswap_map is about twice
    as large as required. On 64-bit systems, the allocation size is smaller
    than required if the number of bits is not a multiple of
    BITS_PER_LONG.
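
    A small sketch can check the size arithmetic. bitmap_bytes() rounds up
    to whole longs, as BITS_TO_LONGS(nbits) * sizeof(long) does in the
    kernel. naive_bytes() assumes the incorrect allocation was nbits /
    sizeof(long) bytes. That assumption matches the numbers above (twice the
    need on 32-bit, too small on 64-bit when nbits is not a multiple of 64),
    but it is an inference, not code quoted from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/* Bytes for a bitmap of nbits bits, rounded up to whole longs. */
static size_t bitmap_bytes(size_t nbits)
{
    assert(nbits <= SIZE_MAX - (BITS_PER_LONG_SKETCH - 1));  /* no overflow */
    size_t longs = (nbits + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
    return longs * sizeof(long);
}

/* Assumed incorrect formula: nbits / sizeof(long) bytes. */
static size_t naive_bytes(size_t nbits)
{
    return nbits / sizeof(long);
}
```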

    Signed-off-by: Akinobu Mita
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • audit_add_tree_rule() must set 'rule->tree = NULL;' first, to protect
    the rule itself from being freed in kill_rules().

    The reason is that when the tree is killed, the 'rule' itself may
    already have been released, so we must not access it. One example: we
    add a rule to an inode while, at the same time, another task is deleting
    this inode.

    The work flow for adding a rule:

    audit_receive() -> (need audit_cmd_mutex lock)
    audit_receive_skb() ->
    audit_receive_msg() ->
    audit_receive_filter() ->
    audit_add_rule() ->
    audit_add_tree_rule() -> (need audit_filter_mutex lock)
    ...
    unlock audit_filter_mutex
    get_tree()
    ...
    iterate_mounts() -> (iterate all related inodes)
    tag_mount() ->
    tag_trunk() ->
    create_trunk() -> (assume it is 1st rule)
    fsnotify_add_mark() ->
    fsnotify_add_inode_mark() -> (add mark to inode->i_fsnotify_marks)
    ...
    get_tree(); (each inode will get one)
    ...
    lock audit_filter_mutex

    The work flow for deleting an inode:

    __destroy_inode() ->
    fsnotify_inode_delete() ->
    __fsnotify_inode_delete() ->
    fsnotify_clear_marks_by_inode() -> (get mark from inode->i_fsnotify_marks)
    fsnotify_destroy_mark() ->
    fsnotify_destroy_mark_locked() ->
    audit_tree_freeing_mark() ->
    evict_chunk() ->
    ...
    tree->goner = 1
    ...
    kill_rules() -> (assume current->audit_context == NULL)
    call_rcu() -> (rule->tree != NULL)
    audit_free_rule_rcu() ->
    audit_free_rule()
    ...
    audit_schedule_prune() -> (assume current->audit_context == NULL)
    kthread_run() -> (need audit_cmd_mutex and audit_filter_mutex lock)
    prune_one() -> (delete it from prune_list)
    put_tree(); (match the original get_tree above)

    Signed-off-by: Chen Gang
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • When a page fault hits an address backed by a hugepage under
    migration, the kernel cannot wait correctly and instead busy-loops on
    the hugepage fault until the migration finishes. As a result, users who
    try to kick hugepage migration (via soft offlining, for example)
    occasionally experience long delays or soft lockups.

    This is because pte_offset_map_lock() can't get a correct migration entry
    or a correct page table lock for a hugepage. This patch introduces
    migration_entry_wait_huge() to solve this.

    Signed-off-by: Naoya Horiguchi
    Reviewed-by: Rik van Riel
    Reviewed-by: Wanpeng Li
    Reviewed-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: KOSAKI Motohiro
    Cc: [2.6.35+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.

    Signed-off-by: joyce
    Reviewed-by: shencanquan
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
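  • The watermark check consists of two sub-checks. The first one is:

        if (free_pages <= min + lowmem_reserve)
                return false;

    It ensures that there is a minimal amount of RAM in the zone. If CMA is
    used and the allocation cannot use CMA areas, free_pages has already
    been reduced by the number of free CMA pages before this check.

    The second check prevents the zone from getting too fragmented:

        for (o = 0; o < order; o++) {
                free_pages -= z->free_area[o].nr_free << o;
                min >>= 1;
                if (free_pages <= min)
                        return false;
        }

    The field z->free_area[o].nr_free is equal to the number of free pages
    including free CMA pages. Therefore the CMA pages are subtracted twice.
    This may cause a false positive fail of __zone_watermark_ok() if the CMA
    area gets strongly fragmented. In such a case there are many 0-order
    free pages located in CMA. Those pages are subtracted twice, so they
    quickly drain free_pages during the check against fragmentation. The
    test fails even though there are many free non-CMA pages in the zone.

    This patch fixes this issue by subtracting CMA pages only for the
    purpose of the (free_pages <= min + lowmem_reserve) check.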
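
    A simplified userspace model reproduces the double subtraction.
    struct zone_sketch and both functions are hypothetical. They keep only
    the parts of __zone_watermark_ok() discussed above and omit the
    ALLOC_HIGH and ALLOC_HARDER adjustments to min. watermark_ok_old()
    removes free CMA pages before both sub-checks. watermark_ok_new() uses
    the reduced count only for the first one.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER_SKETCH 11

/* Hypothetical, heavily simplified stand-in for struct zone. */
struct zone_sketch {
    long nr_free[MAX_ORDER_SKETCH];  /* free blocks per order, CMA blocks included */
    long free_cma;                   /* free pages inside CMA areas */
    long lowmem_reserve;
};

/* Before the fix: CMA pages leave free_pages up front, then again via nr_free[]. */
static bool watermark_ok_old(const struct zone_sketch *z, int order, long min,
                             long free_pages, bool can_use_cma)
{
    assert(order >= 0 && order < MAX_ORDER_SKETCH);
    free_pages -= (1L << order) - 1;
    if (!can_use_cma)
        free_pages -= z->free_cma;
    if (free_pages <= min + z->lowmem_reserve)
        return false;
    for (int o = 0; o < order; o++) {
        free_pages -= z->nr_free[o] << o;   /* this order's pages become unusable */
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}

/* After the fix: free CMA pages only count against the first check. */
static bool watermark_ok_new(const struct zone_sketch *z, int order, long min,
                             long free_pages, bool can_use_cma)
{
    long free_cma = can_use_cma ? 0 : z->free_cma;

    assert(order >= 0 && order < MAX_ORDER_SKETCH);
    free_pages -= (1L << order) - 1;
    if (free_pages - free_cma <= min + z->lowmem_reserve)
        return false;
    for (int o = 0; o < order; o++) {
        free_pages -= z->nr_free[o] << o;
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}
```

    Take a zone that has 500 free order-0 pages, all in CMA, plus 250 free
    non-CMA order-1 blocks (1000 free pages in total), with min = 100. An
    order-1 request that cannot use CMA fails the old check, because
    free_pages drops to -1 in the first loop iteration. The same request
    passes the new check.
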
    Signed-off-by: Kyungmin Park
    Tested-by: Laura Abbott
    Cc: Bartlomiej Zolnierkiewicz
    Acked-by: Minchan Kim
    Cc: Mel Gorman
    Tested-by: Marek Szyprowski
    Cc: [3.7+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomasz Stanislawski
     
  • The "info.fill" array isn't initialized so it can leak uninitialized stack
    information to user space.

    Signed-off-by: Dan Carpenter
    Acked-by: Robin Holt
    Acked-by: Dimitri Sivanich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • There was a regression introduced by 36f5588905c1 ("aio: refcounting
    cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
    using RCU in the shutdown path, but the synchronize_rcu() was done in
    the context of the io_destroy() syscall greatly increasing the time it
    could block.

    This patch switches it to call_rcu() and makes shutdown asynchronous
    (more asynchronous than it was originally; before the refcount changes
    io_destroy() would still wait on pending kiocbs).

    Note that there's a global quota on the max outstanding kiocbs, and that
    quota must be manipulated synchronously; otherwise io_setup() could
    return -EAGAIN when there isn't quota available, and userspace won't
    have any way of waiting until shutdown of the old kioctxs has finished
    (besides busy looping).

    So we release our quota before kioctx shutdown has finished, which
    should be fine since the quota never corresponded to anything real
    anyway.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Reported-by: Jens Axboe
    Tested-by: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Signed-off-by: Benjamin LaHaise
    Tested-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Add support for the at91sam9x5-family which must use the shadow
    interrupt mask due to a hardware issue (causing RTC_IMR to always be
    zero).

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • Add shadow interrupt-mask register which can be used on SoCs where the
    actual hardware register is broken.

    Note that some care needs to be taken to make sure the shadow mask
    corresponds to the actual hardware state. The added overhead is not an
    issue for the non-broken SoCs due to the relatively infrequent
    interrupt-mask updates. We do, however, only use the shadow mask value
    as a fall-back when it is actually needed, as there is still a
    theoretical possibility that the mask is incorrect (see the code for
    details).
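
    Here is a minimal model of the idea. It assumes a register interface
    like the at91 RTC's: IER to enable, IDR to disable, and IMR to read the
    mask. The struct, the valid-bit mask and the function names are
    hypothetical. The locking and ordering care mentioned above is omitted.
    Every enable or disable also updates a driver-side copy, and the mask is
    read from that copy only when the hardware register is known to be
    broken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTC_IRQ_BITS_SKETCH 0x1fu   /* hypothetical set of valid interrupt bits */

struct rtc_sketch {
    bool imr_broken;      /* per-SoC configuration, e.g. set for at91sam9x5 */
    uint32_t hw_imr;      /* mask as the hardware reports it */
    uint32_t shadow_imr;  /* driver-maintained copy of the mask */
};

/* Enable interrupts (models a write to IER). */
static void rtc_irq_enable(struct rtc_sketch *rtc, uint32_t mask)
{
    assert((mask & ~RTC_IRQ_BITS_SKETCH) == 0);
    rtc->shadow_imr |= mask;
    if (!rtc->imr_broken)
        rtc->hw_imr |= mask;   /* a broken IMR keeps reading as zero */
}

/* Disable interrupts (models a write to IDR). */
static void rtc_irq_disable(struct rtc_sketch *rtc, uint32_t mask)
{
    assert((mask & ~RTC_IRQ_BITS_SKETCH) == 0);
    rtc->shadow_imr &= ~mask;
    if (!rtc->imr_broken)
        rtc->hw_imr &= ~mask;
}

/* Read the current mask, falling back to the shadow only when needed. */
static uint32_t rtc_irq_mask(const struct rtc_sketch *rtc)
{
    return rtc->imr_broken ? rtc->shadow_imr : rtc->hw_imr;
}
```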

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • Add accessors for the interrupt register.

    This will allow us to easily add a shadow interrupt-mask register to use
    on SoCs where the interrupt-mask register cannot be used.

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • Add configuration support which can be used to implement SoC-specific
    workarounds for broken hardware.

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • The members of Atmel's at91sam9x5 family (9x5) have a broken RTC
    interrupt mask register (AT91_RTC_IMR). It does not reflect enabled
    interrupts but instead always returns zero.

    The kernel's rtc-at91rm9200 driver handles the RTC for the 9x5 family.
    Currently when the date/time is set, an interrupt is generated and this
    driver neglects to handle the interrupt. The kernel complains about the
    un-handled interrupt and disables it henceforth. This not only breaks
    the RTC function, but since that interrupt is shared (Atmel's SYS
    interrupt) then other things break as well (e.g. the debug port no
    longer accepts characters).

    Tested on the at91sam9g25. Bug confirmed by Atmel.

    This patch (of 5):

    Add missing match-table compile guard.

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • While removing a non-empty directory, the kernel dumps a message:

    (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39

    Suppress the error message so that it is not printed to dmesg and users
    don't panic.

    Signed-off-by: Goldwyn Rodrigues
    Cc: Mark Fasheh
    Cc: Joel Becker
    Acked-by: Sunil Mushran
    Reviewed-by: Jie Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     
  • read_swap_cache_async() can race against get_swap_page(), and stumble
    across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought
    into the swapcache yet.

    This swap_map state is expected to be transitory, but the actual
    placement of discard at scan_swap_map() inserts a wait for I/O
    completion. This makes the thread in read_swap_cache_async() loop
    around its -EEXIST case while the other end, in get_swap_page(), is
    scheduled away at scan_swap_map(). This can leave the system deadlocked
    if the I/O completion happens to be waiting on the CPU waitqueue where
    read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.

    This patch introduces a cond_resched() call so that the
    read_swap_cache_async() busy loop can bail out when necessary, thus
    avoiding the subtle race window.
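
    The fix is a retry loop that gives up the CPU between attempts, so the
    party it is waiting on can run. In userspace that shape can be sketched
    with sched_yield() standing in for cond_resched(). The substitution is
    only an analogy. attempt_fn and countdown_attempt() are hypothetical,
    and -EEXIST plays the role of the transient SWAP_HAS_CACHE state.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical: returns -EEXIST while the entry is still held by someone
 * else, and 0 (or another error) once the caller can proceed. */
typedef int (*attempt_fn)(void *ctx);

/* Retry while the state is transient, yielding the CPU between attempts
 * (sched_yield() here stands in for the kernel's cond_resched()). */
static int retry_with_yield(attempt_fn attempt, void *ctx, unsigned *spins)
{
    int err;

    assert(attempt != NULL && spins != NULL);
    *spins = 0;
    while ((err = attempt(ctx)) == -EEXIST) {
        (*spins)++;
        sched_yield();
    }
    return err;
}

/* Toy attempt function: reports -EEXIST until *remaining reaches zero. */
static int countdown_attempt(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EEXIST;
    }
    return 0;
}
```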

    Signed-off-by: Rafael Aquini
    Acked-by: Johannes Weiner
    Acked-by: KOSAKI Motohiro
    Acked-by: Hugh Dickins
    Cc: Shaohua Li
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • When booted in legacy mode, device_init_wakeup() gets called by
    drivers/mfd/twl-core.c when the children are initialized. However, when
    booted using device tree, the children are created with
    of_platform_populate() instead of add_children().

    This means that the RTC driver will not have device_init_wakeup() set,
    and we need to call it from the driver probe like RTC drivers typically
    do.

    Without this we cannot test PM wake-up events on OMAPs for cases where
    there may not be any physical wake-up event.

    Signed-off-by: Tony Lindgren
    Reported-by: Kevin Hilman
    Cc: Alessandro Zummo
    Cc: Jingoo Han
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Lindgren
     
  • If a new logical drive is added and the CCISS_REGNEWD ioctl is invoked
    (as is normal with the Array Configuration Utility) the process will
    hang as below. It attempts to acquire the same mutex twice, once in
    do_ioctl() and once in cciss_unlocked_open(). The BKL was recursive,
    the mutex isn't.

    Linux version 3.10.0-rc2 (scameron@localhost.localdomain) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri May 24 14:32:12 CDT 2013
    [...]
    acu D 0000000000000001 0 3246 3191 0x00000080
    Call Trace:
    schedule+0x29/0x70
    schedule_preempt_disabled+0xe/0x10
    __mutex_lock_slowpath+0x17b/0x220
    mutex_lock+0x2b/0x50
    cciss_unlocked_open+0x2f/0x110 [cciss]
    __blkdev_get+0xd3/0x470
    blkdev_get+0x5c/0x1e0
    register_disk+0x182/0x1a0
    add_disk+0x17c/0x310
    cciss_add_disk+0x13a/0x170 [cciss]
    cciss_update_drive_info+0x39b/0x480 [cciss]
    rebuild_lun_table+0x258/0x370 [cciss]
    cciss_ioctl+0x34f/0x470 [cciss]
    do_ioctl+0x49/0x70 [cciss]
    __blkdev_driver_ioctl+0x28/0x30
    blkdev_ioctl+0x200/0x7b0
    block_ioctl+0x3c/0x40
    do_vfs_ioctl+0x89/0x350
    SyS_ioctl+0xa1/0xb0
    system_call_fastpath+0x16/0x1b

    This mutex usage was added into the ioctl path when the big kernel lock
    was removed. As it turns out, these paths are all thread safe anyway
    (or can easily be made so) and we don't want ioctl() to be single
    threaded in any case.

    Signed-off-by: Stephen M. Cameron
    Cc: Jens Axboe
    Cc: Mike Miller
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen M. Cameron
     
  • audit_log_start() does wait_for_auditd() in a loop until
    audit_backlog_wait_time passes or audit_skb_queue has room.

    If signal_pending() is true, this becomes a busy-wait loop, because
    schedule() in TASK_INTERRUPTIBLE won't block.

    Thanks to Guy for fully investigating and explaining the problem.

    (akpm: that'll cause the system to lock up on a non-preemptible
    uniprocessor kernel)

    (Guy: "Our customer was in fact running a uniprocessor machine, and they
    reported a system hang.")

    Signed-off-by: Oleg Nesterov
    Reported-by: Guy Streeter
    Cc: Eric Paris
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • During resume, we call hpet_rtc_timer_init after masking an irq bit in
    hpet. This will cause the call to hpet_disable_rtc_channel to be undone
    if RTC_AIE is the only bit not masked.

    Allowing the cmos interrupt handler to run before resuming caused some
    issues where the timer for the alarm was not removed. This would cause
    other, later timers to not be cleared, so utilities such as hwclock
    would time out when waiting for the update interrupt.

    [akpm@linux-foundation.org: coding-style tweak]
    Signed-off-by: Derek Basehore
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Derek Basehore
     
  • Use device_init_wakeup() instead of device_set_wakeup_capable() and move
    it before RTC device registration. This fixes the alarmtimer not being
    registered when the tps6586x RTC is the only wakeup-capable RTC in the
    system.

    Signed-off-by: Dmitry Osipenko
    Cc: Laxman Dewangan
    Cc: Jingoo Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Osipenko
     
  • struct memcg_cache_params has a union. Different parts of this union
    are used for root and non-root caches. A part with destroying work is
    used only for non-root caches.

    BUG: unable to handle kernel paging request at 0000000fffffffe0
    IP: kmem_cache_alloc+0x41/0x1f0
    Modules linked in: netlink_diag af_packet_diag udp_diag tcp_diag inet_diag unix_diag ip6table_filter ip6_tables i2c_piix4 virtio_net virtio_balloon microcode i2c_core pcspkr floppy
    CPU: 0 PID: 1929 Comm: lt-vzctl Tainted: G D 3.10.0-rc1+ #2
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    RIP: kmem_cache_alloc+0x41/0x1f0
    Call Trace:
    getname_flags.part.34+0x30/0x140
    getname+0x38/0x60
    do_sys_open+0xc5/0x1e0
    SyS_open+0x22/0x30
    system_call_fastpath+0x16/0x1b
    Code: f4 53 48 83 ec 18 8b 05 8e 53 b7 00 4c 8b 4d 08 21 f0 a8 10 74 0d 4c 89 4d c0 e8 1b 76 4a 00 4c 8b 4d c0 e9 92 00 00 00 4d 89 f5 8b 45 00 65 4c 03 04 25 48 cd 00 00 49 8b 50 08 4d 8b 38 49
    RIP [] kmem_cache_alloc+0x41/0x1f0

    Signed-off-by: Andrey Vagin
    Cc: Konstantin Khlebnikov
    Cc: Glauber Costa
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Cc: Li Zefan
    Cc: [3.9.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir(),
    ocfs2_prep_new_orphaned_file() releases the inode_ac. Its caller still
    gets a return value of 0 and goes on to dereference a NULL
    ocfs2_alloc_context struct in the following functions, causing a kernel
    panic.

    Signed-off-by: "Xiaowei.Hu"
    Reviewed-by: shencanquan
    Acked-by: Sunil Mushran
    Cc: Joe Jin
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaowei.Hu
     
  • The 'while' loop must stop when 'nbytes == 0', otherwise it misbehaves:
    'nbytes' is a size_t, so 'nbytes >= 0' is always true.

    The related warning: (with EXTRA_CFLAGS=-W)

    lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
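
    A loop of the corrected shape strips leading zero bytes only while the
    unsigned length is still positive. The helper name is hypothetical, and
    this is not the mpicoder.c code itself. The bound must be nbytes > 0.
    Because nbytes >= 0 is always true for a size_t, an all-zero buffer
    would otherwise be read past its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Skip leading zero bytes, updating *nbytes to the remaining length. */
static const uint8_t *skip_leading_zeros(const uint8_t *buffer, size_t *nbytes)
{
    assert(nbytes != NULL);
    assert(buffer != NULL || *nbytes == 0);
    /* "> 0", not ">= 0": an unsigned count can never go below zero. */
    while (*nbytes > 0 && buffer[0] == 0) {
        buffer++;
        (*nbytes)--;
    }
    return buffer;
}
```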

    Signed-off-by: Chen Gang
    Cc: Rusty Russell
    Cc: David Howells
    Cc: James Morris
    Cc: Andy Shevchenko
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • The dmesg_restrict sysctl currently covers the syslog method for
    accessing dmesg, but /dev/kmsg isn't covered by the same protections.
    Most people haven't noticed because older versions of util-linux
    dmesg(1) default to the syslog method. Newer versions of util-linux
    dmesg(1) default to reading directly from /dev/kmsg.

    To fix /dev/kmsg, let's compare the existing interfaces and what they
    allow:

    - /proc/kmsg allows:
      - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
        single-reader interface (SYSLOG_ACTION_READ).
      - everything, after an open.

    - syslog syscall allows:
      - anything, if CAP_SYSLOG.
      - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
        dmesg_restrict==0.
      - nothing else (EPERM).

    The use-cases were:
    - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
    - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
      destructive SYSLOG_ACTION_READs.

    AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
    clear the ring buffer.

    Based on the comments in devkmsg_llseek, it sounds like actions besides
    reading aren't going to be supported by /dev/kmsg (e.g.
    SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
    syslog syscall actions.

    To this end, move the check as Josh had done, but also rename the
    constants to reflect their new uses (SYSLOG_FROM_CALL becomes
    SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
    SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
    allows destructive actions after a capabilities-constrained
    SYSLOG_ACTION_OPEN check.

    - /dev/kmsg allows:
      - open if CAP_SYSLOG or dmesg_restrict==0
      - reading/polling, after open
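
    The resulting policy can be summarized as two predicates. This sketch
    restates the rules listed above and is not the kernel's printk code. The
    enum and function names are hypothetical, and the enum values do not
    match the kernel's SYSLOG_ACTION_* numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of syslog actions (values are illustrative only). */
enum syslog_action_sketch {
    SKETCH_OPEN, SKETCH_READ, SKETCH_READ_ALL, SKETCH_CLEAR, SKETCH_SIZE_BUFFER,
};

/* syslog(2): anything with CAP_SYSLOG; otherwise only the non-destructive
 * READ_ALL and SIZE_BUFFER actions, and only when dmesg_restrict == 0. */
static bool syslog_call_allowed(enum syslog_action_sketch action,
                                bool cap_syslog, int dmesg_restrict)
{
    assert(dmesg_restrict == 0 || dmesg_restrict == 1);
    if (cap_syslog)
        return true;
    if (dmesg_restrict != 0)
        return false;
    return action == SKETCH_READ_ALL || action == SKETCH_SIZE_BUFFER;
}

/* /dev/kmsg: open needs CAP_SYSLOG or dmesg_restrict == 0; reading and
 * polling are then allowed on the opened file. */
static bool devkmsg_open_allowed(bool cap_syslog, int dmesg_restrict)
{
    assert(dmesg_restrict == 0 || dmesg_restrict == 1);
    return cap_syslog || dmesg_restrict == 0;
}
```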

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

    [akpm@linux-foundation.org: use pr_warn_once()]
    Signed-off-by: Kees Cook
    Reported-by: Christian Kujau
    Tested-by: Josh Boyer
    Cc: Kay Sievers
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • We recently noticed that reboot of a 1024 cpu machine takes approx 16
    minutes of just stopping the cpus. The slowdown was tracked to commit
    f96972f2dc63 ("kernel/sys.c: call disable_nonboot_cpus() in
    kernel_restart()").

    The current implementation does all the work of hot removing the cpus
    before halting the system. We are switching to just migrating to the
    boot cpu and then continuing with shutdown/reboot.

    This also has the effect of not breaking x86's command line parameter
    for specifying the reboot cpu. Note, this code was shamelessly copied
    from arch/x86/kernel/reboot.c with bits removed pertaining to the
    reboot_cpu command line parameter.

    Signed-off-by: Robin Holt
    Tested-by: Shawn Guo
    Cc: "Srivatsa S. Bhat"
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
     
  • There are instances in the kernel where we would like to disable CPU
    hotplug (from sysfs) during some important operation. Today the freezer
    code depends on this and the code to do it was kinda tailor-made for
    that.

    Restructure the code and make it generic enough to be useful for other
    usecases too.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Robin Holt
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Cc: Shawn Guo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
     
  • Pull MIPS fixes from Ralf Baechle:
    "Resurrect Alchemy platforms by invoking the WAIT instructions with
    interrupts enabled. This still leaves the race condition between
    testing TIF_NEED_RESCHED and the WAIT instruction for Alchemy
    platforms which need a different fix than other MIPS platforms. But
    at least it gets MIPS platforms flying again.

    There are also fixes for two build errors (CONFIG_FTRACE=y with
    CONFIG_DYNAMIC_FTRACE=n, and CONFIG_VIRTUALIZATION without CONFIG_KVM)"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
    MIPS: ftrace: Add missing CONFIG_DYNAMIC_FTRACE
    MIPS: include: mmu_context.h: Replace VIRTUALIZATION with KVM
    MIPS: Alchemy: fix wait function

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Just some GMA500 memory leak fixes and an i915 fix for a regression
    caused by an earlier regression fix"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
    drm/i915: prefer VBT modes for SVDO-LVDS over EDID
    drm/i915: Enable hotplug interrupts after querying hw capabilities.
    drm/i915: Fix hotplug interrupt enabling for SDVOC
    drm/gma500/cdv: Fix cursor gem obj referencing on cdv
    drm/gma500/psb: Fix cursor gem obj referencing on psb
    drm/gma500/cdv: Unpin framebuffer on crtc disable
    drm/gma500/psb: Unpin framebuffer on crtc disable
    drm/gma500: Add fb gtt offset to fb base

    Linus Torvalds
     

12 Jun, 2013

5 commits

  • …it/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "Yoshihiro Yunomae fixed a regression in the output format when using
    one of the counter clocks.

    The new multibuffer code changed the trace_clock file to update the
    trace instances tr->clock_id but the actual traces still used the
    value from the obsolete global variable trace_clock_id"

    * tag 'trace-fixes-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix outputting formats of x86-tsc and counter when use trace_clock

    Linus Torvalds
     
  • Pull ceph fixes from Sage Weil:
    "There is a pair of fixes for double-frees in the recent bundle for
    3.10, a couple of fixes for long-standing bugs (sleep while atomic and
    an endianness fix), and a locking fix that can be triggered when osds
    are going down"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix cleanup in rbd_add()
    rbd: don't destroy ceph_opts in rbd_add()
    ceph: ceph_pagelist_append might sleep while atomic
    ceph: add cpu_to_le32() calls when encoding a reconnect capability
    libceph: must hold mutex for reset_changed_osds()

    Linus Torvalds
     
  • Pull NVMe fixes from Matthew Wilcox.

    * 'fixes-3.10' of git://git.infradead.org/users/willy/linux-nvme:
    NVMe: Add MSI support
    NVMe: Use dma_set_mask() correctly
    Return the result from user admin command IOCTL even in case of failure
    NVMe: Do not cancel command multiple times
    NVMe: fix error return code in nvme_submit_bio_queue()
    NVMe: check for integer overflow in nvme_map_user_pages()
    MAINTAINERS: update NVM EXPRESS DRIVER file list
    NVMe: Fix a signedness bug in nvme_trans_modesel_get_mp
    NVMe: Remove redundant version.h header include

    Linus Torvalds
     
  • Pull kvm bugfixes from Gleb Natapov:
    "There is one more fix for MIPS KVM ABI here, MIPS and PPC build
    breakage fixes and a couple of PPC bug fixes"

    * 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
    kvm/ppc/booke: Hold srcu lock when calling gfn functions
    kvm/ppc/booke64: Disable e6500 support
    kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
    mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG
    kvm: Add definition of KVM_REG_MIPS
    KVM: add kvm_para_available to asm-generic/kvm_para.h

    Linus Torvalds
     
  • The output format of the x86-tsc and counter clocks should be raw, but after
    applying the patch (2b6080f28c7cc3efc8625ab71495aae89aeb63a0), the format was
    changed to nanoseconds. This is because the global variable trace_clock_id was
    used. When multiple buffers are used, the clock_id of each sub-buffer should be
    used, so this patch uses tr->clock_id instead of the global variable trace_clock_id.

    [ Basically, this fixes a regression where the multibuffer code changed the
    trace_clock file to update tr->clock_id but the traces still use the old
    global trace_clock_id variable, negating the file's effect. The global
    trace_clock_id variable is obsolete and removed. - SR ]
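
    Here is a userspace sketch of the fixed lookup. Each trace instance
    carries its own clock_id, and the output format comes from that
    instance's clock rather than from a global. The clock table, the struct
    and the function names are hypothetical. The "seconds.microseconds"
    rendering for nanosecond clocks is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct clock_sketch {
    const char *name;
    bool in_ns;   /* true: nanosecond clock; false: raw counter */
};

static const struct clock_sketch clocks_sketch[] = {
    { "local", true }, { "global", true }, { "counter", false }, { "x86-tsc", false },
};
#define NR_CLOCKS_SKETCH (sizeof(clocks_sketch) / sizeof(clocks_sketch[0]))

struct trace_array_sketch {
    int clock_id;   /* per-instance clock, set through its trace_clock file */
};

/* Models writing a clock name to one instance's trace_clock file. */
static bool set_trace_clock(struct trace_array_sketch *tr, const char *name)
{
    for (size_t i = 0; i < NR_CLOCKS_SKETCH; i++) {
        if (strcmp(clocks_sketch[i].name, name) == 0) {
            tr->clock_id = (int)i;
            return true;
        }
    }
    return false;
}

/* Formats a timestamp using tr->clock_id, not a global clock id. */
static int format_ts(const struct trace_array_sketch *tr, unsigned long long ts,
                     char *buf, size_t len)
{
    assert(tr->clock_id >= 0 && (size_t)tr->clock_id < NR_CLOCKS_SKETCH);
    if (clocks_sketch[tr->clock_id].in_ns)
        return snprintf(buf, len, "%llu.%06llu", ts / 1000000000ULL,
                        (ts % 1000000000ULL) / 1000ULL);
    return snprintf(buf, len, "%llu", ts);   /* raw count, no unit conversion */
}
```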

    Link: http://lkml.kernel.org/r/20130423013239.22334.7394.stgit@yunodevel

    Signed-off-by: Yoshihiro YUNOMAE
    Signed-off-by: Steven Rostedt

    Yoshihiro YUNOMAE
     

11 Jun, 2013

4 commits

  • …rm-intel into drm-fixes

    Daniel writes:
    Just tiny regression fixes here:
    - Two fixes to fix sdvo hotplug which broke in the hpd storm detection
    work.
    - One fix to patch-up the sdvo lvds regression fixer from the last pull -
    we need to prefer the vbt mode over edid modes.

    * tag 'drm-intel-fixes-2013-06-11' of git://people.freedesktop.org/~danvet/drm-intel:
    drm/i915: prefer VBT modes for SVDO-LVDS over EDID
    drm/i915: Enable hotplug interrupts after querying hw capabilities.
    drm/i915: Fix hotplug interrupt enabling for SDVOC

    Dave Airlie
     
  • EE is hard-disabled on entry to kvmppc_handle_exit(), so call
    hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
    is unset.

    Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
    and sometimes the host kernel hangs.

    Signed-off-by: Scott Wood
    Signed-off-by: Gleb Natapov

    Scott Wood
     
  • KVM core expects arch code to acquire the srcu lock when calling
    gfn_to_memslot and similar functions.

    Signed-off-by: Scott Wood
    Signed-off-by: Gleb Natapov

    Scott Wood
     
  • The previous patch made 64-bit booke KVM build again, but Altivec
    support is still not complete, and we can't prevent the guest from
    turning on Altivec (which can corrupt host state until state
    save/restore is implemented). Disable e6500 on KVM until this is
    fixed.

    Signed-off-by: Scott Wood
    Signed-off-by: Gleb Natapov

    Scott Wood