10 Jan, 2015

1 commit

  • Pull kgdb/kdb fixes from Jason Wessel:
    "These have been around since 3.17 and in kgdb-next for the last 9
    weeks and some will go back to -stable.

    Summary of changes:

    Cleanups
    - kdb: Remove unused command flags, repeat flags and KDB_REPEAT_NONE

    Fixes
    - kgdb/kdb: Allow access on a single core, if a CPU round up is
    deemed impossible, which will allow inspection of the now "trashed"
    kernel
    - kdb: Add enable mask for the command groups
    - kdb: access controls to restrict sensitive commands"

    * tag 'for_linus-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
    kernel/debug/debug_core.c: Logging clean-up
    kgdb: timeout if secondary CPUs ignore the roundup
    kdb: Allow access to sensitive commands to be restricted by default
    kdb: Add enable mask for groups of commands
    kdb: Categorize kdb commands (similar to SysRq categorization)
    kdb: Remove KDB_REPEAT_NONE flag
    kdb: Use KDB_REPEAT_* values as flags
    kdb: Rename kdb_register_repeat() to kdb_register_flags()
    kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags
    kdb: Remove currently unused kdbtab_t->cmd_flags

    Linus Torvalds
     

09 Jan, 2015

1 commit

  • wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE, and
    both checks can fail if we race with an EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE
    change in between, because gcc needs to reload p->exit_state after the
    security_task_wait() call. In this case ->notask_error will be wrongly
    cleared and do_wait() can hang forever if it was the last eligible
    child.

    Many thanks to Arne who carefully investigated the problem.

    Note: this bug is very old but it was purely theoretical until commit
    b3ab03160dfa ("wait: completely ignore the EXIT_DEAD tasks"). Before
    this commit "-O2" was probably enough to guarantee that the compiler
    won't read ->exit_state twice.
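
    A toy model of the race, written in Python rather than kernel C. The
    class and function names are hypothetical; the sketch only illustrates
    why two separate reads of ->exit_state can miss both checks, while a
    decision based on a single snapshot stays consistent:

```python
class RacyTask:
    """Toy task whose exit_state changes between consecutive reads."""

    def __init__(self, states):
        self._states = list(states)
        self._reads = 0

    @property
    def exit_state(self):
        # Each read returns the next queued state; the last one sticks.
        idx = min(self._reads, len(self._states) - 1)
        self._reads += 1
        return self._states[idx]


def consider_two_reads(task):
    # First read: is the task already dead?
    if task.exit_state == "EXIT_DEAD":
        return "ignore"
    # Second read: the state may have changed since the first one.
    if task.exit_state == "EXIT_ZOMBIE":
        return "reap"
    return "fell through"


def consider_snapshot(task):
    # Read once and make every decision on the same value.
    state = task.exit_state
    if state == "EXIT_DEAD":
        return "ignore"
    if state == "EXIT_ZOMBIE":
        return "reap"
    return "fell through"


# The task goes EXIT_ZOMBIE -> EXIT_DEAD between the two reads.
print(consider_two_reads(RacyTask(["EXIT_ZOMBIE", "EXIT_DEAD"])))  # fell through
print(consider_snapshot(RacyTask(["EXIT_ZOMBIE", "EXIT_DEAD"])))   # reap
```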

    Signed-off-by: Oleg Nesterov
    Reported-by: Arne Goedeke
    Tested-by: Arne Goedeke
    Cc: [3.15+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

01 Jan, 2015

1 commit

  • Pull audit fix from Paul Moore:
    "One audit patch to resolve a panic/oops when recording filenames in
    the audit log, see the mail archive link below.

    The fix isn't as nice as I would like, as it involves an allocate/copy
    of the filename, but it solves the problem and the overhead should
    only affect users who have configured audit rules involving file
    names.

    We'll revisit this issue with future kernels in an attempt to make
    this suck less, but in the meantime I think this fix should go into
    the next release of v3.19-rcX.

    [ https://marc.info/?t=141986927600001&r=1&w=2 ]"

    * 'upstream' of git://git.infradead.org/users/pcmoore/audit:
    audit: create private file name copies when auditing inodes

    Linus Torvalds
     

31 Dec, 2014

1 commit

  • Pull networking fixes from David Miller:

    1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.

    2) Fix receive checksum handling in enic driver, from Govindarajulu
    Varadarajan.

    3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
    Herbert Xu. Also, add code to detect drivers that have this mistake
    in the future.

    4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.

    5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
    input path, from Nicolas Dichtel.

    6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.

    7) Fix double SKB free in vxlan driver, also from Pravin.

    8) When we scrub a packet, which happens when we are switching the
    context of the packet (namespace, etc.), we should reset the
    secmark. From Thomas Graf.

    9) ->ndo_gso_check() needs to do more than return true/false, it also
    has to allow the driver to clear netdev feature bits in order for
    the caller to be able to proceed properly. From Jesse Gross.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
    genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
    netlink/genetlink: pass network namespace to bind/unbind
    ne2k-pci: Add pci_disable_device in error handling
    bonding: change error message to debug message in __bond_release_one()
    genetlink: pass multicast bind/unbind to families
    netlink: call unbind when releasing socket
    netlink: update listeners directly when removing socket
    genetlink: pass only network namespace to genl_has_listeners()
    netlink: rename netlink_unbind() to netlink_undo_bind()
    net: Generalize ndo_gso_check to ndo_features_check
    net: incorrect use of init_completion fixup
    neigh: remove next ptr from struct neigh_table
    net: xilinx: Remove unnecessary temac_property in the driver
    net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
    net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
    openvswitch: fix odd_ptr_err.cocci warnings
    Bluetooth: Fix accepting connections when not using mgmt
    Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
    brcmfmac: Do not crash if platform data is not populated
    ipw2200: select CFG80211_WEXT
    ...

    Linus Torvalds
     

30 Dec, 2014

1 commit

  • Unfortunately, while commit 4a928436 ("audit: correctly record file
    names with different path name types") fixed a problem where we were
    not recording filenames, it created a new problem by attempting to use
    these file names after they had been freed. This patch resolves the
    issue by creating a copy of the filename which the audit subsystem
    frees after it is done with the string.

    At some point it would be nice to resolve this issue with refcounts,
    or something similar, instead of having to allocate/copy strings, but
    that is almost surely beyond the scope of a -rcX patch so we'll defer
    that for later. On the plus side, only audit users should be impacted
    by the string copying.
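
    A rough illustration of the trade-off described above, as a Python toy
    model rather than the audit code itself. All function names here are
    hypothetical. A recorder that only borrows the caller's buffer sees
    whatever the caller does to it afterwards, while a private copy
    survives at the cost of an extra allocation:

```python
def record_borrowed(buf):
    # Keep a view of the caller's buffer: no copy is made.
    return memoryview(buf)


def record_private_copy(buf):
    # Take an independent copy that the recorder owns.
    return bytes(buf)


def release(buf):
    # Stand-in for freeing: the caller scrubs its buffer for reuse.
    for i in range(len(buf)):
        buf[i] = 0


name = bytearray(b"test/567")
borrowed = record_borrowed(name)
private = record_private_copy(name)
release(name)

print(bytes(borrowed))  # the scrubbed bytes, no longer the file name
print(private)          # b'test/567'
```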

    Reported-by: Toralf Foerster
    Signed-off-by: Paul Moore

    Paul Moore
     

27 Dec, 2014

1 commit

  • Netlink families can exist in multiple namespaces, and for the most
    part multicast subscriptions are per network namespace. Thus it only
    makes sense to have bind/unbind notifications per network namespace.

    To achieve this, pass the network namespace of a given client socket
    to the bind/unbind functions.

    Also do this in generic netlink, and there also make sure that any
    bind for multicast groups that only exist in init_net is rejected.
    This isn't really a problem if it is accepted since a client in a
    different namespace will never receive any notifications from such
    a group, but it can confuse the family if not rejected (it's also
    possible to silently (without telling the family) accept it, but it
    would also have to be ignored on unbind so families that take any
    kind of action on bind/unbind won't do unnecessary work for invalid
    clients like that).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

24 Dec, 2014

2 commits

  • Pull audit fixes from Paul Moore:
    "Four patches to fix various problems with the audit subsystem, all are
    fairly small and straightforward.

    One patch fixes a problem where we weren't using the correct gfp
    allocation flags (GFP_KERNEL regardless of context, oops), one patch
    fixes a problem with old userspace tools (this was broken for a
    while), one patch fixes a problem where we weren't recording pathnames
    correctly, and one fixes a problem with PID based filters.

    In general I don't think there is anything controversial with this
    patchset, and it fixes some rather unfortunate bugs; the allocation
    flag one can be particularly scary looking for users"

    * 'upstream' of git://git.infradead.org/users/pcmoore/audit:
    audit: restore AUDIT_LOGINUID unset ABI
    audit: correctly record file names with different path name types
    audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb
    audit: don't attempt to lookup PIDs when changing PID filtering audit rules

    Linus Torvalds
     
  • A regression was caused by commit 780a7654cee8:
    audit: Make testing for a valid loginuid explicit.
    (which in turn attempted to fix a regression caused by e1760bd)

    When audit_krule_to_data() fills in the rules to get a listing, there was a
    missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.

    This broke userspace by not returning the same information that was sent and
    expected.

    The rule:
    auditctl -a exit,never -F auid=-1
    gives:
    auditctl -l
    LIST_RULES: exit,never f24=0 syscall=all
    when it should give:
    LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all

    Tag it so that it is reported the same way it was set. Create a new
    private flags audit_krule field (pflags) to store it that won't interact with
    the public one from the API.
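
    The round trip can be sketched with a small Python model. This is not
    the kernel code: the flag name PFLAG_LEGACY and both function names are
    hypothetical, and the internal value 0 is taken from the "f24=0"
    listing above. The point is only that a private marker lets the listing
    path undo the conversion done when the rule was added:

```python
AUDIT_LOGINUID = "AUDIT_LOGINUID"
AUDIT_LOGINUID_SET = "AUDIT_LOGINUID_SET"
UNSET = 0xFFFFFFFF   # auid=-1 seen as an unsigned 32-bit value
PFLAG_LEGACY = 0x1   # hypothetical private flag bit, never shown to userspace


def to_kernel(field, value):
    """Convert a userspace rule field into the internal form."""
    if field == AUDIT_LOGINUID and value == UNSET:
        # Stored internally as "loginuid is set" == 0, plus a private marker.
        return {"field": AUDIT_LOGINUID_SET, "value": 0, "pflags": PFLAG_LEGACY}
    return {"field": field, "value": value, "pflags": 0}


def to_userspace(rule):
    """Convert the internal form back for a rule listing."""
    if rule["field"] == AUDIT_LOGINUID_SET and rule["pflags"] & PFLAG_LEGACY:
        # Report the rule the same way it was originally set.
        return (AUDIT_LOGINUID, UNSET)
    return (rule["field"], rule["value"])


print(to_userspace(to_kernel(AUDIT_LOGINUID, UNSET)) == (AUDIT_LOGINUID, UNSET))
```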

    Cc: stable@vger.kernel.org # v3.10-rc1+
    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     

23 Dec, 2014

1 commit

  • There is a problem with the audit system when multiple audit records
    are created for the same path, each with a different path name type.
    The root cause of the problem is in __audit_inode() when an exact
    match (both the path name and path name type) is not found for a
    path name record; the existing code creates a new path name record,
    but it never sets the path name in this record, leaving it NULL.
    This patch corrects this problem by assigning the path name to these
    newly created records.

    There are many ways to reproduce this problem, but one of the
    easiest is the following (assuming auditd is running):

    # mkdir /root/tmp/test
    # touch /root/tmp/test/567
    # auditctl -a always,exit -F dir=/root/tmp/test
    # touch /root/tmp/test/567

    Afterwards, or while the commands above are running, check the audit
    log and pay special attention to the PATH records. A faulty kernel
    will display something like the following for the file creation:

    type=SYSCALL msg=audit(1416957442.025:93): arch=c000003e syscall=2
    success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
    type=CWD msg=audit(1416957442.025:93): cwd="/root/tmp"
    type=PATH msg=audit(1416957442.025:93): item=0 name="test/"
    inode=401409 ... nametype=PARENT
    type=PATH msg=audit(1416957442.025:93): item=1 name=(null)
    inode=393804 ... nametype=NORMAL
    type=PATH msg=audit(1416957442.025:93): item=2 name=(null)
    inode=393804 ... nametype=NORMAL

    While a patched kernel will show the following:

    type=SYSCALL msg=audit(1416955786.566:89): arch=c000003e syscall=2
    success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
    type=CWD msg=audit(1416955786.566:89): cwd="/root/tmp"
    type=PATH msg=audit(1416955786.566:89): item=0 name="test/"
    inode=401409 ... nametype=PARENT
    type=PATH msg=audit(1416955786.566:89): item=1 name="test/567"
    inode=393804 ... nametype=NORMAL

    This issue was brought up by a number of people, but special credit
    should go to hujianyang@huawei.com for reporting the problem along
    with an explanation of the problem and a patch. While the original
    patch did have some problems (see the archive link below), it did
    demonstrate the problem and helped kickstart the fix presented here.

    * https://lkml.org/lkml/2014/9/5/66

    Reported-by: hujianyang
    Signed-off-by: Paul Moore
    Acked-by: Richard Guy Briggs

    Paul Moore
     

21 Dec, 2014

1 commit

  • Pull CONFIG_PM_RUNTIME elimination from Rafael Wysocki:
    "This removes the last few uses of CONFIG_PM_RUNTIME introduced
    recently and makes that config option finally go away.

    CONFIG_PM will be available directly from the menu now and also it
    will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION
    is set"

    * tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM: Eliminate CONFIG_PM_RUNTIME
    tty: 8250_omap: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    sound: sst-haswell-pcm: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM

    Linus Torvalds
     

20 Dec, 2014

6 commits

  • Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
    audit_log_end(), which can come from any context (even one that must not
    sleep), GFP_KERNEL can't be used. Since the audit_buffer knows what context
    it should use, pass that down and use that.

    See: https://lkml.org/lkml/2014/12/16/542

    BUG: sleeping function called from invalid context at mm/slab.c:2849
    in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
    2 locks held by sulogin/885:
    #0: (&sig->cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x28/0x8b
    #1: (tty_files_lock){+.+.+.}, at: [] selinux_bprm_committing_creds+0x55/0x22b
    CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
    Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
    ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
    ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
    0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
    Call Trace:
    [] dump_stack+0x50/0xa8
    [] ___might_sleep+0x1b6/0x1be
    [] __might_sleep+0x119/0x128
    [] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
    [] kmem_cache_alloc+0x43/0x1c9
    [] __alloc_skb+0x42/0x1a3
    [] skb_copy+0x3e/0xa3
    [] audit_log_end+0x83/0x100
    [] ? avc_audit_pre_callback+0x103/0x103
    [] common_lsm_audit+0x441/0x450
    [] slow_avc_audit+0x63/0x67
    [] avc_has_perm+0xca/0xe3
    [] inode_has_perm+0x5a/0x65
    [] selinux_bprm_committing_creds+0x98/0x22b
    [] security_bprm_committing_creds+0xe/0x10
    [] install_exec_creds+0xe/0x79
    [] load_elf_binary+0xe36/0x10d7
    [] search_binary_handler+0x81/0x18c
    [] do_execveat_common.isra.31+0x4e3/0x7b7
    [] do_execve+0x1f/0x21
    [] SyS_execve+0x25/0x29
    [] stub_execve+0x69/0xa0

    Cc: stable@vger.kernel.org #v3.16-rc1
    Reported-by: Valdis Kletnieks
    Signed-off-by: Richard Guy Briggs
    Tested-by: Valdis Kletnieks
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     
  • Commit f1dc4867 ("audit: anchor all pid references in the initial pid
    namespace") introduced a find_vpid() call when adding/removing audit
    rules with PID/PPID filters; unfortunately this is problematic as
    find_vpid() only works if there is a task with the associated PID
    alive on the system. The following commands demonstrate a simple
    reproducer.

    # auditctl -D
    # auditctl -l
    # autrace /bin/true
    # auditctl -l

    This patch resolves the problem by simply using the PID provided by
    the user without any additional validation, e.g. no calls to check to
    see if the task/PID exists.

    Cc: stable@vger.kernel.org # 3.15
    Cc: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Acked-by: Eric Paris
    Reviewed-by: Richard Guy Briggs

    Paul Moore
     
  • Having switched over all of the users of CONFIG_PM_RUNTIME to use
    CONFIG_PM directly, turn the latter into a user-selectable option
    and drop the former entirely from the tree.

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Ulf Hansson
    Acked-by: Kevin Hilman

    Rafael J. Wysocki
     
  • Pull NOHZ update from Thomas Gleixner:
    "Remove the call into the nohz idle code from the fake 'idle' thread in
    the powerclamp driver along with the export of those functions which
    was smuggled in via the thermal tree. People have tried to hack
    around it in the nohz core code, but it just violates all rightful
    assumptions of that code about the only valid calling context (i.e.
    the proper idle task).

    The powerclamp trainwreck will still work, it just won't get the
    benefit of long idle sleeps"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tick/powerclamp: Remove tick_nohz_idle abuse

    Linus Torvalds
     
  • Pull irq core fix from Thomas Gleixner:
    "A single fix plugging a long standing race between proc/stat and
    proc/interrupts access and freeing of interrupt descriptors"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Prevent proc race against freeing of irq descriptors

    Linus Torvalds
     
  • Pull perf fixes and cleanups from Ingo Molnar:
    "A kernel fix plus mostly tooling fixes, but also some tooling
    restructuring and cleanups"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    perf: Fix building warning on ARM 32
    perf symbols: Fix use after free in filename__read_build_id
    perf evlist: Use roundup_pow_of_two
    tools: Adopt roundup_pow_of_two
    perf tools: Make the mmap length autotuning more robust
    tools: Adopt rounddown_pow_of_two and deps
    tools: Adopt fls_long and deps
    tools: Move bitops.h from tools/perf/util to tools/
    tools: Introduce asm-generic/bitops.h
    tools lib: Move asm-generic/bitops/find.h code to tools/include and tools/lib
    tools: Whitespace prep patches for moving bitops.h
    tools: Move code originally from asm-generic/atomic.h into tools/include/asm-generic/
    tools: Move code originally from linux/log2.h to tools/include/linux/
    tools: Move __ffs implementation to tools/include/asm-generic/bitops/__ffs.h
    perf evlist: Do not use hard coded value for a mmap_pages default
    perf trace: Let the perf_evlist__mmap autosize the number of pages to use
    perf evlist: Improve the strerror_mmap method
    perf evlist: Clarify sterror_mmap variable names
    perf evlist: Fixup brown paper bag on "hint" for --mmap-pages cmdline arg
    perf trace: Provide a better explanation when mmap fails
    ...

    Linus Torvalds
     

19 Dec, 2014

3 commits

  • commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module
    use" was merged via the thermal tree without an explicit ack from the
    relevant maintainers.

    The exports are abused by the intel powerclamp driver which implements
    a fake idle state from a sched FIFO task. This causes all kinds of
    wreckage in the NOHZ core code which rightfully assumes that
    tick_nohz_idle_enter/exit() are only called from the idle task itself.

    Recent changes in the NOHZ core lead to a failure of the powerclamp
    driver and now people try to hack completely broken and backwards
    workarounds into the NOHZ core code. This is completely unacceptable
    and just papers over the real problem. There are way more subtle
    issues lurking around the corner.

    The real solution is to fix the powerclamp driver by rewriting it with
    a sane concept, but that's beyond the scope of this.

    So the only solution for now is to remove the calls into the core NOHZ
    code from the powerclamp trainwreck along with the exports.

    Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver"
    Signed-off-by: Thomas Gleixner
    Cc: Preeti U Murthy
    Cc: Viresh Kumar
    Cc: Frederic Weisbecker
    Cc: Fengguang Wu
    Cc: Frederic Weisbecker
    Cc: Pan Jacob jun
    Cc: LKP
    Cc: Peter Zijlstra
    Cc: Zhang Rui
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Pull module updates from Rusty Russell:
    "The exciting thing here is the getting rid of stop_machine on module
    removal. This is possible by using a simple atomic_t for the counter,
    rather than our fancy per-cpu counter: it turns out that no one is
    doing a module increment per net packet, so the slowdown should be in
    the noise"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    param: do not set store func without write perm
    params: cleanup sysfs allocation
    kernel:module Fix coding style errors and warnings.
    module: Remove stop_machine from module unloading
    module: Replace module_ref with atomic_t refcnt
    lib/bug: Use RCU list ops for module_bug_list
    module: Unlink module with RCU synchronizing instead of stop_machine
    module: Wait for RCU synchronizing before releasing a module

    Linus Torvalds
     
  • Pull more ACPI and power management updates from Rafael Wysocki:
    "These are regression fixes (leds-gpio, ACPI backlight driver,
    operating performance points library, ACPI device enumeration
    messages, cpupower tool), other bug fixes (ACPI EC driver, ACPI device
    PM), some cleanups in the operating performance points (OPP)
    framework, continuation of CONFIG_PM_RUNTIME elimination, a couple of
    minor intel_pstate driver changes, a new MAINTAINERS entry for it and
    an ACPI fan driver change needed for better support of thermal
    management in user space.

    Specifics:

    - Fix a regression in leds-gpio introduced by a recent commit that
    inadvertently changed the name of one of the properties used by the
    driver (Fabio Estevam).

    - Fix a regression in the ACPI backlight driver introduced by a
    recent fix that missed one special case that had to be taken into
    account (Aaron Lu).

    - Drop the level of some new kernel messages from the ACPI core
    introduced by a recent commit to KERN_DEBUG which they should have
    used from the start and drop some other unuseful KERN_ERR messages
    printed by ACPI (Rafael J Wysocki).

    - Revert an incorrect commit modifying the cpupower tool (Prarit
    Bhargava).

    - Fix two regressions introduced by recent commits in the OPP library
    and clean up some existing minor issues in that code (Viresh
    Kumar).

    - Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout the
    tree (or drop it where that can be done) in order to make it
    possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki, Ulf
    Hansson, Ludovic Desroches).

    There will be one more "CONFIG_PM_RUNTIME removal" batch after this
    one, because some new uses of it have been introduced during the
    current merge window, but that should be sufficient to finally get
    rid of it.

    - Make the ACPI EC driver more robust against race conditions related
    to GPE handler installation failures (Lv Zheng).

    - Prevent the ACPI device PM core code from attempting to disable
    GPEs that it has not enabled which confuses ACPICA and makes it
    report errors unnecessarily (Rafael J Wysocki).

    - Add a "force" command line switch to the intel_pstate driver to
    make it possible to override the blacklisting of some systems in
    that driver if needed (Ethan Zhao).

    - Improve intel_pstate code documentation and add a MAINTAINERS entry
    for it (Kristen Carlson Accardi).

    - Make the ACPI fan driver create cooling device interfaces with
    names that reflect the IDs of the ACPI device objects they are
    associated with, except for "generic" ACPI fans (PNP ID "PNP0C0B").

    That's necessary for user space thermal management tools to be able
    to connect the fans with the parts of the system they are supposed
    to be cooling properly. From Srinivas Pandruvada"

    * tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
    MAINTAINERS: add entry for intel_pstate
    ACPI / video: update the skip case for acpi_video_device_in_dod()
    power / PM: Eliminate CONFIG_PM_RUNTIME
    NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    ACPI / EC: Fix unexpected ec_remove_handlers() invocations
    Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()"
    tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c
    PM: Remove the SET_PM_RUNTIME_PM_OPS() macro
    mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macro
    PM / Kconfig: Replace PM_RUNTIME with PM in dependencies
    ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    ACPI / PM: Do not disable wakeup GPEs that have not been enabled
    ACPI / utils: Drop error messages from acpi_evaluate_reference()
    ...

    Linus Torvalds
     

18 Dec, 2014

2 commits

  • When a module_param is defined without DAC write permissions, it can
    still be changed at runtime and updated. Drivers using a 0444 permission
    may be surprised that these values can still be changed.

    For drivers that want to allow updates, any S_IW* flag will set the
    "store" function as before. Drivers without S_IW* flags will have the
    "store" function unset, enforcing a read-only value. Drivers that wish
    neither "store" nor "get" can continue to use "0" for perms to stay out
    of sysfs entirely.

    Old behavior:
    # cd /sys/module/snd/parameters
    # ls -l
    total 0
    -r--r--r-- 1 root root 4096 Dec 11 13:55 cards_limit
    -r--r--r-- 1 root root 4096 Dec 11 13:55 major
    -r--r--r-- 1 root root 4096 Dec 11 13:55 slots
    # cat major
    116
    # echo -1 > major
    -bash: major: Permission denied
    # chmod u+w major
    # echo -1 > major
    # cat major
    -1

    New behavior:
    ...
    # chmod u+w major
    # echo -1 > major
    -bash: echo: write error: Input/output error
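
    The new rule can be modelled in a few lines of Python. This is a toy
    sketch, not the kernel's parameter code: the Param class is
    hypothetical, and only the "no write bit means no store function" rule
    and the Input/output error come from the text above:

```python
import errno

S_IWUGO = 0o222  # any of the owner/group/other write bits


class Param:
    def __init__(self, value, perm):
        self.value = value
        self.perm = perm
        # Only install a store handler when a write bit was requested.
        self.store = self._store if perm & S_IWUGO else None

    def _store(self, new_value):
        self.value = new_value

    def write(self, new_value):
        # Reached after the file mode check, e.g. once root ran chmod u+w.
        if self.store is None:
            raise OSError(errno.EIO, "Input/output error")
        self.store(new_value)


major = Param(116, 0o444)
try:
    major.write(-1)
except OSError as exc:
    print(exc.errno == errno.EIO)  # True

cards_limit = Param(1, 0o644)  # hypothetical writable declaration
cards_limit.write(8)
print(major.value, cards_limit.value)  # 116 8
```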

    Signed-off-by: Kees Cook
    Signed-off-by: Rusty Russell

    Kees Cook
     
  • Pull user namespace related fixes from Eric Biederman:
    "As these are bug fixes, almost all of these changes are marked for
    backporting to stable.

    The first change (implicitly adding MNT_NODEV on remount) addresses a
    regression that was created when security issues with unprivileged
    remount were closed. I go on to update the remount test to make it
    easy to detect if this issue reoccurs.

    Then there are a handful of mount and umount related fixes.

    Then half of the changes deal with a recently discovered design
    bug in the permission checks of gid_map. Unix since the beginning has
    allowed setting group permissions on files to less than the user and
    other permissions (aka ---rwx---rwx). As the unix permission checks
    stop as soon as a group matches, and setgroups allows setting groups
    that can not later be dropped, this results in a situation where it is
    possible to legitimately use a group to assign fewer privileges to a
    process. Which means dropping a group can increase a process's
    privileges.

    The fix I have adopted is that gid_map is now no longer writable
    without privilege unless the new file /proc/self/setgroups has been
    set to permanently disable setgroups.
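
    The underlying problem can be shown with a simplified Python model of
    the classic owner/group/other check. The uid and gid numbers are
    hypothetical, and the model ignores capabilities and ACLs. Because the
    first matching class decides, a process that is in the file's group is
    denied, yet the same process is allowed once that group is dropped:

```python
def allowed(mode, file_uid, file_gid, uid, gids, want):
    """Classic owner/group/other check: the first matching class decides."""
    if uid == file_uid:
        bits = (mode >> 6) & 0o7
    elif file_gid in gids:
        bits = (mode >> 3) & 0o7  # group matched: "other" is never consulted
    else:
        bits = mode & 0o7
    return (bits & want) == want


READ = 0o4
mode = 0o707           # rwx for owner, nothing for group, rwx for other
owner, group = 0, 100  # hypothetical ids

print(allowed(mode, owner, group, uid=1000, gids={100}, want=READ))  # False
print(allowed(mode, owner, group, uid=1000, gids=set(), want=READ))  # True
```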

    The bulk of applications using user namespaces, even the applications
    using user namespaces without privilege, remain unaffected by this
    change. Unfortunately this fix breaks a couple of user space
    applications that were relying on the problematic behavior (one
    of which was tools/selftests/mount/unprivileged-remount-test.c).

    To hopefully prevent needing a regression fix on top of my security
    fix, I rounded up folks who work with the container implementations
    most likely to be affected and encouraged them to test the changes.

    > So far nothing broke on my libvirt-lxc test bed. :-)
    > Tested with openSUSE 13.2 and libvirt 1.2.9.
    > Tested-by: Richard Weinberger

    > Tested on Fedora20 with libvirt 1.2.11, works fine.
    > Tested-by: Chen Hanxiao

    > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
    > Just to be sure I was testing the right thing I also tested using
    > my unprivileged nsexec testcases, and they failed on setgroup/setgid
    > as now expected, and succeeded there without your patches.
    > Tested-by: Serge Hallyn

    > I tested this with Sandstorm. It breaks as is and it works if I add
    > the setgroups thing.
    > Tested-by: Andy Lutomirski # breaks things as designed :("

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Unbreak the unprivileged remount tests
    userns; Correct the comment in map_write
    userns: Allow setting gid_maps without privilege when setgroups is disabled
    userns: Add a knob to disable setgroups on a per user namespace basis
    userns: Rename id_map_mutex to userns_state_mutex
    userns: Only allow the creator of the userns unprivileged mappings
    userns: Check euid no fsuid when establishing an unprivileged uid mapping
    userns: Don't allow unprivileged creation of gid mappings
    userns: Don't allow setgroups until a gid mapping has been setablished
    userns: Document what the invariant required for safe unprivileged mappings.
    groups: Consolidate the setgroups permission checks
    mnt: Clear mnt_expire during pivot_root
    mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
    mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
    umount: Do not allow unmounting rootfs.
    umount: Disallow unprivileged mount force
    mnt: Update unprivileged remount test
    mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount

    Linus Torvalds
     

17 Dec, 2014

2 commits

  • Pull vfs pile #2 from Al Viro:
    "Next pile (and there'll be one or two more).

    The large piece in this one is getting rid of /proc/*/ns/* weirdness;
    among other things, it allows to (finally) make nameidata completely
    opaque outside of fs/namei.c, making for easier further cleanups in
    there"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    coda_venus_readdir(): use file_inode()
    fs/namei.c: fold link_path_walk() call into path_init()
    path_init(): don't bother with LOOKUP_PARENT in argument
    fs/namei.c: new helper (path_cleanup())
    path_init(): store the "base" pointer to file in nameidata itself
    make default ->i_fop have ->open() fail with ENXIO
    make nameidata completely opaque outside of fs/namei.c
    kill proc_ns completely
    take the targets of /proc/*/ns/* symlinks to separate fs
    bury struct proc_ns in fs/proc
    copy address of proc_ns_ops into ns_common
    new helpers: ns_alloc_inum/ns_free_inum
    make proc_ns_operations work with struct ns_common * instead of void *
    switch the rest of proc_ns_operations to working with &...->ns
    netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
    make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
    common object embedded into various struct ....ns

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "As the merge window is still open, and this code was not as complex as
    I thought it might be, I'm pushing this in now.

    This will allow Thomas to debug his irq work for 3.20.

    This adds two new features:

    1) Allow tracepoints to be enabled right after mm_init().

    By passing in the trace_event= kernel command line parameter,
    tracepoints can be enabled at boot up. For debugging things like
    the initialization of interrupts, it is needed to have tracepoints
    enabled very early. People have asked about this before and this
    has been on my todo list. As it can be helpful for Thomas to debug
    his upcoming 3.20 IRQ work, I'm pushing this now. This way he can
    add tracepoints into the IRQ set up and have users enable them when
    things go wrong.

    2) Have the tracepoints printed via printk() (the console) when they
    are triggered.

    If the irq code locks up or reboots the box, having the tracepoint
    output go into the kernel ring buffer is useless for debugging.
    But being able to add the tp_printk kernel command line option
    along with the trace_event= option will have these tracepoints
    printed as they occur, and that can be really useful for debugging
    early lock up or reboot problems.

    This code is not that intrusive and it passed all my tests. Thomas
    tried them out too and it works for his needs.

    Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org"

    * tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Add tp_printk cmdline to have tracepoints go to printk()
    tracing: Move enabling tracepoints to just after rcu_init()

    Linus Torvalds
     

16 Dec, 2014

1 commit

  • Pull drm updates from Dave Airlie:
    "Highlights:

    - AMD KFD driver merge

    This is the AMD HSA interface for exposing a lowlevel interface for
    GPGPU use. They have an open source userspace built on top of this
    interface, and the code looks as good as it was going to get out of
    tree.

    - Initial atomic modesetting work

    The need for an atomic modesetting interface to allow userspace to
    try and send a complete set of modesetting state to the driver has
    arisen, and been suffering from neglect this past year. No more,
    the start of the common code and changes for msm driver to use it
    are in this tree. Ongoing work to get the userspace ioctl finished
    and the code clean will probably wait until next kernel.

    - DisplayID 1.3 and tiled monitor exposed to userspace.

    Tiled monitor property is now exposed for userspace to make use of.

    - Rockchip drm driver merged.

    - imx gpu driver moved out of staging

    Other stuff:

    - core:
    panel - MIPI DSI + new panels.
    expose suggested x/y properties for virtual GPUs

    - i915:
    Initial Skylake (SKL) support
    gen3/4 reset work
    start of dri1/ums removal
    infoframe tracking
    fixes for lots of things.

    - nouveau:
    tegra k1 voltage support
    GM204 modesetting support
    GT21x memory reclocking work

    - radeon:
    CI dpm fixes
    GPUVM improvements
    Initial DPM fan control

    - rcar-du:
    HDMI support added
    removed some support for old boards
    slave encoder driver for Analog Devices adv7511

    - exynos:
    Exynos4415 SoC support

    - msm:
    a4xx gpu support
    atomic helper conversion

    - tegra:
    iommu support
    universal plane support
    ganged-mode DSI support

    - sti:
    HDMI i2c improvements

    - vmwgfx:
    some late fixes.

    - qxl:
    use suggested x/y properties"

    * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
    drm: sti: fix module compilation issue
    drm/i915: save/restore GMBUS freq across suspend/resume on gen4
    drm: sti: correctly cleanup CRTC and planes
    drm: sti: add HQVDP plane
    drm: sti: add cursor plane
    drm: sti: enable auxiliary CRTC
    drm: sti: fix delay in VTG programming
    drm: sti: prepare sti_tvout to support auxiliary crtc
    drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
    drm: sti: fix hdmi avi infoframe
    drm: sti: remove event lock while disabling vblank
    drm: sti: simplify gdp code
    drm: sti: clear all mixer control
    drm: sti: remove gpio for HDMI hot plug detection
    drm: sti: allow to change hdmi ddc i2c adapter
    drm/doc: Document drm_add_modes_noedid() usage
    drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
    drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
    drm: Zero out DRM object memory upon cleanup
    drm/i915/bdw: Fix the write setting up the WIZ hashing mode
    ...

    Linus Torvalds
     

15 Dec, 2014

3 commits

  • Add the kernel command line tp_printk option that will have tracepoints
    that are active sent to printk() as well as to the trace buffer.

    Passing "tp_printk" will activate this. To turn it off, the sysctl
    /proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note,
    this only works if the cmdline option is used. Echoing 1 into the sysctl
    file without the cmdline option will have no effect.

    Note, this is a dangerous option. Having high frequency tracepoints send
    their data to printk() can possibly cause a live lock. This is another
    reason why this is only active if the command line option is used.
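
    A minimal userspace sketch of the runtime switch described above,
    assuming the kernel was booted with tp_printk. The sysctl path is the
    one named in this message; the helper write_sysctl_flag() and its
    parameters are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Path named in the commit message. */
#define TRACEPOINT_PRINTK_SYSCTL "/proc/sys/kernel/tracepoint_printk"

/*
 * Hypothetical helper: write "0" or "1" to a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int write_sysctl_flag(const char *path, int enable)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)     /* a failed flush also counts as a failure */
        ok = 0;
    return ok ? 0 : -1;
}
```

    Calling write_sysctl_flag(TRACEPOINT_PRINTK_SYSCTL, 0) as root would
    then be expected to stop tracepoint output from going to printk().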

    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos

    Suggested-by: Thomas Gleixner
    Tested-by: Thomas Gleixner
    Acked-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Enabling tracepoints at boot up can be very useful. The tracepoint
    can be initialized right after RCU has been. There's no need to
    wait for the early_initcall() to be called. That's too late for some
    things that can use tracepoints for debugging. Move the logic to
    enable tracepoints out of the initcalls and into init/main.c to
    right after rcu_init().

    This also allows trace_printk() to be used early too.

    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
    Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org

    Reviewed-by: Paul E. McKenney
    Suggested-by: Thomas Gleixner
    Tested-by: Thomas Gleixner
    Acked-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Pull tty/serial driver updates from Greg KH:
    "Here's the big tty/serial driver update for 3.19-rc1.

    There are a number of TTY core changes/fixes in here from Peter Hurley
    that have all been tested in linux-next for a long time now. There are
    also the normal serial driver updates as well, full details in the
    changelog below"

    * tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits)
    serial: pxa: hold port.lock when reporting modem line changes
    tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put"
    tty: Deletion of unnecessary checks before two function calls
    n_tty: Fix read_buf race condition, increment read_head after pushing data
    serial: of-serial: add PM suspend/resume support
    Revert "serial: of-serial: add PM suspend/resume support"
    Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type"
    serial: 8250: don't attempt a trylock if in sysrq
    serial: core: Add big-endian iotype
    serial: samsung: use port->fifosize instead of hardcoded values
    serial: samsung: prefer to use fifosize from driver data
    serial: samsung: fix style problems
    serial: samsung: wait for transfer completion before clock disable
    serial: icom: fix error return code
    serial: tegra: clean up tty-flag assignments
    serial: Fix io address assign flow with Fintek PCI-to-UART Product
    serial: mxs-auart: fix tx_empty against shift register
    serial: mxs-auart: fix gpio change detection on interrupt
    serial: mxs-auart: Fix mxs_auart_set_ldisc()
    serial: 8250_dw: Use 64-bit access for OCTEON.
    ...

    Linus Torvalds
     

14 Dec, 2014

11 commits

  • Pull block driver core update from Jens Axboe:
    "This is the pull request for the core block IO changes for 3.19. Not
    a huge round this time, mostly lots of little good fixes:

    - Fix a bug in sysfs blktrace interface causing a NULL pointer
    dereference, when enabled/disabled through that API. From Arianna
    Avanzini.

    - Various updates/fixes/improvements for blk-mq:

    - A set of updates from Bart, mostly fixing bugs in the tag
    handling.

    - Cleanup/code consolidation from Christoph.

    - Extend queue_rq API to be able to handle batching issues of IO
    requests. NVMe will utilize this shortly. From me.

    - A few tag and request handling updates from me.

    - Cleanup of the preempt handling for running queues from Paolo.

    - Prevent running of unmapped hardware queues from Ming Lei.

    - Move the kdump memory limiting check to be in the correct
    location, from Shaohua.

    - Initialize all software queues at init time from Takashi. This
    prevents a kobject warning when CPUs are brought online that
    weren't online when a queue was registered.

    - Single writeback fix for I_DIRTY clearing from Tejun. Queued with
    the core IO changes, since it's just a single fix.

    - Version X of the __bio_add_page() segment addition retry from
    Maurizio. Hope the Xth time is the charm.

    - Documentation fixup for IO scheduler merging from Jan.

    - Introduce (and use) generic IO stat accounting helpers for non-rq
    drivers, from Gu Zheng.

    - Kill off artificial limiting of max sectors in a request from
    Christoph"

    * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits)
    bio: modify __bio_add_page() to accept pages that don't start a new segment
    blk-mq: Fix uninitialized kobject at CPU hotplugging
    blktrace: don't let the sysfs interface remove trace from running list
    blk-mq: Use all available hardware queues
    blk-mq: Micro-optimize bt_get()
    blk-mq: Fix a race between bt_clear_tag() and bt_get()
    blk-mq: Avoid that __bt_get_word() wraps multiple times
    blk-mq: Fix a use-after-free
    blk-mq: prevent unmapped hw queue from being scheduled
    blk-mq: re-check for available tags after running the hardware queue
    blk-mq: fix hang in bt_get()
    blk-mq: move the kdump check to blk_mq_alloc_tag_set
    blk-mq: cleanup tag free handling
    blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq cpu map
    blk: introduce generic io stat accounting help function
    blk-mq: handle the single queue case in blk_mq_hctx_next_cpu
    genhd: check for int overflow in disk_expand_part_tbl()
    blk-mq: add blk_mq_free_hctx_request()
    blk-mq: export blk_mq_free_request()
    blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable
    ...

    Linus Torvalds
     
  • …it/rostedt/linux-trace

    Pull tracing fixlet from Steven Rostedt:
    "Remove unnecessary preempt_disable in printk()"

    * tag 'trace-seq-buf-3.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    printk: Do not disable preemption for accessing printk_func

    Linus Torvalds
     
  • Pull audit updates from Paul Moore:
    "Two small patches from the audit next branch; only one of which has
    any real significant code changes, the other is simply a MAINTAINERS
    update for audit.

    The single code patch is pretty small and rather straightforward, it
    changes the audit "version" number reported to userspace from an
    integer to a bitmap which is used to indicate the functionality of the
    running kernel. This really doesn't have much impact on the kernel,
    but it will make life easier for the audit userspace folks.

    Thankfully we were still on a version number which allowed us to do
    this without breaking userspace"
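
    A rough sketch of why a bitmap is easier on userspace than a version
    integer: each capability can be tested on its own instead of being
    inferred from "version >= N". The EXAMPLE_FEAT_* names and values are
    hypothetical; the real bit definitions live in the audit UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define EXAMPLE_FEAT_A (UINT32_C(1) << 0)
#define EXAMPLE_FEAT_B (UINT32_C(1) << 1)
#define EXAMPLE_FEAT_C (UINT32_C(1) << 2)

/* Old style: a feature is assumed present from some version onwards. */
bool has_feature_by_version(uint32_t version, uint32_t since)
{
    return version >= since;
}

/* New style: every capability has its own bit and is tested independently. */
bool has_feature(uint32_t bitmap, uint32_t feature)
{
    assert(feature != 0);
    return (bitmap & feature) == feature;
}
```

    With the bitmap, a kernel carrying only features A and C can say so
    directly, which a single increasing version number cannot express.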

    * 'upstream' of git://git.infradead.org/users/pcmoore/audit:
    audit: convert status version to a feature bitmap
    audit: add Paul Moore to the MAINTAINERS entry

    Linus Torvalds
     
  • There's a lot of common code in inode and mount marks handling. Factor it
    out to a common helper function.

    Signed-off-by: Jan Kara
    Cc: Eric Paris
    Cc: Heinrich Schuchardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Following the suggestions from Andrew Morton and Stephen Rothwell,
    don't expand the ARCH list in kernel/gcov/Kconfig. Instead,
    define an ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
    can enable.

    Set ARCH_HAS_GCOV_PROFILE_ALL on architectures where it was
    previously allowed, plus ARM64, which I tested.

    Signed-off-by: Riku Voipio
    Cc: Peter Oberparleiter
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Riku Voipio
     
  • Remove unnecessary KERN_ERR from pr_err() within kexec.c.

    Signed-off-by: Masanari Iida
    Acked-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masanari Iida
     
  • This patchset adds execveat(2) for x86, and is derived from Meredydd
    Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

    The primary aim of adding an execveat syscall is to allow an
    implementation of fexecve(3) that does not rely on the /proc filesystem,
    at least for executables (rather than scripts). The current glibc version
    of fexecve(3) is implemented via /proc, which causes problems in sandboxed
    or otherwise restricted environments.

    Given the desire for a /proc-free fexecve() implementation, HPA suggested
    (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
    an appropriate generalization.

    Also, having a new syscall means that it can take a flags argument without
    back-compatibility concerns. The current implementation just defines the
    AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
    added in future -- for example, flags for new namespaces (as suggested at
    https://lkml.org/lkml/2006/7/11/474).

    Related history:
    - https://lkml.org/lkml/2006/12/27/123 is an example of someone
    realizing that fexecve() is likely to fail in a chroot environment.
    - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
    documenting the /proc requirement of fexecve(3) in its manpage, to
    "prevent other people from wasting their time".
    - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
    problem where a process that did setuid() could not fexecve()
    because it no longer had access to /proc/self/fd; this has since
    been fixed.

    This patch (of 4):

    Add a new execveat(2) system call. execveat() is to execve() as openat()
    is to open(): it takes a file descriptor that refers to a directory, and
    resolves the filename relative to that.

    In addition, if the filename is empty and AT_EMPTY_PATH is specified,
    execveat() executes the file to which the file descriptor refers. This
    replicates the functionality of fexecve(), which is a system call in other
    UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
    so relies on /proc being mounted).

    The filename fed to the executed program as argv[0] (or the name of the
    script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
    (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
    reflecting how the executable was found. This does however mean that
    execution of a script in a /proc-less environment won't work; also, script
    execution via an O_CLOEXEC file descriptor fails (as the file will not be
    accessible after exec).
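
    As a hedged illustration of the directory-relative form described
    above, the sketch below invokes the new system call through syscall(2),
    since a libc wrapper may not be available. The helper name
    spawn_execveat() is hypothetical; it forks so that the caller survives
    the exec and can inspect the exit status.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: fork, then have the child call execveat(2).
 * Returns the child's exit status, or -1 if anything fails beforehand.
 */
int spawn_execveat(int dirfd, const char *path, char *const argv[], int flags)
{
    assert(path != NULL && argv != NULL);
#ifdef SYS_execveat
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* path is resolved relative to dirfd, as openat() would do */
        syscall(SYS_execveat, dirfd, path, argv, environ, flags);
        _exit(127);             /* only reached if execveat() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
#else
    (void)dirfd;
    (void)flags;
    return -1;                  /* headers predate execveat */
#endif
}
```

    Passing an empty path together with AT_EMPTY_PATH would instead execute
    the file that dirfd itself refers to, which is the fexecve()-like case.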

    Based on patches by Meredydd Luff.

    Signed-off-by: David Drysdale
    Cc: Meredydd Luff
    Cc: Shuah Khan
    Cc: "Eric W. Biederman"
    Cc: Andy Lutomirski
    Cc: Alexander Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Kees Cook
    Cc: Arnd Bergmann
    Cc: Rich Felker
    Cc: Christoph Hellwig
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Drysdale
     
  • Currently, stacktrace only has a function for console output. page_owner,
    which will be introduced in a following patch, needs to print the
    stacktrace into a buffer for its own output format, so a new function,
    snprint_stack_trace(), is needed.
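
    To make the idea concrete, here is a userspace analogue of printing a
    trace into a caller-supplied buffer rather than to the console. It is a
    sketch only: format_trace(), its signature and its output format are
    assumptions and do not reproduce the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical analogue: print nr addresses into buf, one per line, each
 * indented by `spaces` blanks. Returns the length the full output would
 * need (like snprintf), so callers can detect truncation.
 */
int format_trace(char *buf, size_t size, const unsigned long *entries,
                 unsigned int nr, int spaces)
{
    int total = 0;

    assert(entries != NULL || nr == 0);
    for (unsigned int i = 0; i < nr; i++) {
        int n = snprintf(buf, size, "%*s0x%lx\n", spaces, "", entries[i]);
        if (n < 0)
            return n;
        total += n;
        if ((size_t)n >= size) {    /* buffer full: keep counting only */
            buf += size;
            size = 0;
        } else {
            buf += n;
            size -= (size_t)n;
        }
    }
    return total;
}
```

    A return value greater than or equal to the buffer size tells the
    caller that the output was cut short.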

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Both register and unregister call build_map_info() in order to create the
    list of mappings before installing or removing breakpoints for every mm
    which maps file backed memory. As such, there is no reason to hold the
    i_mmap_rwsem exclusively, so share it and allow concurrent readers to
    build the mapping data.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Srikar Dronamraju
    Acked-by: "Kirill A. Shutemov"
    Cc: Oleg Nesterov
    Acked-by: Hugh Dickins
    Acked-by: Peter Zijlstra (Intel)
    Cc: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
    similar data, one for file backed pages and the other for anon memory. To
    this end, this lock can also be a rwsem. In addition, there are some
    important opportunities to share the lock when there are no tree
    modifications.

    This conversion is straightforward. For now, all users take the write
    lock.

    [sfr@canb.auug.org.au: update fremap.c]
    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Rik van Riel
    Acked-by: "Kirill A. Shutemov"
    Acked-by: Hugh Dickins
    Cc: Oleg Nesterov
    Acked-by: Peter Zijlstra (Intel)
    Cc: Srikar Dronamraju
    Acked-by: Mel Gorman
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Convert all open coded mutex_lock/unlock calls to the
    i_mmap_[lock/unlock]_write() helpers.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Rik van Riel
    Acked-by: "Kirill A. Shutemov"
    Acked-by: Hugh Dickins
    Cc: Oleg Nesterov
    Acked-by: Peter Zijlstra (Intel)
    Cc: Srikar Dronamraju
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

13 Dec, 2014

2 commits

  • Since the rework of the sparse interrupt code to actually free the
    unused interrupt descriptors there exists a race between the /proc
    interfaces to the irq subsystem and the code which frees the interrupt
    descriptor.

    CPU0                            CPU1
                                    show_interrupts()
                                      desc = irq_to_desc(X);
    free_desc(desc)
      remove_from_radix_tree();
      kfree(desc);
                                      raw_spinlock_irq(&desc->lock);

    /proc/interrupts is the only interface which can actively corrupt
    kernel memory via the lock access. /proc/stat can only read from freed
    memory. Extremely hard to trigger, but possible.

    The interfaces in /proc/irq/N/ are not affected by this because the
    removal of the proc file is serialized in procfs against concurrent
    readers/writers. The removal happens before the descriptor is freed.

    For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
    as the descriptor is never freed. It's merely cleared out with the irq
    descriptor lock held. So any concurrent proc access will either see
    the old correct value or the cleared out ones.

    Protect the lookup and access to the irq descriptor in
    show_interrupts() with the sparse_irq_lock.

    Provide kstat_irqs_usr() which is protecting the lookup and access
    with sparse_irq_lock and switch /proc/stat to use it.

    Document the existing kstat_irqs interfaces so it's clear that the
    caller needs to take care about protection. The users of these
    interfaces are either not affected due to SPARSE_IRQ=n or already
    protected against removal.
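
    The shape of the fix can be sketched in userspace: the reader holds the
    same lock as the free path across both the lookup and the access, so
    the object cannot be freed in between. The table, the mutex and the
    desc_*() helpers below are hypothetical stand-ins, with a pthread mutex
    playing the role of sparse_irq_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_SLOTS 16

struct desc {
    unsigned int count;
};

static struct desc *table[NR_SLOTS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

int desc_alloc(unsigned int irq)
{
    assert(irq < NR_SLOTS);

    struct desc *d = calloc(1, sizeof(*d));
    if (!d)
        return -1;

    pthread_mutex_lock(&table_lock);
    free(table[irq]);           /* replace any previous descriptor */
    table[irq] = d;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/* Removal and free happen under the lock that readers also take. */
void desc_free(unsigned int irq)
{
    assert(irq < NR_SLOTS);
    pthread_mutex_lock(&table_lock);
    free(table[irq]);
    table[irq] = NULL;
    pthread_mutex_unlock(&table_lock);
}

/* Lookup and access form one critical section; returns -1 if absent. */
long desc_read_count(unsigned int irq)
{
    long ret = -1;

    assert(irq < NR_SLOTS);
    pthread_mutex_lock(&table_lock);
    struct desc *d = table[irq];
    if (d)
        ret = (long)d->count;
    pthread_mutex_unlock(&table_lock);
    return ret;
}
```

    Dropping the lock between the lookup and the dereference would bring
    back the window shown in the CPU0/CPU1 diagram.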

    Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     
  • After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
    selected) PM_RUNTIME is always set if PM is set, so files that are
    built conditionally if CONFIG_PM_RUNTIME is set may now be built
    if CONFIG_PM is set.

    Replace CONFIG_PM_RUNTIME with CONFIG_PM in kernel/trace/Makefile
    for this reason.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Steven Rostedt <rostedt@goodmis.org>

    Rafael J. Wysocki