22 Feb, 2017

1 commit

  • Pull security layer updates from James Morris:
    "Highlights:

    - major AppArmor update: policy namespaces & lots of fixes

    - add /sys/kernel/security/lsm node for easy detection of loaded LSMs

    - SELinux cgroupfs labeling support

    - SELinux context mounts on tmpfs, ramfs, devpts within user
    namespaces

    - improved TPM 2.0 support"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
    tpm: declare tpm2_get_pcr_allocation() as static
    tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
    tpm xen: drop unneeded chip variable
    tpm: fix misspelled "facilitate" in module parameter description
    tpm_tis: fix the error handling of init_tis()
    KEYS: Use memzero_explicit() for secret data
    KEYS: Fix an error code in request_master_key()
    sign-file: fix build error in sign-file.c with libressl
    selinux: allow changing labels for cgroupfs
    selinux: fix off-by-one in setprocattr
    tpm: silence an array overflow warning
    tpm: fix the type of owned field in cap_t
    tpm: add securityfs support for TPM 2.0 firmware event log
    tpm: enhance read_log_of() to support Physical TPM event log
    tpm: enhance TPM 2.0 PCR extend to support multiple banks
    tpm: implement TPM 2.0 capability to get active PCR banks
    tpm: fix RC value check in tpm2_seal_trusted
    tpm_tis: fix iTPM probe via probe_itpm() function
    tpm: Begin the process to deprecate user_read_timer
    tpm: remove tpm_read_index and tpm_write_index from tpm.h
    ...

    Linus Torvalds
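    The new /sys/kernel/security/lsm node lets user space check which LSMs are loaded. Below is a minimal C sketch. It assumes the node holds one comma-separated line of LSM names (for example "capability,yama,apparmor"). The helper names lsm_list_contains() and read_lsm_list() are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumption: the node contains one comma-separated line such as
 * "capability,yama,apparmor", possibly ending in a newline.
 * Returns true if `name` appears as a whole entry in `list`.
 */
bool lsm_list_contains(const char *list, const char *name)
{
	size_t len = strlen(name);
	const char *p = list;

	while (*p) {
		const char *end = strchr(p, ',');
		/* Last entry: stop at the trailing newline, if any. */
		size_t tok_len = end ? (size_t)(end - p) : strcspn(p, "\n");

		if (tok_len == len && strncmp(p, name, len) == 0)
			return true;
		if (!end)
			break;
		p = end + 1;
	}
	return false;
}

/* Reads the securityfs node into buf; returns 0 on success, -1 otherwise. */
int read_lsm_list(char *buf, size_t size)
{
	FILE *f = fopen("/sys/kernel/security/lsm", "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)size, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}
```

    A caller might fill a buffer with read_lsm_list() and then test lsm_list_contains(buf, "apparmor"). Matching whole entries avoids false positives such as "yam" matching "yama". securityfs must be mounted for the node to exist.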
     

21 Feb, 2017

2 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this (fairly busy) cycle were:

    - There was a class of scheduler bugs related to forgetting to update
    the rq-clock timestamp, which can cause weird and hard-to-debug
    problems, so there's a new debug facility for this. It uncovered
    a whole lot of bugs, which convinced us that we want to keep the
    debug facility.

    (Peter Zijlstra, Matt Fleming)

    - Various cputime related updates: eliminate cputime and use u64
    nanoseconds directly, simplify and improve the arch interfaces,
    implement delayed accounting more widely, etc. - (Frederic
    Weisbecker)

    - Move code around for better structure plus cleanups (Ingo Molnar)

    - Move IO schedule accounting deeper into the scheduler plus related
    changes to improve the situation (Tejun Heo)

    - ... plus a round of sched/rt and sched/deadline fixes, plus other
    fixes, updates and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
    sched/core: Remove unlikely() annotation from sched_move_task()
    sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
    sched/topology: Split out scheduler topology code from core.c into topology.c
    sched/core: Remove unnecessary #include headers
    sched/rq_clock: Consolidate the ordering of the rq_clock methods
    delayacct: Include
    sched/core: Clean up comments
    sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
    sched/clock: Add dummy clear_sched_clock_stable() stub function
    sched/cputime: Remove generic asm headers
    sched/cputime: Remove unused nsec_to_cputime()
    s390, sched/cputime: Remove unused cputime definitions
    powerpc, sched/cputime: Remove unused cputime definitions
    s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
    ia64, sched/cputime: Remove unused cputime definitions
    ia64: Convert vtime to use nsec units directly
    ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
    sched/cputime: Remove jiffies based cputime
    sched/cputime, vtime: Return nsecs instead of cputime_t to account
    sched/cputime: Complete nsec conversion of tick based accounting
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "Nothing exciting, just the usual pile of fixes, updates and cleanups:

    - A bunch of clocksource driver updates

    - Removal of CONFIG_TIMER_STATS and the related /proc file

    - More posix timer slim down work

    - A scalability enhancement in the tick broadcast code

    - Math cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    hrtimer: Catch invalid clockids again
    math64, tile: Fix build failure
    clocksource/drivers/arm_arch_timer:: Mark cyclecounter __ro_after_init
    timerfd: Protect the might cancel mechanism proper
    timer_list: Remove useless cast when printing
    time: Remove CONFIG_TIMER_STATS
    clocksource/drivers/arm_arch_timer: Work around Hisilicon erratum 161010101
    clocksource/drivers/arm_arch_timer: Introduce generic errata handling infrastructure
    clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter
    clocksource/drivers/arm_arch_timer: Add dt binding for hisilicon-161010101 erratum
    clocksource/drivers/ostm: Add renesas-ostm timer driver
    clocksource/drivers/ostm: Document renesas-ostm timer DT bindings
    clocksource/drivers/tcb_clksrc: Use 32 bit tcb as sched_clock
    clocksource/drivers/gemini: Add driver for the Cortina Gemini
    clocksource: add DT bindings for Cortina Gemini
    clockevents: Add a clkevt-of mechanism like clksrc-of
    tick/broadcast: Reduce lock cacheline contention
    timers: Omit POSIX timer stuff from task_struct when disabled
    x86/timer: Make delay() work during early bootup
    delay: Add explanation of udelay() inaccuracy
    ...

    Linus Torvalds
     

08 Feb, 2017

1 commit

  • Commit 6326fec1122c ("mm: Use owner_priv bit for PageSwapCache, valid
    when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and
    depending on PageSwapBacked being true).

    As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
    synthesized, instead of being shown on unrelated pages which just happen
    to have PG_owner_priv_1 set.

    Signed-off-by: Hugh Dickins
    Cc: Andrew Morton
    Cc: Nicholas Piggin
    Cc: Wu Fengguang
    Signed-off-by: Linus Torvalds

    Hugh Dickins
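    The fix can be modelled in a few lines of C. This is a hypothetical sketch, not the kernel's stable_page_flags(). The MODEL_* bit positions are illustrative assumptions. Only the logic follows the description above: report KPF_SWAPCACHE only when the aliased owner_priv_1 bit is set on a swap-backed page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not the kernel's PG_* / KPF_* values. */
enum {
	MODEL_PG_OWNER_PRIV_1 = 0,
	MODEL_PG_SWAPBACKED = 1,
};
enum {
	MODEL_KPF_SWAPCACHE = 13,
	MODEL_KPF_SWAPBACKED = 14,
};

/* PG_swapcache aliases PG_owner_priv_1 and only means "swap cache" on swap-backed pages. */
static bool model_page_swapcache(uint64_t pgflags)
{
	return (pgflags & (1ULL << MODEL_PG_SWAPBACKED)) &&
	       (pgflags & (1ULL << MODEL_PG_OWNER_PRIV_1));
}

/* Synthesize KPF_SWAPCACHE instead of copying the raw aliased bit. */
uint64_t model_kpageflags(uint64_t pgflags)
{
	uint64_t u = 0;

	if (pgflags & (1ULL << MODEL_PG_SWAPBACKED))
		u |= 1ULL << MODEL_KPF_SWAPBACKED;
	if (model_page_swapcache(pgflags))
		u |= 1ULL << MODEL_KPF_SWAPCACHE;
	return u;
}
```

    A page that merely has owner_priv_1 set for some other owner's purpose no longer shows up as swap cache.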
     

01 Feb, 2017

4 commits

  • This way we don't need to deal with cputime_t details from the core code.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-32-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Now that most cputime readers use the transition API, which returns the
    task cputime in old-style cputime_t, we can safely store the cputime in
    nsecs. This will eventually make cputime statistics less opaque and more
    granular. Back-and-forth conversions between cputime_t and nsecs to
    deal with cputime_t's arbitrary granularity won't be needed anymore.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • cputime_t is becoming obsolete and is being replaced by nsec units in order
    to make internal timestamps less opaque and more granular.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-6-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Kernel CPU stats are stored in cputime_t, which is an architecture-defined
    type and hence a bit opaque, requiring accessors and mutators for any
    operation.

    Converting them to nsecs simplifies the code and is one step toward
    the removal of cputime_t in the core code.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
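    Storing CPU stats as plain u64 nanoseconds has a simple payoff. Accounting becomes an addition, and conversion happens only when the value is exported to user space. Below is a minimal sketch of this. The model_* names and a USER_HZ of 100 are assumptions for illustration. They are not the kernel's kernel_cpustat or nsec_to_clock_t() definitions.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL
/* Assumption: USER_HZ of 100, the tick rate commonly exposed to user space. */
#define MODEL_USER_HZ 100ULL

/* Hypothetical per-CPU stat slots, stored as plain u64 nanoseconds. */
enum model_cpu_usage_stat {
	MODEL_CPUTIME_USER,
	MODEL_CPUTIME_SYSTEM,
	MODEL_NR_STATS
};

struct model_kernel_cpustat {
	uint64_t cpustat[MODEL_NR_STATS];
};

/* Accounting is a plain addition: no arch-specific cputime_t accessors needed. */
void model_account_user_time(struct model_kernel_cpustat *kcs, uint64_t delta_ns)
{
	kcs->cpustat[MODEL_CPUTIME_USER] += delta_ns;
}

/* Convert only at the user-space boundary, e.g. when printing /proc/stat. */
uint64_t model_nsec_to_clock_t(uint64_t ns)
{
	return ns / (MODEL_NSEC_PER_SEC / MODEL_USER_HZ);
}
```

    With this layout, 1.5 s of accumulated user time is stored as 1500000000 and reported as 150 ticks. No intermediate rounding to an arch-specific unit happens along the way.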
     

28 Jan, 2017

1 commit


25 Jan, 2017

1 commit

  • We have seen proc_pid_readdir() invocations holding cpu for more than 50
    ms. Add a cond_resched() to be gentle with other tasks.

    [akpm@linux-foundation.org: coding style fix]
    Link: http://lkml.kernel.org/r/1484238380.15816.42.camel@edumazet-glaptop3.roam.corp.google.com
    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

10 Jan, 2017

1 commit

  • Fixes CVE-2016-9191: proc_sys_readdir doesn't drop the reference
    added by grab_header when it returns from the !dir_emit_dots path.
    This can cause any path that calls unregister_sysctl_table to
    wait forever.

    The calltrace of CVE-2016-9191:

    [ 5535.960522] Call Trace:
    [ 5535.963265] schedule+0x3f/0xa0
    [ 5535.968817] schedule_timeout+0x3db/0x6f0
    [ 5535.975346] ? wait_for_completion+0x45/0x130
    [ 5535.982256] wait_for_completion+0xc3/0x130
    [ 5535.988972] ? wake_up_q+0x80/0x80
    [ 5535.994804] drop_sysctl_table+0xc4/0xe0
    [ 5536.001227] drop_sysctl_table+0x77/0xe0
    [ 5536.007648] unregister_sysctl_table+0x4d/0xa0
    [ 5536.014654] unregister_sysctl_table+0x7f/0xa0
    [ 5536.021657] unregister_sched_domain_sysctl+0x15/0x40
    [ 5536.029344] partition_sched_domains+0x44/0x450
    [ 5536.036447] ? __mutex_unlock_slowpath+0x111/0x1f0
    [ 5536.043844] rebuild_sched_domains_locked+0x64/0xb0
    [ 5536.051336] update_flag+0x11d/0x210
    [ 5536.057373] ? mutex_lock_nested+0x2df/0x450
    [ 5536.064186] ? cpuset_css_offline+0x1b/0x60
    [ 5536.070899] ? trace_hardirqs_on+0xd/0x10
    [ 5536.077420] ? mutex_lock_nested+0x2df/0x450
    [ 5536.084234] ? css_killed_work_fn+0x25/0x220
    [ 5536.091049] cpuset_css_offline+0x35/0x60
    [ 5536.097571] css_killed_work_fn+0x5c/0x220
    [ 5536.104207] process_one_work+0x1df/0x710
    [ 5536.110736] ? process_one_work+0x160/0x710
    [ 5536.117461] worker_thread+0x12b/0x4a0
    [ 5536.123697] ? process_one_work+0x710/0x710
    [ 5536.130426] kthread+0xfe/0x120
    [ 5536.135991] ret_from_fork+0x1f/0x40
    [ 5536.142041] ? kthread_create_on_node+0x230/0x230

    One cgroup maintainer mentioned that "cgroup is trying to offline
    a cpuset css, which takes place under cgroup_mutex. The offlining
    ends up trying to drain active usages of a sysctl table which apparently
    is not happening."
    The real reason is that proc_sys_readdir doesn't drop the reference added
    by grab_header when returning from the !dir_emit_dots path, so this cpuset
    offline path waits there forever.

    See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13

    Fixes: f0c3b5093add ("[readdir] convert procfs")
    Cc: stable@vger.kernel.org
    Reported-by: CAI Qian
    Tested-by: Yang Shukui
    Signed-off-by: Zhou Chengming
    Acked-by: Al Viro
    Signed-off-by: Eric W. Biederman

    Zhou Chengming
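    The bug is a classic reference leak on an early-return path. Below is a hypothetical C model. model_grab_header() and model_head_finish() stand in for grab_header() and sysctl_head_finish(), and a plain counter simplifies the real use counting. The model shows the buggy shape next to a single-exit shape along the lines of the fix:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a sysctl table header with a use count. */
struct model_header {
	int count;
};

static struct model_header *model_grab_header(struct model_header *h)
{
	h->count++;
	return h;
}

static void model_head_finish(struct model_header *h)
{
	h->count--;
}

/*
 * Buggy shape: the early return skips model_head_finish(), so the count
 * never drops back and anyone waiting for zero users blocks forever.
 */
int model_readdir_buggy(struct model_header *table, bool emit_dots_ok)
{
	struct model_header *head = model_grab_header(table);

	if (!emit_dots_ok)
		return 0;	/* leaks the reference */
	/* ... emit directory entries ... */
	model_head_finish(head);
	return 0;
}

/* Fixed shape: every exit path funnels through the release. */
int model_readdir_fixed(struct model_header *table, bool emit_dots_ok)
{
	struct model_header *head = model_grab_header(table);

	if (!emit_dots_ok)
		goto out;
	/* ... emit directory entries ... */
out:
	model_head_finish(head);
	return 0;
}
```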
     

09 Jan, 2017

1 commit

  • Processes can only alter their own security attributes via
    /proc/pid/attr nodes. This is presently enforced by each individual
    security module and is also imposed by the Linux credentials
    implementation, which only allows a task to alter its own credentials.
    Move the check enforcing this restriction from the individual
    security modules to proc_pid_attr_write() before calling the security hook,
    and drop the unnecessary task argument to the security hook since it can
    only ever be the current task.

    Signed-off-by: Stephen Smalley
    Acked-by: Casey Schaufler
    Acked-by: John Johansen
    Signed-off-by: Paul Moore

    Stephen Smalley
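    The relocated check amounts to an identity comparison before the security hook runs. Below is a minimal, hypothetical model. struct model_task, model_security_setprocattr() and model_attr_write() are illustrative stand-ins, not kernel APIs:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical minimal task model; only identity matters here. */
struct model_task {
	int pid;
};

/* Stand-in for the LSM hook, which no longer takes a task argument. */
static int model_security_setprocattr(const char *name, const void *value, size_t size)
{
	(void)name;
	(void)value;
	return (int)size;	/* pretend the module accepted the whole write */
}

/*
 * The /proc/pid/attr write path rejects writes aimed at any task other
 * than the caller, so the hook can assume the target is the current task.
 */
int model_attr_write(const struct model_task *current_task,
		     const struct model_task *target,
		     const char *name, const void *value, size_t size)
{
	if (current_task != target)
		return -EACCES;
	return model_security_setprocattr(name, value, size);
}
```

    Doing the check once in the caller means no individual security module can forget it.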
     

25 Dec, 2016

1 commit


18 Dec, 2016

1 commit

  • …/linux/kernel/git/mszeredi/vfs

    Pull partial readlink cleanups from Miklos Szeredi.

    This is the uncontroversial part of the readlink cleanup patch-set that
    simplifies the default readlink handling.

    Miklos and Al are still discussing the rest of the series.

    * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    vfs: make generic_readlink() static
    vfs: remove ".readlink = generic_readlink" assignments
    vfs: default to generic_readlink()
    vfs: replace calling i_op->readlink with vfs_readlink()
    proc/self: use generic_readlink
    ecryptfs: use vfs_get_link()
    bad_inode: add missing i_op initializers

    Linus Torvalds
     

15 Dec, 2016

2 commits

  • Pull audit updates from Paul Moore:
    "After the small number of patches for v4.9, we've got a much bigger
    pile for v4.10.

    The bulk of these patches involve a rework of the audit backlog queue
    to enable us to move the netlink multicasting out of the task/thread
    that generates the audit record and into the kernel thread that emits
    the record (just like we do for the audit unicast to auditd).

    While we were playing with the backlog queue(s) we fixed a number of
    other little problems with the code, and from all the testing so far
    things look to be in much better shape now. Doing this also allowed us
    to re-enable disabling IRQs for some netns operations ("netns: avoid
    disabling irq for netns id").

    The remaining patches fix some small problems that are well documented
    in the commit descriptions, as well as adding session ID filtering
    support"

    * 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit:
    audit: use proper refcount locking on audit_sock
    netns: avoid disabling irq for netns id
    audit: don't ever sleep on a command record/message
    audit: handle a clean auditd shutdown with grace
    audit: wake up kauditd_thread after auditd registers
    audit: rework audit_log_start()
    audit: rework the audit queue handling
    audit: rename the queues and kauditd related functions
    audit: queue netlink multicast sends just like we do for unicast sends
    audit: fixup audit_init()
    audit: move kaudit thread start from auditd registration to kaudit init (#2)
    audit: add support for session ID user filter
    audit: fix formatting of AUDIT_CONFIG_CHANGE events
    audit: skip sessionid sentinel value when auto-incrementing
    audit: tame initialization warning len_abuf in audit_log_execve_info
    audit: less stack usage for /proc/*/loginuid

    Linus Torvalds
     
  • Pull security subsystem updates from James Morris:
    "Generally pretty quiet for this release. Highlights:

    Yama:
    - allow ptrace access for original parent after re-parenting

    TPM:
    - add documentation
    - many bugfixes & cleanups
    - define a generic open() method for ascii & bios measurements

    Integrity:
    - Harden against malformed xattrs

    SELinux:
    - bugfixes & cleanups

    Smack:
    - Remove unnecessary smack_known_invalid label
    - Do not apply star label in smack_setprocattr hook
    - parse mnt opts after privileges check (fixes unpriv DoS vuln)"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (56 commits)
    Yama: allow access for the current ptrace parent
    tpm: adjust return value of tpm_read_log
    tpm: vtpm_proxy: conditionally call tpm_chip_unregister
    tpm: Fix handling of missing event log
    tpm: Check the bios_dir entry for NULL before accessing it
    tpm: return -ENODEV if np is not set
    tpm: cleanup of printk error messages
    tpm: replace of_find_node_by_name() with dev of_node property
    tpm: redefine read_log() to handle ACPI/OF at runtime
    tpm: fix the missing .owner in tpm_bios_measurements_ops
    tpm: have event log use the tpm_chip
    tpm: drop tpm1_chip_register(/unregister)
    tpm: replace dynamically allocated bios_dir with a static array
    tpm: replace symbolic permission with octal for securityfs files
    char: tpm: fix kerneldoc tpm2_unseal_trusted name typo
    tpm_tis: Allow tpm_tis to be bound using DT
    tpm, tpm_vtpm_proxy: add kdoc comments for VTPM_PROXY_IOC_NEW_DEV
    tpm: Only call pm_runtime_get_sync if device has a parent
    tpm: define a generic open() method for ascii & bios measurements
    Documentation: tpm: add the Physical TPM device tree binding documentation
    ...

    Linus Torvalds
     

14 Dec, 2016

1 commit

  • Pull xen updates from Juergen Gross:
    "Xen features and fixes for 4.10

    These are some fixes, a move of some arm related headers to share them
    between arm and arm64 and a series introducing a helper to make code
    more readable.

    The most notable change is David stepping down as maintainer of the
    Xen hypervisor interface. This results in me sending you the pull
    requests for Xen related code from now on"

    * tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits)
    xen/balloon: Only mark a page as managed when it is released
    xenbus: fix deadlock on writes to /proc/xen/xenbus
    xen/scsifront: don't request a slot on the ring until request is ready
    xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
    x86: Make E820_X_MAX unconditionally larger than E820MAX
    xen/pci: Bubble up error and fix description.
    xen: xenbus: set error code on failure
    xen: set error code on failures
    arm/xen: Use alloc_percpu rather than __alloc_percpu
    arm/arm64: xen: Move shared architecture headers to include/xen/arm
    xen/events: use xen_vcpu_id mapping for EVTCHNOP_status
    xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing
    xen-scsifront: Add a missing call to kfree
    MAINTAINERS: update XEN HYPERVISOR INTERFACE
    xenfs: Use proc_create_mount_point() to create /proc/xen
    xen-platform: use builtin_pci_driver
    xen-netback: fix error handling output
    xen: make use of xenbus_read_unsigned() in xenbus
    xen: make use of xenbus_read_unsigned() in xen-pciback
    xen: make use of xenbus_read_unsigned() in xen-fbfront
    ...

    Linus Torvalds
     

13 Dec, 2016

11 commits

  • Runtime nlink calculation works but meh. I don't know how to do it at
    compile time, but I know how to do it at init time.

    Shift "2+" part into init time as a bonus.

    Link: http://lkml.kernel.org/r/20161122195549.GB29812@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Comparison for "
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • format_decode and vsnprintf occasionally show up in perf top, so I went
    looking for places that might not need the full printf power. With the
    help of kprobes, I gathered some statistics on which format strings we
    mostly pass to vsnprintf. On a trivial desktop workload, I hit "%x" 25%
    of the time, so something apparently reads /proc/pid/status (which does
    5*16 printf("%x") calls) a lot.

    With this patch, reading /proc/pid/status is 30% faster according to
    this microbenchmark:

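    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        int i, fd;
        for (i = 0; i < 10000; ++i) { /* open, read and close /proc/self/status */
            fd = open("/proc/self/status", O_RDONLY);
            read(fd, buf, sizeof(buf));
            close(fd);
        }
        return 0;
    }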

    Link: http://lkml.kernel.org/r/1474410485-1305-1-git-send-email-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes
    Acked-by: Andrei Vagin
    Acked-by: Kees Cook
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • Some comments were obsoleted since commit 05c0ae21c034 ("try a saner
    locking for pde_opener...").

    Some new comments added.

    Some confusing comments replaced with equally confusing ones.

    Link: http://lkml.kernel.org/r/20161029160231.GD1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • kzalloc is too much: half of the fields will be reinitialized anyway.

    If a proc file doesn't have a ->release hook (some still do not), clearing
    is unnecessary because the structure will be freed immediately.

    Link: http://lkml.kernel.org/r/20161029155747.GC1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • struct pde_opener::closing is boolean.

    Link: http://lkml.kernel.org/r/20161029155439.GB1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • list_del_init() is too much, structure will be freed in three lines
    anyway.

    Link: http://lkml.kernel.org/r/20161029155313.GA1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Linux doesn't support 4GB+ filenames in /proc, so unsigned long is too
    much.

    MOV r64, r/m64 is larger than MOV r32, r/m32.

    Link: http://lkml.kernel.org/r/20161029161123.GG1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • "unsigned int" is better on x86_64 because most of the time it
    autoexpands to a 64-bit value, while "int" requires a MOVSX instruction.

    Link: http://lkml.kernel.org/r/20161029160810.GF1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Similar to being able to examine if a process has been correctly
    confined with seccomp, the state of no_new_privs is equally interesting,
    so this adds it to /proc/$pid/status.

    Link: http://lkml.kernel.org/r/20161103214041.GA58566@beast
    Signed-off-by: Kees Cook
    Reviewed-by: Jann Horn
    Cc: Jonathan Corbet
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Konstantin Khlebnikov
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: Rodrigo Freire
    Cc: John Stultz
    Cc: Ross Zwisler
    Cc: Robert Ho
    Cc: Jerome Marchand
    Cc: Andy Lutomirski
    Cc: Johannes Weiner
    Cc: Alexey Dobriyan
    Cc: "Richard W.M. Jones"
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • The other pagetable walks in task_mmu.c have a cond_resched() after
    walking their ptes: add a cond_resched() in gather_pte_stats() too, for
    reading /proc//numa_maps. Only pagemap_pmd_range() has a
    cond_resched() in its (unusually expensive) pmd_trans_huge case: more
    should probably be added, but leave them unchanged for now.

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1612052157400.13021@eggly.anvils
    Signed-off-by: Hugh Dickins
    Acked-by: Michal Hocko
    Cc: David Rientjes
    Cc: Gerald Schaefer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
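    In the /proc/pid/status speedup above, the signal masks are printed one hex digit at a time. A table lookup is one way to avoid the printf machinery for single digits. The sketch below assumes that approach and a 64-bit mask. model_render_sigset() is a hypothetical helper, not the kernel's render_sigset_t():

```c
#include <stddef.h>
#include <stdint.h>

static const char model_hex_asc[] = "0123456789abcdef";

/*
 * Render a 64-bit signal mask as 16 hex digits, most significant nibble
 * first, using one table lookup per digit instead of one printf("%x")
 * call per digit. buf must hold at least 17 bytes.
 */
void model_render_sigset(uint64_t mask, char *buf)
{
	size_t i;

	for (i = 0; i < 16; i++) {
		unsigned int nibble = (unsigned int)(mask >> (60 - 4 * i)) & 0xf;

		buf[i] = model_hex_asc[nibble];
	}
	buf[16] = '\0';
}
```

    The output matches what "%016llx" would produce. The format string no longer has to be parsed for every digit.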
     

09 Dec, 2016

2 commits


24 Nov, 2016

1 commit


17 Nov, 2016

1 commit

  • Mounting proc in user namespace containers fails if the xenbus
    filesystem is mounted on /proc/xen because this directory fails
    the "permanently empty" test. proc_create_mount_point() exists
    specifically to create such mountpoints in proc but is currently
    proc-internal. Export this interface to modules, then use it in
    xenbus when creating /proc/xen.

    Signed-off-by: Seth Forshee
    Signed-off-by: David Vrabel
    Signed-off-by: Juergen Gross

    Seth Forshee
     

15 Nov, 2016

1 commit

  • Pass the file mode of the proc inode to be created to
    proc_pid_make_inode. In proc_pid_make_inode, initialize inode->i_mode
    before calling security_task_to_inode. This allows selinux to set
    isec->sclass right away without introducing "half-initialized" inode
    security structs.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Paul Moore

    Andreas Gruenbacher
     

04 Nov, 2016

1 commit


28 Oct, 2016

1 commit

  • Reading auxv of any kernel thread results in NULL pointer dereferencing
    in auxv_read() where mm can be NULL. Fix that by checking for NULL mm
    and bailing out early. This is also the original behavior changed by
    recent commit c5317167854e ("proc: switch auxv to use of __mem_open()").

    # cat /proc/2/auxv
    Unable to handle kernel NULL pointer dereference at virtual address 000000a8
    Internal error: Oops: 17 [#1] PREEMPT SMP ARM
    CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
    Hardware name: BCM2709
    task: ea3b0b00 task.stack: e99b2000
    PC is at auxv_read+0x24/0x4c
    LR is at do_readv_writev+0x2fc/0x37c
    Process cat (pid: 113, stack limit = 0xe99b2210)
    Call chain:
    auxv_read
    do_readv_writev
    vfs_readv
    default_file_splice_read
    splice_direct_to_actor
    do_splice_direct
    do_sendfile
    SyS_sendfile64
    ret_fast_syscall

    Fixes: c5317167854e ("proc: switch auxv to use of __mem_open()")
    Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
    Signed-off-by: Leon Yu
    Acked-by: Oleg Nesterov
    Acked-by: Michal Hocko
    Cc: Al Viro
    Cc: Kees Cook
    Cc: John Stultz
    Cc: Mateusz Guzik
    Cc: Janis Danisevskis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leon Yu
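    The fix is an early return when the task has no mm. Below is a hypothetical C model. struct model_mm and model_auxv_read() are illustrative stand-ins for the kernel structures. For kernel threads, the model returns 0, meaning end of file:

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model: kernel threads have no user address space (mm == NULL). */
struct model_mm {
	const unsigned char *saved_auxv;
	size_t auxv_len;
};

/*
 * Bail out with 0 (EOF) when there is no mm instead of dereferencing it.
 * Otherwise copy at most count bytes starting at offset *ppos.
 */
ssize_t model_auxv_read(const struct model_mm *mm, unsigned char *buf,
			size_t count, size_t *ppos)
{
	size_t avail;

	if (!mm)
		return 0;
	if (*ppos >= mm->auxv_len)
		return 0;
	avail = mm->auxv_len - *ppos;
	if (count > avail)
		count = avail;
	memcpy(buf, mm->saved_auxv + *ppos, count);
	*ppos += count;
	return (ssize_t)count;
}
```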
     

25 Oct, 2016

1 commit

  • Now that Lorenzo cleaned things up and made the FOLL_FORCE users
    explicit, it becomes obvious how some of them don't really need
    FOLL_FORCE at all.

    So remove FOLL_FORCE from the proc code that reads the command line and
    arguments from user space.

    The mem_rw() function actually does want FOLL_FORCE, because gdb (and
    possibly many other debuggers) use it as a much more convenient version
    of PTRACE_PEEKDATA, but we should consider making the FOLL_FORCE part
    conditional on actually being a ptracer. This patch does not actually do
    that; it just adds a comment to that effect and moves the gup_flags
    settings next to each other.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
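    A small C sketch can summarize the resulting flag choices. The MODEL_FOLL_* values and function names are hypothetical, and the kernel's FOLL_FORCE and FOLL_WRITE constants have different values. Only the policy follows the text above:

```c
#include <stdbool.h>

/* Hypothetical flag values; the kernel's FOLL_* constants differ. */
#define MODEL_FOLL_WRITE 0x01u
#define MODEL_FOLL_FORCE 0x10u

/* Command line / argument reads: plain read access, no FOLL_FORCE. */
unsigned int model_cmdline_gup_flags(void)
{
	return 0;
}

/*
 * mem_rw(): keeps FOLL_FORCE so debuggers can use /proc/PID/mem as a
 * faster PTRACE_PEEKDATA, and adds FOLL_WRITE only for writes.
 */
unsigned int model_mem_rw_gup_flags(bool write)
{
	unsigned int flags = MODEL_FOLL_FORCE;

	if (write)
		flags |= MODEL_FOLL_WRITE;
	return flags;
}
```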
     

23 Oct, 2016

1 commit

  • Pull vmap stack fixes from Ingo Molnar:
    "This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
    accesses that used to be just somewhat questionable are now totally
    buggy.

    These changes try to do it without breaking the ABI: the fields are
    left there, they are just reporting zero, or reporting narrower
    information (the maps file change)"

    * 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
    fs/proc: Stop trying to report thread stacks
    fs/proc: Stop reporting eip and esp in /proc/PID/stat
    mm/numa: Remove duplicated include from mprotect.c

    Linus Torvalds
     

20 Oct, 2016

2 commits

  • This reverts more of:

    b76437579d13 ("procfs: mark thread stack correctly in proc//maps")

    ... which was partially reverted by:

    65376df58217 ("proc: revert /proc//maps [stack:TID] annotation")

    Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps.

    In current kernels, /proc/PID/maps (or /proc/TID/maps even for
    threads) shows "[stack]" for VMAs in the mm's stack address range.

    In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the
    target thread's stack's VMA. This is racy, probably returns garbage
    and, on arches with CONFIG_THREAD_INFO_IN_TASK=y, is also crash-prone:
    KSTK_ESP is not safe to use on tasks that aren't known to be running
    ordinary process-context kernel code.

    This patch removes the difference and just shows "[stack]" for VMAs
    in the mm's stack range. This is IMO much more sensible -- the
    actual "stack" address really is treated specially by the VM code,
    and the current thread stack isn't even well-defined for programs
    that frequently switch stacks on their own.

    Reported-by: Jann Horn
    Signed-off-by: Andy Lutomirski
    Acked-by: Thomas Gleixner
    Cc: Al Viro
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Linux API
    Cc: Peter Zijlstra
    Cc: Tycho Andersen
    Link: http://lkml.kernel.org/r/3e678474ec14e0a0ec34c611016753eea2e1b8ba.1475257877.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • Reporting these fields on a non-current task is dangerous. If the
    task is in any state other than normal kernel code, they may contain
    garbage or even kernel addresses on some architectures. (x86_64
    used to do this. I bet lots of architectures still do.) With
    CONFIG_THREAD_INFO_IN_TASK=y, it can OOPS, too.

    As far as I know, there are no user programs that make any material
    use of these fields, so just get rid of them.

    Reported-by: Jann Horn
    Signed-off-by: Andy Lutomirski
    Acked-by: Thomas Gleixner
    Cc: Al Viro
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Linux API
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Tycho Andersen
    Link: http://lkml.kernel.org/r/a5fed4c3f4e33ed25d4bb03567e329bc5a712bcc.1475257877.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
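    Per the pull request above, the eip/esp fields stay in /proc/PID/stat but report zero. User space that still parses them will therefore read zeros. Below is a minimal C sketch for extracting one numeric field from a stat line. It assumes the field numbering used by proc(5), where kstkesp and kstkeip are fields 29 and 30. model_stat_field() is a hypothetical helper:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return field `field` (1-based, proc(5) numbering) from a /proc/PID/stat
 * line. The comm field (2) may contain spaces and ')', so parsing restarts
 * after the LAST ')'; field 3 (state) comes next. Only fields >= 4 are
 * supported here. Stores the value in *out and returns 0, or returns -1.
 */
int model_stat_field(const char *line, int field, unsigned long long *out)
{
	const char *p = strrchr(line, ')');
	int n;

	if (!p || field < 4)
		return -1;
	p++;				/* now at " S 1 ..." */
	for (n = 3; n <= field; n++) {
		char *end;

		while (*p == ' ')
			p++;
		if (*p == '\0' || *p == '\n')
			return -1;
		if (n == field) {
			*out = strtoull(p, &end, 10);
			return end == p ? -1 : 0;
		}
		p = strchr(p, ' ');	/* skip over this token */
		if (!p)
			return -1;
	}
	return -1;
}
```

    Restarting after the last ')' matters because a process can name itself with spaces or parentheses. A naive split on spaces would then shift every later field.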
     

19 Oct, 2016

1 commit

  • This removes the 'write' argument from access_remote_vm() and replaces
    it with 'gup_flags' as use of this function previously silently implied
    FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

    We make this explicit as use of FOLL_FORCE can result in surprising
    behaviour (and hence bugs) within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes