05 Aug, 2016

1 commit


03 Aug, 2016

2 commits

  • When CONFIG_RANDOMIZE_MEMORY is set on x86-64, __PAGE_OFFSET becomes
    a variable, so using it as a symbol in the image memory restoration
    assembly code under core_restore_code is no longer correct.

    To avoid that problem, modify set_up_temporary_mappings() to compute
    the physical address of the temporary page tables and store it in
    temp_level4_pgt, so that the value of that variable is ready to be
    written into CR3. Then, the assembly code doesn't have to worry
    about converting that value into a physical address and things work
    regardless of whether or not CONFIG_RANDOMIZE_MEMORY is set.
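    As a rough user-space sketch of the idea (not the kernel code): `page_offset_base` stands in for the randomized direct-map base, and `virt_to_phys_model()`, `setup_temp_mappings_model()` and `restore_code_model()` are hypothetical names. The C side converts the page-table address once, using the run-time base, so the restore path only loads a ready-made physical address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the direct-map base. With memory
 * randomization it is only known at run time, so assembly code
 * cannot treat it as a link-time constant. */
static uint64_t page_offset_base;

/* Hypothetical model of the variable whose value goes into CR3. */
static uint64_t temp_level4_pgt;

/* Direct-map translation: physical = virtual - base. */
static uint64_t virt_to_phys_model(uint64_t vaddr)
{
	assert(vaddr >= page_offset_base);
	return vaddr - page_offset_base;
}

/* C side: do the conversion once, using the run-time base. */
static void setup_temp_mappings_model(uint64_t pgd_vaddr)
{
	temp_level4_pgt = virt_to_phys_model(pgd_vaddr);
}

/* "Assembly" side: no arithmetic on the base, just load the value. */
static uint64_t restore_code_model(void)
{
	return temp_level4_pgt; /* would be written into CR3 */
}
```

    The same stored value is correct whatever base was chosen at boot, which is why the restore code no longer cares whether randomization is enabled.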

    Reported-and-tested-by: Thomas Garnier
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • CPU frequency transition statistics are not absolutely required for
    proper cpufreq operation on the system AFAICT, so remove the default-yes
    setting in Kconfig.

    Signed-off-by: Borislav Petkov
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Borislav Petkov
     

30 Jul, 2016

16 commits

  • Pull power management fix from Rafael Wysocki:
    "Fix a nasty (and really hard to debug) memory corruption during resume
    from hibernation on x86-64 (that leads to a kernel panic most of the
    time) due to the use of a stale stack pointer value in FRAME_BEGIN
    (Josh Poimboeuf)"

    * tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    x86/power/64: Fix hibernation return address corruption

    Linus Torvalds
     
  • Pull more cgroup updates from Tejun Heo:
    "I forgot to include the patches which got applied to for-4.7-fixes
    late during last cycle.

    Eric's three patches fix bugs introduced with the namespace support"

    * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroupns: Only allow creation of hierarchies in the initial cgroup namespace
    cgroupns: Close race between cgroup_post_fork and copy_cgroup_ns
    cgroupns: Fix the locking in copy_cgroup_ns

    Linus Torvalds
     
  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the next part of the hotplug rework.

    - Convert all notifiers with a priority assigned

    - Convert all CPU_STARTING/DYING notifiers

    The final removal of the STARTING/DYING infrastructure will happen
    when the merge window closes.

    Another 700 lines of impenetrable maze gone :)"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    timers/core: Correct callback order during CPU hot plug
    leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
    powerpc/numa: Convert to hotplug state machine
    arm/perf: Fix hotplug state machine conversion
    irqchip/armada: Avoid unused function warnings
    ARC/time: Convert to hotplug state machine
    clocksource/atlas7: Convert to hotplug state machine
    clocksource/armada-370-xp: Convert to hotplug state machine
    clocksource/exynos_mct: Convert to hotplug state machine
    clocksource/arm_global_timer: Convert to hotplug state machine
    rcu: Convert rcutree to hotplug state machine
    KVM/arm/arm64/vgic-new: Convert to hotplug state machine
    smp/cfd: Convert core to hotplug state machine
    x86/x2apic: Convert to CPU hotplug state machine
    profile: Convert to hotplug state machine
    timers/core: Convert to hotplug state machine
    hrtimer: Convert to hotplug state machine
    x86/tboot: Convert to hotplug state machine
    arm64/armv8 deprecated: Convert to hotplug state machine
    hwtracing/coresight-etm4x: Convert to hotplug state machine
    ...

    Linus Torvalds
     
  • Pull IDE updates from David Miller:
    "Just a couple small bug fixes, nothing overly exciting in here"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
    ide: missing break statement in set_timings_mdma()
    ide: hpt366: fix incorrect mask when checking at cmd_high_time
    ide-tape: fix misprint in failure handling in idetape_init()
    cmd640: add __init attribute

    Linus Torvalds
     
  • Pull sparc updates from David Miller:

    1) Double spin lock bug in sunhv serial driver, from Dan Carpenter.

    2) Use correct RSS estimate when determining whether to grow the huge
    TSB or not, from Mike Kravetz.

    3) Don't use full three level page tables for hugepages, PMD level is
    sufficient. From Nitin Gupta.

    4) Mask out extraneous bits from TSB_TAG_ACCESS register, we only want
    the address bits.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc64: Trim page tables for 8M hugepages
    sparc64 mm: Fix base TSB sizing when hugetlb pages are used
    sparc: serial: sunhv: fix a double lock bug
    sparc32: off by ones in BUG_ON()
    sparc: Don't leak context bits into thread->fault_address

    Linus Torvalds
     
  • Pull ARC updates from Vineet Gupta:
    "Things have been calm here - nothing much except for a few fixes"

    * tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: mm: don't loose PTE_SPECIAL in pte_modify()
    ARC: dma: fix address translation in arc_dma_free
    ARC: typo fix in mm/ioremap.c
    ARC: fix linux-next build breakage

    Linus Torvalds
     
  • * pm-sleep:
    x86/power/64: Fix hibernation return address corruption

    Rafael J. Wysocki
     
  • Pull AVR32 updates from Hans-Christian Noren Egtvedt.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
    avr32: off by one in at32_init_pio()
    avr32: fixup code style in unistd.h and syscall_table.S
    avr32: wire up preadv2 and pwritev2 syscalls

    Linus Torvalds
     
  • Pull ARM updates from Russell King:
    "Included in this update are:

    - Patches from Gregory Clement to fix the coherent DMA cases in our
    dma-mapping code.

    - A number of CPU errata updates and fixes.

    - ARM cpuidle improvements from Jisheng Zhang.

    - Fix from Kees for the location of _etext.

    - Cleanups from Masahiro Yamada to avoid duplicated messages during
    the kernel build, and remove CONFIG_ARCH_HAS_BARRIERS.

    - Remove a udelay loop limitation, allowing for faster CPUs to
    calibrate the delay correctly.

    - Cleanup some left-overs from the SW PAN implementation.

    - Ensure that a modified address limit is not visible to exception
    handlers"

    * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (21 commits)
    ARM: 8586/1: cpuidle: make arm_cpuidle_suspend() a bit more efficient
    ARM: 8585/1: cpuidle: fix !cpuidle_ops[cpu].init case during init
    ARM: 8561/4: dma-mapping: Fix the coherent case when iommu is used
    ARM: 8561/3: dma-mapping: Don't use outer_flush_range when the L2C is coherent
    ARM: 8560/1: errata: Workaround errata A12 825619 / A17 852421
    ARM: 8559/1: errata: Workaround erratum A12 821420
    ARM: 8558/1: errata: Workaround errata A12 818325/852422 A17 852423
    ARM: save and reset the address limit when entering an exception
    ARM: 8577/1: Fix Cortex-A15 798181 errata initialization
    ARM: 8584/1: floppy: avoid gcc-6 warning
    ARM: 8583/1: mm: fix location of _etext
    ARM: 8582/1: remove unused CONFIG_ARCH_HAS_BARRIERS
    ARM: 8306/1: loop_udelay: remove bogomips value limitation
    ARM: 8581/1: add missing to arch/arm/kernel/devtree.c
    ARM: 8576/1: avoid duplicating "Kernel: arch/arm/boot/*Image is ready"
    ARM: 8556/1: on a generic DT system: do not touch l2x0
    ARM: uaccess: remove put_user() code duplication
    ARM: 8580/1: Remove orphaned __addr_ok() definition
    ARM: get rid of horrible *(unsigned int *)(regs + 1)
    ARM: introduce svc_pt_regs structure
    ...

    Linus Torvalds
     
  • Pull fuse updates from Miklos Szeredi:
    "This fixes error propagation from writeback to fsync/close for
    writeback cache mode as well as adding a missing capability flag to
    the INIT message. The rest are cleanups.

    (The commits are recent but all the code actually sat in -next for a
    while now. The recommits are due to conflict avoidance and the
    addition of Cc: stable@...)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: use filemap_check_errors()
    mm: export filemap_check_errors() to modules
    fuse: fix wrong assignment of ->flags in fuse_send_init()
    fuse: fuse_flush must check mapping->flags for errors
    fuse: fsync() did not return IO errors
    fuse: don't mess with blocking signals
    new helper: wait_event_killable_exclusive()
    fuse: improve aio directIO write performance for size extending writes

    Linus Torvalds
     
  • This reverts commit 3c9fe8cdff1b889a059a30d22f130372f2b3885f.

    As Miklos points out in commit c1b2cc1a765a, the "lookup_hash()" helper
    is now unused, and in fact, with the hash salting changes, since the
    hash of a dentry name now depends on the directory dentry it is in, the
    helper function isn't even really likely to be useful.
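    A toy illustration of why an unsalted helper is a trap: if the name hash is seeded with something derived from the parent directory, the same name hashes differently in different directories. `salted_name_hash()` is a hypothetical FNV-1a-style function, not the kernel's hashing code; only the dependence on the parent is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy salted hash: FNV-1a over the name, with the initial state
 * mixed with a salt derived from the parent directory pointer.
 * This is NOT the kernel's algorithm. */
static uint32_t salted_name_hash(const void *salt, const char *name, size_t len)
{
	uint32_t h = 2166136261u ^ (uint32_t)(uintptr_t)salt;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;	/* odd multiplier: each step is invertible */
	}
	return h;
}
```

    Since every step is invertible for a fixed name, two different salts always give two different hashes, so a helper that hashes only the name cannot match what the dcache stored.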

    So rather than keep it around in case somebody else might end up finding
    a use for it, let's just remove the helper and not trick people into
    thinking it might be a useful thing.

    For example, I had obviously completely missed how the helper didn't
    follow the normal dentry hashing patterns, and how the hash salting
    patch broke overlayfs. Things would quietly build and look sane, but
    not work.

    Suggested-by: Miklos Szeredi
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull overlayfs update from Miklos Szeredi:
    "First of all, this fixes a regression in overlayfs introduced by the
    dentry hash salting. I've moved the patch fixing this to the front of
    the queue, so if (god forbid) something needs to be bisected in
    overlayfs this regression won't interfere with that.

    The biggest part is preparation for selinux support, done by Vivek
    Goyal. Essentially this makes all operations on underlying
    filesystems be done with credentials of mounter. This makes
    everything nicely consistent.

    There are also fixes for a number of known and recently discovered
    non-standard behavior (thanks to Eryu Guan for testing and improving
    the test suites)"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (23 commits)
    ovl: simplify empty checking
    qstr: constify instances in overlayfs
    ovl: clear nlink on rmdir
    ovl: disallow overlayfs as upperdir
    ovl: fix warning
    ovl: remove duplicated include from super.c
    ovl: append MAY_READ when diluting write checks
    ovl: dilute permission checks on lower only if not special file
    ovl: fix POSIX ACL setting
    ovl: share inode for hard link
    ovl: store real inode pointer in ->i_private
    ovl: permission: return ECHILD instead of ENOENT
    ovl: update atime on upper
    ovl: fix sgid on directory
    ovl: simplify permission checking
    ovl: do not require mounter to have MAY_WRITE on lower
    ovl: do operations on underlying file system in mounter's context
    ovl: modify ovl_permission() to do checks on two inodes
    ovl: define ->get_acl() for overlay inodes
    ovl: move some common code in a function
    ...

    Linus Torvalds
     
  • Pull freevxfs updates from Christoph Hellwig:
    "Support for foreign endianness and HP-UX superblocks from
    Krzysztof Błaszkowski"

    * tag 'freevxfs-for-4.8' of git://git.infradead.org/users/hch/freevxfs:
    freevxfs: update Kconfig information
    freevxfs: refactor readdir and lookup code
    freevxfs: fix lack of inode initialization
    freevxfs: fix memory leak in vxfs_read_fshead()
    freevxfs: update documentation and cresdits for HP-UX support
    freevxfs: implement ->alloc_inode and ->destroy_inode
    freevxfs: avoid the need for forward declaring the super operations
    freevxfs: move VFS inode allocation into vxfs_blkiget and vxfs_stiget
    freevxfs: remove vxfs_put_fake_inode
    freevxfs: handle big endian HP-UX file systems

    Linus Torvalds
     
  • Pull configfs update from Christoph Hellwig:
    "A simple error handling fix from Tal Shorer"

    * tag 'configfs-for-4.8' of git://git.infradead.org/users/hch/configfs:
    configfs: don't set buffer_needs_fill to zero if show() returns error

    Linus Torvalds
     
  • Pull CIFS/SMB3 fixes from Steve French:
    "Various CIFS/SMB3 fixes, most for stable"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: Fix a possible invalid memory access in smb2_query_symlink()
    fs/cifs: make share unaccessible at root level mountable
    cifs: fix crash due to race in hmac(md5) handling
    cifs: unbreak TCP session reuse
    cifs: Check for existing directory when opening file with O_CREAT
    Add MF-Symlinks support for SMB 2.0

    Linus Torvalds
     
  • For PMD-aligned (8M) hugepages, we currently allocate
    all four page table levels, which is wasteful. We now
    allocate only down to the PMD level, which reduces the
    memory used by page tables.

    Also, when freeing the page tables for an 8M-hugepage-backed
    region, make sure we don't try to access the non-existent PTE level.
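    A simplified model of the saving, assuming 8K base pages and a PMD entry that maps exactly 8M (shifts of 13 and 23); `levels_needed()` is a hypothetical helper that counts how many levels a walk must allocate before a single entry covers the whole page.

```c
#include <assert.h>

/* Assumed shifts: 8K base pages, one PMD entry maps 8M. */
#define MODEL_PAGE_SHIFT 13
#define MODEL_PMD_SHIFT  23

/* Hypothetical helper: levels to allocate (walk order PGD, PUD,
 * PMD, PTE) to map one page of size (1 << page_shift). */
static int levels_needed(unsigned int page_shift)
{
	assert(page_shift >= MODEL_PAGE_SHIFT);
	if (page_shift >= MODEL_PMD_SHIFT)
		return 3;	/* PGD -> PUD -> PMD; the PMD entry maps the page */
	return 4;		/* a PTE table is needed as well */
}
```

    The freeing path has to follow the same rule: for an 8M mapping there is no PTE table to walk or free.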

    Orabug: 22630259

    Signed-off-by: Nitin Gupta
    Signed-off-by: David S. Miller

    Nitin Gupta
     

29 Jul, 2016

21 commits

  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Can be used by fuse, btrfs and f2fs to replace opencoded variants.
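    For illustration, a user-space model of what such a helper does: report and clear the per-mapping error bits, so each writeback error is returned once. The bit names echo the kernel's AS_EIO/AS_ENOSPC, but the struct, the `_model` function and its non-atomic bit handling are simplified assumptions, not the exported kernel function. In this model -EIO takes precedence when both bits are set.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins for the address_space error bits. */
enum { MODEL_AS_EIO = 1u << 0, MODEL_AS_ENOSPC = 1u << 1 };

struct mapping_model {
	unsigned int flags;	/* set by writeback completion on error */
};

/* Report and clear any recorded writeback error. */
static int filemap_check_errors_model(struct mapping_model *m)
{
	int ret = 0;

	if (m->flags & MODEL_AS_ENOSPC) {
		m->flags &= ~MODEL_AS_ENOSPC;
		ret = -ENOSPC;
	}
	if (m->flags & MODEL_AS_EIO) {
		m->flags &= ~MODEL_AS_EIO;
		ret = -EIO;
	}
	return ret;
}
```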

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • FUSE_HAS_IOCTL_DIR should be assigned to ->flags; the existing assignment may be a typo.

    Signed-off-by: Wei Fang
    Signed-off-by: Miklos Szeredi
    Fixes: 69fe05c90ed5 ("fuse: add missing INIT flags")
    Cc:

    Wei Fang
     
  • fuse_flush() calls write_inode_now(), which triggers writeback, but the
    actual writeback happens later, during fuse_sync_writes(). If an error
    happens, fuse_writepage_end() sets an error bit in mapping->flags. So we
    have to check mapping->flags after fuse_sync_writes().
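    A minimal model of the ordering problem, assuming the error is recorded only when the queued request completes. `struct wb_model`, `start_writeback()`, `sync_writes()`, `check_errors()` and `flush_model()` are hypothetical and mirror write_inode_now(), fuse_sync_writes() and fuse_writepage_end() only loosely. Checking before the wait sees nothing; checking after it sees the error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one inode's writeback state. */
struct wb_model {
	int pending;	/* a writeback request has been queued */
	int will_fail;	/* ...and it will complete with an error */
	int eio_bit;	/* stands in for AS_EIO in mapping->flags */
};

/* Like write_inode_now() here: only queues the request. */
static void start_writeback(struct wb_model *w, int will_fail)
{
	w->pending = 1;
	w->will_fail = will_fail;
}

/* Like fuse_sync_writes(): wait for completion; the completion
 * handler is what records the error bit. */
static void sync_writes(struct wb_model *w)
{
	if (w->pending && w->will_fail)
		w->eio_bit = 1;
	w->pending = 0;
}

/* Report and clear the recorded error, if any. */
static int check_errors(struct wb_model *w)
{
	int err = w->eio_bit ? -EIO : 0;

	w->eio_bit = 0;
	return err;
}

/* Fixed ordering: look at the error bits only after waiting. */
static int flush_model(struct wb_model *w)
{
	sync_writes(w);
	return check_errors(w);
}
```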

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
    Cc: # v3.15+

    Maxim Patlasov
     
  • Due to the way fuse writeback is implemented, filemap_write_and_wait_range()
    does not catch errors. We have to check for them directly after
    fuse_sync_writes().

    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
    Cc: # v3.15+

    Alexey Kuznetsov
     
  • In kernel bug 150021, a kernel panic was reported when restoring a
    hibernate image. Only a picture of the oops was reported, so I can't
    paste the whole thing here. But here are the most interesting parts:

    kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
    BUG: unable to handle kernel paging request at ffff8804615cfd78
    ...
    RIP: ffff8804615cfd78
    RSP: ffff8804615f0000
    RBP: ffff8804615cfdc0
    ...
    Call Trace:
    do_signal+0x23
    exit_to_usermode_loop+0x64
    ...

    The RIP is on the same page as RBP, so it apparently started executing
    on the stack.

    The bug was bisected to commit ef0f3ed5a4ac (x86/asm/power: Create
    stack frames in hibernate_asm_64.S), which in retrospect seems quite
    dangerous, since that code saves and restores the stack pointer from a
    global variable ('saved_context').

    There are a lot of moving parts in the hibernate save and restore paths,
    so I don't know exactly what caused the panic. Presumably, a FRAME_END
    was executed without the corresponding FRAME_BEGIN, or vice versa. That
    would corrupt the return address on the stack and would be consistent
    with the details of the above panic.

    [ rjw: One major problem is that by the time the FRAME_BEGIN in
    restore_registers() is executed, the stack pointer value may not
    be valid any more. Namely, the stack area pointed to by it
    previously may have been overwritten by some image memory contents
    and that page frame may now be used for whatever different purpose
    it had been allocated for before hibernation. In that case, the
    FRAME_BEGIN will corrupt that memory. ]

    Instead of doing the frame pointer save/restore around the bounds of the
    affected functions, just do it around the call to swsusp_save().

    That has the same effect of ensuring that if swsusp_save() sleeps, the
    frame pointers will be correct. It's also a much more obviously safe
    way to do it than the original patch. And objtool still doesn't report
    any warnings.

    Fixes: ef0f3ed5a4ac (x86/asm/power: Create stack frames in hibernate_asm_64.S)
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021
    Cc: 4.6+ # 4.6+
    Reported-by: Andre Reinke
    Tested-by: Andre Reinke
    Signed-off-by: Josh Poimboeuf
    Acked-by: Ingo Molnar
    Signed-off-by: Rafael J. Wysocki

    Josh Poimboeuf
     
  • The empty checking logic is duplicated in ovl_check_empty_and_clear() and
    ovl_remove_and_whiteout(), except the condition for clearing whiteouts is
    different:

    ovl_check_empty_and_clear() checked for being upper

    ovl_remove_and_whiteout() checked for merge OR lower

    Move the intersection of those checks (upper AND merge) into
    ovl_check_empty_and_clear() and simplify ovl_remove_and_whiteout().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Signed-off-by: Al Viro
    Signed-off-by: Miklos Szeredi

    Al Viro
     
  • This makes delete notifications work with fanotify/inotify.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Using overlayfs as upperdir does not work and does not make sense. So
    instead of fixing it (probably not hard), just disallow it.

    Reported-by: Andrei Vagin
    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
     
  • There's a superfluous newline in the warning message in ovl_d_real().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Remove duplicated include.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Miklos Szeredi

    Wei Yongjun
     
  • Right now we remove the MAY_WRITE/MAY_APPEND bits from the mask if the
    real file is on lower/. This is done because files on lower will never be
    written and will be copied up instead. But to copy up a file, the mounter
    must have MAY_READ permission, otherwise the copy-up will fail. So set
    MAY_READ in the mask when MAY_WRITE is cleared.

    Dan Walsh noticed this when he called access(lowerfile, W_OK) and it
    succeeded (context mounts), but when he tried to actually write to the
    file, the write failed because the mounter did not have permission on
    the lower file.

    [SzM] don't set MAY_READ if only MAY_APPEND is set without MAY_WRITE; this
    won't trigger a copy-up.
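    The mask rule described above, including the MAY_APPEND-only case, can be sketched as follows. The MAY_* values are copied from include/linux/fs.h; `lower_check_mask()` is a hypothetical helper that models this description, not the actual ovl_permission() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Permission mask bits, same values as include/linux/fs.h. */
#define MAY_EXEC   0x00000001
#define MAY_WRITE  0x00000002
#define MAY_READ   0x00000004
#define MAY_APPEND 0x00000008

/* Hypothetical helper: mask used when checking the real inode. */
static int lower_check_mask(int mask, bool is_upper)
{
	if (!is_upper) {
		/* Only a real write triggers copy-up, so only then does the
		 * mounter need read access to the lower file. */
		if (mask & MAY_WRITE)
			mask |= MAY_READ;
		/* Lower files are never written in place. */
		mask &= ~(MAY_WRITE | MAY_APPEND);
	}
	return mask;
}
```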

    Reported-by: Dan Walsh
    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Right now, if a file is on lower/, we remove the MAY_WRITE/MAY_APPEND bits
    from the mask, as lower/ will never be written and the file will be copied
    up. But this is not true for special files: they are not copied up and are
    opened in place. So don't dilute the checks for these types of files.

    Reported-by: Dan Walsh
    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Setting POSIX ACL needs special handling:

    1) Some permission checks are done by ->setxattr() which now uses mounter's
    creds ("ovl: do operations on underlying file system in mounter's
    context"). These permission checks need to be done with current cred as
    well.

    2) Setting ACL can fail for various reasons. We do not need to copy up in
    these cases.

    In the meantime, switch to using generic_setxattr.

    [Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr()
    doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is
    disabled, and I could not come up with an obvious way to do it.

    This instead avoids the link error by defining two sets of ACL operations
    and letting the compiler drop one of the two at compile time depending
    on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code,
    also leading to smaller code.
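    A user-space sketch of the "two sets of operations" pattern, assuming a hypothetical `MODEL_POSIX_ACL` switch in place of CONFIG_FS_POSIX_ACL (kernel code would typically test the option with IS_ENABLED()). Because the selection is a compile-time constant, an optimizing compiler can drop the unused table along with whatever only it references; the type and table names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical config switch standing in for CONFIG_FS_POSIX_ACL. */
#ifndef MODEL_POSIX_ACL
#define MODEL_POSIX_ACL 0
#endif

struct inode_ops_model {
	const char *name;
	int (*get_acl)(void);	/* NULL when ACLs are not supported */
};

static int model_get_acl(void)
{
	return 0;
}

/* Two complete operation tables... */
static const struct inode_ops_model ops_with_acl = {
	.name = "with-acl", .get_acl = model_get_acl,
};
static const struct inode_ops_model ops_without_acl = {
	.name = "without-acl", .get_acl = NULL,
};

/* ...and a choice the compiler can resolve at compile time. */
static const struct inode_ops_model *select_ops(void)
{
	return MODEL_POSIX_ACL ? &ops_with_acl : &ops_without_acl;
}
```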

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Inode attributes are copied up to the overlay inode (uid, gid, mode, atime,
    mtime, ctime) so that generic code using these fields works correctly. If a
    hard link is created in overlayfs, separate inodes are allocated for each
    link. If chmod/chown/etc. is performed on one of the links, then the inodes
    belonging to the other links won't be updated.

    This patch attempts to fix this by sharing inodes for hard links.

    Use inode hash (with real inode pointer as a key) to make sure overlay
    inodes are shared for hard links on upper. Hard links on lower are still
    split (which is not user observable until the copy-up happens, see
    Documentation/filesystems/overlayfs.txt under "Non-standard behavior").

    The inode is only inserted in the hash if it is a non-directory and upper.
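    A simplified user-space model of the sharing rule: overlay inodes are kept in a small hash table keyed by the address of the real inode, and only non-directory upper inodes are hashed. All names here (`struct real_inode`, `ovl_get_inode_model()`, the bucket count) are hypothetical; the real code uses the kernel's inode hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct real_inode { int nlink; };

struct ovl_inode_model {
	const struct real_inode *real;	/* hash key */
	struct ovl_inode_model *next;	/* bucket chain */
};

#define NBUCKETS 64
static struct ovl_inode_model *buckets[NBUCKETS];

static size_t bucket_of(const struct real_inode *real)
{
	return ((uintptr_t)real >> 4) % NBUCKETS;
}

/* Hashed (shared) only for non-directory upper inodes; everything
 * else gets a fresh overlay inode, as before. */
static struct ovl_inode_model *ovl_get_inode_model(const struct real_inode *real,
						     bool is_upper, bool is_dir)
{
	bool hashed = is_upper && !is_dir;
	size_t b = bucket_of(real);
	struct ovl_inode_model *oi;

	if (hashed)
		for (oi = buckets[b]; oi; oi = oi->next)
			if (oi->real == real)
				return oi;	/* another link: share it */

	oi = calloc(1, sizeof(*oi));
	if (!oi)
		return NULL;
	oi->real = real;
	if (hashed) {
		oi->next = buckets[b];
		buckets[b] = oi;
	}
	return oi;
}
```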

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • To get from an overlay inode to the real inode we currently use 'struct
    ovl_entry', whose lifetime is tied to the overlay dentry. This is okay,
    since each overlay dentry had a new overlay inode allocated.

    The following patch will break that assumption, so ovl_entry needs to be
    left out. This patch stores the real inode directly in i_private, with the
    lowest bit used to indicate whether the inode is upper or lower.

    The lifetime rules remain: ovl_inode_real() may only be used while the
    caller holds a reference on the overlay dentry (and hence on the real
    dentry), or within RCU-protected regions.
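    The low-bit tagging described above can be sketched in plain C. It relies on inode structures being at least 2-byte aligned, so bit 0 of their address is free; `pack_real()`, `unpack_real()` and `ISUPPER_MASK` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of an aligned pointer carries the "is upper" flag. */
#define ISUPPER_MASK ((uintptr_t)1)

struct real_inode_model { long ino; };

static void *pack_real(struct real_inode_model *real, bool is_upper)
{
	assert(((uintptr_t)real & ISUPPER_MASK) == 0);	/* alignment check */
	return (void *)((uintptr_t)real | (is_upper ? ISUPPER_MASK : 0));
}

static struct real_inode_model *unpack_real(void *priv, bool *is_upper)
{
	uintptr_t x = (uintptr_t)priv;

	if (is_upper)
		*is_upper = x & ISUPPER_MASK;
	return (struct real_inode_model *)(x & ~ISUPPER_MASK);
}
```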

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The error is due to RCU and is temporary, so return -ECHILD, which makes
    the VFS retry the operation outside RCU-walk mode.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Fix atime update logic in overlayfs.

    This patch adds an i_op->update_time() handler to overlayfs inodes. This
    forwards atime updates to the upper layer only. No atime updates are done
    on lower layers.

    Remove implicit atime updates to underlying files and directories with
    O_NOATIME. Remove explicit atime update in ovl_readlink().

    Clear atime related mnt flags from cloned upper mount. This means atime
    updates are controlled purely by overlayfs mount options.

    Reported-by: Konstantin Khlebnikov
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • When creating a directory in workdir, the group/sgid inheritance from the
    parent directory was omitted completely. Fix this by calling
    inode_init_owner() on the overlay inode and using the resulting
    uid/gid/mode to create the file.

    Unfortunately, the sgid bit can be stripped off due to umask, so in this
    case the mode needs to be reset in workdir before moving the directory
    into place.

    Reported-by: Eryu Guan
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The fact that we always do permission checking on the overlay inode and
    clear MAY_WRITE for checking access to the lower inode allows cruft to be
    removed from ovl_permission().

    1) "default_permissions" option effectively did generic_permission() on the
    overlay inode with i_mode, i_uid and i_gid updated from underlying
    filesystem. This is what we do by default now. It did the update using
    vfs_getattr() but that's only needed if the underlying filesystem can
    change (which is not allowed). We may later introduce a "paranoia_mode"
    that verifies that mode/uid/gid are not changed.

    2) splitting out the IS_RDONLY() check from inode_permission() also becomes
    unnecessary once we remove the MAY_WRITE from the lower inode check.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi