05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Mar, 2016

1 commit


23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

03 Nov, 2015

1 commit


23 Oct, 2015

1 commit


22 Oct, 2015

3 commits

  • pstore doesn't support unregistering yet. It was marked as TODO.
    This patch adds some code to fix it:
    1) Add functions to unregister kmsg/console/ftrace/pmsg.
    2) Add a function to free compression buffer.
    3) Unmap the memory and free it.
    4) Add a function to unregister pstore filesystem.

    Signed-off-by: Geliang Tang
    Acked-by: Kees Cook
    [Removed __exit annotation from ramoops_remove(). Reported by Arnd Bergmann]
    Signed-off-by: Tony Luck

    Geliang Tang
     
  • Add a new wrapper function pstore_register_kmsg to keep the
    consistency with other similar pstore_register_* functions.

    Signed-off-by: Geliang Tang
    Signed-off-by: Tony Luck

    Geliang Tang
     
  • If vmalloc fails, make write_pmsg return -ENOMEM.

    Signed-off-by: Geliang Tang
    Acked-by: Kees Cook
    Signed-off-by: Tony Luck

    Geliang Tang
     

04 Jul, 2015

1 commit

  • Pull user namespace updates from Eric Biederman:
    "Long ago and far away when user namespaces where young it was realized
    that allowing fresh mounts of proc and sysfs with only user namespace
    permissions could violate the basic rule that only root gets to decide
    if proc or sysfs should be mounted at all.

    Some hacks were put in place to reduce the worst of the damage could
    be done, and the common sense rule was adopted that fresh mounts of
    proc and sysfs should allow no more than bind mounts of proc and
    sysfs. Unfortunately that rule has not been fully enforced.

    There are two kinds of gaps in that enforcement. Only filesystems
    mounted on empty directories of proc and sysfs should be ignored but
    the test for empty directories was insufficient. So in my tree
    directories on proc, sysctl and sysfs that will always be empty are
    created specially. Every other technique is imperfect as an ordinary
    directory can have entries added even after a readdir returns and
    shows that the directory is empty. Special creation of directories
    for mount points makes the code in the kernel a smidge clearer about
    it's purpose. I asked container developers from the various container
    projects to help test this and no holes were found in the set of mount
    points on proc and sysfs that are created specially.

    This set of changes also starts enforcing the mount flags of fresh
    mounts of proc and sysfs are consistent with the existing mount of
    proc and sysfs. I expected this to be the boring part of the work but
    unfortunately unprivileged userspace winds up mounting fresh copies of
    proc and sysfs with noexec and nosuid clear when root set those flags
    on the previous mount of proc and sysfs. So for now only the atime,
    read-only and nodev attributes which userspace happens to keep
    consistent are enforced. Dealing with the noexec and nosuid
    attributes remains for another time.

    This set of changes also addresses an issue with how open file
    descriptors from /proc//ns/* are displayed. Recently readlink of
    /proc//fd has been triggering a WARN_ON that has not been
    meaningful since it was added (as all of the code in the kernel was
    converted) and is not now actively wrong.

    There is also a short list of issues that have not been fixed yet that
    I will mention briefly.

    It is possible to rename a directory from below to above a bind mount.
    At which point any directory pointers below the renamed directory can
    be walked up to the root directory of the filesystem. With user
    namespaces enabled a bind mount of the bind mount can be created
    allowing the user to pick a directory whose children they can rename
    to outside of the bind mount. This is challenging to fix and doubly
    so because all obvious solutions must touch code that is in the
    performance part of pathname resolution.

    As mentioned above there is also a question of how to ensure that
    developers by accident or with purpose do not introduce exectuable
    files on sysfs and proc and in doing so introduce security regressions
    in the current userspace that will not be immediately obvious and as
    such are likely to require breaking userspace in painful ways once
    they are recognized"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    vfs: Remove incorrect debugging WARN in prepend_path
    mnt: Update fs_fully_visible to test for permanently empty directories
    sysfs: Create mountpoints with sysfs_create_mount_point
    sysfs: Add support for permanently empty directories to serve as mount points.
    kernfs: Add support for always empty directories.
    proc: Allow creating permanently empty directories that serve as mount points
    sysctl: Allow creating permanently empty directories that serve as mountpoints.
    fs: Add helper functions for permanently empty directories.
    vfs: Ignore unlocked mounts in fs_fully_visible
    mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
    mnt: Refactor the logic for mounting sysfs and proc in a user namespace

    Linus Torvalds
     

01 Jul, 2015

1 commit

  • This allows for better documentation in the code and
    it allows for a simpler and fully correct version of
    fs_fully_visible to be written.

    The mount points converted and their filesystems are:
    /sys/hypervisor/s390/ s390_hypfs
    /sys/kernel/config/ configfs
    /sys/kernel/debug/ debugfs
    /sys/firmware/efi/efivars/ efivarfs
    /sys/fs/fuse/connections/ fusectl
    /sys/fs/pstore/ pstore
    /sys/kernel/tracing/ tracefs
    /sys/fs/cgroup/ cgroup
    /sys/kernel/security/ securityfs
    /sys/fs/selinux/ selinuxfs
    /sys/fs/smackfs/ smackfs

    Cc: stable@vger.kernel.org
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

22 May, 2015

5 commits


27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

17 Apr, 2015

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Numerous minor fixes, cleanups etc.

    - More EEH work from Gavin to remove its dependency on device_nodes.

    - Memory hotplug implemented entirely in the kernel from Nathan
    Fontenot.

    - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

    - Rewrite of VPHN parsing logic & tests from Greg Kurz.

    - A fix from Nish Aravamudan to reduce memory usage by clamping
    nodes_possible_map.

    - Support for pstore on powernv from Hari Bathini.

    - Removal of old powerpc specific byte swap routines by David Gibson.

    - Fix from Vasant Hegde to prevent the flash driver telling you it was
    flashing your firmware when it wasn't.

    - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

    - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
    Stancek.

    - Some fixes for migration from Tyrel Datwyler.

    - A new syscall to switch the cpu endian by Michael Ellerman.

    - Large series from Wei Yang to implement SRIOV, reviewed and acked by
    Bjorn.

    - A fix for the OPAL sensor driver from Cédric Le Goater.

    - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

    - Large series from Daniel Axtens to make our PCI hooks per PHB rather
    than per machine.

    - Small patch from Sam Bobroff to explicitly abort non-suspended
    transactions on syscalls, plus a test to exercise it.

    - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

    - Small patch to enable the hard lockup detector from Anton Blanchard.

    - Fix from Dave Olson for missing L2 cache information on some CPUs.

    - Some fixes from Michael Ellerman to get Cell machines booting again.

    - Freescale updates from Scott: Highlights include BMan device tree
    nodes, an MSI erratum workaround, a couple minor performance
    improvements, config updates, and misc fixes/cleanup.

    * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
    powerpc/powermac: Fix build error seen with powermac smp builds
    powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
    powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
    powerpc/cell: Fix iommu breakage caused by controller_ops change
    powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
    powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
    powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
    powerpc/pseries: Correct memory hotplug locking
    powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
    powerpc: Add ppc64 hard lockup detector support
    oprofile: Disable oprofile NMI timer on ppc64
    powerpc/perf/hv-24x7: Add missing put_cpu_var()
    powerpc/perf/hv-24x7: Break up single_24x7_request
    powerpc/perf/hv-24x7: Define update_event_count()
    powerpc/perf/hv-24x7: Whitespace cleanup
    powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
    powerpc/perf/hv-24x7: Rename hv_24x7_event_update
    powerpc/perf/hv-24x7: Move debug prints to separate function
    powerpc/perf/hv-24x7: Drop event_24x7_request()
    powerpc/perf/hv-24x7: Use pr_devel() to log message
    ...

    Conflicts:
    tools/testing/selftests/powerpc/Makefile
    tools/testing/selftests/powerpc/tm/Makefile

    Linus Torvalds
     

16 Apr, 2015

1 commit


23 Mar, 2015

1 commit


17 Mar, 2015

1 commit

  • In the function ramoops_probe, the console_size, pmsg_size,
    ftrace_size may be update because the value is not the power
    of two. We should update the module parameter variables
    as well so they are visible through /sys/module/ramoops/parameters
    correctly.

    Signed-off-by: Wang Long
    Acked-by: Mark Salyzyn
    Acked-by: Kees Cook
    Signed-off-by: Tony Luck

    Wang Long
     

17 Jan, 2015

5 commits


15 Dec, 2014

1 commit

  • Pull driver core update from Greg KH:
    "Here's the set of driver core patches for 3.19-rc1.

    They are dominated by the removal of the .owner field in platform
    drivers. They touch a lot of files, but they are "simple" changes,
    just removing a line in a structure.

    Other than that, a few minor driver core and debugfs changes. There
    are some ath9k patches coming in through this tree that have been
    acked by the wireless maintainers as they relied on the debugfs
    changes.

    Everything has been in linux-next for a while"

    * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
    Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
    fs: debugfs: add forward declaration for struct device type
    firmware class: Deletion of an unnecessary check before the function call "vunmap"
    firmware loader: fix hung task warning dump
    devcoredump: provide a one-way disable function
    device: Add dev__once variants
    ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
    ath: use seq_file api for ath9k debugfs files
    debugfs: add helper function to create device related seq_file
    drivers/base: cacheinfo: remove noisy error boot message
    Revert "core: platform: add warning if driver has no owner"
    drivers: base: support cpu cache information interface to userspace via sysfs
    drivers: base: add cpu_device_create to support per-cpu devices
    topology: replace custom attribute macros with standard DEVICE_ATTR*
    cpumask: factor out show_cpumap into separate helper function
    driver core: Fix unbalanced device reference in drivers_probe
    driver core: fix race with userland in device_add()
    sysfs/kernfs: make read requests on pre-alloc files use the buffer.
    sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
    fs: sysfs: return EGBIG on write if offset is larger than file size
    ...

    Linus Torvalds
     

12 Dec, 2014

2 commits

  • On some ARMs the memory can be mapped pgprot_noncached() and still
    be working for atomic operations. As pointed out by Colin Cross
    , in some cases you do want to use
    pgprot_noncached() if the SoC supports it to see a debug printk
    just before a write hanging the system.

    On ARMs, the atomic operations on strongly ordered memory are
    implementation defined. So let's provide an optional kernel parameter
    for configuring pgprot_noncached(), and use pgprot_writecombine() by
    default.

    Cc: Arnd Bergmann
    Cc: Rob Herring
    Cc: Randy Dunlap
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Olof Johansson
    Cc: Russell King
    Cc: stable@vger.kernel.org
    Acked-by: Kees Cook
    Signed-off-by: Tony Lindgren
    Signed-off-by: Tony Luck

    Tony Lindgren
     
  • Currently trying to use pstore on at least ARMs can hang as we're
    mapping the peristent RAM with pgprot_noncached().

    On ARMs, pgprot_noncached() will actually make the memory strongly
    ordered, and as the atomic operations pstore uses are implementation
    defined for strongly ordered memory, they may not work. So basically
    atomic operations have undefined behavior on ARM for device or strongly
    ordered memory types.

    Let's fix the issue by using write-combine variants for mappings. This
    corresponds to normal, non-cacheable memory on ARM. For many other
    architectures, this change does not change the mapping type as by
    default we have:

    #define pgprot_writecombine pgprot_noncached

    The reason why pgprot_noncached() was originaly used for pstore
    is because Colin Cross had observed lost
    debug prints right before a device hanging write operation on some
    systems. For the platforms supporting pgprot_noncached(), we can
    add a an optional configuration option to support that. But let's
    get pstore working first before adding new features.

    Cc: Arnd Bergmann
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Olof Johansson
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Kees Cook
    Signed-off-by: Rob Herring
    [tony@atomide.com: updated description]
    Signed-off-by: Tony Lindgren
    Signed-off-by: Tony Luck

    Rob Herring
     

06 Nov, 2014

2 commits

  • When the kernel.dmesg_restrict restriction is in place, only users with
    CAP_SYSLOG should be able to access crash dumps (like: attacker is
    trying to exploit a bug, watchdog reboots, attacker can happily read
    crash dumps and logs).

    This puts the restriction on console-* types as well as sensitive
    information could have been leaked there.

    Other log types are unaffected.

    Signed-off-by: Sebastian Schmidt
    Acked-by: Kees Cook
    Signed-off-by: Tony Luck

    Sebastian Schmidt
     
  • pstore compression/decompression was added during 3.12.
    The ramoops driver prepends a "====timestamp.timestamp-C|D\n"
    header to the compressed record before handing it over to pstore
    driver which doesn't know about the header. In pstore_decompress(),
    the pstore driver reads the first "==" as a zlib header, so the
    decompression always fails. For example, this causes the driver
    to write /dev/pstore/dmesg-ramoops-0.enc.z instead of
    /dev/pstore/dmesg-ramoops-0.

    This patch makes the ramoops driver remove the header before
    pstore decompression.

    Signed-off-by: Ben Zhang
    Acked-by: Kees Cook
    Signed-off-by: Tony Luck

    Ben Zhang
     

20 Oct, 2014

1 commit


16 Oct, 2014

1 commit

  • The pstore filesystem still creates duplicate filename/inode pairs for
    some pstore types. Add the id to the filename to prevent that.

    Before patch:

    [/sys/fs/pstore] ls -li
    total 0
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
    1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi

    After:

    [/sys/fs/pstore] ls -li
    total 0
    1232 -r--r--r--. 1 root root 148 Sep 29 17:09 console-efi-141202499100000
    1231 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi-141202499200000
    1230 -r--r--r--. 1 root root 148 Sep 29 17:44 console-efi-141202705400000
    1229 -r--r--r--. 1 root root 67 Sep 29 17:44 console-efi-141202705500000
    1228 -r--r--r--. 1 root root 67 Sep 29 20:42 console-efi-141203772600000
    1227 -r--r--r--. 1 root root 148 Sep 29 23:42 console-efi-141204854900000
    1226 -r--r--r--. 1 root root 67 Sep 29 23:42 console-efi-141204855000000
    1225 -r--r--r--. 1 root root 148 Sep 29 23:59 console-efi-141204954200000
    1224 -r--r--r--. 1 root root 67 Sep 29 23:59 console-efi-141204954400000

    Signed-off-by: Valdis Kletnieks
    Acked-by: Kees Cook
    Cc: stable@vger.kernel.org # 3.6+
    Signed-off-by: Tony Luck

    Valdis Kletnieks
     

09 Aug, 2014

1 commit


07 Jun, 2014

1 commit

  • - Define pr_fmt in plateform.c and ram_core.c for global prefix.

    - Coalesce format fragments.

    - Separate format/arguments on lines > 80 characters.

    Note: Some pr_foo() were initially declared without prefix and therefore
    this could break existing log analyzer.

    [akpm@linux-foundation.org: missed a couple of prefix removals]
    Signed-off-by: Fabian Frederick
    Cc: Joe Perches
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

05 Apr, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     

18 Mar, 2014

5 commits