13 Feb, 2015

7 commits

  • The output of /proc/$pid/numa_maps is in terms of number of pages like
    anon=22 or dirty=54. Here's some output:

    7f4680000000 default file=/hugetlb/bigfile anon=50 dirty=50 N0=50
    7f7659600000 default file=/anon_hugepage\040(deleted) anon=50 dirty=50 N0=50
    7fff8d425000 default stack anon=50 dirty=50 N0=50

    Looks like we have a stack and a couple of anonymous hugetlbfs
    areas which all use the same amount of memory. They don't.

    The 'bigfile' uses 1GB pages and takes up ~50GB of space. The
    anon_hugepage uses 2MB pages and takes up ~100MB of space while the stack
    uses normal 4k pages. You can go over to smaps to figure out what the
    page size _really_ is with KernelPageSize or MMUPageSize. But, I think
    this is a pretty nasty and counterintuitive interface as it stands.

    This patch introduces a 'kernelpagesize_kB' line element to the
    /proc/<pid>/numa_maps report file to help identify the size of the
    pages backing the memory areas mapped by a given task. This is
    especially useful for differentiating between HUGE and GIGANTIC page
    backed VMAs.
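
    A minimal sketch of how a reader of numa_maps might turn the page
    counts into bytes once the new element is present. The helper names
    field_value() and anon_bytes() are hypothetical, and the sketch assumes
    the element appears on each line as "kernelpagesize_kB=<n>"; it is not
    taken from the patch itself.

```c
#include <stdlib.h>
#include <string.h>

/* Return the number following "key=" in a numa_maps line, or 0 if absent. */
static unsigned long long field_value(const char *line, const char *key)
{
    size_t klen = strlen(key);
    const char *p = line;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept only whole tokens: at line start or after a space. */
        if ((p == line || p[-1] == ' ') && p[klen] == '=')
            return strtoull(p + klen + 1, NULL, 10);
        p += klen;
    }
    return 0;
}

/* Anonymous memory of one VMA in bytes: page count times page size. */
static unsigned long long anon_bytes(const char *line)
{
    return field_value(line, "anon") *
           field_value(line, "kernelpagesize_kB") * 1024ULL;
}
```

    With the 'bigfile' line above extended by "kernelpagesize_kB=1048576",
    anon_bytes() yields 50 GiB, whereas the stack line with
    "kernelpagesize_kB=4" yields 200 KiB for the same anon=50.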

    This patch is based on Dave Hansen's proposal and reviewers' follow-ups
    taken from the following discussion threads:
    * https://lkml.org/lkml/2011/9/21/454
    * https://lkml.org/lkml/2014/12/20/66

    Signed-off-by: Rafael Aquini
    Cc: Johannes Weiner
    Cc: Dave Hansen
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • Add a small section to proc.txt doc in order to document its
    /proc/pid/numa_maps interface. It does not introduce any functional
    changes, just documentation.

    Signed-off-by: Rafael Aquini
    Cc: Johannes Weiner
    Cc: Dave Hansen
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • Peak resident size of a process can be reset back to the process's
    current rss value by writing "5" to /proc/pid/clear_refs. The driving
    use-case for this would be getting the peak RSS value, which can be
    retrieved from the VmHWM field in /proc/pid/status, per benchmark
    iteration or test scenario.
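
    A rough sketch of the per-iteration measurement described above,
    assuming the benchmark resets its own peak through /proc/self. The
    helper names reset_peak_rss() and parse_vmhwm_kb() are hypothetical and
    not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Extract the VmHWM value (in kB) from /proc/<pid>/status text; -1 if absent. */
static long parse_vmhwm_kb(const char *status)
{
    const char *p = strstr(status, "VmHWM:");
    long kb;

    if (p == NULL || sscanf(p, "VmHWM: %ld kB", &kb) != 1)
        return -1;
    return kb;
}

/* Reset the caller's peak RSS to its current RSS by writing "5". */
static int reset_peak_rss(void)
{
    FILE *f = fopen("/proc/self/clear_refs", "w");

    if (f == NULL)
        return -1;
    if (fputs("5", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

    A benchmark loop would call reset_peak_rss() before each iteration and
    pass the contents of /proc/self/status to parse_vmhwm_kb() afterwards.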

    [akpm@linux-foundation.org: clarify behaviour in documentation]
    Signed-off-by: Petr Cermak
    Cc: Bjorn Helgaas
    Cc: Primiano Tucci
    Cc: Petr Cermak
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Cermak
     
  • Pull nfsd updates from Bruce Fields:
    "The main change is the pNFS block server support from Christoph, which
    allows an NFS client connected to shared disk to do block IO to the
    shared disk in place of NFS reads and writes. This also requires xfs
    patches, which should arrive soon through the xfs tree, barring
    unexpected problems. Support for other filesystems is also possible
    if there's interest.

    Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
    shape"

    * 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: default NFSv4.2 to on
    nfsd: pNFS block layout driver
    exportfs: add methods for block layout exports
    nfsd: add trace events
    nfsd: update documentation for pNFS support
    nfsd: implement pNFS layout recalls
    nfsd: implement pNFS operations
    nfsd: make find_any_file available outside nfs4state.c
    nfsd: make find/get/put file available outside nfs4state.c
    nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
    nfsd: add fh_fsid_match helper
    nfsd: move nfsd_fh_match to nfsfh.h
    fs: add FL_LAYOUT lease type
    fs: track fl_owner for leases
    nfs: add LAYOUT_TYPE_MAX enum value
    nfsd: factor out a helper to decode nfstime4 values
    sunrpc/lockd: fix references to the BKL
    nfsd: fix year-2038 nfs4 state problem
    svcrdma: Handle additional inline content
    svcrdma: Move read list XDR round-up logic
    ...

    Linus Torvalds
     
  • Pull IOMMU updates from Joerg Roedel:
    "This time with:

    - Generic page-table framework for ARM IOMMUs using the LPAE
    page-table format, ARM-SMMU and Renesas IPMMU make use of it
    already.

    - Break out the IO virtual address allocator from the Intel IOMMU so
    that it can be used by other DMA-API implementations too. The
    first user will be the ARM64 common DMA-API implementation for
    IOMMUs

    - Device tree support for Renesas IPMMU

    - Various fixes and cleanups all over the place"

    * tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
    iommu/amd: Convert non-returned local variable to boolean when relevant
    iommu: Update my email address
    iommu/amd: Use wait_event in put_pasid_state_wait
    iommu/amd: Fix amd_iommu_free_device()
    iommu/arm-smmu: Avoid build warning
    iommu/fsl: Various cleanups
    iommu/fsl: Use %pa to print phys_addr_t
    iommu/omap: Print phys_addr_t using %pa
    iommu: Make more drivers depend on COMPILE_TEST
    iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered
    iommu: Disable on !MMU builds
    iommu/fsl: Remove unused fsl_of_pamu_ids[]
    iommu/fsl: Fix section mismatch
    iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator
    iommu: Fix trace_map() to report original iova and original size
    iommu/arm-smmu: add support for iova_to_phys through ATS1PR
    iopoll: Introduce memory-mapped IO polling macros
    iommu/arm-smmu: don't touch the secure STLBIALL register
    iommu/arm-smmu: make use of generic LPAE allocator
    iommu: io-pgtable-arm: add non-secure quirk
    ...

    Linus Torvalds
     
  • Pull DeviceTree changes from Rob Herring:

    - DT unittests for I2C probing and overlays from Pantelis Antoniou

    - Remove DT unittest dependency on OF_DYNAMIC from Gaurav Minocha

    - Add Tegra compatible strings missing for newer parts from Paul
    Walmsley

    - Various vendor prefix additions

    * tag 'devicetree-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
    of: Add vendor prefix for OmniVision Technologies
    of: Use ovti for Omnivision
    of: Add vendor prefix for Truly Semiconductors Limited
    of: Add vendor prefix for Himax Technologies Inc.
    of/fdt: fix sparse warning
    of: unitest: Add I2C overlay unit tests.
    Documentation: DT: document compatible string existence requirement
    Documentation: DT bindings: add nvidia, tegra132-denver compatible string
    Documentation: DT bindings: add more Tegra chip compatible strings
    of: EXPORT_SYMBOL_GPL of_property_read_u64_array
    of: Fix brace position for struct of_device_id definition
    of/unittest: Remove obsolete code
    dt-bindings: use isil prefix for Intersil in vendor-prefixes.txt
    Add AD Holdings Plc. to vendor-prefixes.
    dt-bindings: Add Silicon Mitus vendor prefix
    Removes OF_UNITTEST dependency on OF_DYNAMIC config symbol
    pinctrl: fix up device tree bindings
    DT: Vendors: Add Everspin
    doc: add bindings document for altera fpga manager
    drivers: of: Export of_reserved_mem_device_{init,release}

    Linus Torvalds
     
  • Pull ARM updates from Russell King:

    - clang assembly fixes from Ard

    - optimisations and cleanups for Aurora L2 cache support

    - efficient L2 cache support for secure monitor API on Exynos SoCs

    - debug menu cleanup from Daniel Thompson to allow better behaviour for
    multiplatform kernels

    - StrongARM SA11x0 conversion to irq domains, and pxa_timer

    - kprobes updates for older ARM CPUs

    - move probes support out of arch/arm/kernel to arch/arm/probes

    - add inline asm support for the rbit (reverse bits) instruction

    - provide an ARM mode secondary CPU entry point (for Qualcomm CPUs)

    - remove the unused ARMv3 user access code

    - add driver_override support to AMBA Primecell bus

    * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (55 commits)
    ARM: 8256/1: driver coamba: add device binding path 'driver_override'
    ARM: 8301/1: qcom: Use secondary_startup_arm()
    ARM: 8302/1: Add a secondary_startup that assumes ARM mode
    ARM: 8300/1: teach __asmeq that r11 == fp and r12 == ip
    ARM: kprobes: Fix compilation error caused by superfluous '*'
    ARM: 8297/1: cache-l2x0: optimize aurora range operations
    ARM: 8296/1: cache-l2x0: clean up aurora cache handling
    ARM: 8284/1: sa1100: clear RCSR_SMR on resume
    ARM: 8283/1: sa1100: collie: clear PWER register on machine init
    ARM: 8282/1: sa1100: use handle_domain_irq
    ARM: 8281/1: sa1100: move GPIO-related IRQ code to gpio driver
    ARM: 8280/1: sa1100: switch to irq_domain_add_simple()
    ARM: 8279/1: sa1100: merge both GPIO irqdomains
    ARM: 8278/1: sa1100: split irq handling for low GPIOs
    ARM: 8291/1: replace magic number with PAGE_SHIFT macro in fixup_pv code
    ARM: 8290/1: decompressor: fix a wrong comment
    ARM: 8286/1: mm: Fix dma_contiguous_reserve comment
    ARM: 8248/1: pm: remove outdated comment
    ARM: 8274/1: Fix DEBUG_LL for multi-platform kernels (without PL01X)
    ARM: 8273/1: Seperate DEBUG_UART_PHYS from DEBUG_LL on EP93XX
    ...

    Linus Torvalds
     

12 Feb, 2015

20 commits

  • Pull security layer updates from James Morris:
    "Highlights:

    - Smack adds secmark support for Netfilter
    - /proc/keys is now mandatory if CONFIG_KEYS=y
    - TPM gets its own device class
    - Added TPM 2.0 support
    - Smack file hook rework (all Smack users should review this!)"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
    cipso: don't use IPCB() to locate the CIPSO IP option
    SELinux: fix error code in policydb_init()
    selinux: add security in-core xattr support for pstore and debugfs
    selinux: quiet the filesystem labeling behavior message
    selinux: Remove unused function avc_sidcmp()
    ima: /proc/keys is now mandatory
    Smack: Repair netfilter dependency
    X.509: silence asn1 compiler debug output
    X.509: shut up about included cert for silent build
    KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
    MAINTAINERS: email update
    tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
    smack: fix possible use after frees in task_security() callers
    smack: Add missing logging in bidirectional UDS connect check
    Smack: secmark support for netfilter
    Smack: Rework file hooks
    tpm: fix format string error in tpm-chip.c
    char/tpm/tpm_crb: fix build error
    smack: Fix a bidirectional UDS connect check typo
    smack: introduce a special case for tmpfs in smack_d_instantiate()
    ...

    Linus Torvalds
     
  • Merge second set of updates from Andrew Morton:
    "More of MM"

    * emailed patches from Andrew Morton: (83 commits)
    mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
    mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
    vmstat: Reduce time interval to stat update on idle cpu
    mm/page_owner.c: remove unnecessary stack_trace field
    Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
    mm: incorporate read-only pages into transparent huge pages
    vmstat: do not use deferrable delayed work for vmstat_update
    mm: more aggressive page stealing for UNMOVABLE allocations
    mm: always steal split buddies in fallback allocations
    mm: when stealing freepages, also take pages created by splitting buddy page
    mincore: apply page table walker on do_mincore()
    mm: /proc/pid/clear_refs: avoid split_huge_page()
    mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
    mempolicy: apply page table walker on queue_pages_range()
    arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
    memcg: cleanup preparation for page table walk
    numa_maps: remove numa_maps->vma
    numa_maps: fix typo in gather_hugetbl_stats
    pagemap: use walk->vma instead of calling find_vma()
    clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
    ...

    Linus Torvalds
     
  • Pull powerpc updates from Michael Ellerman:

    - Update of all defconfigs

    - Addition of a bunch of config options to modernise our defconfigs

    - Some PS3 updates from Geoff

    - Optimised memcmp for 64 bit from Anton

    - Fix for kprobes that allows 'perf probe' to work from Naveen

    - Several cxl updates from Ian & Ryan

    - Expanded support for the '24x7' PMU from Cody & Sukadev

    - Freescale updates from Scott:
    "Highlights include 8xx optimizations, some more work on datapath
    device tree content, e300 machine check support, t1040 corenet
    error reporting, and various cleanups and fixes"

    * tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
    cxl: Add missing return statement after handling AFU errror
    cxl: Fail AFU initialisation if an invalid configuration record is found
    cxl: Export optional AFU configuration record in sysfs
    powerpc/mm: Warn on flushing tlb page in kernel context
    powerpc/powernv: Add OPAL soft-poweroff routine
    powerpc/perf/hv-24x7: Document sysfs event description entries
    powerpc/perf/hv-gpci: add the remaining gpci requests
    powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
    powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
    perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
    perf: add PMU_EVENT_ATTR_STRING() helper
    perf: provide sysfs_show for struct perf_pmu_events_attr
    powerpc/kernel: Avoid initializing device-tree pointer twice
    powerpc: Remove old compile time disabled syscall tracing code
    powerpc/kernel: Make syscall_exit a local label
    cxl: Fix device_node reference counting
    powerpc/mm: bail out early when flushing TLB page
    powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
    perf/powerpc: reset event hw state when adding it to the PMU
    powerpc/qe: Use strlcpy()
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Catalin Marinas:
    "arm64 updates for 3.20:

    - reimplementation of the virtual remapping of UEFI Runtime Services
    in a way that is stable across kexec
    - emulation of the "setend" instruction for 32-bit tasks (user
    endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
    accordingly)
    - compat_sys_call_table implemented in C (from asm) and made it a
    constant array together with sys_call_table
    - export CPU cache information via /sys (like other architectures)
    - DMA API implementation clean-up in preparation for IOMMU support
    - macros clean-up for KVM
    - dropped some unnecessary cache+tlb maintenance
    - CONFIG_ARM64_CPU_SUSPEND clean-up
    - defconfig update (CPU_IDLE)

    The EFI changes going via the arm64 tree have been acked by Matt
    Fleming. There is also a patch adding sys_*stat64 prototypes to
    include/linux/syscalls.h, acked by Andrew Morton"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits)
    arm64: compat: Remove incorrect comment in compat_siginfo
    arm64: Fix section mismatch on alloc_init_p[mu]d()
    arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros
    arm64: mm: use *_sect to check for section maps
    arm64: drop unnecessary cache+tlb maintenance
    arm64:mm: free the useless initial page table
    arm64: Enable CPU_IDLE in defconfig
    arm64: kernel: remove ARM64_CPU_SUSPEND config option
    arm64: make sys_call_table const
    arm64: Remove asm/syscalls.h
    arm64: Implement the compat_sys_call_table in C
    syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64
    compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes
    arm64: uapi: expose our struct ucontext to the uapi headers
    smp, ARM64: Kill SMP single function call interrupt
    arm64: Emulate SETEND for AArch32 tasks
    arm64: Consolidate hotplug notifier for instruction emulation
    arm64: Track system support for mixed endian EL0
    arm64: implement generic IOMMU configuration
    arm64: Combine coherent and non-coherent swiotlb dma_ops
    ...

    Linus Torvalds
     
  • Pull s390 updates from Martin Schwidefsky:

    - The remaining patches for the z13 machine support: kernel build
    option for z13, the cache synonym avoidance, SMT support,
    compare-and-delay for spinloops and the CEX5S crypto adapter.

    - The ftrace support for function tracing with the gcc hotpatch option.
    This touches common code Makefiles, Steven is ok with the changes.

    - The hypfs file system gets an extension to access diagnose 0x0c data
    in user space for performance analysis for Linux running under z/VM.

    - The iucv hvc console gets wildcard support for the user id filtering.

    - The cacheinfo code is converted to use the generic infrastructure.

    - Cleanup and bug fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits)
    s390/process: free vx save area when releasing tasks
    s390/hypfs: Eliminate hypfs interval
    s390/hypfs: Add diagnose 0c support
    s390/cacheinfo: don't use smp_processor_id() in preemptible context
    s390/zcrypt: fixed domain scanning problem (again)
    s390/smp: increase maximum value of NR_CPUS to 512
    s390/jump label: use different nop instruction
    s390/jump label: add sanity checks
    s390/mm: correct missing space when reporting user process faults
    s390/dasd: cleanup profiling
    s390/dasd: add locking for global_profile access
    s390/ftrace: hotpatch support for function tracing
    ftrace: let notrace function attribute disable hotpatching if necessary
    ftrace: allow architectures to specify ftrace compile options
    s390: reintroduce diag 44 calls for cpu_relax()
    s390/zcrypt: Add support for new crypto express (CEX5S) adapter.
    s390/zcrypt: Number of supported ap domains is not retrievable.
    s390/spinlock: add compare-and-delay to lock wait loops
    s390/tape: remove redundant if statement
    s390/hvc_iucv: add simple wildcard matches to the iucv allow filter
    ...

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Features:
    - Removing the forced serialisation of open()/close() calls in
    NFSv4.x (x>0) makes for a significant performance improvement in
    metadata intensive workloads.
    - Full support for the pNFS "flexible files" layout type
    - Further RPC/RDMA client improvements from Chuck

    Bugfixes:
    - Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
    - Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
    - Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called
    as part of the namespace cleanup,
    - Stable fix: Ensure we reference the inode for return-on-close in
    delegreturn
    - Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to
    the same source address/port combination during a disconnect/
    reconnect event. This is a requirement imposed by most NFSv3
    server duplicate reply cache implementations.

    Optimisations:
    - Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT

    Other:
    - Add Anna Schumaker as co-maintainer for the NFS client"

    * tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits)
    SUNRPC: Cleanup to remove xs_tcp_close()
    pnfs: delete an unintended goto
    pnfs/flexfiles: Do not dprintk after the free
    SUNRPC: Fix stupid typo in xs_sock_set_reuseport
    SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG
    SUNRPC: Handle connection reset more efficiently.
    SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag
    SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release
    SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection
    SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
    SUNRPC: Remove TCP socket linger code
    SUNRPC: Remove TCP client connection reset hack
    SUNRPC: TCP/UDP always close the old socket before reconnecting
    SUNRPC: Add helpers to prevent socket create from racing
    SUNRPC: Ensure xs_reset_transport() resets the close connection flags
    SUNRPC: Do not clear the source port in xs_reset_transport
    SUNRPC: Handle EADDRINUSE on connect
    SUNRPC: Set SO_REUSEPORT socket option for TCP connections
    NFSv4.1: Fix pnfs_put_lseg races
    NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
    ...

    Linus Torvalds
     
  • [akpm@linux-foundation.org: tweaks]
    Signed-off-by: Cyrill Gorcunov
    Cc: Kees Cook
    Cc: "Kirill A. Shutemov"
    Cc: Calvin Owens
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Dave noticed that an unprivileged process can allocate a significant
    amount of memory -- >500 MiB on x86_64 -- and stay unnoticed by the
    oom-killer and memory cgroup. The trick is to allocate a lot of PMD page
    tables. The Linux kernel doesn't account PMD tables to the process, only
    PTE tables.

    The test case below uses a few tricks to allocate a lot of PMD page tables
    while keeping VmRSS and VmPTE low. oom_score for the process will be 0.

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    #define PUD_SIZE (1UL << 30)
    #define PMD_SIZE (1UL << 21)

    #define NR_PUD 130000

    int main(void)
    {
        char *addr = NULL;
        unsigned long i;

        /* Keep THP off so touching one byte faults in a single 4k page. */
        prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
        for (i = 0; i < NR_PUD; i++) {
            /* Map a new 1 GiB region; touching it allocates a PMD table. */
            addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
                        MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
            if (addr == MAP_FAILED) {
                perror("mmap");
                break;
            }
            *addr = 'x';
            /* Drop the touched page and its PTE table; the PMD table stays. */
            munmap(addr, PMD_SIZE);
            if (mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
                     MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0) == MAP_FAILED)
                perror("re-mmap"), exit(1);
        }
        printf("PID %d consumed %lu KiB in PMD page tables\n",
               getpid(), i * 4096 >> 10);
        return pause();
    }
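
    As a rough cross-check of the ">500 MiB" figure, the arithmetic behind
    the printf() above can be written out on its own. It assumes one 4096
    byte PMD table per 1 GiB region, as the test program does; the helper
    name pmd_tables_kib() is hypothetical.

```c
#define PMD_TABLE_BYTES 4096UL  /* one PMD table per 4 KiB page, as in the printf */

/* KiB held in PMD tables when each of nr_pud 1 GiB regions owns one table. */
static unsigned long pmd_tables_kib(unsigned long nr_pud)
{
    return (nr_pud * PMD_TABLE_BYTES) >> 10;
}
```

    For NR_PUD = 130000 this gives 520000 KiB, or roughly 508 MiB, none of
    which was charged to the process before this patch.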

    The patch addresses the issue by accounting PMD tables to the process
    the same way we account PTE tables.

    The main places where PMD tables are accounted are __pmd_alloc() and
    free_pmd_range(). But there are a few corner cases:

    - HugeTLB can share PMD page tables. The patch handles this by
    accounting the table to all processes that share it.

    - x86 PAE pre-allocates a few PMD tables on fork.

    - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the
    sanity check on exit(2).

    Accounting only happens on configurations where the PMD page table level
    is present (PMD is not folded). As with nr_ptes, we use a per-mm counter.
    The counter value is used to calculate the baseline for the badness score
    by the oom-killer.

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Dave Hansen
    Cc: Hugh Dickins
    Reviewed-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: David Rientjes
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Introduce the basic control files to account, partition, and limit
    memory using cgroups in default hierarchy mode.

    This interface versioning allows us to address fundamental design
    issues in the existing memory cgroup interface, further explained
    below. The old interface will be maintained indefinitely, but a
    clearer model and improved workload performance should encourage
    existing users to switch over to the new one eventually.

    The control files are thus:

    - memory.current shows the current consumption of the cgroup and its
    descendants, in bytes.

    - memory.low configures the lower end of the cgroup's expected
    memory consumption range. The kernel considers memory below that
    boundary to be a reserve - the minimum that the workload needs in
    order to make forward progress - and generally avoids reclaiming
    it, unless there is an imminent risk of entering an OOM situation.

    - memory.high configures the upper end of the cgroup's expected
    memory consumption range. A cgroup whose consumption grows beyond
    this threshold is forced into direct reclaim, to work off the
    excess and to throttle new allocations heavily, but is generally
    allowed to continue and the OOM killer is not invoked.

    - memory.max configures the hard maximum amount of memory that the
    cgroup is allowed to consume before the OOM killer is invoked.

    - memory.events shows event counters that indicate how often the
    cgroup was reclaimed while below memory.low, how often it was
    forced to reclaim excess beyond memory.high, how often it hit
    memory.max, and how often it entered OOM due to memory.max. This
    allows users to identify configuration problems when observing a
    degradation in workload performance. An overcommitted system will
    have an increased rate of low boundary breaches, whereas increased
    rates of high limit breaches, maximum hits, or even OOM situations
    will indicate internally overcommitted cgroups.

    For existing users of memory cgroups, the following deviations from
    the current interface are worth pointing out and explaining:

    - The original lower boundary, the soft limit, is defined as a limit
    that is per default unset. As a result, the set of cgroups that
    global reclaim prefers is opt-in, rather than opt-out. The costs
    for optimizing these mostly negative lookups are so high that the
    implementation, despite its enormous size, does not even provide
    the basic desirable behavior. First off, the soft limit has no
    hierarchical meaning. All configured groups are organized in a
    global rbtree and treated like equal peers, regardless where they
    are located in the hierarchy. This makes subtree delegation
    impossible. Second, the soft limit reclaim pass is so aggressive
    that it not just introduces high allocation latencies into the
    system, but also impacts system performance due to overreclaim, to
    the point where the feature becomes self-defeating.

    The memory.low boundary on the other hand is a top-down allocated
    reserve. A cgroup enjoys reclaim protection when it and all its
    ancestors are below their low boundaries, which makes delegation
    of subtrees possible. Secondly, new cgroups have no reserve per
    default and in the common case most cgroups are eligible for the
    preferred reclaim pass. This allows the new low boundary to be
    efficiently implemented with just a minor addition to the generic
    reclaim code, without the need for out-of-band data structures and
    reclaim passes. Because the generic reclaim code considers all
    cgroups except for the ones running low in the preferred first
    reclaim pass, overreclaim of individual groups is eliminated as
    well, resulting in much better overall workload performance.

    - The original high boundary, the hard limit, is defined as a strict
    limit that can not budge, even if the OOM killer has to be called.
    But this generally goes against the goal of making the most out of
    the available memory. The memory consumption of workloads varies
    during runtime, and that requires users to overcommit. But doing
    that with a strict upper limit requires either a fairly accurate
    prediction of the working set size or adding slack to the limit.
    Since working set size estimation is hard and error prone, and
    getting it wrong results in OOM kills, most users tend to err on
    the side of a looser limit and end up wasting precious resources.

    The memory.high boundary on the other hand can be set much more
    conservatively. When hit, it throttles allocations by forcing
    them into direct reclaim to work off the excess, but it never
    invokes the OOM killer. As a result, a high boundary that is
    chosen too aggressively will not terminate the processes, but
    instead it will lead to gradual performance degradation. The user
    can monitor this and make corrections until the minimal memory
    footprint that still gives acceptable performance is found.

    In extreme cases, with many concurrent allocations and a complete
    breakdown of reclaim progress within the group, the high boundary
    can be exceeded. But even then it's mostly better to satisfy the
    allocation from the slack available in other groups or the rest of
    the system than killing the group. Otherwise, memory.max is there
    to limit this type of spillover and ultimately contain buggy or
    even malicious applications.

    - The original control file names are unwieldy and inconsistent in
    many different ways. For example, the upper boundary hit count is
    exported in the memory.failcnt file, but an OOM event count has to
    be manually counted by listening to memory.oom_control events, and
    lower boundary / soft limit events have to be counted by first
    setting a threshold for that value and then counting those events.
    Also, usage and limit files encode their units in the filename.
    That makes the filenames very long, even though this is not
    information that a user needs to be reminded of every time they
    type out those names.

    To address these naming issues, as well as to signal clearly that
    the new interface carries a new configuration model, the naming
    conventions in it necessarily differ from the old interface.

    - The original limit files indicate the state of an unset limit with
    a very high number, and a configured limit can be unset by echoing
    -1 into those files. But that very high number is implementation
    and architecture dependent and not very descriptive. And while -1
    can be understood as an underflow into the highest possible value,
    -2 or -10M etc. do not work, so it's not consistent.

    memory.low, memory.high, and memory.max will use the string
    "infinity" to indicate and set the highest possible value.
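
    A small sketch of how userspace might read one of these limit files
    under the convention described above. The function name
    parse_memcg_limit() and the use of ULLONG_MAX as the in-program stand-in
    for "no limit" are assumptions of this example; the "infinity" spelling
    is taken from this commit message and may differ in other kernel
    versions.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse the contents of memory.low/high/max: a byte count or "infinity". */
static int parse_memcg_limit(const char *text, unsigned long long *bytes)
{
    char *end;
    size_t len = strcspn(text, "\n");   /* ignore the trailing newline */

    if (len == strlen("infinity") && strncmp(text, "infinity", len) == 0) {
        *bytes = ULLONG_MAX;            /* our own stand-in for "no limit" */
        return 0;
    }
    if (len == 0 || text[0] < '0' || text[0] > '9')
        return -1;                      /* reject empty input, signs, junk */
    errno = 0;
    *bytes = strtoull(text, &end, 10);
    if (errno != 0 || (*end != '\n' && *end != '\0'))
        return -1;
    return 0;
}
```

    Unlike the old files, a caller never has to recognise an architecture
    dependent "very high number": the unset state is spelled out, and
    negative inputs such as -1 are simply rejected.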

    [akpm@linux-foundation.org: use seq_puts() for basic strings]
    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can
    detect zero_page in /proc/kpageflags, and then do memory analysis more
    accurately.
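
    A minimal sketch of how a userspace tool could test for the new flag.
    It assumes KPF_ZERO_PAGE is bit 24, as defined in
    linux/kernel-page-flags.h, and that /proc/kpageflags holds one 64-bit
    word per page frame number; the helper names are hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define KPF_ZERO_PAGE 24    /* bit number, assumed from linux/kernel-page-flags.h */

/* True if a 64-bit /proc/kpageflags entry has the zero-page bit set. */
static int is_zero_page(uint64_t flags)
{
    return (int)((flags >> KPF_ZERO_PAGE) & 1);
}

/* Read the flags word for one page frame number; returns 0 on success. */
static int read_kpageflags(uint64_t pfn, uint64_t *flags)
{
    int fd = open("/proc/kpageflags", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, flags, sizeof(*flags), (off_t)(pfn * sizeof(*flags)));
    close(fd);
    return n == (ssize_t)sizeof(*flags) ? 0 : -1;
}
```

    Reading /proc/kpageflags normally requires root, so read_kpageflags()
    is expected to fail for unprivileged callers.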

    Signed-off-by: Yalin Wang
    Acked-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang, Yalin
     
  • Pull documentation updates from Jonathan Corbet:
    "Highlights this time around include:

    - A thrashing of SubmittingPatches to bring it out of the "send
    everything to Linus" era of kernel development.

    - A new document on completions from Nicholas McGuire

    - Lots of typo fixes, formatting improvements, corrections, build
    fixes, and more"

    * tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (35 commits)
    Documentation: Fix the wrong command `echo -1 > set_ftrace_pid` for cleaning the filter.
    can-doc: Fixed a wrong filepath in can.txt
    Documentation: Fix trivial typo in comment.
    kgdb,docs: Fix typo and minor style issues
    Documentation: add description for FTRACE probe status
    doc: brief user documentation for completion
    Documentation/misc-devices/mei: Fix indentation of embedded code.
    Documentation/misc-devices/mei: Fix indentation of enumeration.
    Documentation/misc-devices/mei: Fix spacing around parentheses.
    Documentation/misc-devices/mei: Fix formatting of headings.
    Documentation: devicetree: Fix double words in Doumentation/devicetree
    Documentation: mm: Fix typo in vm.txt
    lockstat: Add documentation on contention and contenting points
    Documentation: fix blackfin gptimers-example build errors
    Fixes column alignment in table of contents entry 1.9 in Documentation/filesystems/proc.txt
    CodingStyle: enable emacs display of trailing whitespace
    DocBook: Do not exceed argument list limit
    gpio: board.txt: Fix the gpio name example
    Documentation/SubmittingPatches: unify whitespace/tabs for the DCO
    MAINTAINERS: Add the docs-next git tree to the maintainer entry
    ...

    Linus Torvalds
     
  • Pull mailbox framework updates from Jassi Brar.

    * 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
    mailbox: Add Altera mailbox driver
    mailbox: check for bit set before polling
    Mailbox: Fix return value check in pcc_init()

    Linus Torvalds
     
  • Pull pincontrol updates from Linus Walleij:
    "This is the bulk of pin control changes for the v3.20 cycle:

    Framework changes and enhancements:
    - Passing -DDEBUG recursively to subdir drivers so we get debug
    messages properly turned on.
    - Infer map type from DT property in the groups parsing code in the
    generic pinconfig code.
    - Support for custom parameter passing in generic pin config. This
    is used when you are using the generic pin config, but want to add
    a few custom properties that no other driver will use.

    New drivers:
    - Driver for the Xilinx Zynq
    - Driver for the AmLogic Meson SoCs

    New features in drivers:
    - Sleep support (suspend/resume) for the Cherryview driver
    - mvebu a38x can now mux a UART on pins MPP19 and MPP20
    - Migrated the qualcomm driver to generic pin config handling of
    extended config options in the core code.
    - Support BUS1 and AUDIO in the Exynos pin controller.
    - Add some missing functions in the sun6i driver.
    - Add support for the A31S variant in the sun6i driver.
    - EMEv2 support in the Renesas PFC driver.
    - Add support for Qualcomm MSM8916 in the qcom driver.

    Deleted features
    - Drop support for the SiRF Marco that was never released to the
    market.
    - Drop SH7372 support as the support for this platform is removed
    from the kernel"

    * tag 'pinctrl-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (40 commits)
    sh-pfc: emev2 - Fix mangled author name
    pinctrl: cherryview: Configure HiZ pins to be input when requested as GPIOs
    pinctrl: imx25: fix numbering for pins
    pinctrl: pinctrl-imx: don't use invalid value of conf_reg
    pinctrl: qcom: delete pin_config_get/set pinconf operations
    pinctrl: qcom: Add msm8916 pinctrl driver
    DT: pinctrl: Document Qualcomm MSM8916 pinctrl binding
    pinctrl: qcom: increase variable size for register offsets
    pinctrl: hide PCONFDUMP in #ifdef
    pinctrl: rockchip: Only mask interrupts; never disable
    pinctrl: zynq: Fix usb0 pins
    pinctrl: sh-pfc: sh7372: Remove DT binding documentation
    pinctrl: sh-pfc: sh7372: Remove PFC support
    sh-pfc: Add emev2 pinmux support
    sh-pfc: add macro to define pinmux without function
    pinctrl: add driver for Amlogic Meson SoCs
    staging: drivers: pinctrl: Fixed checkpatch.pl warnings
    pinctrl: exynos: Add AUDIO pin controller for exynos7
    sh-pfc: r8a7790: add MLB+ pin group
    sh-pfc: r8a7791: add MLB+ pin group
    ...

    Linus Torvalds
     
  • Pull GPIO changes from Linus Walleij:
    "This is the GPIO bulk changes for the v3.20 series:

    GPIOLIB core changes:
    - Create and use of_mm_gpiochip_remove() for removing memory-mapped
    OF GPIO chips
    - GPIO MMIO library supports bgpio_set_multiple for switching
    several lines at once, a feature merged in the last cycle.

    New drivers:
    - New driver for the APM X-gene standby GPIO controller
    - New driver for the Fujitsu MB86S7x GPIO controller

    Cleanups:
    - Moved rcar driver to use gpiolib irqchip
    - Moxart converted to the GPIO MMIO library
    - GE driver converted to GPIO MMIO library
    - Move sx150x to irqdomain
    - Move max732x to irqdomain
    - Move vx855 to use managed resources
    - Move dwapb to use managed resources
    - Clean tc3589x from platform data
    - Clean stmpe driver to use device tree only probe

    New subtypes:
    - sx1506 support in the sx150x driver
    - Quark 1000 SoC support in the SCH driver
    - Support X86 in the Xilinx driver
    - Support PXA1928 in the PXA driver

    Extended drivers:
    - max732x supports device tree probe
    - sx150x supports device tree probe

    Various minor cleanups and bug fixes"

    * tag 'gpio-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits)
    gpio: kconfig: replace PPC_OF with PPC
    gpio: pxa: add PXA1928 gpio type support
    dt/bindings: gpio: add compatible string for marvell,pxa1928-gpio
    gpio: pxa: remove mach IRQ includes
    gpio: max732x: use an inline function for container cast
    gpio: use sizeof() instead of hardcoded values
    gpio: max732x: add set_multiple function
    gpio: sch: Consolidate similar algorithms
    gpio: tz1090-pdc: Use resource_size to fix off-by-one resource size calculation
    gpio: ge: Convert to use devm_kstrdup
    gpio: correctly use const char * const
    gpio: sx150x: fixup OF support
    gpio: mpc8xxx: Use of_mm_gpiochip_remove
    gpio: Add Fujitsu MB86S7x GPIO driver
    gpio: mpc8xxx: Convert to platform device interface.
    gpio: zevio: Use of_mm_gpiochip_remove
    gpio: gpio-mm-lantiq: Use of_mm_gpiochip_remove
    gpio: gpio-mm-lantiq: Use of_property_read_u32
    gpio: gpio-mm-lantiq: Do not replicate code
    gpio :gpio-mm-lantiq: Use devm_kzalloc
    ...

    Linus Torvalds
     
  • Pull MMC updates from Ulf Hansson:
    "MMC core:
    - Support for MMC power sequences.
    - SDIO function devicetree subnode parsing.
    - Refactor the hardware reset routines and enable it for SD cards.
    - Various code quality improvements, especially for slot-gpio.

    MMC host:
    - dw_mmc: Various fixes and cleanups.
    - dw_mmc: Convert to mmc_send_tuning().
    - moxart: Fix probe logic.
    - sdhci: Various fixes and cleanups
    - sdhci: Asynchronous request handling support.
    - sdhci-pxav3: Various fixes and cleanups.
    - sdhci-tegra: Fixes for T114, T124 and T132.
    - rtsx: Various fixes and cleanups.
    - rtsx: Support for SDIO.
    - sdhi/tmio: Refactor and cleanup of header files.
    - omap_hsmmc: Use slot-gpio and common MMC DT parser.
    - Make all hosts deal with errors from mmc_of_parse().
    - sunxi: Various fixes and cleanups.
    - sdhci: Support for Fujitsu SDHCI controller f_sdh30"

    * tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc: (117 commits)
    mmc: sdhci-s3c: solve problem with sleeping in atomic context
    mmc: pwrseq: add driver for emmc hardware reset
    mmc: moxart: fix probe logic
    mmc: core: Invoke mmc_pwrseq_post_power_on() prior MMC_POWER_ON state
    mmc: pwrseq_simple: Add optional reference clock support
    mmc: pwrseq: Document optional clock for the simple power sequence
    mmc: pwrseq_simple: Extend to support more pins
    mmc: pwrseq: Document that simple sequence support more than one GPIO
    mmc: Add hardware dependencies for sdhci-pxav3 and sdhci-pxav2
    mmc: sdhci-pxav3: Modify clock settings for the SDR50 and DDR50 modes
    mmc: sdhci-pxav3: Extend binding with SDIO3 conf reg for the Armada 38x
    mmc: sdhci-pxav3: Fix Armada 38x controller's caps according to erratum ERR-7878951
    mmc: sdhci-pxav3: Fix SDR50 and DDR50 capabilities for the Armada 38x flavor
    mmc: sdhci: switch voltage before sdhci_set_ios in runtime resume
    mmc: tegra: Write xfer_mode, CMD regs in together
    mmc: Resolve BKOPS compatability issue
    mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
    mmc: dw_mmc: rockchip: remove incorrect __exit_p()
    mmc: dw_mmc: exynos: remove incorrect __exit_p()
    mmc: Fix menuconfig alignment of MMC_SDHCI_* options
    ...

    Linus Torvalds
     
  • Pull input updates from Dmitry Torokhov:
    "The first round of updates for the input subsystem.

    A few new drivers (power button handler for AXP20x PMIC, tps65218
    power button driver, sun4i keys driver, regulator haptic driver, NI
    Ettus Research USRP E3x0 button, Allwinner A10/A20 PS/2 controller).

    Updates to Synaptics and ALPS touchpad drivers (with more to come
    later), brand new Focaltech PS/2 support, update to Cypress driver to
    handle Gen5 (in addition to Gen3) devices, and a number of other fixups
    to various drivers as well as input core"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits)
    Input: elan_i2c - fix wrong %p extension
    Input: evdev - do not queue SYN_DROPPED if queue is empty
    Input: gscps2 - fix MODULE_DEVICE_TABLE invocation
    Input: synaptics - use dmax in input_mt_assign_slots
    Input: pxa27x_keypad - remove unnecessary ARM includes
    Input: ti_am335x_tsc - replace delta filtering with median filtering
    ARM: dts: AM335x: Make charge delay a DT parameter for TSC
    Input: ti_am335x_tsc - read charge delay from DT
    Input: ti_am335x_tsc - remove udelay in interrupt handler
    Input: ti_am335x_tsc - interchange touchscreen and ADC steps
    Input: MT - add support for balanced slot assignment
    Input: drv2667 - remove wrong and unneeded drv2667-haptics modalias
    Input: drv260x - remove wrong and unneeded drv260x-haptics modalias
    Input: cap11xx - remove wrong and unneeded cap11xx modalias
    Input: sun4i-ts - add support for touchpanel controller on A31
    Input: serio - add support for Alwinner A10/A20 PS/2 controller
    Input: gtco - use sign_extend32() for sign extension
    Input: elan_i2c - verify firmware signature applying it
    Input: elantech - remove stale comment from Kconfig
    Input: cyapa - off by one in cyapa_update_fw_store()
    ...

    Linus Torvalds
     
  • Pull fbdev changes from Tomi Valkeinen:

    - omapdss: add DRA7xxx SoC support

    - fbdev: support DMT (Display Monitor Timing) calculation

    * tag 'fbdev-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (40 commits)
    omapfb: Return error code when applying overlay settings fails
    OMAPDSS: DPI: DRA7xx support
    OMAPDSS: HDMI: Add DRA7xx support
    OMAPDSS: DISPC: program dispc polarities to control module
    OMAPDSS: DISPC: Add DRA7xx support
    OMAPDSS: Add Video PLLs for DRA7xx
    OMAPDSS: Add functions for external control of PLL
    OMAPDSS: DSS: Add DRA7xx base support
    Doc/DT: Add DT binding doc for DRA7xx DSS
    OMAPDSS: add define for DRA7xx HW version
    OMAPDSS: encoder-tpd12s015: Fix race issue with LS_OE
    OMAPDSS: OMAP5: fix digit output's allowed mgrs
    OMAPDSS: constify port arrays
    OMAPDSS: PLL: add dss_pll_wait_reset_done()
    OMAPDSS: Add enum dss_pll_id
    video: fbdev: fix sys_copyarea
    video/mmpfb: allow modular build
    fb: via: turn gpiolib and i2c selects into dependencies
    fbdev: ssd1307fb: return proper error code if write command fails
    fbdev: fix CVT vertical front and back porch values
    ...

    Linus Torvalds
     
  • Pull sound updates from Takashi Iwai:
    "In this batch, you can find lots of cleanups through the whole
    subsystem, as our good New Year's resolution. Lots of LOCs and
    commits are about LINE6 driver that was promoted finally from staging
    tree, and as usual, there've been widely spread ASoC changes.

    Here are some highlights:

    ALSA core changes
    - Embedding struct device into ALSA core structures
    - sequencer core cleanups / fixes
    - PCM msbits constraints cleanups / fixes
    - New SNDRV_PCM_TRIGGER_DRAIN command
    - PCM kerneldoc fixes, header cleanups
    - PCM code cleanups using more standard codes
    - Control notification ID fixes

    Driver cleanups
    - Cleanups of PCI PM callbacks
    - Timer helper usages cleanups
    - Simplification (e.g. argument reduction) of many driver codes

    HD-audio
    - Hotkey and LED support on HP laptops with Realtek codecs
    - Dock station support on HP laptops
    - Toshiba Satellite S50D fixup
    - Enhanced wallclock timestamp handling for HD-audio
    - Componentization to simplify the linkage between i915 and hd-audio
    drivers for Intel HDMI/DP

    USB-audio
    - Akai MPC Element support
    - Enhanced timestamp handling

    ASoC
    - Lots of refactoring in ASoC core, moving drivers to more data driven
    initialization and rationalizing a lot of DAPM usage
    - Much improved handling of CDCLK clocks on Samsung I2S controllers
    - Lots of driver specific cleanups and feature improvements
    - CODEC support for TI PCM514x and TLV320AIC3104 devices
    - Board support for Tegra systems with Realtek RT5677
    - New driver for Maxim max98357a
    - More enhancements / fixes for Intel SST driver

    Others
    - Promotion of LINE6 driver from staging along with lots of rewrites
    and cleanups
    - DT support for old non-ASoC atmel driver
    - oxygen cleanups, XIO2001 init, Studio Evolution SE6x support
    - Emu8000 DRAM size detection fix on ISA(!!) AWE64 boards
    - A few more ak411x fixes for ice1724 boards"

    * tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (542 commits)
    ALSA: line6: toneport: Use explicit type for firmware version
    ALSA: line6: Use explicit type for serial number
    ALSA: line6: Return EIO if read/write not successful
    ALSA: line6: Return error if device not responding
    ALSA: line6: Add delay before reading status
    ASoC: Intel: Clean data after SST fw fetch
    ALSA: hda - Add docking station support for another HP machine
    ALSA: control: fix failure to return new numerical ID in 'replace' event data
    ALSA: usb: update trigger timestamp on first non-zero URB submitted
    ALSA: hda: read trigger_timestamp immediately after starting DMA
    ALSA: pcm: allow for trigger_tstamp snapshot in .trigger
    ALSA: pcm: don't override timestamp unconditionally
    ALSA: off by one bug in snd_riptide_joystick_probe()
    ASoC: rt5670: Set use_single_rw flag for regmap
    ASoC: rt286: Add rt288 codec support
    ASoC: max98357a: Fix build in !CONFIG_OF case
    ASoC: Intel: fix platform_no_drv_owner.cocci warnings
    ARM: dts: Switch Odroid X2/U2 to simple-audio-card
    ARM: dts: Exynos4 and Odroid X2/U3 sound device nodes update
    ALSA: control: fix failure to return numerical ID in 'add' event
    ...

    Linus Torvalds
     
  • Pull media updates from Mauro Carvalho Chehab:

    - Some documentation updates and a few new pixel formats

    - Stop btcx-risc abuse by cx88 and move it to bt8xx driver

    - New platform driver: am437x

    - New webcam driver: toptek

    - New remote controller hardware protocols added to img-ir driver

    - Removal of a few very old drivers that rely on old kABIs and are
    for very hard to find hardware: parallel port webcam drivers
    (bw-qcam, c-cam, pms and w9966), tlg2300, Video In/Out for SGI (vino)

    - Removal of the USB Telegent driver (tlg2300). The company that
    developed this driver has long gone and the hardware is hard to find.
    As it relies on a legacy set of kABI symbols and nobody seems to care
    about it, remove it.

    - several improvements at rtl2832 driver

    - conversion on cx28521 and au0828 to use videobuf2 (VB2)

    - several improvements, fixups and board additions

    * tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (321 commits)
    [media] dvb_net: Convert local hex dump to print_hex_dump_debug
    [media] dvb_net: Use standard debugging facilities
    [media] dvb_net: Use vsprintf %pM extension to print Ethernet addresses
    [media] staging: lirc_serial: adjust boolean assignments
    [media] stb0899: use sign_extend32() for sign extension
    [media] si2168: add support for 1.7MHz bandwidth
    [media] si2168: return error if set_frontend is called with invalid parameters
    [media] lirc_dev: avoid potential null-dereference
    [media] mn88472: simplify bandwidth registers setting code
    [media] dvb: tc90522: re-add symbol-rate report
    [media] lmedm04: add read snr, signal strength and ber call backs
    [media] lmedm04: Create frontend call back for read status
    [media] lmedm04: create frontend callbacks for signal/snr/ber/ucblocks
    [media] lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb
    [media] lmedm04: Increase Interupt due time to 200 msec
    [media] cx88-dvb: whitespace cleanup
    [media] rtl28xxu: properly initialize pdata
    [media] rtl2832: declare functions as static
    [media] rtl2830: declare functions as static
    [media] rtl2832_sdr: add kernel-doc comments for platform_data
    ...

    Linus Torvalds
     
  • Pull power supply and reset changes from Sebastian Reichel:
    "New drivers:
    - charger driver for Maxim 77693
    - battery gauge driver for LTC 2941/2943
    - battery gauge driver for RT5033
    - reset driver for R-Mobile platforms

    Convert drivers to restart handler framework:
    - arm-versatile
    - at91
    - st-poweroff

    Misc:
    - remove deprecated sun6i reboot driver
    - use alarmtimer instead of rtc in charger-manager
    - misc fixes"

    * tag 'for-v3.20' of git://git.infradead.org/battery-2.6: (48 commits)
    power_supply: 88pm860x: Fix leaked power supply on probe fail
    power/reset: restart-poweroff: Remove arm dependencies
    power/reset: st-poweroff: Fix misleading Kconfig description
    power/reset: st-poweroff: Register with kernel restart handler
    power/reset: Remove sun6i reboot driver
    power/reset: at91: Register with kernel restart handler
    power/reset: arm-versatile: Register with kernel restart handler
    power: test_power: Use enum as index for array of supplies
    Add devicetree binding documentation for the LTC2941/LTC2943 driver
    Add LTC2941/LTC2943 Battery Gauge Driver
    power/reset: brcmstb: Add support for old 65nm chips
    power/reset: brcmstb: Use the DT "compatible" string to indicate bit positions
    power/reset: brcmstb: Make the driver buildable on MIPS
    power: charger-manager: Use alarmtimer for battery monitoring in suspend.
    power/reset: at91-poweroff: Fix error handling and other compiler warnings
    bq27x00_battery: Call power_supply_changed only when capacity changed
    bq27x00_battery: fix register offset for bq27425
    power: max14577: Remove SYSFS dependency from Kconfig
    power: bq24190_charger: suppress build warning
    power: reset: Add reset driver for R-Mobile platforms
    ...

    Linus Torvalds
     

11 Feb, 2015

13 commits

  • Pull networking updates from David Miller:

    1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
    wrong, and this pull actually adds an extra commit on top of the
    branch I'm pulling to fix that up, so that the pre-merge state is
    ok. - Linus ]

    2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation. From Alexander Duyck.

    3) Remove sock_iocb altogether, from Christoph Hellwig.

    4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

    5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

    6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

    7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
    Florian Westphal.

    8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

    9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

    10) Add support for unique flow IDs to openvswitch, from Joe Stringer.

    11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

    12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms. From Neal Cardwell.

    13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

    14) Support xmit_more in be2net, from Sathya Perla.

    15) Group Policy extensions for vxlan, from Thomas Graf.

    16) Remote Checksum Offload support for vxlan, from Tom Herbert.

    17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
    Vlad Yasevich.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
    crypto: fix af_alg_make_sg() conversion to iov_iter
    ipv4: Namespecify TCP PMTU mechanism
    i40e: Fix for stats init function call in Rx setup
    tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
    openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
    ipv6: Make __ipv6_select_ident static
    ipv6: Fix fragment id assignment on LE arches.
    bridge: Fix inability to add non-vlan fdb entry
    net: Mellanox: Delete unnecessary checks before the function call "vunmap"
    cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
    ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
    net: dsa: Remove redundant phy_attach()
    IB/mlx4: Reset flow support for IB kernel ULPs
    IB/mlx4: Always use the correct port for mirrored multicast attachments
    net/bonding: Fix potential bad memory access during bonding events
    tipc: remove tipc_snprintf
    tipc: nl compat add noop and remove legacy nl framework
    tipc: convert legacy nl stats show to nl compat
    tipc: convert legacy nl net id get to nl compat
    tipc: convert legacy nl net id set to nl compat
    ...

    Linus Torvalds
     
  • Pull trivial tree changes from Jiri Kosina:
    "Patches from trivial.git that keep the world turning around.

    Mostly documentation and comment fixes, and two corner-case code
    fixes from Alan Cox"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    kexec, Kconfig: spell "architecture" properly
    mm: fix cleancache debugfs directory path
    blackfin: mach-common: ints-priority: remove unused function
    doubletalk: probe failure causes OOPS
    ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller
    msdos_fs.h: fix 'fields' in comment
    scsi: aic7xxx: fix comment
    ARM: l2c: fix comment
    ibmraid: fix writeable attribute with no store method
    dynamic_debug: fix comment
    doc: usbmon: fix spelling s/unpriviledged/unprivileged/
    x86: init_mem_mapping(): use capital BIOS in comment

    Linus Torvalds
     
  • Pull live patching infrastructure from Jiri Kosina:
    "Let me provide a bit of history first, before describing what is in
    this pile.

    Originally, there was kSplice as a standalone project that implemented
    stop_machine()-based patching for the linux kernel. This project was
    later acquired, and the current owner is providing live patching as a
    proprietary service, without any intentions to have their
    implementation merged.

    Then, due to rising user/customer demand, both Red Hat and SUSE
    started working on their own implementation (not knowing about each
    other), and announced first versions roughly at the same time [1] [2].

    The principal difference between the two solutions is how they are
    making sure that the patching is performed in a consistent way when it
    comes to different execution threads with respect to the semantic
    nature of the change that is being introduced.

    In a nutshell, kPatch is issuing stop_machine(), then looking at
    stacks of all existing processes, and if it decides that the system
    is in a state that can be patched safely, it proceeds to insert code
    redirection machinery to the patched functions.

    On the other hand, kGraft provides a per-thread consistency during one
    single pass of a process through the kernel and performs a lazy
    continuous migration of threads from "unpatched" universe to the
    "patched" one at safe checkpoints.

    If interested in a more detailed discussion about the consistency
    models and their possible combinations, please see the thread that
    evolved around [3].

    It pretty quickly became obvious to the interested parties that it's
    absolutely impractical in this case to have several isolated solutions
    for one task to co-exist in the kernel. During a dedicated Live
    Kernel Patching track at LPC in Dusseldorf, all the interested parties
    sat together and came up with a joint approach that would work for both
    distro vendors. Steven Rostedt took notes [4] from this meeting.

    And the foundation for that approach is what's present in this pull
    request.

    It provides a basic infrastructure for function "live patching" (i.e.
    code redirection), including API for kernel modules containing the
    actual patches, and API/ABI for userspace to be able to operate on the
    patches (look up what patches are applied, enable/disable them, etc).
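
    For illustration only, a userspace tool could walk that interface
    roughly as sketched below. The sketch assumes the sysfs layout
    /sys/kernel/livepatch/<patch>/enabled (not spelled out in this
    message); the function names are hypothetical, and writing the
    attribute on a real system needs root.

```python
import os


def list_live_patches(root="/sys/kernel/livepatch"):
    """Map each patch directory under `root` to its enabled state."""
    patches = {}
    if not os.path.isdir(root):
        return patches  # livepatch support not present on this kernel
    for name in sorted(os.listdir(root)):
        enabled_file = os.path.join(root, name, "enabled")
        if not os.path.isfile(enabled_file):
            continue
        with open(enabled_file) as f:
            patches[name] = f.read().strip() == "1"
    return patches


def set_patch_enabled(name, enabled, root="/sys/kernel/livepatch"):
    """Write 1 or 0 to a patch's `enabled` attribute."""
    with open(os.path.join(root, name, "enabled"), "w") as f:
        f.write("1" if enabled else "0")
```

    The `root` parameter exists only so the helpers can be pointed at a
    scratch directory instead of the live sysfs tree.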

    It's relatively simple and minimalistic, as it's making use of
    existing kernel infrastructure (namely ftrace) as much as possible.
    It's also self-contained, in a sense that it doesn't hook itself in
    any other kernel subsystem (it doesn't even touch any other code).
    It's now implemented for x86 only as a reference architecture, but
    support for powerpc, s390 and arm is already in the works (adding
    arch-specific support basically boils down to teaching ftrace about
    regs-saving).

    Once this common infrastructure gets merged, both Red Hat and SUSE
    have agreed to immediately start porting their current solutions on
    top of this, abandoning their out-of-tree code. The plan basically is
    that each patch will be marked by flag(s) that would indicate which
    consistency model it is willing to use (again, the details have been
    sketched out already in the thread at [3]).

    Before this happens, the current codebase can be used to patch a large
    group of security/stability problems the patches for which are not too
    complex (in a sense that they don't introduce non-trivial change of
    function's return value semantics, they don't change layout of data
    structures, etc) -- this corresponds to LEAVE_FUNCTION &&
    SWITCH_FUNCTION semantics described at [3].

    This tree has been in linux-next since December.

    [1] https://lkml.org/lkml/2014/4/30/477
    [2] https://lkml.org/lkml/2014/7/14/857
    [3] https://lkml.org/lkml/2014/11/7/354
    [4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt

    [ The core code is introduced by the three commits authored by Seth
    Jennings, which got a lot of changes incorporated during numerous
    respins and reviews of the initial implementation. All the followup
    commits have materialized only after public tree has been created,
    so they were not folded into initial three commits so that the
    public tree doesn't get rebased ]"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: add missing newline to error message
    livepatch: rename config to CONFIG_LIVEPATCH
    livepatch: fix uninitialized return value
    livepatch: support for repatching a function
    livepatch: enforce patch stacking semantics
    livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING
    livepatch: fix deferred module patching order
    livepatch: handle ancient compilers with more grace
    livepatch: kconfig: use bool instead of boolean
    livepatch: samples: fix usage example comments
    livepatch: MAINTAINERS: add git tree location
    livepatch: use FTRACE_OPS_FL_IPMODIFY
    livepatch: move x86 specific ftrace handler code to arch/x86
    livepatch: samples: add sample live patching module
    livepatch: kernel: add support for live patching
    livepatch: kernel: add TAINT_LIVEPATCH

    Linus Torvalds
     
  • Merge misc updates from Andrew Morton:
    "Bite-sized chunks this time, to avoid the MTA ratelimiting woes.

    - fs/notify updates

    - ocfs2

    - some of MM"

    That laconic "some MM" is mainly the removal of remap_file_pages(),
    which is a big simplification of the VM, and which gets rid of a *lot*
    of random cruft and special cases because we no longer support the
    non-linear mappings that it used.

    From a user interface perspective, nothing has changed, because the
    remap_file_pages() syscall still exists, it's just done by emulating the
    old behavior by creating a lot of individual small mappings instead of
    one non-linear one.

    The emulation is slower than the old "native" non-linear mappings, but
    nobody really uses or cares about remap_file_pages(), and simplifying
    the VM is a big advantage.
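
    The "lots of individual small mappings" idea can be approximated from
    userspace as sketched below: each file page gets its own ordinary
    linear mapping with its own file offset. This is only an analogy, not
    the kernel's emulation code; Python cannot place mappings at fixed
    addresses, so the views are not contiguous in virtual memory, and
    map_pages() and demo() are hypothetical names.

```python
import mmap
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY  # mmap offsets must be multiples of this


def map_pages(fd, page_order):
    """Create one small shared mapping per requested file page, in order."""
    return [
        mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ,
                  offset=index * PAGE)
        for index in page_order
    ]


def demo():
    """Build a 3-page file and view its pages in the order 2, 0, 1."""
    with tempfile.TemporaryFile() as f:
        for fill in (b"A", b"B", b"C"):
            f.write(fill * PAGE)
        f.flush()
        views = map_pages(f.fileno(), [2, 0, 1])
        first_bytes = [view[:1] for view in views]
        for view in views:
            view.close()
    return first_bytes
```

    Each element returned by map_pages() is a separate mapping, which is
    why the emulated approach costs more than a single non-linear mapping
    did.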

    * emailed patches from Andrew Morton : (78 commits)
    memcg: zap memcg_slab_caches and memcg_slab_mutex
    memcg: zap memcg_name argument of memcg_create_kmem_cache
    memcg: zap __memcg_{charge,uncharge}_slab
    mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check
    mm: hugetlb: fix type of hugetlb_treat_as_movable variable
    mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
    mm: memory: merge shared-writable dirtying branches in do_wp_page()
    mm: memory: remove ->vm_file check on shared writable vmas
    xtensa: drop _PAGE_FILE and pte_file()-related helpers
    x86: drop _PAGE_FILE and pte_file()-related helpers
    unicore32: drop pte_file()-related helpers
    um: drop _PAGE_FILE and pte_file()-related helpers
    tile: drop pte_file()-related helpers
    sparc: drop pte_file()-related helpers
    sh: drop _PAGE_FILE and pte_file()-related helpers
    score: drop _PAGE_FILE and pte_file()-related helpers
    s390: drop pte_file()-related helpers
    parisc: drop _PAGE_FILE and pte_file()-related helpers
    openrisc: drop _PAGE_FILE and pte_file()-related helpers
    nios2: drop _PAGE_FILE and pte_file()-related helpers
    ...

    Linus Torvalds
     
  • Pull xfs update from Dave Chinner:
    "This update contains:

    - RENAME_EXCHANGE support

    - Rework of the superblock logging infrastructure

    - Rework of the XFS_IOCTL_SETXATTR implementation
    * enables use inside user namespaces
    * fixes inconsistencies setting extent size hints

    - fixes for missing buffer type annotations used in log recovery

    - more consolidation of libxfs headers

    - preparation patches for block based PNFS support

    - miscellaneous bug fixes and cleanups"

    * tag 'xfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (37 commits)
    xfs: only trace buffer items if they exist
    xfs: report proper f_files in statfs if we overshoot imaxpct
    xfs: fix panic_mask documentation
    xfs: xfs_ioctl_setattr_check_projid can be static
    xfs: growfs should use synchronous transactions
    xfs: fix behaviour of XFS_IOC_FSSETXATTR on directories
    xfs: factor projid hint checking out of xfs_ioctl_setattr
    xfs: factor extsize hint checking out of xfs_ioctl_setattr
    xfs: XFS_IOCTL_SETXATTR can run in user namespaces
    xfs: kill xfs_ioctl_setattr behaviour mask
    xfs: disaggregate xfs_ioctl_setattr
    xfs: factor out xfs_ioctl_setattr transaciton preamble
    xfs: separate xflags from xfs_ioctl_setattr
    xfs: FSX_NONBLOCK is not used
    xfs: don't allocate an ioend for direct I/O completions
    xfs: change kmem_free to use generic kvfree()
    xfs: factor out a xfs_update_prealloc_flags() helper
    xfs: remove incorrect error negation in attr_multi ioctl
    xfs: set superblock buffer type correctly
    xfs: set buf types when converting extent formats
    ...

    Linus Torvalds
     
  • Pull ACPI and power management updates from Rafael Wysocki:
    "We have a few new features this time, including a new SFI-based
    cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new
    devfreq class for providing its governors with raw utilization data
    and a new ACPI driver for AMD SoCs.

    Still, the majority of changes here are reworks of existing code to
    make it more straightforward or to prepare it for implementing new
    features on top of it. The primary example is the rework of ACPI
    resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with
    support for IOAPIC hotplug implemented on top of it, but there is
    quite a number of changes of this kind in the cpufreq core, ACPICA,
    ACPI EC driver, ACPI processor driver and the generic power domains
    core code too.

    The most active developer is Viresh Kumar with his cpufreq changes.

    Specifics:

    - Rework of the core ACPI resources parsing code to fix issues in it
    and make using resource offsets more convenient and consolidation
    of some resource-handling code in a couple of places that have grown
    analogous data structures and code to cover the same gap in the
    core (Jiang Liu, Thomas Gleixner, Lv Zheng).

    - ACPI-based IOAPIC hotplug support on top of the resources handling
    rework (Jiang Liu, Yinghai Lu).

    - ACPICA update to upstream release 20150204 including an interrupt
    handling rework that allows drivers to install raw handlers for
    ACPI GPEs which then become entirely responsible for the given GPE
    and the ACPICA core code won't touch it (Lv Zheng, David E Box,
    Octavian Purdila).

    - ACPI EC driver rework to fix several concurrency issues and other
    problems related to events handling on top of the ACPICA's new
    support for raw GPE handlers (Lv Zheng).

    - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power
    Subsystem) driver for Intel chips (Ken Xue).

    - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko
    Nikula).

    - Two new blacklist entries for machines (Samsung 730U3E/740U3E and
    510R) where the native backlight interface doesn't work correctly
    while the ACPI one does (Hans de Goede).

    - Rework of the ACPI processor driver's handling of idle states to
    make the code more straightforward and less bloated overall (Rafael
    J Wysocki).

    - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht,
    Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei
    Bai).

    - PCI core power management modification to avoid resuming (some)
    runtime-suspended devices during system suspend if they are in the
    right states already (Rafael J Wysocki).

    - New SFI-based cpufreq driver for Intel platforms using SFI
    (Srinidhi Kasagar).

    - cpufreq core fixes, cleanups and simplifications (Viresh Kumar,
    Doug Anderson, Wolfram Sang).

    - SkyLake CPU support and other updates for the intel_pstate driver
    (Kristen Carlson Accardi, Srinivas Pandruvada).

    - cpufreq-dt driver cleanup (Markus Elfring).

    - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla).

    - Generic power domains core code fixes and cleanups (Ulf Hansson).

    - Operating Performance Points (OPP) core code cleanups and kernel
    documentation update (Nishanth Menon).

    - New debugfs interface to make the list of PM QoS constraints
    available to user space (Nishanth Menon).

    - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso).

    - New devfreq class (devfreq_event) to provide raw utilization data
    to devfreq governors (Chanwoo Choi).

    - Assorted minor fixes and cleanups related to power management
    (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel
    Machek, Todd E Brandt, Wonhong Kwon).

    - turbostat updates (Len Brown) and cpupower Makefile improvement
    (Sriram Raghunathan)"

    * tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits)
    tools/power turbostat: relax dependency on APERF_MSR
    tools/power turbostat: relax dependency on invariant TSC
    Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources
    tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
    tools/power turbostat: relax dependency on root permission
    ACPI / video: Add disable_native_backlight quirk for Samsung 510R
    ACPI / PM: Remove unneeded nested #ifdef
    USB / PM: Remove unneeded #ifdef and associated dead code
    intel_pstate: provide option to only use intel_pstate with HWP
    ACPI / EC: Add GPE reference counting debugging messages
    ACPI / EC: Add query flushing support
    ACPI / EC: Refine command storm prevention support
    ACPI / EC: Add command flushing support.
    ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag
    ACPI: add AMD ACPI2Platform device support for x86 system
    ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()
    ACPI / EC: Update revision due to raw handler mode.
    ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp.
    ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode.
    ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model
    ...

    Linus Torvalds
     
  • Pull PCI changes from Bjorn Helgaas:
    "Enumeration
    - Move domain assignment from arm64 to generic code (Lorenzo Pieralisi)
    - ARM: Remove artificial dependency on pci_sys_data domain (Lorenzo Pieralisi)
    - ARM: Move to generic PCI domains (Lorenzo Pieralisi)
    - Generate uppercase hex for modalias var in uevent (Ricardo Ribalda Delgado)
    - Add and use generic config accessors on ARM, PowerPC (Rob Herring)

    Resource management
    - Free resources on failure in of_pci_get_host_bridge_resources() (Lorenzo Pieralisi)
    - Fix infinite loop with ROM image of size 0 (Michel Dänzer)

    PCI device hotplug
    - Handle surprise add even if surprise removal isn't supported (Bjorn Helgaas)

    Virtualization
    - Mark AMD/ATI VGA devices that don't reset on D3hot->D0 transition (Alex Williamson)
    - Add DMA alias quirk for Adaptec 3405 (Alex Williamson)
    - Add Wellsburg (X99) to Intel PCH root port ACS quirk (Alex Williamson)
    - Add ACS quirk for Emulex NICs (Vasundhara Volam)

    MSI
    - Fail MSI-X mappings if there's no space assigned to MSI-X BAR (Yijing Wang)

    Freescale Layerscape host bridge driver
    - Fix platform_no_drv_owner.cocci warnings (Julia Lawall)

    NVIDIA Tegra host bridge driver
    - Remove unnecessary tegra_pcie_fixup_bridge() (Lucas Stach)

    Renesas R-Car host bridge driver
    - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)

    TI Keystone host bridge driver
    - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)
    - Fix misspelling of current function in debug output (Julia Lawall)

    Xilinx AXI host bridge driver
    - Fix harmless format string warning (Arnd Bergmann)

    Miscellaneous
    - Use standard parsing functions for ASPM sysfs setters (Chris J Arges)
    - Add pci_device_to_OF_node() stub for !CONFIG_OF (Kevin Hao)
    - Delete unnecessary NULL pointer checks (Markus Elfring)
    - Add and use defines for PCIe Max_Read_Request_Size (Rafał Miłecki)
    - Include clk.h instead of clk-private.h (Stephen Boyd)"

    * tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
    PCI: Add pci_device_to_OF_node() stub for !CONFIG_OF
    PCI: xilinx: Convert to use generic config accessors
    PCI: xgene: Convert to use generic config accessors
    PCI: tegra: Convert to use generic config accessors
    PCI: rcar: Convert to use generic config accessors
    PCI: generic: Convert to use generic config accessors
    powerpc/powermac: Convert PCI to use generic config accessors
    powerpc/fsl_pci: Convert PCI to use generic config accessors
    ARM: ks8695: Convert PCI to use generic config accessors
    ARM: sa1100: Convert PCI to use generic config accessors
    ARM: integrator: Convert PCI to use generic config accessors
    PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver
    ARM: dts: versatile: add PCI controller binding
    of/pci: Free resources on failure in of_pci_get_host_bridge_resources()
    PCI: versatile: Add DT docs for ARM Versatile PB PCIe driver
    PCI: Fail MSI-X mappings if there's no space assigned to MSI-X BAR
    r8169: use PCI define for Max_Read_Request_Size
    [SCSI] esas2r: use PCI define for Max_Read_Request_Size
    tile: use PCI define for Max_Read_Request_Size
    rapidio/tsi721: use PCI define for Max_Read_Request_Size
    ...

    Linus Torvalds
     
  • We don't create non-linear mappings anymore. Let's drop code which
    handles them in rmap.

    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • remap_file_pages(2) was invented to be able to efficiently map parts of
    a huge file into a limited 32-bit virtual address space, such as in
    database workloads.

    Nonlinear mappings are a pain to support and it seems there are no
    legitimate use cases nowadays since 64-bit systems are widely available.

    Let's drop it and get rid of all this special-cased code.

    The patch replaces the syscall with an emulation which creates a new VMA
    on each remap_file_pages() call, unless it can be merged with an adjacent
    one.
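
    From user space, the effect of such an emulation can be approximated by
    mapping over the target page with mmap(2) and MAP_FIXED at a different
    file offset. The sketch below is only an illustration of that idea, not
    the kernel code from this patch; remap_page() and demo_swap_pages() are
    hypothetical helper names, and the temporary file under /tmp is an
    assumption made for the demo.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): make the page at `addr` show
 * file page `pgoff` of `fd` by mapping over it with MAP_FIXED. The old
 * mapping of that page is replaced by a new one.
 */
static int remap_page(void *addr, int fd, long pgoff, long page_size)
{
    void *p = mmap(addr, page_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, pgoff * page_size);

    return p == MAP_FAILED ? -1 : 0;
}

/*
 * Demo: build a two-page temporary file, map it linearly, then swap which
 * file page each virtual page shows. Returns 0 on success, -1 on failure.
 */
static int demo_swap_pages(void)
{
    char path[] = "/tmp/remap-demo-XXXXXX";
    long ps = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    int ret = -1;
    char *p;

    if (fd < 0)
        return -1;
    unlink(path);                       /* file goes away when fd is closed */
    if (ftruncate(fd, 2 * ps) < 0)
        goto out;

    p = mmap(NULL, 2 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    p[0] = 'A';                         /* first byte of file page 0 */
    p[ps] = 'B';                        /* first byte of file page 1 */

    /* Non-linear layout: virtual page 0 -> file page 1, and vice versa. */
    if (remap_page(p, fd, 1, ps) == 0 &&
        remap_page(p + ps, fd, 0, ps) == 0 &&
        p[0] == 'B' && p[ps] == 'A')
        ret = 0;

    munmap(p, 2 * ps);
out:
    close(fd);
    return ret;
}
```

    Calling demo_swap_pages() from a main() should return 0 when the swap
    worked. Each MAP_FIXED call leaves a separate mapping behind unless the
    kernel can merge it with a neighbour, which is the cost the emulation
    trades for simpler code.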

    I didn't find *any* real code that uses remap_file_pages(2) to test the
    emulation's impact on. I've checked Debian code search and the source of
    all packages in ALT Linux. No real users: only libc wrappers and mentions
    in strace, gdb, valgrind and the like.

    There are a few basic tests in LTP for the syscall. They work just fine
    with emulation.

    To test the performance impact, I've written a small test case which
    demonstrates pretty much the worst-case scenario: map a 4G shmfs file,
    write each page's pgoff to the beginning of that page, remap the pages in
    reverse order, and read every page.

    The test creates about 1 million VMAs if emulation is in use, so I had to
    set vm.max_map_count to 1100000 to avoid -ENOMEM.

    Before: 23.3 ( +- 4.31% ) seconds
    After: 43.9 ( +- 0.85% ) seconds
    Slowdown: 1.88x
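
    The VMA count and the slowdown quoted above can be checked with a few
    lines of arithmetic. This is a minimal sketch; vma_count() and slowdown()
    are hypothetical helper names, and it assumes that every remapped 4096-byte
    page ends up in its own VMA because the reverse ordering prevents merging.

```c
#include <assert.h>

/* One VMA per page in the worst case, so the count is size / page size. */
static unsigned long long vma_count(unsigned long long map_size,
                                    unsigned long long page_size)
{
    return map_size / page_size;
}

/* Ratio of the run time with emulation to the run time without it. */
static double slowdown(double before_sec, double after_sec)
{
    return after_sec / before_sec;
}
```

    With a 4G mapping, vma_count(4096ULL * 1024 * 1024, 4096) gives 1048576,
    which fits under the vm.max_map_count of 1100000 used for the test, and
    slowdown(23.3, 43.9) comes out at roughly 1.88.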

    I believe we can live with that.

    Test case:

    #define _GNU_SOURCE
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/mman.h>

    #define MB (1024UL * 1024)
    #define SIZE (4096 * MB)

    int main(int argc, char **argv)
    {
        unsigned long *p;
        long i, pass;

        for (pass = 0; pass < 10; pass++) {
            p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) {
                perror("mmap");
                return -1;
            }

            /* Write each page's pgoff to the beginning of that page. */
            for (i = 0; i < SIZE / 4096; i++)
                p[i * 4096 / sizeof(*p)] = i;

            /* Remap the pages in reverse order. */
            for (i = 0; i < SIZE / 4096; i++) {
                if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                     0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                    perror("remap_file_pages");
                    return -1;
                }
            }

            /* Read every page and check that it holds the reversed pgoff. */
            for (i = SIZE / 4096 - 1; i >= 0; i--)
                assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);

            munmap(p, SIZE);
        }

        return 0;
    }

    [akpm@linux-foundation.org: fix spello]
    [sasha.levin@oracle.com: initialize populate before usage]
    [sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
    Signed-off-by: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Jones
    Cc: Linus Torvalds
    Cc: Armin Rigo
    Signed-off-by: Sasha Levin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • __generic_block_fiemap may spin for a very long time on large sparse files.

    Without this patch an unprivileged user may abuse system resources simply
    by spawning a vast number of unkillable busyloops (works on ext2/ext3):

    truncate --size 1T test
    for ((i=0;i<[...]> /dev/null &
    done

    Signed-off-by: Dmitry Monakhov
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Monakhov
     
  • Add a mount option to support JBD2 feature:

    JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is enabled, the
    journal commit block can be written to disk without waiting for the
    descriptor blocks, which can improve journal commit performance. This
    option enables 'journal_checksum' internally.

    With the fs_mark benchmark, journal_async_commit shows an improvement of
    nearly 50%: files per second go up from 215.2 to 317.5.

    test script:
    fs_mark -d /mnt/ocfs2/ -s 10240 -n 1000

    default:
    FSUse%  Count   Size  Files/sec  App Overhead
         0   1000  10240      215.2         17878

    with journal_async_commit option:
    FSUse%  Count   Size  Files/sec  App Overhead
         0   1000  10240      317.5         17881
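
    As a quick check of the figures in the two runs above, the relative gain
    in files per second works out to roughly 47.5%. The helper name
    percent_improvement() below is hypothetical and used only for this
    illustration.

```c
#include <assert.h>

/* Relative gain of `after` over `before`, expressed in percent. */
static double percent_improvement(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

    Here percent_improvement(215.2, 317.5) evaluates to (317.5 - 215.2) /
    215.2 * 100, a little under 48.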

    Signed-off-by: Alex Chen
    Signed-off-by: Weiwei Wang
    Reviewed-by: Joseph Qi
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    alex chen
     
  • The inotify interface has changed a lot. The user interface was too
    old, and the kernel interface was removed by Eric Paris in commit:
    2dfc1ca inotify: remove inotify in kernel interface.

    Signed-off-by: Zhang Zhen
    Cc: Wang Kai
    Cc: Eric Paris
    Cc: Robert Love
    Cc: John McCutchan
    Cc: Heinrich Schuchardt
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Zhen
     
  • Prepare first round of input updates for 3.20.

    Dmitry Torokhov