15 Nov, 2007

40 commits

  • Fix http://bugzilla.kernel.org/show_bug.cgi?id=9247

    Allow SIGCONT to be sent to a process with greater capabilities if it is in
    the same session. Otherwise, a shell from which I've started a root shell
    and done 'suspend' can't be restarted by the parent shell.

    Also don't do file-capabilities signaling checks when uids for the
    processes don't match, since the standard check_kill_permission will have
    done those checks.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Serge E. Hallyn
    Acked-by: Andrew Morgan
    Cc: Chris Wright
    Tested-by: "Theodore Ts'o"
    Cc: Stephen Smalley
    Cc: "Rafael J. Wysocki"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
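
The rule described in this entry can be sketched as a small user-space model. This is only an illustration of the decision order in the text, not the kernel code: the `Task` class, the `cap_check_allows` name, and the use of a set for capabilities are all assumptions made for the example.

```python
import signal
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for the few task fields the check looks at."""
    uid: int
    session: int
    permitted: frozenset = frozenset()  # permitted capabilities (model only)


def cap_check_allows(sender: Task, target: Task, sig: int) -> bool:
    """Model of the capability-based part of the signal permission check."""
    # Different uids: the standard uid-based check has already decided.
    if sender.uid != target.uid:
        return True
    # SIGCONT inside one session is allowed even to a more capable target.
    if sig == signal.SIGCONT and sender.session == target.session:
        return True
    # Otherwise the target must not hold capabilities the sender lacks.
    return target.permitted <= sender.permitted
```

With this model, a parent shell can send SIGCONT to a more privileged shell in its own session, while other signals to that shell are still refused.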
     
  • The delay incurred in lock_page() should also be accounted in swap delay
    accounting

    Reported-by: Nick Piggin
    Signed-off-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Handle the case of CONFIG_PRINTK being disabled. This requires a do-nothing
    stub to be present in arch/um/include/user.h so that we don't get references
    to printk from libc code.

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • Make UML build in the absence of CONFIG_INET by making the inetaddr_notifier
    registration depend on it.

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • asm/page.h is disappearing from the libc headers and we don't need it anyway.

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • The spurious IRQ testing in request_irq is mishandled in um_request_irq, which
    sets the incoming file descriptors non-blocking only after request_irq
    succeeds. This results in the spurious irq calling read on a blocking
    descriptor, and a hang.

    Fixed by reversing the O_NONBLOCK setting and the request_irq call.

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
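
Why the ordering matters can be shown with an ordinary pipe in user space. The sketch below is not UML code; it only demonstrates that a read on an empty descriptor returns immediately once O_NONBLOCK is set, which is what a spurious interrupt handler needs. The helper name `read_if_ready` is made up for the example.

```python
import os


def read_if_ready(fd: int, size: int = 4096):
    """Read from fd; return None instead of blocking when nothing is pending."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


r, w = os.pipe()
os.set_blocking(r, False)   # first make the descriptor non-blocking ...
first = read_if_ready(r)    # ... so a "spurious" read comes straight back
os.write(w, b"x")
second = read_if_ready(r)   # real data is still delivered normally
os.close(r)
os.close(w)
```

Had the descriptor still been blocking at the time of the first read, that call would have waited for data that never arrives.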
     
  • With a 64KB blocksize, a directory entry can have a size of 64KB, which does
    not fit into the 16 bits we have for the entry length. So we store 0xffff
    instead and convert the value when it is read from / written to disk. The
    patch also converts some places to use ext3_next_entry() when we are
    changing them anyway.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
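
The conversion described in this entry can be sketched as follows. This is a user-space model of the encoding, assuming a little-endian 16-bit on-disk field; the function and constant names are chosen for the example and are not taken from the patch.

```python
import struct

MAX_REC_LEN = (1 << 16) - 1  # 0xffff: on-disk stand-in for a 64KB entry


def rec_len_to_disk(length: int) -> bytes:
    """Encode an in-memory entry length into the 16-bit on-disk field."""
    if length > (1 << 16):
        raise ValueError("entry length exceeds 64KB")
    if length == (1 << 16):
        length = MAX_REC_LEN          # 64KB does not fit, store the sentinel
    return struct.pack("<H", length)


def rec_len_from_disk(raw: bytes) -> int:
    """Decode the 16-bit on-disk field back into an in-memory length."""
    (length,) = struct.unpack("<H", raw)
    return (1 << 16) if length == MAX_REC_LEN else length
```

The scheme relies on 0xffff never being a real entry length, so the sentinel cannot be confused with an ordinary value.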
     
  • Fix some warnings with SMBFS_DEBUG_* builds. This patch makes it so that
    builds with -Werror don't fail.

    Signed-off-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • Lockdep reports a circular locking dependency in the hibernate code
    because:
    - during system boot, hibernate code (from an initcall) locks pm_mutex
      and then a sysfs buffer mutex via name_to_dev_t
    - during regular operation, hibernate code locks pm_mutex under a
      sysfs buffer mutex because it's called from sysfs methods.

    The deadlock can never happen because during initcall invocation nothing
    can write to sysfs yet. This removes the lockdep report by marking the
    initcall locking as being in a different class.

    Signed-off-by: Johannes Berg
    Cc: "Rafael J. Wysocki"
    Cc: Alan Stern
    Acked-by: Peter Zijlstra
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     
  • In __do_IRQ(), the normal case is that IRQ_DISABLED is checked and if set
    the handler (handle_IRQ_event()) is not called.

    Earlier in __do_IRQ(), if IRQ_PER_CPU is set the code does not check
    IRQ_DISABLED and calls the handler even though IRQ_DISABLED is set. This
    behavior seems unintentional.

    One user encountering this behavior is the CPE handler (in
    arch/ia64/kernel/mca.c). When the CPE handler encounters too many CPEs
    (such as a solid single bit error), it sets up a polling timer and disables
    the CPE interrupt (to avoid excessive overhead logging the stream of single
    bit errors). disable_irq_nosync() is called which sets IRQ_DISABLED. The
    IRQ_PER_CPU flag was previously set (in ia64_mca_late_init()). The net
    result is the CPE handler gets called even though it is marked disabled.

    If the behavior of not checking IRQ_DISABLED when IRQ_PER_CPU is set is
    intentional, it would be worthy of a comment describing the intended
    behavior. disable_irq_nosync() does call chip->disable() to provide a
    chipset-specific interface for disabling the interrupt, which avoids this
    issue when used.

    Signed-off-by: Russ Anderson
    Cc: "Luck, Tony"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russ Anderson
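
The difference between the two code paths can be modelled with plain flag tests. The sketch below only mirrors the behavior described in this entry; the bit values are illustrative and are not the kernel's, and the two function names are invented for the comparison.

```python
# Illustrative bit values only; the real kernel constants differ.
IRQ_DISABLED = 0x1
IRQ_PER_CPU = 0x2


def runs_handler_before(status: int) -> bool:
    """Described old behavior: the per-CPU path never looks at IRQ_DISABLED."""
    if status & IRQ_PER_CPU:
        return True
    return not (status & IRQ_DISABLED)


def runs_handler_after(status: int) -> bool:
    """Described new behavior: IRQ_DISABLED is honoured on both paths."""
    return not (status & IRQ_DISABLED)
```

For a per-CPU interrupt that has been disabled, such as the CPE interrupt in the example above, the first function still reports that the handler runs and the second does not.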
     
  • This is my trivial patch to swat innumerable little bugs with a single
    blow.

    After some intensive review (my apologies for not having gotten to this
    sooner) what we have looks like a good base to build on with the current
    pid namespace code but it is not complete, and it is still much too simple
    to find issues where the kernel does the wrong thing outside of the initial
    pid namespace.

    Until the dust settles and we are certain that the ABI and the
    implementation are as correct as humanly possible, let's keep process ID
    namespaces behind CONFIG_EXPERIMENTAL.

    This allows us the option of fixing any ABI or other bugs we find as long as
    they are minor.

    It also allows users of the kernel to avoid those bugs simply by ensuring
    their kernel does not have support for multiple pid namespaces.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Adrian Bunk
    Cc: Jeremy Fitzhardinge
    Cc: Kir Kolyshkin
    Cc: Kirill Korotaev
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Mark start_cpu_timer() as __cpuinit instead of __devinit.
    Fixes this section warning:

    WARNING: vmlinux.o(.text+0x60e53): Section mismatch: reference to .init.text:start_cpu_timer (between 'vmstat_cpuup_callback' and 'vmstat_show')

    Signed-off-by: Randy Dunlap
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Make 'default_mode' and 'default_var' be __initdata.
    Fixes these section warnings:

    WARNING: vmlinux.o(.data+0x128e0): Section mismatch: reference to .init.data:default_mode_CRT (between 'default_mode' and 'default_var')
    WARNING: vmlinux.o(.data+0x128e4): Section mismatch: reference to .init.data:default_var_CRT (between 'default_var' and 'dev_attr_size')

    Signed-off-by: Randy Dunlap
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • sys_open / sys_read were used in the early 1.2 days to load firmware from
    disk inside drivers. This has been deprecated behavior since 2.0 or so, but
    several drivers were still using it. For a few years now we have had a
    request_firmware() API that implements this in a nice, consistent way.
    Only some old ISA sound drivers (pre-ALSA) still straggled along for some
    time... however, with commit c2b1239a9f22f19c53543b460b24507d0e21ea0c the
    last user is now gone.

    This is a good thing, since using sys_open / sys_read etc. for firmware is
    buggy at best and dangerous at worst; these operations put an fd in the
    process file descriptor table, which can then be tampered with from
    other threads, for example. For those who don't want the firmware loader,
    filp_open()/vfs_read are the better APIs to use, without this security
    issue.

    The patch below marks sys_open and sys_read unused now that they're
    really not used anymore, and for deletion in the 2.6.25 timeframe.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Commit faf8c714f4508207a9c81cc94dafc76ed6680b44 caused a regression:
    parameter names longer than MAX_KBUILD_MODNAME will now be rejected,
    although we just need to keep the module name part that short. This patch
    restores the old behaviour while still avoiding that memchr is called with
    its length parameter larger than the total string length.

    Signed-off-by: Jan Kiszka
    Cc: Dave Young
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kiszka
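
The rule stated in this entry, that only the module-name part of a "module.param" string needs to be short, can be sketched like this. It is a user-space illustration, not the patched kernel function: `split_builtin_param` is a hypothetical name, and the limit of 56 is only a stand-in for MAX_KBUILD_MODNAME.

```python
MAX_MODNAME_LEN = 56  # stand-in for MAX_KBUILD_MODNAME; the value is an assumption


def split_builtin_param(name: str):
    """Return (module, param), or None if the name has to be skipped."""
    dot = name.find(".")          # searches the string itself, never past its end
    if dot < 0:
        return None               # no "module." prefix at all
    if dot > MAX_MODNAME_LEN:
        return None               # only the module part is length-limited
    return name[:dot], name[dot + 1:]
```

A long parameter name after the dot is accepted; only an overlong module prefix is refused.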
     
  • Currently we special case when we have only the initial pid namespace.
    Unfortunately in doing so the copied case for the other namespaces was
    broken so we don't properly flush the thread directories :(

    So this patch removes the unnecessary special case (removing a usage of
    proc_mnt) and corrects the flushing of the thread directories.

    Signed-off-by: Eric W. Biederman
    Cc: Al Viro
    Cc: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Roel Kluin
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roel Kluin
     
  • Signed-off-by: Roel Kluin
    Cc: Mikael Starvik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roel Kluin
     
  • We have seen ramdisk-based install systems where some pages of mapped
    libraries and programs were suddenly zeroed under memory pressure. This
    should not happen, as the ramdisk avoids freeing its pages by keeping them
    dirty all the time.

    It turns out that there is a case where the VM makes a ramdisk page clean
    without telling the ramdisk driver. Under memory pressure shrink_zone runs
    and it starts to run shrink_active_list. There is a check for
    buffer_heads_over_limit, and if true, pagevec_strip is called.
    pagevec_strip calls try_to_release_page. If the mapping has no releasepage
    callback, try_to_free_buffers is called. try_to_free_buffers now has
    special logic for some file systems to make a dirty page clean if all
    buffers are clean. That's what happened in our test case.

    The simplest solution is to provide a noop-releasepage callback for the
    ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.

    Signed-off-by: Christian Borntraeger
    Acked-by: Nick Piggin
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christian Borntraeger
     
  • The tle62x0 driver was ignoring all read errors. This patch makes it
    pass such errors up the stack, instead of returning bogus data.

    Signed-off-by: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
     
  • Fix obvious NULL dereferences spotted by the Coverity checker.

    Signed-off-by: Adrian Bunk
    Acked-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Commit ef8b4520bd9f8294ffce9abd6158085bde5dc902 added one NULL check for
    "p" in krealloc(), but that doesn't seem to be enough since there
    doesn't seem to be any guarantee that memcpy(ret, NULL, 0) works
    (spotted by the Coverity checker).

    To make it clearer what happens, this patch also removes the pointless
    min().

    Signed-off-by: Adrian Bunk
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Fix an obvious use-after-free spotted by the Coverity checker.

    Signed-off-by: Adrian Bunk
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • "Luming Yu" says:

    There is a "ttyS1 irq is -1" problem observed on tiger4 which causes the
    serial port to be broken.

    This is because there is __no__ ACPI IRQ resource assigned for the
    serial port. So the value of the IRQ for the port is never changed after it
    was initialized to -1.

    If PNP supplies a valid IRQ, use it. Otherwise, leave port.irq == 0, which
    means "no IRQ" to the serial core.

    Signed-off-by: Bjorn Helgaas
    Cc: Yu Luming
    Acked-by: Matthew Wilcox
    Cc: Alan Cox
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • The i5000_edac driver's PCI registration structure has the name
    ""i5000_edac"" (with extra set of double-quotes) which is probably not
    intentional. Get rid of __stringify.

    Signed-off-by: Darrick J. Wong
    Cc: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • Firmware like PNPBIOS or ACPI can report the address space consumed by the
    RTC. The actual space consumed may be less than the size (RTC_IO_EXTENT)
    assumed by the RTC driver.

    The PNP core doesn't request resources yet, but I'd like to make it do so.
    If/when it does, the RTC_IO_EXTENT request may fail, which prevents the RTC
    driver from loading.

    Since we only use the RTC index and data registers at RTC_PORT(0) and
    RTC_PORT(1), we can fall back to requesting just enough space for those.

    If the PNP core requests resources, this results in typical I/O port usage
    like this:

    0070-0073 : 00:06

    Cc: Alessandro Zummo
    Cc: David Brownell
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • The misc_register() error path always released an I/O port region,
    even if the region was memory-mapped (only mips uses memory-mapped RTC,
    as far as I can see).

    Signed-off-by: Bjorn Helgaas
    Cc: Alessandro Zummo
    Cc: David Brownell
    Acked-by: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 are buggy in the
    same way.

    Reiserfs could accumulate dirty sub-page-size files until umount time.
    They cannot be synced to disk by pdflush routines or explicit `sync'
    commands. Only `umount' can do the trick.

    The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
    Call trace:
    [] cancel_dirty_page+0xd0/0xf0
    [] :reiserfs:reiserfs_cut_from_item+0x660/0x710
    [] :reiserfs:reiserfs_do_truncate+0x271/0x530
    [] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
    [] :reiserfs:reiserfs_file_release+0x1e0/0x340
    [] __fput+0xcc/0x1b0
    [] fput+0x16/0x20
    [] filp_close+0x56/0x90
    [] sys_close+0xad/0x110
    [] system_call+0x7e/0x83

    Fix the bug by removing the cancel_dirty_page() call. Tests show that
    it causes no bad behaviors on various write sizes.

    === for the patient ===
    Here are more detailed demonstrations of the problem.

    1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
    and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.

    ------------------------------ screen 0 ------------------------------
    [T0] root /home/wfg# cat > /test/tiny
    [T1] hi
    [T2] root /home/wfg#

    ------------------------------ screen 1 ------------------------------
    [T1] root /home/wfg# echo /test/tiny > /proc/filecache
    [T1] root /home/wfg# cat /proc/filecache
    # file /test/tiny
    # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
    # idx len state refcnt
    0 1 ___UD__Bd_ 2
    [T2] root /home/wfg# cat /proc/filecache
    # file /test/tiny
    # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
    # idx len state refcnt
    0 1 ___U___Bd_ 2

    2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.

    ------------------------------ screen 0 ------------------------------
    [T0] root /home/wfg# echo hi > /tmp/hi
    [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
    [T2] hi
    [T3] root /home/wfg#

    ------------------------------ screen 1 ------------------------------
    [T1] root /proc/4397# cd /proc/`pidof cp`
    [T1] root /proc/4713# cat io
    rchar: 8396
    wchar: 3
    syscr: 20
    syscw: 1
    read_bytes: 0
    write_bytes: 20480
    cancelled_write_bytes: 4096
    [T2] root /proc/4713# cat io
    rchar: 8399
    wchar: 6
    syscr: 21
    syscw: 2
    read_bytes: 0
    write_bytes: 24576
    cancelled_write_bytes: 4096

    //Question: the 'write_bytes' is a bit more than expected ;-)

    Tested-by: Maxim Levitsky
    Cc: Peter Zijlstra
    Cc: Jeff Mahoney
    Signed-off-by: Fengguang Wu
    Reviewed-by: Chris Mason
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • Add support for version 2 of the ioatdma device. This device handles
    the descriptor chain and DCA services slightly differently:
    - Instead of moving the dma descriptors between a busy and an idle chain,
      this new version uses a single circular chain so that we don't have to
      rewrite the next_descriptor pointers as we add new requests, and the
      device doesn't need to re-read the last descriptor.
    - The new device has the DCA tags defined internally instead of needing
      them defined statically.

    Signed-off-by: Shannon Nelson
    Cc: "Williams, Dan J"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shannon Nelson
     
  • Add the field names to marker example format string.

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Describes the format string standard further: use of field names before the
    type specifiers.

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Upon module load, we must take the markers mutex. This implies that the
    markers mutex must be nested inside the module mutex.

    This changes the nesting order: the markers mutex now nests inside the
    module mutex. Make the necessary changes to reverse the order in which the
    mutexes are taken.

    Includes some cleanup from Dave Hansen.

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
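
The nesting order described in this entry, module mutex outside and markers mutex inside, can be illustrated with ordinary locks. This is a user-space sketch only; the lock objects, the `load_module` function and the `events` list are assumptions made for the example.

```python
import threading

module_mutex = threading.Lock()    # outer lock
markers_mutex = threading.Lock()   # inner lock, only taken under module_mutex

events = []


def load_module(name: str) -> None:
    """Take the locks in the one permitted order: module first, markers second."""
    with module_mutex:
        events.append("module_mutex")
        with markers_mutex:
            events.append("markers_mutex")
            events.append("update markers for " + name)
```

As long as every path takes the two locks in this order, no pair of threads can end up each holding one lock while waiting for the other.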
     
  • I found a few bugs in the BFS driver. Detailed descriptions of the bugs, as
    well as the steps to reproduce the errors, are given in the kernel bugzilla.
    Please follow these links for more information:

    http://bugzilla.kernel.org/show_bug.cgi?id=9363
    http://bugzilla.kernel.org/show_bug.cgi?id=9364
    http://bugzilla.kernel.org/show_bug.cgi?id=9365
    http://bugzilla.kernel.org/show_bug.cgi?id=9366

    This patch fixes the bugs described above. Besides, the patch introduces
    coding style changes to make the BFS driver conform to the requirements
    specified for Linux kernel code. Finally, I made a few cosmetic changes
    such as removal of trivial debug output.

    Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the
    in-core superblock structure. These fields are initialized but never
    actually used.

    If you are wondering why I need BFS, here is the answer: I am using this
    driver in the context of Linux kernel classes I am teaching at Moscow
    State University and at the International Institute of Information
    Technology in Pune, India.

    Signed-off-by: Dmitri Vorobiev
    Cc: Tigran Aivazian
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitri Vorobiev
     
  • Revert 62d0df64065e7c135d0002f069444fbdfc64768f.

    This was originally intended as a simple initial example of how to create a
    control groups subsystem; it wasn't intended for mainline, but I didn't make
    this clear enough to Andrew.

    The CFS cgroup subsystem now has better functionality for the per-cgroup usage
    accounting (based directly on CFS stats) than the "usage" status file in this
    patch, and the "load" status file is rather simplistic - although having a
    per-cgroup load average report would be a useful feature, I don't believe this
    patch actually provides it. If it gets into the final 2.6.24 we'd probably
    have to support this interface for ever.

    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • For administrative purposes, we want to query the actual block usage of a
    hugetlbfs file via fstat. Currently, hugetlbfs always returns 0. Fix that
    up, since the kernel already has all the information to track it properly.

    Signed-off-by: Ken Chen
    Acked-by: Adam Litke
    Cc: Badari Pulavarty
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Chen
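
From user space, the block usage mentioned in this entry is what fstat reports in st_blocks. The sketch below shows the query on an ordinary temporary file, since a hugetlbfs mount cannot be assumed to exist; the helper name `allocated_bytes` is made up for the example.

```python
import os
import tempfile


def allocated_bytes(fd: int) -> int:
    """Space allocated to the open file; st_blocks counts 512-byte units."""
    return os.fstat(fd).st_blocks * 512


with tempfile.TemporaryFile() as f:
    f.write(b"x" * 8192)
    f.flush()
    usage = allocated_bytes(f.fileno())
```

The same call applied to a file on a hugetlbfs mount is the query this entry refers to.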
     
  • return_unused_surplus_pages() can become static.

    Signed-off-by: Adrian Bunk
    Acked-by: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • When a MAP_SHARED mmap of a hugetlbfs file succeeds, huge pages are reserved
    to guarantee no problems will occur later when instantiating pages. If quotas
    are in force, page instantiation could fail due to a race with another process
    or an oversized (but approved) shared mapping.

    To prevent these scenarios, debit the quota for the full reservation amount up
    front and credit the unused quota when the reservation is released.

    Signed-off-by: Adam Litke
    Cc: Ken Chen
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
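
The accounting in this entry, debiting the whole reservation up front and crediting the unused part on release, can be sketched with a toy counter. This is not the hugetlbfs code: the `Quota` class and the two helper functions are hypothetical, and the counter merely plays the role of a per-mount free-block count that is updated in bulk by a delta.

```python
class Quota:
    """Toy per-mount counter of pages still available under the quota."""

    def __init__(self, free_blocks: int):
        self.free_blocks = free_blocks

    def get(self, delta: int) -> bool:
        """Debit delta pages; refuse without side effects if too few are free."""
        if delta > self.free_blocks:
            return False
        self.free_blocks -= delta
        return True

    def put(self, delta: int) -> None:
        """Credit delta pages back."""
        self.free_blocks += delta


def reserve_shared(quota: Quota, pages: int) -> bool:
    """At mmap time: charge the whole reservation up front."""
    return quota.get(pages)


def unreserve_shared(quota: Quota, reserved: int, used: int) -> None:
    """When the reservation is released: refund what was never instantiated."""
    quota.put(reserved - used)
```

Because the full amount is charged when the mapping is created, a later page instantiation inside the reservation can no longer fail on quota.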
     
  • Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
    allow bulk updating of the sbinfo->free_blocks counter. This will be used by
    the next patch in the series.

    Signed-off-by: Adam Litke
    Cc: Ken Chen
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • Now that quota is credited by free_huge_page(), calls to hugetlb_get_quota()
    seem out of place. The alloc/free API is unbalanced because we handle the
    hugetlb_put_quota() but expect the caller to open-code hugetlb_get_quota().
    Move the get inside alloc_huge_page to clean up this disparity.

    This patch has been kept apart from the previous patch because of the somewhat
    dodgy ERR_PTR() use herein. Moving the quota logic means that
    alloc_huge_page() has two failure modes. Quota failure must result in a
    SIGBUS while a standard allocation failure is OOM. Unfortunately, ERR_PTR()
    doesn't like the small positive errnos we have in VM_FAULT_* so they must be
    negated before they are used.

    Does anyone take issue with the way I am using PTR_ERR? If so, what are your
    thoughts on how to clean this up (without needing an if,else if,else block at
    each alloc_huge_page() callsite)?

    Signed-off-by: Adam Litke
    Cc: Ken Chen
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • The hugetlbfs quota management system was never taught to handle MAP_PRIVATE
    mappings when that support was added. Currently, quota is debited at page
    instantiation and credited at file truncation. This approach works correctly
    for shared pages but is incomplete for private pages. In addition to
    hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but
    this function does not respect quotas.

    Private huge pages are treated very much like normal, anonymous pages. They
    are not "backed" by the hugetlbfs file and are not stored in the mapping's
    radix tree. This means that private pages are invisible to
    truncate_hugepages() so that function will not credit the quota.

    This patch (based on a prototype provided by Ken Chen) moves quota crediting
    for all pages into free_huge_page(). page->private is used to store a pointer
    to the mapping to which this page belongs. This is used to credit quota on
    the appropriate hugetlbfs instance.

    Signed-off-by: Adam Litke
    Cc: Ken Chen
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
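
The idea in this entry, remembering the owning mapping in the page so that the free path can credit the right quota, can be modelled as below. The classes and `instantiate_page` are hypothetical stand-ins, and `free_huge_page` here is a user-space model that shares only its name with the kernel function.

```python
class Mapping:
    """Toy hugetlbfs instance with a quota counter."""

    def __init__(self, free_blocks: int):
        self.free_blocks = free_blocks


class HugePage:
    def __init__(self):
        self.private = None   # models page->private


def instantiate_page(mapping: Mapping):
    """Hypothetical fault-path helper: debit quota, then tag the page."""
    if mapping.free_blocks == 0:
        return None           # over quota
    mapping.free_blocks -= 1
    page = HugePage()
    page.private = mapping    # remember whom to credit later
    return page


def free_huge_page(page: HugePage) -> None:
    """Credit the owning mapping, whether the page was shared or private."""
    mapping = page.private
    page.private = None
    if mapping is not None:
        mapping.free_blocks += 1
```

Since the credit happens when the page is freed rather than at truncation, private pages that never appear in the mapping's radix tree are accounted for as well.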