26 Jul, 2008

40 commits

  • fix:

    arch/x86/ia32/built-in.o: In function `ia32_sys_call_table':
    (.rodata+0xa38): undefined reference to `compat_sys_signalfd4'

    on !CONFIG_SIGNALFD.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • On !CONFIG_SYSCTL on x86 with latest -git I get:

    mm/hugetlb.c: In function 'decrement_hugepage_resv_vma':
    mm/hugetlb.c:83: error: 'reserve' undeclared (first use in this function)
    mm/hugetlb.c:83: error: (Each undeclared identifier is reported only once
    mm/hugetlb.c:83: error: for each function it appears in.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
    powerpc: Wireup new syscalls
    Move update_mmu_cache() declaration from tlbflush.h to pgtable.h
    powerpc/pseries: Remove kmalloc call in handling writes to lparcfg
    powerpc/pseries: Update arch vector to indicate support for CMO
    ibmvfc: Add support for collaborative memory overcommit
    ibmvscsi: driver enablement for CMO
    ibmveth: enable driver for CMO
    ibmveth: Automatically enable larger rx buffer pools for larger mtu
    powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O
    powerpc/pseries: vio bus support for CMO
    powerpc/pseries: iommu enablement for CMO
    powerpc/pseries: Add CMO paging statistics
    powerpc/pseries: Add collaborative memory manager
    powerpc/pseries: Utilities to set firmware page state
    powerpc/pseries: Enable CMO feature during platform setup
    powerpc/pseries: Split retrieval of processor entitlement data into a helper routine
    powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg
    powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines
    powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg
    powerpc: Fix compile error with binutils 2.15
    ...

    Fixed up conflict in arch/powerpc/platforms/52xx/Kconfig manually.

    Linus Torvalds
     
  • * 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6: (22 commits)
    UBI: always start the background thread
    UBI: fix gcc warning
    UBI: remove pre-sqnum images support
    UBI: fix kernel-doc errors and warnings
    UBI: fix checkpatch.pl errors and warnings
    UBI: bugfix - do not torture PEB needlessly
    UBI: rework scrubbing messages
    UBI: implement multiple volumes rename
    UBI: fix and re-work debugging stuff
    UBI: amend commentaries
    UBI: fix error message
    UBI: improve mkvol request validation
    UBI: add ubi_sync() interface
    UBI: fix 64-bit calculations
    UBI: fix LEB locking
    UBI: fix memory leak on error path
    UBI: do not forget to free internal volumes
    UBI: fix memory leak
    UBI: avoid unnecessary division operations
    UBI: fix buffer padding
    ...

    Linus Torvalds
     
  • The new type checking of the flags arguments to irqsave and friends
    (commit 3f307891ce0e7b0438c432af1aacd656a092ff45) pointed out this thing
    with a big nice warning.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Convert PCI err device from platform to open firmware of_dev to comply
    with powerpc schemes.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Jiang
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Fixup of missing bit 0 on 64360 PCIx_ERR_MASK and errata FEr-#11 and
    FEr-#16 for the 64460. Bit 0 must remain 0.

    Signed-off-by: Dave Jiang
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Update get_property() call to use of_get_property() in order to fix the
    compile.

    Signed-off-by: Dave Jiang
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • This module harvests more than just memory errors; it also harvests
    various bus and DMA errors that the chipset detects. Previously, it would
    report all such errors, which made the output too loud.

    This patch therefore adds a parameter that turns off non-memory error
    reports by default. The reporting can be re-enabled via the parameter.

    Also did a code style cleanup: the less-than-80-characters-per-line rule.

    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • The channel DIMM label does not seem to be used much in the edac code.
    However, where it is used (in the core code), it is assumed to not have a
    newline embedded. This leaves the sysfs file newline free which looks
    funny when cat'ing it. Here we just add the trailing newline to the sysfs
    chX_dimm_label output...

    [Doug Thompson note: the DIMM label is one of the primary uses of EDAC.
    User space daemon scripts, edac-utils@sourceforge, populate the DIMM label
    fields, via /sys/devices/system/edac attributes, with the silk screen
    labels of the motherboard in use. dmidecode accesses BIOS tables, but BIOS
    tables are well known to be incorrect and useless in these respects.
    edac-utils will strip off any newlines before its use of the output, when
    displaying DIMM slot silk screen labels.]

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
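The change described in this entry can be pictured with a small user-space sketch. It is not the EDAC sysfs code: the helper name `show_dimm_label` and the caller-supplied buffer are assumptions made for illustration only.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not the kernel's show routine): format a stored
 * DIMM label for a sysfs-style read, appending the trailing newline that
 * the stored label itself does not contain.
 * Returns the number of characters placed in buf, excluding the NUL. */
static int show_dimm_label(char *buf, size_t size, const char *label)
{
    int n = snprintf(buf, size, "%s\n", label);

    if (n < 0)
        return n;
    /* snprintf reports the untruncated length; clamp to what actually fit. */
    if ((size_t)n >= size)
        n = size ? (int)(size - 1) : 0;
    return n;
}
```

The stored label stays newline-free, so code that embeds it in other messages is unaffected; only the text handed to the reader gains the newline.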
     
  • Static kobjects and ksets are not supported in the Linux kernel. Convert
    the mc_kset from static to dynamic. This patch depends on my previous
    patch to remove the module parameter attributes from mc...

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • /sys/devices/system/edac/mc has a few files which are duplicated in
    /sys/module/edac_core/parameters. Now that all the functionality is
    duplicated between these two locations, we remove the former kobject
    attributes and update the documentation.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • When updating the edac_mc_poll_msec module parameter from the sysfs
    /sys/module/edac_core/parameters/edac_mc_poll_msec file, we don't update
    the workq timers. As a result, if we move from a big poll time to a small
    one, the small one won't take effect until the big one has timed out.

    Here we provide a new module parameter set method to call out to the
    update routine. This brings the /sys/module/edac_core/parameters
    functionality up to that provided by the /sys/devices/system/edac/mc sysfs
    module parameter files so that we can remove them or at least link to the
    /sys/module files...

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
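A rough user-space sketch of the problem and the fix described in this entry follows. The `struct poller` type, the `poller_set_period` name, and the rejection of a zero period are all assumptions for illustration; the real code uses kernel module parameter hooks and delayed work, which are not reproduced here.

```c
/* Hypothetical stand-in for the polling state; not the EDAC core's types. */
struct poller {
    unsigned long poll_msec;    /* current polling period */
    unsigned long next_run_ms;  /* absolute time of the next scheduled poll */
};

/* Setter sketch: store the new period AND re-arm the timer relative to
 * "now", so that a shorter period takes effect immediately instead of
 * only after the old, longer period has expired.
 * Returns 0 on success, -1 if the value is rejected (assumed rule: a
 * period of 0 is invalid). */
static int poller_set_period(struct poller *p, unsigned long now_ms,
                             unsigned long new_msec)
{
    if (new_msec == 0)
        return -1;
    p->poll_msec = new_msec;
    p->next_run_ms = now_ms + new_msec;
    return 0;
}
```

Writing only `poll_msec` and leaving `next_run_ms` alone would reproduce the behaviour the commit message complains about.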
     
  • Static kobjects are not supported in the Linux kernel. Convert the
    edac_pci_top_main_kobj from static to dynamic. This avoids the double
    free of the edac_pci_top_main_kobj.name that we see on module reload of
    the e752x edac driver (and probably others as well).

    In addition, Greg KH has pointed out that this code may be
    cleaned up significantly. I will look at that as a follow-on patch; for
    now, I just want the minimum fix to get this double-free oops bug
    squashed...

    Many thanks to Greg KH for his patience in showing me what the
    Documentation/kobject.txt already said (oops)...

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • Some code cleanliness issues found by Andrew Morton (thanks!) which should
    not affect functionality, but which should help make the code more
    maintainable.

    In particular, we now:

    * convert all #define's w/ a parameter to static inlines
    * use 1UL rather than 1ULL when calculating an unsigned long
    * use pci_disable_device

    The resulting code is tested and seems to work fine...

    Signed-off-by: Arthur Jones
    Cc: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
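The first two cleanup items in this entry can be illustrated in plain C. The names below (`SQUARE_MACRO`, `square_inline`, `bit_mask`, `next_value`) are made up for the example and do not come from the driver.

```c
/* Function-like macro: the argument is substituted textually, so an
 * argument with side effects is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Static inline equivalent: the argument is evaluated exactly once and
 * its type is checked against the parameter type. */
static inline unsigned long square_inline(unsigned long x)
{
    return x * x;
}

/* 1UL has type unsigned long, which matches the result type. With 1ULL
 * the shift would be done in unsigned long long and then converted. */
static inline unsigned long bit_mask(unsigned int nr)
{
    return 1UL << nr;
}

/* Helper with a visible side effect, used to count evaluations. */
static unsigned int calls;

static unsigned long next_value(void)
{
    calls++;
    return 3;
}
```

Passing `next_value()` to `SQUARE_MACRO` calls the helper twice, while `square_inline` calls it once; that difference is the usual argument for preferring static inlines over parameterised macros.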
     
  • Explicitly unmask ECC errors we are interested in reporting.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • It is possible that the BIOS did not enable ECC at boot time. We check
    for that case and fail to load if it is true.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • The error mask we use to trigger ECC notifications is missing many bits of
    interest. We add these bits here so that all possible ECC errors can be
    reported.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • Preliminary support for the Intel 5100 MCH. CE and UE errors are reported
    along with the current DIMM label information and other memory parameters.

    Reasons why this is preliminary:

    1) This chip has 2 independent memory controllers which, for best
    performance, use interleaved accesses to the DDR2 memory. This
    architecture does not map very well to the current edac data structures
    which depend on symmetric channel access to the interleaved data.
    Without core changes, the best I could do for now is to map both memory
    controllers to different csrows (first all ranks of controller 0, then
    all ranks of controller 1). Someone much more familiar with the edac
    core than I will probably need to come up with a more general data
    structure to handle the interleaving and de-interleaving of the two
    memory controllers.

    2) I have not yet tackled the de-interleaving of the rank/controller
    address space into the physical address space of the CPU. There is
    nothing fundamentally missing, it is just ending up to be a lot of
    code, and I'd rather keep it separate for now, esp since it doesn't
    work yet...

    3) The code depends on a particular i5100 chip select to DIMM mainboard
    chip select mapping. This mapping seems obvious to me in order to
    support dual and single ranked memory, but it is not unique and DIMM
    labels could be wrong on other mainboards. There is no way to query
    this mapping that I know of.

    4) The code requires that the i5100 is in 32GB mode. Only 4 ranks per
    controller, 2 ranks per DIMM are supported. I do not have hardware
    (nor do I expect to have hardware anytime soon) for the 48GB (6 ranks
    per controller) mode.

    5) The serial presence detect code should be broken out into a "real"
    i2c driver so that decode-dimms.pl can work.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
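The csrow layout described in point 1 (all ranks of controller 0 first, then all ranks of controller 1) can be written down as a small index calculation. This is only a sketch: the constants follow the figures in the commit text (2 controllers, 4 ranks per controller in 32GB mode), and every identifier is hypothetical rather than taken from the driver.

```c
/* Figures from the commit text; names are hypothetical. */
#define SKETCH_CONTROLLERS          2
#define SKETCH_RANKS_PER_CONTROLLER 4

/* Map (controller, rank) to a csrow index: all ranks of controller 0
 * come first, followed by all ranks of controller 1.
 * Returns -1 if either argument is out of range. */
static int sketch_csrow(int controller, int rank)
{
    if (controller < 0 || controller >= SKETCH_CONTROLLERS)
        return -1;
    if (rank < 0 || rank >= SKETCH_RANKS_PER_CONTROLLER)
        return -1;
    return controller * SKETCH_RANKS_PER_CONTROLLER + rank;
}
```

Under these assumptions, controller 0 occupies csrows 0 to 3 and controller 1 occupies csrows 4 to 7.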
     
  • If a fuse filesystem doesn't define its own lock operations, then allow the
    lock manager to work with fuse.

    Adding lockd support for remote locking is also possible, but more rarely
    used, so leave it till later.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Implement the get_parent export operation by sending a LOOKUP request with
    ".." as the name.

    Implement looking up an inode by node ID after it has been evicted from
    the cache. This is done by sending a LOOKUP request with "." as the name
    (for all file types, not just directories).

    The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to
    indicate that it supports these special lookups.

    Thanks to John Muir for the original implementation of this feature.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add a new helper function which sends a LOOKUP request with the supplied
    name. This will be used by the next patch to send special LOOKUP requests
    with "." and ".." as the name.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Implement export_operations, to allow fuse filesystems to be exported to
    NFS. This feature has been in the out-of-tree fuse module, and is widely
    used and tested.

    It was not originally merged into mainline because doing the NFS export
    in userspace was thought to be a cleaner and more efficient way of doing
    it than through the kernel.

    While that is true, it would also have involved a lot of duplicated effort
    at reimplementing NFS exporting (all the different versions of the
    protocol). This effort was unfortunately not undertaken by anyone, so we
    are left with doing it the easy but less efficient way.

    If this feature goes in, the out-of-tree fuse module can go away,
    which would have several advantages:

    - not having to maintain two versions
    - less confusion for users
    - no bugs due to kernel API changes

    Comment from hch:
    - Use the same fh_type values as XFS, since we use the same fh encoding.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Use d_splice_alias() instead of d_add() in fuse lookup code, to allow NFS
    exporting.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Allow a filesystem's ->lock() method to call posix_lock_file() instead of
    posix_lock_file_wait(), and return FILE_LOCK_DEFERRED. This makes it
    possible to implement a ->lock() function that works with the lock
    manager, which needs the call to be asynchronous.

    Now the vfs_lock_file() helper can be used, so this is a cleanup as well.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Extract common code into a function.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Use a special error value FILE_LOCK_DEFERRED to mean that a locking
    operation returned asynchronously. This is returned by:

    - posix_lock_file() for sleeping locks, to mean that the lock has been
      queued on the block list, and will be woken up when it might become
      available and needs to be retried (either fl_lmops->fl_notify() is
      called or fl_wait is woken up).

    - f_op->lock(), to mean either the above, or that the filesystem will
      call back with fl_lmops->fl_grant() when the result of the locking
      operation is known. The filesystem can do this for sleeping as well
      as non-sleeping locks.

    This is to make sure that return values of -EAGAIN and -EINPROGRESS by
    filesystems are not mistaken for an asynchronous locking operation.

    This also makes error handling in fs/locks.c and lockd/svclock.c slightly
    cleaner.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
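The point of a dedicated return value can be shown with a small classifier. This is a user-space sketch under stated assumptions: `SKETCH_FILE_LOCK_DEFERRED` is an illustrative positive sentinel, and the enum and function names are invented for the example rather than taken from fs/locks.c.

```c
#include <errno.h>

/* Assumption: a positive sentinel that cannot collide with the negative
 * errno values a filesystem may legitimately return. */
#define SKETCH_FILE_LOCK_DEFERRED 1

enum lock_outcome { LOCK_GRANTED, LOCK_PENDING, LOCK_FAILED };

/* Classify the return value of a hypothetical ->lock() call. */
static enum lock_outcome classify_lock_result(int ret)
{
    if (ret == 0)
        return LOCK_GRANTED;
    if (ret == SKETCH_FILE_LOCK_DEFERRED)
        return LOCK_PENDING;  /* result arrives later via callback or wakeup */
    return LOCK_FAILED;       /* includes -EAGAIN and -EINPROGRESS */
}
```

Because the deferred case has its own value, a filesystem that returns -EAGAIN or -EINPROGRESS for ordinary reasons is treated as a plain failure rather than as a pending asynchronous request.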
     
  • Fix nlm_fopen() to return NLM_FAILED (or NLM_LCK_DENIED_NOLOCKS) instead
    of NLM_LCK_DENIED. The latter means the lock request failed because of a
    conflicting lock (i.e. a temporary error), which is wrong in this case.

    Also fix the client to return ENOLCK instead of EAGAIN if a blocking lock
    request returns with NLM_LCK_DENIED.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Cc: Shailabh Nagar
    Signed-off-by: Vegard Nossum
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • Update the documentation and make getdelays.c show delay accounting for
    memory reclaim.

    To distinguish between "swapping in pages" and "memory reclaim" in
    getdelays.c, MEM is changed to SWAP.

    Signed-off-by: Keika Kobayashi
    Acked-by: Balbir Singh
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     
  • Add members for memory reclaim delay to taskstats, and accumulate them in
    __delayacct_add_tsk().

    Signed-off-by: Keika Kobayashi
    Cc: Hiroshi Shimamoto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     
  • Application response times sometimes degrade under heavy memory load,
    because applications spend time reclaiming memory. Statistics on how long
    memory reclaim takes are useful for measuring its impact.

    This patch adds memory reclaim accounting to per-task delay accounting,
    measuring the time spent in do_try_to_free_pages().

    - When the system is under low memory load,
    memory reclaim may not occur.

    $ free
                 total       used       free     shared    buffers     cached
    Mem:       8197800    1577300    6620500          0       4808    1516724
    -/+ buffers/cache:      55768    8142032
    Swap:     16386292          0   16386292

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     0  0      0 5069748  10612 3014060    0    0     0     0    3   26  0  0 100  0
     0  0      0 5069748  10612 3014060    0    0     0     0    4   22  0  0 100  0
     0  0      0 5069748  10612 3014060    0    0     0     0    3   18  0  0 100  0

    Measure the time of the tar command.

    $ ls -s test.dat
    1501472 test.dat

    $ time tar cvf test.tar test.dat
    real 0m13.388s
    user 0m0.116s
    sys 0m5.304s

    $ ./delayget -d -p
    CPU        count     real total  virtual total    delay total
                 428     5528345500     5477116080       62749891
    IO         count    delay total
                 338     8078977189
    SWAP       count    delay total
                   0              0
    RECLAIM    count    delay total
                   0              0

    - When the system is under heavy memory load,
    memory reclaim may occur.

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     0  0 7159032  49724   1812   3012    0    0     0     0    3   24  0  0 100  0
     0  0 7159032  49724   1812   3012    0    0     0     0    4   24  0  0 100  0
     0  0 7159032  49848   1812   3012    0    0     0     0    3   22  0  0 100  0

    In this case, one process uses more than 8G of memory
    through calls to malloc() and memset().

    $ time tar cvf test.tar test.dat
    real 1m38.563s
    CPU        count     real total  virtual total    delay total
                9021     7140446250     7315277975      923201824
    IO         count    delay total
                8965    90466349669
    SWAP       count    delay total
                   3       21036367
    RECLAIM    count    delay total
                 740    61011951153

    In the latter case, the value of RECLAIM increases.
    So, taskstats can show how much memory reclaim influences turnaround
    time (TAT).

    Signed-off-by: Keika Kobayashi
    Acked-by: Balbir Singh
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
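A minimal sketch of the bookkeeping behind a RECLAIM row like the ones in this entry: one counter and one accumulated delay per task. The struct and function names are hypothetical, and the nanosecond unit is an assumption; the kernel's delay accounting code is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical per-task counters modelled on the RECLAIM row. */
struct reclaim_delay {
    uint64_t count;     /* number of reclaim episodes */
    uint64_t total_ns;  /* accumulated time in reclaim (unit assumed: ns) */
};

/* Record one reclaim episode bracketed by two timestamps. */
static void reclaim_delay_add(struct reclaim_delay *d,
                              uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns < start_ns)
        return;  /* ignore a non-monotonic sample */
    d->count++;
    d->total_ns += end_ns - start_ns;
}

/* Average delay per episode, or 0 if none were recorded. */
static uint64_t reclaim_delay_avg_ns(const struct reclaim_delay *d)
{
    return d->count ? d->total_ns / d->count : 0;
}
```

Applied to the heavy-load figures in this entry (count 740, delay total 61011951153), the average works out to roughly 82 ms per reclaim episode, provided the delay total is in nanoseconds.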
     
  • Fix bacct_add_tsk()'s use of do_div() on an s64 by making ac_etime a u64
    instead and dividing that.

    Possibly this should be guarded lest the interval calculation turn up
    negative, but the possible negativity of the result of the division is
    cast away, and it shouldn't end up negative anyway.

    This was introduced by patch f3cef7a99469afc159fec3a61b42dc7ca5b6824f.

    Signed-off-by: David Howells
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
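The idea of the fix, keeping the dividend unsigned before dividing, can be sketched in user space as follows. Plain 64-bit division stands in for the kernel's do_div(); the function name, the constant, and the nanosecond-to-microsecond conversion are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed conversion factor: nanoseconds per microsecond. */
#define SKETCH_NSEC_PER_USEC 1000u

/* Elapsed time in microseconds, computed entirely in unsigned 64-bit
 * arithmetic so that the dividend is never a signed value.
 * Assumes now_ns >= start_ns; the sketch does not guard against a
 * negative interval, mirroring the caveat in the commit message. */
static uint64_t elapsed_usec(uint64_t now_ns, uint64_t start_ns)
{
    uint64_t delta_ns = now_ns - start_ns;

    return delta_ns / SKETCH_NSEC_PER_USEC;
}
```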
     
  • Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate
    parent I/O statistics in /proc/pid/io. This approach follows the same
    model used to account per-process and per-thread CPU times.

    As a practical application, this makes it possible, for example, to
    quickly find the top I/O consumer when a process spawns many child
    threads that perform the actual I/O work, because the aggregated I/O
    statistics can always be found in /proc/pid/io.

    [ Oleg Nesterov points out that we should check that the task is still
    alive before we iterate over the threads, but also says that we can do
    that fixup on top of this later. - Linus ]

    Acked-by: Balbir Singh
    Signed-off-by: Andrea Righi
    Cc: Matt Heaton
    Cc: Shailabh Nagar
    Acked-by-with-comments: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Righi
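The relationship between the per-thread and the aggregated view can be sketched as a simple sum. This is a user-space illustration only: the two-field `struct io_counters` and the `aggregate_io` name are assumptions, and the kernel's actual accounting structure has more fields than shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced set of per-thread I/O counters. */
struct io_counters {
    uint64_t read_bytes;
    uint64_t write_bytes;
};

/* Sum per-thread counters into a single per-process total, in the way
 * the aggregated view relates to the per-thread view. */
static struct io_counters aggregate_io(const struct io_counters *threads,
                                       size_t nthreads)
{
    struct io_counters total = { 0, 0 };

    for (size_t i = 0; i < nthreads; i++) {
        total.read_bytes += threads[i].read_bytes;
        total.write_bytes += threads[i].write_bytes;
    }
    return total;
}
```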
     
  • Fix the one describing what this function is and add one more, about
    the absence of locking around the pid namespaces loop.

    Signed-off-by: Pavel Emelyanov
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This just makes acct_process walk the pid namespaces from current up to
    the top and account a task in each one that has accounting turned on.

    The ns->parent access is safe without locking, since current is still
    alive and holds its namespace, which in turn holds its parent.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
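The walk described in this entry amounts to following parent pointers until the top-level namespace is reached. The sketch below models that with a hypothetical `struct sketch_ns`; it is not the kernel's pid namespace structure, and the per-namespace counter exists only to make the effect observable.

```c
#include <stddef.h>

/* Hypothetical model of a namespace with an optional accounting switch. */
struct sketch_ns {
    struct sketch_ns *parent;  /* NULL for the top-level namespace */
    int acct_enabled;          /* non-zero if accounting is turned on */
    unsigned long accounted;   /* tasks accounted in this namespace */
};

/* Walk from the given namespace up to the top, accounting the task in
 * every namespace that has accounting enabled.
 * Returns the number of namespaces in which the task was accounted. */
static unsigned int account_in_all_ns(struct sketch_ns *ns)
{
    unsigned int hits = 0;

    for (; ns != NULL; ns = ns->parent) {
        if (!ns->acct_enabled)
            continue;
        ns->accounted++;
        hits++;
    }
    return hits;
}
```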
     
  • All the bsd_acct_structs with opened accounting are linked into a global
    list. So, the acct_auto_close(_mnt) walks this list and drops the
    accounting for each.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Allocate the structure on the first call to sys_acct(). After this, each
    namespace that requested accounting will live with this structure until
    its own death.

    Two notes:
    - routines that close the accounting at fs umount time use
      the init_pid_ns's acct for now;
    - the accounting routine accounts to the dying task's namespace
      (also for now).

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This adds the appropriate pointer to all the internal (i.e. static)
    functions that work with the global acct instance. API calls pass the
    global instance to them (while we still have one).

    Mostly this is a s/acct_globals./acct->/ over the file.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Don't use per-bsd-acct-struct lock, but work with a global one.

    This lock is taken for short periods, so it doesn't seem it'll become a
    bottleneck, but it will allow us to easily avoid many locking difficulties
    in the future.

    So this is mostly a s/acct_globals.lock/acct_lock/ over the file.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov