21 Oct, 2008

2 commits

  • …/git/tip/linux-2.6-tip

    This merges branches irq/genirq, irq/sparseirq-v4, timers/hpet-percpu
    and x86/uv.

    The sparseirq branch is just preliminary groundwork: no sparse IRQs are
    actually implemented by this tree anymore - just the new APIs are added
    while keeping the old way intact as well (the new APIs map 1:1 to
    irq_desc[]). The 'real' sparse IRQ support will then be a relatively
    small patch on top of this - with a v2.6.29 merge target.
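
    As a rough illustration of "the new APIs map 1:1 to irq_desc[]", here is
    a minimal userspace sketch of an accessor and an iterator layered over a
    flat array. The table size, the struct fields and init_all_descs() are
    assumptions made up for the example; only the names irq_to_desc,
    for_each_irq_desc and nr_irqs come from the shortlog above.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS 16                      /* hypothetical table size */

struct irq_desc {
    unsigned int irq;                   /* IRQ number of this descriptor */
    unsigned int count;                 /* stand-in for per-IRQ state */
};

static struct irq_desc irq_desc[NR_IRQS];
static unsigned int nr_irqs = NR_IRQS;

/* Accessor: callers no longer index irq_desc[] directly. */
static inline struct irq_desc *irq_to_desc(unsigned int irq)
{
    return (irq < nr_irqs) ? &irq_desc[irq] : NULL;
}

/* Iterator: walks every descriptor without exposing the array layout. */
#define for_each_irq_desc(irq, desc) \
    for ((irq) = 0, (desc) = irq_desc; (irq) < nr_irqs; (irq)++, (desc)++)

/* Example user: number all descriptors, return how many were visited. */
static unsigned int init_all_descs(void)
{
    unsigned int irq, visited = 0;
    struct irq_desc *desc;

    for_each_irq_desc(irq, desc) {
        /* The 1:1 mapping: accessor and array slot are the same object. */
        assert(irq_to_desc(irq) == desc);
        desc->irq = irq;
        visited++;
    }
    return visited;
}
```

    Because users only go through the accessor and the iterator, the backing
    store could later change without touching them, which is the point of the
    groundwork described above.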

    * 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (178 commits)
    genirq: improve include files
    intr_remapping: fix typo
    io_apic: make irq_mis_count available on 64-bit too
    genirq: fix name space collisions of nr_irqs in arch/*
    genirq: fix name space collision of nr_irqs in autoprobe.c
    genirq: use iterators for irq_desc loops
    proc: fixup irq iterator
    genirq: add reverse iterator for irq_desc
    x86: move ack_bad_irq() to irq.c
    x86: unify show_interrupts() and proc helpers
    x86: cleanup show_interrupts
    genirq: cleanup the sparseirq modifications
    genirq: remove artifacts from sparseirq removal
    genirq: revert dynarray
    genirq: remove irq_to_desc_alloc
    genirq: remove sparse irq code
    genirq: use inline function for irq_to_desc
    genirq: consolidate nr_irqs and for_each_irq_desc()
    x86: remove sparse irq from Kconfig
    genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n
    ...

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
    fix documentation of sysrq-q really
    Fix documentation of sysrq-q
    timer_list: add base address to clock base
    timer_list: print cpu number of clockevents device
    timer_list: print real timer address
    NOHZ: restart tick device from irq_enter()
    NOHZ: split tick_nohz_restart_sched_tick()
    NOHZ: unify the nohz function calls in irq_enter()
    timers: fix itimer/many thread hang, fix
    timers: fix itimer/many thread hang, v3
    ntp: improve adjtimex frequency rounding
    timekeeping: fix rounding problem during clock update
    ntp: let update_persistent_clock() sleep
    hrtimer: reorder struct hrtimer to save 8 bytes on 64bit builds
    posix-timers: lock_timer: make it readable
    posix-timers: lock_timer: kill the bogus ->it_id check
    posix-timers: kill ->it_sigev_signo and ->it_sigev_value
    posix-timers: sys_timer_create: cleanup the error handling
    posix-timers: move the initialization of timer->sigq from send to create path
    posix-timers: sys_timer_create: simplify and s/tasklist/rcu/
    ...

    Fix trivial conflicts due to sysrq-q description clashes in
    Documentation/sysrq.txt and drivers/char/sysrq.c

    Linus Torvalds
     

20 Oct, 2008

6 commits

  • The usage of elfcorehdr_addr has changed recently such that being set to
    ELFCORE_ADDR_MAX is used by is_kdump_kernel() to indicate if the code is
    executing in a kernel executed as a crash kernel.

    However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will reset
    elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent
    calls to is_kdump_kernel() will return 0, even though they should return
    1.

    Ok, at this point in time there are no subsequent calls, but I think it's
    fair to say that there is ample scope for error or at the very least
    confusion.

    This patch adds an extra state, ELFCORE_ADDR_ERR, which indicates that
    elfcorehdr_addr was passed on the command line, and thus execution is
    taking place in a crashdump kernel, but vmcore can't be used for some
    reason. This is tested for using is_vmcore_usable() and set using
    vmcore_unusable(). A subsequent patch makes use of this new code.

    To summarise, the states that elfcorehdr_addr can now be in are as follows:

    ELFCORE_ADDR_MAX: not a crashdump kernel
    ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable
    any other value: crash dump kernel and vmcore is usable
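
    The three states can be modelled in a few lines of userspace C. This is
    a sketch only: the sentinel values chosen for ELFCORE_ADDR_MAX and
    ELFCORE_ADDR_ERR are assumptions, and the helper bodies are one plausible
    reading of the description above rather than the patch itself.

```c
#include <assert.h>

/* Sentinel values are assumptions chosen for this sketch. */
#define ELFCORE_ADDR_MAX (-1ULL)        /* not a crashdump kernel */
#define ELFCORE_ADDR_ERR (-2ULL)        /* crashdump kernel, vmcore unusable */

static unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX;

/* True whenever an elfcorehdr address was passed on the command line. */
static int is_kdump_kernel(void)
{
    return elfcorehdr_addr != ELFCORE_ADDR_MAX;
}

/* True only if we are a crashdump kernel and no error was recorded. */
static int is_vmcore_usable(void)
{
    return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR;
}

/* Record an error without losing the "this is a kdump kernel" state. */
static void vmcore_unusable(void)
{
    if (is_kdump_kernel())
        elfcorehdr_addr = ELFCORE_ADDR_ERR;
}
```

    With this shape an error path marks vmcore as unusable, and later calls
    to is_kdump_kernel() still return 1.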

    Signed-off-by: Simon Horman
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Horman
     
  • o elfcorehdr_addr is used by not only the code under CONFIG_PROC_VMCORE
    but also by the code which is not inside CONFIG_PROC_VMCORE. For
    example, is_kdump_kernel() is used by powerpc code to determine if
    kernel is booting after a panic then use previous kernel's TCE table.
    So even if CONFIG_PROC_VMCORE is not set in second kernel, one should be
    able to correctly determine that we are booting after a panic and setup
    calgary iommu accordingly.

    o So remove the assumption that elfcorehdr_addr is under
    CONFIG_PROC_VMCORE.

    o Move definition of elfcorehdr_addr to arch dependent crash files.
    (Unfortunately crash dump does not have an arch independent file
    otherwise that would have been the best place).

    o kexec.c is not the right place as one can have CRASH_DUMP enabled in
    second kernel without KEXEC being enabled.

    o I don't see sh setup code parsing the command line for
    elfcorehdr_addr. I am wondering how the vmcore interface works on sh.
    Anyway, I am at least defining elfcorehdr_addr so that compilation is not
    broken on sh.

    Signed-off-by: Vivek Goyal
    Acked-by: "Eric W. Biederman"
    Acked-by: Simon Horman
    Acked-by: Paul Mundt
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Add NR_MLOCK zone page state, which provides a (conservative) count of
    mlocked pages (actually, the number of mlocked pages moved off the LRU).

    Reworked by lts to fit in with the modified mlock page support in the
    Reclaim Scalability series.

    [kosaki.motohiro@jp.fujitsu.com: fix incorrect Mlocked field of /proc/meminfo]
    [lee.schermerhorn@hp.com: mlocked-pages: add event counting with statistics]
    Signed-off-by: Nick Piggin
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Rik van Riel
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Report unevictable pages per zone and system wide.

    Kosaki Motohiro added support for memory controller unevictable
    statistics.

    [riel@redhat.com: fix printk in show_free_areas()]
    [akpm@linux-foundation.org: fix units in /proc/vmstats]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Rik van Riel
    Signed-off-by: KOSAKI Motohiro
    Debugged-by: Hiroshi Shimamoto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Split the LRU lists in two, one set for pages that are backed by real file
    systems ("file") and one for pages that are backed by memory and swap
    ("anon"). The latter includes tmpfs.

    The advantage of doing this is that the VM will not have to scan over lots
    of anonymous pages (which we generally do not want to swap out), just to
    find the page cache pages that it should evict.

    This patch has the infrastructure and a basic policy to balance how much
    we scan the anon lists and how much we scan the file lists. The big
    policy changes are in separate patches.
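
    A small sketch of how a page might be routed to one of the split lists.
    struct page_model, its two flags and both helpers are hypothetical
    stand-ins written for this example; the list names are assumptions that
    follow the "anon"/"file" wording above.

```c
#include <assert.h>
#include <stdbool.h>

/* One inactive and one active list per backing type. */
enum lru_list {
    LRU_INACTIVE_ANON = 0,
    LRU_ACTIVE_ANON   = 1,
    LRU_INACTIVE_FILE = 2,
    LRU_ACTIVE_FILE   = 3,
    NR_LRU_LISTS
};

struct page_model {                     /* hypothetical stand-in for a page */
    bool active;                        /* recently referenced */
    bool swap_backed;                   /* backed by memory/swap (incl. tmpfs) */
};

/* Pick the list a page belongs on from its two properties. */
static enum lru_list page_lru(const struct page_model *page)
{
    int lru = page->swap_backed ? LRU_INACTIVE_ANON : LRU_INACTIVE_FILE;

    if (page->active)
        lru += 1;                       /* active list follows its inactive one */
    return (enum lru_list)lru;
}

/* Reclaim that only wants page cache can skip the anon lists entirely. */
static bool is_file_lru(enum lru_list lru)
{
    return lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE;
}
```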

    [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
    [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
    [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
    [hugh@veritas.com: memcg swapbacked pages active]
    [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
    [akpm@linux-foundation.org: fix /proc/vmstat units]
    [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
    [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
    [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Hugh Dickins
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • …tp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus

    Thomas Gleixner
     

17 Oct, 2008

1 commit


16 Oct, 2008

7 commits


15 Oct, 2008

1 commit

  • * 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: (59 commits)
    svcrdma: Fix IRD/ORD polarity
    svcrdma: Update svc_rdma_send_error to use DMA LKEY
    svcrdma: Modify the RPC reply path to use FRMR when available
    svcrdma: Modify the RPC recv path to use FRMR when available
    svcrdma: Add support to svc_rdma_send to handle chained WR
    svcrdma: Modify post recv path to use local dma key
    svcrdma: Add a service to register a Fast Reg MR with the device
    svcrdma: Query device for Fast Reg support during connection setup
    svcrdma: Add FRMR get/put services
    NLM: Remove unused argument from svc_addsock() function
    NLM: Remove "proto" argument from lockd_up()
    NLM: Always start both UDP and TCP listeners
    lockd: Remove unused fields in the nlm_reboot structure
    lockd: Add helper to sanity check incoming NOTIFY requests
    lockd: change nlmclnt_grant() to take a "struct sockaddr *"
    lockd: Adjust nlmsvc_lookup_host() to accomodate AF_INET6 addresses
    lockd: Adjust nlmclnt_lookup_host() signature to accomodate non-AF_INET
    lockd: Support non-AF_INET addresses in nlm_lookup_host()
    NLM: Convert nlm_lookup_host() to use a single argument
    svcrdma: Add Fast Reg MR Data Types
    ...

    Linus Torvalds
     

10 Oct, 2008

10 commits


30 Sep, 2008

1 commit

    This patch adds the CONFIG_FILE_LOCKING option, which allows support for
    advisory locks to be removed. With the option disabled, the flock()
    system call, the F_GETLK, F_SETLK and F_SETLKW operations of fcntl()
    and NFS support are disabled. These features are not necessarily needed
    on embedded systems. Disabling them saves ~11 KB of kernel code and data:

       text    data     bss     dec     hex filename
    1125436  118764  212992 1457192  163c28 vmlinux.old
    1114299  118564  212992 1445855  160fdf vmlinux
     -11137    -200       0  -11337   -2C49 +/-
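
    Options of this kind are commonly wired up by compiling the feature out
    and leaving a stub behind for callers. The sketch below shows that
    general pattern only: flock_model() and the -ENOSYS return value are
    assumptions for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Uncomment to model a build with the option enabled. */
/* #define CONFIG_FILE_LOCKING 1 */

#ifdef CONFIG_FILE_LOCKING
/* Full build: the real locking code would live here. */
static int flock_model(int fd, int op)
{
    (void)fd;
    (void)op;
    return 0;
}
#else
/* Tiny build: a stub keeps callers compiling but does no locking. */
static inline int flock_model(int fd, int op)
{
    (void)fd;
    (void)op;
    return -ENOSYS;
}
#endif
```

    The size saving comes from the first branch never being compiled when
    the option is off.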

    This patch was originally written by Matt Mackall and is part of the
    Linux Tiny project.

    Signed-off-by: Thomas Petazzoni
    Signed-off-by: Matt Mackall
    Cc: matthew@wil.cx
    Cc: linux-fsdevel@vger.kernel.org
    Cc: mpm@selenic.com
    Cc: akpm@linux-foundation.org
    Signed-off-by: J. Bruce Fields

    Thomas Petazzoni
     

14 Sep, 2008

3 commits

  • Overview

    This patch reworks the handling of POSIX CPU timers, including the
    ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together
    with the help of Roland McGrath, the owner and original writer of this code.

    The problem we ran into, and the reason for this rework, has to do with using
    a profiling timer in a process with a large number of threads. It appears
    that the performance of the old implementation of run_posix_cpu_timers() was
    at least O(n*3) (where "n" is the number of threads in a process) or worse.
    Everything is fine with an increasing number of threads until the time taken
    for that routine to run becomes the same as or greater than the tick time, at
    which point things degrade rather quickly.

    This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF."

    Code Changes

    This rework corrects the implementation of run_posix_cpu_timers() to make it
    run in constant time for a particular machine. (Performance may vary between
    one machine and another depending upon whether the kernel is built as single-
    or multiprocessor and, in the latter case, depending upon the number of
    running processors.) To do this, at each tick we now update fields in
    signal_struct as well as task_struct. The run_posix_cpu_timers() function
    uses those fields to make its decisions.

    We define a new structure, "task_cputime," to contain user, system and
    scheduler times and use these in appropriate places:

    struct task_cputime {
            cputime_t utime;
            cputime_t stime;
            unsigned long long sum_exec_runtime;
    };

    This is included in the structure "thread_group_cputime," which is a new
    substructure of signal_struct and which varies for uniprocessor versus
    multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as
    a simple substructure, while for multiprocessor kernels it is a pointer:

    /* uniprocessor kernels */
    struct thread_group_cputime {
            struct task_cputime totals;
    };

    /* multiprocessor kernels */
    struct thread_group_cputime {
            struct task_cputime *totals;
    };

    We also add a new task_cputime substructure directly to signal_struct, to
    cache the earliest expiration of process-wide timers, and task_cputime also
    replaces the it_*_expires fields of task_struct (used for earliest expiration
    of thread timers). The "thread_group_cputime" structure contains process-wide
    timers that are updated via account_user_time() and friends. In the non-SMP
    case the structure is a simple aggregator; unfortunately in the SMP case that
    simplicity was not achievable due to cache-line contention between CPUs (in
    one measured case performance was actually _worse_ on a 16-cpu system than
    the same test on a 4-cpu system, due to this contention). For SMP, the
    thread_group_cputime counters are maintained as a per-cpu structure allocated
    using alloc_percpu(). The timer functions update only the timer field in
    the structure corresponding to the running CPU, obtained using per_cpu_ptr().

    We define a set of inline functions in sched.h that we use to maintain the
    thread_group_cputime structure and hide the differences between UP and SMP
    implementations from the rest of the kernel. The thread_group_cputime_init()
    function initializes the thread_group_cputime structure for the given task.
    The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the
    out-of-line function thread_group_cputime_alloc_smp() to allocate and fill
    in the per-cpu structures and fields. The thread_group_cputime_free()
    function, also a no-op for UP, in SMP frees the per-cpu structures. The
    thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls
    thread_group_cputime_alloc() if the per-cpu structures haven't yet been
    allocated. The thread_group_cputime() function fills the task_cputime
    structure it is passed with the contents of the thread_group_cputime fields;
    in UP it's that simple, but in SMP it must also safely check that tsk->signal
    is non-NULL (if it is NULL it just uses the appropriate fields of task_struct)
    and, if it is non-NULL, sums the per-cpu values for each online CPU. Finally, the three
    functions account_group_user_time(), account_group_system_time() and
    account_group_exec_runtime() are used by timer functions to update the
    respective fields of the thread_group_cputime structure.
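
    To make the SMP scheme concrete, here is a userspace sketch of per-CPU
    accumulation and summing. It assumes a fixed NR_CPUS array in place of
    the alloc_percpu()/per_cpu_ptr() storage described above, passes the CPU
    number explicitly, and uses a simplified cputime_t; the signatures are
    therefore illustrative, not the kernel's.

```c
#include <assert.h>

#define NR_CPUS 4                       /* hypothetical CPU count */

typedef unsigned long cputime_t;        /* simplified for the sketch */

struct task_cputime {
    cputime_t utime;
    cputime_t stime;
    unsigned long long sum_exec_runtime;
};

/* One slot per CPU, so writers on different CPUs never share a counter. */
struct thread_group_cputime {
    struct task_cputime totals[NR_CPUS];
};

static void account_group_user_time(struct thread_group_cputime *tg,
                                    int cpu, cputime_t delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    tg->totals[cpu].utime += delta;
}

static void account_group_system_time(struct thread_group_cputime *tg,
                                      int cpu, cputime_t delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    tg->totals[cpu].stime += delta;
}

static void account_group_exec_runtime(struct thread_group_cputime *tg,
                                       int cpu, unsigned long long ns)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    tg->totals[cpu].sum_exec_runtime += ns;
}

/* Reader side: snapshot the group totals by summing every CPU's slot. */
static void thread_group_cputime(const struct thread_group_cputime *tg,
                                 struct task_cputime *times)
{
    int cpu;

    times->utime = 0;
    times->stime = 0;
    times->sum_exec_runtime = 0;
    for (cpu = 0; cpu < NR_CPUS; cpu++) {
        times->utime += tg->totals[cpu].utime;
        times->stime += tg->totals[cpu].stime;
        times->sum_exec_runtime += tg->totals[cpu].sum_exec_runtime;
    }
}
```

    The update path touches a single slot and stays cheap at every tick;
    the cost of summing is paid only by the less frequent readers.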

    Non-SMP operation is trivial and will not be mentioned further.

    The per-cpu structure is always allocated when a task creates its first new
    thread, via a call to thread_group_cputime_clone_thread() from copy_signal().
    It is freed at process exit via a call to thread_group_cputime_free() from
    cleanup_signal().

    All functions that formerly summed utime/stime/sum_sched_runtime values
    from all threads in the thread group now use thread_group_cputime() to
    snapshot the values in the thread_group_cputime structure or the values in
    the task structure itself if the per-cpu structure hasn't been allocated.

    Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit.
    The run_posix_cpu_timers() function has been split into a fast path and a
    slow path; the former safely checks whether there are any expired thread
    timers and, if not, just returns, while the slow path does the heavy lifting.
    With the dedicated thread group fields, timers are no longer "rebalanced" and
    the process_timer_rebalance() function and related code has gone away. All
    summing loops are gone and all code that used them now uses the
    thread_group_cputime() inline. When process-wide timers are set, the new
    task_cputime structure in signal_struct is used to cache the earliest
    expiration; this is checked in the fast path.
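
    The fast-path idea can be sketched as a comparison of the current times
    against the cached earliest expirations. This is a simplified model: it
    assumes a zero expiry field means "no timer armed" and compares each
    field only against its own limit. run_timers_model() and slow_path_runs
    are hypothetical names for the example.

```c
#include <assert.h>

typedef unsigned long cputime_t;        /* simplified for the sketch */

struct task_cputime {
    cputime_t utime;
    cputime_t stime;
    unsigned long long sum_exec_runtime;
};

/* Nonzero if any armed limit in "expires" has been reached by "sample". */
static int task_cputime_expired(const struct task_cputime *sample,
                                const struct task_cputime *expires)
{
    if (expires->utime && sample->utime >= expires->utime)
        return 1;
    if (expires->stime && sample->stime >= expires->stime)
        return 1;
    if (expires->sum_exec_runtime &&
        sample->sum_exec_runtime >= expires->sum_exec_runtime)
        return 1;
    return 0;
}

static unsigned int slow_path_runs;     /* counts how often heavy work ran */

/* Per-tick entry point: bail out early unless something has expired. */
static void run_timers_model(const struct task_cputime *sample,
                             const struct task_cputime *earliest)
{
    if (!task_cputime_expired(sample, earliest))
        return;                         /* fast path: a few compares only */
    slow_path_runs++;                   /* slow path stand-in */
}
```

    Since the check only reads cached values, its cost does not depend on
    the number of threads in the process.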

    Performance

    The fix appears not to add significant overhead to existing operations. It
    generally performs the same as the current code except in two cases, one in
    which it performs slightly worse (Case 5 below) and one in which it performs
    very significantly better (Case 2 below). Overall it's a wash except in those
    two cases.

    I've since done somewhat more involved testing on a dual-core Opteron system.

    Case 1: With no itimer running, for a test with 100,000 threads, the fixed
    kernel took 1428.5 seconds, 513 seconds more than the unfixed system,
    all of which was spent in the system. There were twice as many
    voluntary context switches with the fix as without it.

    Case 2: With an itimer running at .01 second ticks and 4000 threads (the most
    an unmodified kernel can handle), the fixed kernel ran the test in
    eight percent of the time (5.8 seconds as opposed to 70 seconds) and
    had better tick accuracy (.012 seconds per tick as opposed to .023
    seconds per tick).

    Case 3: A 4000-thread test with an initial timer tick of .01 second and an
    interval of 10,000 seconds (i.e. a timer that ticks only once) had
    very nearly the same performance in both cases: 6.3 seconds elapsed
    for the fixed kernel versus 5.5 seconds for the unfixed kernel.

    With fewer threads (eight in these tests), the Case 1 test ran in essentially
    the same time on both the modified and unmodified kernels (5.2 seconds versus
    5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds
    versus 5.4 seconds but again with much better tick accuracy, .013 seconds per
    tick versus .025 seconds per tick for the unmodified kernel.

    Since the fix affected the rlimit code, I also tested soft and hard CPU limits.

    Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer
    running), the modified kernel was very slightly favored in that while
    it killed the process in 19.997 seconds of CPU time (5.002 seconds of
    wall time), only .003 seconds of that was system time, the rest was
    user time. The unmodified kernel killed the process in 20.001 seconds
    of CPU (5.014 seconds of wall time) of which .016 seconds was system
    time. Really, though, the results were too close to call. The results
    were essentially the same with no itimer running.

    Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds
    (where the hard limit would never be reached) and an itimer running,
    the modified kernel exhibited worse tick accuracy than the unmodified
    kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise,
    performance was almost indistinguishable. With no itimer running this
    test exhibited virtually identical behavior and times in both cases.

    In times past I did some limited performance testing. Those results are below.

    On a four-cpu Opteron system without this fix, a sixteen-thread test executed
    in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On
    the same system with the fix, user and elapsed time were about the same, but
    system time dropped to 0.007 seconds. Performance with eight, four and one
    thread were comparable. Interestingly, the timer ticks with the fix seemed
    more accurate: The sixteen-thread test with the fix received 149543 ticks
    for 0.024 seconds per tick, while the same test without the fix received 58720
    for 0.061 seconds per tick. Both cases were configured for an interval of
    0.01 seconds. Again, the other tests were comparable. Each thread in this
    test computed the primes up to 25,000,000.

    I also did a test with a large number of threads, 100,000 threads, which is
    impossible without the fix. In this case each thread computed the primes only
    up to 10,000 (to make the runtime manageable). System time dominated, at
    1546.968 seconds out of a total 2176.906 seconds (giving a user time of
    629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite
    accurate. There is obviously no comparable test without the fix.

    Signed-off-by: Frank Mayhar
    Cc: Roland McGrath
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Frank Mayhar
     
  • A "Quicklists: 0 kB" line has just started appearing in
    /proc/meminfo, but most architectures (including x86) don't have
    them configured, so #ifdef it, like the highmem lines.

    And those architectures which do have quicklists configured are
    using them for page tables: so let's place it next to PageTables.

    Signed-off-by: Hugh Dickins
    Acked-by: Christoph Lameter
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Print parent directory name as well.

    The aim is to catch non-creation of parent directory when proc_mkdir will
    return NULL and all subsequent registrations go directly in /proc instead
    of intended directory.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    [ Fixed insane printk string while at it. - Linus ]
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

06 Sep, 2008

1 commit

  • Spencer reported a problem where utime and stime were going negative despite
    the fixes in commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. The suspected
    reason for the problem is that signal_struct maintains its own utime and
    stime (of exited tasks); these are not updated using the new task_utime()
    routine, hence sig->utime can go backwards and cause the same problem
    to occur (sig->utime adds tsk->utime and not task_utime()). This patch
    fixes the problem.

    TODO: using max(task->prev_utime, derived utime) works for now, but a more
    generic solution is to implement cputime_max() and use the cputime_gt()
    function for comparison.
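
    The max(task->prev_utime, derived utime) approach from the TODO can be
    sketched as follows. struct task_model is a hypothetical stand-in, all
    times are assumed to share one unit, and the scaling formula is an
    assumption about how the derived value might be computed.

```c
#include <assert.h>

typedef unsigned long long cputime_t;   /* simplified for the sketch */

struct task_model {                     /* hypothetical stand-in for a task */
    cputime_t utime;                    /* tick-sampled user time */
    cputime_t stime;                    /* tick-sampled system time */
    unsigned long long sum_exec_runtime;/* precise total runtime */
    cputime_t prev_utime;               /* last value reported to userspace */
};

/* Report user time that never goes backwards between two calls. */
static cputime_t task_utime(struct task_model *p)
{
    cputime_t total = p->utime + p->stime;
    cputime_t derived = p->sum_exec_runtime;

    /* Scale the precise runtime by the sampled user/total ratio. */
    if (total)
        derived = derived * p->utime / total;

    /* Clamp: the derived value may dip, the reported value may not. */
    if (derived > p->prev_utime)
        p->prev_utime = derived;
    return p->prev_utime;
}
```

    Any accumulator that adds the raw utime instead of going through such a
    clamped routine can still move backwards relative to the reported values,
    which is the failure described in the report above.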

    Reported-by: spencer@bluehost.com
    Signed-off-by: Balbir Singh
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Balbir Singh
     

03 Sep, 2008

1 commit

  • Quicklists can consume several GB of memory. We should provide a means of
    monitoring this.

    After this patch is applied, /proc/meminfo will output the following:

    % cat /proc/meminfo

    MemTotal: 7715392 kB
    MemFree: 5401600 kB
    Buffers: 80384 kB
    Cached: 300800 kB
    SwapCached: 0 kB
    Active: 235584 kB
    Inactive: 262656 kB
    SwapTotal: 2031488 kB
    SwapFree: 2031488 kB
    Dirty: 3520 kB
    Writeback: 0 kB
    AnonPages: 117696 kB
    Mapped: 38528 kB
    Slab: 1589952 kB
    SReclaimable: 23104 kB
    SUnreclaim: 1566848 kB
    PageTables: 14656 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 5889152 kB
    Committed_AS: 393152 kB
    VmallocTotal: 17592177655808 kB
    VmallocUsed: 29056 kB
    VmallocChunk: 17592177626432 kB
    Quicklists: 130944 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 262144 kB

    Signed-off-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Keiichiro Tokunaga
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

25 Aug, 2008

1 commit

  • Ouch, if number taken from IDA is too big, the intent was to signal an
    error, not check for overflow and still do overflowing addition.

    One still needs 2^28 proc entries to notice this.
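
    A sketch of the intended check: reject the id before adding, instead of
    detecting the problem and adding anyway. The base value used for
    PROC_DYNAMIC_FIRST and the "0 means failure" convention are assumptions
    for this example, picked so that the 2^28 figure above works out.

```c
#include <assert.h>
#include <limits.h>

#define PROC_DYNAMIC_FIRST 0xF0000000U  /* assumed first dynamic inode number */

/* Map an allocator id to an inode number; 0 signals "id too big". */
static unsigned int inode_number_from_id(unsigned int id)
{
    /* Test against the remaining headroom so the sum can never wrap. */
    if (id > UINT_MAX - PROC_DYNAMIC_FIRST)
        return 0;
    return PROC_DYNAMIC_FIRST + id;
}
```

    With this base, ids 0 through 2^28 - 1 fit, and id 2^28 is the first one
    to be rejected.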

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
     

21 Aug, 2008

1 commit

  • This addresses

    http://bugzilla.kernel.org/show_bug.cgi?id=11318

    In function show_map (file: fs/proc/task_mmu.c), if vma->vm_pgoff > 2^20
    then (vma->vm_pgoff << PAGE_SHIFT) is greater than 2^32 (with PAGE_SIZE
    equal to 4096, i.e. 2^12). The next seq_printf uses an unsigned long for
    the conversion of (vma->vm_pgoff << PAGE_SHIFT); as a result the offset
    value displayed in /proc/self/maps is truncated if the page offset is
    greater than 2^20.
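
    The arithmetic can be reproduced in isolation. In this sketch uint32_t
    stands in for a 32-bit unsigned long and PAGE_SHIFT is assumed to be 12;
    both helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                   /* assumed: 4096-byte pages */

/* Shift performed in 32 bits: everything above bit 31 is lost. */
static uint32_t offset_truncated(uint32_t pgoff)
{
    return pgoff << PAGE_SHIFT;
}

/* Widen first, then shift: the full byte offset survives. */
static uint64_t offset_full(uint32_t pgoff)
{
    return (uint64_t)pgoff << PAGE_SHIFT;
}
```

    For a page offset of 2^20 the first helper yields 0 while the second
    yields 0x100000000, so the only difference is where the widening cast is
    placed.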

    A test that shows this issue:

    #define _GNU_SOURCE
    /* headers required by the calls used below */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE (getpagesize())

    #if __i386__
    # define U64_STR "%llx"
    #elif __x86_64
    # define U64_STR "%lx"
    #else
    # error "Architecture Unsupported"
    #endif

    int main(int argc, char *argv[])
    {
        int fd;
        char *addr;
        off64_t offset = 0x10000000;
        char *filename = "/dev/zero";

        fd = open(filename, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* 0x10000000 * 0x10 = 2^32: a file offset that needs 33 bits */
        offset *= 0x10;
        printf("offset = " U64_STR "\n", offset);

        addr = (char*)mmap64(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd,
                             offset);
        if ((void*)addr == MAP_FAILED) {
            perror("mmap64");
            return 1;
        }

        {
            FILE *fmaps;
            char *line = NULL;
            size_t len = 0;
            ssize_t read;
            size_t filename_len = strlen(filename);

            fmaps = fopen("/proc/self/maps", "r");
            if (!fmaps) {
                perror("fopen");
                return 1;
            }
            /* print only the maps lines that end with the mapped filename */
            while ((read = getline(&line, &len, fmaps)) != -1) {
                if ((read > filename_len + 1)
                    && (strncmp(&line[read - filename_len - 1], filename, filename_len) == 0))
                    printf("%s", line);
            }

            if (line)
                free(line);

            fclose(fmaps);
        }

        close(fd);
        return 0;
    }

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Clement Calmels
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Clement Calmels
     

06 Aug, 2008

1 commit

  • proc: fix warnings

    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 5 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 6 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 9 has type 'u64'
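
    One common way to silence this class of warning is to cast each 64-bit
    value to unsigned long long at the call site, since u64 is not the same
    underlying type on every architecture. The sketch below uses uint64_t
    and snprintf() in userspace; format_counter() is a hypothetical helper,
    and whether the patch used exactly this cast is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit counter with "%llu" regardless of how u64 is defined. */
static int format_counter(char *buf, size_t len, uint64_t value)
{
    /* The cast makes the argument type match the format on every target. */
    return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```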

    Signed-off-by: Alexander Beregalov
    Acked-by: Andrea Righi
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Beregalov
     

01 Aug, 2008

2 commits

  • proc doesn't use "associate pointer with id" feature of IDR, so switch
    to IDA.

    NOTE, NOTE, NOTE:
    Do not apply if release_inode_number() still mentions MAX_ID_MASK!

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
     
  • Id which proc gets from IDR for inode number and id which proc removes
    from IDR do not match. E.g. 0x11a transforms into 0x8000011a.

    Which stayed unnoticed for a long time because, surprise, idr_remove()
    masks out that high bit before doing anything.

    All of this due to "| ~MAX_ID_MASK" in release_inode_number().

    I still don't understand how it's supposed to work, because "| ~MASK"
    is not an inversion for "& MAX" operation.

    So, use just one nice, working addition. Make start offset unsigned int,
    while I'm at it. Its longness is not used anywhere.
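
    The mismatch is easy to reproduce with plain integers. In this sketch
    the values of MAX_ID_MASK and PROC_DYNAMIC_FIRST are assumptions chosen
    to match the 0x11a example above, and all four helper names are
    hypothetical.

```c
#include <assert.h>

#define MAX_ID_MASK        0x7fffffffU  /* assumed: every bit but the top one */
#define PROC_DYNAMIC_FIRST 0xF0000000U  /* assumed start offset */

/* What the release path used to compute from the id. */
static unsigned int release_id_old(unsigned int id)
{
    return id | ~MAX_ID_MASK;           /* sets the top bit, inverts nothing */
}

/* What a remove routine that masks its argument ends up looking at. */
static unsigned int masked_id(unsigned int id)
{
    return id & MAX_ID_MASK;
}

/* The replacement: one addition, undone by one subtraction. */
static unsigned int inum_from_id(unsigned int id)
{
    return PROC_DYNAMIC_FIRST + id;
}

static unsigned int id_from_inum(unsigned int inum)
{
    return inum - PROC_DYNAMIC_FIRST;
}
```

    release_id_old(0x11a) gives 0x8000011a, which only works because the
    consumer masks the top bit off again; the add/subtract pair round-trips
    without relying on that.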

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
     

28 Jul, 2008

2 commits

  • Simplify the code of include/linux/task_io_accounting.h.

    It is also more reasonable to have all the task i/o-related statistics in a
    single struct (task_io_accounting).

    Signed-off-by: Andrea Righi
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Andrea Righi
     
  • Put all i/o statistics in struct proc_io_accounting and use inline functions to
    initialize and increment statistics, removing a lot of single variable
    assignments.
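
    A minimal sketch of the "one struct plus inline helpers" idea. The
    struct and function names here are hypothetical models, and the field
    list is an assumption about which counters are tracked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct io_accounting_model {            /* hypothetical model of the struct */
    uint64_t rchar;                     /* bytes read via read-like calls */
    uint64_t wchar;                     /* bytes written via write-like calls */
    uint64_t syscr;                     /* number of read syscalls */
    uint64_t syscw;                     /* number of write syscalls */
    uint64_t read_bytes;                /* bytes fetched from storage */
    uint64_t write_bytes;               /* bytes sent to storage */
    uint64_t cancelled_write_bytes;     /* writes that never hit storage */
};

/* Replaces a run of single-variable "= 0" assignments. */
static inline void io_accounting_init(struct io_accounting_model *ioac)
{
    memset(ioac, 0, sizeof(*ioac));
}

/* Replaces a run of single-variable "+=" statements. */
static inline void io_accounting_add(struct io_accounting_model *dst,
                                     const struct io_accounting_model *src)
{
    dst->rchar += src->rchar;
    dst->wchar += src->wchar;
    dst->syscr += src->syscr;
    dst->syscw += src->syscw;
    dst->read_bytes += src->read_bytes;
    dst->write_bytes += src->write_bytes;
    dst->cancelled_write_bytes += src->cancelled_write_bytes;
}
```

    Callers then initialise or fold in a whole set of counters with one call
    each, which is where the per-object code size reduction comes from.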

    This also reduces the kernel size as follows (with CONFIG_TASK_XACCT=y and
    CONFIG_TASK_IO_ACCOUNTING=y).

       text    data     bss     dec     hex filename
      11651       0       0   11651    2d83 kernel/exit.o.before
      11619       0       0   11619    2d63 kernel/exit.o.after
      10886     132     136   11154    2b92 kernel/fork.o.before
      10758     132     136   11026    2b12 kernel/fork.o.after

    3082029  807968 4818600 8708597  84e1f5 vmlinux.o.before
    3081869  807968 4818600 8708437  84e155 vmlinux.o.after

    Signed-off-by: Andrea Righi
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Andrea Righi