06 Sep, 2008

1 commit

  • Spencer reported a problem where utime and stime were going negative despite
    the fixes in commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. The suspected
    reason for the problem is that signal_struct maintains its own utime and
    stime (of exited tasks). These are not updated using the new task_utime()
    routine, hence sig->utime can go backwards and cause the same problem
    to occur (sig->utime adds tsk->utime and not task_utime()). This patch
    fixes the problem.

    TODO: using max(task->prev_utime, derived utime) works for now, but a more
    generic solution is to implement cputime_max() and use the cputime_gt()
    function for comparison.

    Reported-by: spencer@bluehost.com
    Signed-off-by: Balbir Singh
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Balbir Singh
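
The max(prev_utime, derived utime) idea from the TODO above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: struct task_sample and monotonic_utime() are hypothetical names, and plain 64-bit integers stand in for cputime_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for per-task state; not the kernel's task_struct. */
struct task_sample {
    uint64_t prev_utime;  /* largest utime reported so far */
};

/*
 * Return a utime that never decreases: the larger of the newly derived
 * value and the value reported last time. The result is remembered so
 * the next call is clamped against it.
 */
static uint64_t monotonic_utime(struct task_sample *t, uint64_t derived)
{
    if (derived > t->prev_utime)
        t->prev_utime = derived;
    return t->prev_utime;
}
```

A reader that always goes through such a helper cannot observe the value going backwards, even when the derived estimate temporarily drops.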
     

03 Sep, 2008

1 commit

  • Quicklists can consume several GB of memory. We should provide a means of
    monitoring this.

    After this patch is applied, /proc/meminfo will output the following:

    % cat /proc/meminfo

    MemTotal: 7715392 kB
    MemFree: 5401600 kB
    Buffers: 80384 kB
    Cached: 300800 kB
    SwapCached: 0 kB
    Active: 235584 kB
    Inactive: 262656 kB
    SwapTotal: 2031488 kB
    SwapFree: 2031488 kB
    Dirty: 3520 kB
    Writeback: 0 kB
    AnonPages: 117696 kB
    Mapped: 38528 kB
    Slab: 1589952 kB
    SReclaimable: 23104 kB
    SUnreclaim: 1566848 kB
    PageTables: 14656 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 5889152 kB
    Committed_AS: 393152 kB
    VmallocTotal: 17592177655808 kB
    VmallocUsed: 29056 kB
    VmallocChunk: 17592177626432 kB
    Quicklists: 130944 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 262144 kB

    Signed-off-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Keiichiro Tokunaga
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
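
To monitor the new field from user space, a small parser over /proc/meminfo-style text is enough. The sketch below assumes the "Key: value kB" layout shown in the sample output; meminfo_field() is a hypothetical helper name, and the caller is expected to have read the file into a NUL-terminated buffer.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Key: value" field in /proc/meminfo-style text.
 * Returns 0 and stores the numeric value on success, -1 if the key
 * is absent or has no number after the colon.
 */
static int meminfo_field(const char *text, const char *key,
                         unsigned long long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* The key must be followed directly by ':' to avoid prefix matches. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}
```

Calling meminfo_field(buf, "Quicklists", &kb) on the sample output above would store 130944 in kb.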
     

25 Aug, 2008

1 commit

  • Ouch, if number taken from IDA is too big, the intent was to signal an
    error, not check for overflow and still do overflowing addition.

    One still needs 2^28 proc entries to notice this.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
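
The intended behaviour (reject the id first, add only afterwards) can be sketched like this. DYNAMIC_FIRST and id_to_inode_number() are illustrative names, and the base value is an assumption chosen so that the limit matches the 2^28 figure mentioned above.

```c
#include <limits.h>

/* Assumed base for dynamically allocated inode numbers (illustrative). */
#define DYNAMIC_FIRST 0xF0000000U

/*
 * Turn an allocated id into an inode number.
 * The range check comes first, so the addition below cannot wrap.
 * Returns 0 to signal "id too big"; 0 is never a valid result here
 * because every valid result is at least DYNAMIC_FIRST.
 */
static unsigned int id_to_inode_number(unsigned int id)
{
    if (id > UINT_MAX - DYNAMIC_FIRST)
        return 0;
    return DYNAMIC_FIRST + id;
}
```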
     

21 Aug, 2008

1 commit

  • This addresses

    http://bugzilla.kernel.org/show_bug.cgi?id=11318

    In function show_map (file: fs/proc/task_mmu.c), if vma->vm_pgoff > 2^20
    then (vma->vm_pgoff << PAGE_SHIFT) is greater than 2^32 (with PAGE_SIZE
    equal to 4096, i.e. 2^12). The following seq_printf uses an unsigned long
    for the conversion of (vma->vm_pgoff << PAGE_SHIFT); as a result, the
    offset value displayed in /proc/self/maps is truncated if the page offset
    is greater than 2^20.

    A test that shows this issue:

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mman.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <string.h>

    #define PAGE_SIZE (getpagesize())

    #if __i386__
    # define U64_STR "%llx"
    #elif __x86_64
    # define U64_STR "%lx"
    #else
    # error "Architecture Unsupported"
    #endif

    int main(int argc, char *argv[])
    {
        int fd;
        char *addr;
        off64_t offset = 0x10000000;
        char *filename = "/dev/zero";

        fd = open(filename, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        offset *= 0x10;
        printf("offset = " U64_STR "\n", offset);

        addr = (char*)mmap64(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd,
                             offset);
        if ((void*)addr == MAP_FAILED) {
            perror("mmap64");
            return 1;
        }

        {
            FILE *fmaps;
            char *line = NULL;
            size_t len = 0;
            ssize_t read;
            size_t filename_len = strlen(filename);

            fmaps = fopen("/proc/self/maps", "r");
            if (!fmaps) {
                perror("fopen");
                return 1;
            }
            while ((read = getline(&line, &len, fmaps)) != -1) {
                if ((read > filename_len + 1)
                    && (strncmp(&line[read - filename_len - 1], filename, filename_len) == 0))
                    printf("%s", line);
            }

            if (line)
                free(line);

            fclose(fmaps);
        }

        close(fd);
        return 0;
    }

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Clement Calmels
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Clement Calmels
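
The truncation itself can be reproduced without any mapping. The sketch below uses fixed-width types in place of a 32-bit unsigned long; SKETCH_PAGE_SHIFT and both function names are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* 4096-byte pages, as in the commit text */

/* Offset computed in a 32-bit type: bits at or above 2^32 are lost. */
static uint32_t offset_32bit(uint32_t pgoff)
{
    return pgoff << SKETCH_PAGE_SHIFT;
}

/* Offset computed after widening to 64 bits: the full value survives. */
static uint64_t offset_64bit(uint32_t pgoff)
{
    return (uint64_t)pgoff << SKETCH_PAGE_SHIFT;
}
```

With a page offset of 0x100000 (2^20), the 32-bit version yields 0 while the widened version yields 0x100000000, which is the offset the test program above maps.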
     

06 Aug, 2008

1 commit

  • proc: fix warnings

    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 5 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 6 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'u64'
    fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 9 has type 'u64'

    Signed-off-by: Alexander Beregalov
    Acked-by: Andrea Righi
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Beregalov
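
Warnings of this kind usually go away once each 64-bit argument is cast to the type the format expects. The following user-space sketch shows the pattern with snprintf(); format_counters() is a hypothetical helper, and whether the actual patch used casts is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format two 64-bit counters with "%llu". A 64-bit integer typedef may
 * be "unsigned long" on some 64-bit targets, so each argument is cast
 * to unsigned long long to match the format exactly.
 */
static int format_counters(char *buf, size_t len,
                           uint64_t rchar, uint64_t wchar)
{
    return snprintf(buf, len, "rchar: %llu\nwchar: %llu\n",
                    (unsigned long long)rchar,
                    (unsigned long long)wchar);
}
```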
     

01 Aug, 2008

2 commits

  • proc doesn't use "associate pointer with id" feature of IDR, so switch
    to IDA.

    NOTE, NOTE, NOTE:
    Do not apply if release_inode_number() still mentions MAX_ID_MASK!

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
     
  • Id which proc gets from IDR for inode number and id which proc removes
    from IDR do not match. E.g. 0x11a transforms into 0x8000011a.

    Which stayed unnoticed for a long time because, surprise, idr_remove()
    masks out that high bit before doing anything.

    All of this due to "| ~MAX_ID_MASK" in release_inode_number().

    I still don't understand how it's supposed to work, because "| ~MASK"
    is not an inversion for "& MAX" operation.

    So, use just one nice, working addition. Make start offset unsigned int,
    while I'm at it. Its longness is not used anywhere.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
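
The mismatch is easy to reproduce in isolation. In the sketch below, ID_MASK is assumed to be a 31-bit mask (consistent with the 0x11a to 0x8000011a example), and START_OFFSET, to_inode() and to_id() are hypothetical names for the "one nice, working addition" and its inverse.

```c
/* Assumed 31-bit id mask, matching the 0x11a -> 0x8000011a example. */
#define ID_MASK      0x7fffffffU
/* Hypothetical start offset, for illustration only. */
#define START_OFFSET 0xF0000000U

/* What "| ~MASK" does: it sets the high bit instead of undoing "& MASK". */
static unsigned int set_high_bit(unsigned int id)
{
    return id | ~ID_MASK;
}

/* An addition and the matching subtraction do round-trip. */
static unsigned int to_inode(unsigned int id)
{
    return START_OFFSET + id;
}

static unsigned int to_id(unsigned int ino)
{
    return ino - START_OFFSET;
}
```

Here set_high_bit(0x11a) gives 0x8000011a, and only a later "& ID_MASK" hides the difference, whereas to_id(to_inode(id)) returns id directly.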
     

28 Jul, 2008

2 commits

  • Simplify the code of include/linux/task_io_accounting.h.

    It is also more reasonable to have all the task i/o-related statistics in a
    single struct (task_io_accounting).

    Signed-off-by: Andrea Righi
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Andrea Righi
     
  • Put all i/o statistics in struct proc_io_accounting and use inline functions to
    initialize and increment statistics, removing a lot of single variable
    assignments.

    This also reduces the kernel size as follows (with CONFIG_TASK_XACCT=y and
    CONFIG_TASK_IO_ACCOUNTING=y):

       text    data     bss     dec     hex filename
      11651       0       0   11651    2d83 kernel/exit.o.before
      11619       0       0   11619    2d63 kernel/exit.o.after
      10886     132     136   11154    2b92 kernel/fork.o.before
      10758     132     136   11026    2b12 kernel/fork.o.after

    3082029  807968 4818600 8708597  84e1f5 vmlinux.o.before
    3081869  807968 4818600 8708437  84e155 vmlinux.o.after

    Signed-off-by: Andrea Righi
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Andrea Righi
     

27 Jul, 2008

8 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits)
    [PATCH] fix RLIM_NOFILE handling
    [PATCH] get rid of corner case in dup3() entirely
    [PATCH] remove remaining namei_{32,64}.h crap
    [PATCH] get rid of indirect users of namei.h
    [PATCH] get rid of __user_path_lookup_open
    [PATCH] f_count may wrap around
    [PATCH] dup3 fix
    [PATCH] don't pass nameidata to __ncp_lookup_validate()
    [PATCH] don't pass nameidata to gfs2_lookupi()
    [PATCH] new (local) helper: user_path_parent()
    [PATCH] sanitize __user_walk_fd() et.al.
    [PATCH] preparation to __user_walk_fd cleanup
    [PATCH] kill nameidata passing to permission(), rename to inode_permission()
    [PATCH] take noexec checks to very few callers that care
    Re: [PATCH 3/6] vfs: open_exec cleanup
    [patch 4/4] vfs: immutable inode checking cleanup
    [patch 3/4] fat: dont call notify_change
    [patch 2/4] vfs: utimes cleanup
    [patch 1/4] vfs: utimes: move owner check into inode_change_ok()
    [PATCH] vfs: use kstrdup() and check failing allocation
    ...

    Linus Torvalds
     
  • Oleg Nesterov points out that we should check that the task is still alive
    before we iterate over the threads. This patch includes a fixup for this.

    Also simplify do_io_accounting() implementation.

    Signed-off-by: Andrea Righi
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Righi
     
  • * kill nameidata * argument; map the 3 bits in ->flags anybody cares
    about to new MAY_... ones and pass with the mask.
    * kill redundant gfs2_iop_permission()
    * sanitize ecryptfs_permission()
    * fix remaining places where ->permission() instances might barf on new
    MAY_... found in mask.

    The obvious next target in that direction is permission(9)

    folded fix for nfs_permission() breakage from Miklos Szeredi

    Signed-off-by: Al Viro

    Al Viro
     
  • * keep references to ctl_table_head and ctl_table in /proc/sys inodes
    * grab the former during operations, use the latter for access to
    entry if that succeeds
    * have ->d_compare() check if table should be seen for one who does lookup;
    that allows us to avoid flipping inodes - if we have the same name resolve
    to different things, we'll just keep several dentries and ->d_compare()
    will reject the wrong ones.
    * have ->lookup() and ->readdir() scan the table of our inode first, then
    walk all ctl_table_header and scan ->attached_by for those that are
    attached to our directory.
    * implement ->getattr().
    * get rid of insane amounts of tree-walking
    * get rid of the need to know dentry in ->permission() and of the contortions
    induced by that.

    Signed-off-by: Al Viro

    Al Viro
     
  • This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files.
    These use task_current_syscall() to show the task's current system call
    number and argument registers, stack pointer and PC. For a task blocked
    but not in a syscall, the file shows "-1" in place of the syscall number,
    followed by only the SP and PC. For a task that's not blocked, it shows
    "running".

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
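
A reader of these files has three shapes of line to handle. The parser below is a sketch under the assumption that registers are printed as hexadecimal numbers separated by spaces and that six argument registers are shown; struct syscall_info and parse_syscall_line() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

struct syscall_info {
    int running;            /* 1 if the task was not blocked */
    long nr;                /* syscall number, or -1 if blocked outside a syscall */
    unsigned long args[6];  /* only meaningful when nr >= 0 */
    unsigned long sp, pc;
};

/* Parse one line of the syscall file. Returns 0 on success, -1 otherwise. */
static int parse_syscall_line(const char *line, struct syscall_info *out)
{
    memset(out, 0, sizeof(*out));

    if (strncmp(line, "running", 7) == 0) {
        out->running = 1;
        return 0;
    }
    if (sscanf(line, "%ld", &out->nr) != 1)
        return -1;

    /* Blocked but not in a syscall: "-1 SP PC". */
    if (out->nr == -1)
        return sscanf(line, "%*d %lx %lx", &out->sp, &out->pc) == 2 ? 0 : -1;

    /* In a syscall: number, six argument registers, SP, PC. */
    return sscanf(line, "%*d %lx %lx %lx %lx %lx %lx %lx %lx",
                  &out->args[0], &out->args[1], &out->args[2],
                  &out->args[3], &out->args[4], &out->args[5],
                  &out->sp, &out->pc) == 8 ? 0 : -1;
}
```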
     
  • This adds the tracehook_tracer_task() hook to consolidate all forms of
    "Who is using ptrace on me?" logic. This is used for "TracerPid:" in
    /proc and for permission checks. We also clean up the selinux code that
    called an identical accessor.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Use WARN() instead of a printk+WARN_ON() pair; this way the message
    becomes part of the warning section for better reporting/collection.
    This way, the entire if() {} section can collapse into the WARN() as well.

    Signed-off-by: Arjan van de Ven
    Acked-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexers. Nobody uses this "feature", nor does anybody use
    the passed kmem cache in a non-trivial way, so pass only a pointer to the
    object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

6 commits

  • Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate
    parent I/O statistics in /proc/pid/io. This approach follows the same
    model used to account per-process and per-thread CPU times.

    As a practical application, this makes it possible, for example, to quickly
    find the top I/O consumer when a process spawns many child threads that
    perform the actual I/O work, because the aggregated I/O statistics can
    always be found in /proc/pid/io.

    [ Oleg Nesterov points out that we should check that the task is still
    alive before we iterate over the threads, but also says that we can do
    that fixup on top of this later. - Linus ]

    Acked-by: Balbir Singh
    Signed-off-by: Andrea Righi
    Cc: Matt Heaton
    Cc: Shailabh Nagar
    Acked-by-with-comments: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Righi
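
The aggregation model described above amounts to summing each thread's counters into one per-process total. A minimal sketch follows, assuming a reduced set of counters; struct io_counters and both helpers are hypothetical and not the kernel's task_io_accounting code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-task I/O counters. */
struct io_counters {
    uint64_t read_bytes;
    uint64_t write_bytes;
};

/* Accumulate one task's counters into a running total. */
static void io_counters_add(struct io_counters *dst,
                            const struct io_counters *src)
{
    dst->read_bytes += src->read_bytes;
    dst->write_bytes += src->write_bytes;
}

/* Sum the counters of every thread, as a per-process view would report. */
static struct io_counters io_counters_sum(const struct io_counters *threads,
                                          size_t n)
{
    struct io_counters total = { 0, 0 };

    for (size_t i = 0; i < n; i++)
        io_counters_add(&total, &threads[i]);
    return total;
}
```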
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • MS_RMT_MASK will unmask changes in do_remount_sb() anyway.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Current two-stage scheme of removing PDE emphasizes one bug in proc:

    open
    rmmod
    remove_proc_entry
    close

    ->release won't be called because ->proc_fops were cleared. In simple
    cases it's a small memory leak.

    For every ->open, ->release has to be done. A list of openers is
    introduced, which is traversed at remove_proc_entry() if needed.

    Discussions with Al long ago (sigh).

    Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
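
The "list of openers" idea can be sketched in user space as below. All names (struct opener, struct entry, entry_open(), entry_close(), entry_remove(), count_release()) are hypothetical, locking is omitted, and the real proc code may differ in every detail; the point is only that removal walks the list so that each open is matched by exactly one release.

```c
#include <stdlib.h>

struct opener {
    void *file;              /* hypothetical handle for one open() */
    struct opener *next;
};

struct entry {
    void (*release)(void *file);  /* counterpart of ->release */
    struct opener *openers;       /* every open() not yet released */
};

/* Demo release hook: just counts how often it was called. */
static int release_count;
static void count_release(void *file)
{
    (void)file;
    release_count++;
}

/* Record an open so a later removal can still release it. */
static int entry_open(struct entry *e, void *file)
{
    struct opener *o = malloc(sizeof(*o));

    if (!o)
        return -1;
    o->file = file;
    o->next = e->openers;
    e->openers = o;
    return 0;
}

/* Normal close: release and unlink; a no-op if removal got there first. */
static void entry_close(struct entry *e, void *file)
{
    for (struct opener **p = &e->openers; *p; p = &(*p)->next) {
        if ((*p)->file == file) {
            struct opener *o = *p;

            *p = o->next;
            e->release(file);
            free(o);
            return;
        }
    }
}

/* Removal: release every opener that has not closed yet. */
static void entry_remove(struct entry *e)
{
    while (e->openers) {
        struct opener *o = e->openers;

        e->openers = o->next;
        e->release(o->file);
        free(o);
    }
}
```

In the open, rmmod, remove_proc_entry, close sequence above, entry_remove() performs the release, and the late entry_close() finds nothing left to do.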
     
  • This patch moves the extern of struct proc_kmsg_operations to
    fs/proc/internal.h and adds an #include "internal.h" to fs/proc/kmsg.c
    so that the latter sees the former.

    Signed-off-by: Adrian Bunk
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • ELF_CORE_EFLAGS is already used by the binfmt_elf coredumper to set correct
    arch specific ELF header flags on coredumps. Use it for kcore dumps as well.
    At the moment, this affects the CRIS and the H8300 arch.

    Signed-off-by: Edgar E. Iglesias
    Cc: Mikael Starvik
    Cc: Yoshinori Sato
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Edgar E. Iglesias
     

25 Jul, 2008

2 commits

  • Christoph recently added /proc/vmallocinfo file to get information about
    vmalloc allocations.

    This patch adds NUMA specific information, giving number of pages
    allocated on each memory node.

    This should help to check that vmalloc() is able to respect NUMA policies.

    Example of output on a four-node machine (one cpu per node):

    1) network hash tables are evenly spread over the four nodes (OK) (the
    same holds for the inode and dentry hash tables)

    2) iptables tables (x_tables) are correctly allocated on each cpu node
    (OK).

    3) sys_swapon() allocates its memory from one node only.

    4) each loaded module is using memory on one node.

    Sysadmins could tune their setup to change points 3) and 4) if necessary.

    grep "pages=" /proc/vmallocinfo
    0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
    0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
    0xffffc2000031a000-0xffffc2000031d000 12288 alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1
    0xffffc2000031f000-0xffffc2000032b000 49152 cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
    0xffffc2000033e000-0xffffc20000341000 12288 sys_swapon+0x640/0xac0 pages=2 vmalloc N0=2
    0xffffc20000341000-0xffffc20000344000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2
    0xffffc20000344000-0xffffc20000347000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2
    0xffffc20000347000-0xffffc2000034a000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2
    0xffffc2000034a000-0xffffc2000034d000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2
    0xffffc20004381000-0xffffc20004402000 528384 alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 N3=32
    0xffffc20004402000-0xffffc20004803000 4198400 alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 N1=256 N2=256 N3=256
    0xffffc20004803000-0xffffc20004904000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
    0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 pages=743 vmalloc vpages N0=743
    0xffffffffa0000000-0xffffffffa000f000 61440 sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
    0xffffffffa000f000-0xffffffffa0014000 20480 sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
    0xffffffffa0014000-0xffffffffa0017000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
    0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
    0xffffffffa0022000-0xffffffffa0028000 24576 sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5
    0xffffffffa0028000-0xffffffffa0050000 163840 sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39
    0xffffffffa0050000-0xffffffffa0052000 8192 sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1
    0xffffffffa0052000-0xffffffffa0056000 16384 sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3
    0xffffffffa0056000-0xffffffffa0081000 176128 sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42
    0xffffffffa0081000-0xffffffffa00ae000 184320 sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44
    0xffffffffa00ae000-0xffffffffa00b1000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
    0xffffffffa00b1000-0xffffffffa00b9000 32768 sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7
    0xffffffffa00b9000-0xffffffffa00c4000 45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10
    0xffffffffa00c6000-0xffffffffa00e0000 106496 sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25
    0xffffffffa00e0000-0xffffffffa00f1000 69632 sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16
    0xffffffffa00f1000-0xffffffffa00f4000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
    0xffffffffa00f4000-0xffffffffa00f7000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2

    [akpm@linux-foundation.org: fix comment]
    Signed-off-by: Eric Dumazet
    Cc: Christoph Lameter
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
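
To check points like 3) and 4) across a whole system, the N<node>=<pages> tokens can be summed per node. The sketch below assumes the line layout shown in the example output; MAX_NODES and add_node_pages() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64   /* arbitrary bound for this sketch */

/*
 * Add the "N<node>=<pages>" tokens of one /proc/vmallocinfo line to
 * per_node[]. Returns the number of tokens found on the line.
 */
static int add_node_pages(const char *line, unsigned long per_node[MAX_NODES])
{
    int found = 0;
    const char *p = line;

    while ((p = strstr(p, " N")) != NULL) {
        unsigned int node;
        unsigned long pages;

        p++;  /* skip the space so the next search moves forward */
        if (sscanf(p, "N%u=%lu", &node, &pages) == 2 && node < MAX_NODES) {
            per_node[node] += pages;
            found++;
        }
    }
    return found;
}
```

Feeding every line of the file through add_node_pages() with a zeroed array gives the number of vmalloc pages resident on each node.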
     
  • This patch adds proper extern declarations for five variables in
    include/linux/vmstat.h

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

23 Jul, 2008

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
    ipw2200: Call netif_*_queue() interfaces properly.
    netxen: Needs to include linux/vmalloc.h
    [netdrvr] atl1d: fix !CONFIG_PM build
    r6040: rework init_one error handling
    r6040: bump release number to 0.18
    r6040: handle RX fifo full and no descriptor interrupts
    r6040: change the default waiting time
    r6040: use definitions for magic values in descriptor status
    r6040: completely rework the RX path
    r6040: call napi_disable when puting down the interface and set lp->dev accordingly.
    mv643xx_eth: fix NETPOLL build
    r6040: rework the RX buffers allocation routine
    r6040: fix scheduling while atomic in r6040_tx_timeout
    r6040: fix null pointer access and tx timeouts
    r6040: prefix all functions with r6040
    rndis_host: support WM6 devices as modems
    at91_ether: use netstats in net_device structure
    sfc: Create one RX queue and interrupt per CPU package by default
    sfc: Use a separate workqueue for resets
    sfc: I2C adapter initialisation fixes
    ...

    Linus Torvalds
     
  • get_proc_net() can now become static.

    Signed-off-by: Adrian Bunk
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • struct pagemap_walk was placed on stack, some hooks are initialized, the
    rest (->pgd_entry, ->pud_entry, ->pte_entry) are valid but junk.

    Reported-by: Eric Sesterhenn
    Signed-off-by: Alexey Dobriyan
    Cc: "Vegard Nossum"
    Cc: [2.6.25.x, 2.6.26.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
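
In C, one way to avoid junk in the unused hooks of a stack-allocated structure is an initializer: members that are not named are zero-initialized. The sketch below uses a hypothetical struct walker rather than the kernel's walk structure, and it is an assumption that the actual fix took this form.

```c
#include <stddef.h>

/* Hypothetical walker with several optional hooks. */
struct walker {
    int (*pgd_entry)(void *priv);
    int (*pud_entry)(void *priv);
    int (*pmd_entry)(void *priv);
    int (*pte_entry)(void *priv);
    void *priv;
};

/* Example hook: counts how many times it was invoked. */
static int count_pmd(void *priv)
{
    ++*(int *)priv;
    return 0;
}

/* Build a walker on the stack; members not named are zero-initialized. */
static struct walker make_walker(int *counter)
{
    struct walker w = {
        .pmd_entry = count_pmd,
        .priv = counter,
    };

    return w;
}
```

Code that tests a hook for NULL before calling it then skips the unset ones instead of jumping through stale stack contents.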
     

21 Jul, 2008

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (1232 commits)
    iucv: Fix bad merging.
    net_sched: Add size table for qdiscs
    net_sched: Add accessor function for packet length for qdiscs
    net_sched: Add qdisc_enqueue wrapper
    highmem: Export totalhigh_pages.
    ipv6 mcast: Omit redundant address family checks in ip6_mc_source().
    net: Use standard structures for generic socket address structures.
    ipv6 netns: Make several "global" sysctl variables namespace aware.
    netns: Use net_eq() to compare net-namespaces for optimization.
    ipv6: remove unused macros from net/ipv6.h
    ipv6: remove unused parameter from ip6_ra_control
    tcp: fix kernel panic with listening_get_next
    tcp: Remove redundant checks when setting eff_sacks
    tcp: options clean up
    tcp: Fix MD5 signatures for non-linear skbs
    sctp: Update sctp global memory limit allocations.
    sctp: remove unnecessary byteshifting, calculate directly in big-endian
    sctp: Allow only 1 listening socket with SO_REUSEADDR
    sctp: Do not leak memory on multiple listen() calls
    sctp: Support ipv6only AF_INET6 sockets.
    ...

    Linus Torvalds
     
  • Move the line disciplines towards a conventional ->ops arrangement. For
    the moment the actual 'tty_ldisc' struct in the tty is kept as part of
    the tty struct but this can then be changed if it turns out that when it
    all settles down we want to refcount ldiscs separately to the tty.

    Pull the ldisc code out of /proc and put it with our ldisc code.

    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Alan Cox
     

18 Jul, 2008

2 commits


15 Jul, 2008

1 commit

  • * 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (821 commits)
    x86: make 64bit hpet_set_mapping to use ioremap too, v2
    x86: get x86_phys_bits early
    x86: max_low_pfn_mapped fix #4
    x86: change _node_to_cpumask_ptr to return const ptr
    x86: I/O APIC: remove an IRQ2-mask hack
    x86: fix numaq_tsc_disable calling
    x86, e820: remove end_user_pfn
    x86: max_low_pfn_mapped fix, #3
    x86: max_low_pfn_mapped fix, #2
    x86: max_low_pfn_mapped fix, #1
    x86_64: fix delayed signals
    x86: remove conflicting nx6325 and nx6125 quirks
    x86: Recover timer_ack lost in the merge of the NMI watchdog
    x86: I/O APIC: Never configure IRQ2
    x86: L-APIC: Always fully configure IRQ0
    x86: L-APIC: Set IRQ0 as edge-triggered
    x86: merge dwarf2 headers
    x86: use AS_CFI instead of UNWIND_INFO
    x86: use ignore macro instead of hash comment
    x86: use matching CFI_ENDPROC
    ...

    Linus Torvalds
     

14 Jul, 2008

1 commit

  • Enable security modules to distinguish reading of process state via
    proc from full ptrace access by renaming ptrace_may_attach to
    ptrace_may_access and adding a mode argument indicating whether only
    read access or full attach access is requested. This allows security
    modules to permit access to reading process state without granting
    full ptrace access. The base DAC/capability checking remains unchanged.

    Read access to /proc/pid/mem continues to apply a full ptrace attach
    check since check_mem_permission() already requires the current task
    to already be ptracing the target. The other ptrace checks within
    proc for elements like environ, maps, and fds are changed to pass the
    read mode instead of attach.

    In the SELinux case, we model such reading of process state as a
    reading of a proc file labeled with the target process' label. This
    enables SELinux policy to permit such reading of process state without
    permitting control or manipulation of the target process, as there are
    a number of cases where programs probe for such information via proc
    but do not need to be able to control the target (e.g. procps,
    lsof, PolicyKit, ConsoleKit). At present we have to choose between
    allowing full ptrace in policy (more permissive than required/desired)
    or breaking functionality (or in some cases just silencing the denials
    via dontaudit rules but this can hide genuine attacks).

    This version of the patch incorporates comments from Casey Schaufler
    (change/replace existing ptrace_may_attach interface, pass access
    mode), and Chris Wright (provide greater consistency in the checking).

    Note that like their predecessors __ptrace_may_attach and
    ptrace_may_attach, the __ptrace_may_access and ptrace_may_access
    interfaces use different return value conventions from each other (0
    or -errno vs. 1 or 0). I retained this difference to avoid any
    changes to the caller logic but made the difference clearer by
    changing the latter interface to return a bool rather than an int and
    by adding a comment about it to ptrace.h for any future callers.

    Signed-off-by: Stephen Smalley
    Acked-by: Chris Wright
    Signed-off-by: James Morris

    Stephen Smalley
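
The two ideas in this message, a mode argument and a pair of interfaces with different return conventions, can be sketched as follows. MODE_READ, MODE_ATTACH, struct policy, check_access() and may_access() are hypothetical stand-ins; the errno values are illustrative and not taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Mode values for this sketch; the distinction mirrors the commit text. */
#define MODE_READ   1
#define MODE_ATTACH 2

/* Hypothetical policy: which modes a security module grants. */
struct policy {
    bool allow_read;
    bool allow_attach;
};

/* Lower-level check: 0 on success, -errno on failure. */
static int check_access(const struct policy *p, int mode)
{
    if (mode == MODE_READ)
        return p->allow_read ? 0 : -EACCES;
    if (mode == MODE_ATTACH)
        return p->allow_attach ? 0 : -EPERM;
    return -EINVAL;
}

/* Wrapper with the opposite convention: true means access is allowed. */
static bool may_access(const struct policy *p, int mode)
{
    return check_access(p, mode) == 0;
}
```

Returning bool from the wrapper makes the "1 or 0" convention visible in the type, which is the clarification the last paragraph of the message describes.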
     

08 Jul, 2008

2 commits

  • …', 'x86/cleanups', 'x86/cpa', 'x86/cpu', 'x86/defconfig', 'x86/gart', 'x86/i8259', 'x86/intel', 'x86/irqstats', 'x86/kconfig', 'x86/ldt', 'x86/mce', 'x86/memtest', 'x86/pat', 'x86/ptemask', 'x86/resumetrace', 'x86/threadinfo', 'x86/timers', 'x86/vdso' and 'x86/xen' into x86/devel

    Ingo Molnar
     
  • Add information about the mapping state of the direct mapping to
    /proc/meminfo. I chose /proc/meminfo because that is where all the other
    memory statistics are too and it is a generally useful metric even
    outside debugging situations. A lot of split kernel pages means the
    kernel will run slower.

    This way we can see how many large pages are really used for it and how
    many are split.

    Useful for general insight into the kernel.

    v2: Add hotplug locking to 64bit to plug a very obscure theoretical race.
    32bit doesn't need it because it doesn't support hotadd for lowmem.
    Fix some typos
    v3: Rename dpages_cnt
    Add CONFIG ifdef for count update as requested by tglx
    Expand description
    v4: Fix stupid bugs added in v3
    Move update_page_count to pageattr.c

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Andi Kleen
     

06 Jul, 2008

2 commits

  • Fix some issues in pagemap_read noted by Alexey:

    - initialize pagemap_walk.mm to "mm" , so the code starts working as
    advertised

    - initialize ->private to "&pm" so it wouldn't immediately oops in
    pagemap_pte_hole()

    - unstatic struct pagemap_walk, so two threads won't fsckup each other
    (including those started by root, including flipping ->mm when you don't
    have permissions)

    - pagemap_read() contains two calls to ptrace_may_attach(), second one
    looks unneeded.

    - avoid possible kmalloc(0) and integer wraparound.

    Cc: Alexey Dobriyan
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    [ Personally, I'd just remove the functionality entirely - Linus ]
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Don't use a static entry, so as to prevent races during concurrent use
    of this function.

    Reported-by: Alexey Dobriyan
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

16 Jun, 2008

1 commit


13 Jun, 2008

1 commit

  • We were walking right into huge page areas in the pagemap walker, and
    calling the pmds pmd_bad() and clearing them.

    That leaked huge pages. Bad.

    This patch at least works around that for now. It ignores huge pages in
    the pagemap walker for the time being, and won't leak those pages.

    Signed-off-by: Dave Hansen
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen