17 Oct, 2007

12 commits

  • Introduce architecture dependent kretprobe blacklists to prohibit users
    from inserting return probes on the function in which kprobes can be
    inserted but kretprobes can not.

    This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
    registers "__switch_to" to the blacklist on x86-64, because that mark is to
    prohibit user from inserting only kretprobe.

    Signed-off-by: Masami Hiramatsu
    Cc: Prasanna S Panchamukhi
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • Remove the cpuset hooks that defined sched domains depending on the setting
    of the 'cpu_exclusive' flag.

    The cpu_exclusive flag can only be set on a child if it is set on the
    parent.

    This made that flag painfully unsuitable for use as a flag defining a
    partitioning of a system.

    It was entirely unobvious to a cpuset user what partitioning of sched
    domains they would be causing when they set that one cpu_exclusive bit on
    one cpuset, because it depended on what CPUs were in the remainder of that
    cpusets siblings and child cpusets, after subtracting out other
    cpu_exclusive cpusets.

    Furthermore, there was no way on production systems to query the
    result.

    Using the cpu_exclusive flag for this was simply wrong from the get go.

    Fortunately, it was sufficiently borked that so far as I know, almost no
    successful use has been made of this. One real time group did use it to
    affectively isolate CPUs from any load balancing efforts. They are willing
    to adapt to alternative mechanisms for this, such as someway to manipulate
    the list of isolated CPUs on a running system. They can do without this
    present cpu_exclusive based mechanism while we develop an alternative.

    There is a real risk, to the best of my understanding, of users
    accidentally setting up a partitioned scheduler domains, inhibiting desired
    load balancing across all their CPUs, due to the nonobvious (from the
    cpuset perspective) side affects of the cpu_exclusive flag.

    Furthermore, since there was no way on a running system to see what one was
    doing with sched domains, this change will be invisible to any using code.
    Unless they have real insight to the scheduler load balancing choices, they
    will be unable to detect that this change has been made in the kernel's
    behaviour.

    Initial discussion on lkml of this patch has generated much comment. My
    (probably controversial) take on that discussion is that it has reached a
    rough concensus that the current cpuset cpu_exclusive mechanism for
    defining sched domains is borked. There is no concensus on the
    replacement. But since we can remove this mechanism, and since its
    continued presence risks causing unwanted partitioning of the schedulers
    load balancing, we should remove it while we can, as we proceed to work the
    replacement scheduler domain mechanisms.

    Signed-off-by: Paul Jackson
    Cc: Ingo Molnar
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Dinakar Guniguntala
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Convert m32r to the generic sys_ptrace. The conversion requires an
    architecture hook after ptrace_attach which this patch adds. The hook
    will also be needed for a conersion of ia64 to the generic ptrace code.

    Thanks to Hirokazu Takata for fixing a bug in the first version of this
    code.

    Signed-off-by: Christoph Hellwig
    Cc: Hirokazu Takata
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The maximum size of the huge page pool can be controlled using the overall
    size of the hugetlb filesystem (via its 'size' mount option). However in the
    common case the this will not be set as the pool is traditionally fixed in
    size at boot time. In order to maintain the expected semantics, we need to
    prevent the pool expanding by default.

    This patch introduces a new sysctl controlling dynamic pool resizing. When
    this is enabled the pool will expand beyond its base size up to the size of
    the hugetlb filesystem. It is disabled by default.

    Signed-off-by: Adam Litke
    Acked-by: Andy Whitcroft
    Acked-by: Dave McCracken
    Cc: William Irwin
    Cc: David Gibson
    Cc: Ken Chen
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • A clean up patch for "scanning memory resource [start, end)" operation.

    Now, find_next_system_ram() function is used in memory hotplug, but this
    interface is not easy to use and codes are complicated.

    This patch adds walk_memory_resouce(start,len,arg,func) function.
    The function 'func' is called per valid memory resouce range in [start,pfn).

    [pbadari@us.ibm.com: Error handling in walk_memory_resource()]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch marks a number of allocations that are either short-lived such as
    network buffers or are reclaimable such as inode allocations. When something
    like updatedb is called, long-lived and unmovable kernel allocations tend to
    be spread throughout the address space which increases fragmentation.

    This patch groups these allocations together as much as possible by adding a
    new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
    reclaimed on demand, but not moved. i.e. they can be migrated by deleting
    them and re-reading the information from elsewhere.

    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • cpusets try to ensure that any node added to a cpuset's mems_allowed is
    on-line and contains memory. The assumption was that online nodes contained
    memory. Thus, it is possible to add memoryless nodes to a cpuset and then add
    tasks to this cpuset. This results in continuous series of oom-kill and
    apparent system hang.

    Change cpusets to use node_states[N_HIGH_MEMORY] [a.k.a. node_memory_map] in
    place of node_online_map when vetting memories. Return error if admin
    attempts to write a non-empty mems_allowed node mask containing only
    memoryless-nodes.

    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Bob Picco
    Signed-off-by: Nishanth Aravamudan
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Processors on memoryless nodes must be able to fall back to remote nodes in
    order to get a profiling buffer. This may lead to excessive NUMA traffic but
    I think we should allow this rather than failing.

    Signed-off-by: Christoph Lameter
    Acked-by: Nishanth Aravamudan
    Acked-by: Lee Schermerhorn
    Acked-by: Bob Picco
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • x86(-64) are the last architectures still using the page fault notifier
    cruft for the kprobes page fault hook. This patch converts them to the
    proper direct calls, and removes the now unused pagefault notifier bits
    aswell as the cruft in kprobes.c that was related to this mess.

    I know Andi didn't really like this, but all other architecture maintainers
    agreed the direct calls are much better and besides the obvious cruft
    removal a common way of dealing with kprobes across architectures is
    important aswell.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix sparc64]
    Signed-off-by: Christoph Hellwig
    Cc: Andi Kleen
    Cc:
    Cc: Prasanna S Panchamukhi
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
    variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
    from startup and CPU HOTPLUG functions.

    Signed-off-by: Mike Travis
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: "Siddha, Suresh B"
    Cc: "David S. Miller"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Travis
     
  • Optionally add a boot delay after each kernel printk() call, crudely
    measured in milliseconds, with a maximum delay of 10 seconds per printk.

    Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.):
    "lpj=loops_per_jiffy boot_delay=100"
    to the kernel command line.

    It has been useful in cases like "during boot, my machine just reboots or the
    screen goes black" by slowing down printk, (and adding initcall_debug), we can
    usually see the last thing that happened before the lights went out which is
    usually a valuable clue.

    [akpm@linux-foundation.org: not all architectures implement CONFIG_HZ]
    [akpm@linux-foundation.org: fix lots of stuff]
    [bunk@stusta.de: kernel/printk.c: make 2 variables static]
    [heiko.carstens@de.ibm.com: fix slow down printk on boot compile error]
    Signed-off-by: Randy Dunlap
    Signed-off-by: Dave Jones
    Signed-off-by: Adrian Bunk
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Identical handlers of PTRACE_DETACH go into ptrace_request().
    Not touching compat code.
    Not touching archs that don't call ptrace_request.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

16 Oct, 2007

2 commits

  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
    NFSv4: Fix a typo in nfs_inode_reclaim_delegation
    NFS: Add a boot parameter to disable 64 bit inode numbers
    NFS: nfs_refresh_inode should clear cache_validity flags on success
    NFS: Fix a connectathon regression in NFSv3 and NFSv4
    NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
    SUNRPC: Don't call xprt_release in call refresh
    SUNRPC: Don't call xprt_release() if call_allocate fails
    SUNRPC: Fix buggy UDP transmission
    [23/37] Clean up duplicate includes in
    [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
    SUNRPC: Use correct type in buffer length calculations
    SUNRPC: Fix default hostname created in rpc_create()
    nfs: add server port to rpc_pipe info file
    NFS: Get rid of some obsolete macros
    NFS: Simplify filehandle revalidation
    NFS: Ensure that nfs_link() returns a hashed dentry
    NFS: Be strict about dentry revalidation when doing exclusive create
    NFS: Don't zap the readdir caches upon error
    NFS: Remove the redundant nfs_reval_fsid()
    NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
    ...

    Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
    net/sunrpc/xprtsock.c

    Linus Torvalds
     
  • …peterz/linux-2.6-lockdep

    * 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
    lockdep: annotate dir vs file i_mutex
    lockdep: per filesystem inode lock class
    lockdep: annotate kprobes irq fiddling
    lockdep: annotate rcu_read_{,un}lock{,_bh}
    lockdep: annotate journal_start()
    lockdep: s390: connect the sysexit hook
    lockdep: x86_64: connect the sysexit hook
    lockdep: i386: connect the sysexit hook
    lockdep: syscall exit check
    lockdep: fixup mutex annotations
    lockdep: fix mismatched lockdep_depth/curr_chain_hash
    lockdep: Avoid /proc/lockdep & lock_stat infinite output
    lockdep: maintainers

    Linus Torvalds
     

15 Oct, 2007

26 commits