23 Jun, 2006

1 commit

  • This patch adds panic_on_oom sysctl under sys.vm.

    When sysctl vm.panic_on_oom = 1, the kernel panics instead of killing rogue
    processes. And if vm.panic_on_oom is 0 the kernel will do oom_kill() in
    the same way as it does today. Of course, the default value is 0 and only
    root can modify it.
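
    As an illustration only (this is not part of the patch), a dotted sysctl
    name such as vm.panic_on_oom corresponds to a file under /proc/sys. The
    helper names sysctl_path and read_sysctl below are hypothetical, and the
    sketch assumes no name component itself contains a dot.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")  # conventional location of the sysctl tree


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name such as 'vm.panic_on_oom' to its file path."""
    return root.joinpath(*name.split("."))


def read_sysctl(name: str, root: Path = PROC_SYS) -> str:
    """Return the current value as text; raises OSError if the entry is absent."""
    return sysctl_path(name, root).read_text().strip()
```

    On a kernel carrying this patch, read_sysctl("vm.panic_on_oom") would be
    expected to return "0" unless root has changed it.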

    In general, oom_killer works well and kills rogue processes, so the whole
    system can survive. But there are environments where a panic is preferable
    to killing some processes.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

18 Jun, 2006

3 commits


21 Mar, 2006

10 commits


01 Mar, 2006

1 commit

  • Allow sysadmin to disable all warnings about userland apps
    making unaligned accesses by using:
    # echo 1 > /proc/sys/kernel/ignore-unaligned-usertrap
    rather than having to use prctl on a process-by-process basis.

    Default behaviour leaves the warnings enabled.

    Signed-off-by: Jes Sorensen
    Signed-off-by: Tony Luck

    Jes Sorensen
     

21 Feb, 2006

1 commit

  • Currently, acpi video options can only be set on the kernel command line.
    That's a little inflexible; I'd like a userland s2ram application that just
    works, and modifying the kernel command line according to a whitelist is not
    fun. It is better to just allow the s2ram application to set video options
    just before suspend (according to the whitelist).

    This implements a sysctl to allow setting suspend video options without reboot.

    (akpm: Documentation updates for this new sysctl are pending..)

    Signed-off-by: Pavel Machek
    Cc: "Brown, Len"
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     

02 Feb, 2006

1 commit

  • Currently the zone_reclaim code has a fixed window of 30 seconds of off node
    allocations should a local zone have no unused pagecache pages left. Reclaim
    will be attempted again after this timeout period to avoid repeated useless
    scans for memory. This is also useful to establish sufficiently large off
    node allocation chunks to relieve the local node.

    It may be beneficial to adjust that time period for some special situations.
    For example, if memory use was exceeding node capacity one may want to give up
    for longer periods of time. If memory spikes intermittently then one may want
    to shorten the time period to reduce the number of off node allocations.

    This patch allows just that.
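
    The timeout behaviour described above can be modelled roughly as follows.
    This is a sketch of the idea, not the kernel code: the class
    ReclaimThrottle and its method names are hypothetical, and only the
    30 second figure comes from the description.

```python
import time


class ReclaimThrottle:
    """Skip reclaim attempts for `interval` seconds after an unsuccessful scan."""

    def __init__(self, interval=30.0, clock=time.monotonic):
        self.interval = interval    # the window this patch makes adjustable
        self.clock = clock          # injectable so the model can be tested
        self.last_failure = None    # time of the last scan that freed nothing

    def should_try_reclaim(self):
        """True if no failure is recorded or the window has elapsed."""
        if self.last_failure is None:
            return True
        return self.clock() - self.last_failure >= self.interval

    def record_failure(self):
        """Note that a scan found nothing; allocations go off node meanwhile."""
        self.last_failure = self.clock()
```

    A longer interval gives up on local reclaim for longer; a shorter one
    retries sooner and so reduces the number of off node allocations.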

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

19 Jan, 2006

1 commit

  • proc support for zone reclaim

    This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
    used to override the automatic determination of the zone reclaim made on
    bootup.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 Jan, 2006

2 commits

  • Recently there has been a lot of traffic on the right values for batch and
    high water marks for per_cpu_pagelists. This patch makes these two
    variables configurable through the /proc interface.

    A new tunable /proc/sys/vm/percpu_pagelist_fraction is added. This entry
    controls the fraction of pages at most in each zone that are allocated for
    each per cpu page list. The min value for this is 8. It means that we
    don't allow more than 1/8th of pages in each zone to be allocated in any
    single per_cpu_pagelist.

    The batch value of each per cpu pagelist is also updated as a result. It
    is set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8).
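
    The arithmetic above works out roughly as in the sketch below. The
    function name pcp_limits is hypothetical, PAGE_SHIFT = 12 assumes 4 KiB
    pages, and the floor of 1 on batch is an assumption that is not stated
    in this description.

```python
PAGE_SHIFT = 12    # assumption: 4 KiB pages
MIN_FRACTION = 8   # smallest fraction value the tunable accepts


def pcp_limits(zone_pages, fraction, page_shift=PAGE_SHIFT):
    """Return (high, batch) for one per cpu page list of a zone."""
    if fraction < MIN_FRACTION:
        raise ValueError("percpu_pagelist_fraction must be at least 8")
    high = zone_pages // fraction          # at most 1/fraction of the zone
    batch = max(1, high // 4)              # assumption: never let batch hit 0
    batch = min(batch, page_shift * 8)     # upper limit of batch
    return high, batch
```

    For a zone of 1,000,000 pages and a fraction of 8, high is 125000 and
    batch is capped at 96.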

    Signed-off-by: Rohit Seth
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rohit Seth
     
  • Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
    discard as much pagecache and/or reclaimable slab objects as it can. This
    operation requires root permissions.

    It won't drop dirty data, so the user should run `sync' first.

    Caveats:

    a) Holds inode_lock for exorbitant amounts of time.

    b) Needs to be taught about NUMA nodes: propagate these all the way through
    so the discarding can be controlled on a per-node basis.

    This is a debugging feature: useful for getting consistent results between
    filesystem benchmarks. We could possibly put it under a config option, but
    it's less than 300 bytes.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

05 Jan, 2006

2 commits


04 Jan, 2006

1 commit

  • Another spin of Herbert Xu's "safer ip reassembly" patch
    for 2.6.16.

    (The original patch is here:
    http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
    and my only contribution is to have tested it.)

    This patch (optionally) does additional checks before accepting IP
    fragments, which can greatly reduce the possibility of reassembling
    fragments which originated from different IP datagrams.

    Signed-off-by: Herbert Xu
    Signed-off-by: Arthur Kepner
    Signed-off-by: David S. Miller

    Herbert Xu
     

06 Dec, 2005

1 commit


16 Nov, 2005

1 commit


12 Nov, 2005

1 commit


11 Nov, 2005

1 commit

  • This is an updated version of the RFC3465 ABC patch originally
    for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
    bytes ack'd rather than packets when updating congestion control.

    The original ABC described in the RFC applied to a Reno style
    algorithm. For advanced congestion control there is little
    change after leaving slow start.
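
    For reference, the byte counting rules of RFC 3465 for a Reno style
    sender look roughly like this. It is a sketch of the RFC's rules, not
    the code in this patch; the function and parameter names are
    hypothetical and all quantities are in bytes.

```python
def abc_on_ack(cwnd, ssthresh, bytes_acked, newly_acked, smss=1460, limit=2):
    """Apply one ACK and return the updated (cwnd, bytes_acked).

    newly_acked is the number of previously unacknowledged bytes this ACK
    covers; limit is the RFC's L, bounding slow start growth per ACK.
    """
    if cwnd < ssthresh:
        # Slow start: grow by the bytes acknowledged, at most limit * SMSS.
        return cwnd + min(newly_acked, limit * smss), bytes_acked
    # Congestion avoidance: accumulate acked bytes and add one SMSS to
    # cwnd each time a full window's worth has been acknowledged.
    bytes_acked += newly_acked
    if bytes_acked >= cwnd:
        bytes_acked -= cwnd
        cwnd += smss
    return cwnd, bytes_acked
```

    Counting bytes rather than ACK packets means a receiver that
    acknowledges every other segment does not halve the sender's growth.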

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

10 Nov, 2005

1 commit

  • The existing connection tracking subsystem in netfilter can only
    handle ipv4. There were basically two choices present to add
    connection tracking support for ipv6. We could either duplicate all
    of the ipv4 connection tracking code into an ipv6 counterpart, or (the
    choice taken by these patches) we could design a generic layer that
    could handle both ipv4 and ipv6 and thus require only one sub-protocol
    (TCP, UDP, etc.) connection tracking helper module to be written.

    In fact nf_conntrack is capable of working with any layer 3
    protocol.

    The existing ipv4 specific conntrack code could also not deal
    with the peculiarities of doing connection tracking on ipv6,
    which is also cured here. For example, these issues include:

    1) ICMPv6 handling, which is used for neighbour discovery in
    ipv6 thus some messages such as these should not participate
    in connection tracking since effectively they are like ARP
    messages

    2) fragmentation must be handled differently in ipv6, because
    the simplistic "defrag, connection track and NAT, refrag"
    (which the existing ipv4 connection tracking does) approach simply
    isn't feasible in ipv6

    3) ipv6 extension header parsing must occur at the correct spots
    before and after connection tracking decisions, and there were
    no provisions for this in the existing connection tracking
    design

    4) ipv6 has no need for stateful NAT

    The ipv4 specific conntrack layer is kept around, until all of
    the ipv4 specific conntrack helpers are ported over to nf_conntrack
    and it is feature complete. Once that occurs, the old conntrack
    stuff will get placed into the feature-removal-schedule and we will
    fully kill it off 6 months later.

    Signed-off-by: Yasuyuki Kozakai
    Signed-off-by: Harald Welte
    Signed-off-by: Arnaldo Carvalho de Melo

    Yasuyuki Kozakai
     

09 Nov, 2005

1 commit

  • You could open the /proc/sys/net/ipv4/conf// file, then
    wait for the interface to go away, then try to grab as much memory as
    possible in the hope of hitting the (kfreed) ctl_table. Then fill it with
    pointers to your function. Then do a read from the file you've opened and,
    if you are lucky, you'll get it called as ->proc_handler() in kernel mode.

    So this is at least an Oops and possibly more. It does depend on an
    interface going away though, so less of a security risk than it would
    otherwise be.

    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Al Viro
     

22 Sep, 2005

1 commit


13 Sep, 2005

1 commit

  • NET/ROM is lacking a connection reset like TCP's RST flag, which at times
    may result in a connection having to slowly time out instead of just being
    reset. An earlier attempt to reset the connection by sending a
    NR_CONNACK | NR_CHOKE_FLAG transport was unacceptable as it did result in
    crashes of BPQ systems. An alternative approach of introducing a new
    transport type 7 (NR_RESET) was implemented several years ago in
    Paula Jayne Dowie G8PZT's Xrouter.

    Implement NR_RESET for Linux's NET/ROM but like any messing with the state
    engine consider this experimental for now and thus control it by a sysctl
    (net.netrom.reset) which for the time being defaults to off.

    Signed-off-by: Ralf Baechle DL5RB
    Signed-off-by: David S. Miller

    Ralf Baechle
     

08 Sep, 2005

1 commit

  • The IPMI power control function proc_write_chassctrl was badly written: it
    directly used userspace pointers, it assumed that strings were NULL
    terminated, and it used the evil sscanf function. This converts over to
    using the sysctl interface for this data and changes the semantics to be a
    little more logical.

    Signed-off-by: Corey Minyard
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Corey Minyard
     

28 Jul, 2005

1 commit

  • Split spin lock and r/w lock implementation into a single try which is done
    inline and an out of line function that repeatedly tries to get the lock
    before doing the cpu_relax(). Add a system control to set the number of
    retries before a cpu is yielded.

    The reason for the spin lock retry is that the diagnose 0x44 that is used to
    give up the virtual cpu is quite expensive. For spin locks that are held only
    for a short period of time the costs of the diagnoses outweigh the savings
    for spin locks that are held for a longer time. The default retry count is
    1000.
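
    The retry scheme can be pictured with the following user space model. It
    is only a sketch: acquire_with_retry and yield_cpu are hypothetical
    names, a threading.Lock stands in for the spin lock, and yield_cpu
    stands in for the expensive diagnose 0x44.

```python
import threading
import time

SPIN_RETRY = 1000  # default retry count given above


def acquire_with_retry(lock, spin_retry=SPIN_RETRY,
                       yield_cpu=lambda: time.sleep(0)):
    """Acquire `lock`; return how many times the cpu had to be yielded."""
    # Fast path: the single try that the patch does inline.
    if lock.acquire(blocking=False):
        return 0
    # Slow path: the out of line function that retries before yielding.
    yields = 0
    while True:
        for _ in range(spin_retry):
            if lock.acquire(blocking=False):
                return yields
        yield_cpu()   # give up the (virtual) cpu only after spin_retry misses
        yields += 1
```

    A larger retry count favours locks that are held briefly; a smaller one
    yields sooner when locks are held for a long time.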

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     

14 Jul, 2005

1 commit


13 Jul, 2005

1 commit

  • inotify is intended to correct the deficiencies of dnotify, particularly
    its inability to scale and its terrible user interface:

    * dnotify requires the opening of one fd per each directory
    that you intend to watch. This quickly results in too many
    open files and pins removable media, preventing unmount.
    * dnotify is directory-based. You only learn about changes to
    directories. Sure, a change to a file in a directory affects
    the directory, but you are then forced to keep a cache of
    stat structures.
    * dnotify's interface to user-space is awful. Signals?

    inotify provides a more usable, simple, powerful solution to file change
    notification:

    * inotify's interface is a system call that returns a fd, not SIGIO.
    You get a single fd, which is select()-able.
    * inotify has an event that says "the filesystem that the item
    you were watching is on was unmounted."
    * inotify can watch directories or files.

    Inotify is currently used by Beagle (a desktop search infrastructure),
    Gamin (a FAM replacement), and other projects.

    See Documentation/filesystems/inotify.txt.

    Signed-off-by: Robert Love
    Cc: John McCutchan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Love
     

29 Jun, 2005

1 commit


24 Jun, 2005

3 commits

  • Separate out the two uses of netdev_max_backlog. One controls the
    upper bound on packets processed per softirq, the new name for this is
    netdev_budget; the other controls the limit on packets queued via
    netif_rx.
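
    The two limits can be told apart with a toy model like the one below. It
    is not the kernel's data structure: the class BacklogQueue and its
    methods are hypothetical, and the exact boundary conditions in the
    kernel may differ.

```python
from collections import deque


class BacklogQueue:
    """Toy model separating the queue limit from the per-pass budget."""

    def __init__(self, max_backlog, budget):
        self.max_backlog = max_backlog  # limit on packets queued (netif_rx side)
        self.budget = budget            # packets processed per softirq pass
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Queue a packet, or drop it if the backlog is already full."""
        if len(self.queue) >= self.max_backlog:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def poll(self):
        """One processing pass: hand back at most `budget` packets."""
        processed = []
        while self.queue and len(processed) < self.budget:
            processed.append(self.queue.popleft())
        return processed
```

    With the two values separated, the backlog can be raised for fast
    senders without also lengthening each softirq pass.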

    Increase the max_backlog default to account for faster processors.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Allow TCP to have multiple pluggable congestion control algorithms.
    Algorithms are defined by a set of operations and can be built in
    or modules. The legacy "new RENO" algorithm is used as a starting
    point and fallback.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Add a new `suid_dumpable' sysctl:

    This value can be used to query and set the core dump mode for setuid
    or otherwise protected/tainted binaries. The modes are

    0 - (default) - traditional behaviour. Any process which has changed
    privilege levels or is execute only will not be dumped

    1 - (debug) - all processes dump core when possible. The core dump is
    owned by the current user and no security is applied. This is intended
    for system debugging situations only. Ptrace is unchecked.

    2 - (suidsafe) - any binary which normally would not be dumped is dumped
    readable by root only. This allows the end user to remove such a dump but
    not access it directly. For security reasons core dumps in this mode will
    not overwrite one another or other files. This mode is appropriate when
    administrators are attempting to debug problems in a normal environment.
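
    The three modes can be summarised as a small decision function. This is
    a sketch of the policy as described above, not the kernel code; the
    names SuidDumpable, DumpDecision and core_dump_decision are
    hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple


class SuidDumpable(IntEnum):
    DEFAULT = 0    # traditional behaviour
    DEBUG = 1      # dump everything, no security applied
    SUIDSAFE = 2   # dump protected processes readable by root only


class DumpDecision(NamedTuple):
    dump: bool        # is a core file written at all?
    root_only: bool   # is the core file readable by root only?


def core_dump_decision(mode, protected):
    """Decide the dump policy for a process.

    `protected` means the process has changed privilege levels or is
    execute only, i.e. it would not be dumped in the traditional mode.
    """
    mode = SuidDumpable(mode)
    if not protected or mode is SuidDumpable.DEBUG:
        return DumpDecision(dump=True, root_only=False)
    if mode is SuidDumpable.SUIDSAFE:
        return DumpDecision(dump=True, root_only=True)
    return DumpDecision(dump=False, root_only=False)
```

    Only mode 2 produces a dump of a protected process that the end user
    can remove but not read.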

    (akpm:

    > > +EXPORT_SYMBOL(suid_dumpable);
    >
    > EXPORT_SYMBOL_GPL?

    No problem to me.

    > > if (current->euid == current->uid && current->egid == current->gid)
    > > current->mm->dumpable = 1;
    >
    > Should this be SUID_DUMP_USER?

    Actually the feedback I had from last time was that the SUID_ defines
    should go because it's clearer to follow the numbers. They can go
    everywhere (and there are lots of places where dumpable is tested/used
    as a bool in untouched code)

    > Maybe this should be renamed to `dump_policy' or something. Doing that
    > would help us catch any code which isn't using the #defines, too.

    Fair comment. The patch was designed to be easy to maintain for Red Hat
    rather than for merging. Changing that field would create a gigantic
    diff because it is used all over the place.

    )

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

14 Jun, 2005

1 commit

  • This patch allows you to change the source address of icmp error
    messages. It applies cleanly to 2.6.11.11 and retains the default
    behaviour.

    In the old (default) behaviour icmp error messages are sent with the ip
    of the exiting interface.

    In the new behaviour (when the sysctl variable is toggled on), the message
    is sent with the ip of the interface that received the packet that
    caused the icmp error. This is the behaviour network administrators will
    expect from a router. It makes debugging complicated network layouts
    much easier. Also, all 'vendor routers' I know of have the latter
    behaviour.

    Signed-off-by: David S. Miller

    J. Simonetti