21 Jul, 2011

1 commit


04 Jun, 2011

1 commit

  • Turns out that distro packages use this file as an indicator of
    the perf event subsystem - this is easier to check for from scripts
    than the existence of the system call.

    This is easy enough to keep around for the kernel, so add a
    comment to make sure it stays so.

    Signed-off-by: Vince Weaver
    Cc: David Ahern
    Cc: Peter Zijlstra
    Cc: paulus@samba.org
    Cc: acme@redhat.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106031751170.29381@cl320.eecs.utk.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     

26 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (26 commits)
    arch/tile: prefer "tilepro" as the name of the 32-bit architecture
    compat: include aio_abi.h for aio_context_t
    arch/tile: cleanups for tilegx compat mode
    arch/tile: allocate PCI IRQs later in boot
    arch/tile: support signal "exception-trace" hook
    arch/tile: use better definitions of xchg() and cmpxchg()
    include/linux/compat.h: coding-style fixes
    tile: add an RTC driver for the Tilera hypervisor
    arch/tile: finish enabling support for TILE-Gx 64-bit chip
    compat: fixes to allow working with tile arch
    arch/tile: update defconfig file to something more useful
    tile: do_hardwall_trap: do not play with task->sighand
    tile: replace mm->cpu_vm_mask with mm_cpumask()
    tile,mn10300: add device parameter to dma_cache_sync()
    audit: support the "standard"
    arch/tile: clarify flush_buffer()/finv_buffer() function names
    arch/tile: kernel-related cleanups from removing static page size
    arch/tile: various header improvements for building drivers
    arch/tile: disable GX prefetcher during cache flush
    arch/tile: tolerate disabling CONFIG_BLK_DEV_INITRD
    ...

    Linus Torvalds
     

24 May, 2011

1 commit


23 May, 2011

1 commit

  • This restores the previous behavior of softlock_thresh.

    Currently, setting watchdog_thresh to zero causes the watchdog
    kthreads to consume a lot of CPU.

    In addition, the logic of proc_dowatchdog_thresh and
    proc_dowatchdog_enabled has been factored into proc_dowatchdog.

    Signed-off-by: Mandeep Singh Baines
    Cc: Marcin Slusarz
    Cc: Don Zickus
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1306127423-3347-3-git-send-email-msb@chromium.org
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Mandeep Singh Baines
     

20 May, 2011

1 commit

  • This change adds support for /proc/sys/debug/exception-trace to tile.
    Like x86 and sparc, by default it is set to "1", generating a one-line
    printk whenever a user process crashes. By setting it to "2", we get
    a much more complete userspace diagnostic at crash time, including
    a user-space backtrace, register dump, and memory dump around the
    address of the crash.

    Some vestiges of the Tilera-internal version of this support are
    removed with this patch (the show_crashinfo variable and the
    arch_coredump_signal function). We retain a "crashinfo" boot parameter
    which allows you to set the boot-time value of exception-trace.

    Signed-off-by: Chris Metcalf

    Chris Metcalf
     

04 Apr, 2011

1 commit

  • There is no way to limit the capabilities of usermodehelpers. This problem
    reared its head recently when someone complained that any user with
    cap_net_admin was able to load arbitrary kernel modules, even though the user
    didn't have cap_sys_module. The reason is because the actual load is done by
    a usermode helper and those always have the full cap set. This patch addes new
    sysctls which allow us to bound the permissions of usermode helpers.

    /proc/sys/kernel/usermodehelper/bset
    /proc/sys/kernel/usermodehelper/inheritable

    You must have CAP_SYS_MODULE and CAP_SETPCAP to change these (changes are
    &= ONLY). When the kernel launches a usermodehelper it will do so with these
    as the bset and pI.

    -v2: make globals static
    create spinlock to protect globals

    -v3: require both CAP_SETPCAP and CAP_SYS_MODULE
    -v4: fix the typo s/CAP_SET_PCAP/CAP_SETPCAP/ because I didn't commit
    Signed-off-by: Eric Paris
    No-objection-from: Serge E. Hallyn
    Acked-by: David Howells
    Acked-by: Serge E. Hallyn
    Acked-by: Andrew G. Morgan
    Signed-off-by: James Morris

    Eric Paris
     

24 Mar, 2011

2 commits

  • When dmesg_restrict is set to 1 CAP_SYS_ADMIN is needed to read the kernel
    ring buffer. But a root user without CAP_SYS_ADMIN is able to reset
    dmesg_restrict to 0.

    This is an issue when e.g. LXC (Linux Containers) are used and complete
    user space is running without CAP_SYS_ADMIN. A unprivileged and jailed
    root user can bypass the dmesg_restrict protection.

    With this patch writing to dmesg_restrict is only allowed when root has
    CAP_SYS_ADMIN.

    Signed-off-by: Richard Weinberger
    Acked-by: Dan Rosenberg
    Acked-by: Serge E. Hallyn
    Cc: Eric Paris
    Cc: Kees Cook
    Cc: James Morris
    Cc: Eugene Teo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • Add boundaries of allowed input ranges for: dirty_expire_centisecs,
    drop_caches, overcommit_memory, page-cluster and panic_on_oom.

    Signed-off-by: Petr Holasek
    Acked-by: Dave Young
    Cc: David Rientjes
    Cc: Wu Fengguang
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Holasek
     

17 Mar, 2011

1 commit

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
    AppArmor: kill unused macros in lsm.c
    AppArmor: cleanup generated files correctly
    KEYS: Add an iovec version of KEYCTL_INSTANTIATE
    KEYS: Add a new keyctl op to reject a key with a specified error code
    KEYS: Add a key type op to permit the key description to be vetted
    KEYS: Add an RCU payload dereference macro
    AppArmor: Cleanup make file to remove cruft and make it easier to read
    SELinux: implement the new sb_remount LSM hook
    LSM: Pass -o remount options to the LSM
    SELinux: Compute SID for the newly created socket
    SELinux: Socket retains creator role and MLS attribute
    SELinux: Auto-generate security_is_socket_class
    TOMOYO: Fix memory leak upon file open.
    Revert "selinux: simplify ioctl checking"
    selinux: drop unused packet flow permissions
    selinux: Fix packet forwarding checks on postrouting
    selinux: Fix wrong checks for selinux_policycap_netpeer
    selinux: Fix check for xfrm selinux context algorithm
    ima: remove unnecessary call to ima_must_measure
    IMA: remove IMA imbalance checking
    ...

    Linus Torvalds
     

16 Mar, 2011

3 commits

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
    sched: Resched proper CPU on yield_to()
    sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
    sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
    sched: Clean up the IRQ_TIME_ACCOUNTING code
    sched: Add #ifdef around irq time accounting functions
    sched, autogroup: Stop claiming ownership of the root task group
    sched, autogroup: Stop going ahead if autogroup is disabled
    sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
    sched: Fix the group_imb logic
    sched: Clean up some f_b_g() comments
    sched: Clean up remnants of sd_idle
    sched: Wholesale removal of sd_idle logic
    sched: Add yield_to(task, preempt) functionality
    sched: Use a buddy to implement yield_task_fair()
    sched: Limit the scope of clear_buddies
    sched: Check the right ->nr_running in yield_task_fair()
    sched: Avoid expensive initial update_cfs_load(), on UP too
    sched: Fix switch_from_fair()
    sched: Simplify the idle scheduling class
    softirqs: Account ksoftirqd time as cpustat softirq
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
    perf probe: Clean up probe_point_lazy_walker() return value
    tracing: Fix irqoff selftest expanding max buffer
    tracing: Align 4 byte ints together in struct tracer
    tracing: Export trace_set_clr_event()
    tracing: Explain about unstable clock on resume with ring buffer warning
    ftrace/graph: Trace function entry before updating index
    ftrace: Add .ref.text as one of the safe areas to trace
    tracing: Adjust conditional expression latency formatting.
    tracing: Fix event alignment: skb:kfree_skb
    tracing: Fix event alignment: mce:mce_record
    tracing: Fix event alignment: kvm:kvm_hv_hypercall
    tracing: Fix event alignment: module:module_request
    tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
    tracing: Remove lock_depth from event entry
    perf header: Stop using 'self'
    perf session: Use evlist/evsel for managing perf.data attributes
    perf top: Don't let events to eat up whole header line
    perf top: Fix events overflow in top command
    ring-buffer: Remove unused #include <linux/trace_irq.h>
    tracing: Add an 'overwrite' trace_option.
    ...

    Linus Torvalds
     
  • James Morris
     

08 Mar, 2011

2 commits

  • a) struct inode is not going to be freed under ->d_compare();
    however, the thing PROC_I(inode)->sysctl points to just might.
    Fortunately, it's enough to make freeing that sucker delayed,
    provided that we don't step on its ->unregistering, clear
    the pointer to it in PROC_I(inode) before dropping the reference
    and check if it's NULL in ->d_compare().

    b) I'm not sure that we *can* walk into NULL inode here (we recheck
    dentry->seq between verifying that it's still hashed / fetching
    dentry->d_inode and passing it to ->d_compare() and there's no
    negative hashed dentries in /proc/sys/*), but if we can walk into
    that, we really should not have ->d_compare() return 0 on it!
    Said that, I really suspect that this check can be simply killed.
    Nick?

    Signed-off-by: Al Viro

    Al Viro
     
  • James Morris
     

23 Feb, 2011

1 commit


16 Feb, 2011

2 commits


03 Feb, 2011

1 commit

  • Use the buddy mechanism to implement yield_task_fair. This
    allows us to skip onto the next highest priority se at every
    level in the CFS tree, unless doing so would introduce gross
    unfairness in CPU time distribution.

    We order the buddy selection in pick_next_entity to check
    yield first, then last, then next. We need next to be able
    to override yield, because it is possible for the "next" and
    "yield" task to be different processen in the same sub-tree
    of the CFS tree. When they are, we need to go into that
    sub-tree regardless of the "yield" hint, and pick the correct
    entity once we get to the right level.

    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rik van Riel
     

02 Feb, 2011

1 commit


26 Jan, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: wacom - pass touch resolution to clients through input_absinfo
    Input: wacom - add 2 Bamboo Pen and touch models
    Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent
    Input: sparse-keymap - fix KEY_VSW handling in sparse_keymap_setup
    Input: tegra-kbc - add tegra keyboard driver
    Input: gpio_keys - switch to using request_any_context_irq
    Input: serio - allow registered drivers to get status flag
    Input: ct82710c - return proper error code for ct82c710_open
    Input: bu21013_ts - added regulator support
    Input: bu21013_ts - remove duplicate resolution parameters
    Input: tnetv107x-ts - don't treat NULL clk as an error
    Input: tnetv107x-keypad - don't treat NULL clk as an error

    Fix up trivial conflicts in drivers/input/keyboard/Makefile due to
    additions of tc3589x/Tegra drivers

    Linus Torvalds
     

25 Jan, 2011

1 commit

  • Currently sysrq_enabled and __sysrq_enabled are initialised separately
    and inconsistently, leading to sysrq being actually enabled by reported
    as not enabled in sysfs. The first change to the sysfs configurable
    synchronises these two:

    static int __read_mostly sysrq_enabled = 1;
    static int __sysrq_enabled;

    Add a common define to carry the default for these preventing them becoming
    out of sync again. Default this to 1 to mirror previous behaviour.

    Signed-off-by: Andy Whitcroft
    Cc: stable@kernel.org
    Signed-off-by: Dmitry Torokhov

    Andy Whitcroft
     

14 Jan, 2011

3 commits

  • ctl_unnumbered.txt have been removed in Documentation directory so just
    also remove this invalid comments

    [akpm@linux-foundation.org: fix Documentation/sysctl/00-INDEX, per Dave]
    Signed-off-by: Jovi Zhang
    Cc: Dave Young
    Acked-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jovi Zhang
     
  • Signed-off-by: Jovi Zhang
    Acked-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jovi Zhang
     
  • Add the %pK printk format specifier and the /proc/sys/kernel/kptr_restrict
    sysctl.

    The %pK format specifier is designed to hide exposed kernel pointers,
    specifically via /proc interfaces. Exposing these pointers provides an
    easy target for kernel write vulnerabilities, since they reveal the
    locations of writable structures containing easily triggerable function
    pointers. The behavior of %pK depends on the kptr_restrict sysctl.

    If kptr_restrict is set to 0, no deviation from the standard %p behavior
    occurs. If kptr_restrict is set to 1, the default, if the current user
    (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
    (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
    If kptr_restrict is set to 2, kernel pointers using %pK are printed as
    0's regardless of privileges. Replacing with 0's was chosen over the
    default "(null)", which cannot be parsed by userland %p, which expects
    "(nil)".

    [akpm@linux-foundation.org: check for IRQ context when !kptr_restrict, save an indent level, s/WARN/WARN_ONCE/]
    [akpm@linux-foundation.org: coding-style fixup]
    [randy.dunlap@oracle.com: fix kernel/sysctl.c warning]
    Signed-off-by: Dan Rosenberg
    Signed-off-by: Randy Dunlap
    Cc: James Morris
    Cc: Eric Dumazet
    Cc: Thomas Graf
    Cc: Eugene Teo
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Eric Paris

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
     

07 Jan, 2011

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
    sched: Change wait_for_completion_*_timeout() to return a signed long
    sched, autogroup: Fix reference leak
    sched, autogroup: Fix potential access to freed memory
    sched: Remove redundant CONFIG_CGROUP_SCHED ifdef
    sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight
    sched: Move periodic share updates to entity_tick()
    printk: Use this_cpu_{read|write} api on printk_pending
    sched: Make pushable_tasks CONFIG_SMP dependant
    sched: Add 'autogroup' scheduling feature: automated per session task groups
    sched: Fix unregister_fair_sched_group()
    sched: Remove unused argument dest_cpu to migrate_task()
    mutexes, sched: Introduce arch_mutex_cpu_relax()
    sched: Add some clock info to sched_debug
    cpu: Remove incorrect BUG_ON
    cpu: Remove unused variable
    sched: Fix UP build breakage
    sched: Make task dump print all 15 chars of proc comm
    sched: Update tg->shares after cpu.shares write
    sched: Allow update_cfs_load() to update global load
    sched: Implement demand based update_cfs_load()
    ...

    Linus Torvalds
     

10 Dec, 2010

1 commit

  • Originally adapted from Huang Ying's patch which moved the
    unknown_nmi_panic to the traps.c file. Because the old nmi
    watchdog was deleted before this change happened, the
    unknown_nmi_panic sysctl was lost. This re-adds it.

    Also, the nmi_watchdog sysctl was re-implemented and its
    documentation updated accordingly.

    Patch-inspired-by: Huang Ying
    Signed-off-by: Don Zickus
    Reviewed-by: Cyrill Gorcunov
    Acked-by: Yinghai Lu
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Don Zickus
     

30 Nov, 2010

1 commit

  • A recurring complaint from CFS users is that parallel kbuild has
    a negative impact on desktop interactivity. This patch
    implements an idea from Linus, to automatically create task
    groups. Currently, only per session autogroups are implemented,
    but the patch leaves the way open for enhancement.

    Implementation: each task's signal struct contains an inherited
    pointer to a refcounted autogroup struct containing a task group
    pointer, the default for all tasks pointing to the
    init_task_group. When a task calls setsid(), a new task group
    is created, the process is moved into the new task group, and a
    reference to the preveious task group is dropped. Child
    processes inherit this task group thereafter, and increase it's
    refcount. When the last thread of a process exits, the
    process's reference is dropped, such that when the last process
    referencing an autogroup exits, the autogroup is destroyed.

    At runqueue selection time, IFF a task has no cgroup assignment,
    its current autogroup is used.

    Autogroup bandwidth is controllable via setting it's nice level
    through the proc filesystem:

    cat /proc//autogroup

    Displays the task's group and the group's nice level.

    echo > /proc//autogroup

    Sets the task group's shares to the weight of nice task.
    Setting nice level is rate limited for !admin users due to the
    abuse risk of task group locking.

    The feature is enabled from boot by default if
    CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
    the boot option noautogroup, and can also be turned on/off on
    the fly via:

    echo [01] > /proc/sys/kernel/sched_autogroup_enabled

    ... which will automatically move tasks to/from the root task group.

    Signed-off-by: Mike Galbraith
    Acked-by: Linus Torvalds
    Acked-by: Peter Zijlstra
    Cc: Markus Trippelsdorf
    Cc: Mathieu Desnoyers
    Cc: Paul Turner
    Cc: Oleg Nesterov
    [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
    Signed-off-by: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

26 Nov, 2010

2 commits


18 Nov, 2010

3 commits

  • Introduce a new sysctl for the shares window and disambiguate it from
    sched_time_avg.

    A 10ms window appears to be a good compromise between accuracy and performance.

    Signed-off-by: Paul Turner
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • By tracking a per-cpu load-avg for each cfs_rq and folding it into a
    global task_group load on each tick we can rework tg_shares_up to be
    strictly per-cpu.

    This should improve cpu-cgroup performance for smp systems
    significantly.

    [ Paul: changed to use queueing cfs_rq + bug fixes ]

    Signed-off-by: Paul Turner
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Now that we have a new nmi_watchdog that is more generic and
    sits on top of the perf subsystem, we really do not need the old
    nmi_watchdog any more.

    In addition, the old nmi_watchdog doesn't really work if you are
    using the default clocksource, hpet. The old nmi_watchdog code
    relied on local apic interrupts to determine if the cpu is still
    alive. With hpet as the clocksource, these interrupts don't
    increment any more and the old nmi_watchdog triggers false
    postives.

    This piece removes the old nmi_watchdog code and stubs out any
    variables and functions calls. The stubs are the same ones used
    by the new nmi_watchdog code, so it should be well tested.

    Signed-off-by: Don Zickus
    Cc: fweisbec@gmail.com
    Cc: gorcunov@openvz.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Don Zickus
     

16 Nov, 2010

1 commit


12 Nov, 2010

1 commit

  • The kernel syslog contains debugging information that is often useful
    during exploitation of other vulnerabilities, such as kernel heap
    addresses. Rather than futilely attempt to sanitize hundreds (or
    thousands) of printk statements and simultaneously cripple useful
    debugging functionality, it is far simpler to create an option that
    prevents unprivileged users from reading the syslog.

    This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
    dmesg_restrict sysctl. When set to "0", the default, no restrictions are
    enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
    kernel syslog via dmesg(8) or other mechanisms.

    [akpm@linux-foundation.org: explain the config option in kernel.txt]
    Signed-off-by: Dan Rosenberg
    Acked-by: Ingo Molnar
    Acked-by: Eugene Teo
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
     

27 Oct, 2010

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • Adding declaration of printk_ratelimit_state in ratelimit.h removes
    potential build breakage and following sparse warning:

    kernel/printk.c:1426:1: warning: symbol 'printk_ratelimit_state' was not declared. Should it be static?

    [akpm@linux-foundation.org: remove unneeded ifdef]
    Signed-off-by: Namhyung Kim
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Robin Holt tried to boot a 16TB system and found af_unix was overflowing
    a 32bit value :

    We were seeing a failure which prevented boot. The kernel was incapable
    of creating either a named pipe or unix domain socket. This comes down
    to a common kernel function called unix_create1() which does:

    atomic_inc(&unix_nr_socks);
    if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
    goto out;

    The function get_max_files() is a simple return of files_stat.max_files.
    files_stat.max_files is a signed integer and is computed in
    fs/file_table.c's files_init().

    n = (mempages * (PAGE_SIZE / 1024)) / 10;
    files_stat.max_files = n;

    In our case, mempages (total_ram_pages) is approx 3,758,096,384
    (0xe0000000). That leaves max_files at approximately 1,503,238,553.
    This causes 2 * get_max_files() to integer overflow.

    Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
    integers, and change af_unix to use an atomic_long_t instead of atomic_t.

    get_max_files() is changed to return an unsigned long. get_nr_files() is
    changed to return a long.

    unix_nr_socks is changed from atomic_t to atomic_long_t, while not
    strictly needed to address Robin problem.

    Before patch (on a 64bit kernel) :
    # echo 2147483648 >/proc/sys/fs/file-max
    # cat /proc/sys/fs/file-max
    -18446744071562067968

    After patch:
    # echo 2147483648 >/proc/sys/fs/file-max
    # cat /proc/sys/fs/file-max
    2147483648
    # cat /proc/sys/fs/file-nr
    704 0 2147483648

    Reported-by: Robin Holt
    Signed-off-by: Eric Dumazet
    Acked-by: David Miller
    Reviewed-by: Robin Holt
    Tested-by: Robin Holt
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

26 Oct, 2010

2 commits

  • The nr_dentry stat is a globally touched cacheline and atomic operation
    twice over the lifetime of a dentry. It is used for the benfit of userspace
    only. Turn it into a per-cpu counter and always decrement it in d_free instead
    of doing various batching operations to reduce lock hold times in the callers.

    Based on an earlier patch from Nick Piggin .

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • The number of inodes allocated does not need to be tied to the
    addition or removal of an inode to/from a list. If we are not tied
    to a list lock, we could update the counters when inodes are
    initialised or destroyed, but to do that we need to convert the
    counters to be per-cpu (i.e. independent of a lock). This means that
    we have the freedom to change the list/locking implementation
    without needing to care about the counters.

    Based on a patch originally from Eric Dumazet.

    [AV: cleaned up a bit, fixed build breakage on weird configs

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Dave Chinner