09 May, 2007

40 commits

  • This patch moves the die notifier handling to common code. Previously,
    various architectures had exactly the same code for it. Note that the new
    code is compiled unconditionally; this should be understood as an appeal to
    the other architecture maintainers to implement support for it as well (aka
    sprinkling a notify_die or two in the proper place).

    arm had a notify_die that did something totally different; I renamed it to
    arm_notify_die as part of the patch and made it static to the file in which
    it's declared and used. avr32 used to pass slightly less information through
    this interface, and I brought it into line with the other architectures.
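
    The die notifier is, at its core, a callback chain that trap handlers walk
    through notify_die(). As a rough user-space model of that mechanism (not the
    kernel's code), the sketch below keeps callbacks sorted by priority and lets
    one of them stop the walk. Every name in it (notifier_block_sketch,
    chain_register, chain_call, the SKETCH_NOTIFY_* codes and the two example
    callbacks) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical return codes, loosely modeled on NOTIFY_DONE / NOTIFY_STOP. */
enum { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_STOP = 1 };

struct notifier_block_sketch {
    int (*call)(struct notifier_block_sketch *nb, unsigned long event, void *data);
    int priority;                          /* higher priority runs first */
    struct notifier_block_sketch *next;
};

/* Insert nb into the singly linked chain, keeping it sorted by priority. */
void chain_register(struct notifier_block_sketch **head,
                    struct notifier_block_sketch *nb)
{
    while (*head && (*head)->priority >= nb->priority)
        head = &(*head)->next;
    nb->next = *head;
    *head = nb;
}

/* Call every callback in order; stop early if one returns SKETCH_NOTIFY_STOP. */
int chain_call(struct notifier_block_sketch *head, unsigned long event, void *data)
{
    int ret = SKETCH_NOTIFY_DONE;

    for (; head; head = head->next) {
        ret = head->call(head, event, data);
        if (ret == SKETCH_NOTIFY_STOP)
            break;
    }
    return ret;
}

/* Example callbacks: one counts invocations, one stops the chain. */
int calls_seen = 0;

int counting_cb(struct notifier_block_sketch *nb, unsigned long event, void *data)
{
    (void)nb; (void)event; (void)data;
    calls_seen++;
    return SKETCH_NOTIFY_DONE;
}

int stopping_cb(struct notifier_block_sketch *nb, unsigned long event, void *data)
{
    (void)nb; (void)event; (void)data;
    return SKETCH_NOTIFY_STOP;
}
```

    Registering stopping_cb at a higher priority than counting_cb means
    chain_call() returns SKETCH_NOTIFY_STOP without ever reaching the counter.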

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
    [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Russell King
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Fix sparse NULL warnings:
    kernel/kprobes.c:915:49: warning: Using plain integer as NULL pointer

    Signed-off-by: Randy Dunlap
    Acked-by: Ananth N Mavinakayanahalli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • REISER_FS /proc option needs to depend on PROC_FS.

    fs/reiserfs/procfs.c: In function 'show_super':
    fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'max_hash_collisions'
    fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'breads'
    fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'bread_miss'
    fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key'
    fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_fs_changed'
    fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_restarted'
    fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'insert_item_restarted'
    fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'paste_into_item_restarted'
    fs/reiserfs/procfs.c:138: error: 'reiserfs_proc_info_data_t' has no member named 'cut_from_item_restarted'
    fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_solid_item_restarted'
    fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_item_restarted'
    fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaked_oid'
    fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaves_removable'
    fs/reiserfs/procfs.c: In function 'show_per_level':
    fs/reiserfs/procfs.c:184: error: 'reiserfs_proc_info_data_t' has no member named 'balance_at'
    fs/reiserfs/procfs.c:185: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_read_at'
    fs/reiserfs/procfs.c:186: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_fs_changed'
    fs/reiserfs/procfs.c:187: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_restarted'
    fs/reiserfs/procfs.c:188: error: 'reiserfs_proc_info_data_t' has no member named 'free_at'
    fs/reiserfs/procfs.c:189: error: 'reiserfs_proc_info_data_t' has no member named 'items_at'
    fs/reiserfs/procfs.c:190: error: 'reiserfs_proc_info_data_t' has no member named 'can_node_be_removed'
    fs/reiserfs/procfs.c:191: error: 'reiserfs_proc_info_data_t' has no member named 'lnum'
    fs/reiserfs/procfs.c:192: error: 'reiserfs_proc_info_data_t' has no member named 'rnum'
    fs/reiserfs/procfs.c:193: error: 'reiserfs_proc_info_data_t' has no member named 'lbytes'
    fs/reiserfs/procfs.c:194: error: 'reiserfs_proc_info_data_t' has no member named 'rbytes'
    fs/reiserfs/procfs.c:195: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors'
    fs/reiserfs/procfs.c:196: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors_restart'
    fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_l_neighbor'
    fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_r_neighbor'
    fs/reiserfs/procfs.c: In function 'show_bitmap':
    fs/reiserfs/procfs.c:224: error: 'reiserfs_proc_info_data_t' has no member named 'free_block'
    fs/reiserfs/procfs.c:225: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:226: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:227: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:228: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:229: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c: In function 'show_journal':
    fs/reiserfs/procfs.c:384: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:385: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:386: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:387: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:388: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:389: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:390: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:391: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:392: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:393: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:394: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_init':
    fs/reiserfs/procfs.c:504: warning: implicit declaration of function '__PINFO'
    fs/reiserfs/procfs.c:504: error: request for member 'lock' in something not a structure or union
    fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_done':
    fs/reiserfs/procfs.c:544: error: request for member 'lock' in something not a structure or union
    fs/reiserfs/procfs.c:545: error: request for member 'exiting' in something not a structure or union
    fs/reiserfs/procfs.c:546: error: request for member 'lock' in something not a structure or union

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • While researching the tty layer pid leaks, I found a weird case in selinux:
    when we drop a controlling tty because of inadequate permissions, we don't
    do the normal hangup processing. This is a problem if the session leader
    has exec'd something that can no longer access the tty.

    We already have code in the kernel to handle this case in the form of the
    TIOCNOTTY ioctl. So this patch factors out a helper function that is the
    essence of that ioctl and calls it from the selinux code.

    This removes the inconsistency in how dropping a controlling tty is
    handled, and, who knows, it might even make some part of user space happy
    because it receives a SIGHUP it was expecting.

    In addition, since this removes the last user of proc_set_tty outside of
    tty_io.c, proc_set_tty is made static and removed from tty.h.

    Signed-off-by: Eric W. Biederman
    Acked-by: Alan Cox
    Cc: James Morris
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This patch should contain no functional changes.

    At some point I got confused and thought put_pid could not be called while a
    spin lock was held. While it may be nice to avoid that to reduce lock hold
    times, put_pid can safely be called while we hold a spin lock.

    This patch removes all of the complications from the code introduced by my
    misunderstanding, making the code a little more readable.

    Signed-off-by: Eric W. Biederman
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • All of the users of proc_clear_tty are compiled into the kernel so exporting
    this symbol appears gratuitous.

    Signed-off-by: Eric W. Biederman
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The console subsystem already has an idea of a boot console, using the
    CON_BOOT flag. The implementation has some flaws, though. The major
    problem is that the presence of a boot console makes register_console() ignore
    any other console devices (unless explicitly specified on the kernel
    command line).

    This patch fixes the console selection code to *not* consider a boot
    console a full-featured one, so the first non-boot console registering will
    become the default console instead. This way the unregister call for the
    boot console in the register_console() function actually triggers and the
    handover from the boot console to the real console device works smoothly.
    Added a printk for the handover, so you know which console device the
    output goes to when the boot console stops printing messages.
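
    The selection rule above can be modeled in a few lines of user-space C.
    This is a simplified sketch under stated assumptions: it tracks a single
    active console, and the structure, flag values and function name are
    hypothetical rather than the kernel's struct console or register_console().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; only "is this a boot console?" matters here. */
#define SKETCH_CON_BOOT    0x1u
#define SKETCH_CON_ENABLED 0x2u

struct console_sketch {
    const char *name;
    unsigned int flags;
};

/* Simplification: exactly one console receives output at a time. */
struct console_state {
    struct console_sketch *active;
};

/*
 * A boot console is taken only while nothing else is active. The first
 * non-boot console replaces an active boot console (the handover), and
 * the boot console is disabled. Returns true if con became active.
 */
bool register_console_sketch(struct console_state *st, struct console_sketch *con)
{
    bool con_is_boot = (con->flags & SKETCH_CON_BOOT) != 0;
    bool active_is_boot = st->active && (st->active->flags & SKETCH_CON_BOOT);

    if (st->active && !(active_is_boot && !con_is_boot))
        return false;                              /* keep the current console */
    if (st->active)
        st->active->flags &= ~SKETCH_CON_ENABLED;  /* drop the boot console */
    con->flags |= SKETCH_CON_ENABLED;
    st->active = con;
    return true;
}
```

    Registering an early boot console and then a real one leaves the real
    console active and the boot console disabled; in this simplified model a
    later real console does not displace it.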

    The disable_early_printk() call is obsolete with this patch: explicitly
    disabling the early console isn't needed any more, as the handover now
    works automagically.

    I've walked through the tree, dropped all disable_early_printk() instances
    found below arch/ and tagged the consoles with CON_BOOT if needed. The
    code is tested on x86, sh (thanks to Paul) and mips (thanks to Ralf).

    Changes since the last version: rediffed against -rc3, adapted to the mips
    cleanups by Ralf, fixed the "udbg-immortal" cmd line arg on powerpc.

    Signed-off-by: Gerd Hoffmann
    Acked-by: Paul Mundt
    Acked-by: Ralf Baechle
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerd Hoffmann
     
  • console.name[] is eight chars, but so is "earlyvga", which leaves no room
    for the terminating NUL. So when we try to print console->name when using
    earlyvga, it runs off the end of the string.

    Make it bigger.
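
    The off-by-one is easy to check: a C string literal occupies its length
    plus one byte for the terminating NUL. A minimal sketch (OLD_NAME_LEN and
    fits_with_nul are illustrative names, with 8 taken from the text above):

```c
#include <string.h>

/* Width of the console name field before this patch, per the text above. */
#define OLD_NAME_LEN 8

/* Nonzero if s plus its terminating NUL fits in a buffer of len bytes. */
int fits_with_nul(const char *s, size_t len)
{
    return strlen(s) + 1 <= len;
}
```

    "earlyvga" has eight characters but needs nine bytes, so it does not fit
    in an eight-byte field, while a larger field such as 16 bytes is fine.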

    Diagnosed-by: Gerd Hoffmann

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The eternal quest to make

    while true; do cat /proc/fs/xfs/stat >/dev/null 2>/dev/null; done
    while true; do find /proc -type f 2>/dev/null | xargs cat >/dev/null 2>/dev/null; done
    while true; do modprobe xfs; rmmod xfs; done

    work reliably continues, and now the kernel oopses in the following way:

    BUG: unable to handle ... at virtual address 6b6b6b6b
    EIP is at badness
    process: cat
    proc_oom_score
    proc_info_read
    sys_fstat64
    vfs_read
    proc_info_read
    sys_read

    The failing code is the prefetch hidden in list_for_each_entry() in
    badness(). badness() is reachable from two points: one is proc_oom_score,
    the other is out_of_memory() => select_bad_process() => badness().

    The second path grabs tasklist_lock, while the first doesn't.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • LTP test sigaction_16_24 fails because it expects sem_wait to be restarted
    if SA_RESTART is set. sem_wait is implemented with futex_wait, which
    currently doesn't support being restarted. Ulrich confirms that the call
    should be restartable.

    Implement a restart_block method to handle the relative timeout, and allow
    restarts.
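
    One common way to make a relative timeout restartable is to turn it into
    an absolute deadline once and recompute the remaining time on each
    restart, so repeated restarts don't extend the total wait. The sketch
    below shows only that arithmetic in user space; the helper names are
    hypothetical and it does not reproduce the futex restart_block code.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: absolute deadline = now + relative timeout. */
struct timespec deadline_from_relative(struct timespec now, struct timespec rel)
{
    struct timespec d = { .tv_sec = now.tv_sec + rel.tv_sec,
                          .tv_nsec = now.tv_nsec + rel.tv_nsec };
    if (d.tv_nsec >= NSEC_PER_SEC) {       /* carry nanoseconds into seconds */
        d.tv_sec += 1;
        d.tv_nsec -= NSEC_PER_SEC;
    }
    return d;
}

/* Hypothetical helper: time left on restart, clamped to zero once expired. */
struct timespec remaining_until(struct timespec now, struct timespec deadline)
{
    struct timespec r = { .tv_sec = deadline.tv_sec - now.tv_sec,
                          .tv_nsec = deadline.tv_nsec - now.tv_nsec };
    if (r.tv_nsec < 0) {                   /* borrow a second */
        r.tv_sec -= 1;
        r.tv_nsec += NSEC_PER_SEC;
    }
    if (r.tv_sec < 0)
        r.tv_sec = r.tv_nsec = 0;
    return r;
}
```

    A wait of 2.2 s started at t = 100.9 s gets a deadline of 103.1 s; a
    restart at t = 102.0 s then waits only the remaining 1.1 s.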

    Signed-off-by: Nick Piggin
    Cc: Ulrich Drepper
    Cc: Rusty Russell
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • lguest uses the convenient futex infrastructure for inter-domain I/O, so
    expose get_futex_key, get_key_refs (renamed get_futex_key_refs) and
    drop_key_refs (renamed drop_futex_key_refs). This also means we need to
    expose the union that these use.

    No code changes.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Signed-off-by: Dale Farnsworth
    Cc: Alessandro Zummo
    Cc: David Brownell
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dale Farnsworth
     
  • Add an RTC driver for Ricoh RS5C313 RTC chip.

    [akpm@linux-foundation.org: Zillions of coding-style fixes]
    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Nobuhiro Iwamatsu
    Cc: Alessandro Zummo
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nobuhiro Iwamatsu
     
  • Fix a units mismatch (jiffies vs msecs) in as-iosched.c, spotted by
    Xiaoning Ding.
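
    To see why such a mismatch matters: with an assumed HZ of 250, one jiffy
    is 4 ms, so a value meant as 8 ms but consumed as jiffies turns into 32 ms.
    The helpers below are a hypothetical user-space sketch; the kernel has its
    own msecs_to_jiffies() and jiffies_to_msecs(), whose details this does not
    reproduce.

```c
/* Assumed tick rate for this sketch; real kernels pick HZ at build time. */
#define SKETCH_HZ 250UL

/* Milliseconds to ticks, rounding up so a short timeout never becomes 0. */
unsigned long sketch_msecs_to_jiffies(unsigned long ms)
{
    return (ms * SKETCH_HZ + 999UL) / 1000UL;
}

/* Ticks back to milliseconds. */
unsigned long sketch_jiffies_to_msecs(unsigned long j)
{
    return j * 1000UL / SKETCH_HZ;
}
```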

    Signed-off-by: Nick Piggin
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The jsm driver doesn't currently use the uart_handle_*_change helper
    functions, which are the obvious place for things like linuxpps to tie
    into (which it now does, of course), and as a result the jsm driver cannot
    be used with linuxpps or anything else that ties into the
    serial_core helper functions. This patch adds calls to these helper
    functions whenever the value they manage changes. The actual storage
    of the state is not modified, since the jsm driver caches the current
    settings (the 8250 driver reads them every time a user asks for the
    state) and only updates them whenever they change.
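
    The pattern described here (keep a cached copy of the modem status and call
    a hook only for the lines that actually changed) can be sketched as
    follows. The bit values, struct and hook functions are hypothetical
    stand-ins for the driver's state and for serial_core's
    uart_handle_dcd_change()/uart_handle_cts_change(); the hooks just count
    calls.

```c
#include <stdbool.h>

/* Hypothetical modem-status bits for this sketch. */
#define SKETCH_MSR_CTS 0x10
#define SKETCH_MSR_DCD 0x80

struct port_sketch {
    unsigned char cached_msr;   /* last modem status the driver saw */
    int dcd_changes;            /* number of DCD-change hook calls */
    int cts_changes;            /* number of CTS-change hook calls */
};

/* Stand-ins for the serial_core change helpers. */
static void on_dcd_change(struct port_sketch *p, bool status) { (void)status; p->dcd_changes++; }
static void on_cts_change(struct port_sketch *p, bool status) { (void)status; p->cts_changes++; }

/* Compare the new status with the cache and call hooks only for changed bits. */
void update_modem_status(struct port_sketch *p, unsigned char msr)
{
    unsigned char changed = p->cached_msr ^ msr;

    if (changed & SKETCH_MSR_DCD)
        on_dcd_change(p, msr & SKETCH_MSR_DCD);
    if (changed & SKETCH_MSR_CTS)
        on_cts_change(p, msr & SKETCH_MSR_CTS);
    p->cached_msr = msr;
}
```

    Reporting the same status twice triggers no second hook call, because the
    cached value already matches.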

    Signed-off-by: Len Sorensen
    Cc: Scott H Kilau
    Cc: Wendy Xiong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Len Sorensen
     
  • The jsm driver fails when you try to use the TIOCSSERIAL ioctl. The reason
    is that the driver never sets uart_port.uartclk, causing the data received
    using TIOCGSERIAL to not match the internal state of the driver. This
    patch fixes this problem by setting the uartclk to the value used by the
    serial_core (16 times the baud base).
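
    The relationship is plain arithmetic: serial_core reports baud_base as
    uartclk / 16, so a driver that leaves uartclk at zero hands back a
    baud_base of zero. A tiny sketch with hypothetical helper names, using
    921600 only as an example value:

```c
/* serial_core reports baud_base as uartclk / 16, so set uartclk = 16 * baud_base. */
unsigned int uartclk_from_baud_base(unsigned int baud_base)
{
    return baud_base * 16u;
}

/* What a TIOCGSERIAL-style query derives from the stored clock. */
unsigned int baud_base_from_uartclk(unsigned int uartclk)
{
    return uartclk / 16u;
}
```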

    Signed-off-by: Len Sorensen
    Cc: Scott H Kilau
    Cc: Wendy Xiong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Len Sorensen
     
  • Switch from private uclong, etc over to standard types.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Klaus Kudielka
     
  • At least on x86_64, the present cyclades.h is broken due to the wrong size
    of uclong. This affects, of course, both the kernel and the user-level
    utilities. The symptom is that cyzload refuses to load the firmware. I
    also managed to freeze the machine when unloading the module.

    The patch below fixes this in an architecture-independent way. I have
    tested it with 2.6.19 and the driver works fine again with a Cyclades-Z on
    an Athlon 64 X2.
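
    The architecture-independent approach is to give every field of a
    structure shared with hardware or user space an explicit width (kernel
    headers exported to user space typically use __u32 and friends for this).
    A minimal illustration with hypothetical structs, not the real cyclades.h
    definitions: the uint32_t version is 12 bytes everywhere, while the
    unsigned long version is 12 bytes on 32-bit targets and 24 on x86_64.

```c
#include <stdint.h>

/* Hypothetical shared-interface struct with fixed-width fields. */
struct fw_ctrl_portable {
    uint32_t signature;
    uint32_t version;
    uint32_t flags;
};

/* The same fields as unsigned long: the size now depends on the ABI. */
struct fw_ctrl_native {
    unsigned long signature;
    unsigned long version;
    unsigned long flags;
};
```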

    [akpm@linux-foundation.org: fix warnings]

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Klaus Kudielka
     
  • Kprobes doesn't scribble on the kprobe.symbol_name field; it's only set by
    the module when registering the probe. Modules that exercise good hygiene
    by using the "const" qualifier will see warnings...

    warning: assignment discards qualifiers from pointer target type

    Make struct kprobe.symbol_name a const char *.
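
    A small illustration with a hypothetical struct standing in for struct
    kprobe: once the field is const char *, a module can point it at const
    data without the qualifier warning quoted above.

```c
/* Hypothetical probe descriptor; registration only reads the name. */
struct probe_sketch {
    const char *symbol_name;
    unsigned long offset;
};

/* A module's const-qualified symbol name. */
static const char target[] = "do_fork";

/* With a plain `char *symbol_name` this assignment would draw the warning. */
void init_probe(struct probe_sketch *p)
{
    p->symbol_name = target;
    p->offset = 0;
}
```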

    Signed-off-by: Ananth N Mavinakayanahalli
    Signed-off-by: Jim Keniston
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ananth N Mavinakayanahalli
     
  • Adds the needed TCGETS2/TCSETS2 ioctl calls, structures, defines and the like.
    Tested against the test suite and passes. Other platforms should need
    roughly the same change.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • 1) Introduces a new method in 'struct dentry_operations'. This method,
    called d_dname(), may be called from d_path() to build a pathname for
    special filesystems. It is called without locks.

    Future patches (if we succeed in having one common dentry for all
    pipes/sockets) may need to change the prototype of this method, but for
    now we use: char *d_dname(struct dentry *dentry, char *buffer, int buflen);

    2) Adds a dynamic_dname() helper function that eases d_dname() implementations.

    3) Defines a d_dname method for sockets: no more sprintf() at socket
    creation. This is delayed until the moment someone accesses
    /proc/pid/fd/...

    4) Defines a d_dname method for pipes: no more sprintf() at pipe
    creation. This is delayed until the moment someone accesses
    /proc/pid/fd/...
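
    A user-space sketch of the kind of helper described in 2), plus a pipe
    callback that uses it. It formats the name only when asked and places it
    at the end of the caller's buffer, which is where d_path() builds names
    from. The function names are hypothetical, and returning NULL when the
    name does not fit is an assumption of this sketch; "pipe:[%lu]" is the
    name format pipes show under /proc/pid/fd.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format a name on demand and right-align it (with its NUL) in buffer. */
char *dname_sketch(char *buffer, int buflen, const char *fmt, ...)
{
    char temp[64];
    va_list args;
    int sz;

    va_start(args, fmt);
    sz = vsnprintf(temp, sizeof(temp), fmt, args) + 1;  /* count the NUL */
    va_end(args);

    if (sz <= 0 || sz > (int)sizeof(temp) || sz > buflen)
        return NULL;                                    /* does not fit */
    buffer += buflen - sz;
    return memcpy(buffer, temp, sz);
}

/* Hypothetical pipe callback: the name is built only when someone asks. */
char *pipe_dname_sketch(unsigned long ino, char *buffer, int buflen)
{
    return dname_sketch(buffer, buflen, "pipe:[%lu]", ino);
}
```

    Nothing is formatted at pipe() time; the cost moves to the rare reader of
    /proc/pid/fd.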

    A benchmark consisting of 1,000,000 calls to pipe()/close()/close() gives a
    *nice* speedup on my Pentium M 1.6 GHz:

    3.090 s instead of 3.450 s

    Signed-off-by: Eric Dumazet
    Acked-by: Christoph Hellwig
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Add support for finding out the current file position, open flags and
    possibly other info in the future.

    These new entries are added:

    /proc/PID/fdinfo/FD
    /proc/PID/task/TID/fdinfo/FD

    For each fd the information is provided in the following format:

    pos: 1234
    flags: 0100002
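
    A user-space sketch for reading these entries. parse_fdinfo() and
    read_fdinfo() are hypothetical helpers; the only assumptions are the two
    line formats shown above, with flags printed in octal (hence %o). Newer
    kernels append further lines, which this parser simply ignores.

```c
#include <stdio.h>
#include <string.h>

/* Extract pos and (octal) flags; returns 0 on success, -1 if a field is missing. */
int parse_fdinfo(const char *text, long long *pos, unsigned int *flags)
{
    const char *p = strstr(text, "pos:");
    const char *f = strstr(text, "flags:");

    if (!p || !f)
        return -1;
    if (sscanf(p + 4, "%lld", pos) != 1 || sscanf(f + 6, "%o", flags) != 1)
        return -1;
    return 0;
}

/* Read /proc/self/fdinfo/<fd> into buf (len > 0); returns 0 or -1. */
int read_fdinfo(int fd, char *buf, size_t len)
{
    char path[64];
    FILE *fp;
    size_t n;

    if (len == 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    n = fread(buf, 1, len - 1, fp);
    buf[n] = '\0';
    fclose(fp);
    return 0;
}
```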

    [bunk@stusta.de: make struct proc_fdinfo_file_operations static]
    Signed-off-by: Miklos Szeredi
    Cc: Alexey Dobriyan
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Change the order of fields of struct pid_entry (file fs/proc/base.c) in order
    to avoid a hole on 64-bit archs (8 bytes saved per object).

    Also change all pid_entry arrays to be const qualified, to make clear they
    must not be modified.

    Before (on x86_64):

    # size fs/proc/base.o
       text    data     bss     dec     hex filename
      15549    2192       0   17741    454d fs/proc/base.o

    After:

    # size fs/proc/base.o
       text    data     bss     dec     hex filename
      17229     176       0   17405    43fd fs/proc/base.o

    That's 336 bytes saved in kernel size on x86_64.
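
    The per-object saving comes from field ordering alone. The structs below
    are illustrative and only loosely shaped like struct pid_entry (the field
    list is an assumption); on an LP64 target such as x86_64, placing the two
    4-byte members next to each other removes two 4-byte holes, shrinking the
    struct from 48 to 40 bytes.

```c
/* int before a pointer, and a lone 4-byte field between pointers: two holes. */
struct entry_before {
    int len;
    const char *name;
    unsigned int mode;
    const void *iop;
    const void *fop;
    const void *op;
};

/* Same fields, with the two 4-byte members sharing one 8-byte slot. */
struct entry_after {
    const char *name;
    int len;
    unsigned int mode;
    const void *iop;
    const void *fop;
    const void *op;
};
```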

    Signed-off-by: Eric Dumazet
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Signed-off-by: Amit Choudhary
    Cc: Paul Fulghum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amit Choudhary
     
  • Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • Signed-off-by: Robert P. J. Day
    Cc: Markus Lidel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
    information about the memory location and usage of processes. Issues:

    - maps should not be world-readable, especially if programs expect any
    kind of ASLR protection from local attackers.
    - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
    check the maps when %n is in a *printf call, and a setuid(getuid())
    process wouldn't be able to read its own maps file. (For reference
    see http://lkml.org/lkml/2006/1/22/150)
    - a system-wide toggle is needed to allow prior behavior in the case of
    non-root applications that depend on access to the maps contents.

    This change implements a check using "ptrace_may_attach" before allowing
    access to read the maps contents. To control this protection, the new knob
    /proc/sys/kernel/maps_protect has been added, with corresponding updates to
    the procfs documentation.

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: New sysctl numbers are old hat]
    Signed-off-by: Kees Cook
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • WARNING: vmlinux - Section mismatch: reference to
    .init.text:eisa_root_register from .text between 'virtual_eisa_root_init' (at
    offset 0xc026b80f) and 'cpufreq_debug_disable_ratelimit'

    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • It misspelled "MODVERSIONS" preprocessor variable with "CONFIG_MODVERSIONS".
    Just kill it all.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • This patch kills the "ignoring return value of 'device_create_file'"
    warning message.

    Signed-off-by: Monakhov Dmitriy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitriy Monakhov
     
  • Clean up the use of pci_find_device in drivers/telephony/ixj.c.

    Signed-off-by: Surya Prabhakar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Surya
     
  • Add support for devices living in MMIO space to the Infineon TPM
    driver. These can be found on some of the newer HP ia64 systems.

    Signed-off-by: Alex Williamson
    Cc: Kylene Jo Hall
    Acked-by: Marcel Selhorst
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Williamson
     
  • We don't have a routine called namei() anymore since at least 2.3.x, and
    the comment is just totally out of sync with the current lookup logic.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • inode->i_sb is always set, so there is no need to check for it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • WARN_ON(de && de->deleted); is sooo unreliable. Why?

    proc_lookup                         remove_proc_entry
    ===========                         =================
    lock_kernel();
    spin_lock(&proc_subdir_lock);
    [find proc entry]
    spin_unlock(&proc_subdir_lock);
                                        spin_lock(&proc_subdir_lock);
                                        [find proc entry]

    proc_get_inode
    ==============
    WARN_ON(de && de->deleted);         ...

                                        if (!atomic_read(&de->count))
                                                free_proc_entry(de);
                                        else
                                                de->deleted = 1;

    So if you have some strange oops [1] and don't see this WARN_ON, it means
    nothing.

    [1] try_module_get() of a module which doesn't exist; two lines below
    should suffice, or not?

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Fix the following race:

    proc_readdir                        remove_proc_entry
    ============                        =================

    spin_lock(&proc_subdir_lock);
    [choose PDE to start filldir from]
    spin_unlock(&proc_subdir_lock);
                                        spin_lock(&proc_subdir_lock);
                                        [find PDE]
                                        [free PDE, refcount is 0]
                                        spin_unlock(&proc_subdir_lock);
    /* boom */
    if (filldir(dirent, de->name, ...
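
    The usual cure for this kind of race is to pin the entry with a reference
    while the lock is still held and let the last reference holder free it;
    the "[de_put on error path]" note suggests this patch does something along
    those lines. Below is a hypothetical single-file model of that pattern: a
    C11 atomic_flag plays the role of proc_subdir_lock, and the struct and
    function names are invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a proc entry. */
struct pde_sketch {
    int count;       /* references held by readers such as readdir */
    bool deleted;    /* set by remove when readers still hold it */
};

static atomic_flag subdir_lock = ATOMIC_FLAG_INIT;
int pde_frees;       /* counts frees so the behavior can be checked */

static void subdir_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&subdir_lock))
        ;            /* spin, like spin_lock() */
}

static void subdir_lock_release(void)
{
    atomic_flag_clear(&subdir_lock);
}

/* readdir side: take the reference BEFORE dropping the lock. */
struct pde_sketch *pde_pin(struct pde_sketch *de)
{
    subdir_lock_acquire();
    de->count++;
    subdir_lock_release();
    return de;
}

/* Drop a reference; the last holder frees an entry removed meanwhile. */
void pde_put(struct pde_sketch *de)
{
    bool do_free;

    subdir_lock_acquire();
    do_free = --de->count == 0 && de->deleted;
    subdir_lock_release();
    if (do_free) {
        free(de);
        pde_frees++;
    }
}

/* remove side: free now only if unreferenced, otherwise defer to pde_put(). */
void pde_remove(struct pde_sketch *de)
{
    bool do_free;

    subdir_lock_acquire();
    do_free = de->count == 0;
    if (!do_free)
        de->deleted = true;
    subdir_lock_release();
    if (do_free) {
        free(de);
        pde_frees++;
    }
}
```

    With the reference taken under the lock, a concurrent remove only marks
    the entry, and the "boom" line above dereferences memory that is still
    live.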

    [de_put on error path --adobriyan]
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • proc_lookup                         remove_proc_entry
    ===========                         =================

    lock_kernel();
    spin_lock(&proc_subdir_lock);
    [find PDE with refcount 0]
    spin_unlock(&proc_subdir_lock);
                                        spin_lock(&proc_subdir_lock);
                                        [find PDE with refcount 0]
                                        [check refcount and free PDE]
                                        spin_unlock(&proc_subdir_lock);
    proc_get_inode:
    de_get(de); /* boom */

    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There's a slight problem with filesystem type representation in fuse
    based filesystems.

    From the kernel's view, there are just two filesystem types: fuse and
    fuseblk. From the user's view there are lots of different filesystem
    types. The user is not even much concerned if the filesystem is fuse based
    or not. So there's a conflict of interest in how this should be
    represented in fstab, mtab and /proc/mounts.

    The current scheme is to encode the real filesystem type in the mount
    source. So an sshfs mount looks like this:

    sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,...

    This url-ish syntax works OK for sshfs and similar filesystems. However,
    for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
    the kernel expects the mount source to be a real device name.

    A possibly better scheme would be to encode the real type in the type
    field as "type.subtype". So fuse mounts would look like this:

    /dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,...
    user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,...

    This patch adds the necessary code to the kernel so that this can be
    correctly displayed in /proc/mounts.
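
    Splitting and joining the "type.subtype" form is simple string work. A
    hypothetical sketch (these helpers are not part of the kernel or of
    mount(8)) that splits at the first '.', so a subtype such as ntfs-3g is
    kept intact:

```c
#include <stdio.h>
#include <string.h>

/*
 * Split "fuseblk.ntfs-3g" into type "fuseblk" and subtype "ntfs-3g".
 * subtype is "" when there is no dot. Returns -1 if type does not fit.
 */
int split_fstype(const char *fstype, char *type, size_t type_len,
                 const char **subtype)
{
    const char *dot = strchr(fstype, '.');
    size_t n = dot ? (size_t)(dot - fstype) : strlen(fstype);

    if (n + 1 > type_len)
        return -1;
    memcpy(type, fstype, n);
    type[n] = '\0';
    *subtype = dot ? dot + 1 : "";
    return 0;
}

/* Join type and optional subtype the way the mount table field shows them. */
int format_fstype(char *out, size_t len, const char *type, const char *subtype)
{
    if (subtype && *subtype)
        return snprintf(out, len, "%s.%s", type, subtype);
    return snprintf(out, len, "%s", type);
}
```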

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • strlcpy already accounts for the trailing zero in its length
    computation, so there is no need to subtract one from the buffer size.
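
    The strlcpy() contract (copy at most size - 1 bytes, always NUL-terminate
    when size is nonzero, return the length of the source) is why passing the
    full size of the buffer is correct. The sketch below reimplements that
    contract under a different name, to avoid clashing with a libc that
    already provides strlcpy():

```c
#include <string.h>

/* Copy at most size - 1 bytes, NUL-terminate, return strlen(src). */
size_t strlcpy_sketch(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size) {
        size_t n = len >= size ? size - 1 : len;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;   /* len >= size signals truncation */
}
```

    With char buf[8], strlcpy_sketch(buf, "earlyvga", sizeof(buf)) stores
    "earlyvg" and returns 8, so the caller can see the name was truncated.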

    Signed-off-by: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
     
  • Epoll is doing multiple passes over the ready set at the moment, because of
    the constraints over the f_op->poll() call. Looking at the code again, I
    noticed that we already hold the epoll semaphore in read, and this
    (together with other locking conditions that hold while doing an
    epoll_wait()) can lead to a smarter way [1] to "ship" events to userspace
    (in a single pass).
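
    For readers less familiar with the user-side view: events ready at the
    time of the call are copied into the caller's array by a single
    epoll_wait(). A minimal Linux user-space round trip with a pipe
    (epoll_pipe_demo is a hypothetical name; epoll_create1() is a later
    addition, and epoll_create() works the same way here):

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of ready events (expected 1), or -1 on error. */
int epoll_pipe_demo(void)
{
    int fds[2], epfd, n = -1;
    struct epoll_event ev = { .events = EPOLLIN }, out[10];

    if (pipe(fds) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd >= 0) {
        ev.data.fd = fds[0];
        /* Watch the read end, then make it readable. */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            write(fds[1], "x", 1) == 1)
            n = epoll_wait(epfd, out, 10, 1000);  /* up to 10 events, 1 s */
        if (n == 1 && !(out[0].events & EPOLLIN))
            n = -1;                               /* unexpected event type */
        close(epfd);
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```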

    This is a stress application that can be used to test the new code. It
    spawns multiple threads and calls epoll_wait() and epoll_ctl() from many
    threads. Stress tested on my dual Opteron 254 without any problems.

    http://www.xmailserver.org/totalmess.c

    This is not a benchmark, just something that tries to stress and exploit
    possible problems with the new code.
    Also, I made a stupid micro-benchmark:

    http://www.xmailserver.org/epwbench.c

    [1] Considering that epoll must be thread-safe, there are five ways we can
    be hit during an epoll_wait() transfer loop (ep_send_events()):

    1) The epoll fd going away and calling ep_free
    This just can't happen, since we did an fget() in sys_epoll_wait

    2) An epoll_ctl(EPOLL_CTL_DEL)
    This can't happen because epoll_ctl() gets ep->sem in write, and
    we're holding it in read during ep_send_events()

    3) An fd stored inside the epoll fd going away
    This can't happen because in eventpoll_release_file() we get
    ep->sem in write, and we're holding it in read during
    ep_send_events()

    4) Another epoll_wait() happening on another thread
    They can both be inside ep_send_events() at the same time; since we get
    (splice) the ready list under the spinlock, each one will get
    its own ready list. Note that an fd cannot be inside more than one
    ready list at the same time, because ep_poll_callback() will
    not re-queue it if it sees it already linked:

    if (ep_is_linked(&epi->rdllink))
        goto is_linked;

    Another case that can happen is two concurrent epoll_wait() calls
    coming in with a userspace event buffer of size, say, ten.
    Suppose there are 50 events ready in the list. The first
    epoll_wait() will "steal" the whole list, while the second, seeing
    no events, will go to sleep. But at the end of ep_send_events() in
    the first epoll_wait(), we will re-inject surplus ready fds, and we
    will trigger the proper wake_up to the second epoll_wait().

    5) ep_poll_callback() hitting us asynchronously
    This is the tricky part. As I said above, the ep_is_linked() test
    done inside ep_poll_callback() guarantees that as long as the
    item is linked to a list, ep_poll_callback() will not try
    to re-queue it again (read or write data on any of its members). When
    we do a list_del() in ep_send_events(), the item will still satisfy
    the ep_is_linked() test (whatever data is written in prev/next,
    it'll never be its own pointer), so ep_poll_callback() will still
    leave us alone. It's only after the eventual smp_mb()+INIT_LIST_HEAD(&epi->rdllink)
    that it'll become visible to ep_poll_callback(), but at that point
    we're already past it.
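
    The linkage argument in 5) can be checked with a small model of the list
    helpers. In this hypothetical sketch, list_del_sketch() leaves arbitrary
    poison values in the node (the kernel's list_del() likewise leaves the
    pointers aimed away from the node), so the ep_is_linked()-style test
    still reports "linked" until the node is reinitialized. Memory barriers
    are deliberately omitted, since the model is single-threaded.

```c
#include <stdbool.h>

/* Minimal doubly linked list in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

/* Arbitrary poison values standing in for what list_del() leaves behind. */
#define SKETCH_POISON1 ((struct list_head *)0x100)
#define SKETCH_POISON2 ((struct list_head *)0x200)

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail_sketch(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n without making it point at itself. */
static void list_del_sketch(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = SKETCH_POISON1;
    n->prev = SKETCH_POISON2;
}

/* ep_is_linked()-style test: "unlinked" means next points back at itself. */
bool is_linked(const struct list_head *p) { return p->next != p; }
```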

    [akpm@osdl.org: 80 cols]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi