09 Jan, 2006

40 commits

  • A long time ago, struct dentry was carefully tuned so that on 32-bit
    UP, sizeof(struct dentry) was exactly 128 bytes, i.e. a power of 2 and a
    multiple of the cache line size.

    Then RCU was added and struct dentry grew by two pointers, with good
    results for SMP but not for UP, because it broke the above tuning
    (128 + 8 = 136 bytes).

    This patch reverts this unwanted side effect by using a union (d_u) in
    which d_rcu and d_child are placed so that the two fields share the same
    memory.

    At the time d_free() is called (and d_rcu is really used), d_child is known
    to be empty and not touched by the dentry freeing.

    Lockless lookups only access d_name, d_parent, d_lock, d_op and d_flags, so
    the previous content of d_child is not needed if the dentry was unhashed
    but is still being accessed by a CPU because of RCU constraints.

    As the dentry cache easily contains millions of entries, the size reduction
    is worth the extra complexity of the ugly C union.
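
    A minimal user-space sketch of the layout trick, assuming simplified
    stand-ins for struct list_head and struct rcu_head (two pointers each) and
    a hypothetical toy dentry that keeps only pointer-sized fields, so that no
    padding muddies the comparison. The union saves exactly one struct rcu_head
    per object.

```c
#include <stddef.h>

/* Simplified stand-ins: both are two pointers wide. */
struct list_head { struct list_head *next, *prev; };
struct rcu_head { struct rcu_head *next; void (*func)(struct rcu_head *head); };

/* Hypothetical toy dentry before the patch: separate d_child and d_rcu. */
struct toy_dentry_before {
	const char *d_name;
	struct toy_dentry_before *d_parent;
	struct list_head d_child;	/* link in the parent's list of children */
	struct rcu_head d_rcu;		/* only used once the dentry is being freed */
};

/* After the patch: the two fields are never live at the same time,
 * so they can overlay each other in a union. */
struct toy_dentry_after {
	const char *d_name;
	struct toy_dentry_after *d_parent;
	union {
		struct list_head d_child;
		struct rcu_head d_rcu;
	} d_u;
};

/* Bytes saved per dentry by the union. */
size_t toy_dentry_savings(void)
{
	return sizeof(struct toy_dentry_before) - sizeof(struct toy_dentry_after);
}
```

    On a 32-bit machine each pointer is 4 bytes, so the union saves 8 bytes,
    the same 8 bytes that took the real structure from 128 to 136.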

    Signed-off-by: Eric Dumazet
    Cc: Dipankar Sarma
    Cc: Maneesh Soni
    Cc: Miklos Szeredi
    Cc: "Paul E. McKenney"
    Cc: Ian Kent
    Cc: Paul Jackson
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Neil Brown
    Cc: James Morris
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • It would be helpful if the kernel did not silently stop parsing NFS
    options, but instead warned about any it does not recognize. The
    attached patch adds one printk to do just that.
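
    A minimal stand-alone sketch of the requested behaviour: walk a
    comma-separated option string and warn about every unrecognized option
    instead of silently stopping. The option table and function names are
    hypothetical, and fprintf() stands in for printk(); the real NFS mount
    code uses its own parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option table; the real NFS mount code has its own. */
static const char *const known_opts[] = { "rsize", "wsize", "timeo", "soft", "hard", NULL };

static int is_known_opt(const char *name, size_t len)
{
	size_t i;

	for (i = 0; known_opts[i] != NULL; i++)
		if (strlen(known_opts[i]) == len && strncmp(known_opts[i], name, len) == 0)
			return 1;
	return 0;
}

/* Parse "name[=value],name[=value],..." and warn about every unrecognized
 * option instead of stopping. Returns the number of unknown options. */
int parse_mount_opts(const char *opts)
{
	const char *p = opts;
	int unknown = 0;

	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		const char *eq = memchr(p, '=', len);
		size_t name_len = eq ? (size_t)(eq - p) : len;

		/* Empty segments (",,") are skipped rather than reported. */
		if (len > 0 && !is_known_opt(p, name_len)) {
			fprintf(stderr, "warning: unrecognized mount option '%.*s'\n", (int)len, p);
			unknown++;
		}
		if (!end)
			break;
		p = end + 1;
	}
	return unknown;
}
```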

    It took me a couple of hours to find my configuration mistake.

    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jorn Dreyer
     
  • The only user of send_sigio_to_task() already holds tasklist_lock, so it is
    better not to send the signal via send_group_sig_info() (which takes
    tasklist_lock recursively) but to use group_send_sig_info().

    The same change is made in send_sigurg()->send_sigurg_to_task().

    Signed-off-by: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This patch removes unneeded sig->curr_target recalculation under 'if
    (atomic_dec_and_test(&sig->count))' in __exit_signal().

    When sig->count == 0 the signal can't be sent to this task and
    next_thread(tsk) == tsk anyway.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • uniq -d MAINTAINERS

    Signed-off-by: Nicolas Kaiser
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Kaiser
     
  • There are places in the resize code where the EXT3_SB() macro is used after
    a statement like sbi = EXT3_SB(sb) has already been executed, so within the
    same function both sbi and EXT3_SB() are used to reference the super block.
    Although this is not wrong, keeping it consistent improves legibility, IMHO.

    Signed-off-by: Glauber de Oliveira Costa
    Cc: "Stephen C. Tweedie"
    Cc: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber de Oliveira Costa
     
  • Remove the trailing newlines in calls to ext3_warning(). This function
    already adds a trailing newline to the end of messages.

    Signed-off-by: Glauber de Oliveira Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber de Oliveira Costa
     
  • Make oprofile's alloc_cpu_buffers() function NUMA-aware, allocating each
    per-CPU buffer on that CPU's local memory node if possible.

    Signed-off-by: Eric Dumazet
    Cc: Philippe Elie
    Cc: John Levon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • The patch below adds a new mount option to allow the external journal
    device to be specified.

    The syntax is as follows:
    # mount -t ext3 -o journal_dev=0x0820 ...
    where 0x0820 means major=8 and minor=32.
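
    The 0x0820 example fits the kernel's "new" dev_t encoding, where the low
    byte holds the low 8 bits of the minor number and the next 12 bits hold the
    major number. A minimal sketch of decoding the option value, assuming it is
    parsed as an integer in any C base (so 0x0820 and 2080 both work); the
    helper names are hypothetical, and the bit layout mirrors the kernel's
    new_decode_dev() helper.

```c
#include <stdlib.h>

/* Split a 32-bit device number into major/minor with the same layout as
 * new_decode_dev(): minor bits 0-7, major bits 8-19, and minor bits 8-19
 * stored in bits 20-31 of the encoded value. */
void toy_decode_dev(unsigned long dev, unsigned *major, unsigned *minor)
{
	*major = (unsigned)((dev & 0xfff00) >> 8);
	*minor = (unsigned)((dev & 0xff) | ((dev >> 12) & 0xfff00));
}

/* Parse a journal_dev= value such as "0x0820"; 0 on success, -1 on junk. */
int toy_parse_journal_dev(const char *s, unsigned *major, unsigned *minor)
{
	char *end;
	unsigned long dev = strtoul(s, &end, 0);	/* base 0: 0x.., 0.., or decimal */

	if (end == s || *end != '\0')
		return -1;
	toy_decode_dev(dev, major, minor);
	return 0;
}
```

    For 0x0820 this yields major 8 and minor 32, matching the example above.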

    Signed-off-by: Johann Lombardi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johann Lombardi
     
  • Small cleanups in shared mounts code.

    Signed-off-by: Miklos Szeredi
    Cc: Ram Pai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Brown
     
  • Thanks to Nathan Lynch for the review and comments. Thanks to Joel Schopp
    for the pointer to add user space scripts.

    Signed-off-by: Ashok Raj
    Signed-off-by: Nathan Lynch
    Signed-off-by: Joel Schopp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashok Raj
     
  • According to the TCG specifications, measurements (hashes) of the BIOS code
    and data are extended into TPM PCRs, and a log of these extensions is kept
    in an ACPI table for later validation if desired. This patch exports the
    values in the ACPI table through a securityfs seq_file.
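
    For readers new to the TPM side: in TPM 1.2 a PCR cannot be written, only
    extended, new_PCR = SHA1(old_PCR || measurement), and the event log lets a
    verifier recompute the expected PCR value by replaying every logged
    measurement. A self-contained sketch of that replay, assuming SHA-1 PCRs
    that start at all zeros and a log reduced to back-to-back 20-byte digests
    (real ACPI log entries also carry a PCR index, event type and event data).
    The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 20	/* SHA-1 digest size, also the TPM 1.2 PCR size */

static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* SHA-1 compression function for one 64-byte block (FIPS 180). */
static void sha1_block(uint32_t h[5], const uint8_t blk[64])
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
		       (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
	for (i = 16; i < 80; i++)
		w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20) {
			f = (b & c) | (~b & d); k = 0x5A827999;
		} else if (i < 40) {
			f = b ^ c ^ d; k = 0x6ED9EBA1;
		} else if (i < 60) {
			f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC;
		} else {
			f = b ^ c ^ d; k = 0xCA62C1D6;
		}
		t = rol32(a, 5) + f + e + k + w[i];
		e = d; d = c; c = rol32(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

void sha1(const uint8_t *msg, size_t len, uint8_t out[DIGEST_LEN])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	uint64_t bits = (uint64_t)len * 8;
	uint8_t blk[64];
	size_t done = 0, rem;
	int i;

	for (; len - done >= 64; done += 64)
		sha1_block(h, msg + done);
	rem = len - done;
	if (rem)
		memcpy(blk, msg + done, rem);
	blk[rem++] = 0x80;			/* padding starts with a single 1 bit */
	if (rem > 56) {				/* no room left for the length field */
		memset(blk + rem, 0, 64 - rem);
		sha1_block(h, blk);
		rem = 0;
	}
	memset(blk + rem, 0, 56 - rem);
	for (i = 0; i < 8; i++)			/* message length in bits, big-endian */
		blk[56 + i] = (uint8_t)(bits >> (56 - 8 * i));
	sha1_block(h, blk);
	for (i = 0; i < DIGEST_LEN; i++)
		out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* TPM 1.2 style extend: pcr = SHA1(pcr || digest). */
void pcr_extend(uint8_t pcr[DIGEST_LEN], const uint8_t digest[DIGEST_LEN])
{
	uint8_t buf[2 * DIGEST_LEN];

	memcpy(buf, pcr, DIGEST_LEN);
	memcpy(buf + DIGEST_LEN, digest, DIGEST_LEN);
	sha1(buf, sizeof(buf), pcr);
}

/* Replay n logged digests (stored back to back) into an all-zero PCR. */
void pcr_replay(const uint8_t *digests, size_t n, uint8_t pcr[DIGEST_LEN])
{
	size_t i;

	memset(pcr, 0, DIGEST_LEN);
	for (i = 0; i < n; i++)
		pcr_extend(pcr, digests + i * DIGEST_LEN);
}
```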

    Signed-off-by: Seiji Munetoh
    Signed-off-by: Stefan Berger
    Signed-off-by: Reiner Sailer
    Signed-off-by: Kylene Hall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kylene Jo Hall
     
  • zap_other_threads() sets SIGNAL_GROUP_EXIT at the very start, so
    do_group_exit() doesn't need to do it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • __group_complete_signal() sets ->group_stop_count in the sig_kernel_coredump()
    path and marks the target thread as ->group_exit_task, so every thread
    except group_exit_task will go to handle_group_stop()->finish_stop().

    However, when group_exit_task actually starts do_coredump(), it sets
    SIGNAL_GROUP_EXIT but does not reset ->group_stop_count while killing the
    other threads. Any threads in the same thread group that have not yet
    stopped will all spin in kernel mode until group_exit_task sends them
    SIGKILL, because ->group_stop_count > 0 means:

    - recalc_sigpending_tsk() never clears TIF_SIGPENDING;
    - get_signal_to_deliver() goes to handle_group_stop();
    - handle_group_stop() returns when SIGNAL_GROUP_EXIT is set;
    - syscall_exit/resume_userspace notice TIF_SIGPENDING and
      call get_signal_to_deliver() again.

    So we are wasting CPU cycles, and if one of these threads is an rt_task(),
    this may be a serious problem.

    NOTE: do_coredump() holds ->mmap_sem, so threads that have not stopped
    can't escape coredumping after ->group_stop_count is cleared.

    See also this thread: http://marc.theaimsgroup.com/?t=112739139900002

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Use symbolic names instead of hardcoded constants.

    Signed-off-by: Oleg Nesterov
    Acked-by: Harald Welte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • We've had two instances recently of overflows when doing

    64_bit_value = (32_bit_value << PAGE_CACHE_SHIFT)

    I did a tree-wide grep of `<page_base)
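
    The overflow is easy to reproduce outside the kernel: shifting a 32-bit
    unsigned value is done in 32 bits and only then widened, so the high bits
    are lost. A minimal sketch, assuming 4 KiB pages (a shift of 12); the
    function names are hypothetical, and the fix is simply to widen the operand
    before shifting.

```c
#include <stdint.h>

#define TOY_PAGE_CACHE_SHIFT 12	/* assumption: 4 KiB pages */

/* Buggy: where int is 32 bits, uint32_t is not promoted, so the shift
 * wraps modulo 2^32 before the result is widened to 64 bits. */
uint64_t page_pos_buggy(uint32_t index)
{
	uint64_t pos = index << TOY_PAGE_CACHE_SHIFT;
	return pos;
}

/* Fixed: widen the page index first, then shift in 64 bits. */
uint64_t page_pos_fixed(uint32_t index)
{
	return (uint64_t)index << TOY_PAGE_CACHE_SHIFT;
}
```

    With index 0x100000 (the page at 4 GiB) the buggy version wraps to 0, while
    the fixed version returns 0x100000000.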

    Cc: Oleg Drokin
    Cc: David Howells
    Cc: David Woodhouse
    Cc: Christoph Hellwig
    Cc: Anton Altaparmakov
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Roman Zippel
    Cc: Miklos Szeredi
    Cc: Russell King
    Cc: Trond Myklebust
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • __create_workqueue() did not check the return value of alloc_percpu(), so
    a NULL dereference was possible.
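
    A minimal user-space sketch of the missing check, assuming hypothetical
    stand-ins for the workqueue structures. The per-CPU allocator is passed in
    as a function pointer only so that a failure can be simulated; the point is
    that a failed per-CPU allocation must unwind and return NULL instead of
    leaving a NULL pointer to be dereferenced later.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the workqueue and its per-CPU data. */
struct toy_cpu_wq { int pending; };

struct toy_workqueue {
	const char *name;
	struct toy_cpu_wq *cpu_wq;	/* models the alloc_percpu() result */
};

typedef void *(*toy_alloc_fn)(size_t size);

struct toy_workqueue *toy_create_workqueue(const char *name, size_t ncpus,
					   toy_alloc_fn alloc_percpu_like)
{
	struct toy_workqueue *wq = malloc(sizeof(*wq));

	if (!wq)
		return NULL;
	wq->name = name;
	wq->cpu_wq = alloc_percpu_like(ncpus * sizeof(struct toy_cpu_wq));
	if (!wq->cpu_wq) {		/* the check that was missing */
		free(wq);
		return NULL;
	}
	return wq;
}

/* Uses free(), so pair it with a malloc()-compatible allocator. */
void toy_destroy_workqueue(struct toy_workqueue *wq)
{
	free(wq->cpu_wq);
	free(wq);
}

/* Simulates an allocation failure. */
void *toy_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```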

    Signed-off-by: Ben Collins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Collins
     
  • Sent by Paul Clements, who needs to read
    Documentation/SubmittingPatches.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    taneli.vahakangas@netsonic.fi
     
  • HDIO_GETGEO is implemented in most block drivers, and all of them have to
    duplicate the code to copy the structure to userspace, as well as getting
    the start sector. This patch moves that to common code [1] and adds a
    ->getgeo method to fill out the raw kernel hd_geometry structure. For many
    drivers this means ->ioctl can go away now.

    [1] the s390 block drivers are odd in this respect. xpram always sets
    ->start to 4, which seems more than odd, and the dasd driver shifts the
    start offset around, probably because of its non-standard sector size.
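
    A user-space sketch of the split described above, with hypothetical
    stand-ins for the kernel types (the geometry fields mirror the names in
    struct hd_geometry): the common code sets ->start, asks the driver's
    ->getgeo method for heads, sectors and cylinders, and does the single copy
    out, with memcpy() standing in for copy_to_user(). The example driver's
    64-head, 32-sector geometry is invented for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_geometry {
	unsigned char heads;
	unsigned char sectors;
	unsigned short cylinders;
	unsigned long start;
};

struct toy_disk;

struct toy_block_ops {
	/* Driver fills heads/sectors/cylinders; ->start is already set. */
	int (*getgeo)(struct toy_disk *disk, struct toy_geometry *geo);
};

struct toy_disk {
	const struct toy_block_ops *ops;
	unsigned long capacity;		/* in 512-byte sectors */
	unsigned long start_sect;	/* partition start offset */
};

/* Common HDIO_GETGEO handling shared by all drivers. */
int toy_hdio_getgeo(struct toy_disk *disk, struct toy_geometry *user_buf)
{
	struct toy_geometry geo;
	int ret;

	if (!disk->ops->getgeo)
		return -ENOTTY;
	memset(&geo, 0, sizeof(geo));
	geo.start = disk->start_sect;
	ret = disk->ops->getgeo(disk, &geo);
	if (ret)
		return ret;
	memcpy(user_buf, &geo, sizeof(geo));	/* stands in for copy_to_user() */
	return 0;
}

/* Example driver: fake geometry derived from the capacity. */
static int toy_driver_getgeo(struct toy_disk *disk, struct toy_geometry *geo)
{
	geo->heads = 64;
	geo->sectors = 32;
	geo->cylinders = (unsigned short)(disk->capacity / (64 * 32));
	return 0;
}

static const struct toy_block_ops toy_driver_ops = { toy_driver_getgeo };
static const struct toy_block_ops toy_no_getgeo_ops = { NULL };
```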

    Signed-off-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: Jeff Dike
    Cc: Paolo Giarrusso
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Neil Brown
    Cc: Markus Lidel
    Cc: Russell King
    Cc: David Woodhouse
    Cc: Martin Schwidefsky
    Cc: James Bottomley
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xose Vazquez Perez
     
  • While rooting around in the signal code trying to understand how to fix the
    SIG_IGN ploy (set a signal handler to SIG_IGN and flood the system with
    high-speed repeating timers), I came across what I think is a problem in
    sigaction(): when processing a SIG_IGN request, it flushes signals 1 to
    SIGRTMIN and leaves the rest. This patch attempts to fix that.
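
    A hedged toy model of the intent of the fix: discarding a pending signal
    whose action becomes SIG_IGN must work for every signal number, including
    the real-time signals, which on a 32-bit machine live in the second word of
    a 64-signal set. The set layout, the 64-signal limit and all names are
    assumptions for illustration; this does not reproduce the kernel's
    sigaction() code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NSIG	64	/* assumption: 64 signals, numbered 1..64 */
#define TOY_BPW		32	/* bits per word, as on a 32-bit machine */
#define TOY_NWORDS	(TOY_NSIG / TOY_BPW)

/* Hypothetical multi-word pending set: signal n is bit (n - 1). */
struct toy_sigset {
	uint32_t word[TOY_NWORDS];
};

static uint32_t toy_bit(int sig) { return UINT32_C(1) << ((sig - 1) % TOY_BPW); }
static int toy_word(int sig) { return (sig - 1) / TOY_BPW; }

void toy_sigaddset(struct toy_sigset *set, int sig)
{
	set->word[toy_word(sig)] |= toy_bit(sig);
}

bool toy_sigismember(const struct toy_sigset *set, int sig)
{
	return (set->word[toy_word(sig)] & toy_bit(sig)) != 0;
}

/* Discard a pending signal whose action became SIG_IGN. Choosing the word
 * from the signal number lets signals in the upper word be flushed too. */
void toy_flush_ignored(struct toy_sigset *pending, int sig)
{
	pending->word[toy_word(sig)] &= ~toy_bit(sig);
}
```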

    Signed-off-by: George Anzinger
    Cc: Roland McGrath
    Cc: Linus Torvalds
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    George Anzinger
     
  • Make it possible for a running process (such as gssapid) to be able to
    instantiate a key, as was requested by Trond Myklebust for NFS4.

    The patch makes the following changes:

    (1) A new, optional key type method has been added. This permits a key type
    to intercept requests at the point /sbin/request-key is about to be
    spawned and do something else with them - passing them over the
    rpc_pipefs files or netlink sockets for instance.

    The uninstantiated key, the authorisation key and the intended operation
    name are passed to the method.

    (2) The callout_info is no longer passed as an argument to /sbin/request-key
    to prevent unauthorised viewing of this data using ps or by looking in
    /proc/pid/cmdline.

    This means that the old /sbin/request-key program will not work with the
    patched kernel as it will expect to see an extra argument that is no
    longer there.

    A revised keyutils package will be made available tomorrow.

    (3) The callout_info is now attached to the authorisation key. Reading this
    key will retrieve the information.

    (4) A new field has been added to the task_struct. This holds the
    authorisation key currently active for a thread. Searches now look here
    for the caller's set of keys rather than looking for an auth key in the
    lowest level of the session keyring.

    This permits a thread to be servicing multiple requests at once and to
    switch between them. Note that this is per-thread, not per-process, and
    so is usable in multithreaded programs.

    The setting of this field is inherited across fork and exec.

    (5) A new keyctl function (KEYCTL_ASSUME_AUTHORITY) has been added that
    permits a thread to assume the authority to deal with an uninstantiated
    key. Assumption is only permitted if the authorisation key associated
    with the uninstantiated key is somewhere in the thread's keyrings.

    This function can also clear the assumption.

    (6) A new magic key specifier has been added to refer to the currently
    assumed authorisation key (KEY_SPEC_REQKEY_AUTH_KEY).

    (7) Instantiation will only proceed if the appropriate authorisation key is
    assumed first. The assumed authorisation key is discarded if
    instantiation is successful.

    (8) key_validate() is moved from the file of request_key functions to the
    file of permissions functions.

    (9) The documentation is updated.
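
    A heavily simplified toy model of points (4) to (7): a per-thread "assumed
    authority" pointer, an assume operation that only succeeds for an
    authorisation key the thread possesses, and instantiation that requires the
    matching assumed key and drops it on success. All names are hypothetical,
    the error code is chosen for illustration, and real keyrings, permissions
    and payloads are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_key {
	bool instantiated;
};

struct toy_auth_key {
	struct toy_key *target;		/* the uninstantiated key it covers */
};

struct toy_thread {
	struct toy_auth_key **keyring;	/* auth keys the thread possesses */
	size_t nkeys;
	struct toy_auth_key *assumed;	/* models the new per-thread field */
};

/* Models KEYCTL_ASSUME_AUTHORITY: NULL clears the assumption; otherwise the
 * auth key must be somewhere in the thread's keyrings. */
int toy_assume_authority(struct toy_thread *t, struct toy_auth_key *auth)
{
	size_t i;

	if (!auth) {
		t->assumed = NULL;
		return 0;
	}
	for (i = 0; i < t->nkeys; i++) {
		if (t->keyring[i] == auth) {
			t->assumed = auth;
			return 0;
		}
	}
	return -EPERM;
}

/* Instantiation needs the matching assumed authority, which is then
 * discarded because instantiation succeeded. */
int toy_instantiate(struct toy_thread *t, struct toy_key *key)
{
	if (!t->assumed || t->assumed->target != key)
		return -EPERM;
	key->instantiated = true;
	t->assumed = NULL;
	return 0;
}
```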

    Build fix.

    Signed-off-by: David Howells
    Cc: Trond Myklebust
    Cc: Alexander Zangerl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • When a link to a new key is added to a keyring, discard any links in that
    keyring to existing keys that match the new key. Keys match if their type
    and description strings are the same.

    This permits requests, adds and searches to displace negative, expired,
    revoked and dead keys easily. After some discussion it was concluded that
    duplicate valid keys should probably be discarded also as they would otherwise
    hide the new key.

    Since request_key() is intended to be the primary method by which keys are
    added to a keyring, duplicate valid keys wouldn't be an issue there as that
    function would return an existing match in preference to creating a new key.
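
    A minimal sketch of the displacement rule, using a hypothetical fixed-size
    keyring that is just an array of links: adding a key first drops every link
    to a key with the same type and description, then appends the new link. The
    structures and names are assumptions for illustration, not the kernel's
    keyring code.

```c
#include <stddef.h>
#include <string.h>

struct toy_key {
	const char *type;
	const char *description;
	int serial;
};

#define TOY_KEYRING_MAX 16

struct toy_keyring {
	const struct toy_key *links[TOY_KEYRING_MAX];
	size_t nr;
};

static int toy_key_matches(const struct toy_key *a, const struct toy_key *b)
{
	return strcmp(a->type, b->type) == 0 &&
	       strcmp(a->description, b->description) == 0;
}

/* Link key into ring, displacing links to matching keys.
 * Returns 0, or -1 (ring unchanged) if there is no room. */
int toy_keyring_link(struct toy_keyring *ring, const struct toy_key *key)
{
	size_t i, kept = 0, matches = 0;

	for (i = 0; i < ring->nr; i++)
		matches += (size_t)toy_key_matches(ring->links[i], key);
	if (ring->nr - matches >= TOY_KEYRING_MAX)
		return -1;		/* full even after displacement */
	for (i = 0; i < ring->nr; i++)
		if (!toy_key_matches(ring->links[i], key))
			ring->links[kept++] = ring->links[i];
	ring->links[kept++] = key;
	ring->nr = kept;
	return 0;
}
```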

    Signed-off-by: David Howells
    Cc: Trond Myklebust
    Cc: Alexander Zangerl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Add a new keyctl function that allows the expiry time to be set on a key or
    removed from a key, provided the caller has attribute modification access.

    Signed-off-by: David Howells
    Cc: Trond Myklebust
    Cc: Alexander Zangerl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • kmsg_write() returns the value returned by printk(), so some programs may be
    confused by a successful write() whose return value differs from the buffer
    length:

    # /bin/echo something > /dev/kmsg
    /bin/echo: write error: Inappropriate ioctl for device

    The drawback is that the printk() return value can no longer be checked
    quickly from userspace.

    Signed-off-by: Guillaume Chazarain
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Chazarain
     
  • What is the true meaning of the printk() return value? Should it include the
    3-character priority prefix? And what about the timing information? In
    both cases it was broken:

    strace -e write echo 1 > /dev/kmsg
    => write(1, "1\n", 2) = 5
    strace -e write echo "1" > /dev/kmsg
    => write(1, "1\n", 5) = 8

    The returned length was "length of input string + 3"; I made it "length
    of string output to the log buffer".

    Note that I couldn't find any printk() caller in the kernel interested in
    its return value besides kmsg_write().

    Signed-off-by: Guillaume Chazarain
    Acked-By: Tim Bird
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Chazarain
     
  • When making an fcntl locking call through compat_sys_fcntl64 (i.e. a 32-bit
    app on a 64-bit kernel), the syscall can return a locking range that is in
    conflict with the queried lock.

    If some aspect of this range does not fit in the 32-bit structure, something
    needs to be done.

    The current code is wrong in several respects:

    - It returns data to userspace even if no conflict was found,
      i.e. it should check l_type for F_UNLCK.
    - It returns -EOVERFLOW too aggressively. A lock range covering
      the last possible byte of the file (start = COMPAT_OFF_T_MAX,
      len = 1) should be possible, but is rejected by the current test.
    - An extra-long 'len' should not be a problem: only the part
      of the conflicting lock that is visible to the 32-bit
      app needs to be reported to it anyway.

    This patch addresses those three issues and adds a comment to (hopefully)
    record it for posterity.

    Note: this patch mainly affects test cases. Real applications rarely, if
    ever, see these problems.

    This patch has been tested (LSB test suite), and works.
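
    A sketch of the three rules listed above, applied to hypothetical
    simplified lock records rather than the kernel's struct flock and struct
    compat_flock. It assumes COMPAT_OFF_T_MAX is INT32_MAX and that reported
    lock lengths are non-negative (0 meaning "to EOF"); how the actual patch
    caps l_len may differ.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

#define TOY_COMPAT_OFF_T_MAX INT32_MAX	/* assumption: 32-bit signed off_t */

/* Hypothetical simplified lock records; l_len == 0 means "to EOF". */
struct toy_flock64 { short l_type; int64_t l_start; int64_t l_len; };
struct toy_flock32 { short l_type; int32_t l_start; int32_t l_len; };

/* Convert a conflicting lock reported by F_GETLK for a 32-bit caller. */
int toy_put_compat_flock(const struct toy_flock64 *in, struct toy_flock32 *out)
{
	int64_t len = in->l_len, room;

	out->l_type = in->l_type;
	/* Rule 1: F_UNLCK means no conflict; nothing else to report. */
	if (in->l_type == F_UNLCK)
		return 0;
	/* Rule 2: only a start the caller cannot address is an overflow. */
	if (in->l_start > TOY_COMPAT_OFF_T_MAX)
		return -EOVERFLOW;
	/* Rule 3: clip an over-long len to the part the caller can see;
	 * 2^31 bytes from offset 0 is not representable, so cap there too. */
	room = (int64_t)TOY_COMPAT_OFF_T_MAX - in->l_start + 1;
	if (len > room)
		len = room;
	if (len > TOY_COMPAT_OFF_T_MAX)
		len = TOY_COMPAT_OFF_T_MAX;
	out->l_start = (int32_t)in->l_start;
	out->l_len = (int32_t)len;
	return 0;
}
```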

    Signed-off-by: Neil Brown
    Cc: Arnd Bergmann
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • SUS requires that when truncating a file to the size that it currently
    is:
    - truncate and ftruncate should NOT modify ctime or mtime;
    - O_TRUNC SHOULD modify ctime and mtime.

    Currently mtime and ctime are always modified on most local
    filesystems (a side effect of ->truncate) or never modified (on NFS).

    With this patch:
    - ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when
      an update of these times is required, whether the size changes or not
      (via a new argument to do_truncate). This allows NFS to do
      the right thing for O_TRUNC.
    - inode_setattr no longer forces ATTR_MTIME|ATTR_CTIME when ATTR_SIZE
      sets the size to its current value. This allows local filesystems
      to do the right thing for f?truncate.

    Also, the logic in inode_setattr is changed a bit so there are two return
    points. One returns the error from vmtruncate if it failed, the other
    returns 0 (there can be no other failure).

    Finally, if vmtruncate succeeds and ATTR_SIZE is the only change
    requested, we now fall through and call mark_inode_dirty. If a filesystem
    did not have a ->truncate function, then vmtruncate will have changed
    i_size without marking the inode as 'dirty', and I think this is wrong.
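
    The SUS rule above boils down to a small decision about which time
    attributes accompany a size change. A sketch of that decision only, with
    hypothetical flag values and a hypothetical function name; it does not
    reproduce do_truncate() or inode_setattr().

```c
#include <stdbool.h>

/* Hypothetical flag values; the kernel's ATTR_* constants differ. */
#define TOY_ATTR_SIZE	0x1u
#define TOY_ATTR_MTIME	0x2u
#define TOY_ATTR_CTIME	0x4u

/* O_TRUNC always updates mtime/ctime; truncate()/ftruncate() only
 * update them when the size actually changes. */
unsigned toy_truncate_attrs(long long old_size, long long new_size,
			    bool from_open_trunc)
{
	unsigned attrs = TOY_ATTR_SIZE;

	if (from_open_trunc || new_size != old_size)
		attrs |= TOY_ATTR_MTIME | TOY_ATTR_CTIME;
	return attrs;
}
```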

    Signed-off-by: Neil Brown
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Make it possible to include linux/pagevec.h multiple times without
    incurring errors due to duplicate definitions.
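
    The usual way to make a header safe to include more than once is an include
    guard. The block below spells out what the preprocessor sees when a
    hypothetical guarded header is pulled in twice; without the #ifndef/#define
    pair, the second copy of the struct and function definitions would be a
    compile error. The header contents are invented for illustration and are
    not the real linux/pagevec.h.

```c
/* --- first #include of the (hypothetical) header --- */
#ifndef TOY_PAGEVEC_H
#define TOY_PAGEVEC_H
struct toy_pagevec {
	unsigned long nr;
	void *pages[8];
};
static inline void toy_pagevec_init(struct toy_pagevec *pvec) { pvec->nr = 0; }
#endif /* TOY_PAGEVEC_H */

/* --- second #include: TOY_PAGEVEC_H is defined, so this is skipped --- */
#ifndef TOY_PAGEVEC_H
#define TOY_PAGEVEC_H
struct toy_pagevec {
	unsigned long nr;
	void *pages[8];
};
static inline void toy_pagevec_init(struct toy_pagevec *pvec) { pvec->nr = 0; }
#endif /* TOY_PAGEVEC_H */
```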

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Reported from Redhat Bugzilla Bug 170450

    "I updated to the development kernel and now during boot only the top of the
    text is visable. For example the monitor screen the is the lines and I can
    only see text in the asterisk area.

    Antonino A. Daplas
     
  • When doublescan mode is in use, scanlines must be doubled.

    Thanks to Jason Dravet for reporting and testing.

    Signed-off-by: Samuel Thibault
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Samuel Thibault
     
  • inode can never be NULL when calling this function.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The ptrace_get_task_struct() helper that I added as part of the ptrace
    consolidation is useful in a variety of places that currently open-code it.
    Switch them to the common helpers.

    Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
    the ptrace_get_task_struct() interface. We don't need the request argument
    now, and we return the task_struct directly, using ERR_PTR() for error
    returns. It's a bit more code in the callers, but we have two sane routines
    that do one thing well now.
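
    A user-space sketch of the ERR_PTR() convention the simplified interface
    relies on: a single pointer return value carries either a valid object or a
    small negative errno, because no valid object lives in the last few
    addresses of the address space. The helpers are re-created here rather than
    taken from the kernel's include/linux/err.h, the 4095 limit is an
    assumption, and toy_get_task() with its task table is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* assumption: size of the reserved error range */

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_task { int pid; };

static struct toy_task toy_tasks[] = { { 1 }, { 42 } };

/* Return the task directly, or an ERR_PTR() on failure. */
struct toy_task *toy_get_task(int pid)
{
	size_t i;

	for (i = 0; i < sizeof(toy_tasks) / sizeof(toy_tasks[0]); i++)
		if (toy_tasks[i].pid == pid)
			return &toy_tasks[i];
	return ERR_PTR(-ESRCH);
}
```

    Callers then test with IS_ERR() and recover the code with PTR_ERR(),
    instead of juggling a separate status and an out-parameter.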

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • librelay and relay-app.h have been retired - update Documentation to reflect
    that.

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     
  • This patch renames relayfs_file_operations to relay_file_operations, and the
    file operations themselves from relayfs_XXX to relay_file_XXX, to make it more
    clear that they refer to relay files.

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     
  • Documentation update for creating global buffers.

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     
  • This patch adds the optional is_global outparam to the create_buf_file()
    callback. This can be used by clients to create a single global relayfs
    buffer instead of the default per-cpu buffers. This was suggested as being
    useful for certain debugging applications where it's more convenient to be
    able to get all the data from a single channel without having to go to the
    bother of dealing with per-cpu files.
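
    A user-space toy model of the is_global outparam: the channel asks the
    client's create_buf_file() hook for one buffer file per CPU, and stops after
    the first if the hook sets *is_global. The callback signature here is a
    hypothetical simplification (the real callback takes more arguments and
    returns a dentry).

```c
#include <stdio.h>

typedef int (*toy_create_buf_file_fn)(const char *filename, int cpu, int *is_global);

/* Create the channel's buffer files; returns how many were created, or -1. */
int toy_relay_open(const char *base, int ncpus, toy_create_buf_file_fn create)
{
	char filename[64];
	int cpu, created = 0;

	for (cpu = 0; cpu < ncpus; cpu++) {
		int is_global = 0;

		snprintf(filename, sizeof(filename), "%s%d", base, cpu);
		if (create(filename, cpu, &is_global) != 0)
			return -1;
		created++;
		if (is_global)
			break;		/* one global buffer serves every CPU */
	}
	return created;
}

/* Example clients: default per-CPU files vs. a single global buffer. */
int toy_per_cpu_client(const char *filename, int cpu, int *is_global)
{
	(void)filename; (void)cpu; (void)is_global;
	return 0;
}

int toy_global_client(const char *filename, int cpu, int *is_global)
{
	(void)filename; (void)cpu;
	*is_global = 1;
	return 0;
}
```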

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     
  • Documentation update for creating relay files in other filesystems.

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     
  • This patch adds a couple of callback functions that allow a client to hook
    into relay_open()/close() and supply the files that will be used to represent
    the channel buffers; the default implementation if no callbacks are defined is
    to create the files in relayfs. This is to support the creation and use of
    relay files in other filesystems such as debugfs, as implied by the fact that
    relayfs_file_operations are exported.
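
    A toy model of the hook-with-fallback pattern described above: if the
    client supplies a create_buf_file() callback, it decides where the buffer
    file lives (debugfs, say); otherwise the default relayfs file is used. The
    types and names are hypothetical simplifications, and the matching
    remove-file hook is omitted.

```c
#include <stddef.h>

/* Hypothetical: which filesystem a channel buffer file ends up in. */
enum toy_fs { TOY_FS_RELAYFS, TOY_FS_DEBUGFS };

struct toy_rchan_callbacks {
	/* Optional hook: create the buffer file and report where it went. */
	enum toy_fs (*create_buf_file)(const char *filename);
};

static enum toy_fs toy_relayfs_create(const char *filename)
{
	(void)filename;
	return TOY_FS_RELAYFS;	/* stands in for the default relayfs file */
}

/* Use the client's hook if it supplied one, else fall back to the default. */
enum toy_fs toy_relay_create_buf(const struct toy_rchan_callbacks *cb,
				 const char *filename)
{
	if (cb != NULL && cb->create_buf_file != NULL)
		return cb->create_buf_file(filename);
	return toy_relayfs_create(filename);
}

/* Example client that puts its buffers in debugfs instead. */
enum toy_fs toy_debugfs_client(const char *filename)
{
	(void)filename;
	return TOY_FS_DEBUGFS;
}
```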

    Signed-off-by: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi