17 Oct, 2007

40 commits

  • [1/3] Clean up the coding style according to Andrew's comments:
    http://lists.infradead.org/pipermail/kexec/2007-August/000522.html
    - vmcoreinfo_append_str() should have suitable __attribute__s so that
    the compiler can check its use.
    - vmcoreinfo_max_size should be of type size_t.
    - Use get_seconds() instead of xtime.tv_sec.
    - Use init_uts_ns.name.release instead of UTS_RELEASE.

    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
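    The first two review points can be illustrated with a user-space sketch. The buffer, its size and the function body below are hypothetical stand-ins, not the kernel's code. The printf-style format attribute lets GCC check every call's arguments against its format string, and the size bookkeeping uses size_t.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the note buffer and its bookkeeping. */
static char vmcoreinfo_data[4096];
static size_t vmcoreinfo_size;
static const size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);

/* format(printf, 1, 2): argument 1 is the format string and the variadic
 * arguments start at position 2, so the compiler can type-check calls. */
void vmcoreinfo_append_str(const char *fmt, ...)
	__attribute__((format(printf, 1, 2)));

void vmcoreinfo_append_str(const char *fmt, ...)
{
	char buf[256];
	va_list args;
	int n;
	size_t len;

	va_start(args, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);
	if (n < 0)
		return;

	/* vsnprintf() returns the untruncated length; clamp to what was stored. */
	len = (size_t)n < sizeof(buf) ? (size_t)n : sizeof(buf) - 1;
	/* Never write past the end of the note buffer. */
	if (len > vmcoreinfo_max_size - vmcoreinfo_size)
		len = vmcoreinfo_max_size - vmcoreinfo_size;

	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, len);
	vmcoreinfo_size += len;
}
```

    With the attribute in place, a mismatched call such as vmcoreinfo_append_str("OSRELEASE=%s\n", 42) draws a -Wformat warning (enabled by -Wall) instead of silently passing an int where a string is expected.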
     
  • This patch set removes the requirement that makedumpfile users install a
    vmlinux file (including the debugging information) on each system.

    The makedumpfile command provides dump filtering for kdump. It creates a
    small dumpfile by filtering out pages that are unnecessary for analysis. To
    identify unnecessary pages, it needs a vmlinux file including the debugging
    information. These days the debugging package has become a huge file, and it
    is hard to install it on each system.

    To solve the problem, kdump developers discussed it on lkml and kexec-ml. As
    a result, we concluded that the information needed for dump filtering
    (called "vmcoreinfo") should be embedded into the first kernel file and
    accessed through /proc/vmcore while the second kernel is running.
    (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)

    Dan Aloni created the patch set for the above implementation.
    (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)

    I then updated it for multiple architectures and memory models.
    (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)

    Signed-off-by: Dan Aloni
    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Bernhard Walle
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     
  • Fix this lot:

    fs/binfmt_flat.c: In function `decompress_exec':
    fs/binfmt_flat.c:293: warning: label `out' defined but not used
    fs/binfmt_flat.c: In function `load_flat_file':
    fs/binfmt_flat.c:462: warning: unsigned int format, long int arg (arg 3)
    fs/binfmt_flat.c:462: warning: unsigned int format, long int arg (arg 4)
    fs/binfmt_flat.c:518: warning: comparison of distinct pointer types lacks a cast
    fs/binfmt_flat.c:549: warning: passing arg 1 of `ksize' makes pointer from integer without a cast
    fs/binfmt_flat.c:601: warning: passing arg 1 of `ksize' makes pointer from integer without a cast
    fs/binfmt_flat.c: In function `load_flat_binary':
    fs/binfmt_flat.c:116: warning: 'dummy' might be used uninitialized in this function

    Acked-by: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Simply fill out the bits in checkstack.pl for Blackfin. I thought I already
    sent this, but I don't see it in -mm anywhere ...

    Signed-off-by: Mike Frysinger
    Cc: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • do_sigaction() returns -ERESTARTNOINTR if signal_pending(). The comment says:

    * If there might be a fatal signal pending on multiple
    * threads, make sure we take it before changing the action.

    I think this is not needed. We should only worry about the SIGNAL_GROUP_EXIT
    case, but it implies a pending SIGKILL, which can't be cleared by do_sigaction.

    Kill this special case.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • de_thread() calls yield() while waiting for ->group_leader to become a
    zombie. This deadlocks if an rt-prio execer shares the same CPU with
    ->group_leader. Change the code to use the ->group_exit_task/notify_count
    mechanics.

    This patch certainly uglifies the code; perhaps someone can suggest something
    better.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that we don't pre-allocate the new ->sighand, we can kill the first fast
    path; it no longer makes sense. At best, it saves one "list_empty()" check,
    but it leads to code duplication.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • de_thread() pre-allocates newsighand to make sure that exec() can't fail after
    killing all sub-threads. Imho, this buys nothing, but complicates the code:

    - this is (mostly) needed to handle CLONE_SIGHAND without CLONE_THREAD
    tasks, which is a very unlikely case (if it is ever used)

    - unless we already have some serious problems, a GFP_KERNEL allocation
    should not fail

    - ENOMEM can still happen after de_thread(); ->sighand is not the last
    object we have to allocate

    Change the code to allocate the new ->sighand on demand.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • There is no reason to call recalc_sigpending() after changing ->sighand.
    To begin with, recalc_sigpending() does not take ->sighand into account.

    This means we don't need to take newsighand->siglock while changing sighands.
    rcu_assign_pointer() provides a necessary barrier, and if another process
    reads the new ->sighand it should either take tasklist_lock or it should use
    lock_task_sighand() which has a corresponding smp_read_barrier_depends().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
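    The publish/subscribe ordering argued above can be sketched in user space with C11 atomics. This is an analogy, not kernel code: a release store plays the role of rcu_assign_pointer(), and an acquire load stands in for the dependency-ordered read that lock_task_sighand() relies on. The struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sighand_struct. */
struct sighand {
	int action_count;
};

static _Atomic(struct sighand *) current_sighand = NULL;

/* Writer: fully initialise the new object, then publish it. The release
 * store guarantees that a reader who sees the new pointer also sees the
 * initialised fields, so no lock on the new object is needed for that. */
void publish_sighand(struct sighand *new_sh, int count)
{
	new_sh->action_count = count;
	atomic_store_explicit(&current_sighand, new_sh, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above. */
int read_action_count(void)
{
	struct sighand *sh =
		atomic_load_explicit(&current_sighand, memory_order_acquire);

	return sh ? sh->action_count : -1;
}
```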
     
  • Fix f_version type: should be u64 instead of long

    There is a type inconsistency between struct inode i_version and struct file
    f_version.

    fs.h:

    struct inode
            u64 i_version;

    and

    struct file
            unsigned long f_version;

    Users do:

    fs/ext3/dir.c:

    if (filp->f_version != inode->i_version) {

    So why isn't f_version a u64? It becomes a problem if the version exceeds
    2^32 - 1 and we are on an architecture where longs are 32 bits.

    This patch changes the f_version type to u64, and updates the users accordingly.

    It applies to 2.6.23-rc2-mm2.

    Signed-off-by: Mathieu Desnoyers
    Cc: Martin Bligh
    Cc: "Randy.Dunlap"
    Cc: Al Viro
    Cc:
    Cc: Mark Fasheh
    Cc: Christoph Hellwig
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
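    A small user-space model of the mismatch, with uint32_t standing in for a 32-bit unsigned long. Once i_version exceeds 2^32 - 1, the cached copy is truncated and the ext3-style equality test never succeeds again, so the directory always looks modified. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Models "unsigned long f_version" on an architecture with 32-bit longs. */
struct file_ver32 {
	uint32_t f_version;
};

/* Models the field after the change to u64. */
struct file_ver64 {
	uint64_t f_version;
};

/* Cache the inode version in the file, then run the
 * "filp->f_version != inode->i_version" check.
 * Returns nonzero if the directory would be treated as changed. */
int looks_changed_32(uint64_t i_version)
{
	struct file_ver32 f = { .f_version = (uint32_t)i_version }; /* truncates */

	return f.f_version != i_version;
}

int looks_changed_64(uint64_t i_version)
{
	struct file_ver64 f = { .f_version = i_version };

	return f.f_version != i_version;
}
```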
     
  • Some months back I proposed changing the schedule() call in
    read_events to an io_schedule():
    http://osdir.com/ml/linux.kernel.aio.general/2006-10/msg00024.html
    This was rejected as there are AIO operations that do not initiate
    disk I/O. I've had another look at the problem, and the only AIO
    operation that will not initiate disk I/O is IOCB_CMD_NOOP. However,
    this command isn't even wired up!

    Given that it doesn't work, and hasn't for *years*, I'm going to
    suggest again that we do proper I/O accounting when using AIO.

    Signed-off-by: Jeff Moyer
    Acked-by: Zach Brown
    Cc: Benjamin LaHaise
    Cc: Suparna Bhattacharya
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     
  • Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.

    The locking used by get_random_bytes() can conflict with the
    preempt_disable() and synchronize_sched() form of RCU. This patch changes
    rcutorture's RNG to gather entropy from the new cpu_clock() interface
    (relying on interrupts, preemption, daemons, and rcutorture's reader
    thread's rock-bottom scheduling priority to provide useful entropy), and
    also adds an EXPORT_SYMBOL_GPL() to make that interface available to GPLed
    kernel modules such as rcutorture.

    Passes several hours of rcutorture.

    [ego@in.ibm.com: Use raw_smp_processor_id() in rcu_random()]
    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
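    The idea of a lock-free torture-test RNG that periodically stirs in clock-derived entropy can be sketched as follows. This is not rcutorture's rcu_random(): the constants are the well-known Knuth MMIX LCG parameters, chosen here for illustration, and clock_gettime(CLOCK_MONOTONIC) is a user-space stand-in for cpu_clock(). All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Illustrative constants (Knuth's MMIX LCG), not the rcutorture values. */
#define RAND_MULT    6364136223846793005ULL
#define RAND_ADD     1442695040888963407ULL
#define RAND_REFRESH 10000                   /* calls between reseeds */

struct rand_state {
	uint64_t state;
	long count;
};

/* Stand-in for cpu_clock(): a nanosecond timestamp whose low bits vary
 * with interrupts, preemption and scheduling. */
static uint64_t clock_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* No locks are taken, so a caller in an RCU read-side or
 * preempt-disabled section cannot deadlock on the RNG. */
uint32_t rand_next(struct rand_state *rs)
{
	if (--rs->count < 0) {
		rs->state += clock_ns();     /* stir in timing entropy */
		rs->count = RAND_REFRESH;
	}
	rs->state = rs->state * RAND_MULT + RAND_ADD;
	return (uint32_t)(rs->state >> 32);  /* high bits are best mixed */
}
```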
     
  • To avoid lock contention, we distribute the sched_timer calls across the
    CPUs so they do not trigger at the same instant. However, I used NR_CPUS,
    which can cause needless grouping on small SMP systems, depending on your
    kernel config. This patch converts to using num_possible_cpus() so we
    spread them as evenly as possible on every machine.

    Briefly tested w/ NR_CPUS=255 and verified reduced contention.

    Signed-off-by: John Stultz
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
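    A worked example of why the divisor matters. The helper is hypothetical and the kernel's exact arithmetic may differ: with a 1 ms tick on a 4-CPU machine, dividing by 4 places the CPUs 250 µs apart, while dividing by NR_CPUS=255 packs all four into roughly the first 12 µs of the period.

```c
#include <stdint.h>

/* Offset of a CPU's tick within the tick period, spreading CPUs evenly
 * across it. Passing the number of CPUs that can actually exist
 * (num_possible_cpus()) instead of the compile-time NR_CPUS keeps the
 * offsets spread over the whole period on small machines. */
uint64_t tick_offset_ns(uint64_t period_ns, unsigned int ncpus, unsigned int cpu)
{
	return period_ns / ncpus * cpu;
}
```

    For example, tick_offset_ns(1000000, 4, 3) is 750000 ns, whereas tick_offset_ns(1000000, 255, 3) is only 11763 ns.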
     
  • Lomesh reported poll returning EINTR during a suspend/resume cycle. This is
    caused by the STOP/CONT cycle that the freezer uses, which generates a pending
    signal for what is in effect an ignored signal. In general, poll is a
    little eager to return EINTR when it could avoid bothering userspace and
    simply restart the syscall. Both select and ppoll use ERESTARTNOHAND to
    restart the syscall. Oleg points out that simply using ERESTARTNOHAND would
    cause poll to restart with the original timeout value, which could ultimately
    lead to the process never returning to userspace. Instead, use
    ERESTART_RESTARTBLOCK and restart poll with an updated timeout value.
    Inspired by Manfred's patch to use ERESTARTNOHAND in poll.

    [bunk@kernel.org: do_restart_poll() can become static]
    Cc: Manfred Spraul
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: "Agarwal, Lomesh"
    Signed-off-by: Chris Wright
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wright
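    The same principle applies to a user-space retry loop: restarting with the original timeout after every interruption could postpone the deadline indefinitely, so each retry should wait only for the time that is left. A minimal sketch, assuming POSIX poll() and clock_gettime(); the wrapper name is hypothetical and this is not the kernel's do_restart_poll().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Milliseconds on a clock that is not affected by wall-clock changes. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Like poll(), but an interruption by a signal (EINTR) restarts the wait
 * with only the remaining time. A negative timeout waits forever. */
int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	const int forever = timeout_ms < 0;
	const long long deadline = forever ? 0 : now_ms() + timeout_ms;
	int left = timeout_ms;

	for (;;) {
		int ret = poll(fds, nfds, left);

		if (ret >= 0 || errno != EINTR)
			return ret;
		if (!forever) {
			long long rem = deadline - now_ms();

			left = rem > 0 ? (int)rem : 0;
		}
	}
}
```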
     
  • Allow disabling DNOTIFY with CONFIG_EMBEDDED=n.

    I'm currently running a kernel with dnotify disabled and I haven't run into
    any problem. Is there any popular application left that breaks without
    dnotify support in the kernel?

    Note that this patch does not remove dnotify support, it still defaults to
    "y", and the help text recommends enabling it.

    Signed-off-by: Adrian Bunk
    Acked-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • simple_commit_write() can now become static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • - remove the no longer required __attribute__((weak)) of xtime_lock
    - remove the following no longer used EXPORT_SYMBOLs:
      - xtime
      - xtime_lock

    Signed-off-by: Adrian Bunk
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This attempts to address CVE-2006-6058
    http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058

    first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html

    Essentially, a corrupted minix directory inode reporting a very large
    i_size makes minix_readdir, minix_find_entry, etc. loop for a very long
    time, because on EIO they just move on to try the next page. This happens
    under the BKL, with a printk storm as well, and can lock up the machine
    for a very long time. Simply ratelimiting the printks gets things back
    under control. Make the message a bit more informative while we're here.

    Signed-off-by: Eric Sandeen
    Cc: Bodo Eggert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Replace n & (n - 1) with is_power_of_2(n)

    Signed-off-by: vignesh babu
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vignesh babu
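    Why the helper is clearer than the open-coded test: n & (n - 1) clears the lowest set bit, so it is zero exactly when n has at most one bit set, which includes n == 0. A user-space sketch equivalent to the kernel's is_power_of_2() in include/linux/log2.h:

```c
#include <stdbool.h>

/* True iff exactly one bit of n is set. n & (n - 1) clears the lowest
 * set bit, so the result is 0 only when n had a single bit set; it is
 * also 0 for n == 0, which the first test excludes. */
static inline bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

    Note that a bare "!(n & (n - 1))" would report 0 as a power of two; the named helper makes that edge case explicit.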
     
  • oomkilladj is int, but the values which can be assigned to it are -17 and
    [-16, 15], thus fitting into s8.

    While the patch itself doesn't make task_struct smaller, because of the
    natural alignment of ->link_count, it will make the picture clearer wrt
    further task_struct reduction patches. My plan is to move ->fpu_counter and
    ->oomkilladj after ->ioprio, filling a hole on i386 and x86_64. But that's
    for later, because bloated distro configs need looking at as well.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
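    The range argument can be checked at compile time. A sketch: the macro names follow the kernel's OOM constants but are redefined locally from the values quoted above, and the struct is a hypothetical fragment, not struct task_struct.

```c
#include <stdint.h>

/* Values quoted in the changelog: -17, then the range [-16, 15]. */
#define OOM_DISABLE     (-17)
#define OOM_ADJUST_MIN  (-16)
#define OOM_ADJUST_MAX  15

/* Compile-time proof that every legal value fits in a signed 8-bit field. */
_Static_assert(OOM_DISABLE >= INT8_MIN && OOM_ADJUST_MAX <= INT8_MAX,
	       "oomkilladj values must fit in s8");

struct task_fragment {          /* hypothetical, not struct task_struct */
	int8_t oomkilladj;      /* was int (4 bytes); now 1 byte */
};
```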
     
  • Remove the __STRICT_ANSI__ check from the __u64/__s64 declaration on
    32bit targets.

    GCC can be made to warn about usage of long long types with ISO C90
    (-ansi), but only with -pedantic. You can write this in a way that even
    then it doesn't cause warnings, namely by:

    #ifdef __GNUC__
    __extension__ typedef __signed__ long long __s64;
    __extension__ typedef unsigned long long __u64;
    #endif

    The __extension__ keyword in front of this switches off any pedantic
    warnings for this expression.

    Signed-off-by: Olaf Hering
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olaf Hering
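    A quick way to check the claim is to compile a file using the typedefs with gcc -ansi -pedantic. The names carry a _demo suffix here so the sketch does not clash with the real __s64/__u64 from <linux/types.h>; the #ifdef mirrors the snippet above.

```c
/* Compile with: gcc -ansi -pedantic -c demo.c
 * Without __extension__, -pedantic warns that ISO C90 does not support
 * 'long long'; with it, the typedefs compile silently. */
#ifdef __GNUC__
__extension__ typedef __signed__ long long s64_demo;
__extension__ typedef unsigned long long u64_demo;
#endif

/* Later uses of the typedef names do not spell 'long long' again,
 * so they do not trigger the warning either. */
u64_demo add_u64(u64_demo a, u64_demo b)
{
	return a + b;
}
```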
     
  • The README file in the cramfs subdirectory says: "All data is currently in
    host-endian format; neither mkcramfs nor the kernel ever do swabbing."

    If somebody tries to mount a cramfs with the wrong endianness, cramfs only
    complains about a wrong magic number and doesn't tell the user that only the
    endianness is wrong.

    The following patch adds an error message to the cramfs sources. If a user
    tries to mount a cramfs with the wrong endianness using the patched sources,
    cramfs will display the message "cramfs: wrong endianess".

    Signed-off-by: Andi Drebes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Drebes
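    Detecting this needs only one extra comparison: an image built on a machine of the opposite byte order stores the same magic number with its four bytes reversed. A sketch, assuming CRAMFS_MAGIC is 0x28cd3d45 as in <linux/cramfs_fs.h>; the name CRAMFS_MAGIC_WEND for the byte-swapped value, the enum and the function are illustrative.

```c
#include <stdint.h>

#define CRAMFS_MAGIC      0x28cd3d45u  /* as written by a same-endian mkcramfs */
#define CRAMFS_MAGIC_WEND 0x453dcd28u  /* the same bytes read in the other order */

enum magic_result { MAGIC_OK, MAGIC_WRONG_ENDIAN, MAGIC_BAD };

/* Classify the first 32-bit word of a cramfs superblock. */
enum magic_result classify_magic(uint32_t magic)
{
	if (magic == CRAMFS_MAGIC)
		return MAGIC_OK;
	if (magic == CRAMFS_MAGIC_WEND)
		return MAGIC_WRONG_ENDIAN;   /* report "cramfs: wrong endianess" */
	return MAGIC_BAD;                    /* report a plain wrong magic */
}
```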
     
  • include/linux/if_fddi.h is an exported header.
    It uses __be16. Include linux/types.h to get this type.

    Signed-off-by: Olaf Hering
    Cc: "Maciej W. Rozycki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olaf Hering
     
  • It looks like in the end all pruners want parents removed.

    So remove unused code and function arguments.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • vfs_permission(MAY_EXEC) checks if the filesystem is mounted with "noexec", so
    there's no need to repeat this check in sys_uselib() and open_exec().

    Signed-off-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • permission() checks that MAY_EXEC is only allowed on regular files if at least
    one execute bit is set in the file mode.

    generic_permission() already ensures this, so the extra check in permission()
    is superfluous.

    If the filesystem defines its own ->permission(), the check may still be
    needed. In this case, move it after ->permission(). This is needed because
    filesystems such as FUSE may need to refresh the inode attributes before
    checking permissions.

    This check should be moved inside ->permission(), but that's another story.

    Signed-off-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • utimensat() (and possibly other callers of do_utimes()) didn't check if the
    nanosecond value was within the allowed range.

    Signed-off-by: Miklos Szeredi
    Cc: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
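    The missing validation amounts to a small range test that must still let through the two special tv_nsec values utimensat() accepts. A sketch; nsec_valid() and times_valid() are illustrative names, and the UTIME_NOW/UTIME_OMIT fallbacks use the Linux values in case the header does not provide them.

```c
#include <stdbool.h>
#include <time.h>

/* Linux values of the special tv_nsec markers, defined here only if the
 * system headers have not already done so. */
#ifndef UTIME_NOW
#define UTIME_NOW  ((1l << 30) - 1l)
#endif
#ifndef UTIME_OMIT
#define UTIME_OMIT ((1l << 30) - 2l)
#endif

/* A tv_nsec is acceptable if it is a special marker or a real
 * nanosecond count in [0, 999999999]. */
bool nsec_valid(long nsec)
{
	if (nsec == UTIME_OMIT || nsec == UTIME_NOW)
		return true;
	return nsec >= 0 && nsec <= 999999999;
}

/* times[0] is the access time, times[1] the modification time. */
bool times_valid(const struct timespec times[2])
{
	return nsec_valid(times[0].tv_nsec) && nsec_valid(times[1].tv_nsec);
}
```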
     
  • Hello, I fixed and tested a small bug in the heap sort function in
    lib/sort.c.

    The fix avoids an unnecessary swap of contents when i is 0 (saving a few
    loads and stores), which happens every time the sort function is called. I
    felt the fix was worth bringing to your attention given the importance and
    frequent use of the sort function.

    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Subbaiah Venkata
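    The fix concerns the sort-down phase of heap sort: after the largest element is swapped to position i and the heap shrinks, a final iteration with i == 0 would only swap element 0 with itself. A simplified int-only sketch (lib/sort.c itself is generic over element size and takes caller-supplied compare and swap functions); the function names are hypothetical.

```c
#include <stddef.h>

static void swap_ints(int *a, int *b)
{
	int t = *a;

	*a = *b;
	*b = t;
}

/* Restore the max-heap property for the subtree rooted at root,
 * considering only the first n elements. */
static void sift_down_int(int *a, size_t root, size_t n)
{
	size_t child;

	while ((child = 2 * root + 1) < n) {
		if (child + 1 < n && a[child] < a[child + 1])
			child++;                /* pick the larger child */
		if (a[root] >= a[child])
			return;
		swap_ints(&a[root], &a[child]);
		root = child;
	}
}

void hsort_int(int *a, size_t n)
{
	size_t i;

	if (n < 2)
		return;
	/* Build a max-heap. */
	for (i = n / 2; i-- > 0; )
		sift_down_int(a, i, n);
	/* Sort down: move the current maximum to the end and shrink the heap.
	 * The loop stops at i > 0 rather than i >= 0: at i == 0 the heap holds
	 * one element that is already in place, so that self-swap is skipped. */
	for (i = n - 1; i > 0; i--) {
		swap_ints(&a[0], &a[i]);
		sift_down_int(a, 0, i);
	}
}
```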
     
  • Remove linux/consolemap.h from make headers_install

    It contains no user interfaces.
    The defines in this file are used only for kernel internal state.

    Signed-off-by: Olaf Hering
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olaf Hering
     
  • Remove some remaining vestiges of the old hacks jsm had to work around the old
    tty buffering. With the new tty buffering it simply doesn't matter any more.

    [michal.k.k.piotrowski@gmail.com: fix warning]
    Signed-off-by: Alan Cox
    Acked-by: Scott Kilau
    Cc: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • We simply define it to the same value. Nowadays the TTY flip value is
    irrelevant, but the value it used is as good as any, so why risk breaking it?

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • There have been issues with non-latin1 diacritics and unicode.
    http://bugzilla.kernel.org/show_bug.cgi?id=7746

    Git 759448f459234bfcf34b82471f0dba77a9aca498 `Kernel utf-8 handling'
    partly resolved it by adding conversion between diacritics and
    unicode. The patch below goes further by just turning diacritics into
    unicode, hence providing better future support. The kbd support can be
    fetched from
    http://bugzilla.kernel.org/attachment.cgi?id=12313

    This was tested in all of latin1, latin9, latin2 and unicode with French
    and Czech dead keys.

    Turn the kernel accent_table into unicode, and extend the ioctls KDGKBDIACR
    and KDSKBDIACR into their equivalents KDGKBDIACRUC and KDSKBDIACRUC.

    Add a new function, int conv_uni_to_8bit(u32 uni), for converting unicode
    into 8-bit _input_. No, we don't want to store the translation, as it is
    potentially sparse and large.

    Signed-off-by: Samuel Thibault
    Cc: Jan Engelhardt
    Cc: "Antonino A. Daplas"
    Cc: David Woodhouse
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Samuel Thibault
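    Storing the accent table in Unicode means each dead-key entry maps a (diacritic, base) pair directly to a code point, independent of the current 8-bit charset. A sketch with a hypothetical three-entry table: the field names follow struct kbdiacruc from <linux/kd.h>, but the struct and lookup function here are local to the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Same three fields as struct kbdiacruc; all values are code points. */
struct diacr_uc {
	uint32_t diacr, base, result;
};

/* Hypothetical sample entries for French and Czech dead keys. */
static const struct diacr_uc accent_table_uc[] = {
	{ 0x00B4, 'e', 0x00E9 },   /* acute + e -> U+00E9 */
	{ 0x0060, 'a', 0x00E0 },   /* grave + a -> U+00E0 */
	{ 0x02C7, 'c', 0x010D },   /* caron + c -> U+010D (not in latin1) */
};

/* Returns the composed code point, or 0 if the pair is not in the table. */
uint32_t compose_dead_key(uint32_t diacr, uint32_t base)
{
	size_t i;

	for (i = 0; i < sizeof(accent_table_uc) / sizeof(accent_table_uc[0]); i++)
		if (accent_table_uc[i].diacr == diacr &&
		    accent_table_uc[i].base == base)
			return accent_table_uc[i].result;
	return 0;
}
```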
     
  • We can just use skb_mac_header now, and we don't need a wrapper function to
    perform the cast. Instead of requiring the reader to check aoe.h to look
    up what an aoe_hdr function does, I'd rather do without it.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • When a new block bitmap is read from disk in read_block_bitmap() there are
    a few bits that should ALWAYS be set. In particular, the blocks given by
    ext4_blk_bitmap, ext4_inode_bitmap and ext4_inode_table. Validate the
    block bitmap against these blocks.

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Andreas Dilger
    Acked-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
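    The validation reduces to testing a handful of bits. A user-space sketch with hypothetical names: for simplicity it assumes the group's metadata blocks lie inside the group and takes their positions as offsets from the group's first block, with bits numbered little-endian within each byte as in the on-disk ext2/3/4 bitmaps.

```c
#include <stdbool.h>
#include <stddef.h>

/* Test bit nr of an on-disk bitmap (bit 0 is the low bit of byte 0). */
static bool bitmap_test(const unsigned char *bitmap, size_t nr)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/* Blocks holding the group's own metadata are always in use, so their
 * bits must be set in a valid block bitmap. */
bool block_bitmap_valid(const unsigned char *bitmap,
			size_t block_bitmap_off,
			size_t inode_bitmap_off,
			size_t inode_table_off, size_t inode_table_blocks)
{
	size_t i;

	if (!bitmap_test(bitmap, block_bitmap_off) ||
	    !bitmap_test(bitmap, inode_bitmap_off))
		return false;
	for (i = 0; i < inode_table_blocks; i++)
		if (!bitmap_test(bitmap, inode_table_off + i))
			return false;
	return true;
}
```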
     
  • This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter.
    This dumps the first page (only) of a private file mapping if it appears to
    be a mapping of an ELF file. Including these pages in the core dump may
    give sufficient identifying information to associate the original DSO and
    executable file images and their debugging information with a core file in
    a generic way just from its contents (e.g. when those binaries were built
    with ld --build-id). I expect this to become the default behavior
    eventually. Existing versions of gdb can be confused by the core dumps it
    creates, so it won't be enabled by default for some time to come. Soon many
    people will have systems with a gdb that handles these dumps, so they can
    arrange to set the bit at boot and have it inherited system-wide.

    This also cleans up the checking of the MMF_DUMP_* flag bits, which did not
    need to be using atomic macros.

    Signed-off-by: Roland McGrath
    Cc: Hidehiro Kawai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
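    Setting the new bit is a read-modify-write of /proc/<pid>/coredump_filter. A sketch, assuming MMF_DUMP_ELF_HEADERS corresponds to bit 4 of the filter (as documented in core(5)) and that the kernel parses the written value with base auto-detection, hence the explicit 0x prefix; the helper names are hypothetical.

```c
#include <stdio.h>

#define COREDUMP_FILTER_ELF_HEADERS (1UL << 4)  /* bit 4: dump ELF headers */

/* Pure helper: add the ELF-header bit to an existing filter mask. */
unsigned long filter_with_elf_headers(unsigned long filter)
{
	return filter | COREDUMP_FILTER_ELF_HEADERS;
}

/* Read-modify-write the calling process's filter. Returns 0 on success,
 * -1 if the file is missing (older kernel) or cannot be parsed/written. */
int enable_elf_headers_for_self(void)
{
	unsigned long filter;
	FILE *f = fopen("/proc/self/coredump_filter", "r+");

	if (!f)
		return -1;
	if (fscanf(f, "%lx", &filter) != 1) {   /* the file shows hex digits */
		fclose(f);
		return -1;
	}
	rewind(f);                               /* required between read and write */
	if (fprintf(f, "0x%lx\n", filter_with_elf_headers(filter)) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

    Because the filter is inherited by child processes, doing this early in boot (for example from init) is what makes the setting system-wide.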
     
  • /usr/include/scsi is provided by glibc.
    Remove the scsi export from make headers_install target.

    Signed-off-by: Olaf Hering
    Cc: David Woodhouse
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olaf Hering
     
  • The child was found on the ->children list under tasklist_lock, so it must have a
    valid ->signal. __exit_signal() both removes the task from parent->children
    and clears ->signal "atomically" under write_lock(tasklist).

    Remove unneeded checks.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup. __group_complete_signal() wakes up ->group_exit_task twice. The
    second wakeup's state includes TASK_UNINTERRUPTIBLE, which is not very
    appropriate.

    Change the code to pass the "correct" argument to signal_wake_up() and kill
    now unneeded wake_up_process().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The "p->exit_signal == -1 && p->ptrace == 0" check and the comment are
    bogus. We already did exactly the same check in eligible_child(), we did
    not drop tasklist_lock since then, and both variables need
    write_lock(tasklist) to be changed.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Nowadays thread_group_empty() and next_thread() are simple list operations,
    so this optimization doesn't make sense: we are doing exactly the same check
    one line below.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov