01 May, 2013

40 commits

  • Simplify the logic of variable assignments.

    [akpm@linux-foundation.org: replace min_t with min, remove unneeded casts]
    Signed-off-by: Zhang Yanfei
    Cc: "Eric W. Biederman"
    Reviewed-by: Simon Horman
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • The types of the following local variables:

    - ubytes/mbytes in kimage_load_crash_segment()/kimage_load_normal_segment()

    - r in vmcoreinfo_append_str()

    are wrong, so fix them.

    Signed-off-by: Zhang Yanfei
    Cc: "Eric W. Biederman"
    Cc: Simon Horman
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • threadgroup_lock() takes signal->cred_guard_mutex to ensure that
    thread_group_leader() is stable. This doesn't look nice, the scope of
    this lock in do_execve() is huge.

    And as Dave pointed out this can lead to deadlock, we have the
    following dependencies:

    do_execve: cred_guard_mutex -> i_mutex
    cgroup_mount: i_mutex -> cgroup_mutex
    attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

    Change de_thread() to take threadgroup_change_begin() around the
    switch-the-leader code and change threadgroup_lock() to avoid
    ->cred_guard_mutex.

    Note that de_thread() can't sleep with ->group_rwsem held, this can
    obviously deadlock with the exiting leader if the writer is active, so it
    does threadgroup_change_end() before schedule().
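
    The three dependencies listed above form a cycle, which is why the
    deadlock is possible. A minimal user-space sketch of that reasoning
    follows; the enum, the struct and both function names are illustrative
    and not part of the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Lock identifiers taken from the dependency list above. */
enum lock_id { CRED_GUARD_MUTEX, I_MUTEX, CGROUP_MUTEX, NR_LOCKS };

/* One "held -> acquired" ordering edge. */
struct lock_dep {
	enum lock_id held;
	enum lock_id acquired;
};

/* Depth-first search: can 'target' be reached from 'from' along the edges? */
static bool reaches(const struct lock_dep *deps, size_t n,
		    enum lock_id from, enum lock_id target, bool *visited)
{
	if (visited[from])
		return false;
	visited[from] = true;
	for (size_t i = 0; i < n; i++) {
		if (deps[i].held != from)
			continue;
		if (deps[i].acquired == target ||
		    reaches(deps, n, deps[i].acquired, target, visited))
			return true;
	}
	return false;
}

/* True if some lock can be reached from itself, i.e. the order is cyclic. */
bool lock_order_has_cycle(const struct lock_dep *deps, size_t n)
{
	for (int start = 0; start < NR_LOCKS; start++) {
		bool visited[NR_LOCKS] = { false };

		if (reaches(deps, n, (enum lock_id)start,
			    (enum lock_id)start, visited))
			return true;
	}
	return false;
}
```

    Feeding it the three edges from the list reports a cycle; dropping the
    cgroup_mutex -> cred_guard_mutex edge, which is what the change to
    threadgroup_lock() amounts to, leaves the order acyclic.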

    Reported-by: Dave Jones
    Acked-by: Tejun Heo
    Acked-by: Li Zefan
    Signed-off-by: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • set_task_comm() does memset() + wmb() before strlcpy(). This buys
    nothing and to add to the confusion, the comment is wrong.

    - We do not need memset() to be "safe from non-terminating string
    reads", the final char is always zero and we never change it.

    - wmb() is paired with nothing, it cannot prevent a reader from
    printing a mixture of the old/new data unless the reader takes the lock.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Currently, a write to a procfs file will return the number of bytes
    successfully written. If the actual string is longer than this, the
    remainder of the string will not be written and userspace will
    complete the operation by issuing additional write()s.

    Hence

    $ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

    results in

    $ cat /proc/$$/comm
    pqrs

    because the final four bytes were written with a second write(), as
    TASK_COMM_LEN == 16. This is obviously an undesired result and not
    equivalent to prctl(PR_SET_NAME). The implementation should not need to
    know the definition of TASK_COMM_LEN.

    This patch truncates the string to the first TASK_COMM_LEN bytes and
    reports the full length of the input as written, so the second write()
    is suppressed.

    $ cat /proc/$$/comm
    abcdefghijklmno
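
    The behaviour after the patch can be modelled in user space roughly as
    follows. This is a sketch only: comm_write_model() is a made-up name,
    and the real handler copies from user memory rather than using memcpy().

```c
#include <stddef.h>
#include <string.h>

#define TASK_COMM_LEN 16

/*
 * Model of the patched write handler: keep at most TASK_COMM_LEN - 1 bytes
 * (the last byte stays zero), but report the whole input as consumed so the
 * caller does not retry with the leftover bytes.
 */
size_t comm_write_model(char comm[TASK_COMM_LEN], const char *buf, size_t count)
{
	const size_t max = TASK_COMM_LEN - 1;
	size_t len = count < max ? count : max;

	memset(comm, 0, TASK_COMM_LEN);
	memcpy(comm, buf, len);
	return count;
}
```

    With the 19-byte input from the example above, the model stores
    "abcdefghijklmno" and returns 19, so no second write() follows.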

    Signed-off-by: David Rientjes
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
    wait_event-like loop. This is not needed and in fact this is not
    strictly correct, we can/should do this only once after we change
    pipe->writers. We could even check if it becomes zero.

    Change this code to use wait_event_interruptible(), this can also
    help to make this wait freezable.

    With this patch we check pipe->readers without pipe_lock(), this is
    fine. Once we see pipe->readers == 1 we know that the handler
    decremented the counter, this is all we need.

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
    zap_threads() called by do_coredump().

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • By discussion with Mandeep.

    Change dump_write(), dump_seek() and do_coredump() to check
    signal_pending() and abort if it is true. dump_seek() does this only
    before f_op->llseek(), otherwise it relies on dump_write().

    We need this change to ensure that the coredump won't delay suspend, and
    to ensure it reacts to SIGKILL "quickly enough", a core dump can take a
    lot of time. In particular this can help oom-killer.

    We add the new trivial helper, dump_interrupted() to add the comments and
    to simplify the potential freezer changes. Perhaps it will have more
    callers.

    Ideally it should do try_to_freeze() but then we need the unpleasant
    changes in dump_write() and wait_for_dump_helpers(). It is not trivial to
    change dump_write() to restart if f_op->write() fails because of
    freezing(). We need to handle the short writes, we need to clear
    TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
    it to check PF_DUMPCORE). And if the buggy f_op->write() sets
    TIF_SIGPENDING we can not distinguish this case from the race with
    freeze_task() + __thaw_task().

    So we simply accept the fact that the freezer can truncate a core-dump but
    at least you can reliably suspend. Hopefully we can tolerate this
    unlikely case and the necessary complications are not worth the trouble.
    But if we decide to make the coredumping freezable later we can do this on
    top of this change.
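
    A rough user-space model of the check described above, assuming a
    plain flag in place of signal_pending(current). struct dump_ctx and the
    *_model names are invented for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the dumping task's state (hypothetical). */
struct dump_ctx {
	bool signal_pending;	/* models signal_pending(current) */
	size_t written;		/* bytes emitted so far */
};

/* Models dump_interrupted(): one place to document the abort condition. */
bool dump_interrupted_model(const struct dump_ctx *ctx)
{
	return ctx->signal_pending;
}

/* Models dump_write(): true on success, false if the dump must stop. */
bool dump_write_model(struct dump_ctx *ctx, size_t nr)
{
	if (dump_interrupted_model(ctx))
		return false;
	ctx->written += nr;
	return true;
}
```

    Once the flag is set, every later write fails and the dump is cut
    short, which matches the "truncate but stay suspendable" trade-off
    accepted above.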

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that the coredumping process can be SIGKILL'ed, the setting of
    ->group_exit_code in do_coredump() can race with complete_signal() and
    SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
    SIGKILL | 0x80.

    But the main problem is that it is not clear to me what we should do if
    binfmt->core_dump() succeeds but SIGKILL was sent, that is why this patch
    comes as a separate change.

    This patch adds 0x80 if ->core_dump() succeeds and the process was not
    killed. But perhaps we can (should?) re-set ->group_exit_code changed by
    SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T.
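
    The rule this patch applies can be written down as a small pure
    function. This is a sketch under the assumption that "killed" stands
    for a pending fatal signal; the function and macro names are
    illustrative.

```c
#include <stdbool.h>

/* Bit reported to wait() when a core file was written. */
#define CORE_DUMPED_FLAG 0x80

/*
 * 'code' is the current group exit code. The flag is added only if the
 * dump succeeded and the process was not killed in the meantime.
 */
int group_exit_code_model(int code, bool core_dumped, bool killed)
{
	if (core_dumped && !killed)
		code |= CORE_DUMPED_FLAG;
	return code;
}
```

    A successful, uninterrupted dump of a process that died from signal 11
    yields 11 | 0x80, while a dumper that was killed keeps the plain signal
    number.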

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • prepare_signal() blesses SIGKILL sent to the dumping process but this
    signal can be "lost" anyway. The problem is, complete_signal() sees
    SIGNAL_GROUP_EXIT and skips the "kill them all" logic. And even if the
    dumping process is single-threaded (so the target is always "correct"),
    the group-wide SIGKILL is not recorded in task->pending and thus
    __fatal_signal_pending() won't be true. A multi-threaded case has even
    more problems.

    And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
    right to me. This coredumping process is not exiting yet, it can do a lot
    of work dumping the core.

    With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set
    signal->group_exit_task instead. This makes signal_group_exit() true and
    thus this should equally close the races with exit/exec/stop but allows
    the dumping thread to be killed reliably.

    Notes:
    - It is not clear what we should do with ->group_exit_code
    if the dumper was killed, see the next change.

    - we need more (hopefully straightforward) changes to ensure
    that SIGKILL actually interrupts the coredump. Basically we
    need to check __fatal_signal_pending() in dump_write() and
    dump_seek().

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • There are 2 well known and ancient problems with coredump/signals, and a
    lot of related bug reports:

    - do_coredump() clears TIF_SIGPENDING but of course this can't help
    if, say, SIGCHLD comes after that.

    In this case the coredump can fail unexpectedly. See for example the
    wait_for_dump_helpers()->signal_pending() check but there are other
    reasons.

    - At the same time, dumping a huge core on the slow media can take a
    lot of time/resources and there is no way to kill the coredumping
    task reliably. In particular this is not oom_kill-friendly.

    This patch tries to fix the 1st problem, and makes the preparation for the
    next changes.

    We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
    that this process dumps the core. prepare_signal() checks this flag and
    nacks any signal except SIGKILL.

    Note that this check tries to be conservative, in the long term we should
    probably treat the SIGNAL_GROUP_EXIT case equally but this needs more
    discussion. See marc.info/?l=linux-kernel&m=120508897917439

    Notes:
    - recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
    The patch assumes that dump_write/etc paths should never
    call it, but we can change it as well.

    - There is another source of TIF_SIGPENDING, freezer. This
    will be addressed separately.
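
    The filtering rule described above, as a minimal user-space sketch. The
    flag value and the function name are assumptions made for the example;
    only the "nack everything except SIGKILL" rule comes from the text.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag value; the real bit assignment is not given above. */
#define SIGNAL_GROUP_COREDUMP_MODEL 0x1u

/* Returns true if the signal should be queued, false if it is dropped. */
bool prepare_signal_model(unsigned int signal_flags, int sig)
{
	if (signal_flags & SIGNAL_GROUP_COREDUMP_MODEL)
		return sig == SIGKILL;
	return true;
}
```

    In this model a SIGCHLD arriving during the dump is dropped, so it can
    no longer set TIF_SIGPENDING and break the dump, while SIGKILL still
    gets through.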

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This function suffers from not being able to determine if the cleanup is
    called in case it returns -ENOMEM. Nobody is using it anymore, so let's
    remove it.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • These are the only users of call_usermodehelper_fns(). This function
    suffers from not being able to determine if the cleanup is called. Even
    though the cleanup pointer is NULL in these places, convert them to use
    the separate call_usermodehelper_setup() + call_usermodehelper_exec()
    functions so we can remove the _fns variant.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
    calling call_usermodehelper_fns(). If call_usermodehelper_fns() hits an
    OOM, the cleanup function may not be called - in this case we would
    miss a call to key_put().

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Acked-by: David Howells
    Acked-by: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
    calling call_usermodehelper_fns(). In case the latter returns -ENOMEM the
    cleanup function may not have been called - in this case we would not
    free argv and module_name.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • call_usermodehelper_setup() + call_usermodehelper_exec() need to be
    called instead of call_usermodehelper_fns() when the cleanup function
    needs to be called even when an ENOMEM error occurs. In this case using
    call_usermodehelper_fns() the user can't distinguish if the cleanup
    function was called or not.
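
    Why the two-step form removes the ambiguity can be shown with a small
    user-space model. All names below (helper_info, helper_setup(),
    helper_exec(), count_cleanup()) are invented for the sketch and only
    mirror the shape of the kernel API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's per-call helper state. */
struct helper_info {
	void (*cleanup)(void *data);
	void *data;
};

/* Step 1: may return NULL; on failure the cleanup callback has NOT run. */
struct helper_info *helper_setup(void (*cleanup)(void *), void *data,
				 bool simulate_oom)
{
	struct helper_info *info = simulate_oom ? NULL : malloc(sizeof(*info));

	if (!info)
		return NULL;
	info->cleanup = cleanup;
	info->data = data;
	return info;
}

/* Step 2: once called, the cleanup callback runs exactly once. */
int helper_exec(struct helper_info *info)
{
	if (info->cleanup)
		info->cleanup(info->data);
	free(info);
	return 0;
}

/* Example cleanup: counts how many times it ran. */
void count_cleanup(void *data)
{
	++*(int *)data;
}
```

    A NULL return from the setup step tells the caller that it still owns
    its resources and must free them itself; after the exec step the
    callback is known to have run. A combined call that returns only
    -ENOMEM cannot convey that difference.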

    [akpm@linux-foundation.org: export call_usermodehelper_setup() to modules]
    Signed-off-by: Lucas De Marchi
    Reviewed-by: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • * Dump signals from process-wide and per-thread queues with
    different sizes of buffers.
    * Check error paths for buffers with restricted permissions. Part of
    the buffer or the whole buffer is read-only.
    * Try to get a nonexistent signal.

    Signed-off-by: Andrew Vagin
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Cc: David Howells
    Cc: Dave Jones
    Cc: "Michael Kerrisk (man-pages)"
    Cc: Pavel Emelyanov
    Cc: Linus Torvalds
    Cc: Pedro Alves
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • This patch adds a new ptrace request PTRACE_PEEKSIGINFO.

    This request is used to retrieve information about pending signals
    starting with the specified sequence number. Siginfo_t structures are
    copied from the child into the buffer starting at "data".

    The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
    struct ptrace_peeksiginfo_args {
            u64 off;        /* from which siginfo to start */
            u32 flags;
            s32 nr;         /* how many siginfos to take */
    };

    "nr" has type "s32", because ptrace() returns "long", which has 32 bits on
    i386 and negative values are used for errors.

    Currently there is only one flag, PTRACE_PEEKSIGINFO_SHARED, for dumping
    signals from the process-wide queue. If this flag is not set, signals are
    read from the per-thread queue.

    The request PTRACE_PEEKSIGINFO returns the number of dumped signals. If a
    signal with the specified sequence number doesn't exist, ptrace returns
    zero. The request returns an error if no signal has been dumped.

    Errors:
    EINVAL - one or more specified flags are not supported or nr is negative
    EFAULT - buf or addr is outside your accessible address space.
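
    The offset/count semantics described above can be modelled in user
    space as follows. This sketch does not call ptrace(); the struct copy,
    the flag value and peeksiginfo_model() are assumptions for illustration,
    and the queue holds bare signal numbers instead of siginfo_t entries.
    The caller is assumed to have already picked the shared or per-thread
    queue.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as described in the text, using fixed-width userspace types. */
struct peeksiginfo_args_model {
	uint64_t off;	/* from which siginfo to start */
	uint32_t flags;
	int32_t nr;	/* how many siginfos to take */
};

#define PEEKSIGINFO_SHARED_MODEL 0x1u	/* value assumed for the model */

/*
 * Copies up to args->nr entries from 'queue' (length 'queued'), starting
 * at args->off, into 'out'. Returns the number copied, 0 if the offset is
 * past the end of the queue, or -EINVAL for a bad argument.
 */
long peeksiginfo_model(const int *queue, size_t queued,
		       const struct peeksiginfo_args_model *args, int *out)
{
	long copied = 0;

	if (args->nr < 0 || (args->flags & ~PEEKSIGINFO_SHARED_MODEL))
		return -EINVAL;
	for (uint64_t i = args->off; i < queued && copied < args->nr; i++)
		out[copied++] = queue[i];
	return copied;
}
```

    Asking for five entries from offset 1 of a three-entry queue returns 2,
    and asking from offset 3 returns 0, which is how a caller can detect
    the end of the queue.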

    A result siginfo contains the kernel part of si_code, which is usually
    stripped, but it's required for queuing the same siginfo back during
    restore of pending signals.

    This functionality is required for checkpointing pending signals. Pedro
    Alves suggested using it in "gdb" to peek at pending signals. gdb already
    uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
    dequeued. This functionality allows gdb to look at the pending signals
    which were not reported yet.

    The prototype of this code was developed by Oleg Nesterov.

    Signed-off-by: Andrew Vagin
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Cc: David Howells
    Cc: Dave Jones
    Cc: "Michael Kerrisk (man-pages)"
    Cc: Pavel Emelyanov
    Cc: Linus Torvalds
    Cc: Pedro Alves
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • __hfsplus_ext_write_extent() suppresses errors coming from
    hfs_brec_find(). The patch implements error code propagation.

    Signed-off-by: Alexey Khoroshilov
    Reviewed-by: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Khoroshilov
     
  • Use a more current logging style.

    Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    hfsplus now uses "hfsplus: " for all messages.
    Coalesce formats.
    Prefix debugging messages too.
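
    How the pr_fmt() prefix ends up in every message can be seen in a small
    user-space sketch. KBUILD_MODNAME is normally supplied by the kernel
    build system and is defined by hand here; pr_err_model() is a made-up
    stand-in that formats into a buffer instead of calling printk(). The
    ##__VA_ARGS__ form is a GNU extension, as used in the kernel.

```c
#include <stdio.h>
#include <string.h>

/* Supplied by the build system in a real kernel build; set by hand here. */
#define KBUILD_MODNAME "hfsplus"
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Userspace stand-in for pr_err(): formats into a caller-supplied buffer. */
#define pr_err_model(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)
```

    Because the prefix is glued on by string-literal concatenation at
    compile time, call sites no longer need to spell out "hfsplus: "
    themselves.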

    Signed-off-by: Joe Perches
    Cc: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Use a more current logging style.

    Rename macro and uses.
    Add do {} while (0) to macro.
    Add DBG_ to macro.
    Add and use hfs_dbg_cont variant where appropriate.
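
    A user-space sketch of a debug macro with the properties listed above:
    the category is pasted onto a DBG_ prefix and the body is wrapped in
    do {} while (0). The category values, the enabled mask, the "hfs: "
    prefix and the dbg_lines counter are assumptions for the example;
    fprintf() stands in for printk().

```c
#include <stdio.h>

/* Illustrative category bits. */
#define DBG_BNODE_REFS	0x00000001
#define DBG_EXTENT	0x00000020

/* Categories enabled in this build (arbitrary choice for the sketch). */
#define DBG_MASK	(DBG_EXTENT)

static int dbg_lines;	/* counts emitted messages so the effect is visible */

#define hfs_dbg(flg, fmt, ...)					\
do {								\
	if (DBG_##flg & DBG_MASK) {				\
		dbg_lines++;					\
		fprintf(stderr, "hfs: " fmt, ##__VA_ARGS__);	\
	}							\
} while (0)

/* The do {} while (0) wrapper keeps the macro safe in an unbraced if/else. */
int log_extent_change(int start, int count)
{
	if (count > 0)
		hfs_dbg(EXTENT, "extend %d+%d\n", start, count);
	else
		hfs_dbg(BNODE_REFS, "nothing to do\n");
	return dbg_lines;
}
```

    Without the wrapper, the trailing semicolon after the macro call would
    end the if statement early and the else branch would not compile.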

    Signed-off-by: Joe Perches
    Cc: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid':
    (1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
    (2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]

    [akpm@linux-foundation.org: make the workaround more explicit]
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • hfs_find_init() may fail with ENOMEM, but there are places where the
    returned value is not checked. The consequences can be very unpleasant,
    e.g. kfree() of an uninitialized pointer and inappropriate mutex unlocking.

    The patch adds checks for errors in hfs_find_init().

    Found by Linux Driver Verification project (linuxtesting.org).

    Signed-off-by: Alexey Khoroshilov
    Reviewed-by: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Khoroshilov
     
  • page->mapping->host cannot be NULL in nilfs_writepage(), so remove the
    unneeded test.

    This fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage()
    error: we previously assumed 'inode' could be null (see line 195)".

    Reported-by: Dan Carpenter
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Change test_bit(PG_locked, &page->flags) to PageLocked().

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • …river's internal error or metadata corruption

    The NILFS2 driver remounts itself in RO mode in the case of discovering
    metadata corruption (for example, discovering a broken bmap). But
    usually, this takes place when there have been file system operations
    before remounting in RO mode.

    Thereby, the NILFS2 driver can be in RO mode with dirty pages present in
    modified inodes' address spaces. This results in the flush kernel thread
    trying endlessly to flush dirty pages in RO mode. As a result, it is
    possible to see such side effects as: (1) flush kernel thread occupies
    50% - 99% of CPU time; (2) system can't be shut down without manual
    power switch off.

    SYMPTOMS:
    (1) System log contains error message: "Remounting filesystem read-only".
    (2) The flush kernel thread occupies 50% - 99% of CPU time.
    (3) The system can't be shut down without manual power switch off.

    REPRODUCTION PATH:
    (1) Create volume group with name "unencrypted" by means of vgcreate utility.
    (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

    ----------------[BEGIN SCRIPT]--------------------
    #!/bin/bash

    VG=unencrypted
    #apt-get install nilfs-tools darcs
    lvcreate --size 2G --name ntest $VG
    mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
    mkdir /var/tmp/n
    mkdir /var/tmp/n/ntest
    mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
    mkdir /var/tmp/n/ntest/thedir
    cd /var/tmp/n/ntest/thedir
    sleep 2
    date
    darcs init
    sleep 2
    dmesg|tail -n 5
    date
    darcs whatsnew || true
    date
    sleep 2
    dmesg|tail -n 5
    ----------------[END SCRIPT]--------------------

    (3) Try to shut down the system.

    REPRODUCIBILITY: 100%

    FIX:

    This patch implements checking mount state of NILFS2 driver in
    nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page()
    methods. If the RO mount state is detected, then all dirty pages are
    simply discarded and warning messages are written to the system log.

    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
    Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Anthony Doggett <Anthony2486@interfaces.org.uk>
    Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
    Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
    Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
    Cc: Elmer Zhang <freeboy6716@gmail.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Vyacheslav Dubeyko
     
  • Limit the size of the copy so we don't corrupt memory. Hopefully this
    can only be called by root, but fixing this makes the static checkers
    happier.

    Signed-off-by: Dan Carpenter
    Cc: Jiri Kosina
    Cc: Masanari Iida
    Cc: Alan Cox
    Cc: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • Move the calls to memcpy_fromio() up into the loop in
    dmi_scan_machine(), and move the signature checks back down into
    dmi_decode(). We need to check at 16-byte intervals but keep a 32-byte
    buffer for an SMBIOS entry, so shift the buffer after each iteration.

    Merge smbios_present() into dmi_present(), so we look for an SMBIOS
    signature at the beginning of the given buffer and then for a DMI
    signature at an offset of 16 bytes.
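
    The sliding 32-byte window described above can be sketched in user
    space like this. The *_model names are invented, a plain byte array
    replaces the I/O memory, memcpy() replaces memcpy_fromio(), and checksum
    validation is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Signature check on a 32-byte window; checksum validation is omitted. */
bool dmi_present_model(const unsigned char *buf, bool *has_smbios)
{
	*has_smbios = memcmp(buf, "_SM_", 4) == 0;
	return memcmp(buf + 16, "_DMI_", 5) == 0;
}

/*
 * Scans 'mem' in 16-byte steps. Returns the offset of the "_DMI_" anchor,
 * or -1 if none is found. 'len' must be a multiple of 16 and at least 32.
 */
long dmi_scan_model(const unsigned char *mem, size_t len, bool *has_smbios)
{
	unsigned char buf[32];

	memcpy(buf, mem, 16);
	for (size_t off = 16; off + 16 <= len; off += 16) {
		memcpy(buf + 16, mem + off, 16);	/* load the next 16 bytes */
		if (dmi_present_model(buf, has_smbios))
			return (long)off;
		memcpy(buf, buf + 16, 16);		/* shift the window */
	}
	return -1;
}
```

    Each 16-byte block is read from the source once, and every candidate
    position is still checked with the preceding 16 bytes available, so an
    "_SM_" header directly in front of the "_DMI_" anchor is seen in the
    same pass.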

    [artem.savkov@gmail.com: use proper buf type in dmi_present()]
    Signed-off-by: Ben Hutchings
    Reported-by: Tim McGrath
    Tested-by: Tim Mcgrath
    Cc: Zhenzhong Duan
    Signed-off-by: Artem Savkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     
  • The comment I originally added in commit a3defbe5c337 ("binfmt_elf: fix
    PIE execution with randomization disabled") is not really 100% accurate
    -- sysctl is not the only way PF_RANDOMIZE can be forcibly unset
    at runtime.

    Another option of course is direct modification of personality flags
    (i.e. running through setarch wrapper).

    Make the comment more explicit and accurate.

    Signed-off-by: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Kosina
     
  • Add a new configuration option CONFIG_BINFMT_SCRIPT to configure support
    for interpreted scripts starting with "#!"; allow compiling out that
    support, or building it as a module. Embedded systems running exclusively
    compiled binaries could leave this support out, and systems that don't
    need scripts before mounting the root filesystem can build this as a
    module.

    Signed-off-by: Josh Triplett
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • It is always safe to use RCU_INIT_POINTER to NULL a pointer. This results
    in slightly smaller/faster code.

    Signed-off-by: Eric Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • This reduces the amount of code inside the ready list iteration loops for
    better readability IMHO.

    Signed-off-by: Eric Wong
    Cc: Davide Libenzi
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • Technically we do not need to hold ep->mtx during ep_free since we are
    certain there are no other users of ep at that point. However, lockdep
    complains with a "suspicious rcu_dereference_check() usage!" message; so
    lock the mutex before ep_remove to silence the warning.

    Signed-off-by: Eric Wong
    Cc: Al Viro
    Cc: Arve Hjønnevåg
    Cc: Davide Libenzi
    Cc: Eric Dumazet
    Cc: NeilBrown
    Cc: Rafael J. Wysocki
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • This prevents wakeup_source destruction when a user hits the item with
    EPOLL_CTL_MOD while ep_poll_callback is running.

    Tested with CONFIG_SPARSE_RCU_POINTER=y and "make fs/eventpoll.o C=2"

    Signed-off-by: Eric Wong
    Cc: Alexander Viro
    Cc: Arve Hjønnevåg
    Cc: Davide Libenzi
    Cc: Eric Dumazet
    Cc: NeilBrown
    Cc: "Rafael J. Wysocki"
    Cc: "Paul E. McKenney"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • It is common for epoll users to have thousands of epitems, so saving a
    cache line on every allocation leads to large memory savings.

    Since epitem allocations are cache-aligned, reducing sizeof(struct
    epitem) from 136 bytes to 128 bytes will allow it to squeeze under a
    cache line boundary on x86_64.

    Via /sys/kernel/slab/eventpoll_epi, I see the following changes on my
    x86_64 Core2 Duo (which has 64-byte cache alignment):

    object_size : 192 => 128
    objs_per_slab: 21 => 32

    Also, add a BUILD_BUG_ON() to check for future accidental breakage.
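
    The numbers above follow from rounding the object size up to the cache
    line. A short sketch of the arithmetic, assuming a 4096-byte slab (an
    assumption that happens to reproduce the 21 and 32 objects quoted); the
    helper names and the stand-in struct are illustrative, and C11
    _Static_assert plays the role of BUILD_BUG_ON().

```c
#include <stddef.h>

#define CACHE_LINE 64u		/* cache alignment quoted above */
#define SLAB_BYTES 4096u	/* assumed slab size for this sketch */

/* Round an object size up to the next multiple of the cache line. */
size_t aligned_object_size(size_t size)
{
	return (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* How many cache-aligned objects of this size fit in one slab. */
size_t objs_per_slab(size_t size)
{
	return SLAB_BYTES / aligned_object_size(size);
}

/* Compile-time guard against the struct growing again; stand-in struct. */
struct epitem_model {
	unsigned char payload[128];
};
_Static_assert(sizeof(struct epitem_model) <= 128,
	       "epitem_model grew past 128 bytes");
```

    Here 136 bytes round up to 192 (21 objects per slab), while 128 bytes
    stay at 128 (32 objects per slab).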

    [akpm@linux-foundation.org: use __packed, for all architectures]
    Signed-off-by: Eric Wong
    Cc: Davide Libenzi
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • Andrew Morton noted:

    akpm3:/usr/src/25> grep SYSCALL kernel/timer.c
    SYSCALL_DEFINE1(alarm, unsigned int, seconds)
    SYSCALL_DEFINE0(getpid)
    SYSCALL_DEFINE0(getppid)
    SYSCALL_DEFINE0(getuid)
    SYSCALL_DEFINE0(geteuid)
    SYSCALL_DEFINE0(getgid)
    SYSCALL_DEFINE0(getegid)
    SYSCALL_DEFINE0(gettid)
    SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info)
    COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)

    Only one of those should be in kernel/timer.c. Who wrote this thing?

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Stephen Rothwell
    Acked-by: Thomas Gleixner
    Cc: Guenter Roeck
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • Signed-off-by: Stephen Rothwell
    Cc: Thomas Gleixner
    Cc: Guenter Roeck
    Cc: Al Viro
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • The only use outside of kernel/timer.c was in kernel/compat.c, so move
    compat_sys_sysinfo() next to sys_sysinfo() in kernel/timer.c.

    Signed-off-by: Stephen Rothwell
    Cc: Thomas Gleixner
    Cc: Guenter Roeck
    Cc: Al Viro
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell