20 Apr, 2014

1 commit

  • A va_list needs to be copied in case it needs to be used twice.

    Thanks to Hugh for debugging this issue, leading to various panics.
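
    A minimal userspace sketch of the pattern, assuming a print-then-retry
    loop like the one cn_vprintf() uses. The struct name_buf type and the
    buf_vprintf()/buf_printf() names are hypothetical stand-ins, not the
    kernel code. Each vsnprintf() pass gets its own va_copy(), because the
    retry after growing the buffer would otherwise reuse a consumed va_list.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct core_name. */
struct name_buf {
    char *data;   /* NUL-terminated text, or NULL before the first append */
    size_t used;  /* bytes used, not counting the NUL */
    size_t size;  /* bytes allocated */
};

/* Append formatted text; grow the buffer and retry when it does not fit. */
static int buf_vprintf(struct name_buf *b, const char *fmt, va_list arg)
{
    assert(b != NULL && fmt != NULL);

    for (;;) {
        size_t free_bytes = b->size - b->used;
        va_list arg_copy;
        int need;

        /* vsnprintf() consumes its va_list, so every pass gets a fresh copy. */
        va_copy(arg_copy, arg);
        need = vsnprintf(b->data ? b->data + b->used : NULL,
                         free_bytes, fmt, arg_copy);
        va_end(arg_copy);

        if (need < 0)
            return -1;
        if ((size_t)need < free_bytes) {
            b->used += (size_t)need;
            return 0;
        }

        /* Too small: make room for the text plus its NUL, then try again. */
        size_t new_size = b->used + (size_t)need + 1;
        char *p = realloc(b->data, new_size);
        if (!p)
            return -1;
        b->data = p;
        b->size = new_size;
    }
}

/* Variadic front end that forwards to buf_vprintf(). */
static int buf_printf(struct name_buf *b, const char *fmt, ...)
{
    va_list arg;
    int ret;

    va_start(arg, fmt);
    ret = buf_vprintf(b, fmt, arg);
    va_end(arg);
    return ret;
}
```

    Starting from a zeroed struct name_buf, the first call always takes
    the grow-and-retry path, which is the path that needs the copy.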

    Tested:

    lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern

    'produce_core' is simply : main() { *(int *)0 = 1;}

    lpq84:~# ./produce_core
    Segmentation fault (core dumped)
    lpq84:~# dmesg | tail -1
    [ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed

    Notice the last argument was replaced by a NULL (we were lucky enough to
    not crash, but do not try this on your production machine !)

    After fix :

    lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
    lpq83:~# ./produce_core
    Segmentation fault
    lpq83:~# dmesg | tail -1
    [ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed

    Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
    Signed-off-by: Eric Dumazet
    Diagnosed-by: Hugh Dickins
    Acked-by: Oleg Nesterov
    Cc: Neil Horman
    Cc: Andrew Morton
    Cc: stable@vger.kernel.org # 3.11+
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

24 Jan, 2014

1 commit

  • 1. Remove fs/coredump.h. It is not clear why we need it:
    it only declares __get_dumpable(), and signal.c includes it
    for no reason.

    2. Now that get_dumpable() and __get_dumpable() are really
    trivial, make them inline in linux/sched.h.
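
    As an illustration of how small such helpers are, here is a userspace
    sketch. It assumes the dumpable mode lives in the two low bits of
    mm->flags; struct mm_sketch and both function names are hypothetical
    stand-ins rather than the definitions in linux/sched.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the two low bits of the flags word hold the dumpable mode. */
#define DUMPABLE_BITS 2
#define DUMPABLE_MASK ((1UL << DUMPABLE_BITS) - 1)

/* Hypothetical stand-in for the kernel's struct mm_struct. */
struct mm_sketch {
    unsigned long flags;
};

/* Extract the dumpable mode from a raw flags word. */
static inline int dumpable_from_flags(unsigned long mm_flags)
{
    return (int)(mm_flags & DUMPABLE_MASK);
}

/* Convenience form that takes the mm itself. */
static inline int get_dumpable_sketch(const struct mm_sketch *mm)
{
    assert(mm != NULL);
    return dumpable_from_flags(mm->flags);
}
```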

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Alex Kelly
    Cc: "Eric W. Biederman"
    Cc: Josh Triplett
    Cc: Petr Matousek
    Cc: Vasily Kulikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

16 Nov, 2013

2 commits


09 Nov, 2013

5 commits


25 Oct, 2013

1 commit


12 Sep, 2013

1 commit

  • Add a new %P variable to be used in core_pattern. This variable contains
    the global PID (PID in the init namespace) as %p contains the PID in the
    current namespace which isn't always what we want.

    The main use for this is to make it easier to handle crashes that happened
    within a container. With this new variable it's possible to have the
    crashes dumped into the container or forwarded to the host with the right
    PID (from the host's point of view).

    Signed-off-by: Stéphane Graber
    Reported-by: Hans Feldt
    Cc: Alexander Viro
    Cc: Eric W. Biederman
    Cc: Andy Whitcroft
    Acked-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stéphane Graber
     

04 Jul, 2013

6 commits

  • "goto end" should not bypass the "Backward compatibility with
    core_uses_pid" code, move this label up.

    While at it,

    - It is ugly to copy '|' into cn->corename and then inc
    the pointer for argv_split().

    Change format_corename() to increment pat_ptr instead.

    - Remove the dead "if (*pat_ptr == 0)" in format_corename(),
    we already checked it is not zero.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Colin Walters
    Cc: Denys Vlasenko
    Cc: Jiri Slaby
    Cc: Lennart Poettering
    Cc: Lucas De Marchi
    Acked-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Imho, "atomic_t call_count" is ugly and should die. It buys nothing and
    in fact it can grow more than necessary, expand doesn't check if it was
    already incremented by another task.

    Kill it, and introduce "static int core_name_size" updated by
    expand_corename(). This is obviously racy too but harmless, and
    core_name_size never grows for no reason.

    We do not bother to calculate the "right" new size, we simply do
    kmalloc(size_we_need) and use ksize() to rely on kmalloc_index's decision.

    Finally change format_corename() to use expand_corename(), krealloc(NULL)
    is fine.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Colin Walters
    Cc: Denys Vlasenko
    Cc: Jiri Slaby
    Cc: Lennart Poettering
    Cc: Lucas De Marchi
    Acked-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The usage of cn_escape() looks really annoying, imho this sequence needs a
    wrapper. And it is buggy: if cn_printf() does expand_corename(), then
    cn_escape() writes to the freed memory.

    Introduce cn_esc_printf() which hopefully does this all right. It records
    the index before cn_vprintf(), not "char *" which is no longer valid (in
    general) after krealloc().
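
    The index-versus-pointer point can be sketched in userspace as
    follows. The struct and function names are hypothetical, realloc()
    stands in for krealloc(), and the escaping rule (replace '/' with
    '!') is an assumption about what cn_escape() does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer. */
struct name_buf {
    char *data;
    size_t used;   /* bytes used, not counting the NUL */
    size_t size;   /* bytes allocated */
};

/* Append s; realloc() may move b->data, invalidating older pointers into it. */
static int buf_append(struct name_buf *b, const char *s)
{
    size_t len = strlen(s);

    assert(b != NULL);
    if (b->used + len + 1 > b->size) {
        size_t new_size = b->used + len + 1;
        char *p = realloc(b->data, new_size);
        if (!p)
            return -1;
        b->data = p;
        b->size = new_size;
    }
    memcpy(b->data + b->used, s, len + 1);   /* copies the NUL as well */
    b->used += len;
    return 0;
}

/* Append s, then escape '/' in the newly added part only. */
static int buf_append_escaped(struct name_buf *b, const char *s)
{
    size_t start = b->used;   /* an index stays valid across realloc() */

    if (buf_append(b, s) != 0)
        return -1;
    /* The pointer is derived from the index only after the append. */
    for (char *p = b->data + start; *p; p++)
        if (*p == '/')
            *p = '!';
    return 0;
}
```

    Saving "b->data + b->used" before the append instead of the index
    would reproduce the bug: the loop would then walk the old, possibly
    freed allocation.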

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Colin Walters
    Cc: Denys Vlasenko
    Cc: Jiri Slaby
    Cc: Lennart Poettering
    Cc: Lucas De Marchi
    Acked-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • cn_vprintf() looks really overcomplicated and sub-optimal. We do not need
    vsnprintf(NULL) to calculate the size we need, we can simply try to print
    into the current buffer and expand/retry only if necessary.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Colin Walters
    Cc: Denys Vlasenko
    Cc: Jiri Slaby
    Cc: Lennart Poettering
    Cc: Lucas De Marchi
    Acked-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Turn cn_printf(...) into cn_vprintf(va_list args), reintroduce
    cn_printf() as a trivial wrapper.

    This simplifies the next change and cn_vprintf() will have more
    callers.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Colin Walters
    Cc: Denys Vlasenko
    Cc: Jiri Slaby
    Cc: Lennart Poettering
    Cc: Lucas De Marchi
    Acked-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_coredump() assumes that format_corename() can only fail if
    expand_corename() fails and frees cn->corename. This is not true, for
    example cn_print_exe_file() can fail and in this case nobody frees
    cn->corename.

    Change do_coredump() to always do kfree(cn->corename) after it calls
    format_corename() (NULL is fine), change expand_corename() to do nothing
    if kmalloc() fails.
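
    A userspace sketch of the resulting ownership rule, with
    hypothetical names throughout: the formatter may fail before or
    after it has allocated the name, and the caller frees
    unconditionally because free(NULL), like kfree(NULL), is a no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the formatted name. */
struct name_buf {
    char *data;
};

/* Hypothetical formatter: it can fail after b->data was already allocated. */
static int format_name(struct name_buf *b, const char *pattern)
{
    b->data = malloc(strlen(pattern) + 1);
    if (!b->data)
        return -1;            /* allocation failed, b->data stays NULL */
    strcpy(b->data, pattern);
    if (strchr(pattern, '%'))
        return -1;            /* stand-in for a later step failing */
    return 0;
}

/* Caller: frees the name on every path instead of guessing which step failed. */
static int dump_sketch(const char *pattern)
{
    struct name_buf b = { NULL };
    int ret;

    assert(pattern != NULL);
    ret = format_name(&b, pattern);
    /* ... a real caller would use b.data here when ret == 0 ... */
    free(b.data);             /* free(NULL) is fine */
    return ret;
}
```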

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Colin Walters
    Cc: Denys Vlasenko
    Cc: Jiri Slaby
    Cc: Lennart Poettering
    Cc: Lucas De Marchi
    Acked-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

05 May, 2013

1 commit


02 May, 2013

1 commit

  • Pull VFS updates from Al Viro:

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

8 commits

  • wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
    wait_event-like loop. This is not needed and in fact this is not
    strictly correct, we can/should do this only once after we change
    pipe->writers. We could even check if it becomes zero.

    Change this code to use wait_event_interruptible(), this can also
    help to make this wait freezable.

    With this patch we check pipe->readers without pipe_lock(), this is
    fine. Once we see pipe->readers == 1 we know that the handler
    decremented the counter, this is all we need.

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
    zap_threads() called by do_coredump().

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • By discussion with Mandeep.

    Change dump_write(), dump_seek() and do_coredump() to check
    signal_pending() and abort if it is true. dump_seek() does this only
    before f_op->llseek(), otherwise it relies on dump_write().

    We need this change to ensure that the coredump won't delay suspend, and
    to ensure it reacts to SIGKILL "quickly enough", a core dump can take a
    lot of time. In particular this can help oom-killer.

    We add the new trivial helper, dump_interrupted() to add the comments and
    to simplify the potential freezer changes. Perhaps it will have more
    callers.
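
    A rough userspace analogy of checking for a pending signal before
    each chunk of output. The names are hypothetical, memcpy() stands in
    for f_op->write(), and a plain flag stands in for signal_pending().

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A real program would set this from a signal handler. */
static volatile sig_atomic_t pending_signal;

/* Single place to ask "should the dump stop now?". */
static bool dump_interrupted_sketch(void)
{
    return pending_signal != 0;
}

/* Copy src to dst in chunks, giving up as soon as a signal is pending. */
static bool dump_write_sketch(char *dst, const char *src, size_t len,
                              size_t chunk, size_t *written)
{
    assert(chunk > 0 && written != NULL);

    *written = 0;
    while (*written < len) {
        size_t n;

        if (dump_interrupted_sketch())
            return false;     /* abort: the dump is truncated */
        n = len - *written < chunk ? len - *written : chunk;
        memcpy(dst + *written, src + *written, n);
        *written += n;
    }
    return true;
}
```

    Keeping the check in one helper means a later change, such as
    freezer handling, only has to touch that helper.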

    Ideally it should do try_to_freeze() but then we need the unpleasant
    changes in dump_write() and wait_for_dump_helpers(). It is not trivial to
    change dump_write() to restart if f_op->write() fails because of
    freezing(). We need to handle the short writes, we need to clear
    TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
    it to check PF_DUMPCORE). And if the buggy f_op->write() sets
    TIF_SIGPENDING we can not distinguish this case from the race with
    freeze_task() + __thaw_task().

    So we simply accept the fact that the freezer can truncate a core-dump but
    at least you can reliably suspend. Hopefully we can tolerate this
    unlikely case and the necessary complications are not worth the trouble.
    But if we decide to make the coredumping freezable later we can do this on
    top of this change.

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that the coredumping process can be SIGKILL'ed, the setting of
    ->group_exit_code in do_coredump() can race with complete_signal() and
    SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
    SIGKILL | 0x80.

    But the main problem is that it is not clear to me what we should do if
    binfmt->core_dump() succeeds but SIGKILL was sent, that is why this patch
    comes as a separate change.

    This patch adds 0x80 if ->core_dump() succeeds and the process was not
    killed. But perhaps we can (should?) re-set ->group_exit_code changed by
    SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T.
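
    The rule this patch adds can be written as a small pure function.
    This is only a sketch: the function name and parameters are
    hypothetical, and the kernel applies the rule to
    ->group_exit_code rather than returning a value.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

#define CORE_DUMPED_BIT 0x80   /* the bit the commit message refers to */

/*
 * Final exit code: the core bit is added only when the dump succeeded
 * and no fatal signal (SIGKILL) arrived in the meantime.
 */
static int finish_exit_code(int code, bool core_dumped, bool fatal_pending)
{
    assert(code > 0 && code < CORE_DUMPED_BIT);

    if (core_dumped && !fatal_pending)
        return code | CORE_DUMPED_BIT;
    return code;
}
```

    With this rule a killed dumper reports plain SIGKILL, never
    SIGKILL | 0x80.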

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • prepare_signal() blesses SIGKILL sent to the dumping process but this
    signal can be "lost" anyway. The problem is that complete_signal() sees
    SIGNAL_GROUP_EXIT and skips the "kill them all" logic. And even if the
    dumping process is single-threaded (so the target is always "correct"),
    the group-wide SIGKILL is not recorded in task->pending and thus
    __fatal_signal_pending() won't be true. A multi-threaded case has even
    more problems.

    And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
    right to me. This coredumping process is not exiting yet, it can do a lot
    of work dumping the core.

    With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set
    signal->group_exit_task instead. This makes signal_group_exit() true and
    thus this should equally close the races with exit/exec/stop, but it allows
    the dumping thread to be killed reliably.

    Notes:
    - It is not clear what we should do with ->group_exit_code
    if the dumper was killed, see the next change.

    - we need more (hopefully straightforward) changes to ensure
    that SIGKILL actually interrupts the coredump. Basically we
    need to check __fatal_signal_pending() in dump_write() and
    dump_seek().

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • There are 2 well known and ancient problems with coredump/signals, and a
    lot of related bug reports:

    - do_coredump() clears TIF_SIGPENDING but of course this can't help
    if, say, SIGCHLD comes after that.

    In this case the coredump can fail unexpectedly. See for example
    wait_for_dump_helper()->signal_pending() check but there are other
    reasons.

    - At the same time, dumping a huge core on the slow media can take a
    lot of time/resources and there is no way to kill the coredumping
    task reliably. In particular this is not oom_kill-friendly.

    This patch tries to fix the 1st problem, and makes the preparation for the
    next changes.

    We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
    that this process dumps the core. prepare_signal() checks this flag and
    nacks any signal except SIGKILL.
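
    The check described above reduces to a few lines. In this sketch the
    flag value and the function name are assumptions, and everything
    else the real prepare_signal() does is left out.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Assumed flag value; only the flag's name comes from the commit message. */
#define SIGNAL_GROUP_COREDUMP_SKETCH 0x08u

/* Returns true when the signal may be queued, false when it is dropped. */
static bool prepare_signal_sketch(unsigned int signal_flags, int sig)
{
    assert(sig > 0);

    if (signal_flags & SIGNAL_GROUP_COREDUMP_SKETCH)
        return sig == SIGKILL;   /* a dumping process accepts SIGKILL only */
    return true;                 /* the real function has further checks */
}
```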

    Note that this check tries to be conservative, in the long term we should
    probably treat the SIGNAL_GROUP_EXIT case equally but this needs more
    discussion. See marc.info/?l=linux-kernel&m=120508897917439

    Notes:
    - recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
    The patch assumes that dump_write/etc paths should never
    call it, but we can change it as well.

    - There is another source of TIF_SIGPENDING, freezer. This
    will be addressed separately.

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • These are the only users of call_usermodehelper_fns(). This function
    suffers from not being able to determine if the cleanup is called. Even
    though the cleanup pointer is NULL in these places, convert them to use
    the separate call_usermodehelper_setup() + call_usermodehelper_exec()
    functions so we can remove the _fns variant.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     

10 Apr, 2013

2 commits


28 Feb, 2013

1 commit

  • The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_*
    defines introduced in 54b501992dd2 ("coredump: warn about unsafe
    suid_dumpable / core_pattern combo"). Remove the new ones, and use the
    prior values instead.

    Signed-off-by: Kees Cook
    Reported-by: Chen Gang
    Cc: Alexander Viro
    Cc: Alan Cox
    Cc: "Eric W. Biederman"
    Cc: Doug Ledford
    Cc: Serge Hallyn
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

23 Feb, 2013

1 commit


29 Nov, 2012

1 commit


17 Oct, 2012

1 commit

  • replace_fd() began with "eats a reference, tries to insert into
    descriptor table" semantics; at some point I'd switched it to
    much saner current behaviour ("try to insert into descriptor
    table, grabbing a new reference if inserted; caller should do
    fput() in any case"), but forgot to update the callers.
    Mea culpa...

    [Spotted by Pavel Roskin, who has a really weird system with pipe-fed
    coredumps as part of what he considers a normal boot ;-)]

    Signed-off-by: Al Viro

    Al Viro
     

06 Oct, 2012

3 commits

  • This is a preparatory patch for the introduction of NT_SIGINFO elf note.

    With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
    do_coredump() and put it into coredump_params. It will be used by the
    next patch. Most changes are simple s/signr/siginfo->si_signo/.

    Signed-off-by: Denys Vlasenko
    Reviewed-by: Oleg Nesterov
    Cc: Amerigo Wang
    Cc: "Jonathan M. Foote"
    Cc: Roland McGrath
    Cc: Pedro Alves
    Cc: Fengguang Wu
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • Some coredump handlers want to create a core file in a way compatible with
    standard behavior. Standard behavior with fs.suid_dumpable = 2 is to
    create the core file with uid=gid=0. However, there was no way for a
    coredump handler to know that the process being dumped was suid'ed.

    This patch adds the new %d specifier for format_corename() which simply
    reports __get_dumpable(mm->flags), this is compatible with
    /proc/sys/fs/suid_dumpable we already have.
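
    A sketch of how a pipe handler might consume the new argument. It
    assumes the handler receives %d as a decimal string in argv and that
    the values mean the same as fs.suid_dumpable (0, 1 or 2). Both
    function names and the ownership policy are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Values as reported by fs.suid_dumpable. */
enum { DUMP_DISABLE = 0, DUMP_USER = 1, DUMP_ROOT = 2 };

/* Parse the %d argument; returns -1 for anything that is not 0, 1 or 2. */
static int parse_dumpable(const char *arg)
{
    char *end;
    long v;

    assert(arg != NULL);
    v = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || v < DUMP_DISABLE || v > DUMP_ROOT)
        return -1;
    return (int)v;
}

/* Owner for the core file: root in mode 2, otherwise the crashing user. */
static uid_t core_file_owner(int dumpable, uid_t crashing_uid)
{
    return dumpable == DUMP_ROOT ? (uid_t)0 : crashing_uid;
}
```

    A handler would typically be registered with a pattern that passes
    %d alongside other specifiers such as %u, then call these helpers on
    the matching argv entries.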

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=787135

    Developed during a discussion with Denys Vlasenko.

    Signed-off-by: Oleg Nesterov
    Cc: Denys Vlasenko
    Cc: Alex Kelly
    Cc: Andi Kleen
    Cc: Cong Wang
    Cc: Jiri Moskovcak
    Acked-by: Neil Horman
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Create a new header file, fs/coredump.h, which contains functions only
    used by the new coredump.c. It also moves do_coredump to the
    include/linux/coredump.h header file, for consistency.

    Signed-off-by: Alex Kelly
    Reviewed-by: Josh Triplett
    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Kelly
     

03 Oct, 2012

1 commit

  • This prepares for making core dump functionality optional.

    The variable "suid_dumpable" and associated functions are left in fs/exec.c
    because they're used elsewhere, such as in ptrace.

    Signed-off-by: Alex Kelly
    Reviewed-by: Josh Triplett
    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Alex Kelly