do_exit(): make sure that we run with get_fs() == USER_DS

If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not otherwise reset before do_exit(). do_exit may later (via mm_release in fork.c) do a put_user to a user-controlled address, potentially allowing a user to leverage an oops into a controlled write into kernel memory. This is only triggerable in the presence of another bug, but this potentially turns a lot of DoS bugs into privilege escalations, so it's worth fixing. I have proof-of-concept code which uses this bug along with CVE-2010-3849 to write a zero to an arbitrary kernel address, so I've tested that this is not theoretical. A more logical place to put this fix might be when we know an oops has occurred, before we call do_exit(), but that would involve changing every architecture, in multiple places. Let's just stick it in do_exit instead. [akpm@linux-foundation.org: update code comment] Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

do_exit(): make sure that we run with get_fs() == USER_DS
If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not otherwise reset before do_exit(). do_exit may later (via mm_release in fork.c) do a put_user to a user-controlled address, potentially allowing a user to leverage an oops into a controlled write into kernel memory. This is only triggerable in the presence of another bug, but this potentially turns a lot of DoS bugs into privilege escalations, so it's worth fixing. I have proof-of-concept code which uses this bug along with CVE-2010-3849 to write a zero to an arbitrary kernel address, so I've tested that this is not theoretical. A more logical place to put this fix might be when we know an oops has occurred, before we call do_exit(), but that would involve changing every architecture, in multiple places. Let's just stick it in do_exit instead. [akpm@linux-foundation.org: update code comment] Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nelson Elhage · Linus Torvalds
1 parent a0b0f58cdd
Showing 1 changed file with 9 additions and 0 deletions Side-by-side Diff
kernel/exit.c
@@ -914,6 +914,15 @@
 	if (unlikely(!tsk->pid))
 		panic("Attempted to kill the idle task!");
  
+	/*
+	 * If do_exit is called because this processes oopsed, it's possible
+	 * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
+	 * continuing. Amongst other possible reasons, this is to prevent
+	 * mm_release()->clear_child_tid() from writing to a user-controlled
+	 * kernel address.
+	 */
+	set_fs(USER_DS);
+
 	tracehook_report_exit(&code);
  
 	validate_creds_for_do_exit(tsk);
...	...	@@ -914,6 +914,15 @@
914	914	if (unlikely(!tsk->pid))
915	915	panic("Attempted to kill the idle task!");
916	916
	917	+ /*
	918	+ * If do_exit is called because this processes oopsed, it's possible
	919	+ * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
	920	+ * continuing. Amongst other possible reasons, this is to prevent
	921	+ * mm_release()->clear_child_tid() from writing to a user-controlled
	922	+ * kernel address.
	923	+ */
	924	+ set_fs(USER_DS);
	925	+
917	926	tracehook_report_exit(&code);
918	927
919	928	validate_creds_for_do_exit(tsk);