Commit 3c77f845722158206a7209c45ccddc264d19319c

Authored by Oleg Nesterov
Committed by Linus Torvalds
1 parent 37a09f0745

exec: make argv/envp memory visible to oom-killer

Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()->copy_strings() can allocate a lot of memory, but
this is not visible to the oom-killer: nobody can see the nascent
bprm->mm and take it into account.

With this patch, get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate a new page for argv/envp. When
do_execve() succeeds or fails, we change this counter back.

Technically this is not 100% correct: we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters, and everything becomes correct
once exec changes ->mm or fails.

Reported-by: Brad Spengler <spender@grsecurity.net>
Reviewed-and-discussed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2 changed files with 31 additions and 2 deletions

@@ -164,6 +164,25 @@
 
 #ifdef CONFIG_MMU
 
+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+{
+	struct mm_struct *mm = current->mm;
+	long diff = (long)(pages - bprm->vma_pages);
+
+	if (!mm || !diff)
+		return;
+
+	bprm->vma_pages = pages;
+
+#ifdef SPLIT_RSS_COUNTING
+	add_mm_counter(mm, MM_ANONPAGES, diff);
+#else
+	spin_lock(&mm->page_table_lock);
+	add_mm_counter(mm, MM_ANONPAGES, diff);
+	spin_unlock(&mm->page_table_lock);
+#endif
+}
+
 static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 		int write)
 {
@@ -186,6 +205,8 @@
 		unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
 		struct rlimit *rlim;
 
+		acct_arg_size(bprm, size / PAGE_SIZE);
+
 		/*
 		 * We've historically supported up to 32 pages (ARG_MAX)
 		 * of argument strings even with small stacks
@@ -276,6 +297,10 @@
 
 #else
 
+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+{
+}
+
 static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 		int write)
 {
@@ -1003,6 +1028,7 @@
 	/*
 	 * Release all of the old mmap stuff
 	 */
+	acct_arg_size(bprm, 0);
 	retval = exec_mmap(bprm->mm);
 	if (retval)
 		goto out;
@@ -1426,8 +1452,10 @@
 	return retval;
 
 out:
-	if (bprm->mm)
-		mmput (bprm->mm);
+	if (bprm->mm) {
+		acct_arg_size(bprm, 0);
+		mmput(bprm->mm);
+	}
 
 out_file:
 	if (bprm->file) {
include/linux/binfmts.h
@@ -29,6 +29,7 @@
 	char buf[BINPRM_BUF_SIZE];
 #ifdef CONFIG_MMU
 	struct vm_area_struct *vma;
+	unsigned long vma_pages;
 #else
 # define MAX_ARG_PAGES 32
 	struct page *page[MAX_ARG_PAGES];