20 Oct, 2008

40 commits

  • Change mips to use the new bcd2bin/bin2bcd functions instead of the
    obsolete BCD_TO_BIN/BIN_TO_BCD/BCD2BIN/BIN2BCD macros.

    Signed-off-by: Adrian Bunk
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
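
For readers unfamiliar with the conversion these helpers perform, here is a minimal userspace sketch. The function bodies are written for illustration and are an assumption, not a copy of the kernel source; they handle a single packed-BCD byte in the range 0x00..0x99.

```c
/* Packed BCD stores one decimal digit per nibble: 0x59 means decimal 59. */
static unsigned bcd2bin(unsigned char val)
{
    /* low nibble is the ones digit, high nibble is the tens digit */
    return (val & 0x0f) + (val >> 4) * 10;
}

static unsigned char bin2bcd(unsigned val)
{
    /* tens digit goes to the high nibble, ones digit to the low nibble */
    return (unsigned char)(((val / 10) << 4) + val % 10);
}
```

An RTC seconds register that reads 0x59 would convert to 59 with bcd2bin(), and bin2bcd(23) would yield 0x23 for writing back.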
     
  • These are going away.

    Cc: Takashi Iwai
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Change various rtc related code to use the new bcd2bin/bin2bcd functions
    instead of the obsolete BCD_TO_BIN/BIN_TO_BCD/BCD2BIN/BIN2BCD macros.

    Signed-off-by: Adrian Bunk
    Acked-by: Alessandro Zummo
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Change drivers/rtc/ to use the new bcd2bin/bin2bcd functions instead of
    the obsolete BCD_TO_BIN/BIN_TO_BCD/BCD2BIN/BIN2BCD macros.

    Signed-off-by: Adrian Bunk
    Acked-by: Alessandro Zummo
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Change cris to use the new bcd2bin/bin2bcd functions instead of the
    obsolete BCD_TO_BIN/BIN_TO_BCD macros.

    Signed-off-by: Adrian Bunk
    Cc: Chris Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Change alpha to use the new bcd2bin/bin2bcd functions instead of the
    obsolete BCD_TO_BIN/BIN_TO_BCD macros.

    Signed-off-by: Adrian Bunk
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Change ACPI to use the new bcd2bin/bin2bcd functions instead of the
    obsolete BCD_TO_BIN/BIN_TO_BCD macros.

    Signed-off-by: Adrian Bunk
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch makes the needlessly global anon_vma_cachep static.

    Signed-off-by: Adrian Bunk
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Rik van Riel
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • A consolidated implementation will provide this generically through
    asm/byteorder. Remove direct includes to avoid breakage when the
    changeover to the new implementation occurs.

    Signed-off-by: Harvey Harrison
    Acked-by: Mauro Carvalho Chehab
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • This is needed during the transition to the new byteorder headers as the
    swabb.h functionality will be provided from asm/byteorder.h in the new
    version. To avoid breakage on arches still using the old implementation,
    provide swabb.h from asm/byteorder.h as well.

    Signed-off-by: Harvey Harrison
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • This makes the new implementation of the byteorder helpers match the old
    in how it degraded when an arch-defined version was not available:

    1) swab()
    - look for arch-defined
    - if not, use generic C version

    2) swabp()
    - look for arch-defined
    - if not, deref pointer and use swab()

    3) swabs()
    - look for arch-defined
    - if not, use swabp()

    Signed-off-by: Harvey Harrison
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
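
The three-level fallback described in this entry can be sketched in userspace C as follows. The my_swab32* names and the ARCH_SWAB32* hooks are hypothetical, chosen only for illustration; the real header names and macros are not given in the commit message.

```c
#include <stdint.h>

/* Hypothetical hooks: an architecture with a fast byte-swap instruction
 * would define ARCH_SWAB32 (and optionally ARCH_SWAB32P / ARCH_SWAB32S). */

/* 1) swab(): use the arch version if defined, else generic C. */
static inline uint32_t my_swab32(uint32_t x)
{
#ifdef ARCH_SWAB32
    return ARCH_SWAB32(x);
#else
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
#endif
}

/* 2) swabp(): use the arch version if defined, else deref and use swab(). */
static inline uint32_t my_swab32p(const uint32_t *p)
{
#ifdef ARCH_SWAB32P
    return ARCH_SWAB32P(p);
#else
    return my_swab32(*p);
#endif
}

/* 3) swabs(): use the arch version if defined, else store swabp()'s result. */
static inline void my_swab32s(uint32_t *p)
{
#ifdef ARCH_SWAB32S
    ARCH_SWAB32S(p);
#else
    *p = my_swab32p(p);
#endif
}
```

With none of the hooks defined, every call ends up in the generic C expression, which is the degradation order the entry describes.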
     
  • Signed-off-by: Harvey Harrison
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • The cell_edac driver sets the edac_mode field of the csrows to an
    incorrect value, which causes the sysfs show routine for that field to
    index past the end of an array and oops the kernel when it is used.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Doug Thompson
    Cc: [2.6.27.x, 2.6.26.x, 2.6.25.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • This is only compile tested, because I do not own appropriate hardware.

    Signed-off-by: Andre Haupt
    Cc: Jim Cromie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andre Haupt
     
  • Increase the range of various posix message queue limits.

    Posix gives the message queue user the ability to 'trade off' the maximum
    size of messages with the number of possible messages that can be 'in
    flight'. Linux currently makes this trade off more restrictive than it
    needs to be.

    In particular, the maximum message size today can be made no smaller than
    8192. This greatly restricts those applications that would like to have
    the ability to post large numbers of very small messages.

    So this patch lowers the limit that the maximum message size can be set
    to, from 8192 to 128. It also lowers the limit that the maximum number of
    messages in flight can be set to, from 10 to 1.

    With these changes the message queue user can make better trade offs
    between #messages and message size, in order to get everything to fit
    within the setrlimit(RLIMIT_MSGQUEUE) limit for that particular user.

    This patch also applies the values in

    /proc/sys/fs/mqueue/msg_max
    /proc/sys/fs/mqueue/msgsize_max

    as the defaults for the max #messages allowed and the max message size
    allowed, respectively, for those applications that do not supply these.
    Previously, the defaults were hardwired to 10 and 8192, respectively.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Joe Korty
    Cc: Al Viro
    Cc: Manfred Spraul
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Korty
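
The trade-off between message count and message size can be made concrete with a little arithmetic. The sketch below assumes a simplified accounting model in which each queued message costs its payload size plus one pointer of bookkeeping; the commit message does not spell out the exact accounting, so treat the formula and the function names as illustrative assumptions.

```c
/* Assumed model: bytes charged against RLIMIT_MSGQUEUE for one queue. */
static unsigned long queue_bytes(unsigned long maxmsg,
                                 unsigned long msgsize,
                                 unsigned long ptr_size)
{
    return maxmsg * (ptr_size + msgsize);
}

/* Largest number of in-flight messages of a given size that fits
 * in the byte budget under the same assumed model. */
static unsigned long max_messages(unsigned long budget,
                                  unsigned long msgsize,
                                  unsigned long ptr_size)
{
    return budget / (ptr_size + msgsize);
}
```

Under this model, with 8-byte pointers and a budget of 819200 bytes, a message size of 8192 allows 99 messages in flight, while a message size of 128 allows 6023. That is the kind of room the lowered minimum gives to applications posting many small messages.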
     
  • Add the symbol 'vmlist' and the offset 'vm_struct.addr' to the
    vmcoreinfo[1] data for i386 vmalloc translation.

    makedumpfile[2] needs the VMALLOC_START value to tell whether an address
    is a vmalloc address, so that it can choose a suitable translation
    method. With this patch applied, makedumpfile can obtain the
    VMALLOC_START value from 'vmlist.addr'.

    vmcoreinfo[1]:
    The vmcoreinfo data contains only the minimum debugging information
    needed for dump filtering. makedumpfile[2] uses it to identify
    unnecessary pages and create a small dumpfile.

    makedumpfile[2]:
    dump filtering command
    https://sourceforge.net/projects/makedumpfile/

    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     
  • elfcore header memory needs to be reserved in a crash kernel. This means
    that the relevant code should be protected by CONFIG_CRASH_DUMP rather
    than CONFIG_PROC_VMCORE.

    Signed-off-by: Simon Horman
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Horman
     
  • The usage of elfcorehdr_addr has changed recently: is_kdump_kernel() now
    compares it against ELFCORE_ADDR_MAX to determine whether the code is
    executing in a kernel that was started as a crash kernel.

    However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will reset
    elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent
    calls to is_kdump_kernel() will return 0, even though they should return
    1.

    Ok, at this point in time there are no subsequent calls, but I think it's
    fair to say that there is ample scope for error or at the very least
    confusion.

    This patch adds an extra state, ELFCORE_ADDR_ERR, which indicates that
    elfcorehdr_addr was passed on the command line, and thus execution is
    taking place in a crashdump kernel, but vmcore can't be used for some
    reason. This is tested for using is_vmcore_usable() and set using
    vmcore_unusable(). A subsequent patch makes use of this new code.

    To summarise, the states that elfcorehdr_addr can now be in are as follows:

    ELFCORE_ADDR_MAX: not a crashdump kernel
    ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable
    any other value: crash dump kernel and vmcore is usable

    Signed-off-by: Simon Horman
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Horman
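
The three states listed in this entry can be modelled with a short userspace sketch. The helper names come from the commit message; the sentinel values and the function bodies are assumptions made for illustration and may differ from the kernel source.

```c
/* Assumed sentinel values; the commit message names them but not their values. */
#define ELFCORE_ADDR_MAX (-1ULL)  /* not a crashdump kernel */
#define ELFCORE_ADDR_ERR (-2ULL)  /* crashdump kernel, vmcore unusable */

static unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX;

/* True whenever elfcorehdr= was passed, even if vmcore later failed. */
static int is_kdump_kernel(void)
{
    return elfcorehdr_addr != ELFCORE_ADDR_MAX;
}

/* True only for a crashdump kernel whose vmcore can still be used. */
static int is_vmcore_usable(void)
{
    return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR;
}

/* Record a failure without forgetting that this is a crashdump kernel. */
static void vmcore_unusable(void)
{
    if (is_kdump_kernel())
        elfcorehdr_addr = ELFCORE_ADDR_ERR;
}
```

The point of the extra state is visible in vmcore_unusable(): after an error, is_kdump_kernel() still returns 1, whereas resetting to ELFCORE_ADDR_MAX would make it return 0.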
     
  • o Make use of is_kdump_kernel() rather than checking elfcorehdr_addr directly.

    o Remove CONFIG_CRASH_DUMP as is_kdump_kernel() is safe to call anywhere

    o Remove CONFIG_PROC_FS as it is bogus; the check should occur
    regardless of whether CONFIG_PROC_FS is set.

    Signed-off-by: Simon Horman
    Acked-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Horman
     
  • IA64, PPC and SH also support the elfcorehdr command line.

    Signed-off-by: Simon Horman
    Acked-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Horman
     
  • o elfcorehdr_addr is used not only by the code under CONFIG_PROC_VMCORE
    but also by code outside CONFIG_PROC_VMCORE. For example,
    is_kdump_kernel() is used by powerpc code to determine whether the
    kernel is booting after a panic and, if so, to use the previous kernel's
    TCE table. So even if CONFIG_PROC_VMCORE is not set in the second kernel,
    one should be able to correctly determine that we are booting after a
    panic and set up the calgary iommu accordingly.

    o So remove the assumption that elfcorehdr_addr is under
    CONFIG_PROC_VMCORE.

    o Move definition of elfcorehdr_addr to arch dependent crash files.
    (Unfortunately crash dump does not have an arch independent file
    otherwise that would have been the best place).

    o kexec.c is not the right place as one can have CRASH_DUMP enabled in
    the second kernel without KEXEC being enabled.

    o I don't see the sh setup code parsing the command line for
    elfcorehdr_addr. I am wondering how the vmcore interface works on sh.
    Anyway, I am at least defining elfcorehdr_addr so that compilation is
    not broken on sh.

    Signed-off-by: Vivek Goyal
    Acked-by: "Eric W. Biederman"
    Acked-by: Simon Horman
    Acked-by: Paul Mundt
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Now that wait_task_inactive(task, state) checks task->state == state,
    we can simplify the code and make this debugging check more robust.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This adds a kconfig option to change the /proc/PID/coredump_filter default.
    Fedora has been carrying a trivial patch to change the hard-wired value for
    this default, since Fedora 8. The default default can't change safely
    because there are old GDB versions out there (all before 6.7) that are
    confused by the core dump files created by the MMF_DUMP_ELF_HEADERS setting.

    Signed-off-by: Roland McGrath
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Alan Cox
    Cc: Andi Kleen
    Cc: KOSAKI Motohiro
    Cc: Kawai Hidehiro
    Cc: Ingo Molnar
    Cc: David Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • If the coredumping process is multi-threaded, format_corename() appends
    .%pid to the corename. This was needed before proper multi-threaded core
    dump support existed; now all the threads in the mm go into a single
    unified core file.

    Remove this special case: it is not even documented, and we have "%p"
    and core_uses_pid.

    Signed-off-by: Oleg Nesterov
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Alan Cox
    Cc: Roland McGrath
    Cc: Andi Kleen
    Cc: La Monte Yarroll
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • ptrace_untrace() can now become static.

    Signed-off-by: Adrian Bunk
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • bitmap_scnprintf_len() is not used now, so we remove it.

    Otherwise we have to maintain it and make its return
    value always equal to bitmap_scnprintf()'s return value.

    Signed-off-by: Lai Jiangshan
    Cc: Alexey Dobriyan
    Cc: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • 1) seq_file expects m->count == m->size when its buffer is full, so the
    current code causes bugs when the buffer overflows.

    2) It is not good for cpuset to access struct seq_file's fields
    directly.

    Signed-off-by: Lai Jiangshan
    Cc: Alexey Dobriyan
    Acked-by: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
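
The "count == size means full" convention mentioned in this entry can be illustrated with a small userspace sketch. The struct mini_seq type and mini_seq_puts() are hypothetical stand-ins, not the kernel's struct seq_file API; they only show how an append helper signals overflow by pinning count to size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a seq_file-style output buffer. */
struct mini_seq {
    char *buf;
    size_t size;   /* capacity of buf */
    size_t count;  /* bytes used so far; count == size marks "full" */
};

/* Append s if it fits; otherwise mark the buffer as full so that the
 * caller can detect the overflow and retry with a larger buffer. */
static int mini_seq_puts(struct mini_seq *m, const char *s)
{
    size_t len = strlen(s);

    if (m->count + len < m->size) {
        memcpy(m->buf + m->count, s, len);
        m->count += len;
        return 0;
    }
    m->count = m->size;
    return -1;
}
```

Code that writes into the buffer by hand and leaves count somewhere short of size after an overflow would break this convention, which is why going through a helper is preferable to touching the fields directly.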
     
  • seq_cpumask_list() and seq_nodemask_list() are very similar to
    seq_cpumask() and seq_nodemask(), but they print a human-readable string.

    Signed-off-by: Lai Jiangshan
    Cc: Alexey Dobriyan
    Cc: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • "m->count + len < m->size" is usually true, so bitmap_scnprintf() is
    usually called. This fix saves a call to bitmap_scnprintf_len().

    Signed-off-by: Lai Jiangshan
    Cc: Alexey Dobriyan
    Cc: Paul Menage
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Remove the use of the int cpus_nonempty variable from the 'update_flag'
    function.

    Signed-off-by: Md.Rakib H. Mullick
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakib Mullick
     
  • Allocate all page_cgroup at boot and remove the page_cgroup pointer from
    struct page. This patch adds the following interface:

    struct page_cgroup *lookup_page_cgroup(struct page*)

    FLATMEM, DISCONTIGMEM, SPARSEMEM and MEMORY_HOTPLUG are all supported.

    Removing the page_cgroup pointer reduces memory usage by
    - 4 bytes per PAGE_SIZE (32-bit), or
    - 8 bytes per PAGE_SIZE (64-bit)
    if the memory controller is disabled (even if configured).

    On a typical 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.
    On my x86-64 server with 48GB of memory, this saves 96MB of memory.
    I think this reduction makes sense.

    With pre-allocation, the kmalloc/kfree calls in charge/uncharge are
    removed. This means
    - we no longer need to worry about kmalloc failure.
    (this can happen because of gfp_mask type.)
    - we can avoid calling kmalloc/kfree.
    - we can avoid allocating tons of small objects which can be fragmented.
    - we know how much memory will be used for this extra-lru handling.

    I added the printk messages

    "allocated %ld bytes of page_cgroup"
    "please try cgroup_disable=memory option if you don't want"

    which should be informative enough for users.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
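
The savings quoted in this entry follow from one pointer per page. The sketch below reproduces the arithmetic, assuming a 4096-byte page size (the commit message only says PAGE_SIZE) and a hypothetical helper name.

```c
/* Bytes saved by dropping one pointer from every struct page. */
static unsigned long long saved_bytes(unsigned long long mem_bytes,
                                      unsigned long long page_size,
                                      unsigned long long ptr_size)
{
    /* number of pages times the size of the removed pointer */
    return (mem_bytes / page_size) * ptr_size;
}
```

With 4 KiB pages, 8 GiB of memory is 2 Mi pages, so a 4-byte pointer costs 8 MiB; 48 GiB is 12 Mi pages, so an 8-byte pointer costs 96 MiB. Both match the figures given in the entry.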
     
  • This patch makes page_cgroup->flags use atomic ops and defines functions
    (and macros) to access it.

    This atomic operation on flags is necessary before we try to modify the
    memory resource controller. Most of the flags in this patch are for LRU
    and are modified under mz->lru_lock, but we'll soon add other flags that
    are not for LRU. For example, we'll place a LOCK bit in the flags field.
    We need atomic operations to modify an LRU bit without LOCK.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
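
The idea of accessor functions over an atomically updated flags word can be sketched in userspace with C11 atomics. Everything named here (struct page_cgroup_sketch, the PCG_* bit numbers, the pcg_* helpers) is hypothetical; the kernel patch uses its own bit operations and macro names, which the commit message does not list.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, for illustration only. */
enum { PCG_CACHE, PCG_ACTIVE, PCG_LOCK };

struct page_cgroup_sketch {
    atomic_ulong flags;
};

/* Each helper touches exactly one bit with an atomic read-modify-write,
 * so concurrent updates to different bits cannot lose each other. */
static inline void pcg_set(struct page_cgroup_sketch *pc, int bit)
{
    atomic_fetch_or(&pc->flags, 1UL << bit);
}

static inline void pcg_clear(struct page_cgroup_sketch *pc, int bit)
{
    atomic_fetch_and(&pc->flags, ~(1UL << bit));
}

static inline bool pcg_test(struct page_cgroup_sketch *pc, int bit)
{
    return (atomic_load(&pc->flags) & (1UL << bit)) != 0;
}
```

A plain `flags |= bit` is a separate load and store, so two CPUs updating different bits of the same word under different locks could overwrite each other's change; the atomic form avoids that.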
     
  • Some obvious optimizations for memcg.

    I found that mem_cgroup_charge_statistics() is a little big (in object
    size) and does unnecessary address calculation. This patch reduces the
    size of this function.

    Also, res_counter_charge() is 'likely' to succeed.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • There are not-on-LRU pages which can be mapped, and they are not worth
    accounting (because we can't shrink them and would need dirty code to
    handle the special case). We'd like to use the usual objrmap/radix-tree
    protocol and don't want to account pages that are outside the VM's
    control.

    When special_mapping_fault() is called, page->mapping tends to be NULL
    and the page is charged as an anonymous page. insert_page() also handles
    some special pages from drivers.

    This patch avoids accounting these special pages.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch tries to make page->mapping NULL before
    mem_cgroup_uncharge_cache_page() is called.

    "page->mapping == NULL" is a good check for "whether the page is still
    in the radix-tree or not". This patch also adds a BUG_ON() to
    mem_cgroup_uncharge_cache_page().

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • While page-cache charge/uncharge is done under page_lock(), swap-cache
    charge/uncharge isn't. (An anonymous page is charged when it's newly
    allocated.)

    This patch moves do_swap_page()'s charge() call under the lock. I don't
    see any serious problem *now*, but this fix will help avoid unnecessary
    racy states in the future.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Since we introduced RCU for the read side, spin_lock is used only for
    updates. But we always hold cgroup_lock() when updating, so spin_lock()
    is not needed.

    Additional cleanup:
    1) include linux/rcupdate.h explicitly
    2) remove unused variable cur_devcgroup in devcgroup_update_access()

    Signed-off-by: Lai Jiangshan
    Acked-by: "Serge E. Hallyn"
    Cc: Paul Menage
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • This saves 40 bytes on my x86_32 box.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • The choice of real/dummy declaration for cgroup_mm_owner_callbacks()
    shouldn't be based on CONFIG_MM_OWNER, but on CONFIG_CGROUPS. Otherwise
    kernel/exit.c fails to compile when something other than a cgroups
    controller selects CONFIG_MM_OWNER.

    Signed-off-by: Paul Menage
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage