07 Mar, 2010

40 commits

  • Signed-off-by: Frans Pop
    Cc: linux-parisc@vger.kernel.org
    Cc: Kyle McMartin
    Cc: Helge Deller
    Signed-off-by: Kyle McMartin

    Frans Pop
     
  • tested with test_accept4.c from de11defebf00007677fb7ee91d9b089b78786fbb

    Signed-off-by: Kyle McMartin

    Kyle McMartin
     
  • Signed-off-by: Helge Deller
    Signed-off-by: Kyle McMartin

    Helge Deller
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
    [LogFS] Change magic number
    [LogFS] Remove h_version field
    [LogFS] Check feature flags
    [LogFS] Only write journal if dirty
    [LogFS] Fix bdev erases
    [LogFS] Silence gcc
    [LogFS] Prevent 64bit divisions in hash_index
    [LogFS] Plug memory leak on error paths
    [LogFS] Add MAINTAINERS entry
    [LogFS] add new flash file system

    Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
    fs/logfs/inode.c introduced by write_inode() being changed to use
    writeback_control by commit a9185b41a4f84971b930c519f0c63bd450c4810d
    ("pass writeback_control to ->write_inode")

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
    dm raid1: fix deadlock when suspending failed device
    dm: eliminate some holes data structures
    dm ioctl: introduce flag indicating uevent was generated
    dm: free dm_io before bio_endio not after
    dm table: remove unused dm_get_device range parameters
    dm ioctl: only issue uevent on resume if state changed
    dm raid1: always return error if all legs fail
    dm mpath: refactor pg_init
    dm mpath: wait for pg_init completion when suspending
    dm mpath: hold io until all pg_inits completed
    dm mpath: avoid storing private suspended state
    dm: document when snapshot has finished merging
    dm table: remove dm_get from dm_table_get_md
    dm mpath: skip activate_path for failed paths
    dm mpath: pass struct pgpath to pg init done

    Linus Torvalds
     
  • * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: (23 commits)
    hwmon: Remove the deprecated adt7473 driver
    hwmon: Fix off-by-one kind values
    hwmon: (tmp421) Fix temperature conversions
    hwmon: (tmp421) Restore missing inputs
    hwmon: Driver for Andigilog aSC7621 family monitoring chips
    hwmon: (adt7411) Improve locking
    hwmon: Add driver for ADT7411 voltage and temperature sensor
    hwmon: (w83793) Add watchdog functionality
    hwmon: (g760a) Make rpm_from_cnt static
    hwmon: (it87) Validate auto pwm settings
    hwmon: (it87) Add support for old automatic fan speed control
    hwmon: (it87) Drop dead web links in documentation
    hwmon: (it87) Add an entry in MAINTAINERS
    hwmon: (it87) Use strict_strtol instead of simple_strtol
    hwmon: (it87) Fix many checkpatch errors and warnings
    hwmon: (it87) Add support for beep on alarm
    hwmon: (it87) Create vid attributes by group
    hwmon: (it87) Refactor attributes creation and removal
    hwmon: (it87) Expose the PWM/temperature mappings
    hwmon: (it87) Display fan outputs in automatic mode as such
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs: (21 commits)
    xfs: return inode fork offset in bulkstat for fsr
    xfs: Increase the default size of the reserved blocks pool
    xfs: truncate delalloc extents when IO fails in writeback
    xfs: check for more work before sleeping in xfssyncd
    xfs: Fix a build warning in xfs_aops.c
    xfs: fix locking for inode cache radix tree tag updates
    xfs: remove xfs_ipin/xfs_iunpin
    xfs: cleanup xfs_iunpin_wait/xfs_iunpin_nowait
    xfs: kill xfs_lrw.h
    xfs: factor common xfs_trans_bjoin code
    xfs: stop passing opaque handles to xfs_log.c routines
    xfs: split xfs_bmap_btalloc
    xfs: fix xfs_fsblock_t tracing
    xfs: fix inode pincount check in fsync
    xfs: Non-blocking inode locking in IO completion
    xfs: implement optimized fdatasync
    xfs: remove wrapper for the fsync file operation
    xfs: remove wrappers for read/write file operations
    xfs: merge xfs_lrw.c into xfs_file.c
    xfs: fix dquota trace format
    ...

    Linus Torvalds
     
  • * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
    nfsd4: fix minor memory leak
    svcrpc: treat uid's as unsigned
    nfsd: ensure sockets are closed on error
    Revert "sunrpc: move the close processing after do recvfrom method"
    Revert "sunrpc: fix peername failed on closed listener"
    sunrpc: remove unnecessary svc_xprt_put
    NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
    xfs_export_operations.commit_metadata
    commit_metadata export operation replacing nfsd_sync_dir
    lockd: don't clear sm_monitored on nsm_reboot_lookup
    lockd: release reference to nsm_handle in nlm_host_rebooted
    nfsd: Use vfs_fsync_range() in nfsd_commit
    NFSD: Create PF_INET6 listener in write_ports
    SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
    SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
    NFSD: Support AF_INET6 in svc_addsock() function
    SUNRPC: Use rpc_pton() in ip_map_parse()
    nfsd: 4.1 has an rfc number
    nfsd41: Create the recovery entry for the NFSv4.1 client
    nfsd: use vfs_fsync for non-directories
    ...

    Linus Torvalds
     
  • * git://git.infradead.org/ubi-2.6:
    UBI: add write checking
    UBI: simplify debugging return codes
    UBI: fix attaching error path
    UBI: support attaching by MTD character device name
    UBI: mark few variables as __initdata

    Linus Torvalds
     
  • Signed-off-by: Denis Turischev
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis Turischev
     
  • The cs5535-gpio driver's get() function was returning the output value.
    This means that the GPIO pins would never work as an input, even if
    configured as an input.

    The driver should return the READ_BACK value, which is the sensed line
    value. To make that work when the direction is 'output', INPUT_ENABLE
    needs to be set.

    In addition, the driver was not disabling OUTPUT_ENABLE when the direction
    is set to 'input'. That would cause the GPIO to continue to drive the pin
    if the direction was ever set to output.

    This issue was noticed when attempting to use the gpiolib driver to read
    an external input. I had previously been using the char/cs5535-gpio
    driver.

    Signed-off-by: Ben Gardner
    Acked-by: Andres Salomon
    Cc: Andrew Morton
    Cc: David Brownell
    Cc: Mark Brown
    Cc: [2.6.33.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Gardner
     
  • Most of the GPIO expanders controlled by the pca953x driver are able to
    report changes on the input pins through an *INT pin.

    This patch implements the irq_chip functionality (edge detection only).

    The driver has been tested on an Arcom Zeus.

    [akpm@linux-foundation.org: the compiler does inlining for us nowadays]
    Signed-off-by: Marc Zyngier
    Cc: Eric Miao
    Cc: Haojian Zhuang
    Cc: David Brownell
    Cc: Nate Case
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marc Zyngier
     
  • Introduce support for triggering interrupts on both rising and falling
    edge.

    This feature requires version 3 or newer of the IP; a version check is
    done when triggering on both edges is requested.

    Signed-off-by: Richard Röjfors
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Röjfors
     
  • gpio_request() without initial configuration of the GPIO is normally
    useless, so introduce gpio_request_one() together with GPIOF_ flags for
    input/output direction and initial output level.

    Also introduce gpio_{request,free}_array() for multiple GPIOs.

    Signed-off-by: Eric Miao
    Cc: David Brownell
    Cc: Ben Nizette
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Miao
     
  • linux/i2c/pca953x.h is a very bare include file. Fix check for multiple
    includes of linux/i2c/pca953x.h, and add dependent includes into the
    header file.

    Signed-off-by: Olof Johansson
    Acked-by: Wolfram Sang
    Acked-by: Jean Delvare
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olof Johansson
     
  • Add the MAX7300-I2C variant of the MAX7301-SPI version. Both chips share
    the same core logic, so the in-kernel SPI driver is refactored to split
    out a generic part. The I2C- and SPI-specific functions are then wrapped
    into separate drivers that pick up the generic part.

    Signed-off-by: Wolfram Sang
    Cc: Juergen Beisert
    Cc: David Brownell
    Cc: Jean Delvare
    Cc: Anton Vorontsov
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfram Sang
     
  • The else part of the if statement is indented but does not have braces
    around it. It clearly should since it uses clk_enable and clk_disable
    which are supposed to balance.

    Signed-off-by: James Hogan
    Acked-by: Linus Walleij
    Acked-by: Alessandro Zummo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Hogan
     
  • Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Cc: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • This is to protect from interrupt handlers using an unregistered rtc
    device.

    To make sure that a reset irq is still taken into account before the rtc
    is registered, the corresponding status is checked first.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Cc: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • The driver for the mc13783 rtc needs to know if the TODA irq is pending.

    Instead of tracking in the rtc driver whether the irq is enabled, provide
    that information, too.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Cc: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • mc13783_ackirq, mc13783_unmask and mc13783_mask are deprecated; use the
    drop-in replacements with the nicer names.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Cc: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • mc13783_ackirq is deprecated; use the drop-in replacement mc13783_irq_ack.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Acked-by: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • In the source file group these functions together.

    The mc13783 header file provides fallback implementations for the old
    names to prevent build failures. When all users of the old names are
    fixed to use the new names these can go away.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Cc: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • The idr should be destroyed when the module is unloaded. Found with
    kmemleak.

    Signed-off-by: Aaro Koskinen
    Cc: Alessandro Zummo
    Cc: stable
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaro Koskinen
     
  • The C99 specification states in section 6.11.5:

    The placement of a storage-class specifier other than at the beginning of
    the declaration specifiers in a declaration is an obsolescent feature.
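
    As a small illustration, the obsolescent ordering and the preferred one
    look like the sketch below. The identifiers are hypothetical and are not
    taken from the patched driver.

```c
/* Obsolescent per C99 6.11.5, storage class not first:
 *
 *     const static int days_in_week = 7;
 */

/* Preferred: the storage-class specifier leads the declaration. */
static const int days_in_week = 7;

/* Hypothetical accessor so the constant is actually used. */
int get_days_in_week(void)
{
	return days_in_week;
}
```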

    Signed-off-by: Tobias Klauser
    Signed-off-by: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • Memset should be given the size of the structure, not the size of the
    pointer.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    type T;
    T *x;
    expression E;
    @@

    memset(x, E, sizeof(
    + *
    x))
    //
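
    The effect of the semantic patch can be sketched in plain C. The
    structure and function names here are hypothetical, chosen only to show
    the difference between sizeof(x) and sizeof(*x) when x is a pointer.

```c
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct rtc_sample {
	int sec;
	int min;
	int hour;
	char label[32];
};

/*
 * Buggy form: sizeof(t) is the size of the pointer, so only the first
 * few bytes of the structure would be cleared. Kept as a comment
 * because compilers warn about it:
 *
 *     memset(t, 0, sizeof(t));
 */

/* Fixed form: sizeof(*t) is the size of the pointed-to structure. */
void rtc_sample_clear(struct rtc_sample *t)
{
	memset(t, 0, sizeof(*t));
}
```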

    Signed-off-by: Julia Lawall
    Signed-off-by: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
     
  • The function pcf2123_remove is used only wrapped by __devexit_p so define
    it using __devexit.

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Alessandro Zummo
    Cc: Christian Pellegrin
    Cc: Chris Verges
    Cc: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • Fix issue with rtc device not getting unregistered in probe error path.

    Use the devres managed resource functions in the probe routine to cleanup
    the error path.

    Use sysfs_{create/remove}_group to add/remove the sysfs files.

    Reduces the text size by 132 bytes, increases data by 12 bytes:
    text data bss dec hex filename
    - 937 124 0 1061 425 rtc-ep93xx.o
    + 805 136 0 941 3ad rtc-ep93xx.o

    Signed-off-by: H Hartley Sweeten
    Acked-by: Alessandro Zummo
    Cc: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Free pdata before exit. Found by cppcheck.

    [yuasa@linux-mips.org: add missing iounmap()]
    Signed-off-by: Alexander Beregalov
    Reviewed-by: WANG Cong
    Acked-by: Daniel Mack
    Acked-by: Alessandro Zummo
    Cc: Yoichi Yuasa
    Cc: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Beregalov
     
  • Currently the xen support drivers are displayed in the main Device Drivers
    menu of the config tools instead of in their own sub-menu, so move them to
    their own sub-menu, as the rest of the driver world does.

    This keeps the main Device Drivers menu from becoming messy.

    Signed-off-by: Randy Dunlap
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • warning: symbol 'vgacon_text_mode_force' was not declared. Should it be static?

    Signed-off-by: Thiago Farina
    Acked-by: Matthew Garrett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thiago Farina
     
  • BuraphaLinux reported that an mm warning is triggered when
    CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=65536, because mm can't allocate
    that many pages. Limit the range of CONFIG_VGACON_SOFT_SCROLLBACK_SIZE
    so that a user has no chance to trigger it.

    Reported-by: BuraphaLinux Server
    Tested-by: BuraphaLinux Server
    Signed-off-by: WANG Cong
    Cc: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • Modify uid check in do_coredump so as to not apply it in the case of
    pipes.

    This just got noticed in testing. The end of do_coredump validates the
    uid of the inode for the created file against the uid of the crashing
    process to ensure that no one can pre-create a core file with different
    ownership and grab the information contained in the core when they
    shouldn't be able to. This causes failures when using pipes for core
    dumps if the crashing process is not root, which is the uid of the pipe
    when it is created.

    The fix is simple. Since the check for matching uids isn't relevant for
    pipes (a process can't create a pipe that the usermodehelper code will open
    anyway), we can just skip it in the event ispipe is non-zero.

    Reverts a pipe-affecting change which was accidentally made in

    : commit c46f739dd39db3b07ab5deb4e3ec81e1c04a91af
    : Author: Ingo Molnar
    : AuthorDate: Wed Nov 28 13:59:18 2007 +0100
    : Commit: Linus Torvalds
    : CommitDate: Wed Nov 28 10:58:01 2007 -0800
    :
    : vfs: coredumping fix

    Signed-off-by: Neil Horman
    Cc: Andi Kleen
    Cc: Oleg Nesterov
    Cc: Alan Cox
    Cc: Al Viro
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     
  • User visible change.

    do_coredump() kills all threads which share the same ->mm but only the
    coredumping process gets the proper exit_code. Other tasks which share
    the same ->mm die "silently" and return status == 0 to parent.

    This is historical behaviour, not actually a bug. But I think Frank
    Heckenbach rightly dislikes the current behaviour. Simple test-case:

    #include <stdio.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int stat;

        if (!fork()) {
            /* the vfork() child shares ->mm with its parent */
            if (!vfork())
                kill(getpid(), SIGQUIT);
        }

        wait(&stat);
        printf("stat=%x\n", stat);
        return 0;
    }

    Before this patch it prints "stat=0" despite the fact the child was killed
    by SIGQUIT. After this patch the output is "stat=3" which obviously makes
    more sense.

    Even with this patch, only the task which originates the coredumping gets
    "|= 0x80" if the core was actually dumped, but at least the coredumping
    signal is visible to do_wait/etc.

    Reported-by: Frank Heckenbach
    Signed-off-by: Oleg Nesterov
    Acked-by: WANG Cong
    Cc: Roland McGrath
    Cc: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Pass mm->flags as a coredump parameter for consistency.

    ---
    if (mm->core_state || !get_dumpable(mm)) {  <- (1)
            up_write(&mm->mmap_sem);
            put_cred(cred);
            goto fail;
    }

    [...]
    if (get_dumpable(mm) == 2) {    /* Setuid core dump mode */ <- (2)
            flag = O_EXCL;          /* Stop rewrite attacks */
            cred->fsuid = 0;        /* Dump root private */
    }
    ---

    Since dumpable bits are not protected by lock, there is a chance to change
    these bits between (1) and (2).

    To solve this issue, this patch copies mm->flags to
    coredump_params.mm_flags at the beginning of do_coredump() and uses it
    instead of get_dumpable() while dumping core.

    This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
    dump_filter bits in mm->flags.
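
    The idea of taking one snapshot and using it for every later decision
    can be sketched outside the kernel as follows. The types, the mask value
    and the function names are assumptions made for illustration; they are
    not the kernel's definitions.

```c
/* Hypothetical stand-in for the state kept in mm->flags. */
struct mm_model {
	volatile unsigned long flags;	/* may change concurrently */
};

/* Hypothetical stand-in for coredump_params. */
struct dump_params {
	unsigned long mm_flags;		/* private copy, taken once */
};

/* Assumed layout: the low two bits hold the dumpable mode. */
#define DUMPABLE_MASK 0x3UL

/* Take the snapshot once, at the start of the dump. */
void dump_params_init(struct dump_params *p, const struct mm_model *mm)
{
	p->mm_flags = mm->flags;
}

/* Every later check reads the snapshot, never the live flags. */
int params_dumpable(const struct dump_params *p)
{
	return (int)(p->mm_flags & DUMPABLE_MASK);
}
```

    Because both checks read the same copy, a concurrent change to the live
    flags cannot make the first check and the second check disagree.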

    [akpm@linux-foundation.org: fix merge]
    Signed-off-by: Masami Hiramatsu
    Acked-by: Roland McGrath
    Cc: Hidehiro Kawai
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • The current ELF dumper implementation can produce broken corefiles if
    the number of program headers exceeds 65535. This number is determined by
    the number of vmas which the process has. In particular, some extreme
    programs may use more than 65535 vmas. (If you google max_map_count, you
    can find some users facing this problem.) This kind of program can never
    generate correct coredumps.

    This patch implements ``extended numbering'', which uses the sh_info field
    of the first section header instead of the e_phnum field in order to
    represent up to 4294967295 vmas.

    This is supported by
    AMD64-ABI(http://www.x86-64.org/documentation.html) and
    Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
    Of course, we are preparing patches for gdb and binutils.
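
    On the reading side, a tool that understands extended numbering could
    recover the real count roughly as follows. This is a sketch using the
    glibc <elf.h> definitions for the 64-bit case; the function name is
    hypothetical and error handling is left out.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the real number of program headers.
 * shdr0 is the section header at index 0, or NULL when the file has
 * no section header table.
 */
uint32_t real_phnum(const Elf64_Ehdr *ehdr, const Elf64_Shdr *shdr0)
{
	/* PN_XNUM (0xffff) in e_phnum means "look in sh_info instead". */
	if (ehdr->e_phnum == PN_XNUM && shdr0 != NULL)
		return shdr0->sh_info;
	return ehdr->e_phnum;
}
```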

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • By the next patch, elf_core_dump() and elf_fdpic_core_dump() will support
    extended numbering and so will produce the corefiles with section header
    table in a special case.

    The problem is how to write the file offset of the section header table
    into the e_shoff field of the ELF header. The ELF header is positioned at
    the beginning of the corefile, while the section header table is at the
    end. So we need to choose one of the following approaches:

    1. Seek backward and rewrite the ELF header after everything else has
    been written

    2. Make the offset calculation and the writing process
    totally sequential

    Approach 1 is not always possible: one cannot assume that the file system
    supports seeking. Consider the no_llseek case.

    Therefore, this patch adopts approach 2.
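
    A sequential layout means every offset is known before the first byte is
    written, so the ELF header can carry the final e_shoff from the start.
    The sketch below shows that calculation in a simplified form. The
    structure and function are hypothetical, and alignment padding is
    deliberately ignored.

```c
#include <stdint.h>

/* Hypothetical summary of where each part of the corefile starts. */
struct core_layout {
	uint64_t phdr_off;	/* program header table */
	uint64_t data_off;	/* notes and segment contents */
	uint64_t shdr_off;	/* section header table, value for e_shoff */
};

/*
 * Compute all offsets up front, in file order, so that nothing has to
 * be patched by seeking backward later.
 */
void core_layout_compute(struct core_layout *l, uint64_t ehdr_size,
			 uint64_t phdr_size, uint64_t nphdr,
			 uint64_t data_size)
{
	l->phdr_off = ehdr_size;
	l->data_off = l->phdr_off + phdr_size * nphdr;
	l->shdr_off = l->data_off + data_size;
}
```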

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
    macros to hide _multiline_ logic in functions. This patch removes the
    #ifdefs and replaces ELF_CORE_EXTRA_* with corresponding functions. For
    architectures not implementing ELF_CORE_EXTRA_*, we use weak functions in
    order to reduce the extent of the modification.

    This cleanup is for my next patches, but I think the cleanup itself is
    worth doing regardless of my final purpose.
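
    The weak-function pattern can be sketched in userspace with the GCC/Clang
    weak attribute. The function names below are hypothetical stand-ins, not
    the names introduced by the patch: a generic default is marked weak, and
    an architecture that needs extra program headers would supply a normal
    (strong) definition that the linker prefers.

```c
/*
 * Generic default: used only when no strong definition of the same
 * symbol is linked in.
 */
__attribute__((weak)) int extra_phdrs(void)
{
	return 0;
}

/*
 * Caller needs no #ifdef: it always calls the function and gets
 * either the default or the architecture's override.
 */
int total_phdrs(int segments)
{
	/* one header for the notes, plus segments, plus any extras */
	return 1 + segments + extra_phdrs();
}
```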

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • My next patch will replace the ELF_CORE_EXTRA_* macros with functions,
    putting them into other newly created *.c files. Each of those files would
    then contain its own dump_write(), and the copies in each pair of
    binfmt_*.c and elfcore.c would have to be the same. So this patch moves
    dump_write() into a header file together with dump_seek(). The patch also
    deletes the confusing DUMP_WRITE macros in each file.
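
    A userspace analogue of such shared helpers is sketched below. It is not
    the kernel code: the names are hypothetical, it works on file
    descriptors, and it assumes that an unseekable output (such as a pipe)
    should be advanced by writing zeros instead of seeking.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes; returns 1 on success, 0 on failure. */
int dump_write_fd(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return 0;
		}
		p += n;
		len -= (size_t)n;
	}
	return 1;
}

/*
 * Advance the output position by off bytes. Seek when the descriptor
 * supports it; on ESPIPE fall back to writing zeros.
 */
int dump_seek_fd(int fd, off_t off)
{
	char zeros[256];

	if (lseek(fd, off, SEEK_CUR) != (off_t)-1)
		return 1;
	if (errno != ESPIPE)
		return 0;

	memset(zeros, 0, sizeof(zeros));
	while (off > 0) {
		size_t chunk = off > (off_t)sizeof(zeros) ?
			       sizeof(zeros) : (size_t)off;

		if (!dump_write_fd(fd, zeros, chunk))
			return 0;
		off -= (off_t)chunk;
	}
	return 1;
}
```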

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • The current ELF dumper can produce broken corefiles if program headers
    exceed 65535. In particular, the program in 64-bit environment often
    demands more than 65535 mmaps. If you google max_map_count, then you can
    find many users facing this problem.

    Solaris has already dealt with this issue, and other OSes have also
    adopted the same method as in Solaris. Currently, Sun's document and AMD
    64 ABI include the description for the extension, where they call the
    extension Extended Numbering. See Reference for further information.

    I believe that the Linux kernel should adopt the same approach, so I've
    written this patch.

    I am also preparing patches for GDB and binutils.

    How to fix
    ==========

    In the new dumping process, there are two cases according to whether or
    not the number of program headers is equal to or more than 65535.

    - if less than 65535, the produced corefile format is exactly the same
    as the ordinary one.

    - if equal to or more than 65535, then e_phnum field is set to newly
    introduced constant PN_XNUM(0xffff) and the actual number of program
    headers is set to sh_info field of the section header at index 0.
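
    The two cases can be sketched for the writing side like this, using the
    glibc <elf.h> types for the 64-bit case. The function name is
    hypothetical, and the other header fields that a real dumper must fill
    in (for example those describing the section header table) are left out.

```c
#include <elf.h>
#include <stdint.h>

/*
 * Record nphdr program headers.
 * Below 65535 the count fits in e_phnum as usual; otherwise e_phnum
 * becomes PN_XNUM and the real count goes into sh_info of the section
 * header at index 0.
 */
void set_phnum(Elf64_Ehdr *ehdr, Elf64_Shdr *shdr0, uint32_t nphdr)
{
	if (nphdr < PN_XNUM) {
		ehdr->e_phnum = (Elf64_Half)nphdr;
	} else {
		ehdr->e_phnum = PN_XNUM;
		shdr0->sh_info = nphdr;
	}
}
```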

    Compatibility Concern
    =====================

    * As already mentioned above, Sun and AMD64 have already adopted
    this. See References.

    * There are four combinations according to whether kernel and userland
    tools are respectively modified or not. The next table summarizes
    shortly for each combination.

    ------------------------------------------------------------------
                   |    Original Kernel    |     Modified Kernel     |
                   |-----------------------|-------------------------|
                   |  < 65535  | >= 65535  |  < 65535  |  >= 65535   |
    ------------------------------------------------------------------
    Original Tools |    OK     |  broken   |    OK     | broken (#)  |
    ------------------------------------------------------------------
    Modified Tools |    OK     |  broken   |    OK     |     OK      |
    ------------------------------------------------------------------

    Note that there is no case that `OK' changes to `broken'.

    (#) Although this case remains broken, O-M behaves better than
    O-O. That is, while in O-O case e_phnum field would be extremely
    small due to integer overflow, in O-M case it is guaranteed to be at
    least 65535 by being set to PN_XNUM(0xFFFF), much closer to the
    actual correct value than the O-O case.
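
    The overflow in the O-O case is ordinary truncation to 16 bits, which a
    one-line sketch makes concrete (the function name is hypothetical). With
    70000 program headers, for example, a 16-bit field keeps only
    70000 - 65536 = 4464.

```c
#include <stdint.h>

/* What a 16-bit e_phnum ends up holding when the count is too large. */
uint16_t truncated_phnum(uint32_t nphdr)
{
	return (uint16_t)nphdr;
}
```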

    Test Program
    ============

    Here is a test program mkmmaps.c that is useful to produce the
    corefile with many mmaps. To use this, please take the following
    steps:

    $ ulimit -c unlimited
    $ sysctl vm.max_map_count=70000 # default 65530 is too small
    $ sysctl fs.file-max=70000
    $ mkmmaps 65535

    Then, the program will abort and a corefile will be generated.

    If it fails, there are two cases according to the error message
    displayed.

    * ``out of memory'' means vm.max_map_count is still too small

    * ``too many open files'' means fs.file-max is still too small

    So, please change it to a larger value, and then retry.

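    mkmmaps.c
    ==
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int maps_num;

        if (argc < 2) {
            fprintf(stderr, "mkmmaps [number of maps to be created]\n");
            exit(1);
        }
        /* sscanf() returns the number of converted items, so require 1 */
        if (sscanf(argv[1], "%d", &maps_num) != 1) {
            fprintf(stderr, "%s is not a number\n", argv[1]);
            exit(2);
        }
        if (maps_num < 0) {
            fprintf(stderr, "%d is invalid\n", maps_num);
            exit(3);
        }
        /* create the requested number of one-byte mappings */
        for (; maps_num > 0; --maps_num) {
            if (MAP_FAILED == mmap(NULL, (size_t) 1, PROT_READ,
                                   MAP_SHARED | MAP_ANONYMOUS, -1,
                                   (off_t) 0)) {
                perror("mmap");
                exit(4);
            }
        }
        abort(); /* terminate here so that a corefile is generated */
        /* not reached; move this block above abort() to count the maps */
        {
            char buffer[128];
            sprintf(buffer, "wc -l /proc/%u/maps", (unsigned) getpid());
            system(buffer);
        }
        return 0;
    }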

    Tested on i386, ia64 and um/sys-i386.
    Built on sh4 (which covers fs/binfmt_elf_fdpic.c)

    References
    ==========

    - Sun microsystems: Linker and Libraries.
    Part No: 817-1984-17, September 2008.
    URL: http://docs.sun.com/app/docs/doc/817-1984

    - System V ABI AMD64 Architecture Processor Supplement
    Draft Version 0.99., May 11, 2009.
    URL: http://www.x86-64.org/

    This patch:

    There are three different definitions of the dump_seek() function, in
    binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively. Past
    fixes were applied only to the one in binfmt_elf.c.

    My next patch will move dump_seek() into a header file in order to share
    the same implementations of dump_write() and dump_seek(). As the first
    step, this patch unifies these three definitions of dump_seek() by applying
    the past commits that have been applied only to binfmt_elf.c.

    Specifically, the modification made here is part of the following commits:

    * d025c9db7f31fc0554ce7fb2dfc78d35a77f3487
    * 7f14daa19ea36b200d237ad3ac5826ae25360461

    This patch does not change the shape of corefiles.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA