05 Aug, 2011

1 commit


04 Aug, 2011

7 commits

  • lockdep_init_map() only initializes parts of lockdep_map and triggers
    kmemcheck warning when it is copied as a whole. There isn't anything
    to be gained by clearing selectively. memset() the whole structure
    and remove loop for ->class_cache[] clearing.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=35532

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Christian Casteyde
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35532
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110714131909.GJ3455@htj.dyndns.org
    Signed-off-by: Ingo Molnar

    Tejun Heo
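
A minimal userspace sketch of the idea in the lockdep_init_map() change above, not the kernel code. struct demo_map, NR_CACHE and both function names are hypothetical stand-ins. It assumes the problem is simply that a selective initializer leaves some fields untouched, so copying the structure as a whole reads uninitialized memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CACHE 2  /* hypothetical cache size, for illustration only */

/* Hypothetical stand-in for struct lockdep_map; not the kernel's layout. */
struct demo_map {
    const void *key;
    void *class_cache[NR_CACHE];
    const char *name;
    int cpu;  /* a field the selective variant never writes */
};

/* Before: only some fields are written; 'cpu' keeps whatever was there. */
void demo_init_selective(struct demo_map *m, const char *name, const void *key)
{
    assert(m != NULL);
    for (int i = 0; i < NR_CACHE; i++)
        m->class_cache[i] = NULL;
    m->name = name;
    m->key = key;
}

/* After: clear the whole structure, then set the fields that matter. */
void demo_init_whole(struct demo_map *m, const char *name, const void *key)
{
    assert(m != NULL);
    memset(m, 0, sizeof(*m));
    m->name = name;
    m->key = key;
}
```

With the second variant the explicit loop over class_cache[] is no longer needed, because memset() already covers it.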
     
  • On Sun, 2011-07-24 at 21:06 -0400, Arnaud Lacombe wrote:

    > /src/linux/linux/kernel/lockdep.c: In function 'mark_held_locks':
    > /src/linux/linux/kernel/lockdep.c:2471:31: warning: comparison of
    > distinct pointer types lacks a cast

    The warning is harmless in this case, but the below makes it go away.

    Reported-by: Arnaud Lacombe
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311588599.2617.56.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Commit dd4e5d3ac4a ("lockdep: Fix trace_[soft,hard]irqs_[on,off]()
    recursion") made a bit of a mess of the various checks and error
    conditions.

    In particular it moved the check for !irqs_disabled() before the
    spurious enable test, resulting in some warnings.

    Reported-by: Arnaud Lacombe
    Reported-by: Dave Jones
    Reported-and-tested-by: Sergey Senozhatsky
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311679697.24752.28.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The core device layer sends tons of uevent notifications for each device
    it finds, and if the kernel has been built with a non-empty
    CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
    helper binary for all these events very early in the boot.

    Not only won't the root filesystem even be mounted at that point, we
    literally won't have necessarily even initialized all the process
    handling data structures at that point, which causes no end of silly
    problems even when the usermode helper doesn't actually succeed in
    executing.

    So just use our existing infrastructure to disable the usermodehelpers
    to make the kernel start out with them disabled. We enable them when
    we've at least initialized stuff a bit.

    Problems related to an uninitialized

    init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex

    reported by various people.

    Reported-by: Manuel Lauss
    Reported-by: Richard Weinberger
    Reported-by: Marc Zyngier
    Acked-by: Kay Sievers
    Cc: Andrew Morton
    Cc: Vasiliy Kulikov
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Ingo Molnar
     
  • When send_cpu_listeners() finds the orphaned listener it marks it as
    !valid and drops listeners->sem. Before it takes this sem for writing,
    s->pid can be reused and add_del_listener() can wrongly try to re-use
    this entry.

    Change add_del_listener() to check that ->valid is true.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Vasiliy Kulikov
    Acked-by: Balbir Singh
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
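
The check described above can be sketched in plain C as follows. This is an illustration under assumptions, not the taskstats code: struct demo_listener and demo_find_listener() are hypothetical, and the list is modelled as an array. The point is only that a stale entry whose pid matches must not count as an existing registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical listener record: 'valid' is cleared once the owner is gone. */
struct demo_listener {
    int pid;
    bool valid;
};

/*
 * Return the index of a live registration for 'pid', or -1 if none.
 * Entries that match on pid but are marked !valid are skipped, since
 * the pid may have been reused by an unrelated process.
 */
int demo_find_listener(const struct demo_listener *tab, size_t n, int pid)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].pid == pid && tab[i].valid)
            return (int)i;
    }
    return -1;
}
```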
     
  • 1. Commit 26c4caea9d69 "don't allow duplicate entries in listener mode"
    changed add_del_listener(REGISTER) so that "next_cpu:" can reuse the
    listener allocated for the previous cpu. This doesn't look exactly
    right, even if it is minor.

    Change the code to kfree() in the already-registered case. This case
    is unlikely anyway, so the extra kmalloc_node() shouldn't hurt, and
    the result looks more correct and clean.

    2. Use the plain list_for_each_entry() instead of _safe() to scan
    listeners->list.

    3. Remove the unneeded INIT_LIST_HEAD(&s->list), we are going to
    list_add(&s->list).

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Vasiliy Kulikov
    Cc: Balbir Singh
    Reviewed-by: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

02 Aug, 2011

5 commits


01 Aug, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
    m68k/math-emu: Remove unnecessary code
    m68k/math-emu: Remove commented out old code
    m68k: Kill warning in setup_arch() when compiling for Sun3
    m68k/atari: Prefix GPIO_{IN,OUT} with CODEC_
    sparc: iounmap() and *_free_coherent() - Use lookup_resource()
    m68k/atari: Reserve some ST-RAM early on for device buffer use
    m68k/amiga: Chip RAM - Use lookup_resource()
    resources: Add lookup_resource()
    sparc: _sparc_find_resource() should check for exact matches
    m68k/amiga: Chip RAM - Offset resource end by CHIP_PHYSADDR
    m68k/amiga: Chip RAM - Use resource_size() to fix off-by-one error
    m68k/amiga: Chip RAM - Change chipavail to an atomic_t
    m68k/amiga: Chip RAM - Always allocate from the start of memory
    m68k/amiga: Chip RAM - Convert from printk() to pr_*()
    m68k/amiga: Chip RAM - Use tabs for indentation

    Linus Torvalds
     

31 Jul, 2011

1 commit


30 Jul, 2011

2 commits

  • * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (430 commits)
    [media] ir-mce_kbd-decoder: include module.h for its facilities
    [media] ov5642: include module.h for its facilities
    [media] em28xx: Fix DVB-C maxsize for em2884
    [media] tda18271c2dd: Fix saw filter configuration for DVB-C @6MHz
    [media] v4l: mt9v032: Fix Bayer pattern
    [media] V4L: mt9m111: rewrite set_pixfmt
    [media] V4L: mt9m111: fix missing return value check mt9m111_reg_clear
    [media] V4L: initial driver for ov5642 CMOS sensor
    [media] V4L: sh_mobile_ceu_camera: fix Oops when USERPTR mapping fails
    [media] V4L: soc-camera: remove soc-camera bus and devices on it
    [media] V4L: soc-camera: un-export the soc-camera bus
    [media] V4L: sh_mobile_csi2: switch away from using the soc-camera bus notifier
    [media] V4L: add media bus configuration subdev operations
    [media] V4L: soc-camera: group struct field initialisations together
    [media] V4L: soc-camera: remove now unused soc-camera specific PM hooks
    [media] V4L: pxa-camera: switch to using standard PM hooks
    [media] NetUP Dual DVB-T/C CI RF: force card hardware revision by module param
    [media] Don't OOPS if videobuf_dvb_get_frontend return NULL
    [media] NetUP Dual DVB-T/C CI RF: load firmware according card revision
    [media] omap3isp: Support configurable HS/VS polarities
    ...

    Fix up conflicts:
    - arch/arm/mach-omap2/board-rx51-peripherals.c:
    cleanup regulator supply definitions in mach-omap2
    vs
    OMAP3: RX-51: define vdds_csib regulator supply
    - drivers/staging/tm6000/tm6000-alsa.c (trivial)

    Linus Torvalds
     
  • * 'next/dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: (21 commits)
    arm/dt: tegra devicetree support
    arm/versatile: Add device tree support
    dt/irq: add irq_domain_generate_simple() helper
    irq: add irq_domain translation infrastructure
    dmaengine: imx-sdma: add device tree probe support
    dmaengine: imx-sdma: sdma_get_firmware does not need to copy fw_name
    dmaengine: imx-sdma: use platform_device_id to identify sdma version
    mmc: sdhci-esdhc-imx: add device tree probe support
    mmc: sdhci-pltfm: dt device does not pass parent to sdhci_alloc_host
    mmc: sdhci-esdhc-imx: get rid of the uses of cpu_is_mx()
    mmc: sdhci-esdhc-imx: do not reference platform data after probe
    mmc: sdhci-esdhc-imx: extend card_detect and write_protect support for mx5
    net/fec: add device tree probe support
    net: ibm_newemac: convert it to use of_get_phy_mode
    dt/net: add helper function of_get_phy_mode
    net/fec: gasket needs to be enabled for some i.mx
    serial/imx: add device tree probe support
    serial/imx: get rid of the uses of cpu_is_mx1()
    arm/dt: Add dtb make rule
    arm/dt: Add skeleton dtsi file
    ...

    Linus Torvalds
     

28 Jul, 2011

6 commits

  • Arnd Bergmann
     
  • irq_domain_generate_simple() is an easy way to generate an irq translation
    domain for simple irq controllers. It assumes a flat 1:1 mapping from
    hardware irq number to an offset of the first linux irq number assigned
    to the controller.

    Signed-off-by: Grant Likely

    Grant Likely
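
The flat 1:1 mapping described above amounts to adding the hardware irq number to the first linux irq number of the controller. A minimal sketch, assuming a hypothetical struct demo_irq_domain and helper demo_to_irq() (neither is the kernel's irq_domain API), and assuming 0 is used to mean "no irq":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a simple controller's irq range. */
struct demo_irq_domain {
    unsigned int irq_base;  /* first linux irq number assigned */
    unsigned int nr_irq;    /* number of hardware irqs handled */
};

/*
 * Translate a hardware irq number to a linux irq number using a flat
 * 1:1 offset. Returns 0 (assumed here to mean "no irq") when the
 * hardware number is outside the controller's range.
 */
unsigned int demo_to_irq(const struct demo_irq_domain *d, unsigned int hwirq)
{
    assert(d != NULL);
    if (hwirq >= d->nr_irq)
        return 0;
    return d->irq_base + hwirq;
}
```

For a controller with irq_base 32 and 16 lines, hardware irq 5 maps to linux irq 37.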
     
  • This patch adds irq_domain infrastructure for translating from
    hardware irq numbers to linux irqs. This is particularly important
    for architectures adding device tree support because the current
    implementation (excluding PowerPC and SPARC) cannot handle
    translation for more than a single interrupt controller. irq_domain
    supports device tree translation for any number of interrupt
    controllers.

    This patch converts x86, Microblaze, ARM and MIPS to use irq_domain
    for device tree irq translation. x86 is untested beyond compiling it.
    irq_domain is enabled for MIPS and Microblaze, but the old behaviour is
    preserved until the core code is modified to actually register an
    irq_domain. On ARM it works and is required for much of the new
    ARM device tree board support.

    PowerPC has /not/ been converted to use this new infrastructure. It
    is still missing some features before it can replace the virq
    infrastructure already in powerpc (see documentation on
    irq_domain_map/unmap for details). Followup patches will add the
    missing pieces and migrate PowerPC to use irq_domain.

    SPARC has its own method of managing interrupts from the device tree
    and is unaffected by this change.

    Acked-by: Ralf Baechle
    Signed-off-by: Grant Likely

    Grant Likely
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (54 commits)
    tpm_nsc: Fix bug when loading multiple TPM drivers
    tpm: Move tpm_tis_reenable_interrupts out of CONFIG_PNP block
    tpm: Fix compilation warning when CONFIG_PNP is not defined
    TOMOYO: Update kernel-doc.
    tpm: Fix a typo
    tpm_tis: Probing function for Intel iTPM bug
    tpm_tis: Fix the probing for interrupts
    tpm_tis: Delay ACPI S3 suspend while the TPM is busy
    tpm_tis: Re-enable interrupts upon (S3) resume
    tpm: Fix display of data in pubek sysfs entry
    tpm_tis: Add timeouts sysfs entry
    tpm: Adjust interface timeouts if they are too small
    tpm: Use interface timeouts returned from the TPM
    tpm_tis: Introduce durations sysfs entry
    tpm: Adjust the durations if they are too small
    tpm: Use durations returned from TPM
    TOMOYO: Enable conditional ACL.
    TOMOYO: Allow using argv[]/envp[] of execve() as conditions.
    TOMOYO: Allow using executable's realpath and symlink's target as conditions.
    TOMOYO: Allow using owner/group etc. of file objects as conditions.
    ...

    Fix up trivial conflict in security/tomoyo/realpath.c

    Linus Torvalds
     
  • Signed-off-by: Hans Verkuil
    Signed-off-by: Mauro Carvalho Chehab

    Hans Verkuil
     
  • sys_ssetmask(), sys_rt_sigsuspend() and compat_sys_rt_sigsuspend()
    change ->blocked directly. This is not correct, see the changelog in
    e6fa16ab "signal: sigprocmask() should do retarget_shared_pending()"

    Change them to use set_current_blocked().

    Another change is that we now do ->saved_sigmask = ->blocked
    locklessly; it doesn't make any sense to do this under ->siglock.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

27 Jul, 2011

7 commits

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • When a kernel BUG or oops occurs, ChromeOS intends to panic and
    immediately reboot, with stacktrace and other messages preserved in RAM
    across reboot.

    But the longer we delay, the more likely the user is to poweroff and
    lose the info.

    panic_timeout (seconds before rebooting) is set by panic= boot option or
    sysctl or /proc/sys/kernel/panic; but 0 means wait forever, so at
    present we have to delay at least 1 second.

    Let a negative number mean reboot immediately (with the small cosmetic
    benefit of suppressing that newline-less "Rebooting in %d seconds.."
    message).

    Signed-off-by: Hugh Dickins
    Signed-off-by: Mandeep Singh Baines
    Cc: Huang Ying
    Cc: Andi Kleen
    Cc: Hugh Dickins
    Cc: Olaf Hering
    Cc: Jesse Barnes
    Cc: Dave Airlie
    Cc: Greg Kroah-Hartman
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
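
The three cases of panic_timeout described above can be summarised in a small sketch. The enum and demo_classify_panic_timeout() are hypothetical names used only for illustration; the real decision is made inside the kernel's panic path.

```c
/* Hypothetical labels for what happens after a panic. */
enum demo_panic_action {
    DEMO_WAIT_FOREVER,        /* panic_timeout == 0 */
    DEMO_REBOOT_AFTER_DELAY,  /* panic_timeout  > 0: wait that many seconds */
    DEMO_REBOOT_NOW           /* panic_timeout  < 0: new with this change */
};

/* Map a panic_timeout value to the behaviour the changelog describes. */
enum demo_panic_action demo_classify_panic_timeout(int panic_timeout)
{
    if (panic_timeout > 0)
        return DEMO_REBOOT_AFTER_DELAY;
    if (panic_timeout < 0)
        return DEMO_REBOOT_NOW;
    return DEMO_WAIT_FOREVER;
}
```

So booting with panic=-1 would fall into the last case, where previously the smallest useful value was panic=1.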
     
  • Selecting GCOV for UML causes a configuration mismatch:

    warning: (GCOV_KERNEL) selects CONSTRUCTORS which has unmet direct dependencies (!UML)

    Constructors are not needed for UML.

    Signed-off-by: Vitaliy Ivanov
    Cc: Peter Oberparleiter
    Acked-by: Richard Weinberger
    Acked-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaliy Ivanov
     
  • Add support for the shm_rmid_forced sysctl. If set to 1, all shared
    memory objects in current ipc namespace will be automatically forced to
    use IPC_RMID.

    The POSIX way of handling shmem allows one to create shm objects and
    call shmdt(), leaving shm object associated with no process, thus
    consuming memory not counted via rlimits.

    With shm_rmid_forced=1 the shared memory object is counted at least for
    one process, so OOM killer may effectively kill the fat process holding
    the shared memory.

    It obviously breaks POSIX - some programs relying on the feature would
    stop working. So set shm_rmid_forced=1 only if you're sure nobody uses
    "orphaned" memory. Use shm_rmid_forced=0 by default for compatibility
    reasons.

    The feature was previously implemented in -ow as a configure option.

    [akpm@linux-foundation.org: fix documentation, per Randy]
    [akpm@linux-foundation.org: fix warning]
    [akpm@linux-foundation.org: readability/conventionality tweaks]
    [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
    Signed-off-by: Vasiliy Kulikov
    Cc: Randy Dunlap
    Cc: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Daniel Lezcano
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Alan Cox
    Cc: Solar Designer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • Signed-off-by: Daniel Rebelo de Oliveira
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Rebelo de Oliveira
     
  • [ This patch has already been accepted as commit 0ac0c0d0f837 but later
    reverted (commit 35926ff5fba8) because it introduced arch specific
    __node_random which was defined only for x86 code so it broke other
    archs. This is a followup without any arch specific code. Other than
    that there are no functional changes.]

    Some workloads that create a large number of small files tend to assign
    too many pages to node 0 (multi-node systems). Part of the reason is
    that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
    at node 0 for newly created tasks.

    This patch changes the rotor to be initialized to a random node number
    of the cpuset.

    [akpm@linux-foundation.org: fix layout]
    [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
    [mhocko@suse.cz: Make it arch independent]
    [akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
    Signed-off-by: Jack Steiner
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Michal Hocko
    Reviewed-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Paul Menage
    Cc: Jack Steiner
    Cc: Robin Holt
    Cc: David Rientjes
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Jack Steiner
    Cc: KOSAKI Motohiro
    Cc: Lee Schermerhorn
    Cc: Michal Hocko
    Cc: Paul Menage
    Cc: Pekka Enberg
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
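
A rough sketch of picking a random starting node from the set of allowed nodes, as the rotor change above describes. demo_pick_node() is a hypothetical helper, the node set is modelled as a plain bitmask rather than a nodemask_t, and the random value is passed in by the caller so the behaviour is easy to follow.

```c
#include <limits.h>

/*
 * Return the (rnd % count)-th set bit of 'allowed', where count is the
 * number of set bits, or -1 if no node is allowed. Bit N set means
 * node N belongs to the cpuset.
 */
int demo_pick_node(unsigned long allowed, unsigned int rnd)
{
    unsigned int count = 0;

    /* Count allowed nodes by clearing the lowest set bit each round. */
    for (unsigned long m = allowed; m != 0; m &= m - 1)
        count++;
    if (count == 0)
        return -1;

    unsigned int want = rnd % count;
    for (int node = 0; node < (int)(sizeof(allowed) * CHAR_BIT); node++) {
        if (allowed & (1UL << node)) {
            if (want == 0)
                return node;
            want--;
        }
    }
    return -1;  /* not reached when count > 0 */
}
```

With nodes 1 and 3 allowed (mask 0xA), a new task would start its rotor at node 1 or node 3 depending on the random value, instead of always starting at node 0.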
     
  • commit 7485d0d3758e8e6491a5c9468114e74dc050785d (futexes: Remove rw
    parameter from get_futex_key()) in 2.6.33 fixed two problems: first, it
    prevented a loop when encountering a ZERO_PAGE. Second, it fixed RW
    MAP_PRIVATE futex operations by forcing the COW to occur by
    unconditionally performing a write-access get_user_pages_fast() to get
    the page. The commit also introduced a user-mode regression in that it
    broke futex operations on read-only memory maps. For example, this
    breaks workloads that have one or more reader processes doing a
    FUTEX_WAIT on a futex within a read only shared file mapping, and a
    writer process that has a writable mapping issuing the FUTEX_WAKE.

    This fixes the regression for valid futex operations on RO mappings by
    trying a RO get_user_pages_fast() when the RW get_user_pages_fast()
    fails. This change makes it necessary to also check for invalid use
    cases, such as anonymous RO mappings (which can never change) and the
    ZERO_PAGE which the commit referenced above was written to address.

    This patch does restore the original behavior with RO MAP_PRIVATE
    mappings, which have inherent user-mode usage problems and don't really
    make sense. With this patch performing a FUTEX_WAIT within a RO
    MAP_PRIVATE mapping will be successfully woken provided another process
    updates the region of the underlying mapped file. However, the mmap()
    man page states that for a MAP_PRIVATE mapping:

    It is unspecified whether changes made to the file after
    the mmap() call are visible in the mapped region.

    So user-mode users attempting to use futex operations on RO MAP_PRIVATE
    mappings are depending on unspecified behavior. Additionally a
    RO MAP_PRIVATE mapping could fail to wake up in the following case.

    Thread-A: call futex(FUTEX_WAIT, memory-region-A).
              get_futex_key() returns an inode-based key.
              Sleep on the key.
    Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A).
    Thread-B: write to memory-region-A.
              COW happens. This process's memory-region-A becomes related
              to a new COWed private (i.e. PageAnon=1) page.
    Thread-B: call futex(FUTEX_WAKE, memory-region-A).
              get_futex_key() returns an mm-based key.
              IOW, we fail to wake up Thread-A.

    Once again doing something like this is just silly and users who do
    something like this get what they deserve.

    While RO MAP_PRIVATE mappings are nonsensical, checking for a private
    mapping requires walking the vmas and was deemed too costly to avoid a
    userspace hang.

    This patch is based on Peter Zijlstra's initial patch with modifications to
    only allow RO mappings for futex operations that need VERIFY_READ access.

    Reported-by: David Oliver
    Signed-off-by: Shawn Bohrer
    Acked-by: Peter Zijlstra
    Signed-off-by: Darren Hart
    Cc: KOSAKI Motohiro
    Cc: peterz@infradead.org
    Cc: eric.dumazet@gmail.com
    Cc: zvonler@rgmadvisors.com
    Cc: hughd@google.com
    Link: http://lkml.kernel.org/r/1309450892-30676-1-git-send-email-sbohrer@rgmadvisors.com
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner

    Shawn Bohrer
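
The fallback described above (try a writable pin first, retry read-only only for operations that need read access) can be sketched like this. Everything here is hypothetical: demo_get_key(), the demo_pin_fn callback and demo_pin_readonly_page() stand in for get_futex_key() and get_user_pages_fast(), and the ZERO_PAGE and anonymous-mapping checks mentioned in the changelog are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum demo_access { DEMO_NEED_READ, DEMO_NEED_WRITE };

/* Hypothetical page-pinning callback: 0 on success, negative errno on failure. */
typedef int (*demo_pin_fn)(void *ctx, int write);

/* Example callback modelling a read-only mapping: write pins always fail. */
int demo_pin_readonly_page(void *ctx, int write)
{
    (void)ctx;
    return write ? -EFAULT : 0;
}

/*
 * Try a writable pin first. If that fails and the operation only needs
 * read access, retry read-only and report that via *ro_out.
 */
int demo_get_key(demo_pin_fn pin, void *ctx, enum demo_access need, int *ro_out)
{
    assert(pin != NULL && ro_out != NULL);
    *ro_out = 0;

    int err = pin(ctx, 1);
    if (err < 0 && need == DEMO_NEED_READ) {
        err = pin(ctx, 0);
        if (err == 0)
            *ro_out = 1;
    }
    return err;
}
```

In this model a FUTEX_WAIT-style caller (read access only) succeeds on a read-only mapping, while a caller that needs write access still gets -EFAULT.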
     

26 Jul, 2011

8 commits

  • * Merge akpm patch series: (122 commits)
    drivers/connector/cn_proc.c: remove unused local
    Documentation/SubmitChecklist: add RCU debug config options
    reiserfs: use hweight_long()
    reiserfs: use proper little-endian bitops
    pnpacpi: register disabled resources
    drivers/rtc/rtc-tegra.c: properly initialize spinlock
    drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
    drivers/rtc: add support for Qualcomm PMIC8xxx RTC
    drivers/rtc/rtc-s3c.c: support clock gating
    drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
    init: skip calibration delay if previously done
    misc/eeprom: add eeprom access driver for digsy_mtc board
    misc/eeprom: add driver for microwire 93xx46 EEPROMs
    checkpatch.pl: update $logFunctions
    checkpatch: make utf-8 test --strict
    checkpatch.pl: add ability to ignore various messages
    checkpatch: add a "prefer __aligned" check
    checkpatch: validate signature styles and To: and Cc: lines
    checkpatch: add __rcu as a sparse modifier
    checkpatch: suggest using min_t or max_t
    ...

    Did this as a merge because of (trivial) conflicts in
    - Documentation/feature-removal-schedule.txt
    - arch/xtensa/include/asm/uaccess.h
    that were just easier to fix up in the merge than in the patch series.

    Linus Torvalds
     
  • If CONFIG_IKCONFIG=m but CONFIG_IKCONFIG_PROC=n we get a module that has
    no MODULE_LICENSE definition. Move the MODULE_*() definitions outside the
    CONFIG_IKCONFIG_PROC #ifdef to prevent this configuration from tainting
    the kernel.

    Signed-off-by: Stephen Boyd
    Acked-by: Randy Dunlap
    Acked-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     
  • It is not necessary to share the same notifier.h.

    This patch also moves register_reboot_notifier() and
    unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.

    [amwang@redhat.com: make allyesconfig succeed on ppc64]
    Signed-off-by: WANG Cong
    Cc: David Miller
    Cc: "Rafael J. Wysocki"
    Cc: Greg KH
    Signed-off-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • devres uses the pointer value as key after it's freed, which is safe but
    triggers spurious use-after-free warnings on some static analysis tools.
    Rearrange code to avoid such warnings.

    Signed-off-by: Maxin B. John
    Reviewed-by: Rolf Eike Beer
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maxin B John
     
  • I haven't reproduced it myself but the fail scenario is that on such
    machines (notably ARM and some embedded powerpc), if you manage to hit
    that futex path on a writable page whose dirty bit has gone from the PTE,
    you'll livelock inside the kernel from what I can tell.

    It will go in a loop of trying the atomic access, failing, trying gup to
    "fix it up", getting success from gup, going back to the atomic access,
    failing again because dirty wasn't fixed etc...

    So I think you essentially hang in the kernel.

    The scenario is probably rare'ish because affected architectures are
    embedded and tend to not swap much (if at all) so we probably rarely hit
    the case where dirty is missing or young is missing, but I think Shan has
    a piece of SW that can reliably reproduce it using a shared writable
    mapping & fork or something like that.

    On archs that use SW tracking of dirty & young, a page without dirty is
    effectively mapped read-only and a page without young is inaccessible in
    the PTE.

    Additionally, some architectures might lazily flush the TLB when relaxing
    write protection (by doing only a local flush), and expect a fault to
    invalidate the stale entry if it's still present on another processor.

    The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
    "fix it up" by calling get_user_pages() which would then be equivalent to
    taking the fault.

    However that isn't the case. get_user_pages() will not call
    handle_mm_fault() in the case where the PTE seems to have the right
    permissions, regardless of the dirty and young state. It will eventually
    update those bits ... in the struct page, but not in the PTE.

    Additionally, it will not handle the lazy TLB flushing that can be
    required by some architectures in the fault case.

    Basically, gup is the wrong interface for the job. The patch provides a
    more appropriate one which boils down to just calling handle_mm_fault()
    since what we are trying to do is simulate a real page fault.

    The futex code currently attempts to write to user memory within a
    pagefault disabled section, and if that fails, tries to fix it up using
    get_user_pages().

    This doesn't work on archs where the dirty and young bits are maintained
    by software, since they will gate access permission in the TLB, and will
    not be updated by gup().

    In addition, there's an expectation on some archs that a spurious write
    fault triggers a local TLB flush, and that is missing from the picture as
    well.

    I decided that adding those "features" to gup() would be too much for this
    already too complex function, and instead added a new simpler
    fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
    which the futex code can call.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
    Signed-off-by: Benjamin Herrenschmidt
    Reported-by: Shan Hai
    Tested-by: Shan Hai
    Cc: David Laight
    Acked-by: Peter Zijlstra
    Cc: Darren Hart
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • …linux/kernel/git/mmarek/kbuild-2.6

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
    genksyms: Use same type in loop comparison
    kbuild: silence generated makefile message
    kernel: prevent unnecessary rebuilding due to config_data.gz
    headers_install: fix __packed in exported kernel headers
    dtc: regen parser
    dtc: migrate parser to implicit rules
    kconfig: regen parser
    kconfig: migrate parser to implicit rules
    kconfig/zconf.l: do not ask to generate backup
    kconfig: kill no longer needed reference to YYDEBUG
    kconfig: constify `kconf_id_lookup'
    genksym: regen parser
    genksyms: migrate parser to implicit rules
    genksyms: drop -Wno-uninitialized from HOSTCFLAGS_parse.tab.o
    genksyms: pass hash and lookup functions name and target language though the input file
    kbuild: simplify the %_shipped rule
    kbuild: add implicit rules for parser generation
    kbuild: add `baseprereq'
    kbuild: Fix reference to vermagic.h

    * 'packaging' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
    package: Makefile: fix perf target bug

    * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
    gitignore: ignore debian build directory

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    fs: Merge split strings
    treewide: fix potentially dangerous trailing ';' in #defined values/expressions
    uwb: Fix misspelling of neighbourhood in comment
    net, netfilter: Remove redundant goto in ebt_ulog_packet
    trivial: don't touch files that are removed in the staging tree
    lib/vsprintf: replace link to Draft by final RFC number
    doc: Kconfig: `to be' -> `be'
    doc: Kconfig: Typo: square -> squared
    doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
    drivers/net: static should be at beginning of declaration
    drivers/media: static should be at beginning of declaration
    drivers/i2c: static should be at beginning of declaration
    XTENSA: static should be at beginning of declaration
    SH: static should be at beginning of declaration
    MIPS: static should be at beginning of declaration
    ARM: static should be at beginning of declaration
    rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
    Update my e-mail address
    PCIe ASPM: forcedly -> forcibly
    gma500: push through device driver tree
    ...

    Fix up trivial conflicts:
    - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
    - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
    - drivers/net/r8169.c (just context changes)

    Linus Torvalds
     
  • * 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits)
    block: strict rq_affinity
    backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu
    block: fix patch import error in max_discard_sectors check
    block: reorder request_queue to remove 64 bit alignment padding
    CFQ: add think time check for group
    CFQ: add think time check for service tree
    CFQ: move think time check variables to a separate struct
    fixlet: Remove fs_excl from struct task.
    cfq: Remove special treatment for metadata rqs.
    block: document blk_plug list access
    block: avoid building too big plug list
    compat_ioctl: fix make headers_check regression
    block: eliminate potential for infinite loop in blkdev_issue_discard
    compat_ioctl: fix warning caused by qemu
    block: flush MEDIA_CHANGE from drivers on close(2)
    blk-throttle: Make total_nr_queued unsigned
    block: Add __attribute__((format(printf...) and fix fallout
    fs/partitions/check.c: make local symbols static
    block:remove some spare spaces in genhd.c
    block:fix the comment error in blkdev.h
    ...

    Linus Torvalds
     

25 Jul, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    modpost: Fix modpost's license checking V3
    module: add /sys/module/<name>/uevent files
    module: change attr callbacks to take struct module_kobject
    modules: make arch's use default loader hooks
    modules: add default loader hook implementations
    param: fix return value handling in param_set_*

    Linus Torvalds
     
  • * 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
    KVM: IOMMU: Disable device assignment without interrupt remapping
    KVM: MMU: trace mmio page fault
    KVM: MMU: mmio page fault support
    KVM: MMU: reorganize struct kvm_shadow_walk_iterator
    KVM: MMU: lockless walking shadow page table
    KVM: MMU: do not need atomicly to set/clear spte
    KVM: MMU: introduce the rules to modify shadow page table
    KVM: MMU: abstract some functions to handle fault pfn
    KVM: MMU: filter out the mmio pfn from the fault pfn
    KVM: MMU: remove bypass_guest_pf
    KVM: MMU: split kvm_mmu_free_page
    KVM: MMU: count used shadow pages on prepareing path
    KVM: MMU: rename 'pt_write' to 'emulate'
    KVM: MMU: cleanup for FNAME(fetch)
    KVM: MMU: optimize to handle dirty bit
    KVM: MMU: cache mmio info on page fault path
    KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code
    KVM: MMU: do not update slot bitmap if spte is nonpresent
    KVM: MMU: fix walking shadow page table
    KVM guest: KVM Steal time registration
    ...

    Linus Torvalds