Commit a1f4d39500ad8ed61825eff061debff42386ab5b

Authored by Avi Kivity
1 parent fc34531db3

KVM: Remove memory alias support

As advertised in feature-removal-schedule.txt.  Equivalent support is provided
by overlapping memory regions.

Signed-off-by: Avi Kivity <avi@redhat.com>
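
The replacement the commit message points to lives entirely in userspace: instead of asking the kernel for an alias, a VMM can simply back more than one guest-physical range with the same host buffer using KVM_SET_USER_MEMORY_REGION. The sketch below is illustrative only and not taken from this commit; the addresses, sizes and slot numbers are assumptions chosen to mimic the classic VGA-window case.

/*
 * Minimal sketch: alias-like behaviour with plain user memory slots.
 * The BAR address, window size and slot numbers are assumptions.
 */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

static int map_vga_window(int vm_fd)
{
	/* One host buffer that will appear at two guest-physical addresses. */
	void *vram = mmap(NULL, 0x20000, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (vram == MAP_FAILED)
		return -1;

	struct kvm_userspace_memory_region fb = {
		.slot            = 1,			/* assumed free slot */
		.guest_phys_addr = 0xfd000000,		/* assumed framebuffer BAR */
		.memory_size     = 0x20000,
		.userspace_addr  = (unsigned long)vram,
	};
	struct kvm_userspace_memory_region vga = {
		.slot            = 2,			/* second slot, same backing */
		.guest_phys_addr = 0xa0000,		/* legacy VGA window */
		.memory_size     = 0x20000,
		.userspace_addr  = (unsigned long)vram,
	};

	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &fb) < 0 ||
	    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &vga) < 0)
		return -1;
	return 0;
}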

Showing 13 changed files with 11 additions and 225 deletions

Documentation/feature-removal-schedule.txt
The following is a list of files and features that are going to be
removed in the kernel source tree. Every entry should contain what
exactly is going away, why it is happening, and who is going to be doing
the work. When the feature is removed from the kernel, it should also
be removed from this file.

---------------------------

What: PRISM54
When: 2.6.34

Why: prism54 FullMAC PCI / Cardbus devices used to be supported only by the
prism54 wireless driver. After Intersil stopped selling these
devices in preference for the newer more flexible SoftMAC devices
a SoftMAC device driver was required and prism54 did not support
them. The p54pci driver now exists and has been present in the kernel for
a while. This driver supports both SoftMAC devices and FullMAC devices.
The main difference between these devices was the amount of memory which
could be used for the firmware. The SoftMAC devices support a smaller
amount of memory. Because of this the SoftMAC firmware fits into FullMAC
devices's memory. p54pci supports not only PCI / Cardbus but also USB
and SPI. Since p54pci supports all devices prism54 supports
you will have a conflict. I'm not quite sure how distributions are
handling this conflict right now. prism54 was kept around due to
claims users may experience issues when using the SoftMAC driver.
Time has passed users have not reported issues. If you use prism54
and for whatever reason you cannot use p54pci please let us know!
E-mail us at: linux-wireless@vger.kernel.org

For more information see the p54 wiki page:

http://wireless.kernel.org/en/users/Drivers/p54

Who: Luis R. Rodriguez <lrodriguez@atheros.com>

---------------------------

What: IRQF_SAMPLE_RANDOM
Check: IRQF_SAMPLE_RANDOM
When: July 2009

Why: Many of IRQF_SAMPLE_RANDOM users are technically bogus as entropy
sources in the kernel's current entropy model. To resolve this, every
input point to the kernel's entropy pool needs to better document the
type of entropy source it actually is. This will be replaced with
additional add_*_randomness functions in drivers/char/random.c

Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com>

---------------------------

What: Deprecated snapshot ioctls
When: 2.6.36

Why: The ioctls in kernel/power/user.c were marked as deprecated long time
ago. Now they notify users about that so that they need to replace
their userspace. After some more time, remove them completely.

Who: Jiri Slaby <jirislaby@gmail.com>

---------------------------

What: The ieee80211_regdom module parameter
When: March 2010 / desktop catchup

Why: This was inherited by the CONFIG_WIRELESS_OLD_REGULATORY code,
and currently serves as an option for users to define an
ISO / IEC 3166 alpha2 code for the country they are currently
present in. Although there are userspace API replacements for this
through nl80211 distributions haven't yet caught up with implementing
decent alternatives through standard GUIs. Although available as an
option through iw or wpa_supplicant its just a matter of time before
distributions pick up good GUI options for this. The ideal solution
would actually consist of intelligent designs which would do this for
the user automatically even when travelling through different countries.
Until then we leave this module parameter as a compromise.

When userspace improves with reasonable widely-available alternatives for
this we will no longer need this module parameter. This entry hopes that
by the super-futuristically looking date of "March 2010" we will have
such replacements widely available.

Who: Luis R. Rodriguez <lrodriguez@atheros.com>

---------------------------

What: dev->power.power_state
When: July 2007
Why: Broken design for runtime control over driver power states, confusing
driver-internal runtime power management with: mechanisms to support
system-wide sleep state transitions; event codes that distinguish
different phases of swsusp "sleep" transitions; and userspace policy
inputs. This framework was never widely used, and most attempts to
use it were broken. Drivers should instead be exposing domain-specific
interfaces either to kernel or to userspace.
Who: Pavel Machek <pavel@suse.cz>

---------------------------

What: Video4Linux API 1 ioctls and from Video devices.
When: July 2009
Files: include/linux/videodev.h
Check: include/linux/videodev.h
Why: V4L1 AP1 was replaced by V4L2 API during migration from 2.4 to 2.6
series. The old API have lots of drawbacks and don't provide enough
means to work with all video and audio standards. The newer API is
already available on the main drivers and should be used instead.
Newer drivers should use v4l_compat_translate_ioctl function to handle
old calls, replacing to newer ones.
Decoder iocts are using internally to allow video drivers to
communicate with video decoders. This should also be improved to allow
V4L2 calls being translated into compatible internal ioctls.
Compatibility ioctls will be provided, for a while, via
v4l1-compat module.
Who: Mauro Carvalho Chehab <mchehab@infradead.org>

---------------------------

What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
When: 2.6.35/2.6.36
Files: drivers/pcmcia/: pcmcia_ioctl.c
Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a
normal hotpluggable bus, and with it using the default kernel
infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA
control ioctl needed by cardmgr and cardctl from pcmcia-cs is
unnecessary and potentially harmful (it does not provide for
proper locking), and makes further cleanups and integration of the
PCMCIA subsystem into the Linux kernel device driver model more
difficult. The features provided by cardmgr and cardctl are either
handled by the kernel itself now or are available in the new
pcmciautils package available at
http://kernel.org/pub/linux/utils/kernel/pcmcia/

For all architectures except ARM, the associated config symbol
has been removed from kernel 2.6.34; for ARM, it will be likely
be removed from kernel 2.6.35. The actual code will then likely
be removed from kernel 2.6.36.
Who: Dominik Brodowski <linux@dominikbrodowski.net>

---------------------------

What: sys_sysctl
When: September 2010
Option: CONFIG_SYSCTL_SYSCALL
Why: The same information is available in a more convenient from
/proc/sys, and none of the sysctl variables appear to be
important performance wise.

Binary sysctls are a long standing source of subtle kernel
bugs and security issues.

When I looked several months ago all I could find after
searching several distributions were 5 user space programs and
glibc (which falls back to /proc/sys) using this syscall.

The man page for sysctl(2) documents it as unusable for user
space programs.

sysctl(2) is not generally ABI compatible to a 32bit user
space application on a 64bit and a 32bit kernel.

For the last several months the policy has been no new binary
sysctls and no one has put forward an argument to use them.

Binary sysctls issues seem to keep happening appearing so
properly deprecating them (with a warning to user space) and a
2 year grace warning period will mean eventually we can kill
them and end the pain.

In the mean time individual binary sysctls can be dealt with
in a piecewise fashion.

Who: Eric Biederman <ebiederm@xmission.com>

---------------------------

What: remove EXPORT_SYMBOL(kernel_thread)
When: August 2006
Files: arch/*/kernel/*_ksyms.c
Check: kernel_thread
Why: kernel_thread is a low-level implementation detail. Drivers should
use the <linux/kthread.h> API instead which shields them from
implementation details and provides a higherlevel interface that
prevents bugs and code duplication
Who: Christoph Hellwig <hch@lst.de>

---------------------------

What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports
(temporary transition config option provided until then)
The transition config option will also be removed at the same time.
When: before 2.6.19
Why: Unused symbols are both increasing the size of the kernel binary
and are often a sign of "wrong API"
Who: Arjan van de Ven <arjan@linux.intel.com>

---------------------------

What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
When: October 2008
Why: The stacking of class devices makes these values misleading and
inconsistent.
Class devices should not carry any of these properties, and bus
devices have SUBSYTEM and DRIVER as a replacement.
Who: Kay Sievers <kay.sievers@suse.de>

---------------------------

What: ACPI procfs interface
When: July 2008
Why: ACPI sysfs conversion should be finished by January 2008.
ACPI procfs interface will be removed in July 2008 so that
there is enough time for the user space to catch up.
Who: Zhang Rui <rui.zhang@intel.com>

---------------------------

What: /proc/acpi/button
When: August 2007
Why: /proc/acpi/button has been replaced by events to the input layer
since 2.6.20.
Who: Len Brown <len.brown@intel.com>

---------------------------

What: /proc/acpi/event
When: February 2008
Why: /proc/acpi/event has been replaced by events via the input layer
and netlink since 2.6.23.
Who: Len Brown <len.brown@intel.com>

---------------------------

What: i386/x86_64 bzImage symlinks
When: April 2010

Why: The i386/x86_64 merge provides a symlink to the old bzImage
location so not yet updated user space tools, e.g. package
scripts, do not break.
Who: Thomas Gleixner <tglx@linutronix.de>

---------------------------

What: GPIO autorequest on gpio_direction_{input,output}() in gpiolib
When: February 2010
Why: All callers should use explicit gpio_request()/gpio_free().
The autorequest mechanism in gpiolib was provided mostly as a
migration aid for legacy GPIO interfaces (for SOC based GPIOs).
Those users have now largely migrated. Platforms implementing
the GPIO interfaces without using gpiolib will see no changes.
Who: David Brownell <dbrownell@users.sourceforge.net>
---------------------------

What: b43 support for firmware revision < 410
When: The schedule was July 2008, but it was decided that we are going to keep the
code as long as there are no major maintanance headaches.
So it _could_ be removed _any_ time now, if it conflicts with something new.
Why: The support code for the old firmware hurts code readability/maintainability
and slightly hurts runtime performance. Bugfixes for the old firmware
are not provided by Broadcom anymore.
Who: Michael Buesch <mb@bu3sch.de>

---------------------------

What: /sys/o2cb symlink
When: January 2010
Why: /sys/fs/o2cb is the proper location for this information - /sys/o2cb
exists as a symlink for backwards compatibility for old versions of
ocfs2-tools. 2 years should be sufficient time to phase in new versions
which know to look in /sys/fs/o2cb.
Who: ocfs2-devel@oss.oracle.com

---------------------------

What: Ability for non root users to shm_get hugetlb pages based on mlock
resource limits
When: 2.6.31
Why: Non root users need to be part of /proc/sys/vm/hugetlb_shm_group or
have CAP_IPC_LOCK to be able to allocate shm segments backed by
huge pages. The mlock based rlimit check to allow shm hugetlb is
inconsistent with mmap based allocations. Hence it is being
deprecated.
Who: Ravikiran Thirumalai <kiran@scalex86.org>

---------------------------

What: CONFIG_THERMAL_HWMON
When: January 2009
Why: This option was introduced just to allow older lm-sensors userspace
to keep working over the upgrade to 2.6.26. At the scheduled time of
removal fixed lm-sensors (2.x or 3.x) should be readily available.
Who: Rene Herman <rene.herman@gmail.com>

---------------------------

What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS
(in net/core/net-sysfs.c)
When: After the only user (hal) has seen a release with the patches
for enough time, probably some time in 2010.
Why: Over 1K .text/.data size reduction, data is available in other
ways (ioctls)
Who: Johannes Berg <johannes@sipsolutions.net>

---------------------------

What: CONFIG_NF_CT_ACCT
When: 2.6.29
Why: Accounting can now be enabled/disabled without kernel recompilation.
Currently used only to set a default value for a feature that is also
controlled by a kernel/module/sysfs/sysctl parameter.
Who: Krzysztof Piotr Oledzki <ole@ans.pl>

---------------------------

What: sysfs ui for changing p4-clockmod parameters
When: September 2009
Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and
e088e4c9cdb618675874becb91b2fd581ee707e6.
Removal is subject to fixing any remaining bugs in ACPI which may
cause the thermal throttling not to happen at the right time.
Who: Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com>

-----------------------------

What: __do_IRQ all in one fits nothing interrupt handler
When: 2.6.32
Why: __do_IRQ was kept for easy migration to the type flow handlers.
More than two years of migration time is enough.
Who: Thomas Gleixner <tglx@linutronix.de>

-----------------------------

What: fakephp and associated sysfs files in /sys/bus/pci/slots/
When: 2011
Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to
represent a machine's physical PCI slots. The change in semantics
had userspace implications, as the hotplug core no longer allowed
drivers to create multiple sysfs files per physical slot (required
for multi-function devices, e.g.). fakephp was seen as a developer's
tool only, and its interface changed. Too late, we learned that
there were some users of the fakephp interface.

In 2.6.30, the original fakephp interface was restored. At the same
time, the PCI core gained the ability that fakephp provided, namely
function-level hot-remove and hot-add.

Since the PCI core now provides the same functionality, exposed in:

/sys/bus/pci/rescan
/sys/bus/pci/devices/.../remove
/sys/bus/pci/devices/.../rescan

there is no functional reason to maintain fakephp as well.

We will keep the existing module so that 'modprobe fakephp' will
present the old /sys/bus/pci/slots/... interface for compatibility,
but users are urged to migrate their applications to the API above.

After a reasonable transition period, we will remove the legacy
fakephp interface.
Who: Alex Chiang <achiang@hp.com>

---------------------------

What: CONFIG_RFKILL_INPUT
When: 2.6.33
Why: Should be implemented in userspace, policy daemon.
Who: Johannes Berg <johannes@sipsolutions.net>

---------------------------

What: CONFIG_INOTIFY
When: 2.6.33
Why: last user (audit) will be converted to the newer more generic
and more easily maintained fsnotify subsystem
Who: Eric Paris <eparis@redhat.com>

----------------------------

What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
exported interface anymore.
When: 2.6.33
Why: cpu_policy_rwsem has a new cleaner definition making it local to
cpufreq core and contained inside cpufreq.c. Other dependent
drivers should not use it in order to safely avoid lockdep issues.
Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

----------------------------

What: sound-slot/service-* module aliases and related clutters in
sound/sound_core.c
When: August 2010
Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
(14) and requests modules using custom sound-slot/service-*
module aliases. The only benefit of doing this is allowing
use of custom module aliases which might as well be considered
a bug at this point. This preemptive claiming prevents
alternative OSS implementations.

Till the feature is removed, the kernel will be requesting
both sound-slot/service-* and the standard char-major-* module
aliases and allow turning off the pre-claiming selectively via
CONFIG_SOUND_OSS_CORE_PRECLAIM and soundcore.preclaim_oss
kernel parameter.

After the transition phase is complete, both the custom module
aliases and switches to disable it will go away. This removal
will also allow making ALSA OSS emulation independent of
sound_core. The dependency will be broken then too.
Who: Tejun Heo <tj@kernel.org>

----------------------------

What: Support for VMware's guest paravirtuliazation technique [VMI] will be
dropped.
When: 2.6.37 or earlier.
Why: With the recent innovations in CPU hardware acceleration technologies
from Intel and AMD, VMware ran a few experiments to compare these
techniques to guest paravirtualization technique on VMware's platform.
These hardware assisted virtualization techniques have outperformed the
performance benefits provided by VMI in most of the workloads. VMware
expects that these hardware features will be ubiquitous in a couple of
years, as a result, VMware has started a phased retirement of this
feature from the hypervisor. We will be removing this feature from the
Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
technical reasons (read opportunity to remove major chunk of pvops)
arise.

Please note that VMI has always been an optimization and non-VMI kernels
still work fine on VMware's platform.
Latest versions of VMware's product which support VMI are,
Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence
releases for these products will continue supporting VMI.

For more details about VMI retirement take a look at this,
http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html

Who: Alok N Kataria <akataria@vmware.com>

----------------------------

What: Support for lcd_switch and display_get in asus-laptop driver
When: March 2010
Why: These two features use non-standard interfaces. There are the
only features that really need multiple path to guess what's
the right method name on a specific laptop.

Removing them will allow to remove a lot of code an significantly
clean the drivers.

This will affect the backlight code which won't be able to know
if the backlight is on or off. The platform display file will also be
write only (like the one in eeepc-laptop).

This should'nt affect a lot of user because they usually know
when their display is on or off.

Who: Corentin Chary <corentin.chary@gmail.com>

----------------------------

What: usbvideo quickcam_messenger driver
When: 2.6.35
Files: drivers/media/video/usbvideo/quickcam_messenger.[ch]
Why: obsolete v4l1 driver replaced by gspca_stv06xx
Who: Hans de Goede <hdegoede@redhat.com>

----------------------------

What: ov511 v4l1 driver
When: 2.6.35
Files: drivers/media/video/ov511.[ch]
Why: obsolete v4l1 driver replaced by gspca_ov519
Who: Hans de Goede <hdegoede@redhat.com>

----------------------------

What: w9968cf v4l1 driver
When: 2.6.35
Files: drivers/media/video/w9968cf*.[ch]
Why: obsolete v4l1 driver replaced by gspca_ov519
Who: Hans de Goede <hdegoede@redhat.com>

----------------------------

What: ovcamchip sensor framework
When: 2.6.35
Files: drivers/media/video/ovcamchip/*
Why: Only used by obsoleted v4l1 drivers
Who: Hans de Goede <hdegoede@redhat.com>

----------------------------

What: stv680 v4l1 driver
When: 2.6.35
Files: drivers/media/video/stv680.[ch]
Why: obsolete v4l1 driver replaced by gspca_stv0680
Who: Hans de Goede <hdegoede@redhat.com>

----------------------------

What: zc0301 v4l driver
When: 2.6.35
Files: drivers/media/video/zc0301/*
Why: Duplicate functionality with the gspca_zc3xx driver, zc0301 only
supports 2 USB-ID's (because it only supports a limited set of
sensors) wich are also supported by the gspca_zc3xx driver
(which supports 53 USB-ID's in total)
Who: Hans de Goede <hdegoede@redhat.com>

----------------------------

What: sysfs-class-rfkill state file
When: Feb 2014
Files: net/rfkill/core.c
Why: Documented as obsolete since Feb 2010. This file is limited to 3
states while the rfkill drivers can have 4 states.
Who: anybody or Florian Mickler <florian@mickler.org>

----------------------------

What: sysfs-class-rfkill claim file
When: Feb 2012
Files: net/rfkill/core.c
Why: It is not possible to claim an rfkill driver since 2007. This is
Documented as obsolete since Feb 2010.
Who: anybody or Florian Mickler <florian@mickler.org>

----------------------------

What: capifs
When: February 2011
Files: drivers/isdn/capi/capifs.*
Why: udev fully replaces this special file system that only contains CAPI
NCCI TTY device nodes. User space (pppdcapiplugin) works without
noticing the difference.
Who: Jan Kiszka <jan.kiszka@web.de>

----------------------------

-What: KVM memory aliases support
-When: July 2010
-Why: Memory aliasing support is used for speeding up guest vga access
-through the vga windows.
-
-Modern userspace no longer uses this feature, so it's just bitrotted
-code and can be removed with no impact.
-Who: Avi Kivity <avi@redhat.com>
-
-----------------------------
-
What: xtime, wall_to_monotonic
When: 2.6.36+
Files: kernel/time/timekeeping.c include/linux/time.h
Why: Cleaning up timekeeping internal values. Please use
existing timekeeping accessor functions to access
the equivalent functionality.
Who: John Stultz <johnstul@us.ibm.com>

----------------------------

What: KVM kernel-allocated memory slots
When: July 2010
Why: Since 2.6.25, kvm supports user-allocated memory slots, which are
much more flexible than kernel-allocated slots. All current userspace
supports the newer interface and this code can be removed with no
impact.
Who: Avi Kivity <avi@redhat.com>

----------------------------

What: KVM paravirt mmu host support
When: January 2011
Why: The paravirt mmu host support is slower than non-paravirt mmu, both
on newer and older hardware. It is already not exposed to the guest,
and kept only for live migration purposes.
Who: Avi Kivity <avi@redhat.com>

----------------------------

What: iwlwifi 50XX module parameters
When: 2.6.40
Why: The "..50" modules parameters were used to configure 5000 series and
up devices; different set of module parameters also available for 4965
with same functionalities. Consolidate both set into single place
in drivers/net/wireless/iwlwifi/iwl-agn.c

Who: Wey-Yi Guy <wey-yi.w.guy@intel.com>

----------------------------

What: iwl4965 alias support
When: 2.6.40
Why: Internal alias support has been present in module-init-tools for some
time, the MODULE_ALIAS("iwl4965") boilerplate aliases can be removed
with no impact.

Who: Wey-Yi Guy <wey-yi.w.guy@intel.com>

---------------------------

What: xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When: April 2011
Why: Superseded by xt_CT
Who: Netfilter developer team <netfilter-devel@vger.kernel.org>

---------------------------

What: video4linux /dev/vtx teletext API support
When: 2.6.35
Files: drivers/media/video/saa5246a.c drivers/media/video/saa5249.c
include/linux/videotext.h
Why: The vtx device nodes have been superseded by vbi device nodes
for many years. No applications exist that use the vtx support.
Of the two i2c drivers that actually support this API the saa5249
has been impossible to use for a year now and no known hardware
that supports this device exists. The saa5246a is theoretically
supported by the old mxb boards, but it never actually worked.

In summary: there is no hardware that can use this API and there
are no applications actually implementing this API.

The vtx support still reserves minors 192-223 and we would really
like to reuse those for upcoming new functionality. In the unlikely
event that new hardware appears that wants to use the functionality
provided by the vtx API, then that functionality should be build
around the sliced VBI API instead.
Who: Hans Verkuil <hverkuil@xs4all.nl>

----------------------------

What: IRQF_DISABLED
When: 2.6.36
Why: The flag is a NOOP as we run interrupt handlers with interrupts disabled
Who: Thomas Gleixner <tglx@linutronix.de>

----------------------------

What: old ieee1394 subsystem (CONFIG_IEEE1394)
When: 2.6.37
Files: drivers/ieee1394/ except init_ohci1394_dma.c
Why: superseded by drivers/firewire/ (CONFIG_FIREWIRE) which offers more
features, better performance, and better security, all with smaller
and more modern code base
Who: Stefan Richter <stefanr@s5r6.in-berlin.de>

----------------------------

What: The acpi_sleep=s4_nonvs command line option
When: 2.6.37
Files: arch/x86/kernel/acpi/sleep.c
Why: superseded by acpi_sleep=nonvs
Who: Rafael J. Wysocki <rjw@sisk.pl>

----------------------------

Documentation/kvm/api.txt
The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================

1. General description

The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine. The ioctls belong to three classes

- System ioctls: These query and set global attributes which affect the
whole kvm subsystem. In addition a system ioctl is used to create
virtual machines

- VM ioctls: These query and set attributes that affect an entire virtual
machine, for example memory layout. In addition a VM ioctl is used to
create virtual cpus (vcpus).

Only run VM ioctls from the same process (address space) that was used
to create the VM.

- vcpu ioctls: These query and set attributes that control the operation
of a single virtual cpu.

Only run vcpu ioctls from the same thread that was used to create the
vcpu.

2. File descriptors

The kvm API is centered around file descriptors. An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
and return a file descriptor pointing to it. Finally, ioctls on a vcpu
fd can be used to control the vcpu, including the important task of
actually running guest code.

In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain socket. These
kinds of tricks are explicitly not supported by kvm. While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API. The only supported use is one virtual machine per process,
and one vcpu per thread.

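A minimal, illustrative sketch of that fd hierarchy (not part of the documented file; error handling is trimmed and no guest memory or code is set up):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);	/* system fd */
	if (kvm < 0)
		return 1;

	/* A real client would first verify KVM_GET_API_VERSION (section 4.1). */
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);		/* VM fd: memory layout, vcpus */
	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);	/* vcpu fd: registers, KVM_RUN */

	printf("system fd %d, vm fd %d, vcpu fd %d\n", kvm, vm, vcpu);
	return 0;
}
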
44 3. Extensions 44 3. Extensions
45 45
46 As of Linux 2.6.22, the KVM ABI has been stabilized: no backward 46 As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
47 incompatible changes are allowed. However, there is an extension 47 incompatible changes are allowed. However, there is an extension
48 facility that allows backward-compatible extensions to the API to be 48 facility that allows backward-compatible extensions to the API to be
49 queried and used. 49 queried and used.
50 50
51 The extension mechanism is not based on the Linux version number. 51 The extension mechanism is not based on the Linux version number.
52 Instead, kvm defines extension identifiers and a facility to query 52 Instead, kvm defines extension identifiers and a facility to query
53 whether a particular extension identifier is available. If it is, a 53 whether a particular extension identifier is available. If it is, a
54 set of ioctls is available for application use. 54 set of ioctls is available for application use.
55 55
56 4. API description 56 4. API description
57 57
58 This section describes ioctls that can be used to control kvm guests. 58 This section describes ioctls that can be used to control kvm guests.
59 For each ioctl, the following information is provided along with a 59 For each ioctl, the following information is provided along with a
60 description: 60 description:
61 61
62 Capability: which KVM extension provides this ioctl. Can be 'basic', 62 Capability: which KVM extension provides this ioctl. Can be 'basic',
63 which means that it will be provided by any kernel that supports 63 which means that it will be provided by any kernel that supports
64 API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which 64 API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
65 means availability needs to be checked with KVM_CHECK_EXTENSION 65 means availability needs to be checked with KVM_CHECK_EXTENSION
66 (see section 4.4). 66 (see section 4.4).
67 67
68 Architectures: which instruction set architectures provide this ioctl. 68 Architectures: which instruction set architectures provide this ioctl.
69 x86 includes both i386 and x86_64. 69 x86 includes both i386 and x86_64.
70 70
71 Type: system, vm, or vcpu. 71 Type: system, vm, or vcpu.
72 72
73 Parameters: what parameters are accepted by the ioctl. 73 Parameters: what parameters are accepted by the ioctl.
74 74
75 Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) 75 Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
76 are not detailed, but errors with specific meanings are. 76 are not detailed, but errors with specific meanings are.
77 77
78 4.1 KVM_GET_API_VERSION 78 4.1 KVM_GET_API_VERSION
79 79
80 Capability: basic 80 Capability: basic
81 Architectures: all 81 Architectures: all
82 Type: system ioctl 82 Type: system ioctl
83 Parameters: none 83 Parameters: none
84 Returns: the constant KVM_API_VERSION (=12) 84 Returns: the constant KVM_API_VERSION (=12)
85 85
86 This identifies the API version as the stable kvm API. It is not 86 This identifies the API version as the stable kvm API. It is not
87 expected that this number will change. However, Linux 2.6.20 and 87 expected that this number will change. However, Linux 2.6.20 and
88 2.6.21 report earlier versions; these are not documented and not 88 2.6.21 report earlier versions; these are not documented and not
89 supported. Applications should refuse to run if KVM_GET_API_VERSION 89 supported. Applications should refuse to run if KVM_GET_API_VERSION
90 returns a value other than 12. If this check passes, all ioctls 90 returns a value other than 12. If this check passes, all ioctls
91 described as 'basic' will be available. 91 described as 'basic' will be available.
92 92
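As a sketch (the helper name is invented for illustration), an application
would refuse to run on anything but version 12:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 0 if the stable API is present, -1 otherwise. */
static int check_api_version(int kvm_fd)
{
	return ioctl(kvm_fd, KVM_GET_API_VERSION, 0) == 12 ? 0 : -1;
}
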
93 4.2 KVM_CREATE_VM 93 4.2 KVM_CREATE_VM
94 94
95 Capability: basic 95 Capability: basic
96 Architectures: all 96 Architectures: all
97 Type: system ioctl 97 Type: system ioctl
98 Parameters: none 98 Parameters: none
99 Returns: a VM fd that can be used to control the new virtual machine. 99 Returns: a VM fd that can be used to control the new virtual machine.
100 100
101 The new VM has no virtual cpus and no memory. An mmap() of a VM fd 101 The new VM has no virtual cpus and no memory. An mmap() of a VM fd
102 will access the virtual machine's physical address space; offset zero 102 will access the virtual machine's physical address space; offset zero
103 corresponds to guest physical address zero. Use of mmap() on a VM fd 103 corresponds to guest physical address zero. Use of mmap() on a VM fd
104 is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is 104 is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
105 available. 105 available.
106 106
107 4.3 KVM_GET_MSR_INDEX_LIST 107 4.3 KVM_GET_MSR_INDEX_LIST
108 108
109 Capability: basic 109 Capability: basic
110 Architectures: x86 110 Architectures: x86
111 Type: system 111 Type: system
112 Parameters: struct kvm_msr_list (in/out) 112 Parameters: struct kvm_msr_list (in/out)
113 Returns: 0 on success; -1 on error 113 Returns: 0 on success; -1 on error
114 Errors: 114 Errors:
115 E2BIG: the msr index list is too big to fit in the array specified by 115 E2BIG: the msr index list is too big to fit in the array specified by
116 the user. 116 the user.
117 117
118 struct kvm_msr_list { 118 struct kvm_msr_list {
119 __u32 nmsrs; /* number of msrs in entries */ 119 __u32 nmsrs; /* number of msrs in entries */
120 __u32 indices[0]; 120 __u32 indices[0];
121 }; 121 };
122 122
123 This ioctl returns the guest msrs that are supported. The list varies 123 This ioctl returns the guest msrs that are supported. The list varies
124 by kvm version and host processor, but does not change otherwise. The 124 by kvm version and host processor, but does not change otherwise. The
125 user fills in the size of the indices array in nmsrs, and in return 125 user fills in the size of the indices array in nmsrs, and in return
126 kvm adjusts nmsrs to reflect the actual number of msrs and fills in 126 kvm adjusts nmsrs to reflect the actual number of msrs and fills in
127 the indices array with their numbers. 127 the indices array with their numbers.
128 128
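One possible calling pattern, sketched with an arbitrary initial guess of 256
entries (not a documented limit) and minimal error handling:

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static struct kvm_msr_list *get_msr_index_list(int kvm_fd)
{
	int n = 256;	/* arbitrary first guess */
	struct kvm_msr_list *list = malloc(sizeof(*list) + n * sizeof(__u32));

	list->nmsrs = n;
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
		/* E2BIG: nmsrs now holds the required count, so retry */
		list = realloc(list, sizeof(*list) +
			       list->nmsrs * sizeof(__u32));
		ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list);
	}
	return list;	/* indices[0..nmsrs-1] are now valid */
}
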
129 4.4 KVM_CHECK_EXTENSION 129 4.4 KVM_CHECK_EXTENSION
130 130
131 Capability: basic 131 Capability: basic
132 Architectures: all 132 Architectures: all
133 Type: system ioctl 133 Type: system ioctl
134 Parameters: extension identifier (KVM_CAP_*) 134 Parameters: extension identifier (KVM_CAP_*)
135 Returns: 0 if unsupported; 1 (or some other positive integer) if supported 135 Returns: 0 if unsupported; 1 (or some other positive integer) if supported
136 136
137 The API allows the application to query about extensions to the core 137 The API allows the application to query about extensions to the core
138 kvm API. Userspace passes an extension identifier (an integer) and 138 kvm API. Userspace passes an extension identifier (an integer) and
139 receives an integer that describes the extension availability. 139 receives an integer that describes the extension availability.
140 Generally 0 means no and 1 means yes, but some extensions may report 140 Generally 0 means no and 1 means yes, but some extensions may report
141 additional information in the integer return value. 141 additional information in the integer return value.
142 142
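For example, a small helper (a sketch; the function name is not part of any
API) to gate optional features:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* 0 if the extension is absent, a positive value if present. */
static int kvm_has_extension(int kvm_fd, unsigned int cap)
{
	int r = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

	return r > 0 ? r : 0;
}

Typical use: if (kvm_has_extension(kvm_fd, KVM_CAP_USER_MEMORY)) ...
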
143 4.5 KVM_GET_VCPU_MMAP_SIZE 143 4.5 KVM_GET_VCPU_MMAP_SIZE
144 144
145 Capability: basic 145 Capability: basic
146 Architectures: all 146 Architectures: all
147 Type: system ioctl 147 Type: system ioctl
148 Parameters: none 148 Parameters: none
149 Returns: size of vcpu mmap area, in bytes 149 Returns: size of vcpu mmap area, in bytes
150 150
151 The KVM_RUN ioctl (cf.) communicates with userspace via a shared 151 The KVM_RUN ioctl (cf.) communicates with userspace via a shared
152 memory region. This ioctl returns the size of that region. See the 152 memory region. This ioctl returns the size of that region. See the
153 KVM_RUN documentation for details. 153 KVM_RUN documentation for details.
154 154
155 4.6 KVM_SET_MEMORY_REGION 155 4.6 KVM_SET_MEMORY_REGION
156 156
157 Capability: basic 157 Capability: basic
158 Architectures: all 158 Architectures: all
159 Type: vm ioctl 159 Type: vm ioctl
160 Parameters: struct kvm_memory_region (in) 160 Parameters: struct kvm_memory_region (in)
161 Returns: 0 on success, -1 on error 161 Returns: 0 on success, -1 on error
162 162
163 struct kvm_memory_region { 163 struct kvm_memory_region {
164 __u32 slot; 164 __u32 slot;
165 __u32 flags; 165 __u32 flags;
166 __u64 guest_phys_addr; 166 __u64 guest_phys_addr;
167 __u64 memory_size; /* bytes */ 167 __u64 memory_size; /* bytes */
168 }; 168 };
169 169
170 /* for kvm_memory_region::flags */ 170 /* for kvm_memory_region::flags */
171 #define KVM_MEM_LOG_DIRTY_PAGES 1UL 171 #define KVM_MEM_LOG_DIRTY_PAGES 1UL
172 172
173 This ioctl allows the user to create or modify a guest physical memory 173 This ioctl allows the user to create or modify a guest physical memory
174 slot. When changing an existing slot, it may be moved in the guest 174 slot. When changing an existing slot, it may be moved in the guest
175 physical memory space, or its flags may be modified. It may not be 175 physical memory space, or its flags may be modified. It may not be
176 resized. Slots may not overlap. 176 resized. Slots may not overlap.
177 177
178 The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which 178 The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
179 instructs kvm to keep track of writes to memory within the slot. See 179 instructs kvm to keep track of writes to memory within the slot. See
180 the KVM_GET_DIRTY_LOG ioctl. 180 the KVM_GET_DIRTY_LOG ioctl.
181 181
182 It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead 182 It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead
183 of this API, if available. This newer API allows placing guest memory 183 of this API, if available. This newer API allows placing guest memory
184 at specified locations in the host address space, yielding better 184 at specified locations in the host address space, yielding better
185 control and easy access. 185 control and easy access.
186 186
187 4.6 KVM_CREATE_VCPU 187 4.6 KVM_CREATE_VCPU
188 188
189 Capability: basic 189 Capability: basic
190 Architectures: all 190 Architectures: all
191 Type: vm ioctl 191 Type: vm ioctl
192 Parameters: vcpu id (apic id on x86) 192 Parameters: vcpu id (apic id on x86)
193 Returns: vcpu fd on success, -1 on error 193 Returns: vcpu fd on success, -1 on error
194 194
195 This API adds a vcpu to a virtual machine. The vcpu id is a small integer 195 This API adds a vcpu to a virtual machine. The vcpu id is a small integer
196 in the range [0, max_vcpus). 196 in the range [0, max_vcpus).
197 197
198 4.7 KVM_GET_DIRTY_LOG (vm ioctl) 198 4.7 KVM_GET_DIRTY_LOG (vm ioctl)
199 199
200 Capability: basic 200 Capability: basic
201 Architectures: x86 201 Architectures: x86
202 Type: vm ioctl 202 Type: vm ioctl
203 Parameters: struct kvm_dirty_log (in/out) 203 Parameters: struct kvm_dirty_log (in/out)
204 Returns: 0 on success, -1 on error 204 Returns: 0 on success, -1 on error
205 205
206 /* for KVM_GET_DIRTY_LOG */ 206 /* for KVM_GET_DIRTY_LOG */
207 struct kvm_dirty_log { 207 struct kvm_dirty_log {
208 __u32 slot; 208 __u32 slot;
209 __u32 padding; 209 __u32 padding;
210 union { 210 union {
211 void __user *dirty_bitmap; /* one bit per page */ 211 void __user *dirty_bitmap; /* one bit per page */
212 __u64 padding; 212 __u64 padding;
213 }; 213 };
214 }; 214 };
215 215
216 Given a memory slot, return a bitmap containing any pages dirtied 216 Given a memory slot, return a bitmap containing any pages dirtied
217 since the last call to this ioctl. Bit 0 is the first page in the 217 since the last call to this ioctl. Bit 0 is the first page in the
218 memory slot. Ensure the entire structure is cleared to avoid padding 218 memory slot. Ensure the entire structure is cleared to avoid padding
219 issues. 219 issues.
220 220
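A sketch of fetching the bitmap for one slot, assuming x86 4K pages and a
64-bit-aligned bitmap (names and sizing are illustrative):

#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void *get_dirty_log(int vm_fd, __u32 slot, __u64 memory_size)
{
	__u64 npages = memory_size / 4096;
	size_t bytes = ((npages + 63) / 64) * 8;	/* one bit per page */
	void *bitmap = calloc(1, bytes);
	struct kvm_dirty_log log;

	memset(&log, 0, sizeof(log));		/* clear the padding, too */
	log.slot = slot;
	log.dirty_bitmap = bitmap;
	if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
		free(bitmap);
		return NULL;
	}
	return bitmap;
}
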
221 4.8 KVM_SET_MEMORY_ALIAS 221 4.8 KVM_SET_MEMORY_ALIAS
222 222
223 Capability: basic 223 Capability: basic
224 Architectures: x86 224 Architectures: x86
225 Type: vm ioctl 225 Type: vm ioctl
226 Parameters: struct kvm_memory_alias (in) 226 Parameters: struct kvm_memory_alias (in)
227 Returns: 0 (success), -1 (error) 227 Returns: 0 (success), -1 (error)
228 228
229 struct kvm_memory_alias { 229 This ioctl is obsolete and has been removed.
230 __u32 slot; /* this has a different namespace than memory slots */
231 __u32 flags;
232 __u64 guest_phys_addr;
233 __u64 memory_size;
234 __u64 target_phys_addr;
235 };
236
237 Defines a guest physical address space region as an alias to another
238 region. Useful for aliased address, for example the VGA low memory
239 window. Should not be used with userspace memory.
240 230
241 4.9 KVM_RUN 231 4.9 KVM_RUN
242 232
243 Capability: basic 233 Capability: basic
244 Architectures: all 234 Architectures: all
245 Type: vcpu ioctl 235 Type: vcpu ioctl
246 Parameters: none 236 Parameters: none
247 Returns: 0 on success, -1 on error 237 Returns: 0 on success, -1 on error
248 Errors: 238 Errors:
249 EINTR: an unmasked signal is pending 239 EINTR: an unmasked signal is pending
250 240
251 This ioctl is used to run a guest virtual cpu. While there are no 241 This ioctl is used to run a guest virtual cpu. While there are no
252 explicit parameters, there is an implicit parameter block that can be 242 explicit parameters, there is an implicit parameter block that can be
253 obtained by mmap()ing the vcpu fd at offset 0, with the size given by 243 obtained by mmap()ing the vcpu fd at offset 0, with the size given by
254 KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct 244 KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
255 kvm_run' (see below). 245 kvm_run' (see below).
256 246
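To make the flow concrete, a bare-bones run loop (a sketch; real code handles
many more exit reasons, signals, and errors):

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

static void run_vcpu(int kvm_fd, int vcpu_fd)
{
	int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
	struct kvm_run *run = mmap(NULL, size, PROT_READ | PROT_WRITE,
				   MAP_SHARED, vcpu_fd, 0);

	for (;;) {
		if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
			return;			/* e.g. EINTR */
		switch (run->exit_reason) {
		case KVM_EXIT_IO:
			/* emulate the port access described in run->io */
			break;
		case KVM_EXIT_HLT:
			return;
		default:
			return;
		}
	}
}
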
257 4.10 KVM_GET_REGS 247 4.10 KVM_GET_REGS
258 248
259 Capability: basic 249 Capability: basic
260 Architectures: all 250 Architectures: all
261 Type: vcpu ioctl 251 Type: vcpu ioctl
262 Parameters: struct kvm_regs (out) 252 Parameters: struct kvm_regs (out)
263 Returns: 0 on success, -1 on error 253 Returns: 0 on success, -1 on error
264 254
265 Reads the general purpose registers from the vcpu. 255 Reads the general purpose registers from the vcpu.
266 256
267 /* x86 */ 257 /* x86 */
268 struct kvm_regs { 258 struct kvm_regs {
269 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ 259 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
270 __u64 rax, rbx, rcx, rdx; 260 __u64 rax, rbx, rcx, rdx;
271 __u64 rsi, rdi, rsp, rbp; 261 __u64 rsi, rdi, rsp, rbp;
272 __u64 r8, r9, r10, r11; 262 __u64 r8, r9, r10, r11;
273 __u64 r12, r13, r14, r15; 263 __u64 r12, r13, r14, r15;
274 __u64 rip, rflags; 264 __u64 rip, rflags;
275 }; 265 };
276 266
277 4.11 KVM_SET_REGS 267 4.11 KVM_SET_REGS
278 268
279 Capability: basic 269 Capability: basic
280 Architectures: all 270 Architectures: all
281 Type: vcpu ioctl 271 Type: vcpu ioctl
282 Parameters: struct kvm_regs (in) 272 Parameters: struct kvm_regs (in)
283 Returns: 0 on success, -1 on error 273 Returns: 0 on success, -1 on error
284 274
285 Writes the general purpose registers into the vcpu. 275 Writes the general purpose registers into the vcpu.
286 276
287 See KVM_GET_REGS for the data structure. 277 See KVM_GET_REGS for the data structure.
288 278
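A sketch of the usual read-modify-write pattern (the entry-point helper is
invented for illustration):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_entry_point(int vcpu_fd, __u64 rip)
{
	struct kvm_regs regs;

	if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)
		return -1;
	regs.rip = rip;
	regs.rflags = 0x2;	/* bit 1 of rflags is always set on x86 */
	return ioctl(vcpu_fd, KVM_SET_REGS, &regs);
}
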
289 4.12 KVM_GET_SREGS 279 4.12 KVM_GET_SREGS
290 280
291 Capability: basic 281 Capability: basic
292 Architectures: x86 282 Architectures: x86
293 Type: vcpu ioctl 283 Type: vcpu ioctl
294 Parameters: struct kvm_sregs (out) 284 Parameters: struct kvm_sregs (out)
295 Returns: 0 on success, -1 on error 285 Returns: 0 on success, -1 on error
296 286
297 Reads special registers from the vcpu. 287 Reads special registers from the vcpu.
298 288
299 /* x86 */ 289 /* x86 */
300 struct kvm_sregs { 290 struct kvm_sregs {
301 struct kvm_segment cs, ds, es, fs, gs, ss; 291 struct kvm_segment cs, ds, es, fs, gs, ss;
302 struct kvm_segment tr, ldt; 292 struct kvm_segment tr, ldt;
303 struct kvm_dtable gdt, idt; 293 struct kvm_dtable gdt, idt;
304 __u64 cr0, cr2, cr3, cr4, cr8; 294 __u64 cr0, cr2, cr3, cr4, cr8;
305 __u64 efer; 295 __u64 efer;
306 __u64 apic_base; 296 __u64 apic_base;
307 __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; 297 __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
308 }; 298 };
309 299
310 interrupt_bitmap is a bitmap of pending external interrupts. At most 300 interrupt_bitmap is a bitmap of pending external interrupts. At most
311 one bit may be set. This interrupt has been acknowledged by the APIC 301 one bit may be set. This interrupt has been acknowledged by the APIC
312 but not yet injected into the cpu core. 302 but not yet injected into the cpu core.
313 303
314 4.13 KVM_SET_SREGS 304 4.13 KVM_SET_SREGS
315 305
316 Capability: basic 306 Capability: basic
317 Architectures: x86 307 Architectures: x86
318 Type: vcpu ioctl 308 Type: vcpu ioctl
319 Parameters: struct kvm_sregs (in) 309 Parameters: struct kvm_sregs (in)
320 Returns: 0 on success, -1 on error 310 Returns: 0 on success, -1 on error
321 311
322 Writes special registers into the vcpu. See KVM_GET_SREGS for the 312 Writes special registers into the vcpu. See KVM_GET_SREGS for the
323 data structures. 313 data structures.
324 314
325 4.14 KVM_TRANSLATE 315 4.14 KVM_TRANSLATE
326 316
327 Capability: basic 317 Capability: basic
328 Architectures: x86 318 Architectures: x86
329 Type: vcpu ioctl 319 Type: vcpu ioctl
330 Parameters: struct kvm_translation (in/out) 320 Parameters: struct kvm_translation (in/out)
331 Returns: 0 on success, -1 on error 321 Returns: 0 on success, -1 on error
332 322
333 Translates a virtual address according to the vcpu's current address 323 Translates a virtual address according to the vcpu's current address
334 translation mode. 324 translation mode.
335 325
336 struct kvm_translation { 326 struct kvm_translation {
337 /* in */ 327 /* in */
338 __u64 linear_address; 328 __u64 linear_address;
339 329
340 /* out */ 330 /* out */
341 __u64 physical_address; 331 __u64 physical_address;
342 __u8 valid; 332 __u8 valid;
343 __u8 writeable; 333 __u8 writeable;
344 __u8 usermode; 334 __u8 usermode;
345 __u8 pad[5]; 335 __u8 pad[5];
346 }; 336 };
347 337
348 4.15 KVM_INTERRUPT 338 4.15 KVM_INTERRUPT
349 339
350 Capability: basic 340 Capability: basic
351 Architectures: x86 341 Architectures: x86
352 Type: vcpu ioctl 342 Type: vcpu ioctl
353 Parameters: struct kvm_interrupt (in) 343 Parameters: struct kvm_interrupt (in)
354 Returns: 0 on success, -1 on error 344 Returns: 0 on success, -1 on error
355 345
356 Queues a hardware interrupt vector to be injected. This is only 346 Queues a hardware interrupt vector to be injected. This is only
357 useful if in-kernel local APIC is not used. 347 useful if in-kernel local APIC is not used.
358 348
359 /* for KVM_INTERRUPT */ 349 /* for KVM_INTERRUPT */
360 struct kvm_interrupt { 350 struct kvm_interrupt {
361 /* in */ 351 /* in */
362 __u32 irq; 352 __u32 irq;
363 }; 353 };
364 354
365 Note 'irq' is an interrupt vector, not an interrupt pin or line. 355 Note 'irq' is an interrupt vector, not an interrupt pin or line.
366 356
367 4.16 KVM_DEBUG_GUEST 357 4.16 KVM_DEBUG_GUEST
368 358
369 Capability: basic 359 Capability: basic
370 Architectures: none 360 Architectures: none
371 Type: vcpu ioctl 361 Type: vcpu ioctl
372 Parameters: none 362 Parameters: none
373 Returns: -1 on error 363 Returns: -1 on error
374 364
375 Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. 365 Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.
376 366
377 4.17 KVM_GET_MSRS 367 4.17 KVM_GET_MSRS
378 368
379 Capability: basic 369 Capability: basic
380 Architectures: x86 370 Architectures: x86
381 Type: vcpu ioctl 371 Type: vcpu ioctl
382 Parameters: struct kvm_msrs (in/out) 372 Parameters: struct kvm_msrs (in/out)
383 Returns: 0 on success, -1 on error 373 Returns: 0 on success, -1 on error
384 374
385 Reads model-specific registers from the vcpu. Supported msr indices can 375 Reads model-specific registers from the vcpu. Supported msr indices can
386 be obtained using KVM_GET_MSR_INDEX_LIST. 376 be obtained using KVM_GET_MSR_INDEX_LIST.
387 377
388 struct kvm_msrs { 378 struct kvm_msrs {
389 __u32 nmsrs; /* number of msrs in entries */ 379 __u32 nmsrs; /* number of msrs in entries */
390 __u32 pad; 380 __u32 pad;
391 381
392 struct kvm_msr_entry entries[0]; 382 struct kvm_msr_entry entries[0];
393 }; 383 };
394 384
395 struct kvm_msr_entry { 385 struct kvm_msr_entry {
396 __u32 index; 386 __u32 index;
397 __u32 reserved; 387 __u32 reserved;
398 __u64 data; 388 __u64 data;
399 }; 389 };
400 390
401 Application code should set the 'nmsrs' member (which indicates the 391 Application code should set the 'nmsrs' member (which indicates the
402 size of the entries array) and the 'index' member of each array entry. 392 size of the entries array) and the 'index' member of each array entry.
403 kvm will fill in the 'data' member. 393 kvm will fill in the 'data' member.
404 394
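For example, reading a single MSR (a sketch; KVM_GET_MSRS returns the number
of msrs actually read):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int read_msr(int vcpu_fd, __u32 index, __u64 *value)
{
	struct kvm_msrs *msrs =
		calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
	int r;

	msrs->nmsrs = 1;
	msrs->entries[0].index = index;
	r = ioctl(vcpu_fd, KVM_GET_MSRS, msrs);
	if (r == 1)
		*value = msrs->entries[0].data;
	free(msrs);
	return r == 1 ? 0 : -1;
}
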
405 4.18 KVM_SET_MSRS 395 4.18 KVM_SET_MSRS
406 396
407 Capability: basic 397 Capability: basic
408 Architectures: x86 398 Architectures: x86
409 Type: vcpu ioctl 399 Type: vcpu ioctl
410 Parameters: struct kvm_msrs (in) 400 Parameters: struct kvm_msrs (in)
411 Returns: 0 on success, -1 on error 401 Returns: 0 on success, -1 on error
412 402
413 Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the 403 Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
414 data structures. 404 data structures.
415 405
416 Application code should set the 'nmsrs' member (which indicates the 406 Application code should set the 'nmsrs' member (which indicates the
417 size of the entries array), and the 'index' and 'data' members of each 407 size of the entries array), and the 'index' and 'data' members of each
418 array entry. 408 array entry.
419 409
420 4.19 KVM_SET_CPUID 410 4.19 KVM_SET_CPUID
421 411
422 Capability: basic 412 Capability: basic
423 Architectures: x86 413 Architectures: x86
424 Type: vcpu ioctl 414 Type: vcpu ioctl
425 Parameters: struct kvm_cpuid (in) 415 Parameters: struct kvm_cpuid (in)
426 Returns: 0 on success, -1 on error 416 Returns: 0 on success, -1 on error
427 417
428 Defines the vcpu responses to the cpuid instruction. Applications 418 Defines the vcpu responses to the cpuid instruction. Applications
429 should use the KVM_SET_CPUID2 ioctl if available. 419 should use the KVM_SET_CPUID2 ioctl if available.
430 420
431 421
432 struct kvm_cpuid_entry { 422 struct kvm_cpuid_entry {
433 __u32 function; 423 __u32 function;
434 __u32 eax; 424 __u32 eax;
435 __u32 ebx; 425 __u32 ebx;
436 __u32 ecx; 426 __u32 ecx;
437 __u32 edx; 427 __u32 edx;
438 __u32 padding; 428 __u32 padding;
439 }; 429 };
440 430
441 /* for KVM_SET_CPUID */ 431 /* for KVM_SET_CPUID */
442 struct kvm_cpuid { 432 struct kvm_cpuid {
443 __u32 nent; 433 __u32 nent;
444 __u32 padding; 434 __u32 padding;
445 struct kvm_cpuid_entry entries[0]; 435 struct kvm_cpuid_entry entries[0];
446 }; 436 };
447 437
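A sketch of building a tiny table by hand; the leaf values are placeholders
for illustration only, and real applications normally mirror host cpuid (and
prefer KVM_SET_CPUID2, as noted above):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_minimal_cpuid(int vcpu_fd)
{
	struct kvm_cpuid *cpuid =
		calloc(1, sizeof(*cpuid) + 2 * sizeof(struct kvm_cpuid_entry));
	int r;

	cpuid->nent = 2;
	cpuid->entries[0].function = 0;		/* max leaf / vendor id */
	cpuid->entries[0].eax = 1;
	cpuid->entries[1].function = 1;		/* family/model/features */
	cpuid->entries[1].eax = 0x600;		/* placeholder value */
	r = ioctl(vcpu_fd, KVM_SET_CPUID, cpuid);
	free(cpuid);
	return r;
}
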
448 4.20 KVM_SET_SIGNAL_MASK 438 4.20 KVM_SET_SIGNAL_MASK
449 439
450 Capability: basic 440 Capability: basic
451 Architectures: x86 441 Architectures: x86
452 Type: vcpu ioctl 442 Type: vcpu ioctl
453 Parameters: struct kvm_signal_mask (in) 443 Parameters: struct kvm_signal_mask (in)
454 Returns: 0 on success, -1 on error 444 Returns: 0 on success, -1 on error
455 445
456 Defines which signals are blocked during execution of KVM_RUN. This 446 Defines which signals are blocked during execution of KVM_RUN. This
457 signal mask temporarily overrides the thread's signal mask. Any 447 signal mask temporarily overrides the thread's signal mask. Any
458 unblocked signal received (except SIGKILL and SIGSTOP, which retain 448 unblocked signal received (except SIGKILL and SIGSTOP, which retain
459 their traditional behaviour) will cause KVM_RUN to return with -EINTR. 449 their traditional behaviour) will cause KVM_RUN to return with -EINTR.
460 450
461 Note the signal will only be delivered if not blocked by the original 451 Note the signal will only be delivered if not blocked by the original
462 signal mask. 452 signal mask.
463 453
464 /* for KVM_SET_SIGNAL_MASK */ 454 /* for KVM_SET_SIGNAL_MASK */
465 struct kvm_signal_mask { 455 struct kvm_signal_mask {
466 __u32 len; 456 __u32 len;
467 __u8 sigset[0]; 457 __u8 sigset[0];
468 }; 458 };
469 459
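A sketch of blocking everything except one signal during KVM_RUN. It assumes
an x86-64 host, where the kernel sigset is 8 bytes, and SIGUSR1 is chosen
arbitrarily as the signal used to kick the vcpu out of guest mode:

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_run_sigmask(int vcpu_fd)
{
	struct kvm_signal_mask *mask = calloc(1, sizeof(*mask) + 8);
	sigset_t set;
	int r;

	sigfillset(&set);
	sigdelset(&set, SIGUSR1);	/* SIGUSR1 interrupts KVM_RUN */
	mask->len = 8;			/* kernel sigset size, 64 bits */
	memcpy(mask->sigset, &set, 8);
	r = ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, mask);
	free(mask);
	return r;
}
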
470 4.21 KVM_GET_FPU 460 4.21 KVM_GET_FPU
471 461
472 Capability: basic 462 Capability: basic
473 Architectures: x86 463 Architectures: x86
474 Type: vcpu ioctl 464 Type: vcpu ioctl
475 Parameters: struct kvm_fpu (out) 465 Parameters: struct kvm_fpu (out)
476 Returns: 0 on success, -1 on error 466 Returns: 0 on success, -1 on error
477 467
478 Reads the floating point state from the vcpu. 468 Reads the floating point state from the vcpu.
479 469
480 /* for KVM_GET_FPU and KVM_SET_FPU */ 470 /* for KVM_GET_FPU and KVM_SET_FPU */
481 struct kvm_fpu { 471 struct kvm_fpu {
482 __u8 fpr[8][16]; 472 __u8 fpr[8][16];
483 __u16 fcw; 473 __u16 fcw;
484 __u16 fsw; 474 __u16 fsw;
485 __u8 ftwx; /* in fxsave format */ 475 __u8 ftwx; /* in fxsave format */
486 __u8 pad1; 476 __u8 pad1;
487 __u16 last_opcode; 477 __u16 last_opcode;
488 __u64 last_ip; 478 __u64 last_ip;
489 __u64 last_dp; 479 __u64 last_dp;
490 __u8 xmm[16][16]; 480 __u8 xmm[16][16];
491 __u32 mxcsr; 481 __u32 mxcsr;
492 __u32 pad2; 482 __u32 pad2;
493 }; 483 };
494 484
495 4.22 KVM_SET_FPU 485 4.22 KVM_SET_FPU
496 486
497 Capability: basic 487 Capability: basic
498 Architectures: x86 488 Architectures: x86
499 Type: vcpu ioctl 489 Type: vcpu ioctl
500 Parameters: struct kvm_fpu (in) 490 Parameters: struct kvm_fpu (in)
501 Returns: 0 on success, -1 on error 491 Returns: 0 on success, -1 on error
502 492
503 Writes the floating point state to the vcpu. 493 Writes the floating point state to the vcpu.
504 494
505 /* for KVM_GET_FPU and KVM_SET_FPU */ 495 /* for KVM_GET_FPU and KVM_SET_FPU */
506 struct kvm_fpu { 496 struct kvm_fpu {
507 __u8 fpr[8][16]; 497 __u8 fpr[8][16];
508 __u16 fcw; 498 __u16 fcw;
509 __u16 fsw; 499 __u16 fsw;
510 __u8 ftwx; /* in fxsave format */ 500 __u8 ftwx; /* in fxsave format */
511 __u8 pad1; 501 __u8 pad1;
512 __u16 last_opcode; 502 __u16 last_opcode;
513 __u64 last_ip; 503 __u64 last_ip;
514 __u64 last_dp; 504 __u64 last_dp;
515 __u8 xmm[16][16]; 505 __u8 xmm[16][16];
516 __u32 mxcsr; 506 __u32 mxcsr;
517 __u32 pad2; 507 __u32 pad2;
518 }; 508 };
519 509
520 4.23 KVM_CREATE_IRQCHIP 510 4.23 KVM_CREATE_IRQCHIP
521 511
522 Capability: KVM_CAP_IRQCHIP 512 Capability: KVM_CAP_IRQCHIP
523 Architectures: x86, ia64 513 Architectures: x86, ia64
524 Type: vm ioctl 514 Type: vm ioctl
525 Parameters: none 515 Parameters: none
526 Returns: 0 on success, -1 on error 516 Returns: 0 on success, -1 on error
527 517
528 Creates an interrupt controller model in the kernel. On x86, creates a virtual 518 Creates an interrupt controller model in the kernel. On x86, creates a virtual
529 ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a 519 ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
530 local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSIs 16-23 520 local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSIs 16-23
531 only go to the IOAPIC. On ia64, an IOSAPIC is created. 521 only go to the IOAPIC. On ia64, an IOSAPIC is created.
532 522
533 4.24 KVM_IRQ_LINE 523 4.24 KVM_IRQ_LINE
534 524
535 Capability: KVM_CAP_IRQCHIP 525 Capability: KVM_CAP_IRQCHIP
536 Architectures: x86, ia64 526 Architectures: x86, ia64
537 Type: vm ioctl 527 Type: vm ioctl
538 Parameters: struct kvm_irq_level 528 Parameters: struct kvm_irq_level
539 Returns: 0 on success, -1 on error 529 Returns: 0 on success, -1 on error
540 530
541 Sets the level of a GSI input to the interrupt controller model in the kernel. 531 Sets the level of a GSI input to the interrupt controller model in the kernel.
542 Requires that an interrupt controller model has been previously created with 532 Requires that an interrupt controller model has been previously created with
543 KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level 533 KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level
544 to be set to 1 and then back to 0. 534 to be set to 1 and then back to 0.
545 535
546 struct kvm_irq_level { 536 struct kvm_irq_level {
547 union { 537 union {
548 __u32 irq; /* GSI */ 538 __u32 irq; /* GSI */
549 __s32 status; /* not used for KVM_IRQ_LEVEL */ 539 __s32 status; /* not used for KVM_IRQ_LEVEL */
550 }; 540 };
551 __u32 level; /* 0 or 1 */ 541 __u32 level; /* 0 or 1 */
552 }; 542 };
553 543
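For example, delivering one edge-triggered interrupt by raising and lowering
the line (a sketch; the GSI number is supplied by the caller):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int pulse_irq(int vm_fd, __u32 gsi)
{
	struct kvm_irq_level irq;

	memset(&irq, 0, sizeof(irq));
	irq.irq = gsi;
	irq.level = 1;
	if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)
		return -1;
	irq.level = 0;
	return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}
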
554 4.25 KVM_GET_IRQCHIP 544 4.25 KVM_GET_IRQCHIP
555 545
556 Capability: KVM_CAP_IRQCHIP 546 Capability: KVM_CAP_IRQCHIP
557 Architectures: x86, ia64 547 Architectures: x86, ia64
558 Type: vm ioctl 548 Type: vm ioctl
559 Parameters: struct kvm_irqchip (in/out) 549 Parameters: struct kvm_irqchip (in/out)
560 Returns: 0 on success, -1 on error 550 Returns: 0 on success, -1 on error
561 551
562 Reads the state of a kernel interrupt controller created with 552 Reads the state of a kernel interrupt controller created with
563 KVM_CREATE_IRQCHIP into a buffer provided by the caller. 553 KVM_CREATE_IRQCHIP into a buffer provided by the caller.
564 554
565 struct kvm_irqchip { 555 struct kvm_irqchip {
566 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ 556 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
567 __u32 pad; 557 __u32 pad;
568 union { 558 union {
569 char dummy[512]; /* reserving space */ 559 char dummy[512]; /* reserving space */
570 struct kvm_pic_state pic; 560 struct kvm_pic_state pic;
571 struct kvm_ioapic_state ioapic; 561 struct kvm_ioapic_state ioapic;
572 } chip; 562 } chip;
573 }; 563 };
574 564
575 4.26 KVM_SET_IRQCHIP 565 4.26 KVM_SET_IRQCHIP
576 566
577 Capability: KVM_CAP_IRQCHIP 567 Capability: KVM_CAP_IRQCHIP
578 Architectures: x86, ia64 568 Architectures: x86, ia64
579 Type: vm ioctl 569 Type: vm ioctl
580 Parameters: struct kvm_irqchip (in) 570 Parameters: struct kvm_irqchip (in)
581 Returns: 0 on success, -1 on error 571 Returns: 0 on success, -1 on error
582 572
583 Sets the state of a kernel interrupt controller created with 573 Sets the state of a kernel interrupt controller created with
584 KVM_CREATE_IRQCHIP from a buffer provided by the caller. 574 KVM_CREATE_IRQCHIP from a buffer provided by the caller.
585 575
586 struct kvm_irqchip { 576 struct kvm_irqchip {
587 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ 577 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
588 __u32 pad; 578 __u32 pad;
589 union { 579 union {
590 char dummy[512]; /* reserving space */ 580 char dummy[512]; /* reserving space */
591 struct kvm_pic_state pic; 581 struct kvm_pic_state pic;
592 struct kvm_ioapic_state ioapic; 582 struct kvm_ioapic_state ioapic;
593 } chip; 583 } chip;
594 }; 584 };
595 585
596 4.27 KVM_XEN_HVM_CONFIG 586 4.27 KVM_XEN_HVM_CONFIG
597 587
598 Capability: KVM_CAP_XEN_HVM 588 Capability: KVM_CAP_XEN_HVM
599 Architectures: x86 589 Architectures: x86
600 Type: vm ioctl 590 Type: vm ioctl
601 Parameters: struct kvm_xen_hvm_config (in) 591 Parameters: struct kvm_xen_hvm_config (in)
602 Returns: 0 on success, -1 on error 592 Returns: 0 on success, -1 on error
603 593
604 Sets the MSR that the Xen HVM guest uses to initialize its hypercall 594 Sets the MSR that the Xen HVM guest uses to initialize its hypercall
605 page, and provides the starting address and size of the hypercall 595 page, and provides the starting address and size of the hypercall
606 blobs in userspace. When the guest writes the MSR, kvm copies one 596 blobs in userspace. When the guest writes the MSR, kvm copies one
607 page of a blob (32- or 64-bit, depending on the vcpu mode) to guest 597 page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
608 memory. 598 memory.
609 599
610 struct kvm_xen_hvm_config { 600 struct kvm_xen_hvm_config {
611 __u32 flags; 601 __u32 flags;
612 __u32 msr; 602 __u32 msr;
613 __u64 blob_addr_32; 603 __u64 blob_addr_32;
614 __u64 blob_addr_64; 604 __u64 blob_addr_64;
615 __u8 blob_size_32; 605 __u8 blob_size_32;
616 __u8 blob_size_64; 606 __u8 blob_size_64;
617 __u8 pad2[30]; 607 __u8 pad2[30];
618 }; 608 };
619 609
620 4.27 KVM_GET_CLOCK 610 4.27 KVM_GET_CLOCK
621 611
622 Capability: KVM_CAP_ADJUST_CLOCK 612 Capability: KVM_CAP_ADJUST_CLOCK
623 Architectures: x86 613 Architectures: x86
624 Type: vm ioctl 614 Type: vm ioctl
625 Parameters: struct kvm_clock_data (out) 615 Parameters: struct kvm_clock_data (out)
626 Returns: 0 on success, -1 on error 616 Returns: 0 on success, -1 on error
627 617
628 Gets the current timestamp of kvmclock as seen by the current guest. In 618 Gets the current timestamp of kvmclock as seen by the current guest. In
629 conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios 619 conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
630 such as migration. 620 such as migration.
631 621
632 struct kvm_clock_data { 622 struct kvm_clock_data {
633 __u64 clock; /* kvmclock current value */ 623 __u64 clock; /* kvmclock current value */
634 __u32 flags; 624 __u32 flags;
635 __u32 pad[9]; 625 __u32 pad[9];
636 }; 626 };
637 627
638 4.28 KVM_SET_CLOCK 628 4.28 KVM_SET_CLOCK
639 629
640 Capability: KVM_CAP_ADJUST_CLOCK 630 Capability: KVM_CAP_ADJUST_CLOCK
641 Architectures: x86 631 Architectures: x86
642 Type: vm ioctl 632 Type: vm ioctl
643 Parameters: struct kvm_clock_data (in) 633 Parameters: struct kvm_clock_data (in)
644 Returns: 0 on success, -1 on error 634 Returns: 0 on success, -1 on error
645 635
646 Sets the current timestamp of kvmclock to the value specified in its parameter. 636 Sets the current timestamp of kvmclock to the value specified in its parameter.
647 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios 637 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
648 such as migration. 638 such as migration.
649 639
650 struct kvm_clock_data { 640 struct kvm_clock_data {
651 __u64 clock; /* kvmclock current value */ 641 __u64 clock; /* kvmclock current value */
652 __u32 flags; 642 __u32 flags;
653 __u32 pad[9]; 643 __u32 pad[9];
654 }; 644 };
655 645
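A sketch of the migration pattern described above: capture the clock on the
source, restore it on the destination (how the value travels between hosts is
up to the application):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Source side, before stopping the guest. */
static int save_clock(int vm_fd, struct kvm_clock_data *data)
{
	return ioctl(vm_fd, KVM_GET_CLOCK, data);
}

/* Destination side, before resuming the guest. */
static int restore_clock(int vm_fd, const struct kvm_clock_data *data)
{
	struct kvm_clock_data tmp = *data;

	tmp.flags = 0;
	return ioctl(vm_fd, KVM_SET_CLOCK, &tmp);
}
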
656 4.29 KVM_GET_VCPU_EVENTS 646 4.29 KVM_GET_VCPU_EVENTS
657 647
658 Capability: KVM_CAP_VCPU_EVENTS 648 Capability: KVM_CAP_VCPU_EVENTS
659 Extended by: KVM_CAP_INTR_SHADOW 649 Extended by: KVM_CAP_INTR_SHADOW
660 Architectures: x86 650 Architectures: x86
661 Type: vcpu ioctl 651 Type: vcpu ioctl
662 Parameters: struct kvm_vcpu_events (out) 652 Parameters: struct kvm_vcpu_events (out)
663 Returns: 0 on success, -1 on error 653 Returns: 0 on success, -1 on error
664 654
665 Gets currently pending exceptions, interrupts, and NMIs as well as related 655 Gets currently pending exceptions, interrupts, and NMIs as well as related
666 states of the vcpu. 656 states of the vcpu.
667 657
668 struct kvm_vcpu_events { 658 struct kvm_vcpu_events {
669 struct { 659 struct {
670 __u8 injected; 660 __u8 injected;
671 __u8 nr; 661 __u8 nr;
672 __u8 has_error_code; 662 __u8 has_error_code;
673 __u8 pad; 663 __u8 pad;
674 __u32 error_code; 664 __u32 error_code;
675 } exception; 665 } exception;
676 struct { 666 struct {
677 __u8 injected; 667 __u8 injected;
678 __u8 nr; 668 __u8 nr;
679 __u8 soft; 669 __u8 soft;
680 __u8 shadow; 670 __u8 shadow;
681 } interrupt; 671 } interrupt;
682 struct { 672 struct {
683 __u8 injected; 673 __u8 injected;
684 __u8 pending; 674 __u8 pending;
685 __u8 masked; 675 __u8 masked;
686 __u8 pad; 676 __u8 pad;
687 } nmi; 677 } nmi;
688 __u32 sipi_vector; 678 __u32 sipi_vector;
689 __u32 flags; 679 __u32 flags;
690 }; 680 };
691 681
692 KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that 682 KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
693 interrupt.shadow contains a valid state. Otherwise, this field is undefined. 683 interrupt.shadow contains a valid state. Otherwise, this field is undefined.
694 684
695 4.30 KVM_SET_VCPU_EVENTS 685 4.30 KVM_SET_VCPU_EVENTS
696 686
697 Capability: KVM_CAP_VCPU_EVENTS 687 Capability: KVM_CAP_VCPU_EVENTS
698 Extended by: KVM_CAP_INTR_SHADOW 688 Extended by: KVM_CAP_INTR_SHADOW
699 Architectures: x86 689 Architectures: x86
700 Type: vcpu ioctl 690 Type: vcpu ioctl
701 Parameters: struct kvm_vcpu_events (in) 691 Parameters: struct kvm_vcpu_events (in)
702 Returns: 0 on success, -1 on error 692 Returns: 0 on success, -1 on error
703 693
704 Set pending exceptions, interrupts, and NMIs as well as related states of the 694 Set pending exceptions, interrupts, and NMIs as well as related states of the
705 vcpu. 695 vcpu.
706 696
707 See KVM_GET_VCPU_EVENTS for the data structure. 697 See KVM_GET_VCPU_EVENTS for the data structure.
708 698
709 Fields that may be modified asynchronously by running VCPUs can be excluded 699 Fields that may be modified asynchronously by running VCPUs can be excluded
710 from the update. These fields are nmi.pending and sipi_vector. Keep the 700 from the update. These fields are nmi.pending and sipi_vector. Keep the
711 corresponding bits in the flags field cleared to suppress overwriting the 701 corresponding bits in the flags field cleared to suppress overwriting the
712 current in-kernel state. The bits are: 702 current in-kernel state. The bits are:
713 703
714 KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel 704 KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
715 KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector 705 KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
716 706
717 If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in 707 If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
718 the flags field to signal that interrupt.shadow contains a valid state and 708 the flags field to signal that interrupt.shadow contains a valid state and
719 shall be written into the VCPU. 709 shall be written into the VCPU.
720 710
721 4.32 KVM_GET_DEBUGREGS 711 4.32 KVM_GET_DEBUGREGS
722 712
723 Capability: KVM_CAP_DEBUGREGS 713 Capability: KVM_CAP_DEBUGREGS
724 Architectures: x86 714 Architectures: x86
725 Type: vcpu ioctl 715 Type: vcpu ioctl
726 Parameters: struct kvm_debugregs (out) 716 Parameters: struct kvm_debugregs (out)
727 Returns: 0 on success, -1 on error 717 Returns: 0 on success, -1 on error
728 718
729 Reads debug registers from the vcpu. 719 Reads debug registers from the vcpu.
730 720
731 struct kvm_debugregs { 721 struct kvm_debugregs {
732 __u64 db[4]; 722 __u64 db[4];
733 __u64 dr6; 723 __u64 dr6;
734 __u64 dr7; 724 __u64 dr7;
735 __u64 flags; 725 __u64 flags;
736 __u64 reserved[9]; 726 __u64 reserved[9];
737 }; 727 };
738 728
739 4.33 KVM_SET_DEBUGREGS 729 4.33 KVM_SET_DEBUGREGS
740 730
741 Capability: KVM_CAP_DEBUGREGS 731 Capability: KVM_CAP_DEBUGREGS
742 Architectures: x86 732 Architectures: x86
743 Type: vcpu ioctl 733 Type: vcpu ioctl
744 Parameters: struct kvm_debugregs (in) 734 Parameters: struct kvm_debugregs (in)
745 Returns: 0 on success, -1 on error 735 Returns: 0 on success, -1 on error
746 736
747 Writes debug registers into the vcpu. 737 Writes debug registers into the vcpu.
748 738
749 See KVM_GET_DEBUGREGS for the data structure. The flags field is not 739 See KVM_GET_DEBUGREGS for the data structure. The flags field is not
750 used yet and must be cleared on entry. 740 used yet and must be cleared on entry.
751 741
752 4.34 KVM_SET_USER_MEMORY_REGION 742 4.34 KVM_SET_USER_MEMORY_REGION
753 743
754 Capability: KVM_CAP_USER_MEMORY 744 Capability: KVM_CAP_USER_MEMORY
755 Architectures: all 745 Architectures: all
756 Type: vm ioctl 746 Type: vm ioctl
757 Parameters: struct kvm_userspace_memory_region (in) 747 Parameters: struct kvm_userspace_memory_region (in)
758 Returns: 0 on success, -1 on error 748 Returns: 0 on success, -1 on error
759 749
760 struct kvm_userspace_memory_region { 750 struct kvm_userspace_memory_region {
761 __u32 slot; 751 __u32 slot;
762 __u32 flags; 752 __u32 flags;
763 __u64 guest_phys_addr; 753 __u64 guest_phys_addr;
764 __u64 memory_size; /* bytes */ 754 __u64 memory_size; /* bytes */
765 __u64 userspace_addr; /* start of the userspace allocated memory */ 755 __u64 userspace_addr; /* start of the userspace allocated memory */
766 }; 756 };
767 757
768 /* for kvm_memory_region::flags */ 758 /* for kvm_memory_region::flags */
769 #define KVM_MEM_LOG_DIRTY_PAGES 1UL 759 #define KVM_MEM_LOG_DIRTY_PAGES 1UL
770 760
771 This ioctl allows the user to create or modify a guest physical memory 761 This ioctl allows the user to create or modify a guest physical memory
772 slot. When changing an existing slot, it may be moved in the guest 762 slot. When changing an existing slot, it may be moved in the guest
773 physical memory space, or its flags may be modified. It may not be 763 physical memory space, or its flags may be modified. It may not be
774 resized. Slots may not overlap in guest physical address space. 764 resized. Slots may not overlap in guest physical address space.
775 765
776 Memory for the region is taken starting at the address denoted by the 766 Memory for the region is taken starting at the address denoted by the
777 field userspace_addr, which must point at user addressable memory for 767 field userspace_addr, which must point at user addressable memory for
778 the entire memory slot size. Any object may back this memory, including 768 the entire memory slot size. Any object may back this memory, including
779 anonymous memory, ordinary files, and hugetlbfs. 769 anonymous memory, ordinary files, and hugetlbfs.
780 770
781 It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr 771 It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
782 be identical. This allows large pages in the guest to be backed by large 772 be identical. This allows large pages in the guest to be backed by large
783 pages in the host. 773 pages in the host.
784 774
785 The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which 775 The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
786 instructs kvm to keep track of writes to memory within the slot. See 776 instructs kvm to keep track of writes to memory within the slot. See
787 the KVM_GET_DIRTY_LOG ioctl. 777 the KVM_GET_DIRTY_LOG ioctl.
788 778
789 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of the memory 779 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of the memory
790 region are automatically reflected into the guest. For example, an mmap() 780 region are automatically reflected into the guest. For example, an mmap()
791 that affects the region will be made visible immediately. Another example 781 that affects the region will be made visible immediately. Another example
792 is madvise(MADV_DONTNEED). 782 is madvise(MADV_DONTNEED).
793 783
794 It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. 784 It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
795 The KVM_SET_MEMORY_REGION ioctl does not allow fine-grained control over memory 785 The KVM_SET_MEMORY_REGION ioctl does not allow fine-grained control over memory
796 allocation and is deprecated. 786 allocation and is deprecated.
797 787
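A sketch of backing one slot with freshly mmap()ed anonymous memory (slot
number, guest physical address and size are caller-supplied; error handling
is minimal):

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

static int add_memory_slot(int vm_fd, __u32 slot, __u64 gpa, __u64 size)
{
	struct kvm_userspace_memory_region region;
	void *host = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (host == MAP_FAILED)
		return -1;
	memset(&region, 0, sizeof(region));
	region.slot = slot;
	region.guest_phys_addr = gpa;
	region.memory_size = size;		/* bytes */
	region.userspace_addr = (unsigned long)host;
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}
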
798 4.35 KVM_SET_TSS_ADDR 788 4.35 KVM_SET_TSS_ADDR
799 789
800 Capability: KVM_CAP_SET_TSS_ADDR 790 Capability: KVM_CAP_SET_TSS_ADDR
801 Architectures: x86 791 Architectures: x86
802 Type: vm ioctl 792 Type: vm ioctl
803 Parameters: unsigned long tss_address (in) 793 Parameters: unsigned long tss_address (in)
804 Returns: 0 on success, -1 on error 794 Returns: 0 on success, -1 on error
805 795
806 This ioctl defines the physical address of a three-page region in the guest 796 This ioctl defines the physical address of a three-page region in the guest
807 physical address space. The region must be within the first 4GB of the 797 physical address space. The region must be within the first 4GB of the
808 guest physical address space and must not conflict with any memory slot 798 guest physical address space and must not conflict with any memory slot
809 or any mmio address. The guest may malfunction if it accesses this memory 799 or any mmio address. The guest may malfunction if it accesses this memory
810 region. 800 region.
811 801
812 This ioctl is required on Intel-based hosts. This is needed on Intel hardware 802 This ioctl is required on Intel-based hosts. This is needed on Intel hardware
813 because of a quirk in the virtualization implementation (see the internals 803 because of a quirk in the virtualization implementation (see the internals
814 documentation when it pops into existence). 804 documentation when it pops into existence).
815 805
816 4.36 KVM_ENABLE_CAP 806 4.36 KVM_ENABLE_CAP
817 807
818 Capability: KVM_CAP_ENABLE_CAP 808 Capability: KVM_CAP_ENABLE_CAP
819 Architectures: ppc 809 Architectures: ppc
820 Type: vcpu ioctl 810 Type: vcpu ioctl
821 Parameters: struct kvm_enable_cap (in) 811 Parameters: struct kvm_enable_cap (in)
822 Returns: 0 on success; -1 on error 812 Returns: 0 on success; -1 on error
823 813
824 Not all extensions are enabled by default. Using this ioctl the application 814 Not all extensions are enabled by default. Using this ioctl the application
825 can enable an extension, making it available to the guest. 815 can enable an extension, making it available to the guest.
826 816
827 On systems that do not support this ioctl, it always fails. On systems that 817 On systems that do not support this ioctl, it always fails. On systems that
828 do support it, it only works for extensions that are supported for enablement. 818 do support it, it only works for extensions that are supported for enablement.
829 819
830 To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should 820 To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
831 be used. 821 be used.
832 822
833 struct kvm_enable_cap { 823 struct kvm_enable_cap {
834 /* in */ 824 /* in */
835 __u32 cap; 825 __u32 cap;
836 826
837 The capability that is supposed to get enabled. 827 The capability that is supposed to get enabled.
838 828
839 __u32 flags; 829 __u32 flags;
840 830
841 A bitfield indicating future enhancements. Has to be 0 for now. 831 A bitfield indicating future enhancements. Has to be 0 for now.
842 832
843 __u64 args[4]; 833 __u64 args[4];
844 834
845 Arguments for enabling a feature. If a feature needs initial values to 835 Arguments for enabling a feature. If a feature needs initial values to
846 function properly, this is the place to put them. 836 function properly, this is the place to put them.
847 837
848 __u8 pad[64]; 838 __u8 pad[64];
849 }; 839 };
850 840
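A sketch of enabling a capability that takes no arguments (whether a given
KVM_CAP_* constant can be enabled this way must be checked as described
above):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_cap(int vcpu_fd, __u32 cap)
{
	struct kvm_enable_cap enable;

	memset(&enable, 0, sizeof(enable));	/* flags and args stay 0 */
	enable.cap = cap;
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &enable);
}
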
851 4.37 KVM_GET_MP_STATE 841 4.37 KVM_GET_MP_STATE
852 842
853 Capability: KVM_CAP_MP_STATE 843 Capability: KVM_CAP_MP_STATE
854 Architectures: x86, ia64 844 Architectures: x86, ia64
855 Type: vcpu ioctl 845 Type: vcpu ioctl
856 Parameters: struct kvm_mp_state (out) 846 Parameters: struct kvm_mp_state (out)
857 Returns: 0 on success; -1 on error 847 Returns: 0 on success; -1 on error
858 848
859 struct kvm_mp_state { 849 struct kvm_mp_state {
860 __u32 mp_state; 850 __u32 mp_state;
861 }; 851 };
862 852
863 Returns the vcpu's current "multiprocessing state" (though also valid on 853 Returns the vcpu's current "multiprocessing state" (though also valid on
864 uniprocessor guests). 854 uniprocessor guests).
865 855
866 Possible values are: 856 Possible values are:
867 857
868 - KVM_MP_STATE_RUNNABLE: the vcpu is currently running 858 - KVM_MP_STATE_RUNNABLE: the vcpu is currently running
869 - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP) 859 - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP)
870 which has not yet received an INIT signal 860 which has not yet received an INIT signal
871 - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is 861 - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is
872 now ready for a SIPI 862 now ready for a SIPI
873 - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and 863 - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and
874 is waiting for an interrupt 864 is waiting for an interrupt
875 - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector 865 - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector
876 accessible via KVM_GET_VCPU_EVENTS) 866 accessible via KVM_GET_VCPU_EVENTS)
877 867
878 This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel 868 This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
879 irqchip, the multiprocessing state must be maintained by userspace. 869 irqchip, the multiprocessing state must be maintained by userspace.
880 870
881 4.38 KVM_SET_MP_STATE 871 4.38 KVM_SET_MP_STATE
882 872
883 Capability: KVM_CAP_MP_STATE 873 Capability: KVM_CAP_MP_STATE
884 Architectures: x86, ia64 874 Architectures: x86, ia64
885 Type: vcpu ioctl 875 Type: vcpu ioctl
886 Parameters: struct kvm_mp_state (in) 876 Parameters: struct kvm_mp_state (in)
887 Returns: 0 on success; -1 on error 877 Returns: 0 on success; -1 on error
888 878
889 Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for 879 Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
890 arguments. 880 arguments.
891 881
892 This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel 882 This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
893 irqchip, the multiprocessing state must be maintained by userspace. 883 irqchip, the multiprocessing state must be maintained by userspace.
894 884
895 4.39 KVM_SET_IDENTITY_MAP_ADDR 885 4.39 KVM_SET_IDENTITY_MAP_ADDR
896 886
897 Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR 887 Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
898 Architectures: x86 888 Architectures: x86
899 Type: vm ioctl 889 Type: vm ioctl
900 Parameters: unsigned long identity (in) 890 Parameters: unsigned long identity (in)
901 Returns: 0 on success, -1 on error 891 Returns: 0 on success, -1 on error
902 892
903 This ioctl defines the physical address of a one-page region in the guest 893 This ioctl defines the physical address of a one-page region in the guest
904 physical address space. The region must be within the first 4GB of the 894 physical address space. The region must be within the first 4GB of the
905 guest physical address space and must not conflict with any memory slot 895 guest physical address space and must not conflict with any memory slot
906 or any mmio address. The guest may malfunction if it accesses this memory 896 or any mmio address. The guest may malfunction if it accesses this memory
907 region. 897 region.
908 898
909 This ioctl is required on Intel-based hosts. This is needed on Intel hardware 899 This ioctl is required on Intel-based hosts. This is needed on Intel hardware
910 because of a quirk in the virtualization implementation (see the internals 900 because of a quirk in the virtualization implementation (see the internals
911 documentation when it pops into existence). 901 documentation when it pops into existence).
912 902
913 4.40 KVM_SET_BOOT_CPU_ID 903 4.40 KVM_SET_BOOT_CPU_ID
914 904
915 Capability: KVM_CAP_SET_BOOT_CPU_ID 905 Capability: KVM_CAP_SET_BOOT_CPU_ID
916 Architectures: x86, ia64 906 Architectures: x86, ia64
917 Type: vm ioctl 907 Type: vm ioctl
918 Parameters: unsigned long vcpu_id 908 Parameters: unsigned long vcpu_id
919 Returns: 0 on success, -1 on error 909 Returns: 0 on success, -1 on error
920 910
921 Define which vcpu is the Bootstrap Processor (BSP). Values are the same 911 Define which vcpu is the Bootstrap Processor (BSP). Values are the same
922 as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default 912 as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default
923 is vcpu 0. 913 is vcpu 0.
924 914
925 4.41 KVM_GET_XSAVE 915 4.41 KVM_GET_XSAVE
926 916
927 Capability: KVM_CAP_XSAVE 917 Capability: KVM_CAP_XSAVE
928 Architectures: x86 918 Architectures: x86
929 Type: vcpu ioctl 919 Type: vcpu ioctl
930 Parameters: struct kvm_xsave (out) 920 Parameters: struct kvm_xsave (out)
931 Returns: 0 on success, -1 on error 921 Returns: 0 on success, -1 on error
932 922
933 struct kvm_xsave { 923 struct kvm_xsave {
934 __u32 region[1024]; 924 __u32 region[1024];
935 }; 925 };
936 926
937 This ioctl copies the current vcpu's xsave struct to userspace. 927 This ioctl copies the current vcpu's xsave struct to userspace.
938 928
939 4.42 KVM_SET_XSAVE 929 4.42 KVM_SET_XSAVE
940 930
941 Capability: KVM_CAP_XSAVE 931 Capability: KVM_CAP_XSAVE
942 Architectures: x86 932 Architectures: x86
943 Type: vcpu ioctl 933 Type: vcpu ioctl
944 Parameters: struct kvm_xsave (in) 934 Parameters: struct kvm_xsave (in)
945 Returns: 0 on success, -1 on error 935 Returns: 0 on success, -1 on error
946 936
947 struct kvm_xsave { 937 struct kvm_xsave {
948 __u32 region[1024]; 938 __u32 region[1024];
949 }; 939 };
950 940
951 This ioctl copies the userspace-provided xsave struct into the kernel. 941 This ioctl copies the userspace-provided xsave struct into the kernel.
952 942
953 4.43 KVM_GET_XCRS 943 4.43 KVM_GET_XCRS
954 944
955 Capability: KVM_CAP_XCRS 945 Capability: KVM_CAP_XCRS
956 Architectures: x86 946 Architectures: x86
957 Type: vcpu ioctl 947 Type: vcpu ioctl
958 Parameters: struct kvm_xcrs (out) 948 Parameters: struct kvm_xcrs (out)
959 Returns: 0 on success, -1 on error 949 Returns: 0 on success, -1 on error
960 950
961 struct kvm_xcr { 951 struct kvm_xcr {
962 __u32 xcr; 952 __u32 xcr;
963 __u32 reserved; 953 __u32 reserved;
964 __u64 value; 954 __u64 value;
965 }; 955 };
966 956
967 struct kvm_xcrs { 957 struct kvm_xcrs {
968 __u32 nr_xcrs; 958 __u32 nr_xcrs;
969 __u32 flags; 959 __u32 flags;
970 struct kvm_xcr xcrs[KVM_MAX_XCRS]; 960 struct kvm_xcr xcrs[KVM_MAX_XCRS];
971 __u64 padding[16]; 961 __u64 padding[16];
972 }; 962 };
973 963
974 This ioctl copies the current vcpu's xcrs to userspace. 964 This ioctl copies the current vcpu's xcrs to userspace.
975 965
976 4.44 KVM_SET_XCRS 966 4.44 KVM_SET_XCRS
977 967
978 Capability: KVM_CAP_XCRS 968 Capability: KVM_CAP_XCRS
979 Architectures: x86 969 Architectures: x86
980 Type: vcpu ioctl 970 Type: vcpu ioctl
981 Parameters: struct kvm_xcrs (in) 971 Parameters: struct kvm_xcrs (in)
982 Returns: 0 on success, -1 on error 972 Returns: 0 on success, -1 on error
983 973
984 struct kvm_xcr { 974 struct kvm_xcr {
985 __u32 xcr; 975 __u32 xcr;
986 __u32 reserved; 976 __u32 reserved;
987 __u64 value; 977 __u64 value;
988 }; 978 };
989 979
990 struct kvm_xcrs { 980 struct kvm_xcrs {
991 __u32 nr_xcrs; 981 __u32 nr_xcrs;
992 __u32 flags; 982 __u32 flags;
993 struct kvm_xcr xcrs[KVM_MAX_XCRS]; 983 struct kvm_xcr xcrs[KVM_MAX_XCRS];
994 __u64 padding[16]; 984 __u64 padding[16];
995 }; 985 };
996 986
997 This ioctl sets the vcpu's xcrs to the values userspace specified. 987 This ioctl sets the vcpu's xcrs to the values userspace specified.
998 988
999 5. The kvm_run structure 989 5. The kvm_run structure
1000 990
1001 Application code obtains a pointer to the kvm_run structure by 991 Application code obtains a pointer to the kvm_run structure by
1002 mmap()ing a vcpu fd. From that point, application code can control 992 mmap()ing a vcpu fd. From that point, application code can control
1003 execution by changing fields in kvm_run prior to calling the KVM_RUN 993 execution by changing fields in kvm_run prior to calling the KVM_RUN
1004 ioctl, and obtain information about the reason KVM_RUN returned by 994 ioctl, and obtain information about the reason KVM_RUN returned by
1005 looking up structure members. 995 looking up structure members.
1006 996
1007 struct kvm_run { 997 struct kvm_run {
1008 /* in */ 998 /* in */
1009 __u8 request_interrupt_window; 999 __u8 request_interrupt_window;
1010 1000
1011 Request that KVM_RUN return when it becomes possible to inject external 1001 Request that KVM_RUN return when it becomes possible to inject external
1012 interrupts into the guest. Useful in conjunction with KVM_INTERRUPT. 1002 interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
1013 1003
1014 __u8 padding1[7]; 1004 __u8 padding1[7];
1015 1005
1016 /* out */ 1006 /* out */
1017 __u32 exit_reason; 1007 __u32 exit_reason;
1018 1008
1019 When KVM_RUN has returned successfully (return value 0), this informs 1009 When KVM_RUN has returned successfully (return value 0), this informs
1020 application code why KVM_RUN has returned. Allowable values for this 1010 application code why KVM_RUN has returned. Allowable values for this
1021 field are detailed below. 1011 field are detailed below.
1022 1012
1023 __u8 ready_for_interrupt_injection; 1013 __u8 ready_for_interrupt_injection;
1024 1014
1025 If request_interrupt_window has been specified, this field indicates 1015 If request_interrupt_window has been specified, this field indicates
1026 an interrupt can be injected now with KVM_INTERRUPT. 1016 an interrupt can be injected now with KVM_INTERRUPT.
1027 1017
1028 __u8 if_flag; 1018 __u8 if_flag;
1029 1019
1030 The value of the current interrupt flag. Only valid if in-kernel 1020 The value of the current interrupt flag. Only valid if in-kernel
1031 local APIC is not used. 1021 local APIC is not used.
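A small sketch of how request_interrupt_window, ready_for_interrupt_injection
and if_flag combine when the local APIC is emulated in userspace; 'vector' is
a hypothetical value chosen by the userspace interrupt controller model:

        run->request_interrupt_window = 1;
        ioctl(vcpu_fd, KVM_RUN, 0);
        if (run->ready_for_interrupt_injection && run->if_flag) {
                struct kvm_interrupt irq = { .irq = vector };
                ioctl(vcpu_fd, KVM_INTERRUPT, &irq);
        }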
1032 1022
1033 __u8 padding2[2]; 1023 __u8 padding2[2];
1034 1024
1035 /* in (pre_kvm_run), out (post_kvm_run) */ 1025 /* in (pre_kvm_run), out (post_kvm_run) */
1036 __u64 cr8; 1026 __u64 cr8;
1037 1027
1038 The value of the cr8 register. Only valid if in-kernel local APIC is 1028 The value of the cr8 register. Only valid if in-kernel local APIC is
1039 not used. Both input and output. 1029 not used. Both input and output.
1040 1030
1041 __u64 apic_base; 1031 __u64 apic_base;
1042 1032
1043 The value of the APIC BASE msr. Only valid if in-kernel local 1033 The value of the APIC BASE msr. Only valid if in-kernel local
1044 APIC is not used. Both input and output. 1034 APIC is not used. Both input and output.
1045 1035
1046 union { 1036 union {
1047 /* KVM_EXIT_UNKNOWN */ 1037 /* KVM_EXIT_UNKNOWN */
1048 struct { 1038 struct {
1049 __u64 hardware_exit_reason; 1039 __u64 hardware_exit_reason;
1050 } hw; 1040 } hw;
1051 1041
1052 If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown 1042 If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
1053 reasons. Further architecture-specific information is available in 1043 reasons. Further architecture-specific information is available in
1054 hardware_exit_reason. 1044 hardware_exit_reason.
1055 1045
1056 /* KVM_EXIT_FAIL_ENTRY */ 1046 /* KVM_EXIT_FAIL_ENTRY */
1057 struct { 1047 struct {
1058 __u64 hardware_entry_failure_reason; 1048 __u64 hardware_entry_failure_reason;
1059 } fail_entry; 1049 } fail_entry;
1060 1050
1061 If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due 1051 If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
1062 to unknown reasons. Further architecture-specific information is 1052 to unknown reasons. Further architecture-specific information is
1063 available in hardware_entry_failure_reason. 1053 available in hardware_entry_failure_reason.
1064 1054
1065 /* KVM_EXIT_EXCEPTION */ 1055 /* KVM_EXIT_EXCEPTION */
1066 struct { 1056 struct {
1067 __u32 exception; 1057 __u32 exception;
1068 __u32 error_code; 1058 __u32 error_code;
1069 } ex; 1059 } ex;
1070 1060
1071 Unused. 1061 Unused.
1072 1062
1073 /* KVM_EXIT_IO */ 1063 /* KVM_EXIT_IO */
1074 struct { 1064 struct {
1075 #define KVM_EXIT_IO_IN 0 1065 #define KVM_EXIT_IO_IN 0
1076 #define KVM_EXIT_IO_OUT 1 1066 #define KVM_EXIT_IO_OUT 1
1077 __u8 direction; 1067 __u8 direction;
1078 __u8 size; /* bytes */ 1068 __u8 size; /* bytes */
1079 __u16 port; 1069 __u16 port;
1080 __u32 count; 1070 __u32 count;
1081 __u64 data_offset; /* relative to kvm_run start */ 1071 __u64 data_offset; /* relative to kvm_run start */
1082 } io; 1072 } io;
1083 1073
1084 If exit_reason is KVM_EXIT_IO, then the vcpu has 1074 If exit_reason is KVM_EXIT_IO, then the vcpu has
1085 executed a port I/O instruction which could not be satisfied by kvm. 1075 executed a port I/O instruction which could not be satisfied by kvm.
1086 data_offset describes where the data is located (KVM_EXIT_IO_OUT) or 1076 data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
1087 where kvm expects application code to place the data for the next 1077 where kvm expects application code to place the data for the next
1088 KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. 1078 KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
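To make the layout concrete, a hedged sketch of locating the packed data area
inside the mmap()ed kvm_run region:

        __u8 *data = (__u8 *)run + run->io.data_offset;

        /* KVM_EXIT_IO_OUT: 'data' holds run->io.count elements of
         * run->io.size bytes each, written by the guest to run->io.port.
         * KVM_EXIT_IO_IN: fill the same area before the next KVM_RUN. */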
1089 1079
1090 struct { 1080 struct {
1091 struct kvm_debug_exit_arch arch; 1081 struct kvm_debug_exit_arch arch;
1092 } debug; 1082 } debug;
1093 1083
1094 Unused. 1084 Unused.
1095 1085
1096 /* KVM_EXIT_MMIO */ 1086 /* KVM_EXIT_MMIO */
1097 struct { 1087 struct {
1098 __u64 phys_addr; 1088 __u64 phys_addr;
1099 __u8 data[8]; 1089 __u8 data[8];
1100 __u32 len; 1090 __u32 len;
1101 __u8 is_write; 1091 __u8 is_write;
1102 } mmio; 1092 } mmio;
1103 1093
1104 If exit_reason is KVM_EXIT_MMIO, then the vcpu has 1094 If exit_reason is KVM_EXIT_MMIO, then the vcpu has
1105 executed a memory-mapped I/O instruction which could not be satisfied 1095 executed a memory-mapped I/O instruction which could not be satisfied
1106 by kvm. The 'data' member contains the written data if 'is_write' is 1096 by kvm. The 'data' member contains the written data if 'is_write' is
1107 true, and should be filled by application code otherwise. 1097 true, and should be filled by application code otherwise.
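For example, a userspace device model might dispatch this exit roughly as
follows; handle_guest_write() and handle_guest_read() are hypothetical
helpers, not KVM interfaces:

        if (run->exit_reason == KVM_EXIT_MMIO) {
                if (run->mmio.is_write)
                        handle_guest_write(run->mmio.phys_addr,
                                           run->mmio.data, run->mmio.len);
                else    /* fill run->mmio.data before the next KVM_RUN */
                        handle_guest_read(run->mmio.phys_addr,
                                          run->mmio.data, run->mmio.len);
        }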
1108 1098
1109 NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding 1099 NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
1110 operations are complete (and guest state is consistent) only after userspace 1100 operations are complete (and guest state is consistent) only after userspace
1111 has re-entered the kernel with KVM_RUN. The kernel side will first finish 1101 has re-entered the kernel with KVM_RUN. The kernel side will first finish
1112 incomplete operations and then check for pending signals. Userspace 1102 incomplete operations and then check for pending signals. Userspace
1113 can re-enter the guest with an unmasked signal pending to complete 1103 can re-enter the guest with an unmasked signal pending to complete
1114 pending operations. 1104 pending operations.
1115 1105
1116 /* KVM_EXIT_HYPERCALL */ 1106 /* KVM_EXIT_HYPERCALL */
1117 struct { 1107 struct {
1118 __u64 nr; 1108 __u64 nr;
1119 __u64 args[6]; 1109 __u64 args[6];
1120 __u64 ret; 1110 __u64 ret;
1121 __u32 longmode; 1111 __u32 longmode;
1122 __u32 pad; 1112 __u32 pad;
1123 } hypercall; 1113 } hypercall;
1124 1114
1125 Unused. This was once used for 'hypercall to userspace'. To implement 1115 Unused. This was once used for 'hypercall to userspace'. To implement
1126 such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). 1116 such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
1127 Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. 1117 Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
1128 1118
1129 /* KVM_EXIT_TPR_ACCESS */ 1119 /* KVM_EXIT_TPR_ACCESS */
1130 struct { 1120 struct {
1131 __u64 rip; 1121 __u64 rip;
1132 __u32 is_write; 1122 __u32 is_write;
1133 __u32 pad; 1123 __u32 pad;
1134 } tpr_access; 1124 } tpr_access;
1135 1125
1136 To be documented (KVM_TPR_ACCESS_REPORTING). 1126 To be documented (KVM_TPR_ACCESS_REPORTING).
1137 1127
1138 /* KVM_EXIT_S390_SIEIC */ 1128 /* KVM_EXIT_S390_SIEIC */
1139 struct { 1129 struct {
1140 __u8 icptcode; 1130 __u8 icptcode;
1141 __u64 mask; /* psw upper half */ 1131 __u64 mask; /* psw upper half */
1142 __u64 addr; /* psw lower half */ 1132 __u64 addr; /* psw lower half */
1143 __u16 ipa; 1133 __u16 ipa;
1144 __u32 ipb; 1134 __u32 ipb;
1145 } s390_sieic; 1135 } s390_sieic;
1146 1136
1147 s390 specific. 1137 s390 specific.
1148 1138
1149 /* KVM_EXIT_S390_RESET */ 1139 /* KVM_EXIT_S390_RESET */
1150 #define KVM_S390_RESET_POR 1 1140 #define KVM_S390_RESET_POR 1
1151 #define KVM_S390_RESET_CLEAR 2 1141 #define KVM_S390_RESET_CLEAR 2
1152 #define KVM_S390_RESET_SUBSYSTEM 4 1142 #define KVM_S390_RESET_SUBSYSTEM 4
1153 #define KVM_S390_RESET_CPU_INIT 8 1143 #define KVM_S390_RESET_CPU_INIT 8
1154 #define KVM_S390_RESET_IPL 16 1144 #define KVM_S390_RESET_IPL 16
1155 __u64 s390_reset_flags; 1145 __u64 s390_reset_flags;
1156 1146
1157 s390 specific. 1147 s390 specific.
1158 1148
1159 /* KVM_EXIT_DCR */ 1149 /* KVM_EXIT_DCR */
1160 struct { 1150 struct {
1161 __u32 dcrn; 1151 __u32 dcrn;
1162 __u32 data; 1152 __u32 data;
1163 __u8 is_write; 1153 __u8 is_write;
1164 } dcr; 1154 } dcr;
1165 1155
1166 powerpc specific. 1156 powerpc specific.
1167 1157
1168 /* KVM_EXIT_OSI */ 1158 /* KVM_EXIT_OSI */
1169 struct { 1159 struct {
1170 __u64 gprs[32]; 1160 __u64 gprs[32];
1171 } osi; 1161 } osi;
1172 1162
1173 MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch 1163 MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
1174 hypercalls and exit with this exit struct that contains all the guest gprs. 1164 hypercalls and exit with this exit struct that contains all the guest gprs.
1175 1165
1176 If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall. 1166 If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
1177 Userspace can now handle the hypercall and when it's done modify the gprs as 1167 Userspace can now handle the hypercall and when it's done modify the gprs as
1178 necessary. Upon guest entry all guest GPRs will then be replaced by the values 1168 necessary. Upon guest entry all guest GPRs will then be replaced by the values
1179 in this struct. 1169 in this struct.
1180 1170
1181 /* Fix the size of the union. */ 1171 /* Fix the size of the union. */
1182 char padding[256]; 1172 char padding[256];
1183 }; 1173 };
1184 }; 1174 };
1185 1175
arch/ia64/kvm/kvm-ia64.c
1 /* 1 /*
2 * kvm_ia64.c: Basic KVM support on Itanium series processors 2 * kvm_ia64.c: Basic KVM support on Itanium series processors
3 * 3 *
4 * 4 *
5 * Copyright (C) 2007, Intel Corporation. 5 * Copyright (C) 2007, Intel Corporation.
6 * Xiantao Zhang (xiantao.zhang@intel.com) 6 * Xiantao Zhang (xiantao.zhang@intel.com)
7 * 7 *
8 * This program is free software; you can redistribute it and/or modify it 8 * This program is free software; you can redistribute it and/or modify it
9 * under the terms and conditions of the GNU General Public License, 9 * under the terms and conditions of the GNU General Public License,
10 * version 2, as published by the Free Software Foundation. 10 * version 2, as published by the Free Software Foundation.
11 * 11 *
12 * This program is distributed in the hope it will be useful, but WITHOUT 12 * This program is distributed in the hope it will be useful, but WITHOUT
13 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 13 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
14 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for 14 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
15 * more details. 15 * more details.
16 * 16 *
17 * You should have received a copy of the GNU General Public License along with 17 * You should have received a copy of the GNU General Public License along with
18 * this program; if not, write to the Free Software Foundation, Inc., 59 Temple 18 * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
19 * Place - Suite 330, Boston, MA 02111-1307 USA. 19 * Place - Suite 330, Boston, MA 02111-1307 USA.
20 * 20 *
21 */ 21 */
22 22
23 #include <linux/module.h> 23 #include <linux/module.h>
24 #include <linux/errno.h> 24 #include <linux/errno.h>
25 #include <linux/percpu.h> 25 #include <linux/percpu.h>
26 #include <linux/fs.h> 26 #include <linux/fs.h>
27 #include <linux/slab.h> 27 #include <linux/slab.h>
28 #include <linux/smp.h> 28 #include <linux/smp.h>
29 #include <linux/kvm_host.h> 29 #include <linux/kvm_host.h>
30 #include <linux/kvm.h> 30 #include <linux/kvm.h>
31 #include <linux/bitops.h> 31 #include <linux/bitops.h>
32 #include <linux/hrtimer.h> 32 #include <linux/hrtimer.h>
33 #include <linux/uaccess.h> 33 #include <linux/uaccess.h>
34 #include <linux/iommu.h> 34 #include <linux/iommu.h>
35 #include <linux/intel-iommu.h> 35 #include <linux/intel-iommu.h>
36 36
37 #include <asm/pgtable.h> 37 #include <asm/pgtable.h>
38 #include <asm/gcc_intrin.h> 38 #include <asm/gcc_intrin.h>
39 #include <asm/pal.h> 39 #include <asm/pal.h>
40 #include <asm/cacheflush.h> 40 #include <asm/cacheflush.h>
41 #include <asm/div64.h> 41 #include <asm/div64.h>
42 #include <asm/tlb.h> 42 #include <asm/tlb.h>
43 #include <asm/elf.h> 43 #include <asm/elf.h>
44 #include <asm/sn/addrs.h> 44 #include <asm/sn/addrs.h>
45 #include <asm/sn/clksupport.h> 45 #include <asm/sn/clksupport.h>
46 #include <asm/sn/shub_mmr.h> 46 #include <asm/sn/shub_mmr.h>
47 47
48 #include "misc.h" 48 #include "misc.h"
49 #include "vti.h" 49 #include "vti.h"
50 #include "iodev.h" 50 #include "iodev.h"
51 #include "ioapic.h" 51 #include "ioapic.h"
52 #include "lapic.h" 52 #include "lapic.h"
53 #include "irq.h" 53 #include "irq.h"
54 54
55 static unsigned long kvm_vmm_base; 55 static unsigned long kvm_vmm_base;
56 static unsigned long kvm_vsa_base; 56 static unsigned long kvm_vsa_base;
57 static unsigned long kvm_vm_buffer; 57 static unsigned long kvm_vm_buffer;
58 static unsigned long kvm_vm_buffer_size; 58 static unsigned long kvm_vm_buffer_size;
59 unsigned long kvm_vmm_gp; 59 unsigned long kvm_vmm_gp;
60 60
61 static long vp_env_info; 61 static long vp_env_info;
62 62
63 static struct kvm_vmm_info *kvm_vmm_info; 63 static struct kvm_vmm_info *kvm_vmm_info;
64 64
65 static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu); 65 static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu);
66 66
67 struct kvm_stats_debugfs_item debugfs_entries[] = { 67 struct kvm_stats_debugfs_item debugfs_entries[] = {
68 { NULL } 68 { NULL }
69 }; 69 };
70 70
71 static unsigned long kvm_get_itc(struct kvm_vcpu *vcpu) 71 static unsigned long kvm_get_itc(struct kvm_vcpu *vcpu)
72 { 72 {
73 #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) 73 #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC)
74 if (vcpu->kvm->arch.is_sn2) 74 if (vcpu->kvm->arch.is_sn2)
75 return rtc_time(); 75 return rtc_time();
76 else 76 else
77 #endif 77 #endif
78 return ia64_getreg(_IA64_REG_AR_ITC); 78 return ia64_getreg(_IA64_REG_AR_ITC);
79 } 79 }
80 80
81 static void kvm_flush_icache(unsigned long start, unsigned long len) 81 static void kvm_flush_icache(unsigned long start, unsigned long len)
82 { 82 {
83 int l; 83 int l;
84 84
85 for (l = 0; l < (len + 32); l += 32) 85 for (l = 0; l < (len + 32); l += 32)
86 ia64_fc((void *)(start + l)); 86 ia64_fc((void *)(start + l));
87 87
88 ia64_sync_i(); 88 ia64_sync_i();
89 ia64_srlz_i(); 89 ia64_srlz_i();
90 } 90 }
91 91
92 static void kvm_flush_tlb_all(void) 92 static void kvm_flush_tlb_all(void)
93 { 93 {
94 unsigned long i, j, count0, count1, stride0, stride1, addr; 94 unsigned long i, j, count0, count1, stride0, stride1, addr;
95 long flags; 95 long flags;
96 96
97 addr = local_cpu_data->ptce_base; 97 addr = local_cpu_data->ptce_base;
98 count0 = local_cpu_data->ptce_count[0]; 98 count0 = local_cpu_data->ptce_count[0];
99 count1 = local_cpu_data->ptce_count[1]; 99 count1 = local_cpu_data->ptce_count[1];
100 stride0 = local_cpu_data->ptce_stride[0]; 100 stride0 = local_cpu_data->ptce_stride[0];
101 stride1 = local_cpu_data->ptce_stride[1]; 101 stride1 = local_cpu_data->ptce_stride[1];
102 102
103 local_irq_save(flags); 103 local_irq_save(flags);
104 for (i = 0; i < count0; ++i) { 104 for (i = 0; i < count0; ++i) {
105 for (j = 0; j < count1; ++j) { 105 for (j = 0; j < count1; ++j) {
106 ia64_ptce(addr); 106 ia64_ptce(addr);
107 addr += stride1; 107 addr += stride1;
108 } 108 }
109 addr += stride0; 109 addr += stride0;
110 } 110 }
111 local_irq_restore(flags); 111 local_irq_restore(flags);
112 ia64_srlz_i(); /* srlz.i implies srlz.d */ 112 ia64_srlz_i(); /* srlz.i implies srlz.d */
113 } 113 }
114 114
115 long ia64_pal_vp_create(u64 *vpd, u64 *host_iva, u64 *opt_handler) 115 long ia64_pal_vp_create(u64 *vpd, u64 *host_iva, u64 *opt_handler)
116 { 116 {
117 struct ia64_pal_retval iprv; 117 struct ia64_pal_retval iprv;
118 118
119 PAL_CALL_STK(iprv, PAL_VP_CREATE, (u64)vpd, (u64)host_iva, 119 PAL_CALL_STK(iprv, PAL_VP_CREATE, (u64)vpd, (u64)host_iva,
120 (u64)opt_handler); 120 (u64)opt_handler);
121 121
122 return iprv.status; 122 return iprv.status;
123 } 123 }
124 124
125 static DEFINE_SPINLOCK(vp_lock); 125 static DEFINE_SPINLOCK(vp_lock);
126 126
127 int kvm_arch_hardware_enable(void *garbage) 127 int kvm_arch_hardware_enable(void *garbage)
128 { 128 {
129 long status; 129 long status;
130 long tmp_base; 130 long tmp_base;
131 unsigned long pte; 131 unsigned long pte;
132 unsigned long saved_psr; 132 unsigned long saved_psr;
133 int slot; 133 int slot;
134 134
135 pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL)); 135 pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL));
136 local_irq_save(saved_psr); 136 local_irq_save(saved_psr);
137 slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); 137 slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT);
138 local_irq_restore(saved_psr); 138 local_irq_restore(saved_psr);
139 if (slot < 0) 139 if (slot < 0)
140 return -EINVAL; 140 return -EINVAL;
141 141
142 spin_lock(&vp_lock); 142 spin_lock(&vp_lock);
143 status = ia64_pal_vp_init_env(kvm_vsa_base ? 143 status = ia64_pal_vp_init_env(kvm_vsa_base ?
144 VP_INIT_ENV : VP_INIT_ENV_INITALIZE, 144 VP_INIT_ENV : VP_INIT_ENV_INITALIZE,
145 __pa(kvm_vm_buffer), KVM_VM_BUFFER_BASE, &tmp_base); 145 __pa(kvm_vm_buffer), KVM_VM_BUFFER_BASE, &tmp_base);
146 if (status != 0) { 146 if (status != 0) {
147 spin_unlock(&vp_lock); 147 spin_unlock(&vp_lock);
148 printk(KERN_WARNING"kvm: Failed to Enable VT Support!!!!\n"); 148 printk(KERN_WARNING"kvm: Failed to Enable VT Support!!!!\n");
149 return -EINVAL; 149 return -EINVAL;
150 } 150 }
151 151
152 if (!kvm_vsa_base) { 152 if (!kvm_vsa_base) {
153 kvm_vsa_base = tmp_base; 153 kvm_vsa_base = tmp_base;
154 printk(KERN_INFO"kvm: kvm_vsa_base:0x%lx\n", kvm_vsa_base); 154 printk(KERN_INFO"kvm: kvm_vsa_base:0x%lx\n", kvm_vsa_base);
155 } 155 }
156 spin_unlock(&vp_lock); 156 spin_unlock(&vp_lock);
157 ia64_ptr_entry(0x3, slot); 157 ia64_ptr_entry(0x3, slot);
158 158
159 return 0; 159 return 0;
160 } 160 }
161 161
162 void kvm_arch_hardware_disable(void *garbage) 162 void kvm_arch_hardware_disable(void *garbage)
163 { 163 {
164 164
165 long status; 165 long status;
166 int slot; 166 int slot;
167 unsigned long pte; 167 unsigned long pte;
168 unsigned long saved_psr; 168 unsigned long saved_psr;
169 unsigned long host_iva = ia64_getreg(_IA64_REG_CR_IVA); 169 unsigned long host_iva = ia64_getreg(_IA64_REG_CR_IVA);
170 170
171 pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), 171 pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base),
172 PAGE_KERNEL)); 172 PAGE_KERNEL));
173 173
174 local_irq_save(saved_psr); 174 local_irq_save(saved_psr);
175 slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); 175 slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT);
176 local_irq_restore(saved_psr); 176 local_irq_restore(saved_psr);
177 if (slot < 0) 177 if (slot < 0)
178 return; 178 return;
179 179
180 status = ia64_pal_vp_exit_env(host_iva); 180 status = ia64_pal_vp_exit_env(host_iva);
181 if (status) 181 if (status)
182 printk(KERN_DEBUG"kvm: Failed to disable VT support! :%ld\n", 182 printk(KERN_DEBUG"kvm: Failed to disable VT support! :%ld\n",
183 status); 183 status);
184 ia64_ptr_entry(0x3, slot); 184 ia64_ptr_entry(0x3, slot);
185 } 185 }
186 186
187 void kvm_arch_check_processor_compat(void *rtn) 187 void kvm_arch_check_processor_compat(void *rtn)
188 { 188 {
189 *(int *)rtn = 0; 189 *(int *)rtn = 0;
190 } 190 }
191 191
192 int kvm_dev_ioctl_check_extension(long ext) 192 int kvm_dev_ioctl_check_extension(long ext)
193 { 193 {
194 194
195 int r; 195 int r;
196 196
197 switch (ext) { 197 switch (ext) {
198 case KVM_CAP_IRQCHIP: 198 case KVM_CAP_IRQCHIP:
199 case KVM_CAP_MP_STATE: 199 case KVM_CAP_MP_STATE:
200 case KVM_CAP_IRQ_INJECT_STATUS: 200 case KVM_CAP_IRQ_INJECT_STATUS:
201 r = 1; 201 r = 1;
202 break; 202 break;
203 case KVM_CAP_COALESCED_MMIO: 203 case KVM_CAP_COALESCED_MMIO:
204 r = KVM_COALESCED_MMIO_PAGE_OFFSET; 204 r = KVM_COALESCED_MMIO_PAGE_OFFSET;
205 break; 205 break;
206 case KVM_CAP_IOMMU: 206 case KVM_CAP_IOMMU:
207 r = iommu_found(); 207 r = iommu_found();
208 break; 208 break;
209 default: 209 default:
210 r = 0; 210 r = 0;
211 } 211 }
212 return r; 212 return r;
213 213
214 } 214 }
215 215
216 static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 216 static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
217 { 217 {
218 kvm_run->exit_reason = KVM_EXIT_UNKNOWN; 218 kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
219 kvm_run->hw.hardware_exit_reason = 1; 219 kvm_run->hw.hardware_exit_reason = 1;
220 return 0; 220 return 0;
221 } 221 }
222 222
223 static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 223 static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
224 { 224 {
225 struct kvm_mmio_req *p; 225 struct kvm_mmio_req *p;
226 struct kvm_io_device *mmio_dev; 226 struct kvm_io_device *mmio_dev;
227 int r; 227 int r;
228 228
229 p = kvm_get_vcpu_ioreq(vcpu); 229 p = kvm_get_vcpu_ioreq(vcpu);
230 230
231 if ((p->addr & PAGE_MASK) == IOAPIC_DEFAULT_BASE_ADDRESS) 231 if ((p->addr & PAGE_MASK) == IOAPIC_DEFAULT_BASE_ADDRESS)
232 goto mmio; 232 goto mmio;
233 vcpu->mmio_needed = 1; 233 vcpu->mmio_needed = 1;
234 vcpu->mmio_phys_addr = kvm_run->mmio.phys_addr = p->addr; 234 vcpu->mmio_phys_addr = kvm_run->mmio.phys_addr = p->addr;
235 vcpu->mmio_size = kvm_run->mmio.len = p->size; 235 vcpu->mmio_size = kvm_run->mmio.len = p->size;
236 vcpu->mmio_is_write = kvm_run->mmio.is_write = !p->dir; 236 vcpu->mmio_is_write = kvm_run->mmio.is_write = !p->dir;
237 237
238 if (vcpu->mmio_is_write) 238 if (vcpu->mmio_is_write)
239 memcpy(vcpu->mmio_data, &p->data, p->size); 239 memcpy(vcpu->mmio_data, &p->data, p->size);
240 memcpy(kvm_run->mmio.data, &p->data, p->size); 240 memcpy(kvm_run->mmio.data, &p->data, p->size);
241 kvm_run->exit_reason = KVM_EXIT_MMIO; 241 kvm_run->exit_reason = KVM_EXIT_MMIO;
242 return 0; 242 return 0;
243 mmio: 243 mmio:
244 if (p->dir) 244 if (p->dir)
245 r = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, p->addr, 245 r = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, p->addr,
246 p->size, &p->data); 246 p->size, &p->data);
247 else 247 else
248 r = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, p->addr, 248 r = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, p->addr,
249 p->size, &p->data); 249 p->size, &p->data);
250 if (r) 250 if (r)
251 printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr); 251 printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr);
252 p->state = STATE_IORESP_READY; 252 p->state = STATE_IORESP_READY;
253 253
254 return 1; 254 return 1;
255 } 255 }
256 256
257 static int handle_pal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 257 static int handle_pal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
258 { 258 {
259 struct exit_ctl_data *p; 259 struct exit_ctl_data *p;
260 260
261 p = kvm_get_exit_data(vcpu); 261 p = kvm_get_exit_data(vcpu);
262 262
263 if (p->exit_reason == EXIT_REASON_PAL_CALL) 263 if (p->exit_reason == EXIT_REASON_PAL_CALL)
264 return kvm_pal_emul(vcpu, kvm_run); 264 return kvm_pal_emul(vcpu, kvm_run);
265 else { 265 else {
266 kvm_run->exit_reason = KVM_EXIT_UNKNOWN; 266 kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
267 kvm_run->hw.hardware_exit_reason = 2; 267 kvm_run->hw.hardware_exit_reason = 2;
268 return 0; 268 return 0;
269 } 269 }
270 } 270 }
271 271
272 static int handle_sal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 272 static int handle_sal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
273 { 273 {
274 struct exit_ctl_data *p; 274 struct exit_ctl_data *p;
275 275
276 p = kvm_get_exit_data(vcpu); 276 p = kvm_get_exit_data(vcpu);
277 277
278 if (p->exit_reason == EXIT_REASON_SAL_CALL) { 278 if (p->exit_reason == EXIT_REASON_SAL_CALL) {
279 kvm_sal_emul(vcpu); 279 kvm_sal_emul(vcpu);
280 return 1; 280 return 1;
281 } else { 281 } else {
282 kvm_run->exit_reason = KVM_EXIT_UNKNOWN; 282 kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
283 kvm_run->hw.hardware_exit_reason = 3; 283 kvm_run->hw.hardware_exit_reason = 3;
284 return 0; 284 return 0;
285 } 285 }
286 286
287 } 287 }
288 288
289 static int __apic_accept_irq(struct kvm_vcpu *vcpu, uint64_t vector) 289 static int __apic_accept_irq(struct kvm_vcpu *vcpu, uint64_t vector)
290 { 290 {
291 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); 291 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
292 292
293 if (!test_and_set_bit(vector, &vpd->irr[0])) { 293 if (!test_and_set_bit(vector, &vpd->irr[0])) {
294 vcpu->arch.irq_new_pending = 1; 294 vcpu->arch.irq_new_pending = 1;
295 kvm_vcpu_kick(vcpu); 295 kvm_vcpu_kick(vcpu);
296 return 1; 296 return 1;
297 } 297 }
298 return 0; 298 return 0;
299 } 299 }
300 300
301 /* 301 /*
302 * offset: address offset to IPI space. 302 * offset: address offset to IPI space.
303 * value: deliver value. 303 * value: deliver value.
304 */ 304 */
305 static void vcpu_deliver_ipi(struct kvm_vcpu *vcpu, uint64_t dm, 305 static void vcpu_deliver_ipi(struct kvm_vcpu *vcpu, uint64_t dm,
306 uint64_t vector) 306 uint64_t vector)
307 { 307 {
308 switch (dm) { 308 switch (dm) {
309 case SAPIC_FIXED: 309 case SAPIC_FIXED:
310 break; 310 break;
311 case SAPIC_NMI: 311 case SAPIC_NMI:
312 vector = 2; 312 vector = 2;
313 break; 313 break;
314 case SAPIC_EXTINT: 314 case SAPIC_EXTINT:
315 vector = 0; 315 vector = 0;
316 break; 316 break;
317 case SAPIC_INIT: 317 case SAPIC_INIT:
318 case SAPIC_PMI: 318 case SAPIC_PMI:
319 default: 319 default:
320 printk(KERN_ERR"kvm: Unimplemented Deliver reserved IPI!\n"); 320 printk(KERN_ERR"kvm: Unimplemented Deliver reserved IPI!\n");
321 return; 321 return;
322 } 322 }
323 __apic_accept_irq(vcpu, vector); 323 __apic_accept_irq(vcpu, vector);
324 } 324 }
325 325
326 static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, unsigned long id, 326 static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, unsigned long id,
327 unsigned long eid) 327 unsigned long eid)
328 { 328 {
329 union ia64_lid lid; 329 union ia64_lid lid;
330 int i; 330 int i;
331 struct kvm_vcpu *vcpu; 331 struct kvm_vcpu *vcpu;
332 332
333 kvm_for_each_vcpu(i, vcpu, kvm) { 333 kvm_for_each_vcpu(i, vcpu, kvm) {
334 lid.val = VCPU_LID(vcpu); 334 lid.val = VCPU_LID(vcpu);
335 if (lid.id == id && lid.eid == eid) 335 if (lid.id == id && lid.eid == eid)
336 return vcpu; 336 return vcpu;
337 } 337 }
338 338
339 return NULL; 339 return NULL;
340 } 340 }
341 341
342 static int handle_ipi(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 342 static int handle_ipi(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
343 { 343 {
344 struct exit_ctl_data *p = kvm_get_exit_data(vcpu); 344 struct exit_ctl_data *p = kvm_get_exit_data(vcpu);
345 struct kvm_vcpu *target_vcpu; 345 struct kvm_vcpu *target_vcpu;
346 struct kvm_pt_regs *regs; 346 struct kvm_pt_regs *regs;
347 union ia64_ipi_a addr = p->u.ipi_data.addr; 347 union ia64_ipi_a addr = p->u.ipi_data.addr;
348 union ia64_ipi_d data = p->u.ipi_data.data; 348 union ia64_ipi_d data = p->u.ipi_data.data;
349 349
350 target_vcpu = lid_to_vcpu(vcpu->kvm, addr.id, addr.eid); 350 target_vcpu = lid_to_vcpu(vcpu->kvm, addr.id, addr.eid);
351 if (!target_vcpu) 351 if (!target_vcpu)
352 return handle_vm_error(vcpu, kvm_run); 352 return handle_vm_error(vcpu, kvm_run);
353 353
354 if (!target_vcpu->arch.launched) { 354 if (!target_vcpu->arch.launched) {
355 regs = vcpu_regs(target_vcpu); 355 regs = vcpu_regs(target_vcpu);
356 356
357 regs->cr_iip = vcpu->kvm->arch.rdv_sal_data.boot_ip; 357 regs->cr_iip = vcpu->kvm->arch.rdv_sal_data.boot_ip;
358 regs->r1 = vcpu->kvm->arch.rdv_sal_data.boot_gp; 358 regs->r1 = vcpu->kvm->arch.rdv_sal_data.boot_gp;
359 359
360 target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; 360 target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
361 if (waitqueue_active(&target_vcpu->wq)) 361 if (waitqueue_active(&target_vcpu->wq))
362 wake_up_interruptible(&target_vcpu->wq); 362 wake_up_interruptible(&target_vcpu->wq);
363 } else { 363 } else {
364 vcpu_deliver_ipi(target_vcpu, data.dm, data.vector); 364 vcpu_deliver_ipi(target_vcpu, data.dm, data.vector);
365 if (target_vcpu != vcpu) 365 if (target_vcpu != vcpu)
366 kvm_vcpu_kick(target_vcpu); 366 kvm_vcpu_kick(target_vcpu);
367 } 367 }
368 368
369 return 1; 369 return 1;
370 } 370 }
371 371
372 struct call_data { 372 struct call_data {
373 struct kvm_ptc_g ptc_g_data; 373 struct kvm_ptc_g ptc_g_data;
374 struct kvm_vcpu *vcpu; 374 struct kvm_vcpu *vcpu;
375 }; 375 };
376 376
377 static void vcpu_global_purge(void *info) 377 static void vcpu_global_purge(void *info)
378 { 378 {
379 struct call_data *p = (struct call_data *)info; 379 struct call_data *p = (struct call_data *)info;
380 struct kvm_vcpu *vcpu = p->vcpu; 380 struct kvm_vcpu *vcpu = p->vcpu;
381 381
382 if (test_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests)) 382 if (test_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
383 return; 383 return;
384 384
385 set_bit(KVM_REQ_PTC_G, &vcpu->requests); 385 set_bit(KVM_REQ_PTC_G, &vcpu->requests);
386 if (vcpu->arch.ptc_g_count < MAX_PTC_G_NUM) { 386 if (vcpu->arch.ptc_g_count < MAX_PTC_G_NUM) {
387 vcpu->arch.ptc_g_data[vcpu->arch.ptc_g_count++] = 387 vcpu->arch.ptc_g_data[vcpu->arch.ptc_g_count++] =
388 p->ptc_g_data; 388 p->ptc_g_data;
389 } else { 389 } else {
390 clear_bit(KVM_REQ_PTC_G, &vcpu->requests); 390 clear_bit(KVM_REQ_PTC_G, &vcpu->requests);
391 vcpu->arch.ptc_g_count = 0; 391 vcpu->arch.ptc_g_count = 0;
392 set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests); 392 set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
393 } 393 }
394 } 394 }
395 395
396 static int handle_global_purge(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 396 static int handle_global_purge(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
397 { 397 {
398 struct exit_ctl_data *p = kvm_get_exit_data(vcpu); 398 struct exit_ctl_data *p = kvm_get_exit_data(vcpu);
399 struct kvm *kvm = vcpu->kvm; 399 struct kvm *kvm = vcpu->kvm;
400 struct call_data call_data; 400 struct call_data call_data;
401 int i; 401 int i;
402 struct kvm_vcpu *vcpui; 402 struct kvm_vcpu *vcpui;
403 403
404 call_data.ptc_g_data = p->u.ptc_g_data; 404 call_data.ptc_g_data = p->u.ptc_g_data;
405 405
406 kvm_for_each_vcpu(i, vcpui, kvm) { 406 kvm_for_each_vcpu(i, vcpui, kvm) {
407 if (vcpui->arch.mp_state == KVM_MP_STATE_UNINITIALIZED || 407 if (vcpui->arch.mp_state == KVM_MP_STATE_UNINITIALIZED ||
408 vcpu == vcpui) 408 vcpu == vcpui)
409 continue; 409 continue;
410 410
411 if (waitqueue_active(&vcpui->wq)) 411 if (waitqueue_active(&vcpui->wq))
412 wake_up_interruptible(&vcpui->wq); 412 wake_up_interruptible(&vcpui->wq);
413 413
414 if (vcpui->cpu != -1) { 414 if (vcpui->cpu != -1) {
415 call_data.vcpu = vcpui; 415 call_data.vcpu = vcpui;
416 smp_call_function_single(vcpui->cpu, 416 smp_call_function_single(vcpui->cpu,
417 vcpu_global_purge, &call_data, 1); 417 vcpu_global_purge, &call_data, 1);
418 } else 418 } else
419 printk(KERN_WARNING"kvm: Uninit vcpu received ipi!\n"); 419 printk(KERN_WARNING"kvm: Uninit vcpu received ipi!\n");
420 420
421 } 421 }
422 return 1; 422 return 1;
423 } 423 }
424 424
425 static int handle_switch_rr6(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 425 static int handle_switch_rr6(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
426 { 426 {
427 return 1; 427 return 1;
428 } 428 }
429 429
430 static int kvm_sn2_setup_mappings(struct kvm_vcpu *vcpu) 430 static int kvm_sn2_setup_mappings(struct kvm_vcpu *vcpu)
431 { 431 {
432 unsigned long pte, rtc_phys_addr, map_addr; 432 unsigned long pte, rtc_phys_addr, map_addr;
433 int slot; 433 int slot;
434 434
435 map_addr = KVM_VMM_BASE + (1UL << KVM_VMM_SHIFT); 435 map_addr = KVM_VMM_BASE + (1UL << KVM_VMM_SHIFT);
436 rtc_phys_addr = LOCAL_MMR_OFFSET | SH_RTC; 436 rtc_phys_addr = LOCAL_MMR_OFFSET | SH_RTC;
437 pte = pte_val(mk_pte_phys(rtc_phys_addr, PAGE_KERNEL_UC)); 437 pte = pte_val(mk_pte_phys(rtc_phys_addr, PAGE_KERNEL_UC));
438 slot = ia64_itr_entry(0x3, map_addr, pte, PAGE_SHIFT); 438 slot = ia64_itr_entry(0x3, map_addr, pte, PAGE_SHIFT);
439 vcpu->arch.sn_rtc_tr_slot = slot; 439 vcpu->arch.sn_rtc_tr_slot = slot;
440 if (slot < 0) { 440 if (slot < 0) {
441 printk(KERN_ERR "Mayday mayday! RTC mapping failed!\n"); 441 printk(KERN_ERR "Mayday mayday! RTC mapping failed!\n");
442 slot = 0; 442 slot = 0;
443 } 443 }
444 return slot; 444 return slot;
445 } 445 }
446 446
447 int kvm_emulate_halt(struct kvm_vcpu *vcpu) 447 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
448 { 448 {
449 449
450 ktime_t kt; 450 ktime_t kt;
451 long itc_diff; 451 long itc_diff;
452 unsigned long vcpu_now_itc; 452 unsigned long vcpu_now_itc;
453 unsigned long expires; 453 unsigned long expires;
454 struct hrtimer *p_ht = &vcpu->arch.hlt_timer; 454 struct hrtimer *p_ht = &vcpu->arch.hlt_timer;
455 unsigned long cyc_per_usec = local_cpu_data->cyc_per_usec; 455 unsigned long cyc_per_usec = local_cpu_data->cyc_per_usec;
456 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); 456 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
457 457
458 if (irqchip_in_kernel(vcpu->kvm)) { 458 if (irqchip_in_kernel(vcpu->kvm)) {
459 459
460 vcpu_now_itc = kvm_get_itc(vcpu) + vcpu->arch.itc_offset; 460 vcpu_now_itc = kvm_get_itc(vcpu) + vcpu->arch.itc_offset;
461 461
462 if (time_after(vcpu_now_itc, vpd->itm)) { 462 if (time_after(vcpu_now_itc, vpd->itm)) {
463 vcpu->arch.timer_check = 1; 463 vcpu->arch.timer_check = 1;
464 return 1; 464 return 1;
465 } 465 }
466 itc_diff = vpd->itm - vcpu_now_itc; 466 itc_diff = vpd->itm - vcpu_now_itc;
467 if (itc_diff < 0) 467 if (itc_diff < 0)
468 itc_diff = -itc_diff; 468 itc_diff = -itc_diff;
469 469
470 expires = div64_u64(itc_diff, cyc_per_usec); 470 expires = div64_u64(itc_diff, cyc_per_usec);
471 kt = ktime_set(0, 1000 * expires); 471 kt = ktime_set(0, 1000 * expires);
472 472
473 vcpu->arch.ht_active = 1; 473 vcpu->arch.ht_active = 1;
474 hrtimer_start(p_ht, kt, HRTIMER_MODE_ABS); 474 hrtimer_start(p_ht, kt, HRTIMER_MODE_ABS);
475 475
476 vcpu->arch.mp_state = KVM_MP_STATE_HALTED; 476 vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
477 kvm_vcpu_block(vcpu); 477 kvm_vcpu_block(vcpu);
478 hrtimer_cancel(p_ht); 478 hrtimer_cancel(p_ht);
479 vcpu->arch.ht_active = 0; 479 vcpu->arch.ht_active = 0;
480 480
481 if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests) || 481 if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests) ||
482 kvm_cpu_has_pending_timer(vcpu)) 482 kvm_cpu_has_pending_timer(vcpu))
483 if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) 483 if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
484 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; 484 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
485 485
486 if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) 486 if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
487 return -EINTR; 487 return -EINTR;
488 return 1; 488 return 1;
489 } else { 489 } else {
490 printk(KERN_ERR"kvm: Unsupported userspace halt!"); 490 printk(KERN_ERR"kvm: Unsupported userspace halt!");
491 return 0; 491 return 0;
492 } 492 }
493 } 493 }
494 494
495 static int handle_vm_shutdown(struct kvm_vcpu *vcpu, 495 static int handle_vm_shutdown(struct kvm_vcpu *vcpu,
496 struct kvm_run *kvm_run) 496 struct kvm_run *kvm_run)
497 { 497 {
498 kvm_run->exit_reason = KVM_EXIT_SHUTDOWN; 498 kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
499 return 0; 499 return 0;
500 } 500 }
501 501
502 static int handle_external_interrupt(struct kvm_vcpu *vcpu, 502 static int handle_external_interrupt(struct kvm_vcpu *vcpu,
503 struct kvm_run *kvm_run) 503 struct kvm_run *kvm_run)
504 { 504 {
505 return 1; 505 return 1;
506 } 506 }
507 507
508 static int handle_vcpu_debug(struct kvm_vcpu *vcpu, 508 static int handle_vcpu_debug(struct kvm_vcpu *vcpu,
509 struct kvm_run *kvm_run) 509 struct kvm_run *kvm_run)
510 { 510 {
511 printk("VMM: %s", vcpu->arch.log_buf); 511 printk("VMM: %s", vcpu->arch.log_buf);
512 return 1; 512 return 1;
513 } 513 }
514 514
515 static int (*kvm_vti_exit_handlers[])(struct kvm_vcpu *vcpu, 515 static int (*kvm_vti_exit_handlers[])(struct kvm_vcpu *vcpu,
516 struct kvm_run *kvm_run) = { 516 struct kvm_run *kvm_run) = {
517 [EXIT_REASON_VM_PANIC] = handle_vm_error, 517 [EXIT_REASON_VM_PANIC] = handle_vm_error,
518 [EXIT_REASON_MMIO_INSTRUCTION] = handle_mmio, 518 [EXIT_REASON_MMIO_INSTRUCTION] = handle_mmio,
519 [EXIT_REASON_PAL_CALL] = handle_pal_call, 519 [EXIT_REASON_PAL_CALL] = handle_pal_call,
520 [EXIT_REASON_SAL_CALL] = handle_sal_call, 520 [EXIT_REASON_SAL_CALL] = handle_sal_call,
521 [EXIT_REASON_SWITCH_RR6] = handle_switch_rr6, 521 [EXIT_REASON_SWITCH_RR6] = handle_switch_rr6,
522 [EXIT_REASON_VM_DESTROY] = handle_vm_shutdown, 522 [EXIT_REASON_VM_DESTROY] = handle_vm_shutdown,
523 [EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt, 523 [EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt,
524 [EXIT_REASON_IPI] = handle_ipi, 524 [EXIT_REASON_IPI] = handle_ipi,
525 [EXIT_REASON_PTC_G] = handle_global_purge, 525 [EXIT_REASON_PTC_G] = handle_global_purge,
526 [EXIT_REASON_DEBUG] = handle_vcpu_debug, 526 [EXIT_REASON_DEBUG] = handle_vcpu_debug,
527 527
528 }; 528 };
529 529
530 static const int kvm_vti_max_exit_handlers = 530 static const int kvm_vti_max_exit_handlers =
531 sizeof(kvm_vti_exit_handlers)/sizeof(*kvm_vti_exit_handlers); 531 sizeof(kvm_vti_exit_handlers)/sizeof(*kvm_vti_exit_handlers);
532 532
533 static uint32_t kvm_get_exit_reason(struct kvm_vcpu *vcpu) 533 static uint32_t kvm_get_exit_reason(struct kvm_vcpu *vcpu)
534 { 534 {
535 struct exit_ctl_data *p_exit_data; 535 struct exit_ctl_data *p_exit_data;
536 536
537 p_exit_data = kvm_get_exit_data(vcpu); 537 p_exit_data = kvm_get_exit_data(vcpu);
538 return p_exit_data->exit_reason; 538 return p_exit_data->exit_reason;
539 } 539 }
540 540
541 /* 541 /*
542 * The guest has exited. See if we can fix it or if we need userspace 542 * The guest has exited. See if we can fix it or if we need userspace
543 * assistance. 543 * assistance.
544 */ 544 */
545 static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) 545 static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
546 { 546 {
547 u32 exit_reason = kvm_get_exit_reason(vcpu); 547 u32 exit_reason = kvm_get_exit_reason(vcpu);
548 vcpu->arch.last_exit = exit_reason; 548 vcpu->arch.last_exit = exit_reason;
549 549
550 if (exit_reason < kvm_vti_max_exit_handlers 550 if (exit_reason < kvm_vti_max_exit_handlers
551 && kvm_vti_exit_handlers[exit_reason]) 551 && kvm_vti_exit_handlers[exit_reason])
552 return kvm_vti_exit_handlers[exit_reason](vcpu, kvm_run); 552 return kvm_vti_exit_handlers[exit_reason](vcpu, kvm_run);
553 else { 553 else {
554 kvm_run->exit_reason = KVM_EXIT_UNKNOWN; 554 kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
555 kvm_run->hw.hardware_exit_reason = exit_reason; 555 kvm_run->hw.hardware_exit_reason = exit_reason;
556 } 556 }
557 return 0; 557 return 0;
558 } 558 }
559 559
560 static inline void vti_set_rr6(unsigned long rr6) 560 static inline void vti_set_rr6(unsigned long rr6)
561 { 561 {
562 ia64_set_rr(RR6, rr6); 562 ia64_set_rr(RR6, rr6);
563 ia64_srlz_i(); 563 ia64_srlz_i();
564 } 564 }
565 565
566 static int kvm_insert_vmm_mapping(struct kvm_vcpu *vcpu) 566 static int kvm_insert_vmm_mapping(struct kvm_vcpu *vcpu)
567 { 567 {
568 unsigned long pte; 568 unsigned long pte;
569 struct kvm *kvm = vcpu->kvm; 569 struct kvm *kvm = vcpu->kvm;
570 int r; 570 int r;
571 571
572 /*Insert a pair of tr to map vmm*/ 572 /*Insert a pair of tr to map vmm*/
573 pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL)); 573 pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL));
574 r = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); 574 r = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT);
575 if (r < 0) 575 if (r < 0)
576 goto out; 576 goto out;
577 vcpu->arch.vmm_tr_slot = r; 577 vcpu->arch.vmm_tr_slot = r;
578 /*Insert a pair of tr to map data of vm*/ 578 /*Insert a pair of tr to map data of vm*/
579 pte = pte_val(mk_pte_phys(__pa(kvm->arch.vm_base), PAGE_KERNEL)); 579 pte = pte_val(mk_pte_phys(__pa(kvm->arch.vm_base), PAGE_KERNEL));
580 r = ia64_itr_entry(0x3, KVM_VM_DATA_BASE, 580 r = ia64_itr_entry(0x3, KVM_VM_DATA_BASE,
581 pte, KVM_VM_DATA_SHIFT); 581 pte, KVM_VM_DATA_SHIFT);
582 if (r < 0) 582 if (r < 0)
583 goto out; 583 goto out;
584 vcpu->arch.vm_tr_slot = r; 584 vcpu->arch.vm_tr_slot = r;
585 585
586 #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) 586 #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC)
587 if (kvm->arch.is_sn2) { 587 if (kvm->arch.is_sn2) {
588 r = kvm_sn2_setup_mappings(vcpu); 588 r = kvm_sn2_setup_mappings(vcpu);
589 if (r < 0) 589 if (r < 0)
590 goto out; 590 goto out;
591 } 591 }
592 #endif 592 #endif
593 593
594 r = 0; 594 r = 0;
595 out: 595 out:
596 return r; 596 return r;
597 } 597 }
598 598
599 static void kvm_purge_vmm_mapping(struct kvm_vcpu *vcpu) 599 static void kvm_purge_vmm_mapping(struct kvm_vcpu *vcpu)
600 { 600 {
601 struct kvm *kvm = vcpu->kvm; 601 struct kvm *kvm = vcpu->kvm;
602 ia64_ptr_entry(0x3, vcpu->arch.vmm_tr_slot); 602 ia64_ptr_entry(0x3, vcpu->arch.vmm_tr_slot);
603 ia64_ptr_entry(0x3, vcpu->arch.vm_tr_slot); 603 ia64_ptr_entry(0x3, vcpu->arch.vm_tr_slot);
604 #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) 604 #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC)
605 if (kvm->arch.is_sn2) 605 if (kvm->arch.is_sn2)
606 ia64_ptr_entry(0x3, vcpu->arch.sn_rtc_tr_slot); 606 ia64_ptr_entry(0x3, vcpu->arch.sn_rtc_tr_slot);
607 #endif 607 #endif
608 } 608 }
609 609
610 static int kvm_vcpu_pre_transition(struct kvm_vcpu *vcpu) 610 static int kvm_vcpu_pre_transition(struct kvm_vcpu *vcpu)
611 { 611 {
612 unsigned long psr; 612 unsigned long psr;
613 int r; 613 int r;
614 int cpu = smp_processor_id(); 614 int cpu = smp_processor_id();
615 615
616 if (vcpu->arch.last_run_cpu != cpu || 616 if (vcpu->arch.last_run_cpu != cpu ||
617 per_cpu(last_vcpu, cpu) != vcpu) { 617 per_cpu(last_vcpu, cpu) != vcpu) {
618 per_cpu(last_vcpu, cpu) = vcpu; 618 per_cpu(last_vcpu, cpu) = vcpu;
619 vcpu->arch.last_run_cpu = cpu; 619 vcpu->arch.last_run_cpu = cpu;
620 kvm_flush_tlb_all(); 620 kvm_flush_tlb_all();
621 } 621 }
622 622
623 vcpu->arch.host_rr6 = ia64_get_rr(RR6); 623 vcpu->arch.host_rr6 = ia64_get_rr(RR6);
624 vti_set_rr6(vcpu->arch.vmm_rr); 624 vti_set_rr6(vcpu->arch.vmm_rr);
625 local_irq_save(psr); 625 local_irq_save(psr);
626 r = kvm_insert_vmm_mapping(vcpu); 626 r = kvm_insert_vmm_mapping(vcpu);
627 local_irq_restore(psr); 627 local_irq_restore(psr);
628 return r; 628 return r;
629 } 629 }
630 630
631 static void kvm_vcpu_post_transition(struct kvm_vcpu *vcpu) 631 static void kvm_vcpu_post_transition(struct kvm_vcpu *vcpu)
632 { 632 {
633 kvm_purge_vmm_mapping(vcpu); 633 kvm_purge_vmm_mapping(vcpu);
634 vti_set_rr6(vcpu->arch.host_rr6); 634 vti_set_rr6(vcpu->arch.host_rr6);
635 } 635 }
636 636
637 static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 637 static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
638 { 638 {
639 union context *host_ctx, *guest_ctx; 639 union context *host_ctx, *guest_ctx;
640 int r, idx; 640 int r, idx;
641 641
642 idx = srcu_read_lock(&vcpu->kvm->srcu); 642 idx = srcu_read_lock(&vcpu->kvm->srcu);
643 643
644 again: 644 again:
645 if (signal_pending(current)) { 645 if (signal_pending(current)) {
646 r = -EINTR; 646 r = -EINTR;
647 kvm_run->exit_reason = KVM_EXIT_INTR; 647 kvm_run->exit_reason = KVM_EXIT_INTR;
648 goto out; 648 goto out;
649 } 649 }
650 650
651 preempt_disable(); 651 preempt_disable();
652 local_irq_disable(); 652 local_irq_disable();
653 653
654 /*Get host and guest context with guest address space.*/ 654 /*Get host and guest context with guest address space.*/
655 host_ctx = kvm_get_host_context(vcpu); 655 host_ctx = kvm_get_host_context(vcpu);
656 guest_ctx = kvm_get_guest_context(vcpu); 656 guest_ctx = kvm_get_guest_context(vcpu);
657 657
658 clear_bit(KVM_REQ_KICK, &vcpu->requests); 658 clear_bit(KVM_REQ_KICK, &vcpu->requests);
659 659
660 r = kvm_vcpu_pre_transition(vcpu); 660 r = kvm_vcpu_pre_transition(vcpu);
661 if (r < 0) 661 if (r < 0)
662 goto vcpu_run_fail; 662 goto vcpu_run_fail;
663 663
664 srcu_read_unlock(&vcpu->kvm->srcu, idx); 664 srcu_read_unlock(&vcpu->kvm->srcu, idx);
665 kvm_guest_enter(); 665 kvm_guest_enter();
666 666
667 /* 667 /*
668 * Transition to the guest 668 * Transition to the guest
669 */ 669 */
670 kvm_vmm_info->tramp_entry(host_ctx, guest_ctx); 670 kvm_vmm_info->tramp_entry(host_ctx, guest_ctx);
671 671
672 kvm_vcpu_post_transition(vcpu); 672 kvm_vcpu_post_transition(vcpu);
673 673
674 vcpu->arch.launched = 1; 674 vcpu->arch.launched = 1;
675 set_bit(KVM_REQ_KICK, &vcpu->requests); 675 set_bit(KVM_REQ_KICK, &vcpu->requests);
676 local_irq_enable(); 676 local_irq_enable();
677 677
678 /* 678 /*
679 * We must have an instruction between local_irq_enable() and 679 * We must have an instruction between local_irq_enable() and
680 * kvm_guest_exit(), so the timer interrupt isn't delayed by 680 * kvm_guest_exit(), so the timer interrupt isn't delayed by
681 * the interrupt shadow. The stat.exits increment will do nicely. 681 * the interrupt shadow. The stat.exits increment will do nicely.
682 * But we need to prevent reordering, hence this barrier(): 682 * But we need to prevent reordering, hence this barrier():
683 */ 683 */
684 barrier(); 684 barrier();
685 kvm_guest_exit(); 685 kvm_guest_exit();
686 preempt_enable(); 686 preempt_enable();
687 687
688 idx = srcu_read_lock(&vcpu->kvm->srcu); 688 idx = srcu_read_lock(&vcpu->kvm->srcu);
689 689
690 r = kvm_handle_exit(kvm_run, vcpu); 690 r = kvm_handle_exit(kvm_run, vcpu);
691 691
692 if (r > 0) { 692 if (r > 0) {
693 if (!need_resched()) 693 if (!need_resched())
694 goto again; 694 goto again;
695 } 695 }
696 696
697 out: 697 out:
698 srcu_read_unlock(&vcpu->kvm->srcu, idx); 698 srcu_read_unlock(&vcpu->kvm->srcu, idx);
699 if (r > 0) { 699 if (r > 0) {
700 kvm_resched(vcpu); 700 kvm_resched(vcpu);
701 idx = srcu_read_lock(&vcpu->kvm->srcu); 701 idx = srcu_read_lock(&vcpu->kvm->srcu);
702 goto again; 702 goto again;
703 } 703 }
704 704
705 return r; 705 return r;
706 706
707 vcpu_run_fail: 707 vcpu_run_fail:
708 local_irq_enable(); 708 local_irq_enable();
709 preempt_enable(); 709 preempt_enable();
710 kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY; 710 kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
711 goto out; 711 goto out;
712 } 712 }
713 713
714 static void kvm_set_mmio_data(struct kvm_vcpu *vcpu) 714 static void kvm_set_mmio_data(struct kvm_vcpu *vcpu)
715 { 715 {
716 struct kvm_mmio_req *p = kvm_get_vcpu_ioreq(vcpu); 716 struct kvm_mmio_req *p = kvm_get_vcpu_ioreq(vcpu);
717 717
718 if (!vcpu->mmio_is_write) 718 if (!vcpu->mmio_is_write)
719 memcpy(&p->data, vcpu->mmio_data, 8); 719 memcpy(&p->data, vcpu->mmio_data, 8);
720 p->state = STATE_IORESP_READY; 720 p->state = STATE_IORESP_READY;
721 } 721 }
722 722
723 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 723 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
724 { 724 {
725 int r; 725 int r;
726 sigset_t sigsaved; 726 sigset_t sigsaved;
727 727
728 if (vcpu->sigset_active) 728 if (vcpu->sigset_active)
729 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); 729 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
730 730
731 if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { 731 if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
732 kvm_vcpu_block(vcpu); 732 kvm_vcpu_block(vcpu);
733 clear_bit(KVM_REQ_UNHALT, &vcpu->requests); 733 clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
734 r = -EAGAIN; 734 r = -EAGAIN;
735 goto out; 735 goto out;
736 } 736 }
737 737
738 if (vcpu->mmio_needed) { 738 if (vcpu->mmio_needed) {
739 memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8); 739 memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
740 kvm_set_mmio_data(vcpu); 740 kvm_set_mmio_data(vcpu);
741 vcpu->mmio_read_completed = 1; 741 vcpu->mmio_read_completed = 1;
742 vcpu->mmio_needed = 0; 742 vcpu->mmio_needed = 0;
743 } 743 }
744 r = __vcpu_run(vcpu, kvm_run); 744 r = __vcpu_run(vcpu, kvm_run);
745 out: 745 out:
746 if (vcpu->sigset_active) 746 if (vcpu->sigset_active)
747 sigprocmask(SIG_SETMASK, &sigsaved, NULL); 747 sigprocmask(SIG_SETMASK, &sigsaved, NULL);
748 748
749 return r; 749 return r;
750 } 750 }
751 751
752 static struct kvm *kvm_alloc_kvm(void) 752 static struct kvm *kvm_alloc_kvm(void)
753 { 753 {
754 754
755 struct kvm *kvm; 755 struct kvm *kvm;
756 uint64_t vm_base; 756 uint64_t vm_base;
757 757
758 BUG_ON(sizeof(struct kvm) > KVM_VM_STRUCT_SIZE); 758 BUG_ON(sizeof(struct kvm) > KVM_VM_STRUCT_SIZE);
759 759
760 vm_base = __get_free_pages(GFP_KERNEL, get_order(KVM_VM_DATA_SIZE)); 760 vm_base = __get_free_pages(GFP_KERNEL, get_order(KVM_VM_DATA_SIZE));
761 761
762 if (!vm_base) 762 if (!vm_base)
763 return ERR_PTR(-ENOMEM); 763 return ERR_PTR(-ENOMEM);
764 764
765 memset((void *)vm_base, 0, KVM_VM_DATA_SIZE); 765 memset((void *)vm_base, 0, KVM_VM_DATA_SIZE);
766 kvm = (struct kvm *)(vm_base + 766 kvm = (struct kvm *)(vm_base +
767 offsetof(struct kvm_vm_data, kvm_vm_struct)); 767 offsetof(struct kvm_vm_data, kvm_vm_struct));
768 kvm->arch.vm_base = vm_base; 768 kvm->arch.vm_base = vm_base;
769 printk(KERN_DEBUG"kvm: vm's data area:0x%lx\n", vm_base); 769 printk(KERN_DEBUG"kvm: vm's data area:0x%lx\n", vm_base);
770 770
771 return kvm; 771 return kvm;
772 } 772 }
773 773
774 struct kvm_io_range { 774 struct kvm_io_range {
775 unsigned long start; 775 unsigned long start;
776 unsigned long size; 776 unsigned long size;
777 unsigned long type; 777 unsigned long type;
778 }; 778 };
779 779
780 static const struct kvm_io_range io_ranges[] = { 780 static const struct kvm_io_range io_ranges[] = {
781 {VGA_IO_START, VGA_IO_SIZE, GPFN_FRAME_BUFFER}, 781 {VGA_IO_START, VGA_IO_SIZE, GPFN_FRAME_BUFFER},
782 {MMIO_START, MMIO_SIZE, GPFN_LOW_MMIO}, 782 {MMIO_START, MMIO_SIZE, GPFN_LOW_MMIO},
783 {LEGACY_IO_START, LEGACY_IO_SIZE, GPFN_LEGACY_IO}, 783 {LEGACY_IO_START, LEGACY_IO_SIZE, GPFN_LEGACY_IO},
784 {IO_SAPIC_START, IO_SAPIC_SIZE, GPFN_IOSAPIC}, 784 {IO_SAPIC_START, IO_SAPIC_SIZE, GPFN_IOSAPIC},
785 {PIB_START, PIB_SIZE, GPFN_PIB}, 785 {PIB_START, PIB_SIZE, GPFN_PIB},
786 }; 786 };
787 787
788 static void kvm_build_io_pmt(struct kvm *kvm) 788 static void kvm_build_io_pmt(struct kvm *kvm)
789 { 789 {
790 unsigned long i, j; 790 unsigned long i, j;
791 791
792 /* Mark I/O ranges */ 792 /* Mark I/O ranges */
793 for (i = 0; i < (sizeof(io_ranges) / sizeof(struct kvm_io_range)); 793 for (i = 0; i < (sizeof(io_ranges) / sizeof(struct kvm_io_range));
794 i++) { 794 i++) {
795 for (j = io_ranges[i].start; 795 for (j = io_ranges[i].start;
796 j < io_ranges[i].start + io_ranges[i].size; 796 j < io_ranges[i].start + io_ranges[i].size;
797 j += PAGE_SIZE) 797 j += PAGE_SIZE)
798 kvm_set_pmt_entry(kvm, j >> PAGE_SHIFT, 798 kvm_set_pmt_entry(kvm, j >> PAGE_SHIFT,
799 io_ranges[i].type, 0); 799 io_ranges[i].type, 0);
800 } 800 }
801 801
802 } 802 }
803 803
804 /*Use unused rids to virtualize guest rid.*/ 804 /*Use unused rids to virtualize guest rid.*/
805 #define GUEST_PHYSICAL_RR0 0x1739 805 #define GUEST_PHYSICAL_RR0 0x1739
806 #define GUEST_PHYSICAL_RR4 0x2739 806 #define GUEST_PHYSICAL_RR4 0x2739
807 #define VMM_INIT_RR 0x1660 807 #define VMM_INIT_RR 0x1660
808 808
809 static void kvm_init_vm(struct kvm *kvm) 809 static void kvm_init_vm(struct kvm *kvm)
810 { 810 {
811 BUG_ON(!kvm); 811 BUG_ON(!kvm);
812 812
813 kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0; 813 kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0;
814 kvm->arch.metaphysical_rr4 = GUEST_PHYSICAL_RR4; 814 kvm->arch.metaphysical_rr4 = GUEST_PHYSICAL_RR4;
815 kvm->arch.vmm_init_rr = VMM_INIT_RR; 815 kvm->arch.vmm_init_rr = VMM_INIT_RR;
816 816
817 /* 817 /*
818 *Fill P2M entries for MMIO/IO ranges 818 *Fill P2M entries for MMIO/IO ranges
819 */ 819 */
820 kvm_build_io_pmt(kvm); 820 kvm_build_io_pmt(kvm);
821 821
822 INIT_LIST_HEAD(&kvm->arch.assigned_dev_head); 822 INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
823 823
824 /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ 824 /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
825 set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); 825 set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
826 } 826 }
827 827
828 struct kvm *kvm_arch_create_vm(void) 828 struct kvm *kvm_arch_create_vm(void)
829 { 829 {
830 struct kvm *kvm = kvm_alloc_kvm(); 830 struct kvm *kvm = kvm_alloc_kvm();
831 831
832 if (IS_ERR(kvm)) 832 if (IS_ERR(kvm))
833 return ERR_PTR(-ENOMEM); 833 return ERR_PTR(-ENOMEM);
834 834
835 kvm->arch.is_sn2 = ia64_platform_is("sn2"); 835 kvm->arch.is_sn2 = ia64_platform_is("sn2");
836 836
837 kvm_init_vm(kvm); 837 kvm_init_vm(kvm);
838 838
839 return kvm; 839 return kvm;
840 840
841 } 841 }
842 842
843 static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, 843 static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm,
844 struct kvm_irqchip *chip) 844 struct kvm_irqchip *chip)
845 { 845 {
846 int r; 846 int r;
847 847
848 r = 0; 848 r = 0;
849 switch (chip->chip_id) { 849 switch (chip->chip_id) {
850 case KVM_IRQCHIP_IOAPIC: 850 case KVM_IRQCHIP_IOAPIC:
851 r = kvm_get_ioapic(kvm, &chip->chip.ioapic); 851 r = kvm_get_ioapic(kvm, &chip->chip.ioapic);
852 break; 852 break;
853 default: 853 default:
854 r = -EINVAL; 854 r = -EINVAL;
855 break; 855 break;
856 } 856 }
857 return r; 857 return r;
858 } 858 }
859 859
860 static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) 860 static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
861 { 861 {
862 int r; 862 int r;
863 863
864 r = 0; 864 r = 0;
865 switch (chip->chip_id) { 865 switch (chip->chip_id) {
866 case KVM_IRQCHIP_IOAPIC: 866 case KVM_IRQCHIP_IOAPIC:
867 r = kvm_set_ioapic(kvm, &chip->chip.ioapic); 867 r = kvm_set_ioapic(kvm, &chip->chip.ioapic);
868 break; 868 break;
869 default: 869 default:
870 r = -EINVAL; 870 r = -EINVAL;
871 break; 871 break;
872 } 872 }
873 return r; 873 return r;
874 } 874 }
875 875
876 #define RESTORE_REGS(_x) vcpu->arch._x = regs->_x 876 #define RESTORE_REGS(_x) vcpu->arch._x = regs->_x
877 877
878 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) 878 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
879 { 879 {
880 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); 880 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
881 int i; 881 int i;
882 882
883 for (i = 0; i < 16; i++) { 883 for (i = 0; i < 16; i++) {
884 vpd->vgr[i] = regs->vpd.vgr[i]; 884 vpd->vgr[i] = regs->vpd.vgr[i];
885 vpd->vbgr[i] = regs->vpd.vbgr[i]; 885 vpd->vbgr[i] = regs->vpd.vbgr[i];
886 } 886 }
887 for (i = 0; i < 128; i++) 887 for (i = 0; i < 128; i++)
888 vpd->vcr[i] = regs->vpd.vcr[i]; 888 vpd->vcr[i] = regs->vpd.vcr[i];
889 vpd->vhpi = regs->vpd.vhpi; 889 vpd->vhpi = regs->vpd.vhpi;
890 vpd->vnat = regs->vpd.vnat; 890 vpd->vnat = regs->vpd.vnat;
891 vpd->vbnat = regs->vpd.vbnat; 891 vpd->vbnat = regs->vpd.vbnat;
892 vpd->vpsr = regs->vpd.vpsr; 892 vpd->vpsr = regs->vpd.vpsr;
893 893
894 vpd->vpr = regs->vpd.vpr; 894 vpd->vpr = regs->vpd.vpr;
895 895
896 memcpy(&vcpu->arch.guest, &regs->saved_guest, sizeof(union context)); 896 memcpy(&vcpu->arch.guest, &regs->saved_guest, sizeof(union context));
897 897
898 RESTORE_REGS(mp_state); 898 RESTORE_REGS(mp_state);
899 RESTORE_REGS(vmm_rr); 899 RESTORE_REGS(vmm_rr);
900 memcpy(vcpu->arch.itrs, regs->itrs, sizeof(struct thash_data) * NITRS); 900 memcpy(vcpu->arch.itrs, regs->itrs, sizeof(struct thash_data) * NITRS);
901 memcpy(vcpu->arch.dtrs, regs->dtrs, sizeof(struct thash_data) * NDTRS); 901 memcpy(vcpu->arch.dtrs, regs->dtrs, sizeof(struct thash_data) * NDTRS);
902 RESTORE_REGS(itr_regions); 902 RESTORE_REGS(itr_regions);
903 RESTORE_REGS(dtr_regions); 903 RESTORE_REGS(dtr_regions);
904 RESTORE_REGS(tc_regions); 904 RESTORE_REGS(tc_regions);
905 RESTORE_REGS(irq_check); 905 RESTORE_REGS(irq_check);
906 RESTORE_REGS(itc_check); 906 RESTORE_REGS(itc_check);
907 RESTORE_REGS(timer_check); 907 RESTORE_REGS(timer_check);
908 RESTORE_REGS(timer_pending); 908 RESTORE_REGS(timer_pending);
909 RESTORE_REGS(last_itc); 909 RESTORE_REGS(last_itc);
910 for (i = 0; i < 8; i++) { 910 for (i = 0; i < 8; i++) {
911 vcpu->arch.vrr[i] = regs->vrr[i]; 911 vcpu->arch.vrr[i] = regs->vrr[i];
912 vcpu->arch.ibr[i] = regs->ibr[i]; 912 vcpu->arch.ibr[i] = regs->ibr[i];
913 vcpu->arch.dbr[i] = regs->dbr[i]; 913 vcpu->arch.dbr[i] = regs->dbr[i];
914 } 914 }
915 for (i = 0; i < 4; i++) 915 for (i = 0; i < 4; i++)
916 vcpu->arch.insvc[i] = regs->insvc[i]; 916 vcpu->arch.insvc[i] = regs->insvc[i];
917 RESTORE_REGS(xtp); 917 RESTORE_REGS(xtp);
918 RESTORE_REGS(metaphysical_rr0); 918 RESTORE_REGS(metaphysical_rr0);
919 RESTORE_REGS(metaphysical_rr4); 919 RESTORE_REGS(metaphysical_rr4);
920 RESTORE_REGS(metaphysical_saved_rr0); 920 RESTORE_REGS(metaphysical_saved_rr0);
921 RESTORE_REGS(metaphysical_saved_rr4); 921 RESTORE_REGS(metaphysical_saved_rr4);
922 RESTORE_REGS(fp_psr); 922 RESTORE_REGS(fp_psr);
923 RESTORE_REGS(saved_gp); 923 RESTORE_REGS(saved_gp);
924 924
925 vcpu->arch.irq_new_pending = 1; 925 vcpu->arch.irq_new_pending = 1;
926 vcpu->arch.itc_offset = regs->saved_itc - kvm_get_itc(vcpu); 926 vcpu->arch.itc_offset = regs->saved_itc - kvm_get_itc(vcpu);
927 set_bit(KVM_REQ_RESUME, &vcpu->requests); 927 set_bit(KVM_REQ_RESUME, &vcpu->requests);
928 928
929 return 0; 929 return 0;
930 } 930 }
931 931
932 long kvm_arch_vm_ioctl(struct file *filp, 932 long kvm_arch_vm_ioctl(struct file *filp,
933 unsigned int ioctl, unsigned long arg) 933 unsigned int ioctl, unsigned long arg)
934 { 934 {
935 struct kvm *kvm = filp->private_data; 935 struct kvm *kvm = filp->private_data;
936 void __user *argp = (void __user *)arg; 936 void __user *argp = (void __user *)arg;
937 int r = -ENOTTY; 937 int r = -ENOTTY;
938 938
939 switch (ioctl) { 939 switch (ioctl) {
940 case KVM_SET_MEMORY_REGION: { 940 case KVM_SET_MEMORY_REGION: {
941 struct kvm_memory_region kvm_mem; 941 struct kvm_memory_region kvm_mem;
942 struct kvm_userspace_memory_region kvm_userspace_mem; 942 struct kvm_userspace_memory_region kvm_userspace_mem;
943 943
944 r = -EFAULT; 944 r = -EFAULT;
945 if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem)) 945 if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem))
946 goto out; 946 goto out;
947 kvm_userspace_mem.slot = kvm_mem.slot; 947 kvm_userspace_mem.slot = kvm_mem.slot;
948 kvm_userspace_mem.flags = kvm_mem.flags; 948 kvm_userspace_mem.flags = kvm_mem.flags;
949 kvm_userspace_mem.guest_phys_addr = 949 kvm_userspace_mem.guest_phys_addr =
950 kvm_mem.guest_phys_addr; 950 kvm_mem.guest_phys_addr;
951 kvm_userspace_mem.memory_size = kvm_mem.memory_size; 951 kvm_userspace_mem.memory_size = kvm_mem.memory_size;
952 r = kvm_vm_ioctl_set_memory_region(kvm, 952 r = kvm_vm_ioctl_set_memory_region(kvm,
953 &kvm_userspace_mem, 0); 953 &kvm_userspace_mem, 0);
954 if (r) 954 if (r)
955 goto out; 955 goto out;
956 break; 956 break;
957 } 957 }
958 case KVM_CREATE_IRQCHIP: 958 case KVM_CREATE_IRQCHIP:
959 r = -EFAULT; 959 r = -EFAULT;
960 r = kvm_ioapic_init(kvm); 960 r = kvm_ioapic_init(kvm);
961 if (r) 961 if (r)
962 goto out; 962 goto out;
963 r = kvm_setup_default_irq_routing(kvm); 963 r = kvm_setup_default_irq_routing(kvm);
964 if (r) { 964 if (r) {
965 kvm_ioapic_destroy(kvm); 965 kvm_ioapic_destroy(kvm);
966 goto out; 966 goto out;
967 } 967 }
968 break; 968 break;
969 case KVM_IRQ_LINE_STATUS: 969 case KVM_IRQ_LINE_STATUS:
970 case KVM_IRQ_LINE: { 970 case KVM_IRQ_LINE: {
971 struct kvm_irq_level irq_event; 971 struct kvm_irq_level irq_event;
972 972
973 r = -EFAULT; 973 r = -EFAULT;
974 if (copy_from_user(&irq_event, argp, sizeof irq_event)) 974 if (copy_from_user(&irq_event, argp, sizeof irq_event))
975 goto out; 975 goto out;
976 r = -ENXIO; 976 r = -ENXIO;
977 if (irqchip_in_kernel(kvm)) { 977 if (irqchip_in_kernel(kvm)) {
978 __s32 status; 978 __s32 status;
979 status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 979 status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
980 irq_event.irq, irq_event.level); 980 irq_event.irq, irq_event.level);
981 if (ioctl == KVM_IRQ_LINE_STATUS) { 981 if (ioctl == KVM_IRQ_LINE_STATUS) {
982 r = -EFAULT; 982 r = -EFAULT;
983 irq_event.status = status; 983 irq_event.status = status;
984 if (copy_to_user(argp, &irq_event, 984 if (copy_to_user(argp, &irq_event,
985 sizeof irq_event)) 985 sizeof irq_event))
986 goto out; 986 goto out;
987 } 987 }
988 r = 0; 988 r = 0;
989 } 989 }
990 break; 990 break;
991 } 991 }
992 case KVM_GET_IRQCHIP: { 992 case KVM_GET_IRQCHIP: {
993 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ 993 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */
994 struct kvm_irqchip chip; 994 struct kvm_irqchip chip;
995 995
996 r = -EFAULT; 996 r = -EFAULT;
997 if (copy_from_user(&chip, argp, sizeof chip)) 997 if (copy_from_user(&chip, argp, sizeof chip))
998 goto out; 998 goto out;
999 r = -ENXIO; 999 r = -ENXIO;
1000 if (!irqchip_in_kernel(kvm)) 1000 if (!irqchip_in_kernel(kvm))
1001 goto out; 1001 goto out;
1002 r = kvm_vm_ioctl_get_irqchip(kvm, &chip); 1002 r = kvm_vm_ioctl_get_irqchip(kvm, &chip);
1003 if (r) 1003 if (r)
1004 goto out; 1004 goto out;
1005 r = -EFAULT; 1005 r = -EFAULT;
1006 if (copy_to_user(argp, &chip, sizeof chip)) 1006 if (copy_to_user(argp, &chip, sizeof chip))
1007 goto out; 1007 goto out;
1008 r = 0; 1008 r = 0;
1009 break; 1009 break;
1010 } 1010 }
1011 case KVM_SET_IRQCHIP: { 1011 case KVM_SET_IRQCHIP: {
1012 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ 1012 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */
1013 struct kvm_irqchip chip; 1013 struct kvm_irqchip chip;
1014 1014
1015 r = -EFAULT; 1015 r = -EFAULT;
1016 if (copy_from_user(&chip, argp, sizeof chip)) 1016 if (copy_from_user(&chip, argp, sizeof chip))
1017 goto out; 1017 goto out;
1018 r = -ENXIO; 1018 r = -ENXIO;
1019 if (!irqchip_in_kernel(kvm)) 1019 if (!irqchip_in_kernel(kvm))
1020 goto out; 1020 goto out;
1021 r = kvm_vm_ioctl_set_irqchip(kvm, &chip); 1021 r = kvm_vm_ioctl_set_irqchip(kvm, &chip);
1022 if (r) 1022 if (r)
1023 goto out; 1023 goto out;
1024 r = 0; 1024 r = 0;
1025 break; 1025 break;
1026 } 1026 }
1027 default: 1027 default:
1028 ; 1028 ;
1029 } 1029 }
1030 out: 1030 out:
1031 return r; 1031 return r;
1032 } 1032 }
1033 1033
1034 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, 1034 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
1035 struct kvm_sregs *sregs) 1035 struct kvm_sregs *sregs)
1036 { 1036 {
1037 return -EINVAL; 1037 return -EINVAL;
1038 } 1038 }
1039 1039
1040 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, 1040 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
1041 struct kvm_sregs *sregs) 1041 struct kvm_sregs *sregs)
1042 { 1042 {
1043 return -EINVAL; 1043 return -EINVAL;
1044 1044
1045 } 1045 }
1046 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, 1046 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
1047 struct kvm_translation *tr) 1047 struct kvm_translation *tr)
1048 { 1048 {
1049 1049
1050 return -EINVAL; 1050 return -EINVAL;
1051 } 1051 }
1052 1052
1053 static int kvm_alloc_vmm_area(void) 1053 static int kvm_alloc_vmm_area(void)
1054 { 1054 {
1055 if (!kvm_vmm_base && (kvm_vm_buffer_size < KVM_VM_BUFFER_SIZE)) { 1055 if (!kvm_vmm_base && (kvm_vm_buffer_size < KVM_VM_BUFFER_SIZE)) {
1056 kvm_vmm_base = __get_free_pages(GFP_KERNEL, 1056 kvm_vmm_base = __get_free_pages(GFP_KERNEL,
1057 get_order(KVM_VMM_SIZE)); 1057 get_order(KVM_VMM_SIZE));
1058 if (!kvm_vmm_base) 1058 if (!kvm_vmm_base)
1059 return -ENOMEM; 1059 return -ENOMEM;
1060 1060
1061 memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE); 1061 memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE);
1062 kvm_vm_buffer = kvm_vmm_base + VMM_SIZE; 1062 kvm_vm_buffer = kvm_vmm_base + VMM_SIZE;
1063 1063
1064 printk(KERN_DEBUG"kvm:VMM's Base Addr:0x%lx, vm_buffer:0x%lx\n", 1064 printk(KERN_DEBUG"kvm:VMM's Base Addr:0x%lx, vm_buffer:0x%lx\n",
1065 kvm_vmm_base, kvm_vm_buffer); 1065 kvm_vmm_base, kvm_vm_buffer);
1066 } 1066 }
1067 1067
1068 return 0; 1068 return 0;
1069 } 1069 }
1070 1070
1071 static void kvm_free_vmm_area(void) 1071 static void kvm_free_vmm_area(void)
1072 { 1072 {
1073 if (kvm_vmm_base) { 1073 if (kvm_vmm_base) {
1074 /*Zero this area before free to avoid bits leak!!*/ 1074 /*Zero this area before free to avoid bits leak!!*/
1075 memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE); 1075 memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE);
1076 free_pages(kvm_vmm_base, get_order(KVM_VMM_SIZE)); 1076 free_pages(kvm_vmm_base, get_order(KVM_VMM_SIZE));
1077 kvm_vmm_base = 0; 1077 kvm_vmm_base = 0;
1078 kvm_vm_buffer = 0; 1078 kvm_vm_buffer = 0;
1079 kvm_vsa_base = 0; 1079 kvm_vsa_base = 0;
1080 } 1080 }
1081 } 1081 }
1082 1082
1083 static int vti_init_vpd(struct kvm_vcpu *vcpu) 1083 static int vti_init_vpd(struct kvm_vcpu *vcpu)
1084 { 1084 {
1085 int i; 1085 int i;
1086 union cpuid3_t cpuid3; 1086 union cpuid3_t cpuid3;
1087 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); 1087 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
1088 1088
1089 if (IS_ERR(vpd)) 1089 if (IS_ERR(vpd))
1090 return PTR_ERR(vpd); 1090 return PTR_ERR(vpd);
1091 1091
1092 /* CPUID init */ 1092 /* CPUID init */
1093 for (i = 0; i < 5; i++) 1093 for (i = 0; i < 5; i++)
1094 vpd->vcpuid[i] = ia64_get_cpuid(i); 1094 vpd->vcpuid[i] = ia64_get_cpuid(i);
1095 1095
1096 /* Limit the CPUID number to 5 */ 1096 /* Limit the CPUID number to 5 */
1097 cpuid3.value = vpd->vcpuid[3]; 1097 cpuid3.value = vpd->vcpuid[3];
1098 cpuid3.number = 4; /* 5 - 1 */ 1098 cpuid3.number = 4; /* 5 - 1 */
1099 vpd->vcpuid[3] = cpuid3.value; 1099 vpd->vcpuid[3] = cpuid3.value;
1100 1100
1101 /*Set vac and vdc fields*/ 1101 /*Set vac and vdc fields*/
1102 vpd->vac.a_from_int_cr = 1; 1102 vpd->vac.a_from_int_cr = 1;
1103 vpd->vac.a_to_int_cr = 1; 1103 vpd->vac.a_to_int_cr = 1;
1104 vpd->vac.a_from_psr = 1; 1104 vpd->vac.a_from_psr = 1;
1105 vpd->vac.a_from_cpuid = 1; 1105 vpd->vac.a_from_cpuid = 1;
1106 vpd->vac.a_cover = 1; 1106 vpd->vac.a_cover = 1;
1107 vpd->vac.a_bsw = 1; 1107 vpd->vac.a_bsw = 1;
1108 vpd->vac.a_int = 1; 1108 vpd->vac.a_int = 1;
1109 vpd->vdc.d_vmsw = 1; 1109 vpd->vdc.d_vmsw = 1;
1110 1110
1111 /*Set virtual buffer*/ 1111 /*Set virtual buffer*/
1112 vpd->virt_env_vaddr = KVM_VM_BUFFER_BASE; 1112 vpd->virt_env_vaddr = KVM_VM_BUFFER_BASE;
1113 1113
1114 return 0; 1114 return 0;
1115 } 1115 }
1116 1116
1117 static int vti_create_vp(struct kvm_vcpu *vcpu) 1117 static int vti_create_vp(struct kvm_vcpu *vcpu)
1118 { 1118 {
1119 long ret; 1119 long ret;
1120 struct vpd *vpd = vcpu->arch.vpd; 1120 struct vpd *vpd = vcpu->arch.vpd;
1121 unsigned long vmm_ivt; 1121 unsigned long vmm_ivt;
1122 1122
1123 vmm_ivt = kvm_vmm_info->vmm_ivt; 1123 vmm_ivt = kvm_vmm_info->vmm_ivt;
1124 1124
1125 printk(KERN_DEBUG "kvm: vcpu:%p,ivt: 0x%lx\n", vcpu, vmm_ivt); 1125 printk(KERN_DEBUG "kvm: vcpu:%p,ivt: 0x%lx\n", vcpu, vmm_ivt);
1126 1126
1127 ret = ia64_pal_vp_create((u64 *)vpd, (u64 *)vmm_ivt, 0); 1127 ret = ia64_pal_vp_create((u64 *)vpd, (u64 *)vmm_ivt, 0);
1128 1128
1129 if (ret) { 1129 if (ret) {
1130 printk(KERN_ERR"kvm: ia64_pal_vp_create failed!\n"); 1130 printk(KERN_ERR"kvm: ia64_pal_vp_create failed!\n");
1131 return -EINVAL; 1131 return -EINVAL;
1132 } 1132 }
1133 return 0; 1133 return 0;
1134 } 1134 }
1135 1135
1136 static void init_ptce_info(struct kvm_vcpu *vcpu) 1136 static void init_ptce_info(struct kvm_vcpu *vcpu)
1137 { 1137 {
1138 ia64_ptce_info_t ptce = {0}; 1138 ia64_ptce_info_t ptce = {0};
1139 1139
1140 ia64_get_ptce(&ptce); 1140 ia64_get_ptce(&ptce);
1141 vcpu->arch.ptce_base = ptce.base; 1141 vcpu->arch.ptce_base = ptce.base;
1142 vcpu->arch.ptce_count[0] = ptce.count[0]; 1142 vcpu->arch.ptce_count[0] = ptce.count[0];
1143 vcpu->arch.ptce_count[1] = ptce.count[1]; 1143 vcpu->arch.ptce_count[1] = ptce.count[1];
1144 vcpu->arch.ptce_stride[0] = ptce.stride[0]; 1144 vcpu->arch.ptce_stride[0] = ptce.stride[0];
1145 vcpu->arch.ptce_stride[1] = ptce.stride[1]; 1145 vcpu->arch.ptce_stride[1] = ptce.stride[1];
1146 } 1146 }
1147 1147
1148 static void kvm_migrate_hlt_timer(struct kvm_vcpu *vcpu) 1148 static void kvm_migrate_hlt_timer(struct kvm_vcpu *vcpu)
1149 { 1149 {
1150 struct hrtimer *p_ht = &vcpu->arch.hlt_timer; 1150 struct hrtimer *p_ht = &vcpu->arch.hlt_timer;
1151 1151
1152 if (hrtimer_cancel(p_ht)) 1152 if (hrtimer_cancel(p_ht))
1153 hrtimer_start_expires(p_ht, HRTIMER_MODE_ABS); 1153 hrtimer_start_expires(p_ht, HRTIMER_MODE_ABS);
1154 } 1154 }
1155 1155
1156 static enum hrtimer_restart hlt_timer_fn(struct hrtimer *data) 1156 static enum hrtimer_restart hlt_timer_fn(struct hrtimer *data)
1157 { 1157 {
1158 struct kvm_vcpu *vcpu; 1158 struct kvm_vcpu *vcpu;
1159 wait_queue_head_t *q; 1159 wait_queue_head_t *q;
1160 1160
1161 vcpu = container_of(data, struct kvm_vcpu, arch.hlt_timer); 1161 vcpu = container_of(data, struct kvm_vcpu, arch.hlt_timer);
1162 q = &vcpu->wq; 1162 q = &vcpu->wq;
1163 1163
1164 if (vcpu->arch.mp_state != KVM_MP_STATE_HALTED) 1164 if (vcpu->arch.mp_state != KVM_MP_STATE_HALTED)
1165 goto out; 1165 goto out;
1166 1166
1167 if (waitqueue_active(q)) 1167 if (waitqueue_active(q))
1168 wake_up_interruptible(q); 1168 wake_up_interruptible(q);
1169 1169
1170 out: 1170 out:
1171 vcpu->arch.timer_fired = 1; 1171 vcpu->arch.timer_fired = 1;
1172 vcpu->arch.timer_check = 1; 1172 vcpu->arch.timer_check = 1;
1173 return HRTIMER_NORESTART; 1173 return HRTIMER_NORESTART;
1174 } 1174 }
1175 1175
1176 #define PALE_RESET_ENTRY 0x80000000ffffffb0UL 1176 #define PALE_RESET_ENTRY 0x80000000ffffffb0UL
1177 1177
1178 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) 1178 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
1179 { 1179 {
1180 struct kvm_vcpu *v; 1180 struct kvm_vcpu *v;
1181 int r; 1181 int r;
1182 int i; 1182 int i;
1183 long itc_offset; 1183 long itc_offset;
1184 struct kvm *kvm = vcpu->kvm; 1184 struct kvm *kvm = vcpu->kvm;
1185 struct kvm_pt_regs *regs = vcpu_regs(vcpu); 1185 struct kvm_pt_regs *regs = vcpu_regs(vcpu);
1186 1186
1187 union context *p_ctx = &vcpu->arch.guest; 1187 union context *p_ctx = &vcpu->arch.guest;
1188 struct kvm_vcpu *vmm_vcpu = to_guest(vcpu->kvm, vcpu); 1188 struct kvm_vcpu *vmm_vcpu = to_guest(vcpu->kvm, vcpu);
1189 1189
1190 /*Init vcpu context for first run.*/ 1190 /*Init vcpu context for first run.*/
1191 if (IS_ERR(vmm_vcpu)) 1191 if (IS_ERR(vmm_vcpu))
1192 return PTR_ERR(vmm_vcpu); 1192 return PTR_ERR(vmm_vcpu);
1193 1193
1194 if (kvm_vcpu_is_bsp(vcpu)) { 1194 if (kvm_vcpu_is_bsp(vcpu)) {
1195 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; 1195 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
1196 1196
1197 /*Set entry address for first run.*/ 1197 /*Set entry address for first run.*/
1198 regs->cr_iip = PALE_RESET_ENTRY; 1198 regs->cr_iip = PALE_RESET_ENTRY;
1199 1199
1200 /*Initialize itc offset for vcpus*/ 1200 /*Initialize itc offset for vcpus*/
1201 itc_offset = 0UL - kvm_get_itc(vcpu); 1201 itc_offset = 0UL - kvm_get_itc(vcpu);
1202 for (i = 0; i < KVM_MAX_VCPUS; i++) { 1202 for (i = 0; i < KVM_MAX_VCPUS; i++) {
1203 v = (struct kvm_vcpu *)((char *)vcpu + 1203 v = (struct kvm_vcpu *)((char *)vcpu +
1204 sizeof(struct kvm_vcpu_data) * i); 1204 sizeof(struct kvm_vcpu_data) * i);
1205 v->arch.itc_offset = itc_offset; 1205 v->arch.itc_offset = itc_offset;
1206 v->arch.last_itc = 0; 1206 v->arch.last_itc = 0;
1207 } 1207 }
1208 } else 1208 } else
1209 vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; 1209 vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED;
1210 1210
1211 r = -ENOMEM; 1211 r = -ENOMEM;
1212 vcpu->arch.apic = kzalloc(sizeof(struct kvm_lapic), GFP_KERNEL); 1212 vcpu->arch.apic = kzalloc(sizeof(struct kvm_lapic), GFP_KERNEL);
1213 if (!vcpu->arch.apic) 1213 if (!vcpu->arch.apic)
1214 goto out; 1214 goto out;
1215 vcpu->arch.apic->vcpu = vcpu; 1215 vcpu->arch.apic->vcpu = vcpu;
1216 1216
1217 p_ctx->gr[1] = 0; 1217 p_ctx->gr[1] = 0;
1218 p_ctx->gr[12] = (unsigned long)((char *)vmm_vcpu + KVM_STK_OFFSET); 1218 p_ctx->gr[12] = (unsigned long)((char *)vmm_vcpu + KVM_STK_OFFSET);
1219 p_ctx->gr[13] = (unsigned long)vmm_vcpu; 1219 p_ctx->gr[13] = (unsigned long)vmm_vcpu;
1220 p_ctx->psr = 0x1008522000UL; 1220 p_ctx->psr = 0x1008522000UL;
1221 p_ctx->ar[40] = FPSR_DEFAULT; /*fpsr*/ 1221 p_ctx->ar[40] = FPSR_DEFAULT; /*fpsr*/
1222 p_ctx->caller_unat = 0; 1222 p_ctx->caller_unat = 0;
1223 p_ctx->pr = 0x0; 1223 p_ctx->pr = 0x0;
1224 p_ctx->ar[36] = 0x0; /*unat*/ 1224 p_ctx->ar[36] = 0x0; /*unat*/
1225 p_ctx->ar[19] = 0x0; /*rnat*/ 1225 p_ctx->ar[19] = 0x0; /*rnat*/
1226 p_ctx->ar[18] = (unsigned long)vmm_vcpu + 1226 p_ctx->ar[18] = (unsigned long)vmm_vcpu +
1227 ((sizeof(struct kvm_vcpu)+15) & ~15); 1227 ((sizeof(struct kvm_vcpu)+15) & ~15);
1228 p_ctx->ar[64] = 0x0; /*pfs*/ 1228 p_ctx->ar[64] = 0x0; /*pfs*/
1229 p_ctx->cr[0] = 0x7e04UL; 1229 p_ctx->cr[0] = 0x7e04UL;
1230 p_ctx->cr[2] = (unsigned long)kvm_vmm_info->vmm_ivt; 1230 p_ctx->cr[2] = (unsigned long)kvm_vmm_info->vmm_ivt;
1231 p_ctx->cr[8] = 0x3c; 1231 p_ctx->cr[8] = 0x3c;
1232 1232
1233 /*Initialize region register*/ 1233 /*Initialize region register*/
1234 p_ctx->rr[0] = 0x30; 1234 p_ctx->rr[0] = 0x30;
1235 p_ctx->rr[1] = 0x30; 1235 p_ctx->rr[1] = 0x30;
1236 p_ctx->rr[2] = 0x30; 1236 p_ctx->rr[2] = 0x30;
1237 p_ctx->rr[3] = 0x30; 1237 p_ctx->rr[3] = 0x30;
1238 p_ctx->rr[4] = 0x30; 1238 p_ctx->rr[4] = 0x30;
1239 p_ctx->rr[5] = 0x30; 1239 p_ctx->rr[5] = 0x30;
1240 p_ctx->rr[7] = 0x30; 1240 p_ctx->rr[7] = 0x30;
1241 1241
1242 /*Initialize branch register 0*/ 1242 /*Initialize branch register 0*/
1243 p_ctx->br[0] = *(unsigned long *)kvm_vmm_info->vmm_entry; 1243 p_ctx->br[0] = *(unsigned long *)kvm_vmm_info->vmm_entry;
1244 1244
1245 vcpu->arch.vmm_rr = kvm->arch.vmm_init_rr; 1245 vcpu->arch.vmm_rr = kvm->arch.vmm_init_rr;
1246 vcpu->arch.metaphysical_rr0 = kvm->arch.metaphysical_rr0; 1246 vcpu->arch.metaphysical_rr0 = kvm->arch.metaphysical_rr0;
1247 vcpu->arch.metaphysical_rr4 = kvm->arch.metaphysical_rr4; 1247 vcpu->arch.metaphysical_rr4 = kvm->arch.metaphysical_rr4;
1248 1248
1249 hrtimer_init(&vcpu->arch.hlt_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); 1249 hrtimer_init(&vcpu->arch.hlt_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
1250 vcpu->arch.hlt_timer.function = hlt_timer_fn; 1250 vcpu->arch.hlt_timer.function = hlt_timer_fn;
1251 1251
1252 vcpu->arch.last_run_cpu = -1; 1252 vcpu->arch.last_run_cpu = -1;
1253 vcpu->arch.vpd = (struct vpd *)VPD_BASE(vcpu->vcpu_id); 1253 vcpu->arch.vpd = (struct vpd *)VPD_BASE(vcpu->vcpu_id);
1254 vcpu->arch.vsa_base = kvm_vsa_base; 1254 vcpu->arch.vsa_base = kvm_vsa_base;
1255 vcpu->arch.__gp = kvm_vmm_gp; 1255 vcpu->arch.__gp = kvm_vmm_gp;
1256 vcpu->arch.dirty_log_lock_pa = __pa(&kvm->arch.dirty_log_lock); 1256 vcpu->arch.dirty_log_lock_pa = __pa(&kvm->arch.dirty_log_lock);
1257 vcpu->arch.vhpt.hash = (struct thash_data *)VHPT_BASE(vcpu->vcpu_id); 1257 vcpu->arch.vhpt.hash = (struct thash_data *)VHPT_BASE(vcpu->vcpu_id);
1258 vcpu->arch.vtlb.hash = (struct thash_data *)VTLB_BASE(vcpu->vcpu_id); 1258 vcpu->arch.vtlb.hash = (struct thash_data *)VTLB_BASE(vcpu->vcpu_id);
1259 init_ptce_info(vcpu); 1259 init_ptce_info(vcpu);
1260 1260
1261 r = 0; 1261 r = 0;
1262 out: 1262 out:
1263 return r; 1263 return r;
1264 } 1264 }
1265 1265
1266 static int vti_vcpu_setup(struct kvm_vcpu *vcpu, int id) 1266 static int vti_vcpu_setup(struct kvm_vcpu *vcpu, int id)
1267 { 1267 {
1268 unsigned long psr; 1268 unsigned long psr;
1269 int r; 1269 int r;
1270 1270
1271 local_irq_save(psr); 1271 local_irq_save(psr);
1272 r = kvm_insert_vmm_mapping(vcpu); 1272 r = kvm_insert_vmm_mapping(vcpu);
1273 local_irq_restore(psr); 1273 local_irq_restore(psr);
1274 if (r) 1274 if (r)
1275 goto fail; 1275 goto fail;
1276 r = kvm_vcpu_init(vcpu, vcpu->kvm, id); 1276 r = kvm_vcpu_init(vcpu, vcpu->kvm, id);
1277 if (r) 1277 if (r)
1278 goto fail; 1278 goto fail;
1279 1279
1280 r = vti_init_vpd(vcpu); 1280 r = vti_init_vpd(vcpu);
1281 if (r) { 1281 if (r) {
1282 printk(KERN_DEBUG"kvm: vpd init error!!\n"); 1282 printk(KERN_DEBUG"kvm: vpd init error!!\n");
1283 goto uninit; 1283 goto uninit;
1284 } 1284 }
1285 1285
1286 r = vti_create_vp(vcpu); 1286 r = vti_create_vp(vcpu);
1287 if (r) 1287 if (r)
1288 goto uninit; 1288 goto uninit;
1289 1289
1290 kvm_purge_vmm_mapping(vcpu); 1290 kvm_purge_vmm_mapping(vcpu);
1291 1291
1292 return 0; 1292 return 0;
1293 uninit: 1293 uninit:
1294 kvm_vcpu_uninit(vcpu); 1294 kvm_vcpu_uninit(vcpu);
1295 fail: 1295 fail:
1296 return r; 1296 return r;
1297 } 1297 }
1298 1298
1299 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 1299 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
1300 unsigned int id) 1300 unsigned int id)
1301 { 1301 {
1302 struct kvm_vcpu *vcpu; 1302 struct kvm_vcpu *vcpu;
1303 unsigned long vm_base = kvm->arch.vm_base; 1303 unsigned long vm_base = kvm->arch.vm_base;
1304 int r; 1304 int r;
1305 int cpu; 1305 int cpu;
1306 1306
1307 BUG_ON(sizeof(struct kvm_vcpu) > VCPU_STRUCT_SIZE/2); 1307 BUG_ON(sizeof(struct kvm_vcpu) > VCPU_STRUCT_SIZE/2);
1308 1308
1309 r = -EINVAL; 1309 r = -EINVAL;
1310 if (id >= KVM_MAX_VCPUS) { 1310 if (id >= KVM_MAX_VCPUS) {
1311 printk(KERN_ERR"kvm: Can't configure vcpus > %ld", 1311 printk(KERN_ERR"kvm: Can't configure vcpus > %ld",
1312 KVM_MAX_VCPUS); 1312 KVM_MAX_VCPUS);
1313 goto fail; 1313 goto fail;
1314 } 1314 }
1315 1315
1316 r = -ENOMEM; 1316 r = -ENOMEM;
1317 if (!vm_base) { 1317 if (!vm_base) {
1318 printk(KERN_ERR"kvm: Create vcpu[%d] error!\n", id); 1318 printk(KERN_ERR"kvm: Create vcpu[%d] error!\n", id);
1319 goto fail; 1319 goto fail;
1320 } 1320 }
1321 vcpu = (struct kvm_vcpu *)(vm_base + offsetof(struct kvm_vm_data, 1321 vcpu = (struct kvm_vcpu *)(vm_base + offsetof(struct kvm_vm_data,
1322 vcpu_data[id].vcpu_struct)); 1322 vcpu_data[id].vcpu_struct));
1323 vcpu->kvm = kvm; 1323 vcpu->kvm = kvm;
1324 1324
1325 cpu = get_cpu(); 1325 cpu = get_cpu();
1326 r = vti_vcpu_setup(vcpu, id); 1326 r = vti_vcpu_setup(vcpu, id);
1327 put_cpu(); 1327 put_cpu();
1328 1328
1329 if (r) { 1329 if (r) {
1330 printk(KERN_DEBUG"kvm: vcpu_setup error!!\n"); 1330 printk(KERN_DEBUG"kvm: vcpu_setup error!!\n");
1331 goto fail; 1331 goto fail;
1332 } 1332 }
1333 1333
1334 return vcpu; 1334 return vcpu;
1335 fail: 1335 fail:
1336 return ERR_PTR(r); 1336 return ERR_PTR(r);
1337 } 1337 }
1338 1338
1339 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) 1339 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
1340 { 1340 {
1341 return 0; 1341 return 0;
1342 } 1342 }
1343 1343
1344 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) 1344 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
1345 { 1345 {
1346 return -EINVAL; 1346 return -EINVAL;
1347 } 1347 }
1348 1348
1349 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) 1349 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
1350 { 1350 {
1351 return -EINVAL; 1351 return -EINVAL;
1352 } 1352 }
1353 1353
1354 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, 1354 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
1355 struct kvm_guest_debug *dbg) 1355 struct kvm_guest_debug *dbg)
1356 { 1356 {
1357 return -EINVAL; 1357 return -EINVAL;
1358 } 1358 }
1359 1359
1360 static void free_kvm(struct kvm *kvm) 1360 static void free_kvm(struct kvm *kvm)
1361 { 1361 {
1362 unsigned long vm_base = kvm->arch.vm_base; 1362 unsigned long vm_base = kvm->arch.vm_base;
1363 1363
1364 if (vm_base) { 1364 if (vm_base) {
1365 memset((void *)vm_base, 0, KVM_VM_DATA_SIZE); 1365 memset((void *)vm_base, 0, KVM_VM_DATA_SIZE);
1366 free_pages(vm_base, get_order(KVM_VM_DATA_SIZE)); 1366 free_pages(vm_base, get_order(KVM_VM_DATA_SIZE));
1367 } 1367 }
1368 1368
1369 } 1369 }
1370 1370
1371 static void kvm_release_vm_pages(struct kvm *kvm) 1371 static void kvm_release_vm_pages(struct kvm *kvm)
1372 { 1372 {
1373 struct kvm_memslots *slots; 1373 struct kvm_memslots *slots;
1374 struct kvm_memory_slot *memslot; 1374 struct kvm_memory_slot *memslot;
1375 int i, j; 1375 int i, j;
1376 unsigned long base_gfn; 1376 unsigned long base_gfn;
1377 1377
1378 slots = kvm_memslots(kvm); 1378 slots = kvm_memslots(kvm);
1379 for (i = 0; i < slots->nmemslots; i++) { 1379 for (i = 0; i < slots->nmemslots; i++) {
1380 memslot = &slots->memslots[i]; 1380 memslot = &slots->memslots[i];
1381 base_gfn = memslot->base_gfn; 1381 base_gfn = memslot->base_gfn;
1382 1382
1383 for (j = 0; j < memslot->npages; j++) { 1383 for (j = 0; j < memslot->npages; j++) {
1384 if (memslot->rmap[j]) 1384 if (memslot->rmap[j])
1385 put_page((struct page *)memslot->rmap[j]); 1385 put_page((struct page *)memslot->rmap[j]);
1386 } 1386 }
1387 } 1387 }
1388 } 1388 }
1389 1389
1390 void kvm_arch_sync_events(struct kvm *kvm) 1390 void kvm_arch_sync_events(struct kvm *kvm)
1391 { 1391 {
1392 } 1392 }
1393 1393
1394 void kvm_arch_destroy_vm(struct kvm *kvm) 1394 void kvm_arch_destroy_vm(struct kvm *kvm)
1395 { 1395 {
1396 kvm_iommu_unmap_guest(kvm); 1396 kvm_iommu_unmap_guest(kvm);
1397 #ifdef KVM_CAP_DEVICE_ASSIGNMENT 1397 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
1398 kvm_free_all_assigned_devices(kvm); 1398 kvm_free_all_assigned_devices(kvm);
1399 #endif 1399 #endif
1400 kfree(kvm->arch.vioapic); 1400 kfree(kvm->arch.vioapic);
1401 kvm_release_vm_pages(kvm); 1401 kvm_release_vm_pages(kvm);
1402 kvm_free_physmem(kvm); 1402 kvm_free_physmem(kvm);
1403 cleanup_srcu_struct(&kvm->srcu); 1403 cleanup_srcu_struct(&kvm->srcu);
1404 free_kvm(kvm); 1404 free_kvm(kvm);
1405 } 1405 }
1406 1406
1407 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) 1407 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
1408 { 1408 {
1409 } 1409 }
1410 1410
1411 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) 1411 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
1412 { 1412 {
1413 if (cpu != vcpu->cpu) { 1413 if (cpu != vcpu->cpu) {
1414 vcpu->cpu = cpu; 1414 vcpu->cpu = cpu;
1415 if (vcpu->arch.ht_active) 1415 if (vcpu->arch.ht_active)
1416 kvm_migrate_hlt_timer(vcpu); 1416 kvm_migrate_hlt_timer(vcpu);
1417 } 1417 }
1418 } 1418 }
1419 1419
1420 #define SAVE_REGS(_x) regs->_x = vcpu->arch._x 1420 #define SAVE_REGS(_x) regs->_x = vcpu->arch._x
1421 1421
1422 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) 1422 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
1423 { 1423 {
1424 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); 1424 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
1425 int i; 1425 int i;
1426 1426
1427 vcpu_load(vcpu); 1427 vcpu_load(vcpu);
1428 1428
1429 for (i = 0; i < 16; i++) { 1429 for (i = 0; i < 16; i++) {
1430 regs->vpd.vgr[i] = vpd->vgr[i]; 1430 regs->vpd.vgr[i] = vpd->vgr[i];
1431 regs->vpd.vbgr[i] = vpd->vbgr[i]; 1431 regs->vpd.vbgr[i] = vpd->vbgr[i];
1432 } 1432 }
1433 for (i = 0; i < 128; i++) 1433 for (i = 0; i < 128; i++)
1434 regs->vpd.vcr[i] = vpd->vcr[i]; 1434 regs->vpd.vcr[i] = vpd->vcr[i];
1435 regs->vpd.vhpi = vpd->vhpi; 1435 regs->vpd.vhpi = vpd->vhpi;
1436 regs->vpd.vnat = vpd->vnat; 1436 regs->vpd.vnat = vpd->vnat;
1437 regs->vpd.vbnat = vpd->vbnat; 1437 regs->vpd.vbnat = vpd->vbnat;
1438 regs->vpd.vpsr = vpd->vpsr; 1438 regs->vpd.vpsr = vpd->vpsr;
1439 regs->vpd.vpr = vpd->vpr; 1439 regs->vpd.vpr = vpd->vpr;
1440 1440
1441 memcpy(&regs->saved_guest, &vcpu->arch.guest, sizeof(union context)); 1441 memcpy(&regs->saved_guest, &vcpu->arch.guest, sizeof(union context));
1442 1442
1443 SAVE_REGS(mp_state); 1443 SAVE_REGS(mp_state);
1444 SAVE_REGS(vmm_rr); 1444 SAVE_REGS(vmm_rr);
1445 memcpy(regs->itrs, vcpu->arch.itrs, sizeof(struct thash_data) * NITRS); 1445 memcpy(regs->itrs, vcpu->arch.itrs, sizeof(struct thash_data) * NITRS);
1446 memcpy(regs->dtrs, vcpu->arch.dtrs, sizeof(struct thash_data) * NDTRS); 1446 memcpy(regs->dtrs, vcpu->arch.dtrs, sizeof(struct thash_data) * NDTRS);
1447 SAVE_REGS(itr_regions); 1447 SAVE_REGS(itr_regions);
1448 SAVE_REGS(dtr_regions); 1448 SAVE_REGS(dtr_regions);
1449 SAVE_REGS(tc_regions); 1449 SAVE_REGS(tc_regions);
1450 SAVE_REGS(irq_check); 1450 SAVE_REGS(irq_check);
1451 SAVE_REGS(itc_check); 1451 SAVE_REGS(itc_check);
1452 SAVE_REGS(timer_check); 1452 SAVE_REGS(timer_check);
1453 SAVE_REGS(timer_pending); 1453 SAVE_REGS(timer_pending);
1454 SAVE_REGS(last_itc); 1454 SAVE_REGS(last_itc);
1455 for (i = 0; i < 8; i++) { 1455 for (i = 0; i < 8; i++) {
1456 regs->vrr[i] = vcpu->arch.vrr[i]; 1456 regs->vrr[i] = vcpu->arch.vrr[i];
1457 regs->ibr[i] = vcpu->arch.ibr[i]; 1457 regs->ibr[i] = vcpu->arch.ibr[i];
1458 regs->dbr[i] = vcpu->arch.dbr[i]; 1458 regs->dbr[i] = vcpu->arch.dbr[i];
1459 } 1459 }
1460 for (i = 0; i < 4; i++) 1460 for (i = 0; i < 4; i++)
1461 regs->insvc[i] = vcpu->arch.insvc[i]; 1461 regs->insvc[i] = vcpu->arch.insvc[i];
1462 regs->saved_itc = vcpu->arch.itc_offset + kvm_get_itc(vcpu); 1462 regs->saved_itc = vcpu->arch.itc_offset + kvm_get_itc(vcpu);
1463 SAVE_REGS(xtp); 1463 SAVE_REGS(xtp);
1464 SAVE_REGS(metaphysical_rr0); 1464 SAVE_REGS(metaphysical_rr0);
1465 SAVE_REGS(metaphysical_rr4); 1465 SAVE_REGS(metaphysical_rr4);
1466 SAVE_REGS(metaphysical_saved_rr0); 1466 SAVE_REGS(metaphysical_saved_rr0);
1467 SAVE_REGS(metaphysical_saved_rr4); 1467 SAVE_REGS(metaphysical_saved_rr4);
1468 SAVE_REGS(fp_psr); 1468 SAVE_REGS(fp_psr);
1469 SAVE_REGS(saved_gp); 1469 SAVE_REGS(saved_gp);
1470 1470
1471 vcpu_put(vcpu); 1471 vcpu_put(vcpu);
1472 return 0; 1472 return 0;
1473 } 1473 }
1474 1474
1475 int kvm_arch_vcpu_ioctl_get_stack(struct kvm_vcpu *vcpu, 1475 int kvm_arch_vcpu_ioctl_get_stack(struct kvm_vcpu *vcpu,
1476 struct kvm_ia64_vcpu_stack *stack) 1476 struct kvm_ia64_vcpu_stack *stack)
1477 { 1477 {
1478 memcpy(stack, vcpu, sizeof(struct kvm_ia64_vcpu_stack)); 1478 memcpy(stack, vcpu, sizeof(struct kvm_ia64_vcpu_stack));
1479 return 0; 1479 return 0;
1480 } 1480 }
1481 1481
1482 int kvm_arch_vcpu_ioctl_set_stack(struct kvm_vcpu *vcpu, 1482 int kvm_arch_vcpu_ioctl_set_stack(struct kvm_vcpu *vcpu,
1483 struct kvm_ia64_vcpu_stack *stack) 1483 struct kvm_ia64_vcpu_stack *stack)
1484 { 1484 {
1485 memcpy(vcpu + 1, &stack->stack[0] + sizeof(struct kvm_vcpu), 1485 memcpy(vcpu + 1, &stack->stack[0] + sizeof(struct kvm_vcpu),
1486 sizeof(struct kvm_ia64_vcpu_stack) - sizeof(struct kvm_vcpu)); 1486 sizeof(struct kvm_ia64_vcpu_stack) - sizeof(struct kvm_vcpu));
1487 1487
1488 vcpu->arch.exit_data = ((struct kvm_vcpu *)stack)->arch.exit_data; 1488 vcpu->arch.exit_data = ((struct kvm_vcpu *)stack)->arch.exit_data;
1489 return 0; 1489 return 0;
1490 } 1490 }
1491 1491
1492 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) 1492 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
1493 { 1493 {
1494 1494
1495 hrtimer_cancel(&vcpu->arch.hlt_timer); 1495 hrtimer_cancel(&vcpu->arch.hlt_timer);
1496 kfree(vcpu->arch.apic); 1496 kfree(vcpu->arch.apic);
1497 } 1497 }
1498 1498
1499 1499
1500 long kvm_arch_vcpu_ioctl(struct file *filp, 1500 long kvm_arch_vcpu_ioctl(struct file *filp,
1501 unsigned int ioctl, unsigned long arg) 1501 unsigned int ioctl, unsigned long arg)
1502 { 1502 {
1503 struct kvm_vcpu *vcpu = filp->private_data; 1503 struct kvm_vcpu *vcpu = filp->private_data;
1504 void __user *argp = (void __user *)arg; 1504 void __user *argp = (void __user *)arg;
1505 struct kvm_ia64_vcpu_stack *stack = NULL; 1505 struct kvm_ia64_vcpu_stack *stack = NULL;
1506 long r; 1506 long r;
1507 1507
1508 switch (ioctl) { 1508 switch (ioctl) {
1509 case KVM_IA64_VCPU_GET_STACK: { 1509 case KVM_IA64_VCPU_GET_STACK: {
1510 struct kvm_ia64_vcpu_stack __user *user_stack; 1510 struct kvm_ia64_vcpu_stack __user *user_stack;
1511 void __user *first_p = argp; 1511 void __user *first_p = argp;
1512 1512
1513 r = -EFAULT; 1513 r = -EFAULT;
1514 if (copy_from_user(&user_stack, first_p, sizeof(void *))) 1514 if (copy_from_user(&user_stack, first_p, sizeof(void *)))
1515 goto out; 1515 goto out;
1516 1516
1517 if (!access_ok(VERIFY_WRITE, user_stack, 1517 if (!access_ok(VERIFY_WRITE, user_stack,
1518 sizeof(struct kvm_ia64_vcpu_stack))) { 1518 sizeof(struct kvm_ia64_vcpu_stack))) {
1519 printk(KERN_INFO "KVM_IA64_VCPU_GET_STACK: " 1519 printk(KERN_INFO "KVM_IA64_VCPU_GET_STACK: "
1520 "Illegal user destination address for stack\n"); 1520 "Illegal user destination address for stack\n");
1521 goto out; 1521 goto out;
1522 } 1522 }
1523 stack = kzalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL); 1523 stack = kzalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL);
1524 if (!stack) { 1524 if (!stack) {
1525 r = -ENOMEM; 1525 r = -ENOMEM;
1526 goto out; 1526 goto out;
1527 } 1527 }
1528 1528
1529 r = kvm_arch_vcpu_ioctl_get_stack(vcpu, stack); 1529 r = kvm_arch_vcpu_ioctl_get_stack(vcpu, stack);
1530 if (r) 1530 if (r)
1531 goto out; 1531 goto out;
1532 1532
1533 if (copy_to_user(user_stack, stack, 1533 if (copy_to_user(user_stack, stack,
1534 sizeof(struct kvm_ia64_vcpu_stack))) { 1534 sizeof(struct kvm_ia64_vcpu_stack))) {
1535 r = -EFAULT; 1535 r = -EFAULT;
1536 goto out; 1536 goto out;
1537 } 1537 }
1538 1538
1539 break; 1539 break;
1540 } 1540 }
1541 case KVM_IA64_VCPU_SET_STACK: { 1541 case KVM_IA64_VCPU_SET_STACK: {
1542 struct kvm_ia64_vcpu_stack __user *user_stack; 1542 struct kvm_ia64_vcpu_stack __user *user_stack;
1543 void __user *first_p = argp; 1543 void __user *first_p = argp;
1544 1544
1545 r = -EFAULT; 1545 r = -EFAULT;
1546 if (copy_from_user(&user_stack, first_p, sizeof(void *))) 1546 if (copy_from_user(&user_stack, first_p, sizeof(void *)))
1547 goto out; 1547 goto out;
1548 1548
1549 if (!access_ok(VERIFY_READ, user_stack, 1549 if (!access_ok(VERIFY_READ, user_stack,
1550 sizeof(struct kvm_ia64_vcpu_stack))) { 1550 sizeof(struct kvm_ia64_vcpu_stack))) {
1551 printk(KERN_INFO "KVM_IA64_VCPU_SET_STACK: " 1551 printk(KERN_INFO "KVM_IA64_VCPU_SET_STACK: "
1552 "Illegal user address for stack\n"); 1552 "Illegal user address for stack\n");
1553 goto out; 1553 goto out;
1554 } 1554 }
1555 stack = kmalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL); 1555 stack = kmalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL);
1556 if (!stack) { 1556 if (!stack) {
1557 r = -ENOMEM; 1557 r = -ENOMEM;
1558 goto out; 1558 goto out;
1559 } 1559 }
1560 if (copy_from_user(stack, user_stack, 1560 if (copy_from_user(stack, user_stack,
1561 sizeof(struct kvm_ia64_vcpu_stack))) 1561 sizeof(struct kvm_ia64_vcpu_stack)))
1562 goto out; 1562 goto out;
1563 1563
1564 r = kvm_arch_vcpu_ioctl_set_stack(vcpu, stack); 1564 r = kvm_arch_vcpu_ioctl_set_stack(vcpu, stack);
1565 break; 1565 break;
1566 } 1566 }
1567 1567
1568 default: 1568 default:
1569 r = -EINVAL; 1569 r = -EINVAL;
1570 } 1570 }
1571 1571
1572 out: 1572 out:
1573 kfree(stack); 1573 kfree(stack);
1574 return r; 1574 return r;
1575 } 1575 }
1576 1576
1577 int kvm_arch_prepare_memory_region(struct kvm *kvm, 1577 int kvm_arch_prepare_memory_region(struct kvm *kvm,
1578 struct kvm_memory_slot *memslot, 1578 struct kvm_memory_slot *memslot,
1579 struct kvm_memory_slot old, 1579 struct kvm_memory_slot old,
1580 struct kvm_userspace_memory_region *mem, 1580 struct kvm_userspace_memory_region *mem,
1581 int user_alloc) 1581 int user_alloc)
1582 { 1582 {
1583 unsigned long i; 1583 unsigned long i;
1584 unsigned long pfn; 1584 unsigned long pfn;
1585 int npages = memslot->npages; 1585 int npages = memslot->npages;
1586 unsigned long base_gfn = memslot->base_gfn; 1586 unsigned long base_gfn = memslot->base_gfn;
1587 1587
1588 if (base_gfn + npages > (KVM_MAX_MEM_SIZE >> PAGE_SHIFT)) 1588 if (base_gfn + npages > (KVM_MAX_MEM_SIZE >> PAGE_SHIFT))
1589 return -ENOMEM; 1589 return -ENOMEM;
1590 1590
1591 for (i = 0; i < npages; i++) { 1591 for (i = 0; i < npages; i++) {
1592 pfn = gfn_to_pfn(kvm, base_gfn + i); 1592 pfn = gfn_to_pfn(kvm, base_gfn + i);
1593 if (!kvm_is_mmio_pfn(pfn)) { 1593 if (!kvm_is_mmio_pfn(pfn)) {
1594 kvm_set_pmt_entry(kvm, base_gfn + i, 1594 kvm_set_pmt_entry(kvm, base_gfn + i,
1595 pfn << PAGE_SHIFT, 1595 pfn << PAGE_SHIFT,
1596 _PAGE_AR_RWX | _PAGE_MA_WB); 1596 _PAGE_AR_RWX | _PAGE_MA_WB);
1597 memslot->rmap[i] = (unsigned long)pfn_to_page(pfn); 1597 memslot->rmap[i] = (unsigned long)pfn_to_page(pfn);
1598 } else { 1598 } else {
1599 kvm_set_pmt_entry(kvm, base_gfn + i, 1599 kvm_set_pmt_entry(kvm, base_gfn + i,
1600 GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT), 1600 GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT),
1601 _PAGE_MA_UC); 1601 _PAGE_MA_UC);
1602 memslot->rmap[i] = 0; 1602 memslot->rmap[i] = 0;
1603 } 1603 }
1604 } 1604 }
1605 1605
1606 return 0; 1606 return 0;
1607 } 1607 }
1608 1608
1609 void kvm_arch_commit_memory_region(struct kvm *kvm, 1609 void kvm_arch_commit_memory_region(struct kvm *kvm,
1610 struct kvm_userspace_memory_region *mem, 1610 struct kvm_userspace_memory_region *mem,
1611 struct kvm_memory_slot old, 1611 struct kvm_memory_slot old,
1612 int user_alloc) 1612 int user_alloc)
1613 { 1613 {
1614 return; 1614 return;
1615 } 1615 }
1616 1616
1617 void kvm_arch_flush_shadow(struct kvm *kvm) 1617 void kvm_arch_flush_shadow(struct kvm *kvm)
1618 { 1618 {
1619 kvm_flush_remote_tlbs(kvm); 1619 kvm_flush_remote_tlbs(kvm);
1620 } 1620 }
1621 1621
1622 long kvm_arch_dev_ioctl(struct file *filp, 1622 long kvm_arch_dev_ioctl(struct file *filp,
1623 unsigned int ioctl, unsigned long arg) 1623 unsigned int ioctl, unsigned long arg)
1624 { 1624 {
1625 return -EINVAL; 1625 return -EINVAL;
1626 } 1626 }
1627 1627
1628 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) 1628 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
1629 { 1629 {
1630 kvm_vcpu_uninit(vcpu); 1630 kvm_vcpu_uninit(vcpu);
1631 } 1631 }
1632 1632
1633 static int vti_cpu_has_kvm_support(void) 1633 static int vti_cpu_has_kvm_support(void)
1634 { 1634 {
1635 long avail = 1, status = 1, control = 1; 1635 long avail = 1, status = 1, control = 1;
1636 long ret; 1636 long ret;
1637 1637
1638 ret = ia64_pal_proc_get_features(&avail, &status, &control, 0); 1638 ret = ia64_pal_proc_get_features(&avail, &status, &control, 0);
1639 if (ret) 1639 if (ret)
1640 goto out; 1640 goto out;
1641 1641
1642 if (!(avail & PAL_PROC_VM_BIT)) 1642 if (!(avail & PAL_PROC_VM_BIT))
1643 goto out; 1643 goto out;
1644 1644
1645 printk(KERN_DEBUG"kvm: Hardware Supports VT\n"); 1645 printk(KERN_DEBUG"kvm: Hardware Supports VT\n");
1646 1646
1647 ret = ia64_pal_vp_env_info(&kvm_vm_buffer_size, &vp_env_info); 1647 ret = ia64_pal_vp_env_info(&kvm_vm_buffer_size, &vp_env_info);
1648 if (ret) 1648 if (ret)
1649 goto out; 1649 goto out;
1650 printk(KERN_DEBUG"kvm: VM Buffer Size:0x%lx\n", kvm_vm_buffer_size); 1650 printk(KERN_DEBUG"kvm: VM Buffer Size:0x%lx\n", kvm_vm_buffer_size);
1651 1651
1652 if (!(vp_env_info & VP_OPCODE)) { 1652 if (!(vp_env_info & VP_OPCODE)) {
1653 printk(KERN_WARNING"kvm: No opcode ability on hardware, " 1653 printk(KERN_WARNING"kvm: No opcode ability on hardware, "
1654 "vm_env_info:0x%lx\n", vp_env_info); 1654 "vm_env_info:0x%lx\n", vp_env_info);
1655 } 1655 }
1656 1656
1657 return 1; 1657 return 1;
1658 out: 1658 out:
1659 return 0; 1659 return 0;
1660 } 1660 }
1661 1661
1662 1662
1663 /* 1663 /*
1664 * On SN2, the ITC isn't stable, so copy in fast path code to use the 1664 * On SN2, the ITC isn't stable, so copy in fast path code to use the
1665 * SN2 RTC, replacing the ITC-based default version. 1665 * SN2 RTC, replacing the ITC-based default version.
1666 */ 1666 */
1667 static void kvm_patch_vmm(struct kvm_vmm_info *vmm_info, 1667 static void kvm_patch_vmm(struct kvm_vmm_info *vmm_info,
1668 struct module *module) 1668 struct module *module)
1669 { 1669 {
1670 unsigned long new_ar, new_ar_sn2; 1670 unsigned long new_ar, new_ar_sn2;
1671 unsigned long module_base; 1671 unsigned long module_base;
1672 1672
1673 if (!ia64_platform_is("sn2")) 1673 if (!ia64_platform_is("sn2"))
1674 return; 1674 return;
1675 1675
1676 module_base = (unsigned long)module->module_core; 1676 module_base = (unsigned long)module->module_core;
1677 1677
1678 new_ar = kvm_vmm_base + vmm_info->patch_mov_ar - module_base; 1678 new_ar = kvm_vmm_base + vmm_info->patch_mov_ar - module_base;
1679 new_ar_sn2 = kvm_vmm_base + vmm_info->patch_mov_ar_sn2 - module_base; 1679 new_ar_sn2 = kvm_vmm_base + vmm_info->patch_mov_ar_sn2 - module_base;
1680 1680
1681 printk(KERN_INFO "kvm: Patching ITC emulation to use SGI SN2 RTC " 1681 printk(KERN_INFO "kvm: Patching ITC emulation to use SGI SN2 RTC "
1682 "as source\n"); 1682 "as source\n");
1683 1683
1684 /* 1684 /*
1685 * Copy the SN2 version of mov_ar into place. They are both 1685 * Copy the SN2 version of mov_ar into place. They are both
1686 * the same size, so 6 bundles is sufficient (6 * 0x10). 1686 * the same size, so 6 bundles is sufficient (6 * 0x10).
1687 */ 1687 */
1688 memcpy((void *)new_ar, (void *)new_ar_sn2, 0x60); 1688 memcpy((void *)new_ar, (void *)new_ar_sn2, 0x60);
1689 } 1689 }
1690 1690
1691 static int kvm_relocate_vmm(struct kvm_vmm_info *vmm_info, 1691 static int kvm_relocate_vmm(struct kvm_vmm_info *vmm_info,
1692 struct module *module) 1692 struct module *module)
1693 { 1693 {
1694 unsigned long module_base; 1694 unsigned long module_base;
1695 unsigned long vmm_size; 1695 unsigned long vmm_size;
1696 1696
1697 unsigned long vmm_offset, func_offset, fdesc_offset; 1697 unsigned long vmm_offset, func_offset, fdesc_offset;
1698 struct fdesc *p_fdesc; 1698 struct fdesc *p_fdesc;
1699 1699
1700 BUG_ON(!module); 1700 BUG_ON(!module);
1701 1701
1702 if (!kvm_vmm_base) { 1702 if (!kvm_vmm_base) {
1703 printk("kvm: kvm area hasn't been initilized yet!!\n"); 1703 printk("kvm: kvm area hasn't been initilized yet!!\n");
1704 return -EFAULT; 1704 return -EFAULT;
1705 } 1705 }
1706 1706
1707 /*Calculate new position of relocated vmm module.*/ 1707 /*Calculate new position of relocated vmm module.*/
1708 module_base = (unsigned long)module->module_core; 1708 module_base = (unsigned long)module->module_core;
1709 vmm_size = module->core_size; 1709 vmm_size = module->core_size;
1710 if (unlikely(vmm_size > KVM_VMM_SIZE)) 1710 if (unlikely(vmm_size > KVM_VMM_SIZE))
1711 return -EFAULT; 1711 return -EFAULT;
1712 1712
1713 memcpy((void *)kvm_vmm_base, (void *)module_base, vmm_size); 1713 memcpy((void *)kvm_vmm_base, (void *)module_base, vmm_size);
1714 kvm_patch_vmm(vmm_info, module); 1714 kvm_patch_vmm(vmm_info, module);
1715 kvm_flush_icache(kvm_vmm_base, vmm_size); 1715 kvm_flush_icache(kvm_vmm_base, vmm_size);
1716 1716
1717 /*Recalculate kvm_vmm_info based on new VMM*/ 1717 /*Recalculate kvm_vmm_info based on new VMM*/
1718 vmm_offset = vmm_info->vmm_ivt - module_base; 1718 vmm_offset = vmm_info->vmm_ivt - module_base;
1719 kvm_vmm_info->vmm_ivt = KVM_VMM_BASE + vmm_offset; 1719 kvm_vmm_info->vmm_ivt = KVM_VMM_BASE + vmm_offset;
1720 printk(KERN_DEBUG"kvm: Relocated VMM's IVT Base Addr:%lx\n", 1720 printk(KERN_DEBUG"kvm: Relocated VMM's IVT Base Addr:%lx\n",
1721 kvm_vmm_info->vmm_ivt); 1721 kvm_vmm_info->vmm_ivt);
1722 1722
1723 fdesc_offset = (unsigned long)vmm_info->vmm_entry - module_base; 1723 fdesc_offset = (unsigned long)vmm_info->vmm_entry - module_base;
1724 kvm_vmm_info->vmm_entry = (kvm_vmm_entry *)(KVM_VMM_BASE + 1724 kvm_vmm_info->vmm_entry = (kvm_vmm_entry *)(KVM_VMM_BASE +
1725 fdesc_offset); 1725 fdesc_offset);
1726 func_offset = *(unsigned long *)vmm_info->vmm_entry - module_base; 1726 func_offset = *(unsigned long *)vmm_info->vmm_entry - module_base;
1727 p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset); 1727 p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset);
1728 p_fdesc->ip = KVM_VMM_BASE + func_offset; 1728 p_fdesc->ip = KVM_VMM_BASE + func_offset;
1729 p_fdesc->gp = KVM_VMM_BASE+(p_fdesc->gp - module_base); 1729 p_fdesc->gp = KVM_VMM_BASE+(p_fdesc->gp - module_base);
1730 1730
1731 printk(KERN_DEBUG"kvm: Relocated VMM's Init Entry Addr:%lx\n", 1731 printk(KERN_DEBUG"kvm: Relocated VMM's Init Entry Addr:%lx\n",
1732 KVM_VMM_BASE+func_offset); 1732 KVM_VMM_BASE+func_offset);
1733 1733
1734 fdesc_offset = (unsigned long)vmm_info->tramp_entry - module_base; 1734 fdesc_offset = (unsigned long)vmm_info->tramp_entry - module_base;
1735 kvm_vmm_info->tramp_entry = (kvm_tramp_entry *)(KVM_VMM_BASE + 1735 kvm_vmm_info->tramp_entry = (kvm_tramp_entry *)(KVM_VMM_BASE +
1736 fdesc_offset); 1736 fdesc_offset);
1737 func_offset = *(unsigned long *)vmm_info->tramp_entry - module_base; 1737 func_offset = *(unsigned long *)vmm_info->tramp_entry - module_base;
1738 p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset); 1738 p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset);
1739 p_fdesc->ip = KVM_VMM_BASE + func_offset; 1739 p_fdesc->ip = KVM_VMM_BASE + func_offset;
1740 p_fdesc->gp = KVM_VMM_BASE + (p_fdesc->gp - module_base); 1740 p_fdesc->gp = KVM_VMM_BASE + (p_fdesc->gp - module_base);
1741 1741
1742 kvm_vmm_gp = p_fdesc->gp; 1742 kvm_vmm_gp = p_fdesc->gp;
1743 1743
1744 printk(KERN_DEBUG"kvm: Relocated VMM's Entry IP:%p\n", 1744 printk(KERN_DEBUG"kvm: Relocated VMM's Entry IP:%p\n",
1745 kvm_vmm_info->vmm_entry); 1745 kvm_vmm_info->vmm_entry);
1746 printk(KERN_DEBUG"kvm: Relocated VMM's Trampoline Entry IP:0x%lx\n", 1746 printk(KERN_DEBUG"kvm: Relocated VMM's Trampoline Entry IP:0x%lx\n",
1747 KVM_VMM_BASE + func_offset); 1747 KVM_VMM_BASE + func_offset);
1748 1748
1749 return 0; 1749 return 0;
1750 } 1750 }
1751 1751
1752 int kvm_arch_init(void *opaque) 1752 int kvm_arch_init(void *opaque)
1753 { 1753 {
1754 int r; 1754 int r;
1755 struct kvm_vmm_info *vmm_info = (struct kvm_vmm_info *)opaque; 1755 struct kvm_vmm_info *vmm_info = (struct kvm_vmm_info *)opaque;
1756 1756
1757 if (!vti_cpu_has_kvm_support()) { 1757 if (!vti_cpu_has_kvm_support()) {
1758 printk(KERN_ERR "kvm: No Hardware Virtualization Support!\n"); 1758 printk(KERN_ERR "kvm: No Hardware Virtualization Support!\n");
1759 r = -EOPNOTSUPP; 1759 r = -EOPNOTSUPP;
1760 goto out; 1760 goto out;
1761 } 1761 }
1762 1762
1763 if (kvm_vmm_info) { 1763 if (kvm_vmm_info) {
1764 printk(KERN_ERR "kvm: Already loaded VMM module!\n"); 1764 printk(KERN_ERR "kvm: Already loaded VMM module!\n");
1765 r = -EEXIST; 1765 r = -EEXIST;
1766 goto out; 1766 goto out;
1767 } 1767 }
1768 1768
1769 r = -ENOMEM; 1769 r = -ENOMEM;
1770 kvm_vmm_info = kzalloc(sizeof(struct kvm_vmm_info), GFP_KERNEL); 1770 kvm_vmm_info = kzalloc(sizeof(struct kvm_vmm_info), GFP_KERNEL);
1771 if (!kvm_vmm_info) 1771 if (!kvm_vmm_info)
1772 goto out; 1772 goto out;
1773 1773
1774 if (kvm_alloc_vmm_area()) 1774 if (kvm_alloc_vmm_area())
1775 goto out_free0; 1775 goto out_free0;
1776 1776
1777 r = kvm_relocate_vmm(vmm_info, vmm_info->module); 1777 r = kvm_relocate_vmm(vmm_info, vmm_info->module);
1778 if (r) 1778 if (r)
1779 goto out_free1; 1779 goto out_free1;
1780 1780
1781 return 0; 1781 return 0;
1782 1782
1783 out_free1: 1783 out_free1:
1784 kvm_free_vmm_area(); 1784 kvm_free_vmm_area();
1785 out_free0: 1785 out_free0:
1786 kfree(kvm_vmm_info); 1786 kfree(kvm_vmm_info);
1787 out: 1787 out:
1788 return r; 1788 return r;
1789 } 1789 }
1790 1790
1791 void kvm_arch_exit(void) 1791 void kvm_arch_exit(void)
1792 { 1792 {
1793 kvm_free_vmm_area(); 1793 kvm_free_vmm_area();
1794 kfree(kvm_vmm_info); 1794 kfree(kvm_vmm_info);
1795 kvm_vmm_info = NULL; 1795 kvm_vmm_info = NULL;
1796 } 1796 }
1797 1797
1798 static int kvm_ia64_sync_dirty_log(struct kvm *kvm, 1798 static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
1799 struct kvm_dirty_log *log) 1799 struct kvm_dirty_log *log)
1800 { 1800 {
1801 struct kvm_memory_slot *memslot; 1801 struct kvm_memory_slot *memslot;
1802 int r, i; 1802 int r, i;
1803 long base; 1803 long base;
1804 unsigned long n; 1804 unsigned long n;
1805 unsigned long *dirty_bitmap = (unsigned long *)(kvm->arch.vm_base + 1805 unsigned long *dirty_bitmap = (unsigned long *)(kvm->arch.vm_base +
1806 offsetof(struct kvm_vm_data, kvm_mem_dirty_log)); 1806 offsetof(struct kvm_vm_data, kvm_mem_dirty_log));
1807 1807
1808 r = -EINVAL; 1808 r = -EINVAL;
1809 if (log->slot >= KVM_MEMORY_SLOTS) 1809 if (log->slot >= KVM_MEMORY_SLOTS)
1810 goto out; 1810 goto out;
1811 1811
1812 memslot = &kvm->memslots->memslots[log->slot]; 1812 memslot = &kvm->memslots->memslots[log->slot];
1813 r = -ENOENT; 1813 r = -ENOENT;
1814 if (!memslot->dirty_bitmap) 1814 if (!memslot->dirty_bitmap)
1815 goto out; 1815 goto out;
1816 1816
1817 n = kvm_dirty_bitmap_bytes(memslot); 1817 n = kvm_dirty_bitmap_bytes(memslot);
1818 base = memslot->base_gfn / BITS_PER_LONG; 1818 base = memslot->base_gfn / BITS_PER_LONG;
1819 1819
1820 for (i = 0; i < n/sizeof(long); ++i) { 1820 for (i = 0; i < n/sizeof(long); ++i) {
1821 memslot->dirty_bitmap[i] = dirty_bitmap[base + i]; 1821 memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
1822 dirty_bitmap[base + i] = 0; 1822 dirty_bitmap[base + i] = 0;
1823 } 1823 }
1824 r = 0; 1824 r = 0;
1825 out: 1825 out:
1826 return r; 1826 return r;
1827 } 1827 }
1828 1828
1829 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, 1829 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
1830 struct kvm_dirty_log *log) 1830 struct kvm_dirty_log *log)
1831 { 1831 {
1832 int r; 1832 int r;
1833 unsigned long n; 1833 unsigned long n;
1834 struct kvm_memory_slot *memslot; 1834 struct kvm_memory_slot *memslot;
1835 int is_dirty = 0; 1835 int is_dirty = 0;
1836 1836
1837 mutex_lock(&kvm->slots_lock); 1837 mutex_lock(&kvm->slots_lock);
1838 spin_lock(&kvm->arch.dirty_log_lock); 1838 spin_lock(&kvm->arch.dirty_log_lock);
1839 1839
1840 r = kvm_ia64_sync_dirty_log(kvm, log); 1840 r = kvm_ia64_sync_dirty_log(kvm, log);
1841 if (r) 1841 if (r)
1842 goto out; 1842 goto out;
1843 1843
1844 r = kvm_get_dirty_log(kvm, log, &is_dirty); 1844 r = kvm_get_dirty_log(kvm, log, &is_dirty);
1845 if (r) 1845 if (r)
1846 goto out; 1846 goto out;
1847 1847
1848 /* If nothing is dirty, don't bother messing with page tables. */ 1848 /* If nothing is dirty, don't bother messing with page tables. */
1849 if (is_dirty) { 1849 if (is_dirty) {
1850 kvm_flush_remote_tlbs(kvm); 1850 kvm_flush_remote_tlbs(kvm);
1851 memslot = &kvm->memslots->memslots[log->slot]; 1851 memslot = &kvm->memslots->memslots[log->slot];
1852 n = kvm_dirty_bitmap_bytes(memslot); 1852 n = kvm_dirty_bitmap_bytes(memslot);
1853 memset(memslot->dirty_bitmap, 0, n); 1853 memset(memslot->dirty_bitmap, 0, n);
1854 } 1854 }
1855 r = 0; 1855 r = 0;
1856 out: 1856 out:
1857 mutex_unlock(&kvm->slots_lock); 1857 mutex_unlock(&kvm->slots_lock);
1858 spin_unlock(&kvm->arch.dirty_log_lock); 1858 spin_unlock(&kvm->arch.dirty_log_lock);
1859 return r; 1859 return r;
1860 } 1860 }
1861 1861
1862 int kvm_arch_hardware_setup(void) 1862 int kvm_arch_hardware_setup(void)
1863 { 1863 {
1864 return 0; 1864 return 0;
1865 } 1865 }
1866 1866
1867 void kvm_arch_hardware_unsetup(void) 1867 void kvm_arch_hardware_unsetup(void)
1868 { 1868 {
1869 } 1869 }
1870 1870
1871 void kvm_vcpu_kick(struct kvm_vcpu *vcpu) 1871 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
1872 { 1872 {
1873 int me; 1873 int me;
1874 int cpu = vcpu->cpu; 1874 int cpu = vcpu->cpu;
1875 1875
1876 if (waitqueue_active(&vcpu->wq)) 1876 if (waitqueue_active(&vcpu->wq))
1877 wake_up_interruptible(&vcpu->wq); 1877 wake_up_interruptible(&vcpu->wq);
1878 1878
1879 me = get_cpu(); 1879 me = get_cpu();
1880 if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu)) 1880 if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu))
1881 if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)) 1881 if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
1882 smp_send_reschedule(cpu); 1882 smp_send_reschedule(cpu);
1883 put_cpu(); 1883 put_cpu();
1884 } 1884 }
1885 1885
1886 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) 1886 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
1887 { 1887 {
1888 return __apic_accept_irq(vcpu, irq->vector); 1888 return __apic_accept_irq(vcpu, irq->vector);
1889 } 1889 }
1890 1890
1891 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest) 1891 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
1892 { 1892 {
1893 return apic->vcpu->vcpu_id == dest; 1893 return apic->vcpu->vcpu_id == dest;
1894 } 1894 }
1895 1895
1896 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda) 1896 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
1897 { 1897 {
1898 return 0; 1898 return 0;
1899 } 1899 }
1900 1900
1901 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) 1901 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
1902 { 1902 {
1903 return vcpu1->arch.xtp - vcpu2->arch.xtp; 1903 return vcpu1->arch.xtp - vcpu2->arch.xtp;
1904 } 1904 }
1905 1905
1906 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source, 1906 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
1907 int short_hand, int dest, int dest_mode) 1907 int short_hand, int dest, int dest_mode)
1908 { 1908 {
1909 struct kvm_lapic *target = vcpu->arch.apic; 1909 struct kvm_lapic *target = vcpu->arch.apic;
1910 return (dest_mode == 0) ? 1910 return (dest_mode == 0) ?
1911 kvm_apic_match_physical_addr(target, dest) : 1911 kvm_apic_match_physical_addr(target, dest) :
1912 kvm_apic_match_logical_addr(target, dest); 1912 kvm_apic_match_logical_addr(target, dest);
1913 } 1913 }
1914 1914
1915 static int find_highest_bits(int *dat) 1915 static int find_highest_bits(int *dat)
1916 { 1916 {
1917 u32 bits, bitnum; 1917 u32 bits, bitnum;
1918 int i; 1918 int i;
1919 1919
1920 /* loop for all 256 bits */ 1920 /* loop for all 256 bits */
1921 for (i = 7; i >= 0 ; i--) { 1921 for (i = 7; i >= 0 ; i--) {
1922 bits = dat[i]; 1922 bits = dat[i];
1923 if (bits) { 1923 if (bits) {
1924 bitnum = fls(bits); 1924 bitnum = fls(bits);
1925 return i * 32 + bitnum - 1; 1925 return i * 32 + bitnum - 1;
1926 } 1926 }
1927 } 1927 }
1928 1928
1929 return -1; 1929 return -1;
1930 } 1930 }
1931 1931
1932 int kvm_highest_pending_irq(struct kvm_vcpu *vcpu) 1932 int kvm_highest_pending_irq(struct kvm_vcpu *vcpu)
1933 { 1933 {
1934 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); 1934 struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
1935 1935
1936 if (vpd->irr[0] & (1UL << NMI_VECTOR)) 1936 if (vpd->irr[0] & (1UL << NMI_VECTOR))
1937 return NMI_VECTOR; 1937 return NMI_VECTOR;
1938 if (vpd->irr[0] & (1UL << ExtINT_VECTOR)) 1938 if (vpd->irr[0] & (1UL << ExtINT_VECTOR))
1939 return ExtINT_VECTOR; 1939 return ExtINT_VECTOR;
1940 1940
1941 return find_highest_bits((int *)&vpd->irr[0]); 1941 return find_highest_bits((int *)&vpd->irr[0]);
1942 } 1942 }
1943 1943
1944 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) 1944 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
1945 { 1945 {
1946 return vcpu->arch.timer_fired; 1946 return vcpu->arch.timer_fired;
1947 } 1947 }
1948 1948
1949 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
1950 {
1951 return gfn;
1952 }
1953
1954 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) 1949 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
1955 { 1950 {
1956 return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) || 1951 return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) ||
1957 (kvm_highest_pending_irq(vcpu) != -1); 1952 (kvm_highest_pending_irq(vcpu) != -1);
1958 } 1953 }
1959 1954
1960 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, 1955 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
1961 struct kvm_mp_state *mp_state) 1956 struct kvm_mp_state *mp_state)
1962 { 1957 {
1963 mp_state->mp_state = vcpu->arch.mp_state; 1958 mp_state->mp_state = vcpu->arch.mp_state;
1964 return 0; 1959 return 0;
1965 } 1960 }
1966 1961
1967 static int vcpu_reset(struct kvm_vcpu *vcpu) 1962 static int vcpu_reset(struct kvm_vcpu *vcpu)
1968 { 1963 {
1969 int r; 1964 int r;
1970 long psr; 1965 long psr;
1971 local_irq_save(psr); 1966 local_irq_save(psr);
1972 r = kvm_insert_vmm_mapping(vcpu); 1967 r = kvm_insert_vmm_mapping(vcpu);
1973 local_irq_restore(psr); 1968 local_irq_restore(psr);
1974 if (r) 1969 if (r)
1975 goto fail; 1970 goto fail;
1976 1971
1977 vcpu->arch.launched = 0; 1972 vcpu->arch.launched = 0;
1978 kvm_arch_vcpu_uninit(vcpu); 1973 kvm_arch_vcpu_uninit(vcpu);
1979 r = kvm_arch_vcpu_init(vcpu); 1974 r = kvm_arch_vcpu_init(vcpu);
1980 if (r) 1975 if (r)
1981 goto fail; 1976 goto fail;
1982 1977
1983 kvm_purge_vmm_mapping(vcpu); 1978 kvm_purge_vmm_mapping(vcpu);
1984 r = 0; 1979 r = 0;
1985 fail: 1980 fail:
1986 return r; 1981 return r;
1987 } 1982 }
1988 1983
1989 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, 1984 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
1990 struct kvm_mp_state *mp_state) 1985 struct kvm_mp_state *mp_state)
1991 { 1986 {
1992 int r = 0; 1987 int r = 0;
1993 1988
1994 vcpu->arch.mp_state = mp_state->mp_state; 1989 vcpu->arch.mp_state = mp_state->mp_state;
1995 if (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) 1990 if (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
1996 r = vcpu_reset(vcpu); 1991 r = vcpu_reset(vcpu);
1997 return r; 1992 return r;
1998 } 1993 }
1999 1994
arch/powerpc/kvm/powerpc.c
1 /* 1 /*
2 * This program is free software; you can redistribute it and/or modify 2 * This program is free software; you can redistribute it and/or modify
3 * it under the terms of the GNU General Public License, version 2, as 3 * it under the terms of the GNU General Public License, version 2, as
4 * published by the Free Software Foundation. 4 * published by the Free Software Foundation.
5 * 5 *
6 * This program is distributed in the hope that it will be useful, 6 * This program is distributed in the hope that it will be useful,
7 * but WITHOUT ANY WARRANTY; without even the implied warranty of 7 * but WITHOUT ANY WARRANTY; without even the implied warranty of
8 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 8 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
9 * GNU General Public License for more details. 9 * GNU General Public License for more details.
10 * 10 *
11 * You should have received a copy of the GNU General Public License 11 * You should have received a copy of the GNU General Public License
12 * along with this program; if not, write to the Free Software 12 * along with this program; if not, write to the Free Software
13 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 13 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
14 * 14 *
15 * Copyright IBM Corp. 2007 15 * Copyright IBM Corp. 2007
16 * 16 *
17 * Authors: Hollis Blanchard <hollisb@us.ibm.com> 17 * Authors: Hollis Blanchard <hollisb@us.ibm.com>
18 * Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> 18 * Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
19 */ 19 */
20 20
21 #include <linux/errno.h> 21 #include <linux/errno.h>
22 #include <linux/err.h> 22 #include <linux/err.h>
23 #include <linux/kvm_host.h> 23 #include <linux/kvm_host.h>
24 #include <linux/module.h> 24 #include <linux/module.h>
25 #include <linux/vmalloc.h> 25 #include <linux/vmalloc.h>
26 #include <linux/hrtimer.h> 26 #include <linux/hrtimer.h>
27 #include <linux/fs.h> 27 #include <linux/fs.h>
28 #include <linux/slab.h> 28 #include <linux/slab.h>
29 #include <asm/cputable.h> 29 #include <asm/cputable.h>
30 #include <asm/uaccess.h> 30 #include <asm/uaccess.h>
31 #include <asm/kvm_ppc.h> 31 #include <asm/kvm_ppc.h>
32 #include <asm/tlbflush.h> 32 #include <asm/tlbflush.h>
33 #include "timing.h" 33 #include "timing.h"
34 #include "../mm/mmu_decl.h" 34 #include "../mm/mmu_decl.h"
35 35
36 #define CREATE_TRACE_POINTS 36 #define CREATE_TRACE_POINTS
37 #include "trace.h" 37 #include "trace.h"
38 38
39 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
40 {
41 return gfn;
42 }
43
44 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) 39 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
45 { 40 {
46 return !(v->arch.msr & MSR_WE) || !!(v->arch.pending_exceptions); 41 return !(v->arch.msr & MSR_WE) || !!(v->arch.pending_exceptions);
47 } 42 }
48 43
49 44
50 int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) 45 int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
51 { 46 {
52 enum emulation_result er; 47 enum emulation_result er;
53 int r; 48 int r;
54 49
55 er = kvmppc_emulate_instruction(run, vcpu); 50 er = kvmppc_emulate_instruction(run, vcpu);
56 switch (er) { 51 switch (er) {
57 case EMULATE_DONE: 52 case EMULATE_DONE:
58 /* Future optimization: only reload non-volatiles if they were 53 /* Future optimization: only reload non-volatiles if they were
59 * actually modified. */ 54 * actually modified. */
60 r = RESUME_GUEST_NV; 55 r = RESUME_GUEST_NV;
61 break; 56 break;
62 case EMULATE_DO_MMIO: 57 case EMULATE_DO_MMIO:
63 run->exit_reason = KVM_EXIT_MMIO; 58 run->exit_reason = KVM_EXIT_MMIO;
64 /* We must reload nonvolatiles because "update" load/store 59 /* We must reload nonvolatiles because "update" load/store
65 * instructions modify register state. */ 60 * instructions modify register state. */
66 /* Future optimization: only reload non-volatiles if they were 61 /* Future optimization: only reload non-volatiles if they were
67 * actually modified. */ 62 * actually modified. */
68 r = RESUME_HOST_NV; 63 r = RESUME_HOST_NV;
69 break; 64 break;
70 case EMULATE_FAIL: 65 case EMULATE_FAIL:
71 /* XXX Deliver Program interrupt to guest. */ 66 /* XXX Deliver Program interrupt to guest. */
72 printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__, 67 printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__,
73 kvmppc_get_last_inst(vcpu)); 68 kvmppc_get_last_inst(vcpu));
74 r = RESUME_HOST; 69 r = RESUME_HOST;
75 break; 70 break;
76 default: 71 default:
77 BUG(); 72 BUG();
78 } 73 }
79 74
80 return r; 75 return r;
81 } 76 }
82 77
83 int kvm_arch_hardware_enable(void *garbage) 78 int kvm_arch_hardware_enable(void *garbage)
84 { 79 {
85 return 0; 80 return 0;
86 } 81 }
87 82
88 void kvm_arch_hardware_disable(void *garbage) 83 void kvm_arch_hardware_disable(void *garbage)
89 { 84 {
90 } 85 }
91 86
92 int kvm_arch_hardware_setup(void) 87 int kvm_arch_hardware_setup(void)
93 { 88 {
94 return 0; 89 return 0;
95 } 90 }
96 91
97 void kvm_arch_hardware_unsetup(void) 92 void kvm_arch_hardware_unsetup(void)
98 { 93 {
99 } 94 }
100 95
101 void kvm_arch_check_processor_compat(void *rtn) 96 void kvm_arch_check_processor_compat(void *rtn)
102 { 97 {
103 *(int *)rtn = kvmppc_core_check_processor_compat(); 98 *(int *)rtn = kvmppc_core_check_processor_compat();
104 } 99 }
105 100
106 struct kvm *kvm_arch_create_vm(void) 101 struct kvm *kvm_arch_create_vm(void)
107 { 102 {
108 struct kvm *kvm; 103 struct kvm *kvm;
109 104
110 kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); 105 kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
111 if (!kvm) 106 if (!kvm)
112 return ERR_PTR(-ENOMEM); 107 return ERR_PTR(-ENOMEM);
113 108
114 return kvm; 109 return kvm;
115 } 110 }
116 111
117 static void kvmppc_free_vcpus(struct kvm *kvm) 112 static void kvmppc_free_vcpus(struct kvm *kvm)
118 { 113 {
119 unsigned int i; 114 unsigned int i;
120 struct kvm_vcpu *vcpu; 115 struct kvm_vcpu *vcpu;
121 116
122 kvm_for_each_vcpu(i, vcpu, kvm) 117 kvm_for_each_vcpu(i, vcpu, kvm)
123 kvm_arch_vcpu_free(vcpu); 118 kvm_arch_vcpu_free(vcpu);
124 119
125 mutex_lock(&kvm->lock); 120 mutex_lock(&kvm->lock);
126 for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) 121 for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
127 kvm->vcpus[i] = NULL; 122 kvm->vcpus[i] = NULL;
128 123
129 atomic_set(&kvm->online_vcpus, 0); 124 atomic_set(&kvm->online_vcpus, 0);
130 mutex_unlock(&kvm->lock); 125 mutex_unlock(&kvm->lock);
131 } 126 }
132 127
133 void kvm_arch_sync_events(struct kvm *kvm) 128 void kvm_arch_sync_events(struct kvm *kvm)
134 { 129 {
135 } 130 }
136 131
137 void kvm_arch_destroy_vm(struct kvm *kvm) 132 void kvm_arch_destroy_vm(struct kvm *kvm)
138 { 133 {
139 kvmppc_free_vcpus(kvm); 134 kvmppc_free_vcpus(kvm);
140 kvm_free_physmem(kvm); 135 kvm_free_physmem(kvm);
141 cleanup_srcu_struct(&kvm->srcu); 136 cleanup_srcu_struct(&kvm->srcu);
142 kfree(kvm); 137 kfree(kvm);
143 } 138 }
144 139
145 int kvm_dev_ioctl_check_extension(long ext) 140 int kvm_dev_ioctl_check_extension(long ext)
146 { 141 {
147 int r; 142 int r;
148 143
149 switch (ext) { 144 switch (ext) {
150 case KVM_CAP_PPC_SEGSTATE: 145 case KVM_CAP_PPC_SEGSTATE:
151 case KVM_CAP_PPC_PAIRED_SINGLES: 146 case KVM_CAP_PPC_PAIRED_SINGLES:
152 case KVM_CAP_PPC_UNSET_IRQ: 147 case KVM_CAP_PPC_UNSET_IRQ:
153 case KVM_CAP_ENABLE_CAP: 148 case KVM_CAP_ENABLE_CAP:
154 case KVM_CAP_PPC_OSI: 149 case KVM_CAP_PPC_OSI:
155 r = 1; 150 r = 1;
156 break; 151 break;
157 case KVM_CAP_COALESCED_MMIO: 152 case KVM_CAP_COALESCED_MMIO:
158 r = KVM_COALESCED_MMIO_PAGE_OFFSET; 153 r = KVM_COALESCED_MMIO_PAGE_OFFSET;
159 break; 154 break;
160 default: 155 default:
161 r = 0; 156 r = 0;
162 break; 157 break;
163 } 158 }
164 return r; 159 return r;
165 160
166 } 161 }
167 162
168 long kvm_arch_dev_ioctl(struct file *filp, 163 long kvm_arch_dev_ioctl(struct file *filp,
169 unsigned int ioctl, unsigned long arg) 164 unsigned int ioctl, unsigned long arg)
170 { 165 {
171 return -EINVAL; 166 return -EINVAL;
172 } 167 }
173 168
174 int kvm_arch_prepare_memory_region(struct kvm *kvm, 169 int kvm_arch_prepare_memory_region(struct kvm *kvm,
175 struct kvm_memory_slot *memslot, 170 struct kvm_memory_slot *memslot,
176 struct kvm_memory_slot old, 171 struct kvm_memory_slot old,
177 struct kvm_userspace_memory_region *mem, 172 struct kvm_userspace_memory_region *mem,
178 int user_alloc) 173 int user_alloc)
179 { 174 {
180 return 0; 175 return 0;
181 } 176 }
182 177
183 void kvm_arch_commit_memory_region(struct kvm *kvm, 178 void kvm_arch_commit_memory_region(struct kvm *kvm,
184 struct kvm_userspace_memory_region *mem, 179 struct kvm_userspace_memory_region *mem,
185 struct kvm_memory_slot old, 180 struct kvm_memory_slot old,
186 int user_alloc) 181 int user_alloc)
187 { 182 {
188 return; 183 return;
189 } 184 }
190 185
191 186
192 void kvm_arch_flush_shadow(struct kvm *kvm) 187 void kvm_arch_flush_shadow(struct kvm *kvm)
193 { 188 {
194 } 189 }
195 190
196 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) 191 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
197 { 192 {
198 struct kvm_vcpu *vcpu; 193 struct kvm_vcpu *vcpu;
199 vcpu = kvmppc_core_vcpu_create(kvm, id); 194 vcpu = kvmppc_core_vcpu_create(kvm, id);
200 if (!IS_ERR(vcpu)) 195 if (!IS_ERR(vcpu))
201 kvmppc_create_vcpu_debugfs(vcpu, id); 196 kvmppc_create_vcpu_debugfs(vcpu, id);
202 return vcpu; 197 return vcpu;
203 } 198 }
204 199
205 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) 200 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
206 { 201 {
207 /* Make sure we're not using the vcpu anymore */ 202 /* Make sure we're not using the vcpu anymore */
208 hrtimer_cancel(&vcpu->arch.dec_timer); 203 hrtimer_cancel(&vcpu->arch.dec_timer);
209 tasklet_kill(&vcpu->arch.tasklet); 204 tasklet_kill(&vcpu->arch.tasklet);
210 205
211 kvmppc_remove_vcpu_debugfs(vcpu); 206 kvmppc_remove_vcpu_debugfs(vcpu);
212 kvmppc_core_vcpu_free(vcpu); 207 kvmppc_core_vcpu_free(vcpu);
213 } 208 }
214 209
215 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) 210 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
216 { 211 {
217 kvm_arch_vcpu_free(vcpu); 212 kvm_arch_vcpu_free(vcpu);
218 } 213 }
219 214
220 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) 215 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
221 { 216 {
222 return kvmppc_core_pending_dec(vcpu); 217 return kvmppc_core_pending_dec(vcpu);
223 } 218 }
224 219
225 static void kvmppc_decrementer_func(unsigned long data) 220 static void kvmppc_decrementer_func(unsigned long data)
226 { 221 {
227 struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; 222 struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
228 223
229 kvmppc_core_queue_dec(vcpu); 224 kvmppc_core_queue_dec(vcpu);
230 225
231 if (waitqueue_active(&vcpu->wq)) { 226 if (waitqueue_active(&vcpu->wq)) {
232 wake_up_interruptible(&vcpu->wq); 227 wake_up_interruptible(&vcpu->wq);
233 vcpu->stat.halt_wakeup++; 228 vcpu->stat.halt_wakeup++;
234 } 229 }
235 } 230 }
236 231
237 /* 232 /*
238 * low level hrtimer wake routine. Because this runs in hardirq context 233 * low level hrtimer wake routine. Because this runs in hardirq context
239 * we schedule a tasklet to do the real work. 234 * we schedule a tasklet to do the real work.
240 */ 235 */
241 enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer) 236 enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
242 { 237 {
243 struct kvm_vcpu *vcpu; 238 struct kvm_vcpu *vcpu;
244 239
245 vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer); 240 vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer);
246 tasklet_schedule(&vcpu->arch.tasklet); 241 tasklet_schedule(&vcpu->arch.tasklet);
247 242
248 return HRTIMER_NORESTART; 243 return HRTIMER_NORESTART;
249 } 244 }
250 245
251 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) 246 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
252 { 247 {
253 hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); 248 hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
254 tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu); 249 tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu);
255 vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup; 250 vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
256 251
257 return 0; 252 return 0;
258 } 253 }
259 254
260 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) 255 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
261 { 256 {
262 kvmppc_mmu_destroy(vcpu); 257 kvmppc_mmu_destroy(vcpu);
263 } 258 }
264 259
265 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) 260 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
266 { 261 {
267 kvmppc_core_vcpu_load(vcpu, cpu); 262 kvmppc_core_vcpu_load(vcpu, cpu);
268 } 263 }
269 264
270 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) 265 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
271 { 266 {
272 kvmppc_core_vcpu_put(vcpu); 267 kvmppc_core_vcpu_put(vcpu);
273 } 268 }
274 269
275 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, 270 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
276 struct kvm_guest_debug *dbg) 271 struct kvm_guest_debug *dbg)
277 { 272 {
278 return -EINVAL; 273 return -EINVAL;
279 } 274 }
280 275
281 static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu, 276 static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu,
282 struct kvm_run *run) 277 struct kvm_run *run)
283 { 278 {
284 kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, run->dcr.data); 279 kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, run->dcr.data);
285 } 280 }
286 281
287 static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, 282 static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu,
288 struct kvm_run *run) 283 struct kvm_run *run)
289 { 284 {
290 u64 uninitialized_var(gpr); 285 u64 uninitialized_var(gpr);
291 286
292 if (run->mmio.len > sizeof(gpr)) { 287 if (run->mmio.len > sizeof(gpr)) {
293 printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len); 288 printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len);
294 return; 289 return;
295 } 290 }
296 291
297 if (vcpu->arch.mmio_is_bigendian) { 292 if (vcpu->arch.mmio_is_bigendian) {
298 switch (run->mmio.len) { 293 switch (run->mmio.len) {
299 case 8: gpr = *(u64 *)run->mmio.data; break; 294 case 8: gpr = *(u64 *)run->mmio.data; break;
300 case 4: gpr = *(u32 *)run->mmio.data; break; 295 case 4: gpr = *(u32 *)run->mmio.data; break;
301 case 2: gpr = *(u16 *)run->mmio.data; break; 296 case 2: gpr = *(u16 *)run->mmio.data; break;
302 case 1: gpr = *(u8 *)run->mmio.data; break; 297 case 1: gpr = *(u8 *)run->mmio.data; break;
303 } 298 }
304 } else { 299 } else {
305 /* Convert BE data from userland back to LE. */ 300 /* Convert BE data from userland back to LE. */
306 switch (run->mmio.len) { 301 switch (run->mmio.len) {
307 case 4: gpr = ld_le32((u32 *)run->mmio.data); break; 302 case 4: gpr = ld_le32((u32 *)run->mmio.data); break;
308 case 2: gpr = ld_le16((u16 *)run->mmio.data); break; 303 case 2: gpr = ld_le16((u16 *)run->mmio.data); break;
309 case 1: gpr = *(u8 *)run->mmio.data; break; 304 case 1: gpr = *(u8 *)run->mmio.data; break;
310 } 305 }
311 } 306 }
312 307
313 if (vcpu->arch.mmio_sign_extend) { 308 if (vcpu->arch.mmio_sign_extend) {
314 switch (run->mmio.len) { 309 switch (run->mmio.len) {
315 #ifdef CONFIG_PPC64 310 #ifdef CONFIG_PPC64
316 case 4: 311 case 4:
317 gpr = (s64)(s32)gpr; 312 gpr = (s64)(s32)gpr;
318 break; 313 break;
319 #endif 314 #endif
320 case 2: 315 case 2:
321 gpr = (s64)(s16)gpr; 316 gpr = (s64)(s16)gpr;
322 break; 317 break;
323 case 1: 318 case 1:
324 gpr = (s64)(s8)gpr; 319 gpr = (s64)(s8)gpr;
325 break; 320 break;
326 } 321 }
327 } 322 }
328 323
329 kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr); 324 kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr);
330 325
331 switch (vcpu->arch.io_gpr & KVM_REG_EXT_MASK) { 326 switch (vcpu->arch.io_gpr & KVM_REG_EXT_MASK) {
332 case KVM_REG_GPR: 327 case KVM_REG_GPR:
333 kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr); 328 kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr);
334 break; 329 break;
335 case KVM_REG_FPR: 330 case KVM_REG_FPR:
336 vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; 331 vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr;
337 break; 332 break;
338 #ifdef CONFIG_PPC_BOOK3S 333 #ifdef CONFIG_PPC_BOOK3S
339 case KVM_REG_QPR: 334 case KVM_REG_QPR:
340 vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; 335 vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr;
341 break; 336 break;
342 case KVM_REG_FQPR: 337 case KVM_REG_FQPR:
343 vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; 338 vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr;
344 vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; 339 vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr;
345 break; 340 break;
346 #endif 341 #endif
347 default: 342 default:
348 BUG(); 343 BUG();
349 } 344 }
350 } 345 }
351 346
352 int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu, 347 int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
353 unsigned int rt, unsigned int bytes, int is_bigendian) 348 unsigned int rt, unsigned int bytes, int is_bigendian)
354 { 349 {
355 if (bytes > sizeof(run->mmio.data)) { 350 if (bytes > sizeof(run->mmio.data)) {
356 printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, 351 printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
357 run->mmio.len); 352 run->mmio.len);
358 } 353 }
359 354
360 run->mmio.phys_addr = vcpu->arch.paddr_accessed; 355 run->mmio.phys_addr = vcpu->arch.paddr_accessed;
361 run->mmio.len = bytes; 356 run->mmio.len = bytes;
362 run->mmio.is_write = 0; 357 run->mmio.is_write = 0;
363 358
364 vcpu->arch.io_gpr = rt; 359 vcpu->arch.io_gpr = rt;
365 vcpu->arch.mmio_is_bigendian = is_bigendian; 360 vcpu->arch.mmio_is_bigendian = is_bigendian;
366 vcpu->mmio_needed = 1; 361 vcpu->mmio_needed = 1;
367 vcpu->mmio_is_write = 0; 362 vcpu->mmio_is_write = 0;
368 vcpu->arch.mmio_sign_extend = 0; 363 vcpu->arch.mmio_sign_extend = 0;
369 364
370 return EMULATE_DO_MMIO; 365 return EMULATE_DO_MMIO;
371 } 366 }
372 367
373 /* Same as above, but sign extends */ 368 /* Same as above, but sign extends */
374 int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu, 369 int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu,
375 unsigned int rt, unsigned int bytes, int is_bigendian) 370 unsigned int rt, unsigned int bytes, int is_bigendian)
376 { 371 {
377 int r; 372 int r;
378 373
379 r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian); 374 r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian);
380 vcpu->arch.mmio_sign_extend = 1; 375 vcpu->arch.mmio_sign_extend = 1;
381 376
382 return r; 377 return r;
383 } 378 }
384 379
385 int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, 380 int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
386 u64 val, unsigned int bytes, int is_bigendian) 381 u64 val, unsigned int bytes, int is_bigendian)
387 { 382 {
388 void *data = run->mmio.data; 383 void *data = run->mmio.data;
389 384
390 if (bytes > sizeof(run->mmio.data)) { 385 if (bytes > sizeof(run->mmio.data)) {
391 printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, 386 printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
392 run->mmio.len); 387 run->mmio.len);
393 } 388 }
394 389
395 run->mmio.phys_addr = vcpu->arch.paddr_accessed; 390 run->mmio.phys_addr = vcpu->arch.paddr_accessed;
396 run->mmio.len = bytes; 391 run->mmio.len = bytes;
397 run->mmio.is_write = 1; 392 run->mmio.is_write = 1;
398 vcpu->mmio_needed = 1; 393 vcpu->mmio_needed = 1;
399 vcpu->mmio_is_write = 1; 394 vcpu->mmio_is_write = 1;
400 395
401 /* Store the value at the lowest bytes in 'data'. */ 396 /* Store the value at the lowest bytes in 'data'. */
402 if (is_bigendian) { 397 if (is_bigendian) {
403 switch (bytes) { 398 switch (bytes) {
404 case 8: *(u64 *)data = val; break; 399 case 8: *(u64 *)data = val; break;
405 case 4: *(u32 *)data = val; break; 400 case 4: *(u32 *)data = val; break;
406 case 2: *(u16 *)data = val; break; 401 case 2: *(u16 *)data = val; break;
407 case 1: *(u8 *)data = val; break; 402 case 1: *(u8 *)data = val; break;
408 } 403 }
409 } else { 404 } else {
410 /* Store LE value into 'data'. */ 405 /* Store LE value into 'data'. */
411 switch (bytes) { 406 switch (bytes) {
412 case 4: st_le32(data, val); break; 407 case 4: st_le32(data, val); break;
413 case 2: st_le16(data, val); break; 408 case 2: st_le16(data, val); break;
414 case 1: *(u8 *)data = val; break; 409 case 1: *(u8 *)data = val; break;
415 } 410 }
416 } 411 }
417 412
418 return EMULATE_DO_MMIO; 413 return EMULATE_DO_MMIO;
419 } 414 }
420 415
421 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) 416 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
422 { 417 {
423 int r; 418 int r;
424 sigset_t sigsaved; 419 sigset_t sigsaved;
425 420
426 if (vcpu->sigset_active) 421 if (vcpu->sigset_active)
427 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); 422 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
428 423
429 if (vcpu->mmio_needed) { 424 if (vcpu->mmio_needed) {
430 if (!vcpu->mmio_is_write) 425 if (!vcpu->mmio_is_write)
431 kvmppc_complete_mmio_load(vcpu, run); 426 kvmppc_complete_mmio_load(vcpu, run);
432 vcpu->mmio_needed = 0; 427 vcpu->mmio_needed = 0;
433 } else if (vcpu->arch.dcr_needed) { 428 } else if (vcpu->arch.dcr_needed) {
434 if (!vcpu->arch.dcr_is_write) 429 if (!vcpu->arch.dcr_is_write)
435 kvmppc_complete_dcr_load(vcpu, run); 430 kvmppc_complete_dcr_load(vcpu, run);
436 vcpu->arch.dcr_needed = 0; 431 vcpu->arch.dcr_needed = 0;
437 } else if (vcpu->arch.osi_needed) { 432 } else if (vcpu->arch.osi_needed) {
438 u64 *gprs = run->osi.gprs; 433 u64 *gprs = run->osi.gprs;
439 int i; 434 int i;
440 435
441 for (i = 0; i < 32; i++) 436 for (i = 0; i < 32; i++)
442 kvmppc_set_gpr(vcpu, i, gprs[i]); 437 kvmppc_set_gpr(vcpu, i, gprs[i]);
443 vcpu->arch.osi_needed = 0; 438 vcpu->arch.osi_needed = 0;
444 } 439 }
445 440
446 kvmppc_core_deliver_interrupts(vcpu); 441 kvmppc_core_deliver_interrupts(vcpu);
447 442
448 local_irq_disable(); 443 local_irq_disable();
449 kvm_guest_enter(); 444 kvm_guest_enter();
450 r = __kvmppc_vcpu_run(run, vcpu); 445 r = __kvmppc_vcpu_run(run, vcpu);
451 kvm_guest_exit(); 446 kvm_guest_exit();
452 local_irq_enable(); 447 local_irq_enable();
453 448
454 if (vcpu->sigset_active) 449 if (vcpu->sigset_active)
455 sigprocmask(SIG_SETMASK, &sigsaved, NULL); 450 sigprocmask(SIG_SETMASK, &sigsaved, NULL);
456 451
457 return r; 452 return r;
458 } 453 }
459 454
460 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) 455 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
461 { 456 {
462 if (irq->irq == KVM_INTERRUPT_UNSET) 457 if (irq->irq == KVM_INTERRUPT_UNSET)
463 kvmppc_core_dequeue_external(vcpu, irq); 458 kvmppc_core_dequeue_external(vcpu, irq);
464 else 459 else
465 kvmppc_core_queue_external(vcpu, irq); 460 kvmppc_core_queue_external(vcpu, irq);
466 461
467 if (waitqueue_active(&vcpu->wq)) { 462 if (waitqueue_active(&vcpu->wq)) {
468 wake_up_interruptible(&vcpu->wq); 463 wake_up_interruptible(&vcpu->wq);
469 vcpu->stat.halt_wakeup++; 464 vcpu->stat.halt_wakeup++;
470 } 465 }
471 466
472 return 0; 467 return 0;
473 } 468 }
474 469
475 static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, 470 static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
476 struct kvm_enable_cap *cap) 471 struct kvm_enable_cap *cap)
477 { 472 {
478 int r; 473 int r;
479 474
480 if (cap->flags) 475 if (cap->flags)
481 return -EINVAL; 476 return -EINVAL;
482 477
483 switch (cap->cap) { 478 switch (cap->cap) {
484 case KVM_CAP_PPC_OSI: 479 case KVM_CAP_PPC_OSI:
485 r = 0; 480 r = 0;
486 vcpu->arch.osi_enabled = true; 481 vcpu->arch.osi_enabled = true;
487 break; 482 break;
488 default: 483 default:
489 r = -EINVAL; 484 r = -EINVAL;
490 break; 485 break;
491 } 486 }
492 487
493 return r; 488 return r;
494 } 489 }
495 490
496 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, 491 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
497 struct kvm_mp_state *mp_state) 492 struct kvm_mp_state *mp_state)
498 { 493 {
499 return -EINVAL; 494 return -EINVAL;
500 } 495 }
501 496
502 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, 497 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
503 struct kvm_mp_state *mp_state) 498 struct kvm_mp_state *mp_state)
504 { 499 {
505 return -EINVAL; 500 return -EINVAL;
506 } 501 }
507 502
508 long kvm_arch_vcpu_ioctl(struct file *filp, 503 long kvm_arch_vcpu_ioctl(struct file *filp,
509 unsigned int ioctl, unsigned long arg) 504 unsigned int ioctl, unsigned long arg)
510 { 505 {
511 struct kvm_vcpu *vcpu = filp->private_data; 506 struct kvm_vcpu *vcpu = filp->private_data;
512 void __user *argp = (void __user *)arg; 507 void __user *argp = (void __user *)arg;
513 long r; 508 long r;
514 509
515 switch (ioctl) { 510 switch (ioctl) {
516 case KVM_INTERRUPT: { 511 case KVM_INTERRUPT: {
517 struct kvm_interrupt irq; 512 struct kvm_interrupt irq;
518 r = -EFAULT; 513 r = -EFAULT;
519 if (copy_from_user(&irq, argp, sizeof(irq))) 514 if (copy_from_user(&irq, argp, sizeof(irq)))
520 goto out; 515 goto out;
521 r = kvm_vcpu_ioctl_interrupt(vcpu, &irq); 516 r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
522 goto out; 517 goto out;
523 } 518 }
524 519
525 case KVM_ENABLE_CAP: 520 case KVM_ENABLE_CAP:
526 { 521 {
527 struct kvm_enable_cap cap; 522 struct kvm_enable_cap cap;
528 r = -EFAULT; 523 r = -EFAULT;
529 if (copy_from_user(&cap, argp, sizeof(cap))) 524 if (copy_from_user(&cap, argp, sizeof(cap)))
530 goto out; 525 goto out;
531 r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap); 526 r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap);
532 break; 527 break;
533 } 528 }
534 default: 529 default:
535 r = -EINVAL; 530 r = -EINVAL;
536 } 531 }
537 532
538 out: 533 out:
539 return r; 534 return r;
540 } 535 }
541 536
542 long kvm_arch_vm_ioctl(struct file *filp, 537 long kvm_arch_vm_ioctl(struct file *filp,
543 unsigned int ioctl, unsigned long arg) 538 unsigned int ioctl, unsigned long arg)
544 { 539 {
545 long r; 540 long r;
546 541
547 switch (ioctl) { 542 switch (ioctl) {
548 default: 543 default:
549 r = -ENOTTY; 544 r = -ENOTTY;
550 } 545 }
551 546
552 return r; 547 return r;
553 } 548 }
554 549
555 int kvm_arch_init(void *opaque) 550 int kvm_arch_init(void *opaque)
556 { 551 {
557 return 0; 552 return 0;
558 } 553 }
559 554
560 void kvm_arch_exit(void) 555 void kvm_arch_exit(void)
561 { 556 {
562 } 557 }
563 558
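[Editor's note] For reference, kvmppc_complete_mmio_load() above rebuilds a register value from the bytes userspace placed in run->mmio.data, honouring the access size, the recorded endianness and an optional sign extension. A compact stand-alone sketch of that decoding pattern follows; the helper is hypothetical and illustrative only, not the kernel's exact byte-order handling:

	#include <stdint.h>

	/* Illustrative only: assemble a value of 'len' bytes from an MMIO data
	 * buffer, treating it as big- or little-endian, then sign-extend the
	 * result on request. */
	static uint64_t sketch_mmio_to_gpr(const uint8_t *data, unsigned int len,
					   int is_bigendian, int sign_extend)
	{
		uint64_t gpr = 0;
		unsigned int i;

		for (i = 0; i < len; i++)
			gpr = is_bigendian ? (gpr << 8) | data[i]
					   : gpr | ((uint64_t)data[i] << (8 * i));

		if (sign_extend && len && len < 8 &&
		    (gpr & (1ULL << (8 * len - 1))))
			gpr |= ~0ULL << (8 * len);	/* propagate the sign bit */

		return gpr;
	}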
arch/s390/kvm/kvm-s390.c
1 /* 1 /*
2 * s390host.c -- hosting zSeries kernel virtual machines 2 * s390host.c -- hosting zSeries kernel virtual machines
3 * 3 *
4 * Copyright IBM Corp. 2008,2009 4 * Copyright IBM Corp. 2008,2009
5 * 5 *
6 * This program is free software; you can redistribute it and/or modify 6 * This program is free software; you can redistribute it and/or modify
7 * it under the terms of the GNU General Public License (version 2 only) 7 * it under the terms of the GNU General Public License (version 2 only)
8 * as published by the Free Software Foundation. 8 * as published by the Free Software Foundation.
9 * 9 *
10 * Author(s): Carsten Otte <cotte@de.ibm.com> 10 * Author(s): Carsten Otte <cotte@de.ibm.com>
11 * Christian Borntraeger <borntraeger@de.ibm.com> 11 * Christian Borntraeger <borntraeger@de.ibm.com>
12 * Heiko Carstens <heiko.carstens@de.ibm.com> 12 * Heiko Carstens <heiko.carstens@de.ibm.com>
13 * Christian Ehrhardt <ehrhardt@de.ibm.com> 13 * Christian Ehrhardt <ehrhardt@de.ibm.com>
14 */ 14 */
15 15
16 #include <linux/compiler.h> 16 #include <linux/compiler.h>
17 #include <linux/err.h> 17 #include <linux/err.h>
18 #include <linux/fs.h> 18 #include <linux/fs.h>
19 #include <linux/hrtimer.h> 19 #include <linux/hrtimer.h>
20 #include <linux/init.h> 20 #include <linux/init.h>
21 #include <linux/kvm.h> 21 #include <linux/kvm.h>
22 #include <linux/kvm_host.h> 22 #include <linux/kvm_host.h>
23 #include <linux/module.h> 23 #include <linux/module.h>
24 #include <linux/slab.h> 24 #include <linux/slab.h>
25 #include <linux/timer.h> 25 #include <linux/timer.h>
26 #include <asm/asm-offsets.h> 26 #include <asm/asm-offsets.h>
27 #include <asm/lowcore.h> 27 #include <asm/lowcore.h>
28 #include <asm/pgtable.h> 28 #include <asm/pgtable.h>
29 #include <asm/nmi.h> 29 #include <asm/nmi.h>
30 #include <asm/system.h> 30 #include <asm/system.h>
31 #include "kvm-s390.h" 31 #include "kvm-s390.h"
32 #include "gaccess.h" 32 #include "gaccess.h"
33 33
34 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU 34 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
35 35
36 struct kvm_stats_debugfs_item debugfs_entries[] = { 36 struct kvm_stats_debugfs_item debugfs_entries[] = {
37 { "userspace_handled", VCPU_STAT(exit_userspace) }, 37 { "userspace_handled", VCPU_STAT(exit_userspace) },
38 { "exit_null", VCPU_STAT(exit_null) }, 38 { "exit_null", VCPU_STAT(exit_null) },
39 { "exit_validity", VCPU_STAT(exit_validity) }, 39 { "exit_validity", VCPU_STAT(exit_validity) },
40 { "exit_stop_request", VCPU_STAT(exit_stop_request) }, 40 { "exit_stop_request", VCPU_STAT(exit_stop_request) },
41 { "exit_external_request", VCPU_STAT(exit_external_request) }, 41 { "exit_external_request", VCPU_STAT(exit_external_request) },
42 { "exit_external_interrupt", VCPU_STAT(exit_external_interrupt) }, 42 { "exit_external_interrupt", VCPU_STAT(exit_external_interrupt) },
43 { "exit_instruction", VCPU_STAT(exit_instruction) }, 43 { "exit_instruction", VCPU_STAT(exit_instruction) },
44 { "exit_program_interruption", VCPU_STAT(exit_program_interruption) }, 44 { "exit_program_interruption", VCPU_STAT(exit_program_interruption) },
45 { "exit_instr_and_program_int", VCPU_STAT(exit_instr_and_program) }, 45 { "exit_instr_and_program_int", VCPU_STAT(exit_instr_and_program) },
46 { "instruction_lctlg", VCPU_STAT(instruction_lctlg) }, 46 { "instruction_lctlg", VCPU_STAT(instruction_lctlg) },
47 { "instruction_lctl", VCPU_STAT(instruction_lctl) }, 47 { "instruction_lctl", VCPU_STAT(instruction_lctl) },
48 { "deliver_emergency_signal", VCPU_STAT(deliver_emergency_signal) }, 48 { "deliver_emergency_signal", VCPU_STAT(deliver_emergency_signal) },
49 { "deliver_service_signal", VCPU_STAT(deliver_service_signal) }, 49 { "deliver_service_signal", VCPU_STAT(deliver_service_signal) },
50 { "deliver_virtio_interrupt", VCPU_STAT(deliver_virtio_interrupt) }, 50 { "deliver_virtio_interrupt", VCPU_STAT(deliver_virtio_interrupt) },
51 { "deliver_stop_signal", VCPU_STAT(deliver_stop_signal) }, 51 { "deliver_stop_signal", VCPU_STAT(deliver_stop_signal) },
52 { "deliver_prefix_signal", VCPU_STAT(deliver_prefix_signal) }, 52 { "deliver_prefix_signal", VCPU_STAT(deliver_prefix_signal) },
53 { "deliver_restart_signal", VCPU_STAT(deliver_restart_signal) }, 53 { "deliver_restart_signal", VCPU_STAT(deliver_restart_signal) },
54 { "deliver_program_interruption", VCPU_STAT(deliver_program_int) }, 54 { "deliver_program_interruption", VCPU_STAT(deliver_program_int) },
55 { "exit_wait_state", VCPU_STAT(exit_wait_state) }, 55 { "exit_wait_state", VCPU_STAT(exit_wait_state) },
56 { "instruction_stidp", VCPU_STAT(instruction_stidp) }, 56 { "instruction_stidp", VCPU_STAT(instruction_stidp) },
57 { "instruction_spx", VCPU_STAT(instruction_spx) }, 57 { "instruction_spx", VCPU_STAT(instruction_spx) },
58 { "instruction_stpx", VCPU_STAT(instruction_stpx) }, 58 { "instruction_stpx", VCPU_STAT(instruction_stpx) },
59 { "instruction_stap", VCPU_STAT(instruction_stap) }, 59 { "instruction_stap", VCPU_STAT(instruction_stap) },
60 { "instruction_storage_key", VCPU_STAT(instruction_storage_key) }, 60 { "instruction_storage_key", VCPU_STAT(instruction_storage_key) },
61 { "instruction_stsch", VCPU_STAT(instruction_stsch) }, 61 { "instruction_stsch", VCPU_STAT(instruction_stsch) },
62 { "instruction_chsc", VCPU_STAT(instruction_chsc) }, 62 { "instruction_chsc", VCPU_STAT(instruction_chsc) },
63 { "instruction_stsi", VCPU_STAT(instruction_stsi) }, 63 { "instruction_stsi", VCPU_STAT(instruction_stsi) },
64 { "instruction_stfl", VCPU_STAT(instruction_stfl) }, 64 { "instruction_stfl", VCPU_STAT(instruction_stfl) },
65 { "instruction_sigp_sense", VCPU_STAT(instruction_sigp_sense) }, 65 { "instruction_sigp_sense", VCPU_STAT(instruction_sigp_sense) },
66 { "instruction_sigp_emergency", VCPU_STAT(instruction_sigp_emergency) }, 66 { "instruction_sigp_emergency", VCPU_STAT(instruction_sigp_emergency) },
67 { "instruction_sigp_stop", VCPU_STAT(instruction_sigp_stop) }, 67 { "instruction_sigp_stop", VCPU_STAT(instruction_sigp_stop) },
68 { "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) }, 68 { "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) },
69 { "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) }, 69 { "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) },
70 { "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) }, 70 { "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) },
71 { "diagnose_44", VCPU_STAT(diagnose_44) }, 71 { "diagnose_44", VCPU_STAT(diagnose_44) },
72 { NULL } 72 { NULL }
73 }; 73 };
74 74
75 static unsigned long long *facilities; 75 static unsigned long long *facilities;
76 76
77 /* Section: not file related */ 77 /* Section: not file related */
78 int kvm_arch_hardware_enable(void *garbage) 78 int kvm_arch_hardware_enable(void *garbage)
79 { 79 {
80 /* every s390 is virtualization enabled ;-) */ 80 /* every s390 is virtualization enabled ;-) */
81 return 0; 81 return 0;
82 } 82 }
83 83
84 void kvm_arch_hardware_disable(void *garbage) 84 void kvm_arch_hardware_disable(void *garbage)
85 { 85 {
86 } 86 }
87 87
88 int kvm_arch_hardware_setup(void) 88 int kvm_arch_hardware_setup(void)
89 { 89 {
90 return 0; 90 return 0;
91 } 91 }
92 92
93 void kvm_arch_hardware_unsetup(void) 93 void kvm_arch_hardware_unsetup(void)
94 { 94 {
95 } 95 }
96 96
97 void kvm_arch_check_processor_compat(void *rtn) 97 void kvm_arch_check_processor_compat(void *rtn)
98 { 98 {
99 } 99 }
100 100
101 int kvm_arch_init(void *opaque) 101 int kvm_arch_init(void *opaque)
102 { 102 {
103 return 0; 103 return 0;
104 } 104 }
105 105
106 void kvm_arch_exit(void) 106 void kvm_arch_exit(void)
107 { 107 {
108 } 108 }
109 109
110 /* Section: device related */ 110 /* Section: device related */
111 long kvm_arch_dev_ioctl(struct file *filp, 111 long kvm_arch_dev_ioctl(struct file *filp,
112 unsigned int ioctl, unsigned long arg) 112 unsigned int ioctl, unsigned long arg)
113 { 113 {
114 if (ioctl == KVM_S390_ENABLE_SIE) 114 if (ioctl == KVM_S390_ENABLE_SIE)
115 return s390_enable_sie(); 115 return s390_enable_sie();
116 return -EINVAL; 116 return -EINVAL;
117 } 117 }
118 118
119 int kvm_dev_ioctl_check_extension(long ext) 119 int kvm_dev_ioctl_check_extension(long ext)
120 { 120 {
121 int r; 121 int r;
122 122
123 switch (ext) { 123 switch (ext) {
124 case KVM_CAP_S390_PSW: 124 case KVM_CAP_S390_PSW:
125 r = 1; 125 r = 1;
126 break; 126 break;
127 default: 127 default:
128 r = 0; 128 r = 0;
129 } 129 }
130 return r; 130 return r;
131 } 131 }
132 132
133 /* Section: vm related */ 133 /* Section: vm related */
134 /* 134 /*
135 * Get (and clear) the dirty memory log for a memory slot. 135 * Get (and clear) the dirty memory log for a memory slot.
136 */ 136 */
137 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, 137 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
138 struct kvm_dirty_log *log) 138 struct kvm_dirty_log *log)
139 { 139 {
140 return 0; 140 return 0;
141 } 141 }
142 142
143 long kvm_arch_vm_ioctl(struct file *filp, 143 long kvm_arch_vm_ioctl(struct file *filp,
144 unsigned int ioctl, unsigned long arg) 144 unsigned int ioctl, unsigned long arg)
145 { 145 {
146 struct kvm *kvm = filp->private_data; 146 struct kvm *kvm = filp->private_data;
147 void __user *argp = (void __user *)arg; 147 void __user *argp = (void __user *)arg;
148 int r; 148 int r;
149 149
150 switch (ioctl) { 150 switch (ioctl) {
151 case KVM_S390_INTERRUPT: { 151 case KVM_S390_INTERRUPT: {
152 struct kvm_s390_interrupt s390int; 152 struct kvm_s390_interrupt s390int;
153 153
154 r = -EFAULT; 154 r = -EFAULT;
155 if (copy_from_user(&s390int, argp, sizeof(s390int))) 155 if (copy_from_user(&s390int, argp, sizeof(s390int)))
156 break; 156 break;
157 r = kvm_s390_inject_vm(kvm, &s390int); 157 r = kvm_s390_inject_vm(kvm, &s390int);
158 break; 158 break;
159 } 159 }
160 default: 160 default:
161 r = -ENOTTY; 161 r = -ENOTTY;
162 } 162 }
163 163
164 return r; 164 return r;
165 } 165 }
166 166
167 struct kvm *kvm_arch_create_vm(void) 167 struct kvm *kvm_arch_create_vm(void)
168 { 168 {
169 struct kvm *kvm; 169 struct kvm *kvm;
170 int rc; 170 int rc;
171 char debug_name[16]; 171 char debug_name[16];
172 172
173 rc = s390_enable_sie(); 173 rc = s390_enable_sie();
174 if (rc) 174 if (rc)
175 goto out_nokvm; 175 goto out_nokvm;
176 176
177 rc = -ENOMEM; 177 rc = -ENOMEM;
178 kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); 178 kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
179 if (!kvm) 179 if (!kvm)
180 goto out_nokvm; 180 goto out_nokvm;
181 181
182 kvm->arch.sca = (struct sca_block *) get_zeroed_page(GFP_KERNEL); 182 kvm->arch.sca = (struct sca_block *) get_zeroed_page(GFP_KERNEL);
183 if (!kvm->arch.sca) 183 if (!kvm->arch.sca)
184 goto out_nosca; 184 goto out_nosca;
185 185
186 sprintf(debug_name, "kvm-%u", current->pid); 186 sprintf(debug_name, "kvm-%u", current->pid);
187 187
188 kvm->arch.dbf = debug_register(debug_name, 8, 2, 8 * sizeof(long)); 188 kvm->arch.dbf = debug_register(debug_name, 8, 2, 8 * sizeof(long));
189 if (!kvm->arch.dbf) 189 if (!kvm->arch.dbf)
190 goto out_nodbf; 190 goto out_nodbf;
191 191
192 spin_lock_init(&kvm->arch.float_int.lock); 192 spin_lock_init(&kvm->arch.float_int.lock);
193 INIT_LIST_HEAD(&kvm->arch.float_int.list); 193 INIT_LIST_HEAD(&kvm->arch.float_int.list);
194 194
195 debug_register_view(kvm->arch.dbf, &debug_sprintf_view); 195 debug_register_view(kvm->arch.dbf, &debug_sprintf_view);
196 VM_EVENT(kvm, 3, "%s", "vm created"); 196 VM_EVENT(kvm, 3, "%s", "vm created");
197 197
198 return kvm; 198 return kvm;
199 out_nodbf: 199 out_nodbf:
200 free_page((unsigned long)(kvm->arch.sca)); 200 free_page((unsigned long)(kvm->arch.sca));
201 out_nosca: 201 out_nosca:
202 kfree(kvm); 202 kfree(kvm);
203 out_nokvm: 203 out_nokvm:
204 return ERR_PTR(rc); 204 return ERR_PTR(rc);
205 } 205 }
206 206
207 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) 207 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
208 { 208 {
209 VCPU_EVENT(vcpu, 3, "%s", "free cpu"); 209 VCPU_EVENT(vcpu, 3, "%s", "free cpu");
210 clear_bit(63 - vcpu->vcpu_id, (unsigned long *) &vcpu->kvm->arch.sca->mcn); 210 clear_bit(63 - vcpu->vcpu_id, (unsigned long *) &vcpu->kvm->arch.sca->mcn);
211 if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda == 211 if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
212 (__u64) vcpu->arch.sie_block) 212 (__u64) vcpu->arch.sie_block)
213 vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0; 213 vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
214 smp_mb(); 214 smp_mb();
215 free_page((unsigned long)(vcpu->arch.sie_block)); 215 free_page((unsigned long)(vcpu->arch.sie_block));
216 kvm_vcpu_uninit(vcpu); 216 kvm_vcpu_uninit(vcpu);
217 kfree(vcpu); 217 kfree(vcpu);
218 } 218 }
219 219
220 static void kvm_free_vcpus(struct kvm *kvm) 220 static void kvm_free_vcpus(struct kvm *kvm)
221 { 221 {
222 unsigned int i; 222 unsigned int i;
223 struct kvm_vcpu *vcpu; 223 struct kvm_vcpu *vcpu;
224 224
225 kvm_for_each_vcpu(i, vcpu, kvm) 225 kvm_for_each_vcpu(i, vcpu, kvm)
226 kvm_arch_vcpu_destroy(vcpu); 226 kvm_arch_vcpu_destroy(vcpu);
227 227
228 mutex_lock(&kvm->lock); 228 mutex_lock(&kvm->lock);
229 for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) 229 for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
230 kvm->vcpus[i] = NULL; 230 kvm->vcpus[i] = NULL;
231 231
232 atomic_set(&kvm->online_vcpus, 0); 232 atomic_set(&kvm->online_vcpus, 0);
233 mutex_unlock(&kvm->lock); 233 mutex_unlock(&kvm->lock);
234 } 234 }
235 235
236 void kvm_arch_sync_events(struct kvm *kvm) 236 void kvm_arch_sync_events(struct kvm *kvm)
237 { 237 {
238 } 238 }
239 239
240 void kvm_arch_destroy_vm(struct kvm *kvm) 240 void kvm_arch_destroy_vm(struct kvm *kvm)
241 { 241 {
242 kvm_free_vcpus(kvm); 242 kvm_free_vcpus(kvm);
243 kvm_free_physmem(kvm); 243 kvm_free_physmem(kvm);
244 free_page((unsigned long)(kvm->arch.sca)); 244 free_page((unsigned long)(kvm->arch.sca));
245 debug_unregister(kvm->arch.dbf); 245 debug_unregister(kvm->arch.dbf);
246 cleanup_srcu_struct(&kvm->srcu); 246 cleanup_srcu_struct(&kvm->srcu);
247 kfree(kvm); 247 kfree(kvm);
248 } 248 }
249 249
250 /* Section: vcpu related */ 250 /* Section: vcpu related */
251 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) 251 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
252 { 252 {
253 return 0; 253 return 0;
254 } 254 }
255 255
256 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) 256 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
257 { 257 {
258 /* Nothing to do */ 258 /* Nothing to do */
259 } 259 }
260 260
261 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) 261 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
262 { 262 {
263 save_fp_regs(&vcpu->arch.host_fpregs); 263 save_fp_regs(&vcpu->arch.host_fpregs);
264 save_access_regs(vcpu->arch.host_acrs); 264 save_access_regs(vcpu->arch.host_acrs);
265 vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK; 265 vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK;
266 restore_fp_regs(&vcpu->arch.guest_fpregs); 266 restore_fp_regs(&vcpu->arch.guest_fpregs);
267 restore_access_regs(vcpu->arch.guest_acrs); 267 restore_access_regs(vcpu->arch.guest_acrs);
268 } 268 }
269 269
270 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) 270 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
271 { 271 {
272 save_fp_regs(&vcpu->arch.guest_fpregs); 272 save_fp_regs(&vcpu->arch.guest_fpregs);
273 save_access_regs(vcpu->arch.guest_acrs); 273 save_access_regs(vcpu->arch.guest_acrs);
274 restore_fp_regs(&vcpu->arch.host_fpregs); 274 restore_fp_regs(&vcpu->arch.host_fpregs);
275 restore_access_regs(vcpu->arch.host_acrs); 275 restore_access_regs(vcpu->arch.host_acrs);
276 } 276 }
277 277
278 static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu) 278 static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
279 { 279 {
280 /* this equals initial cpu reset in pop, but we don't switch to ESA */ 280 /* this equals initial cpu reset in pop, but we don't switch to ESA */
281 vcpu->arch.sie_block->gpsw.mask = 0UL; 281 vcpu->arch.sie_block->gpsw.mask = 0UL;
282 vcpu->arch.sie_block->gpsw.addr = 0UL; 282 vcpu->arch.sie_block->gpsw.addr = 0UL;
283 vcpu->arch.sie_block->prefix = 0UL; 283 vcpu->arch.sie_block->prefix = 0UL;
284 vcpu->arch.sie_block->ihcpu = 0xffff; 284 vcpu->arch.sie_block->ihcpu = 0xffff;
285 vcpu->arch.sie_block->cputm = 0UL; 285 vcpu->arch.sie_block->cputm = 0UL;
286 vcpu->arch.sie_block->ckc = 0UL; 286 vcpu->arch.sie_block->ckc = 0UL;
287 vcpu->arch.sie_block->todpr = 0; 287 vcpu->arch.sie_block->todpr = 0;
288 memset(vcpu->arch.sie_block->gcr, 0, 16 * sizeof(__u64)); 288 memset(vcpu->arch.sie_block->gcr, 0, 16 * sizeof(__u64));
289 vcpu->arch.sie_block->gcr[0] = 0xE0UL; 289 vcpu->arch.sie_block->gcr[0] = 0xE0UL;
290 vcpu->arch.sie_block->gcr[14] = 0xC2000000UL; 290 vcpu->arch.sie_block->gcr[14] = 0xC2000000UL;
291 vcpu->arch.guest_fpregs.fpc = 0; 291 vcpu->arch.guest_fpregs.fpc = 0;
292 asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc)); 292 asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc));
293 vcpu->arch.sie_block->gbea = 1; 293 vcpu->arch.sie_block->gbea = 1;
294 } 294 }
295 295
296 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) 296 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
297 { 297 {
298 atomic_set(&vcpu->arch.sie_block->cpuflags, CPUSTAT_ZARCH); 298 atomic_set(&vcpu->arch.sie_block->cpuflags, CPUSTAT_ZARCH);
299 set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests); 299 set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests);
300 vcpu->arch.sie_block->ecb = 6; 300 vcpu->arch.sie_block->ecb = 6;
301 vcpu->arch.sie_block->eca = 0xC1002001U; 301 vcpu->arch.sie_block->eca = 0xC1002001U;
302 vcpu->arch.sie_block->fac = (int) (long) facilities; 302 vcpu->arch.sie_block->fac = (int) (long) facilities;
303 hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); 303 hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
304 tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet, 304 tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet,
305 (unsigned long) vcpu); 305 (unsigned long) vcpu);
306 vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup; 306 vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup;
307 get_cpu_id(&vcpu->arch.cpu_id); 307 get_cpu_id(&vcpu->arch.cpu_id);
308 vcpu->arch.cpu_id.version = 0xff; 308 vcpu->arch.cpu_id.version = 0xff;
309 return 0; 309 return 0;
310 } 310 }
311 311
312 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 312 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
313 unsigned int id) 313 unsigned int id)
314 { 314 {
315 struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); 315 struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
316 int rc = -ENOMEM; 316 int rc = -ENOMEM;
317 317
318 if (!vcpu) 318 if (!vcpu)
319 goto out_nomem; 319 goto out_nomem;
320 320
321 vcpu->arch.sie_block = (struct kvm_s390_sie_block *) 321 vcpu->arch.sie_block = (struct kvm_s390_sie_block *)
322 get_zeroed_page(GFP_KERNEL); 322 get_zeroed_page(GFP_KERNEL);
323 323
324 if (!vcpu->arch.sie_block) 324 if (!vcpu->arch.sie_block)
325 goto out_free_cpu; 325 goto out_free_cpu;
326 326
327 vcpu->arch.sie_block->icpua = id; 327 vcpu->arch.sie_block->icpua = id;
328 BUG_ON(!kvm->arch.sca); 328 BUG_ON(!kvm->arch.sca);
329 if (!kvm->arch.sca->cpu[id].sda) 329 if (!kvm->arch.sca->cpu[id].sda)
330 kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block; 330 kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
331 vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32); 331 vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32);
332 vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca; 332 vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
333 set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn); 333 set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
334 334
335 spin_lock_init(&vcpu->arch.local_int.lock); 335 spin_lock_init(&vcpu->arch.local_int.lock);
336 INIT_LIST_HEAD(&vcpu->arch.local_int.list); 336 INIT_LIST_HEAD(&vcpu->arch.local_int.list);
337 vcpu->arch.local_int.float_int = &kvm->arch.float_int; 337 vcpu->arch.local_int.float_int = &kvm->arch.float_int;
338 spin_lock(&kvm->arch.float_int.lock); 338 spin_lock(&kvm->arch.float_int.lock);
339 kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int; 339 kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int;
340 init_waitqueue_head(&vcpu->arch.local_int.wq); 340 init_waitqueue_head(&vcpu->arch.local_int.wq);
341 vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags; 341 vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags;
342 spin_unlock(&kvm->arch.float_int.lock); 342 spin_unlock(&kvm->arch.float_int.lock);
343 343
344 rc = kvm_vcpu_init(vcpu, kvm, id); 344 rc = kvm_vcpu_init(vcpu, kvm, id);
345 if (rc) 345 if (rc)
346 goto out_free_sie_block; 346 goto out_free_sie_block;
347 VM_EVENT(kvm, 3, "create cpu %d at %p, sie block at %p", id, vcpu, 347 VM_EVENT(kvm, 3, "create cpu %d at %p, sie block at %p", id, vcpu,
348 vcpu->arch.sie_block); 348 vcpu->arch.sie_block);
349 349
350 return vcpu; 350 return vcpu;
351 out_free_sie_block: 351 out_free_sie_block:
352 free_page((unsigned long)(vcpu->arch.sie_block)); 352 free_page((unsigned long)(vcpu->arch.sie_block));
353 out_free_cpu: 353 out_free_cpu:
354 kfree(vcpu); 354 kfree(vcpu);
355 out_nomem: 355 out_nomem:
356 return ERR_PTR(rc); 356 return ERR_PTR(rc);
357 } 357 }
358 358
359 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) 359 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
360 { 360 {
361 /* kvm common code refers to this, but never calls it */ 361 /* kvm common code refers to this, but never calls it */
362 BUG(); 362 BUG();
363 return 0; 363 return 0;
364 } 364 }
365 365
366 static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu) 366 static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
367 { 367 {
368 kvm_s390_vcpu_initial_reset(vcpu); 368 kvm_s390_vcpu_initial_reset(vcpu);
369 return 0; 369 return 0;
370 } 370 }
371 371
372 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) 372 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
373 { 373 {
374 memcpy(&vcpu->arch.guest_gprs, &regs->gprs, sizeof(regs->gprs)); 374 memcpy(&vcpu->arch.guest_gprs, &regs->gprs, sizeof(regs->gprs));
375 return 0; 375 return 0;
376 } 376 }
377 377
378 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) 378 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
379 { 379 {
380 memcpy(&regs->gprs, &vcpu->arch.guest_gprs, sizeof(regs->gprs)); 380 memcpy(&regs->gprs, &vcpu->arch.guest_gprs, sizeof(regs->gprs));
381 return 0; 381 return 0;
382 } 382 }
383 383
384 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, 384 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
385 struct kvm_sregs *sregs) 385 struct kvm_sregs *sregs)
386 { 386 {
387 memcpy(&vcpu->arch.guest_acrs, &sregs->acrs, sizeof(sregs->acrs)); 387 memcpy(&vcpu->arch.guest_acrs, &sregs->acrs, sizeof(sregs->acrs));
388 memcpy(&vcpu->arch.sie_block->gcr, &sregs->crs, sizeof(sregs->crs)); 388 memcpy(&vcpu->arch.sie_block->gcr, &sregs->crs, sizeof(sregs->crs));
389 return 0; 389 return 0;
390 } 390 }
391 391
392 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, 392 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
393 struct kvm_sregs *sregs) 393 struct kvm_sregs *sregs)
394 { 394 {
395 memcpy(&sregs->acrs, &vcpu->arch.guest_acrs, sizeof(sregs->acrs)); 395 memcpy(&sregs->acrs, &vcpu->arch.guest_acrs, sizeof(sregs->acrs));
396 memcpy(&sregs->crs, &vcpu->arch.sie_block->gcr, sizeof(sregs->crs)); 396 memcpy(&sregs->crs, &vcpu->arch.sie_block->gcr, sizeof(sregs->crs));
397 return 0; 397 return 0;
398 } 398 }
399 399
400 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) 400 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
401 { 401 {
402 memcpy(&vcpu->arch.guest_fpregs.fprs, &fpu->fprs, sizeof(fpu->fprs)); 402 memcpy(&vcpu->arch.guest_fpregs.fprs, &fpu->fprs, sizeof(fpu->fprs));
403 vcpu->arch.guest_fpregs.fpc = fpu->fpc; 403 vcpu->arch.guest_fpregs.fpc = fpu->fpc;
404 return 0; 404 return 0;
405 } 405 }
406 406
407 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) 407 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
408 { 408 {
409 memcpy(&fpu->fprs, &vcpu->arch.guest_fpregs.fprs, sizeof(fpu->fprs)); 409 memcpy(&fpu->fprs, &vcpu->arch.guest_fpregs.fprs, sizeof(fpu->fprs));
410 fpu->fpc = vcpu->arch.guest_fpregs.fpc; 410 fpu->fpc = vcpu->arch.guest_fpregs.fpc;
411 return 0; 411 return 0;
412 } 412 }
413 413
414 static int kvm_arch_vcpu_ioctl_set_initial_psw(struct kvm_vcpu *vcpu, psw_t psw) 414 static int kvm_arch_vcpu_ioctl_set_initial_psw(struct kvm_vcpu *vcpu, psw_t psw)
415 { 415 {
416 int rc = 0; 416 int rc = 0;
417 417
418 if (atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_RUNNING) 418 if (atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_RUNNING)
419 rc = -EBUSY; 419 rc = -EBUSY;
420 else { 420 else {
421 vcpu->run->psw_mask = psw.mask; 421 vcpu->run->psw_mask = psw.mask;
422 vcpu->run->psw_addr = psw.addr; 422 vcpu->run->psw_addr = psw.addr;
423 } 423 }
424 return rc; 424 return rc;
425 } 425 }
426 426
427 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, 427 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
428 struct kvm_translation *tr) 428 struct kvm_translation *tr)
429 { 429 {
430 return -EINVAL; /* not implemented yet */ 430 return -EINVAL; /* not implemented yet */
431 } 431 }
432 432
433 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, 433 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
434 struct kvm_guest_debug *dbg) 434 struct kvm_guest_debug *dbg)
435 { 435 {
436 return -EINVAL; /* not implemented yet */ 436 return -EINVAL; /* not implemented yet */
437 } 437 }
438 438
439 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, 439 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
440 struct kvm_mp_state *mp_state) 440 struct kvm_mp_state *mp_state)
441 { 441 {
442 return -EINVAL; /* not implemented yet */ 442 return -EINVAL; /* not implemented yet */
443 } 443 }
444 444
445 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, 445 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
446 struct kvm_mp_state *mp_state) 446 struct kvm_mp_state *mp_state)
447 { 447 {
448 return -EINVAL; /* not implemented yet */ 448 return -EINVAL; /* not implemented yet */
449 } 449 }
450 450
451 static void __vcpu_run(struct kvm_vcpu *vcpu) 451 static void __vcpu_run(struct kvm_vcpu *vcpu)
452 { 452 {
453 memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16); 453 memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
454 454
455 if (need_resched()) 455 if (need_resched())
456 schedule(); 456 schedule();
457 457
458 if (test_thread_flag(TIF_MCCK_PENDING)) 458 if (test_thread_flag(TIF_MCCK_PENDING))
459 s390_handle_mcck(); 459 s390_handle_mcck();
460 460
461 kvm_s390_deliver_pending_interrupts(vcpu); 461 kvm_s390_deliver_pending_interrupts(vcpu);
462 462
463 vcpu->arch.sie_block->icptcode = 0; 463 vcpu->arch.sie_block->icptcode = 0;
464 local_irq_disable(); 464 local_irq_disable();
465 kvm_guest_enter(); 465 kvm_guest_enter();
466 local_irq_enable(); 466 local_irq_enable();
467 VCPU_EVENT(vcpu, 6, "entering sie flags %x", 467 VCPU_EVENT(vcpu, 6, "entering sie flags %x",
468 atomic_read(&vcpu->arch.sie_block->cpuflags)); 468 atomic_read(&vcpu->arch.sie_block->cpuflags));
469 if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) { 469 if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) {
470 VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction"); 470 VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
471 kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); 471 kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
472 } 472 }
473 VCPU_EVENT(vcpu, 6, "exit sie icptcode %d", 473 VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
474 vcpu->arch.sie_block->icptcode); 474 vcpu->arch.sie_block->icptcode);
475 local_irq_disable(); 475 local_irq_disable();
476 kvm_guest_exit(); 476 kvm_guest_exit();
477 local_irq_enable(); 477 local_irq_enable();
478 478
479 memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16); 479 memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16);
480 } 480 }
481 481
482 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 482 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
483 { 483 {
484 int rc; 484 int rc;
485 sigset_t sigsaved; 485 sigset_t sigsaved;
486 486
487 rerun_vcpu: 487 rerun_vcpu:
488 if (vcpu->requests) 488 if (vcpu->requests)
489 if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) 489 if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests))
490 kvm_s390_vcpu_set_mem(vcpu); 490 kvm_s390_vcpu_set_mem(vcpu);
491 491
492 /* verify that memory has been registered */ 492 /* verify that memory has been registered */
493 if (!vcpu->arch.sie_block->gmslm) { 493 if (!vcpu->arch.sie_block->gmslm) {
494 vcpu_put(vcpu); 494 vcpu_put(vcpu);
495 VCPU_EVENT(vcpu, 3, "%s", "no memory registered to run vcpu"); 495 VCPU_EVENT(vcpu, 3, "%s", "no memory registered to run vcpu");
496 return -EINVAL; 496 return -EINVAL;
497 } 497 }
498 498
499 if (vcpu->sigset_active) 499 if (vcpu->sigset_active)
500 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); 500 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
501 501
502 atomic_set_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags); 502 atomic_set_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
503 503
504 BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL); 504 BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL);
505 505
506 switch (kvm_run->exit_reason) { 506 switch (kvm_run->exit_reason) {
507 case KVM_EXIT_S390_SIEIC: 507 case KVM_EXIT_S390_SIEIC:
508 case KVM_EXIT_UNKNOWN: 508 case KVM_EXIT_UNKNOWN:
509 case KVM_EXIT_INTR: 509 case KVM_EXIT_INTR:
510 case KVM_EXIT_S390_RESET: 510 case KVM_EXIT_S390_RESET:
511 break; 511 break;
512 default: 512 default:
513 BUG(); 513 BUG();
514 } 514 }
515 515
516 vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask; 516 vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
517 vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr; 517 vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;
518 518
519 might_fault(); 519 might_fault();
520 520
521 do { 521 do {
522 __vcpu_run(vcpu); 522 __vcpu_run(vcpu);
523 rc = kvm_handle_sie_intercept(vcpu); 523 rc = kvm_handle_sie_intercept(vcpu);
524 } while (!signal_pending(current) && !rc); 524 } while (!signal_pending(current) && !rc);
525 525
526 if (rc == SIE_INTERCEPT_RERUNVCPU) 526 if (rc == SIE_INTERCEPT_RERUNVCPU)
527 goto rerun_vcpu; 527 goto rerun_vcpu;
528 528
529 if (signal_pending(current) && !rc) { 529 if (signal_pending(current) && !rc) {
530 kvm_run->exit_reason = KVM_EXIT_INTR; 530 kvm_run->exit_reason = KVM_EXIT_INTR;
531 rc = -EINTR; 531 rc = -EINTR;
532 } 532 }
533 533
534 if (rc == -EOPNOTSUPP) { 534 if (rc == -EOPNOTSUPP) {
535 /* intercept cannot be handled in-kernel, prepare kvm-run */ 535 /* intercept cannot be handled in-kernel, prepare kvm-run */
536 kvm_run->exit_reason = KVM_EXIT_S390_SIEIC; 536 kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
537 kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode; 537 kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode;
538 kvm_run->s390_sieic.ipa = vcpu->arch.sie_block->ipa; 538 kvm_run->s390_sieic.ipa = vcpu->arch.sie_block->ipa;
539 kvm_run->s390_sieic.ipb = vcpu->arch.sie_block->ipb; 539 kvm_run->s390_sieic.ipb = vcpu->arch.sie_block->ipb;
540 rc = 0; 540 rc = 0;
541 } 541 }
542 542
543 if (rc == -EREMOTE) { 543 if (rc == -EREMOTE) {
544 /* intercept was handled, but userspace support is needed 544 /* intercept was handled, but userspace support is needed
545 * kvm_run has been prepared by the handler */ 545 * kvm_run has been prepared by the handler */
546 rc = 0; 546 rc = 0;
547 } 547 }
548 548
549 kvm_run->psw_mask = vcpu->arch.sie_block->gpsw.mask; 549 kvm_run->psw_mask = vcpu->arch.sie_block->gpsw.mask;
550 kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr; 550 kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr;
551 551
552 if (vcpu->sigset_active) 552 if (vcpu->sigset_active)
553 sigprocmask(SIG_SETMASK, &sigsaved, NULL); 553 sigprocmask(SIG_SETMASK, &sigsaved, NULL);
554 554
555 vcpu->stat.exit_userspace++; 555 vcpu->stat.exit_userspace++;
556 return rc; 556 return rc;
557 } 557 }
558 558
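kvm_arch_vcpu_ioctl_run() above is the s390 backend of the generic KVM_RUN ioctl: it loads the PSW from the shared kvm_run area, loops over __vcpu_run() and kvm_handle_sie_intercept() until an intercept needs userspace or a signal arrives, and then copies the resulting PSW and SIEIC data back. For orientation, a minimal userspace sketch of the other side of that contract could look like the code below; KVM_GET_VCPU_MMAP_SIZE, KVM_RUN and the kvm_run fields are the regular KVM interface, while run_vcpu() itself, the printf and the omitted error handling are purely illustrative.

#include <errno.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Drive one vcpu until the kernel hands an intercept back to userspace.
 * vcpu_fd comes from KVM_CREATE_VCPU, kvm_fd from open("/dev/kvm"). */
static int run_vcpu(int kvm_fd, int vcpu_fd)
{
        long mmap_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
        struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpu_fd, 0);

        for (;;) {
                if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
                        if (errno == EINTR)
                                continue;       /* the -EINTR path above */
                        return -1;
                }
                if (run->exit_reason == KVM_EXIT_S390_SIEIC) {
                        /* intercept the kernel could not handle; icptcode,
                         * ipa and ipb were filled in by the handler above */
                        printf("sie intercept %d\n",
                               (int) run->s390_sieic.icptcode);
                        return 0;
                }
                return 0;       /* other exit reasons: let the caller decide */
        }
}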
559 static int __guestcopy(struct kvm_vcpu *vcpu, u64 guestdest, const void *from, 559 static int __guestcopy(struct kvm_vcpu *vcpu, u64 guestdest, const void *from,
560 unsigned long n, int prefix) 560 unsigned long n, int prefix)
561 { 561 {
562 if (prefix) 562 if (prefix)
563 return copy_to_guest(vcpu, guestdest, from, n); 563 return copy_to_guest(vcpu, guestdest, from, n);
564 else 564 else
565 return copy_to_guest_absolute(vcpu, guestdest, from, n); 565 return copy_to_guest_absolute(vcpu, guestdest, from, n);
566 } 566 }
567 567
568 /* 568 /*
569 * store status at address 569 * store status at address
570 * we have two special cases: 570 * we have two special cases:
571 * KVM_S390_STORE_STATUS_NOADDR: -> 0x1200 on 64 bit 571 * KVM_S390_STORE_STATUS_NOADDR: -> 0x1200 on 64 bit
572 * KVM_S390_STORE_STATUS_PREFIXED: -> prefix 572 * KVM_S390_STORE_STATUS_PREFIXED: -> prefix
573 */ 573 */
574 int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) 574 int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr)
575 { 575 {
576 const unsigned char archmode = 1; 576 const unsigned char archmode = 1;
577 int prefix; 577 int prefix;
578 578
579 if (addr == KVM_S390_STORE_STATUS_NOADDR) { 579 if (addr == KVM_S390_STORE_STATUS_NOADDR) {
580 if (copy_to_guest_absolute(vcpu, 163ul, &archmode, 1)) 580 if (copy_to_guest_absolute(vcpu, 163ul, &archmode, 1))
581 return -EFAULT; 581 return -EFAULT;
582 addr = SAVE_AREA_BASE; 582 addr = SAVE_AREA_BASE;
583 prefix = 0; 583 prefix = 0;
584 } else if (addr == KVM_S390_STORE_STATUS_PREFIXED) { 584 } else if (addr == KVM_S390_STORE_STATUS_PREFIXED) {
585 if (copy_to_guest(vcpu, 163ul, &archmode, 1)) 585 if (copy_to_guest(vcpu, 163ul, &archmode, 1))
586 return -EFAULT; 586 return -EFAULT;
587 addr = SAVE_AREA_BASE; 587 addr = SAVE_AREA_BASE;
588 prefix = 1; 588 prefix = 1;
589 } else 589 } else
590 prefix = 0; 590 prefix = 0;
591 591
592 if (__guestcopy(vcpu, addr + offsetof(struct save_area, fp_regs), 592 if (__guestcopy(vcpu, addr + offsetof(struct save_area, fp_regs),
593 vcpu->arch.guest_fpregs.fprs, 128, prefix)) 593 vcpu->arch.guest_fpregs.fprs, 128, prefix))
594 return -EFAULT; 594 return -EFAULT;
595 595
596 if (__guestcopy(vcpu, addr + offsetof(struct save_area, gp_regs), 596 if (__guestcopy(vcpu, addr + offsetof(struct save_area, gp_regs),
597 vcpu->arch.guest_gprs, 128, prefix)) 597 vcpu->arch.guest_gprs, 128, prefix))
598 return -EFAULT; 598 return -EFAULT;
599 599
600 if (__guestcopy(vcpu, addr + offsetof(struct save_area, psw), 600 if (__guestcopy(vcpu, addr + offsetof(struct save_area, psw),
601 &vcpu->arch.sie_block->gpsw, 16, prefix)) 601 &vcpu->arch.sie_block->gpsw, 16, prefix))
602 return -EFAULT; 602 return -EFAULT;
603 603
604 if (__guestcopy(vcpu, addr + offsetof(struct save_area, pref_reg), 604 if (__guestcopy(vcpu, addr + offsetof(struct save_area, pref_reg),
605 &vcpu->arch.sie_block->prefix, 4, prefix)) 605 &vcpu->arch.sie_block->prefix, 4, prefix))
606 return -EFAULT; 606 return -EFAULT;
607 607
608 if (__guestcopy(vcpu, 608 if (__guestcopy(vcpu,
609 addr + offsetof(struct save_area, fp_ctrl_reg), 609 addr + offsetof(struct save_area, fp_ctrl_reg),
610 &vcpu->arch.guest_fpregs.fpc, 4, prefix)) 610 &vcpu->arch.guest_fpregs.fpc, 4, prefix))
611 return -EFAULT; 611 return -EFAULT;
612 612
613 if (__guestcopy(vcpu, addr + offsetof(struct save_area, tod_reg), 613 if (__guestcopy(vcpu, addr + offsetof(struct save_area, tod_reg),
614 &vcpu->arch.sie_block->todpr, 4, prefix)) 614 &vcpu->arch.sie_block->todpr, 4, prefix))
615 return -EFAULT; 615 return -EFAULT;
616 616
617 if (__guestcopy(vcpu, addr + offsetof(struct save_area, timer), 617 if (__guestcopy(vcpu, addr + offsetof(struct save_area, timer),
618 &vcpu->arch.sie_block->cputm, 8, prefix)) 618 &vcpu->arch.sie_block->cputm, 8, prefix))
619 return -EFAULT; 619 return -EFAULT;
620 620
621 if (__guestcopy(vcpu, addr + offsetof(struct save_area, clk_cmp), 621 if (__guestcopy(vcpu, addr + offsetof(struct save_area, clk_cmp),
622 &vcpu->arch.sie_block->ckc, 8, prefix)) 622 &vcpu->arch.sie_block->ckc, 8, prefix))
623 return -EFAULT; 623 return -EFAULT;
624 624
625 if (__guestcopy(vcpu, addr + offsetof(struct save_area, acc_regs), 625 if (__guestcopy(vcpu, addr + offsetof(struct save_area, acc_regs),
626 &vcpu->arch.guest_acrs, 64, prefix)) 626 &vcpu->arch.guest_acrs, 64, prefix))
627 return -EFAULT; 627 return -EFAULT;
628 628
629 if (__guestcopy(vcpu, 629 if (__guestcopy(vcpu,
630 addr + offsetof(struct save_area, ctrl_regs), 630 addr + offsetof(struct save_area, ctrl_regs),
631 &vcpu->arch.sie_block->gcr, 128, prefix)) 631 &vcpu->arch.sie_block->gcr, 128, prefix))
632 return -EFAULT; 632 return -EFAULT;
633 return 0; 633 return 0;
634 } 634 }
635 635
636 long kvm_arch_vcpu_ioctl(struct file *filp, 636 long kvm_arch_vcpu_ioctl(struct file *filp,
637 unsigned int ioctl, unsigned long arg) 637 unsigned int ioctl, unsigned long arg)
638 { 638 {
639 struct kvm_vcpu *vcpu = filp->private_data; 639 struct kvm_vcpu *vcpu = filp->private_data;
640 void __user *argp = (void __user *)arg; 640 void __user *argp = (void __user *)arg;
641 long r; 641 long r;
642 642
643 switch (ioctl) { 643 switch (ioctl) {
644 case KVM_S390_INTERRUPT: { 644 case KVM_S390_INTERRUPT: {
645 struct kvm_s390_interrupt s390int; 645 struct kvm_s390_interrupt s390int;
646 646
647 r = -EFAULT; 647 r = -EFAULT;
648 if (copy_from_user(&s390int, argp, sizeof(s390int))) 648 if (copy_from_user(&s390int, argp, sizeof(s390int)))
649 break; 649 break;
650 r = kvm_s390_inject_vcpu(vcpu, &s390int); 650 r = kvm_s390_inject_vcpu(vcpu, &s390int);
651 break; 651 break;
652 } 652 }
653 case KVM_S390_STORE_STATUS: 653 case KVM_S390_STORE_STATUS:
654 r = kvm_s390_vcpu_store_status(vcpu, arg); 654 r = kvm_s390_vcpu_store_status(vcpu, arg);
655 break; 655 break;
656 case KVM_S390_SET_INITIAL_PSW: { 656 case KVM_S390_SET_INITIAL_PSW: {
657 psw_t psw; 657 psw_t psw;
658 658
659 r = -EFAULT; 659 r = -EFAULT;
660 if (copy_from_user(&psw, argp, sizeof(psw))) 660 if (copy_from_user(&psw, argp, sizeof(psw)))
661 break; 661 break;
662 r = kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw); 662 r = kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw);
663 break; 663 break;
664 } 664 }
665 case KVM_S390_INITIAL_RESET: 665 case KVM_S390_INITIAL_RESET:
666 r = kvm_arch_vcpu_ioctl_initial_reset(vcpu); 666 r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
667 break; 667 break;
668 default: 668 default:
669 r = -EINVAL; 669 r = -EINVAL;
670 } 670 }
671 return r; 671 return r;
672 } 672 }
673 673
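The vcpu ioctl switch above passes the KVM_S390_STORE_STATUS argument straight through to kvm_s390_vcpu_store_status(), so the two special address values documented before that function are exactly what userspace hands to the ioctl. A hedged sketch follows; the helper name is invented, while the ioctl and the NOADDR constant are the interface used by the code above.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Dump the vcpu state into the absolute save area at SAVE_AREA_BASE,
 * i.e. the KVM_S390_STORE_STATUS_NOADDR case handled above. */
static int store_status_noaddr(int vcpu_fd)
{
        return ioctl(vcpu_fd, KVM_S390_STORE_STATUS,
                     KVM_S390_STORE_STATUS_NOADDR);
}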
674 /* Section: memory related */ 674 /* Section: memory related */
675 int kvm_arch_prepare_memory_region(struct kvm *kvm, 675 int kvm_arch_prepare_memory_region(struct kvm *kvm,
676 struct kvm_memory_slot *memslot, 676 struct kvm_memory_slot *memslot,
677 struct kvm_memory_slot old, 677 struct kvm_memory_slot old,
678 struct kvm_userspace_memory_region *mem, 678 struct kvm_userspace_memory_region *mem,
679 int user_alloc) 679 int user_alloc)
680 { 680 {
681 /* A few sanity checks. We allow exactly one memory slot, which must 681 /* A few sanity checks. We allow exactly one memory slot, which must
682 start at guest address zero, begin at a page boundary in userland 682 start at guest address zero, begin at a page boundary in userland
683 and end at a page boundary. The memory in userland may be 683 and end at a page boundary. The memory in userland may be
684 fragmented into various different vmas. It is fine to mmap() and 684 fragmented into various different vmas. It is fine to mmap() and
685 munmap() within this slot at any time after this call has been 685 munmap() within this slot at any time after this call has been
686 made. */ 686 made. */
687 687
688 if (mem->slot) 688 if (mem->slot)
689 return -EINVAL; 689 return -EINVAL;
690 690
691 if (mem->guest_phys_addr) 691 if (mem->guest_phys_addr)
692 return -EINVAL; 692 return -EINVAL;
693 693
694 if (mem->userspace_addr & (PAGE_SIZE - 1)) 694 if (mem->userspace_addr & (PAGE_SIZE - 1))
695 return -EINVAL; 695 return -EINVAL;
696 696
697 if (mem->memory_size & (PAGE_SIZE - 1)) 697 if (mem->memory_size & (PAGE_SIZE - 1))
698 return -EINVAL; 698 return -EINVAL;
699 699
700 if (!user_alloc) 700 if (!user_alloc)
701 return -EINVAL; 701 return -EINVAL;
702 702
703 return 0; 703 return 0;
704 } 704 }
705 705
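The checks above spell out the whole s390 memory model at this point: a single slot 0, guest origin 0, a page-aligned userspace address and size, and user-allocated memory. A sketch of a userspace call that satisfies them is shown below; KVM_SET_USER_MEMORY_REGION and the structure fields are the real interface, while the helper, the 256 MiB size and the anonymous mmap() are assumptions made for illustration.

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Register 256 MiB of anonymous memory as the one and only slot. */
static int set_guest_memory(int vm_fd)
{
        unsigned long size = 256UL << 20;
        void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct kvm_userspace_memory_region region = {
                .slot            = 0,                   /* mem->slot == 0 */
                .guest_phys_addr = 0,                   /* guest origin 0 */
                .userspace_addr  = (unsigned long) mem, /* page aligned */
                .memory_size     = size,                /* page aligned */
        };

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}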
706 void kvm_arch_commit_memory_region(struct kvm *kvm, 706 void kvm_arch_commit_memory_region(struct kvm *kvm,
707 struct kvm_userspace_memory_region *mem, 707 struct kvm_userspace_memory_region *mem,
708 struct kvm_memory_slot old, 708 struct kvm_memory_slot old,
709 int user_alloc) 709 int user_alloc)
710 { 710 {
711 int i; 711 int i;
712 struct kvm_vcpu *vcpu; 712 struct kvm_vcpu *vcpu;
713 713
714 /* request update of sie control block for all available vcpus */ 714 /* request update of sie control block for all available vcpus */
715 kvm_for_each_vcpu(i, vcpu, kvm) { 715 kvm_for_each_vcpu(i, vcpu, kvm) {
716 if (test_and_set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) 716 if (test_and_set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests))
717 continue; 717 continue;
718 kvm_s390_inject_sigp_stop(vcpu, ACTION_RELOADVCPU_ON_STOP); 718 kvm_s390_inject_sigp_stop(vcpu, ACTION_RELOADVCPU_ON_STOP);
719 } 719 }
720 } 720 }
721 721
722 void kvm_arch_flush_shadow(struct kvm *kvm) 722 void kvm_arch_flush_shadow(struct kvm *kvm)
723 { 723 {
724 } 724 }
725 725
726 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
727 {
728 return gfn;
729 }
730
731 static int __init kvm_s390_init(void) 726 static int __init kvm_s390_init(void)
732 { 727 {
733 int ret; 728 int ret;
734 ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); 729 ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
735 if (ret) 730 if (ret)
736 return ret; 731 return ret;
737 732
738 /* 733 /*
739 * guests can ask for up to 255+1 double words, so we need a full page 734 * guests can ask for up to 255+1 double words, so we need a full page
740 * to hold the maximum amount of facilities. On the other hand, we 735 * to hold the maximum amount of facilities. On the other hand, we
741 * only set facilities that are known to work in KVM. 736 * only set facilities that are known to work in KVM.
742 */ 737 */
743 facilities = (unsigned long long *) get_zeroed_page(GFP_KERNEL|GFP_DMA); 738 facilities = (unsigned long long *) get_zeroed_page(GFP_KERNEL|GFP_DMA);
744 if (!facilities) { 739 if (!facilities) {
745 kvm_exit(); 740 kvm_exit();
746 return -ENOMEM; 741 return -ENOMEM;
747 } 742 }
748 stfle(facilities, 1); 743 stfle(facilities, 1);
749 facilities[0] &= 0xff00fff3f0700000ULL; 744 facilities[0] &= 0xff00fff3f0700000ULL;
750 return 0; 745 return 0;
751 } 746 }
752 747
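As a quick check of the sizing in the comment inside kvm_s390_init(): 255+1 doublewords is 256 * 8 = 2048 bytes, so the single zeroed 4 KiB page returned by get_zeroed_page() holds the largest facility list a guest can ask for, with room to spare.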
753 static void __exit kvm_s390_exit(void) 748 static void __exit kvm_s390_exit(void)
754 { 749 {
755 free_page((unsigned long) facilities); 750 free_page((unsigned long) facilities);
756 kvm_exit(); 751 kvm_exit();
757 } 752 }
758 753
759 module_init(kvm_s390_init); 754 module_init(kvm_s390_init);
760 module_exit(kvm_s390_exit); 755 module_exit(kvm_s390_exit);
761 756
arch/x86/include/asm/kvm_host.h
1 /* 1 /*
2 * Kernel-based Virtual Machine driver for Linux 2 * Kernel-based Virtual Machine driver for Linux
3 * 3 *
4 * This header defines architecture specific interfaces, x86 version 4 * This header defines architecture specific interfaces, x86 version
5 * 5 *
6 * This work is licensed under the terms of the GNU GPL, version 2. See 6 * This work is licensed under the terms of the GNU GPL, version 2. See
7 * the COPYING file in the top-level directory. 7 * the COPYING file in the top-level directory.
8 * 8 *
9 */ 9 */
10 10
11 #ifndef _ASM_X86_KVM_HOST_H 11 #ifndef _ASM_X86_KVM_HOST_H
12 #define _ASM_X86_KVM_HOST_H 12 #define _ASM_X86_KVM_HOST_H
13 13
14 #include <linux/types.h> 14 #include <linux/types.h>
15 #include <linux/mm.h> 15 #include <linux/mm.h>
16 #include <linux/mmu_notifier.h> 16 #include <linux/mmu_notifier.h>
17 #include <linux/tracepoint.h> 17 #include <linux/tracepoint.h>
18 18
19 #include <linux/kvm.h> 19 #include <linux/kvm.h>
20 #include <linux/kvm_para.h> 20 #include <linux/kvm_para.h>
21 #include <linux/kvm_types.h> 21 #include <linux/kvm_types.h>
22 22
23 #include <asm/pvclock-abi.h> 23 #include <asm/pvclock-abi.h>
24 #include <asm/desc.h> 24 #include <asm/desc.h>
25 #include <asm/mtrr.h> 25 #include <asm/mtrr.h>
26 #include <asm/msr-index.h> 26 #include <asm/msr-index.h>
27 27
28 #define KVM_MAX_VCPUS 64 28 #define KVM_MAX_VCPUS 64
29 #define KVM_MEMORY_SLOTS 32 29 #define KVM_MEMORY_SLOTS 32
30 /* memory slots that are not exposed to userspace */ 30 /* memory slots that are not exposed to userspace */
31 #define KVM_PRIVATE_MEM_SLOTS 4 31 #define KVM_PRIVATE_MEM_SLOTS 4
32 32
33 #define KVM_PIO_PAGE_OFFSET 1 33 #define KVM_PIO_PAGE_OFFSET 1
34 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2 34 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2
35 35
36 #define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1) 36 #define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1)
37 #define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD)) 37 #define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD))
38 #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS | \ 38 #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS | \
39 0xFFFFFF0000000000ULL) 39 0xFFFFFF0000000000ULL)
40 40
41 #define INVALID_PAGE (~(hpa_t)0) 41 #define INVALID_PAGE (~(hpa_t)0)
42 #define UNMAPPED_GVA (~(gpa_t)0) 42 #define UNMAPPED_GVA (~(gpa_t)0)
43 43
44 /* KVM Hugepage definitions for x86 */ 44 /* KVM Hugepage definitions for x86 */
45 #define KVM_NR_PAGE_SIZES 3 45 #define KVM_NR_PAGE_SIZES 3
46 #define KVM_HPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9)) 46 #define KVM_HPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9))
47 #define KVM_HPAGE_SIZE(x) (1UL << KVM_HPAGE_SHIFT(x)) 47 #define KVM_HPAGE_SIZE(x) (1UL << KVM_HPAGE_SHIFT(x))
48 #define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1)) 48 #define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))
49 #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE) 49 #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
50 50
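Plugging the usual x86 value PAGE_SHIFT == 12 into the hugepage macros above gives the familiar sizes (a worked check, not new definitions):

    KVM_HPAGE_SHIFT(1) == 12  ->  KVM_HPAGE_SIZE(1) == 4 KiB,  KVM_PAGES_PER_HPAGE(1) == 1
    KVM_HPAGE_SHIFT(2) == 21  ->  KVM_HPAGE_SIZE(2) == 2 MiB,  KVM_PAGES_PER_HPAGE(2) == 512
    KVM_HPAGE_SHIFT(3) == 30  ->  KVM_HPAGE_SIZE(3) == 1 GiB,  KVM_PAGES_PER_HPAGE(3) == 262144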
51 #define DE_VECTOR 0 51 #define DE_VECTOR 0
52 #define DB_VECTOR 1 52 #define DB_VECTOR 1
53 #define BP_VECTOR 3 53 #define BP_VECTOR 3
54 #define OF_VECTOR 4 54 #define OF_VECTOR 4
55 #define BR_VECTOR 5 55 #define BR_VECTOR 5
56 #define UD_VECTOR 6 56 #define UD_VECTOR 6
57 #define NM_VECTOR 7 57 #define NM_VECTOR 7
58 #define DF_VECTOR 8 58 #define DF_VECTOR 8
59 #define TS_VECTOR 10 59 #define TS_VECTOR 10
60 #define NP_VECTOR 11 60 #define NP_VECTOR 11
61 #define SS_VECTOR 12 61 #define SS_VECTOR 12
62 #define GP_VECTOR 13 62 #define GP_VECTOR 13
63 #define PF_VECTOR 14 63 #define PF_VECTOR 14
64 #define MF_VECTOR 16 64 #define MF_VECTOR 16
65 #define MC_VECTOR 18 65 #define MC_VECTOR 18
66 66
67 #define SELECTOR_TI_MASK (1 << 2) 67 #define SELECTOR_TI_MASK (1 << 2)
68 #define SELECTOR_RPL_MASK 0x03 68 #define SELECTOR_RPL_MASK 0x03
69 69
70 #define IOPL_SHIFT 12 70 #define IOPL_SHIFT 12
71 71
72 #define KVM_ALIAS_SLOTS 4
73
74 #define KVM_PERMILLE_MMU_PAGES 20 72 #define KVM_PERMILLE_MMU_PAGES 20
75 #define KVM_MIN_ALLOC_MMU_PAGES 64 73 #define KVM_MIN_ALLOC_MMU_PAGES 64
76 #define KVM_MMU_HASH_SHIFT 10 74 #define KVM_MMU_HASH_SHIFT 10
77 #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT) 75 #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
78 #define KVM_MIN_FREE_MMU_PAGES 5 76 #define KVM_MIN_FREE_MMU_PAGES 5
79 #define KVM_REFILL_PAGES 25 77 #define KVM_REFILL_PAGES 25
80 #define KVM_MAX_CPUID_ENTRIES 40 78 #define KVM_MAX_CPUID_ENTRIES 40
81 #define KVM_NR_FIXED_MTRR_REGION 88 79 #define KVM_NR_FIXED_MTRR_REGION 88
82 #define KVM_NR_VAR_MTRR 8 80 #define KVM_NR_VAR_MTRR 8
83 81
84 extern spinlock_t kvm_lock; 82 extern spinlock_t kvm_lock;
85 extern struct list_head vm_list; 83 extern struct list_head vm_list;
86 84
87 struct kvm_vcpu; 85 struct kvm_vcpu;
88 struct kvm; 86 struct kvm;
89 87
90 enum kvm_reg { 88 enum kvm_reg {
91 VCPU_REGS_RAX = 0, 89 VCPU_REGS_RAX = 0,
92 VCPU_REGS_RCX = 1, 90 VCPU_REGS_RCX = 1,
93 VCPU_REGS_RDX = 2, 91 VCPU_REGS_RDX = 2,
94 VCPU_REGS_RBX = 3, 92 VCPU_REGS_RBX = 3,
95 VCPU_REGS_RSP = 4, 93 VCPU_REGS_RSP = 4,
96 VCPU_REGS_RBP = 5, 94 VCPU_REGS_RBP = 5,
97 VCPU_REGS_RSI = 6, 95 VCPU_REGS_RSI = 6,
98 VCPU_REGS_RDI = 7, 96 VCPU_REGS_RDI = 7,
99 #ifdef CONFIG_X86_64 97 #ifdef CONFIG_X86_64
100 VCPU_REGS_R8 = 8, 98 VCPU_REGS_R8 = 8,
101 VCPU_REGS_R9 = 9, 99 VCPU_REGS_R9 = 9,
102 VCPU_REGS_R10 = 10, 100 VCPU_REGS_R10 = 10,
103 VCPU_REGS_R11 = 11, 101 VCPU_REGS_R11 = 11,
104 VCPU_REGS_R12 = 12, 102 VCPU_REGS_R12 = 12,
105 VCPU_REGS_R13 = 13, 103 VCPU_REGS_R13 = 13,
106 VCPU_REGS_R14 = 14, 104 VCPU_REGS_R14 = 14,
107 VCPU_REGS_R15 = 15, 105 VCPU_REGS_R15 = 15,
108 #endif 106 #endif
109 VCPU_REGS_RIP, 107 VCPU_REGS_RIP,
110 NR_VCPU_REGS 108 NR_VCPU_REGS
111 }; 109 };
112 110
113 enum kvm_reg_ex { 111 enum kvm_reg_ex {
114 VCPU_EXREG_PDPTR = NR_VCPU_REGS, 112 VCPU_EXREG_PDPTR = NR_VCPU_REGS,
115 }; 113 };
116 114
117 enum { 115 enum {
118 VCPU_SREG_ES, 116 VCPU_SREG_ES,
119 VCPU_SREG_CS, 117 VCPU_SREG_CS,
120 VCPU_SREG_SS, 118 VCPU_SREG_SS,
121 VCPU_SREG_DS, 119 VCPU_SREG_DS,
122 VCPU_SREG_FS, 120 VCPU_SREG_FS,
123 VCPU_SREG_GS, 121 VCPU_SREG_GS,
124 VCPU_SREG_TR, 122 VCPU_SREG_TR,
125 VCPU_SREG_LDTR, 123 VCPU_SREG_LDTR,
126 }; 124 };
127 125
128 #include <asm/kvm_emulate.h> 126 #include <asm/kvm_emulate.h>
129 127
130 #define KVM_NR_MEM_OBJS 40 128 #define KVM_NR_MEM_OBJS 40
131 129
132 #define KVM_NR_DB_REGS 4 130 #define KVM_NR_DB_REGS 4
133 131
134 #define DR6_BD (1 << 13) 132 #define DR6_BD (1 << 13)
135 #define DR6_BS (1 << 14) 133 #define DR6_BS (1 << 14)
136 #define DR6_FIXED_1 0xffff0ff0 134 #define DR6_FIXED_1 0xffff0ff0
137 #define DR6_VOLATILE 0x0000e00f 135 #define DR6_VOLATILE 0x0000e00f
138 136
139 #define DR7_BP_EN_MASK 0x000000ff 137 #define DR7_BP_EN_MASK 0x000000ff
140 #define DR7_GE (1 << 9) 138 #define DR7_GE (1 << 9)
141 #define DR7_GD (1 << 13) 139 #define DR7_GD (1 << 13)
142 #define DR7_FIXED_1 0x00000400 140 #define DR7_FIXED_1 0x00000400
143 #define DR7_VOLATILE 0xffff23ff 141 #define DR7_VOLATILE 0xffff23ff
144 142
145 /* 143 /*
146 * We don't want allocation failures within the mmu code, so we preallocate 144 * We don't want allocation failures within the mmu code, so we preallocate
147 * enough memory for a single page fault in a cache. 145 * enough memory for a single page fault in a cache.
148 */ 146 */
149 struct kvm_mmu_memory_cache { 147 struct kvm_mmu_memory_cache {
150 int nobjs; 148 int nobjs;
151 void *objects[KVM_NR_MEM_OBJS]; 149 void *objects[KVM_NR_MEM_OBJS];
152 }; 150 };
153 151
154 #define NR_PTE_CHAIN_ENTRIES 5 152 #define NR_PTE_CHAIN_ENTRIES 5
155 153
156 struct kvm_pte_chain { 154 struct kvm_pte_chain {
157 u64 *parent_ptes[NR_PTE_CHAIN_ENTRIES]; 155 u64 *parent_ptes[NR_PTE_CHAIN_ENTRIES];
158 struct hlist_node link; 156 struct hlist_node link;
159 }; 157 };
160 158
161 /* 159 /*
162 * kvm_mmu_page_role, below, is defined as: 160 * kvm_mmu_page_role, below, is defined as:
163 * 161 *
164 * bits 0:3 - total guest paging levels (2-4, or zero for real mode) 162 * bits 0:3 - total guest paging levels (2-4, or zero for real mode)
165 * bits 4:7 - page table level for this shadow (1-4) 163 * bits 4:7 - page table level for this shadow (1-4)
166 * bits 8:9 - page table quadrant for 2-level guests 164 * bits 8:9 - page table quadrant for 2-level guests
167 * bit 16 - direct mapping of virtual to physical mapping at gfn 165 * bit 16 - direct mapping of virtual to physical mapping at gfn
168 * used for real mode and two-dimensional paging 166 * used for real mode and two-dimensional paging
169 * bits 17:19 - common access permissions for all ptes in this shadow page 167 * bits 17:19 - common access permissions for all ptes in this shadow page
170 */ 168 */
171 union kvm_mmu_page_role { 169 union kvm_mmu_page_role {
172 unsigned word; 170 unsigned word;
173 struct { 171 struct {
174 unsigned level:4; 172 unsigned level:4;
175 unsigned cr4_pae:1; 173 unsigned cr4_pae:1;
176 unsigned quadrant:2; 174 unsigned quadrant:2;
177 unsigned pad_for_nice_hex_output:6; 175 unsigned pad_for_nice_hex_output:6;
178 unsigned direct:1; 176 unsigned direct:1;
179 unsigned access:3; 177 unsigned access:3;
180 unsigned invalid:1; 178 unsigned invalid:1;
181 unsigned nxe:1; 179 unsigned nxe:1;
182 unsigned cr0_wp:1; 180 unsigned cr0_wp:1;
183 }; 181 };
184 }; 182 };
185 183
186 struct kvm_mmu_page { 184 struct kvm_mmu_page {
187 struct list_head link; 185 struct list_head link;
188 struct hlist_node hash_link; 186 struct hlist_node hash_link;
189 187
190 /* 188 /*
191 * The following two entries are used to key the shadow page in the 189 * The following two entries are used to key the shadow page in the
192 * hash table. 190 * hash table.
193 */ 191 */
194 gfn_t gfn; 192 gfn_t gfn;
195 union kvm_mmu_page_role role; 193 union kvm_mmu_page_role role;
196 194
197 u64 *spt; 195 u64 *spt;
198 /* hold the gfn of each spte inside spt */ 196 /* hold the gfn of each spte inside spt */
199 gfn_t *gfns; 197 gfn_t *gfns;
200 /* 198 /*
201 * One bit set per slot which has memory 199 * One bit set per slot which has memory
202 * in this shadow page. 200 * in this shadow page.
203 */ 201 */
204 DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); 202 DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
205 bool multimapped; /* More than one parent_pte? */ 203 bool multimapped; /* More than one parent_pte? */
206 bool unsync; 204 bool unsync;
207 int root_count; /* Currently serving as active root */ 205 int root_count; /* Currently serving as active root */
208 unsigned int unsync_children; 206 unsigned int unsync_children;
209 union { 207 union {
210 u64 *parent_pte; /* !multimapped */ 208 u64 *parent_pte; /* !multimapped */
211 struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */ 209 struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
212 }; 210 };
213 DECLARE_BITMAP(unsync_child_bitmap, 512); 211 DECLARE_BITMAP(unsync_child_bitmap, 512);
214 }; 212 };
215 213
216 struct kvm_pv_mmu_op_buffer { 214 struct kvm_pv_mmu_op_buffer {
217 void *ptr; 215 void *ptr;
218 unsigned len; 216 unsigned len;
219 unsigned processed; 217 unsigned processed;
220 char buf[512] __aligned(sizeof(long)); 218 char buf[512] __aligned(sizeof(long));
221 }; 219 };
222 220
223 struct kvm_pio_request { 221 struct kvm_pio_request {
224 unsigned long count; 222 unsigned long count;
225 int in; 223 int in;
226 int port; 224 int port;
227 int size; 225 int size;
228 }; 226 };
229 227
230 /* 228 /*
231 * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level 229 * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
232 * 32-bit). The kvm_mmu structure abstracts the details of the current mmu 230 * 32-bit). The kvm_mmu structure abstracts the details of the current mmu
233 * mode. 231 * mode.
234 */ 232 */
235 struct kvm_mmu { 233 struct kvm_mmu {
236 void (*new_cr3)(struct kvm_vcpu *vcpu); 234 void (*new_cr3)(struct kvm_vcpu *vcpu);
237 int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err); 235 int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
238 void (*free)(struct kvm_vcpu *vcpu); 236 void (*free)(struct kvm_vcpu *vcpu);
239 gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access, 237 gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
240 u32 *error); 238 u32 *error);
241 void (*prefetch_page)(struct kvm_vcpu *vcpu, 239 void (*prefetch_page)(struct kvm_vcpu *vcpu,
242 struct kvm_mmu_page *page); 240 struct kvm_mmu_page *page);
243 int (*sync_page)(struct kvm_vcpu *vcpu, 241 int (*sync_page)(struct kvm_vcpu *vcpu,
244 struct kvm_mmu_page *sp, bool clear_unsync); 242 struct kvm_mmu_page *sp, bool clear_unsync);
245 void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva); 243 void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
246 hpa_t root_hpa; 244 hpa_t root_hpa;
247 int root_level; 245 int root_level;
248 int shadow_root_level; 246 int shadow_root_level;
249 union kvm_mmu_page_role base_role; 247 union kvm_mmu_page_role base_role;
250 248
251 u64 *pae_root; 249 u64 *pae_root;
252 u64 rsvd_bits_mask[2][4]; 250 u64 rsvd_bits_mask[2][4];
253 }; 251 };
254 252
255 struct kvm_vcpu_arch { 253 struct kvm_vcpu_arch {
256 u64 host_tsc; 254 u64 host_tsc;
257 /* 255 /*
258 * rip and regs accesses must go through 256 * rip and regs accesses must go through
259 * kvm_{register,rip}_{read,write} functions. 257 * kvm_{register,rip}_{read,write} functions.
260 */ 258 */
261 unsigned long regs[NR_VCPU_REGS]; 259 unsigned long regs[NR_VCPU_REGS];
262 u32 regs_avail; 260 u32 regs_avail;
263 u32 regs_dirty; 261 u32 regs_dirty;
264 262
265 unsigned long cr0; 263 unsigned long cr0;
266 unsigned long cr0_guest_owned_bits; 264 unsigned long cr0_guest_owned_bits;
267 unsigned long cr2; 265 unsigned long cr2;
268 unsigned long cr3; 266 unsigned long cr3;
269 unsigned long cr4; 267 unsigned long cr4;
270 unsigned long cr4_guest_owned_bits; 268 unsigned long cr4_guest_owned_bits;
271 unsigned long cr8; 269 unsigned long cr8;
272 u32 hflags; 270 u32 hflags;
273 u64 pdptrs[4]; /* pae */ 271 u64 pdptrs[4]; /* pae */
274 u64 efer; 272 u64 efer;
275 u64 apic_base; 273 u64 apic_base;
276 struct kvm_lapic *apic; /* kernel irqchip context */ 274 struct kvm_lapic *apic; /* kernel irqchip context */
277 int32_t apic_arb_prio; 275 int32_t apic_arb_prio;
278 int mp_state; 276 int mp_state;
279 int sipi_vector; 277 int sipi_vector;
280 u64 ia32_misc_enable_msr; 278 u64 ia32_misc_enable_msr;
281 bool tpr_access_reporting; 279 bool tpr_access_reporting;
282 280
283 struct kvm_mmu mmu; 281 struct kvm_mmu mmu;
284 /* only needed in kvm_pv_mmu_op() path, but it's hot so 282 /* only needed in kvm_pv_mmu_op() path, but it's hot so
285 * put it here to avoid allocation */ 283 * put it here to avoid allocation */
286 struct kvm_pv_mmu_op_buffer mmu_op_buffer; 284 struct kvm_pv_mmu_op_buffer mmu_op_buffer;
287 285
288 struct kvm_mmu_memory_cache mmu_pte_chain_cache; 286 struct kvm_mmu_memory_cache mmu_pte_chain_cache;
289 struct kvm_mmu_memory_cache mmu_rmap_desc_cache; 287 struct kvm_mmu_memory_cache mmu_rmap_desc_cache;
290 struct kvm_mmu_memory_cache mmu_page_cache; 288 struct kvm_mmu_memory_cache mmu_page_cache;
291 struct kvm_mmu_memory_cache mmu_page_header_cache; 289 struct kvm_mmu_memory_cache mmu_page_header_cache;
292 290
293 gfn_t last_pt_write_gfn; 291 gfn_t last_pt_write_gfn;
294 int last_pt_write_count; 292 int last_pt_write_count;
295 u64 *last_pte_updated; 293 u64 *last_pte_updated;
296 gfn_t last_pte_gfn; 294 gfn_t last_pte_gfn;
297 295
298 struct { 296 struct {
299 gfn_t gfn; /* presumed gfn during guest pte update */ 297 gfn_t gfn; /* presumed gfn during guest pte update */
300 pfn_t pfn; /* pfn corresponding to that gfn */ 298 pfn_t pfn; /* pfn corresponding to that gfn */
301 unsigned long mmu_seq; 299 unsigned long mmu_seq;
302 } update_pte; 300 } update_pte;
303 301
304 struct fpu guest_fpu; 302 struct fpu guest_fpu;
305 u64 xcr0; 303 u64 xcr0;
306 304
307 gva_t mmio_fault_cr2; 305 gva_t mmio_fault_cr2;
308 struct kvm_pio_request pio; 306 struct kvm_pio_request pio;
309 void *pio_data; 307 void *pio_data;
310 308
311 u8 event_exit_inst_len; 309 u8 event_exit_inst_len;
312 310
313 struct kvm_queued_exception { 311 struct kvm_queued_exception {
314 bool pending; 312 bool pending;
315 bool has_error_code; 313 bool has_error_code;
316 bool reinject; 314 bool reinject;
317 u8 nr; 315 u8 nr;
318 u32 error_code; 316 u32 error_code;
319 } exception; 317 } exception;
320 318
321 struct kvm_queued_interrupt { 319 struct kvm_queued_interrupt {
322 bool pending; 320 bool pending;
323 bool soft; 321 bool soft;
324 u8 nr; 322 u8 nr;
325 } interrupt; 323 } interrupt;
326 324
327 int halt_request; /* real mode on Intel only */ 325 int halt_request; /* real mode on Intel only */
328 326
329 int cpuid_nent; 327 int cpuid_nent;
330 struct kvm_cpuid_entry2 cpuid_entries[KVM_MAX_CPUID_ENTRIES]; 328 struct kvm_cpuid_entry2 cpuid_entries[KVM_MAX_CPUID_ENTRIES];
331 /* emulate context */ 329 /* emulate context */
332 330
333 struct x86_emulate_ctxt emulate_ctxt; 331 struct x86_emulate_ctxt emulate_ctxt;
334 332
335 gpa_t time; 333 gpa_t time;
336 struct pvclock_vcpu_time_info hv_clock; 334 struct pvclock_vcpu_time_info hv_clock;
337 unsigned int hv_clock_tsc_khz; 335 unsigned int hv_clock_tsc_khz;
338 unsigned int time_offset; 336 unsigned int time_offset;
339 struct page *time_page; 337 struct page *time_page;
340 338
341 bool nmi_pending; 339 bool nmi_pending;
342 bool nmi_injected; 340 bool nmi_injected;
343 341
344 struct mtrr_state_type mtrr_state; 342 struct mtrr_state_type mtrr_state;
345 u32 pat; 343 u32 pat;
346 344
347 int switch_db_regs; 345 int switch_db_regs;
348 unsigned long db[KVM_NR_DB_REGS]; 346 unsigned long db[KVM_NR_DB_REGS];
349 unsigned long dr6; 347 unsigned long dr6;
350 unsigned long dr7; 348 unsigned long dr7;
351 unsigned long eff_db[KVM_NR_DB_REGS]; 349 unsigned long eff_db[KVM_NR_DB_REGS];
352 350
353 u64 mcg_cap; 351 u64 mcg_cap;
354 u64 mcg_status; 352 u64 mcg_status;
355 u64 mcg_ctl; 353 u64 mcg_ctl;
356 u64 *mce_banks; 354 u64 *mce_banks;
357 355
358 /* used for guest single stepping over the given code position */ 356 /* used for guest single stepping over the given code position */
359 unsigned long singlestep_rip; 357 unsigned long singlestep_rip;
360 358
361 /* fields used by HYPER-V emulation */ 359 /* fields used by HYPER-V emulation */
362 u64 hv_vapic; 360 u64 hv_vapic;
363 }; 361 };
364 362
365 struct kvm_mem_alias {
366 gfn_t base_gfn;
367 unsigned long npages;
368 gfn_t target_gfn;
369 #define KVM_ALIAS_INVALID 1UL
370 unsigned long flags;
371 };
372
373 #define KVM_ARCH_HAS_UNALIAS_INSTANTIATION
374
375 struct kvm_mem_aliases {
376 struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
377 int naliases;
378 };
379
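The kvm_mem_alias/kvm_mem_aliases structures deleted above described a guest-physical range (base_gfn, npages) that was redirected to another range at target_gfn. The same layout can be expressed with ordinary memory slots that share one host mapping; the sketch below is only an illustration (the slot numbers, the helper and the 4 KiB page-size assumption are invented, KVM_SET_USER_MEMORY_REGION is the real ioctl).

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Map the same host memory at two guest-physical addresses: one slot for
 * the "real" range and a second one for what used to be its alias. */
static int map_aliased_range(int vm_fd, void *mem, unsigned long npages,
                             unsigned long target_gpa, unsigned long alias_gpa)
{
        struct kvm_userspace_memory_region target = {
                .slot            = 1,
                .guest_phys_addr = target_gpa,
                .memory_size     = npages << 12,        /* assumes 4 KiB pages */
                .userspace_addr  = (unsigned long) mem,
        };
        struct kvm_userspace_memory_region alias = {
                .slot            = 2,
                .guest_phys_addr = alias_gpa,
                .memory_size     = npages << 12,
                .userspace_addr  = (unsigned long) mem, /* same host pages */
        };

        if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &target) < 0)
                return -1;
        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &alias);
}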
380 struct kvm_arch { 363 struct kvm_arch {
381 struct kvm_mem_aliases *aliases;
382
383 unsigned int n_free_mmu_pages; 364 unsigned int n_free_mmu_pages;
384 unsigned int n_requested_mmu_pages; 365 unsigned int n_requested_mmu_pages;
385 unsigned int n_alloc_mmu_pages; 366 unsigned int n_alloc_mmu_pages;
386 atomic_t invlpg_counter; 367 atomic_t invlpg_counter;
387 struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; 368 struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
388 /* 369 /*
389 * Hash table of struct kvm_mmu_page. 370 * Hash table of struct kvm_mmu_page.
390 */ 371 */
391 struct list_head active_mmu_pages; 372 struct list_head active_mmu_pages;
392 struct list_head assigned_dev_head; 373 struct list_head assigned_dev_head;
393 struct iommu_domain *iommu_domain; 374 struct iommu_domain *iommu_domain;
394 int iommu_flags; 375 int iommu_flags;
395 struct kvm_pic *vpic; 376 struct kvm_pic *vpic;
396 struct kvm_ioapic *vioapic; 377 struct kvm_ioapic *vioapic;
397 struct kvm_pit *vpit; 378 struct kvm_pit *vpit;
398 int vapics_in_nmi_mode; 379 int vapics_in_nmi_mode;
399 380
400 unsigned int tss_addr; 381 unsigned int tss_addr;
401 struct page *apic_access_page; 382 struct page *apic_access_page;
402 383
403 gpa_t wall_clock; 384 gpa_t wall_clock;
404 385
405 struct page *ept_identity_pagetable; 386 struct page *ept_identity_pagetable;
406 bool ept_identity_pagetable_done; 387 bool ept_identity_pagetable_done;
407 gpa_t ept_identity_map_addr; 388 gpa_t ept_identity_map_addr;
408 389
409 unsigned long irq_sources_bitmap; 390 unsigned long irq_sources_bitmap;
410 u64 vm_init_tsc; 391 u64 vm_init_tsc;
411 s64 kvmclock_offset; 392 s64 kvmclock_offset;
412 393
413 struct kvm_xen_hvm_config xen_hvm_config; 394 struct kvm_xen_hvm_config xen_hvm_config;
414 395
415 /* fields used by HYPER-V emulation */ 396 /* fields used by HYPER-V emulation */
416 u64 hv_guest_os_id; 397 u64 hv_guest_os_id;
417 u64 hv_hypercall; 398 u64 hv_hypercall;
418 }; 399 };
419 400
420 struct kvm_vm_stat { 401 struct kvm_vm_stat {
421 u32 mmu_shadow_zapped; 402 u32 mmu_shadow_zapped;
422 u32 mmu_pte_write; 403 u32 mmu_pte_write;
423 u32 mmu_pte_updated; 404 u32 mmu_pte_updated;
424 u32 mmu_pde_zapped; 405 u32 mmu_pde_zapped;
425 u32 mmu_flooded; 406 u32 mmu_flooded;
426 u32 mmu_recycled; 407 u32 mmu_recycled;
427 u32 mmu_cache_miss; 408 u32 mmu_cache_miss;
428 u32 mmu_unsync; 409 u32 mmu_unsync;
429 u32 remote_tlb_flush; 410 u32 remote_tlb_flush;
430 u32 lpages; 411 u32 lpages;
431 }; 412 };
432 413
433 struct kvm_vcpu_stat { 414 struct kvm_vcpu_stat {
434 u32 pf_fixed; 415 u32 pf_fixed;
435 u32 pf_guest; 416 u32 pf_guest;
436 u32 tlb_flush; 417 u32 tlb_flush;
437 u32 invlpg; 418 u32 invlpg;
438 419
439 u32 exits; 420 u32 exits;
440 u32 io_exits; 421 u32 io_exits;
441 u32 mmio_exits; 422 u32 mmio_exits;
442 u32 signal_exits; 423 u32 signal_exits;
443 u32 irq_window_exits; 424 u32 irq_window_exits;
444 u32 nmi_window_exits; 425 u32 nmi_window_exits;
445 u32 halt_exits; 426 u32 halt_exits;
446 u32 halt_wakeup; 427 u32 halt_wakeup;
447 u32 request_irq_exits; 428 u32 request_irq_exits;
448 u32 irq_exits; 429 u32 irq_exits;
449 u32 host_state_reload; 430 u32 host_state_reload;
450 u32 efer_reload; 431 u32 efer_reload;
451 u32 fpu_reload; 432 u32 fpu_reload;
452 u32 insn_emulation; 433 u32 insn_emulation;
453 u32 insn_emulation_fail; 434 u32 insn_emulation_fail;
454 u32 hypercalls; 435 u32 hypercalls;
455 u32 irq_injections; 436 u32 irq_injections;
456 u32 nmi_injections; 437 u32 nmi_injections;
457 }; 438 };
458 439
459 struct kvm_x86_ops { 440 struct kvm_x86_ops {
460 int (*cpu_has_kvm_support)(void); /* __init */ 441 int (*cpu_has_kvm_support)(void); /* __init */
461 int (*disabled_by_bios)(void); /* __init */ 442 int (*disabled_by_bios)(void); /* __init */
462 int (*hardware_enable)(void *dummy); 443 int (*hardware_enable)(void *dummy);
463 void (*hardware_disable)(void *dummy); 444 void (*hardware_disable)(void *dummy);
464 void (*check_processor_compatibility)(void *rtn); 445 void (*check_processor_compatibility)(void *rtn);
465 int (*hardware_setup)(void); /* __init */ 446 int (*hardware_setup)(void); /* __init */
466 void (*hardware_unsetup)(void); /* __exit */ 447 void (*hardware_unsetup)(void); /* __exit */
467 bool (*cpu_has_accelerated_tpr)(void); 448 bool (*cpu_has_accelerated_tpr)(void);
468 void (*cpuid_update)(struct kvm_vcpu *vcpu); 449 void (*cpuid_update)(struct kvm_vcpu *vcpu);
469 450
470 /* Create, but do not attach this VCPU */ 451 /* Create, but do not attach this VCPU */
471 struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id); 452 struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
472 void (*vcpu_free)(struct kvm_vcpu *vcpu); 453 void (*vcpu_free)(struct kvm_vcpu *vcpu);
473 int (*vcpu_reset)(struct kvm_vcpu *vcpu); 454 int (*vcpu_reset)(struct kvm_vcpu *vcpu);
474 455
475 void (*prepare_guest_switch)(struct kvm_vcpu *vcpu); 456 void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
476 void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); 457 void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
477 void (*vcpu_put)(struct kvm_vcpu *vcpu); 458 void (*vcpu_put)(struct kvm_vcpu *vcpu);
478 459
479 void (*set_guest_debug)(struct kvm_vcpu *vcpu, 460 void (*set_guest_debug)(struct kvm_vcpu *vcpu,
480 struct kvm_guest_debug *dbg); 461 struct kvm_guest_debug *dbg);
481 int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata); 462 int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
482 int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); 463 int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
483 u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg); 464 u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
484 void (*get_segment)(struct kvm_vcpu *vcpu, 465 void (*get_segment)(struct kvm_vcpu *vcpu,
485 struct kvm_segment *var, int seg); 466 struct kvm_segment *var, int seg);
486 int (*get_cpl)(struct kvm_vcpu *vcpu); 467 int (*get_cpl)(struct kvm_vcpu *vcpu);
487 void (*set_segment)(struct kvm_vcpu *vcpu, 468 void (*set_segment)(struct kvm_vcpu *vcpu,
488 struct kvm_segment *var, int seg); 469 struct kvm_segment *var, int seg);
489 void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l); 470 void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
490 void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu); 471 void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu);
491 void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu); 472 void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu);
492 void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0); 473 void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
493 void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3); 474 void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
494 void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4); 475 void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
495 void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer); 476 void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
496 void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); 477 void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
497 void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); 478 void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
498 void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); 479 void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
499 void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); 480 void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
500 void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value); 481 void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
501 void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg); 482 void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
502 unsigned long (*get_rflags)(struct kvm_vcpu *vcpu); 483 unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
503 void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); 484 void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
504 void (*fpu_activate)(struct kvm_vcpu *vcpu); 485 void (*fpu_activate)(struct kvm_vcpu *vcpu);
505 void (*fpu_deactivate)(struct kvm_vcpu *vcpu); 486 void (*fpu_deactivate)(struct kvm_vcpu *vcpu);
506 487
507 void (*tlb_flush)(struct kvm_vcpu *vcpu); 488 void (*tlb_flush)(struct kvm_vcpu *vcpu);
508 489
509 void (*run)(struct kvm_vcpu *vcpu); 490 void (*run)(struct kvm_vcpu *vcpu);
510 int (*handle_exit)(struct kvm_vcpu *vcpu); 491 int (*handle_exit)(struct kvm_vcpu *vcpu);
511 void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu); 492 void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
512 void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); 493 void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
513 u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); 494 u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
514 void (*patch_hypercall)(struct kvm_vcpu *vcpu, 495 void (*patch_hypercall)(struct kvm_vcpu *vcpu,
515 unsigned char *hypercall_addr); 496 unsigned char *hypercall_addr);
516 void (*set_irq)(struct kvm_vcpu *vcpu); 497 void (*set_irq)(struct kvm_vcpu *vcpu);
517 void (*set_nmi)(struct kvm_vcpu *vcpu); 498 void (*set_nmi)(struct kvm_vcpu *vcpu);
518 void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr, 499 void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
519 bool has_error_code, u32 error_code, 500 bool has_error_code, u32 error_code,
520 bool reinject); 501 bool reinject);
521 int (*interrupt_allowed)(struct kvm_vcpu *vcpu); 502 int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
522 int (*nmi_allowed)(struct kvm_vcpu *vcpu); 503 int (*nmi_allowed)(struct kvm_vcpu *vcpu);
523 bool (*get_nmi_mask)(struct kvm_vcpu *vcpu); 504 bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
524 void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked); 505 void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked);
525 void (*enable_nmi_window)(struct kvm_vcpu *vcpu); 506 void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
526 void (*enable_irq_window)(struct kvm_vcpu *vcpu); 507 void (*enable_irq_window)(struct kvm_vcpu *vcpu);
527 void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); 508 void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
528 int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); 509 int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
529 int (*get_tdp_level)(void); 510 int (*get_tdp_level)(void);
530 u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); 511 u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
531 int (*get_lpage_level)(void); 512 int (*get_lpage_level)(void);
532 bool (*rdtscp_supported)(void); 513 bool (*rdtscp_supported)(void);
533 514
534 void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); 515 void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
535 516
536 const struct trace_print_flags *exit_reasons_str; 517 const struct trace_print_flags *exit_reasons_str;
537 }; 518 };
538 519
539 extern struct kvm_x86_ops *kvm_x86_ops; 520 extern struct kvm_x86_ops *kvm_x86_ops;
540 521
541 int kvm_mmu_module_init(void); 522 int kvm_mmu_module_init(void);
542 void kvm_mmu_module_exit(void); 523 void kvm_mmu_module_exit(void);
543 524
544 void kvm_mmu_destroy(struct kvm_vcpu *vcpu); 525 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
545 int kvm_mmu_create(struct kvm_vcpu *vcpu); 526 int kvm_mmu_create(struct kvm_vcpu *vcpu);
546 int kvm_mmu_setup(struct kvm_vcpu *vcpu); 527 int kvm_mmu_setup(struct kvm_vcpu *vcpu);
547 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte); 528 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
548 void kvm_mmu_set_base_ptes(u64 base_pte); 529 void kvm_mmu_set_base_ptes(u64 base_pte);
549 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, 530 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
550 u64 dirty_mask, u64 nx_mask, u64 x_mask); 531 u64 dirty_mask, u64 nx_mask, u64 x_mask);
551 532
552 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu); 533 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
553 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); 534 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
554 void kvm_mmu_zap_all(struct kvm *kvm); 535 void kvm_mmu_zap_all(struct kvm *kvm);
555 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm); 536 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
556 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages); 537 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
557 538
558 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3); 539 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
559 540
560 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, 541 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
561 const void *val, int bytes); 542 const void *val, int bytes);
562 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes, 543 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
563 gpa_t addr, unsigned long *ret); 544 gpa_t addr, unsigned long *ret);
564 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); 545 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
565 546
566 extern bool tdp_enabled; 547 extern bool tdp_enabled;
567 548
568 enum emulation_result { 549 enum emulation_result {
569 EMULATE_DONE, /* no further processing */ 550 EMULATE_DONE, /* no further processing */
570 EMULATE_DO_MMIO, /* kvm_run filled with mmio request */ 551 EMULATE_DO_MMIO, /* kvm_run filled with mmio request */
571 EMULATE_FAIL, /* can't emulate this instruction */ 552 EMULATE_FAIL, /* can't emulate this instruction */
572 }; 553 };
573 554
574 #define EMULTYPE_NO_DECODE (1 << 0) 555 #define EMULTYPE_NO_DECODE (1 << 0)
575 #define EMULTYPE_TRAP_UD (1 << 1) 556 #define EMULTYPE_TRAP_UD (1 << 1)
576 #define EMULTYPE_SKIP (1 << 2) 557 #define EMULTYPE_SKIP (1 << 2)
577 int emulate_instruction(struct kvm_vcpu *vcpu, 558 int emulate_instruction(struct kvm_vcpu *vcpu,
578 unsigned long cr2, u16 error_code, int emulation_type); 559 unsigned long cr2, u16 error_code, int emulation_type);
579 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); 560 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
580 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); 561 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
581 562
582 void kvm_enable_efer_bits(u64); 563 void kvm_enable_efer_bits(u64);
583 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); 564 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
584 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); 565 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
585 566
586 struct x86_emulate_ctxt; 567 struct x86_emulate_ctxt;
587 568
588 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); 569 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
589 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); 570 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
590 int kvm_emulate_halt(struct kvm_vcpu *vcpu); 571 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
591 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); 572 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
592 int emulate_clts(struct kvm_vcpu *vcpu); 573 int emulate_clts(struct kvm_vcpu *vcpu);
593 574
594 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); 575 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
595 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); 576 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
596 577
597 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, 578 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
598 bool has_error_code, u32 error_code); 579 bool has_error_code, u32 error_code);
599 580
600 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0); 581 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
601 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3); 582 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
602 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4); 583 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
603 void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8); 584 void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
604 int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val); 585 int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
605 int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val); 586 int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);
606 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu); 587 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
607 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw); 588 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
608 void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l); 589 void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
609 int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr); 590 int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
610 591
611 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); 592 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
612 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data); 593 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
613 594
614 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu); 595 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
615 void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags); 596 void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
616 597
617 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr); 598 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
618 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); 599 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
619 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr); 600 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
620 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); 601 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
621 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2, 602 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
622 u32 error_code); 603 u32 error_code);
623 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl); 604 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
624 605
625 int kvm_pic_set_irq(void *opaque, int irq, int level); 606 int kvm_pic_set_irq(void *opaque, int irq, int level);
626 607
627 void kvm_inject_nmi(struct kvm_vcpu *vcpu); 608 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
628 609
629 int fx_init(struct kvm_vcpu *vcpu); 610 int fx_init(struct kvm_vcpu *vcpu);
630 611
631 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu); 612 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
632 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, 613 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
633 const u8 *new, int bytes, 614 const u8 *new, int bytes,
634 bool guest_initiated); 615 bool guest_initiated);
635 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva); 616 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
636 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu); 617 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
637 int kvm_mmu_load(struct kvm_vcpu *vcpu); 618 int kvm_mmu_load(struct kvm_vcpu *vcpu);
638 void kvm_mmu_unload(struct kvm_vcpu *vcpu); 619 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
639 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu); 620 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
640 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); 621 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
641 gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); 622 gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
642 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); 623 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
643 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); 624 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
644 625
645 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); 626 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
646 627
647 int kvm_fix_hypercall(struct kvm_vcpu *vcpu); 628 int kvm_fix_hypercall(struct kvm_vcpu *vcpu);
648 629
649 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code); 630 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code);
650 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva); 631 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
651 632
652 void kvm_enable_tdp(void); 633 void kvm_enable_tdp(void);
653 void kvm_disable_tdp(void); 634 void kvm_disable_tdp(void);
654 635
655 int complete_pio(struct kvm_vcpu *vcpu); 636 int complete_pio(struct kvm_vcpu *vcpu);
656 bool kvm_check_iopl(struct kvm_vcpu *vcpu); 637 bool kvm_check_iopl(struct kvm_vcpu *vcpu);
657
658 struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn);
659 638
660 static inline struct kvm_mmu_page *page_header(hpa_t shadow_page) 639 static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
661 { 640 {
662 struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT); 641 struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
663 642
664 return (struct kvm_mmu_page *)page_private(page); 643 return (struct kvm_mmu_page *)page_private(page);
665 } 644 }
666 645
667 static inline u16 kvm_read_fs(void) 646 static inline u16 kvm_read_fs(void)
668 { 647 {
669 u16 seg; 648 u16 seg;
670 asm("mov %%fs, %0" : "=g"(seg)); 649 asm("mov %%fs, %0" : "=g"(seg));
671 return seg; 650 return seg;
672 } 651 }
673 652
674 static inline u16 kvm_read_gs(void) 653 static inline u16 kvm_read_gs(void)
675 { 654 {
676 u16 seg; 655 u16 seg;
677 asm("mov %%gs, %0" : "=g"(seg)); 656 asm("mov %%gs, %0" : "=g"(seg));
678 return seg; 657 return seg;
679 } 658 }
680 659
681 static inline u16 kvm_read_ldt(void) 660 static inline u16 kvm_read_ldt(void)
682 { 661 {
683 u16 ldt; 662 u16 ldt;
684 asm("sldt %0" : "=g"(ldt)); 663 asm("sldt %0" : "=g"(ldt));
685 return ldt; 664 return ldt;
686 } 665 }
687 666
688 static inline void kvm_load_fs(u16 sel) 667 static inline void kvm_load_fs(u16 sel)
689 { 668 {
690 asm("mov %0, %%fs" : : "rm"(sel)); 669 asm("mov %0, %%fs" : : "rm"(sel));
691 } 670 }
692 671
693 static inline void kvm_load_gs(u16 sel) 672 static inline void kvm_load_gs(u16 sel)
694 { 673 {
695 asm("mov %0, %%gs" : : "rm"(sel)); 674 asm("mov %0, %%gs" : : "rm"(sel));
696 } 675 }
697 676
698 static inline void kvm_load_ldt(u16 sel) 677 static inline void kvm_load_ldt(u16 sel)
699 { 678 {
700 asm("lldt %0" : : "rm"(sel)); 679 asm("lldt %0" : : "rm"(sel));
701 } 680 }
702 681
703 #ifdef CONFIG_X86_64 682 #ifdef CONFIG_X86_64
704 static inline unsigned long read_msr(unsigned long msr) 683 static inline unsigned long read_msr(unsigned long msr)
705 { 684 {
706 u64 value; 685 u64 value;
707 686
708 rdmsrl(msr, value); 687 rdmsrl(msr, value);
709 return value; 688 return value;
710 } 689 }
711 #endif 690 #endif
712 691
713 static inline u32 get_rdx_init_val(void) 692 static inline u32 get_rdx_init_val(void)
714 { 693 {
715 return 0x600; /* P6 family */ 694 return 0x600; /* P6 family */
716 } 695 }
717 696
718 static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code) 697 static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
719 { 698 {
720 kvm_queue_exception_e(vcpu, GP_VECTOR, error_code); 699 kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
721 } 700 }
722 701
723 #define TSS_IOPB_BASE_OFFSET 0x66 702 #define TSS_IOPB_BASE_OFFSET 0x66
724 #define TSS_BASE_SIZE 0x68 703 #define TSS_BASE_SIZE 0x68
725 #define TSS_IOPB_SIZE (65536 / 8) 704 #define TSS_IOPB_SIZE (65536 / 8)
726 #define TSS_REDIRECTION_SIZE (256 / 8) 705 #define TSS_REDIRECTION_SIZE (256 / 8)
727 #define RMODE_TSS_SIZE \ 706 #define RMODE_TSS_SIZE \
728 (TSS_BASE_SIZE + TSS_REDIRECTION_SIZE + TSS_IOPB_SIZE + 1) 707 (TSS_BASE_SIZE + TSS_REDIRECTION_SIZE + TSS_IOPB_SIZE + 1)
729 708
730 enum { 709 enum {
731 TASK_SWITCH_CALL = 0, 710 TASK_SWITCH_CALL = 0,
732 TASK_SWITCH_IRET = 1, 711 TASK_SWITCH_IRET = 1,
733 TASK_SWITCH_JMP = 2, 712 TASK_SWITCH_JMP = 2,
734 TASK_SWITCH_GATE = 3, 713 TASK_SWITCH_GATE = 3,
735 }; 714 };
736 715
737 #define HF_GIF_MASK (1 << 0) 716 #define HF_GIF_MASK (1 << 0)
738 #define HF_HIF_MASK (1 << 1) 717 #define HF_HIF_MASK (1 << 1)
739 #define HF_VINTR_MASK (1 << 2) 718 #define HF_VINTR_MASK (1 << 2)
740 #define HF_NMI_MASK (1 << 3) 719 #define HF_NMI_MASK (1 << 3)
741 #define HF_IRET_MASK (1 << 4) 720 #define HF_IRET_MASK (1 << 4)
742 721
743 /* 722 /*
744 * Hardware virtualization extension instructions may fault if a 723 * Hardware virtualization extension instructions may fault if a
745 * reboot turns off virtualization while processes are running. 724 * reboot turns off virtualization while processes are running.
746 * Trap the fault and ignore the instruction if that happens. 725 * Trap the fault and ignore the instruction if that happens.
747 */ 726 */
748 asmlinkage void kvm_handle_fault_on_reboot(void); 727 asmlinkage void kvm_handle_fault_on_reboot(void);
749 728
750 #define __kvm_handle_fault_on_reboot(insn) \ 729 #define __kvm_handle_fault_on_reboot(insn) \
751 "666: " insn "\n\t" \ 730 "666: " insn "\n\t" \
752 ".pushsection .fixup, \"ax\" \n" \ 731 ".pushsection .fixup, \"ax\" \n" \
753 "667: \n\t" \ 732 "667: \n\t" \
754 __ASM_SIZE(push) " $666b \n\t" \ 733 __ASM_SIZE(push) " $666b \n\t" \
755 "jmp kvm_handle_fault_on_reboot \n\t" \ 734 "jmp kvm_handle_fault_on_reboot \n\t" \
756 ".popsection \n\t" \ 735 ".popsection \n\t" \
757 ".pushsection __ex_table, \"a\" \n\t" \ 736 ".pushsection __ex_table, \"a\" \n\t" \
758 _ASM_PTR " 666b, 667b \n\t" \ 737 _ASM_PTR " 666b, 667b \n\t" \
759 ".popsection" 738 ".popsection"
760 739
761 #define KVM_ARCH_WANT_MMU_NOTIFIER 740 #define KVM_ARCH_WANT_MMU_NOTIFIER
762 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); 741 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
763 int kvm_age_hva(struct kvm *kvm, unsigned long hva); 742 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
764 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); 743 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
765 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); 744 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
766 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); 745 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
767 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); 746 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
768 int kvm_cpu_get_interrupt(struct kvm_vcpu *v); 747 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
769 748
770 void kvm_define_shared_msr(unsigned index, u32 msr); 749 void kvm_define_shared_msr(unsigned index, u32 msr);
771 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask); 750 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
772 751
773 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip); 752 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
774 753
775 #endif /* _ASM_X86_KVM_HOST_H */ 754 #endif /* _ASM_X86_KVM_HOST_H */
776 755
1 /* 1 /*
2 * Kernel-based Virtual Machine driver for Linux 2 * Kernel-based Virtual Machine driver for Linux
3 * 3 *
4 * This module enables machines with Intel VT-x extensions to run virtual 4 * This module enables machines with Intel VT-x extensions to run virtual
5 * machines without emulation or binary translation. 5 * machines without emulation or binary translation.
6 * 6 *
7 * MMU support 7 * MMU support
8 * 8 *
9 * Copyright (C) 2006 Qumranet, Inc. 9 * Copyright (C) 2006 Qumranet, Inc.
10 * Copyright 2010 Red Hat, Inc. and/or its affiliates. 10 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
11 * 11 *
12 * Authors: 12 * Authors:
13 * Yaniv Kamay <yaniv@qumranet.com> 13 * Yaniv Kamay <yaniv@qumranet.com>
14 * Avi Kivity <avi@qumranet.com> 14 * Avi Kivity <avi@qumranet.com>
15 * 15 *
16 * This work is licensed under the terms of the GNU GPL, version 2. See 16 * This work is licensed under the terms of the GNU GPL, version 2. See
17 * the COPYING file in the top-level directory. 17 * the COPYING file in the top-level directory.
18 * 18 *
19 */ 19 */
20 20
21 #include "mmu.h" 21 #include "mmu.h"
22 #include "x86.h" 22 #include "x86.h"
23 #include "kvm_cache_regs.h" 23 #include "kvm_cache_regs.h"
24 24
25 #include <linux/kvm_host.h> 25 #include <linux/kvm_host.h>
26 #include <linux/types.h> 26 #include <linux/types.h>
27 #include <linux/string.h> 27 #include <linux/string.h>
28 #include <linux/mm.h> 28 #include <linux/mm.h>
29 #include <linux/highmem.h> 29 #include <linux/highmem.h>
30 #include <linux/module.h> 30 #include <linux/module.h>
31 #include <linux/swap.h> 31 #include <linux/swap.h>
32 #include <linux/hugetlb.h> 32 #include <linux/hugetlb.h>
33 #include <linux/compiler.h> 33 #include <linux/compiler.h>
34 #include <linux/srcu.h> 34 #include <linux/srcu.h>
35 #include <linux/slab.h> 35 #include <linux/slab.h>
36 #include <linux/uaccess.h> 36 #include <linux/uaccess.h>
37 37
38 #include <asm/page.h> 38 #include <asm/page.h>
39 #include <asm/cmpxchg.h> 39 #include <asm/cmpxchg.h>
40 #include <asm/io.h> 40 #include <asm/io.h>
41 #include <asm/vmx.h> 41 #include <asm/vmx.h>
42 42
43 /* 43 /*
44 * When this variable is set to true it enables Two-Dimensional-Paging, 44 * When this variable is set to true it enables Two-Dimensional-Paging,
45 * where the hardware walks two page tables: 45 * where the hardware walks two page tables:
46 * 1. the guest-virtual to guest-physical translation, and 46 * 1. the guest-virtual to guest-physical translation, and
47 * 2. while doing 1., the guest-physical to host-physical translation. 47 * 2. while doing 1., the guest-physical to host-physical translation.
48 * If the hardware supports this, shadow paging is not needed. 48 * If the hardware supports this, shadow paging is not needed.
49 */ 49 */
50 bool tdp_enabled = false; 50 bool tdp_enabled = false;
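For context (a hedged sketch, not part of this change): tdp_enabled is flipped by the vendor modules through kvm_enable_tdp()/kvm_disable_tdp(), declared in the header above. Hardware setup does something along these lines; cpu_has_ept() is a hypothetical predicate standing in for the module's real EPT/NPT capability check.

        /* Sketch: vendor module setup choosing between TDP and shadow paging */
        static int __init example_hardware_setup(void)
        {
                if (cpu_has_ept())              /* hypothetical capability check */
                        kvm_enable_tdp();       /* hardware walks guest + nested tables */
                else
                        kvm_disable_tdp();      /* fall back to software shadow paging */
                return 0;
        }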
51 51
52 #undef MMU_DEBUG 52 #undef MMU_DEBUG
53 53
54 #undef AUDIT 54 #undef AUDIT
55 55
56 #ifdef AUDIT 56 #ifdef AUDIT
57 static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg); 57 static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg);
58 #else 58 #else
59 static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {} 59 static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {}
60 #endif 60 #endif
61 61
62 #ifdef MMU_DEBUG 62 #ifdef MMU_DEBUG
63 63
64 #define pgprintk(x...) do { if (dbg) printk(x); } while (0) 64 #define pgprintk(x...) do { if (dbg) printk(x); } while (0)
65 #define rmap_printk(x...) do { if (dbg) printk(x); } while (0) 65 #define rmap_printk(x...) do { if (dbg) printk(x); } while (0)
66 66
67 #else 67 #else
68 68
69 #define pgprintk(x...) do { } while (0) 69 #define pgprintk(x...) do { } while (0)
70 #define rmap_printk(x...) do { } while (0) 70 #define rmap_printk(x...) do { } while (0)
71 71
72 #endif 72 #endif
73 73
74 #if defined(MMU_DEBUG) || defined(AUDIT) 74 #if defined(MMU_DEBUG) || defined(AUDIT)
75 static int dbg = 0; 75 static int dbg = 0;
76 module_param(dbg, bool, 0644); 76 module_param(dbg, bool, 0644);
77 #endif 77 #endif
78 78
79 static int oos_shadow = 1; 79 static int oos_shadow = 1;
80 module_param(oos_shadow, bool, 0644); 80 module_param(oos_shadow, bool, 0644);
81 81
82 #ifndef MMU_DEBUG 82 #ifndef MMU_DEBUG
83 #define ASSERT(x) do { } while (0) 83 #define ASSERT(x) do { } while (0)
84 #else 84 #else
85 #define ASSERT(x) \ 85 #define ASSERT(x) \
86 if (!(x)) { \ 86 if (!(x)) { \
87 printk(KERN_WARNING "assertion failed %s:%d: %s\n", \ 87 printk(KERN_WARNING "assertion failed %s:%d: %s\n", \
88 __FILE__, __LINE__, #x); \ 88 __FILE__, __LINE__, #x); \
89 } 89 }
90 #endif 90 #endif
91 91
92 #define PT_FIRST_AVAIL_BITS_SHIFT 9 92 #define PT_FIRST_AVAIL_BITS_SHIFT 9
93 #define PT64_SECOND_AVAIL_BITS_SHIFT 52 93 #define PT64_SECOND_AVAIL_BITS_SHIFT 52
94 94
95 #define VALID_PAGE(x) ((x) != INVALID_PAGE) 95 #define VALID_PAGE(x) ((x) != INVALID_PAGE)
96 96
97 #define PT64_LEVEL_BITS 9 97 #define PT64_LEVEL_BITS 9
98 98
99 #define PT64_LEVEL_SHIFT(level) \ 99 #define PT64_LEVEL_SHIFT(level) \
100 (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS) 100 (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)
101 101
102 #define PT64_LEVEL_MASK(level) \ 102 #define PT64_LEVEL_MASK(level) \
103 (((1ULL << PT64_LEVEL_BITS) - 1) << PT64_LEVEL_SHIFT(level)) 103 (((1ULL << PT64_LEVEL_BITS) - 1) << PT64_LEVEL_SHIFT(level))
104 104
105 #define PT64_INDEX(address, level)\ 105 #define PT64_INDEX(address, level)\
106 (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) 106 (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
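As a quick worked example (not in the source): with PAGE_SHIFT == 12 and PT64_LEVEL_BITS == 9 these macros reduce to the familiar x86-64 4-level split.

        /*
         * PT64_LEVEL_SHIFT(1) == 12  ->  PT64_INDEX(addr, 1) = bits 12..20 (PTE index)
         * PT64_LEVEL_SHIFT(2) == 21  ->  PT64_INDEX(addr, 2) = bits 21..29 (PDE index)
         * PT64_LEVEL_SHIFT(3) == 30  ->  PT64_INDEX(addr, 3) = bits 30..38 (PDPTE index)
         * PT64_LEVEL_SHIFT(4) == 39  ->  PT64_INDEX(addr, 4) = bits 39..47 (PML4 index)
         */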
107 107
108 108
109 #define PT32_LEVEL_BITS 10 109 #define PT32_LEVEL_BITS 10
110 110
111 #define PT32_LEVEL_SHIFT(level) \ 111 #define PT32_LEVEL_SHIFT(level) \
112 (PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS) 112 (PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS)
113 113
114 #define PT32_LEVEL_MASK(level) \ 114 #define PT32_LEVEL_MASK(level) \
115 (((1ULL << PT32_LEVEL_BITS) - 1) << PT32_LEVEL_SHIFT(level)) 115 (((1ULL << PT32_LEVEL_BITS) - 1) << PT32_LEVEL_SHIFT(level))
116 #define PT32_LVL_OFFSET_MASK(level) \ 116 #define PT32_LVL_OFFSET_MASK(level) \
117 (PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ 117 (PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
118 * PT32_LEVEL_BITS))) - 1)) 118 * PT32_LEVEL_BITS))) - 1))
119 119
120 #define PT32_INDEX(address, level)\ 120 #define PT32_INDEX(address, level)\
121 (((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1)) 121 (((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
122 122
123 123
124 #define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) 124 #define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
125 #define PT64_DIR_BASE_ADDR_MASK \ 125 #define PT64_DIR_BASE_ADDR_MASK \
126 (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1)) 126 (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
127 #define PT64_LVL_ADDR_MASK(level) \ 127 #define PT64_LVL_ADDR_MASK(level) \
128 (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ 128 (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
129 * PT64_LEVEL_BITS))) - 1)) 129 * PT64_LEVEL_BITS))) - 1))
130 #define PT64_LVL_OFFSET_MASK(level) \ 130 #define PT64_LVL_OFFSET_MASK(level) \
131 (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ 131 (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
132 * PT64_LEVEL_BITS))) - 1)) 132 * PT64_LEVEL_BITS))) - 1))
133 133
134 #define PT32_BASE_ADDR_MASK PAGE_MASK 134 #define PT32_BASE_ADDR_MASK PAGE_MASK
135 #define PT32_DIR_BASE_ADDR_MASK \ 135 #define PT32_DIR_BASE_ADDR_MASK \
136 (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1)) 136 (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
137 #define PT32_LVL_ADDR_MASK(level) \ 137 #define PT32_LVL_ADDR_MASK(level) \
138 (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ 138 (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
139 * PT32_LEVEL_BITS))) - 1)) 139 * PT32_LEVEL_BITS))) - 1))
140 140
141 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \ 141 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
142 | PT64_NX_MASK) 142 | PT64_NX_MASK)
143 143
144 #define RMAP_EXT 4 144 #define RMAP_EXT 4
145 145
146 #define ACC_EXEC_MASK 1 146 #define ACC_EXEC_MASK 1
147 #define ACC_WRITE_MASK PT_WRITABLE_MASK 147 #define ACC_WRITE_MASK PT_WRITABLE_MASK
148 #define ACC_USER_MASK PT_USER_MASK 148 #define ACC_USER_MASK PT_USER_MASK
149 #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) 149 #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
150 150
151 #include <trace/events/kvm.h> 151 #include <trace/events/kvm.h>
152 152
153 #define CREATE_TRACE_POINTS 153 #define CREATE_TRACE_POINTS
154 #include "mmutrace.h" 154 #include "mmutrace.h"
155 155
156 #define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) 156 #define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
157 157
158 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) 158 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
159 159
160 struct kvm_rmap_desc { 160 struct kvm_rmap_desc {
161 u64 *sptes[RMAP_EXT]; 161 u64 *sptes[RMAP_EXT];
162 struct kvm_rmap_desc *more; 162 struct kvm_rmap_desc *more;
163 }; 163 };
164 164
165 struct kvm_shadow_walk_iterator { 165 struct kvm_shadow_walk_iterator {
166 u64 addr; 166 u64 addr;
167 hpa_t shadow_addr; 167 hpa_t shadow_addr;
168 int level; 168 int level;
169 u64 *sptep; 169 u64 *sptep;
170 unsigned index; 170 unsigned index;
171 }; 171 };
172 172
173 #define for_each_shadow_entry(_vcpu, _addr, _walker) \ 173 #define for_each_shadow_entry(_vcpu, _addr, _walker) \
174 for (shadow_walk_init(&(_walker), _vcpu, _addr); \ 174 for (shadow_walk_init(&(_walker), _vcpu, _addr); \
175 shadow_walk_okay(&(_walker)); \ 175 shadow_walk_okay(&(_walker)); \
176 shadow_walk_next(&(_walker))) 176 shadow_walk_next(&(_walker)))
177 177
178 typedef void (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp, u64 *spte); 178 typedef void (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp, u64 *spte);
179 179
180 static struct kmem_cache *pte_chain_cache; 180 static struct kmem_cache *pte_chain_cache;
181 static struct kmem_cache *rmap_desc_cache; 181 static struct kmem_cache *rmap_desc_cache;
182 static struct kmem_cache *mmu_page_header_cache; 182 static struct kmem_cache *mmu_page_header_cache;
183 183
184 static u64 __read_mostly shadow_trap_nonpresent_pte; 184 static u64 __read_mostly shadow_trap_nonpresent_pte;
185 static u64 __read_mostly shadow_notrap_nonpresent_pte; 185 static u64 __read_mostly shadow_notrap_nonpresent_pte;
186 static u64 __read_mostly shadow_base_present_pte; 186 static u64 __read_mostly shadow_base_present_pte;
187 static u64 __read_mostly shadow_nx_mask; 187 static u64 __read_mostly shadow_nx_mask;
188 static u64 __read_mostly shadow_x_mask; /* mutually exclusive with nx_mask */ 188 static u64 __read_mostly shadow_x_mask; /* mutually exclusive with nx_mask */
189 static u64 __read_mostly shadow_user_mask; 189 static u64 __read_mostly shadow_user_mask;
190 static u64 __read_mostly shadow_accessed_mask; 190 static u64 __read_mostly shadow_accessed_mask;
191 static u64 __read_mostly shadow_dirty_mask; 191 static u64 __read_mostly shadow_dirty_mask;
192 192
193 static inline u64 rsvd_bits(int s, int e) 193 static inline u64 rsvd_bits(int s, int e)
194 { 194 {
195 return ((1ULL << (e - s + 1)) - 1) << s; 195 return ((1ULL << (e - s + 1)) - 1) << s;
196 } 196 }
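A worked example of the helper above (illustrative values): rsvd_bits(s, e) builds a mask covering bits s..e inclusive, so

        /* rsvd_bits(52, 62) == ((1ULL << 11) - 1) << 52 == 0x7ff0000000000000ULL,
         * i.e. the eleven PTE bits 52..62, the kind of mask used when
         * checking guest PTEs for reserved-bit violations. */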
197 197
198 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte) 198 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
199 { 199 {
200 shadow_trap_nonpresent_pte = trap_pte; 200 shadow_trap_nonpresent_pte = trap_pte;
201 shadow_notrap_nonpresent_pte = notrap_pte; 201 shadow_notrap_nonpresent_pte = notrap_pte;
202 } 202 }
203 EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes); 203 EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);
204 204
205 void kvm_mmu_set_base_ptes(u64 base_pte) 205 void kvm_mmu_set_base_ptes(u64 base_pte)
206 { 206 {
207 shadow_base_present_pte = base_pte; 207 shadow_base_present_pte = base_pte;
208 } 208 }
209 EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes); 209 EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
210 210
211 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, 211 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
212 u64 dirty_mask, u64 nx_mask, u64 x_mask) 212 u64 dirty_mask, u64 nx_mask, u64 x_mask)
213 { 213 {
214 shadow_user_mask = user_mask; 214 shadow_user_mask = user_mask;
215 shadow_accessed_mask = accessed_mask; 215 shadow_accessed_mask = accessed_mask;
216 shadow_dirty_mask = dirty_mask; 216 shadow_dirty_mask = dirty_mask;
217 shadow_nx_mask = nx_mask; 217 shadow_nx_mask = nx_mask;
218 shadow_x_mask = x_mask; 218 shadow_x_mask = x_mask;
219 } 219 }
220 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); 220 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
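For reference, a hedged sketch of how these setters are meant to be used: common x86 code (or a vendor module, for an EPT-style PTE format) programs the generic MMU with the bit layout of the format in use. The constants below are illustrative, not copied from this tree.

        /* Sketch: programming the shadow-PTE bit layout for the classic x86 format */
        static void example_configure_spte_masks(void)
        {
                kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
                kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
                                      PT_DIRTY_MASK, PT64_NX_MASK, 0ull);
        }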
221 221
222 static bool is_write_protection(struct kvm_vcpu *vcpu) 222 static bool is_write_protection(struct kvm_vcpu *vcpu)
223 { 223 {
224 return kvm_read_cr0_bits(vcpu, X86_CR0_WP); 224 return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
225 } 225 }
226 226
227 static int is_cpuid_PSE36(void) 227 static int is_cpuid_PSE36(void)
228 { 228 {
229 return 1; 229 return 1;
230 } 230 }
231 231
232 static int is_nx(struct kvm_vcpu *vcpu) 232 static int is_nx(struct kvm_vcpu *vcpu)
233 { 233 {
234 return vcpu->arch.efer & EFER_NX; 234 return vcpu->arch.efer & EFER_NX;
235 } 235 }
236 236
237 static int is_shadow_present_pte(u64 pte) 237 static int is_shadow_present_pte(u64 pte)
238 { 238 {
239 return pte != shadow_trap_nonpresent_pte 239 return pte != shadow_trap_nonpresent_pte
240 && pte != shadow_notrap_nonpresent_pte; 240 && pte != shadow_notrap_nonpresent_pte;
241 } 241 }
242 242
243 static int is_large_pte(u64 pte) 243 static int is_large_pte(u64 pte)
244 { 244 {
245 return pte & PT_PAGE_SIZE_MASK; 245 return pte & PT_PAGE_SIZE_MASK;
246 } 246 }
247 247
248 static int is_writable_pte(unsigned long pte) 248 static int is_writable_pte(unsigned long pte)
249 { 249 {
250 return pte & PT_WRITABLE_MASK; 250 return pte & PT_WRITABLE_MASK;
251 } 251 }
252 252
253 static int is_dirty_gpte(unsigned long pte) 253 static int is_dirty_gpte(unsigned long pte)
254 { 254 {
255 return pte & PT_DIRTY_MASK; 255 return pte & PT_DIRTY_MASK;
256 } 256 }
257 257
258 static int is_rmap_spte(u64 pte) 258 static int is_rmap_spte(u64 pte)
259 { 259 {
260 return is_shadow_present_pte(pte); 260 return is_shadow_present_pte(pte);
261 } 261 }
262 262
263 static int is_last_spte(u64 pte, int level) 263 static int is_last_spte(u64 pte, int level)
264 { 264 {
265 if (level == PT_PAGE_TABLE_LEVEL) 265 if (level == PT_PAGE_TABLE_LEVEL)
266 return 1; 266 return 1;
267 if (is_large_pte(pte)) 267 if (is_large_pte(pte))
268 return 1; 268 return 1;
269 return 0; 269 return 0;
270 } 270 }
271 271
272 static pfn_t spte_to_pfn(u64 pte) 272 static pfn_t spte_to_pfn(u64 pte)
273 { 273 {
274 return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; 274 return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
275 } 275 }
276 276
277 static gfn_t pse36_gfn_delta(u32 gpte) 277 static gfn_t pse36_gfn_delta(u32 gpte)
278 { 278 {
279 int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT; 279 int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
280 280
281 return (gpte & PT32_DIR_PSE36_MASK) << shift; 281 return (gpte & PT32_DIR_PSE36_MASK) << shift;
282 } 282 }
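A worked example for pse36_gfn_delta() (assuming PT32_DIR_PSE36_SHIFT == 13 and PAGE_SHIFT == 12, the usual values):

        /* shift == 32 - 13 - 12 == 7, so the PSE-36 field stored in gpte bits
         * 13..16 lands in gfn bits 20..23; after the final gfn << PAGE_SHIFT
         * those become physical address bits 32..35, which is exactly where
         * PSE-36 places the extra address bits of a 4MB page. */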
283 283
284 static void __set_spte(u64 *sptep, u64 spte) 284 static void __set_spte(u64 *sptep, u64 spte)
285 { 285 {
286 #ifdef CONFIG_X86_64 286 #ifdef CONFIG_X86_64
287 set_64bit((unsigned long *)sptep, spte); 287 set_64bit((unsigned long *)sptep, spte);
288 #else 288 #else
289 set_64bit((unsigned long long *)sptep, spte); 289 set_64bit((unsigned long long *)sptep, spte);
290 #endif 290 #endif
291 } 291 }
292 292
293 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, 293 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
294 struct kmem_cache *base_cache, int min) 294 struct kmem_cache *base_cache, int min)
295 { 295 {
296 void *obj; 296 void *obj;
297 297
298 if (cache->nobjs >= min) 298 if (cache->nobjs >= min)
299 return 0; 299 return 0;
300 while (cache->nobjs < ARRAY_SIZE(cache->objects)) { 300 while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
301 obj = kmem_cache_zalloc(base_cache, GFP_KERNEL); 301 obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
302 if (!obj) 302 if (!obj)
303 return -ENOMEM; 303 return -ENOMEM;
304 cache->objects[cache->nobjs++] = obj; 304 cache->objects[cache->nobjs++] = obj;
305 } 305 }
306 return 0; 306 return 0;
307 } 307 }
308 308
309 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc, 309 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
310 struct kmem_cache *cache) 310 struct kmem_cache *cache)
311 { 311 {
312 while (mc->nobjs) 312 while (mc->nobjs)
313 kmem_cache_free(cache, mc->objects[--mc->nobjs]); 313 kmem_cache_free(cache, mc->objects[--mc->nobjs]);
314 } 314 }
315 315
316 static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache, 316 static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
317 int min) 317 int min)
318 { 318 {
319 struct page *page; 319 struct page *page;
320 320
321 if (cache->nobjs >= min) 321 if (cache->nobjs >= min)
322 return 0; 322 return 0;
323 while (cache->nobjs < ARRAY_SIZE(cache->objects)) { 323 while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
324 page = alloc_page(GFP_KERNEL); 324 page = alloc_page(GFP_KERNEL);
325 if (!page) 325 if (!page)
326 return -ENOMEM; 326 return -ENOMEM;
327 cache->objects[cache->nobjs++] = page_address(page); 327 cache->objects[cache->nobjs++] = page_address(page);
328 } 328 }
329 return 0; 329 return 0;
330 } 330 }
331 331
332 static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc) 332 static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
333 { 333 {
334 while (mc->nobjs) 334 while (mc->nobjs)
335 free_page((unsigned long)mc->objects[--mc->nobjs]); 335 free_page((unsigned long)mc->objects[--mc->nobjs]);
336 } 336 }
337 337
338 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu) 338 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
339 { 339 {
340 int r; 340 int r;
341 341
342 r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache, 342 r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache,
343 pte_chain_cache, 4); 343 pte_chain_cache, 4);
344 if (r) 344 if (r)
345 goto out; 345 goto out;
346 r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, 346 r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache,
347 rmap_desc_cache, 4); 347 rmap_desc_cache, 4);
348 if (r) 348 if (r)
349 goto out; 349 goto out;
350 r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8); 350 r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
351 if (r) 351 if (r)
352 goto out; 352 goto out;
353 r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 353 r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
354 mmu_page_header_cache, 4); 354 mmu_page_header_cache, 4);
355 out: 355 out:
356 return r; 356 return r;
357 } 357 }
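To show why this helper exists (a sketch under stated assumptions; kvm_mmu_example_fault is hypothetical): the caches are refilled with sleeping GFP_KERNEL allocations before mmu_lock is taken, so mmu_memory_cache_alloc() can later hand out objects in atomic context without failing.

        static int kvm_mmu_example_fault(struct kvm_vcpu *vcpu)
        {
                int r;

                r = mmu_topup_memory_caches(vcpu);      /* may sleep */
                if (r)
                        return r;

                spin_lock(&vcpu->kvm->mmu_lock);
                /* ... shadow page / pte_chain / rmap_desc allocations below
                 * come straight from the pre-filled caches ... */
                spin_unlock(&vcpu->kvm->mmu_lock);
                return 0;
        }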
358 358
359 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) 359 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
360 { 360 {
361 mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache, pte_chain_cache); 361 mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache, pte_chain_cache);
362 mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, rmap_desc_cache); 362 mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, rmap_desc_cache);
363 mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache); 363 mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
364 mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache, 364 mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
365 mmu_page_header_cache); 365 mmu_page_header_cache);
366 } 366 }
367 367
368 static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc, 368 static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
369 size_t size) 369 size_t size)
370 { 370 {
371 void *p; 371 void *p;
372 372
373 BUG_ON(!mc->nobjs); 373 BUG_ON(!mc->nobjs);
374 p = mc->objects[--mc->nobjs]; 374 p = mc->objects[--mc->nobjs];
375 return p; 375 return p;
376 } 376 }
377 377
378 static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu) 378 static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu)
379 { 379 {
380 return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache, 380 return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache,
381 sizeof(struct kvm_pte_chain)); 381 sizeof(struct kvm_pte_chain));
382 } 382 }
383 383
384 static void mmu_free_pte_chain(struct kvm_pte_chain *pc) 384 static void mmu_free_pte_chain(struct kvm_pte_chain *pc)
385 { 385 {
386 kmem_cache_free(pte_chain_cache, pc); 386 kmem_cache_free(pte_chain_cache, pc);
387 } 387 }
388 388
389 static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu) 389 static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
390 { 390 {
391 return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache, 391 return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache,
392 sizeof(struct kvm_rmap_desc)); 392 sizeof(struct kvm_rmap_desc));
393 } 393 }
394 394
395 static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd) 395 static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
396 { 396 {
397 kmem_cache_free(rmap_desc_cache, rd); 397 kmem_cache_free(rmap_desc_cache, rd);
398 } 398 }
399 399
400 static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index) 400 static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
401 { 401 {
402 if (!sp->role.direct) 402 if (!sp->role.direct)
403 return sp->gfns[index]; 403 return sp->gfns[index];
404 404
405 return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS)); 405 return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
406 } 406 }
407 407
408 static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn) 408 static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
409 { 409 {
410 if (sp->role.direct) 410 if (sp->role.direct)
411 BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index)); 411 BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index));
412 else 412 else
413 sp->gfns[index] = gfn; 413 sp->gfns[index] = gfn;
414 } 414 }
415 415
416 /* 416 /*
417 * Return the pointer to the largepage write count for a given 417 * Return the pointer to the largepage write count for a given
418 * gfn, handling slots that are not large page aligned. 418 * gfn, handling slots that are not large page aligned.
419 */ 419 */
420 static int *slot_largepage_idx(gfn_t gfn, 420 static int *slot_largepage_idx(gfn_t gfn,
421 struct kvm_memory_slot *slot, 421 struct kvm_memory_slot *slot,
422 int level) 422 int level)
423 { 423 {
424 unsigned long idx; 424 unsigned long idx;
425 425
426 idx = (gfn / KVM_PAGES_PER_HPAGE(level)) - 426 idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
427 (slot->base_gfn / KVM_PAGES_PER_HPAGE(level)); 427 (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
428 return &slot->lpage_info[level - 2][idx].write_count; 428 return &slot->lpage_info[level - 2][idx].write_count;
429 } 429 }
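A small worked example for the index calculation above (illustrative, assuming 4K base pages so KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) == 512):

        /* For a slot with base_gfn == 0x101 (not 2MB aligned) and gfn == 0x3ff:
         *   idx = (0x3ff / 512) - (0x101 / 512) = 1 - 0 = 1
         * i.e. the write_count of the second 2MB region covered by that slot. */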
430 430
431 static void account_shadowed(struct kvm *kvm, gfn_t gfn) 431 static void account_shadowed(struct kvm *kvm, gfn_t gfn)
432 { 432 {
433 struct kvm_memory_slot *slot; 433 struct kvm_memory_slot *slot;
434 int *write_count; 434 int *write_count;
435 int i; 435 int i;
436 436
437 gfn = unalias_gfn(kvm, gfn); 437 slot = gfn_to_memslot(kvm, gfn);
438
439 slot = gfn_to_memslot_unaliased(kvm, gfn);
440 for (i = PT_DIRECTORY_LEVEL; 438 for (i = PT_DIRECTORY_LEVEL;
441 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { 439 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
442 write_count = slot_largepage_idx(gfn, slot, i); 440 write_count = slot_largepage_idx(gfn, slot, i);
443 *write_count += 1; 441 *write_count += 1;
444 } 442 }
445 } 443 }
446 444
447 static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn) 445 static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
448 { 446 {
449 struct kvm_memory_slot *slot; 447 struct kvm_memory_slot *slot;
450 int *write_count; 448 int *write_count;
451 int i; 449 int i;
452 450
453 gfn = unalias_gfn(kvm, gfn); 451 slot = gfn_to_memslot(kvm, gfn);
454 slot = gfn_to_memslot_unaliased(kvm, gfn);
455 for (i = PT_DIRECTORY_LEVEL; 452 for (i = PT_DIRECTORY_LEVEL;
456 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { 453 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
457 write_count = slot_largepage_idx(gfn, slot, i); 454 write_count = slot_largepage_idx(gfn, slot, i);
458 *write_count -= 1; 455 *write_count -= 1;
459 WARN_ON(*write_count < 0); 456 WARN_ON(*write_count < 0);
460 } 457 }
461 } 458 }
462 459
463 static int has_wrprotected_page(struct kvm *kvm, 460 static int has_wrprotected_page(struct kvm *kvm,
464 gfn_t gfn, 461 gfn_t gfn,
465 int level) 462 int level)
466 { 463 {
467 struct kvm_memory_slot *slot; 464 struct kvm_memory_slot *slot;
468 int *largepage_idx; 465 int *largepage_idx;
469 466
470 gfn = unalias_gfn(kvm, gfn); 467 slot = gfn_to_memslot(kvm, gfn);
471 slot = gfn_to_memslot_unaliased(kvm, gfn);
472 if (slot) { 468 if (slot) {
473 largepage_idx = slot_largepage_idx(gfn, slot, level); 469 largepage_idx = slot_largepage_idx(gfn, slot, level);
474 return *largepage_idx; 470 return *largepage_idx;
475 } 471 }
476 472
477 return 1; 473 return 1;
478 } 474 }
479 475
480 static int host_mapping_level(struct kvm *kvm, gfn_t gfn) 476 static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
481 { 477 {
482 unsigned long page_size; 478 unsigned long page_size;
483 int i, ret = 0; 479 int i, ret = 0;
484 480
485 page_size = kvm_host_page_size(kvm, gfn); 481 page_size = kvm_host_page_size(kvm, gfn);
486 482
487 for (i = PT_PAGE_TABLE_LEVEL; 483 for (i = PT_PAGE_TABLE_LEVEL;
488 i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) { 484 i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
489 if (page_size >= KVM_HPAGE_SIZE(i)) 485 if (page_size >= KVM_HPAGE_SIZE(i))
490 ret = i; 486 ret = i;
491 else 487 else
492 break; 488 break;
493 } 489 }
494 490
495 return ret; 491 return ret;
496 } 492 }
497 493
498 static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn) 494 static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
499 { 495 {
500 struct kvm_memory_slot *slot; 496 struct kvm_memory_slot *slot;
501 int host_level, level, max_level; 497 int host_level, level, max_level;
502 498
503 slot = gfn_to_memslot(vcpu->kvm, large_gfn); 499 slot = gfn_to_memslot(vcpu->kvm, large_gfn);
504 if (slot && slot->dirty_bitmap) 500 if (slot && slot->dirty_bitmap)
505 return PT_PAGE_TABLE_LEVEL; 501 return PT_PAGE_TABLE_LEVEL;
506 502
507 host_level = host_mapping_level(vcpu->kvm, large_gfn); 503 host_level = host_mapping_level(vcpu->kvm, large_gfn);
508 504
509 if (host_level == PT_PAGE_TABLE_LEVEL) 505 if (host_level == PT_PAGE_TABLE_LEVEL)
510 return host_level; 506 return host_level;
511 507
512 max_level = kvm_x86_ops->get_lpage_level() < host_level ? 508 max_level = kvm_x86_ops->get_lpage_level() < host_level ?
513 kvm_x86_ops->get_lpage_level() : host_level; 509 kvm_x86_ops->get_lpage_level() : host_level;
514 510
515 for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level) 511 for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
516 if (has_wrprotected_page(vcpu->kvm, large_gfn, level)) 512 if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
517 break; 513 break;
518 514
519 return level - 1; 515 return level - 1;
520 } 516 }
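To make the level arithmetic concrete (a worked example, not from the source):

        /* Suppose host_mapping_level() reports a 1GB-backed gfn (level 3) but
         * kvm_x86_ops->get_lpage_level() is PT_DIRECTORY_LEVEL (2): max_level is 2.
         * If no 2MB frame containing large_gfn is write-protected, the loop runs
         * past level 2 and the function returns 2 (map with a 2MB spte); if one
         * is write-protected, it breaks at level 2 and returns
         * PT_PAGE_TABLE_LEVEL (1), i.e. a 4K mapping. */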
521 517
522 /* 518 /*
523 * Take gfn and return the reverse mapping to it. 519 * Take gfn and return the reverse mapping to it.
524 * Note: gfn must be unaliased before this function gets called
525 */ 520 */
526 521
527 static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level) 522 static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
528 { 523 {
529 struct kvm_memory_slot *slot; 524 struct kvm_memory_slot *slot;
530 unsigned long idx; 525 unsigned long idx;
531 526
532 slot = gfn_to_memslot(kvm, gfn); 527 slot = gfn_to_memslot(kvm, gfn);
533 if (likely(level == PT_PAGE_TABLE_LEVEL)) 528 if (likely(level == PT_PAGE_TABLE_LEVEL))
534 return &slot->rmap[gfn - slot->base_gfn]; 529 return &slot->rmap[gfn - slot->base_gfn];
535 530
536 idx = (gfn / KVM_PAGES_PER_HPAGE(level)) - 531 idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
537 (slot->base_gfn / KVM_PAGES_PER_HPAGE(level)); 532 (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
538 533
539 return &slot->lpage_info[level - 2][idx].rmap_pde; 534 return &slot->lpage_info[level - 2][idx].rmap_pde;
540 } 535 }
541 536
542 /* 537 /*
543 * Reverse mapping data structures: 538 * Reverse mapping data structures:
544 * 539 *
545 * If rmapp bit zero is zero, then rmapp points to the shadow page table entry 540 * If rmapp bit zero is zero, then rmapp points to the shadow page table entry
546 * that points to page_address(page). 541 * that points to page_address(page).
547 * 542 *
548 * If rmapp bit zero is one, then (rmapp & ~1) points to a struct kvm_rmap_desc 543 * If rmapp bit zero is one, then (rmapp & ~1) points to a struct kvm_rmap_desc
549 * containing more mappings. 544 * containing more mappings.
550 * 545 *
551 * Returns the number of rmap entries before the spte was added or zero if 546 * Returns the number of rmap entries before the spte was added or zero if
552 * the spte was not added. 547 * the spte was not added.
553 * 548 *
554 */ 549 */
555 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) 550 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
556 { 551 {
557 struct kvm_mmu_page *sp; 552 struct kvm_mmu_page *sp;
558 struct kvm_rmap_desc *desc; 553 struct kvm_rmap_desc *desc;
559 unsigned long *rmapp; 554 unsigned long *rmapp;
560 int i, count = 0; 555 int i, count = 0;
561 556
562 if (!is_rmap_spte(*spte)) 557 if (!is_rmap_spte(*spte))
563 return count; 558 return count;
564 gfn = unalias_gfn(vcpu->kvm, gfn);
565 sp = page_header(__pa(spte)); 559 sp = page_header(__pa(spte));
566 kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn); 560 kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
567 rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level); 561 rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
568 if (!*rmapp) { 562 if (!*rmapp) {
569 rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte); 563 rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
570 *rmapp = (unsigned long)spte; 564 *rmapp = (unsigned long)spte;
571 } else if (!(*rmapp & 1)) { 565 } else if (!(*rmapp & 1)) {
572 rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte); 566 rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte);
573 desc = mmu_alloc_rmap_desc(vcpu); 567 desc = mmu_alloc_rmap_desc(vcpu);
574 desc->sptes[0] = (u64 *)*rmapp; 568 desc->sptes[0] = (u64 *)*rmapp;
575 desc->sptes[1] = spte; 569 desc->sptes[1] = spte;
576 *rmapp = (unsigned long)desc | 1; 570 *rmapp = (unsigned long)desc | 1;
577 } else { 571 } else {
578 rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte); 572 rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte);
579 desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); 573 desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
580 while (desc->sptes[RMAP_EXT-1] && desc->more) { 574 while (desc->sptes[RMAP_EXT-1] && desc->more) {
581 desc = desc->more; 575 desc = desc->more;
582 count += RMAP_EXT; 576 count += RMAP_EXT;
583 } 577 }
584 if (desc->sptes[RMAP_EXT-1]) { 578 if (desc->sptes[RMAP_EXT-1]) {
585 desc->more = mmu_alloc_rmap_desc(vcpu); 579 desc->more = mmu_alloc_rmap_desc(vcpu);
586 desc = desc->more; 580 desc = desc->more;
587 } 581 }
588 for (i = 0; desc->sptes[i]; ++i) 582 for (i = 0; desc->sptes[i]; ++i)
589 ; 583 ;
590 desc->sptes[i] = spte; 584 desc->sptes[i] = spte;
591 } 585 }
592 return count; 586 return count;
593 } 587 }
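Restating the tagging scheme from the comment above rmap_add() as a small decode sketch (illustrative only; rmap_next() below performs this walk for real):

        /* Decoding an rmapp word:
         *   if (!*rmapp)             -> no sptes map this gfn
         *   else if (!(*rmapp & 1))  -> *rmapp is the single u64 *spte itself
         *   else                     -> (struct kvm_rmap_desc *)(*rmapp & ~1ul)
         *                               heads a chain holding up to RMAP_EXT
         *                               sptes per descriptor. */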
594 588
595 static void rmap_desc_remove_entry(unsigned long *rmapp, 589 static void rmap_desc_remove_entry(unsigned long *rmapp,
596 struct kvm_rmap_desc *desc, 590 struct kvm_rmap_desc *desc,
597 int i, 591 int i,
598 struct kvm_rmap_desc *prev_desc) 592 struct kvm_rmap_desc *prev_desc)
599 { 593 {
600 int j; 594 int j;
601 595
602 for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j) 596 for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j)
603 ; 597 ;
604 desc->sptes[i] = desc->sptes[j]; 598 desc->sptes[i] = desc->sptes[j];
605 desc->sptes[j] = NULL; 599 desc->sptes[j] = NULL;
606 if (j != 0) 600 if (j != 0)
607 return; 601 return;
608 if (!prev_desc && !desc->more) 602 if (!prev_desc && !desc->more)
609 *rmapp = (unsigned long)desc->sptes[0]; 603 *rmapp = (unsigned long)desc->sptes[0];
610 else 604 else
611 if (prev_desc) 605 if (prev_desc)
612 prev_desc->more = desc->more; 606 prev_desc->more = desc->more;
613 else 607 else
614 *rmapp = (unsigned long)desc->more | 1; 608 *rmapp = (unsigned long)desc->more | 1;
615 mmu_free_rmap_desc(desc); 609 mmu_free_rmap_desc(desc);
616 } 610 }
617 611
618 static void rmap_remove(struct kvm *kvm, u64 *spte) 612 static void rmap_remove(struct kvm *kvm, u64 *spte)
619 { 613 {
620 struct kvm_rmap_desc *desc; 614 struct kvm_rmap_desc *desc;
621 struct kvm_rmap_desc *prev_desc; 615 struct kvm_rmap_desc *prev_desc;
622 struct kvm_mmu_page *sp; 616 struct kvm_mmu_page *sp;
623 pfn_t pfn; 617 pfn_t pfn;
624 gfn_t gfn; 618 gfn_t gfn;
625 unsigned long *rmapp; 619 unsigned long *rmapp;
626 int i; 620 int i;
627 621
628 if (!is_rmap_spte(*spte)) 622 if (!is_rmap_spte(*spte))
629 return; 623 return;
630 sp = page_header(__pa(spte)); 624 sp = page_header(__pa(spte));
631 pfn = spte_to_pfn(*spte); 625 pfn = spte_to_pfn(*spte);
632 if (*spte & shadow_accessed_mask) 626 if (*spte & shadow_accessed_mask)
633 kvm_set_pfn_accessed(pfn); 627 kvm_set_pfn_accessed(pfn);
634 if (is_writable_pte(*spte)) 628 if (is_writable_pte(*spte))
635 kvm_set_pfn_dirty(pfn); 629 kvm_set_pfn_dirty(pfn);
636 gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt); 630 gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
637 rmapp = gfn_to_rmap(kvm, gfn, sp->role.level); 631 rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
638 if (!*rmapp) { 632 if (!*rmapp) {
639 printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte); 633 printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
640 BUG(); 634 BUG();
641 } else if (!(*rmapp & 1)) { 635 } else if (!(*rmapp & 1)) {
642 rmap_printk("rmap_remove: %p %llx 1->0\n", spte, *spte); 636 rmap_printk("rmap_remove: %p %llx 1->0\n", spte, *spte);
643 if ((u64 *)*rmapp != spte) { 637 if ((u64 *)*rmapp != spte) {
644 printk(KERN_ERR "rmap_remove: %p %llx 1->BUG\n", 638 printk(KERN_ERR "rmap_remove: %p %llx 1->BUG\n",
645 spte, *spte); 639 spte, *spte);
646 BUG(); 640 BUG();
647 } 641 }
648 *rmapp = 0; 642 *rmapp = 0;
649 } else { 643 } else {
650 rmap_printk("rmap_remove: %p %llx many->many\n", spte, *spte); 644 rmap_printk("rmap_remove: %p %llx many->many\n", spte, *spte);
651 desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); 645 desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
652 prev_desc = NULL; 646 prev_desc = NULL;
653 while (desc) { 647 while (desc) {
654 for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) 648 for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i)
655 if (desc->sptes[i] == spte) { 649 if (desc->sptes[i] == spte) {
656 rmap_desc_remove_entry(rmapp, 650 rmap_desc_remove_entry(rmapp,
657 desc, i, 651 desc, i,
658 prev_desc); 652 prev_desc);
659 return; 653 return;
660 } 654 }
661 prev_desc = desc; 655 prev_desc = desc;
662 desc = desc->more; 656 desc = desc->more;
663 } 657 }
664 pr_err("rmap_remove: %p %llx many->many\n", spte, *spte); 658 pr_err("rmap_remove: %p %llx many->many\n", spte, *spte);
665 BUG(); 659 BUG();
666 } 660 }
667 } 661 }
668 662
669 static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) 663 static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
670 { 664 {
671 struct kvm_rmap_desc *desc; 665 struct kvm_rmap_desc *desc;
672 u64 *prev_spte; 666 u64 *prev_spte;
673 int i; 667 int i;
674 668
675 if (!*rmapp) 669 if (!*rmapp)
676 return NULL; 670 return NULL;
677 else if (!(*rmapp & 1)) { 671 else if (!(*rmapp & 1)) {
678 if (!spte) 672 if (!spte)
679 return (u64 *)*rmapp; 673 return (u64 *)*rmapp;
680 return NULL; 674 return NULL;
681 } 675 }
682 desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); 676 desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
683 prev_spte = NULL; 677 prev_spte = NULL;
684 while (desc) { 678 while (desc) {
685 for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) { 679 for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) {
686 if (prev_spte == spte) 680 if (prev_spte == spte)
687 return desc->sptes[i]; 681 return desc->sptes[i];
688 prev_spte = desc->sptes[i]; 682 prev_spte = desc->sptes[i];
689 } 683 }
690 desc = desc->more; 684 desc = desc->more;
691 } 685 }
692 return NULL; 686 return NULL;
693 } 687 }
694 688
695 static int rmap_write_protect(struct kvm *kvm, u64 gfn) 689 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
696 { 690 {
697 unsigned long *rmapp; 691 unsigned long *rmapp;
698 u64 *spte; 692 u64 *spte;
699 int i, write_protected = 0; 693 int i, write_protected = 0;
700 694
701 gfn = unalias_gfn(kvm, gfn);
702 rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL); 695 rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL);
703 696
704 spte = rmap_next(kvm, rmapp, NULL); 697 spte = rmap_next(kvm, rmapp, NULL);
705 while (spte) { 698 while (spte) {
706 BUG_ON(!spte); 699 BUG_ON(!spte);
707 BUG_ON(!(*spte & PT_PRESENT_MASK)); 700 BUG_ON(!(*spte & PT_PRESENT_MASK));
708 rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte); 701 rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
709 if (is_writable_pte(*spte)) { 702 if (is_writable_pte(*spte)) {
710 __set_spte(spte, *spte & ~PT_WRITABLE_MASK); 703 __set_spte(spte, *spte & ~PT_WRITABLE_MASK);
711 write_protected = 1; 704 write_protected = 1;
712 } 705 }
713 spte = rmap_next(kvm, rmapp, spte); 706 spte = rmap_next(kvm, rmapp, spte);
714 } 707 }
715 if (write_protected) { 708 if (write_protected) {
716 pfn_t pfn; 709 pfn_t pfn;
717 710
718 spte = rmap_next(kvm, rmapp, NULL); 711 spte = rmap_next(kvm, rmapp, NULL);
719 pfn = spte_to_pfn(*spte); 712 pfn = spte_to_pfn(*spte);
720 kvm_set_pfn_dirty(pfn); 713 kvm_set_pfn_dirty(pfn);
721 } 714 }
722 715
723 /* check for huge page mappings */ 716 /* check for huge page mappings */
724 for (i = PT_DIRECTORY_LEVEL; 717 for (i = PT_DIRECTORY_LEVEL;
725 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { 718 i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
726 rmapp = gfn_to_rmap(kvm, gfn, i); 719 rmapp = gfn_to_rmap(kvm, gfn, i);
727 spte = rmap_next(kvm, rmapp, NULL); 720 spte = rmap_next(kvm, rmapp, NULL);
728 while (spte) { 721 while (spte) {
729 BUG_ON(!spte); 722 BUG_ON(!spte);
730 BUG_ON(!(*spte & PT_PRESENT_MASK)); 723 BUG_ON(!(*spte & PT_PRESENT_MASK));
731 BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)); 724 BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
732 pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn); 725 pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
733 if (is_writable_pte(*spte)) { 726 if (is_writable_pte(*spte)) {
734 rmap_remove(kvm, spte); 727 rmap_remove(kvm, spte);
735 --kvm->stat.lpages; 728 --kvm->stat.lpages;
736 __set_spte(spte, shadow_trap_nonpresent_pte); 729 __set_spte(spte, shadow_trap_nonpresent_pte);
737 spte = NULL; 730 spte = NULL;
738 write_protected = 1; 731 write_protected = 1;
739 } 732 }
740 spte = rmap_next(kvm, rmapp, spte); 733 spte = rmap_next(kvm, rmapp, spte);
741 } 734 }
742 } 735 }
743 736
744 return write_protected; 737 return write_protected;
745 } 738 }
746 739
747 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, 740 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
748 unsigned long data) 741 unsigned long data)
749 { 742 {
750 u64 *spte; 743 u64 *spte;
751 int need_tlb_flush = 0; 744 int need_tlb_flush = 0;
752 745
753 while ((spte = rmap_next(kvm, rmapp, NULL))) { 746 while ((spte = rmap_next(kvm, rmapp, NULL))) {
754 BUG_ON(!(*spte & PT_PRESENT_MASK)); 747 BUG_ON(!(*spte & PT_PRESENT_MASK));
755 rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); 748 rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
756 rmap_remove(kvm, spte); 749 rmap_remove(kvm, spte);
757 __set_spte(spte, shadow_trap_nonpresent_pte); 750 __set_spte(spte, shadow_trap_nonpresent_pte);
758 need_tlb_flush = 1; 751 need_tlb_flush = 1;
759 } 752 }
760 return need_tlb_flush; 753 return need_tlb_flush;
761 } 754 }
762 755
763 static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, 756 static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
764 unsigned long data) 757 unsigned long data)
765 { 758 {
766 int need_flush = 0; 759 int need_flush = 0;
767 u64 *spte, new_spte; 760 u64 *spte, new_spte;
768 pte_t *ptep = (pte_t *)data; 761 pte_t *ptep = (pte_t *)data;
769 pfn_t new_pfn; 762 pfn_t new_pfn;
770 763
771 WARN_ON(pte_huge(*ptep)); 764 WARN_ON(pte_huge(*ptep));
772 new_pfn = pte_pfn(*ptep); 765 new_pfn = pte_pfn(*ptep);
773 spte = rmap_next(kvm, rmapp, NULL); 766 spte = rmap_next(kvm, rmapp, NULL);
774 while (spte) { 767 while (spte) {
775 BUG_ON(!is_shadow_present_pte(*spte)); 768 BUG_ON(!is_shadow_present_pte(*spte));
776 rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte); 769 rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
777 need_flush = 1; 770 need_flush = 1;
778 if (pte_write(*ptep)) { 771 if (pte_write(*ptep)) {
779 rmap_remove(kvm, spte); 772 rmap_remove(kvm, spte);
780 __set_spte(spte, shadow_trap_nonpresent_pte); 773 __set_spte(spte, shadow_trap_nonpresent_pte);
781 spte = rmap_next(kvm, rmapp, NULL); 774 spte = rmap_next(kvm, rmapp, NULL);
782 } else { 775 } else {
783 new_spte = *spte &~ (PT64_BASE_ADDR_MASK); 776 new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
784 new_spte |= (u64)new_pfn << PAGE_SHIFT; 777 new_spte |= (u64)new_pfn << PAGE_SHIFT;
785 778
786 new_spte &= ~PT_WRITABLE_MASK; 779 new_spte &= ~PT_WRITABLE_MASK;
787 new_spte &= ~SPTE_HOST_WRITEABLE; 780 new_spte &= ~SPTE_HOST_WRITEABLE;
788 if (is_writable_pte(*spte)) 781 if (is_writable_pte(*spte))
789 kvm_set_pfn_dirty(spte_to_pfn(*spte)); 782 kvm_set_pfn_dirty(spte_to_pfn(*spte));
790 __set_spte(spte, new_spte); 783 __set_spte(spte, new_spte);
791 spte = rmap_next(kvm, rmapp, spte); 784 spte = rmap_next(kvm, rmapp, spte);
792 } 785 }
793 } 786 }
794 if (need_flush) 787 if (need_flush)
795 kvm_flush_remote_tlbs(kvm); 788 kvm_flush_remote_tlbs(kvm);
796 789
797 return 0; 790 return 0;
798 } 791 }
799 792
800 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, 793 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
801 unsigned long data, 794 unsigned long data,
802 int (*handler)(struct kvm *kvm, unsigned long *rmapp, 795 int (*handler)(struct kvm *kvm, unsigned long *rmapp,
803 unsigned long data)) 796 unsigned long data))
804 { 797 {
805 int i, j; 798 int i, j;
806 int ret; 799 int ret;
807 int retval = 0; 800 int retval = 0;
808 struct kvm_memslots *slots; 801 struct kvm_memslots *slots;
809 802
810 slots = kvm_memslots(kvm); 803 slots = kvm_memslots(kvm);
811 804
812 for (i = 0; i < slots->nmemslots; i++) { 805 for (i = 0; i < slots->nmemslots; i++) {
813 struct kvm_memory_slot *memslot = &slots->memslots[i]; 806 struct kvm_memory_slot *memslot = &slots->memslots[i];
814 unsigned long start = memslot->userspace_addr; 807 unsigned long start = memslot->userspace_addr;
815 unsigned long end; 808 unsigned long end;
816 809
817 end = start + (memslot->npages << PAGE_SHIFT); 810 end = start + (memslot->npages << PAGE_SHIFT);
818 if (hva >= start && hva < end) { 811 if (hva >= start && hva < end) {
819 gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; 812 gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
820 813
821 ret = handler(kvm, &memslot->rmap[gfn_offset], data); 814 ret = handler(kvm, &memslot->rmap[gfn_offset], data);
822 815
823 for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) { 816 for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
824 int idx = gfn_offset; 817 int idx = gfn_offset;
825 idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j); 818 idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
826 ret |= handler(kvm, 819 ret |= handler(kvm,
827 &memslot->lpage_info[j][idx].rmap_pde, 820 &memslot->lpage_info[j][idx].rmap_pde,
828 data); 821 data);
829 } 822 }
830 trace_kvm_age_page(hva, memslot, ret); 823 trace_kvm_age_page(hva, memslot, ret);
831 retval |= ret; 824 retval |= ret;
832 } 825 }
833 } 826 }
834 827
835 return retval; 828 return retval;
836 } 829 }
837 830
838 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva) 831 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
839 { 832 {
840 return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp); 833 return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
841 } 834 }
842 835
843 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) 836 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
844 { 837 {
845 kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp); 838 kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
846 } 839 }
847 840
848 static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, 841 static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
849 unsigned long data) 842 unsigned long data)
850 { 843 {
851 u64 *spte; 844 u64 *spte;
852 int young = 0; 845 int young = 0;
853 846
854 /* 847 /*
855 * Emulate the accessed bit for EPT, by checking if this page has 848 * Emulate the accessed bit for EPT, by checking if this page has
856 * an EPT mapping, and clearing it if it does. On the next access, 849 * an EPT mapping, and clearing it if it does. On the next access,
857 * a new EPT mapping will be established. 850 * a new EPT mapping will be established.
858 * This has some overhead, but not as much as the cost of swapping 851 * This has some overhead, but not as much as the cost of swapping
859 * out actively used pages or breaking up actively used hugepages. 852 * out actively used pages or breaking up actively used hugepages.
860 */ 853 */
861 if (!shadow_accessed_mask) 854 if (!shadow_accessed_mask)
862 return kvm_unmap_rmapp(kvm, rmapp, data); 855 return kvm_unmap_rmapp(kvm, rmapp, data);
863 856
864 spte = rmap_next(kvm, rmapp, NULL); 857 spte = rmap_next(kvm, rmapp, NULL);
865 while (spte) { 858 while (spte) {
866 int _young; 859 int _young;
867 u64 _spte = *spte; 860 u64 _spte = *spte;
868 BUG_ON(!(_spte & PT_PRESENT_MASK)); 861 BUG_ON(!(_spte & PT_PRESENT_MASK));
869 _young = _spte & PT_ACCESSED_MASK; 862 _young = _spte & PT_ACCESSED_MASK;
870 if (_young) { 863 if (_young) {
871 young = 1; 864 young = 1;
872 clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte); 865 clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
873 } 866 }
874 spte = rmap_next(kvm, rmapp, spte); 867 spte = rmap_next(kvm, rmapp, spte);
875 } 868 }
876 return young; 869 return young;
877 } 870 }
878 871
879 #define RMAP_RECYCLE_THRESHOLD 1000 872 #define RMAP_RECYCLE_THRESHOLD 1000
880 873
881 static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) 874 static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
882 { 875 {
883 unsigned long *rmapp; 876 unsigned long *rmapp;
884 struct kvm_mmu_page *sp; 877 struct kvm_mmu_page *sp;
885 878
886 sp = page_header(__pa(spte)); 879 sp = page_header(__pa(spte));
887 880
888 gfn = unalias_gfn(vcpu->kvm, gfn);
889 rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level); 881 rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
890 882
891 kvm_unmap_rmapp(vcpu->kvm, rmapp, 0); 883 kvm_unmap_rmapp(vcpu->kvm, rmapp, 0);
892 kvm_flush_remote_tlbs(vcpu->kvm); 884 kvm_flush_remote_tlbs(vcpu->kvm);
893 } 885 }
894 886
895 int kvm_age_hva(struct kvm *kvm, unsigned long hva) 887 int kvm_age_hva(struct kvm *kvm, unsigned long hva)
896 { 888 {
897 return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp); 889 return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp);
898 } 890 }
899 891
900 #ifdef MMU_DEBUG 892 #ifdef MMU_DEBUG
901 static int is_empty_shadow_page(u64 *spt) 893 static int is_empty_shadow_page(u64 *spt)
902 { 894 {
903 u64 *pos; 895 u64 *pos;
904 u64 *end; 896 u64 *end;
905 897
906 for (pos = spt, end = pos + PAGE_SIZE / sizeof(u64); pos != end; pos++) 898 for (pos = spt, end = pos + PAGE_SIZE / sizeof(u64); pos != end; pos++)
907 if (is_shadow_present_pte(*pos)) { 899 if (is_shadow_present_pte(*pos)) {
908 printk(KERN_ERR "%s: %p %llx\n", __func__, 900 printk(KERN_ERR "%s: %p %llx\n", __func__,
909 pos, *pos); 901 pos, *pos);
910 return 0; 902 return 0;
911 } 903 }
912 return 1; 904 return 1;
913 } 905 }
914 #endif 906 #endif
915 907
916 static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) 908 static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
917 { 909 {
918 ASSERT(is_empty_shadow_page(sp->spt)); 910 ASSERT(is_empty_shadow_page(sp->spt));
919 hlist_del(&sp->hash_link); 911 hlist_del(&sp->hash_link);
920 list_del(&sp->link); 912 list_del(&sp->link);
921 __free_page(virt_to_page(sp->spt)); 913 __free_page(virt_to_page(sp->spt));
922 if (!sp->role.direct) 914 if (!sp->role.direct)
923 __free_page(virt_to_page(sp->gfns)); 915 __free_page(virt_to_page(sp->gfns));
924 kmem_cache_free(mmu_page_header_cache, sp); 916 kmem_cache_free(mmu_page_header_cache, sp);
925 ++kvm->arch.n_free_mmu_pages; 917 ++kvm->arch.n_free_mmu_pages;
926 } 918 }
927 919
928 static unsigned kvm_page_table_hashfn(gfn_t gfn) 920 static unsigned kvm_page_table_hashfn(gfn_t gfn)
929 { 921 {
930 return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1); 922 return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
931 } 923 }
932 924
933 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, 925 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
934 u64 *parent_pte, int direct) 926 u64 *parent_pte, int direct)
935 { 927 {
936 struct kvm_mmu_page *sp; 928 struct kvm_mmu_page *sp;
937 929
938 sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp); 930 sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
939 sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE); 931 sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
940 if (!direct) 932 if (!direct)
941 sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, 933 sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
942 PAGE_SIZE); 934 PAGE_SIZE);
943 set_page_private(virt_to_page(sp->spt), (unsigned long)sp); 935 set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
944 list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); 936 list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
945 bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); 937 bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
946 sp->multimapped = 0; 938 sp->multimapped = 0;
947 sp->parent_pte = parent_pte; 939 sp->parent_pte = parent_pte;
948 --vcpu->kvm->arch.n_free_mmu_pages; 940 --vcpu->kvm->arch.n_free_mmu_pages;
949 return sp; 941 return sp;
950 } 942 }
951 943
952 static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu, 944 static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
953 struct kvm_mmu_page *sp, u64 *parent_pte) 945 struct kvm_mmu_page *sp, u64 *parent_pte)
954 { 946 {
955 struct kvm_pte_chain *pte_chain; 947 struct kvm_pte_chain *pte_chain;
956 struct hlist_node *node; 948 struct hlist_node *node;
957 int i; 949 int i;
958 950
959 if (!parent_pte) 951 if (!parent_pte)
960 return; 952 return;
961 if (!sp->multimapped) { 953 if (!sp->multimapped) {
962 u64 *old = sp->parent_pte; 954 u64 *old = sp->parent_pte;
963 955
964 if (!old) { 956 if (!old) {
965 sp->parent_pte = parent_pte; 957 sp->parent_pte = parent_pte;
966 return; 958 return;
967 } 959 }
968 sp->multimapped = 1; 960 sp->multimapped = 1;
969 pte_chain = mmu_alloc_pte_chain(vcpu); 961 pte_chain = mmu_alloc_pte_chain(vcpu);
970 INIT_HLIST_HEAD(&sp->parent_ptes); 962 INIT_HLIST_HEAD(&sp->parent_ptes);
971 hlist_add_head(&pte_chain->link, &sp->parent_ptes); 963 hlist_add_head(&pte_chain->link, &sp->parent_ptes);
972 pte_chain->parent_ptes[0] = old; 964 pte_chain->parent_ptes[0] = old;
973 } 965 }
974 hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) { 966 hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) {
975 if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1]) 967 if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1])
976 continue; 968 continue;
977 for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) 969 for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i)
978 if (!pte_chain->parent_ptes[i]) { 970 if (!pte_chain->parent_ptes[i]) {
979 pte_chain->parent_ptes[i] = parent_pte; 971 pte_chain->parent_ptes[i] = parent_pte;
980 return; 972 return;
981 } 973 }
982 } 974 }
983 pte_chain = mmu_alloc_pte_chain(vcpu); 975 pte_chain = mmu_alloc_pte_chain(vcpu);
984 BUG_ON(!pte_chain); 976 BUG_ON(!pte_chain);
985 hlist_add_head(&pte_chain->link, &sp->parent_ptes); 977 hlist_add_head(&pte_chain->link, &sp->parent_ptes);
986 pte_chain->parent_ptes[0] = parent_pte; 978 pte_chain->parent_ptes[0] = parent_pte;
987 } 979 }
988 980
989 static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp, 981 static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
990 u64 *parent_pte) 982 u64 *parent_pte)
991 { 983 {
992 struct kvm_pte_chain *pte_chain; 984 struct kvm_pte_chain *pte_chain;
993 struct hlist_node *node; 985 struct hlist_node *node;
994 int i; 986 int i;
995 987
996 if (!sp->multimapped) { 988 if (!sp->multimapped) {
997 BUG_ON(sp->parent_pte != parent_pte); 989 BUG_ON(sp->parent_pte != parent_pte);
998 sp->parent_pte = NULL; 990 sp->parent_pte = NULL;
999 return; 991 return;
1000 } 992 }
1001 hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) 993 hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
1002 for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) { 994 for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
1003 if (!pte_chain->parent_ptes[i]) 995 if (!pte_chain->parent_ptes[i])
1004 break; 996 break;
1005 if (pte_chain->parent_ptes[i] != parent_pte) 997 if (pte_chain->parent_ptes[i] != parent_pte)
1006 continue; 998 continue;
1007 while (i + 1 < NR_PTE_CHAIN_ENTRIES 999 while (i + 1 < NR_PTE_CHAIN_ENTRIES
1008 && pte_chain->parent_ptes[i + 1]) { 1000 && pte_chain->parent_ptes[i + 1]) {
1009 pte_chain->parent_ptes[i] 1001 pte_chain->parent_ptes[i]
1010 = pte_chain->parent_ptes[i + 1]; 1002 = pte_chain->parent_ptes[i + 1];
1011 ++i; 1003 ++i;
1012 } 1004 }
1013 pte_chain->parent_ptes[i] = NULL; 1005 pte_chain->parent_ptes[i] = NULL;
1014 if (i == 0) { 1006 if (i == 0) {
1015 hlist_del(&pte_chain->link); 1007 hlist_del(&pte_chain->link);
1016 mmu_free_pte_chain(pte_chain); 1008 mmu_free_pte_chain(pte_chain);
1017 if (hlist_empty(&sp->parent_ptes)) { 1009 if (hlist_empty(&sp->parent_ptes)) {
1018 sp->multimapped = 0; 1010 sp->multimapped = 0;
1019 sp->parent_pte = NULL; 1011 sp->parent_pte = NULL;
1020 } 1012 }
1021 } 1013 }
1022 return; 1014 return;
1023 } 1015 }
1024 BUG(); 1016 BUG();
1025 } 1017 }
1026 1018
1027 static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn) 1019 static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn)
1028 { 1020 {
1029 struct kvm_pte_chain *pte_chain; 1021 struct kvm_pte_chain *pte_chain;
1030 struct hlist_node *node; 1022 struct hlist_node *node;
1031 struct kvm_mmu_page *parent_sp; 1023 struct kvm_mmu_page *parent_sp;
1032 int i; 1024 int i;
1033 1025
1034 if (!sp->multimapped && sp->parent_pte) { 1026 if (!sp->multimapped && sp->parent_pte) {
1035 parent_sp = page_header(__pa(sp->parent_pte)); 1027 parent_sp = page_header(__pa(sp->parent_pte));
1036 fn(parent_sp, sp->parent_pte); 1028 fn(parent_sp, sp->parent_pte);
1037 return; 1029 return;
1038 } 1030 }
1039 1031
1040 hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) 1032 hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
1041 for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) { 1033 for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
1042 u64 *spte = pte_chain->parent_ptes[i]; 1034 u64 *spte = pte_chain->parent_ptes[i];
1043 1035
1044 if (!spte) 1036 if (!spte)
1045 break; 1037 break;
1046 parent_sp = page_header(__pa(spte)); 1038 parent_sp = page_header(__pa(spte));
1047 fn(parent_sp, spte); 1039 fn(parent_sp, spte);
1048 } 1040 }
1049 } 1041 }
1050 1042
1051 static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte); 1043 static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte);
1052 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp) 1044 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
1053 { 1045 {
1054 mmu_parent_walk(sp, mark_unsync); 1046 mmu_parent_walk(sp, mark_unsync);
1055 } 1047 }
1056 1048
1057 static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte) 1049 static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte)
1058 { 1050 {
1059 unsigned int index; 1051 unsigned int index;
1060 1052
1061 index = spte - sp->spt; 1053 index = spte - sp->spt;
1062 if (__test_and_set_bit(index, sp->unsync_child_bitmap)) 1054 if (__test_and_set_bit(index, sp->unsync_child_bitmap))
1063 return; 1055 return;
1064 if (sp->unsync_children++) 1056 if (sp->unsync_children++)
1065 return; 1057 return;
1066 kvm_mmu_mark_parents_unsync(sp); 1058 kvm_mmu_mark_parents_unsync(sp);
1067 } 1059 }
1068 1060
1069 static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu, 1061 static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu,
1070 struct kvm_mmu_page *sp) 1062 struct kvm_mmu_page *sp)
1071 { 1063 {
1072 int i; 1064 int i;
1073 1065
1074 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) 1066 for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
1075 sp->spt[i] = shadow_trap_nonpresent_pte; 1067 sp->spt[i] = shadow_trap_nonpresent_pte;
1076 } 1068 }
1077 1069
1078 static int nonpaging_sync_page(struct kvm_vcpu *vcpu, 1070 static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
1079 struct kvm_mmu_page *sp, bool clear_unsync) 1071 struct kvm_mmu_page *sp, bool clear_unsync)
1080 { 1072 {
1081 return 1; 1073 return 1;
1082 } 1074 }
1083 1075
1084 static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva) 1076 static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
1085 { 1077 {
1086 } 1078 }
1087 1079
1088 #define KVM_PAGE_ARRAY_NR 16 1080 #define KVM_PAGE_ARRAY_NR 16
1089 1081
1090 struct kvm_mmu_pages { 1082 struct kvm_mmu_pages {
1091 struct mmu_page_and_offset { 1083 struct mmu_page_and_offset {
1092 struct kvm_mmu_page *sp; 1084 struct kvm_mmu_page *sp;
1093 unsigned int idx; 1085 unsigned int idx;
1094 } page[KVM_PAGE_ARRAY_NR]; 1086 } page[KVM_PAGE_ARRAY_NR];
1095 unsigned int nr; 1087 unsigned int nr;
1096 }; 1088 };
1097 1089
1098 #define for_each_unsync_children(bitmap, idx) \ 1090 #define for_each_unsync_children(bitmap, idx) \
1099 for (idx = find_first_bit(bitmap, 512); \ 1091 for (idx = find_first_bit(bitmap, 512); \
1100 idx < 512; \ 1092 idx < 512; \
1101 idx = find_next_bit(bitmap, 512, idx+1)) 1093 idx = find_next_bit(bitmap, 512, idx+1))
1102 1094
1103 static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, 1095 static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
1104 int idx) 1096 int idx)
1105 { 1097 {
1106 int i; 1098 int i;
1107 1099
1108 if (sp->unsync) 1100 if (sp->unsync)
1109 for (i=0; i < pvec->nr; i++) 1101 for (i=0; i < pvec->nr; i++)
1110 if (pvec->page[i].sp == sp) 1102 if (pvec->page[i].sp == sp)
1111 return 0; 1103 return 0;
1112 1104
1113 pvec->page[pvec->nr].sp = sp; 1105 pvec->page[pvec->nr].sp = sp;
1114 pvec->page[pvec->nr].idx = idx; 1106 pvec->page[pvec->nr].idx = idx;
1115 pvec->nr++; 1107 pvec->nr++;
1116 return (pvec->nr == KVM_PAGE_ARRAY_NR); 1108 return (pvec->nr == KVM_PAGE_ARRAY_NR);
1117 } 1109 }
1118 1110
1119 static int __mmu_unsync_walk(struct kvm_mmu_page *sp, 1111 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
1120 struct kvm_mmu_pages *pvec) 1112 struct kvm_mmu_pages *pvec)
1121 { 1113 {
1122 int i, ret, nr_unsync_leaf = 0; 1114 int i, ret, nr_unsync_leaf = 0;
1123 1115
1124 for_each_unsync_children(sp->unsync_child_bitmap, i) { 1116 for_each_unsync_children(sp->unsync_child_bitmap, i) {
1125 struct kvm_mmu_page *child; 1117 struct kvm_mmu_page *child;
1126 u64 ent = sp->spt[i]; 1118 u64 ent = sp->spt[i];
1127 1119
1128 if (!is_shadow_present_pte(ent) || is_large_pte(ent)) 1120 if (!is_shadow_present_pte(ent) || is_large_pte(ent))
1129 goto clear_child_bitmap; 1121 goto clear_child_bitmap;
1130 1122
1131 child = page_header(ent & PT64_BASE_ADDR_MASK); 1123 child = page_header(ent & PT64_BASE_ADDR_MASK);
1132 1124
1133 if (child->unsync_children) { 1125 if (child->unsync_children) {
1134 if (mmu_pages_add(pvec, child, i)) 1126 if (mmu_pages_add(pvec, child, i))
1135 return -ENOSPC; 1127 return -ENOSPC;
1136 1128
1137 ret = __mmu_unsync_walk(child, pvec); 1129 ret = __mmu_unsync_walk(child, pvec);
1138 if (!ret) 1130 if (!ret)
1139 goto clear_child_bitmap; 1131 goto clear_child_bitmap;
1140 else if (ret > 0) 1132 else if (ret > 0)
1141 nr_unsync_leaf += ret; 1133 nr_unsync_leaf += ret;
1142 else 1134 else
1143 return ret; 1135 return ret;
1144 } else if (child->unsync) { 1136 } else if (child->unsync) {
1145 nr_unsync_leaf++; 1137 nr_unsync_leaf++;
1146 if (mmu_pages_add(pvec, child, i)) 1138 if (mmu_pages_add(pvec, child, i))
1147 return -ENOSPC; 1139 return -ENOSPC;
1148 } else 1140 } else
1149 goto clear_child_bitmap; 1141 goto clear_child_bitmap;
1150 1142
1151 continue; 1143 continue;
1152 1144
1153 clear_child_bitmap: 1145 clear_child_bitmap:
1154 __clear_bit(i, sp->unsync_child_bitmap); 1146 __clear_bit(i, sp->unsync_child_bitmap);
1155 sp->unsync_children--; 1147 sp->unsync_children--;
1156 WARN_ON((int)sp->unsync_children < 0); 1148 WARN_ON((int)sp->unsync_children < 0);
1157 } 1149 }
1158 1150
1159 1151
1160 return nr_unsync_leaf; 1152 return nr_unsync_leaf;
1161 } 1153 }
1162 1154
1163 static int mmu_unsync_walk(struct kvm_mmu_page *sp, 1155 static int mmu_unsync_walk(struct kvm_mmu_page *sp,
1164 struct kvm_mmu_pages *pvec) 1156 struct kvm_mmu_pages *pvec)
1165 { 1157 {
1166 if (!sp->unsync_children) 1158 if (!sp->unsync_children)
1167 return 0; 1159 return 0;
1168 1160
1169 mmu_pages_add(pvec, sp, 0); 1161 mmu_pages_add(pvec, sp, 0);
1170 return __mmu_unsync_walk(sp, pvec); 1162 return __mmu_unsync_walk(sp, pvec);
1171 } 1163 }
1172 1164
1173 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp) 1165 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
1174 { 1166 {
1175 WARN_ON(!sp->unsync); 1167 WARN_ON(!sp->unsync);
1176 trace_kvm_mmu_sync_page(sp); 1168 trace_kvm_mmu_sync_page(sp);
1177 sp->unsync = 0; 1169 sp->unsync = 0;
1178 --kvm->stat.mmu_unsync; 1170 --kvm->stat.mmu_unsync;
1179 } 1171 }
1180 1172
1181 static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, 1173 static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
1182 struct list_head *invalid_list); 1174 struct list_head *invalid_list);
1183 static void kvm_mmu_commit_zap_page(struct kvm *kvm, 1175 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
1184 struct list_head *invalid_list); 1176 struct list_head *invalid_list);
1185 1177
1186 #define for_each_gfn_sp(kvm, sp, gfn, pos) \ 1178 #define for_each_gfn_sp(kvm, sp, gfn, pos) \
1187 hlist_for_each_entry(sp, pos, \ 1179 hlist_for_each_entry(sp, pos, \
1188 &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \ 1180 &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
1189 if ((sp)->gfn != (gfn)) {} else 1181 if ((sp)->gfn != (gfn)) {} else
1190 1182
1191 #define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos) \ 1183 #define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos) \
1192 hlist_for_each_entry(sp, pos, \ 1184 hlist_for_each_entry(sp, pos, \
1193 &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \ 1185 &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
1194 if ((sp)->gfn != (gfn) || (sp)->role.direct || \ 1186 if ((sp)->gfn != (gfn) || (sp)->role.direct || \
1195 (sp)->role.invalid) {} else 1187 (sp)->role.invalid) {} else
1196 1188
1197 /* @sp->gfn should be write-protected at the call site */ 1189 /* @sp->gfn should be write-protected at the call site */
1198 static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 1190 static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
1199 struct list_head *invalid_list, bool clear_unsync) 1191 struct list_head *invalid_list, bool clear_unsync)
1200 { 1192 {
1201 if (sp->role.cr4_pae != !!is_pae(vcpu)) { 1193 if (sp->role.cr4_pae != !!is_pae(vcpu)) {
1202 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); 1194 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
1203 return 1; 1195 return 1;
1204 } 1196 }
1205 1197
1206 if (clear_unsync) 1198 if (clear_unsync)
1207 kvm_unlink_unsync_page(vcpu->kvm, sp); 1199 kvm_unlink_unsync_page(vcpu->kvm, sp);
1208 1200
1209 if (vcpu->arch.mmu.sync_page(vcpu, sp, clear_unsync)) { 1201 if (vcpu->arch.mmu.sync_page(vcpu, sp, clear_unsync)) {
1210 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); 1202 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
1211 return 1; 1203 return 1;
1212 } 1204 }
1213 1205
1214 kvm_mmu_flush_tlb(vcpu); 1206 kvm_mmu_flush_tlb(vcpu);
1215 return 0; 1207 return 0;
1216 } 1208 }
1217 1209
1218 static int kvm_sync_page_transient(struct kvm_vcpu *vcpu, 1210 static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
1219 struct kvm_mmu_page *sp) 1211 struct kvm_mmu_page *sp)
1220 { 1212 {
1221 LIST_HEAD(invalid_list); 1213 LIST_HEAD(invalid_list);
1222 int ret; 1214 int ret;
1223 1215
1224 ret = __kvm_sync_page(vcpu, sp, &invalid_list, false); 1216 ret = __kvm_sync_page(vcpu, sp, &invalid_list, false);
1225 if (ret) 1217 if (ret)
1226 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 1218 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
1227 1219
1228 return ret; 1220 return ret;
1229 } 1221 }
1230 1222
1231 static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 1223 static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
1232 struct list_head *invalid_list) 1224 struct list_head *invalid_list)
1233 { 1225 {
1234 return __kvm_sync_page(vcpu, sp, invalid_list, true); 1226 return __kvm_sync_page(vcpu, sp, invalid_list, true);
1235 } 1227 }
1236 1228
1237 /* @gfn should be write-protected at the call site */ 1229 /* @gfn should be write-protected at the call site */
1238 static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) 1230 static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
1239 { 1231 {
1240 struct kvm_mmu_page *s; 1232 struct kvm_mmu_page *s;
1241 struct hlist_node *node; 1233 struct hlist_node *node;
1242 LIST_HEAD(invalid_list); 1234 LIST_HEAD(invalid_list);
1243 bool flush = false; 1235 bool flush = false;
1244 1236
1245 for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { 1237 for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
1246 if (!s->unsync) 1238 if (!s->unsync)
1247 continue; 1239 continue;
1248 1240
1249 WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); 1241 WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
1250 if ((s->role.cr4_pae != !!is_pae(vcpu)) || 1242 if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
1251 (vcpu->arch.mmu.sync_page(vcpu, s, true))) { 1243 (vcpu->arch.mmu.sync_page(vcpu, s, true))) {
1252 kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list); 1244 kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list);
1253 continue; 1245 continue;
1254 } 1246 }
1255 kvm_unlink_unsync_page(vcpu->kvm, s); 1247 kvm_unlink_unsync_page(vcpu->kvm, s);
1256 flush = true; 1248 flush = true;
1257 } 1249 }
1258 1250
1259 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 1251 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
1260 if (flush) 1252 if (flush)
1261 kvm_mmu_flush_tlb(vcpu); 1253 kvm_mmu_flush_tlb(vcpu);
1262 } 1254 }
1263 1255
1264 struct mmu_page_path { 1256 struct mmu_page_path {
1265 struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1]; 1257 struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
1266 unsigned int idx[PT64_ROOT_LEVEL-1]; 1258 unsigned int idx[PT64_ROOT_LEVEL-1];
1267 }; 1259 };
1268 1260
1269 #define for_each_sp(pvec, sp, parents, i) \ 1261 #define for_each_sp(pvec, sp, parents, i) \
1270 for (i = mmu_pages_next(&pvec, &parents, -1), \ 1262 for (i = mmu_pages_next(&pvec, &parents, -1), \
1271 sp = pvec.page[i].sp; \ 1263 sp = pvec.page[i].sp; \
1272 i < pvec.nr && ({ sp = pvec.page[i].sp; 1;}); \ 1264 i < pvec.nr && ({ sp = pvec.page[i].sp; 1;}); \
1273 i = mmu_pages_next(&pvec, &parents, i)) 1265 i = mmu_pages_next(&pvec, &parents, i))
1274 1266
1275 static int mmu_pages_next(struct kvm_mmu_pages *pvec, 1267 static int mmu_pages_next(struct kvm_mmu_pages *pvec,
1276 struct mmu_page_path *parents, 1268 struct mmu_page_path *parents,
1277 int i) 1269 int i)
1278 { 1270 {
1279 int n; 1271 int n;
1280 1272
1281 for (n = i+1; n < pvec->nr; n++) { 1273 for (n = i+1; n < pvec->nr; n++) {
1282 struct kvm_mmu_page *sp = pvec->page[n].sp; 1274 struct kvm_mmu_page *sp = pvec->page[n].sp;
1283 1275
1284 if (sp->role.level == PT_PAGE_TABLE_LEVEL) { 1276 if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
1285 parents->idx[0] = pvec->page[n].idx; 1277 parents->idx[0] = pvec->page[n].idx;
1286 return n; 1278 return n;
1287 } 1279 }
1288 1280
1289 parents->parent[sp->role.level-2] = sp; 1281 parents->parent[sp->role.level-2] = sp;
1290 parents->idx[sp->role.level-1] = pvec->page[n].idx; 1282 parents->idx[sp->role.level-1] = pvec->page[n].idx;
1291 } 1283 }
1292 1284
1293 return n; 1285 return n;
1294 } 1286 }
1295 1287
1296 static void mmu_pages_clear_parents(struct mmu_page_path *parents) 1288 static void mmu_pages_clear_parents(struct mmu_page_path *parents)
1297 { 1289 {
1298 struct kvm_mmu_page *sp; 1290 struct kvm_mmu_page *sp;
1299 unsigned int level = 0; 1291 unsigned int level = 0;
1300 1292
1301 do { 1293 do {
1302 unsigned int idx = parents->idx[level]; 1294 unsigned int idx = parents->idx[level];
1303 1295
1304 sp = parents->parent[level]; 1296 sp = parents->parent[level];
1305 if (!sp) 1297 if (!sp)
1306 return; 1298 return;
1307 1299
1308 --sp->unsync_children; 1300 --sp->unsync_children;
1309 WARN_ON((int)sp->unsync_children < 0); 1301 WARN_ON((int)sp->unsync_children < 0);
1310 __clear_bit(idx, sp->unsync_child_bitmap); 1302 __clear_bit(idx, sp->unsync_child_bitmap);
1311 level++; 1303 level++;
1312 } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); 1304 } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
1313 } 1305 }
1314 1306
1315 static void kvm_mmu_pages_init(struct kvm_mmu_page *parent, 1307 static void kvm_mmu_pages_init(struct kvm_mmu_page *parent,
1316 struct mmu_page_path *parents, 1308 struct mmu_page_path *parents,
1317 struct kvm_mmu_pages *pvec) 1309 struct kvm_mmu_pages *pvec)
1318 { 1310 {
1319 parents->parent[parent->role.level-1] = NULL; 1311 parents->parent[parent->role.level-1] = NULL;
1320 pvec->nr = 0; 1312 pvec->nr = 0;
1321 } 1313 }
1322 1314
1323 static void mmu_sync_children(struct kvm_vcpu *vcpu, 1315 static void mmu_sync_children(struct kvm_vcpu *vcpu,
1324 struct kvm_mmu_page *parent) 1316 struct kvm_mmu_page *parent)
1325 { 1317 {
1326 int i; 1318 int i;
1327 struct kvm_mmu_page *sp; 1319 struct kvm_mmu_page *sp;
1328 struct mmu_page_path parents; 1320 struct mmu_page_path parents;
1329 struct kvm_mmu_pages pages; 1321 struct kvm_mmu_pages pages;
1330 LIST_HEAD(invalid_list); 1322 LIST_HEAD(invalid_list);
1331 1323
1332 kvm_mmu_pages_init(parent, &parents, &pages); 1324 kvm_mmu_pages_init(parent, &parents, &pages);
1333 while (mmu_unsync_walk(parent, &pages)) { 1325 while (mmu_unsync_walk(parent, &pages)) {
1334 int protected = 0; 1326 int protected = 0;
1335 1327
1336 for_each_sp(pages, sp, parents, i) 1328 for_each_sp(pages, sp, parents, i)
1337 protected |= rmap_write_protect(vcpu->kvm, sp->gfn); 1329 protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
1338 1330
1339 if (protected) 1331 if (protected)
1340 kvm_flush_remote_tlbs(vcpu->kvm); 1332 kvm_flush_remote_tlbs(vcpu->kvm);
1341 1333
1342 for_each_sp(pages, sp, parents, i) { 1334 for_each_sp(pages, sp, parents, i) {
1343 kvm_sync_page(vcpu, sp, &invalid_list); 1335 kvm_sync_page(vcpu, sp, &invalid_list);
1344 mmu_pages_clear_parents(&parents); 1336 mmu_pages_clear_parents(&parents);
1345 } 1337 }
1346 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 1338 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
1347 cond_resched_lock(&vcpu->kvm->mmu_lock); 1339 cond_resched_lock(&vcpu->kvm->mmu_lock);
1348 kvm_mmu_pages_init(parent, &parents, &pages); 1340 kvm_mmu_pages_init(parent, &parents, &pages);
1349 } 1341 }
1350 } 1342 }
1351 1343
1352 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, 1344 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
1353 gfn_t gfn, 1345 gfn_t gfn,
1354 gva_t gaddr, 1346 gva_t gaddr,
1355 unsigned level, 1347 unsigned level,
1356 int direct, 1348 int direct,
1357 unsigned access, 1349 unsigned access,
1358 u64 *parent_pte) 1350 u64 *parent_pte)
1359 { 1351 {
1360 union kvm_mmu_page_role role; 1352 union kvm_mmu_page_role role;
1361 unsigned quadrant; 1353 unsigned quadrant;
1362 struct kvm_mmu_page *sp; 1354 struct kvm_mmu_page *sp;
1363 struct hlist_node *node; 1355 struct hlist_node *node;
1364 bool need_sync = false; 1356 bool need_sync = false;
1365 1357
1366 role = vcpu->arch.mmu.base_role; 1358 role = vcpu->arch.mmu.base_role;
1367 role.level = level; 1359 role.level = level;
1368 role.direct = direct; 1360 role.direct = direct;
1369 if (role.direct) 1361 if (role.direct)
1370 role.cr4_pae = 0; 1362 role.cr4_pae = 0;
1371 role.access = access; 1363 role.access = access;
1372 if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) { 1364 if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
1373 quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level)); 1365 quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
1374 quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1; 1366 quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
1375 role.quadrant = quadrant; 1367 role.quadrant = quadrant;
1376 } 1368 }
1377 for_each_gfn_sp(vcpu->kvm, sp, gfn, node) { 1369 for_each_gfn_sp(vcpu->kvm, sp, gfn, node) {
1378 if (!need_sync && sp->unsync) 1370 if (!need_sync && sp->unsync)
1379 need_sync = true; 1371 need_sync = true;
1380 1372
1381 if (sp->role.word != role.word) 1373 if (sp->role.word != role.word)
1382 continue; 1374 continue;
1383 1375
1384 if (sp->unsync && kvm_sync_page_transient(vcpu, sp)) 1376 if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
1385 break; 1377 break;
1386 1378
1387 mmu_page_add_parent_pte(vcpu, sp, parent_pte); 1379 mmu_page_add_parent_pte(vcpu, sp, parent_pte);
1388 if (sp->unsync_children) { 1380 if (sp->unsync_children) {
1389 set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests); 1381 set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
1390 kvm_mmu_mark_parents_unsync(sp); 1382 kvm_mmu_mark_parents_unsync(sp);
1391 } else if (sp->unsync) 1383 } else if (sp->unsync)
1392 kvm_mmu_mark_parents_unsync(sp); 1384 kvm_mmu_mark_parents_unsync(sp);
1393 1385
1394 trace_kvm_mmu_get_page(sp, false); 1386 trace_kvm_mmu_get_page(sp, false);
1395 return sp; 1387 return sp;
1396 } 1388 }
1397 ++vcpu->kvm->stat.mmu_cache_miss; 1389 ++vcpu->kvm->stat.mmu_cache_miss;
1398 sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct); 1390 sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
1399 if (!sp) 1391 if (!sp)
1400 return sp; 1392 return sp;
1401 sp->gfn = gfn; 1393 sp->gfn = gfn;
1402 sp->role = role; 1394 sp->role = role;
1403 hlist_add_head(&sp->hash_link, 1395 hlist_add_head(&sp->hash_link,
1404 &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]); 1396 &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
1405 if (!direct) { 1397 if (!direct) {
1406 if (rmap_write_protect(vcpu->kvm, gfn)) 1398 if (rmap_write_protect(vcpu->kvm, gfn))
1407 kvm_flush_remote_tlbs(vcpu->kvm); 1399 kvm_flush_remote_tlbs(vcpu->kvm);
1408 if (level > PT_PAGE_TABLE_LEVEL && need_sync) 1400 if (level > PT_PAGE_TABLE_LEVEL && need_sync)
1409 kvm_sync_pages(vcpu, gfn); 1401 kvm_sync_pages(vcpu, gfn);
1410 1402
1411 account_shadowed(vcpu->kvm, gfn); 1403 account_shadowed(vcpu->kvm, gfn);
1412 } 1404 }
1413 if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte) 1405 if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte)
1414 vcpu->arch.mmu.prefetch_page(vcpu, sp); 1406 vcpu->arch.mmu.prefetch_page(vcpu, sp);
1415 else 1407 else
1416 nonpaging_prefetch_page(vcpu, sp); 1408 nonpaging_prefetch_page(vcpu, sp);
1417 trace_kvm_mmu_get_page(sp, true); 1409 trace_kvm_mmu_get_page(sp, true);
1418 return sp; 1410 return sp;
1419 } 1411 }
1420 1412
1421 static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator, 1413 static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
1422 struct kvm_vcpu *vcpu, u64 addr) 1414 struct kvm_vcpu *vcpu, u64 addr)
1423 { 1415 {
1424 iterator->addr = addr; 1416 iterator->addr = addr;
1425 iterator->shadow_addr = vcpu->arch.mmu.root_hpa; 1417 iterator->shadow_addr = vcpu->arch.mmu.root_hpa;
1426 iterator->level = vcpu->arch.mmu.shadow_root_level; 1418 iterator->level = vcpu->arch.mmu.shadow_root_level;
1427 if (iterator->level == PT32E_ROOT_LEVEL) { 1419 if (iterator->level == PT32E_ROOT_LEVEL) {
1428 iterator->shadow_addr 1420 iterator->shadow_addr
1429 = vcpu->arch.mmu.pae_root[(addr >> 30) & 3]; 1421 = vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
1430 iterator->shadow_addr &= PT64_BASE_ADDR_MASK; 1422 iterator->shadow_addr &= PT64_BASE_ADDR_MASK;
1431 --iterator->level; 1423 --iterator->level;
1432 if (!iterator->shadow_addr) 1424 if (!iterator->shadow_addr)
1433 iterator->level = 0; 1425 iterator->level = 0;
1434 } 1426 }
1435 } 1427 }
1436 1428
1437 static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator) 1429 static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
1438 { 1430 {
1439 if (iterator->level < PT_PAGE_TABLE_LEVEL) 1431 if (iterator->level < PT_PAGE_TABLE_LEVEL)
1440 return false; 1432 return false;
1441 1433
1442 if (iterator->level == PT_PAGE_TABLE_LEVEL) 1434 if (iterator->level == PT_PAGE_TABLE_LEVEL)
1443 if (is_large_pte(*iterator->sptep)) 1435 if (is_large_pte(*iterator->sptep))
1444 return false; 1436 return false;
1445 1437
1446 iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level); 1438 iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
1447 iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index; 1439 iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
1448 return true; 1440 return true;
1449 } 1441 }
1450 1442
1451 static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator) 1443 static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
1452 { 1444 {
1453 iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK; 1445 iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK;
1454 --iterator->level; 1446 --iterator->level;
1455 } 1447 }
1456 1448
1457 static void kvm_mmu_page_unlink_children(struct kvm *kvm, 1449 static void kvm_mmu_page_unlink_children(struct kvm *kvm,
1458 struct kvm_mmu_page *sp) 1450 struct kvm_mmu_page *sp)
1459 { 1451 {
1460 unsigned i; 1452 unsigned i;
1461 u64 *pt; 1453 u64 *pt;
1462 u64 ent; 1454 u64 ent;
1463 1455
1464 pt = sp->spt; 1456 pt = sp->spt;
1465 1457
1466 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { 1458 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
1467 ent = pt[i]; 1459 ent = pt[i];
1468 1460
1469 if (is_shadow_present_pte(ent)) { 1461 if (is_shadow_present_pte(ent)) {
1470 if (!is_last_spte(ent, sp->role.level)) { 1462 if (!is_last_spte(ent, sp->role.level)) {
1471 ent &= PT64_BASE_ADDR_MASK; 1463 ent &= PT64_BASE_ADDR_MASK;
1472 mmu_page_remove_parent_pte(page_header(ent), 1464 mmu_page_remove_parent_pte(page_header(ent),
1473 &pt[i]); 1465 &pt[i]);
1474 } else { 1466 } else {
1475 if (is_large_pte(ent)) 1467 if (is_large_pte(ent))
1476 --kvm->stat.lpages; 1468 --kvm->stat.lpages;
1477 rmap_remove(kvm, &pt[i]); 1469 rmap_remove(kvm, &pt[i]);
1478 } 1470 }
1479 } 1471 }
1480 pt[i] = shadow_trap_nonpresent_pte; 1472 pt[i] = shadow_trap_nonpresent_pte;
1481 } 1473 }
1482 } 1474 }
1483 1475
1484 static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) 1476 static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
1485 { 1477 {
1486 mmu_page_remove_parent_pte(sp, parent_pte); 1478 mmu_page_remove_parent_pte(sp, parent_pte);
1487 } 1479 }
1488 1480
1489 static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm) 1481 static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
1490 { 1482 {
1491 int i; 1483 int i;
1492 struct kvm_vcpu *vcpu; 1484 struct kvm_vcpu *vcpu;
1493 1485
1494 kvm_for_each_vcpu(i, vcpu, kvm) 1486 kvm_for_each_vcpu(i, vcpu, kvm)
1495 vcpu->arch.last_pte_updated = NULL; 1487 vcpu->arch.last_pte_updated = NULL;
1496 } 1488 }
1497 1489
1498 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) 1490 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
1499 { 1491 {
1500 u64 *parent_pte; 1492 u64 *parent_pte;
1501 1493
1502 while (sp->multimapped || sp->parent_pte) { 1494 while (sp->multimapped || sp->parent_pte) {
1503 if (!sp->multimapped) 1495 if (!sp->multimapped)
1504 parent_pte = sp->parent_pte; 1496 parent_pte = sp->parent_pte;
1505 else { 1497 else {
1506 struct kvm_pte_chain *chain; 1498 struct kvm_pte_chain *chain;
1507 1499
1508 chain = container_of(sp->parent_ptes.first, 1500 chain = container_of(sp->parent_ptes.first,
1509 struct kvm_pte_chain, link); 1501 struct kvm_pte_chain, link);
1510 parent_pte = chain->parent_ptes[0]; 1502 parent_pte = chain->parent_ptes[0];
1511 } 1503 }
1512 BUG_ON(!parent_pte); 1504 BUG_ON(!parent_pte);
1513 kvm_mmu_put_page(sp, parent_pte); 1505 kvm_mmu_put_page(sp, parent_pte);
1514 __set_spte(parent_pte, shadow_trap_nonpresent_pte); 1506 __set_spte(parent_pte, shadow_trap_nonpresent_pte);
1515 } 1507 }
1516 } 1508 }
1517 1509
1518 static int mmu_zap_unsync_children(struct kvm *kvm, 1510 static int mmu_zap_unsync_children(struct kvm *kvm,
1519 struct kvm_mmu_page *parent, 1511 struct kvm_mmu_page *parent,
1520 struct list_head *invalid_list) 1512 struct list_head *invalid_list)
1521 { 1513 {
1522 int i, zapped = 0; 1514 int i, zapped = 0;
1523 struct mmu_page_path parents; 1515 struct mmu_page_path parents;
1524 struct kvm_mmu_pages pages; 1516 struct kvm_mmu_pages pages;
1525 1517
1526 if (parent->role.level == PT_PAGE_TABLE_LEVEL) 1518 if (parent->role.level == PT_PAGE_TABLE_LEVEL)
1527 return 0; 1519 return 0;
1528 1520
1529 kvm_mmu_pages_init(parent, &parents, &pages); 1521 kvm_mmu_pages_init(parent, &parents, &pages);
1530 while (mmu_unsync_walk(parent, &pages)) { 1522 while (mmu_unsync_walk(parent, &pages)) {
1531 struct kvm_mmu_page *sp; 1523 struct kvm_mmu_page *sp;
1532 1524
1533 for_each_sp(pages, sp, parents, i) { 1525 for_each_sp(pages, sp, parents, i) {
1534 kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); 1526 kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
1535 mmu_pages_clear_parents(&parents); 1527 mmu_pages_clear_parents(&parents);
1536 zapped++; 1528 zapped++;
1537 } 1529 }
1538 kvm_mmu_pages_init(parent, &parents, &pages); 1530 kvm_mmu_pages_init(parent, &parents, &pages);
1539 } 1531 }
1540 1532
1541 return zapped; 1533 return zapped;
1542 } 1534 }
1543 1535
1544 static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, 1536 static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
1545 struct list_head *invalid_list) 1537 struct list_head *invalid_list)
1546 { 1538 {
1547 int ret; 1539 int ret;
1548 1540
1549 trace_kvm_mmu_prepare_zap_page(sp); 1541 trace_kvm_mmu_prepare_zap_page(sp);
1550 ++kvm->stat.mmu_shadow_zapped; 1542 ++kvm->stat.mmu_shadow_zapped;
1551 ret = mmu_zap_unsync_children(kvm, sp, invalid_list); 1543 ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
1552 kvm_mmu_page_unlink_children(kvm, sp); 1544 kvm_mmu_page_unlink_children(kvm, sp);
1553 kvm_mmu_unlink_parents(kvm, sp); 1545 kvm_mmu_unlink_parents(kvm, sp);
1554 if (!sp->role.invalid && !sp->role.direct) 1546 if (!sp->role.invalid && !sp->role.direct)
1555 unaccount_shadowed(kvm, sp->gfn); 1547 unaccount_shadowed(kvm, sp->gfn);
1556 if (sp->unsync) 1548 if (sp->unsync)
1557 kvm_unlink_unsync_page(kvm, sp); 1549 kvm_unlink_unsync_page(kvm, sp);
1558 if (!sp->root_count) { 1550 if (!sp->root_count) {
1559 /* Count self */ 1551 /* Count self */
1560 ret++; 1552 ret++;
1561 list_move(&sp->link, invalid_list); 1553 list_move(&sp->link, invalid_list);
1562 } else { 1554 } else {
1563 list_move(&sp->link, &kvm->arch.active_mmu_pages); 1555 list_move(&sp->link, &kvm->arch.active_mmu_pages);
1564 kvm_reload_remote_mmus(kvm); 1556 kvm_reload_remote_mmus(kvm);
1565 } 1557 }
1566 1558
1567 sp->role.invalid = 1; 1559 sp->role.invalid = 1;
1568 kvm_mmu_reset_last_pte_updated(kvm); 1560 kvm_mmu_reset_last_pte_updated(kvm);
1569 return ret; 1561 return ret;
1570 } 1562 }
1571 1563
1572 static void kvm_mmu_commit_zap_page(struct kvm *kvm, 1564 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
1573 struct list_head *invalid_list) 1565 struct list_head *invalid_list)
1574 { 1566 {
1575 struct kvm_mmu_page *sp; 1567 struct kvm_mmu_page *sp;
1576 1568
1577 if (list_empty(invalid_list)) 1569 if (list_empty(invalid_list))
1578 return; 1570 return;
1579 1571
1580 kvm_flush_remote_tlbs(kvm); 1572 kvm_flush_remote_tlbs(kvm);
1581 1573
1582 do { 1574 do {
1583 sp = list_first_entry(invalid_list, struct kvm_mmu_page, link); 1575 sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
1584 WARN_ON(!sp->role.invalid || sp->root_count); 1576 WARN_ON(!sp->role.invalid || sp->root_count);
1585 kvm_mmu_free_page(kvm, sp); 1577 kvm_mmu_free_page(kvm, sp);
1586 } while (!list_empty(invalid_list)); 1578 } while (!list_empty(invalid_list));
1587 1579
1588 } 1580 }
1589 1581
1590 /* 1582 /*
1591 * Changing the number of mmu pages allocated to the vm 1583 * Changing the number of mmu pages allocated to the vm
1592 * Note: if kvm_nr_mmu_pages is too small, you will get a deadlock 1584 * Note: if kvm_nr_mmu_pages is too small, you will get a deadlock
1593 */ 1585 */
1594 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages) 1586 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages)
1595 { 1587 {
1596 int used_pages; 1588 int used_pages;
1597 LIST_HEAD(invalid_list); 1589 LIST_HEAD(invalid_list);
1598 1590
1599 used_pages = kvm->arch.n_alloc_mmu_pages - kvm->arch.n_free_mmu_pages; 1591 used_pages = kvm->arch.n_alloc_mmu_pages - kvm->arch.n_free_mmu_pages;
1600 used_pages = max(0, used_pages); 1592 used_pages = max(0, used_pages);
1601 1593
1602 /* 1594 /*
1603 * If we set the number of mmu pages to be smaller than the 1595 * If we set the number of mmu pages to be smaller than the
1604 * number of active pages, we must free some mmu pages before we 1596 * number of active pages, we must free some mmu pages before we
1605 * change the value 1597 * change the value
1606 */ 1598 */
1607 1599
1608 if (used_pages > kvm_nr_mmu_pages) { 1600 if (used_pages > kvm_nr_mmu_pages) {
1609 while (used_pages > kvm_nr_mmu_pages && 1601 while (used_pages > kvm_nr_mmu_pages &&
1610 !list_empty(&kvm->arch.active_mmu_pages)) { 1602 !list_empty(&kvm->arch.active_mmu_pages)) {
1611 struct kvm_mmu_page *page; 1603 struct kvm_mmu_page *page;
1612 1604
1613 page = container_of(kvm->arch.active_mmu_pages.prev, 1605 page = container_of(kvm->arch.active_mmu_pages.prev,
1614 struct kvm_mmu_page, link); 1606 struct kvm_mmu_page, link);
1615 used_pages -= kvm_mmu_prepare_zap_page(kvm, page, 1607 used_pages -= kvm_mmu_prepare_zap_page(kvm, page,
1616 &invalid_list); 1608 &invalid_list);
1617 } 1609 }
1618 kvm_mmu_commit_zap_page(kvm, &invalid_list); 1610 kvm_mmu_commit_zap_page(kvm, &invalid_list);
1619 kvm_nr_mmu_pages = used_pages; 1611 kvm_nr_mmu_pages = used_pages;
1620 kvm->arch.n_free_mmu_pages = 0; 1612 kvm->arch.n_free_mmu_pages = 0;
1621 } 1613 }
1622 else 1614 else
1623 kvm->arch.n_free_mmu_pages += kvm_nr_mmu_pages 1615 kvm->arch.n_free_mmu_pages += kvm_nr_mmu_pages
1624 - kvm->arch.n_alloc_mmu_pages; 1616 - kvm->arch.n_alloc_mmu_pages;
1625 1617
1626 kvm->arch.n_alloc_mmu_pages = kvm_nr_mmu_pages; 1618 kvm->arch.n_alloc_mmu_pages = kvm_nr_mmu_pages;
1627 } 1619 }
1628 1620
1629 static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) 1621 static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
1630 { 1622 {
1631 struct kvm_mmu_page *sp; 1623 struct kvm_mmu_page *sp;
1632 struct hlist_node *node; 1624 struct hlist_node *node;
1633 LIST_HEAD(invalid_list); 1625 LIST_HEAD(invalid_list);
1634 int r; 1626 int r;
1635 1627
1636 pgprintk("%s: looking for gfn %lx\n", __func__, gfn); 1628 pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
1637 r = 0; 1629 r = 0;
1638 1630
1639 for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) { 1631 for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
1640 pgprintk("%s: gfn %lx role %x\n", __func__, gfn, 1632 pgprintk("%s: gfn %lx role %x\n", __func__, gfn,
1641 sp->role.word); 1633 sp->role.word);
1642 r = 1; 1634 r = 1;
1643 kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); 1635 kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
1644 } 1636 }
1645 kvm_mmu_commit_zap_page(kvm, &invalid_list); 1637 kvm_mmu_commit_zap_page(kvm, &invalid_list);
1646 return r; 1638 return r;
1647 } 1639 }
1648 1640
1649 static void mmu_unshadow(struct kvm *kvm, gfn_t gfn) 1641 static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
1650 { 1642 {
1651 struct kvm_mmu_page *sp; 1643 struct kvm_mmu_page *sp;
1652 struct hlist_node *node; 1644 struct hlist_node *node;
1653 LIST_HEAD(invalid_list); 1645 LIST_HEAD(invalid_list);
1654 1646
1655 for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) { 1647 for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
1656 pgprintk("%s: zap %lx %x\n", 1648 pgprintk("%s: zap %lx %x\n",
1657 __func__, gfn, sp->role.word); 1649 __func__, gfn, sp->role.word);
1658 kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); 1650 kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
1659 } 1651 }
1660 kvm_mmu_commit_zap_page(kvm, &invalid_list); 1652 kvm_mmu_commit_zap_page(kvm, &invalid_list);
1661 } 1653 }
1662 1654
1663 static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn) 1655 static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
1664 { 1656 {
1665 int slot = memslot_id(kvm, gfn); 1657 int slot = memslot_id(kvm, gfn);
1666 struct kvm_mmu_page *sp = page_header(__pa(pte)); 1658 struct kvm_mmu_page *sp = page_header(__pa(pte));
1667 1659
1668 __set_bit(slot, sp->slot_bitmap); 1660 __set_bit(slot, sp->slot_bitmap);
1669 } 1661 }
1670 1662
1671 static void mmu_convert_notrap(struct kvm_mmu_page *sp) 1663 static void mmu_convert_notrap(struct kvm_mmu_page *sp)
1672 { 1664 {
1673 int i; 1665 int i;
1674 u64 *pt = sp->spt; 1666 u64 *pt = sp->spt;
1675 1667
1676 if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte) 1668 if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte)
1677 return; 1669 return;
1678 1670
1679 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { 1671 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
1680 if (pt[i] == shadow_notrap_nonpresent_pte) 1672 if (pt[i] == shadow_notrap_nonpresent_pte)
1681 __set_spte(&pt[i], shadow_trap_nonpresent_pte); 1673 __set_spte(&pt[i], shadow_trap_nonpresent_pte);
1682 } 1674 }
1683 } 1675 }
1684 1676
1685 /* 1677 /*
1686 * The function is based on mtrr_type_lookup() in 1678 * The function is based on mtrr_type_lookup() in
1687 * arch/x86/kernel/cpu/mtrr/generic.c 1679 * arch/x86/kernel/cpu/mtrr/generic.c
1688 */ 1680 */
1689 static int get_mtrr_type(struct mtrr_state_type *mtrr_state, 1681 static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
1690 u64 start, u64 end) 1682 u64 start, u64 end)
1691 { 1683 {
1692 int i; 1684 int i;
1693 u64 base, mask; 1685 u64 base, mask;
1694 u8 prev_match, curr_match; 1686 u8 prev_match, curr_match;
1695 int num_var_ranges = KVM_NR_VAR_MTRR; 1687 int num_var_ranges = KVM_NR_VAR_MTRR;
1696 1688
1697 if (!mtrr_state->enabled) 1689 if (!mtrr_state->enabled)
1698 return 0xFF; 1690 return 0xFF;
1699 1691
1700 /* Make end inclusive, instead of exclusive */ 1692 /* Make end inclusive, instead of exclusive */
1701 end--; 1693 end--;
1702 1694
1703 /* Look in fixed ranges. Just return the type as per start */ 1695 /* Look in fixed ranges. Just return the type as per start */
1704 if (mtrr_state->have_fixed && (start < 0x100000)) { 1696 if (mtrr_state->have_fixed && (start < 0x100000)) {
1705 int idx; 1697 int idx;
1706 1698
1707 if (start < 0x80000) { 1699 if (start < 0x80000) {
1708 idx = 0; 1700 idx = 0;
1709 idx += (start >> 16); 1701 idx += (start >> 16);
1710 return mtrr_state->fixed_ranges[idx]; 1702 return mtrr_state->fixed_ranges[idx];
1711 } else if (start < 0xC0000) { 1703 } else if (start < 0xC0000) {
1712 idx = 1 * 8; 1704 idx = 1 * 8;
1713 idx += ((start - 0x80000) >> 14); 1705 idx += ((start - 0x80000) >> 14);
1714 return mtrr_state->fixed_ranges[idx]; 1706 return mtrr_state->fixed_ranges[idx];
1715 } else if (start < 0x1000000) { 1707 } else if (start < 0x1000000) {
1716 idx = 3 * 8; 1708 idx = 3 * 8;
1717 idx += ((start - 0xC0000) >> 12); 1709 idx += ((start - 0xC0000) >> 12);
1718 return mtrr_state->fixed_ranges[idx]; 1710 return mtrr_state->fixed_ranges[idx];
1719 } 1711 }
1720 } 1712 }
1721 1713
1722 /* 1714 /*
1723 * Look in variable ranges 1715 * Look in variable ranges
1724 * Look for multiple ranges matching this address and pick the type 1716 * Look for multiple ranges matching this address and pick the type
1725 * as per MTRR precedence 1717 * as per MTRR precedence
1726 */ 1718 */
1727 if (!(mtrr_state->enabled & 2)) 1719 if (!(mtrr_state->enabled & 2))
1728 return mtrr_state->def_type; 1720 return mtrr_state->def_type;
1729 1721
1730 prev_match = 0xFF; 1722 prev_match = 0xFF;
1731 for (i = 0; i < num_var_ranges; ++i) { 1723 for (i = 0; i < num_var_ranges; ++i) {
1732 unsigned short start_state, end_state; 1724 unsigned short start_state, end_state;
1733 1725
1734 if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11))) 1726 if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11)))
1735 continue; 1727 continue;
1736 1728
1737 base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) + 1729 base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) +
1738 (mtrr_state->var_ranges[i].base_lo & PAGE_MASK); 1730 (mtrr_state->var_ranges[i].base_lo & PAGE_MASK);
1739 mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) + 1731 mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) +
1740 (mtrr_state->var_ranges[i].mask_lo & PAGE_MASK); 1732 (mtrr_state->var_ranges[i].mask_lo & PAGE_MASK);
1741 1733
1742 start_state = ((start & mask) == (base & mask)); 1734 start_state = ((start & mask) == (base & mask));
1743 end_state = ((end & mask) == (base & mask)); 1735 end_state = ((end & mask) == (base & mask));
1744 if (start_state != end_state) 1736 if (start_state != end_state)
1745 return 0xFE; 1737 return 0xFE;
1746 1738
1747 if ((start & mask) != (base & mask)) 1739 if ((start & mask) != (base & mask))
1748 continue; 1740 continue;
1749 1741
1750 curr_match = mtrr_state->var_ranges[i].base_lo & 0xff; 1742 curr_match = mtrr_state->var_ranges[i].base_lo & 0xff;
1751 if (prev_match == 0xFF) { 1743 if (prev_match == 0xFF) {
1752 prev_match = curr_match; 1744 prev_match = curr_match;
1753 continue; 1745 continue;
1754 } 1746 }
1755 1747
1756 if (prev_match == MTRR_TYPE_UNCACHABLE || 1748 if (prev_match == MTRR_TYPE_UNCACHABLE ||
1757 curr_match == MTRR_TYPE_UNCACHABLE) 1749 curr_match == MTRR_TYPE_UNCACHABLE)
1758 return MTRR_TYPE_UNCACHABLE; 1750 return MTRR_TYPE_UNCACHABLE;
1759 1751
1760 if ((prev_match == MTRR_TYPE_WRBACK && 1752 if ((prev_match == MTRR_TYPE_WRBACK &&
1761 curr_match == MTRR_TYPE_WRTHROUGH) || 1753 curr_match == MTRR_TYPE_WRTHROUGH) ||
1762 (prev_match == MTRR_TYPE_WRTHROUGH && 1754 (prev_match == MTRR_TYPE_WRTHROUGH &&
1763 curr_match == MTRR_TYPE_WRBACK)) { 1755 curr_match == MTRR_TYPE_WRBACK)) {
1764 prev_match = MTRR_TYPE_WRTHROUGH; 1756 prev_match = MTRR_TYPE_WRTHROUGH;
1765 curr_match = MTRR_TYPE_WRTHROUGH; 1757 curr_match = MTRR_TYPE_WRTHROUGH;
1766 } 1758 }
1767 1759
1768 if (prev_match != curr_match) 1760 if (prev_match != curr_match)
1769 return MTRR_TYPE_UNCACHABLE; 1761 return MTRR_TYPE_UNCACHABLE;
1770 } 1762 }
1771 1763
1772 if (prev_match != 0xFF) 1764 if (prev_match != 0xFF)
1773 return prev_match; 1765 return prev_match;
1774 1766
1775 return mtrr_state->def_type; 1767 return mtrr_state->def_type;
1776 } 1768 }
1777 1769
1778 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn) 1770 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
1779 { 1771 {
1780 u8 mtrr; 1772 u8 mtrr;
1781 1773
1782 mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT, 1774 mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT,
1783 (gfn << PAGE_SHIFT) + PAGE_SIZE); 1775 (gfn << PAGE_SHIFT) + PAGE_SIZE);
1784 if (mtrr == 0xfe || mtrr == 0xff) 1776 if (mtrr == 0xfe || mtrr == 0xff)
1785 mtrr = MTRR_TYPE_WRBACK; 1777 mtrr = MTRR_TYPE_WRBACK;
1786 return mtrr; 1778 return mtrr;
1787 } 1779 }
1788 EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type); 1780 EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
1789 1781
1790 static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) 1782 static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
1791 { 1783 {
1792 trace_kvm_mmu_unsync_page(sp); 1784 trace_kvm_mmu_unsync_page(sp);
1793 ++vcpu->kvm->stat.mmu_unsync; 1785 ++vcpu->kvm->stat.mmu_unsync;
1794 sp->unsync = 1; 1786 sp->unsync = 1;
1795 1787
1796 kvm_mmu_mark_parents_unsync(sp); 1788 kvm_mmu_mark_parents_unsync(sp);
1797 mmu_convert_notrap(sp); 1789 mmu_convert_notrap(sp);
1798 } 1790 }
1799 1791
1800 static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) 1792 static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
1801 { 1793 {
1802 struct kvm_mmu_page *s; 1794 struct kvm_mmu_page *s;
1803 struct hlist_node *node; 1795 struct hlist_node *node;
1804 1796
1805 for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { 1797 for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
1806 if (s->unsync) 1798 if (s->unsync)
1807 continue; 1799 continue;
1808 WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); 1800 WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
1809 __kvm_unsync_page(vcpu, s); 1801 __kvm_unsync_page(vcpu, s);
1810 } 1802 }
1811 } 1803 }
1812 1804
1813 static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, 1805 static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
1814 bool can_unsync) 1806 bool can_unsync)
1815 { 1807 {
1816 struct kvm_mmu_page *s; 1808 struct kvm_mmu_page *s;
1817 struct hlist_node *node; 1809 struct hlist_node *node;
1818 bool need_unsync = false; 1810 bool need_unsync = false;
1819 1811
1820 for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { 1812 for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
1821 if (s->role.level != PT_PAGE_TABLE_LEVEL) 1813 if (s->role.level != PT_PAGE_TABLE_LEVEL)
1822 return 1; 1814 return 1;
1823 1815
1824 if (!need_unsync && !s->unsync) { 1816 if (!need_unsync && !s->unsync) {
1825 if (!can_unsync || !oos_shadow) 1817 if (!can_unsync || !oos_shadow)
1826 return 1; 1818 return 1;
1827 need_unsync = true; 1819 need_unsync = true;
1828 } 1820 }
1829 } 1821 }
1830 if (need_unsync) 1822 if (need_unsync)
1831 kvm_unsync_pages(vcpu, gfn); 1823 kvm_unsync_pages(vcpu, gfn);
1832 return 0; 1824 return 0;
1833 } 1825 }
1834 1826
1835 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, 1827 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
1836 unsigned pte_access, int user_fault, 1828 unsigned pte_access, int user_fault,
1837 int write_fault, int dirty, int level, 1829 int write_fault, int dirty, int level,
1838 gfn_t gfn, pfn_t pfn, bool speculative, 1830 gfn_t gfn, pfn_t pfn, bool speculative,
1839 bool can_unsync, bool reset_host_protection) 1831 bool can_unsync, bool reset_host_protection)
1840 { 1832 {
1841 u64 spte; 1833 u64 spte;
1842 int ret = 0; 1834 int ret = 0;
1843 1835
1844 /* 1836 /*
1845 * We don't set the accessed bit, since we sometimes want to see 1837 * We don't set the accessed bit, since we sometimes want to see
1846 * whether the guest actually used the pte (in order to detect 1838 * whether the guest actually used the pte (in order to detect
1847 * demand paging). 1839 * demand paging).
1848 */ 1840 */
1849 spte = shadow_base_present_pte | shadow_dirty_mask; 1841 spte = shadow_base_present_pte | shadow_dirty_mask;
1850 if (!speculative) 1842 if (!speculative)
1851 spte |= shadow_accessed_mask; 1843 spte |= shadow_accessed_mask;
1852 if (!dirty) 1844 if (!dirty)
1853 pte_access &= ~ACC_WRITE_MASK; 1845 pte_access &= ~ACC_WRITE_MASK;
1854 if (pte_access & ACC_EXEC_MASK) 1846 if (pte_access & ACC_EXEC_MASK)
1855 spte |= shadow_x_mask; 1847 spte |= shadow_x_mask;
1856 else 1848 else
1857 spte |= shadow_nx_mask; 1849 spte |= shadow_nx_mask;
1858 if (pte_access & ACC_USER_MASK) 1850 if (pte_access & ACC_USER_MASK)
1859 spte |= shadow_user_mask; 1851 spte |= shadow_user_mask;
1860 if (level > PT_PAGE_TABLE_LEVEL) 1852 if (level > PT_PAGE_TABLE_LEVEL)
1861 spte |= PT_PAGE_SIZE_MASK; 1853 spte |= PT_PAGE_SIZE_MASK;
1862 if (tdp_enabled) 1854 if (tdp_enabled)
1863 spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, 1855 spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
1864 kvm_is_mmio_pfn(pfn)); 1856 kvm_is_mmio_pfn(pfn));
1865 1857
1866 if (reset_host_protection) 1858 if (reset_host_protection)
1867 spte |= SPTE_HOST_WRITEABLE; 1859 spte |= SPTE_HOST_WRITEABLE;
1868 1860
1869 spte |= (u64)pfn << PAGE_SHIFT; 1861 spte |= (u64)pfn << PAGE_SHIFT;
1870 1862
1871 if ((pte_access & ACC_WRITE_MASK) 1863 if ((pte_access & ACC_WRITE_MASK)
1872 || (!tdp_enabled && write_fault && !is_write_protection(vcpu) 1864 || (!tdp_enabled && write_fault && !is_write_protection(vcpu)
1873 && !user_fault)) { 1865 && !user_fault)) {
1874 1866
1875 if (level > PT_PAGE_TABLE_LEVEL && 1867 if (level > PT_PAGE_TABLE_LEVEL &&
1876 has_wrprotected_page(vcpu->kvm, gfn, level)) { 1868 has_wrprotected_page(vcpu->kvm, gfn, level)) {
1877 ret = 1; 1869 ret = 1;
1878 rmap_remove(vcpu->kvm, sptep); 1870 rmap_remove(vcpu->kvm, sptep);
1879 spte = shadow_trap_nonpresent_pte; 1871 spte = shadow_trap_nonpresent_pte;
1880 goto set_pte; 1872 goto set_pte;
1881 } 1873 }
1882 1874
1883 spte |= PT_WRITABLE_MASK; 1875 spte |= PT_WRITABLE_MASK;
1884 1876
1885 if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK)) 1877 if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
1886 spte &= ~PT_USER_MASK; 1878 spte &= ~PT_USER_MASK;
1887 1879
1888 /* 1880 /*
1889 * Optimization: for pte sync, if spte was writable the hash 1881 * Optimization: for pte sync, if spte was writable the hash
1890 * lookup is unnecessary (and expensive). Write protection 1882 * lookup is unnecessary (and expensive). Write protection
1891 * is the responsibility of mmu_get_page / kvm_sync_page. 1883 * is the responsibility of mmu_get_page / kvm_sync_page.
1892 * Same reasoning can be applied to dirty page accounting. 1884 * Same reasoning can be applied to dirty page accounting.
1893 */ 1885 */
1894 if (!can_unsync && is_writable_pte(*sptep)) 1886 if (!can_unsync && is_writable_pte(*sptep))
1895 goto set_pte; 1887 goto set_pte;
1896 1888
1897 if (mmu_need_write_protect(vcpu, gfn, can_unsync)) { 1889 if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
1898 pgprintk("%s: found shadow page for %lx, marking ro\n", 1890 pgprintk("%s: found shadow page for %lx, marking ro\n",
1899 __func__, gfn); 1891 __func__, gfn);
1900 ret = 1; 1892 ret = 1;
1901 pte_access &= ~ACC_WRITE_MASK; 1893 pte_access &= ~ACC_WRITE_MASK;
1902 if (is_writable_pte(spte)) 1894 if (is_writable_pte(spte))
1903 spte &= ~PT_WRITABLE_MASK; 1895 spte &= ~PT_WRITABLE_MASK;
1904 } 1896 }
1905 } 1897 }
1906 1898
1907 if (pte_access & ACC_WRITE_MASK) 1899 if (pte_access & ACC_WRITE_MASK)
1908 mark_page_dirty(vcpu->kvm, gfn); 1900 mark_page_dirty(vcpu->kvm, gfn);
1909 1901
1910 set_pte: 1902 set_pte:
1911 __set_spte(sptep, spte); 1903 __set_spte(sptep, spte);
1912 return ret; 1904 return ret;
1913 } 1905 }
1914 1906
1915 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, 1907 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
1916 unsigned pt_access, unsigned pte_access, 1908 unsigned pt_access, unsigned pte_access,
1917 int user_fault, int write_fault, int dirty, 1909 int user_fault, int write_fault, int dirty,
1918 int *ptwrite, int level, gfn_t gfn, 1910 int *ptwrite, int level, gfn_t gfn,
1919 pfn_t pfn, bool speculative, 1911 pfn_t pfn, bool speculative,
1920 bool reset_host_protection) 1912 bool reset_host_protection)
1921 { 1913 {
1922 int was_rmapped = 0; 1914 int was_rmapped = 0;
1923 int was_writable = is_writable_pte(*sptep); 1915 int was_writable = is_writable_pte(*sptep);
1924 int rmap_count; 1916 int rmap_count;
1925 1917
1926 pgprintk("%s: spte %llx access %x write_fault %d" 1918 pgprintk("%s: spte %llx access %x write_fault %d"
1927 " user_fault %d gfn %lx\n", 1919 " user_fault %d gfn %lx\n",
1928 __func__, *sptep, pt_access, 1920 __func__, *sptep, pt_access,
1929 write_fault, user_fault, gfn); 1921 write_fault, user_fault, gfn);
1930 1922
1931 if (is_rmap_spte(*sptep)) { 1923 if (is_rmap_spte(*sptep)) {
1932 /* 1924 /*
1933 * If we overwrite a PTE page pointer with a 2MB PMD, unlink 1925 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
1934 * the parent of the now unreachable PTE. 1926 * the parent of the now unreachable PTE.
1935 */ 1927 */
1936 if (level > PT_PAGE_TABLE_LEVEL && 1928 if (level > PT_PAGE_TABLE_LEVEL &&
1937 !is_large_pte(*sptep)) { 1929 !is_large_pte(*sptep)) {
1938 struct kvm_mmu_page *child; 1930 struct kvm_mmu_page *child;
1939 u64 pte = *sptep; 1931 u64 pte = *sptep;
1940 1932
1941 child = page_header(pte & PT64_BASE_ADDR_MASK); 1933 child = page_header(pte & PT64_BASE_ADDR_MASK);
1942 mmu_page_remove_parent_pte(child, sptep); 1934 mmu_page_remove_parent_pte(child, sptep);
1943 __set_spte(sptep, shadow_trap_nonpresent_pte); 1935 __set_spte(sptep, shadow_trap_nonpresent_pte);
1944 kvm_flush_remote_tlbs(vcpu->kvm); 1936 kvm_flush_remote_tlbs(vcpu->kvm);
1945 } else if (pfn != spte_to_pfn(*sptep)) { 1937 } else if (pfn != spte_to_pfn(*sptep)) {
1946 pgprintk("hfn old %lx new %lx\n", 1938 pgprintk("hfn old %lx new %lx\n",
1947 spte_to_pfn(*sptep), pfn); 1939 spte_to_pfn(*sptep), pfn);
1948 rmap_remove(vcpu->kvm, sptep); 1940 rmap_remove(vcpu->kvm, sptep);
1949 __set_spte(sptep, shadow_trap_nonpresent_pte); 1941 __set_spte(sptep, shadow_trap_nonpresent_pte);
1950 kvm_flush_remote_tlbs(vcpu->kvm); 1942 kvm_flush_remote_tlbs(vcpu->kvm);
1951 } else 1943 } else
1952 was_rmapped = 1; 1944 was_rmapped = 1;
1953 } 1945 }
1954 1946
1955 if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault, 1947 if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
1956 dirty, level, gfn, pfn, speculative, true, 1948 dirty, level, gfn, pfn, speculative, true,
1957 reset_host_protection)) { 1949 reset_host_protection)) {
1958 if (write_fault) 1950 if (write_fault)
1959 *ptwrite = 1; 1951 *ptwrite = 1;
1960 kvm_mmu_flush_tlb(vcpu); 1952 kvm_mmu_flush_tlb(vcpu);
1961 } 1953 }
1962 1954
1963 pgprintk("%s: setting spte %llx\n", __func__, *sptep); 1955 pgprintk("%s: setting spte %llx\n", __func__, *sptep);
1964 pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n", 1956 pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n",
1965 is_large_pte(*sptep)? "2MB" : "4kB", 1957 is_large_pte(*sptep)? "2MB" : "4kB",
1966 *sptep & PT_PRESENT_MASK ?"RW":"R", gfn, 1958 *sptep & PT_PRESENT_MASK ?"RW":"R", gfn,
1967 *sptep, sptep); 1959 *sptep, sptep);
1968 if (!was_rmapped && is_large_pte(*sptep)) 1960 if (!was_rmapped && is_large_pte(*sptep))
1969 ++vcpu->kvm->stat.lpages; 1961 ++vcpu->kvm->stat.lpages;
1970 1962
1971 page_header_update_slot(vcpu->kvm, sptep, gfn); 1963 page_header_update_slot(vcpu->kvm, sptep, gfn);
1972 if (!was_rmapped) { 1964 if (!was_rmapped) {
1973 rmap_count = rmap_add(vcpu, sptep, gfn); 1965 rmap_count = rmap_add(vcpu, sptep, gfn);
1974 kvm_release_pfn_clean(pfn); 1966 kvm_release_pfn_clean(pfn);
1975 if (rmap_count > RMAP_RECYCLE_THRESHOLD) 1967 if (rmap_count > RMAP_RECYCLE_THRESHOLD)
1976 rmap_recycle(vcpu, sptep, gfn); 1968 rmap_recycle(vcpu, sptep, gfn);
1977 } else { 1969 } else {
1978 if (was_writable) 1970 if (was_writable)
1979 kvm_release_pfn_dirty(pfn); 1971 kvm_release_pfn_dirty(pfn);
1980 else 1972 else
1981 kvm_release_pfn_clean(pfn); 1973 kvm_release_pfn_clean(pfn);
1982 } 1974 }
1983 if (speculative) { 1975 if (speculative) {
1984 vcpu->arch.last_pte_updated = sptep; 1976 vcpu->arch.last_pte_updated = sptep;
1985 vcpu->arch.last_pte_gfn = gfn; 1977 vcpu->arch.last_pte_gfn = gfn;
1986 } 1978 }
1987 } 1979 }
1988 1980
1989 static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) 1981 static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
1990 { 1982 {
1991 } 1983 }
1992 1984
1993 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, 1985 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
1994 int level, gfn_t gfn, pfn_t pfn) 1986 int level, gfn_t gfn, pfn_t pfn)
1995 { 1987 {
1996 struct kvm_shadow_walk_iterator iterator; 1988 struct kvm_shadow_walk_iterator iterator;
1997 struct kvm_mmu_page *sp; 1989 struct kvm_mmu_page *sp;
1998 int pt_write = 0; 1990 int pt_write = 0;
1999 gfn_t pseudo_gfn; 1991 gfn_t pseudo_gfn;
2000 1992
2001 for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { 1993 for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
2002 if (iterator.level == level) { 1994 if (iterator.level == level) {
2003 mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL, 1995 mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
2004 0, write, 1, &pt_write, 1996 0, write, 1, &pt_write,
2005 level, gfn, pfn, false, true); 1997 level, gfn, pfn, false, true);
2006 ++vcpu->stat.pf_fixed; 1998 ++vcpu->stat.pf_fixed;
2007 break; 1999 break;
2008 } 2000 }
2009 2001
2010 if (*iterator.sptep == shadow_trap_nonpresent_pte) { 2002 if (*iterator.sptep == shadow_trap_nonpresent_pte) {
2011 u64 base_addr = iterator.addr; 2003 u64 base_addr = iterator.addr;
2012 2004
2013 base_addr &= PT64_LVL_ADDR_MASK(iterator.level); 2005 base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
2014 pseudo_gfn = base_addr >> PAGE_SHIFT; 2006 pseudo_gfn = base_addr >> PAGE_SHIFT;
2015 sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr, 2007 sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
2016 iterator.level - 1, 2008 iterator.level - 1,
2017 1, ACC_ALL, iterator.sptep); 2009 1, ACC_ALL, iterator.sptep);
2018 if (!sp) { 2010 if (!sp) {
2019 pgprintk("nonpaging_map: ENOMEM\n"); 2011 pgprintk("nonpaging_map: ENOMEM\n");
2020 kvm_release_pfn_clean(pfn); 2012 kvm_release_pfn_clean(pfn);
2021 return -ENOMEM; 2013 return -ENOMEM;
2022 } 2014 }
2023 2015
2024 __set_spte(iterator.sptep, 2016 __set_spte(iterator.sptep,
2025 __pa(sp->spt) 2017 __pa(sp->spt)
2026 | PT_PRESENT_MASK | PT_WRITABLE_MASK 2018 | PT_PRESENT_MASK | PT_WRITABLE_MASK
2027 | shadow_user_mask | shadow_x_mask); 2019 | shadow_user_mask | shadow_x_mask);
2028 } 2020 }
2029 } 2021 }
2030 return pt_write; 2022 return pt_write;
2031 } 2023 }
2032 2024
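__direct_map() above walks the shadow page table from the root down to the requested level, installing the leaf spte there and allocating an intermediate shadow page for every non-present entry it crosses. A minimal user-space sketch of that allocate-on-demand walk, with hypothetical table and index helpers that only mimic the 9-bits-per-level layout (not the kernel API):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512
#define PAGE_SHIFT        12
#define LEVEL_BITS        9

struct table {
	struct table *child[ENTRIES_PER_TABLE];
	uint64_t leaf[ENTRIES_PER_TABLE];
};

/* index of gfn within the table at a given level (9 bits per level) */
static int level_index(uint64_t gfn, int level)
{
	return (gfn >> ((level - 1) * LEVEL_BITS)) & (ENTRIES_PER_TABLE - 1);
}

/* descend from the level-4 root to target_level, allocating intermediate
 * tables on demand, then install the final mapping (error handling omitted) */
static void direct_map(struct table *root, uint64_t gfn, uint64_t pfn, int target_level)
{
	struct table *t = root;
	int level;

	for (level = 4; level > target_level; --level) {
		int idx = level_index(gfn, level);

		if (!t->child[idx])
			t->child[idx] = calloc(1, sizeof(*t));
		t = t->child[idx];
	}
	t->leaf[level_index(gfn, target_level)] = pfn << PAGE_SHIFT;
}

int main(void)
{
	struct table *root = calloc(1, sizeof(*root));
	struct table *t = root;
	int level;

	direct_map(root, 0x12345, 0xabcde, 1);
	for (level = 4; level > 1; --level)
		t = t->child[level_index(0x12345, level)];
	printf("gfn 0x12345 maps to spte %#llx\n",
	       (unsigned long long)t->leaf[level_index(0x12345, 1)]);
	return 0;
}
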
2033 static void kvm_send_hwpoison_signal(struct kvm *kvm, gfn_t gfn) 2025 static void kvm_send_hwpoison_signal(struct kvm *kvm, gfn_t gfn)
2034 { 2026 {
2035 char buf[1]; 2027 char buf[1];
2036 void __user *hva; 2028 void __user *hva;
2037 int r; 2029 int r;
2038 2030
2039 /* Touch the page, so send SIGBUS */ 2031 /* Touch the page, so send SIGBUS */
2040 hva = (void __user *)gfn_to_hva(kvm, gfn); 2032 hva = (void __user *)gfn_to_hva(kvm, gfn);
2041 r = copy_from_user(buf, hva, 1); 2033 r = copy_from_user(buf, hva, 1);
2042 } 2034 }
2043 2035
2044 static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn) 2036 static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn)
2045 { 2037 {
2046 kvm_release_pfn_clean(pfn); 2038 kvm_release_pfn_clean(pfn);
2047 if (is_hwpoison_pfn(pfn)) { 2039 if (is_hwpoison_pfn(pfn)) {
2048 kvm_send_hwpoison_signal(kvm, gfn); 2040 kvm_send_hwpoison_signal(kvm, gfn);
2049 return 0; 2041 return 0;
2050 } 2042 }
2051 return 1; 2043 return 1;
2052 } 2044 }
2053 2045
2054 static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) 2046 static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
2055 { 2047 {
2056 int r; 2048 int r;
2057 int level; 2049 int level;
2058 pfn_t pfn; 2050 pfn_t pfn;
2059 unsigned long mmu_seq; 2051 unsigned long mmu_seq;
2060 2052
2061 level = mapping_level(vcpu, gfn); 2053 level = mapping_level(vcpu, gfn);
2062 2054
2063 /* 2055 /*
2064 * This path builds a PAE pagetable - so we can map 2mb pages at 2056 * This path builds a PAE pagetable - so we can map 2mb pages at
2065 * maximum. Therefore check if the level is larger than that. 2057 * maximum. Therefore check if the level is larger than that.
2066 */ 2058 */
2067 if (level > PT_DIRECTORY_LEVEL) 2059 if (level > PT_DIRECTORY_LEVEL)
2068 level = PT_DIRECTORY_LEVEL; 2060 level = PT_DIRECTORY_LEVEL;
2069 2061
2070 gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); 2062 gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
2071 2063
2072 mmu_seq = vcpu->kvm->mmu_notifier_seq; 2064 mmu_seq = vcpu->kvm->mmu_notifier_seq;
2073 smp_rmb(); 2065 smp_rmb();
2074 pfn = gfn_to_pfn(vcpu->kvm, gfn); 2066 pfn = gfn_to_pfn(vcpu->kvm, gfn);
2075 2067
2076 /* mmio */ 2068 /* mmio */
2077 if (is_error_pfn(pfn)) 2069 if (is_error_pfn(pfn))
2078 return kvm_handle_bad_page(vcpu->kvm, gfn, pfn); 2070 return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
2079 2071
2080 spin_lock(&vcpu->kvm->mmu_lock); 2072 spin_lock(&vcpu->kvm->mmu_lock);
2081 if (mmu_notifier_retry(vcpu, mmu_seq)) 2073 if (mmu_notifier_retry(vcpu, mmu_seq))
2082 goto out_unlock; 2074 goto out_unlock;
2083 kvm_mmu_free_some_pages(vcpu); 2075 kvm_mmu_free_some_pages(vcpu);
2084 r = __direct_map(vcpu, v, write, level, gfn, pfn); 2076 r = __direct_map(vcpu, v, write, level, gfn, pfn);
2085 spin_unlock(&vcpu->kvm->mmu_lock); 2077 spin_unlock(&vcpu->kvm->mmu_lock);
2086 2078
2087 2079
2088 return r; 2080 return r;
2089 2081
2090 out_unlock: 2082 out_unlock:
2091 spin_unlock(&vcpu->kvm->mmu_lock); 2083 spin_unlock(&vcpu->kvm->mmu_lock);
2092 kvm_release_pfn_clean(pfn); 2084 kvm_release_pfn_clean(pfn);
2093 return 0; 2085 return 0;
2094 } 2086 }
2095 2087
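nonpaging_map() reads mmu_notifier_seq and looks up the pfn outside mmu_lock, then re-validates the sequence under the lock (mmu_notifier_retry() also checks an in-progress invalidation count, omitted here) and bails out if an invalidation raced with the lookup. Roughly the same snapshot-then-validate pattern, sketched with hypothetical names and C11 atomics standing in for the kernel's barriers:

#include <stdatomic.h>
#include <stdio.h>

/* hypothetical stand-ins for kvm->mmu_notifier_seq and gfn_to_pfn() */
static atomic_ulong mmu_notifier_seq;

static unsigned long lookup_pfn_slow(unsigned long gfn)
{
	return gfn + 0x100000;	/* pretend host lookup; may sleep, so done unlocked */
}

static int map_gfn(unsigned long gfn)
{
	unsigned long seq = atomic_load(&mmu_notifier_seq);	/* snapshot first */
	unsigned long pfn = lookup_pfn_slow(gfn);		/* outside mmu_lock */

	/* ...take mmu_lock here in the real code... */
	if (atomic_load(&mmu_notifier_seq) != seq) {
		/* an invalidation ran while we looked up the pfn: drop it
		 * and let the guest retry the fault */
		return 0;
	}
	/* safe to install pfn into the shadow page table, then unlock */
	printf("installing pfn %#lx for gfn %#lx\n", pfn, gfn);
	return 1;
}

int main(void)
{
	printf("mapped gfn 42: %d\n", map_gfn(42));
	return 0;
}
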
2096 2088
2097 static void mmu_free_roots(struct kvm_vcpu *vcpu) 2089 static void mmu_free_roots(struct kvm_vcpu *vcpu)
2098 { 2090 {
2099 int i; 2091 int i;
2100 struct kvm_mmu_page *sp; 2092 struct kvm_mmu_page *sp;
2101 LIST_HEAD(invalid_list); 2093 LIST_HEAD(invalid_list);
2102 2094
2103 if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) 2095 if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
2104 return; 2096 return;
2105 spin_lock(&vcpu->kvm->mmu_lock); 2097 spin_lock(&vcpu->kvm->mmu_lock);
2106 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { 2098 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
2107 hpa_t root = vcpu->arch.mmu.root_hpa; 2099 hpa_t root = vcpu->arch.mmu.root_hpa;
2108 2100
2109 sp = page_header(root); 2101 sp = page_header(root);
2110 --sp->root_count; 2102 --sp->root_count;
2111 if (!sp->root_count && sp->role.invalid) { 2103 if (!sp->root_count && sp->role.invalid) {
2112 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); 2104 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
2113 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 2105 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2114 } 2106 }
2115 vcpu->arch.mmu.root_hpa = INVALID_PAGE; 2107 vcpu->arch.mmu.root_hpa = INVALID_PAGE;
2116 spin_unlock(&vcpu->kvm->mmu_lock); 2108 spin_unlock(&vcpu->kvm->mmu_lock);
2117 return; 2109 return;
2118 } 2110 }
2119 for (i = 0; i < 4; ++i) { 2111 for (i = 0; i < 4; ++i) {
2120 hpa_t root = vcpu->arch.mmu.pae_root[i]; 2112 hpa_t root = vcpu->arch.mmu.pae_root[i];
2121 2113
2122 if (root) { 2114 if (root) {
2123 root &= PT64_BASE_ADDR_MASK; 2115 root &= PT64_BASE_ADDR_MASK;
2124 sp = page_header(root); 2116 sp = page_header(root);
2125 --sp->root_count; 2117 --sp->root_count;
2126 if (!sp->root_count && sp->role.invalid) 2118 if (!sp->root_count && sp->role.invalid)
2127 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, 2119 kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
2128 &invalid_list); 2120 &invalid_list);
2129 } 2121 }
2130 vcpu->arch.mmu.pae_root[i] = INVALID_PAGE; 2122 vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
2131 } 2123 }
2132 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 2124 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2133 spin_unlock(&vcpu->kvm->mmu_lock); 2125 spin_unlock(&vcpu->kvm->mmu_lock);
2134 vcpu->arch.mmu.root_hpa = INVALID_PAGE; 2126 vcpu->arch.mmu.root_hpa = INVALID_PAGE;
2135 } 2127 }
2136 2128
2137 static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn) 2129 static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
2138 { 2130 {
2139 int ret = 0; 2131 int ret = 0;
2140 2132
2141 if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) { 2133 if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) {
2142 set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); 2134 set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
2143 ret = 1; 2135 ret = 1;
2144 } 2136 }
2145 2137
2146 return ret; 2138 return ret;
2147 } 2139 }
2148 2140
2149 static int mmu_alloc_roots(struct kvm_vcpu *vcpu) 2141 static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
2150 { 2142 {
2151 int i; 2143 int i;
2152 gfn_t root_gfn; 2144 gfn_t root_gfn;
2153 struct kvm_mmu_page *sp; 2145 struct kvm_mmu_page *sp;
2154 int direct = 0; 2146 int direct = 0;
2155 u64 pdptr; 2147 u64 pdptr;
2156 2148
2157 root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT; 2149 root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
2158 2150
2159 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { 2151 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
2160 hpa_t root = vcpu->arch.mmu.root_hpa; 2152 hpa_t root = vcpu->arch.mmu.root_hpa;
2161 2153
2162 ASSERT(!VALID_PAGE(root)); 2154 ASSERT(!VALID_PAGE(root));
2163 if (mmu_check_root(vcpu, root_gfn)) 2155 if (mmu_check_root(vcpu, root_gfn))
2164 return 1; 2156 return 1;
2165 if (tdp_enabled) { 2157 if (tdp_enabled) {
2166 direct = 1; 2158 direct = 1;
2167 root_gfn = 0; 2159 root_gfn = 0;
2168 } 2160 }
2169 spin_lock(&vcpu->kvm->mmu_lock); 2161 spin_lock(&vcpu->kvm->mmu_lock);
2170 kvm_mmu_free_some_pages(vcpu); 2162 kvm_mmu_free_some_pages(vcpu);
2171 sp = kvm_mmu_get_page(vcpu, root_gfn, 0, 2163 sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
2172 PT64_ROOT_LEVEL, direct, 2164 PT64_ROOT_LEVEL, direct,
2173 ACC_ALL, NULL); 2165 ACC_ALL, NULL);
2174 root = __pa(sp->spt); 2166 root = __pa(sp->spt);
2175 ++sp->root_count; 2167 ++sp->root_count;
2176 spin_unlock(&vcpu->kvm->mmu_lock); 2168 spin_unlock(&vcpu->kvm->mmu_lock);
2177 vcpu->arch.mmu.root_hpa = root; 2169 vcpu->arch.mmu.root_hpa = root;
2178 return 0; 2170 return 0;
2179 } 2171 }
2180 direct = !is_paging(vcpu); 2172 direct = !is_paging(vcpu);
2181 for (i = 0; i < 4; ++i) { 2173 for (i = 0; i < 4; ++i) {
2182 hpa_t root = vcpu->arch.mmu.pae_root[i]; 2174 hpa_t root = vcpu->arch.mmu.pae_root[i];
2183 2175
2184 ASSERT(!VALID_PAGE(root)); 2176 ASSERT(!VALID_PAGE(root));
2185 if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) { 2177 if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
2186 pdptr = kvm_pdptr_read(vcpu, i); 2178 pdptr = kvm_pdptr_read(vcpu, i);
2187 if (!is_present_gpte(pdptr)) { 2179 if (!is_present_gpte(pdptr)) {
2188 vcpu->arch.mmu.pae_root[i] = 0; 2180 vcpu->arch.mmu.pae_root[i] = 0;
2189 continue; 2181 continue;
2190 } 2182 }
2191 root_gfn = pdptr >> PAGE_SHIFT; 2183 root_gfn = pdptr >> PAGE_SHIFT;
2192 } else if (vcpu->arch.mmu.root_level == 0) 2184 } else if (vcpu->arch.mmu.root_level == 0)
2193 root_gfn = 0; 2185 root_gfn = 0;
2194 if (mmu_check_root(vcpu, root_gfn)) 2186 if (mmu_check_root(vcpu, root_gfn))
2195 return 1; 2187 return 1;
2196 if (tdp_enabled) { 2188 if (tdp_enabled) {
2197 direct = 1; 2189 direct = 1;
2198 root_gfn = i << 30; 2190 root_gfn = i << 30;
2199 } 2191 }
2200 spin_lock(&vcpu->kvm->mmu_lock); 2192 spin_lock(&vcpu->kvm->mmu_lock);
2201 kvm_mmu_free_some_pages(vcpu); 2193 kvm_mmu_free_some_pages(vcpu);
2202 sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, 2194 sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
2203 PT32_ROOT_LEVEL, direct, 2195 PT32_ROOT_LEVEL, direct,
2204 ACC_ALL, NULL); 2196 ACC_ALL, NULL);
2205 root = __pa(sp->spt); 2197 root = __pa(sp->spt);
2206 ++sp->root_count; 2198 ++sp->root_count;
2207 spin_unlock(&vcpu->kvm->mmu_lock); 2199 spin_unlock(&vcpu->kvm->mmu_lock);
2208 2200
2209 vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK; 2201 vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
2210 } 2202 }
2211 vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root); 2203 vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
2212 return 0; 2204 return 0;
2213 } 2205 }
2214 2206
2215 static void mmu_sync_roots(struct kvm_vcpu *vcpu) 2207 static void mmu_sync_roots(struct kvm_vcpu *vcpu)
2216 { 2208 {
2217 int i; 2209 int i;
2218 struct kvm_mmu_page *sp; 2210 struct kvm_mmu_page *sp;
2219 2211
2220 if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) 2212 if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
2221 return; 2213 return;
2222 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { 2214 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
2223 hpa_t root = vcpu->arch.mmu.root_hpa; 2215 hpa_t root = vcpu->arch.mmu.root_hpa;
2224 sp = page_header(root); 2216 sp = page_header(root);
2225 mmu_sync_children(vcpu, sp); 2217 mmu_sync_children(vcpu, sp);
2226 return; 2218 return;
2227 } 2219 }
2228 for (i = 0; i < 4; ++i) { 2220 for (i = 0; i < 4; ++i) {
2229 hpa_t root = vcpu->arch.mmu.pae_root[i]; 2221 hpa_t root = vcpu->arch.mmu.pae_root[i];
2230 2222
2231 if (root && VALID_PAGE(root)) { 2223 if (root && VALID_PAGE(root)) {
2232 root &= PT64_BASE_ADDR_MASK; 2224 root &= PT64_BASE_ADDR_MASK;
2233 sp = page_header(root); 2225 sp = page_header(root);
2234 mmu_sync_children(vcpu, sp); 2226 mmu_sync_children(vcpu, sp);
2235 } 2227 }
2236 } 2228 }
2237 } 2229 }
2238 2230
2239 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) 2231 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
2240 { 2232 {
2241 spin_lock(&vcpu->kvm->mmu_lock); 2233 spin_lock(&vcpu->kvm->mmu_lock);
2242 mmu_sync_roots(vcpu); 2234 mmu_sync_roots(vcpu);
2243 spin_unlock(&vcpu->kvm->mmu_lock); 2235 spin_unlock(&vcpu->kvm->mmu_lock);
2244 } 2236 }
2245 2237
2246 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr, 2238 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
2247 u32 access, u32 *error) 2239 u32 access, u32 *error)
2248 { 2240 {
2249 if (error) 2241 if (error)
2250 *error = 0; 2242 *error = 0;
2251 return vaddr; 2243 return vaddr;
2252 } 2244 }
2253 2245
2254 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, 2246 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
2255 u32 error_code) 2247 u32 error_code)
2256 { 2248 {
2257 gfn_t gfn; 2249 gfn_t gfn;
2258 int r; 2250 int r;
2259 2251
2260 pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code); 2252 pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
2261 r = mmu_topup_memory_caches(vcpu); 2253 r = mmu_topup_memory_caches(vcpu);
2262 if (r) 2254 if (r)
2263 return r; 2255 return r;
2264 2256
2265 ASSERT(vcpu); 2257 ASSERT(vcpu);
2266 ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); 2258 ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
2267 2259
2268 gfn = gva >> PAGE_SHIFT; 2260 gfn = gva >> PAGE_SHIFT;
2269 2261
2270 return nonpaging_map(vcpu, gva & PAGE_MASK, 2262 return nonpaging_map(vcpu, gva & PAGE_MASK,
2271 error_code & PFERR_WRITE_MASK, gfn); 2263 error_code & PFERR_WRITE_MASK, gfn);
2272 } 2264 }
2273 2265
2274 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, 2266 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
2275 u32 error_code) 2267 u32 error_code)
2276 { 2268 {
2277 pfn_t pfn; 2269 pfn_t pfn;
2278 int r; 2270 int r;
2279 int level; 2271 int level;
2280 gfn_t gfn = gpa >> PAGE_SHIFT; 2272 gfn_t gfn = gpa >> PAGE_SHIFT;
2281 unsigned long mmu_seq; 2273 unsigned long mmu_seq;
2282 2274
2283 ASSERT(vcpu); 2275 ASSERT(vcpu);
2284 ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); 2276 ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
2285 2277
2286 r = mmu_topup_memory_caches(vcpu); 2278 r = mmu_topup_memory_caches(vcpu);
2287 if (r) 2279 if (r)
2288 return r; 2280 return r;
2289 2281
2290 level = mapping_level(vcpu, gfn); 2282 level = mapping_level(vcpu, gfn);
2291 2283
2292 gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); 2284 gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
2293 2285
2294 mmu_seq = vcpu->kvm->mmu_notifier_seq; 2286 mmu_seq = vcpu->kvm->mmu_notifier_seq;
2295 smp_rmb(); 2287 smp_rmb();
2296 pfn = gfn_to_pfn(vcpu->kvm, gfn); 2288 pfn = gfn_to_pfn(vcpu->kvm, gfn);
2297 if (is_error_pfn(pfn)) 2289 if (is_error_pfn(pfn))
2298 return kvm_handle_bad_page(vcpu->kvm, gfn, pfn); 2290 return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
2299 spin_lock(&vcpu->kvm->mmu_lock); 2291 spin_lock(&vcpu->kvm->mmu_lock);
2300 if (mmu_notifier_retry(vcpu, mmu_seq)) 2292 if (mmu_notifier_retry(vcpu, mmu_seq))
2301 goto out_unlock; 2293 goto out_unlock;
2302 kvm_mmu_free_some_pages(vcpu); 2294 kvm_mmu_free_some_pages(vcpu);
2303 r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, 2295 r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
2304 level, gfn, pfn); 2296 level, gfn, pfn);
2305 spin_unlock(&vcpu->kvm->mmu_lock); 2297 spin_unlock(&vcpu->kvm->mmu_lock);
2306 2298
2307 return r; 2299 return r;
2308 2300
2309 out_unlock: 2301 out_unlock:
2310 spin_unlock(&vcpu->kvm->mmu_lock); 2302 spin_unlock(&vcpu->kvm->mmu_lock);
2311 kvm_release_pfn_clean(pfn); 2303 kvm_release_pfn_clean(pfn);
2312 return 0; 2304 return 0;
2313 } 2305 }
2314 2306
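Both nonpaging_map() and tdp_page_fault() round the faulting gfn down to the first gfn of its large-page frame before mapping, so a single spte installed at PT_DIRECTORY_LEVEL covers the whole region. The masking step in isolation, assuming the usual 512 base pages per 2MB page:

#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
	/* at PT_DIRECTORY_LEVEL a 2MB page spans 512 4kB base pages */
	unsigned long pages_per_hpage = 1UL << 9;
	unsigned long gfn = 0x12345;	/* hypothetical faulting gfn */

	/* round down to the first gfn of the huge-page frame */
	unsigned long base_gfn = gfn & ~(pages_per_hpage - 1);

	printf("gfn %#lx -> base gfn %#lx (gpa %#lx)\n",
	       gfn, base_gfn, base_gfn << PAGE_SHIFT);
	return 0;
}
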
2315 static void nonpaging_free(struct kvm_vcpu *vcpu) 2307 static void nonpaging_free(struct kvm_vcpu *vcpu)
2316 { 2308 {
2317 mmu_free_roots(vcpu); 2309 mmu_free_roots(vcpu);
2318 } 2310 }
2319 2311
2320 static int nonpaging_init_context(struct kvm_vcpu *vcpu) 2312 static int nonpaging_init_context(struct kvm_vcpu *vcpu)
2321 { 2313 {
2322 struct kvm_mmu *context = &vcpu->arch.mmu; 2314 struct kvm_mmu *context = &vcpu->arch.mmu;
2323 2315
2324 context->new_cr3 = nonpaging_new_cr3; 2316 context->new_cr3 = nonpaging_new_cr3;
2325 context->page_fault = nonpaging_page_fault; 2317 context->page_fault = nonpaging_page_fault;
2326 context->gva_to_gpa = nonpaging_gva_to_gpa; 2318 context->gva_to_gpa = nonpaging_gva_to_gpa;
2327 context->free = nonpaging_free; 2319 context->free = nonpaging_free;
2328 context->prefetch_page = nonpaging_prefetch_page; 2320 context->prefetch_page = nonpaging_prefetch_page;
2329 context->sync_page = nonpaging_sync_page; 2321 context->sync_page = nonpaging_sync_page;
2330 context->invlpg = nonpaging_invlpg; 2322 context->invlpg = nonpaging_invlpg;
2331 context->root_level = 0; 2323 context->root_level = 0;
2332 context->shadow_root_level = PT32E_ROOT_LEVEL; 2324 context->shadow_root_level = PT32E_ROOT_LEVEL;
2333 context->root_hpa = INVALID_PAGE; 2325 context->root_hpa = INVALID_PAGE;
2334 return 0; 2326 return 0;
2335 } 2327 }
2336 2328
2337 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu) 2329 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
2338 { 2330 {
2339 ++vcpu->stat.tlb_flush; 2331 ++vcpu->stat.tlb_flush;
2340 set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests); 2332 set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
2341 } 2333 }
2342 2334
2343 static void paging_new_cr3(struct kvm_vcpu *vcpu) 2335 static void paging_new_cr3(struct kvm_vcpu *vcpu)
2344 { 2336 {
2345 pgprintk("%s: cr3 %lx\n", __func__, vcpu->arch.cr3); 2337 pgprintk("%s: cr3 %lx\n", __func__, vcpu->arch.cr3);
2346 mmu_free_roots(vcpu); 2338 mmu_free_roots(vcpu);
2347 } 2339 }
2348 2340
2349 static void inject_page_fault(struct kvm_vcpu *vcpu, 2341 static void inject_page_fault(struct kvm_vcpu *vcpu,
2350 u64 addr, 2342 u64 addr,
2351 u32 err_code) 2343 u32 err_code)
2352 { 2344 {
2353 kvm_inject_page_fault(vcpu, addr, err_code); 2345 kvm_inject_page_fault(vcpu, addr, err_code);
2354 } 2346 }
2355 2347
2356 static void paging_free(struct kvm_vcpu *vcpu) 2348 static void paging_free(struct kvm_vcpu *vcpu)
2357 { 2349 {
2358 nonpaging_free(vcpu); 2350 nonpaging_free(vcpu);
2359 } 2351 }
2360 2352
2361 static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level) 2353 static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
2362 { 2354 {
2363 int bit7; 2355 int bit7;
2364 2356
2365 bit7 = (gpte >> 7) & 1; 2357 bit7 = (gpte >> 7) & 1;
2366 return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0; 2358 return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
2367 } 2359 }
2368 2360
2369 #define PTTYPE 64 2361 #define PTTYPE 64
2370 #include "paging_tmpl.h" 2362 #include "paging_tmpl.h"
2371 #undef PTTYPE 2363 #undef PTTYPE
2372 2364
2373 #define PTTYPE 32 2365 #define PTTYPE 32
2374 #include "paging_tmpl.h" 2366 #include "paging_tmpl.h"
2375 #undef PTTYPE 2367 #undef PTTYPE
2376 2368
2377 static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level) 2369 static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
2378 { 2370 {
2379 struct kvm_mmu *context = &vcpu->arch.mmu; 2371 struct kvm_mmu *context = &vcpu->arch.mmu;
2380 int maxphyaddr = cpuid_maxphyaddr(vcpu); 2372 int maxphyaddr = cpuid_maxphyaddr(vcpu);
2381 u64 exb_bit_rsvd = 0; 2373 u64 exb_bit_rsvd = 0;
2382 2374
2383 if (!is_nx(vcpu)) 2375 if (!is_nx(vcpu))
2384 exb_bit_rsvd = rsvd_bits(63, 63); 2376 exb_bit_rsvd = rsvd_bits(63, 63);
2385 switch (level) { 2377 switch (level) {
2386 case PT32_ROOT_LEVEL: 2378 case PT32_ROOT_LEVEL:
2387 /* no rsvd bits for 2 level 4K page table entries */ 2379 /* no rsvd bits for 2 level 4K page table entries */
2388 context->rsvd_bits_mask[0][1] = 0; 2380 context->rsvd_bits_mask[0][1] = 0;
2389 context->rsvd_bits_mask[0][0] = 0; 2381 context->rsvd_bits_mask[0][0] = 0;
2390 context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; 2382 context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
2391 2383
2392 if (!is_pse(vcpu)) { 2384 if (!is_pse(vcpu)) {
2393 context->rsvd_bits_mask[1][1] = 0; 2385 context->rsvd_bits_mask[1][1] = 0;
2394 break; 2386 break;
2395 } 2387 }
2396 2388
2397 if (is_cpuid_PSE36()) 2389 if (is_cpuid_PSE36())
2398 /* 36bits PSE 4MB page */ 2390 /* 36bits PSE 4MB page */
2399 context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21); 2391 context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
2400 else 2392 else
2401 /* 32 bits PSE 4MB page */ 2393 /* 32 bits PSE 4MB page */
2402 context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21); 2394 context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
2403 break; 2395 break;
2404 case PT32E_ROOT_LEVEL: 2396 case PT32E_ROOT_LEVEL:
2405 context->rsvd_bits_mask[0][2] = 2397 context->rsvd_bits_mask[0][2] =
2406 rsvd_bits(maxphyaddr, 63) | 2398 rsvd_bits(maxphyaddr, 63) |
2407 rsvd_bits(7, 8) | rsvd_bits(1, 2); /* PDPTE */ 2399 rsvd_bits(7, 8) | rsvd_bits(1, 2); /* PDPTE */
2408 context->rsvd_bits_mask[0][1] = exb_bit_rsvd | 2400 context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
2409 rsvd_bits(maxphyaddr, 62); /* PDE */ 2401 rsvd_bits(maxphyaddr, 62); /* PDE */
2410 context->rsvd_bits_mask[0][0] = exb_bit_rsvd | 2402 context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
2411 rsvd_bits(maxphyaddr, 62); /* PTE */ 2403 rsvd_bits(maxphyaddr, 62); /* PTE */
2412 context->rsvd_bits_mask[1][1] = exb_bit_rsvd | 2404 context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
2413 rsvd_bits(maxphyaddr, 62) | 2405 rsvd_bits(maxphyaddr, 62) |
2414 rsvd_bits(13, 20); /* large page */ 2406 rsvd_bits(13, 20); /* large page */
2415 context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; 2407 context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
2416 break; 2408 break;
2417 case PT64_ROOT_LEVEL: 2409 case PT64_ROOT_LEVEL:
2418 context->rsvd_bits_mask[0][3] = exb_bit_rsvd | 2410 context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
2419 rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8); 2411 rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
2420 context->rsvd_bits_mask[0][2] = exb_bit_rsvd | 2412 context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
2421 rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8); 2413 rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
2422 context->rsvd_bits_mask[0][1] = exb_bit_rsvd | 2414 context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
2423 rsvd_bits(maxphyaddr, 51); 2415 rsvd_bits(maxphyaddr, 51);
2424 context->rsvd_bits_mask[0][0] = exb_bit_rsvd | 2416 context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
2425 rsvd_bits(maxphyaddr, 51); 2417 rsvd_bits(maxphyaddr, 51);
2426 context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3]; 2418 context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
2427 context->rsvd_bits_mask[1][2] = exb_bit_rsvd | 2419 context->rsvd_bits_mask[1][2] = exb_bit_rsvd |
2428 rsvd_bits(maxphyaddr, 51) | 2420 rsvd_bits(maxphyaddr, 51) |
2429 rsvd_bits(13, 29); 2421 rsvd_bits(13, 29);
2430 context->rsvd_bits_mask[1][1] = exb_bit_rsvd | 2422 context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
2431 rsvd_bits(maxphyaddr, 51) | 2423 rsvd_bits(maxphyaddr, 51) |
2432 rsvd_bits(13, 20); /* large page */ 2424 rsvd_bits(13, 20); /* large page */
2433 context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; 2425 context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
2434 break; 2426 break;
2435 } 2427 }
2436 } 2428 }
2437 2429
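reset_rsvds_bits_mask() precomputes, per paging mode and level, which PTE bits must be zero, and is_rsvd_bits_set() simply tests a guest PTE against the mask selected by bit 7 of the entry. A stand-alone model of that check; the rsvd_bits() helper is defined outside this hunk and is re-derived here only as a presumed "bits s..e set" mask:

#include <stdio.h>
#include <stdint.h>

/* presumed behaviour of the kernel's rsvd_bits(s, e): a mask with bits s..e set */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

int main(void)
{
	int maxphyaddr = 40;	/* example physical address width */

	/* 64-bit leaf PTE with NX unavailable: bits maxphyaddr..51 and 63 reserved */
	uint64_t mask = rsvd_bits(maxphyaddr, 51) | rsvd_bits(63, 63);

	uint64_t ok_pte  = 0x00000000deadb007ULL;	/* physical bits below bit 40 */
	uint64_t bad_pte = 0x0000100000000007ULL;	/* bit 44 set: inside 40..51 */

	printf("ok_pte reserved-bit violation:  %d\n", (ok_pte & mask) != 0);
	printf("bad_pte reserved-bit violation: %d\n", (bad_pte & mask) != 0);
	return 0;
}
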
2438 static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level) 2430 static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
2439 { 2431 {
2440 struct kvm_mmu *context = &vcpu->arch.mmu; 2432 struct kvm_mmu *context = &vcpu->arch.mmu;
2441 2433
2442 ASSERT(is_pae(vcpu)); 2434 ASSERT(is_pae(vcpu));
2443 context->new_cr3 = paging_new_cr3; 2435 context->new_cr3 = paging_new_cr3;
2444 context->page_fault = paging64_page_fault; 2436 context->page_fault = paging64_page_fault;
2445 context->gva_to_gpa = paging64_gva_to_gpa; 2437 context->gva_to_gpa = paging64_gva_to_gpa;
2446 context->prefetch_page = paging64_prefetch_page; 2438 context->prefetch_page = paging64_prefetch_page;
2447 context->sync_page = paging64_sync_page; 2439 context->sync_page = paging64_sync_page;
2448 context->invlpg = paging64_invlpg; 2440 context->invlpg = paging64_invlpg;
2449 context->free = paging_free; 2441 context->free = paging_free;
2450 context->root_level = level; 2442 context->root_level = level;
2451 context->shadow_root_level = level; 2443 context->shadow_root_level = level;
2452 context->root_hpa = INVALID_PAGE; 2444 context->root_hpa = INVALID_PAGE;
2453 return 0; 2445 return 0;
2454 } 2446 }
2455 2447
2456 static int paging64_init_context(struct kvm_vcpu *vcpu) 2448 static int paging64_init_context(struct kvm_vcpu *vcpu)
2457 { 2449 {
2458 reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL); 2450 reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
2459 return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL); 2451 return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
2460 } 2452 }
2461 2453
2462 static int paging32_init_context(struct kvm_vcpu *vcpu) 2454 static int paging32_init_context(struct kvm_vcpu *vcpu)
2463 { 2455 {
2464 struct kvm_mmu *context = &vcpu->arch.mmu; 2456 struct kvm_mmu *context = &vcpu->arch.mmu;
2465 2457
2466 reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL); 2458 reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
2467 context->new_cr3 = paging_new_cr3; 2459 context->new_cr3 = paging_new_cr3;
2468 context->page_fault = paging32_page_fault; 2460 context->page_fault = paging32_page_fault;
2469 context->gva_to_gpa = paging32_gva_to_gpa; 2461 context->gva_to_gpa = paging32_gva_to_gpa;
2470 context->free = paging_free; 2462 context->free = paging_free;
2471 context->prefetch_page = paging32_prefetch_page; 2463 context->prefetch_page = paging32_prefetch_page;
2472 context->sync_page = paging32_sync_page; 2464 context->sync_page = paging32_sync_page;
2473 context->invlpg = paging32_invlpg; 2465 context->invlpg = paging32_invlpg;
2474 context->root_level = PT32_ROOT_LEVEL; 2466 context->root_level = PT32_ROOT_LEVEL;
2475 context->shadow_root_level = PT32E_ROOT_LEVEL; 2467 context->shadow_root_level = PT32E_ROOT_LEVEL;
2476 context->root_hpa = INVALID_PAGE; 2468 context->root_hpa = INVALID_PAGE;
2477 return 0; 2469 return 0;
2478 } 2470 }
2479 2471
2480 static int paging32E_init_context(struct kvm_vcpu *vcpu) 2472 static int paging32E_init_context(struct kvm_vcpu *vcpu)
2481 { 2473 {
2482 reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL); 2474 reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
2483 return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL); 2475 return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL);
2484 } 2476 }
2485 2477
2486 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) 2478 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
2487 { 2479 {
2488 struct kvm_mmu *context = &vcpu->arch.mmu; 2480 struct kvm_mmu *context = &vcpu->arch.mmu;
2489 2481
2490 context->new_cr3 = nonpaging_new_cr3; 2482 context->new_cr3 = nonpaging_new_cr3;
2491 context->page_fault = tdp_page_fault; 2483 context->page_fault = tdp_page_fault;
2492 context->free = nonpaging_free; 2484 context->free = nonpaging_free;
2493 context->prefetch_page = nonpaging_prefetch_page; 2485 context->prefetch_page = nonpaging_prefetch_page;
2494 context->sync_page = nonpaging_sync_page; 2486 context->sync_page = nonpaging_sync_page;
2495 context->invlpg = nonpaging_invlpg; 2487 context->invlpg = nonpaging_invlpg;
2496 context->shadow_root_level = kvm_x86_ops->get_tdp_level(); 2488 context->shadow_root_level = kvm_x86_ops->get_tdp_level();
2497 context->root_hpa = INVALID_PAGE; 2489 context->root_hpa = INVALID_PAGE;
2498 2490
2499 if (!is_paging(vcpu)) { 2491 if (!is_paging(vcpu)) {
2500 context->gva_to_gpa = nonpaging_gva_to_gpa; 2492 context->gva_to_gpa = nonpaging_gva_to_gpa;
2501 context->root_level = 0; 2493 context->root_level = 0;
2502 } else if (is_long_mode(vcpu)) { 2494 } else if (is_long_mode(vcpu)) {
2503 reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL); 2495 reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
2504 context->gva_to_gpa = paging64_gva_to_gpa; 2496 context->gva_to_gpa = paging64_gva_to_gpa;
2505 context->root_level = PT64_ROOT_LEVEL; 2497 context->root_level = PT64_ROOT_LEVEL;
2506 } else if (is_pae(vcpu)) { 2498 } else if (is_pae(vcpu)) {
2507 reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL); 2499 reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
2508 context->gva_to_gpa = paging64_gva_to_gpa; 2500 context->gva_to_gpa = paging64_gva_to_gpa;
2509 context->root_level = PT32E_ROOT_LEVEL; 2501 context->root_level = PT32E_ROOT_LEVEL;
2510 } else { 2502 } else {
2511 reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL); 2503 reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
2512 context->gva_to_gpa = paging32_gva_to_gpa; 2504 context->gva_to_gpa = paging32_gva_to_gpa;
2513 context->root_level = PT32_ROOT_LEVEL; 2505 context->root_level = PT32_ROOT_LEVEL;
2514 } 2506 }
2515 2507
2516 return 0; 2508 return 0;
2517 } 2509 }
2518 2510
2519 static int init_kvm_softmmu(struct kvm_vcpu *vcpu) 2511 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
2520 { 2512 {
2521 int r; 2513 int r;
2522 2514
2523 ASSERT(vcpu); 2515 ASSERT(vcpu);
2524 ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); 2516 ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
2525 2517
2526 if (!is_paging(vcpu)) 2518 if (!is_paging(vcpu))
2527 r = nonpaging_init_context(vcpu); 2519 r = nonpaging_init_context(vcpu);
2528 else if (is_long_mode(vcpu)) 2520 else if (is_long_mode(vcpu))
2529 r = paging64_init_context(vcpu); 2521 r = paging64_init_context(vcpu);
2530 else if (is_pae(vcpu)) 2522 else if (is_pae(vcpu))
2531 r = paging32E_init_context(vcpu); 2523 r = paging32E_init_context(vcpu);
2532 else 2524 else
2533 r = paging32_init_context(vcpu); 2525 r = paging32_init_context(vcpu);
2534 2526
2535 vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu); 2527 vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
2536 vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu); 2528 vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
2537 2529
2538 return r; 2530 return r;
2539 } 2531 }
2540 2532
2541 static int init_kvm_mmu(struct kvm_vcpu *vcpu) 2533 static int init_kvm_mmu(struct kvm_vcpu *vcpu)
2542 { 2534 {
2543 vcpu->arch.update_pte.pfn = bad_pfn; 2535 vcpu->arch.update_pte.pfn = bad_pfn;
2544 2536
2545 if (tdp_enabled) 2537 if (tdp_enabled)
2546 return init_kvm_tdp_mmu(vcpu); 2538 return init_kvm_tdp_mmu(vcpu);
2547 else 2539 else
2548 return init_kvm_softmmu(vcpu); 2540 return init_kvm_softmmu(vcpu);
2549 } 2541 }
2550 2542
2551 static void destroy_kvm_mmu(struct kvm_vcpu *vcpu) 2543 static void destroy_kvm_mmu(struct kvm_vcpu *vcpu)
2552 { 2544 {
2553 ASSERT(vcpu); 2545 ASSERT(vcpu);
2554 if (VALID_PAGE(vcpu->arch.mmu.root_hpa)) 2546 if (VALID_PAGE(vcpu->arch.mmu.root_hpa))
2555 /* mmu.free() should set root_hpa = INVALID_PAGE */ 2547 /* mmu.free() should set root_hpa = INVALID_PAGE */
2556 vcpu->arch.mmu.free(vcpu); 2548 vcpu->arch.mmu.free(vcpu);
2557 } 2549 }
2558 2550
2559 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu) 2551 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
2560 { 2552 {
2561 destroy_kvm_mmu(vcpu); 2553 destroy_kvm_mmu(vcpu);
2562 return init_kvm_mmu(vcpu); 2554 return init_kvm_mmu(vcpu);
2563 } 2555 }
2564 EXPORT_SYMBOL_GPL(kvm_mmu_reset_context); 2556 EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
2565 2557
2566 int kvm_mmu_load(struct kvm_vcpu *vcpu) 2558 int kvm_mmu_load(struct kvm_vcpu *vcpu)
2567 { 2559 {
2568 int r; 2560 int r;
2569 2561
2570 r = mmu_topup_memory_caches(vcpu); 2562 r = mmu_topup_memory_caches(vcpu);
2571 if (r) 2563 if (r)
2572 goto out; 2564 goto out;
2573 r = mmu_alloc_roots(vcpu); 2565 r = mmu_alloc_roots(vcpu);
2574 spin_lock(&vcpu->kvm->mmu_lock); 2566 spin_lock(&vcpu->kvm->mmu_lock);
2575 mmu_sync_roots(vcpu); 2567 mmu_sync_roots(vcpu);
2576 spin_unlock(&vcpu->kvm->mmu_lock); 2568 spin_unlock(&vcpu->kvm->mmu_lock);
2577 if (r) 2569 if (r)
2578 goto out; 2570 goto out;
2579 /* set_cr3() should ensure TLB has been flushed */ 2571 /* set_cr3() should ensure TLB has been flushed */
2580 kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa); 2572 kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
2581 out: 2573 out:
2582 return r; 2574 return r;
2583 } 2575 }
2584 EXPORT_SYMBOL_GPL(kvm_mmu_load); 2576 EXPORT_SYMBOL_GPL(kvm_mmu_load);
2585 2577
2586 void kvm_mmu_unload(struct kvm_vcpu *vcpu) 2578 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
2587 { 2579 {
2588 mmu_free_roots(vcpu); 2580 mmu_free_roots(vcpu);
2589 } 2581 }
2590 2582
2591 static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu, 2583 static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
2592 struct kvm_mmu_page *sp, 2584 struct kvm_mmu_page *sp,
2593 u64 *spte) 2585 u64 *spte)
2594 { 2586 {
2595 u64 pte; 2587 u64 pte;
2596 struct kvm_mmu_page *child; 2588 struct kvm_mmu_page *child;
2597 2589
2598 pte = *spte; 2590 pte = *spte;
2599 if (is_shadow_present_pte(pte)) { 2591 if (is_shadow_present_pte(pte)) {
2600 if (is_last_spte(pte, sp->role.level)) 2592 if (is_last_spte(pte, sp->role.level))
2601 rmap_remove(vcpu->kvm, spte); 2593 rmap_remove(vcpu->kvm, spte);
2602 else { 2594 else {
2603 child = page_header(pte & PT64_BASE_ADDR_MASK); 2595 child = page_header(pte & PT64_BASE_ADDR_MASK);
2604 mmu_page_remove_parent_pte(child, spte); 2596 mmu_page_remove_parent_pte(child, spte);
2605 } 2597 }
2606 } 2598 }
2607 __set_spte(spte, shadow_trap_nonpresent_pte); 2599 __set_spte(spte, shadow_trap_nonpresent_pte);
2608 if (is_large_pte(pte)) 2600 if (is_large_pte(pte))
2609 --vcpu->kvm->stat.lpages; 2601 --vcpu->kvm->stat.lpages;
2610 } 2602 }
2611 2603
2612 static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu, 2604 static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
2613 struct kvm_mmu_page *sp, 2605 struct kvm_mmu_page *sp,
2614 u64 *spte, 2606 u64 *spte,
2615 const void *new) 2607 const void *new)
2616 { 2608 {
2617 if (sp->role.level != PT_PAGE_TABLE_LEVEL) { 2609 if (sp->role.level != PT_PAGE_TABLE_LEVEL) {
2618 ++vcpu->kvm->stat.mmu_pde_zapped; 2610 ++vcpu->kvm->stat.mmu_pde_zapped;
2619 return; 2611 return;
2620 } 2612 }
2621 2613
2622 ++vcpu->kvm->stat.mmu_pte_updated; 2614 ++vcpu->kvm->stat.mmu_pte_updated;
2623 if (!sp->role.cr4_pae) 2615 if (!sp->role.cr4_pae)
2624 paging32_update_pte(vcpu, sp, spte, new); 2616 paging32_update_pte(vcpu, sp, spte, new);
2625 else 2617 else
2626 paging64_update_pte(vcpu, sp, spte, new); 2618 paging64_update_pte(vcpu, sp, spte, new);
2627 } 2619 }
2628 2620
2629 static bool need_remote_flush(u64 old, u64 new) 2621 static bool need_remote_flush(u64 old, u64 new)
2630 { 2622 {
2631 if (!is_shadow_present_pte(old)) 2623 if (!is_shadow_present_pte(old))
2632 return false; 2624 return false;
2633 if (!is_shadow_present_pte(new)) 2625 if (!is_shadow_present_pte(new))
2634 return true; 2626 return true;
2635 if ((old ^ new) & PT64_BASE_ADDR_MASK) 2627 if ((old ^ new) & PT64_BASE_ADDR_MASK)
2636 return true; 2628 return true;
2637 old ^= PT64_NX_MASK; 2629 old ^= PT64_NX_MASK;
2638 new ^= PT64_NX_MASK; 2630 new ^= PT64_NX_MASK;
2639 return (old & ~new & PT64_PERM_MASK) != 0; 2631 return (old & ~new & PT64_PERM_MASK) != 0;
2640 } 2632 }
2641 2633
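need_remote_flush() reports whether other VCPUs must flush their TLBs after a spte change: only when the old entry was present and the new one drops the mapping, points at a different frame, or removes a permission (the NX xor makes "NX newly set" count as a removed execute permission). A toy model with hypothetical bit masks standing in for the kernel's spte layout, exercising the same expression:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* hypothetical stand-ins for the kernel's spte bit masks */
#define PRESENT_BIT   (1ULL << 0)
#define WRITABLE_BIT  (1ULL << 1)
#define USER_BIT      (1ULL << 2)
#define NX_BIT        (1ULL << 63)
#define ADDR_MASK     0x000ffffffffff000ULL
#define PERM_MASK     (PRESENT_BIT | WRITABLE_BIT | USER_BIT | NX_BIT)

static bool is_present(uint64_t spte) { return spte & PRESENT_BIT; }

static bool need_remote_flush(uint64_t old, uint64_t new)
{
	if (!is_present(old))
		return false;			/* nothing could be cached */
	if (!is_present(new))
		return true;			/* mapping removed */
	if ((old ^ new) & ADDR_MASK)
		return true;			/* points at a different frame */
	old ^= NX_BIT;				/* treat "NX set" as "exec clear" */
	new ^= NX_BIT;
	return (old & ~new & PERM_MASK) != 0;	/* some permission was taken away */
}

int main(void)
{
	uint64_t base = PRESENT_BIT | WRITABLE_BIT | 0x1000;

	printf("drop writable: %d\n", need_remote_flush(base, base & ~WRITABLE_BIT));
	printf("add user bit:  %d\n", need_remote_flush(base, base | USER_BIT));
	printf("set NX bit:    %d\n", need_remote_flush(base, base | NX_BIT));
	return 0;
}
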
2642 static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page, 2634 static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page,
2643 bool remote_flush, bool local_flush) 2635 bool remote_flush, bool local_flush)
2644 { 2636 {
2645 if (zap_page) 2637 if (zap_page)
2646 return; 2638 return;
2647 2639
2648 if (remote_flush) 2640 if (remote_flush)
2649 kvm_flush_remote_tlbs(vcpu->kvm); 2641 kvm_flush_remote_tlbs(vcpu->kvm);
2650 else if (local_flush) 2642 else if (local_flush)
2651 kvm_mmu_flush_tlb(vcpu); 2643 kvm_mmu_flush_tlb(vcpu);
2652 } 2644 }
2653 2645
2654 static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu) 2646 static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
2655 { 2647 {
2656 u64 *spte = vcpu->arch.last_pte_updated; 2648 u64 *spte = vcpu->arch.last_pte_updated;
2657 2649
2658 return !!(spte && (*spte & shadow_accessed_mask)); 2650 return !!(spte && (*spte & shadow_accessed_mask));
2659 } 2651 }
2660 2652
2661 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, 2653 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
2662 u64 gpte) 2654 u64 gpte)
2663 { 2655 {
2664 gfn_t gfn; 2656 gfn_t gfn;
2665 pfn_t pfn; 2657 pfn_t pfn;
2666 2658
2667 if (!is_present_gpte(gpte)) 2659 if (!is_present_gpte(gpte))
2668 return; 2660 return;
2669 gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; 2661 gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
2670 2662
2671 vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq; 2663 vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
2672 smp_rmb(); 2664 smp_rmb();
2673 pfn = gfn_to_pfn(vcpu->kvm, gfn); 2665 pfn = gfn_to_pfn(vcpu->kvm, gfn);
2674 2666
2675 if (is_error_pfn(pfn)) { 2667 if (is_error_pfn(pfn)) {
2676 kvm_release_pfn_clean(pfn); 2668 kvm_release_pfn_clean(pfn);
2677 return; 2669 return;
2678 } 2670 }
2679 vcpu->arch.update_pte.gfn = gfn; 2671 vcpu->arch.update_pte.gfn = gfn;
2680 vcpu->arch.update_pte.pfn = pfn; 2672 vcpu->arch.update_pte.pfn = pfn;
2681 } 2673 }
2682 2674
2683 static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn) 2675 static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
2684 { 2676 {
2685 u64 *spte = vcpu->arch.last_pte_updated; 2677 u64 *spte = vcpu->arch.last_pte_updated;
2686 2678
2687 if (spte 2679 if (spte
2688 && vcpu->arch.last_pte_gfn == gfn 2680 && vcpu->arch.last_pte_gfn == gfn
2689 && shadow_accessed_mask 2681 && shadow_accessed_mask
2690 && !(*spte & shadow_accessed_mask) 2682 && !(*spte & shadow_accessed_mask)
2691 && is_shadow_present_pte(*spte)) 2683 && is_shadow_present_pte(*spte))
2692 set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte); 2684 set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
2693 } 2685 }
2694 2686
2695 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, 2687 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
2696 const u8 *new, int bytes, 2688 const u8 *new, int bytes,
2697 bool guest_initiated) 2689 bool guest_initiated)
2698 { 2690 {
2699 gfn_t gfn = gpa >> PAGE_SHIFT; 2691 gfn_t gfn = gpa >> PAGE_SHIFT;
2700 struct kvm_mmu_page *sp; 2692 struct kvm_mmu_page *sp;
2701 struct hlist_node *node; 2693 struct hlist_node *node;
2702 LIST_HEAD(invalid_list); 2694 LIST_HEAD(invalid_list);
2703 u64 entry, gentry; 2695 u64 entry, gentry;
2704 u64 *spte; 2696 u64 *spte;
2705 unsigned offset = offset_in_page(gpa); 2697 unsigned offset = offset_in_page(gpa);
2706 unsigned pte_size; 2698 unsigned pte_size;
2707 unsigned page_offset; 2699 unsigned page_offset;
2708 unsigned misaligned; 2700 unsigned misaligned;
2709 unsigned quadrant; 2701 unsigned quadrant;
2710 int level; 2702 int level;
2711 int flooded = 0; 2703 int flooded = 0;
2712 int npte; 2704 int npte;
2713 int r; 2705 int r;
2714 int invlpg_counter; 2706 int invlpg_counter;
2715 bool remote_flush, local_flush, zap_page; 2707 bool remote_flush, local_flush, zap_page;
2716 2708
2717 zap_page = remote_flush = local_flush = false; 2709 zap_page = remote_flush = local_flush = false;
2718 2710
2719 pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); 2711 pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
2720 2712
2721 invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter); 2713 invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
2722 2714
2723 /* 2715 /*
2724 * Assume that the pte write on a page table of the same type 2716 * Assume that the pte write on a page table of the same type
2725 * as the current vcpu paging mode. This is nearly always true 2717 * as the current vcpu paging mode. This is nearly always true
2726 * (might be false while changing modes). Note it is verified later 2718 * (might be false while changing modes). Note it is verified later
2727 * by update_pte(). 2719 * by update_pte().
2728 */ 2720 */
2729 if ((is_pae(vcpu) && bytes == 4) || !new) { 2721 if ((is_pae(vcpu) && bytes == 4) || !new) {
2730 /* Handle a 32-bit guest writing two halves of a 64-bit gpte */ 2722 /* Handle a 32-bit guest writing two halves of a 64-bit gpte */
2731 if (is_pae(vcpu)) { 2723 if (is_pae(vcpu)) {
2732 gpa &= ~(gpa_t)7; 2724 gpa &= ~(gpa_t)7;
2733 bytes = 8; 2725 bytes = 8;
2734 } 2726 }
2735 r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8)); 2727 r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
2736 if (r) 2728 if (r)
2737 gentry = 0; 2729 gentry = 0;
2738 new = (const u8 *)&gentry; 2730 new = (const u8 *)&gentry;
2739 } 2731 }
2740 2732
2741 switch (bytes) { 2733 switch (bytes) {
2742 case 4: 2734 case 4:
2743 gentry = *(const u32 *)new; 2735 gentry = *(const u32 *)new;
2744 break; 2736 break;
2745 case 8: 2737 case 8:
2746 gentry = *(const u64 *)new; 2738 gentry = *(const u64 *)new;
2747 break; 2739 break;
2748 default: 2740 default:
2749 gentry = 0; 2741 gentry = 0;
2750 break; 2742 break;
2751 } 2743 }
2752 2744
2753 mmu_guess_page_from_pte_write(vcpu, gpa, gentry); 2745 mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
2754 spin_lock(&vcpu->kvm->mmu_lock); 2746 spin_lock(&vcpu->kvm->mmu_lock);
2755 if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter) 2747 if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
2756 gentry = 0; 2748 gentry = 0;
2757 kvm_mmu_access_page(vcpu, gfn); 2749 kvm_mmu_access_page(vcpu, gfn);
2758 kvm_mmu_free_some_pages(vcpu); 2750 kvm_mmu_free_some_pages(vcpu);
2759 ++vcpu->kvm->stat.mmu_pte_write; 2751 ++vcpu->kvm->stat.mmu_pte_write;
2760 kvm_mmu_audit(vcpu, "pre pte write"); 2752 kvm_mmu_audit(vcpu, "pre pte write");
2761 if (guest_initiated) { 2753 if (guest_initiated) {
2762 if (gfn == vcpu->arch.last_pt_write_gfn 2754 if (gfn == vcpu->arch.last_pt_write_gfn
2763 && !last_updated_pte_accessed(vcpu)) { 2755 && !last_updated_pte_accessed(vcpu)) {
2764 ++vcpu->arch.last_pt_write_count; 2756 ++vcpu->arch.last_pt_write_count;
2765 if (vcpu->arch.last_pt_write_count >= 3) 2757 if (vcpu->arch.last_pt_write_count >= 3)
2766 flooded = 1; 2758 flooded = 1;
2767 } else { 2759 } else {
2768 vcpu->arch.last_pt_write_gfn = gfn; 2760 vcpu->arch.last_pt_write_gfn = gfn;
2769 vcpu->arch.last_pt_write_count = 1; 2761 vcpu->arch.last_pt_write_count = 1;
2770 vcpu->arch.last_pte_updated = NULL; 2762 vcpu->arch.last_pte_updated = NULL;
2771 } 2763 }
2772 } 2764 }
2773 2765
2774 for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) { 2766 for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) {
2775 pte_size = sp->role.cr4_pae ? 8 : 4; 2767 pte_size = sp->role.cr4_pae ? 8 : 4;
2776 misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1); 2768 misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
2777 misaligned |= bytes < 4; 2769 misaligned |= bytes < 4;
2778 if (misaligned || flooded) { 2770 if (misaligned || flooded) {
2779 /* 2771 /*
2780 * Misaligned accesses are too much trouble to fix 2772 * Misaligned accesses are too much trouble to fix
2781 * up; also, they usually indicate a page is not used 2773 * up; also, they usually indicate a page is not used
2782 * as a page table. 2774 * as a page table.
2783 * 2775 *
2784 * If we're seeing too many writes to a page, 2776 * If we're seeing too many writes to a page,
2785 * it may no longer be a page table, or we may be 2777 * it may no longer be a page table, or we may be
2786 * forking, in which case it is better to unmap the 2778 * forking, in which case it is better to unmap the
2787 * page. 2779 * page.
2788 */ 2780 */
2789 pgprintk("misaligned: gpa %llx bytes %d role %x\n", 2781 pgprintk("misaligned: gpa %llx bytes %d role %x\n",
2790 gpa, bytes, sp->role.word); 2782 gpa, bytes, sp->role.word);
2791 zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp, 2783 zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
2792 &invalid_list); 2784 &invalid_list);
2793 ++vcpu->kvm->stat.mmu_flooded; 2785 ++vcpu->kvm->stat.mmu_flooded;
2794 continue; 2786 continue;
2795 } 2787 }
2796 page_offset = offset; 2788 page_offset = offset;
2797 level = sp->role.level; 2789 level = sp->role.level;
2798 npte = 1; 2790 npte = 1;
2799 if (!sp->role.cr4_pae) { 2791 if (!sp->role.cr4_pae) {
2800 page_offset <<= 1; /* 32->64 */ 2792 page_offset <<= 1; /* 32->64 */
2801 /* 2793 /*
2802 * A 32-bit pde maps 4MB while the shadow pdes map 2794 * A 32-bit pde maps 4MB while the shadow pdes map
2803 * only 2MB. So we need to double the offset again 2795 * only 2MB. So we need to double the offset again
2804 * and zap two pdes instead of one. 2796 * and zap two pdes instead of one.
2805 */ 2797 */
2806 if (level == PT32_ROOT_LEVEL) { 2798 if (level == PT32_ROOT_LEVEL) {
2807 page_offset &= ~7; /* kill rounding error */ 2799 page_offset &= ~7; /* kill rounding error */
2808 page_offset <<= 1; 2800 page_offset <<= 1;
2809 npte = 2; 2801 npte = 2;
2810 } 2802 }
2811 quadrant = page_offset >> PAGE_SHIFT; 2803 quadrant = page_offset >> PAGE_SHIFT;
2812 page_offset &= ~PAGE_MASK; 2804 page_offset &= ~PAGE_MASK;
2813 if (quadrant != sp->role.quadrant) 2805 if (quadrant != sp->role.quadrant)
2814 continue; 2806 continue;
2815 } 2807 }
2816 local_flush = true; 2808 local_flush = true;
2817 spte = &sp->spt[page_offset / sizeof(*spte)]; 2809 spte = &sp->spt[page_offset / sizeof(*spte)];
2818 while (npte--) { 2810 while (npte--) {
2819 entry = *spte; 2811 entry = *spte;
2820 mmu_pte_write_zap_pte(vcpu, sp, spte); 2812 mmu_pte_write_zap_pte(vcpu, sp, spte);
2821 if (gentry) 2813 if (gentry)
2822 mmu_pte_write_new_pte(vcpu, sp, spte, &gentry); 2814 mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
2823 if (!remote_flush && need_remote_flush(entry, *spte)) 2815 if (!remote_flush && need_remote_flush(entry, *spte))
2824 remote_flush = true; 2816 remote_flush = true;
2825 ++spte; 2817 ++spte;
2826 } 2818 }
2827 } 2819 }
2828 mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush); 2820 mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
2829 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 2821 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2830 kvm_mmu_audit(vcpu, "post pte write"); 2822 kvm_mmu_audit(vcpu, "post pte write");
2831 spin_unlock(&vcpu->kvm->mmu_lock); 2823 spin_unlock(&vcpu->kvm->mmu_lock);
2832 if (!is_error_pfn(vcpu->arch.update_pte.pfn)) { 2824 if (!is_error_pfn(vcpu->arch.update_pte.pfn)) {
2833 kvm_release_pfn_clean(vcpu->arch.update_pte.pfn); 2825 kvm_release_pfn_clean(vcpu->arch.update_pte.pfn);
2834 vcpu->arch.update_pte.pfn = bad_pfn; 2826 vcpu->arch.update_pte.pfn = bad_pfn;
2835 } 2827 }
2836 } 2828 }
2837 2829
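Inside kvm_mmu_pte_write() above, (offset ^ (offset + bytes - 1)) & ~(pte_size - 1) is nonzero exactly when a guest write straddles a gpte boundary (such pages get zapped rather than patched), and for a non-PAE guest the in-page offset is doubled (each 32-bit gpte is shadowed by a 64-bit spte) and then split into a quadrant and an spte index. The same arithmetic, worked through on hypothetical sample offsets:

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

int main(void)
{
	unsigned pte_size = 4;	/* !sp->role.cr4_pae: 32-bit guest ptes */
	unsigned bytes = 4;

	/* a 4-byte write at 0x7fe touches gptes at 0x7fc and 0x800 -> misaligned */
	unsigned offset = 0x7fe;
	unsigned misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
	printf("offset %#x: misaligned=%#x\n", offset, misaligned);

	/* an aligned write at 0x804: double the offset (32-bit gptes are
	 * shadowed by 64-bit sptes), then split into quadrant + spte index */
	offset = 0x804;
	unsigned page_offset = offset << 1;
	unsigned quadrant = page_offset >> PAGE_SHIFT;	/* which half of the guest page */
	page_offset &= ~PAGE_MASK;			/* offset within the shadow page */
	printf("offset %#x: quadrant=%u spte index=%u\n",
	       offset, quadrant, page_offset / (unsigned)sizeof(unsigned long long));
	return 0;
}
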
2838 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) 2830 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
2839 { 2831 {
2840 gpa_t gpa; 2832 gpa_t gpa;
2841 int r; 2833 int r;
2842 2834
2843 if (tdp_enabled) 2835 if (tdp_enabled)
2844 return 0; 2836 return 0;
2845 2837
2846 gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL); 2838 gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
2847 2839
2848 spin_lock(&vcpu->kvm->mmu_lock); 2840 spin_lock(&vcpu->kvm->mmu_lock);
2849 r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT); 2841 r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
2850 spin_unlock(&vcpu->kvm->mmu_lock); 2842 spin_unlock(&vcpu->kvm->mmu_lock);
2851 return r; 2843 return r;
2852 } 2844 }
2853 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt); 2845 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
2854 2846
2855 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu) 2847 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
2856 { 2848 {
2857 int free_pages; 2849 int free_pages;
2858 LIST_HEAD(invalid_list); 2850 LIST_HEAD(invalid_list);
2859 2851
2860 free_pages = vcpu->kvm->arch.n_free_mmu_pages; 2852 free_pages = vcpu->kvm->arch.n_free_mmu_pages;
2861 while (free_pages < KVM_REFILL_PAGES && 2853 while (free_pages < KVM_REFILL_PAGES &&
2862 !list_empty(&vcpu->kvm->arch.active_mmu_pages)) { 2854 !list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
2863 struct kvm_mmu_page *sp; 2855 struct kvm_mmu_page *sp;
2864 2856
2865 sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev, 2857 sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
2866 struct kvm_mmu_page, link); 2858 struct kvm_mmu_page, link);
2867 free_pages += kvm_mmu_prepare_zap_page(vcpu->kvm, sp, 2859 free_pages += kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
2868 &invalid_list); 2860 &invalid_list);
2869 ++vcpu->kvm->stat.mmu_recycled; 2861 ++vcpu->kvm->stat.mmu_recycled;
2870 } 2862 }
2871 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 2863 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2872 } 2864 }
2873 2865
2874 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code) 2866 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
2875 { 2867 {
2876 int r; 2868 int r;
2877 enum emulation_result er; 2869 enum emulation_result er;
2878 2870
2879 r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code); 2871 r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
2880 if (r < 0) 2872 if (r < 0)
2881 goto out; 2873 goto out;
2882 2874
2883 if (!r) { 2875 if (!r) {
2884 r = 1; 2876 r = 1;
2885 goto out; 2877 goto out;
2886 } 2878 }
2887 2879
2888 r = mmu_topup_memory_caches(vcpu); 2880 r = mmu_topup_memory_caches(vcpu);
2889 if (r) 2881 if (r)
2890 goto out; 2882 goto out;
2891 2883
2892 er = emulate_instruction(vcpu, cr2, error_code, 0); 2884 er = emulate_instruction(vcpu, cr2, error_code, 0);
2893 2885
2894 switch (er) { 2886 switch (er) {
2895 case EMULATE_DONE: 2887 case EMULATE_DONE:
2896 return 1; 2888 return 1;
2897 case EMULATE_DO_MMIO: 2889 case EMULATE_DO_MMIO:
2898 ++vcpu->stat.mmio_exits; 2890 ++vcpu->stat.mmio_exits;
2899 /* fall through */ 2891 /* fall through */
2900 case EMULATE_FAIL: 2892 case EMULATE_FAIL:
2901 return 0; 2893 return 0;
2902 default: 2894 default:
2903 BUG(); 2895 BUG();
2904 } 2896 }
2905 out: 2897 out:
2906 return r; 2898 return r;
2907 } 2899 }
2908 EXPORT_SYMBOL_GPL(kvm_mmu_page_fault); 2900 EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
2909 2901
2910 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva) 2902 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
2911 { 2903 {
2912 vcpu->arch.mmu.invlpg(vcpu, gva); 2904 vcpu->arch.mmu.invlpg(vcpu, gva);
2913 kvm_mmu_flush_tlb(vcpu); 2905 kvm_mmu_flush_tlb(vcpu);
2914 ++vcpu->stat.invlpg; 2906 ++vcpu->stat.invlpg;
2915 } 2907 }
2916 EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); 2908 EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
2917 2909
2918 void kvm_enable_tdp(void) 2910 void kvm_enable_tdp(void)
2919 { 2911 {
2920 tdp_enabled = true; 2912 tdp_enabled = true;
2921 } 2913 }
2922 EXPORT_SYMBOL_GPL(kvm_enable_tdp); 2914 EXPORT_SYMBOL_GPL(kvm_enable_tdp);
2923 2915
2924 void kvm_disable_tdp(void) 2916 void kvm_disable_tdp(void)
2925 { 2917 {
2926 tdp_enabled = false; 2918 tdp_enabled = false;
2927 } 2919 }
2928 EXPORT_SYMBOL_GPL(kvm_disable_tdp); 2920 EXPORT_SYMBOL_GPL(kvm_disable_tdp);
2929 2921
2930 static void free_mmu_pages(struct kvm_vcpu *vcpu) 2922 static void free_mmu_pages(struct kvm_vcpu *vcpu)
2931 { 2923 {
2932 free_page((unsigned long)vcpu->arch.mmu.pae_root); 2924 free_page((unsigned long)vcpu->arch.mmu.pae_root);
2933 } 2925 }
2934 2926
2935 static int alloc_mmu_pages(struct kvm_vcpu *vcpu) 2927 static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
2936 { 2928 {
2937 struct page *page; 2929 struct page *page;
2938 int i; 2930 int i;
2939 2931
2940 ASSERT(vcpu); 2932 ASSERT(vcpu);
2941 2933
2942 /* 2934 /*
2943 * When emulating 32-bit mode, cr3 is only 32 bits even on x86_64. 2935 * When emulating 32-bit mode, cr3 is only 32 bits even on x86_64.
2944 * Therefore we need to allocate shadow page tables in the first 2936 * Therefore we need to allocate shadow page tables in the first
2945 * 4GB of memory, which happens to fit the DMA32 zone. 2937 * 4GB of memory, which happens to fit the DMA32 zone.
2946 */ 2938 */
2947 page = alloc_page(GFP_KERNEL | __GFP_DMA32); 2939 page = alloc_page(GFP_KERNEL | __GFP_DMA32);
2948 if (!page) 2940 if (!page)
2949 return -ENOMEM; 2941 return -ENOMEM;
2950 2942
2951 vcpu->arch.mmu.pae_root = page_address(page); 2943 vcpu->arch.mmu.pae_root = page_address(page);
2952 for (i = 0; i < 4; ++i) 2944 for (i = 0; i < 4; ++i)
2953 vcpu->arch.mmu.pae_root[i] = INVALID_PAGE; 2945 vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
2954 2946
2955 return 0; 2947 return 0;
2956 } 2948 }
2957 2949
2958 int kvm_mmu_create(struct kvm_vcpu *vcpu) 2950 int kvm_mmu_create(struct kvm_vcpu *vcpu)
2959 { 2951 {
2960 ASSERT(vcpu); 2952 ASSERT(vcpu);
2961 ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); 2953 ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
2962 2954
2963 return alloc_mmu_pages(vcpu); 2955 return alloc_mmu_pages(vcpu);
2964 } 2956 }
2965 2957
2966 int kvm_mmu_setup(struct kvm_vcpu *vcpu) 2958 int kvm_mmu_setup(struct kvm_vcpu *vcpu)
2967 { 2959 {
2968 ASSERT(vcpu); 2960 ASSERT(vcpu);
2969 ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); 2961 ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
2970 2962
2971 return init_kvm_mmu(vcpu); 2963 return init_kvm_mmu(vcpu);
2972 } 2964 }
2973 2965
2974 void kvm_mmu_destroy(struct kvm_vcpu *vcpu) 2966 void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
2975 { 2967 {
2976 ASSERT(vcpu); 2968 ASSERT(vcpu);
2977 2969
2978 destroy_kvm_mmu(vcpu); 2970 destroy_kvm_mmu(vcpu);
2979 free_mmu_pages(vcpu); 2971 free_mmu_pages(vcpu);
2980 mmu_free_memory_caches(vcpu); 2972 mmu_free_memory_caches(vcpu);
2981 } 2973 }
2982 2974
2983 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot) 2975 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
2984 { 2976 {
2985 struct kvm_mmu_page *sp; 2977 struct kvm_mmu_page *sp;
2986 2978
2987 list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) { 2979 list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
2988 int i; 2980 int i;
2989 u64 *pt; 2981 u64 *pt;
2990 2982
2991 if (!test_bit(slot, sp->slot_bitmap)) 2983 if (!test_bit(slot, sp->slot_bitmap))
2992 continue; 2984 continue;
2993 2985
2994 pt = sp->spt; 2986 pt = sp->spt;
2995 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) 2987 for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
2996 /* avoid RMW */ 2988 /* avoid RMW */
2997 if (is_writable_pte(pt[i])) 2989 if (is_writable_pte(pt[i]))
2998 pt[i] &= ~PT_WRITABLE_MASK; 2990 pt[i] &= ~PT_WRITABLE_MASK;
2999 } 2991 }
3000 kvm_flush_remote_tlbs(kvm); 2992 kvm_flush_remote_tlbs(kvm);
3001 } 2993 }
3002 2994
3003 void kvm_mmu_zap_all(struct kvm *kvm) 2995 void kvm_mmu_zap_all(struct kvm *kvm)
3004 { 2996 {
3005 struct kvm_mmu_page *sp, *node; 2997 struct kvm_mmu_page *sp, *node;
3006 LIST_HEAD(invalid_list); 2998 LIST_HEAD(invalid_list);
3007 2999
3008 spin_lock(&kvm->mmu_lock); 3000 spin_lock(&kvm->mmu_lock);
3009 restart: 3001 restart:
3010 list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) 3002 list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
3011 if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list)) 3003 if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
3012 goto restart; 3004 goto restart;
3013 3005
3014 kvm_mmu_commit_zap_page(kvm, &invalid_list); 3006 kvm_mmu_commit_zap_page(kvm, &invalid_list);
3015 spin_unlock(&kvm->mmu_lock); 3007 spin_unlock(&kvm->mmu_lock);
3016 } 3008 }
3017 3009
3018 static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm, 3010 static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
3019 struct list_head *invalid_list) 3011 struct list_head *invalid_list)
3020 { 3012 {
3021 struct kvm_mmu_page *page; 3013 struct kvm_mmu_page *page;
3022 3014
3023 page = container_of(kvm->arch.active_mmu_pages.prev, 3015 page = container_of(kvm->arch.active_mmu_pages.prev,
3024 struct kvm_mmu_page, link); 3016 struct kvm_mmu_page, link);
3025 return kvm_mmu_prepare_zap_page(kvm, page, invalid_list); 3017 return kvm_mmu_prepare_zap_page(kvm, page, invalid_list);
3026 } 3018 }
3027 3019
3028 static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask) 3020 static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
3029 { 3021 {
3030 struct kvm *kvm; 3022 struct kvm *kvm;
3031 struct kvm *kvm_freed = NULL; 3023 struct kvm *kvm_freed = NULL;
3032 int cache_count = 0; 3024 int cache_count = 0;
3033 3025
3034 spin_lock(&kvm_lock); 3026 spin_lock(&kvm_lock);
3035 3027
3036 list_for_each_entry(kvm, &vm_list, vm_list) { 3028 list_for_each_entry(kvm, &vm_list, vm_list) {
3037 int npages, idx, freed_pages; 3029 int npages, idx, freed_pages;
3038 LIST_HEAD(invalid_list); 3030 LIST_HEAD(invalid_list);
3039 3031
3040 idx = srcu_read_lock(&kvm->srcu); 3032 idx = srcu_read_lock(&kvm->srcu);
3041 spin_lock(&kvm->mmu_lock); 3033 spin_lock(&kvm->mmu_lock);
3042 npages = kvm->arch.n_alloc_mmu_pages - 3034 npages = kvm->arch.n_alloc_mmu_pages -
3043 kvm->arch.n_free_mmu_pages; 3035 kvm->arch.n_free_mmu_pages;
3044 cache_count += npages; 3036 cache_count += npages;
3045 if (!kvm_freed && nr_to_scan > 0 && npages > 0) { 3037 if (!kvm_freed && nr_to_scan > 0 && npages > 0) {
3046 freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm, 3038 freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm,
3047 &invalid_list); 3039 &invalid_list);
3048 cache_count -= freed_pages; 3040 cache_count -= freed_pages;
3049 kvm_freed = kvm; 3041 kvm_freed = kvm;
3050 } 3042 }
3051 nr_to_scan--; 3043 nr_to_scan--;
3052 3044
3053 kvm_mmu_commit_zap_page(kvm, &invalid_list); 3045 kvm_mmu_commit_zap_page(kvm, &invalid_list);
3054 spin_unlock(&kvm->mmu_lock); 3046 spin_unlock(&kvm->mmu_lock);
3055 srcu_read_unlock(&kvm->srcu, idx); 3047 srcu_read_unlock(&kvm->srcu, idx);
3056 } 3048 }
3057 if (kvm_freed) 3049 if (kvm_freed)
3058 list_move_tail(&kvm_freed->vm_list, &vm_list); 3050 list_move_tail(&kvm_freed->vm_list, &vm_list);
3059 3051
3060 spin_unlock(&kvm_lock); 3052 spin_unlock(&kvm_lock);
3061 3053
3062 return cache_count; 3054 return cache_count;
3063 } 3055 }
3064 3056
3065 static struct shrinker mmu_shrinker = { 3057 static struct shrinker mmu_shrinker = {
3066 .shrink = mmu_shrink, 3058 .shrink = mmu_shrink,
3067 .seeks = DEFAULT_SEEKS * 10, 3059 .seeks = DEFAULT_SEEKS * 10,
3068 }; 3060 };
3069 3061
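
mmu_shrink() above reports the total count of in-use shadow pages across all VMs so the memory-pressure code can gauge the cache size, while actually zapping pages from at most one VM per call and rotating that VM to the tail of vm_list so pressure is spread around. Below is a minimal userspace model of that round-robin policy with the locking, SRCU and zap mechanics stripped out; struct vm, model_shrink and the printed strings are illustrative stand-ins, not kernel APIs.

#include <stdio.h>

struct vm { const char *name; int used_pages; };

/* Model of mmu_shrink(): scan the VM list once, free a page from the
 * first VM that still has shadow pages, and report the total that
 * remain so the caller can judge overall cache pressure. */
static int model_shrink(struct vm *vms, int nvms, int nr_to_scan)
{
	int total = 0, freed_from = -1;

	for (int i = 0; i < nvms; i++) {
		total += vms[i].used_pages;
		if (freed_from < 0 && nr_to_scan > 0 && vms[i].used_pages > 0) {
			vms[i].used_pages--;	/* one page zapped */
			total--;
			freed_from = i;		/* the kernel would move this VM to the tail */
		}
		nr_to_scan--;
	}
	if (freed_from >= 0)
		printf("freed one page from %s\n", vms[freed_from].name);
	return total;
}

int main(void)
{
	struct vm vms[] = { { "vm0", 0 }, { "vm1", 3 }, { "vm2", 5 } };

	printf("remaining: %d\n", model_shrink(vms, 3, 8));
	return 0;
}

Running the model frees one page from vm1 (the first VM with any shadow pages) and reports the 7 pages that remain, mirroring how the real shrinker returns cache_count.
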
3070 static void mmu_destroy_caches(void) 3062 static void mmu_destroy_caches(void)
3071 { 3063 {
3072 if (pte_chain_cache) 3064 if (pte_chain_cache)
3073 kmem_cache_destroy(pte_chain_cache); 3065 kmem_cache_destroy(pte_chain_cache);
3074 if (rmap_desc_cache) 3066 if (rmap_desc_cache)
3075 kmem_cache_destroy(rmap_desc_cache); 3067 kmem_cache_destroy(rmap_desc_cache);
3076 if (mmu_page_header_cache) 3068 if (mmu_page_header_cache)
3077 kmem_cache_destroy(mmu_page_header_cache); 3069 kmem_cache_destroy(mmu_page_header_cache);
3078 } 3070 }
3079 3071
3080 void kvm_mmu_module_exit(void) 3072 void kvm_mmu_module_exit(void)
3081 { 3073 {
3082 mmu_destroy_caches(); 3074 mmu_destroy_caches();
3083 unregister_shrinker(&mmu_shrinker); 3075 unregister_shrinker(&mmu_shrinker);
3084 } 3076 }
3085 3077
3086 int kvm_mmu_module_init(void) 3078 int kvm_mmu_module_init(void)
3087 { 3079 {
3088 pte_chain_cache = kmem_cache_create("kvm_pte_chain", 3080 pte_chain_cache = kmem_cache_create("kvm_pte_chain",
3089 sizeof(struct kvm_pte_chain), 3081 sizeof(struct kvm_pte_chain),
3090 0, 0, NULL); 3082 0, 0, NULL);
3091 if (!pte_chain_cache) 3083 if (!pte_chain_cache)
3092 goto nomem; 3084 goto nomem;
3093 rmap_desc_cache = kmem_cache_create("kvm_rmap_desc", 3085 rmap_desc_cache = kmem_cache_create("kvm_rmap_desc",
3094 sizeof(struct kvm_rmap_desc), 3086 sizeof(struct kvm_rmap_desc),
3095 0, 0, NULL); 3087 0, 0, NULL);
3096 if (!rmap_desc_cache) 3088 if (!rmap_desc_cache)
3097 goto nomem; 3089 goto nomem;
3098 3090
3099 mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header", 3091 mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header",
3100 sizeof(struct kvm_mmu_page), 3092 sizeof(struct kvm_mmu_page),
3101 0, 0, NULL); 3093 0, 0, NULL);
3102 if (!mmu_page_header_cache) 3094 if (!mmu_page_header_cache)
3103 goto nomem; 3095 goto nomem;
3104 3096
3105 register_shrinker(&mmu_shrinker); 3097 register_shrinker(&mmu_shrinker);
3106 3098
3107 return 0; 3099 return 0;
3108 3100
3109 nomem: 3101 nomem:
3110 mmu_destroy_caches(); 3102 mmu_destroy_caches();
3111 return -ENOMEM; 3103 return -ENOMEM;
3112 } 3104 }
3113 3105
3114 /* 3106 /*
3115 * Calculate mmu pages needed for kvm. 3107 * Calculate mmu pages needed for kvm.
3116 */ 3108 */
3117 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm) 3109 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
3118 { 3110 {
3119 int i; 3111 int i;
3120 unsigned int nr_mmu_pages; 3112 unsigned int nr_mmu_pages;
3121 unsigned int nr_pages = 0; 3113 unsigned int nr_pages = 0;
3122 struct kvm_memslots *slots; 3114 struct kvm_memslots *slots;
3123 3115
3124 slots = kvm_memslots(kvm); 3116 slots = kvm_memslots(kvm);
3125 3117
3126 for (i = 0; i < slots->nmemslots; i++) 3118 for (i = 0; i < slots->nmemslots; i++)
3127 nr_pages += slots->memslots[i].npages; 3119 nr_pages += slots->memslots[i].npages;
3128 3120
3129 nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000; 3121 nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
3130 nr_mmu_pages = max(nr_mmu_pages, 3122 nr_mmu_pages = max(nr_mmu_pages,
3131 (unsigned int) KVM_MIN_ALLOC_MMU_PAGES); 3123 (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
3132 3124
3133 return nr_mmu_pages; 3125 return nr_mmu_pages;
3134 } 3126 }
3135 3127
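
The function above sizes the shadow-page budget at KVM_PERMILLE_MMU_PAGES per thousand guest pages, clamped from below by KVM_MIN_ALLOC_MMU_PAGES. A standalone illustration of that arithmetic follows; the constants 20 and 64 are stand-ins for those macros and may not match every kernel version.

#include <stdio.h>

#define PERMILLE_MMU_PAGES	20	/* stand-in for KVM_PERMILLE_MMU_PAGES */
#define MIN_ALLOC_MMU_PAGES	64	/* stand-in for KVM_MIN_ALLOC_MMU_PAGES */

static unsigned int calc_mmu_pages(unsigned int guest_pages)
{
	unsigned int nr = guest_pages * PERMILLE_MMU_PAGES / 1000;

	return nr > MIN_ALLOC_MMU_PAGES ? nr : MIN_ALLOC_MMU_PAGES;
}

int main(void)
{
	/* 1 GiB guest = 262144 4K pages -> 2 percent of that */
	printf("%u\n", calc_mmu_pages(262144));
	/* tiny guest: the floor kicks in */
	printf("%u\n", calc_mmu_pages(1000));
	return 0;
}

With these stand-in values a 1 GiB guest gets 5242 shadow pages, and a very small guest falls back to the floor of 64.
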
3136 static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer, 3128 static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer,
3137 unsigned len) 3129 unsigned len)
3138 { 3130 {
3139 if (len > buffer->len) 3131 if (len > buffer->len)
3140 return NULL; 3132 return NULL;
3141 return buffer->ptr; 3133 return buffer->ptr;
3142 } 3134 }
3143 3135
3144 static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer, 3136 static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer,
3145 unsigned len) 3137 unsigned len)
3146 { 3138 {
3147 void *ret; 3139 void *ret;
3148 3140
3149 ret = pv_mmu_peek_buffer(buffer, len); 3141 ret = pv_mmu_peek_buffer(buffer, len);
3150 if (!ret) 3142 if (!ret)
3151 return ret; 3143 return ret;
3152 buffer->ptr += len; 3144 buffer->ptr += len;
3153 buffer->len -= len; 3145 buffer->len -= len;
3154 buffer->processed += len; 3146 buffer->processed += len;
3155 return ret; 3147 return ret;
3156 } 3148 }
3157 3149
3158 static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu, 3150 static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
3159 gpa_t addr, gpa_t value) 3151 gpa_t addr, gpa_t value)
3160 { 3152 {
3161 int bytes = 8; 3153 int bytes = 8;
3162 int r; 3154 int r;
3163 3155
3164 if (!is_long_mode(vcpu) && !is_pae(vcpu)) 3156 if (!is_long_mode(vcpu) && !is_pae(vcpu))
3165 bytes = 4; 3157 bytes = 4;
3166 3158
3167 r = mmu_topup_memory_caches(vcpu); 3159 r = mmu_topup_memory_caches(vcpu);
3168 if (r) 3160 if (r)
3169 return r; 3161 return r;
3170 3162
3171 if (!emulator_write_phys(vcpu, addr, &value, bytes)) 3163 if (!emulator_write_phys(vcpu, addr, &value, bytes))
3172 return -EFAULT; 3164 return -EFAULT;
3173 3165
3174 return 1; 3166 return 1;
3175 } 3167 }
3176 3168
3177 static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu) 3169 static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
3178 { 3170 {
3179 (void)kvm_set_cr3(vcpu, vcpu->arch.cr3); 3171 (void)kvm_set_cr3(vcpu, vcpu->arch.cr3);
3180 return 1; 3172 return 1;
3181 } 3173 }
3182 3174
3183 static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr) 3175 static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr)
3184 { 3176 {
3185 spin_lock(&vcpu->kvm->mmu_lock); 3177 spin_lock(&vcpu->kvm->mmu_lock);
3186 mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT); 3178 mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT);
3187 spin_unlock(&vcpu->kvm->mmu_lock); 3179 spin_unlock(&vcpu->kvm->mmu_lock);
3188 return 1; 3180 return 1;
3189 } 3181 }
3190 3182
3191 static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu, 3183 static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu,
3192 struct kvm_pv_mmu_op_buffer *buffer) 3184 struct kvm_pv_mmu_op_buffer *buffer)
3193 { 3185 {
3194 struct kvm_mmu_op_header *header; 3186 struct kvm_mmu_op_header *header;
3195 3187
3196 header = pv_mmu_peek_buffer(buffer, sizeof *header); 3188 header = pv_mmu_peek_buffer(buffer, sizeof *header);
3197 if (!header) 3189 if (!header)
3198 return 0; 3190 return 0;
3199 switch (header->op) { 3191 switch (header->op) {
3200 case KVM_MMU_OP_WRITE_PTE: { 3192 case KVM_MMU_OP_WRITE_PTE: {
3201 struct kvm_mmu_op_write_pte *wpte; 3193 struct kvm_mmu_op_write_pte *wpte;
3202 3194
3203 wpte = pv_mmu_read_buffer(buffer, sizeof *wpte); 3195 wpte = pv_mmu_read_buffer(buffer, sizeof *wpte);
3204 if (!wpte) 3196 if (!wpte)
3205 return 0; 3197 return 0;
3206 return kvm_pv_mmu_write(vcpu, wpte->pte_phys, 3198 return kvm_pv_mmu_write(vcpu, wpte->pte_phys,
3207 wpte->pte_val); 3199 wpte->pte_val);
3208 } 3200 }
3209 case KVM_MMU_OP_FLUSH_TLB: { 3201 case KVM_MMU_OP_FLUSH_TLB: {
3210 struct kvm_mmu_op_flush_tlb *ftlb; 3202 struct kvm_mmu_op_flush_tlb *ftlb;
3211 3203
3212 ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb); 3204 ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb);
3213 if (!ftlb) 3205 if (!ftlb)
3214 return 0; 3206 return 0;
3215 return kvm_pv_mmu_flush_tlb(vcpu); 3207 return kvm_pv_mmu_flush_tlb(vcpu);
3216 } 3208 }
3217 case KVM_MMU_OP_RELEASE_PT: { 3209 case KVM_MMU_OP_RELEASE_PT: {
3218 struct kvm_mmu_op_release_pt *rpt; 3210 struct kvm_mmu_op_release_pt *rpt;
3219 3211
3220 rpt = pv_mmu_read_buffer(buffer, sizeof *rpt); 3212 rpt = pv_mmu_read_buffer(buffer, sizeof *rpt);
3221 if (!rpt) 3213 if (!rpt)
3222 return 0; 3214 return 0;
3223 return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys); 3215 return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys);
3224 } 3216 }
3225 default: return 0; 3217 default: return 0;
3226 } 3218 }
3227 } 3219 }
3228 3220
3229 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes, 3221 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
3230 gpa_t addr, unsigned long *ret) 3222 gpa_t addr, unsigned long *ret)
3231 { 3223 {
3232 int r; 3224 int r;
3233 struct kvm_pv_mmu_op_buffer *buffer = &vcpu->arch.mmu_op_buffer; 3225 struct kvm_pv_mmu_op_buffer *buffer = &vcpu->arch.mmu_op_buffer;
3234 3226
3235 buffer->ptr = buffer->buf; 3227 buffer->ptr = buffer->buf;
3236 buffer->len = min_t(unsigned long, bytes, sizeof buffer->buf); 3228 buffer->len = min_t(unsigned long, bytes, sizeof buffer->buf);
3237 buffer->processed = 0; 3229 buffer->processed = 0;
3238 3230
3239 r = kvm_read_guest(vcpu->kvm, addr, buffer->buf, buffer->len); 3231 r = kvm_read_guest(vcpu->kvm, addr, buffer->buf, buffer->len);
3240 if (r) 3232 if (r)
3241 goto out; 3233 goto out;
3242 3234
3243 while (buffer->len) { 3235 while (buffer->len) {
3244 r = kvm_pv_mmu_op_one(vcpu, buffer); 3236 r = kvm_pv_mmu_op_one(vcpu, buffer);
3245 if (r < 0) 3237 if (r < 0)
3246 goto out; 3238 goto out;
3247 if (r == 0) 3239 if (r == 0)
3248 break; 3240 break;
3249 } 3241 }
3250 3242
3251 r = 1; 3243 r = 1;
3252 out: 3244 out:
3253 *ret = buffer->processed; 3245 *ret = buffer->processed;
3254 return r; 3246 return r;
3255 } 3247 }
3256 3248
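
kvm_pv_mmu_op() snapshots a guest buffer of variable-length MMU operations, and kvm_pv_mmu_op_one() then peeks at a common header, dispatches on header->op and consumes exactly the bytes of that record. The sketch below models only the peek/consume cursor and the dispatch loop; struct cursor, buf_peek, buf_read and the one-byte records are simplified stand-ins, not the KVM hypercall ABI.

#include <stdio.h>
#include <stdint.h>

struct cursor { const uint8_t *ptr; size_t len; size_t processed; };

/* peek: look at the next 'len' bytes without consuming them (NULL if short) */
static const void *buf_peek(struct cursor *c, size_t len)
{
	return len > c->len ? NULL : c->ptr;
}

/* read: like peek, but also advance the cursor */
static const void *buf_read(struct cursor *c, size_t len)
{
	const void *p = buf_peek(c, len);

	if (p) {
		c->ptr += len;
		c->len -= len;
		c->processed += len;
	}
	return p;
}

int main(void)
{
	/* two one-byte "op headers": 1 = flush, 2 = unknown (stops the loop) */
	uint8_t ops[] = { 1, 2 };
	struct cursor c = { ops, sizeof(ops), 0 };
	const uint8_t *hdr;

	while ((hdr = buf_read(&c, 1)) != NULL && *hdr == 1)
		printf("flush requested\n");
	printf("processed %zu bytes\n", c.processed);
	return 0;
}

The loop stops at the first record it does not recognise, and 'processed' tells the caller how far it got, mirroring *ret in kvm_pv_mmu_op().
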
3257 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]) 3249 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
3258 { 3250 {
3259 struct kvm_shadow_walk_iterator iterator; 3251 struct kvm_shadow_walk_iterator iterator;
3260 int nr_sptes = 0; 3252 int nr_sptes = 0;
3261 3253
3262 spin_lock(&vcpu->kvm->mmu_lock); 3254 spin_lock(&vcpu->kvm->mmu_lock);
3263 for_each_shadow_entry(vcpu, addr, iterator) { 3255 for_each_shadow_entry(vcpu, addr, iterator) {
3264 sptes[iterator.level-1] = *iterator.sptep; 3256 sptes[iterator.level-1] = *iterator.sptep;
3265 nr_sptes++; 3257 nr_sptes++;
3266 if (!is_shadow_present_pte(*iterator.sptep)) 3258 if (!is_shadow_present_pte(*iterator.sptep))
3267 break; 3259 break;
3268 } 3260 }
3269 spin_unlock(&vcpu->kvm->mmu_lock); 3261 spin_unlock(&vcpu->kvm->mmu_lock);
3270 3262
3271 return nr_sptes; 3263 return nr_sptes;
3272 } 3264 }
3273 EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy); 3265 EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
3274 3266
3275 #ifdef AUDIT 3267 #ifdef AUDIT
3276 3268
3277 static const char *audit_msg; 3269 static const char *audit_msg;
3278 3270
3279 static gva_t canonicalize(gva_t gva) 3271 static gva_t canonicalize(gva_t gva)
3280 { 3272 {
3281 #ifdef CONFIG_X86_64 3273 #ifdef CONFIG_X86_64
3282 gva = (long long)(gva << 16) >> 16; 3274 gva = (long long)(gva << 16) >> 16;
3283 #endif 3275 #endif
3284 return gva; 3276 return gva;
3285 } 3277 }
3286 3278
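
canonicalize() above uses the usual x86-64 idiom of shifting the address up by 16 bits and arithmetic-shifting it back down, which replicates bit 47 into the upper 16 bits so the value becomes a canonical virtual address. A two-case standalone check of that behaviour; canonical() and the sample addresses are illustrative.

#include <stdio.h>
#include <stdint.h>

static uint64_t canonical(uint64_t va)
{
	/* shift bit 47 up to bit 63, then sign-extend it back down */
	return (uint64_t)((int64_t)(va << 16) >> 16);
}

int main(void)
{
	/* bit 47 set: the top 16 bits become 1s (ffff8000'00000000) */
	printf("%llx\n", (unsigned long long)canonical(0x0000800000000000ull));
	/* already canonical: unchanged */
	printf("%llx\n", (unsigned long long)canonical(0x00007fffffffffffull));
	return 0;
}
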
3287 3279
3288 typedef void (*inspect_spte_fn) (struct kvm *kvm, u64 *sptep); 3280 typedef void (*inspect_spte_fn) (struct kvm *kvm, u64 *sptep);
3289 3281
3290 static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp, 3282 static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp,
3291 inspect_spte_fn fn) 3283 inspect_spte_fn fn)
3292 { 3284 {
3293 int i; 3285 int i;
3294 3286
3295 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { 3287 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
3296 u64 ent = sp->spt[i]; 3288 u64 ent = sp->spt[i];
3297 3289
3298 if (is_shadow_present_pte(ent)) { 3290 if (is_shadow_present_pte(ent)) {
3299 if (!is_last_spte(ent, sp->role.level)) { 3291 if (!is_last_spte(ent, sp->role.level)) {
3300 struct kvm_mmu_page *child; 3292 struct kvm_mmu_page *child;
3301 child = page_header(ent & PT64_BASE_ADDR_MASK); 3293 child = page_header(ent & PT64_BASE_ADDR_MASK);
3302 __mmu_spte_walk(kvm, child, fn); 3294 __mmu_spte_walk(kvm, child, fn);
3303 } else 3295 } else
3304 fn(kvm, &sp->spt[i]); 3296 fn(kvm, &sp->spt[i]);
3305 } 3297 }
3306 } 3298 }
3307 } 3299 }
3308 3300
3309 static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn) 3301 static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
3310 { 3302 {
3311 int i; 3303 int i;
3312 struct kvm_mmu_page *sp; 3304 struct kvm_mmu_page *sp;
3313 3305
3314 if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) 3306 if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
3315 return; 3307 return;
3316 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { 3308 if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
3317 hpa_t root = vcpu->arch.mmu.root_hpa; 3309 hpa_t root = vcpu->arch.mmu.root_hpa;
3318 sp = page_header(root); 3310 sp = page_header(root);
3319 __mmu_spte_walk(vcpu->kvm, sp, fn); 3311 __mmu_spte_walk(vcpu->kvm, sp, fn);
3320 return; 3312 return;
3321 } 3313 }
3322 for (i = 0; i < 4; ++i) { 3314 for (i = 0; i < 4; ++i) {
3323 hpa_t root = vcpu->arch.mmu.pae_root[i]; 3315 hpa_t root = vcpu->arch.mmu.pae_root[i];
3324 3316
3325 if (root && VALID_PAGE(root)) { 3317 if (root && VALID_PAGE(root)) {
3326 root &= PT64_BASE_ADDR_MASK; 3318 root &= PT64_BASE_ADDR_MASK;
3327 sp = page_header(root); 3319 sp = page_header(root);
3328 __mmu_spte_walk(vcpu->kvm, sp, fn); 3320 __mmu_spte_walk(vcpu->kvm, sp, fn);
3329 } 3321 }
3330 } 3322 }
3331 return; 3323 return;
3332 } 3324 }
3333 3325
3334 static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte, 3326 static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
3335 gva_t va, int level) 3327 gva_t va, int level)
3336 { 3328 {
3337 u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK); 3329 u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK);
3338 int i; 3330 int i;
3339 gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1)); 3331 gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1));
3340 3332
3341 for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) { 3333 for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) {
3342 u64 ent = pt[i]; 3334 u64 ent = pt[i];
3343 3335
3344 if (ent == shadow_trap_nonpresent_pte) 3336 if (ent == shadow_trap_nonpresent_pte)
3345 continue; 3337 continue;
3346 3338
3347 va = canonicalize(va); 3339 va = canonicalize(va);
3348 if (is_shadow_present_pte(ent) && !is_last_spte(ent, level)) 3340 if (is_shadow_present_pte(ent) && !is_last_spte(ent, level))
3349 audit_mappings_page(vcpu, ent, va, level - 1); 3341 audit_mappings_page(vcpu, ent, va, level - 1);
3350 else { 3342 else {
3351 gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, va, NULL); 3343 gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, va, NULL);
3352 gfn_t gfn = gpa >> PAGE_SHIFT; 3344 gfn_t gfn = gpa >> PAGE_SHIFT;
3353 pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn); 3345 pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);
3354 hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT; 3346 hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT;
3355 3347
3356 if (is_error_pfn(pfn)) { 3348 if (is_error_pfn(pfn)) {
3357 kvm_release_pfn_clean(pfn); 3349 kvm_release_pfn_clean(pfn);
3358 continue; 3350 continue;
3359 } 3351 }
3360 3352
3361 if (is_shadow_present_pte(ent) 3353 if (is_shadow_present_pte(ent)
3362 && (ent & PT64_BASE_ADDR_MASK) != hpa) 3354 && (ent & PT64_BASE_ADDR_MASK) != hpa)
3363 printk(KERN_ERR "xx audit error: (%s) levels %d" 3355 printk(KERN_ERR "xx audit error: (%s) levels %d"
3364 " gva %lx gpa %llx hpa %llx ent %llx %d\n", 3356 " gva %lx gpa %llx hpa %llx ent %llx %d\n",
3365 audit_msg, vcpu->arch.mmu.root_level, 3357 audit_msg, vcpu->arch.mmu.root_level,
3366 va, gpa, hpa, ent, 3358 va, gpa, hpa, ent,
3367 is_shadow_present_pte(ent)); 3359 is_shadow_present_pte(ent));
3368 else if (ent == shadow_notrap_nonpresent_pte 3360 else if (ent == shadow_notrap_nonpresent_pte
3369 && !is_error_hpa(hpa)) 3361 && !is_error_hpa(hpa))
3370 printk(KERN_ERR "audit: (%s) notrap shadow," 3362 printk(KERN_ERR "audit: (%s) notrap shadow,"
3371 " valid guest gva %lx\n", audit_msg, va); 3363 " valid guest gva %lx\n", audit_msg, va);
3372 kvm_release_pfn_clean(pfn); 3364 kvm_release_pfn_clean(pfn);
3373 3365
3374 } 3366 }
3375 } 3367 }
3376 } 3368 }
3377 3369
3378 static void audit_mappings(struct kvm_vcpu *vcpu) 3370 static void audit_mappings(struct kvm_vcpu *vcpu)
3379 { 3371 {
3380 unsigned i; 3372 unsigned i;
3381 3373
3382 if (vcpu->arch.mmu.root_level == 4) 3374 if (vcpu->arch.mmu.root_level == 4)
3383 audit_mappings_page(vcpu, vcpu->arch.mmu.root_hpa, 0, 4); 3375 audit_mappings_page(vcpu, vcpu->arch.mmu.root_hpa, 0, 4);
3384 else 3376 else
3385 for (i = 0; i < 4; ++i) 3377 for (i = 0; i < 4; ++i)
3386 if (vcpu->arch.mmu.pae_root[i] & PT_PRESENT_MASK) 3378 if (vcpu->arch.mmu.pae_root[i] & PT_PRESENT_MASK)
3387 audit_mappings_page(vcpu, 3379 audit_mappings_page(vcpu,
3388 vcpu->arch.mmu.pae_root[i], 3380 vcpu->arch.mmu.pae_root[i],
3389 i << 30, 3381 i << 30,
3390 2); 3382 2);
3391 } 3383 }
3392 3384
3393 static int count_rmaps(struct kvm_vcpu *vcpu) 3385 static int count_rmaps(struct kvm_vcpu *vcpu)
3394 { 3386 {
3395 struct kvm *kvm = vcpu->kvm; 3387 struct kvm *kvm = vcpu->kvm;
3396 struct kvm_memslots *slots; 3388 struct kvm_memslots *slots;
3397 int nmaps = 0; 3389 int nmaps = 0;
3398 int i, j, k, idx; 3390 int i, j, k, idx;
3399 3391
3400 idx = srcu_read_lock(&kvm->srcu); 3392 idx = srcu_read_lock(&kvm->srcu);
3401 slots = kvm_memslots(kvm); 3393 slots = kvm_memslots(kvm);
3402 for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { 3394 for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
3403 struct kvm_memory_slot *m = &slots->memslots[i]; 3395 struct kvm_memory_slot *m = &slots->memslots[i];
3404 struct kvm_rmap_desc *d; 3396 struct kvm_rmap_desc *d;
3405 3397
3406 for (j = 0; j < m->npages; ++j) { 3398 for (j = 0; j < m->npages; ++j) {
3407 unsigned long *rmapp = &m->rmap[j]; 3399 unsigned long *rmapp = &m->rmap[j];
3408 3400
3409 if (!*rmapp) 3401 if (!*rmapp)
3410 continue; 3402 continue;
3411 if (!(*rmapp & 1)) { 3403 if (!(*rmapp & 1)) {
3412 ++nmaps; 3404 ++nmaps;
3413 continue; 3405 continue;
3414 } 3406 }
3415 d = (struct kvm_rmap_desc *)(*rmapp & ~1ul); 3407 d = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
3416 while (d) { 3408 while (d) {
3417 for (k = 0; k < RMAP_EXT; ++k) 3409 for (k = 0; k < RMAP_EXT; ++k)
3418 if (d->sptes[k]) 3410 if (d->sptes[k])
3419 ++nmaps; 3411 ++nmaps;
3420 else 3412 else
3421 break; 3413 break;
3422 d = d->more; 3414 d = d->more;
3423 } 3415 }
3424 } 3416 }
3425 } 3417 }
3426 srcu_read_unlock(&kvm->srcu, idx); 3418 srcu_read_unlock(&kvm->srcu, idx);
3427 return nmaps; 3419 return nmaps;
3428 } 3420 }
3429 3421
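
count_rmaps() also documents how the per-gfn rmap word is overloaded: zero means no mappings, a value with bit 0 clear is a single spte pointer, and a value with bit 0 set points, after masking with ~1ul, to a chain of kvm_rmap_desc entries. A small standalone sketch of that low-bit tagging; struct desc, count_one and the fixed chain width of 4 (standing in for RMAP_EXT) are illustrative.

#include <stdio.h>
#include <stdint.h>

struct desc { uint64_t *sptes[4]; struct desc *more; };

/* count the mappings behind one rmap word, mirroring count_rmaps() */
static int count_one(unsigned long rmapp)
{
	int n = 0;

	if (!rmapp)
		return 0;		/* no sptes map this gfn */
	if (!(rmapp & 1))
		return 1;		/* single spte: the word is the pointer itself */
	for (struct desc *d = (struct desc *)(rmapp & ~1ul); d; d = d->more)
		for (int k = 0; k < 4 && d->sptes[k]; k++)
			n++;
	return n;
}

int main(void)
{
	uint64_t spte_a, spte_b;
	struct desc d = { { &spte_a, &spte_b, NULL, NULL }, NULL };

	printf("%d\n", count_one(0));				/* 0 */
	printf("%d\n", count_one((unsigned long)&spte_a));	/* 1 */
	printf("%d\n", count_one((unsigned long)&d | 1));	/* 2 */
	return 0;
}
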
3430 void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep) 3422 void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
3431 { 3423 {
3432 unsigned long *rmapp; 3424 unsigned long *rmapp;
3433 struct kvm_mmu_page *rev_sp; 3425 struct kvm_mmu_page *rev_sp;
3434 gfn_t gfn; 3426 gfn_t gfn;
3435 3427
3436 if (is_writable_pte(*sptep)) { 3428 if (is_writable_pte(*sptep)) {
3437 rev_sp = page_header(__pa(sptep)); 3429 rev_sp = page_header(__pa(sptep));
3438 gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt); 3430 gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);
3439 3431
3440 if (!gfn_to_memslot(kvm, gfn)) { 3432 if (!gfn_to_memslot(kvm, gfn)) {
3441 if (!printk_ratelimit()) 3433 if (!printk_ratelimit())
3442 return; 3434 return;
3443 printk(KERN_ERR "%s: no memslot for gfn %ld\n", 3435 printk(KERN_ERR "%s: no memslot for gfn %ld\n",
3444 audit_msg, gfn); 3436 audit_msg, gfn);
3445 printk(KERN_ERR "%s: index %ld of sp (gfn=%lx)\n", 3437 printk(KERN_ERR "%s: index %ld of sp (gfn=%lx)\n",
3446 audit_msg, (long int)(sptep - rev_sp->spt), 3438 audit_msg, (long int)(sptep - rev_sp->spt),
3447 rev_sp->gfn); 3439 rev_sp->gfn);
3448 dump_stack(); 3440 dump_stack();
3449 return; 3441 return;
3450 } 3442 }
3451 3443
3452 rmapp = gfn_to_rmap(kvm, gfn, rev_sp->role.level); 3444 rmapp = gfn_to_rmap(kvm, gfn, rev_sp->role.level);
3453 if (!*rmapp) { 3445 if (!*rmapp) {
3454 if (!printk_ratelimit()) 3446 if (!printk_ratelimit())
3455 return; 3447 return;
3456 printk(KERN_ERR "%s: no rmap for writable spte %llx\n", 3448 printk(KERN_ERR "%s: no rmap for writable spte %llx\n",
3457 audit_msg, *sptep); 3449 audit_msg, *sptep);
3458 dump_stack(); 3450 dump_stack();
3459 } 3451 }
3460 } 3452 }
3461 3453
3462 } 3454 }
3463 3455
3464 void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu) 3456 void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu)
3465 { 3457 {
3466 mmu_spte_walk(vcpu, inspect_spte_has_rmap); 3458 mmu_spte_walk(vcpu, inspect_spte_has_rmap);
3467 } 3459 }
3468 3460
3469 static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu) 3461 static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu)
3470 { 3462 {
3471 struct kvm_mmu_page *sp; 3463 struct kvm_mmu_page *sp;
3472 int i; 3464 int i;
3473 3465
3474 list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) { 3466 list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) {
3475 u64 *pt = sp->spt; 3467 u64 *pt = sp->spt;
3476 3468
3477 if (sp->role.level != PT_PAGE_TABLE_LEVEL) 3469 if (sp->role.level != PT_PAGE_TABLE_LEVEL)
3478 continue; 3470 continue;
3479 3471
3480 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { 3472 for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
3481 u64 ent = pt[i]; 3473 u64 ent = pt[i];
3482 3474
3483 if (!(ent & PT_PRESENT_MASK)) 3475 if (!(ent & PT_PRESENT_MASK))
3484 continue; 3476 continue;
3485 if (!is_writable_pte(ent)) 3477 if (!is_writable_pte(ent))
3486 continue; 3478 continue;
3487 inspect_spte_has_rmap(vcpu->kvm, &pt[i]); 3479 inspect_spte_has_rmap(vcpu->kvm, &pt[i]);
3488 } 3480 }
3489 } 3481 }
3490 return; 3482 return;
3491 } 3483 }
3492 3484
3493 static void audit_rmap(struct kvm_vcpu *vcpu) 3485 static void audit_rmap(struct kvm_vcpu *vcpu)
3494 { 3486 {
3495 check_writable_mappings_rmap(vcpu); 3487 check_writable_mappings_rmap(vcpu);
3496 count_rmaps(vcpu); 3488 count_rmaps(vcpu);
3497 } 3489 }
3498 3490
3499 static void audit_write_protection(struct kvm_vcpu *vcpu) 3491 static void audit_write_protection(struct kvm_vcpu *vcpu)
3500 { 3492 {
3501 struct kvm_mmu_page *sp; 3493 struct kvm_mmu_page *sp;
3502 struct kvm_memory_slot *slot; 3494 struct kvm_memory_slot *slot;
3503 unsigned long *rmapp; 3495 unsigned long *rmapp;
3504 u64 *spte; 3496 u64 *spte;
3505 gfn_t gfn; 3497 gfn_t gfn;
3506 3498
3507 list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) { 3499 list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) {
3508 if (sp->role.direct) 3500 if (sp->role.direct)
3509 continue; 3501 continue;
3510 if (sp->unsync) 3502 if (sp->unsync)
3511 continue; 3503 continue;
3512 3504
3513 gfn = unalias_gfn(vcpu->kvm, sp->gfn); 3505 slot = gfn_to_memslot(vcpu->kvm, sp->gfn);
3514 slot = gfn_to_memslot_unaliased(vcpu->kvm, sp->gfn);
3515 rmapp = &slot->rmap[gfn - slot->base_gfn]; 3506 rmapp = &slot->rmap[gfn - slot->base_gfn];
3516 3507
3517 spte = rmap_next(vcpu->kvm, rmapp, NULL); 3508 spte = rmap_next(vcpu->kvm, rmapp, NULL);
3518 while (spte) { 3509 while (spte) {
3519 if (is_writable_pte(*spte)) 3510 if (is_writable_pte(*spte))
3520 printk(KERN_ERR "%s: (%s) shadow page has " 3511 printk(KERN_ERR "%s: (%s) shadow page has "
3521 "writable mappings: gfn %lx role %x\n", 3512 "writable mappings: gfn %lx role %x\n",
3522 __func__, audit_msg, sp->gfn, 3513 __func__, audit_msg, sp->gfn,
3523 sp->role.word); 3514 sp->role.word);
3524 spte = rmap_next(vcpu->kvm, rmapp, spte); 3515 spte = rmap_next(vcpu->kvm, rmapp, spte);
3525 } 3516 }
3526 } 3517 }
3527 } 3518 }
3528 3519
3529 static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) 3520 static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg)
3530 { 3521 {
3531 int olddbg = dbg; 3522 int olddbg = dbg;
3532 3523
3533 dbg = 0; 3524 dbg = 0;
3534 audit_msg = msg; 3525 audit_msg = msg;
3535 audit_rmap(vcpu); 3526 audit_rmap(vcpu);
3536 audit_write_protection(vcpu); 3527 audit_write_protection(vcpu);
3537 if (strcmp("pre pte write", audit_msg) != 0) 3528 if (strcmp("pre pte write", audit_msg) != 0)
3538 audit_mappings(vcpu); 3529 audit_mappings(vcpu);
3539 audit_writable_sptes_have_rmaps(vcpu); 3530 audit_writable_sptes_have_rmaps(vcpu);
3540 dbg = olddbg; 3531 dbg = olddbg;
3541 } 3532 }
3542 3533
3543 #endif 3534 #endif
3544 3535
arch/x86/kvm/paging_tmpl.h
1 /* 1 /*
2 * Kernel-based Virtual Machine driver for Linux 2 * Kernel-based Virtual Machine driver for Linux
3 * 3 *
4 * This module enables machines with Intel VT-x extensions to run virtual 4 * This module enables machines with Intel VT-x extensions to run virtual
5 * machines without emulation or binary translation. 5 * machines without emulation or binary translation.
6 * 6 *
7 * MMU support 7 * MMU support
8 * 8 *
9 * Copyright (C) 2006 Qumranet, Inc. 9 * Copyright (C) 2006 Qumranet, Inc.
10 * Copyright 2010 Red Hat, Inc. and/or its affiliates. 10 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
11 * 11 *
12 * Authors: 12 * Authors:
13 * Yaniv Kamay <yaniv@qumranet.com> 13 * Yaniv Kamay <yaniv@qumranet.com>
14 * Avi Kivity <avi@qumranet.com> 14 * Avi Kivity <avi@qumranet.com>
15 * 15 *
16 * This work is licensed under the terms of the GNU GPL, version 2. See 16 * This work is licensed under the terms of the GNU GPL, version 2. See
17 * the COPYING file in the top-level directory. 17 * the COPYING file in the top-level directory.
18 * 18 *
19 */ 19 */
20 20
21 /* 21 /*
22 * We need the mmu code to access both 32-bit and 64-bit guest ptes, 22 * We need the mmu code to access both 32-bit and 64-bit guest ptes,
23 * so the code in this file is compiled twice, once per pte size. 23 * so the code in this file is compiled twice, once per pte size.
24 */ 24 */
25 25
26 #if PTTYPE == 64 26 #if PTTYPE == 64
27 #define pt_element_t u64 27 #define pt_element_t u64
28 #define guest_walker guest_walker64 28 #define guest_walker guest_walker64
29 #define FNAME(name) paging##64_##name 29 #define FNAME(name) paging##64_##name
30 #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK 30 #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
31 #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl) 31 #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
32 #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl) 32 #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
33 #define PT_INDEX(addr, level) PT64_INDEX(addr, level) 33 #define PT_INDEX(addr, level) PT64_INDEX(addr, level)
34 #define PT_LEVEL_MASK(level) PT64_LEVEL_MASK(level) 34 #define PT_LEVEL_MASK(level) PT64_LEVEL_MASK(level)
35 #define PT_LEVEL_BITS PT64_LEVEL_BITS 35 #define PT_LEVEL_BITS PT64_LEVEL_BITS
36 #ifdef CONFIG_X86_64 36 #ifdef CONFIG_X86_64
37 #define PT_MAX_FULL_LEVELS 4 37 #define PT_MAX_FULL_LEVELS 4
38 #define CMPXCHG cmpxchg 38 #define CMPXCHG cmpxchg
39 #else 39 #else
40 #define CMPXCHG cmpxchg64 40 #define CMPXCHG cmpxchg64
41 #define PT_MAX_FULL_LEVELS 2 41 #define PT_MAX_FULL_LEVELS 2
42 #endif 42 #endif
43 #elif PTTYPE == 32 43 #elif PTTYPE == 32
44 #define pt_element_t u32 44 #define pt_element_t u32
45 #define guest_walker guest_walker32 45 #define guest_walker guest_walker32
46 #define FNAME(name) paging##32_##name 46 #define FNAME(name) paging##32_##name
47 #define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK 47 #define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK
48 #define PT_LVL_ADDR_MASK(lvl) PT32_LVL_ADDR_MASK(lvl) 48 #define PT_LVL_ADDR_MASK(lvl) PT32_LVL_ADDR_MASK(lvl)
49 #define PT_LVL_OFFSET_MASK(lvl) PT32_LVL_OFFSET_MASK(lvl) 49 #define PT_LVL_OFFSET_MASK(lvl) PT32_LVL_OFFSET_MASK(lvl)
50 #define PT_INDEX(addr, level) PT32_INDEX(addr, level) 50 #define PT_INDEX(addr, level) PT32_INDEX(addr, level)
51 #define PT_LEVEL_MASK(level) PT32_LEVEL_MASK(level) 51 #define PT_LEVEL_MASK(level) PT32_LEVEL_MASK(level)
52 #define PT_LEVEL_BITS PT32_LEVEL_BITS 52 #define PT_LEVEL_BITS PT32_LEVEL_BITS
53 #define PT_MAX_FULL_LEVELS 2 53 #define PT_MAX_FULL_LEVELS 2
54 #define CMPXCHG cmpxchg 54 #define CMPXCHG cmpxchg
55 #else 55 #else
56 #error Invalid PTTYPE value 56 #error Invalid PTTYPE value
57 #endif 57 #endif
58 58
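
Since this file is compiled once per guest pte size, the including mmu code is expected to pull it in twice with PTTYPE set to 64 and then 32, producing parallel paging64_* and paging32_* function sets from the same source. Roughly, as a simplified sketch of that inclusion pattern rather than a verbatim copy of mmu.c:

/* in the including .c file */
#define PTTYPE 64
#include "paging_tmpl.h"	/* emits paging64_walk_addr(), paging64_page_fault(), ... */
#undef PTTYPE

#define PTTYPE 32
#include "paging_tmpl.h"	/* emits paging32_walk_addr(), paging32_page_fault(), ... */
#undef PTTYPE
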
59 #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl) 59 #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl)
60 #define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PT_PAGE_TABLE_LEVEL) 60 #define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PT_PAGE_TABLE_LEVEL)
61 61
62 /* 62 /*
63 * The guest_walker structure emulates the behavior of the hardware page 63 * The guest_walker structure emulates the behavior of the hardware page
64 * table walker. 64 * table walker.
65 */ 65 */
66 struct guest_walker { 66 struct guest_walker {
67 int level; 67 int level;
68 gfn_t table_gfn[PT_MAX_FULL_LEVELS]; 68 gfn_t table_gfn[PT_MAX_FULL_LEVELS];
69 pt_element_t ptes[PT_MAX_FULL_LEVELS]; 69 pt_element_t ptes[PT_MAX_FULL_LEVELS];
70 gpa_t pte_gpa[PT_MAX_FULL_LEVELS]; 70 gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
71 unsigned pt_access; 71 unsigned pt_access;
72 unsigned pte_access; 72 unsigned pte_access;
73 gfn_t gfn; 73 gfn_t gfn;
74 u32 error_code; 74 u32 error_code;
75 }; 75 };
76 76
77 static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl) 77 static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
78 { 78 {
79 return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT; 79 return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
80 } 80 }
81 81
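
gpte_to_gfn_lvl() masks off the address bits relevant at a given level and shifts away the 12-bit page offset; combined with the PT_INDEX() macros used by walk_addr() below, a 64-bit guest virtual address splits into 9-bit table indexes per level plus the page offset. A small worked example of that decomposition with the shifts written out in plain C; idx() and the sample address are illustrative.

#include <stdio.h>
#include <stdint.h>

/* 4-level x86-64 paging: 9 index bits per level above a 4K page */
static unsigned idx(uint64_t addr, int level)
{
	return (addr >> (12 + 9 * (level - 1))) & 0x1ff;
}

int main(void)
{
	uint64_t addr = 0x00007f1234567000ull;

	for (int level = 4; level >= 1; level--)
		printf("level %d index %u\n", level, idx(addr, level));
	printf("page offset %llx\n", (unsigned long long)(addr & 0xfff));
	return 0;
}
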
82 static bool FNAME(cmpxchg_gpte)(struct kvm *kvm, 82 static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
83 gfn_t table_gfn, unsigned index, 83 gfn_t table_gfn, unsigned index,
84 pt_element_t orig_pte, pt_element_t new_pte) 84 pt_element_t orig_pte, pt_element_t new_pte)
85 { 85 {
86 pt_element_t ret; 86 pt_element_t ret;
87 pt_element_t *table; 87 pt_element_t *table;
88 struct page *page; 88 struct page *page;
89 89
90 page = gfn_to_page(kvm, table_gfn); 90 page = gfn_to_page(kvm, table_gfn);
91 91
92 table = kmap_atomic(page, KM_USER0); 92 table = kmap_atomic(page, KM_USER0);
93 ret = CMPXCHG(&table[index], orig_pte, new_pte); 93 ret = CMPXCHG(&table[index], orig_pte, new_pte);
94 kunmap_atomic(table, KM_USER0); 94 kunmap_atomic(table, KM_USER0);
95 95
96 kvm_release_page_dirty(page); 96 kvm_release_page_dirty(page);
97 97
98 return (ret != orig_pte); 98 return (ret != orig_pte);
99 } 99 }
100 100
101 static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) 101 static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
102 { 102 {
103 unsigned access; 103 unsigned access;
104 104
105 access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK; 105 access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
106 #if PTTYPE == 64 106 #if PTTYPE == 64
107 if (is_nx(vcpu)) 107 if (is_nx(vcpu))
108 access &= ~(gpte >> PT64_NX_SHIFT); 108 access &= ~(gpte >> PT64_NX_SHIFT);
109 #endif 109 #endif
110 return access; 110 return access;
111 } 111 }
112 112
113 /* 113 /*
114 * Fetch a guest pte for a guest virtual address 114 * Fetch a guest pte for a guest virtual address
115 */ 115 */
116 static int FNAME(walk_addr)(struct guest_walker *walker, 116 static int FNAME(walk_addr)(struct guest_walker *walker,
117 struct kvm_vcpu *vcpu, gva_t addr, 117 struct kvm_vcpu *vcpu, gva_t addr,
118 int write_fault, int user_fault, int fetch_fault) 118 int write_fault, int user_fault, int fetch_fault)
119 { 119 {
120 pt_element_t pte; 120 pt_element_t pte;
121 gfn_t table_gfn; 121 gfn_t table_gfn;
122 unsigned index, pt_access, pte_access; 122 unsigned index, pt_access, pte_access;
123 gpa_t pte_gpa; 123 gpa_t pte_gpa;
124 int rsvd_fault = 0; 124 int rsvd_fault = 0;
125 125
126 trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault, 126 trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
127 fetch_fault); 127 fetch_fault);
128 walk: 128 walk:
129 walker->level = vcpu->arch.mmu.root_level; 129 walker->level = vcpu->arch.mmu.root_level;
130 pte = vcpu->arch.cr3; 130 pte = vcpu->arch.cr3;
131 #if PTTYPE == 64 131 #if PTTYPE == 64
132 if (!is_long_mode(vcpu)) { 132 if (!is_long_mode(vcpu)) {
133 pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3); 133 pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
134 trace_kvm_mmu_paging_element(pte, walker->level); 134 trace_kvm_mmu_paging_element(pte, walker->level);
135 if (!is_present_gpte(pte)) 135 if (!is_present_gpte(pte))
136 goto not_present; 136 goto not_present;
137 --walker->level; 137 --walker->level;
138 } 138 }
139 #endif 139 #endif
140 ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) || 140 ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
141 (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0); 141 (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0);
142 142
143 pt_access = ACC_ALL; 143 pt_access = ACC_ALL;
144 144
145 for (;;) { 145 for (;;) {
146 index = PT_INDEX(addr, walker->level); 146 index = PT_INDEX(addr, walker->level);
147 147
148 table_gfn = gpte_to_gfn(pte); 148 table_gfn = gpte_to_gfn(pte);
149 pte_gpa = gfn_to_gpa(table_gfn); 149 pte_gpa = gfn_to_gpa(table_gfn);
150 pte_gpa += index * sizeof(pt_element_t); 150 pte_gpa += index * sizeof(pt_element_t);
151 walker->table_gfn[walker->level - 1] = table_gfn; 151 walker->table_gfn[walker->level - 1] = table_gfn;
152 walker->pte_gpa[walker->level - 1] = pte_gpa; 152 walker->pte_gpa[walker->level - 1] = pte_gpa;
153 153
154 if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) 154 if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte)))
155 goto not_present; 155 goto not_present;
156 156
157 trace_kvm_mmu_paging_element(pte, walker->level); 157 trace_kvm_mmu_paging_element(pte, walker->level);
158 158
159 if (!is_present_gpte(pte)) 159 if (!is_present_gpte(pte))
160 goto not_present; 160 goto not_present;
161 161
162 rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker->level); 162 rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker->level);
163 if (rsvd_fault) 163 if (rsvd_fault)
164 goto access_error; 164 goto access_error;
165 165
166 if (write_fault && !is_writable_pte(pte)) 166 if (write_fault && !is_writable_pte(pte))
167 if (user_fault || is_write_protection(vcpu)) 167 if (user_fault || is_write_protection(vcpu))
168 goto access_error; 168 goto access_error;
169 169
170 if (user_fault && !(pte & PT_USER_MASK)) 170 if (user_fault && !(pte & PT_USER_MASK))
171 goto access_error; 171 goto access_error;
172 172
173 #if PTTYPE == 64 173 #if PTTYPE == 64
174 if (fetch_fault && (pte & PT64_NX_MASK)) 174 if (fetch_fault && (pte & PT64_NX_MASK))
175 goto access_error; 175 goto access_error;
176 #endif 176 #endif
177 177
178 if (!(pte & PT_ACCESSED_MASK)) { 178 if (!(pte & PT_ACCESSED_MASK)) {
179 trace_kvm_mmu_set_accessed_bit(table_gfn, index, 179 trace_kvm_mmu_set_accessed_bit(table_gfn, index,
180 sizeof(pte)); 180 sizeof(pte));
181 if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, 181 if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
182 index, pte, pte|PT_ACCESSED_MASK)) 182 index, pte, pte|PT_ACCESSED_MASK))
183 goto walk; 183 goto walk;
184 mark_page_dirty(vcpu->kvm, table_gfn); 184 mark_page_dirty(vcpu->kvm, table_gfn);
185 pte |= PT_ACCESSED_MASK; 185 pte |= PT_ACCESSED_MASK;
186 } 186 }
187 187
188 pte_access = pt_access & FNAME(gpte_access)(vcpu, pte); 188 pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
189 189
190 walker->ptes[walker->level - 1] = pte; 190 walker->ptes[walker->level - 1] = pte;
191 191
192 if ((walker->level == PT_PAGE_TABLE_LEVEL) || 192 if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
193 ((walker->level == PT_DIRECTORY_LEVEL) && 193 ((walker->level == PT_DIRECTORY_LEVEL) &&
194 is_large_pte(pte) && 194 is_large_pte(pte) &&
195 (PTTYPE == 64 || is_pse(vcpu))) || 195 (PTTYPE == 64 || is_pse(vcpu))) ||
196 ((walker->level == PT_PDPE_LEVEL) && 196 ((walker->level == PT_PDPE_LEVEL) &&
197 is_large_pte(pte) && 197 is_large_pte(pte) &&
198 is_long_mode(vcpu))) { 198 is_long_mode(vcpu))) {
199 int lvl = walker->level; 199 int lvl = walker->level;
200 200
201 walker->gfn = gpte_to_gfn_lvl(pte, lvl); 201 walker->gfn = gpte_to_gfn_lvl(pte, lvl);
202 walker->gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) 202 walker->gfn += (addr & PT_LVL_OFFSET_MASK(lvl))
203 >> PAGE_SHIFT; 203 >> PAGE_SHIFT;
204 204
205 if (PTTYPE == 32 && 205 if (PTTYPE == 32 &&
206 walker->level == PT_DIRECTORY_LEVEL && 206 walker->level == PT_DIRECTORY_LEVEL &&
207 is_cpuid_PSE36()) 207 is_cpuid_PSE36())
208 walker->gfn += pse36_gfn_delta(pte); 208 walker->gfn += pse36_gfn_delta(pte);
209 209
210 break; 210 break;
211 } 211 }
212 212
213 pt_access = pte_access; 213 pt_access = pte_access;
214 --walker->level; 214 --walker->level;
215 } 215 }
216 216
217 if (write_fault && !is_dirty_gpte(pte)) { 217 if (write_fault && !is_dirty_gpte(pte)) {
218 bool ret; 218 bool ret;
219 219
220 trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); 220 trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
221 ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, 221 ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
222 pte|PT_DIRTY_MASK); 222 pte|PT_DIRTY_MASK);
223 if (ret) 223 if (ret)
224 goto walk; 224 goto walk;
225 mark_page_dirty(vcpu->kvm, table_gfn); 225 mark_page_dirty(vcpu->kvm, table_gfn);
226 pte |= PT_DIRTY_MASK; 226 pte |= PT_DIRTY_MASK;
227 walker->ptes[walker->level - 1] = pte; 227 walker->ptes[walker->level - 1] = pte;
228 } 228 }
229 229
230 walker->pt_access = pt_access; 230 walker->pt_access = pt_access;
231 walker->pte_access = pte_access; 231 walker->pte_access = pte_access;
232 pgprintk("%s: pte %llx pte_access %x pt_access %x\n", 232 pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
233 __func__, (u64)pte, pte_access, pt_access); 233 __func__, (u64)pte, pte_access, pt_access);
234 return 1; 234 return 1;
235 235
236 not_present: 236 not_present:
237 walker->error_code = 0; 237 walker->error_code = 0;
238 goto err; 238 goto err;
239 239
240 access_error: 240 access_error:
241 walker->error_code = PFERR_PRESENT_MASK; 241 walker->error_code = PFERR_PRESENT_MASK;
242 242
243 err: 243 err:
244 if (write_fault) 244 if (write_fault)
245 walker->error_code |= PFERR_WRITE_MASK; 245 walker->error_code |= PFERR_WRITE_MASK;
246 if (user_fault) 246 if (user_fault)
247 walker->error_code |= PFERR_USER_MASK; 247 walker->error_code |= PFERR_USER_MASK;
248 if (fetch_fault) 248 if (fetch_fault)
249 walker->error_code |= PFERR_FETCH_MASK; 249 walker->error_code |= PFERR_FETCH_MASK;
250 if (rsvd_fault) 250 if (rsvd_fault)
251 walker->error_code |= PFERR_RSVD_MASK; 251 walker->error_code |= PFERR_RSVD_MASK;
252 trace_kvm_mmu_walker_error(walker->error_code); 252 trace_kvm_mmu_walker_error(walker->error_code);
253 return 0; 253 return 0;
254 } 254 }
255 255
256 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 256 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
257 u64 *spte, const void *pte) 257 u64 *spte, const void *pte)
258 { 258 {
259 pt_element_t gpte; 259 pt_element_t gpte;
260 unsigned pte_access; 260 unsigned pte_access;
261 pfn_t pfn; 261 pfn_t pfn;
262 u64 new_spte; 262 u64 new_spte;
263 263
264 gpte = *(const pt_element_t *)pte; 264 gpte = *(const pt_element_t *)pte;
265 if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) { 265 if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
266 if (!is_present_gpte(gpte)) { 266 if (!is_present_gpte(gpte)) {
267 if (sp->unsync) 267 if (sp->unsync)
268 new_spte = shadow_trap_nonpresent_pte; 268 new_spte = shadow_trap_nonpresent_pte;
269 else 269 else
270 new_spte = shadow_notrap_nonpresent_pte; 270 new_spte = shadow_notrap_nonpresent_pte;
271 __set_spte(spte, new_spte); 271 __set_spte(spte, new_spte);
272 } 272 }
273 return; 273 return;
274 } 274 }
275 pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte); 275 pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
276 pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte); 276 pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
277 if (gpte_to_gfn(gpte) != vcpu->arch.update_pte.gfn) 277 if (gpte_to_gfn(gpte) != vcpu->arch.update_pte.gfn)
278 return; 278 return;
279 pfn = vcpu->arch.update_pte.pfn; 279 pfn = vcpu->arch.update_pte.pfn;
280 if (is_error_pfn(pfn)) 280 if (is_error_pfn(pfn))
281 return; 281 return;
282 if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq)) 282 if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq))
283 return; 283 return;
284 kvm_get_pfn(pfn); 284 kvm_get_pfn(pfn);
285 /* 285 /*
286 * we call mmu_set_spte() with reset_host_protection = true because 286 * we call mmu_set_spte() with reset_host_protection = true because
287 * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1). 287 * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
288 */ 288 */
289 mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0, 289 mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
290 is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL, 290 is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL,
291 gpte_to_gfn(gpte), pfn, true, true); 291 gpte_to_gfn(gpte), pfn, true, true);
292 } 292 }
293 293
294 /* 294 /*
295 * Fetch a shadow pte for a specific level in the paging hierarchy. 295 * Fetch a shadow pte for a specific level in the paging hierarchy.
296 */ 296 */
297 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, 297 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
298 struct guest_walker *gw, 298 struct guest_walker *gw,
299 int user_fault, int write_fault, int hlevel, 299 int user_fault, int write_fault, int hlevel,
300 int *ptwrite, pfn_t pfn) 300 int *ptwrite, pfn_t pfn)
301 { 301 {
302 unsigned access = gw->pt_access; 302 unsigned access = gw->pt_access;
303 struct kvm_mmu_page *sp; 303 struct kvm_mmu_page *sp;
304 u64 spte, *sptep = NULL; 304 u64 spte, *sptep = NULL;
305 int direct; 305 int direct;
306 gfn_t table_gfn; 306 gfn_t table_gfn;
307 int r; 307 int r;
308 int level; 308 int level;
309 pt_element_t curr_pte; 309 pt_element_t curr_pte;
310 struct kvm_shadow_walk_iterator iterator; 310 struct kvm_shadow_walk_iterator iterator;
311 311
312 if (!is_present_gpte(gw->ptes[gw->level - 1])) 312 if (!is_present_gpte(gw->ptes[gw->level - 1]))
313 return NULL; 313 return NULL;
314 314
315 for_each_shadow_entry(vcpu, addr, iterator) { 315 for_each_shadow_entry(vcpu, addr, iterator) {
316 level = iterator.level; 316 level = iterator.level;
317 sptep = iterator.sptep; 317 sptep = iterator.sptep;
318 if (iterator.level == hlevel) { 318 if (iterator.level == hlevel) {
319 mmu_set_spte(vcpu, sptep, access, 319 mmu_set_spte(vcpu, sptep, access,
320 gw->pte_access & access, 320 gw->pte_access & access,
321 user_fault, write_fault, 321 user_fault, write_fault,
322 is_dirty_gpte(gw->ptes[gw->level-1]), 322 is_dirty_gpte(gw->ptes[gw->level-1]),
323 ptwrite, level, 323 ptwrite, level,
324 gw->gfn, pfn, false, true); 324 gw->gfn, pfn, false, true);
325 break; 325 break;
326 } 326 }
327 327
328 if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) 328 if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
329 continue; 329 continue;
330 330
331 if (is_large_pte(*sptep)) { 331 if (is_large_pte(*sptep)) {
332 rmap_remove(vcpu->kvm, sptep); 332 rmap_remove(vcpu->kvm, sptep);
333 __set_spte(sptep, shadow_trap_nonpresent_pte); 333 __set_spte(sptep, shadow_trap_nonpresent_pte);
334 kvm_flush_remote_tlbs(vcpu->kvm); 334 kvm_flush_remote_tlbs(vcpu->kvm);
335 } 335 }
336 336
337 if (level <= gw->level) { 337 if (level <= gw->level) {
338 int delta = level - gw->level + 1; 338 int delta = level - gw->level + 1;
339 direct = 1; 339 direct = 1;
340 if (!is_dirty_gpte(gw->ptes[level - delta])) 340 if (!is_dirty_gpte(gw->ptes[level - delta]))
341 access &= ~ACC_WRITE_MASK; 341 access &= ~ACC_WRITE_MASK;
342 /* 342 /*
343 * It is a large guest page backed by small host pages, 343 * It is a large guest page backed by small host pages,
344 * so we set @direct(@sp->role.direct)=1, and set 344 * so we set @direct(@sp->role.direct)=1, and set
345 * @table_gfn(@sp->gfn)=the base page frame for linear 345 * @table_gfn(@sp->gfn)=the base page frame for linear
346 * translations. 346 * translations.
347 */ 347 */
348 table_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); 348 table_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
349 access &= gw->pte_access; 349 access &= gw->pte_access;
350 } else { 350 } else {
351 direct = 0; 351 direct = 0;
352 table_gfn = gw->table_gfn[level - 2]; 352 table_gfn = gw->table_gfn[level - 2];
353 } 353 }
354 sp = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1, 354 sp = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1,
355 direct, access, sptep); 355 direct, access, sptep);
356 if (!direct) { 356 if (!direct) {
357 r = kvm_read_guest_atomic(vcpu->kvm, 357 r = kvm_read_guest_atomic(vcpu->kvm,
358 gw->pte_gpa[level - 2], 358 gw->pte_gpa[level - 2],
359 &curr_pte, sizeof(curr_pte)); 359 &curr_pte, sizeof(curr_pte));
360 if (r || curr_pte != gw->ptes[level - 2]) { 360 if (r || curr_pte != gw->ptes[level - 2]) {
361 kvm_mmu_put_page(sp, sptep); 361 kvm_mmu_put_page(sp, sptep);
362 kvm_release_pfn_clean(pfn); 362 kvm_release_pfn_clean(pfn);
363 sptep = NULL; 363 sptep = NULL;
364 break; 364 break;
365 } 365 }
366 } 366 }
367 367
368 spte = __pa(sp->spt) 368 spte = __pa(sp->spt)
369 | PT_PRESENT_MASK | PT_ACCESSED_MASK 369 | PT_PRESENT_MASK | PT_ACCESSED_MASK
370 | PT_WRITABLE_MASK | PT_USER_MASK; 370 | PT_WRITABLE_MASK | PT_USER_MASK;
371 *sptep = spte; 371 *sptep = spte;
372 } 372 }
373 373
374 return sptep; 374 return sptep;
375 } 375 }
376 376
377 /* 377 /*
378 * Page fault handler. There are several causes for a page fault: 378 * Page fault handler. There are several causes for a page fault:
379 * - there is no shadow pte for the guest pte 379 * - there is no shadow pte for the guest pte
380 * - write access through a shadow pte marked read only so that we can set 380 * - write access through a shadow pte marked read only so that we can set
381 * the dirty bit 381 * the dirty bit
382 * - write access to a shadow pte marked read only so we can update the page 382 * - write access to a shadow pte marked read only so we can update the page
383 * dirty bitmap, when userspace requests it 383 * dirty bitmap, when userspace requests it
384 * - mmio access; in this case we will never install a present shadow pte 384 * - mmio access; in this case we will never install a present shadow pte
385 * - normal guest page fault due to the guest pte marked not present, not 385 * - normal guest page fault due to the guest pte marked not present, not
386 * writable, or not executable 386 * writable, or not executable
387 * 387 *
388 * Returns: 1 if we need to emulate the instruction, 0 otherwise, or 388 * Returns: 1 if we need to emulate the instruction, 0 otherwise, or
389 * a negative value on error. 389 * a negative value on error.
390 */ 390 */
391 static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, 391 static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
392 u32 error_code) 392 u32 error_code)
393 { 393 {
394 int write_fault = error_code & PFERR_WRITE_MASK; 394 int write_fault = error_code & PFERR_WRITE_MASK;
395 int user_fault = error_code & PFERR_USER_MASK; 395 int user_fault = error_code & PFERR_USER_MASK;
396 int fetch_fault = error_code & PFERR_FETCH_MASK; 396 int fetch_fault = error_code & PFERR_FETCH_MASK;
397 struct guest_walker walker; 397 struct guest_walker walker;
398 u64 *sptep; 398 u64 *sptep;
399 int write_pt = 0; 399 int write_pt = 0;
400 int r; 400 int r;
401 pfn_t pfn; 401 pfn_t pfn;
402 int level = PT_PAGE_TABLE_LEVEL; 402 int level = PT_PAGE_TABLE_LEVEL;
403 unsigned long mmu_seq; 403 unsigned long mmu_seq;
404 404
405 pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); 405 pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
406 kvm_mmu_audit(vcpu, "pre page fault"); 406 kvm_mmu_audit(vcpu, "pre page fault");
407 407
408 r = mmu_topup_memory_caches(vcpu); 408 r = mmu_topup_memory_caches(vcpu);
409 if (r) 409 if (r)
410 return r; 410 return r;
411 411
412 /* 412 /*
413 * Look up the guest pte for the faulting address. 413 * Look up the guest pte for the faulting address.
414 */ 414 */
415 r = FNAME(walk_addr)(&walker, vcpu, addr, write_fault, user_fault, 415 r = FNAME(walk_addr)(&walker, vcpu, addr, write_fault, user_fault,
416 fetch_fault); 416 fetch_fault);
417 417
418 /* 418 /*
419 * The page is not mapped by the guest. Let the guest handle it. 419 * The page is not mapped by the guest. Let the guest handle it.
420 */ 420 */
421 if (!r) { 421 if (!r) {
422 pgprintk("%s: guest page fault\n", __func__); 422 pgprintk("%s: guest page fault\n", __func__);
423 inject_page_fault(vcpu, addr, walker.error_code); 423 inject_page_fault(vcpu, addr, walker.error_code);
424 vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ 424 vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
425 return 0; 425 return 0;
426 } 426 }
427 427
428 if (walker.level >= PT_DIRECTORY_LEVEL) { 428 if (walker.level >= PT_DIRECTORY_LEVEL) {
429 level = min(walker.level, mapping_level(vcpu, walker.gfn)); 429 level = min(walker.level, mapping_level(vcpu, walker.gfn));
430 walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); 430 walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
431 } 431 }
432 432
433 mmu_seq = vcpu->kvm->mmu_notifier_seq; 433 mmu_seq = vcpu->kvm->mmu_notifier_seq;
434 smp_rmb(); 434 smp_rmb();
435 pfn = gfn_to_pfn(vcpu->kvm, walker.gfn); 435 pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
436 436
437 /* mmio */ 437 /* mmio */
438 if (is_error_pfn(pfn)) 438 if (is_error_pfn(pfn))
439 return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn); 439 return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn);
440 440
441 spin_lock(&vcpu->kvm->mmu_lock); 441 spin_lock(&vcpu->kvm->mmu_lock);
442 if (mmu_notifier_retry(vcpu, mmu_seq)) 442 if (mmu_notifier_retry(vcpu, mmu_seq))
443 goto out_unlock; 443 goto out_unlock;
444 kvm_mmu_free_some_pages(vcpu); 444 kvm_mmu_free_some_pages(vcpu);
445 sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, 445 sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
446 level, &write_pt, pfn); 446 level, &write_pt, pfn);
447 (void)sptep; 447 (void)sptep;
448 pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__, 448 pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
449 sptep, *sptep, write_pt); 449 sptep, *sptep, write_pt);
450 450
451 if (!write_pt) 451 if (!write_pt)
452 vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ 452 vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
453 453
454 ++vcpu->stat.pf_fixed; 454 ++vcpu->stat.pf_fixed;
455 kvm_mmu_audit(vcpu, "post page fault (fixed)"); 455 kvm_mmu_audit(vcpu, "post page fault (fixed)");
456 spin_unlock(&vcpu->kvm->mmu_lock); 456 spin_unlock(&vcpu->kvm->mmu_lock);
457 457
458 return write_pt; 458 return write_pt;
459 459
460 out_unlock: 460 out_unlock:
461 spin_unlock(&vcpu->kvm->mmu_lock); 461 spin_unlock(&vcpu->kvm->mmu_lock);
462 kvm_release_pfn_clean(pfn); 462 kvm_release_pfn_clean(pfn);
463 return 0; 463 return 0;
464 } 464 }
465 465
466 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) 466 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
467 { 467 {
468 struct kvm_shadow_walk_iterator iterator; 468 struct kvm_shadow_walk_iterator iterator;
469 struct kvm_mmu_page *sp; 469 struct kvm_mmu_page *sp;
470 gpa_t pte_gpa = -1; 470 gpa_t pte_gpa = -1;
471 int level; 471 int level;
472 u64 *sptep; 472 u64 *sptep;
473 int need_flush = 0; 473 int need_flush = 0;
474 474
475 spin_lock(&vcpu->kvm->mmu_lock); 475 spin_lock(&vcpu->kvm->mmu_lock);
476 476
477 for_each_shadow_entry(vcpu, gva, iterator) { 477 for_each_shadow_entry(vcpu, gva, iterator) {
478 level = iterator.level; 478 level = iterator.level;
479 sptep = iterator.sptep; 479 sptep = iterator.sptep;
480 480
481 sp = page_header(__pa(sptep)); 481 sp = page_header(__pa(sptep));
482 if (is_last_spte(*sptep, level)) { 482 if (is_last_spte(*sptep, level)) {
483 int offset, shift; 483 int offset, shift;
484 484
485 if (!sp->unsync) 485 if (!sp->unsync)
486 break; 486 break;
487 487
488 shift = PAGE_SHIFT - 488 shift = PAGE_SHIFT -
489 (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level; 489 (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
490 offset = sp->role.quadrant << shift; 490 offset = sp->role.quadrant << shift;
491 491
492 pte_gpa = (sp->gfn << PAGE_SHIFT) + offset; 492 pte_gpa = (sp->gfn << PAGE_SHIFT) + offset;
493 pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t); 493 pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
494 494
495 if (is_shadow_present_pte(*sptep)) { 495 if (is_shadow_present_pte(*sptep)) {
496 rmap_remove(vcpu->kvm, sptep); 496 rmap_remove(vcpu->kvm, sptep);
497 if (is_large_pte(*sptep)) 497 if (is_large_pte(*sptep))
498 --vcpu->kvm->stat.lpages; 498 --vcpu->kvm->stat.lpages;
499 need_flush = 1; 499 need_flush = 1;
500 } 500 }
501 __set_spte(sptep, shadow_trap_nonpresent_pte); 501 __set_spte(sptep, shadow_trap_nonpresent_pte);
502 break; 502 break;
503 } 503 }
504 504
505 if (!is_shadow_present_pte(*sptep) || !sp->unsync_children) 505 if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
506 break; 506 break;
507 } 507 }
508 508
509 if (need_flush) 509 if (need_flush)
510 kvm_flush_remote_tlbs(vcpu->kvm); 510 kvm_flush_remote_tlbs(vcpu->kvm);
511 511
512 atomic_inc(&vcpu->kvm->arch.invlpg_counter); 512 atomic_inc(&vcpu->kvm->arch.invlpg_counter);
513 513
514 spin_unlock(&vcpu->kvm->mmu_lock); 514 spin_unlock(&vcpu->kvm->mmu_lock);
515 515
516 if (pte_gpa == -1) 516 if (pte_gpa == -1)
517 return; 517 return;
518 518
519 if (mmu_topup_memory_caches(vcpu)) 519 if (mmu_topup_memory_caches(vcpu))
520 return; 520 return;
521 kvm_mmu_pte_write(vcpu, pte_gpa, NULL, sizeof(pt_element_t), 0); 521 kvm_mmu_pte_write(vcpu, pte_gpa, NULL, sizeof(pt_element_t), 0);
522 } 522 }
523 523
524 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access, 524 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
525 u32 *error) 525 u32 *error)
526 { 526 {
527 struct guest_walker walker; 527 struct guest_walker walker;
528 gpa_t gpa = UNMAPPED_GVA; 528 gpa_t gpa = UNMAPPED_GVA;
529 int r; 529 int r;
530 530
531 r = FNAME(walk_addr)(&walker, vcpu, vaddr, 531 r = FNAME(walk_addr)(&walker, vcpu, vaddr,
532 !!(access & PFERR_WRITE_MASK), 532 !!(access & PFERR_WRITE_MASK),
533 !!(access & PFERR_USER_MASK), 533 !!(access & PFERR_USER_MASK),
534 !!(access & PFERR_FETCH_MASK)); 534 !!(access & PFERR_FETCH_MASK));
535 535
536 if (r) { 536 if (r) {
537 gpa = gfn_to_gpa(walker.gfn); 537 gpa = gfn_to_gpa(walker.gfn);
538 gpa |= vaddr & ~PAGE_MASK; 538 gpa |= vaddr & ~PAGE_MASK;
539 } else if (error) 539 } else if (error)
540 *error = walker.error_code; 540 *error = walker.error_code;
541 541
542 return gpa; 542 return gpa;
543 } 543 }
544 544
545 static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu, 545 static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
546 struct kvm_mmu_page *sp) 546 struct kvm_mmu_page *sp)
547 { 547 {
548 int i, j, offset, r; 548 int i, j, offset, r;
549 pt_element_t pt[256 / sizeof(pt_element_t)]; 549 pt_element_t pt[256 / sizeof(pt_element_t)];
550 gpa_t pte_gpa; 550 gpa_t pte_gpa;
551 551
552 if (sp->role.direct 552 if (sp->role.direct
553 || (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) { 553 || (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) {
554 nonpaging_prefetch_page(vcpu, sp); 554 nonpaging_prefetch_page(vcpu, sp);
555 return; 555 return;
556 } 556 }
557 557
558 pte_gpa = gfn_to_gpa(sp->gfn); 558 pte_gpa = gfn_to_gpa(sp->gfn);
559 if (PTTYPE == 32) { 559 if (PTTYPE == 32) {
560 offset = sp->role.quadrant << PT64_LEVEL_BITS; 560 offset = sp->role.quadrant << PT64_LEVEL_BITS;
561 pte_gpa += offset * sizeof(pt_element_t); 561 pte_gpa += offset * sizeof(pt_element_t);
562 } 562 }
563 563
564 for (i = 0; i < PT64_ENT_PER_PAGE; i += ARRAY_SIZE(pt)) { 564 for (i = 0; i < PT64_ENT_PER_PAGE; i += ARRAY_SIZE(pt)) {
565 r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, pt, sizeof pt); 565 r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, pt, sizeof pt);
566 pte_gpa += ARRAY_SIZE(pt) * sizeof(pt_element_t); 566 pte_gpa += ARRAY_SIZE(pt) * sizeof(pt_element_t);
567 for (j = 0; j < ARRAY_SIZE(pt); ++j) 567 for (j = 0; j < ARRAY_SIZE(pt); ++j)
568 if (r || is_present_gpte(pt[j])) 568 if (r || is_present_gpte(pt[j]))
569 sp->spt[i+j] = shadow_trap_nonpresent_pte; 569 sp->spt[i+j] = shadow_trap_nonpresent_pte;
570 else 570 else
571 sp->spt[i+j] = shadow_notrap_nonpresent_pte; 571 sp->spt[i+j] = shadow_notrap_nonpresent_pte;
572 } 572 }
573 } 573 }
574 574
575 /* 575 /*
576 * Using the cached information from sp->gfns is safe because: 576 * Using the cached information from sp->gfns is safe because:
577 * - The spte has a reference to the struct page, so the pfn for a given gfn 577 * - The spte has a reference to the struct page, so the pfn for a given gfn
578 * can't change unless all sptes pointing to it are nuked first. 578 * can't change unless all sptes pointing to it are nuked first.
579 * - Alias changes zap the entire shadow cache.
580 */ 579 */
581 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 580 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
582 bool clear_unsync) 581 bool clear_unsync)
583 { 582 {
584 int i, offset, nr_present; 583 int i, offset, nr_present;
585 bool reset_host_protection; 584 bool reset_host_protection;
586 gpa_t first_pte_gpa; 585 gpa_t first_pte_gpa;
587 586
588 offset = nr_present = 0; 587 offset = nr_present = 0;
589 588
590 /* direct kvm_mmu_page can not be unsync. */ 589 /* direct kvm_mmu_page can not be unsync. */
591 BUG_ON(sp->role.direct); 590 BUG_ON(sp->role.direct);
592 591
593 if (PTTYPE == 32) 592 if (PTTYPE == 32)
594 offset = sp->role.quadrant << PT64_LEVEL_BITS; 593 offset = sp->role.quadrant << PT64_LEVEL_BITS;
595 594
596 first_pte_gpa = gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t); 595 first_pte_gpa = gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
597 596
598 for (i = 0; i < PT64_ENT_PER_PAGE; i++) { 597 for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
599 unsigned pte_access; 598 unsigned pte_access;
600 pt_element_t gpte; 599 pt_element_t gpte;
601 gpa_t pte_gpa; 600 gpa_t pte_gpa;
602 gfn_t gfn; 601 gfn_t gfn;
603 602
604 if (!is_shadow_present_pte(sp->spt[i])) 603 if (!is_shadow_present_pte(sp->spt[i]))
605 continue; 604 continue;
606 605
607 pte_gpa = first_pte_gpa + i * sizeof(pt_element_t); 606 pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
608 607
609 if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte, 608 if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte,
610 sizeof(pt_element_t))) 609 sizeof(pt_element_t)))
611 return -EINVAL; 610 return -EINVAL;
612 611
613 gfn = gpte_to_gfn(gpte); 612 gfn = gpte_to_gfn(gpte);
614 if (unalias_gfn(vcpu->kvm, gfn) != sp->gfns[i] || 613 if (gfn != sp->gfns[i] ||
615 !is_present_gpte(gpte) || !(gpte & PT_ACCESSED_MASK)) { 614 !is_present_gpte(gpte) || !(gpte & PT_ACCESSED_MASK)) {
616 u64 nonpresent; 615 u64 nonpresent;
617 616
618 rmap_remove(vcpu->kvm, &sp->spt[i]); 617 rmap_remove(vcpu->kvm, &sp->spt[i]);
619 if (is_present_gpte(gpte) || !clear_unsync) 618 if (is_present_gpte(gpte) || !clear_unsync)
620 nonpresent = shadow_trap_nonpresent_pte; 619 nonpresent = shadow_trap_nonpresent_pte;
621 else 620 else
622 nonpresent = shadow_notrap_nonpresent_pte; 621 nonpresent = shadow_notrap_nonpresent_pte;
623 __set_spte(&sp->spt[i], nonpresent); 622 __set_spte(&sp->spt[i], nonpresent);
624 continue; 623 continue;
625 } 624 }
626 625
627 nr_present++; 626 nr_present++;
628 pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte); 627 pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
629 if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) { 628 if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) {
630 pte_access &= ~ACC_WRITE_MASK; 629 pte_access &= ~ACC_WRITE_MASK;
631 reset_host_protection = 0; 630 reset_host_protection = 0;
632 } else { 631 } else {
633 reset_host_protection = 1; 632 reset_host_protection = 1;
634 } 633 }
635 set_spte(vcpu, &sp->spt[i], pte_access, 0, 0, 634 set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
636 is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn, 635 is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn,
637 spte_to_pfn(sp->spt[i]), true, false, 636 spte_to_pfn(sp->spt[i]), true, false,
638 reset_host_protection); 637 reset_host_protection);
639 } 638 }
640 639
641 return !nr_present; 640 return !nr_present;
642 } 641 }
643 642
644 #undef pt_element_t 643 #undef pt_element_t
645 #undef guest_walker 644 #undef guest_walker
646 #undef FNAME 645 #undef FNAME
647 #undef PT_BASE_ADDR_MASK 646 #undef PT_BASE_ADDR_MASK
648 #undef PT_INDEX 647 #undef PT_INDEX
649 #undef PT_LEVEL_MASK 648 #undef PT_LEVEL_MASK
650 #undef PT_LVL_ADDR_MASK 649 #undef PT_LVL_ADDR_MASK
651 #undef PT_LVL_OFFSET_MASK 650 #undef PT_LVL_OFFSET_MASK
652 #undef PT_LEVEL_BITS 651 #undef PT_LEVEL_BITS
653 #undef PT_MAX_FULL_LEVELS 652 #undef PT_MAX_FULL_LEVELS
654 #undef gpte_to_gfn 653 #undef gpte_to_gfn
655 #undef gpte_to_gfn_lvl 654 #undef gpte_to_gfn_lvl
656 #undef CMPXCHG 655 #undef CMPXCHG
657 656
1 /* 1 /*
2 * Kernel-based Virtual Machine driver for Linux 2 * Kernel-based Virtual Machine driver for Linux
3 * 3 *
4 * derived from drivers/kvm/kvm_main.c 4 * derived from drivers/kvm/kvm_main.c
5 * 5 *
6 * Copyright (C) 2006 Qumranet, Inc. 6 * Copyright (C) 2006 Qumranet, Inc.
7 * Copyright (C) 2008 Qumranet, Inc. 7 * Copyright (C) 2008 Qumranet, Inc.
8 * Copyright IBM Corporation, 2008 8 * Copyright IBM Corporation, 2008
9 * Copyright 2010 Red Hat, Inc. and/or its affiliates. 9 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
10 * 10 *
11 * Authors: 11 * Authors:
12 * Avi Kivity <avi@qumranet.com> 12 * Avi Kivity <avi@qumranet.com>
13 * Yaniv Kamay <yaniv@qumranet.com> 13 * Yaniv Kamay <yaniv@qumranet.com>
14 * Amit Shah <amit.shah@qumranet.com> 14 * Amit Shah <amit.shah@qumranet.com>
15 * Ben-Ami Yassour <benami@il.ibm.com> 15 * Ben-Ami Yassour <benami@il.ibm.com>
16 * 16 *
17 * This work is licensed under the terms of the GNU GPL, version 2. See 17 * This work is licensed under the terms of the GNU GPL, version 2. See
18 * the COPYING file in the top-level directory. 18 * the COPYING file in the top-level directory.
19 * 19 *
20 */ 20 */
21 21
22 #include <linux/kvm_host.h> 22 #include <linux/kvm_host.h>
23 #include "irq.h" 23 #include "irq.h"
24 #include "mmu.h" 24 #include "mmu.h"
25 #include "i8254.h" 25 #include "i8254.h"
26 #include "tss.h" 26 #include "tss.h"
27 #include "kvm_cache_regs.h" 27 #include "kvm_cache_regs.h"
28 #include "x86.h" 28 #include "x86.h"
29 29
30 #include <linux/clocksource.h> 30 #include <linux/clocksource.h>
31 #include <linux/interrupt.h> 31 #include <linux/interrupt.h>
32 #include <linux/kvm.h> 32 #include <linux/kvm.h>
33 #include <linux/fs.h> 33 #include <linux/fs.h>
34 #include <linux/vmalloc.h> 34 #include <linux/vmalloc.h>
35 #include <linux/module.h> 35 #include <linux/module.h>
36 #include <linux/mman.h> 36 #include <linux/mman.h>
37 #include <linux/highmem.h> 37 #include <linux/highmem.h>
38 #include <linux/iommu.h> 38 #include <linux/iommu.h>
39 #include <linux/intel-iommu.h> 39 #include <linux/intel-iommu.h>
40 #include <linux/cpufreq.h> 40 #include <linux/cpufreq.h>
41 #include <linux/user-return-notifier.h> 41 #include <linux/user-return-notifier.h>
42 #include <linux/srcu.h> 42 #include <linux/srcu.h>
43 #include <linux/slab.h> 43 #include <linux/slab.h>
44 #include <linux/perf_event.h> 44 #include <linux/perf_event.h>
45 #include <linux/uaccess.h> 45 #include <linux/uaccess.h>
46 #include <trace/events/kvm.h> 46 #include <trace/events/kvm.h>
47 47
48 #define CREATE_TRACE_POINTS 48 #define CREATE_TRACE_POINTS
49 #include "trace.h" 49 #include "trace.h"
50 50
51 #include <asm/debugreg.h> 51 #include <asm/debugreg.h>
52 #include <asm/msr.h> 52 #include <asm/msr.h>
53 #include <asm/desc.h> 53 #include <asm/desc.h>
54 #include <asm/mtrr.h> 54 #include <asm/mtrr.h>
55 #include <asm/mce.h> 55 #include <asm/mce.h>
56 #include <asm/i387.h> 56 #include <asm/i387.h>
57 #include <asm/xcr.h> 57 #include <asm/xcr.h>
58 58
59 #define MAX_IO_MSRS 256 59 #define MAX_IO_MSRS 256
60 #define CR0_RESERVED_BITS \ 60 #define CR0_RESERVED_BITS \
61 (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ 61 (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
62 | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \ 62 | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
63 | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG)) 63 | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
64 #define CR4_RESERVED_BITS \ 64 #define CR4_RESERVED_BITS \
65 (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\ 65 (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
66 | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \ 66 | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
67 | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \ 67 | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \
68 | X86_CR4_OSXSAVE \ 68 | X86_CR4_OSXSAVE \
69 | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE)) 69 | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
70 70
71 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) 71 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
72 72
73 #define KVM_MAX_MCE_BANKS 32 73 #define KVM_MAX_MCE_BANKS 32
74 #define KVM_MCE_CAP_SUPPORTED MCG_CTL_P 74 #define KVM_MCE_CAP_SUPPORTED MCG_CTL_P
75 75
76 /* EFER defaults: 76 /* EFER defaults:
77 * - enable syscall by default because it is emulated by KVM 77 * - enable syscall by default because it is emulated by KVM
78 * - enable LME and LMA by default on 64-bit KVM 78 * - enable LME and LMA by default on 64-bit KVM
79 */ 79 */
80 #ifdef CONFIG_X86_64 80 #ifdef CONFIG_X86_64
81 static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffafeULL; 81 static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffafeULL;
82 #else 82 #else
83 static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffffeULL; 83 static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffffeULL;
84 #endif 84 #endif
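The two efer_reserved_bits values above are simply the complement of the EFER bits KVM lets the guest toggle by default: SCE (bit 0) everywhere, plus LME (bit 8) and LMA (bit 10) on 64-bit hosts. A quick check of the constants, using the architectural EFER bit positions, shown only to make the masks less magic:

/* 64-bit: ~(EFER_SCE | EFER_LME | EFER_LMA)
 *       = ~((1ULL << 0) | (1ULL << 8) | (1ULL << 10))
 *       = ~0x0000000000000501ULL
 *       =  0xfffffffffffffafeULL
 *
 * 32-bit: ~EFER_SCE = ~(1ULL << 0) = 0xfffffffffffffffeULL
 */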
85 85
86 #define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM 86 #define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
87 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU 87 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
88 88
89 static void update_cr8_intercept(struct kvm_vcpu *vcpu); 89 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
90 static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, 90 static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
91 struct kvm_cpuid_entry2 __user *entries); 91 struct kvm_cpuid_entry2 __user *entries);
92 92
93 struct kvm_x86_ops *kvm_x86_ops; 93 struct kvm_x86_ops *kvm_x86_ops;
94 EXPORT_SYMBOL_GPL(kvm_x86_ops); 94 EXPORT_SYMBOL_GPL(kvm_x86_ops);
95 95
96 int ignore_msrs = 0; 96 int ignore_msrs = 0;
97 module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR); 97 module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR);
98 98
99 #define KVM_NR_SHARED_MSRS 16 99 #define KVM_NR_SHARED_MSRS 16
100 100
101 struct kvm_shared_msrs_global { 101 struct kvm_shared_msrs_global {
102 int nr; 102 int nr;
103 u32 msrs[KVM_NR_SHARED_MSRS]; 103 u32 msrs[KVM_NR_SHARED_MSRS];
104 }; 104 };
105 105
106 struct kvm_shared_msrs { 106 struct kvm_shared_msrs {
107 struct user_return_notifier urn; 107 struct user_return_notifier urn;
108 bool registered; 108 bool registered;
109 struct kvm_shared_msr_values { 109 struct kvm_shared_msr_values {
110 u64 host; 110 u64 host;
111 u64 curr; 111 u64 curr;
112 } values[KVM_NR_SHARED_MSRS]; 112 } values[KVM_NR_SHARED_MSRS];
113 }; 113 };
114 114
115 static struct kvm_shared_msrs_global __read_mostly shared_msrs_global; 115 static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
116 static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs); 116 static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs);
117 117
118 struct kvm_stats_debugfs_item debugfs_entries[] = { 118 struct kvm_stats_debugfs_item debugfs_entries[] = {
119 { "pf_fixed", VCPU_STAT(pf_fixed) }, 119 { "pf_fixed", VCPU_STAT(pf_fixed) },
120 { "pf_guest", VCPU_STAT(pf_guest) }, 120 { "pf_guest", VCPU_STAT(pf_guest) },
121 { "tlb_flush", VCPU_STAT(tlb_flush) }, 121 { "tlb_flush", VCPU_STAT(tlb_flush) },
122 { "invlpg", VCPU_STAT(invlpg) }, 122 { "invlpg", VCPU_STAT(invlpg) },
123 { "exits", VCPU_STAT(exits) }, 123 { "exits", VCPU_STAT(exits) },
124 { "io_exits", VCPU_STAT(io_exits) }, 124 { "io_exits", VCPU_STAT(io_exits) },
125 { "mmio_exits", VCPU_STAT(mmio_exits) }, 125 { "mmio_exits", VCPU_STAT(mmio_exits) },
126 { "signal_exits", VCPU_STAT(signal_exits) }, 126 { "signal_exits", VCPU_STAT(signal_exits) },
127 { "irq_window", VCPU_STAT(irq_window_exits) }, 127 { "irq_window", VCPU_STAT(irq_window_exits) },
128 { "nmi_window", VCPU_STAT(nmi_window_exits) }, 128 { "nmi_window", VCPU_STAT(nmi_window_exits) },
129 { "halt_exits", VCPU_STAT(halt_exits) }, 129 { "halt_exits", VCPU_STAT(halt_exits) },
130 { "halt_wakeup", VCPU_STAT(halt_wakeup) }, 130 { "halt_wakeup", VCPU_STAT(halt_wakeup) },
131 { "hypercalls", VCPU_STAT(hypercalls) }, 131 { "hypercalls", VCPU_STAT(hypercalls) },
132 { "request_irq", VCPU_STAT(request_irq_exits) }, 132 { "request_irq", VCPU_STAT(request_irq_exits) },
133 { "irq_exits", VCPU_STAT(irq_exits) }, 133 { "irq_exits", VCPU_STAT(irq_exits) },
134 { "host_state_reload", VCPU_STAT(host_state_reload) }, 134 { "host_state_reload", VCPU_STAT(host_state_reload) },
135 { "efer_reload", VCPU_STAT(efer_reload) }, 135 { "efer_reload", VCPU_STAT(efer_reload) },
136 { "fpu_reload", VCPU_STAT(fpu_reload) }, 136 { "fpu_reload", VCPU_STAT(fpu_reload) },
137 { "insn_emulation", VCPU_STAT(insn_emulation) }, 137 { "insn_emulation", VCPU_STAT(insn_emulation) },
138 { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) }, 138 { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
139 { "irq_injections", VCPU_STAT(irq_injections) }, 139 { "irq_injections", VCPU_STAT(irq_injections) },
140 { "nmi_injections", VCPU_STAT(nmi_injections) }, 140 { "nmi_injections", VCPU_STAT(nmi_injections) },
141 { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) }, 141 { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
142 { "mmu_pte_write", VM_STAT(mmu_pte_write) }, 142 { "mmu_pte_write", VM_STAT(mmu_pte_write) },
143 { "mmu_pte_updated", VM_STAT(mmu_pte_updated) }, 143 { "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
144 { "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) }, 144 { "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) },
145 { "mmu_flooded", VM_STAT(mmu_flooded) }, 145 { "mmu_flooded", VM_STAT(mmu_flooded) },
146 { "mmu_recycled", VM_STAT(mmu_recycled) }, 146 { "mmu_recycled", VM_STAT(mmu_recycled) },
147 { "mmu_cache_miss", VM_STAT(mmu_cache_miss) }, 147 { "mmu_cache_miss", VM_STAT(mmu_cache_miss) },
148 { "mmu_unsync", VM_STAT(mmu_unsync) }, 148 { "mmu_unsync", VM_STAT(mmu_unsync) },
149 { "remote_tlb_flush", VM_STAT(remote_tlb_flush) }, 149 { "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
150 { "largepages", VM_STAT(lpages) }, 150 { "largepages", VM_STAT(lpages) },
151 { NULL } 151 { NULL }
152 }; 152 };
153 153
154 u64 __read_mostly host_xcr0; 154 u64 __read_mostly host_xcr0;
155 155
156 static inline u32 bit(int bitno) 156 static inline u32 bit(int bitno)
157 { 157 {
158 return 1 << (bitno & 31); 158 return 1 << (bitno & 31);
159 } 159 }
160 160
161 static void kvm_on_user_return(struct user_return_notifier *urn) 161 static void kvm_on_user_return(struct user_return_notifier *urn)
162 { 162 {
163 unsigned slot; 163 unsigned slot;
164 struct kvm_shared_msrs *locals 164 struct kvm_shared_msrs *locals
165 = container_of(urn, struct kvm_shared_msrs, urn); 165 = container_of(urn, struct kvm_shared_msrs, urn);
166 struct kvm_shared_msr_values *values; 166 struct kvm_shared_msr_values *values;
167 167
168 for (slot = 0; slot < shared_msrs_global.nr; ++slot) { 168 for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
169 values = &locals->values[slot]; 169 values = &locals->values[slot];
170 if (values->host != values->curr) { 170 if (values->host != values->curr) {
171 wrmsrl(shared_msrs_global.msrs[slot], values->host); 171 wrmsrl(shared_msrs_global.msrs[slot], values->host);
172 values->curr = values->host; 172 values->curr = values->host;
173 } 173 }
174 } 174 }
175 locals->registered = false; 175 locals->registered = false;
176 user_return_notifier_unregister(urn); 176 user_return_notifier_unregister(urn);
177 } 177 }
178 178
179 static void shared_msr_update(unsigned slot, u32 msr) 179 static void shared_msr_update(unsigned slot, u32 msr)
180 { 180 {
181 struct kvm_shared_msrs *smsr; 181 struct kvm_shared_msrs *smsr;
182 u64 value; 182 u64 value;
183 183
184 smsr = &__get_cpu_var(shared_msrs); 184 smsr = &__get_cpu_var(shared_msrs);
185 /* this is only read, and nobody should modify it at this time, 185 /* this is only read, and nobody should modify it at this time,
186 * so we don't need a lock */ 186 * so we don't need a lock */
187 if (slot >= shared_msrs_global.nr) { 187 if (slot >= shared_msrs_global.nr) {
188 printk(KERN_ERR "kvm: invalid MSR slot!"); 188 printk(KERN_ERR "kvm: invalid MSR slot!");
189 return; 189 return;
190 } 190 }
191 rdmsrl_safe(msr, &value); 191 rdmsrl_safe(msr, &value);
192 smsr->values[slot].host = value; 192 smsr->values[slot].host = value;
193 smsr->values[slot].curr = value; 193 smsr->values[slot].curr = value;
194 } 194 }
195 195
196 void kvm_define_shared_msr(unsigned slot, u32 msr) 196 void kvm_define_shared_msr(unsigned slot, u32 msr)
197 { 197 {
198 if (slot >= shared_msrs_global.nr) 198 if (slot >= shared_msrs_global.nr)
199 shared_msrs_global.nr = slot + 1; 199 shared_msrs_global.nr = slot + 1;
200 shared_msrs_global.msrs[slot] = msr; 200 shared_msrs_global.msrs[slot] = msr;
201 /* make sure shared_msrs_global has been updated before it is used */ 201 /* make sure shared_msrs_global has been updated before it is used */
202 smp_wmb(); 202 smp_wmb();
203 } 203 }
204 EXPORT_SYMBOL_GPL(kvm_define_shared_msr); 204 EXPORT_SYMBOL_GPL(kvm_define_shared_msr);
205 205
206 static void kvm_shared_msr_cpu_online(void) 206 static void kvm_shared_msr_cpu_online(void)
207 { 207 {
208 unsigned i; 208 unsigned i;
209 209
210 for (i = 0; i < shared_msrs_global.nr; ++i) 210 for (i = 0; i < shared_msrs_global.nr; ++i)
211 shared_msr_update(i, shared_msrs_global.msrs[i]); 211 shared_msr_update(i, shared_msrs_global.msrs[i]);
212 } 212 }
213 213
214 void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask) 214 void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
215 { 215 {
216 struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs); 216 struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);
217 217
218 if (((value ^ smsr->values[slot].curr) & mask) == 0) 218 if (((value ^ smsr->values[slot].curr) & mask) == 0)
219 return; 219 return;
220 smsr->values[slot].curr = value; 220 smsr->values[slot].curr = value;
221 wrmsrl(shared_msrs_global.msrs[slot], value); 221 wrmsrl(shared_msrs_global.msrs[slot], value);
222 if (!smsr->registered) { 222 if (!smsr->registered) {
223 smsr->urn.on_user_return = kvm_on_user_return; 223 smsr->urn.on_user_return = kvm_on_user_return;
224 user_return_notifier_register(&smsr->urn); 224 user_return_notifier_register(&smsr->urn);
225 smsr->registered = true; 225 smsr->registered = true;
226 } 226 }
227 } 227 }
228 EXPORT_SYMBOL_GPL(kvm_set_shared_msr); 228 EXPORT_SYMBOL_GPL(kvm_set_shared_msr);
229 229
230 static void drop_user_return_notifiers(void *ignore) 230 static void drop_user_return_notifiers(void *ignore)
231 { 231 {
232 struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs); 232 struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);
233 233
234 if (smsr->registered) 234 if (smsr->registered)
235 kvm_on_user_return(&smsr->urn); 235 kvm_on_user_return(&smsr->urn);
236 } 236 }
237 237
238 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu) 238 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu)
239 { 239 {
240 if (irqchip_in_kernel(vcpu->kvm)) 240 if (irqchip_in_kernel(vcpu->kvm))
241 return vcpu->arch.apic_base; 241 return vcpu->arch.apic_base;
242 else 242 else
243 return vcpu->arch.apic_base; 243 return vcpu->arch.apic_base;
244 } 244 }
245 EXPORT_SYMBOL_GPL(kvm_get_apic_base); 245 EXPORT_SYMBOL_GPL(kvm_get_apic_base);
246 246
247 void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data) 247 void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
248 { 248 {
249 /* TODO: reserved bits check */ 249 /* TODO: reserved bits check */
250 if (irqchip_in_kernel(vcpu->kvm)) 250 if (irqchip_in_kernel(vcpu->kvm))
251 kvm_lapic_set_base(vcpu, data); 251 kvm_lapic_set_base(vcpu, data);
252 else 252 else
253 vcpu->arch.apic_base = data; 253 vcpu->arch.apic_base = data;
254 } 254 }
255 EXPORT_SYMBOL_GPL(kvm_set_apic_base); 255 EXPORT_SYMBOL_GPL(kvm_set_apic_base);
256 256
257 #define EXCPT_BENIGN 0 257 #define EXCPT_BENIGN 0
258 #define EXCPT_CONTRIBUTORY 1 258 #define EXCPT_CONTRIBUTORY 1
259 #define EXCPT_PF 2 259 #define EXCPT_PF 2
260 260
261 static int exception_class(int vector) 261 static int exception_class(int vector)
262 { 262 {
263 switch (vector) { 263 switch (vector) {
264 case PF_VECTOR: 264 case PF_VECTOR:
265 return EXCPT_PF; 265 return EXCPT_PF;
266 case DE_VECTOR: 266 case DE_VECTOR:
267 case TS_VECTOR: 267 case TS_VECTOR:
268 case NP_VECTOR: 268 case NP_VECTOR:
269 case SS_VECTOR: 269 case SS_VECTOR:
270 case GP_VECTOR: 270 case GP_VECTOR:
271 return EXCPT_CONTRIBUTORY; 271 return EXCPT_CONTRIBUTORY;
272 default: 272 default:
273 break; 273 break;
274 } 274 }
275 return EXCPT_BENIGN; 275 return EXCPT_BENIGN;
276 } 276 }
277 277
278 static void kvm_multiple_exception(struct kvm_vcpu *vcpu, 278 static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
279 unsigned nr, bool has_error, u32 error_code, 279 unsigned nr, bool has_error, u32 error_code,
280 bool reinject) 280 bool reinject)
281 { 281 {
282 u32 prev_nr; 282 u32 prev_nr;
283 int class1, class2; 283 int class1, class2;
284 284
285 if (!vcpu->arch.exception.pending) { 285 if (!vcpu->arch.exception.pending) {
286 queue: 286 queue:
287 vcpu->arch.exception.pending = true; 287 vcpu->arch.exception.pending = true;
288 vcpu->arch.exception.has_error_code = has_error; 288 vcpu->arch.exception.has_error_code = has_error;
289 vcpu->arch.exception.nr = nr; 289 vcpu->arch.exception.nr = nr;
290 vcpu->arch.exception.error_code = error_code; 290 vcpu->arch.exception.error_code = error_code;
291 vcpu->arch.exception.reinject = reinject; 291 vcpu->arch.exception.reinject = reinject;
292 return; 292 return;
293 } 293 }
294 294
295 /* check for double/triple fault conditions */ 295 /* check for double/triple fault conditions */
296 prev_nr = vcpu->arch.exception.nr; 296 prev_nr = vcpu->arch.exception.nr;
297 if (prev_nr == DF_VECTOR) { 297 if (prev_nr == DF_VECTOR) {
298 /* triple fault -> shutdown */ 298 /* triple fault -> shutdown */
299 set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); 299 set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
300 return; 300 return;
301 } 301 }
302 class1 = exception_class(prev_nr); 302 class1 = exception_class(prev_nr);
303 class2 = exception_class(nr); 303 class2 = exception_class(nr);
304 if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) 304 if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
305 || (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) { 305 || (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
306 /* generate double fault per SDM Table 5-5 */ 306 /* generate double fault per SDM Table 5-5 */
307 vcpu->arch.exception.pending = true; 307 vcpu->arch.exception.pending = true;
308 vcpu->arch.exception.has_error_code = true; 308 vcpu->arch.exception.has_error_code = true;
309 vcpu->arch.exception.nr = DF_VECTOR; 309 vcpu->arch.exception.nr = DF_VECTOR;
310 vcpu->arch.exception.error_code = 0; 310 vcpu->arch.exception.error_code = 0;
311 } else 311 } else
312 /* replace the previous exception with the new one in the hope 312 /* replace the previous exception with the new one in the hope
313 that instruction re-execution will regenerate the lost 313 that instruction re-execution will regenerate the lost
314 exception */ 314 exception */
315 goto queue; 315 goto queue;
316 } 316 }
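The class1/class2 test above implements the double-fault rules summarized in the SDM (Table 5-5): two contributory exceptions, or a page fault followed by anything that is not benign, escalate to #DF, while a fault during #DF delivery becomes a triple fault (handled by the DF_VECTOR check earlier). A standalone sketch of that decision, with vector numbers written out; it is illustrative only and mirrors exception_class() above rather than reusing it:

#include <stdio.h>

enum { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PF };

static int classify(int vector)
{
	switch (vector) {
	case 14:					/* #PF */
		return EXC_PF;
	case 0: case 10: case 11: case 12: case 13:	/* #DE #TS #NP #SS #GP */
		return EXC_CONTRIBUTORY;
	default:
		return EXC_BENIGN;
	}
}

static int escalates_to_double_fault(int prev, int next)
{
	int c1 = classify(prev), c2 = classify(next);

	return (c1 == EXC_CONTRIBUTORY && c2 == EXC_CONTRIBUTORY) ||
	       (c1 == EXC_PF && c2 != EXC_BENIGN);
}

int main(void)
{
	/* #GP raised while delivering a #PF -> double fault */
	printf("%d\n", escalates_to_double_fault(14, 13));	/* prints 1 */
	/* #DB (benign) during #PF delivery -> no escalation */
	printf("%d\n", escalates_to_double_fault(14, 1));	/* prints 0 */
	return 0;
}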
317 317
318 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) 318 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
319 { 319 {
320 kvm_multiple_exception(vcpu, nr, false, 0, false); 320 kvm_multiple_exception(vcpu, nr, false, 0, false);
321 } 321 }
322 EXPORT_SYMBOL_GPL(kvm_queue_exception); 322 EXPORT_SYMBOL_GPL(kvm_queue_exception);
323 323
324 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr) 324 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
325 { 325 {
326 kvm_multiple_exception(vcpu, nr, false, 0, true); 326 kvm_multiple_exception(vcpu, nr, false, 0, true);
327 } 327 }
328 EXPORT_SYMBOL_GPL(kvm_requeue_exception); 328 EXPORT_SYMBOL_GPL(kvm_requeue_exception);
329 329
330 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr, 330 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
331 u32 error_code) 331 u32 error_code)
332 { 332 {
333 ++vcpu->stat.pf_guest; 333 ++vcpu->stat.pf_guest;
334 vcpu->arch.cr2 = addr; 334 vcpu->arch.cr2 = addr;
335 kvm_queue_exception_e(vcpu, PF_VECTOR, error_code); 335 kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
336 } 336 }
337 337
338 void kvm_inject_nmi(struct kvm_vcpu *vcpu) 338 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
339 { 339 {
340 vcpu->arch.nmi_pending = 1; 340 vcpu->arch.nmi_pending = 1;
341 } 341 }
342 EXPORT_SYMBOL_GPL(kvm_inject_nmi); 342 EXPORT_SYMBOL_GPL(kvm_inject_nmi);
343 343
344 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) 344 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
345 { 345 {
346 kvm_multiple_exception(vcpu, nr, true, error_code, false); 346 kvm_multiple_exception(vcpu, nr, true, error_code, false);
347 } 347 }
348 EXPORT_SYMBOL_GPL(kvm_queue_exception_e); 348 EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
349 349
350 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) 350 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
351 { 351 {
352 kvm_multiple_exception(vcpu, nr, true, error_code, true); 352 kvm_multiple_exception(vcpu, nr, true, error_code, true);
353 } 353 }
354 EXPORT_SYMBOL_GPL(kvm_requeue_exception_e); 354 EXPORT_SYMBOL_GPL(kvm_requeue_exception_e);
355 355
356 /* 356 /*
357 * Checks if cpl <= required_cpl; if true, return true. Otherwise queue 357 * Checks if cpl <= required_cpl; if true, return true. Otherwise queue
358 * a #GP and return false. 358 * a #GP and return false.
359 */ 359 */
360 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl) 360 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl)
361 { 361 {
362 if (kvm_x86_ops->get_cpl(vcpu) <= required_cpl) 362 if (kvm_x86_ops->get_cpl(vcpu) <= required_cpl)
363 return true; 363 return true;
364 kvm_queue_exception_e(vcpu, GP_VECTOR, 0); 364 kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
365 return false; 365 return false;
366 } 366 }
367 EXPORT_SYMBOL_GPL(kvm_require_cpl); 367 EXPORT_SYMBOL_GPL(kvm_require_cpl);
368 368
369 /* 369 /*
370 * Load the PAE pdptrs. Return true if they are all valid. 370 * Load the PAE pdptrs. Return true if they are all valid.
371 */ 371 */
372 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3) 372 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
373 { 373 {
374 gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT; 374 gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
375 unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2; 375 unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
376 int i; 376 int i;
377 int ret; 377 int ret;
378 u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)]; 378 u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
379 379
380 ret = kvm_read_guest_page(vcpu->kvm, pdpt_gfn, pdpte, 380 ret = kvm_read_guest_page(vcpu->kvm, pdpt_gfn, pdpte,
381 offset * sizeof(u64), sizeof(pdpte)); 381 offset * sizeof(u64), sizeof(pdpte));
382 if (ret < 0) { 382 if (ret < 0) {
383 ret = 0; 383 ret = 0;
384 goto out; 384 goto out;
385 } 385 }
386 for (i = 0; i < ARRAY_SIZE(pdpte); ++i) { 386 for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
387 if (is_present_gpte(pdpte[i]) && 387 if (is_present_gpte(pdpte[i]) &&
388 (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) { 388 (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) {
389 ret = 0; 389 ret = 0;
390 goto out; 390 goto out;
391 } 391 }
392 } 392 }
393 ret = 1; 393 ret = 1;
394 394
395 memcpy(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs)); 395 memcpy(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs));
396 __set_bit(VCPU_EXREG_PDPTR, 396 __set_bit(VCPU_EXREG_PDPTR,
397 (unsigned long *)&vcpu->arch.regs_avail); 397 (unsigned long *)&vcpu->arch.regs_avail);
398 __set_bit(VCPU_EXREG_PDPTR, 398 __set_bit(VCPU_EXREG_PDPTR,
399 (unsigned long *)&vcpu->arch.regs_dirty); 399 (unsigned long *)&vcpu->arch.regs_dirty);
400 out: 400 out:
401 401
402 return ret; 402 return ret;
403 } 403 }
404 EXPORT_SYMBOL_GPL(load_pdptrs); 404 EXPORT_SYMBOL_GPL(load_pdptrs);
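The offset computation in load_pdptrs() is worth unpacking: in PAE mode the four PDPTEs sit at a 32-byte-aligned address taken from CR3, so ((cr3 & (PAGE_SIZE-1)) >> 5) << 2 turns the table's byte offset within its page into an index in u64 units (each 32-byte slot holds four 8-byte entries), which is then multiplied back by sizeof(u64) for kvm_read_guest_page(). A worked example, assuming 4 KiB pages and an illustrative CR3 value:

/* cr3 = 0x12345a60 (PDPT base, 32-byte aligned)
 *
 * cr3 & (PAGE_SIZE - 1) = 0xa60    byte offset of the PDPT in its page
 * 0xa60 >> 5            = 0x53     index of its 32-byte slot
 * 0x53 << 2             = 0x14c    index of the first PDPTE in u64 units
 * 0x14c * sizeof(u64)   = 0xa60    byte offset handed to kvm_read_guest_page()
 */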
405 405
406 static bool pdptrs_changed(struct kvm_vcpu *vcpu) 406 static bool pdptrs_changed(struct kvm_vcpu *vcpu)
407 { 407 {
408 u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)]; 408 u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
409 bool changed = true; 409 bool changed = true;
410 int r; 410 int r;
411 411
412 if (is_long_mode(vcpu) || !is_pae(vcpu)) 412 if (is_long_mode(vcpu) || !is_pae(vcpu))
413 return false; 413 return false;
414 414
415 if (!test_bit(VCPU_EXREG_PDPTR, 415 if (!test_bit(VCPU_EXREG_PDPTR,
416 (unsigned long *)&vcpu->arch.regs_avail)) 416 (unsigned long *)&vcpu->arch.regs_avail))
417 return true; 417 return true;
418 418
419 r = kvm_read_guest(vcpu->kvm, vcpu->arch.cr3 & ~31u, pdpte, sizeof(pdpte)); 419 r = kvm_read_guest(vcpu->kvm, vcpu->arch.cr3 & ~31u, pdpte, sizeof(pdpte));
420 if (r < 0) 420 if (r < 0)
421 goto out; 421 goto out;
422 changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0; 422 changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0;
423 out: 423 out:
424 424
425 return changed; 425 return changed;
426 } 426 }
427 427
428 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) 428 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
429 { 429 {
430 unsigned long old_cr0 = kvm_read_cr0(vcpu); 430 unsigned long old_cr0 = kvm_read_cr0(vcpu);
431 unsigned long update_bits = X86_CR0_PG | X86_CR0_WP | 431 unsigned long update_bits = X86_CR0_PG | X86_CR0_WP |
432 X86_CR0_CD | X86_CR0_NW; 432 X86_CR0_CD | X86_CR0_NW;
433 433
434 cr0 |= X86_CR0_ET; 434 cr0 |= X86_CR0_ET;
435 435
436 #ifdef CONFIG_X86_64 436 #ifdef CONFIG_X86_64
437 if (cr0 & 0xffffffff00000000UL) 437 if (cr0 & 0xffffffff00000000UL)
438 return 1; 438 return 1;
439 #endif 439 #endif
440 440
441 cr0 &= ~CR0_RESERVED_BITS; 441 cr0 &= ~CR0_RESERVED_BITS;
442 442
443 if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) 443 if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD))
444 return 1; 444 return 1;
445 445
446 if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) 446 if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
447 return 1; 447 return 1;
448 448
449 if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { 449 if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
450 #ifdef CONFIG_X86_64 450 #ifdef CONFIG_X86_64
451 if ((vcpu->arch.efer & EFER_LME)) { 451 if ((vcpu->arch.efer & EFER_LME)) {
452 int cs_db, cs_l; 452 int cs_db, cs_l;
453 453
454 if (!is_pae(vcpu)) 454 if (!is_pae(vcpu))
455 return 1; 455 return 1;
456 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); 456 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
457 if (cs_l) 457 if (cs_l)
458 return 1; 458 return 1;
459 } else 459 } else
460 #endif 460 #endif
461 if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) 461 if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3))
462 return 1; 462 return 1;
463 } 463 }
464 464
465 kvm_x86_ops->set_cr0(vcpu, cr0); 465 kvm_x86_ops->set_cr0(vcpu, cr0);
466 466
467 if ((cr0 ^ old_cr0) & update_bits) 467 if ((cr0 ^ old_cr0) & update_bits)
468 kvm_mmu_reset_context(vcpu); 468 kvm_mmu_reset_context(vcpu);
469 return 0; 469 return 0;
470 } 470 }
471 EXPORT_SYMBOL_GPL(kvm_set_cr0); 471 EXPORT_SYMBOL_GPL(kvm_set_cr0);
472 472
473 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) 473 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
474 { 474 {
475 (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f)); 475 (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f));
476 } 476 }
477 EXPORT_SYMBOL_GPL(kvm_lmsw); 477 EXPORT_SYMBOL_GPL(kvm_lmsw);
478 478
479 int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) 479 int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
480 { 480 {
481 u64 xcr0; 481 u64 xcr0;
482 482
483 /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now */ 483 /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now */
484 if (index != XCR_XFEATURE_ENABLED_MASK) 484 if (index != XCR_XFEATURE_ENABLED_MASK)
485 return 1; 485 return 1;
486 xcr0 = xcr; 486 xcr0 = xcr;
487 if (kvm_x86_ops->get_cpl(vcpu) != 0) 487 if (kvm_x86_ops->get_cpl(vcpu) != 0)
488 return 1; 488 return 1;
489 if (!(xcr0 & XSTATE_FP)) 489 if (!(xcr0 & XSTATE_FP))
490 return 1; 490 return 1;
491 if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE)) 491 if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
492 return 1; 492 return 1;
493 if (xcr0 & ~host_xcr0) 493 if (xcr0 & ~host_xcr0)
494 return 1; 494 return 1;
495 vcpu->arch.xcr0 = xcr0; 495 vcpu->arch.xcr0 = xcr0;
496 vcpu->guest_xcr0_loaded = 0; 496 vcpu->guest_xcr0_loaded = 0;
497 return 0; 497 return 0;
498 } 498 }
499 499
500 int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) 500 int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
501 { 501 {
502 if (__kvm_set_xcr(vcpu, index, xcr)) { 502 if (__kvm_set_xcr(vcpu, index, xcr)) {
503 kvm_inject_gp(vcpu, 0); 503 kvm_inject_gp(vcpu, 0);
504 return 1; 504 return 1;
505 } 505 }
506 return 0; 506 return 0;
507 } 507 }
508 EXPORT_SYMBOL_GPL(kvm_set_xcr); 508 EXPORT_SYMBOL_GPL(kvm_set_xcr);
509 509
510 static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu) 510 static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu)
511 { 511 {
512 struct kvm_cpuid_entry2 *best; 512 struct kvm_cpuid_entry2 *best;
513 513
514 best = kvm_find_cpuid_entry(vcpu, 1, 0); 514 best = kvm_find_cpuid_entry(vcpu, 1, 0);
515 return best && (best->ecx & bit(X86_FEATURE_XSAVE)); 515 return best && (best->ecx & bit(X86_FEATURE_XSAVE));
516 } 516 }
517 517
518 static void update_cpuid(struct kvm_vcpu *vcpu) 518 static void update_cpuid(struct kvm_vcpu *vcpu)
519 { 519 {
520 struct kvm_cpuid_entry2 *best; 520 struct kvm_cpuid_entry2 *best;
521 521
522 best = kvm_find_cpuid_entry(vcpu, 1, 0); 522 best = kvm_find_cpuid_entry(vcpu, 1, 0);
523 if (!best) 523 if (!best)
524 return; 524 return;
525 525
526 /* Update OSXSAVE bit */ 526 /* Update OSXSAVE bit */
527 if (cpu_has_xsave && best->function == 0x1) { 527 if (cpu_has_xsave && best->function == 0x1) {
528 best->ecx &= ~(bit(X86_FEATURE_OSXSAVE)); 528 best->ecx &= ~(bit(X86_FEATURE_OSXSAVE));
529 if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) 529 if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE))
530 best->ecx |= bit(X86_FEATURE_OSXSAVE); 530 best->ecx |= bit(X86_FEATURE_OSXSAVE);
531 } 531 }
532 } 532 }
533 533
534 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) 534 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
535 { 535 {
536 unsigned long old_cr4 = kvm_read_cr4(vcpu); 536 unsigned long old_cr4 = kvm_read_cr4(vcpu);
537 unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE; 537 unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;
538 538
539 if (cr4 & CR4_RESERVED_BITS) 539 if (cr4 & CR4_RESERVED_BITS)
540 return 1; 540 return 1;
541 541
542 if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE)) 542 if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE))
543 return 1; 543 return 1;
544 544
545 if (is_long_mode(vcpu)) { 545 if (is_long_mode(vcpu)) {
546 if (!(cr4 & X86_CR4_PAE)) 546 if (!(cr4 & X86_CR4_PAE))
547 return 1; 547 return 1;
548 } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE) 548 } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
549 && ((cr4 ^ old_cr4) & pdptr_bits) 549 && ((cr4 ^ old_cr4) & pdptr_bits)
550 && !load_pdptrs(vcpu, vcpu->arch.cr3)) 550 && !load_pdptrs(vcpu, vcpu->arch.cr3))
551 return 1; 551 return 1;
552 552
553 if (cr4 & X86_CR4_VMXE) 553 if (cr4 & X86_CR4_VMXE)
554 return 1; 554 return 1;
555 555
556 kvm_x86_ops->set_cr4(vcpu, cr4); 556 kvm_x86_ops->set_cr4(vcpu, cr4);
557 557
558 if ((cr4 ^ old_cr4) & pdptr_bits) 558 if ((cr4 ^ old_cr4) & pdptr_bits)
559 kvm_mmu_reset_context(vcpu); 559 kvm_mmu_reset_context(vcpu);
560 560
561 if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE) 561 if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE)
562 update_cpuid(vcpu); 562 update_cpuid(vcpu);
563 563
564 return 0; 564 return 0;
565 } 565 }
566 EXPORT_SYMBOL_GPL(kvm_set_cr4); 566 EXPORT_SYMBOL_GPL(kvm_set_cr4);
567 567
568 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) 568 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
569 { 569 {
570 if (cr3 == vcpu->arch.cr3 && !pdptrs_changed(vcpu)) { 570 if (cr3 == vcpu->arch.cr3 && !pdptrs_changed(vcpu)) {
571 kvm_mmu_sync_roots(vcpu); 571 kvm_mmu_sync_roots(vcpu);
572 kvm_mmu_flush_tlb(vcpu); 572 kvm_mmu_flush_tlb(vcpu);
573 return 0; 573 return 0;
574 } 574 }
575 575
576 if (is_long_mode(vcpu)) { 576 if (is_long_mode(vcpu)) {
577 if (cr3 & CR3_L_MODE_RESERVED_BITS) 577 if (cr3 & CR3_L_MODE_RESERVED_BITS)
578 return 1; 578 return 1;
579 } else { 579 } else {
580 if (is_pae(vcpu)) { 580 if (is_pae(vcpu)) {
581 if (cr3 & CR3_PAE_RESERVED_BITS) 581 if (cr3 & CR3_PAE_RESERVED_BITS)
582 return 1; 582 return 1;
583 if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3)) 583 if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3))
584 return 1; 584 return 1;
585 } 585 }
586 /* 586 /*
587 * We don't check reserved bits in nonpae mode, because 587 * We don't check reserved bits in nonpae mode, because
588 * this isn't enforced, and VMware depends on this. 588 * this isn't enforced, and VMware depends on this.
589 */ 589 */
590 } 590 }
591 591
592 /* 592 /*
593 * Does the new cr3 value map to physical memory? (Note, we 593 * Does the new cr3 value map to physical memory? (Note, we
594 * catch an invalid cr3 even in real-mode, because it would 594 * catch an invalid cr3 even in real-mode, because it would
595 * cause trouble later on when we turn on paging anyway.) 595 * cause trouble later on when we turn on paging anyway.)
596 * 596 *
597 * A real CPU would silently accept an invalid cr3 and would 597 * A real CPU would silently accept an invalid cr3 and would
598 * attempt to use it - with largely undefined (and often hard 598 * attempt to use it - with largely undefined (and often hard
599 * to debug) behavior on the guest side. 599 * to debug) behavior on the guest side.
600 */ 600 */
601 if (unlikely(!gfn_to_memslot(vcpu->kvm, cr3 >> PAGE_SHIFT))) 601 if (unlikely(!gfn_to_memslot(vcpu->kvm, cr3 >> PAGE_SHIFT)))
602 return 1; 602 return 1;
603 vcpu->arch.cr3 = cr3; 603 vcpu->arch.cr3 = cr3;
604 vcpu->arch.mmu.new_cr3(vcpu); 604 vcpu->arch.mmu.new_cr3(vcpu);
605 return 0; 605 return 0;
606 } 606 }
607 EXPORT_SYMBOL_GPL(kvm_set_cr3); 607 EXPORT_SYMBOL_GPL(kvm_set_cr3);
608 608
609 int __kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) 609 int __kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
610 { 610 {
611 if (cr8 & CR8_RESERVED_BITS) 611 if (cr8 & CR8_RESERVED_BITS)
612 return 1; 612 return 1;
613 if (irqchip_in_kernel(vcpu->kvm)) 613 if (irqchip_in_kernel(vcpu->kvm))
614 kvm_lapic_set_tpr(vcpu, cr8); 614 kvm_lapic_set_tpr(vcpu, cr8);
615 else 615 else
616 vcpu->arch.cr8 = cr8; 616 vcpu->arch.cr8 = cr8;
617 return 0; 617 return 0;
618 } 618 }
619 619
620 void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) 620 void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
621 { 621 {
622 if (__kvm_set_cr8(vcpu, cr8)) 622 if (__kvm_set_cr8(vcpu, cr8))
623 kvm_inject_gp(vcpu, 0); 623 kvm_inject_gp(vcpu, 0);
624 } 624 }
625 EXPORT_SYMBOL_GPL(kvm_set_cr8); 625 EXPORT_SYMBOL_GPL(kvm_set_cr8);
626 626
627 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) 627 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
628 { 628 {
629 if (irqchip_in_kernel(vcpu->kvm)) 629 if (irqchip_in_kernel(vcpu->kvm))
630 return kvm_lapic_get_cr8(vcpu); 630 return kvm_lapic_get_cr8(vcpu);
631 else 631 else
632 return vcpu->arch.cr8; 632 return vcpu->arch.cr8;
633 } 633 }
634 EXPORT_SYMBOL_GPL(kvm_get_cr8); 634 EXPORT_SYMBOL_GPL(kvm_get_cr8);
635 635
636 static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) 636 static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
637 { 637 {
638 switch (dr) { 638 switch (dr) {
639 case 0 ... 3: 639 case 0 ... 3:
640 vcpu->arch.db[dr] = val; 640 vcpu->arch.db[dr] = val;
641 if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) 641 if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
642 vcpu->arch.eff_db[dr] = val; 642 vcpu->arch.eff_db[dr] = val;
643 break; 643 break;
644 case 4: 644 case 4:
645 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) 645 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
646 return 1; /* #UD */ 646 return 1; /* #UD */
647 /* fall through */ 647 /* fall through */
648 case 6: 648 case 6:
649 if (val & 0xffffffff00000000ULL) 649 if (val & 0xffffffff00000000ULL)
650 return -1; /* #GP */ 650 return -1; /* #GP */
651 vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; 651 vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
652 break; 652 break;
653 case 5: 653 case 5:
654 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) 654 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
655 return 1; /* #UD */ 655 return 1; /* #UD */
656 /* fall through */ 656 /* fall through */
657 default: /* 7 */ 657 default: /* 7 */
658 if (val & 0xffffffff00000000ULL) 658 if (val & 0xffffffff00000000ULL)
659 return -1; /* #GP */ 659 return -1; /* #GP */
660 vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1; 660 vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
661 if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) { 661 if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
662 kvm_x86_ops->set_dr7(vcpu, vcpu->arch.dr7); 662 kvm_x86_ops->set_dr7(vcpu, vcpu->arch.dr7);
663 vcpu->arch.switch_db_regs = (val & DR7_BP_EN_MASK); 663 vcpu->arch.switch_db_regs = (val & DR7_BP_EN_MASK);
664 } 664 }
665 break; 665 break;
666 } 666 }
667 667
668 return 0; 668 return 0;
669 } 669 }
670 670
671 int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) 671 int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
672 { 672 {
673 int res; 673 int res;
674 674
675 res = __kvm_set_dr(vcpu, dr, val); 675 res = __kvm_set_dr(vcpu, dr, val);
676 if (res > 0) 676 if (res > 0)
677 kvm_queue_exception(vcpu, UD_VECTOR); 677 kvm_queue_exception(vcpu, UD_VECTOR);
678 else if (res < 0) 678 else if (res < 0)
679 kvm_inject_gp(vcpu, 0); 679 kvm_inject_gp(vcpu, 0);
680 680
681 return res; 681 return res;
682 } 682 }
683 EXPORT_SYMBOL_GPL(kvm_set_dr); 683 EXPORT_SYMBOL_GPL(kvm_set_dr);
684 684
685 static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) 685 static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
686 { 686 {
687 switch (dr) { 687 switch (dr) {
688 case 0 ... 3: 688 case 0 ... 3:
689 *val = vcpu->arch.db[dr]; 689 *val = vcpu->arch.db[dr];
690 break; 690 break;
691 case 4: 691 case 4:
692 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) 692 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
693 return 1; 693 return 1;
694 /* fall through */ 694 /* fall through */
695 case 6: 695 case 6:
696 *val = vcpu->arch.dr6; 696 *val = vcpu->arch.dr6;
697 break; 697 break;
698 case 5: 698 case 5:
699 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) 699 if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
700 return 1; 700 return 1;
701 /* fall through */ 701 /* fall through */
702 default: /* 7 */ 702 default: /* 7 */
703 *val = vcpu->arch.dr7; 703 *val = vcpu->arch.dr7;
704 break; 704 break;
705 } 705 }
706 706
707 return 0; 707 return 0;
708 } 708 }
709 709
710 int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) 710 int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
711 { 711 {
712 if (_kvm_get_dr(vcpu, dr, val)) { 712 if (_kvm_get_dr(vcpu, dr, val)) {
713 kvm_queue_exception(vcpu, UD_VECTOR); 713 kvm_queue_exception(vcpu, UD_VECTOR);
714 return 1; 714 return 1;
715 } 715 }
716 return 0; 716 return 0;
717 } 717 }
718 EXPORT_SYMBOL_GPL(kvm_get_dr); 718 EXPORT_SYMBOL_GPL(kvm_get_dr);
719 719
720 /* 720 /*
721 * List of msr numbers which we expose to userspace through KVM_GET_MSRS 721 * List of msr numbers which we expose to userspace through KVM_GET_MSRS
722 * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. 722 * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
723 * 723 *
724 * This list is modified at module load time to reflect the 724 * This list is modified at module load time to reflect the
725 * capabilities of the host cpu. This capabilities test skips MSRs that are 725 * capabilities of the host cpu. This capabilities test skips MSRs that are
726 * kvm-specific. Those are put in the beginning of the list. 726 * kvm-specific. Those are put in the beginning of the list.
727 */ 727 */
728 728
729 #define KVM_SAVE_MSRS_BEGIN 7 729 #define KVM_SAVE_MSRS_BEGIN 7
730 static u32 msrs_to_save[] = { 730 static u32 msrs_to_save[] = {
731 MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, 731 MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
732 MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, 732 MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
733 HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, 733 HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
734 HV_X64_MSR_APIC_ASSIST_PAGE, 734 HV_X64_MSR_APIC_ASSIST_PAGE,
735 MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, 735 MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
736 MSR_K6_STAR, 736 MSR_K6_STAR,
737 #ifdef CONFIG_X86_64 737 #ifdef CONFIG_X86_64
738 MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, 738 MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
739 #endif 739 #endif
740 MSR_IA32_TSC, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA 740 MSR_IA32_TSC, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
741 }; 741 };
742 742
743 static unsigned num_msrs_to_save; 743 static unsigned num_msrs_to_save;
744 744
745 static u32 emulated_msrs[] = { 745 static u32 emulated_msrs[] = {
746 MSR_IA32_MISC_ENABLE, 746 MSR_IA32_MISC_ENABLE,
747 }; 747 };
748 748
749 static int set_efer(struct kvm_vcpu *vcpu, u64 efer) 749 static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
750 { 750 {
751 u64 old_efer = vcpu->arch.efer; 751 u64 old_efer = vcpu->arch.efer;
752 752
753 if (efer & efer_reserved_bits) 753 if (efer & efer_reserved_bits)
754 return 1; 754 return 1;
755 755
756 if (is_paging(vcpu) 756 if (is_paging(vcpu)
757 && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) 757 && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
758 return 1; 758 return 1;
759 759
760 if (efer & EFER_FFXSR) { 760 if (efer & EFER_FFXSR) {
761 struct kvm_cpuid_entry2 *feat; 761 struct kvm_cpuid_entry2 *feat;
762 762
763 feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); 763 feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
764 if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT))) 764 if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT)))
765 return 1; 765 return 1;
766 } 766 }
767 767
768 if (efer & EFER_SVME) { 768 if (efer & EFER_SVME) {
769 struct kvm_cpuid_entry2 *feat; 769 struct kvm_cpuid_entry2 *feat;
770 770
771 feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); 771 feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
772 if (!feat || !(feat->ecx & bit(X86_FEATURE_SVM))) 772 if (!feat || !(feat->ecx & bit(X86_FEATURE_SVM)))
773 return 1; 773 return 1;
774 } 774 }
775 775
776 efer &= ~EFER_LMA; 776 efer &= ~EFER_LMA;
777 efer |= vcpu->arch.efer & EFER_LMA; 777 efer |= vcpu->arch.efer & EFER_LMA;
778 778
779 kvm_x86_ops->set_efer(vcpu, efer); 779 kvm_x86_ops->set_efer(vcpu, efer);
780 780
781 vcpu->arch.mmu.base_role.nxe = (efer & EFER_NX) && !tdp_enabled; 781 vcpu->arch.mmu.base_role.nxe = (efer & EFER_NX) && !tdp_enabled;
782 kvm_mmu_reset_context(vcpu); 782 kvm_mmu_reset_context(vcpu);
783 783
784 /* Update reserved bits */ 784 /* Update reserved bits */
785 if ((efer ^ old_efer) & EFER_NX) 785 if ((efer ^ old_efer) & EFER_NX)
786 kvm_mmu_reset_context(vcpu); 786 kvm_mmu_reset_context(vcpu);
787 787
788 return 0; 788 return 0;
789 } 789 }
790 790
791 void kvm_enable_efer_bits(u64 mask) 791 void kvm_enable_efer_bits(u64 mask)
792 { 792 {
793 efer_reserved_bits &= ~mask; 793 efer_reserved_bits &= ~mask;
794 } 794 }
795 EXPORT_SYMBOL_GPL(kvm_enable_efer_bits); 795 EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
796 796
797 797
798 /* 798 /*
799 * Writes the msr value into the appropriate "register". 799 * Writes the msr value into the appropriate "register".
800 * Returns 0 on success, non-0 otherwise. 800 * Returns 0 on success, non-0 otherwise.
801 * Assumes vcpu_load() was already called. 801 * Assumes vcpu_load() was already called.
802 */ 802 */
803 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) 803 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
804 { 804 {
805 return kvm_x86_ops->set_msr(vcpu, msr_index, data); 805 return kvm_x86_ops->set_msr(vcpu, msr_index, data);
806 } 806 }
807 807
808 /* 808 /*
809 * Adapt set_msr() to msr_io()'s calling convention 809 * Adapt set_msr() to msr_io()'s calling convention
810 */ 810 */
811 static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data) 811 static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
812 { 812 {
813 return kvm_set_msr(vcpu, index, *data); 813 return kvm_set_msr(vcpu, index, *data);
814 } 814 }
815 815
816 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) 816 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
817 { 817 {
818 int version; 818 int version;
819 int r; 819 int r;
820 struct pvclock_wall_clock wc; 820 struct pvclock_wall_clock wc;
821 struct timespec boot; 821 struct timespec boot;
822 822
823 if (!wall_clock) 823 if (!wall_clock)
824 return; 824 return;
825 825
826 r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version)); 826 r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
827 if (r) 827 if (r)
828 return; 828 return;
829 829
830 if (version & 1) 830 if (version & 1)
831 ++version; /* first time write, random junk */ 831 ++version; /* first time write, random junk */
832 832
833 ++version; 833 ++version;
834 834
835 kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); 835 kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
836 836
837 /* 837 /*
838 * The guest calculates current wall clock time by adding 838 * The guest calculates current wall clock time by adding
839 * system time (updated by kvm_write_guest_time below) to the 839 * system time (updated by kvm_write_guest_time below) to the
840 * wall clock specified here. Guest system time equals host 840 * wall clock specified here. Guest system time equals host
841 * system time for us, thus we must fill in host boot time here. 841 * system time for us, thus we must fill in host boot time here.
842 */ 842 */
843 getboottime(&boot); 843 getboottime(&boot);
844 844
845 wc.sec = boot.tv_sec; 845 wc.sec = boot.tv_sec;
846 wc.nsec = boot.tv_nsec; 846 wc.nsec = boot.tv_nsec;
847 wc.version = version; 847 wc.version = version;
848 848
849 kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc)); 849 kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
850 850
851 version++; 851 version++;
852 kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); 852 kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
853 } 853 }
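The version dance above is a seqlock-style protocol: the writer makes the version odd while sec/nsec are being updated and even again afterwards, so a guest can detect and retry a torn read. A hedged sketch of the matching reader side (the struct layout mirrors pvclock_wall_clock; this is an illustrative guest-side loop, not code from this commit, and memory barriers are omitted for brevity):

#include <stdint.h>

struct wall_clock {
	uint32_t version;
	uint32_t sec;
	uint32_t nsec;
};

static void read_wall_clock(volatile struct wall_clock *wc,
			    uint32_t *sec, uint32_t *nsec)
{
	uint32_t v;

	do {
		v = wc->version;	/* odd means a write is in progress */
		*sec  = wc->sec;
		*nsec = wc->nsec;
	} while ((v & 1) || v != wc->version);
}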
854 854
855 static uint32_t div_frac(uint32_t dividend, uint32_t divisor) 855 static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
856 { 856 {
857 uint32_t quotient, remainder; 857 uint32_t quotient, remainder;
858 858
859 /* Don't try to replace with do_div(), this one calculates 859 /* Don't try to replace with do_div(), this one calculates
860 * "(dividend << 32) / divisor" */ 860 * "(dividend << 32) / divisor" */
861 __asm__ ( "divl %4" 861 __asm__ ( "divl %4"
862 : "=a" (quotient), "=d" (remainder) 862 : "=a" (quotient), "=d" (remainder)
863 : "0" (0), "1" (dividend), "r" (divisor) ); 863 : "0" (0), "1" (dividend), "r" (divisor) );
864 return quotient; 864 return quotient;
865 } 865 }
866 866
867 static void kvm_set_time_scale(uint32_t tsc_khz, struct pvclock_vcpu_time_info *hv_clock) 867 static void kvm_set_time_scale(uint32_t tsc_khz, struct pvclock_vcpu_time_info *hv_clock)
868 { 868 {
869 uint64_t nsecs = 1000000000LL; 869 uint64_t nsecs = 1000000000LL;
870 int32_t shift = 0; 870 int32_t shift = 0;
871 uint64_t tps64; 871 uint64_t tps64;
872 uint32_t tps32; 872 uint32_t tps32;
873 873
874 tps64 = tsc_khz * 1000LL; 874 tps64 = tsc_khz * 1000LL;
875 while (tps64 > nsecs*2) { 875 while (tps64 > nsecs*2) {
876 tps64 >>= 1; 876 tps64 >>= 1;
877 shift--; 877 shift--;
878 } 878 }
879 879
880 tps32 = (uint32_t)tps64; 880 tps32 = (uint32_t)tps64;
881 while (tps32 <= (uint32_t)nsecs) { 881 while (tps32 <= (uint32_t)nsecs) {
882 tps32 <<= 1; 882 tps32 <<= 1;
883 shift++; 883 shift++;
884 } 884 }
885 885
886 hv_clock->tsc_shift = shift; 886 hv_clock->tsc_shift = shift;
887 hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32); 887 hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32);
888 888
889 pr_debug("%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n", 889 pr_debug("%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n",
890 __func__, tsc_khz, hv_clock->tsc_shift, 890 __func__, tsc_khz, hv_clock->tsc_shift,
891 hv_clock->tsc_to_system_mul); 891 hv_clock->tsc_to_system_mul);
892 } 892 }
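kvm_set_time_scale() picks (tsc_shift, tsc_to_system_mul) so that the guest can later convert a TSC delta to nanoseconds as ((delta << tsc_shift) * mul) >> 32, with a right shift when tsc_shift is negative. A small user-space sketch of the same computation plus the guest-side conversion, assuming an illustrative 2.8 GHz TSC and using a portable equivalent of div_frac():

#include <stdint.h>
#include <stdio.h>

/* portable version of div_frac(): "(dividend << 32) / divisor" */
static uint32_t div_frac64(uint32_t dividend, uint32_t divisor)
{
	return (uint32_t)(((uint64_t)dividend << 32) / divisor);
}

int main(void)
{
	uint32_t tsc_khz = 2800000;		/* 2.8 GHz, illustrative */
	uint64_t tps64 = (uint64_t)tsc_khz * 1000;
	int shift = 0;
	uint32_t tps32, mul;
	uint64_t tsc, scaled, ns;

	while (tps64 > 2000000000ULL) {		/* nsecs * 2 */
		tps64 >>= 1;
		shift--;
	}
	tps32 = (uint32_t)tps64;
	while (tps32 <= 1000000000U) {
		tps32 <<= 1;
		shift++;
	}
	mul = div_frac64(1000000000U, tps32);

	/* guest-side conversion: one second worth of ticks should give ~1e9 ns */
	tsc = (uint64_t)tsc_khz * 1000;
	scaled = shift < 0 ? tsc >> -shift : tsc << shift;
	ns = (scaled * mul) >> 32;

	printf("shift=%d mul=%u ns=%llu\n", shift, mul, (unsigned long long)ns);
	return 0;
}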
893 893
894 static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz); 894 static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
895 895
896 static void kvm_write_guest_time(struct kvm_vcpu *v) 896 static void kvm_write_guest_time(struct kvm_vcpu *v)
897 { 897 {
898 struct timespec ts; 898 struct timespec ts;
899 unsigned long flags; 899 unsigned long flags;
900 struct kvm_vcpu_arch *vcpu = &v->arch; 900 struct kvm_vcpu_arch *vcpu = &v->arch;
901 void *shared_kaddr; 901 void *shared_kaddr;
902 unsigned long this_tsc_khz; 902 unsigned long this_tsc_khz;
903 903
904 if ((!vcpu->time_page)) 904 if ((!vcpu->time_page))
905 return; 905 return;
906 906
907 this_tsc_khz = get_cpu_var(cpu_tsc_khz); 907 this_tsc_khz = get_cpu_var(cpu_tsc_khz);
908 if (unlikely(vcpu->hv_clock_tsc_khz != this_tsc_khz)) { 908 if (unlikely(vcpu->hv_clock_tsc_khz != this_tsc_khz)) {
909 kvm_set_time_scale(this_tsc_khz, &vcpu->hv_clock); 909 kvm_set_time_scale(this_tsc_khz, &vcpu->hv_clock);
910 vcpu->hv_clock_tsc_khz = this_tsc_khz; 910 vcpu->hv_clock_tsc_khz = this_tsc_khz;
911 } 911 }
912 put_cpu_var(cpu_tsc_khz); 912 put_cpu_var(cpu_tsc_khz);
913 913
914 /* Keep irq disabled to prevent changes to the clock */ 914 /* Keep irq disabled to prevent changes to the clock */
915 local_irq_save(flags); 915 local_irq_save(flags);
916 kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp); 916 kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp);
917 ktime_get_ts(&ts); 917 ktime_get_ts(&ts);
918 monotonic_to_bootbased(&ts); 918 monotonic_to_bootbased(&ts);
919 local_irq_restore(flags); 919 local_irq_restore(flags);
920 920
921 /* With all the info we got, fill in the values */ 921 /* With all the info we got, fill in the values */
922 922
923 vcpu->hv_clock.system_time = ts.tv_nsec + 923 vcpu->hv_clock.system_time = ts.tv_nsec +
924 (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset; 924 (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
925 925
926 vcpu->hv_clock.flags = 0; 926 vcpu->hv_clock.flags = 0;
927 927
928 /* 928 /*
929 * The interface expects us to write an even number signaling that the 929 * The interface expects us to write an even number signaling that the
930 * update is finished. Since the guest won't see the intermediate 930 * update is finished. Since the guest won't see the intermediate
931 * state, we just increase by 2 at the end. 931 * state, we just increase by 2 at the end.
932 */ 932 */
933 vcpu->hv_clock.version += 2; 933 vcpu->hv_clock.version += 2;
934 934
935 shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); 935 shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0);
936 936
937 memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, 937 memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
938 sizeof(vcpu->hv_clock)); 938 sizeof(vcpu->hv_clock));
939 939
940 kunmap_atomic(shared_kaddr, KM_USER0); 940 kunmap_atomic(shared_kaddr, KM_USER0);
941 941
942 mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); 942 mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT);
943 } 943 }
944 944
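The version handshake described in the comment above is a seqlock-style protocol: the host publishes an odd version while the record is being rewritten and an even one once it is consistent, so a guest reader retries whenever the version is odd or changes across the read. A hypothetical guest-side reader might look like this (a sketch; the structure layout is assumed to follow the pvclock time-info format):

#include <stdint.h>

struct pvclock_time_info {		/* layout assumed for illustration */
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
};

/* Retry until an even, unchanged version number is observed. */
static void read_time_info(const volatile struct pvclock_time_info *src,
			   struct pvclock_time_info *dst)
{
	uint32_t version;

	do {
		version = src->version;
		__asm__ __volatile__("" ::: "memory");	/* compiler barrier */
		dst->tsc_timestamp     = src->tsc_timestamp;
		dst->system_time       = src->system_time;
		dst->tsc_to_system_mul = src->tsc_to_system_mul;
		dst->tsc_shift         = src->tsc_shift;
		__asm__ __volatile__("" ::: "memory");
	} while ((version & 1) || version != src->version);
}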
945 static int kvm_request_guest_time_update(struct kvm_vcpu *v) 945 static int kvm_request_guest_time_update(struct kvm_vcpu *v)
946 { 946 {
947 struct kvm_vcpu_arch *vcpu = &v->arch; 947 struct kvm_vcpu_arch *vcpu = &v->arch;
948 948
949 if (!vcpu->time_page) 949 if (!vcpu->time_page)
950 return 0; 950 return 0;
951 set_bit(KVM_REQ_KVMCLOCK_UPDATE, &v->requests); 951 set_bit(KVM_REQ_KVMCLOCK_UPDATE, &v->requests);
952 return 1; 952 return 1;
953 } 953 }
954 954
955 static bool msr_mtrr_valid(unsigned msr) 955 static bool msr_mtrr_valid(unsigned msr)
956 { 956 {
957 switch (msr) { 957 switch (msr) {
958 case 0x200 ... 0x200 + 2 * KVM_NR_VAR_MTRR - 1: 958 case 0x200 ... 0x200 + 2 * KVM_NR_VAR_MTRR - 1:
959 case MSR_MTRRfix64K_00000: 959 case MSR_MTRRfix64K_00000:
960 case MSR_MTRRfix16K_80000: 960 case MSR_MTRRfix16K_80000:
961 case MSR_MTRRfix16K_A0000: 961 case MSR_MTRRfix16K_A0000:
962 case MSR_MTRRfix4K_C0000: 962 case MSR_MTRRfix4K_C0000:
963 case MSR_MTRRfix4K_C8000: 963 case MSR_MTRRfix4K_C8000:
964 case MSR_MTRRfix4K_D0000: 964 case MSR_MTRRfix4K_D0000:
965 case MSR_MTRRfix4K_D8000: 965 case MSR_MTRRfix4K_D8000:
966 case MSR_MTRRfix4K_E0000: 966 case MSR_MTRRfix4K_E0000:
967 case MSR_MTRRfix4K_E8000: 967 case MSR_MTRRfix4K_E8000:
968 case MSR_MTRRfix4K_F0000: 968 case MSR_MTRRfix4K_F0000:
969 case MSR_MTRRfix4K_F8000: 969 case MSR_MTRRfix4K_F8000:
970 case MSR_MTRRdefType: 970 case MSR_MTRRdefType:
971 case MSR_IA32_CR_PAT: 971 case MSR_IA32_CR_PAT:
972 return true; 972 return true;
973 case 0x2f8: 973 case 0x2f8:
974 return true; 974 return true;
975 } 975 }
976 return false; 976 return false;
977 } 977 }
978 978
979 static bool valid_pat_type(unsigned t) 979 static bool valid_pat_type(unsigned t)
980 { 980 {
981 return t < 8 && (1 << t) & 0xf3; /* 0, 1, 4, 5, 6, 7 */ 981 return t < 8 && (1 << t) & 0xf3; /* 0, 1, 4, 5, 6, 7 */
982 } 982 }
983 983
984 static bool valid_mtrr_type(unsigned t) 984 static bool valid_mtrr_type(unsigned t)
985 { 985 {
986 return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */ 986 return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */
987 } 987 }
988 988
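The bitmasks in valid_pat_type() and valid_mtrr_type() encode the set of architecturally defined memory types: 0xf3 = 0b11110011 accepts types 0, 1, 4, 5, 6 and 7 (UC, WC, WT, WP, WB, UC-), while 0x73 drops type 7 because UC- exists only as a PAT encoding. Spelled out, the MTRR check is equivalent to the following (illustrative only, not part of this patch):

/* Unfolded form of valid_mtrr_type(): accept only the memory types defined
 * for MTRRs -- UC(0), WC(1), WT(4), WP(5), WB(6). */
static bool valid_mtrr_type_unfolded(unsigned t)
{
	switch (t) {
	case 0:	/* UC */
	case 1:	/* WC */
	case 4:	/* WT */
	case 5:	/* WP */
	case 6:	/* WB */
		return true;
	default:
		return false;
	}
}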
989 static bool mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data) 989 static bool mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
990 { 990 {
991 int i; 991 int i;
992 992
993 if (!msr_mtrr_valid(msr)) 993 if (!msr_mtrr_valid(msr))
994 return false; 994 return false;
995 995
996 if (msr == MSR_IA32_CR_PAT) { 996 if (msr == MSR_IA32_CR_PAT) {
997 for (i = 0; i < 8; i++) 997 for (i = 0; i < 8; i++)
998 if (!valid_pat_type((data >> (i * 8)) & 0xff)) 998 if (!valid_pat_type((data >> (i * 8)) & 0xff))
999 return false; 999 return false;
1000 return true; 1000 return true;
1001 } else if (msr == MSR_MTRRdefType) { 1001 } else if (msr == MSR_MTRRdefType) {
1002 if (data & ~0xcff) 1002 if (data & ~0xcff)
1003 return false; 1003 return false;
1004 return valid_mtrr_type(data & 0xff); 1004 return valid_mtrr_type(data & 0xff);
1005 } else if (msr >= MSR_MTRRfix64K_00000 && msr <= MSR_MTRRfix4K_F8000) { 1005 } else if (msr >= MSR_MTRRfix64K_00000 && msr <= MSR_MTRRfix4K_F8000) {
1006 for (i = 0; i < 8 ; i++) 1006 for (i = 0; i < 8 ; i++)
1007 if (!valid_mtrr_type((data >> (i * 8)) & 0xff)) 1007 if (!valid_mtrr_type((data >> (i * 8)) & 0xff))
1008 return false; 1008 return false;
1009 return true; 1009 return true;
1010 } 1010 }
1011 1011
1012 /* variable MTRRs */ 1012 /* variable MTRRs */
1013 return valid_mtrr_type(data & 0xff); 1013 return valid_mtrr_type(data & 0xff);
1014 } 1014 }
1015 1015
1016 static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data) 1016 static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
1017 { 1017 {
1018 u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges; 1018 u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
1019 1019
1020 if (!mtrr_valid(vcpu, msr, data)) 1020 if (!mtrr_valid(vcpu, msr, data))
1021 return 1; 1021 return 1;
1022 1022
1023 if (msr == MSR_MTRRdefType) { 1023 if (msr == MSR_MTRRdefType) {
1024 vcpu->arch.mtrr_state.def_type = data; 1024 vcpu->arch.mtrr_state.def_type = data;
1025 vcpu->arch.mtrr_state.enabled = (data & 0xc00) >> 10; 1025 vcpu->arch.mtrr_state.enabled = (data & 0xc00) >> 10;
1026 } else if (msr == MSR_MTRRfix64K_00000) 1026 } else if (msr == MSR_MTRRfix64K_00000)
1027 p[0] = data; 1027 p[0] = data;
1028 else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000) 1028 else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000)
1029 p[1 + msr - MSR_MTRRfix16K_80000] = data; 1029 p[1 + msr - MSR_MTRRfix16K_80000] = data;
1030 else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000) 1030 else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000)
1031 p[3 + msr - MSR_MTRRfix4K_C0000] = data; 1031 p[3 + msr - MSR_MTRRfix4K_C0000] = data;
1032 else if (msr == MSR_IA32_CR_PAT) 1032 else if (msr == MSR_IA32_CR_PAT)
1033 vcpu->arch.pat = data; 1033 vcpu->arch.pat = data;
1034 else { /* Variable MTRRs */ 1034 else { /* Variable MTRRs */
1035 int idx, is_mtrr_mask; 1035 int idx, is_mtrr_mask;
1036 u64 *pt; 1036 u64 *pt;
1037 1037
1038 idx = (msr - 0x200) / 2; 1038 idx = (msr - 0x200) / 2;
1039 is_mtrr_mask = msr - 0x200 - 2 * idx; 1039 is_mtrr_mask = msr - 0x200 - 2 * idx;
1040 if (!is_mtrr_mask) 1040 if (!is_mtrr_mask)
1041 pt = 1041 pt =
1042 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo; 1042 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
1043 else 1043 else
1044 pt = 1044 pt =
1045 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo; 1045 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
1046 *pt = data; 1046 *pt = data;
1047 } 1047 }
1048 1048
1049 kvm_mmu_reset_context(vcpu); 1049 kvm_mmu_reset_context(vcpu);
1050 return 0; 1050 return 0;
1051 } 1051 }
1052 1052
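The idx/is_mtrr_mask arithmetic in the variable-MTRR branch above relies on the base/mask registers being interleaved from MSR 0x200 upward: 0x200 is MTRRphysBase0, 0x201 MTRRphysMask0, 0x202 MTRRphysBase1, and so on. A small hypothetical decoder, for reference only:

/* Hypothetical helper: split a variable-range MTRR MSR number into its range
 * index and base/mask selector.  E.g. 0x205 -> range 2, mask register. */
static void decode_var_mtrr_msr(u32 msr, int *idx, int *is_mask)
{
	*idx     = (msr - 0x200) / 2;
	*is_mask = (msr - 0x200) & 1;	/* 0 = MTRRphysBase, 1 = MTRRphysMask */
}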
1053 static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data) 1053 static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data)
1054 { 1054 {
1055 u64 mcg_cap = vcpu->arch.mcg_cap; 1055 u64 mcg_cap = vcpu->arch.mcg_cap;
1056 unsigned bank_num = mcg_cap & 0xff; 1056 unsigned bank_num = mcg_cap & 0xff;
1057 1057
1058 switch (msr) { 1058 switch (msr) {
1059 case MSR_IA32_MCG_STATUS: 1059 case MSR_IA32_MCG_STATUS:
1060 vcpu->arch.mcg_status = data; 1060 vcpu->arch.mcg_status = data;
1061 break; 1061 break;
1062 case MSR_IA32_MCG_CTL: 1062 case MSR_IA32_MCG_CTL:
1063 if (!(mcg_cap & MCG_CTL_P)) 1063 if (!(mcg_cap & MCG_CTL_P))
1064 return 1; 1064 return 1;
1065 if (data != 0 && data != ~(u64)0) 1065 if (data != 0 && data != ~(u64)0)
1066 return -1; 1066 return -1;
1067 vcpu->arch.mcg_ctl = data; 1067 vcpu->arch.mcg_ctl = data;
1068 break; 1068 break;
1069 default: 1069 default:
1070 if (msr >= MSR_IA32_MC0_CTL && 1070 if (msr >= MSR_IA32_MC0_CTL &&
1071 msr < MSR_IA32_MC0_CTL + 4 * bank_num) { 1071 msr < MSR_IA32_MC0_CTL + 4 * bank_num) {
1072 u32 offset = msr - MSR_IA32_MC0_CTL; 1072 u32 offset = msr - MSR_IA32_MC0_CTL;
1073 /* only 0 or all 1s can be written to IA32_MCi_CTL 1073 /* only 0 or all 1s can be written to IA32_MCi_CTL
1074 * some Linux kernels, though, clear bit 10 in bank 4 to 1074 * some Linux kernels, though, clear bit 10 in bank 4 to
1075 * work around a BIOS/GART TBL issue on AMD K8s, ignore 1075 * work around a BIOS/GART TBL issue on AMD K8s, ignore
1076 * this to avoid an uncaught #GP in the guest 1076 * this to avoid an uncaught #GP in the guest
1077 */ 1077 */
1078 if ((offset & 0x3) == 0 && 1078 if ((offset & 0x3) == 0 &&
1079 data != 0 && (data | (1 << 10)) != ~(u64)0) 1079 data != 0 && (data | (1 << 10)) != ~(u64)0)
1080 return -1; 1080 return -1;
1081 vcpu->arch.mce_banks[offset] = data; 1081 vcpu->arch.mce_banks[offset] = data;
1082 break; 1082 break;
1083 } 1083 }
1084 return 1; 1084 return 1;
1085 } 1085 }
1086 return 0; 1086 return 0;
1087 } 1087 }
1088 1088
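The default case above works because each machine-check bank i owns four consecutive MSRs starting at MSR_IA32_MC0_CTL + 4*i: CTL, STATUS, ADDR and MISC. That is why (offset & 0x3) == 0 singles out writes to a bank's CTL register, the only one with the 0-or-all-ones restriction. A hypothetical decoder, shown only to make the layout explicit:

/* Hypothetical helper: map an MCE bank MSR onto (bank, register) using the
 * four-MSRs-per-bank layout: +0 CTL, +1 STATUS, +2 ADDR, +3 MISC. */
static void decode_mce_bank_msr(u32 msr, unsigned *bank, unsigned *reg)
{
	u32 offset = msr - MSR_IA32_MC0_CTL;

	*bank = offset / 4;
	*reg  = offset & 3;	/* 0 is the CTL register checked above */
}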
1089 static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data) 1089 static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
1090 { 1090 {
1091 struct kvm *kvm = vcpu->kvm; 1091 struct kvm *kvm = vcpu->kvm;
1092 int lm = is_long_mode(vcpu); 1092 int lm = is_long_mode(vcpu);
1093 u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64 1093 u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64
1094 : (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32; 1094 : (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32;
1095 u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64 1095 u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64
1096 : kvm->arch.xen_hvm_config.blob_size_32; 1096 : kvm->arch.xen_hvm_config.blob_size_32;
1097 u32 page_num = data & ~PAGE_MASK; 1097 u32 page_num = data & ~PAGE_MASK;
1098 u64 page_addr = data & PAGE_MASK; 1098 u64 page_addr = data & PAGE_MASK;
1099 u8 *page; 1099 u8 *page;
1100 int r; 1100 int r;
1101 1101
1102 r = -E2BIG; 1102 r = -E2BIG;
1103 if (page_num >= blob_size) 1103 if (page_num >= blob_size)
1104 goto out; 1104 goto out;
1105 r = -ENOMEM; 1105 r = -ENOMEM;
1106 page = kzalloc(PAGE_SIZE, GFP_KERNEL); 1106 page = kzalloc(PAGE_SIZE, GFP_KERNEL);
1107 if (!page) 1107 if (!page)
1108 goto out; 1108 goto out;
1109 r = -EFAULT; 1109 r = -EFAULT;
1110 if (copy_from_user(page, blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE)) 1110 if (copy_from_user(page, blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE))
1111 goto out_free; 1111 goto out_free;
1112 if (kvm_write_guest(kvm, page_addr, page, PAGE_SIZE)) 1112 if (kvm_write_guest(kvm, page_addr, page, PAGE_SIZE))
1113 goto out_free; 1113 goto out_free;
1114 r = 0; 1114 r = 0;
1115 out_free: 1115 out_free:
1116 kfree(page); 1116 kfree(page);
1117 out: 1117 out:
1118 return r; 1118 return r;
1119 } 1119 }
1120 1120
1121 static bool kvm_hv_hypercall_enabled(struct kvm *kvm) 1121 static bool kvm_hv_hypercall_enabled(struct kvm *kvm)
1122 { 1122 {
1123 return kvm->arch.hv_hypercall & HV_X64_MSR_HYPERCALL_ENABLE; 1123 return kvm->arch.hv_hypercall & HV_X64_MSR_HYPERCALL_ENABLE;
1124 } 1124 }
1125 1125
1126 static bool kvm_hv_msr_partition_wide(u32 msr) 1126 static bool kvm_hv_msr_partition_wide(u32 msr)
1127 { 1127 {
1128 bool r = false; 1128 bool r = false;
1129 switch (msr) { 1129 switch (msr) {
1130 case HV_X64_MSR_GUEST_OS_ID: 1130 case HV_X64_MSR_GUEST_OS_ID:
1131 case HV_X64_MSR_HYPERCALL: 1131 case HV_X64_MSR_HYPERCALL:
1132 r = true; 1132 r = true;
1133 break; 1133 break;
1134 } 1134 }
1135 1135
1136 return r; 1136 return r;
1137 } 1137 }
1138 1138
1139 static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) 1139 static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
1140 { 1140 {
1141 struct kvm *kvm = vcpu->kvm; 1141 struct kvm *kvm = vcpu->kvm;
1142 1142
1143 switch (msr) { 1143 switch (msr) {
1144 case HV_X64_MSR_GUEST_OS_ID: 1144 case HV_X64_MSR_GUEST_OS_ID:
1145 kvm->arch.hv_guest_os_id = data; 1145 kvm->arch.hv_guest_os_id = data;
1146 /* setting guest os id to zero disables hypercall page */ 1146 /* setting guest os id to zero disables hypercall page */
1147 if (!kvm->arch.hv_guest_os_id) 1147 if (!kvm->arch.hv_guest_os_id)
1148 kvm->arch.hv_hypercall &= ~HV_X64_MSR_HYPERCALL_ENABLE; 1148 kvm->arch.hv_hypercall &= ~HV_X64_MSR_HYPERCALL_ENABLE;
1149 break; 1149 break;
1150 case HV_X64_MSR_HYPERCALL: { 1150 case HV_X64_MSR_HYPERCALL: {
1151 u64 gfn; 1151 u64 gfn;
1152 unsigned long addr; 1152 unsigned long addr;
1153 u8 instructions[4]; 1153 u8 instructions[4];
1154 1154
1155 /* if guest os id is not set hypercall should remain disabled */ 1155 /* if guest os id is not set hypercall should remain disabled */
1156 if (!kvm->arch.hv_guest_os_id) 1156 if (!kvm->arch.hv_guest_os_id)
1157 break; 1157 break;
1158 if (!(data & HV_X64_MSR_HYPERCALL_ENABLE)) { 1158 if (!(data & HV_X64_MSR_HYPERCALL_ENABLE)) {
1159 kvm->arch.hv_hypercall = data; 1159 kvm->arch.hv_hypercall = data;
1160 break; 1160 break;
1161 } 1161 }
1162 gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT; 1162 gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT;
1163 addr = gfn_to_hva(kvm, gfn); 1163 addr = gfn_to_hva(kvm, gfn);
1164 if (kvm_is_error_hva(addr)) 1164 if (kvm_is_error_hva(addr))
1165 return 1; 1165 return 1;
1166 kvm_x86_ops->patch_hypercall(vcpu, instructions); 1166 kvm_x86_ops->patch_hypercall(vcpu, instructions);
1167 ((unsigned char *)instructions)[3] = 0xc3; /* ret */ 1167 ((unsigned char *)instructions)[3] = 0xc3; /* ret */
1168 if (copy_to_user((void __user *)addr, instructions, 4)) 1168 if (copy_to_user((void __user *)addr, instructions, 4))
1169 return 1; 1169 return 1;
1170 kvm->arch.hv_hypercall = data; 1170 kvm->arch.hv_hypercall = data;
1171 break; 1171 break;
1172 } 1172 }
1173 default: 1173 default:
1174 pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x " 1174 pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x "
1175 "data 0x%llx\n", msr, data); 1175 "data 0x%llx\n", msr, data);
1176 return 1; 1176 return 1;
1177 } 1177 }
1178 return 0; 1178 return 0;
1179 } 1179 }
1180 1180
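The Hyper-V hypercall page set up above ends up holding the vendor-specific hypercall instruction emitted by ->patch_hypercall() followed by the RET that this code appends, so the guest can simply CALL into the page. For reference, the resulting four bytes on the two vendors (standard VMCALL/VMMCALL/RET encodings; illustrative only):

/* What the 4-byte hypercall page looks like after patching (for reference). */
static const unsigned char hv_hypercall_page_vmx[4] =
	{ 0x0f, 0x01, 0xc1, 0xc3 };	/* vmcall ; ret   (Intel VMX) */
static const unsigned char hv_hypercall_page_svm[4] =
	{ 0x0f, 0x01, 0xd9, 0xc3 };	/* vmmcall ; ret  (AMD SVM)   */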
1181 static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data) 1181 static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data)
1182 { 1182 {
1183 switch (msr) { 1183 switch (msr) {
1184 case HV_X64_MSR_APIC_ASSIST_PAGE: { 1184 case HV_X64_MSR_APIC_ASSIST_PAGE: {
1185 unsigned long addr; 1185 unsigned long addr;
1186 1186
1187 if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) { 1187 if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) {
1188 vcpu->arch.hv_vapic = data; 1188 vcpu->arch.hv_vapic = data;
1189 break; 1189 break;
1190 } 1190 }
1191 addr = gfn_to_hva(vcpu->kvm, data >> 1191 addr = gfn_to_hva(vcpu->kvm, data >>
1192 HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT); 1192 HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT);
1193 if (kvm_is_error_hva(addr)) 1193 if (kvm_is_error_hva(addr))
1194 return 1; 1194 return 1;
1195 if (clear_user((void __user *)addr, PAGE_SIZE)) 1195 if (clear_user((void __user *)addr, PAGE_SIZE))
1196 return 1; 1196 return 1;
1197 vcpu->arch.hv_vapic = data; 1197 vcpu->arch.hv_vapic = data;
1198 break; 1198 break;
1199 } 1199 }
1200 case HV_X64_MSR_EOI: 1200 case HV_X64_MSR_EOI:
1201 return kvm_hv_vapic_msr_write(vcpu, APIC_EOI, data); 1201 return kvm_hv_vapic_msr_write(vcpu, APIC_EOI, data);
1202 case HV_X64_MSR_ICR: 1202 case HV_X64_MSR_ICR:
1203 return kvm_hv_vapic_msr_write(vcpu, APIC_ICR, data); 1203 return kvm_hv_vapic_msr_write(vcpu, APIC_ICR, data);
1204 case HV_X64_MSR_TPR: 1204 case HV_X64_MSR_TPR:
1205 return kvm_hv_vapic_msr_write(vcpu, APIC_TASKPRI, data); 1205 return kvm_hv_vapic_msr_write(vcpu, APIC_TASKPRI, data);
1206 default: 1206 default:
1207 pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x " 1207 pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x "
1208 "data 0x%llx\n", msr, data); 1208 "data 0x%llx\n", msr, data);
1209 return 1; 1209 return 1;
1210 } 1210 }
1211 1211
1212 return 0; 1212 return 0;
1213 } 1213 }
1214 1214
1215 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) 1215 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
1216 { 1216 {
1217 switch (msr) { 1217 switch (msr) {
1218 case MSR_EFER: 1218 case MSR_EFER:
1219 return set_efer(vcpu, data); 1219 return set_efer(vcpu, data);
1220 case MSR_K7_HWCR: 1220 case MSR_K7_HWCR:
1221 data &= ~(u64)0x40; /* ignore flush filter disable */ 1221 data &= ~(u64)0x40; /* ignore flush filter disable */
1222 data &= ~(u64)0x100; /* ignore ignne emulation enable */ 1222 data &= ~(u64)0x100; /* ignore ignne emulation enable */
1223 if (data != 0) { 1223 if (data != 0) {
1224 pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n", 1224 pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n",
1225 data); 1225 data);
1226 return 1; 1226 return 1;
1227 } 1227 }
1228 break; 1228 break;
1229 case MSR_FAM10H_MMIO_CONF_BASE: 1229 case MSR_FAM10H_MMIO_CONF_BASE:
1230 if (data != 0) { 1230 if (data != 0) {
1231 pr_unimpl(vcpu, "unimplemented MMIO_CONF_BASE wrmsr: " 1231 pr_unimpl(vcpu, "unimplemented MMIO_CONF_BASE wrmsr: "
1232 "0x%llx\n", data); 1232 "0x%llx\n", data);
1233 return 1; 1233 return 1;
1234 } 1234 }
1235 break; 1235 break;
1236 case MSR_AMD64_NB_CFG: 1236 case MSR_AMD64_NB_CFG:
1237 break; 1237 break;
1238 case MSR_IA32_DEBUGCTLMSR: 1238 case MSR_IA32_DEBUGCTLMSR:
1239 if (!data) { 1239 if (!data) {
1240 /* We support the non-activated case already */ 1240 /* We support the non-activated case already */
1241 break; 1241 break;
1242 } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) { 1242 } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
1243 /* Values other than LBR and BTF are vendor-specific, 1243 /* Values other than LBR and BTF are vendor-specific,
1244 thus reserved and should throw a #GP */ 1244 thus reserved and should throw a #GP */
1245 return 1; 1245 return 1;
1246 } 1246 }
1247 pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", 1247 pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
1248 __func__, data); 1248 __func__, data);
1249 break; 1249 break;
1250 case MSR_IA32_UCODE_REV: 1250 case MSR_IA32_UCODE_REV:
1251 case MSR_IA32_UCODE_WRITE: 1251 case MSR_IA32_UCODE_WRITE:
1252 case MSR_VM_HSAVE_PA: 1252 case MSR_VM_HSAVE_PA:
1253 case MSR_AMD64_PATCH_LOADER: 1253 case MSR_AMD64_PATCH_LOADER:
1254 break; 1254 break;
1255 case 0x200 ... 0x2ff: 1255 case 0x200 ... 0x2ff:
1256 return set_msr_mtrr(vcpu, msr, data); 1256 return set_msr_mtrr(vcpu, msr, data);
1257 case MSR_IA32_APICBASE: 1257 case MSR_IA32_APICBASE:
1258 kvm_set_apic_base(vcpu, data); 1258 kvm_set_apic_base(vcpu, data);
1259 break; 1259 break;
1260 case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: 1260 case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
1261 return kvm_x2apic_msr_write(vcpu, msr, data); 1261 return kvm_x2apic_msr_write(vcpu, msr, data);
1262 case MSR_IA32_MISC_ENABLE: 1262 case MSR_IA32_MISC_ENABLE:
1263 vcpu->arch.ia32_misc_enable_msr = data; 1263 vcpu->arch.ia32_misc_enable_msr = data;
1264 break; 1264 break;
1265 case MSR_KVM_WALL_CLOCK_NEW: 1265 case MSR_KVM_WALL_CLOCK_NEW:
1266 case MSR_KVM_WALL_CLOCK: 1266 case MSR_KVM_WALL_CLOCK:
1267 vcpu->kvm->arch.wall_clock = data; 1267 vcpu->kvm->arch.wall_clock = data;
1268 kvm_write_wall_clock(vcpu->kvm, data); 1268 kvm_write_wall_clock(vcpu->kvm, data);
1269 break; 1269 break;
1270 case MSR_KVM_SYSTEM_TIME_NEW: 1270 case MSR_KVM_SYSTEM_TIME_NEW:
1271 case MSR_KVM_SYSTEM_TIME: { 1271 case MSR_KVM_SYSTEM_TIME: {
1272 if (vcpu->arch.time_page) { 1272 if (vcpu->arch.time_page) {
1273 kvm_release_page_dirty(vcpu->arch.time_page); 1273 kvm_release_page_dirty(vcpu->arch.time_page);
1274 vcpu->arch.time_page = NULL; 1274 vcpu->arch.time_page = NULL;
1275 } 1275 }
1276 1276
1277 vcpu->arch.time = data; 1277 vcpu->arch.time = data;
1278 1278
1279 /* we verify if the enable bit is set... */ 1279 /* we verify if the enable bit is set... */
1280 if (!(data & 1)) 1280 if (!(data & 1))
1281 break; 1281 break;
1282 1282
1283 /* ...but clear it before doing the actual write */ 1283 /* ...but clear it before doing the actual write */
1284 vcpu->arch.time_offset = data & ~(PAGE_MASK | 1); 1284 vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
1285 1285
1286 vcpu->arch.time_page = 1286 vcpu->arch.time_page =
1287 gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); 1287 gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
1288 1288
1289 if (is_error_page(vcpu->arch.time_page)) { 1289 if (is_error_page(vcpu->arch.time_page)) {
1290 kvm_release_page_clean(vcpu->arch.time_page); 1290 kvm_release_page_clean(vcpu->arch.time_page);
1291 vcpu->arch.time_page = NULL; 1291 vcpu->arch.time_page = NULL;
1292 } 1292 }
1293 1293
1294 kvm_request_guest_time_update(vcpu); 1294 kvm_request_guest_time_update(vcpu);
1295 break; 1295 break;
1296 } 1296 }
1297 case MSR_IA32_MCG_CTL: 1297 case MSR_IA32_MCG_CTL:
1298 case MSR_IA32_MCG_STATUS: 1298 case MSR_IA32_MCG_STATUS:
1299 case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1: 1299 case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
1300 return set_msr_mce(vcpu, msr, data); 1300 return set_msr_mce(vcpu, msr, data);
1301 1301
1302 /* Performance counters are not protected by a CPUID bit, 1302 /* Performance counters are not protected by a CPUID bit,
1303 * so we should check all of them in the generic path for the sake of 1303 * so we should check all of them in the generic path for the sake of
1304 * cross vendor migration. 1304 * cross vendor migration.
1305 * Writing a zero into the event select MSRs disables them, 1305 * Writing a zero into the event select MSRs disables them,
1306 * which we perfectly emulate ;-). Any other value should be at least 1306 * which we perfectly emulate ;-). Any other value should be at least
1307 * reported, some guests depend on them. 1307 * reported, some guests depend on them.
1308 */ 1308 */
1309 case MSR_P6_EVNTSEL0: 1309 case MSR_P6_EVNTSEL0:
1310 case MSR_P6_EVNTSEL1: 1310 case MSR_P6_EVNTSEL1:
1311 case MSR_K7_EVNTSEL0: 1311 case MSR_K7_EVNTSEL0:
1312 case MSR_K7_EVNTSEL1: 1312 case MSR_K7_EVNTSEL1:
1313 case MSR_K7_EVNTSEL2: 1313 case MSR_K7_EVNTSEL2:
1314 case MSR_K7_EVNTSEL3: 1314 case MSR_K7_EVNTSEL3:
1315 if (data != 0) 1315 if (data != 0)
1316 pr_unimpl(vcpu, "unimplemented perfctr wrmsr: " 1316 pr_unimpl(vcpu, "unimplemented perfctr wrmsr: "
1317 "0x%x data 0x%llx\n", msr, data); 1317 "0x%x data 0x%llx\n", msr, data);
1318 break; 1318 break;
1319 /* at least RHEL 4 unconditionally writes to the perfctr registers, 1319 /* at least RHEL 4 unconditionally writes to the perfctr registers,
1320 * so we ignore writes to make it happy. 1320 * so we ignore writes to make it happy.
1321 */ 1321 */
1322 case MSR_P6_PERFCTR0: 1322 case MSR_P6_PERFCTR0:
1323 case MSR_P6_PERFCTR1: 1323 case MSR_P6_PERFCTR1:
1324 case MSR_K7_PERFCTR0: 1324 case MSR_K7_PERFCTR0:
1325 case MSR_K7_PERFCTR1: 1325 case MSR_K7_PERFCTR1:
1326 case MSR_K7_PERFCTR2: 1326 case MSR_K7_PERFCTR2:
1327 case MSR_K7_PERFCTR3: 1327 case MSR_K7_PERFCTR3:
1328 pr_unimpl(vcpu, "unimplemented perfctr wrmsr: " 1328 pr_unimpl(vcpu, "unimplemented perfctr wrmsr: "
1329 "0x%x data 0x%llx\n", msr, data); 1329 "0x%x data 0x%llx\n", msr, data);
1330 break; 1330 break;
1331 case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15: 1331 case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
1332 if (kvm_hv_msr_partition_wide(msr)) { 1332 if (kvm_hv_msr_partition_wide(msr)) {
1333 int r; 1333 int r;
1334 mutex_lock(&vcpu->kvm->lock); 1334 mutex_lock(&vcpu->kvm->lock);
1335 r = set_msr_hyperv_pw(vcpu, msr, data); 1335 r = set_msr_hyperv_pw(vcpu, msr, data);
1336 mutex_unlock(&vcpu->kvm->lock); 1336 mutex_unlock(&vcpu->kvm->lock);
1337 return r; 1337 return r;
1338 } else 1338 } else
1339 return set_msr_hyperv(vcpu, msr, data); 1339 return set_msr_hyperv(vcpu, msr, data);
1340 break; 1340 break;
1341 default: 1341 default:
1342 if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr)) 1342 if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
1343 return xen_hvm_config(vcpu, data); 1343 return xen_hvm_config(vcpu, data);
1344 if (!ignore_msrs) { 1344 if (!ignore_msrs) {
1345 pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", 1345 pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
1346 msr, data); 1346 msr, data);
1347 return 1; 1347 return 1;
1348 } else { 1348 } else {
1349 pr_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n", 1349 pr_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n",
1350 msr, data); 1350 msr, data);
1351 break; 1351 break;
1352 } 1352 }
1353 } 1353 }
1354 return 0; 1354 return 0;
1355 } 1355 }
1356 EXPORT_SYMBOL_GPL(kvm_set_msr_common); 1356 EXPORT_SYMBOL_GPL(kvm_set_msr_common);
1357 1357
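For context on the MSR_KVM_SYSTEM_TIME handling above: the value the guest writes is the guest-physical address of its per-vCPU time structure, with bit 0 acting as the enable flag and the remaining sub-page bits becoming time_offset. A minimal guest-side sketch, assuming an in-guest Linux-style environment (__pa(), wrmsrl() and the pvclock_vcpu_time_info layout):

/* Guest-side sketch: enable kvmclock updates for this vCPU.  The structure
 * should not cross a page boundary, since the host maps a single page and
 * copies at the in-page offset. */
static struct pvclock_vcpu_time_info hv_clock __attribute__((aligned(32)));

static void enable_kvmclock(void)
{
	u64 pa = __pa(&hv_clock);		/* guest-physical address */

	wrmsrl(MSR_KVM_SYSTEM_TIME, pa | 1);	/* bit 0: enable */
}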
1358 1358
1359 /* 1359 /*
1360 * Reads an msr value (of 'msr_index') into 'pdata'. 1360 * Reads an msr value (of 'msr_index') into 'pdata'.
1361 * Returns 0 on success, non-0 otherwise. 1361 * Returns 0 on success, non-0 otherwise.
1362 * Assumes vcpu_load() was already called. 1362 * Assumes vcpu_load() was already called.
1363 */ 1363 */
1364 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) 1364 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
1365 { 1365 {
1366 return kvm_x86_ops->get_msr(vcpu, msr_index, pdata); 1366 return kvm_x86_ops->get_msr(vcpu, msr_index, pdata);
1367 } 1367 }
1368 1368
1369 static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) 1369 static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
1370 { 1370 {
1371 u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges; 1371 u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
1372 1372
1373 if (!msr_mtrr_valid(msr)) 1373 if (!msr_mtrr_valid(msr))
1374 return 1; 1374 return 1;
1375 1375
1376 if (msr == MSR_MTRRdefType) 1376 if (msr == MSR_MTRRdefType)
1377 *pdata = vcpu->arch.mtrr_state.def_type + 1377 *pdata = vcpu->arch.mtrr_state.def_type +
1378 (vcpu->arch.mtrr_state.enabled << 10); 1378 (vcpu->arch.mtrr_state.enabled << 10);
1379 else if (msr == MSR_MTRRfix64K_00000) 1379 else if (msr == MSR_MTRRfix64K_00000)
1380 *pdata = p[0]; 1380 *pdata = p[0];
1381 else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000) 1381 else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000)
1382 *pdata = p[1 + msr - MSR_MTRRfix16K_80000]; 1382 *pdata = p[1 + msr - MSR_MTRRfix16K_80000];
1383 else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000) 1383 else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000)
1384 *pdata = p[3 + msr - MSR_MTRRfix4K_C0000]; 1384 *pdata = p[3 + msr - MSR_MTRRfix4K_C0000];
1385 else if (msr == MSR_IA32_CR_PAT) 1385 else if (msr == MSR_IA32_CR_PAT)
1386 *pdata = vcpu->arch.pat; 1386 *pdata = vcpu->arch.pat;
1387 else { /* Variable MTRRs */ 1387 else { /* Variable MTRRs */
1388 int idx, is_mtrr_mask; 1388 int idx, is_mtrr_mask;
1389 u64 *pt; 1389 u64 *pt;
1390 1390
1391 idx = (msr - 0x200) / 2; 1391 idx = (msr - 0x200) / 2;
1392 is_mtrr_mask = msr - 0x200 - 2 * idx; 1392 is_mtrr_mask = msr - 0x200 - 2 * idx;
1393 if (!is_mtrr_mask) 1393 if (!is_mtrr_mask)
1394 pt = 1394 pt =
1395 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo; 1395 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
1396 else 1396 else
1397 pt = 1397 pt =
1398 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo; 1398 (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
1399 *pdata = *pt; 1399 *pdata = *pt;
1400 } 1400 }
1401 1401
1402 return 0; 1402 return 0;
1403 } 1403 }
1404 1404
1405 static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) 1405 static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
1406 { 1406 {
1407 u64 data; 1407 u64 data;
1408 u64 mcg_cap = vcpu->arch.mcg_cap; 1408 u64 mcg_cap = vcpu->arch.mcg_cap;
1409 unsigned bank_num = mcg_cap & 0xff; 1409 unsigned bank_num = mcg_cap & 0xff;
1410 1410
1411 switch (msr) { 1411 switch (msr) {
1412 case MSR_IA32_P5_MC_ADDR: 1412 case MSR_IA32_P5_MC_ADDR:
1413 case MSR_IA32_P5_MC_TYPE: 1413 case MSR_IA32_P5_MC_TYPE:
1414 data = 0; 1414 data = 0;
1415 break; 1415 break;
1416 case MSR_IA32_MCG_CAP: 1416 case MSR_IA32_MCG_CAP:
1417 data = vcpu->arch.mcg_cap; 1417 data = vcpu->arch.mcg_cap;
1418 break; 1418 break;
1419 case MSR_IA32_MCG_CTL: 1419 case MSR_IA32_MCG_CTL:
1420 if (!(mcg_cap & MCG_CTL_P)) 1420 if (!(mcg_cap & MCG_CTL_P))
1421 return 1; 1421 return 1;
1422 data = vcpu->arch.mcg_ctl; 1422 data = vcpu->arch.mcg_ctl;
1423 break; 1423 break;
1424 case MSR_IA32_MCG_STATUS: 1424 case MSR_IA32_MCG_STATUS:
1425 data = vcpu->arch.mcg_status; 1425 data = vcpu->arch.mcg_status;
1426 break; 1426 break;
1427 default: 1427 default:
1428 if (msr >= MSR_IA32_MC0_CTL && 1428 if (msr >= MSR_IA32_MC0_CTL &&
1429 msr < MSR_IA32_MC0_CTL + 4 * bank_num) { 1429 msr < MSR_IA32_MC0_CTL + 4 * bank_num) {
1430 u32 offset = msr - MSR_IA32_MC0_CTL; 1430 u32 offset = msr - MSR_IA32_MC0_CTL;
1431 data = vcpu->arch.mce_banks[offset]; 1431 data = vcpu->arch.mce_banks[offset];
1432 break; 1432 break;
1433 } 1433 }
1434 return 1; 1434 return 1;
1435 } 1435 }
1436 *pdata = data; 1436 *pdata = data;
1437 return 0; 1437 return 0;
1438 } 1438 }
1439 1439
1440 static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) 1440 static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
1441 { 1441 {
1442 u64 data = 0; 1442 u64 data = 0;
1443 struct kvm *kvm = vcpu->kvm; 1443 struct kvm *kvm = vcpu->kvm;
1444 1444
1445 switch (msr) { 1445 switch (msr) {
1446 case HV_X64_MSR_GUEST_OS_ID: 1446 case HV_X64_MSR_GUEST_OS_ID:
1447 data = kvm->arch.hv_guest_os_id; 1447 data = kvm->arch.hv_guest_os_id;
1448 break; 1448 break;
1449 case HV_X64_MSR_HYPERCALL: 1449 case HV_X64_MSR_HYPERCALL:
1450 data = kvm->arch.hv_hypercall; 1450 data = kvm->arch.hv_hypercall;
1451 break; 1451 break;
1452 default: 1452 default:
1453 pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); 1453 pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
1454 return 1; 1454 return 1;
1455 } 1455 }
1456 1456
1457 *pdata = data; 1457 *pdata = data;
1458 return 0; 1458 return 0;
1459 } 1459 }
1460 1460
1461 static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) 1461 static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
1462 { 1462 {
1463 u64 data = 0; 1463 u64 data = 0;
1464 1464
1465 switch (msr) { 1465 switch (msr) {
1466 case HV_X64_MSR_VP_INDEX: { 1466 case HV_X64_MSR_VP_INDEX: {
1467 int r; 1467 int r;
1468 struct kvm_vcpu *v; 1468 struct kvm_vcpu *v;
1469 kvm_for_each_vcpu(r, v, vcpu->kvm) 1469 kvm_for_each_vcpu(r, v, vcpu->kvm)
1470 if (v == vcpu) 1470 if (v == vcpu)
1471 data = r; 1471 data = r;
1472 break; 1472 break;
1473 } 1473 }
1474 case HV_X64_MSR_EOI: 1474 case HV_X64_MSR_EOI:
1475 return kvm_hv_vapic_msr_read(vcpu, APIC_EOI, pdata); 1475 return kvm_hv_vapic_msr_read(vcpu, APIC_EOI, pdata);
1476 case HV_X64_MSR_ICR: 1476 case HV_X64_MSR_ICR:
1477 return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata); 1477 return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata);
1478 case HV_X64_MSR_TPR: 1478 case HV_X64_MSR_TPR:
1479 return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata); 1479 return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata);
1480 default: 1480 default:
1481 pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); 1481 pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
1482 return 1; 1482 return 1;
1483 } 1483 }
1484 *pdata = data; 1484 *pdata = data;
1485 return 0; 1485 return 0;
1486 } 1486 }
1487 1487
1488 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) 1488 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
1489 { 1489 {
1490 u64 data; 1490 u64 data;
1491 1491
1492 switch (msr) { 1492 switch (msr) {
1493 case MSR_IA32_PLATFORM_ID: 1493 case MSR_IA32_PLATFORM_ID:
1494 case MSR_IA32_UCODE_REV: 1494 case MSR_IA32_UCODE_REV:
1495 case MSR_IA32_EBL_CR_POWERON: 1495 case MSR_IA32_EBL_CR_POWERON:
1496 case MSR_IA32_DEBUGCTLMSR: 1496 case MSR_IA32_DEBUGCTLMSR:
1497 case MSR_IA32_LASTBRANCHFROMIP: 1497 case MSR_IA32_LASTBRANCHFROMIP:
1498 case MSR_IA32_LASTBRANCHTOIP: 1498 case MSR_IA32_LASTBRANCHTOIP:
1499 case MSR_IA32_LASTINTFROMIP: 1499 case MSR_IA32_LASTINTFROMIP:
1500 case MSR_IA32_LASTINTTOIP: 1500 case MSR_IA32_LASTINTTOIP:
1501 case MSR_K8_SYSCFG: 1501 case MSR_K8_SYSCFG:
1502 case MSR_K7_HWCR: 1502 case MSR_K7_HWCR:
1503 case MSR_VM_HSAVE_PA: 1503 case MSR_VM_HSAVE_PA:
1504 case MSR_P6_PERFCTR0: 1504 case MSR_P6_PERFCTR0:
1505 case MSR_P6_PERFCTR1: 1505 case MSR_P6_PERFCTR1:
1506 case MSR_P6_EVNTSEL0: 1506 case MSR_P6_EVNTSEL0:
1507 case MSR_P6_EVNTSEL1: 1507 case MSR_P6_EVNTSEL1:
1508 case MSR_K7_EVNTSEL0: 1508 case MSR_K7_EVNTSEL0:
1509 case MSR_K7_PERFCTR0: 1509 case MSR_K7_PERFCTR0:
1510 case MSR_K8_INT_PENDING_MSG: 1510 case MSR_K8_INT_PENDING_MSG:
1511 case MSR_AMD64_NB_CFG: 1511 case MSR_AMD64_NB_CFG:
1512 case MSR_FAM10H_MMIO_CONF_BASE: 1512 case MSR_FAM10H_MMIO_CONF_BASE:
1513 data = 0; 1513 data = 0;
1514 break; 1514 break;
1515 case MSR_MTRRcap: 1515 case MSR_MTRRcap:
1516 data = 0x500 | KVM_NR_VAR_MTRR; 1516 data = 0x500 | KVM_NR_VAR_MTRR;
1517 break; 1517 break;
1518 case 0x200 ... 0x2ff: 1518 case 0x200 ... 0x2ff:
1519 return get_msr_mtrr(vcpu, msr, pdata); 1519 return get_msr_mtrr(vcpu, msr, pdata);
1520 case 0xcd: /* fsb frequency */ 1520 case 0xcd: /* fsb frequency */
1521 data = 3; 1521 data = 3;
1522 break; 1522 break;
1523 case MSR_IA32_APICBASE: 1523 case MSR_IA32_APICBASE:
1524 data = kvm_get_apic_base(vcpu); 1524 data = kvm_get_apic_base(vcpu);
1525 break; 1525 break;
1526 case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: 1526 case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
1527 return kvm_x2apic_msr_read(vcpu, msr, pdata); 1527 return kvm_x2apic_msr_read(vcpu, msr, pdata);
1528 break; 1528 break;
1529 case MSR_IA32_MISC_ENABLE: 1529 case MSR_IA32_MISC_ENABLE:
1530 data = vcpu->arch.ia32_misc_enable_msr; 1530 data = vcpu->arch.ia32_misc_enable_msr;
1531 break; 1531 break;
1532 case MSR_IA32_PERF_STATUS: 1532 case MSR_IA32_PERF_STATUS:
1533 /* TSC increment by tick */ 1533 /* TSC increment by tick */
1534 data = 1000ULL; 1534 data = 1000ULL;
1535 /* CPU multiplier */ 1535 /* CPU multiplier */
1536 data |= (((uint64_t)4ULL) << 40); 1536 data |= (((uint64_t)4ULL) << 40);
1537 break; 1537 break;
1538 case MSR_EFER: 1538 case MSR_EFER:
1539 data = vcpu->arch.efer; 1539 data = vcpu->arch.efer;
1540 break; 1540 break;
1541 case MSR_KVM_WALL_CLOCK: 1541 case MSR_KVM_WALL_CLOCK:
1542 case MSR_KVM_WALL_CLOCK_NEW: 1542 case MSR_KVM_WALL_CLOCK_NEW:
1543 data = vcpu->kvm->arch.wall_clock; 1543 data = vcpu->kvm->arch.wall_clock;
1544 break; 1544 break;
1545 case MSR_KVM_SYSTEM_TIME: 1545 case MSR_KVM_SYSTEM_TIME:
1546 case MSR_KVM_SYSTEM_TIME_NEW: 1546 case MSR_KVM_SYSTEM_TIME_NEW:
1547 data = vcpu->arch.time; 1547 data = vcpu->arch.time;
1548 break; 1548 break;
1549 case MSR_IA32_P5_MC_ADDR: 1549 case MSR_IA32_P5_MC_ADDR:
1550 case MSR_IA32_P5_MC_TYPE: 1550 case MSR_IA32_P5_MC_TYPE:
1551 case MSR_IA32_MCG_CAP: 1551 case MSR_IA32_MCG_CAP:
1552 case MSR_IA32_MCG_CTL: 1552 case MSR_IA32_MCG_CTL:
1553 case MSR_IA32_MCG_STATUS: 1553 case MSR_IA32_MCG_STATUS:
1554 case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1: 1554 case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
1555 return get_msr_mce(vcpu, msr, pdata); 1555 return get_msr_mce(vcpu, msr, pdata);
1556 case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15: 1556 case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
1557 if (kvm_hv_msr_partition_wide(msr)) { 1557 if (kvm_hv_msr_partition_wide(msr)) {
1558 int r; 1558 int r;
1559 mutex_lock(&vcpu->kvm->lock); 1559 mutex_lock(&vcpu->kvm->lock);
1560 r = get_msr_hyperv_pw(vcpu, msr, pdata); 1560 r = get_msr_hyperv_pw(vcpu, msr, pdata);
1561 mutex_unlock(&vcpu->kvm->lock); 1561 mutex_unlock(&vcpu->kvm->lock);
1562 return r; 1562 return r;
1563 } else 1563 } else
1564 return get_msr_hyperv(vcpu, msr, pdata); 1564 return get_msr_hyperv(vcpu, msr, pdata);
1565 break; 1565 break;
1566 default: 1566 default:
1567 if (!ignore_msrs) { 1567 if (!ignore_msrs) {
1568 pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr); 1568 pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
1569 return 1; 1569 return 1;
1570 } else { 1570 } else {
1571 pr_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr); 1571 pr_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr);
1572 data = 0; 1572 data = 0;
1573 } 1573 }
1574 break; 1574 break;
1575 } 1575 }
1576 *pdata = data; 1576 *pdata = data;
1577 return 0; 1577 return 0;
1578 } 1578 }
1579 EXPORT_SYMBOL_GPL(kvm_get_msr_common); 1579 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
1580 1580
1581 /* 1581 /*
1582 * Read or write a bunch of msrs. All parameters are kernel addresses. 1582 * Read or write a bunch of msrs. All parameters are kernel addresses.
1583 * 1583 *
1584 * @return number of msrs set successfully. 1584 * @return number of msrs set successfully.
1585 */ 1585 */
1586 static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs, 1586 static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
1587 struct kvm_msr_entry *entries, 1587 struct kvm_msr_entry *entries,
1588 int (*do_msr)(struct kvm_vcpu *vcpu, 1588 int (*do_msr)(struct kvm_vcpu *vcpu,
1589 unsigned index, u64 *data)) 1589 unsigned index, u64 *data))
1590 { 1590 {
1591 int i, idx; 1591 int i, idx;
1592 1592
1593 idx = srcu_read_lock(&vcpu->kvm->srcu); 1593 idx = srcu_read_lock(&vcpu->kvm->srcu);
1594 for (i = 0; i < msrs->nmsrs; ++i) 1594 for (i = 0; i < msrs->nmsrs; ++i)
1595 if (do_msr(vcpu, entries[i].index, &entries[i].data)) 1595 if (do_msr(vcpu, entries[i].index, &entries[i].data))
1596 break; 1596 break;
1597 srcu_read_unlock(&vcpu->kvm->srcu, idx); 1597 srcu_read_unlock(&vcpu->kvm->srcu, idx);
1598 1598
1599 return i; 1599 return i;
1600 } 1600 }
1601 1601
1602 /* 1602 /*
1603 * Read or write a bunch of msrs. Parameters are user addresses. 1603 * Read or write a bunch of msrs. Parameters are user addresses.
1604 * 1604 *
1605 * @return number of msrs set successfully. 1605 * @return number of msrs set successfully.
1606 */ 1606 */
1607 static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs, 1607 static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
1608 int (*do_msr)(struct kvm_vcpu *vcpu, 1608 int (*do_msr)(struct kvm_vcpu *vcpu,
1609 unsigned index, u64 *data), 1609 unsigned index, u64 *data),
1610 int writeback) 1610 int writeback)
1611 { 1611 {
1612 struct kvm_msrs msrs; 1612 struct kvm_msrs msrs;
1613 struct kvm_msr_entry *entries; 1613 struct kvm_msr_entry *entries;
1614 int r, n; 1614 int r, n;
1615 unsigned size; 1615 unsigned size;
1616 1616
1617 r = -EFAULT; 1617 r = -EFAULT;
1618 if (copy_from_user(&msrs, user_msrs, sizeof msrs)) 1618 if (copy_from_user(&msrs, user_msrs, sizeof msrs))
1619 goto out; 1619 goto out;
1620 1620
1621 r = -E2BIG; 1621 r = -E2BIG;
1622 if (msrs.nmsrs >= MAX_IO_MSRS) 1622 if (msrs.nmsrs >= MAX_IO_MSRS)
1623 goto out; 1623 goto out;
1624 1624
1625 r = -ENOMEM; 1625 r = -ENOMEM;
1626 size = sizeof(struct kvm_msr_entry) * msrs.nmsrs; 1626 size = sizeof(struct kvm_msr_entry) * msrs.nmsrs;
1627 entries = kmalloc(size, GFP_KERNEL); 1627 entries = kmalloc(size, GFP_KERNEL);
1628 if (!entries) 1628 if (!entries)
1629 goto out; 1629 goto out;
1630 1630
1631 r = -EFAULT; 1631 r = -EFAULT;
1632 if (copy_from_user(entries, user_msrs->entries, size)) 1632 if (copy_from_user(entries, user_msrs->entries, size))
1633 goto out_free; 1633 goto out_free;
1634 1634
1635 r = n = __msr_io(vcpu, &msrs, entries, do_msr); 1635 r = n = __msr_io(vcpu, &msrs, entries, do_msr);
1636 if (r < 0) 1636 if (r < 0)
1637 goto out_free; 1637 goto out_free;
1638 1638
1639 r = -EFAULT; 1639 r = -EFAULT;
1640 if (writeback && copy_to_user(user_msrs->entries, entries, size)) 1640 if (writeback && copy_to_user(user_msrs->entries, entries, size))
1641 goto out_free; 1641 goto out_free;
1642 1642
1643 r = n; 1643 r = n;
1644 1644
1645 out_free: 1645 out_free:
1646 kfree(entries); 1646 kfree(entries);
1647 out: 1647 out:
1648 return r; 1648 return r;
1649 } 1649 }
1650 1650
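msr_io() backs both KVM_GET_MSRS and KVM_SET_MSRS, and its return value is the number of entries processed rather than a plain success/failure code. A minimal userspace sketch reading one MSR through the vCPU file descriptor (vcpu_fd is assumed to be an already created vCPU):

#include <linux/kvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Read a single MSR from a vCPU; the ioctl returns how many entries it
 * processed, so anything other than 1 means the read was refused. */
static unsigned long long read_one_msr(int vcpu_fd, unsigned int index)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} req;

	memset(&req, 0, sizeof(req));
	req.hdr.nmsrs = 1;
	req.entry.index = index;

	if (ioctl(vcpu_fd, KVM_GET_MSRS, &req) != 1) {
		perror("KVM_GET_MSRS");
		exit(EXIT_FAILURE);
	}
	return req.entry.data;
}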
1651 int kvm_dev_ioctl_check_extension(long ext) 1651 int kvm_dev_ioctl_check_extension(long ext)
1652 { 1652 {
1653 int r; 1653 int r;
1654 1654
1655 switch (ext) { 1655 switch (ext) {
1656 case KVM_CAP_IRQCHIP: 1656 case KVM_CAP_IRQCHIP:
1657 case KVM_CAP_HLT: 1657 case KVM_CAP_HLT:
1658 case KVM_CAP_MMU_SHADOW_CACHE_CONTROL: 1658 case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
1659 case KVM_CAP_SET_TSS_ADDR: 1659 case KVM_CAP_SET_TSS_ADDR:
1660 case KVM_CAP_EXT_CPUID: 1660 case KVM_CAP_EXT_CPUID:
1661 case KVM_CAP_CLOCKSOURCE: 1661 case KVM_CAP_CLOCKSOURCE:
1662 case KVM_CAP_PIT: 1662 case KVM_CAP_PIT:
1663 case KVM_CAP_NOP_IO_DELAY: 1663 case KVM_CAP_NOP_IO_DELAY:
1664 case KVM_CAP_MP_STATE: 1664 case KVM_CAP_MP_STATE:
1665 case KVM_CAP_SYNC_MMU: 1665 case KVM_CAP_SYNC_MMU:
1666 case KVM_CAP_REINJECT_CONTROL: 1666 case KVM_CAP_REINJECT_CONTROL:
1667 case KVM_CAP_IRQ_INJECT_STATUS: 1667 case KVM_CAP_IRQ_INJECT_STATUS:
1668 case KVM_CAP_ASSIGN_DEV_IRQ: 1668 case KVM_CAP_ASSIGN_DEV_IRQ:
1669 case KVM_CAP_IRQFD: 1669 case KVM_CAP_IRQFD:
1670 case KVM_CAP_IOEVENTFD: 1670 case KVM_CAP_IOEVENTFD:
1671 case KVM_CAP_PIT2: 1671 case KVM_CAP_PIT2:
1672 case KVM_CAP_PIT_STATE2: 1672 case KVM_CAP_PIT_STATE2:
1673 case KVM_CAP_SET_IDENTITY_MAP_ADDR: 1673 case KVM_CAP_SET_IDENTITY_MAP_ADDR:
1674 case KVM_CAP_XEN_HVM: 1674 case KVM_CAP_XEN_HVM:
1675 case KVM_CAP_ADJUST_CLOCK: 1675 case KVM_CAP_ADJUST_CLOCK:
1676 case KVM_CAP_VCPU_EVENTS: 1676 case KVM_CAP_VCPU_EVENTS:
1677 case KVM_CAP_HYPERV: 1677 case KVM_CAP_HYPERV:
1678 case KVM_CAP_HYPERV_VAPIC: 1678 case KVM_CAP_HYPERV_VAPIC:
1679 case KVM_CAP_HYPERV_SPIN: 1679 case KVM_CAP_HYPERV_SPIN:
1680 case KVM_CAP_PCI_SEGMENT: 1680 case KVM_CAP_PCI_SEGMENT:
1681 case KVM_CAP_DEBUGREGS: 1681 case KVM_CAP_DEBUGREGS:
1682 case KVM_CAP_X86_ROBUST_SINGLESTEP: 1682 case KVM_CAP_X86_ROBUST_SINGLESTEP:
1683 case KVM_CAP_XSAVE: 1683 case KVM_CAP_XSAVE:
1684 r = 1; 1684 r = 1;
1685 break; 1685 break;
1686 case KVM_CAP_COALESCED_MMIO: 1686 case KVM_CAP_COALESCED_MMIO:
1687 r = KVM_COALESCED_MMIO_PAGE_OFFSET; 1687 r = KVM_COALESCED_MMIO_PAGE_OFFSET;
1688 break; 1688 break;
1689 case KVM_CAP_VAPIC: 1689 case KVM_CAP_VAPIC:
1690 r = !kvm_x86_ops->cpu_has_accelerated_tpr(); 1690 r = !kvm_x86_ops->cpu_has_accelerated_tpr();
1691 break; 1691 break;
1692 case KVM_CAP_NR_VCPUS: 1692 case KVM_CAP_NR_VCPUS:
1693 r = KVM_MAX_VCPUS; 1693 r = KVM_MAX_VCPUS;
1694 break; 1694 break;
1695 case KVM_CAP_NR_MEMSLOTS: 1695 case KVM_CAP_NR_MEMSLOTS:
1696 r = KVM_MEMORY_SLOTS; 1696 r = KVM_MEMORY_SLOTS;
1697 break; 1697 break;
1698 case KVM_CAP_PV_MMU: /* obsolete */ 1698 case KVM_CAP_PV_MMU: /* obsolete */
1699 r = 0; 1699 r = 0;
1700 break; 1700 break;
1701 case KVM_CAP_IOMMU: 1701 case KVM_CAP_IOMMU:
1702 r = iommu_found(); 1702 r = iommu_found();
1703 break; 1703 break;
1704 case KVM_CAP_MCE: 1704 case KVM_CAP_MCE:
1705 r = KVM_MAX_MCE_BANKS; 1705 r = KVM_MAX_MCE_BANKS;
1706 break; 1706 break;
1707 case KVM_CAP_XCRS: 1707 case KVM_CAP_XCRS:
1708 r = cpu_has_xsave; 1708 r = cpu_has_xsave;
1709 break; 1709 break;
1710 default: 1710 default:
1711 r = 0; 1711 r = 0;
1712 break; 1712 break;
1713 } 1713 }
1714 return r; 1714 return r;
1715 1715
1716 } 1716 }
1717 1717
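kvm_dev_ioctl_check_extension() is what userspace reaches through the KVM_CHECK_EXTENSION ioctl on /dev/kvm: 0 means the capability is absent, while supported capabilities return 1 or a resource limit. A small userspace sketch querying the memory-slot limit reported above:

#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);

	if (kvm < 0) {
		perror("/dev/kvm");
		return 1;
	}
	/* Returns KVM_MEMORY_SLOTS for KVM_CAP_NR_MEMSLOTS, 0 or 1 for most others. */
	printf("memory slots: %d\n",
	       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS));
	return 0;
}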
1718 long kvm_arch_dev_ioctl(struct file *filp, 1718 long kvm_arch_dev_ioctl(struct file *filp,
1719 unsigned int ioctl, unsigned long arg) 1719 unsigned int ioctl, unsigned long arg)
1720 { 1720 {
1721 void __user *argp = (void __user *)arg; 1721 void __user *argp = (void __user *)arg;
1722 long r; 1722 long r;
1723 1723
1724 switch (ioctl) { 1724 switch (ioctl) {
1725 case KVM_GET_MSR_INDEX_LIST: { 1725 case KVM_GET_MSR_INDEX_LIST: {
1726 struct kvm_msr_list __user *user_msr_list = argp; 1726 struct kvm_msr_list __user *user_msr_list = argp;
1727 struct kvm_msr_list msr_list; 1727 struct kvm_msr_list msr_list;
1728 unsigned n; 1728 unsigned n;
1729 1729
1730 r = -EFAULT; 1730 r = -EFAULT;
1731 if (copy_from_user(&msr_list, user_msr_list, sizeof msr_list)) 1731 if (copy_from_user(&msr_list, user_msr_list, sizeof msr_list))
1732 goto out; 1732 goto out;
1733 n = msr_list.nmsrs; 1733 n = msr_list.nmsrs;
1734 msr_list.nmsrs = num_msrs_to_save + ARRAY_SIZE(emulated_msrs); 1734 msr_list.nmsrs = num_msrs_to_save + ARRAY_SIZE(emulated_msrs);
1735 if (copy_to_user(user_msr_list, &msr_list, sizeof msr_list)) 1735 if (copy_to_user(user_msr_list, &msr_list, sizeof msr_list))
1736 goto out; 1736 goto out;
1737 r = -E2BIG; 1737 r = -E2BIG;
1738 if (n < msr_list.nmsrs) 1738 if (n < msr_list.nmsrs)
1739 goto out; 1739 goto out;
1740 r = -EFAULT; 1740 r = -EFAULT;
1741 if (copy_to_user(user_msr_list->indices, &msrs_to_save, 1741 if (copy_to_user(user_msr_list->indices, &msrs_to_save,
1742 num_msrs_to_save * sizeof(u32))) 1742 num_msrs_to_save * sizeof(u32)))
1743 goto out; 1743 goto out;
1744 if (copy_to_user(user_msr_list->indices + num_msrs_to_save, 1744 if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
1745 &emulated_msrs, 1745 &emulated_msrs,
1746 ARRAY_SIZE(emulated_msrs) * sizeof(u32))) 1746 ARRAY_SIZE(emulated_msrs) * sizeof(u32)))
1747 goto out; 1747 goto out;
1748 r = 0; 1748 r = 0;
1749 break; 1749 break;
1750 } 1750 }
1751 case KVM_GET_SUPPORTED_CPUID: { 1751 case KVM_GET_SUPPORTED_CPUID: {
1752 struct kvm_cpuid2 __user *cpuid_arg = argp; 1752 struct kvm_cpuid2 __user *cpuid_arg = argp;
1753 struct kvm_cpuid2 cpuid; 1753 struct kvm_cpuid2 cpuid;
1754 1754
1755 r = -EFAULT; 1755 r = -EFAULT;
1756 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) 1756 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
1757 goto out; 1757 goto out;
1758 r = kvm_dev_ioctl_get_supported_cpuid(&cpuid, 1758 r = kvm_dev_ioctl_get_supported_cpuid(&cpuid,
1759 cpuid_arg->entries); 1759 cpuid_arg->entries);
1760 if (r) 1760 if (r)
1761 goto out; 1761 goto out;
1762 1762
1763 r = -EFAULT; 1763 r = -EFAULT;
1764 if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid)) 1764 if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid))
1765 goto out; 1765 goto out;
1766 r = 0; 1766 r = 0;
1767 break; 1767 break;
1768 } 1768 }
1769 case KVM_X86_GET_MCE_CAP_SUPPORTED: { 1769 case KVM_X86_GET_MCE_CAP_SUPPORTED: {
1770 u64 mce_cap; 1770 u64 mce_cap;
1771 1771
1772 mce_cap = KVM_MCE_CAP_SUPPORTED; 1772 mce_cap = KVM_MCE_CAP_SUPPORTED;
1773 r = -EFAULT; 1773 r = -EFAULT;
1774 if (copy_to_user(argp, &mce_cap, sizeof mce_cap)) 1774 if (copy_to_user(argp, &mce_cap, sizeof mce_cap))
1775 goto out; 1775 goto out;
1776 r = 0; 1776 r = 0;
1777 break; 1777 break;
1778 } 1778 }
1779 default: 1779 default:
1780 r = -EINVAL; 1780 r = -EINVAL;
1781 } 1781 }
1782 out: 1782 out:
1783 return r; 1783 return r;
1784 } 1784 }
1785 1785
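Note the order of operations in the KVM_GET_MSR_INDEX_LIST handler: the required count is copied back to userspace before the -E2BIG check, which enables the usual two-call pattern. A userspace sketch of that pattern (assuming kvm_fd is an open /dev/kvm descriptor):

#include <errno.h>
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Probe with nmsrs == 0 (fails with E2BIG but writes back the real count),
 * then retry with a buffer large enough for all indices. */
static struct kvm_msr_list *get_msr_index_list(int kvm_fd)
{
	struct kvm_msr_list probe = { .nmsrs = 0 };
	struct kvm_msr_list *list;

	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &probe) < 0 && errno != E2BIG)
		return NULL;

	list = malloc(sizeof(*list) + probe.nmsrs * sizeof(__u32));
	if (!list)
		return NULL;
	list->nmsrs = probe.nmsrs;
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}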
1786 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) 1786 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
1787 { 1787 {
1788 kvm_x86_ops->vcpu_load(vcpu, cpu); 1788 kvm_x86_ops->vcpu_load(vcpu, cpu);
1789 if (unlikely(per_cpu(cpu_tsc_khz, cpu) == 0)) { 1789 if (unlikely(per_cpu(cpu_tsc_khz, cpu) == 0)) {
1790 unsigned long khz = cpufreq_quick_get(cpu); 1790 unsigned long khz = cpufreq_quick_get(cpu);
1791 if (!khz) 1791 if (!khz)
1792 khz = tsc_khz; 1792 khz = tsc_khz;
1793 per_cpu(cpu_tsc_khz, cpu) = khz; 1793 per_cpu(cpu_tsc_khz, cpu) = khz;
1794 } 1794 }
1795 kvm_request_guest_time_update(vcpu); 1795 kvm_request_guest_time_update(vcpu);
1796 } 1796 }
1797 1797
1798 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) 1798 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
1799 { 1799 {
1800 kvm_x86_ops->vcpu_put(vcpu); 1800 kvm_x86_ops->vcpu_put(vcpu);
1801 kvm_put_guest_fpu(vcpu); 1801 kvm_put_guest_fpu(vcpu);
1802 } 1802 }
1803 1803
1804 static int is_efer_nx(void) 1804 static int is_efer_nx(void)
1805 { 1805 {
1806 unsigned long long efer = 0; 1806 unsigned long long efer = 0;
1807 1807
1808 rdmsrl_safe(MSR_EFER, &efer); 1808 rdmsrl_safe(MSR_EFER, &efer);
1809 return efer & EFER_NX; 1809 return efer & EFER_NX;
1810 } 1810 }
1811 1811
1812 static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu) 1812 static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
1813 { 1813 {
1814 int i; 1814 int i;
1815 struct kvm_cpuid_entry2 *e, *entry; 1815 struct kvm_cpuid_entry2 *e, *entry;
1816 1816
1817 entry = NULL; 1817 entry = NULL;
1818 for (i = 0; i < vcpu->arch.cpuid_nent; ++i) { 1818 for (i = 0; i < vcpu->arch.cpuid_nent; ++i) {
1819 e = &vcpu->arch.cpuid_entries[i]; 1819 e = &vcpu->arch.cpuid_entries[i];
1820 if (e->function == 0x80000001) { 1820 if (e->function == 0x80000001) {
1821 entry = e; 1821 entry = e;
1822 break; 1822 break;
1823 } 1823 }
1824 } 1824 }
1825 if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) { 1825 if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) {
1826 entry->edx &= ~(1 << 20); 1826 entry->edx &= ~(1 << 20);
1827 printk(KERN_INFO "kvm: guest NX capability removed\n"); 1827 printk(KERN_INFO "kvm: guest NX capability removed\n");
1828 } 1828 }
1829 } 1829 }
1830 1830
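cpuid_fix_nx_cap() hides bit 20 of CPUID leaf 0x80000001 EDX (the NX/XD feature) from the guest whenever the host runs with EFER.NX clear, so the guest never sets NX page-table bits the host cannot honour. An equivalent host-side probe of that CPUID bit, as a userspace sketch using GCC's cpuid.h:

#include <cpuid.h>
#include <stdbool.h>

/* Does the host CPU advertise NX (CPUID.80000001H:EDX bit 20)? */
static bool host_cpu_reports_nx(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
		return false;
	return edx & (1u << 20);
}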
1831 /* when an old userspace passes the legacy cpuid layout to a new kernel module */ 1831 /* when an old userspace passes the legacy cpuid layout to a new kernel module */
1832 static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, 1832 static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
1833 struct kvm_cpuid *cpuid, 1833 struct kvm_cpuid *cpuid,
1834 struct kvm_cpuid_entry __user *entries) 1834 struct kvm_cpuid_entry __user *entries)
1835 { 1835 {
1836 int r, i; 1836 int r, i;
1837 struct kvm_cpuid_entry *cpuid_entries; 1837 struct kvm_cpuid_entry *cpuid_entries;
1838 1838
1839 r = -E2BIG; 1839 r = -E2BIG;
1840 if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) 1840 if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
1841 goto out; 1841 goto out;
1842 r = -ENOMEM; 1842 r = -ENOMEM;
1843 cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry) * cpuid->nent); 1843 cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry) * cpuid->nent);
1844 if (!cpuid_entries) 1844 if (!cpuid_entries)
1845 goto out; 1845 goto out;
1846 r = -EFAULT; 1846 r = -EFAULT;
1847 if (copy_from_user(cpuid_entries, entries, 1847 if (copy_from_user(cpuid_entries, entries,
1848 cpuid->nent * sizeof(struct kvm_cpuid_entry))) 1848 cpuid->nent * sizeof(struct kvm_cpuid_entry)))
1849 goto out_free; 1849 goto out_free;
1850 for (i = 0; i < cpuid->nent; i++) { 1850 for (i = 0; i < cpuid->nent; i++) {
1851 vcpu->arch.cpuid_entries[i].function = cpuid_entries[i].function; 1851 vcpu->arch.cpuid_entries[i].function = cpuid_entries[i].function;
1852 vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax; 1852 vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax;
1853 vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx; 1853 vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx;
1854 vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx; 1854 vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx;
1855 vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx; 1855 vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx;
1856 vcpu->arch.cpuid_entries[i].index = 0; 1856 vcpu->arch.cpuid_entries[i].index = 0;
1857 vcpu->arch.cpuid_entries[i].flags = 0; 1857 vcpu->arch.cpuid_entries[i].flags = 0;
1858 vcpu->arch.cpuid_entries[i].padding[0] = 0; 1858 vcpu->arch.cpuid_entries[i].padding[0] = 0;
1859 vcpu->arch.cpuid_entries[i].padding[1] = 0; 1859 vcpu->arch.cpuid_entries[i].padding[1] = 0;
1860 vcpu->arch.cpuid_entries[i].padding[2] = 0; 1860 vcpu->arch.cpuid_entries[i].padding[2] = 0;
1861 } 1861 }
1862 vcpu->arch.cpuid_nent = cpuid->nent; 1862 vcpu->arch.cpuid_nent = cpuid->nent;
1863 cpuid_fix_nx_cap(vcpu); 1863 cpuid_fix_nx_cap(vcpu);
1864 r = 0; 1864 r = 0;
1865 kvm_apic_set_version(vcpu); 1865 kvm_apic_set_version(vcpu);
1866 kvm_x86_ops->cpuid_update(vcpu); 1866 kvm_x86_ops->cpuid_update(vcpu);
1867 update_cpuid(vcpu); 1867 update_cpuid(vcpu);
1868 1868
1869 out_free: 1869 out_free:
1870 vfree(cpuid_entries); 1870 vfree(cpuid_entries);
1871 out: 1871 out:
1872 return r; 1872 return r;
1873 } 1873 }
1874 1874
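The copy loop above is a translation layer: the legacy KVM_SET_CPUID entry carries no index or flags fields, so those (and the padding) are explicitly zeroed when the data is stored in the kvm_cpuid_entry2 array. For reference, the two uapi layouts being bridged (reproduced here for illustration, not redeclared by this patch):

struct kvm_cpuid_entry {	/* legacy KVM_SET_CPUID */
	__u32 function;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding;
};

struct kvm_cpuid_entry2 {	/* KVM_SET_CPUID2 */
	__u32 function;
	__u32 index;		/* subleaf (ECX input) */
	__u32 flags;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding[3];
};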
1875 static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, 1875 static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
1876 struct kvm_cpuid2 *cpuid, 1876 struct kvm_cpuid2 *cpuid,
1877 struct kvm_cpuid_entry2 __user *entries) 1877 struct kvm_cpuid_entry2 __user *entries)
1878 { 1878 {
1879 int r; 1879 int r;
1880 1880
1881 r = -E2BIG; 1881 r = -E2BIG;
1882 if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) 1882 if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
1883 goto out; 1883 goto out;
1884 r = -EFAULT; 1884 r = -EFAULT;
1885 if (copy_from_user(&vcpu->arch.cpuid_entries, entries, 1885 if (copy_from_user(&vcpu->arch.cpuid_entries, entries,
1886 cpuid->nent * sizeof(struct kvm_cpuid_entry2))) 1886 cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
1887 goto out; 1887 goto out;
1888 vcpu->arch.cpuid_nent = cpuid->nent; 1888 vcpu->arch.cpuid_nent = cpuid->nent;
1889 kvm_apic_set_version(vcpu); 1889 kvm_apic_set_version(vcpu);
1890 kvm_x86_ops->cpuid_update(vcpu); 1890 kvm_x86_ops->cpuid_update(vcpu);
1891 update_cpuid(vcpu); 1891 update_cpuid(vcpu);
1892 return 0; 1892 return 0;
1893 1893
1894 out: 1894 out:
1895 return r; 1895 return r;
1896 } 1896 }
1897 1897
1898 static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, 1898 static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
1899 struct kvm_cpuid2 *cpuid, 1899 struct kvm_cpuid2 *cpuid,
1900 struct kvm_cpuid_entry2 __user *entries) 1900 struct kvm_cpuid_entry2 __user *entries)
1901 { 1901 {
1902 int r; 1902 int r;
1903 1903
1904 r = -E2BIG; 1904 r = -E2BIG;
1905 if (cpuid->nent < vcpu->arch.cpuid_nent) 1905 if (cpuid->nent < vcpu->arch.cpuid_nent)
1906 goto out; 1906 goto out;
1907 r = -EFAULT; 1907 r = -EFAULT;
1908 if (copy_to_user(entries, &vcpu->arch.cpuid_entries, 1908 if (copy_to_user(entries, &vcpu->arch.cpuid_entries,
1909 vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2))) 1909 vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2)))
1910 goto out; 1910 goto out;
1911 return 0; 1911 return 0;
1912 1912
1913 out: 1913 out:
1914 cpuid->nent = vcpu->arch.cpuid_nent; 1914 cpuid->nent = vcpu->arch.cpuid_nent;
1915 return r; 1915 return r;
1916 } 1916 }
1917 1917
1918 static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function, 1918 static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
1919 u32 index) 1919 u32 index)
1920 { 1920 {
1921 entry->function = function; 1921 entry->function = function;
1922 entry->index = index; 1922 entry->index = index;
1923 cpuid_count(entry->function, entry->index, 1923 cpuid_count(entry->function, entry->index,
1924 &entry->eax, &entry->ebx, &entry->ecx, &entry->edx); 1924 &entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
1925 entry->flags = 0; 1925 entry->flags = 0;
1926 } 1926 }
1927 1927
1928 #define F(x) bit(X86_FEATURE_##x) 1928 #define F(x) bit(X86_FEATURE_##x)
1929 1929
1930 static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, 1930 static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
1931 u32 index, int *nent, int maxnent) 1931 u32 index, int *nent, int maxnent)
1932 { 1932 {
1933 unsigned f_nx = is_efer_nx() ? F(NX) : 0; 1933 unsigned f_nx = is_efer_nx() ? F(NX) : 0;
1934 #ifdef CONFIG_X86_64 1934 #ifdef CONFIG_X86_64
1935 unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL) 1935 unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
1936 ? F(GBPAGES) : 0; 1936 ? F(GBPAGES) : 0;
1937 unsigned f_lm = F(LM); 1937 unsigned f_lm = F(LM);
1938 #else 1938 #else
1939 unsigned f_gbpages = 0; 1939 unsigned f_gbpages = 0;
1940 unsigned f_lm = 0; 1940 unsigned f_lm = 0;
1941 #endif 1941 #endif
1942 unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0; 1942 unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
1943 1943
1944 /* cpuid 1.edx */ 1944 /* cpuid 1.edx */
1945 const u32 kvm_supported_word0_x86_features = 1945 const u32 kvm_supported_word0_x86_features =
1946 F(FPU) | F(VME) | F(DE) | F(PSE) | 1946 F(FPU) | F(VME) | F(DE) | F(PSE) |
1947 F(TSC) | F(MSR) | F(PAE) | F(MCE) | 1947 F(TSC) | F(MSR) | F(PAE) | F(MCE) |
1948 F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) | 1948 F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
1949 F(MTRR) | F(PGE) | F(MCA) | F(CMOV) | 1949 F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
1950 F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLSH) | 1950 F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLSH) |
1951 0 /* Reserved, DS, ACPI */ | F(MMX) | 1951 0 /* Reserved, DS, ACPI */ | F(MMX) |
1952 F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) | 1952 F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
1953 0 /* HTT, TM, Reserved, PBE */; 1953 0 /* HTT, TM, Reserved, PBE */;
1954 /* cpuid 0x80000001.edx */ 1954 /* cpuid 0x80000001.edx */
1955 const u32 kvm_supported_word1_x86_features = 1955 const u32 kvm_supported_word1_x86_features =
1956 F(FPU) | F(VME) | F(DE) | F(PSE) | 1956 F(FPU) | F(VME) | F(DE) | F(PSE) |
1957 F(TSC) | F(MSR) | F(PAE) | F(MCE) | 1957 F(TSC) | F(MSR) | F(PAE) | F(MCE) |
1958 F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) | 1958 F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
1959 F(MTRR) | F(PGE) | F(MCA) | F(CMOV) | 1959 F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
1960 F(PAT) | F(PSE36) | 0 /* Reserved */ | 1960 F(PAT) | F(PSE36) | 0 /* Reserved */ |
1961 f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | 1961 f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
1962 F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | 1962 F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp |
1963 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); 1963 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
1964 /* cpuid 1.ecx */ 1964 /* cpuid 1.ecx */
1965 const u32 kvm_supported_word4_x86_features = 1965 const u32 kvm_supported_word4_x86_features =
1966 F(XMM3) | 0 /* Reserved, DTES64, MONITOR */ | 1966 F(XMM3) | 0 /* Reserved, DTES64, MONITOR */ |
1967 0 /* DS-CPL, VMX, SMX, EST */ | 1967 0 /* DS-CPL, VMX, SMX, EST */ |
1968 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | 1968 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
1969 0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ | 1969 0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ |
1970 0 /* Reserved, DCA */ | F(XMM4_1) | 1970 0 /* Reserved, DCA */ | F(XMM4_1) |
1971 F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | 1971 F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
1972 0 /* Reserved, AES */ | F(XSAVE) | 0 /* OSXSAVE */; 1972 0 /* Reserved, AES */ | F(XSAVE) | 0 /* OSXSAVE */;
1973 /* cpuid 0x80000001.ecx */ 1973 /* cpuid 0x80000001.ecx */
1974 const u32 kvm_supported_word6_x86_features = 1974 const u32 kvm_supported_word6_x86_features =
1975 F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ | 1975 F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ |
1976 F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) | 1976 F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
1977 F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(SSE5) | 1977 F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(SSE5) |
1978 0 /* SKINIT */ | 0 /* WDT */; 1978 0 /* SKINIT */ | 0 /* WDT */;
1979 1979
1980 /* all calls to cpuid_count() should be made on the same cpu */ 1980 /* all calls to cpuid_count() should be made on the same cpu */
1981 get_cpu(); 1981 get_cpu();
1982 do_cpuid_1_ent(entry, function, index); 1982 do_cpuid_1_ent(entry, function, index);
1983 ++*nent; 1983 ++*nent;
1984 1984
1985 switch (function) { 1985 switch (function) {
1986 case 0: 1986 case 0:
1987 entry->eax = min(entry->eax, (u32)0xd); 1987 entry->eax = min(entry->eax, (u32)0xd);
1988 break; 1988 break;
1989 case 1: 1989 case 1:
1990 entry->edx &= kvm_supported_word0_x86_features; 1990 entry->edx &= kvm_supported_word0_x86_features;
1991 entry->ecx &= kvm_supported_word4_x86_features; 1991 entry->ecx &= kvm_supported_word4_x86_features;
1992 /* we support x2apic emulation even if host does not support 1992 /* we support x2apic emulation even if host does not support
1993 * it since we emulate x2apic in software */ 1993 * it since we emulate x2apic in software */
1994 entry->ecx |= F(X2APIC); 1994 entry->ecx |= F(X2APIC);
1995 break; 1995 break;
1996 /* function 2 entries are STATEFUL. That is, repeated cpuid commands 1996 /* function 2 entries are STATEFUL. That is, repeated cpuid commands
1997 * may return different values. This forces us to get_cpu() before 1997 * may return different values. This forces us to get_cpu() before
1998 * issuing the first command, and also to emulate this annoying behavior 1998 * issuing the first command, and also to emulate this annoying behavior
1999 * in kvm_emulate_cpuid() using KVM_CPUID_FLAG_STATE_READ_NEXT */ 1999 * in kvm_emulate_cpuid() using KVM_CPUID_FLAG_STATE_READ_NEXT */
2000 case 2: { 2000 case 2: {
2001 int t, times = entry->eax & 0xff; 2001 int t, times = entry->eax & 0xff;
2002 2002
2003 entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; 2003 entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
2004 entry->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; 2004 entry->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
2005 for (t = 1; t < times && *nent < maxnent; ++t) { 2005 for (t = 1; t < times && *nent < maxnent; ++t) {
2006 do_cpuid_1_ent(&entry[t], function, 0); 2006 do_cpuid_1_ent(&entry[t], function, 0);
2007 entry[t].flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; 2007 entry[t].flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
2008 ++*nent; 2008 ++*nent;
2009 } 2009 }
2010 break; 2010 break;
2011 } 2011 }
2012 /* function 4 and 0xb have additional index. */ 2012 /* function 4 and 0xb have additional index. */
2013 case 4: { 2013 case 4: {
2014 int i, cache_type; 2014 int i, cache_type;
2015 2015
2016 entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; 2016 entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
2017 /* read more entries until cache_type is zero */ 2017 /* read more entries until cache_type is zero */
2018 for (i = 1; *nent < maxnent; ++i) { 2018 for (i = 1; *nent < maxnent; ++i) {
2019 cache_type = entry[i - 1].eax & 0x1f; 2019 cache_type = entry[i - 1].eax & 0x1f;
2020 if (!cache_type) 2020 if (!cache_type)
2021 break; 2021 break;
2022 do_cpuid_1_ent(&entry[i], function, i); 2022 do_cpuid_1_ent(&entry[i], function, i);
2023 entry[i].flags |= 2023 entry[i].flags |=
2024 KVM_CPUID_FLAG_SIGNIFCANT_INDEX; 2024 KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
2025 ++*nent; 2025 ++*nent;
2026 } 2026 }
2027 break; 2027 break;
2028 } 2028 }
2029 case 0xb: { 2029 case 0xb: {
2030 int i, level_type; 2030 int i, level_type;
2031 2031
2032 entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; 2032 entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
2033 /* read more entries until level_type is zero */ 2033 /* read more entries until level_type is zero */
2034 for (i = 1; *nent < maxnent; ++i) { 2034 for (i = 1; *nent < maxnent; ++i) {
2035 level_type = entry[i - 1].ecx & 0xff00; 2035 level_type = entry[i - 1].ecx & 0xff00;
2036 if (!level_type) 2036 if (!level_type)
2037 break; 2037 break;
2038 do_cpuid_1_ent(&entry[i], function, i); 2038 do_cpuid_1_ent(&entry[i], function, i);
2039 entry[i].flags |= 2039 entry[i].flags |=
2040 KVM_CPUID_FLAG_SIGNIFCANT_INDEX; 2040 KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
2041 ++*nent; 2041 ++*nent;
2042 } 2042 }
2043 break; 2043 break;
2044 } 2044 }
2045 case 0xd: { 2045 case 0xd: {
2046 int i; 2046 int i;
2047 2047
2048 entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; 2048 entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
2049 for (i = 1; *nent < maxnent; ++i) { 2049 for (i = 1; *nent < maxnent; ++i) {
2050 if (entry[i - 1].eax == 0 && i != 2) 2050 if (entry[i - 1].eax == 0 && i != 2)
2051 break; 2051 break;
2052 do_cpuid_1_ent(&entry[i], function, i); 2052 do_cpuid_1_ent(&entry[i], function, i);
2053 entry[i].flags |= 2053 entry[i].flags |=
2054 KVM_CPUID_FLAG_SIGNIFCANT_INDEX; 2054 KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
2055 ++*nent; 2055 ++*nent;
2056 } 2056 }
2057 break; 2057 break;
2058 } 2058 }
2059 case KVM_CPUID_SIGNATURE: { 2059 case KVM_CPUID_SIGNATURE: {
2060 char signature[12] = "KVMKVMKVM\0\0"; 2060 char signature[12] = "KVMKVMKVM\0\0";
2061 u32 *sigptr = (u32 *)signature; 2061 u32 *sigptr = (u32 *)signature;
2062 entry->eax = 0; 2062 entry->eax = 0;
2063 entry->ebx = sigptr[0]; 2063 entry->ebx = sigptr[0];
2064 entry->ecx = sigptr[1]; 2064 entry->ecx = sigptr[1];
2065 entry->edx = sigptr[2]; 2065 entry->edx = sigptr[2];
2066 break; 2066 break;
2067 } 2067 }
2068 case KVM_CPUID_FEATURES: 2068 case KVM_CPUID_FEATURES:
2069 entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) | 2069 entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
2070 (1 << KVM_FEATURE_NOP_IO_DELAY) | 2070 (1 << KVM_FEATURE_NOP_IO_DELAY) |
2071 (1 << KVM_FEATURE_CLOCKSOURCE2) | 2071 (1 << KVM_FEATURE_CLOCKSOURCE2) |
2072 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); 2072 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
2073 entry->ebx = 0; 2073 entry->ebx = 0;
2074 entry->ecx = 0; 2074 entry->ecx = 0;
2075 entry->edx = 0; 2075 entry->edx = 0;
2076 break; 2076 break;
2077 case 0x80000000: 2077 case 0x80000000:
2078 entry->eax = min(entry->eax, 0x8000001a); 2078 entry->eax = min(entry->eax, 0x8000001a);
2079 break; 2079 break;
2080 case 0x80000001: 2080 case 0x80000001:
2081 entry->edx &= kvm_supported_word1_x86_features; 2081 entry->edx &= kvm_supported_word1_x86_features;
2082 entry->ecx &= kvm_supported_word6_x86_features; 2082 entry->ecx &= kvm_supported_word6_x86_features;
2083 break; 2083 break;
2084 } 2084 }
2085 2085
2086 kvm_x86_ops->set_supported_cpuid(function, entry); 2086 kvm_x86_ops->set_supported_cpuid(function, entry);
2087 2087
2088 put_cpu(); 2088 put_cpu();
2089 } 2089 }
2090 2090
2091 #undef F 2091 #undef F
2092 2092
2093 static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, 2093 static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
2094 struct kvm_cpuid_entry2 __user *entries) 2094 struct kvm_cpuid_entry2 __user *entries)
2095 { 2095 {
2096 struct kvm_cpuid_entry2 *cpuid_entries; 2096 struct kvm_cpuid_entry2 *cpuid_entries;
2097 int limit, nent = 0, r = -E2BIG; 2097 int limit, nent = 0, r = -E2BIG;
2098 u32 func; 2098 u32 func;
2099 2099
2100 if (cpuid->nent < 1) 2100 if (cpuid->nent < 1)
2101 goto out; 2101 goto out;
2102 if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) 2102 if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
2103 cpuid->nent = KVM_MAX_CPUID_ENTRIES; 2103 cpuid->nent = KVM_MAX_CPUID_ENTRIES;
2104 r = -ENOMEM; 2104 r = -ENOMEM;
2105 cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent); 2105 cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
2106 if (!cpuid_entries) 2106 if (!cpuid_entries)
2107 goto out; 2107 goto out;
2108 2108
2109 do_cpuid_ent(&cpuid_entries[0], 0, 0, &nent, cpuid->nent); 2109 do_cpuid_ent(&cpuid_entries[0], 0, 0, &nent, cpuid->nent);
2110 limit = cpuid_entries[0].eax; 2110 limit = cpuid_entries[0].eax;
2111 for (func = 1; func <= limit && nent < cpuid->nent; ++func) 2111 for (func = 1; func <= limit && nent < cpuid->nent; ++func)
2112 do_cpuid_ent(&cpuid_entries[nent], func, 0, 2112 do_cpuid_ent(&cpuid_entries[nent], func, 0,
2113 &nent, cpuid->nent); 2113 &nent, cpuid->nent);
2114 r = -E2BIG; 2114 r = -E2BIG;
2115 if (nent >= cpuid->nent) 2115 if (nent >= cpuid->nent)
2116 goto out_free; 2116 goto out_free;
2117 2117
2118 do_cpuid_ent(&cpuid_entries[nent], 0x80000000, 0, &nent, cpuid->nent); 2118 do_cpuid_ent(&cpuid_entries[nent], 0x80000000, 0, &nent, cpuid->nent);
2119 limit = cpuid_entries[nent - 1].eax; 2119 limit = cpuid_entries[nent - 1].eax;
2120 for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func) 2120 for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func)
2121 do_cpuid_ent(&cpuid_entries[nent], func, 0, 2121 do_cpuid_ent(&cpuid_entries[nent], func, 0,
2122 &nent, cpuid->nent); 2122 &nent, cpuid->nent);
2123 2123
2124 2124
2125 2125
2126 r = -E2BIG; 2126 r = -E2BIG;
2127 if (nent >= cpuid->nent) 2127 if (nent >= cpuid->nent)
2128 goto out_free; 2128 goto out_free;
2129 2129
2130 do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, &nent, 2130 do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, &nent,
2131 cpuid->nent); 2131 cpuid->nent);
2132 2132
2133 r = -E2BIG; 2133 r = -E2BIG;
2134 if (nent >= cpuid->nent) 2134 if (nent >= cpuid->nent)
2135 goto out_free; 2135 goto out_free;
2136 2136
2137 do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_FEATURES, 0, &nent, 2137 do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_FEATURES, 0, &nent,
2138 cpuid->nent); 2138 cpuid->nent);
2139 2139
2140 r = -E2BIG; 2140 r = -E2BIG;
2141 if (nent >= cpuid->nent) 2141 if (nent >= cpuid->nent)
2142 goto out_free; 2142 goto out_free;
2143 2143
2144 r = -EFAULT; 2144 r = -EFAULT;
2145 if (copy_to_user(entries, cpuid_entries, 2145 if (copy_to_user(entries, cpuid_entries,
2146 nent * sizeof(struct kvm_cpuid_entry2))) 2146 nent * sizeof(struct kvm_cpuid_entry2)))
2147 goto out_free; 2147 goto out_free;
2148 cpuid->nent = nent; 2148 cpuid->nent = nent;
2149 r = 0; 2149 r = 0;
2150 2150
2151 out_free: 2151 out_free:
2152 vfree(cpuid_entries); 2152 vfree(cpuid_entries);
2153 out: 2153 out:
2154 return r; 2154 return r;
2155 } 2155 }
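
The handler above backs the KVM_GET_SUPPORTED_CPUID ioctl issued on the /dev/kvm file descriptor. A minimal userspace sketch of the usual calling pattern follows; it is illustrative only (kvm_fd and the starting guess of 64 entries are assumptions, error handling is trimmed), but the grow-and-retry loop mirrors the -E2BIG behaviour implemented above.

	/* illustrative sketch: query the supported CPUID leaves from /dev/kvm,
	 * growing the buffer whenever the kernel reports E2BIG */
	#include <err.h>
	#include <errno.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static struct kvm_cpuid2 *get_supported_cpuid(int kvm_fd)
	{
		int nent = 64;				/* arbitrary first guess */
		struct kvm_cpuid2 *cpuid;

		for (;;) {
			cpuid = calloc(1, sizeof(*cpuid) +
				       nent * sizeof(struct kvm_cpuid_entry2));
			if (!cpuid)
				err(1, "calloc");
			cpuid->nent = nent;
			if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) == 0)
				return cpuid;		/* nent now holds the real count */
			if (errno != E2BIG)
				err(1, "KVM_GET_SUPPORTED_CPUID");
			free(cpuid);			/* buffer too small: retry larger */
			nent *= 2;
		}
	}
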
2156 2156
2157 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu, 2157 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
2158 struct kvm_lapic_state *s) 2158 struct kvm_lapic_state *s)
2159 { 2159 {
2160 memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s); 2160 memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
2161 2161
2162 return 0; 2162 return 0;
2163 } 2163 }
2164 2164
2165 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu, 2165 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
2166 struct kvm_lapic_state *s) 2166 struct kvm_lapic_state *s)
2167 { 2167 {
2168 memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s); 2168 memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s);
2169 kvm_apic_post_state_restore(vcpu); 2169 kvm_apic_post_state_restore(vcpu);
2170 update_cr8_intercept(vcpu); 2170 update_cr8_intercept(vcpu);
2171 2171
2172 return 0; 2172 return 0;
2173 } 2173 }
2174 2174
2175 static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, 2175 static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
2176 struct kvm_interrupt *irq) 2176 struct kvm_interrupt *irq)
2177 { 2177 {
2178 if (irq->irq < 0 || irq->irq >= 256) 2178 if (irq->irq < 0 || irq->irq >= 256)
2179 return -EINVAL; 2179 return -EINVAL;
2180 if (irqchip_in_kernel(vcpu->kvm)) 2180 if (irqchip_in_kernel(vcpu->kvm))
2181 return -ENXIO; 2181 return -ENXIO;
2182 2182
2183 kvm_queue_interrupt(vcpu, irq->irq, false); 2183 kvm_queue_interrupt(vcpu, irq->irq, false);
2184 2184
2185 return 0; 2185 return 0;
2186 } 2186 }
2187 2187
2188 static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) 2188 static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
2189 { 2189 {
2190 kvm_inject_nmi(vcpu); 2190 kvm_inject_nmi(vcpu);
2191 2191
2192 return 0; 2192 return 0;
2193 } 2193 }
2194 2194
2195 static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu, 2195 static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu,
2196 struct kvm_tpr_access_ctl *tac) 2196 struct kvm_tpr_access_ctl *tac)
2197 { 2197 {
2198 if (tac->flags) 2198 if (tac->flags)
2199 return -EINVAL; 2199 return -EINVAL;
2200 vcpu->arch.tpr_access_reporting = !!tac->enabled; 2200 vcpu->arch.tpr_access_reporting = !!tac->enabled;
2201 return 0; 2201 return 0;
2202 } 2202 }
2203 2203
2204 static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, 2204 static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
2205 u64 mcg_cap) 2205 u64 mcg_cap)
2206 { 2206 {
2207 int r; 2207 int r;
2208 unsigned bank_num = mcg_cap & 0xff, bank; 2208 unsigned bank_num = mcg_cap & 0xff, bank;
2209 2209
2210 r = -EINVAL; 2210 r = -EINVAL;
2211 if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS) 2211 if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS)
2212 goto out; 2212 goto out;
2213 if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000)) 2213 if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000))
2214 goto out; 2214 goto out;
2215 r = 0; 2215 r = 0;
2216 vcpu->arch.mcg_cap = mcg_cap; 2216 vcpu->arch.mcg_cap = mcg_cap;
2217 /* Init IA32_MCG_CTL to all 1s */ 2217 /* Init IA32_MCG_CTL to all 1s */
2218 if (mcg_cap & MCG_CTL_P) 2218 if (mcg_cap & MCG_CTL_P)
2219 vcpu->arch.mcg_ctl = ~(u64)0; 2219 vcpu->arch.mcg_ctl = ~(u64)0;
2220 /* Init IA32_MCi_CTL to all 1s */ 2220 /* Init IA32_MCi_CTL to all 1s */
2221 for (bank = 0; bank < bank_num; bank++) 2221 for (bank = 0; bank < bank_num; bank++)
2222 vcpu->arch.mce_banks[bank*4] = ~(u64)0; 2222 vcpu->arch.mce_banks[bank*4] = ~(u64)0;
2223 out: 2223 out:
2224 return r; 2224 return r;
2225 } 2225 }
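
For context, the mcg_cap value validated above packs the bank count into its low byte plus capability bits such as MCG_CTL_P; userspace hands it in through KVM_X86_SETUP_MCE on the vcpu file descriptor. A hedged sketch (vcpu_fd is an assumption, constants written out for clarity):

	/* illustrative sketch: advertise 10 MCE banks plus MCG_CTL to a vcpu */
	__u64 mcg_cap = 10 | (1ULL << 8);	/* low byte = bank count, bit 8 = MCG_CTL_P */

	if (ioctl(vcpu_fd, KVM_X86_SETUP_MCE, &mcg_cap) < 0)
		err(1, "KVM_X86_SETUP_MCE");
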
2226 2226
2227 static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu, 2227 static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
2228 struct kvm_x86_mce *mce) 2228 struct kvm_x86_mce *mce)
2229 { 2229 {
2230 u64 mcg_cap = vcpu->arch.mcg_cap; 2230 u64 mcg_cap = vcpu->arch.mcg_cap;
2231 unsigned bank_num = mcg_cap & 0xff; 2231 unsigned bank_num = mcg_cap & 0xff;
2232 u64 *banks = vcpu->arch.mce_banks; 2232 u64 *banks = vcpu->arch.mce_banks;
2233 2233
2234 if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL)) 2234 if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL))
2235 return -EINVAL; 2235 return -EINVAL;
2236 /* 2236 /*
2237 * if IA32_MCG_CTL is not all 1s, the uncorrected error 2237 * if IA32_MCG_CTL is not all 1s, the uncorrected error
2238 * reporting is disabled 2238 * reporting is disabled
2239 */ 2239 */
2240 if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) && 2240 if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
2241 vcpu->arch.mcg_ctl != ~(u64)0) 2241 vcpu->arch.mcg_ctl != ~(u64)0)
2242 return 0; 2242 return 0;
2243 banks += 4 * mce->bank; 2243 banks += 4 * mce->bank;
2244 /* 2244 /*
2245 * if IA32_MCi_CTL is not all 1s, the uncorrected error 2245 * if IA32_MCi_CTL is not all 1s, the uncorrected error
2246 * reporting is disabled for the bank 2246 * reporting is disabled for the bank
2247 */ 2247 */
2248 if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0) 2248 if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0)
2249 return 0; 2249 return 0;
2250 if (mce->status & MCI_STATUS_UC) { 2250 if (mce->status & MCI_STATUS_UC) {
2251 if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) || 2251 if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
2252 !kvm_read_cr4_bits(vcpu, X86_CR4_MCE)) { 2252 !kvm_read_cr4_bits(vcpu, X86_CR4_MCE)) {
2253 printk(KERN_DEBUG "kvm: set_mce: " 2253 printk(KERN_DEBUG "kvm: set_mce: "
2254 "injects mce exception while " 2254 "injects mce exception while "
2255 "previous one is in progress!\n"); 2255 "previous one is in progress!\n");
2256 set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); 2256 set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
2257 return 0; 2257 return 0;
2258 } 2258 }
2259 if (banks[1] & MCI_STATUS_VAL) 2259 if (banks[1] & MCI_STATUS_VAL)
2260 mce->status |= MCI_STATUS_OVER; 2260 mce->status |= MCI_STATUS_OVER;
2261 banks[2] = mce->addr; 2261 banks[2] = mce->addr;
2262 banks[3] = mce->misc; 2262 banks[3] = mce->misc;
2263 vcpu->arch.mcg_status = mce->mcg_status; 2263 vcpu->arch.mcg_status = mce->mcg_status;
2264 banks[1] = mce->status; 2264 banks[1] = mce->status;
2265 kvm_queue_exception(vcpu, MC_VECTOR); 2265 kvm_queue_exception(vcpu, MC_VECTOR);
2266 } else if (!(banks[1] & MCI_STATUS_VAL) 2266 } else if (!(banks[1] & MCI_STATUS_VAL)
2267 || !(banks[1] & MCI_STATUS_UC)) { 2267 || !(banks[1] & MCI_STATUS_UC)) {
2268 if (banks[1] & MCI_STATUS_VAL) 2268 if (banks[1] & MCI_STATUS_VAL)
2269 mce->status |= MCI_STATUS_OVER; 2269 mce->status |= MCI_STATUS_OVER;
2270 banks[2] = mce->addr; 2270 banks[2] = mce->addr;
2271 banks[3] = mce->misc; 2271 banks[3] = mce->misc;
2272 banks[1] = mce->status; 2272 banks[1] = mce->status;
2273 } else 2273 } else
2274 banks[1] |= MCI_STATUS_OVER; 2274 banks[1] |= MCI_STATUS_OVER;
2275 return 0; 2275 return 0;
2276 } 2276 }
2277 2277
2278 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, 2278 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
2279 struct kvm_vcpu_events *events) 2279 struct kvm_vcpu_events *events)
2280 { 2280 {
2281 events->exception.injected = 2281 events->exception.injected =
2282 vcpu->arch.exception.pending && 2282 vcpu->arch.exception.pending &&
2283 !kvm_exception_is_soft(vcpu->arch.exception.nr); 2283 !kvm_exception_is_soft(vcpu->arch.exception.nr);
2284 events->exception.nr = vcpu->arch.exception.nr; 2284 events->exception.nr = vcpu->arch.exception.nr;
2285 events->exception.has_error_code = vcpu->arch.exception.has_error_code; 2285 events->exception.has_error_code = vcpu->arch.exception.has_error_code;
2286 events->exception.error_code = vcpu->arch.exception.error_code; 2286 events->exception.error_code = vcpu->arch.exception.error_code;
2287 2287
2288 events->interrupt.injected = 2288 events->interrupt.injected =
2289 vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft; 2289 vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft;
2290 events->interrupt.nr = vcpu->arch.interrupt.nr; 2290 events->interrupt.nr = vcpu->arch.interrupt.nr;
2291 events->interrupt.soft = 0; 2291 events->interrupt.soft = 0;
2292 events->interrupt.shadow = 2292 events->interrupt.shadow =
2293 kvm_x86_ops->get_interrupt_shadow(vcpu, 2293 kvm_x86_ops->get_interrupt_shadow(vcpu,
2294 KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI); 2294 KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
2295 2295
2296 events->nmi.injected = vcpu->arch.nmi_injected; 2296 events->nmi.injected = vcpu->arch.nmi_injected;
2297 events->nmi.pending = vcpu->arch.nmi_pending; 2297 events->nmi.pending = vcpu->arch.nmi_pending;
2298 events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu); 2298 events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
2299 2299
2300 events->sipi_vector = vcpu->arch.sipi_vector; 2300 events->sipi_vector = vcpu->arch.sipi_vector;
2301 2301
2302 events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING 2302 events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
2303 | KVM_VCPUEVENT_VALID_SIPI_VECTOR 2303 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
2304 | KVM_VCPUEVENT_VALID_SHADOW); 2304 | KVM_VCPUEVENT_VALID_SHADOW);
2305 } 2305 }
2306 2306
2307 static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, 2307 static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
2308 struct kvm_vcpu_events *events) 2308 struct kvm_vcpu_events *events)
2309 { 2309 {
2310 if (events->flags & ~(KVM_VCPUEVENT_VALID_NMI_PENDING 2310 if (events->flags & ~(KVM_VCPUEVENT_VALID_NMI_PENDING
2311 | KVM_VCPUEVENT_VALID_SIPI_VECTOR 2311 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
2312 | KVM_VCPUEVENT_VALID_SHADOW)) 2312 | KVM_VCPUEVENT_VALID_SHADOW))
2313 return -EINVAL; 2313 return -EINVAL;
2314 2314
2315 vcpu->arch.exception.pending = events->exception.injected; 2315 vcpu->arch.exception.pending = events->exception.injected;
2316 vcpu->arch.exception.nr = events->exception.nr; 2316 vcpu->arch.exception.nr = events->exception.nr;
2317 vcpu->arch.exception.has_error_code = events->exception.has_error_code; 2317 vcpu->arch.exception.has_error_code = events->exception.has_error_code;
2318 vcpu->arch.exception.error_code = events->exception.error_code; 2318 vcpu->arch.exception.error_code = events->exception.error_code;
2319 2319
2320 vcpu->arch.interrupt.pending = events->interrupt.injected; 2320 vcpu->arch.interrupt.pending = events->interrupt.injected;
2321 vcpu->arch.interrupt.nr = events->interrupt.nr; 2321 vcpu->arch.interrupt.nr = events->interrupt.nr;
2322 vcpu->arch.interrupt.soft = events->interrupt.soft; 2322 vcpu->arch.interrupt.soft = events->interrupt.soft;
2323 if (vcpu->arch.interrupt.pending && irqchip_in_kernel(vcpu->kvm)) 2323 if (vcpu->arch.interrupt.pending && irqchip_in_kernel(vcpu->kvm))
2324 kvm_pic_clear_isr_ack(vcpu->kvm); 2324 kvm_pic_clear_isr_ack(vcpu->kvm);
2325 if (events->flags & KVM_VCPUEVENT_VALID_SHADOW) 2325 if (events->flags & KVM_VCPUEVENT_VALID_SHADOW)
2326 kvm_x86_ops->set_interrupt_shadow(vcpu, 2326 kvm_x86_ops->set_interrupt_shadow(vcpu,
2327 events->interrupt.shadow); 2327 events->interrupt.shadow);
2328 2328
2329 vcpu->arch.nmi_injected = events->nmi.injected; 2329 vcpu->arch.nmi_injected = events->nmi.injected;
2330 if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) 2330 if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
2331 vcpu->arch.nmi_pending = events->nmi.pending; 2331 vcpu->arch.nmi_pending = events->nmi.pending;
2332 kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked); 2332 kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
2333 2333
2334 if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR) 2334 if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
2335 vcpu->arch.sipi_vector = events->sipi_vector; 2335 vcpu->arch.sipi_vector = events->sipi_vector;
2336 2336
2337 return 0; 2337 return 0;
2338 } 2338 }
2339 2339
2340 static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu, 2340 static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
2341 struct kvm_debugregs *dbgregs) 2341 struct kvm_debugregs *dbgregs)
2342 { 2342 {
2343 memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db)); 2343 memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
2344 dbgregs->dr6 = vcpu->arch.dr6; 2344 dbgregs->dr6 = vcpu->arch.dr6;
2345 dbgregs->dr7 = vcpu->arch.dr7; 2345 dbgregs->dr7 = vcpu->arch.dr7;
2346 dbgregs->flags = 0; 2346 dbgregs->flags = 0;
2347 } 2347 }
2348 2348
2349 static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, 2349 static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
2350 struct kvm_debugregs *dbgregs) 2350 struct kvm_debugregs *dbgregs)
2351 { 2351 {
2352 if (dbgregs->flags) 2352 if (dbgregs->flags)
2353 return -EINVAL; 2353 return -EINVAL;
2354 2354
2355 memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db)); 2355 memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db));
2356 vcpu->arch.dr6 = dbgregs->dr6; 2356 vcpu->arch.dr6 = dbgregs->dr6;
2357 vcpu->arch.dr7 = dbgregs->dr7; 2357 vcpu->arch.dr7 = dbgregs->dr7;
2358 2358
2359 return 0; 2359 return 0;
2360 } 2360 }
2361 2361
2362 static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, 2362 static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
2363 struct kvm_xsave *guest_xsave) 2363 struct kvm_xsave *guest_xsave)
2364 { 2364 {
2365 if (cpu_has_xsave) 2365 if (cpu_has_xsave)
2366 memcpy(guest_xsave->region, 2366 memcpy(guest_xsave->region,
2367 &vcpu->arch.guest_fpu.state->xsave, 2367 &vcpu->arch.guest_fpu.state->xsave,
2368 sizeof(struct xsave_struct)); 2368 sizeof(struct xsave_struct));
2369 else { 2369 else {
2370 memcpy(guest_xsave->region, 2370 memcpy(guest_xsave->region,
2371 &vcpu->arch.guest_fpu.state->fxsave, 2371 &vcpu->arch.guest_fpu.state->fxsave,
2372 sizeof(struct i387_fxsave_struct)); 2372 sizeof(struct i387_fxsave_struct));
2373 *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] = 2373 *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
2374 XSTATE_FPSSE; 2374 XSTATE_FPSSE;
2375 } 2375 }
2376 } 2376 }
2377 2377
2378 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, 2378 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
2379 struct kvm_xsave *guest_xsave) 2379 struct kvm_xsave *guest_xsave)
2380 { 2380 {
2381 u64 xstate_bv = 2381 u64 xstate_bv =
2382 *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)]; 2382 *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)];
2383 2383
2384 if (cpu_has_xsave) 2384 if (cpu_has_xsave)
2385 memcpy(&vcpu->arch.guest_fpu.state->xsave, 2385 memcpy(&vcpu->arch.guest_fpu.state->xsave,
2386 guest_xsave->region, sizeof(struct xsave_struct)); 2386 guest_xsave->region, sizeof(struct xsave_struct));
2387 else { 2387 else {
2388 if (xstate_bv & ~XSTATE_FPSSE) 2388 if (xstate_bv & ~XSTATE_FPSSE)
2389 return -EINVAL; 2389 return -EINVAL;
2390 memcpy(&vcpu->arch.guest_fpu.state->fxsave, 2390 memcpy(&vcpu->arch.guest_fpu.state->fxsave,
2391 guest_xsave->region, sizeof(struct i387_fxsave_struct)); 2391 guest_xsave->region, sizeof(struct i387_fxsave_struct));
2392 } 2392 }
2393 return 0; 2393 return 0;
2394 } 2394 }
2395 2395
2396 static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu, 2396 static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
2397 struct kvm_xcrs *guest_xcrs) 2397 struct kvm_xcrs *guest_xcrs)
2398 { 2398 {
2399 if (!cpu_has_xsave) { 2399 if (!cpu_has_xsave) {
2400 guest_xcrs->nr_xcrs = 0; 2400 guest_xcrs->nr_xcrs = 0;
2401 return; 2401 return;
2402 } 2402 }
2403 2403
2404 guest_xcrs->nr_xcrs = 1; 2404 guest_xcrs->nr_xcrs = 1;
2405 guest_xcrs->flags = 0; 2405 guest_xcrs->flags = 0;
2406 guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK; 2406 guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK;
2407 guest_xcrs->xcrs[0].value = vcpu->arch.xcr0; 2407 guest_xcrs->xcrs[0].value = vcpu->arch.xcr0;
2408 } 2408 }
2409 2409
2410 static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, 2410 static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
2411 struct kvm_xcrs *guest_xcrs) 2411 struct kvm_xcrs *guest_xcrs)
2412 { 2412 {
2413 int i, r = 0; 2413 int i, r = 0;
2414 2414
2415 if (!cpu_has_xsave) 2415 if (!cpu_has_xsave)
2416 return -EINVAL; 2416 return -EINVAL;
2417 2417
2418 if (guest_xcrs->nr_xcrs > KVM_MAX_XCRS || guest_xcrs->flags) 2418 if (guest_xcrs->nr_xcrs > KVM_MAX_XCRS || guest_xcrs->flags)
2419 return -EINVAL; 2419 return -EINVAL;
2420 2420
2421 for (i = 0; i < guest_xcrs->nr_xcrs; i++) 2421 for (i = 0; i < guest_xcrs->nr_xcrs; i++)
2422 /* Only support XCR0 currently */ 2422 /* Only support XCR0 currently */
2423 if (guest_xcrs->xcrs[0].xcr == XCR_XFEATURE_ENABLED_MASK) { 2423 if (guest_xcrs->xcrs[0].xcr == XCR_XFEATURE_ENABLED_MASK) {
2424 r = __kvm_set_xcr(vcpu, XCR_XFEATURE_ENABLED_MASK, 2424 r = __kvm_set_xcr(vcpu, XCR_XFEATURE_ENABLED_MASK,
2425 guest_xcrs->xcrs[0].value); 2425 guest_xcrs->xcrs[0].value);
2426 break; 2426 break;
2427 } 2427 }
2428 if (r) 2428 if (r)
2429 r = -EINVAL; 2429 r = -EINVAL;
2430 return r; 2430 return r;
2431 } 2431 }
2432 2432
2433 long kvm_arch_vcpu_ioctl(struct file *filp, 2433 long kvm_arch_vcpu_ioctl(struct file *filp,
2434 unsigned int ioctl, unsigned long arg) 2434 unsigned int ioctl, unsigned long arg)
2435 { 2435 {
2436 struct kvm_vcpu *vcpu = filp->private_data; 2436 struct kvm_vcpu *vcpu = filp->private_data;
2437 void __user *argp = (void __user *)arg; 2437 void __user *argp = (void __user *)arg;
2438 int r; 2438 int r;
2439 union { 2439 union {
2440 struct kvm_lapic_state *lapic; 2440 struct kvm_lapic_state *lapic;
2441 struct kvm_xsave *xsave; 2441 struct kvm_xsave *xsave;
2442 struct kvm_xcrs *xcrs; 2442 struct kvm_xcrs *xcrs;
2443 void *buffer; 2443 void *buffer;
2444 } u; 2444 } u;
2445 2445
2446 u.buffer = NULL; 2446 u.buffer = NULL;
2447 switch (ioctl) { 2447 switch (ioctl) {
2448 case KVM_GET_LAPIC: { 2448 case KVM_GET_LAPIC: {
2449 r = -EINVAL; 2449 r = -EINVAL;
2450 if (!vcpu->arch.apic) 2450 if (!vcpu->arch.apic)
2451 goto out; 2451 goto out;
2452 u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); 2452 u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
2453 2453
2454 r = -ENOMEM; 2454 r = -ENOMEM;
2455 if (!u.lapic) 2455 if (!u.lapic)
2456 goto out; 2456 goto out;
2457 r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic); 2457 r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic);
2458 if (r) 2458 if (r)
2459 goto out; 2459 goto out;
2460 r = -EFAULT; 2460 r = -EFAULT;
2461 if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state))) 2461 if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state)))
2462 goto out; 2462 goto out;
2463 r = 0; 2463 r = 0;
2464 break; 2464 break;
2465 } 2465 }
2466 case KVM_SET_LAPIC: { 2466 case KVM_SET_LAPIC: {
2467 r = -EINVAL; 2467 r = -EINVAL;
2468 if (!vcpu->arch.apic) 2468 if (!vcpu->arch.apic)
2469 goto out; 2469 goto out;
2470 u.lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); 2470 u.lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
2471 r = -ENOMEM; 2471 r = -ENOMEM;
2472 if (!u.lapic) 2472 if (!u.lapic)
2473 goto out; 2473 goto out;
2474 r = -EFAULT; 2474 r = -EFAULT;
2475 if (copy_from_user(u.lapic, argp, sizeof(struct kvm_lapic_state))) 2475 if (copy_from_user(u.lapic, argp, sizeof(struct kvm_lapic_state)))
2476 goto out; 2476 goto out;
2477 r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic); 2477 r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic);
2478 if (r) 2478 if (r)
2479 goto out; 2479 goto out;
2480 r = 0; 2480 r = 0;
2481 break; 2481 break;
2482 } 2482 }
2483 case KVM_INTERRUPT: { 2483 case KVM_INTERRUPT: {
2484 struct kvm_interrupt irq; 2484 struct kvm_interrupt irq;
2485 2485
2486 r = -EFAULT; 2486 r = -EFAULT;
2487 if (copy_from_user(&irq, argp, sizeof irq)) 2487 if (copy_from_user(&irq, argp, sizeof irq))
2488 goto out; 2488 goto out;
2489 r = kvm_vcpu_ioctl_interrupt(vcpu, &irq); 2489 r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
2490 if (r) 2490 if (r)
2491 goto out; 2491 goto out;
2492 r = 0; 2492 r = 0;
2493 break; 2493 break;
2494 } 2494 }
2495 case KVM_NMI: { 2495 case KVM_NMI: {
2496 r = kvm_vcpu_ioctl_nmi(vcpu); 2496 r = kvm_vcpu_ioctl_nmi(vcpu);
2497 if (r) 2497 if (r)
2498 goto out; 2498 goto out;
2499 r = 0; 2499 r = 0;
2500 break; 2500 break;
2501 } 2501 }
2502 case KVM_SET_CPUID: { 2502 case KVM_SET_CPUID: {
2503 struct kvm_cpuid __user *cpuid_arg = argp; 2503 struct kvm_cpuid __user *cpuid_arg = argp;
2504 struct kvm_cpuid cpuid; 2504 struct kvm_cpuid cpuid;
2505 2505
2506 r = -EFAULT; 2506 r = -EFAULT;
2507 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) 2507 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
2508 goto out; 2508 goto out;
2509 r = kvm_vcpu_ioctl_set_cpuid(vcpu, &cpuid, cpuid_arg->entries); 2509 r = kvm_vcpu_ioctl_set_cpuid(vcpu, &cpuid, cpuid_arg->entries);
2510 if (r) 2510 if (r)
2511 goto out; 2511 goto out;
2512 break; 2512 break;
2513 } 2513 }
2514 case KVM_SET_CPUID2: { 2514 case KVM_SET_CPUID2: {
2515 struct kvm_cpuid2 __user *cpuid_arg = argp; 2515 struct kvm_cpuid2 __user *cpuid_arg = argp;
2516 struct kvm_cpuid2 cpuid; 2516 struct kvm_cpuid2 cpuid;
2517 2517
2518 r = -EFAULT; 2518 r = -EFAULT;
2519 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) 2519 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
2520 goto out; 2520 goto out;
2521 r = kvm_vcpu_ioctl_set_cpuid2(vcpu, &cpuid, 2521 r = kvm_vcpu_ioctl_set_cpuid2(vcpu, &cpuid,
2522 cpuid_arg->entries); 2522 cpuid_arg->entries);
2523 if (r) 2523 if (r)
2524 goto out; 2524 goto out;
2525 break; 2525 break;
2526 } 2526 }
2527 case KVM_GET_CPUID2: { 2527 case KVM_GET_CPUID2: {
2528 struct kvm_cpuid2 __user *cpuid_arg = argp; 2528 struct kvm_cpuid2 __user *cpuid_arg = argp;
2529 struct kvm_cpuid2 cpuid; 2529 struct kvm_cpuid2 cpuid;
2530 2530
2531 r = -EFAULT; 2531 r = -EFAULT;
2532 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) 2532 if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
2533 goto out; 2533 goto out;
2534 r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid, 2534 r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
2535 cpuid_arg->entries); 2535 cpuid_arg->entries);
2536 if (r) 2536 if (r)
2537 goto out; 2537 goto out;
2538 r = -EFAULT; 2538 r = -EFAULT;
2539 if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid)) 2539 if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid))
2540 goto out; 2540 goto out;
2541 r = 0; 2541 r = 0;
2542 break; 2542 break;
2543 } 2543 }
2544 case KVM_GET_MSRS: 2544 case KVM_GET_MSRS:
2545 r = msr_io(vcpu, argp, kvm_get_msr, 1); 2545 r = msr_io(vcpu, argp, kvm_get_msr, 1);
2546 break; 2546 break;
2547 case KVM_SET_MSRS: 2547 case KVM_SET_MSRS:
2548 r = msr_io(vcpu, argp, do_set_msr, 0); 2548 r = msr_io(vcpu, argp, do_set_msr, 0);
2549 break; 2549 break;
2550 case KVM_TPR_ACCESS_REPORTING: { 2550 case KVM_TPR_ACCESS_REPORTING: {
2551 struct kvm_tpr_access_ctl tac; 2551 struct kvm_tpr_access_ctl tac;
2552 2552
2553 r = -EFAULT; 2553 r = -EFAULT;
2554 if (copy_from_user(&tac, argp, sizeof tac)) 2554 if (copy_from_user(&tac, argp, sizeof tac))
2555 goto out; 2555 goto out;
2556 r = vcpu_ioctl_tpr_access_reporting(vcpu, &tac); 2556 r = vcpu_ioctl_tpr_access_reporting(vcpu, &tac);
2557 if (r) 2557 if (r)
2558 goto out; 2558 goto out;
2559 r = -EFAULT; 2559 r = -EFAULT;
2560 if (copy_to_user(argp, &tac, sizeof tac)) 2560 if (copy_to_user(argp, &tac, sizeof tac))
2561 goto out; 2561 goto out;
2562 r = 0; 2562 r = 0;
2563 break; 2563 break;
2564 }; 2564 };
2565 case KVM_SET_VAPIC_ADDR: { 2565 case KVM_SET_VAPIC_ADDR: {
2566 struct kvm_vapic_addr va; 2566 struct kvm_vapic_addr va;
2567 2567
2568 r = -EINVAL; 2568 r = -EINVAL;
2569 if (!irqchip_in_kernel(vcpu->kvm)) 2569 if (!irqchip_in_kernel(vcpu->kvm))
2570 goto out; 2570 goto out;
2571 r = -EFAULT; 2571 r = -EFAULT;
2572 if (copy_from_user(&va, argp, sizeof va)) 2572 if (copy_from_user(&va, argp, sizeof va))
2573 goto out; 2573 goto out;
2574 r = 0; 2574 r = 0;
2575 kvm_lapic_set_vapic_addr(vcpu, va.vapic_addr); 2575 kvm_lapic_set_vapic_addr(vcpu, va.vapic_addr);
2576 break; 2576 break;
2577 } 2577 }
2578 case KVM_X86_SETUP_MCE: { 2578 case KVM_X86_SETUP_MCE: {
2579 u64 mcg_cap; 2579 u64 mcg_cap;
2580 2580
2581 r = -EFAULT; 2581 r = -EFAULT;
2582 if (copy_from_user(&mcg_cap, argp, sizeof mcg_cap)) 2582 if (copy_from_user(&mcg_cap, argp, sizeof mcg_cap))
2583 goto out; 2583 goto out;
2584 r = kvm_vcpu_ioctl_x86_setup_mce(vcpu, mcg_cap); 2584 r = kvm_vcpu_ioctl_x86_setup_mce(vcpu, mcg_cap);
2585 break; 2585 break;
2586 } 2586 }
2587 case KVM_X86_SET_MCE: { 2587 case KVM_X86_SET_MCE: {
2588 struct kvm_x86_mce mce; 2588 struct kvm_x86_mce mce;
2589 2589
2590 r = -EFAULT; 2590 r = -EFAULT;
2591 if (copy_from_user(&mce, argp, sizeof mce)) 2591 if (copy_from_user(&mce, argp, sizeof mce))
2592 goto out; 2592 goto out;
2593 r = kvm_vcpu_ioctl_x86_set_mce(vcpu, &mce); 2593 r = kvm_vcpu_ioctl_x86_set_mce(vcpu, &mce);
2594 break; 2594 break;
2595 } 2595 }
2596 case KVM_GET_VCPU_EVENTS: { 2596 case KVM_GET_VCPU_EVENTS: {
2597 struct kvm_vcpu_events events; 2597 struct kvm_vcpu_events events;
2598 2598
2599 kvm_vcpu_ioctl_x86_get_vcpu_events(vcpu, &events); 2599 kvm_vcpu_ioctl_x86_get_vcpu_events(vcpu, &events);
2600 2600
2601 r = -EFAULT; 2601 r = -EFAULT;
2602 if (copy_to_user(argp, &events, sizeof(struct kvm_vcpu_events))) 2602 if (copy_to_user(argp, &events, sizeof(struct kvm_vcpu_events)))
2603 break; 2603 break;
2604 r = 0; 2604 r = 0;
2605 break; 2605 break;
2606 } 2606 }
2607 case KVM_SET_VCPU_EVENTS: { 2607 case KVM_SET_VCPU_EVENTS: {
2608 struct kvm_vcpu_events events; 2608 struct kvm_vcpu_events events;
2609 2609
2610 r = -EFAULT; 2610 r = -EFAULT;
2611 if (copy_from_user(&events, argp, sizeof(struct kvm_vcpu_events))) 2611 if (copy_from_user(&events, argp, sizeof(struct kvm_vcpu_events)))
2612 break; 2612 break;
2613 2613
2614 r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events); 2614 r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events);
2615 break; 2615 break;
2616 } 2616 }
2617 case KVM_GET_DEBUGREGS: { 2617 case KVM_GET_DEBUGREGS: {
2618 struct kvm_debugregs dbgregs; 2618 struct kvm_debugregs dbgregs;
2619 2619
2620 kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs); 2620 kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
2621 2621
2622 r = -EFAULT; 2622 r = -EFAULT;
2623 if (copy_to_user(argp, &dbgregs, 2623 if (copy_to_user(argp, &dbgregs,
2624 sizeof(struct kvm_debugregs))) 2624 sizeof(struct kvm_debugregs)))
2625 break; 2625 break;
2626 r = 0; 2626 r = 0;
2627 break; 2627 break;
2628 } 2628 }
2629 case KVM_SET_DEBUGREGS: { 2629 case KVM_SET_DEBUGREGS: {
2630 struct kvm_debugregs dbgregs; 2630 struct kvm_debugregs dbgregs;
2631 2631
2632 r = -EFAULT; 2632 r = -EFAULT;
2633 if (copy_from_user(&dbgregs, argp, 2633 if (copy_from_user(&dbgregs, argp,
2634 sizeof(struct kvm_debugregs))) 2634 sizeof(struct kvm_debugregs)))
2635 break; 2635 break;
2636 2636
2637 r = kvm_vcpu_ioctl_x86_set_debugregs(vcpu, &dbgregs); 2637 r = kvm_vcpu_ioctl_x86_set_debugregs(vcpu, &dbgregs);
2638 break; 2638 break;
2639 } 2639 }
2640 case KVM_GET_XSAVE: { 2640 case KVM_GET_XSAVE: {
2641 u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL); 2641 u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL);
2642 r = -ENOMEM; 2642 r = -ENOMEM;
2643 if (!u.xsave) 2643 if (!u.xsave)
2644 break; 2644 break;
2645 2645
2646 kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave); 2646 kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
2647 2647
2648 r = -EFAULT; 2648 r = -EFAULT;
2649 if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave))) 2649 if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave)))
2650 break; 2650 break;
2651 r = 0; 2651 r = 0;
2652 break; 2652 break;
2653 } 2653 }
2654 case KVM_SET_XSAVE: { 2654 case KVM_SET_XSAVE: {
2655 u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL); 2655 u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL);
2656 r = -ENOMEM; 2656 r = -ENOMEM;
2657 if (!u.xsave) 2657 if (!u.xsave)
2658 break; 2658 break;
2659 2659
2660 r = -EFAULT; 2660 r = -EFAULT;
2661 if (copy_from_user(u.xsave, argp, sizeof(struct kvm_xsave))) 2661 if (copy_from_user(u.xsave, argp, sizeof(struct kvm_xsave)))
2662 break; 2662 break;
2663 2663
2664 r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave); 2664 r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave);
2665 break; 2665 break;
2666 } 2666 }
2667 case KVM_GET_XCRS: { 2667 case KVM_GET_XCRS: {
2668 u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL); 2668 u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL);
2669 r = -ENOMEM; 2669 r = -ENOMEM;
2670 if (!u.xcrs) 2670 if (!u.xcrs)
2671 break; 2671 break;
2672 2672
2673 kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs); 2673 kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
2674 2674
2675 r = -EFAULT; 2675 r = -EFAULT;
2676 if (copy_to_user(argp, u.xcrs, 2676 if (copy_to_user(argp, u.xcrs,
2677 sizeof(struct kvm_xcrs))) 2677 sizeof(struct kvm_xcrs)))
2678 break; 2678 break;
2679 r = 0; 2679 r = 0;
2680 break; 2680 break;
2681 } 2681 }
2682 case KVM_SET_XCRS: { 2682 case KVM_SET_XCRS: {
2683 u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL); 2683 u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL);
2684 r = -ENOMEM; 2684 r = -ENOMEM;
2685 if (!u.xcrs) 2685 if (!u.xcrs)
2686 break; 2686 break;
2687 2687
2688 r = -EFAULT; 2688 r = -EFAULT;
2689 if (copy_from_user(u.xcrs, argp, 2689 if (copy_from_user(u.xcrs, argp,
2690 sizeof(struct kvm_xcrs))) 2690 sizeof(struct kvm_xcrs)))
2691 break; 2691 break;
2692 2692
2693 r = kvm_vcpu_ioctl_x86_set_xcrs(vcpu, u.xcrs); 2693 r = kvm_vcpu_ioctl_x86_set_xcrs(vcpu, u.xcrs);
2694 break; 2694 break;
2695 } 2695 }
2696 default: 2696 default:
2697 r = -EINVAL; 2697 r = -EINVAL;
2698 } 2698 }
2699 out: 2699 out:
2700 kfree(u.buffer); 2700 kfree(u.buffer);
2701 return r; 2701 return r;
2702 } 2702 }
2703 2703
2704 static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr) 2704 static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr)
2705 { 2705 {
2706 int ret; 2706 int ret;
2707 2707
2708 if (addr > (unsigned int)(-3 * PAGE_SIZE)) 2708 if (addr > (unsigned int)(-3 * PAGE_SIZE))
2709 return -1; 2709 return -1;
2710 ret = kvm_x86_ops->set_tss_addr(kvm, addr); 2710 ret = kvm_x86_ops->set_tss_addr(kvm, addr);
2711 return ret; 2711 return ret;
2712 } 2712 }
2713 2713
2714 static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm, 2714 static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm,
2715 u64 ident_addr) 2715 u64 ident_addr)
2716 { 2716 {
2717 kvm->arch.ept_identity_map_addr = ident_addr; 2717 kvm->arch.ept_identity_map_addr = ident_addr;
2718 return 0; 2718 return 0;
2719 } 2719 }
2720 2720
2721 static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm, 2721 static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
2722 u32 kvm_nr_mmu_pages) 2722 u32 kvm_nr_mmu_pages)
2723 { 2723 {
2724 if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES) 2724 if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
2725 return -EINVAL; 2725 return -EINVAL;
2726 2726
2727 mutex_lock(&kvm->slots_lock); 2727 mutex_lock(&kvm->slots_lock);
2728 spin_lock(&kvm->mmu_lock); 2728 spin_lock(&kvm->mmu_lock);
2729 2729
2730 kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages); 2730 kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages);
2731 kvm->arch.n_requested_mmu_pages = kvm_nr_mmu_pages; 2731 kvm->arch.n_requested_mmu_pages = kvm_nr_mmu_pages;
2732 2732
2733 spin_unlock(&kvm->mmu_lock); 2733 spin_unlock(&kvm->mmu_lock);
2734 mutex_unlock(&kvm->slots_lock); 2734 mutex_unlock(&kvm->slots_lock);
2735 return 0; 2735 return 0;
2736 } 2736 }
2737 2737
2738 static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm) 2738 static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
2739 { 2739 {
2740 return kvm->arch.n_alloc_mmu_pages; 2740 return kvm->arch.n_alloc_mmu_pages;
2741 } 2741 }
2742 2742
2743 gfn_t unalias_gfn_instantiation(struct kvm *kvm, gfn_t gfn)
2744 {
2745 int i;
2746 struct kvm_mem_alias *alias;
2747 struct kvm_mem_aliases *aliases;
2748
2749 aliases = kvm_aliases(kvm);
2750
2751 for (i = 0; i < aliases->naliases; ++i) {
2752 alias = &aliases->aliases[i];
2753 if (alias->flags & KVM_ALIAS_INVALID)
2754 continue;
2755 if (gfn >= alias->base_gfn
2756 && gfn < alias->base_gfn + alias->npages)
2757 return alias->target_gfn + gfn - alias->base_gfn;
2758 }
2759 return gfn;
2760 }
2761
2762 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
2763 {
2764 int i;
2765 struct kvm_mem_alias *alias;
2766 struct kvm_mem_aliases *aliases;
2767
2768 aliases = kvm_aliases(kvm);
2769
2770 for (i = 0; i < aliases->naliases; ++i) {
2771 alias = &aliases->aliases[i];
2772 if (gfn >= alias->base_gfn
2773 && gfn < alias->base_gfn + alias->npages)
2774 return alias->target_gfn + gfn - alias->base_gfn;
2775 }
2776 return gfn;
2777 }
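
The two removed helpers above differ only in whether they skip slots marked KVM_ALIAS_INVALID; the translation itself is a plain range remap. A standalone restatement of the same arithmetic, with names chosen here purely for illustration:

	/* illustrative restatement of the removed lookup: a gfn falling inside
	 * [base_gfn, base_gfn + npages) is redirected into the target range,
	 * anything else passes through unchanged */
	static unsigned long remap_gfn(unsigned long gfn, unsigned long base_gfn,
				       unsigned long npages, unsigned long target_gfn)
	{
		if (gfn >= base_gfn && gfn < base_gfn + npages)
			return target_gfn + (gfn - base_gfn);
		return gfn;
	}
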
2778
2779 /*
2780 * Set a new alias region. Aliases map a portion of physical memory into
2781 * another portion. This is useful for memory windows, for example the PC
2782 * VGA region.
2783 */
2784 static int kvm_vm_ioctl_set_memory_alias(struct kvm *kvm,
2785 struct kvm_memory_alias *alias)
2786 {
2787 int r, n;
2788 struct kvm_mem_alias *p;
2789 struct kvm_mem_aliases *aliases, *old_aliases;
2790
2791 r = -EINVAL;
2792 /* General sanity checks */
2793 if (alias->memory_size & (PAGE_SIZE - 1))
2794 goto out;
2795 if (alias->guest_phys_addr & (PAGE_SIZE - 1))
2796 goto out;
2797 if (alias->slot >= KVM_ALIAS_SLOTS)
2798 goto out;
2799 if (alias->guest_phys_addr + alias->memory_size
2800 < alias->guest_phys_addr)
2801 goto out;
2802 if (alias->target_phys_addr + alias->memory_size
2803 < alias->target_phys_addr)
2804 goto out;
2805
2806 r = -ENOMEM;
2807 aliases = kzalloc(sizeof(struct kvm_mem_aliases), GFP_KERNEL);
2808 if (!aliases)
2809 goto out;
2810
2811 mutex_lock(&kvm->slots_lock);
2812
2813 /* invalidate any gfn reference in case of deletion/shrinking */
2814 memcpy(aliases, kvm->arch.aliases, sizeof(struct kvm_mem_aliases));
2815 aliases->aliases[alias->slot].flags |= KVM_ALIAS_INVALID;
2816 old_aliases = kvm->arch.aliases;
2817 rcu_assign_pointer(kvm->arch.aliases, aliases);
2818 synchronize_srcu_expedited(&kvm->srcu);
2819 kvm_mmu_zap_all(kvm);
2820 kfree(old_aliases);
2821
2822 r = -ENOMEM;
2823 aliases = kzalloc(sizeof(struct kvm_mem_aliases), GFP_KERNEL);
2824 if (!aliases)
2825 goto out_unlock;
2826
2827 memcpy(aliases, kvm->arch.aliases, sizeof(struct kvm_mem_aliases));
2828
2829 p = &aliases->aliases[alias->slot];
2830 p->base_gfn = alias->guest_phys_addr >> PAGE_SHIFT;
2831 p->npages = alias->memory_size >> PAGE_SHIFT;
2832 p->target_gfn = alias->target_phys_addr >> PAGE_SHIFT;
2833 p->flags &= ~(KVM_ALIAS_INVALID);
2834
2835 for (n = KVM_ALIAS_SLOTS; n > 0; --n)
2836 if (aliases->aliases[n - 1].npages)
2837 break;
2838 aliases->naliases = n;
2839
2840 old_aliases = kvm->arch.aliases;
2841 rcu_assign_pointer(kvm->arch.aliases, aliases);
2842 synchronize_srcu_expedited(&kvm->srcu);
2843 kfree(old_aliases);
2844 r = 0;
2845
2846 out_unlock:
2847 mutex_unlock(&kvm->slots_lock);
2848 out:
2849 return r;
2850 }
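
With kvm_vm_ioctl_set_memory_alias gone, the same effect (e.g. the PC VGA window mentioned in the removed comment) is obtained from userspace by pointing an additional memory slot at the same host backing store. A hedged sketch, assuming vm_fd, backing and target_offset are set up elsewhere, page-aligned, and that slot 8 is otherwise unused:

	/* illustrative sketch: remap the guest VGA window onto existing host
	 * memory via a second slot instead of KVM_SET_MEMORY_ALIAS */
	struct kvm_userspace_memory_region vga_alias = {
		.slot            = 8,			/* any unused slot */
		.flags           = 0,
		.guest_phys_addr = 0xa0000,		/* guest window to remap */
		.memory_size     = 0x20000,		/* 128 KiB */
		.userspace_addr  = (unsigned long)backing + target_offset,
	};

	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &vga_alias) < 0)
		err(1, "KVM_SET_USER_MEMORY_REGION");
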
2851
2852 static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) 2743 static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
2853 { 2744 {
2854 int r; 2745 int r;
2855 2746
2856 r = 0; 2747 r = 0;
2857 switch (chip->chip_id) { 2748 switch (chip->chip_id) {
2858 case KVM_IRQCHIP_PIC_MASTER: 2749 case KVM_IRQCHIP_PIC_MASTER:
2859 memcpy(&chip->chip.pic, 2750 memcpy(&chip->chip.pic,
2860 &pic_irqchip(kvm)->pics[0], 2751 &pic_irqchip(kvm)->pics[0],
2861 sizeof(struct kvm_pic_state)); 2752 sizeof(struct kvm_pic_state));
2862 break; 2753 break;
2863 case KVM_IRQCHIP_PIC_SLAVE: 2754 case KVM_IRQCHIP_PIC_SLAVE:
2864 memcpy(&chip->chip.pic, 2755 memcpy(&chip->chip.pic,
2865 &pic_irqchip(kvm)->pics[1], 2756 &pic_irqchip(kvm)->pics[1],
2866 sizeof(struct kvm_pic_state)); 2757 sizeof(struct kvm_pic_state));
2867 break; 2758 break;
2868 case KVM_IRQCHIP_IOAPIC: 2759 case KVM_IRQCHIP_IOAPIC:
2869 r = kvm_get_ioapic(kvm, &chip->chip.ioapic); 2760 r = kvm_get_ioapic(kvm, &chip->chip.ioapic);
2870 break; 2761 break;
2871 default: 2762 default:
2872 r = -EINVAL; 2763 r = -EINVAL;
2873 break; 2764 break;
2874 } 2765 }
2875 return r; 2766 return r;
2876 } 2767 }
2877 2768
2878 static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) 2769 static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
2879 { 2770 {
2880 int r; 2771 int r;
2881 2772
2882 r = 0; 2773 r = 0;
2883 switch (chip->chip_id) { 2774 switch (chip->chip_id) {
2884 case KVM_IRQCHIP_PIC_MASTER: 2775 case KVM_IRQCHIP_PIC_MASTER:
2885 raw_spin_lock(&pic_irqchip(kvm)->lock); 2776 raw_spin_lock(&pic_irqchip(kvm)->lock);
2886 memcpy(&pic_irqchip(kvm)->pics[0], 2777 memcpy(&pic_irqchip(kvm)->pics[0],
2887 &chip->chip.pic, 2778 &chip->chip.pic,
2888 sizeof(struct kvm_pic_state)); 2779 sizeof(struct kvm_pic_state));
2889 raw_spin_unlock(&pic_irqchip(kvm)->lock); 2780 raw_spin_unlock(&pic_irqchip(kvm)->lock);
2890 break; 2781 break;
2891 case KVM_IRQCHIP_PIC_SLAVE: 2782 case KVM_IRQCHIP_PIC_SLAVE:
2892 raw_spin_lock(&pic_irqchip(kvm)->lock); 2783 raw_spin_lock(&pic_irqchip(kvm)->lock);
2893 memcpy(&pic_irqchip(kvm)->pics[1], 2784 memcpy(&pic_irqchip(kvm)->pics[1],
2894 &chip->chip.pic, 2785 &chip->chip.pic,
2895 sizeof(struct kvm_pic_state)); 2786 sizeof(struct kvm_pic_state));
2896 raw_spin_unlock(&pic_irqchip(kvm)->lock); 2787 raw_spin_unlock(&pic_irqchip(kvm)->lock);
2897 break; 2788 break;
2898 case KVM_IRQCHIP_IOAPIC: 2789 case KVM_IRQCHIP_IOAPIC:
2899 r = kvm_set_ioapic(kvm, &chip->chip.ioapic); 2790 r = kvm_set_ioapic(kvm, &chip->chip.ioapic);
2900 break; 2791 break;
2901 default: 2792 default:
2902 r = -EINVAL; 2793 r = -EINVAL;
2903 break; 2794 break;
2904 } 2795 }
2905 kvm_pic_update_irq(pic_irqchip(kvm)); 2796 kvm_pic_update_irq(pic_irqchip(kvm));
2906 return r; 2797 return r;
2907 } 2798 }
2908 2799
2909 static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps) 2800 static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
2910 { 2801 {
2911 int r = 0; 2802 int r = 0;
2912 2803
2913 mutex_lock(&kvm->arch.vpit->pit_state.lock); 2804 mutex_lock(&kvm->arch.vpit->pit_state.lock);
2914 memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state)); 2805 memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state));
2915 mutex_unlock(&kvm->arch.vpit->pit_state.lock); 2806 mutex_unlock(&kvm->arch.vpit->pit_state.lock);
2916 return r; 2807 return r;
2917 } 2808 }
2918 2809
2919 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps) 2810 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
2920 { 2811 {
2921 int r = 0; 2812 int r = 0;
2922 2813
2923 mutex_lock(&kvm->arch.vpit->pit_state.lock); 2814 mutex_lock(&kvm->arch.vpit->pit_state.lock);
2924 memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state)); 2815 memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
2925 kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0); 2816 kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0);
2926 mutex_unlock(&kvm->arch.vpit->pit_state.lock); 2817 mutex_unlock(&kvm->arch.vpit->pit_state.lock);
2927 return r; 2818 return r;
2928 } 2819 }
2929 2820
2930 static int kvm_vm_ioctl_get_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps) 2821 static int kvm_vm_ioctl_get_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps)
2931 { 2822 {
2932 int r = 0; 2823 int r = 0;
2933 2824
2934 mutex_lock(&kvm->arch.vpit->pit_state.lock); 2825 mutex_lock(&kvm->arch.vpit->pit_state.lock);
2935 memcpy(ps->channels, &kvm->arch.vpit->pit_state.channels, 2826 memcpy(ps->channels, &kvm->arch.vpit->pit_state.channels,
2936 sizeof(ps->channels)); 2827 sizeof(ps->channels));
2937 ps->flags = kvm->arch.vpit->pit_state.flags; 2828 ps->flags = kvm->arch.vpit->pit_state.flags;
2938 mutex_unlock(&kvm->arch.vpit->pit_state.lock); 2829 mutex_unlock(&kvm->arch.vpit->pit_state.lock);
2939 return r; 2830 return r;
2940 } 2831 }
2941 2832
2942 static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps) 2833 static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps)
2943 { 2834 {
2944 int r = 0, start = 0; 2835 int r = 0, start = 0;
2945 u32 prev_legacy, cur_legacy; 2836 u32 prev_legacy, cur_legacy;
2946 mutex_lock(&kvm->arch.vpit->pit_state.lock); 2837 mutex_lock(&kvm->arch.vpit->pit_state.lock);
2947 prev_legacy = kvm->arch.vpit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY; 2838 prev_legacy = kvm->arch.vpit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY;
2948 cur_legacy = ps->flags & KVM_PIT_FLAGS_HPET_LEGACY; 2839 cur_legacy = ps->flags & KVM_PIT_FLAGS_HPET_LEGACY;
2949 if (!prev_legacy && cur_legacy) 2840 if (!prev_legacy && cur_legacy)
2950 start = 1; 2841 start = 1;
2951 memcpy(&kvm->arch.vpit->pit_state.channels, &ps->channels, 2842 memcpy(&kvm->arch.vpit->pit_state.channels, &ps->channels,
2952 sizeof(kvm->arch.vpit->pit_state.channels)); 2843 sizeof(kvm->arch.vpit->pit_state.channels));
2953 kvm->arch.vpit->pit_state.flags = ps->flags; 2844 kvm->arch.vpit->pit_state.flags = ps->flags;
2954 kvm_pit_load_count(kvm, 0, kvm->arch.vpit->pit_state.channels[0].count, start); 2845 kvm_pit_load_count(kvm, 0, kvm->arch.vpit->pit_state.channels[0].count, start);
2955 mutex_unlock(&kvm->arch.vpit->pit_state.lock); 2846 mutex_unlock(&kvm->arch.vpit->pit_state.lock);
2956 return r; 2847 return r;
2957 } 2848 }
2958 2849
2959 static int kvm_vm_ioctl_reinject(struct kvm *kvm, 2850 static int kvm_vm_ioctl_reinject(struct kvm *kvm,
2960 struct kvm_reinject_control *control) 2851 struct kvm_reinject_control *control)
2961 { 2852 {
2962 if (!kvm->arch.vpit) 2853 if (!kvm->arch.vpit)
2963 return -ENXIO; 2854 return -ENXIO;
2964 mutex_lock(&kvm->arch.vpit->pit_state.lock); 2855 mutex_lock(&kvm->arch.vpit->pit_state.lock);
2965 kvm->arch.vpit->pit_state.pit_timer.reinject = control->pit_reinject; 2856 kvm->arch.vpit->pit_state.pit_timer.reinject = control->pit_reinject;
2966 mutex_unlock(&kvm->arch.vpit->pit_state.lock); 2857 mutex_unlock(&kvm->arch.vpit->pit_state.lock);
2967 return 0; 2858 return 0;
2968 } 2859 }
2969 2860
2970 /* 2861 /*
2971 * Get (and clear) the dirty memory log for a memory slot. 2862 * Get (and clear) the dirty memory log for a memory slot.
2972 */ 2863 */
2973 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, 2864 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
2974 struct kvm_dirty_log *log) 2865 struct kvm_dirty_log *log)
2975 { 2866 {
2976 int r, i; 2867 int r, i;
2977 struct kvm_memory_slot *memslot; 2868 struct kvm_memory_slot *memslot;
2978 unsigned long n; 2869 unsigned long n;
2979 unsigned long is_dirty = 0; 2870 unsigned long is_dirty = 0;
2980 2871
2981 mutex_lock(&kvm->slots_lock); 2872 mutex_lock(&kvm->slots_lock);
2982 2873
2983 r = -EINVAL; 2874 r = -EINVAL;
2984 if (log->slot >= KVM_MEMORY_SLOTS) 2875 if (log->slot >= KVM_MEMORY_SLOTS)
2985 goto out; 2876 goto out;
2986 2877
2987 memslot = &kvm->memslots->memslots[log->slot]; 2878 memslot = &kvm->memslots->memslots[log->slot];
2988 r = -ENOENT; 2879 r = -ENOENT;
2989 if (!memslot->dirty_bitmap) 2880 if (!memslot->dirty_bitmap)
2990 goto out; 2881 goto out;
2991 2882
2992 n = kvm_dirty_bitmap_bytes(memslot); 2883 n = kvm_dirty_bitmap_bytes(memslot);
2993 2884
2994 for (i = 0; !is_dirty && i < n/sizeof(long); i++) 2885 for (i = 0; !is_dirty && i < n/sizeof(long); i++)
2995 is_dirty = memslot->dirty_bitmap[i]; 2886 is_dirty = memslot->dirty_bitmap[i];
2996 2887
2997 /* If nothing is dirty, don't bother messing with page tables. */ 2888 /* If nothing is dirty, don't bother messing with page tables. */
2998 if (is_dirty) { 2889 if (is_dirty) {
2999 struct kvm_memslots *slots, *old_slots; 2890 struct kvm_memslots *slots, *old_slots;
3000 unsigned long *dirty_bitmap; 2891 unsigned long *dirty_bitmap;
3001 2892
3002 spin_lock(&kvm->mmu_lock); 2893 spin_lock(&kvm->mmu_lock);
3003 kvm_mmu_slot_remove_write_access(kvm, log->slot); 2894 kvm_mmu_slot_remove_write_access(kvm, log->slot);
3004 spin_unlock(&kvm->mmu_lock); 2895 spin_unlock(&kvm->mmu_lock);
3005 2896
3006 r = -ENOMEM; 2897 r = -ENOMEM;
3007 dirty_bitmap = vmalloc(n); 2898 dirty_bitmap = vmalloc(n);
3008 if (!dirty_bitmap) 2899 if (!dirty_bitmap)
3009 goto out; 2900 goto out;
3010 memset(dirty_bitmap, 0, n); 2901 memset(dirty_bitmap, 0, n);
3011 2902
3012 r = -ENOMEM; 2903 r = -ENOMEM;
3013 slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); 2904 slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
3014 if (!slots) { 2905 if (!slots) {
3015 vfree(dirty_bitmap); 2906 vfree(dirty_bitmap);
3016 goto out; 2907 goto out;
3017 } 2908 }
3018 memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); 2909 memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
3019 slots->memslots[log->slot].dirty_bitmap = dirty_bitmap; 2910 slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
3020 2911
3021 old_slots = kvm->memslots; 2912 old_slots = kvm->memslots;
3022 rcu_assign_pointer(kvm->memslots, slots); 2913 rcu_assign_pointer(kvm->memslots, slots);
3023 synchronize_srcu_expedited(&kvm->srcu); 2914 synchronize_srcu_expedited(&kvm->srcu);
3024 dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap; 2915 dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
3025 kfree(old_slots); 2916 kfree(old_slots);
3026 2917
3027 r = -EFAULT; 2918 r = -EFAULT;
3028 if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) { 2919 if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
3029 vfree(dirty_bitmap); 2920 vfree(dirty_bitmap);
3030 goto out; 2921 goto out;
3031 } 2922 }
3032 vfree(dirty_bitmap); 2923 vfree(dirty_bitmap);
3033 } else { 2924 } else {
3034 r = -EFAULT; 2925 r = -EFAULT;
3035 if (clear_user(log->dirty_bitmap, n)) 2926 if (clear_user(log->dirty_bitmap, n))
3036 goto out; 2927 goto out;
3037 } 2928 }
3038 2929
3039 r = 0; 2930 r = 0;
3040 out: 2931 out:
3041 mutex_unlock(&kvm->slots_lock); 2932 mutex_unlock(&kvm->slots_lock);
3042 return r; 2933 return r;
3043 } 2934 }
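
For context, this handler backs the KVM_GET_DIRTY_LOG ioctl. A minimal userspace sketch of calling it, assuming an already-created VM descriptor vm_fd and a caller-sized bitmap buffer (both are illustrative, not part of this patch):

/* Illustrative only: fetch (and clear) the dirty bitmap for one slot. */
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

static int fetch_dirty_log(int vm_fd, __u32 slot, void *bitmap)
{
        struct kvm_dirty_log log;

        memset(&log, 0, sizeof(log));
        log.slot = slot;           /* must name an existing, logged slot */
        log.dirty_bitmap = bitmap; /* one bit per page; caller sizes it */

        /* 0 on success; the in-kernel bitmap is cleared as a side effect. */
        return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}
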
3044 2935
3045 long kvm_arch_vm_ioctl(struct file *filp, 2936 long kvm_arch_vm_ioctl(struct file *filp,
3046 unsigned int ioctl, unsigned long arg) 2937 unsigned int ioctl, unsigned long arg)
3047 { 2938 {
3048 struct kvm *kvm = filp->private_data; 2939 struct kvm *kvm = filp->private_data;
3049 void __user *argp = (void __user *)arg; 2940 void __user *argp = (void __user *)arg;
3050 int r = -ENOTTY; 2941 int r = -ENOTTY;
3051 /* 2942 /*
3052 * This union makes it completely explicit to gcc-3.x 2943 * This union makes it completely explicit to gcc-3.x
3053 * that these two variables' stack usage should be 2944 * that these two variables' stack usage should be
3054 * combined, not added together. 2945 * combined, not added together.
3055 */ 2946 */
3056 union { 2947 union {
3057 struct kvm_pit_state ps; 2948 struct kvm_pit_state ps;
3058 struct kvm_pit_state2 ps2; 2949 struct kvm_pit_state2 ps2;
3059 struct kvm_memory_alias alias;
3060 struct kvm_pit_config pit_config; 2950 struct kvm_pit_config pit_config;
3061 } u; 2951 } u;
3062 2952
3063 switch (ioctl) { 2953 switch (ioctl) {
3064 case KVM_SET_TSS_ADDR: 2954 case KVM_SET_TSS_ADDR:
3065 r = kvm_vm_ioctl_set_tss_addr(kvm, arg); 2955 r = kvm_vm_ioctl_set_tss_addr(kvm, arg);
3066 if (r < 0) 2956 if (r < 0)
3067 goto out; 2957 goto out;
3068 break; 2958 break;
3069 case KVM_SET_IDENTITY_MAP_ADDR: { 2959 case KVM_SET_IDENTITY_MAP_ADDR: {
3070 u64 ident_addr; 2960 u64 ident_addr;
3071 2961
3072 r = -EFAULT; 2962 r = -EFAULT;
3073 if (copy_from_user(&ident_addr, argp, sizeof ident_addr)) 2963 if (copy_from_user(&ident_addr, argp, sizeof ident_addr))
3074 goto out; 2964 goto out;
3075 r = kvm_vm_ioctl_set_identity_map_addr(kvm, ident_addr); 2965 r = kvm_vm_ioctl_set_identity_map_addr(kvm, ident_addr);
3076 if (r < 0) 2966 if (r < 0)
3077 goto out; 2967 goto out;
3078 break; 2968 break;
3079 } 2969 }
3080 case KVM_SET_MEMORY_REGION: { 2970 case KVM_SET_MEMORY_REGION: {
3081 struct kvm_memory_region kvm_mem; 2971 struct kvm_memory_region kvm_mem;
3082 struct kvm_userspace_memory_region kvm_userspace_mem; 2972 struct kvm_userspace_memory_region kvm_userspace_mem;
3083 2973
3084 r = -EFAULT; 2974 r = -EFAULT;
3085 if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem)) 2975 if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem))
3086 goto out; 2976 goto out;
3087 kvm_userspace_mem.slot = kvm_mem.slot; 2977 kvm_userspace_mem.slot = kvm_mem.slot;
3088 kvm_userspace_mem.flags = kvm_mem.flags; 2978 kvm_userspace_mem.flags = kvm_mem.flags;
3089 kvm_userspace_mem.guest_phys_addr = kvm_mem.guest_phys_addr; 2979 kvm_userspace_mem.guest_phys_addr = kvm_mem.guest_phys_addr;
3090 kvm_userspace_mem.memory_size = kvm_mem.memory_size; 2980 kvm_userspace_mem.memory_size = kvm_mem.memory_size;
3091 r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 0); 2981 r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 0);
3092 if (r) 2982 if (r)
3093 goto out; 2983 goto out;
3094 break; 2984 break;
3095 } 2985 }
3096 case KVM_SET_NR_MMU_PAGES: 2986 case KVM_SET_NR_MMU_PAGES:
3097 r = kvm_vm_ioctl_set_nr_mmu_pages(kvm, arg); 2987 r = kvm_vm_ioctl_set_nr_mmu_pages(kvm, arg);
3098 if (r) 2988 if (r)
3099 goto out; 2989 goto out;
3100 break; 2990 break;
3101 case KVM_GET_NR_MMU_PAGES: 2991 case KVM_GET_NR_MMU_PAGES:
3102 r = kvm_vm_ioctl_get_nr_mmu_pages(kvm); 2992 r = kvm_vm_ioctl_get_nr_mmu_pages(kvm);
3103 break; 2993 break;
3104 case KVM_SET_MEMORY_ALIAS:
3105 r = -EFAULT;
3106 if (copy_from_user(&u.alias, argp, sizeof(struct kvm_memory_alias)))
3107 goto out;
3108 r = kvm_vm_ioctl_set_memory_alias(kvm, &u.alias);
3109 if (r)
3110 goto out;
3111 break;
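
With the KVM_SET_MEMORY_ALIAS path above removed, userspace that needs an alias can instead register a second memory slot backed by the same host memory. A rough sketch of that approach (vm_fd, the slot numbers and the guest addresses are illustrative assumptions):

/* Illustrative only: expose the same host memory at two guest-physical
 * addresses with two memory slots instead of a memory alias. */
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int map_twice(int vm_fd, void *host_mem, __u64 size,
                     __u64 gpa_primary, __u64 gpa_alias)
{
        struct kvm_userspace_memory_region r = {
                .slot            = 1,   /* arbitrary free slot numbers */
                .guest_phys_addr = gpa_primary,
                .memory_size     = size,
                .userspace_addr  = (__u64)(unsigned long)host_mem,
        };

        if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
                return -1;

        r.slot            = 2;      /* second slot, same host backing */
        r.guest_phys_addr = gpa_alias;
        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r);
}
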
3112 case KVM_CREATE_IRQCHIP: { 2994 case KVM_CREATE_IRQCHIP: {
3113 struct kvm_pic *vpic; 2995 struct kvm_pic *vpic;
3114 2996
3115 mutex_lock(&kvm->lock); 2997 mutex_lock(&kvm->lock);
3116 r = -EEXIST; 2998 r = -EEXIST;
3117 if (kvm->arch.vpic) 2999 if (kvm->arch.vpic)
3118 goto create_irqchip_unlock; 3000 goto create_irqchip_unlock;
3119 r = -ENOMEM; 3001 r = -ENOMEM;
3120 vpic = kvm_create_pic(kvm); 3002 vpic = kvm_create_pic(kvm);
3121 if (vpic) { 3003 if (vpic) {
3122 r = kvm_ioapic_init(kvm); 3004 r = kvm_ioapic_init(kvm);
3123 if (r) { 3005 if (r) {
3124 kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, 3006 kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
3125 &vpic->dev); 3007 &vpic->dev);
3126 kfree(vpic); 3008 kfree(vpic);
3127 goto create_irqchip_unlock; 3009 goto create_irqchip_unlock;
3128 } 3010 }
3129 } else 3011 } else
3130 goto create_irqchip_unlock; 3012 goto create_irqchip_unlock;
3131 smp_wmb(); 3013 smp_wmb();
3132 kvm->arch.vpic = vpic; 3014 kvm->arch.vpic = vpic;
3133 smp_wmb(); 3015 smp_wmb();
3134 r = kvm_setup_default_irq_routing(kvm); 3016 r = kvm_setup_default_irq_routing(kvm);
3135 if (r) { 3017 if (r) {
3136 mutex_lock(&kvm->irq_lock); 3018 mutex_lock(&kvm->irq_lock);
3137 kvm_ioapic_destroy(kvm); 3019 kvm_ioapic_destroy(kvm);
3138 kvm_destroy_pic(kvm); 3020 kvm_destroy_pic(kvm);
3139 mutex_unlock(&kvm->irq_lock); 3021 mutex_unlock(&kvm->irq_lock);
3140 } 3022 }
3141 create_irqchip_unlock: 3023 create_irqchip_unlock:
3142 mutex_unlock(&kvm->lock); 3024 mutex_unlock(&kvm->lock);
3143 break; 3025 break;
3144 } 3026 }
3145 case KVM_CREATE_PIT: 3027 case KVM_CREATE_PIT:
3146 u.pit_config.flags = KVM_PIT_SPEAKER_DUMMY; 3028 u.pit_config.flags = KVM_PIT_SPEAKER_DUMMY;
3147 goto create_pit; 3029 goto create_pit;
3148 case KVM_CREATE_PIT2: 3030 case KVM_CREATE_PIT2:
3149 r = -EFAULT; 3031 r = -EFAULT;
3150 if (copy_from_user(&u.pit_config, argp, 3032 if (copy_from_user(&u.pit_config, argp,
3151 sizeof(struct kvm_pit_config))) 3033 sizeof(struct kvm_pit_config)))
3152 goto out; 3034 goto out;
3153 create_pit: 3035 create_pit:
3154 mutex_lock(&kvm->slots_lock); 3036 mutex_lock(&kvm->slots_lock);
3155 r = -EEXIST; 3037 r = -EEXIST;
3156 if (kvm->arch.vpit) 3038 if (kvm->arch.vpit)
3157 goto create_pit_unlock; 3039 goto create_pit_unlock;
3158 r = -ENOMEM; 3040 r = -ENOMEM;
3159 kvm->arch.vpit = kvm_create_pit(kvm, u.pit_config.flags); 3041 kvm->arch.vpit = kvm_create_pit(kvm, u.pit_config.flags);
3160 if (kvm->arch.vpit) 3042 if (kvm->arch.vpit)
3161 r = 0; 3043 r = 0;
3162 create_pit_unlock: 3044 create_pit_unlock:
3163 mutex_unlock(&kvm->slots_lock); 3045 mutex_unlock(&kvm->slots_lock);
3164 break; 3046 break;
3165 case KVM_IRQ_LINE_STATUS: 3047 case KVM_IRQ_LINE_STATUS:
3166 case KVM_IRQ_LINE: { 3048 case KVM_IRQ_LINE: {
3167 struct kvm_irq_level irq_event; 3049 struct kvm_irq_level irq_event;
3168 3050
3169 r = -EFAULT; 3051 r = -EFAULT;
3170 if (copy_from_user(&irq_event, argp, sizeof irq_event)) 3052 if (copy_from_user(&irq_event, argp, sizeof irq_event))
3171 goto out; 3053 goto out;
3172 r = -ENXIO; 3054 r = -ENXIO;
3173 if (irqchip_in_kernel(kvm)) { 3055 if (irqchip_in_kernel(kvm)) {
3174 __s32 status; 3056 __s32 status;
3175 status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 3057 status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
3176 irq_event.irq, irq_event.level); 3058 irq_event.irq, irq_event.level);
3177 if (ioctl == KVM_IRQ_LINE_STATUS) { 3059 if (ioctl == KVM_IRQ_LINE_STATUS) {
3178 r = -EFAULT; 3060 r = -EFAULT;
3179 irq_event.status = status; 3061 irq_event.status = status;
3180 if (copy_to_user(argp, &irq_event, 3062 if (copy_to_user(argp, &irq_event,
3181 sizeof irq_event)) 3063 sizeof irq_event))
3182 goto out; 3064 goto out;
3183 } 3065 }
3184 r = 0; 3066 r = 0;
3185 } 3067 }
3186 break; 3068 break;
3187 } 3069 }
3188 case KVM_GET_IRQCHIP: { 3070 case KVM_GET_IRQCHIP: {
3189 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ 3071 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */
3190 struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL); 3072 struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL);
3191 3073
3192 r = -ENOMEM; 3074 r = -ENOMEM;
3193 if (!chip) 3075 if (!chip)
3194 goto out; 3076 goto out;
3195 r = -EFAULT; 3077 r = -EFAULT;
3196 if (copy_from_user(chip, argp, sizeof *chip)) 3078 if (copy_from_user(chip, argp, sizeof *chip))
3197 goto get_irqchip_out; 3079 goto get_irqchip_out;
3198 r = -ENXIO; 3080 r = -ENXIO;
3199 if (!irqchip_in_kernel(kvm)) 3081 if (!irqchip_in_kernel(kvm))
3200 goto get_irqchip_out; 3082 goto get_irqchip_out;
3201 r = kvm_vm_ioctl_get_irqchip(kvm, chip); 3083 r = kvm_vm_ioctl_get_irqchip(kvm, chip);
3202 if (r) 3084 if (r)
3203 goto get_irqchip_out; 3085 goto get_irqchip_out;
3204 r = -EFAULT; 3086 r = -EFAULT;
3205 if (copy_to_user(argp, chip, sizeof *chip)) 3087 if (copy_to_user(argp, chip, sizeof *chip))
3206 goto get_irqchip_out; 3088 goto get_irqchip_out;
3207 r = 0; 3089 r = 0;
3208 get_irqchip_out: 3090 get_irqchip_out:
3209 kfree(chip); 3091 kfree(chip);
3210 if (r) 3092 if (r)
3211 goto out; 3093 goto out;
3212 break; 3094 break;
3213 } 3095 }
3214 case KVM_SET_IRQCHIP: { 3096 case KVM_SET_IRQCHIP: {
3215 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ 3097 /* 0: PIC master, 1: PIC slave, 2: IOAPIC */
3216 struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL); 3098 struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL);
3217 3099
3218 r = -ENOMEM; 3100 r = -ENOMEM;
3219 if (!chip) 3101 if (!chip)
3220 goto out; 3102 goto out;
3221 r = -EFAULT; 3103 r = -EFAULT;
3222 if (copy_from_user(chip, argp, sizeof *chip)) 3104 if (copy_from_user(chip, argp, sizeof *chip))
3223 goto set_irqchip_out; 3105 goto set_irqchip_out;
3224 r = -ENXIO; 3106 r = -ENXIO;
3225 if (!irqchip_in_kernel(kvm)) 3107 if (!irqchip_in_kernel(kvm))
3226 goto set_irqchip_out; 3108 goto set_irqchip_out;
3227 r = kvm_vm_ioctl_set_irqchip(kvm, chip); 3109 r = kvm_vm_ioctl_set_irqchip(kvm, chip);
3228 if (r) 3110 if (r)
3229 goto set_irqchip_out; 3111 goto set_irqchip_out;
3230 r = 0; 3112 r = 0;
3231 set_irqchip_out: 3113 set_irqchip_out:
3232 kfree(chip); 3114 kfree(chip);
3233 if (r) 3115 if (r)
3234 goto out; 3116 goto out;
3235 break; 3117 break;
3236 } 3118 }
3237 case KVM_GET_PIT: { 3119 case KVM_GET_PIT: {
3238 r = -EFAULT; 3120 r = -EFAULT;
3239 if (copy_from_user(&u.ps, argp, sizeof(struct kvm_pit_state))) 3121 if (copy_from_user(&u.ps, argp, sizeof(struct kvm_pit_state)))
3240 goto out; 3122 goto out;
3241 r = -ENXIO; 3123 r = -ENXIO;
3242 if (!kvm->arch.vpit) 3124 if (!kvm->arch.vpit)
3243 goto out; 3125 goto out;
3244 r = kvm_vm_ioctl_get_pit(kvm, &u.ps); 3126 r = kvm_vm_ioctl_get_pit(kvm, &u.ps);
3245 if (r) 3127 if (r)
3246 goto out; 3128 goto out;
3247 r = -EFAULT; 3129 r = -EFAULT;
3248 if (copy_to_user(argp, &u.ps, sizeof(struct kvm_pit_state))) 3130 if (copy_to_user(argp, &u.ps, sizeof(struct kvm_pit_state)))
3249 goto out; 3131 goto out;
3250 r = 0; 3132 r = 0;
3251 break; 3133 break;
3252 } 3134 }
3253 case KVM_SET_PIT: { 3135 case KVM_SET_PIT: {
3254 r = -EFAULT; 3136 r = -EFAULT;
3255 if (copy_from_user(&u.ps, argp, sizeof u.ps)) 3137 if (copy_from_user(&u.ps, argp, sizeof u.ps))
3256 goto out; 3138 goto out;
3257 r = -ENXIO; 3139 r = -ENXIO;
3258 if (!kvm->arch.vpit) 3140 if (!kvm->arch.vpit)
3259 goto out; 3141 goto out;
3260 r = kvm_vm_ioctl_set_pit(kvm, &u.ps); 3142 r = kvm_vm_ioctl_set_pit(kvm, &u.ps);
3261 if (r) 3143 if (r)
3262 goto out; 3144 goto out;
3263 r = 0; 3145 r = 0;
3264 break; 3146 break;
3265 } 3147 }
3266 case KVM_GET_PIT2: { 3148 case KVM_GET_PIT2: {
3267 r = -ENXIO; 3149 r = -ENXIO;
3268 if (!kvm->arch.vpit) 3150 if (!kvm->arch.vpit)
3269 goto out; 3151 goto out;
3270 r = kvm_vm_ioctl_get_pit2(kvm, &u.ps2); 3152 r = kvm_vm_ioctl_get_pit2(kvm, &u.ps2);
3271 if (r) 3153 if (r)
3272 goto out; 3154 goto out;
3273 r = -EFAULT; 3155 r = -EFAULT;
3274 if (copy_to_user(argp, &u.ps2, sizeof(u.ps2))) 3156 if (copy_to_user(argp, &u.ps2, sizeof(u.ps2)))
3275 goto out; 3157 goto out;
3276 r = 0; 3158 r = 0;
3277 break; 3159 break;
3278 } 3160 }
3279 case KVM_SET_PIT2: { 3161 case KVM_SET_PIT2: {
3280 r = -EFAULT; 3162 r = -EFAULT;
3281 if (copy_from_user(&u.ps2, argp, sizeof(u.ps2))) 3163 if (copy_from_user(&u.ps2, argp, sizeof(u.ps2)))
3282 goto out; 3164 goto out;
3283 r = -ENXIO; 3165 r = -ENXIO;
3284 if (!kvm->arch.vpit) 3166 if (!kvm->arch.vpit)
3285 goto out; 3167 goto out;
3286 r = kvm_vm_ioctl_set_pit2(kvm, &u.ps2); 3168 r = kvm_vm_ioctl_set_pit2(kvm, &u.ps2);
3287 if (r) 3169 if (r)
3288 goto out; 3170 goto out;
3289 r = 0; 3171 r = 0;
3290 break; 3172 break;
3291 } 3173 }
3292 case KVM_REINJECT_CONTROL: { 3174 case KVM_REINJECT_CONTROL: {
3293 struct kvm_reinject_control control; 3175 struct kvm_reinject_control control;
3294 r = -EFAULT; 3176 r = -EFAULT;
3295 if (copy_from_user(&control, argp, sizeof(control))) 3177 if (copy_from_user(&control, argp, sizeof(control)))
3296 goto out; 3178 goto out;
3297 r = kvm_vm_ioctl_reinject(kvm, &control); 3179 r = kvm_vm_ioctl_reinject(kvm, &control);
3298 if (r) 3180 if (r)
3299 goto out; 3181 goto out;
3300 r = 0; 3182 r = 0;
3301 break; 3183 break;
3302 } 3184 }
3303 case KVM_XEN_HVM_CONFIG: { 3185 case KVM_XEN_HVM_CONFIG: {
3304 r = -EFAULT; 3186 r = -EFAULT;
3305 if (copy_from_user(&kvm->arch.xen_hvm_config, argp, 3187 if (copy_from_user(&kvm->arch.xen_hvm_config, argp,
3306 sizeof(struct kvm_xen_hvm_config))) 3188 sizeof(struct kvm_xen_hvm_config)))
3307 goto out; 3189 goto out;
3308 r = -EINVAL; 3190 r = -EINVAL;
3309 if (kvm->arch.xen_hvm_config.flags) 3191 if (kvm->arch.xen_hvm_config.flags)
3310 goto out; 3192 goto out;
3311 r = 0; 3193 r = 0;
3312 break; 3194 break;
3313 } 3195 }
3314 case KVM_SET_CLOCK: { 3196 case KVM_SET_CLOCK: {
3315 struct timespec now; 3197 struct timespec now;
3316 struct kvm_clock_data user_ns; 3198 struct kvm_clock_data user_ns;
3317 u64 now_ns; 3199 u64 now_ns;
3318 s64 delta; 3200 s64 delta;
3319 3201
3320 r = -EFAULT; 3202 r = -EFAULT;
3321 if (copy_from_user(&user_ns, argp, sizeof(user_ns))) 3203 if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
3322 goto out; 3204 goto out;
3323 3205
3324 r = -EINVAL; 3206 r = -EINVAL;
3325 if (user_ns.flags) 3207 if (user_ns.flags)
3326 goto out; 3208 goto out;
3327 3209
3328 r = 0; 3210 r = 0;
3329 ktime_get_ts(&now); 3211 ktime_get_ts(&now);
3330 now_ns = timespec_to_ns(&now); 3212 now_ns = timespec_to_ns(&now);
3331 delta = user_ns.clock - now_ns; 3213 delta = user_ns.clock - now_ns;
3332 kvm->arch.kvmclock_offset = delta; 3214 kvm->arch.kvmclock_offset = delta;
3333 break; 3215 break;
3334 } 3216 }
3335 case KVM_GET_CLOCK: { 3217 case KVM_GET_CLOCK: {
3336 struct timespec now; 3218 struct timespec now;
3337 struct kvm_clock_data user_ns; 3219 struct kvm_clock_data user_ns;
3338 u64 now_ns; 3220 u64 now_ns;
3339 3221
3340 ktime_get_ts(&now); 3222 ktime_get_ts(&now);
3341 now_ns = timespec_to_ns(&now); 3223 now_ns = timespec_to_ns(&now);
3342 user_ns.clock = kvm->arch.kvmclock_offset + now_ns; 3224 user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
3343 user_ns.flags = 0; 3225 user_ns.flags = 0;
3344 3226
3345 r = -EFAULT; 3227 r = -EFAULT;
3346 if (copy_to_user(argp, &user_ns, sizeof(user_ns))) 3228 if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
3347 goto out; 3229 goto out;
3348 r = 0; 3230 r = 0;
3349 break; 3231 break;
3350 } 3232 }
3351 3233
3352 default: 3234 default:
3353 ; 3235 ;
3354 } 3236 }
3355 out: 3237 out:
3356 return r; 3238 return r;
3357 } 3239 }
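
The KVM_SET_CLOCK/KVM_GET_CLOCK cases above are a simple offset scheme: setting the clock records kvmclock_offset as the difference between the requested value and the host monotonic time, and getting it adds that offset back. A minimal save/restore sketch from userspace (the two VM descriptors are assumptions for illustration):

/* Illustrative only: save and restore the guest's kvmclock value. */
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int save_restore_clock(int src_vm_fd, int dst_vm_fd)
{
        struct kvm_clock_data data;

        if (ioctl(src_vm_fd, KVM_GET_CLOCK, &data) < 0)
                return -1;

        /* flags must stay zero; the handler rejects anything else. */
        data.flags = 0;
        return ioctl(dst_vm_fd, KVM_SET_CLOCK, &data);
}
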
3358 3240
3359 static void kvm_init_msr_list(void) 3241 static void kvm_init_msr_list(void)
3360 { 3242 {
3361 u32 dummy[2]; 3243 u32 dummy[2];
3362 unsigned i, j; 3244 unsigned i, j;
3363 3245
3364 /* skip the first msrs in the list. KVM-specific */ 3246 /* skip the first msrs in the list. KVM-specific */
3365 for (i = j = KVM_SAVE_MSRS_BEGIN; i < ARRAY_SIZE(msrs_to_save); i++) { 3247 for (i = j = KVM_SAVE_MSRS_BEGIN; i < ARRAY_SIZE(msrs_to_save); i++) {
3366 if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0) 3248 if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0)
3367 continue; 3249 continue;
3368 if (j < i) 3250 if (j < i)
3369 msrs_to_save[j] = msrs_to_save[i]; 3251 msrs_to_save[j] = msrs_to_save[i];
3370 j++; 3252 j++;
3371 } 3253 }
3372 num_msrs_to_save = j; 3254 num_msrs_to_save = j;
3373 } 3255 }
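
kvm_init_msr_list() filters msrs_to_save in place with the usual two-index compaction idiom: entries whose MSR cannot be read via rdmsr_safe() are dropped, the rest are packed to the front, and the count is updated. The same idiom in isolation, purely for illustration:

/* Illustrative only: keep elements that pass a predicate, in place. */
static unsigned compact(int *a, unsigned n, int (*keep)(int))
{
        unsigned i, j;

        for (i = j = 0; i < n; i++) {
                if (!keep(a[i]))
                        continue;
                if (j < i)
                        a[j] = a[i];
                j++;
        }
        return j;       /* new element count */
}
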
3374 3256
3375 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len, 3257 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
3376 const void *v) 3258 const void *v)
3377 { 3259 {
3378 if (vcpu->arch.apic && 3260 if (vcpu->arch.apic &&
3379 !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v)) 3261 !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
3380 return 0; 3262 return 0;
3381 3263
3382 return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v); 3264 return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
3383 } 3265 }
3384 3266
3385 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v) 3267 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
3386 { 3268 {
3387 if (vcpu->arch.apic && 3269 if (vcpu->arch.apic &&
3388 !kvm_iodevice_read(&vcpu->arch.apic->dev, addr, len, v)) 3270 !kvm_iodevice_read(&vcpu->arch.apic->dev, addr, len, v))
3389 return 0; 3271 return 0;
3390 3272
3391 return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v); 3273 return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
3392 } 3274 }
3393 3275
3394 static void kvm_set_segment(struct kvm_vcpu *vcpu, 3276 static void kvm_set_segment(struct kvm_vcpu *vcpu,
3395 struct kvm_segment *var, int seg) 3277 struct kvm_segment *var, int seg)
3396 { 3278 {
3397 kvm_x86_ops->set_segment(vcpu, var, seg); 3279 kvm_x86_ops->set_segment(vcpu, var, seg);
3398 } 3280 }
3399 3281
3400 void kvm_get_segment(struct kvm_vcpu *vcpu, 3282 void kvm_get_segment(struct kvm_vcpu *vcpu,
3401 struct kvm_segment *var, int seg) 3283 struct kvm_segment *var, int seg)
3402 { 3284 {
3403 kvm_x86_ops->get_segment(vcpu, var, seg); 3285 kvm_x86_ops->get_segment(vcpu, var, seg);
3404 } 3286 }
3405 3287
3406 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) 3288 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
3407 { 3289 {
3408 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; 3290 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
3409 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); 3291 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
3410 } 3292 }
3411 3293
3412 gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) 3294 gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
3413 { 3295 {
3414 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; 3296 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
3415 access |= PFERR_FETCH_MASK; 3297 access |= PFERR_FETCH_MASK;
3416 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); 3298 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
3417 } 3299 }
3418 3300
3419 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) 3301 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
3420 { 3302 {
3421 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; 3303 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
3422 access |= PFERR_WRITE_MASK; 3304 access |= PFERR_WRITE_MASK;
3423 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); 3305 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
3424 } 3306 }
3425 3307
3426 /* used to access any guest's mapped memory without checking CPL */ 3308 /* used to access any guest's mapped memory without checking CPL */
3427 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) 3309 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
3428 { 3310 {
3429 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error); 3311 return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error);
3430 } 3312 }
3431 3313
3432 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, 3314 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
3433 struct kvm_vcpu *vcpu, u32 access, 3315 struct kvm_vcpu *vcpu, u32 access,
3434 u32 *error) 3316 u32 *error)
3435 { 3317 {
3436 void *data = val; 3318 void *data = val;
3437 int r = X86EMUL_CONTINUE; 3319 int r = X86EMUL_CONTINUE;
3438 3320
3439 while (bytes) { 3321 while (bytes) {
3440 gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error); 3322 gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
3441 unsigned offset = addr & (PAGE_SIZE-1); 3323 unsigned offset = addr & (PAGE_SIZE-1);
3442 unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); 3324 unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
3443 int ret; 3325 int ret;
3444 3326
3445 if (gpa == UNMAPPED_GVA) { 3327 if (gpa == UNMAPPED_GVA) {
3446 r = X86EMUL_PROPAGATE_FAULT; 3328 r = X86EMUL_PROPAGATE_FAULT;
3447 goto out; 3329 goto out;
3448 } 3330 }
3449 ret = kvm_read_guest(vcpu->kvm, gpa, data, toread); 3331 ret = kvm_read_guest(vcpu->kvm, gpa, data, toread);
3450 if (ret < 0) { 3332 if (ret < 0) {
3451 r = X86EMUL_IO_NEEDED; 3333 r = X86EMUL_IO_NEEDED;
3452 goto out; 3334 goto out;
3453 } 3335 }
3454 3336
3455 bytes -= toread; 3337 bytes -= toread;
3456 data += toread; 3338 data += toread;
3457 addr += toread; 3339 addr += toread;
3458 } 3340 }
3459 out: 3341 out:
3460 return r; 3342 return r;
3461 } 3343 }
3462 3344
3463 /* used for instruction fetching */ 3345 /* used for instruction fetching */
3464 static int kvm_fetch_guest_virt(gva_t addr, void *val, unsigned int bytes, 3346 static int kvm_fetch_guest_virt(gva_t addr, void *val, unsigned int bytes,
3465 struct kvm_vcpu *vcpu, u32 *error) 3347 struct kvm_vcpu *vcpu, u32 *error)
3466 { 3348 {
3467 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; 3349 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
3468 return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 3350 return kvm_read_guest_virt_helper(addr, val, bytes, vcpu,
3469 access | PFERR_FETCH_MASK, error); 3351 access | PFERR_FETCH_MASK, error);
3470 } 3352 }
3471 3353
3472 static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes, 3354 static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes,
3473 struct kvm_vcpu *vcpu, u32 *error) 3355 struct kvm_vcpu *vcpu, u32 *error)
3474 { 3356 {
3475 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; 3357 u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
3476 return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access, 3358 return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
3477 error); 3359 error);
3478 } 3360 }
3479 3361
3480 static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes, 3362 static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
3481 struct kvm_vcpu *vcpu, u32 *error) 3363 struct kvm_vcpu *vcpu, u32 *error)
3482 { 3364 {
3483 return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error); 3365 return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
3484 } 3366 }
3485 3367
3486 static int kvm_write_guest_virt_system(gva_t addr, void *val, 3368 static int kvm_write_guest_virt_system(gva_t addr, void *val,
3487 unsigned int bytes, 3369 unsigned int bytes,
3488 struct kvm_vcpu *vcpu, 3370 struct kvm_vcpu *vcpu,
3489 u32 *error) 3371 u32 *error)
3490 { 3372 {
3491 void *data = val; 3373 void *data = val;
3492 int r = X86EMUL_CONTINUE; 3374 int r = X86EMUL_CONTINUE;
3493 3375
3494 while (bytes) { 3376 while (bytes) {
3495 gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, 3377 gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr,
3496 PFERR_WRITE_MASK, error); 3378 PFERR_WRITE_MASK, error);
3497 unsigned offset = addr & (PAGE_SIZE-1); 3379 unsigned offset = addr & (PAGE_SIZE-1);
3498 unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset); 3380 unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
3499 int ret; 3381 int ret;
3500 3382
3501 if (gpa == UNMAPPED_GVA) { 3383 if (gpa == UNMAPPED_GVA) {
3502 r = X86EMUL_PROPAGATE_FAULT; 3384 r = X86EMUL_PROPAGATE_FAULT;
3503 goto out; 3385 goto out;
3504 } 3386 }
3505 ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite); 3387 ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite);
3506 if (ret < 0) { 3388 if (ret < 0) {
3507 r = X86EMUL_IO_NEEDED; 3389 r = X86EMUL_IO_NEEDED;
3508 goto out; 3390 goto out;
3509 } 3391 }
3510 3392
3511 bytes -= towrite; 3393 bytes -= towrite;
3512 data += towrite; 3394 data += towrite;
3513 addr += towrite; 3395 addr += towrite;
3514 } 3396 }
3515 out: 3397 out:
3516 return r; 3398 return r;
3517 } 3399 }
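
Both virt read/write helpers walk the buffer one guest page at a time: each iteration translates the current address, copies at most up to the next page boundary, and advances. A standalone sketch of just that chunking arithmetic (the address and length are made-up examples):

/* Illustrative only: how the copy is split at page boundaries. */
#include <stdio.h>

#define PAGE_SIZE 4096u

int main(void)
{
        unsigned long addr = 0x1ff0;    /* example guest virtual address */
        unsigned bytes = 0x30;          /* example length crossing a page */

        while (bytes) {
                unsigned offset = addr & (PAGE_SIZE - 1);
                unsigned chunk = bytes < PAGE_SIZE - offset ?
                                 bytes : PAGE_SIZE - offset;

                printf("copy 0x%x bytes at 0x%lx\n", chunk, addr);
                addr += chunk;
                bytes -= chunk;
        }
        return 0;       /* prints 0x10 at 0x1ff0, then 0x20 at 0x2000 */
}
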
3518 3400
3519 static int emulator_read_emulated(unsigned long addr, 3401 static int emulator_read_emulated(unsigned long addr,
3520 void *val, 3402 void *val,
3521 unsigned int bytes, 3403 unsigned int bytes,
3522 unsigned int *error_code, 3404 unsigned int *error_code,
3523 struct kvm_vcpu *vcpu) 3405 struct kvm_vcpu *vcpu)
3524 { 3406 {
3525 gpa_t gpa; 3407 gpa_t gpa;
3526 3408
3527 if (vcpu->mmio_read_completed) { 3409 if (vcpu->mmio_read_completed) {
3528 memcpy(val, vcpu->mmio_data, bytes); 3410 memcpy(val, vcpu->mmio_data, bytes);
3529 trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, 3411 trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes,
3530 vcpu->mmio_phys_addr, *(u64 *)val); 3412 vcpu->mmio_phys_addr, *(u64 *)val);
3531 vcpu->mmio_read_completed = 0; 3413 vcpu->mmio_read_completed = 0;
3532 return X86EMUL_CONTINUE; 3414 return X86EMUL_CONTINUE;
3533 } 3415 }
3534 3416
3535 gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, error_code); 3417 gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, error_code);
3536 3418
3537 if (gpa == UNMAPPED_GVA) 3419 if (gpa == UNMAPPED_GVA)
3538 return X86EMUL_PROPAGATE_FAULT; 3420 return X86EMUL_PROPAGATE_FAULT;
3539 3421
3540 /* For APIC access vmexit */ 3422 /* For APIC access vmexit */
3541 if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) 3423 if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
3542 goto mmio; 3424 goto mmio;
3543 3425
3544 if (kvm_read_guest_virt(addr, val, bytes, vcpu, NULL) 3426 if (kvm_read_guest_virt(addr, val, bytes, vcpu, NULL)
3545 == X86EMUL_CONTINUE) 3427 == X86EMUL_CONTINUE)
3546 return X86EMUL_CONTINUE; 3428 return X86EMUL_CONTINUE;
3547 3429
3548 mmio: 3430 mmio:
3549 /* 3431 /*
3550 * Is this MMIO handled locally? 3432 * Is this MMIO handled locally?
3551 */ 3433 */
3552 if (!vcpu_mmio_read(vcpu, gpa, bytes, val)) { 3434 if (!vcpu_mmio_read(vcpu, gpa, bytes, val)) {
3553 trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, gpa, *(u64 *)val); 3435 trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, gpa, *(u64 *)val);
3554 return X86EMUL_CONTINUE; 3436 return X86EMUL_CONTINUE;
3555 } 3437 }
3556 3438
3557 trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0); 3439 trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0);
3558 3440
3559 vcpu->mmio_needed = 1; 3441 vcpu->mmio_needed = 1;
3560 vcpu->run->exit_reason = KVM_EXIT_MMIO; 3442 vcpu->run->exit_reason = KVM_EXIT_MMIO;
3561 vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; 3443 vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
3562 vcpu->run->mmio.len = vcpu->mmio_size = bytes; 3444 vcpu->run->mmio.len = vcpu->mmio_size = bytes;
3563 vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0; 3445 vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0;
3564 3446
3565 return X86EMUL_IO_NEEDED; 3447 return X86EMUL_IO_NEEDED;
3566 } 3448 }
3567 3449
3568 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, 3450 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
3569 const void *val, int bytes) 3451 const void *val, int bytes)
3570 { 3452 {
3571 int ret; 3453 int ret;
3572 3454
3573 ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes); 3455 ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
3574 if (ret < 0) 3456 if (ret < 0)
3575 return 0; 3457 return 0;
3576 kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1); 3458 kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1);
3577 return 1; 3459 return 1;
3578 } 3460 }
3579 3461
3580 static int emulator_write_emulated_onepage(unsigned long addr, 3462 static int emulator_write_emulated_onepage(unsigned long addr,
3581 const void *val, 3463 const void *val,
3582 unsigned int bytes, 3464 unsigned int bytes,
3583 unsigned int *error_code, 3465 unsigned int *error_code,
3584 struct kvm_vcpu *vcpu) 3466 struct kvm_vcpu *vcpu)
3585 { 3467 {
3586 gpa_t gpa; 3468 gpa_t gpa;
3587 3469
3588 gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error_code); 3470 gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error_code);
3589 3471
3590 if (gpa == UNMAPPED_GVA) 3472 if (gpa == UNMAPPED_GVA)
3591 return X86EMUL_PROPAGATE_FAULT; 3473 return X86EMUL_PROPAGATE_FAULT;
3592 3474
3593 /* For APIC access vmexit */ 3475 /* For APIC access vmexit */
3594 if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) 3476 if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
3595 goto mmio; 3477 goto mmio;
3596 3478
3597 if (emulator_write_phys(vcpu, gpa, val, bytes)) 3479 if (emulator_write_phys(vcpu, gpa, val, bytes))
3598 return X86EMUL_CONTINUE; 3480 return X86EMUL_CONTINUE;
3599 3481
3600 mmio: 3482 mmio:
3601 trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val); 3483 trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
3602 /* 3484 /*
3603 * Is this MMIO handled locally? 3485 * Is this MMIO handled locally?
3604 */ 3486 */
3605 if (!vcpu_mmio_write(vcpu, gpa, bytes, val)) 3487 if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
3606 return X86EMUL_CONTINUE; 3488 return X86EMUL_CONTINUE;
3607 3489
3608 vcpu->mmio_needed = 1; 3490 vcpu->mmio_needed = 1;
3609 vcpu->run->exit_reason = KVM_EXIT_MMIO; 3491 vcpu->run->exit_reason = KVM_EXIT_MMIO;
3610 vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; 3492 vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
3611 vcpu->run->mmio.len = vcpu->mmio_size = bytes; 3493 vcpu->run->mmio.len = vcpu->mmio_size = bytes;
3612 vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1; 3494 vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
3613 memcpy(vcpu->run->mmio.data, val, bytes); 3495 memcpy(vcpu->run->mmio.data, val, bytes);
3614 3496
3615 return X86EMUL_CONTINUE; 3497 return X86EMUL_CONTINUE;
3616 } 3498 }
3617 3499
3618 int emulator_write_emulated(unsigned long addr, 3500 int emulator_write_emulated(unsigned long addr,
3619 const void *val, 3501 const void *val,
3620 unsigned int bytes, 3502 unsigned int bytes,
3621 unsigned int *error_code, 3503 unsigned int *error_code,
3622 struct kvm_vcpu *vcpu) 3504 struct kvm_vcpu *vcpu)
3623 { 3505 {
3624 /* Crossing a page boundary? */ 3506 /* Crossing a page boundary? */
3625 if (((addr + bytes - 1) ^ addr) & PAGE_MASK) { 3507 if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
3626 int rc, now; 3508 int rc, now;
3627 3509
3628 now = -addr & ~PAGE_MASK; 3510 now = -addr & ~PAGE_MASK;
3629 rc = emulator_write_emulated_onepage(addr, val, now, error_code, 3511 rc = emulator_write_emulated_onepage(addr, val, now, error_code,
3630 vcpu); 3512 vcpu);
3631 if (rc != X86EMUL_CONTINUE) 3513 if (rc != X86EMUL_CONTINUE)
3632 return rc; 3514 return rc;
3633 addr += now; 3515 addr += now;
3634 val += now; 3516 val += now;
3635 bytes -= now; 3517 bytes -= now;
3636 } 3518 }
3637 return emulator_write_emulated_onepage(addr, val, bytes, error_code, 3519 return emulator_write_emulated_onepage(addr, val, bytes, error_code,
3638 vcpu); 3520 vcpu);
3639 } 3521 }
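
emulator_write_emulated() splits a write that straddles a page boundary, using -addr & ~PAGE_MASK as the number of bytes up to the next boundary. A quick standalone check of that identity, assuming the usual PAGE_MASK = ~(PAGE_SIZE - 1) definition:

/* Illustrative only: "-addr & ~PAGE_MASK" is the distance to the next
 * page boundary (assuming PAGE_MASK = ~(PAGE_SIZE - 1), as on x86). */
#include <assert.h>

#define PAGE_SIZE 4096ul
#define PAGE_MASK (~(PAGE_SIZE - 1))

int main(void)
{
        unsigned long addr = 0x2ff8;            /* example address */
        unsigned long now = -addr & ~PAGE_MASK; /* 0x3000 - 0x2ff8 = 8 */

        assert(now == 8);
        assert(((addr + now) & ~PAGE_MASK) == 0); /* lands on a boundary */
        return 0;
}
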
3640 3522
3641 #define CMPXCHG_TYPE(t, ptr, old, new) \ 3523 #define CMPXCHG_TYPE(t, ptr, old, new) \
3642 (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old)) 3524 (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
3643 3525
3644 #ifdef CONFIG_X86_64 3526 #ifdef CONFIG_X86_64
3645 # define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new) 3527 # define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
3646 #else 3528 #else
3647 # define CMPXCHG64(ptr, old, new) \ 3529 # define CMPXCHG64(ptr, old, new) \
3648 (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old)) 3530 (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
3649 #endif 3531 #endif
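
CMPXCHG_TYPE/CMPXCHG64 wrap the kernel's cmpxchg primitives and report whether the value at ptr still matched *old, i.e. whether the swap actually took effect. The same semantics expressed with C11 atomics, for illustration only (the kernel code does not use <stdatomic.h>):

/* Illustrative only: CMPXCHG_TYPE-style "did the swap happen?" check. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static bool cmpxchg_u32(_Atomic uint32_t *ptr, uint32_t old, uint32_t new)
{
        /* True when *ptr was equal to old and has been replaced by new. */
        return atomic_compare_exchange_strong(ptr, &old, new);
}
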
3650 3532
3651 static int emulator_cmpxchg_emulated(unsigned long addr, 3533 static int emulator_cmpxchg_emulated(unsigned long addr,
3652 const void *old, 3534 const void *old,
3653 const void *new, 3535 const void *new,
3654 unsigned int bytes, 3536 unsigned int bytes,
3655 unsigned int *error_code, 3537 unsigned int *error_code,
3656 struct kvm_vcpu *vcpu) 3538 struct kvm_vcpu *vcpu)
3657 { 3539 {
3658 gpa_t gpa; 3540 gpa_t gpa;
3659 struct page *page; 3541 struct page *page;
3660 char *kaddr; 3542 char *kaddr;
3661 bool exchanged; 3543 bool exchanged;
3662 3544
3663 /* a guest's cmpxchg8b has to be emulated atomically */ 3545 /* a guest's cmpxchg8b has to be emulated atomically */
3664 if (bytes > 8 || (bytes & (bytes - 1))) 3546 if (bytes > 8 || (bytes & (bytes - 1)))
3665 goto emul_write; 3547 goto emul_write;
3666 3548
3667 gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL); 3549 gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
3668 3550
3669 if (gpa == UNMAPPED_GVA || 3551 if (gpa == UNMAPPED_GVA ||
3670 (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) 3552 (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
3671 goto emul_write; 3553 goto emul_write;
3672 3554
3673 if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK)) 3555 if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
3674 goto emul_write; 3556 goto emul_write;
3675 3557
3676 page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT); 3558 page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
3677 3559
3678 kaddr = kmap_atomic(page, KM_USER0); 3560 kaddr = kmap_atomic(page, KM_USER0);
3679 kaddr += offset_in_page(gpa); 3561 kaddr += offset_in_page(gpa);
3680 switch (bytes) { 3562 switch (bytes) {
3681 case 1: 3563 case 1:
3682 exchanged = CMPXCHG_TYPE(u8, kaddr, old, new); 3564 exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
3683 break; 3565 break;
3684 case 2: 3566 case 2:
3685 exchanged = CMPXCHG_TYPE(u16, kaddr, old, new); 3567 exchanged = CMPXCHG_TYPE(u16, kaddr, old, new);
3686 break; 3568 break;
3687 case 4: 3569 case 4:
3688 exchanged = CMPXCHG_TYPE(u32, kaddr, old, new); 3570 exchanged = CMPXCHG_TYPE(u32, kaddr, old, new);
3689 break; 3571 break;
3690 case 8: 3572 case 8:
3691 exchanged = CMPXCHG64(kaddr, old, new); 3573 exchanged = CMPXCHG64(kaddr, old, new);
3692 break; 3574 break;
3693 default: 3575 default:
3694 BUG(); 3576 BUG();
3695 } 3577 }
3696 kunmap_atomic(kaddr, KM_USER0); 3578 kunmap_atomic(kaddr, KM_USER0);
3697 kvm_release_page_dirty(page); 3579 kvm_release_page_dirty(page);
3698 3580
3699 if (!exchanged) 3581 if (!exchanged)
3700 return X86EMUL_CMPXCHG_FAILED; 3582 return X86EMUL_CMPXCHG_FAILED;
3701 3583
3702 kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1); 3584 kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1);
3703 3585
3704 return X86EMUL_CONTINUE; 3586 return X86EMUL_CONTINUE;
3705 3587
3706 emul_write: 3588 emul_write:
3707 printk_once(KERN_WARNING "kvm: emulating exchange as write\n"); 3589 printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
3708 3590
3709 return emulator_write_emulated(addr, new, bytes, error_code, vcpu); 3591 return emulator_write_emulated(addr, new, bytes, error_code, vcpu);
3710 } 3592 }
3711 3593
3712 static int kernel_pio(struct kvm_vcpu *vcpu, void *pd) 3594 static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
3713 { 3595 {
3714 /* TODO: String I/O for in-kernel device */ 3596 /* TODO: String I/O for in-kernel device */
3715 int r; 3597 int r;
3716 3598
3717 if (vcpu->arch.pio.in) 3599 if (vcpu->arch.pio.in)
3718 r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port, 3600 r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port,
3719 vcpu->arch.pio.size, pd); 3601 vcpu->arch.pio.size, pd);
3720 else 3602 else
3721 r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS, 3603 r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS,
3722 vcpu->arch.pio.port, vcpu->arch.pio.size, 3604 vcpu->arch.pio.port, vcpu->arch.pio.size,
3723 pd); 3605 pd);
3724 return r; 3606 return r;
3725 } 3607 }
3726 3608
3727 3609
3728 static int emulator_pio_in_emulated(int size, unsigned short port, void *val, 3610 static int emulator_pio_in_emulated(int size, unsigned short port, void *val,
3729 unsigned int count, struct kvm_vcpu *vcpu) 3611 unsigned int count, struct kvm_vcpu *vcpu)
3730 { 3612 {
3731 if (vcpu->arch.pio.count) 3613 if (vcpu->arch.pio.count)
3732 goto data_avail; 3614 goto data_avail;
3733 3615
3734 trace_kvm_pio(1, port, size, 1); 3616 trace_kvm_pio(1, port, size, 1);
3735 3617
3736 vcpu->arch.pio.port = port; 3618 vcpu->arch.pio.port = port;
3737 vcpu->arch.pio.in = 1; 3619 vcpu->arch.pio.in = 1;
3738 vcpu->arch.pio.count = count; 3620 vcpu->arch.pio.count = count;
3739 vcpu->arch.pio.size = size; 3621 vcpu->arch.pio.size = size;
3740 3622
3741 if (!kernel_pio(vcpu, vcpu->arch.pio_data)) { 3623 if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
3742 data_avail: 3624 data_avail:
3743 memcpy(val, vcpu->arch.pio_data, size * count); 3625 memcpy(val, vcpu->arch.pio_data, size * count);
3744 vcpu->arch.pio.count = 0; 3626 vcpu->arch.pio.count = 0;
3745 return 1; 3627 return 1;
3746 } 3628 }
3747 3629
3748 vcpu->run->exit_reason = KVM_EXIT_IO; 3630 vcpu->run->exit_reason = KVM_EXIT_IO;
3749 vcpu->run->io.direction = KVM_EXIT_IO_IN; 3631 vcpu->run->io.direction = KVM_EXIT_IO_IN;
3750 vcpu->run->io.size = size; 3632 vcpu->run->io.size = size;
3751 vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE; 3633 vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;
3752 vcpu->run->io.count = count; 3634 vcpu->run->io.count = count;
3753 vcpu->run->io.port = port; 3635 vcpu->run->io.port = port;
3754 3636
3755 return 0; 3637 return 0;
3756 } 3638 }
3757 3639
3758 static int emulator_pio_out_emulated(int size, unsigned short port, 3640 static int emulator_pio_out_emulated(int size, unsigned short port,
3759 const void *val, unsigned int count, 3641 const void *val, unsigned int count,
3760 struct kvm_vcpu *vcpu) 3642 struct kvm_vcpu *vcpu)
3761 { 3643 {
3762 trace_kvm_pio(0, port, size, 1); 3644 trace_kvm_pio(0, port, size, 1);
3763 3645
3764 vcpu->arch.pio.port = port; 3646 vcpu->arch.pio.port = port;
3765 vcpu->arch.pio.in = 0; 3647 vcpu->arch.pio.in = 0;
3766 vcpu->arch.pio.count = count; 3648 vcpu->arch.pio.count = count;
3767 vcpu->arch.pio.size = size; 3649 vcpu->arch.pio.size = size;
3768 3650
3769 memcpy(vcpu->arch.pio_data, val, size * count); 3651 memcpy(vcpu->arch.pio_data, val, size * count);
3770 3652
3771 if (!kernel_pio(vcpu, vcpu->arch.pio_data)) { 3653 if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
3772 vcpu->arch.pio.count = 0; 3654 vcpu->arch.pio.count = 0;
3773 return 1; 3655 return 1;
3774 } 3656 }
3775 3657
3776 vcpu->run->exit_reason = KVM_EXIT_IO; 3658 vcpu->run->exit_reason = KVM_EXIT_IO;
3777 vcpu->run->io.direction = KVM_EXIT_IO_OUT; 3659 vcpu->run->io.direction = KVM_EXIT_IO_OUT;
3778 vcpu->run->io.size = size; 3660 vcpu->run->io.size = size;
3779 vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE; 3661 vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;
3780 vcpu->run->io.count = count; 3662 vcpu->run->io.count = count;
3781 vcpu->run->io.port = port; 3663 vcpu->run->io.port = port;
3782 3664
3783 return 0; 3665 return 0;
3784 } 3666 }
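
When kernel_pio() cannot complete the access on the in-kernel KVM_PIO_BUS, both PIO paths above fall back to userspace with a KVM_EXIT_IO exit, staging the data in the pio_data page at KVM_PIO_PAGE_OFFSET. A rough sketch of the matching userspace side (handle_io_exit and the all-ones IN data are illustrative assumptions):

/* Illustrative only: servicing a KVM_EXIT_IO exit in a userspace VMM. */
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>

static void handle_io_exit(struct kvm_run *run)
{
        /* The data lives inside the mmap'ed kvm_run area, at the offset
         * the kernel reported (KVM_PIO_PAGE_OFFSET * PAGE_SIZE). */
        uint8_t *data = (uint8_t *)run + run->io.data_offset;

        if (run->io.direction == KVM_EXIT_IO_OUT) {
                /* emulate_outb(run->io.port, data, ...) -- hypothetical */
        } else {
                /* fill 'data' for the guest's IN; here: all ones */
                memset(data, 0xff, (size_t)run->io.size * run->io.count);
        }
}
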
3785 3667
3786 static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg) 3668 static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
3787 { 3669 {
3788 return kvm_x86_ops->get_segment_base(vcpu, seg); 3670 return kvm_x86_ops->get_segment_base(vcpu, seg);
3789 } 3671 }
3790 3672
3791 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address) 3673 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address)
3792 { 3674 {
3793 kvm_mmu_invlpg(vcpu, address); 3675 kvm_mmu_invlpg(vcpu, address);
3794 return X86EMUL_CONTINUE; 3676 return X86EMUL_CONTINUE;
3795 } 3677 }
3796 3678
3797 int emulate_clts(struct kvm_vcpu *vcpu) 3679 int emulate_clts(struct kvm_vcpu *vcpu)
3798 { 3680 {
3799 kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS)); 3681 kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
3800 kvm_x86_ops->fpu_activate(vcpu); 3682 kvm_x86_ops->fpu_activate(vcpu);
3801 return X86EMUL_CONTINUE; 3683 return X86EMUL_CONTINUE;
3802 } 3684 }
3803 3685
3804 int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu) 3686 int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu)
3805 { 3687 {
3806 return _kvm_get_dr(vcpu, dr, dest); 3688 return _kvm_get_dr(vcpu, dr, dest);
3807 } 3689 }
3808 3690
3809 int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu) 3691 int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu)
3810 { 3692 {
3811 3693
3812 return __kvm_set_dr(vcpu, dr, value); 3694 return __kvm_set_dr(vcpu, dr, value);
3813 } 3695 }
3814 3696
3815 static u64 mk_cr_64(u64 curr_cr, u32 new_val) 3697 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
3816 { 3698 {
3817 return (curr_cr & ~((1ULL << 32) - 1)) | new_val; 3699 return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
3818 } 3700 }
3819 3701
3820 static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu) 3702 static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
3821 { 3703 {
3822 unsigned long value; 3704 unsigned long value;
3823 3705
3824 switch (cr) { 3706 switch (cr) {
3825 case 0: 3707 case 0:
3826 value = kvm_read_cr0(vcpu); 3708 value = kvm_read_cr0(vcpu);
3827 break; 3709 break;
3828 case 2: 3710 case 2:
3829 value = vcpu->arch.cr2; 3711 value = vcpu->arch.cr2;
3830 break; 3712 break;
3831 case 3: 3713 case 3:
3832 value = vcpu->arch.cr3; 3714 value = vcpu->arch.cr3;
3833 break; 3715 break;
3834 case 4: 3716 case 4:
3835 value = kvm_read_cr4(vcpu); 3717 value = kvm_read_cr4(vcpu);
3836 break; 3718 break;
3837 case 8: 3719 case 8:
3838 value = kvm_get_cr8(vcpu); 3720 value = kvm_get_cr8(vcpu);
3839 break; 3721 break;
3840 default: 3722 default:
3841 vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); 3723 vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
3842 return 0; 3724 return 0;
3843 } 3725 }
3844 3726
3845 return value; 3727 return value;
3846 } 3728 }
3847 3729
3848 static int emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu) 3730 static int emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
3849 { 3731 {
3850 int res = 0; 3732 int res = 0;
3851 3733
3852 switch (cr) { 3734 switch (cr) {
3853 case 0: 3735 case 0:
3854 res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); 3736 res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
3855 break; 3737 break;
3856 case 2: 3738 case 2:
3857 vcpu->arch.cr2 = val; 3739 vcpu->arch.cr2 = val;
3858 break; 3740 break;
3859 case 3: 3741 case 3:
3860 res = kvm_set_cr3(vcpu, val); 3742 res = kvm_set_cr3(vcpu, val);
3861 break; 3743 break;
3862 case 4: 3744 case 4:
3863 res = kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val)); 3745 res = kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val));
3864 break; 3746 break;
3865 case 8: 3747 case 8:
3866 res = __kvm_set_cr8(vcpu, val & 0xfUL); 3748 res = __kvm_set_cr8(vcpu, val & 0xfUL);
3867 break; 3749 break;
3868 default: 3750 default:
3869 vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); 3751 vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
3870 res = -1; 3752 res = -1;
3871 } 3753 }
3872 3754
3873 return res; 3755 return res;
3874 } 3756 }
3875 3757
3876 static int emulator_get_cpl(struct kvm_vcpu *vcpu) 3758 static int emulator_get_cpl(struct kvm_vcpu *vcpu)
3877 { 3759 {
3878 return kvm_x86_ops->get_cpl(vcpu); 3760 return kvm_x86_ops->get_cpl(vcpu);
3879 } 3761 }
3880 3762
3881 static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu) 3763 static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu)
3882 { 3764 {
3883 kvm_x86_ops->get_gdt(vcpu, dt); 3765 kvm_x86_ops->get_gdt(vcpu, dt);
3884 } 3766 }
3885 3767
3886 static unsigned long emulator_get_cached_segment_base(int seg, 3768 static unsigned long emulator_get_cached_segment_base(int seg,
3887 struct kvm_vcpu *vcpu) 3769 struct kvm_vcpu *vcpu)
3888 { 3770 {
3889 return get_segment_base(vcpu, seg); 3771 return get_segment_base(vcpu, seg);
3890 } 3772 }
3891 3773
3892 static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg, 3774 static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg,
3893 struct kvm_vcpu *vcpu) 3775 struct kvm_vcpu *vcpu)
3894 { 3776 {
3895 struct kvm_segment var; 3777 struct kvm_segment var;
3896 3778
3897 kvm_get_segment(vcpu, &var, seg); 3779 kvm_get_segment(vcpu, &var, seg);
3898 3780
3899 if (var.unusable) 3781 if (var.unusable)
3900 return false; 3782 return false;
3901 3783
3902 if (var.g) 3784 if (var.g)
3903 var.limit >>= 12; 3785 var.limit >>= 12;
3904 set_desc_limit(desc, var.limit); 3786 set_desc_limit(desc, var.limit);
3905 set_desc_base(desc, (unsigned long)var.base); 3787 set_desc_base(desc, (unsigned long)var.base);
3906 desc->type = var.type; 3788 desc->type = var.type;
3907 desc->s = var.s; 3789 desc->s = var.s;
3908 desc->dpl = var.dpl; 3790 desc->dpl = var.dpl;
3909 desc->p = var.present; 3791 desc->p = var.present;
3910 desc->avl = var.avl; 3792 desc->avl = var.avl;
3911 desc->l = var.l; 3793 desc->l = var.l;
3912 desc->d = var.db; 3794 desc->d = var.db;
3913 desc->g = var.g; 3795 desc->g = var.g;
3914 3796
3915 return true; 3797 return true;
3916 } 3798 }
3917 3799
3918 static void emulator_set_cached_descriptor(struct desc_struct *desc, int seg, 3800 static void emulator_set_cached_descriptor(struct desc_struct *desc, int seg,
3919 struct kvm_vcpu *vcpu) 3801 struct kvm_vcpu *vcpu)
3920 { 3802 {
3921 struct kvm_segment var; 3803 struct kvm_segment var;
3922 3804
3923 /* needed to preserve selector */ 3805 /* needed to preserve selector */
3924 kvm_get_segment(vcpu, &var, seg); 3806 kvm_get_segment(vcpu, &var, seg);
3925 3807
3926 var.base = get_desc_base(desc); 3808 var.base = get_desc_base(desc);
3927 var.limit = get_desc_limit(desc); 3809 var.limit = get_desc_limit(desc);
3928 if (desc->g) 3810 if (desc->g)
3929 var.limit = (var.limit << 12) | 0xfff; 3811 var.limit = (var.limit << 12) | 0xfff;
3930 var.type = desc->type; 3812 var.type = desc->type;
3931 var.present = desc->p; 3813 var.present = desc->p;
3932 var.dpl = desc->dpl; 3814 var.dpl = desc->dpl;
3933 var.db = desc->d; 3815 var.db = desc->d;
3934 var.s = desc->s; 3816 var.s = desc->s;
3935 var.l = desc->l; 3817 var.l = desc->l;
3936 var.g = desc->g; 3818 var.g = desc->g;
3937 var.avl = desc->avl; 3819 var.avl = desc->avl;
3938 var.present = desc->p; 3820 var.present = desc->p;
3939 var.unusable = !var.present; 3821 var.unusable = !var.present;
3940 var.padding = 0; 3822 var.padding = 0;
3941 3823
3942 kvm_set_segment(vcpu, &var, seg); 3824 kvm_set_segment(vcpu, &var, seg);
3943 return; 3825 return;
3944 } 3826 }
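
The cached-descriptor get/set pair scales the segment limit by the granularity bit: with g set, the 20-bit limit counts 4 KiB units, so it is shifted down by 12 on the way out and shifted back up and padded with 0xfff on the way in. A small check of that round trip, for illustration:

/* Illustrative only: granularity-bit scaling of a segment limit. */
#include <assert.h>
#include <stdint.h>

int main(void)
{
        uint32_t raw_limit = 0xfffff;                   /* 20-bit field */
        uint32_t effective = (raw_limit << 12) | 0xfff; /* g == 1 */

        assert(effective == 0xffffffff);                /* full 4 GiB - 1 */
        assert((effective >> 12) == raw_limit);         /* inverse step */
        return 0;
}
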
3945 3827
3946 static u16 emulator_get_segment_selector(int seg, struct kvm_vcpu *vcpu) 3828 static u16 emulator_get_segment_selector(int seg, struct kvm_vcpu *vcpu)
3947 { 3829 {
3948 struct kvm_segment kvm_seg; 3830 struct kvm_segment kvm_seg;
3949 3831
3950 kvm_get_segment(vcpu, &kvm_seg, seg); 3832 kvm_get_segment(vcpu, &kvm_seg, seg);
3951 return kvm_seg.selector; 3833 return kvm_seg.selector;
3952 } 3834 }
3953 3835
3954 static void emulator_set_segment_selector(u16 sel, int seg, 3836 static void emulator_set_segment_selector(u16 sel, int seg,
3955 struct kvm_vcpu *vcpu) 3837 struct kvm_vcpu *vcpu)
3956 { 3838 {
3957 struct kvm_segment kvm_seg; 3839 struct kvm_segment kvm_seg;
3958 3840
3959 kvm_get_segment(vcpu, &kvm_seg, seg); 3841 kvm_get_segment(vcpu, &kvm_seg, seg);
3960 kvm_seg.selector = sel; 3842 kvm_seg.selector = sel;
3961 kvm_set_segment(vcpu, &kvm_seg, seg); 3843 kvm_set_segment(vcpu, &kvm_seg, seg);
3962 } 3844 }
3963 3845
3964 static struct x86_emulate_ops emulate_ops = { 3846 static struct x86_emulate_ops emulate_ops = {
3965 .read_std = kvm_read_guest_virt_system, 3847 .read_std = kvm_read_guest_virt_system,
3966 .write_std = kvm_write_guest_virt_system, 3848 .write_std = kvm_write_guest_virt_system,
3967 .fetch = kvm_fetch_guest_virt, 3849 .fetch = kvm_fetch_guest_virt,
3968 .read_emulated = emulator_read_emulated, 3850 .read_emulated = emulator_read_emulated,
3969 .write_emulated = emulator_write_emulated, 3851 .write_emulated = emulator_write_emulated,
3970 .cmpxchg_emulated = emulator_cmpxchg_emulated, 3852 .cmpxchg_emulated = emulator_cmpxchg_emulated,
3971 .pio_in_emulated = emulator_pio_in_emulated, 3853 .pio_in_emulated = emulator_pio_in_emulated,
3972 .pio_out_emulated = emulator_pio_out_emulated, 3854 .pio_out_emulated = emulator_pio_out_emulated,
3973 .get_cached_descriptor = emulator_get_cached_descriptor, 3855 .get_cached_descriptor = emulator_get_cached_descriptor,
3974 .set_cached_descriptor = emulator_set_cached_descriptor, 3856 .set_cached_descriptor = emulator_set_cached_descriptor,
3975 .get_segment_selector = emulator_get_segment_selector, 3857 .get_segment_selector = emulator_get_segment_selector,
3976 .set_segment_selector = emulator_set_segment_selector, 3858 .set_segment_selector = emulator_set_segment_selector,
3977 .get_cached_segment_base = emulator_get_cached_segment_base, 3859 .get_cached_segment_base = emulator_get_cached_segment_base,
3978 .get_gdt = emulator_get_gdt, 3860 .get_gdt = emulator_get_gdt,
3979 .get_cr = emulator_get_cr, 3861 .get_cr = emulator_get_cr,
3980 .set_cr = emulator_set_cr, 3862 .set_cr = emulator_set_cr,
3981 .cpl = emulator_get_cpl, 3863 .cpl = emulator_get_cpl,
3982 .get_dr = emulator_get_dr, 3864 .get_dr = emulator_get_dr,
3983 .set_dr = emulator_set_dr, 3865 .set_dr = emulator_set_dr,
3984 .set_msr = kvm_set_msr, 3866 .set_msr = kvm_set_msr,
3985 .get_msr = kvm_get_msr, 3867 .get_msr = kvm_get_msr,
3986 }; 3868 };
3987 3869
3988 static void cache_all_regs(struct kvm_vcpu *vcpu) 3870 static void cache_all_regs(struct kvm_vcpu *vcpu)
3989 { 3871 {
3990 kvm_register_read(vcpu, VCPU_REGS_RAX); 3872 kvm_register_read(vcpu, VCPU_REGS_RAX);
3991 kvm_register_read(vcpu, VCPU_REGS_RSP); 3873 kvm_register_read(vcpu, VCPU_REGS_RSP);
3992 kvm_register_read(vcpu, VCPU_REGS_RIP); 3874 kvm_register_read(vcpu, VCPU_REGS_RIP);
3993 vcpu->arch.regs_dirty = ~0; 3875 vcpu->arch.regs_dirty = ~0;
3994 } 3876 }
3995 3877
3996 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) 3878 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
3997 { 3879 {
3998 u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask); 3880 u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask);
3999 /* 3881 /*
4000 * an sti; sti; sequence only disables interrupts for the first 3882 * an sti; sti; sequence only disables interrupts for the first
4001 * instruction. So, if the last instruction, be it emulated or 3883 * instruction. So, if the last instruction, be it emulated or
4002 * not, left the system with the INT_STI flag enabled, it 3884 * not, left the system with the INT_STI flag enabled, it
4003 * means that the last instruction is an sti. We should not 3885 * means that the last instruction is an sti. We should not
4004 * leave the flag on in this case. The same goes for mov ss 3886 * leave the flag on in this case. The same goes for mov ss
4005 */ 3887 */
4006 if (!(int_shadow & mask)) 3888 if (!(int_shadow & mask))
4007 kvm_x86_ops->set_interrupt_shadow(vcpu, mask); 3889 kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
4008 } 3890 }
4009 3891
4010 static void inject_emulated_exception(struct kvm_vcpu *vcpu) 3892 static void inject_emulated_exception(struct kvm_vcpu *vcpu)
4011 { 3893 {
4012 struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt; 3894 struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
4013 if (ctxt->exception == PF_VECTOR) 3895 if (ctxt->exception == PF_VECTOR)
4014 kvm_inject_page_fault(vcpu, ctxt->cr2, ctxt->error_code); 3896 kvm_inject_page_fault(vcpu, ctxt->cr2, ctxt->error_code);
4015 else if (ctxt->error_code_valid) 3897 else if (ctxt->error_code_valid)
4016 kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code); 3898 kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code);
4017 else 3899 else
4018 kvm_queue_exception(vcpu, ctxt->exception); 3900 kvm_queue_exception(vcpu, ctxt->exception);
4019 } 3901 }
4020 3902
4021 static int handle_emulation_failure(struct kvm_vcpu *vcpu) 3903 static int handle_emulation_failure(struct kvm_vcpu *vcpu)
4022 { 3904 {
4023 ++vcpu->stat.insn_emulation_fail; 3905 ++vcpu->stat.insn_emulation_fail;
4024 trace_kvm_emulate_insn_failed(vcpu); 3906 trace_kvm_emulate_insn_failed(vcpu);
4025 vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 3907 vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
4026 vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 3908 vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
4027 vcpu->run->internal.ndata = 0; 3909 vcpu->run->internal.ndata = 0;
4028 kvm_queue_exception(vcpu, UD_VECTOR); 3910 kvm_queue_exception(vcpu, UD_VECTOR);
4029 return EMULATE_FAIL; 3911 return EMULATE_FAIL;
4030 } 3912 }
4031 3913
4032 int emulate_instruction(struct kvm_vcpu *vcpu, 3914 int emulate_instruction(struct kvm_vcpu *vcpu,
4033 unsigned long cr2, 3915 unsigned long cr2,
4034 u16 error_code, 3916 u16 error_code,
4035 int emulation_type) 3917 int emulation_type)
4036 { 3918 {
4037 int r; 3919 int r;
4038 struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; 3920 struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
4039 3921
4040 kvm_clear_exception_queue(vcpu); 3922 kvm_clear_exception_queue(vcpu);
4041 vcpu->arch.mmio_fault_cr2 = cr2; 3923 vcpu->arch.mmio_fault_cr2 = cr2;
4042 /* 3924 /*
4043 * TODO: fix emulate.c to use guest_read/write_register 3925 * TODO: fix emulate.c to use guest_read/write_register
4044 * instead of direct ->regs accesses, can save a hundred cycles 3926 * instead of direct ->regs accesses, can save a hundred cycles
4045 * on Intel for instructions that don't read/change RSP, for 3927 * on Intel for instructions that don't read/change RSP, for
4046 * example. 3928 * example.
4047 */ 3929 */
4048 cache_all_regs(vcpu); 3930 cache_all_regs(vcpu);
4049 3931
4050 if (!(emulation_type & EMULTYPE_NO_DECODE)) { 3932 if (!(emulation_type & EMULTYPE_NO_DECODE)) {
4051 int cs_db, cs_l; 3933 int cs_db, cs_l;
4052 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); 3934 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
4053 3935
4054 vcpu->arch.emulate_ctxt.vcpu = vcpu; 3936 vcpu->arch.emulate_ctxt.vcpu = vcpu;
4055 vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu); 3937 vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
4056 vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); 3938 vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
4057 vcpu->arch.emulate_ctxt.mode = 3939 vcpu->arch.emulate_ctxt.mode =
4058 (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL : 3940 (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
4059 (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM) 3941 (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
4060 ? X86EMUL_MODE_VM86 : cs_l 3942 ? X86EMUL_MODE_VM86 : cs_l
4061 ? X86EMUL_MODE_PROT64 : cs_db 3943 ? X86EMUL_MODE_PROT64 : cs_db
4062 ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; 3944 ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
4063 memset(c, 0, sizeof(struct decode_cache)); 3945 memset(c, 0, sizeof(struct decode_cache));
4064 memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); 3946 memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
4065 vcpu->arch.emulate_ctxt.interruptibility = 0; 3947 vcpu->arch.emulate_ctxt.interruptibility = 0;
4066 vcpu->arch.emulate_ctxt.exception = -1; 3948 vcpu->arch.emulate_ctxt.exception = -1;
4067 3949
4068 r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); 3950 r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
4069 trace_kvm_emulate_insn_start(vcpu); 3951 trace_kvm_emulate_insn_start(vcpu);
4070 3952
4071 /* Only allow emulation of specific instructions on #UD 3953 /* Only allow emulation of specific instructions on #UD
4072 * (namely VMMCALL, sysenter, sysexit, syscall) */ 3954 * (namely VMMCALL, sysenter, sysexit, syscall) */
4073 if (emulation_type & EMULTYPE_TRAP_UD) { 3955 if (emulation_type & EMULTYPE_TRAP_UD) {
4074 if (!c->twobyte) 3956 if (!c->twobyte)
4075 return EMULATE_FAIL; 3957 return EMULATE_FAIL;
4076 switch (c->b) { 3958 switch (c->b) {
4077 case 0x01: /* VMMCALL */ 3959 case 0x01: /* VMMCALL */
4078 if (c->modrm_mod != 3 || c->modrm_rm != 1) 3960 if (c->modrm_mod != 3 || c->modrm_rm != 1)
4079 return EMULATE_FAIL; 3961 return EMULATE_FAIL;
4080 break; 3962 break;
4081 case 0x34: /* sysenter */ 3963 case 0x34: /* sysenter */
4082 case 0x35: /* sysexit */ 3964 case 0x35: /* sysexit */
4083 if (c->modrm_mod != 0 || c->modrm_rm != 0) 3965 if (c->modrm_mod != 0 || c->modrm_rm != 0)
4084 return EMULATE_FAIL; 3966 return EMULATE_FAIL;
4085 break; 3967 break;
4086 case 0x05: /* syscall */ 3968 case 0x05: /* syscall */
4087 if (c->modrm_mod != 0 || c->modrm_rm != 0) 3969 if (c->modrm_mod != 0 || c->modrm_rm != 0)
4088 return EMULATE_FAIL; 3970 return EMULATE_FAIL;
4089 break; 3971 break;
4090 default: 3972 default:
4091 return EMULATE_FAIL; 3973 return EMULATE_FAIL;
4092 } 3974 }
4093 3975
4094 if (!(c->modrm_reg == 0 || c->modrm_reg == 3)) 3976 if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
4095 return EMULATE_FAIL; 3977 return EMULATE_FAIL;
4096 } 3978 }
4097 3979
4098 ++vcpu->stat.insn_emulation; 3980 ++vcpu->stat.insn_emulation;
4099 if (r) { 3981 if (r) {
4100 if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) 3982 if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
4101 return EMULATE_DONE; 3983 return EMULATE_DONE;
4102 if (emulation_type & EMULTYPE_SKIP) 3984 if (emulation_type & EMULTYPE_SKIP)
4103 return EMULATE_FAIL; 3985 return EMULATE_FAIL;
4104 return handle_emulation_failure(vcpu); 3986 return handle_emulation_failure(vcpu);
4105 } 3987 }
4106 } 3988 }
4107 3989
4108 if (emulation_type & EMULTYPE_SKIP) { 3990 if (emulation_type & EMULTYPE_SKIP) {
4109 kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip); 3991 kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
4110 return EMULATE_DONE; 3992 return EMULATE_DONE;
4111 } 3993 }
4112 3994
4113 /* this is needed for the vmware backdoor interface to work since it 3995 /* this is needed for the vmware backdoor interface to work since it
4114 changes register values during the IO operation */ 3996 changes register values during the IO operation */
4115 memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); 3997 memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
4116 3998
4117 restart: 3999 restart:
4118 r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); 4000 r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
4119 4001
4120 if (r) { /* emulation failed */ 4002 if (r) { /* emulation failed */
4121 /* 4003 /*
4122 * if emulation was due to access to a shadowed page table 4004 * if emulation was due to access to a shadowed page table
4123 * and it failed, try to unshadow the page and re-enter the 4005 * and it failed, try to unshadow the page and re-enter the
4124 * guest to let the CPU execute the instruction. 4006 * guest to let the CPU execute the instruction.
4125 */ 4007 */
4126 if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) 4008 if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
4127 return EMULATE_DONE; 4009 return EMULATE_DONE;
4128 4010
4129 return handle_emulation_failure(vcpu); 4011 return handle_emulation_failure(vcpu);
4130 } 4012 }
4131 4013
4132 toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility); 4014 toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
4133 kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); 4015 kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
4134 memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); 4016 memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
4135 kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); 4017 kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
4136 4018
4137 if (vcpu->arch.emulate_ctxt.exception >= 0) { 4019 if (vcpu->arch.emulate_ctxt.exception >= 0) {
4138 inject_emulated_exception(vcpu); 4020 inject_emulated_exception(vcpu);
4139 return EMULATE_DONE; 4021 return EMULATE_DONE;
4140 } 4022 }
4141 4023
4142 if (vcpu->arch.pio.count) { 4024 if (vcpu->arch.pio.count) {
4143 if (!vcpu->arch.pio.in) 4025 if (!vcpu->arch.pio.in)
4144 vcpu->arch.pio.count = 0; 4026 vcpu->arch.pio.count = 0;
4145 return EMULATE_DO_MMIO; 4027 return EMULATE_DO_MMIO;
4146 } 4028 }
4147 4029
4148 if (vcpu->mmio_needed) { 4030 if (vcpu->mmio_needed) {
4149 if (vcpu->mmio_is_write) 4031 if (vcpu->mmio_is_write)
4150 vcpu->mmio_needed = 0; 4032 vcpu->mmio_needed = 0;
4151 return EMULATE_DO_MMIO; 4033 return EMULATE_DO_MMIO;
4152 } 4034 }
4153 4035
4154 if (vcpu->arch.emulate_ctxt.restart) 4036 if (vcpu->arch.emulate_ctxt.restart)
4155 goto restart; 4037 goto restart;
4156 4038
4157 return EMULATE_DONE; 4039 return EMULATE_DONE;
4158 } 4040 }
4159 EXPORT_SYMBOL_GPL(emulate_instruction); 4041 EXPORT_SYMBOL_GPL(emulate_instruction);
4160 4042
4161 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port) 4043 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port)
4162 { 4044 {
4163 unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); 4045 unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX);
4164 int ret = emulator_pio_out_emulated(size, port, &val, 1, vcpu); 4046 int ret = emulator_pio_out_emulated(size, port, &val, 1, vcpu);
4165 /* do not return to the emulator after returning from userspace */ 4047 /* do not return to the emulator after returning from userspace */
4166 vcpu->arch.pio.count = 0; 4048 vcpu->arch.pio.count = 0;
4167 return ret; 4049 return ret;
4168 } 4050 }
4169 EXPORT_SYMBOL_GPL(kvm_fast_pio_out); 4051 EXPORT_SYMBOL_GPL(kvm_fast_pio_out);
4170 4052
4171 static void bounce_off(void *info) 4053 static void bounce_off(void *info)
4172 { 4054 {
4173 /* nothing */ 4055 /* nothing */
4174 } 4056 }
4175 4057
4176 static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long val, 4058 static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
4177 void *data) 4059 void *data)
4178 { 4060 {
4179 struct cpufreq_freqs *freq = data; 4061 struct cpufreq_freqs *freq = data;
4180 struct kvm *kvm; 4062 struct kvm *kvm;
4181 struct kvm_vcpu *vcpu; 4063 struct kvm_vcpu *vcpu;
4182 int i, send_ipi = 0; 4064 int i, send_ipi = 0;
4183 4065
4184 if (val == CPUFREQ_PRECHANGE && freq->old > freq->new) 4066 if (val == CPUFREQ_PRECHANGE && freq->old > freq->new)
4185 return 0; 4067 return 0;
4186 if (val == CPUFREQ_POSTCHANGE && freq->old < freq->new) 4068 if (val == CPUFREQ_POSTCHANGE && freq->old < freq->new)
4187 return 0; 4069 return 0;
4188 per_cpu(cpu_tsc_khz, freq->cpu) = freq->new; 4070 per_cpu(cpu_tsc_khz, freq->cpu) = freq->new;
4189 4071
4190 spin_lock(&kvm_lock); 4072 spin_lock(&kvm_lock);
4191 list_for_each_entry(kvm, &vm_list, vm_list) { 4073 list_for_each_entry(kvm, &vm_list, vm_list) {
4192 kvm_for_each_vcpu(i, vcpu, kvm) { 4074 kvm_for_each_vcpu(i, vcpu, kvm) {
4193 if (vcpu->cpu != freq->cpu) 4075 if (vcpu->cpu != freq->cpu)
4194 continue; 4076 continue;
4195 if (!kvm_request_guest_time_update(vcpu)) 4077 if (!kvm_request_guest_time_update(vcpu))
4196 continue; 4078 continue;
4197 if (vcpu->cpu != smp_processor_id()) 4079 if (vcpu->cpu != smp_processor_id())
4198 send_ipi++; 4080 send_ipi++;
4199 } 4081 }
4200 } 4082 }
4201 spin_unlock(&kvm_lock); 4083 spin_unlock(&kvm_lock);
4202 4084
4203 if (freq->old < freq->new && send_ipi) { 4085 if (freq->old < freq->new && send_ipi) {
4204 /* 4086 /*
4205 * We upscale the frequency. We must make sure the guest 4087 * We upscale the frequency. We must make sure the guest
4206 * doesn't see old kvmclock values while running with 4088 * doesn't see old kvmclock values while running with
4207 * the new frequency, otherwise we risk that the guest sees 4089 * the new frequency, otherwise we risk that the guest sees
4208 * time go backwards. 4090 * time go backwards.
4209 * 4091 *
4210 * In case we update the frequency for another cpu 4092 * In case we update the frequency for another cpu
4211 * (which might be in guest context) send an interrupt 4093 * (which might be in guest context) send an interrupt
4212 * to kick the cpu out of guest context. Next time 4094 * to kick the cpu out of guest context. Next time
4213 * guest context is entered kvmclock will be updated, 4095 * guest context is entered kvmclock will be updated,
4214 * so the guest will not see stale values. 4096 * so the guest will not see stale values.
4215 */ 4097 */
4216 smp_call_function_single(freq->cpu, bounce_off, NULL, 1); 4098 smp_call_function_single(freq->cpu, bounce_off, NULL, 1);
4217 } 4099 }
4218 return 0; 4100 return 0;
4219 } 4101 }
4220 4102
4221 static struct notifier_block kvmclock_cpufreq_notifier_block = { 4103 static struct notifier_block kvmclock_cpufreq_notifier_block = {
4222 .notifier_call = kvmclock_cpufreq_notifier 4104 .notifier_call = kvmclock_cpufreq_notifier
4223 }; 4105 };
4224 4106
4225 static void kvm_timer_init(void) 4107 static void kvm_timer_init(void)
4226 { 4108 {
4227 int cpu; 4109 int cpu;
4228 4110
4229 if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { 4111 if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
4230 cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block, 4112 cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
4231 CPUFREQ_TRANSITION_NOTIFIER); 4113 CPUFREQ_TRANSITION_NOTIFIER);
4232 for_each_online_cpu(cpu) { 4114 for_each_online_cpu(cpu) {
4233 unsigned long khz = cpufreq_get(cpu); 4115 unsigned long khz = cpufreq_get(cpu);
4234 if (!khz) 4116 if (!khz)
4235 khz = tsc_khz; 4117 khz = tsc_khz;
4236 per_cpu(cpu_tsc_khz, cpu) = khz; 4118 per_cpu(cpu_tsc_khz, cpu) = khz;
4237 } 4119 }
4238 } else { 4120 } else {
4239 for_each_possible_cpu(cpu) 4121 for_each_possible_cpu(cpu)
4240 per_cpu(cpu_tsc_khz, cpu) = tsc_khz; 4122 per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
4241 } 4123 }
4242 } 4124 }
4243 4125
4244 static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu); 4126 static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
4245 4127
4246 static int kvm_is_in_guest(void) 4128 static int kvm_is_in_guest(void)
4247 { 4129 {
4248 return percpu_read(current_vcpu) != NULL; 4130 return percpu_read(current_vcpu) != NULL;
4249 } 4131 }
4250 4132
4251 static int kvm_is_user_mode(void) 4133 static int kvm_is_user_mode(void)
4252 { 4134 {
4253 int user_mode = 3; 4135 int user_mode = 3;
4254 4136
4255 if (percpu_read(current_vcpu)) 4137 if (percpu_read(current_vcpu))
4256 user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu)); 4138 user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
4257 4139
4258 return user_mode != 0; 4140 return user_mode != 0;
4259 } 4141 }
4260 4142
4261 static unsigned long kvm_get_guest_ip(void) 4143 static unsigned long kvm_get_guest_ip(void)
4262 { 4144 {
4263 unsigned long ip = 0; 4145 unsigned long ip = 0;
4264 4146
4265 if (percpu_read(current_vcpu)) 4147 if (percpu_read(current_vcpu))
4266 ip = kvm_rip_read(percpu_read(current_vcpu)); 4148 ip = kvm_rip_read(percpu_read(current_vcpu));
4267 4149
4268 return ip; 4150 return ip;
4269 } 4151 }
4270 4152
4271 static struct perf_guest_info_callbacks kvm_guest_cbs = { 4153 static struct perf_guest_info_callbacks kvm_guest_cbs = {
4272 .is_in_guest = kvm_is_in_guest, 4154 .is_in_guest = kvm_is_in_guest,
4273 .is_user_mode = kvm_is_user_mode, 4155 .is_user_mode = kvm_is_user_mode,
4274 .get_guest_ip = kvm_get_guest_ip, 4156 .get_guest_ip = kvm_get_guest_ip,
4275 }; 4157 };
4276 4158
4277 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu) 4159 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu)
4278 { 4160 {
4279 percpu_write(current_vcpu, vcpu); 4161 percpu_write(current_vcpu, vcpu);
4280 } 4162 }
4281 EXPORT_SYMBOL_GPL(kvm_before_handle_nmi); 4163 EXPORT_SYMBOL_GPL(kvm_before_handle_nmi);
4282 4164
4283 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu) 4165 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu)
4284 { 4166 {
4285 percpu_write(current_vcpu, NULL); 4167 percpu_write(current_vcpu, NULL);
4286 } 4168 }
4287 EXPORT_SYMBOL_GPL(kvm_after_handle_nmi); 4169 EXPORT_SYMBOL_GPL(kvm_after_handle_nmi);
4288 4170
4289 int kvm_arch_init(void *opaque) 4171 int kvm_arch_init(void *opaque)
4290 { 4172 {
4291 int r; 4173 int r;
4292 struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque; 4174 struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque;
4293 4175
4294 if (kvm_x86_ops) { 4176 if (kvm_x86_ops) {
4295 printk(KERN_ERR "kvm: already loaded the other module\n"); 4177 printk(KERN_ERR "kvm: already loaded the other module\n");
4296 r = -EEXIST; 4178 r = -EEXIST;
4297 goto out; 4179 goto out;
4298 } 4180 }
4299 4181
4300 if (!ops->cpu_has_kvm_support()) { 4182 if (!ops->cpu_has_kvm_support()) {
4301 printk(KERN_ERR "kvm: no hardware support\n"); 4183 printk(KERN_ERR "kvm: no hardware support\n");
4302 r = -EOPNOTSUPP; 4184 r = -EOPNOTSUPP;
4303 goto out; 4185 goto out;
4304 } 4186 }
4305 if (ops->disabled_by_bios()) { 4187 if (ops->disabled_by_bios()) {
4306 printk(KERN_ERR "kvm: disabled by bios\n"); 4188 printk(KERN_ERR "kvm: disabled by bios\n");
4307 r = -EOPNOTSUPP; 4189 r = -EOPNOTSUPP;
4308 goto out; 4190 goto out;
4309 } 4191 }
4310 4192
4311 r = kvm_mmu_module_init(); 4193 r = kvm_mmu_module_init();
4312 if (r) 4194 if (r)
4313 goto out; 4195 goto out;
4314 4196
4315 kvm_init_msr_list(); 4197 kvm_init_msr_list();
4316 4198
4317 kvm_x86_ops = ops; 4199 kvm_x86_ops = ops;
4318 kvm_mmu_set_nonpresent_ptes(0ull, 0ull); 4200 kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
4319 kvm_mmu_set_base_ptes(PT_PRESENT_MASK); 4201 kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
4320 kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK, 4202 kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
4321 PT_DIRTY_MASK, PT64_NX_MASK, 0); 4203 PT_DIRTY_MASK, PT64_NX_MASK, 0);
4322 4204
4323 kvm_timer_init(); 4205 kvm_timer_init();
4324 4206
4325 perf_register_guest_info_callbacks(&kvm_guest_cbs); 4207 perf_register_guest_info_callbacks(&kvm_guest_cbs);
4326 4208
4327 if (cpu_has_xsave) 4209 if (cpu_has_xsave)
4328 host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); 4210 host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
4329 4211
4330 return 0; 4212 return 0;
4331 4213
4332 out: 4214 out:
4333 return r; 4215 return r;
4334 } 4216 }
4335 4217
4336 void kvm_arch_exit(void) 4218 void kvm_arch_exit(void)
4337 { 4219 {
4338 perf_unregister_guest_info_callbacks(&kvm_guest_cbs); 4220 perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
4339 4221
4340 if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) 4222 if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
4341 cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block, 4223 cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
4342 CPUFREQ_TRANSITION_NOTIFIER); 4224 CPUFREQ_TRANSITION_NOTIFIER);
4343 kvm_x86_ops = NULL; 4225 kvm_x86_ops = NULL;
4344 kvm_mmu_module_exit(); 4226 kvm_mmu_module_exit();
4345 } 4227 }
4346 4228
4347 int kvm_emulate_halt(struct kvm_vcpu *vcpu) 4229 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
4348 { 4230 {
4349 ++vcpu->stat.halt_exits; 4231 ++vcpu->stat.halt_exits;
4350 if (irqchip_in_kernel(vcpu->kvm)) { 4232 if (irqchip_in_kernel(vcpu->kvm)) {
4351 vcpu->arch.mp_state = KVM_MP_STATE_HALTED; 4233 vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
4352 return 1; 4234 return 1;
4353 } else { 4235 } else {
4354 vcpu->run->exit_reason = KVM_EXIT_HLT; 4236 vcpu->run->exit_reason = KVM_EXIT_HLT;
4355 return 0; 4237 return 0;
4356 } 4238 }
4357 } 4239 }
4358 EXPORT_SYMBOL_GPL(kvm_emulate_halt); 4240 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
4359 4241
4360 static inline gpa_t hc_gpa(struct kvm_vcpu *vcpu, unsigned long a0, 4242 static inline gpa_t hc_gpa(struct kvm_vcpu *vcpu, unsigned long a0,
4361 unsigned long a1) 4243 unsigned long a1)
4362 { 4244 {
4363 if (is_long_mode(vcpu)) 4245 if (is_long_mode(vcpu))
4364 return a0; 4246 return a0;
4365 else 4247 else
4366 return a0 | ((gpa_t)a1 << 32); 4248 return a0 | ((gpa_t)a1 << 32);
4367 } 4249 }
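
A worked example of the combination above (illustrative values only, not part of this commit): when the guest is not in long mode, hc_gpa() merges the two 32-bit registers into a single 64-bit guest physical address.

/* Illustration only: mirror hc_gpa()'s non-long-mode combination with
 * hypothetical values; u64 stands in for the kernel's gpa_t here. */
static inline u64 example_hc_gpa_32bit(void)
{
	unsigned long a0 = 0x2000;	/* low 32 bits of the gpa  */
	unsigned long a1 = 0x1;		/* high 32 bits of the gpa */

	return a0 | ((u64)a1 << 32);	/* == 0x100002000 */
}
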
4368 4250
4369 int kvm_hv_hypercall(struct kvm_vcpu *vcpu) 4251 int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
4370 { 4252 {
4371 u64 param, ingpa, outgpa, ret; 4253 u64 param, ingpa, outgpa, ret;
4372 uint16_t code, rep_idx, rep_cnt, res = HV_STATUS_SUCCESS, rep_done = 0; 4254 uint16_t code, rep_idx, rep_cnt, res = HV_STATUS_SUCCESS, rep_done = 0;
4373 bool fast, longmode; 4255 bool fast, longmode;
4374 int cs_db, cs_l; 4256 int cs_db, cs_l;
4375 4257
4376 /* 4258 /*
4377 * the hypercall generates a #UD from non-zero CPL or real mode, 4259 * the hypercall generates a #UD from non-zero CPL or real mode,
4378 * per the Hyper-V spec 4260 * per the Hyper-V spec
4379 */ 4261 */
4380 if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) { 4262 if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) {
4381 kvm_queue_exception(vcpu, UD_VECTOR); 4263 kvm_queue_exception(vcpu, UD_VECTOR);
4382 return 0; 4264 return 0;
4383 } 4265 }
4384 4266
4385 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); 4267 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
4386 longmode = is_long_mode(vcpu) && cs_l == 1; 4268 longmode = is_long_mode(vcpu) && cs_l == 1;
4387 4269
4388 if (!longmode) { 4270 if (!longmode) {
4389 param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) | 4271 param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) |
4390 (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff); 4272 (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff);
4391 ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) | 4273 ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) |
4392 (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff); 4274 (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff);
4393 outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) | 4275 outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) |
4394 (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff); 4276 (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff);
4395 } 4277 }
4396 #ifdef CONFIG_X86_64 4278 #ifdef CONFIG_X86_64
4397 else { 4279 else {
4398 param = kvm_register_read(vcpu, VCPU_REGS_RCX); 4280 param = kvm_register_read(vcpu, VCPU_REGS_RCX);
4399 ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX); 4281 ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX);
4400 outgpa = kvm_register_read(vcpu, VCPU_REGS_R8); 4282 outgpa = kvm_register_read(vcpu, VCPU_REGS_R8);
4401 } 4283 }
4402 #endif 4284 #endif
4403 4285
4404 code = param & 0xffff; 4286 code = param & 0xffff;
4405 fast = (param >> 16) & 0x1; 4287 fast = (param >> 16) & 0x1;
4406 rep_cnt = (param >> 32) & 0xfff; 4288 rep_cnt = (param >> 32) & 0xfff;
4407 rep_idx = (param >> 48) & 0xfff; 4289 rep_idx = (param >> 48) & 0xfff;
4408 4290
4409 trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa); 4291 trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa);
4410 4292
4411 switch (code) { 4293 switch (code) {
4412 case HV_X64_HV_NOTIFY_LONG_SPIN_WAIT: 4294 case HV_X64_HV_NOTIFY_LONG_SPIN_WAIT:
4413 kvm_vcpu_on_spin(vcpu); 4295 kvm_vcpu_on_spin(vcpu);
4414 break; 4296 break;
4415 default: 4297 default:
4416 res = HV_STATUS_INVALID_HYPERCALL_CODE; 4298 res = HV_STATUS_INVALID_HYPERCALL_CODE;
4417 break; 4299 break;
4418 } 4300 }
4419 4301
4420 ret = res | (((u64)rep_done & 0xfff) << 32); 4302 ret = res | (((u64)rep_done & 0xfff) << 32);
4421 if (longmode) { 4303 if (longmode) {
4422 kvm_register_write(vcpu, VCPU_REGS_RAX, ret); 4304 kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
4423 } else { 4305 } else {
4424 kvm_register_write(vcpu, VCPU_REGS_RDX, ret >> 32); 4306 kvm_register_write(vcpu, VCPU_REGS_RDX, ret >> 32);
4425 kvm_register_write(vcpu, VCPU_REGS_RAX, ret & 0xffffffff); 4307 kvm_register_write(vcpu, VCPU_REGS_RAX, ret & 0xffffffff);
4426 } 4308 }
4427 4309
4428 return 1; 4310 return 1;
4429 } 4311 }
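
For reference, a minimal sketch of how the input value decoded at the top of kvm_hv_hypercall() is laid out; the helper name is hypothetical and only mirrors the shifts and masks used above.

/* Illustration only: pack a Hyper-V hypercall input value the same way
 * kvm_hv_hypercall() unpacks it (call code in bits 0-15, the "fast"
 * flag in bit 16, rep count in bits 32-43, rep start index in
 * bits 48-59). */
static inline u64 example_hv_hypercall_input(u16 code, bool fast,
					     u16 rep_cnt, u16 rep_idx)
{
	return (u64)code |
	       ((u64)fast << 16) |
	       (((u64)rep_cnt & 0xfff) << 32) |
	       (((u64)rep_idx & 0xfff) << 48);
}
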
4430 4312
4431 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) 4313 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
4432 { 4314 {
4433 unsigned long nr, a0, a1, a2, a3, ret; 4315 unsigned long nr, a0, a1, a2, a3, ret;
4434 int r = 1; 4316 int r = 1;
4435 4317
4436 if (kvm_hv_hypercall_enabled(vcpu->kvm)) 4318 if (kvm_hv_hypercall_enabled(vcpu->kvm))
4437 return kvm_hv_hypercall(vcpu); 4319 return kvm_hv_hypercall(vcpu);
4438 4320
4439 nr = kvm_register_read(vcpu, VCPU_REGS_RAX); 4321 nr = kvm_register_read(vcpu, VCPU_REGS_RAX);
4440 a0 = kvm_register_read(vcpu, VCPU_REGS_RBX); 4322 a0 = kvm_register_read(vcpu, VCPU_REGS_RBX);
4441 a1 = kvm_register_read(vcpu, VCPU_REGS_RCX); 4323 a1 = kvm_register_read(vcpu, VCPU_REGS_RCX);
4442 a2 = kvm_register_read(vcpu, VCPU_REGS_RDX); 4324 a2 = kvm_register_read(vcpu, VCPU_REGS_RDX);
4443 a3 = kvm_register_read(vcpu, VCPU_REGS_RSI); 4325 a3 = kvm_register_read(vcpu, VCPU_REGS_RSI);
4444 4326
4445 trace_kvm_hypercall(nr, a0, a1, a2, a3); 4327 trace_kvm_hypercall(nr, a0, a1, a2, a3);
4446 4328
4447 if (!is_long_mode(vcpu)) { 4329 if (!is_long_mode(vcpu)) {
4448 nr &= 0xFFFFFFFF; 4330 nr &= 0xFFFFFFFF;
4449 a0 &= 0xFFFFFFFF; 4331 a0 &= 0xFFFFFFFF;
4450 a1 &= 0xFFFFFFFF; 4332 a1 &= 0xFFFFFFFF;
4451 a2 &= 0xFFFFFFFF; 4333 a2 &= 0xFFFFFFFF;
4452 a3 &= 0xFFFFFFFF; 4334 a3 &= 0xFFFFFFFF;
4453 } 4335 }
4454 4336
4455 if (kvm_x86_ops->get_cpl(vcpu) != 0) { 4337 if (kvm_x86_ops->get_cpl(vcpu) != 0) {
4456 ret = -KVM_EPERM; 4338 ret = -KVM_EPERM;
4457 goto out; 4339 goto out;
4458 } 4340 }
4459 4341
4460 switch (nr) { 4342 switch (nr) {
4461 case KVM_HC_VAPIC_POLL_IRQ: 4343 case KVM_HC_VAPIC_POLL_IRQ:
4462 ret = 0; 4344 ret = 0;
4463 break; 4345 break;
4464 case KVM_HC_MMU_OP: 4346 case KVM_HC_MMU_OP:
4465 r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret); 4347 r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
4466 break; 4348 break;
4467 default: 4349 default:
4468 ret = -KVM_ENOSYS; 4350 ret = -KVM_ENOSYS;
4469 break; 4351 break;
4470 } 4352 }
4471 out: 4353 out:
4472 kvm_register_write(vcpu, VCPU_REGS_RAX, ret); 4354 kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
4473 ++vcpu->stat.hypercalls; 4355 ++vcpu->stat.hypercalls;
4474 return r; 4356 return r;
4475 } 4357 }
4476 EXPORT_SYMBOL_GPL(kvm_emulate_hypercall); 4358 EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
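
From the guest's point of view, the register convention handled by kvm_emulate_hypercall() above looks roughly like the sketch below: the call number goes in RAX, the arguments in RBX, RCX, RDX and RSI, and the result comes back in RAX. The helper name and the hard-coded "vmcall" opcode are illustrative assumptions, not part of this commit; real guests typically go through the kvm_hypercall*() helpers in asm/kvm_para.h, which pick vmcall or vmmcall for the CPU vendor.

/* Guest-side sketch, illustration only: issue a two-argument KVM
 * hypercall using the register layout that kvm_emulate_hypercall()
 * reads on the host side. */
static inline long example_kvm_hypercall2(unsigned int nr,
					  unsigned long p1,
					  unsigned long p2)
{
	long ret;

	asm volatile("vmcall"
		     : "=a"(ret)
		     : "a"(nr), "b"(p1), "c"(p2)
		     : "memory");
	return ret;
}
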
4477 4359
4478 int kvm_fix_hypercall(struct kvm_vcpu *vcpu) 4360 int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
4479 { 4361 {
4480 char instruction[3]; 4362 char instruction[3];
4481 unsigned long rip = kvm_rip_read(vcpu); 4363 unsigned long rip = kvm_rip_read(vcpu);
4482 4364
4483 /* 4365 /*
4484 * Blow out the MMU so that no other VCPU has an active mapping, 4366 * Blow out the MMU so that no other VCPU has an active mapping,
4485 * ensuring that the updated hypercall appears atomically across all 4367 * ensuring that the updated hypercall appears atomically across all
4486 * VCPUs. 4368 * VCPUs.
4487 */ 4369 */
4488 kvm_mmu_zap_all(vcpu->kvm); 4370 kvm_mmu_zap_all(vcpu->kvm);
4489 4371
4490 kvm_x86_ops->patch_hypercall(vcpu, instruction); 4372 kvm_x86_ops->patch_hypercall(vcpu, instruction);
4491 4373
4492 return emulator_write_emulated(rip, instruction, 3, NULL, vcpu); 4374 return emulator_write_emulated(rip, instruction, 3, NULL, vcpu);
4493 } 4375 }
4494 4376
4495 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base) 4377 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
4496 { 4378 {
4497 struct desc_ptr dt = { limit, base }; 4379 struct desc_ptr dt = { limit, base };
4498 4380
4499 kvm_x86_ops->set_gdt(vcpu, &dt); 4381 kvm_x86_ops->set_gdt(vcpu, &dt);
4500 } 4382 }
4501 4383
4502 void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base) 4384 void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
4503 { 4385 {
4504 struct desc_ptr dt = { limit, base }; 4386 struct desc_ptr dt = { limit, base };
4505 4387
4506 kvm_x86_ops->set_idt(vcpu, &dt); 4388 kvm_x86_ops->set_idt(vcpu, &dt);
4507 } 4389 }
4508 4390
4509 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i) 4391 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
4510 { 4392 {
4511 struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i]; 4393 struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];
4512 int j, nent = vcpu->arch.cpuid_nent; 4394 int j, nent = vcpu->arch.cpuid_nent;
4513 4395
4514 e->flags &= ~KVM_CPUID_FLAG_STATE_READ_NEXT; 4396 e->flags &= ~KVM_CPUID_FLAG_STATE_READ_NEXT;
4515 /* when no next entry is found, the current entry[i] is reselected */ 4397 /* when no next entry is found, the current entry[i] is reselected */
4516 for (j = i + 1; ; j = (j + 1) % nent) { 4398 for (j = i + 1; ; j = (j + 1) % nent) {
4517 struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j]; 4399 struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
4518 if (ej->function == e->function) { 4400 if (ej->function == e->function) {
4519 ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; 4401 ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
4520 return j; 4402 return j;
4521 } 4403 }
4522 } 4404 }
4523 return 0; /* silence gcc, even though control never reaches here */ 4405 return 0; /* silence gcc, even though control never reaches here */
4524 } 4406 }
4525 4407
4526 /* find an entry with matching function, matching index (if needed), and that 4408 /* find an entry with matching function, matching index (if needed), and that
4527 * should be read next (if it's stateful) */ 4409 * should be read next (if it's stateful) */
4528 static int is_matching_cpuid_entry(struct kvm_cpuid_entry2 *e, 4410 static int is_matching_cpuid_entry(struct kvm_cpuid_entry2 *e,
4529 u32 function, u32 index) 4411 u32 function, u32 index)
4530 { 4412 {
4531 if (e->function != function) 4413 if (e->function != function)
4532 return 0; 4414 return 0;
4533 if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index) 4415 if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index)
4534 return 0; 4416 return 0;
4535 if ((e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) && 4417 if ((e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) &&
4536 !(e->flags & KVM_CPUID_FLAG_STATE_READ_NEXT)) 4418 !(e->flags & KVM_CPUID_FLAG_STATE_READ_NEXT))
4537 return 0; 4419 return 0;
4538 return 1; 4420 return 1;
4539 } 4421 }
4540 4422
4541 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, 4423 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
4542 u32 function, u32 index) 4424 u32 function, u32 index)
4543 { 4425 {
4544 int i; 4426 int i;
4545 struct kvm_cpuid_entry2 *best = NULL; 4427 struct kvm_cpuid_entry2 *best = NULL;
4546 4428
4547 for (i = 0; i < vcpu->arch.cpuid_nent; ++i) { 4429 for (i = 0; i < vcpu->arch.cpuid_nent; ++i) {
4548 struct kvm_cpuid_entry2 *e; 4430 struct kvm_cpuid_entry2 *e;
4549 4431
4550 e = &vcpu->arch.cpuid_entries[i]; 4432 e = &vcpu->arch.cpuid_entries[i];
4551 if (is_matching_cpuid_entry(e, function, index)) { 4433 if (is_matching_cpuid_entry(e, function, index)) {
4552 if (e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) 4434 if (e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC)
4553 move_to_next_stateful_cpuid_entry(vcpu, i); 4435 move_to_next_stateful_cpuid_entry(vcpu, i);
4554 best = e; 4436 best = e;
4555 break; 4437 break;
4556 } 4438 }
4557 /* 4439 /*
4558 * Are both functions basic, or both extended? 4440 * Are both functions basic, or both extended?
4559 */ 4441 */
4560 if (((e->function ^ function) & 0x80000000) == 0) 4442 if (((e->function ^ function) & 0x80000000) == 0)
4561 if (!best || e->function > best->function) 4443 if (!best || e->function > best->function)
4562 best = e; 4444 best = e;
4563 } 4445 }
4564 return best; 4446 return best;
4565 } 4447 }
4566 EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry); 4448 EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
4567 4449
4568 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) 4450 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
4569 { 4451 {
4570 struct kvm_cpuid_entry2 *best; 4452 struct kvm_cpuid_entry2 *best;
4571 4453
4572 best = kvm_find_cpuid_entry(vcpu, 0x80000000, 0); 4454 best = kvm_find_cpuid_entry(vcpu, 0x80000000, 0);
4573 if (!best || best->eax < 0x80000008) 4455 if (!best || best->eax < 0x80000008)
4574 goto not_found; 4456 goto not_found;
4575 best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); 4457 best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
4576 if (best) 4458 if (best)
4577 return best->eax & 0xff; 4459 return best->eax & 0xff;
4578 not_found: 4460 not_found:
4579 return 36; 4461 return 36;
4580 } 4462 }
4581 4463
4582 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu) 4464 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
4583 { 4465 {
4584 u32 function, index; 4466 u32 function, index;
4585 struct kvm_cpuid_entry2 *best; 4467 struct kvm_cpuid_entry2 *best;
4586 4468
4587 function = kvm_register_read(vcpu, VCPU_REGS_RAX); 4469 function = kvm_register_read(vcpu, VCPU_REGS_RAX);
4588 index = kvm_register_read(vcpu, VCPU_REGS_RCX); 4470 index = kvm_register_read(vcpu, VCPU_REGS_RCX);
4589 kvm_register_write(vcpu, VCPU_REGS_RAX, 0); 4471 kvm_register_write(vcpu, VCPU_REGS_RAX, 0);
4590 kvm_register_write(vcpu, VCPU_REGS_RBX, 0); 4472 kvm_register_write(vcpu, VCPU_REGS_RBX, 0);
4591 kvm_register_write(vcpu, VCPU_REGS_RCX, 0); 4473 kvm_register_write(vcpu, VCPU_REGS_RCX, 0);
4592 kvm_register_write(vcpu, VCPU_REGS_RDX, 0); 4474 kvm_register_write(vcpu, VCPU_REGS_RDX, 0);
4593 best = kvm_find_cpuid_entry(vcpu, function, index); 4475 best = kvm_find_cpuid_entry(vcpu, function, index);
4594 if (best) { 4476 if (best) {
4595 kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax); 4477 kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax);
4596 kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx); 4478 kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx);
4597 kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx); 4479 kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx);
4598 kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx); 4480 kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx);
4599 } 4481 }
4600 kvm_x86_ops->skip_emulated_instruction(vcpu); 4482 kvm_x86_ops->skip_emulated_instruction(vcpu);
4601 trace_kvm_cpuid(function, 4483 trace_kvm_cpuid(function,
4602 kvm_register_read(vcpu, VCPU_REGS_RAX), 4484 kvm_register_read(vcpu, VCPU_REGS_RAX),
4603 kvm_register_read(vcpu, VCPU_REGS_RBX), 4485 kvm_register_read(vcpu, VCPU_REGS_RBX),
4604 kvm_register_read(vcpu, VCPU_REGS_RCX), 4486 kvm_register_read(vcpu, VCPU_REGS_RCX),
4605 kvm_register_read(vcpu, VCPU_REGS_RDX)); 4487 kvm_register_read(vcpu, VCPU_REGS_RDX));
4606 } 4488 }
4607 EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); 4489 EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
4608 4490
4609 /* 4491 /*
4610 * Check whether userspace requested an interrupt window and whether 4492 * Check whether userspace requested an interrupt window and whether
4611 * the interrupt window is open. 4493 * the interrupt window is open.
4612 * 4494 *
4613 * No need to exit to userspace if we already have an interrupt queued. 4495 * No need to exit to userspace if we already have an interrupt queued.
4614 */ 4496 */
4615 static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu) 4497 static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu)
4616 { 4498 {
4617 return (!irqchip_in_kernel(vcpu->kvm) && !kvm_cpu_has_interrupt(vcpu) && 4499 return (!irqchip_in_kernel(vcpu->kvm) && !kvm_cpu_has_interrupt(vcpu) &&
4618 vcpu->run->request_interrupt_window && 4500 vcpu->run->request_interrupt_window &&
4619 kvm_arch_interrupt_allowed(vcpu)); 4501 kvm_arch_interrupt_allowed(vcpu));
4620 } 4502 }
4621 4503
4622 static void post_kvm_run_save(struct kvm_vcpu *vcpu) 4504 static void post_kvm_run_save(struct kvm_vcpu *vcpu)
4623 { 4505 {
4624 struct kvm_run *kvm_run = vcpu->run; 4506 struct kvm_run *kvm_run = vcpu->run;
4625 4507
4626 kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0; 4508 kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
4627 kvm_run->cr8 = kvm_get_cr8(vcpu); 4509 kvm_run->cr8 = kvm_get_cr8(vcpu);
4628 kvm_run->apic_base = kvm_get_apic_base(vcpu); 4510 kvm_run->apic_base = kvm_get_apic_base(vcpu);
4629 if (irqchip_in_kernel(vcpu->kvm)) 4511 if (irqchip_in_kernel(vcpu->kvm))
4630 kvm_run->ready_for_interrupt_injection = 1; 4512 kvm_run->ready_for_interrupt_injection = 1;
4631 else 4513 else
4632 kvm_run->ready_for_interrupt_injection = 4514 kvm_run->ready_for_interrupt_injection =
4633 kvm_arch_interrupt_allowed(vcpu) && 4515 kvm_arch_interrupt_allowed(vcpu) &&
4634 !kvm_cpu_has_interrupt(vcpu) && 4516 !kvm_cpu_has_interrupt(vcpu) &&
4635 !kvm_event_needs_reinjection(vcpu); 4517 !kvm_event_needs_reinjection(vcpu);
4636 } 4518 }
4637 4519
4638 static void vapic_enter(struct kvm_vcpu *vcpu) 4520 static void vapic_enter(struct kvm_vcpu *vcpu)
4639 { 4521 {
4640 struct kvm_lapic *apic = vcpu->arch.apic; 4522 struct kvm_lapic *apic = vcpu->arch.apic;
4641 struct page *page; 4523 struct page *page;
4642 4524
4643 if (!apic || !apic->vapic_addr) 4525 if (!apic || !apic->vapic_addr)
4644 return; 4526 return;
4645 4527
4646 page = gfn_to_page(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT); 4528 page = gfn_to_page(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT);
4647 4529
4648 vcpu->arch.apic->vapic_page = page; 4530 vcpu->arch.apic->vapic_page = page;
4649 } 4531 }
4650 4532
4651 static void vapic_exit(struct kvm_vcpu *vcpu) 4533 static void vapic_exit(struct kvm_vcpu *vcpu)
4652 { 4534 {
4653 struct kvm_lapic *apic = vcpu->arch.apic; 4535 struct kvm_lapic *apic = vcpu->arch.apic;
4654 int idx; 4536 int idx;
4655 4537
4656 if (!apic || !apic->vapic_addr) 4538 if (!apic || !apic->vapic_addr)
4657 return; 4539 return;
4658 4540
4659 idx = srcu_read_lock(&vcpu->kvm->srcu); 4541 idx = srcu_read_lock(&vcpu->kvm->srcu);
4660 kvm_release_page_dirty(apic->vapic_page); 4542 kvm_release_page_dirty(apic->vapic_page);
4661 mark_page_dirty(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT); 4543 mark_page_dirty(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT);
4662 srcu_read_unlock(&vcpu->kvm->srcu, idx); 4544 srcu_read_unlock(&vcpu->kvm->srcu, idx);
4663 } 4545 }
4664 4546
4665 static void update_cr8_intercept(struct kvm_vcpu *vcpu) 4547 static void update_cr8_intercept(struct kvm_vcpu *vcpu)
4666 { 4548 {
4667 int max_irr, tpr; 4549 int max_irr, tpr;
4668 4550
4669 if (!kvm_x86_ops->update_cr8_intercept) 4551 if (!kvm_x86_ops->update_cr8_intercept)
4670 return; 4552 return;
4671 4553
4672 if (!vcpu->arch.apic) 4554 if (!vcpu->arch.apic)
4673 return; 4555 return;
4674 4556
4675 if (!vcpu->arch.apic->vapic_addr) 4557 if (!vcpu->arch.apic->vapic_addr)
4676 max_irr = kvm_lapic_find_highest_irr(vcpu); 4558 max_irr = kvm_lapic_find_highest_irr(vcpu);
4677 else 4559 else
4678 max_irr = -1; 4560 max_irr = -1;
4679 4561
4680 if (max_irr != -1) 4562 if (max_irr != -1)
4681 max_irr >>= 4; 4563 max_irr >>= 4;
4682 4564
4683 tpr = kvm_lapic_get_cr8(vcpu); 4565 tpr = kvm_lapic_get_cr8(vcpu);
4684 4566
4685 kvm_x86_ops->update_cr8_intercept(vcpu, tpr, max_irr); 4567 kvm_x86_ops->update_cr8_intercept(vcpu, tpr, max_irr);
4686 } 4568 }
4687 4569
4688 static void inject_pending_event(struct kvm_vcpu *vcpu) 4570 static void inject_pending_event(struct kvm_vcpu *vcpu)
4689 { 4571 {
4690 /* try to reinject previous events if any */ 4572 /* try to reinject previous events if any */
4691 if (vcpu->arch.exception.pending) { 4573 if (vcpu->arch.exception.pending) {
4692 trace_kvm_inj_exception(vcpu->arch.exception.nr, 4574 trace_kvm_inj_exception(vcpu->arch.exception.nr,
4693 vcpu->arch.exception.has_error_code, 4575 vcpu->arch.exception.has_error_code,
4694 vcpu->arch.exception.error_code); 4576 vcpu->arch.exception.error_code);
4695 kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr, 4577 kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr,
4696 vcpu->arch.exception.has_error_code, 4578 vcpu->arch.exception.has_error_code,
4697 vcpu->arch.exception.error_code, 4579 vcpu->arch.exception.error_code,
4698 vcpu->arch.exception.reinject); 4580 vcpu->arch.exception.reinject);
4699 return; 4581 return;
4700 } 4582 }
4701 4583
4702 if (vcpu->arch.nmi_injected) { 4584 if (vcpu->arch.nmi_injected) {
4703 kvm_x86_ops->set_nmi(vcpu); 4585 kvm_x86_ops->set_nmi(vcpu);
4704 return; 4586 return;
4705 } 4587 }
4706 4588
4707 if (vcpu->arch.interrupt.pending) { 4589 if (vcpu->arch.interrupt.pending) {
4708 kvm_x86_ops->set_irq(vcpu); 4590 kvm_x86_ops->set_irq(vcpu);
4709 return; 4591 return;
4710 } 4592 }
4711 4593
4712 /* try to inject new event if pending */ 4594 /* try to inject new event if pending */
4713 if (vcpu->arch.nmi_pending) { 4595 if (vcpu->arch.nmi_pending) {
4714 if (kvm_x86_ops->nmi_allowed(vcpu)) { 4596 if (kvm_x86_ops->nmi_allowed(vcpu)) {
4715 vcpu->arch.nmi_pending = false; 4597 vcpu->arch.nmi_pending = false;
4716 vcpu->arch.nmi_injected = true; 4598 vcpu->arch.nmi_injected = true;
4717 kvm_x86_ops->set_nmi(vcpu); 4599 kvm_x86_ops->set_nmi(vcpu);
4718 } 4600 }
4719 } else if (kvm_cpu_has_interrupt(vcpu)) { 4601 } else if (kvm_cpu_has_interrupt(vcpu)) {
4720 if (kvm_x86_ops->interrupt_allowed(vcpu)) { 4602 if (kvm_x86_ops->interrupt_allowed(vcpu)) {
4721 kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), 4603 kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
4722 false); 4604 false);
4723 kvm_x86_ops->set_irq(vcpu); 4605 kvm_x86_ops->set_irq(vcpu);
4724 } 4606 }
4725 } 4607 }
4726 } 4608 }
4727 4609
4728 static void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu) 4610 static void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu)
4729 { 4611 {
4730 if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) && 4612 if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&
4731 !vcpu->guest_xcr0_loaded) { 4613 !vcpu->guest_xcr0_loaded) {
4732 /* kvm_set_xcr() also depends on this */ 4614 /* kvm_set_xcr() also depends on this */
4733 xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); 4615 xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
4734 vcpu->guest_xcr0_loaded = 1; 4616 vcpu->guest_xcr0_loaded = 1;
4735 } 4617 }
4736 } 4618 }
4737 4619
4738 static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu) 4620 static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
4739 { 4621 {
4740 if (vcpu->guest_xcr0_loaded) { 4622 if (vcpu->guest_xcr0_loaded) {
4741 if (vcpu->arch.xcr0 != host_xcr0) 4623 if (vcpu->arch.xcr0 != host_xcr0)
4742 xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0); 4624 xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
4743 vcpu->guest_xcr0_loaded = 0; 4625 vcpu->guest_xcr0_loaded = 0;
4744 } 4626 }
4745 } 4627 }
4746 4628
4747 static int vcpu_enter_guest(struct kvm_vcpu *vcpu) 4629 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
4748 { 4630 {
4749 int r; 4631 int r;
4750 bool req_int_win = !irqchip_in_kernel(vcpu->kvm) && 4632 bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
4751 vcpu->run->request_interrupt_window; 4633 vcpu->run->request_interrupt_window;
4752 4634
4753 if (vcpu->requests) 4635 if (vcpu->requests)
4754 if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) 4636 if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests))
4755 kvm_mmu_unload(vcpu); 4637 kvm_mmu_unload(vcpu);
4756 4638
4757 r = kvm_mmu_reload(vcpu); 4639 r = kvm_mmu_reload(vcpu);
4758 if (unlikely(r)) 4640 if (unlikely(r))
4759 goto out; 4641 goto out;
4760 4642
4761 if (vcpu->requests) { 4643 if (vcpu->requests) {
4762 if (test_and_clear_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests)) 4644 if (test_and_clear_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests))
4763 __kvm_migrate_timers(vcpu); 4645 __kvm_migrate_timers(vcpu);
4764 if (test_and_clear_bit(KVM_REQ_KVMCLOCK_UPDATE, &vcpu->requests)) 4646 if (test_and_clear_bit(KVM_REQ_KVMCLOCK_UPDATE, &vcpu->requests))
4765 kvm_write_guest_time(vcpu); 4647 kvm_write_guest_time(vcpu);
4766 if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests)) 4648 if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
4767 kvm_mmu_sync_roots(vcpu); 4649 kvm_mmu_sync_roots(vcpu);
4768 if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests)) 4650 if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
4769 kvm_x86_ops->tlb_flush(vcpu); 4651 kvm_x86_ops->tlb_flush(vcpu);
4770 if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS, 4652 if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS,
4771 &vcpu->requests)) { 4653 &vcpu->requests)) {
4772 vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS; 4654 vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
4773 r = 0; 4655 r = 0;
4774 goto out; 4656 goto out;
4775 } 4657 }
4776 if (test_and_clear_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests)) { 4658 if (test_and_clear_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests)) {
4777 vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; 4659 vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
4778 r = 0; 4660 r = 0;
4779 goto out; 4661 goto out;
4780 } 4662 }
4781 if (test_and_clear_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests)) { 4663 if (test_and_clear_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests)) {
4782 vcpu->fpu_active = 0; 4664 vcpu->fpu_active = 0;
4783 kvm_x86_ops->fpu_deactivate(vcpu); 4665 kvm_x86_ops->fpu_deactivate(vcpu);
4784 } 4666 }
4785 } 4667 }
4786 4668
4787 preempt_disable(); 4669 preempt_disable();
4788 4670
4789 kvm_x86_ops->prepare_guest_switch(vcpu); 4671 kvm_x86_ops->prepare_guest_switch(vcpu);
4790 if (vcpu->fpu_active) 4672 if (vcpu->fpu_active)
4791 kvm_load_guest_fpu(vcpu); 4673 kvm_load_guest_fpu(vcpu);
4792 kvm_load_guest_xcr0(vcpu); 4674 kvm_load_guest_xcr0(vcpu);
4793 4675
4794 atomic_set(&vcpu->guest_mode, 1); 4676 atomic_set(&vcpu->guest_mode, 1);
4795 smp_wmb(); 4677 smp_wmb();
4796 4678
4797 local_irq_disable(); 4679 local_irq_disable();
4798 4680
4799 if (!atomic_read(&vcpu->guest_mode) || vcpu->requests 4681 if (!atomic_read(&vcpu->guest_mode) || vcpu->requests
4800 || need_resched() || signal_pending(current)) { 4682 || need_resched() || signal_pending(current)) {
4801 atomic_set(&vcpu->guest_mode, 0); 4683 atomic_set(&vcpu->guest_mode, 0);
4802 smp_wmb(); 4684 smp_wmb();
4803 local_irq_enable(); 4685 local_irq_enable();
4804 preempt_enable(); 4686 preempt_enable();
4805 r = 1; 4687 r = 1;
4806 goto out; 4688 goto out;
4807 } 4689 }
4808 4690
4809 inject_pending_event(vcpu); 4691 inject_pending_event(vcpu);
4810 4692
4811 /* enable NMI/IRQ window open exits if needed */ 4693 /* enable NMI/IRQ window open exits if needed */
4812 if (vcpu->arch.nmi_pending) 4694 if (vcpu->arch.nmi_pending)
4813 kvm_x86_ops->enable_nmi_window(vcpu); 4695 kvm_x86_ops->enable_nmi_window(vcpu);
4814 else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) 4696 else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
4815 kvm_x86_ops->enable_irq_window(vcpu); 4697 kvm_x86_ops->enable_irq_window(vcpu);
4816 4698
4817 if (kvm_lapic_enabled(vcpu)) { 4699 if (kvm_lapic_enabled(vcpu)) {
4818 update_cr8_intercept(vcpu); 4700 update_cr8_intercept(vcpu);
4819 kvm_lapic_sync_to_vapic(vcpu); 4701 kvm_lapic_sync_to_vapic(vcpu);
4820 } 4702 }
4821 4703
4822 srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); 4704 srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
4823 4705
4824 kvm_guest_enter(); 4706 kvm_guest_enter();
4825 4707
4826 if (unlikely(vcpu->arch.switch_db_regs)) { 4708 if (unlikely(vcpu->arch.switch_db_regs)) {
4827 set_debugreg(0, 7); 4709 set_debugreg(0, 7);
4828 set_debugreg(vcpu->arch.eff_db[0], 0); 4710 set_debugreg(vcpu->arch.eff_db[0], 0);
4829 set_debugreg(vcpu->arch.eff_db[1], 1); 4711 set_debugreg(vcpu->arch.eff_db[1], 1);
4830 set_debugreg(vcpu->arch.eff_db[2], 2); 4712 set_debugreg(vcpu->arch.eff_db[2], 2);
4831 set_debugreg(vcpu->arch.eff_db[3], 3); 4713 set_debugreg(vcpu->arch.eff_db[3], 3);
4832 } 4714 }
4833 4715
4834 trace_kvm_entry(vcpu->vcpu_id); 4716 trace_kvm_entry(vcpu->vcpu_id);
4835 kvm_x86_ops->run(vcpu); 4717 kvm_x86_ops->run(vcpu);
4836 4718
4837 /* 4719 /*
4838 * If the guest has used debug registers, at least dr7 4720 * If the guest has used debug registers, at least dr7
4839 * will be disabled while returning to the host. 4721 * will be disabled while returning to the host.
4840 * If we don't have active breakpoints in the host, we don't 4722 * If we don't have active breakpoints in the host, we don't
4841 * care about the messed up debug address registers. But if 4723 * care about the messed up debug address registers. But if
4842 * we have some of them active, restore the old state. 4724 * we have some of them active, restore the old state.
4843 */ 4725 */
4844 if (hw_breakpoint_active()) 4726 if (hw_breakpoint_active())
4845 hw_breakpoint_restore(); 4727 hw_breakpoint_restore();
4846 4728
4847 atomic_set(&vcpu->guest_mode, 0); 4729 atomic_set(&vcpu->guest_mode, 0);
4848 smp_wmb(); 4730 smp_wmb();
4849 local_irq_enable(); 4731 local_irq_enable();
4850 4732
4851 ++vcpu->stat.exits; 4733 ++vcpu->stat.exits;
4852 4734
4853 /* 4735 /*
4854 * We must have an instruction between local_irq_enable() and 4736 * We must have an instruction between local_irq_enable() and
4855 * kvm_guest_exit(), so the timer interrupt isn't delayed by 4737 * kvm_guest_exit(), so the timer interrupt isn't delayed by
4856 * the interrupt shadow. The stat.exits increment will do nicely. 4738 * the interrupt shadow. The stat.exits increment will do nicely.
4857 * But we need to prevent reordering, hence this barrier(): 4739 * But we need to prevent reordering, hence this barrier():
4858 */ 4740 */
4859 barrier(); 4741 barrier();
4860 4742
4861 kvm_guest_exit(); 4743 kvm_guest_exit();
4862 4744
4863 preempt_enable(); 4745 preempt_enable();
4864 4746
4865 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); 4747 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
4866 4748
4867 /* 4749 /*
4868 * Profile KVM exit RIPs: 4750 * Profile KVM exit RIPs:
4869 */ 4751 */
4870 if (unlikely(prof_on == KVM_PROFILING)) { 4752 if (unlikely(prof_on == KVM_PROFILING)) {
4871 unsigned long rip = kvm_rip_read(vcpu); 4753 unsigned long rip = kvm_rip_read(vcpu);
4872 profile_hit(KVM_PROFILING, (void *)rip); 4754 profile_hit(KVM_PROFILING, (void *)rip);
4873 } 4755 }
4874 4756
4875 4757
4876 kvm_lapic_sync_from_vapic(vcpu); 4758 kvm_lapic_sync_from_vapic(vcpu);
4877 4759
4878 r = kvm_x86_ops->handle_exit(vcpu); 4760 r = kvm_x86_ops->handle_exit(vcpu);
4879 out: 4761 out:
4880 return r; 4762 return r;
4881 } 4763 }
4882 4764
4883 4765
4884 static int __vcpu_run(struct kvm_vcpu *vcpu) 4766 static int __vcpu_run(struct kvm_vcpu *vcpu)
4885 { 4767 {
4886 int r; 4768 int r;
4887 struct kvm *kvm = vcpu->kvm; 4769 struct kvm *kvm = vcpu->kvm;
4888 4770
4889 if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) { 4771 if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
4890 pr_debug("vcpu %d received sipi with vector # %x\n", 4772 pr_debug("vcpu %d received sipi with vector # %x\n",
4891 vcpu->vcpu_id, vcpu->arch.sipi_vector); 4773 vcpu->vcpu_id, vcpu->arch.sipi_vector);
4892 kvm_lapic_reset(vcpu); 4774 kvm_lapic_reset(vcpu);
4893 r = kvm_arch_vcpu_reset(vcpu); 4775 r = kvm_arch_vcpu_reset(vcpu);
4894 if (r) 4776 if (r)
4895 return r; 4777 return r;
4896 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; 4778 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
4897 } 4779 }
4898 4780
4899 vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); 4781 vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
4900 vapic_enter(vcpu); 4782 vapic_enter(vcpu);
4901 4783
4902 r = 1; 4784 r = 1;
4903 while (r > 0) { 4785 while (r > 0) {
4904 if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) 4786 if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
4905 r = vcpu_enter_guest(vcpu); 4787 r = vcpu_enter_guest(vcpu);
4906 else { 4788 else {
4907 srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); 4789 srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
4908 kvm_vcpu_block(vcpu); 4790 kvm_vcpu_block(vcpu);
4909 vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); 4791 vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
4910 if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests)) 4792 if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests))
4911 { 4793 {
4912 switch(vcpu->arch.mp_state) { 4794 switch(vcpu->arch.mp_state) {
4913 case KVM_MP_STATE_HALTED: 4795 case KVM_MP_STATE_HALTED:
4914 vcpu->arch.mp_state = 4796 vcpu->arch.mp_state =
4915 KVM_MP_STATE_RUNNABLE; 4797 KVM_MP_STATE_RUNNABLE;
4916 case KVM_MP_STATE_RUNNABLE: 4798 case KVM_MP_STATE_RUNNABLE:
4917 break; 4799 break;
4918 case KVM_MP_STATE_SIPI_RECEIVED: 4800 case KVM_MP_STATE_SIPI_RECEIVED:
4919 default: 4801 default:
4920 r = -EINTR; 4802 r = -EINTR;
4921 break; 4803 break;
4922 } 4804 }
4923 } 4805 }
4924 } 4806 }
4925 4807
4926 if (r <= 0) 4808 if (r <= 0)
4927 break; 4809 break;
4928 4810
4929 clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests); 4811 clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
4930 if (kvm_cpu_has_pending_timer(vcpu)) 4812 if (kvm_cpu_has_pending_timer(vcpu))
4931 kvm_inject_pending_timer_irqs(vcpu); 4813 kvm_inject_pending_timer_irqs(vcpu);
4932 4814
4933 if (dm_request_for_irq_injection(vcpu)) { 4815 if (dm_request_for_irq_injection(vcpu)) {
4934 r = -EINTR; 4816 r = -EINTR;
4935 vcpu->run->exit_reason = KVM_EXIT_INTR; 4817 vcpu->run->exit_reason = KVM_EXIT_INTR;
4936 ++vcpu->stat.request_irq_exits; 4818 ++vcpu->stat.request_irq_exits;
4937 } 4819 }
4938 if (signal_pending(current)) { 4820 if (signal_pending(current)) {
4939 r = -EINTR; 4821 r = -EINTR;
4940 vcpu->run->exit_reason = KVM_EXIT_INTR; 4822 vcpu->run->exit_reason = KVM_EXIT_INTR;
4941 ++vcpu->stat.signal_exits; 4823 ++vcpu->stat.signal_exits;
4942 } 4824 }
4943 if (need_resched()) { 4825 if (need_resched()) {
4944 srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); 4826 srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
4945 kvm_resched(vcpu); 4827 kvm_resched(vcpu);
4946 vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); 4828 vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
4947 } 4829 }
4948 } 4830 }
4949 4831
4950 srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); 4832 srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
4951 4833
4952 vapic_exit(vcpu); 4834 vapic_exit(vcpu);
4953 4835
4954 return r; 4836 return r;
4955 } 4837 }
4956 4838
4957 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) 4839 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
4958 { 4840 {
4959 int r; 4841 int r;
4960 sigset_t sigsaved; 4842 sigset_t sigsaved;
4961 4843
4962 if (vcpu->sigset_active) 4844 if (vcpu->sigset_active)
4963 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); 4845 sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
4964 4846
4965 if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { 4847 if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
4966 kvm_vcpu_block(vcpu); 4848 kvm_vcpu_block(vcpu);
4967 clear_bit(KVM_REQ_UNHALT, &vcpu->requests); 4849 clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
4968 r = -EAGAIN; 4850 r = -EAGAIN;
4969 goto out; 4851 goto out;
4970 } 4852 }
4971 4853
4972 /* re-sync apic's tpr */ 4854 /* re-sync apic's tpr */
4973 if (!irqchip_in_kernel(vcpu->kvm)) 4855 if (!irqchip_in_kernel(vcpu->kvm))
4974 kvm_set_cr8(vcpu, kvm_run->cr8); 4856 kvm_set_cr8(vcpu, kvm_run->cr8);
4975 4857
4976 if (vcpu->arch.pio.count || vcpu->mmio_needed || 4858 if (vcpu->arch.pio.count || vcpu->mmio_needed ||
4977 vcpu->arch.emulate_ctxt.restart) { 4859 vcpu->arch.emulate_ctxt.restart) {
4978 if (vcpu->mmio_needed) { 4860 if (vcpu->mmio_needed) {
4979 memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8); 4861 memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
4980 vcpu->mmio_read_completed = 1; 4862 vcpu->mmio_read_completed = 1;
4981 vcpu->mmio_needed = 0; 4863 vcpu->mmio_needed = 0;
4982 } 4864 }
4983 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); 4865 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
4984 r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE); 4866 r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
4985 srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); 4867 srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
4986 if (r != EMULATE_DONE) { 4868 if (r != EMULATE_DONE) {
4987 r = 0; 4869 r = 0;
4988 goto out; 4870 goto out;
4989 } 4871 }
4990 } 4872 }
4991 if (kvm_run->exit_reason == KVM_EXIT_HYPERCALL) 4873 if (kvm_run->exit_reason == KVM_EXIT_HYPERCALL)
4992 kvm_register_write(vcpu, VCPU_REGS_RAX, 4874 kvm_register_write(vcpu, VCPU_REGS_RAX,
4993 kvm_run->hypercall.ret); 4875 kvm_run->hypercall.ret);
4994 4876
4995 r = __vcpu_run(vcpu); 4877 r = __vcpu_run(vcpu);
4996 4878
4997 out: 4879 out:
4998 post_kvm_run_save(vcpu); 4880 post_kvm_run_save(vcpu);
4999 if (vcpu->sigset_active) 4881 if (vcpu->sigset_active)
5000 sigprocmask(SIG_SETMASK, &sigsaved, NULL); 4882 sigprocmask(SIG_SETMASK, &sigsaved, NULL);
5001 4883
5002 return r; 4884 return r;
5003 } 4885 }
5004 4886
5005 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) 4887 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
5006 { 4888 {
5007 regs->rax = kvm_register_read(vcpu, VCPU_REGS_RAX); 4889 regs->rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
5008 regs->rbx = kvm_register_read(vcpu, VCPU_REGS_RBX); 4890 regs->rbx = kvm_register_read(vcpu, VCPU_REGS_RBX);
5009 regs->rcx = kvm_register_read(vcpu, VCPU_REGS_RCX); 4891 regs->rcx = kvm_register_read(vcpu, VCPU_REGS_RCX);
5010 regs->rdx = kvm_register_read(vcpu, VCPU_REGS_RDX); 4892 regs->rdx = kvm_register_read(vcpu, VCPU_REGS_RDX);
5011 regs->rsi = kvm_register_read(vcpu, VCPU_REGS_RSI); 4893 regs->rsi = kvm_register_read(vcpu, VCPU_REGS_RSI);
5012 regs->rdi = kvm_register_read(vcpu, VCPU_REGS_RDI); 4894 regs->rdi = kvm_register_read(vcpu, VCPU_REGS_RDI);
5013 regs->rsp = kvm_register_read(vcpu, VCPU_REGS_RSP); 4895 regs->rsp = kvm_register_read(vcpu, VCPU_REGS_RSP);
5014 regs->rbp = kvm_register_read(vcpu, VCPU_REGS_RBP); 4896 regs->rbp = kvm_register_read(vcpu, VCPU_REGS_RBP);
5015 #ifdef CONFIG_X86_64 4897 #ifdef CONFIG_X86_64
5016 regs->r8 = kvm_register_read(vcpu, VCPU_REGS_R8); 4898 regs->r8 = kvm_register_read(vcpu, VCPU_REGS_R8);
5017 regs->r9 = kvm_register_read(vcpu, VCPU_REGS_R9); 4899 regs->r9 = kvm_register_read(vcpu, VCPU_REGS_R9);
5018 regs->r10 = kvm_register_read(vcpu, VCPU_REGS_R10); 4900 regs->r10 = kvm_register_read(vcpu, VCPU_REGS_R10);
5019 regs->r11 = kvm_register_read(vcpu, VCPU_REGS_R11); 4901 regs->r11 = kvm_register_read(vcpu, VCPU_REGS_R11);
5020 regs->r12 = kvm_register_read(vcpu, VCPU_REGS_R12); 4902 regs->r12 = kvm_register_read(vcpu, VCPU_REGS_R12);
5021 regs->r13 = kvm_register_read(vcpu, VCPU_REGS_R13); 4903 regs->r13 = kvm_register_read(vcpu, VCPU_REGS_R13);
5022 regs->r14 = kvm_register_read(vcpu, VCPU_REGS_R14); 4904 regs->r14 = kvm_register_read(vcpu, VCPU_REGS_R14);
5023 regs->r15 = kvm_register_read(vcpu, VCPU_REGS_R15); 4905 regs->r15 = kvm_register_read(vcpu, VCPU_REGS_R15);
5024 #endif 4906 #endif
5025 4907
5026 regs->rip = kvm_rip_read(vcpu); 4908 regs->rip = kvm_rip_read(vcpu);
5027 regs->rflags = kvm_get_rflags(vcpu); 4909 regs->rflags = kvm_get_rflags(vcpu);
5028 4910
5029 return 0; 4911 return 0;
5030 } 4912 }
5031 4913
5032 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) 4914 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
5033 { 4915 {
5034 kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax); 4916 kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax);
5035 kvm_register_write(vcpu, VCPU_REGS_RBX, regs->rbx); 4917 kvm_register_write(vcpu, VCPU_REGS_RBX, regs->rbx);
5036 kvm_register_write(vcpu, VCPU_REGS_RCX, regs->rcx); 4918 kvm_register_write(vcpu, VCPU_REGS_RCX, regs->rcx);
5037 kvm_register_write(vcpu, VCPU_REGS_RDX, regs->rdx); 4919 kvm_register_write(vcpu, VCPU_REGS_RDX, regs->rdx);
5038 kvm_register_write(vcpu, VCPU_REGS_RSI, regs->rsi); 4920 kvm_register_write(vcpu, VCPU_REGS_RSI, regs->rsi);
5039 kvm_register_write(vcpu, VCPU_REGS_RDI, regs->rdi); 4921 kvm_register_write(vcpu, VCPU_REGS_RDI, regs->rdi);
5040 kvm_register_write(vcpu, VCPU_REGS_RSP, regs->rsp); 4922 kvm_register_write(vcpu, VCPU_REGS_RSP, regs->rsp);
5041 kvm_register_write(vcpu, VCPU_REGS_RBP, regs->rbp); 4923 kvm_register_write(vcpu, VCPU_REGS_RBP, regs->rbp);
5042 #ifdef CONFIG_X86_64 4924 #ifdef CONFIG_X86_64
5043 kvm_register_write(vcpu, VCPU_REGS_R8, regs->r8); 4925 kvm_register_write(vcpu, VCPU_REGS_R8, regs->r8);
5044 kvm_register_write(vcpu, VCPU_REGS_R9, regs->r9); 4926 kvm_register_write(vcpu, VCPU_REGS_R9, regs->r9);
5045 kvm_register_write(vcpu, VCPU_REGS_R10, regs->r10); 4927 kvm_register_write(vcpu, VCPU_REGS_R10, regs->r10);
5046 kvm_register_write(vcpu, VCPU_REGS_R11, regs->r11); 4928 kvm_register_write(vcpu, VCPU_REGS_R11, regs->r11);
5047 kvm_register_write(vcpu, VCPU_REGS_R12, regs->r12); 4929 kvm_register_write(vcpu, VCPU_REGS_R12, regs->r12);
5048 kvm_register_write(vcpu, VCPU_REGS_R13, regs->r13); 4930 kvm_register_write(vcpu, VCPU_REGS_R13, regs->r13);
5049 kvm_register_write(vcpu, VCPU_REGS_R14, regs->r14); 4931 kvm_register_write(vcpu, VCPU_REGS_R14, regs->r14);
5050 kvm_register_write(vcpu, VCPU_REGS_R15, regs->r15); 4932 kvm_register_write(vcpu, VCPU_REGS_R15, regs->r15);
5051 #endif 4933 #endif
5052 4934
5053 kvm_rip_write(vcpu, regs->rip); 4935 kvm_rip_write(vcpu, regs->rip);
5054 kvm_set_rflags(vcpu, regs->rflags); 4936 kvm_set_rflags(vcpu, regs->rflags);
5055 4937
5056 vcpu->arch.exception.pending = false; 4938 vcpu->arch.exception.pending = false;
5057 4939
5058 return 0; 4940 return 0;
5059 } 4941 }
5060 4942
5061 void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) 4943 void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
5062 { 4944 {
5063 struct kvm_segment cs; 4945 struct kvm_segment cs;
5064 4946
5065 kvm_get_segment(vcpu, &cs, VCPU_SREG_CS); 4947 kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
5066 *db = cs.db; 4948 *db = cs.db;
5067 *l = cs.l; 4949 *l = cs.l;
5068 } 4950 }
5069 EXPORT_SYMBOL_GPL(kvm_get_cs_db_l_bits); 4951 EXPORT_SYMBOL_GPL(kvm_get_cs_db_l_bits);
5070 4952
5071 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, 4953 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
5072 struct kvm_sregs *sregs) 4954 struct kvm_sregs *sregs)
5073 { 4955 {
5074 struct desc_ptr dt; 4956 struct desc_ptr dt;
5075 4957
5076 kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS); 4958 kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
5077 kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS); 4959 kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
5078 kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES); 4960 kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
5079 kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS); 4961 kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
5080 kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS); 4962 kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
5081 kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS); 4963 kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
5082 4964
5083 kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR); 4965 kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
5084 kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR); 4966 kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
5085 4967
5086 kvm_x86_ops->get_idt(vcpu, &dt); 4968 kvm_x86_ops->get_idt(vcpu, &dt);
5087 sregs->idt.limit = dt.size; 4969 sregs->idt.limit = dt.size;
5088 sregs->idt.base = dt.address; 4970 sregs->idt.base = dt.address;
5089 kvm_x86_ops->get_gdt(vcpu, &dt); 4971 kvm_x86_ops->get_gdt(vcpu, &dt);
5090 sregs->gdt.limit = dt.size; 4972 sregs->gdt.limit = dt.size;
5091 sregs->gdt.base = dt.address; 4973 sregs->gdt.base = dt.address;
5092 4974
5093 sregs->cr0 = kvm_read_cr0(vcpu); 4975 sregs->cr0 = kvm_read_cr0(vcpu);
5094 sregs->cr2 = vcpu->arch.cr2; 4976 sregs->cr2 = vcpu->arch.cr2;
5095 sregs->cr3 = vcpu->arch.cr3; 4977 sregs->cr3 = vcpu->arch.cr3;
5096 sregs->cr4 = kvm_read_cr4(vcpu); 4978 sregs->cr4 = kvm_read_cr4(vcpu);
5097 sregs->cr8 = kvm_get_cr8(vcpu); 4979 sregs->cr8 = kvm_get_cr8(vcpu);
5098 sregs->efer = vcpu->arch.efer; 4980 sregs->efer = vcpu->arch.efer;
5099 sregs->apic_base = kvm_get_apic_base(vcpu); 4981 sregs->apic_base = kvm_get_apic_base(vcpu);
5100 4982
5101 memset(sregs->interrupt_bitmap, 0, sizeof sregs->interrupt_bitmap); 4983 memset(sregs->interrupt_bitmap, 0, sizeof sregs->interrupt_bitmap);
5102 4984
5103 if (vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft) 4985 if (vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft)
5104 set_bit(vcpu->arch.interrupt.nr, 4986 set_bit(vcpu->arch.interrupt.nr,
5105 (unsigned long *)sregs->interrupt_bitmap); 4987 (unsigned long *)sregs->interrupt_bitmap);
5106 4988
5107 return 0; 4989 return 0;
5108 } 4990 }
5109 4991
5110 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, 4992 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
5111 struct kvm_mp_state *mp_state) 4993 struct kvm_mp_state *mp_state)
5112 { 4994 {
5113 mp_state->mp_state = vcpu->arch.mp_state; 4995 mp_state->mp_state = vcpu->arch.mp_state;
5114 return 0; 4996 return 0;
5115 } 4997 }
5116 4998
5117 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, 4999 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
5118 struct kvm_mp_state *mp_state) 5000 struct kvm_mp_state *mp_state)
5119 { 5001 {
5120 vcpu->arch.mp_state = mp_state->mp_state; 5002 vcpu->arch.mp_state = mp_state->mp_state;
5121 return 0; 5003 return 0;
5122 } 5004 }
5123 5005
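These two handlers back KVM_GET_MP_STATE and KVM_SET_MP_STATE. A hedged sketch of how userspace might force a halted vcpu back to RUNNABLE (vcpu_fd and the helper name are assumptions):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        /* Illustrative sketch: kick a vcpu out of KVM_MP_STATE_HALTED. */
        static int unhalt_vcpu(int vcpu_fd)
        {
                struct kvm_mp_state mp;

                if (ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp) < 0)
                        return -1;
                if (mp.mp_state == KVM_MP_STATE_HALTED) {
                        mp.mp_state = KVM_MP_STATE_RUNNABLE;
                        if (ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp) < 0)
                                return -1;
                }
                return 0;
        }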
5124 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, 5006 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
5125 bool has_error_code, u32 error_code) 5007 bool has_error_code, u32 error_code)
5126 { 5008 {
5127 struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; 5009 struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
5128 int cs_db, cs_l, ret; 5010 int cs_db, cs_l, ret;
5129 cache_all_regs(vcpu); 5011 cache_all_regs(vcpu);
5130 5012
5131 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); 5013 kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
5132 5014
5133 vcpu->arch.emulate_ctxt.vcpu = vcpu; 5015 vcpu->arch.emulate_ctxt.vcpu = vcpu;
5134 vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu); 5016 vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
5135 vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); 5017 vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
5136 vcpu->arch.emulate_ctxt.mode = 5018 vcpu->arch.emulate_ctxt.mode =
5137 (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL : 5019 (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
5138 (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM) 5020 (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
5139 ? X86EMUL_MODE_VM86 : cs_l 5021 ? X86EMUL_MODE_VM86 : cs_l
5140 ? X86EMUL_MODE_PROT64 : cs_db 5022 ? X86EMUL_MODE_PROT64 : cs_db
5141 ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; 5023 ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
5142 memset(c, 0, sizeof(struct decode_cache)); 5024 memset(c, 0, sizeof(struct decode_cache));
5143 memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); 5025 memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
5144 5026
5145 ret = emulator_task_switch(&vcpu->arch.emulate_ctxt, &emulate_ops, 5027 ret = emulator_task_switch(&vcpu->arch.emulate_ctxt, &emulate_ops,
5146 tss_selector, reason, has_error_code, 5028 tss_selector, reason, has_error_code,
5147 error_code); 5029 error_code);
5148 5030
5149 if (ret) 5031 if (ret)
5150 return EMULATE_FAIL; 5032 return EMULATE_FAIL;
5151 5033
5152 memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); 5034 memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
5153 kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); 5035 kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
5154 kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); 5036 kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
5155 return EMULATE_DONE; 5037 return EMULATE_DONE;
5156 } 5038 }
5157 EXPORT_SYMBOL_GPL(kvm_task_switch); 5039 EXPORT_SYMBOL_GPL(kvm_task_switch);
5158 5040
5159 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, 5041 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
5160 struct kvm_sregs *sregs) 5042 struct kvm_sregs *sregs)
5161 { 5043 {
5162 int mmu_reset_needed = 0; 5044 int mmu_reset_needed = 0;
5163 int pending_vec, max_bits; 5045 int pending_vec, max_bits;
5164 struct desc_ptr dt; 5046 struct desc_ptr dt;
5165 5047
5166 dt.size = sregs->idt.limit; 5048 dt.size = sregs->idt.limit;
5167 dt.address = sregs->idt.base; 5049 dt.address = sregs->idt.base;
5168 kvm_x86_ops->set_idt(vcpu, &dt); 5050 kvm_x86_ops->set_idt(vcpu, &dt);
5169 dt.size = sregs->gdt.limit; 5051 dt.size = sregs->gdt.limit;
5170 dt.address = sregs->gdt.base; 5052 dt.address = sregs->gdt.base;
5171 kvm_x86_ops->set_gdt(vcpu, &dt); 5053 kvm_x86_ops->set_gdt(vcpu, &dt);
5172 5054
5173 vcpu->arch.cr2 = sregs->cr2; 5055 vcpu->arch.cr2 = sregs->cr2;
5174 mmu_reset_needed |= vcpu->arch.cr3 != sregs->cr3; 5056 mmu_reset_needed |= vcpu->arch.cr3 != sregs->cr3;
5175 vcpu->arch.cr3 = sregs->cr3; 5057 vcpu->arch.cr3 = sregs->cr3;
5176 5058
5177 kvm_set_cr8(vcpu, sregs->cr8); 5059 kvm_set_cr8(vcpu, sregs->cr8);
5178 5060
5179 mmu_reset_needed |= vcpu->arch.efer != sregs->efer; 5061 mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
5180 kvm_x86_ops->set_efer(vcpu, sregs->efer); 5062 kvm_x86_ops->set_efer(vcpu, sregs->efer);
5181 kvm_set_apic_base(vcpu, sregs->apic_base); 5063 kvm_set_apic_base(vcpu, sregs->apic_base);
5182 5064
5183 mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0; 5065 mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
5184 kvm_x86_ops->set_cr0(vcpu, sregs->cr0); 5066 kvm_x86_ops->set_cr0(vcpu, sregs->cr0);
5185 vcpu->arch.cr0 = sregs->cr0; 5067 vcpu->arch.cr0 = sregs->cr0;
5186 5068
5187 mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4; 5069 mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
5188 kvm_x86_ops->set_cr4(vcpu, sregs->cr4); 5070 kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
5189 if (!is_long_mode(vcpu) && is_pae(vcpu)) { 5071 if (!is_long_mode(vcpu) && is_pae(vcpu)) {
5190 load_pdptrs(vcpu, vcpu->arch.cr3); 5072 load_pdptrs(vcpu, vcpu->arch.cr3);
5191 mmu_reset_needed = 1; 5073 mmu_reset_needed = 1;
5192 } 5074 }
5193 5075
5194 if (mmu_reset_needed) 5076 if (mmu_reset_needed)
5195 kvm_mmu_reset_context(vcpu); 5077 kvm_mmu_reset_context(vcpu);
5196 5078
5197 max_bits = (sizeof sregs->interrupt_bitmap) << 3; 5079 max_bits = (sizeof sregs->interrupt_bitmap) << 3;
5198 pending_vec = find_first_bit( 5080 pending_vec = find_first_bit(
5199 (const unsigned long *)sregs->interrupt_bitmap, max_bits); 5081 (const unsigned long *)sregs->interrupt_bitmap, max_bits);
5200 if (pending_vec < max_bits) { 5082 if (pending_vec < max_bits) {
5201 kvm_queue_interrupt(vcpu, pending_vec, false); 5083 kvm_queue_interrupt(vcpu, pending_vec, false);
5202 pr_debug("Set back pending irq %d\n", pending_vec); 5084 pr_debug("Set back pending irq %d\n", pending_vec);
5203 if (irqchip_in_kernel(vcpu->kvm)) 5085 if (irqchip_in_kernel(vcpu->kvm))
5204 kvm_pic_clear_isr_ack(vcpu->kvm); 5086 kvm_pic_clear_isr_ack(vcpu->kvm);
5205 } 5087 }
5206 5088
5207 kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS); 5089 kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
5208 kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS); 5090 kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
5209 kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES); 5091 kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
5210 kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS); 5092 kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
5211 kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS); 5093 kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
5212 kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS); 5094 kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
5213 5095
5214 kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR); 5096 kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
5215 kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR); 5097 kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
5216 5098
5217 update_cr8_intercept(vcpu); 5099 update_cr8_intercept(vcpu);
5218 5100
5219 /* Older userspace won't unhalt the vcpu on reset. */ 5101 /* Older userspace won't unhalt the vcpu on reset. */
5220 if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 && 5102 if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 &&
5221 sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 && 5103 sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 &&
5222 !is_protmode(vcpu)) 5104 !is_protmode(vcpu))
5223 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; 5105 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
5224 5106
5225 return 0; 5107 return 0;
5226 } 5108 }
5227 5109
5228 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, 5110 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
5229 struct kvm_guest_debug *dbg) 5111 struct kvm_guest_debug *dbg)
5230 { 5112 {
5231 unsigned long rflags; 5113 unsigned long rflags;
5232 int i, r; 5114 int i, r;
5233 5115
5234 if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) { 5116 if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {
5235 r = -EBUSY; 5117 r = -EBUSY;
5236 if (vcpu->arch.exception.pending) 5118 if (vcpu->arch.exception.pending)
5237 goto out; 5119 goto out;
5238 if (dbg->control & KVM_GUESTDBG_INJECT_DB) 5120 if (dbg->control & KVM_GUESTDBG_INJECT_DB)
5239 kvm_queue_exception(vcpu, DB_VECTOR); 5121 kvm_queue_exception(vcpu, DB_VECTOR);
5240 else 5122 else
5241 kvm_queue_exception(vcpu, BP_VECTOR); 5123 kvm_queue_exception(vcpu, BP_VECTOR);
5242 } 5124 }
5243 5125
5244 /* 5126 /*
5245 * Read rflags as long as potentially injected trace flags are still 5127 * Read rflags as long as potentially injected trace flags are still
5246 * filtered out. 5128 * filtered out.
5247 */ 5129 */
5248 rflags = kvm_get_rflags(vcpu); 5130 rflags = kvm_get_rflags(vcpu);
5249 5131
5250 vcpu->guest_debug = dbg->control; 5132 vcpu->guest_debug = dbg->control;
5251 if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE)) 5133 if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE))
5252 vcpu->guest_debug = 0; 5134 vcpu->guest_debug = 0;
5253 5135
5254 if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) { 5136 if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
5255 for (i = 0; i < KVM_NR_DB_REGS; ++i) 5137 for (i = 0; i < KVM_NR_DB_REGS; ++i)
5256 vcpu->arch.eff_db[i] = dbg->arch.debugreg[i]; 5138 vcpu->arch.eff_db[i] = dbg->arch.debugreg[i];
5257 vcpu->arch.switch_db_regs = 5139 vcpu->arch.switch_db_regs =
5258 (dbg->arch.debugreg[7] & DR7_BP_EN_MASK); 5140 (dbg->arch.debugreg[7] & DR7_BP_EN_MASK);
5259 } else { 5141 } else {
5260 for (i = 0; i < KVM_NR_DB_REGS; i++) 5142 for (i = 0; i < KVM_NR_DB_REGS; i++)
5261 vcpu->arch.eff_db[i] = vcpu->arch.db[i]; 5143 vcpu->arch.eff_db[i] = vcpu->arch.db[i];
5262 vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK); 5144 vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK);
5263 } 5145 }
5264 5146
5265 if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) 5147 if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
5266 vcpu->arch.singlestep_rip = kvm_rip_read(vcpu) + 5148 vcpu->arch.singlestep_rip = kvm_rip_read(vcpu) +
5267 get_segment_base(vcpu, VCPU_SREG_CS); 5149 get_segment_base(vcpu, VCPU_SREG_CS);
5268 5150
5269 /* 5151 /*
5270 * Trigger an rflags update that will inject or remove the trace 5152 * Trigger an rflags update that will inject or remove the trace
5271 * flags. 5153 * flags.
5272 */ 5154 */
5273 kvm_set_rflags(vcpu, rflags); 5155 kvm_set_rflags(vcpu, rflags);
5274 5156
5275 kvm_x86_ops->set_guest_debug(vcpu, dbg); 5157 kvm_x86_ops->set_guest_debug(vcpu, dbg);
5276 5158
5277 r = 0; 5159 r = 0;
5278 5160
5279 out: 5161 out:
5280 5162
5281 return r; 5163 return r;
5282 } 5164 }
5283 5165
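kvm_arch_vcpu_ioctl_set_guest_debug() is reached from the KVM_SET_GUEST_DEBUG vcpu ioctl. A minimal, illustrative-only sketch of requesting single-step from userspace (the helper name and vcpu_fd are assumptions):

        #include <linux/kvm.h>
        #include <string.h>
        #include <sys/ioctl.h>

        /* Illustrative sketch: enable single-step debugging on one vcpu. */
        static int enable_singlestep(int vcpu_fd)
        {
                struct kvm_guest_debug dbg;

                memset(&dbg, 0, sizeof(dbg));
                dbg.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
                /* With KVM_GUESTDBG_USE_HW_BP one would also fill dbg.arch.debugreg[]. */
                return ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg);
        }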
5284 /* 5166 /*
5285 * Translate a guest virtual address to a guest physical address. 5167 * Translate a guest virtual address to a guest physical address.
5286 */ 5168 */
5287 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, 5169 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
5288 struct kvm_translation *tr) 5170 struct kvm_translation *tr)
5289 { 5171 {
5290 unsigned long vaddr = tr->linear_address; 5172 unsigned long vaddr = tr->linear_address;
5291 gpa_t gpa; 5173 gpa_t gpa;
5292 int idx; 5174 int idx;
5293 5175
5294 idx = srcu_read_lock(&vcpu->kvm->srcu); 5176 idx = srcu_read_lock(&vcpu->kvm->srcu);
5295 gpa = kvm_mmu_gva_to_gpa_system(vcpu, vaddr, NULL); 5177 gpa = kvm_mmu_gva_to_gpa_system(vcpu, vaddr, NULL);
5296 srcu_read_unlock(&vcpu->kvm->srcu, idx); 5178 srcu_read_unlock(&vcpu->kvm->srcu, idx);
5297 tr->physical_address = gpa; 5179 tr->physical_address = gpa;
5298 tr->valid = gpa != UNMAPPED_GVA; 5180 tr->valid = gpa != UNMAPPED_GVA;
5299 tr->writeable = 1; 5181 tr->writeable = 1;
5300 tr->usermode = 0; 5182 tr->usermode = 0;
5301 5183
5302 return 0; 5184 return 0;
5303 } 5185 }
5304 5186
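The handler above services the KVM_TRANSLATE vcpu ioctl using struct kvm_translation (defined further down in include/linux/kvm.h). An illustrative sketch, with the helper name and vcpu_fd assumed:

        #include <linux/kvm.h>
        #include <stdio.h>
        #include <sys/ioctl.h>

        /* Illustrative sketch: ask KVM to translate one guest-virtual address. */
        static int translate_gva(int vcpu_fd, unsigned long long gva)
        {
                struct kvm_translation tr = { .linear_address = gva };

                if (ioctl(vcpu_fd, KVM_TRANSLATE, &tr) < 0)
                        return -1;
                if (tr.valid)
                        printf("gva 0x%llx -> gpa 0x%llx\n", gva,
                               (unsigned long long)tr.physical_address);
                else
                        printf("gva 0x%llx is unmapped\n", gva);
                return 0;
        }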
5305 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) 5187 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
5306 { 5188 {
5307 struct i387_fxsave_struct *fxsave = 5189 struct i387_fxsave_struct *fxsave =
5308 &vcpu->arch.guest_fpu.state->fxsave; 5190 &vcpu->arch.guest_fpu.state->fxsave;
5309 5191
5310 memcpy(fpu->fpr, fxsave->st_space, 128); 5192 memcpy(fpu->fpr, fxsave->st_space, 128);
5311 fpu->fcw = fxsave->cwd; 5193 fpu->fcw = fxsave->cwd;
5312 fpu->fsw = fxsave->swd; 5194 fpu->fsw = fxsave->swd;
5313 fpu->ftwx = fxsave->twd; 5195 fpu->ftwx = fxsave->twd;
5314 fpu->last_opcode = fxsave->fop; 5196 fpu->last_opcode = fxsave->fop;
5315 fpu->last_ip = fxsave->rip; 5197 fpu->last_ip = fxsave->rip;
5316 fpu->last_dp = fxsave->rdp; 5198 fpu->last_dp = fxsave->rdp;
5317 memcpy(fpu->xmm, fxsave->xmm_space, sizeof fxsave->xmm_space); 5199 memcpy(fpu->xmm, fxsave->xmm_space, sizeof fxsave->xmm_space);
5318 5200
5319 return 0; 5201 return 0;
5320 } 5202 }
5321 5203
5322 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) 5204 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
5323 { 5205 {
5324 struct i387_fxsave_struct *fxsave = 5206 struct i387_fxsave_struct *fxsave =
5325 &vcpu->arch.guest_fpu.state->fxsave; 5207 &vcpu->arch.guest_fpu.state->fxsave;
5326 5208
5327 memcpy(fxsave->st_space, fpu->fpr, 128); 5209 memcpy(fxsave->st_space, fpu->fpr, 128);
5328 fxsave->cwd = fpu->fcw; 5210 fxsave->cwd = fpu->fcw;
5329 fxsave->swd = fpu->fsw; 5211 fxsave->swd = fpu->fsw;
5330 fxsave->twd = fpu->ftwx; 5212 fxsave->twd = fpu->ftwx;
5331 fxsave->fop = fpu->last_opcode; 5213 fxsave->fop = fpu->last_opcode;
5332 fxsave->rip = fpu->last_ip; 5214 fxsave->rip = fpu->last_ip;
5333 fxsave->rdp = fpu->last_dp; 5215 fxsave->rdp = fpu->last_dp;
5334 memcpy(fxsave->xmm_space, fpu->xmm, sizeof fxsave->xmm_space); 5216 memcpy(fxsave->xmm_space, fpu->xmm, sizeof fxsave->xmm_space);
5335 5217
5336 return 0; 5218 return 0;
5337 } 5219 }
5338 5220
5339 int fx_init(struct kvm_vcpu *vcpu) 5221 int fx_init(struct kvm_vcpu *vcpu)
5340 { 5222 {
5341 int err; 5223 int err;
5342 5224
5343 err = fpu_alloc(&vcpu->arch.guest_fpu); 5225 err = fpu_alloc(&vcpu->arch.guest_fpu);
5344 if (err) 5226 if (err)
5345 return err; 5227 return err;
5346 5228
5347 fpu_finit(&vcpu->arch.guest_fpu); 5229 fpu_finit(&vcpu->arch.guest_fpu);
5348 5230
5349 /* 5231 /*
5350 * Ensure guest xcr0 is valid for loading 5232 * Ensure guest xcr0 is valid for loading
5351 */ 5233 */
5352 vcpu->arch.xcr0 = XSTATE_FP; 5234 vcpu->arch.xcr0 = XSTATE_FP;
5353 5235
5354 vcpu->arch.cr0 |= X86_CR0_ET; 5236 vcpu->arch.cr0 |= X86_CR0_ET;
5355 5237
5356 return 0; 5238 return 0;
5357 } 5239 }
5358 EXPORT_SYMBOL_GPL(fx_init); 5240 EXPORT_SYMBOL_GPL(fx_init);
5359 5241
5360 static void fx_free(struct kvm_vcpu *vcpu) 5242 static void fx_free(struct kvm_vcpu *vcpu)
5361 { 5243 {
5362 fpu_free(&vcpu->arch.guest_fpu); 5244 fpu_free(&vcpu->arch.guest_fpu);
5363 } 5245 }
5364 5246
5365 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) 5247 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
5366 { 5248 {
5367 if (vcpu->guest_fpu_loaded) 5249 if (vcpu->guest_fpu_loaded)
5368 return; 5250 return;
5369 5251
5370 /* 5252 /*
5371 * Restore all possible states in the guest, 5253 * Restore all possible states in the guest,
5372 * and assume host would use all available bits. 5254 * and assume host would use all available bits.
5373 * Guest xcr0 would be loaded later. 5255 * Guest xcr0 would be loaded later.
5374 */ 5256 */
5375 kvm_put_guest_xcr0(vcpu); 5257 kvm_put_guest_xcr0(vcpu);
5376 vcpu->guest_fpu_loaded = 1; 5258 vcpu->guest_fpu_loaded = 1;
5377 unlazy_fpu(current); 5259 unlazy_fpu(current);
5378 fpu_restore_checking(&vcpu->arch.guest_fpu); 5260 fpu_restore_checking(&vcpu->arch.guest_fpu);
5379 trace_kvm_fpu(1); 5261 trace_kvm_fpu(1);
5380 } 5262 }
5381 5263
5382 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) 5264 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
5383 { 5265 {
5384 kvm_put_guest_xcr0(vcpu); 5266 kvm_put_guest_xcr0(vcpu);
5385 5267
5386 if (!vcpu->guest_fpu_loaded) 5268 if (!vcpu->guest_fpu_loaded)
5387 return; 5269 return;
5388 5270
5389 vcpu->guest_fpu_loaded = 0; 5271 vcpu->guest_fpu_loaded = 0;
5390 fpu_save_init(&vcpu->arch.guest_fpu); 5272 fpu_save_init(&vcpu->arch.guest_fpu);
5391 ++vcpu->stat.fpu_reload; 5273 ++vcpu->stat.fpu_reload;
5392 set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests); 5274 set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests);
5393 trace_kvm_fpu(0); 5275 trace_kvm_fpu(0);
5394 } 5276 }
5395 5277
5396 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) 5278 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
5397 { 5279 {
5398 if (vcpu->arch.time_page) { 5280 if (vcpu->arch.time_page) {
5399 kvm_release_page_dirty(vcpu->arch.time_page); 5281 kvm_release_page_dirty(vcpu->arch.time_page);
5400 vcpu->arch.time_page = NULL; 5282 vcpu->arch.time_page = NULL;
5401 } 5283 }
5402 5284
5403 fx_free(vcpu); 5285 fx_free(vcpu);
5404 kvm_x86_ops->vcpu_free(vcpu); 5286 kvm_x86_ops->vcpu_free(vcpu);
5405 } 5287 }
5406 5288
5407 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 5289 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
5408 unsigned int id) 5290 unsigned int id)
5409 { 5291 {
5410 return kvm_x86_ops->vcpu_create(kvm, id); 5292 return kvm_x86_ops->vcpu_create(kvm, id);
5411 } 5293 }
5412 5294
5413 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) 5295 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
5414 { 5296 {
5415 int r; 5297 int r;
5416 5298
5417 vcpu->arch.mtrr_state.have_fixed = 1; 5299 vcpu->arch.mtrr_state.have_fixed = 1;
5418 vcpu_load(vcpu); 5300 vcpu_load(vcpu);
5419 r = kvm_arch_vcpu_reset(vcpu); 5301 r = kvm_arch_vcpu_reset(vcpu);
5420 if (r == 0) 5302 if (r == 0)
5421 r = kvm_mmu_setup(vcpu); 5303 r = kvm_mmu_setup(vcpu);
5422 vcpu_put(vcpu); 5304 vcpu_put(vcpu);
5423 if (r < 0) 5305 if (r < 0)
5424 goto free_vcpu; 5306 goto free_vcpu;
5425 5307
5426 return 0; 5308 return 0;
5427 free_vcpu: 5309 free_vcpu:
5428 kvm_x86_ops->vcpu_free(vcpu); 5310 kvm_x86_ops->vcpu_free(vcpu);
5429 return r; 5311 return r;
5430 } 5312 }
5431 5313
5432 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) 5314 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
5433 { 5315 {
5434 vcpu_load(vcpu); 5316 vcpu_load(vcpu);
5435 kvm_mmu_unload(vcpu); 5317 kvm_mmu_unload(vcpu);
5436 vcpu_put(vcpu); 5318 vcpu_put(vcpu);
5437 5319
5438 fx_free(vcpu); 5320 fx_free(vcpu);
5439 kvm_x86_ops->vcpu_free(vcpu); 5321 kvm_x86_ops->vcpu_free(vcpu);
5440 } 5322 }
5441 5323
5442 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu) 5324 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
5443 { 5325 {
5444 vcpu->arch.nmi_pending = false; 5326 vcpu->arch.nmi_pending = false;
5445 vcpu->arch.nmi_injected = false; 5327 vcpu->arch.nmi_injected = false;
5446 5328
5447 vcpu->arch.switch_db_regs = 0; 5329 vcpu->arch.switch_db_regs = 0;
5448 memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db)); 5330 memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
5449 vcpu->arch.dr6 = DR6_FIXED_1; 5331 vcpu->arch.dr6 = DR6_FIXED_1;
5450 vcpu->arch.dr7 = DR7_FIXED_1; 5332 vcpu->arch.dr7 = DR7_FIXED_1;
5451 5333
5452 return kvm_x86_ops->vcpu_reset(vcpu); 5334 return kvm_x86_ops->vcpu_reset(vcpu);
5453 } 5335 }
5454 5336
5455 int kvm_arch_hardware_enable(void *garbage) 5337 int kvm_arch_hardware_enable(void *garbage)
5456 { 5338 {
5457 /* 5339 /*
5458 * Since this may be called from a hotplug notification, 5340 * Since this may be called from a hotplug notification,
5459 * we can't get the CPU frequency directly. 5341 * we can't get the CPU frequency directly.
5460 */ 5342 */
5461 if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { 5343 if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
5462 int cpu = raw_smp_processor_id(); 5344 int cpu = raw_smp_processor_id();
5463 per_cpu(cpu_tsc_khz, cpu) = 0; 5345 per_cpu(cpu_tsc_khz, cpu) = 0;
5464 } 5346 }
5465 5347
5466 kvm_shared_msr_cpu_online(); 5348 kvm_shared_msr_cpu_online();
5467 5349
5468 return kvm_x86_ops->hardware_enable(garbage); 5350 return kvm_x86_ops->hardware_enable(garbage);
5469 } 5351 }
5470 5352
5471 void kvm_arch_hardware_disable(void *garbage) 5353 void kvm_arch_hardware_disable(void *garbage)
5472 { 5354 {
5473 kvm_x86_ops->hardware_disable(garbage); 5355 kvm_x86_ops->hardware_disable(garbage);
5474 drop_user_return_notifiers(garbage); 5356 drop_user_return_notifiers(garbage);
5475 } 5357 }
5476 5358
5477 int kvm_arch_hardware_setup(void) 5359 int kvm_arch_hardware_setup(void)
5478 { 5360 {
5479 return kvm_x86_ops->hardware_setup(); 5361 return kvm_x86_ops->hardware_setup();
5480 } 5362 }
5481 5363
5482 void kvm_arch_hardware_unsetup(void) 5364 void kvm_arch_hardware_unsetup(void)
5483 { 5365 {
5484 kvm_x86_ops->hardware_unsetup(); 5366 kvm_x86_ops->hardware_unsetup();
5485 } 5367 }
5486 5368
5487 void kvm_arch_check_processor_compat(void *rtn) 5369 void kvm_arch_check_processor_compat(void *rtn)
5488 { 5370 {
5489 kvm_x86_ops->check_processor_compatibility(rtn); 5371 kvm_x86_ops->check_processor_compatibility(rtn);
5490 } 5372 }
5491 5373
5492 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) 5374 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
5493 { 5375 {
5494 struct page *page; 5376 struct page *page;
5495 struct kvm *kvm; 5377 struct kvm *kvm;
5496 int r; 5378 int r;
5497 5379
5498 BUG_ON(vcpu->kvm == NULL); 5380 BUG_ON(vcpu->kvm == NULL);
5499 kvm = vcpu->kvm; 5381 kvm = vcpu->kvm;
5500 5382
5501 vcpu->arch.mmu.root_hpa = INVALID_PAGE; 5383 vcpu->arch.mmu.root_hpa = INVALID_PAGE;
5502 if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu)) 5384 if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
5503 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; 5385 vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
5504 else 5386 else
5505 vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; 5387 vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED;
5506 5388
5507 page = alloc_page(GFP_KERNEL | __GFP_ZERO); 5389 page = alloc_page(GFP_KERNEL | __GFP_ZERO);
5508 if (!page) { 5390 if (!page) {
5509 r = -ENOMEM; 5391 r = -ENOMEM;
5510 goto fail; 5392 goto fail;
5511 } 5393 }
5512 vcpu->arch.pio_data = page_address(page); 5394 vcpu->arch.pio_data = page_address(page);
5513 5395
5514 r = kvm_mmu_create(vcpu); 5396 r = kvm_mmu_create(vcpu);
5515 if (r < 0) 5397 if (r < 0)
5516 goto fail_free_pio_data; 5398 goto fail_free_pio_data;
5517 5399
5518 if (irqchip_in_kernel(kvm)) { 5400 if (irqchip_in_kernel(kvm)) {
5519 r = kvm_create_lapic(vcpu); 5401 r = kvm_create_lapic(vcpu);
5520 if (r < 0) 5402 if (r < 0)
5521 goto fail_mmu_destroy; 5403 goto fail_mmu_destroy;
5522 } 5404 }
5523 5405
5524 vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4, 5406 vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4,
5525 GFP_KERNEL); 5407 GFP_KERNEL);
5526 if (!vcpu->arch.mce_banks) { 5408 if (!vcpu->arch.mce_banks) {
5527 r = -ENOMEM; 5409 r = -ENOMEM;
5528 goto fail_free_lapic; 5410 goto fail_free_lapic;
5529 } 5411 }
5530 vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS; 5412 vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
5531 5413
5532 return 0; 5414 return 0;
5533 fail_free_lapic: 5415 fail_free_lapic:
5534 kvm_free_lapic(vcpu); 5416 kvm_free_lapic(vcpu);
5535 fail_mmu_destroy: 5417 fail_mmu_destroy:
5536 kvm_mmu_destroy(vcpu); 5418 kvm_mmu_destroy(vcpu);
5537 fail_free_pio_data: 5419 fail_free_pio_data:
5538 free_page((unsigned long)vcpu->arch.pio_data); 5420 free_page((unsigned long)vcpu->arch.pio_data);
5539 fail: 5421 fail:
5540 return r; 5422 return r;
5541 } 5423 }
5542 5424
5543 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) 5425 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
5544 { 5426 {
5545 int idx; 5427 int idx;
5546 5428
5547 kfree(vcpu->arch.mce_banks); 5429 kfree(vcpu->arch.mce_banks);
5548 kvm_free_lapic(vcpu); 5430 kvm_free_lapic(vcpu);
5549 idx = srcu_read_lock(&vcpu->kvm->srcu); 5431 idx = srcu_read_lock(&vcpu->kvm->srcu);
5550 kvm_mmu_destroy(vcpu); 5432 kvm_mmu_destroy(vcpu);
5551 srcu_read_unlock(&vcpu->kvm->srcu, idx); 5433 srcu_read_unlock(&vcpu->kvm->srcu, idx);
5552 free_page((unsigned long)vcpu->arch.pio_data); 5434 free_page((unsigned long)vcpu->arch.pio_data);
5553 } 5435 }
5554 5436
5555 struct kvm *kvm_arch_create_vm(void) 5437 struct kvm *kvm_arch_create_vm(void)
5556 { 5438 {
5557 struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); 5439 struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
5558 5440
5559 if (!kvm) 5441 if (!kvm)
5560 return ERR_PTR(-ENOMEM); 5442 return ERR_PTR(-ENOMEM);
5561 5443
5562 kvm->arch.aliases = kzalloc(sizeof(struct kvm_mem_aliases), GFP_KERNEL);
5563 if (!kvm->arch.aliases) {
5564 kfree(kvm);
5565 return ERR_PTR(-ENOMEM);
5566 }
5567
5568 INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); 5444 INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
5569 INIT_LIST_HEAD(&kvm->arch.assigned_dev_head); 5445 INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
5570 5446
5571 /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ 5447 /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
5572 set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); 5448 set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
5573 5449
5574 rdtscll(kvm->arch.vm_init_tsc); 5450 rdtscll(kvm->arch.vm_init_tsc);
5575 5451
5576 return kvm; 5452 return kvm;
5577 } 5453 }
5578 5454
5579 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu) 5455 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
5580 { 5456 {
5581 vcpu_load(vcpu); 5457 vcpu_load(vcpu);
5582 kvm_mmu_unload(vcpu); 5458 kvm_mmu_unload(vcpu);
5583 vcpu_put(vcpu); 5459 vcpu_put(vcpu);
5584 } 5460 }
5585 5461
5586 static void kvm_free_vcpus(struct kvm *kvm) 5462 static void kvm_free_vcpus(struct kvm *kvm)
5587 { 5463 {
5588 unsigned int i; 5464 unsigned int i;
5589 struct kvm_vcpu *vcpu; 5465 struct kvm_vcpu *vcpu;
5590 5466
5591 /* 5467 /*
5592 * Unpin any mmu pages first. 5468 * Unpin any mmu pages first.
5593 */ 5469 */
5594 kvm_for_each_vcpu(i, vcpu, kvm) 5470 kvm_for_each_vcpu(i, vcpu, kvm)
5595 kvm_unload_vcpu_mmu(vcpu); 5471 kvm_unload_vcpu_mmu(vcpu);
5596 kvm_for_each_vcpu(i, vcpu, kvm) 5472 kvm_for_each_vcpu(i, vcpu, kvm)
5597 kvm_arch_vcpu_free(vcpu); 5473 kvm_arch_vcpu_free(vcpu);
5598 5474
5599 mutex_lock(&kvm->lock); 5475 mutex_lock(&kvm->lock);
5600 for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) 5476 for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
5601 kvm->vcpus[i] = NULL; 5477 kvm->vcpus[i] = NULL;
5602 5478
5603 atomic_set(&kvm->online_vcpus, 0); 5479 atomic_set(&kvm->online_vcpus, 0);
5604 mutex_unlock(&kvm->lock); 5480 mutex_unlock(&kvm->lock);
5605 } 5481 }
5606 5482
5607 void kvm_arch_sync_events(struct kvm *kvm) 5483 void kvm_arch_sync_events(struct kvm *kvm)
5608 { 5484 {
5609 kvm_free_all_assigned_devices(kvm); 5485 kvm_free_all_assigned_devices(kvm);
5610 } 5486 }
5611 5487
5612 void kvm_arch_destroy_vm(struct kvm *kvm) 5488 void kvm_arch_destroy_vm(struct kvm *kvm)
5613 { 5489 {
5614 kvm_iommu_unmap_guest(kvm); 5490 kvm_iommu_unmap_guest(kvm);
5615 kvm_free_pit(kvm); 5491 kvm_free_pit(kvm);
5616 kfree(kvm->arch.vpic); 5492 kfree(kvm->arch.vpic);
5617 kfree(kvm->arch.vioapic); 5493 kfree(kvm->arch.vioapic);
5618 kvm_free_vcpus(kvm); 5494 kvm_free_vcpus(kvm);
5619 kvm_free_physmem(kvm); 5495 kvm_free_physmem(kvm);
5620 if (kvm->arch.apic_access_page) 5496 if (kvm->arch.apic_access_page)
5621 put_page(kvm->arch.apic_access_page); 5497 put_page(kvm->arch.apic_access_page);
5622 if (kvm->arch.ept_identity_pagetable) 5498 if (kvm->arch.ept_identity_pagetable)
5623 put_page(kvm->arch.ept_identity_pagetable); 5499 put_page(kvm->arch.ept_identity_pagetable);
5624 cleanup_srcu_struct(&kvm->srcu); 5500 cleanup_srcu_struct(&kvm->srcu);
5625 kfree(kvm->arch.aliases);
5626 kfree(kvm); 5501 kfree(kvm);
5627 } 5502 }
5628 5503
5629 int kvm_arch_prepare_memory_region(struct kvm *kvm, 5504 int kvm_arch_prepare_memory_region(struct kvm *kvm,
5630 struct kvm_memory_slot *memslot, 5505 struct kvm_memory_slot *memslot,
5631 struct kvm_memory_slot old, 5506 struct kvm_memory_slot old,
5632 struct kvm_userspace_memory_region *mem, 5507 struct kvm_userspace_memory_region *mem,
5633 int user_alloc) 5508 int user_alloc)
5634 { 5509 {
5635 int npages = memslot->npages; 5510 int npages = memslot->npages;
5636 5511
5637 /* To keep backward compatibility with older userspace, 5512 /* To keep backward compatibility with older userspace,
5638 * x86 needs to handle the !user_alloc case. 5513 * x86 needs to handle the !user_alloc case.
5639 */ 5514 */
5640 if (!user_alloc) { 5515 if (!user_alloc) {
5641 if (npages && !old.rmap) { 5516 if (npages && !old.rmap) {
5642 unsigned long userspace_addr; 5517 unsigned long userspace_addr;
5643 5518
5644 down_write(&current->mm->mmap_sem); 5519 down_write(&current->mm->mmap_sem);
5645 userspace_addr = do_mmap(NULL, 0, 5520 userspace_addr = do_mmap(NULL, 0,
5646 npages * PAGE_SIZE, 5521 npages * PAGE_SIZE,
5647 PROT_READ | PROT_WRITE, 5522 PROT_READ | PROT_WRITE,
5648 MAP_PRIVATE | MAP_ANONYMOUS, 5523 MAP_PRIVATE | MAP_ANONYMOUS,
5649 0); 5524 0);
5650 up_write(&current->mm->mmap_sem); 5525 up_write(&current->mm->mmap_sem);
5651 5526
5652 if (IS_ERR((void *)userspace_addr)) 5527 if (IS_ERR((void *)userspace_addr))
5653 return PTR_ERR((void *)userspace_addr); 5528 return PTR_ERR((void *)userspace_addr);
5654 5529
5655 memslot->userspace_addr = userspace_addr; 5530 memslot->userspace_addr = userspace_addr;
5656 } 5531 }
5657 } 5532 }
5658 5533
5659 5534
5660 return 0; 5535 return 0;
5661 } 5536 }
5662 5537
5663 void kvm_arch_commit_memory_region(struct kvm *kvm, 5538 void kvm_arch_commit_memory_region(struct kvm *kvm,
5664 struct kvm_userspace_memory_region *mem, 5539 struct kvm_userspace_memory_region *mem,
5665 struct kvm_memory_slot old, 5540 struct kvm_memory_slot old,
5666 int user_alloc) 5541 int user_alloc)
5667 { 5542 {
5668 5543
5669 int npages = mem->memory_size >> PAGE_SHIFT; 5544 int npages = mem->memory_size >> PAGE_SHIFT;
5670 5545
5671 if (!user_alloc && !old.user_alloc && old.rmap && !npages) { 5546 if (!user_alloc && !old.user_alloc && old.rmap && !npages) {
5672 int ret; 5547 int ret;
5673 5548
5674 down_write(&current->mm->mmap_sem); 5549 down_write(&current->mm->mmap_sem);
5675 ret = do_munmap(current->mm, old.userspace_addr, 5550 ret = do_munmap(current->mm, old.userspace_addr,
5676 old.npages * PAGE_SIZE); 5551 old.npages * PAGE_SIZE);
5677 up_write(&current->mm->mmap_sem); 5552 up_write(&current->mm->mmap_sem);
5678 if (ret < 0) 5553 if (ret < 0)
5679 printk(KERN_WARNING 5554 printk(KERN_WARNING
5680 "kvm_vm_ioctl_set_memory_region: " 5555 "kvm_vm_ioctl_set_memory_region: "
5681 "failed to munmap memory\n"); 5556 "failed to munmap memory\n");
5682 } 5557 }
5683 5558
5684 spin_lock(&kvm->mmu_lock); 5559 spin_lock(&kvm->mmu_lock);
5685 if (!kvm->arch.n_requested_mmu_pages) { 5560 if (!kvm->arch.n_requested_mmu_pages) {
5686 unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm); 5561 unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
5687 kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages); 5562 kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
5688 } 5563 }
5689 5564
5690 kvm_mmu_slot_remove_write_access(kvm, mem->slot); 5565 kvm_mmu_slot_remove_write_access(kvm, mem->slot);
5691 spin_unlock(&kvm->mmu_lock); 5566 spin_unlock(&kvm->mmu_lock);
5692 } 5567 }
5693 5568
5694 void kvm_arch_flush_shadow(struct kvm *kvm) 5569 void kvm_arch_flush_shadow(struct kvm *kvm)
5695 { 5570 {
5696 kvm_mmu_zap_all(kvm); 5571 kvm_mmu_zap_all(kvm);
5697 kvm_reload_remote_mmus(kvm); 5572 kvm_reload_remote_mmus(kvm);
5698 } 5573 }
5699 5574
5700 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) 5575 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
5701 { 5576 {
5702 return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE 5577 return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
5703 || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED 5578 || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
5704 || vcpu->arch.nmi_pending || 5579 || vcpu->arch.nmi_pending ||
5705 (kvm_arch_interrupt_allowed(vcpu) && 5580 (kvm_arch_interrupt_allowed(vcpu) &&
5706 kvm_cpu_has_interrupt(vcpu)); 5581 kvm_cpu_has_interrupt(vcpu));
5707 } 5582 }
5708 5583
5709 void kvm_vcpu_kick(struct kvm_vcpu *vcpu) 5584 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
5710 { 5585 {
5711 int me; 5586 int me;
5712 int cpu = vcpu->cpu; 5587 int cpu = vcpu->cpu;
5713 5588
5714 if (waitqueue_active(&vcpu->wq)) { 5589 if (waitqueue_active(&vcpu->wq)) {
5715 wake_up_interruptible(&vcpu->wq); 5590 wake_up_interruptible(&vcpu->wq);
5716 ++vcpu->stat.halt_wakeup; 5591 ++vcpu->stat.halt_wakeup;
5717 } 5592 }
5718 5593
5719 me = get_cpu(); 5594 me = get_cpu();
5720 if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) 5595 if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
5721 if (atomic_xchg(&vcpu->guest_mode, 0)) 5596 if (atomic_xchg(&vcpu->guest_mode, 0))
5722 smp_send_reschedule(cpu); 5597 smp_send_reschedule(cpu);
5723 put_cpu(); 5598 put_cpu();
5724 } 5599 }
5725 5600
5726 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu) 5601 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
5727 { 5602 {
5728 return kvm_x86_ops->interrupt_allowed(vcpu); 5603 return kvm_x86_ops->interrupt_allowed(vcpu);
5729 } 5604 }
5730 5605
5731 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip) 5606 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
5732 { 5607 {
5733 unsigned long current_rip = kvm_rip_read(vcpu) + 5608 unsigned long current_rip = kvm_rip_read(vcpu) +
5734 get_segment_base(vcpu, VCPU_SREG_CS); 5609 get_segment_base(vcpu, VCPU_SREG_CS);
5735 5610
5736 return current_rip == linear_rip; 5611 return current_rip == linear_rip;
5737 } 5612 }
5738 EXPORT_SYMBOL_GPL(kvm_is_linear_rip); 5613 EXPORT_SYMBOL_GPL(kvm_is_linear_rip);
5739 5614
5740 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu) 5615 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
5741 { 5616 {
5742 unsigned long rflags; 5617 unsigned long rflags;
5743 5618
5744 rflags = kvm_x86_ops->get_rflags(vcpu); 5619 rflags = kvm_x86_ops->get_rflags(vcpu);
5745 if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) 5620 if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
5746 rflags &= ~X86_EFLAGS_TF; 5621 rflags &= ~X86_EFLAGS_TF;
5747 return rflags; 5622 return rflags;
5748 } 5623 }
5749 EXPORT_SYMBOL_GPL(kvm_get_rflags); 5624 EXPORT_SYMBOL_GPL(kvm_get_rflags);
5750 5625
5751 void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) 5626 void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
5752 { 5627 {
5753 if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP && 5628 if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
5754 kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip)) 5629 kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
5755 rflags |= X86_EFLAGS_TF; 5630 rflags |= X86_EFLAGS_TF;
5756 kvm_x86_ops->set_rflags(vcpu, rflags); 5631 kvm_x86_ops->set_rflags(vcpu, rflags);
5757 } 5632 }
5758 EXPORT_SYMBOL_GPL(kvm_set_rflags); 5633 EXPORT_SYMBOL_GPL(kvm_set_rflags);
5759 5634
5760 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); 5635 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
5761 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); 5636 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
5762 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); 5637 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
5763 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr); 5638 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
5764 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr); 5639 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
5765 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun); 5640 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
5766 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit); 5641 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
5767 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject); 5642 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
5768 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit); 5643 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit);
5769 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga); 5644 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
5770 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit); 5645 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
5771 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts); 5646 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
5772 5647
1 #ifndef ARCH_X86_KVM_X86_H 1 #ifndef ARCH_X86_KVM_X86_H
2 #define ARCH_X86_KVM_X86_H 2 #define ARCH_X86_KVM_X86_H
3 3
4 #include <linux/kvm_host.h> 4 #include <linux/kvm_host.h>
5 #include "kvm_cache_regs.h" 5 #include "kvm_cache_regs.h"
6 6
7 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) 7 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
8 { 8 {
9 vcpu->arch.exception.pending = false; 9 vcpu->arch.exception.pending = false;
10 } 10 }
11 11
12 static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector, 12 static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector,
13 bool soft) 13 bool soft)
14 { 14 {
15 vcpu->arch.interrupt.pending = true; 15 vcpu->arch.interrupt.pending = true;
16 vcpu->arch.interrupt.soft = soft; 16 vcpu->arch.interrupt.soft = soft;
17 vcpu->arch.interrupt.nr = vector; 17 vcpu->arch.interrupt.nr = vector;
18 } 18 }
19 19
20 static inline void kvm_clear_interrupt_queue(struct kvm_vcpu *vcpu) 20 static inline void kvm_clear_interrupt_queue(struct kvm_vcpu *vcpu)
21 { 21 {
22 vcpu->arch.interrupt.pending = false; 22 vcpu->arch.interrupt.pending = false;
23 } 23 }
24 24
25 static inline bool kvm_event_needs_reinjection(struct kvm_vcpu *vcpu) 25 static inline bool kvm_event_needs_reinjection(struct kvm_vcpu *vcpu)
26 { 26 {
27 return vcpu->arch.exception.pending || vcpu->arch.interrupt.pending || 27 return vcpu->arch.exception.pending || vcpu->arch.interrupt.pending ||
28 vcpu->arch.nmi_injected; 28 vcpu->arch.nmi_injected;
29 } 29 }
30 30
31 static inline bool kvm_exception_is_soft(unsigned int nr) 31 static inline bool kvm_exception_is_soft(unsigned int nr)
32 { 32 {
33 return (nr == BP_VECTOR) || (nr == OF_VECTOR); 33 return (nr == BP_VECTOR) || (nr == OF_VECTOR);
34 } 34 }
35 35
36 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, 36 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
37 u32 function, u32 index); 37 u32 function, u32 index);
38 38
39 static inline bool is_protmode(struct kvm_vcpu *vcpu) 39 static inline bool is_protmode(struct kvm_vcpu *vcpu)
40 { 40 {
41 return kvm_read_cr0_bits(vcpu, X86_CR0_PE); 41 return kvm_read_cr0_bits(vcpu, X86_CR0_PE);
42 } 42 }
43 43
44 static inline int is_long_mode(struct kvm_vcpu *vcpu) 44 static inline int is_long_mode(struct kvm_vcpu *vcpu)
45 { 45 {
46 #ifdef CONFIG_X86_64 46 #ifdef CONFIG_X86_64
47 return vcpu->arch.efer & EFER_LMA; 47 return vcpu->arch.efer & EFER_LMA;
48 #else 48 #else
49 return 0; 49 return 0;
50 #endif 50 #endif
51 } 51 }
52 52
53 static inline int is_pae(struct kvm_vcpu *vcpu) 53 static inline int is_pae(struct kvm_vcpu *vcpu)
54 { 54 {
55 return kvm_read_cr4_bits(vcpu, X86_CR4_PAE); 55 return kvm_read_cr4_bits(vcpu, X86_CR4_PAE);
56 } 56 }
57 57
58 static inline int is_pse(struct kvm_vcpu *vcpu) 58 static inline int is_pse(struct kvm_vcpu *vcpu)
59 { 59 {
60 return kvm_read_cr4_bits(vcpu, X86_CR4_PSE); 60 return kvm_read_cr4_bits(vcpu, X86_CR4_PSE);
61 } 61 }
62 62
63 static inline int is_paging(struct kvm_vcpu *vcpu) 63 static inline int is_paging(struct kvm_vcpu *vcpu)
64 { 64 {
65 return kvm_read_cr0_bits(vcpu, X86_CR0_PG); 65 return kvm_read_cr0_bits(vcpu, X86_CR0_PG);
66 } 66 }
67 67
68 static inline struct kvm_mem_aliases *kvm_aliases(struct kvm *kvm)
69 {
70 return rcu_dereference_check(kvm->arch.aliases,
71 srcu_read_lock_held(&kvm->srcu)
72 || lockdep_is_held(&kvm->slots_lock));
73 }
74
75 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu); 68 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
76 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu); 69 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
77 70
78 #endif 71 #endif
79 72
1 #ifndef __LINUX_KVM_H 1 #ifndef __LINUX_KVM_H
2 #define __LINUX_KVM_H 2 #define __LINUX_KVM_H
3 3
4 /* 4 /*
5 * Userspace interface for /dev/kvm - kernel based virtual machine 5 * Userspace interface for /dev/kvm - kernel based virtual machine
6 * 6 *
7 * Note: you must update KVM_API_VERSION if you change this interface. 7 * Note: you must update KVM_API_VERSION if you change this interface.
8 */ 8 */
9 9
10 #include <linux/types.h> 10 #include <linux/types.h>
11 #include <linux/compiler.h> 11 #include <linux/compiler.h>
12 #include <linux/ioctl.h> 12 #include <linux/ioctl.h>
13 #include <asm/kvm.h> 13 #include <asm/kvm.h>
14 14
15 #define KVM_API_VERSION 12 15 #define KVM_API_VERSION 12
16 16
17 /* *** Deprecated interfaces *** */ 17 /* *** Deprecated interfaces *** */
18 18
19 #define KVM_TRC_SHIFT 16 19 #define KVM_TRC_SHIFT 16
20 20
21 #define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT) 21 #define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT)
22 #define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1)) 22 #define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1))
23 23
24 #define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01) 24 #define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01)
25 #define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02) 25 #define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02)
26 #define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01) 26 #define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01)
27 27
28 #define KVM_TRC_HEAD_SIZE 12 28 #define KVM_TRC_HEAD_SIZE 12
29 #define KVM_TRC_CYCLE_SIZE 8 29 #define KVM_TRC_CYCLE_SIZE 8
30 #define KVM_TRC_EXTRA_MAX 7 30 #define KVM_TRC_EXTRA_MAX 7
31 31
32 #define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02) 32 #define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02)
33 #define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03) 33 #define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03)
34 #define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04) 34 #define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04)
35 #define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05) 35 #define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05)
36 #define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06) 36 #define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06)
37 #define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07) 37 #define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07)
38 #define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08) 38 #define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08)
39 #define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09) 39 #define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09)
40 #define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A) 40 #define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A)
41 #define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B) 41 #define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B)
42 #define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C) 42 #define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C)
43 #define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D) 43 #define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D)
44 #define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E) 44 #define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E)
45 #define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F) 45 #define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F)
46 #define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10) 46 #define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10)
47 #define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11) 47 #define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11)
48 #define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12) 48 #define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12)
49 #define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13) 49 #define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13)
50 #define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14) 50 #define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14)
51 #define KVM_TRC_TDP_FAULT (KVM_TRC_HANDLER + 0x15) 51 #define KVM_TRC_TDP_FAULT (KVM_TRC_HANDLER + 0x15)
52 #define KVM_TRC_GTLB_WRITE (KVM_TRC_HANDLER + 0x16) 52 #define KVM_TRC_GTLB_WRITE (KVM_TRC_HANDLER + 0x16)
53 #define KVM_TRC_STLB_WRITE (KVM_TRC_HANDLER + 0x17) 53 #define KVM_TRC_STLB_WRITE (KVM_TRC_HANDLER + 0x17)
54 #define KVM_TRC_STLB_INVAL (KVM_TRC_HANDLER + 0x18) 54 #define KVM_TRC_STLB_INVAL (KVM_TRC_HANDLER + 0x18)
55 #define KVM_TRC_PPC_INSTR (KVM_TRC_HANDLER + 0x19) 55 #define KVM_TRC_PPC_INSTR (KVM_TRC_HANDLER + 0x19)
56 56
57 struct kvm_user_trace_setup { 57 struct kvm_user_trace_setup {
58 __u32 buf_size; 58 __u32 buf_size;
59 __u32 buf_nr; 59 __u32 buf_nr;
60 }; 60 };
61 61
62 #define __KVM_DEPRECATED_MAIN_W_0x06 \ 62 #define __KVM_DEPRECATED_MAIN_W_0x06 \
63 _IOW(KVMIO, 0x06, struct kvm_user_trace_setup) 63 _IOW(KVMIO, 0x06, struct kvm_user_trace_setup)
64 #define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07) 64 #define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07)
65 #define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08) 65 #define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08)
66 66
67 #define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq) 67 #define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq)
68 68
69 struct kvm_breakpoint { 69 struct kvm_breakpoint {
70 __u32 enabled; 70 __u32 enabled;
71 __u32 padding; 71 __u32 padding;
72 __u64 address; 72 __u64 address;
73 }; 73 };
74 74
75 struct kvm_debug_guest { 75 struct kvm_debug_guest {
76 __u32 enabled; 76 __u32 enabled;
77 __u32 pad; 77 __u32 pad;
78 struct kvm_breakpoint breakpoints[4]; 78 struct kvm_breakpoint breakpoints[4];
79 __u32 singlestep; 79 __u32 singlestep;
80 }; 80 };
81 81
82 #define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest) 82 #define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest)
83 83
84 /* *** End of deprecated interfaces *** */ 84 /* *** End of deprecated interfaces *** */
85 85
86 86
87 /* for KVM_CREATE_MEMORY_REGION */ 87 /* for KVM_CREATE_MEMORY_REGION */
88 struct kvm_memory_region { 88 struct kvm_memory_region {
89 __u32 slot; 89 __u32 slot;
90 __u32 flags; 90 __u32 flags;
91 __u64 guest_phys_addr; 91 __u64 guest_phys_addr;
92 __u64 memory_size; /* bytes */ 92 __u64 memory_size; /* bytes */
93 }; 93 };
94 94
95 /* for KVM_SET_USER_MEMORY_REGION */ 95 /* for KVM_SET_USER_MEMORY_REGION */
96 struct kvm_userspace_memory_region { 96 struct kvm_userspace_memory_region {
97 __u32 slot; 97 __u32 slot;
98 __u32 flags; 98 __u32 flags;
99 __u64 guest_phys_addr; 99 __u64 guest_phys_addr;
100 __u64 memory_size; /* bytes */ 100 __u64 memory_size; /* bytes */
101 __u64 userspace_addr; /* start of the userspace allocated memory */ 101 __u64 userspace_addr; /* start of the userspace allocated memory */
102 }; 102 };
103 103
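With KVM_SET_MEMORY_ALIAS gone, the effect of an alias can be had purely through this structure: userspace may register two slots whose userspace_addr points at the same host backing, so the same pages appear at two guest physical addresses. A sketch under stated assumptions (slot numbers, sizes and guest addresses are arbitrary; vm_fd comes from KVM_CREATE_VM):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>

        /* Illustrative sketch: expose one host buffer at two guest physical
         * addresses, roughly what KVM_SET_MEMORY_ALIAS used to provide. */
        static int map_ram_with_alias(int vm_fd)
        {
                size_t size = 1UL << 20;        /* 1 MiB of backing memory */
                void *backing = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                struct kvm_userspace_memory_region ram = {
                        .slot = 0,
                        .guest_phys_addr = 0x100000,    /* primary mapping */
                        .memory_size = size,
                        .userspace_addr = (unsigned long)backing,
                };
                struct kvm_userspace_memory_region alias = {
                        .slot = 1,
                        .guest_phys_addr = 0xe0000000,  /* "alias" of the same pages */
                        .memory_size = size,
                        .userspace_addr = (unsigned long)backing,
                };

                if (backing == MAP_FAILED)
                        return -1;
                if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &ram) < 0)
                        return -1;
                return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &alias);
        }

How a given VMM layers overlapping regions on top of these slots is outside the scope of this diff; the point is only that no in-kernel alias table is needed for it.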
104 /* for kvm_memory_region::flags */ 104 /* for kvm_memory_region::flags */
105 #define KVM_MEM_LOG_DIRTY_PAGES 1UL 105 #define KVM_MEM_LOG_DIRTY_PAGES 1UL
106 #define KVM_MEMSLOT_INVALID (1UL << 1) 106 #define KVM_MEMSLOT_INVALID (1UL << 1)
107 107
108 /* for KVM_IRQ_LINE */ 108 /* for KVM_IRQ_LINE */
109 struct kvm_irq_level { 109 struct kvm_irq_level {
110 /* 110 /*
111 * ACPI gsi notion of irq. 111 * ACPI gsi notion of irq.
112 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47.. 112 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
113 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23.. 113 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
114 */ 114 */
115 union { 115 union {
116 __u32 irq; 116 __u32 irq;
117 __s32 status; 117 __s32 status;
118 }; 118 };
119 __u32 level; 119 __u32 level;
120 }; 120 };
121 121
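struct kvm_irq_level feeds the KVM_IRQ_LINE vm ioctl. An illustrative sketch that pulses a GSI, assuming an in-kernel irqchip was created with KVM_CREATE_IRQCHIP and that vm_fd and gsi are supplied by the caller:

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        /* Illustrative sketch: assert and then de-assert one interrupt line. */
        static int pulse_irq_line(int vm_fd, unsigned int gsi)
        {
                struct kvm_irq_level irq = { .irq = gsi, .level = 1 };

                if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)
                        return -1;
                irq.level = 0;          /* de-assert again */
                return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
        }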
122 122
123 struct kvm_irqchip { 123 struct kvm_irqchip {
124 __u32 chip_id; 124 __u32 chip_id;
125 __u32 pad; 125 __u32 pad;
126 union { 126 union {
127 char dummy[512]; /* reserving space */ 127 char dummy[512]; /* reserving space */
128 #ifdef __KVM_HAVE_PIT 128 #ifdef __KVM_HAVE_PIT
129 struct kvm_pic_state pic; 129 struct kvm_pic_state pic;
130 #endif 130 #endif
131 #ifdef __KVM_HAVE_IOAPIC 131 #ifdef __KVM_HAVE_IOAPIC
132 struct kvm_ioapic_state ioapic; 132 struct kvm_ioapic_state ioapic;
133 #endif 133 #endif
134 } chip; 134 } chip;
135 }; 135 };
136 136
137 /* for KVM_CREATE_PIT2 */ 137 /* for KVM_CREATE_PIT2 */
138 struct kvm_pit_config { 138 struct kvm_pit_config {
139 __u32 flags; 139 __u32 flags;
140 __u32 pad[15]; 140 __u32 pad[15];
141 }; 141 };
142 142
143 #define KVM_PIT_SPEAKER_DUMMY 1 143 #define KVM_PIT_SPEAKER_DUMMY 1
144 144
145 #define KVM_EXIT_UNKNOWN 0 145 #define KVM_EXIT_UNKNOWN 0
146 #define KVM_EXIT_EXCEPTION 1 146 #define KVM_EXIT_EXCEPTION 1
147 #define KVM_EXIT_IO 2 147 #define KVM_EXIT_IO 2
148 #define KVM_EXIT_HYPERCALL 3 148 #define KVM_EXIT_HYPERCALL 3
149 #define KVM_EXIT_DEBUG 4 149 #define KVM_EXIT_DEBUG 4
150 #define KVM_EXIT_HLT 5 150 #define KVM_EXIT_HLT 5
151 #define KVM_EXIT_MMIO 6 151 #define KVM_EXIT_MMIO 6
152 #define KVM_EXIT_IRQ_WINDOW_OPEN 7 152 #define KVM_EXIT_IRQ_WINDOW_OPEN 7
153 #define KVM_EXIT_SHUTDOWN 8 153 #define KVM_EXIT_SHUTDOWN 8
154 #define KVM_EXIT_FAIL_ENTRY 9 154 #define KVM_EXIT_FAIL_ENTRY 9
155 #define KVM_EXIT_INTR 10 155 #define KVM_EXIT_INTR 10
156 #define KVM_EXIT_SET_TPR 11 156 #define KVM_EXIT_SET_TPR 11
157 #define KVM_EXIT_TPR_ACCESS 12 157 #define KVM_EXIT_TPR_ACCESS 12
158 #define KVM_EXIT_S390_SIEIC 13 158 #define KVM_EXIT_S390_SIEIC 13
159 #define KVM_EXIT_S390_RESET 14 159 #define KVM_EXIT_S390_RESET 14
160 #define KVM_EXIT_DCR 15 160 #define KVM_EXIT_DCR 15
161 #define KVM_EXIT_NMI 16 161 #define KVM_EXIT_NMI 16
162 #define KVM_EXIT_INTERNAL_ERROR 17 162 #define KVM_EXIT_INTERNAL_ERROR 17
163 #define KVM_EXIT_OSI 18 163 #define KVM_EXIT_OSI 18
164 164
165 /* For KVM_EXIT_INTERNAL_ERROR */ 165 /* For KVM_EXIT_INTERNAL_ERROR */
166 #define KVM_INTERNAL_ERROR_EMULATION 1 166 #define KVM_INTERNAL_ERROR_EMULATION 1
167 #define KVM_INTERNAL_ERROR_SIMUL_EX 2 167 #define KVM_INTERNAL_ERROR_SIMUL_EX 2
168 168
169 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ 169 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
170 struct kvm_run { 170 struct kvm_run {
171 /* in */ 171 /* in */
172 __u8 request_interrupt_window; 172 __u8 request_interrupt_window;
173 __u8 padding1[7]; 173 __u8 padding1[7];
174 174
175 /* out */ 175 /* out */
176 __u32 exit_reason; 176 __u32 exit_reason;
177 __u8 ready_for_interrupt_injection; 177 __u8 ready_for_interrupt_injection;
178 __u8 if_flag; 178 __u8 if_flag;
179 __u8 padding2[2]; 179 __u8 padding2[2];
180 180
181 /* in (pre_kvm_run), out (post_kvm_run) */ 181 /* in (pre_kvm_run), out (post_kvm_run) */
182 __u64 cr8; 182 __u64 cr8;
183 __u64 apic_base; 183 __u64 apic_base;
184 184
185 #ifdef __KVM_S390 185 #ifdef __KVM_S390
186 /* the processor status word for s390 */ 186 /* the processor status word for s390 */
187 __u64 psw_mask; /* psw upper half */ 187 __u64 psw_mask; /* psw upper half */
188 __u64 psw_addr; /* psw lower half */ 188 __u64 psw_addr; /* psw lower half */
189 #endif 189 #endif
190 union { 190 union {
191 /* KVM_EXIT_UNKNOWN */ 191 /* KVM_EXIT_UNKNOWN */
192 struct { 192 struct {
193 __u64 hardware_exit_reason; 193 __u64 hardware_exit_reason;
194 } hw; 194 } hw;
195 /* KVM_EXIT_FAIL_ENTRY */ 195 /* KVM_EXIT_FAIL_ENTRY */
196 struct { 196 struct {
197 __u64 hardware_entry_failure_reason; 197 __u64 hardware_entry_failure_reason;
198 } fail_entry; 198 } fail_entry;
199 /* KVM_EXIT_EXCEPTION */ 199 /* KVM_EXIT_EXCEPTION */
200 struct { 200 struct {
201 __u32 exception; 201 __u32 exception;
202 __u32 error_code; 202 __u32 error_code;
203 } ex; 203 } ex;
204 /* KVM_EXIT_IO */ 204 /* KVM_EXIT_IO */
205 struct { 205 struct {
206 #define KVM_EXIT_IO_IN 0 206 #define KVM_EXIT_IO_IN 0
207 #define KVM_EXIT_IO_OUT 1 207 #define KVM_EXIT_IO_OUT 1
208 __u8 direction; 208 __u8 direction;
209 __u8 size; /* bytes */ 209 __u8 size; /* bytes */
210 __u16 port; 210 __u16 port;
211 __u32 count; 211 __u32 count;
212 __u64 data_offset; /* relative to kvm_run start */ 212 __u64 data_offset; /* relative to kvm_run start */
213 } io; 213 } io;
214 struct { 214 struct {
215 struct kvm_debug_exit_arch arch; 215 struct kvm_debug_exit_arch arch;
216 } debug; 216 } debug;
217 /* KVM_EXIT_MMIO */ 217 /* KVM_EXIT_MMIO */
218 struct { 218 struct {
219 __u64 phys_addr; 219 __u64 phys_addr;
220 __u8 data[8]; 220 __u8 data[8];
221 __u32 len; 221 __u32 len;
222 __u8 is_write; 222 __u8 is_write;
223 } mmio; 223 } mmio;
224 /* KVM_EXIT_HYPERCALL */ 224 /* KVM_EXIT_HYPERCALL */
225 struct { 225 struct {
226 __u64 nr; 226 __u64 nr;
227 __u64 args[6]; 227 __u64 args[6];
228 __u64 ret; 228 __u64 ret;
229 __u32 longmode; 229 __u32 longmode;
230 __u32 pad; 230 __u32 pad;
231 } hypercall; 231 } hypercall;
232 /* KVM_EXIT_TPR_ACCESS */ 232 /* KVM_EXIT_TPR_ACCESS */
233 struct { 233 struct {
234 __u64 rip; 234 __u64 rip;
235 __u32 is_write; 235 __u32 is_write;
236 __u32 pad; 236 __u32 pad;
237 } tpr_access; 237 } tpr_access;
238 /* KVM_EXIT_S390_SIEIC */ 238 /* KVM_EXIT_S390_SIEIC */
239 struct { 239 struct {
240 __u8 icptcode; 240 __u8 icptcode;
241 __u16 ipa; 241 __u16 ipa;
242 __u32 ipb; 242 __u32 ipb;
243 } s390_sieic; 243 } s390_sieic;
244 /* KVM_EXIT_S390_RESET */ 244 /* KVM_EXIT_S390_RESET */
245 #define KVM_S390_RESET_POR 1 245 #define KVM_S390_RESET_POR 1
246 #define KVM_S390_RESET_CLEAR 2 246 #define KVM_S390_RESET_CLEAR 2
247 #define KVM_S390_RESET_SUBSYSTEM 4 247 #define KVM_S390_RESET_SUBSYSTEM 4
248 #define KVM_S390_RESET_CPU_INIT 8 248 #define KVM_S390_RESET_CPU_INIT 8
249 #define KVM_S390_RESET_IPL 16 249 #define KVM_S390_RESET_IPL 16
250 __u64 s390_reset_flags; 250 __u64 s390_reset_flags;
251 /* KVM_EXIT_DCR */ 251 /* KVM_EXIT_DCR */
252 struct { 252 struct {
253 __u32 dcrn; 253 __u32 dcrn;
254 __u32 data; 254 __u32 data;
255 __u8 is_write; 255 __u8 is_write;
256 } dcr; 256 } dcr;
257 struct { 257 struct {
258 __u32 suberror; 258 __u32 suberror;
259 /* Available with KVM_CAP_INTERNAL_ERROR_DATA: */ 259 /* Available with KVM_CAP_INTERNAL_ERROR_DATA: */
260 __u32 ndata; 260 __u32 ndata;
261 __u64 data[16]; 261 __u64 data[16];
262 } internal; 262 } internal;
263 /* KVM_EXIT_OSI */ 263 /* KVM_EXIT_OSI */
264 struct { 264 struct {
265 __u64 gprs[32]; 265 __u64 gprs[32];
266 } osi; 266 } osi;
267 /* Fix the size of the union. */ 267 /* Fix the size of the union. */
268 char padding[256]; 268 char padding[256];
269 }; 269 };
270 }; 270 };
271 271
272 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */ 272 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
273 273
274 struct kvm_coalesced_mmio_zone { 274 struct kvm_coalesced_mmio_zone {
275 __u64 addr; 275 __u64 addr;
276 __u32 size; 276 __u32 size;
277 __u32 pad; 277 __u32 pad;
278 }; 278 };
279 279
280 struct kvm_coalesced_mmio { 280 struct kvm_coalesced_mmio {
281 __u64 phys_addr; 281 __u64 phys_addr;
282 __u32 len; 282 __u32 len;
283 __u32 pad; 283 __u32 pad;
284 __u8 data[8]; 284 __u8 data[8];
285 }; 285 };
286 286
287 struct kvm_coalesced_mmio_ring { 287 struct kvm_coalesced_mmio_ring {
288 __u32 first, last; 288 __u32 first, last;
289 struct kvm_coalesced_mmio coalesced_mmio[0]; 289 struct kvm_coalesced_mmio coalesced_mmio[0];
290 }; 290 };
291 291
292 #define KVM_COALESCED_MMIO_MAX \ 292 #define KVM_COALESCED_MMIO_MAX \
293 ((PAGE_SIZE - sizeof(struct kvm_coalesced_mmio_ring)) / \ 293 ((PAGE_SIZE - sizeof(struct kvm_coalesced_mmio_ring)) / \
294 sizeof(struct kvm_coalesced_mmio)) 294 sizeof(struct kvm_coalesced_mmio))
295 295
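The first/last fields above form a single-producer (kernel), single-consumer (userspace) ring: the kernel appends coalesced writes at last and userspace drains from first. A minimal consumer sketch, assuming the ring page has already been mmap()ed from the vcpu fd, that PAGE_SIZE is defined so KVM_COALESCED_MMIO_MAX expands in userspace, and that handle_mmio_write() is a hypothetical VMM callback rather than anything in this header:

#include <linux/kvm.h>

/* Hypothetical callback provided by the VMM; not part of this header. */
void handle_mmio_write(__u64 phys_addr, const void *data, __u32 len);

static void flush_coalesced_mmio(struct kvm_coalesced_mmio_ring *ring)
{
        while (ring->first != ring->last) {
                struct kvm_coalesced_mmio *e = &ring->coalesced_mmio[ring->first];

                handle_mmio_write(e->phys_addr, e->data, e->len);
                __sync_synchronize();   /* finish the entry before moving first */
                ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
        }
}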
296 /* for KVM_TRANSLATE */ 296 /* for KVM_TRANSLATE */
297 struct kvm_translation { 297 struct kvm_translation {
298 /* in */ 298 /* in */
299 __u64 linear_address; 299 __u64 linear_address;
300 300
301 /* out */ 301 /* out */
302 __u64 physical_address; 302 __u64 physical_address;
303 __u8 valid; 303 __u8 valid;
304 __u8 writeable; 304 __u8 writeable;
305 __u8 usermode; 305 __u8 usermode;
306 __u8 pad[5]; 306 __u8 pad[5];
307 }; 307 };
308 308
309 /* for KVM_INTERRUPT */ 309 /* for KVM_INTERRUPT */
310 struct kvm_interrupt { 310 struct kvm_interrupt {
311 /* in */ 311 /* in */
312 __u32 irq; 312 __u32 irq;
313 }; 313 };
314 314
315 /* for KVM_GET_DIRTY_LOG */ 315 /* for KVM_GET_DIRTY_LOG */
316 struct kvm_dirty_log { 316 struct kvm_dirty_log {
317 __u32 slot; 317 __u32 slot;
318 __u32 padding1; 318 __u32 padding1;
319 union { 319 union {
320 void __user *dirty_bitmap; /* one bit per page */ 320 void __user *dirty_bitmap; /* one bit per page */
321 __u64 padding2; 321 __u64 padding2;
322 }; 322 };
323 }; 323 };
324 324
325 /* for KVM_SET_SIGNAL_MASK */ 325 /* for KVM_SET_SIGNAL_MASK */
326 struct kvm_signal_mask { 326 struct kvm_signal_mask {
327 __u32 len; 327 __u32 len;
328 __u8 sigset[0]; 328 __u8 sigset[0];
329 }; 329 };
330 330
331 /* for KVM_TPR_ACCESS_REPORTING */ 331 /* for KVM_TPR_ACCESS_REPORTING */
332 struct kvm_tpr_access_ctl { 332 struct kvm_tpr_access_ctl {
333 __u32 enabled; 333 __u32 enabled;
334 __u32 flags; 334 __u32 flags;
335 __u32 reserved[8]; 335 __u32 reserved[8];
336 }; 336 };
337 337
338 /* for KVM_SET_VAPIC_ADDR */ 338 /* for KVM_SET_VAPIC_ADDR */
339 struct kvm_vapic_addr { 339 struct kvm_vapic_addr {
340 __u64 vapic_addr; 340 __u64 vapic_addr;
341 }; 341 };
342 342
343 /* for KVM_SET_MPSTATE */ 343 /* for KVM_SET_MPSTATE */
344 344
345 #define KVM_MP_STATE_RUNNABLE 0 345 #define KVM_MP_STATE_RUNNABLE 0
346 #define KVM_MP_STATE_UNINITIALIZED 1 346 #define KVM_MP_STATE_UNINITIALIZED 1
347 #define KVM_MP_STATE_INIT_RECEIVED 2 347 #define KVM_MP_STATE_INIT_RECEIVED 2
348 #define KVM_MP_STATE_HALTED 3 348 #define KVM_MP_STATE_HALTED 3
349 #define KVM_MP_STATE_SIPI_RECEIVED 4 349 #define KVM_MP_STATE_SIPI_RECEIVED 4
350 350
351 struct kvm_mp_state { 351 struct kvm_mp_state {
352 __u32 mp_state; 352 __u32 mp_state;
353 }; 353 };
354 354
355 struct kvm_s390_psw { 355 struct kvm_s390_psw {
356 __u64 mask; 356 __u64 mask;
357 __u64 addr; 357 __u64 addr;
358 }; 358 };
359 359
360 /* valid values for type in kvm_s390_interrupt */ 360 /* valid values for type in kvm_s390_interrupt */
361 #define KVM_S390_SIGP_STOP 0xfffe0000u 361 #define KVM_S390_SIGP_STOP 0xfffe0000u
362 #define KVM_S390_PROGRAM_INT 0xfffe0001u 362 #define KVM_S390_PROGRAM_INT 0xfffe0001u
363 #define KVM_S390_SIGP_SET_PREFIX 0xfffe0002u 363 #define KVM_S390_SIGP_SET_PREFIX 0xfffe0002u
364 #define KVM_S390_RESTART 0xfffe0003u 364 #define KVM_S390_RESTART 0xfffe0003u
365 #define KVM_S390_INT_VIRTIO 0xffff2603u 365 #define KVM_S390_INT_VIRTIO 0xffff2603u
366 #define KVM_S390_INT_SERVICE 0xffff2401u 366 #define KVM_S390_INT_SERVICE 0xffff2401u
367 #define KVM_S390_INT_EMERGENCY 0xffff1201u 367 #define KVM_S390_INT_EMERGENCY 0xffff1201u
368 368
369 struct kvm_s390_interrupt { 369 struct kvm_s390_interrupt {
370 __u32 type; 370 __u32 type;
371 __u32 parm; 371 __u32 parm;
372 __u64 parm64; 372 __u64 parm64;
373 }; 373 };
374 374
375 /* for KVM_SET_GUEST_DEBUG */ 375 /* for KVM_SET_GUEST_DEBUG */
376 376
377 #define KVM_GUESTDBG_ENABLE 0x00000001 377 #define KVM_GUESTDBG_ENABLE 0x00000001
378 #define KVM_GUESTDBG_SINGLESTEP 0x00000002 378 #define KVM_GUESTDBG_SINGLESTEP 0x00000002
379 379
380 struct kvm_guest_debug { 380 struct kvm_guest_debug {
381 __u32 control; 381 __u32 control;
382 __u32 pad; 382 __u32 pad;
383 struct kvm_guest_debug_arch arch; 383 struct kvm_guest_debug_arch arch;
384 }; 384 };
385 385
386 enum { 386 enum {
387 kvm_ioeventfd_flag_nr_datamatch, 387 kvm_ioeventfd_flag_nr_datamatch,
388 kvm_ioeventfd_flag_nr_pio, 388 kvm_ioeventfd_flag_nr_pio,
389 kvm_ioeventfd_flag_nr_deassign, 389 kvm_ioeventfd_flag_nr_deassign,
390 kvm_ioeventfd_flag_nr_max, 390 kvm_ioeventfd_flag_nr_max,
391 }; 391 };
392 392
393 #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) 393 #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
394 #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) 394 #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio)
395 #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) 395 #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
396 396
397 #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1) 397 #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1)
398 398
399 struct kvm_ioeventfd { 399 struct kvm_ioeventfd {
400 __u64 datamatch; 400 __u64 datamatch;
401 __u64 addr; /* legal pio/mmio address */ 401 __u64 addr; /* legal pio/mmio address */
402 __u32 len; /* 1, 2, 4, or 8 bytes */ 402 __u32 len; /* 1, 2, 4, or 8 bytes */
403 __s32 fd; 403 __s32 fd;
404 __u32 flags; 404 __u32 flags;
405 __u8 pad[36]; 405 __u8 pad[36];
406 }; 406 };
407 407
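struct kvm_ioeventfd is the argument of the KVM_IOEVENTFD VM ioctl defined further down: a guest write that hits addr (and, with KVM_IOEVENTFD_FLAG_DATAMATCH, matches datamatch) signals the eventfd instead of exiting to userspace. A rough sketch, where vm_fd and the address/value are caller-supplied assumptions, not values taken from this patch:

#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int register_doorbell(int vm_fd)
{
        struct kvm_ioeventfd ioev;
        int efd = eventfd(0, 0);

        if (efd < 0)
                return -1;
        memset(&ioev, 0, sizeof(ioev));
        ioev.addr      = 0xfe000000;    /* example guest-physical MMIO address */
        ioev.len       = 4;             /* 1, 2, 4 or 8 bytes */
        ioev.datamatch = 0x1;           /* only writes of this value fire */
        ioev.fd        = efd;
        ioev.flags     = KVM_IOEVENTFD_FLAG_DATAMATCH;
        if (ioctl(vm_fd, KVM_IOEVENTFD, &ioev) < 0)
                return -1;
        return efd;                     /* read()/poll() this fd for doorbells */
}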
408 /* for KVM_ENABLE_CAP */ 408 /* for KVM_ENABLE_CAP */
409 struct kvm_enable_cap { 409 struct kvm_enable_cap {
410 /* in */ 410 /* in */
411 __u32 cap; 411 __u32 cap;
412 __u32 flags; 412 __u32 flags;
413 __u64 args[4]; 413 __u64 args[4];
414 __u8 pad[64]; 414 __u8 pad[64];
415 }; 415 };
416 416
417 #define KVMIO 0xAE 417 #define KVMIO 0xAE
418 418
419 /* 419 /*
420 * ioctls for /dev/kvm fds: 420 * ioctls for /dev/kvm fds:
421 */ 421 */
422 #define KVM_GET_API_VERSION _IO(KVMIO, 0x00) 422 #define KVM_GET_API_VERSION _IO(KVMIO, 0x00)
423 #define KVM_CREATE_VM _IO(KVMIO, 0x01) /* returns a VM fd */ 423 #define KVM_CREATE_VM _IO(KVMIO, 0x01) /* returns a VM fd */
424 #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) 424 #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list)
425 425
426 #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) 426 #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06)
427 /* 427 /*
428 * Check if a kvm extension is available. Argument is extension number, 428 * Check if a kvm extension is available. Argument is extension number,
429 * return is 1 (yes) or 0 (no, sorry). 429 * return is 1 (yes) or 0 (no, sorry).
430 */ 430 */
431 #define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03) 431 #define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03)
432 /* 432 /*
433 * Get size for mmap(vcpu_fd) 433 * Get size for mmap(vcpu_fd)
434 */ 434 */
435 #define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */ 435 #define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */
436 #define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2) 436 #define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2)
437 #define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06 437 #define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06
438 #define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07 438 #define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07
439 #define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08 439 #define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
440 440
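Taken together, the ioctls above are the usual probe sequence before any VM is created: open /dev/kvm, verify KVM_GET_API_VERSION, then test individual capabilities from the list below with KVM_CHECK_EXTENSION. A minimal sketch with error handling trimmed (KVM_API_VERSION, defined near the top of this header, is the stable value 12):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int open_kvm(void)
{
        int kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);

        if (kvm_fd < 0)
                return -1;
        if (ioctl(kvm_fd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION)
                return -1;
        if (!ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY))
                return -1;              /* 1 = supported, 0 = not */
        return kvm_fd;
}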
441 /* 441 /*
442 * Extension capability list. 442 * Extension capability list.
443 */ 443 */
444 #define KVM_CAP_IRQCHIP 0 444 #define KVM_CAP_IRQCHIP 0
445 #define KVM_CAP_HLT 1 445 #define KVM_CAP_HLT 1
446 #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2 446 #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
447 #define KVM_CAP_USER_MEMORY 3 447 #define KVM_CAP_USER_MEMORY 3
448 #define KVM_CAP_SET_TSS_ADDR 4 448 #define KVM_CAP_SET_TSS_ADDR 4
449 #define KVM_CAP_VAPIC 6 449 #define KVM_CAP_VAPIC 6
450 #define KVM_CAP_EXT_CPUID 7 450 #define KVM_CAP_EXT_CPUID 7
451 #define KVM_CAP_CLOCKSOURCE 8 451 #define KVM_CAP_CLOCKSOURCE 8
452 #define KVM_CAP_NR_VCPUS 9 /* returns max vcpus per vm */ 452 #define KVM_CAP_NR_VCPUS 9 /* returns max vcpus per vm */
453 #define KVM_CAP_NR_MEMSLOTS 10 /* returns max memory slots per vm */ 453 #define KVM_CAP_NR_MEMSLOTS 10 /* returns max memory slots per vm */
454 #define KVM_CAP_PIT 11 454 #define KVM_CAP_PIT 11
455 #define KVM_CAP_NOP_IO_DELAY 12 455 #define KVM_CAP_NOP_IO_DELAY 12
456 #define KVM_CAP_PV_MMU 13 456 #define KVM_CAP_PV_MMU 13
457 #define KVM_CAP_MP_STATE 14 457 #define KVM_CAP_MP_STATE 14
458 #define KVM_CAP_COALESCED_MMIO 15 458 #define KVM_CAP_COALESCED_MMIO 15
459 #define KVM_CAP_SYNC_MMU 16 /* Changes to host mmap are reflected in guest */ 459 #define KVM_CAP_SYNC_MMU 16 /* Changes to host mmap are reflected in guest */
460 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT 460 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
461 #define KVM_CAP_DEVICE_ASSIGNMENT 17 461 #define KVM_CAP_DEVICE_ASSIGNMENT 17
462 #endif 462 #endif
463 #define KVM_CAP_IOMMU 18 463 #define KVM_CAP_IOMMU 18
464 #ifdef __KVM_HAVE_MSI 464 #ifdef __KVM_HAVE_MSI
465 #define KVM_CAP_DEVICE_MSI 20 465 #define KVM_CAP_DEVICE_MSI 20
466 #endif 466 #endif
467 /* Bug in KVM_SET_USER_MEMORY_REGION fixed: */ 467 /* Bug in KVM_SET_USER_MEMORY_REGION fixed: */
468 #define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21 468 #define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21
469 #ifdef __KVM_HAVE_USER_NMI 469 #ifdef __KVM_HAVE_USER_NMI
470 #define KVM_CAP_USER_NMI 22 470 #define KVM_CAP_USER_NMI 22
471 #endif 471 #endif
472 #ifdef __KVM_HAVE_GUEST_DEBUG 472 #ifdef __KVM_HAVE_GUEST_DEBUG
473 #define KVM_CAP_SET_GUEST_DEBUG 23 473 #define KVM_CAP_SET_GUEST_DEBUG 23
474 #endif 474 #endif
475 #ifdef __KVM_HAVE_PIT 475 #ifdef __KVM_HAVE_PIT
476 #define KVM_CAP_REINJECT_CONTROL 24 476 #define KVM_CAP_REINJECT_CONTROL 24
477 #endif 477 #endif
478 #ifdef __KVM_HAVE_IOAPIC 478 #ifdef __KVM_HAVE_IOAPIC
479 #define KVM_CAP_IRQ_ROUTING 25 479 #define KVM_CAP_IRQ_ROUTING 25
480 #endif 480 #endif
481 #define KVM_CAP_IRQ_INJECT_STATUS 26 481 #define KVM_CAP_IRQ_INJECT_STATUS 26
482 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT 482 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
483 #define KVM_CAP_DEVICE_DEASSIGNMENT 27 483 #define KVM_CAP_DEVICE_DEASSIGNMENT 27
484 #endif 484 #endif
485 #ifdef __KVM_HAVE_MSIX 485 #ifdef __KVM_HAVE_MSIX
486 #define KVM_CAP_DEVICE_MSIX 28 486 #define KVM_CAP_DEVICE_MSIX 28
487 #endif 487 #endif
488 #define KVM_CAP_ASSIGN_DEV_IRQ 29 488 #define KVM_CAP_ASSIGN_DEV_IRQ 29
489 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ 489 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
490 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 490 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
491 #ifdef __KVM_HAVE_MCE 491 #ifdef __KVM_HAVE_MCE
492 #define KVM_CAP_MCE 31 492 #define KVM_CAP_MCE 31
493 #endif 493 #endif
494 #define KVM_CAP_IRQFD 32 494 #define KVM_CAP_IRQFD 32
495 #ifdef __KVM_HAVE_PIT 495 #ifdef __KVM_HAVE_PIT
496 #define KVM_CAP_PIT2 33 496 #define KVM_CAP_PIT2 33
497 #endif 497 #endif
498 #define KVM_CAP_SET_BOOT_CPU_ID 34 498 #define KVM_CAP_SET_BOOT_CPU_ID 34
499 #ifdef __KVM_HAVE_PIT_STATE2 499 #ifdef __KVM_HAVE_PIT_STATE2
500 #define KVM_CAP_PIT_STATE2 35 500 #define KVM_CAP_PIT_STATE2 35
501 #endif 501 #endif
502 #define KVM_CAP_IOEVENTFD 36 502 #define KVM_CAP_IOEVENTFD 36
503 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37 503 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
504 #ifdef __KVM_HAVE_XEN_HVM 504 #ifdef __KVM_HAVE_XEN_HVM
505 #define KVM_CAP_XEN_HVM 38 505 #define KVM_CAP_XEN_HVM 38
506 #endif 506 #endif
507 #define KVM_CAP_ADJUST_CLOCK 39 507 #define KVM_CAP_ADJUST_CLOCK 39
508 #define KVM_CAP_INTERNAL_ERROR_DATA 40 508 #define KVM_CAP_INTERNAL_ERROR_DATA 40
509 #ifdef __KVM_HAVE_VCPU_EVENTS 509 #ifdef __KVM_HAVE_VCPU_EVENTS
510 #define KVM_CAP_VCPU_EVENTS 41 510 #define KVM_CAP_VCPU_EVENTS 41
511 #endif 511 #endif
512 #define KVM_CAP_S390_PSW 42 512 #define KVM_CAP_S390_PSW 42
513 #define KVM_CAP_PPC_SEGSTATE 43 513 #define KVM_CAP_PPC_SEGSTATE 43
514 #define KVM_CAP_HYPERV 44 514 #define KVM_CAP_HYPERV 44
515 #define KVM_CAP_HYPERV_VAPIC 45 515 #define KVM_CAP_HYPERV_VAPIC 45
516 #define KVM_CAP_HYPERV_SPIN 46 516 #define KVM_CAP_HYPERV_SPIN 46
517 #define KVM_CAP_PCI_SEGMENT 47 517 #define KVM_CAP_PCI_SEGMENT 47
518 #define KVM_CAP_PPC_PAIRED_SINGLES 48 518 #define KVM_CAP_PPC_PAIRED_SINGLES 48
519 #define KVM_CAP_INTR_SHADOW 49 519 #define KVM_CAP_INTR_SHADOW 49
520 #ifdef __KVM_HAVE_DEBUGREGS 520 #ifdef __KVM_HAVE_DEBUGREGS
521 #define KVM_CAP_DEBUGREGS 50 521 #define KVM_CAP_DEBUGREGS 50
522 #endif 522 #endif
523 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51 523 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51
524 #define KVM_CAP_PPC_OSI 52 524 #define KVM_CAP_PPC_OSI 52
525 #define KVM_CAP_PPC_UNSET_IRQ 53 525 #define KVM_CAP_PPC_UNSET_IRQ 53
526 #define KVM_CAP_ENABLE_CAP 54 526 #define KVM_CAP_ENABLE_CAP 54
527 #ifdef __KVM_HAVE_XSAVE 527 #ifdef __KVM_HAVE_XSAVE
528 #define KVM_CAP_XSAVE 55 528 #define KVM_CAP_XSAVE 55
529 #endif 529 #endif
530 #ifdef __KVM_HAVE_XCRS 530 #ifdef __KVM_HAVE_XCRS
531 #define KVM_CAP_XCRS 56 531 #define KVM_CAP_XCRS 56
532 #endif 532 #endif
533 533
534 #ifdef KVM_CAP_IRQ_ROUTING 534 #ifdef KVM_CAP_IRQ_ROUTING
535 535
536 struct kvm_irq_routing_irqchip { 536 struct kvm_irq_routing_irqchip {
537 __u32 irqchip; 537 __u32 irqchip;
538 __u32 pin; 538 __u32 pin;
539 }; 539 };
540 540
541 struct kvm_irq_routing_msi { 541 struct kvm_irq_routing_msi {
542 __u32 address_lo; 542 __u32 address_lo;
543 __u32 address_hi; 543 __u32 address_hi;
544 __u32 data; 544 __u32 data;
545 __u32 pad; 545 __u32 pad;
546 }; 546 };
547 547
548 /* gsi routing entry types */ 548 /* gsi routing entry types */
549 #define KVM_IRQ_ROUTING_IRQCHIP 1 549 #define KVM_IRQ_ROUTING_IRQCHIP 1
550 #define KVM_IRQ_ROUTING_MSI 2 550 #define KVM_IRQ_ROUTING_MSI 2
551 551
552 struct kvm_irq_routing_entry { 552 struct kvm_irq_routing_entry {
553 __u32 gsi; 553 __u32 gsi;
554 __u32 type; 554 __u32 type;
555 __u32 flags; 555 __u32 flags;
556 __u32 pad; 556 __u32 pad;
557 union { 557 union {
558 struct kvm_irq_routing_irqchip irqchip; 558 struct kvm_irq_routing_irqchip irqchip;
559 struct kvm_irq_routing_msi msi; 559 struct kvm_irq_routing_msi msi;
560 __u32 pad[8]; 560 __u32 pad[8];
561 } u; 561 } u;
562 }; 562 };
563 563
564 struct kvm_irq_routing { 564 struct kvm_irq_routing {
565 __u32 nr; 565 __u32 nr;
566 __u32 flags; 566 __u32 flags;
567 struct kvm_irq_routing_entry entries[0]; 567 struct kvm_irq_routing_entry entries[0];
568 }; 568 };
569 569
570 #endif 570 #endif
571 571
572 #ifdef KVM_CAP_MCE 572 #ifdef KVM_CAP_MCE
573 /* x86 MCE */ 573 /* x86 MCE */
574 struct kvm_x86_mce { 574 struct kvm_x86_mce {
575 __u64 status; 575 __u64 status;
576 __u64 addr; 576 __u64 addr;
577 __u64 misc; 577 __u64 misc;
578 __u64 mcg_status; 578 __u64 mcg_status;
579 __u8 bank; 579 __u8 bank;
580 __u8 pad1[7]; 580 __u8 pad1[7];
581 __u64 pad2[3]; 581 __u64 pad2[3];
582 }; 582 };
583 #endif 583 #endif
584 584
585 #ifdef KVM_CAP_XEN_HVM 585 #ifdef KVM_CAP_XEN_HVM
586 struct kvm_xen_hvm_config { 586 struct kvm_xen_hvm_config {
587 __u32 flags; 587 __u32 flags;
588 __u32 msr; 588 __u32 msr;
589 __u64 blob_addr_32; 589 __u64 blob_addr_32;
590 __u64 blob_addr_64; 590 __u64 blob_addr_64;
591 __u8 blob_size_32; 591 __u8 blob_size_32;
592 __u8 blob_size_64; 592 __u8 blob_size_64;
593 __u8 pad2[30]; 593 __u8 pad2[30];
594 }; 594 };
595 #endif 595 #endif
596 596
597 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) 597 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
598 598
599 struct kvm_irqfd { 599 struct kvm_irqfd {
600 __u32 fd; 600 __u32 fd;
601 __u32 gsi; 601 __u32 gsi;
602 __u32 flags; 602 __u32 flags;
603 __u8 pad[20]; 603 __u8 pad[20];
604 }; 604 };
605 605
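struct kvm_irqfd pairs an eventfd with a guest GSI via the KVM_IRQFD VM ioctl defined below (available with KVM_CAP_IRQFD): writing to the eventfd injects the interrupt without a separate KVM_IRQ_LINE call, and KVM_IRQFD_FLAG_DEASSIGN undoes the binding. A hedged sketch, with vm_fd and gsi as caller-supplied assumptions:

#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int wire_irqfd(int vm_fd, __u32 gsi)
{
        struct kvm_irqfd irqfd;
        int efd = eventfd(0, 0);

        if (efd < 0)
                return -1;
        memset(&irqfd, 0, sizeof(irqfd));
        irqfd.fd  = efd;
        irqfd.gsi = gsi;        /* flags == 0 assigns; see KVM_IRQFD_FLAG_DEASSIGN */
        if (ioctl(vm_fd, KVM_IRQFD, &irqfd) < 0)
                return -1;
        return efd;             /* a write of 1 to efd now raises the GSI */
}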
606 struct kvm_clock_data { 606 struct kvm_clock_data {
607 __u64 clock; 607 __u64 clock;
608 __u32 flags; 608 __u32 flags;
609 __u32 pad[9]; 609 __u32 pad[9];
610 }; 610 };
611 611
612 /* 612 /*
613 * ioctls for VM fds 613 * ioctls for VM fds
614 */ 614 */
615 #define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region) 615 #define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
616 /* 616 /*
617 * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns 617 * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
618 * a vcpu fd. 618 * a vcpu fd.
619 */ 619 */
620 #define KVM_CREATE_VCPU _IO(KVMIO, 0x41) 620 #define KVM_CREATE_VCPU _IO(KVMIO, 0x41)
621 #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log) 621 #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
622 /* KVM_SET_MEMORY_ALIAS is obsolete: */
622 #define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias) 623 #define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias)
623 #define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44) 624 #define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44)
624 #define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45) 625 #define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45)
625 #define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \ 626 #define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \
626 struct kvm_userspace_memory_region) 627 struct kvm_userspace_memory_region)
627 #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) 628 #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47)
628 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) 629 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
629 /* Device model IOC */ 630 /* Device model IOC */
630 #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60) 631 #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60)
631 #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level) 632 #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level)
632 #define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip) 633 #define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip)
633 #define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip) 634 #define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip)
634 #define KVM_CREATE_PIT _IO(KVMIO, 0x64) 635 #define KVM_CREATE_PIT _IO(KVMIO, 0x64)
635 #define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state) 636 #define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state)
636 #define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state) 637 #define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state)
637 #define KVM_IRQ_LINE_STATUS _IOWR(KVMIO, 0x67, struct kvm_irq_level) 638 #define KVM_IRQ_LINE_STATUS _IOWR(KVMIO, 0x67, struct kvm_irq_level)
638 #define KVM_REGISTER_COALESCED_MMIO \ 639 #define KVM_REGISTER_COALESCED_MMIO \
639 _IOW(KVMIO, 0x67, struct kvm_coalesced_mmio_zone) 640 _IOW(KVMIO, 0x67, struct kvm_coalesced_mmio_zone)
640 #define KVM_UNREGISTER_COALESCED_MMIO \ 641 #define KVM_UNREGISTER_COALESCED_MMIO \
641 _IOW(KVMIO, 0x68, struct kvm_coalesced_mmio_zone) 642 _IOW(KVMIO, 0x68, struct kvm_coalesced_mmio_zone)
642 #define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \ 643 #define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \
643 struct kvm_assigned_pci_dev) 644 struct kvm_assigned_pci_dev)
644 #define KVM_SET_GSI_ROUTING _IOW(KVMIO, 0x6a, struct kvm_irq_routing) 645 #define KVM_SET_GSI_ROUTING _IOW(KVMIO, 0x6a, struct kvm_irq_routing)
645 /* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */ 646 /* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */
646 #define KVM_ASSIGN_IRQ __KVM_DEPRECATED_VM_R_0x70 647 #define KVM_ASSIGN_IRQ __KVM_DEPRECATED_VM_R_0x70
647 #define KVM_ASSIGN_DEV_IRQ _IOW(KVMIO, 0x70, struct kvm_assigned_irq) 648 #define KVM_ASSIGN_DEV_IRQ _IOW(KVMIO, 0x70, struct kvm_assigned_irq)
648 #define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71) 649 #define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71)
649 #define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \ 650 #define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \
650 struct kvm_assigned_pci_dev) 651 struct kvm_assigned_pci_dev)
651 #define KVM_ASSIGN_SET_MSIX_NR _IOW(KVMIO, 0x73, \ 652 #define KVM_ASSIGN_SET_MSIX_NR _IOW(KVMIO, 0x73, \
652 struct kvm_assigned_msix_nr) 653 struct kvm_assigned_msix_nr)
653 #define KVM_ASSIGN_SET_MSIX_ENTRY _IOW(KVMIO, 0x74, \ 654 #define KVM_ASSIGN_SET_MSIX_ENTRY _IOW(KVMIO, 0x74, \
654 struct kvm_assigned_msix_entry) 655 struct kvm_assigned_msix_entry)
655 #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) 656 #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
656 #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) 657 #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd)
657 #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) 658 #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
658 #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78) 659 #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
659 #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd) 660 #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
660 #define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config) 661 #define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
661 #define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data) 662 #define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
662 #define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data) 663 #define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
663 /* Available with KVM_CAP_PIT_STATE2 */ 664 /* Available with KVM_CAP_PIT_STATE2 */
664 #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2) 665 #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2)
665 #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2) 666 #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
666 667
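With this patch KVM_SET_MEMORY_ALIAS above is kept only as an obsolete ioctl number; one way to get the old aliasing behaviour is to point two KVM_SET_USER_MEMORY_REGION slots at the same host buffer, so the memory shows up at more than one guest-physical address. A sketch under that assumption (vm_fd, the buffer and both guest addresses are illustrative, not values from this patch; the two guest ranges themselves must not overlap, only the host backing is shared):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int map_buffer_twice(int vm_fd, void *mem, __u64 size)
{
        struct kvm_userspace_memory_region region;

        memset(&region, 0, sizeof(region));
        region.slot            = 0;
        region.guest_phys_addr = 0x100000;          /* first guest mapping */
        region.memory_size     = size;
        region.userspace_addr  = (__u64)(unsigned long)mem;
        if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
                return -1;

        region.slot            = 1;                 /* same host backing ... */
        region.guest_phys_addr = 0xc0000000;        /* ... visible again here */
        if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
                return -1;
        return 0;
}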
667 /* 668 /*
668 * ioctls for vcpu fds 669 * ioctls for vcpu fds
669 */ 670 */
670 #define KVM_RUN _IO(KVMIO, 0x80) 671 #define KVM_RUN _IO(KVMIO, 0x80)
671 #define KVM_GET_REGS _IOR(KVMIO, 0x81, struct kvm_regs) 672 #define KVM_GET_REGS _IOR(KVMIO, 0x81, struct kvm_regs)
672 #define KVM_SET_REGS _IOW(KVMIO, 0x82, struct kvm_regs) 673 #define KVM_SET_REGS _IOW(KVMIO, 0x82, struct kvm_regs)
673 #define KVM_GET_SREGS _IOR(KVMIO, 0x83, struct kvm_sregs) 674 #define KVM_GET_SREGS _IOR(KVMIO, 0x83, struct kvm_sregs)
674 #define KVM_SET_SREGS _IOW(KVMIO, 0x84, struct kvm_sregs) 675 #define KVM_SET_SREGS _IOW(KVMIO, 0x84, struct kvm_sregs)
675 #define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation) 676 #define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation)
676 #define KVM_INTERRUPT _IOW(KVMIO, 0x86, struct kvm_interrupt) 677 #define KVM_INTERRUPT _IOW(KVMIO, 0x86, struct kvm_interrupt)
677 /* KVM_DEBUG_GUEST is no longer supported, use KVM_SET_GUEST_DEBUG instead */ 678 /* KVM_DEBUG_GUEST is no longer supported, use KVM_SET_GUEST_DEBUG instead */
678 #define KVM_DEBUG_GUEST __KVM_DEPRECATED_VCPU_W_0x87 679 #define KVM_DEBUG_GUEST __KVM_DEPRECATED_VCPU_W_0x87
679 #define KVM_GET_MSRS _IOWR(KVMIO, 0x88, struct kvm_msrs) 680 #define KVM_GET_MSRS _IOWR(KVMIO, 0x88, struct kvm_msrs)
680 #define KVM_SET_MSRS _IOW(KVMIO, 0x89, struct kvm_msrs) 681 #define KVM_SET_MSRS _IOW(KVMIO, 0x89, struct kvm_msrs)
681 #define KVM_SET_CPUID _IOW(KVMIO, 0x8a, struct kvm_cpuid) 682 #define KVM_SET_CPUID _IOW(KVMIO, 0x8a, struct kvm_cpuid)
682 #define KVM_SET_SIGNAL_MASK _IOW(KVMIO, 0x8b, struct kvm_signal_mask) 683 #define KVM_SET_SIGNAL_MASK _IOW(KVMIO, 0x8b, struct kvm_signal_mask)
683 #define KVM_GET_FPU _IOR(KVMIO, 0x8c, struct kvm_fpu) 684 #define KVM_GET_FPU _IOR(KVMIO, 0x8c, struct kvm_fpu)
684 #define KVM_SET_FPU _IOW(KVMIO, 0x8d, struct kvm_fpu) 685 #define KVM_SET_FPU _IOW(KVMIO, 0x8d, struct kvm_fpu)
685 #define KVM_GET_LAPIC _IOR(KVMIO, 0x8e, struct kvm_lapic_state) 686 #define KVM_GET_LAPIC _IOR(KVMIO, 0x8e, struct kvm_lapic_state)
686 #define KVM_SET_LAPIC _IOW(KVMIO, 0x8f, struct kvm_lapic_state) 687 #define KVM_SET_LAPIC _IOW(KVMIO, 0x8f, struct kvm_lapic_state)
687 #define KVM_SET_CPUID2 _IOW(KVMIO, 0x90, struct kvm_cpuid2) 688 #define KVM_SET_CPUID2 _IOW(KVMIO, 0x90, struct kvm_cpuid2)
688 #define KVM_GET_CPUID2 _IOWR(KVMIO, 0x91, struct kvm_cpuid2) 689 #define KVM_GET_CPUID2 _IOWR(KVMIO, 0x91, struct kvm_cpuid2)
689 /* Available with KVM_CAP_VAPIC */ 690 /* Available with KVM_CAP_VAPIC */
690 #define KVM_TPR_ACCESS_REPORTING _IOWR(KVMIO, 0x92, struct kvm_tpr_access_ctl) 691 #define KVM_TPR_ACCESS_REPORTING _IOWR(KVMIO, 0x92, struct kvm_tpr_access_ctl)
691 /* Available with KVM_CAP_VAPIC */ 692 /* Available with KVM_CAP_VAPIC */
692 #define KVM_SET_VAPIC_ADDR _IOW(KVMIO, 0x93, struct kvm_vapic_addr) 693 #define KVM_SET_VAPIC_ADDR _IOW(KVMIO, 0x93, struct kvm_vapic_addr)
693 /* valid for virtual machine (for floating interrupt) _and_ vcpu */ 694 /* valid for virtual machine (for floating interrupt) _and_ vcpu */

694 #define KVM_S390_INTERRUPT _IOW(KVMIO, 0x94, struct kvm_s390_interrupt) 695 #define KVM_S390_INTERRUPT _IOW(KVMIO, 0x94, struct kvm_s390_interrupt)
695 /* store status for s390 */ 696 /* store status for s390 */
696 #define KVM_S390_STORE_STATUS_NOADDR (-1ul) 697 #define KVM_S390_STORE_STATUS_NOADDR (-1ul)
697 #define KVM_S390_STORE_STATUS_PREFIXED (-2ul) 698 #define KVM_S390_STORE_STATUS_PREFIXED (-2ul)
698 #define KVM_S390_STORE_STATUS _IOW(KVMIO, 0x95, unsigned long) 699 #define KVM_S390_STORE_STATUS _IOW(KVMIO, 0x95, unsigned long)
699 /* initial ipl psw for s390 */ 700 /* initial ipl psw for s390 */
700 #define KVM_S390_SET_INITIAL_PSW _IOW(KVMIO, 0x96, struct kvm_s390_psw) 701 #define KVM_S390_SET_INITIAL_PSW _IOW(KVMIO, 0x96, struct kvm_s390_psw)
701 /* initial reset for s390 */ 702 /* initial reset for s390 */
702 #define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97) 703 #define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97)
703 #define KVM_GET_MP_STATE _IOR(KVMIO, 0x98, struct kvm_mp_state) 704 #define KVM_GET_MP_STATE _IOR(KVMIO, 0x98, struct kvm_mp_state)
704 #define KVM_SET_MP_STATE _IOW(KVMIO, 0x99, struct kvm_mp_state) 705 #define KVM_SET_MP_STATE _IOW(KVMIO, 0x99, struct kvm_mp_state)
705 /* Available with KVM_CAP_NMI */ 706 /* Available with KVM_CAP_NMI */
706 #define KVM_NMI _IO(KVMIO, 0x9a) 707 #define KVM_NMI _IO(KVMIO, 0x9a)
707 /* Available with KVM_CAP_SET_GUEST_DEBUG */ 708 /* Available with KVM_CAP_SET_GUEST_DEBUG */
708 #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) 709 #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug)
709 /* MCE for x86 */ 710 /* MCE for x86 */
710 #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64) 711 #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64)
711 #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64) 712 #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64)
712 #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce) 713 #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce)
713 /* IA64 stack access */ 714 /* IA64 stack access */
714 #define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) 715 #define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *)
715 #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) 716 #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *)
716 /* Available with KVM_CAP_VCPU_EVENTS */ 717 /* Available with KVM_CAP_VCPU_EVENTS */
717 #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events) 718 #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events)
718 #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events) 719 #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events)
719 /* Available with KVM_CAP_DEBUGREGS */ 720 /* Available with KVM_CAP_DEBUGREGS */
720 #define KVM_GET_DEBUGREGS _IOR(KVMIO, 0xa1, struct kvm_debugregs) 721 #define KVM_GET_DEBUGREGS _IOR(KVMIO, 0xa1, struct kvm_debugregs)
721 #define KVM_SET_DEBUGREGS _IOW(KVMIO, 0xa2, struct kvm_debugregs) 722 #define KVM_SET_DEBUGREGS _IOW(KVMIO, 0xa2, struct kvm_debugregs)
722 #define KVM_ENABLE_CAP _IOW(KVMIO, 0xa3, struct kvm_enable_cap) 723 #define KVM_ENABLE_CAP _IOW(KVMIO, 0xa3, struct kvm_enable_cap)
723 /* Available with KVM_CAP_XSAVE */ 724 /* Available with KVM_CAP_XSAVE */
724 #define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave) 725 #define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave)
725 #define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave) 726 #define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave)
726 /* Available with KVM_CAP_XCRS */ 727 /* Available with KVM_CAP_XCRS */
727 #define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs) 728 #define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs)
728 #define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs) 729 #define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs)
729 730
730 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) 731 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
731 732
732 struct kvm_assigned_pci_dev { 733 struct kvm_assigned_pci_dev {
733 __u32 assigned_dev_id; 734 __u32 assigned_dev_id;
734 __u32 busnr; 735 __u32 busnr;
735 __u32 devfn; 736 __u32 devfn;
736 __u32 flags; 737 __u32 flags;
737 __u32 segnr; 738 __u32 segnr;
738 union { 739 union {
739 __u32 reserved[11]; 740 __u32 reserved[11];
740 }; 741 };
741 }; 742 };
742 743
743 #define KVM_DEV_IRQ_HOST_INTX (1 << 0) 744 #define KVM_DEV_IRQ_HOST_INTX (1 << 0)
744 #define KVM_DEV_IRQ_HOST_MSI (1 << 1) 745 #define KVM_DEV_IRQ_HOST_MSI (1 << 1)
745 #define KVM_DEV_IRQ_HOST_MSIX (1 << 2) 746 #define KVM_DEV_IRQ_HOST_MSIX (1 << 2)
746 747
747 #define KVM_DEV_IRQ_GUEST_INTX (1 << 8) 748 #define KVM_DEV_IRQ_GUEST_INTX (1 << 8)
748 #define KVM_DEV_IRQ_GUEST_MSI (1 << 9) 749 #define KVM_DEV_IRQ_GUEST_MSI (1 << 9)
749 #define KVM_DEV_IRQ_GUEST_MSIX (1 << 10) 750 #define KVM_DEV_IRQ_GUEST_MSIX (1 << 10)
750 751
751 #define KVM_DEV_IRQ_HOST_MASK 0x00ff 752 #define KVM_DEV_IRQ_HOST_MASK 0x00ff
752 #define KVM_DEV_IRQ_GUEST_MASK 0xff00 753 #define KVM_DEV_IRQ_GUEST_MASK 0xff00
753 754
754 struct kvm_assigned_irq { 755 struct kvm_assigned_irq {
755 __u32 assigned_dev_id; 756 __u32 assigned_dev_id;
756 __u32 host_irq; 757 __u32 host_irq;
757 __u32 guest_irq; 758 __u32 guest_irq;
758 __u32 flags; 759 __u32 flags;
759 union { 760 union {
760 struct { 761 struct {
761 __u32 addr_lo; 762 __u32 addr_lo;
762 __u32 addr_hi; 763 __u32 addr_hi;
763 __u32 data; 764 __u32 data;
764 } guest_msi; 765 } guest_msi;
765 __u32 reserved[12]; 766 __u32 reserved[12];
766 }; 767 };
767 }; 768 };
768 769
769 770
770 struct kvm_assigned_msix_nr { 771 struct kvm_assigned_msix_nr {
771 __u32 assigned_dev_id; 772 __u32 assigned_dev_id;
772 __u16 entry_nr; 773 __u16 entry_nr;
773 __u16 padding; 774 __u16 padding;
774 }; 775 };
775 776
776 #define KVM_MAX_MSIX_PER_DEV 256 777 #define KVM_MAX_MSIX_PER_DEV 256
777 struct kvm_assigned_msix_entry { 778 struct kvm_assigned_msix_entry {
778 __u32 assigned_dev_id; 779 __u32 assigned_dev_id;
779 __u32 gsi; 780 __u32 gsi;
780 __u16 entry; /* The index of entry in the MSI-X table */ 781 __u16 entry; /* The index of entry in the MSI-X table */
781 __u16 padding[3]; 782 __u16 padding[3];
782 }; 783 };
783 784
784 #endif /* __LINUX_KVM_H */ 785 #endif /* __LINUX_KVM_H */
785 786
include/linux/kvm_host.h
1 #ifndef __KVM_HOST_H 1 #ifndef __KVM_HOST_H
2 #define __KVM_HOST_H 2 #define __KVM_HOST_H
3 3
4 /* 4 /*
5 * This work is licensed under the terms of the GNU GPL, version 2. See 5 * This work is licensed under the terms of the GNU GPL, version 2. See
6 * the COPYING file in the top-level directory. 6 * the COPYING file in the top-level directory.
7 */ 7 */
8 8
9 #include <linux/types.h> 9 #include <linux/types.h>
10 #include <linux/hardirq.h> 10 #include <linux/hardirq.h>
11 #include <linux/list.h> 11 #include <linux/list.h>
12 #include <linux/mutex.h> 12 #include <linux/mutex.h>
13 #include <linux/spinlock.h> 13 #include <linux/spinlock.h>
14 #include <linux/signal.h> 14 #include <linux/signal.h>
15 #include <linux/sched.h> 15 #include <linux/sched.h>
16 #include <linux/mm.h> 16 #include <linux/mm.h>
17 #include <linux/preempt.h> 17 #include <linux/preempt.h>
18 #include <linux/msi.h> 18 #include <linux/msi.h>
19 #include <asm/signal.h> 19 #include <asm/signal.h>
20 20
21 #include <linux/kvm.h> 21 #include <linux/kvm.h>
22 #include <linux/kvm_para.h> 22 #include <linux/kvm_para.h>
23 23
24 #include <linux/kvm_types.h> 24 #include <linux/kvm_types.h>
25 25
26 #include <asm/kvm_host.h> 26 #include <asm/kvm_host.h>
27 27
28 /* 28 /*
29 * vcpu->requests bit members 29 * vcpu->requests bit members
30 */ 30 */
31 #define KVM_REQ_TLB_FLUSH 0 31 #define KVM_REQ_TLB_FLUSH 0
32 #define KVM_REQ_MIGRATE_TIMER 1 32 #define KVM_REQ_MIGRATE_TIMER 1
33 #define KVM_REQ_REPORT_TPR_ACCESS 2 33 #define KVM_REQ_REPORT_TPR_ACCESS 2
34 #define KVM_REQ_MMU_RELOAD 3 34 #define KVM_REQ_MMU_RELOAD 3
35 #define KVM_REQ_TRIPLE_FAULT 4 35 #define KVM_REQ_TRIPLE_FAULT 4
36 #define KVM_REQ_PENDING_TIMER 5 36 #define KVM_REQ_PENDING_TIMER 5
37 #define KVM_REQ_UNHALT 6 37 #define KVM_REQ_UNHALT 6
38 #define KVM_REQ_MMU_SYNC 7 38 #define KVM_REQ_MMU_SYNC 7
39 #define KVM_REQ_KVMCLOCK_UPDATE 8 39 #define KVM_REQ_KVMCLOCK_UPDATE 8
40 #define KVM_REQ_KICK 9 40 #define KVM_REQ_KICK 9
41 #define KVM_REQ_DEACTIVATE_FPU 10 41 #define KVM_REQ_DEACTIVATE_FPU 10
42 42
43 #define KVM_USERSPACE_IRQ_SOURCE_ID 0 43 #define KVM_USERSPACE_IRQ_SOURCE_ID 0
44 44
45 struct kvm; 45 struct kvm;
46 struct kvm_vcpu; 46 struct kvm_vcpu;
47 extern struct kmem_cache *kvm_vcpu_cache; 47 extern struct kmem_cache *kvm_vcpu_cache;
48 48
49 /* 49 /*
50 * It would be nice to use something smarter than a linear search, TBD... 50 * It would be nice to use something smarter than a linear search, TBD...
51 * Thankfully we don't expect many devices to register (famous last words :), 51 * Thankfully we don't expect many devices to register (famous last words :),
52 * so until then it will suffice. At least it's abstracted so we can change 52 * so until then it will suffice. At least it's abstracted so we can change
53 * in one place. 53 * in one place.
54 */ 54 */
55 struct kvm_io_bus { 55 struct kvm_io_bus {
56 int dev_count; 56 int dev_count;
57 #define NR_IOBUS_DEVS 200 57 #define NR_IOBUS_DEVS 200
58 struct kvm_io_device *devs[NR_IOBUS_DEVS]; 58 struct kvm_io_device *devs[NR_IOBUS_DEVS];
59 }; 59 };
60 60
61 enum kvm_bus { 61 enum kvm_bus {
62 KVM_MMIO_BUS, 62 KVM_MMIO_BUS,
63 KVM_PIO_BUS, 63 KVM_PIO_BUS,
64 KVM_NR_BUSES 64 KVM_NR_BUSES
65 }; 65 };
66 66
67 int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, 67 int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
68 int len, const void *val); 68 int len, const void *val);
69 int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int len, 69 int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int len,
70 void *val); 70 void *val);
71 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, 71 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
72 struct kvm_io_device *dev); 72 struct kvm_io_device *dev);
73 int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, 73 int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
74 struct kvm_io_device *dev); 74 struct kvm_io_device *dev);
75 75
76 struct kvm_vcpu { 76 struct kvm_vcpu {
77 struct kvm *kvm; 77 struct kvm *kvm;
78 #ifdef CONFIG_PREEMPT_NOTIFIERS 78 #ifdef CONFIG_PREEMPT_NOTIFIERS
79 struct preempt_notifier preempt_notifier; 79 struct preempt_notifier preempt_notifier;
80 #endif 80 #endif
81 int vcpu_id; 81 int vcpu_id;
82 struct mutex mutex; 82 struct mutex mutex;
83 int cpu; 83 int cpu;
84 atomic_t guest_mode; 84 atomic_t guest_mode;
85 struct kvm_run *run; 85 struct kvm_run *run;
86 unsigned long requests; 86 unsigned long requests;
87 unsigned long guest_debug; 87 unsigned long guest_debug;
88 int srcu_idx; 88 int srcu_idx;
89 89
90 int fpu_active; 90 int fpu_active;
91 int guest_fpu_loaded, guest_xcr0_loaded; 91 int guest_fpu_loaded, guest_xcr0_loaded;
92 wait_queue_head_t wq; 92 wait_queue_head_t wq;
93 int sigset_active; 93 int sigset_active;
94 sigset_t sigset; 94 sigset_t sigset;
95 struct kvm_vcpu_stat stat; 95 struct kvm_vcpu_stat stat;
96 96
97 #ifdef CONFIG_HAS_IOMEM 97 #ifdef CONFIG_HAS_IOMEM
98 int mmio_needed; 98 int mmio_needed;
99 int mmio_read_completed; 99 int mmio_read_completed;
100 int mmio_is_write; 100 int mmio_is_write;
101 int mmio_size; 101 int mmio_size;
102 unsigned char mmio_data[8]; 102 unsigned char mmio_data[8];
103 gpa_t mmio_phys_addr; 103 gpa_t mmio_phys_addr;
104 #endif 104 #endif
105 105
106 struct kvm_vcpu_arch arch; 106 struct kvm_vcpu_arch arch;
107 }; 107 };
108 108
109 /* 109 /*
110 * Some of the bitops functions do not support too long bitmaps. 110 * Some of the bitops functions do not support too long bitmaps.
111 * This number must be determined not to exceed such limits. 111 * This number must be determined not to exceed such limits.
112 */ 112 */
113 #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1) 113 #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
114 114
115 struct kvm_memory_slot { 115 struct kvm_memory_slot {
116 gfn_t base_gfn; 116 gfn_t base_gfn;
117 unsigned long npages; 117 unsigned long npages;
118 unsigned long flags; 118 unsigned long flags;
119 unsigned long *rmap; 119 unsigned long *rmap;
120 unsigned long *dirty_bitmap; 120 unsigned long *dirty_bitmap;
121 struct { 121 struct {
122 unsigned long rmap_pde; 122 unsigned long rmap_pde;
123 int write_count; 123 int write_count;
124 } *lpage_info[KVM_NR_PAGE_SIZES - 1]; 124 } *lpage_info[KVM_NR_PAGE_SIZES - 1];
125 unsigned long userspace_addr; 125 unsigned long userspace_addr;
126 int user_alloc; 126 int user_alloc;
127 }; 127 };
128 128
129 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot) 129 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
130 { 130 {
131 return ALIGN(memslot->npages, BITS_PER_LONG) / 8; 131 return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
132 } 132 }
133 133
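kvm_dirty_bitmap_bytes() above sizes the per-slot bitmap that KVM_GET_DIRTY_LOG (struct kvm_dirty_log in the uapi header earlier in this diff) copies out: one bit per guest page, rounded up to a whole long. A userspace-side sketch under the assumption of 64-bit longs, with vm_fd, slot and npages supplied by the caller:

#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int fetch_dirty_bitmap(int vm_fd, __u32 slot, unsigned long npages)
{
        size_t bytes = ((npages + 63) / 64) * 8;    /* ALIGN(npages, 64) / 8 */
        void *bitmap = calloc(1, bytes);
        struct kvm_dirty_log log;

        if (!bitmap)
                return -1;
        memset(&log, 0, sizeof(log));
        log.slot = slot;
        log.dirty_bitmap = bitmap;
        if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
                free(bitmap);
                return -1;
        }
        /* each set bit in bitmap now marks a dirty guest page in this slot */
        free(bitmap);
        return 0;
}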
134 struct kvm_kernel_irq_routing_entry { 134 struct kvm_kernel_irq_routing_entry {
135 u32 gsi; 135 u32 gsi;
136 u32 type; 136 u32 type;
137 int (*set)(struct kvm_kernel_irq_routing_entry *e, 137 int (*set)(struct kvm_kernel_irq_routing_entry *e,
138 struct kvm *kvm, int irq_source_id, int level); 138 struct kvm *kvm, int irq_source_id, int level);
139 union { 139 union {
140 struct { 140 struct {
141 unsigned irqchip; 141 unsigned irqchip;
142 unsigned pin; 142 unsigned pin;
143 } irqchip; 143 } irqchip;
144 struct msi_msg msi; 144 struct msi_msg msi;
145 }; 145 };
146 struct hlist_node link; 146 struct hlist_node link;
147 }; 147 };
148 148
149 #ifdef __KVM_HAVE_IOAPIC 149 #ifdef __KVM_HAVE_IOAPIC
150 150
151 struct kvm_irq_routing_table { 151 struct kvm_irq_routing_table {
152 int chip[KVM_NR_IRQCHIPS][KVM_IOAPIC_NUM_PINS]; 152 int chip[KVM_NR_IRQCHIPS][KVM_IOAPIC_NUM_PINS];
153 struct kvm_kernel_irq_routing_entry *rt_entries; 153 struct kvm_kernel_irq_routing_entry *rt_entries;
154 u32 nr_rt_entries; 154 u32 nr_rt_entries;
155 /* 155 /*
156 * Array indexed by gsi. Each entry contains a list of irq chips 156 * Array indexed by gsi. Each entry contains a list of irq chips
157 * the gsi is connected to. 157 * the gsi is connected to.
158 */ 158 */
159 struct hlist_head map[0]; 159 struct hlist_head map[0];
160 }; 160 };
161 161
162 #else 162 #else
163 163
164 struct kvm_irq_routing_table {}; 164 struct kvm_irq_routing_table {};
165 165
166 #endif 166 #endif
167 167
168 struct kvm_memslots { 168 struct kvm_memslots {
169 int nmemslots; 169 int nmemslots;
170 struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS + 170 struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
171 KVM_PRIVATE_MEM_SLOTS]; 171 KVM_PRIVATE_MEM_SLOTS];
172 }; 172 };
173 173
174 struct kvm { 174 struct kvm {
175 spinlock_t mmu_lock; 175 spinlock_t mmu_lock;
176 raw_spinlock_t requests_lock; 176 raw_spinlock_t requests_lock;
177 struct mutex slots_lock; 177 struct mutex slots_lock;
178 struct mm_struct *mm; /* userspace tied to this vm */ 178 struct mm_struct *mm; /* userspace tied to this vm */
179 struct kvm_memslots *memslots; 179 struct kvm_memslots *memslots;
180 struct srcu_struct srcu; 180 struct srcu_struct srcu;
181 #ifdef CONFIG_KVM_APIC_ARCHITECTURE 181 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
182 u32 bsp_vcpu_id; 182 u32 bsp_vcpu_id;
183 struct kvm_vcpu *bsp_vcpu; 183 struct kvm_vcpu *bsp_vcpu;
184 #endif 184 #endif
185 struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; 185 struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
186 atomic_t online_vcpus; 186 atomic_t online_vcpus;
187 struct list_head vm_list; 187 struct list_head vm_list;
188 struct mutex lock; 188 struct mutex lock;
189 struct kvm_io_bus *buses[KVM_NR_BUSES]; 189 struct kvm_io_bus *buses[KVM_NR_BUSES];
190 #ifdef CONFIG_HAVE_KVM_EVENTFD 190 #ifdef CONFIG_HAVE_KVM_EVENTFD
191 struct { 191 struct {
192 spinlock_t lock; 192 spinlock_t lock;
193 struct list_head items; 193 struct list_head items;
194 } irqfds; 194 } irqfds;
195 struct list_head ioeventfds; 195 struct list_head ioeventfds;
196 #endif 196 #endif
197 struct kvm_vm_stat stat; 197 struct kvm_vm_stat stat;
198 struct kvm_arch arch; 198 struct kvm_arch arch;
199 atomic_t users_count; 199 atomic_t users_count;
200 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET 200 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
201 struct kvm_coalesced_mmio_dev *coalesced_mmio_dev; 201 struct kvm_coalesced_mmio_dev *coalesced_mmio_dev;
202 struct kvm_coalesced_mmio_ring *coalesced_mmio_ring; 202 struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
203 #endif 203 #endif
204 204
205 struct mutex irq_lock; 205 struct mutex irq_lock;
206 #ifdef CONFIG_HAVE_KVM_IRQCHIP 206 #ifdef CONFIG_HAVE_KVM_IRQCHIP
207 struct kvm_irq_routing_table *irq_routing; 207 struct kvm_irq_routing_table *irq_routing;
208 struct hlist_head mask_notifier_list; 208 struct hlist_head mask_notifier_list;
209 struct hlist_head irq_ack_notifier_list; 209 struct hlist_head irq_ack_notifier_list;
210 #endif 210 #endif
211 211
212 #ifdef KVM_ARCH_WANT_MMU_NOTIFIER 212 #ifdef KVM_ARCH_WANT_MMU_NOTIFIER
213 struct mmu_notifier mmu_notifier; 213 struct mmu_notifier mmu_notifier;
214 unsigned long mmu_notifier_seq; 214 unsigned long mmu_notifier_seq;
215 long mmu_notifier_count; 215 long mmu_notifier_count;
216 #endif 216 #endif
217 }; 217 };
218 218
219 /* The guest did something we don't support. */ 219 /* The guest did something we don't support. */
220 #define pr_unimpl(vcpu, fmt, ...) \ 220 #define pr_unimpl(vcpu, fmt, ...) \
221 do { \ 221 do { \
222 if (printk_ratelimit()) \ 222 if (printk_ratelimit()) \
223 printk(KERN_ERR "kvm: %i: cpu%i " fmt, \ 223 printk(KERN_ERR "kvm: %i: cpu%i " fmt, \
224 current->tgid, (vcpu)->vcpu_id , ## __VA_ARGS__); \ 224 current->tgid, (vcpu)->vcpu_id , ## __VA_ARGS__); \
225 } while (0) 225 } while (0)
226 226
227 #define kvm_printf(kvm, fmt ...) printk(KERN_DEBUG fmt) 227 #define kvm_printf(kvm, fmt ...) printk(KERN_DEBUG fmt)
228 #define vcpu_printf(vcpu, fmt...) kvm_printf(vcpu->kvm, fmt) 228 #define vcpu_printf(vcpu, fmt...) kvm_printf(vcpu->kvm, fmt)
229 229
230 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) 230 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
231 { 231 {
232 smp_rmb(); 232 smp_rmb();
233 return kvm->vcpus[i]; 233 return kvm->vcpus[i];
234 } 234 }
235 235
236 #define kvm_for_each_vcpu(idx, vcpup, kvm) \ 236 #define kvm_for_each_vcpu(idx, vcpup, kvm) \
237 for (idx = 0, vcpup = kvm_get_vcpu(kvm, idx); \ 237 for (idx = 0, vcpup = kvm_get_vcpu(kvm, idx); \
238 idx < atomic_read(&kvm->online_vcpus) && vcpup; \ 238 idx < atomic_read(&kvm->online_vcpus) && vcpup; \
239 vcpup = kvm_get_vcpu(kvm, ++idx)) 239 vcpup = kvm_get_vcpu(kvm, ++idx))
240 240
241 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id); 241 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
242 void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); 242 void kvm_vcpu_uninit(struct kvm_vcpu *vcpu);
243 243
244 void vcpu_load(struct kvm_vcpu *vcpu); 244 void vcpu_load(struct kvm_vcpu *vcpu);
245 void vcpu_put(struct kvm_vcpu *vcpu); 245 void vcpu_put(struct kvm_vcpu *vcpu);
246 246
247 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, 247 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
248 struct module *module); 248 struct module *module);
249 void kvm_exit(void); 249 void kvm_exit(void);
250 250
251 void kvm_get_kvm(struct kvm *kvm); 251 void kvm_get_kvm(struct kvm *kvm);
252 void kvm_put_kvm(struct kvm *kvm); 252 void kvm_put_kvm(struct kvm *kvm);
253 253
254 static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) 254 static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
255 { 255 {
256 return rcu_dereference_check(kvm->memslots, 256 return rcu_dereference_check(kvm->memslots,
257 srcu_read_lock_held(&kvm->srcu) 257 srcu_read_lock_held(&kvm->srcu)
258 || lockdep_is_held(&kvm->slots_lock)); 258 || lockdep_is_held(&kvm->slots_lock));
259 } 259 }
260 260
261 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1) 261 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1)
262 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB) 262 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB)
263 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; } 263 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
264 264
265 extern struct page *bad_page; 265 extern struct page *bad_page;
266 extern pfn_t bad_pfn; 266 extern pfn_t bad_pfn;
267 267
268 int is_error_page(struct page *page); 268 int is_error_page(struct page *page);
269 int is_error_pfn(pfn_t pfn); 269 int is_error_pfn(pfn_t pfn);
270 int is_hwpoison_pfn(pfn_t pfn); 270 int is_hwpoison_pfn(pfn_t pfn);
271 int kvm_is_error_hva(unsigned long addr); 271 int kvm_is_error_hva(unsigned long addr);
272 int kvm_set_memory_region(struct kvm *kvm, 272 int kvm_set_memory_region(struct kvm *kvm,
273 struct kvm_userspace_memory_region *mem, 273 struct kvm_userspace_memory_region *mem,
274 int user_alloc); 274 int user_alloc);
275 int __kvm_set_memory_region(struct kvm *kvm, 275 int __kvm_set_memory_region(struct kvm *kvm,
276 struct kvm_userspace_memory_region *mem, 276 struct kvm_userspace_memory_region *mem,
277 int user_alloc); 277 int user_alloc);
278 int kvm_arch_prepare_memory_region(struct kvm *kvm, 278 int kvm_arch_prepare_memory_region(struct kvm *kvm,
279 struct kvm_memory_slot *memslot, 279 struct kvm_memory_slot *memslot,
280 struct kvm_memory_slot old, 280 struct kvm_memory_slot old,
281 struct kvm_userspace_memory_region *mem, 281 struct kvm_userspace_memory_region *mem,
282 int user_alloc); 282 int user_alloc);
283 void kvm_arch_commit_memory_region(struct kvm *kvm, 283 void kvm_arch_commit_memory_region(struct kvm *kvm,
284 struct kvm_userspace_memory_region *mem, 284 struct kvm_userspace_memory_region *mem,
285 struct kvm_memory_slot old, 285 struct kvm_memory_slot old,
286 int user_alloc); 286 int user_alloc);
287 void kvm_disable_largepages(void); 287 void kvm_disable_largepages(void);
288 void kvm_arch_flush_shadow(struct kvm *kvm); 288 void kvm_arch_flush_shadow(struct kvm *kvm);
289 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
290 gfn_t unalias_gfn_instantiation(struct kvm *kvm, gfn_t gfn);
291 289
292 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn); 290 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
293 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn); 291 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
294 void kvm_release_page_clean(struct page *page); 292 void kvm_release_page_clean(struct page *page);
295 void kvm_release_page_dirty(struct page *page); 293 void kvm_release_page_dirty(struct page *page);
296 void kvm_set_page_dirty(struct page *page); 294 void kvm_set_page_dirty(struct page *page);
297 void kvm_set_page_accessed(struct page *page); 295 void kvm_set_page_accessed(struct page *page);
298 296
299 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); 297 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
300 pfn_t gfn_to_pfn_memslot(struct kvm *kvm, 298 pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
301 struct kvm_memory_slot *slot, gfn_t gfn); 299 struct kvm_memory_slot *slot, gfn_t gfn);
302 int memslot_id(struct kvm *kvm, gfn_t gfn); 300 int memslot_id(struct kvm *kvm, gfn_t gfn);
303 void kvm_release_pfn_dirty(pfn_t); 301 void kvm_release_pfn_dirty(pfn_t);
304 void kvm_release_pfn_clean(pfn_t pfn); 302 void kvm_release_pfn_clean(pfn_t pfn);
305 void kvm_set_pfn_dirty(pfn_t pfn); 303 void kvm_set_pfn_dirty(pfn_t pfn);
306 void kvm_set_pfn_accessed(pfn_t pfn); 304 void kvm_set_pfn_accessed(pfn_t pfn);
307 void kvm_get_pfn(pfn_t pfn); 305 void kvm_get_pfn(pfn_t pfn);
308 306
309 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, 307 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
310 int len); 308 int len);
311 int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data, 309 int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
312 unsigned long len); 310 unsigned long len);
313 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len); 311 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
314 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, 312 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
315 int offset, int len); 313 int offset, int len);
316 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, 314 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
317 unsigned long len); 315 unsigned long len);
318 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len); 316 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
319 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len); 317 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
320 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); 318 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
321 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); 319 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
322 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn); 320 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
323 void mark_page_dirty(struct kvm *kvm, gfn_t gfn); 321 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
324 322
325 void kvm_vcpu_block(struct kvm_vcpu *vcpu); 323 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
326 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); 324 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
327 void kvm_resched(struct kvm_vcpu *vcpu); 325 void kvm_resched(struct kvm_vcpu *vcpu);
328 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu); 326 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
329 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu); 327 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
330 void kvm_flush_remote_tlbs(struct kvm *kvm); 328 void kvm_flush_remote_tlbs(struct kvm *kvm);
331 void kvm_reload_remote_mmus(struct kvm *kvm); 329 void kvm_reload_remote_mmus(struct kvm *kvm);
332 330
333 long kvm_arch_dev_ioctl(struct file *filp, 331 long kvm_arch_dev_ioctl(struct file *filp,
334 unsigned int ioctl, unsigned long arg); 332 unsigned int ioctl, unsigned long arg);
335 long kvm_arch_vcpu_ioctl(struct file *filp, 333 long kvm_arch_vcpu_ioctl(struct file *filp,
336 unsigned int ioctl, unsigned long arg); 334 unsigned int ioctl, unsigned long arg);
337 335
338 int kvm_dev_ioctl_check_extension(long ext); 336 int kvm_dev_ioctl_check_extension(long ext);
339 337
340 int kvm_get_dirty_log(struct kvm *kvm, 338 int kvm_get_dirty_log(struct kvm *kvm,
341 struct kvm_dirty_log *log, int *is_dirty); 339 struct kvm_dirty_log *log, int *is_dirty);
342 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, 340 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
343 struct kvm_dirty_log *log); 341 struct kvm_dirty_log *log);
344 342
345 int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, 343 int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
346 struct 344 struct
347 kvm_userspace_memory_region *mem, 345 kvm_userspace_memory_region *mem,
348 int user_alloc); 346 int user_alloc);
349 long kvm_arch_vm_ioctl(struct file *filp, 347 long kvm_arch_vm_ioctl(struct file *filp,
350 unsigned int ioctl, unsigned long arg); 348 unsigned int ioctl, unsigned long arg);
351 349
352 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); 350 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu);
353 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); 351 int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu);
354 352
355 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, 353 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
356 struct kvm_translation *tr); 354 struct kvm_translation *tr);
357 355
358 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs); 356 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs);
359 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs); 357 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs);
360 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, 358 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
361 struct kvm_sregs *sregs); 359 struct kvm_sregs *sregs);
362 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, 360 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
363 struct kvm_sregs *sregs); 361 struct kvm_sregs *sregs);
364 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, 362 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
365 struct kvm_mp_state *mp_state); 363 struct kvm_mp_state *mp_state);
366 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, 364 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
367 struct kvm_mp_state *mp_state); 365 struct kvm_mp_state *mp_state);
368 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, 366 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
369 struct kvm_guest_debug *dbg); 367 struct kvm_guest_debug *dbg);
370 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run); 368 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
371 369
372 int kvm_arch_init(void *opaque); 370 int kvm_arch_init(void *opaque);
373 void kvm_arch_exit(void); 371 void kvm_arch_exit(void);
374 372
375 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu); 373 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
376 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu); 374 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
377 375
378 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu); 376 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
379 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); 377 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
380 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); 378 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
381 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id); 379 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id);
382 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu); 380 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu);
383 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu); 381 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
384 382
385 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu); 383 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu);
386 int kvm_arch_hardware_enable(void *garbage); 384 int kvm_arch_hardware_enable(void *garbage);
387 void kvm_arch_hardware_disable(void *garbage); 385 void kvm_arch_hardware_disable(void *garbage);
388 int kvm_arch_hardware_setup(void); 386 int kvm_arch_hardware_setup(void);
389 void kvm_arch_hardware_unsetup(void); 387 void kvm_arch_hardware_unsetup(void);
390 void kvm_arch_check_processor_compat(void *rtn); 388 void kvm_arch_check_processor_compat(void *rtn);
391 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); 389 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
392 390
393 void kvm_free_physmem(struct kvm *kvm); 391 void kvm_free_physmem(struct kvm *kvm);
394 392
395 struct kvm *kvm_arch_create_vm(void); 393 struct kvm *kvm_arch_create_vm(void);
396 void kvm_arch_destroy_vm(struct kvm *kvm); 394 void kvm_arch_destroy_vm(struct kvm *kvm);
397 void kvm_free_all_assigned_devices(struct kvm *kvm); 395 void kvm_free_all_assigned_devices(struct kvm *kvm);
398 void kvm_arch_sync_events(struct kvm *kvm); 396 void kvm_arch_sync_events(struct kvm *kvm);
399 397
400 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); 398 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
401 void kvm_vcpu_kick(struct kvm_vcpu *vcpu); 399 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
402 400
403 int kvm_is_mmio_pfn(pfn_t pfn); 401 int kvm_is_mmio_pfn(pfn_t pfn);
404 402
405 struct kvm_irq_ack_notifier { 403 struct kvm_irq_ack_notifier {
406 struct hlist_node link; 404 struct hlist_node link;
407 unsigned gsi; 405 unsigned gsi;
408 void (*irq_acked)(struct kvm_irq_ack_notifier *kian); 406 void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
409 }; 407 };
410 408
411 #define KVM_ASSIGNED_MSIX_PENDING 0x1 409 #define KVM_ASSIGNED_MSIX_PENDING 0x1
412 struct kvm_guest_msix_entry { 410 struct kvm_guest_msix_entry {
413 u32 vector; 411 u32 vector;
414 u16 entry; 412 u16 entry;
415 u16 flags; 413 u16 flags;
416 }; 414 };
417 415
418 struct kvm_assigned_dev_kernel { 416 struct kvm_assigned_dev_kernel {
419 struct kvm_irq_ack_notifier ack_notifier; 417 struct kvm_irq_ack_notifier ack_notifier;
420 struct work_struct interrupt_work; 418 struct work_struct interrupt_work;
421 struct list_head list; 419 struct list_head list;
422 int assigned_dev_id; 420 int assigned_dev_id;
423 int host_segnr; 421 int host_segnr;
424 int host_busnr; 422 int host_busnr;
425 int host_devfn; 423 int host_devfn;
426 unsigned int entries_nr; 424 unsigned int entries_nr;
427 int host_irq; 425 int host_irq;
428 bool host_irq_disabled; 426 bool host_irq_disabled;
429 struct msix_entry *host_msix_entries; 427 struct msix_entry *host_msix_entries;
430 int guest_irq; 428 int guest_irq;
431 struct kvm_guest_msix_entry *guest_msix_entries; 429 struct kvm_guest_msix_entry *guest_msix_entries;
432 unsigned long irq_requested_type; 430 unsigned long irq_requested_type;
433 int irq_source_id; 431 int irq_source_id;
434 int flags; 432 int flags;
435 struct pci_dev *dev; 433 struct pci_dev *dev;
436 struct kvm *kvm; 434 struct kvm *kvm;
437 spinlock_t assigned_dev_lock; 435 spinlock_t assigned_dev_lock;
438 }; 436 };
439 437
440 struct kvm_irq_mask_notifier { 438 struct kvm_irq_mask_notifier {
441 void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked); 439 void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked);
442 int irq; 440 int irq;
443 struct hlist_node link; 441 struct hlist_node link;
444 }; 442 };
445 443
446 void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, 444 void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
447 struct kvm_irq_mask_notifier *kimn); 445 struct kvm_irq_mask_notifier *kimn);
448 void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, 446 void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
449 struct kvm_irq_mask_notifier *kimn); 447 struct kvm_irq_mask_notifier *kimn);
450 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask); 448 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
451 449
452 #ifdef __KVM_HAVE_IOAPIC 450 #ifdef __KVM_HAVE_IOAPIC
453 void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, 451 void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
454 union kvm_ioapic_redirect_entry *entry, 452 union kvm_ioapic_redirect_entry *entry,
455 unsigned long *deliver_bitmask); 453 unsigned long *deliver_bitmask);
456 #endif 454 #endif
457 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); 455 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
458 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); 456 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
459 void kvm_register_irq_ack_notifier(struct kvm *kvm, 457 void kvm_register_irq_ack_notifier(struct kvm *kvm,
460 struct kvm_irq_ack_notifier *kian); 458 struct kvm_irq_ack_notifier *kian);
461 void kvm_unregister_irq_ack_notifier(struct kvm *kvm, 459 void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
462 struct kvm_irq_ack_notifier *kian); 460 struct kvm_irq_ack_notifier *kian);
463 int kvm_request_irq_source_id(struct kvm *kvm); 461 int kvm_request_irq_source_id(struct kvm *kvm);
464 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id); 462 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
465 463
466 /* For vcpu->arch.iommu_flags */ 464 /* For vcpu->arch.iommu_flags */
467 #define KVM_IOMMU_CACHE_COHERENCY 0x1 465 #define KVM_IOMMU_CACHE_COHERENCY 0x1
468 466
469 #ifdef CONFIG_IOMMU_API 467 #ifdef CONFIG_IOMMU_API
470 int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot); 468 int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
471 int kvm_iommu_map_guest(struct kvm *kvm); 469 int kvm_iommu_map_guest(struct kvm *kvm);
472 int kvm_iommu_unmap_guest(struct kvm *kvm); 470 int kvm_iommu_unmap_guest(struct kvm *kvm);
473 int kvm_assign_device(struct kvm *kvm, 471 int kvm_assign_device(struct kvm *kvm,
474 struct kvm_assigned_dev_kernel *assigned_dev); 472 struct kvm_assigned_dev_kernel *assigned_dev);
475 int kvm_deassign_device(struct kvm *kvm, 473 int kvm_deassign_device(struct kvm *kvm,
476 struct kvm_assigned_dev_kernel *assigned_dev); 474 struct kvm_assigned_dev_kernel *assigned_dev);
477 #else /* CONFIG_IOMMU_API */ 475 #else /* CONFIG_IOMMU_API */
478 static inline int kvm_iommu_map_pages(struct kvm *kvm, 476 static inline int kvm_iommu_map_pages(struct kvm *kvm,
479 gfn_t base_gfn, 477 gfn_t base_gfn,
480 unsigned long npages) 478 unsigned long npages)
481 { 479 {
482 return 0; 480 return 0;
483 } 481 }
484 482
485 static inline int kvm_iommu_map_guest(struct kvm *kvm) 483 static inline int kvm_iommu_map_guest(struct kvm *kvm)
486 { 484 {
487 return -ENODEV; 485 return -ENODEV;
488 } 486 }
489 487
490 static inline int kvm_iommu_unmap_guest(struct kvm *kvm) 488 static inline int kvm_iommu_unmap_guest(struct kvm *kvm)
491 { 489 {
492 return 0; 490 return 0;
493 } 491 }
494 492
495 static inline int kvm_assign_device(struct kvm *kvm, 493 static inline int kvm_assign_device(struct kvm *kvm,
496 struct kvm_assigned_dev_kernel *assigned_dev) 494 struct kvm_assigned_dev_kernel *assigned_dev)
497 { 495 {
498 return 0; 496 return 0;
499 } 497 }
500 498
501 static inline int kvm_deassign_device(struct kvm *kvm, 499 static inline int kvm_deassign_device(struct kvm *kvm,
502 struct kvm_assigned_dev_kernel *assigned_dev) 500 struct kvm_assigned_dev_kernel *assigned_dev)
503 { 501 {
504 return 0; 502 return 0;
505 } 503 }
506 #endif /* CONFIG_IOMMU_API */ 504 #endif /* CONFIG_IOMMU_API */
507 505
508 static inline void kvm_guest_enter(void) 506 static inline void kvm_guest_enter(void)
509 { 507 {
510 account_system_vtime(current); 508 account_system_vtime(current);
511 current->flags |= PF_VCPU; 509 current->flags |= PF_VCPU;
512 } 510 }
513 511
514 static inline void kvm_guest_exit(void) 512 static inline void kvm_guest_exit(void)
515 { 513 {
516 account_system_vtime(current); 514 account_system_vtime(current);
517 current->flags &= ~PF_VCPU; 515 current->flags &= ~PF_VCPU;
518 } 516 }
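The kvm_guest_enter()/kvm_guest_exit() pair above exists so CPU time spent in guest mode is accounted separately (via PF_VCPU) from normal host execution. A minimal sketch of how an architecture's run loop might bracket the world switch with these helpers; enter_guest_mode() is a placeholder for arch-specific entry code, not a function defined in this file:

	static int vcpu_run_once_sketch(struct kvm_vcpu *vcpu)
	{
		int exit_reason;

		kvm_guest_enter();                    /* start accounting time as guest */
		exit_reason = enter_guest_mode(vcpu); /* hypothetical arch world switch */
		kvm_guest_exit();                     /* back to host accounting */

		return exit_reason;
	}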
519 517
520 static inline gpa_t gfn_to_gpa(gfn_t gfn) 518 static inline gpa_t gfn_to_gpa(gfn_t gfn)
521 { 519 {
522 return (gpa_t)gfn << PAGE_SHIFT; 520 return (gpa_t)gfn << PAGE_SHIFT;
523 } 521 }
524 522
525 static inline hpa_t pfn_to_hpa(pfn_t pfn) 523 static inline hpa_t pfn_to_hpa(pfn_t pfn)
526 { 524 {
527 return (hpa_t)pfn << PAGE_SHIFT; 525 return (hpa_t)pfn << PAGE_SHIFT;
528 } 526 }
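Both helpers are plain page-frame shifts. With 4 KiB pages (PAGE_SHIFT == 12) the conversions work out as below; the values are chosen purely for illustration:

	gpa_t gpa = gfn_to_gpa(0x100);   /* 0x100  << 12 == 0x100000  */
	hpa_t hpa = pfn_to_hpa(0x2345);  /* 0x2345 << 12 == 0x2345000 */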
529 527
530 static inline void kvm_migrate_timers(struct kvm_vcpu *vcpu) 528 static inline void kvm_migrate_timers(struct kvm_vcpu *vcpu)
531 { 529 {
532 set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests); 530 set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests);
533 } 531 }
534 532
535 enum kvm_stat_kind { 533 enum kvm_stat_kind {
536 KVM_STAT_VM, 534 KVM_STAT_VM,
537 KVM_STAT_VCPU, 535 KVM_STAT_VCPU,
538 }; 536 };
539 537
540 struct kvm_stats_debugfs_item { 538 struct kvm_stats_debugfs_item {
541 const char *name; 539 const char *name;
542 int offset; 540 int offset;
543 enum kvm_stat_kind kind; 541 enum kvm_stat_kind kind;
544 struct dentry *dentry; 542 struct dentry *dentry;
545 }; 543 };
546 extern struct kvm_stats_debugfs_item debugfs_entries[]; 544 extern struct kvm_stats_debugfs_item debugfs_entries[];
547 extern struct dentry *kvm_debugfs_dir; 545 extern struct dentry *kvm_debugfs_dir;
548 546
549 #ifdef KVM_ARCH_WANT_MMU_NOTIFIER 547 #ifdef KVM_ARCH_WANT_MMU_NOTIFIER
550 static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq) 548 static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
551 { 549 {
552 if (unlikely(vcpu->kvm->mmu_notifier_count)) 550 if (unlikely(vcpu->kvm->mmu_notifier_count))
553 return 1; 551 return 1;
554 /* 552 /*
555 * Both reads happen under the mmu_lock and both values are 553 * Both reads happen under the mmu_lock and both values are
556 * modified under mmu_lock, so there's no need for smp_rmb() 554 * modified under mmu_lock, so there's no need for smp_rmb()
557 * here in between; otherwise mmu_notifier_count should be 555 * here in between; otherwise mmu_notifier_count should be
558 * read before mmu_notifier_seq, see 556 * read before mmu_notifier_seq, see
559 * mmu_notifier_invalidate_range_end write side. 557 * mmu_notifier_invalidate_range_end write side.
560 */ 558 */
561 if (vcpu->kvm->mmu_notifier_seq != mmu_seq) 559 if (vcpu->kvm->mmu_notifier_seq != mmu_seq)
562 return 1; 560 return 1;
563 return 0; 561 return 0;
564 } 562 }
565 #endif
566
567 #ifndef KVM_ARCH_HAS_UNALIAS_INSTANTIATION
568 #define unalias_gfn_instantiation unalias_gfn
569 #endif 563 #endif
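mmu_notifier_retry() is meant to be paired with a snapshot of kvm->mmu_notifier_seq taken before the page is resolved: the page-fault path re-checks the sequence under mmu_lock and backs off if an invalidation raced with it. A hedged sketch of that pattern, using placeholder structure rather than any particular architecture's fault handler:

	static int page_fault_retry_sketch(struct kvm_vcpu *vcpu, gfn_t gfn)
	{
		unsigned long mmu_seq = vcpu->kvm->mmu_notifier_seq;
		pfn_t pfn;

		smp_rmb();                      /* read the sequence before resolving the pfn */
		pfn = gfn_to_pfn(vcpu->kvm, gfn);

		spin_lock(&vcpu->kvm->mmu_lock);
		if (mmu_notifier_retry(vcpu, mmu_seq)) {
			/* an invalidation ran in between; drop the page and retry */
			spin_unlock(&vcpu->kvm->mmu_lock);
			kvm_release_pfn_clean(pfn);
			return -EAGAIN;
		}
		/* safe to install a spte for pfn here */
		spin_unlock(&vcpu->kvm->mmu_lock);
		return 0;
	}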
570 564
571 #ifdef CONFIG_HAVE_KVM_IRQCHIP 565 #ifdef CONFIG_HAVE_KVM_IRQCHIP
572 566
573 #define KVM_MAX_IRQ_ROUTES 1024 567 #define KVM_MAX_IRQ_ROUTES 1024
574 568
575 int kvm_setup_default_irq_routing(struct kvm *kvm); 569 int kvm_setup_default_irq_routing(struct kvm *kvm);
576 int kvm_set_irq_routing(struct kvm *kvm, 570 int kvm_set_irq_routing(struct kvm *kvm,
577 const struct kvm_irq_routing_entry *entries, 571 const struct kvm_irq_routing_entry *entries,
578 unsigned nr, 572 unsigned nr,
579 unsigned flags); 573 unsigned flags);
580 void kvm_free_irq_routing(struct kvm *kvm); 574 void kvm_free_irq_routing(struct kvm *kvm);
581 575
582 #else 576 #else
583 577
584 static inline void kvm_free_irq_routing(struct kvm *kvm) {} 578 static inline void kvm_free_irq_routing(struct kvm *kvm) {}
585 579
586 #endif 580 #endif
587 581
588 #ifdef CONFIG_HAVE_KVM_EVENTFD 582 #ifdef CONFIG_HAVE_KVM_EVENTFD
589 583
590 void kvm_eventfd_init(struct kvm *kvm); 584 void kvm_eventfd_init(struct kvm *kvm);
591 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags); 585 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
592 void kvm_irqfd_release(struct kvm *kvm); 586 void kvm_irqfd_release(struct kvm *kvm);
593 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args); 587 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
594 588
595 #else 589 #else
596 590
597 static inline void kvm_eventfd_init(struct kvm *kvm) {} 591 static inline void kvm_eventfd_init(struct kvm *kvm) {}
598 static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) 592 static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
599 { 593 {
600 return -EINVAL; 594 return -EINVAL;
601 } 595 }
602 596
603 static inline void kvm_irqfd_release(struct kvm *kvm) {} 597 static inline void kvm_irqfd_release(struct kvm *kvm) {}
604 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) 598 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
605 { 599 {
606 return -ENOSYS; 600 return -ENOSYS;
607 } 601 }
608 602
609 #endif /* CONFIG_HAVE_KVM_EVENTFD */ 603 #endif /* CONFIG_HAVE_KVM_EVENTFD */
610 604
611 #ifdef CONFIG_KVM_APIC_ARCHITECTURE 605 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
612 static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) 606 static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
613 { 607 {
614 return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id; 608 return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id;
615 } 609 }
616 #endif 610 #endif
617 611
618 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT 612 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
619 613
620 long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, 614 long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
621 unsigned long arg); 615 unsigned long arg);
622 616
623 #else 617 #else
624 618
625 static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, 619 static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
626 unsigned long arg) 620 unsigned long arg)
627 { 621 {
628 return -ENOTTY; 622 return -ENOTTY;
629 } 623 }
630 624
631 #endif 625 #endif
632 626
633 #endif 627 #endif
634 628
635 629
1 /* 1 /*
2 * Kernel-based Virtual Machine driver for Linux 2 * Kernel-based Virtual Machine driver for Linux
3 * 3 *
4 * This module enables machines with Intel VT-x extensions to run virtual 4 * This module enables machines with Intel VT-x extensions to run virtual
5 * machines without emulation or binary translation. 5 * machines without emulation or binary translation.
6 * 6 *
7 * Copyright (C) 2006 Qumranet, Inc. 7 * Copyright (C) 2006 Qumranet, Inc.
8 * Copyright 2010 Red Hat, Inc. and/or its affiliates. 8 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
9 * 9 *
10 * Authors: 10 * Authors:
11 * Avi Kivity <avi@qumranet.com> 11 * Avi Kivity <avi@qumranet.com>
12 * Yaniv Kamay <yaniv@qumranet.com> 12 * Yaniv Kamay <yaniv@qumranet.com>
13 * 13 *
14 * This work is licensed under the terms of the GNU GPL, version 2. See 14 * This work is licensed under the terms of the GNU GPL, version 2. See
15 * the COPYING file in the top-level directory. 15 * the COPYING file in the top-level directory.
16 * 16 *
17 */ 17 */
18 18
19 #include "iodev.h" 19 #include "iodev.h"
20 20
21 #include <linux/kvm_host.h> 21 #include <linux/kvm_host.h>
22 #include <linux/kvm.h> 22 #include <linux/kvm.h>
23 #include <linux/module.h> 23 #include <linux/module.h>
24 #include <linux/errno.h> 24 #include <linux/errno.h>
25 #include <linux/percpu.h> 25 #include <linux/percpu.h>
26 #include <linux/mm.h> 26 #include <linux/mm.h>
27 #include <linux/miscdevice.h> 27 #include <linux/miscdevice.h>
28 #include <linux/vmalloc.h> 28 #include <linux/vmalloc.h>
29 #include <linux/reboot.h> 29 #include <linux/reboot.h>
30 #include <linux/debugfs.h> 30 #include <linux/debugfs.h>
31 #include <linux/highmem.h> 31 #include <linux/highmem.h>
32 #include <linux/file.h> 32 #include <linux/file.h>
33 #include <linux/sysdev.h> 33 #include <linux/sysdev.h>
34 #include <linux/cpu.h> 34 #include <linux/cpu.h>
35 #include <linux/sched.h> 35 #include <linux/sched.h>
36 #include <linux/cpumask.h> 36 #include <linux/cpumask.h>
37 #include <linux/smp.h> 37 #include <linux/smp.h>
38 #include <linux/anon_inodes.h> 38 #include <linux/anon_inodes.h>
39 #include <linux/profile.h> 39 #include <linux/profile.h>
40 #include <linux/kvm_para.h> 40 #include <linux/kvm_para.h>
41 #include <linux/pagemap.h> 41 #include <linux/pagemap.h>
42 #include <linux/mman.h> 42 #include <linux/mman.h>
43 #include <linux/swap.h> 43 #include <linux/swap.h>
44 #include <linux/bitops.h> 44 #include <linux/bitops.h>
45 #include <linux/spinlock.h> 45 #include <linux/spinlock.h>
46 #include <linux/compat.h> 46 #include <linux/compat.h>
47 #include <linux/srcu.h> 47 #include <linux/srcu.h>
48 #include <linux/hugetlb.h> 48 #include <linux/hugetlb.h>
49 #include <linux/slab.h> 49 #include <linux/slab.h>
50 50
51 #include <asm/processor.h> 51 #include <asm/processor.h>
52 #include <asm/io.h> 52 #include <asm/io.h>
53 #include <asm/uaccess.h> 53 #include <asm/uaccess.h>
54 #include <asm/pgtable.h> 54 #include <asm/pgtable.h>
55 #include <asm-generic/bitops/le.h> 55 #include <asm-generic/bitops/le.h>
56 56
57 #include "coalesced_mmio.h" 57 #include "coalesced_mmio.h"
58 58
59 #define CREATE_TRACE_POINTS 59 #define CREATE_TRACE_POINTS
60 #include <trace/events/kvm.h> 60 #include <trace/events/kvm.h>
61 61
62 MODULE_AUTHOR("Qumranet"); 62 MODULE_AUTHOR("Qumranet");
63 MODULE_LICENSE("GPL"); 63 MODULE_LICENSE("GPL");
64 64
65 /* 65 /*
66 * Ordering of locks: 66 * Ordering of locks:
67 * 67 *
68 * kvm->lock --> kvm->slots_lock --> kvm->irq_lock 68 * kvm->lock --> kvm->slots_lock --> kvm->irq_lock
69 */ 69 */
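In other words, a path that needs more than one of these locks must acquire them in that order and release them in reverse. Purely as an illustration of a compliant nesting (all three are mutexes, initialised in kvm_create_vm() below):

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->slots_lock);
	mutex_lock(&kvm->irq_lock);
	/* ... work that needs all three locks ... */
	mutex_unlock(&kvm->irq_lock);
	mutex_unlock(&kvm->slots_lock);
	mutex_unlock(&kvm->lock);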
70 70
71 DEFINE_SPINLOCK(kvm_lock); 71 DEFINE_SPINLOCK(kvm_lock);
72 LIST_HEAD(vm_list); 72 LIST_HEAD(vm_list);
73 73
74 static cpumask_var_t cpus_hardware_enabled; 74 static cpumask_var_t cpus_hardware_enabled;
75 static int kvm_usage_count = 0; 75 static int kvm_usage_count = 0;
76 static atomic_t hardware_enable_failed; 76 static atomic_t hardware_enable_failed;
77 77
78 struct kmem_cache *kvm_vcpu_cache; 78 struct kmem_cache *kvm_vcpu_cache;
79 EXPORT_SYMBOL_GPL(kvm_vcpu_cache); 79 EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
80 80
81 static __read_mostly struct preempt_ops kvm_preempt_ops; 81 static __read_mostly struct preempt_ops kvm_preempt_ops;
82 82
83 struct dentry *kvm_debugfs_dir; 83 struct dentry *kvm_debugfs_dir;
84 84
85 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, 85 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
86 unsigned long arg); 86 unsigned long arg);
87 static int hardware_enable_all(void); 87 static int hardware_enable_all(void);
88 static void hardware_disable_all(void); 88 static void hardware_disable_all(void);
89 89
90 static void kvm_io_bus_destroy(struct kvm_io_bus *bus); 90 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
91 91
92 static bool kvm_rebooting; 92 static bool kvm_rebooting;
93 93
94 static bool largepages_enabled = true; 94 static bool largepages_enabled = true;
95 95
96 struct page *hwpoison_page; 96 struct page *hwpoison_page;
97 pfn_t hwpoison_pfn; 97 pfn_t hwpoison_pfn;
98 98
99 inline int kvm_is_mmio_pfn(pfn_t pfn) 99 inline int kvm_is_mmio_pfn(pfn_t pfn)
100 { 100 {
101 if (pfn_valid(pfn)) { 101 if (pfn_valid(pfn)) {
102 struct page *page = compound_head(pfn_to_page(pfn)); 102 struct page *page = compound_head(pfn_to_page(pfn));
103 return PageReserved(page); 103 return PageReserved(page);
104 } 104 }
105 105
106 return true; 106 return true;
107 } 107 }
108 108
109 /* 109 /*
110 * Switches to the specified vcpu, until a matching vcpu_put() 110 * Switches to the specified vcpu, until a matching vcpu_put()
111 */ 111 */
112 void vcpu_load(struct kvm_vcpu *vcpu) 112 void vcpu_load(struct kvm_vcpu *vcpu)
113 { 113 {
114 int cpu; 114 int cpu;
115 115
116 mutex_lock(&vcpu->mutex); 116 mutex_lock(&vcpu->mutex);
117 cpu = get_cpu(); 117 cpu = get_cpu();
118 preempt_notifier_register(&vcpu->preempt_notifier); 118 preempt_notifier_register(&vcpu->preempt_notifier);
119 kvm_arch_vcpu_load(vcpu, cpu); 119 kvm_arch_vcpu_load(vcpu, cpu);
120 put_cpu(); 120 put_cpu();
121 } 121 }
122 122
123 void vcpu_put(struct kvm_vcpu *vcpu) 123 void vcpu_put(struct kvm_vcpu *vcpu)
124 { 124 {
125 preempt_disable(); 125 preempt_disable();
126 kvm_arch_vcpu_put(vcpu); 126 kvm_arch_vcpu_put(vcpu);
127 preempt_notifier_unregister(&vcpu->preempt_notifier); 127 preempt_notifier_unregister(&vcpu->preempt_notifier);
128 preempt_enable(); 128 preempt_enable();
129 mutex_unlock(&vcpu->mutex); 129 mutex_unlock(&vcpu->mutex);
130 } 130 }
131 131
132 static void ack_flush(void *_completed) 132 static void ack_flush(void *_completed)
133 { 133 {
134 } 134 }
135 135
136 static bool make_all_cpus_request(struct kvm *kvm, unsigned int req) 136 static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
137 { 137 {
138 int i, cpu, me; 138 int i, cpu, me;
139 cpumask_var_t cpus; 139 cpumask_var_t cpus;
140 bool called = true; 140 bool called = true;
141 struct kvm_vcpu *vcpu; 141 struct kvm_vcpu *vcpu;
142 142
143 zalloc_cpumask_var(&cpus, GFP_ATOMIC); 143 zalloc_cpumask_var(&cpus, GFP_ATOMIC);
144 144
145 raw_spin_lock(&kvm->requests_lock); 145 raw_spin_lock(&kvm->requests_lock);
146 me = smp_processor_id(); 146 me = smp_processor_id();
147 kvm_for_each_vcpu(i, vcpu, kvm) { 147 kvm_for_each_vcpu(i, vcpu, kvm) {
148 if (test_and_set_bit(req, &vcpu->requests)) 148 if (test_and_set_bit(req, &vcpu->requests))
149 continue; 149 continue;
150 cpu = vcpu->cpu; 150 cpu = vcpu->cpu;
151 if (cpus != NULL && cpu != -1 && cpu != me) 151 if (cpus != NULL && cpu != -1 && cpu != me)
152 cpumask_set_cpu(cpu, cpus); 152 cpumask_set_cpu(cpu, cpus);
153 } 153 }
154 if (unlikely(cpus == NULL)) 154 if (unlikely(cpus == NULL))
155 smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1); 155 smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1);
156 else if (!cpumask_empty(cpus)) 156 else if (!cpumask_empty(cpus))
157 smp_call_function_many(cpus, ack_flush, NULL, 1); 157 smp_call_function_many(cpus, ack_flush, NULL, 1);
158 else 158 else
159 called = false; 159 called = false;
160 raw_spin_unlock(&kvm->requests_lock); 160 raw_spin_unlock(&kvm->requests_lock);
161 free_cpumask_var(cpus); 161 free_cpumask_var(cpus);
162 return called; 162 return called;
163 } 163 }
164 164
165 void kvm_flush_remote_tlbs(struct kvm *kvm) 165 void kvm_flush_remote_tlbs(struct kvm *kvm)
166 { 166 {
167 if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH)) 167 if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
168 ++kvm->stat.remote_tlb_flush; 168 ++kvm->stat.remote_tlb_flush;
169 } 169 }
170 170
171 void kvm_reload_remote_mmus(struct kvm *kvm) 171 void kvm_reload_remote_mmus(struct kvm *kvm)
172 { 172 {
173 make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD); 173 make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
174 } 174 }
175 175
176 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) 176 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
177 { 177 {
178 struct page *page; 178 struct page *page;
179 int r; 179 int r;
180 180
181 mutex_init(&vcpu->mutex); 181 mutex_init(&vcpu->mutex);
182 vcpu->cpu = -1; 182 vcpu->cpu = -1;
183 vcpu->kvm = kvm; 183 vcpu->kvm = kvm;
184 vcpu->vcpu_id = id; 184 vcpu->vcpu_id = id;
185 init_waitqueue_head(&vcpu->wq); 185 init_waitqueue_head(&vcpu->wq);
186 186
187 page = alloc_page(GFP_KERNEL | __GFP_ZERO); 187 page = alloc_page(GFP_KERNEL | __GFP_ZERO);
188 if (!page) { 188 if (!page) {
189 r = -ENOMEM; 189 r = -ENOMEM;
190 goto fail; 190 goto fail;
191 } 191 }
192 vcpu->run = page_address(page); 192 vcpu->run = page_address(page);
193 193
194 r = kvm_arch_vcpu_init(vcpu); 194 r = kvm_arch_vcpu_init(vcpu);
195 if (r < 0) 195 if (r < 0)
196 goto fail_free_run; 196 goto fail_free_run;
197 return 0; 197 return 0;
198 198
199 fail_free_run: 199 fail_free_run:
200 free_page((unsigned long)vcpu->run); 200 free_page((unsigned long)vcpu->run);
201 fail: 201 fail:
202 return r; 202 return r;
203 } 203 }
204 EXPORT_SYMBOL_GPL(kvm_vcpu_init); 204 EXPORT_SYMBOL_GPL(kvm_vcpu_init);
205 205
206 void kvm_vcpu_uninit(struct kvm_vcpu *vcpu) 206 void kvm_vcpu_uninit(struct kvm_vcpu *vcpu)
207 { 207 {
208 kvm_arch_vcpu_uninit(vcpu); 208 kvm_arch_vcpu_uninit(vcpu);
209 free_page((unsigned long)vcpu->run); 209 free_page((unsigned long)vcpu->run);
210 } 210 }
211 EXPORT_SYMBOL_GPL(kvm_vcpu_uninit); 211 EXPORT_SYMBOL_GPL(kvm_vcpu_uninit);
212 212
213 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) 213 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
214 static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) 214 static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
215 { 215 {
216 return container_of(mn, struct kvm, mmu_notifier); 216 return container_of(mn, struct kvm, mmu_notifier);
217 } 217 }
218 218
219 static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, 219 static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
220 struct mm_struct *mm, 220 struct mm_struct *mm,
221 unsigned long address) 221 unsigned long address)
222 { 222 {
223 struct kvm *kvm = mmu_notifier_to_kvm(mn); 223 struct kvm *kvm = mmu_notifier_to_kvm(mn);
224 int need_tlb_flush, idx; 224 int need_tlb_flush, idx;
225 225
226 /* 226 /*
227 * When ->invalidate_page runs, the linux pte has been zapped 227 * When ->invalidate_page runs, the linux pte has been zapped
228 * already but the page is still allocated until 228 * already but the page is still allocated until
229 * ->invalidate_page returns. So if we increase the sequence 229 * ->invalidate_page returns. So if we increase the sequence
230 * here the kvm page fault will notice if the spte can't be 230 * here the kvm page fault will notice if the spte can't be
231 * established because the page is going to be freed. If 231 * established because the page is going to be freed. If
232 * instead the kvm page fault establishes the spte before 232 * instead the kvm page fault establishes the spte before
233 * ->invalidate_page runs, kvm_unmap_hva will release it 233 * ->invalidate_page runs, kvm_unmap_hva will release it
234 * before returning. 234 * before returning.
235 * 235 *
236 * The sequence increase only needs to be seen at spin_unlock 236 * The sequence increase only needs to be seen at spin_unlock
237 * time, and not at spin_lock time. 237 * time, and not at spin_lock time.
238 * 238 *
239 * Increasing the sequence after the spin_unlock would be 239 * Increasing the sequence after the spin_unlock would be
240 * unsafe because the kvm page fault could then establish the 240 * unsafe because the kvm page fault could then establish the
241 * pte after kvm_unmap_hva returned, without noticing the page 241 * pte after kvm_unmap_hva returned, without noticing the page
242 * is going to be freed. 242 * is going to be freed.
243 */ 243 */
244 idx = srcu_read_lock(&kvm->srcu); 244 idx = srcu_read_lock(&kvm->srcu);
245 spin_lock(&kvm->mmu_lock); 245 spin_lock(&kvm->mmu_lock);
246 kvm->mmu_notifier_seq++; 246 kvm->mmu_notifier_seq++;
247 need_tlb_flush = kvm_unmap_hva(kvm, address); 247 need_tlb_flush = kvm_unmap_hva(kvm, address);
248 spin_unlock(&kvm->mmu_lock); 248 spin_unlock(&kvm->mmu_lock);
249 srcu_read_unlock(&kvm->srcu, idx); 249 srcu_read_unlock(&kvm->srcu, idx);
250 250
251 /* we have to flush the tlb before the pages can be freed */ 251 /* we have to flush the tlb before the pages can be freed */
252 if (need_tlb_flush) 252 if (need_tlb_flush)
253 kvm_flush_remote_tlbs(kvm); 253 kvm_flush_remote_tlbs(kvm);
254 254
255 } 255 }
256 256
257 static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, 257 static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
258 struct mm_struct *mm, 258 struct mm_struct *mm,
259 unsigned long address, 259 unsigned long address,
260 pte_t pte) 260 pte_t pte)
261 { 261 {
262 struct kvm *kvm = mmu_notifier_to_kvm(mn); 262 struct kvm *kvm = mmu_notifier_to_kvm(mn);
263 int idx; 263 int idx;
264 264
265 idx = srcu_read_lock(&kvm->srcu); 265 idx = srcu_read_lock(&kvm->srcu);
266 spin_lock(&kvm->mmu_lock); 266 spin_lock(&kvm->mmu_lock);
267 kvm->mmu_notifier_seq++; 267 kvm->mmu_notifier_seq++;
268 kvm_set_spte_hva(kvm, address, pte); 268 kvm_set_spte_hva(kvm, address, pte);
269 spin_unlock(&kvm->mmu_lock); 269 spin_unlock(&kvm->mmu_lock);
270 srcu_read_unlock(&kvm->srcu, idx); 270 srcu_read_unlock(&kvm->srcu, idx);
271 } 271 }
272 272
273 static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, 273 static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
274 struct mm_struct *mm, 274 struct mm_struct *mm,
275 unsigned long start, 275 unsigned long start,
276 unsigned long end) 276 unsigned long end)
277 { 277 {
278 struct kvm *kvm = mmu_notifier_to_kvm(mn); 278 struct kvm *kvm = mmu_notifier_to_kvm(mn);
279 int need_tlb_flush = 0, idx; 279 int need_tlb_flush = 0, idx;
280 280
281 idx = srcu_read_lock(&kvm->srcu); 281 idx = srcu_read_lock(&kvm->srcu);
282 spin_lock(&kvm->mmu_lock); 282 spin_lock(&kvm->mmu_lock);
283 /* 283 /*
284 * The count increase must become visible at unlock time as no 284 * The count increase must become visible at unlock time as no
285 * spte can be established without taking the mmu_lock and 285 * spte can be established without taking the mmu_lock and
286 * count is also read inside the mmu_lock critical section. 286 * count is also read inside the mmu_lock critical section.
287 */ 287 */
288 kvm->mmu_notifier_count++; 288 kvm->mmu_notifier_count++;
289 for (; start < end; start += PAGE_SIZE) 289 for (; start < end; start += PAGE_SIZE)
290 need_tlb_flush |= kvm_unmap_hva(kvm, start); 290 need_tlb_flush |= kvm_unmap_hva(kvm, start);
291 spin_unlock(&kvm->mmu_lock); 291 spin_unlock(&kvm->mmu_lock);
292 srcu_read_unlock(&kvm->srcu, idx); 292 srcu_read_unlock(&kvm->srcu, idx);
293 293
294 /* we have to flush the tlb before the pages can be freed */ 294 /* we have to flush the tlb before the pages can be freed */
295 if (need_tlb_flush) 295 if (need_tlb_flush)
296 kvm_flush_remote_tlbs(kvm); 296 kvm_flush_remote_tlbs(kvm);
297 } 297 }
298 298
299 static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, 299 static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
300 struct mm_struct *mm, 300 struct mm_struct *mm,
301 unsigned long start, 301 unsigned long start,
302 unsigned long end) 302 unsigned long end)
303 { 303 {
304 struct kvm *kvm = mmu_notifier_to_kvm(mn); 304 struct kvm *kvm = mmu_notifier_to_kvm(mn);
305 305
306 spin_lock(&kvm->mmu_lock); 306 spin_lock(&kvm->mmu_lock);
307 /* 307 /*
308 * This sequence increase will notify the kvm page fault that 308 * This sequence increase will notify the kvm page fault that
309 * the page that is going to be mapped in the spte could have 309 * the page that is going to be mapped in the spte could have
310 * been freed. 310 * been freed.
311 */ 311 */
312 kvm->mmu_notifier_seq++; 312 kvm->mmu_notifier_seq++;
313 /* 313 /*
314 * The above sequence increase must be visible before the 314 * The above sequence increase must be visible before the
315 * below count decrease but both values are read by the kvm 315 * below count decrease but both values are read by the kvm
316 * page fault under mmu_lock spinlock so we don't need to add 316 * page fault under mmu_lock spinlock so we don't need to add
317 * a smp_wmb() here in between the two. 317 * a smp_wmb() here in between the two.
318 */ 318 */
319 kvm->mmu_notifier_count--; 319 kvm->mmu_notifier_count--;
320 spin_unlock(&kvm->mmu_lock); 320 spin_unlock(&kvm->mmu_lock);
321 321
322 BUG_ON(kvm->mmu_notifier_count < 0); 322 BUG_ON(kvm->mmu_notifier_count < 0);
323 } 323 }
324 324
325 static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, 325 static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
326 struct mm_struct *mm, 326 struct mm_struct *mm,
327 unsigned long address) 327 unsigned long address)
328 { 328 {
329 struct kvm *kvm = mmu_notifier_to_kvm(mn); 329 struct kvm *kvm = mmu_notifier_to_kvm(mn);
330 int young, idx; 330 int young, idx;
331 331
332 idx = srcu_read_lock(&kvm->srcu); 332 idx = srcu_read_lock(&kvm->srcu);
333 spin_lock(&kvm->mmu_lock); 333 spin_lock(&kvm->mmu_lock);
334 young = kvm_age_hva(kvm, address); 334 young = kvm_age_hva(kvm, address);
335 spin_unlock(&kvm->mmu_lock); 335 spin_unlock(&kvm->mmu_lock);
336 srcu_read_unlock(&kvm->srcu, idx); 336 srcu_read_unlock(&kvm->srcu, idx);
337 337
338 if (young) 338 if (young)
339 kvm_flush_remote_tlbs(kvm); 339 kvm_flush_remote_tlbs(kvm);
340 340
341 return young; 341 return young;
342 } 342 }
343 343
344 static void kvm_mmu_notifier_release(struct mmu_notifier *mn, 344 static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
345 struct mm_struct *mm) 345 struct mm_struct *mm)
346 { 346 {
347 struct kvm *kvm = mmu_notifier_to_kvm(mn); 347 struct kvm *kvm = mmu_notifier_to_kvm(mn);
348 int idx; 348 int idx;
349 349
350 idx = srcu_read_lock(&kvm->srcu); 350 idx = srcu_read_lock(&kvm->srcu);
351 kvm_arch_flush_shadow(kvm); 351 kvm_arch_flush_shadow(kvm);
352 srcu_read_unlock(&kvm->srcu, idx); 352 srcu_read_unlock(&kvm->srcu, idx);
353 } 353 }
354 354
355 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { 355 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
356 .invalidate_page = kvm_mmu_notifier_invalidate_page, 356 .invalidate_page = kvm_mmu_notifier_invalidate_page,
357 .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, 357 .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start,
358 .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, 358 .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
359 .clear_flush_young = kvm_mmu_notifier_clear_flush_young, 359 .clear_flush_young = kvm_mmu_notifier_clear_flush_young,
360 .change_pte = kvm_mmu_notifier_change_pte, 360 .change_pte = kvm_mmu_notifier_change_pte,
361 .release = kvm_mmu_notifier_release, 361 .release = kvm_mmu_notifier_release,
362 }; 362 };
363 363
364 static int kvm_init_mmu_notifier(struct kvm *kvm) 364 static int kvm_init_mmu_notifier(struct kvm *kvm)
365 { 365 {
366 kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops; 366 kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops;
367 return mmu_notifier_register(&kvm->mmu_notifier, current->mm); 367 return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
368 } 368 }
369 369
370 #else /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */ 370 #else /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */
371 371
372 static int kvm_init_mmu_notifier(struct kvm *kvm) 372 static int kvm_init_mmu_notifier(struct kvm *kvm)
373 { 373 {
374 return 0; 374 return 0;
375 } 375 }
376 376
377 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */ 377 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
378 378
379 static struct kvm *kvm_create_vm(void) 379 static struct kvm *kvm_create_vm(void)
380 { 380 {
381 int r = 0, i; 381 int r = 0, i;
382 struct kvm *kvm = kvm_arch_create_vm(); 382 struct kvm *kvm = kvm_arch_create_vm();
383 383
384 if (IS_ERR(kvm)) 384 if (IS_ERR(kvm))
385 goto out; 385 goto out;
386 386
387 r = hardware_enable_all(); 387 r = hardware_enable_all();
388 if (r) 388 if (r)
389 goto out_err_nodisable; 389 goto out_err_nodisable;
390 390
391 #ifdef CONFIG_HAVE_KVM_IRQCHIP 391 #ifdef CONFIG_HAVE_KVM_IRQCHIP
392 INIT_HLIST_HEAD(&kvm->mask_notifier_list); 392 INIT_HLIST_HEAD(&kvm->mask_notifier_list);
393 INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list); 393 INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
394 #endif 394 #endif
395 395
396 r = -ENOMEM; 396 r = -ENOMEM;
397 kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); 397 kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
398 if (!kvm->memslots) 398 if (!kvm->memslots)
399 goto out_err; 399 goto out_err;
400 if (init_srcu_struct(&kvm->srcu)) 400 if (init_srcu_struct(&kvm->srcu))
401 goto out_err; 401 goto out_err;
402 for (i = 0; i < KVM_NR_BUSES; i++) { 402 for (i = 0; i < KVM_NR_BUSES; i++) {
403 kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus), 403 kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus),
404 GFP_KERNEL); 404 GFP_KERNEL);
405 if (!kvm->buses[i]) { 405 if (!kvm->buses[i]) {
406 cleanup_srcu_struct(&kvm->srcu); 406 cleanup_srcu_struct(&kvm->srcu);
407 goto out_err; 407 goto out_err;
408 } 408 }
409 } 409 }
410 410
411 r = kvm_init_mmu_notifier(kvm); 411 r = kvm_init_mmu_notifier(kvm);
412 if (r) { 412 if (r) {
413 cleanup_srcu_struct(&kvm->srcu); 413 cleanup_srcu_struct(&kvm->srcu);
414 goto out_err; 414 goto out_err;
415 } 415 }
416 416
417 kvm->mm = current->mm; 417 kvm->mm = current->mm;
418 atomic_inc(&kvm->mm->mm_count); 418 atomic_inc(&kvm->mm->mm_count);
419 spin_lock_init(&kvm->mmu_lock); 419 spin_lock_init(&kvm->mmu_lock);
420 raw_spin_lock_init(&kvm->requests_lock); 420 raw_spin_lock_init(&kvm->requests_lock);
421 kvm_eventfd_init(kvm); 421 kvm_eventfd_init(kvm);
422 mutex_init(&kvm->lock); 422 mutex_init(&kvm->lock);
423 mutex_init(&kvm->irq_lock); 423 mutex_init(&kvm->irq_lock);
424 mutex_init(&kvm->slots_lock); 424 mutex_init(&kvm->slots_lock);
425 atomic_set(&kvm->users_count, 1); 425 atomic_set(&kvm->users_count, 1);
426 spin_lock(&kvm_lock); 426 spin_lock(&kvm_lock);
427 list_add(&kvm->vm_list, &vm_list); 427 list_add(&kvm->vm_list, &vm_list);
428 spin_unlock(&kvm_lock); 428 spin_unlock(&kvm_lock);
429 out: 429 out:
430 return kvm; 430 return kvm;
431 431
432 out_err: 432 out_err:
433 hardware_disable_all(); 433 hardware_disable_all();
434 out_err_nodisable: 434 out_err_nodisable:
435 for (i = 0; i < KVM_NR_BUSES; i++) 435 for (i = 0; i < KVM_NR_BUSES; i++)
436 kfree(kvm->buses[i]); 436 kfree(kvm->buses[i]);
437 kfree(kvm->memslots); 437 kfree(kvm->memslots);
438 kfree(kvm); 438 kfree(kvm);
439 return ERR_PTR(r); 439 return ERR_PTR(r);
440 } 440 }
441 441
442 /* 442 /*
443 * Free any memory in @free but not in @dont. 443 * Free any memory in @free but not in @dont.
444 */ 444 */
445 static void kvm_free_physmem_slot(struct kvm_memory_slot *free, 445 static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
446 struct kvm_memory_slot *dont) 446 struct kvm_memory_slot *dont)
447 { 447 {
448 int i; 448 int i;
449 449
450 if (!dont || free->rmap != dont->rmap) 450 if (!dont || free->rmap != dont->rmap)
451 vfree(free->rmap); 451 vfree(free->rmap);
452 452
453 if (!dont || free->dirty_bitmap != dont->dirty_bitmap) 453 if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
454 vfree(free->dirty_bitmap); 454 vfree(free->dirty_bitmap);
455 455
456 456
457 for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { 457 for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
458 if (!dont || free->lpage_info[i] != dont->lpage_info[i]) { 458 if (!dont || free->lpage_info[i] != dont->lpage_info[i]) {
459 vfree(free->lpage_info[i]); 459 vfree(free->lpage_info[i]);
460 free->lpage_info[i] = NULL; 460 free->lpage_info[i] = NULL;
461 } 461 }
462 } 462 }
463 463
464 free->npages = 0; 464 free->npages = 0;
465 free->dirty_bitmap = NULL; 465 free->dirty_bitmap = NULL;
466 free->rmap = NULL; 466 free->rmap = NULL;
467 } 467 }
468 468
469 void kvm_free_physmem(struct kvm *kvm) 469 void kvm_free_physmem(struct kvm *kvm)
470 { 470 {
471 int i; 471 int i;
472 struct kvm_memslots *slots = kvm->memslots; 472 struct kvm_memslots *slots = kvm->memslots;
473 473
474 for (i = 0; i < slots->nmemslots; ++i) 474 for (i = 0; i < slots->nmemslots; ++i)
475 kvm_free_physmem_slot(&slots->memslots[i], NULL); 475 kvm_free_physmem_slot(&slots->memslots[i], NULL);
476 476
477 kfree(kvm->memslots); 477 kfree(kvm->memslots);
478 } 478 }
479 479
480 static void kvm_destroy_vm(struct kvm *kvm) 480 static void kvm_destroy_vm(struct kvm *kvm)
481 { 481 {
482 int i; 482 int i;
483 struct mm_struct *mm = kvm->mm; 483 struct mm_struct *mm = kvm->mm;
484 484
485 kvm_arch_sync_events(kvm); 485 kvm_arch_sync_events(kvm);
486 spin_lock(&kvm_lock); 486 spin_lock(&kvm_lock);
487 list_del(&kvm->vm_list); 487 list_del(&kvm->vm_list);
488 spin_unlock(&kvm_lock); 488 spin_unlock(&kvm_lock);
489 kvm_free_irq_routing(kvm); 489 kvm_free_irq_routing(kvm);
490 for (i = 0; i < KVM_NR_BUSES; i++) 490 for (i = 0; i < KVM_NR_BUSES; i++)
491 kvm_io_bus_destroy(kvm->buses[i]); 491 kvm_io_bus_destroy(kvm->buses[i]);
492 kvm_coalesced_mmio_free(kvm); 492 kvm_coalesced_mmio_free(kvm);
493 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) 493 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
494 mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); 494 mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm);
495 #else 495 #else
496 kvm_arch_flush_shadow(kvm); 496 kvm_arch_flush_shadow(kvm);
497 #endif 497 #endif
498 kvm_arch_destroy_vm(kvm); 498 kvm_arch_destroy_vm(kvm);
499 hardware_disable_all(); 499 hardware_disable_all();
500 mmdrop(mm); 500 mmdrop(mm);
501 } 501 }
502 502
503 void kvm_get_kvm(struct kvm *kvm) 503 void kvm_get_kvm(struct kvm *kvm)
504 { 504 {
505 atomic_inc(&kvm->users_count); 505 atomic_inc(&kvm->users_count);
506 } 506 }
507 EXPORT_SYMBOL_GPL(kvm_get_kvm); 507 EXPORT_SYMBOL_GPL(kvm_get_kvm);
508 508
509 void kvm_put_kvm(struct kvm *kvm) 509 void kvm_put_kvm(struct kvm *kvm)
510 { 510 {
511 if (atomic_dec_and_test(&kvm->users_count)) 511 if (atomic_dec_and_test(&kvm->users_count))
512 kvm_destroy_vm(kvm); 512 kvm_destroy_vm(kvm);
513 } 513 }
514 EXPORT_SYMBOL_GPL(kvm_put_kvm); 514 EXPORT_SYMBOL_GPL(kvm_put_kvm);
515 515
516 516
517 static int kvm_vm_release(struct inode *inode, struct file *filp) 517 static int kvm_vm_release(struct inode *inode, struct file *filp)
518 { 518 {
519 struct kvm *kvm = filp->private_data; 519 struct kvm *kvm = filp->private_data;
520 520
521 kvm_irqfd_release(kvm); 521 kvm_irqfd_release(kvm);
522 522
523 kvm_put_kvm(kvm); 523 kvm_put_kvm(kvm);
524 return 0; 524 return 0;
525 } 525 }
526 526
527 /* 527 /*
528 * Allocate some memory and give it an address in the guest physical address 528 * Allocate some memory and give it an address in the guest physical address
529 * space. 529 * space.
530 * 530 *
531 * Discontiguous memory is allowed, mostly for framebuffers. 531 * Discontiguous memory is allowed, mostly for framebuffers.
532 * 532 *
533 * Must be called holding mmap_sem for write. 533 * Must be called holding mmap_sem for write.
534 */ 534 */
535 int __kvm_set_memory_region(struct kvm *kvm, 535 int __kvm_set_memory_region(struct kvm *kvm,
536 struct kvm_userspace_memory_region *mem, 536 struct kvm_userspace_memory_region *mem,
537 int user_alloc) 537 int user_alloc)
538 { 538 {
539 int r, flush_shadow = 0; 539 int r, flush_shadow = 0;
540 gfn_t base_gfn; 540 gfn_t base_gfn;
541 unsigned long npages; 541 unsigned long npages;
542 unsigned long i; 542 unsigned long i;
543 struct kvm_memory_slot *memslot; 543 struct kvm_memory_slot *memslot;
544 struct kvm_memory_slot old, new; 544 struct kvm_memory_slot old, new;
545 struct kvm_memslots *slots, *old_memslots; 545 struct kvm_memslots *slots, *old_memslots;
546 546
547 r = -EINVAL; 547 r = -EINVAL;
548 /* General sanity checks */ 548 /* General sanity checks */
549 if (mem->memory_size & (PAGE_SIZE - 1)) 549 if (mem->memory_size & (PAGE_SIZE - 1))
550 goto out; 550 goto out;
551 if (mem->guest_phys_addr & (PAGE_SIZE - 1)) 551 if (mem->guest_phys_addr & (PAGE_SIZE - 1))
552 goto out; 552 goto out;
553 if (user_alloc && (mem->userspace_addr & (PAGE_SIZE - 1))) 553 if (user_alloc && (mem->userspace_addr & (PAGE_SIZE - 1)))
554 goto out; 554 goto out;
555 if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS) 555 if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
556 goto out; 556 goto out;
557 if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr) 557 if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
558 goto out; 558 goto out;
559 559
560 memslot = &kvm->memslots->memslots[mem->slot]; 560 memslot = &kvm->memslots->memslots[mem->slot];
561 base_gfn = mem->guest_phys_addr >> PAGE_SHIFT; 561 base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
562 npages = mem->memory_size >> PAGE_SHIFT; 562 npages = mem->memory_size >> PAGE_SHIFT;
563 563
564 r = -EINVAL; 564 r = -EINVAL;
565 if (npages > KVM_MEM_MAX_NR_PAGES) 565 if (npages > KVM_MEM_MAX_NR_PAGES)
566 goto out; 566 goto out;
567 567
568 if (!npages) 568 if (!npages)
569 mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES; 569 mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
570 570
571 new = old = *memslot; 571 new = old = *memslot;
572 572
573 new.base_gfn = base_gfn; 573 new.base_gfn = base_gfn;
574 new.npages = npages; 574 new.npages = npages;
575 new.flags = mem->flags; 575 new.flags = mem->flags;
576 576
577 /* Disallow changing a memory slot's size. */ 577 /* Disallow changing a memory slot's size. */
578 r = -EINVAL; 578 r = -EINVAL;
579 if (npages && old.npages && npages != old.npages) 579 if (npages && old.npages && npages != old.npages)
580 goto out_free; 580 goto out_free;
581 581
582 /* Check for overlaps */ 582 /* Check for overlaps */
583 r = -EEXIST; 583 r = -EEXIST;
584 for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { 584 for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
585 struct kvm_memory_slot *s = &kvm->memslots->memslots[i]; 585 struct kvm_memory_slot *s = &kvm->memslots->memslots[i];
586 586
587 if (s == memslot || !s->npages) 587 if (s == memslot || !s->npages)
588 continue; 588 continue;
589 if (!((base_gfn + npages <= s->base_gfn) || 589 if (!((base_gfn + npages <= s->base_gfn) ||
590 (base_gfn >= s->base_gfn + s->npages))) 590 (base_gfn >= s->base_gfn + s->npages)))
591 goto out_free; 591 goto out_free;
592 } 592 }
593 593
594 /* Free page dirty bitmap if unneeded */ 594 /* Free page dirty bitmap if unneeded */
595 if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES)) 595 if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
596 new.dirty_bitmap = NULL; 596 new.dirty_bitmap = NULL;
597 597
598 r = -ENOMEM; 598 r = -ENOMEM;
599 599
600 /* Allocate if a slot is being created */ 600 /* Allocate if a slot is being created */
601 #ifndef CONFIG_S390 601 #ifndef CONFIG_S390
602 if (npages && !new.rmap) { 602 if (npages && !new.rmap) {
603 new.rmap = vmalloc(npages * sizeof(*new.rmap)); 603 new.rmap = vmalloc(npages * sizeof(*new.rmap));
604 604
605 if (!new.rmap) 605 if (!new.rmap)
606 goto out_free; 606 goto out_free;
607 607
608 memset(new.rmap, 0, npages * sizeof(*new.rmap)); 608 memset(new.rmap, 0, npages * sizeof(*new.rmap));
609 609
610 new.user_alloc = user_alloc; 610 new.user_alloc = user_alloc;
611 new.userspace_addr = mem->userspace_addr; 611 new.userspace_addr = mem->userspace_addr;
612 } 612 }
613 if (!npages) 613 if (!npages)
614 goto skip_lpage; 614 goto skip_lpage;
615 615
616 for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { 616 for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
617 unsigned long ugfn; 617 unsigned long ugfn;
618 unsigned long j; 618 unsigned long j;
619 int lpages; 619 int lpages;
620 int level = i + 2; 620 int level = i + 2;
621 621
622 /* Avoid unused variable warning if no large pages */ 622 /* Avoid unused variable warning if no large pages */
623 (void)level; 623 (void)level;
624 624
625 if (new.lpage_info[i]) 625 if (new.lpage_info[i])
626 continue; 626 continue;
627 627
628 lpages = 1 + (base_gfn + npages - 1) / 628 lpages = 1 + (base_gfn + npages - 1) /
629 KVM_PAGES_PER_HPAGE(level); 629 KVM_PAGES_PER_HPAGE(level);
630 lpages -= base_gfn / KVM_PAGES_PER_HPAGE(level); 630 lpages -= base_gfn / KVM_PAGES_PER_HPAGE(level);
631 631
632 new.lpage_info[i] = vmalloc(lpages * sizeof(*new.lpage_info[i])); 632 new.lpage_info[i] = vmalloc(lpages * sizeof(*new.lpage_info[i]));
633 633
634 if (!new.lpage_info[i]) 634 if (!new.lpage_info[i])
635 goto out_free; 635 goto out_free;
636 636
637 memset(new.lpage_info[i], 0, 637 memset(new.lpage_info[i], 0,
638 lpages * sizeof(*new.lpage_info[i])); 638 lpages * sizeof(*new.lpage_info[i]));
639 639
640 if (base_gfn % KVM_PAGES_PER_HPAGE(level)) 640 if (base_gfn % KVM_PAGES_PER_HPAGE(level))
641 new.lpage_info[i][0].write_count = 1; 641 new.lpage_info[i][0].write_count = 1;
642 if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE(level)) 642 if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE(level))
643 new.lpage_info[i][lpages - 1].write_count = 1; 643 new.lpage_info[i][lpages - 1].write_count = 1;
644 ugfn = new.userspace_addr >> PAGE_SHIFT; 644 ugfn = new.userspace_addr >> PAGE_SHIFT;
645 /* 645 /*
646 * If the gfn and userspace address are not aligned wrt each 646 * If the gfn and userspace address are not aligned wrt each
647 * other, or if explicitly asked to, disable large page 647 * other, or if explicitly asked to, disable large page
648 * support for this slot 648 * support for this slot
649 */ 649 */
650 if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1) || 650 if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1) ||
651 !largepages_enabled) 651 !largepages_enabled)
652 for (j = 0; j < lpages; ++j) 652 for (j = 0; j < lpages; ++j)
653 new.lpage_info[i][j].write_count = 1; 653 new.lpage_info[i][j].write_count = 1;
654 } 654 }
655 655
656 skip_lpage: 656 skip_lpage:
657 657
658 /* Allocate page dirty bitmap if needed */ 658 /* Allocate page dirty bitmap if needed */
659 if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) { 659 if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
660 unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new); 660 unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new);
661 661
662 new.dirty_bitmap = vmalloc(dirty_bytes); 662 new.dirty_bitmap = vmalloc(dirty_bytes);
663 if (!new.dirty_bitmap) 663 if (!new.dirty_bitmap)
664 goto out_free; 664 goto out_free;
665 memset(new.dirty_bitmap, 0, dirty_bytes); 665 memset(new.dirty_bitmap, 0, dirty_bytes);
666 /* destroy any largepage mappings for dirty tracking */ 666 /* destroy any largepage mappings for dirty tracking */
667 if (old.npages) 667 if (old.npages)
668 flush_shadow = 1; 668 flush_shadow = 1;
669 } 669 }
670 #else /* not defined CONFIG_S390 */ 670 #else /* not defined CONFIG_S390 */
671 new.user_alloc = user_alloc; 671 new.user_alloc = user_alloc;
672 if (user_alloc) 672 if (user_alloc)
673 new.userspace_addr = mem->userspace_addr; 673 new.userspace_addr = mem->userspace_addr;
674 #endif /* not defined CONFIG_S390 */ 674 #endif /* not defined CONFIG_S390 */
675 675
676 if (!npages) { 676 if (!npages) {
677 r = -ENOMEM; 677 r = -ENOMEM;
678 slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); 678 slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
679 if (!slots) 679 if (!slots)
680 goto out_free; 680 goto out_free;
681 memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); 681 memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
682 if (mem->slot >= slots->nmemslots) 682 if (mem->slot >= slots->nmemslots)
683 slots->nmemslots = mem->slot + 1; 683 slots->nmemslots = mem->slot + 1;
684 slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID; 684 slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
685 685
686 old_memslots = kvm->memslots; 686 old_memslots = kvm->memslots;
687 rcu_assign_pointer(kvm->memslots, slots); 687 rcu_assign_pointer(kvm->memslots, slots);
688 synchronize_srcu_expedited(&kvm->srcu); 688 synchronize_srcu_expedited(&kvm->srcu);
689 /* From this point no new shadow pages pointing to a deleted 689 /* From this point no new shadow pages pointing to a deleted
690 * memslot will be created. 690 * memslot will be created.
691 * 691 *
692 * validation of sp->gfn happens in: 692 * validation of sp->gfn happens in:
693 * - gfn_to_hva (kvm_read_guest, gfn_to_pfn) 693 * - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
694 * - kvm_is_visible_gfn (mmu_check_roots) 694 * - kvm_is_visible_gfn (mmu_check_roots)
695 */ 695 */
696 kvm_arch_flush_shadow(kvm); 696 kvm_arch_flush_shadow(kvm);
697 kfree(old_memslots); 697 kfree(old_memslots);
698 } 698 }
699 699
700 r = kvm_arch_prepare_memory_region(kvm, &new, old, mem, user_alloc); 700 r = kvm_arch_prepare_memory_region(kvm, &new, old, mem, user_alloc);
701 if (r) 701 if (r)
702 goto out_free; 702 goto out_free;
703 703
704 #ifdef CONFIG_DMAR 704 #ifdef CONFIG_DMAR
705 /* map the pages in iommu page table */ 705 /* map the pages in iommu page table */
706 if (npages) { 706 if (npages) {
707 r = kvm_iommu_map_pages(kvm, &new); 707 r = kvm_iommu_map_pages(kvm, &new);
708 if (r) 708 if (r)
709 goto out_free; 709 goto out_free;
710 } 710 }
711 #endif 711 #endif
712 712
713 r = -ENOMEM; 713 r = -ENOMEM;
714 slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); 714 slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
715 if (!slots) 715 if (!slots)
716 goto out_free; 716 goto out_free;
717 memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); 717 memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
718 if (mem->slot >= slots->nmemslots) 718 if (mem->slot >= slots->nmemslots)
719 slots->nmemslots = mem->slot + 1; 719 slots->nmemslots = mem->slot + 1;
720 720
721 /* actual memory is freed via old in kvm_free_physmem_slot below */ 721 /* actual memory is freed via old in kvm_free_physmem_slot below */
722 if (!npages) { 722 if (!npages) {
723 new.rmap = NULL; 723 new.rmap = NULL;
724 new.dirty_bitmap = NULL; 724 new.dirty_bitmap = NULL;
725 for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) 725 for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i)
726 new.lpage_info[i] = NULL; 726 new.lpage_info[i] = NULL;
727 } 727 }
728 728
729 slots->memslots[mem->slot] = new; 729 slots->memslots[mem->slot] = new;
730 old_memslots = kvm->memslots; 730 old_memslots = kvm->memslots;
731 rcu_assign_pointer(kvm->memslots, slots); 731 rcu_assign_pointer(kvm->memslots, slots);
732 synchronize_srcu_expedited(&kvm->srcu); 732 synchronize_srcu_expedited(&kvm->srcu);
733 733
734 kvm_arch_commit_memory_region(kvm, mem, old, user_alloc); 734 kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
735 735
736 kvm_free_physmem_slot(&old, &new); 736 kvm_free_physmem_slot(&old, &new);
737 kfree(old_memslots); 737 kfree(old_memslots);
738 738
739 if (flush_shadow) 739 if (flush_shadow)
740 kvm_arch_flush_shadow(kvm); 740 kvm_arch_flush_shadow(kvm);
741 741
742 return 0; 742 return 0;
743 743
744 out_free: 744 out_free:
745 kvm_free_physmem_slot(&new, &old); 745 kvm_free_physmem_slot(&new, &old);
746 out: 746 out:
747 return r; 747 return r;
748 748
749 } 749 }
750 EXPORT_SYMBOL_GPL(__kvm_set_memory_region); 750 EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
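Userspace reaches __kvm_set_memory_region() through the KVM_SET_USER_MEMORY_REGION vm ioctl. A hedged userspace sketch of that call, where vm_fd and backing are placeholders supplied by the caller:

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int set_slot(int vm_fd, __u32 slot, __u64 gpa, __u64 size, void *backing)
	{
		struct kvm_userspace_memory_region region;

		memset(&region, 0, sizeof(region));
		region.slot = slot;
		region.guest_phys_addr = gpa;     /* page aligned, as checked above */
		region.memory_size = size;        /* multiple of PAGE_SIZE; 0 deletes the slot */
		region.userspace_addr = (__u64)(unsigned long)backing;

		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	}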
751 751
752 int kvm_set_memory_region(struct kvm *kvm, 752 int kvm_set_memory_region(struct kvm *kvm,
753 struct kvm_userspace_memory_region *mem, 753 struct kvm_userspace_memory_region *mem,
754 int user_alloc) 754 int user_alloc)
755 { 755 {
756 int r; 756 int r;
757 757
758 mutex_lock(&kvm->slots_lock); 758 mutex_lock(&kvm->slots_lock);
759 r = __kvm_set_memory_region(kvm, mem, user_alloc); 759 r = __kvm_set_memory_region(kvm, mem, user_alloc);
760 mutex_unlock(&kvm->slots_lock); 760 mutex_unlock(&kvm->slots_lock);
761 return r; 761 return r;
762 } 762 }
763 EXPORT_SYMBOL_GPL(kvm_set_memory_region); 763 EXPORT_SYMBOL_GPL(kvm_set_memory_region);
764 764
765 int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, 765 int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
766 struct 766 struct
767 kvm_userspace_memory_region *mem, 767 kvm_userspace_memory_region *mem,
768 int user_alloc) 768 int user_alloc)
769 { 769 {
770 if (mem->slot >= KVM_MEMORY_SLOTS) 770 if (mem->slot >= KVM_MEMORY_SLOTS)
771 return -EINVAL; 771 return -EINVAL;
772 return kvm_set_memory_region(kvm, mem, user_alloc); 772 return kvm_set_memory_region(kvm, mem, user_alloc);
773 } 773 }
774 774
775 int kvm_get_dirty_log(struct kvm *kvm, 775 int kvm_get_dirty_log(struct kvm *kvm,
776 struct kvm_dirty_log *log, int *is_dirty) 776 struct kvm_dirty_log *log, int *is_dirty)
777 { 777 {
778 struct kvm_memory_slot *memslot; 778 struct kvm_memory_slot *memslot;
779 int r, i; 779 int r, i;
780 unsigned long n; 780 unsigned long n;
781 unsigned long any = 0; 781 unsigned long any = 0;
782 782
783 r = -EINVAL; 783 r = -EINVAL;
784 if (log->slot >= KVM_MEMORY_SLOTS) 784 if (log->slot >= KVM_MEMORY_SLOTS)
785 goto out; 785 goto out;
786 786
787 memslot = &kvm->memslots->memslots[log->slot]; 787 memslot = &kvm->memslots->memslots[log->slot];
788 r = -ENOENT; 788 r = -ENOENT;
789 if (!memslot->dirty_bitmap) 789 if (!memslot->dirty_bitmap)
790 goto out; 790 goto out;
791 791
792 n = kvm_dirty_bitmap_bytes(memslot); 792 n = kvm_dirty_bitmap_bytes(memslot);
793 793
794 for (i = 0; !any && i < n/sizeof(long); ++i) 794 for (i = 0; !any && i < n/sizeof(long); ++i)
795 any = memslot->dirty_bitmap[i]; 795 any = memslot->dirty_bitmap[i];
796 796
797 r = -EFAULT; 797 r = -EFAULT;
798 if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n)) 798 if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
799 goto out; 799 goto out;
800 800
801 if (any) 801 if (any)
802 *is_dirty = 1; 802 *is_dirty = 1;
803 803
804 r = 0; 804 r = 0;
805 out: 805 out:
806 return r; 806 return r;
807 } 807 }
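The in-kernel side above is driven by the KVM_GET_DIRTY_LOG vm ioctl; the caller supplies a bitmap large enough to hold one bit per page of the slot. A hedged userspace sketch, with vm_fd and bitmap as placeholders supplied by the caller:

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int fetch_dirty_log(int vm_fd, __u32 slot, void *bitmap)
	{
		struct kvm_dirty_log log;

		memset(&log, 0, sizeof(log));
		log.slot = slot;
		log.dirty_bitmap = bitmap;   /* sized to one bit per page of the slot */

		return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
	}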
808 808
809 void kvm_disable_largepages(void) 809 void kvm_disable_largepages(void)
810 { 810 {
811 largepages_enabled = false; 811 largepages_enabled = false;
812 } 812 }
813 EXPORT_SYMBOL_GPL(kvm_disable_largepages); 813 EXPORT_SYMBOL_GPL(kvm_disable_largepages);
814 814
815 int is_error_page(struct page *page) 815 int is_error_page(struct page *page)
816 { 816 {
817 return page == bad_page || page == hwpoison_page; 817 return page == bad_page || page == hwpoison_page;
818 } 818 }
819 EXPORT_SYMBOL_GPL(is_error_page); 819 EXPORT_SYMBOL_GPL(is_error_page);
820 820
821 int is_error_pfn(pfn_t pfn) 821 int is_error_pfn(pfn_t pfn)
822 { 822 {
823 return pfn == bad_pfn || pfn == hwpoison_pfn; 823 return pfn == bad_pfn || pfn == hwpoison_pfn;
824 } 824 }
825 EXPORT_SYMBOL_GPL(is_error_pfn); 825 EXPORT_SYMBOL_GPL(is_error_pfn);
826 826
827 int is_hwpoison_pfn(pfn_t pfn) 827 int is_hwpoison_pfn(pfn_t pfn)
828 { 828 {
829 return pfn == hwpoison_pfn; 829 return pfn == hwpoison_pfn;
830 } 830 }
831 EXPORT_SYMBOL_GPL(is_hwpoison_pfn); 831 EXPORT_SYMBOL_GPL(is_hwpoison_pfn);
832 832
833 static inline unsigned long bad_hva(void) 833 static inline unsigned long bad_hva(void)
834 { 834 {
835 return PAGE_OFFSET; 835 return PAGE_OFFSET;
836 } 836 }
837 837
838 int kvm_is_error_hva(unsigned long addr) 838 int kvm_is_error_hva(unsigned long addr)
839 { 839 {
840 return addr == bad_hva(); 840 return addr == bad_hva();
841 } 841 }
842 EXPORT_SYMBOL_GPL(kvm_is_error_hva); 842 EXPORT_SYMBOL_GPL(kvm_is_error_hva);
843 843
844 struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn) 844 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
845 { 845 {
846 int i; 846 int i;
847 struct kvm_memslots *slots = kvm_memslots(kvm); 847 struct kvm_memslots *slots = kvm_memslots(kvm);
848 848
849 for (i = 0; i < slots->nmemslots; ++i) { 849 for (i = 0; i < slots->nmemslots; ++i) {
850 struct kvm_memory_slot *memslot = &slots->memslots[i]; 850 struct kvm_memory_slot *memslot = &slots->memslots[i];
851 851
852 if (gfn >= memslot->base_gfn 852 if (gfn >= memslot->base_gfn
853 && gfn < memslot->base_gfn + memslot->npages) 853 && gfn < memslot->base_gfn + memslot->npages)
854 return memslot; 854 return memslot;
855 } 855 }
856 return NULL; 856 return NULL;
857 } 857 }
858 EXPORT_SYMBOL_GPL(gfn_to_memslot_unaliased); 858 EXPORT_SYMBOL_GPL(gfn_to_memslot);
859 859
860 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
861 {
862 gfn = unalias_gfn(kvm, gfn);
863 return gfn_to_memslot_unaliased(kvm, gfn);
864 }
865
866 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) 860 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
867 { 861 {
868 int i; 862 int i;
869 struct kvm_memslots *slots = kvm_memslots(kvm); 863 struct kvm_memslots *slots = kvm_memslots(kvm);
870 864
871 gfn = unalias_gfn_instantiation(kvm, gfn);
872 for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { 865 for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
873 struct kvm_memory_slot *memslot = &slots->memslots[i]; 866 struct kvm_memory_slot *memslot = &slots->memslots[i];
874 867
875 if (memslot->flags & KVM_MEMSLOT_INVALID) 868 if (memslot->flags & KVM_MEMSLOT_INVALID)
876 continue; 869 continue;
877 870
878 if (gfn >= memslot->base_gfn 871 if (gfn >= memslot->base_gfn
879 && gfn < memslot->base_gfn + memslot->npages) 872 && gfn < memslot->base_gfn + memslot->npages)
880 return 1; 873 return 1;
881 } 874 }
882 return 0; 875 return 0;
883 } 876 }
884 EXPORT_SYMBOL_GPL(kvm_is_visible_gfn); 877 EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
885 878
886 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn) 879 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
887 { 880 {
888 struct vm_area_struct *vma; 881 struct vm_area_struct *vma;
889 unsigned long addr, size; 882 unsigned long addr, size;
890 883
891 size = PAGE_SIZE; 884 size = PAGE_SIZE;
892 885
893 addr = gfn_to_hva(kvm, gfn); 886 addr = gfn_to_hva(kvm, gfn);
894 if (kvm_is_error_hva(addr)) 887 if (kvm_is_error_hva(addr))
895 return PAGE_SIZE; 888 return PAGE_SIZE;
896 889
897 down_read(&current->mm->mmap_sem); 890 down_read(&current->mm->mmap_sem);
898 vma = find_vma(current->mm, addr); 891 vma = find_vma(current->mm, addr);
899 if (!vma) 892 if (!vma)
900 goto out; 893 goto out;
901 894
902 size = vma_kernel_pagesize(vma); 895 size = vma_kernel_pagesize(vma);
903 896
904 out: 897 out:
905 up_read(&current->mm->mmap_sem); 898 up_read(&current->mm->mmap_sem);
906 899
907 return size; 900 return size;
908 } 901 }
909 902
910 int memslot_id(struct kvm *kvm, gfn_t gfn) 903 int memslot_id(struct kvm *kvm, gfn_t gfn)
911 { 904 {
912 int i; 905 int i;
913 struct kvm_memslots *slots = kvm_memslots(kvm); 906 struct kvm_memslots *slots = kvm_memslots(kvm);
914 struct kvm_memory_slot *memslot = NULL; 907 struct kvm_memory_slot *memslot = NULL;
915 908
916 gfn = unalias_gfn(kvm, gfn);
917 for (i = 0; i < slots->nmemslots; ++i) { 909 for (i = 0; i < slots->nmemslots; ++i) {
918 memslot = &slots->memslots[i]; 910 memslot = &slots->memslots[i];
919 911
920 if (gfn >= memslot->base_gfn 912 if (gfn >= memslot->base_gfn
921 && gfn < memslot->base_gfn + memslot->npages) 913 && gfn < memslot->base_gfn + memslot->npages)
922 break; 914 break;
923 } 915 }
924 916
925 return memslot - slots->memslots; 917 return memslot - slots->memslots;
926 } 918 }
927 919
928 static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn) 920 static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
929 { 921 {
930 return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE; 922 return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
931 } 923 }
932 924
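	/*
	 * Illustrative arithmetic for gfn_to_hva_memslot() above, with made-up
	 * values: a slot with base_gfn = 0x100 and userspace_addr = 0x7f0000000000,
	 * queried for gfn = 0x105 with 4 KiB pages, yields
	 *   0x7f0000000000 + (0x105 - 0x100) * 0x1000 = 0x7f0000005000.
	 */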
933 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) 925 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
934 { 926 {
935 struct kvm_memory_slot *slot; 927 struct kvm_memory_slot *slot;
936 928
937 gfn = unalias_gfn_instantiation(kvm, gfn); 929 slot = gfn_to_memslot(kvm, gfn);
938 slot = gfn_to_memslot_unaliased(kvm, gfn);
939 if (!slot || slot->flags & KVM_MEMSLOT_INVALID) 930 if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
940 return bad_hva(); 931 return bad_hva();
941 return gfn_to_hva_memslot(slot, gfn); 932 return gfn_to_hva_memslot(slot, gfn);
942 } 933 }
943 EXPORT_SYMBOL_GPL(gfn_to_hva); 934 EXPORT_SYMBOL_GPL(gfn_to_hva);
944 935
945 static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) 936 static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
946 { 937 {
947 struct page *page[1]; 938 struct page *page[1];
948 int npages; 939 int npages;
949 pfn_t pfn; 940 pfn_t pfn;
950 941
951 might_sleep(); 942 might_sleep();
952 943
953 npages = get_user_pages_fast(addr, 1, 1, page); 944 npages = get_user_pages_fast(addr, 1, 1, page);
954 945
955 if (unlikely(npages != 1)) { 946 if (unlikely(npages != 1)) {
956 struct vm_area_struct *vma; 947 struct vm_area_struct *vma;
957 948
958 if (is_hwpoison_address(addr)) { 949 if (is_hwpoison_address(addr)) {
959 get_page(hwpoison_page); 950 get_page(hwpoison_page);
960 return page_to_pfn(hwpoison_page); 951 return page_to_pfn(hwpoison_page);
961 } 952 }
962 953
963 down_read(&current->mm->mmap_sem); 954 down_read(&current->mm->mmap_sem);
964 vma = find_vma(current->mm, addr); 955 vma = find_vma(current->mm, addr);
965 956
966 if (vma == NULL || addr < vma->vm_start || 957 if (vma == NULL || addr < vma->vm_start ||
967 !(vma->vm_flags & VM_PFNMAP)) { 958 !(vma->vm_flags & VM_PFNMAP)) {
968 up_read(&current->mm->mmap_sem); 959 up_read(&current->mm->mmap_sem);
969 get_page(bad_page); 960 get_page(bad_page);
970 return page_to_pfn(bad_page); 961 return page_to_pfn(bad_page);
971 } 962 }
972 963
973 pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; 964 pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
974 up_read(&current->mm->mmap_sem); 965 up_read(&current->mm->mmap_sem);
975 BUG_ON(!kvm_is_mmio_pfn(pfn)); 966 BUG_ON(!kvm_is_mmio_pfn(pfn));
976 } else 967 } else
977 pfn = page_to_pfn(page[0]); 968 pfn = page_to_pfn(page[0]);
978 969
979 return pfn; 970 return pfn;
980 } 971 }
981 972
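	/*
	 * Illustrative arithmetic for the VM_PFNMAP branch of hva_to_pfn() above,
	 * with made-up values: addr = 0x7f0000003000, vma->vm_start = 0x7f0000000000,
	 * vma->vm_pgoff = 0xf0000, 4 KiB pages:
	 *   pfn = ((0x7f0000003000 - 0x7f0000000000) >> 12) + 0xf0000
	 *       = 0x3 + 0xf0000 = 0xf0003.
	 */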
982 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) 973 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
983 { 974 {
984 unsigned long addr; 975 unsigned long addr;
985 976
986 addr = gfn_to_hva(kvm, gfn); 977 addr = gfn_to_hva(kvm, gfn);
987 if (kvm_is_error_hva(addr)) { 978 if (kvm_is_error_hva(addr)) {
988 get_page(bad_page); 979 get_page(bad_page);
989 return page_to_pfn(bad_page); 980 return page_to_pfn(bad_page);
990 } 981 }
991 982
992 return hva_to_pfn(kvm, addr); 983 return hva_to_pfn(kvm, addr);
993 } 984 }
994 EXPORT_SYMBOL_GPL(gfn_to_pfn); 985 EXPORT_SYMBOL_GPL(gfn_to_pfn);
995 986
996 pfn_t gfn_to_pfn_memslot(struct kvm *kvm, 987 pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
997 struct kvm_memory_slot *slot, gfn_t gfn) 988 struct kvm_memory_slot *slot, gfn_t gfn)
998 { 989 {
999 unsigned long addr = gfn_to_hva_memslot(slot, gfn); 990 unsigned long addr = gfn_to_hva_memslot(slot, gfn);
1000 return hva_to_pfn(kvm, addr); 991 return hva_to_pfn(kvm, addr);
1001 } 992 }
1002 993
1003 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) 994 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
1004 { 995 {
1005 pfn_t pfn; 996 pfn_t pfn;
1006 997
1007 pfn = gfn_to_pfn(kvm, gfn); 998 pfn = gfn_to_pfn(kvm, gfn);
1008 if (!kvm_is_mmio_pfn(pfn)) 999 if (!kvm_is_mmio_pfn(pfn))
1009 return pfn_to_page(pfn); 1000 return pfn_to_page(pfn);
1010 1001
1011 WARN_ON(kvm_is_mmio_pfn(pfn)); 1002 WARN_ON(kvm_is_mmio_pfn(pfn));
1012 1003
1013 get_page(bad_page); 1004 get_page(bad_page);
1014 return bad_page; 1005 return bad_page;
1015 } 1006 }
1016 1007
1017 EXPORT_SYMBOL_GPL(gfn_to_page); 1008 EXPORT_SYMBOL_GPL(gfn_to_page);
1018 1009
1019 void kvm_release_page_clean(struct page *page) 1010 void kvm_release_page_clean(struct page *page)
1020 { 1011 {
1021 kvm_release_pfn_clean(page_to_pfn(page)); 1012 kvm_release_pfn_clean(page_to_pfn(page));
1022 } 1013 }
1023 EXPORT_SYMBOL_GPL(kvm_release_page_clean); 1014 EXPORT_SYMBOL_GPL(kvm_release_page_clean);
1024 1015
1025 void kvm_release_pfn_clean(pfn_t pfn) 1016 void kvm_release_pfn_clean(pfn_t pfn)
1026 { 1017 {
1027 if (!kvm_is_mmio_pfn(pfn)) 1018 if (!kvm_is_mmio_pfn(pfn))
1028 put_page(pfn_to_page(pfn)); 1019 put_page(pfn_to_page(pfn));
1029 } 1020 }
1030 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean); 1021 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
1031 1022
1032 void kvm_release_page_dirty(struct page *page) 1023 void kvm_release_page_dirty(struct page *page)
1033 { 1024 {
1034 kvm_release_pfn_dirty(page_to_pfn(page)); 1025 kvm_release_pfn_dirty(page_to_pfn(page));
1035 } 1026 }
1036 EXPORT_SYMBOL_GPL(kvm_release_page_dirty); 1027 EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
1037 1028
1038 void kvm_release_pfn_dirty(pfn_t pfn) 1029 void kvm_release_pfn_dirty(pfn_t pfn)
1039 { 1030 {
1040 kvm_set_pfn_dirty(pfn); 1031 kvm_set_pfn_dirty(pfn);
1041 kvm_release_pfn_clean(pfn); 1032 kvm_release_pfn_clean(pfn);
1042 } 1033 }
1043 EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty); 1034 EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
1044 1035
1045 void kvm_set_page_dirty(struct page *page) 1036 void kvm_set_page_dirty(struct page *page)
1046 { 1037 {
1047 kvm_set_pfn_dirty(page_to_pfn(page)); 1038 kvm_set_pfn_dirty(page_to_pfn(page));
1048 } 1039 }
1049 EXPORT_SYMBOL_GPL(kvm_set_page_dirty); 1040 EXPORT_SYMBOL_GPL(kvm_set_page_dirty);
1050 1041
1051 void kvm_set_pfn_dirty(pfn_t pfn) 1042 void kvm_set_pfn_dirty(pfn_t pfn)
1052 { 1043 {
1053 if (!kvm_is_mmio_pfn(pfn)) { 1044 if (!kvm_is_mmio_pfn(pfn)) {
1054 struct page *page = pfn_to_page(pfn); 1045 struct page *page = pfn_to_page(pfn);
1055 if (!PageReserved(page)) 1046 if (!PageReserved(page))
1056 SetPageDirty(page); 1047 SetPageDirty(page);
1057 } 1048 }
1058 } 1049 }
1059 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty); 1050 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
1060 1051
1061 void kvm_set_pfn_accessed(pfn_t pfn) 1052 void kvm_set_pfn_accessed(pfn_t pfn)
1062 { 1053 {
1063 if (!kvm_is_mmio_pfn(pfn)) 1054 if (!kvm_is_mmio_pfn(pfn))
1064 mark_page_accessed(pfn_to_page(pfn)); 1055 mark_page_accessed(pfn_to_page(pfn));
1065 } 1056 }
1066 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed); 1057 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
1067 1058
1068 void kvm_get_pfn(pfn_t pfn) 1059 void kvm_get_pfn(pfn_t pfn)
1069 { 1060 {
1070 if (!kvm_is_mmio_pfn(pfn)) 1061 if (!kvm_is_mmio_pfn(pfn))
1071 get_page(pfn_to_page(pfn)); 1062 get_page(pfn_to_page(pfn));
1072 } 1063 }
1073 EXPORT_SYMBOL_GPL(kvm_get_pfn); 1064 EXPORT_SYMBOL_GPL(kvm_get_pfn);
1074 1065
1075 static int next_segment(unsigned long len, int offset) 1066 static int next_segment(unsigned long len, int offset)
1076 { 1067 {
1077 if (len > PAGE_SIZE - offset) 1068 if (len > PAGE_SIZE - offset)
1078 return PAGE_SIZE - offset; 1069 return PAGE_SIZE - offset;
1079 else 1070 else
1080 return len; 1071 return len;
1081 } 1072 }
1082 1073
1083 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, 1074 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
1084 int len) 1075 int len)
1085 { 1076 {
1086 int r; 1077 int r;
1087 unsigned long addr; 1078 unsigned long addr;
1088 1079
1089 addr = gfn_to_hva(kvm, gfn); 1080 addr = gfn_to_hva(kvm, gfn);
1090 if (kvm_is_error_hva(addr)) 1081 if (kvm_is_error_hva(addr))
1091 return -EFAULT; 1082 return -EFAULT;
1092 r = copy_from_user(data, (void __user *)addr + offset, len); 1083 r = copy_from_user(data, (void __user *)addr + offset, len);
1093 if (r) 1084 if (r)
1094 return -EFAULT; 1085 return -EFAULT;
1095 return 0; 1086 return 0;
1096 } 1087 }
1097 EXPORT_SYMBOL_GPL(kvm_read_guest_page); 1088 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
1098 1089
1099 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len) 1090 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len)
1100 { 1091 {
1101 gfn_t gfn = gpa >> PAGE_SHIFT; 1092 gfn_t gfn = gpa >> PAGE_SHIFT;
1102 int seg; 1093 int seg;
1103 int offset = offset_in_page(gpa); 1094 int offset = offset_in_page(gpa);
1104 int ret; 1095 int ret;
1105 1096
1106 while ((seg = next_segment(len, offset)) != 0) { 1097 while ((seg = next_segment(len, offset)) != 0) {
1107 ret = kvm_read_guest_page(kvm, gfn, data, offset, seg); 1098 ret = kvm_read_guest_page(kvm, gfn, data, offset, seg);
1108 if (ret < 0) 1099 if (ret < 0)
1109 return ret; 1100 return ret;
1110 offset = 0; 1101 offset = 0;
1111 len -= seg; 1102 len -= seg;
1112 data += seg; 1103 data += seg;
1113 ++gfn; 1104 ++gfn;
1114 } 1105 }
1115 return 0; 1106 return 0;
1116 } 1107 }
1117 EXPORT_SYMBOL_GPL(kvm_read_guest); 1108 EXPORT_SYMBOL_GPL(kvm_read_guest);
1118 1109
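	/*
	 * Illustrative segmentation for the kvm_read_guest() loop above, with
	 * made-up values: gpa = 0x1fe0, len = 0x40, 4 KiB pages.
	 *   pass 1: gfn = 0x1, offset = 0xfe0, next_segment() -> 0x20 bytes
	 *   pass 2: gfn = 0x2, offset = 0x0,   next_segment() -> 0x20 bytes
	 * after which len reaches 0 and the loop exits.
	 */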
1119 int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data, 1110 int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
1120 unsigned long len) 1111 unsigned long len)
1121 { 1112 {
1122 int r; 1113 int r;
1123 unsigned long addr; 1114 unsigned long addr;
1124 gfn_t gfn = gpa >> PAGE_SHIFT; 1115 gfn_t gfn = gpa >> PAGE_SHIFT;
1125 int offset = offset_in_page(gpa); 1116 int offset = offset_in_page(gpa);
1126 1117
1127 addr = gfn_to_hva(kvm, gfn); 1118 addr = gfn_to_hva(kvm, gfn);
1128 if (kvm_is_error_hva(addr)) 1119 if (kvm_is_error_hva(addr))
1129 return -EFAULT; 1120 return -EFAULT;
1130 pagefault_disable(); 1121 pagefault_disable();
1131 r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len); 1122 r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len);
1132 pagefault_enable(); 1123 pagefault_enable();
1133 if (r) 1124 if (r)
1134 return -EFAULT; 1125 return -EFAULT;
1135 return 0; 1126 return 0;
1136 } 1127 }
1137 EXPORT_SYMBOL(kvm_read_guest_atomic); 1128 EXPORT_SYMBOL(kvm_read_guest_atomic);
1138 1129
1139 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, 1130 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
1140 int offset, int len) 1131 int offset, int len)
1141 { 1132 {
1142 int r; 1133 int r;
1143 unsigned long addr; 1134 unsigned long addr;
1144 1135
1145 addr = gfn_to_hva(kvm, gfn); 1136 addr = gfn_to_hva(kvm, gfn);
1146 if (kvm_is_error_hva(addr)) 1137 if (kvm_is_error_hva(addr))
1147 return -EFAULT; 1138 return -EFAULT;
1148 r = copy_to_user((void __user *)addr + offset, data, len); 1139 r = copy_to_user((void __user *)addr + offset, data, len);
1149 if (r) 1140 if (r)
1150 return -EFAULT; 1141 return -EFAULT;
1151 mark_page_dirty(kvm, gfn); 1142 mark_page_dirty(kvm, gfn);
1152 return 0; 1143 return 0;
1153 } 1144 }
1154 EXPORT_SYMBOL_GPL(kvm_write_guest_page); 1145 EXPORT_SYMBOL_GPL(kvm_write_guest_page);
1155 1146
1156 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, 1147 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
1157 unsigned long len) 1148 unsigned long len)
1158 { 1149 {
1159 gfn_t gfn = gpa >> PAGE_SHIFT; 1150 gfn_t gfn = gpa >> PAGE_SHIFT;
1160 int seg; 1151 int seg;
1161 int offset = offset_in_page(gpa); 1152 int offset = offset_in_page(gpa);
1162 int ret; 1153 int ret;
1163 1154
1164 while ((seg = next_segment(len, offset)) != 0) { 1155 while ((seg = next_segment(len, offset)) != 0) {
1165 ret = kvm_write_guest_page(kvm, gfn, data, offset, seg); 1156 ret = kvm_write_guest_page(kvm, gfn, data, offset, seg);
1166 if (ret < 0) 1157 if (ret < 0)
1167 return ret; 1158 return ret;
1168 offset = 0; 1159 offset = 0;
1169 len -= seg; 1160 len -= seg;
1170 data += seg; 1161 data += seg;
1171 ++gfn; 1162 ++gfn;
1172 } 1163 }
1173 return 0; 1164 return 0;
1174 } 1165 }
1175 1166
1176 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len) 1167 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
1177 { 1168 {
1178 return kvm_write_guest_page(kvm, gfn, empty_zero_page, offset, len); 1169 return kvm_write_guest_page(kvm, gfn, empty_zero_page, offset, len);
1179 } 1170 }
1180 EXPORT_SYMBOL_GPL(kvm_clear_guest_page); 1171 EXPORT_SYMBOL_GPL(kvm_clear_guest_page);
1181 1172
1182 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) 1173 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
1183 { 1174 {
1184 gfn_t gfn = gpa >> PAGE_SHIFT; 1175 gfn_t gfn = gpa >> PAGE_SHIFT;
1185 int seg; 1176 int seg;
1186 int offset = offset_in_page(gpa); 1177 int offset = offset_in_page(gpa);
1187 int ret; 1178 int ret;
1188 1179
1189 while ((seg = next_segment(len, offset)) != 0) { 1180 while ((seg = next_segment(len, offset)) != 0) {
1190 ret = kvm_clear_guest_page(kvm, gfn, offset, seg); 1181 ret = kvm_clear_guest_page(kvm, gfn, offset, seg);
1191 if (ret < 0) 1182 if (ret < 0)
1192 return ret; 1183 return ret;
1193 offset = 0; 1184 offset = 0;
1194 len -= seg; 1185 len -= seg;
1195 ++gfn; 1186 ++gfn;
1196 } 1187 }
1197 return 0; 1188 return 0;
1198 } 1189 }
1199 EXPORT_SYMBOL_GPL(kvm_clear_guest); 1190 EXPORT_SYMBOL_GPL(kvm_clear_guest);
1200 1191
1201 void mark_page_dirty(struct kvm *kvm, gfn_t gfn) 1192 void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
1202 { 1193 {
1203 struct kvm_memory_slot *memslot; 1194 struct kvm_memory_slot *memslot;
1204 1195
1205 gfn = unalias_gfn(kvm, gfn); 1196 memslot = gfn_to_memslot(kvm, gfn);
1206 memslot = gfn_to_memslot_unaliased(kvm, gfn);
1207 if (memslot && memslot->dirty_bitmap) { 1197 if (memslot && memslot->dirty_bitmap) {
1208 unsigned long rel_gfn = gfn - memslot->base_gfn; 1198 unsigned long rel_gfn = gfn - memslot->base_gfn;
1209 1199
1210 generic___set_le_bit(rel_gfn, memslot->dirty_bitmap); 1200 generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
1211 } 1201 }
1212 } 1202 }
1213 1203
1214 /* 1204 /*
1215 * The vCPU has executed a HLT instruction with in-kernel mode enabled. 1205 * The vCPU has executed a HLT instruction with in-kernel mode enabled.
1216 */ 1206 */
1217 void kvm_vcpu_block(struct kvm_vcpu *vcpu) 1207 void kvm_vcpu_block(struct kvm_vcpu *vcpu)
1218 { 1208 {
1219 DEFINE_WAIT(wait); 1209 DEFINE_WAIT(wait);
1220 1210
1221 for (;;) { 1211 for (;;) {
1222 prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); 1212 prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
1223 1213
1224 if (kvm_arch_vcpu_runnable(vcpu)) { 1214 if (kvm_arch_vcpu_runnable(vcpu)) {
1225 set_bit(KVM_REQ_UNHALT, &vcpu->requests); 1215 set_bit(KVM_REQ_UNHALT, &vcpu->requests);
1226 break; 1216 break;
1227 } 1217 }
1228 if (kvm_cpu_has_pending_timer(vcpu)) 1218 if (kvm_cpu_has_pending_timer(vcpu))
1229 break; 1219 break;
1230 if (signal_pending(current)) 1220 if (signal_pending(current))
1231 break; 1221 break;
1232 1222
1233 schedule(); 1223 schedule();
1234 } 1224 }
1235 1225
1236 finish_wait(&vcpu->wq, &wait); 1226 finish_wait(&vcpu->wq, &wait);
1237 } 1227 }
1238 1228
1239 void kvm_resched(struct kvm_vcpu *vcpu) 1229 void kvm_resched(struct kvm_vcpu *vcpu)
1240 { 1230 {
1241 if (!need_resched()) 1231 if (!need_resched())
1242 return; 1232 return;
1243 cond_resched(); 1233 cond_resched();
1244 } 1234 }
1245 EXPORT_SYMBOL_GPL(kvm_resched); 1235 EXPORT_SYMBOL_GPL(kvm_resched);
1246 1236
1247 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu) 1237 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
1248 { 1238 {
1249 ktime_t expires; 1239 ktime_t expires;
1250 DEFINE_WAIT(wait); 1240 DEFINE_WAIT(wait);
1251 1241
1252 prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); 1242 prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
1253 1243
1254 /* Sleep for 100 us, and hope the lock holder gets scheduled */ 1244 /* Sleep for 100 us, and hope the lock holder gets scheduled */
1255 expires = ktime_add_ns(ktime_get(), 100000UL); 1245 expires = ktime_add_ns(ktime_get(), 100000UL);
1256 schedule_hrtimeout(&expires, HRTIMER_MODE_ABS); 1246 schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
1257 1247
1258 finish_wait(&vcpu->wq, &wait); 1248 finish_wait(&vcpu->wq, &wait);
1259 } 1249 }
1260 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin); 1250 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
1261 1251
1262 static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf) 1252 static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
1263 { 1253 {
1264 struct kvm_vcpu *vcpu = vma->vm_file->private_data; 1254 struct kvm_vcpu *vcpu = vma->vm_file->private_data;
1265 struct page *page; 1255 struct page *page;
1266 1256
1267 if (vmf->pgoff == 0) 1257 if (vmf->pgoff == 0)
1268 page = virt_to_page(vcpu->run); 1258 page = virt_to_page(vcpu->run);
1269 #ifdef CONFIG_X86 1259 #ifdef CONFIG_X86
1270 else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET) 1260 else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET)
1271 page = virt_to_page(vcpu->arch.pio_data); 1261 page = virt_to_page(vcpu->arch.pio_data);
1272 #endif 1262 #endif
1273 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET 1263 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
1274 else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET) 1264 else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
1275 page = virt_to_page(vcpu->kvm->coalesced_mmio_ring); 1265 page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
1276 #endif 1266 #endif
1277 else 1267 else
1278 return VM_FAULT_SIGBUS; 1268 return VM_FAULT_SIGBUS;
1279 get_page(page); 1269 get_page(page);
1280 vmf->page = page; 1270 vmf->page = page;
1281 return 0; 1271 return 0;
1282 } 1272 }
1283 1273
1284 static const struct vm_operations_struct kvm_vcpu_vm_ops = { 1274 static const struct vm_operations_struct kvm_vcpu_vm_ops = {
1285 .fault = kvm_vcpu_fault, 1275 .fault = kvm_vcpu_fault,
1286 }; 1276 };
1287 1277
1288 static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma) 1278 static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
1289 { 1279 {
1290 vma->vm_ops = &kvm_vcpu_vm_ops; 1280 vma->vm_ops = &kvm_vcpu_vm_ops;
1291 return 0; 1281 return 0;
1292 } 1282 }
1293 1283
1294 static int kvm_vcpu_release(struct inode *inode, struct file *filp) 1284 static int kvm_vcpu_release(struct inode *inode, struct file *filp)
1295 { 1285 {
1296 struct kvm_vcpu *vcpu = filp->private_data; 1286 struct kvm_vcpu *vcpu = filp->private_data;
1297 1287
1298 kvm_put_kvm(vcpu->kvm); 1288 kvm_put_kvm(vcpu->kvm);
1299 return 0; 1289 return 0;
1300 } 1290 }
1301 1291
1302 static struct file_operations kvm_vcpu_fops = { 1292 static struct file_operations kvm_vcpu_fops = {
1303 .release = kvm_vcpu_release, 1293 .release = kvm_vcpu_release,
1304 .unlocked_ioctl = kvm_vcpu_ioctl, 1294 .unlocked_ioctl = kvm_vcpu_ioctl,
1305 .compat_ioctl = kvm_vcpu_ioctl, 1295 .compat_ioctl = kvm_vcpu_ioctl,
1306 .mmap = kvm_vcpu_mmap, 1296 .mmap = kvm_vcpu_mmap,
1307 }; 1297 };
1308 1298
1309 /* 1299 /*
1310 * Allocates an inode for the vcpu. 1300 * Allocates an inode for the vcpu.
1311 */ 1301 */
1312 static int create_vcpu_fd(struct kvm_vcpu *vcpu) 1302 static int create_vcpu_fd(struct kvm_vcpu *vcpu)
1313 { 1303 {
1314 return anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, O_RDWR); 1304 return anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, O_RDWR);
1315 } 1305 }
1316 1306
1317 /* 1307 /*
1318 * Creates some virtual cpus. Good luck creating more than one. 1308 * Creates some virtual cpus. Good luck creating more than one.
1319 */ 1309 */
1320 static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) 1310 static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
1321 { 1311 {
1322 int r; 1312 int r;
1323 struct kvm_vcpu *vcpu, *v; 1313 struct kvm_vcpu *vcpu, *v;
1324 1314
1325 vcpu = kvm_arch_vcpu_create(kvm, id); 1315 vcpu = kvm_arch_vcpu_create(kvm, id);
1326 if (IS_ERR(vcpu)) 1316 if (IS_ERR(vcpu))
1327 return PTR_ERR(vcpu); 1317 return PTR_ERR(vcpu);
1328 1318
1329 preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops); 1319 preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
1330 1320
1331 r = kvm_arch_vcpu_setup(vcpu); 1321 r = kvm_arch_vcpu_setup(vcpu);
1332 if (r) 1322 if (r)
1333 return r; 1323 return r;
1334 1324
1335 mutex_lock(&kvm->lock); 1325 mutex_lock(&kvm->lock);
1336 if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { 1326 if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
1337 r = -EINVAL; 1327 r = -EINVAL;
1338 goto vcpu_destroy; 1328 goto vcpu_destroy;
1339 } 1329 }
1340 1330
1341 kvm_for_each_vcpu(r, v, kvm) 1331 kvm_for_each_vcpu(r, v, kvm)
1342 if (v->vcpu_id == id) { 1332 if (v->vcpu_id == id) {
1343 r = -EEXIST; 1333 r = -EEXIST;
1344 goto vcpu_destroy; 1334 goto vcpu_destroy;
1345 } 1335 }
1346 1336
1347 BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]); 1337 BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]);
1348 1338
1349 /* Now it's all set up, let userspace reach it */ 1339 /* Now it's all set up, let userspace reach it */
1350 kvm_get_kvm(kvm); 1340 kvm_get_kvm(kvm);
1351 r = create_vcpu_fd(vcpu); 1341 r = create_vcpu_fd(vcpu);
1352 if (r < 0) { 1342 if (r < 0) {
1353 kvm_put_kvm(kvm); 1343 kvm_put_kvm(kvm);
1354 goto vcpu_destroy; 1344 goto vcpu_destroy;
1355 } 1345 }
1356 1346
1357 kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu; 1347 kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
1358 smp_wmb(); 1348 smp_wmb();
1359 atomic_inc(&kvm->online_vcpus); 1349 atomic_inc(&kvm->online_vcpus);
1360 1350
1361 #ifdef CONFIG_KVM_APIC_ARCHITECTURE 1351 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
1362 if (kvm->bsp_vcpu_id == id) 1352 if (kvm->bsp_vcpu_id == id)
1363 kvm->bsp_vcpu = vcpu; 1353 kvm->bsp_vcpu = vcpu;
1364 #endif 1354 #endif
1365 mutex_unlock(&kvm->lock); 1355 mutex_unlock(&kvm->lock);
1366 return r; 1356 return r;
1367 1357
1368 vcpu_destroy: 1358 vcpu_destroy:
1369 mutex_unlock(&kvm->lock); 1359 mutex_unlock(&kvm->lock);
1370 kvm_arch_vcpu_destroy(vcpu); 1360 kvm_arch_vcpu_destroy(vcpu);
1371 return r; 1361 return r;
1372 } 1362 }
1373 1363
1374 static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset) 1364 static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
1375 { 1365 {
1376 if (sigset) { 1366 if (sigset) {
1377 sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP)); 1367 sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP));
1378 vcpu->sigset_active = 1; 1368 vcpu->sigset_active = 1;
1379 vcpu->sigset = *sigset; 1369 vcpu->sigset = *sigset;
1380 } else 1370 } else
1381 vcpu->sigset_active = 0; 1371 vcpu->sigset_active = 0;
1382 return 0; 1372 return 0;
1383 } 1373 }
1384 1374
1385 static long kvm_vcpu_ioctl(struct file *filp, 1375 static long kvm_vcpu_ioctl(struct file *filp,
1386 unsigned int ioctl, unsigned long arg) 1376 unsigned int ioctl, unsigned long arg)
1387 { 1377 {
1388 struct kvm_vcpu *vcpu = filp->private_data; 1378 struct kvm_vcpu *vcpu = filp->private_data;
1389 void __user *argp = (void __user *)arg; 1379 void __user *argp = (void __user *)arg;
1390 int r; 1380 int r;
1391 struct kvm_fpu *fpu = NULL; 1381 struct kvm_fpu *fpu = NULL;
1392 struct kvm_sregs *kvm_sregs = NULL; 1382 struct kvm_sregs *kvm_sregs = NULL;
1393 1383
1394 if (vcpu->kvm->mm != current->mm) 1384 if (vcpu->kvm->mm != current->mm)
1395 return -EIO; 1385 return -EIO;
1396 1386
1397 #if defined(CONFIG_S390) || defined(CONFIG_PPC) 1387 #if defined(CONFIG_S390) || defined(CONFIG_PPC)
1398 /* 1388 /*
1399 * Special cases: vcpu ioctls that are asynchronous to vcpu execution, 1389 * Special cases: vcpu ioctls that are asynchronous to vcpu execution,
1399 * so vcpu_load() would break them. 1389 * so vcpu_load() would break them.
1401 */ 1391 */
1402 if (ioctl == KVM_S390_INTERRUPT || ioctl == KVM_INTERRUPT) 1392 if (ioctl == KVM_S390_INTERRUPT || ioctl == KVM_INTERRUPT)
1403 return kvm_arch_vcpu_ioctl(filp, ioctl, arg); 1393 return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
1404 #endif 1394 #endif
1405 1395
1406 1396
1407 vcpu_load(vcpu); 1397 vcpu_load(vcpu);
1408 switch (ioctl) { 1398 switch (ioctl) {
1409 case KVM_RUN: 1399 case KVM_RUN:
1410 r = -EINVAL; 1400 r = -EINVAL;
1411 if (arg) 1401 if (arg)
1412 goto out; 1402 goto out;
1413 r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run); 1403 r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
1414 break; 1404 break;
1415 case KVM_GET_REGS: { 1405 case KVM_GET_REGS: {
1416 struct kvm_regs *kvm_regs; 1406 struct kvm_regs *kvm_regs;
1417 1407
1418 r = -ENOMEM; 1408 r = -ENOMEM;
1419 kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL); 1409 kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL);
1420 if (!kvm_regs) 1410 if (!kvm_regs)
1421 goto out; 1411 goto out;
1422 r = kvm_arch_vcpu_ioctl_get_regs(vcpu, kvm_regs); 1412 r = kvm_arch_vcpu_ioctl_get_regs(vcpu, kvm_regs);
1423 if (r) 1413 if (r)
1424 goto out_free1; 1414 goto out_free1;
1425 r = -EFAULT; 1415 r = -EFAULT;
1426 if (copy_to_user(argp, kvm_regs, sizeof(struct kvm_regs))) 1416 if (copy_to_user(argp, kvm_regs, sizeof(struct kvm_regs)))
1427 goto out_free1; 1417 goto out_free1;
1428 r = 0; 1418 r = 0;
1429 out_free1: 1419 out_free1:
1430 kfree(kvm_regs); 1420 kfree(kvm_regs);
1431 break; 1421 break;
1432 } 1422 }
1433 case KVM_SET_REGS: { 1423 case KVM_SET_REGS: {
1434 struct kvm_regs *kvm_regs; 1424 struct kvm_regs *kvm_regs;
1435 1425
1436 r = -ENOMEM; 1426 r = -ENOMEM;
1437 kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL); 1427 kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL);
1438 if (!kvm_regs) 1428 if (!kvm_regs)
1439 goto out; 1429 goto out;
1440 r = -EFAULT; 1430 r = -EFAULT;
1441 if (copy_from_user(kvm_regs, argp, sizeof(struct kvm_regs))) 1431 if (copy_from_user(kvm_regs, argp, sizeof(struct kvm_regs)))
1442 goto out_free2; 1432 goto out_free2;
1443 r = kvm_arch_vcpu_ioctl_set_regs(vcpu, kvm_regs); 1433 r = kvm_arch_vcpu_ioctl_set_regs(vcpu, kvm_regs);
1444 if (r) 1434 if (r)
1445 goto out_free2; 1435 goto out_free2;
1446 r = 0; 1436 r = 0;
1447 out_free2: 1437 out_free2:
1448 kfree(kvm_regs); 1438 kfree(kvm_regs);
1449 break; 1439 break;
1450 } 1440 }
1451 case KVM_GET_SREGS: { 1441 case KVM_GET_SREGS: {
1452 kvm_sregs = kzalloc(sizeof(struct kvm_sregs), GFP_KERNEL); 1442 kvm_sregs = kzalloc(sizeof(struct kvm_sregs), GFP_KERNEL);
1453 r = -ENOMEM; 1443 r = -ENOMEM;
1454 if (!kvm_sregs) 1444 if (!kvm_sregs)
1455 goto out; 1445 goto out;
1456 r = kvm_arch_vcpu_ioctl_get_sregs(vcpu, kvm_sregs); 1446 r = kvm_arch_vcpu_ioctl_get_sregs(vcpu, kvm_sregs);
1457 if (r) 1447 if (r)
1458 goto out; 1448 goto out;
1459 r = -EFAULT; 1449 r = -EFAULT;
1460 if (copy_to_user(argp, kvm_sregs, sizeof(struct kvm_sregs))) 1450 if (copy_to_user(argp, kvm_sregs, sizeof(struct kvm_sregs)))
1461 goto out; 1451 goto out;
1462 r = 0; 1452 r = 0;
1463 break; 1453 break;
1464 } 1454 }
1465 case KVM_SET_SREGS: { 1455 case KVM_SET_SREGS: {
1466 kvm_sregs = kmalloc(sizeof(struct kvm_sregs), GFP_KERNEL); 1456 kvm_sregs = kmalloc(sizeof(struct kvm_sregs), GFP_KERNEL);
1467 r = -ENOMEM; 1457 r = -ENOMEM;
1468 if (!kvm_sregs) 1458 if (!kvm_sregs)
1469 goto out; 1459 goto out;
1470 r = -EFAULT; 1460 r = -EFAULT;
1471 if (copy_from_user(kvm_sregs, argp, sizeof(struct kvm_sregs))) 1461 if (copy_from_user(kvm_sregs, argp, sizeof(struct kvm_sregs)))
1472 goto out; 1462 goto out;
1473 r = kvm_arch_vcpu_ioctl_set_sregs(vcpu, kvm_sregs); 1463 r = kvm_arch_vcpu_ioctl_set_sregs(vcpu, kvm_sregs);
1474 if (r) 1464 if (r)
1475 goto out; 1465 goto out;
1476 r = 0; 1466 r = 0;
1477 break; 1467 break;
1478 } 1468 }
1479 case KVM_GET_MP_STATE: { 1469 case KVM_GET_MP_STATE: {
1480 struct kvm_mp_state mp_state; 1470 struct kvm_mp_state mp_state;
1481 1471
1482 r = kvm_arch_vcpu_ioctl_get_mpstate(vcpu, &mp_state); 1472 r = kvm_arch_vcpu_ioctl_get_mpstate(vcpu, &mp_state);
1483 if (r) 1473 if (r)
1484 goto out; 1474 goto out;
1485 r = -EFAULT; 1475 r = -EFAULT;
1486 if (copy_to_user(argp, &mp_state, sizeof mp_state)) 1476 if (copy_to_user(argp, &mp_state, sizeof mp_state))
1487 goto out; 1477 goto out;
1488 r = 0; 1478 r = 0;
1489 break; 1479 break;
1490 } 1480 }
1491 case KVM_SET_MP_STATE: { 1481 case KVM_SET_MP_STATE: {
1492 struct kvm_mp_state mp_state; 1482 struct kvm_mp_state mp_state;
1493 1483
1494 r = -EFAULT; 1484 r = -EFAULT;
1495 if (copy_from_user(&mp_state, argp, sizeof mp_state)) 1485 if (copy_from_user(&mp_state, argp, sizeof mp_state))
1496 goto out; 1486 goto out;
1497 r = kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state); 1487 r = kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state);
1498 if (r) 1488 if (r)
1499 goto out; 1489 goto out;
1500 r = 0; 1490 r = 0;
1501 break; 1491 break;
1502 } 1492 }
1503 case KVM_TRANSLATE: { 1493 case KVM_TRANSLATE: {
1504 struct kvm_translation tr; 1494 struct kvm_translation tr;
1505 1495
1506 r = -EFAULT; 1496 r = -EFAULT;
1507 if (copy_from_user(&tr, argp, sizeof tr)) 1497 if (copy_from_user(&tr, argp, sizeof tr))
1508 goto out; 1498 goto out;
1509 r = kvm_arch_vcpu_ioctl_translate(vcpu, &tr); 1499 r = kvm_arch_vcpu_ioctl_translate(vcpu, &tr);
1510 if (r) 1500 if (r)
1511 goto out; 1501 goto out;
1512 r = -EFAULT; 1502 r = -EFAULT;
1513 if (copy_to_user(argp, &tr, sizeof tr)) 1503 if (copy_to_user(argp, &tr, sizeof tr))
1514 goto out; 1504 goto out;
1515 r = 0; 1505 r = 0;
1516 break; 1506 break;
1517 } 1507 }
1518 case KVM_SET_GUEST_DEBUG: { 1508 case KVM_SET_GUEST_DEBUG: {
1519 struct kvm_guest_debug dbg; 1509 struct kvm_guest_debug dbg;
1520 1510
1521 r = -EFAULT; 1511 r = -EFAULT;
1522 if (copy_from_user(&dbg, argp, sizeof dbg)) 1512 if (copy_from_user(&dbg, argp, sizeof dbg))
1523 goto out; 1513 goto out;
1524 r = kvm_arch_vcpu_ioctl_set_guest_debug(vcpu, &dbg); 1514 r = kvm_arch_vcpu_ioctl_set_guest_debug(vcpu, &dbg);
1525 if (r) 1515 if (r)
1526 goto out; 1516 goto out;
1527 r = 0; 1517 r = 0;
1528 break; 1518 break;
1529 } 1519 }
1530 case KVM_SET_SIGNAL_MASK: { 1520 case KVM_SET_SIGNAL_MASK: {
1531 struct kvm_signal_mask __user *sigmask_arg = argp; 1521 struct kvm_signal_mask __user *sigmask_arg = argp;
1532 struct kvm_signal_mask kvm_sigmask; 1522 struct kvm_signal_mask kvm_sigmask;
1533 sigset_t sigset, *p; 1523 sigset_t sigset, *p;
1534 1524
1535 p = NULL; 1525 p = NULL;
1536 if (argp) { 1526 if (argp) {
1537 r = -EFAULT; 1527 r = -EFAULT;
1538 if (copy_from_user(&kvm_sigmask, argp, 1528 if (copy_from_user(&kvm_sigmask, argp,
1539 sizeof kvm_sigmask)) 1529 sizeof kvm_sigmask))
1540 goto out; 1530 goto out;
1541 r = -EINVAL; 1531 r = -EINVAL;
1542 if (kvm_sigmask.len != sizeof sigset) 1532 if (kvm_sigmask.len != sizeof sigset)
1543 goto out; 1533 goto out;
1544 r = -EFAULT; 1534 r = -EFAULT;
1545 if (copy_from_user(&sigset, sigmask_arg->sigset, 1535 if (copy_from_user(&sigset, sigmask_arg->sigset,
1546 sizeof sigset)) 1536 sizeof sigset))
1547 goto out; 1537 goto out;
1548 p = &sigset; 1538 p = &sigset;
1549 } 1539 }
1550 r = kvm_vcpu_ioctl_set_sigmask(vcpu, p); 1540 r = kvm_vcpu_ioctl_set_sigmask(vcpu, p);
1551 break; 1541 break;
1552 } 1542 }
1553 case KVM_GET_FPU: { 1543 case KVM_GET_FPU: {
1554 fpu = kzalloc(sizeof(struct kvm_fpu), GFP_KERNEL); 1544 fpu = kzalloc(sizeof(struct kvm_fpu), GFP_KERNEL);
1555 r = -ENOMEM; 1545 r = -ENOMEM;
1556 if (!fpu) 1546 if (!fpu)
1557 goto out; 1547 goto out;
1558 r = kvm_arch_vcpu_ioctl_get_fpu(vcpu, fpu); 1548 r = kvm_arch_vcpu_ioctl_get_fpu(vcpu, fpu);
1559 if (r) 1549 if (r)
1560 goto out; 1550 goto out;
1561 r = -EFAULT; 1551 r = -EFAULT;
1562 if (copy_to_user(argp, fpu, sizeof(struct kvm_fpu))) 1552 if (copy_to_user(argp, fpu, sizeof(struct kvm_fpu)))
1563 goto out; 1553 goto out;
1564 r = 0; 1554 r = 0;
1565 break; 1555 break;
1566 } 1556 }
1567 case KVM_SET_FPU: { 1557 case KVM_SET_FPU: {
1568 fpu = kmalloc(sizeof(struct kvm_fpu), GFP_KERNEL); 1558 fpu = kmalloc(sizeof(struct kvm_fpu), GFP_KERNEL);
1569 r = -ENOMEM; 1559 r = -ENOMEM;
1570 if (!fpu) 1560 if (!fpu)
1571 goto out; 1561 goto out;
1572 r = -EFAULT; 1562 r = -EFAULT;
1573 if (copy_from_user(fpu, argp, sizeof(struct kvm_fpu))) 1563 if (copy_from_user(fpu, argp, sizeof(struct kvm_fpu)))
1574 goto out; 1564 goto out;
1575 r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu); 1565 r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
1576 if (r) 1566 if (r)
1577 goto out; 1567 goto out;
1578 r = 0; 1568 r = 0;
1579 break; 1569 break;
1580 } 1570 }
1581 default: 1571 default:
1582 r = kvm_arch_vcpu_ioctl(filp, ioctl, arg); 1572 r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
1583 } 1573 }
1584 out: 1574 out:
1585 vcpu_put(vcpu); 1575 vcpu_put(vcpu);
1586 kfree(fpu); 1576 kfree(fpu);
1587 kfree(kvm_sregs); 1577 kfree(kvm_sregs);
1588 return r; 1578 return r;
1589 } 1579 }
1590 1580
1591 static long kvm_vm_ioctl(struct file *filp, 1581 static long kvm_vm_ioctl(struct file *filp,
1592 unsigned int ioctl, unsigned long arg) 1582 unsigned int ioctl, unsigned long arg)
1593 { 1583 {
1594 struct kvm *kvm = filp->private_data; 1584 struct kvm *kvm = filp->private_data;
1595 void __user *argp = (void __user *)arg; 1585 void __user *argp = (void __user *)arg;
1596 int r; 1586 int r;
1597 1587
1598 if (kvm->mm != current->mm) 1588 if (kvm->mm != current->mm)
1599 return -EIO; 1589 return -EIO;
1600 switch (ioctl) { 1590 switch (ioctl) {
1601 case KVM_CREATE_VCPU: 1591 case KVM_CREATE_VCPU:
1602 r = kvm_vm_ioctl_create_vcpu(kvm, arg); 1592 r = kvm_vm_ioctl_create_vcpu(kvm, arg);
1603 if (r < 0) 1593 if (r < 0)
1604 goto out; 1594 goto out;
1605 break; 1595 break;
1606 case KVM_SET_USER_MEMORY_REGION: { 1596 case KVM_SET_USER_MEMORY_REGION: {
1607 struct kvm_userspace_memory_region kvm_userspace_mem; 1597 struct kvm_userspace_memory_region kvm_userspace_mem;
1608 1598
1609 r = -EFAULT; 1599 r = -EFAULT;
1610 if (copy_from_user(&kvm_userspace_mem, argp, 1600 if (copy_from_user(&kvm_userspace_mem, argp,
1611 sizeof kvm_userspace_mem)) 1601 sizeof kvm_userspace_mem))
1612 goto out; 1602 goto out;
1613 1603
1614 r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 1); 1604 r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 1);
1615 if (r) 1605 if (r)
1616 goto out; 1606 goto out;
1617 break; 1607 break;
1618 } 1608 }
1619 case KVM_GET_DIRTY_LOG: { 1609 case KVM_GET_DIRTY_LOG: {
1620 struct kvm_dirty_log log; 1610 struct kvm_dirty_log log;
1621 1611
1622 r = -EFAULT; 1612 r = -EFAULT;
1623 if (copy_from_user(&log, argp, sizeof log)) 1613 if (copy_from_user(&log, argp, sizeof log))
1624 goto out; 1614 goto out;
1625 r = kvm_vm_ioctl_get_dirty_log(kvm, &log); 1615 r = kvm_vm_ioctl_get_dirty_log(kvm, &log);
1626 if (r) 1616 if (r)
1627 goto out; 1617 goto out;
1628 break; 1618 break;
1629 } 1619 }
1630 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET 1620 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
1631 case KVM_REGISTER_COALESCED_MMIO: { 1621 case KVM_REGISTER_COALESCED_MMIO: {
1632 struct kvm_coalesced_mmio_zone zone; 1622 struct kvm_coalesced_mmio_zone zone;
1633 r = -EFAULT; 1623 r = -EFAULT;
1634 if (copy_from_user(&zone, argp, sizeof zone)) 1624 if (copy_from_user(&zone, argp, sizeof zone))
1635 goto out; 1625 goto out;
1636 r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone); 1626 r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone);
1637 if (r) 1627 if (r)
1638 goto out; 1628 goto out;
1639 r = 0; 1629 r = 0;
1640 break; 1630 break;
1641 } 1631 }
1642 case KVM_UNREGISTER_COALESCED_MMIO: { 1632 case KVM_UNREGISTER_COALESCED_MMIO: {
1643 struct kvm_coalesced_mmio_zone zone; 1633 struct kvm_coalesced_mmio_zone zone;
1644 r = -EFAULT; 1634 r = -EFAULT;
1645 if (copy_from_user(&zone, argp, sizeof zone)) 1635 if (copy_from_user(&zone, argp, sizeof zone))
1646 goto out; 1636 goto out;
1647 r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone); 1637 r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone);
1648 if (r) 1638 if (r)
1649 goto out; 1639 goto out;
1650 r = 0; 1640 r = 0;
1651 break; 1641 break;
1652 } 1642 }
1653 #endif 1643 #endif
1654 case KVM_IRQFD: { 1644 case KVM_IRQFD: {
1655 struct kvm_irqfd data; 1645 struct kvm_irqfd data;
1656 1646
1657 r = -EFAULT; 1647 r = -EFAULT;
1658 if (copy_from_user(&data, argp, sizeof data)) 1648 if (copy_from_user(&data, argp, sizeof data))
1659 goto out; 1649 goto out;
1660 r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags); 1650 r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
1661 break; 1651 break;
1662 } 1652 }
1663 case KVM_IOEVENTFD: { 1653 case KVM_IOEVENTFD: {
1664 struct kvm_ioeventfd data; 1654 struct kvm_ioeventfd data;
1665 1655
1666 r = -EFAULT; 1656 r = -EFAULT;
1667 if (copy_from_user(&data, argp, sizeof data)) 1657 if (copy_from_user(&data, argp, sizeof data))
1668 goto out; 1658 goto out;
1669 r = kvm_ioeventfd(kvm, &data); 1659 r = kvm_ioeventfd(kvm, &data);
1670 break; 1660 break;
1671 } 1661 }
1672 #ifdef CONFIG_KVM_APIC_ARCHITECTURE 1662 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
1673 case KVM_SET_BOOT_CPU_ID: 1663 case KVM_SET_BOOT_CPU_ID:
1674 r = 0; 1664 r = 0;
1675 mutex_lock(&kvm->lock); 1665 mutex_lock(&kvm->lock);
1676 if (atomic_read(&kvm->online_vcpus) != 0) 1666 if (atomic_read(&kvm->online_vcpus) != 0)
1677 r = -EBUSY; 1667 r = -EBUSY;
1678 else 1668 else
1679 kvm->bsp_vcpu_id = arg; 1669 kvm->bsp_vcpu_id = arg;
1680 mutex_unlock(&kvm->lock); 1670 mutex_unlock(&kvm->lock);
1681 break; 1671 break;
1682 #endif 1672 #endif
1683 default: 1673 default:
1684 r = kvm_arch_vm_ioctl(filp, ioctl, arg); 1674 r = kvm_arch_vm_ioctl(filp, ioctl, arg);
1685 if (r == -ENOTTY) 1675 if (r == -ENOTTY)
1686 r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg); 1676 r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg);
1687 } 1677 }
1688 out: 1678 out:
1689 return r; 1679 return r;
1690 } 1680 }
1691 1681
1692 #ifdef CONFIG_COMPAT 1682 #ifdef CONFIG_COMPAT
1693 struct compat_kvm_dirty_log { 1683 struct compat_kvm_dirty_log {
1694 __u32 slot; 1684 __u32 slot;
1695 __u32 padding1; 1685 __u32 padding1;
1696 union { 1686 union {
1697 compat_uptr_t dirty_bitmap; /* one bit per page */ 1687 compat_uptr_t dirty_bitmap; /* one bit per page */
1698 __u64 padding2; 1688 __u64 padding2;
1699 }; 1689 };
1700 }; 1690 };
1701 1691
1702 static long kvm_vm_compat_ioctl(struct file *filp, 1692 static long kvm_vm_compat_ioctl(struct file *filp,
1703 unsigned int ioctl, unsigned long arg) 1693 unsigned int ioctl, unsigned long arg)
1704 { 1694 {
1705 struct kvm *kvm = filp->private_data; 1695 struct kvm *kvm = filp->private_data;
1706 int r; 1696 int r;
1707 1697
1708 if (kvm->mm != current->mm) 1698 if (kvm->mm != current->mm)
1709 return -EIO; 1699 return -EIO;
1710 switch (ioctl) { 1700 switch (ioctl) {
1711 case KVM_GET_DIRTY_LOG: { 1701 case KVM_GET_DIRTY_LOG: {
1712 struct compat_kvm_dirty_log compat_log; 1702 struct compat_kvm_dirty_log compat_log;
1713 struct kvm_dirty_log log; 1703 struct kvm_dirty_log log;
1714 1704
1715 r = -EFAULT; 1705 r = -EFAULT;
1716 if (copy_from_user(&compat_log, (void __user *)arg, 1706 if (copy_from_user(&compat_log, (void __user *)arg,
1717 sizeof(compat_log))) 1707 sizeof(compat_log)))
1718 goto out; 1708 goto out;
1719 log.slot = compat_log.slot; 1709 log.slot = compat_log.slot;
1720 log.padding1 = compat_log.padding1; 1710 log.padding1 = compat_log.padding1;
1721 log.padding2 = compat_log.padding2; 1711 log.padding2 = compat_log.padding2;
1722 log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap); 1712 log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
1723 1713
1724 r = kvm_vm_ioctl_get_dirty_log(kvm, &log); 1714 r = kvm_vm_ioctl_get_dirty_log(kvm, &log);
1725 if (r) 1715 if (r)
1726 goto out; 1716 goto out;
1727 break; 1717 break;
1728 } 1718 }
1729 default: 1719 default:
1730 r = kvm_vm_ioctl(filp, ioctl, arg); 1720 r = kvm_vm_ioctl(filp, ioctl, arg);
1731 } 1721 }
1732 1722
1733 out: 1723 out:
1734 return r; 1724 return r;
1735 } 1725 }
1736 #endif 1726 #endif
1737 1727
1738 static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) 1728 static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
1739 { 1729 {
1740 struct page *page[1]; 1730 struct page *page[1];
1741 unsigned long addr; 1731 unsigned long addr;
1742 int npages; 1732 int npages;
1743 gfn_t gfn = vmf->pgoff; 1733 gfn_t gfn = vmf->pgoff;
1744 struct kvm *kvm = vma->vm_file->private_data; 1734 struct kvm *kvm = vma->vm_file->private_data;
1745 1735
1746 addr = gfn_to_hva(kvm, gfn); 1736 addr = gfn_to_hva(kvm, gfn);
1747 if (kvm_is_error_hva(addr)) 1737 if (kvm_is_error_hva(addr))
1748 return VM_FAULT_SIGBUS; 1738 return VM_FAULT_SIGBUS;
1749 1739
1750 npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page, 1740 npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page,
1751 NULL); 1741 NULL);
1752 if (unlikely(npages != 1)) 1742 if (unlikely(npages != 1))
1753 return VM_FAULT_SIGBUS; 1743 return VM_FAULT_SIGBUS;
1754 1744
1755 vmf->page = page[0]; 1745 vmf->page = page[0];
1756 return 0; 1746 return 0;
1757 } 1747 }
1758 1748
1759 static const struct vm_operations_struct kvm_vm_vm_ops = { 1749 static const struct vm_operations_struct kvm_vm_vm_ops = {
1760 .fault = kvm_vm_fault, 1750 .fault = kvm_vm_fault,
1761 }; 1751 };
1762 1752
1763 static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma) 1753 static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma)
1764 { 1754 {
1765 vma->vm_ops = &kvm_vm_vm_ops; 1755 vma->vm_ops = &kvm_vm_vm_ops;
1766 return 0; 1756 return 0;
1767 } 1757 }
1768 1758
1769 static struct file_operations kvm_vm_fops = { 1759 static struct file_operations kvm_vm_fops = {
1770 .release = kvm_vm_release, 1760 .release = kvm_vm_release,
1771 .unlocked_ioctl = kvm_vm_ioctl, 1761 .unlocked_ioctl = kvm_vm_ioctl,
1772 #ifdef CONFIG_COMPAT 1762 #ifdef CONFIG_COMPAT
1773 .compat_ioctl = kvm_vm_compat_ioctl, 1763 .compat_ioctl = kvm_vm_compat_ioctl,
1774 #endif 1764 #endif
1775 .mmap = kvm_vm_mmap, 1765 .mmap = kvm_vm_mmap,
1776 }; 1766 };
1777 1767
1778 static int kvm_dev_ioctl_create_vm(void) 1768 static int kvm_dev_ioctl_create_vm(void)
1779 { 1769 {
1780 int fd, r; 1770 int fd, r;
1781 struct kvm *kvm; 1771 struct kvm *kvm;
1782 1772
1783 kvm = kvm_create_vm(); 1773 kvm = kvm_create_vm();
1784 if (IS_ERR(kvm)) 1774 if (IS_ERR(kvm))
1785 return PTR_ERR(kvm); 1775 return PTR_ERR(kvm);
1786 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET 1776 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
1787 r = kvm_coalesced_mmio_init(kvm); 1777 r = kvm_coalesced_mmio_init(kvm);
1788 if (r < 0) { 1778 if (r < 0) {
1789 kvm_put_kvm(kvm); 1779 kvm_put_kvm(kvm);
1790 return r; 1780 return r;
1791 } 1781 }
1792 #endif 1782 #endif
1793 fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR); 1783 fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
1794 if (fd < 0) 1784 if (fd < 0)
1795 kvm_put_kvm(kvm); 1785 kvm_put_kvm(kvm);
1796 1786
1797 return fd; 1787 return fd;
1798 } 1788 }
1799 1789
1800 static long kvm_dev_ioctl_check_extension_generic(long arg) 1790 static long kvm_dev_ioctl_check_extension_generic(long arg)
1801 { 1791 {
1802 switch (arg) { 1792 switch (arg) {
1803 case KVM_CAP_USER_MEMORY: 1793 case KVM_CAP_USER_MEMORY:
1804 case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: 1794 case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
1805 case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: 1795 case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS:
1806 #ifdef CONFIG_KVM_APIC_ARCHITECTURE 1796 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
1807 case KVM_CAP_SET_BOOT_CPU_ID: 1797 case KVM_CAP_SET_BOOT_CPU_ID:
1808 #endif 1798 #endif
1809 case KVM_CAP_INTERNAL_ERROR_DATA: 1799 case KVM_CAP_INTERNAL_ERROR_DATA:
1810 return 1; 1800 return 1;
1811 #ifdef CONFIG_HAVE_KVM_IRQCHIP 1801 #ifdef CONFIG_HAVE_KVM_IRQCHIP
1812 case KVM_CAP_IRQ_ROUTING: 1802 case KVM_CAP_IRQ_ROUTING:
1813 return KVM_MAX_IRQ_ROUTES; 1803 return KVM_MAX_IRQ_ROUTES;
1814 #endif 1804 #endif
1815 default: 1805 default:
1816 break; 1806 break;
1817 } 1807 }
1818 return kvm_dev_ioctl_check_extension(arg); 1808 return kvm_dev_ioctl_check_extension(arg);
1819 } 1809 }
1820 1810
1821 static long kvm_dev_ioctl(struct file *filp, 1811 static long kvm_dev_ioctl(struct file *filp,
1822 unsigned int ioctl, unsigned long arg) 1812 unsigned int ioctl, unsigned long arg)
1823 { 1813 {
1824 long r = -EINVAL; 1814 long r = -EINVAL;
1825 1815
1826 switch (ioctl) { 1816 switch (ioctl) {
1827 case KVM_GET_API_VERSION: 1817 case KVM_GET_API_VERSION:
1828 r = -EINVAL; 1818 r = -EINVAL;
1829 if (arg) 1819 if (arg)
1830 goto out; 1820 goto out;
1831 r = KVM_API_VERSION; 1821 r = KVM_API_VERSION;
1832 break; 1822 break;
1833 case KVM_CREATE_VM: 1823 case KVM_CREATE_VM:
1834 r = -EINVAL; 1824 r = -EINVAL;
1835 if (arg) 1825 if (arg)
1836 goto out; 1826 goto out;
1837 r = kvm_dev_ioctl_create_vm(); 1827 r = kvm_dev_ioctl_create_vm();
1838 break; 1828 break;
1839 case KVM_CHECK_EXTENSION: 1829 case KVM_CHECK_EXTENSION:
1840 r = kvm_dev_ioctl_check_extension_generic(arg); 1830 r = kvm_dev_ioctl_check_extension_generic(arg);
1841 break; 1831 break;
1842 case KVM_GET_VCPU_MMAP_SIZE: 1832 case KVM_GET_VCPU_MMAP_SIZE:
1843 r = -EINVAL; 1833 r = -EINVAL;
1844 if (arg) 1834 if (arg)
1845 goto out; 1835 goto out;
1846 r = PAGE_SIZE; /* struct kvm_run */ 1836 r = PAGE_SIZE; /* struct kvm_run */
1847 #ifdef CONFIG_X86 1837 #ifdef CONFIG_X86
1848 r += PAGE_SIZE; /* pio data page */ 1838 r += PAGE_SIZE; /* pio data page */
1849 #endif 1839 #endif
1850 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET 1840 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
1851 r += PAGE_SIZE; /* coalesced mmio ring page */ 1841 r += PAGE_SIZE; /* coalesced mmio ring page */
1852 #endif 1842 #endif
1853 break; 1843 break;
1854 case KVM_TRACE_ENABLE: 1844 case KVM_TRACE_ENABLE:
1855 case KVM_TRACE_PAUSE: 1845 case KVM_TRACE_PAUSE:
1856 case KVM_TRACE_DISABLE: 1846 case KVM_TRACE_DISABLE:
1857 r = -EOPNOTSUPP; 1847 r = -EOPNOTSUPP;
1858 break; 1848 break;
1859 default: 1849 default:
1860 return kvm_arch_dev_ioctl(filp, ioctl, arg); 1850 return kvm_arch_dev_ioctl(filp, ioctl, arg);
1861 } 1851 }
1862 out: 1852 out:
1863 return r; 1853 return r;
1864 } 1854 }
1865 1855
1866 static struct file_operations kvm_chardev_ops = { 1856 static struct file_operations kvm_chardev_ops = {
1867 .unlocked_ioctl = kvm_dev_ioctl, 1857 .unlocked_ioctl = kvm_dev_ioctl,
1868 .compat_ioctl = kvm_dev_ioctl, 1858 .compat_ioctl = kvm_dev_ioctl,
1869 }; 1859 };
1870 1860
1871 static struct miscdevice kvm_dev = { 1861 static struct miscdevice kvm_dev = {
1872 KVM_MINOR, 1862 KVM_MINOR,
1873 "kvm", 1863 "kvm",
1874 &kvm_chardev_ops, 1864 &kvm_chardev_ops,
1875 }; 1865 };
1876 1866
1877 static void hardware_enable(void *junk) 1867 static void hardware_enable(void *junk)
1878 { 1868 {
1879 int cpu = raw_smp_processor_id(); 1869 int cpu = raw_smp_processor_id();
1880 int r; 1870 int r;
1881 1871
1882 if (cpumask_test_cpu(cpu, cpus_hardware_enabled)) 1872 if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
1883 return; 1873 return;
1884 1874
1885 cpumask_set_cpu(cpu, cpus_hardware_enabled); 1875 cpumask_set_cpu(cpu, cpus_hardware_enabled);
1886 1876
1887 r = kvm_arch_hardware_enable(NULL); 1877 r = kvm_arch_hardware_enable(NULL);
1888 1878
1889 if (r) { 1879 if (r) {
1890 cpumask_clear_cpu(cpu, cpus_hardware_enabled); 1880 cpumask_clear_cpu(cpu, cpus_hardware_enabled);
1891 atomic_inc(&hardware_enable_failed); 1881 atomic_inc(&hardware_enable_failed);
1892 printk(KERN_INFO "kvm: enabling virtualization on " 1882 printk(KERN_INFO "kvm: enabling virtualization on "
1893 "CPU%d failed\n", cpu); 1883 "CPU%d failed\n", cpu);
1894 } 1884 }
1895 } 1885 }
1896 1886
1897 static void hardware_disable(void *junk) 1887 static void hardware_disable(void *junk)
1898 { 1888 {
1899 int cpu = raw_smp_processor_id(); 1889 int cpu = raw_smp_processor_id();
1900 1890
1901 if (!cpumask_test_cpu(cpu, cpus_hardware_enabled)) 1891 if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
1902 return; 1892 return;
1903 cpumask_clear_cpu(cpu, cpus_hardware_enabled); 1893 cpumask_clear_cpu(cpu, cpus_hardware_enabled);
1904 kvm_arch_hardware_disable(NULL); 1894 kvm_arch_hardware_disable(NULL);
1905 } 1895 }
1906 1896
1907 static void hardware_disable_all_nolock(void) 1897 static void hardware_disable_all_nolock(void)
1908 { 1898 {
1909 BUG_ON(!kvm_usage_count); 1899 BUG_ON(!kvm_usage_count);
1910 1900
1911 kvm_usage_count--; 1901 kvm_usage_count--;
1912 if (!kvm_usage_count) 1902 if (!kvm_usage_count)
1913 on_each_cpu(hardware_disable, NULL, 1); 1903 on_each_cpu(hardware_disable, NULL, 1);
1914 } 1904 }
1915 1905
1916 static void hardware_disable_all(void) 1906 static void hardware_disable_all(void)
1917 { 1907 {
1918 spin_lock(&kvm_lock); 1908 spin_lock(&kvm_lock);
1919 hardware_disable_all_nolock(); 1909 hardware_disable_all_nolock();
1920 spin_unlock(&kvm_lock); 1910 spin_unlock(&kvm_lock);
1921 } 1911 }
1922 1912
1923 static int hardware_enable_all(void) 1913 static int hardware_enable_all(void)
1924 { 1914 {
1925 int r = 0; 1915 int r = 0;
1926 1916
1927 spin_lock(&kvm_lock); 1917 spin_lock(&kvm_lock);
1928 1918
1929 kvm_usage_count++; 1919 kvm_usage_count++;
1930 if (kvm_usage_count == 1) { 1920 if (kvm_usage_count == 1) {
1931 atomic_set(&hardware_enable_failed, 0); 1921 atomic_set(&hardware_enable_failed, 0);
1932 on_each_cpu(hardware_enable, NULL, 1); 1922 on_each_cpu(hardware_enable, NULL, 1);
1933 1923
1934 if (atomic_read(&hardware_enable_failed)) { 1924 if (atomic_read(&hardware_enable_failed)) {
1935 hardware_disable_all_nolock(); 1925 hardware_disable_all_nolock();
1936 r = -EBUSY; 1926 r = -EBUSY;
1937 } 1927 }
1938 } 1928 }
1939 1929
1940 spin_unlock(&kvm_lock); 1930 spin_unlock(&kvm_lock);
1941 1931
1942 return r; 1932 return r;
1943 } 1933 }
1944 1934
1945 static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val, 1935 static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
1946 void *v) 1936 void *v)
1947 { 1937 {
1948 int cpu = (long)v; 1938 int cpu = (long)v;
1949 1939
1950 if (!kvm_usage_count) 1940 if (!kvm_usage_count)
1951 return NOTIFY_OK; 1941 return NOTIFY_OK;
1952 1942
1953 val &= ~CPU_TASKS_FROZEN; 1943 val &= ~CPU_TASKS_FROZEN;
1954 switch (val) { 1944 switch (val) {
1955 case CPU_DYING: 1945 case CPU_DYING:
1956 printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n", 1946 printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
1957 cpu); 1947 cpu);
1958 hardware_disable(NULL); 1948 hardware_disable(NULL);
1959 break; 1949 break;
1960 case CPU_ONLINE: 1950 case CPU_ONLINE:
1961 printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n", 1951 printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
1962 cpu); 1952 cpu);
1963 smp_call_function_single(cpu, hardware_enable, NULL, 1); 1953 smp_call_function_single(cpu, hardware_enable, NULL, 1);
1964 break; 1954 break;
1965 } 1955 }
1966 return NOTIFY_OK; 1956 return NOTIFY_OK;
1967 } 1957 }
1968 1958
1969 1959
1970 asmlinkage void kvm_handle_fault_on_reboot(void) 1960 asmlinkage void kvm_handle_fault_on_reboot(void)
1971 { 1961 {
1972 if (kvm_rebooting) 1962 if (kvm_rebooting)
1973 /* spin while reset goes on */ 1963 /* spin while reset goes on */
1974 while (true) 1964 while (true)
1975 ; 1965 ;
1976 /* Fault while not rebooting. We want the trace. */ 1966 /* Fault while not rebooting. We want the trace. */
1977 BUG(); 1967 BUG();
1978 } 1968 }
1979 EXPORT_SYMBOL_GPL(kvm_handle_fault_on_reboot); 1969 EXPORT_SYMBOL_GPL(kvm_handle_fault_on_reboot);
1980 1970
1981 static int kvm_reboot(struct notifier_block *notifier, unsigned long val, 1971 static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
1982 void *v) 1972 void *v)
1983 { 1973 {
1984 /* 1974 /*
1985 * Some (well, at least mine) BIOSes hang on reboot if 1975 * Some (well, at least mine) BIOSes hang on reboot if
1986 * in vmx root mode. 1976 * in vmx root mode.
1987 * 1977 *
1988 * And Intel TXT requires VMX to be off on all CPUs when the system shuts down. 1978 * And Intel TXT requires VMX to be off on all CPUs when the system shuts down.
1989 */ 1979 */
1990 printk(KERN_INFO "kvm: exiting hardware virtualization\n"); 1980 printk(KERN_INFO "kvm: exiting hardware virtualization\n");
1991 kvm_rebooting = true; 1981 kvm_rebooting = true;
1992 on_each_cpu(hardware_disable, NULL, 1); 1982 on_each_cpu(hardware_disable, NULL, 1);
1993 return NOTIFY_OK; 1983 return NOTIFY_OK;
1994 } 1984 }
1995 1985
1996 static struct notifier_block kvm_reboot_notifier = { 1986 static struct notifier_block kvm_reboot_notifier = {
1997 .notifier_call = kvm_reboot, 1987 .notifier_call = kvm_reboot,
1998 .priority = 0, 1988 .priority = 0,
1999 }; 1989 };
2000 1990
2001 static void kvm_io_bus_destroy(struct kvm_io_bus *bus) 1991 static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
2002 { 1992 {
2003 int i; 1993 int i;
2004 1994
2005 for (i = 0; i < bus->dev_count; i++) { 1995 for (i = 0; i < bus->dev_count; i++) {
2006 struct kvm_io_device *pos = bus->devs[i]; 1996 struct kvm_io_device *pos = bus->devs[i];
2007 1997
2008 kvm_iodevice_destructor(pos); 1998 kvm_iodevice_destructor(pos);
2009 } 1999 }
2010 kfree(bus); 2000 kfree(bus);
2011 } 2001 }
2012 2002
2013 /* kvm_io_bus_write - called under kvm->slots_lock */ 2003 /* kvm_io_bus_write - called under kvm->slots_lock */
2014 int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, 2004 int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
2015 int len, const void *val) 2005 int len, const void *val)
2016 { 2006 {
2017 int i; 2007 int i;
2018 struct kvm_io_bus *bus; 2008 struct kvm_io_bus *bus;
2019 2009
2020 bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu); 2010 bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
2021 for (i = 0; i < bus->dev_count; i++) 2011 for (i = 0; i < bus->dev_count; i++)
2022 if (!kvm_iodevice_write(bus->devs[i], addr, len, val)) 2012 if (!kvm_iodevice_write(bus->devs[i], addr, len, val))
2023 return 0; 2013 return 0;
2024 return -EOPNOTSUPP; 2014 return -EOPNOTSUPP;
2025 } 2015 }
2026 2016
2027 /* kvm_io_bus_read - called under kvm->slots_lock */ 2017 /* kvm_io_bus_read - called under kvm->slots_lock */
2028 int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, 2018 int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
2029 int len, void *val) 2019 int len, void *val)
2030 { 2020 {
2031 int i; 2021 int i;
2032 struct kvm_io_bus *bus; 2022 struct kvm_io_bus *bus;
2033 2023
2034 bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu); 2024 bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
2035 for (i = 0; i < bus->dev_count; i++) 2025 for (i = 0; i < bus->dev_count; i++)
2036 if (!kvm_iodevice_read(bus->devs[i], addr, len, val)) 2026 if (!kvm_iodevice_read(bus->devs[i], addr, len, val))
2037 return 0; 2027 return 0;
2038 return -EOPNOTSUPP; 2028 return -EOPNOTSUPP;
2039 } 2029 }
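kvm_io_bus_write()/kvm_io_bus_read() scan the per-bus device list and let the first device that claims the address handle the access; -EOPNOTSUPP means nobody claimed it. A hedged usage sketch, assuming the caller is inside an SRCU read-side critical section on kvm->srcu (as the vcpu run path is) or holds slots_lock, which is what makes the srcu_dereference() above legal; the helper name is made up:

	/* Hypothetical MMIO dispatch helper, not from this commit. */
	static int example_mmio_store(struct kvm_vcpu *vcpu, gpa_t gpa,
				      int len, const void *data)
	{
		/* 0 if some registered device handled it, -EOPNOTSUPP otherwise */
		return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, gpa, len, data);
	}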

/* Caller must hold slots_lock. */
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
			    struct kvm_io_device *dev)
{
	struct kvm_io_bus *new_bus, *bus;

	bus = kvm->buses[bus_idx];
	if (bus->dev_count > NR_IOBUS_DEVS-1)
		return -ENOSPC;

	new_bus = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL);
	if (!new_bus)
		return -ENOMEM;
	memcpy(new_bus, bus, sizeof(struct kvm_io_bus));
	new_bus->devs[new_bus->dev_count++] = dev;
	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
	synchronize_srcu_expedited(&kvm->srcu);
	kfree(bus);

	return 0;
}

/* Caller must hold slots_lock. */
int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
			      struct kvm_io_device *dev)
{
	int i, r;
	struct kvm_io_bus *new_bus, *bus;

	new_bus = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL);
	if (!new_bus)
		return -ENOMEM;

	bus = kvm->buses[bus_idx];
	memcpy(new_bus, bus, sizeof(struct kvm_io_bus));

	r = -ENOENT;
	for (i = 0; i < new_bus->dev_count; i++)
		if (new_bus->devs[i] == dev) {
			r = 0;
			new_bus->devs[i] = new_bus->devs[--new_bus->dev_count];
			break;
		}

	if (r) {
		kfree(new_bus);
		return r;
	}

	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
	synchronize_srcu_expedited(&kvm->srcu);
	kfree(bus);
	return r;
}
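Registration and unregistration both use a copy-update scheme: build a new kvm_io_bus, publish it with rcu_assign_pointer(), wait out existing readers with synchronize_srcu_expedited(), then free the old copy. A sketch of a device hooking into this, assuming the kvm_io_device_ops/kvm_iodevice_init() interface from virt/kvm/iodev.h and that slots_lock is the mutex callers take in this era; all example_* names are invented:

	static int example_dev_write(struct kvm_io_device *this, gpa_t addr,
				     int len, const void *val)
	{
		/* return 0 to claim the access, -EOPNOTSUPP to pass it on */
		return -EOPNOTSUPP;
	}

	static const struct kvm_io_device_ops example_dev_ops = {
		.write = example_dev_write,
	};

	static int example_attach(struct kvm *kvm, struct kvm_io_device *dev)
	{
		int r;

		kvm_iodevice_init(dev, &example_dev_ops);

		mutex_lock(&kvm->slots_lock);
		r = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, dev);
		mutex_unlock(&kvm->slots_lock);

		return r;
	}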

static struct notifier_block kvm_cpu_notifier = {
	.notifier_call = kvm_cpu_hotplug,
	.priority = 20, /* must be > scheduler priority */
};

static int vm_stat_get(void *_offset, u64 *val)
{
	unsigned offset = (long)_offset;
	struct kvm *kvm;

	*val = 0;
	spin_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list)
		*val += *(u32 *)((void *)kvm + offset);
	spin_unlock(&kvm_lock);
	return 0;
}

DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, NULL, "%llu\n");

static int vcpu_stat_get(void *_offset, u64 *val)
{
	unsigned offset = (long)_offset;
	struct kvm *kvm;
	struct kvm_vcpu *vcpu;
	int i;

	*val = 0;
	spin_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list)
		kvm_for_each_vcpu(i, vcpu, kvm)
			*val += *(u32 *)((void *)vcpu + offset);

	spin_unlock(&kvm_lock);
	return 0;
}

DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, NULL, "%llu\n");

static const struct file_operations *stat_fops[] = {
	[KVM_STAT_VCPU] = &vcpu_stat_fops,
	[KVM_STAT_VM]   = &vm_stat_fops,
};

static void kvm_init_debug(void)
{
	struct kvm_stats_debugfs_item *p;

	kvm_debugfs_dir = debugfs_create_dir("kvm", NULL);
	for (p = debugfs_entries; p->name; ++p)
		p->dentry = debugfs_create_file(p->name, 0444, kvm_debugfs_dir,
						(void *)(long)p->offset,
						stat_fops[p->kind]);
}

static void kvm_exit_debug(void)
{
	struct kvm_stats_debugfs_item *p;

	for (p = debugfs_entries; p->name; ++p)
		debugfs_remove(p->dentry);
	debugfs_remove(kvm_debugfs_dir);
}
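vm_stat_get()/vcpu_stat_get() walk vm_list and sum a u32 found at a fixed byte offset inside struct kvm or struct kvm_vcpu; the offsets and kinds come from the arch-provided debugfs_entries[] table, which kvm_init_debug() turns into read-only files under debugfs/kvm. Roughly what that table looks like; the real one lives in the arch code (x86.c in this era) and the two entries below are only illustrative:

	#define VM_STAT(x)   offsetof(struct kvm, stat.x), KVM_STAT_VM
	#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU

	struct kvm_stats_debugfs_item debugfs_entries[] = {
		{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
		{ "halt_exits",       VCPU_STAT(halt_exits) },
		{ NULL }
	};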

static int kvm_suspend(struct sys_device *dev, pm_message_t state)
{
	if (kvm_usage_count)
		hardware_disable(NULL);
	return 0;
}

static int kvm_resume(struct sys_device *dev)
{
	if (kvm_usage_count)
		hardware_enable(NULL);
	return 0;
}

static struct sysdev_class kvm_sysdev_class = {
	.name = "kvm",
	.suspend = kvm_suspend,
	.resume = kvm_resume,
};

static struct sys_device kvm_sysdev = {
	.id = 0,
	.cls = &kvm_sysdev_class,
};

struct page *bad_page;
pfn_t bad_pfn;

static inline
struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
{
	return container_of(pn, struct kvm_vcpu, preempt_notifier);
}

static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	kvm_arch_vcpu_load(vcpu, cpu);
}

static void kvm_sched_out(struct preempt_notifier *pn,
			  struct task_struct *next)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	kvm_arch_vcpu_put(vcpu);
}
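kvm_sched_in()/kvm_sched_out() are preempt-notifier hooks, so per-CPU vcpu state follows the task that runs the vcpu across preemption and migration. A rough sketch of how they get wired up; the real code is in the vcpu creation and vcpu_load() paths earlier in this file, outside this hunk:

	/* Illustrative only; mirrors what vcpu creation and vcpu_load() do. */
	static void example_wire_up(struct kvm_vcpu *vcpu)
	{
		int cpu;

		/* once, when the vcpu is created */
		preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);

		/* every time a task starts running this vcpu */
		cpu = get_cpu();
		preempt_notifier_register(&vcpu->preempt_notifier);
		kvm_arch_vcpu_load(vcpu, cpu);
		put_cpu();
	}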

int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
	     struct module *module)
{
	int r;
	int cpu;

	r = kvm_arch_init(opaque);
	if (r)
		goto out_fail;

	bad_page = alloc_page(GFP_KERNEL | __GFP_ZERO);

	if (bad_page == NULL) {
		r = -ENOMEM;
		goto out;
	}

	bad_pfn = page_to_pfn(bad_page);

	hwpoison_page = alloc_page(GFP_KERNEL | __GFP_ZERO);

	if (hwpoison_page == NULL) {
		r = -ENOMEM;
		goto out_free_0;
	}

	hwpoison_pfn = page_to_pfn(hwpoison_page);

	if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
		r = -ENOMEM;
		goto out_free_0;
	}

	r = kvm_arch_hardware_setup();
	if (r < 0)
		goto out_free_0a;

	for_each_online_cpu(cpu) {
		smp_call_function_single(cpu,
				kvm_arch_check_processor_compat,
				&r, 1);
		if (r < 0)
			goto out_free_1;
	}

	r = register_cpu_notifier(&kvm_cpu_notifier);
	if (r)
		goto out_free_2;
	register_reboot_notifier(&kvm_reboot_notifier);

	r = sysdev_class_register(&kvm_sysdev_class);
	if (r)
		goto out_free_3;

	r = sysdev_register(&kvm_sysdev);
	if (r)
		goto out_free_4;

	/* A kmem cache lets us meet the alignment requirements of fx_save. */
	if (!vcpu_align)
		vcpu_align = __alignof__(struct kvm_vcpu);
	kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align,
					   0, NULL);
	if (!kvm_vcpu_cache) {
		r = -ENOMEM;
		goto out_free_5;
	}

	kvm_chardev_ops.owner = module;
	kvm_vm_fops.owner = module;
	kvm_vcpu_fops.owner = module;

	r = misc_register(&kvm_dev);
	if (r) {
		printk(KERN_ERR "kvm: misc device register failed\n");
		goto out_free;
	}

	kvm_preempt_ops.sched_in = kvm_sched_in;
	kvm_preempt_ops.sched_out = kvm_sched_out;

	kvm_init_debug();

	return 0;

out_free:
	kmem_cache_destroy(kvm_vcpu_cache);
out_free_5:
	sysdev_unregister(&kvm_sysdev);
out_free_4:
	sysdev_class_unregister(&kvm_sysdev_class);
out_free_3:
	unregister_reboot_notifier(&kvm_reboot_notifier);
	unregister_cpu_notifier(&kvm_cpu_notifier);
out_free_2:
out_free_1:
	kvm_arch_hardware_unsetup();
out_free_0a:
	free_cpumask_var(cpus_hardware_enabled);
out_free_0:
	if (hwpoison_page)
		__free_page(hwpoison_page);
	__free_page(bad_page);
out:
	kvm_arch_exit();
out_fail:
	return r;
}
EXPORT_SYMBOL_GPL(kvm_init);
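kvm_init() is the common entry point the arch modules call from their module_init(); kvm_exit() below is the counterpart called from module_exit(). A hedged sketch of that glue, with made-up example_arch_* names standing in for what kvm-intel/kvm-amd actually pass:

	static int __init example_arch_init(void)
	{
		/* vcpu_align == 0 would fall back to __alignof__(struct kvm_vcpu) */
		return kvm_init(&example_arch_ops, sizeof(struct example_arch_vcpu),
				__alignof__(struct example_arch_vcpu), THIS_MODULE);
	}

	static void __exit example_arch_exit(void)
	{
		kvm_exit();
	}

	module_init(example_arch_init);
	module_exit(example_arch_exit);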

void kvm_exit(void)
{
	kvm_exit_debug();
	misc_deregister(&kvm_dev);
	kmem_cache_destroy(kvm_vcpu_cache);
	sysdev_unregister(&kvm_sysdev);
	sysdev_class_unregister(&kvm_sysdev_class);
	unregister_reboot_notifier(&kvm_reboot_notifier);
	unregister_cpu_notifier(&kvm_cpu_notifier);
	on_each_cpu(hardware_disable, NULL, 1);
	kvm_arch_hardware_unsetup();
	kvm_arch_exit();
	free_cpumask_var(cpus_hardware_enabled);
	__free_page(hwpoison_page);
	__free_page(bad_page);
}
EXPORT_SYMBOL_GPL(kvm_exit);
