Commit a1f4d39500ad8ed61825eff061debff42386ab5b
1 parent: fc34531db3
Exists in master and in 4 other branches.

KVM: Remove memory alias support

As advertised in feature-removal-schedule.txt. Equivalent support is
provided by overlapping memory regions.

Signed-off-by: Avi Kivity <avi@redhat.com>
Showing 13 changed files with 11 additions and 225 deletions
- Documentation/feature-removal-schedule.txt
- Documentation/kvm/api.txt
- arch/ia64/kvm/kvm-ia64.c
- arch/powerpc/kvm/powerpc.c
- arch/s390/kvm/kvm-s390.c
- arch/x86/include/asm/kvm_host.h
- arch/x86/kvm/mmu.c
- arch/x86/kvm/paging_tmpl.h
- arch/x86/kvm/x86.c
- arch/x86/kvm/x86.h
- include/linux/kvm.h
- include/linux/kvm_host.h
- virt/kvm/kvm_main.c
Documentation/feature-removal-schedule.txt
1 | The following is a list of files and features that are going to be | 1 | The following is a list of files and features that are going to be |
2 | removed in the kernel source tree. Every entry should contain what | 2 | removed in the kernel source tree. Every entry should contain what |
3 | exactly is going away, why it is happening, and who is going to be doing | 3 | exactly is going away, why it is happening, and who is going to be doing |
4 | the work. When the feature is removed from the kernel, it should also | 4 | the work. When the feature is removed from the kernel, it should also |
5 | be removed from this file. | 5 | be removed from this file. |
6 | 6 | ||
7 | --------------------------- | 7 | --------------------------- |
8 | 8 | ||
9 | What: PRISM54 | 9 | What: PRISM54 |
10 | When: 2.6.34 | 10 | When: 2.6.34 |
11 | 11 | ||
12 | Why: prism54 FullMAC PCI / Cardbus devices used to be supported only by the | 12 | Why: prism54 FullMAC PCI / Cardbus devices used to be supported only by the |
13 | prism54 wireless driver. After Intersil stopped selling these | 13 | prism54 wireless driver. After Intersil stopped selling these |
14 | devices in preference for the newer, more flexible SoftMAC devices, | 14 | devices in preference for the newer, more flexible SoftMAC devices, |
15 | a SoftMAC device driver was required and prism54 did not support | 15 | a SoftMAC device driver was required and prism54 did not support |
16 | them. The p54pci driver now exists and has been present in the kernel for | 16 | them. The p54pci driver now exists and has been present in the kernel for |
17 | a while. This driver supports both SoftMAC devices and FullMAC devices. | 17 | a while. This driver supports both SoftMAC devices and FullMAC devices. |
18 | The main difference between these devices was the amount of memory which | 18 | The main difference between these devices was the amount of memory which |
19 | could be used for the firmware. The SoftMAC devices support a smaller | 19 | could be used for the firmware. The SoftMAC devices support a smaller |
20 | amount of memory. Because of this the SoftMAC firmware fits into FullMAC | 20 | amount of memory. Because of this the SoftMAC firmware fits into FullMAC |
21 | devices' memory. p54pci supports not only PCI / Cardbus but also USB | 21 | devices' memory. p54pci supports not only PCI / Cardbus but also USB |
22 | and SPI. Since p54pci supports all devices that prism54 supports, | 22 | and SPI. Since p54pci supports all devices that prism54 supports, |
23 | you will have a conflict. I'm not quite sure how distributions are | 23 | you will have a conflict. I'm not quite sure how distributions are |
24 | handling this conflict right now. prism54 was kept around due to | 24 | handling this conflict right now. prism54 was kept around due to |
25 | claims users may experience issues when using the SoftMAC driver. | 25 | claims users may experience issues when using the SoftMAC driver. |
26 | Time has passed and users have not reported issues. If you use prism54 | 26 | Time has passed and users have not reported issues. If you use prism54 |
27 | and for whatever reason you cannot use p54pci please let us know! | 27 | and for whatever reason you cannot use p54pci please let us know! |
28 | E-mail us at: linux-wireless@vger.kernel.org | 28 | E-mail us at: linux-wireless@vger.kernel.org |
29 | 29 | ||
30 | For more information see the p54 wiki page: | 30 | For more information see the p54 wiki page: |
31 | 31 | ||
32 | http://wireless.kernel.org/en/users/Drivers/p54 | 32 | http://wireless.kernel.org/en/users/Drivers/p54 |
33 | 33 | ||
34 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> | 34 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> |
35 | 35 | ||
36 | --------------------------- | 36 | --------------------------- |
37 | 37 | ||
38 | What: IRQF_SAMPLE_RANDOM | 38 | What: IRQF_SAMPLE_RANDOM |
39 | Check: IRQF_SAMPLE_RANDOM | 39 | Check: IRQF_SAMPLE_RANDOM |
40 | When: July 2009 | 40 | When: July 2009 |
41 | 41 | ||
42 | Why: Many of the IRQF_SAMPLE_RANDOM users are technically bogus as entropy | 42 | Why: Many of the IRQF_SAMPLE_RANDOM users are technically bogus as entropy |
43 | sources in the kernel's current entropy model. To resolve this, every | 43 | sources in the kernel's current entropy model. To resolve this, every |
44 | input point to the kernel's entropy pool needs to better document the | 44 | input point to the kernel's entropy pool needs to better document the |
45 | type of entropy source it actually is. This will be replaced with | 45 | type of entropy source it actually is. This will be replaced with |
46 | additional add_*_randomness functions in drivers/char/random.c | 46 | additional add_*_randomness functions in drivers/char/random.c |
47 | 47 | ||
48 | Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com> | 48 | Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com> |
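For most drivers the conversion is a one-line change to the request_irq()
call; a minimal sketch, with hypothetical handler and device names:

    /* before: entropy contributed via the generic IRQ flag (deprecated) */
    err = request_irq(irq, my_handler, IRQF_SAMPLE_RANDOM | IRQF_SHARED,
                      "mydev", dev);

    /* after: drop the flag; a device that really is an entropy source
     * should feed the pool explicitly, e.g. through one of the
     * add_*_randomness() helpers in drivers/char/random.c */
    err = request_irq(irq, my_handler, IRQF_SHARED, "mydev", dev);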
49 | 49 | ||
50 | --------------------------- | 50 | --------------------------- |
51 | 51 | ||
52 | What: Deprecated snapshot ioctls | 52 | What: Deprecated snapshot ioctls |
53 | When: 2.6.36 | 53 | When: 2.6.36 |
54 | 54 | ||
55 | Why: The ioctls in kernel/power/user.c were marked as deprecated a long time | 55 | Why: The ioctls in kernel/power/user.c were marked as deprecated a long time |
56 | ago. Now they warn users of the deprecation so that they can update | 56 | ago. Now they warn users of the deprecation so that they can update |
57 | their userspace. After some more time, remove them completely. | 57 | their userspace. After some more time, remove them completely. |
58 | 58 | ||
59 | Who: Jiri Slaby <jirislaby@gmail.com> | 59 | Who: Jiri Slaby <jirislaby@gmail.com> |
60 | 60 | ||
61 | --------------------------- | 61 | --------------------------- |
62 | 62 | ||
63 | What: The ieee80211_regdom module parameter | 63 | What: The ieee80211_regdom module parameter |
64 | When: March 2010 / desktop catchup | 64 | When: March 2010 / desktop catchup |
65 | 65 | ||
66 | Why: This was inherited from the CONFIG_WIRELESS_OLD_REGULATORY code, | 66 | Why: This was inherited from the CONFIG_WIRELESS_OLD_REGULATORY code, |
67 | and currently serves as an option for users to define an | 67 | and currently serves as an option for users to define an |
68 | ISO / IEC 3166 alpha2 code for the country they are currently | 68 | ISO / IEC 3166 alpha2 code for the country they are currently |
69 | present in. Although there are userspace API replacements for this | 69 | present in. Although there are userspace API replacements for this |
70 | through nl80211, distributions haven't yet caught up with implementing | 70 | through nl80211, distributions haven't yet caught up with implementing |
71 | decent alternatives through standard GUIs. Although available as an | 71 | decent alternatives through standard GUIs. Although available as an |
72 | option through iw or wpa_supplicant, it's just a matter of time before | 72 | option through iw or wpa_supplicant, it's just a matter of time before |
73 | distributions pick up good GUI options for this. The ideal solution | 73 | distributions pick up good GUI options for this. The ideal solution |
74 | would actually consist of intelligent designs which would do this for | 74 | would actually consist of intelligent designs which would do this for |
75 | the user automatically even when travelling through different countries. | 75 | the user automatically even when travelling through different countries. |
76 | Until then we leave this module parameter as a compromise. | 76 | Until then we leave this module parameter as a compromise. |
77 | 77 | ||
78 | When userspace improves with reasonable widely-available alternatives for | 78 | When userspace improves with reasonable widely-available alternatives for |
79 | this we will no longer need this module parameter. This entry hopes that | 79 | this we will no longer need this module parameter. This entry hopes that |
80 | by the super-futuristic-looking date of "March 2010" we will have | 80 | by the super-futuristic-looking date of "March 2010" we will have |
81 | such replacements widely available. | 81 | such replacements widely available. |
82 | 82 | ||
83 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> | 83 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> |
84 | 84 | ||
85 | --------------------------- | 85 | --------------------------- |
86 | 86 | ||
87 | What: dev->power.power_state | 87 | What: dev->power.power_state |
88 | When: July 2007 | 88 | When: July 2007 |
89 | Why: Broken design for runtime control over driver power states, confusing | 89 | Why: Broken design for runtime control over driver power states, confusing |
90 | driver-internal runtime power management with: mechanisms to support | 90 | driver-internal runtime power management with: mechanisms to support |
91 | system-wide sleep state transitions; event codes that distinguish | 91 | system-wide sleep state transitions; event codes that distinguish |
92 | different phases of swsusp "sleep" transitions; and userspace policy | 92 | different phases of swsusp "sleep" transitions; and userspace policy |
93 | inputs. This framework was never widely used, and most attempts to | 93 | inputs. This framework was never widely used, and most attempts to |
94 | use it were broken. Drivers should instead be exposing domain-specific | 94 | use it were broken. Drivers should instead be exposing domain-specific |
95 | interfaces either to kernel or to userspace. | 95 | interfaces either to kernel or to userspace. |
96 | Who: Pavel Machek <pavel@suse.cz> | 96 | Who: Pavel Machek <pavel@suse.cz> |
97 | 97 | ||
98 | --------------------------- | 98 | --------------------------- |
99 | 99 | ||
100 | What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. | 100 | What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. |
101 | When: July 2009 | 101 | When: July 2009 |
102 | Files: include/linux/videodev.h | 102 | Files: include/linux/videodev.h |
103 | Check: include/linux/videodev.h | 103 | Check: include/linux/videodev.h |
104 | Why: V4L1 API was replaced by the V4L2 API during the migration from 2.4 to 2.6 | 104 | Why: V4L1 API was replaced by the V4L2 API during the migration from 2.4 to 2.6 |
105 | series. The old API has lots of drawbacks and doesn't provide enough | 105 | series. The old API has lots of drawbacks and doesn't provide enough |
106 | means to work with all video and audio standards. The newer API is | 106 | means to work with all video and audio standards. The newer API is |
107 | already available on the main drivers and should be used instead. | 107 | already available on the main drivers and should be used instead. |
108 | Newer drivers should use the v4l_compat_translate_ioctl function to handle | 108 | Newer drivers should use the v4l_compat_translate_ioctl function to handle |
109 | old calls, translating them to the newer ones. | 109 | old calls, translating them to the newer ones. |
110 | Decoder ioctls are used internally to allow video drivers to | 110 | Decoder ioctls are used internally to allow video drivers to |
111 | communicate with video decoders. This should also be improved to allow | 111 | communicate with video decoders. This should also be improved to allow |
112 | V4L2 calls to be translated into compatible internal ioctls. | 112 | V4L2 calls to be translated into compatible internal ioctls. |
113 | Compatibility ioctls will be provided, for a while, via | 113 | Compatibility ioctls will be provided, for a while, via |
114 | v4l1-compat module. | 114 | v4l1-compat module. |
115 | Who: Mauro Carvalho Chehab <mchehab@infradead.org> | 115 | Who: Mauro Carvalho Chehab <mchehab@infradead.org> |
116 | 116 | ||
117 | --------------------------- | 117 | --------------------------- |
118 | 118 | ||
119 | What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) | 119 | What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) |
120 | When: 2.6.35/2.6.36 | 120 | When: 2.6.35/2.6.36 |
121 | Files: drivers/pcmcia/: pcmcia_ioctl.c | 121 | Files: drivers/pcmcia/: pcmcia_ioctl.c |
122 | Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a | 122 | Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a |
123 | normal hotpluggable bus, and with it using the default kernel | 123 | normal hotpluggable bus, and with it using the default kernel |
124 | infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA | 124 | infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA |
125 | control ioctl needed by cardmgr and cardctl from pcmcia-cs is | 125 | control ioctl needed by cardmgr and cardctl from pcmcia-cs is |
126 | unnecessary and potentially harmful (it does not provide for | 126 | unnecessary and potentially harmful (it does not provide for |
127 | proper locking), and makes further cleanups and integration of the | 127 | proper locking), and makes further cleanups and integration of the |
128 | PCMCIA subsystem into the Linux kernel device driver model more | 128 | PCMCIA subsystem into the Linux kernel device driver model more |
129 | difficult. The features provided by cardmgr and cardctl are either | 129 | difficult. The features provided by cardmgr and cardctl are either |
130 | handled by the kernel itself now or are available in the new | 130 | handled by the kernel itself now or are available in the new |
131 | pcmciautils package available at | 131 | pcmciautils package available at |
132 | http://kernel.org/pub/linux/utils/kernel/pcmcia/ | 132 | http://kernel.org/pub/linux/utils/kernel/pcmcia/ |
133 | 133 | ||
134 | For all architectures except ARM, the associated config symbol | 134 | For all architectures except ARM, the associated config symbol |
135 | has been removed from kernel 2.6.34; for ARM, it will likely | 135 | has been removed from kernel 2.6.34; for ARM, it will likely |
136 | be removed from kernel 2.6.35. The actual code will then likely | 136 | be removed from kernel 2.6.35. The actual code will then likely |
137 | be removed from kernel 2.6.36. | 137 | be removed from kernel 2.6.36. |
138 | Who: Dominik Brodowski <linux@dominikbrodowski.net> | 138 | Who: Dominik Brodowski <linux@dominikbrodowski.net> |
139 | 139 | ||
140 | --------------------------- | 140 | --------------------------- |
141 | 141 | ||
142 | What: sys_sysctl | 142 | What: sys_sysctl |
143 | When: September 2010 | 143 | When: September 2010 |
144 | Option: CONFIG_SYSCTL_SYSCALL | 144 | Option: CONFIG_SYSCTL_SYSCALL |
145 | Why: The same information is available in a more convenient form at | 145 | Why: The same information is available in a more convenient form at |
146 | /proc/sys, and none of the sysctl variables appear to be | 146 | /proc/sys, and none of the sysctl variables appear to be |
147 | important performance wise. | 147 | important performance wise. |
148 | 148 | ||
149 | Binary sysctls are a long standing source of subtle kernel | 149 | Binary sysctls are a long standing source of subtle kernel |
150 | bugs and security issues. | 150 | bugs and security issues. |
151 | 151 | ||
152 | When I looked several months ago all I could find after | 152 | When I looked several months ago all I could find after |
153 | searching several distributions were 5 user space programs and | 153 | searching several distributions were 5 user space programs and |
154 | glibc (which falls back to /proc/sys) using this syscall. | 154 | glibc (which falls back to /proc/sys) using this syscall. |
155 | 155 | ||
156 | The man page for sysctl(2) documents it as unusable for user | 156 | The man page for sysctl(2) documents it as unusable for user |
157 | space programs. | 157 | space programs. |
158 | 158 | ||
159 | sysctl(2) is not generally ABI compatible for a 32bit user | 159 | sysctl(2) is not generally ABI compatible for a 32bit user |
160 | space application across 64bit and 32bit kernels. | 160 | space application across 64bit and 32bit kernels. |
161 | 161 | ||
162 | For the last several months the policy has been no new binary | 162 | For the last several months the policy has been no new binary |
163 | sysctls and no one has put forward an argument to use them. | 163 | sysctls and no one has put forward an argument to use them. |
164 | 164 | ||
165 | Issues with binary sysctls seem to keep appearing, so | 165 | Issues with binary sysctls seem to keep appearing, so |
166 | properly deprecating them (with a warning to user space) and a | 166 | properly deprecating them (with a warning to user space) and a |
167 | 2 year grace warning period will mean we can eventually kill | 167 | 2 year grace warning period will mean we can eventually kill |
168 | them and end the pain. | 168 | them and end the pain. |
169 | 169 | ||
170 | In the mean time individual binary sysctls can be dealt with | 170 | In the mean time individual binary sysctls can be dealt with |
171 | in a piecewise fashion. | 171 | in a piecewise fashion. |
172 | 172 | ||
173 | Who: Eric Biederman <ebiederm@xmission.com> | 173 | Who: Eric Biederman <ebiederm@xmission.com> |
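For comparison, reading the same information through /proc/sys needs no
special syscall, only plain file I/O; a minimal userspace sketch (the
variable chosen is just an example):

    #include <fcntl.h>
    #include <unistd.h>

    char buf[64];
    int fd = open("/proc/sys/kernel/ostype", O_RDONLY); /* KERN_OSTYPE */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0)
            buf[n] = '\0';  /* typically "Linux\n" */
    close(fd);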
174 | 174 | ||
175 | --------------------------- | 175 | --------------------------- |
176 | 176 | ||
177 | What: remove EXPORT_SYMBOL(kernel_thread) | 177 | What: remove EXPORT_SYMBOL(kernel_thread) |
178 | When: August 2006 | 178 | When: August 2006 |
179 | Files: arch/*/kernel/*_ksyms.c | 179 | Files: arch/*/kernel/*_ksyms.c |
180 | Check: kernel_thread | 180 | Check: kernel_thread |
181 | Why: kernel_thread is a low-level implementation detail. Drivers should | 181 | Why: kernel_thread is a low-level implementation detail. Drivers should |
182 | use the <linux/kthread.h> API instead which shields them from | 182 | use the <linux/kthread.h> API instead which shields them from |
183 | implementation details and provides a higher-level interface that | 183 | implementation details and provides a higher-level interface that |
184 | prevents bugs and code duplication. | 184 | prevents bugs and code duplication. |
185 | Who: Christoph Hellwig <hch@lst.de> | 185 | Who: Christoph Hellwig <hch@lst.de> |
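A minimal sketch of the replacement, with hypothetical function and
thread names:

    #include <linux/kthread.h>
    #include <linux/sched.h>

    static int my_worker(void *data)
    {
            while (!kthread_should_stop())
                    schedule_timeout_interruptible(HZ); /* periodic work */
            return 0;
    }

    /* creation, in place of a raw kernel_thread() call */
    struct task_struct *task = kthread_run(my_worker, NULL, "my_worker");

    /* teardown, when the driver is done with the thread */
    kthread_stop(task);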
186 | 186 | ||
187 | --------------------------- | 187 | --------------------------- |
188 | 188 | ||
189 | What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports | 189 | What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports |
190 | (temporary transition config option provided until then) | 190 | (temporary transition config option provided until then) |
191 | The transition config option will also be removed at the same time. | 191 | The transition config option will also be removed at the same time. |
192 | When: before 2.6.19 | 192 | When: before 2.6.19 |
193 | Why: Unused symbols are both increasing the size of the kernel binary | 193 | Why: Unused symbols are both increasing the size of the kernel binary |
194 | and are often a sign of "wrong API" | 194 | and are often a sign of "wrong API" |
195 | Who: Arjan van de Ven <arjan@linux.intel.com> | 195 | Who: Arjan van de Ven <arjan@linux.intel.com> |
196 | 196 | ||
197 | --------------------------- | 197 | --------------------------- |
198 | 198 | ||
199 | What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment | 199 | What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment |
200 | When: October 2008 | 200 | When: October 2008 |
201 | Why: The stacking of class devices makes these values misleading and | 201 | Why: The stacking of class devices makes these values misleading and |
202 | inconsistent. | 202 | inconsistent. |
203 | Class devices should not carry any of these properties, and bus | 203 | Class devices should not carry any of these properties, and bus |
204 | devices have SUBSYSTEM and DRIVER as a replacement. | 204 | devices have SUBSYSTEM and DRIVER as a replacement. |
205 | Who: Kay Sievers <kay.sievers@suse.de> | 205 | Who: Kay Sievers <kay.sievers@suse.de> |
206 | 206 | ||
207 | --------------------------- | 207 | --------------------------- |
208 | 208 | ||
209 | What: ACPI procfs interface | 209 | What: ACPI procfs interface |
210 | When: July 2008 | 210 | When: July 2008 |
211 | Why: ACPI sysfs conversion should be finished by January 2008. | 211 | Why: ACPI sysfs conversion should be finished by January 2008. |
212 | ACPI procfs interface will be removed in July 2008 so that | 212 | ACPI procfs interface will be removed in July 2008 so that |
213 | there is enough time for user space to catch up. | 213 | there is enough time for user space to catch up. |
214 | Who: Zhang Rui <rui.zhang@intel.com> | 214 | Who: Zhang Rui <rui.zhang@intel.com> |
215 | 215 | ||
216 | --------------------------- | 216 | --------------------------- |
217 | 217 | ||
218 | What: /proc/acpi/button | 218 | What: /proc/acpi/button |
219 | When: August 2007 | 219 | When: August 2007 |
220 | Why: /proc/acpi/button has been replaced by events to the input layer | 220 | Why: /proc/acpi/button has been replaced by events to the input layer |
221 | since 2.6.20. | 221 | since 2.6.20. |
222 | Who: Len Brown <len.brown@intel.com> | 222 | Who: Len Brown <len.brown@intel.com> |
223 | 223 | ||
224 | --------------------------- | 224 | --------------------------- |
225 | 225 | ||
226 | What: /proc/acpi/event | 226 | What: /proc/acpi/event |
227 | When: February 2008 | 227 | When: February 2008 |
228 | Why: /proc/acpi/event has been replaced by events via the input layer | 228 | Why: /proc/acpi/event has been replaced by events via the input layer |
229 | and netlink since 2.6.23. | 229 | and netlink since 2.6.23. |
230 | Who: Len Brown <len.brown@intel.com> | 230 | Who: Len Brown <len.brown@intel.com> |
231 | 231 | ||
232 | --------------------------- | 232 | --------------------------- |
233 | 233 | ||
234 | What: i386/x86_64 bzImage symlinks | 234 | What: i386/x86_64 bzImage symlinks |
235 | When: April 2010 | 235 | When: April 2010 |
236 | 236 | ||
237 | Why: The i386/x86_64 merge provides a symlink to the old bzImage | 237 | Why: The i386/x86_64 merge provides a symlink to the old bzImage |
238 | location so that not-yet-updated user space tools, e.g. package | 238 | location so that not-yet-updated user space tools, e.g. package |
239 | scripts, do not break. | 239 | scripts, do not break. |
240 | Who: Thomas Gleixner <tglx@linutronix.de> | 240 | Who: Thomas Gleixner <tglx@linutronix.de> |
241 | 241 | ||
242 | --------------------------- | 242 | --------------------------- |
243 | 243 | ||
244 | What: GPIO autorequest on gpio_direction_{input,output}() in gpiolib | 244 | What: GPIO autorequest on gpio_direction_{input,output}() in gpiolib |
245 | When: February 2010 | 245 | When: February 2010 |
246 | Why: All callers should use explicit gpio_request()/gpio_free(). | 246 | Why: All callers should use explicit gpio_request()/gpio_free(). |
247 | The autorequest mechanism in gpiolib was provided mostly as a | 247 | The autorequest mechanism in gpiolib was provided mostly as a |
248 | migration aid for legacy GPIO interfaces (for SOC based GPIOs). | 248 | migration aid for legacy GPIO interfaces (for SOC based GPIOs). |
249 | Those users have now largely migrated. Platforms implementing | 249 | Those users have now largely migrated. Platforms implementing |
250 | the GPIO interfaces without using gpiolib will see no changes. | 250 | the GPIO interfaces without using gpiolib will see no changes. |
251 | Who: David Brownell <dbrownell@users.sourceforge.net> | 251 | Who: David Brownell <dbrownell@users.sourceforge.net> |
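The explicit sequence callers should already be using is short; a sketch
with a made-up GPIO number and label:

    #include <linux/gpio.h>

    int err = gpio_request(gpio, "foo-reset");  /* explicit claim */
    if (err)
            return err;
    gpio_direction_output(gpio, 0);             /* no autorequest needed */
    /* ... use the line ... */
    gpio_free(gpio);                            /* explicit release */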
252 | --------------------------- | 252 | --------------------------- |
253 | 253 | ||
254 | What: b43 support for firmware revision < 410 | 254 | What: b43 support for firmware revision < 410 |
255 | When: The schedule was July 2008, but it was decided that we are going to keep the | 255 | When: The schedule was July 2008, but it was decided that we are going to keep the |
256 | code as long as there are no major maintenance headaches. | 256 | code as long as there are no major maintenance headaches. |
257 | So it _could_ be removed _any_ time now, if it conflicts with something new. | 257 | So it _could_ be removed _any_ time now, if it conflicts with something new. |
258 | Why: The support code for the old firmware hurts code readability/maintainability | 258 | Why: The support code for the old firmware hurts code readability/maintainability |
259 | and slightly hurts runtime performance. Bugfixes for the old firmware | 259 | and slightly hurts runtime performance. Bugfixes for the old firmware |
260 | are not provided by Broadcom anymore. | 260 | are not provided by Broadcom anymore. |
261 | Who: Michael Buesch <mb@bu3sch.de> | 261 | Who: Michael Buesch <mb@bu3sch.de> |
262 | 262 | ||
263 | --------------------------- | 263 | --------------------------- |
264 | 264 | ||
265 | What: /sys/o2cb symlink | 265 | What: /sys/o2cb symlink |
266 | When: January 2010 | 266 | When: January 2010 |
267 | Why: /sys/fs/o2cb is the proper location for this information - /sys/o2cb | 267 | Why: /sys/fs/o2cb is the proper location for this information - /sys/o2cb |
268 | exists as a symlink for backwards compatibility for old versions of | 268 | exists as a symlink for backwards compatibility for old versions of |
269 | ocfs2-tools. 2 years should be sufficient time to phase in new versions | 269 | ocfs2-tools. 2 years should be sufficient time to phase in new versions |
270 | which know to look in /sys/fs/o2cb. | 270 | which know to look in /sys/fs/o2cb. |
271 | Who: ocfs2-devel@oss.oracle.com | 271 | Who: ocfs2-devel@oss.oracle.com |
272 | 272 | ||
273 | --------------------------- | 273 | --------------------------- |
274 | 274 | ||
275 | What: Ability for non-root users to shmget hugetlb pages based on mlock | 275 | What: Ability for non-root users to shmget hugetlb pages based on mlock |
276 | resource limits | 276 | resource limits |
277 | When: 2.6.31 | 277 | When: 2.6.31 |
278 | Why: Non-root users need to be part of /proc/sys/vm/hugetlb_shm_group or | 278 | Why: Non-root users need to be part of /proc/sys/vm/hugetlb_shm_group or |
279 | have CAP_IPC_LOCK to be able to allocate shm segments backed by | 279 | have CAP_IPC_LOCK to be able to allocate shm segments backed by |
280 | huge pages. The mlock based rlimit check to allow shm hugetlb is | 280 | huge pages. The mlock based rlimit check to allow shm hugetlb is |
281 | inconsistent with mmap based allocations. Hence it is being | 281 | inconsistent with mmap based allocations. Hence it is being |
282 | deprecated. | 282 | deprecated. |
283 | Who: Ravikiran Thirumalai <kiran@scalex86.org> | 283 | Who: Ravikiran Thirumalai <kiran@scalex86.org> |
284 | 284 | ||
285 | --------------------------- | 285 | --------------------------- |
286 | 286 | ||
287 | What: CONFIG_THERMAL_HWMON | 287 | What: CONFIG_THERMAL_HWMON |
288 | When: January 2009 | 288 | When: January 2009 |
289 | Why: This option was introduced just to allow older lm-sensors userspace | 289 | Why: This option was introduced just to allow older lm-sensors userspace |
290 | to keep working over the upgrade to 2.6.26. At the scheduled time of | 290 | to keep working over the upgrade to 2.6.26. At the scheduled time of |
291 | removal fixed lm-sensors (2.x or 3.x) should be readily available. | 291 | removal fixed lm-sensors (2.x or 3.x) should be readily available. |
292 | Who: Rene Herman <rene.herman@gmail.com> | 292 | Who: Rene Herman <rene.herman@gmail.com> |
293 | 293 | ||
294 | --------------------------- | 294 | --------------------------- |
295 | 295 | ||
296 | What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS | 296 | What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS |
297 | (in net/core/net-sysfs.c) | 297 | (in net/core/net-sysfs.c) |
298 | When: After the only user (hal) has seen a release with the patches | 298 | When: After the only user (hal) has seen a release with the patches |
299 | for enough time, probably some time in 2010. | 299 | for enough time, probably some time in 2010. |
300 | Why: Over 1K .text/.data size reduction, data is available in other | 300 | Why: Over 1K .text/.data size reduction, data is available in other |
301 | ways (ioctls) | 301 | ways (ioctls) |
302 | Who: Johannes Berg <johannes@sipsolutions.net> | 302 | Who: Johannes Berg <johannes@sipsolutions.net> |
303 | 303 | ||
304 | --------------------------- | 304 | --------------------------- |
305 | 305 | ||
306 | What: CONFIG_NF_CT_ACCT | 306 | What: CONFIG_NF_CT_ACCT |
307 | When: 2.6.29 | 307 | When: 2.6.29 |
308 | Why: Accounting can now be enabled/disabled without kernel recompilation. | 308 | Why: Accounting can now be enabled/disabled without kernel recompilation. |
309 | Currently used only to set a default value for a feature that is also | 309 | Currently used only to set a default value for a feature that is also |
310 | controlled by a kernel/module/sysfs/sysctl parameter. | 310 | controlled by a kernel/module/sysfs/sysctl parameter. |
311 | Who: Krzysztof Piotr Oledzki <ole@ans.pl> | 311 | Who: Krzysztof Piotr Oledzki <ole@ans.pl> |
312 | 312 | ||
313 | --------------------------- | 313 | --------------------------- |
314 | 314 | ||
315 | What: sysfs UI for changing p4-clockmod parameters | 315 | What: sysfs UI for changing p4-clockmod parameters |
316 | When: September 2009 | 316 | When: September 2009 |
317 | Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and | 317 | Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and |
318 | e088e4c9cdb618675874becb91b2fd581ee707e6. | 318 | e088e4c9cdb618675874becb91b2fd581ee707e6. |
319 | Removal is subject to fixing any remaining bugs in ACPI which may | 319 | Removal is subject to fixing any remaining bugs in ACPI which may |
320 | cause the thermal throttling not to happen at the right time. | 320 | cause the thermal throttling not to happen at the right time. |
321 | Who: Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com> | 321 | Who: Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com> |
322 | 322 | ||
323 | ----------------------------- | 323 | ----------------------------- |
324 | 324 | ||
325 | What: __do_IRQ all-in-one-fits-nothing interrupt handler | 325 | What: __do_IRQ all-in-one-fits-nothing interrupt handler |
326 | When: 2.6.32 | 326 | When: 2.6.32 |
327 | Why: __do_IRQ was kept for easy migration to the type flow handlers. | 327 | Why: __do_IRQ was kept for easy migration to the type flow handlers. |
328 | More than two years of migration time is enough. | 328 | More than two years of migration time is enough. |
329 | Who: Thomas Gleixner <tglx@linutronix.de> | 329 | Who: Thomas Gleixner <tglx@linutronix.de> |
330 | 330 | ||
331 | ----------------------------- | 331 | ----------------------------- |
332 | 332 | ||
333 | What: fakephp and associated sysfs files in /sys/bus/pci/slots/ | 333 | What: fakephp and associated sysfs files in /sys/bus/pci/slots/ |
334 | When: 2011 | 334 | When: 2011 |
335 | Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to | 335 | Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to |
336 | represent a machine's physical PCI slots. The change in semantics | 336 | represent a machine's physical PCI slots. The change in semantics |
337 | had userspace implications, as the hotplug core no longer allowed | 337 | had userspace implications, as the hotplug core no longer allowed |
338 | drivers to create multiple sysfs files per physical slot (required | 338 | drivers to create multiple sysfs files per physical slot (required |
339 | for multi-function devices, for example). fakephp was seen as a developer's | 339 | for multi-function devices, for example). fakephp was seen as a developer's |
340 | tool only, and its interface changed. Too late, we learned that | 340 | tool only, and its interface changed. Too late, we learned that |
341 | there were some users of the fakephp interface. | 341 | there were some users of the fakephp interface. |
342 | 342 | ||
343 | In 2.6.30, the original fakephp interface was restored. At the same | 343 | In 2.6.30, the original fakephp interface was restored. At the same |
344 | time, the PCI core gained the ability that fakephp provided, namely | 344 | time, the PCI core gained the ability that fakephp provided, namely |
345 | function-level hot-remove and hot-add. | 345 | function-level hot-remove and hot-add. |
346 | 346 | ||
347 | Since the PCI core now provides the same functionality, exposed in: | 347 | Since the PCI core now provides the same functionality, exposed in: |
348 | 348 | ||
349 | /sys/bus/pci/rescan | 349 | /sys/bus/pci/rescan |
350 | /sys/bus/pci/devices/.../remove | 350 | /sys/bus/pci/devices/.../remove |
351 | /sys/bus/pci/devices/.../rescan | 351 | /sys/bus/pci/devices/.../rescan |
352 | 352 | ||
353 | there is no functional reason to maintain fakephp as well. | 353 | there is no functional reason to maintain fakephp as well. |
354 | 354 | ||
355 | We will keep the existing module so that 'modprobe fakephp' will | 355 | We will keep the existing module so that 'modprobe fakephp' will |
356 | present the old /sys/bus/pci/slots/... interface for compatibility, | 356 | present the old /sys/bus/pci/slots/... interface for compatibility, |
357 | but users are urged to migrate their applications to the API above. | 357 | but users are urged to migrate their applications to the API above. |
358 | 358 | ||
359 | After a reasonable transition period, we will remove the legacy | 359 | After a reasonable transition period, we will remove the legacy |
360 | fakephp interface. | 360 | fakephp interface. |
361 | Who: Alex Chiang <achiang@hp.com> | 361 | Who: Alex Chiang <achiang@hp.com> |
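A userspace sketch of the PCI-core replacement (the device address below
is hypothetical):

    #include <fcntl.h>
    #include <unistd.h>

    /* function-level hot-remove through the PCI core, not fakephp */
    int fd = open("/sys/bus/pci/devices/0000:00:1f.2/remove", O_WRONLY);
    if (fd >= 0) {
            write(fd, "1", 1);
            close(fd);
    }
    /* the device can later be rediscovered via /sys/bus/pci/rescan */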
362 | 362 | ||
363 | --------------------------- | 363 | --------------------------- |
364 | 364 | ||
365 | What: CONFIG_RFKILL_INPUT | 365 | What: CONFIG_RFKILL_INPUT |
366 | When: 2.6.33 | 366 | When: 2.6.33 |
367 | Why: Should be implemented in userspace, policy daemon. | 367 | Why: Should be implemented in userspace, policy daemon. |
368 | Who: Johannes Berg <johannes@sipsolutions.net> | 368 | Who: Johannes Berg <johannes@sipsolutions.net> |
369 | 369 | ||
370 | --------------------------- | 370 | --------------------------- |
371 | 371 | ||
372 | What: CONFIG_INOTIFY | 372 | What: CONFIG_INOTIFY |
373 | When: 2.6.33 | 373 | When: 2.6.33 |
374 | Why: the last user (audit) will be converted to the newer, more generic | 374 | Why: the last user (audit) will be converted to the newer, more generic |
375 | and more easily maintained fsnotify subsystem. | 375 | and more easily maintained fsnotify subsystem. |
376 | Who: Eric Paris <eparis@redhat.com> | 376 | Who: Eric Paris <eparis@redhat.com> |
377 | 377 | ||
378 | ---------------------------- | 378 | ---------------------------- |
379 | 379 | ||
380 | What: lock_policy_rwsem_* and unlock_policy_rwsem_* will no longer be | 380 | What: lock_policy_rwsem_* and unlock_policy_rwsem_* will no longer be |
381 | an exported interface. | 381 | an exported interface. |
382 | When: 2.6.33 | 382 | When: 2.6.33 |
383 | Why: cpu_policy_rwsem has a new, cleaner definition, making it local to | 383 | Why: cpu_policy_rwsem has a new, cleaner definition, making it local to |
384 | the cpufreq core and contained inside cpufreq.c. Other dependent | 384 | the cpufreq core and contained inside cpufreq.c. Other dependent |
385 | drivers should not use it in order to safely avoid lockdep issues. | 385 | drivers should not use it in order to safely avoid lockdep issues. |
386 | Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> | 386 | Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> |
387 | 387 | ||
388 | ---------------------------- | 388 | ---------------------------- |
389 | 389 | ||
390 | What: sound-slot/service-* module aliases and related clutters in | 390 | What: sound-slot/service-* module aliases and related clutters in |
391 | sound/sound_core.c | 391 | sound/sound_core.c |
392 | When: August 2010 | 392 | When: August 2010 |
393 | Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR | 393 | Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR |
394 | (14) and requests modules using custom sound-slot/service-* | 394 | (14) and requests modules using custom sound-slot/service-* |
395 | module aliases. The only benefit of doing this is allowing | 395 | module aliases. The only benefit of doing this is allowing |
396 | use of custom module aliases which might as well be considered | 396 | use of custom module aliases which might as well be considered |
397 | a bug at this point. This preemptive claiming prevents | 397 | a bug at this point. This preemptive claiming prevents |
398 | alternative OSS implementations. | 398 | alternative OSS implementations. |
399 | 399 | ||
400 | Until the feature is removed, the kernel will request | 400 | Until the feature is removed, the kernel will request |
401 | both sound-slot/service-* and the standard char-major-* module | 401 | both sound-slot/service-* and the standard char-major-* module |
402 | aliases and will allow turning off the pre-claiming selectively via | 402 | aliases and will allow turning off the pre-claiming selectively via |
403 | CONFIG_SOUND_OSS_CORE_PRECLAIM and soundcore.preclaim_oss | 403 | CONFIG_SOUND_OSS_CORE_PRECLAIM and soundcore.preclaim_oss |
404 | kernel parameter. | 404 | kernel parameter. |
405 | 405 | ||
406 | After the transition phase is complete, both the custom module | 406 | After the transition phase is complete, both the custom module |
407 | aliases and switches to disable it will go away. This removal | 407 | aliases and switches to disable it will go away. This removal |
408 | will also allow making ALSA OSS emulation independent of | 408 | will also allow making ALSA OSS emulation independent of |
409 | sound_core. The dependency will be broken then too. | 409 | sound_core. The dependency will be broken then too. |
410 | Who: Tejun Heo <tj@kernel.org> | 410 | Who: Tejun Heo <tj@kernel.org> |
411 | 411 | ||
412 | ---------------------------- | 412 | ---------------------------- |
413 | 413 | ||
414 | What: Support for VMware's guest paravirtualization technique [VMI] will be | 414 | What: Support for VMware's guest paravirtualization technique [VMI] will be |
415 | dropped. | 415 | dropped. |
416 | When: 2.6.37 or earlier. | 416 | When: 2.6.37 or earlier. |
417 | Why: With the recent innovations in CPU hardware acceleration technologies | 417 | Why: With the recent innovations in CPU hardware acceleration technologies |
418 | from Intel and AMD, VMware ran a few experiments to compare these | 418 | from Intel and AMD, VMware ran a few experiments to compare these |
419 | techniques to guest paravirtualization technique on VMware's platform. | 419 | techniques to guest paravirtualization technique on VMware's platform. |
420 | These hardware assisted virtualization techniques have surpassed the | 420 | These hardware assisted virtualization techniques have surpassed the |
421 | performance benefits provided by VMI in most workloads. VMware | 421 | performance benefits provided by VMI in most workloads. VMware |
422 | expects that these hardware features will be ubiquitous in a couple of | 422 | expects that these hardware features will be ubiquitous in a couple of |
423 | years; as a result, VMware has started a phased retirement of this | 423 | years; as a result, VMware has started a phased retirement of this |
424 | feature from the hypervisor. We will be removing this feature from the | 424 | feature from the hypervisor. We will be removing this feature from the |
425 | Kernel too. Right now we are targeting 2.6.37 but can retire earlier if | 425 | Kernel too. Right now we are targeting 2.6.37 but can retire earlier if |
426 | technical reasons (read opportunity to remove major chunk of pvops) | 426 | technical reasons (read opportunity to remove major chunk of pvops) |
427 | arise. | 427 | arise. |
428 | 428 | ||
429 | Please note that VMI has always been an optimization and non-VMI kernels | 429 | Please note that VMI has always been an optimization and non-VMI kernels |
430 | still work fine on VMware's platform. | 430 | still work fine on VMware's platform. |
431 | The latest versions of VMware's products which support VMI are | 431 | The latest versions of VMware's products which support VMI are |
432 | Workstation 7.0 and vSphere 4.0 on the ESX side; future maintenance | 432 | Workstation 7.0 and vSphere 4.0 on the ESX side; future maintenance |
433 | releases for these products will continue supporting VMI. | 433 | releases for these products will continue supporting VMI. |
434 | 434 | ||
435 | For more details about VMI retirement take a look at this: | 435 | For more details about VMI retirement take a look at this: |
436 | http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html | 436 | http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html |
437 | 437 | ||
438 | Who: Alok N Kataria <akataria@vmware.com> | 438 | Who: Alok N Kataria <akataria@vmware.com> |
439 | 439 | ||
440 | ---------------------------- | 440 | ---------------------------- |
441 | 441 | ||
442 | What: Support for lcd_switch and display_get in asus-laptop driver | 442 | What: Support for lcd_switch and display_get in asus-laptop driver |
443 | When: March 2010 | 443 | When: March 2010 |
444 | Why: These two features use non-standard interfaces. These are the | 444 | Why: These two features use non-standard interfaces. These are the |
445 | only features that really need multiple paths to guess what's | 445 | only features that really need multiple paths to guess what's |
446 | the right method name on a specific laptop. | 446 | the right method name on a specific laptop. |
447 | 447 ||
448 | Removing them will allow us to remove a lot of code and significantly | 448 | Removing them will allow us to remove a lot of code and significantly |
449 | clean up the drivers. | 449 | clean up the drivers. |
450 | 450 | ||
451 | This will affect the backlight code, which won't be able to know | 451 | This will affect the backlight code, which won't be able to know |
452 | if the backlight is on or off. The platform display file will also be | 452 | if the backlight is on or off. The platform display file will also be |
453 | write-only (like the one in eeepc-laptop). | 453 | write-only (like the one in eeepc-laptop). |
454 | 454 | ||
455 | This shouldn't affect a lot of users because they usually know | 455 | This shouldn't affect a lot of users because they usually know |
456 | when their display is on or off. | 456 | when their display is on or off. |
457 | 457 | ||
458 | Who: Corentin Chary <corentin.chary@gmail.com> | 458 | Who: Corentin Chary <corentin.chary@gmail.com> |
459 | 459 | ||
460 | ---------------------------- | 460 | ---------------------------- |
461 | 461 | ||
462 | What: usbvideo quickcam_messenger driver | 462 | What: usbvideo quickcam_messenger driver |
463 | When: 2.6.35 | 463 | When: 2.6.35 |
464 | Files: drivers/media/video/usbvideo/quickcam_messenger.[ch] | 464 | Files: drivers/media/video/usbvideo/quickcam_messenger.[ch] |
465 | Why: obsolete v4l1 driver replaced by gspca_stv06xx | 465 | Why: obsolete v4l1 driver replaced by gspca_stv06xx |
466 | Who: Hans de Goede <hdegoede@redhat.com> | 466 | Who: Hans de Goede <hdegoede@redhat.com> |
467 | 467 | ||
468 | ---------------------------- | 468 | ---------------------------- |
469 | 469 | ||
470 | What: ov511 v4l1 driver | 470 | What: ov511 v4l1 driver |
471 | When: 2.6.35 | 471 | When: 2.6.35 |
472 | Files: drivers/media/video/ov511.[ch] | 472 | Files: drivers/media/video/ov511.[ch] |
473 | Why: obsolete v4l1 driver replaced by gspca_ov519 | 473 | Why: obsolete v4l1 driver replaced by gspca_ov519 |
474 | Who: Hans de Goede <hdegoede@redhat.com> | 474 | Who: Hans de Goede <hdegoede@redhat.com> |
475 | 475 | ||
476 | ---------------------------- | 476 | ---------------------------- |
477 | 477 | ||
478 | What: w9968cf v4l1 driver | 478 | What: w9968cf v4l1 driver |
479 | When: 2.6.35 | 479 | When: 2.6.35 |
480 | Files: drivers/media/video/w9968cf*.[ch] | 480 | Files: drivers/media/video/w9968cf*.[ch] |
481 | Why: obsolete v4l1 driver replaced by gspca_ov519 | 481 | Why: obsolete v4l1 driver replaced by gspca_ov519 |
482 | Who: Hans de Goede <hdegoede@redhat.com> | 482 | Who: Hans de Goede <hdegoede@redhat.com> |
483 | 483 | ||
484 | ---------------------------- | 484 | ---------------------------- |
485 | 485 | ||
486 | What: ovcamchip sensor framework | 486 | What: ovcamchip sensor framework |
487 | When: 2.6.35 | 487 | When: 2.6.35 |
488 | Files: drivers/media/video/ovcamchip/* | 488 | Files: drivers/media/video/ovcamchip/* |
489 | Why: Only used by obsolete v4l1 drivers | 489 | Why: Only used by obsolete v4l1 drivers |
490 | Who: Hans de Goede <hdegoede@redhat.com> | 490 | Who: Hans de Goede <hdegoede@redhat.com> |
491 | 491 | ||
492 | ---------------------------- | 492 | ---------------------------- |
493 | 493 | ||
494 | What: stv680 v4l1 driver | 494 | What: stv680 v4l1 driver |
495 | When: 2.6.35 | 495 | When: 2.6.35 |
496 | Files: drivers/media/video/stv680.[ch] | 496 | Files: drivers/media/video/stv680.[ch] |
497 | Why: obsolete v4l1 driver replaced by gspca_stv0680 | 497 | Why: obsolete v4l1 driver replaced by gspca_stv0680 |
498 | Who: Hans de Goede <hdegoede@redhat.com> | 498 | Who: Hans de Goede <hdegoede@redhat.com> |
499 | 499 | ||
500 | ---------------------------- | 500 | ---------------------------- |
501 | 501 | ||
502 | What: zc0301 v4l driver | 502 | What: zc0301 v4l driver |
503 | When: 2.6.35 | 503 | When: 2.6.35 |
504 | Files: drivers/media/video/zc0301/* | 504 | Files: drivers/media/video/zc0301/* |
505 | Why: Duplicate functionality with the gspca_zc3xx driver; zc0301 only | 505 | Why: Duplicate functionality with the gspca_zc3xx driver; zc0301 only |
506 | supports 2 USB-IDs (because it only supports a limited set of | 506 | supports 2 USB-IDs (because it only supports a limited set of |
507 | sensors) which are also supported by the gspca_zc3xx driver | 507 | sensors) which are also supported by the gspca_zc3xx driver |
508 | (which supports 53 USB-IDs in total). | 508 | (which supports 53 USB-IDs in total). |
509 | Who: Hans de Goede <hdegoede@redhat.com> | 509 | Who: Hans de Goede <hdegoede@redhat.com> |
510 | 510 | ||
511 | ---------------------------- | 511 | ---------------------------- |
512 | 512 | ||
513 | What: sysfs-class-rfkill state file | 513 | What: sysfs-class-rfkill state file |
514 | When: Feb 2014 | 514 | When: Feb 2014 |
515 | Files: net/rfkill/core.c | 515 | Files: net/rfkill/core.c |
516 | Why: Documented as obsolete since Feb 2010. This file is limited to 3 | 516 | Why: Documented as obsolete since Feb 2010. This file is limited to 3 |
517 | states while the rfkill drivers can have 4 states. | 517 | states while the rfkill drivers can have 4 states. |
518 | Who: anybody or Florian Mickler <florian@mickler.org> | 518 | Who: anybody or Florian Mickler <florian@mickler.org> |
519 | 519 | ||
520 | ---------------------------- | 520 | ---------------------------- |
521 | 521 | ||
522 | What: sysfs-class-rfkill claim file | 522 | What: sysfs-class-rfkill claim file |
523 | When: Feb 2012 | 523 | When: Feb 2012 |
524 | Files: net/rfkill/core.c | 524 | Files: net/rfkill/core.c |
525 | Why: It has not been possible to claim an rfkill driver since 2007. This is | 525 | Why: It has not been possible to claim an rfkill driver since 2007. This is |
526 | documented as obsolete since Feb 2010. | 526 | documented as obsolete since Feb 2010. |
527 | Who: anybody or Florian Mickler <florian@mickler.org> | 527 | Who: anybody or Florian Mickler <florian@mickler.org> |
528 | 528 | ||
529 | ---------------------------- | 529 | ---------------------------- |
530 | 530 | ||
531 | What: capifs | 531 | What: capifs |
532 | When: February 2011 | 532 | When: February 2011 |
533 | Files: drivers/isdn/capi/capifs.* | 533 | Files: drivers/isdn/capi/capifs.* |
534 | Why: udev fully replaces this special file system that only contains CAPI | 534 | Why: udev fully replaces this special file system that only contains CAPI |
535 | NCCI TTY device nodes. User space (pppdcapiplugin) works without | 535 | NCCI TTY device nodes. User space (pppdcapiplugin) works without |
536 | noticing the difference. | 536 | noticing the difference. |
537 | Who: Jan Kiszka <jan.kiszka@web.de> | 537 | Who: Jan Kiszka <jan.kiszka@web.de> |
538 | 538 | ||
539 | ---------------------------- | 539 | ---------------------------- |
540 | 540 | ||
541 | What: KVM memory aliases support | ||
542 | When: July 2010 | ||
543 | Why: Memory aliasing support is used for speeding up guest VGA access | ||
544 | through the VGA windows. | ||
545 | |||
546 | Modern userspace no longer uses this feature, so it's just bitrotted | ||
547 | code and can be removed with no impact. | ||
548 | Who: Avi Kivity <avi@redhat.com> | ||
549 | |||
550 | ---------------------------- | ||
551 | |||
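The overlapping-memory-region replacement named in the commit message can
be sketched from userspace: the same host buffer is simply registered as
two memory slots at different guest-physical addresses. The sizes,
addresses, and slot numbers below are illustrative, and vm_fd is assumed
to be an already-created VM file descriptor:

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    /* one host allocation backs both the framebuffer and its VGA window */
    void *vram = mmap(NULL, 0x40000, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct kvm_userspace_memory_region fb = {
            .slot            = 1,
            .guest_phys_addr = 0xf0000000,          /* "real" VRAM */
            .memory_size     = 0x40000,
            .userspace_addr  = (unsigned long)vram,
    };
    struct kvm_userspace_memory_region vga = {
            .slot            = 2,
            .guest_phys_addr = 0xa0000,             /* legacy VGA window */
            .memory_size     = 0x20000,
            .userspace_addr  = (unsigned long)vram, /* same backing */
    };
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &fb);
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &vga);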
552 | What: xtime, wall_to_monotonic | 541 | What: xtime, wall_to_monotonic |
553 | When: 2.6.36+ | 542 | When: 2.6.36+ |
554 | Files: kernel/time/timekeeping.c include/linux/time.h | 543 | Files: kernel/time/timekeeping.c include/linux/time.h |
555 | Why: Cleaning up timekeeping internal values. Please use | 544 | Why: Cleaning up timekeeping internal values. Please use |
556 | existing timekeeping accessor functions to access | 545 | existing timekeeping accessor functions to access |
557 | the equivalent functionality. | 546 | the equivalent functionality. |
558 | Who: John Stultz <johnstul@us.ibm.com> | 547 | Who: John Stultz <johnstul@us.ibm.com> |
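In-kernel users can convert ahead of the removal; a minimal sketch using
accessors that already exist:

    #include <linux/time.h>

    struct timespec wall, mono;

    getnstimeofday(&wall); /* wall time, instead of reading xtime */
    ktime_get_ts(&mono);   /* monotonic time, instead of combining
                            * xtime with wall_to_monotonic by hand */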
559 | 548 | ||
560 | ---------------------------- | 549 | ---------------------------- |
561 | 550 | ||
562 | What: KVM kernel-allocated memory slots | 551 | What: KVM kernel-allocated memory slots |
563 | When: July 2010 | 552 | When: July 2010 |
564 | Why: Since 2.6.25, kvm supports user-allocated memory slots, which are | 553 | Why: Since 2.6.25, kvm supports user-allocated memory slots, which are |
565 | much more flexible than kernel-allocated slots. All current userspace | 554 | much more flexible than kernel-allocated slots. All current userspace |
566 | supports the newer interface and this code can be removed with no | 555 | supports the newer interface and this code can be removed with no |
567 | impact. | 556 | impact. |
568 | Who: Avi Kivity <avi@redhat.com> | 557 | Who: Avi Kivity <avi@redhat.com> |
569 | 558 | ||
570 | ---------------------------- | 559 | ---------------------------- |
571 | 560 | ||
572 | What: KVM paravirt mmu host support | 561 | What: KVM paravirt mmu host support |
573 | When: January 2011 | 562 | When: January 2011 |
574 | Why: The paravirt mmu host support is slower than non-paravirt mmu, both | 563 | Why: The paravirt mmu host support is slower than non-paravirt mmu, both |
575 | on newer and older hardware. It is no longer exposed to the guest, | 564 | on newer and older hardware. It is no longer exposed to the guest, |
576 | and kept only for live migration purposes. | 565 | and kept only for live migration purposes. |
577 | Who: Avi Kivity <avi@redhat.com> | 566 | Who: Avi Kivity <avi@redhat.com> |
578 | 567 | ||
579 | ---------------------------- | 568 | ---------------------------- |
580 | 569 | ||
581 | What: iwlwifi 50XX module parameters | 570 | What: iwlwifi 50XX module parameters |
582 | When: 2.6.40 | 571 | When: 2.6.40 |
583 | Why: The "..50" modules parameters were used to configure 5000 series and | 572 | Why: The "..50" modules parameters were used to configure 5000 series and |
584 | up devices; different set of module parameters also available for 4965 | 573 | up devices; different set of module parameters also available for 4965 |
585 | with same functionalities. Consolidate both set into single place | 574 | with same functionalities. Consolidate both set into single place |
586 | in drivers/net/wireless/iwlwifi/iwl-agn.c | 575 | in drivers/net/wireless/iwlwifi/iwl-agn.c |
587 | 576 | ||
588 | Who: Wey-Yi Guy <wey-yi.w.guy@intel.com> | 577 | Who: Wey-Yi Guy <wey-yi.w.guy@intel.com> |
589 | 578 | ||
590 | ---------------------------- | 579 | ---------------------------- |
591 | 580 | ||
592 | What: iwl4965 alias support | 581 | What: iwl4965 alias support |
593 | When: 2.6.40 | 582 | When: 2.6.40 |
594 | Why: Internal alias support has been present in module-init-tools for some | 583 | Why: Internal alias support has been present in module-init-tools for some |
595 | time, so the MODULE_ALIAS("iwl4965") boilerplate aliases can be removed | 584 | time, so the MODULE_ALIAS("iwl4965") boilerplate aliases can be removed |
596 | with no impact. | 585 | with no impact. |
597 | 586 | ||
598 | Who: Wey-Yi Guy <wey-yi.w.guy@intel.com> | 587 | Who: Wey-Yi Guy <wey-yi.w.guy@intel.com> |
599 | 588 | ||
600 | --------------------------- | 589 | --------------------------- |
601 | 590 | ||
602 | What: xt_NOTRACK | 591 | What: xt_NOTRACK |
603 | Files: net/netfilter/xt_NOTRACK.c | 592 | Files: net/netfilter/xt_NOTRACK.c |
604 | When: April 2011 | 593 | When: April 2011 |
605 | Why: Superseded by xt_CT | 594 | Why: Superseded by xt_CT |
606 | Who: Netfilter developer team <netfilter-devel@vger.kernel.org> | 595 | Who: Netfilter developer team <netfilter-devel@vger.kernel.org> |
607 | 596 | ||
608 | --------------------------- | 597 | --------------------------- |
609 | 598 | ||
610 | What: video4linux /dev/vtx teletext API support | 599 | What: video4linux /dev/vtx teletext API support |
611 | When: 2.6.35 | 600 | When: 2.6.35 |
612 | Files: drivers/media/video/saa5246a.c drivers/media/video/saa5249.c | 601 | Files: drivers/media/video/saa5246a.c drivers/media/video/saa5249.c |
613 | include/linux/videotext.h | 602 | include/linux/videotext.h |
614 | Why: The vtx device nodes have been superseded by vbi device nodes | 603 | Why: The vtx device nodes have been superseded by vbi device nodes |
615 | for many years. No applications exist that use the vtx support. | 604 | for many years. No applications exist that use the vtx support. |
616 | Of the two i2c drivers that actually support this API, the saa5249 | 605 | Of the two i2c drivers that actually support this API, the saa5249 |
617 | has been impossible to use for a year now and no known hardware | 606 | has been impossible to use for a year now and no known hardware |
618 | that supports this device exists. The saa5246a is theoretically | 607 | that supports this device exists. The saa5246a is theoretically |
619 | supported by the old mxb boards, but it never actually worked. | 608 | supported by the old mxb boards, but it never actually worked. |
620 | 609 | ||
621 | In summary: there is no hardware that can use this API and there | 610 | In summary: there is no hardware that can use this API and there |
622 | are no applications actually implementing this API. | 611 | are no applications actually implementing this API. |
623 | 612 | ||
624 | The vtx support still reserves minors 192-223 and we would really | 613 | The vtx support still reserves minors 192-223 and we would really |
625 | like to reuse those for upcoming new functionality. In the unlikely | 614 | like to reuse those for upcoming new functionality. In the unlikely |
626 | event that new hardware appears that wants to use the functionality | 615 | event that new hardware appears that wants to use the functionality |
627 | provided by the vtx API, that functionality should be built | 616 | provided by the vtx API, that functionality should be built |
628 | around the sliced VBI API instead. | 617 | around the sliced VBI API instead. |
629 | Who: Hans Verkuil <hverkuil@xs4all.nl> | 618 | Who: Hans Verkuil <hverkuil@xs4all.nl> |
630 | 619 | ||
631 | ---------------------------- | 620 | ---------------------------- |
632 | 621 | ||
633 | What: IRQF_DISABLED | 622 | What: IRQF_DISABLED |
634 | When: 2.6.36 | 623 | When: 2.6.36 |
635 | Why: The flag is a NOOP as we run interrupt handlers with interrupts disabled | 624 | Why: The flag is a NOOP as we run interrupt handlers with interrupts disabled |
636 | Who: Thomas Gleixner <tglx@linutronix.de> | 625 | Who: Thomas Gleixner <tglx@linutronix.de> |
637 | 626 | ||
638 | ---------------------------- | 627 | ---------------------------- |
639 | 628 | ||
640 | What: old ieee1394 subsystem (CONFIG_IEEE1394) | 629 | What: old ieee1394 subsystem (CONFIG_IEEE1394) |
641 | When: 2.6.37 | 630 | When: 2.6.37 |
642 | Files: drivers/ieee1394/ except init_ohci1394_dma.c | 631 | Files: drivers/ieee1394/ except init_ohci1394_dma.c |
643 | Why: superseded by drivers/firewire/ (CONFIG_FIREWIRE) which offers more | 632 | Why: superseded by drivers/firewire/ (CONFIG_FIREWIRE) which offers more |
644 | features, better performance, and better security, all with a smaller | 633 | features, better performance, and better security, all with a smaller |
645 | and more modern code base | 634 | and more modern code base |
646 | Who: Stefan Richter <stefanr@s5r6.in-berlin.de> | 635 | Who: Stefan Richter <stefanr@s5r6.in-berlin.de> |
647 | 636 | ||
648 | ---------------------------- | 637 | ---------------------------- |
649 | 638 | ||
650 | What: The acpi_sleep=s4_nonvs command line option | 639 | What: The acpi_sleep=s4_nonvs command line option |
651 | When: 2.6.37 | 640 | When: 2.6.37 |
652 | Files: arch/x86/kernel/acpi/sleep.c | 641 | Files: arch/x86/kernel/acpi/sleep.c |
653 | Why: superseded by acpi_sleep=nonvs | 642 | Why: superseded by acpi_sleep=nonvs |
654 | Who: Rafael J. Wysocki <rjw@sisk.pl> | 643 | Who: Rafael J. Wysocki <rjw@sisk.pl> |
655 | 644 | ||
656 | ---------------------------- | 645 | ---------------------------- |
657 | 646 |
Documentation/kvm/api.txt
1 | The Definitive KVM (Kernel-based Virtual Machine) API Documentation | 1 | The Definitive KVM (Kernel-based Virtual Machine) API Documentation |
2 | =================================================================== | 2 | =================================================================== |
3 | 3 | ||
4 | 1. General description | 4 | 1. General description |
5 | 5 | ||
6 | The kvm API is a set of ioctls that are issued to control various aspects | 6 | The kvm API is a set of ioctls that are issued to control various aspects |
7 | of a virtual machine. The ioctls belong to three classes | 7 | of a virtual machine. The ioctls belong to three classes |
8 | 8 | ||
9 | - System ioctls: These query and set global attributes which affect the | 9 | - System ioctls: These query and set global attributes which affect the |
10 | whole kvm subsystem. In addition a system ioctl is used to create | 10 | whole kvm subsystem. In addition a system ioctl is used to create |
11 | virtual machines | 11 | virtual machines |
12 | 12 | ||
13 | - VM ioctls: These query and set attributes that affect an entire virtual | 13 | - VM ioctls: These query and set attributes that affect an entire virtual |
14 | machine, for example memory layout. In addition a VM ioctl is used to | 14 | machine, for example memory layout. In addition a VM ioctl is used to |
15 | create virtual cpus (vcpus). | 15 | create virtual cpus (vcpus). |
16 | 16 | ||
17 | Only run VM ioctls from the same process (address space) that was used | 17 | Only run VM ioctls from the same process (address space) that was used |
18 | to create the VM. | 18 | to create the VM. |
19 | 19 | ||
20 | - vcpu ioctls: These query and set attributes that control the operation | 20 | - vcpu ioctls: These query and set attributes that control the operation |
21 | of a single virtual cpu. | 21 | of a single virtual cpu. |
22 | 22 | ||
23 | Only run vcpu ioctls from the same thread that was used to create the | 23 | Only run vcpu ioctls from the same thread that was used to create the |
24 | vcpu. | 24 | vcpu. |
25 | 25 | ||
26 | 2. File descriptors | 26 | 2. File descriptors |
27 | 27 | ||
28 | The kvm API is centered around file descriptors. An initial | 28 | The kvm API is centered around file descriptors. An initial |
29 | open("/dev/kvm") obtains a handle to the kvm subsystem; this handle | 29 | open("/dev/kvm") obtains a handle to the kvm subsystem; this handle |
30 | can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this | 30 | can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this |
31 | handle will create a VM file descriptor which can be used to issue VM | 31 | handle will create a VM file descriptor which can be used to issue VM |
32 | ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu | 32 | ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu |
33 | and return a file descriptor pointing to it. Finally, ioctls on a vcpu | 33 | and return a file descriptor pointing to it. Finally, ioctls on a vcpu |
34 | fd can be used to control the vcpu, including the important task of | 34 | fd can be used to control the vcpu, including the important task of |
35 | actually running guest code. | 35 | actually running guest code. |
36 | 36 | ||
37 | In general file descriptors can be migrated among processes by means | 37 | In general file descriptors can be migrated among processes by means |
38 | of fork() and the SCM_RIGHTS facility of unix domain socket. These | 38 | of fork() and the SCM_RIGHTS facility of unix domain socket. These |
39 | kinds of tricks are explicitly not supported by kvm. While they will | 39 | kinds of tricks are explicitly not supported by kvm. While they will |
40 | not cause harm to the host, their actual behavior is not guaranteed by | 40 | not cause harm to the host, their actual behavior is not guaranteed by |
41 | the API. The only supported use is one virtual machine per process, | 41 | the API. The only supported use is one virtual machine per process, |
42 | and one vcpu per thread. | 42 | and one vcpu per thread. |
43 | 43 | ||
44 | 3. Extensions | 44 | 3. Extensions |
45 | 45 | ||
46 | As of Linux 2.6.22, the KVM ABI has been stabilized: no backward | 46 | As of Linux 2.6.22, the KVM ABI has been stabilized: no backward |
47 | incompatible change are allowed. However, there is an extension | 47 | incompatible change are allowed. However, there is an extension |
48 | facility that allows backward-compatible extensions to the API to be | 48 | facility that allows backward-compatible extensions to the API to be |
49 | queried and used. | 49 | queried and used. |
50 | 50 | ||
51 | The extension mechanism is not based on the Linux version number. | 51 | The extension mechanism is not based on the Linux version number. |
52 | Instead, kvm defines extension identifiers and a facility to query | 52 | Instead, kvm defines extension identifiers and a facility to query |
53 | whether a particular extension identifier is available. If it is, a | 53 | whether a particular extension identifier is available. If it is, a |
54 | set of ioctls is available for application use. | 54 | set of ioctls is available for application use. |
55 | 55 | ||
56 | 4. API description | 56 | 4. API description |
57 | 57 | ||
58 | This section describes ioctls that can be used to control kvm guests. | 58 | This section describes ioctls that can be used to control kvm guests. |
59 | For each ioctl, the following information is provided along with a | 59 | For each ioctl, the following information is provided along with a |
60 | description: | 60 | description: |
61 | 61 | ||
62 | Capability: which KVM extension provides this ioctl. Can be 'basic', | 62 | Capability: which KVM extension provides this ioctl. Can be 'basic', |
63 | which means that it will be provided by any kernel that supports | 63 | which means that it will be provided by any kernel that supports |
64 | API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which | 64 | API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which |
65 | means availability needs to be checked with KVM_CHECK_EXTENSION | 65 | means availability needs to be checked with KVM_CHECK_EXTENSION |
66 | (see section 4.4). | 66 | (see section 4.4). |
67 | 67 | ||
68 | Architectures: which instruction set architectures provide this ioctl. | 68 | Architectures: which instruction set architectures provide this ioctl. |
69 | x86 includes both i386 and x86_64. | 69 | x86 includes both i386 and x86_64. |
70 | 70 | ||
71 | Type: system, vm, or vcpu. | 71 | Type: system, vm, or vcpu. |
72 | 72 | ||
73 | Parameters: what parameters are accepted by the ioctl. | 73 | Parameters: what parameters are accepted by the ioctl. |
74 | 74 | ||
75 | Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) | 75 | Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) |
76 | are not detailed, but errors with specific meanings are. | 76 | are not detailed, but errors with specific meanings are. |
77 | 77 | ||
78 | 4.1 KVM_GET_API_VERSION | 78 | 4.1 KVM_GET_API_VERSION |
79 | 79 | ||
80 | Capability: basic | 80 | Capability: basic |
81 | Architectures: all | 81 | Architectures: all |
82 | Type: system ioctl | 82 | Type: system ioctl |
83 | Parameters: none | 83 | Parameters: none |
84 | Returns: the constant KVM_API_VERSION (=12) | 84 | Returns: the constant KVM_API_VERSION (=12) |
85 | 85 | ||
86 | This identifies the API version as the stable kvm API. It is not | 86 | This identifies the API version as the stable kvm API. It is not |
87 | expected that this number will change. However, Linux 2.6.20 and | 87 | expected that this number will change. However, Linux 2.6.20 and |
88 | 2.6.21 report earlier versions; these are not documented and not | 88 | 2.6.21 report earlier versions; these are not documented and not |
89 | supported. Applications should refuse to run if KVM_GET_API_VERSION | 89 | supported. Applications should refuse to run if KVM_GET_API_VERSION |
90 | returns a value other than 12. If this check passes, all ioctls | 90 | returns a value other than 12. If this check passes, all ioctls |
91 | described as 'basic' will be available. | 91 | described as 'basic' will be available. |
92 | 92 | ||
93 | 4.2 KVM_CREATE_VM | 93 | 4.2 KVM_CREATE_VM |
94 | 94 | ||
95 | Capability: basic | 95 | Capability: basic |
96 | Architectures: all | 96 | Architectures: all |
97 | Type: system ioctl | 97 | Type: system ioctl |
98 | Parameters: none | 98 | Parameters: none |
99 | Returns: a VM fd that can be used to control the new virtual machine. | 99 | Returns: a VM fd that can be used to control the new virtual machine. |
100 | 100 | ||
101 | The new VM has no virtual cpus and no memory. An mmap() of a VM fd | 101 | The new VM has no virtual cpus and no memory. An mmap() of a VM fd |
102 | will access the virtual machine's physical address space; offset zero | 102 | will access the virtual machine's physical address space; offset zero |
103 | corresponds to guest physical address zero. Use of mmap() on a VM fd | 103 | corresponds to guest physical address zero. Use of mmap() on a VM fd |
104 | is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is | 104 | is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is |
105 | available. | 105 | available. |
106 | 106 | ||
107 | 4.3 KVM_GET_MSR_INDEX_LIST | 107 | 4.3 KVM_GET_MSR_INDEX_LIST |
108 | 108 | ||
109 | Capability: basic | 109 | Capability: basic |
110 | Architectures: x86 | 110 | Architectures: x86 |
111 | Type: system | 111 | Type: system |
112 | Parameters: struct kvm_msr_list (in/out) | 112 | Parameters: struct kvm_msr_list (in/out) |
113 | Returns: 0 on success; -1 on error | 113 | Returns: 0 on success; -1 on error |
114 | Errors: | 114 | Errors: |
115 | E2BIG: the msr index list is too big to fit in the array specified by | 115 | E2BIG: the msr index list is too big to fit in the array specified by |
116 | the user. | 116 | the user. |
117 | 117 | ||
118 | struct kvm_msr_list { | 118 | struct kvm_msr_list { |
119 | __u32 nmsrs; /* number of msrs in entries */ | 119 | __u32 nmsrs; /* number of msrs in entries */ |
120 | __u32 indices[0]; | 120 | __u32 indices[0]; |
121 | }; | 121 | }; |
122 | 122 | ||
123 | This ioctl returns the guest msrs that are supported. The list varies | 123 | This ioctl returns the guest msrs that are supported. The list varies |
124 | by kvm version and host processor, but does not change otherwise. The | 124 | by kvm version and host processor, but does not change otherwise. The |
125 | user fills in the size of the indices array in nmsrs, and in return | 125 | user fills in the size of the indices array in nmsrs, and in return |
126 | kvm adjusts nmsrs to reflect the actual number of msrs and fills in | 126 | kvm adjusts nmsrs to reflect the actual number of msrs and fills in |
127 | the indices array with their numbers. | 127 | the indices array with their numbers. |
128 | 128 | ||
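A minimal sketch of that two-call pattern in C (illustration only; it assumes an already-open /dev/kvm fd and relies on the kernel writing the required count back into nmsrs when it fails with E2BIG):

        #include <errno.h>
        #include <linux/kvm.h>
        #include <stdlib.h>
        #include <sys/ioctl.h>

        /* Fetch the supported MSR index list, growing the array on E2BIG. */
        static struct kvm_msr_list *get_msr_index_list(int kvm_fd)
        {
                __u32 n = 1;

                for (;;) {
                        struct kvm_msr_list *list =
                                malloc(sizeof(*list) + n * sizeof(__u32));

                        if (!list)
                                return NULL;
                        list->nmsrs = n;
                        if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) == 0)
                                return list;    /* nmsrs holds the real count */
                        n = list->nmsrs;        /* required size, per E2BIG */
                        free(list);
                        if (errno != E2BIG)
                                return NULL;
                }
        }
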
129 | 4.4 KVM_CHECK_EXTENSION | 129 | 4.4 KVM_CHECK_EXTENSION |
130 | 130 | ||
131 | Capability: basic | 131 | Capability: basic |
132 | Architectures: all | 132 | Architectures: all |
133 | Type: system ioctl | 133 | Type: system ioctl |
134 | Parameters: extension identifier (KVM_CAP_*) | 134 | Parameters: extension identifier (KVM_CAP_*) |
135 | Returns: 0 if unsupported; 1 (or some other positive integer) if supported | 135 | Returns: 0 if unsupported; 1 (or some other positive integer) if supported |
136 | 136 | ||
137 | The API allows the application to query about extensions to the core | 137 | The API allows the application to query about extensions to the core |
138 | kvm API. Userspace passes an extension identifier (an integer) and | 138 | kvm API. Userspace passes an extension identifier (an integer) and |
139 | receives an integer that describes the extension availability. | 139 | receives an integer that describes the extension availability. |
140 | Generally 0 means no and 1 means yes, but some extensions may report | 140 | Generally 0 means no and 1 means yes, but some extensions may report |
141 | additional information in the integer return value. | 141 | additional information in the integer return value. |
142 | 142 | ||
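For example, probing one extension from C might look like this (illustration only; KVM_CAP_USER_MEMORY is just a sample identifier):

        #include <fcntl.h>
        #include <linux/kvm.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <unistd.h>

        int main(void)
        {
                int kvm_fd = open("/dev/kvm", O_RDWR);

                if (kvm_fd < 0)
                        return 1;
                /* System ioctl; the argument is the extension identifier. */
                if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) > 0)
                        printf("KVM_CAP_USER_MEMORY is available\n");
                close(kvm_fd);
                return 0;
        }
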
143 | 4.5 KVM_GET_VCPU_MMAP_SIZE | 143 | 4.5 KVM_GET_VCPU_MMAP_SIZE |
144 | 144 | ||
145 | Capability: basic | 145 | Capability: basic |
146 | Architectures: all | 146 | Architectures: all |
147 | Type: system ioctl | 147 | Type: system ioctl |
148 | Parameters: none | 148 | Parameters: none |
149 | Returns: size of vcpu mmap area, in bytes | 149 | Returns: size of vcpu mmap area, in bytes |
150 | 150 | ||
151 | The KVM_RUN ioctl (cf.) communicates with userspace via a shared | 151 | The KVM_RUN ioctl (cf.) communicates with userspace via a shared |
152 | memory region. This ioctl returns the size of that region. See the | 152 | memory region. This ioctl returns the size of that region. See the |
153 | KVM_RUN documentation for details. | 153 | KVM_RUN documentation for details. |
154 | 154 | ||
155 | 4.6 KVM_SET_MEMORY_REGION | 155 | 4.6 KVM_SET_MEMORY_REGION |
156 | 156 | ||
157 | Capability: basic | 157 | Capability: basic |
158 | Architectures: all | 158 | Architectures: all |
159 | Type: vm ioctl | 159 | Type: vm ioctl |
160 | Parameters: struct kvm_memory_region (in) | 160 | Parameters: struct kvm_memory_region (in) |
161 | Returns: 0 on success, -1 on error | 161 | Returns: 0 on success, -1 on error |
162 | 162 | ||
163 | struct kvm_memory_region { | 163 | struct kvm_memory_region { |
164 | __u32 slot; | 164 | __u32 slot; |
165 | __u32 flags; | 165 | __u32 flags; |
166 | __u64 guest_phys_addr; | 166 | __u64 guest_phys_addr; |
167 | __u64 memory_size; /* bytes */ | 167 | __u64 memory_size; /* bytes */ |
168 | }; | 168 | }; |
169 | 169 | ||
170 | /* for kvm_memory_region::flags */ | 170 | /* for kvm_memory_region::flags */ |
171 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL | 171 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL |
172 | 172 | ||
173 | This ioctl allows the user to create or modify a guest physical memory | 173 | This ioctl allows the user to create or modify a guest physical memory |
174 | slot. When changing an existing slot, it may be moved in the guest | 174 | slot. When changing an existing slot, it may be moved in the guest |
175 | physical memory space, or its flags may be modified. It may not be | 175 | physical memory space, or its flags may be modified. It may not be |
176 | resized. Slots may not overlap. | 176 | resized. Slots may not overlap. |
177 | 177 | ||
178 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which | 178 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which |
179 | instructs kvm to keep track of writes to memory within the slot. See | 179 | instructs kvm to keep track of writes to memory within the slot. See |
180 | the KVM_GET_DIRTY_LOG ioctl. | 180 | the KVM_GET_DIRTY_LOG ioctl. |
181 | 181 | ||
182 | It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead | 182 | It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead |
183 | of this API, if available. This newer API allows placing guest memory | 183 | of this API, if available. This newer API allows placing guest memory |
184 | at specified locations in the host address space, yielding better | 184 | at specified locations in the host address space, yielding better |
185 | control and easy access. | 185 | control and easy access. |
186 | 186 | ||
187 | 4.6 KVM_CREATE_VCPU | 187 | 4.6 KVM_CREATE_VCPU |
188 | 188 | ||
189 | Capability: basic | 189 | Capability: basic |
190 | Architectures: all | 190 | Architectures: all |
191 | Type: vm ioctl | 191 | Type: vm ioctl |
192 | Parameters: vcpu id (apic id on x86) | 192 | Parameters: vcpu id (apic id on x86) |
193 | Returns: vcpu fd on success, -1 on error | 193 | Returns: vcpu fd on success, -1 on error |
194 | 194 | ||
195 | This API adds a vcpu to a virtual machine. The vcpu id is a small integer | 195 | This API adds a vcpu to a virtual machine. The vcpu id is a small integer |
196 | in the range [0, max_vcpus). | 196 | in the range [0, max_vcpus). |
197 | 197 | ||
198 | 4.7 KVM_GET_DIRTY_LOG (vm ioctl) | 198 | 4.7 KVM_GET_DIRTY_LOG (vm ioctl) |
199 | 199 | ||
200 | Capability: basic | 200 | Capability: basic |
201 | Architectures: x86 | 201 | Architectures: x86 |
202 | Type: vm ioctl | 202 | Type: vm ioctl |
203 | Parameters: struct kvm_dirty_log (in/out) | 203 | Parameters: struct kvm_dirty_log (in/out) |
204 | Returns: 0 on success, -1 on error | 204 | Returns: 0 on success, -1 on error |
205 | 205 | ||
206 | /* for KVM_GET_DIRTY_LOG */ | 206 | /* for KVM_GET_DIRTY_LOG */ |
207 | struct kvm_dirty_log { | 207 | struct kvm_dirty_log { |
208 | __u32 slot; | 208 | __u32 slot; |
209 | __u32 padding; | 209 | __u32 padding; |
210 | union { | 210 | union { |
211 | void __user *dirty_bitmap; /* one bit per page */ | 211 | void __user *dirty_bitmap; /* one bit per page */ |
212 | __u64 padding; | 212 | __u64 padding; |
213 | }; | 213 | }; |
214 | }; | 214 | }; |
215 | 215 | ||
216 | Given a memory slot, return a bitmap containing any pages dirtied | 216 | Given a memory slot, return a bitmap containing any pages dirtied |
217 | since the last call to this ioctl. Bit 0 is the first page in the | 217 | since the last call to this ioctl. Bit 0 is the first page in the |
218 | memory slot. Ensure the entire structure is cleared to avoid padding | 218 | memory slot. Ensure the entire structure is cleared to avoid padding |
219 | issues. | 219 | issues. |
220 | 220 | ||
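A sketch of a retrieval helper (illustration only; it assumes 4K guest pages and that the caller remembers the slot's size, and it rounds the bitmap up to 64-bit words, which matches what the kernel copies out):

        #include <linux/kvm.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/ioctl.h>

        /* Fetch the dirty bitmap for 'slot'; mem_bytes is the slot size
         * registered earlier.  One bit per page, bit 0 = first page. */
        static void *get_dirty_log(int vm_fd, __u32 slot, size_t mem_bytes)
        {
                size_t npages = mem_bytes / 4096;       /* 4K pages assumed */
                void *bitmap = calloc(1, ((npages + 63) / 64) * 8);
                struct kvm_dirty_log log;

                memset(&log, 0, sizeof(log));   /* clear padding, as noted */
                log.slot = slot;
                log.dirty_bitmap = bitmap;
                if (!bitmap || ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
                        free(bitmap);
                        return NULL;
                }
                return bitmap;
        }
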
221 | 4.8 KVM_SET_MEMORY_ALIAS | 221 | 4.8 KVM_SET_MEMORY_ALIAS |
222 | 222 | ||
223 | Capability: basic | 223 | Capability: basic |
224 | Architectures: x86 | 224 | Architectures: x86 |
225 | Type: vm ioctl | 225 | Type: vm ioctl |
226 | Parameters: struct kvm_memory_alias (in) | 226 | Parameters: struct kvm_memory_alias (in) |
227 | Returns: 0 (success), -1 (error) | 227 | Returns: 0 (success), -1 (error) |
228 | 228 | ||
229 | struct kvm_memory_alias { | 229 | This ioctl is obsolete and has been removed. |
230 | __u32 slot; /* this has a different namespace than memory slots */ | ||
231 | __u32 flags; | ||
232 | __u64 guest_phys_addr; | ||
233 | __u64 memory_size; | ||
234 | __u64 target_phys_addr; | ||
235 | }; | ||
236 | |||
237 | Defines a guest physical address space region as an alias to another | ||
238 | region. Useful for aliased address, for example the VGA low memory | ||
239 | window. Should not be used with userspace memory. | ||
240 | 230 | ||
241 | 4.9 KVM_RUN | 231 | 4.9 KVM_RUN |
242 | 232 | ||
243 | Capability: basic | 233 | Capability: basic |
244 | Architectures: all | 234 | Architectures: all |
245 | Type: vcpu ioctl | 235 | Type: vcpu ioctl |
246 | Parameters: none | 236 | Parameters: none |
247 | Returns: 0 on success, -1 on error | 237 | Returns: 0 on success, -1 on error |
248 | Errors: | 238 | Errors: |
249 | EINTR: an unmasked signal is pending | 239 | EINTR: an unmasked signal is pending |
250 | 240 | ||
251 | This ioctl is used to run a guest virtual cpu. While there are no | 241 | This ioctl is used to run a guest virtual cpu. While there are no |
252 | explicit parameters, there is an implicit parameter block that can be | 242 | explicit parameters, there is an implicit parameter block that can be |
253 | obtained by mmap()ing the vcpu fd at offset 0, with the size given by | 243 | obtained by mmap()ing the vcpu fd at offset 0, with the size given by |
254 | KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct | 244 | KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct |
255 | kvm_run' (see below). | 245 | kvm_run' (see below). |
256 | 246 | ||
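A minimal run loop might look like the sketch below (illustration only; kvm_fd is the /dev/kvm handle, vcpu_fd comes from KVM_CREATE_VCPU, and only two of the many exit reasons are handled):

        #include <linux/kvm.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>

        static int run_vcpu(int kvm_fd, int vcpu_fd)
        {
                int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
                struct kvm_run *run;

                if (size < 0)
                        return -1;
                /* The implicit parameter block: mmap of the vcpu fd at 0. */
                run = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, vcpu_fd, 0);
                if (run == MAP_FAILED)
                        return -1;
                for (;;) {
                        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
                                return -1;      /* e.g. EINTR on a signal */
                        switch (run->exit_reason) {
                        case KVM_EXIT_HLT:
                                return 0;       /* guest halted */
                        case KVM_EXIT_IO:
                                /* decode run->io and emulate the access */
                                break;
                        default:
                                fprintf(stderr, "unhandled exit %d\n",
                                        run->exit_reason);
                                return -1;
                        }
                }
        }
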
257 | 4.10 KVM_GET_REGS | 247 | 4.10 KVM_GET_REGS |
258 | 248 | ||
259 | Capability: basic | 249 | Capability: basic |
260 | Architectures: all | 250 | Architectures: all |
261 | Type: vcpu ioctl | 251 | Type: vcpu ioctl |
262 | Parameters: struct kvm_regs (out) | 252 | Parameters: struct kvm_regs (out) |
263 | Returns: 0 on success, -1 on error | 253 | Returns: 0 on success, -1 on error |
264 | 254 | ||
265 | Reads the general purpose registers from the vcpu. | 255 | Reads the general purpose registers from the vcpu. |
266 | 256 | ||
267 | /* x86 */ | 257 | /* x86 */ |
268 | struct kvm_regs { | 258 | struct kvm_regs { |
269 | /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ | 259 | /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ |
270 | __u64 rax, rbx, rcx, rdx; | 260 | __u64 rax, rbx, rcx, rdx; |
271 | __u64 rsi, rdi, rsp, rbp; | 261 | __u64 rsi, rdi, rsp, rbp; |
272 | __u64 r8, r9, r10, r11; | 262 | __u64 r8, r9, r10, r11; |
273 | __u64 r12, r13, r14, r15; | 263 | __u64 r12, r13, r14, r15; |
274 | __u64 rip, rflags; | 264 | __u64 rip, rflags; |
275 | }; | 265 | }; |
276 | 266 | ||
277 | 4.11 KVM_SET_REGS | 267 | 4.11 KVM_SET_REGS |
278 | 268 | ||
279 | Capability: basic | 269 | Capability: basic |
280 | Architectures: all | 270 | Architectures: all |
281 | Type: vcpu ioctl | 271 | Type: vcpu ioctl |
282 | Parameters: struct kvm_regs (in) | 272 | Parameters: struct kvm_regs (in) |
283 | Returns: 0 on success, -1 on error | 273 | Returns: 0 on success, -1 on error |
284 | 274 | ||
285 | Writes the general purpose registers into the vcpu. | 275 | Writes the general purpose registers into the vcpu. |
286 | 276 | ||
287 | See KVM_GET_REGS for the data structure. | 277 | See KVM_GET_REGS for the data structure. |
288 | 278 | ||
289 | 4.12 KVM_GET_SREGS | 279 | 4.12 KVM_GET_SREGS |
290 | 280 | ||
291 | Capability: basic | 281 | Capability: basic |
292 | Architectures: x86 | 282 | Architectures: x86 |
293 | Type: vcpu ioctl | 283 | Type: vcpu ioctl |
294 | Parameters: struct kvm_sregs (out) | 284 | Parameters: struct kvm_sregs (out) |
295 | Returns: 0 on success, -1 on error | 285 | Returns: 0 on success, -1 on error |
296 | 286 | ||
297 | Reads special registers from the vcpu. | 287 | Reads special registers from the vcpu. |
298 | 288 | ||
299 | /* x86 */ | 289 | /* x86 */ |
300 | struct kvm_sregs { | 290 | struct kvm_sregs { |
301 | struct kvm_segment cs, ds, es, fs, gs, ss; | 291 | struct kvm_segment cs, ds, es, fs, gs, ss; |
302 | struct kvm_segment tr, ldt; | 292 | struct kvm_segment tr, ldt; |
303 | struct kvm_dtable gdt, idt; | 293 | struct kvm_dtable gdt, idt; |
304 | __u64 cr0, cr2, cr3, cr4, cr8; | 294 | __u64 cr0, cr2, cr3, cr4, cr8; |
305 | __u64 efer; | 295 | __u64 efer; |
306 | __u64 apic_base; | 296 | __u64 apic_base; |
307 | __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; | 297 | __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; |
308 | }; | 298 | }; |
309 | 299 | ||
310 | interrupt_bitmap is a bitmap of pending external interrupts. At most | 300 | interrupt_bitmap is a bitmap of pending external interrupts. At most |
311 | one bit may be set. This interrupt has been acknowledged by the APIC | 301 | one bit may be set. This interrupt has been acknowledged by the APIC |
312 | but not yet injected into the cpu core. | 302 | but not yet injected into the cpu core. |
313 | 303 | ||
314 | 4.13 KVM_SET_SREGS | 304 | 4.13 KVM_SET_SREGS |
315 | 305 | ||
316 | Capability: basic | 306 | Capability: basic |
317 | Architectures: x86 | 307 | Architectures: x86 |
318 | Type: vcpu ioctl | 308 | Type: vcpu ioctl |
319 | Parameters: struct kvm_sregs (in) | 309 | Parameters: struct kvm_sregs (in) |
320 | Returns: 0 on success, -1 on error | 310 | Returns: 0 on success, -1 on error |
321 | 311 | ||
322 | Writes special registers into the vcpu. See KVM_GET_SREGS for the | 312 | Writes special registers into the vcpu. See KVM_GET_SREGS for the |
323 | data structures. | 313 | data structures. |
324 | 314 | ||
325 | 4.14 KVM_TRANSLATE | 315 | 4.14 KVM_TRANSLATE |
326 | 316 | ||
327 | Capability: basic | 317 | Capability: basic |
328 | Architectures: x86 | 318 | Architectures: x86 |
329 | Type: vcpu ioctl | 319 | Type: vcpu ioctl |
330 | Parameters: struct kvm_translation (in/out) | 320 | Parameters: struct kvm_translation (in/out) |
331 | Returns: 0 on success, -1 on error | 321 | Returns: 0 on success, -1 on error |
332 | 322 | ||
333 | Translates a virtual address according to the vcpu's current address | 323 | Translates a virtual address according to the vcpu's current address |
334 | translation mode. | 324 | translation mode. |
335 | 325 | ||
336 | struct kvm_translation { | 326 | struct kvm_translation { |
337 | /* in */ | 327 | /* in */ |
338 | __u64 linear_address; | 328 | __u64 linear_address; |
339 | 329 | ||
340 | /* out */ | 330 | /* out */ |
341 | __u64 physical_address; | 331 | __u64 physical_address; |
342 | __u8 valid; | 332 | __u8 valid; |
343 | __u8 writeable; | 333 | __u8 writeable; |
344 | __u8 usermode; | 334 | __u8 usermode; |
345 | __u8 pad[5]; | 335 | __u8 pad[5]; |
346 | }; | 336 | }; |
347 | 337 | ||
348 | 4.15 KVM_INTERRUPT | 338 | 4.15 KVM_INTERRUPT |
349 | 339 | ||
350 | Capability: basic | 340 | Capability: basic |
351 | Architectures: x86 | 341 | Architectures: x86 |
352 | Type: vcpu ioctl | 342 | Type: vcpu ioctl |
353 | Parameters: struct kvm_interrupt (in) | 343 | Parameters: struct kvm_interrupt (in) |
354 | Returns: 0 on success, -1 on error | 344 | Returns: 0 on success, -1 on error |
355 | 345 | ||
356 | Queues a hardware interrupt vector to be injected. This is only | 346 | Queues a hardware interrupt vector to be injected. This is only |
357 | useful if in-kernel local APIC is not used. | 347 | useful if in-kernel local APIC is not used. |
358 | 348 | ||
359 | /* for KVM_INTERRUPT */ | 349 | /* for KVM_INTERRUPT */ |
360 | struct kvm_interrupt { | 350 | struct kvm_interrupt { |
361 | /* in */ | 351 | /* in */ |
362 | __u32 irq; | 352 | __u32 irq; |
363 | }; | 353 | }; |
364 | 354 | ||
365 | Note 'irq' is an interrupt vector, not an interrupt pin or line. | 355 | Note 'irq' is an interrupt vector, not an interrupt pin or line. |
366 | 356 | ||
367 | 4.16 KVM_DEBUG_GUEST | 357 | 4.16 KVM_DEBUG_GUEST |
368 | 358 | ||
369 | Capability: basic | 359 | Capability: basic |
370 | Architectures: none | 360 | Architectures: none |
371 | Type: vcpu ioctl | 361 | Type: vcpu ioctl |
372 | Parameters: none | 362 | Parameters: none |
373 | Returns: -1 on error | 363 | Returns: -1 on error |
374 | 364 | ||
375 | Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. | 365 | Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. |
376 | 366 | ||
377 | 4.17 KVM_GET_MSRS | 367 | 4.17 KVM_GET_MSRS |
378 | 368 | ||
379 | Capability: basic | 369 | Capability: basic |
380 | Architectures: x86 | 370 | Architectures: x86 |
381 | Type: vcpu ioctl | 371 | Type: vcpu ioctl |
382 | Parameters: struct kvm_msrs (in/out) | 372 | Parameters: struct kvm_msrs (in/out) |
383 | Returns: 0 on success, -1 on error | 373 | Returns: 0 on success, -1 on error |
384 | 374 | ||
385 | Reads model-specific registers from the vcpu. Supported msr indices can | 375 | Reads model-specific registers from the vcpu. Supported msr indices can |
386 | be obtained using KVM_GET_MSR_INDEX_LIST. | 376 | be obtained using KVM_GET_MSR_INDEX_LIST. |
387 | 377 | ||
388 | struct kvm_msrs { | 378 | struct kvm_msrs { |
389 | __u32 nmsrs; /* number of msrs in entries */ | 379 | __u32 nmsrs; /* number of msrs in entries */ |
390 | __u32 pad; | 380 | __u32 pad; |
391 | 381 | ||
392 | struct kvm_msr_entry entries[0]; | 382 | struct kvm_msr_entry entries[0]; |
393 | }; | 383 | }; |
394 | 384 | ||
395 | struct kvm_msr_entry { | 385 | struct kvm_msr_entry { |
396 | __u32 index; | 386 | __u32 index; |
397 | __u32 reserved; | 387 | __u32 reserved; |
398 | __u64 data; | 388 | __u64 data; |
399 | }; | 389 | }; |
400 | 390 | ||
401 | Application code should set the 'nmsrs' member (which indicates the | 391 | Application code should set the 'nmsrs' member (which indicates the |
402 | size of the entries array) and the 'index' member of each array entry. | 392 | size of the entries array) and the 'index' member of each array entry. |
403 | kvm will fill in the 'data' member. | 393 | kvm will fill in the 'data' member. |
404 | 394 | ||
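A sketch of reading a single MSR (illustration only; the index is whatever the application picked from KVM_GET_MSR_INDEX_LIST):

        #include <linux/kvm.h>
        #include <stdlib.h>
        #include <sys/ioctl.h>

        /* Read one MSR from the vcpu into *value; 0 on success. */
        static int read_msr(int vcpu_fd, __u32 index, __u64 *value)
        {
                struct kvm_msrs *msrs =
                        calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
                int ret = -1;

                if (!msrs)
                        return -1;
                msrs->nmsrs = 1;                /* size of the entries array */
                msrs->entries[0].index = index; /* kvm fills in .data */
                if (ioctl(vcpu_fd, KVM_GET_MSRS, msrs) >= 0) {
                        *value = msrs->entries[0].data;
                        ret = 0;
                }
                free(msrs);
                return ret;
        }
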
405 | 4.18 KVM_SET_MSRS | 395 | 4.18 KVM_SET_MSRS |
406 | 396 | ||
407 | Capability: basic | 397 | Capability: basic |
408 | Architectures: x86 | 398 | Architectures: x86 |
409 | Type: vcpu ioctl | 399 | Type: vcpu ioctl |
410 | Parameters: struct kvm_msrs (in) | 400 | Parameters: struct kvm_msrs (in) |
411 | Returns: 0 on success, -1 on error | 401 | Returns: 0 on success, -1 on error |
412 | 402 | ||
413 | Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the | 403 | Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the |
414 | data structures. | 404 | data structures. |
415 | 405 | ||
416 | Application code should set the 'nmsrs' member (which indicates the | 406 | Application code should set the 'nmsrs' member (which indicates the |
417 | size of the entries array), and the 'index' and 'data' members of each | 407 | size of the entries array), and the 'index' and 'data' members of each |
418 | array entry. | 408 | array entry. |
419 | 409 | ||
420 | 4.19 KVM_SET_CPUID | 410 | 4.19 KVM_SET_CPUID |
421 | 411 | ||
422 | Capability: basic | 412 | Capability: basic |
423 | Architectures: x86 | 413 | Architectures: x86 |
424 | Type: vcpu ioctl | 414 | Type: vcpu ioctl |
425 | Parameters: struct kvm_cpuid (in) | 415 | Parameters: struct kvm_cpuid (in) |
426 | Returns: 0 on success, -1 on error | 416 | Returns: 0 on success, -1 on error |
427 | 417 | ||
428 | Defines the vcpu responses to the cpuid instruction. Applications | 418 | Defines the vcpu responses to the cpuid instruction. Applications |
429 | should use the KVM_SET_CPUID2 ioctl if available. | 419 | should use the KVM_SET_CPUID2 ioctl if available. |
430 | 420 | ||
431 | 421 | ||
432 | struct kvm_cpuid_entry { | 422 | struct kvm_cpuid_entry { |
433 | __u32 function; | 423 | __u32 function; |
434 | __u32 eax; | 424 | __u32 eax; |
435 | __u32 ebx; | 425 | __u32 ebx; |
436 | __u32 ecx; | 426 | __u32 ecx; |
437 | __u32 edx; | 427 | __u32 edx; |
438 | __u32 padding; | 428 | __u32 padding; |
439 | }; | 429 | }; |
440 | 430 | ||
441 | /* for KVM_SET_CPUID */ | 431 | /* for KVM_SET_CPUID */ |
442 | struct kvm_cpuid { | 432 | struct kvm_cpuid { |
443 | __u32 nent; | 433 | __u32 nent; |
444 | __u32 padding; | 434 | __u32 padding; |
445 | struct kvm_cpuid_entry entries[0]; | 435 | struct kvm_cpuid_entry entries[0]; |
446 | }; | 436 | }; |
447 | 437 | ||
448 | 4.20 KVM_SET_SIGNAL_MASK | 438 | 4.20 KVM_SET_SIGNAL_MASK |
449 | 439 | ||
450 | Capability: basic | 440 | Capability: basic |
451 | Architectures: x86 | 441 | Architectures: x86 |
452 | Type: vcpu ioctl | 442 | Type: vcpu ioctl |
453 | Parameters: struct kvm_signal_mask (in) | 443 | Parameters: struct kvm_signal_mask (in) |
454 | Returns: 0 on success, -1 on error | 444 | Returns: 0 on success, -1 on error |
455 | 445 | ||
456 | Defines which signals are blocked during execution of KVM_RUN. This | 446 | Defines which signals are blocked during execution of KVM_RUN. This |
457 | signal mask temporarily overrides the thread's signal mask. Any | 447 | signal mask temporarily overrides the thread's signal mask. Any |
458 | unblocked signal received (except SIGKILL and SIGSTOP, which retain | 448 | unblocked signal received (except SIGKILL and SIGSTOP, which retain |
459 | their traditional behaviour) will cause KVM_RUN to return with -EINTR. | 449 | their traditional behaviour) will cause KVM_RUN to return with -EINTR. |
460 | 450 | ||
461 | Note the signal will only be delivered if not blocked by the original | 451 | Note the signal will only be delivered if not blocked by the original |
462 | signal mask. | 452 | signal mask. |
463 | 453 | ||
464 | /* for KVM_SET_SIGNAL_MASK */ | 454 | /* for KVM_SET_SIGNAL_MASK */ |
465 | struct kvm_signal_mask { | 455 | struct kvm_signal_mask { |
466 | __u32 len; | 456 | __u32 len; |
467 | __u8 sigset[0]; | 457 | __u8 sigset[0]; |
468 | }; | 458 | }; |
469 | 459 | ||
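A sketch of installing a run-time mask that leaves only SIGUSR1 deliverable (illustration only; note that the kernel expects 'len' to be the kernel-side sigset size -- 8 bytes on 64-bit x86 -- not glibc's much larger sigset_t):

        #include <linux/kvm.h>
        #include <signal.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/ioctl.h>

        static int set_run_sigmask(int vcpu_fd)
        {
                sigset_t set;
                struct kvm_signal_mask *mask;
                int ret;

                sigfillset(&set);
                sigdelset(&set, SIGUSR1); /* SIGUSR1 will interrupt KVM_RUN */
                mask = malloc(sizeof(*mask) + 8);
                if (!mask)
                        return -1;
                mask->len = 8;            /* kernel sigset size, not glibc's */
                memcpy(mask->sigset, &set, 8);
                ret = ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, mask);
                free(mask);
                return ret;
        }
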
470 | 4.21 KVM_GET_FPU | 460 | 4.21 KVM_GET_FPU |
471 | 461 | ||
472 | Capability: basic | 462 | Capability: basic |
473 | Architectures: x86 | 463 | Architectures: x86 |
474 | Type: vcpu ioctl | 464 | Type: vcpu ioctl |
475 | Parameters: struct kvm_fpu (out) | 465 | Parameters: struct kvm_fpu (out) |
476 | Returns: 0 on success, -1 on error | 466 | Returns: 0 on success, -1 on error |
477 | 467 | ||
478 | Reads the floating point state from the vcpu. | 468 | Reads the floating point state from the vcpu. |
479 | 469 | ||
480 | /* for KVM_GET_FPU and KVM_SET_FPU */ | 470 | /* for KVM_GET_FPU and KVM_SET_FPU */ |
481 | struct kvm_fpu { | 471 | struct kvm_fpu { |
482 | __u8 fpr[8][16]; | 472 | __u8 fpr[8][16]; |
483 | __u16 fcw; | 473 | __u16 fcw; |
484 | __u16 fsw; | 474 | __u16 fsw; |
485 | __u8 ftwx; /* in fxsave format */ | 475 | __u8 ftwx; /* in fxsave format */ |
486 | __u8 pad1; | 476 | __u8 pad1; |
487 | __u16 last_opcode; | 477 | __u16 last_opcode; |
488 | __u64 last_ip; | 478 | __u64 last_ip; |
489 | __u64 last_dp; | 479 | __u64 last_dp; |
490 | __u8 xmm[16][16]; | 480 | __u8 xmm[16][16]; |
491 | __u32 mxcsr; | 481 | __u32 mxcsr; |
492 | __u32 pad2; | 482 | __u32 pad2; |
493 | }; | 483 | }; |
494 | 484 | ||
495 | 4.22 KVM_SET_FPU | 485 | 4.22 KVM_SET_FPU |
496 | 486 | ||
497 | Capability: basic | 487 | Capability: basic |
498 | Architectures: x86 | 488 | Architectures: x86 |
499 | Type: vcpu ioctl | 489 | Type: vcpu ioctl |
500 | Parameters: struct kvm_fpu (in) | 490 | Parameters: struct kvm_fpu (in) |
501 | Returns: 0 on success, -1 on error | 491 | Returns: 0 on success, -1 on error |
502 | 492 | ||
503 | Writes the floating point state to the vcpu. | 493 | Writes the floating point state to the vcpu. |
504 | 494 | ||
505 | /* for KVM_GET_FPU and KVM_SET_FPU */ | 495 | /* for KVM_GET_FPU and KVM_SET_FPU */ |
506 | struct kvm_fpu { | 496 | struct kvm_fpu { |
507 | __u8 fpr[8][16]; | 497 | __u8 fpr[8][16]; |
508 | __u16 fcw; | 498 | __u16 fcw; |
509 | __u16 fsw; | 499 | __u16 fsw; |
510 | __u8 ftwx; /* in fxsave format */ | 500 | __u8 ftwx; /* in fxsave format */ |
511 | __u8 pad1; | 501 | __u8 pad1; |
512 | __u16 last_opcode; | 502 | __u16 last_opcode; |
513 | __u64 last_ip; | 503 | __u64 last_ip; |
514 | __u64 last_dp; | 504 | __u64 last_dp; |
515 | __u8 xmm[16][16]; | 505 | __u8 xmm[16][16]; |
516 | __u32 mxcsr; | 506 | __u32 mxcsr; |
517 | __u32 pad2; | 507 | __u32 pad2; |
518 | }; | 508 | }; |
519 | 509 | ||
520 | 4.23 KVM_CREATE_IRQCHIP | 510 | 4.23 KVM_CREATE_IRQCHIP |
521 | 511 | ||
522 | Capability: KVM_CAP_IRQCHIP | 512 | Capability: KVM_CAP_IRQCHIP |
523 | Architectures: x86, ia64 | 513 | Architectures: x86, ia64 |
524 | Type: vm ioctl | 514 | Type: vm ioctl |
525 | Parameters: none | 515 | Parameters: none |
526 | Returns: 0 on success, -1 on error | 516 | Returns: 0 on success, -1 on error |
527 | 517 | ||
528 | Creates an interrupt controller model in the kernel. On x86, creates a virtual | 518 | Creates an interrupt controller model in the kernel. On x86, creates a virtual |
529 | ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a | 519 | ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a |
530 | local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSIs 16-23 | 520 | local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSIs 16-23 |
531 | only go to the IOAPIC. On ia64, an IOSAPIC is created. | 521 | only go to the IOAPIC. On ia64, an IOSAPIC is created. |
532 | 522 | ||
533 | 4.24 KVM_IRQ_LINE | 523 | 4.24 KVM_IRQ_LINE |
534 | 524 | ||
535 | Capability: KVM_CAP_IRQCHIP | 525 | Capability: KVM_CAP_IRQCHIP |
536 | Architectures: x86, ia64 | 526 | Architectures: x86, ia64 |
537 | Type: vm ioctl | 527 | Type: vm ioctl |
538 | Parameters: struct kvm_irq_level | 528 | Parameters: struct kvm_irq_level |
539 | Returns: 0 on success, -1 on error | 529 | Returns: 0 on success, -1 on error |
540 | 530 | ||
541 | Sets the level of a GSI input to the interrupt controller model in the kernel. | 531 | Sets the level of a GSI input to the interrupt controller model in the kernel. |
542 | Requires that an interrupt controller model has been previously created with | 532 | Requires that an interrupt controller model has been previously created with |
543 | KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level | 533 | KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level |
544 | to be set to 1 and then back to 0. | 534 | to be set to 1 and then back to 0. |
545 | 535 | ||
546 | struct kvm_irq_level { | 536 | struct kvm_irq_level { |
547 | union { | 537 | union { |
548 | __u32 irq; /* GSI */ | 538 | __u32 irq; /* GSI */ |
549 | __s32 status; /* not used for KVM_IRQ_LEVEL */ | 539 | __s32 status; /* not used for KVM_IRQ_LEVEL */ |
550 | }; | 540 | }; |
551 | __u32 level; /* 0 or 1 */ | 541 | __u32 level; /* 0 or 1 */ |
552 | }; | 542 | }; |
553 | 543 | ||
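For an edge-triggered source, the 1-then-0 sequence above can be wrapped in a small helper (illustration only):

        #include <linux/kvm.h>
        #include <string.h>
        #include <sys/ioctl.h>

        /* Pulse a GSI: raise the line, then lower it again. */
        static int pulse_gsi(int vm_fd, __u32 gsi)
        {
                struct kvm_irq_level irq;

                memset(&irq, 0, sizeof(irq));
                irq.irq = gsi;
                irq.level = 1;
                if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)
                        return -1;
                irq.level = 0;
                return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
        }
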
554 | 4.25 KVM_GET_IRQCHIP | 544 | 4.25 KVM_GET_IRQCHIP |
555 | 545 | ||
556 | Capability: KVM_CAP_IRQCHIP | 546 | Capability: KVM_CAP_IRQCHIP |
557 | Architectures: x86, ia64 | 547 | Architectures: x86, ia64 |
558 | Type: vm ioctl | 548 | Type: vm ioctl |
559 | Parameters: struct kvm_irqchip (in/out) | 549 | Parameters: struct kvm_irqchip (in/out) |
560 | Returns: 0 on success, -1 on error | 550 | Returns: 0 on success, -1 on error |
561 | 551 | ||
562 | Reads the state of a kernel interrupt controller created with | 552 | Reads the state of a kernel interrupt controller created with |
563 | KVM_CREATE_IRQCHIP into a buffer provided by the caller. | 553 | KVM_CREATE_IRQCHIP into a buffer provided by the caller. |
564 | 554 | ||
565 | struct kvm_irqchip { | 555 | struct kvm_irqchip { |
566 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ | 556 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ |
567 | __u32 pad; | 557 | __u32 pad; |
568 | union { | 558 | union { |
569 | char dummy[512]; /* reserving space */ | 559 | char dummy[512]; /* reserving space */ |
570 | struct kvm_pic_state pic; | 560 | struct kvm_pic_state pic; |
571 | struct kvm_ioapic_state ioapic; | 561 | struct kvm_ioapic_state ioapic; |
572 | } chip; | 562 | } chip; |
573 | }; | 563 | }; |
574 | 564 | ||
575 | 4.26 KVM_SET_IRQCHIP | 565 | 4.26 KVM_SET_IRQCHIP |
576 | 566 | ||
577 | Capability: KVM_CAP_IRQCHIP | 567 | Capability: KVM_CAP_IRQCHIP |
578 | Architectures: x86, ia64 | 568 | Architectures: x86, ia64 |
579 | Type: vm ioctl | 569 | Type: vm ioctl |
580 | Parameters: struct kvm_irqchip (in) | 570 | Parameters: struct kvm_irqchip (in) |
581 | Returns: 0 on success, -1 on error | 571 | Returns: 0 on success, -1 on error |
582 | 572 | ||
583 | Sets the state of a kernel interrupt controller created with | 573 | Sets the state of a kernel interrupt controller created with |
584 | KVM_CREATE_IRQCHIP from a buffer provided by the caller. | 574 | KVM_CREATE_IRQCHIP from a buffer provided by the caller. |
585 | 575 | ||
586 | struct kvm_irqchip { | 576 | struct kvm_irqchip { |
587 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ | 577 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ |
588 | __u32 pad; | 578 | __u32 pad; |
589 | union { | 579 | union { |
590 | char dummy[512]; /* reserving space */ | 580 | char dummy[512]; /* reserving space */ |
591 | struct kvm_pic_state pic; | 581 | struct kvm_pic_state pic; |
592 | struct kvm_ioapic_state ioapic; | 582 | struct kvm_ioapic_state ioapic; |
593 | } chip; | 583 | } chip; |
594 | }; | 584 | }; |
595 | 585 | ||
596 | 4.27 KVM_XEN_HVM_CONFIG | 586 | 4.27 KVM_XEN_HVM_CONFIG |
597 | 587 | ||
598 | Capability: KVM_CAP_XEN_HVM | 588 | Capability: KVM_CAP_XEN_HVM |
599 | Architectures: x86 | 589 | Architectures: x86 |
600 | Type: vm ioctl | 590 | Type: vm ioctl |
601 | Parameters: struct kvm_xen_hvm_config (in) | 591 | Parameters: struct kvm_xen_hvm_config (in) |
602 | Returns: 0 on success, -1 on error | 592 | Returns: 0 on success, -1 on error |
603 | 593 | ||
604 | Sets the MSR that the Xen HVM guest uses to initialize its hypercall | 594 | Sets the MSR that the Xen HVM guest uses to initialize its hypercall |
605 | page, and provides the starting address and size of the hypercall | 595 | page, and provides the starting address and size of the hypercall |
606 | blobs in userspace. When the guest writes the MSR, kvm copies one | 596 | blobs in userspace. When the guest writes the MSR, kvm copies one |
607 | page of a blob (32- or 64-bit, depending on the vcpu mode) to guest | 597 | page of a blob (32- or 64-bit, depending on the vcpu mode) to guest |
608 | memory. | 598 | memory. |
609 | 599 | ||
610 | struct kvm_xen_hvm_config { | 600 | struct kvm_xen_hvm_config { |
611 | __u32 flags; | 601 | __u32 flags; |
612 | __u32 msr; | 602 | __u32 msr; |
613 | __u64 blob_addr_32; | 603 | __u64 blob_addr_32; |
614 | __u64 blob_addr_64; | 604 | __u64 blob_addr_64; |
615 | __u8 blob_size_32; | 605 | __u8 blob_size_32; |
616 | __u8 blob_size_64; | 606 | __u8 blob_size_64; |
617 | __u8 pad2[30]; | 607 | __u8 pad2[30]; |
618 | }; | 608 | }; |
619 | 609 | ||
620 | 4.27 KVM_GET_CLOCK | 610 | 4.27 KVM_GET_CLOCK |
621 | 611 | ||
622 | Capability: KVM_CAP_ADJUST_CLOCK | 612 | Capability: KVM_CAP_ADJUST_CLOCK |
623 | Architectures: x86 | 613 | Architectures: x86 |
624 | Type: vm ioctl | 614 | Type: vm ioctl |
625 | Parameters: struct kvm_clock_data (out) | 615 | Parameters: struct kvm_clock_data (out) |
626 | Returns: 0 on success, -1 on error | 616 | Returns: 0 on success, -1 on error |
627 | 617 | ||
628 | Gets the current timestamp of kvmclock as seen by the current guest. In | 618 | Gets the current timestamp of kvmclock as seen by the current guest. In |
629 | conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios | 619 | conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios |
630 | such as migration. | 620 | such as migration. |
631 | 621 | ||
632 | struct kvm_clock_data { | 622 | struct kvm_clock_data { |
633 | __u64 clock; /* kvmclock current value */ | 623 | __u64 clock; /* kvmclock current value */ |
634 | __u32 flags; | 624 | __u32 flags; |
635 | __u32 pad[9]; | 625 | __u32 pad[9]; |
636 | }; | 626 | }; |
637 | 627 | ||
638 | 4.28 KVM_SET_CLOCK | 628 | 4.28 KVM_SET_CLOCK |
639 | 629 | ||
640 | Capability: KVM_CAP_ADJUST_CLOCK | 630 | Capability: KVM_CAP_ADJUST_CLOCK |
641 | Architectures: x86 | 631 | Architectures: x86 |
642 | Type: vm ioctl | 632 | Type: vm ioctl |
643 | Parameters: struct kvm_clock_data (in) | 633 | Parameters: struct kvm_clock_data (in) |
644 | Returns: 0 on success, -1 on error | 634 | Returns: 0 on success, -1 on error |
645 | 635 | ||
646 | Sets the current timestamp of kvmclock to the value specified in its parameter. | 636 | Sets the current timestamp of kvmclock to the value specified in its parameter. |
647 | In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios | 637 | In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios |
648 | such as migration. | 638 | such as migration. |
649 | 639 | ||
650 | struct kvm_clock_data { | 640 | struct kvm_clock_data { |
651 | __u64 clock; /* kvmclock current value */ | 641 | __u64 clock; /* kvmclock current value */ |
652 | __u32 flags; | 642 | __u32 flags; |
653 | __u32 pad[9]; | 643 | __u32 pad[9]; |
654 | }; | 644 | }; |
655 | 645 | ||
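A migration-style sketch of the GET/SET pairing (illustration only; the saved structure would travel to the destination along with the rest of the VM state):

        #include <linux/kvm.h>
        #include <string.h>
        #include <sys/ioctl.h>

        /* Capture kvmclock on the source host... */
        static int save_kvmclock(int vm_fd, struct kvm_clock_data *data)
        {
                memset(data, 0, sizeof(*data));
                return ioctl(vm_fd, KVM_GET_CLOCK, data);
        }

        /* ...and restore it on the destination before resuming vcpus. */
        static int restore_kvmclock(int vm_fd, struct kvm_clock_data *data)
        {
                return ioctl(vm_fd, KVM_SET_CLOCK, data);
        }
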
656 | 4.29 KVM_GET_VCPU_EVENTS | 646 | 4.29 KVM_GET_VCPU_EVENTS |
657 | 647 | ||
658 | Capability: KVM_CAP_VCPU_EVENTS | 648 | Capability: KVM_CAP_VCPU_EVENTS |
659 | Extended by: KVM_CAP_INTR_SHADOW | 649 | Extended by: KVM_CAP_INTR_SHADOW |
660 | Architectures: x86 | 650 | Architectures: x86 |
661 | Type: vcpu ioctl | 651 | Type: vcpu ioctl |
662 | Parameters: struct kvm_vcpu_events (out) | 652 | Parameters: struct kvm_vcpu_events (out) |
663 | Returns: 0 on success, -1 on error | 653 | Returns: 0 on success, -1 on error |
664 | 654 | ||
665 | Gets currently pending exceptions, interrupts, and NMIs as well as related | 655 | Gets currently pending exceptions, interrupts, and NMIs as well as related |
666 | states of the vcpu. | 656 | states of the vcpu. |
667 | 657 | ||
668 | struct kvm_vcpu_events { | 658 | struct kvm_vcpu_events { |
669 | struct { | 659 | struct { |
670 | __u8 injected; | 660 | __u8 injected; |
671 | __u8 nr; | 661 | __u8 nr; |
672 | __u8 has_error_code; | 662 | __u8 has_error_code; |
673 | __u8 pad; | 663 | __u8 pad; |
674 | __u32 error_code; | 664 | __u32 error_code; |
675 | } exception; | 665 | } exception; |
676 | struct { | 666 | struct { |
677 | __u8 injected; | 667 | __u8 injected; |
678 | __u8 nr; | 668 | __u8 nr; |
679 | __u8 soft; | 669 | __u8 soft; |
680 | __u8 shadow; | 670 | __u8 shadow; |
681 | } interrupt; | 671 | } interrupt; |
682 | struct { | 672 | struct { |
683 | __u8 injected; | 673 | __u8 injected; |
684 | __u8 pending; | 674 | __u8 pending; |
685 | __u8 masked; | 675 | __u8 masked; |
686 | __u8 pad; | 676 | __u8 pad; |
687 | } nmi; | 677 | } nmi; |
688 | __u32 sipi_vector; | 678 | __u32 sipi_vector; |
689 | __u32 flags; | 679 | __u32 flags; |
690 | }; | 680 | }; |
691 | 681 | ||
692 | KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that | 682 | KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that |
693 | interrupt.shadow contains a valid state. Otherwise, this field is undefined. | 683 | interrupt.shadow contains a valid state. Otherwise, this field is undefined. |
694 | 684 | ||
695 | 4.30 KVM_SET_VCPU_EVENTS | 685 | 4.30 KVM_SET_VCPU_EVENTS |
696 | 686 | ||
697 | Capability: KVM_CAP_VCPU_EVENTS | 687 | Capability: KVM_CAP_VCPU_EVENTS |
698 | Extended by: KVM_CAP_INTR_SHADOW | 688 | Extended by: KVM_CAP_INTR_SHADOW |
699 | Architectures: x86 | 689 | Architectures: x86 |
700 | Type: vcpu ioctl | 690 | Type: vcpu ioctl |
701 | Parameters: struct kvm_vcpu_events (in) | 691 | Parameters: struct kvm_vcpu_events (in) |
702 | Returns: 0 on success, -1 on error | 692 | Returns: 0 on success, -1 on error |
703 | 693 | ||
704 | Set pending exceptions, interrupts, and NMIs as well as related states of the | 694 | Set pending exceptions, interrupts, and NMIs as well as related states of the |
705 | vcpu. | 695 | vcpu. |
706 | 696 | ||
707 | See KVM_GET_VCPU_EVENTS for the data structure. | 697 | See KVM_GET_VCPU_EVENTS for the data structure. |
708 | 698 | ||
709 | Fields that may be modified asynchronously by running VCPUs can be excluded | 699 | Fields that may be modified asynchronously by running VCPUs can be excluded |
710 | from the update. These fields are nmi.pending and sipi_vector. Keep the | 700 | from the update. These fields are nmi.pending and sipi_vector. Keep the |
711 | corresponding bits in the flags field cleared to suppress overwriting the | 701 | corresponding bits in the flags field cleared to suppress overwriting the |
712 | current in-kernel state. The bits are: | 702 | current in-kernel state. The bits are: |
713 | 703 | ||
714 | KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel | 704 | KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel |
715 | KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector | 705 | KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector |
716 | 706 | ||
717 | If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in | 707 | If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in |
718 | the flags field to signal that interrupt.shadow contains a valid state and | 708 | the flags field to signal that interrupt.shadow contains a valid state and |
719 | shall be written into the VCPU. | 709 | shall be written into the VCPU. |
720 | 710 | ||
721 | 4.32 KVM_GET_DEBUGREGS | 711 | 4.32 KVM_GET_DEBUGREGS |
722 | 712 | ||
723 | Capability: KVM_CAP_DEBUGREGS | 713 | Capability: KVM_CAP_DEBUGREGS |
724 | Architectures: x86 | 714 | Architectures: x86 |
725 | Type: vcpu ioctl | 715 | Type: vcpu ioctl |
726 | Parameters: struct kvm_debugregs (out) | 716 | Parameters: struct kvm_debugregs (out) |
727 | Returns: 0 on success, -1 on error | 717 | Returns: 0 on success, -1 on error |
728 | 718 | ||
729 | Reads debug registers from the vcpu. | 719 | Reads debug registers from the vcpu. |
730 | 720 | ||
731 | struct kvm_debugregs { | 721 | struct kvm_debugregs { |
732 | __u64 db[4]; | 722 | __u64 db[4]; |
733 | __u64 dr6; | 723 | __u64 dr6; |
734 | __u64 dr7; | 724 | __u64 dr7; |
735 | __u64 flags; | 725 | __u64 flags; |
736 | __u64 reserved[9]; | 726 | __u64 reserved[9]; |
737 | }; | 727 | }; |
738 | 728 | ||
739 | 4.33 KVM_SET_DEBUGREGS | 729 | 4.33 KVM_SET_DEBUGREGS |
740 | 730 | ||
741 | Capability: KVM_CAP_DEBUGREGS | 731 | Capability: KVM_CAP_DEBUGREGS |
742 | Architectures: x86 | 732 | Architectures: x86 |
743 | Type: vcpu ioctl | 733 | Type: vcpu ioctl |
744 | Parameters: struct kvm_debugregs (in) | 734 | Parameters: struct kvm_debugregs (in) |
745 | Returns: 0 on success, -1 on error | 735 | Returns: 0 on success, -1 on error |
746 | 736 | ||
747 | Writes debug registers into the vcpu. | 737 | Writes debug registers into the vcpu. |
748 | 738 | ||
749 | See KVM_GET_DEBUGREGS for the data structure. The flags field is not | 739 | See KVM_GET_DEBUGREGS for the data structure. The flags field is not |
750 | yet used and must be cleared on entry. | 740 | yet used and must be cleared on entry. |
751 | 741 | ||
752 | 4.34 KVM_SET_USER_MEMORY_REGION | 742 | 4.34 KVM_SET_USER_MEMORY_REGION |
753 | 743 | ||
754 | Capability: KVM_CAP_USER_MEMORY | 744 | Capability: KVM_CAP_USER_MEMORY |
755 | Architectures: all | 745 | Architectures: all |
756 | Type: vm ioctl | 746 | Type: vm ioctl |
757 | Parameters: struct kvm_userspace_memory_region (in) | 747 | Parameters: struct kvm_userspace_memory_region (in) |
758 | Returns: 0 on success, -1 on error | 748 | Returns: 0 on success, -1 on error |
759 | 749 | ||
760 | struct kvm_userspace_memory_region { | 750 | struct kvm_userspace_memory_region { |
761 | __u32 slot; | 751 | __u32 slot; |
762 | __u32 flags; | 752 | __u32 flags; |
763 | __u64 guest_phys_addr; | 753 | __u64 guest_phys_addr; |
764 | __u64 memory_size; /* bytes */ | 754 | __u64 memory_size; /* bytes */ |
765 | __u64 userspace_addr; /* start of the userspace allocated memory */ | 755 | __u64 userspace_addr; /* start of the userspace allocated memory */ |
766 | }; | 756 | }; |
767 | 757 | ||
768 | /* for kvm_memory_region::flags */ | 758 | /* for kvm_memory_region::flags */ |
769 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL | 759 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL |
770 | 760 | ||
771 | This ioctl allows the user to create or modify a guest physical memory | 761 | This ioctl allows the user to create or modify a guest physical memory |
772 | slot. When changing an existing slot, it may be moved in the guest | 762 | slot. When changing an existing slot, it may be moved in the guest |
773 | physical memory space, or its flags may be modified. It may not be | 763 | physical memory space, or its flags may be modified. It may not be |
774 | resized. Slots may not overlap in guest physical address space. | 764 | resized. Slots may not overlap in guest physical address space. |
775 | 765 | ||
776 | Memory for the region is taken starting at the address denoted by the | 766 | Memory for the region is taken starting at the address denoted by the |
777 | field userspace_addr, which must point at user addressable memory for | 767 | field userspace_addr, which must point at user addressable memory for |
778 | the entire memory slot size. Any object may back this memory, including | 768 | the entire memory slot size. Any object may back this memory, including |
779 | anonymous memory, ordinary files, and hugetlbfs. | 769 | anonymous memory, ordinary files, and hugetlbfs. |
780 | 770 | ||
781 | It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr | 771 | It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr |
782 | be identical. This allows large pages in the guest to be backed by large | 772 | be identical. This allows large pages in the guest to be backed by large |
783 | pages in the host. | 773 | pages in the host. |
784 | 774 | ||
785 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which | 775 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which |
786 | instructs kvm to keep track of writes to memory within the slot. See | 776 | instructs kvm to keep track of writes to memory within the slot. See |
787 | the KVM_GET_DIRTY_LOG ioctl. | 777 | the KVM_GET_DIRTY_LOG ioctl. |
788 | 778 | ||
789 | When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of the memory | 779 | When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of the memory |
790 | region are automatically reflected into the guest. For example, an mmap() | 780 | region are automatically reflected into the guest. For example, an mmap() |
791 | that affects the region will be made visible immediately. Another example | 781 | that affects the region will be made visible immediately. Another example |
792 | is madvise(MADV_DONTNEED). | 782 | is madvise(MADV_DONTNEED). |
793 | 783 | ||
794 | It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. | 784 | It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. |
795 | The KVM_SET_MEMORY_REGION does not allow fine grained control over memory | 785 | The KVM_SET_MEMORY_REGION does not allow fine grained control over memory |
796 | allocation and is deprecated. | 786 | allocation and is deprecated. |
797 | 787 | ||
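A sketch of registering anonymous host memory as guest RAM (illustration only; mmap only guarantees page alignment, so for guest large pages both addresses should additionally share their low 21 bits, as recommended above):

        #include <linux/kvm.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>

        /* Map 'size' bytes of anonymous memory and expose it at 'gpa'. */
        static void *add_ram_slot(int vm_fd, __u32 slot, __u64 gpa, __u64 size)
        {
                struct kvm_userspace_memory_region region;
                void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                if (mem == MAP_FAILED)
                        return NULL;
                memset(&region, 0, sizeof(region));
                region.slot = slot;
                region.guest_phys_addr = gpa;
                region.memory_size = size;
                region.userspace_addr = (unsigned long)mem;
                if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
                        munmap(mem, size);
                        return NULL;
                }
                return mem;
        }
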
798 | 4.35 KVM_SET_TSS_ADDR | 788 | 4.35 KVM_SET_TSS_ADDR |
799 | 789 | ||
800 | Capability: KVM_CAP_SET_TSS_ADDR | 790 | Capability: KVM_CAP_SET_TSS_ADDR |
801 | Architectures: x86 | 791 | Architectures: x86 |
802 | Type: vm ioctl | 792 | Type: vm ioctl |
803 | Parameters: unsigned long tss_address (in) | 793 | Parameters: unsigned long tss_address (in) |
804 | Returns: 0 on success, -1 on error | 794 | Returns: 0 on success, -1 on error |
805 | 795 | ||
806 | This ioctl defines the physical address of a three-page region in the guest | 796 | This ioctl defines the physical address of a three-page region in the guest |
807 | physical address space. The region must be within the first 4GB of the | 797 | physical address space. The region must be within the first 4GB of the |
808 | guest physical address space and must not conflict with any memory slot | 798 | guest physical address space and must not conflict with any memory slot |
809 | or any mmio address. The guest may malfunction if it accesses this memory | 799 | or any mmio address. The guest may malfunction if it accesses this memory |
810 | region. | 800 | region. |
811 | 801 | ||
812 | This ioctl is required on Intel-based hosts. This is needed on Intel hardware | 802 | This ioctl is required on Intel-based hosts. This is needed on Intel hardware |
813 | because of a quirk in the virtualization implementation (see the internals | 803 | because of a quirk in the virtualization implementation (see the internals |
814 | documentation when it pops into existence). | 804 | documentation when it pops into existence). |
815 | 805 | ||
816 | 4.36 KVM_ENABLE_CAP | 806 | 4.36 KVM_ENABLE_CAP |
817 | 807 | ||
818 | Capability: KVM_CAP_ENABLE_CAP | 808 | Capability: KVM_CAP_ENABLE_CAP |
819 | Architectures: ppc | 809 | Architectures: ppc |
820 | Type: vcpu ioctl | 810 | Type: vcpu ioctl |
821 | Parameters: struct kvm_enable_cap (in) | 811 | Parameters: struct kvm_enable_cap (in) |
822 | Returns: 0 on success; -1 on error | 812 | Returns: 0 on success; -1 on error |
823 | 813 | ||
824 | Not all extensions are enabled by default. Using this ioctl the application | 814 | Not all extensions are enabled by default. Using this ioctl the application |
825 | can enable an extension, making it available to the guest. | 815 | can enable an extension, making it available to the guest. |
826 | 816 | ||
827 | On systems that do not support this ioctl, it always fails. On systems that | 817 | On systems that do not support this ioctl, it always fails. On systems that |
828 | do support it, it only works for extensions that are supported for enablement. | 818 | do support it, it only works for extensions that are supported for enablement. |
829 | 819 | ||
830 | To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should | 820 | To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should |
831 | be used. | 821 | be used. |
832 | 822 | ||
833 | struct kvm_enable_cap { | 823 | struct kvm_enable_cap { |
834 | /* in */ | 824 | /* in */ |
835 | __u32 cap; | 825 | __u32 cap; |
836 | 826 | ||
837 | The capability that should be enabled. | 827 | The capability that should be enabled. |
838 | 828 | ||
839 | __u32 flags; | 829 | __u32 flags; |
840 | 830 | ||
841 | A bitfield reserved for future enhancements. Has to be 0 for now. | 831 | A bitfield reserved for future enhancements. Has to be 0 for now. |
842 | 832 | ||
843 | __u64 args[4]; | 833 | __u64 args[4]; |
844 | 834 | ||
845 | Arguments for enabling a feature. If a feature needs initial values to | 835 | Arguments for enabling a feature. If a feature needs initial values to |
846 | function properly, this is the place to put them. | 836 | function properly, this is the place to put them. |
847 | 837 | ||
848 | __u8 pad[64]; | 838 | __u8 pad[64]; |
849 | }; | 839 | }; |
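A sketch of the recommended check-then-enable sequence (KVM_CAP_PPC_OSI is
used purely as an illustrative ppc capability; kvm_fd and vcpu_fd are assumed
to be open file descriptors):

    struct kvm_enable_cap cap;

    if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_OSI) > 0) {
            memset(&cap, 0, sizeof(cap));
            cap.cap = KVM_CAP_PPC_OSI;      /* flags and args stay 0 */
            if (ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap) < 0)
                    perror("KVM_ENABLE_CAP");
    }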
850 | 840 | ||
851 | 4.37 KVM_GET_MP_STATE | 841 | 4.37 KVM_GET_MP_STATE |
852 | 842 | ||
853 | Capability: KVM_CAP_MP_STATE | 843 | Capability: KVM_CAP_MP_STATE |
854 | Architectures: x86, ia64 | 844 | Architectures: x86, ia64 |
855 | Type: vcpu ioctl | 845 | Type: vcpu ioctl |
856 | Parameters: struct kvm_mp_state (out) | 846 | Parameters: struct kvm_mp_state (out) |
857 | Returns: 0 on success; -1 on error | 847 | Returns: 0 on success; -1 on error |
858 | 848 | ||
859 | struct kvm_mp_state { | 849 | struct kvm_mp_state { |
860 | __u32 mp_state; | 850 | __u32 mp_state; |
861 | }; | 851 | }; |
862 | 852 | ||
863 | Returns the vcpu's current "multiprocessing state" (though also valid on | 853 | Returns the vcpu's current "multiprocessing state" (though also valid on |
864 | uniprocessor guests). | 854 | uniprocessor guests). |
865 | 855 | ||
866 | Possible values are: | 856 | Possible values are: |
867 | 857 | ||
868 | - KVM_MP_STATE_RUNNABLE: the vcpu is currently running | 858 | - KVM_MP_STATE_RUNNABLE: the vcpu is currently running |
869 | - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP) | 859 | - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP) |
870 | which has not yet received an INIT signal | 860 | which has not yet received an INIT signal |
871 | - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is | 861 | - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is |
872 | now ready for a SIPI | 862 | now ready for a SIPI |
873 | - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and | 863 | - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and |
874 | is waiting for an interrupt | 864 | is waiting for an interrupt |
875 | - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector | 865 | - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector |
876 | accessible via KVM_GET_VCPU_EVENTS) | 866 | accessible via KVM_GET_VCPU_EVENTS) |
877 | 867 | ||
878 | This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel | 868 | This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel |
879 | irqchip, the multiprocessing state must be maintained by userspace. | 869 | irqchip, the multiprocessing state must be maintained by userspace. |
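A sketch of reading the state (vcpu_fd assumed):

    struct kvm_mp_state mp_state;

    if (ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp_state) == 0 &&
        mp_state.mp_state == KVM_MP_STATE_HALTED)
            printf("vcpu halted, waiting for an interrupt\n");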
880 | 870 | ||
881 | 4.38 KVM_SET_MP_STATE | 871 | 4.38 KVM_SET_MP_STATE |
882 | 872 | ||
883 | Capability: KVM_CAP_MP_STATE | 873 | Capability: KVM_CAP_MP_STATE |
884 | Architectures: x86, ia64 | 874 | Architectures: x86, ia64 |
885 | Type: vcpu ioctl | 875 | Type: vcpu ioctl |
886 | Parameters: struct kvm_mp_state (in) | 876 | Parameters: struct kvm_mp_state (in) |
887 | Returns: 0 on success; -1 on error | 877 | Returns: 0 on success; -1 on error |
888 | 878 | ||
889 | Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for | 879 | Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for |
890 | arguments. | 880 | arguments. |
891 | 881 | ||
892 | This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel | 882 | This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel |
893 | irqchip, the multiprocessing state must be maintained by userspace. | 883 | irqchip, the multiprocessing state must be maintained by userspace. |
894 | 884 | ||
895 | 4.39 KVM_SET_IDENTITY_MAP_ADDR | 885 | 4.39 KVM_SET_IDENTITY_MAP_ADDR |
896 | 886 | ||
897 | Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR | 887 | Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR |
898 | Architectures: x86 | 888 | Architectures: x86 |
899 | Type: vm ioctl | 889 | Type: vm ioctl |
900 | Parameters: unsigned long identity (in) | 890 | Parameters: unsigned long identity (in) |
901 | Returns: 0 on success, -1 on error | 891 | Returns: 0 on success, -1 on error |
902 | 892 | ||
903 | This ioctl defines the physical address of a one-page region in the guest | 893 | This ioctl defines the physical address of a one-page region in the guest |
904 | physical address space. The region must be within the first 4GB of the | 894 | physical address space. The region must be within the first 4GB of the |
905 | guest physical address space and must not conflict with any memory slot | 895 | guest physical address space and must not conflict with any memory slot |
906 | or any mmio address. The guest may malfunction if it accesses this memory | 896 | or any mmio address. The guest may malfunction if it accesses this memory |
907 | region. | 897 | region. |
908 | 898 | ||
909 | This ioctl is required on Intel-based hosts. This is needed on Intel hardware | 899 | This ioctl is required on Intel-based hosts. This is needed on Intel hardware |
910 | because of a quirk in the virtualization implementation (see the internals | 900 | because of a quirk in the virtualization implementation (see the internals |
911 | documentation when it pops into existence). | 901 | documentation when it pops into existence). |
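Note that, unlike KVM_SET_TSS_ADDR, the address is passed through a pointer;
a sketch (the page at 0xfffbc000, adjacent to the conventional TSS pages, is
only an illustrative choice):

    __u64 ident_addr = 0xfffbc000;

    if (ioctl(vm_fd, KVM_SET_IDENTITY_MAP_ADDR, &ident_addr) < 0)
            perror("KVM_SET_IDENTITY_MAP_ADDR");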
912 | 902 | ||
913 | 4.40 KVM_SET_BOOT_CPU_ID | 903 | 4.40 KVM_SET_BOOT_CPU_ID |
914 | 904 | ||
915 | Capability: KVM_CAP_SET_BOOT_CPU_ID | 905 | Capability: KVM_CAP_SET_BOOT_CPU_ID |
916 | Architectures: x86, ia64 | 906 | Architectures: x86, ia64 |
917 | Type: vm ioctl | 907 | Type: vm ioctl |
918 | Parameters: unsigned long vcpu_id | 908 | Parameters: unsigned long vcpu_id |
919 | Returns: 0 on success, -1 on error | 909 | Returns: 0 on success, -1 on error |
920 | 910 | ||
921 | Define which vcpu is the Bootstrap Processor (BSP). Values are the same | 911 | Define which vcpu is the Bootstrap Processor (BSP). Values are the same |
922 | as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default | 912 | as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default |
923 | is vcpu 0. | 913 | is vcpu 0. |
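A sketch; the vcpu id is passed by value, and the call is expected to be
rejected once any vcpus already exist:

    /* Sketch: make vcpu 1 the BSP, before any KVM_CREATE_VCPU calls. */
    if (ioctl(vm_fd, KVM_SET_BOOT_CPU_ID, 1UL) < 0)
            perror("KVM_SET_BOOT_CPU_ID");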
924 | 914 | ||
925 | 4.41 KVM_GET_XSAVE | 915 | 4.41 KVM_GET_XSAVE |
926 | 916 | ||
927 | Capability: KVM_CAP_XSAVE | 917 | Capability: KVM_CAP_XSAVE |
928 | Architectures: x86 | 918 | Architectures: x86 |
929 | Type: vcpu ioctl | 919 | Type: vcpu ioctl |
930 | Parameters: struct kvm_xsave (out) | 920 | Parameters: struct kvm_xsave (out) |
931 | Returns: 0 on success, -1 on error | 921 | Returns: 0 on success, -1 on error |
932 | 922 | ||
933 | struct kvm_xsave { | 923 | struct kvm_xsave { |
934 | __u32 region[1024]; | 924 | __u32 region[1024]; |
935 | }; | 925 | }; |
936 | 926 | ||
937 | This ioctl copies the current vcpu's xsave struct to userspace. | 927 | This ioctl copies the current vcpu's xsave struct to userspace. |
938 | 928 | ||
939 | 4.42 KVM_SET_XSAVE | 929 | 4.42 KVM_SET_XSAVE |
940 | 930 | ||
941 | Capability: KVM_CAP_XSAVE | 931 | Capability: KVM_CAP_XSAVE |
942 | Architectures: x86 | 932 | Architectures: x86 |
943 | Type: vcpu ioctl | 933 | Type: vcpu ioctl |
944 | Parameters: struct kvm_xsave (in) | 934 | Parameters: struct kvm_xsave (in) |
945 | Returns: 0 on success, -1 on error | 935 | Returns: 0 on success, -1 on error |
946 | 936 | ||
947 | struct kvm_xsave { | 937 | struct kvm_xsave { |
948 | __u32 region[1024]; | 938 | __u32 region[1024]; |
949 | }; | 939 | }; |
950 | 940 | ||
951 | This ioctl copies userspace's xsave struct to the kernel. | 941 | This ioctl copies userspace's xsave struct to the kernel. |
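Together with KVM_GET_XSAVE above, this allows a simple save/restore
roundtrip, e.g. for migration (sketch; vcpu_fd assumed):

    struct kvm_xsave xsave;

    if (ioctl(vcpu_fd, KVM_GET_XSAVE, &xsave) < 0)  /* snapshot */
            perror("KVM_GET_XSAVE");
    /* ... transport or store the snapshot ... */
    if (ioctl(vcpu_fd, KVM_SET_XSAVE, &xsave) < 0)  /* restore */
            perror("KVM_SET_XSAVE");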
952 | 942 | ||
953 | 4.43 KVM_GET_XCRS | 943 | 4.43 KVM_GET_XCRS |
954 | 944 | ||
955 | Capability: KVM_CAP_XCRS | 945 | Capability: KVM_CAP_XCRS |
956 | Architectures: x86 | 946 | Architectures: x86 |
957 | Type: vcpu ioctl | 947 | Type: vcpu ioctl |
958 | Parameters: struct kvm_xcrs (out) | 948 | Parameters: struct kvm_xcrs (out) |
959 | Returns: 0 on success, -1 on error | 949 | Returns: 0 on success, -1 on error |
960 | 950 | ||
961 | struct kvm_xcr { | 951 | struct kvm_xcr { |
962 | __u32 xcr; | 952 | __u32 xcr; |
963 | __u32 reserved; | 953 | __u32 reserved; |
964 | __u64 value; | 954 | __u64 value; |
965 | }; | 955 | }; |
966 | 956 | ||
967 | struct kvm_xcrs { | 957 | struct kvm_xcrs { |
968 | __u32 nr_xcrs; | 958 | __u32 nr_xcrs; |
969 | __u32 flags; | 959 | __u32 flags; |
970 | struct kvm_xcr xcrs[KVM_MAX_XCRS]; | 960 | struct kvm_xcr xcrs[KVM_MAX_XCRS]; |
971 | __u64 padding[16]; | 961 | __u64 padding[16]; |
972 | }; | 962 | }; |
973 | 963 | ||
974 | This ioctl copies the current vcpu's xcrs to userspace. | 964 | This ioctl copies the current vcpu's xcrs to userspace. |
975 | 965 | ||
976 | 4.44 KVM_SET_XCRS | 966 | 4.44 KVM_SET_XCRS |
977 | 967 | ||
978 | Capability: KVM_CAP_XCRS | 968 | Capability: KVM_CAP_XCRS |
979 | Architectures: x86 | 969 | Architectures: x86 |
980 | Type: vcpu ioctl | 970 | Type: vcpu ioctl |
981 | Parameters: struct kvm_xcrs (in) | 971 | Parameters: struct kvm_xcrs (in) |
982 | Returns: 0 on success, -1 on error | 972 | Returns: 0 on success, -1 on error |
983 | 973 | ||
984 | struct kvm_xcr { | 974 | struct kvm_xcr { |
985 | __u32 xcr; | 975 | __u32 xcr; |
986 | __u32 reserved; | 976 | __u32 reserved; |
987 | __u64 value; | 977 | __u64 value; |
988 | }; | 978 | }; |
989 | 979 | ||
990 | struct kvm_xcrs { | 980 | struct kvm_xcrs { |
991 | __u32 nr_xcrs; | 981 | __u32 nr_xcrs; |
992 | __u32 flags; | 982 | __u32 flags; |
993 | struct kvm_xcr xcrs[KVM_MAX_XCRS]; | 983 | struct kvm_xcr xcrs[KVM_MAX_XCRS]; |
994 | __u64 padding[16]; | 984 | __u64 padding[16]; |
995 | }; | 985 | }; |
996 | 986 | ||
997 | This ioctl sets the vcpu's xcrs to the values userspace specified. | 987 | This ioctl sets the vcpu's xcrs to the values userspace specified. |
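The same save/restore pairing applies (sketch; on current hardware the first
entry is expected to describe XCR0, the extended-feature enable mask):

    struct kvm_xcrs xcrs;

    if (ioctl(vcpu_fd, KVM_GET_XCRS, &xcrs) < 0)
            perror("KVM_GET_XCRS");
    /* xcrs.xcrs[0] is expected to be XCR0 */
    if (ioctl(vcpu_fd, KVM_SET_XCRS, &xcrs) < 0)
            perror("KVM_SET_XCRS");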
998 | 988 | ||
999 | 5. The kvm_run structure | 989 | 5. The kvm_run structure |
1000 | 990 | ||
1001 | Application code obtains a pointer to the kvm_run structure by | 991 | Application code obtains a pointer to the kvm_run structure by |
1002 | mmap()ing a vcpu fd. From that point, application code can control | 992 | mmap()ing a vcpu fd. From that point, application code can control |
1003 | execution by changing fields in kvm_run prior to calling the KVM_RUN | 993 | execution by changing fields in kvm_run prior to calling the KVM_RUN |
1004 | ioctl, and obtain information about the reason KVM_RUN returned by | 994 | ioctl, and obtain information about the reason KVM_RUN returned by |
1005 | looking up structure members. | 995 | looking up structure members. |
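Concretely, the size of the mapping comes from the KVM_GET_VCPU_MMAP_SIZE
system ioctl; a sketch of the resulting run loop (kvm_fd and vcpu_fd assumed):

    struct kvm_run *run;
    long mmap_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);

    if (mmap_size < 0)
            return -1;
    run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
               MAP_SHARED, vcpu_fd, 0);
    if (run == MAP_FAILED)
            return -1;

    while (ioctl(vcpu_fd, KVM_RUN, 0) == 0) {
            switch (run->exit_reason) {
            /* dispatch on the exit reasons described below */
            }
    }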
1006 | 996 | ||
1007 | struct kvm_run { | 997 | struct kvm_run { |
1008 | /* in */ | 998 | /* in */ |
1009 | __u8 request_interrupt_window; | 999 | __u8 request_interrupt_window; |
1010 | 1000 | ||
1011 | Request that KVM_RUN return when it becomes possible to inject external | 1001 | Request that KVM_RUN return when it becomes possible to inject external |
1012 | interrupts into the guest. Useful in conjunction with KVM_INTERRUPT. | 1002 | interrupts into the guest. Useful in conjunction with KVM_INTERRUPT. |
1013 | 1003 | ||
1014 | __u8 padding1[7]; | 1004 | __u8 padding1[7]; |
1015 | 1005 | ||
1016 | /* out */ | 1006 | /* out */ |
1017 | __u32 exit_reason; | 1007 | __u32 exit_reason; |
1018 | 1008 | ||
1019 | When KVM_RUN has returned successfully (return value 0), this informs | 1009 | When KVM_RUN has returned successfully (return value 0), this informs |
1020 | application code why KVM_RUN has returned. Allowable values for this | 1010 | application code why KVM_RUN has returned. Allowable values for this |
1021 | field are detailed below. | 1011 | field are detailed below. |
1022 | 1012 | ||
1023 | __u8 ready_for_interrupt_injection; | 1013 | __u8 ready_for_interrupt_injection; |
1024 | 1014 | ||
1025 | If request_interrupt_window has been specified, this field indicates | 1015 | If request_interrupt_window has been specified, this field indicates |
1026 | an interrupt can be injected now with KVM_INTERRUPT. | 1016 | an interrupt can be injected now with KVM_INTERRUPT. |
1027 | 1017 | ||
1028 | __u8 if_flag; | 1018 | __u8 if_flag; |
1029 | 1019 | ||
1030 | The value of the current interrupt flag. Only valid if in-kernel | 1020 | The value of the current interrupt flag. Only valid if in-kernel |
1031 | local APIC is not used. | 1021 | local APIC is not used. |
1032 | 1022 | ||
1033 | __u8 padding2[2]; | 1023 | __u8 padding2[2]; |
1034 | 1024 | ||
1035 | /* in (pre_kvm_run), out (post_kvm_run) */ | 1025 | /* in (pre_kvm_run), out (post_kvm_run) */ |
1036 | __u64 cr8; | 1026 | __u64 cr8; |
1037 | 1027 | ||
1038 | The value of the cr8 register. Only valid if in-kernel local APIC is | 1028 | The value of the cr8 register. Only valid if in-kernel local APIC is |
1039 | not used. Both input and output. | 1029 | not used. Both input and output. |
1040 | 1030 | ||
1041 | __u64 apic_base; | 1031 | __u64 apic_base; |
1042 | 1032 | ||
1043 | The value of the APIC BASE msr. Only valid if in-kernel local | 1033 | The value of the APIC BASE msr. Only valid if in-kernel local |
1044 | APIC is not used. Both input and output. | 1034 | APIC is not used. Both input and output. |
1045 | 1035 | ||
1046 | union { | 1036 | union { |
1047 | /* KVM_EXIT_UNKNOWN */ | 1037 | /* KVM_EXIT_UNKNOWN */ |
1048 | struct { | 1038 | struct { |
1049 | __u64 hardware_exit_reason; | 1039 | __u64 hardware_exit_reason; |
1050 | } hw; | 1040 | } hw; |
1051 | 1041 | ||
1052 | If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown | 1042 | If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown |
1053 | reasons. Further architecture-specific information is available in | 1043 | reasons. Further architecture-specific information is available in |
1054 | hardware_exit_reason. | 1044 | hardware_exit_reason. |
1055 | 1045 | ||
1056 | /* KVM_EXIT_FAIL_ENTRY */ | 1046 | /* KVM_EXIT_FAIL_ENTRY */ |
1057 | struct { | 1047 | struct { |
1058 | __u64 hardware_entry_failure_reason; | 1048 | __u64 hardware_entry_failure_reason; |
1059 | } fail_entry; | 1049 | } fail_entry; |
1060 | 1050 | ||
1061 | If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due | 1051 | If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due |
1062 | to unknown reasons. Further architecture-specific information is | 1052 | to unknown reasons. Further architecture-specific information is |
1063 | available in hardware_entry_failure_reason. | 1053 | available in hardware_entry_failure_reason. |
1064 | 1054 | ||
1065 | /* KVM_EXIT_EXCEPTION */ | 1055 | /* KVM_EXIT_EXCEPTION */ |
1066 | struct { | 1056 | struct { |
1067 | __u32 exception; | 1057 | __u32 exception; |
1068 | __u32 error_code; | 1058 | __u32 error_code; |
1069 | } ex; | 1059 | } ex; |
1070 | 1060 | ||
1071 | Unused. | 1061 | Unused. |
1072 | 1062 | ||
1073 | /* KVM_EXIT_IO */ | 1063 | /* KVM_EXIT_IO */ |
1074 | struct { | 1064 | struct { |
1075 | #define KVM_EXIT_IO_IN 0 | 1065 | #define KVM_EXIT_IO_IN 0 |
1076 | #define KVM_EXIT_IO_OUT 1 | 1066 | #define KVM_EXIT_IO_OUT 1 |
1077 | __u8 direction; | 1067 | __u8 direction; |
1078 | __u8 size; /* bytes */ | 1068 | __u8 size; /* bytes */ |
1079 | __u16 port; | 1069 | __u16 port; |
1080 | __u32 count; | 1070 | __u32 count; |
1081 | __u64 data_offset; /* relative to kvm_run start */ | 1071 | __u64 data_offset; /* relative to kvm_run start */ |
1082 | } io; | 1072 | } io; |
1083 | 1073 | ||
1084 | If exit_reason is KVM_EXIT_IO, then the vcpu has | 1074 | If exit_reason is KVM_EXIT_IO, then the vcpu has |
1085 | executed a port I/O instruction which could not be satisfied by kvm. | 1075 | executed a port I/O instruction which could not be satisfied by kvm. |
1086 | data_offset describes where the data is located (KVM_EXIT_IO_OUT) or | 1076 | data_offset describes where the data is located (KVM_EXIT_IO_OUT) or |
1087 | where kvm expects application code to place the data for the next | 1077 | where kvm expects application code to place the data for the next |
1088 | KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. | 1078 | KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. |
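For instance, inside the exit-reason switch sketched earlier, byte-wide OUTs
to a serial-style debug port could be drained like this (port 0x3f8 is only
illustrative):

    case KVM_EXIT_IO:
            if (run->io.direction == KVM_EXIT_IO_OUT &&
                run->io.size == 1 && run->io.port == 0x3f8) {
                    /* 'count' items of 'size' bytes, packed at data_offset */
                    __u8 *data = (__u8 *)run + run->io.data_offset;
                    __u32 i;

                    for (i = 0; i < run->io.count; i++)
                            putchar(data[i]);
            }
            break;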
1089 | 1079 | ||
1090 | struct { | 1080 | struct { |
1091 | struct kvm_debug_exit_arch arch; | 1081 | struct kvm_debug_exit_arch arch; |
1092 | } debug; | 1082 | } debug; |
1093 | 1083 | ||
1094 | Unused. | 1084 | Unused. |
1095 | 1085 | ||
1096 | /* KVM_EXIT_MMIO */ | 1086 | /* KVM_EXIT_MMIO */ |
1097 | struct { | 1087 | struct { |
1098 | __u64 phys_addr; | 1088 | __u64 phys_addr; |
1099 | __u8 data[8]; | 1089 | __u8 data[8]; |
1100 | __u32 len; | 1090 | __u32 len; |
1101 | __u8 is_write; | 1091 | __u8 is_write; |
1102 | } mmio; | 1092 | } mmio; |
1103 | 1093 | ||
1104 | If exit_reason is KVM_EXIT_MMIO, then the vcpu has | 1094 | If exit_reason is KVM_EXIT_MMIO, then the vcpu has |
1105 | executed a memory-mapped I/O instruction which could not be satisfied | 1095 | executed a memory-mapped I/O instruction which could not be satisfied |
1106 | by kvm. The 'data' member contains the written data if 'is_write' is | 1096 | by kvm. The 'data' member contains the written data if 'is_write' is |
1107 | true, and should be filled by application code otherwise. | 1097 | true, and should be filled by application code otherwise. |
1108 | 1098 | ||
1109 | NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding | 1099 | NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding |
1110 | operations are complete (and guest state is consistent) only after userspace | 1100 | operations are complete (and guest state is consistent) only after userspace |
1111 | has re-entered the kernel with KVM_RUN. The kernel side will first finish | 1101 | has re-entered the kernel with KVM_RUN. The kernel side will first finish |
1112 | incomplete operations and then check for pending signals. Userspace | 1102 | incomplete operations and then check for pending signals. Userspace |
1113 | can re-enter the guest with an unmasked signal pending to complete | 1103 | can re-enter the guest with an unmasked signal pending to complete |
1114 | pending operations. | 1104 | pending operations. |
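A matching MMIO dispatch sketch for the same switch (device_read and
device_write are hypothetical helpers in the application's device model):

    case KVM_EXIT_MMIO:
            if (run->mmio.is_write)
                    device_write(run->mmio.phys_addr,
                                 run->mmio.data, run->mmio.len);
            else    /* fill data for the guest's load */
                    device_read(run->mmio.phys_addr,
                                run->mmio.data, run->mmio.len);
            /* per the note above, the access only completes on the
             * next KVM_RUN re-entry */
            break;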
1115 | 1105 | ||
1116 | /* KVM_EXIT_HYPERCALL */ | 1106 | /* KVM_EXIT_HYPERCALL */ |
1117 | struct { | 1107 | struct { |
1118 | __u64 nr; | 1108 | __u64 nr; |
1119 | __u64 args[6]; | 1109 | __u64 args[6]; |
1120 | __u64 ret; | 1110 | __u64 ret; |
1121 | __u32 longmode; | 1111 | __u32 longmode; |
1122 | __u32 pad; | 1112 | __u32 pad; |
1123 | } hypercall; | 1113 | } hypercall; |
1124 | 1114 | ||
1125 | Unused. This was once used for 'hypercall to userspace'. To implement | 1115 | Unused. This was once used for 'hypercall to userspace'. To implement |
1126 | such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). | 1116 | such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). |
1127 | Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. | 1117 | Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. |
1128 | 1118 | ||
1129 | /* KVM_EXIT_TPR_ACCESS */ | 1119 | /* KVM_EXIT_TPR_ACCESS */ |
1130 | struct { | 1120 | struct { |
1131 | __u64 rip; | 1121 | __u64 rip; |
1132 | __u32 is_write; | 1122 | __u32 is_write; |
1133 | __u32 pad; | 1123 | __u32 pad; |
1134 | } tpr_access; | 1124 | } tpr_access; |
1135 | 1125 | ||
1136 | To be documented (KVM_TPR_ACCESS_REPORTING). | 1126 | To be documented (KVM_TPR_ACCESS_REPORTING). |
1137 | 1127 | ||
1138 | /* KVM_EXIT_S390_SIEIC */ | 1128 | /* KVM_EXIT_S390_SIEIC */ |
1139 | struct { | 1129 | struct { |
1140 | __u8 icptcode; | 1130 | __u8 icptcode; |
1141 | __u64 mask; /* psw upper half */ | 1131 | __u64 mask; /* psw upper half */ |
1142 | __u64 addr; /* psw lower half */ | 1132 | __u64 addr; /* psw lower half */ |
1143 | __u16 ipa; | 1133 | __u16 ipa; |
1144 | __u32 ipb; | 1134 | __u32 ipb; |
1145 | } s390_sieic; | 1135 | } s390_sieic; |
1146 | 1136 | ||
1147 | s390 specific. | 1137 | s390 specific. |
1148 | 1138 | ||
1149 | /* KVM_EXIT_S390_RESET */ | 1139 | /* KVM_EXIT_S390_RESET */ |
1150 | #define KVM_S390_RESET_POR 1 | 1140 | #define KVM_S390_RESET_POR 1 |
1151 | #define KVM_S390_RESET_CLEAR 2 | 1141 | #define KVM_S390_RESET_CLEAR 2 |
1152 | #define KVM_S390_RESET_SUBSYSTEM 4 | 1142 | #define KVM_S390_RESET_SUBSYSTEM 4 |
1153 | #define KVM_S390_RESET_CPU_INIT 8 | 1143 | #define KVM_S390_RESET_CPU_INIT 8 |
1154 | #define KVM_S390_RESET_IPL 16 | 1144 | #define KVM_S390_RESET_IPL 16 |
1155 | __u64 s390_reset_flags; | 1145 | __u64 s390_reset_flags; |
1156 | 1146 | ||
1157 | s390 specific. | 1147 | s390 specific. |
1158 | 1148 | ||
1159 | /* KVM_EXIT_DCR */ | 1149 | /* KVM_EXIT_DCR */ |
1160 | struct { | 1150 | struct { |
1161 | __u32 dcrn; | 1151 | __u32 dcrn; |
1162 | __u32 data; | 1152 | __u32 data; |
1163 | __u8 is_write; | 1153 | __u8 is_write; |
1164 | } dcr; | 1154 | } dcr; |
1165 | 1155 | ||
1166 | powerpc specific. | 1156 | powerpc specific. |
1167 | 1157 | ||
1168 | /* KVM_EXIT_OSI */ | 1158 | /* KVM_EXIT_OSI */ |
1169 | struct { | 1159 | struct { |
1170 | __u64 gprs[32]; | 1160 | __u64 gprs[32]; |
1171 | } osi; | 1161 | } osi; |
1172 | 1162 | ||
1173 | MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch | 1163 | MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch |
1174 | hypercalls and exit with this exit struct that contains all the guest gprs. | 1164 | hypercalls and exit with this exit struct that contains all the guest gprs. |
1175 | 1165 | ||
1176 | If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall. | 1166 | If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall. |
1177 | Userspace can now handle the hypercall and, when it's done, modify the gprs as | 1167 | Userspace can now handle the hypercall and, when it's done, modify the gprs as |
1178 | necessary. Upon guest entry all guest GPRs will then be replaced by the values | 1168 | necessary. Upon guest entry all guest GPRs will then be replaced by the values |
1179 | in this struct. | 1169 | in this struct. |
1180 | 1170 | ||
1181 | /* Fix the size of the union. */ | 1171 | /* Fix the size of the union. */ |
1182 | char padding[256]; | 1172 | char padding[256]; |
1183 | }; | 1173 | }; |
1184 | }; | 1174 | }; |
1185 | 1175 |
arch/ia64/kvm/kvm-ia64.c
1 | /* | 1 | /* |
2 | * kvm_ia64.c: Basic KVM support on Itanium series processors | 2 | * kvm_ia64.c: Basic KVM support on Itanium series processors |
3 | * | 3 | * |
4 | * | 4 | * |
5 | * Copyright (C) 2007, Intel Corporation. | 5 | * Copyright (C) 2007, Intel Corporation. |
6 | * Xiantao Zhang (xiantao.zhang@intel.com) | 6 | * Xiantao Zhang (xiantao.zhang@intel.com) |
7 | * | 7 | * |
8 | * This program is free software; you can redistribute it and/or modify it | 8 | * This program is free software; you can redistribute it and/or modify it |
9 | * under the terms and conditions of the GNU General Public License, | 9 | * under the terms and conditions of the GNU General Public License, |
10 | * version 2, as published by the Free Software Foundation. | 10 | * version 2, as published by the Free Software Foundation. |
11 | * | 11 | * |
12 | * This program is distributed in the hope it will be useful, but WITHOUT | 12 | * This program is distributed in the hope it will be useful, but WITHOUT |
13 | * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | 13 | * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
14 | * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for | 14 | * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for |
15 | * more details. | 15 | * more details. |
16 | * | 16 | * |
17 | * You should have received a copy of the GNU General Public License along with | 17 | * You should have received a copy of the GNU General Public License along with |
18 | * this program; if not, write to the Free Software Foundation, Inc., 59 Temple | 18 | * this program; if not, write to the Free Software Foundation, Inc., 59 Temple |
19 | * Place - Suite 330, Boston, MA 02111-1307 USA. | 19 | * Place - Suite 330, Boston, MA 02111-1307 USA. |
20 | * | 20 | * |
21 | */ | 21 | */ |
22 | 22 | ||
23 | #include <linux/module.h> | 23 | #include <linux/module.h> |
24 | #include <linux/errno.h> | 24 | #include <linux/errno.h> |
25 | #include <linux/percpu.h> | 25 | #include <linux/percpu.h> |
26 | #include <linux/fs.h> | 26 | #include <linux/fs.h> |
27 | #include <linux/slab.h> | 27 | #include <linux/slab.h> |
28 | #include <linux/smp.h> | 28 | #include <linux/smp.h> |
29 | #include <linux/kvm_host.h> | 29 | #include <linux/kvm_host.h> |
30 | #include <linux/kvm.h> | 30 | #include <linux/kvm.h> |
31 | #include <linux/bitops.h> | 31 | #include <linux/bitops.h> |
32 | #include <linux/hrtimer.h> | 32 | #include <linux/hrtimer.h> |
33 | #include <linux/uaccess.h> | 33 | #include <linux/uaccess.h> |
34 | #include <linux/iommu.h> | 34 | #include <linux/iommu.h> |
35 | #include <linux/intel-iommu.h> | 35 | #include <linux/intel-iommu.h> |
36 | 36 | ||
37 | #include <asm/pgtable.h> | 37 | #include <asm/pgtable.h> |
38 | #include <asm/gcc_intrin.h> | 38 | #include <asm/gcc_intrin.h> |
39 | #include <asm/pal.h> | 39 | #include <asm/pal.h> |
40 | #include <asm/cacheflush.h> | 40 | #include <asm/cacheflush.h> |
41 | #include <asm/div64.h> | 41 | #include <asm/div64.h> |
42 | #include <asm/tlb.h> | 42 | #include <asm/tlb.h> |
43 | #include <asm/elf.h> | 43 | #include <asm/elf.h> |
44 | #include <asm/sn/addrs.h> | 44 | #include <asm/sn/addrs.h> |
45 | #include <asm/sn/clksupport.h> | 45 | #include <asm/sn/clksupport.h> |
46 | #include <asm/sn/shub_mmr.h> | 46 | #include <asm/sn/shub_mmr.h> |
47 | 47 | ||
48 | #include "misc.h" | 48 | #include "misc.h" |
49 | #include "vti.h" | 49 | #include "vti.h" |
50 | #include "iodev.h" | 50 | #include "iodev.h" |
51 | #include "ioapic.h" | 51 | #include "ioapic.h" |
52 | #include "lapic.h" | 52 | #include "lapic.h" |
53 | #include "irq.h" | 53 | #include "irq.h" |
54 | 54 | ||
55 | static unsigned long kvm_vmm_base; | 55 | static unsigned long kvm_vmm_base; |
56 | static unsigned long kvm_vsa_base; | 56 | static unsigned long kvm_vsa_base; |
57 | static unsigned long kvm_vm_buffer; | 57 | static unsigned long kvm_vm_buffer; |
58 | static unsigned long kvm_vm_buffer_size; | 58 | static unsigned long kvm_vm_buffer_size; |
59 | unsigned long kvm_vmm_gp; | 59 | unsigned long kvm_vmm_gp; |
60 | 60 | ||
61 | static long vp_env_info; | 61 | static long vp_env_info; |
62 | 62 | ||
63 | static struct kvm_vmm_info *kvm_vmm_info; | 63 | static struct kvm_vmm_info *kvm_vmm_info; |
64 | 64 | ||
65 | static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu); | 65 | static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu); |
66 | 66 | ||
67 | struct kvm_stats_debugfs_item debugfs_entries[] = { | 67 | struct kvm_stats_debugfs_item debugfs_entries[] = { |
68 | { NULL } | 68 | { NULL } |
69 | }; | 69 | }; |
70 | 70 | ||
71 | static unsigned long kvm_get_itc(struct kvm_vcpu *vcpu) | 71 | static unsigned long kvm_get_itc(struct kvm_vcpu *vcpu) |
72 | { | 72 | { |
73 | #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) | 73 | #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) |
74 | if (vcpu->kvm->arch.is_sn2) | 74 | if (vcpu->kvm->arch.is_sn2) |
75 | return rtc_time(); | 75 | return rtc_time(); |
76 | else | 76 | else |
77 | #endif | 77 | #endif |
78 | return ia64_getreg(_IA64_REG_AR_ITC); | 78 | return ia64_getreg(_IA64_REG_AR_ITC); |
79 | } | 79 | } |
80 | 80 | ||
81 | static void kvm_flush_icache(unsigned long start, unsigned long len) | 81 | static void kvm_flush_icache(unsigned long start, unsigned long len) |
82 | { | 82 | { |
83 | int l; | 83 | int l; |
84 | 84 | ||
85 | for (l = 0; l < (len + 32); l += 32) | 85 | for (l = 0; l < (len + 32); l += 32) |
86 | ia64_fc((void *)(start + l)); | 86 | ia64_fc((void *)(start + l)); |
87 | 87 | ||
88 | ia64_sync_i(); | 88 | ia64_sync_i(); |
89 | ia64_srlz_i(); | 89 | ia64_srlz_i(); |
90 | } | 90 | } |
91 | 91 | ||
92 | static void kvm_flush_tlb_all(void) | 92 | static void kvm_flush_tlb_all(void) |
93 | { | 93 | { |
94 | unsigned long i, j, count0, count1, stride0, stride1, addr; | 94 | unsigned long i, j, count0, count1, stride0, stride1, addr; |
95 | long flags; | 95 | long flags; |
96 | 96 | ||
97 | addr = local_cpu_data->ptce_base; | 97 | addr = local_cpu_data->ptce_base; |
98 | count0 = local_cpu_data->ptce_count[0]; | 98 | count0 = local_cpu_data->ptce_count[0]; |
99 | count1 = local_cpu_data->ptce_count[1]; | 99 | count1 = local_cpu_data->ptce_count[1]; |
100 | stride0 = local_cpu_data->ptce_stride[0]; | 100 | stride0 = local_cpu_data->ptce_stride[0]; |
101 | stride1 = local_cpu_data->ptce_stride[1]; | 101 | stride1 = local_cpu_data->ptce_stride[1]; |
102 | 102 | ||
103 | local_irq_save(flags); | 103 | local_irq_save(flags); |
104 | for (i = 0; i < count0; ++i) { | 104 | for (i = 0; i < count0; ++i) { |
105 | for (j = 0; j < count1; ++j) { | 105 | for (j = 0; j < count1; ++j) { |
106 | ia64_ptce(addr); | 106 | ia64_ptce(addr); |
107 | addr += stride1; | 107 | addr += stride1; |
108 | } | 108 | } |
109 | addr += stride0; | 109 | addr += stride0; |
110 | } | 110 | } |
111 | local_irq_restore(flags); | 111 | local_irq_restore(flags); |
112 | ia64_srlz_i(); /* srlz.i implies srlz.d */ | 112 | ia64_srlz_i(); /* srlz.i implies srlz.d */ |
113 | } | 113 | } |
114 | 114 | ||
115 | long ia64_pal_vp_create(u64 *vpd, u64 *host_iva, u64 *opt_handler) | 115 | long ia64_pal_vp_create(u64 *vpd, u64 *host_iva, u64 *opt_handler) |
116 | { | 116 | { |
117 | struct ia64_pal_retval iprv; | 117 | struct ia64_pal_retval iprv; |
118 | 118 | ||
119 | PAL_CALL_STK(iprv, PAL_VP_CREATE, (u64)vpd, (u64)host_iva, | 119 | PAL_CALL_STK(iprv, PAL_VP_CREATE, (u64)vpd, (u64)host_iva, |
120 | (u64)opt_handler); | 120 | (u64)opt_handler); |
121 | 121 | ||
122 | return iprv.status; | 122 | return iprv.status; |
123 | } | 123 | } |
124 | 124 | ||
125 | static DEFINE_SPINLOCK(vp_lock); | 125 | static DEFINE_SPINLOCK(vp_lock); |
126 | 126 | ||
127 | int kvm_arch_hardware_enable(void *garbage) | 127 | int kvm_arch_hardware_enable(void *garbage) |
128 | { | 128 | { |
129 | long status; | 129 | long status; |
130 | long tmp_base; | 130 | long tmp_base; |
131 | unsigned long pte; | 131 | unsigned long pte; |
132 | unsigned long saved_psr; | 132 | unsigned long saved_psr; |
133 | int slot; | 133 | int slot; |
134 | 134 | ||
135 | pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL)); | 135 | pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL)); |
136 | local_irq_save(saved_psr); | 136 | local_irq_save(saved_psr); |
137 | slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); | 137 | slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); |
138 | local_irq_restore(saved_psr); | 138 | local_irq_restore(saved_psr); |
139 | if (slot < 0) | 139 | if (slot < 0) |
140 | return -EINVAL; | 140 | return -EINVAL; |
141 | 141 | ||
142 | spin_lock(&vp_lock); | 142 | spin_lock(&vp_lock); |
143 | status = ia64_pal_vp_init_env(kvm_vsa_base ? | 143 | status = ia64_pal_vp_init_env(kvm_vsa_base ? |
144 | VP_INIT_ENV : VP_INIT_ENV_INITALIZE, | 144 | VP_INIT_ENV : VP_INIT_ENV_INITALIZE, |
145 | __pa(kvm_vm_buffer), KVM_VM_BUFFER_BASE, &tmp_base); | 145 | __pa(kvm_vm_buffer), KVM_VM_BUFFER_BASE, &tmp_base); |
146 | if (status != 0) { | 146 | if (status != 0) { |
147 | spin_unlock(&vp_lock); | 147 | spin_unlock(&vp_lock); |
148 | printk(KERN_WARNING"kvm: Failed to Enable VT Support!!!!\n"); | 148 | printk(KERN_WARNING"kvm: Failed to Enable VT Support!!!!\n"); |
149 | return -EINVAL; | 149 | return -EINVAL; |
150 | } | 150 | } |
151 | 151 | ||
152 | if (!kvm_vsa_base) { | 152 | if (!kvm_vsa_base) { |
153 | kvm_vsa_base = tmp_base; | 153 | kvm_vsa_base = tmp_base; |
154 | printk(KERN_INFO"kvm: kvm_vsa_base:0x%lx\n", kvm_vsa_base); | 154 | printk(KERN_INFO"kvm: kvm_vsa_base:0x%lx\n", kvm_vsa_base); |
155 | } | 155 | } |
156 | spin_unlock(&vp_lock); | 156 | spin_unlock(&vp_lock); |
157 | ia64_ptr_entry(0x3, slot); | 157 | ia64_ptr_entry(0x3, slot); |
158 | 158 | ||
159 | return 0; | 159 | return 0; |
160 | } | 160 | } |
161 | 161 | ||
162 | void kvm_arch_hardware_disable(void *garbage) | 162 | void kvm_arch_hardware_disable(void *garbage) |
163 | { | 163 | { |
164 | 164 | ||
165 | long status; | 165 | long status; |
166 | int slot; | 166 | int slot; |
167 | unsigned long pte; | 167 | unsigned long pte; |
168 | unsigned long saved_psr; | 168 | unsigned long saved_psr; |
169 | unsigned long host_iva = ia64_getreg(_IA64_REG_CR_IVA); | 169 | unsigned long host_iva = ia64_getreg(_IA64_REG_CR_IVA); |
170 | 170 | ||
171 | pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), | 171 | pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), |
172 | PAGE_KERNEL)); | 172 | PAGE_KERNEL)); |
173 | 173 | ||
174 | local_irq_save(saved_psr); | 174 | local_irq_save(saved_psr); |
175 | slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); | 175 | slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); |
176 | local_irq_restore(saved_psr); | 176 | local_irq_restore(saved_psr); |
177 | if (slot < 0) | 177 | if (slot < 0) |
178 | return; | 178 | return; |
179 | 179 | ||
180 | status = ia64_pal_vp_exit_env(host_iva); | 180 | status = ia64_pal_vp_exit_env(host_iva); |
181 | if (status) | 181 | if (status) |
182 | printk(KERN_DEBUG"kvm: Failed to disable VT support! :%ld\n", | 182 | printk(KERN_DEBUG"kvm: Failed to disable VT support! :%ld\n", |
183 | status); | 183 | status); |
184 | ia64_ptr_entry(0x3, slot); | 184 | ia64_ptr_entry(0x3, slot); |
185 | } | 185 | } |
186 | 186 | ||
187 | void kvm_arch_check_processor_compat(void *rtn) | 187 | void kvm_arch_check_processor_compat(void *rtn) |
188 | { | 188 | { |
189 | *(int *)rtn = 0; | 189 | *(int *)rtn = 0; |
190 | } | 190 | } |
191 | 191 | ||
192 | int kvm_dev_ioctl_check_extension(long ext) | 192 | int kvm_dev_ioctl_check_extension(long ext) |
193 | { | 193 | { |
194 | 194 | ||
195 | int r; | 195 | int r; |
196 | 196 | ||
197 | switch (ext) { | 197 | switch (ext) { |
198 | case KVM_CAP_IRQCHIP: | 198 | case KVM_CAP_IRQCHIP: |
199 | case KVM_CAP_MP_STATE: | 199 | case KVM_CAP_MP_STATE: |
200 | case KVM_CAP_IRQ_INJECT_STATUS: | 200 | case KVM_CAP_IRQ_INJECT_STATUS: |
201 | r = 1; | 201 | r = 1; |
202 | break; | 202 | break; |
203 | case KVM_CAP_COALESCED_MMIO: | 203 | case KVM_CAP_COALESCED_MMIO: |
204 | r = KVM_COALESCED_MMIO_PAGE_OFFSET; | 204 | r = KVM_COALESCED_MMIO_PAGE_OFFSET; |
205 | break; | 205 | break; |
206 | case KVM_CAP_IOMMU: | 206 | case KVM_CAP_IOMMU: |
207 | r = iommu_found(); | 207 | r = iommu_found(); |
208 | break; | 208 | break; |
209 | default: | 209 | default: |
210 | r = 0; | 210 | r = 0; |
211 | } | 211 | } |
212 | return r; | 212 | return r; |
213 | 213 | ||
214 | } | 214 | } |
215 | 215 | ||
216 | static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 216 | static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
217 | { | 217 | { |
218 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; | 218 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; |
219 | kvm_run->hw.hardware_exit_reason = 1; | 219 | kvm_run->hw.hardware_exit_reason = 1; |
220 | return 0; | 220 | return 0; |
221 | } | 221 | } |
222 | 222 | ||
223 | static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 223 | static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
224 | { | 224 | { |
225 | struct kvm_mmio_req *p; | 225 | struct kvm_mmio_req *p; |
226 | struct kvm_io_device *mmio_dev; | 226 | struct kvm_io_device *mmio_dev; |
227 | int r; | 227 | int r; |
228 | 228 | ||
229 | p = kvm_get_vcpu_ioreq(vcpu); | 229 | p = kvm_get_vcpu_ioreq(vcpu); |
230 | 230 | ||
231 | if ((p->addr & PAGE_MASK) == IOAPIC_DEFAULT_BASE_ADDRESS) | 231 | if ((p->addr & PAGE_MASK) == IOAPIC_DEFAULT_BASE_ADDRESS) |
232 | goto mmio; | 232 | goto mmio; |
233 | vcpu->mmio_needed = 1; | 233 | vcpu->mmio_needed = 1; |
234 | vcpu->mmio_phys_addr = kvm_run->mmio.phys_addr = p->addr; | 234 | vcpu->mmio_phys_addr = kvm_run->mmio.phys_addr = p->addr; |
235 | vcpu->mmio_size = kvm_run->mmio.len = p->size; | 235 | vcpu->mmio_size = kvm_run->mmio.len = p->size; |
236 | vcpu->mmio_is_write = kvm_run->mmio.is_write = !p->dir; | 236 | vcpu->mmio_is_write = kvm_run->mmio.is_write = !p->dir; |
237 | 237 | ||
238 | if (vcpu->mmio_is_write) | 238 | if (vcpu->mmio_is_write) |
239 | memcpy(vcpu->mmio_data, &p->data, p->size); | 239 | memcpy(vcpu->mmio_data, &p->data, p->size); |
240 | memcpy(kvm_run->mmio.data, &p->data, p->size); | 240 | memcpy(kvm_run->mmio.data, &p->data, p->size); |
241 | kvm_run->exit_reason = KVM_EXIT_MMIO; | 241 | kvm_run->exit_reason = KVM_EXIT_MMIO; |
242 | return 0; | 242 | return 0; |
243 | mmio: | 243 | mmio: |
244 | if (p->dir) | 244 | if (p->dir) |
245 | r = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, p->addr, | 245 | r = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, p->addr, |
246 | p->size, &p->data); | 246 | p->size, &p->data); |
247 | else | 247 | else |
248 | r = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, p->addr, | 248 | r = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, p->addr, |
249 | p->size, &p->data); | 249 | p->size, &p->data); |
250 | if (r) | 250 | if (r) |
251 | printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr); | 251 | printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr); |
252 | p->state = STATE_IORESP_READY; | 252 | p->state = STATE_IORESP_READY; |
253 | 253 | ||
254 | return 1; | 254 | return 1; |
255 | } | 255 | } |
256 | 256 | ||
257 | static int handle_pal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 257 | static int handle_pal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
258 | { | 258 | { |
259 | struct exit_ctl_data *p; | 259 | struct exit_ctl_data *p; |
260 | 260 | ||
261 | p = kvm_get_exit_data(vcpu); | 261 | p = kvm_get_exit_data(vcpu); |
262 | 262 | ||
263 | if (p->exit_reason == EXIT_REASON_PAL_CALL) | 263 | if (p->exit_reason == EXIT_REASON_PAL_CALL) |
264 | return kvm_pal_emul(vcpu, kvm_run); | 264 | return kvm_pal_emul(vcpu, kvm_run); |
265 | else { | 265 | else { |
266 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; | 266 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; |
267 | kvm_run->hw.hardware_exit_reason = 2; | 267 | kvm_run->hw.hardware_exit_reason = 2; |
268 | return 0; | 268 | return 0; |
269 | } | 269 | } |
270 | } | 270 | } |
271 | 271 | ||
272 | static int handle_sal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 272 | static int handle_sal_call(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
273 | { | 273 | { |
274 | struct exit_ctl_data *p; | 274 | struct exit_ctl_data *p; |
275 | 275 | ||
276 | p = kvm_get_exit_data(vcpu); | 276 | p = kvm_get_exit_data(vcpu); |
277 | 277 | ||
278 | if (p->exit_reason == EXIT_REASON_SAL_CALL) { | 278 | if (p->exit_reason == EXIT_REASON_SAL_CALL) { |
279 | kvm_sal_emul(vcpu); | 279 | kvm_sal_emul(vcpu); |
280 | return 1; | 280 | return 1; |
281 | } else { | 281 | } else { |
282 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; | 282 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; |
283 | kvm_run->hw.hardware_exit_reason = 3; | 283 | kvm_run->hw.hardware_exit_reason = 3; |
284 | return 0; | 284 | return 0; |
285 | } | 285 | } |
286 | 286 | ||
287 | } | 287 | } |
288 | 288 | ||
289 | static int __apic_accept_irq(struct kvm_vcpu *vcpu, uint64_t vector) | 289 | static int __apic_accept_irq(struct kvm_vcpu *vcpu, uint64_t vector) |
290 | { | 290 | { |
291 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); | 291 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); |
292 | 292 | ||
293 | if (!test_and_set_bit(vector, &vpd->irr[0])) { | 293 | if (!test_and_set_bit(vector, &vpd->irr[0])) { |
294 | vcpu->arch.irq_new_pending = 1; | 294 | vcpu->arch.irq_new_pending = 1; |
295 | kvm_vcpu_kick(vcpu); | 295 | kvm_vcpu_kick(vcpu); |
296 | return 1; | 296 | return 1; |
297 | } | 297 | } |
298 | return 0; | 298 | return 0; |
299 | } | 299 | } |
300 | 300 | ||
301 | /* | 301 | /* |
302 | * offset: address offset to IPI space. | 302 | * offset: address offset to IPI space. |
303 | * value: deliver value. | 303 | * value: deliver value. |
304 | */ | 304 | */ |
305 | static void vcpu_deliver_ipi(struct kvm_vcpu *vcpu, uint64_t dm, | 305 | static void vcpu_deliver_ipi(struct kvm_vcpu *vcpu, uint64_t dm, |
306 | uint64_t vector) | 306 | uint64_t vector) |
307 | { | 307 | { |
308 | switch (dm) { | 308 | switch (dm) { |
309 | case SAPIC_FIXED: | 309 | case SAPIC_FIXED: |
310 | break; | 310 | break; |
311 | case SAPIC_NMI: | 311 | case SAPIC_NMI: |
312 | vector = 2; | 312 | vector = 2; |
313 | break; | 313 | break; |
314 | case SAPIC_EXTINT: | 314 | case SAPIC_EXTINT: |
315 | vector = 0; | 315 | vector = 0; |
316 | break; | 316 | break; |
317 | case SAPIC_INIT: | 317 | case SAPIC_INIT: |
318 | case SAPIC_PMI: | 318 | case SAPIC_PMI: |
319 | default: | 319 | default: |
320 | printk(KERN_ERR"kvm: Unimplemented Deliver reserved IPI!\n"); | 320 | printk(KERN_ERR"kvm: Unimplemented Deliver reserved IPI!\n"); |
321 | return; | 321 | return; |
322 | } | 322 | } |
323 | __apic_accept_irq(vcpu, vector); | 323 | __apic_accept_irq(vcpu, vector); |
324 | } | 324 | } |
325 | 325 | ||
326 | static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, unsigned long id, | 326 | static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, unsigned long id, |
327 | unsigned long eid) | 327 | unsigned long eid) |
328 | { | 328 | { |
329 | union ia64_lid lid; | 329 | union ia64_lid lid; |
330 | int i; | 330 | int i; |
331 | struct kvm_vcpu *vcpu; | 331 | struct kvm_vcpu *vcpu; |
332 | 332 | ||
333 | kvm_for_each_vcpu(i, vcpu, kvm) { | 333 | kvm_for_each_vcpu(i, vcpu, kvm) { |
334 | lid.val = VCPU_LID(vcpu); | 334 | lid.val = VCPU_LID(vcpu); |
335 | if (lid.id == id && lid.eid == eid) | 335 | if (lid.id == id && lid.eid == eid) |
336 | return vcpu; | 336 | return vcpu; |
337 | } | 337 | } |
338 | 338 | ||
339 | return NULL; | 339 | return NULL; |
340 | } | 340 | } |
341 | 341 | ||
342 | static int handle_ipi(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 342 | static int handle_ipi(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
343 | { | 343 | { |
344 | struct exit_ctl_data *p = kvm_get_exit_data(vcpu); | 344 | struct exit_ctl_data *p = kvm_get_exit_data(vcpu); |
345 | struct kvm_vcpu *target_vcpu; | 345 | struct kvm_vcpu *target_vcpu; |
346 | struct kvm_pt_regs *regs; | 346 | struct kvm_pt_regs *regs; |
347 | union ia64_ipi_a addr = p->u.ipi_data.addr; | 347 | union ia64_ipi_a addr = p->u.ipi_data.addr; |
348 | union ia64_ipi_d data = p->u.ipi_data.data; | 348 | union ia64_ipi_d data = p->u.ipi_data.data; |
349 | 349 | ||
350 | target_vcpu = lid_to_vcpu(vcpu->kvm, addr.id, addr.eid); | 350 | target_vcpu = lid_to_vcpu(vcpu->kvm, addr.id, addr.eid); |
351 | if (!target_vcpu) | 351 | if (!target_vcpu) |
352 | return handle_vm_error(vcpu, kvm_run); | 352 | return handle_vm_error(vcpu, kvm_run); |
353 | 353 | ||
354 | if (!target_vcpu->arch.launched) { | 354 | if (!target_vcpu->arch.launched) { |
355 | regs = vcpu_regs(target_vcpu); | 355 | regs = vcpu_regs(target_vcpu); |
356 | 356 | ||
357 | regs->cr_iip = vcpu->kvm->arch.rdv_sal_data.boot_ip; | 357 | regs->cr_iip = vcpu->kvm->arch.rdv_sal_data.boot_ip; |
358 | regs->r1 = vcpu->kvm->arch.rdv_sal_data.boot_gp; | 358 | regs->r1 = vcpu->kvm->arch.rdv_sal_data.boot_gp; |
359 | 359 | ||
360 | target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; | 360 | target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; |
361 | if (waitqueue_active(&target_vcpu->wq)) | 361 | if (waitqueue_active(&target_vcpu->wq)) |
362 | wake_up_interruptible(&target_vcpu->wq); | 362 | wake_up_interruptible(&target_vcpu->wq); |
363 | } else { | 363 | } else { |
364 | vcpu_deliver_ipi(target_vcpu, data.dm, data.vector); | 364 | vcpu_deliver_ipi(target_vcpu, data.dm, data.vector); |
365 | if (target_vcpu != vcpu) | 365 | if (target_vcpu != vcpu) |
366 | kvm_vcpu_kick(target_vcpu); | 366 | kvm_vcpu_kick(target_vcpu); |
367 | } | 367 | } |
368 | 368 | ||
369 | return 1; | 369 | return 1; |
370 | } | 370 | } |
371 | 371 | ||
372 | struct call_data { | 372 | struct call_data { |
373 | struct kvm_ptc_g ptc_g_data; | 373 | struct kvm_ptc_g ptc_g_data; |
374 | struct kvm_vcpu *vcpu; | 374 | struct kvm_vcpu *vcpu; |
375 | }; | 375 | }; |
376 | 376 | ||
377 | static void vcpu_global_purge(void *info) | 377 | static void vcpu_global_purge(void *info) |
378 | { | 378 | { |
379 | struct call_data *p = (struct call_data *)info; | 379 | struct call_data *p = (struct call_data *)info; |
380 | struct kvm_vcpu *vcpu = p->vcpu; | 380 | struct kvm_vcpu *vcpu = p->vcpu; |
381 | 381 | ||
382 | if (test_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests)) | 382 | if (test_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests)) |
383 | return; | 383 | return; |
384 | 384 | ||
385 | set_bit(KVM_REQ_PTC_G, &vcpu->requests); | 385 | set_bit(KVM_REQ_PTC_G, &vcpu->requests); |
386 | if (vcpu->arch.ptc_g_count < MAX_PTC_G_NUM) { | 386 | if (vcpu->arch.ptc_g_count < MAX_PTC_G_NUM) { |
387 | vcpu->arch.ptc_g_data[vcpu->arch.ptc_g_count++] = | 387 | vcpu->arch.ptc_g_data[vcpu->arch.ptc_g_count++] = |
388 | p->ptc_g_data; | 388 | p->ptc_g_data; |
389 | } else { | 389 | } else { |
390 | clear_bit(KVM_REQ_PTC_G, &vcpu->requests); | 390 | clear_bit(KVM_REQ_PTC_G, &vcpu->requests); |
391 | vcpu->arch.ptc_g_count = 0; | 391 | vcpu->arch.ptc_g_count = 0; |
392 | set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests); | 392 | set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests); |
393 | } | 393 | } |
394 | } | 394 | } |
395 | 395 | ||
396 | static int handle_global_purge(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 396 | static int handle_global_purge(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
397 | { | 397 | { |
398 | struct exit_ctl_data *p = kvm_get_exit_data(vcpu); | 398 | struct exit_ctl_data *p = kvm_get_exit_data(vcpu); |
399 | struct kvm *kvm = vcpu->kvm; | 399 | struct kvm *kvm = vcpu->kvm; |
400 | struct call_data call_data; | 400 | struct call_data call_data; |
401 | int i; | 401 | int i; |
402 | struct kvm_vcpu *vcpui; | 402 | struct kvm_vcpu *vcpui; |
403 | 403 | ||
404 | call_data.ptc_g_data = p->u.ptc_g_data; | 404 | call_data.ptc_g_data = p->u.ptc_g_data; |
405 | 405 | ||
406 | kvm_for_each_vcpu(i, vcpui, kvm) { | 406 | kvm_for_each_vcpu(i, vcpui, kvm) { |
407 | if (vcpui->arch.mp_state == KVM_MP_STATE_UNINITIALIZED || | 407 | if (vcpui->arch.mp_state == KVM_MP_STATE_UNINITIALIZED || |
408 | vcpu == vcpui) | 408 | vcpu == vcpui) |
409 | continue; | 409 | continue; |
410 | 410 | ||
411 | if (waitqueue_active(&vcpui->wq)) | 411 | if (waitqueue_active(&vcpui->wq)) |
412 | wake_up_interruptible(&vcpui->wq); | 412 | wake_up_interruptible(&vcpui->wq); |
413 | 413 | ||
414 | if (vcpui->cpu != -1) { | 414 | if (vcpui->cpu != -1) { |
415 | call_data.vcpu = vcpui; | 415 | call_data.vcpu = vcpui; |
416 | smp_call_function_single(vcpui->cpu, | 416 | smp_call_function_single(vcpui->cpu, |
417 | vcpu_global_purge, &call_data, 1); | 417 | vcpu_global_purge, &call_data, 1); |
418 | } else | 418 | } else |
419 | printk(KERN_WARNING"kvm: Uninit vcpu received ipi!\n"); | 419 | printk(KERN_WARNING"kvm: Uninit vcpu received ipi!\n"); |
420 | 420 | ||
421 | } | 421 | } |
422 | return 1; | 422 | return 1; |
423 | } | 423 | } |
424 | 424 | ||
425 | static int handle_switch_rr6(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 425 | static int handle_switch_rr6(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
426 | { | 426 | { |
427 | return 1; | 427 | return 1; |
428 | } | 428 | } |
429 | 429 | ||
430 | static int kvm_sn2_setup_mappings(struct kvm_vcpu *vcpu) | 430 | static int kvm_sn2_setup_mappings(struct kvm_vcpu *vcpu) |
431 | { | 431 | { |
432 | unsigned long pte, rtc_phys_addr, map_addr; | 432 | unsigned long pte, rtc_phys_addr, map_addr; |
433 | int slot; | 433 | int slot; |
434 | 434 | ||
435 | map_addr = KVM_VMM_BASE + (1UL << KVM_VMM_SHIFT); | 435 | map_addr = KVM_VMM_BASE + (1UL << KVM_VMM_SHIFT); |
436 | rtc_phys_addr = LOCAL_MMR_OFFSET | SH_RTC; | 436 | rtc_phys_addr = LOCAL_MMR_OFFSET | SH_RTC; |
437 | pte = pte_val(mk_pte_phys(rtc_phys_addr, PAGE_KERNEL_UC)); | 437 | pte = pte_val(mk_pte_phys(rtc_phys_addr, PAGE_KERNEL_UC)); |
438 | slot = ia64_itr_entry(0x3, map_addr, pte, PAGE_SHIFT); | 438 | slot = ia64_itr_entry(0x3, map_addr, pte, PAGE_SHIFT); |
439 | vcpu->arch.sn_rtc_tr_slot = slot; | 439 | vcpu->arch.sn_rtc_tr_slot = slot; |
440 | if (slot < 0) { | 440 | if (slot < 0) { |
441 | printk(KERN_ERR "Mayday mayday! RTC mapping failed!\n"); | 441 | printk(KERN_ERR "Mayday mayday! RTC mapping failed!\n"); |
442 | slot = 0; | 442 | slot = 0; |
443 | } | 443 | } |
444 | return slot; | 444 | return slot; |
445 | } | 445 | } |
446 | 446 | ||
447 | int kvm_emulate_halt(struct kvm_vcpu *vcpu) | 447 | int kvm_emulate_halt(struct kvm_vcpu *vcpu) |
448 | { | 448 | { |
449 | 449 | ||
450 | ktime_t kt; | 450 | ktime_t kt; |
451 | long itc_diff; | 451 | long itc_diff; |
452 | unsigned long vcpu_now_itc; | 452 | unsigned long vcpu_now_itc; |
453 | unsigned long expires; | 453 | unsigned long expires; |
454 | struct hrtimer *p_ht = &vcpu->arch.hlt_timer; | 454 | struct hrtimer *p_ht = &vcpu->arch.hlt_timer; |
455 | unsigned long cyc_per_usec = local_cpu_data->cyc_per_usec; | 455 | unsigned long cyc_per_usec = local_cpu_data->cyc_per_usec; |
456 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); | 456 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); |
457 | 457 | ||
458 | if (irqchip_in_kernel(vcpu->kvm)) { | 458 | if (irqchip_in_kernel(vcpu->kvm)) { |
459 | 459 | ||
460 | vcpu_now_itc = kvm_get_itc(vcpu) + vcpu->arch.itc_offset; | 460 | vcpu_now_itc = kvm_get_itc(vcpu) + vcpu->arch.itc_offset; |
461 | 461 | ||
462 | if (time_after(vcpu_now_itc, vpd->itm)) { | 462 | if (time_after(vcpu_now_itc, vpd->itm)) { |
463 | vcpu->arch.timer_check = 1; | 463 | vcpu->arch.timer_check = 1; |
464 | return 1; | 464 | return 1; |
465 | } | 465 | } |
466 | itc_diff = vpd->itm - vcpu_now_itc; | 466 | itc_diff = vpd->itm - vcpu_now_itc; |
467 | if (itc_diff < 0) | 467 | if (itc_diff < 0) |
468 | itc_diff = -itc_diff; | 468 | itc_diff = -itc_diff; |
469 | 469 | ||
470 | expires = div64_u64(itc_diff, cyc_per_usec); | 470 | expires = div64_u64(itc_diff, cyc_per_usec); |
471 | kt = ktime_set(0, 1000 * expires); | 471 | kt = ktime_set(0, 1000 * expires); |
472 | 472 | ||
473 | vcpu->arch.ht_active = 1; | 473 | vcpu->arch.ht_active = 1; |
474 | hrtimer_start(p_ht, kt, HRTIMER_MODE_ABS); | 474 | hrtimer_start(p_ht, kt, HRTIMER_MODE_ABS); |
475 | 475 | ||
476 | vcpu->arch.mp_state = KVM_MP_STATE_HALTED; | 476 | vcpu->arch.mp_state = KVM_MP_STATE_HALTED; |
477 | kvm_vcpu_block(vcpu); | 477 | kvm_vcpu_block(vcpu); |
478 | hrtimer_cancel(p_ht); | 478 | hrtimer_cancel(p_ht); |
479 | vcpu->arch.ht_active = 0; | 479 | vcpu->arch.ht_active = 0; |
480 | 480 | ||
481 | if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests) || | 481 | if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests) || |
482 | kvm_cpu_has_pending_timer(vcpu)) | 482 | kvm_cpu_has_pending_timer(vcpu)) |
483 | if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) | 483 | if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) |
484 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; | 484 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; |
485 | 485 | ||
486 | if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) | 486 | if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) |
487 | return -EINTR; | 487 | return -EINTR; |
488 | return 1; | 488 | return 1; |
489 | } else { | 489 | } else { |
490 | printk(KERN_ERR"kvm: Unsupported userspace halt!"); | 490 | printk(KERN_ERR"kvm: Unsupported userspace halt!"); |
491 | return 0; | 491 | return 0; |
492 | } | 492 | } |
493 | } | 493 | } |
494 | 494 | ||
495 | static int handle_vm_shutdown(struct kvm_vcpu *vcpu, | 495 | static int handle_vm_shutdown(struct kvm_vcpu *vcpu, |
496 | struct kvm_run *kvm_run) | 496 | struct kvm_run *kvm_run) |
497 | { | 497 | { |
498 | kvm_run->exit_reason = KVM_EXIT_SHUTDOWN; | 498 | kvm_run->exit_reason = KVM_EXIT_SHUTDOWN; |
499 | return 0; | 499 | return 0; |
500 | } | 500 | } |
501 | 501 | ||
502 | static int handle_external_interrupt(struct kvm_vcpu *vcpu, | 502 | static int handle_external_interrupt(struct kvm_vcpu *vcpu, |
503 | struct kvm_run *kvm_run) | 503 | struct kvm_run *kvm_run) |
504 | { | 504 | { |
505 | return 1; | 505 | return 1; |
506 | } | 506 | } |
507 | 507 | ||
508 | static int handle_vcpu_debug(struct kvm_vcpu *vcpu, | 508 | static int handle_vcpu_debug(struct kvm_vcpu *vcpu, |
509 | struct kvm_run *kvm_run) | 509 | struct kvm_run *kvm_run) |
510 | { | 510 | { |
511 | printk("VMM: %s", vcpu->arch.log_buf); | 511 | printk("VMM: %s", vcpu->arch.log_buf); |
512 | return 1; | 512 | return 1; |
513 | } | 513 | } |
514 | 514 | ||
515 | static int (*kvm_vti_exit_handlers[])(struct kvm_vcpu *vcpu, | 515 | static int (*kvm_vti_exit_handlers[])(struct kvm_vcpu *vcpu, |
516 | struct kvm_run *kvm_run) = { | 516 | struct kvm_run *kvm_run) = { |
517 | [EXIT_REASON_VM_PANIC] = handle_vm_error, | 517 | [EXIT_REASON_VM_PANIC] = handle_vm_error, |
518 | [EXIT_REASON_MMIO_INSTRUCTION] = handle_mmio, | 518 | [EXIT_REASON_MMIO_INSTRUCTION] = handle_mmio, |
519 | [EXIT_REASON_PAL_CALL] = handle_pal_call, | 519 | [EXIT_REASON_PAL_CALL] = handle_pal_call, |
520 | [EXIT_REASON_SAL_CALL] = handle_sal_call, | 520 | [EXIT_REASON_SAL_CALL] = handle_sal_call, |
521 | [EXIT_REASON_SWITCH_RR6] = handle_switch_rr6, | 521 | [EXIT_REASON_SWITCH_RR6] = handle_switch_rr6, |
522 | [EXIT_REASON_VM_DESTROY] = handle_vm_shutdown, | 522 | [EXIT_REASON_VM_DESTROY] = handle_vm_shutdown, |
523 | [EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt, | 523 | [EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt, |
524 | [EXIT_REASON_IPI] = handle_ipi, | 524 | [EXIT_REASON_IPI] = handle_ipi, |
525 | [EXIT_REASON_PTC_G] = handle_global_purge, | 525 | [EXIT_REASON_PTC_G] = handle_global_purge, |
526 | [EXIT_REASON_DEBUG] = handle_vcpu_debug, | 526 | [EXIT_REASON_DEBUG] = handle_vcpu_debug, |
527 | 527 | ||
528 | }; | 528 | }; |
529 | 529 | ||
530 | static const int kvm_vti_max_exit_handlers = | 530 | static const int kvm_vti_max_exit_handlers = |
531 | sizeof(kvm_vti_exit_handlers)/sizeof(*kvm_vti_exit_handlers); | 531 | sizeof(kvm_vti_exit_handlers)/sizeof(*kvm_vti_exit_handlers); |
532 | 532 | ||
533 | static uint32_t kvm_get_exit_reason(struct kvm_vcpu *vcpu) | 533 | static uint32_t kvm_get_exit_reason(struct kvm_vcpu *vcpu) |
534 | { | 534 | { |
535 | struct exit_ctl_data *p_exit_data; | 535 | struct exit_ctl_data *p_exit_data; |
536 | 536 | ||
537 | p_exit_data = kvm_get_exit_data(vcpu); | 537 | p_exit_data = kvm_get_exit_data(vcpu); |
538 | return p_exit_data->exit_reason; | 538 | return p_exit_data->exit_reason; |
539 | } | 539 | } |
540 | 540 | ||
541 | /* | 541 | /* |
542 | * The guest has exited. See if we can fix it or if we need userspace | 542 | * The guest has exited. See if we can fix it or if we need userspace |
543 | * assistance. | 543 | * assistance. |
544 | */ | 544 | */ |
545 | static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) | 545 | static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) |
546 | { | 546 | { |
547 | u32 exit_reason = kvm_get_exit_reason(vcpu); | 547 | u32 exit_reason = kvm_get_exit_reason(vcpu); |
548 | vcpu->arch.last_exit = exit_reason; | 548 | vcpu->arch.last_exit = exit_reason; |
549 | 549 | ||
550 | if (exit_reason < kvm_vti_max_exit_handlers | 550 | if (exit_reason < kvm_vti_max_exit_handlers |
551 | && kvm_vti_exit_handlers[exit_reason]) | 551 | && kvm_vti_exit_handlers[exit_reason]) |
552 | return kvm_vti_exit_handlers[exit_reason](vcpu, kvm_run); | 552 | return kvm_vti_exit_handlers[exit_reason](vcpu, kvm_run); |
553 | else { | 553 | else { |
554 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; | 554 | kvm_run->exit_reason = KVM_EXIT_UNKNOWN; |
555 | kvm_run->hw.hardware_exit_reason = exit_reason; | 555 | kvm_run->hw.hardware_exit_reason = exit_reason; |
556 | } | 556 | } |
557 | return 0; | 557 | return 0; |
558 | } | 558 | } |
559 | 559 | ||
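The exit path above is table-driven: the raw exit reason indexes an array of handler pointers, and a bounds-plus-NULL check decides whether the kernel can resolve the exit itself or must report KVM_EXIT_UNKNOWN to userspace. Below is a minimal, self-contained sketch of the same pattern in plain C; the demo_* names are illustrative, not kernel APIs.

#include <stdio.h>

struct demo_vcpu { int id; };

static int demo_handle_a(struct demo_vcpu *v) { (void)v; puts("exit A handled"); return 1; }
static int demo_handle_b(struct demo_vcpu *v) { (void)v; puts("exit B handled"); return 1; }

static int (*demo_handlers[])(struct demo_vcpu *) = {
	[0] = demo_handle_a,
	[2] = demo_handle_b,	/* gaps stay NULL, like the real table */
};

static int demo_dispatch(struct demo_vcpu *v, unsigned int reason)
{
	if (reason < sizeof(demo_handlers) / sizeof(*demo_handlers) &&
	    demo_handlers[reason])
		return demo_handlers[reason](v);
	return 0;		/* unknown exit: hand off to userspace */
}

int main(void)
{
	struct demo_vcpu v = { 0 };

	demo_dispatch(&v, 0);	/* runs demo_handle_a */
	demo_dispatch(&v, 1);	/* NULL slot: returns 0 */
	demo_dispatch(&v, 9);	/* out of range: returns 0 */
	return 0;
}

Designated initializers leave unhandled reasons as NULL slots, so adding a handler is a one-line change and unknown reasons fall through safely.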
560 | static inline void vti_set_rr6(unsigned long rr6) | 560 | static inline void vti_set_rr6(unsigned long rr6) |
561 | { | 561 | { |
562 | ia64_set_rr(RR6, rr6); | 562 | ia64_set_rr(RR6, rr6); |
563 | ia64_srlz_i(); | 563 | ia64_srlz_i(); |
564 | } | 564 | } |
565 | 565 | ||
566 | static int kvm_insert_vmm_mapping(struct kvm_vcpu *vcpu) | 566 | static int kvm_insert_vmm_mapping(struct kvm_vcpu *vcpu) |
567 | { | 567 | { |
568 | unsigned long pte; | 568 | unsigned long pte; |
569 | struct kvm *kvm = vcpu->kvm; | 569 | struct kvm *kvm = vcpu->kvm; |
570 | int r; | 570 | int r; |
571 | 571 | ||
572 | /*Insert a pair of tr to map vmm*/ | 572 | /*Insert a pair of tr to map vmm*/ |
573 | pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL)); | 573 | pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL)); |
574 | r = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); | 574 | r = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); |
575 | if (r < 0) | 575 | if (r < 0) |
576 | goto out; | 576 | goto out; |
577 | vcpu->arch.vmm_tr_slot = r; | 577 | vcpu->arch.vmm_tr_slot = r; |
578 | /*Insert a pair of tr to map data of vm*/ | 578 | /*Insert a pair of tr to map data of vm*/ |
579 | pte = pte_val(mk_pte_phys(__pa(kvm->arch.vm_base), PAGE_KERNEL)); | 579 | pte = pte_val(mk_pte_phys(__pa(kvm->arch.vm_base), PAGE_KERNEL)); |
580 | r = ia64_itr_entry(0x3, KVM_VM_DATA_BASE, | 580 | r = ia64_itr_entry(0x3, KVM_VM_DATA_BASE, |
581 | pte, KVM_VM_DATA_SHIFT); | 581 | pte, KVM_VM_DATA_SHIFT); |
582 | if (r < 0) | 582 | if (r < 0) |
583 | goto out; | 583 | goto out; |
584 | vcpu->arch.vm_tr_slot = r; | 584 | vcpu->arch.vm_tr_slot = r; |
585 | 585 | ||
586 | #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) | 586 | #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) |
587 | if (kvm->arch.is_sn2) { | 587 | if (kvm->arch.is_sn2) { |
588 | r = kvm_sn2_setup_mappings(vcpu); | 588 | r = kvm_sn2_setup_mappings(vcpu); |
589 | if (r < 0) | 589 | if (r < 0) |
590 | goto out; | 590 | goto out; |
591 | } | 591 | } |
592 | #endif | 592 | #endif |
593 | 593 | ||
594 | r = 0; | 594 | r = 0; |
595 | out: | 595 | out: |
596 | return r; | 596 | return r; |
597 | } | 597 | } |
598 | 598 | ||
599 | static void kvm_purge_vmm_mapping(struct kvm_vcpu *vcpu) | 599 | static void kvm_purge_vmm_mapping(struct kvm_vcpu *vcpu) |
600 | { | 600 | { |
601 | struct kvm *kvm = vcpu->kvm; | 601 | struct kvm *kvm = vcpu->kvm; |
602 | ia64_ptr_entry(0x3, vcpu->arch.vmm_tr_slot); | 602 | ia64_ptr_entry(0x3, vcpu->arch.vmm_tr_slot); |
603 | ia64_ptr_entry(0x3, vcpu->arch.vm_tr_slot); | 603 | ia64_ptr_entry(0x3, vcpu->arch.vm_tr_slot); |
604 | #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) | 604 | #if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC) |
605 | if (kvm->arch.is_sn2) | 605 | if (kvm->arch.is_sn2) |
606 | ia64_ptr_entry(0x3, vcpu->arch.sn_rtc_tr_slot); | 606 | ia64_ptr_entry(0x3, vcpu->arch.sn_rtc_tr_slot); |
607 | #endif | 607 | #endif |
608 | } | 608 | } |
609 | 609 | ||
610 | static int kvm_vcpu_pre_transition(struct kvm_vcpu *vcpu) | 610 | static int kvm_vcpu_pre_transition(struct kvm_vcpu *vcpu) |
611 | { | 611 | { |
612 | unsigned long psr; | 612 | unsigned long psr; |
613 | int r; | 613 | int r; |
614 | int cpu = smp_processor_id(); | 614 | int cpu = smp_processor_id(); |
615 | 615 | ||
616 | if (vcpu->arch.last_run_cpu != cpu || | 616 | if (vcpu->arch.last_run_cpu != cpu || |
617 | per_cpu(last_vcpu, cpu) != vcpu) { | 617 | per_cpu(last_vcpu, cpu) != vcpu) { |
618 | per_cpu(last_vcpu, cpu) = vcpu; | 618 | per_cpu(last_vcpu, cpu) = vcpu; |
619 | vcpu->arch.last_run_cpu = cpu; | 619 | vcpu->arch.last_run_cpu = cpu; |
620 | kvm_flush_tlb_all(); | 620 | kvm_flush_tlb_all(); |
621 | } | 621 | } |
622 | 622 | ||
623 | vcpu->arch.host_rr6 = ia64_get_rr(RR6); | 623 | vcpu->arch.host_rr6 = ia64_get_rr(RR6); |
624 | vti_set_rr6(vcpu->arch.vmm_rr); | 624 | vti_set_rr6(vcpu->arch.vmm_rr); |
625 | local_irq_save(psr); | 625 | local_irq_save(psr); |
626 | r = kvm_insert_vmm_mapping(vcpu); | 626 | r = kvm_insert_vmm_mapping(vcpu); |
627 | local_irq_restore(psr); | 627 | local_irq_restore(psr); |
628 | return r; | 628 | return r; |
629 | } | 629 | } |
630 | 630 | ||
631 | static void kvm_vcpu_post_transition(struct kvm_vcpu *vcpu) | 631 | static void kvm_vcpu_post_transition(struct kvm_vcpu *vcpu) |
632 | { | 632 | { |
633 | kvm_purge_vmm_mapping(vcpu); | 633 | kvm_purge_vmm_mapping(vcpu); |
634 | vti_set_rr6(vcpu->arch.host_rr6); | 634 | vti_set_rr6(vcpu->arch.host_rr6); |
635 | } | 635 | } |
636 | 636 | ||
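kvm_vcpu_pre_transition() and kvm_vcpu_post_transition() form a save/switch/restore bracket around guest entry: the host's region register 6 is stashed, the VMM's value is installed, and the pair is undone in reverse order on the way out. A hedged sketch of that bracket, with a plain variable standing in for rr6; demo_* names are illustrative, not kernel APIs.

#include <stdio.h>

static unsigned long demo_rr6;	/* stands in for region register 6 */

static unsigned long demo_get_rr6(void) { return demo_rr6; }
static void demo_set_rr6(unsigned long v) { demo_rr6 = v; }

struct demo_vcpu { unsigned long host_rr6, vmm_rr; };

static void demo_pre(struct demo_vcpu *v)
{
	v->host_rr6 = demo_get_rr6();	/* save host mapping */
	demo_set_rr6(v->vmm_rr);	/* install VMM mapping */
}

static void demo_post(struct demo_vcpu *v)
{
	demo_set_rr6(v->host_rr6);	/* restore host mapping */
}

int main(void)
{
	struct demo_vcpu v = { .vmm_rr = 0x1660 };

	demo_set_rr6(0xabc);
	demo_pre(&v);
	printf("in guest:  rr6=%#lx\n", demo_get_rr6());
	demo_post(&v);
	printf("in host:   rr6=%#lx\n", demo_get_rr6());
	return 0;
}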
637 | static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 637 | static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
638 | { | 638 | { |
639 | union context *host_ctx, *guest_ctx; | 639 | union context *host_ctx, *guest_ctx; |
640 | int r, idx; | 640 | int r, idx; |
641 | 641 | ||
642 | idx = srcu_read_lock(&vcpu->kvm->srcu); | 642 | idx = srcu_read_lock(&vcpu->kvm->srcu); |
643 | 643 | ||
644 | again: | 644 | again: |
645 | if (signal_pending(current)) { | 645 | if (signal_pending(current)) { |
646 | r = -EINTR; | 646 | r = -EINTR; |
647 | kvm_run->exit_reason = KVM_EXIT_INTR; | 647 | kvm_run->exit_reason = KVM_EXIT_INTR; |
648 | goto out; | 648 | goto out; |
649 | } | 649 | } |
650 | 650 | ||
651 | preempt_disable(); | 651 | preempt_disable(); |
652 | local_irq_disable(); | 652 | local_irq_disable(); |
653 | 653 | ||
654 | /*Get host and guest context with guest address space.*/ | 654 | /*Get host and guest context with guest address space.*/ |
655 | host_ctx = kvm_get_host_context(vcpu); | 655 | host_ctx = kvm_get_host_context(vcpu); |
656 | guest_ctx = kvm_get_guest_context(vcpu); | 656 | guest_ctx = kvm_get_guest_context(vcpu); |
657 | 657 | ||
658 | clear_bit(KVM_REQ_KICK, &vcpu->requests); | 658 | clear_bit(KVM_REQ_KICK, &vcpu->requests); |
659 | 659 | ||
660 | r = kvm_vcpu_pre_transition(vcpu); | 660 | r = kvm_vcpu_pre_transition(vcpu); |
661 | if (r < 0) | 661 | if (r < 0) |
662 | goto vcpu_run_fail; | 662 | goto vcpu_run_fail; |
663 | 663 | ||
664 | srcu_read_unlock(&vcpu->kvm->srcu, idx); | 664 | srcu_read_unlock(&vcpu->kvm->srcu, idx); |
665 | kvm_guest_enter(); | 665 | kvm_guest_enter(); |
666 | 666 | ||
667 | /* | 667 | /* |
668 | * Transition to the guest | 668 | * Transition to the guest |
669 | */ | 669 | */ |
670 | kvm_vmm_info->tramp_entry(host_ctx, guest_ctx); | 670 | kvm_vmm_info->tramp_entry(host_ctx, guest_ctx); |
671 | 671 | ||
672 | kvm_vcpu_post_transition(vcpu); | 672 | kvm_vcpu_post_transition(vcpu); |
673 | 673 | ||
674 | vcpu->arch.launched = 1; | 674 | vcpu->arch.launched = 1; |
675 | set_bit(KVM_REQ_KICK, &vcpu->requests); | 675 | set_bit(KVM_REQ_KICK, &vcpu->requests); |
676 | local_irq_enable(); | 676 | local_irq_enable(); |
677 | 677 | ||
678 | /* | 678 | /* |
679 | * We must have an instruction between local_irq_enable() and | 679 | * We must have an instruction between local_irq_enable() and |
680 | * kvm_guest_exit(), so the timer interrupt isn't delayed by | 680 | * kvm_guest_exit(), so the timer interrupt isn't delayed by |
681 | * the interrupt shadow. The stat.exits increment will do nicely. | 681 | * the interrupt shadow. The stat.exits increment will do nicely. |
682 | * But we need to prevent reordering, hence this barrier(): | 682 | * But we need to prevent reordering, hence this barrier(): |
683 | */ | 683 | */ |
684 | barrier(); | 684 | barrier(); |
685 | kvm_guest_exit(); | 685 | kvm_guest_exit(); |
686 | preempt_enable(); | 686 | preempt_enable(); |
687 | 687 | ||
688 | idx = srcu_read_lock(&vcpu->kvm->srcu); | 688 | idx = srcu_read_lock(&vcpu->kvm->srcu); |
689 | 689 | ||
690 | r = kvm_handle_exit(kvm_run, vcpu); | 690 | r = kvm_handle_exit(kvm_run, vcpu); |
691 | 691 | ||
692 | if (r > 0) { | 692 | if (r > 0) { |
693 | if (!need_resched()) | 693 | if (!need_resched()) |
694 | goto again; | 694 | goto again; |
695 | } | 695 | } |
696 | 696 | ||
697 | out: | 697 | out: |
698 | srcu_read_unlock(&vcpu->kvm->srcu, idx); | 698 | srcu_read_unlock(&vcpu->kvm->srcu, idx); |
699 | if (r > 0) { | 699 | if (r > 0) { |
700 | kvm_resched(vcpu); | 700 | kvm_resched(vcpu); |
701 | idx = srcu_read_lock(&vcpu->kvm->srcu); | 701 | idx = srcu_read_lock(&vcpu->kvm->srcu); |
702 | goto again; | 702 | goto again; |
703 | } | 703 | } |
704 | 704 | ||
705 | return r; | 705 | return r; |
706 | 706 | ||
707 | vcpu_run_fail: | 707 | vcpu_run_fail: |
708 | local_irq_enable(); | 708 | local_irq_enable(); |
709 | preempt_enable(); | 709 | preempt_enable(); |
710 | kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY; | 710 | kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY; |
711 | goto out; | 711 | goto out; |
712 | } | 712 | } |
713 | 713 | ||
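__vcpu_run() above is the canonical vcpu run loop: check for pending signals, enter the guest, dispatch the exit, and re-enter while the handler reports the exit was resolved in the kernel (return value > 0); anything else drops back to userspace. Stripped of the SRCU, preemption, and IRQ plumbing, the control flow reduces to the sketch below; demo_* names are illustrative, not kernel APIs.

#include <stdio.h>

static int demo_enter_guest(int iteration)
{
	/* pretend the guest exits twice with reasons we can handle,
	 * then once with one that needs userspace assistance */
	return iteration < 2 ? 1 : 0;
}

static int demo_run(void)
{
	int i, r;

	for (i = 0; ; i++) {
		r = demo_enter_guest(i);	/* transition + handle exit */
		if (r <= 0)
			break;			/* userspace exit or error */
		/* r > 0: handled in kernel, re-enter the guest */
	}
	return r;
}

int main(void)
{
	printf("run loop returned %d\n", demo_run());
	return 0;
}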
714 | static void kvm_set_mmio_data(struct kvm_vcpu *vcpu) | 714 | static void kvm_set_mmio_data(struct kvm_vcpu *vcpu) |
715 | { | 715 | { |
716 | struct kvm_mmio_req *p = kvm_get_vcpu_ioreq(vcpu); | 716 | struct kvm_mmio_req *p = kvm_get_vcpu_ioreq(vcpu); |
717 | 717 | ||
718 | if (!vcpu->mmio_is_write) | 718 | if (!vcpu->mmio_is_write) |
719 | memcpy(&p->data, vcpu->mmio_data, 8); | 719 | memcpy(&p->data, vcpu->mmio_data, 8); |
720 | p->state = STATE_IORESP_READY; | 720 | p->state = STATE_IORESP_READY; |
721 | } | 721 | } |
722 | 722 | ||
723 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 723 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
724 | { | 724 | { |
725 | int r; | 725 | int r; |
726 | sigset_t sigsaved; | 726 | sigset_t sigsaved; |
727 | 727 | ||
728 | if (vcpu->sigset_active) | 728 | if (vcpu->sigset_active) |
729 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); | 729 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); |
730 | 730 | ||
731 | if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { | 731 | if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { |
732 | kvm_vcpu_block(vcpu); | 732 | kvm_vcpu_block(vcpu); |
733 | clear_bit(KVM_REQ_UNHALT, &vcpu->requests); | 733 | clear_bit(KVM_REQ_UNHALT, &vcpu->requests); |
734 | r = -EAGAIN; | 734 | r = -EAGAIN; |
735 | goto out; | 735 | goto out; |
736 | } | 736 | } |
737 | 737 | ||
738 | if (vcpu->mmio_needed) { | 738 | if (vcpu->mmio_needed) { |
739 | memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8); | 739 | memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8); |
740 | kvm_set_mmio_data(vcpu); | 740 | kvm_set_mmio_data(vcpu); |
741 | vcpu->mmio_read_completed = 1; | 741 | vcpu->mmio_read_completed = 1; |
742 | vcpu->mmio_needed = 0; | 742 | vcpu->mmio_needed = 0; |
743 | } | 743 | } |
744 | r = __vcpu_run(vcpu, kvm_run); | 744 | r = __vcpu_run(vcpu, kvm_run); |
745 | out: | 745 | out: |
746 | if (vcpu->sigset_active) | 746 | if (vcpu->sigset_active) |
747 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); | 747 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); |
748 | 748 | ||
749 | return r; | 749 | return r; |
750 | } | 750 | } |
751 | 751 | ||
752 | static struct kvm *kvm_alloc_kvm(void) | 752 | static struct kvm *kvm_alloc_kvm(void) |
753 | { | 753 | { |
754 | 754 | ||
755 | struct kvm *kvm; | 755 | struct kvm *kvm; |
756 | uint64_t vm_base; | 756 | uint64_t vm_base; |
757 | 757 | ||
758 | BUG_ON(sizeof(struct kvm) > KVM_VM_STRUCT_SIZE); | 758 | BUG_ON(sizeof(struct kvm) > KVM_VM_STRUCT_SIZE); |
759 | 759 | ||
760 | vm_base = __get_free_pages(GFP_KERNEL, get_order(KVM_VM_DATA_SIZE)); | 760 | vm_base = __get_free_pages(GFP_KERNEL, get_order(KVM_VM_DATA_SIZE)); |
761 | 761 | ||
762 | if (!vm_base) | 762 | if (!vm_base) |
763 | return ERR_PTR(-ENOMEM); | 763 | return ERR_PTR(-ENOMEM); |
764 | 764 | ||
765 | memset((void *)vm_base, 0, KVM_VM_DATA_SIZE); | 765 | memset((void *)vm_base, 0, KVM_VM_DATA_SIZE); |
766 | kvm = (struct kvm *)(vm_base + | 766 | kvm = (struct kvm *)(vm_base + |
767 | offsetof(struct kvm_vm_data, kvm_vm_struct)); | 767 | offsetof(struct kvm_vm_data, kvm_vm_struct)); |
768 | kvm->arch.vm_base = vm_base; | 768 | kvm->arch.vm_base = vm_base; |
769 | printk(KERN_DEBUG"kvm: vm's data area:0x%lx\n", vm_base); | 769 | printk(KERN_DEBUG"kvm: vm's data area:0x%lx\n", vm_base); |
770 | 770 | ||
771 | return kvm; | 771 | return kvm; |
772 | } | 772 | } |
773 | 773 | ||
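kvm_alloc_kvm() does not allocate a struct kvm directly: it grabs one large page-aligned area for all per-VM data and locates the struct kvm at a fixed offset inside it via offsetof(), so the VMM can map the whole area with a single translation. The placement trick in isolation, as standard C; demo_* names are illustrative, not kernel APIs.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

struct demo_kvm { unsigned long vm_base; };

struct demo_vm_data {
	char vmm_scratch[64];		/* other per-VM data lives here */
	struct demo_kvm kvm_vm_struct;	/* the kvm struct is embedded */
};

int main(void)
{
	void *vm_base = calloc(1, sizeof(struct demo_vm_data));
	struct demo_kvm *kvm;

	if (!vm_base)
		return 1;
	kvm = (struct demo_kvm *)((char *)vm_base +
			offsetof(struct demo_vm_data, kvm_vm_struct));
	kvm->vm_base = (unsigned long)vm_base;	/* remember the whole area */
	printf("area at %p, struct kvm at %p\n", vm_base, (void *)kvm);
	free(vm_base);
	return 0;
}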
774 | struct kvm_io_range { | 774 | struct kvm_io_range { |
775 | unsigned long start; | 775 | unsigned long start; |
776 | unsigned long size; | 776 | unsigned long size; |
777 | unsigned long type; | 777 | unsigned long type; |
778 | }; | 778 | }; |
779 | 779 | ||
780 | static const struct kvm_io_range io_ranges[] = { | 780 | static const struct kvm_io_range io_ranges[] = { |
781 | {VGA_IO_START, VGA_IO_SIZE, GPFN_FRAME_BUFFER}, | 781 | {VGA_IO_START, VGA_IO_SIZE, GPFN_FRAME_BUFFER}, |
782 | {MMIO_START, MMIO_SIZE, GPFN_LOW_MMIO}, | 782 | {MMIO_START, MMIO_SIZE, GPFN_LOW_MMIO}, |
783 | {LEGACY_IO_START, LEGACY_IO_SIZE, GPFN_LEGACY_IO}, | 783 | {LEGACY_IO_START, LEGACY_IO_SIZE, GPFN_LEGACY_IO}, |
784 | {IO_SAPIC_START, IO_SAPIC_SIZE, GPFN_IOSAPIC}, | 784 | {IO_SAPIC_START, IO_SAPIC_SIZE, GPFN_IOSAPIC}, |
785 | {PIB_START, PIB_SIZE, GPFN_PIB}, | 785 | {PIB_START, PIB_SIZE, GPFN_PIB}, |
786 | }; | 786 | }; |
787 | 787 | ||
788 | static void kvm_build_io_pmt(struct kvm *kvm) | 788 | static void kvm_build_io_pmt(struct kvm *kvm) |
789 | { | 789 | { |
790 | unsigned long i, j; | 790 | unsigned long i, j; |
791 | 791 | ||
792 | /* Mark I/O ranges */ | 792 | /* Mark I/O ranges */ |
793 | for (i = 0; i < (sizeof(io_ranges) / sizeof(struct kvm_io_range)); | 793 | for (i = 0; i < (sizeof(io_ranges) / sizeof(struct kvm_io_range)); |
794 | i++) { | 794 | i++) { |
795 | for (j = io_ranges[i].start; | 795 | for (j = io_ranges[i].start; |
796 | j < io_ranges[i].start + io_ranges[i].size; | 796 | j < io_ranges[i].start + io_ranges[i].size; |
797 | j += PAGE_SIZE) | 797 | j += PAGE_SIZE) |
798 | kvm_set_pmt_entry(kvm, j >> PAGE_SHIFT, | 798 | kvm_set_pmt_entry(kvm, j >> PAGE_SHIFT, |
799 | io_ranges[i].type, 0); | 799 | io_ranges[i].type, 0); |
800 | } | 800 | } |
801 | 801 | ||
802 | } | 802 | } |
803 | 803 | ||
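kvm_build_io_pmt() walks a static table of (start, size, type) ranges and marks every page-sized step in each range. The same two-level walk in standalone form, keeping the sizeof-division idiom used above; demo_* names are illustrative, not kernel APIs.

#include <stdio.h>

#define DEMO_PAGE_SIZE 4096UL

struct demo_range { unsigned long start, size, type; };

static const struct demo_range demo_ranges[] = {
	{ 0x0000, 2 * DEMO_PAGE_SIZE, 1 },
	{ 0x8000, 1 * DEMO_PAGE_SIZE, 2 },
};

int main(void)
{
	unsigned long i, j;

	for (i = 0; i < sizeof(demo_ranges) / sizeof(*demo_ranges); i++)
		for (j = demo_ranges[i].start;
		     j < demo_ranges[i].start + demo_ranges[i].size;
		     j += DEMO_PAGE_SIZE)
			printf("mark page %#lx as type %lu\n",
			       j, demo_ranges[i].type);
	return 0;
}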
804 | /*Use unused rids to virtualize guest rid.*/ | 804 | /*Use unused rids to virtualize guest rid.*/ |
805 | #define GUEST_PHYSICAL_RR0 0x1739 | 805 | #define GUEST_PHYSICAL_RR0 0x1739 |
806 | #define GUEST_PHYSICAL_RR4 0x2739 | 806 | #define GUEST_PHYSICAL_RR4 0x2739 |
807 | #define VMM_INIT_RR 0x1660 | 807 | #define VMM_INIT_RR 0x1660 |
808 | 808 | ||
809 | static void kvm_init_vm(struct kvm *kvm) | 809 | static void kvm_init_vm(struct kvm *kvm) |
810 | { | 810 | { |
811 | BUG_ON(!kvm); | 811 | BUG_ON(!kvm); |
812 | 812 | ||
813 | kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0; | 813 | kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0; |
814 | kvm->arch.metaphysical_rr4 = GUEST_PHYSICAL_RR4; | 814 | kvm->arch.metaphysical_rr4 = GUEST_PHYSICAL_RR4; |
815 | kvm->arch.vmm_init_rr = VMM_INIT_RR; | 815 | kvm->arch.vmm_init_rr = VMM_INIT_RR; |
816 | 816 | ||
817 | /* | 817 | /* |
818 | * Fill P2M entries for MMIO/IO ranges | 818 | * Fill P2M entries for MMIO/IO ranges |
819 | */ | 819 | */ |
820 | kvm_build_io_pmt(kvm); | 820 | kvm_build_io_pmt(kvm); |
821 | 821 | ||
822 | INIT_LIST_HEAD(&kvm->arch.assigned_dev_head); | 822 | INIT_LIST_HEAD(&kvm->arch.assigned_dev_head); |
823 | 823 | ||
824 | /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ | 824 | /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ |
825 | set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); | 825 | set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); |
826 | } | 826 | } |
827 | 827 | ||
828 | struct kvm *kvm_arch_create_vm(void) | 828 | struct kvm *kvm_arch_create_vm(void) |
829 | { | 829 | { |
830 | struct kvm *kvm = kvm_alloc_kvm(); | 830 | struct kvm *kvm = kvm_alloc_kvm(); |
831 | 831 | ||
832 | if (IS_ERR(kvm)) | 832 | if (IS_ERR(kvm)) |
833 | return ERR_PTR(-ENOMEM); | 833 | return ERR_PTR(-ENOMEM); |
834 | 834 | ||
835 | kvm->arch.is_sn2 = ia64_platform_is("sn2"); | 835 | kvm->arch.is_sn2 = ia64_platform_is("sn2"); |
836 | 836 | ||
837 | kvm_init_vm(kvm); | 837 | kvm_init_vm(kvm); |
838 | 838 | ||
839 | return kvm; | 839 | return kvm; |
840 | 840 | ||
841 | } | 841 | } |
842 | 842 | ||
843 | static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, | 843 | static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, |
844 | struct kvm_irqchip *chip) | 844 | struct kvm_irqchip *chip) |
845 | { | 845 | { |
846 | int r; | 846 | int r; |
847 | 847 | ||
848 | r = 0; | 848 | r = 0; |
849 | switch (chip->chip_id) { | 849 | switch (chip->chip_id) { |
850 | case KVM_IRQCHIP_IOAPIC: | 850 | case KVM_IRQCHIP_IOAPIC: |
851 | r = kvm_get_ioapic(kvm, &chip->chip.ioapic); | 851 | r = kvm_get_ioapic(kvm, &chip->chip.ioapic); |
852 | break; | 852 | break; |
853 | default: | 853 | default: |
854 | r = -EINVAL; | 854 | r = -EINVAL; |
855 | break; | 855 | break; |
856 | } | 856 | } |
857 | return r; | 857 | return r; |
858 | } | 858 | } |
859 | 859 | ||
860 | static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) | 860 | static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) |
861 | { | 861 | { |
862 | int r; | 862 | int r; |
863 | 863 | ||
864 | r = 0; | 864 | r = 0; |
865 | switch (chip->chip_id) { | 865 | switch (chip->chip_id) { |
866 | case KVM_IRQCHIP_IOAPIC: | 866 | case KVM_IRQCHIP_IOAPIC: |
867 | r = kvm_set_ioapic(kvm, &chip->chip.ioapic); | 867 | r = kvm_set_ioapic(kvm, &chip->chip.ioapic); |
868 | break; | 868 | break; |
869 | default: | 869 | default: |
870 | r = -EINVAL; | 870 | r = -EINVAL; |
871 | break; | 871 | break; |
872 | } | 872 | } |
873 | return r; | 873 | return r; |
874 | } | 874 | } |
875 | 875 | ||
876 | #define RESTORE_REGS(_x) vcpu->arch._x = regs->_x | 876 | #define RESTORE_REGS(_x) vcpu->arch._x = regs->_x |
877 | 877 | ||
878 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) | 878 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) |
879 | { | 879 | { |
880 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); | 880 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); |
881 | int i; | 881 | int i; |
882 | 882 | ||
883 | for (i = 0; i < 16; i++) { | 883 | for (i = 0; i < 16; i++) { |
884 | vpd->vgr[i] = regs->vpd.vgr[i]; | 884 | vpd->vgr[i] = regs->vpd.vgr[i]; |
885 | vpd->vbgr[i] = regs->vpd.vbgr[i]; | 885 | vpd->vbgr[i] = regs->vpd.vbgr[i]; |
886 | } | 886 | } |
887 | for (i = 0; i < 128; i++) | 887 | for (i = 0; i < 128; i++) |
888 | vpd->vcr[i] = regs->vpd.vcr[i]; | 888 | vpd->vcr[i] = regs->vpd.vcr[i]; |
889 | vpd->vhpi = regs->vpd.vhpi; | 889 | vpd->vhpi = regs->vpd.vhpi; |
890 | vpd->vnat = regs->vpd.vnat; | 890 | vpd->vnat = regs->vpd.vnat; |
891 | vpd->vbnat = regs->vpd.vbnat; | 891 | vpd->vbnat = regs->vpd.vbnat; |
892 | vpd->vpsr = regs->vpd.vpsr; | 892 | vpd->vpsr = regs->vpd.vpsr; |
893 | 893 | ||
894 | vpd->vpr = regs->vpd.vpr; | 894 | vpd->vpr = regs->vpd.vpr; |
895 | 895 | ||
896 | memcpy(&vcpu->arch.guest, ®s->saved_guest, sizeof(union context)); | 896 | memcpy(&vcpu->arch.guest, ®s->saved_guest, sizeof(union context)); |
897 | 897 | ||
898 | RESTORE_REGS(mp_state); | 898 | RESTORE_REGS(mp_state); |
899 | RESTORE_REGS(vmm_rr); | 899 | RESTORE_REGS(vmm_rr); |
900 | memcpy(vcpu->arch.itrs, regs->itrs, sizeof(struct thash_data) * NITRS); | 900 | memcpy(vcpu->arch.itrs, regs->itrs, sizeof(struct thash_data) * NITRS); |
901 | memcpy(vcpu->arch.dtrs, regs->dtrs, sizeof(struct thash_data) * NDTRS); | 901 | memcpy(vcpu->arch.dtrs, regs->dtrs, sizeof(struct thash_data) * NDTRS); |
902 | RESTORE_REGS(itr_regions); | 902 | RESTORE_REGS(itr_regions); |
903 | RESTORE_REGS(dtr_regions); | 903 | RESTORE_REGS(dtr_regions); |
904 | RESTORE_REGS(tc_regions); | 904 | RESTORE_REGS(tc_regions); |
905 | RESTORE_REGS(irq_check); | 905 | RESTORE_REGS(irq_check); |
906 | RESTORE_REGS(itc_check); | 906 | RESTORE_REGS(itc_check); |
907 | RESTORE_REGS(timer_check); | 907 | RESTORE_REGS(timer_check); |
908 | RESTORE_REGS(timer_pending); | 908 | RESTORE_REGS(timer_pending); |
909 | RESTORE_REGS(last_itc); | 909 | RESTORE_REGS(last_itc); |
910 | for (i = 0; i < 8; i++) { | 910 | for (i = 0; i < 8; i++) { |
911 | vcpu->arch.vrr[i] = regs->vrr[i]; | 911 | vcpu->arch.vrr[i] = regs->vrr[i]; |
912 | vcpu->arch.ibr[i] = regs->ibr[i]; | 912 | vcpu->arch.ibr[i] = regs->ibr[i]; |
913 | vcpu->arch.dbr[i] = regs->dbr[i]; | 913 | vcpu->arch.dbr[i] = regs->dbr[i]; |
914 | } | 914 | } |
915 | for (i = 0; i < 4; i++) | 915 | for (i = 0; i < 4; i++) |
916 | vcpu->arch.insvc[i] = regs->insvc[i]; | 916 | vcpu->arch.insvc[i] = regs->insvc[i]; |
917 | RESTORE_REGS(xtp); | 917 | RESTORE_REGS(xtp); |
918 | RESTORE_REGS(metaphysical_rr0); | 918 | RESTORE_REGS(metaphysical_rr0); |
919 | RESTORE_REGS(metaphysical_rr4); | 919 | RESTORE_REGS(metaphysical_rr4); |
920 | RESTORE_REGS(metaphysical_saved_rr0); | 920 | RESTORE_REGS(metaphysical_saved_rr0); |
921 | RESTORE_REGS(metaphysical_saved_rr4); | 921 | RESTORE_REGS(metaphysical_saved_rr4); |
922 | RESTORE_REGS(fp_psr); | 922 | RESTORE_REGS(fp_psr); |
923 | RESTORE_REGS(saved_gp); | 923 | RESTORE_REGS(saved_gp); |
924 | 924 | ||
925 | vcpu->arch.irq_new_pending = 1; | 925 | vcpu->arch.irq_new_pending = 1; |
926 | vcpu->arch.itc_offset = regs->saved_itc - kvm_get_itc(vcpu); | 926 | vcpu->arch.itc_offset = regs->saved_itc - kvm_get_itc(vcpu); |
927 | set_bit(KVM_REQ_RESUME, &vcpu->requests); | 927 | set_bit(KVM_REQ_RESUME, &vcpu->requests); |
928 | 928 | ||
929 | return 0; | 929 | return 0; |
930 | } | 930 | } |
931 | 931 | ||
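The RESTORE_REGS() macro above (and its SAVE_REGS() counterpart further down) relies on vcpu->arch and the kvm_regs layout sharing field names, so a single token-pasted argument copies a field in either direction. Reduced to its essentials; the demo structs and macros are illustrative, not kernel APIs.

#include <stdio.h>

struct demo_arch { int xtp, fp_psr; };
struct demo_regs { int xtp, fp_psr; };

#define DEMO_RESTORE(_x)	arch._x = regs->_x
#define DEMO_SAVE(_x)		regs->_x = arch._x

int main(void)
{
	struct demo_arch arch = { 0, 0 };
	struct demo_regs r = { .xtp = 3, .fp_psr = 7 }, *regs = &r;

	DEMO_RESTORE(xtp);	/* expands to: arch.xtp = regs->xtp */
	DEMO_RESTORE(fp_psr);
	printf("restored: xtp=%d fp_psr=%d\n", arch.xtp, arch.fp_psr);
	return 0;
}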
932 | long kvm_arch_vm_ioctl(struct file *filp, | 932 | long kvm_arch_vm_ioctl(struct file *filp, |
933 | unsigned int ioctl, unsigned long arg) | 933 | unsigned int ioctl, unsigned long arg) |
934 | { | 934 | { |
935 | struct kvm *kvm = filp->private_data; | 935 | struct kvm *kvm = filp->private_data; |
936 | void __user *argp = (void __user *)arg; | 936 | void __user *argp = (void __user *)arg; |
937 | int r = -ENOTTY; | 937 | int r = -ENOTTY; |
938 | 938 | ||
939 | switch (ioctl) { | 939 | switch (ioctl) { |
940 | case KVM_SET_MEMORY_REGION: { | 940 | case KVM_SET_MEMORY_REGION: { |
941 | struct kvm_memory_region kvm_mem; | 941 | struct kvm_memory_region kvm_mem; |
942 | struct kvm_userspace_memory_region kvm_userspace_mem; | 942 | struct kvm_userspace_memory_region kvm_userspace_mem; |
943 | 943 | ||
944 | r = -EFAULT; | 944 | r = -EFAULT; |
945 | if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem)) | 945 | if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem)) |
946 | goto out; | 946 | goto out; |
947 | kvm_userspace_mem.slot = kvm_mem.slot; | 947 | kvm_userspace_mem.slot = kvm_mem.slot; |
948 | kvm_userspace_mem.flags = kvm_mem.flags; | 948 | kvm_userspace_mem.flags = kvm_mem.flags; |
949 | kvm_userspace_mem.guest_phys_addr = | 949 | kvm_userspace_mem.guest_phys_addr = |
950 | kvm_mem.guest_phys_addr; | 950 | kvm_mem.guest_phys_addr; |
951 | kvm_userspace_mem.memory_size = kvm_mem.memory_size; | 951 | kvm_userspace_mem.memory_size = kvm_mem.memory_size; |
952 | r = kvm_vm_ioctl_set_memory_region(kvm, | 952 | r = kvm_vm_ioctl_set_memory_region(kvm, |
953 | &kvm_userspace_mem, 0); | 953 | &kvm_userspace_mem, 0); |
954 | if (r) | 954 | if (r) |
955 | goto out; | 955 | goto out; |
956 | break; | 956 | break; |
957 | } | 957 | } |
958 | case KVM_CREATE_IRQCHIP: | 958 | case KVM_CREATE_IRQCHIP: |
959 | r = -EFAULT; | 959 | r = -EFAULT; |
960 | r = kvm_ioapic_init(kvm); | 960 | r = kvm_ioapic_init(kvm); |
961 | if (r) | 961 | if (r) |
962 | goto out; | 962 | goto out; |
963 | r = kvm_setup_default_irq_routing(kvm); | 963 | r = kvm_setup_default_irq_routing(kvm); |
964 | if (r) { | 964 | if (r) { |
965 | kvm_ioapic_destroy(kvm); | 965 | kvm_ioapic_destroy(kvm); |
966 | goto out; | 966 | goto out; |
967 | } | 967 | } |
968 | break; | 968 | break; |
969 | case KVM_IRQ_LINE_STATUS: | 969 | case KVM_IRQ_LINE_STATUS: |
970 | case KVM_IRQ_LINE: { | 970 | case KVM_IRQ_LINE: { |
971 | struct kvm_irq_level irq_event; | 971 | struct kvm_irq_level irq_event; |
972 | 972 | ||
973 | r = -EFAULT; | 973 | r = -EFAULT; |
974 | if (copy_from_user(&irq_event, argp, sizeof irq_event)) | 974 | if (copy_from_user(&irq_event, argp, sizeof irq_event)) |
975 | goto out; | 975 | goto out; |
976 | r = -ENXIO; | 976 | r = -ENXIO; |
977 | if (irqchip_in_kernel(kvm)) { | 977 | if (irqchip_in_kernel(kvm)) { |
978 | __s32 status; | 978 | __s32 status; |
979 | status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, | 979 | status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, |
980 | irq_event.irq, irq_event.level); | 980 | irq_event.irq, irq_event.level); |
981 | if (ioctl == KVM_IRQ_LINE_STATUS) { | 981 | if (ioctl == KVM_IRQ_LINE_STATUS) { |
982 | r = -EFAULT; | 982 | r = -EFAULT; |
983 | irq_event.status = status; | 983 | irq_event.status = status; |
984 | if (copy_to_user(argp, &irq_event, | 984 | if (copy_to_user(argp, &irq_event, |
985 | sizeof irq_event)) | 985 | sizeof irq_event)) |
986 | goto out; | 986 | goto out; |
987 | } | 987 | } |
988 | r = 0; | 988 | r = 0; |
989 | } | 989 | } |
990 | break; | 990 | break; |
991 | } | 991 | } |
992 | case KVM_GET_IRQCHIP: { | 992 | case KVM_GET_IRQCHIP: { |
993 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ | 993 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ |
994 | struct kvm_irqchip chip; | 994 | struct kvm_irqchip chip; |
995 | 995 | ||
996 | r = -EFAULT; | 996 | r = -EFAULT; |
997 | if (copy_from_user(&chip, argp, sizeof chip)) | 997 | if (copy_from_user(&chip, argp, sizeof chip)) |
998 | goto out; | 998 | goto out; |
999 | r = -ENXIO; | 999 | r = -ENXIO; |
1000 | if (!irqchip_in_kernel(kvm)) | 1000 | if (!irqchip_in_kernel(kvm)) |
1001 | goto out; | 1001 | goto out; |
1002 | r = kvm_vm_ioctl_get_irqchip(kvm, &chip); | 1002 | r = kvm_vm_ioctl_get_irqchip(kvm, &chip); |
1003 | if (r) | 1003 | if (r) |
1004 | goto out; | 1004 | goto out; |
1005 | r = -EFAULT; | 1005 | r = -EFAULT; |
1006 | if (copy_to_user(argp, &chip, sizeof chip)) | 1006 | if (copy_to_user(argp, &chip, sizeof chip)) |
1007 | goto out; | 1007 | goto out; |
1008 | r = 0; | 1008 | r = 0; |
1009 | break; | 1009 | break; |
1010 | } | 1010 | } |
1011 | case KVM_SET_IRQCHIP: { | 1011 | case KVM_SET_IRQCHIP: { |
1012 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ | 1012 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ |
1013 | struct kvm_irqchip chip; | 1013 | struct kvm_irqchip chip; |
1014 | 1014 | ||
1015 | r = -EFAULT; | 1015 | r = -EFAULT; |
1016 | if (copy_from_user(&chip, argp, sizeof chip)) | 1016 | if (copy_from_user(&chip, argp, sizeof chip)) |
1017 | goto out; | 1017 | goto out; |
1018 | r = -ENXIO; | 1018 | r = -ENXIO; |
1019 | if (!irqchip_in_kernel(kvm)) | 1019 | if (!irqchip_in_kernel(kvm)) |
1020 | goto out; | 1020 | goto out; |
1021 | r = kvm_vm_ioctl_set_irqchip(kvm, &chip); | 1021 | r = kvm_vm_ioctl_set_irqchip(kvm, &chip); |
1022 | if (r) | 1022 | if (r) |
1023 | goto out; | 1023 | goto out; |
1024 | r = 0; | 1024 | r = 0; |
1025 | break; | 1025 | break; |
1026 | } | 1026 | } |
1027 | default: | 1027 | default: |
1028 | ; | 1028 | ; |
1029 | } | 1029 | } |
1030 | out: | 1030 | out: |
1031 | return r; | 1031 | return r; |
1032 | } | 1032 | } |
1033 | 1033 | ||
1034 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, | 1034 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, |
1035 | struct kvm_sregs *sregs) | 1035 | struct kvm_sregs *sregs) |
1036 | { | 1036 | { |
1037 | return -EINVAL; | 1037 | return -EINVAL; |
1038 | } | 1038 | } |
1039 | 1039 | ||
1040 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, | 1040 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, |
1041 | struct kvm_sregs *sregs) | 1041 | struct kvm_sregs *sregs) |
1042 | { | 1042 | { |
1043 | return -EINVAL; | 1043 | return -EINVAL; |
1044 | 1044 | ||
1045 | } | 1045 | } |
1046 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, | 1046 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, |
1047 | struct kvm_translation *tr) | 1047 | struct kvm_translation *tr) |
1048 | { | 1048 | { |
1049 | 1049 | ||
1050 | return -EINVAL; | 1050 | return -EINVAL; |
1051 | } | 1051 | } |
1052 | 1052 | ||
1053 | static int kvm_alloc_vmm_area(void) | 1053 | static int kvm_alloc_vmm_area(void) |
1054 | { | 1054 | { |
1055 | if (!kvm_vmm_base && (kvm_vm_buffer_size < KVM_VM_BUFFER_SIZE)) { | 1055 | if (!kvm_vmm_base && (kvm_vm_buffer_size < KVM_VM_BUFFER_SIZE)) { |
1056 | kvm_vmm_base = __get_free_pages(GFP_KERNEL, | 1056 | kvm_vmm_base = __get_free_pages(GFP_KERNEL, |
1057 | get_order(KVM_VMM_SIZE)); | 1057 | get_order(KVM_VMM_SIZE)); |
1058 | if (!kvm_vmm_base) | 1058 | if (!kvm_vmm_base) |
1059 | return -ENOMEM; | 1059 | return -ENOMEM; |
1060 | 1060 | ||
1061 | memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE); | 1061 | memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE); |
1062 | kvm_vm_buffer = kvm_vmm_base + VMM_SIZE; | 1062 | kvm_vm_buffer = kvm_vmm_base + VMM_SIZE; |
1063 | 1063 | ||
1064 | printk(KERN_DEBUG"kvm:VMM's Base Addr:0x%lx, vm_buffer:0x%lx\n", | 1064 | printk(KERN_DEBUG"kvm:VMM's Base Addr:0x%lx, vm_buffer:0x%lx\n", |
1065 | kvm_vmm_base, kvm_vm_buffer); | 1065 | kvm_vmm_base, kvm_vm_buffer); |
1066 | } | 1066 | } |
1067 | 1067 | ||
1068 | return 0; | 1068 | return 0; |
1069 | } | 1069 | } |
1070 | 1070 | ||
1071 | static void kvm_free_vmm_area(void) | 1071 | static void kvm_free_vmm_area(void) |
1072 | { | 1072 | { |
1073 | if (kvm_vmm_base) { | 1073 | if (kvm_vmm_base) { |
1074 | /*Zero this area before freeing to avoid leaking bits*/ | 1074 | /*Zero this area before freeing to avoid leaking bits*/ |
1075 | memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE); | 1075 | memset((void *)kvm_vmm_base, 0, KVM_VMM_SIZE); |
1076 | free_pages(kvm_vmm_base, get_order(KVM_VMM_SIZE)); | 1076 | free_pages(kvm_vmm_base, get_order(KVM_VMM_SIZE)); |
1077 | kvm_vmm_base = 0; | 1077 | kvm_vmm_base = 0; |
1078 | kvm_vm_buffer = 0; | 1078 | kvm_vm_buffer = 0; |
1079 | kvm_vsa_base = 0; | 1079 | kvm_vsa_base = 0; |
1080 | } | 1080 | } |
1081 | } | 1081 | } |
1082 | 1082 | ||
1083 | static int vti_init_vpd(struct kvm_vcpu *vcpu) | 1083 | static int vti_init_vpd(struct kvm_vcpu *vcpu) |
1084 | { | 1084 | { |
1085 | int i; | 1085 | int i; |
1086 | union cpuid3_t cpuid3; | 1086 | union cpuid3_t cpuid3; |
1087 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); | 1087 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); |
1088 | 1088 | ||
1089 | if (IS_ERR(vpd)) | 1089 | if (IS_ERR(vpd)) |
1090 | return PTR_ERR(vpd); | 1090 | return PTR_ERR(vpd); |
1091 | 1091 | ||
1092 | /* CPUID init */ | 1092 | /* CPUID init */ |
1093 | for (i = 0; i < 5; i++) | 1093 | for (i = 0; i < 5; i++) |
1094 | vpd->vcpuid[i] = ia64_get_cpuid(i); | 1094 | vpd->vcpuid[i] = ia64_get_cpuid(i); |
1095 | 1095 | ||
1096 | /* Limit the CPUID number to 5 */ | 1096 | /* Limit the CPUID number to 5 */ |
1097 | cpuid3.value = vpd->vcpuid[3]; | 1097 | cpuid3.value = vpd->vcpuid[3]; |
1098 | cpuid3.number = 4; /* 5 - 1 */ | 1098 | cpuid3.number = 4; /* 5 - 1 */ |
1099 | vpd->vcpuid[3] = cpuid3.value; | 1099 | vpd->vcpuid[3] = cpuid3.value; |
1100 | 1100 | ||
1101 | /*Set vac and vdc fields*/ | 1101 | /*Set vac and vdc fields*/ |
1102 | vpd->vac.a_from_int_cr = 1; | 1102 | vpd->vac.a_from_int_cr = 1; |
1103 | vpd->vac.a_to_int_cr = 1; | 1103 | vpd->vac.a_to_int_cr = 1; |
1104 | vpd->vac.a_from_psr = 1; | 1104 | vpd->vac.a_from_psr = 1; |
1105 | vpd->vac.a_from_cpuid = 1; | 1105 | vpd->vac.a_from_cpuid = 1; |
1106 | vpd->vac.a_cover = 1; | 1106 | vpd->vac.a_cover = 1; |
1107 | vpd->vac.a_bsw = 1; | 1107 | vpd->vac.a_bsw = 1; |
1108 | vpd->vac.a_int = 1; | 1108 | vpd->vac.a_int = 1; |
1109 | vpd->vdc.d_vmsw = 1; | 1109 | vpd->vdc.d_vmsw = 1; |
1110 | 1110 | ||
1111 | /*Set virtual buffer*/ | 1111 | /*Set virtual buffer*/ |
1112 | vpd->virt_env_vaddr = KVM_VM_BUFFER_BASE; | 1112 | vpd->virt_env_vaddr = KVM_VM_BUFFER_BASE; |
1113 | 1113 | ||
1114 | return 0; | 1114 | return 0; |
1115 | } | 1115 | } |
1116 | 1116 | ||
1117 | static int vti_create_vp(struct kvm_vcpu *vcpu) | 1117 | static int vti_create_vp(struct kvm_vcpu *vcpu) |
1118 | { | 1118 | { |
1119 | long ret; | 1119 | long ret; |
1120 | struct vpd *vpd = vcpu->arch.vpd; | 1120 | struct vpd *vpd = vcpu->arch.vpd; |
1121 | unsigned long vmm_ivt; | 1121 | unsigned long vmm_ivt; |
1122 | 1122 | ||
1123 | vmm_ivt = kvm_vmm_info->vmm_ivt; | 1123 | vmm_ivt = kvm_vmm_info->vmm_ivt; |
1124 | 1124 | ||
1125 | printk(KERN_DEBUG "kvm: vcpu:%p,ivt: 0x%lx\n", vcpu, vmm_ivt); | 1125 | printk(KERN_DEBUG "kvm: vcpu:%p,ivt: 0x%lx\n", vcpu, vmm_ivt); |
1126 | 1126 | ||
1127 | ret = ia64_pal_vp_create((u64 *)vpd, (u64 *)vmm_ivt, 0); | 1127 | ret = ia64_pal_vp_create((u64 *)vpd, (u64 *)vmm_ivt, 0); |
1128 | 1128 | ||
1129 | if (ret) { | 1129 | if (ret) { |
1130 | printk(KERN_ERR"kvm: ia64_pal_vp_create failed!\n"); | 1130 | printk(KERN_ERR"kvm: ia64_pal_vp_create failed!\n"); |
1131 | return -EINVAL; | 1131 | return -EINVAL; |
1132 | } | 1132 | } |
1133 | return 0; | 1133 | return 0; |
1134 | } | 1134 | } |
1135 | 1135 | ||
1136 | static void init_ptce_info(struct kvm_vcpu *vcpu) | 1136 | static void init_ptce_info(struct kvm_vcpu *vcpu) |
1137 | { | 1137 | { |
1138 | ia64_ptce_info_t ptce = {0}; | 1138 | ia64_ptce_info_t ptce = {0}; |
1139 | 1139 | ||
1140 | ia64_get_ptce(&ptce); | 1140 | ia64_get_ptce(&ptce); |
1141 | vcpu->arch.ptce_base = ptce.base; | 1141 | vcpu->arch.ptce_base = ptce.base; |
1142 | vcpu->arch.ptce_count[0] = ptce.count[0]; | 1142 | vcpu->arch.ptce_count[0] = ptce.count[0]; |
1143 | vcpu->arch.ptce_count[1] = ptce.count[1]; | 1143 | vcpu->arch.ptce_count[1] = ptce.count[1]; |
1144 | vcpu->arch.ptce_stride[0] = ptce.stride[0]; | 1144 | vcpu->arch.ptce_stride[0] = ptce.stride[0]; |
1145 | vcpu->arch.ptce_stride[1] = ptce.stride[1]; | 1145 | vcpu->arch.ptce_stride[1] = ptce.stride[1]; |
1146 | } | 1146 | } |
1147 | 1147 | ||
1148 | static void kvm_migrate_hlt_timer(struct kvm_vcpu *vcpu) | 1148 | static void kvm_migrate_hlt_timer(struct kvm_vcpu *vcpu) |
1149 | { | 1149 | { |
1150 | struct hrtimer *p_ht = &vcpu->arch.hlt_timer; | 1150 | struct hrtimer *p_ht = &vcpu->arch.hlt_timer; |
1151 | 1151 | ||
1152 | if (hrtimer_cancel(p_ht)) | 1152 | if (hrtimer_cancel(p_ht)) |
1153 | hrtimer_start_expires(p_ht, HRTIMER_MODE_ABS); | 1153 | hrtimer_start_expires(p_ht, HRTIMER_MODE_ABS); |
1154 | } | 1154 | } |
1155 | 1155 | ||
1156 | static enum hrtimer_restart hlt_timer_fn(struct hrtimer *data) | 1156 | static enum hrtimer_restart hlt_timer_fn(struct hrtimer *data) |
1157 | { | 1157 | { |
1158 | struct kvm_vcpu *vcpu; | 1158 | struct kvm_vcpu *vcpu; |
1159 | wait_queue_head_t *q; | 1159 | wait_queue_head_t *q; |
1160 | 1160 | ||
1161 | vcpu = container_of(data, struct kvm_vcpu, arch.hlt_timer); | 1161 | vcpu = container_of(data, struct kvm_vcpu, arch.hlt_timer); |
1162 | q = &vcpu->wq; | 1162 | q = &vcpu->wq; |
1163 | 1163 | ||
1164 | if (vcpu->arch.mp_state != KVM_MP_STATE_HALTED) | 1164 | if (vcpu->arch.mp_state != KVM_MP_STATE_HALTED) |
1165 | goto out; | 1165 | goto out; |
1166 | 1166 | ||
1167 | if (waitqueue_active(q)) | 1167 | if (waitqueue_active(q)) |
1168 | wake_up_interruptible(q); | 1168 | wake_up_interruptible(q); |
1169 | 1169 | ||
1170 | out: | 1170 | out: |
1171 | vcpu->arch.timer_fired = 1; | 1171 | vcpu->arch.timer_fired = 1; |
1172 | vcpu->arch.timer_check = 1; | 1172 | vcpu->arch.timer_check = 1; |
1173 | return HRTIMER_NORESTART; | 1173 | return HRTIMER_NORESTART; |
1174 | } | 1174 | } |
1175 | 1175 | ||
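hlt_timer_fn() only wakes the vcpu if it is still halted when the timer expires, but it unconditionally records that the timer fired so the next guest entry re-evaluates it. The same logic with a flag standing in for the waitqueue wakeup; demo_* names are illustrative, not kernel APIs.

#include <stdio.h>

enum demo_mp_state { DEMO_RUNNABLE, DEMO_HALTED };

struct demo_vcpu {
	enum demo_mp_state mp_state;
	int timer_fired, timer_check, woken;
};

static void demo_hlt_timer_fn(struct demo_vcpu *vcpu)
{
	if (vcpu->mp_state == DEMO_HALTED)
		vcpu->woken = 1;	/* stands in for wake_up_interruptible() */

	/* set whether or not anyone was woken, as in the original */
	vcpu->timer_fired = 1;
	vcpu->timer_check = 1;
}

int main(void)
{
	struct demo_vcpu v = { .mp_state = DEMO_HALTED };

	demo_hlt_timer_fn(&v);
	printf("woken=%d fired=%d check=%d\n",
	       v.woken, v.timer_fired, v.timer_check);
	return 0;
}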
1176 | #define PALE_RESET_ENTRY 0x80000000ffffffb0UL | 1176 | #define PALE_RESET_ENTRY 0x80000000ffffffb0UL |
1177 | 1177 | ||
1178 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) | 1178 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) |
1179 | { | 1179 | { |
1180 | struct kvm_vcpu *v; | 1180 | struct kvm_vcpu *v; |
1181 | int r; | 1181 | int r; |
1182 | int i; | 1182 | int i; |
1183 | long itc_offset; | 1183 | long itc_offset; |
1184 | struct kvm *kvm = vcpu->kvm; | 1184 | struct kvm *kvm = vcpu->kvm; |
1185 | struct kvm_pt_regs *regs = vcpu_regs(vcpu); | 1185 | struct kvm_pt_regs *regs = vcpu_regs(vcpu); |
1186 | 1186 | ||
1187 | union context *p_ctx = &vcpu->arch.guest; | 1187 | union context *p_ctx = &vcpu->arch.guest; |
1188 | struct kvm_vcpu *vmm_vcpu = to_guest(vcpu->kvm, vcpu); | 1188 | struct kvm_vcpu *vmm_vcpu = to_guest(vcpu->kvm, vcpu); |
1189 | 1189 | ||
1190 | /*Init vcpu context for first run.*/ | 1190 | /*Init vcpu context for first run.*/ |
1191 | if (IS_ERR(vmm_vcpu)) | 1191 | if (IS_ERR(vmm_vcpu)) |
1192 | return PTR_ERR(vmm_vcpu); | 1192 | return PTR_ERR(vmm_vcpu); |
1193 | 1193 | ||
1194 | if (kvm_vcpu_is_bsp(vcpu)) { | 1194 | if (kvm_vcpu_is_bsp(vcpu)) { |
1195 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; | 1195 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; |
1196 | 1196 | ||
1197 | /*Set entry address for first run.*/ | 1197 | /*Set entry address for first run.*/ |
1198 | regs->cr_iip = PALE_RESET_ENTRY; | 1198 | regs->cr_iip = PALE_RESET_ENTRY; |
1199 | 1199 | ||
1200 | /*Initialize itc offset for vcpus*/ | 1200 | /*Initialize itc offset for vcpus*/ |
1201 | itc_offset = 0UL - kvm_get_itc(vcpu); | 1201 | itc_offset = 0UL - kvm_get_itc(vcpu); |
1202 | for (i = 0; i < KVM_MAX_VCPUS; i++) { | 1202 | for (i = 0; i < KVM_MAX_VCPUS; i++) { |
1203 | v = (struct kvm_vcpu *)((char *)vcpu + | 1203 | v = (struct kvm_vcpu *)((char *)vcpu + |
1204 | sizeof(struct kvm_vcpu_data) * i); | 1204 | sizeof(struct kvm_vcpu_data) * i); |
1205 | v->arch.itc_offset = itc_offset; | 1205 | v->arch.itc_offset = itc_offset; |
1206 | v->arch.last_itc = 0; | 1206 | v->arch.last_itc = 0; |
1207 | } | 1207 | } |
1208 | } else | 1208 | } else |
1209 | vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; | 1209 | vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; |
1210 | 1210 | ||
1211 | r = -ENOMEM; | 1211 | r = -ENOMEM; |
1212 | vcpu->arch.apic = kzalloc(sizeof(struct kvm_lapic), GFP_KERNEL); | 1212 | vcpu->arch.apic = kzalloc(sizeof(struct kvm_lapic), GFP_KERNEL); |
1213 | if (!vcpu->arch.apic) | 1213 | if (!vcpu->arch.apic) |
1214 | goto out; | 1214 | goto out; |
1215 | vcpu->arch.apic->vcpu = vcpu; | 1215 | vcpu->arch.apic->vcpu = vcpu; |
1216 | 1216 | ||
1217 | p_ctx->gr[1] = 0; | 1217 | p_ctx->gr[1] = 0; |
1218 | p_ctx->gr[12] = (unsigned long)((char *)vmm_vcpu + KVM_STK_OFFSET); | 1218 | p_ctx->gr[12] = (unsigned long)((char *)vmm_vcpu + KVM_STK_OFFSET); |
1219 | p_ctx->gr[13] = (unsigned long)vmm_vcpu; | 1219 | p_ctx->gr[13] = (unsigned long)vmm_vcpu; |
1220 | p_ctx->psr = 0x1008522000UL; | 1220 | p_ctx->psr = 0x1008522000UL; |
1221 | p_ctx->ar[40] = FPSR_DEFAULT; /*fpsr*/ | 1221 | p_ctx->ar[40] = FPSR_DEFAULT; /*fpsr*/ |
1222 | p_ctx->caller_unat = 0; | 1222 | p_ctx->caller_unat = 0; |
1223 | p_ctx->pr = 0x0; | 1223 | p_ctx->pr = 0x0; |
1224 | p_ctx->ar[36] = 0x0; /*unat*/ | 1224 | p_ctx->ar[36] = 0x0; /*unat*/ |
1225 | p_ctx->ar[19] = 0x0; /*rnat*/ | 1225 | p_ctx->ar[19] = 0x0; /*rnat*/ |
1226 | p_ctx->ar[18] = (unsigned long)vmm_vcpu + | 1226 | p_ctx->ar[18] = (unsigned long)vmm_vcpu + |
1227 | ((sizeof(struct kvm_vcpu)+15) & ~15); | 1227 | ((sizeof(struct kvm_vcpu)+15) & ~15); |
1228 | p_ctx->ar[64] = 0x0; /*pfs*/ | 1228 | p_ctx->ar[64] = 0x0; /*pfs*/ |
1229 | p_ctx->cr[0] = 0x7e04UL; | 1229 | p_ctx->cr[0] = 0x7e04UL; |
1230 | p_ctx->cr[2] = (unsigned long)kvm_vmm_info->vmm_ivt; | 1230 | p_ctx->cr[2] = (unsigned long)kvm_vmm_info->vmm_ivt; |
1231 | p_ctx->cr[8] = 0x3c; | 1231 | p_ctx->cr[8] = 0x3c; |
1232 | 1232 | ||
1233 | /*Initialize region registers*/ | 1233 | /*Initialize region registers*/ |
1234 | p_ctx->rr[0] = 0x30; | 1234 | p_ctx->rr[0] = 0x30; |
1235 | p_ctx->rr[1] = 0x30; | 1235 | p_ctx->rr[1] = 0x30; |
1236 | p_ctx->rr[2] = 0x30; | 1236 | p_ctx->rr[2] = 0x30; |
1237 | p_ctx->rr[3] = 0x30; | 1237 | p_ctx->rr[3] = 0x30; |
1238 | p_ctx->rr[4] = 0x30; | 1238 | p_ctx->rr[4] = 0x30; |
1239 | p_ctx->rr[5] = 0x30; | 1239 | p_ctx->rr[5] = 0x30; |
1240 | p_ctx->rr[7] = 0x30; | 1240 | p_ctx->rr[7] = 0x30; |
1241 | 1241 | ||
1242 | /*Initialize branch register 0*/ | 1242 | /*Initialize branch register 0*/ |
1243 | p_ctx->br[0] = *(unsigned long *)kvm_vmm_info->vmm_entry; | 1243 | p_ctx->br[0] = *(unsigned long *)kvm_vmm_info->vmm_entry; |
1244 | 1244 | ||
1245 | vcpu->arch.vmm_rr = kvm->arch.vmm_init_rr; | 1245 | vcpu->arch.vmm_rr = kvm->arch.vmm_init_rr; |
1246 | vcpu->arch.metaphysical_rr0 = kvm->arch.metaphysical_rr0; | 1246 | vcpu->arch.metaphysical_rr0 = kvm->arch.metaphysical_rr0; |
1247 | vcpu->arch.metaphysical_rr4 = kvm->arch.metaphysical_rr4; | 1247 | vcpu->arch.metaphysical_rr4 = kvm->arch.metaphysical_rr4; |
1248 | 1248 | ||
1249 | hrtimer_init(&vcpu->arch.hlt_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); | 1249 | hrtimer_init(&vcpu->arch.hlt_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); |
1250 | vcpu->arch.hlt_timer.function = hlt_timer_fn; | 1250 | vcpu->arch.hlt_timer.function = hlt_timer_fn; |
1251 | 1251 | ||
1252 | vcpu->arch.last_run_cpu = -1; | 1252 | vcpu->arch.last_run_cpu = -1; |
1253 | vcpu->arch.vpd = (struct vpd *)VPD_BASE(vcpu->vcpu_id); | 1253 | vcpu->arch.vpd = (struct vpd *)VPD_BASE(vcpu->vcpu_id); |
1254 | vcpu->arch.vsa_base = kvm_vsa_base; | 1254 | vcpu->arch.vsa_base = kvm_vsa_base; |
1255 | vcpu->arch.__gp = kvm_vmm_gp; | 1255 | vcpu->arch.__gp = kvm_vmm_gp; |
1256 | vcpu->arch.dirty_log_lock_pa = __pa(&kvm->arch.dirty_log_lock); | 1256 | vcpu->arch.dirty_log_lock_pa = __pa(&kvm->arch.dirty_log_lock); |
1257 | vcpu->arch.vhpt.hash = (struct thash_data *)VHPT_BASE(vcpu->vcpu_id); | 1257 | vcpu->arch.vhpt.hash = (struct thash_data *)VHPT_BASE(vcpu->vcpu_id); |
1258 | vcpu->arch.vtlb.hash = (struct thash_data *)VTLB_BASE(vcpu->vcpu_id); | 1258 | vcpu->arch.vtlb.hash = (struct thash_data *)VTLB_BASE(vcpu->vcpu_id); |
1259 | init_ptce_info(vcpu); | 1259 | init_ptce_info(vcpu); |
1260 | 1260 | ||
1261 | r = 0; | 1261 | r = 0; |
1262 | out: | 1262 | out: |
1263 | return r; | 1263 | return r; |
1264 | } | 1264 | } |
1265 | 1265 | ||
1266 | static int vti_vcpu_setup(struct kvm_vcpu *vcpu, int id) | 1266 | static int vti_vcpu_setup(struct kvm_vcpu *vcpu, int id) |
1267 | { | 1267 | { |
1268 | unsigned long psr; | 1268 | unsigned long psr; |
1269 | int r; | 1269 | int r; |
1270 | 1270 | ||
1271 | local_irq_save(psr); | 1271 | local_irq_save(psr); |
1272 | r = kvm_insert_vmm_mapping(vcpu); | 1272 | r = kvm_insert_vmm_mapping(vcpu); |
1273 | local_irq_restore(psr); | 1273 | local_irq_restore(psr); |
1274 | if (r) | 1274 | if (r) |
1275 | goto fail; | 1275 | goto fail; |
1276 | r = kvm_vcpu_init(vcpu, vcpu->kvm, id); | 1276 | r = kvm_vcpu_init(vcpu, vcpu->kvm, id); |
1277 | if (r) | 1277 | if (r) |
1278 | goto fail; | 1278 | goto fail; |
1279 | 1279 | ||
1280 | r = vti_init_vpd(vcpu); | 1280 | r = vti_init_vpd(vcpu); |
1281 | if (r) { | 1281 | if (r) { |
1282 | printk(KERN_DEBUG"kvm: vpd init error!!\n"); | 1282 | printk(KERN_DEBUG"kvm: vpd init error!!\n"); |
1283 | goto uninit; | 1283 | goto uninit; |
1284 | } | 1284 | } |
1285 | 1285 | ||
1286 | r = vti_create_vp(vcpu); | 1286 | r = vti_create_vp(vcpu); |
1287 | if (r) | 1287 | if (r) |
1288 | goto uninit; | 1288 | goto uninit; |
1289 | 1289 | ||
1290 | kvm_purge_vmm_mapping(vcpu); | 1290 | kvm_purge_vmm_mapping(vcpu); |
1291 | 1291 | ||
1292 | return 0; | 1292 | return 0; |
1293 | uninit: | 1293 | uninit: |
1294 | kvm_vcpu_uninit(vcpu); | 1294 | kvm_vcpu_uninit(vcpu); |
1295 | fail: | 1295 | fail: |
1296 | return r; | 1296 | return r; |
1297 | } | 1297 | } |
1298 | 1298 | ||
1299 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, | 1299 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, |
1300 | unsigned int id) | 1300 | unsigned int id) |
1301 | { | 1301 | { |
1302 | struct kvm_vcpu *vcpu; | 1302 | struct kvm_vcpu *vcpu; |
1303 | unsigned long vm_base = kvm->arch.vm_base; | 1303 | unsigned long vm_base = kvm->arch.vm_base; |
1304 | int r; | 1304 | int r; |
1305 | int cpu; | 1305 | int cpu; |
1306 | 1306 | ||
1307 | BUG_ON(sizeof(struct kvm_vcpu) > VCPU_STRUCT_SIZE/2); | 1307 | BUG_ON(sizeof(struct kvm_vcpu) > VCPU_STRUCT_SIZE/2); |
1308 | 1308 | ||
1309 | r = -EINVAL; | 1309 | r = -EINVAL; |
1310 | if (id >= KVM_MAX_VCPUS) { | 1310 | if (id >= KVM_MAX_VCPUS) { |
1311 | printk(KERN_ERR "kvm: Can't configure vcpus > %ld\n", | 1311 | printk(KERN_ERR "kvm: Can't configure vcpus > %ld\n", |
1312 | KVM_MAX_VCPUS); | 1312 | KVM_MAX_VCPUS); |
1313 | goto fail; | 1313 | goto fail; |
1314 | } | 1314 | } |
1315 | 1315 | ||
1316 | r = -ENOMEM; | 1316 | r = -ENOMEM; |
1317 | if (!vm_base) { | 1317 | if (!vm_base) { |
1318 | printk(KERN_ERR"kvm: Create vcpu[%d] error!\n", id); | 1318 | printk(KERN_ERR"kvm: Create vcpu[%d] error!\n", id); |
1319 | goto fail; | 1319 | goto fail; |
1320 | } | 1320 | } |
1321 | vcpu = (struct kvm_vcpu *)(vm_base + offsetof(struct kvm_vm_data, | 1321 | vcpu = (struct kvm_vcpu *)(vm_base + offsetof(struct kvm_vm_data, |
1322 | vcpu_data[id].vcpu_struct)); | 1322 | vcpu_data[id].vcpu_struct)); |
1323 | vcpu->kvm = kvm; | 1323 | vcpu->kvm = kvm; |
1324 | 1324 | ||
1325 | cpu = get_cpu(); | 1325 | cpu = get_cpu(); |
1326 | r = vti_vcpu_setup(vcpu, id); | 1326 | r = vti_vcpu_setup(vcpu, id); |
1327 | put_cpu(); | 1327 | put_cpu(); |
1328 | 1328 | ||
1329 | if (r) { | 1329 | if (r) { |
1330 | printk(KERN_DEBUG"kvm: vcpu_setup error!!\n"); | 1330 | printk(KERN_DEBUG"kvm: vcpu_setup error!!\n"); |
1331 | goto fail; | 1331 | goto fail; |
1332 | } | 1332 | } |
1333 | 1333 | ||
1334 | return vcpu; | 1334 | return vcpu; |
1335 | fail: | 1335 | fail: |
1336 | return ERR_PTR(r); | 1336 | return ERR_PTR(r); |
1337 | } | 1337 | } |
1338 | 1338 | ||
1339 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) | 1339 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) |
1340 | { | 1340 | { |
1341 | return 0; | 1341 | return 0; |
1342 | } | 1342 | } |
1343 | 1343 | ||
1344 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) | 1344 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) |
1345 | { | 1345 | { |
1346 | return -EINVAL; | 1346 | return -EINVAL; |
1347 | } | 1347 | } |
1348 | 1348 | ||
1349 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) | 1349 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) |
1350 | { | 1350 | { |
1351 | return -EINVAL; | 1351 | return -EINVAL; |
1352 | } | 1352 | } |
1353 | 1353 | ||
1354 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, | 1354 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, |
1355 | struct kvm_guest_debug *dbg) | 1355 | struct kvm_guest_debug *dbg) |
1356 | { | 1356 | { |
1357 | return -EINVAL; | 1357 | return -EINVAL; |
1358 | } | 1358 | } |
1359 | 1359 | ||
1360 | static void free_kvm(struct kvm *kvm) | 1360 | static void free_kvm(struct kvm *kvm) |
1361 | { | 1361 | { |
1362 | unsigned long vm_base = kvm->arch.vm_base; | 1362 | unsigned long vm_base = kvm->arch.vm_base; |
1363 | 1363 | ||
1364 | if (vm_base) { | 1364 | if (vm_base) { |
1365 | memset((void *)vm_base, 0, KVM_VM_DATA_SIZE); | 1365 | memset((void *)vm_base, 0, KVM_VM_DATA_SIZE); |
1366 | free_pages(vm_base, get_order(KVM_VM_DATA_SIZE)); | 1366 | free_pages(vm_base, get_order(KVM_VM_DATA_SIZE)); |
1367 | } | 1367 | } |
1368 | 1368 | ||
1369 | } | 1369 | } |
1370 | 1370 | ||
1371 | static void kvm_release_vm_pages(struct kvm *kvm) | 1371 | static void kvm_release_vm_pages(struct kvm *kvm) |
1372 | { | 1372 | { |
1373 | struct kvm_memslots *slots; | 1373 | struct kvm_memslots *slots; |
1374 | struct kvm_memory_slot *memslot; | 1374 | struct kvm_memory_slot *memslot; |
1375 | int i, j; | 1375 | int i, j; |
1376 | unsigned long base_gfn; | 1376 | unsigned long base_gfn; |
1377 | 1377 | ||
1378 | slots = kvm_memslots(kvm); | 1378 | slots = kvm_memslots(kvm); |
1379 | for (i = 0; i < slots->nmemslots; i++) { | 1379 | for (i = 0; i < slots->nmemslots; i++) { |
1380 | memslot = &slots->memslots[i]; | 1380 | memslot = &slots->memslots[i]; |
1381 | base_gfn = memslot->base_gfn; | 1381 | base_gfn = memslot->base_gfn; |
1382 | 1382 | ||
1383 | for (j = 0; j < memslot->npages; j++) { | 1383 | for (j = 0; j < memslot->npages; j++) { |
1384 | if (memslot->rmap[j]) | 1384 | if (memslot->rmap[j]) |
1385 | put_page((struct page *)memslot->rmap[j]); | 1385 | put_page((struct page *)memslot->rmap[j]); |
1386 | } | 1386 | } |
1387 | } | 1387 | } |
1388 | } | 1388 | } |
1389 | 1389 | ||
1390 | void kvm_arch_sync_events(struct kvm *kvm) | 1390 | void kvm_arch_sync_events(struct kvm *kvm) |
1391 | { | 1391 | { |
1392 | } | 1392 | } |
1393 | 1393 | ||
1394 | void kvm_arch_destroy_vm(struct kvm *kvm) | 1394 | void kvm_arch_destroy_vm(struct kvm *kvm) |
1395 | { | 1395 | { |
1396 | kvm_iommu_unmap_guest(kvm); | 1396 | kvm_iommu_unmap_guest(kvm); |
1397 | #ifdef KVM_CAP_DEVICE_ASSIGNMENT | 1397 | #ifdef KVM_CAP_DEVICE_ASSIGNMENT |
1398 | kvm_free_all_assigned_devices(kvm); | 1398 | kvm_free_all_assigned_devices(kvm); |
1399 | #endif | 1399 | #endif |
1400 | kfree(kvm->arch.vioapic); | 1400 | kfree(kvm->arch.vioapic); |
1401 | kvm_release_vm_pages(kvm); | 1401 | kvm_release_vm_pages(kvm); |
1402 | kvm_free_physmem(kvm); | 1402 | kvm_free_physmem(kvm); |
1403 | cleanup_srcu_struct(&kvm->srcu); | 1403 | cleanup_srcu_struct(&kvm->srcu); |
1404 | free_kvm(kvm); | 1404 | free_kvm(kvm); |
1405 | } | 1405 | } |
1406 | 1406 | ||
1407 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) | 1407 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) |
1408 | { | 1408 | { |
1409 | } | 1409 | } |
1410 | 1410 | ||
1411 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) | 1411 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) |
1412 | { | 1412 | { |
1413 | if (cpu != vcpu->cpu) { | 1413 | if (cpu != vcpu->cpu) { |
1414 | vcpu->cpu = cpu; | 1414 | vcpu->cpu = cpu; |
1415 | if (vcpu->arch.ht_active) | 1415 | if (vcpu->arch.ht_active) |
1416 | kvm_migrate_hlt_timer(vcpu); | 1416 | kvm_migrate_hlt_timer(vcpu); |
1417 | } | 1417 | } |
1418 | } | 1418 | } |
1419 | 1419 | ||
1420 | #define SAVE_REGS(_x) regs->_x = vcpu->arch._x | 1420 | #define SAVE_REGS(_x) regs->_x = vcpu->arch._x |
1421 | 1421 | ||
1422 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) | 1422 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) |
1423 | { | 1423 | { |
1424 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); | 1424 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); |
1425 | int i; | 1425 | int i; |
1426 | 1426 | ||
1427 | vcpu_load(vcpu); | 1427 | vcpu_load(vcpu); |
1428 | 1428 | ||
1429 | for (i = 0; i < 16; i++) { | 1429 | for (i = 0; i < 16; i++) { |
1430 | regs->vpd.vgr[i] = vpd->vgr[i]; | 1430 | regs->vpd.vgr[i] = vpd->vgr[i]; |
1431 | regs->vpd.vbgr[i] = vpd->vbgr[i]; | 1431 | regs->vpd.vbgr[i] = vpd->vbgr[i]; |
1432 | } | 1432 | } |
1433 | for (i = 0; i < 128; i++) | 1433 | for (i = 0; i < 128; i++) |
1434 | regs->vpd.vcr[i] = vpd->vcr[i]; | 1434 | regs->vpd.vcr[i] = vpd->vcr[i]; |
1435 | regs->vpd.vhpi = vpd->vhpi; | 1435 | regs->vpd.vhpi = vpd->vhpi; |
1436 | regs->vpd.vnat = vpd->vnat; | 1436 | regs->vpd.vnat = vpd->vnat; |
1437 | regs->vpd.vbnat = vpd->vbnat; | 1437 | regs->vpd.vbnat = vpd->vbnat; |
1438 | regs->vpd.vpsr = vpd->vpsr; | 1438 | regs->vpd.vpsr = vpd->vpsr; |
1439 | regs->vpd.vpr = vpd->vpr; | 1439 | regs->vpd.vpr = vpd->vpr; |
1440 | 1440 | ||
1441 | memcpy(®s->saved_guest, &vcpu->arch.guest, sizeof(union context)); | 1441 | memcpy(®s->saved_guest, &vcpu->arch.guest, sizeof(union context)); |
1442 | 1442 | ||
1443 | SAVE_REGS(mp_state); | 1443 | SAVE_REGS(mp_state); |
1444 | SAVE_REGS(vmm_rr); | 1444 | SAVE_REGS(vmm_rr); |
1445 | memcpy(regs->itrs, vcpu->arch.itrs, sizeof(struct thash_data) * NITRS); | 1445 | memcpy(regs->itrs, vcpu->arch.itrs, sizeof(struct thash_data) * NITRS); |
1446 | memcpy(regs->dtrs, vcpu->arch.dtrs, sizeof(struct thash_data) * NDTRS); | 1446 | memcpy(regs->dtrs, vcpu->arch.dtrs, sizeof(struct thash_data) * NDTRS); |
1447 | SAVE_REGS(itr_regions); | 1447 | SAVE_REGS(itr_regions); |
1448 | SAVE_REGS(dtr_regions); | 1448 | SAVE_REGS(dtr_regions); |
1449 | SAVE_REGS(tc_regions); | 1449 | SAVE_REGS(tc_regions); |
1450 | SAVE_REGS(irq_check); | 1450 | SAVE_REGS(irq_check); |
1451 | SAVE_REGS(itc_check); | 1451 | SAVE_REGS(itc_check); |
1452 | SAVE_REGS(timer_check); | 1452 | SAVE_REGS(timer_check); |
1453 | SAVE_REGS(timer_pending); | 1453 | SAVE_REGS(timer_pending); |
1454 | SAVE_REGS(last_itc); | 1454 | SAVE_REGS(last_itc); |
1455 | for (i = 0; i < 8; i++) { | 1455 | for (i = 0; i < 8; i++) { |
1456 | regs->vrr[i] = vcpu->arch.vrr[i]; | 1456 | regs->vrr[i] = vcpu->arch.vrr[i]; |
1457 | regs->ibr[i] = vcpu->arch.ibr[i]; | 1457 | regs->ibr[i] = vcpu->arch.ibr[i]; |
1458 | regs->dbr[i] = vcpu->arch.dbr[i]; | 1458 | regs->dbr[i] = vcpu->arch.dbr[i]; |
1459 | } | 1459 | } |
1460 | for (i = 0; i < 4; i++) | 1460 | for (i = 0; i < 4; i++) |
1461 | regs->insvc[i] = vcpu->arch.insvc[i]; | 1461 | regs->insvc[i] = vcpu->arch.insvc[i]; |
1462 | regs->saved_itc = vcpu->arch.itc_offset + kvm_get_itc(vcpu); | 1462 | regs->saved_itc = vcpu->arch.itc_offset + kvm_get_itc(vcpu); |
1463 | SAVE_REGS(xtp); | 1463 | SAVE_REGS(xtp); |
1464 | SAVE_REGS(metaphysical_rr0); | 1464 | SAVE_REGS(metaphysical_rr0); |
1465 | SAVE_REGS(metaphysical_rr4); | 1465 | SAVE_REGS(metaphysical_rr4); |
1466 | SAVE_REGS(metaphysical_saved_rr0); | 1466 | SAVE_REGS(metaphysical_saved_rr0); |
1467 | SAVE_REGS(metaphysical_saved_rr4); | 1467 | SAVE_REGS(metaphysical_saved_rr4); |
1468 | SAVE_REGS(fp_psr); | 1468 | SAVE_REGS(fp_psr); |
1469 | SAVE_REGS(saved_gp); | 1469 | SAVE_REGS(saved_gp); |
1470 | 1470 | ||
1471 | vcpu_put(vcpu); | 1471 | vcpu_put(vcpu); |
1472 | return 0; | 1472 | return 0; |
1473 | } | 1473 | } |
1474 | 1474 | ||
1475 | int kvm_arch_vcpu_ioctl_get_stack(struct kvm_vcpu *vcpu, | 1475 | int kvm_arch_vcpu_ioctl_get_stack(struct kvm_vcpu *vcpu, |
1476 | struct kvm_ia64_vcpu_stack *stack) | 1476 | struct kvm_ia64_vcpu_stack *stack) |
1477 | { | 1477 | { |
1478 | memcpy(stack, vcpu, sizeof(struct kvm_ia64_vcpu_stack)); | 1478 | memcpy(stack, vcpu, sizeof(struct kvm_ia64_vcpu_stack)); |
1479 | return 0; | 1479 | return 0; |
1480 | } | 1480 | } |
1481 | 1481 | ||
1482 | int kvm_arch_vcpu_ioctl_set_stack(struct kvm_vcpu *vcpu, | 1482 | int kvm_arch_vcpu_ioctl_set_stack(struct kvm_vcpu *vcpu, |
1483 | struct kvm_ia64_vcpu_stack *stack) | 1483 | struct kvm_ia64_vcpu_stack *stack) |
1484 | { | 1484 | { |
1485 | memcpy(vcpu + 1, &stack->stack[0] + sizeof(struct kvm_vcpu), | 1485 | memcpy(vcpu + 1, &stack->stack[0] + sizeof(struct kvm_vcpu), |
1486 | sizeof(struct kvm_ia64_vcpu_stack) - sizeof(struct kvm_vcpu)); | 1486 | sizeof(struct kvm_ia64_vcpu_stack) - sizeof(struct kvm_vcpu)); |
1487 | 1487 | ||
1488 | vcpu->arch.exit_data = ((struct kvm_vcpu *)stack)->arch.exit_data; | 1488 | vcpu->arch.exit_data = ((struct kvm_vcpu *)stack)->arch.exit_data; |
1489 | return 0; | 1489 | return 0; |
1490 | } | 1490 | } |
1491 | 1491 | ||
1492 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) | 1492 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) |
1493 | { | 1493 | { |
1494 | 1494 | ||
1495 | hrtimer_cancel(&vcpu->arch.hlt_timer); | 1495 | hrtimer_cancel(&vcpu->arch.hlt_timer); |
1496 | kfree(vcpu->arch.apic); | 1496 | kfree(vcpu->arch.apic); |
1497 | } | 1497 | } |
1498 | 1498 | ||
1499 | 1499 | ||
1500 | long kvm_arch_vcpu_ioctl(struct file *filp, | 1500 | long kvm_arch_vcpu_ioctl(struct file *filp, |
1501 | unsigned int ioctl, unsigned long arg) | 1501 | unsigned int ioctl, unsigned long arg) |
1502 | { | 1502 | { |
1503 | struct kvm_vcpu *vcpu = filp->private_data; | 1503 | struct kvm_vcpu *vcpu = filp->private_data; |
1504 | void __user *argp = (void __user *)arg; | 1504 | void __user *argp = (void __user *)arg; |
1505 | struct kvm_ia64_vcpu_stack *stack = NULL; | 1505 | struct kvm_ia64_vcpu_stack *stack = NULL; |
1506 | long r; | 1506 | long r; |
1507 | 1507 | ||
1508 | switch (ioctl) { | 1508 | switch (ioctl) { |
1509 | case KVM_IA64_VCPU_GET_STACK: { | 1509 | case KVM_IA64_VCPU_GET_STACK: { |
1510 | struct kvm_ia64_vcpu_stack __user *user_stack; | 1510 | struct kvm_ia64_vcpu_stack __user *user_stack; |
1511 | void __user *first_p = argp; | 1511 | void __user *first_p = argp; |
1512 | 1512 | ||
1513 | r = -EFAULT; | 1513 | r = -EFAULT; |
1514 | if (copy_from_user(&user_stack, first_p, sizeof(void *))) | 1514 | if (copy_from_user(&user_stack, first_p, sizeof(void *))) |
1515 | goto out; | 1515 | goto out; |
1516 | 1516 | ||
1517 | if (!access_ok(VERIFY_WRITE, user_stack, | 1517 | if (!access_ok(VERIFY_WRITE, user_stack, |
1518 | sizeof(struct kvm_ia64_vcpu_stack))) { | 1518 | sizeof(struct kvm_ia64_vcpu_stack))) { |
1519 | printk(KERN_INFO "KVM_IA64_VCPU_GET_STACK: " | 1519 | printk(KERN_INFO "KVM_IA64_VCPU_GET_STACK: " |
1520 | "Illegal user destination address for stack\n"); | 1520 | "Illegal user destination address for stack\n"); |
1521 | goto out; | 1521 | goto out; |
1522 | } | 1522 | } |
1523 | stack = kzalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL); | 1523 | stack = kzalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL); |
1524 | if (!stack) { | 1524 | if (!stack) { |
1525 | r = -ENOMEM; | 1525 | r = -ENOMEM; |
1526 | goto out; | 1526 | goto out; |
1527 | } | 1527 | } |
1528 | 1528 | ||
1529 | r = kvm_arch_vcpu_ioctl_get_stack(vcpu, stack); | 1529 | r = kvm_arch_vcpu_ioctl_get_stack(vcpu, stack); |
1530 | if (r) | 1530 | if (r) |
1531 | goto out; | 1531 | goto out; |
1532 | 1532 | ||
1533 | if (copy_to_user(user_stack, stack, | 1533 | if (copy_to_user(user_stack, stack, |
1534 | sizeof(struct kvm_ia64_vcpu_stack))) { | 1534 | sizeof(struct kvm_ia64_vcpu_stack))) { |
1535 | r = -EFAULT; | 1535 | r = -EFAULT; |
1536 | goto out; | 1536 | goto out; |
1537 | } | 1537 | } |
1538 | 1538 | ||
1539 | break; | 1539 | break; |
1540 | } | 1540 | } |
1541 | case KVM_IA64_VCPU_SET_STACK: { | 1541 | case KVM_IA64_VCPU_SET_STACK: { |
1542 | struct kvm_ia64_vcpu_stack __user *user_stack; | 1542 | struct kvm_ia64_vcpu_stack __user *user_stack; |
1543 | void __user *first_p = argp; | 1543 | void __user *first_p = argp; |
1544 | 1544 | ||
1545 | r = -EFAULT; | 1545 | r = -EFAULT; |
1546 | if (copy_from_user(&user_stack, first_p, sizeof(void *))) | 1546 | if (copy_from_user(&user_stack, first_p, sizeof(void *))) |
1547 | goto out; | 1547 | goto out; |
1548 | 1548 | ||
1549 | if (!access_ok(VERIFY_READ, user_stack, | 1549 | if (!access_ok(VERIFY_READ, user_stack, |
1550 | sizeof(struct kvm_ia64_vcpu_stack))) { | 1550 | sizeof(struct kvm_ia64_vcpu_stack))) { |
1551 | printk(KERN_INFO "KVM_IA64_VCPU_SET_STACK: " | 1551 | printk(KERN_INFO "KVM_IA64_VCPU_SET_STACK: " |
1552 | "Illegal user address for stack\n"); | 1552 | "Illegal user address for stack\n"); |
1553 | goto out; | 1553 | goto out; |
1554 | } | 1554 | } |
1555 | stack = kmalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL); | 1555 | stack = kmalloc(sizeof(struct kvm_ia64_vcpu_stack), GFP_KERNEL); |
1556 | if (!stack) { | 1556 | if (!stack) { |
1557 | r = -ENOMEM; | 1557 | r = -ENOMEM; |
1558 | goto out; | 1558 | goto out; |
1559 | } | 1559 | } |
1560 | if (copy_from_user(stack, user_stack, | 1560 | if (copy_from_user(stack, user_stack, |
1561 | sizeof(struct kvm_ia64_vcpu_stack))) | 1561 | sizeof(struct kvm_ia64_vcpu_stack))) |
1562 | goto out; | 1562 | goto out; |
1563 | 1563 | ||
1564 | r = kvm_arch_vcpu_ioctl_set_stack(vcpu, stack); | 1564 | r = kvm_arch_vcpu_ioctl_set_stack(vcpu, stack); |
1565 | break; | 1565 | break; |
1566 | } | 1566 | } |
1567 | 1567 | ||
1568 | default: | 1568 | default: |
1569 | r = -EINVAL; | 1569 | r = -EINVAL; |
1570 | } | 1570 | } |
1571 | 1571 | ||
1572 | out: | 1572 | out: |
1573 | kfree(stack); | 1573 | kfree(stack); |
1574 | return r; | 1574 | return r; |
1575 | } | 1575 | } |
1576 | 1576 | ||
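Note that both stack ioctls read a pointer-sized value out of the argument, so userspace must pass the address of its buffer pointer rather than the buffer itself. A hedged userspace sketch of the save side; the vcpu_fd parameter is an assumption for illustration, and KVM_IA64_VCPU_GET_STACK exists only on ia64 builds of this tree:

    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int save_vcpu_stack(int vcpu_fd, struct kvm_ia64_vcpu_stack **out)
    {
        struct kvm_ia64_vcpu_stack *buf = malloc(sizeof(*buf));

        if (!buf)
            return -1;
        /* The kernel copies a pointer from the argument, so pass &buf,
         * not buf itself. */
        if (ioctl(vcpu_fd, KVM_IA64_VCPU_GET_STACK, &buf) < 0) {
            free(buf);
            return -1;
        }
        *out = buf;
        return 0;
    }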
1577 | int kvm_arch_prepare_memory_region(struct kvm *kvm, | 1577 | int kvm_arch_prepare_memory_region(struct kvm *kvm, |
1578 | struct kvm_memory_slot *memslot, | 1578 | struct kvm_memory_slot *memslot, |
1579 | struct kvm_memory_slot old, | 1579 | struct kvm_memory_slot old, |
1580 | struct kvm_userspace_memory_region *mem, | 1580 | struct kvm_userspace_memory_region *mem, |
1581 | int user_alloc) | 1581 | int user_alloc) |
1582 | { | 1582 | { |
1583 | unsigned long i; | 1583 | unsigned long i; |
1584 | unsigned long pfn; | 1584 | unsigned long pfn; |
1585 | int npages = memslot->npages; | 1585 | int npages = memslot->npages; |
1586 | unsigned long base_gfn = memslot->base_gfn; | 1586 | unsigned long base_gfn = memslot->base_gfn; |
1587 | 1587 | ||
1588 | if (base_gfn + npages > (KVM_MAX_MEM_SIZE >> PAGE_SHIFT)) | 1588 | if (base_gfn + npages > (KVM_MAX_MEM_SIZE >> PAGE_SHIFT)) |
1589 | return -ENOMEM; | 1589 | return -ENOMEM; |
1590 | 1590 | ||
1591 | for (i = 0; i < npages; i++) { | 1591 | for (i = 0; i < npages; i++) { |
1592 | pfn = gfn_to_pfn(kvm, base_gfn + i); | 1592 | pfn = gfn_to_pfn(kvm, base_gfn + i); |
1593 | if (!kvm_is_mmio_pfn(pfn)) { | 1593 | if (!kvm_is_mmio_pfn(pfn)) { |
1594 | kvm_set_pmt_entry(kvm, base_gfn + i, | 1594 | kvm_set_pmt_entry(kvm, base_gfn + i, |
1595 | pfn << PAGE_SHIFT, | 1595 | pfn << PAGE_SHIFT, |
1596 | _PAGE_AR_RWX | _PAGE_MA_WB); | 1596 | _PAGE_AR_RWX | _PAGE_MA_WB); |
1597 | memslot->rmap[i] = (unsigned long)pfn_to_page(pfn); | 1597 | memslot->rmap[i] = (unsigned long)pfn_to_page(pfn); |
1598 | } else { | 1598 | } else { |
1599 | kvm_set_pmt_entry(kvm, base_gfn + i, | 1599 | kvm_set_pmt_entry(kvm, base_gfn + i, |
1600 | GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT), | 1600 | GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT), |
1601 | _PAGE_MA_UC); | 1601 | _PAGE_MA_UC); |
1602 | memslot->rmap[i] = 0; | 1602 | memslot->rmap[i] = 0; |
1603 | } | 1603 | } |
1604 | } | 1604 | } |
1605 | 1605 | ||
1606 | return 0; | 1606 | return 0; |
1607 | } | 1607 | } |
1608 | 1608 | ||
1609 | void kvm_arch_commit_memory_region(struct kvm *kvm, | 1609 | void kvm_arch_commit_memory_region(struct kvm *kvm, |
1610 | struct kvm_userspace_memory_region *mem, | 1610 | struct kvm_userspace_memory_region *mem, |
1611 | struct kvm_memory_slot old, | 1611 | struct kvm_memory_slot old, |
1612 | int user_alloc) | 1612 | int user_alloc) |
1613 | { | 1613 | { |
1614 | return; | 1614 | return; |
1615 | } | 1615 | } |
1616 | 1616 | ||
1617 | void kvm_arch_flush_shadow(struct kvm *kvm) | 1617 | void kvm_arch_flush_shadow(struct kvm *kvm) |
1618 | { | 1618 | { |
1619 | kvm_flush_remote_tlbs(kvm); | 1619 | kvm_flush_remote_tlbs(kvm); |
1620 | } | 1620 | } |
1621 | 1621 | ||
1622 | long kvm_arch_dev_ioctl(struct file *filp, | 1622 | long kvm_arch_dev_ioctl(struct file *filp, |
1623 | unsigned int ioctl, unsigned long arg) | 1623 | unsigned int ioctl, unsigned long arg) |
1624 | { | 1624 | { |
1625 | return -EINVAL; | 1625 | return -EINVAL; |
1626 | } | 1626 | } |
1627 | 1627 | ||
1628 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) | 1628 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) |
1629 | { | 1629 | { |
1630 | kvm_vcpu_uninit(vcpu); | 1630 | kvm_vcpu_uninit(vcpu); |
1631 | } | 1631 | } |
1632 | 1632 | ||
1633 | static int vti_cpu_has_kvm_support(void) | 1633 | static int vti_cpu_has_kvm_support(void) |
1634 | { | 1634 | { |
1635 | long avail = 1, status = 1, control = 1; | 1635 | long avail = 1, status = 1, control = 1; |
1636 | long ret; | 1636 | long ret; |
1637 | 1637 | ||
1638 | ret = ia64_pal_proc_get_features(&avail, &status, &control, 0); | 1638 | ret = ia64_pal_proc_get_features(&avail, &status, &control, 0); |
1639 | if (ret) | 1639 | if (ret) |
1640 | goto out; | 1640 | goto out; |
1641 | 1641 | ||
1642 | if (!(avail & PAL_PROC_VM_BIT)) | 1642 | if (!(avail & PAL_PROC_VM_BIT)) |
1643 | goto out; | 1643 | goto out; |
1644 | 1644 | ||
1645 | printk(KERN_DEBUG"kvm: Hardware Supports VT\n"); | 1645 | printk(KERN_DEBUG"kvm: Hardware Supports VT\n"); |
1646 | 1646 | ||
1647 | ret = ia64_pal_vp_env_info(&kvm_vm_buffer_size, &vp_env_info); | 1647 | ret = ia64_pal_vp_env_info(&kvm_vm_buffer_size, &vp_env_info); |
1648 | if (ret) | 1648 | if (ret) |
1649 | goto out; | 1649 | goto out; |
1650 | printk(KERN_DEBUG"kvm: VM Buffer Size:0x%lx\n", kvm_vm_buffer_size); | 1650 | printk(KERN_DEBUG"kvm: VM Buffer Size:0x%lx\n", kvm_vm_buffer_size); |
1651 | 1651 | ||
1652 | if (!(vp_env_info & VP_OPCODE)) { | 1652 | if (!(vp_env_info & VP_OPCODE)) { |
1653 | printk(KERN_WARNING"kvm: No opcode ability on hardware, " | 1653 | printk(KERN_WARNING"kvm: No opcode ability on hardware, " |
1654 | "vm_env_info:0x%lx\n", vp_env_info); | 1654 | "vm_env_info:0x%lx\n", vp_env_info); |
1655 | } | 1655 | } |
1656 | 1656 | ||
1657 | return 1; | 1657 | return 1; |
1658 | out: | 1658 | out: |
1659 | return 0; | 1659 | return 0; |
1660 | } | 1660 | } |
1661 | 1661 | ||
1662 | 1662 | ||
1663 | /* | 1663 | /* |
1664 | * On SN2, the ITC isn't stable, so copy in fast path code to use the | 1664 | * On SN2, the ITC isn't stable, so copy in fast path code to use the |
1665 | * SN2 RTC, replacing the ITC-based default version. | 1665 | * SN2 RTC, replacing the ITC-based default version. |
1666 | */ | 1666 | */ |
1667 | static void kvm_patch_vmm(struct kvm_vmm_info *vmm_info, | 1667 | static void kvm_patch_vmm(struct kvm_vmm_info *vmm_info, |
1668 | struct module *module) | 1668 | struct module *module) |
1669 | { | 1669 | { |
1670 | unsigned long new_ar, new_ar_sn2; | 1670 | unsigned long new_ar, new_ar_sn2; |
1671 | unsigned long module_base; | 1671 | unsigned long module_base; |
1672 | 1672 | ||
1673 | if (!ia64_platform_is("sn2")) | 1673 | if (!ia64_platform_is("sn2")) |
1674 | return; | 1674 | return; |
1675 | 1675 | ||
1676 | module_base = (unsigned long)module->module_core; | 1676 | module_base = (unsigned long)module->module_core; |
1677 | 1677 | ||
1678 | new_ar = kvm_vmm_base + vmm_info->patch_mov_ar - module_base; | 1678 | new_ar = kvm_vmm_base + vmm_info->patch_mov_ar - module_base; |
1679 | new_ar_sn2 = kvm_vmm_base + vmm_info->patch_mov_ar_sn2 - module_base; | 1679 | new_ar_sn2 = kvm_vmm_base + vmm_info->patch_mov_ar_sn2 - module_base; |
1680 | 1680 | ||
1681 | printk(KERN_INFO "kvm: Patching ITC emulation to use SGI SN2 RTC " | 1681 | printk(KERN_INFO "kvm: Patching ITC emulation to use SGI SN2 RTC " |
1682 | "as source\n"); | 1682 | "as source\n"); |
1683 | 1683 | ||
1684 | /* | 1684 | /* |
1685 | * Copy the SN2 version of mov_ar into place. They are both | 1685 | * Copy the SN2 version of mov_ar into place. They are both |
1686 | * the same size, so 6 bundles is sufficient (6 * 0x10). | 1686 | * the same size, so 6 bundles is sufficient (6 * 0x10). |
1687 | */ | 1687 | */ |
1688 | memcpy((void *)new_ar, (void *)new_ar_sn2, 0x60); | 1688 | memcpy((void *)new_ar, (void *)new_ar_sn2, 0x60); |
1689 | } | 1689 | } |
1690 | 1690 | ||
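The 0x60 in the memcpy follows directly from the comment: ia64 instructions travel in 16-byte (0x10) bundles, so six bundles come to 6 * 0x10 = 0x60 = 96 bytes, exactly the length copied over the default mov_ar code.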
1691 | static int kvm_relocate_vmm(struct kvm_vmm_info *vmm_info, | 1691 | static int kvm_relocate_vmm(struct kvm_vmm_info *vmm_info, |
1692 | struct module *module) | 1692 | struct module *module) |
1693 | { | 1693 | { |
1694 | unsigned long module_base; | 1694 | unsigned long module_base; |
1695 | unsigned long vmm_size; | 1695 | unsigned long vmm_size; |
1696 | 1696 | ||
1697 | unsigned long vmm_offset, func_offset, fdesc_offset; | 1697 | unsigned long vmm_offset, func_offset, fdesc_offset; |
1698 | struct fdesc *p_fdesc; | 1698 | struct fdesc *p_fdesc; |
1699 | 1699 | ||
1700 | BUG_ON(!module); | 1700 | BUG_ON(!module); |
1701 | 1701 | ||
1702 | if (!kvm_vmm_base) { | 1702 | if (!kvm_vmm_base) { |
1703 | printk("kvm: kvm area hasn't been initilized yet!!\n"); | 1703 | printk("kvm: kvm area hasn't been initilized yet!!\n"); |
1704 | return -EFAULT; | 1704 | return -EFAULT; |
1705 | } | 1705 | } |
1706 | 1706 | ||
1707 | /* Calculate new position of relocated vmm module. */ | 1707 | /* Calculate new position of relocated vmm module. */ |
1708 | module_base = (unsigned long)module->module_core; | 1708 | module_base = (unsigned long)module->module_core; |
1709 | vmm_size = module->core_size; | 1709 | vmm_size = module->core_size; |
1710 | if (unlikely(vmm_size > KVM_VMM_SIZE)) | 1710 | if (unlikely(vmm_size > KVM_VMM_SIZE)) |
1711 | return -EFAULT; | 1711 | return -EFAULT; |
1712 | 1712 | ||
1713 | memcpy((void *)kvm_vmm_base, (void *)module_base, vmm_size); | 1713 | memcpy((void *)kvm_vmm_base, (void *)module_base, vmm_size); |
1714 | kvm_patch_vmm(vmm_info, module); | 1714 | kvm_patch_vmm(vmm_info, module); |
1715 | kvm_flush_icache(kvm_vmm_base, vmm_size); | 1715 | kvm_flush_icache(kvm_vmm_base, vmm_size); |
1716 | 1716 | ||
1717 | /* Recalculate kvm_vmm_info based on new VMM. */ | 1717 | /* Recalculate kvm_vmm_info based on new VMM. */ |
1718 | vmm_offset = vmm_info->vmm_ivt - module_base; | 1718 | vmm_offset = vmm_info->vmm_ivt - module_base; |
1719 | kvm_vmm_info->vmm_ivt = KVM_VMM_BASE + vmm_offset; | 1719 | kvm_vmm_info->vmm_ivt = KVM_VMM_BASE + vmm_offset; |
1720 | printk(KERN_DEBUG"kvm: Relocated VMM's IVT Base Addr:%lx\n", | 1720 | printk(KERN_DEBUG"kvm: Relocated VMM's IVT Base Addr:%lx\n", |
1721 | kvm_vmm_info->vmm_ivt); | 1721 | kvm_vmm_info->vmm_ivt); |
1722 | 1722 | ||
1723 | fdesc_offset = (unsigned long)vmm_info->vmm_entry - module_base; | 1723 | fdesc_offset = (unsigned long)vmm_info->vmm_entry - module_base; |
1724 | kvm_vmm_info->vmm_entry = (kvm_vmm_entry *)(KVM_VMM_BASE + | 1724 | kvm_vmm_info->vmm_entry = (kvm_vmm_entry *)(KVM_VMM_BASE + |
1725 | fdesc_offset); | 1725 | fdesc_offset); |
1726 | func_offset = *(unsigned long *)vmm_info->vmm_entry - module_base; | 1726 | func_offset = *(unsigned long *)vmm_info->vmm_entry - module_base; |
1727 | p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset); | 1727 | p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset); |
1728 | p_fdesc->ip = KVM_VMM_BASE + func_offset; | 1728 | p_fdesc->ip = KVM_VMM_BASE + func_offset; |
1729 | p_fdesc->gp = KVM_VMM_BASE+(p_fdesc->gp - module_base); | 1729 | p_fdesc->gp = KVM_VMM_BASE+(p_fdesc->gp - module_base); |
1730 | 1730 | ||
1731 | printk(KERN_DEBUG"kvm: Relocated VMM's Init Entry Addr:%lx\n", | 1731 | printk(KERN_DEBUG"kvm: Relocated VMM's Init Entry Addr:%lx\n", |
1732 | KVM_VMM_BASE+func_offset); | 1732 | KVM_VMM_BASE+func_offset); |
1733 | 1733 | ||
1734 | fdesc_offset = (unsigned long)vmm_info->tramp_entry - module_base; | 1734 | fdesc_offset = (unsigned long)vmm_info->tramp_entry - module_base; |
1735 | kvm_vmm_info->tramp_entry = (kvm_tramp_entry *)(KVM_VMM_BASE + | 1735 | kvm_vmm_info->tramp_entry = (kvm_tramp_entry *)(KVM_VMM_BASE + |
1736 | fdesc_offset); | 1736 | fdesc_offset); |
1737 | func_offset = *(unsigned long *)vmm_info->tramp_entry - module_base; | 1737 | func_offset = *(unsigned long *)vmm_info->tramp_entry - module_base; |
1738 | p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset); | 1738 | p_fdesc = (struct fdesc *)(kvm_vmm_base + fdesc_offset); |
1739 | p_fdesc->ip = KVM_VMM_BASE + func_offset; | 1739 | p_fdesc->ip = KVM_VMM_BASE + func_offset; |
1740 | p_fdesc->gp = KVM_VMM_BASE + (p_fdesc->gp - module_base); | 1740 | p_fdesc->gp = KVM_VMM_BASE + (p_fdesc->gp - module_base); |
1741 | 1741 | ||
1742 | kvm_vmm_gp = p_fdesc->gp; | 1742 | kvm_vmm_gp = p_fdesc->gp; |
1743 | 1743 | ||
1744 | printk(KERN_DEBUG"kvm: Relocated VMM's Entry IP:%p\n", | 1744 | printk(KERN_DEBUG"kvm: Relocated VMM's Entry IP:%p\n", |
1745 | kvm_vmm_info->vmm_entry); | 1745 | kvm_vmm_info->vmm_entry); |
1746 | printk(KERN_DEBUG"kvm: Relocated VMM's Trampoline Entry IP:0x%lx\n", | 1746 | printk(KERN_DEBUG"kvm: Relocated VMM's Trampoline Entry IP:0x%lx\n", |
1747 | KVM_VMM_BASE + func_offset); | 1747 | KVM_VMM_BASE + func_offset); |
1748 | 1748 | ||
1749 | return 0; | 1749 | return 0; |
1750 | } | 1750 | } |
1751 | 1751 | ||
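The double fixup above exists because an ia64 function pointer is not a raw code address but a descriptor holding an entry ip plus a gp (global pointer), and both fields must be rebased when the image moves. A standalone sketch of that arithmetic with made-up addresses; only the offset math mirrors the relocation:

    #include <stdio.h>

    struct fdesc { unsigned long ip, gp; };  /* ia64 calls go through {ip, gp} */

    int main(void)
    {
        unsigned long module_base = 0xa0000000UL; /* where the module loaded */
        unsigned long vmm_base    = 0xc0000000UL; /* fixed relocation target */
        struct fdesc fd = { 0xa0001230UL, 0xa0008000UL };

        /* Rebase: keep the offset inside the image, swap the base. */
        fd.ip = vmm_base + (fd.ip - module_base);
        fd.gp = vmm_base + (fd.gp - module_base);
        printf("relocated entry ip=0x%lx gp=0x%lx\n", fd.ip, fd.gp);
        return 0;
    }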
1752 | int kvm_arch_init(void *opaque) | 1752 | int kvm_arch_init(void *opaque) |
1753 | { | 1753 | { |
1754 | int r; | 1754 | int r; |
1755 | struct kvm_vmm_info *vmm_info = (struct kvm_vmm_info *)opaque; | 1755 | struct kvm_vmm_info *vmm_info = (struct kvm_vmm_info *)opaque; |
1756 | 1756 | ||
1757 | if (!vti_cpu_has_kvm_support()) { | 1757 | if (!vti_cpu_has_kvm_support()) { |
1758 | printk(KERN_ERR "kvm: No Hardware Virtualization Support!\n"); | 1758 | printk(KERN_ERR "kvm: No Hardware Virtualization Support!\n"); |
1759 | r = -EOPNOTSUPP; | 1759 | r = -EOPNOTSUPP; |
1760 | goto out; | 1760 | goto out; |
1761 | } | 1761 | } |
1762 | 1762 | ||
1763 | if (kvm_vmm_info) { | 1763 | if (kvm_vmm_info) { |
1764 | printk(KERN_ERR "kvm: Already loaded VMM module!\n"); | 1764 | printk(KERN_ERR "kvm: Already loaded VMM module!\n"); |
1765 | r = -EEXIST; | 1765 | r = -EEXIST; |
1766 | goto out; | 1766 | goto out; |
1767 | } | 1767 | } |
1768 | 1768 | ||
1769 | r = -ENOMEM; | 1769 | r = -ENOMEM; |
1770 | kvm_vmm_info = kzalloc(sizeof(struct kvm_vmm_info), GFP_KERNEL); | 1770 | kvm_vmm_info = kzalloc(sizeof(struct kvm_vmm_info), GFP_KERNEL); |
1771 | if (!kvm_vmm_info) | 1771 | if (!kvm_vmm_info) |
1772 | goto out; | 1772 | goto out; |
1773 | 1773 | ||
1774 | if (kvm_alloc_vmm_area()) | 1774 | if (kvm_alloc_vmm_area()) |
1775 | goto out_free0; | 1775 | goto out_free0; |
1776 | 1776 | ||
1777 | r = kvm_relocate_vmm(vmm_info, vmm_info->module); | 1777 | r = kvm_relocate_vmm(vmm_info, vmm_info->module); |
1778 | if (r) | 1778 | if (r) |
1779 | goto out_free1; | 1779 | goto out_free1; |
1780 | 1780 | ||
1781 | return 0; | 1781 | return 0; |
1782 | 1782 | ||
1783 | out_free1: | 1783 | out_free1: |
1784 | kvm_free_vmm_area(); | 1784 | kvm_free_vmm_area(); |
1785 | out_free0: | 1785 | out_free0: |
1786 | kfree(kvm_vmm_info); | 1786 | kfree(kvm_vmm_info); |
1787 | out: | 1787 | out: |
1788 | return r; | 1788 | return r; |
1789 | } | 1789 | } |
1790 | 1790 | ||
1791 | void kvm_arch_exit(void) | 1791 | void kvm_arch_exit(void) |
1792 | { | 1792 | { |
1793 | kvm_free_vmm_area(); | 1793 | kvm_free_vmm_area(); |
1794 | kfree(kvm_vmm_info); | 1794 | kfree(kvm_vmm_info); |
1795 | kvm_vmm_info = NULL; | 1795 | kvm_vmm_info = NULL; |
1796 | } | 1796 | } |
1797 | 1797 | ||
1798 | static int kvm_ia64_sync_dirty_log(struct kvm *kvm, | 1798 | static int kvm_ia64_sync_dirty_log(struct kvm *kvm, |
1799 | struct kvm_dirty_log *log) | 1799 | struct kvm_dirty_log *log) |
1800 | { | 1800 | { |
1801 | struct kvm_memory_slot *memslot; | 1801 | struct kvm_memory_slot *memslot; |
1802 | int r, i; | 1802 | int r, i; |
1803 | long base; | 1803 | long base; |
1804 | unsigned long n; | 1804 | unsigned long n; |
1805 | unsigned long *dirty_bitmap = (unsigned long *)(kvm->arch.vm_base + | 1805 | unsigned long *dirty_bitmap = (unsigned long *)(kvm->arch.vm_base + |
1806 | offsetof(struct kvm_vm_data, kvm_mem_dirty_log)); | 1806 | offsetof(struct kvm_vm_data, kvm_mem_dirty_log)); |
1807 | 1807 | ||
1808 | r = -EINVAL; | 1808 | r = -EINVAL; |
1809 | if (log->slot >= KVM_MEMORY_SLOTS) | 1809 | if (log->slot >= KVM_MEMORY_SLOTS) |
1810 | goto out; | 1810 | goto out; |
1811 | 1811 | ||
1812 | memslot = &kvm->memslots->memslots[log->slot]; | 1812 | memslot = &kvm->memslots->memslots[log->slot]; |
1813 | r = -ENOENT; | 1813 | r = -ENOENT; |
1814 | if (!memslot->dirty_bitmap) | 1814 | if (!memslot->dirty_bitmap) |
1815 | goto out; | 1815 | goto out; |
1816 | 1816 | ||
1817 | n = kvm_dirty_bitmap_bytes(memslot); | 1817 | n = kvm_dirty_bitmap_bytes(memslot); |
1818 | base = memslot->base_gfn / BITS_PER_LONG; | 1818 | base = memslot->base_gfn / BITS_PER_LONG; |
1819 | 1819 | ||
1820 | for (i = 0; i < n/sizeof(long); ++i) { | 1820 | for (i = 0; i < n/sizeof(long); ++i) { |
1821 | memslot->dirty_bitmap[i] = dirty_bitmap[base + i]; | 1821 | memslot->dirty_bitmap[i] = dirty_bitmap[base + i]; |
1822 | dirty_bitmap[base + i] = 0; | 1822 | dirty_bitmap[base + i] = 0; |
1823 | } | 1823 | } |
1824 | r = 0; | 1824 | r = 0; |
1825 | out: | 1825 | out: |
1826 | return r; | 1826 | return r; |
1827 | } | 1827 | } |
1828 | 1828 | ||
1829 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, | 1829 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, |
1830 | struct kvm_dirty_log *log) | 1830 | struct kvm_dirty_log *log) |
1831 | { | 1831 | { |
1832 | int r; | 1832 | int r; |
1833 | unsigned long n; | 1833 | unsigned long n; |
1834 | struct kvm_memory_slot *memslot; | 1834 | struct kvm_memory_slot *memslot; |
1835 | int is_dirty = 0; | 1835 | int is_dirty = 0; |
1836 | 1836 | ||
1837 | mutex_lock(&kvm->slots_lock); | 1837 | mutex_lock(&kvm->slots_lock); |
1838 | spin_lock(&kvm->arch.dirty_log_lock); | 1838 | spin_lock(&kvm->arch.dirty_log_lock); |
1839 | 1839 | ||
1840 | r = kvm_ia64_sync_dirty_log(kvm, log); | 1840 | r = kvm_ia64_sync_dirty_log(kvm, log); |
1841 | if (r) | 1841 | if (r) |
1842 | goto out; | 1842 | goto out; |
1843 | 1843 | ||
1844 | r = kvm_get_dirty_log(kvm, log, &is_dirty); | 1844 | r = kvm_get_dirty_log(kvm, log, &is_dirty); |
1845 | if (r) | 1845 | if (r) |
1846 | goto out; | 1846 | goto out; |
1847 | 1847 | ||
1848 | /* If nothing is dirty, don't bother messing with page tables. */ | 1848 | /* If nothing is dirty, don't bother messing with page tables. */ |
1849 | if (is_dirty) { | 1849 | if (is_dirty) { |
1850 | kvm_flush_remote_tlbs(kvm); | 1850 | kvm_flush_remote_tlbs(kvm); |
1851 | memslot = &kvm->memslots->memslots[log->slot]; | 1851 | memslot = &kvm->memslots->memslots[log->slot]; |
1852 | n = kvm_dirty_bitmap_bytes(memslot); | 1852 | n = kvm_dirty_bitmap_bytes(memslot); |
1853 | memset(memslot->dirty_bitmap, 0, n); | 1853 | memset(memslot->dirty_bitmap, 0, n); |
1854 | } | 1854 | } |
1855 | r = 0; | 1855 | r = 0; |
1856 | out: | 1856 | out: |
1857 | mutex_unlock(&kvm->slots_lock); | 1857 | mutex_unlock(&kvm->slots_lock); |
1858 | spin_unlock(&kvm->arch.dirty_log_lock); | 1858 | spin_unlock(&kvm->arch.dirty_log_lock); |
1859 | return r; | 1859 | return r; |
1860 | } | 1860 | } |
1861 | 1861 | ||
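On this architecture the VMM writes dirty bits into a shared per-VM area, so the sync step above folds that area into the slot's bitmap (clearing the shared copy as it goes) before the generic copy-out. From userspace the flow is the usual KVM_GET_DIRTY_LOG call; a hedged sketch, where the slot number and page count are assumptions:

    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int fetch_dirty_log(int vm_fd, int slot, unsigned long npages)
    {
        struct kvm_dirty_log log;
        size_t bytes = (npages + 7) / 8;   /* one bit per guest page */
        void *bitmap = calloc(1, bytes);

        if (!bitmap)
            return -1;
        memset(&log, 0, sizeof(log));
        log.slot = slot;
        log.dirty_bitmap = bitmap;
        if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
            free(bitmap);
            return -1;
        }
        /* bitmap now holds the dirty bits; the kernel has zeroed its copy. */
        free(bitmap);
        return 0;
    }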
1862 | int kvm_arch_hardware_setup(void) | 1862 | int kvm_arch_hardware_setup(void) |
1863 | { | 1863 | { |
1864 | return 0; | 1864 | return 0; |
1865 | } | 1865 | } |
1866 | 1866 | ||
1867 | void kvm_arch_hardware_unsetup(void) | 1867 | void kvm_arch_hardware_unsetup(void) |
1868 | { | 1868 | { |
1869 | } | 1869 | } |
1870 | 1870 | ||
1871 | void kvm_vcpu_kick(struct kvm_vcpu *vcpu) | 1871 | void kvm_vcpu_kick(struct kvm_vcpu *vcpu) |
1872 | { | 1872 | { |
1873 | int me; | 1873 | int me; |
1874 | int cpu = vcpu->cpu; | 1874 | int cpu = vcpu->cpu; |
1875 | 1875 | ||
1876 | if (waitqueue_active(&vcpu->wq)) | 1876 | if (waitqueue_active(&vcpu->wq)) |
1877 | wake_up_interruptible(&vcpu->wq); | 1877 | wake_up_interruptible(&vcpu->wq); |
1878 | 1878 | ||
1879 | me = get_cpu(); | 1879 | me = get_cpu(); |
1880 | if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu)) | 1880 | if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu)) |
1881 | if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)) | 1881 | if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)) |
1882 | smp_send_reschedule(cpu); | 1882 | smp_send_reschedule(cpu); |
1883 | put_cpu(); | 1883 | put_cpu(); |
1884 | } | 1884 | } |
1885 | 1885 | ||
1886 | int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) | 1886 | int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) |
1887 | { | 1887 | { |
1888 | return __apic_accept_irq(vcpu, irq->vector); | 1888 | return __apic_accept_irq(vcpu, irq->vector); |
1889 | } | 1889 | } |
1890 | 1890 | ||
1891 | int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest) | 1891 | int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest) |
1892 | { | 1892 | { |
1893 | return apic->vcpu->vcpu_id == dest; | 1893 | return apic->vcpu->vcpu_id == dest; |
1894 | } | 1894 | } |
1895 | 1895 | ||
1896 | int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda) | 1896 | int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda) |
1897 | { | 1897 | { |
1898 | return 0; | 1898 | return 0; |
1899 | } | 1899 | } |
1900 | 1900 | ||
1901 | int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) | 1901 | int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) |
1902 | { | 1902 | { |
1903 | return vcpu1->arch.xtp - vcpu2->arch.xtp; | 1903 | return vcpu1->arch.xtp - vcpu2->arch.xtp; |
1904 | } | 1904 | } |
1905 | 1905 | ||
1906 | int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source, | 1906 | int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source, |
1907 | int short_hand, int dest, int dest_mode) | 1907 | int short_hand, int dest, int dest_mode) |
1908 | { | 1908 | { |
1909 | struct kvm_lapic *target = vcpu->arch.apic; | 1909 | struct kvm_lapic *target = vcpu->arch.apic; |
1910 | return (dest_mode == 0) ? | 1910 | return (dest_mode == 0) ? |
1911 | kvm_apic_match_physical_addr(target, dest) : | 1911 | kvm_apic_match_physical_addr(target, dest) : |
1912 | kvm_apic_match_logical_addr(target, dest); | 1912 | kvm_apic_match_logical_addr(target, dest); |
1913 | } | 1913 | } |
1914 | 1914 | ||
1915 | static int find_highest_bits(int *dat) | 1915 | static int find_highest_bits(int *dat) |
1916 | { | 1916 | { |
1917 | u32 bits, bitnum; | 1917 | u32 bits, bitnum; |
1918 | int i; | 1918 | int i; |
1919 | 1919 | ||
1920 | /* loop for all 256 bits */ | 1920 | /* loop for all 256 bits */ |
1921 | for (i = 7; i >= 0 ; i--) { | 1921 | for (i = 7; i >= 0 ; i--) { |
1922 | bits = dat[i]; | 1922 | bits = dat[i]; |
1923 | if (bits) { | 1923 | if (bits) { |
1924 | bitnum = fls(bits); | 1924 | bitnum = fls(bits); |
1925 | return i * 32 + bitnum - 1; | 1925 | return i * 32 + bitnum - 1; |
1926 | } | 1926 | } |
1927 | } | 1927 | } |
1928 | 1928 | ||
1929 | return -1; | 1929 | return -1; |
1930 | } | 1930 | } |
1931 | 1931 | ||
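find_highest_bits() scans the 256-bit IRR as eight 32-bit words from the top down, so the first non-zero word plus fls() yields the highest pending vector. A standalone check of that logic, with a local stand-in for the kernel's fls():

    #include <stdio.h>
    #include <string.h>

    static int fls32(unsigned int x)       /* stand-in for the kernel's fls() */
    {
        int n = 0;
        while (x) { n++; x >>= 1; }
        return n;
    }

    static int find_highest_bits(int *dat)
    {
        int i;
        for (i = 7; i >= 0; i--) {
            unsigned int bits = dat[i];
            if (bits)
                return i * 32 + fls32(bits) - 1;
        }
        return -1;
    }

    int main(void)
    {
        int irr[8];
        memset(irr, 0, sizeof(irr));
        irr[200 / 32] |= 1u << (200 % 32); /* pend vector 200 */
        printf("highest = %d\n", find_highest_bits(irr)); /* prints 200 */
        return 0;
    }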
1932 | int kvm_highest_pending_irq(struct kvm_vcpu *vcpu) | 1932 | int kvm_highest_pending_irq(struct kvm_vcpu *vcpu) |
1933 | { | 1933 | { |
1934 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); | 1934 | struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd); |
1935 | 1935 | ||
1936 | if (vpd->irr[0] & (1UL << NMI_VECTOR)) | 1936 | if (vpd->irr[0] & (1UL << NMI_VECTOR)) |
1937 | return NMI_VECTOR; | 1937 | return NMI_VECTOR; |
1938 | if (vpd->irr[0] & (1UL << ExtINT_VECTOR)) | 1938 | if (vpd->irr[0] & (1UL << ExtINT_VECTOR)) |
1939 | return ExtINT_VECTOR; | 1939 | return ExtINT_VECTOR; |
1940 | 1940 | ||
1941 | return find_highest_bits((int *)&vpd->irr[0]); | 1941 | return find_highest_bits((int *)&vpd->irr[0]); |
1942 | } | 1942 | } |
1943 | 1943 | ||
1944 | int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) | 1944 | int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) |
1945 | { | 1945 | { |
1946 | return vcpu->arch.timer_fired; | 1946 | return vcpu->arch.timer_fired; |
1947 | } | 1947 | } |
1948 | 1948 | ||
1949 | gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) | ||
1950 | { | ||
1951 | return gfn; | ||
1952 | } | ||
1953 | |||
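With memory aliases gone, the identity unalias_gfn() stub deleted here (its powerpc twin goes away in the next file) has no remaining caller: generic code can walk the memslots with the guest frame number directly. A minimal sketch of such a lookup, assuming the memslot fields used elsewhere in this diff rather than the exact generic helper:

    /* Hypothetical helper; mirrors the base_gfn/npages checks seen above. */
    static struct kvm_memory_slot *find_slot(struct kvm_memslots *slots, gfn_t gfn)
    {
        int i;

        for (i = 0; i < KVM_MEMORY_SLOTS; i++) {
            struct kvm_memory_slot *s = &slots->memslots[i];

            if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
                return s;
        }
        return NULL;
    }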
1954 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) | 1949 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) |
1955 | { | 1950 | { |
1956 | return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) || | 1951 | return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) || |
1957 | (kvm_highest_pending_irq(vcpu) != -1); | 1952 | (kvm_highest_pending_irq(vcpu) != -1); |
1958 | } | 1953 | } |
1959 | 1954 | ||
1960 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, | 1955 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, |
1961 | struct kvm_mp_state *mp_state) | 1956 | struct kvm_mp_state *mp_state) |
1962 | { | 1957 | { |
1963 | mp_state->mp_state = vcpu->arch.mp_state; | 1958 | mp_state->mp_state = vcpu->arch.mp_state; |
1964 | return 0; | 1959 | return 0; |
1965 | } | 1960 | } |
1966 | 1961 | ||
1967 | static int vcpu_reset(struct kvm_vcpu *vcpu) | 1962 | static int vcpu_reset(struct kvm_vcpu *vcpu) |
1968 | { | 1963 | { |
1969 | int r; | 1964 | int r; |
1970 | long psr; | 1965 | long psr; |
1971 | local_irq_save(psr); | 1966 | local_irq_save(psr); |
1972 | r = kvm_insert_vmm_mapping(vcpu); | 1967 | r = kvm_insert_vmm_mapping(vcpu); |
1973 | local_irq_restore(psr); | 1968 | local_irq_restore(psr); |
1974 | if (r) | 1969 | if (r) |
1975 | goto fail; | 1970 | goto fail; |
1976 | 1971 | ||
1977 | vcpu->arch.launched = 0; | 1972 | vcpu->arch.launched = 0; |
1978 | kvm_arch_vcpu_uninit(vcpu); | 1973 | kvm_arch_vcpu_uninit(vcpu); |
1979 | r = kvm_arch_vcpu_init(vcpu); | 1974 | r = kvm_arch_vcpu_init(vcpu); |
1980 | if (r) | 1975 | if (r) |
1981 | goto fail; | 1976 | goto fail; |
1982 | 1977 | ||
1983 | kvm_purge_vmm_mapping(vcpu); | 1978 | kvm_purge_vmm_mapping(vcpu); |
1984 | r = 0; | 1979 | r = 0; |
1985 | fail: | 1980 | fail: |
1986 | return r; | 1981 | return r; |
1987 | } | 1982 | } |
1988 | 1983 | ||
1989 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, | 1984 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, |
1990 | struct kvm_mp_state *mp_state) | 1985 | struct kvm_mp_state *mp_state) |
1991 | { | 1986 | { |
1992 | int r = 0; | 1987 | int r = 0; |
1993 | 1988 | ||
1994 | vcpu->arch.mp_state = mp_state->mp_state; | 1989 | vcpu->arch.mp_state = mp_state->mp_state; |
1995 | if (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) | 1990 | if (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) |
1996 | r = vcpu_reset(vcpu); | 1991 | r = vcpu_reset(vcpu); |
1997 | return r; | 1992 | return r; |
1998 | } | 1993 | } |
1999 | 1994 |
arch/powerpc/kvm/powerpc.c
1 | /* | 1 | /* |
2 | * This program is free software; you can redistribute it and/or modify | 2 | * This program is free software; you can redistribute it and/or modify |
3 | * it under the terms of the GNU General Public License, version 2, as | 3 | * it under the terms of the GNU General Public License, version 2, as |
4 | * published by the Free Software Foundation. | 4 | * published by the Free Software Foundation. |
5 | * | 5 | * |
6 | * This program is distributed in the hope that it will be useful, | 6 | * This program is distributed in the hope that it will be useful, |
7 | * but WITHOUT ANY WARRANTY; without even the implied warranty of | 7 | * but WITHOUT ANY WARRANTY; without even the implied warranty of |
8 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 8 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
9 | * GNU General Public License for more details. | 9 | * GNU General Public License for more details. |
10 | * | 10 | * |
11 | * You should have received a copy of the GNU General Public License | 11 | * You should have received a copy of the GNU General Public License |
12 | * along with this program; if not, write to the Free Software | 12 | * along with this program; if not, write to the Free Software |
13 | * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. | 13 | * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. |
14 | * | 14 | * |
15 | * Copyright IBM Corp. 2007 | 15 | * Copyright IBM Corp. 2007 |
16 | * | 16 | * |
17 | * Authors: Hollis Blanchard <hollisb@us.ibm.com> | 17 | * Authors: Hollis Blanchard <hollisb@us.ibm.com> |
18 | * Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> | 18 | * Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> |
19 | */ | 19 | */ |
20 | 20 | ||
21 | #include <linux/errno.h> | 21 | #include <linux/errno.h> |
22 | #include <linux/err.h> | 22 | #include <linux/err.h> |
23 | #include <linux/kvm_host.h> | 23 | #include <linux/kvm_host.h> |
24 | #include <linux/module.h> | 24 | #include <linux/module.h> |
25 | #include <linux/vmalloc.h> | 25 | #include <linux/vmalloc.h> |
26 | #include <linux/hrtimer.h> | 26 | #include <linux/hrtimer.h> |
27 | #include <linux/fs.h> | 27 | #include <linux/fs.h> |
28 | #include <linux/slab.h> | 28 | #include <linux/slab.h> |
29 | #include <asm/cputable.h> | 29 | #include <asm/cputable.h> |
30 | #include <asm/uaccess.h> | 30 | #include <asm/uaccess.h> |
31 | #include <asm/kvm_ppc.h> | 31 | #include <asm/kvm_ppc.h> |
32 | #include <asm/tlbflush.h> | 32 | #include <asm/tlbflush.h> |
33 | #include "timing.h" | 33 | #include "timing.h" |
34 | #include "../mm/mmu_decl.h" | 34 | #include "../mm/mmu_decl.h" |
35 | 35 | ||
36 | #define CREATE_TRACE_POINTS | 36 | #define CREATE_TRACE_POINTS |
37 | #include "trace.h" | 37 | #include "trace.h" |
38 | 38 | ||
39 | gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) | ||
40 | { | ||
41 | return gfn; | ||
42 | } | ||
43 | |||
44 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) | 39 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) |
45 | { | 40 | { |
46 | return !(v->arch.msr & MSR_WE) || !!(v->arch.pending_exceptions); | 41 | return !(v->arch.msr & MSR_WE) || !!(v->arch.pending_exceptions); |
47 | } | 42 | } |
48 | 43 | ||
49 | 44 | ||
50 | int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) | 45 | int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) |
51 | { | 46 | { |
52 | enum emulation_result er; | 47 | enum emulation_result er; |
53 | int r; | 48 | int r; |
54 | 49 | ||
55 | er = kvmppc_emulate_instruction(run, vcpu); | 50 | er = kvmppc_emulate_instruction(run, vcpu); |
56 | switch (er) { | 51 | switch (er) { |
57 | case EMULATE_DONE: | 52 | case EMULATE_DONE: |
58 | /* Future optimization: only reload non-volatiles if they were | 53 | /* Future optimization: only reload non-volatiles if they were |
59 | * actually modified. */ | 54 | * actually modified. */ |
60 | r = RESUME_GUEST_NV; | 55 | r = RESUME_GUEST_NV; |
61 | break; | 56 | break; |
62 | case EMULATE_DO_MMIO: | 57 | case EMULATE_DO_MMIO: |
63 | run->exit_reason = KVM_EXIT_MMIO; | 58 | run->exit_reason = KVM_EXIT_MMIO; |
64 | /* We must reload nonvolatiles because "update" load/store | 59 | /* We must reload nonvolatiles because "update" load/store |
65 | * instructions modify register state. */ | 60 | * instructions modify register state. */ |
66 | /* Future optimization: only reload non-volatiles if they were | 61 | /* Future optimization: only reload non-volatiles if they were |
67 | * actually modified. */ | 62 | * actually modified. */ |
68 | r = RESUME_HOST_NV; | 63 | r = RESUME_HOST_NV; |
69 | break; | 64 | break; |
70 | case EMULATE_FAIL: | 65 | case EMULATE_FAIL: |
71 | /* XXX Deliver Program interrupt to guest. */ | 66 | /* XXX Deliver Program interrupt to guest. */ |
72 | printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__, | 67 | printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__, |
73 | kvmppc_get_last_inst(vcpu)); | 68 | kvmppc_get_last_inst(vcpu)); |
74 | r = RESUME_HOST; | 69 | r = RESUME_HOST; |
75 | break; | 70 | break; |
76 | default: | 71 | default: |
77 | BUG(); | 72 | BUG(); |
78 | } | 73 | } |
79 | 74 | ||
80 | return r; | 75 | return r; |
81 | } | 76 | } |
82 | 77 | ||
83 | int kvm_arch_hardware_enable(void *garbage) | 78 | int kvm_arch_hardware_enable(void *garbage) |
84 | { | 79 | { |
85 | return 0; | 80 | return 0; |
86 | } | 81 | } |
87 | 82 | ||
88 | void kvm_arch_hardware_disable(void *garbage) | 83 | void kvm_arch_hardware_disable(void *garbage) |
89 | { | 84 | { |
90 | } | 85 | } |
91 | 86 | ||
92 | int kvm_arch_hardware_setup(void) | 87 | int kvm_arch_hardware_setup(void) |
93 | { | 88 | { |
94 | return 0; | 89 | return 0; |
95 | } | 90 | } |
96 | 91 | ||
97 | void kvm_arch_hardware_unsetup(void) | 92 | void kvm_arch_hardware_unsetup(void) |
98 | { | 93 | { |
99 | } | 94 | } |
100 | 95 | ||
101 | void kvm_arch_check_processor_compat(void *rtn) | 96 | void kvm_arch_check_processor_compat(void *rtn) |
102 | { | 97 | { |
103 | *(int *)rtn = kvmppc_core_check_processor_compat(); | 98 | *(int *)rtn = kvmppc_core_check_processor_compat(); |
104 | } | 99 | } |
105 | 100 | ||
106 | struct kvm *kvm_arch_create_vm(void) | 101 | struct kvm *kvm_arch_create_vm(void) |
107 | { | 102 | { |
108 | struct kvm *kvm; | 103 | struct kvm *kvm; |
109 | 104 | ||
110 | kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); | 105 | kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); |
111 | if (!kvm) | 106 | if (!kvm) |
112 | return ERR_PTR(-ENOMEM); | 107 | return ERR_PTR(-ENOMEM); |
113 | 108 | ||
114 | return kvm; | 109 | return kvm; |
115 | } | 110 | } |
116 | 111 | ||
117 | static void kvmppc_free_vcpus(struct kvm *kvm) | 112 | static void kvmppc_free_vcpus(struct kvm *kvm) |
118 | { | 113 | { |
119 | unsigned int i; | 114 | unsigned int i; |
120 | struct kvm_vcpu *vcpu; | 115 | struct kvm_vcpu *vcpu; |
121 | 116 | ||
122 | kvm_for_each_vcpu(i, vcpu, kvm) | 117 | kvm_for_each_vcpu(i, vcpu, kvm) |
123 | kvm_arch_vcpu_free(vcpu); | 118 | kvm_arch_vcpu_free(vcpu); |
124 | 119 | ||
125 | mutex_lock(&kvm->lock); | 120 | mutex_lock(&kvm->lock); |
126 | for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) | 121 | for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) |
127 | kvm->vcpus[i] = NULL; | 122 | kvm->vcpus[i] = NULL; |
128 | 123 | ||
129 | atomic_set(&kvm->online_vcpus, 0); | 124 | atomic_set(&kvm->online_vcpus, 0); |
130 | mutex_unlock(&kvm->lock); | 125 | mutex_unlock(&kvm->lock); |
131 | } | 126 | } |
132 | 127 | ||
133 | void kvm_arch_sync_events(struct kvm *kvm) | 128 | void kvm_arch_sync_events(struct kvm *kvm) |
134 | { | 129 | { |
135 | } | 130 | } |
136 | 131 | ||
137 | void kvm_arch_destroy_vm(struct kvm *kvm) | 132 | void kvm_arch_destroy_vm(struct kvm *kvm) |
138 | { | 133 | { |
139 | kvmppc_free_vcpus(kvm); | 134 | kvmppc_free_vcpus(kvm); |
140 | kvm_free_physmem(kvm); | 135 | kvm_free_physmem(kvm); |
141 | cleanup_srcu_struct(&kvm->srcu); | 136 | cleanup_srcu_struct(&kvm->srcu); |
142 | kfree(kvm); | 137 | kfree(kvm); |
143 | } | 138 | } |
144 | 139 | ||
145 | int kvm_dev_ioctl_check_extension(long ext) | 140 | int kvm_dev_ioctl_check_extension(long ext) |
146 | { | 141 | { |
147 | int r; | 142 | int r; |
148 | 143 | ||
149 | switch (ext) { | 144 | switch (ext) { |
150 | case KVM_CAP_PPC_SEGSTATE: | 145 | case KVM_CAP_PPC_SEGSTATE: |
151 | case KVM_CAP_PPC_PAIRED_SINGLES: | 146 | case KVM_CAP_PPC_PAIRED_SINGLES: |
152 | case KVM_CAP_PPC_UNSET_IRQ: | 147 | case KVM_CAP_PPC_UNSET_IRQ: |
153 | case KVM_CAP_ENABLE_CAP: | 148 | case KVM_CAP_ENABLE_CAP: |
154 | case KVM_CAP_PPC_OSI: | 149 | case KVM_CAP_PPC_OSI: |
155 | r = 1; | 150 | r = 1; |
156 | break; | 151 | break; |
157 | case KVM_CAP_COALESCED_MMIO: | 152 | case KVM_CAP_COALESCED_MMIO: |
158 | r = KVM_COALESCED_MMIO_PAGE_OFFSET; | 153 | r = KVM_COALESCED_MMIO_PAGE_OFFSET; |
159 | break; | 154 | break; |
160 | default: | 155 | default: |
161 | r = 0; | 156 | r = 0; |
162 | break; | 157 | break; |
163 | } | 158 | } |
164 | return r; | 159 | return r; |
165 | 160 | ||
166 | } | 161 | } |
167 | 162 | ||
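Each value returned above surfaces through the KVM_CHECK_EXTENSION system ioctl, with non-boolean capabilities (such as the coalesced-MMIO page offset) reported as their numeric value. A hedged userspace probe; the device path is standard, the printed labels are just illustration:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm_fd = open("/dev/kvm", O_RDWR);
        if (kvm_fd < 0)
            return 1;
        int osi  = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_OSI);
        int mmio = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO);
        printf("OSI: %d, coalesced MMIO page offset: %d\n", osi, mmio);
        return 0;
    }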
168 | long kvm_arch_dev_ioctl(struct file *filp, | 163 | long kvm_arch_dev_ioctl(struct file *filp, |
169 | unsigned int ioctl, unsigned long arg) | 164 | unsigned int ioctl, unsigned long arg) |
170 | { | 165 | { |
171 | return -EINVAL; | 166 | return -EINVAL; |
172 | } | 167 | } |
173 | 168 | ||
174 | int kvm_arch_prepare_memory_region(struct kvm *kvm, | 169 | int kvm_arch_prepare_memory_region(struct kvm *kvm, |
175 | struct kvm_memory_slot *memslot, | 170 | struct kvm_memory_slot *memslot, |
176 | struct kvm_memory_slot old, | 171 | struct kvm_memory_slot old, |
177 | struct kvm_userspace_memory_region *mem, | 172 | struct kvm_userspace_memory_region *mem, |
178 | int user_alloc) | 173 | int user_alloc) |
179 | { | 174 | { |
180 | return 0; | 175 | return 0; |
181 | } | 176 | } |
182 | 177 | ||
183 | void kvm_arch_commit_memory_region(struct kvm *kvm, | 178 | void kvm_arch_commit_memory_region(struct kvm *kvm, |
184 | struct kvm_userspace_memory_region *mem, | 179 | struct kvm_userspace_memory_region *mem, |
185 | struct kvm_memory_slot old, | 180 | struct kvm_memory_slot old, |
186 | int user_alloc) | 181 | int user_alloc) |
187 | { | 182 | { |
188 | return; | 183 | return; |
189 | } | 184 | } |
190 | 185 | ||
191 | 186 | ||
192 | void kvm_arch_flush_shadow(struct kvm *kvm) | 187 | void kvm_arch_flush_shadow(struct kvm *kvm) |
193 | { | 188 | { |
194 | } | 189 | } |
195 | 190 | ||
196 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) | 191 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) |
197 | { | 192 | { |
198 | struct kvm_vcpu *vcpu; | 193 | struct kvm_vcpu *vcpu; |
199 | vcpu = kvmppc_core_vcpu_create(kvm, id); | 194 | vcpu = kvmppc_core_vcpu_create(kvm, id); |
200 | if (!IS_ERR(vcpu)) | 195 | if (!IS_ERR(vcpu)) |
201 | kvmppc_create_vcpu_debugfs(vcpu, id); | 196 | kvmppc_create_vcpu_debugfs(vcpu, id); |
202 | return vcpu; | 197 | return vcpu; |
203 | } | 198 | } |
204 | 199 | ||
205 | void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) | 200 | void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) |
206 | { | 201 | { |
207 | /* Make sure we're not using the vcpu anymore */ | 202 | /* Make sure we're not using the vcpu anymore */ |
208 | hrtimer_cancel(&vcpu->arch.dec_timer); | 203 | hrtimer_cancel(&vcpu->arch.dec_timer); |
209 | tasklet_kill(&vcpu->arch.tasklet); | 204 | tasklet_kill(&vcpu->arch.tasklet); |
210 | 205 | ||
211 | kvmppc_remove_vcpu_debugfs(vcpu); | 206 | kvmppc_remove_vcpu_debugfs(vcpu); |
212 | kvmppc_core_vcpu_free(vcpu); | 207 | kvmppc_core_vcpu_free(vcpu); |
213 | } | 208 | } |
214 | 209 | ||
215 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) | 210 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) |
216 | { | 211 | { |
217 | kvm_arch_vcpu_free(vcpu); | 212 | kvm_arch_vcpu_free(vcpu); |
218 | } | 213 | } |
219 | 214 | ||
220 | int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) | 215 | int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) |
221 | { | 216 | { |
222 | return kvmppc_core_pending_dec(vcpu); | 217 | return kvmppc_core_pending_dec(vcpu); |
223 | } | 218 | } |
224 | 219 | ||
225 | static void kvmppc_decrementer_func(unsigned long data) | 220 | static void kvmppc_decrementer_func(unsigned long data) |
226 | { | 221 | { |
227 | struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; | 222 | struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; |
228 | 223 | ||
229 | kvmppc_core_queue_dec(vcpu); | 224 | kvmppc_core_queue_dec(vcpu); |
230 | 225 | ||
231 | if (waitqueue_active(&vcpu->wq)) { | 226 | if (waitqueue_active(&vcpu->wq)) { |
232 | wake_up_interruptible(&vcpu->wq); | 227 | wake_up_interruptible(&vcpu->wq); |
233 | vcpu->stat.halt_wakeup++; | 228 | vcpu->stat.halt_wakeup++; |
234 | } | 229 | } |
235 | } | 230 | } |
236 | 231 | ||
237 | /* | 232 | /* |
238 | * low level hrtimer wake routine. Because this runs in hardirq context | 233 | * low level hrtimer wake routine. Because this runs in hardirq context |
239 | * we schedule a tasklet to do the real work. | 234 | * we schedule a tasklet to do the real work. |
240 | */ | 235 | */ |
241 | enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer) | 236 | enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer) |
242 | { | 237 | { |
243 | struct kvm_vcpu *vcpu; | 238 | struct kvm_vcpu *vcpu; |
244 | 239 | ||
245 | vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer); | 240 | vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer); |
246 | tasklet_schedule(&vcpu->arch.tasklet); | 241 | tasklet_schedule(&vcpu->arch.tasklet); |
247 | 242 | ||
248 | return HRTIMER_NORESTART; | 243 | return HRTIMER_NORESTART; |
249 | } | 244 | } |
250 | 245 | ||
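The comment spells out the pattern: the hrtimer callback fires in hardirq context, so it only schedules a tasklet and the queueing/wakeup work runs later in softirq. The same deferral in isolation, as a kernel-context sketch with stand-in names (the APIs match this tree's era):

    #include <linux/kernel.h>
    #include <linux/hrtimer.h>
    #include <linux/interrupt.h>

    struct demo {
        struct hrtimer timer;
        struct tasklet_struct tasklet;
    };

    static void demo_work(unsigned long data)    /* runs in softirq context */
    {
        /* work too heavy for hardirq goes here */
    }

    static enum hrtimer_restart demo_wakeup(struct hrtimer *t) /* hardirq */
    {
        struct demo *d = container_of(t, struct demo, timer);

        tasklet_schedule(&d->tasklet);           /* defer to softirq */
        return HRTIMER_NORESTART;
    }

    static void demo_init(struct demo *d)
    {
        hrtimer_init(&d->timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
        d->timer.function = demo_wakeup;
        tasklet_init(&d->tasklet, demo_work, (unsigned long)d);
    }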
251 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) | 246 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) |
252 | { | 247 | { |
253 | hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); | 248 | hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); |
254 | tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu); | 249 | tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu); |
255 | vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup; | 250 | vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup; |
256 | 251 | ||
257 | return 0; | 252 | return 0; |
258 | } | 253 | } |
259 | 254 | ||
260 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) | 255 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) |
261 | { | 256 | { |
262 | kvmppc_mmu_destroy(vcpu); | 257 | kvmppc_mmu_destroy(vcpu); |
263 | } | 258 | } |
264 | 259 | ||
265 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) | 260 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) |
266 | { | 261 | { |
267 | kvmppc_core_vcpu_load(vcpu, cpu); | 262 | kvmppc_core_vcpu_load(vcpu, cpu); |
268 | } | 263 | } |
269 | 264 | ||
270 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) | 265 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) |
271 | { | 266 | { |
272 | kvmppc_core_vcpu_put(vcpu); | 267 | kvmppc_core_vcpu_put(vcpu); |
273 | } | 268 | } |
274 | 269 | ||
275 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, | 270 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, |
276 | struct kvm_guest_debug *dbg) | 271 | struct kvm_guest_debug *dbg) |
277 | { | 272 | { |
278 | return -EINVAL; | 273 | return -EINVAL; |
279 | } | 274 | } |
280 | 275 | ||
281 | static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu, | 276 | static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu, |
282 | struct kvm_run *run) | 277 | struct kvm_run *run) |
283 | { | 278 | { |
284 | kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, run->dcr.data); | 279 | kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, run->dcr.data); |
285 | } | 280 | } |
286 | 281 | ||
287 | static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, | 282 | static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, |
288 | struct kvm_run *run) | 283 | struct kvm_run *run) |
289 | { | 284 | { |
290 | u64 uninitialized_var(gpr); | 285 | u64 uninitialized_var(gpr); |
291 | 286 | ||
292 | if (run->mmio.len > sizeof(gpr)) { | 287 | if (run->mmio.len > sizeof(gpr)) { |
293 | printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len); | 288 | printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len); |
294 | return; | 289 | return; |
295 | } | 290 | } |
296 | 291 | ||
297 | if (vcpu->arch.mmio_is_bigendian) { | 292 | if (vcpu->arch.mmio_is_bigendian) { |
298 | switch (run->mmio.len) { | 293 | switch (run->mmio.len) { |
299 | case 8: gpr = *(u64 *)run->mmio.data; break; | 294 | case 8: gpr = *(u64 *)run->mmio.data; break; |
300 | case 4: gpr = *(u32 *)run->mmio.data; break; | 295 | case 4: gpr = *(u32 *)run->mmio.data; break; |
301 | case 2: gpr = *(u16 *)run->mmio.data; break; | 296 | case 2: gpr = *(u16 *)run->mmio.data; break; |
302 | case 1: gpr = *(u8 *)run->mmio.data; break; | 297 | case 1: gpr = *(u8 *)run->mmio.data; break; |
303 | } | 298 | } |
304 | } else { | 299 | } else { |
305 | /* Convert BE data from userland back to LE. */ | 300 | /* Convert BE data from userland back to LE. */ |
306 | switch (run->mmio.len) { | 301 | switch (run->mmio.len) { |
307 | case 4: gpr = ld_le32((u32 *)run->mmio.data); break; | 302 | case 4: gpr = ld_le32((u32 *)run->mmio.data); break; |
308 | case 2: gpr = ld_le16((u16 *)run->mmio.data); break; | 303 | case 2: gpr = ld_le16((u16 *)run->mmio.data); break; |
309 | case 1: gpr = *(u8 *)run->mmio.data; break; | 304 | case 1: gpr = *(u8 *)run->mmio.data; break; |
310 | } | 305 | } |
311 | } | 306 | } |
312 | 307 | ||
313 | if (vcpu->arch.mmio_sign_extend) { | 308 | if (vcpu->arch.mmio_sign_extend) { |
314 | switch (run->mmio.len) { | 309 | switch (run->mmio.len) { |
315 | #ifdef CONFIG_PPC64 | 310 | #ifdef CONFIG_PPC64 |
316 | case 4: | 311 | case 4: |
317 | gpr = (s64)(s32)gpr; | 312 | gpr = (s64)(s32)gpr; |
318 | break; | 313 | break; |
319 | #endif | 314 | #endif |
320 | case 2: | 315 | case 2: |
321 | gpr = (s64)(s16)gpr; | 316 | gpr = (s64)(s16)gpr; |
322 | break; | 317 | break; |
323 | case 1: | 318 | case 1: |
324 | gpr = (s64)(s8)gpr; | 319 | gpr = (s64)(s8)gpr; |
325 | break; | 320 | break; |
326 | } | 321 | } |
327 | } | 322 | } |
328 | 323 | ||
329 | kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr); | 324 | kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr); |
330 | 325 | ||
331 | switch (vcpu->arch.io_gpr & KVM_REG_EXT_MASK) { | 326 | switch (vcpu->arch.io_gpr & KVM_REG_EXT_MASK) { |
332 | case KVM_REG_GPR: | 327 | case KVM_REG_GPR: |
333 | kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr); | 328 | kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr); |
334 | break; | 329 | break; |
335 | case KVM_REG_FPR: | 330 | case KVM_REG_FPR: |
336 | vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; | 331 | vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; |
337 | break; | 332 | break; |
338 | #ifdef CONFIG_PPC_BOOK3S | 333 | #ifdef CONFIG_PPC_BOOK3S |
339 | case KVM_REG_QPR: | 334 | case KVM_REG_QPR: |
340 | vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; | 335 | vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; |
341 | break; | 336 | break; |
342 | case KVM_REG_FQPR: | 337 | case KVM_REG_FQPR: |
343 | vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; | 338 | vcpu->arch.fpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; |
344 | vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; | 339 | vcpu->arch.qpr[vcpu->arch.io_gpr & KVM_REG_MASK] = gpr; |
345 | break; | 340 | break; |
346 | #endif | 341 | #endif |
347 | default: | 342 | default: |
348 | BUG(); | 343 | BUG(); |
349 | } | 344 | } |
350 | } | 345 | } |
351 | 346 | ||
352 | int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu, | 347 | int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu, |
353 | unsigned int rt, unsigned int bytes, int is_bigendian) | 348 | unsigned int rt, unsigned int bytes, int is_bigendian) |
354 | { | 349 | { |
355 | if (bytes > sizeof(run->mmio.data)) { | 350 | if (bytes > sizeof(run->mmio.data)) { |
356 | printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, | 351 | printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, |
357 | bytes); | 352 | bytes); |
358 | } | 353 | } |
359 | 354 | ||
360 | run->mmio.phys_addr = vcpu->arch.paddr_accessed; | 355 | run->mmio.phys_addr = vcpu->arch.paddr_accessed; |
361 | run->mmio.len = bytes; | 356 | run->mmio.len = bytes; |
362 | run->mmio.is_write = 0; | 357 | run->mmio.is_write = 0; |
363 | 358 | ||
364 | vcpu->arch.io_gpr = rt; | 359 | vcpu->arch.io_gpr = rt; |
365 | vcpu->arch.mmio_is_bigendian = is_bigendian; | 360 | vcpu->arch.mmio_is_bigendian = is_bigendian; |
366 | vcpu->mmio_needed = 1; | 361 | vcpu->mmio_needed = 1; |
367 | vcpu->mmio_is_write = 0; | 362 | vcpu->mmio_is_write = 0; |
368 | vcpu->arch.mmio_sign_extend = 0; | 363 | vcpu->arch.mmio_sign_extend = 0; |
369 | 364 | ||
370 | return EMULATE_DO_MMIO; | 365 | return EMULATE_DO_MMIO; |
371 | } | 366 | } |
372 | 367 | ||
373 | /* Same as above, but sign extends */ | 368 | /* Same as above, but sign extends */ |
374 | int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu, | 369 | int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu, |
375 | unsigned int rt, unsigned int bytes, int is_bigendian) | 370 | unsigned int rt, unsigned int bytes, int is_bigendian) |
376 | { | 371 | { |
377 | int r; | 372 | int r; |
378 | 373 | ||
379 | r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian); | 374 | r = kvmppc_handle_load(run, vcpu, rt, bytes, is_bigendian); |
380 | vcpu->arch.mmio_sign_extend = 1; | 375 | vcpu->arch.mmio_sign_extend = 1; |
381 | 376 | ||
382 | return r; | 377 | return r; |
383 | } | 378 | } |
384 | 379 | ||
385 | int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, | 380 | int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, |
386 | u64 val, unsigned int bytes, int is_bigendian) | 381 | u64 val, unsigned int bytes, int is_bigendian) |
387 | { | 382 | { |
388 | void *data = run->mmio.data; | 383 | void *data = run->mmio.data; |
389 | 384 | ||
390 | if (bytes > sizeof(run->mmio.data)) { | 385 | if (bytes > sizeof(run->mmio.data)) { |
391 | printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, | 386 | printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__, |
392 | bytes); | 387 | bytes); |
393 | } | 388 | } |
394 | 389 | ||
395 | run->mmio.phys_addr = vcpu->arch.paddr_accessed; | 390 | run->mmio.phys_addr = vcpu->arch.paddr_accessed; |
396 | run->mmio.len = bytes; | 391 | run->mmio.len = bytes; |
397 | run->mmio.is_write = 1; | 392 | run->mmio.is_write = 1; |
398 | vcpu->mmio_needed = 1; | 393 | vcpu->mmio_needed = 1; |
399 | vcpu->mmio_is_write = 1; | 394 | vcpu->mmio_is_write = 1; |
400 | 395 | ||
401 | /* Store the value at the lowest bytes in 'data'. */ | 396 | /* Store the value at the lowest bytes in 'data'. */ |
402 | if (is_bigendian) { | 397 | if (is_bigendian) { |
403 | switch (bytes) { | 398 | switch (bytes) { |
404 | case 8: *(u64 *)data = val; break; | 399 | case 8: *(u64 *)data = val; break; |
405 | case 4: *(u32 *)data = val; break; | 400 | case 4: *(u32 *)data = val; break; |
406 | case 2: *(u16 *)data = val; break; | 401 | case 2: *(u16 *)data = val; break; |
407 | case 1: *(u8 *)data = val; break; | 402 | case 1: *(u8 *)data = val; break; |
408 | } | 403 | } |
409 | } else { | 404 | } else { |
410 | /* Store LE value into 'data'. */ | 405 | /* Store LE value into 'data'. */ |
411 | switch (bytes) { | 406 | switch (bytes) { |
412 | case 4: st_le32(data, val); break; | 407 | case 4: st_le32(data, val); break; |
413 | case 2: st_le16(data, val); break; | 408 | case 2: st_le16(data, val); break; |
414 | case 1: *(u8 *)data = val; break; | 409 | case 1: *(u8 *)data = val; break; |
415 | } | 410 | } |
416 | } | 411 | } |
417 | 412 | ||
418 | return EMULATE_DO_MMIO; | 413 | return EMULATE_DO_MMIO; |
419 | } | 414 | } |
420 | 415 | ||
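The two store paths above differ only in byte layout: a direct store leaves the value in host order (big-endian on this architecture), while st_le32()/st_le16() lay the bytes out little-endian for LE guests. A standalone check of the 4-byte case, re-creating st_le32() locally since the real helper lives in asm headers:

    #include <stdio.h>
    #include <string.h>

    typedef unsigned int  u32;
    typedef unsigned char u8;

    static void st_le32(void *p, u32 v)  /* stand-in for the asm helper */
    {
        u8 *d = p;
        d[0] = (u8)v;         d[1] = (u8)(v >> 8);
        d[2] = (u8)(v >> 16); d[3] = (u8)(v >> 24);
    }

    int main(void)
    {
        u8 data[8];
        u32 val = 0x11223344;

        memcpy(data, &val, 4);           /* is_bigendian path: host byte order */
        printf("host order:    %02x %02x %02x %02x\n",
               data[0], data[1], data[2], data[3]);

        st_le32(data, val);              /* !is_bigendian path: forced LE */
        printf("little-endian: %02x %02x %02x %02x\n",
               data[0], data[1], data[2], data[3]);
        return 0;
    }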
421 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) | 416 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) |
422 | { | 417 | { |
423 | int r; | 418 | int r; |
424 | sigset_t sigsaved; | 419 | sigset_t sigsaved; |
425 | 420 | ||
426 | if (vcpu->sigset_active) | 421 | if (vcpu->sigset_active) |
427 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); | 422 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); |
428 | 423 | ||
429 | if (vcpu->mmio_needed) { | 424 | if (vcpu->mmio_needed) { |
430 | if (!vcpu->mmio_is_write) | 425 | if (!vcpu->mmio_is_write) |
431 | kvmppc_complete_mmio_load(vcpu, run); | 426 | kvmppc_complete_mmio_load(vcpu, run); |
432 | vcpu->mmio_needed = 0; | 427 | vcpu->mmio_needed = 0; |
433 | } else if (vcpu->arch.dcr_needed) { | 428 | } else if (vcpu->arch.dcr_needed) { |
434 | if (!vcpu->arch.dcr_is_write) | 429 | if (!vcpu->arch.dcr_is_write) |
435 | kvmppc_complete_dcr_load(vcpu, run); | 430 | kvmppc_complete_dcr_load(vcpu, run); |
436 | vcpu->arch.dcr_needed = 0; | 431 | vcpu->arch.dcr_needed = 0; |
437 | } else if (vcpu->arch.osi_needed) { | 432 | } else if (vcpu->arch.osi_needed) { |
438 | u64 *gprs = run->osi.gprs; | 433 | u64 *gprs = run->osi.gprs; |
439 | int i; | 434 | int i; |
440 | 435 | ||
441 | for (i = 0; i < 32; i++) | 436 | for (i = 0; i < 32; i++) |
442 | kvmppc_set_gpr(vcpu, i, gprs[i]); | 437 | kvmppc_set_gpr(vcpu, i, gprs[i]); |
443 | vcpu->arch.osi_needed = 0; | 438 | vcpu->arch.osi_needed = 0; |
444 | } | 439 | } |
445 | 440 | ||
446 | kvmppc_core_deliver_interrupts(vcpu); | 441 | kvmppc_core_deliver_interrupts(vcpu); |
447 | 442 | ||
448 | local_irq_disable(); | 443 | local_irq_disable(); |
449 | kvm_guest_enter(); | 444 | kvm_guest_enter(); |
450 | r = __kvmppc_vcpu_run(run, vcpu); | 445 | r = __kvmppc_vcpu_run(run, vcpu); |
451 | kvm_guest_exit(); | 446 | kvm_guest_exit(); |
452 | local_irq_enable(); | 447 | local_irq_enable(); |
453 | 448 | ||
454 | if (vcpu->sigset_active) | 449 | if (vcpu->sigset_active) |
455 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); | 450 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); |
456 | 451 | ||
457 | return r; | 452 | return r; |
458 | } | 453 | } |
459 | 454 | ||
460 | int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) | 455 | int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) |
461 | { | 456 | { |
462 | if (irq->irq == KVM_INTERRUPT_UNSET) | 457 | if (irq->irq == KVM_INTERRUPT_UNSET) |
463 | kvmppc_core_dequeue_external(vcpu, irq); | 458 | kvmppc_core_dequeue_external(vcpu, irq); |
464 | else | 459 | else |
465 | kvmppc_core_queue_external(vcpu, irq); | 460 | kvmppc_core_queue_external(vcpu, irq); |
466 | 461 | ||
467 | if (waitqueue_active(&vcpu->wq)) { | 462 | if (waitqueue_active(&vcpu->wq)) { |
468 | wake_up_interruptible(&vcpu->wq); | 463 | wake_up_interruptible(&vcpu->wq); |
469 | vcpu->stat.halt_wakeup++; | 464 | vcpu->stat.halt_wakeup++; |
470 | } | 465 | } |
471 | 466 | ||
472 | return 0; | 467 | return 0; |
473 | } | 468 | } |
474 | 469 | ||
475 | static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, | 470 | static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, |
476 | struct kvm_enable_cap *cap) | 471 | struct kvm_enable_cap *cap) |
477 | { | 472 | { |
478 | int r; | 473 | int r; |
479 | 474 | ||
480 | if (cap->flags) | 475 | if (cap->flags) |
481 | return -EINVAL; | 476 | return -EINVAL; |
482 | 477 | ||
483 | switch (cap->cap) { | 478 | switch (cap->cap) { |
484 | case KVM_CAP_PPC_OSI: | 479 | case KVM_CAP_PPC_OSI: |
485 | r = 0; | 480 | r = 0; |
486 | vcpu->arch.osi_enabled = true; | 481 | vcpu->arch.osi_enabled = true; |
487 | break; | 482 | break; |
488 | default: | 483 | default: |
489 | r = -EINVAL; | 484 | r = -EINVAL; |
490 | break; | 485 | break; |
491 | } | 486 | } |
492 | 487 | ||
493 | return r; | 488 | return r; |
494 | } | 489 | } |
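Since KVM_CAP_PPC_OSI is the only capability accepted here, a userspace caller would enable it per vcpu roughly as follows (a hedged sketch; vcpu_fd is assumed to be an open vcpu file descriptor, and flags must be zero or the kernel returns -EINVAL as shown above):

    #include <linux/kvm.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Enable OSI hypercall handling on one vcpu. Returns the ioctl
     * result: 0 on success, -1 with errno set otherwise. */
    static int enable_ppc_osi(int vcpu_fd)
    {
            struct kvm_enable_cap cap;

            memset(&cap, 0, sizeof(cap));   /* flags and args stay zero */
            cap.cap = KVM_CAP_PPC_OSI;
            return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
    }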
495 | 490 | ||
496 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, | 491 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, |
497 | struct kvm_mp_state *mp_state) | 492 | struct kvm_mp_state *mp_state) |
498 | { | 493 | { |
499 | return -EINVAL; | 494 | return -EINVAL; |
500 | } | 495 | } |
501 | 496 | ||
502 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, | 497 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, |
503 | struct kvm_mp_state *mp_state) | 498 | struct kvm_mp_state *mp_state) |
504 | { | 499 | { |
505 | return -EINVAL; | 500 | return -EINVAL; |
506 | } | 501 | } |
507 | 502 | ||
508 | long kvm_arch_vcpu_ioctl(struct file *filp, | 503 | long kvm_arch_vcpu_ioctl(struct file *filp, |
509 | unsigned int ioctl, unsigned long arg) | 504 | unsigned int ioctl, unsigned long arg) |
510 | { | 505 | { |
511 | struct kvm_vcpu *vcpu = filp->private_data; | 506 | struct kvm_vcpu *vcpu = filp->private_data; |
512 | void __user *argp = (void __user *)arg; | 507 | void __user *argp = (void __user *)arg; |
513 | long r; | 508 | long r; |
514 | 509 | ||
515 | switch (ioctl) { | 510 | switch (ioctl) { |
516 | case KVM_INTERRUPT: { | 511 | case KVM_INTERRUPT: { |
517 | struct kvm_interrupt irq; | 512 | struct kvm_interrupt irq; |
518 | r = -EFAULT; | 513 | r = -EFAULT; |
519 | if (copy_from_user(&irq, argp, sizeof(irq))) | 514 | if (copy_from_user(&irq, argp, sizeof(irq))) |
520 | goto out; | 515 | goto out; |
521 | r = kvm_vcpu_ioctl_interrupt(vcpu, &irq); | 516 | r = kvm_vcpu_ioctl_interrupt(vcpu, &irq); |
522 | goto out; | 517 | goto out; |
523 | } | 518 | } |
524 | 519 | ||
525 | case KVM_ENABLE_CAP: | 520 | case KVM_ENABLE_CAP: |
526 | { | 521 | { |
527 | struct kvm_enable_cap cap; | 522 | struct kvm_enable_cap cap; |
528 | r = -EFAULT; | 523 | r = -EFAULT; |
529 | if (copy_from_user(&cap, argp, sizeof(cap))) | 524 | if (copy_from_user(&cap, argp, sizeof(cap))) |
530 | goto out; | 525 | goto out; |
531 | r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap); | 526 | r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap); |
532 | break; | 527 | break; |
533 | } | 528 | } |
534 | default: | 529 | default: |
535 | r = -EINVAL; | 530 | r = -EINVAL; |
536 | } | 531 | } |
537 | 532 | ||
538 | out: | 533 | out: |
539 | return r; | 534 | return r; |
540 | } | 535 | } |
541 | 536 | ||
542 | long kvm_arch_vm_ioctl(struct file *filp, | 537 | long kvm_arch_vm_ioctl(struct file *filp, |
543 | unsigned int ioctl, unsigned long arg) | 538 | unsigned int ioctl, unsigned long arg) |
544 | { | 539 | { |
545 | long r; | 540 | long r; |
546 | 541 | ||
547 | switch (ioctl) { | 542 | switch (ioctl) { |
548 | default: | 543 | default: |
549 | r = -ENOTTY; | 544 | r = -ENOTTY; |
550 | } | 545 | } |
551 | 546 | ||
552 | return r; | 547 | return r; |
553 | } | 548 | } |
554 | 549 | ||
555 | int kvm_arch_init(void *opaque) | 550 | int kvm_arch_init(void *opaque) |
556 | { | 551 | { |
557 | return 0; | 552 | return 0; |
558 | } | 553 | } |
559 | 554 | ||
560 | void kvm_arch_exit(void) | 555 | void kvm_arch_exit(void) |
561 | { | 556 | { |
562 | } | 557 | } |
563 | 558 |
arch/s390/kvm/kvm-s390.c
1 | /* | 1 | /* |
2 | * s390host.c -- hosting zSeries kernel virtual machines | 2 | * s390host.c -- hosting zSeries kernel virtual machines |
3 | * | 3 | * |
4 | * Copyright IBM Corp. 2008,2009 | 4 | * Copyright IBM Corp. 2008,2009 |
5 | * | 5 | * |
6 | * This program is free software; you can redistribute it and/or modify | 6 | * This program is free software; you can redistribute it and/or modify |
7 | * it under the terms of the GNU General Public License (version 2 only) | 7 | * it under the terms of the GNU General Public License (version 2 only) |
8 | * as published by the Free Software Foundation. | 8 | * as published by the Free Software Foundation. |
9 | * | 9 | * |
10 | * Author(s): Carsten Otte <cotte@de.ibm.com> | 10 | * Author(s): Carsten Otte <cotte@de.ibm.com> |
11 | * Christian Borntraeger <borntraeger@de.ibm.com> | 11 | * Christian Borntraeger <borntraeger@de.ibm.com> |
12 | * Heiko Carstens <heiko.carstens@de.ibm.com> | 12 | * Heiko Carstens <heiko.carstens@de.ibm.com> |
13 | * Christian Ehrhardt <ehrhardt@de.ibm.com> | 13 | * Christian Ehrhardt <ehrhardt@de.ibm.com> |
14 | */ | 14 | */ |
15 | 15 | ||
16 | #include <linux/compiler.h> | 16 | #include <linux/compiler.h> |
17 | #include <linux/err.h> | 17 | #include <linux/err.h> |
18 | #include <linux/fs.h> | 18 | #include <linux/fs.h> |
19 | #include <linux/hrtimer.h> | 19 | #include <linux/hrtimer.h> |
20 | #include <linux/init.h> | 20 | #include <linux/init.h> |
21 | #include <linux/kvm.h> | 21 | #include <linux/kvm.h> |
22 | #include <linux/kvm_host.h> | 22 | #include <linux/kvm_host.h> |
23 | #include <linux/module.h> | 23 | #include <linux/module.h> |
24 | #include <linux/slab.h> | 24 | #include <linux/slab.h> |
25 | #include <linux/timer.h> | 25 | #include <linux/timer.h> |
26 | #include <asm/asm-offsets.h> | 26 | #include <asm/asm-offsets.h> |
27 | #include <asm/lowcore.h> | 27 | #include <asm/lowcore.h> |
28 | #include <asm/pgtable.h> | 28 | #include <asm/pgtable.h> |
29 | #include <asm/nmi.h> | 29 | #include <asm/nmi.h> |
30 | #include <asm/system.h> | 30 | #include <asm/system.h> |
31 | #include "kvm-s390.h" | 31 | #include "kvm-s390.h" |
32 | #include "gaccess.h" | 32 | #include "gaccess.h" |
33 | 33 | ||
34 | #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU | 34 | #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU |
35 | 35 | ||
36 | struct kvm_stats_debugfs_item debugfs_entries[] = { | 36 | struct kvm_stats_debugfs_item debugfs_entries[] = { |
37 | { "userspace_handled", VCPU_STAT(exit_userspace) }, | 37 | { "userspace_handled", VCPU_STAT(exit_userspace) }, |
38 | { "exit_null", VCPU_STAT(exit_null) }, | 38 | { "exit_null", VCPU_STAT(exit_null) }, |
39 | { "exit_validity", VCPU_STAT(exit_validity) }, | 39 | { "exit_validity", VCPU_STAT(exit_validity) }, |
40 | { "exit_stop_request", VCPU_STAT(exit_stop_request) }, | 40 | { "exit_stop_request", VCPU_STAT(exit_stop_request) }, |
41 | { "exit_external_request", VCPU_STAT(exit_external_request) }, | 41 | { "exit_external_request", VCPU_STAT(exit_external_request) }, |
42 | { "exit_external_interrupt", VCPU_STAT(exit_external_interrupt) }, | 42 | { "exit_external_interrupt", VCPU_STAT(exit_external_interrupt) }, |
43 | { "exit_instruction", VCPU_STAT(exit_instruction) }, | 43 | { "exit_instruction", VCPU_STAT(exit_instruction) }, |
44 | { "exit_program_interruption", VCPU_STAT(exit_program_interruption) }, | 44 | { "exit_program_interruption", VCPU_STAT(exit_program_interruption) }, |
45 | { "exit_instr_and_program_int", VCPU_STAT(exit_instr_and_program) }, | 45 | { "exit_instr_and_program_int", VCPU_STAT(exit_instr_and_program) }, |
46 | { "instruction_lctlg", VCPU_STAT(instruction_lctlg) }, | 46 | { "instruction_lctlg", VCPU_STAT(instruction_lctlg) }, |
47 | { "instruction_lctl", VCPU_STAT(instruction_lctl) }, | 47 | { "instruction_lctl", VCPU_STAT(instruction_lctl) }, |
48 | { "deliver_emergency_signal", VCPU_STAT(deliver_emergency_signal) }, | 48 | { "deliver_emergency_signal", VCPU_STAT(deliver_emergency_signal) }, |
49 | { "deliver_service_signal", VCPU_STAT(deliver_service_signal) }, | 49 | { "deliver_service_signal", VCPU_STAT(deliver_service_signal) }, |
50 | { "deliver_virtio_interrupt", VCPU_STAT(deliver_virtio_interrupt) }, | 50 | { "deliver_virtio_interrupt", VCPU_STAT(deliver_virtio_interrupt) }, |
51 | { "deliver_stop_signal", VCPU_STAT(deliver_stop_signal) }, | 51 | { "deliver_stop_signal", VCPU_STAT(deliver_stop_signal) }, |
52 | { "deliver_prefix_signal", VCPU_STAT(deliver_prefix_signal) }, | 52 | { "deliver_prefix_signal", VCPU_STAT(deliver_prefix_signal) }, |
53 | { "deliver_restart_signal", VCPU_STAT(deliver_restart_signal) }, | 53 | { "deliver_restart_signal", VCPU_STAT(deliver_restart_signal) }, |
54 | { "deliver_program_interruption", VCPU_STAT(deliver_program_int) }, | 54 | { "deliver_program_interruption", VCPU_STAT(deliver_program_int) }, |
55 | { "exit_wait_state", VCPU_STAT(exit_wait_state) }, | 55 | { "exit_wait_state", VCPU_STAT(exit_wait_state) }, |
56 | { "instruction_stidp", VCPU_STAT(instruction_stidp) }, | 56 | { "instruction_stidp", VCPU_STAT(instruction_stidp) }, |
57 | { "instruction_spx", VCPU_STAT(instruction_spx) }, | 57 | { "instruction_spx", VCPU_STAT(instruction_spx) }, |
58 | { "instruction_stpx", VCPU_STAT(instruction_stpx) }, | 58 | { "instruction_stpx", VCPU_STAT(instruction_stpx) }, |
59 | { "instruction_stap", VCPU_STAT(instruction_stap) }, | 59 | { "instruction_stap", VCPU_STAT(instruction_stap) }, |
60 | { "instruction_storage_key", VCPU_STAT(instruction_storage_key) }, | 60 | { "instruction_storage_key", VCPU_STAT(instruction_storage_key) }, |
61 | { "instruction_stsch", VCPU_STAT(instruction_stsch) }, | 61 | { "instruction_stsch", VCPU_STAT(instruction_stsch) }, |
62 | { "instruction_chsc", VCPU_STAT(instruction_chsc) }, | 62 | { "instruction_chsc", VCPU_STAT(instruction_chsc) }, |
63 | { "instruction_stsi", VCPU_STAT(instruction_stsi) }, | 63 | { "instruction_stsi", VCPU_STAT(instruction_stsi) }, |
64 | { "instruction_stfl", VCPU_STAT(instruction_stfl) }, | 64 | { "instruction_stfl", VCPU_STAT(instruction_stfl) }, |
65 | { "instruction_sigp_sense", VCPU_STAT(instruction_sigp_sense) }, | 65 | { "instruction_sigp_sense", VCPU_STAT(instruction_sigp_sense) }, |
66 | { "instruction_sigp_emergency", VCPU_STAT(instruction_sigp_emergency) }, | 66 | { "instruction_sigp_emergency", VCPU_STAT(instruction_sigp_emergency) }, |
67 | { "instruction_sigp_stop", VCPU_STAT(instruction_sigp_stop) }, | 67 | { "instruction_sigp_stop", VCPU_STAT(instruction_sigp_stop) }, |
68 | { "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) }, | 68 | { "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) }, |
69 | { "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) }, | 69 | { "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) }, |
70 | { "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) }, | 70 | { "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) }, |
71 | { "diagnose_44", VCPU_STAT(diagnose_44) }, | 71 | { "diagnose_44", VCPU_STAT(diagnose_44) }, |
72 | { NULL } | 72 | { NULL } |
73 | }; | 73 | }; |
74 | 74 | ||
75 | static unsigned long long *facilities; | 75 | static unsigned long long *facilities; |
76 | 76 | ||
77 | /* Section: not file related */ | 77 | /* Section: not file related */ |
78 | int kvm_arch_hardware_enable(void *garbage) | 78 | int kvm_arch_hardware_enable(void *garbage) |
79 | { | 79 | { |
80 | /* every s390 is virtualization enabled ;-) */ | 80 | /* every s390 is virtualization enabled ;-) */ |
81 | return 0; | 81 | return 0; |
82 | } | 82 | } |
83 | 83 | ||
84 | void kvm_arch_hardware_disable(void *garbage) | 84 | void kvm_arch_hardware_disable(void *garbage) |
85 | { | 85 | { |
86 | } | 86 | } |
87 | 87 | ||
88 | int kvm_arch_hardware_setup(void) | 88 | int kvm_arch_hardware_setup(void) |
89 | { | 89 | { |
90 | return 0; | 90 | return 0; |
91 | } | 91 | } |
92 | 92 | ||
93 | void kvm_arch_hardware_unsetup(void) | 93 | void kvm_arch_hardware_unsetup(void) |
94 | { | 94 | { |
95 | } | 95 | } |
96 | 96 | ||
97 | void kvm_arch_check_processor_compat(void *rtn) | 97 | void kvm_arch_check_processor_compat(void *rtn) |
98 | { | 98 | { |
99 | } | 99 | } |
100 | 100 | ||
101 | int kvm_arch_init(void *opaque) | 101 | int kvm_arch_init(void *opaque) |
102 | { | 102 | { |
103 | return 0; | 103 | return 0; |
104 | } | 104 | } |
105 | 105 | ||
106 | void kvm_arch_exit(void) | 106 | void kvm_arch_exit(void) |
107 | { | 107 | { |
108 | } | 108 | } |
109 | 109 | ||
110 | /* Section: device related */ | 110 | /* Section: device related */ |
111 | long kvm_arch_dev_ioctl(struct file *filp, | 111 | long kvm_arch_dev_ioctl(struct file *filp, |
112 | unsigned int ioctl, unsigned long arg) | 112 | unsigned int ioctl, unsigned long arg) |
113 | { | 113 | { |
114 | if (ioctl == KVM_S390_ENABLE_SIE) | 114 | if (ioctl == KVM_S390_ENABLE_SIE) |
115 | return s390_enable_sie(); | 115 | return s390_enable_sie(); |
116 | return -EINVAL; | 116 | return -EINVAL; |
117 | } | 117 | } |
118 | 118 | ||
119 | int kvm_dev_ioctl_check_extension(long ext) | 119 | int kvm_dev_ioctl_check_extension(long ext) |
120 | { | 120 | { |
121 | int r; | 121 | int r; |
122 | 122 | ||
123 | switch (ext) { | 123 | switch (ext) { |
124 | case KVM_CAP_S390_PSW: | 124 | case KVM_CAP_S390_PSW: |
125 | r = 1; | 125 | r = 1; |
126 | break; | 126 | break; |
127 | default: | 127 | default: |
128 | r = 0; | 128 | r = 0; |
129 | } | 129 | } |
130 | return r; | 130 | return r; |
131 | } | 131 | } |
132 | 132 | ||
133 | /* Section: vm related */ | 133 | /* Section: vm related */ |
134 | /* | 134 | /* |
135 | * Get (and clear) the dirty memory log for a memory slot. | 135 | * Get (and clear) the dirty memory log for a memory slot. |
136 | */ | 136 | */ |
137 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, | 137 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, |
138 | struct kvm_dirty_log *log) | 138 | struct kvm_dirty_log *log) |
139 | { | 139 | { |
140 | return 0; | 140 | return 0; |
141 | } | 141 | } |
142 | 142 | ||
143 | long kvm_arch_vm_ioctl(struct file *filp, | 143 | long kvm_arch_vm_ioctl(struct file *filp, |
144 | unsigned int ioctl, unsigned long arg) | 144 | unsigned int ioctl, unsigned long arg) |
145 | { | 145 | { |
146 | struct kvm *kvm = filp->private_data; | 146 | struct kvm *kvm = filp->private_data; |
147 | void __user *argp = (void __user *)arg; | 147 | void __user *argp = (void __user *)arg; |
148 | int r; | 148 | int r; |
149 | 149 | ||
150 | switch (ioctl) { | 150 | switch (ioctl) { |
151 | case KVM_S390_INTERRUPT: { | 151 | case KVM_S390_INTERRUPT: { |
152 | struct kvm_s390_interrupt s390int; | 152 | struct kvm_s390_interrupt s390int; |
153 | 153 | ||
154 | r = -EFAULT; | 154 | r = -EFAULT; |
155 | if (copy_from_user(&s390int, argp, sizeof(s390int))) | 155 | if (copy_from_user(&s390int, argp, sizeof(s390int))) |
156 | break; | 156 | break; |
157 | r = kvm_s390_inject_vm(kvm, &s390int); | 157 | r = kvm_s390_inject_vm(kvm, &s390int); |
158 | break; | 158 | break; |
159 | } | 159 | } |
160 | default: | 160 | default: |
161 | r = -ENOTTY; | 161 | r = -ENOTTY; |
162 | } | 162 | } |
163 | 163 | ||
164 | return r; | 164 | return r; |
165 | } | 165 | } |
166 | 166 | ||
167 | struct kvm *kvm_arch_create_vm(void) | 167 | struct kvm *kvm_arch_create_vm(void) |
168 | { | 168 | { |
169 | struct kvm *kvm; | 169 | struct kvm *kvm; |
170 | int rc; | 170 | int rc; |
171 | char debug_name[16]; | 171 | char debug_name[16]; |
172 | 172 | ||
173 | rc = s390_enable_sie(); | 173 | rc = s390_enable_sie(); |
174 | if (rc) | 174 | if (rc) |
175 | goto out_nokvm; | 175 | goto out_nokvm; |
176 | 176 | ||
177 | rc = -ENOMEM; | 177 | rc = -ENOMEM; |
178 | kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); | 178 | kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); |
179 | if (!kvm) | 179 | if (!kvm) |
180 | goto out_nokvm; | 180 | goto out_nokvm; |
181 | 181 | ||
182 | kvm->arch.sca = (struct sca_block *) get_zeroed_page(GFP_KERNEL); | 182 | kvm->arch.sca = (struct sca_block *) get_zeroed_page(GFP_KERNEL); |
183 | if (!kvm->arch.sca) | 183 | if (!kvm->arch.sca) |
184 | goto out_nosca; | 184 | goto out_nosca; |
185 | 185 | ||
186 | sprintf(debug_name, "kvm-%u", current->pid); | 186 | sprintf(debug_name, "kvm-%u", current->pid); |
187 | 187 | ||
188 | kvm->arch.dbf = debug_register(debug_name, 8, 2, 8 * sizeof(long)); | 188 | kvm->arch.dbf = debug_register(debug_name, 8, 2, 8 * sizeof(long)); |
189 | if (!kvm->arch.dbf) | 189 | if (!kvm->arch.dbf) |
190 | goto out_nodbf; | 190 | goto out_nodbf; |
191 | 191 | ||
192 | spin_lock_init(&kvm->arch.float_int.lock); | 192 | spin_lock_init(&kvm->arch.float_int.lock); |
193 | INIT_LIST_HEAD(&kvm->arch.float_int.list); | 193 | INIT_LIST_HEAD(&kvm->arch.float_int.list); |
194 | 194 | ||
195 | debug_register_view(kvm->arch.dbf, &debug_sprintf_view); | 195 | debug_register_view(kvm->arch.dbf, &debug_sprintf_view); |
196 | VM_EVENT(kvm, 3, "%s", "vm created"); | 196 | VM_EVENT(kvm, 3, "%s", "vm created"); |
197 | 197 | ||
198 | return kvm; | 198 | return kvm; |
199 | out_nodbf: | 199 | out_nodbf: |
200 | free_page((unsigned long)(kvm->arch.sca)); | 200 | free_page((unsigned long)(kvm->arch.sca)); |
201 | out_nosca: | 201 | out_nosca: |
202 | kfree(kvm); | 202 | kfree(kvm); |
203 | out_nokvm: | 203 | out_nokvm: |
204 | return ERR_PTR(rc); | 204 | return ERR_PTR(rc); |
205 | } | 205 | } |
206 | 206 | ||
207 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) | 207 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) |
208 | { | 208 | { |
209 | VCPU_EVENT(vcpu, 3, "%s", "free cpu"); | 209 | VCPU_EVENT(vcpu, 3, "%s", "free cpu"); |
210 | clear_bit(63 - vcpu->vcpu_id, (unsigned long *) &vcpu->kvm->arch.sca->mcn); | 210 | clear_bit(63 - vcpu->vcpu_id, (unsigned long *) &vcpu->kvm->arch.sca->mcn); |
211 | if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda == | 211 | if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda == |
212 | (__u64) vcpu->arch.sie_block) | 212 | (__u64) vcpu->arch.sie_block) |
213 | vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0; | 213 | vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0; |
214 | smp_mb(); | 214 | smp_mb(); |
215 | free_page((unsigned long)(vcpu->arch.sie_block)); | 215 | free_page((unsigned long)(vcpu->arch.sie_block)); |
216 | kvm_vcpu_uninit(vcpu); | 216 | kvm_vcpu_uninit(vcpu); |
217 | kfree(vcpu); | 217 | kfree(vcpu); |
218 | } | 218 | } |
219 | 219 | ||
220 | static void kvm_free_vcpus(struct kvm *kvm) | 220 | static void kvm_free_vcpus(struct kvm *kvm) |
221 | { | 221 | { |
222 | unsigned int i; | 222 | unsigned int i; |
223 | struct kvm_vcpu *vcpu; | 223 | struct kvm_vcpu *vcpu; |
224 | 224 | ||
225 | kvm_for_each_vcpu(i, vcpu, kvm) | 225 | kvm_for_each_vcpu(i, vcpu, kvm) |
226 | kvm_arch_vcpu_destroy(vcpu); | 226 | kvm_arch_vcpu_destroy(vcpu); |
227 | 227 | ||
228 | mutex_lock(&kvm->lock); | 228 | mutex_lock(&kvm->lock); |
229 | for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) | 229 | for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) |
230 | kvm->vcpus[i] = NULL; | 230 | kvm->vcpus[i] = NULL; |
231 | 231 | ||
232 | atomic_set(&kvm->online_vcpus, 0); | 232 | atomic_set(&kvm->online_vcpus, 0); |
233 | mutex_unlock(&kvm->lock); | 233 | mutex_unlock(&kvm->lock); |
234 | } | 234 | } |
235 | 235 | ||
236 | void kvm_arch_sync_events(struct kvm *kvm) | 236 | void kvm_arch_sync_events(struct kvm *kvm) |
237 | { | 237 | { |
238 | } | 238 | } |
239 | 239 | ||
240 | void kvm_arch_destroy_vm(struct kvm *kvm) | 240 | void kvm_arch_destroy_vm(struct kvm *kvm) |
241 | { | 241 | { |
242 | kvm_free_vcpus(kvm); | 242 | kvm_free_vcpus(kvm); |
243 | kvm_free_physmem(kvm); | 243 | kvm_free_physmem(kvm); |
244 | free_page((unsigned long)(kvm->arch.sca)); | 244 | free_page((unsigned long)(kvm->arch.sca)); |
245 | debug_unregister(kvm->arch.dbf); | 245 | debug_unregister(kvm->arch.dbf); |
246 | cleanup_srcu_struct(&kvm->srcu); | 246 | cleanup_srcu_struct(&kvm->srcu); |
247 | kfree(kvm); | 247 | kfree(kvm); |
248 | } | 248 | } |
249 | 249 | ||
250 | /* Section: vcpu related */ | 250 | /* Section: vcpu related */ |
251 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) | 251 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) |
252 | { | 252 | { |
253 | return 0; | 253 | return 0; |
254 | } | 254 | } |
255 | 255 | ||
256 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) | 256 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) |
257 | { | 257 | { |
258 | /* Nothing to do */ | 258 | /* Nothing to do */ |
259 | } | 259 | } |
260 | 260 | ||
261 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) | 261 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) |
262 | { | 262 | { |
263 | save_fp_regs(&vcpu->arch.host_fpregs); | 263 | save_fp_regs(&vcpu->arch.host_fpregs); |
264 | save_access_regs(vcpu->arch.host_acrs); | 264 | save_access_regs(vcpu->arch.host_acrs); |
265 | vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK; | 265 | vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK; |
266 | restore_fp_regs(&vcpu->arch.guest_fpregs); | 266 | restore_fp_regs(&vcpu->arch.guest_fpregs); |
267 | restore_access_regs(vcpu->arch.guest_acrs); | 267 | restore_access_regs(vcpu->arch.guest_acrs); |
268 | } | 268 | } |
269 | 269 | ||
270 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) | 270 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) |
271 | { | 271 | { |
272 | save_fp_regs(&vcpu->arch.guest_fpregs); | 272 | save_fp_regs(&vcpu->arch.guest_fpregs); |
273 | save_access_regs(vcpu->arch.guest_acrs); | 273 | save_access_regs(vcpu->arch.guest_acrs); |
274 | restore_fp_regs(&vcpu->arch.host_fpregs); | 274 | restore_fp_regs(&vcpu->arch.host_fpregs); |
275 | restore_access_regs(vcpu->arch.host_acrs); | 275 | restore_access_regs(vcpu->arch.host_acrs); |
276 | } | 276 | } |
277 | 277 | ||
278 | static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu) | 278 | static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu) |
279 | { | 279 | { |
280 | /* this equals the initial cpu reset in POP, but we don't switch to ESA */ | 280 | /* this equals the initial cpu reset in POP, but we don't switch to ESA */ |
281 | vcpu->arch.sie_block->gpsw.mask = 0UL; | 281 | vcpu->arch.sie_block->gpsw.mask = 0UL; |
282 | vcpu->arch.sie_block->gpsw.addr = 0UL; | 282 | vcpu->arch.sie_block->gpsw.addr = 0UL; |
283 | vcpu->arch.sie_block->prefix = 0UL; | 283 | vcpu->arch.sie_block->prefix = 0UL; |
284 | vcpu->arch.sie_block->ihcpu = 0xffff; | 284 | vcpu->arch.sie_block->ihcpu = 0xffff; |
285 | vcpu->arch.sie_block->cputm = 0UL; | 285 | vcpu->arch.sie_block->cputm = 0UL; |
286 | vcpu->arch.sie_block->ckc = 0UL; | 286 | vcpu->arch.sie_block->ckc = 0UL; |
287 | vcpu->arch.sie_block->todpr = 0; | 287 | vcpu->arch.sie_block->todpr = 0; |
288 | memset(vcpu->arch.sie_block->gcr, 0, 16 * sizeof(__u64)); | 288 | memset(vcpu->arch.sie_block->gcr, 0, 16 * sizeof(__u64)); |
289 | vcpu->arch.sie_block->gcr[0] = 0xE0UL; | 289 | vcpu->arch.sie_block->gcr[0] = 0xE0UL; |
290 | vcpu->arch.sie_block->gcr[14] = 0xC2000000UL; | 290 | vcpu->arch.sie_block->gcr[14] = 0xC2000000UL; |
291 | vcpu->arch.guest_fpregs.fpc = 0; | 291 | vcpu->arch.guest_fpregs.fpc = 0; |
292 | asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc)); | 292 | asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc)); |
293 | vcpu->arch.sie_block->gbea = 1; | 293 | vcpu->arch.sie_block->gbea = 1; |
294 | } | 294 | } |
295 | 295 | ||
296 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) | 296 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) |
297 | { | 297 | { |
298 | atomic_set(&vcpu->arch.sie_block->cpuflags, CPUSTAT_ZARCH); | 298 | atomic_set(&vcpu->arch.sie_block->cpuflags, CPUSTAT_ZARCH); |
299 | set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests); | 299 | set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests); |
300 | vcpu->arch.sie_block->ecb = 6; | 300 | vcpu->arch.sie_block->ecb = 6; |
301 | vcpu->arch.sie_block->eca = 0xC1002001U; | 301 | vcpu->arch.sie_block->eca = 0xC1002001U; |
302 | vcpu->arch.sie_block->fac = (int) (long) facilities; | 302 | vcpu->arch.sie_block->fac = (int) (long) facilities; |
303 | hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); | 303 | hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); |
304 | tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet, | 304 | tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet, |
305 | (unsigned long) vcpu); | 305 | (unsigned long) vcpu); |
306 | vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup; | 306 | vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup; |
307 | get_cpu_id(&vcpu->arch.cpu_id); | 307 | get_cpu_id(&vcpu->arch.cpu_id); |
308 | vcpu->arch.cpu_id.version = 0xff; | 308 | vcpu->arch.cpu_id.version = 0xff; |
309 | return 0; | 309 | return 0; |
310 | } | 310 | } |
311 | 311 | ||
312 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, | 312 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, |
313 | unsigned int id) | 313 | unsigned int id) |
314 | { | 314 | { |
315 | struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); | 315 | struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); |
316 | int rc = -ENOMEM; | 316 | int rc = -ENOMEM; |
317 | 317 | ||
318 | if (!vcpu) | 318 | if (!vcpu) |
319 | goto out_nomem; | 319 | goto out_nomem; |
320 | 320 | ||
321 | vcpu->arch.sie_block = (struct kvm_s390_sie_block *) | 321 | vcpu->arch.sie_block = (struct kvm_s390_sie_block *) |
322 | get_zeroed_page(GFP_KERNEL); | 322 | get_zeroed_page(GFP_KERNEL); |
323 | 323 | ||
324 | if (!vcpu->arch.sie_block) | 324 | if (!vcpu->arch.sie_block) |
325 | goto out_free_cpu; | 325 | goto out_free_cpu; |
326 | 326 | ||
327 | vcpu->arch.sie_block->icpua = id; | 327 | vcpu->arch.sie_block->icpua = id; |
328 | BUG_ON(!kvm->arch.sca); | 328 | BUG_ON(!kvm->arch.sca); |
329 | if (!kvm->arch.sca->cpu[id].sda) | 329 | if (!kvm->arch.sca->cpu[id].sda) |
330 | kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block; | 330 | kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block; |
331 | vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32); | 331 | vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32); |
332 | vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca; | 332 | vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca; |
333 | set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn); | 333 | set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn); |
334 | 334 | ||
335 | spin_lock_init(&vcpu->arch.local_int.lock); | 335 | spin_lock_init(&vcpu->arch.local_int.lock); |
336 | INIT_LIST_HEAD(&vcpu->arch.local_int.list); | 336 | INIT_LIST_HEAD(&vcpu->arch.local_int.list); |
337 | vcpu->arch.local_int.float_int = &kvm->arch.float_int; | 337 | vcpu->arch.local_int.float_int = &kvm->arch.float_int; |
338 | spin_lock(&kvm->arch.float_int.lock); | 338 | spin_lock(&kvm->arch.float_int.lock); |
339 | kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int; | 339 | kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int; |
340 | init_waitqueue_head(&vcpu->arch.local_int.wq); | 340 | init_waitqueue_head(&vcpu->arch.local_int.wq); |
341 | vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags; | 341 | vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags; |
342 | spin_unlock(&kvm->arch.float_int.lock); | 342 | spin_unlock(&kvm->arch.float_int.lock); |
343 | 343 | ||
344 | rc = kvm_vcpu_init(vcpu, kvm, id); | 344 | rc = kvm_vcpu_init(vcpu, kvm, id); |
345 | if (rc) | 345 | if (rc) |
346 | goto out_free_sie_block; | 346 | goto out_free_sie_block; |
347 | VM_EVENT(kvm, 3, "create cpu %d at %p, sie block at %p", id, vcpu, | 347 | VM_EVENT(kvm, 3, "create cpu %d at %p, sie block at %p", id, vcpu, |
348 | vcpu->arch.sie_block); | 348 | vcpu->arch.sie_block); |
349 | 349 | ||
350 | return vcpu; | 350 | return vcpu; |
351 | out_free_sie_block: | 351 | out_free_sie_block: |
352 | free_page((unsigned long)(vcpu->arch.sie_block)); | 352 | free_page((unsigned long)(vcpu->arch.sie_block)); |
353 | out_free_cpu: | 353 | out_free_cpu: |
354 | kfree(vcpu); | 354 | kfree(vcpu); |
355 | out_nomem: | 355 | out_nomem: |
356 | return ERR_PTR(rc); | 356 | return ERR_PTR(rc); |
357 | } | 357 | } |
358 | 358 | ||
359 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) | 359 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) |
360 | { | 360 | { |
361 | /* kvm common code refers to this, but never calls it */ | 361 | /* kvm common code refers to this, but never calls it */ |
362 | BUG(); | 362 | BUG(); |
363 | return 0; | 363 | return 0; |
364 | } | 364 | } |
365 | 365 | ||
366 | static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu) | 366 | static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu) |
367 | { | 367 | { |
368 | kvm_s390_vcpu_initial_reset(vcpu); | 368 | kvm_s390_vcpu_initial_reset(vcpu); |
369 | return 0; | 369 | return 0; |
370 | } | 370 | } |
371 | 371 | ||
372 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) | 372 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) |
373 | { | 373 | { |
374 | memcpy(&vcpu->arch.guest_gprs, ®s->gprs, sizeof(regs->gprs)); | 374 | memcpy(&vcpu->arch.guest_gprs, ®s->gprs, sizeof(regs->gprs)); |
375 | return 0; | 375 | return 0; |
376 | } | 376 | } |
377 | 377 | ||
378 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) | 378 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) |
379 | { | 379 | { |
380 | memcpy(®s->gprs, &vcpu->arch.guest_gprs, sizeof(regs->gprs)); | 380 | memcpy(®s->gprs, &vcpu->arch.guest_gprs, sizeof(regs->gprs)); |
381 | return 0; | 381 | return 0; |
382 | } | 382 | } |
383 | 383 | ||
384 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, | 384 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, |
385 | struct kvm_sregs *sregs) | 385 | struct kvm_sregs *sregs) |
386 | { | 386 | { |
387 | memcpy(&vcpu->arch.guest_acrs, &sregs->acrs, sizeof(sregs->acrs)); | 387 | memcpy(&vcpu->arch.guest_acrs, &sregs->acrs, sizeof(sregs->acrs)); |
388 | memcpy(&vcpu->arch.sie_block->gcr, &sregs->crs, sizeof(sregs->crs)); | 388 | memcpy(&vcpu->arch.sie_block->gcr, &sregs->crs, sizeof(sregs->crs)); |
389 | return 0; | 389 | return 0; |
390 | } | 390 | } |
391 | 391 | ||
392 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, | 392 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, |
393 | struct kvm_sregs *sregs) | 393 | struct kvm_sregs *sregs) |
394 | { | 394 | { |
395 | memcpy(&sregs->acrs, &vcpu->arch.guest_acrs, sizeof(sregs->acrs)); | 395 | memcpy(&sregs->acrs, &vcpu->arch.guest_acrs, sizeof(sregs->acrs)); |
396 | memcpy(&sregs->crs, &vcpu->arch.sie_block->gcr, sizeof(sregs->crs)); | 396 | memcpy(&sregs->crs, &vcpu->arch.sie_block->gcr, sizeof(sregs->crs)); |
397 | return 0; | 397 | return 0; |
398 | } | 398 | } |
399 | 399 | ||
400 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) | 400 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) |
401 | { | 401 | { |
402 | memcpy(&vcpu->arch.guest_fpregs.fprs, &fpu->fprs, sizeof(fpu->fprs)); | 402 | memcpy(&vcpu->arch.guest_fpregs.fprs, &fpu->fprs, sizeof(fpu->fprs)); |
403 | vcpu->arch.guest_fpregs.fpc = fpu->fpc; | 403 | vcpu->arch.guest_fpregs.fpc = fpu->fpc; |
404 | return 0; | 404 | return 0; |
405 | } | 405 | } |
406 | 406 | ||
407 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) | 407 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) |
408 | { | 408 | { |
409 | memcpy(&fpu->fprs, &vcpu->arch.guest_fpregs.fprs, sizeof(fpu->fprs)); | 409 | memcpy(&fpu->fprs, &vcpu->arch.guest_fpregs.fprs, sizeof(fpu->fprs)); |
410 | fpu->fpc = vcpu->arch.guest_fpregs.fpc; | 410 | fpu->fpc = vcpu->arch.guest_fpregs.fpc; |
411 | return 0; | 411 | return 0; |
412 | } | 412 | } |
413 | 413 | ||
414 | static int kvm_arch_vcpu_ioctl_set_initial_psw(struct kvm_vcpu *vcpu, psw_t psw) | 414 | static int kvm_arch_vcpu_ioctl_set_initial_psw(struct kvm_vcpu *vcpu, psw_t psw) |
415 | { | 415 | { |
416 | int rc = 0; | 416 | int rc = 0; |
417 | 417 | ||
418 | if (atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_RUNNING) | 418 | if (atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_RUNNING) |
419 | rc = -EBUSY; | 419 | rc = -EBUSY; |
420 | else { | 420 | else { |
421 | vcpu->run->psw_mask = psw.mask; | 421 | vcpu->run->psw_mask = psw.mask; |
422 | vcpu->run->psw_addr = psw.addr; | 422 | vcpu->run->psw_addr = psw.addr; |
423 | } | 423 | } |
424 | return rc; | 424 | return rc; |
425 | } | 425 | } |
426 | 426 | ||
427 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, | 427 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, |
428 | struct kvm_translation *tr) | 428 | struct kvm_translation *tr) |
429 | { | 429 | { |
430 | return -EINVAL; /* not implemented yet */ | 430 | return -EINVAL; /* not implemented yet */ |
431 | } | 431 | } |
432 | 432 | ||
433 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, | 433 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, |
434 | struct kvm_guest_debug *dbg) | 434 | struct kvm_guest_debug *dbg) |
435 | { | 435 | { |
436 | return -EINVAL; /* not implemented yet */ | 436 | return -EINVAL; /* not implemented yet */ |
437 | } | 437 | } |
438 | 438 | ||
439 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, | 439 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, |
440 | struct kvm_mp_state *mp_state) | 440 | struct kvm_mp_state *mp_state) |
441 | { | 441 | { |
442 | return -EINVAL; /* not implemented yet */ | 442 | return -EINVAL; /* not implemented yet */ |
443 | } | 443 | } |
444 | 444 | ||
445 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, | 445 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, |
446 | struct kvm_mp_state *mp_state) | 446 | struct kvm_mp_state *mp_state) |
447 | { | 447 | { |
448 | return -EINVAL; /* not implemented yet */ | 448 | return -EINVAL; /* not implemented yet */ |
449 | } | 449 | } |
450 | 450 | ||
451 | static void __vcpu_run(struct kvm_vcpu *vcpu) | 451 | static void __vcpu_run(struct kvm_vcpu *vcpu) |
452 | { | 452 | { |
453 | memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16); | 453 | memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16); |
454 | 454 | ||
455 | if (need_resched()) | 455 | if (need_resched()) |
456 | schedule(); | 456 | schedule(); |
457 | 457 | ||
458 | if (test_thread_flag(TIF_MCCK_PENDING)) | 458 | if (test_thread_flag(TIF_MCCK_PENDING)) |
459 | s390_handle_mcck(); | 459 | s390_handle_mcck(); |
460 | 460 | ||
461 | kvm_s390_deliver_pending_interrupts(vcpu); | 461 | kvm_s390_deliver_pending_interrupts(vcpu); |
462 | 462 | ||
463 | vcpu->arch.sie_block->icptcode = 0; | 463 | vcpu->arch.sie_block->icptcode = 0; |
464 | local_irq_disable(); | 464 | local_irq_disable(); |
465 | kvm_guest_enter(); | 465 | kvm_guest_enter(); |
466 | local_irq_enable(); | 466 | local_irq_enable(); |
467 | VCPU_EVENT(vcpu, 6, "entering sie flags %x", | 467 | VCPU_EVENT(vcpu, 6, "entering sie flags %x", |
468 | atomic_read(&vcpu->arch.sie_block->cpuflags)); | 468 | atomic_read(&vcpu->arch.sie_block->cpuflags)); |
469 | if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) { | 469 | if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) { |
470 | VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction"); | 470 | VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction"); |
471 | kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); | 471 | kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); |
472 | } | 472 | } |
473 | VCPU_EVENT(vcpu, 6, "exit sie icptcode %d", | 473 | VCPU_EVENT(vcpu, 6, "exit sie icptcode %d", |
474 | vcpu->arch.sie_block->icptcode); | 474 | vcpu->arch.sie_block->icptcode); |
475 | local_irq_disable(); | 475 | local_irq_disable(); |
476 | kvm_guest_exit(); | 476 | kvm_guest_exit(); |
477 | local_irq_enable(); | 477 | local_irq_enable(); |
478 | 478 | ||
479 | memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16); | 479 | memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16); |
480 | } | 480 | } |
481 | 481 | ||
482 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 482 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
483 | { | 483 | { |
484 | int rc; | 484 | int rc; |
485 | sigset_t sigsaved; | 485 | sigset_t sigsaved; |
486 | 486 | ||
487 | rerun_vcpu: | 487 | rerun_vcpu: |
488 | if (vcpu->requests) | 488 | if (vcpu->requests) |
489 | if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) | 489 | if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) |
490 | kvm_s390_vcpu_set_mem(vcpu); | 490 | kvm_s390_vcpu_set_mem(vcpu); |
491 | 491 | ||
492 | /* verify that memory has been registered */ | 492 | /* verify that memory has been registered */ |
493 | if (!vcpu->arch.sie_block->gmslm) { | 493 | if (!vcpu->arch.sie_block->gmslm) { |
494 | vcpu_put(vcpu); | 494 | vcpu_put(vcpu); |
495 | VCPU_EVENT(vcpu, 3, "%s", "no memory registered to run vcpu"); | 495 | VCPU_EVENT(vcpu, 3, "%s", "no memory registered to run vcpu"); |
496 | return -EINVAL; | 496 | return -EINVAL; |
497 | } | 497 | } |
498 | 498 | ||
499 | if (vcpu->sigset_active) | 499 | if (vcpu->sigset_active) |
500 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); | 500 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); |
501 | 501 | ||
502 | atomic_set_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags); | 502 | atomic_set_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags); |
503 | 503 | ||
504 | BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL); | 504 | BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL); |
505 | 505 | ||
506 | switch (kvm_run->exit_reason) { | 506 | switch (kvm_run->exit_reason) { |
507 | case KVM_EXIT_S390_SIEIC: | 507 | case KVM_EXIT_S390_SIEIC: |
508 | case KVM_EXIT_UNKNOWN: | 508 | case KVM_EXIT_UNKNOWN: |
509 | case KVM_EXIT_INTR: | 509 | case KVM_EXIT_INTR: |
510 | case KVM_EXIT_S390_RESET: | 510 | case KVM_EXIT_S390_RESET: |
511 | break; | 511 | break; |
512 | default: | 512 | default: |
513 | BUG(); | 513 | BUG(); |
514 | } | 514 | } |
515 | 515 | ||
516 | vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask; | 516 | vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask; |
517 | vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr; | 517 | vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr; |
518 | 518 | ||
519 | might_fault(); | 519 | might_fault(); |
520 | 520 | ||
521 | do { | 521 | do { |
522 | __vcpu_run(vcpu); | 522 | __vcpu_run(vcpu); |
523 | rc = kvm_handle_sie_intercept(vcpu); | 523 | rc = kvm_handle_sie_intercept(vcpu); |
524 | } while (!signal_pending(current) && !rc); | 524 | } while (!signal_pending(current) && !rc); |
525 | 525 | ||
526 | if (rc == SIE_INTERCEPT_RERUNVCPU) | 526 | if (rc == SIE_INTERCEPT_RERUNVCPU) |
527 | goto rerun_vcpu; | 527 | goto rerun_vcpu; |
528 | 528 | ||
529 | if (signal_pending(current) && !rc) { | 529 | if (signal_pending(current) && !rc) { |
530 | kvm_run->exit_reason = KVM_EXIT_INTR; | 530 | kvm_run->exit_reason = KVM_EXIT_INTR; |
531 | rc = -EINTR; | 531 | rc = -EINTR; |
532 | } | 532 | } |
533 | 533 | ||
534 | if (rc == -EOPNOTSUPP) { | 534 | if (rc == -EOPNOTSUPP) { |
535 | /* intercept cannot be handled in-kernel, prepare kvm-run */ | 535 | /* intercept cannot be handled in-kernel, prepare kvm-run */ |
536 | kvm_run->exit_reason = KVM_EXIT_S390_SIEIC; | 536 | kvm_run->exit_reason = KVM_EXIT_S390_SIEIC; |
537 | kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode; | 537 | kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode; |
538 | kvm_run->s390_sieic.ipa = vcpu->arch.sie_block->ipa; | 538 | kvm_run->s390_sieic.ipa = vcpu->arch.sie_block->ipa; |
539 | kvm_run->s390_sieic.ipb = vcpu->arch.sie_block->ipb; | 539 | kvm_run->s390_sieic.ipb = vcpu->arch.sie_block->ipb; |
540 | rc = 0; | 540 | rc = 0; |
541 | } | 541 | } |
542 | 542 | ||
543 | if (rc == -EREMOTE) { | 543 | if (rc == -EREMOTE) { |
544 | /* intercept was handled, but userspace support is needed | 544 | /* intercept was handled, but userspace support is needed |
545 | * kvm_run has been prepared by the handler */ | 545 | * kvm_run has been prepared by the handler */ |
546 | rc = 0; | 546 | rc = 0; |
547 | } | 547 | } |
548 | 548 | ||
549 | kvm_run->psw_mask = vcpu->arch.sie_block->gpsw.mask; | 549 | kvm_run->psw_mask = vcpu->arch.sie_block->gpsw.mask; |
550 | kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr; | 550 | kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr; |
551 | 551 | ||
552 | if (vcpu->sigset_active) | 552 | if (vcpu->sigset_active) |
553 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); | 553 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); |
554 | 554 | ||
555 | vcpu->stat.exit_userspace++; | 555 | vcpu->stat.exit_userspace++; |
556 | return rc; | 556 | return rc; |
557 | } | 557 | } |
558 | 558 | ||
559 | static int __guestcopy(struct kvm_vcpu *vcpu, u64 guestdest, const void *from, | 559 | static int __guestcopy(struct kvm_vcpu *vcpu, u64 guestdest, const void *from, |
560 | unsigned long n, int prefix) | 560 | unsigned long n, int prefix) |
561 | { | 561 | { |
562 | if (prefix) | 562 | if (prefix) |
563 | return copy_to_guest(vcpu, guestdest, from, n); | 563 | return copy_to_guest(vcpu, guestdest, from, n); |
564 | else | 564 | else |
565 | return copy_to_guest_absolute(vcpu, guestdest, from, n); | 565 | return copy_to_guest_absolute(vcpu, guestdest, from, n); |
566 | } | 566 | } |
567 | 567 | ||
568 | /* | 568 | /* |
569 | * store status at address | 569 | * store status at address |
570 | * we have two special cases: | 570 | * we have two special cases: |
571 | * KVM_S390_STORE_STATUS_NOADDR: -> 0x1200 on 64 bit | 571 | * KVM_S390_STORE_STATUS_NOADDR: -> 0x1200 on 64 bit |
572 | * KVM_S390_STORE_STATUS_PREFIXED: -> prefix | 572 | * KVM_S390_STORE_STATUS_PREFIXED: -> prefix |
573 | */ | 573 | */ |
574 | int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) | 574 | int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) |
575 | { | 575 | { |
576 | const unsigned char archmode = 1; | 576 | const unsigned char archmode = 1; |
577 | int prefix; | 577 | int prefix; |
578 | 578 | ||
579 | if (addr == KVM_S390_STORE_STATUS_NOADDR) { | 579 | if (addr == KVM_S390_STORE_STATUS_NOADDR) { |
580 | if (copy_to_guest_absolute(vcpu, 163ul, &archmode, 1)) | 580 | if (copy_to_guest_absolute(vcpu, 163ul, &archmode, 1)) |
581 | return -EFAULT; | 581 | return -EFAULT; |
582 | addr = SAVE_AREA_BASE; | 582 | addr = SAVE_AREA_BASE; |
583 | prefix = 0; | 583 | prefix = 0; |
584 | } else if (addr == KVM_S390_STORE_STATUS_PREFIXED) { | 584 | } else if (addr == KVM_S390_STORE_STATUS_PREFIXED) { |
585 | if (copy_to_guest(vcpu, 163ul, &archmode, 1)) | 585 | if (copy_to_guest(vcpu, 163ul, &archmode, 1)) |
586 | return -EFAULT; | 586 | return -EFAULT; |
587 | addr = SAVE_AREA_BASE; | 587 | addr = SAVE_AREA_BASE; |
588 | prefix = 1; | 588 | prefix = 1; |
589 | } else | 589 | } else |
590 | prefix = 0; | 590 | prefix = 0; |
591 | 591 | ||
592 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, fp_regs), | 592 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, fp_regs), |
593 | vcpu->arch.guest_fpregs.fprs, 128, prefix)) | 593 | vcpu->arch.guest_fpregs.fprs, 128, prefix)) |
594 | return -EFAULT; | 594 | return -EFAULT; |
595 | 595 | ||
596 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, gp_regs), | 596 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, gp_regs), |
597 | vcpu->arch.guest_gprs, 128, prefix)) | 597 | vcpu->arch.guest_gprs, 128, prefix)) |
598 | return -EFAULT; | 598 | return -EFAULT; |
599 | 599 | ||
600 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, psw), | 600 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, psw), |
601 | &vcpu->arch.sie_block->gpsw, 16, prefix)) | 601 | &vcpu->arch.sie_block->gpsw, 16, prefix)) |
602 | return -EFAULT; | 602 | return -EFAULT; |
603 | 603 | ||
604 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, pref_reg), | 604 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, pref_reg), |
605 | &vcpu->arch.sie_block->prefix, 4, prefix)) | 605 | &vcpu->arch.sie_block->prefix, 4, prefix)) |
606 | return -EFAULT; | 606 | return -EFAULT; |
607 | 607 | ||
608 | if (__guestcopy(vcpu, | 608 | if (__guestcopy(vcpu, |
609 | addr + offsetof(struct save_area, fp_ctrl_reg), | 609 | addr + offsetof(struct save_area, fp_ctrl_reg), |
610 | &vcpu->arch.guest_fpregs.fpc, 4, prefix)) | 610 | &vcpu->arch.guest_fpregs.fpc, 4, prefix)) |
611 | return -EFAULT; | 611 | return -EFAULT; |
612 | 612 | ||
613 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, tod_reg), | 613 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, tod_reg), |
614 | &vcpu->arch.sie_block->todpr, 4, prefix)) | 614 | &vcpu->arch.sie_block->todpr, 4, prefix)) |
615 | return -EFAULT; | 615 | return -EFAULT; |
616 | 616 | ||
617 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, timer), | 617 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, timer), |
618 | &vcpu->arch.sie_block->cputm, 8, prefix)) | 618 | &vcpu->arch.sie_block->cputm, 8, prefix)) |
619 | return -EFAULT; | 619 | return -EFAULT; |
620 | 620 | ||
621 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, clk_cmp), | 621 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, clk_cmp), |
622 | &vcpu->arch.sie_block->ckc, 8, prefix)) | 622 | &vcpu->arch.sie_block->ckc, 8, prefix)) |
623 | return -EFAULT; | 623 | return -EFAULT; |
624 | 624 | ||
625 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, acc_regs), | 625 | if (__guestcopy(vcpu, addr + offsetof(struct save_area, acc_regs), |
626 | &vcpu->arch.guest_acrs, 64, prefix)) | 626 | &vcpu->arch.guest_acrs, 64, prefix)) |
627 | return -EFAULT; | 627 | return -EFAULT; |
628 | 628 | ||
629 | if (__guestcopy(vcpu, | 629 | if (__guestcopy(vcpu, |
630 | addr + offsetof(struct save_area, ctrl_regs), | 630 | addr + offsetof(struct save_area, ctrl_regs), |
631 | &vcpu->arch.sie_block->gcr, 128, prefix)) | 631 | &vcpu->arch.sie_block->gcr, 128, prefix)) |
632 | return -EFAULT; | 632 | return -EFAULT; |
633 | return 0; | 633 | return 0; |
634 | } | 634 | } |
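The two special-case addresses handled above are reached from the KVM_S390_STORE_STATUS ioctl below, which passes the guest address directly as the ioctl argument. A hedged userspace sketch (the NOADDR constant is kernel-internal at this point, so the example defines a local mirror of its -1ul value; vcpu_fd is assumed):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Assumption: mirrors the kernel's KVM_S390_STORE_STATUS_NOADDR */
    #define S390_STORE_STATUS_NOADDR (-1ul)

    /* Store the vcpu status at the architected save area (0x1200 on
     * 64 bit, per the comment above). The address is the raw ioctl
     * argument, not a pointer to a struct. */
    static int store_status_noaddr(int vcpu_fd)
    {
            return ioctl(vcpu_fd, KVM_S390_STORE_STATUS,
                         S390_STORE_STATUS_NOADDR);
    }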
635 | 635 | ||
636 | long kvm_arch_vcpu_ioctl(struct file *filp, | 636 | long kvm_arch_vcpu_ioctl(struct file *filp, |
637 | unsigned int ioctl, unsigned long arg) | 637 | unsigned int ioctl, unsigned long arg) |
638 | { | 638 | { |
639 | struct kvm_vcpu *vcpu = filp->private_data; | 639 | struct kvm_vcpu *vcpu = filp->private_data; |
640 | void __user *argp = (void __user *)arg; | 640 | void __user *argp = (void __user *)arg; |
641 | long r; | 641 | long r; |
642 | 642 | ||
643 | switch (ioctl) { | 643 | switch (ioctl) { |
644 | case KVM_S390_INTERRUPT: { | 644 | case KVM_S390_INTERRUPT: { |
645 | struct kvm_s390_interrupt s390int; | 645 | struct kvm_s390_interrupt s390int; |
646 | 646 | ||
647 | r = -EFAULT; | 647 | r = -EFAULT; |
648 | if (copy_from_user(&s390int, argp, sizeof(s390int))) | 648 | if (copy_from_user(&s390int, argp, sizeof(s390int))) |
649 | break; | 649 | break; |
650 | r = kvm_s390_inject_vcpu(vcpu, &s390int); | 650 | r = kvm_s390_inject_vcpu(vcpu, &s390int); |
651 | break; | 651 | break; |
652 | } | 652 | } |
653 | case KVM_S390_STORE_STATUS: | 653 | case KVM_S390_STORE_STATUS: |
654 | r = kvm_s390_vcpu_store_status(vcpu, arg); | 654 | r = kvm_s390_vcpu_store_status(vcpu, arg); |
655 | break; | 655 | break; |
656 | case KVM_S390_SET_INITIAL_PSW: { | 656 | case KVM_S390_SET_INITIAL_PSW: { |
657 | psw_t psw; | 657 | psw_t psw; |
658 | 658 | ||
659 | r = -EFAULT; | 659 | r = -EFAULT; |
660 | if (copy_from_user(&psw, argp, sizeof(psw))) | 660 | if (copy_from_user(&psw, argp, sizeof(psw))) |
661 | break; | 661 | break; |
662 | r = kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw); | 662 | r = kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw); |
663 | break; | 663 | break; |
664 | } | 664 | } |
665 | case KVM_S390_INITIAL_RESET: | 665 | case KVM_S390_INITIAL_RESET: |
666 | r = kvm_arch_vcpu_ioctl_initial_reset(vcpu); | 666 | r = kvm_arch_vcpu_ioctl_initial_reset(vcpu); |
667 | break; | 667 | break; |
668 | default: | 668 | default: |
669 | r = -EINVAL; | 669 | r = -EINVAL; |
670 | } | 670 | } |
671 | return r; | 671 | return r; |
672 | } | 672 | } |
673 | 673 | ||
674 | /* Section: memory related */ | 674 | /* Section: memory related */ |
675 | int kvm_arch_prepare_memory_region(struct kvm *kvm, | 675 | int kvm_arch_prepare_memory_region(struct kvm *kvm, |
676 | struct kvm_memory_slot *memslot, | 676 | struct kvm_memory_slot *memslot, |
677 | struct kvm_memory_slot old, | 677 | struct kvm_memory_slot old, |
678 | struct kvm_userspace_memory_region *mem, | 678 | struct kvm_userspace_memory_region *mem, |
679 | int user_alloc) | 679 | int user_alloc) |
680 | { | 680 | { |
681 | /* A few sanity checks. We allow exactly one memory slot, which must | 681 | /* A few sanity checks. We allow exactly one memory slot, which must |
682 | start at guest address zero, begin at a page boundary in userland, | 682 | start at guest address zero, begin at a page boundary in userland, |
683 | and end at a page boundary. | 683 | and end at a page boundary. |
684 | The memory in userland may be fragmented across multiple vmas. | 684 | The memory in userland may be fragmented across multiple vmas. |
685 | It is okay to mmap() and munmap() ranges in this slot at any time | 685 | It is okay to mmap() and munmap() ranges in this slot at any time |
686 | after making this call. */ | 686 | after making this call. */ |
687 | 687 | ||
688 | if (mem->slot) | 688 | if (mem->slot) |
689 | return -EINVAL; | 689 | return -EINVAL; |
690 | 690 | ||
691 | if (mem->guest_phys_addr) | 691 | if (mem->guest_phys_addr) |
692 | return -EINVAL; | 692 | return -EINVAL; |
693 | 693 | ||
694 | if (mem->userspace_addr & (PAGE_SIZE - 1)) | 694 | if (mem->userspace_addr & (PAGE_SIZE - 1)) |
695 | return -EINVAL; | 695 | return -EINVAL; |
696 | 696 | ||
697 | if (mem->memory_size & (PAGE_SIZE - 1)) | 697 | if (mem->memory_size & (PAGE_SIZE - 1)) |
698 | return -EINVAL; | 698 | return -EINVAL; |
699 | 699 | ||
700 | if (!user_alloc) | 700 | if (!user_alloc) |
701 | return -EINVAL; | 701 | return -EINVAL; |
702 | 702 | ||
703 | return 0; | 703 | return 0; |
704 | } | 704 | } |
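Taken together, these checks admit exactly one registration shape: slot 0, guest address 0, page-aligned userspace address and size, user-allocated memory. A sketch of a conforming call (hedged: vm_fd is assumed to be a KVM VM file descriptor, and the size is arbitrary):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    #define GUEST_MEM_SIZE (256UL << 20)    /* 256 MiB, page aligned */

    static int register_guest_memory(int vm_fd)
    {
            struct kvm_userspace_memory_region region;
            void *mem;

            /* mmap() returns page-aligned memory, satisfying the
             * userspace_addr check above */
            mem = mmap(NULL, GUEST_MEM_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (mem == MAP_FAILED)
                    return -1;

            region.slot = 0;                /* only slot 0 is accepted */
            region.flags = 0;
            region.guest_phys_addr = 0;     /* must start at zero */
            region.memory_size = GUEST_MEM_SIZE;
            region.userspace_addr = (unsigned long)mem;

            return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
    }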
705 | 705 | ||
706 | void kvm_arch_commit_memory_region(struct kvm *kvm, | 706 | void kvm_arch_commit_memory_region(struct kvm *kvm, |
707 | struct kvm_userspace_memory_region *mem, | 707 | struct kvm_userspace_memory_region *mem, |
708 | struct kvm_memory_slot old, | 708 | struct kvm_memory_slot old, |
709 | int user_alloc) | 709 | int user_alloc) |
710 | { | 710 | { |
711 | int i; | 711 | int i; |
712 | struct kvm_vcpu *vcpu; | 712 | struct kvm_vcpu *vcpu; |
713 | 713 | ||
714 | /* request update of sie control block for all available vcpus */ | 714 | /* request update of sie control block for all available vcpus */ |
715 | kvm_for_each_vcpu(i, vcpu, kvm) { | 715 | kvm_for_each_vcpu(i, vcpu, kvm) { |
716 | if (test_and_set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) | 716 | if (test_and_set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) |
717 | continue; | 717 | continue; |
718 | kvm_s390_inject_sigp_stop(vcpu, ACTION_RELOADVCPU_ON_STOP); | 718 | kvm_s390_inject_sigp_stop(vcpu, ACTION_RELOADVCPU_ON_STOP); |
719 | } | 719 | } |
720 | } | 720 | } |
721 | 721 | ||
722 | void kvm_arch_flush_shadow(struct kvm *kvm) | 722 | void kvm_arch_flush_shadow(struct kvm *kvm) |
723 | { | 723 | { |
724 | } | 724 | } |
725 | 725 | ||
726 | gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) | ||
727 | { | ||
728 | return gfn; | ||
729 | } | ||
730 | |||
731 | static int __init kvm_s390_init(void) | 726 | static int __init kvm_s390_init(void) |
732 | { | 727 | { |
733 | int ret; | 728 | int ret; |
734 | ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); | 729 | ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); |
735 | if (ret) | 730 | if (ret) |
736 | return ret; | 731 | return ret; |
737 | 732 | ||
738 | /* | 733 | /* |
739 | * guests can ask for up to 255+1 double words, so we need a full page | 734 | * guests can ask for up to 255+1 double words, so we need a full page |
740 | * to hold the maximum number of facilities. On the other hand, we | 735 | * to hold the maximum number of facilities. On the other hand, we |
741 | * only set facilities that are known to work in KVM. | 736 | * only set facilities that are known to work in KVM. |
742 | */ | 737 | */ |
743 | facilities = (unsigned long long *) get_zeroed_page(GFP_KERNEL|GFP_DMA); | 738 | facilities = (unsigned long long *) get_zeroed_page(GFP_KERNEL|GFP_DMA); |
744 | if (!facilities) { | 739 | if (!facilities) { |
745 | kvm_exit(); | 740 | kvm_exit(); |
746 | return -ENOMEM; | 741 | return -ENOMEM; |
747 | } | 742 | } |
748 | stfle(facilities, 1); | 743 | stfle(facilities, 1); |
749 | facilities[0] &= 0xff00fff3f0700000ULL; | 744 | facilities[0] &= 0xff00fff3f0700000ULL; |
750 | return 0; | 745 | return 0; |
751 | } | 746 | } |
752 | 747 | ||
753 | static void __exit kvm_s390_exit(void) | 748 | static void __exit kvm_s390_exit(void) |
754 | { | 749 | { |
755 | free_page((unsigned long) facilities); | 750 | free_page((unsigned long) facilities); |
756 | kvm_exit(); | 751 | kvm_exit(); |
757 | } | 752 | } |
758 | 753 | ||
759 | module_init(kvm_s390_init); | 754 | module_init(kvm_s390_init); |
760 | module_exit(kvm_s390_exit); | 755 | module_exit(kvm_s390_exit); |
761 | 756 |
arch/x86/include/asm/kvm_host.h
1 | /* | 1 | /* |
2 | * Kernel-based Virtual Machine driver for Linux | 2 | * Kernel-based Virtual Machine driver for Linux |
3 | * | 3 | * |
4 | * This header defines architecture specific interfaces, x86 version | 4 | * This header defines architecture specific interfaces, x86 version |
5 | * | 5 | * |
6 | * This work is licensed under the terms of the GNU GPL, version 2. See | 6 | * This work is licensed under the terms of the GNU GPL, version 2. See |
7 | * the COPYING file in the top-level directory. | 7 | * the COPYING file in the top-level directory. |
8 | * | 8 | * |
9 | */ | 9 | */ |
10 | 10 | ||
11 | #ifndef _ASM_X86_KVM_HOST_H | 11 | #ifndef _ASM_X86_KVM_HOST_H |
12 | #define _ASM_X86_KVM_HOST_H | 12 | #define _ASM_X86_KVM_HOST_H |
13 | 13 | ||
14 | #include <linux/types.h> | 14 | #include <linux/types.h> |
15 | #include <linux/mm.h> | 15 | #include <linux/mm.h> |
16 | #include <linux/mmu_notifier.h> | 16 | #include <linux/mmu_notifier.h> |
17 | #include <linux/tracepoint.h> | 17 | #include <linux/tracepoint.h> |
18 | 18 | ||
19 | #include <linux/kvm.h> | 19 | #include <linux/kvm.h> |
20 | #include <linux/kvm_para.h> | 20 | #include <linux/kvm_para.h> |
21 | #include <linux/kvm_types.h> | 21 | #include <linux/kvm_types.h> |
22 | 22 | ||
23 | #include <asm/pvclock-abi.h> | 23 | #include <asm/pvclock-abi.h> |
24 | #include <asm/desc.h> | 24 | #include <asm/desc.h> |
25 | #include <asm/mtrr.h> | 25 | #include <asm/mtrr.h> |
26 | #include <asm/msr-index.h> | 26 | #include <asm/msr-index.h> |
27 | 27 | ||
28 | #define KVM_MAX_VCPUS 64 | 28 | #define KVM_MAX_VCPUS 64 |
29 | #define KVM_MEMORY_SLOTS 32 | 29 | #define KVM_MEMORY_SLOTS 32 |
30 | /* memory slots that are not exposed to userspace */ | 30 | /* memory slots that are not exposed to userspace */ |
31 | #define KVM_PRIVATE_MEM_SLOTS 4 | 31 | #define KVM_PRIVATE_MEM_SLOTS 4 |
32 | 32 | ||
33 | #define KVM_PIO_PAGE_OFFSET 1 | 33 | #define KVM_PIO_PAGE_OFFSET 1 |
34 | #define KVM_COALESCED_MMIO_PAGE_OFFSET 2 | 34 | #define KVM_COALESCED_MMIO_PAGE_OFFSET 2 |
35 | 35 | ||
36 | #define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1) | 36 | #define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1) |
37 | #define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD)) | 37 | #define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD)) |
38 | #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS | \ | 38 | #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS | \ |
39 | 0xFFFFFF0000000000ULL) | 39 | 0xFFFFFF0000000000ULL) |
40 | 40 | ||
41 | #define INVALID_PAGE (~(hpa_t)0) | 41 | #define INVALID_PAGE (~(hpa_t)0) |
42 | #define UNMAPPED_GVA (~(gpa_t)0) | 42 | #define UNMAPPED_GVA (~(gpa_t)0) |
43 | 43 | ||
44 | /* KVM Hugepage definitions for x86 */ | 44 | /* KVM Hugepage definitions for x86 */ |
45 | #define KVM_NR_PAGE_SIZES 3 | 45 | #define KVM_NR_PAGE_SIZES 3 |
46 | #define KVM_HPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9)) | 46 | #define KVM_HPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9)) |
47 | #define KVM_HPAGE_SIZE(x) (1UL << KVM_HPAGE_SHIFT(x)) | 47 | #define KVM_HPAGE_SIZE(x) (1UL << KVM_HPAGE_SHIFT(x)) |
48 | #define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1)) | 48 | #define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1)) |
49 | #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE) | 49 | #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE) |
50 | 50 | ||
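With PAGE_SHIFT fixed at 12 on x86, the three KVM_NR_PAGE_SIZES levels work out to 4 KiB, 2 MiB, and 1 GiB. A standalone sanity check of the arithmetic (illustrative userspace snippet, not part of the patch):

    /* Hypothetical check: reproduces the macros above with PAGE_SHIFT == 12. */
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define KVM_HPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9))
    #define KVM_HPAGE_SIZE(x)  (1UL << KVM_HPAGE_SHIFT(x))

    int main(void)
    {
            int i;

            for (i = 1; i <= 3; i++)   /* prints 4096, 2097152, 1073741824 */
                    printf("level %d: shift %d, size %lu\n",
                           i, KVM_HPAGE_SHIFT(i), KVM_HPAGE_SIZE(i));
            return 0;
    }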
51 | #define DE_VECTOR 0 | 51 | #define DE_VECTOR 0 |
52 | #define DB_VECTOR 1 | 52 | #define DB_VECTOR 1 |
53 | #define BP_VECTOR 3 | 53 | #define BP_VECTOR 3 |
54 | #define OF_VECTOR 4 | 54 | #define OF_VECTOR 4 |
55 | #define BR_VECTOR 5 | 55 | #define BR_VECTOR 5 |
56 | #define UD_VECTOR 6 | 56 | #define UD_VECTOR 6 |
57 | #define NM_VECTOR 7 | 57 | #define NM_VECTOR 7 |
58 | #define DF_VECTOR 8 | 58 | #define DF_VECTOR 8 |
59 | #define TS_VECTOR 10 | 59 | #define TS_VECTOR 10 |
60 | #define NP_VECTOR 11 | 60 | #define NP_VECTOR 11 |
61 | #define SS_VECTOR 12 | 61 | #define SS_VECTOR 12 |
62 | #define GP_VECTOR 13 | 62 | #define GP_VECTOR 13 |
63 | #define PF_VECTOR 14 | 63 | #define PF_VECTOR 14 |
64 | #define MF_VECTOR 16 | 64 | #define MF_VECTOR 16 |
65 | #define MC_VECTOR 18 | 65 | #define MC_VECTOR 18 |
66 | 66 | ||
67 | #define SELECTOR_TI_MASK (1 << 2) | 67 | #define SELECTOR_TI_MASK (1 << 2) |
68 | #define SELECTOR_RPL_MASK 0x03 | 68 | #define SELECTOR_RPL_MASK 0x03 |
69 | 69 | ||
70 | #define IOPL_SHIFT 12 | 70 | #define IOPL_SHIFT 12 |
71 | 71 | ||
72 | #define KVM_ALIAS_SLOTS 4 | ||
73 | |||
74 | #define KVM_PERMILLE_MMU_PAGES 20 | 72 | #define KVM_PERMILLE_MMU_PAGES 20 |
75 | #define KVM_MIN_ALLOC_MMU_PAGES 64 | 73 | #define KVM_MIN_ALLOC_MMU_PAGES 64 |
76 | #define KVM_MMU_HASH_SHIFT 10 | 74 | #define KVM_MMU_HASH_SHIFT 10 |
77 | #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT) | 75 | #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT) |
78 | #define KVM_MIN_FREE_MMU_PAGES 5 | 76 | #define KVM_MIN_FREE_MMU_PAGES 5 |
79 | #define KVM_REFILL_PAGES 25 | 77 | #define KVM_REFILL_PAGES 25 |
80 | #define KVM_MAX_CPUID_ENTRIES 40 | 78 | #define KVM_MAX_CPUID_ENTRIES 40 |
81 | #define KVM_NR_FIXED_MTRR_REGION 88 | 79 | #define KVM_NR_FIXED_MTRR_REGION 88 |
82 | #define KVM_NR_VAR_MTRR 8 | 80 | #define KVM_NR_VAR_MTRR 8 |
83 | 81 | ||
84 | extern spinlock_t kvm_lock; | 82 | extern spinlock_t kvm_lock; |
85 | extern struct list_head vm_list; | 83 | extern struct list_head vm_list; |
86 | 84 | ||
87 | struct kvm_vcpu; | 85 | struct kvm_vcpu; |
88 | struct kvm; | 86 | struct kvm; |
89 | 87 | ||
90 | enum kvm_reg { | 88 | enum kvm_reg { |
91 | VCPU_REGS_RAX = 0, | 89 | VCPU_REGS_RAX = 0, |
92 | VCPU_REGS_RCX = 1, | 90 | VCPU_REGS_RCX = 1, |
93 | VCPU_REGS_RDX = 2, | 91 | VCPU_REGS_RDX = 2, |
94 | VCPU_REGS_RBX = 3, | 92 | VCPU_REGS_RBX = 3, |
95 | VCPU_REGS_RSP = 4, | 93 | VCPU_REGS_RSP = 4, |
96 | VCPU_REGS_RBP = 5, | 94 | VCPU_REGS_RBP = 5, |
97 | VCPU_REGS_RSI = 6, | 95 | VCPU_REGS_RSI = 6, |
98 | VCPU_REGS_RDI = 7, | 96 | VCPU_REGS_RDI = 7, |
99 | #ifdef CONFIG_X86_64 | 97 | #ifdef CONFIG_X86_64 |
100 | VCPU_REGS_R8 = 8, | 98 | VCPU_REGS_R8 = 8, |
101 | VCPU_REGS_R9 = 9, | 99 | VCPU_REGS_R9 = 9, |
102 | VCPU_REGS_R10 = 10, | 100 | VCPU_REGS_R10 = 10, |
103 | VCPU_REGS_R11 = 11, | 101 | VCPU_REGS_R11 = 11, |
104 | VCPU_REGS_R12 = 12, | 102 | VCPU_REGS_R12 = 12, |
105 | VCPU_REGS_R13 = 13, | 103 | VCPU_REGS_R13 = 13, |
106 | VCPU_REGS_R14 = 14, | 104 | VCPU_REGS_R14 = 14, |
107 | VCPU_REGS_R15 = 15, | 105 | VCPU_REGS_R15 = 15, |
108 | #endif | 106 | #endif |
109 | VCPU_REGS_RIP, | 107 | VCPU_REGS_RIP, |
110 | NR_VCPU_REGS | 108 | NR_VCPU_REGS |
111 | }; | 109 | }; |
112 | 110 | ||
113 | enum kvm_reg_ex { | 111 | enum kvm_reg_ex { |
114 | VCPU_EXREG_PDPTR = NR_VCPU_REGS, | 112 | VCPU_EXREG_PDPTR = NR_VCPU_REGS, |
115 | }; | 113 | }; |
116 | 114 | ||
117 | enum { | 115 | enum { |
118 | VCPU_SREG_ES, | 116 | VCPU_SREG_ES, |
119 | VCPU_SREG_CS, | 117 | VCPU_SREG_CS, |
120 | VCPU_SREG_SS, | 118 | VCPU_SREG_SS, |
121 | VCPU_SREG_DS, | 119 | VCPU_SREG_DS, |
122 | VCPU_SREG_FS, | 120 | VCPU_SREG_FS, |
123 | VCPU_SREG_GS, | 121 | VCPU_SREG_GS, |
124 | VCPU_SREG_TR, | 122 | VCPU_SREG_TR, |
125 | VCPU_SREG_LDTR, | 123 | VCPU_SREG_LDTR, |
126 | }; | 124 | }; |
127 | 125 | ||
128 | #include <asm/kvm_emulate.h> | 126 | #include <asm/kvm_emulate.h> |
129 | 127 | ||
130 | #define KVM_NR_MEM_OBJS 40 | 128 | #define KVM_NR_MEM_OBJS 40 |
131 | 129 | ||
132 | #define KVM_NR_DB_REGS 4 | 130 | #define KVM_NR_DB_REGS 4 |
133 | 131 | ||
134 | #define DR6_BD (1 << 13) | 132 | #define DR6_BD (1 << 13) |
135 | #define DR6_BS (1 << 14) | 133 | #define DR6_BS (1 << 14) |
136 | #define DR6_FIXED_1 0xffff0ff0 | 134 | #define DR6_FIXED_1 0xffff0ff0 |
137 | #define DR6_VOLATILE 0x0000e00f | 135 | #define DR6_VOLATILE 0x0000e00f |
138 | 136 | ||
139 | #define DR7_BP_EN_MASK 0x000000ff | 137 | #define DR7_BP_EN_MASK 0x000000ff |
140 | #define DR7_GE (1 << 9) | 138 | #define DR7_GE (1 << 9) |
141 | #define DR7_GD (1 << 13) | 139 | #define DR7_GD (1 << 13) |
142 | #define DR7_FIXED_1 0x00000400 | 140 | #define DR7_FIXED_1 0x00000400 |
143 | #define DR7_VOLATILE 0xffff23ff | 141 | #define DR7_VOLATILE 0xffff23ff |
144 | 142 | ||
145 | /* | 143 | /* |
146 | * We don't want allocation failures within the mmu code, so we preallocate | 144 | * We don't want allocation failures within the mmu code, so we preallocate |
147 | * enough memory for a single page fault in a cache. | 145 | * enough memory for a single page fault in a cache. |
148 | */ | 146 | */ |
149 | struct kvm_mmu_memory_cache { | 147 | struct kvm_mmu_memory_cache { |
150 | int nobjs; | 148 | int nobjs; |
151 | void *objects[KVM_NR_MEM_OBJS]; | 149 | void *objects[KVM_NR_MEM_OBJS]; |
152 | }; | 150 | }; |
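The comment above describes a topup-then-consume pattern: the cache is refilled with ordinary allocations before any locks are taken, and objects are popped where failure is not an option. A minimal kernel-context sketch of the consume side, assuming the cache was topped up earlier (the helper name here is illustrative, not the patch's API):

    /* Illustrative pop; relies on an earlier topup having filled mc. */
    static void *mmu_cache_pop(struct kvm_mmu_memory_cache *mc)
    {
            BUG_ON(!mc->nobjs);             /* topup guarantees nobjs > 0 */
            return mc->objects[--mc->nobjs];
    }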
153 | 151 | ||
154 | #define NR_PTE_CHAIN_ENTRIES 5 | 152 | #define NR_PTE_CHAIN_ENTRIES 5 |
155 | 153 | ||
156 | struct kvm_pte_chain { | 154 | struct kvm_pte_chain { |
157 | u64 *parent_ptes[NR_PTE_CHAIN_ENTRIES]; | 155 | u64 *parent_ptes[NR_PTE_CHAIN_ENTRIES]; |
158 | struct hlist_node link; | 156 | struct hlist_node link; |
159 | }; | 157 | }; |
160 | 158 | ||
161 | /* | 159 | /* |
162 | * kvm_mmu_page_role, below, is defined as: | 160 | * kvm_mmu_page_role, below, is defined as: |
163 | * | 161 | * |
164 | * bits 0:3 - level of the shadow page table (1-4, or zero for real mode) | 162 | * bits 0:3 - level of the shadow page table (1-4, or zero for real mode) |
165 | * bit 4 - guest paging uses PAE (cr4_pae) | 163 | * bit 4 - guest paging uses PAE (cr4_pae) |
166 | * bits 5:6 - page table quadrant for 2-level guests | 164 | * bits 5:6 - page table quadrant for 2-level guests |
167 | * bit 13 - direct mapping of virtual to physical at gfn, | 165 | * bit 13 - direct mapping of virtual to physical at gfn, |
168 | * used for real mode and two-dimensional paging | 166 | * used for real mode and two-dimensional paging |
169 | * bits 14:16 - common access permissions for all ptes in this shadow page | 167 | * bits 14:16 - common access permissions for all ptes in this shadow page |
170 | */ | 168 | */ |
171 | union kvm_mmu_page_role { | 169 | union kvm_mmu_page_role { |
172 | unsigned word; | 170 | unsigned word; |
173 | struct { | 171 | struct { |
174 | unsigned level:4; | 172 | unsigned level:4; |
175 | unsigned cr4_pae:1; | 173 | unsigned cr4_pae:1; |
176 | unsigned quadrant:2; | 174 | unsigned quadrant:2; |
177 | unsigned pad_for_nice_hex_output:6; | 175 | unsigned pad_for_nice_hex_output:6; |
178 | unsigned direct:1; | 176 | unsigned direct:1; |
179 | unsigned access:3; | 177 | unsigned access:3; |
180 | unsigned invalid:1; | 178 | unsigned invalid:1; |
181 | unsigned nxe:1; | 179 | unsigned nxe:1; |
182 | unsigned cr0_wp:1; | 180 | unsigned cr0_wp:1; |
183 | }; | 181 | }; |
184 | }; | 182 | }; |
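Because every role field packs into the single word member, a shadow page's identity can be compared or hashed with one integer operation; pages whose roles differ in any bit are distinct. A hedged illustration:

    /* Illustrative: role equality reduces to a single word compare. */
    static inline bool role_eq(union kvm_mmu_page_role a,
                               union kvm_mmu_page_role b)
    {
            return a.word == b.word;
    }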
185 | 183 | ||
186 | struct kvm_mmu_page { | 184 | struct kvm_mmu_page { |
187 | struct list_head link; | 185 | struct list_head link; |
188 | struct hlist_node hash_link; | 186 | struct hlist_node hash_link; |
189 | 187 | ||
190 | /* | 188 | /* |
191 | * The following two entries are used to key the shadow page in the | 189 | * The following two entries are used to key the shadow page in the |
192 | * hash table. | 190 | * hash table. |
193 | */ | 191 | */ |
194 | gfn_t gfn; | 192 | gfn_t gfn; |
195 | union kvm_mmu_page_role role; | 193 | union kvm_mmu_page_role role; |
196 | 194 | ||
197 | u64 *spt; | 195 | u64 *spt; |
198 | /* hold the gfn of each spte inside spt */ | 196 | /* hold the gfn of each spte inside spt */ |
199 | gfn_t *gfns; | 197 | gfn_t *gfns; |
200 | /* | 198 | /* |
201 | * One bit set per slot which has memory | 199 | * One bit set per slot which has memory |
202 | * in this shadow page. | 200 | * in this shadow page. |
203 | */ | 201 | */ |
204 | DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); | 202 | DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); |
205 | bool multimapped; /* More than one parent_pte? */ | 203 | bool multimapped; /* More than one parent_pte? */ |
206 | bool unsync; | 204 | bool unsync; |
207 | int root_count; /* Currently serving as active root */ | 205 | int root_count; /* Currently serving as active root */ |
208 | unsigned int unsync_children; | 206 | unsigned int unsync_children; |
209 | union { | 207 | union { |
210 | u64 *parent_pte; /* !multimapped */ | 208 | u64 *parent_pte; /* !multimapped */ |
211 | struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */ | 209 | struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */ |
212 | }; | 210 | }; |
213 | DECLARE_BITMAP(unsync_child_bitmap, 512); | 211 | DECLARE_BITMAP(unsync_child_bitmap, 512); |
214 | }; | 212 | }; |
215 | 213 | ||
216 | struct kvm_pv_mmu_op_buffer { | 214 | struct kvm_pv_mmu_op_buffer { |
217 | void *ptr; | 215 | void *ptr; |
218 | unsigned len; | 216 | unsigned len; |
219 | unsigned processed; | 217 | unsigned processed; |
220 | char buf[512] __aligned(sizeof(long)); | 218 | char buf[512] __aligned(sizeof(long)); |
221 | }; | 219 | }; |
222 | 220 | ||
223 | struct kvm_pio_request { | 221 | struct kvm_pio_request { |
224 | unsigned long count; | 222 | unsigned long count; |
225 | int in; | 223 | int in; |
226 | int port; | 224 | int port; |
227 | int size; | 225 | int size; |
228 | }; | 226 | }; |
229 | 227 | ||
230 | /* | 228 | /* |
231 | * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level | 229 | * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level |
232 | * 32-bit). The kvm_mmu structure abstracts the details of the current mmu | 230 | * 32-bit). The kvm_mmu structure abstracts the details of the current mmu |
233 | * mode. | 231 | * mode. |
234 | */ | 232 | */ |
235 | struct kvm_mmu { | 233 | struct kvm_mmu { |
236 | void (*new_cr3)(struct kvm_vcpu *vcpu); | 234 | void (*new_cr3)(struct kvm_vcpu *vcpu); |
237 | int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err); | 235 | int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err); |
238 | void (*free)(struct kvm_vcpu *vcpu); | 236 | void (*free)(struct kvm_vcpu *vcpu); |
239 | gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access, | 237 | gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access, |
240 | u32 *error); | 238 | u32 *error); |
241 | void (*prefetch_page)(struct kvm_vcpu *vcpu, | 239 | void (*prefetch_page)(struct kvm_vcpu *vcpu, |
242 | struct kvm_mmu_page *page); | 240 | struct kvm_mmu_page *page); |
243 | int (*sync_page)(struct kvm_vcpu *vcpu, | 241 | int (*sync_page)(struct kvm_vcpu *vcpu, |
244 | struct kvm_mmu_page *sp, bool clear_unsync); | 242 | struct kvm_mmu_page *sp, bool clear_unsync); |
245 | void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva); | 243 | void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva); |
246 | hpa_t root_hpa; | 244 | hpa_t root_hpa; |
247 | int root_level; | 245 | int root_level; |
248 | int shadow_root_level; | 246 | int shadow_root_level; |
249 | union kvm_mmu_page_role base_role; | 247 | union kvm_mmu_page_role base_role; |
250 | 248 | ||
251 | u64 *pae_root; | 249 | u64 *pae_root; |
252 | u64 rsvd_bits_mask[2][4]; | 250 | u64 rsvd_bits_mask[2][4]; |
253 | }; | 251 | }; |
254 | 252 | ||
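Since struct kvm_mmu carries the per-mode operations as function pointers, callers translate addresses without branching on the active paging mode themselves. A minimal illustration of the dispatch (assuming a populated mmu context; the wrapper name is invented):

    /* Illustrative: mode-independent translation via the mmu hooks. */
    static gpa_t example_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
    {
            return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error);
    }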
255 | struct kvm_vcpu_arch { | 253 | struct kvm_vcpu_arch { |
256 | u64 host_tsc; | 254 | u64 host_tsc; |
257 | /* | 255 | /* |
258 | * rip and regs accesses must go through | 256 | * rip and regs accesses must go through |
259 | * kvm_{register,rip}_{read,write} functions. | 257 | * kvm_{register,rip}_{read,write} functions. |
260 | */ | 258 | */ |
261 | unsigned long regs[NR_VCPU_REGS]; | 259 | unsigned long regs[NR_VCPU_REGS]; |
262 | u32 regs_avail; | 260 | u32 regs_avail; |
263 | u32 regs_dirty; | 261 | u32 regs_dirty; |
264 | 262 | ||
265 | unsigned long cr0; | 263 | unsigned long cr0; |
266 | unsigned long cr0_guest_owned_bits; | 264 | unsigned long cr0_guest_owned_bits; |
267 | unsigned long cr2; | 265 | unsigned long cr2; |
268 | unsigned long cr3; | 266 | unsigned long cr3; |
269 | unsigned long cr4; | 267 | unsigned long cr4; |
270 | unsigned long cr4_guest_owned_bits; | 268 | unsigned long cr4_guest_owned_bits; |
271 | unsigned long cr8; | 269 | unsigned long cr8; |
272 | u32 hflags; | 270 | u32 hflags; |
273 | u64 pdptrs[4]; /* pae */ | 271 | u64 pdptrs[4]; /* pae */ |
274 | u64 efer; | 272 | u64 efer; |
275 | u64 apic_base; | 273 | u64 apic_base; |
276 | struct kvm_lapic *apic; /* kernel irqchip context */ | 274 | struct kvm_lapic *apic; /* kernel irqchip context */ |
277 | int32_t apic_arb_prio; | 275 | int32_t apic_arb_prio; |
278 | int mp_state; | 276 | int mp_state; |
279 | int sipi_vector; | 277 | int sipi_vector; |
280 | u64 ia32_misc_enable_msr; | 278 | u64 ia32_misc_enable_msr; |
281 | bool tpr_access_reporting; | 279 | bool tpr_access_reporting; |
282 | 280 | ||
283 | struct kvm_mmu mmu; | 281 | struct kvm_mmu mmu; |
284 | /* only needed in the kvm_pv_mmu_op() path, but it's hot, so | 282 | /* only needed in the kvm_pv_mmu_op() path, but it's hot, so |
285 | * we put it here to avoid allocation */ | 283 | * we put it here to avoid allocation */ |
286 | struct kvm_pv_mmu_op_buffer mmu_op_buffer; | 284 | struct kvm_pv_mmu_op_buffer mmu_op_buffer; |
287 | 285 | ||
288 | struct kvm_mmu_memory_cache mmu_pte_chain_cache; | 286 | struct kvm_mmu_memory_cache mmu_pte_chain_cache; |
289 | struct kvm_mmu_memory_cache mmu_rmap_desc_cache; | 287 | struct kvm_mmu_memory_cache mmu_rmap_desc_cache; |
290 | struct kvm_mmu_memory_cache mmu_page_cache; | 288 | struct kvm_mmu_memory_cache mmu_page_cache; |
291 | struct kvm_mmu_memory_cache mmu_page_header_cache; | 289 | struct kvm_mmu_memory_cache mmu_page_header_cache; |
292 | 290 | ||
293 | gfn_t last_pt_write_gfn; | 291 | gfn_t last_pt_write_gfn; |
294 | int last_pt_write_count; | 292 | int last_pt_write_count; |
295 | u64 *last_pte_updated; | 293 | u64 *last_pte_updated; |
296 | gfn_t last_pte_gfn; | 294 | gfn_t last_pte_gfn; |
297 | 295 | ||
298 | struct { | 296 | struct { |
299 | gfn_t gfn; /* presumed gfn during guest pte update */ | 297 | gfn_t gfn; /* presumed gfn during guest pte update */ |
300 | pfn_t pfn; /* pfn corresponding to that gfn */ | 298 | pfn_t pfn; /* pfn corresponding to that gfn */ |
301 | unsigned long mmu_seq; | 299 | unsigned long mmu_seq; |
302 | } update_pte; | 300 | } update_pte; |
303 | 301 | ||
304 | struct fpu guest_fpu; | 302 | struct fpu guest_fpu; |
305 | u64 xcr0; | 303 | u64 xcr0; |
306 | 304 | ||
307 | gva_t mmio_fault_cr2; | 305 | gva_t mmio_fault_cr2; |
308 | struct kvm_pio_request pio; | 306 | struct kvm_pio_request pio; |
309 | void *pio_data; | 307 | void *pio_data; |
310 | 308 | ||
311 | u8 event_exit_inst_len; | 309 | u8 event_exit_inst_len; |
312 | 310 | ||
313 | struct kvm_queued_exception { | 311 | struct kvm_queued_exception { |
314 | bool pending; | 312 | bool pending; |
315 | bool has_error_code; | 313 | bool has_error_code; |
316 | bool reinject; | 314 | bool reinject; |
317 | u8 nr; | 315 | u8 nr; |
318 | u32 error_code; | 316 | u32 error_code; |
319 | } exception; | 317 | } exception; |
320 | 318 | ||
321 | struct kvm_queued_interrupt { | 319 | struct kvm_queued_interrupt { |
322 | bool pending; | 320 | bool pending; |
323 | bool soft; | 321 | bool soft; |
324 | u8 nr; | 322 | u8 nr; |
325 | } interrupt; | 323 | } interrupt; |
326 | 324 | ||
327 | int halt_request; /* real mode on Intel only */ | 325 | int halt_request; /* real mode on Intel only */ |
328 | 326 | ||
329 | int cpuid_nent; | 327 | int cpuid_nent; |
330 | struct kvm_cpuid_entry2 cpuid_entries[KVM_MAX_CPUID_ENTRIES]; | 328 | struct kvm_cpuid_entry2 cpuid_entries[KVM_MAX_CPUID_ENTRIES]; |
331 | /* emulate context */ | 329 | /* emulate context */ |
332 | 330 | ||
333 | struct x86_emulate_ctxt emulate_ctxt; | 331 | struct x86_emulate_ctxt emulate_ctxt; |
334 | 332 | ||
335 | gpa_t time; | 333 | gpa_t time; |
336 | struct pvclock_vcpu_time_info hv_clock; | 334 | struct pvclock_vcpu_time_info hv_clock; |
337 | unsigned int hv_clock_tsc_khz; | 335 | unsigned int hv_clock_tsc_khz; |
338 | unsigned int time_offset; | 336 | unsigned int time_offset; |
339 | struct page *time_page; | 337 | struct page *time_page; |
340 | 338 | ||
341 | bool nmi_pending; | 339 | bool nmi_pending; |
342 | bool nmi_injected; | 340 | bool nmi_injected; |
343 | 341 | ||
344 | struct mtrr_state_type mtrr_state; | 342 | struct mtrr_state_type mtrr_state; |
345 | u32 pat; | 343 | u32 pat; |
346 | 344 | ||
347 | int switch_db_regs; | 345 | int switch_db_regs; |
348 | unsigned long db[KVM_NR_DB_REGS]; | 346 | unsigned long db[KVM_NR_DB_REGS]; |
349 | unsigned long dr6; | 347 | unsigned long dr6; |
350 | unsigned long dr7; | 348 | unsigned long dr7; |
351 | unsigned long eff_db[KVM_NR_DB_REGS]; | 349 | unsigned long eff_db[KVM_NR_DB_REGS]; |
352 | 350 | ||
353 | u64 mcg_cap; | 351 | u64 mcg_cap; |
354 | u64 mcg_status; | 352 | u64 mcg_status; |
355 | u64 mcg_ctl; | 353 | u64 mcg_ctl; |
356 | u64 *mce_banks; | 354 | u64 *mce_banks; |
357 | 355 | ||
358 | /* used for guest single stepping over the given code position */ | 356 | /* used for guest single stepping over the given code position */ |
359 | unsigned long singlestep_rip; | 357 | unsigned long singlestep_rip; |
360 | 358 | ||
361 | /* fields used by HYPER-V emulation */ | 359 | /* fields used by HYPER-V emulation */ |
362 | u64 hv_vapic; | 360 | u64 hv_vapic; |
363 | }; | 361 | }; |
364 | 362 | ||
365 | struct kvm_mem_alias { | ||
366 | gfn_t base_gfn; | ||
367 | unsigned long npages; | ||
368 | gfn_t target_gfn; | ||
369 | #define KVM_ALIAS_INVALID 1UL | ||
370 | unsigned long flags; | ||
371 | }; | ||
372 | |||
373 | #define KVM_ARCH_HAS_UNALIAS_INSTANTIATION | ||
374 | |||
375 | struct kvm_mem_aliases { | ||
376 | struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS]; | ||
377 | int naliases; | ||
378 | }; | ||
379 | |||
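With struct kvm_mem_aliases gone, the same effect is achieved from userspace by registering two memory slots whose userspace_addr points at the same host buffer, so one region of guest physical space mirrors another. A hypothetical sketch using the existing KVM_SET_USER_MEMORY_REGION ioctl (slot numbers and guest addresses below are examples only):

    /*
     * Hypothetical userspace sketch: what used to be a memory alias can be
     * expressed as a second slot backed by the same host memory.
     */
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int map_twice(int vm_fd, void *host_mem, __u64 size)
    {
            struct kvm_userspace_memory_region r;

            memset(&r, 0, sizeof(r));
            r.slot = 0;
            r.guest_phys_addr = 0x100000;   /* primary mapping (example) */
            r.memory_size = size;
            r.userspace_addr = (__u64)(unsigned long)host_mem;
            if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
                    return -1;

            r.slot = 1;
            r.guest_phys_addr = 0xe0000;    /* "alias": same backing elsewhere */
            if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
                    return -1;
            return 0;
    }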
380 | struct kvm_arch { | 363 | struct kvm_arch { |
381 | struct kvm_mem_aliases *aliases; | ||
382 | |||
383 | unsigned int n_free_mmu_pages; | 364 | unsigned int n_free_mmu_pages; |
384 | unsigned int n_requested_mmu_pages; | 365 | unsigned int n_requested_mmu_pages; |
385 | unsigned int n_alloc_mmu_pages; | 366 | unsigned int n_alloc_mmu_pages; |
386 | atomic_t invlpg_counter; | 367 | atomic_t invlpg_counter; |
387 | struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; | 368 | struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; |
388 | /* | 369 | /* |
389 | * Hash table of struct kvm_mmu_page. | 370 | * Hash table of struct kvm_mmu_page. |
390 | */ | 371 | */ |
391 | struct list_head active_mmu_pages; | 372 | struct list_head active_mmu_pages; |
392 | struct list_head assigned_dev_head; | 373 | struct list_head assigned_dev_head; |
393 | struct iommu_domain *iommu_domain; | 374 | struct iommu_domain *iommu_domain; |
394 | int iommu_flags; | 375 | int iommu_flags; |
395 | struct kvm_pic *vpic; | 376 | struct kvm_pic *vpic; |
396 | struct kvm_ioapic *vioapic; | 377 | struct kvm_ioapic *vioapic; |
397 | struct kvm_pit *vpit; | 378 | struct kvm_pit *vpit; |
398 | int vapics_in_nmi_mode; | 379 | int vapics_in_nmi_mode; |
399 | 380 | ||
400 | unsigned int tss_addr; | 381 | unsigned int tss_addr; |
401 | struct page *apic_access_page; | 382 | struct page *apic_access_page; |
402 | 383 | ||
403 | gpa_t wall_clock; | 384 | gpa_t wall_clock; |
404 | 385 | ||
405 | struct page *ept_identity_pagetable; | 386 | struct page *ept_identity_pagetable; |
406 | bool ept_identity_pagetable_done; | 387 | bool ept_identity_pagetable_done; |
407 | gpa_t ept_identity_map_addr; | 388 | gpa_t ept_identity_map_addr; |
408 | 389 | ||
409 | unsigned long irq_sources_bitmap; | 390 | unsigned long irq_sources_bitmap; |
410 | u64 vm_init_tsc; | 391 | u64 vm_init_tsc; |
411 | s64 kvmclock_offset; | 392 | s64 kvmclock_offset; |
412 | 393 | ||
413 | struct kvm_xen_hvm_config xen_hvm_config; | 394 | struct kvm_xen_hvm_config xen_hvm_config; |
414 | 395 | ||
415 | /* fields used by HYPER-V emulation */ | 396 | /* fields used by HYPER-V emulation */ |
416 | u64 hv_guest_os_id; | 397 | u64 hv_guest_os_id; |
417 | u64 hv_hypercall; | 398 | u64 hv_hypercall; |
418 | }; | 399 | }; |
419 | 400 | ||
420 | struct kvm_vm_stat { | 401 | struct kvm_vm_stat { |
421 | u32 mmu_shadow_zapped; | 402 | u32 mmu_shadow_zapped; |
422 | u32 mmu_pte_write; | 403 | u32 mmu_pte_write; |
423 | u32 mmu_pte_updated; | 404 | u32 mmu_pte_updated; |
424 | u32 mmu_pde_zapped; | 405 | u32 mmu_pde_zapped; |
425 | u32 mmu_flooded; | 406 | u32 mmu_flooded; |
426 | u32 mmu_recycled; | 407 | u32 mmu_recycled; |
427 | u32 mmu_cache_miss; | 408 | u32 mmu_cache_miss; |
428 | u32 mmu_unsync; | 409 | u32 mmu_unsync; |
429 | u32 remote_tlb_flush; | 410 | u32 remote_tlb_flush; |
430 | u32 lpages; | 411 | u32 lpages; |
431 | }; | 412 | }; |
432 | 413 | ||
433 | struct kvm_vcpu_stat { | 414 | struct kvm_vcpu_stat { |
434 | u32 pf_fixed; | 415 | u32 pf_fixed; |
435 | u32 pf_guest; | 416 | u32 pf_guest; |
436 | u32 tlb_flush; | 417 | u32 tlb_flush; |
437 | u32 invlpg; | 418 | u32 invlpg; |
438 | 419 | ||
439 | u32 exits; | 420 | u32 exits; |
440 | u32 io_exits; | 421 | u32 io_exits; |
441 | u32 mmio_exits; | 422 | u32 mmio_exits; |
442 | u32 signal_exits; | 423 | u32 signal_exits; |
443 | u32 irq_window_exits; | 424 | u32 irq_window_exits; |
444 | u32 nmi_window_exits; | 425 | u32 nmi_window_exits; |
445 | u32 halt_exits; | 426 | u32 halt_exits; |
446 | u32 halt_wakeup; | 427 | u32 halt_wakeup; |
447 | u32 request_irq_exits; | 428 | u32 request_irq_exits; |
448 | u32 irq_exits; | 429 | u32 irq_exits; |
449 | u32 host_state_reload; | 430 | u32 host_state_reload; |
450 | u32 efer_reload; | 431 | u32 efer_reload; |
451 | u32 fpu_reload; | 432 | u32 fpu_reload; |
452 | u32 insn_emulation; | 433 | u32 insn_emulation; |
453 | u32 insn_emulation_fail; | 434 | u32 insn_emulation_fail; |
454 | u32 hypercalls; | 435 | u32 hypercalls; |
455 | u32 irq_injections; | 436 | u32 irq_injections; |
456 | u32 nmi_injections; | 437 | u32 nmi_injections; |
457 | }; | 438 | }; |
458 | 439 | ||
459 | struct kvm_x86_ops { | 440 | struct kvm_x86_ops { |
460 | int (*cpu_has_kvm_support)(void); /* __init */ | 441 | int (*cpu_has_kvm_support)(void); /* __init */ |
461 | int (*disabled_by_bios)(void); /* __init */ | 442 | int (*disabled_by_bios)(void); /* __init */ |
462 | int (*hardware_enable)(void *dummy); | 443 | int (*hardware_enable)(void *dummy); |
463 | void (*hardware_disable)(void *dummy); | 444 | void (*hardware_disable)(void *dummy); |
464 | void (*check_processor_compatibility)(void *rtn); | 445 | void (*check_processor_compatibility)(void *rtn); |
465 | int (*hardware_setup)(void); /* __init */ | 446 | int (*hardware_setup)(void); /* __init */ |
466 | void (*hardware_unsetup)(void); /* __exit */ | 447 | void (*hardware_unsetup)(void); /* __exit */ |
467 | bool (*cpu_has_accelerated_tpr)(void); | 448 | bool (*cpu_has_accelerated_tpr)(void); |
468 | void (*cpuid_update)(struct kvm_vcpu *vcpu); | 449 | void (*cpuid_update)(struct kvm_vcpu *vcpu); |
469 | 450 | ||
470 | /* Create, but do not attach this VCPU */ | 451 | /* Create, but do not attach this VCPU */ |
471 | struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id); | 452 | struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id); |
472 | void (*vcpu_free)(struct kvm_vcpu *vcpu); | 453 | void (*vcpu_free)(struct kvm_vcpu *vcpu); |
473 | int (*vcpu_reset)(struct kvm_vcpu *vcpu); | 454 | int (*vcpu_reset)(struct kvm_vcpu *vcpu); |
474 | 455 | ||
475 | void (*prepare_guest_switch)(struct kvm_vcpu *vcpu); | 456 | void (*prepare_guest_switch)(struct kvm_vcpu *vcpu); |
476 | void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); | 457 | void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); |
477 | void (*vcpu_put)(struct kvm_vcpu *vcpu); | 458 | void (*vcpu_put)(struct kvm_vcpu *vcpu); |
478 | 459 | ||
479 | void (*set_guest_debug)(struct kvm_vcpu *vcpu, | 460 | void (*set_guest_debug)(struct kvm_vcpu *vcpu, |
480 | struct kvm_guest_debug *dbg); | 461 | struct kvm_guest_debug *dbg); |
481 | int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata); | 462 | int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata); |
482 | int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); | 463 | int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); |
483 | u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg); | 464 | u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg); |
484 | void (*get_segment)(struct kvm_vcpu *vcpu, | 465 | void (*get_segment)(struct kvm_vcpu *vcpu, |
485 | struct kvm_segment *var, int seg); | 466 | struct kvm_segment *var, int seg); |
486 | int (*get_cpl)(struct kvm_vcpu *vcpu); | 467 | int (*get_cpl)(struct kvm_vcpu *vcpu); |
487 | void (*set_segment)(struct kvm_vcpu *vcpu, | 468 | void (*set_segment)(struct kvm_vcpu *vcpu, |
488 | struct kvm_segment *var, int seg); | 469 | struct kvm_segment *var, int seg); |
489 | void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l); | 470 | void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l); |
490 | void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu); | 471 | void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu); |
491 | void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu); | 472 | void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu); |
492 | void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0); | 473 | void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0); |
493 | void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3); | 474 | void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3); |
494 | void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4); | 475 | void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4); |
495 | void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer); | 476 | void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer); |
496 | void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); | 477 | void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); |
497 | void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); | 478 | void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); |
498 | void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); | 479 | void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); |
499 | void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); | 480 | void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); |
500 | void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value); | 481 | void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value); |
501 | void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg); | 482 | void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg); |
502 | unsigned long (*get_rflags)(struct kvm_vcpu *vcpu); | 483 | unsigned long (*get_rflags)(struct kvm_vcpu *vcpu); |
503 | void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); | 484 | void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); |
504 | void (*fpu_activate)(struct kvm_vcpu *vcpu); | 485 | void (*fpu_activate)(struct kvm_vcpu *vcpu); |
505 | void (*fpu_deactivate)(struct kvm_vcpu *vcpu); | 486 | void (*fpu_deactivate)(struct kvm_vcpu *vcpu); |
506 | 487 | ||
507 | void (*tlb_flush)(struct kvm_vcpu *vcpu); | 488 | void (*tlb_flush)(struct kvm_vcpu *vcpu); |
508 | 489 | ||
509 | void (*run)(struct kvm_vcpu *vcpu); | 490 | void (*run)(struct kvm_vcpu *vcpu); |
510 | int (*handle_exit)(struct kvm_vcpu *vcpu); | 491 | int (*handle_exit)(struct kvm_vcpu *vcpu); |
511 | void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu); | 492 | void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu); |
512 | void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); | 493 | void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); |
513 | u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); | 494 | u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); |
514 | void (*patch_hypercall)(struct kvm_vcpu *vcpu, | 495 | void (*patch_hypercall)(struct kvm_vcpu *vcpu, |
515 | unsigned char *hypercall_addr); | 496 | unsigned char *hypercall_addr); |
516 | void (*set_irq)(struct kvm_vcpu *vcpu); | 497 | void (*set_irq)(struct kvm_vcpu *vcpu); |
517 | void (*set_nmi)(struct kvm_vcpu *vcpu); | 498 | void (*set_nmi)(struct kvm_vcpu *vcpu); |
518 | void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr, | 499 | void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr, |
519 | bool has_error_code, u32 error_code, | 500 | bool has_error_code, u32 error_code, |
520 | bool reinject); | 501 | bool reinject); |
521 | int (*interrupt_allowed)(struct kvm_vcpu *vcpu); | 502 | int (*interrupt_allowed)(struct kvm_vcpu *vcpu); |
522 | int (*nmi_allowed)(struct kvm_vcpu *vcpu); | 503 | int (*nmi_allowed)(struct kvm_vcpu *vcpu); |
523 | bool (*get_nmi_mask)(struct kvm_vcpu *vcpu); | 504 | bool (*get_nmi_mask)(struct kvm_vcpu *vcpu); |
524 | void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked); | 505 | void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked); |
525 | void (*enable_nmi_window)(struct kvm_vcpu *vcpu); | 506 | void (*enable_nmi_window)(struct kvm_vcpu *vcpu); |
526 | void (*enable_irq_window)(struct kvm_vcpu *vcpu); | 507 | void (*enable_irq_window)(struct kvm_vcpu *vcpu); |
527 | void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); | 508 | void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); |
528 | int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); | 509 | int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); |
529 | int (*get_tdp_level)(void); | 510 | int (*get_tdp_level)(void); |
530 | u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); | 511 | u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); |
531 | int (*get_lpage_level)(void); | 512 | int (*get_lpage_level)(void); |
532 | bool (*rdtscp_supported)(void); | 513 | bool (*rdtscp_supported)(void); |
533 | 514 | ||
534 | void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); | 515 | void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); |
535 | 516 | ||
536 | const struct trace_print_flags *exit_reasons_str; | 517 | const struct trace_print_flags *exit_reasons_str; |
537 | }; | 518 | }; |
538 | 519 | ||
539 | extern struct kvm_x86_ops *kvm_x86_ops; | 520 | extern struct kvm_x86_ops *kvm_x86_ops; |
540 | 521 | ||
541 | int kvm_mmu_module_init(void); | 522 | int kvm_mmu_module_init(void); |
542 | void kvm_mmu_module_exit(void); | 523 | void kvm_mmu_module_exit(void); |
543 | 524 | ||
544 | void kvm_mmu_destroy(struct kvm_vcpu *vcpu); | 525 | void kvm_mmu_destroy(struct kvm_vcpu *vcpu); |
545 | int kvm_mmu_create(struct kvm_vcpu *vcpu); | 526 | int kvm_mmu_create(struct kvm_vcpu *vcpu); |
546 | int kvm_mmu_setup(struct kvm_vcpu *vcpu); | 527 | int kvm_mmu_setup(struct kvm_vcpu *vcpu); |
547 | void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte); | 528 | void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte); |
548 | void kvm_mmu_set_base_ptes(u64 base_pte); | 529 | void kvm_mmu_set_base_ptes(u64 base_pte); |
549 | void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, | 530 | void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, |
550 | u64 dirty_mask, u64 nx_mask, u64 x_mask); | 531 | u64 dirty_mask, u64 nx_mask, u64 x_mask); |
551 | 532 | ||
552 | int kvm_mmu_reset_context(struct kvm_vcpu *vcpu); | 533 | int kvm_mmu_reset_context(struct kvm_vcpu *vcpu); |
553 | void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); | 534 | void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); |
554 | void kvm_mmu_zap_all(struct kvm *kvm); | 535 | void kvm_mmu_zap_all(struct kvm *kvm); |
555 | unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm); | 536 | unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm); |
556 | void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages); | 537 | void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages); |
557 | 538 | ||
558 | int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3); | 539 | int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3); |
559 | 540 | ||
560 | int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, | 541 | int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, |
561 | const void *val, int bytes); | 542 | const void *val, int bytes); |
562 | int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes, | 543 | int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes, |
563 | gpa_t addr, unsigned long *ret); | 544 | gpa_t addr, unsigned long *ret); |
564 | u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); | 545 | u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); |
565 | 546 | ||
566 | extern bool tdp_enabled; | 547 | extern bool tdp_enabled; |
567 | 548 | ||
568 | enum emulation_result { | 549 | enum emulation_result { |
569 | EMULATE_DONE, /* no further processing */ | 550 | EMULATE_DONE, /* no further processing */ |
570 | EMULATE_DO_MMIO, /* kvm_run filled with mmio request */ | 551 | EMULATE_DO_MMIO, /* kvm_run filled with mmio request */ |
571 | EMULATE_FAIL, /* can't emulate this instruction */ | 552 | EMULATE_FAIL, /* can't emulate this instruction */ |
572 | }; | 553 | }; |
573 | 554 | ||
574 | #define EMULTYPE_NO_DECODE (1 << 0) | 555 | #define EMULTYPE_NO_DECODE (1 << 0) |
575 | #define EMULTYPE_TRAP_UD (1 << 1) | 556 | #define EMULTYPE_TRAP_UD (1 << 1) |
576 | #define EMULTYPE_SKIP (1 << 2) | 557 | #define EMULTYPE_SKIP (1 << 2) |
577 | int emulate_instruction(struct kvm_vcpu *vcpu, | 558 | int emulate_instruction(struct kvm_vcpu *vcpu, |
578 | unsigned long cr2, u16 error_code, int emulation_type); | 559 | unsigned long cr2, u16 error_code, int emulation_type); |
579 | void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); | 560 | void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); |
580 | void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); | 561 | void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); |
581 | 562 | ||
582 | void kvm_enable_efer_bits(u64); | 563 | void kvm_enable_efer_bits(u64); |
583 | int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); | 564 | int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); |
584 | int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); | 565 | int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); |
585 | 566 | ||
586 | struct x86_emulate_ctxt; | 567 | struct x86_emulate_ctxt; |
587 | 568 | ||
588 | int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); | 569 | int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); |
589 | void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); | 570 | void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); |
590 | int kvm_emulate_halt(struct kvm_vcpu *vcpu); | 571 | int kvm_emulate_halt(struct kvm_vcpu *vcpu); |
591 | int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); | 572 | int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); |
592 | int emulate_clts(struct kvm_vcpu *vcpu); | 573 | int emulate_clts(struct kvm_vcpu *vcpu); |
593 | 574 | ||
594 | void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); | 575 | void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); |
595 | int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); | 576 | int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); |
596 | 577 | ||
597 | int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, | 578 | int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, |
598 | bool has_error_code, u32 error_code); | 579 | bool has_error_code, u32 error_code); |
599 | 580 | ||
600 | int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0); | 581 | int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0); |
601 | int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3); | 582 | int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3); |
602 | int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4); | 583 | int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4); |
603 | void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8); | 584 | void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8); |
604 | int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val); | 585 | int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val); |
605 | int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val); | 586 | int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val); |
606 | unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu); | 587 | unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu); |
607 | void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw); | 588 | void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw); |
608 | void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l); | 589 | void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l); |
609 | int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr); | 590 | int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr); |
610 | 591 | ||
611 | int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); | 592 | int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); |
612 | int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data); | 593 | int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data); |
613 | 594 | ||
614 | unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu); | 595 | unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu); |
615 | void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags); | 596 | void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags); |
616 | 597 | ||
617 | void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr); | 598 | void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr); |
618 | void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); | 599 | void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); |
619 | void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr); | 600 | void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr); |
620 | void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); | 601 | void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); |
621 | void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2, | 602 | void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2, |
622 | u32 error_code); | 603 | u32 error_code); |
623 | bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl); | 604 | bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl); |
624 | 605 | ||
625 | int kvm_pic_set_irq(void *opaque, int irq, int level); | 606 | int kvm_pic_set_irq(void *opaque, int irq, int level); |
626 | 607 | ||
627 | void kvm_inject_nmi(struct kvm_vcpu *vcpu); | 608 | void kvm_inject_nmi(struct kvm_vcpu *vcpu); |
628 | 609 | ||
629 | int fx_init(struct kvm_vcpu *vcpu); | 610 | int fx_init(struct kvm_vcpu *vcpu); |
630 | 611 | ||
631 | void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu); | 612 | void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu); |
632 | void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, | 613 | void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, |
633 | const u8 *new, int bytes, | 614 | const u8 *new, int bytes, |
634 | bool guest_initiated); | 615 | bool guest_initiated); |
635 | int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva); | 616 | int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva); |
636 | void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu); | 617 | void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu); |
637 | int kvm_mmu_load(struct kvm_vcpu *vcpu); | 618 | int kvm_mmu_load(struct kvm_vcpu *vcpu); |
638 | void kvm_mmu_unload(struct kvm_vcpu *vcpu); | 619 | void kvm_mmu_unload(struct kvm_vcpu *vcpu); |
639 | void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu); | 620 | void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu); |
640 | gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); | 621 | gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); |
641 | gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); | 622 | gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); |
642 | gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); | 623 | gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); |
643 | gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); | 624 | gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error); |
644 | 625 | ||
645 | int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); | 626 | int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); |
646 | 627 | ||
647 | int kvm_fix_hypercall(struct kvm_vcpu *vcpu); | 628 | int kvm_fix_hypercall(struct kvm_vcpu *vcpu); |
648 | 629 | ||
649 | int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code); | 630 | int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code); |
650 | void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva); | 631 | void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva); |
651 | 632 | ||
652 | void kvm_enable_tdp(void); | 633 | void kvm_enable_tdp(void); |
653 | void kvm_disable_tdp(void); | 634 | void kvm_disable_tdp(void); |
654 | 635 | ||
655 | int complete_pio(struct kvm_vcpu *vcpu); | 636 | int complete_pio(struct kvm_vcpu *vcpu); |
656 | bool kvm_check_iopl(struct kvm_vcpu *vcpu); | 637 | bool kvm_check_iopl(struct kvm_vcpu *vcpu); |
657 | |||
658 | struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn); | ||
659 | 638 | ||
660 | static inline struct kvm_mmu_page *page_header(hpa_t shadow_page) | 639 | static inline struct kvm_mmu_page *page_header(hpa_t shadow_page) |
661 | { | 640 | { |
662 | struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT); | 641 | struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT); |
663 | 642 | ||
664 | return (struct kvm_mmu_page *)page_private(page); | 643 | return (struct kvm_mmu_page *)page_private(page); |
665 | } | 644 | } |
666 | 645 | ||
667 | static inline u16 kvm_read_fs(void) | 646 | static inline u16 kvm_read_fs(void) |
668 | { | 647 | { |
669 | u16 seg; | 648 | u16 seg; |
670 | asm("mov %%fs, %0" : "=g"(seg)); | 649 | asm("mov %%fs, %0" : "=g"(seg)); |
671 | return seg; | 650 | return seg; |
672 | } | 651 | } |
673 | 652 | ||
674 | static inline u16 kvm_read_gs(void) | 653 | static inline u16 kvm_read_gs(void) |
675 | { | 654 | { |
676 | u16 seg; | 655 | u16 seg; |
677 | asm("mov %%gs, %0" : "=g"(seg)); | 656 | asm("mov %%gs, %0" : "=g"(seg)); |
678 | return seg; | 657 | return seg; |
679 | } | 658 | } |
680 | 659 | ||
681 | static inline u16 kvm_read_ldt(void) | 660 | static inline u16 kvm_read_ldt(void) |
682 | { | 661 | { |
683 | u16 ldt; | 662 | u16 ldt; |
684 | asm("sldt %0" : "=g"(ldt)); | 663 | asm("sldt %0" : "=g"(ldt)); |
685 | return ldt; | 664 | return ldt; |
686 | } | 665 | } |
687 | 666 | ||
688 | static inline void kvm_load_fs(u16 sel) | 667 | static inline void kvm_load_fs(u16 sel) |
689 | { | 668 | { |
690 | asm("mov %0, %%fs" : : "rm"(sel)); | 669 | asm("mov %0, %%fs" : : "rm"(sel)); |
691 | } | 670 | } |
692 | 671 | ||
693 | static inline void kvm_load_gs(u16 sel) | 672 | static inline void kvm_load_gs(u16 sel) |
694 | { | 673 | { |
695 | asm("mov %0, %%gs" : : "rm"(sel)); | 674 | asm("mov %0, %%gs" : : "rm"(sel)); |
696 | } | 675 | } |
697 | 676 | ||
698 | static inline void kvm_load_ldt(u16 sel) | 677 | static inline void kvm_load_ldt(u16 sel) |
699 | { | 678 | { |
700 | asm("lldt %0" : : "rm"(sel)); | 679 | asm("lldt %0" : : "rm"(sel)); |
701 | } | 680 | } |
702 | 681 | ||
703 | #ifdef CONFIG_X86_64 | 682 | #ifdef CONFIG_X86_64 |
704 | static inline unsigned long read_msr(unsigned long msr) | 683 | static inline unsigned long read_msr(unsigned long msr) |
705 | { | 684 | { |
706 | u64 value; | 685 | u64 value; |
707 | 686 | ||
708 | rdmsrl(msr, value); | 687 | rdmsrl(msr, value); |
709 | return value; | 688 | return value; |
710 | } | 689 | } |
711 | #endif | 690 | #endif |
712 | 691 | ||
713 | static inline u32 get_rdx_init_val(void) | 692 | static inline u32 get_rdx_init_val(void) |
714 | { | 693 | { |
715 | return 0x600; /* P6 family */ | 694 | return 0x600; /* P6 family */ |
716 | } | 695 | } |
717 | 696 | ||
718 | static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code) | 697 | static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code) |
719 | { | 698 | { |
720 | kvm_queue_exception_e(vcpu, GP_VECTOR, error_code); | 699 | kvm_queue_exception_e(vcpu, GP_VECTOR, error_code); |
721 | } | 700 | } |
722 | 701 | ||
723 | #define TSS_IOPB_BASE_OFFSET 0x66 | 702 | #define TSS_IOPB_BASE_OFFSET 0x66 |
724 | #define TSS_BASE_SIZE 0x68 | 703 | #define TSS_BASE_SIZE 0x68 |
725 | #define TSS_IOPB_SIZE (65536 / 8) | 704 | #define TSS_IOPB_SIZE (65536 / 8) |
726 | #define TSS_REDIRECTION_SIZE (256 / 8) | 705 | #define TSS_REDIRECTION_SIZE (256 / 8) |
727 | #define RMODE_TSS_SIZE \ | 706 | #define RMODE_TSS_SIZE \ |
728 | (TSS_BASE_SIZE + TSS_REDIRECTION_SIZE + TSS_IOPB_SIZE + 1) | 707 | (TSS_BASE_SIZE + TSS_REDIRECTION_SIZE + TSS_IOPB_SIZE + 1) |
729 | 708 | ||
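For reference, these constants put the real-mode TSS at 0x68 + 256/8 + 65536/8 + 1 = 104 + 32 + 8192 + 1 = 8329 bytes: the base TSS, the interrupt redirection bitmap, the full 64K I/O permission bitmap, and the trailing terminator byte the I/O bitmap requires.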
730 | enum { | 709 | enum { |
731 | TASK_SWITCH_CALL = 0, | 710 | TASK_SWITCH_CALL = 0, |
732 | TASK_SWITCH_IRET = 1, | 711 | TASK_SWITCH_IRET = 1, |
733 | TASK_SWITCH_JMP = 2, | 712 | TASK_SWITCH_JMP = 2, |
734 | TASK_SWITCH_GATE = 3, | 713 | TASK_SWITCH_GATE = 3, |
735 | }; | 714 | }; |
736 | 715 | ||
737 | #define HF_GIF_MASK (1 << 0) | 716 | #define HF_GIF_MASK (1 << 0) |
738 | #define HF_HIF_MASK (1 << 1) | 717 | #define HF_HIF_MASK (1 << 1) |
739 | #define HF_VINTR_MASK (1 << 2) | 718 | #define HF_VINTR_MASK (1 << 2) |
740 | #define HF_NMI_MASK (1 << 3) | 719 | #define HF_NMI_MASK (1 << 3) |
741 | #define HF_IRET_MASK (1 << 4) | 720 | #define HF_IRET_MASK (1 << 4) |
742 | 721 | ||
743 | /* | 722 | /* |
744 | * Hardware virtualization extension instructions may fault if a | 723 | * Hardware virtualization extension instructions may fault if a |
745 | * reboot turns off virtualization while processes are running. | 724 | * reboot turns off virtualization while processes are running. |
746 | * Trap the fault and ignore the instruction if that happens. | 725 | * Trap the fault and ignore the instruction if that happens. |
747 | */ | 726 | */ |
748 | asmlinkage void kvm_handle_fault_on_reboot(void); | 727 | asmlinkage void kvm_handle_fault_on_reboot(void); |
749 | 728 | ||
750 | #define __kvm_handle_fault_on_reboot(insn) \ | 729 | #define __kvm_handle_fault_on_reboot(insn) \ |
751 | "666: " insn "\n\t" \ | 730 | "666: " insn "\n\t" \ |
752 | ".pushsection .fixup, \"ax\" \n" \ | 731 | ".pushsection .fixup, \"ax\" \n" \ |
753 | "667: \n\t" \ | 732 | "667: \n\t" \ |
754 | __ASM_SIZE(push) " $666b \n\t" \ | 733 | __ASM_SIZE(push) " $666b \n\t" \ |
755 | "jmp kvm_handle_fault_on_reboot \n\t" \ | 734 | "jmp kvm_handle_fault_on_reboot \n\t" \ |
756 | ".popsection \n\t" \ | 735 | ".popsection \n\t" \ |
757 | ".pushsection __ex_table, \"a\" \n\t" \ | 736 | ".pushsection __ex_table, \"a\" \n\t" \ |
758 | _ASM_PTR " 666b, 667b \n\t" \ | 737 | _ASM_PTR " 666b, 667b \n\t" \ |
759 | ".popsection" | 738 | ".popsection" |
760 | 739 | ||
761 | #define KVM_ARCH_WANT_MMU_NOTIFIER | 740 | #define KVM_ARCH_WANT_MMU_NOTIFIER |
762 | int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); | 741 | int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); |
763 | int kvm_age_hva(struct kvm *kvm, unsigned long hva); | 742 | int kvm_age_hva(struct kvm *kvm, unsigned long hva); |
764 | void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); | 743 | void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); |
765 | int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); | 744 | int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); |
766 | int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); | 745 | int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); |
767 | int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); | 746 | int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); |
768 | int kvm_cpu_get_interrupt(struct kvm_vcpu *v); | 747 | int kvm_cpu_get_interrupt(struct kvm_vcpu *v); |
769 | 748 | ||
770 | void kvm_define_shared_msr(unsigned index, u32 msr); | 749 | void kvm_define_shared_msr(unsigned index, u32 msr); |
771 | void kvm_set_shared_msr(unsigned index, u64 val, u64 mask); | 750 | void kvm_set_shared_msr(unsigned index, u64 val, u64 mask); |
772 | 751 | ||
773 | bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip); | 752 | bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip); |
774 | 753 | ||
775 | #endif /* _ASM_X86_KVM_HOST_H */ | 754 | #endif /* _ASM_X86_KVM_HOST_H */ |
776 | 755 |
arch/x86/kvm/mmu.c
1 | /* | 1 | /* |
2 | * Kernel-based Virtual Machine driver for Linux | 2 | * Kernel-based Virtual Machine driver for Linux |
3 | * | 3 | * |
4 | * This module enables machines with Intel VT-x extensions to run virtual | 4 | * This module enables machines with Intel VT-x extensions to run virtual |
5 | * machines without emulation or binary translation. | 5 | * machines without emulation or binary translation. |
6 | * | 6 | * |
7 | * MMU support | 7 | * MMU support |
8 | * | 8 | * |
9 | * Copyright (C) 2006 Qumranet, Inc. | 9 | * Copyright (C) 2006 Qumranet, Inc. |
10 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. | 10 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. |
11 | * | 11 | * |
12 | * Authors: | 12 | * Authors: |
13 | * Yaniv Kamay <yaniv@qumranet.com> | 13 | * Yaniv Kamay <yaniv@qumranet.com> |
14 | * Avi Kivity <avi@qumranet.com> | 14 | * Avi Kivity <avi@qumranet.com> |
15 | * | 15 | * |
16 | * This work is licensed under the terms of the GNU GPL, version 2. See | 16 | * This work is licensed under the terms of the GNU GPL, version 2. See |
17 | * the COPYING file in the top-level directory. | 17 | * the COPYING file in the top-level directory. |
18 | * | 18 | * |
19 | */ | 19 | */ |
20 | 20 | ||
21 | #include "mmu.h" | 21 | #include "mmu.h" |
22 | #include "x86.h" | 22 | #include "x86.h" |
23 | #include "kvm_cache_regs.h" | 23 | #include "kvm_cache_regs.h" |
24 | 24 | ||
25 | #include <linux/kvm_host.h> | 25 | #include <linux/kvm_host.h> |
26 | #include <linux/types.h> | 26 | #include <linux/types.h> |
27 | #include <linux/string.h> | 27 | #include <linux/string.h> |
28 | #include <linux/mm.h> | 28 | #include <linux/mm.h> |
29 | #include <linux/highmem.h> | 29 | #include <linux/highmem.h> |
30 | #include <linux/module.h> | 30 | #include <linux/module.h> |
31 | #include <linux/swap.h> | 31 | #include <linux/swap.h> |
32 | #include <linux/hugetlb.h> | 32 | #include <linux/hugetlb.h> |
33 | #include <linux/compiler.h> | 33 | #include <linux/compiler.h> |
34 | #include <linux/srcu.h> | 34 | #include <linux/srcu.h> |
35 | #include <linux/slab.h> | 35 | #include <linux/slab.h> |
36 | #include <linux/uaccess.h> | 36 | #include <linux/uaccess.h> |
37 | 37 | ||
38 | #include <asm/page.h> | 38 | #include <asm/page.h> |
39 | #include <asm/cmpxchg.h> | 39 | #include <asm/cmpxchg.h> |
40 | #include <asm/io.h> | 40 | #include <asm/io.h> |
41 | #include <asm/vmx.h> | 41 | #include <asm/vmx.h> |
42 | 42 | ||
43 | /* | 43 | /* |
44 | * Setting this variable to true enables Two-Dimensional Paging (TDP), | 44 | * Setting this variable to true enables Two-Dimensional Paging (TDP), |
45 | * where the hardware walks two page tables: | 45 | * where the hardware walks two page tables: |
46 | * 1. the guest-virtual to guest-physical translation | 46 | * 1. the guest-virtual to guest-physical translation |
47 | * 2. while doing 1, the guest-physical to host-physical translation | 47 | * 2. while doing 1, the guest-physical to host-physical translation |
48 | * If the hardware supports this, we don't need to do shadow paging. | 48 | * If the hardware supports this, we don't need to do shadow paging. |
49 | */ | 49 | */ |
50 | bool tdp_enabled = false; | 50 | bool tdp_enabled = false; |
51 | 51 | ||
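tdp_enabled is flipped by the vendor module at hardware-setup time through the kvm_enable_tdp()/kvm_disable_tdp() helpers declared in kvm_host.h. A sketch of that decision, with the capability flag invented for illustration:

    /* Illustrative: enable TDP only when the CPU can walk nested tables. */
    static __init void example_choose_paging(bool cpu_has_nested_paging)
    {
            if (cpu_has_nested_paging)
                    kvm_enable_tdp();
            else
                    kvm_disable_tdp();
    }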
52 | #undef MMU_DEBUG | 52 | #undef MMU_DEBUG |
53 | 53 | ||
54 | #undef AUDIT | 54 | #undef AUDIT |
55 | 55 | ||
56 | #ifdef AUDIT | 56 | #ifdef AUDIT |
57 | static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg); | 57 | static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg); |
58 | #else | 58 | #else |
59 | static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {} | 59 | static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {} |
60 | #endif | 60 | #endif |
61 | 61 | ||
62 | #ifdef MMU_DEBUG | 62 | #ifdef MMU_DEBUG |
63 | 63 | ||
64 | #define pgprintk(x...) do { if (dbg) printk(x); } while (0) | 64 | #define pgprintk(x...) do { if (dbg) printk(x); } while (0) |
65 | #define rmap_printk(x...) do { if (dbg) printk(x); } while (0) | 65 | #define rmap_printk(x...) do { if (dbg) printk(x); } while (0) |
66 | 66 | ||
67 | #else | 67 | #else |
68 | 68 | ||
69 | #define pgprintk(x...) do { } while (0) | 69 | #define pgprintk(x...) do { } while (0) |
70 | #define rmap_printk(x...) do { } while (0) | 70 | #define rmap_printk(x...) do { } while (0) |
71 | 71 | ||
72 | #endif | 72 | #endif |
73 | 73 | ||
74 | #if defined(MMU_DEBUG) || defined(AUDIT) | 74 | #if defined(MMU_DEBUG) || defined(AUDIT) |
75 | static int dbg = 0; | 75 | static int dbg = 0; |
76 | module_param(dbg, bool, 0644); | 76 | module_param(dbg, bool, 0644); |
77 | #endif | 77 | #endif |
78 | 78 | ||
79 | static int oos_shadow = 1; | 79 | static int oos_shadow = 1; |
80 | module_param(oos_shadow, bool, 0644); | 80 | module_param(oos_shadow, bool, 0644); |
81 | 81 | ||
82 | #ifndef MMU_DEBUG | 82 | #ifndef MMU_DEBUG |
83 | #define ASSERT(x) do { } while (0) | 83 | #define ASSERT(x) do { } while (0) |
84 | #else | 84 | #else |
85 | #define ASSERT(x) \ | 85 | #define ASSERT(x) \ |
86 | if (!(x)) { \ | 86 | if (!(x)) { \ |
87 | printk(KERN_WARNING "assertion failed %s:%d: %s\n", \ | 87 | printk(KERN_WARNING "assertion failed %s:%d: %s\n", \ |
88 | __FILE__, __LINE__, #x); \ | 88 | __FILE__, __LINE__, #x); \ |
89 | } | 89 | } |
90 | #endif | 90 | #endif |
91 | 91 | ||
92 | #define PT_FIRST_AVAIL_BITS_SHIFT 9 | 92 | #define PT_FIRST_AVAIL_BITS_SHIFT 9 |
93 | #define PT64_SECOND_AVAIL_BITS_SHIFT 52 | 93 | #define PT64_SECOND_AVAIL_BITS_SHIFT 52 |
94 | 94 | ||
95 | #define VALID_PAGE(x) ((x) != INVALID_PAGE) | 95 | #define VALID_PAGE(x) ((x) != INVALID_PAGE) |
96 | 96 | ||
97 | #define PT64_LEVEL_BITS 9 | 97 | #define PT64_LEVEL_BITS 9 |
98 | 98 | ||
99 | #define PT64_LEVEL_SHIFT(level) \ | 99 | #define PT64_LEVEL_SHIFT(level) \ |
100 | (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS) | 100 | (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS) |
101 | 101 | ||
102 | #define PT64_LEVEL_MASK(level) \ | 102 | #define PT64_LEVEL_MASK(level) \ |
103 | (((1ULL << PT64_LEVEL_BITS) - 1) << PT64_LEVEL_SHIFT(level)) | 103 | (((1ULL << PT64_LEVEL_BITS) - 1) << PT64_LEVEL_SHIFT(level)) |
104 | 104 | ||
105 | #define PT64_INDEX(address, level)\ | 105 | #define PT64_INDEX(address, level)\ |
106 | (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) | 106 | (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) |
107 | 107 | ||
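Each 64-bit paging level consumes 9 bits of the address above the 12-bit page offset, so PT64_INDEX is just a shift and a mask. A worked example for an arbitrary address (illustrative and standalone):

    /* Illustrative: index extraction for a sample canonical address. */
    static void example_indices(void)
    {
            unsigned long addr = 0x7f1234567000UL;  /* arbitrary example */
            int idx1 = (addr >> 12) & 511;  /* PT64_INDEX(addr, 1): bits 12-20 */
            int idx4 = (addr >> 39) & 511;  /* PT64_INDEX(addr, 4): bits 39-47 */

            (void)idx1;
            (void)idx4;
    }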
108 | 108 | ||
109 | #define PT32_LEVEL_BITS 10 | 109 | #define PT32_LEVEL_BITS 10 |
110 | 110 | ||
111 | #define PT32_LEVEL_SHIFT(level) \ | 111 | #define PT32_LEVEL_SHIFT(level) \ |
112 | (PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS) | 112 | (PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS) |
113 | 113 | ||
114 | #define PT32_LEVEL_MASK(level) \ | 114 | #define PT32_LEVEL_MASK(level) \ |
115 | (((1ULL << PT32_LEVEL_BITS) - 1) << PT32_LEVEL_SHIFT(level)) | 115 | (((1ULL << PT32_LEVEL_BITS) - 1) << PT32_LEVEL_SHIFT(level)) |
116 | #define PT32_LVL_OFFSET_MASK(level) \ | 116 | #define PT32_LVL_OFFSET_MASK(level) \ |
117 | (PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ | 117 | (PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ |
118 | * PT32_LEVEL_BITS))) - 1)) | 118 | * PT32_LEVEL_BITS))) - 1)) |
119 | 119 | ||
120 | #define PT32_INDEX(address, level)\ | 120 | #define PT32_INDEX(address, level)\ |
121 | (((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1)) | 121 | (((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1)) |
122 | 122 | ||
123 | 123 | ||
124 | #define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) | 124 | #define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) |
125 | #define PT64_DIR_BASE_ADDR_MASK \ | 125 | #define PT64_DIR_BASE_ADDR_MASK \ |
126 | (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1)) | 126 | (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1)) |
127 | #define PT64_LVL_ADDR_MASK(level) \ | 127 | #define PT64_LVL_ADDR_MASK(level) \ |
128 | (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ | 128 | (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ |
129 | * PT64_LEVEL_BITS))) - 1)) | 129 | * PT64_LEVEL_BITS))) - 1)) |
130 | #define PT64_LVL_OFFSET_MASK(level) \ | 130 | #define PT64_LVL_OFFSET_MASK(level) \ |
131 | (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ | 131 | (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ |
132 | * PT64_LEVEL_BITS))) - 1)) | 132 | * PT64_LEVEL_BITS))) - 1)) |
133 | 133 | ||
134 | #define PT32_BASE_ADDR_MASK PAGE_MASK | 134 | #define PT32_BASE_ADDR_MASK PAGE_MASK |
135 | #define PT32_DIR_BASE_ADDR_MASK \ | 135 | #define PT32_DIR_BASE_ADDR_MASK \ |
136 | (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1)) | 136 | (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1)) |
137 | #define PT32_LVL_ADDR_MASK(level) \ | 137 | #define PT32_LVL_ADDR_MASK(level) \ |
138 | (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ | 138 | (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ |
139 | * PT32_LEVEL_BITS))) - 1)) | 139 | * PT32_LEVEL_BITS))) - 1)) |
140 | 140 | ||
141 | #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \ | 141 | #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \ |
142 | | PT64_NX_MASK) | 142 | | PT64_NX_MASK) |
143 | 143 | ||
144 | #define RMAP_EXT 4 | 144 | #define RMAP_EXT 4 |
145 | 145 | ||
146 | #define ACC_EXEC_MASK 1 | 146 | #define ACC_EXEC_MASK 1 |
147 | #define ACC_WRITE_MASK PT_WRITABLE_MASK | 147 | #define ACC_WRITE_MASK PT_WRITABLE_MASK |
148 | #define ACC_USER_MASK PT_USER_MASK | 148 | #define ACC_USER_MASK PT_USER_MASK |
149 | #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) | 149 | #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) |
150 | 150 | ||
151 | #include <trace/events/kvm.h> | 151 | #include <trace/events/kvm.h> |
152 | 152 | ||
153 | #define CREATE_TRACE_POINTS | 153 | #define CREATE_TRACE_POINTS |
154 | #include "mmutrace.h" | 154 | #include "mmutrace.h" |
155 | 155 | ||
156 | #define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) | 156 | #define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) |
157 | 157 | ||
158 | #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) | 158 | #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) |
159 | 159 | ||
160 | struct kvm_rmap_desc { | 160 | struct kvm_rmap_desc { |
161 | u64 *sptes[RMAP_EXT]; | 161 | u64 *sptes[RMAP_EXT]; |
162 | struct kvm_rmap_desc *more; | 162 | struct kvm_rmap_desc *more; |
163 | }; | 163 | }; |
164 | 164 | ||
165 | struct kvm_shadow_walk_iterator { | 165 | struct kvm_shadow_walk_iterator { |
166 | u64 addr; | 166 | u64 addr; |
167 | hpa_t shadow_addr; | 167 | hpa_t shadow_addr; |
168 | int level; | 168 | int level; |
169 | u64 *sptep; | 169 | u64 *sptep; |
170 | unsigned index; | 170 | unsigned index; |
171 | }; | 171 | }; |
172 | 172 | ||
173 | #define for_each_shadow_entry(_vcpu, _addr, _walker) \ | 173 | #define for_each_shadow_entry(_vcpu, _addr, _walker) \ |
174 | for (shadow_walk_init(&(_walker), _vcpu, _addr); \ | 174 | for (shadow_walk_init(&(_walker), _vcpu, _addr); \ |
175 | shadow_walk_okay(&(_walker)); \ | 175 | shadow_walk_okay(&(_walker)); \ |
176 | shadow_walk_next(&(_walker))) | 176 | shadow_walk_next(&(_walker))) |
177 | 177 | ||
178 | typedef void (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp, u64 *spte); | 178 | typedef void (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp, u64 *spte); |
179 | 179 | ||
180 | static struct kmem_cache *pte_chain_cache; | 180 | static struct kmem_cache *pte_chain_cache; |
181 | static struct kmem_cache *rmap_desc_cache; | 181 | static struct kmem_cache *rmap_desc_cache; |
182 | static struct kmem_cache *mmu_page_header_cache; | 182 | static struct kmem_cache *mmu_page_header_cache; |
183 | 183 | ||
184 | static u64 __read_mostly shadow_trap_nonpresent_pte; | 184 | static u64 __read_mostly shadow_trap_nonpresent_pte; |
185 | static u64 __read_mostly shadow_notrap_nonpresent_pte; | 185 | static u64 __read_mostly shadow_notrap_nonpresent_pte; |
186 | static u64 __read_mostly shadow_base_present_pte; | 186 | static u64 __read_mostly shadow_base_present_pte; |
187 | static u64 __read_mostly shadow_nx_mask; | 187 | static u64 __read_mostly shadow_nx_mask; |
188 | static u64 __read_mostly shadow_x_mask; /* mutually exclusive with nx_mask */ | 188 | static u64 __read_mostly shadow_x_mask; /* mutually exclusive with nx_mask */ |
189 | static u64 __read_mostly shadow_user_mask; | 189 | static u64 __read_mostly shadow_user_mask; |
190 | static u64 __read_mostly shadow_accessed_mask; | 190 | static u64 __read_mostly shadow_accessed_mask; |
191 | static u64 __read_mostly shadow_dirty_mask; | 191 | static u64 __read_mostly shadow_dirty_mask; |
192 | 192 | ||
193 | static inline u64 rsvd_bits(int s, int e) | 193 | static inline u64 rsvd_bits(int s, int e) |
194 | { | 194 | { |
195 | return ((1ULL << (e - s + 1)) - 1) << s; | 195 | return ((1ULL << (e - s + 1)) - 1) << s; |
196 | } | 196 | } |
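
Reviewer note: rsvd_bits(s, e) builds a mask covering the closed bit range [s, e]. The following standalone sketch is not part of the patch; it exists only to spot-check the arithmetic and compiles with any C compiler:

    /* Hedged sketch: spot-check of the rsvd_bits() mask math. */
    #include <stdio.h>
    #include <stdint.h>

    static inline uint64_t rsvd_bits(int s, int e)
    {
            return ((1ULL << (e - s + 1)) - 1) << s;
    }

    int main(void)
    {
            /* bits 52..62 set: expect 0x7ff0000000000000 */
            printf("%#llx\n", (unsigned long long)rsvd_bits(52, 62));
            /* a single bit: rsvd_bits(8, 8) == 0x100 */
            printf("%#llx\n", (unsigned long long)rsvd_bits(8, 8));
            return 0;
    }
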
197 | 197 | ||
198 | void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte) | 198 | void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte) |
199 | { | 199 | { |
200 | shadow_trap_nonpresent_pte = trap_pte; | 200 | shadow_trap_nonpresent_pte = trap_pte; |
201 | shadow_notrap_nonpresent_pte = notrap_pte; | 201 | shadow_notrap_nonpresent_pte = notrap_pte; |
202 | } | 202 | } |
203 | EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes); | 203 | EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes); |
204 | 204 | ||
205 | void kvm_mmu_set_base_ptes(u64 base_pte) | 205 | void kvm_mmu_set_base_ptes(u64 base_pte) |
206 | { | 206 | { |
207 | shadow_base_present_pte = base_pte; | 207 | shadow_base_present_pte = base_pte; |
208 | } | 208 | } |
209 | EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes); | 209 | EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes); |
210 | 210 | ||
211 | void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, | 211 | void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, |
212 | u64 dirty_mask, u64 nx_mask, u64 x_mask) | 212 | u64 dirty_mask, u64 nx_mask, u64 x_mask) |
213 | { | 213 | { |
214 | shadow_user_mask = user_mask; | 214 | shadow_user_mask = user_mask; |
215 | shadow_accessed_mask = accessed_mask; | 215 | shadow_accessed_mask = accessed_mask; |
216 | shadow_dirty_mask = dirty_mask; | 216 | shadow_dirty_mask = dirty_mask; |
217 | shadow_nx_mask = nx_mask; | 217 | shadow_nx_mask = nx_mask; |
218 | shadow_x_mask = x_mask; | 218 | shadow_x_mask = x_mask; |
219 | } | 219 | } |
220 | EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); | 220 | EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); |
221 | 221 | ||
222 | static bool is_write_protection(struct kvm_vcpu *vcpu) | 222 | static bool is_write_protection(struct kvm_vcpu *vcpu) |
223 | { | 223 | { |
224 | return kvm_read_cr0_bits(vcpu, X86_CR0_WP); | 224 | return kvm_read_cr0_bits(vcpu, X86_CR0_WP); |
225 | } | 225 | } |
226 | 226 | ||
227 | static int is_cpuid_PSE36(void) | 227 | static int is_cpuid_PSE36(void) |
228 | { | 228 | { |
229 | return 1; | 229 | return 1; |
230 | } | 230 | } |
231 | 231 | ||
232 | static int is_nx(struct kvm_vcpu *vcpu) | 232 | static int is_nx(struct kvm_vcpu *vcpu) |
233 | { | 233 | { |
234 | return vcpu->arch.efer & EFER_NX; | 234 | return vcpu->arch.efer & EFER_NX; |
235 | } | 235 | } |
236 | 236 | ||
237 | static int is_shadow_present_pte(u64 pte) | 237 | static int is_shadow_present_pte(u64 pte) |
238 | { | 238 | { |
239 | return pte != shadow_trap_nonpresent_pte | 239 | return pte != shadow_trap_nonpresent_pte |
240 | && pte != shadow_notrap_nonpresent_pte; | 240 | && pte != shadow_notrap_nonpresent_pte; |
241 | } | 241 | } |
242 | 242 | ||
243 | static int is_large_pte(u64 pte) | 243 | static int is_large_pte(u64 pte) |
244 | { | 244 | { |
245 | return pte & PT_PAGE_SIZE_MASK; | 245 | return pte & PT_PAGE_SIZE_MASK; |
246 | } | 246 | } |
247 | 247 | ||
248 | static int is_writable_pte(unsigned long pte) | 248 | static int is_writable_pte(unsigned long pte) |
249 | { | 249 | { |
250 | return pte & PT_WRITABLE_MASK; | 250 | return pte & PT_WRITABLE_MASK; |
251 | } | 251 | } |
252 | 252 | ||
253 | static int is_dirty_gpte(unsigned long pte) | 253 | static int is_dirty_gpte(unsigned long pte) |
254 | { | 254 | { |
255 | return pte & PT_DIRTY_MASK; | 255 | return pte & PT_DIRTY_MASK; |
256 | } | 256 | } |
257 | 257 | ||
258 | static int is_rmap_spte(u64 pte) | 258 | static int is_rmap_spte(u64 pte) |
259 | { | 259 | { |
260 | return is_shadow_present_pte(pte); | 260 | return is_shadow_present_pte(pte); |
261 | } | 261 | } |
262 | 262 | ||
263 | static int is_last_spte(u64 pte, int level) | 263 | static int is_last_spte(u64 pte, int level) |
264 | { | 264 | { |
265 | if (level == PT_PAGE_TABLE_LEVEL) | 265 | if (level == PT_PAGE_TABLE_LEVEL) |
266 | return 1; | 266 | return 1; |
267 | if (is_large_pte(pte)) | 267 | if (is_large_pte(pte)) |
268 | return 1; | 268 | return 1; |
269 | return 0; | 269 | return 0; |
270 | } | 270 | } |
271 | 271 | ||
272 | static pfn_t spte_to_pfn(u64 pte) | 272 | static pfn_t spte_to_pfn(u64 pte) |
273 | { | 273 | { |
274 | return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; | 274 | return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; |
275 | } | 275 | } |
276 | 276 | ||
277 | static gfn_t pse36_gfn_delta(u32 gpte) | 277 | static gfn_t pse36_gfn_delta(u32 gpte) |
278 | { | 278 | { |
279 | int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT; | 279 | int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT; |
280 | 280 | ||
281 | return (gpte & PT32_DIR_PSE36_MASK) << shift; | 281 | return (gpte & PT32_DIR_PSE36_MASK) << shift; |
282 | } | 282 | } |
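
Reviewer note: PSE-36 stores the upper physical-address bits of a 4MB mapping in PDE bits 13 and up, and pse36_gfn_delta() shifts them into gfn position. A standalone sketch of the arithmetic follows; the constants mirror the usual x86 layout (PAGE_SHIFT 12, field in PDE bits 13..16) but are local assumptions of this sketch, not the kernel's definitions:

    /* Hedged sketch of the PSE-36 gfn-delta arithmetic; not kernel code. */
    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SHIFT              12
    #define PT32_DIR_PSE36_SHIFT    13
    #define PT32_DIR_PSE36_MASK     (0xfULL << PT32_DIR_PSE36_SHIFT)

    static uint64_t pse36_gfn_delta(uint32_t gpte)
    {
            int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;     /* == 7 */

            return (gpte & PT32_DIR_PSE36_MASK) << shift;
    }

    int main(void)
    {
            /* field value 0x5 in PDE bits 13..16 -> physical bits 32..35,
             * i.e. gfn bits 20..23: expect 0x5 << 20 == 0x500000 */
            uint32_t gpte = 0x5u << PT32_DIR_PSE36_SHIFT;

            printf("%#llx\n", (unsigned long long)pse36_gfn_delta(gpte));
            return 0;
    }
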
283 | 283 | ||
284 | static void __set_spte(u64 *sptep, u64 spte) | 284 | static void __set_spte(u64 *sptep, u64 spte) |
285 | { | 285 | { |
286 | #ifdef CONFIG_X86_64 | 286 | #ifdef CONFIG_X86_64 |
287 | set_64bit((unsigned long *)sptep, spte); | 287 | set_64bit((unsigned long *)sptep, spte); |
288 | #else | 288 | #else |
289 | set_64bit((unsigned long long *)sptep, spte); | 289 | set_64bit((unsigned long long *)sptep, spte); |
290 | #endif | 290 | #endif |
291 | } | 291 | } |
292 | 292 | ||
293 | static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, | 293 | static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, |
294 | struct kmem_cache *base_cache, int min) | 294 | struct kmem_cache *base_cache, int min) |
295 | { | 295 | { |
296 | void *obj; | 296 | void *obj; |
297 | 297 | ||
298 | if (cache->nobjs >= min) | 298 | if (cache->nobjs >= min) |
299 | return 0; | 299 | return 0; |
300 | while (cache->nobjs < ARRAY_SIZE(cache->objects)) { | 300 | while (cache->nobjs < ARRAY_SIZE(cache->objects)) { |
301 | obj = kmem_cache_zalloc(base_cache, GFP_KERNEL); | 301 | obj = kmem_cache_zalloc(base_cache, GFP_KERNEL); |
302 | if (!obj) | 302 | if (!obj) |
303 | return -ENOMEM; | 303 | return -ENOMEM; |
304 | cache->objects[cache->nobjs++] = obj; | 304 | cache->objects[cache->nobjs++] = obj; |
305 | } | 305 | } |
306 | return 0; | 306 | return 0; |
307 | } | 307 | } |
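
Reviewer note: the topup/alloc split above is a deliberate design point. The caches are filled here, where allocation may sleep, so that mmu_memory_cache_alloc() further down can pop objects with no failure path (it BUG_ONs on an empty cache) while mmu_lock is held. A minimal userspace sketch of the same pattern, with calloc() standing in for kmem_cache_zalloc() and all names local to the sketch:

    /* Hedged sketch of the topup/alloc pattern; not kernel code. */
    #include <stdio.h>
    #include <stdlib.h>

    #define CACHE_OBJS 8

    struct cache {
            void *objects[CACHE_OBJS];
            int nobjs;
    };

    static int topup(struct cache *c, int min)
    {
            if (c->nobjs >= min)
                    return 0;
            while (c->nobjs < CACHE_OBJS) {
                    void *obj = calloc(1, 64);  /* may "sleep": done outside the lock */
                    if (!obj)
                            return -1;
                    c->objects[c->nobjs++] = obj;
            }
            return 0;
    }

    static void *cache_pop(struct cache *c)
    {
            /* no failure path: a successful topup() must precede this */
            return c->objects[--c->nobjs];
    }

    int main(void)
    {
            struct cache c = { .nobjs = 0 };

            if (topup(&c, 4) == 0)
                    printf("filled to %d, popped %p\n", c.nobjs, cache_pop(&c));
            return 0;
    }
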
308 | 308 | ||
309 | static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc, | 309 | static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc, |
310 | struct kmem_cache *cache) | 310 | struct kmem_cache *cache) |
311 | { | 311 | { |
312 | while (mc->nobjs) | 312 | while (mc->nobjs) |
313 | kmem_cache_free(cache, mc->objects[--mc->nobjs]); | 313 | kmem_cache_free(cache, mc->objects[--mc->nobjs]); |
314 | } | 314 | } |
315 | 315 | ||
316 | static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache, | 316 | static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache, |
317 | int min) | 317 | int min) |
318 | { | 318 | { |
319 | struct page *page; | 319 | struct page *page; |
320 | 320 | ||
321 | if (cache->nobjs >= min) | 321 | if (cache->nobjs >= min) |
322 | return 0; | 322 | return 0; |
323 | while (cache->nobjs < ARRAY_SIZE(cache->objects)) { | 323 | while (cache->nobjs < ARRAY_SIZE(cache->objects)) { |
324 | page = alloc_page(GFP_KERNEL); | 324 | page = alloc_page(GFP_KERNEL); |
325 | if (!page) | 325 | if (!page) |
326 | return -ENOMEM; | 326 | return -ENOMEM; |
327 | cache->objects[cache->nobjs++] = page_address(page); | 327 | cache->objects[cache->nobjs++] = page_address(page); |
328 | } | 328 | } |
329 | return 0; | 329 | return 0; |
330 | } | 330 | } |
331 | 331 | ||
332 | static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc) | 332 | static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc) |
333 | { | 333 | { |
334 | while (mc->nobjs) | 334 | while (mc->nobjs) |
335 | free_page((unsigned long)mc->objects[--mc->nobjs]); | 335 | free_page((unsigned long)mc->objects[--mc->nobjs]); |
336 | } | 336 | } |
337 | 337 | ||
338 | static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu) | 338 | static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu) |
339 | { | 339 | { |
340 | int r; | 340 | int r; |
341 | 341 | ||
342 | r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache, | 342 | r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache, |
343 | pte_chain_cache, 4); | 343 | pte_chain_cache, 4); |
344 | if (r) | 344 | if (r) |
345 | goto out; | 345 | goto out; |
346 | r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, | 346 | r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, |
347 | rmap_desc_cache, 4); | 347 | rmap_desc_cache, 4); |
348 | if (r) | 348 | if (r) |
349 | goto out; | 349 | goto out; |
350 | r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8); | 350 | r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8); |
351 | if (r) | 351 | if (r) |
352 | goto out; | 352 | goto out; |
353 | r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, | 353 | r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, |
354 | mmu_page_header_cache, 4); | 354 | mmu_page_header_cache, 4); |
355 | out: | 355 | out: |
356 | return r; | 356 | return r; |
357 | } | 357 | } |
358 | 358 | ||
359 | static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) | 359 | static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) |
360 | { | 360 | { |
361 | mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache, pte_chain_cache); | 361 | mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache, pte_chain_cache); |
362 | mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, rmap_desc_cache); | 362 | mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, rmap_desc_cache); |
363 | mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache); | 363 | mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache); |
364 | mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache, | 364 | mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache, |
365 | mmu_page_header_cache); | 365 | mmu_page_header_cache); |
366 | } | 366 | } |
367 | 367 | ||
368 | static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc, | 368 | static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc, |
369 | size_t size) | 369 | size_t size) |
370 | { | 370 | { |
371 | void *p; | 371 | void *p; |
372 | 372 | ||
373 | BUG_ON(!mc->nobjs); | 373 | BUG_ON(!mc->nobjs); |
374 | p = mc->objects[--mc->nobjs]; | 374 | p = mc->objects[--mc->nobjs]; |
375 | return p; | 375 | return p; |
376 | } | 376 | } |
377 | 377 | ||
378 | static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu) | 378 | static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu) |
379 | { | 379 | { |
380 | return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache, | 380 | return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache, |
381 | sizeof(struct kvm_pte_chain)); | 381 | sizeof(struct kvm_pte_chain)); |
382 | } | 382 | } |
383 | 383 | ||
384 | static void mmu_free_pte_chain(struct kvm_pte_chain *pc) | 384 | static void mmu_free_pte_chain(struct kvm_pte_chain *pc) |
385 | { | 385 | { |
386 | kmem_cache_free(pte_chain_cache, pc); | 386 | kmem_cache_free(pte_chain_cache, pc); |
387 | } | 387 | } |
388 | 388 | ||
389 | static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu) | 389 | static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu) |
390 | { | 390 | { |
391 | return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache, | 391 | return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache, |
392 | sizeof(struct kvm_rmap_desc)); | 392 | sizeof(struct kvm_rmap_desc)); |
393 | } | 393 | } |
394 | 394 | ||
395 | static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd) | 395 | static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd) |
396 | { | 396 | { |
397 | kmem_cache_free(rmap_desc_cache, rd); | 397 | kmem_cache_free(rmap_desc_cache, rd); |
398 | } | 398 | } |
399 | 399 | ||
400 | static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index) | 400 | static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index) |
401 | { | 401 | { |
402 | if (!sp->role.direct) | 402 | if (!sp->role.direct) |
403 | return sp->gfns[index]; | 403 | return sp->gfns[index]; |
404 | 404 | ||
405 | return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS)); | 405 | return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS)); |
406 | } | 406 | } |
407 | 407 | ||
408 | static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn) | 408 | static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn) |
409 | { | 409 | { |
410 | if (sp->role.direct) | 410 | if (sp->role.direct) |
411 | BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index)); | 411 | BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index)); |
412 | else | 412 | else |
413 | sp->gfns[index] = gfn; | 413 | sp->gfns[index] = gfn; |
414 | } | 414 | } |
415 | 415 | ||
416 | /* | 416 | /* |
417 | * Return the pointer to the largepage write count for a given | 417 | * Return the pointer to the largepage write count for a given |
418 | * gfn, handling slots that are not large page aligned. | 418 | * gfn, handling slots that are not large page aligned. |
419 | */ | 419 | */ |
420 | static int *slot_largepage_idx(gfn_t gfn, | 420 | static int *slot_largepage_idx(gfn_t gfn, |
421 | struct kvm_memory_slot *slot, | 421 | struct kvm_memory_slot *slot, |
422 | int level) | 422 | int level) |
423 | { | 423 | { |
424 | unsigned long idx; | 424 | unsigned long idx; |
425 | 425 | ||
426 | idx = (gfn / KVM_PAGES_PER_HPAGE(level)) - | 426 | idx = (gfn / KVM_PAGES_PER_HPAGE(level)) - |
427 | (slot->base_gfn / KVM_PAGES_PER_HPAGE(level)); | 427 | (slot->base_gfn / KVM_PAGES_PER_HPAGE(level)); |
428 | return &slot->lpage_info[level - 2][idx].write_count; | 428 | return &slot->lpage_info[level - 2][idx].write_count; |
429 | } | 429 | } |
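
Reviewer note: the comment above slot_largepage_idx() deserves a worked example. Rounding both gfn and base_gfn down to hugepage frames before subtracting keeps the index correct even when the slot does not start on a hugepage boundary. A standalone sketch with made-up numbers (512 pages per 2MB hugepage, as on x86):

    /* Hedged sketch: unaligned-slot index arithmetic, made-up numbers. */
    #include <stdio.h>

    #define PAGES_PER_HPAGE 512UL   /* 2MB / 4KB, as for x86 level 2 */

    int main(void)
    {
            unsigned long base_gfn = 0x12345;       /* deliberately unaligned */
            unsigned long gfn = 0x12400;
            unsigned long idx = (gfn / PAGES_PER_HPAGE) -
                                (base_gfn / PAGES_PER_HPAGE);

            /* hpage frame of gfn is 0x92, of base_gfn is 0x91: idx == 1 */
            printf("idx = %lu\n", idx);
            return 0;
    }
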
430 | 430 | ||
431 | static void account_shadowed(struct kvm *kvm, gfn_t gfn) | 431 | static void account_shadowed(struct kvm *kvm, gfn_t gfn) |
432 | { | 432 | { |
433 | struct kvm_memory_slot *slot; | 433 | struct kvm_memory_slot *slot; |
434 | int *write_count; | 434 | int *write_count; |
435 | int i; | 435 | int i; |
436 | 436 | ||
437 | gfn = unalias_gfn(kvm, gfn); | 437 | slot = gfn_to_memslot(kvm, gfn); |
438 | |||
439 | slot = gfn_to_memslot_unaliased(kvm, gfn); | ||
440 | for (i = PT_DIRECTORY_LEVEL; | 438 | for (i = PT_DIRECTORY_LEVEL; |
441 | i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { | 439 | i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { |
442 | write_count = slot_largepage_idx(gfn, slot, i); | 440 | write_count = slot_largepage_idx(gfn, slot, i); |
443 | *write_count += 1; | 441 | *write_count += 1; |
444 | } | 442 | } |
445 | } | 443 | } |
446 | 444 | ||
447 | static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn) | 445 | static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn) |
448 | { | 446 | { |
449 | struct kvm_memory_slot *slot; | 447 | struct kvm_memory_slot *slot; |
450 | int *write_count; | 448 | int *write_count; |
451 | int i; | 449 | int i; |
452 | 450 | ||
453 | gfn = unalias_gfn(kvm, gfn); | 451 | slot = gfn_to_memslot(kvm, gfn); |
454 | slot = gfn_to_memslot_unaliased(kvm, gfn); | ||
455 | for (i = PT_DIRECTORY_LEVEL; | 452 | for (i = PT_DIRECTORY_LEVEL; |
456 | i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { | 453 | i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { |
457 | write_count = slot_largepage_idx(gfn, slot, i); | 454 | write_count = slot_largepage_idx(gfn, slot, i); |
458 | *write_count -= 1; | 455 | *write_count -= 1; |
459 | WARN_ON(*write_count < 0); | 456 | WARN_ON(*write_count < 0); |
460 | } | 457 | } |
461 | } | 458 | } |
462 | 459 | ||
463 | static int has_wrprotected_page(struct kvm *kvm, | 460 | static int has_wrprotected_page(struct kvm *kvm, |
464 | gfn_t gfn, | 461 | gfn_t gfn, |
465 | int level) | 462 | int level) |
466 | { | 463 | { |
467 | struct kvm_memory_slot *slot; | 464 | struct kvm_memory_slot *slot; |
468 | int *largepage_idx; | 465 | int *largepage_idx; |
469 | 466 | ||
470 | gfn = unalias_gfn(kvm, gfn); | 467 | slot = gfn_to_memslot(kvm, gfn); |
471 | slot = gfn_to_memslot_unaliased(kvm, gfn); | ||
472 | if (slot) { | 468 | if (slot) { |
473 | largepage_idx = slot_largepage_idx(gfn, slot, level); | 469 | largepage_idx = slot_largepage_idx(gfn, slot, level); |
474 | return *largepage_idx; | 470 | return *largepage_idx; |
475 | } | 471 | } |
476 | 472 | ||
477 | return 1; | 473 | return 1; |
478 | } | 474 | } |
479 | 475 | ||
480 | static int host_mapping_level(struct kvm *kvm, gfn_t gfn) | 476 | static int host_mapping_level(struct kvm *kvm, gfn_t gfn) |
481 | { | 477 | { |
482 | unsigned long page_size; | 478 | unsigned long page_size; |
483 | int i, ret = 0; | 479 | int i, ret = 0; |
484 | 480 | ||
485 | page_size = kvm_host_page_size(kvm, gfn); | 481 | page_size = kvm_host_page_size(kvm, gfn); |
486 | 482 | ||
487 | for (i = PT_PAGE_TABLE_LEVEL; | 483 | for (i = PT_PAGE_TABLE_LEVEL; |
488 | i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) { | 484 | i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) { |
489 | if (page_size >= KVM_HPAGE_SIZE(i)) | 485 | if (page_size >= KVM_HPAGE_SIZE(i)) |
490 | ret = i; | 486 | ret = i; |
491 | else | 487 | else |
492 | break; | 488 | break; |
493 | } | 489 | } |
494 | 490 | ||
495 | return ret; | 491 | return ret; |
496 | } | 492 | } |
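
Reviewer note: host_mapping_level() walks the page-size levels from smallest to largest and keeps the last one that still fits inside the host mapping. A standalone sketch of that walk; the sizes are illustrative assumptions of the sketch, not the kernel's KVM_HPAGE_SIZE values:

    /* Hedged sketch of the level walk; sizes are illustrative only. */
    #include <stdio.h>

    #define LEVELS 3

    static const unsigned long hpage_size[LEVELS + 1] = {
            0, 4096UL, 2UL << 20, 1UL << 30 /* levels 1..3: 4K, 2M, 1G */
    };

    static int host_mapping_level(unsigned long page_size)
    {
            int i, ret = 0;

            for (i = 1; i <= LEVELS; ++i) {
                    if (page_size >= hpage_size[i])
                            ret = i;        /* largest level that still fits */
                    else
                            break;
            }
            return ret;
    }

    int main(void)
    {
            /* a 2MB host mapping supports 4K and 2M, not 1G: expect 2 */
            printf("level = %d\n", host_mapping_level(2UL << 20));
            return 0;
    }
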
497 | 493 | ||
498 | static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn) | 494 | static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn) |
499 | { | 495 | { |
500 | struct kvm_memory_slot *slot; | 496 | struct kvm_memory_slot *slot; |
501 | int host_level, level, max_level; | 497 | int host_level, level, max_level; |
502 | 498 | ||
503 | slot = gfn_to_memslot(vcpu->kvm, large_gfn); | 499 | slot = gfn_to_memslot(vcpu->kvm, large_gfn); |
504 | if (slot && slot->dirty_bitmap) | 500 | if (slot && slot->dirty_bitmap) |
505 | return PT_PAGE_TABLE_LEVEL; | 501 | return PT_PAGE_TABLE_LEVEL; |
506 | 502 | ||
507 | host_level = host_mapping_level(vcpu->kvm, large_gfn); | 503 | host_level = host_mapping_level(vcpu->kvm, large_gfn); |
508 | 504 | ||
509 | if (host_level == PT_PAGE_TABLE_LEVEL) | 505 | if (host_level == PT_PAGE_TABLE_LEVEL) |
510 | return host_level; | 506 | return host_level; |
511 | 507 | ||
512 | max_level = kvm_x86_ops->get_lpage_level() < host_level ? | 508 | max_level = kvm_x86_ops->get_lpage_level() < host_level ? |
513 | kvm_x86_ops->get_lpage_level() : host_level; | 509 | kvm_x86_ops->get_lpage_level() : host_level; |
514 | 510 | ||
515 | for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level) | 511 | for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level) |
516 | if (has_wrprotected_page(vcpu->kvm, large_gfn, level)) | 512 | if (has_wrprotected_page(vcpu->kvm, large_gfn, level)) |
517 | break; | 513 | break; |
518 | 514 | ||
519 | return level - 1; | 515 | return level - 1; |
520 | } | 516 | } |
521 | 517 | ||
522 | /* | 518 | /* |
523 | * Take gfn and return the reverse mapping to it. | 519 | * Take gfn and return the reverse mapping to it. |
524 | * Note: gfn must be unaliased before this function gets called | ||
525 | */ | 520 | */ |
526 | 521 | ||
527 | static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level) | 522 | static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level) |
528 | { | 523 | { |
529 | struct kvm_memory_slot *slot; | 524 | struct kvm_memory_slot *slot; |
530 | unsigned long idx; | 525 | unsigned long idx; |
531 | 526 | ||
532 | slot = gfn_to_memslot(kvm, gfn); | 527 | slot = gfn_to_memslot(kvm, gfn); |
533 | if (likely(level == PT_PAGE_TABLE_LEVEL)) | 528 | if (likely(level == PT_PAGE_TABLE_LEVEL)) |
534 | return &slot->rmap[gfn - slot->base_gfn]; | 529 | return &slot->rmap[gfn - slot->base_gfn]; |
535 | 530 | ||
536 | idx = (gfn / KVM_PAGES_PER_HPAGE(level)) - | 531 | idx = (gfn / KVM_PAGES_PER_HPAGE(level)) - |
537 | (slot->base_gfn / KVM_PAGES_PER_HPAGE(level)); | 532 | (slot->base_gfn / KVM_PAGES_PER_HPAGE(level)); |
538 | 533 | ||
539 | return &slot->lpage_info[level - 2][idx].rmap_pde; | 534 | return &slot->lpage_info[level - 2][idx].rmap_pde; |
540 | } | 535 | } |
541 | 536 | ||
542 | /* | 537 | /* |
543 | * Reverse mapping data structures: | 538 | * Reverse mapping data structures: |
544 | * | 539 | * |
545 | * If rmapp bit zero is zero, then rmapp points to the shadow page table entry | 540 | * If rmapp bit zero is zero, then rmapp points to the shadow page table entry |
546 | * that points to page_address(page). | 541 | * that points to page_address(page). |
547 | * | 542 | * |
548 | * If rmapp bit zero is one, then (rmapp & ~1) points to a struct kvm_rmap_desc | 543 | * If rmapp bit zero is one, then (rmapp & ~1) points to a struct kvm_rmap_desc |
549 | * containing more mappings. | 544 | * containing more mappings. |
550 | * | 545 | * |
551 | * Returns the number of rmap entries before the spte was added or zero if | 546 | * Returns the number of rmap entries before the spte was added or zero if |
552 | * the spte was not added. | 547 | * the spte was not added. |
553 | * | 548 | * |
554 | */ | 549 | */ |
555 | static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) | 550 | static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) |
556 | { | 551 | { |
557 | struct kvm_mmu_page *sp; | 552 | struct kvm_mmu_page *sp; |
558 | struct kvm_rmap_desc *desc; | 553 | struct kvm_rmap_desc *desc; |
559 | unsigned long *rmapp; | 554 | unsigned long *rmapp; |
560 | int i, count = 0; | 555 | int i, count = 0; |
561 | 556 | ||
562 | if (!is_rmap_spte(*spte)) | 557 | if (!is_rmap_spte(*spte)) |
563 | return count; | 558 | return count; |
564 | gfn = unalias_gfn(vcpu->kvm, gfn); | ||
565 | sp = page_header(__pa(spte)); | 559 | sp = page_header(__pa(spte)); |
566 | kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn); | 560 | kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn); |
567 | rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level); | 561 | rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level); |
568 | if (!*rmapp) { | 562 | if (!*rmapp) { |
569 | rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte); | 563 | rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte); |
570 | *rmapp = (unsigned long)spte; | 564 | *rmapp = (unsigned long)spte; |
571 | } else if (!(*rmapp & 1)) { | 565 | } else if (!(*rmapp & 1)) { |
572 | rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte); | 566 | rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte); |
573 | desc = mmu_alloc_rmap_desc(vcpu); | 567 | desc = mmu_alloc_rmap_desc(vcpu); |
574 | desc->sptes[0] = (u64 *)*rmapp; | 568 | desc->sptes[0] = (u64 *)*rmapp; |
575 | desc->sptes[1] = spte; | 569 | desc->sptes[1] = spte; |
576 | *rmapp = (unsigned long)desc | 1; | 570 | *rmapp = (unsigned long)desc | 1; |
577 | } else { | 571 | } else { |
578 | rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte); | 572 | rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte); |
579 | desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); | 573 | desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); |
580 | while (desc->sptes[RMAP_EXT-1] && desc->more) { | 574 | while (desc->sptes[RMAP_EXT-1] && desc->more) { |
581 | desc = desc->more; | 575 | desc = desc->more; |
582 | count += RMAP_EXT; | 576 | count += RMAP_EXT; |
583 | } | 577 | } |
584 | if (desc->sptes[RMAP_EXT-1]) { | 578 | if (desc->sptes[RMAP_EXT-1]) { |
585 | desc->more = mmu_alloc_rmap_desc(vcpu); | 579 | desc->more = mmu_alloc_rmap_desc(vcpu); |
586 | desc = desc->more; | 580 | desc = desc->more; |
587 | } | 581 | } |
588 | for (i = 0; desc->sptes[i]; ++i) | 582 | for (i = 0; desc->sptes[i]; ++i) |
589 | ; | 583 | ; |
590 | desc->sptes[i] = spte; | 584 | desc->sptes[i] = spte; |
591 | } | 585 | } |
592 | return count; | 586 | return count; |
593 | } | 587 | } |
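
Reviewer note: the encoding described in the comment before rmap_add() is a classic tagged pointer. Bit 0 of *rmapp distinguishes a single spte pointer (bit clear) from a pointer to a kvm_rmap_desc (bit set), relying on pointer alignment to keep bit 0 free. A minimal standalone sketch of that encoding; the desc chaining via ->more is omitted and all names are local to the sketch:

    /* Hedged sketch of the rmap tagged-word encoding; not kernel code. */
    #include <stdio.h>
    #include <stdint.h>

    struct desc {
            uint64_t *sptes[4];     /* RMAP_EXT == 4 in the patch */
    };

    int main(void)
    {
            uint64_t spte = 0;
            struct desc d = { { &spte, 0, 0, 0 } };
            unsigned long rmapp;

            rmapp = (unsigned long)&spte;           /* one mapping: bit 0 clear */
            printf("single mapping? %s\n", (rmapp & 1) ? "no" : "yes");

            rmapp = (unsigned long)&d | 1;          /* many mappings: bit 0 set */
            if (rmapp & 1) {
                    struct desc *dp = (struct desc *)(rmapp & ~1ul);
                    printf("first spte in desc: %p\n", (void *)dp->sptes[0]);
            }
            return 0;
    }
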
594 | 588 | ||
595 | static void rmap_desc_remove_entry(unsigned long *rmapp, | 589 | static void rmap_desc_remove_entry(unsigned long *rmapp, |
596 | struct kvm_rmap_desc *desc, | 590 | struct kvm_rmap_desc *desc, |
597 | int i, | 591 | int i, |
598 | struct kvm_rmap_desc *prev_desc) | 592 | struct kvm_rmap_desc *prev_desc) |
599 | { | 593 | { |
600 | int j; | 594 | int j; |
601 | 595 | ||
602 | for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j) | 596 | for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j) |
603 | ; | 597 | ; |
604 | desc->sptes[i] = desc->sptes[j]; | 598 | desc->sptes[i] = desc->sptes[j]; |
605 | desc->sptes[j] = NULL; | 599 | desc->sptes[j] = NULL; |
606 | if (j != 0) | 600 | if (j != 0) |
607 | return; | 601 | return; |
608 | if (!prev_desc && !desc->more) | 602 | if (!prev_desc && !desc->more) |
609 | *rmapp = (unsigned long)desc->sptes[0]; | 603 | *rmapp = (unsigned long)desc->sptes[0]; |
610 | else | 604 | else |
611 | if (prev_desc) | 605 | if (prev_desc) |
612 | prev_desc->more = desc->more; | 606 | prev_desc->more = desc->more; |
613 | else | 607 | else |
614 | *rmapp = (unsigned long)desc->more | 1; | 608 | *rmapp = (unsigned long)desc->more | 1; |
615 | mmu_free_rmap_desc(desc); | 609 | mmu_free_rmap_desc(desc); |
616 | } | 610 | } |
617 | 611 | ||
618 | static void rmap_remove(struct kvm *kvm, u64 *spte) | 612 | static void rmap_remove(struct kvm *kvm, u64 *spte) |
619 | { | 613 | { |
620 | struct kvm_rmap_desc *desc; | 614 | struct kvm_rmap_desc *desc; |
621 | struct kvm_rmap_desc *prev_desc; | 615 | struct kvm_rmap_desc *prev_desc; |
622 | struct kvm_mmu_page *sp; | 616 | struct kvm_mmu_page *sp; |
623 | pfn_t pfn; | 617 | pfn_t pfn; |
624 | gfn_t gfn; | 618 | gfn_t gfn; |
625 | unsigned long *rmapp; | 619 | unsigned long *rmapp; |
626 | int i; | 620 | int i; |
627 | 621 | ||
628 | if (!is_rmap_spte(*spte)) | 622 | if (!is_rmap_spte(*spte)) |
629 | return; | 623 | return; |
630 | sp = page_header(__pa(spte)); | 624 | sp = page_header(__pa(spte)); |
631 | pfn = spte_to_pfn(*spte); | 625 | pfn = spte_to_pfn(*spte); |
632 | if (*spte & shadow_accessed_mask) | 626 | if (*spte & shadow_accessed_mask) |
633 | kvm_set_pfn_accessed(pfn); | 627 | kvm_set_pfn_accessed(pfn); |
634 | if (is_writable_pte(*spte)) | 628 | if (is_writable_pte(*spte)) |
635 | kvm_set_pfn_dirty(pfn); | 629 | kvm_set_pfn_dirty(pfn); |
636 | gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt); | 630 | gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt); |
637 | rmapp = gfn_to_rmap(kvm, gfn, sp->role.level); | 631 | rmapp = gfn_to_rmap(kvm, gfn, sp->role.level); |
638 | if (!*rmapp) { | 632 | if (!*rmapp) { |
639 | printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte); | 633 | printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte); |
640 | BUG(); | 634 | BUG(); |
641 | } else if (!(*rmapp & 1)) { | 635 | } else if (!(*rmapp & 1)) { |
642 | rmap_printk("rmap_remove: %p %llx 1->0\n", spte, *spte); | 636 | rmap_printk("rmap_remove: %p %llx 1->0\n", spte, *spte); |
643 | if ((u64 *)*rmapp != spte) { | 637 | if ((u64 *)*rmapp != spte) { |
644 | printk(KERN_ERR "rmap_remove: %p %llx 1->BUG\n", | 638 | printk(KERN_ERR "rmap_remove: %p %llx 1->BUG\n", |
645 | spte, *spte); | 639 | spte, *spte); |
646 | BUG(); | 640 | BUG(); |
647 | } | 641 | } |
648 | *rmapp = 0; | 642 | *rmapp = 0; |
649 | } else { | 643 | } else { |
650 | rmap_printk("rmap_remove: %p %llx many->many\n", spte, *spte); | 644 | rmap_printk("rmap_remove: %p %llx many->many\n", spte, *spte); |
651 | desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); | 645 | desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); |
652 | prev_desc = NULL; | 646 | prev_desc = NULL; |
653 | while (desc) { | 647 | while (desc) { |
654 | for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) | 648 | for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) |
655 | if (desc->sptes[i] == spte) { | 649 | if (desc->sptes[i] == spte) { |
656 | rmap_desc_remove_entry(rmapp, | 650 | rmap_desc_remove_entry(rmapp, |
657 | desc, i, | 651 | desc, i, |
658 | prev_desc); | 652 | prev_desc); |
659 | return; | 653 | return; |
660 | } | 654 | } |
661 | prev_desc = desc; | 655 | prev_desc = desc; |
662 | desc = desc->more; | 656 | desc = desc->more; |
663 | } | 657 | } |
664 | pr_err("rmap_remove: %p %llx many->many\n", spte, *spte); | 658 | pr_err("rmap_remove: %p %llx many->many\n", spte, *spte); |
665 | BUG(); | 659 | BUG(); |
666 | } | 660 | } |
667 | } | 661 | } |
668 | 662 | ||
669 | static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) | 663 | static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) |
670 | { | 664 | { |
671 | struct kvm_rmap_desc *desc; | 665 | struct kvm_rmap_desc *desc; |
672 | u64 *prev_spte; | 666 | u64 *prev_spte; |
673 | int i; | 667 | int i; |
674 | 668 | ||
675 | if (!*rmapp) | 669 | if (!*rmapp) |
676 | return NULL; | 670 | return NULL; |
677 | else if (!(*rmapp & 1)) { | 671 | else if (!(*rmapp & 1)) { |
678 | if (!spte) | 672 | if (!spte) |
679 | return (u64 *)*rmapp; | 673 | return (u64 *)*rmapp; |
680 | return NULL; | 674 | return NULL; |
681 | } | 675 | } |
682 | desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); | 676 | desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul); |
683 | prev_spte = NULL; | 677 | prev_spte = NULL; |
684 | while (desc) { | 678 | while (desc) { |
685 | for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) { | 679 | for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) { |
686 | if (prev_spte == spte) | 680 | if (prev_spte == spte) |
687 | return desc->sptes[i]; | 681 | return desc->sptes[i]; |
688 | prev_spte = desc->sptes[i]; | 682 | prev_spte = desc->sptes[i]; |
689 | } | 683 | } |
690 | desc = desc->more; | 684 | desc = desc->more; |
691 | } | 685 | } |
692 | return NULL; | 686 | return NULL; |
693 | } | 687 | } |
694 | 688 | ||
695 | static int rmap_write_protect(struct kvm *kvm, u64 gfn) | 689 | static int rmap_write_protect(struct kvm *kvm, u64 gfn) |
696 | { | 690 | { |
697 | unsigned long *rmapp; | 691 | unsigned long *rmapp; |
698 | u64 *spte; | 692 | u64 *spte; |
699 | int i, write_protected = 0; | 693 | int i, write_protected = 0; |
700 | 694 | ||
701 | gfn = unalias_gfn(kvm, gfn); | ||
702 | rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL); | 695 | rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL); |
703 | 696 | ||
704 | spte = rmap_next(kvm, rmapp, NULL); | 697 | spte = rmap_next(kvm, rmapp, NULL); |
705 | while (spte) { | 698 | while (spte) { |
706 | BUG_ON(!spte); | 699 | BUG_ON(!spte); |
707 | BUG_ON(!(*spte & PT_PRESENT_MASK)); | 700 | BUG_ON(!(*spte & PT_PRESENT_MASK)); |
708 | rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte); | 701 | rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte); |
709 | if (is_writable_pte(*spte)) { | 702 | if (is_writable_pte(*spte)) { |
710 | __set_spte(spte, *spte & ~PT_WRITABLE_MASK); | 703 | __set_spte(spte, *spte & ~PT_WRITABLE_MASK); |
711 | write_protected = 1; | 704 | write_protected = 1; |
712 | } | 705 | } |
713 | spte = rmap_next(kvm, rmapp, spte); | 706 | spte = rmap_next(kvm, rmapp, spte); |
714 | } | 707 | } |
715 | if (write_protected) { | 708 | if (write_protected) { |
716 | pfn_t pfn; | 709 | pfn_t pfn; |
717 | 710 | ||
718 | spte = rmap_next(kvm, rmapp, NULL); | 711 | spte = rmap_next(kvm, rmapp, NULL); |
719 | pfn = spte_to_pfn(*spte); | 712 | pfn = spte_to_pfn(*spte); |
720 | kvm_set_pfn_dirty(pfn); | 713 | kvm_set_pfn_dirty(pfn); |
721 | } | 714 | } |
722 | 715 | ||
723 | /* check for huge page mappings */ | 716 | /* check for huge page mappings */ |
724 | for (i = PT_DIRECTORY_LEVEL; | 717 | for (i = PT_DIRECTORY_LEVEL; |
725 | i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { | 718 | i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { |
726 | rmapp = gfn_to_rmap(kvm, gfn, i); | 719 | rmapp = gfn_to_rmap(kvm, gfn, i); |
727 | spte = rmap_next(kvm, rmapp, NULL); | 720 | spte = rmap_next(kvm, rmapp, NULL); |
728 | while (spte) { | 721 | while (spte) { |
729 | BUG_ON(!spte); | 722 | BUG_ON(!spte); |
730 | BUG_ON(!(*spte & PT_PRESENT_MASK)); | 723 | BUG_ON(!(*spte & PT_PRESENT_MASK)); |
731 | BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)); | 724 | BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)); |
732 | pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn); | 725 | pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn); |
733 | if (is_writable_pte(*spte)) { | 726 | if (is_writable_pte(*spte)) { |
734 | rmap_remove(kvm, spte); | 727 | rmap_remove(kvm, spte); |
735 | --kvm->stat.lpages; | 728 | --kvm->stat.lpages; |
736 | __set_spte(spte, shadow_trap_nonpresent_pte); | 729 | __set_spte(spte, shadow_trap_nonpresent_pte); |
737 | spte = NULL; | 730 | spte = NULL; |
738 | write_protected = 1; | 731 | write_protected = 1; |
739 | } | 732 | } |
740 | spte = rmap_next(kvm, rmapp, spte); | 733 | spte = rmap_next(kvm, rmapp, spte); |
741 | } | 734 | } |
742 | } | 735 | } |
743 | 736 | ||
744 | return write_protected; | 737 | return write_protected; |
745 | } | 738 | } |
746 | 739 | ||
747 | static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, | 740 | static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, |
748 | unsigned long data) | 741 | unsigned long data) |
749 | { | 742 | { |
750 | u64 *spte; | 743 | u64 *spte; |
751 | int need_tlb_flush = 0; | 744 | int need_tlb_flush = 0; |
752 | 745 | ||
753 | while ((spte = rmap_next(kvm, rmapp, NULL))) { | 746 | while ((spte = rmap_next(kvm, rmapp, NULL))) { |
754 | BUG_ON(!(*spte & PT_PRESENT_MASK)); | 747 | BUG_ON(!(*spte & PT_PRESENT_MASK)); |
755 | rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); | 748 | rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); |
756 | rmap_remove(kvm, spte); | 749 | rmap_remove(kvm, spte); |
757 | __set_spte(spte, shadow_trap_nonpresent_pte); | 750 | __set_spte(spte, shadow_trap_nonpresent_pte); |
758 | need_tlb_flush = 1; | 751 | need_tlb_flush = 1; |
759 | } | 752 | } |
760 | return need_tlb_flush; | 753 | return need_tlb_flush; |
761 | } | 754 | } |
762 | 755 | ||
763 | static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, | 756 | static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, |
764 | unsigned long data) | 757 | unsigned long data) |
765 | { | 758 | { |
766 | int need_flush = 0; | 759 | int need_flush = 0; |
767 | u64 *spte, new_spte; | 760 | u64 *spte, new_spte; |
768 | pte_t *ptep = (pte_t *)data; | 761 | pte_t *ptep = (pte_t *)data; |
769 | pfn_t new_pfn; | 762 | pfn_t new_pfn; |
770 | 763 | ||
771 | WARN_ON(pte_huge(*ptep)); | 764 | WARN_ON(pte_huge(*ptep)); |
772 | new_pfn = pte_pfn(*ptep); | 765 | new_pfn = pte_pfn(*ptep); |
773 | spte = rmap_next(kvm, rmapp, NULL); | 766 | spte = rmap_next(kvm, rmapp, NULL); |
774 | while (spte) { | 767 | while (spte) { |
775 | BUG_ON(!is_shadow_present_pte(*spte)); | 768 | BUG_ON(!is_shadow_present_pte(*spte)); |
776 | rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte); | 769 | rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte); |
777 | need_flush = 1; | 770 | need_flush = 1; |
778 | if (pte_write(*ptep)) { | 771 | if (pte_write(*ptep)) { |
779 | rmap_remove(kvm, spte); | 772 | rmap_remove(kvm, spte); |
780 | __set_spte(spte, shadow_trap_nonpresent_pte); | 773 | __set_spte(spte, shadow_trap_nonpresent_pte); |
781 | spte = rmap_next(kvm, rmapp, NULL); | 774 | spte = rmap_next(kvm, rmapp, NULL); |
782 | } else { | 775 | } else { |
783 | new_spte = *spte &~ (PT64_BASE_ADDR_MASK); | 776 | new_spte = *spte &~ (PT64_BASE_ADDR_MASK); |
784 | new_spte |= (u64)new_pfn << PAGE_SHIFT; | 777 | new_spte |= (u64)new_pfn << PAGE_SHIFT; |
785 | 778 | ||
786 | new_spte &= ~PT_WRITABLE_MASK; | 779 | new_spte &= ~PT_WRITABLE_MASK; |
787 | new_spte &= ~SPTE_HOST_WRITEABLE; | 780 | new_spte &= ~SPTE_HOST_WRITEABLE; |
788 | if (is_writable_pte(*spte)) | 781 | if (is_writable_pte(*spte)) |
789 | kvm_set_pfn_dirty(spte_to_pfn(*spte)); | 782 | kvm_set_pfn_dirty(spte_to_pfn(*spte)); |
790 | __set_spte(spte, new_spte); | 783 | __set_spte(spte, new_spte); |
791 | spte = rmap_next(kvm, rmapp, spte); | 784 | spte = rmap_next(kvm, rmapp, spte); |
792 | } | 785 | } |
793 | } | 786 | } |
794 | if (need_flush) | 787 | if (need_flush) |
795 | kvm_flush_remote_tlbs(kvm); | 788 | kvm_flush_remote_tlbs(kvm); |
796 | 789 | ||
797 | return 0; | 790 | return 0; |
798 | } | 791 | } |
799 | 792 | ||
800 | static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, | 793 | static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, |
801 | unsigned long data, | 794 | unsigned long data, |
802 | int (*handler)(struct kvm *kvm, unsigned long *rmapp, | 795 | int (*handler)(struct kvm *kvm, unsigned long *rmapp, |
803 | unsigned long data)) | 796 | unsigned long data)) |
804 | { | 797 | { |
805 | int i, j; | 798 | int i, j; |
806 | int ret; | 799 | int ret; |
807 | int retval = 0; | 800 | int retval = 0; |
808 | struct kvm_memslots *slots; | 801 | struct kvm_memslots *slots; |
809 | 802 | ||
810 | slots = kvm_memslots(kvm); | 803 | slots = kvm_memslots(kvm); |
811 | 804 | ||
812 | for (i = 0; i < slots->nmemslots; i++) { | 805 | for (i = 0; i < slots->nmemslots; i++) { |
813 | struct kvm_memory_slot *memslot = &slots->memslots[i]; | 806 | struct kvm_memory_slot *memslot = &slots->memslots[i]; |
814 | unsigned long start = memslot->userspace_addr; | 807 | unsigned long start = memslot->userspace_addr; |
815 | unsigned long end; | 808 | unsigned long end; |
816 | 809 | ||
817 | end = start + (memslot->npages << PAGE_SHIFT); | 810 | end = start + (memslot->npages << PAGE_SHIFT); |
818 | if (hva >= start && hva < end) { | 811 | if (hva >= start && hva < end) { |
819 | gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; | 812 | gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; |
820 | 813 | ||
821 | ret = handler(kvm, &memslot->rmap[gfn_offset], data); | 814 | ret = handler(kvm, &memslot->rmap[gfn_offset], data); |
822 | 815 | ||
823 | for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) { | 816 | for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) { |
824 | int idx = gfn_offset; | 817 | int idx = gfn_offset; |
825 | idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j); | 818 | idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j); |
826 | ret |= handler(kvm, | 819 | ret |= handler(kvm, |
827 | &memslot->lpage_info[j][idx].rmap_pde, | 820 | &memslot->lpage_info[j][idx].rmap_pde, |
828 | data); | 821 | data); |
829 | } | 822 | } |
830 | trace_kvm_age_page(hva, memslot, ret); | 823 | trace_kvm_age_page(hva, memslot, ret); |
831 | retval |= ret; | 824 | retval |= ret; |
832 | } | 825 | } |
833 | } | 826 | } |
834 | 827 | ||
835 | return retval; | 828 | return retval; |
836 | } | 829 | } |
837 | 830 | ||
838 | int kvm_unmap_hva(struct kvm *kvm, unsigned long hva) | 831 | int kvm_unmap_hva(struct kvm *kvm, unsigned long hva) |
839 | { | 832 | { |
840 | return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp); | 833 | return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp); |
841 | } | 834 | } |
842 | 835 | ||
843 | void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) | 836 | void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) |
844 | { | 837 | { |
845 | kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp); | 838 | kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp); |
846 | } | 839 | } |
847 | 840 | ||
848 | static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, | 841 | static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, |
849 | unsigned long data) | 842 | unsigned long data) |
850 | { | 843 | { |
851 | u64 *spte; | 844 | u64 *spte; |
852 | int young = 0; | 845 | int young = 0; |
853 | 846 | ||
854 | /* | 847 | /* |
855 | * Emulate the accessed bit for EPT by checking if this page has | 848 | * Emulate the accessed bit for EPT by checking if this page has |
856 | * an EPT mapping, and clearing it if it does. On the next access, | 849 | * an EPT mapping, and clearing it if it does. On the next access, |
857 | * a new EPT mapping will be established. | 850 | * a new EPT mapping will be established. |
858 | * This has some overhead, but not as much as the cost of swapping | 851 | * This has some overhead, but not as much as the cost of swapping |
859 | * out actively used pages or breaking up actively used hugepages. | 852 | * out actively used pages or breaking up actively used hugepages. |
860 | */ | 853 | */ |
861 | if (!shadow_accessed_mask) | 854 | if (!shadow_accessed_mask) |
862 | return kvm_unmap_rmapp(kvm, rmapp, data); | 855 | return kvm_unmap_rmapp(kvm, rmapp, data); |
863 | 856 | ||
864 | spte = rmap_next(kvm, rmapp, NULL); | 857 | spte = rmap_next(kvm, rmapp, NULL); |
865 | while (spte) { | 858 | while (spte) { |
866 | int _young; | 859 | int _young; |
867 | u64 _spte = *spte; | 860 | u64 _spte = *spte; |
868 | BUG_ON(!(_spte & PT_PRESENT_MASK)); | 861 | BUG_ON(!(_spte & PT_PRESENT_MASK)); |
869 | _young = _spte & PT_ACCESSED_MASK; | 862 | _young = _spte & PT_ACCESSED_MASK; |
870 | if (_young) { | 863 | if (_young) { |
871 | young = 1; | 864 | young = 1; |
872 | clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte); | 865 | clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte); |
873 | } | 866 | } |
874 | spte = rmap_next(kvm, rmapp, spte); | 867 | spte = rmap_next(kvm, rmapp, spte); |
875 | } | 868 | } |
876 | return young; | 869 | return young; |
877 | } | 870 | } |
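
Reviewer note: the decision in kvm_age_rmapp() is: with a hardware accessed bit, age a page by test-and-clearing it; without one (EPT before A/D-bit support), the only available signal is to drop the mapping and let the next access re-fault, as the comment above explains. A standalone sketch of that decision; the ACCESSED mask value is an arbitrary stand-in:

    /* Hedged sketch of the aging decision; ACCESSED is an arbitrary bit. */
    #include <stdio.h>
    #include <stdint.h>

    #define ACCESSED (1ULL << 5)

    static int age_spte(uint64_t *spte, uint64_t accessed_mask)
    {
            if (!accessed_mask)
                    return 1;       /* stands in for kvm_unmap_rmapp(): unmap */
            if (*spte & accessed_mask) {
                    *spte &= ~accessed_mask;        /* test-and-clear */
                    return 1;
            }
            return 0;
    }

    int main(void)
    {
            uint64_t spte = ACCESSED | 0x1000;
            int first = age_spte(&spte, ACCESSED);  /* 1: accessed, now cleared */
            int second = age_spte(&spte, ACCESSED); /* 0: not touched since */

            printf("young: %d then %d\n", first, second);
            return 0;
    }
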
878 | 871 | ||
879 | #define RMAP_RECYCLE_THRESHOLD 1000 | 872 | #define RMAP_RECYCLE_THRESHOLD 1000 |
880 | 873 | ||
881 | static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) | 874 | static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) |
882 | { | 875 | { |
883 | unsigned long *rmapp; | 876 | unsigned long *rmapp; |
884 | struct kvm_mmu_page *sp; | 877 | struct kvm_mmu_page *sp; |
885 | 878 | ||
886 | sp = page_header(__pa(spte)); | 879 | sp = page_header(__pa(spte)); |
887 | 880 | ||
888 | gfn = unalias_gfn(vcpu->kvm, gfn); | ||
889 | rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level); | 881 | rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level); |
890 | 882 | ||
891 | kvm_unmap_rmapp(vcpu->kvm, rmapp, 0); | 883 | kvm_unmap_rmapp(vcpu->kvm, rmapp, 0); |
892 | kvm_flush_remote_tlbs(vcpu->kvm); | 884 | kvm_flush_remote_tlbs(vcpu->kvm); |
893 | } | 885 | } |
894 | 886 | ||
895 | int kvm_age_hva(struct kvm *kvm, unsigned long hva) | 887 | int kvm_age_hva(struct kvm *kvm, unsigned long hva) |
896 | { | 888 | { |
897 | return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp); | 889 | return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp); |
898 | } | 890 | } |
899 | 891 | ||
900 | #ifdef MMU_DEBUG | 892 | #ifdef MMU_DEBUG |
901 | static int is_empty_shadow_page(u64 *spt) | 893 | static int is_empty_shadow_page(u64 *spt) |
902 | { | 894 | { |
903 | u64 *pos; | 895 | u64 *pos; |
904 | u64 *end; | 896 | u64 *end; |
905 | 897 | ||
906 | for (pos = spt, end = pos + PAGE_SIZE / sizeof(u64); pos != end; pos++) | 898 | for (pos = spt, end = pos + PAGE_SIZE / sizeof(u64); pos != end; pos++) |
907 | if (is_shadow_present_pte(*pos)) { | 899 | if (is_shadow_present_pte(*pos)) { |
908 | printk(KERN_ERR "%s: %p %llx\n", __func__, | 900 | printk(KERN_ERR "%s: %p %llx\n", __func__, |
909 | pos, *pos); | 901 | pos, *pos); |
910 | return 0; | 902 | return 0; |
911 | } | 903 | } |
912 | return 1; | 904 | return 1; |
913 | } | 905 | } |
914 | #endif | 906 | #endif |
915 | 907 | ||
916 | static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) | 908 | static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) |
917 | { | 909 | { |
918 | ASSERT(is_empty_shadow_page(sp->spt)); | 910 | ASSERT(is_empty_shadow_page(sp->spt)); |
919 | hlist_del(&sp->hash_link); | 911 | hlist_del(&sp->hash_link); |
920 | list_del(&sp->link); | 912 | list_del(&sp->link); |
921 | __free_page(virt_to_page(sp->spt)); | 913 | __free_page(virt_to_page(sp->spt)); |
922 | if (!sp->role.direct) | 914 | if (!sp->role.direct) |
923 | __free_page(virt_to_page(sp->gfns)); | 915 | __free_page(virt_to_page(sp->gfns)); |
924 | kmem_cache_free(mmu_page_header_cache, sp); | 916 | kmem_cache_free(mmu_page_header_cache, sp); |
925 | ++kvm->arch.n_free_mmu_pages; | 917 | ++kvm->arch.n_free_mmu_pages; |
926 | } | 918 | } |
927 | 919 | ||
928 | static unsigned kvm_page_table_hashfn(gfn_t gfn) | 920 | static unsigned kvm_page_table_hashfn(gfn_t gfn) |
929 | { | 921 | { |
930 | return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1); | 922 | return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1); |
931 | } | 923 | } |
932 | 924 | ||
933 | static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, | 925 | static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, |
934 | u64 *parent_pte, int direct) | 926 | u64 *parent_pte, int direct) |
935 | { | 927 | { |
936 | struct kvm_mmu_page *sp; | 928 | struct kvm_mmu_page *sp; |
937 | 929 | ||
938 | sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp); | 930 | sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp); |
939 | sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE); | 931 | sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE); |
940 | if (!direct) | 932 | if (!direct) |
941 | sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, | 933 | sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, |
942 | PAGE_SIZE); | 934 | PAGE_SIZE); |
943 | set_page_private(virt_to_page(sp->spt), (unsigned long)sp); | 935 | set_page_private(virt_to_page(sp->spt), (unsigned long)sp); |
944 | list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); | 936 | list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); |
945 | bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); | 937 | bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); |
946 | sp->multimapped = 0; | 938 | sp->multimapped = 0; |
947 | sp->parent_pte = parent_pte; | 939 | sp->parent_pte = parent_pte; |
948 | --vcpu->kvm->arch.n_free_mmu_pages; | 940 | --vcpu->kvm->arch.n_free_mmu_pages; |
949 | return sp; | 941 | return sp; |
950 | } | 942 | } |
951 | 943 | ||
952 | static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu, | 944 | static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu, |
953 | struct kvm_mmu_page *sp, u64 *parent_pte) | 945 | struct kvm_mmu_page *sp, u64 *parent_pte) |
954 | { | 946 | { |
955 | struct kvm_pte_chain *pte_chain; | 947 | struct kvm_pte_chain *pte_chain; |
956 | struct hlist_node *node; | 948 | struct hlist_node *node; |
957 | int i; | 949 | int i; |
958 | 950 | ||
959 | if (!parent_pte) | 951 | if (!parent_pte) |
960 | return; | 952 | return; |
961 | if (!sp->multimapped) { | 953 | if (!sp->multimapped) { |
962 | u64 *old = sp->parent_pte; | 954 | u64 *old = sp->parent_pte; |
963 | 955 | ||
964 | if (!old) { | 956 | if (!old) { |
965 | sp->parent_pte = parent_pte; | 957 | sp->parent_pte = parent_pte; |
966 | return; | 958 | return; |
967 | } | 959 | } |
968 | sp->multimapped = 1; | 960 | sp->multimapped = 1; |
969 | pte_chain = mmu_alloc_pte_chain(vcpu); | 961 | pte_chain = mmu_alloc_pte_chain(vcpu); |
970 | INIT_HLIST_HEAD(&sp->parent_ptes); | 962 | INIT_HLIST_HEAD(&sp->parent_ptes); |
971 | hlist_add_head(&pte_chain->link, &sp->parent_ptes); | 963 | hlist_add_head(&pte_chain->link, &sp->parent_ptes); |
972 | pte_chain->parent_ptes[0] = old; | 964 | pte_chain->parent_ptes[0] = old; |
973 | } | 965 | } |
974 | hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) { | 966 | hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) { |
975 | if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1]) | 967 | if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1]) |
976 | continue; | 968 | continue; |
977 | for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) | 969 | for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) |
978 | if (!pte_chain->parent_ptes[i]) { | 970 | if (!pte_chain->parent_ptes[i]) { |
979 | pte_chain->parent_ptes[i] = parent_pte; | 971 | pte_chain->parent_ptes[i] = parent_pte; |
980 | return; | 972 | return; |
981 | } | 973 | } |
982 | } | 974 | } |
983 | pte_chain = mmu_alloc_pte_chain(vcpu); | 975 | pte_chain = mmu_alloc_pte_chain(vcpu); |
984 | BUG_ON(!pte_chain); | 976 | BUG_ON(!pte_chain); |
985 | hlist_add_head(&pte_chain->link, &sp->parent_ptes); | 977 | hlist_add_head(&pte_chain->link, &sp->parent_ptes); |
986 | pte_chain->parent_ptes[0] = parent_pte; | 978 | pte_chain->parent_ptes[0] = parent_pte; |
987 | } | 979 | } |
988 | 980 | ||
989 | static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp, | 981 | static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp, |
990 | u64 *parent_pte) | 982 | u64 *parent_pte) |
991 | { | 983 | { |
992 | struct kvm_pte_chain *pte_chain; | 984 | struct kvm_pte_chain *pte_chain; |
993 | struct hlist_node *node; | 985 | struct hlist_node *node; |
994 | int i; | 986 | int i; |
995 | 987 | ||
996 | if (!sp->multimapped) { | 988 | if (!sp->multimapped) { |
997 | BUG_ON(sp->parent_pte != parent_pte); | 989 | BUG_ON(sp->parent_pte != parent_pte); |
998 | sp->parent_pte = NULL; | 990 | sp->parent_pte = NULL; |
999 | return; | 991 | return; |
1000 | } | 992 | } |
1001 | hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) | 993 | hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) |
1002 | for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) { | 994 | for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) { |
1003 | if (!pte_chain->parent_ptes[i]) | 995 | if (!pte_chain->parent_ptes[i]) |
1004 | break; | 996 | break; |
1005 | if (pte_chain->parent_ptes[i] != parent_pte) | 997 | if (pte_chain->parent_ptes[i] != parent_pte) |
1006 | continue; | 998 | continue; |
1007 | while (i + 1 < NR_PTE_CHAIN_ENTRIES | 999 | while (i + 1 < NR_PTE_CHAIN_ENTRIES |
1008 | && pte_chain->parent_ptes[i + 1]) { | 1000 | && pte_chain->parent_ptes[i + 1]) { |
1009 | pte_chain->parent_ptes[i] | 1001 | pte_chain->parent_ptes[i] |
1010 | = pte_chain->parent_ptes[i + 1]; | 1002 | = pte_chain->parent_ptes[i + 1]; |
1011 | ++i; | 1003 | ++i; |
1012 | } | 1004 | } |
1013 | pte_chain->parent_ptes[i] = NULL; | 1005 | pte_chain->parent_ptes[i] = NULL; |
1014 | if (i == 0) { | 1006 | if (i == 0) { |
1015 | hlist_del(&pte_chain->link); | 1007 | hlist_del(&pte_chain->link); |
1016 | mmu_free_pte_chain(pte_chain); | 1008 | mmu_free_pte_chain(pte_chain); |
1017 | if (hlist_empty(&sp->parent_ptes)) { | 1009 | if (hlist_empty(&sp->parent_ptes)) { |
1018 | sp->multimapped = 0; | 1010 | sp->multimapped = 0; |
1019 | sp->parent_pte = NULL; | 1011 | sp->parent_pte = NULL; |
1020 | } | 1012 | } |
1021 | } | 1013 | } |
1022 | return; | 1014 | return; |
1023 | } | 1015 | } |
1024 | BUG(); | 1016 | BUG(); |
1025 | } | 1017 | } |
1026 | 1018 | ||
1027 | static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn) | 1019 | static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn) |
1028 | { | 1020 | { |
1029 | struct kvm_pte_chain *pte_chain; | 1021 | struct kvm_pte_chain *pte_chain; |
1030 | struct hlist_node *node; | 1022 | struct hlist_node *node; |
1031 | struct kvm_mmu_page *parent_sp; | 1023 | struct kvm_mmu_page *parent_sp; |
1032 | int i; | 1024 | int i; |
1033 | 1025 | ||
1034 | if (!sp->multimapped && sp->parent_pte) { | 1026 | if (!sp->multimapped && sp->parent_pte) { |
1035 | parent_sp = page_header(__pa(sp->parent_pte)); | 1027 | parent_sp = page_header(__pa(sp->parent_pte)); |
1036 | fn(parent_sp, sp->parent_pte); | 1028 | fn(parent_sp, sp->parent_pte); |
1037 | return; | 1029 | return; |
1038 | } | 1030 | } |
1039 | 1031 | ||
1040 | hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) | 1032 | hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) |
1041 | for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) { | 1033 | for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) { |
1042 | u64 *spte = pte_chain->parent_ptes[i]; | 1034 | u64 *spte = pte_chain->parent_ptes[i]; |
1043 | 1035 | ||
1044 | if (!spte) | 1036 | if (!spte) |
1045 | break; | 1037 | break; |
1046 | parent_sp = page_header(__pa(spte)); | 1038 | parent_sp = page_header(__pa(spte)); |
1047 | fn(parent_sp, spte); | 1039 | fn(parent_sp, spte); |
1048 | } | 1040 | } |
1049 | } | 1041 | } |
1050 | 1042 | ||
1051 | static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte); | 1043 | static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte); |
1052 | static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp) | 1044 | static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp) |
1053 | { | 1045 | { |
1054 | mmu_parent_walk(sp, mark_unsync); | 1046 | mmu_parent_walk(sp, mark_unsync); |
1055 | } | 1047 | } |
1056 | 1048 | ||
1057 | static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte) | 1049 | static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte) |
1058 | { | 1050 | { |
1059 | unsigned int index; | 1051 | unsigned int index; |
1060 | 1052 | ||
1061 | index = spte - sp->spt; | 1053 | index = spte - sp->spt; |
1062 | if (__test_and_set_bit(index, sp->unsync_child_bitmap)) | 1054 | if (__test_and_set_bit(index, sp->unsync_child_bitmap)) |
1063 | return; | 1055 | return; |
1064 | if (sp->unsync_children++) | 1056 | if (sp->unsync_children++) |
1065 | return; | 1057 | return; |
1066 | kvm_mmu_mark_parents_unsync(sp); | 1058 | kvm_mmu_mark_parents_unsync(sp); |
1067 | } | 1059 | } |
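mark_unsync() and kvm_mmu_mark_parents_unsync() are mutually recursive, but the recursion is cheap: the bitmap test-and-set deduplicates per slot, and the walk only continues upward on the 0-to-1 transition of unsync_children, so each ancestor is visited at most once per newly unsynced subtree. A sketch of that saturating upward propagation, collapsing the pte-chain walk to a single parent pointer:

```c
#include <stdio.h>

struct node {
    struct node *parent;    /* single parent is enough for the sketch */
    int unsync_children;    /* count of unsync entries below this node */
};

/* Propagate "something below me went unsync" toward the root.  The
 * early return on a non-zero previous count is what keeps the walk
 * O(height) per new unsync page rather than O(all paths). */
static void mark_unsync(struct node *n)
{
    if (n->unsync_children++)   /* already flagged: ancestors know */
        return;
    if (n->parent)
        mark_unsync(n->parent);
}

int main(void)
{
    struct node root = {0}, mid = {&root, 0}, leaf = {&mid, 0};

    mark_unsync(&leaf);
    mark_unsync(&leaf);         /* second call stops at the leaf */
    printf("root=%d mid=%d leaf=%d\n",
           root.unsync_children, mid.unsync_children,
           leaf.unsync_children);   /* prints: root=1 mid=1 leaf=2 */
    return 0;
}
```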
1068 | 1060 | ||
1069 | static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu, | 1061 | static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu, |
1070 | struct kvm_mmu_page *sp) | 1062 | struct kvm_mmu_page *sp) |
1071 | { | 1063 | { |
1072 | int i; | 1064 | int i; |
1073 | 1065 | ||
1074 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) | 1066 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) |
1075 | sp->spt[i] = shadow_trap_nonpresent_pte; | 1067 | sp->spt[i] = shadow_trap_nonpresent_pte; |
1076 | } | 1068 | } |
1077 | 1069 | ||
1078 | static int nonpaging_sync_page(struct kvm_vcpu *vcpu, | 1070 | static int nonpaging_sync_page(struct kvm_vcpu *vcpu, |
1079 | struct kvm_mmu_page *sp, bool clear_unsync) | 1071 | struct kvm_mmu_page *sp, bool clear_unsync) |
1080 | { | 1072 | { |
1081 | return 1; | 1073 | return 1; |
1082 | } | 1074 | } |
1083 | 1075 | ||
1084 | static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva) | 1076 | static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva) |
1085 | { | 1077 | { |
1086 | } | 1078 | } |
1087 | 1079 | ||
1088 | #define KVM_PAGE_ARRAY_NR 16 | 1080 | #define KVM_PAGE_ARRAY_NR 16 |
1089 | 1081 | ||
1090 | struct kvm_mmu_pages { | 1082 | struct kvm_mmu_pages { |
1091 | struct mmu_page_and_offset { | 1083 | struct mmu_page_and_offset { |
1092 | struct kvm_mmu_page *sp; | 1084 | struct kvm_mmu_page *sp; |
1093 | unsigned int idx; | 1085 | unsigned int idx; |
1094 | } page[KVM_PAGE_ARRAY_NR]; | 1086 | } page[KVM_PAGE_ARRAY_NR]; |
1095 | unsigned int nr; | 1087 | unsigned int nr; |
1096 | }; | 1088 | }; |
1097 | 1089 | ||
1098 | #define for_each_unsync_children(bitmap, idx) \ | 1090 | #define for_each_unsync_children(bitmap, idx) \ |
1099 | for (idx = find_first_bit(bitmap, 512); \ | 1091 | for (idx = find_first_bit(bitmap, 512); \ |
1100 | idx < 512; \ | 1092 | idx < 512; \ |
1101 | idx = find_next_bit(bitmap, 512, idx+1)) | 1093 | idx = find_next_bit(bitmap, 512, idx+1)) |
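for_each_unsync_children() scans the 512-bit unsync_child_bitmap (one bit per entry of a shadow page table) with find_first_bit()/find_next_bit(), which return the bitmap size once no set bit remains. A plain-C equivalent of the iteration, with a naive scan standing in for the optimized kernel helpers:

```c
#include <stdio.h>

#define BITS 512
#define WORDS (BITS / 64)

/* Naive stand-in for find_next_bit(): first set bit at index >= from,
 * or BITS when none is left (same sentinel as the kernel helper). */
static int next_bit(const unsigned long long *bm, int from)
{
    for (int i = from; i < BITS; i++)
        if (bm[i / 64] >> (i % 64) & 1)
            return i;
    return BITS;
}

int main(void)
{
    unsigned long long bm[WORDS] = {0};
    int idx;

    bm[0] |= 1ULL << 3;         /* pretend entries 3, 64 and 511 are unsync */
    bm[1] |= 1ULL << 0;
    bm[7] |= 1ULL << 63;

    /* Same shape as for_each_unsync_children(bm, idx). */
    for (idx = next_bit(bm, 0); idx < BITS; idx = next_bit(bm, idx + 1))
        printf("unsync child at index %d\n", idx);
    return 0;
}
```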
1102 | 1094 | ||
1103 | static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, | 1095 | static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, |
1104 | int idx) | 1096 | int idx) |
1105 | { | 1097 | { |
1106 | int i; | 1098 | int i; |
1107 | 1099 | ||
1108 | if (sp->unsync) | 1100 | if (sp->unsync) |
1109 | for (i = 0; i < pvec->nr; i++) | 1101 | for (i = 0; i < pvec->nr; i++) |
1110 | if (pvec->page[i].sp == sp) | 1102 | if (pvec->page[i].sp == sp) |
1111 | return 0; | 1103 | return 0; |
1112 | 1104 | ||
1113 | pvec->page[pvec->nr].sp = sp; | 1105 | pvec->page[pvec->nr].sp = sp; |
1114 | pvec->page[pvec->nr].idx = idx; | 1106 | pvec->page[pvec->nr].idx = idx; |
1115 | pvec->nr++; | 1107 | pvec->nr++; |
1116 | return (pvec->nr == KVM_PAGE_ARRAY_NR); | 1108 | return (pvec->nr == KVM_PAGE_ARRAY_NR); |
1117 | } | 1109 | } |
1118 | 1110 | ||
1119 | static int __mmu_unsync_walk(struct kvm_mmu_page *sp, | 1111 | static int __mmu_unsync_walk(struct kvm_mmu_page *sp, |
1120 | struct kvm_mmu_pages *pvec) | 1112 | struct kvm_mmu_pages *pvec) |
1121 | { | 1113 | { |
1122 | int i, ret, nr_unsync_leaf = 0; | 1114 | int i, ret, nr_unsync_leaf = 0; |
1123 | 1115 | ||
1124 | for_each_unsync_children(sp->unsync_child_bitmap, i) { | 1116 | for_each_unsync_children(sp->unsync_child_bitmap, i) { |
1125 | struct kvm_mmu_page *child; | 1117 | struct kvm_mmu_page *child; |
1126 | u64 ent = sp->spt[i]; | 1118 | u64 ent = sp->spt[i]; |
1127 | 1119 | ||
1128 | if (!is_shadow_present_pte(ent) || is_large_pte(ent)) | 1120 | if (!is_shadow_present_pte(ent) || is_large_pte(ent)) |
1129 | goto clear_child_bitmap; | 1121 | goto clear_child_bitmap; |
1130 | 1122 | ||
1131 | child = page_header(ent & PT64_BASE_ADDR_MASK); | 1123 | child = page_header(ent & PT64_BASE_ADDR_MASK); |
1132 | 1124 | ||
1133 | if (child->unsync_children) { | 1125 | if (child->unsync_children) { |
1134 | if (mmu_pages_add(pvec, child, i)) | 1126 | if (mmu_pages_add(pvec, child, i)) |
1135 | return -ENOSPC; | 1127 | return -ENOSPC; |
1136 | 1128 | ||
1137 | ret = __mmu_unsync_walk(child, pvec); | 1129 | ret = __mmu_unsync_walk(child, pvec); |
1138 | if (!ret) | 1130 | if (!ret) |
1139 | goto clear_child_bitmap; | 1131 | goto clear_child_bitmap; |
1140 | else if (ret > 0) | 1132 | else if (ret > 0) |
1141 | nr_unsync_leaf += ret; | 1133 | nr_unsync_leaf += ret; |
1142 | else | 1134 | else |
1143 | return ret; | 1135 | return ret; |
1144 | } else if (child->unsync) { | 1136 | } else if (child->unsync) { |
1145 | nr_unsync_leaf++; | 1137 | nr_unsync_leaf++; |
1146 | if (mmu_pages_add(pvec, child, i)) | 1138 | if (mmu_pages_add(pvec, child, i)) |
1147 | return -ENOSPC; | 1139 | return -ENOSPC; |
1148 | } else | 1140 | } else |
1149 | goto clear_child_bitmap; | 1141 | goto clear_child_bitmap; |
1150 | 1142 | ||
1151 | continue; | 1143 | continue; |
1152 | 1144 | ||
1153 | clear_child_bitmap: | 1145 | clear_child_bitmap: |
1154 | __clear_bit(i, sp->unsync_child_bitmap); | 1146 | __clear_bit(i, sp->unsync_child_bitmap); |
1155 | sp->unsync_children--; | 1147 | sp->unsync_children--; |
1156 | WARN_ON((int)sp->unsync_children < 0); | 1148 | WARN_ON((int)sp->unsync_children < 0); |
1157 | } | 1149 | } |
1158 | 1150 | ||
1159 | 1151 | ||
1160 | return nr_unsync_leaf; | 1152 | return nr_unsync_leaf; |
1161 | } | 1153 | } |
1162 | 1154 | ||
1163 | static int mmu_unsync_walk(struct kvm_mmu_page *sp, | 1155 | static int mmu_unsync_walk(struct kvm_mmu_page *sp, |
1164 | struct kvm_mmu_pages *pvec) | 1156 | struct kvm_mmu_pages *pvec) |
1165 | { | 1157 | { |
1166 | if (!sp->unsync_children) | 1158 | if (!sp->unsync_children) |
1167 | return 0; | 1159 | return 0; |
1168 | 1160 | ||
1169 | mmu_pages_add(pvec, sp, 0); | 1161 | mmu_pages_add(pvec, sp, 0); |
1170 | return __mmu_unsync_walk(sp, pvec); | 1162 | return __mmu_unsync_walk(sp, pvec); |
1171 | } | 1163 | } |
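mmu_unsync_walk()'s result is three-state: 0 means no unsync descendants remain, a positive value counts the unsync leaves gathered into the vector, and -ENOSPC (raised when mmu_pages_add() fills all KVM_PAGE_ARRAY_NR slots) just ends the batch early. Callers therefore loop: process one batch, then walk again until the tree is drained. A compilable sketch of that caller protocol, with walk() standing in for the real function:

```c
#include <stdio.h>

#define BATCH 16                /* stands in for KVM_PAGE_ARRAY_NR */

static int remaining = 40;      /* pretend 40 unsync leaves exist */

/* Stand-in for mmu_unsync_walk(): gathers up to BATCH leaves and
 * returns how many, 0 once nothing is left.  (The kernel's inner walk
 * also pops out with -ENOSPC when the vector fills; the caller's loop
 * shape is the same either way.) */
static int walk(int *pvec_nr)
{
    *pvec_nr = remaining < BATCH ? remaining : BATCH;
    remaining -= *pvec_nr;
    return *pvec_nr;
}

int main(void)
{
    int nr;

    while (walk(&nr))           /* same loop shape as mmu_sync_children() */
        printf("processing a batch of %d pages\n", nr);
    return 0;
}
```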
1172 | 1164 | ||
1173 | static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp) | 1165 | static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp) |
1174 | { | 1166 | { |
1175 | WARN_ON(!sp->unsync); | 1167 | WARN_ON(!sp->unsync); |
1176 | trace_kvm_mmu_sync_page(sp); | 1168 | trace_kvm_mmu_sync_page(sp); |
1177 | sp->unsync = 0; | 1169 | sp->unsync = 0; |
1178 | --kvm->stat.mmu_unsync; | 1170 | --kvm->stat.mmu_unsync; |
1179 | } | 1171 | } |
1180 | 1172 | ||
1181 | static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, | 1173 | static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, |
1182 | struct list_head *invalid_list); | 1174 | struct list_head *invalid_list); |
1183 | static void kvm_mmu_commit_zap_page(struct kvm *kvm, | 1175 | static void kvm_mmu_commit_zap_page(struct kvm *kvm, |
1184 | struct list_head *invalid_list); | 1176 | struct list_head *invalid_list); |
1185 | 1177 | ||
1186 | #define for_each_gfn_sp(kvm, sp, gfn, pos) \ | 1178 | #define for_each_gfn_sp(kvm, sp, gfn, pos) \ |
1187 | hlist_for_each_entry(sp, pos, \ | 1179 | hlist_for_each_entry(sp, pos, \ |
1188 | &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \ | 1180 | &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \ |
1189 | if ((sp)->gfn != (gfn)) {} else | 1181 | if ((sp)->gfn != (gfn)) {} else |
1190 | 1182 | ||
1191 | #define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos) \ | 1183 | #define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos) \ |
1192 | hlist_for_each_entry(sp, pos, \ | 1184 | hlist_for_each_entry(sp, pos, \ |
1193 | &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \ | 1185 | &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \ |
1194 | if ((sp)->gfn != (gfn) || (sp)->role.direct || \ | 1186 | if ((sp)->gfn != (gfn) || (sp)->role.direct || \ |
1195 | (sp)->role.invalid) {} else | 1187 | (sp)->role.invalid) {} else |
1196 | 1188 | ||
1197 | /* @sp->gfn should be write-protected at the call site */ | 1189 | /* @sp->gfn should be write-protected at the call site */ |
1198 | static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, | 1190 | static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, |
1199 | struct list_head *invalid_list, bool clear_unsync) | 1191 | struct list_head *invalid_list, bool clear_unsync) |
1200 | { | 1192 | { |
1201 | if (sp->role.cr4_pae != !!is_pae(vcpu)) { | 1193 | if (sp->role.cr4_pae != !!is_pae(vcpu)) { |
1202 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); | 1194 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); |
1203 | return 1; | 1195 | return 1; |
1204 | } | 1196 | } |
1205 | 1197 | ||
1206 | if (clear_unsync) | 1198 | if (clear_unsync) |
1207 | kvm_unlink_unsync_page(vcpu->kvm, sp); | 1199 | kvm_unlink_unsync_page(vcpu->kvm, sp); |
1208 | 1200 | ||
1209 | if (vcpu->arch.mmu.sync_page(vcpu, sp, clear_unsync)) { | 1201 | if (vcpu->arch.mmu.sync_page(vcpu, sp, clear_unsync)) { |
1210 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); | 1202 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); |
1211 | return 1; | 1203 | return 1; |
1212 | } | 1204 | } |
1213 | 1205 | ||
1214 | kvm_mmu_flush_tlb(vcpu); | 1206 | kvm_mmu_flush_tlb(vcpu); |
1215 | return 0; | 1207 | return 0; |
1216 | } | 1208 | } |
1217 | 1209 | ||
1218 | static int kvm_sync_page_transient(struct kvm_vcpu *vcpu, | 1210 | static int kvm_sync_page_transient(struct kvm_vcpu *vcpu, |
1219 | struct kvm_mmu_page *sp) | 1211 | struct kvm_mmu_page *sp) |
1220 | { | 1212 | { |
1221 | LIST_HEAD(invalid_list); | 1213 | LIST_HEAD(invalid_list); |
1222 | int ret; | 1214 | int ret; |
1223 | 1215 | ||
1224 | ret = __kvm_sync_page(vcpu, sp, &invalid_list, false); | 1216 | ret = __kvm_sync_page(vcpu, sp, &invalid_list, false); |
1225 | if (ret) | 1217 | if (ret) |
1226 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); | 1218 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); |
1227 | 1219 | ||
1228 | return ret; | 1220 | return ret; |
1229 | } | 1221 | } |
1230 | 1222 | ||
1231 | static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, | 1223 | static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, |
1232 | struct list_head *invalid_list) | 1224 | struct list_head *invalid_list) |
1233 | { | 1225 | { |
1234 | return __kvm_sync_page(vcpu, sp, invalid_list, true); | 1226 | return __kvm_sync_page(vcpu, sp, invalid_list, true); |
1235 | } | 1227 | } |
1236 | 1228 | ||
1237 | /* @gfn should be write-protected at the call site */ | 1229 | /* @gfn should be write-protected at the call site */ |
1238 | static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) | 1230 | static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) |
1239 | { | 1231 | { |
1240 | struct kvm_mmu_page *s; | 1232 | struct kvm_mmu_page *s; |
1241 | struct hlist_node *node; | 1233 | struct hlist_node *node; |
1242 | LIST_HEAD(invalid_list); | 1234 | LIST_HEAD(invalid_list); |
1243 | bool flush = false; | 1235 | bool flush = false; |
1244 | 1236 | ||
1245 | for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { | 1237 | for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { |
1246 | if (!s->unsync) | 1238 | if (!s->unsync) |
1247 | continue; | 1239 | continue; |
1248 | 1240 | ||
1249 | WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); | 1241 | WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); |
1250 | if ((s->role.cr4_pae != !!is_pae(vcpu)) || | 1242 | if ((s->role.cr4_pae != !!is_pae(vcpu)) || |
1251 | (vcpu->arch.mmu.sync_page(vcpu, s, true))) { | 1243 | (vcpu->arch.mmu.sync_page(vcpu, s, true))) { |
1252 | kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list); | 1244 | kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list); |
1253 | continue; | 1245 | continue; |
1254 | } | 1246 | } |
1255 | kvm_unlink_unsync_page(vcpu->kvm, s); | 1247 | kvm_unlink_unsync_page(vcpu->kvm, s); |
1256 | flush = true; | 1248 | flush = true; |
1257 | } | 1249 | } |
1258 | 1250 | ||
1259 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); | 1251 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); |
1260 | if (flush) | 1252 | if (flush) |
1261 | kvm_mmu_flush_tlb(vcpu); | 1253 | kvm_mmu_flush_tlb(vcpu); |
1262 | } | 1254 | } |
1263 | 1255 | ||
1264 | struct mmu_page_path { | 1256 | struct mmu_page_path { |
1265 | struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1]; | 1257 | struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1]; |
1266 | unsigned int idx[PT64_ROOT_LEVEL-1]; | 1258 | unsigned int idx[PT64_ROOT_LEVEL-1]; |
1267 | }; | 1259 | }; |
1268 | 1260 | ||
1269 | #define for_each_sp(pvec, sp, parents, i) \ | 1261 | #define for_each_sp(pvec, sp, parents, i) \ |
1270 | for (i = mmu_pages_next(&pvec, &parents, -1), \ | 1262 | for (i = mmu_pages_next(&pvec, &parents, -1), \ |
1271 | sp = pvec.page[i].sp; \ | 1263 | sp = pvec.page[i].sp; \ |
1272 | i < pvec.nr && ({ sp = pvec.page[i].sp; 1; }); \ | 1264 | i < pvec.nr && ({ sp = pvec.page[i].sp; 1; }); \ |
1273 | i = mmu_pages_next(&pvec, &parents, i)) | 1265 | i = mmu_pages_next(&pvec, &parents, i)) |
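for_each_sp() relies on a GCC statement expression: ({ sp = pvec.page[i].sp; 1; }) reloads sp on every bounds check and then evaluates to 1, so it never terminates the &&. A minimal demonstration of the idiom (a GCC/Clang extension, not standard C):

```c
#include <stdio.h>

int main(void)
{
    int vals[] = {10, 20, 30};
    int i, cur;

    /* The ({ ...; 1; }) block runs its statements, evaluates to its
     * last expression (1 here), and so keeps the && condition alive
     * while updating 'cur' as a side effect -- exactly the trick
     * for_each_sp() uses to re-load 'sp' each iteration. */
    for (i = 0; i < 3 && ({ cur = vals[i]; 1; }); i++)
        printf("cur = %d\n", cur);
    return 0;
}
```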
1274 | 1266 | ||
1275 | static int mmu_pages_next(struct kvm_mmu_pages *pvec, | 1267 | static int mmu_pages_next(struct kvm_mmu_pages *pvec, |
1276 | struct mmu_page_path *parents, | 1268 | struct mmu_page_path *parents, |
1277 | int i) | 1269 | int i) |
1278 | { | 1270 | { |
1279 | int n; | 1271 | int n; |
1280 | 1272 | ||
1281 | for (n = i+1; n < pvec->nr; n++) { | 1273 | for (n = i+1; n < pvec->nr; n++) { |
1282 | struct kvm_mmu_page *sp = pvec->page[n].sp; | 1274 | struct kvm_mmu_page *sp = pvec->page[n].sp; |
1283 | 1275 | ||
1284 | if (sp->role.level == PT_PAGE_TABLE_LEVEL) { | 1276 | if (sp->role.level == PT_PAGE_TABLE_LEVEL) { |
1285 | parents->idx[0] = pvec->page[n].idx; | 1277 | parents->idx[0] = pvec->page[n].idx; |
1286 | return n; | 1278 | return n; |
1287 | } | 1279 | } |
1288 | 1280 | ||
1289 | parents->parent[sp->role.level-2] = sp; | 1281 | parents->parent[sp->role.level-2] = sp; |
1290 | parents->idx[sp->role.level-1] = pvec->page[n].idx; | 1282 | parents->idx[sp->role.level-1] = pvec->page[n].idx; |
1291 | } | 1283 | } |
1292 | 1284 | ||
1293 | return n; | 1285 | return n; |
1294 | } | 1286 | } |
1295 | 1287 | ||
1296 | static void mmu_pages_clear_parents(struct mmu_page_path *parents) | 1288 | static void mmu_pages_clear_parents(struct mmu_page_path *parents) |
1297 | { | 1289 | { |
1298 | struct kvm_mmu_page *sp; | 1290 | struct kvm_mmu_page *sp; |
1299 | unsigned int level = 0; | 1291 | unsigned int level = 0; |
1300 | 1292 | ||
1301 | do { | 1293 | do { |
1302 | unsigned int idx = parents->idx[level]; | 1294 | unsigned int idx = parents->idx[level]; |
1303 | 1295 | ||
1304 | sp = parents->parent[level]; | 1296 | sp = parents->parent[level]; |
1305 | if (!sp) | 1297 | if (!sp) |
1306 | return; | 1298 | return; |
1307 | 1299 | ||
1308 | --sp->unsync_children; | 1300 | --sp->unsync_children; |
1309 | WARN_ON((int)sp->unsync_children < 0); | 1301 | WARN_ON((int)sp->unsync_children < 0); |
1310 | __clear_bit(idx, sp->unsync_child_bitmap); | 1302 | __clear_bit(idx, sp->unsync_child_bitmap); |
1311 | level++; | 1303 | level++; |
1312 | } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); | 1304 | } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); |
1313 | } | 1305 | } |
1314 | 1306 | ||
1315 | static void kvm_mmu_pages_init(struct kvm_mmu_page *parent, | 1307 | static void kvm_mmu_pages_init(struct kvm_mmu_page *parent, |
1316 | struct mmu_page_path *parents, | 1308 | struct mmu_page_path *parents, |
1317 | struct kvm_mmu_pages *pvec) | 1309 | struct kvm_mmu_pages *pvec) |
1318 | { | 1310 | { |
1319 | parents->parent[parent->role.level-1] = NULL; | 1311 | parents->parent[parent->role.level-1] = NULL; |
1320 | pvec->nr = 0; | 1312 | pvec->nr = 0; |
1321 | } | 1313 | } |
1322 | 1314 | ||
1323 | static void mmu_sync_children(struct kvm_vcpu *vcpu, | 1315 | static void mmu_sync_children(struct kvm_vcpu *vcpu, |
1324 | struct kvm_mmu_page *parent) | 1316 | struct kvm_mmu_page *parent) |
1325 | { | 1317 | { |
1326 | int i; | 1318 | int i; |
1327 | struct kvm_mmu_page *sp; | 1319 | struct kvm_mmu_page *sp; |
1328 | struct mmu_page_path parents; | 1320 | struct mmu_page_path parents; |
1329 | struct kvm_mmu_pages pages; | 1321 | struct kvm_mmu_pages pages; |
1330 | LIST_HEAD(invalid_list); | 1322 | LIST_HEAD(invalid_list); |
1331 | 1323 | ||
1332 | kvm_mmu_pages_init(parent, &parents, &pages); | 1324 | kvm_mmu_pages_init(parent, &parents, &pages); |
1333 | while (mmu_unsync_walk(parent, &pages)) { | 1325 | while (mmu_unsync_walk(parent, &pages)) { |
1334 | int protected = 0; | 1326 | int protected = 0; |
1335 | 1327 | ||
1336 | for_each_sp(pages, sp, parents, i) | 1328 | for_each_sp(pages, sp, parents, i) |
1337 | protected |= rmap_write_protect(vcpu->kvm, sp->gfn); | 1329 | protected |= rmap_write_protect(vcpu->kvm, sp->gfn); |
1338 | 1330 | ||
1339 | if (protected) | 1331 | if (protected) |
1340 | kvm_flush_remote_tlbs(vcpu->kvm); | 1332 | kvm_flush_remote_tlbs(vcpu->kvm); |
1341 | 1333 | ||
1342 | for_each_sp(pages, sp, parents, i) { | 1334 | for_each_sp(pages, sp, parents, i) { |
1343 | kvm_sync_page(vcpu, sp, &invalid_list); | 1335 | kvm_sync_page(vcpu, sp, &invalid_list); |
1344 | mmu_pages_clear_parents(&parents); | 1336 | mmu_pages_clear_parents(&parents); |
1345 | } | 1337 | } |
1346 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); | 1338 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); |
1347 | cond_resched_lock(&vcpu->kvm->mmu_lock); | 1339 | cond_resched_lock(&vcpu->kvm->mmu_lock); |
1348 | kvm_mmu_pages_init(parent, &parents, &pages); | 1340 | kvm_mmu_pages_init(parent, &parents, &pages); |
1349 | } | 1341 | } |
1350 | } | 1342 | } |
1351 | 1343 | ||
1352 | static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, | 1344 | static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, |
1353 | gfn_t gfn, | 1345 | gfn_t gfn, |
1354 | gva_t gaddr, | 1346 | gva_t gaddr, |
1355 | unsigned level, | 1347 | unsigned level, |
1356 | int direct, | 1348 | int direct, |
1357 | unsigned access, | 1349 | unsigned access, |
1358 | u64 *parent_pte) | 1350 | u64 *parent_pte) |
1359 | { | 1351 | { |
1360 | union kvm_mmu_page_role role; | 1352 | union kvm_mmu_page_role role; |
1361 | unsigned quadrant; | 1353 | unsigned quadrant; |
1362 | struct kvm_mmu_page *sp; | 1354 | struct kvm_mmu_page *sp; |
1363 | struct hlist_node *node; | 1355 | struct hlist_node *node; |
1364 | bool need_sync = false; | 1356 | bool need_sync = false; |
1365 | 1357 | ||
1366 | role = vcpu->arch.mmu.base_role; | 1358 | role = vcpu->arch.mmu.base_role; |
1367 | role.level = level; | 1359 | role.level = level; |
1368 | role.direct = direct; | 1360 | role.direct = direct; |
1369 | if (role.direct) | 1361 | if (role.direct) |
1370 | role.cr4_pae = 0; | 1362 | role.cr4_pae = 0; |
1371 | role.access = access; | 1363 | role.access = access; |
1372 | if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) { | 1364 | if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) { |
1373 | quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level)); | 1365 | quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level)); |
1374 | quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1; | 1366 | quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1; |
1375 | role.quadrant = quadrant; | 1367 | role.quadrant = quadrant; |
1376 | } | 1368 | } |
1377 | for_each_gfn_sp(vcpu->kvm, sp, gfn, node) { | 1369 | for_each_gfn_sp(vcpu->kvm, sp, gfn, node) { |
1378 | if (!need_sync && sp->unsync) | 1370 | if (!need_sync && sp->unsync) |
1379 | need_sync = true; | 1371 | need_sync = true; |
1380 | 1372 | ||
1381 | if (sp->role.word != role.word) | 1373 | if (sp->role.word != role.word) |
1382 | continue; | 1374 | continue; |
1383 | 1375 | ||
1384 | if (sp->unsync && kvm_sync_page_transient(vcpu, sp)) | 1376 | if (sp->unsync && kvm_sync_page_transient(vcpu, sp)) |
1385 | break; | 1377 | break; |
1386 | 1378 | ||
1387 | mmu_page_add_parent_pte(vcpu, sp, parent_pte); | 1379 | mmu_page_add_parent_pte(vcpu, sp, parent_pte); |
1388 | if (sp->unsync_children) { | 1380 | if (sp->unsync_children) { |
1389 | set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests); | 1381 | set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests); |
1390 | kvm_mmu_mark_parents_unsync(sp); | 1382 | kvm_mmu_mark_parents_unsync(sp); |
1391 | } else if (sp->unsync) | 1383 | } else if (sp->unsync) |
1392 | kvm_mmu_mark_parents_unsync(sp); | 1384 | kvm_mmu_mark_parents_unsync(sp); |
1393 | 1385 | ||
1394 | trace_kvm_mmu_get_page(sp, false); | 1386 | trace_kvm_mmu_get_page(sp, false); |
1395 | return sp; | 1387 | return sp; |
1396 | } | 1388 | } |
1397 | ++vcpu->kvm->stat.mmu_cache_miss; | 1389 | ++vcpu->kvm->stat.mmu_cache_miss; |
1398 | sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct); | 1390 | sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct); |
1399 | if (!sp) | 1391 | if (!sp) |
1400 | return sp; | 1392 | return sp; |
1401 | sp->gfn = gfn; | 1393 | sp->gfn = gfn; |
1402 | sp->role = role; | 1394 | sp->role = role; |
1403 | hlist_add_head(&sp->hash_link, | 1395 | hlist_add_head(&sp->hash_link, |
1404 | &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]); | 1396 | &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]); |
1405 | if (!direct) { | 1397 | if (!direct) { |
1406 | if (rmap_write_protect(vcpu->kvm, gfn)) | 1398 | if (rmap_write_protect(vcpu->kvm, gfn)) |
1407 | kvm_flush_remote_tlbs(vcpu->kvm); | 1399 | kvm_flush_remote_tlbs(vcpu->kvm); |
1408 | if (level > PT_PAGE_TABLE_LEVEL && need_sync) | 1400 | if (level > PT_PAGE_TABLE_LEVEL && need_sync) |
1409 | kvm_sync_pages(vcpu, gfn); | 1401 | kvm_sync_pages(vcpu, gfn); |
1410 | 1402 | ||
1411 | account_shadowed(vcpu->kvm, gfn); | 1403 | account_shadowed(vcpu->kvm, gfn); |
1412 | } | 1404 | } |
1413 | if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte) | 1405 | if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte) |
1414 | vcpu->arch.mmu.prefetch_page(vcpu, sp); | 1406 | vcpu->arch.mmu.prefetch_page(vcpu, sp); |
1415 | else | 1407 | else |
1416 | nonpaging_prefetch_page(vcpu, sp); | 1408 | nonpaging_prefetch_page(vcpu, sp); |
1417 | trace_kvm_mmu_get_page(sp, true); | 1409 | trace_kvm_mmu_get_page(sp, true); |
1418 | return sp; | 1410 | return sp; |
1419 | } | 1411 | } |
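kvm_mmu_get_page() is a hash-consing lookup: a shadow page's identity is the pair (gfn, role), and because the role bitfields are overlaid on a single word in a union, reuse is decided by one integer compare (sp->role.word != role.word). A toy illustration of why the packed union makes that cheap; the field layout below is invented for the demo, not kvm_mmu_page_role's, and it needs C11/GNU C for the anonymous struct:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy version of union kvm_mmu_page_role: bitfields overlaid on one
 * word so identity comparison is a single integer compare. */
union role {
    struct {
        unsigned level    : 4;
        unsigned direct   : 1;
        unsigned access   : 3;
        unsigned quadrant : 2;
    };
    uint32_t word;
};

int main(void)
{
    union role a = { .level = 2, .access = 7, .quadrant = 1 };
    union role b = a;

    b.quadrant = 0;     /* same gfn, different quadrant... */
    printf("a.word=%#x b.word=%#x reuse=%d\n",
           (unsigned)a.word, (unsigned)b.word,
           a.word == b.word);   /* ...so no reuse */
    return 0;
}
```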
1420 | 1412 | ||
1421 | static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator, | 1413 | static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator, |
1422 | struct kvm_vcpu *vcpu, u64 addr) | 1414 | struct kvm_vcpu *vcpu, u64 addr) |
1423 | { | 1415 | { |
1424 | iterator->addr = addr; | 1416 | iterator->addr = addr; |
1425 | iterator->shadow_addr = vcpu->arch.mmu.root_hpa; | 1417 | iterator->shadow_addr = vcpu->arch.mmu.root_hpa; |
1426 | iterator->level = vcpu->arch.mmu.shadow_root_level; | 1418 | iterator->level = vcpu->arch.mmu.shadow_root_level; |
1427 | if (iterator->level == PT32E_ROOT_LEVEL) { | 1419 | if (iterator->level == PT32E_ROOT_LEVEL) { |
1428 | iterator->shadow_addr | 1420 | iterator->shadow_addr |
1429 | = vcpu->arch.mmu.pae_root[(addr >> 30) & 3]; | 1421 | = vcpu->arch.mmu.pae_root[(addr >> 30) & 3]; |
1430 | iterator->shadow_addr &= PT64_BASE_ADDR_MASK; | 1422 | iterator->shadow_addr &= PT64_BASE_ADDR_MASK; |
1431 | --iterator->level; | 1423 | --iterator->level; |
1432 | if (!iterator->shadow_addr) | 1424 | if (!iterator->shadow_addr) |
1433 | iterator->level = 0; | 1425 | iterator->level = 0; |
1434 | } | 1426 | } |
1435 | } | 1427 | } |
1436 | 1428 | ||
1437 | static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator) | 1429 | static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator) |
1438 | { | 1430 | { |
1439 | if (iterator->level < PT_PAGE_TABLE_LEVEL) | 1431 | if (iterator->level < PT_PAGE_TABLE_LEVEL) |
1440 | return false; | 1432 | return false; |
1441 | 1433 | ||
1442 | if (iterator->level == PT_PAGE_TABLE_LEVEL) | 1434 | if (iterator->level == PT_PAGE_TABLE_LEVEL) |
1443 | if (is_large_pte(*iterator->sptep)) | 1435 | if (is_large_pte(*iterator->sptep)) |
1444 | return false; | 1436 | return false; |
1445 | 1437 | ||
1446 | iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level); | 1438 | iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level); |
1447 | iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index; | 1439 | iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index; |
1448 | return true; | 1440 | return true; |
1449 | } | 1441 | } |
1450 | 1442 | ||
1451 | static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator) | 1443 | static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator) |
1452 | { | 1444 | { |
1453 | iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK; | 1445 | iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK; |
1454 | --iterator->level; | 1446 | --iterator->level; |
1455 | } | 1447 | } |
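shadow_walk_init/okay/next form a textbook init/test/advance iterator over the shadow paging levels for one address, terminating below PT_PAGE_TABLE_LEVEL or at a large (leaf) PTE, with a PAE special case that starts one level down from the pae_root. A self-contained toy walker in the same shape, using a made-up three-level layout and direct table pointers instead of the physical addresses the kernel follows:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy three-level radix walk shaped like shadow_walk_init/okay/next:
 * 9 index bits per level above a 12-bit page offset. */
#define LVL_BITS 9
#define IDX(addr, lvl) (((addr) >> (12 + (lvl) * LVL_BITS)) & ((1 << LVL_BITS) - 1))

static uint64_t l1[512], l2[512], l3[512];
static uint64_t *tables[] = { 0, l1, l2, l3 };

struct walker {
    uint64_t addr;
    uint64_t *table;    /* current level's table (the shadow_addr role) */
    uint64_t *sptep;    /* entry selected by walk_okay() */
    int level;          /* counts down; 1 is the leaf level */
};

static void walk_init(struct walker *w, uint64_t addr)
{
    w->addr = addr;
    w->level = 3;
    w->table = l3;
}

static int walk_okay(struct walker *w)
{
    if (w->level < 1)
        return 0;
    w->sptep = &w->table[IDX(w->addr, w->level - 1)];
    return 1;
}

static void walk_next(struct walker *w)
{
    w->level--;
    w->table = tables[w->level];    /* kernel: *sptep & PT64_BASE_ADDR_MASK */
}

int main(void)
{
    struct walker w;

    for (walk_init(&w, 0x7f1234567000ULL); walk_okay(&w); walk_next(&w))
        printf("level %d index %ld\n", w.level, (long)(w.sptep - w.table));
    return 0;
}
```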
1456 | 1448 | ||
1457 | static void kvm_mmu_page_unlink_children(struct kvm *kvm, | 1449 | static void kvm_mmu_page_unlink_children(struct kvm *kvm, |
1458 | struct kvm_mmu_page *sp) | 1450 | struct kvm_mmu_page *sp) |
1459 | { | 1451 | { |
1460 | unsigned i; | 1452 | unsigned i; |
1461 | u64 *pt; | 1453 | u64 *pt; |
1462 | u64 ent; | 1454 | u64 ent; |
1463 | 1455 | ||
1464 | pt = sp->spt; | 1456 | pt = sp->spt; |
1465 | 1457 | ||
1466 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { | 1458 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { |
1467 | ent = pt[i]; | 1459 | ent = pt[i]; |
1468 | 1460 | ||
1469 | if (is_shadow_present_pte(ent)) { | 1461 | if (is_shadow_present_pte(ent)) { |
1470 | if (!is_last_spte(ent, sp->role.level)) { | 1462 | if (!is_last_spte(ent, sp->role.level)) { |
1471 | ent &= PT64_BASE_ADDR_MASK; | 1463 | ent &= PT64_BASE_ADDR_MASK; |
1472 | mmu_page_remove_parent_pte(page_header(ent), | 1464 | mmu_page_remove_parent_pte(page_header(ent), |
1473 | &pt[i]); | 1465 | &pt[i]); |
1474 | } else { | 1466 | } else { |
1475 | if (is_large_pte(ent)) | 1467 | if (is_large_pte(ent)) |
1476 | --kvm->stat.lpages; | 1468 | --kvm->stat.lpages; |
1477 | rmap_remove(kvm, &pt[i]); | 1469 | rmap_remove(kvm, &pt[i]); |
1478 | } | 1470 | } |
1479 | } | 1471 | } |
1480 | pt[i] = shadow_trap_nonpresent_pte; | 1472 | pt[i] = shadow_trap_nonpresent_pte; |
1481 | } | 1473 | } |
1482 | } | 1474 | } |
1483 | 1475 | ||
1484 | static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) | 1476 | static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) |
1485 | { | 1477 | { |
1486 | mmu_page_remove_parent_pte(sp, parent_pte); | 1478 | mmu_page_remove_parent_pte(sp, parent_pte); |
1487 | } | 1479 | } |
1488 | 1480 | ||
1489 | static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm) | 1481 | static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm) |
1490 | { | 1482 | { |
1491 | int i; | 1483 | int i; |
1492 | struct kvm_vcpu *vcpu; | 1484 | struct kvm_vcpu *vcpu; |
1493 | 1485 | ||
1494 | kvm_for_each_vcpu(i, vcpu, kvm) | 1486 | kvm_for_each_vcpu(i, vcpu, kvm) |
1495 | vcpu->arch.last_pte_updated = NULL; | 1487 | vcpu->arch.last_pte_updated = NULL; |
1496 | } | 1488 | } |
1497 | 1489 | ||
1498 | static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) | 1490 | static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) |
1499 | { | 1491 | { |
1500 | u64 *parent_pte; | 1492 | u64 *parent_pte; |
1501 | 1493 | ||
1502 | while (sp->multimapped || sp->parent_pte) { | 1494 | while (sp->multimapped || sp->parent_pte) { |
1503 | if (!sp->multimapped) | 1495 | if (!sp->multimapped) |
1504 | parent_pte = sp->parent_pte; | 1496 | parent_pte = sp->parent_pte; |
1505 | else { | 1497 | else { |
1506 | struct kvm_pte_chain *chain; | 1498 | struct kvm_pte_chain *chain; |
1507 | 1499 | ||
1508 | chain = container_of(sp->parent_ptes.first, | 1500 | chain = container_of(sp->parent_ptes.first, |
1509 | struct kvm_pte_chain, link); | 1501 | struct kvm_pte_chain, link); |
1510 | parent_pte = chain->parent_ptes[0]; | 1502 | parent_pte = chain->parent_ptes[0]; |
1511 | } | 1503 | } |
1512 | BUG_ON(!parent_pte); | 1504 | BUG_ON(!parent_pte); |
1513 | kvm_mmu_put_page(sp, parent_pte); | 1505 | kvm_mmu_put_page(sp, parent_pte); |
1514 | __set_spte(parent_pte, shadow_trap_nonpresent_pte); | 1506 | __set_spte(parent_pte, shadow_trap_nonpresent_pte); |
1515 | } | 1507 | } |
1516 | } | 1508 | } |
1517 | 1509 | ||
1518 | static int mmu_zap_unsync_children(struct kvm *kvm, | 1510 | static int mmu_zap_unsync_children(struct kvm *kvm, |
1519 | struct kvm_mmu_page *parent, | 1511 | struct kvm_mmu_page *parent, |
1520 | struct list_head *invalid_list) | 1512 | struct list_head *invalid_list) |
1521 | { | 1513 | { |
1522 | int i, zapped = 0; | 1514 | int i, zapped = 0; |
1523 | struct mmu_page_path parents; | 1515 | struct mmu_page_path parents; |
1524 | struct kvm_mmu_pages pages; | 1516 | struct kvm_mmu_pages pages; |
1525 | 1517 | ||
1526 | if (parent->role.level == PT_PAGE_TABLE_LEVEL) | 1518 | if (parent->role.level == PT_PAGE_TABLE_LEVEL) |
1527 | return 0; | 1519 | return 0; |
1528 | 1520 | ||
1529 | kvm_mmu_pages_init(parent, &parents, &pages); | 1521 | kvm_mmu_pages_init(parent, &parents, &pages); |
1530 | while (mmu_unsync_walk(parent, &pages)) { | 1522 | while (mmu_unsync_walk(parent, &pages)) { |
1531 | struct kvm_mmu_page *sp; | 1523 | struct kvm_mmu_page *sp; |
1532 | 1524 | ||
1533 | for_each_sp(pages, sp, parents, i) { | 1525 | for_each_sp(pages, sp, parents, i) { |
1534 | kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); | 1526 | kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); |
1535 | mmu_pages_clear_parents(&parents); | 1527 | mmu_pages_clear_parents(&parents); |
1536 | zapped++; | 1528 | zapped++; |
1537 | } | 1529 | } |
1538 | kvm_mmu_pages_init(parent, &parents, &pages); | 1530 | kvm_mmu_pages_init(parent, &parents, &pages); |
1539 | } | 1531 | } |
1540 | 1532 | ||
1541 | return zapped; | 1533 | return zapped; |
1542 | } | 1534 | } |
1543 | 1535 | ||
1544 | static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, | 1536 | static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, |
1545 | struct list_head *invalid_list) | 1537 | struct list_head *invalid_list) |
1546 | { | 1538 | { |
1547 | int ret; | 1539 | int ret; |
1548 | 1540 | ||
1549 | trace_kvm_mmu_prepare_zap_page(sp); | 1541 | trace_kvm_mmu_prepare_zap_page(sp); |
1550 | ++kvm->stat.mmu_shadow_zapped; | 1542 | ++kvm->stat.mmu_shadow_zapped; |
1551 | ret = mmu_zap_unsync_children(kvm, sp, invalid_list); | 1543 | ret = mmu_zap_unsync_children(kvm, sp, invalid_list); |
1552 | kvm_mmu_page_unlink_children(kvm, sp); | 1544 | kvm_mmu_page_unlink_children(kvm, sp); |
1553 | kvm_mmu_unlink_parents(kvm, sp); | 1545 | kvm_mmu_unlink_parents(kvm, sp); |
1554 | if (!sp->role.invalid && !sp->role.direct) | 1546 | if (!sp->role.invalid && !sp->role.direct) |
1555 | unaccount_shadowed(kvm, sp->gfn); | 1547 | unaccount_shadowed(kvm, sp->gfn); |
1556 | if (sp->unsync) | 1548 | if (sp->unsync) |
1557 | kvm_unlink_unsync_page(kvm, sp); | 1549 | kvm_unlink_unsync_page(kvm, sp); |
1558 | if (!sp->root_count) { | 1550 | if (!sp->root_count) { |
1559 | /* Count self */ | 1551 | /* Count self */ |
1560 | ret++; | 1552 | ret++; |
1561 | list_move(&sp->link, invalid_list); | 1553 | list_move(&sp->link, invalid_list); |
1562 | } else { | 1554 | } else { |
1563 | list_move(&sp->link, &kvm->arch.active_mmu_pages); | 1555 | list_move(&sp->link, &kvm->arch.active_mmu_pages); |
1564 | kvm_reload_remote_mmus(kvm); | 1556 | kvm_reload_remote_mmus(kvm); |
1565 | } | 1557 | } |
1566 | 1558 | ||
1567 | sp->role.invalid = 1; | 1559 | sp->role.invalid = 1; |
1568 | kvm_mmu_reset_last_pte_updated(kvm); | 1560 | kvm_mmu_reset_last_pte_updated(kvm); |
1569 | return ret; | 1561 | return ret; |
1570 | } | 1562 | } |
1571 | 1563 | ||
1572 | static void kvm_mmu_commit_zap_page(struct kvm *kvm, | 1564 | static void kvm_mmu_commit_zap_page(struct kvm *kvm, |
1573 | struct list_head *invalid_list) | 1565 | struct list_head *invalid_list) |
1574 | { | 1566 | { |
1575 | struct kvm_mmu_page *sp; | 1567 | struct kvm_mmu_page *sp; |
1576 | 1568 | ||
1577 | if (list_empty(invalid_list)) | 1569 | if (list_empty(invalid_list)) |
1578 | return; | 1570 | return; |
1579 | 1571 | ||
1580 | kvm_flush_remote_tlbs(kvm); | 1572 | kvm_flush_remote_tlbs(kvm); |
1581 | 1573 | ||
1582 | do { | 1574 | do { |
1583 | sp = list_first_entry(invalid_list, struct kvm_mmu_page, link); | 1575 | sp = list_first_entry(invalid_list, struct kvm_mmu_page, link); |
1584 | WARN_ON(!sp->role.invalid || sp->root_count); | 1576 | WARN_ON(!sp->role.invalid || sp->root_count); |
1585 | kvm_mmu_free_page(kvm, sp); | 1577 | kvm_mmu_free_page(kvm, sp); |
1586 | } while (!list_empty(invalid_list)); | 1578 | } while (!list_empty(invalid_list)); |
1587 | 1579 | ||
1588 | } | 1580 | } |
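The zap path is split into prepare and commit on purpose: pages are unlinked and collected on a local invalid_list under mmu_lock, remote TLBs are flushed exactly once for the whole batch, and only then is the memory freed, so no vCPU can still be walking a freed page table. A small sketch of that batching pattern (the page numbers and helpers here are illustrative):

```c
#include <stdio.h>

/* Toy prepare/commit batching: defer the expensive global step (the
 * remote TLB flush in the kernel) to a single commit per batch. */
#define MAX 8

static int batch[MAX], batch_nr;

static void prepare_zap(int page)   /* unlink now, free later */
{
    batch[batch_nr++] = page;
    printf("unlinked page %d\n", page);
}

static void commit_zap(void)        /* one flush, then free the batch */
{
    if (!batch_nr)
        return;
    printf("flush remote TLBs once for %d pages\n", batch_nr);
    while (batch_nr)
        printf("freed page %d\n", batch[--batch_nr]);
}

int main(void)
{
    prepare_zap(3);
    prepare_zap(7);
    prepare_zap(9);
    commit_zap();               /* exactly one flush for three pages */
    return 0;
}
```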
1589 | 1581 | ||
1590 | /* | 1582 | /* |
1591 | * Changing the number of mmu pages allocated to the vm | 1583 | * Changing the number of mmu pages allocated to the vm |
1592 | * Note: if kvm_nr_mmu_pages is too small, you will get a deadlock | 1584 | * Note: if kvm_nr_mmu_pages is too small, you will get a deadlock |
1593 | */ | 1585 | */ |
1594 | void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages) | 1586 | void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages) |
1595 | { | 1587 | { |
1596 | int used_pages; | 1588 | int used_pages; |
1597 | LIST_HEAD(invalid_list); | 1589 | LIST_HEAD(invalid_list); |
1598 | 1590 | ||
1599 | used_pages = kvm->arch.n_alloc_mmu_pages - kvm->arch.n_free_mmu_pages; | 1591 | used_pages = kvm->arch.n_alloc_mmu_pages - kvm->arch.n_free_mmu_pages; |
1600 | used_pages = max(0, used_pages); | 1592 | used_pages = max(0, used_pages); |
1601 | 1593 | ||
1602 | /* | 1594 | /* |
1603 | * If we set the number of mmu pages to be smaller than the | 1595 | * If we set the number of mmu pages to be smaller than the |
1604 | * number of active pages, we must free some mmu pages before we | 1596 | * number of active pages, we must free some mmu pages before we |
1605 | * change the value | 1597 | * change the value |
1606 | */ | 1598 | */ |
1607 | 1599 | ||
1608 | if (used_pages > kvm_nr_mmu_pages) { | 1600 | if (used_pages > kvm_nr_mmu_pages) { |
1609 | while (used_pages > kvm_nr_mmu_pages && | 1601 | while (used_pages > kvm_nr_mmu_pages && |
1610 | !list_empty(&kvm->arch.active_mmu_pages)) { | 1602 | !list_empty(&kvm->arch.active_mmu_pages)) { |
1611 | struct kvm_mmu_page *page; | 1603 | struct kvm_mmu_page *page; |
1612 | 1604 | ||
1613 | page = container_of(kvm->arch.active_mmu_pages.prev, | 1605 | page = container_of(kvm->arch.active_mmu_pages.prev, |
1614 | struct kvm_mmu_page, link); | 1606 | struct kvm_mmu_page, link); |
1615 | used_pages -= kvm_mmu_prepare_zap_page(kvm, page, | 1607 | used_pages -= kvm_mmu_prepare_zap_page(kvm, page, |
1616 | &invalid_list); | 1608 | &invalid_list); |
1617 | } | 1609 | } |
1618 | kvm_mmu_commit_zap_page(kvm, &invalid_list); | 1610 | kvm_mmu_commit_zap_page(kvm, &invalid_list); |
1619 | kvm_nr_mmu_pages = used_pages; | 1611 | kvm_nr_mmu_pages = used_pages; |
1620 | kvm->arch.n_free_mmu_pages = 0; | 1612 | kvm->arch.n_free_mmu_pages = 0; |
1621 | } | 1613 | } |
1622 | else | 1614 | else |
1623 | kvm->arch.n_free_mmu_pages += kvm_nr_mmu_pages | 1615 | kvm->arch.n_free_mmu_pages += kvm_nr_mmu_pages |
1624 | - kvm->arch.n_alloc_mmu_pages; | 1616 | - kvm->arch.n_alloc_mmu_pages; |
1625 | 1617 | ||
1626 | kvm->arch.n_alloc_mmu_pages = kvm_nr_mmu_pages; | 1618 | kvm->arch.n_alloc_mmu_pages = kvm_nr_mmu_pages; |
1627 | } | 1619 | } |
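The accounting works in terms of the allocation cap (n_alloc_mmu_pages) and the free credit (n_free_mmu_pages), with usage being their difference: shrinking below current usage evicts from the tail of active_mmu_pages (roughly LRU) and then clamps the cap to whatever actually remains, while growing simply credits the difference to the free count. A worked example of that arithmetic, with evict() standing in for the prepare/commit zap:

```c
#include <stdio.h>

/* Worked example of kvm_mmu_change_mmu_pages()'s bookkeeping. */
static int n_alloc = 100, n_free = 20;      /* cap 100, 80 pages in use */

static int evict(void) { return 1; }        /* each zap frees >= 1 page */

static void change_cap(int new_cap)
{
    int used = n_alloc - n_free;

    if (used > new_cap) {
        while (used > new_cap)
            used -= evict();        /* shrink usage below the request */
        new_cap = used;             /* clamp the cap to what remains */
        n_free = 0;
    } else {
        n_free += new_cap - n_alloc;    /* grow: credit the difference */
    }
    n_alloc = new_cap;
    printf("cap=%d free=%d used=%d\n", n_alloc, n_free, n_alloc - n_free);
}

int main(void)
{
    change_cap(60);     /* 80 used > 60: evict 20, then cap=60 free=0 */
    change_cap(90);     /* grow: cap=90 free=30, usage stays at 60 */
    return 0;
}
```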
1628 | 1620 | ||
1629 | static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) | 1621 | static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) |
1630 | { | 1622 | { |
1631 | struct kvm_mmu_page *sp; | 1623 | struct kvm_mmu_page *sp; |
1632 | struct hlist_node *node; | 1624 | struct hlist_node *node; |
1633 | LIST_HEAD(invalid_list); | 1625 | LIST_HEAD(invalid_list); |
1634 | int r; | 1626 | int r; |
1635 | 1627 | ||
1636 | pgprintk("%s: looking for gfn %lx\n", __func__, gfn); | 1628 | pgprintk("%s: looking for gfn %lx\n", __func__, gfn); |
1637 | r = 0; | 1629 | r = 0; |
1638 | 1630 | ||
1639 | for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) { | 1631 | for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) { |
1640 | pgprintk("%s: gfn %lx role %x\n", __func__, gfn, | 1632 | pgprintk("%s: gfn %lx role %x\n", __func__, gfn, |
1641 | sp->role.word); | 1633 | sp->role.word); |
1642 | r = 1; | 1634 | r = 1; |
1643 | kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); | 1635 | kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); |
1644 | } | 1636 | } |
1645 | kvm_mmu_commit_zap_page(kvm, &invalid_list); | 1637 | kvm_mmu_commit_zap_page(kvm, &invalid_list); |
1646 | return r; | 1638 | return r; |
1647 | } | 1639 | } |
1648 | 1640 | ||
1649 | static void mmu_unshadow(struct kvm *kvm, gfn_t gfn) | 1641 | static void mmu_unshadow(struct kvm *kvm, gfn_t gfn) |
1650 | { | 1642 | { |
1651 | struct kvm_mmu_page *sp; | 1643 | struct kvm_mmu_page *sp; |
1652 | struct hlist_node *node; | 1644 | struct hlist_node *node; |
1653 | LIST_HEAD(invalid_list); | 1645 | LIST_HEAD(invalid_list); |
1654 | 1646 | ||
1655 | for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) { | 1647 | for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) { |
1656 | pgprintk("%s: zap %lx %x\n", | 1648 | pgprintk("%s: zap %lx %x\n", |
1657 | __func__, gfn, sp->role.word); | 1649 | __func__, gfn, sp->role.word); |
1658 | kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); | 1650 | kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); |
1659 | } | 1651 | } |
1660 | kvm_mmu_commit_zap_page(kvm, &invalid_list); | 1652 | kvm_mmu_commit_zap_page(kvm, &invalid_list); |
1661 | } | 1653 | } |
1662 | 1654 | ||
1663 | static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn) | 1655 | static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn) |
1664 | { | 1656 | { |
1665 | int slot = memslot_id(kvm, gfn); | 1657 | int slot = memslot_id(kvm, gfn); |
1666 | struct kvm_mmu_page *sp = page_header(__pa(pte)); | 1658 | struct kvm_mmu_page *sp = page_header(__pa(pte)); |
1667 | 1659 | ||
1668 | __set_bit(slot, sp->slot_bitmap); | 1660 | __set_bit(slot, sp->slot_bitmap); |
1669 | } | 1661 | } |
1670 | 1662 | ||
1671 | static void mmu_convert_notrap(struct kvm_mmu_page *sp) | 1663 | static void mmu_convert_notrap(struct kvm_mmu_page *sp) |
1672 | { | 1664 | { |
1673 | int i; | 1665 | int i; |
1674 | u64 *pt = sp->spt; | 1666 | u64 *pt = sp->spt; |
1675 | 1667 | ||
1676 | if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte) | 1668 | if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte) |
1677 | return; | 1669 | return; |
1678 | 1670 | ||
1679 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { | 1671 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { |
1680 | if (pt[i] == shadow_notrap_nonpresent_pte) | 1672 | if (pt[i] == shadow_notrap_nonpresent_pte) |
1681 | __set_spte(&pt[i], shadow_trap_nonpresent_pte); | 1673 | __set_spte(&pt[i], shadow_trap_nonpresent_pte); |
1682 | } | 1674 | } |
1683 | } | 1675 | } |
1684 | 1676 | ||
1685 | /* | 1677 | /* |
1686 | * The function is based on mtrr_type_lookup() in | 1678 | * The function is based on mtrr_type_lookup() in |
1687 | * arch/x86/kernel/cpu/mtrr/generic.c | 1679 | * arch/x86/kernel/cpu/mtrr/generic.c |
1688 | */ | 1680 | */ |
1689 | static int get_mtrr_type(struct mtrr_state_type *mtrr_state, | 1681 | static int get_mtrr_type(struct mtrr_state_type *mtrr_state, |
1690 | u64 start, u64 end) | 1682 | u64 start, u64 end) |
1691 | { | 1683 | { |
1692 | int i; | 1684 | int i; |
1693 | u64 base, mask; | 1685 | u64 base, mask; |
1694 | u8 prev_match, curr_match; | 1686 | u8 prev_match, curr_match; |
1695 | int num_var_ranges = KVM_NR_VAR_MTRR; | 1687 | int num_var_ranges = KVM_NR_VAR_MTRR; |
1696 | 1688 | ||
1697 | if (!mtrr_state->enabled) | 1689 | if (!mtrr_state->enabled) |
1698 | return 0xFF; | 1690 | return 0xFF; |
1699 | 1691 | ||
1700 | /* Make end inclusive, instead of exclusive */ | 1692 | /* Make end inclusive, instead of exclusive */ |
1701 | end--; | 1693 | end--; |
1702 | 1694 | ||
1703 | /* Look in fixed ranges. Just return the type as per start */ | 1695 | /* Look in fixed ranges. Just return the type as per start */ |
1704 | if (mtrr_state->have_fixed && (start < 0x100000)) { | 1696 | if (mtrr_state->have_fixed && (start < 0x100000)) { |
1705 | int idx; | 1697 | int idx; |
1706 | 1698 | ||
1707 | if (start < 0x80000) { | 1699 | if (start < 0x80000) { |
1708 | idx = 0; | 1700 | idx = 0; |
1709 | idx += (start >> 16); | 1701 | idx += (start >> 16); |
1710 | return mtrr_state->fixed_ranges[idx]; | 1702 | return mtrr_state->fixed_ranges[idx]; |
1711 | } else if (start < 0xC0000) { | 1703 | } else if (start < 0xC0000) { |
1712 | idx = 1 * 8; | 1704 | idx = 1 * 8; |
1713 | idx += ((start - 0x80000) >> 14); | 1705 | idx += ((start - 0x80000) >> 14); |
1714 | return mtrr_state->fixed_ranges[idx]; | 1706 | return mtrr_state->fixed_ranges[idx]; |
1715 | } else if (start < 0x1000000) { | 1707 | } else if (start < 0x1000000) { |
1716 | idx = 3 * 8; | 1708 | idx = 3 * 8; |
1717 | idx += ((start - 0xC0000) >> 12); | 1709 | idx += ((start - 0xC0000) >> 12); |
1718 | return mtrr_state->fixed_ranges[idx]; | 1710 | return mtrr_state->fixed_ranges[idx]; |
1719 | } | 1711 | } |
1720 | } | 1712 | } |
1721 | 1713 | ||
1722 | /* | 1714 | /* |
1723 | * Look in variable ranges | 1715 | * Look in variable ranges |
1724 | * Look for multiple ranges matching this address and pick the type | 1716 | * Look for multiple ranges matching this address and pick the type |
1725 | * as per MTRR precedence | 1717 | * as per MTRR precedence |
1726 | */ | 1718 | */ |
1727 | if (!(mtrr_state->enabled & 2)) | 1719 | if (!(mtrr_state->enabled & 2)) |
1728 | return mtrr_state->def_type; | 1720 | return mtrr_state->def_type; |
1729 | 1721 | ||
1730 | prev_match = 0xFF; | 1722 | prev_match = 0xFF; |
1731 | for (i = 0; i < num_var_ranges; ++i) { | 1723 | for (i = 0; i < num_var_ranges; ++i) { |
1732 | unsigned short start_state, end_state; | 1724 | unsigned short start_state, end_state; |
1733 | 1725 | ||
1734 | if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11))) | 1726 | if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11))) |
1735 | continue; | 1727 | continue; |
1736 | 1728 | ||
1737 | base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) + | 1729 | base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) + |
1738 | (mtrr_state->var_ranges[i].base_lo & PAGE_MASK); | 1730 | (mtrr_state->var_ranges[i].base_lo & PAGE_MASK); |
1739 | mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) + | 1731 | mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) + |
1740 | (mtrr_state->var_ranges[i].mask_lo & PAGE_MASK); | 1732 | (mtrr_state->var_ranges[i].mask_lo & PAGE_MASK); |
1741 | 1733 | ||
1742 | start_state = ((start & mask) == (base & mask)); | 1734 | start_state = ((start & mask) == (base & mask)); |
1743 | end_state = ((end & mask) == (base & mask)); | 1735 | end_state = ((end & mask) == (base & mask)); |
1744 | if (start_state != end_state) | 1736 | if (start_state != end_state) |
1745 | return 0xFE; | 1737 | return 0xFE; |
1746 | 1738 | ||
1747 | if ((start & mask) != (base & mask)) | 1739 | if ((start & mask) != (base & mask)) |
1748 | continue; | 1740 | continue; |
1749 | 1741 | ||
1750 | curr_match = mtrr_state->var_ranges[i].base_lo & 0xff; | 1742 | curr_match = mtrr_state->var_ranges[i].base_lo & 0xff; |
1751 | if (prev_match == 0xFF) { | 1743 | if (prev_match == 0xFF) { |
1752 | prev_match = curr_match; | 1744 | prev_match = curr_match; |
1753 | continue; | 1745 | continue; |
1754 | } | 1746 | } |
1755 | 1747 | ||
1756 | if (prev_match == MTRR_TYPE_UNCACHABLE || | 1748 | if (prev_match == MTRR_TYPE_UNCACHABLE || |
1757 | curr_match == MTRR_TYPE_UNCACHABLE) | 1749 | curr_match == MTRR_TYPE_UNCACHABLE) |
1758 | return MTRR_TYPE_UNCACHABLE; | 1750 | return MTRR_TYPE_UNCACHABLE; |
1759 | 1751 | ||
1760 | if ((prev_match == MTRR_TYPE_WRBACK && | 1752 | if ((prev_match == MTRR_TYPE_WRBACK && |
1761 | curr_match == MTRR_TYPE_WRTHROUGH) || | 1753 | curr_match == MTRR_TYPE_WRTHROUGH) || |
1762 | (prev_match == MTRR_TYPE_WRTHROUGH && | 1754 | (prev_match == MTRR_TYPE_WRTHROUGH && |
1763 | curr_match == MTRR_TYPE_WRBACK)) { | 1755 | curr_match == MTRR_TYPE_WRBACK)) { |
1764 | prev_match = MTRR_TYPE_WRTHROUGH; | 1756 | prev_match = MTRR_TYPE_WRTHROUGH; |
1765 | curr_match = MTRR_TYPE_WRTHROUGH; | 1757 | curr_match = MTRR_TYPE_WRTHROUGH; |
1766 | } | 1758 | } |
1767 | 1759 | ||
1768 | if (prev_match != curr_match) | 1760 | if (prev_match != curr_match) |
1769 | return MTRR_TYPE_UNCACHABLE; | 1761 | return MTRR_TYPE_UNCACHABLE; |
1770 | } | 1762 | } |
1771 | 1763 | ||
1772 | if (prev_match != 0xFF) | 1764 | if (prev_match != 0xFF) |
1773 | return prev_match; | 1765 | return prev_match; |
1774 | 1766 | ||
1775 | return mtrr_state->def_type; | 1767 | return mtrr_state->def_type; |
1776 | } | 1768 | } |
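The fixed-range decode above mirrors the x86 MTRR layout: 8 entries of 64 KiB covering 0-512 KiB, 16 of 16 KiB covering 512-768 KiB, and 64 of 4 KiB covering 768 KiB-1 MiB, 88 entries in total, which is where the 1 * 8 and 3 * 8 offsets come from. A standalone check of that index arithmetic:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same index computation as the fixed-range branch of get_mtrr_type().
 * Layout: 8 x 64K (0-0x7ffff), 16 x 16K (0x80000-0xbffff),
 *         64 x 4K (0xc0000-0xfffff) -> 88 fixed ranges in total. */
static int fixed_idx(uint64_t start)
{
    if (start < 0x80000)
        return start >> 16;                     /* 64 KiB granules */
    if (start < 0xC0000)
        return 8 + ((start - 0x80000) >> 14);   /* 16 KiB granules */
    return 24 + ((start - 0xC0000) >> 12);      /* 4 KiB granules */
}

int main(void)
{
    assert(fixed_idx(0x00000) == 0);
    assert(fixed_idx(0x7FFFF) == 7);    /* last 64 KiB granule */
    assert(fixed_idx(0xA0000) == 16);   /* legacy VGA window */
    assert(fixed_idx(0xC8000) == 32);   /* option-ROM area */
    assert(fixed_idx(0xFF000) == 87);   /* last of the 88 entries */
    printf("fixed-range indexing checks out\n");
    return 0;
}
```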
1777 | 1769 | ||
1778 | u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn) | 1770 | u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn) |
1779 | { | 1771 | { |
1780 | u8 mtrr; | 1772 | u8 mtrr; |
1781 | 1773 | ||
1782 | mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT, | 1774 | mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT, |
1783 | (gfn << PAGE_SHIFT) + PAGE_SIZE); | 1775 | (gfn << PAGE_SHIFT) + PAGE_SIZE); |
1784 | if (mtrr == 0xfe || mtrr == 0xff) | 1776 | if (mtrr == 0xfe || mtrr == 0xff) |
1785 | mtrr = MTRR_TYPE_WRBACK; | 1777 | mtrr = MTRR_TYPE_WRBACK; |
1786 | return mtrr; | 1778 | return mtrr; |
1787 | } | 1779 | } |
1788 | EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type); | 1780 | EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type); |
1789 | 1781 | ||
1790 | static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) | 1782 | static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) |
1791 | { | 1783 | { |
1792 | trace_kvm_mmu_unsync_page(sp); | 1784 | trace_kvm_mmu_unsync_page(sp); |
1793 | ++vcpu->kvm->stat.mmu_unsync; | 1785 | ++vcpu->kvm->stat.mmu_unsync; |
1794 | sp->unsync = 1; | 1786 | sp->unsync = 1; |
1795 | 1787 | ||
1796 | kvm_mmu_mark_parents_unsync(sp); | 1788 | kvm_mmu_mark_parents_unsync(sp); |
1797 | mmu_convert_notrap(sp); | 1789 | mmu_convert_notrap(sp); |
1798 | } | 1790 | } |
1799 | 1791 | ||
1800 | static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) | 1792 | static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) |
1801 | { | 1793 | { |
1802 | struct kvm_mmu_page *s; | 1794 | struct kvm_mmu_page *s; |
1803 | struct hlist_node *node; | 1795 | struct hlist_node *node; |
1804 | 1796 | ||
1805 | for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { | 1797 | for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { |
1806 | if (s->unsync) | 1798 | if (s->unsync) |
1807 | continue; | 1799 | continue; |
1808 | WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); | 1800 | WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); |
1809 | __kvm_unsync_page(vcpu, s); | 1801 | __kvm_unsync_page(vcpu, s); |
1810 | } | 1802 | } |
1811 | } | 1803 | } |
1812 | 1804 | ||
1813 | static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, | 1805 | static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, |
1814 | bool can_unsync) | 1806 | bool can_unsync) |
1815 | { | 1807 | { |
1816 | struct kvm_mmu_page *s; | 1808 | struct kvm_mmu_page *s; |
1817 | struct hlist_node *node; | 1809 | struct hlist_node *node; |
1818 | bool need_unsync = false; | 1810 | bool need_unsync = false; |
1819 | 1811 | ||
1820 | for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { | 1812 | for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) { |
1821 | if (s->role.level != PT_PAGE_TABLE_LEVEL) | 1813 | if (s->role.level != PT_PAGE_TABLE_LEVEL) |
1822 | return 1; | 1814 | return 1; |
1823 | 1815 | ||
1824 | if (!need_unsync && !s->unsync) { | 1816 | if (!need_unsync && !s->unsync) { |
1825 | if (!can_unsync || !oos_shadow) | 1817 | if (!can_unsync || !oos_shadow) |
1826 | return 1; | 1818 | return 1; |
1827 | need_unsync = true; | 1819 | need_unsync = true; |
1828 | } | 1820 | } |
1829 | } | 1821 | } |
1830 | if (need_unsync) | 1822 | if (need_unsync) |
1831 | kvm_unsync_pages(vcpu, gfn); | 1823 | kvm_unsync_pages(vcpu, gfn); |
1832 | return 0; | 1824 | return 0; |
1833 | } | 1825 | } |
1834 | 1826 | ||
1835 | static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, | 1827 | static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, |
1836 | unsigned pte_access, int user_fault, | 1828 | unsigned pte_access, int user_fault, |
1837 | int write_fault, int dirty, int level, | 1829 | int write_fault, int dirty, int level, |
1838 | gfn_t gfn, pfn_t pfn, bool speculative, | 1830 | gfn_t gfn, pfn_t pfn, bool speculative, |
1839 | bool can_unsync, bool reset_host_protection) | 1831 | bool can_unsync, bool reset_host_protection) |
1840 | { | 1832 | { |
1841 | u64 spte; | 1833 | u64 spte; |
1842 | int ret = 0; | 1834 | int ret = 0; |
1843 | 1835 | ||
1844 | /* | 1836 | /* |
1845 | * We don't set the accessed bit, since we sometimes want to see | 1837 | * We don't set the accessed bit, since we sometimes want to see |
1846 | * whether the guest actually used the pte (in order to detect | 1838 | * whether the guest actually used the pte (in order to detect |
1847 | * demand paging). | 1839 | * demand paging). |
1848 | */ | 1840 | */ |
1849 | spte = shadow_base_present_pte | shadow_dirty_mask; | 1841 | spte = shadow_base_present_pte | shadow_dirty_mask; |
1850 | if (!speculative) | 1842 | if (!speculative) |
1851 | spte |= shadow_accessed_mask; | 1843 | spte |= shadow_accessed_mask; |
1852 | if (!dirty) | 1844 | if (!dirty) |
1853 | pte_access &= ~ACC_WRITE_MASK; | 1845 | pte_access &= ~ACC_WRITE_MASK; |
1854 | if (pte_access & ACC_EXEC_MASK) | 1846 | if (pte_access & ACC_EXEC_MASK) |
1855 | spte |= shadow_x_mask; | 1847 | spte |= shadow_x_mask; |
1856 | else | 1848 | else |
1857 | spte |= shadow_nx_mask; | 1849 | spte |= shadow_nx_mask; |
1858 | if (pte_access & ACC_USER_MASK) | 1850 | if (pte_access & ACC_USER_MASK) |
1859 | spte |= shadow_user_mask; | 1851 | spte |= shadow_user_mask; |
1860 | if (level > PT_PAGE_TABLE_LEVEL) | 1852 | if (level > PT_PAGE_TABLE_LEVEL) |
1861 | spte |= PT_PAGE_SIZE_MASK; | 1853 | spte |= PT_PAGE_SIZE_MASK; |
1862 | if (tdp_enabled) | 1854 | if (tdp_enabled) |
1863 | spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, | 1855 | spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, |
1864 | kvm_is_mmio_pfn(pfn)); | 1856 | kvm_is_mmio_pfn(pfn)); |
1865 | 1857 | ||
1866 | if (reset_host_protection) | 1858 | if (reset_host_protection) |
1867 | spte |= SPTE_HOST_WRITEABLE; | 1859 | spte |= SPTE_HOST_WRITEABLE; |
1868 | 1860 | ||
1869 | spte |= (u64)pfn << PAGE_SHIFT; | 1861 | spte |= (u64)pfn << PAGE_SHIFT; |
1870 | 1862 | ||
1871 | if ((pte_access & ACC_WRITE_MASK) | 1863 | if ((pte_access & ACC_WRITE_MASK) |
1872 | || (!tdp_enabled && write_fault && !is_write_protection(vcpu) | 1864 | || (!tdp_enabled && write_fault && !is_write_protection(vcpu) |
1873 | && !user_fault)) { | 1865 | && !user_fault)) { |
1874 | 1866 | ||
1875 | if (level > PT_PAGE_TABLE_LEVEL && | 1867 | if (level > PT_PAGE_TABLE_LEVEL && |
1876 | has_wrprotected_page(vcpu->kvm, gfn, level)) { | 1868 | has_wrprotected_page(vcpu->kvm, gfn, level)) { |
1877 | ret = 1; | 1869 | ret = 1; |
1878 | rmap_remove(vcpu->kvm, sptep); | 1870 | rmap_remove(vcpu->kvm, sptep); |
1879 | spte = shadow_trap_nonpresent_pte; | 1871 | spte = shadow_trap_nonpresent_pte; |
1880 | goto set_pte; | 1872 | goto set_pte; |
1881 | } | 1873 | } |
1882 | 1874 | ||
1883 | spte |= PT_WRITABLE_MASK; | 1875 | spte |= PT_WRITABLE_MASK; |
1884 | 1876 | ||
1885 | if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK)) | 1877 | if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK)) |
1886 | spte &= ~PT_USER_MASK; | 1878 | spte &= ~PT_USER_MASK; |
1887 | 1879 | ||
1888 | /* | 1880 | /* |
1889 | * Optimization: for pte sync, if spte was writable the hash | 1881 | * Optimization: for pte sync, if spte was writable the hash |
1890 | * lookup is unnecessary (and expensive). Write protection | 1882 | * lookup is unnecessary (and expensive). Write protection |
1891 | * is the responsibility of mmu_get_page / kvm_sync_page. | 1883 | * is the responsibility of mmu_get_page / kvm_sync_page. |
1892 | * Same reasoning can be applied to dirty page accounting. | 1884 | * Same reasoning can be applied to dirty page accounting. |
1893 | */ | 1885 | */ |
1894 | if (!can_unsync && is_writable_pte(*sptep)) | 1886 | if (!can_unsync && is_writable_pte(*sptep)) |
1895 | goto set_pte; | 1887 | goto set_pte; |
1896 | 1888 | ||
1897 | if (mmu_need_write_protect(vcpu, gfn, can_unsync)) { | 1889 | if (mmu_need_write_protect(vcpu, gfn, can_unsync)) { |
1898 | pgprintk("%s: found shadow page for %lx, marking ro\n", | 1890 | pgprintk("%s: found shadow page for %lx, marking ro\n", |
1899 | __func__, gfn); | 1891 | __func__, gfn); |
1900 | ret = 1; | 1892 | ret = 1; |
1901 | pte_access &= ~ACC_WRITE_MASK; | 1893 | pte_access &= ~ACC_WRITE_MASK; |
1902 | if (is_writable_pte(spte)) | 1894 | if (is_writable_pte(spte)) |
1903 | spte &= ~PT_WRITABLE_MASK; | 1895 | spte &= ~PT_WRITABLE_MASK; |
1904 | } | 1896 | } |
1905 | } | 1897 | } |
1906 | 1898 | ||
1907 | if (pte_access & ACC_WRITE_MASK) | 1899 | if (pte_access & ACC_WRITE_MASK) |
1908 | mark_page_dirty(vcpu->kvm, gfn); | 1900 | mark_page_dirty(vcpu->kvm, gfn); |
1909 | 1901 | ||
1910 | set_pte: | 1902 | set_pte: |
1911 | __set_spte(sptep, spte); | 1903 | __set_spte(sptep, spte); |
1912 | return ret; | 1904 | return ret; |
1913 | } | 1905 | } |
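set_spte() builds the shadow PTE as an OR of feature masks: present plus dirty as the base, accessed only for non-speculative maps (so demand usage stays observable), X or NX from the exec permission, the user bit, the page-size bit for large mappings, memtype bits under TDP, and finally the pfn shifted into the address field. A freestanding sketch of that composition; the mask values below are made up, since the kernel configures the real shadow_*_mask values at init time and they differ under EPT:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit positions only, not the architectural ones. */
#define F_PRESENT  (1ULL << 0)
#define F_WRITE    (1ULL << 1)
#define F_USER     (1ULL << 2)
#define F_ACCESSED (1ULL << 5)
#define F_DIRTY    (1ULL << 6)
#define F_LARGE    (1ULL << 7)
#define F_NX       (1ULL << 63)
#define PG_SHIFT   12

static uint64_t make_spte(uint64_t pfn, int exec, int user,
                          int large, int speculative, int writable)
{
    uint64_t spte = F_PRESENT | F_DIRTY;    /* base bits, as in set_spte() */

    if (!speculative)       /* speculative maps leave A clear so that a */
        spte |= F_ACCESSED; /* later guest access remains observable */
    if (!exec)
        spte |= F_NX;
    if (user)
        spte |= F_USER;
    if (large)
        spte |= F_LARGE;
    if (writable)
        spte |= F_WRITE;
    return spte | pfn << PG_SHIFT;
}

int main(void)
{
    printf("spte = %#llx\n", (unsigned long long)
           make_spte(0x1234, /*exec*/ 0, /*user*/ 1,
                     /*large*/ 0, /*speculative*/ 0, /*writable*/ 1));
    return 0;
}
```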
1914 | 1906 | ||
1915 | static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, | 1907 | static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, |
1916 | unsigned pt_access, unsigned pte_access, | 1908 | unsigned pt_access, unsigned pte_access, |
1917 | int user_fault, int write_fault, int dirty, | 1909 | int user_fault, int write_fault, int dirty, |
1918 | int *ptwrite, int level, gfn_t gfn, | 1910 | int *ptwrite, int level, gfn_t gfn, |
1919 | pfn_t pfn, bool speculative, | 1911 | pfn_t pfn, bool speculative, |
1920 | bool reset_host_protection) | 1912 | bool reset_host_protection) |
1921 | { | 1913 | { |
1922 | int was_rmapped = 0; | 1914 | int was_rmapped = 0; |
1923 | int was_writable = is_writable_pte(*sptep); | 1915 | int was_writable = is_writable_pte(*sptep); |
1924 | int rmap_count; | 1916 | int rmap_count; |
1925 | 1917 | ||
1926 | pgprintk("%s: spte %llx access %x write_fault %d" | 1918 | pgprintk("%s: spte %llx access %x write_fault %d" |
1927 | " user_fault %d gfn %lx\n", | 1919 | " user_fault %d gfn %lx\n", |
1928 | __func__, *sptep, pt_access, | 1920 | __func__, *sptep, pt_access, |
1929 | write_fault, user_fault, gfn); | 1921 | write_fault, user_fault, gfn); |
1930 | 1922 | ||
1931 | if (is_rmap_spte(*sptep)) { | 1923 | if (is_rmap_spte(*sptep)) { |
1932 | /* | 1924 | /* |
1933 | * If we overwrite a PTE page pointer with a 2MB PMD, unlink | 1925 | * If we overwrite a PTE page pointer with a 2MB PMD, unlink |
1934 | * the parent of the now unreachable PTE. | 1926 | * the parent of the now unreachable PTE. |
1935 | */ | 1927 | */ |
1936 | if (level > PT_PAGE_TABLE_LEVEL && | 1928 | if (level > PT_PAGE_TABLE_LEVEL && |
1937 | !is_large_pte(*sptep)) { | 1929 | !is_large_pte(*sptep)) { |
1938 | struct kvm_mmu_page *child; | 1930 | struct kvm_mmu_page *child; |
1939 | u64 pte = *sptep; | 1931 | u64 pte = *sptep; |
1940 | 1932 | ||
1941 | child = page_header(pte & PT64_BASE_ADDR_MASK); | 1933 | child = page_header(pte & PT64_BASE_ADDR_MASK); |
1942 | mmu_page_remove_parent_pte(child, sptep); | 1934 | mmu_page_remove_parent_pte(child, sptep); |
1943 | __set_spte(sptep, shadow_trap_nonpresent_pte); | 1935 | __set_spte(sptep, shadow_trap_nonpresent_pte); |
1944 | kvm_flush_remote_tlbs(vcpu->kvm); | 1936 | kvm_flush_remote_tlbs(vcpu->kvm); |
1945 | } else if (pfn != spte_to_pfn(*sptep)) { | 1937 | } else if (pfn != spte_to_pfn(*sptep)) { |
1946 | pgprintk("hfn old %lx new %lx\n", | 1938 | pgprintk("hfn old %lx new %lx\n", |
1947 | spte_to_pfn(*sptep), pfn); | 1939 | spte_to_pfn(*sptep), pfn); |
1948 | rmap_remove(vcpu->kvm, sptep); | 1940 | rmap_remove(vcpu->kvm, sptep); |
1949 | __set_spte(sptep, shadow_trap_nonpresent_pte); | 1941 | __set_spte(sptep, shadow_trap_nonpresent_pte); |
1950 | kvm_flush_remote_tlbs(vcpu->kvm); | 1942 | kvm_flush_remote_tlbs(vcpu->kvm); |
1951 | } else | 1943 | } else |
1952 | was_rmapped = 1; | 1944 | was_rmapped = 1; |
1953 | } | 1945 | } |
1954 | 1946 | ||
1955 | if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault, | 1947 | if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault, |
1956 | dirty, level, gfn, pfn, speculative, true, | 1948 | dirty, level, gfn, pfn, speculative, true, |
1957 | reset_host_protection)) { | 1949 | reset_host_protection)) { |
1958 | if (write_fault) | 1950 | if (write_fault) |
1959 | *ptwrite = 1; | 1951 | *ptwrite = 1; |
1960 | kvm_mmu_flush_tlb(vcpu); | 1952 | kvm_mmu_flush_tlb(vcpu); |
1961 | } | 1953 | } |
1962 | 1954 | ||
1963 | pgprintk("%s: setting spte %llx\n", __func__, *sptep); | 1955 | pgprintk("%s: setting spte %llx\n", __func__, *sptep); |
1964 | pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n", | 1956 | pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n", |
1965 | is_large_pte(*sptep) ? "2MB" : "4kB", | 1957 | is_large_pte(*sptep) ? "2MB" : "4kB",
1966 | *sptep & PT_PRESENT_MASK ? "RW" : "R", gfn, | 1958 | *sptep & PT_PRESENT_MASK ? "RW" : "R", gfn,
1967 | *sptep, sptep); | 1959 | *sptep, sptep); |
1968 | if (!was_rmapped && is_large_pte(*sptep)) | 1960 | if (!was_rmapped && is_large_pte(*sptep)) |
1969 | ++vcpu->kvm->stat.lpages; | 1961 | ++vcpu->kvm->stat.lpages; |
1970 | 1962 | ||
1971 | page_header_update_slot(vcpu->kvm, sptep, gfn); | 1963 | page_header_update_slot(vcpu->kvm, sptep, gfn); |
1972 | if (!was_rmapped) { | 1964 | if (!was_rmapped) { |
1973 | rmap_count = rmap_add(vcpu, sptep, gfn); | 1965 | rmap_count = rmap_add(vcpu, sptep, gfn); |
1974 | kvm_release_pfn_clean(pfn); | 1966 | kvm_release_pfn_clean(pfn); |
1975 | if (rmap_count > RMAP_RECYCLE_THRESHOLD) | 1967 | if (rmap_count > RMAP_RECYCLE_THRESHOLD) |
1976 | rmap_recycle(vcpu, sptep, gfn); | 1968 | rmap_recycle(vcpu, sptep, gfn); |
1977 | } else { | 1969 | } else { |
1978 | if (was_writable) | 1970 | if (was_writable) |
1979 | kvm_release_pfn_dirty(pfn); | 1971 | kvm_release_pfn_dirty(pfn); |
1980 | else | 1972 | else |
1981 | kvm_release_pfn_clean(pfn); | 1973 | kvm_release_pfn_clean(pfn); |
1982 | } | 1974 | } |
1983 | if (speculative) { | 1975 | if (speculative) { |
1984 | vcpu->arch.last_pte_updated = sptep; | 1976 | vcpu->arch.last_pte_updated = sptep; |
1985 | vcpu->arch.last_pte_gfn = gfn; | 1977 | vcpu->arch.last_pte_gfn = gfn; |
1986 | } | 1978 | } |
1987 | } | 1979 | } |
1988 | 1980 | ||
1989 | static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) | 1981 | static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) |
1990 | { | 1982 | { |
1991 | } | 1983 | } |
1992 | 1984 | ||
1993 | static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, | 1985 | static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, |
1994 | int level, gfn_t gfn, pfn_t pfn) | 1986 | int level, gfn_t gfn, pfn_t pfn) |
1995 | { | 1987 | { |
1996 | struct kvm_shadow_walk_iterator iterator; | 1988 | struct kvm_shadow_walk_iterator iterator; |
1997 | struct kvm_mmu_page *sp; | 1989 | struct kvm_mmu_page *sp; |
1998 | int pt_write = 0; | 1990 | int pt_write = 0; |
1999 | gfn_t pseudo_gfn; | 1991 | gfn_t pseudo_gfn; |
2000 | 1992 | ||
2001 | for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { | 1993 | for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { |
2002 | if (iterator.level == level) { | 1994 | if (iterator.level == level) { |
2003 | mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL, | 1995 | mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL, |
2004 | 0, write, 1, &pt_write, | 1996 | 0, write, 1, &pt_write, |
2005 | level, gfn, pfn, false, true); | 1997 | level, gfn, pfn, false, true); |
2006 | ++vcpu->stat.pf_fixed; | 1998 | ++vcpu->stat.pf_fixed; |
2007 | break; | 1999 | break; |
2008 | } | 2000 | } |
2009 | 2001 | ||
2010 | if (*iterator.sptep == shadow_trap_nonpresent_pte) { | 2002 | if (*iterator.sptep == shadow_trap_nonpresent_pte) { |
2011 | u64 base_addr = iterator.addr; | 2003 | u64 base_addr = iterator.addr; |
2012 | 2004 | ||
2013 | base_addr &= PT64_LVL_ADDR_MASK(iterator.level); | 2005 | base_addr &= PT64_LVL_ADDR_MASK(iterator.level); |
2014 | pseudo_gfn = base_addr >> PAGE_SHIFT; | 2006 | pseudo_gfn = base_addr >> PAGE_SHIFT; |
2015 | sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr, | 2007 | sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr, |
2016 | iterator.level - 1, | 2008 | iterator.level - 1, |
2017 | 1, ACC_ALL, iterator.sptep); | 2009 | 1, ACC_ALL, iterator.sptep); |
2018 | if (!sp) { | 2010 | if (!sp) { |
2019 | pgprintk("nonpaging_map: ENOMEM\n"); | 2011 | pgprintk("nonpaging_map: ENOMEM\n"); |
2020 | kvm_release_pfn_clean(pfn); | 2012 | kvm_release_pfn_clean(pfn); |
2021 | return -ENOMEM; | 2013 | return -ENOMEM; |
2022 | } | 2014 | } |
2023 | 2015 | ||
2024 | __set_spte(iterator.sptep, | 2016 | __set_spte(iterator.sptep, |
2025 | __pa(sp->spt) | 2017 | __pa(sp->spt) |
2026 | | PT_PRESENT_MASK | PT_WRITABLE_MASK | 2018 | | PT_PRESENT_MASK | PT_WRITABLE_MASK |
2027 | | shadow_user_mask | shadow_x_mask); | 2019 | | shadow_user_mask | shadow_x_mask); |
2028 | } | 2020 | } |
2029 | } | 2021 | } |
2030 | return pt_write; | 2022 | return pt_write; |
2031 | } | 2023 | } |
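
When an intermediate entry is nonpresent, the walk above derives a pseudo_gfn for the new shadow page by truncating the faulting address to the region that the missing table will cover. A minimal user-space sketch of that truncation, assuming 4K pages and 9-bit table indices (the kernel's PT64_LVL_ADDR_MASK macro encodes the same idea; the constants here are illustrative, not copied from the header):

/*
 * Sketch of the address truncation __direct_map performs for a missing
 * intermediate level. PAGE_SHIFT/PT64_LEVEL_BITS are assumed values.
 */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PT64_LEVEL_BITS	9

/* keep only the address bits that a table at 'level' resolves */
static uint64_t lvl_addr_mask(int level)
{
	return ~((1ULL << (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)) - 1);
}

int main(void)
{
	uint64_t addr = 0x12345678000ULL;

	/* level 2 (PDE): round down to a 2MB boundary, then shift to a gfn */
	printf("pseudo_gfn = %llx\n",
	       (unsigned long long)((addr & lvl_addr_mask(2)) >> PAGE_SHIFT));
	return 0;
}
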
2032 | 2024 | ||
2033 | static void kvm_send_hwpoison_signal(struct kvm *kvm, gfn_t gfn) | 2025 | static void kvm_send_hwpoison_signal(struct kvm *kvm, gfn_t gfn) |
2034 | { | 2026 | { |
2035 | char buf[1]; | 2027 | char buf[1]; |
2036 | void __user *hva; | 2028 | void __user *hva; |
2037 | int r; | 2029 | int r; |
2038 | 2030 | ||
2039 | /* Touch the page, so that SIGBUS is sent if it is hwpoisoned */ | 2031 | /* Touch the page, so that SIGBUS is sent if it is hwpoisoned */
2040 | hva = (void __user *)gfn_to_hva(kvm, gfn); | 2032 | hva = (void __user *)gfn_to_hva(kvm, gfn); |
2041 | r = copy_from_user(buf, hva, 1); | 2033 | r = copy_from_user(buf, hva, 1); |
2042 | } | 2034 | } |
2043 | 2035 | ||
2044 | static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn) | 2036 | static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn) |
2045 | { | 2037 | { |
2046 | kvm_release_pfn_clean(pfn); | 2038 | kvm_release_pfn_clean(pfn); |
2047 | if (is_hwpoison_pfn(pfn)) { | 2039 | if (is_hwpoison_pfn(pfn)) { |
2048 | kvm_send_hwpoison_signal(kvm, gfn); | 2040 | kvm_send_hwpoison_signal(kvm, gfn); |
2049 | return 0; | 2041 | return 0; |
2050 | } | 2042 | } |
2051 | return 1; | 2043 | return 1; |
2052 | } | 2044 | } |
2053 | 2045 | ||
2054 | static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) | 2046 | static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) |
2055 | { | 2047 | { |
2056 | int r; | 2048 | int r; |
2057 | int level; | 2049 | int level; |
2058 | pfn_t pfn; | 2050 | pfn_t pfn; |
2059 | unsigned long mmu_seq; | 2051 | unsigned long mmu_seq; |
2060 | 2052 | ||
2061 | level = mapping_level(vcpu, gfn); | 2053 | level = mapping_level(vcpu, gfn); |
2062 | 2054 | ||
2063 | /* | 2055 | /* |
2064 | * This path builds a PAE page table, so we can map 2MB pages at | 2056 | * This path builds a PAE page table, so we can map 2MB pages at
2065 | * most. Therefore, clamp the level if it is larger than that. | 2057 | * most. Therefore, clamp the level if it is larger than that.
2066 | */ | 2058 | */ |
2067 | if (level > PT_DIRECTORY_LEVEL) | 2059 | if (level > PT_DIRECTORY_LEVEL) |
2068 | level = PT_DIRECTORY_LEVEL; | 2060 | level = PT_DIRECTORY_LEVEL; |
2069 | 2061 | ||
2070 | gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); | 2062 | gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); |
2071 | 2063 | ||
2072 | mmu_seq = vcpu->kvm->mmu_notifier_seq; | 2064 | mmu_seq = vcpu->kvm->mmu_notifier_seq; |
2073 | smp_rmb(); | 2065 | smp_rmb(); |
2074 | pfn = gfn_to_pfn(vcpu->kvm, gfn); | 2066 | pfn = gfn_to_pfn(vcpu->kvm, gfn); |
2075 | 2067 | ||
2076 | /* mmio */ | 2068 | /* mmio */ |
2077 | if (is_error_pfn(pfn)) | 2069 | if (is_error_pfn(pfn)) |
2078 | return kvm_handle_bad_page(vcpu->kvm, gfn, pfn); | 2070 | return kvm_handle_bad_page(vcpu->kvm, gfn, pfn); |
2079 | 2071 | ||
2080 | spin_lock(&vcpu->kvm->mmu_lock); | 2072 | spin_lock(&vcpu->kvm->mmu_lock); |
2081 | if (mmu_notifier_retry(vcpu, mmu_seq)) | 2073 | if (mmu_notifier_retry(vcpu, mmu_seq)) |
2082 | goto out_unlock; | 2074 | goto out_unlock; |
2083 | kvm_mmu_free_some_pages(vcpu); | 2075 | kvm_mmu_free_some_pages(vcpu); |
2084 | r = __direct_map(vcpu, v, write, level, gfn, pfn); | 2076 | r = __direct_map(vcpu, v, write, level, gfn, pfn); |
2085 | spin_unlock(&vcpu->kvm->mmu_lock); | 2077 | spin_unlock(&vcpu->kvm->mmu_lock); |
2086 | 2078 | ||
2087 | 2079 | ||
2088 | return r; | 2080 | return r; |
2089 | 2081 | ||
2090 | out_unlock: | 2082 | out_unlock: |
2091 | spin_unlock(&vcpu->kvm->mmu_lock); | 2083 | spin_unlock(&vcpu->kvm->mmu_lock); |
2092 | kvm_release_pfn_clean(pfn); | 2084 | kvm_release_pfn_clean(pfn); |
2093 | return 0; | 2085 | return 0; |
2094 | } | 2086 | } |
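
nonpaging_map() above (and tdp_page_fault() further down) rely on the same lockless pattern: sample kvm->mmu_notifier_seq before the potentially sleeping gfn_to_pfn() lookup, then recheck it under mmu_lock via mmu_notifier_retry() and bail out if an invalidation ran in between. A runnable user-space analogue of the check, with a plain atomic counter standing in for the notifier sequence (the helper names are hypothetical, not kernel API):

#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong notifier_seq;	/* stands in for kvm->mmu_notifier_seq */

static int try_map(unsigned long gfn)
{
	unsigned long seq = atomic_load(&notifier_seq);
	atomic_thread_fence(memory_order_acquire);	/* the smp_rmb() */

	unsigned long pfn = gfn + 0x100;	/* pretend gfn_to_pfn(), may sleep */

	/* in the kernel this recheck happens under mmu_lock */
	if (atomic_load(&notifier_seq) != seq)
		return 0;	/* an invalidation raced with us: drop pfn, retry */

	printf("mapped gfn %#lx -> pfn %#lx\n", gfn, pfn);
	return 1;
}

int main(void)
{
	while (!try_map(0x1234))
		;	/* the out_unlock path: release the pfn and try again */
	return 0;
}
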
2095 | 2087 | ||
2096 | 2088 | ||
2097 | static void mmu_free_roots(struct kvm_vcpu *vcpu) | 2089 | static void mmu_free_roots(struct kvm_vcpu *vcpu) |
2098 | { | 2090 | { |
2099 | int i; | 2091 | int i; |
2100 | struct kvm_mmu_page *sp; | 2092 | struct kvm_mmu_page *sp; |
2101 | LIST_HEAD(invalid_list); | 2093 | LIST_HEAD(invalid_list); |
2102 | 2094 | ||
2103 | if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) | 2095 | if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) |
2104 | return; | 2096 | return; |
2105 | spin_lock(&vcpu->kvm->mmu_lock); | 2097 | spin_lock(&vcpu->kvm->mmu_lock); |
2106 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { | 2098 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { |
2107 | hpa_t root = vcpu->arch.mmu.root_hpa; | 2099 | hpa_t root = vcpu->arch.mmu.root_hpa; |
2108 | 2100 | ||
2109 | sp = page_header(root); | 2101 | sp = page_header(root); |
2110 | --sp->root_count; | 2102 | --sp->root_count; |
2111 | if (!sp->root_count && sp->role.invalid) { | 2103 | if (!sp->root_count && sp->role.invalid) { |
2112 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); | 2104 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); |
2113 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); | 2105 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); |
2114 | } | 2106 | } |
2115 | vcpu->arch.mmu.root_hpa = INVALID_PAGE; | 2107 | vcpu->arch.mmu.root_hpa = INVALID_PAGE; |
2116 | spin_unlock(&vcpu->kvm->mmu_lock); | 2108 | spin_unlock(&vcpu->kvm->mmu_lock); |
2117 | return; | 2109 | return; |
2118 | } | 2110 | } |
2119 | for (i = 0; i < 4; ++i) { | 2111 | for (i = 0; i < 4; ++i) { |
2120 | hpa_t root = vcpu->arch.mmu.pae_root[i]; | 2112 | hpa_t root = vcpu->arch.mmu.pae_root[i]; |
2121 | 2113 | ||
2122 | if (root) { | 2114 | if (root) { |
2123 | root &= PT64_BASE_ADDR_MASK; | 2115 | root &= PT64_BASE_ADDR_MASK; |
2124 | sp = page_header(root); | 2116 | sp = page_header(root); |
2125 | --sp->root_count; | 2117 | --sp->root_count; |
2126 | if (!sp->root_count && sp->role.invalid) | 2118 | if (!sp->root_count && sp->role.invalid) |
2127 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, | 2119 | kvm_mmu_prepare_zap_page(vcpu->kvm, sp, |
2128 | &invalid_list); | 2120 | &invalid_list); |
2129 | } | 2121 | } |
2130 | vcpu->arch.mmu.pae_root[i] = INVALID_PAGE; | 2122 | vcpu->arch.mmu.pae_root[i] = INVALID_PAGE; |
2131 | } | 2123 | } |
2132 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); | 2124 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); |
2133 | spin_unlock(&vcpu->kvm->mmu_lock); | 2125 | spin_unlock(&vcpu->kvm->mmu_lock); |
2134 | vcpu->arch.mmu.root_hpa = INVALID_PAGE; | 2126 | vcpu->arch.mmu.root_hpa = INVALID_PAGE; |
2135 | } | 2127 | } |
2136 | 2128 | ||
2137 | static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn) | 2129 | static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn) |
2138 | { | 2130 | { |
2139 | int ret = 0; | 2131 | int ret = 0; |
2140 | 2132 | ||
2141 | if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) { | 2133 | if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) { |
2142 | set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); | 2134 | set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); |
2143 | ret = 1; | 2135 | ret = 1; |
2144 | } | 2136 | } |
2145 | 2137 | ||
2146 | return ret; | 2138 | return ret; |
2147 | } | 2139 | } |
2148 | 2140 | ||
2149 | static int mmu_alloc_roots(struct kvm_vcpu *vcpu) | 2141 | static int mmu_alloc_roots(struct kvm_vcpu *vcpu) |
2150 | { | 2142 | { |
2151 | int i; | 2143 | int i; |
2152 | gfn_t root_gfn; | 2144 | gfn_t root_gfn; |
2153 | struct kvm_mmu_page *sp; | 2145 | struct kvm_mmu_page *sp; |
2154 | int direct = 0; | 2146 | int direct = 0; |
2155 | u64 pdptr; | 2147 | u64 pdptr; |
2156 | 2148 | ||
2157 | root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT; | 2149 | root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT; |
2158 | 2150 | ||
2159 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { | 2151 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { |
2160 | hpa_t root = vcpu->arch.mmu.root_hpa; | 2152 | hpa_t root = vcpu->arch.mmu.root_hpa; |
2161 | 2153 | ||
2162 | ASSERT(!VALID_PAGE(root)); | 2154 | ASSERT(!VALID_PAGE(root)); |
2163 | if (mmu_check_root(vcpu, root_gfn)) | 2155 | if (mmu_check_root(vcpu, root_gfn)) |
2164 | return 1; | 2156 | return 1; |
2165 | if (tdp_enabled) { | 2157 | if (tdp_enabled) { |
2166 | direct = 1; | 2158 | direct = 1; |
2167 | root_gfn = 0; | 2159 | root_gfn = 0; |
2168 | } | 2160 | } |
2169 | spin_lock(&vcpu->kvm->mmu_lock); | 2161 | spin_lock(&vcpu->kvm->mmu_lock); |
2170 | kvm_mmu_free_some_pages(vcpu); | 2162 | kvm_mmu_free_some_pages(vcpu); |
2171 | sp = kvm_mmu_get_page(vcpu, root_gfn, 0, | 2163 | sp = kvm_mmu_get_page(vcpu, root_gfn, 0, |
2172 | PT64_ROOT_LEVEL, direct, | 2164 | PT64_ROOT_LEVEL, direct, |
2173 | ACC_ALL, NULL); | 2165 | ACC_ALL, NULL); |
2174 | root = __pa(sp->spt); | 2166 | root = __pa(sp->spt); |
2175 | ++sp->root_count; | 2167 | ++sp->root_count; |
2176 | spin_unlock(&vcpu->kvm->mmu_lock); | 2168 | spin_unlock(&vcpu->kvm->mmu_lock); |
2177 | vcpu->arch.mmu.root_hpa = root; | 2169 | vcpu->arch.mmu.root_hpa = root; |
2178 | return 0; | 2170 | return 0; |
2179 | } | 2171 | } |
2180 | direct = !is_paging(vcpu); | 2172 | direct = !is_paging(vcpu); |
2181 | for (i = 0; i < 4; ++i) { | 2173 | for (i = 0; i < 4; ++i) { |
2182 | hpa_t root = vcpu->arch.mmu.pae_root[i]; | 2174 | hpa_t root = vcpu->arch.mmu.pae_root[i]; |
2183 | 2175 | ||
2184 | ASSERT(!VALID_PAGE(root)); | 2176 | ASSERT(!VALID_PAGE(root)); |
2185 | if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) { | 2177 | if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) { |
2186 | pdptr = kvm_pdptr_read(vcpu, i); | 2178 | pdptr = kvm_pdptr_read(vcpu, i); |
2187 | if (!is_present_gpte(pdptr)) { | 2179 | if (!is_present_gpte(pdptr)) { |
2188 | vcpu->arch.mmu.pae_root[i] = 0; | 2180 | vcpu->arch.mmu.pae_root[i] = 0; |
2189 | continue; | 2181 | continue; |
2190 | } | 2182 | } |
2191 | root_gfn = pdptr >> PAGE_SHIFT; | 2183 | root_gfn = pdptr >> PAGE_SHIFT; |
2192 | } else if (vcpu->arch.mmu.root_level == 0) | 2184 | } else if (vcpu->arch.mmu.root_level == 0) |
2193 | root_gfn = 0; | 2185 | root_gfn = 0; |
2194 | if (mmu_check_root(vcpu, root_gfn)) | 2186 | if (mmu_check_root(vcpu, root_gfn)) |
2195 | return 1; | 2187 | return 1; |
2196 | if (tdp_enabled) { | 2188 | if (tdp_enabled) { |
2197 | direct = 1; | 2189 | direct = 1; |
2198 | root_gfn = i << 30; | 2190 | root_gfn = i << 30; |
2199 | } | 2191 | } |
2200 | spin_lock(&vcpu->kvm->mmu_lock); | 2192 | spin_lock(&vcpu->kvm->mmu_lock); |
2201 | kvm_mmu_free_some_pages(vcpu); | 2193 | kvm_mmu_free_some_pages(vcpu); |
2202 | sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, | 2194 | sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, |
2203 | PT32_ROOT_LEVEL, direct, | 2195 | PT32_ROOT_LEVEL, direct, |
2204 | ACC_ALL, NULL); | 2196 | ACC_ALL, NULL); |
2205 | root = __pa(sp->spt); | 2197 | root = __pa(sp->spt); |
2206 | ++sp->root_count; | 2198 | ++sp->root_count; |
2207 | spin_unlock(&vcpu->kvm->mmu_lock); | 2199 | spin_unlock(&vcpu->kvm->mmu_lock); |
2208 | 2200 | ||
2209 | vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK; | 2201 | vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK; |
2210 | } | 2202 | } |
2211 | vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root); | 2203 | vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root); |
2212 | return 0; | 2204 | return 0; |
2213 | } | 2205 | } |
2214 | 2206 | ||
2215 | static void mmu_sync_roots(struct kvm_vcpu *vcpu) | 2207 | static void mmu_sync_roots(struct kvm_vcpu *vcpu) |
2216 | { | 2208 | { |
2217 | int i; | 2209 | int i; |
2218 | struct kvm_mmu_page *sp; | 2210 | struct kvm_mmu_page *sp; |
2219 | 2211 | ||
2220 | if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) | 2212 | if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) |
2221 | return; | 2213 | return; |
2222 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { | 2214 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { |
2223 | hpa_t root = vcpu->arch.mmu.root_hpa; | 2215 | hpa_t root = vcpu->arch.mmu.root_hpa; |
2224 | sp = page_header(root); | 2216 | sp = page_header(root); |
2225 | mmu_sync_children(vcpu, sp); | 2217 | mmu_sync_children(vcpu, sp); |
2226 | return; | 2218 | return; |
2227 | } | 2219 | } |
2228 | for (i = 0; i < 4; ++i) { | 2220 | for (i = 0; i < 4; ++i) { |
2229 | hpa_t root = vcpu->arch.mmu.pae_root[i]; | 2221 | hpa_t root = vcpu->arch.mmu.pae_root[i]; |
2230 | 2222 | ||
2231 | if (root && VALID_PAGE(root)) { | 2223 | if (root && VALID_PAGE(root)) { |
2232 | root &= PT64_BASE_ADDR_MASK; | 2224 | root &= PT64_BASE_ADDR_MASK; |
2233 | sp = page_header(root); | 2225 | sp = page_header(root); |
2234 | mmu_sync_children(vcpu, sp); | 2226 | mmu_sync_children(vcpu, sp); |
2235 | } | 2227 | } |
2236 | } | 2228 | } |
2237 | } | 2229 | } |
2238 | 2230 | ||
2239 | void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) | 2231 | void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) |
2240 | { | 2232 | { |
2241 | spin_lock(&vcpu->kvm->mmu_lock); | 2233 | spin_lock(&vcpu->kvm->mmu_lock); |
2242 | mmu_sync_roots(vcpu); | 2234 | mmu_sync_roots(vcpu); |
2243 | spin_unlock(&vcpu->kvm->mmu_lock); | 2235 | spin_unlock(&vcpu->kvm->mmu_lock); |
2244 | } | 2236 | } |
2245 | 2237 | ||
2246 | static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr, | 2238 | static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr, |
2247 | u32 access, u32 *error) | 2239 | u32 access, u32 *error) |
2248 | { | 2240 | { |
2249 | if (error) | 2241 | if (error) |
2250 | *error = 0; | 2242 | *error = 0; |
2251 | return vaddr; | 2243 | return vaddr; |
2252 | } | 2244 | } |
2253 | 2245 | ||
2254 | static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, | 2246 | static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, |
2255 | u32 error_code) | 2247 | u32 error_code) |
2256 | { | 2248 | { |
2257 | gfn_t gfn; | 2249 | gfn_t gfn; |
2258 | int r; | 2250 | int r; |
2259 | 2251 | ||
2260 | pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code); | 2252 | pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code); |
2261 | r = mmu_topup_memory_caches(vcpu); | 2253 | r = mmu_topup_memory_caches(vcpu); |
2262 | if (r) | 2254 | if (r) |
2263 | return r; | 2255 | return r; |
2264 | 2256 | ||
2265 | ASSERT(vcpu); | 2257 | ASSERT(vcpu); |
2266 | ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); | 2258 | ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); |
2267 | 2259 | ||
2268 | gfn = gva >> PAGE_SHIFT; | 2260 | gfn = gva >> PAGE_SHIFT; |
2269 | 2261 | ||
2270 | return nonpaging_map(vcpu, gva & PAGE_MASK, | 2262 | return nonpaging_map(vcpu, gva & PAGE_MASK, |
2271 | error_code & PFERR_WRITE_MASK, gfn); | 2263 | error_code & PFERR_WRITE_MASK, gfn); |
2272 | } | 2264 | } |
2273 | 2265 | ||
2274 | static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, | 2266 | static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, |
2275 | u32 error_code) | 2267 | u32 error_code) |
2276 | { | 2268 | { |
2277 | pfn_t pfn; | 2269 | pfn_t pfn; |
2278 | int r; | 2270 | int r; |
2279 | int level; | 2271 | int level; |
2280 | gfn_t gfn = gpa >> PAGE_SHIFT; | 2272 | gfn_t gfn = gpa >> PAGE_SHIFT; |
2281 | unsigned long mmu_seq; | 2273 | unsigned long mmu_seq; |
2282 | 2274 | ||
2283 | ASSERT(vcpu); | 2275 | ASSERT(vcpu); |
2284 | ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); | 2276 | ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); |
2285 | 2277 | ||
2286 | r = mmu_topup_memory_caches(vcpu); | 2278 | r = mmu_topup_memory_caches(vcpu); |
2287 | if (r) | 2279 | if (r) |
2288 | return r; | 2280 | return r; |
2289 | 2281 | ||
2290 | level = mapping_level(vcpu, gfn); | 2282 | level = mapping_level(vcpu, gfn); |
2291 | 2283 | ||
2292 | gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); | 2284 | gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); |
2293 | 2285 | ||
2294 | mmu_seq = vcpu->kvm->mmu_notifier_seq; | 2286 | mmu_seq = vcpu->kvm->mmu_notifier_seq; |
2295 | smp_rmb(); | 2287 | smp_rmb(); |
2296 | pfn = gfn_to_pfn(vcpu->kvm, gfn); | 2288 | pfn = gfn_to_pfn(vcpu->kvm, gfn); |
2297 | if (is_error_pfn(pfn)) | 2289 | if (is_error_pfn(pfn)) |
2298 | return kvm_handle_bad_page(vcpu->kvm, gfn, pfn); | 2290 | return kvm_handle_bad_page(vcpu->kvm, gfn, pfn); |
2299 | spin_lock(&vcpu->kvm->mmu_lock); | 2291 | spin_lock(&vcpu->kvm->mmu_lock); |
2300 | if (mmu_notifier_retry(vcpu, mmu_seq)) | 2292 | if (mmu_notifier_retry(vcpu, mmu_seq)) |
2301 | goto out_unlock; | 2293 | goto out_unlock; |
2302 | kvm_mmu_free_some_pages(vcpu); | 2294 | kvm_mmu_free_some_pages(vcpu); |
2303 | r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, | 2295 | r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, |
2304 | level, gfn, pfn); | 2296 | level, gfn, pfn); |
2305 | spin_unlock(&vcpu->kvm->mmu_lock); | 2297 | spin_unlock(&vcpu->kvm->mmu_lock); |
2306 | 2298 | ||
2307 | return r; | 2299 | return r; |
2308 | 2300 | ||
2309 | out_unlock: | 2301 | out_unlock: |
2310 | spin_unlock(&vcpu->kvm->mmu_lock); | 2302 | spin_unlock(&vcpu->kvm->mmu_lock); |
2311 | kvm_release_pfn_clean(pfn); | 2303 | kvm_release_pfn_clean(pfn); |
2312 | return 0; | 2304 | return 0; |
2313 | } | 2305 | } |
2314 | 2306 | ||
2315 | static void nonpaging_free(struct kvm_vcpu *vcpu) | 2307 | static void nonpaging_free(struct kvm_vcpu *vcpu) |
2316 | { | 2308 | { |
2317 | mmu_free_roots(vcpu); | 2309 | mmu_free_roots(vcpu); |
2318 | } | 2310 | } |
2319 | 2311 | ||
2320 | static int nonpaging_init_context(struct kvm_vcpu *vcpu) | 2312 | static int nonpaging_init_context(struct kvm_vcpu *vcpu) |
2321 | { | 2313 | { |
2322 | struct kvm_mmu *context = &vcpu->arch.mmu; | 2314 | struct kvm_mmu *context = &vcpu->arch.mmu; |
2323 | 2315 | ||
2324 | context->new_cr3 = nonpaging_new_cr3; | 2316 | context->new_cr3 = nonpaging_new_cr3; |
2325 | context->page_fault = nonpaging_page_fault; | 2317 | context->page_fault = nonpaging_page_fault; |
2326 | context->gva_to_gpa = nonpaging_gva_to_gpa; | 2318 | context->gva_to_gpa = nonpaging_gva_to_gpa; |
2327 | context->free = nonpaging_free; | 2319 | context->free = nonpaging_free; |
2328 | context->prefetch_page = nonpaging_prefetch_page; | 2320 | context->prefetch_page = nonpaging_prefetch_page; |
2329 | context->sync_page = nonpaging_sync_page; | 2321 | context->sync_page = nonpaging_sync_page; |
2330 | context->invlpg = nonpaging_invlpg; | 2322 | context->invlpg = nonpaging_invlpg; |
2331 | context->root_level = 0; | 2323 | context->root_level = 0; |
2332 | context->shadow_root_level = PT32E_ROOT_LEVEL; | 2324 | context->shadow_root_level = PT32E_ROOT_LEVEL; |
2333 | context->root_hpa = INVALID_PAGE; | 2325 | context->root_hpa = INVALID_PAGE; |
2334 | return 0; | 2326 | return 0; |
2335 | } | 2327 | } |
2336 | 2328 | ||
2337 | void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu) | 2329 | void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu) |
2338 | { | 2330 | { |
2339 | ++vcpu->stat.tlb_flush; | 2331 | ++vcpu->stat.tlb_flush; |
2340 | set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests); | 2332 | set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests); |
2341 | } | 2333 | } |
2342 | 2334 | ||
2343 | static void paging_new_cr3(struct kvm_vcpu *vcpu) | 2335 | static void paging_new_cr3(struct kvm_vcpu *vcpu) |
2344 | { | 2336 | { |
2345 | pgprintk("%s: cr3 %lx\n", __func__, vcpu->arch.cr3); | 2337 | pgprintk("%s: cr3 %lx\n", __func__, vcpu->arch.cr3); |
2346 | mmu_free_roots(vcpu); | 2338 | mmu_free_roots(vcpu); |
2347 | } | 2339 | } |
2348 | 2340 | ||
2349 | static void inject_page_fault(struct kvm_vcpu *vcpu, | 2341 | static void inject_page_fault(struct kvm_vcpu *vcpu, |
2350 | u64 addr, | 2342 | u64 addr, |
2351 | u32 err_code) | 2343 | u32 err_code) |
2352 | { | 2344 | { |
2353 | kvm_inject_page_fault(vcpu, addr, err_code); | 2345 | kvm_inject_page_fault(vcpu, addr, err_code); |
2354 | } | 2346 | } |
2355 | 2347 | ||
2356 | static void paging_free(struct kvm_vcpu *vcpu) | 2348 | static void paging_free(struct kvm_vcpu *vcpu) |
2357 | { | 2349 | { |
2358 | nonpaging_free(vcpu); | 2350 | nonpaging_free(vcpu); |
2359 | } | 2351 | } |
2360 | 2352 | ||
2361 | static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level) | 2353 | static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level) |
2362 | { | 2354 | { |
2363 | int bit7; | 2355 | int bit7; |
2364 | 2356 | ||
2365 | bit7 = (gpte >> 7) & 1; | 2357 | bit7 = (gpte >> 7) & 1; |
2366 | return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0; | 2358 | return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0; |
2367 | } | 2359 | } |
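
The [0][...] versus [1][...] rows of rsvd_bits_mask are selected by the gpte's bit 7, which is the PS (large page) bit in directory entries; reset_rsvds_bits_mask() below fills the table using rsvd_bits(). That helper is defined elsewhere in KVM; assuming it builds an inclusive bit-range mask, a stand-alone sketch:

#include <stdio.h>
#include <stdint.h>

/* inclusive bit-range mask, as rsvd_bits(s, e) is used below */
static uint64_t rsvd_bits(int s, int e)
{
	return (((uint64_t)1 << (e - s + 1)) - 1) << s;
}

int main(void)
{
	/* 32-bit PSE 4MB page: bits 13..21 of the PDE are reserved */
	printf("rsvd_bits(13, 21) = %#llx\n",
	       (unsigned long long)rsvd_bits(13, 21));
	return 0;
}
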
2368 | 2360 | ||
2369 | #define PTTYPE 64 | 2361 | #define PTTYPE 64 |
2370 | #include "paging_tmpl.h" | 2362 | #include "paging_tmpl.h" |
2371 | #undef PTTYPE | 2363 | #undef PTTYPE |
2372 | 2364 | ||
2373 | #define PTTYPE 32 | 2365 | #define PTTYPE 32 |
2374 | #include "paging_tmpl.h" | 2366 | #include "paging_tmpl.h" |
2375 | #undef PTTYPE | 2367 | #undef PTTYPE |
2376 | 2368 | ||
2377 | static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level) | 2369 | static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level) |
2378 | { | 2370 | { |
2379 | struct kvm_mmu *context = &vcpu->arch.mmu; | 2371 | struct kvm_mmu *context = &vcpu->arch.mmu; |
2380 | int maxphyaddr = cpuid_maxphyaddr(vcpu); | 2372 | int maxphyaddr = cpuid_maxphyaddr(vcpu); |
2381 | u64 exb_bit_rsvd = 0; | 2373 | u64 exb_bit_rsvd = 0; |
2382 | 2374 | ||
2383 | if (!is_nx(vcpu)) | 2375 | if (!is_nx(vcpu)) |
2384 | exb_bit_rsvd = rsvd_bits(63, 63); | 2376 | exb_bit_rsvd = rsvd_bits(63, 63); |
2385 | switch (level) { | 2377 | switch (level) { |
2386 | case PT32_ROOT_LEVEL: | 2378 | case PT32_ROOT_LEVEL: |
2387 | /* no reserved bits for 2-level 4K page table entries */ | 2379 | /* no reserved bits for 2-level 4K page table entries */
2388 | context->rsvd_bits_mask[0][1] = 0; | 2380 | context->rsvd_bits_mask[0][1] = 0; |
2389 | context->rsvd_bits_mask[0][0] = 0; | 2381 | context->rsvd_bits_mask[0][0] = 0; |
2390 | context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; | 2382 | context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; |
2391 | 2383 | ||
2392 | if (!is_pse(vcpu)) { | 2384 | if (!is_pse(vcpu)) { |
2393 | context->rsvd_bits_mask[1][1] = 0; | 2385 | context->rsvd_bits_mask[1][1] = 0; |
2394 | break; | 2386 | break; |
2395 | } | 2387 | } |
2396 | 2388 | ||
2397 | if (is_cpuid_PSE36()) | 2389 | if (is_cpuid_PSE36()) |
2398 | /* 36-bit PSE 4MB page */ | 2390 | /* 36-bit PSE 4MB page */
2399 | context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21); | 2391 | context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21); |
2400 | else | 2392 | else |
2401 | /* 32-bit PSE 4MB page */ | 2393 | /* 32-bit PSE 4MB page */
2402 | context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21); | 2394 | context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21); |
2403 | break; | 2395 | break; |
2404 | case PT32E_ROOT_LEVEL: | 2396 | case PT32E_ROOT_LEVEL: |
2405 | context->rsvd_bits_mask[0][2] = | 2397 | context->rsvd_bits_mask[0][2] = |
2406 | rsvd_bits(maxphyaddr, 63) | | 2398 | rsvd_bits(maxphyaddr, 63) | |
2407 | rsvd_bits(7, 8) | rsvd_bits(1, 2); /* PDPTE */ | 2399 | rsvd_bits(7, 8) | rsvd_bits(1, 2); /* PDPTE */ |
2408 | context->rsvd_bits_mask[0][1] = exb_bit_rsvd | | 2400 | context->rsvd_bits_mask[0][1] = exb_bit_rsvd | |
2409 | rsvd_bits(maxphyaddr, 62); /* PDE */ | 2401 | rsvd_bits(maxphyaddr, 62); /* PDE */ |
2410 | context->rsvd_bits_mask[0][0] = exb_bit_rsvd | | 2402 | context->rsvd_bits_mask[0][0] = exb_bit_rsvd | |
2411 | rsvd_bits(maxphyaddr, 62); /* PTE */ | 2403 | rsvd_bits(maxphyaddr, 62); /* PTE */ |
2412 | context->rsvd_bits_mask[1][1] = exb_bit_rsvd | | 2404 | context->rsvd_bits_mask[1][1] = exb_bit_rsvd | |
2413 | rsvd_bits(maxphyaddr, 62) | | 2405 | rsvd_bits(maxphyaddr, 62) | |
2414 | rsvd_bits(13, 20); /* large page */ | 2406 | rsvd_bits(13, 20); /* large page */ |
2415 | context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; | 2407 | context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; |
2416 | break; | 2408 | break; |
2417 | case PT64_ROOT_LEVEL: | 2409 | case PT64_ROOT_LEVEL: |
2418 | context->rsvd_bits_mask[0][3] = exb_bit_rsvd | | 2410 | context->rsvd_bits_mask[0][3] = exb_bit_rsvd | |
2419 | rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8); | 2411 | rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8); |
2420 | context->rsvd_bits_mask[0][2] = exb_bit_rsvd | | 2412 | context->rsvd_bits_mask[0][2] = exb_bit_rsvd | |
2421 | rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8); | 2413 | rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8); |
2422 | context->rsvd_bits_mask[0][1] = exb_bit_rsvd | | 2414 | context->rsvd_bits_mask[0][1] = exb_bit_rsvd | |
2423 | rsvd_bits(maxphyaddr, 51); | 2415 | rsvd_bits(maxphyaddr, 51); |
2424 | context->rsvd_bits_mask[0][0] = exb_bit_rsvd | | 2416 | context->rsvd_bits_mask[0][0] = exb_bit_rsvd | |
2425 | rsvd_bits(maxphyaddr, 51); | 2417 | rsvd_bits(maxphyaddr, 51); |
2426 | context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3]; | 2418 | context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3]; |
2427 | context->rsvd_bits_mask[1][2] = exb_bit_rsvd | | 2419 | context->rsvd_bits_mask[1][2] = exb_bit_rsvd | |
2428 | rsvd_bits(maxphyaddr, 51) | | 2420 | rsvd_bits(maxphyaddr, 51) | |
2429 | rsvd_bits(13, 29); | 2421 | rsvd_bits(13, 29); |
2430 | context->rsvd_bits_mask[1][1] = exb_bit_rsvd | | 2422 | context->rsvd_bits_mask[1][1] = exb_bit_rsvd | |
2431 | rsvd_bits(maxphyaddr, 51) | | 2423 | rsvd_bits(maxphyaddr, 51) | |
2432 | rsvd_bits(13, 20); /* large page */ | 2424 | rsvd_bits(13, 20); /* large page */ |
2433 | context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; | 2425 | context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0]; |
2434 | break; | 2426 | break; |
2435 | } | 2427 | } |
2436 | } | 2428 | } |
2437 | 2429 | ||
2438 | static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level) | 2430 | static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level) |
2439 | { | 2431 | { |
2440 | struct kvm_mmu *context = &vcpu->arch.mmu; | 2432 | struct kvm_mmu *context = &vcpu->arch.mmu; |
2441 | 2433 | ||
2442 | ASSERT(is_pae(vcpu)); | 2434 | ASSERT(is_pae(vcpu)); |
2443 | context->new_cr3 = paging_new_cr3; | 2435 | context->new_cr3 = paging_new_cr3; |
2444 | context->page_fault = paging64_page_fault; | 2436 | context->page_fault = paging64_page_fault; |
2445 | context->gva_to_gpa = paging64_gva_to_gpa; | 2437 | context->gva_to_gpa = paging64_gva_to_gpa; |
2446 | context->prefetch_page = paging64_prefetch_page; | 2438 | context->prefetch_page = paging64_prefetch_page; |
2447 | context->sync_page = paging64_sync_page; | 2439 | context->sync_page = paging64_sync_page; |
2448 | context->invlpg = paging64_invlpg; | 2440 | context->invlpg = paging64_invlpg; |
2449 | context->free = paging_free; | 2441 | context->free = paging_free; |
2450 | context->root_level = level; | 2442 | context->root_level = level; |
2451 | context->shadow_root_level = level; | 2443 | context->shadow_root_level = level; |
2452 | context->root_hpa = INVALID_PAGE; | 2444 | context->root_hpa = INVALID_PAGE; |
2453 | return 0; | 2445 | return 0; |
2454 | } | 2446 | } |
2455 | 2447 | ||
2456 | static int paging64_init_context(struct kvm_vcpu *vcpu) | 2448 | static int paging64_init_context(struct kvm_vcpu *vcpu) |
2457 | { | 2449 | { |
2458 | reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL); | 2450 | reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL); |
2459 | return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL); | 2451 | return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL); |
2460 | } | 2452 | } |
2461 | 2453 | ||
2462 | static int paging32_init_context(struct kvm_vcpu *vcpu) | 2454 | static int paging32_init_context(struct kvm_vcpu *vcpu) |
2463 | { | 2455 | { |
2464 | struct kvm_mmu *context = &vcpu->arch.mmu; | 2456 | struct kvm_mmu *context = &vcpu->arch.mmu; |
2465 | 2457 | ||
2466 | reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL); | 2458 | reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL); |
2467 | context->new_cr3 = paging_new_cr3; | 2459 | context->new_cr3 = paging_new_cr3; |
2468 | context->page_fault = paging32_page_fault; | 2460 | context->page_fault = paging32_page_fault; |
2469 | context->gva_to_gpa = paging32_gva_to_gpa; | 2461 | context->gva_to_gpa = paging32_gva_to_gpa; |
2470 | context->free = paging_free; | 2462 | context->free = paging_free; |
2471 | context->prefetch_page = paging32_prefetch_page; | 2463 | context->prefetch_page = paging32_prefetch_page; |
2472 | context->sync_page = paging32_sync_page; | 2464 | context->sync_page = paging32_sync_page; |
2473 | context->invlpg = paging32_invlpg; | 2465 | context->invlpg = paging32_invlpg; |
2474 | context->root_level = PT32_ROOT_LEVEL; | 2466 | context->root_level = PT32_ROOT_LEVEL; |
2475 | context->shadow_root_level = PT32E_ROOT_LEVEL; | 2467 | context->shadow_root_level = PT32E_ROOT_LEVEL; |
2476 | context->root_hpa = INVALID_PAGE; | 2468 | context->root_hpa = INVALID_PAGE; |
2477 | return 0; | 2469 | return 0; |
2478 | } | 2470 | } |
2479 | 2471 | ||
2480 | static int paging32E_init_context(struct kvm_vcpu *vcpu) | 2472 | static int paging32E_init_context(struct kvm_vcpu *vcpu) |
2481 | { | 2473 | { |
2482 | reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL); | 2474 | reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL); |
2483 | return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL); | 2475 | return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL); |
2484 | } | 2476 | } |
2485 | 2477 | ||
2486 | static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) | 2478 | static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) |
2487 | { | 2479 | { |
2488 | struct kvm_mmu *context = &vcpu->arch.mmu; | 2480 | struct kvm_mmu *context = &vcpu->arch.mmu; |
2489 | 2481 | ||
2490 | context->new_cr3 = nonpaging_new_cr3; | 2482 | context->new_cr3 = nonpaging_new_cr3; |
2491 | context->page_fault = tdp_page_fault; | 2483 | context->page_fault = tdp_page_fault; |
2492 | context->free = nonpaging_free; | 2484 | context->free = nonpaging_free; |
2493 | context->prefetch_page = nonpaging_prefetch_page; | 2485 | context->prefetch_page = nonpaging_prefetch_page; |
2494 | context->sync_page = nonpaging_sync_page; | 2486 | context->sync_page = nonpaging_sync_page; |
2495 | context->invlpg = nonpaging_invlpg; | 2487 | context->invlpg = nonpaging_invlpg; |
2496 | context->shadow_root_level = kvm_x86_ops->get_tdp_level(); | 2488 | context->shadow_root_level = kvm_x86_ops->get_tdp_level(); |
2497 | context->root_hpa = INVALID_PAGE; | 2489 | context->root_hpa = INVALID_PAGE; |
2498 | 2490 | ||
2499 | if (!is_paging(vcpu)) { | 2491 | if (!is_paging(vcpu)) { |
2500 | context->gva_to_gpa = nonpaging_gva_to_gpa; | 2492 | context->gva_to_gpa = nonpaging_gva_to_gpa; |
2501 | context->root_level = 0; | 2493 | context->root_level = 0; |
2502 | } else if (is_long_mode(vcpu)) { | 2494 | } else if (is_long_mode(vcpu)) { |
2503 | reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL); | 2495 | reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL); |
2504 | context->gva_to_gpa = paging64_gva_to_gpa; | 2496 | context->gva_to_gpa = paging64_gva_to_gpa; |
2505 | context->root_level = PT64_ROOT_LEVEL; | 2497 | context->root_level = PT64_ROOT_LEVEL; |
2506 | } else if (is_pae(vcpu)) { | 2498 | } else if (is_pae(vcpu)) { |
2507 | reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL); | 2499 | reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL); |
2508 | context->gva_to_gpa = paging64_gva_to_gpa; | 2500 | context->gva_to_gpa = paging64_gva_to_gpa; |
2509 | context->root_level = PT32E_ROOT_LEVEL; | 2501 | context->root_level = PT32E_ROOT_LEVEL; |
2510 | } else { | 2502 | } else { |
2511 | reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL); | 2503 | reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL); |
2512 | context->gva_to_gpa = paging32_gva_to_gpa; | 2504 | context->gva_to_gpa = paging32_gva_to_gpa; |
2513 | context->root_level = PT32_ROOT_LEVEL; | 2505 | context->root_level = PT32_ROOT_LEVEL; |
2514 | } | 2506 | } |
2515 | 2507 | ||
2516 | return 0; | 2508 | return 0; |
2517 | } | 2509 | } |
2518 | 2510 | ||
2519 | static int init_kvm_softmmu(struct kvm_vcpu *vcpu) | 2511 | static int init_kvm_softmmu(struct kvm_vcpu *vcpu) |
2520 | { | 2512 | { |
2521 | int r; | 2513 | int r; |
2522 | 2514 | ||
2523 | ASSERT(vcpu); | 2515 | ASSERT(vcpu); |
2524 | ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); | 2516 | ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); |
2525 | 2517 | ||
2526 | if (!is_paging(vcpu)) | 2518 | if (!is_paging(vcpu)) |
2527 | r = nonpaging_init_context(vcpu); | 2519 | r = nonpaging_init_context(vcpu); |
2528 | else if (is_long_mode(vcpu)) | 2520 | else if (is_long_mode(vcpu)) |
2529 | r = paging64_init_context(vcpu); | 2521 | r = paging64_init_context(vcpu); |
2530 | else if (is_pae(vcpu)) | 2522 | else if (is_pae(vcpu)) |
2531 | r = paging32E_init_context(vcpu); | 2523 | r = paging32E_init_context(vcpu); |
2532 | else | 2524 | else |
2533 | r = paging32_init_context(vcpu); | 2525 | r = paging32_init_context(vcpu); |
2534 | 2526 | ||
2535 | vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu); | 2527 | vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu); |
2536 | vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu); | 2528 | vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu); |
2537 | 2529 | ||
2538 | return r; | 2530 | return r; |
2539 | } | 2531 | } |
2540 | 2532 | ||
2541 | static int init_kvm_mmu(struct kvm_vcpu *vcpu) | 2533 | static int init_kvm_mmu(struct kvm_vcpu *vcpu) |
2542 | { | 2534 | { |
2543 | vcpu->arch.update_pte.pfn = bad_pfn; | 2535 | vcpu->arch.update_pte.pfn = bad_pfn; |
2544 | 2536 | ||
2545 | if (tdp_enabled) | 2537 | if (tdp_enabled) |
2546 | return init_kvm_tdp_mmu(vcpu); | 2538 | return init_kvm_tdp_mmu(vcpu); |
2547 | else | 2539 | else |
2548 | return init_kvm_softmmu(vcpu); | 2540 | return init_kvm_softmmu(vcpu); |
2549 | } | 2541 | } |
2550 | 2542 | ||
2551 | static void destroy_kvm_mmu(struct kvm_vcpu *vcpu) | 2543 | static void destroy_kvm_mmu(struct kvm_vcpu *vcpu) |
2552 | { | 2544 | { |
2553 | ASSERT(vcpu); | 2545 | ASSERT(vcpu); |
2554 | if (VALID_PAGE(vcpu->arch.mmu.root_hpa)) | 2546 | if (VALID_PAGE(vcpu->arch.mmu.root_hpa)) |
2555 | /* mmu.free() should set root_hpa = INVALID_PAGE */ | 2547 | /* mmu.free() should set root_hpa = INVALID_PAGE */ |
2556 | vcpu->arch.mmu.free(vcpu); | 2548 | vcpu->arch.mmu.free(vcpu); |
2557 | } | 2549 | } |
2558 | 2550 | ||
2559 | int kvm_mmu_reset_context(struct kvm_vcpu *vcpu) | 2551 | int kvm_mmu_reset_context(struct kvm_vcpu *vcpu) |
2560 | { | 2552 | { |
2561 | destroy_kvm_mmu(vcpu); | 2553 | destroy_kvm_mmu(vcpu); |
2562 | return init_kvm_mmu(vcpu); | 2554 | return init_kvm_mmu(vcpu); |
2563 | } | 2555 | } |
2564 | EXPORT_SYMBOL_GPL(kvm_mmu_reset_context); | 2556 | EXPORT_SYMBOL_GPL(kvm_mmu_reset_context); |
2565 | 2557 | ||
2566 | int kvm_mmu_load(struct kvm_vcpu *vcpu) | 2558 | int kvm_mmu_load(struct kvm_vcpu *vcpu) |
2567 | { | 2559 | { |
2568 | int r; | 2560 | int r; |
2569 | 2561 | ||
2570 | r = mmu_topup_memory_caches(vcpu); | 2562 | r = mmu_topup_memory_caches(vcpu); |
2571 | if (r) | 2563 | if (r) |
2572 | goto out; | 2564 | goto out; |
2573 | r = mmu_alloc_roots(vcpu); | 2565 | r = mmu_alloc_roots(vcpu); |
2574 | spin_lock(&vcpu->kvm->mmu_lock); | 2566 | spin_lock(&vcpu->kvm->mmu_lock); |
2575 | mmu_sync_roots(vcpu); | 2567 | mmu_sync_roots(vcpu); |
2576 | spin_unlock(&vcpu->kvm->mmu_lock); | 2568 | spin_unlock(&vcpu->kvm->mmu_lock); |
2577 | if (r) | 2569 | if (r) |
2578 | goto out; | 2570 | goto out; |
2579 | /* set_cr3() should ensure TLB has been flushed */ | 2571 | /* set_cr3() should ensure TLB has been flushed */ |
2580 | kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa); | 2572 | kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa); |
2581 | out: | 2573 | out: |
2582 | return r; | 2574 | return r; |
2583 | } | 2575 | } |
2584 | EXPORT_SYMBOL_GPL(kvm_mmu_load); | 2576 | EXPORT_SYMBOL_GPL(kvm_mmu_load); |
2585 | 2577 | ||
2586 | void kvm_mmu_unload(struct kvm_vcpu *vcpu) | 2578 | void kvm_mmu_unload(struct kvm_vcpu *vcpu) |
2587 | { | 2579 | { |
2588 | mmu_free_roots(vcpu); | 2580 | mmu_free_roots(vcpu); |
2589 | } | 2581 | } |
2590 | 2582 | ||
2591 | static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu, | 2583 | static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu, |
2592 | struct kvm_mmu_page *sp, | 2584 | struct kvm_mmu_page *sp, |
2593 | u64 *spte) | 2585 | u64 *spte) |
2594 | { | 2586 | { |
2595 | u64 pte; | 2587 | u64 pte; |
2596 | struct kvm_mmu_page *child; | 2588 | struct kvm_mmu_page *child; |
2597 | 2589 | ||
2598 | pte = *spte; | 2590 | pte = *spte; |
2599 | if (is_shadow_present_pte(pte)) { | 2591 | if (is_shadow_present_pte(pte)) { |
2600 | if (is_last_spte(pte, sp->role.level)) | 2592 | if (is_last_spte(pte, sp->role.level)) |
2601 | rmap_remove(vcpu->kvm, spte); | 2593 | rmap_remove(vcpu->kvm, spte); |
2602 | else { | 2594 | else { |
2603 | child = page_header(pte & PT64_BASE_ADDR_MASK); | 2595 | child = page_header(pte & PT64_BASE_ADDR_MASK); |
2604 | mmu_page_remove_parent_pte(child, spte); | 2596 | mmu_page_remove_parent_pte(child, spte); |
2605 | } | 2597 | } |
2606 | } | 2598 | } |
2607 | __set_spte(spte, shadow_trap_nonpresent_pte); | 2599 | __set_spte(spte, shadow_trap_nonpresent_pte); |
2608 | if (is_large_pte(pte)) | 2600 | if (is_large_pte(pte)) |
2609 | --vcpu->kvm->stat.lpages; | 2601 | --vcpu->kvm->stat.lpages; |
2610 | } | 2602 | } |
2611 | 2603 | ||
2612 | static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu, | 2604 | static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu, |
2613 | struct kvm_mmu_page *sp, | 2605 | struct kvm_mmu_page *sp, |
2614 | u64 *spte, | 2606 | u64 *spte, |
2615 | const void *new) | 2607 | const void *new) |
2616 | { | 2608 | { |
2617 | if (sp->role.level != PT_PAGE_TABLE_LEVEL) { | 2609 | if (sp->role.level != PT_PAGE_TABLE_LEVEL) { |
2618 | ++vcpu->kvm->stat.mmu_pde_zapped; | 2610 | ++vcpu->kvm->stat.mmu_pde_zapped; |
2619 | return; | 2611 | return; |
2620 | } | 2612 | } |
2621 | 2613 | ||
2622 | ++vcpu->kvm->stat.mmu_pte_updated; | 2614 | ++vcpu->kvm->stat.mmu_pte_updated; |
2623 | if (!sp->role.cr4_pae) | 2615 | if (!sp->role.cr4_pae) |
2624 | paging32_update_pte(vcpu, sp, spte, new); | 2616 | paging32_update_pte(vcpu, sp, spte, new); |
2625 | else | 2617 | else |
2626 | paging64_update_pte(vcpu, sp, spte, new); | 2618 | paging64_update_pte(vcpu, sp, spte, new); |
2627 | } | 2619 | } |
2628 | 2620 | ||
2629 | static bool need_remote_flush(u64 old, u64 new) | 2621 | static bool need_remote_flush(u64 old, u64 new) |
2630 | { | 2622 | { |
2631 | if (!is_shadow_present_pte(old)) | 2623 | if (!is_shadow_present_pte(old)) |
2632 | return false; | 2624 | return false; |
2633 | if (!is_shadow_present_pte(new)) | 2625 | if (!is_shadow_present_pte(new)) |
2634 | return true; | 2626 | return true; |
2635 | if ((old ^ new) & PT64_BASE_ADDR_MASK) | 2627 | if ((old ^ new) & PT64_BASE_ADDR_MASK) |
2636 | return true; | 2628 | return true; |
2637 | old ^= PT64_NX_MASK; | 2629 | old ^= PT64_NX_MASK; |
2638 | new ^= PT64_NX_MASK; | 2630 | new ^= PT64_NX_MASK; |
2639 | return (old & ~new & PT64_PERM_MASK) != 0; | 2631 | return (old & ~new & PT64_PERM_MASK) != 0; |
2640 | } | 2632 | } |
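
need_remote_flush() uses a small trick: NX is the one permission bit where "set" means less access, so flipping it in both values makes every bit in PT64_PERM_MASK mean "allowed", and old & ~new then catches exactly the permissions being revoked (only revocations require flushing other CPUs' TLBs). A self-contained demo with an assumed bit layout (the real masks live in the KVM headers):

#include <stdio.h>
#include <stdint.h>

/* assumed bit layout for illustration: P=0, W=1, U=2, NX=63 */
#define P  (1ULL << 0)
#define W  (1ULL << 1)
#define U  (1ULL << 2)
#define NX (1ULL << 63)
#define PERM_MASK (P | W | U | NX)

static int perm_reduced(uint64_t old, uint64_t new)
{
	/* flip NX so that every bit in PERM_MASK means "allowed" */
	old ^= NX;
	new ^= NX;
	return (old & ~new & PERM_MASK) != 0;
}

int main(void)
{
	printf("%d\n", perm_reduced(P | W, P));		/* 1: write revoked */
	printf("%d\n", perm_reduced(P, P | W));		/* 0: only gained */
	printf("%d\n", perm_reduced(P, P | NX));	/* 1: exec revoked */
	return 0;
}
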
2641 | 2633 | ||
2642 | static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page, | 2634 | static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page, |
2643 | bool remote_flush, bool local_flush) | 2635 | bool remote_flush, bool local_flush) |
2644 | { | 2636 | { |
2645 | if (zap_page) | 2637 | if (zap_page) |
2646 | return; | 2638 | return; |
2647 | 2639 | ||
2648 | if (remote_flush) | 2640 | if (remote_flush) |
2649 | kvm_flush_remote_tlbs(vcpu->kvm); | 2641 | kvm_flush_remote_tlbs(vcpu->kvm); |
2650 | else if (local_flush) | 2642 | else if (local_flush) |
2651 | kvm_mmu_flush_tlb(vcpu); | 2643 | kvm_mmu_flush_tlb(vcpu); |
2652 | } | 2644 | } |
2653 | 2645 | ||
2654 | static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu) | 2646 | static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu) |
2655 | { | 2647 | { |
2656 | u64 *spte = vcpu->arch.last_pte_updated; | 2648 | u64 *spte = vcpu->arch.last_pte_updated; |
2657 | 2649 | ||
2658 | return !!(spte && (*spte & shadow_accessed_mask)); | 2650 | return !!(spte && (*spte & shadow_accessed_mask)); |
2659 | } | 2651 | } |
2660 | 2652 | ||
2661 | static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, | 2653 | static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, |
2662 | u64 gpte) | 2654 | u64 gpte) |
2663 | { | 2655 | { |
2664 | gfn_t gfn; | 2656 | gfn_t gfn; |
2665 | pfn_t pfn; | 2657 | pfn_t pfn; |
2666 | 2658 | ||
2667 | if (!is_present_gpte(gpte)) | 2659 | if (!is_present_gpte(gpte)) |
2668 | return; | 2660 | return; |
2669 | gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; | 2661 | gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; |
2670 | 2662 | ||
2671 | vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq; | 2663 | vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq; |
2672 | smp_rmb(); | 2664 | smp_rmb(); |
2673 | pfn = gfn_to_pfn(vcpu->kvm, gfn); | 2665 | pfn = gfn_to_pfn(vcpu->kvm, gfn); |
2674 | 2666 | ||
2675 | if (is_error_pfn(pfn)) { | 2667 | if (is_error_pfn(pfn)) { |
2676 | kvm_release_pfn_clean(pfn); | 2668 | kvm_release_pfn_clean(pfn); |
2677 | return; | 2669 | return; |
2678 | } | 2670 | } |
2679 | vcpu->arch.update_pte.gfn = gfn; | 2671 | vcpu->arch.update_pte.gfn = gfn; |
2680 | vcpu->arch.update_pte.pfn = pfn; | 2672 | vcpu->arch.update_pte.pfn = pfn; |
2681 | } | 2673 | } |
2682 | 2674 | ||
2683 | static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn) | 2675 | static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn) |
2684 | { | 2676 | { |
2685 | u64 *spte = vcpu->arch.last_pte_updated; | 2677 | u64 *spte = vcpu->arch.last_pte_updated; |
2686 | 2678 | ||
2687 | if (spte | 2679 | if (spte |
2688 | && vcpu->arch.last_pte_gfn == gfn | 2680 | && vcpu->arch.last_pte_gfn == gfn |
2689 | && shadow_accessed_mask | 2681 | && shadow_accessed_mask |
2690 | && !(*spte & shadow_accessed_mask) | 2682 | && !(*spte & shadow_accessed_mask) |
2691 | && is_shadow_present_pte(*spte)) | 2683 | && is_shadow_present_pte(*spte)) |
2692 | set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte); | 2684 | set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte); |
2693 | } | 2685 | } |
2694 | 2686 | ||
2695 | void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, | 2687 | void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, |
2696 | const u8 *new, int bytes, | 2688 | const u8 *new, int bytes, |
2697 | bool guest_initiated) | 2689 | bool guest_initiated) |
2698 | { | 2690 | { |
2699 | gfn_t gfn = gpa >> PAGE_SHIFT; | 2691 | gfn_t gfn = gpa >> PAGE_SHIFT; |
2700 | struct kvm_mmu_page *sp; | 2692 | struct kvm_mmu_page *sp; |
2701 | struct hlist_node *node; | 2693 | struct hlist_node *node; |
2702 | LIST_HEAD(invalid_list); | 2694 | LIST_HEAD(invalid_list); |
2703 | u64 entry, gentry; | 2695 | u64 entry, gentry; |
2704 | u64 *spte; | 2696 | u64 *spte; |
2705 | unsigned offset = offset_in_page(gpa); | 2697 | unsigned offset = offset_in_page(gpa); |
2706 | unsigned pte_size; | 2698 | unsigned pte_size; |
2707 | unsigned page_offset; | 2699 | unsigned page_offset; |
2708 | unsigned misaligned; | 2700 | unsigned misaligned; |
2709 | unsigned quadrant; | 2701 | unsigned quadrant; |
2710 | int level; | 2702 | int level; |
2711 | int flooded = 0; | 2703 | int flooded = 0; |
2712 | int npte; | 2704 | int npte; |
2713 | int r; | 2705 | int r; |
2714 | int invlpg_counter; | 2706 | int invlpg_counter; |
2715 | bool remote_flush, local_flush, zap_page; | 2707 | bool remote_flush, local_flush, zap_page; |
2716 | 2708 | ||
2717 | zap_page = remote_flush = local_flush = false; | 2709 | zap_page = remote_flush = local_flush = false; |
2718 | 2710 | ||
2719 | pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); | 2711 | pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); |
2720 | 2712 | ||
2721 | invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter); | 2713 | invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter); |
2722 | 2714 | ||
2723 | /* | 2715 | /* |
2724 | * Assume that the pte write is on a page table of the same type | 2716 | * Assume that the pte write is on a page table of the same type
2725 | * as the current vcpu's paging mode. This is nearly always true | 2717 | * as the current vcpu's paging mode. This is nearly always true
2726 | * (it might be false while changing modes). Note that it is | 2718 | * (it might be false while changing modes). Note that it is
2727 | * verified later by update_pte(). | 2719 | * verified later by update_pte().
2728 | */ | 2720 | */ |
2729 | if ((is_pae(vcpu) && bytes == 4) || !new) { | 2721 | if ((is_pae(vcpu) && bytes == 4) || !new) { |
2730 | /* Handle a 32-bit guest writing two halves of a 64-bit gpte */ | 2722 | /* Handle a 32-bit guest writing two halves of a 64-bit gpte */ |
2731 | if (is_pae(vcpu)) { | 2723 | if (is_pae(vcpu)) { |
2732 | gpa &= ~(gpa_t)7; | 2724 | gpa &= ~(gpa_t)7; |
2733 | bytes = 8; | 2725 | bytes = 8; |
2734 | } | 2726 | } |
2735 | r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8)); | 2727 | r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8)); |
2736 | if (r) | 2728 | if (r) |
2737 | gentry = 0; | 2729 | gentry = 0; |
2738 | new = (const u8 *)&gentry; | 2730 | new = (const u8 *)&gentry; |
2739 | } | 2731 | } |
2740 | 2732 | ||
2741 | switch (bytes) { | 2733 | switch (bytes) { |
2742 | case 4: | 2734 | case 4: |
2743 | gentry = *(const u32 *)new; | 2735 | gentry = *(const u32 *)new; |
2744 | break; | 2736 | break; |
2745 | case 8: | 2737 | case 8: |
2746 | gentry = *(const u64 *)new; | 2738 | gentry = *(const u64 *)new; |
2747 | break; | 2739 | break; |
2748 | default: | 2740 | default: |
2749 | gentry = 0; | 2741 | gentry = 0; |
2750 | break; | 2742 | break; |
2751 | } | 2743 | } |
2752 | 2744 | ||
2753 | mmu_guess_page_from_pte_write(vcpu, gpa, gentry); | 2745 | mmu_guess_page_from_pte_write(vcpu, gpa, gentry); |
2754 | spin_lock(&vcpu->kvm->mmu_lock); | 2746 | spin_lock(&vcpu->kvm->mmu_lock); |
2755 | if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter) | 2747 | if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter) |
2756 | gentry = 0; | 2748 | gentry = 0; |
2757 | kvm_mmu_access_page(vcpu, gfn); | 2749 | kvm_mmu_access_page(vcpu, gfn); |
2758 | kvm_mmu_free_some_pages(vcpu); | 2750 | kvm_mmu_free_some_pages(vcpu); |
2759 | ++vcpu->kvm->stat.mmu_pte_write; | 2751 | ++vcpu->kvm->stat.mmu_pte_write; |
2760 | kvm_mmu_audit(vcpu, "pre pte write"); | 2752 | kvm_mmu_audit(vcpu, "pre pte write"); |
2761 | if (guest_initiated) { | 2753 | if (guest_initiated) { |
2762 | if (gfn == vcpu->arch.last_pt_write_gfn | 2754 | if (gfn == vcpu->arch.last_pt_write_gfn |
2763 | && !last_updated_pte_accessed(vcpu)) { | 2755 | && !last_updated_pte_accessed(vcpu)) { |
2764 | ++vcpu->arch.last_pt_write_count; | 2756 | ++vcpu->arch.last_pt_write_count; |
2765 | if (vcpu->arch.last_pt_write_count >= 3) | 2757 | if (vcpu->arch.last_pt_write_count >= 3) |
2766 | flooded = 1; | 2758 | flooded = 1; |
2767 | } else { | 2759 | } else { |
2768 | vcpu->arch.last_pt_write_gfn = gfn; | 2760 | vcpu->arch.last_pt_write_gfn = gfn; |
2769 | vcpu->arch.last_pt_write_count = 1; | 2761 | vcpu->arch.last_pt_write_count = 1; |
2770 | vcpu->arch.last_pte_updated = NULL; | 2762 | vcpu->arch.last_pte_updated = NULL; |
2771 | } | 2763 | } |
2772 | } | 2764 | } |
2773 | 2765 | ||
2774 | for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) { | 2766 | for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) { |
2775 | pte_size = sp->role.cr4_pae ? 8 : 4; | 2767 | pte_size = sp->role.cr4_pae ? 8 : 4; |
2776 | misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1); | 2768 | misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1); |
2777 | misaligned |= bytes < 4; | 2769 | misaligned |= bytes < 4; |
2778 | if (misaligned || flooded) { | 2770 | if (misaligned || flooded) { |
2779 | /* | 2771 | /* |
2780 | * Misaligned accesses are too much trouble to fix | 2772 | * Misaligned accesses are too much trouble to fix |
2781 | * up; also, they usually indicate a page is not used | 2773 | * up; also, they usually indicate a page is not used |
2782 | * as a page table. | 2774 | * as a page table. |
2783 | * | 2775 | * |
2784 | * If we're seeing too many writes to a page, | 2776 | * If we're seeing too many writes to a page, |
2785 | * it may no longer be a page table, or we may be | 2777 | * it may no longer be a page table, or we may be |
2786 | * forking, in which case it is better to unmap the | 2778 | * forking, in which case it is better to unmap the |
2787 | * page. | 2779 | * page. |
2788 | */ | 2780 | */ |
2789 | pgprintk("misaligned: gpa %llx bytes %d role %x\n", | 2781 | pgprintk("misaligned: gpa %llx bytes %d role %x\n", |
2790 | gpa, bytes, sp->role.word); | 2782 | gpa, bytes, sp->role.word); |
2791 | zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp, | 2783 | zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp, |
2792 | &invalid_list); | 2784 | &invalid_list); |
2793 | ++vcpu->kvm->stat.mmu_flooded; | 2785 | ++vcpu->kvm->stat.mmu_flooded; |
2794 | continue; | 2786 | continue; |
2795 | } | 2787 | } |
2796 | page_offset = offset; | 2788 | page_offset = offset; |
2797 | level = sp->role.level; | 2789 | level = sp->role.level; |
2798 | npte = 1; | 2790 | npte = 1; |
2799 | if (!sp->role.cr4_pae) { | 2791 | if (!sp->role.cr4_pae) { |
2800 | page_offset <<= 1; /* 32->64 */ | 2792 | page_offset <<= 1; /* 32->64 */ |
2801 | /* | 2793 | /* |
2802 | * A 32-bit pde maps 4MB while the shadow pdes map | 2794 | * A 32-bit pde maps 4MB while the shadow pdes map |
2803 | * only 2MB. So we need to double the offset again | 2795 | * only 2MB. So we need to double the offset again |
2804 | * and zap two pdes instead of one. | 2796 | * and zap two pdes instead of one. |
2805 | */ | 2797 | */ |
2806 | if (level == PT32_ROOT_LEVEL) { | 2798 | if (level == PT32_ROOT_LEVEL) { |
2807 | page_offset &= ~7; /* kill rounding error */ | 2799 | page_offset &= ~7; /* kill rounding error */ |
2808 | page_offset <<= 1; | 2800 | page_offset <<= 1; |
2809 | npte = 2; | 2801 | npte = 2; |
2810 | } | 2802 | } |
2811 | quadrant = page_offset >> PAGE_SHIFT; | 2803 | quadrant = page_offset >> PAGE_SHIFT; |
2812 | page_offset &= ~PAGE_MASK; | 2804 | page_offset &= ~PAGE_MASK; |
2813 | if (quadrant != sp->role.quadrant) | 2805 | if (quadrant != sp->role.quadrant) |
2814 | continue; | 2806 | continue; |
2815 | } | 2807 | } |
2816 | local_flush = true; | 2808 | local_flush = true; |
2817 | spte = &sp->spt[page_offset / sizeof(*spte)]; | 2809 | spte = &sp->spt[page_offset / sizeof(*spte)]; |
2818 | while (npte--) { | 2810 | while (npte--) { |
2819 | entry = *spte; | 2811 | entry = *spte; |
2820 | mmu_pte_write_zap_pte(vcpu, sp, spte); | 2812 | mmu_pte_write_zap_pte(vcpu, sp, spte); |
2821 | if (gentry) | 2813 | if (gentry) |
2822 | mmu_pte_write_new_pte(vcpu, sp, spte, &gentry); | 2814 | mmu_pte_write_new_pte(vcpu, sp, spte, &gentry); |
2823 | if (!remote_flush && need_remote_flush(entry, *spte)) | 2815 | if (!remote_flush && need_remote_flush(entry, *spte)) |
2824 | remote_flush = true; | 2816 | remote_flush = true; |
2825 | ++spte; | 2817 | ++spte; |
2826 | } | 2818 | } |
2827 | } | 2819 | } |
2828 | mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush); | 2820 | mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush); |
2829 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); | 2821 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); |
2830 | kvm_mmu_audit(vcpu, "post pte write"); | 2822 | kvm_mmu_audit(vcpu, "post pte write"); |
2831 | spin_unlock(&vcpu->kvm->mmu_lock); | 2823 | spin_unlock(&vcpu->kvm->mmu_lock); |
2832 | if (!is_error_pfn(vcpu->arch.update_pte.pfn)) { | 2824 | if (!is_error_pfn(vcpu->arch.update_pte.pfn)) { |
2833 | kvm_release_pfn_clean(vcpu->arch.update_pte.pfn); | 2825 | kvm_release_pfn_clean(vcpu->arch.update_pte.pfn); |
2834 | vcpu->arch.update_pte.pfn = bad_pfn; | 2826 | vcpu->arch.update_pte.pfn = bad_pfn; |
2835 | } | 2827 | } |
2836 | } | 2828 | } |
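
A note on the misalignment test in kvm_mmu_pte_write(): (offset ^ (offset + bytes - 1)) keeps exactly the bits in which the first and last written byte differ, and masking with ~(pte_size - 1) asks whether that difference crosses a pte-sized boundary. A minimal standalone sketch with made-up values, not part of the patch:

        /* Nonzero when a write of 'bytes' at 'offset' straddles a pte. */
        static unsigned spans_pte(unsigned offset, unsigned bytes,
                                  unsigned pte_size)
        {
                return (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
        }

        /*
         * spans_pte(6, 4, 8): (6 ^ 9) & ~7 = 15 & ~7 = 8  -> misaligned
         * spans_pte(8, 8, 8): (8 ^ 15) & ~7 = 7 & ~7 = 0  -> aligned
         */

The extra "misaligned |= bytes < 4" treats any write shorter than the smallest (32-bit) pte as misaligned, since it cannot be a whole pte update.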
2837 | 2829 | ||
2838 | int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) | 2830 | int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) |
2839 | { | 2831 | { |
2840 | gpa_t gpa; | 2832 | gpa_t gpa; |
2841 | int r; | 2833 | int r; |
2842 | 2834 | ||
2843 | if (tdp_enabled) | 2835 | if (tdp_enabled) |
2844 | return 0; | 2836 | return 0; |
2845 | 2837 | ||
2846 | gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL); | 2838 | gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL); |
2847 | 2839 | ||
2848 | spin_lock(&vcpu->kvm->mmu_lock); | 2840 | spin_lock(&vcpu->kvm->mmu_lock); |
2849 | r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT); | 2841 | r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT); |
2850 | spin_unlock(&vcpu->kvm->mmu_lock); | 2842 | spin_unlock(&vcpu->kvm->mmu_lock); |
2851 | return r; | 2843 | return r; |
2852 | } | 2844 | } |
2853 | EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt); | 2845 | EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt); |
2854 | 2846 | ||
2855 | void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu) | 2847 | void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu) |
2856 | { | 2848 | { |
2857 | int free_pages; | 2849 | int free_pages; |
2858 | LIST_HEAD(invalid_list); | 2850 | LIST_HEAD(invalid_list); |
2859 | 2851 | ||
2860 | free_pages = vcpu->kvm->arch.n_free_mmu_pages; | 2852 | free_pages = vcpu->kvm->arch.n_free_mmu_pages; |
2861 | while (free_pages < KVM_REFILL_PAGES && | 2853 | while (free_pages < KVM_REFILL_PAGES && |
2862 | !list_empty(&vcpu->kvm->arch.active_mmu_pages)) { | 2854 | !list_empty(&vcpu->kvm->arch.active_mmu_pages)) { |
2863 | struct kvm_mmu_page *sp; | 2855 | struct kvm_mmu_page *sp; |
2864 | 2856 | ||
2865 | sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev, | 2857 | sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev, |
2866 | struct kvm_mmu_page, link); | 2858 | struct kvm_mmu_page, link); |
2867 | free_pages += kvm_mmu_prepare_zap_page(vcpu->kvm, sp, | 2859 | free_pages += kvm_mmu_prepare_zap_page(vcpu->kvm, sp, |
2868 | &invalid_list); | 2860 | &invalid_list); |
2869 | ++vcpu->kvm->stat.mmu_recycled; | 2861 | ++vcpu->kvm->stat.mmu_recycled; |
2870 | } | 2862 | } |
2871 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); | 2863 | kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); |
2872 | } | 2864 | } |
2873 | 2865 | ||
2874 | int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code) | 2866 | int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code) |
2875 | { | 2867 | { |
2876 | int r; | 2868 | int r; |
2877 | enum emulation_result er; | 2869 | enum emulation_result er; |
2878 | 2870 | ||
2879 | r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code); | 2871 | r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code); |
2880 | if (r < 0) | 2872 | if (r < 0) |
2881 | goto out; | 2873 | goto out; |
2882 | 2874 | ||
2883 | if (!r) { | 2875 | if (!r) { |
2884 | r = 1; | 2876 | r = 1; |
2885 | goto out; | 2877 | goto out; |
2886 | } | 2878 | } |
2887 | 2879 | ||
2888 | r = mmu_topup_memory_caches(vcpu); | 2880 | r = mmu_topup_memory_caches(vcpu); |
2889 | if (r) | 2881 | if (r) |
2890 | goto out; | 2882 | goto out; |
2891 | 2883 | ||
2892 | er = emulate_instruction(vcpu, cr2, error_code, 0); | 2884 | er = emulate_instruction(vcpu, cr2, error_code, 0); |
2893 | 2885 | ||
2894 | switch (er) { | 2886 | switch (er) { |
2895 | case EMULATE_DONE: | 2887 | case EMULATE_DONE: |
2896 | return 1; | 2888 | return 1; |
2897 | case EMULATE_DO_MMIO: | 2889 | case EMULATE_DO_MMIO: |
2898 | ++vcpu->stat.mmio_exits; | 2890 | ++vcpu->stat.mmio_exits; |
2899 | /* fall through */ | 2891 | /* fall through */ |
2900 | case EMULATE_FAIL: | 2892 | case EMULATE_FAIL: |
2901 | return 0; | 2893 | return 0; |
2902 | default: | 2894 | default: |
2903 | BUG(); | 2895 | BUG(); |
2904 | } | 2896 | } |
2905 | out: | 2897 | out: |
2906 | return r; | 2898 | return r; |
2907 | } | 2899 | } |
2908 | EXPORT_SYMBOL_GPL(kvm_mmu_page_fault); | 2900 | EXPORT_SYMBOL_GPL(kvm_mmu_page_fault); |
2909 | 2901 | ||
2910 | void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva) | 2902 | void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva) |
2911 | { | 2903 | { |
2912 | vcpu->arch.mmu.invlpg(vcpu, gva); | 2904 | vcpu->arch.mmu.invlpg(vcpu, gva); |
2913 | kvm_mmu_flush_tlb(vcpu); | 2905 | kvm_mmu_flush_tlb(vcpu); |
2914 | ++vcpu->stat.invlpg; | 2906 | ++vcpu->stat.invlpg; |
2915 | } | 2907 | } |
2916 | EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); | 2908 | EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); |
2917 | 2909 | ||
2918 | void kvm_enable_tdp(void) | 2910 | void kvm_enable_tdp(void) |
2919 | { | 2911 | { |
2920 | tdp_enabled = true; | 2912 | tdp_enabled = true; |
2921 | } | 2913 | } |
2922 | EXPORT_SYMBOL_GPL(kvm_enable_tdp); | 2914 | EXPORT_SYMBOL_GPL(kvm_enable_tdp); |
2923 | 2915 | ||
2924 | void kvm_disable_tdp(void) | 2916 | void kvm_disable_tdp(void) |
2925 | { | 2917 | { |
2926 | tdp_enabled = false; | 2918 | tdp_enabled = false; |
2927 | } | 2919 | } |
2928 | EXPORT_SYMBOL_GPL(kvm_disable_tdp); | 2920 | EXPORT_SYMBOL_GPL(kvm_disable_tdp); |
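
kvm_enable_tdp() and kvm_disable_tdp() just latch whether two-dimensional paging (EPT/NPT) is in use; the vendor module decides at hardware-setup time. A hedged sketch of the call site, with enable_ept standing in for the hardware capability probe (the real vmx/svm code may differ in detail):

        if (enable_ept)
                kvm_enable_tdp();
        else
                kvm_disable_tdp();

With tdp_enabled set, kvm_mmu_unprotect_page_virt() above can return early: under TDP the guest owns its page tables and KVM never write-protects them.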
2929 | 2921 | ||
2930 | static void free_mmu_pages(struct kvm_vcpu *vcpu) | 2922 | static void free_mmu_pages(struct kvm_vcpu *vcpu) |
2931 | { | 2923 | { |
2932 | free_page((unsigned long)vcpu->arch.mmu.pae_root); | 2924 | free_page((unsigned long)vcpu->arch.mmu.pae_root); |
2933 | } | 2925 | } |
2934 | 2926 | ||
2935 | static int alloc_mmu_pages(struct kvm_vcpu *vcpu) | 2927 | static int alloc_mmu_pages(struct kvm_vcpu *vcpu) |
2936 | { | 2928 | { |
2937 | struct page *page; | 2929 | struct page *page; |
2938 | int i; | 2930 | int i; |
2939 | 2931 | ||
2940 | ASSERT(vcpu); | 2932 | ASSERT(vcpu); |
2941 | 2933 | ||
2942 | /* | 2934 | /* |
2943 | * When emulating 32-bit mode, cr3 is only 32 bits even on x86_64. | 2935 | * When emulating 32-bit mode, cr3 is only 32 bits even on x86_64. |
2944 | * Therefore we need to allocate shadow page tables in the first | 2936 | * Therefore we need to allocate shadow page tables in the first |
2945 | * 4GB of memory, which happens to fit the DMA32 zone. | 2937 | * 4GB of memory, which happens to fit the DMA32 zone. |
2946 | */ | 2938 | */ |
2947 | page = alloc_page(GFP_KERNEL | __GFP_DMA32); | 2939 | page = alloc_page(GFP_KERNEL | __GFP_DMA32); |
2948 | if (!page) | 2940 | if (!page) |
2949 | return -ENOMEM; | 2941 | return -ENOMEM; |
2950 | 2942 | ||
2951 | vcpu->arch.mmu.pae_root = page_address(page); | 2943 | vcpu->arch.mmu.pae_root = page_address(page); |
2952 | for (i = 0; i < 4; ++i) | 2944 | for (i = 0; i < 4; ++i) |
2953 | vcpu->arch.mmu.pae_root[i] = INVALID_PAGE; | 2945 | vcpu->arch.mmu.pae_root[i] = INVALID_PAGE; |
2954 | 2946 | ||
2955 | return 0; | 2947 | return 0; |
2956 | } | 2948 | } |
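
A sketch of why __GFP_DMA32 matters here: pae_root stands in for the guest's PDPT, and its physical address is eventually loaded through a 32-bit cr3. The assignment itself lives in mmu_alloc_roots(), reproduced from memory, so treat it as illustrative:

        /* must resolve to a physical address below 4GB */
        vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);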
2957 | 2949 | ||
2958 | int kvm_mmu_create(struct kvm_vcpu *vcpu) | 2950 | int kvm_mmu_create(struct kvm_vcpu *vcpu) |
2959 | { | 2951 | { |
2960 | ASSERT(vcpu); | 2952 | ASSERT(vcpu); |
2961 | ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); | 2953 | ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); |
2962 | 2954 | ||
2963 | return alloc_mmu_pages(vcpu); | 2955 | return alloc_mmu_pages(vcpu); |
2964 | } | 2956 | } |
2965 | 2957 | ||
2966 | int kvm_mmu_setup(struct kvm_vcpu *vcpu) | 2958 | int kvm_mmu_setup(struct kvm_vcpu *vcpu) |
2967 | { | 2959 | { |
2968 | ASSERT(vcpu); | 2960 | ASSERT(vcpu); |
2969 | ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); | 2961 | ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); |
2970 | 2962 | ||
2971 | return init_kvm_mmu(vcpu); | 2963 | return init_kvm_mmu(vcpu); |
2972 | } | 2964 | } |
2973 | 2965 | ||
2974 | void kvm_mmu_destroy(struct kvm_vcpu *vcpu) | 2966 | void kvm_mmu_destroy(struct kvm_vcpu *vcpu) |
2975 | { | 2967 | { |
2976 | ASSERT(vcpu); | 2968 | ASSERT(vcpu); |
2977 | 2969 | ||
2978 | destroy_kvm_mmu(vcpu); | 2970 | destroy_kvm_mmu(vcpu); |
2979 | free_mmu_pages(vcpu); | 2971 | free_mmu_pages(vcpu); |
2980 | mmu_free_memory_caches(vcpu); | 2972 | mmu_free_memory_caches(vcpu); |
2981 | } | 2973 | } |
2982 | 2974 | ||
2983 | void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot) | 2975 | void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot) |
2984 | { | 2976 | { |
2985 | struct kvm_mmu_page *sp; | 2977 | struct kvm_mmu_page *sp; |
2986 | 2978 | ||
2987 | list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) { | 2979 | list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) { |
2988 | int i; | 2980 | int i; |
2989 | u64 *pt; | 2981 | u64 *pt; |
2990 | 2982 | ||
2991 | if (!test_bit(slot, sp->slot_bitmap)) | 2983 | if (!test_bit(slot, sp->slot_bitmap)) |
2992 | continue; | 2984 | continue; |
2993 | 2985 | ||
2994 | pt = sp->spt; | 2986 | pt = sp->spt; |
2995 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) | 2987 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) |
2996 | /* avoid RMW */ | 2988 | /* avoid RMW */ |
2997 | if (is_writable_pte(pt[i])) | 2989 | if (is_writable_pte(pt[i])) |
2998 | pt[i] &= ~PT_WRITABLE_MASK; | 2990 | pt[i] &= ~PT_WRITABLE_MASK; |
2999 | } | 2991 | } |
3000 | kvm_flush_remote_tlbs(kvm); | 2992 | kvm_flush_remote_tlbs(kvm); |
3001 | } | 2993 | } |
3002 | 2994 | ||
3003 | void kvm_mmu_zap_all(struct kvm *kvm) | 2995 | void kvm_mmu_zap_all(struct kvm *kvm) |
3004 | { | 2996 | { |
3005 | struct kvm_mmu_page *sp, *node; | 2997 | struct kvm_mmu_page *sp, *node; |
3006 | LIST_HEAD(invalid_list); | 2998 | LIST_HEAD(invalid_list); |
3007 | 2999 | ||
3008 | spin_lock(&kvm->mmu_lock); | 3000 | spin_lock(&kvm->mmu_lock); |
3009 | restart: | 3001 | restart: |
3010 | list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) | 3002 | list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) |
3011 | if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list)) | 3003 | if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list)) |
3012 | goto restart; | 3004 | goto restart; |
3013 | 3005 | ||
3014 | kvm_mmu_commit_zap_page(kvm, &invalid_list); | 3006 | kvm_mmu_commit_zap_page(kvm, &invalid_list); |
3015 | spin_unlock(&kvm->mmu_lock); | 3007 | spin_unlock(&kvm->mmu_lock); |
3016 | } | 3008 | } |
3017 | 3009 | ||
3018 | static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm, | 3010 | static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm, |
3019 | struct list_head *invalid_list) | 3011 | struct list_head *invalid_list) |
3020 | { | 3012 | { |
3021 | struct kvm_mmu_page *page; | 3013 | struct kvm_mmu_page *page; |
3022 | 3014 | ||
3023 | page = container_of(kvm->arch.active_mmu_pages.prev, | 3015 | page = container_of(kvm->arch.active_mmu_pages.prev, |
3024 | struct kvm_mmu_page, link); | 3016 | struct kvm_mmu_page, link); |
3025 | return kvm_mmu_prepare_zap_page(kvm, page, invalid_list); | 3017 | return kvm_mmu_prepare_zap_page(kvm, page, invalid_list); |
3026 | } | 3018 | } |
3027 | 3019 | ||
3028 | static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask) | 3020 | static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask) |
3029 | { | 3021 | { |
3030 | struct kvm *kvm; | 3022 | struct kvm *kvm; |
3031 | struct kvm *kvm_freed = NULL; | 3023 | struct kvm *kvm_freed = NULL; |
3032 | int cache_count = 0; | 3024 | int cache_count = 0; |
3033 | 3025 | ||
3034 | spin_lock(&kvm_lock); | 3026 | spin_lock(&kvm_lock); |
3035 | 3027 | ||
3036 | list_for_each_entry(kvm, &vm_list, vm_list) { | 3028 | list_for_each_entry(kvm, &vm_list, vm_list) { |
3037 | int npages, idx, freed_pages; | 3029 | int npages, idx, freed_pages; |
3038 | LIST_HEAD(invalid_list); | 3030 | LIST_HEAD(invalid_list); |
3039 | 3031 | ||
3040 | idx = srcu_read_lock(&kvm->srcu); | 3032 | idx = srcu_read_lock(&kvm->srcu); |
3041 | spin_lock(&kvm->mmu_lock); | 3033 | spin_lock(&kvm->mmu_lock); |
3042 | npages = kvm->arch.n_alloc_mmu_pages - | 3034 | npages = kvm->arch.n_alloc_mmu_pages - |
3043 | kvm->arch.n_free_mmu_pages; | 3035 | kvm->arch.n_free_mmu_pages; |
3044 | cache_count += npages; | 3036 | cache_count += npages; |
3045 | if (!kvm_freed && nr_to_scan > 0 && npages > 0) { | 3037 | if (!kvm_freed && nr_to_scan > 0 && npages > 0) { |
3046 | freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm, | 3038 | freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm, |
3047 | &invalid_list); | 3039 | &invalid_list); |
3048 | cache_count -= freed_pages; | 3040 | cache_count -= freed_pages; |
3049 | kvm_freed = kvm; | 3041 | kvm_freed = kvm; |
3050 | } | 3042 | } |
3051 | nr_to_scan--; | 3043 | nr_to_scan--; |
3052 | 3044 | ||
3053 | kvm_mmu_commit_zap_page(kvm, &invalid_list); | 3045 | kvm_mmu_commit_zap_page(kvm, &invalid_list); |
3054 | spin_unlock(&kvm->mmu_lock); | 3046 | spin_unlock(&kvm->mmu_lock); |
3055 | srcu_read_unlock(&kvm->srcu, idx); | 3047 | srcu_read_unlock(&kvm->srcu, idx); |
3056 | } | 3048 | } |
3057 | if (kvm_freed) | 3049 | if (kvm_freed) |
3058 | list_move_tail(&kvm_freed->vm_list, &vm_list); | 3050 | list_move_tail(&kvm_freed->vm_list, &vm_list); |
3059 | 3051 | ||
3060 | spin_unlock(&kvm_lock); | 3052 | spin_unlock(&kvm_lock); |
3061 | 3053 | ||
3062 | return cache_count; | 3054 | return cache_count; |
3063 | } | 3055 | } |
3064 | 3056 | ||
3065 | static struct shrinker mmu_shrinker = { | 3057 | static struct shrinker mmu_shrinker = { |
3066 | .shrink = mmu_shrink, | 3058 | .shrink = mmu_shrink, |
3067 | .seeks = DEFAULT_SEEKS * 10, | 3059 | .seeks = DEFAULT_SEEKS * 10, |
3068 | }; | 3060 | }; |
3069 | 3061 | ||
3070 | static void mmu_destroy_caches(void) | 3062 | static void mmu_destroy_caches(void) |
3071 | { | 3063 | { |
3072 | if (pte_chain_cache) | 3064 | if (pte_chain_cache) |
3073 | kmem_cache_destroy(pte_chain_cache); | 3065 | kmem_cache_destroy(pte_chain_cache); |
3074 | if (rmap_desc_cache) | 3066 | if (rmap_desc_cache) |
3075 | kmem_cache_destroy(rmap_desc_cache); | 3067 | kmem_cache_destroy(rmap_desc_cache); |
3076 | if (mmu_page_header_cache) | 3068 | if (mmu_page_header_cache) |
3077 | kmem_cache_destroy(mmu_page_header_cache); | 3069 | kmem_cache_destroy(mmu_page_header_cache); |
3078 | } | 3070 | } |
3079 | 3071 | ||
3080 | void kvm_mmu_module_exit(void) | 3072 | void kvm_mmu_module_exit(void) |
3081 | { | 3073 | { |
3082 | mmu_destroy_caches(); | 3074 | mmu_destroy_caches(); |
3083 | unregister_shrinker(&mmu_shrinker); | 3075 | unregister_shrinker(&mmu_shrinker); |
3084 | } | 3076 | } |
3085 | 3077 | ||
3086 | int kvm_mmu_module_init(void) | 3078 | int kvm_mmu_module_init(void) |
3087 | { | 3079 | { |
3088 | pte_chain_cache = kmem_cache_create("kvm_pte_chain", | 3080 | pte_chain_cache = kmem_cache_create("kvm_pte_chain", |
3089 | sizeof(struct kvm_pte_chain), | 3081 | sizeof(struct kvm_pte_chain), |
3090 | 0, 0, NULL); | 3082 | 0, 0, NULL); |
3091 | if (!pte_chain_cache) | 3083 | if (!pte_chain_cache) |
3092 | goto nomem; | 3084 | goto nomem; |
3093 | rmap_desc_cache = kmem_cache_create("kvm_rmap_desc", | 3085 | rmap_desc_cache = kmem_cache_create("kvm_rmap_desc", |
3094 | sizeof(struct kvm_rmap_desc), | 3086 | sizeof(struct kvm_rmap_desc), |
3095 | 0, 0, NULL); | 3087 | 0, 0, NULL); |
3096 | if (!rmap_desc_cache) | 3088 | if (!rmap_desc_cache) |
3097 | goto nomem; | 3089 | goto nomem; |
3098 | 3090 | ||
3099 | mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header", | 3091 | mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header", |
3100 | sizeof(struct kvm_mmu_page), | 3092 | sizeof(struct kvm_mmu_page), |
3101 | 0, 0, NULL); | 3093 | 0, 0, NULL); |
3102 | if (!mmu_page_header_cache) | 3094 | if (!mmu_page_header_cache) |
3103 | goto nomem; | 3095 | goto nomem; |
3104 | 3096 | ||
3105 | register_shrinker(&mmu_shrinker); | 3097 | register_shrinker(&mmu_shrinker); |
3106 | 3098 | ||
3107 | return 0; | 3099 | return 0; |
3108 | 3100 | ||
3109 | nomem: | 3101 | nomem: |
3110 | mmu_destroy_caches(); | 3102 | mmu_destroy_caches(); |
3111 | return -ENOMEM; | 3103 | return -ENOMEM; |
3112 | } | 3104 | } |
3113 | 3105 | ||
3114 | /* | 3106 | /* |
3116 | * Calculate mmu pages needed for kvm. | 3107 | * Calculate mmu pages needed for kvm. |
3116 | */ | 3108 | */ |
3117 | unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm) | 3109 | unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm) |
3118 | { | 3110 | { |
3119 | int i; | 3111 | int i; |
3120 | unsigned int nr_mmu_pages; | 3112 | unsigned int nr_mmu_pages; |
3121 | unsigned int nr_pages = 0; | 3113 | unsigned int nr_pages = 0; |
3122 | struct kvm_memslots *slots; | 3114 | struct kvm_memslots *slots; |
3123 | 3115 | ||
3124 | slots = kvm_memslots(kvm); | 3116 | slots = kvm_memslots(kvm); |
3125 | 3117 | ||
3126 | for (i = 0; i < slots->nmemslots; i++) | 3118 | for (i = 0; i < slots->nmemslots; i++) |
3127 | nr_pages += slots->memslots[i].npages; | 3119 | nr_pages += slots->memslots[i].npages; |
3128 | 3120 | ||
3129 | nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000; | 3121 | nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000; |
3130 | nr_mmu_pages = max(nr_mmu_pages, | 3122 | nr_mmu_pages = max(nr_mmu_pages, |
3131 | (unsigned int) KVM_MIN_ALLOC_MMU_PAGES); | 3123 | (unsigned int) KVM_MIN_ALLOC_MMU_PAGES); |
3132 | 3124 | ||
3133 | return nr_mmu_pages; | 3125 | return nr_mmu_pages; |
3134 | } | 3126 | } |
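
Worked numbers for the formula above, assuming KVM_PERMILLE_MMU_PAGES is 20 and KVM_MIN_ALLOC_MMU_PAGES is 64, their values in this era:

        /*
         * 4GB guest: 4GB / 4KB = 1048576 pages
         *   1048576 * 20 / 1000 = 20971 shadow pages (~82MB at 4KB each)
         * 8MB guest: 2048 pages
         *   2048 * 20 / 1000 = 40, below the floor, so 64 is used
         */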
3135 | 3127 | ||
3136 | static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer, | 3128 | static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer, |
3137 | unsigned len) | 3129 | unsigned len) |
3138 | { | 3130 | { |
3139 | if (len > buffer->len) | 3131 | if (len > buffer->len) |
3140 | return NULL; | 3132 | return NULL; |
3141 | return buffer->ptr; | 3133 | return buffer->ptr; |
3142 | } | 3134 | } |
3143 | 3135 | ||
3144 | static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer, | 3136 | static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer, |
3145 | unsigned len) | 3137 | unsigned len) |
3146 | { | 3138 | { |
3147 | void *ret; | 3139 | void *ret; |
3148 | 3140 | ||
3149 | ret = pv_mmu_peek_buffer(buffer, len); | 3141 | ret = pv_mmu_peek_buffer(buffer, len); |
3150 | if (!ret) | 3142 | if (!ret) |
3151 | return ret; | 3143 | return ret; |
3152 | buffer->ptr += len; | 3144 | buffer->ptr += len; |
3153 | buffer->len -= len; | 3145 | buffer->len -= len; |
3154 | buffer->processed += len; | 3146 | buffer->processed += len; |
3155 | return ret; | 3147 | return ret; |
3156 | } | 3148 | } |
3157 | 3149 | ||
3158 | static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu, | 3150 | static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu, |
3159 | gpa_t addr, gpa_t value) | 3151 | gpa_t addr, gpa_t value) |
3160 | { | 3152 | { |
3161 | int bytes = 8; | 3153 | int bytes = 8; |
3162 | int r; | 3154 | int r; |
3163 | 3155 | ||
3164 | if (!is_long_mode(vcpu) && !is_pae(vcpu)) | 3156 | if (!is_long_mode(vcpu) && !is_pae(vcpu)) |
3165 | bytes = 4; | 3157 | bytes = 4; |
3166 | 3158 | ||
3167 | r = mmu_topup_memory_caches(vcpu); | 3159 | r = mmu_topup_memory_caches(vcpu); |
3168 | if (r) | 3160 | if (r) |
3169 | return r; | 3161 | return r; |
3170 | 3162 | ||
3171 | if (!emulator_write_phys(vcpu, addr, &value, bytes)) | 3163 | if (!emulator_write_phys(vcpu, addr, &value, bytes)) |
3172 | return -EFAULT; | 3164 | return -EFAULT; |
3173 | 3165 | ||
3174 | return 1; | 3166 | return 1; |
3175 | } | 3167 | } |
3176 | 3168 | ||
3177 | static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu) | 3169 | static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu) |
3178 | { | 3170 | { |
3179 | (void)kvm_set_cr3(vcpu, vcpu->arch.cr3); | 3171 | (void)kvm_set_cr3(vcpu, vcpu->arch.cr3); |
3180 | return 1; | 3172 | return 1; |
3181 | } | 3173 | } |
3182 | 3174 | ||
3183 | static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr) | 3175 | static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr) |
3184 | { | 3176 | { |
3185 | spin_lock(&vcpu->kvm->mmu_lock); | 3177 | spin_lock(&vcpu->kvm->mmu_lock); |
3186 | mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT); | 3178 | mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT); |
3187 | spin_unlock(&vcpu->kvm->mmu_lock); | 3179 | spin_unlock(&vcpu->kvm->mmu_lock); |
3188 | return 1; | 3180 | return 1; |
3189 | } | 3181 | } |
3190 | 3182 | ||
3191 | static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu, | 3183 | static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu, |
3192 | struct kvm_pv_mmu_op_buffer *buffer) | 3184 | struct kvm_pv_mmu_op_buffer *buffer) |
3193 | { | 3185 | { |
3194 | struct kvm_mmu_op_header *header; | 3186 | struct kvm_mmu_op_header *header; |
3195 | 3187 | ||
3196 | header = pv_mmu_peek_buffer(buffer, sizeof *header); | 3188 | header = pv_mmu_peek_buffer(buffer, sizeof *header); |
3197 | if (!header) | 3189 | if (!header) |
3198 | return 0; | 3190 | return 0; |
3199 | switch (header->op) { | 3191 | switch (header->op) { |
3200 | case KVM_MMU_OP_WRITE_PTE: { | 3192 | case KVM_MMU_OP_WRITE_PTE: { |
3201 | struct kvm_mmu_op_write_pte *wpte; | 3193 | struct kvm_mmu_op_write_pte *wpte; |
3202 | 3194 | ||
3203 | wpte = pv_mmu_read_buffer(buffer, sizeof *wpte); | 3195 | wpte = pv_mmu_read_buffer(buffer, sizeof *wpte); |
3204 | if (!wpte) | 3196 | if (!wpte) |
3205 | return 0; | 3197 | return 0; |
3206 | return kvm_pv_mmu_write(vcpu, wpte->pte_phys, | 3198 | return kvm_pv_mmu_write(vcpu, wpte->pte_phys, |
3207 | wpte->pte_val); | 3199 | wpte->pte_val); |
3208 | } | 3200 | } |
3209 | case KVM_MMU_OP_FLUSH_TLB: { | 3201 | case KVM_MMU_OP_FLUSH_TLB: { |
3210 | struct kvm_mmu_op_flush_tlb *ftlb; | 3202 | struct kvm_mmu_op_flush_tlb *ftlb; |
3211 | 3203 | ||
3212 | ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb); | 3204 | ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb); |
3213 | if (!ftlb) | 3205 | if (!ftlb) |
3214 | return 0; | 3206 | return 0; |
3215 | return kvm_pv_mmu_flush_tlb(vcpu); | 3207 | return kvm_pv_mmu_flush_tlb(vcpu); |
3216 | } | 3208 | } |
3217 | case KVM_MMU_OP_RELEASE_PT: { | 3209 | case KVM_MMU_OP_RELEASE_PT: { |
3218 | struct kvm_mmu_op_release_pt *rpt; | 3210 | struct kvm_mmu_op_release_pt *rpt; |
3219 | 3211 | ||
3220 | rpt = pv_mmu_read_buffer(buffer, sizeof *rpt); | 3212 | rpt = pv_mmu_read_buffer(buffer, sizeof *rpt); |
3221 | if (!rpt) | 3213 | if (!rpt) |
3222 | return 0; | 3214 | return 0; |
3223 | return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys); | 3215 | return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys); |
3224 | } | 3216 | } |
3225 | default: return 0; | 3217 | default: return 0; |
3226 | } | 3218 | } |
3227 | } | 3219 | } |
3228 | 3220 | ||
3229 | int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes, | 3221 | int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes, |
3230 | gpa_t addr, unsigned long *ret) | 3222 | gpa_t addr, unsigned long *ret) |
3231 | { | 3223 | { |
3232 | int r; | 3224 | int r; |
3233 | struct kvm_pv_mmu_op_buffer *buffer = &vcpu->arch.mmu_op_buffer; | 3225 | struct kvm_pv_mmu_op_buffer *buffer = &vcpu->arch.mmu_op_buffer; |
3234 | 3226 | ||
3235 | buffer->ptr = buffer->buf; | 3227 | buffer->ptr = buffer->buf; |
3236 | buffer->len = min_t(unsigned long, bytes, sizeof buffer->buf); | 3228 | buffer->len = min_t(unsigned long, bytes, sizeof buffer->buf); |
3237 | buffer->processed = 0; | 3229 | buffer->processed = 0; |
3238 | 3230 | ||
3239 | r = kvm_read_guest(vcpu->kvm, addr, buffer->buf, buffer->len); | 3231 | r = kvm_read_guest(vcpu->kvm, addr, buffer->buf, buffer->len); |
3240 | if (r) | 3232 | if (r) |
3241 | goto out; | 3233 | goto out; |
3242 | 3234 | ||
3243 | while (buffer->len) { | 3235 | while (buffer->len) { |
3244 | r = kvm_pv_mmu_op_one(vcpu, buffer); | 3236 | r = kvm_pv_mmu_op_one(vcpu, buffer); |
3245 | if (r < 0) | 3237 | if (r < 0) |
3246 | goto out; | 3238 | goto out; |
3247 | if (r == 0) | 3239 | if (r == 0) |
3248 | break; | 3240 | break; |
3249 | } | 3241 | } |
3250 | 3242 | ||
3251 | r = 1; | 3243 | r = 1; |
3252 | out: | 3244 | out: |
3253 | *ret = buffer->processed; | 3245 | *ret = buffer->processed; |
3254 | return r; | 3246 | return r; |
3255 | } | 3247 | } |
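
The buffer consumed here is a packed stream of variable-length records, each led by a struct kvm_mmu_op_header. A hedged guest-side sketch packing one WRITE_PTE record (struct layout as in the kvm_para.h of this era; pte_gpa and new_pte are illustrative locals):

        struct kvm_mmu_op_write_pte wpte = {
                .header.op = KVM_MMU_OP_WRITE_PTE,
                .pte_phys  = pte_gpa,   /* guest-physical address of the pte */
                .pte_val   = new_pte,   /* value the guest wants installed */
        };
        /* The guest hands the buffer's gpa and length to the host via the
         * KVM_HC_MMU_OP hypercall; *ret comes back as bytes processed. */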
3256 | 3248 | ||
3257 | int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]) | 3249 | int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]) |
3258 | { | 3250 | { |
3259 | struct kvm_shadow_walk_iterator iterator; | 3251 | struct kvm_shadow_walk_iterator iterator; |
3260 | int nr_sptes = 0; | 3252 | int nr_sptes = 0; |
3261 | 3253 | ||
3262 | spin_lock(&vcpu->kvm->mmu_lock); | 3254 | spin_lock(&vcpu->kvm->mmu_lock); |
3263 | for_each_shadow_entry(vcpu, addr, iterator) { | 3255 | for_each_shadow_entry(vcpu, addr, iterator) { |
3264 | sptes[iterator.level-1] = *iterator.sptep; | 3256 | sptes[iterator.level-1] = *iterator.sptep; |
3265 | nr_sptes++; | 3257 | nr_sptes++; |
3266 | if (!is_shadow_present_pte(*iterator.sptep)) | 3258 | if (!is_shadow_present_pte(*iterator.sptep)) |
3267 | break; | 3259 | break; |
3268 | } | 3260 | } |
3269 | spin_unlock(&vcpu->kvm->mmu_lock); | 3261 | spin_unlock(&vcpu->kvm->mmu_lock); |
3270 | 3262 | ||
3271 | return nr_sptes; | 3263 | return nr_sptes; |
3272 | } | 3264 | } |
3273 | EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy); | 3265 | EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy); |
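
A sketch of the intended consumer, modeled on the EPT-misconfiguration dump in vmx.c (the exact caller code may differ):

        u64 sptes[4];
        int nr, level;

        nr = kvm_mmu_get_spte_hierarchy(vcpu, gpa, sptes);
        /* The walk starts at the root and stops at the first non-present
         * entry, so sptes[level - 1] is filled for the top 'nr' levels. */
        for (level = PT64_ROOT_LEVEL; level > PT64_ROOT_LEVEL - nr; --level)
                printk(KERN_ERR "level %d: spte 0x%llx\n",
                       level, sptes[level - 1]);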
3274 | 3266 | ||
3275 | #ifdef AUDIT | 3267 | #ifdef AUDIT |
3276 | 3268 | ||
3277 | static const char *audit_msg; | 3269 | static const char *audit_msg; |
3278 | 3270 | ||
3279 | static gva_t canonicalize(gva_t gva) | 3271 | static gva_t canonicalize(gva_t gva) |
3280 | { | 3272 | { |
3281 | #ifdef CONFIG_X86_64 | 3273 | #ifdef CONFIG_X86_64 |
3282 | gva = (long long)(gva << 16) >> 16; | 3274 | gva = (long long)(gva << 16) >> 16; |
3283 | #endif | 3275 | #endif |
3284 | return gva; | 3276 | return gva; |
3285 | } | 3277 | } |
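
The shift pair sign-extends bit 47 into bits 48-63, turning a 48-bit value into a canonical x86_64 address. Two worked values:

        /*
         * canonicalize(0x0000800000000000) == 0xffff800000000000
         * canonicalize(0x00007fffffffffff) == 0x00007fffffffffff
         */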
3286 | 3278 | ||
3287 | 3279 | ||
3288 | typedef void (*inspect_spte_fn) (struct kvm *kvm, u64 *sptep); | 3280 | typedef void (*inspect_spte_fn) (struct kvm *kvm, u64 *sptep); |
3289 | 3281 | ||
3290 | static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp, | 3282 | static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp, |
3291 | inspect_spte_fn fn) | 3283 | inspect_spte_fn fn) |
3292 | { | 3284 | { |
3293 | int i; | 3285 | int i; |
3294 | 3286 | ||
3295 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { | 3287 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { |
3296 | u64 ent = sp->spt[i]; | 3288 | u64 ent = sp->spt[i]; |
3297 | 3289 | ||
3298 | if (is_shadow_present_pte(ent)) { | 3290 | if (is_shadow_present_pte(ent)) { |
3299 | if (!is_last_spte(ent, sp->role.level)) { | 3291 | if (!is_last_spte(ent, sp->role.level)) { |
3300 | struct kvm_mmu_page *child; | 3292 | struct kvm_mmu_page *child; |
3301 | child = page_header(ent & PT64_BASE_ADDR_MASK); | 3293 | child = page_header(ent & PT64_BASE_ADDR_MASK); |
3302 | __mmu_spte_walk(kvm, child, fn); | 3294 | __mmu_spte_walk(kvm, child, fn); |
3303 | } else | 3295 | } else |
3304 | fn(kvm, &sp->spt[i]); | 3296 | fn(kvm, &sp->spt[i]); |
3305 | } | 3297 | } |
3306 | } | 3298 | } |
3307 | } | 3299 | } |
3308 | 3300 | ||
3309 | static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn) | 3301 | static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn) |
3310 | { | 3302 | { |
3311 | int i; | 3303 | int i; |
3312 | struct kvm_mmu_page *sp; | 3304 | struct kvm_mmu_page *sp; |
3313 | 3305 | ||
3314 | if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) | 3306 | if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) |
3315 | return; | 3307 | return; |
3316 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { | 3308 | if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { |
3317 | hpa_t root = vcpu->arch.mmu.root_hpa; | 3309 | hpa_t root = vcpu->arch.mmu.root_hpa; |
3318 | sp = page_header(root); | 3310 | sp = page_header(root); |
3319 | __mmu_spte_walk(vcpu->kvm, sp, fn); | 3311 | __mmu_spte_walk(vcpu->kvm, sp, fn); |
3320 | return; | 3312 | return; |
3321 | } | 3313 | } |
3322 | for (i = 0; i < 4; ++i) { | 3314 | for (i = 0; i < 4; ++i) { |
3323 | hpa_t root = vcpu->arch.mmu.pae_root[i]; | 3315 | hpa_t root = vcpu->arch.mmu.pae_root[i]; |
3324 | 3316 | ||
3325 | if (root && VALID_PAGE(root)) { | 3317 | if (root && VALID_PAGE(root)) { |
3326 | root &= PT64_BASE_ADDR_MASK; | 3318 | root &= PT64_BASE_ADDR_MASK; |
3327 | sp = page_header(root); | 3319 | sp = page_header(root); |
3328 | __mmu_spte_walk(vcpu->kvm, sp, fn); | 3320 | __mmu_spte_walk(vcpu->kvm, sp, fn); |
3329 | } | 3321 | } |
3330 | } | 3322 | } |
3331 | return; | 3323 | return; |
3332 | } | 3324 | } |
3333 | 3325 | ||
3334 | static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte, | 3326 | static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte, |
3335 | gva_t va, int level) | 3327 | gva_t va, int level) |
3336 | { | 3328 | { |
3337 | u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK); | 3329 | u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK); |
3338 | int i; | 3330 | int i; |
3339 | gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1)); | 3331 | gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1)); |
3340 | 3332 | ||
3341 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) { | 3333 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) { |
3342 | u64 ent = pt[i]; | 3334 | u64 ent = pt[i]; |
3343 | 3335 | ||
3344 | if (ent == shadow_trap_nonpresent_pte) | 3336 | if (ent == shadow_trap_nonpresent_pte) |
3345 | continue; | 3337 | continue; |
3346 | 3338 | ||
3347 | va = canonicalize(va); | 3339 | va = canonicalize(va); |
3348 | if (is_shadow_present_pte(ent) && !is_last_spte(ent, level)) | 3340 | if (is_shadow_present_pte(ent) && !is_last_spte(ent, level)) |
3349 | audit_mappings_page(vcpu, ent, va, level - 1); | 3341 | audit_mappings_page(vcpu, ent, va, level - 1); |
3350 | else { | 3342 | else { |
3351 | gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, va, NULL); | 3343 | gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, va, NULL); |
3352 | gfn_t gfn = gpa >> PAGE_SHIFT; | 3344 | gfn_t gfn = gpa >> PAGE_SHIFT; |
3353 | pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn); | 3345 | pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn); |
3354 | hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT; | 3346 | hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT; |
3355 | 3347 | ||
3356 | if (is_error_pfn(pfn)) { | 3348 | if (is_error_pfn(pfn)) { |
3357 | kvm_release_pfn_clean(pfn); | 3349 | kvm_release_pfn_clean(pfn); |
3358 | continue; | 3350 | continue; |
3359 | } | 3351 | } |
3360 | 3352 | ||
3361 | if (is_shadow_present_pte(ent) | 3353 | if (is_shadow_present_pte(ent) |
3362 | && (ent & PT64_BASE_ADDR_MASK) != hpa) | 3354 | && (ent & PT64_BASE_ADDR_MASK) != hpa) |
3363 | printk(KERN_ERR "xx audit error: (%s) levels %d" | 3355 | printk(KERN_ERR "xx audit error: (%s) levels %d" |
3364 | " gva %lx gpa %llx hpa %llx ent %llx %d\n", | 3356 | " gva %lx gpa %llx hpa %llx ent %llx %d\n", |
3365 | audit_msg, vcpu->arch.mmu.root_level, | 3357 | audit_msg, vcpu->arch.mmu.root_level, |
3366 | va, gpa, hpa, ent, | 3358 | va, gpa, hpa, ent, |
3367 | is_shadow_present_pte(ent)); | 3359 | is_shadow_present_pte(ent)); |
3368 | else if (ent == shadow_notrap_nonpresent_pte | 3360 | else if (ent == shadow_notrap_nonpresent_pte |
3369 | && !is_error_hpa(hpa)) | 3361 | && !is_error_hpa(hpa)) |
3370 | printk(KERN_ERR "audit: (%s) notrap shadow," | 3362 | printk(KERN_ERR "audit: (%s) notrap shadow," |
3371 | " valid guest gva %lx\n", audit_msg, va); | 3363 | " valid guest gva %lx\n", audit_msg, va); |
3372 | kvm_release_pfn_clean(pfn); | 3364 | kvm_release_pfn_clean(pfn); |
3373 | 3365 | ||
3374 | } | 3366 | } |
3375 | } | 3367 | } |
3376 | } | 3368 | } |
3377 | 3369 | ||
3378 | static void audit_mappings(struct kvm_vcpu *vcpu) | 3370 | static void audit_mappings(struct kvm_vcpu *vcpu) |
3379 | { | 3371 | { |
3380 | unsigned i; | 3372 | unsigned i; |
3381 | 3373 | ||
3382 | if (vcpu->arch.mmu.root_level == 4) | 3374 | if (vcpu->arch.mmu.root_level == 4) |
3383 | audit_mappings_page(vcpu, vcpu->arch.mmu.root_hpa, 0, 4); | 3375 | audit_mappings_page(vcpu, vcpu->arch.mmu.root_hpa, 0, 4); |
3384 | else | 3376 | else |
3385 | for (i = 0; i < 4; ++i) | 3377 | for (i = 0; i < 4; ++i) |
3386 | if (vcpu->arch.mmu.pae_root[i] & PT_PRESENT_MASK) | 3378 | if (vcpu->arch.mmu.pae_root[i] & PT_PRESENT_MASK) |
3387 | audit_mappings_page(vcpu, | 3379 | audit_mappings_page(vcpu, |
3388 | vcpu->arch.mmu.pae_root[i], | 3380 | vcpu->arch.mmu.pae_root[i], |
3389 | i << 30, | 3381 | i << 30, |
3390 | 2); | 3382 | 2); |
3391 | } | 3383 | } |
3392 | 3384 | ||
3393 | static int count_rmaps(struct kvm_vcpu *vcpu) | 3385 | static int count_rmaps(struct kvm_vcpu *vcpu) |
3394 | { | 3386 | { |
3395 | struct kvm *kvm = vcpu->kvm; | 3387 | struct kvm *kvm = vcpu->kvm; |
3396 | struct kvm_memslots *slots; | 3388 | struct kvm_memslots *slots; |
3397 | int nmaps = 0; | 3389 | int nmaps = 0; |
3398 | int i, j, k, idx; | 3390 | int i, j, k, idx; |
3399 | 3391 | ||
3400 | idx = srcu_read_lock(&kvm->srcu); | 3392 | idx = srcu_read_lock(&kvm->srcu); |
3401 | slots = kvm_memslots(kvm); | 3393 | slots = kvm_memslots(kvm); |
3402 | for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { | 3394 | for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { |
3403 | struct kvm_memory_slot *m = &slots->memslots[i]; | 3395 | struct kvm_memory_slot *m = &slots->memslots[i]; |
3404 | struct kvm_rmap_desc *d; | 3396 | struct kvm_rmap_desc *d; |
3405 | 3397 | ||
3406 | for (j = 0; j < m->npages; ++j) { | 3398 | for (j = 0; j < m->npages; ++j) { |
3407 | unsigned long *rmapp = &m->rmap[j]; | 3399 | unsigned long *rmapp = &m->rmap[j]; |
3408 | 3400 | ||
3409 | if (!*rmapp) | 3401 | if (!*rmapp) |
3410 | continue; | 3402 | continue; |
3411 | if (!(*rmapp & 1)) { | 3403 | if (!(*rmapp & 1)) { |
3412 | ++nmaps; | 3404 | ++nmaps; |
3413 | continue; | 3405 | continue; |
3414 | } | 3406 | } |
3415 | d = (struct kvm_rmap_desc *)(*rmapp & ~1ul); | 3407 | d = (struct kvm_rmap_desc *)(*rmapp & ~1ul); |
3416 | while (d) { | 3408 | while (d) { |
3417 | for (k = 0; k < RMAP_EXT; ++k) | 3409 | for (k = 0; k < RMAP_EXT; ++k) |
3418 | if (d->sptes[k]) | 3410 | if (d->sptes[k]) |
3419 | ++nmaps; | 3411 | ++nmaps; |
3420 | else | 3412 | else |
3421 | break; | 3413 | break; |
3422 | d = d->more; | 3414 | d = d->more; |
3423 | } | 3415 | } |
3424 | } | 3416 | } |
3425 | } | 3417 | } |
3426 | srcu_read_unlock(&kvm->srcu, idx); | 3418 | srcu_read_unlock(&kvm->srcu, idx); |
3427 | return nmaps; | 3419 | return nmaps; |
3428 | } | 3420 | } |
3429 | 3421 | ||
3430 | void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep) | 3422 | void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep) |
3431 | { | 3423 | { |
3432 | unsigned long *rmapp; | 3424 | unsigned long *rmapp; |
3433 | struct kvm_mmu_page *rev_sp; | 3425 | struct kvm_mmu_page *rev_sp; |
3434 | gfn_t gfn; | 3426 | gfn_t gfn; |
3435 | 3427 | ||
3436 | if (is_writable_pte(*sptep)) { | 3428 | if (is_writable_pte(*sptep)) { |
3437 | rev_sp = page_header(__pa(sptep)); | 3429 | rev_sp = page_header(__pa(sptep)); |
3438 | gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt); | 3430 | gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt); |
3439 | 3431 | ||
3440 | if (!gfn_to_memslot(kvm, gfn)) { | 3432 | if (!gfn_to_memslot(kvm, gfn)) { |
3441 | if (!printk_ratelimit()) | 3433 | if (!printk_ratelimit()) |
3442 | return; | 3434 | return; |
3443 | printk(KERN_ERR "%s: no memslot for gfn %ld\n", | 3435 | printk(KERN_ERR "%s: no memslot for gfn %ld\n", |
3444 | audit_msg, gfn); | 3436 | audit_msg, gfn); |
3445 | printk(KERN_ERR "%s: index %ld of sp (gfn=%lx)\n", | 3437 | printk(KERN_ERR "%s: index %ld of sp (gfn=%lx)\n", |
3446 | audit_msg, (long int)(sptep - rev_sp->spt), | 3438 | audit_msg, (long int)(sptep - rev_sp->spt), |
3447 | rev_sp->gfn); | 3439 | rev_sp->gfn); |
3448 | dump_stack(); | 3440 | dump_stack(); |
3449 | return; | 3441 | return; |
3450 | } | 3442 | } |
3451 | 3443 | ||
3452 | rmapp = gfn_to_rmap(kvm, gfn, rev_sp->role.level); | 3444 | rmapp = gfn_to_rmap(kvm, gfn, rev_sp->role.level); |
3453 | if (!*rmapp) { | 3445 | if (!*rmapp) { |
3454 | if (!printk_ratelimit()) | 3446 | if (!printk_ratelimit()) |
3455 | return; | 3447 | return; |
3456 | printk(KERN_ERR "%s: no rmap for writable spte %llx\n", | 3448 | printk(KERN_ERR "%s: no rmap for writable spte %llx\n", |
3457 | audit_msg, *sptep); | 3449 | audit_msg, *sptep); |
3458 | dump_stack(); | 3450 | dump_stack(); |
3459 | } | 3451 | } |
3460 | } | 3452 | } |
3461 | 3453 | ||
3462 | } | 3454 | } |
3463 | 3455 | ||
3464 | void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu) | 3456 | void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu) |
3465 | { | 3457 | { |
3466 | mmu_spte_walk(vcpu, inspect_spte_has_rmap); | 3458 | mmu_spte_walk(vcpu, inspect_spte_has_rmap); |
3467 | } | 3459 | } |
3468 | 3460 | ||
3469 | static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu) | 3461 | static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu) |
3470 | { | 3462 | { |
3471 | struct kvm_mmu_page *sp; | 3463 | struct kvm_mmu_page *sp; |
3472 | int i; | 3464 | int i; |
3473 | 3465 | ||
3474 | list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) { | 3466 | list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) { |
3475 | u64 *pt = sp->spt; | 3467 | u64 *pt = sp->spt; |
3476 | 3468 | ||
3477 | if (sp->role.level != PT_PAGE_TABLE_LEVEL) | 3469 | if (sp->role.level != PT_PAGE_TABLE_LEVEL) |
3478 | continue; | 3470 | continue; |
3479 | 3471 | ||
3480 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { | 3472 | for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { |
3481 | u64 ent = pt[i]; | 3473 | u64 ent = pt[i]; |
3482 | 3474 | ||
3483 | if (!(ent & PT_PRESENT_MASK)) | 3475 | if (!(ent & PT_PRESENT_MASK)) |
3484 | continue; | 3476 | continue; |
3485 | if (!is_writable_pte(ent)) | 3477 | if (!is_writable_pte(ent)) |
3486 | continue; | 3478 | continue; |
3487 | inspect_spte_has_rmap(vcpu->kvm, &pt[i]); | 3479 | inspect_spte_has_rmap(vcpu->kvm, &pt[i]); |
3488 | } | 3480 | } |
3489 | } | 3481 | } |
3490 | return; | 3482 | return; |
3491 | } | 3483 | } |
3492 | 3484 | ||
3493 | static void audit_rmap(struct kvm_vcpu *vcpu) | 3485 | static void audit_rmap(struct kvm_vcpu *vcpu) |
3494 | { | 3486 | { |
3495 | check_writable_mappings_rmap(vcpu); | 3487 | check_writable_mappings_rmap(vcpu); |
3496 | count_rmaps(vcpu); | 3488 | count_rmaps(vcpu); |
3497 | } | 3489 | } |
3498 | 3490 | ||
3499 | static void audit_write_protection(struct kvm_vcpu *vcpu) | 3491 | static void audit_write_protection(struct kvm_vcpu *vcpu) |
3500 | { | 3492 | { |
3501 | struct kvm_mmu_page *sp; | 3493 | struct kvm_mmu_page *sp; |
3502 | struct kvm_memory_slot *slot; | 3494 | struct kvm_memory_slot *slot; |
3503 | unsigned long *rmapp; | 3495 | unsigned long *rmapp; |
3504 | u64 *spte; | 3496 | u64 *spte; |
3505 | gfn_t gfn; | 3497 | gfn_t gfn; |
3506 | 3498 | ||
3507 | list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) { | 3499 | list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) { |
3508 | if (sp->role.direct) | 3500 | if (sp->role.direct) |
3509 | continue; | 3501 | continue; |
3510 | if (sp->unsync) | 3502 | if (sp->unsync) |
3511 | continue; | 3503 | continue; |
3512 | 3504 | ||
3513 | gfn = unalias_gfn(vcpu->kvm, sp->gfn); | 3505 | slot = gfn_to_memslot(vcpu->kvm, sp->gfn); |
3514 | slot = gfn_to_memslot_unaliased(vcpu->kvm, sp->gfn); | ||
3515 | rmapp = &slot->rmap[gfn - slot->base_gfn]; | 3506 | rmapp = &slot->rmap[sp->gfn - slot->base_gfn]; |
3516 | 3507 | ||
3517 | spte = rmap_next(vcpu->kvm, rmapp, NULL); | 3508 | spte = rmap_next(vcpu->kvm, rmapp, NULL); |
3518 | while (spte) { | 3509 | while (spte) { |
3519 | if (is_writable_pte(*spte)) | 3510 | if (is_writable_pte(*spte)) |
3520 | printk(KERN_ERR "%s: (%s) shadow page has " | 3511 | printk(KERN_ERR "%s: (%s) shadow page has " |
3521 | "writable mappings: gfn %lx role %x\n", | 3512 | "writable mappings: gfn %lx role %x\n", |
3522 | __func__, audit_msg, sp->gfn, | 3513 | __func__, audit_msg, sp->gfn, |
3523 | sp->role.word); | 3514 | sp->role.word); |
3524 | spte = rmap_next(vcpu->kvm, rmapp, spte); | 3515 | spte = rmap_next(vcpu->kvm, rmapp, spte); |
3525 | } | 3516 | } |
3526 | } | 3517 | } |
3527 | } | 3518 | } |
3528 | 3519 | ||
3529 | static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) | 3520 | static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) |
3530 | { | 3521 | { |
3531 | int olddbg = dbg; | 3522 | int olddbg = dbg; |
3532 | 3523 | ||
3533 | dbg = 0; | 3524 | dbg = 0; |
3534 | audit_msg = msg; | 3525 | audit_msg = msg; |
3535 | audit_rmap(vcpu); | 3526 | audit_rmap(vcpu); |
3536 | audit_write_protection(vcpu); | 3527 | audit_write_protection(vcpu); |
3537 | if (strcmp("pre pte write", audit_msg) != 0) | 3528 | if (strcmp("pre pte write", audit_msg) != 0) |
3538 | audit_mappings(vcpu); | 3529 | audit_mappings(vcpu); |
3539 | audit_writable_sptes_have_rmaps(vcpu); | 3530 | audit_writable_sptes_have_rmaps(vcpu); |
3540 | dbg = olddbg; | 3531 | dbg = olddbg; |
3541 | } | 3532 | } |
3542 | 3533 | ||
3543 | #endif | 3534 | #endif |
3544 | 3535 | ||
arch/x86/kvm/paging_tmpl.h
1 | /* | 1 | /* |
2 | * Kernel-based Virtual Machine driver for Linux | 2 | * Kernel-based Virtual Machine driver for Linux |
3 | * | 3 | * |
4 | * This module enables machines with Intel VT-x extensions to run virtual | 4 | * This module enables machines with Intel VT-x extensions to run virtual |
5 | * machines without emulation or binary translation. | 5 | * machines without emulation or binary translation. |
6 | * | 6 | * |
7 | * MMU support | 7 | * MMU support |
8 | * | 8 | * |
9 | * Copyright (C) 2006 Qumranet, Inc. | 9 | * Copyright (C) 2006 Qumranet, Inc. |
10 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. | 10 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. |
11 | * | 11 | * |
12 | * Authors: | 12 | * Authors: |
13 | * Yaniv Kamay <yaniv@qumranet.com> | 13 | * Yaniv Kamay <yaniv@qumranet.com> |
14 | * Avi Kivity <avi@qumranet.com> | 14 | * Avi Kivity <avi@qumranet.com> |
15 | * | 15 | * |
16 | * This work is licensed under the terms of the GNU GPL, version 2. See | 16 | * This work is licensed under the terms of the GNU GPL, version 2. See |
17 | * the COPYING file in the top-level directory. | 17 | * the COPYING file in the top-level directory. |
18 | * | 18 | * |
19 | */ | 19 | */ |
20 | 20 | ||
21 | /* | 21 | /* |
22 | * We need the mmu code to access both 32-bit and 64-bit guest ptes, | 22 | * We need the mmu code to access both 32-bit and 64-bit guest ptes, |
23 | * so the code in this file is compiled twice, once per pte size. | 23 | * so the code in this file is compiled twice, once per pte size. |
24 | */ | 24 | */ |
25 | 25 | ||
26 | #if PTTYPE == 64 | 26 | #if PTTYPE == 64 |
27 | #define pt_element_t u64 | 27 | #define pt_element_t u64 |
28 | #define guest_walker guest_walker64 | 28 | #define guest_walker guest_walker64 |
29 | #define FNAME(name) paging##64_##name | 29 | #define FNAME(name) paging##64_##name |
30 | #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK | 30 | #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK |
31 | #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl) | 31 | #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl) |
32 | #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl) | 32 | #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl) |
33 | #define PT_INDEX(addr, level) PT64_INDEX(addr, level) | 33 | #define PT_INDEX(addr, level) PT64_INDEX(addr, level) |
34 | #define PT_LEVEL_MASK(level) PT64_LEVEL_MASK(level) | 34 | #define PT_LEVEL_MASK(level) PT64_LEVEL_MASK(level) |
35 | #define PT_LEVEL_BITS PT64_LEVEL_BITS | 35 | #define PT_LEVEL_BITS PT64_LEVEL_BITS |
36 | #ifdef CONFIG_X86_64 | 36 | #ifdef CONFIG_X86_64 |
37 | #define PT_MAX_FULL_LEVELS 4 | 37 | #define PT_MAX_FULL_LEVELS 4 |
38 | #define CMPXCHG cmpxchg | 38 | #define CMPXCHG cmpxchg |
39 | #else | 39 | #else |
40 | #define CMPXCHG cmpxchg64 | 40 | #define CMPXCHG cmpxchg64 |
41 | #define PT_MAX_FULL_LEVELS 2 | 41 | #define PT_MAX_FULL_LEVELS 2 |
42 | #endif | 42 | #endif |
43 | #elif PTTYPE == 32 | 43 | #elif PTTYPE == 32 |
44 | #define pt_element_t u32 | 44 | #define pt_element_t u32 |
45 | #define guest_walker guest_walker32 | 45 | #define guest_walker guest_walker32 |
46 | #define FNAME(name) paging##32_##name | 46 | #define FNAME(name) paging##32_##name |
47 | #define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK | 47 | #define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK |
48 | #define PT_LVL_ADDR_MASK(lvl) PT32_LVL_ADDR_MASK(lvl) | 48 | #define PT_LVL_ADDR_MASK(lvl) PT32_LVL_ADDR_MASK(lvl) |
49 | #define PT_LVL_OFFSET_MASK(lvl) PT32_LVL_OFFSET_MASK(lvl) | 49 | #define PT_LVL_OFFSET_MASK(lvl) PT32_LVL_OFFSET_MASK(lvl) |
50 | #define PT_INDEX(addr, level) PT32_INDEX(addr, level) | 50 | #define PT_INDEX(addr, level) PT32_INDEX(addr, level) |
51 | #define PT_LEVEL_MASK(level) PT32_LEVEL_MASK(level) | 51 | #define PT_LEVEL_MASK(level) PT32_LEVEL_MASK(level) |
52 | #define PT_LEVEL_BITS PT32_LEVEL_BITS | 52 | #define PT_LEVEL_BITS PT32_LEVEL_BITS |
53 | #define PT_MAX_FULL_LEVELS 2 | 53 | #define PT_MAX_FULL_LEVELS 2 |
54 | #define CMPXCHG cmpxchg | 54 | #define CMPXCHG cmpxchg |
55 | #else | 55 | #else |
56 | #error Invalid PTTYPE value | 56 | #error Invalid PTTYPE value |
57 | #endif | 57 | #endif |
58 | 58 | ||
59 | #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl) | 59 | #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl) |
60 | #define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PT_PAGE_TABLE_LEVEL) | 60 | #define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PT_PAGE_TABLE_LEVEL) |
61 | 61 | ||
62 | /* | 62 | /* |
63 | * The guest_walker structure emulates the behavior of the hardware page | 63 | * The guest_walker structure emulates the behavior of the hardware page |
64 | * table walker. | 64 | * table walker. |
65 | */ | 65 | */ |
66 | struct guest_walker { | 66 | struct guest_walker { |
67 | int level; | 67 | int level; |
68 | gfn_t table_gfn[PT_MAX_FULL_LEVELS]; | 68 | gfn_t table_gfn[PT_MAX_FULL_LEVELS]; |
69 | pt_element_t ptes[PT_MAX_FULL_LEVELS]; | 69 | pt_element_t ptes[PT_MAX_FULL_LEVELS]; |
70 | gpa_t pte_gpa[PT_MAX_FULL_LEVELS]; | 70 | gpa_t pte_gpa[PT_MAX_FULL_LEVELS]; |
71 | unsigned pt_access; | 71 | unsigned pt_access; |
72 | unsigned pte_access; | 72 | unsigned pte_access; |
73 | gfn_t gfn; | 73 | gfn_t gfn; |
74 | u32 error_code; | 74 | u32 error_code; |
75 | }; | 75 | }; |
76 | 76 | ||
77 | static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl) | 77 | static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl) |
78 | { | 78 | { |
79 | return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT; | 79 | return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT; |
80 | } | 80 | } |
81 | 81 | ||
82 | static bool FNAME(cmpxchg_gpte)(struct kvm *kvm, | 82 | static bool FNAME(cmpxchg_gpte)(struct kvm *kvm, |
83 | gfn_t table_gfn, unsigned index, | 83 | gfn_t table_gfn, unsigned index, |
84 | pt_element_t orig_pte, pt_element_t new_pte) | 84 | pt_element_t orig_pte, pt_element_t new_pte) |
85 | { | 85 | { |
86 | pt_element_t ret; | 86 | pt_element_t ret; |
87 | pt_element_t *table; | 87 | pt_element_t *table; |
88 | struct page *page; | 88 | struct page *page; |
89 | 89 | ||
90 | page = gfn_to_page(kvm, table_gfn); | 90 | page = gfn_to_page(kvm, table_gfn); |
91 | 91 | ||
92 | table = kmap_atomic(page, KM_USER0); | 92 | table = kmap_atomic(page, KM_USER0); |
93 | ret = CMPXCHG(&table[index], orig_pte, new_pte); | 93 | ret = CMPXCHG(&table[index], orig_pte, new_pte); |
94 | kunmap_atomic(table, KM_USER0); | 94 | kunmap_atomic(table, KM_USER0); |
95 | 95 | ||
96 | kvm_release_page_dirty(page); | 96 | kvm_release_page_dirty(page); |
97 | 97 | ||
98 | return (ret != orig_pte); | 98 | return (ret != orig_pte); |
99 | } | 99 | } |
100 | 100 | ||
101 | static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) | 101 | static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) |
102 | { | 102 | { |
103 | unsigned access; | 103 | unsigned access; |
104 | 104 | ||
105 | access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK; | 105 | access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK; |
106 | #if PTTYPE == 64 | 106 | #if PTTYPE == 64 |
107 | if (is_nx(vcpu)) | 107 | if (is_nx(vcpu)) |
108 | access &= ~(gpte >> PT64_NX_SHIFT); | 108 | access &= ~(gpte >> PT64_NX_SHIFT); |
109 | #endif | 109 | #endif |
110 | return access; | 110 | return access; |
111 | } | 111 | } |
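
The bit layout does the work here: ACC_EXEC_MASK is bit 0 and PT64_NX_SHIFT is 63, so (gpte >> PT64_NX_SHIFT) is exactly 1 when NX is set and the &= ~ strips only the execute bit. One worked value, with masks as in the mmu headers of this era:

        /*
         * gpte = NX | PT_WRITABLE_MASK | PT_PRESENT_MASK:
         *   access = (gpte & (W|U)) | ACC_EXEC_MASK = 0x2 | 0x1 = 0x3
         *   access &= ~(gpte >> 63)  ->  0x3 & ~1 = 0x2
         * i.e. writable but not executable, as the guest requested.
         */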
112 | 112 | ||
113 | /* | 113 | /* |
114 | * Fetch a guest pte for a guest virtual address | 114 | * Fetch a guest pte for a guest virtual address |
115 | */ | 115 | */ |
116 | static int FNAME(walk_addr)(struct guest_walker *walker, | 116 | static int FNAME(walk_addr)(struct guest_walker *walker, |
117 | struct kvm_vcpu *vcpu, gva_t addr, | 117 | struct kvm_vcpu *vcpu, gva_t addr, |
118 | int write_fault, int user_fault, int fetch_fault) | 118 | int write_fault, int user_fault, int fetch_fault) |
119 | { | 119 | { |
120 | pt_element_t pte; | 120 | pt_element_t pte; |
121 | gfn_t table_gfn; | 121 | gfn_t table_gfn; |
122 | unsigned index, pt_access, pte_access; | 122 | unsigned index, pt_access, pte_access; |
123 | gpa_t pte_gpa; | 123 | gpa_t pte_gpa; |
124 | int rsvd_fault = 0; | 124 | int rsvd_fault = 0; |
125 | 125 | ||
126 | trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault, | 126 | trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault, |
127 | fetch_fault); | 127 | fetch_fault); |
128 | walk: | 128 | walk: |
129 | walker->level = vcpu->arch.mmu.root_level; | 129 | walker->level = vcpu->arch.mmu.root_level; |
130 | pte = vcpu->arch.cr3; | 130 | pte = vcpu->arch.cr3; |
131 | #if PTTYPE == 64 | 131 | #if PTTYPE == 64 |
132 | if (!is_long_mode(vcpu)) { | 132 | if (!is_long_mode(vcpu)) { |
133 | pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3); | 133 | pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3); |
134 | trace_kvm_mmu_paging_element(pte, walker->level); | 134 | trace_kvm_mmu_paging_element(pte, walker->level); |
135 | if (!is_present_gpte(pte)) | 135 | if (!is_present_gpte(pte)) |
136 | goto not_present; | 136 | goto not_present; |
137 | --walker->level; | 137 | --walker->level; |
138 | } | 138 | } |
139 | #endif | 139 | #endif |
140 | ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) || | 140 | ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) || |
141 | (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0); | 141 | (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0); |
142 | 142 | ||
143 | pt_access = ACC_ALL; | 143 | pt_access = ACC_ALL; |
144 | 144 | ||
145 | for (;;) { | 145 | for (;;) { |
146 | index = PT_INDEX(addr, walker->level); | 146 | index = PT_INDEX(addr, walker->level); |
147 | 147 | ||
148 | table_gfn = gpte_to_gfn(pte); | 148 | table_gfn = gpte_to_gfn(pte); |
149 | pte_gpa = gfn_to_gpa(table_gfn); | 149 | pte_gpa = gfn_to_gpa(table_gfn); |
150 | pte_gpa += index * sizeof(pt_element_t); | 150 | pte_gpa += index * sizeof(pt_element_t); |
151 | walker->table_gfn[walker->level - 1] = table_gfn; | 151 | walker->table_gfn[walker->level - 1] = table_gfn; |
152 | walker->pte_gpa[walker->level - 1] = pte_gpa; | 152 | walker->pte_gpa[walker->level - 1] = pte_gpa; |
153 | 153 | ||
154 | if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) | 154 | if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) |
155 | goto not_present; | 155 | goto not_present; |
156 | 156 | ||
157 | trace_kvm_mmu_paging_element(pte, walker->level); | 157 | trace_kvm_mmu_paging_element(pte, walker->level); |
158 | 158 | ||
159 | if (!is_present_gpte(pte)) | 159 | if (!is_present_gpte(pte)) |
160 | goto not_present; | 160 | goto not_present; |
161 | 161 | ||
162 | rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker->level); | 162 | rsvd_fault = is_rsvd_bits_set(vcpu, pte, walker->level); |
163 | if (rsvd_fault) | 163 | if (rsvd_fault) |
164 | goto access_error; | 164 | goto access_error; |
165 | 165 | ||
166 | if (write_fault && !is_writable_pte(pte)) | 166 | if (write_fault && !is_writable_pte(pte)) |
167 | if (user_fault || is_write_protection(vcpu)) | 167 | if (user_fault || is_write_protection(vcpu)) |
168 | goto access_error; | 168 | goto access_error; |
169 | 169 | ||
170 | if (user_fault && !(pte & PT_USER_MASK)) | 170 | if (user_fault && !(pte & PT_USER_MASK)) |
171 | goto access_error; | 171 | goto access_error; |
172 | 172 | ||
173 | #if PTTYPE == 64 | 173 | #if PTTYPE == 64 |
174 | if (fetch_fault && (pte & PT64_NX_MASK)) | 174 | if (fetch_fault && (pte & PT64_NX_MASK)) |
175 | goto access_error; | 175 | goto access_error; |
176 | #endif | 176 | #endif |
177 | 177 | ||
178 | if (!(pte & PT_ACCESSED_MASK)) { | 178 | if (!(pte & PT_ACCESSED_MASK)) { |
179 | trace_kvm_mmu_set_accessed_bit(table_gfn, index, | 179 | trace_kvm_mmu_set_accessed_bit(table_gfn, index, |
180 | sizeof(pte)); | 180 | sizeof(pte)); |
181 | if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, | 181 | if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, |
182 | index, pte, pte|PT_ACCESSED_MASK)) | 182 | index, pte, pte|PT_ACCESSED_MASK)) |
183 | goto walk; | 183 | goto walk; |
184 | mark_page_dirty(vcpu->kvm, table_gfn); | 184 | mark_page_dirty(vcpu->kvm, table_gfn); |
185 | pte |= PT_ACCESSED_MASK; | 185 | pte |= PT_ACCESSED_MASK; |
186 | } | 186 | } |
187 | 187 | ||
188 | pte_access = pt_access & FNAME(gpte_access)(vcpu, pte); | 188 | pte_access = pt_access & FNAME(gpte_access)(vcpu, pte); |
189 | 189 | ||
190 | walker->ptes[walker->level - 1] = pte; | 190 | walker->ptes[walker->level - 1] = pte; |
191 | 191 | ||
192 | if ((walker->level == PT_PAGE_TABLE_LEVEL) || | 192 | if ((walker->level == PT_PAGE_TABLE_LEVEL) || |
193 | ((walker->level == PT_DIRECTORY_LEVEL) && | 193 | ((walker->level == PT_DIRECTORY_LEVEL) && |
194 | is_large_pte(pte) && | 194 | is_large_pte(pte) && |
195 | (PTTYPE == 64 || is_pse(vcpu))) || | 195 | (PTTYPE == 64 || is_pse(vcpu))) || |
196 | ((walker->level == PT_PDPE_LEVEL) && | 196 | ((walker->level == PT_PDPE_LEVEL) && |
197 | is_large_pte(pte) && | 197 | is_large_pte(pte) && |
198 | is_long_mode(vcpu))) { | 198 | is_long_mode(vcpu))) { |
199 | int lvl = walker->level; | 199 | int lvl = walker->level; |
200 | 200 | ||
201 | walker->gfn = gpte_to_gfn_lvl(pte, lvl); | 201 | walker->gfn = gpte_to_gfn_lvl(pte, lvl); |
202 | walker->gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) | 202 | walker->gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) |
203 | >> PAGE_SHIFT; | 203 | >> PAGE_SHIFT; |
204 | 204 | ||
205 | if (PTTYPE == 32 && | 205 | if (PTTYPE == 32 && |
206 | walker->level == PT_DIRECTORY_LEVEL && | 206 | walker->level == PT_DIRECTORY_LEVEL && |
207 | is_cpuid_PSE36()) | 207 | is_cpuid_PSE36()) |
208 | walker->gfn += pse36_gfn_delta(pte); | 208 | walker->gfn += pse36_gfn_delta(pte); |
209 | 209 | ||
210 | break; | 210 | break; |
211 | } | 211 | } |
212 | 212 | ||
213 | pt_access = pte_access; | 213 | pt_access = pte_access; |
214 | --walker->level; | 214 | --walker->level; |
215 | } | 215 | } |
216 | 216 | ||
217 | if (write_fault && !is_dirty_gpte(pte)) { | 217 | if (write_fault && !is_dirty_gpte(pte)) { |
218 | bool ret; | 218 | bool ret; |
219 | 219 | ||
220 | trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); | 220 | trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); |
221 | ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, | 221 | ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, |
222 | pte|PT_DIRTY_MASK); | 222 | pte|PT_DIRTY_MASK); |
223 | if (ret) | 223 | if (ret) |
224 | goto walk; | 224 | goto walk; |
225 | mark_page_dirty(vcpu->kvm, table_gfn); | 225 | mark_page_dirty(vcpu->kvm, table_gfn); |
226 | pte |= PT_DIRTY_MASK; | 226 | pte |= PT_DIRTY_MASK; |
227 | walker->ptes[walker->level - 1] = pte; | 227 | walker->ptes[walker->level - 1] = pte; |
228 | } | 228 | } |
229 | 229 | ||
230 | walker->pt_access = pt_access; | 230 | walker->pt_access = pt_access; |
231 | walker->pte_access = pte_access; | 231 | walker->pte_access = pte_access; |
232 | pgprintk("%s: pte %llx pte_access %x pt_access %x\n", | 232 | pgprintk("%s: pte %llx pte_access %x pt_access %x\n", |
233 | __func__, (u64)pte, pte_access, pt_access); | 233 | __func__, (u64)pte, pte_access, pt_access); |
234 | return 1; | 234 | return 1; |
235 | 235 | ||
236 | not_present: | 236 | not_present: |
237 | walker->error_code = 0; | 237 | walker->error_code = 0; |
238 | goto err; | 238 | goto err; |
239 | 239 | ||
240 | access_error: | 240 | access_error: |
241 | walker->error_code = PFERR_PRESENT_MASK; | 241 | walker->error_code = PFERR_PRESENT_MASK; |
242 | 242 | ||
243 | err: | 243 | err: |
244 | if (write_fault) | 244 | if (write_fault) |
245 | walker->error_code |= PFERR_WRITE_MASK; | 245 | walker->error_code |= PFERR_WRITE_MASK; |
246 | if (user_fault) | 246 | if (user_fault) |
247 | walker->error_code |= PFERR_USER_MASK; | 247 | walker->error_code |= PFERR_USER_MASK; |
248 | if (fetch_fault) | 248 | if (fetch_fault) |
249 | walker->error_code |= PFERR_FETCH_MASK; | 249 | walker->error_code |= PFERR_FETCH_MASK; |
250 | if (rsvd_fault) | 250 | if (rsvd_fault) |
251 | walker->error_code |= PFERR_RSVD_MASK; | 251 | walker->error_code |= PFERR_RSVD_MASK; |
252 | trace_kvm_mmu_walker_error(walker->error_code); | 252 | trace_kvm_mmu_walker_error(walker->error_code); |
253 | return 0; | 253 | return 0; |
254 | } | 254 | } |
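
The error path above assembles the architectural x86 page-fault error code one bit at a time. As a minimal reference (the PFERR_* values below are the standard x86 error-code bit positions, reproduced here for illustration, not part of this patch):

    #define PFERR_PRESENT_MASK (1U << 0)  /* fault on a present page        */
    #define PFERR_WRITE_MASK   (1U << 1)  /* fault was a write              */
    #define PFERR_USER_MASK    (1U << 2)  /* fault taken in user mode       */
    #define PFERR_RSVD_MASK    (1U << 3)  /* reserved bit set in a pte      */
    #define PFERR_FETCH_MASK   (1U << 4)  /* fault was an instruction fetch */

    /* e.g. a user-mode write that hits a reserved-bit fault reaches the
     * access_error label and leaves with:
     *   error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK
     *              | PFERR_USER_MASK | PFERR_RSVD_MASK = 0x0f
     */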
255 | 255 | ||
256 | static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, | 256 | static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, |
257 | u64 *spte, const void *pte) | 257 | u64 *spte, const void *pte) |
258 | { | 258 | { |
259 | pt_element_t gpte; | 259 | pt_element_t gpte; |
260 | unsigned pte_access; | 260 | unsigned pte_access; |
261 | pfn_t pfn; | 261 | pfn_t pfn; |
262 | u64 new_spte; | 262 | u64 new_spte; |
263 | 263 | ||
264 | gpte = *(const pt_element_t *)pte; | 264 | gpte = *(const pt_element_t *)pte; |
265 | if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) { | 265 | if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) { |
266 | if (!is_present_gpte(gpte)) { | 266 | if (!is_present_gpte(gpte)) { |
267 | if (sp->unsync) | 267 | if (sp->unsync) |
268 | new_spte = shadow_trap_nonpresent_pte; | 268 | new_spte = shadow_trap_nonpresent_pte; |
269 | else | 269 | else |
270 | new_spte = shadow_notrap_nonpresent_pte; | 270 | new_spte = shadow_notrap_nonpresent_pte; |
271 | __set_spte(spte, new_spte); | 271 | __set_spte(spte, new_spte); |
272 | } | 272 | } |
273 | return; | 273 | return; |
274 | } | 274 | } |
275 | pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte); | 275 | pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte); |
276 | pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte); | 276 | pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte); |
277 | if (gpte_to_gfn(gpte) != vcpu->arch.update_pte.gfn) | 277 | if (gpte_to_gfn(gpte) != vcpu->arch.update_pte.gfn) |
278 | return; | 278 | return; |
279 | pfn = vcpu->arch.update_pte.pfn; | 279 | pfn = vcpu->arch.update_pte.pfn; |
280 | if (is_error_pfn(pfn)) | 280 | if (is_error_pfn(pfn)) |
281 | return; | 281 | return; |
282 | if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq)) | 282 | if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq)) |
283 | return; | 283 | return; |
284 | kvm_get_pfn(pfn); | 284 | kvm_get_pfn(pfn); |
285 | /* | 285 | /* |
286 | * we call mmu_set_spte() with reset_host_protection = true because | 286 | * we call mmu_set_spte() with reset_host_protection = true because |
287 | * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1). | 287 | * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1). |
288 | */ | 288 | */ |
289 | mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0, | 289 | mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0, |
290 | is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL, | 290 | is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL, |
291 | gpte_to_gfn(gpte), pfn, true, true); | 291 | gpte_to_gfn(gpte), pfn, true, true); |
292 | } | 292 | } |
293 | 293 | ||
294 | /* | 294 | /* |
295 | * Fetch a shadow pte for a specific level in the paging hierarchy. | 295 | * Fetch a shadow pte for a specific level in the paging hierarchy. |
296 | */ | 296 | */ |
297 | static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, | 297 | static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, |
298 | struct guest_walker *gw, | 298 | struct guest_walker *gw, |
299 | int user_fault, int write_fault, int hlevel, | 299 | int user_fault, int write_fault, int hlevel, |
300 | int *ptwrite, pfn_t pfn) | 300 | int *ptwrite, pfn_t pfn) |
301 | { | 301 | { |
302 | unsigned access = gw->pt_access; | 302 | unsigned access = gw->pt_access; |
303 | struct kvm_mmu_page *sp; | 303 | struct kvm_mmu_page *sp; |
304 | u64 spte, *sptep = NULL; | 304 | u64 spte, *sptep = NULL; |
305 | int direct; | 305 | int direct; |
306 | gfn_t table_gfn; | 306 | gfn_t table_gfn; |
307 | int r; | 307 | int r; |
308 | int level; | 308 | int level; |
309 | pt_element_t curr_pte; | 309 | pt_element_t curr_pte; |
310 | struct kvm_shadow_walk_iterator iterator; | 310 | struct kvm_shadow_walk_iterator iterator; |
311 | 311 | ||
312 | if (!is_present_gpte(gw->ptes[gw->level - 1])) | 312 | if (!is_present_gpte(gw->ptes[gw->level - 1])) |
313 | return NULL; | 313 | return NULL; |
314 | 314 | ||
315 | for_each_shadow_entry(vcpu, addr, iterator) { | 315 | for_each_shadow_entry(vcpu, addr, iterator) { |
316 | level = iterator.level; | 316 | level = iterator.level; |
317 | sptep = iterator.sptep; | 317 | sptep = iterator.sptep; |
318 | if (iterator.level == hlevel) { | 318 | if (iterator.level == hlevel) { |
319 | mmu_set_spte(vcpu, sptep, access, | 319 | mmu_set_spte(vcpu, sptep, access, |
320 | gw->pte_access & access, | 320 | gw->pte_access & access, |
321 | user_fault, write_fault, | 321 | user_fault, write_fault, |
322 | is_dirty_gpte(gw->ptes[gw->level-1]), | 322 | is_dirty_gpte(gw->ptes[gw->level-1]), |
323 | ptwrite, level, | 323 | ptwrite, level, |
324 | gw->gfn, pfn, false, true); | 324 | gw->gfn, pfn, false, true); |
325 | break; | 325 | break; |
326 | } | 326 | } |
327 | 327 | ||
328 | if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) | 328 | if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) |
329 | continue; | 329 | continue; |
330 | 330 | ||
331 | if (is_large_pte(*sptep)) { | 331 | if (is_large_pte(*sptep)) { |
332 | rmap_remove(vcpu->kvm, sptep); | 332 | rmap_remove(vcpu->kvm, sptep); |
333 | __set_spte(sptep, shadow_trap_nonpresent_pte); | 333 | __set_spte(sptep, shadow_trap_nonpresent_pte); |
334 | kvm_flush_remote_tlbs(vcpu->kvm); | 334 | kvm_flush_remote_tlbs(vcpu->kvm); |
335 | } | 335 | } |
336 | 336 | ||
337 | if (level <= gw->level) { | 337 | if (level <= gw->level) { |
338 | int delta = level - gw->level + 1; | 338 | int delta = level - gw->level + 1; |
339 | direct = 1; | 339 | direct = 1; |
340 | if (!is_dirty_gpte(gw->ptes[level - delta])) | 340 | if (!is_dirty_gpte(gw->ptes[level - delta])) |
341 | access &= ~ACC_WRITE_MASK; | 341 | access &= ~ACC_WRITE_MASK; |
342 | /* | 342 | /* |
343 | * This is a large guest page backed by small host pages, | 343 | * This is a large guest page backed by small host pages, |
344 | * so we set @direct (@sp->role.direct) = 1 and set | 344 | * so we set @direct (@sp->role.direct) = 1 and set |
345 | * @table_gfn (@sp->gfn) to the base page frame of the linear | 345 | * @table_gfn (@sp->gfn) to the base page frame of the linear |
346 | * translation. | 346 | * translation. |
347 | */ | 347 | */ |
348 | table_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); | 348 | table_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); |
349 | access &= gw->pte_access; | 349 | access &= gw->pte_access; |
350 | } else { | 350 | } else { |
351 | direct = 0; | 351 | direct = 0; |
352 | table_gfn = gw->table_gfn[level - 2]; | 352 | table_gfn = gw->table_gfn[level - 2]; |
353 | } | 353 | } |
354 | sp = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1, | 354 | sp = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1, |
355 | direct, access, sptep); | 355 | direct, access, sptep); |
356 | if (!direct) { | 356 | if (!direct) { |
357 | r = kvm_read_guest_atomic(vcpu->kvm, | 357 | r = kvm_read_guest_atomic(vcpu->kvm, |
358 | gw->pte_gpa[level - 2], | 358 | gw->pte_gpa[level - 2], |
359 | &curr_pte, sizeof(curr_pte)); | 359 | &curr_pte, sizeof(curr_pte)); |
360 | if (r || curr_pte != gw->ptes[level - 2]) { | 360 | if (r || curr_pte != gw->ptes[level - 2]) { |
361 | kvm_mmu_put_page(sp, sptep); | 361 | kvm_mmu_put_page(sp, sptep); |
362 | kvm_release_pfn_clean(pfn); | 362 | kvm_release_pfn_clean(pfn); |
363 | sptep = NULL; | 363 | sptep = NULL; |
364 | break; | 364 | break; |
365 | } | 365 | } |
366 | } | 366 | } |
367 | 367 | ||
368 | spte = __pa(sp->spt) | 368 | spte = __pa(sp->spt) |
369 | | PT_PRESENT_MASK | PT_ACCESSED_MASK | 369 | | PT_PRESENT_MASK | PT_ACCESSED_MASK |
370 | | PT_WRITABLE_MASK | PT_USER_MASK; | 370 | | PT_WRITABLE_MASK | PT_USER_MASK; |
371 | *sptep = spte; | 371 | *sptep = spte; |
372 | } | 372 | } |
373 | 373 | ||
374 | return sptep; | 374 | return sptep; |
375 | } | 375 | } |
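
When a large guest page is backed by small host pages (the @direct case in the comment above), table_gfn is the guest frame rounded down to the large-page boundary. A worked example at PT_DIRECTORY_LEVEL, where KVM_PAGES_PER_HPAGE(level) is 512 on x86 with 4 KiB base pages (values illustrative):

    /* gw->gfn = 0x12345, level = PT_DIRECTORY_LEVEL:       */
    /* KVM_PAGES_PER_HPAGE(level) - 1 = 0x1ff               */
    /* table_gfn = 0x12345 & ~0x1ff = 0x12200               */
    /* i.e. the first small frame of the 2 MiB guest page   */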
376 | 376 | ||
377 | /* | 377 | /* |
378 | * Page fault handler. There are several causes for a page fault: | 378 | * Page fault handler. There are several causes for a page fault: |
379 | * - there is no shadow pte for the guest pte | 379 | * - there is no shadow pte for the guest pte |
380 | * - write access through a shadow pte marked read only so that we can set | 380 | * - write access through a shadow pte marked read only so that we can set |
381 | * the dirty bit | 381 | * the dirty bit |
382 | * - write access to a shadow pte marked read only so we can update the page | 382 | * - write access to a shadow pte marked read only so we can update the page |
383 | * dirty bitmap, when userspace requests it | 383 | * dirty bitmap, when userspace requests it |
384 | * - mmio access; in this case we will never install a present shadow pte | 384 | * - mmio access; in this case we will never install a present shadow pte |
385 | * - normal guest page fault due to the guest pte marked not present, not | 385 | * - normal guest page fault due to the guest pte marked not present, not |
386 | * writable, or not executable | 386 | * writable, or not executable |
387 | * | 387 | * |
388 | * Returns: 1 if we need to emulate the instruction, 0 otherwise, or | 388 | * Returns: 1 if we need to emulate the instruction, 0 otherwise, or |
389 | * a negative value on error. | 389 | * a negative value on error. |
390 | */ | 390 | */ |
391 | static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, | 391 | static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, |
392 | u32 error_code) | 392 | u32 error_code) |
393 | { | 393 | { |
394 | int write_fault = error_code & PFERR_WRITE_MASK; | 394 | int write_fault = error_code & PFERR_WRITE_MASK; |
395 | int user_fault = error_code & PFERR_USER_MASK; | 395 | int user_fault = error_code & PFERR_USER_MASK; |
396 | int fetch_fault = error_code & PFERR_FETCH_MASK; | 396 | int fetch_fault = error_code & PFERR_FETCH_MASK; |
397 | struct guest_walker walker; | 397 | struct guest_walker walker; |
398 | u64 *sptep; | 398 | u64 *sptep; |
399 | int write_pt = 0; | 399 | int write_pt = 0; |
400 | int r; | 400 | int r; |
401 | pfn_t pfn; | 401 | pfn_t pfn; |
402 | int level = PT_PAGE_TABLE_LEVEL; | 402 | int level = PT_PAGE_TABLE_LEVEL; |
403 | unsigned long mmu_seq; | 403 | unsigned long mmu_seq; |
404 | 404 | ||
405 | pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); | 405 | pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); |
406 | kvm_mmu_audit(vcpu, "pre page fault"); | 406 | kvm_mmu_audit(vcpu, "pre page fault"); |
407 | 407 | ||
408 | r = mmu_topup_memory_caches(vcpu); | 408 | r = mmu_topup_memory_caches(vcpu); |
409 | if (r) | 409 | if (r) |
410 | return r; | 410 | return r; |
411 | 411 | ||
412 | /* | 412 | /* |
413 | * Look up the guest pte for the faulting address. | 413 | * Look up the guest pte for the faulting address. |
414 | */ | 414 | */ |
415 | r = FNAME(walk_addr)(&walker, vcpu, addr, write_fault, user_fault, | 415 | r = FNAME(walk_addr)(&walker, vcpu, addr, write_fault, user_fault, |
416 | fetch_fault); | 416 | fetch_fault); |
417 | 417 | ||
418 | /* | 418 | /* |
419 | * The page is not mapped by the guest. Let the guest handle it. | 419 | * The page is not mapped by the guest. Let the guest handle it. |
420 | */ | 420 | */ |
421 | if (!r) { | 421 | if (!r) { |
422 | pgprintk("%s: guest page fault\n", __func__); | 422 | pgprintk("%s: guest page fault\n", __func__); |
423 | inject_page_fault(vcpu, addr, walker.error_code); | 423 | inject_page_fault(vcpu, addr, walker.error_code); |
424 | vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ | 424 | vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ |
425 | return 0; | 425 | return 0; |
426 | } | 426 | } |
427 | 427 | ||
428 | if (walker.level >= PT_DIRECTORY_LEVEL) { | 428 | if (walker.level >= PT_DIRECTORY_LEVEL) { |
429 | level = min(walker.level, mapping_level(vcpu, walker.gfn)); | 429 | level = min(walker.level, mapping_level(vcpu, walker.gfn)); |
430 | walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); | 430 | walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); |
431 | } | 431 | } |
432 | 432 | ||
433 | mmu_seq = vcpu->kvm->mmu_notifier_seq; | 433 | mmu_seq = vcpu->kvm->mmu_notifier_seq; |
434 | smp_rmb(); | 434 | smp_rmb(); |
435 | pfn = gfn_to_pfn(vcpu->kvm, walker.gfn); | 435 | pfn = gfn_to_pfn(vcpu->kvm, walker.gfn); |
436 | 436 | ||
437 | /* mmio */ | 437 | /* mmio */ |
438 | if (is_error_pfn(pfn)) | 438 | if (is_error_pfn(pfn)) |
439 | return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn); | 439 | return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn); |
440 | 440 | ||
441 | spin_lock(&vcpu->kvm->mmu_lock); | 441 | spin_lock(&vcpu->kvm->mmu_lock); |
442 | if (mmu_notifier_retry(vcpu, mmu_seq)) | 442 | if (mmu_notifier_retry(vcpu, mmu_seq)) |
443 | goto out_unlock; | 443 | goto out_unlock; |
444 | kvm_mmu_free_some_pages(vcpu); | 444 | kvm_mmu_free_some_pages(vcpu); |
445 | sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, | 445 | sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, |
446 | level, &write_pt, pfn); | 446 | level, &write_pt, pfn); |
447 | (void)sptep; | 447 | (void)sptep; |
448 | pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__, | 448 | pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__, |
449 | sptep, *sptep, write_pt); | 449 | sptep, *sptep, write_pt); |
450 | 450 | ||
451 | if (!write_pt) | 451 | if (!write_pt) |
452 | vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ | 452 | vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ |
453 | 453 | ||
454 | ++vcpu->stat.pf_fixed; | 454 | ++vcpu->stat.pf_fixed; |
455 | kvm_mmu_audit(vcpu, "post page fault (fixed)"); | 455 | kvm_mmu_audit(vcpu, "post page fault (fixed)"); |
456 | spin_unlock(&vcpu->kvm->mmu_lock); | 456 | spin_unlock(&vcpu->kvm->mmu_lock); |
457 | 457 | ||
458 | return write_pt; | 458 | return write_pt; |
459 | 459 | ||
460 | out_unlock: | 460 | out_unlock: |
461 | spin_unlock(&vcpu->kvm->mmu_lock); | 461 | spin_unlock(&vcpu->kvm->mmu_lock); |
462 | kvm_release_pfn_clean(pfn); | 462 | kvm_release_pfn_clean(pfn); |
463 | return 0; | 463 | return 0; |
464 | } | 464 | } |
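
The fault path above follows the mmu-notifier sequence pattern: snapshot mmu_notifier_seq, do the possibly-sleeping gfn-to-pfn lookup without holding mmu_lock, then take the lock and back out if an invalidation ran in between (mmu_notifier_retry() also accounts for invalidations still in progress). A minimal sketch of the pattern:

    unsigned long seq = kvm->mmu_notifier_seq;  /* snapshot first           */
    smp_rmb();                                  /* order against notifier   */
    pfn = gfn_to_pfn(kvm, gfn);                 /* may sleep; runs unlocked */

    spin_lock(&kvm->mmu_lock);
    if (mmu_notifier_retry(vcpu, seq)) {        /* invalidation raced us    */
            spin_unlock(&kvm->mmu_lock);
            kvm_release_pfn_clean(pfn);
            return 0;                           /* let the guest refault    */
    }
    /* ... install the mapping while still holding mmu_lock ... */
    spin_unlock(&kvm->mmu_lock);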
465 | 465 | ||
466 | static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) | 466 | static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) |
467 | { | 467 | { |
468 | struct kvm_shadow_walk_iterator iterator; | 468 | struct kvm_shadow_walk_iterator iterator; |
469 | struct kvm_mmu_page *sp; | 469 | struct kvm_mmu_page *sp; |
470 | gpa_t pte_gpa = -1; | 470 | gpa_t pte_gpa = -1; |
471 | int level; | 471 | int level; |
472 | u64 *sptep; | 472 | u64 *sptep; |
473 | int need_flush = 0; | 473 | int need_flush = 0; |
474 | 474 | ||
475 | spin_lock(&vcpu->kvm->mmu_lock); | 475 | spin_lock(&vcpu->kvm->mmu_lock); |
476 | 476 | ||
477 | for_each_shadow_entry(vcpu, gva, iterator) { | 477 | for_each_shadow_entry(vcpu, gva, iterator) { |
478 | level = iterator.level; | 478 | level = iterator.level; |
479 | sptep = iterator.sptep; | 479 | sptep = iterator.sptep; |
480 | 480 | ||
481 | sp = page_header(__pa(sptep)); | 481 | sp = page_header(__pa(sptep)); |
482 | if (is_last_spte(*sptep, level)) { | 482 | if (is_last_spte(*sptep, level)) { |
483 | int offset, shift; | 483 | int offset, shift; |
484 | 484 | ||
485 | if (!sp->unsync) | 485 | if (!sp->unsync) |
486 | break; | 486 | break; |
487 | 487 | ||
488 | shift = PAGE_SHIFT - | 488 | shift = PAGE_SHIFT - |
489 | (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level; | 489 | (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level; |
490 | offset = sp->role.quadrant << shift; | 490 | offset = sp->role.quadrant << shift; |
491 | 491 | ||
492 | pte_gpa = (sp->gfn << PAGE_SHIFT) + offset; | 492 | pte_gpa = (sp->gfn << PAGE_SHIFT) + offset; |
493 | pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t); | 493 | pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t); |
494 | 494 | ||
495 | if (is_shadow_present_pte(*sptep)) { | 495 | if (is_shadow_present_pte(*sptep)) { |
496 | rmap_remove(vcpu->kvm, sptep); | 496 | rmap_remove(vcpu->kvm, sptep); |
497 | if (is_large_pte(*sptep)) | 497 | if (is_large_pte(*sptep)) |
498 | --vcpu->kvm->stat.lpages; | 498 | --vcpu->kvm->stat.lpages; |
499 | need_flush = 1; | 499 | need_flush = 1; |
500 | } | 500 | } |
501 | __set_spte(sptep, shadow_trap_nonpresent_pte); | 501 | __set_spte(sptep, shadow_trap_nonpresent_pte); |
502 | break; | 502 | break; |
503 | } | 503 | } |
504 | 504 | ||
505 | if (!is_shadow_present_pte(*sptep) || !sp->unsync_children) | 505 | if (!is_shadow_present_pte(*sptep) || !sp->unsync_children) |
506 | break; | 506 | break; |
507 | } | 507 | } |
508 | 508 | ||
509 | if (need_flush) | 509 | if (need_flush) |
510 | kvm_flush_remote_tlbs(vcpu->kvm); | 510 | kvm_flush_remote_tlbs(vcpu->kvm); |
511 | 511 | ||
512 | atomic_inc(&vcpu->kvm->arch.invlpg_counter); | 512 | atomic_inc(&vcpu->kvm->arch.invlpg_counter); |
513 | 513 | ||
514 | spin_unlock(&vcpu->kvm->mmu_lock); | 514 | spin_unlock(&vcpu->kvm->mmu_lock); |
515 | 515 | ||
516 | if (pte_gpa == -1) | 516 | if (pte_gpa == -1) |
517 | return; | 517 | return; |
518 | 518 | ||
519 | if (mmu_topup_memory_caches(vcpu)) | 519 | if (mmu_topup_memory_caches(vcpu)) |
520 | return; | 520 | return; |
521 | kvm_mmu_pte_write(vcpu, pte_gpa, NULL, sizeof(pt_element_t), 0); | 521 | kvm_mmu_pte_write(vcpu, pte_gpa, NULL, sizeof(pt_element_t), 0); |
522 | } | 522 | } |
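
invlpg recovers the guest-physical address of the guest PTE that the spte shadows. Plugging in concrete, purely illustrative numbers for the 64-bit case (role.quadrant = 0, so the quadrant offset is 0):

    /* sp->gfn = 0x1000, sptep at index 0x1a3 within sp->spt:         */
    /* pte_gpa  = 0x1000 << PAGE_SHIFT         = 0x1000000            */
    /* pte_gpa += 0x1a3 * sizeof(pt_element_t) = 0x1000000 + 0xd18    */
    /* -> the guest PTE to be written back lives at gpa 0x1000d18     */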
523 | 523 | ||
524 | static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access, | 524 | static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access, |
525 | u32 *error) | 525 | u32 *error) |
526 | { | 526 | { |
527 | struct guest_walker walker; | 527 | struct guest_walker walker; |
528 | gpa_t gpa = UNMAPPED_GVA; | 528 | gpa_t gpa = UNMAPPED_GVA; |
529 | int r; | 529 | int r; |
530 | 530 | ||
531 | r = FNAME(walk_addr)(&walker, vcpu, vaddr, | 531 | r = FNAME(walk_addr)(&walker, vcpu, vaddr, |
532 | !!(access & PFERR_WRITE_MASK), | 532 | !!(access & PFERR_WRITE_MASK), |
533 | !!(access & PFERR_USER_MASK), | 533 | !!(access & PFERR_USER_MASK), |
534 | !!(access & PFERR_FETCH_MASK)); | 534 | !!(access & PFERR_FETCH_MASK)); |
535 | 535 | ||
536 | if (r) { | 536 | if (r) { |
537 | gpa = gfn_to_gpa(walker.gfn); | 537 | gpa = gfn_to_gpa(walker.gfn); |
538 | gpa |= vaddr & ~PAGE_MASK; | 538 | gpa |= vaddr & ~PAGE_MASK; |
539 | } else if (error) | 539 | } else if (error) |
540 | *error = walker.error_code; | 540 | *error = walker.error_code; |
541 | 541 | ||
542 | return gpa; | 542 | return gpa; |
543 | } | 543 | } |
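
gva_to_gpa composes its result from the walker's frame number plus the page offset of the original address. For example (illustrative values, 4 KiB pages):

    /* walker.gfn = 0x1234, vaddr = 0x40dabc:                 */
    /* gpa  = gfn_to_gpa(0x1234)  = 0x1234000                 */
    /* gpa |= 0x40dabc & ~PAGE_MASK   (low 12 bits: 0xabc)    */
    /* -> gpa = 0x1234abc                                     */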
544 | 544 | ||
545 | static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu, | 545 | static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu, |
546 | struct kvm_mmu_page *sp) | 546 | struct kvm_mmu_page *sp) |
547 | { | 547 | { |
548 | int i, j, offset, r; | 548 | int i, j, offset, r; |
549 | pt_element_t pt[256 / sizeof(pt_element_t)]; | 549 | pt_element_t pt[256 / sizeof(pt_element_t)]; |
550 | gpa_t pte_gpa; | 550 | gpa_t pte_gpa; |
551 | 551 | ||
552 | if (sp->role.direct | 552 | if (sp->role.direct |
553 | || (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) { | 553 | || (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) { |
554 | nonpaging_prefetch_page(vcpu, sp); | 554 | nonpaging_prefetch_page(vcpu, sp); |
555 | return; | 555 | return; |
556 | } | 556 | } |
557 | 557 | ||
558 | pte_gpa = gfn_to_gpa(sp->gfn); | 558 | pte_gpa = gfn_to_gpa(sp->gfn); |
559 | if (PTTYPE == 32) { | 559 | if (PTTYPE == 32) { |
560 | offset = sp->role.quadrant << PT64_LEVEL_BITS; | 560 | offset = sp->role.quadrant << PT64_LEVEL_BITS; |
561 | pte_gpa += offset * sizeof(pt_element_t); | 561 | pte_gpa += offset * sizeof(pt_element_t); |
562 | } | 562 | } |
563 | 563 | ||
564 | for (i = 0; i < PT64_ENT_PER_PAGE; i += ARRAY_SIZE(pt)) { | 564 | for (i = 0; i < PT64_ENT_PER_PAGE; i += ARRAY_SIZE(pt)) { |
565 | r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, pt, sizeof pt); | 565 | r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, pt, sizeof pt); |
566 | pte_gpa += ARRAY_SIZE(pt) * sizeof(pt_element_t); | 566 | pte_gpa += ARRAY_SIZE(pt) * sizeof(pt_element_t); |
567 | for (j = 0; j < ARRAY_SIZE(pt); ++j) | 567 | for (j = 0; j < ARRAY_SIZE(pt); ++j) |
568 | if (r || is_present_gpte(pt[j])) | 568 | if (r || is_present_gpte(pt[j])) |
569 | sp->spt[i+j] = shadow_trap_nonpresent_pte; | 569 | sp->spt[i+j] = shadow_trap_nonpresent_pte; |
570 | else | 570 | else |
571 | sp->spt[i+j] = shadow_notrap_nonpresent_pte; | 571 | sp->spt[i+j] = shadow_notrap_nonpresent_pte; |
572 | } | 572 | } |
573 | } | 573 | } |
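
The prefetch loop reads the guest page table in 256-byte chunks, so with 8-byte 64-bit entries each kvm_read_guest_atomic() call fetches 32 PTEs and a full table takes 16 reads:

    /* ARRAY_SIZE(pt) = 256 / sizeof(pt_element_t) = 256 / 8 = 32   */
    /* PT64_ENT_PER_PAGE / ARRAY_SIZE(pt) = 512 / 32 = 16 reads     */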
574 | 574 | ||
575 | /* | 575 | /* |
576 | * Using the cached information from sp->gfns is safe because: | 576 | * Using the cached information from sp->gfns is safe because: |
577 | * - The spte has a reference to the struct page, so the pfn for a given gfn | 577 | * - The spte has a reference to the struct page, so the pfn for a given gfn |
578 | * can't change unless all sptes pointing to it are nuked first. | 578 | * can't change unless all sptes pointing to it are nuked first. |
579 | * - Alias changes zap the entire shadow cache. | ||
580 | */ | 579 | */ |
581 | static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, | 580 | static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, |
582 | bool clear_unsync) | 581 | bool clear_unsync) |
583 | { | 582 | { |
584 | int i, offset, nr_present; | 583 | int i, offset, nr_present; |
585 | bool reset_host_protection; | 584 | bool reset_host_protection; |
586 | gpa_t first_pte_gpa; | 585 | gpa_t first_pte_gpa; |
587 | 586 | ||
588 | offset = nr_present = 0; | 587 | offset = nr_present = 0; |
589 | 588 | ||
590 | /* a direct kvm_mmu_page cannot be unsync. */ | 589 | /* a direct kvm_mmu_page cannot be unsync. */ |
591 | BUG_ON(sp->role.direct); | 590 | BUG_ON(sp->role.direct); |
592 | 591 | ||
593 | if (PTTYPE == 32) | 592 | if (PTTYPE == 32) |
594 | offset = sp->role.quadrant << PT64_LEVEL_BITS; | 593 | offset = sp->role.quadrant << PT64_LEVEL_BITS; |
595 | 594 | ||
596 | first_pte_gpa = gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t); | 595 | first_pte_gpa = gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t); |
597 | 596 | ||
598 | for (i = 0; i < PT64_ENT_PER_PAGE; i++) { | 597 | for (i = 0; i < PT64_ENT_PER_PAGE; i++) { |
599 | unsigned pte_access; | 598 | unsigned pte_access; |
600 | pt_element_t gpte; | 599 | pt_element_t gpte; |
601 | gpa_t pte_gpa; | 600 | gpa_t pte_gpa; |
602 | gfn_t gfn; | 601 | gfn_t gfn; |
603 | 602 | ||
604 | if (!is_shadow_present_pte(sp->spt[i])) | 603 | if (!is_shadow_present_pte(sp->spt[i])) |
605 | continue; | 604 | continue; |
606 | 605 | ||
607 | pte_gpa = first_pte_gpa + i * sizeof(pt_element_t); | 606 | pte_gpa = first_pte_gpa + i * sizeof(pt_element_t); |
608 | 607 | ||
609 | if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte, | 608 | if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte, |
610 | sizeof(pt_element_t))) | 609 | sizeof(pt_element_t))) |
611 | return -EINVAL; | 610 | return -EINVAL; |
612 | 611 | ||
613 | gfn = gpte_to_gfn(gpte); | 612 | gfn = gpte_to_gfn(gpte); |
614 | if (unalias_gfn(vcpu->kvm, gfn) != sp->gfns[i] || | 613 | if (gfn != sp->gfns[i] || |
615 | !is_present_gpte(gpte) || !(gpte & PT_ACCESSED_MASK)) { | 614 | !is_present_gpte(gpte) || !(gpte & PT_ACCESSED_MASK)) { |
616 | u64 nonpresent; | 615 | u64 nonpresent; |
617 | 616 | ||
618 | rmap_remove(vcpu->kvm, &sp->spt[i]); | 617 | rmap_remove(vcpu->kvm, &sp->spt[i]); |
619 | if (is_present_gpte(gpte) || !clear_unsync) | 618 | if (is_present_gpte(gpte) || !clear_unsync) |
620 | nonpresent = shadow_trap_nonpresent_pte; | 619 | nonpresent = shadow_trap_nonpresent_pte; |
621 | else | 620 | else |
622 | nonpresent = shadow_notrap_nonpresent_pte; | 621 | nonpresent = shadow_notrap_nonpresent_pte; |
623 | __set_spte(&sp->spt[i], nonpresent); | 622 | __set_spte(&sp->spt[i], nonpresent); |
624 | continue; | 623 | continue; |
625 | } | 624 | } |
626 | 625 | ||
627 | nr_present++; | 626 | nr_present++; |
628 | pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte); | 627 | pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte); |
629 | if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) { | 628 | if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) { |
630 | pte_access &= ~ACC_WRITE_MASK; | 629 | pte_access &= ~ACC_WRITE_MASK; |
631 | reset_host_protection = 0; | 630 | reset_host_protection = 0; |
632 | } else { | 631 | } else { |
633 | reset_host_protection = 1; | 632 | reset_host_protection = 1; |
634 | } | 633 | } |
635 | set_spte(vcpu, &sp->spt[i], pte_access, 0, 0, | 634 | set_spte(vcpu, &sp->spt[i], pte_access, 0, 0, |
636 | is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn, | 635 | is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn, |
637 | spte_to_pfn(sp->spt[i]), true, false, | 636 | spte_to_pfn(sp->spt[i]), true, false, |
638 | reset_host_protection); | 637 | reset_host_protection); |
639 | } | 638 | } |
640 | 639 | ||
641 | return !nr_present; | 640 | return !nr_present; |
642 | } | 641 | } |
643 | 642 | ||
644 | #undef pt_element_t | 643 | #undef pt_element_t |
645 | #undef guest_walker | 644 | #undef guest_walker |
646 | #undef FNAME | 645 | #undef FNAME |
647 | #undef PT_BASE_ADDR_MASK | 646 | #undef PT_BASE_ADDR_MASK |
648 | #undef PT_INDEX | 647 | #undef PT_INDEX |
649 | #undef PT_LEVEL_MASK | 648 | #undef PT_LEVEL_MASK |
650 | #undef PT_LVL_ADDR_MASK | 649 | #undef PT_LVL_ADDR_MASK |
651 | #undef PT_LVL_OFFSET_MASK | 650 | #undef PT_LVL_OFFSET_MASK |
652 | #undef PT_LEVEL_BITS | 651 | #undef PT_LEVEL_BITS |
653 | #undef PT_MAX_FULL_LEVELS | 652 | #undef PT_MAX_FULL_LEVELS |
654 | #undef gpte_to_gfn | 653 | #undef gpte_to_gfn |
655 | #undef gpte_to_gfn_lvl | 654 | #undef gpte_to_gfn_lvl |
656 | #undef CMPXCHG | 655 | #undef CMPXCHG |
657 | 656 |
arch/x86/kvm/x86.c
1 | /* | 1 | /* |
2 | * Kernel-based Virtual Machine driver for Linux | 2 | * Kernel-based Virtual Machine driver for Linux |
3 | * | 3 | * |
4 | * derived from drivers/kvm/kvm_main.c | 4 | * derived from drivers/kvm/kvm_main.c |
5 | * | 5 | * |
6 | * Copyright (C) 2006 Qumranet, Inc. | 6 | * Copyright (C) 2006 Qumranet, Inc. |
7 | * Copyright (C) 2008 Qumranet, Inc. | 7 | * Copyright (C) 2008 Qumranet, Inc. |
8 | * Copyright IBM Corporation, 2008 | 8 | * Copyright IBM Corporation, 2008 |
9 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. | 9 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. |
10 | * | 10 | * |
11 | * Authors: | 11 | * Authors: |
12 | * Avi Kivity <avi@qumranet.com> | 12 | * Avi Kivity <avi@qumranet.com> |
13 | * Yaniv Kamay <yaniv@qumranet.com> | 13 | * Yaniv Kamay <yaniv@qumranet.com> |
14 | * Amit Shah <amit.shah@qumranet.com> | 14 | * Amit Shah <amit.shah@qumranet.com> |
15 | * Ben-Ami Yassour <benami@il.ibm.com> | 15 | * Ben-Ami Yassour <benami@il.ibm.com> |
16 | * | 16 | * |
17 | * This work is licensed under the terms of the GNU GPL, version 2. See | 17 | * This work is licensed under the terms of the GNU GPL, version 2. See |
18 | * the COPYING file in the top-level directory. | 18 | * the COPYING file in the top-level directory. |
19 | * | 19 | * |
20 | */ | 20 | */ |
21 | 21 | ||
22 | #include <linux/kvm_host.h> | 22 | #include <linux/kvm_host.h> |
23 | #include "irq.h" | 23 | #include "irq.h" |
24 | #include "mmu.h" | 24 | #include "mmu.h" |
25 | #include "i8254.h" | 25 | #include "i8254.h" |
26 | #include "tss.h" | 26 | #include "tss.h" |
27 | #include "kvm_cache_regs.h" | 27 | #include "kvm_cache_regs.h" |
28 | #include "x86.h" | 28 | #include "x86.h" |
29 | 29 | ||
30 | #include <linux/clocksource.h> | 30 | #include <linux/clocksource.h> |
31 | #include <linux/interrupt.h> | 31 | #include <linux/interrupt.h> |
32 | #include <linux/kvm.h> | 32 | #include <linux/kvm.h> |
33 | #include <linux/fs.h> | 33 | #include <linux/fs.h> |
34 | #include <linux/vmalloc.h> | 34 | #include <linux/vmalloc.h> |
35 | #include <linux/module.h> | 35 | #include <linux/module.h> |
36 | #include <linux/mman.h> | 36 | #include <linux/mman.h> |
37 | #include <linux/highmem.h> | 37 | #include <linux/highmem.h> |
38 | #include <linux/iommu.h> | 38 | #include <linux/iommu.h> |
39 | #include <linux/intel-iommu.h> | 39 | #include <linux/intel-iommu.h> |
40 | #include <linux/cpufreq.h> | 40 | #include <linux/cpufreq.h> |
41 | #include <linux/user-return-notifier.h> | 41 | #include <linux/user-return-notifier.h> |
42 | #include <linux/srcu.h> | 42 | #include <linux/srcu.h> |
43 | #include <linux/slab.h> | 43 | #include <linux/slab.h> |
44 | #include <linux/perf_event.h> | 44 | #include <linux/perf_event.h> |
45 | #include <linux/uaccess.h> | 45 | #include <linux/uaccess.h> |
46 | #include <trace/events/kvm.h> | 46 | #include <trace/events/kvm.h> |
47 | 47 | ||
48 | #define CREATE_TRACE_POINTS | 48 | #define CREATE_TRACE_POINTS |
49 | #include "trace.h" | 49 | #include "trace.h" |
50 | 50 | ||
51 | #include <asm/debugreg.h> | 51 | #include <asm/debugreg.h> |
52 | #include <asm/msr.h> | 52 | #include <asm/msr.h> |
53 | #include <asm/desc.h> | 53 | #include <asm/desc.h> |
54 | #include <asm/mtrr.h> | 54 | #include <asm/mtrr.h> |
55 | #include <asm/mce.h> | 55 | #include <asm/mce.h> |
56 | #include <asm/i387.h> | 56 | #include <asm/i387.h> |
57 | #include <asm/xcr.h> | 57 | #include <asm/xcr.h> |
58 | 58 | ||
59 | #define MAX_IO_MSRS 256 | 59 | #define MAX_IO_MSRS 256 |
60 | #define CR0_RESERVED_BITS \ | 60 | #define CR0_RESERVED_BITS \ |
61 | (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ | 61 | (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ |
62 | | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \ | 62 | | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \ |
63 | | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG)) | 63 | | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG)) |
64 | #define CR4_RESERVED_BITS \ | 64 | #define CR4_RESERVED_BITS \ |
65 | (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\ | 65 | (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\ |
66 | | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \ | 66 | | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \ |
67 | | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \ | 67 | | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \ |
68 | | X86_CR4_OSXSAVE \ | 68 | | X86_CR4_OSXSAVE \ |
69 | | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE)) | 69 | | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE)) |
70 | 70 | ||
71 | #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) | 71 | #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) |
72 | 72 | ||
73 | #define KVM_MAX_MCE_BANKS 32 | 73 | #define KVM_MAX_MCE_BANKS 32 |
74 | #define KVM_MCE_CAP_SUPPORTED MCG_CTL_P | 74 | #define KVM_MCE_CAP_SUPPORTED MCG_CTL_P |
75 | 75 | ||
76 | /* EFER defaults: | 76 | /* EFER defaults: |
77 | * - enable syscall by default because it is emulated by KVM | 77 | * - enable syscall by default because it is emulated by KVM |
78 | * - enable LME and LMA by default on 64-bit KVM | 78 | * - enable LME and LMA by default on 64-bit KVM |
79 | */ | 79 | */ |
80 | #ifdef CONFIG_X86_64 | 80 | #ifdef CONFIG_X86_64 |
81 | static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffafeULL; | 81 | static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffafeULL; |
82 | #else | 82 | #else |
83 | static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffffeULL; | 83 | static u64 __read_mostly efer_reserved_bits = 0xfffffffffffffffeULL; |
84 | #endif | 84 | #endif |
85 | 85 | ||
86 | #define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM | 86 | #define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM |
87 | #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU | 87 | #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU |
88 | 88 | ||
89 | static void update_cr8_intercept(struct kvm_vcpu *vcpu); | 89 | static void update_cr8_intercept(struct kvm_vcpu *vcpu); |
90 | static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, | 90 | static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, |
91 | struct kvm_cpuid_entry2 __user *entries); | 91 | struct kvm_cpuid_entry2 __user *entries); |
92 | 92 | ||
93 | struct kvm_x86_ops *kvm_x86_ops; | 93 | struct kvm_x86_ops *kvm_x86_ops; |
94 | EXPORT_SYMBOL_GPL(kvm_x86_ops); | 94 | EXPORT_SYMBOL_GPL(kvm_x86_ops); |
95 | 95 | ||
96 | int ignore_msrs = 0; | 96 | int ignore_msrs = 0; |
97 | module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR); | 97 | module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR); |
98 | 98 | ||
99 | #define KVM_NR_SHARED_MSRS 16 | 99 | #define KVM_NR_SHARED_MSRS 16 |
100 | 100 | ||
101 | struct kvm_shared_msrs_global { | 101 | struct kvm_shared_msrs_global { |
102 | int nr; | 102 | int nr; |
103 | u32 msrs[KVM_NR_SHARED_MSRS]; | 103 | u32 msrs[KVM_NR_SHARED_MSRS]; |
104 | }; | 104 | }; |
105 | 105 | ||
106 | struct kvm_shared_msrs { | 106 | struct kvm_shared_msrs { |
107 | struct user_return_notifier urn; | 107 | struct user_return_notifier urn; |
108 | bool registered; | 108 | bool registered; |
109 | struct kvm_shared_msr_values { | 109 | struct kvm_shared_msr_values { |
110 | u64 host; | 110 | u64 host; |
111 | u64 curr; | 111 | u64 curr; |
112 | } values[KVM_NR_SHARED_MSRS]; | 112 | } values[KVM_NR_SHARED_MSRS]; |
113 | }; | 113 | }; |
114 | 114 | ||
115 | static struct kvm_shared_msrs_global __read_mostly shared_msrs_global; | 115 | static struct kvm_shared_msrs_global __read_mostly shared_msrs_global; |
116 | static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs); | 116 | static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs); |
117 | 117 | ||
118 | struct kvm_stats_debugfs_item debugfs_entries[] = { | 118 | struct kvm_stats_debugfs_item debugfs_entries[] = { |
119 | { "pf_fixed", VCPU_STAT(pf_fixed) }, | 119 | { "pf_fixed", VCPU_STAT(pf_fixed) }, |
120 | { "pf_guest", VCPU_STAT(pf_guest) }, | 120 | { "pf_guest", VCPU_STAT(pf_guest) }, |
121 | { "tlb_flush", VCPU_STAT(tlb_flush) }, | 121 | { "tlb_flush", VCPU_STAT(tlb_flush) }, |
122 | { "invlpg", VCPU_STAT(invlpg) }, | 122 | { "invlpg", VCPU_STAT(invlpg) }, |
123 | { "exits", VCPU_STAT(exits) }, | 123 | { "exits", VCPU_STAT(exits) }, |
124 | { "io_exits", VCPU_STAT(io_exits) }, | 124 | { "io_exits", VCPU_STAT(io_exits) }, |
125 | { "mmio_exits", VCPU_STAT(mmio_exits) }, | 125 | { "mmio_exits", VCPU_STAT(mmio_exits) }, |
126 | { "signal_exits", VCPU_STAT(signal_exits) }, | 126 | { "signal_exits", VCPU_STAT(signal_exits) }, |
127 | { "irq_window", VCPU_STAT(irq_window_exits) }, | 127 | { "irq_window", VCPU_STAT(irq_window_exits) }, |
128 | { "nmi_window", VCPU_STAT(nmi_window_exits) }, | 128 | { "nmi_window", VCPU_STAT(nmi_window_exits) }, |
129 | { "halt_exits", VCPU_STAT(halt_exits) }, | 129 | { "halt_exits", VCPU_STAT(halt_exits) }, |
130 | { "halt_wakeup", VCPU_STAT(halt_wakeup) }, | 130 | { "halt_wakeup", VCPU_STAT(halt_wakeup) }, |
131 | { "hypercalls", VCPU_STAT(hypercalls) }, | 131 | { "hypercalls", VCPU_STAT(hypercalls) }, |
132 | { "request_irq", VCPU_STAT(request_irq_exits) }, | 132 | { "request_irq", VCPU_STAT(request_irq_exits) }, |
133 | { "irq_exits", VCPU_STAT(irq_exits) }, | 133 | { "irq_exits", VCPU_STAT(irq_exits) }, |
134 | { "host_state_reload", VCPU_STAT(host_state_reload) }, | 134 | { "host_state_reload", VCPU_STAT(host_state_reload) }, |
135 | { "efer_reload", VCPU_STAT(efer_reload) }, | 135 | { "efer_reload", VCPU_STAT(efer_reload) }, |
136 | { "fpu_reload", VCPU_STAT(fpu_reload) }, | 136 | { "fpu_reload", VCPU_STAT(fpu_reload) }, |
137 | { "insn_emulation", VCPU_STAT(insn_emulation) }, | 137 | { "insn_emulation", VCPU_STAT(insn_emulation) }, |
138 | { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) }, | 138 | { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) }, |
139 | { "irq_injections", VCPU_STAT(irq_injections) }, | 139 | { "irq_injections", VCPU_STAT(irq_injections) }, |
140 | { "nmi_injections", VCPU_STAT(nmi_injections) }, | 140 | { "nmi_injections", VCPU_STAT(nmi_injections) }, |
141 | { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) }, | 141 | { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) }, |
142 | { "mmu_pte_write", VM_STAT(mmu_pte_write) }, | 142 | { "mmu_pte_write", VM_STAT(mmu_pte_write) }, |
143 | { "mmu_pte_updated", VM_STAT(mmu_pte_updated) }, | 143 | { "mmu_pte_updated", VM_STAT(mmu_pte_updated) }, |
144 | { "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) }, | 144 | { "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) }, |
145 | { "mmu_flooded", VM_STAT(mmu_flooded) }, | 145 | { "mmu_flooded", VM_STAT(mmu_flooded) }, |
146 | { "mmu_recycled", VM_STAT(mmu_recycled) }, | 146 | { "mmu_recycled", VM_STAT(mmu_recycled) }, |
147 | { "mmu_cache_miss", VM_STAT(mmu_cache_miss) }, | 147 | { "mmu_cache_miss", VM_STAT(mmu_cache_miss) }, |
148 | { "mmu_unsync", VM_STAT(mmu_unsync) }, | 148 | { "mmu_unsync", VM_STAT(mmu_unsync) }, |
149 | { "remote_tlb_flush", VM_STAT(remote_tlb_flush) }, | 149 | { "remote_tlb_flush", VM_STAT(remote_tlb_flush) }, |
150 | { "largepages", VM_STAT(lpages) }, | 150 | { "largepages", VM_STAT(lpages) }, |
151 | { NULL } | 151 | { NULL } |
152 | }; | 152 | }; |
153 | 153 | ||
154 | u64 __read_mostly host_xcr0; | 154 | u64 __read_mostly host_xcr0; |
155 | 155 | ||
156 | static inline u32 bit(int bitno) | 156 | static inline u32 bit(int bitno) |
157 | { | 157 | { |
158 | return 1 << (bitno & 31); | 158 | return 1 << (bitno & 31); |
159 | } | 159 | } |
160 | 160 | ||
161 | static void kvm_on_user_return(struct user_return_notifier *urn) | 161 | static void kvm_on_user_return(struct user_return_notifier *urn) |
162 | { | 162 | { |
163 | unsigned slot; | 163 | unsigned slot; |
164 | struct kvm_shared_msrs *locals | 164 | struct kvm_shared_msrs *locals |
165 | = container_of(urn, struct kvm_shared_msrs, urn); | 165 | = container_of(urn, struct kvm_shared_msrs, urn); |
166 | struct kvm_shared_msr_values *values; | 166 | struct kvm_shared_msr_values *values; |
167 | 167 | ||
168 | for (slot = 0; slot < shared_msrs_global.nr; ++slot) { | 168 | for (slot = 0; slot < shared_msrs_global.nr; ++slot) { |
169 | values = &locals->values[slot]; | 169 | values = &locals->values[slot]; |
170 | if (values->host != values->curr) { | 170 | if (values->host != values->curr) { |
171 | wrmsrl(shared_msrs_global.msrs[slot], values->host); | 171 | wrmsrl(shared_msrs_global.msrs[slot], values->host); |
172 | values->curr = values->host; | 172 | values->curr = values->host; |
173 | } | 173 | } |
174 | } | 174 | } |
175 | locals->registered = false; | 175 | locals->registered = false; |
176 | user_return_notifier_unregister(urn); | 176 | user_return_notifier_unregister(urn); |
177 | } | 177 | } |
178 | 178 | ||
179 | static void shared_msr_update(unsigned slot, u32 msr) | 179 | static void shared_msr_update(unsigned slot, u32 msr) |
180 | { | 180 | { |
181 | struct kvm_shared_msrs *smsr; | 181 | struct kvm_shared_msrs *smsr; |
182 | u64 value; | 182 | u64 value; |
183 | 183 | ||
184 | smsr = &__get_cpu_var(shared_msrs); | 184 | smsr = &__get_cpu_var(shared_msrs); |
185 | /* read-only access; nobody should be modifying it at this time, | 185 | /* read-only access; nobody should be modifying it at this time, |
186 | * so no lock is needed */ | 186 | * so no lock is needed */ |
187 | if (slot >= shared_msrs_global.nr) { | 187 | if (slot >= shared_msrs_global.nr) { |
188 | printk(KERN_ERR "kvm: invalid MSR slot!"); | 188 | printk(KERN_ERR "kvm: invalid MSR slot!"); |
189 | return; | 189 | return; |
190 | } | 190 | } |
191 | rdmsrl_safe(msr, &value); | 191 | rdmsrl_safe(msr, &value); |
192 | smsr->values[slot].host = value; | 192 | smsr->values[slot].host = value; |
193 | smsr->values[slot].curr = value; | 193 | smsr->values[slot].curr = value; |
194 | } | 194 | } |
195 | 195 | ||
196 | void kvm_define_shared_msr(unsigned slot, u32 msr) | 196 | void kvm_define_shared_msr(unsigned slot, u32 msr) |
197 | { | 197 | { |
198 | if (slot >= shared_msrs_global.nr) | 198 | if (slot >= shared_msrs_global.nr) |
199 | shared_msrs_global.nr = slot + 1; | 199 | shared_msrs_global.nr = slot + 1; |
200 | shared_msrs_global.msrs[slot] = msr; | 200 | shared_msrs_global.msrs[slot] = msr; |
201 | /* make sure shared_msrs_global has been updated before it is read */ | 201 | /* make sure shared_msrs_global has been updated before it is read */ |
202 | smp_wmb(); | 202 | smp_wmb(); |
203 | } | 203 | } |
204 | EXPORT_SYMBOL_GPL(kvm_define_shared_msr); | 204 | EXPORT_SYMBOL_GPL(kvm_define_shared_msr); |
205 | 205 | ||
206 | static void kvm_shared_msr_cpu_online(void) | 206 | static void kvm_shared_msr_cpu_online(void) |
207 | { | 207 | { |
208 | unsigned i; | 208 | unsigned i; |
209 | 209 | ||
210 | for (i = 0; i < shared_msrs_global.nr; ++i) | 210 | for (i = 0; i < shared_msrs_global.nr; ++i) |
211 | shared_msr_update(i, shared_msrs_global.msrs[i]); | 211 | shared_msr_update(i, shared_msrs_global.msrs[i]); |
212 | } | 212 | } |
213 | 213 | ||
214 | void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask) | 214 | void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask) |
215 | { | 215 | { |
216 | struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs); | 216 | struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs); |
217 | 217 | ||
218 | if (((value ^ smsr->values[slot].curr) & mask) == 0) | 218 | if (((value ^ smsr->values[slot].curr) & mask) == 0) |
219 | return; | 219 | return; |
220 | smsr->values[slot].curr = value; | 220 | smsr->values[slot].curr = value; |
221 | wrmsrl(shared_msrs_global.msrs[slot], value); | 221 | wrmsrl(shared_msrs_global.msrs[slot], value); |
222 | if (!smsr->registered) { | 222 | if (!smsr->registered) { |
223 | smsr->urn.on_user_return = kvm_on_user_return; | 223 | smsr->urn.on_user_return = kvm_on_user_return; |
224 | user_return_notifier_register(&smsr->urn); | 224 | user_return_notifier_register(&smsr->urn); |
225 | smsr->registered = true; | 225 | smsr->registered = true; |
226 | } | 226 | } |
227 | } | 227 | } |
228 | EXPORT_SYMBOL_GPL(kvm_set_shared_msr); | 228 | EXPORT_SYMBOL_GPL(kvm_set_shared_msr); |
229 | 229 | ||
230 | static void drop_user_return_notifiers(void *ignore) | 230 | static void drop_user_return_notifiers(void *ignore) |
231 | { | 231 | { |
232 | struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs); | 232 | struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs); |
233 | 233 | ||
234 | if (smsr->registered) | 234 | if (smsr->registered) |
235 | kvm_on_user_return(&smsr->urn); | 235 | kvm_on_user_return(&smsr->urn); |
236 | } | 236 | } |
237 | 237 | ||
238 | u64 kvm_get_apic_base(struct kvm_vcpu *vcpu) | 238 | u64 kvm_get_apic_base(struct kvm_vcpu *vcpu) |
239 | { | 239 | { |
240 | if (irqchip_in_kernel(vcpu->kvm)) | 240 | if (irqchip_in_kernel(vcpu->kvm)) |
241 | return vcpu->arch.apic_base; | 241 | return vcpu->arch.apic_base; |
242 | else | 242 | else |
243 | return vcpu->arch.apic_base; | 243 | return vcpu->arch.apic_base; |
244 | } | 244 | } |
245 | EXPORT_SYMBOL_GPL(kvm_get_apic_base); | 245 | EXPORT_SYMBOL_GPL(kvm_get_apic_base); |
246 | 246 | ||
247 | void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data) | 247 | void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data) |
248 | { | 248 | { |
249 | /* TODO: reserved-bits check */ | 249 | /* TODO: reserved-bits check */ |
250 | if (irqchip_in_kernel(vcpu->kvm)) | 250 | if (irqchip_in_kernel(vcpu->kvm)) |
251 | kvm_lapic_set_base(vcpu, data); | 251 | kvm_lapic_set_base(vcpu, data); |
252 | else | 252 | else |
253 | vcpu->arch.apic_base = data; | 253 | vcpu->arch.apic_base = data; |
254 | } | 254 | } |
255 | EXPORT_SYMBOL_GPL(kvm_set_apic_base); | 255 | EXPORT_SYMBOL_GPL(kvm_set_apic_base); |
256 | 256 | ||
257 | #define EXCPT_BENIGN 0 | 257 | #define EXCPT_BENIGN 0 |
258 | #define EXCPT_CONTRIBUTORY 1 | 258 | #define EXCPT_CONTRIBUTORY 1 |
259 | #define EXCPT_PF 2 | 259 | #define EXCPT_PF 2 |
260 | 260 | ||
261 | static int exception_class(int vector) | 261 | static int exception_class(int vector) |
262 | { | 262 | { |
263 | switch (vector) { | 263 | switch (vector) { |
264 | case PF_VECTOR: | 264 | case PF_VECTOR: |
265 | return EXCPT_PF; | 265 | return EXCPT_PF; |
266 | case DE_VECTOR: | 266 | case DE_VECTOR: |
267 | case TS_VECTOR: | 267 | case TS_VECTOR: |
268 | case NP_VECTOR: | 268 | case NP_VECTOR: |
269 | case SS_VECTOR: | 269 | case SS_VECTOR: |
270 | case GP_VECTOR: | 270 | case GP_VECTOR: |
271 | return EXCPT_CONTRIBUTORY; | 271 | return EXCPT_CONTRIBUTORY; |
272 | default: | 272 | default: |
273 | break; | 273 | break; |
274 | } | 274 | } |
275 | return EXCPT_BENIGN; | 275 | return EXCPT_BENIGN; |
276 | } | 276 | } |
277 | 277 | ||
278 | static void kvm_multiple_exception(struct kvm_vcpu *vcpu, | 278 | static void kvm_multiple_exception(struct kvm_vcpu *vcpu, |
279 | unsigned nr, bool has_error, u32 error_code, | 279 | unsigned nr, bool has_error, u32 error_code, |
280 | bool reinject) | 280 | bool reinject) |
281 | { | 281 | { |
282 | u32 prev_nr; | 282 | u32 prev_nr; |
283 | int class1, class2; | 283 | int class1, class2; |
284 | 284 | ||
285 | if (!vcpu->arch.exception.pending) { | 285 | if (!vcpu->arch.exception.pending) { |
286 | queue: | 286 | queue: |
287 | vcpu->arch.exception.pending = true; | 287 | vcpu->arch.exception.pending = true; |
288 | vcpu->arch.exception.has_error_code = has_error; | 288 | vcpu->arch.exception.has_error_code = has_error; |
289 | vcpu->arch.exception.nr = nr; | 289 | vcpu->arch.exception.nr = nr; |
290 | vcpu->arch.exception.error_code = error_code; | 290 | vcpu->arch.exception.error_code = error_code; |
291 | vcpu->arch.exception.reinject = reinject; | 291 | vcpu->arch.exception.reinject = reinject; |
292 | return; | 292 | return; |
293 | } | 293 | } |
294 | 294 | ||
295 | /* an exception is already pending; decide how to combine them */ | 295 | /* an exception is already pending; decide how to combine them */ |
296 | prev_nr = vcpu->arch.exception.nr; | 296 | prev_nr = vcpu->arch.exception.nr; |
297 | if (prev_nr == DF_VECTOR) { | 297 | if (prev_nr == DF_VECTOR) { |
298 | /* triple fault -> shutdown */ | 298 | /* triple fault -> shutdown */ |
299 | set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); | 299 | set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); |
300 | return; | 300 | return; |
301 | } | 301 | } |
302 | class1 = exception_class(prev_nr); | 302 | class1 = exception_class(prev_nr); |
303 | class2 = exception_class(nr); | 303 | class2 = exception_class(nr); |
304 | if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) | 304 | if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) |
305 | || (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) { | 305 | || (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) { |
306 | /* generate double fault per SDM Table 5-5 */ | 306 | /* generate double fault per SDM Table 5-5 */ |
307 | vcpu->arch.exception.pending = true; | 307 | vcpu->arch.exception.pending = true; |
308 | vcpu->arch.exception.has_error_code = true; | 308 | vcpu->arch.exception.has_error_code = true; |
309 | vcpu->arch.exception.nr = DF_VECTOR; | 309 | vcpu->arch.exception.nr = DF_VECTOR; |
310 | vcpu->arch.exception.error_code = 0; | 310 | vcpu->arch.exception.error_code = 0; |
311 | } else | 311 | } else |
312 | /* replace the previous exception with the new one in the hope | 312 | /* replace the previous exception with the new one in the hope |
313 | that re-executing the instruction will regenerate the lost | 313 | that re-executing the instruction will regenerate the lost |
314 | exception */ | 314 | exception */ |
315 | goto queue; | 315 | goto queue; |
316 | } | 316 | } |
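
The class logic above encodes the double-fault rules of SDM Table 5-5. A few worked combinations, assuming the classes assigned by exception_class():

    /* #GP then #NP : contributory + contributory -> queue #DF           */
    /* #PF then #GP : page fault   + contributory -> queue #DF           */
    /* #DF then any : -> KVM_REQ_TRIPLE_FAULT (guest shutdown)           */
    /* #DE then #DB : contributory + benign -> the new #DB replaces #DE  */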
317 | 317 | ||
318 | void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) | 318 | void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr) |
319 | { | 319 | { |
320 | kvm_multiple_exception(vcpu, nr, false, 0, false); | 320 | kvm_multiple_exception(vcpu, nr, false, 0, false); |
321 | } | 321 | } |
322 | EXPORT_SYMBOL_GPL(kvm_queue_exception); | 322 | EXPORT_SYMBOL_GPL(kvm_queue_exception); |
323 | 323 | ||
324 | void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr) | 324 | void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr) |
325 | { | 325 | { |
326 | kvm_multiple_exception(vcpu, nr, false, 0, true); | 326 | kvm_multiple_exception(vcpu, nr, false, 0, true); |
327 | } | 327 | } |
328 | EXPORT_SYMBOL_GPL(kvm_requeue_exception); | 328 | EXPORT_SYMBOL_GPL(kvm_requeue_exception); |
329 | 329 | ||
330 | void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr, | 330 | void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr, |
331 | u32 error_code) | 331 | u32 error_code) |
332 | { | 332 | { |
333 | ++vcpu->stat.pf_guest; | 333 | ++vcpu->stat.pf_guest; |
334 | vcpu->arch.cr2 = addr; | 334 | vcpu->arch.cr2 = addr; |
335 | kvm_queue_exception_e(vcpu, PF_VECTOR, error_code); | 335 | kvm_queue_exception_e(vcpu, PF_VECTOR, error_code); |
336 | } | 336 | } |
337 | 337 | ||
338 | void kvm_inject_nmi(struct kvm_vcpu *vcpu) | 338 | void kvm_inject_nmi(struct kvm_vcpu *vcpu) |
339 | { | 339 | { |
340 | vcpu->arch.nmi_pending = 1; | 340 | vcpu->arch.nmi_pending = 1; |
341 | } | 341 | } |
342 | EXPORT_SYMBOL_GPL(kvm_inject_nmi); | 342 | EXPORT_SYMBOL_GPL(kvm_inject_nmi); |
343 | 343 | ||
344 | void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) | 344 | void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) |
345 | { | 345 | { |
346 | kvm_multiple_exception(vcpu, nr, true, error_code, false); | 346 | kvm_multiple_exception(vcpu, nr, true, error_code, false); |
347 | } | 347 | } |
348 | EXPORT_SYMBOL_GPL(kvm_queue_exception_e); | 348 | EXPORT_SYMBOL_GPL(kvm_queue_exception_e); |
349 | 349 | ||
350 | void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) | 350 | void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code) |
351 | { | 351 | { |
352 | kvm_multiple_exception(vcpu, nr, true, error_code, true); | 352 | kvm_multiple_exception(vcpu, nr, true, error_code, true); |
353 | } | 353 | } |
354 | EXPORT_SYMBOL_GPL(kvm_requeue_exception_e); | 354 | EXPORT_SYMBOL_GPL(kvm_requeue_exception_e); |
355 | 355 | ||
356 | /* | 356 | /* |
357 | * Check whether cpl <= required_cpl; if so, return true. Otherwise queue | 357 | * Check whether cpl <= required_cpl; if so, return true. Otherwise queue |
358 | * a #GP and return false. | 358 | * a #GP and return false. |
359 | */ | 359 | */ |
360 | bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl) | 360 | bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl) |
361 | { | 361 | { |
362 | if (kvm_x86_ops->get_cpl(vcpu) <= required_cpl) | 362 | if (kvm_x86_ops->get_cpl(vcpu) <= required_cpl) |
363 | return true; | 363 | return true; |
364 | kvm_queue_exception_e(vcpu, GP_VECTOR, 0); | 364 | kvm_queue_exception_e(vcpu, GP_VECTOR, 0); |
365 | return false; | 365 | return false; |
366 | } | 366 | } |
367 | EXPORT_SYMBOL_GPL(kvm_require_cpl); | 367 | EXPORT_SYMBOL_GPL(kvm_require_cpl); |
368 | 368 | ||
369 | /* | 369 | /* |
370 | * Load the PAE PDPTRs. Return true if they are all valid. | 370 | * Load the PAE PDPTRs. Return true if they are all valid. |
371 | */ | 371 | */ |
372 | int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3) | 372 | int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3) |
373 | { | 373 | { |
374 | gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT; | 374 | gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT; |
375 | unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2; | 375 | unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2; |
376 | int i; | 376 | int i; |
377 | int ret; | 377 | int ret; |
378 | u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)]; | 378 | u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)]; |
379 | 379 | ||
380 | ret = kvm_read_guest_page(vcpu->kvm, pdpt_gfn, pdpte, | 380 | ret = kvm_read_guest_page(vcpu->kvm, pdpt_gfn, pdpte, |
381 | offset * sizeof(u64), sizeof(pdpte)); | 381 | offset * sizeof(u64), sizeof(pdpte)); |
382 | if (ret < 0) { | 382 | if (ret < 0) { |
383 | ret = 0; | 383 | ret = 0; |
384 | goto out; | 384 | goto out; |
385 | } | 385 | } |
386 | for (i = 0; i < ARRAY_SIZE(pdpte); ++i) { | 386 | for (i = 0; i < ARRAY_SIZE(pdpte); ++i) { |
387 | if (is_present_gpte(pdpte[i]) && | 387 | if (is_present_gpte(pdpte[i]) && |
388 | (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) { | 388 | (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) { |
389 | ret = 0; | 389 | ret = 0; |
390 | goto out; | 390 | goto out; |
391 | } | 391 | } |
392 | } | 392 | } |
393 | ret = 1; | 393 | ret = 1; |
394 | 394 | ||
395 | memcpy(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs)); | 395 | memcpy(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs)); |
396 | __set_bit(VCPU_EXREG_PDPTR, | 396 | __set_bit(VCPU_EXREG_PDPTR, |
397 | (unsigned long *)&vcpu->arch.regs_avail); | 397 | (unsigned long *)&vcpu->arch.regs_avail); |
398 | __set_bit(VCPU_EXREG_PDPTR, | 398 | __set_bit(VCPU_EXREG_PDPTR, |
399 | (unsigned long *)&vcpu->arch.regs_dirty); | 399 | (unsigned long *)&vcpu->arch.regs_dirty); |
400 | out: | 400 | out: |
401 | 401 | ||
402 | return ret; | 402 | return ret; |
403 | } | 403 | } |
404 | EXPORT_SYMBOL_GPL(load_pdptrs); | 404 | EXPORT_SYMBOL_GPL(load_pdptrs); |
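
In PAE mode CR3 holds a 32-byte-aligned pointer to the four PDPTEs, so the offset expression above strips CR3's low five bits and converts the result to a u64 index. Worked with an illustrative CR3 value:

    /* cr3 = 0x12345f60:                                             */
    /* cr3 & (PAGE_SIZE - 1) = 0xf60  (offset within the page)       */
    /* >> 5                  = 0x7b   (32-byte-aligned table index)  */
    /* << 2                  = 0x1ec  (index in u64 units)           */
    /* kvm_read_guest_page() then reads from byte 0x1ec * 8 = 0xf60, */
    /* exactly the 32-byte-aligned address CR3 points at             */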
405 | 405 | ||
406 | static bool pdptrs_changed(struct kvm_vcpu *vcpu) | 406 | static bool pdptrs_changed(struct kvm_vcpu *vcpu) |
407 | { | 407 | { |
408 | u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)]; | 408 | u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)]; |
409 | bool changed = true; | 409 | bool changed = true; |
410 | int r; | 410 | int r; |
411 | 411 | ||
412 | if (is_long_mode(vcpu) || !is_pae(vcpu)) | 412 | if (is_long_mode(vcpu) || !is_pae(vcpu)) |
413 | return false; | 413 | return false; |
414 | 414 | ||
415 | if (!test_bit(VCPU_EXREG_PDPTR, | 415 | if (!test_bit(VCPU_EXREG_PDPTR, |
416 | (unsigned long *)&vcpu->arch.regs_avail)) | 416 | (unsigned long *)&vcpu->arch.regs_avail)) |
417 | return true; | 417 | return true; |
418 | 418 | ||
419 | r = kvm_read_guest(vcpu->kvm, vcpu->arch.cr3 & ~31u, pdpte, sizeof(pdpte)); | 419 | r = kvm_read_guest(vcpu->kvm, vcpu->arch.cr3 & ~31u, pdpte, sizeof(pdpte)); |
420 | if (r < 0) | 420 | if (r < 0) |
421 | goto out; | 421 | goto out; |
422 | changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0; | 422 | changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0; |
423 | out: | 423 | out: |
424 | 424 | ||
425 | return changed; | 425 | return changed; |
426 | } | 426 | } |
427 | 427 | ||
428 | int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) | 428 | int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) |
429 | { | 429 | { |
430 | unsigned long old_cr0 = kvm_read_cr0(vcpu); | 430 | unsigned long old_cr0 = kvm_read_cr0(vcpu); |
431 | unsigned long update_bits = X86_CR0_PG | X86_CR0_WP | | 431 | unsigned long update_bits = X86_CR0_PG | X86_CR0_WP | |
432 | X86_CR0_CD | X86_CR0_NW; | 432 | X86_CR0_CD | X86_CR0_NW; |
433 | 433 | ||
434 | cr0 |= X86_CR0_ET; | 434 | cr0 |= X86_CR0_ET; |
435 | 435 | ||
436 | #ifdef CONFIG_X86_64 | 436 | #ifdef CONFIG_X86_64 |
437 | if (cr0 & 0xffffffff00000000UL) | 437 | if (cr0 & 0xffffffff00000000UL) |
438 | return 1; | 438 | return 1; |
439 | #endif | 439 | #endif |
440 | 440 | ||
441 | cr0 &= ~CR0_RESERVED_BITS; | 441 | cr0 &= ~CR0_RESERVED_BITS; |
442 | 442 | ||
443 | if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) | 443 | if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) |
444 | return 1; | 444 | return 1; |
445 | 445 | ||
446 | if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) | 446 | if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) |
447 | return 1; | 447 | return 1; |
448 | 448 | ||
449 | if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { | 449 | if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { |
450 | #ifdef CONFIG_X86_64 | 450 | #ifdef CONFIG_X86_64 |
451 | if ((vcpu->arch.efer & EFER_LME)) { | 451 | if ((vcpu->arch.efer & EFER_LME)) { |
452 | int cs_db, cs_l; | 452 | int cs_db, cs_l; |
453 | 453 | ||
454 | if (!is_pae(vcpu)) | 454 | if (!is_pae(vcpu)) |
455 | return 1; | 455 | return 1; |
456 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); | 456 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); |
457 | if (cs_l) | 457 | if (cs_l) |
458 | return 1; | 458 | return 1; |
459 | } else | 459 | } else |
460 | #endif | 460 | #endif |
461 | if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) | 461 | if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) |
462 | return 1; | 462 | return 1; |
463 | } | 463 | } |
464 | 464 | ||
465 | kvm_x86_ops->set_cr0(vcpu, cr0); | 465 | kvm_x86_ops->set_cr0(vcpu, cr0); |
466 | 466 | ||
467 | if ((cr0 ^ old_cr0) & update_bits) | 467 | if ((cr0 ^ old_cr0) & update_bits) |
468 | kvm_mmu_reset_context(vcpu); | 468 | kvm_mmu_reset_context(vcpu); |
469 | return 0; | 469 | return 0; |
470 | } | 470 | } |
471 | EXPORT_SYMBOL_GPL(kvm_set_cr0); | 471 | EXPORT_SYMBOL_GPL(kvm_set_cr0); |
472 | 472 | ||
473 | void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) | 473 | void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) |
474 | { | 474 | { |
475 | (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f)); | 475 | (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f)); |
476 | } | 476 | } |
477 | EXPORT_SYMBOL_GPL(kvm_lmsw); | 477 | EXPORT_SYMBOL_GPL(kvm_lmsw); |
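
The masks in kvm_lmsw() deserve a note: ~0x0e keeps every CR0 bit except MP/EM/TS, and ORing in msw & 0x0f lets LMSW set PE but never clear it, which matches the architectural definition of the instruction. Spelled out:

	/* LMSW bit handling above (low CR0 bits: PE=0x1 MP=0x2 EM=0x4 TS=0x8):
	 *   kept   = cr0 & ~0x0e          keep everything but MP/EM/TS
	 *   result = kept | (msw & 0x0f)  PE can be set, never cleared
	 */
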
478 | 478 | ||
479 | int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) | 479 | int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) |
480 | { | 480 | { |
481 | u64 xcr0; | 481 | u64 xcr0; |
482 | 482 | ||
483 | /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now */ | 483 | /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now */ |
484 | if (index != XCR_XFEATURE_ENABLED_MASK) | 484 | if (index != XCR_XFEATURE_ENABLED_MASK) |
485 | return 1; | 485 | return 1; |
486 | xcr0 = xcr; | 486 | xcr0 = xcr; |
487 | if (kvm_x86_ops->get_cpl(vcpu) != 0) | 487 | if (kvm_x86_ops->get_cpl(vcpu) != 0) |
488 | return 1; | 488 | return 1; |
489 | if (!(xcr0 & XSTATE_FP)) | 489 | if (!(xcr0 & XSTATE_FP)) |
490 | return 1; | 490 | return 1; |
491 | if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE)) | 491 | if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE)) |
492 | return 1; | 492 | return 1; |
493 | if (xcr0 & ~host_xcr0) | 493 | if (xcr0 & ~host_xcr0) |
494 | return 1; | 494 | return 1; |
495 | vcpu->arch.xcr0 = xcr0; | 495 | vcpu->arch.xcr0 = xcr0; |
496 | vcpu->guest_xcr0_loaded = 0; | 496 | vcpu->guest_xcr0_loaded = 0; |
497 | return 0; | 497 | return 0; |
498 | } | 498 | } |
499 | 499 | ||
500 | int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) | 500 | int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) |
501 | { | 501 | { |
502 | if (__kvm_set_xcr(vcpu, index, xcr)) { | 502 | if (__kvm_set_xcr(vcpu, index, xcr)) { |
503 | kvm_inject_gp(vcpu, 0); | 503 | kvm_inject_gp(vcpu, 0); |
504 | return 1; | 504 | return 1; |
505 | } | 505 | } |
506 | return 0; | 506 | return 0; |
507 | } | 507 | } |
508 | EXPORT_SYMBOL_GPL(kvm_set_xcr); | 508 | EXPORT_SYMBOL_GPL(kvm_set_xcr); |
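
The checks in __kvm_set_xcr() encode the architectural XCR0 rules: bit 0 (x87 state) must always be set, and AVX state (XSTATE_YMM) requires SSE state (XSTATE_SSE). A few illustrative values:

	/* Examples of the XCR0 validity rules enforced above:
	 *   0x1 (FP)              -> ok
	 *   0x3 (FP|SSE)          -> ok
	 *   0x7 (FP|SSE|YMM)      -> ok
	 *   0x0, 0x2 (no FP)      -> #GP
	 *   0x5 (YMM without SSE) -> #GP
	 * (all still subject to (xcr0 & ~host_xcr0) == 0)
	 */
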
509 | 509 | ||
510 | static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu) | 510 | static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu) |
511 | { | 511 | { |
512 | struct kvm_cpuid_entry2 *best; | 512 | struct kvm_cpuid_entry2 *best; |
513 | 513 | ||
514 | best = kvm_find_cpuid_entry(vcpu, 1, 0); | 514 | best = kvm_find_cpuid_entry(vcpu, 1, 0); |
515 | return best && (best->ecx & bit(X86_FEATURE_XSAVE)); | 515 | return best && (best->ecx & bit(X86_FEATURE_XSAVE)); |
516 | } | 516 | } |
517 | 517 | ||
518 | static void update_cpuid(struct kvm_vcpu *vcpu) | 518 | static void update_cpuid(struct kvm_vcpu *vcpu) |
519 | { | 519 | { |
520 | struct kvm_cpuid_entry2 *best; | 520 | struct kvm_cpuid_entry2 *best; |
521 | 521 | ||
522 | best = kvm_find_cpuid_entry(vcpu, 1, 0); | 522 | best = kvm_find_cpuid_entry(vcpu, 1, 0); |
523 | if (!best) | 523 | if (!best) |
524 | return; | 524 | return; |
525 | 525 | ||
526 | /* Update OSXSAVE bit */ | 526 | /* Update OSXSAVE bit */ |
527 | if (cpu_has_xsave && best->function == 0x1) { | 527 | if (cpu_has_xsave && best->function == 0x1) { |
528 | best->ecx &= ~(bit(X86_FEATURE_OSXSAVE)); | 528 | best->ecx &= ~(bit(X86_FEATURE_OSXSAVE)); |
529 | if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) | 529 | if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) |
530 | best->ecx |= bit(X86_FEATURE_OSXSAVE); | 530 | best->ecx |= bit(X86_FEATURE_OSXSAVE); |
531 | } | 531 | } |
532 | } | 532 | } |
533 | 533 | ||
534 | int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) | 534 | int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) |
535 | { | 535 | { |
536 | unsigned long old_cr4 = kvm_read_cr4(vcpu); | 536 | unsigned long old_cr4 = kvm_read_cr4(vcpu); |
537 | unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE; | 537 | unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE; |
538 | 538 | ||
539 | if (cr4 & CR4_RESERVED_BITS) | 539 | if (cr4 & CR4_RESERVED_BITS) |
540 | return 1; | 540 | return 1; |
541 | 541 | ||
542 | if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE)) | 542 | if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE)) |
543 | return 1; | 543 | return 1; |
544 | 544 | ||
545 | if (is_long_mode(vcpu)) { | 545 | if (is_long_mode(vcpu)) { |
546 | if (!(cr4 & X86_CR4_PAE)) | 546 | if (!(cr4 & X86_CR4_PAE)) |
547 | return 1; | 547 | return 1; |
548 | } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE) | 548 | } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE) |
549 | && ((cr4 ^ old_cr4) & pdptr_bits) | 549 | && ((cr4 ^ old_cr4) & pdptr_bits) |
550 | && !load_pdptrs(vcpu, vcpu->arch.cr3)) | 550 | && !load_pdptrs(vcpu, vcpu->arch.cr3)) |
551 | return 1; | 551 | return 1; |
552 | 552 | ||
553 | if (cr4 & X86_CR4_VMXE) | 553 | if (cr4 & X86_CR4_VMXE) |
554 | return 1; | 554 | return 1; |
555 | 555 | ||
556 | kvm_x86_ops->set_cr4(vcpu, cr4); | 556 | kvm_x86_ops->set_cr4(vcpu, cr4); |
557 | 557 | ||
558 | if ((cr4 ^ old_cr4) & pdptr_bits) | 558 | if ((cr4 ^ old_cr4) & pdptr_bits) |
559 | kvm_mmu_reset_context(vcpu); | 559 | kvm_mmu_reset_context(vcpu); |
560 | 560 | ||
561 | if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE) | 561 | if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE) |
562 | update_cpuid(vcpu); | 562 | update_cpuid(vcpu); |
563 | 563 | ||
564 | return 0; | 564 | return 0; |
565 | } | 565 | } |
566 | EXPORT_SYMBOL_GPL(kvm_set_cr4); | 566 | EXPORT_SYMBOL_GPL(kvm_set_cr4); |
567 | 567 | ||
568 | int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) | 568 | int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) |
569 | { | 569 | { |
570 | if (cr3 == vcpu->arch.cr3 && !pdptrs_changed(vcpu)) { | 570 | if (cr3 == vcpu->arch.cr3 && !pdptrs_changed(vcpu)) { |
571 | kvm_mmu_sync_roots(vcpu); | 571 | kvm_mmu_sync_roots(vcpu); |
572 | kvm_mmu_flush_tlb(vcpu); | 572 | kvm_mmu_flush_tlb(vcpu); |
573 | return 0; | 573 | return 0; |
574 | } | 574 | } |
575 | 575 | ||
576 | if (is_long_mode(vcpu)) { | 576 | if (is_long_mode(vcpu)) { |
577 | if (cr3 & CR3_L_MODE_RESERVED_BITS) | 577 | if (cr3 & CR3_L_MODE_RESERVED_BITS) |
578 | return 1; | 578 | return 1; |
579 | } else { | 579 | } else { |
580 | if (is_pae(vcpu)) { | 580 | if (is_pae(vcpu)) { |
581 | if (cr3 & CR3_PAE_RESERVED_BITS) | 581 | if (cr3 & CR3_PAE_RESERVED_BITS) |
582 | return 1; | 582 | return 1; |
583 | if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3)) | 583 | if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3)) |
584 | return 1; | 584 | return 1; |
585 | } | 585 | } |
586 | /* | 586 | /* |
587 | * We don't check reserved bits in nonpae mode, because | 587 | * We don't check reserved bits in nonpae mode, because |
588 | * this isn't enforced, and VMware depends on this. | 588 | * this isn't enforced, and VMware depends on this. |
589 | */ | 589 | */ |
590 | } | 590 | } |
591 | 591 | ||
592 | /* | 592 | /* |
593 | * Does the new cr3 value map to physical memory? (Note, we | 593 | * Does the new cr3 value map to physical memory? (Note, we |
594 | * catch an invalid cr3 even in real-mode, because it would | 594 | * catch an invalid cr3 even in real-mode, because it would |
595 | * cause trouble later on when we turn on paging anyway.) | 595 | * cause trouble later on when we turn on paging anyway.) |
596 | * | 596 | * |
597 | * A real CPU would silently accept an invalid cr3 and would | 597 | * A real CPU would silently accept an invalid cr3 and would |
598 | * attempt to use it - with largely undefined (and often hard | 598 | * attempt to use it - with largely undefined (and often hard |
599 | * to debug) behavior on the guest side. | 599 | * to debug) behavior on the guest side. |
600 | */ | 600 | */ |
601 | if (unlikely(!gfn_to_memslot(vcpu->kvm, cr3 >> PAGE_SHIFT))) | 601 | if (unlikely(!gfn_to_memslot(vcpu->kvm, cr3 >> PAGE_SHIFT))) |
602 | return 1; | 602 | return 1; |
603 | vcpu->arch.cr3 = cr3; | 603 | vcpu->arch.cr3 = cr3; |
604 | vcpu->arch.mmu.new_cr3(vcpu); | 604 | vcpu->arch.mmu.new_cr3(vcpu); |
605 | return 0; | 605 | return 0; |
606 | } | 606 | } |
607 | EXPORT_SYMBOL_GPL(kvm_set_cr3); | 607 | EXPORT_SYMBOL_GPL(kvm_set_cr3); |
608 | 608 | ||
609 | int __kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) | 609 | int __kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) |
610 | { | 610 | { |
611 | if (cr8 & CR8_RESERVED_BITS) | 611 | if (cr8 & CR8_RESERVED_BITS) |
612 | return 1; | 612 | return 1; |
613 | if (irqchip_in_kernel(vcpu->kvm)) | 613 | if (irqchip_in_kernel(vcpu->kvm)) |
614 | kvm_lapic_set_tpr(vcpu, cr8); | 614 | kvm_lapic_set_tpr(vcpu, cr8); |
615 | else | 615 | else |
616 | vcpu->arch.cr8 = cr8; | 616 | vcpu->arch.cr8 = cr8; |
617 | return 0; | 617 | return 0; |
618 | } | 618 | } |
619 | 619 | ||
620 | void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) | 620 | void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) |
621 | { | 621 | { |
622 | if (__kvm_set_cr8(vcpu, cr8)) | 622 | if (__kvm_set_cr8(vcpu, cr8)) |
623 | kvm_inject_gp(vcpu, 0); | 623 | kvm_inject_gp(vcpu, 0); |
624 | } | 624 | } |
625 | EXPORT_SYMBOL_GPL(kvm_set_cr8); | 625 | EXPORT_SYMBOL_GPL(kvm_set_cr8); |
626 | 626 | ||
627 | unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) | 627 | unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) |
628 | { | 628 | { |
629 | if (irqchip_in_kernel(vcpu->kvm)) | 629 | if (irqchip_in_kernel(vcpu->kvm)) |
630 | return kvm_lapic_get_cr8(vcpu); | 630 | return kvm_lapic_get_cr8(vcpu); |
631 | else | 631 | else |
632 | return vcpu->arch.cr8; | 632 | return vcpu->arch.cr8; |
633 | } | 633 | } |
634 | EXPORT_SYMBOL_GPL(kvm_get_cr8); | 634 | EXPORT_SYMBOL_GPL(kvm_get_cr8); |
635 | 635 | ||
636 | static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) | 636 | static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) |
637 | { | 637 | { |
638 | switch (dr) { | 638 | switch (dr) { |
639 | case 0 ... 3: | 639 | case 0 ... 3: |
640 | vcpu->arch.db[dr] = val; | 640 | vcpu->arch.db[dr] = val; |
641 | if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) | 641 | if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) |
642 | vcpu->arch.eff_db[dr] = val; | 642 | vcpu->arch.eff_db[dr] = val; |
643 | break; | 643 | break; |
644 | case 4: | 644 | case 4: |
645 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) | 645 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) |
646 | return 1; /* #UD */ | 646 | return 1; /* #UD */ |
647 | /* fall through */ | 647 | /* fall through */ |
648 | case 6: | 648 | case 6: |
649 | if (val & 0xffffffff00000000ULL) | 649 | if (val & 0xffffffff00000000ULL) |
650 | return -1; /* #GP */ | 650 | return -1; /* #GP */ |
651 | vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; | 651 | vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; |
652 | break; | 652 | break; |
653 | case 5: | 653 | case 5: |
654 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) | 654 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) |
655 | return 1; /* #UD */ | 655 | return 1; /* #UD */ |
656 | /* fall through */ | 656 | /* fall through */ |
657 | default: /* 7 */ | 657 | default: /* 7 */ |
658 | if (val & 0xffffffff00000000ULL) | 658 | if (val & 0xffffffff00000000ULL) |
659 | return -1; /* #GP */ | 659 | return -1; /* #GP */ |
660 | vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1; | 660 | vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1; |
661 | if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) { | 661 | if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) { |
662 | kvm_x86_ops->set_dr7(vcpu, vcpu->arch.dr7); | 662 | kvm_x86_ops->set_dr7(vcpu, vcpu->arch.dr7); |
663 | vcpu->arch.switch_db_regs = (val & DR7_BP_EN_MASK); | 663 | vcpu->arch.switch_db_regs = (val & DR7_BP_EN_MASK); |
664 | } | 664 | } |
665 | break; | 665 | break; |
666 | } | 666 | } |
667 | 667 | ||
668 | return 0; | 668 | return 0; |
669 | } | 669 | } |
670 | 670 | ||
671 | int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) | 671 | int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) |
672 | { | 672 | { |
673 | int res; | 673 | int res; |
674 | 674 | ||
675 | res = __kvm_set_dr(vcpu, dr, val); | 675 | res = __kvm_set_dr(vcpu, dr, val); |
676 | if (res > 0) | 676 | if (res > 0) |
677 | kvm_queue_exception(vcpu, UD_VECTOR); | 677 | kvm_queue_exception(vcpu, UD_VECTOR); |
678 | else if (res < 0) | 678 | else if (res < 0) |
679 | kvm_inject_gp(vcpu, 0); | 679 | kvm_inject_gp(vcpu, 0); |
680 | 680 | ||
681 | return res; | 681 | return res; |
682 | } | 682 | } |
683 | EXPORT_SYMBOL_GPL(kvm_set_dr); | 683 | EXPORT_SYMBOL_GPL(kvm_set_dr); |
684 | 684 | ||
685 | static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) | 685 | static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) |
686 | { | 686 | { |
687 | switch (dr) { | 687 | switch (dr) { |
688 | case 0 ... 3: | 688 | case 0 ... 3: |
689 | *val = vcpu->arch.db[dr]; | 689 | *val = vcpu->arch.db[dr]; |
690 | break; | 690 | break; |
691 | case 4: | 691 | case 4: |
692 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) | 692 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) |
693 | return 1; | 693 | return 1; |
694 | /* fall through */ | 694 | /* fall through */ |
695 | case 6: | 695 | case 6: |
696 | *val = vcpu->arch.dr6; | 696 | *val = vcpu->arch.dr6; |
697 | break; | 697 | break; |
698 | case 5: | 698 | case 5: |
699 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) | 699 | if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) |
700 | return 1; | 700 | return 1; |
701 | /* fall through */ | 701 | /* fall through */ |
702 | default: /* 7 */ | 702 | default: /* 7 */ |
703 | *val = vcpu->arch.dr7; | 703 | *val = vcpu->arch.dr7; |
704 | break; | 704 | break; |
705 | } | 705 | } |
706 | 706 | ||
707 | return 0; | 707 | return 0; |
708 | } | 708 | } |
709 | 709 | ||
710 | int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) | 710 | int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) |
711 | { | 711 | { |
712 | if (_kvm_get_dr(vcpu, dr, val)) { | 712 | if (_kvm_get_dr(vcpu, dr, val)) { |
713 | kvm_queue_exception(vcpu, UD_VECTOR); | 713 | kvm_queue_exception(vcpu, UD_VECTOR); |
714 | return 1; | 714 | return 1; |
715 | } | 715 | } |
716 | return 0; | 716 | return 0; |
717 | } | 717 | } |
718 | EXPORT_SYMBOL_GPL(kvm_get_dr); | 718 | EXPORT_SYMBOL_GPL(kvm_get_dr); |
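
Both the set and get paths above encode the architectural DR4/DR5 aliasing, hence the fall-through cases:

	/* DR4/DR5 handling implied by the switch statements above:
	 *   CR4.DE == 0: DR4 aliases DR6, DR5 aliases DR7 (fall through)
	 *   CR4.DE == 1: any access raises #UD (return 1)
	 */
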
719 | 719 | ||
720 | /* | 720 | /* |
721 | * List of msr numbers which we expose to userspace through KVM_GET_MSRS | 721 | * List of msr numbers which we expose to userspace through KVM_GET_MSRS |
722 | * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. | 722 | * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. |
723 | * | 723 | * |
724 | * This list is modified at module load time to reflect the | 724 | * This list is modified at module load time to reflect the |
725 | * capabilities of the host cpu. This capabilities test skips MSRs that are | 725 | * capabilities of the host cpu. This capabilities test skips MSRs that are |
726 | * kvm-specific. Those are put at the beginning of the list. | 726 | * kvm-specific. Those are put at the beginning of the list. |
727 | */ | 727 | */ |
728 | 728 | ||
729 | #define KVM_SAVE_MSRS_BEGIN 7 | 729 | #define KVM_SAVE_MSRS_BEGIN 7 |
730 | static u32 msrs_to_save[] = { | 730 | static u32 msrs_to_save[] = { |
731 | MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, | 731 | MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, |
732 | MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, | 732 | MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, |
733 | HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, | 733 | HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, |
734 | HV_X64_MSR_APIC_ASSIST_PAGE, | 734 | HV_X64_MSR_APIC_ASSIST_PAGE, |
735 | MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, | 735 | MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, |
736 | MSR_K6_STAR, | 736 | MSR_K6_STAR, |
737 | #ifdef CONFIG_X86_64 | 737 | #ifdef CONFIG_X86_64 |
738 | MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, | 738 | MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, |
739 | #endif | 739 | #endif |
740 | MSR_IA32_TSC, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA | 740 | MSR_IA32_TSC, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA |
741 | }; | 741 | }; |
742 | 742 | ||
743 | static unsigned num_msrs_to_save; | 743 | static unsigned num_msrs_to_save; |
744 | 744 | ||
745 | static u32 emulated_msrs[] = { | 745 | static u32 emulated_msrs[] = { |
746 | MSR_IA32_MISC_ENABLE, | 746 | MSR_IA32_MISC_ENABLE, |
747 | }; | 747 | }; |
748 | 748 | ||
749 | static int set_efer(struct kvm_vcpu *vcpu, u64 efer) | 749 | static int set_efer(struct kvm_vcpu *vcpu, u64 efer) |
750 | { | 750 | { |
751 | u64 old_efer = vcpu->arch.efer; | 751 | u64 old_efer = vcpu->arch.efer; |
752 | 752 | ||
753 | if (efer & efer_reserved_bits) | 753 | if (efer & efer_reserved_bits) |
754 | return 1; | 754 | return 1; |
755 | 755 | ||
756 | if (is_paging(vcpu) | 756 | if (is_paging(vcpu) |
757 | && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) | 757 | && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) |
758 | return 1; | 758 | return 1; |
759 | 759 | ||
760 | if (efer & EFER_FFXSR) { | 760 | if (efer & EFER_FFXSR) { |
761 | struct kvm_cpuid_entry2 *feat; | 761 | struct kvm_cpuid_entry2 *feat; |
762 | 762 | ||
763 | feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); | 763 | feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); |
764 | if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT))) | 764 | if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT))) |
765 | return 1; | 765 | return 1; |
766 | } | 766 | } |
767 | 767 | ||
768 | if (efer & EFER_SVME) { | 768 | if (efer & EFER_SVME) { |
769 | struct kvm_cpuid_entry2 *feat; | 769 | struct kvm_cpuid_entry2 *feat; |
770 | 770 | ||
771 | feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); | 771 | feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); |
772 | if (!feat || !(feat->ecx & bit(X86_FEATURE_SVM))) | 772 | if (!feat || !(feat->ecx & bit(X86_FEATURE_SVM))) |
773 | return 1; | 773 | return 1; |
774 | } | 774 | } |
775 | 775 | ||
776 | efer &= ~EFER_LMA; | 776 | efer &= ~EFER_LMA; |
777 | efer |= vcpu->arch.efer & EFER_LMA; | 777 | efer |= vcpu->arch.efer & EFER_LMA; |
778 | 778 | ||
779 | kvm_x86_ops->set_efer(vcpu, efer); | 779 | kvm_x86_ops->set_efer(vcpu, efer); |
780 | 780 | ||
781 | vcpu->arch.mmu.base_role.nxe = (efer & EFER_NX) && !tdp_enabled; | 781 | vcpu->arch.mmu.base_role.nxe = (efer & EFER_NX) && !tdp_enabled; |
782 | kvm_mmu_reset_context(vcpu); | 782 | kvm_mmu_reset_context(vcpu); |
783 | 783 | ||
784 | /* Update reserved bits */ | 784 | /* Update reserved bits */ |
785 | if ((efer ^ old_efer) & EFER_NX) | 785 | if ((efer ^ old_efer) & EFER_NX) |
786 | kvm_mmu_reset_context(vcpu); | 786 | kvm_mmu_reset_context(vcpu); |
787 | 787 | ||
788 | return 0; | 788 | return 0; |
789 | } | 789 | } |
790 | 790 | ||
791 | void kvm_enable_efer_bits(u64 mask) | 791 | void kvm_enable_efer_bits(u64 mask) |
792 | { | 792 | { |
793 | efer_reserved_bits &= ~mask; | 793 | efer_reserved_bits &= ~mask; |
794 | } | 794 | } |
795 | EXPORT_SYMBOL_GPL(kvm_enable_efer_bits); | 795 | EXPORT_SYMBOL_GPL(kvm_enable_efer_bits); |
796 | 796 | ||
797 | 797 | ||
798 | /* | 798 | /* |
799 | * Writes msr value into the appropriate "register". | 799 | * Writes msr value into the appropriate "register". |
800 | * Returns 0 on success, non-0 otherwise. | 800 | * Returns 0 on success, non-0 otherwise. |
801 | * Assumes vcpu_load() was already called. | 801 | * Assumes vcpu_load() was already called. |
802 | */ | 802 | */ |
803 | int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) | 803 | int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) |
804 | { | 804 | { |
805 | return kvm_x86_ops->set_msr(vcpu, msr_index, data); | 805 | return kvm_x86_ops->set_msr(vcpu, msr_index, data); |
806 | } | 806 | } |
807 | 807 | ||
808 | /* | 808 | /* |
809 | * Adapt set_msr() to msr_io()'s calling convention | 809 | * Adapt set_msr() to msr_io()'s calling convention |
810 | */ | 810 | */ |
811 | static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data) | 811 | static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data) |
812 | { | 812 | { |
813 | return kvm_set_msr(vcpu, index, *data); | 813 | return kvm_set_msr(vcpu, index, *data); |
814 | } | 814 | } |
815 | 815 | ||
816 | static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) | 816 | static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) |
817 | { | 817 | { |
818 | int version; | 818 | int version; |
819 | int r; | 819 | int r; |
820 | struct pvclock_wall_clock wc; | 820 | struct pvclock_wall_clock wc; |
821 | struct timespec boot; | 821 | struct timespec boot; |
822 | 822 | ||
823 | if (!wall_clock) | 823 | if (!wall_clock) |
824 | return; | 824 | return; |
825 | 825 | ||
826 | r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version)); | 826 | r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version)); |
827 | if (r) | 827 | if (r) |
828 | return; | 828 | return; |
829 | 829 | ||
830 | if (version & 1) | 830 | if (version & 1) |
831 | ++version; /* first time write, random junk */ | 831 | ++version; /* first time write, random junk */ |
832 | 832 | ||
833 | ++version; | 833 | ++version; |
834 | 834 | ||
835 | kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); | 835 | kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); |
836 | 836 | ||
837 | /* | 837 | /* |
838 | * The guest calculates current wall clock time by adding | 838 | * The guest calculates current wall clock time by adding |
839 | * system time (updated by kvm_write_guest_time below) to the | 839 | * system time (updated by kvm_write_guest_time below) to the |
840 | * wall clock specified here. Guest system time equals host | 840 | * wall clock specified here. Guest system time equals host |
841 | * system time for us, so we must fill in the host boot time here. | 841 | * system time for us, so we must fill in the host boot time here. |
842 | */ | 842 | */ |
843 | getboottime(&boot); | 843 | getboottime(&boot); |
844 | 844 | ||
845 | wc.sec = boot.tv_sec; | 845 | wc.sec = boot.tv_sec; |
846 | wc.nsec = boot.tv_nsec; | 846 | wc.nsec = boot.tv_nsec; |
847 | wc.version = version; | 847 | wc.version = version; |
848 | 848 | ||
849 | kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc)); | 849 | kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc)); |
850 | 850 | ||
851 | version++; | 851 | version++; |
852 | kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); | 852 | kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); |
853 | } | 853 | } |
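
The version field implements a small seqcount: the host makes it odd before touching the payload and even again afterwards, so a guest reader retries until it observes the same even version on both sides of its copy. A minimal guest-side sketch (assuming a mapped struct pvclock_wall_clock *wc and kernel-style barriers; the helper name is hypothetical):

	/* Sketch of the reader loop implied by the version protocol. */
	static struct pvclock_wall_clock read_wall_clock(struct pvclock_wall_clock *wc)
	{
		struct pvclock_wall_clock snap;
		u32 version;

		do {
			version = wc->version;
			rmb();		/* read version before the payload */
			snap = *wc;
			rmb();		/* read payload before re-checking */
		} while ((version & 1) || version != wc->version);

		return snap;
	}
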
854 | 854 | ||
855 | static uint32_t div_frac(uint32_t dividend, uint32_t divisor) | 855 | static uint32_t div_frac(uint32_t dividend, uint32_t divisor) |
856 | { | 856 | { |
857 | uint32_t quotient, remainder; | 857 | uint32_t quotient, remainder; |
858 | 858 | ||
859 | /* Don't try to replace this with do_div(); it calculates | 859 | /* Don't try to replace this with do_div(); it calculates |
860 | * "(dividend << 32) / divisor" */ | 860 | * "(dividend << 32) / divisor" */ |
861 | __asm__ ( "divl %4" | 861 | __asm__ ( "divl %4" |
862 | : "=a" (quotient), "=d" (remainder) | 862 | : "=a" (quotient), "=d" (remainder) |
863 | : "0" (0), "1" (dividend), "r" (divisor) ); | 863 | : "0" (0), "1" (dividend), "r" (divisor) ); |
864 | return quotient; | 864 | return quotient; |
865 | } | 865 | } |
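
For readers unfamiliar with the divl idiom: it performs a 64-by-32 division of (dividend << 32) by divisor, with the quotient required to fit in 32 bits. A portable (if slower) equivalent, shown only for illustration:

	/* Portable equivalent of div_frac(); like the asm version it
	 * assumes the quotient fits in 32 bits. */
	static uint32_t div_frac_portable(uint32_t dividend, uint32_t divisor)
	{
		return (uint32_t)(((uint64_t)dividend << 32) / divisor);
	}
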
866 | 866 | ||
867 | static void kvm_set_time_scale(uint32_t tsc_khz, struct pvclock_vcpu_time_info *hv_clock) | 867 | static void kvm_set_time_scale(uint32_t tsc_khz, struct pvclock_vcpu_time_info *hv_clock) |
868 | { | 868 | { |
869 | uint64_t nsecs = 1000000000LL; | 869 | uint64_t nsecs = 1000000000LL; |
870 | int32_t shift = 0; | 870 | int32_t shift = 0; |
871 | uint64_t tps64; | 871 | uint64_t tps64; |
872 | uint32_t tps32; | 872 | uint32_t tps32; |
873 | 873 | ||
874 | tps64 = tsc_khz * 1000LL; | 874 | tps64 = tsc_khz * 1000LL; |
875 | while (tps64 > nsecs*2) { | 875 | while (tps64 > nsecs*2) { |
876 | tps64 >>= 1; | 876 | tps64 >>= 1; |
877 | shift--; | 877 | shift--; |
878 | } | 878 | } |
879 | 879 | ||
880 | tps32 = (uint32_t)tps64; | 880 | tps32 = (uint32_t)tps64; |
881 | while (tps32 <= (uint32_t)nsecs) { | 881 | while (tps32 <= (uint32_t)nsecs) { |
882 | tps32 <<= 1; | 882 | tps32 <<= 1; |
883 | shift++; | 883 | shift++; |
884 | } | 884 | } |
885 | 885 | ||
886 | hv_clock->tsc_shift = shift; | 886 | hv_clock->tsc_shift = shift; |
887 | hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32); | 887 | hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32); |
888 | 888 | ||
889 | pr_debug("%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n", | 889 | pr_debug("%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n", |
890 | __func__, tsc_khz, hv_clock->tsc_shift, | 890 | __func__, tsc_khz, hv_clock->tsc_shift, |
891 | hv_clock->tsc_to_system_mul); | 891 | hv_clock->tsc_to_system_mul); |
892 | } | 892 | } |
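
The two loops normalize the TSC rate so that guest nanoseconds can later be derived as (shifted tsc delta * tsc_to_system_mul) >> 32, with tsc_shift applied to the delta first. A worked example with a hypothetical 2 GHz TSC:

	/* Worked example (hypothetical 2 GHz TSC, i.e. tsc_khz = 2000000):
	 *   tps64 = 2000000 * 1000 = 2e9; not > 2 * nsecs, so no down-shift
	 *   tps32 = 2e9; already > nsecs, so no up-shift     -> tsc_shift = 0
	 *   mul   = (1e9 << 32) / 2e9 = 0x80000000           -> ns = tsc / 2
	 */
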
893 | 893 | ||
894 | static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz); | 894 | static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz); |
895 | 895 | ||
896 | static void kvm_write_guest_time(struct kvm_vcpu *v) | 896 | static void kvm_write_guest_time(struct kvm_vcpu *v) |
897 | { | 897 | { |
898 | struct timespec ts; | 898 | struct timespec ts; |
899 | unsigned long flags; | 899 | unsigned long flags; |
900 | struct kvm_vcpu_arch *vcpu = &v->arch; | 900 | struct kvm_vcpu_arch *vcpu = &v->arch; |
901 | void *shared_kaddr; | 901 | void *shared_kaddr; |
902 | unsigned long this_tsc_khz; | 902 | unsigned long this_tsc_khz; |
903 | 903 | ||
904 | if ((!vcpu->time_page)) | 904 | if ((!vcpu->time_page)) |
905 | return; | 905 | return; |
906 | 906 | ||
907 | this_tsc_khz = get_cpu_var(cpu_tsc_khz); | 907 | this_tsc_khz = get_cpu_var(cpu_tsc_khz); |
908 | if (unlikely(vcpu->hv_clock_tsc_khz != this_tsc_khz)) { | 908 | if (unlikely(vcpu->hv_clock_tsc_khz != this_tsc_khz)) { |
909 | kvm_set_time_scale(this_tsc_khz, &vcpu->hv_clock); | 909 | kvm_set_time_scale(this_tsc_khz, &vcpu->hv_clock); |
910 | vcpu->hv_clock_tsc_khz = this_tsc_khz; | 910 | vcpu->hv_clock_tsc_khz = this_tsc_khz; |
911 | } | 911 | } |
912 | put_cpu_var(cpu_tsc_khz); | 912 | put_cpu_var(cpu_tsc_khz); |
913 | 913 | ||
914 | /* Keep irq disabled to prevent changes to the clock */ | 914 | /* Keep irq disabled to prevent changes to the clock */ |
915 | local_irq_save(flags); | 915 | local_irq_save(flags); |
916 | kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp); | 916 | kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp); |
917 | ktime_get_ts(&ts); | 917 | ktime_get_ts(&ts); |
918 | monotonic_to_bootbased(&ts); | 918 | monotonic_to_bootbased(&ts); |
919 | local_irq_restore(flags); | 919 | local_irq_restore(flags); |
920 | 920 | ||
921 | /* With all the info we got, fill in the values */ | 921 | /* With all the info we got, fill in the values */ |
922 | 922 | ||
923 | vcpu->hv_clock.system_time = ts.tv_nsec + | 923 | vcpu->hv_clock.system_time = ts.tv_nsec + |
924 | (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset; | 924 | (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset; |
925 | 925 | ||
926 | vcpu->hv_clock.flags = 0; | 926 | vcpu->hv_clock.flags = 0; |
927 | 927 | ||
928 | /* | 928 | /* |
929 | * The interface expects us to write an even number signaling that the | 929 | * The interface expects us to write an even number signaling that the |
930 | * update is finished. Since the guest won't see the intermediate | 930 | * update is finished. Since the guest won't see the intermediate |
931 | * state, we just increase by 2 at the end. | 931 | * state, we just increase by 2 at the end. |
932 | */ | 932 | */ |
933 | vcpu->hv_clock.version += 2; | 933 | vcpu->hv_clock.version += 2; |
934 | 934 | ||
935 | shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); | 935 | shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); |
936 | 936 | ||
937 | memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, | 937 | memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, |
938 | sizeof(vcpu->hv_clock)); | 938 | sizeof(vcpu->hv_clock)); |
939 | 939 | ||
940 | kunmap_atomic(shared_kaddr, KM_USER0); | 940 | kunmap_atomic(shared_kaddr, KM_USER0); |
941 | 941 | ||
942 | mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); | 942 | mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); |
943 | } | 943 | } |
944 | 944 | ||
945 | static int kvm_request_guest_time_update(struct kvm_vcpu *v) | 945 | static int kvm_request_guest_time_update(struct kvm_vcpu *v) |
946 | { | 946 | { |
947 | struct kvm_vcpu_arch *vcpu = &v->arch; | 947 | struct kvm_vcpu_arch *vcpu = &v->arch; |
948 | 948 | ||
949 | if (!vcpu->time_page) | 949 | if (!vcpu->time_page) |
950 | return 0; | 950 | return 0; |
951 | set_bit(KVM_REQ_KVMCLOCK_UPDATE, &v->requests); | 951 | set_bit(KVM_REQ_KVMCLOCK_UPDATE, &v->requests); |
952 | return 1; | 952 | return 1; |
953 | } | 953 | } |
954 | 954 | ||
955 | static bool msr_mtrr_valid(unsigned msr) | 955 | static bool msr_mtrr_valid(unsigned msr) |
956 | { | 956 | { |
957 | switch (msr) { | 957 | switch (msr) { |
958 | case 0x200 ... 0x200 + 2 * KVM_NR_VAR_MTRR - 1: | 958 | case 0x200 ... 0x200 + 2 * KVM_NR_VAR_MTRR - 1: |
959 | case MSR_MTRRfix64K_00000: | 959 | case MSR_MTRRfix64K_00000: |
960 | case MSR_MTRRfix16K_80000: | 960 | case MSR_MTRRfix16K_80000: |
961 | case MSR_MTRRfix16K_A0000: | 961 | case MSR_MTRRfix16K_A0000: |
962 | case MSR_MTRRfix4K_C0000: | 962 | case MSR_MTRRfix4K_C0000: |
963 | case MSR_MTRRfix4K_C8000: | 963 | case MSR_MTRRfix4K_C8000: |
964 | case MSR_MTRRfix4K_D0000: | 964 | case MSR_MTRRfix4K_D0000: |
965 | case MSR_MTRRfix4K_D8000: | 965 | case MSR_MTRRfix4K_D8000: |
966 | case MSR_MTRRfix4K_E0000: | 966 | case MSR_MTRRfix4K_E0000: |
967 | case MSR_MTRRfix4K_E8000: | 967 | case MSR_MTRRfix4K_E8000: |
968 | case MSR_MTRRfix4K_F0000: | 968 | case MSR_MTRRfix4K_F0000: |
969 | case MSR_MTRRfix4K_F8000: | 969 | case MSR_MTRRfix4K_F8000: |
970 | case MSR_MTRRdefType: | 970 | case MSR_MTRRdefType: |
971 | case MSR_IA32_CR_PAT: | 971 | case MSR_IA32_CR_PAT: |
972 | return true; | 972 | return true; |
973 | case 0x2f8: | 973 | case 0x2f8: |
974 | return true; | 974 | return true; |
975 | } | 975 | } |
976 | return false; | 976 | return false; |
977 | } | 977 | } |
978 | 978 | ||
979 | static bool valid_pat_type(unsigned t) | 979 | static bool valid_pat_type(unsigned t) |
980 | { | 980 | { |
981 | return t < 8 && (1 << t) & 0xf3; /* 0, 1, 4, 5, 6, 7 */ | 981 | return t < 8 && (1 << t) & 0xf3; /* 0, 1, 4, 5, 6, 7 */ |
982 | } | 982 | } |
983 | 983 | ||
984 | static bool valid_mtrr_type(unsigned t) | 984 | static bool valid_mtrr_type(unsigned t) |
985 | { | 985 | { |
986 | return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */ | 986 | return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */ |
987 | } | 987 | } |
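
The (1 << t) & mask expression is a compact set-membership test: bit t of the constant is set exactly for the legal type encodings. 0xf3 admits 0, 1, 4, 5, 6, 7 (PAT also allows UC-, type 7), while 0x73 drops bit 7, which is not a valid MTRR type. Spelled out, the MTRR variant would read (illustrative only):

	/* Spelled-out equivalent of the bitmask membership test. */
	static bool valid_mtrr_type_verbose(unsigned t)
	{
		switch (t) {
		case 0: case 1: case 4: case 5: case 6:	/* == bits set in 0x73 */
			return true;
		default:
			return false;
		}
	}
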
988 | 988 | ||
989 | static bool mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data) | 989 | static bool mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data) |
990 | { | 990 | { |
991 | int i; | 991 | int i; |
992 | 992 | ||
993 | if (!msr_mtrr_valid(msr)) | 993 | if (!msr_mtrr_valid(msr)) |
994 | return false; | 994 | return false; |
995 | 995 | ||
996 | if (msr == MSR_IA32_CR_PAT) { | 996 | if (msr == MSR_IA32_CR_PAT) { |
997 | for (i = 0; i < 8; i++) | 997 | for (i = 0; i < 8; i++) |
998 | if (!valid_pat_type((data >> (i * 8)) & 0xff)) | 998 | if (!valid_pat_type((data >> (i * 8)) & 0xff)) |
999 | return false; | 999 | return false; |
1000 | return true; | 1000 | return true; |
1001 | } else if (msr == MSR_MTRRdefType) { | 1001 | } else if (msr == MSR_MTRRdefType) { |
1002 | if (data & ~0xcff) | 1002 | if (data & ~0xcff) |
1003 | return false; | 1003 | return false; |
1004 | return valid_mtrr_type(data & 0xff); | 1004 | return valid_mtrr_type(data & 0xff); |
1005 | } else if (msr >= MSR_MTRRfix64K_00000 && msr <= MSR_MTRRfix4K_F8000) { | 1005 | } else if (msr >= MSR_MTRRfix64K_00000 && msr <= MSR_MTRRfix4K_F8000) { |
1006 | for (i = 0; i < 8 ; i++) | 1006 | for (i = 0; i < 8 ; i++) |
1007 | if (!valid_mtrr_type((data >> (i * 8)) & 0xff)) | 1007 | if (!valid_mtrr_type((data >> (i * 8)) & 0xff)) |
1008 | return false; | 1008 | return false; |
1009 | return true; | 1009 | return true; |
1010 | } | 1010 | } |
1011 | 1011 | ||
1012 | /* variable MTRRs */ | 1012 | /* variable MTRRs */ |
1013 | return valid_mtrr_type(data & 0xff); | 1013 | return valid_mtrr_type(data & 0xff); |
1014 | } | 1014 | } |
1015 | 1015 | ||
1016 | static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data) | 1016 | static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data) |
1017 | { | 1017 | { |
1018 | u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges; | 1018 | u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges; |
1019 | 1019 | ||
1020 | if (!mtrr_valid(vcpu, msr, data)) | 1020 | if (!mtrr_valid(vcpu, msr, data)) |
1021 | return 1; | 1021 | return 1; |
1022 | 1022 | ||
1023 | if (msr == MSR_MTRRdefType) { | 1023 | if (msr == MSR_MTRRdefType) { |
1024 | vcpu->arch.mtrr_state.def_type = data; | 1024 | vcpu->arch.mtrr_state.def_type = data; |
1025 | vcpu->arch.mtrr_state.enabled = (data & 0xc00) >> 10; | 1025 | vcpu->arch.mtrr_state.enabled = (data & 0xc00) >> 10; |
1026 | } else if (msr == MSR_MTRRfix64K_00000) | 1026 | } else if (msr == MSR_MTRRfix64K_00000) |
1027 | p[0] = data; | 1027 | p[0] = data; |
1028 | else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000) | 1028 | else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000) |
1029 | p[1 + msr - MSR_MTRRfix16K_80000] = data; | 1029 | p[1 + msr - MSR_MTRRfix16K_80000] = data; |
1030 | else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000) | 1030 | else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000) |
1031 | p[3 + msr - MSR_MTRRfix4K_C0000] = data; | 1031 | p[3 + msr - MSR_MTRRfix4K_C0000] = data; |
1032 | else if (msr == MSR_IA32_CR_PAT) | 1032 | else if (msr == MSR_IA32_CR_PAT) |
1033 | vcpu->arch.pat = data; | 1033 | vcpu->arch.pat = data; |
1034 | else { /* Variable MTRRs */ | 1034 | else { /* Variable MTRRs */ |
1035 | int idx, is_mtrr_mask; | 1035 | int idx, is_mtrr_mask; |
1036 | u64 *pt; | 1036 | u64 *pt; |
1037 | 1037 | ||
1038 | idx = (msr - 0x200) / 2; | 1038 | idx = (msr - 0x200) / 2; |
1039 | is_mtrr_mask = msr - 0x200 - 2 * idx; | 1039 | is_mtrr_mask = msr - 0x200 - 2 * idx; |
1040 | if (!is_mtrr_mask) | 1040 | if (!is_mtrr_mask) |
1041 | pt = | 1041 | pt = |
1042 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo; | 1042 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo; |
1043 | else | 1043 | else |
1044 | pt = | 1044 | pt = |
1045 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo; | 1045 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo; |
1046 | *pt = data; | 1046 | *pt = data; |
1047 | } | 1047 | } |
1048 | 1048 | ||
1049 | kvm_mmu_reset_context(vcpu); | 1049 | kvm_mmu_reset_context(vcpu); |
1050 | return 0; | 1050 | return 0; |
1051 | } | 1051 | } |
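
The index arithmetic in the variable-MTRR branch relies on the base/mask registers being interleaved from MSR 0x200 upwards. A worked instance of the mapping (the MSR number is just an example of the pattern):

	/* Variable MTRR MSR layout, base/mask pairs starting at 0x200:
	 *   0x200 -> range 0 base   0x201 -> range 0 mask
	 *   0x202 -> range 1 base   0x203 -> range 1 mask
	 * e.g. msr = 0x205: idx = (0x205 - 0x200) / 2 = 2,
	 *      is_mtrr_mask = 0x205 - 0x200 - 2*2 = 1  -> mask of range 2
	 */
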
1052 | 1052 | ||
1053 | static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data) | 1053 | static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data) |
1054 | { | 1054 | { |
1055 | u64 mcg_cap = vcpu->arch.mcg_cap; | 1055 | u64 mcg_cap = vcpu->arch.mcg_cap; |
1056 | unsigned bank_num = mcg_cap & 0xff; | 1056 | unsigned bank_num = mcg_cap & 0xff; |
1057 | 1057 | ||
1058 | switch (msr) { | 1058 | switch (msr) { |
1059 | case MSR_IA32_MCG_STATUS: | 1059 | case MSR_IA32_MCG_STATUS: |
1060 | vcpu->arch.mcg_status = data; | 1060 | vcpu->arch.mcg_status = data; |
1061 | break; | 1061 | break; |
1062 | case MSR_IA32_MCG_CTL: | 1062 | case MSR_IA32_MCG_CTL: |
1063 | if (!(mcg_cap & MCG_CTL_P)) | 1063 | if (!(mcg_cap & MCG_CTL_P)) |
1064 | return 1; | 1064 | return 1; |
1065 | if (data != 0 && data != ~(u64)0) | 1065 | if (data != 0 && data != ~(u64)0) |
1066 | return -1; | 1066 | return -1; |
1067 | vcpu->arch.mcg_ctl = data; | 1067 | vcpu->arch.mcg_ctl = data; |
1068 | break; | 1068 | break; |
1069 | default: | 1069 | default: |
1070 | if (msr >= MSR_IA32_MC0_CTL && | 1070 | if (msr >= MSR_IA32_MC0_CTL && |
1071 | msr < MSR_IA32_MC0_CTL + 4 * bank_num) { | 1071 | msr < MSR_IA32_MC0_CTL + 4 * bank_num) { |
1072 | u32 offset = msr - MSR_IA32_MC0_CTL; | 1072 | u32 offset = msr - MSR_IA32_MC0_CTL; |
1073 | /* Only 0 or all 1s can be written to IA32_MCi_CTL. | 1073 | /* Only 0 or all 1s can be written to IA32_MCi_CTL. |
1074 | * Some Linux kernels, though, clear bit 10 in bank 4 to | 1074 | * Some Linux kernels, though, clear bit 10 in bank 4 to |
1075 | * work around a BIOS/GART TBL issue on AMD K8s; ignore | 1075 | * work around a BIOS/GART TBL issue on AMD K8s; ignore |
1076 | * this to avoid an uncaught #GP in the guest. | 1076 | * this to avoid an uncaught #GP in the guest. |
1077 | */ | 1077 | */ |
1078 | if ((offset & 0x3) == 0 && | 1078 | if ((offset & 0x3) == 0 && |
1079 | data != 0 && (data | (1 << 10)) != ~(u64)0) | 1079 | data != 0 && (data | (1 << 10)) != ~(u64)0) |
1080 | return -1; | 1080 | return -1; |
1081 | vcpu->arch.mce_banks[offset] = data; | 1081 | vcpu->arch.mce_banks[offset] = data; |
1082 | break; | 1082 | break; |
1083 | } | 1083 | } |
1084 | return 1; | 1084 | return 1; |
1085 | } | 1085 | } |
1086 | return 0; | 1086 | return 0; |
1087 | } | 1087 | } |
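
The default branch works because each MCE bank occupies four consecutive MSRs, so (offset & 0x3) == 0 singles out the per-bank CTL register, the only one with the 0-or-all-ones restriction:

	/* MCE bank layout, offset relative to MSR_IA32_MC0_CTL:
	 *   bank b: CTL    -> 4*b + 0   (offset & 3 == 0)
	 *           STATUS -> 4*b + 1
	 *           ADDR   -> 4*b + 2
	 *           MISC   -> 4*b + 3
	 */
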
1088 | 1088 | ||
1089 | static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data) | 1089 | static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data) |
1090 | { | 1090 | { |
1091 | struct kvm *kvm = vcpu->kvm; | 1091 | struct kvm *kvm = vcpu->kvm; |
1092 | int lm = is_long_mode(vcpu); | 1092 | int lm = is_long_mode(vcpu); |
1093 | u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64 | 1093 | u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64 |
1094 | : (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32; | 1094 | : (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32; |
1095 | u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64 | 1095 | u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64 |
1096 | : kvm->arch.xen_hvm_config.blob_size_32; | 1096 | : kvm->arch.xen_hvm_config.blob_size_32; |
1097 | u32 page_num = data & ~PAGE_MASK; | 1097 | u32 page_num = data & ~PAGE_MASK; |
1098 | u64 page_addr = data & PAGE_MASK; | 1098 | u64 page_addr = data & PAGE_MASK; |
1099 | u8 *page; | 1099 | u8 *page; |
1100 | int r; | 1100 | int r; |
1101 | 1101 | ||
1102 | r = -E2BIG; | 1102 | r = -E2BIG; |
1103 | if (page_num >= blob_size) | 1103 | if (page_num >= blob_size) |
1104 | goto out; | 1104 | goto out; |
1105 | r = -ENOMEM; | 1105 | r = -ENOMEM; |
1106 | page = kzalloc(PAGE_SIZE, GFP_KERNEL); | 1106 | page = kzalloc(PAGE_SIZE, GFP_KERNEL); |
1107 | if (!page) | 1107 | if (!page) |
1108 | goto out; | 1108 | goto out; |
1109 | r = -EFAULT; | 1109 | r = -EFAULT; |
1110 | if (copy_from_user(page, blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE)) | 1110 | if (copy_from_user(page, blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE)) |
1111 | goto out_free; | 1111 | goto out_free; |
1112 | if (kvm_write_guest(kvm, page_addr, page, PAGE_SIZE)) | 1112 | if (kvm_write_guest(kvm, page_addr, page, PAGE_SIZE)) |
1113 | goto out_free; | 1113 | goto out_free; |
1114 | r = 0; | 1114 | r = 0; |
1115 | out_free: | 1115 | out_free: |
1116 | kfree(page); | 1116 | kfree(page); |
1117 | out: | 1117 | out: |
1118 | return r; | 1118 | return r; |
1119 | } | 1119 | } |
1120 | 1120 | ||
1121 | static bool kvm_hv_hypercall_enabled(struct kvm *kvm) | 1121 | static bool kvm_hv_hypercall_enabled(struct kvm *kvm) |
1122 | { | 1122 | { |
1123 | return kvm->arch.hv_hypercall & HV_X64_MSR_HYPERCALL_ENABLE; | 1123 | return kvm->arch.hv_hypercall & HV_X64_MSR_HYPERCALL_ENABLE; |
1124 | } | 1124 | } |
1125 | 1125 | ||
1126 | static bool kvm_hv_msr_partition_wide(u32 msr) | 1126 | static bool kvm_hv_msr_partition_wide(u32 msr) |
1127 | { | 1127 | { |
1128 | bool r = false; | 1128 | bool r = false; |
1129 | switch (msr) { | 1129 | switch (msr) { |
1130 | case HV_X64_MSR_GUEST_OS_ID: | 1130 | case HV_X64_MSR_GUEST_OS_ID: |
1131 | case HV_X64_MSR_HYPERCALL: | 1131 | case HV_X64_MSR_HYPERCALL: |
1132 | r = true; | 1132 | r = true; |
1133 | break; | 1133 | break; |
1134 | } | 1134 | } |
1135 | 1135 | ||
1136 | return r; | 1136 | return r; |
1137 | } | 1137 | } |
1138 | 1138 | ||
1139 | static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) | 1139 | static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) |
1140 | { | 1140 | { |
1141 | struct kvm *kvm = vcpu->kvm; | 1141 | struct kvm *kvm = vcpu->kvm; |
1142 | 1142 | ||
1143 | switch (msr) { | 1143 | switch (msr) { |
1144 | case HV_X64_MSR_GUEST_OS_ID: | 1144 | case HV_X64_MSR_GUEST_OS_ID: |
1145 | kvm->arch.hv_guest_os_id = data; | 1145 | kvm->arch.hv_guest_os_id = data; |
1146 | /* setting guest os id to zero disables hypercall page */ | 1146 | /* setting guest os id to zero disables hypercall page */ |
1147 | if (!kvm->arch.hv_guest_os_id) | 1147 | if (!kvm->arch.hv_guest_os_id) |
1148 | kvm->arch.hv_hypercall &= ~HV_X64_MSR_HYPERCALL_ENABLE; | 1148 | kvm->arch.hv_hypercall &= ~HV_X64_MSR_HYPERCALL_ENABLE; |
1149 | break; | 1149 | break; |
1150 | case HV_X64_MSR_HYPERCALL: { | 1150 | case HV_X64_MSR_HYPERCALL: { |
1151 | u64 gfn; | 1151 | u64 gfn; |
1152 | unsigned long addr; | 1152 | unsigned long addr; |
1153 | u8 instructions[4]; | 1153 | u8 instructions[4]; |
1154 | 1154 | ||
1155 | /* if guest os id is not set, hypercall should remain disabled */ | 1155 | /* if guest os id is not set, hypercall should remain disabled */ |
1156 | if (!kvm->arch.hv_guest_os_id) | 1156 | if (!kvm->arch.hv_guest_os_id) |
1157 | break; | 1157 | break; |
1158 | if (!(data & HV_X64_MSR_HYPERCALL_ENABLE)) { | 1158 | if (!(data & HV_X64_MSR_HYPERCALL_ENABLE)) { |
1159 | kvm->arch.hv_hypercall = data; | 1159 | kvm->arch.hv_hypercall = data; |
1160 | break; | 1160 | break; |
1161 | } | 1161 | } |
1162 | gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT; | 1162 | gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT; |
1163 | addr = gfn_to_hva(kvm, gfn); | 1163 | addr = gfn_to_hva(kvm, gfn); |
1164 | if (kvm_is_error_hva(addr)) | 1164 | if (kvm_is_error_hva(addr)) |
1165 | return 1; | 1165 | return 1; |
1166 | kvm_x86_ops->patch_hypercall(vcpu, instructions); | 1166 | kvm_x86_ops->patch_hypercall(vcpu, instructions); |
1167 | ((unsigned char *)instructions)[3] = 0xc3; /* ret */ | 1167 | ((unsigned char *)instructions)[3] = 0xc3; /* ret */ |
1168 | if (copy_to_user((void __user *)addr, instructions, 4)) | 1168 | if (copy_to_user((void __user *)addr, instructions, 4)) |
1169 | return 1; | 1169 | return 1; |
1170 | kvm->arch.hv_hypercall = data; | 1170 | kvm->arch.hv_hypercall = data; |
1171 | break; | 1171 | break; |
1172 | } | 1172 | } |
1173 | default: | 1173 | default: |
1174 | pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x " | 1174 | pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x " |
1175 | "data 0x%llx\n", msr, data); | 1175 | "data 0x%llx\n", msr, data); |
1176 | return 1; | 1176 | return 1; |
1177 | } | 1177 | } |
1178 | return 0; | 1178 | return 0; |
1179 | } | 1179 | } |
1180 | 1180 | ||
1181 | static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data) | 1181 | static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data) |
1182 | { | 1182 | { |
1183 | switch (msr) { | 1183 | switch (msr) { |
1184 | case HV_X64_MSR_APIC_ASSIST_PAGE: { | 1184 | case HV_X64_MSR_APIC_ASSIST_PAGE: { |
1185 | unsigned long addr; | 1185 | unsigned long addr; |
1186 | 1186 | ||
1187 | if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) { | 1187 | if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) { |
1188 | vcpu->arch.hv_vapic = data; | 1188 | vcpu->arch.hv_vapic = data; |
1189 | break; | 1189 | break; |
1190 | } | 1190 | } |
1191 | addr = gfn_to_hva(vcpu->kvm, data >> | 1191 | addr = gfn_to_hva(vcpu->kvm, data >> |
1192 | HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT); | 1192 | HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT); |
1193 | if (kvm_is_error_hva(addr)) | 1193 | if (kvm_is_error_hva(addr)) |
1194 | return 1; | 1194 | return 1; |
1195 | if (clear_user((void __user *)addr, PAGE_SIZE)) | 1195 | if (clear_user((void __user *)addr, PAGE_SIZE)) |
1196 | return 1; | 1196 | return 1; |
1197 | vcpu->arch.hv_vapic = data; | 1197 | vcpu->arch.hv_vapic = data; |
1198 | break; | 1198 | break; |
1199 | } | 1199 | } |
1200 | case HV_X64_MSR_EOI: | 1200 | case HV_X64_MSR_EOI: |
1201 | return kvm_hv_vapic_msr_write(vcpu, APIC_EOI, data); | 1201 | return kvm_hv_vapic_msr_write(vcpu, APIC_EOI, data); |
1202 | case HV_X64_MSR_ICR: | 1202 | case HV_X64_MSR_ICR: |
1203 | return kvm_hv_vapic_msr_write(vcpu, APIC_ICR, data); | 1203 | return kvm_hv_vapic_msr_write(vcpu, APIC_ICR, data); |
1204 | case HV_X64_MSR_TPR: | 1204 | case HV_X64_MSR_TPR: |
1205 | return kvm_hv_vapic_msr_write(vcpu, APIC_TASKPRI, data); | 1205 | return kvm_hv_vapic_msr_write(vcpu, APIC_TASKPRI, data); |
1206 | default: | 1206 | default: |
1207 | pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x " | 1207 | pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x " |
1208 | "data 0x%llx\n", msr, data); | 1208 | "data 0x%llx\n", msr, data); |
1209 | return 1; | 1209 | return 1; |
1210 | } | 1210 | } |
1211 | 1211 | ||
1212 | return 0; | 1212 | return 0; |
1213 | } | 1213 | } |
1214 | 1214 | ||
1215 | int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) | 1215 | int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) |
1216 | { | 1216 | { |
1217 | switch (msr) { | 1217 | switch (msr) { |
1218 | case MSR_EFER: | 1218 | case MSR_EFER: |
1219 | return set_efer(vcpu, data); | 1219 | return set_efer(vcpu, data); |
1220 | case MSR_K7_HWCR: | 1220 | case MSR_K7_HWCR: |
1221 | data &= ~(u64)0x40; /* ignore flush filter disable */ | 1221 | data &= ~(u64)0x40; /* ignore flush filter disable */ |
1222 | data &= ~(u64)0x100; /* ignore ignne emulation enable */ | 1222 | data &= ~(u64)0x100; /* ignore ignne emulation enable */ |
1223 | if (data != 0) { | 1223 | if (data != 0) { |
1224 | pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n", | 1224 | pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n", |
1225 | data); | 1225 | data); |
1226 | return 1; | 1226 | return 1; |
1227 | } | 1227 | } |
1228 | break; | 1228 | break; |
1229 | case MSR_FAM10H_MMIO_CONF_BASE: | 1229 | case MSR_FAM10H_MMIO_CONF_BASE: |
1230 | if (data != 0) { | 1230 | if (data != 0) { |
1231 | pr_unimpl(vcpu, "unimplemented MMIO_CONF_BASE wrmsr: " | 1231 | pr_unimpl(vcpu, "unimplemented MMIO_CONF_BASE wrmsr: " |
1232 | "0x%llx\n", data); | 1232 | "0x%llx\n", data); |
1233 | return 1; | 1233 | return 1; |
1234 | } | 1234 | } |
1235 | break; | 1235 | break; |
1236 | case MSR_AMD64_NB_CFG: | 1236 | case MSR_AMD64_NB_CFG: |
1237 | break; | 1237 | break; |
1238 | case MSR_IA32_DEBUGCTLMSR: | 1238 | case MSR_IA32_DEBUGCTLMSR: |
1239 | if (!data) { | 1239 | if (!data) { |
1240 | /* We support the non-activated case already */ | 1240 | /* We support the non-activated case already */ |
1241 | break; | 1241 | break; |
1242 | } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) { | 1242 | } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) { |
1243 | /* Values other than LBR and BTF are vendor-specific, | 1243 | /* Values other than LBR and BTF are vendor-specific, |
1244 | thus reserved and should raise a #GP */ | 1244 | thus reserved and should raise a #GP */ |
1245 | return 1; | 1245 | return 1; |
1246 | } | 1246 | } |
1247 | pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", | 1247 | pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", |
1248 | __func__, data); | 1248 | __func__, data); |
1249 | break; | 1249 | break; |
1250 | case MSR_IA32_UCODE_REV: | 1250 | case MSR_IA32_UCODE_REV: |
1251 | case MSR_IA32_UCODE_WRITE: | 1251 | case MSR_IA32_UCODE_WRITE: |
1252 | case MSR_VM_HSAVE_PA: | 1252 | case MSR_VM_HSAVE_PA: |
1253 | case MSR_AMD64_PATCH_LOADER: | 1253 | case MSR_AMD64_PATCH_LOADER: |
1254 | break; | 1254 | break; |
1255 | case 0x200 ... 0x2ff: | 1255 | case 0x200 ... 0x2ff: |
1256 | return set_msr_mtrr(vcpu, msr, data); | 1256 | return set_msr_mtrr(vcpu, msr, data); |
1257 | case MSR_IA32_APICBASE: | 1257 | case MSR_IA32_APICBASE: |
1258 | kvm_set_apic_base(vcpu, data); | 1258 | kvm_set_apic_base(vcpu, data); |
1259 | break; | 1259 | break; |
1260 | case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: | 1260 | case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: |
1261 | return kvm_x2apic_msr_write(vcpu, msr, data); | 1261 | return kvm_x2apic_msr_write(vcpu, msr, data); |
1262 | case MSR_IA32_MISC_ENABLE: | 1262 | case MSR_IA32_MISC_ENABLE: |
1263 | vcpu->arch.ia32_misc_enable_msr = data; | 1263 | vcpu->arch.ia32_misc_enable_msr = data; |
1264 | break; | 1264 | break; |
1265 | case MSR_KVM_WALL_CLOCK_NEW: | 1265 | case MSR_KVM_WALL_CLOCK_NEW: |
1266 | case MSR_KVM_WALL_CLOCK: | 1266 | case MSR_KVM_WALL_CLOCK: |
1267 | vcpu->kvm->arch.wall_clock = data; | 1267 | vcpu->kvm->arch.wall_clock = data; |
1268 | kvm_write_wall_clock(vcpu->kvm, data); | 1268 | kvm_write_wall_clock(vcpu->kvm, data); |
1269 | break; | 1269 | break; |
1270 | case MSR_KVM_SYSTEM_TIME_NEW: | 1270 | case MSR_KVM_SYSTEM_TIME_NEW: |
1271 | case MSR_KVM_SYSTEM_TIME: { | 1271 | case MSR_KVM_SYSTEM_TIME: { |
1272 | if (vcpu->arch.time_page) { | 1272 | if (vcpu->arch.time_page) { |
1273 | kvm_release_page_dirty(vcpu->arch.time_page); | 1273 | kvm_release_page_dirty(vcpu->arch.time_page); |
1274 | vcpu->arch.time_page = NULL; | 1274 | vcpu->arch.time_page = NULL; |
1275 | } | 1275 | } |
1276 | 1276 | ||
1277 | vcpu->arch.time = data; | 1277 | vcpu->arch.time = data; |
1278 | 1278 | ||
1279 | /* we check whether the enable bit is set... */ | 1279 | /* we check whether the enable bit is set... */ |
1280 | if (!(data & 1)) | 1280 | if (!(data & 1)) |
1281 | break; | 1281 | break; |
1282 | 1282 | ||
1283 | /* ...but clean it before doing the actual write */ | 1283 | /* ...but clean it before doing the actual write */ |
1284 | vcpu->arch.time_offset = data & ~(PAGE_MASK | 1); | 1284 | vcpu->arch.time_offset = data & ~(PAGE_MASK | 1); |
1285 | 1285 | ||
1286 | vcpu->arch.time_page = | 1286 | vcpu->arch.time_page = |
1287 | gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); | 1287 | gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); |
1288 | 1288 | ||
1289 | if (is_error_page(vcpu->arch.time_page)) { | 1289 | if (is_error_page(vcpu->arch.time_page)) { |
1290 | kvm_release_page_clean(vcpu->arch.time_page); | 1290 | kvm_release_page_clean(vcpu->arch.time_page); |
1291 | vcpu->arch.time_page = NULL; | 1291 | vcpu->arch.time_page = NULL; |
1292 | } | 1292 | } |
1293 | 1293 | ||
1294 | kvm_request_guest_time_update(vcpu); | 1294 | kvm_request_guest_time_update(vcpu); |
1295 | break; | 1295 | break; |
1296 | } | 1296 | } |
1297 | case MSR_IA32_MCG_CTL: | 1297 | case MSR_IA32_MCG_CTL: |
1298 | case MSR_IA32_MCG_STATUS: | 1298 | case MSR_IA32_MCG_STATUS: |
1299 | case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1: | 1299 | case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1: |
1300 | return set_msr_mce(vcpu, msr, data); | 1300 | return set_msr_mce(vcpu, msr, data); |
1301 | 1301 | ||
1302 | /* Performance counters are not protected by a CPUID bit, | 1302 | /* Performance counters are not protected by a CPUID bit, |
1303 | * so we should check all of them in the generic path for the sake of | 1303 | * so we should check all of them in the generic path for the sake of |
1304 | * cross vendor migration. | 1304 | * cross vendor migration. |
1305 | * Writing a zero into the event select MSRs disables them, | 1305 | * Writing a zero into the event select MSRs disables them, |
1306 | * which we perfectly emulate ;-). Any other value should be at least | 1306 | * which we perfectly emulate ;-). Any other value should be at least |
1307 | * reported, some guests depend on them. | 1307 | * reported, some guests depend on them. |
1308 | */ | 1308 | */ |
1309 | case MSR_P6_EVNTSEL0: | 1309 | case MSR_P6_EVNTSEL0: |
1310 | case MSR_P6_EVNTSEL1: | 1310 | case MSR_P6_EVNTSEL1: |
1311 | case MSR_K7_EVNTSEL0: | 1311 | case MSR_K7_EVNTSEL0: |
1312 | case MSR_K7_EVNTSEL1: | 1312 | case MSR_K7_EVNTSEL1: |
1313 | case MSR_K7_EVNTSEL2: | 1313 | case MSR_K7_EVNTSEL2: |
1314 | case MSR_K7_EVNTSEL3: | 1314 | case MSR_K7_EVNTSEL3: |
1315 | if (data != 0) | 1315 | if (data != 0) |
1316 | pr_unimpl(vcpu, "unimplemented perfctr wrmsr: " | 1316 | pr_unimpl(vcpu, "unimplemented perfctr wrmsr: " |
1317 | "0x%x data 0x%llx\n", msr, data); | 1317 | "0x%x data 0x%llx\n", msr, data); |
1318 | break; | 1318 | break; |
1319 | /* at least RHEL 4 unconditionally writes to the perfctr registers, | 1319 | /* at least RHEL 4 unconditionally writes to the perfctr registers, |
1320 | * so we ignore writes to make it happy. | 1320 | * so we ignore writes to make it happy. |
1321 | */ | 1321 | */ |
1322 | case MSR_P6_PERFCTR0: | 1322 | case MSR_P6_PERFCTR0: |
1323 | case MSR_P6_PERFCTR1: | 1323 | case MSR_P6_PERFCTR1: |
1324 | case MSR_K7_PERFCTR0: | 1324 | case MSR_K7_PERFCTR0: |
1325 | case MSR_K7_PERFCTR1: | 1325 | case MSR_K7_PERFCTR1: |
1326 | case MSR_K7_PERFCTR2: | 1326 | case MSR_K7_PERFCTR2: |
1327 | case MSR_K7_PERFCTR3: | 1327 | case MSR_K7_PERFCTR3: |
1328 | pr_unimpl(vcpu, "unimplemented perfctr wrmsr: " | 1328 | pr_unimpl(vcpu, "unimplemented perfctr wrmsr: " |
1329 | "0x%x data 0x%llx\n", msr, data); | 1329 | "0x%x data 0x%llx\n", msr, data); |
1330 | break; | 1330 | break; |
1331 | case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15: | 1331 | case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15: |
1332 | if (kvm_hv_msr_partition_wide(msr)) { | 1332 | if (kvm_hv_msr_partition_wide(msr)) { |
1333 | int r; | 1333 | int r; |
1334 | mutex_lock(&vcpu->kvm->lock); | 1334 | mutex_lock(&vcpu->kvm->lock); |
1335 | r = set_msr_hyperv_pw(vcpu, msr, data); | 1335 | r = set_msr_hyperv_pw(vcpu, msr, data); |
1336 | mutex_unlock(&vcpu->kvm->lock); | 1336 | mutex_unlock(&vcpu->kvm->lock); |
1337 | return r; | 1337 | return r; |
1338 | } else | 1338 | } else |
1339 | return set_msr_hyperv(vcpu, msr, data); | 1339 | return set_msr_hyperv(vcpu, msr, data); |
1340 | break; | 1340 | break; |
1341 | default: | 1341 | default: |
1342 | if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr)) | 1342 | if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr)) |
1343 | return xen_hvm_config(vcpu, data); | 1343 | return xen_hvm_config(vcpu, data); |
1344 | if (!ignore_msrs) { | 1344 | if (!ignore_msrs) { |
1345 | pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", | 1345 | pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", |
1346 | msr, data); | 1346 | msr, data); |
1347 | return 1; | 1347 | return 1; |
1348 | } else { | 1348 | } else { |
1349 | pr_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n", | 1349 | pr_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n", |
1350 | msr, data); | 1350 | msr, data); |
1351 | break; | 1351 | break; |
1352 | } | 1352 | } |
1353 | } | 1353 | } |
1354 | return 0; | 1354 | return 0; |
1355 | } | 1355 | } |
1356 | EXPORT_SYMBOL_GPL(kvm_set_msr_common); | 1356 | EXPORT_SYMBOL_GPL(kvm_set_msr_common); |
1357 | 1357 | ||
1358 | 1358 | ||
1359 | /* | 1359 | /* |
1360 | * Reads an MSR value (at 'msr_index') into 'pdata'. | 1360 | * Reads an MSR value (at 'msr_index') into 'pdata'. |
1361 | * Returns 0 on success, non-0 otherwise. | 1361 | * Returns 0 on success, non-0 otherwise. |
1362 | * Assumes vcpu_load() was already called. | 1362 | * Assumes vcpu_load() was already called. |
1363 | */ | 1363 | */ |
1364 | int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) | 1364 | int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) |
1365 | { | 1365 | { |
1366 | return kvm_x86_ops->get_msr(vcpu, msr_index, pdata); | 1366 | return kvm_x86_ops->get_msr(vcpu, msr_index, pdata); |
1367 | } | 1367 | } |
1368 | 1368 | ||
1369 | static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) | 1369 | static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) |
1370 | { | 1370 | { |
1371 | u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges; | 1371 | u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges; |
1372 | 1372 | ||
1373 | if (!msr_mtrr_valid(msr)) | 1373 | if (!msr_mtrr_valid(msr)) |
1374 | return 1; | 1374 | return 1; |
1375 | 1375 | ||
1376 | if (msr == MSR_MTRRdefType) | 1376 | if (msr == MSR_MTRRdefType) |
1377 | *pdata = vcpu->arch.mtrr_state.def_type + | 1377 | *pdata = vcpu->arch.mtrr_state.def_type + |
1378 | (vcpu->arch.mtrr_state.enabled << 10); | 1378 | (vcpu->arch.mtrr_state.enabled << 10); |
1379 | else if (msr == MSR_MTRRfix64K_00000) | 1379 | else if (msr == MSR_MTRRfix64K_00000) |
1380 | *pdata = p[0]; | 1380 | *pdata = p[0]; |
1381 | else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000) | 1381 | else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000) |
1382 | *pdata = p[1 + msr - MSR_MTRRfix16K_80000]; | 1382 | *pdata = p[1 + msr - MSR_MTRRfix16K_80000]; |
1383 | else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000) | 1383 | else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000) |
1384 | *pdata = p[3 + msr - MSR_MTRRfix4K_C0000]; | 1384 | *pdata = p[3 + msr - MSR_MTRRfix4K_C0000]; |
1385 | else if (msr == MSR_IA32_CR_PAT) | 1385 | else if (msr == MSR_IA32_CR_PAT) |
1386 | *pdata = vcpu->arch.pat; | 1386 | *pdata = vcpu->arch.pat; |
1387 | else { /* Variable MTRRs */ | 1387 | else { /* Variable MTRRs */ |
1388 | int idx, is_mtrr_mask; | 1388 | int idx, is_mtrr_mask; |
1389 | u64 *pt; | 1389 | u64 *pt; |
1390 | 1390 | ||
1391 | idx = (msr - 0x200) / 2; | 1391 | idx = (msr - 0x200) / 2; |
1392 | is_mtrr_mask = msr - 0x200 - 2 * idx; | 1392 | is_mtrr_mask = msr - 0x200 - 2 * idx; |
1393 | if (!is_mtrr_mask) | 1393 | if (!is_mtrr_mask) |
1394 | pt = | 1394 | pt = |
1395 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo; | 1395 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo; |
1396 | else | 1396 | else |
1397 | pt = | 1397 | pt = |
1398 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo; | 1398 | (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo; |
1399 | *pdata = *pt; | 1399 | *pdata = *pt; |
1400 | } | 1400 | } |
1401 | 1401 | ||
1402 | return 0; | 1402 | return 0; |
1403 | } | 1403 | } |
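The variable-MTRR arm above leans on the register layout: starting at MSR 0x200, base and mask registers alternate, so 0x200/0x201 are MTRRphysBase0/MTRRphysMask0, 0x202/0x203 the pair for range 1, and so on. A minimal standalone sketch of the same decode (hypothetical demo, not kernel code):

#include <stdio.h>

int main(void)
{
        unsigned int msr;

        for (msr = 0x200; msr <= 0x205; msr++) {
                int idx = (msr - 0x200) / 2;              /* variable range number */
                int is_mtrr_mask = msr - 0x200 - 2 * idx; /* 0 = base, 1 = mask */

                printf("MSR 0x%x -> MTRRphys%s%d\n",
                       msr, is_mtrr_mask ? "Mask" : "Base", idx);
        }
        return 0;
}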
1404 | 1404 | ||
1405 | static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) | 1405 | static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) |
1406 | { | 1406 | { |
1407 | u64 data; | 1407 | u64 data; |
1408 | u64 mcg_cap = vcpu->arch.mcg_cap; | 1408 | u64 mcg_cap = vcpu->arch.mcg_cap; |
1409 | unsigned bank_num = mcg_cap & 0xff; | 1409 | unsigned bank_num = mcg_cap & 0xff; |
1410 | 1410 | ||
1411 | switch (msr) { | 1411 | switch (msr) { |
1412 | case MSR_IA32_P5_MC_ADDR: | 1412 | case MSR_IA32_P5_MC_ADDR: |
1413 | case MSR_IA32_P5_MC_TYPE: | 1413 | case MSR_IA32_P5_MC_TYPE: |
1414 | data = 0; | 1414 | data = 0; |
1415 | break; | 1415 | break; |
1416 | case MSR_IA32_MCG_CAP: | 1416 | case MSR_IA32_MCG_CAP: |
1417 | data = vcpu->arch.mcg_cap; | 1417 | data = vcpu->arch.mcg_cap; |
1418 | break; | 1418 | break; |
1419 | case MSR_IA32_MCG_CTL: | 1419 | case MSR_IA32_MCG_CTL: |
1420 | if (!(mcg_cap & MCG_CTL_P)) | 1420 | if (!(mcg_cap & MCG_CTL_P)) |
1421 | return 1; | 1421 | return 1; |
1422 | data = vcpu->arch.mcg_ctl; | 1422 | data = vcpu->arch.mcg_ctl; |
1423 | break; | 1423 | break; |
1424 | case MSR_IA32_MCG_STATUS: | 1424 | case MSR_IA32_MCG_STATUS: |
1425 | data = vcpu->arch.mcg_status; | 1425 | data = vcpu->arch.mcg_status; |
1426 | break; | 1426 | break; |
1427 | default: | 1427 | default: |
1428 | if (msr >= MSR_IA32_MC0_CTL && | 1428 | if (msr >= MSR_IA32_MC0_CTL && |
1429 | msr < MSR_IA32_MC0_CTL + 4 * bank_num) { | 1429 | msr < MSR_IA32_MC0_CTL + 4 * bank_num) { |
1430 | u32 offset = msr - MSR_IA32_MC0_CTL; | 1430 | u32 offset = msr - MSR_IA32_MC0_CTL; |
1431 | data = vcpu->arch.mce_banks[offset]; | 1431 | data = vcpu->arch.mce_banks[offset]; |
1432 | break; | 1432 | break; |
1433 | } | 1433 | } |
1434 | return 1; | 1434 | return 1; |
1435 | } | 1435 | } |
1436 | *pdata = data; | 1436 | *pdata = data; |
1437 | return 0; | 1437 | return 0; |
1438 | } | 1438 | } |
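get_msr_mce()'s default arm works because each machine-check bank owns four consecutive MSRs (CTL, STATUS, ADDR, MISC) starting at MSR_IA32_MC0_CTL, so msr - MSR_IA32_MC0_CTL indexes straight into the flat mce_banks array. A tiny sketch of that arithmetic (0x400 for MC0_CTL is per the x86 SDM; the demo itself is hypothetical):

#include <stdio.h>

#define MC0_CTL 0x400   /* MSR_IA32_MC0_CTL per the x86 SDM */

int main(void)
{
        unsigned int msr = 0x405;          /* example: bank 1's STATUS MSR */
        unsigned int slot = msr - MC0_CTL; /* flat index into mce_banks[] */

        printf("MSR 0x%x -> bank %u, reg %u (0=CTL 1=STATUS 2=ADDR 3=MISC)\n",
               msr, slot / 4, slot % 4);
        return 0;
}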
1439 | 1439 | ||
1440 | static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) | 1440 | static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) |
1441 | { | 1441 | { |
1442 | u64 data = 0; | 1442 | u64 data = 0; |
1443 | struct kvm *kvm = vcpu->kvm; | 1443 | struct kvm *kvm = vcpu->kvm; |
1444 | 1444 | ||
1445 | switch (msr) { | 1445 | switch (msr) { |
1446 | case HV_X64_MSR_GUEST_OS_ID: | 1446 | case HV_X64_MSR_GUEST_OS_ID: |
1447 | data = kvm->arch.hv_guest_os_id; | 1447 | data = kvm->arch.hv_guest_os_id; |
1448 | break; | 1448 | break; |
1449 | case HV_X64_MSR_HYPERCALL: | 1449 | case HV_X64_MSR_HYPERCALL: |
1450 | data = kvm->arch.hv_hypercall; | 1450 | data = kvm->arch.hv_hypercall; |
1451 | break; | 1451 | break; |
1452 | default: | 1452 | default: |
1453 | pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); | 1453 | pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); |
1454 | return 1; | 1454 | return 1; |
1455 | } | 1455 | } |
1456 | 1456 | ||
1457 | *pdata = data; | 1457 | *pdata = data; |
1458 | return 0; | 1458 | return 0; |
1459 | } | 1459 | } |
1460 | 1460 | ||
1461 | static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) | 1461 | static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) |
1462 | { | 1462 | { |
1463 | u64 data = 0; | 1463 | u64 data = 0; |
1464 | 1464 | ||
1465 | switch (msr) { | 1465 | switch (msr) { |
1466 | case HV_X64_MSR_VP_INDEX: { | 1466 | case HV_X64_MSR_VP_INDEX: { |
1467 | int r; | 1467 | int r; |
1468 | struct kvm_vcpu *v; | 1468 | struct kvm_vcpu *v; |
1469 | kvm_for_each_vcpu(r, v, vcpu->kvm) | 1469 | kvm_for_each_vcpu(r, v, vcpu->kvm) |
1470 | if (v == vcpu) | 1470 | if (v == vcpu) |
1471 | data = r; | 1471 | data = r; |
1472 | break; | 1472 | break; |
1473 | } | 1473 | } |
1474 | case HV_X64_MSR_EOI: | 1474 | case HV_X64_MSR_EOI: |
1475 | return kvm_hv_vapic_msr_read(vcpu, APIC_EOI, pdata); | 1475 | return kvm_hv_vapic_msr_read(vcpu, APIC_EOI, pdata); |
1476 | case HV_X64_MSR_ICR: | 1476 | case HV_X64_MSR_ICR: |
1477 | return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata); | 1477 | return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata); |
1478 | case HV_X64_MSR_TPR: | 1478 | case HV_X64_MSR_TPR: |
1479 | return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata); | 1479 | return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata); |
1480 | default: | 1480 | default: |
1481 | pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); | 1481 | pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); |
1482 | return 1; | 1482 | return 1; |
1483 | } | 1483 | } |
1484 | *pdata = data; | 1484 | *pdata = data; |
1485 | return 0; | 1485 | return 0; |
1486 | } | 1486 | } |
1487 | 1487 | ||
1488 | int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) | 1488 | int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) |
1489 | { | 1489 | { |
1490 | u64 data; | 1490 | u64 data; |
1491 | 1491 | ||
1492 | switch (msr) { | 1492 | switch (msr) { |
1493 | case MSR_IA32_PLATFORM_ID: | 1493 | case MSR_IA32_PLATFORM_ID: |
1494 | case MSR_IA32_UCODE_REV: | 1494 | case MSR_IA32_UCODE_REV: |
1495 | case MSR_IA32_EBL_CR_POWERON: | 1495 | case MSR_IA32_EBL_CR_POWERON: |
1496 | case MSR_IA32_DEBUGCTLMSR: | 1496 | case MSR_IA32_DEBUGCTLMSR: |
1497 | case MSR_IA32_LASTBRANCHFROMIP: | 1497 | case MSR_IA32_LASTBRANCHFROMIP: |
1498 | case MSR_IA32_LASTBRANCHTOIP: | 1498 | case MSR_IA32_LASTBRANCHTOIP: |
1499 | case MSR_IA32_LASTINTFROMIP: | 1499 | case MSR_IA32_LASTINTFROMIP: |
1500 | case MSR_IA32_LASTINTTOIP: | 1500 | case MSR_IA32_LASTINTTOIP: |
1501 | case MSR_K8_SYSCFG: | 1501 | case MSR_K8_SYSCFG: |
1502 | case MSR_K7_HWCR: | 1502 | case MSR_K7_HWCR: |
1503 | case MSR_VM_HSAVE_PA: | 1503 | case MSR_VM_HSAVE_PA: |
1504 | case MSR_P6_PERFCTR0: | 1504 | case MSR_P6_PERFCTR0: |
1505 | case MSR_P6_PERFCTR1: | 1505 | case MSR_P6_PERFCTR1: |
1506 | case MSR_P6_EVNTSEL0: | 1506 | case MSR_P6_EVNTSEL0: |
1507 | case MSR_P6_EVNTSEL1: | 1507 | case MSR_P6_EVNTSEL1: |
1508 | case MSR_K7_EVNTSEL0: | 1508 | case MSR_K7_EVNTSEL0: |
1509 | case MSR_K7_PERFCTR0: | 1509 | case MSR_K7_PERFCTR0: |
1510 | case MSR_K8_INT_PENDING_MSG: | 1510 | case MSR_K8_INT_PENDING_MSG: |
1511 | case MSR_AMD64_NB_CFG: | 1511 | case MSR_AMD64_NB_CFG: |
1512 | case MSR_FAM10H_MMIO_CONF_BASE: | 1512 | case MSR_FAM10H_MMIO_CONF_BASE: |
1513 | data = 0; | 1513 | data = 0; |
1514 | break; | 1514 | break; |
1515 | case MSR_MTRRcap: | 1515 | case MSR_MTRRcap: |
1516 | data = 0x500 | KVM_NR_VAR_MTRR; | 1516 | data = 0x500 | KVM_NR_VAR_MTRR; |
1517 | break; | 1517 | break; |
1518 | case 0x200 ... 0x2ff: | 1518 | case 0x200 ... 0x2ff: |
1519 | return get_msr_mtrr(vcpu, msr, pdata); | 1519 | return get_msr_mtrr(vcpu, msr, pdata); |
1520 | case 0xcd: /* fsb frequency */ | 1520 | case 0xcd: /* fsb frequency */ |
1521 | data = 3; | 1521 | data = 3; |
1522 | break; | 1522 | break; |
1523 | case MSR_IA32_APICBASE: | 1523 | case MSR_IA32_APICBASE: |
1524 | data = kvm_get_apic_base(vcpu); | 1524 | data = kvm_get_apic_base(vcpu); |
1525 | break; | 1525 | break; |
1526 | case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: | 1526 | case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: |
1527 | return kvm_x2apic_msr_read(vcpu, msr, pdata); | 1527 | return kvm_x2apic_msr_read(vcpu, msr, pdata); |
1528 | break; | 1528 | break; |
1529 | case MSR_IA32_MISC_ENABLE: | 1529 | case MSR_IA32_MISC_ENABLE: |
1530 | data = vcpu->arch.ia32_misc_enable_msr; | 1530 | data = vcpu->arch.ia32_misc_enable_msr; |
1531 | break; | 1531 | break; |
1532 | case MSR_IA32_PERF_STATUS: | 1532 | case MSR_IA32_PERF_STATUS: |
1533 | /* TSC increment by tick */ | 1533 | /* TSC increment by tick */ |
1534 | data = 1000ULL; | 1534 | data = 1000ULL; |
1535 | /* CPU multiplier */ | 1535 | /* CPU multiplier */ |
1536 | data |= (((uint64_t)4ULL) << 40); | 1536 | data |= (((uint64_t)4ULL) << 40); |
1537 | break; | 1537 | break; |
1538 | case MSR_EFER: | 1538 | case MSR_EFER: |
1539 | data = vcpu->arch.efer; | 1539 | data = vcpu->arch.efer; |
1540 | break; | 1540 | break; |
1541 | case MSR_KVM_WALL_CLOCK: | 1541 | case MSR_KVM_WALL_CLOCK: |
1542 | case MSR_KVM_WALL_CLOCK_NEW: | 1542 | case MSR_KVM_WALL_CLOCK_NEW: |
1543 | data = vcpu->kvm->arch.wall_clock; | 1543 | data = vcpu->kvm->arch.wall_clock; |
1544 | break; | 1544 | break; |
1545 | case MSR_KVM_SYSTEM_TIME: | 1545 | case MSR_KVM_SYSTEM_TIME: |
1546 | case MSR_KVM_SYSTEM_TIME_NEW: | 1546 | case MSR_KVM_SYSTEM_TIME_NEW: |
1547 | data = vcpu->arch.time; | 1547 | data = vcpu->arch.time; |
1548 | break; | 1548 | break; |
1549 | case MSR_IA32_P5_MC_ADDR: | 1549 | case MSR_IA32_P5_MC_ADDR: |
1550 | case MSR_IA32_P5_MC_TYPE: | 1550 | case MSR_IA32_P5_MC_TYPE: |
1551 | case MSR_IA32_MCG_CAP: | 1551 | case MSR_IA32_MCG_CAP: |
1552 | case MSR_IA32_MCG_CTL: | 1552 | case MSR_IA32_MCG_CTL: |
1553 | case MSR_IA32_MCG_STATUS: | 1553 | case MSR_IA32_MCG_STATUS: |
1554 | case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1: | 1554 | case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1: |
1555 | return get_msr_mce(vcpu, msr, pdata); | 1555 | return get_msr_mce(vcpu, msr, pdata); |
1556 | case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15: | 1556 | case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15: |
1557 | if (kvm_hv_msr_partition_wide(msr)) { | 1557 | if (kvm_hv_msr_partition_wide(msr)) { |
1558 | int r; | 1558 | int r; |
1559 | mutex_lock(&vcpu->kvm->lock); | 1559 | mutex_lock(&vcpu->kvm->lock); |
1560 | r = get_msr_hyperv_pw(vcpu, msr, pdata); | 1560 | r = get_msr_hyperv_pw(vcpu, msr, pdata); |
1561 | mutex_unlock(&vcpu->kvm->lock); | 1561 | mutex_unlock(&vcpu->kvm->lock); |
1562 | return r; | 1562 | return r; |
1563 | } else | 1563 | } else |
1564 | return get_msr_hyperv(vcpu, msr, pdata); | 1564 | return get_msr_hyperv(vcpu, msr, pdata); |
1565 | break; | 1565 | break; |
1566 | default: | 1566 | default: |
1567 | if (!ignore_msrs) { | 1567 | if (!ignore_msrs) { |
1568 | pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr); | 1568 | pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr); |
1569 | return 1; | 1569 | return 1; |
1570 | } else { | 1570 | } else { |
1571 | pr_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr); | 1571 | pr_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr); |
1572 | data = 0; | 1572 | data = 0; |
1573 | } | 1573 | } |
1574 | break; | 1574 | break; |
1575 | } | 1575 | } |
1576 | *pdata = data; | 1576 | *pdata = data; |
1577 | return 0; | 1577 | return 0; |
1578 | } | 1578 | } |
1579 | EXPORT_SYMBOL_GPL(kvm_get_msr_common); | 1579 | EXPORT_SYMBOL_GPL(kvm_get_msr_common); |
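The synthetic MSR_IA32_PERF_STATUS value built above packs a TSC increment of 1000 into the low bits and a CPU multiplier of 4 at bit 40. A sketch of how the two fields combine (the 5-bit multiplier mask below is an assumption for the demo; real guests differ in which bits they read):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* Same packing as kvm_get_msr_common(): TSC increment per tick
         * in the low bits, CPU multiplier at bit 40. */
        uint64_t data = 1000ULL | (4ULL << 40);

        printf("PERF_STATUS = %#llx, multiplier %llu\n",
               (unsigned long long)data,
               (unsigned long long)((data >> 40) & 0x1f)); /* demo mask only */
        return 0;
}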
1580 | 1580 | ||
1581 | /* | 1581 | /* |
1582 | * Read or write a bunch of MSRs. All parameters are kernel addresses. | 1582 | * Read or write a bunch of MSRs. All parameters are kernel addresses. |
1583 | * | 1583 | * |
1584 | * @return number of msrs set successfully. | 1584 | * @return number of msrs set successfully. |
1585 | */ | 1585 | */ |
1586 | static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs, | 1586 | static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs, |
1587 | struct kvm_msr_entry *entries, | 1587 | struct kvm_msr_entry *entries, |
1588 | int (*do_msr)(struct kvm_vcpu *vcpu, | 1588 | int (*do_msr)(struct kvm_vcpu *vcpu, |
1589 | unsigned index, u64 *data)) | 1589 | unsigned index, u64 *data)) |
1590 | { | 1590 | { |
1591 | int i, idx; | 1591 | int i, idx; |
1592 | 1592 | ||
1593 | idx = srcu_read_lock(&vcpu->kvm->srcu); | 1593 | idx = srcu_read_lock(&vcpu->kvm->srcu); |
1594 | for (i = 0; i < msrs->nmsrs; ++i) | 1594 | for (i = 0; i < msrs->nmsrs; ++i) |
1595 | if (do_msr(vcpu, entries[i].index, &entries[i].data)) | 1595 | if (do_msr(vcpu, entries[i].index, &entries[i].data)) |
1596 | break; | 1596 | break; |
1597 | srcu_read_unlock(&vcpu->kvm->srcu, idx); | 1597 | srcu_read_unlock(&vcpu->kvm->srcu, idx); |
1598 | 1598 | ||
1599 | return i; | 1599 | return i; |
1600 | } | 1600 | } |
1601 | 1601 | ||
1602 | /* | 1602 | /* |
1603 | * Read or write a bunch of MSRs. Parameters are user addresses. | 1603 | * Read or write a bunch of MSRs. Parameters are user addresses. |
1604 | * | 1604 | * |
1605 | * @return number of msrs set successfully. | 1605 | * @return number of msrs set successfully. |
1606 | */ | 1606 | */ |
1607 | static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs, | 1607 | static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs, |
1608 | int (*do_msr)(struct kvm_vcpu *vcpu, | 1608 | int (*do_msr)(struct kvm_vcpu *vcpu, |
1609 | unsigned index, u64 *data), | 1609 | unsigned index, u64 *data), |
1610 | int writeback) | 1610 | int writeback) |
1611 | { | 1611 | { |
1612 | struct kvm_msrs msrs; | 1612 | struct kvm_msrs msrs; |
1613 | struct kvm_msr_entry *entries; | 1613 | struct kvm_msr_entry *entries; |
1614 | int r, n; | 1614 | int r, n; |
1615 | unsigned size; | 1615 | unsigned size; |
1616 | 1616 | ||
1617 | r = -EFAULT; | 1617 | r = -EFAULT; |
1618 | if (copy_from_user(&msrs, user_msrs, sizeof msrs)) | 1618 | if (copy_from_user(&msrs, user_msrs, sizeof msrs)) |
1619 | goto out; | 1619 | goto out; |
1620 | 1620 | ||
1621 | r = -E2BIG; | 1621 | r = -E2BIG; |
1622 | if (msrs.nmsrs >= MAX_IO_MSRS) | 1622 | if (msrs.nmsrs >= MAX_IO_MSRS) |
1623 | goto out; | 1623 | goto out; |
1624 | 1624 | ||
1625 | r = -ENOMEM; | 1625 | r = -ENOMEM; |
1626 | size = sizeof(struct kvm_msr_entry) * msrs.nmsrs; | 1626 | size = sizeof(struct kvm_msr_entry) * msrs.nmsrs; |
1627 | entries = kmalloc(size, GFP_KERNEL); | 1627 | entries = kmalloc(size, GFP_KERNEL); |
1628 | if (!entries) | 1628 | if (!entries) |
1629 | goto out; | 1629 | goto out; |
1630 | 1630 | ||
1631 | r = -EFAULT; | 1631 | r = -EFAULT; |
1632 | if (copy_from_user(entries, user_msrs->entries, size)) | 1632 | if (copy_from_user(entries, user_msrs->entries, size)) |
1633 | goto out_free; | 1633 | goto out_free; |
1634 | 1634 | ||
1635 | r = n = __msr_io(vcpu, &msrs, entries, do_msr); | 1635 | r = n = __msr_io(vcpu, &msrs, entries, do_msr); |
1636 | if (r < 0) | 1636 | if (r < 0) |
1637 | goto out_free; | 1637 | goto out_free; |
1638 | 1638 | ||
1639 | r = -EFAULT; | 1639 | r = -EFAULT; |
1640 | if (writeback && copy_to_user(user_msrs->entries, entries, size)) | 1640 | if (writeback && copy_to_user(user_msrs->entries, entries, size)) |
1641 | goto out_free; | 1641 | goto out_free; |
1642 | 1642 | ||
1643 | r = n; | 1643 | r = n; |
1644 | 1644 | ||
1645 | out_free: | 1645 | out_free: |
1646 | kfree(entries); | 1646 | kfree(entries); |
1647 | out: | 1647 | out: |
1648 | return r; | 1648 | return r; |
1649 | } | 1649 | } |
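msr_io() is what ultimately serves the KVM_GET_MSRS/KVM_SET_MSRS vcpu ioctls: a header carrying a count followed by an array of index/data entries, with the return value being the number of MSRs processed. A minimal userspace sketch of the read side ('vcpu_fd' and 'read_apic_base' are assumptions; the fd would come from KVM_CREATE_VCPU elsewhere):

#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read one MSR (IA32_APICBASE, 0x1b) through the batching interface. */
int read_apic_base(int vcpu_fd)
{
        struct kvm_msrs *msrs;
        int r;

        msrs = calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
        if (!msrs)
                return -1;
        msrs->nmsrs = 1;
        msrs->entries[0].index = 0x1b;  /* MSR_IA32_APICBASE */

        r = ioctl(vcpu_fd, KVM_GET_MSRS, msrs); /* returns #MSRs read */
        if (r == 1)
                printf("APIC_BASE = %#llx\n",
                       (unsigned long long)msrs->entries[0].data);
        free(msrs);
        return r == 1 ? 0 : -1;
}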
1650 | 1650 | ||
1651 | int kvm_dev_ioctl_check_extension(long ext) | 1651 | int kvm_dev_ioctl_check_extension(long ext) |
1652 | { | 1652 | { |
1653 | int r; | 1653 | int r; |
1654 | 1654 | ||
1655 | switch (ext) { | 1655 | switch (ext) { |
1656 | case KVM_CAP_IRQCHIP: | 1656 | case KVM_CAP_IRQCHIP: |
1657 | case KVM_CAP_HLT: | 1657 | case KVM_CAP_HLT: |
1658 | case KVM_CAP_MMU_SHADOW_CACHE_CONTROL: | 1658 | case KVM_CAP_MMU_SHADOW_CACHE_CONTROL: |
1659 | case KVM_CAP_SET_TSS_ADDR: | 1659 | case KVM_CAP_SET_TSS_ADDR: |
1660 | case KVM_CAP_EXT_CPUID: | 1660 | case KVM_CAP_EXT_CPUID: |
1661 | case KVM_CAP_CLOCKSOURCE: | 1661 | case KVM_CAP_CLOCKSOURCE: |
1662 | case KVM_CAP_PIT: | 1662 | case KVM_CAP_PIT: |
1663 | case KVM_CAP_NOP_IO_DELAY: | 1663 | case KVM_CAP_NOP_IO_DELAY: |
1664 | case KVM_CAP_MP_STATE: | 1664 | case KVM_CAP_MP_STATE: |
1665 | case KVM_CAP_SYNC_MMU: | 1665 | case KVM_CAP_SYNC_MMU: |
1666 | case KVM_CAP_REINJECT_CONTROL: | 1666 | case KVM_CAP_REINJECT_CONTROL: |
1667 | case KVM_CAP_IRQ_INJECT_STATUS: | 1667 | case KVM_CAP_IRQ_INJECT_STATUS: |
1668 | case KVM_CAP_ASSIGN_DEV_IRQ: | 1668 | case KVM_CAP_ASSIGN_DEV_IRQ: |
1669 | case KVM_CAP_IRQFD: | 1669 | case KVM_CAP_IRQFD: |
1670 | case KVM_CAP_IOEVENTFD: | 1670 | case KVM_CAP_IOEVENTFD: |
1671 | case KVM_CAP_PIT2: | 1671 | case KVM_CAP_PIT2: |
1672 | case KVM_CAP_PIT_STATE2: | 1672 | case KVM_CAP_PIT_STATE2: |
1673 | case KVM_CAP_SET_IDENTITY_MAP_ADDR: | 1673 | case KVM_CAP_SET_IDENTITY_MAP_ADDR: |
1674 | case KVM_CAP_XEN_HVM: | 1674 | case KVM_CAP_XEN_HVM: |
1675 | case KVM_CAP_ADJUST_CLOCK: | 1675 | case KVM_CAP_ADJUST_CLOCK: |
1676 | case KVM_CAP_VCPU_EVENTS: | 1676 | case KVM_CAP_VCPU_EVENTS: |
1677 | case KVM_CAP_HYPERV: | 1677 | case KVM_CAP_HYPERV: |
1678 | case KVM_CAP_HYPERV_VAPIC: | 1678 | case KVM_CAP_HYPERV_VAPIC: |
1679 | case KVM_CAP_HYPERV_SPIN: | 1679 | case KVM_CAP_HYPERV_SPIN: |
1680 | case KVM_CAP_PCI_SEGMENT: | 1680 | case KVM_CAP_PCI_SEGMENT: |
1681 | case KVM_CAP_DEBUGREGS: | 1681 | case KVM_CAP_DEBUGREGS: |
1682 | case KVM_CAP_X86_ROBUST_SINGLESTEP: | 1682 | case KVM_CAP_X86_ROBUST_SINGLESTEP: |
1683 | case KVM_CAP_XSAVE: | 1683 | case KVM_CAP_XSAVE: |
1684 | r = 1; | 1684 | r = 1; |
1685 | break; | 1685 | break; |
1686 | case KVM_CAP_COALESCED_MMIO: | 1686 | case KVM_CAP_COALESCED_MMIO: |
1687 | r = KVM_COALESCED_MMIO_PAGE_OFFSET; | 1687 | r = KVM_COALESCED_MMIO_PAGE_OFFSET; |
1688 | break; | 1688 | break; |
1689 | case KVM_CAP_VAPIC: | 1689 | case KVM_CAP_VAPIC: |
1690 | r = !kvm_x86_ops->cpu_has_accelerated_tpr(); | 1690 | r = !kvm_x86_ops->cpu_has_accelerated_tpr(); |
1691 | break; | 1691 | break; |
1692 | case KVM_CAP_NR_VCPUS: | 1692 | case KVM_CAP_NR_VCPUS: |
1693 | r = KVM_MAX_VCPUS; | 1693 | r = KVM_MAX_VCPUS; |
1694 | break; | 1694 | break; |
1695 | case KVM_CAP_NR_MEMSLOTS: | 1695 | case KVM_CAP_NR_MEMSLOTS: |
1696 | r = KVM_MEMORY_SLOTS; | 1696 | r = KVM_MEMORY_SLOTS; |
1697 | break; | 1697 | break; |
1698 | case KVM_CAP_PV_MMU: /* obsolete */ | 1698 | case KVM_CAP_PV_MMU: /* obsolete */ |
1699 | r = 0; | 1699 | r = 0; |
1700 | break; | 1700 | break; |
1701 | case KVM_CAP_IOMMU: | 1701 | case KVM_CAP_IOMMU: |
1702 | r = iommu_found(); | 1702 | r = iommu_found(); |
1703 | break; | 1703 | break; |
1704 | case KVM_CAP_MCE: | 1704 | case KVM_CAP_MCE: |
1705 | r = KVM_MAX_MCE_BANKS; | 1705 | r = KVM_MAX_MCE_BANKS; |
1706 | break; | 1706 | break; |
1707 | case KVM_CAP_XCRS: | 1707 | case KVM_CAP_XCRS: |
1708 | r = cpu_has_xsave; | 1708 | r = cpu_has_xsave; |
1709 | break; | 1709 | break; |
1710 | default: | 1710 | default: |
1711 | r = 0; | 1711 | r = 0; |
1712 | break; | 1712 | break; |
1713 | } | 1713 | } |
1714 | return r; | 1714 | return r; |
1715 | 1715 | ||
1716 | } | 1716 | } |
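kvm_dev_ioctl_check_extension() backs the KVM_CHECK_EXTENSION ioctl on the /dev/kvm fd, and as the switch above shows, the return value is not always boolean: KVM_CAP_NR_MEMSLOTS returns a count and KVM_CAP_COALESCED_MMIO a page offset. A minimal userspace probe might look like this sketch:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
        int kvm = open("/dev/kvm", O_RDWR);

        if (kvm < 0)
                return 1;
        /* Boolean capability */
        printf("KVM_CAP_IRQCHIP     -> %d\n",
               ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_IRQCHIP));
        /* Count-valued capability */
        printf("KVM_CAP_NR_MEMSLOTS -> %d\n",
               ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS));
        return 0;
}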
1717 | 1717 | ||
1718 | long kvm_arch_dev_ioctl(struct file *filp, | 1718 | long kvm_arch_dev_ioctl(struct file *filp, |
1719 | unsigned int ioctl, unsigned long arg) | 1719 | unsigned int ioctl, unsigned long arg) |
1720 | { | 1720 | { |
1721 | void __user *argp = (void __user *)arg; | 1721 | void __user *argp = (void __user *)arg; |
1722 | long r; | 1722 | long r; |
1723 | 1723 | ||
1724 | switch (ioctl) { | 1724 | switch (ioctl) { |
1725 | case KVM_GET_MSR_INDEX_LIST: { | 1725 | case KVM_GET_MSR_INDEX_LIST: { |
1726 | struct kvm_msr_list __user *user_msr_list = argp; | 1726 | struct kvm_msr_list __user *user_msr_list = argp; |
1727 | struct kvm_msr_list msr_list; | 1727 | struct kvm_msr_list msr_list; |
1728 | unsigned n; | 1728 | unsigned n; |
1729 | 1729 | ||
1730 | r = -EFAULT; | 1730 | r = -EFAULT; |
1731 | if (copy_from_user(&msr_list, user_msr_list, sizeof msr_list)) | 1731 | if (copy_from_user(&msr_list, user_msr_list, sizeof msr_list)) |
1732 | goto out; | 1732 | goto out; |
1733 | n = msr_list.nmsrs; | 1733 | n = msr_list.nmsrs; |
1734 | msr_list.nmsrs = num_msrs_to_save + ARRAY_SIZE(emulated_msrs); | 1734 | msr_list.nmsrs = num_msrs_to_save + ARRAY_SIZE(emulated_msrs); |
1735 | if (copy_to_user(user_msr_list, &msr_list, sizeof msr_list)) | 1735 | if (copy_to_user(user_msr_list, &msr_list, sizeof msr_list)) |
1736 | goto out; | 1736 | goto out; |
1737 | r = -E2BIG; | 1737 | r = -E2BIG; |
1738 | if (n < msr_list.nmsrs) | 1738 | if (n < msr_list.nmsrs) |
1739 | goto out; | 1739 | goto out; |
1740 | r = -EFAULT; | 1740 | r = -EFAULT; |
1741 | if (copy_to_user(user_msr_list->indices, &msrs_to_save, | 1741 | if (copy_to_user(user_msr_list->indices, &msrs_to_save, |
1742 | num_msrs_to_save * sizeof(u32))) | 1742 | num_msrs_to_save * sizeof(u32))) |
1743 | goto out; | 1743 | goto out; |
1744 | if (copy_to_user(user_msr_list->indices + num_msrs_to_save, | 1744 | if (copy_to_user(user_msr_list->indices + num_msrs_to_save, |
1745 | &emulated_msrs, | 1745 | &emulated_msrs, |
1746 | ARRAY_SIZE(emulated_msrs) * sizeof(u32))) | 1746 | ARRAY_SIZE(emulated_msrs) * sizeof(u32))) |
1747 | goto out; | 1747 | goto out; |
1748 | r = 0; | 1748 | r = 0; |
1749 | break; | 1749 | break; |
1750 | } | 1750 | } |
1751 | case KVM_GET_SUPPORTED_CPUID: { | 1751 | case KVM_GET_SUPPORTED_CPUID: { |
1752 | struct kvm_cpuid2 __user *cpuid_arg = argp; | 1752 | struct kvm_cpuid2 __user *cpuid_arg = argp; |
1753 | struct kvm_cpuid2 cpuid; | 1753 | struct kvm_cpuid2 cpuid; |
1754 | 1754 | ||
1755 | r = -EFAULT; | 1755 | r = -EFAULT; |
1756 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) | 1756 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) |
1757 | goto out; | 1757 | goto out; |
1758 | r = kvm_dev_ioctl_get_supported_cpuid(&cpuid, | 1758 | r = kvm_dev_ioctl_get_supported_cpuid(&cpuid, |
1759 | cpuid_arg->entries); | 1759 | cpuid_arg->entries); |
1760 | if (r) | 1760 | if (r) |
1761 | goto out; | 1761 | goto out; |
1762 | 1762 | ||
1763 | r = -EFAULT; | 1763 | r = -EFAULT; |
1764 | if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid)) | 1764 | if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid)) |
1765 | goto out; | 1765 | goto out; |
1766 | r = 0; | 1766 | r = 0; |
1767 | break; | 1767 | break; |
1768 | } | 1768 | } |
1769 | case KVM_X86_GET_MCE_CAP_SUPPORTED: { | 1769 | case KVM_X86_GET_MCE_CAP_SUPPORTED: { |
1770 | u64 mce_cap; | 1770 | u64 mce_cap; |
1771 | 1771 | ||
1772 | mce_cap = KVM_MCE_CAP_SUPPORTED; | 1772 | mce_cap = KVM_MCE_CAP_SUPPORTED; |
1773 | r = -EFAULT; | 1773 | r = -EFAULT; |
1774 | if (copy_to_user(argp, &mce_cap, sizeof mce_cap)) | 1774 | if (copy_to_user(argp, &mce_cap, sizeof mce_cap)) |
1775 | goto out; | 1775 | goto out; |
1776 | r = 0; | 1776 | r = 0; |
1777 | break; | 1777 | break; |
1778 | } | 1778 | } |
1779 | default: | 1779 | default: |
1780 | r = -EINVAL; | 1780 | r = -EINVAL; |
1781 | } | 1781 | } |
1782 | out: | 1782 | out: |
1783 | return r; | 1783 | return r; |
1784 | } | 1784 | } |
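Because KVM_GET_MSR_INDEX_LIST writes the required count back into nmsrs before failing with -E2BIG, userspace can size its buffer with a deliberately undersized first call. A sketch of that two-call pattern ('get_msr_index_list' and 'kvm_fd' are hypothetical; the fd is the /dev/kvm fd):

#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

struct kvm_msr_list *get_msr_index_list(int kvm_fd)
{
        struct kvm_msr_list probe = { .nmsrs = 0 }, *list;

        /* First call fails with E2BIG but writes the real count back. */
        if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &probe) < 0 &&
            errno != E2BIG)
                return NULL;

        list = calloc(1, sizeof(*list) + probe.nmsrs * sizeof(__u32));
        if (!list)
                return NULL;
        list->nmsrs = probe.nmsrs;
        if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
                free(list);
                return NULL;
        }
        return list;
}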
1785 | 1785 | ||
1786 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) | 1786 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) |
1787 | { | 1787 | { |
1788 | kvm_x86_ops->vcpu_load(vcpu, cpu); | 1788 | kvm_x86_ops->vcpu_load(vcpu, cpu); |
1789 | if (unlikely(per_cpu(cpu_tsc_khz, cpu) == 0)) { | 1789 | if (unlikely(per_cpu(cpu_tsc_khz, cpu) == 0)) { |
1790 | unsigned long khz = cpufreq_quick_get(cpu); | 1790 | unsigned long khz = cpufreq_quick_get(cpu); |
1791 | if (!khz) | 1791 | if (!khz) |
1792 | khz = tsc_khz; | 1792 | khz = tsc_khz; |
1793 | per_cpu(cpu_tsc_khz, cpu) = khz; | 1793 | per_cpu(cpu_tsc_khz, cpu) = khz; |
1794 | } | 1794 | } |
1795 | kvm_request_guest_time_update(vcpu); | 1795 | kvm_request_guest_time_update(vcpu); |
1796 | } | 1796 | } |
1797 | 1797 | ||
1798 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) | 1798 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) |
1799 | { | 1799 | { |
1800 | kvm_x86_ops->vcpu_put(vcpu); | 1800 | kvm_x86_ops->vcpu_put(vcpu); |
1801 | kvm_put_guest_fpu(vcpu); | 1801 | kvm_put_guest_fpu(vcpu); |
1802 | } | 1802 | } |
1803 | 1803 | ||
1804 | static int is_efer_nx(void) | 1804 | static int is_efer_nx(void) |
1805 | { | 1805 | { |
1806 | unsigned long long efer = 0; | 1806 | unsigned long long efer = 0; |
1807 | 1807 | ||
1808 | rdmsrl_safe(MSR_EFER, &efer); | 1808 | rdmsrl_safe(MSR_EFER, &efer); |
1809 | return efer & EFER_NX; | 1809 | return efer & EFER_NX; |
1810 | } | 1810 | } |
1811 | 1811 | ||
1812 | static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu) | 1812 | static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu) |
1813 | { | 1813 | { |
1814 | int i; | 1814 | int i; |
1815 | struct kvm_cpuid_entry2 *e, *entry; | 1815 | struct kvm_cpuid_entry2 *e, *entry; |
1816 | 1816 | ||
1817 | entry = NULL; | 1817 | entry = NULL; |
1818 | for (i = 0; i < vcpu->arch.cpuid_nent; ++i) { | 1818 | for (i = 0; i < vcpu->arch.cpuid_nent; ++i) { |
1819 | e = &vcpu->arch.cpuid_entries[i]; | 1819 | e = &vcpu->arch.cpuid_entries[i]; |
1820 | if (e->function == 0x80000001) { | 1820 | if (e->function == 0x80000001) { |
1821 | entry = e; | 1821 | entry = e; |
1822 | break; | 1822 | break; |
1823 | } | 1823 | } |
1824 | } | 1824 | } |
1825 | if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) { | 1825 | if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) { |
1826 | entry->edx &= ~(1 << 20); | 1826 | entry->edx &= ~(1 << 20); |
1827 | printk(KERN_INFO "kvm: guest NX capability removed\n"); | 1827 | printk(KERN_INFO "kvm: guest NX capability removed\n"); |
1828 | } | 1828 | } |
1829 | } | 1829 | } |
1830 | 1830 | ||
1831 | /* Used when an old userspace process feeds the legacy cpuid format to a newer kernel module. */ | 1831 | /* Used when an old userspace process feeds the legacy cpuid format to a newer kernel module. */ |
1832 | static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, | 1832 | static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, |
1833 | struct kvm_cpuid *cpuid, | 1833 | struct kvm_cpuid *cpuid, |
1834 | struct kvm_cpuid_entry __user *entries) | 1834 | struct kvm_cpuid_entry __user *entries) |
1835 | { | 1835 | { |
1836 | int r, i; | 1836 | int r, i; |
1837 | struct kvm_cpuid_entry *cpuid_entries; | 1837 | struct kvm_cpuid_entry *cpuid_entries; |
1838 | 1838 | ||
1839 | r = -E2BIG; | 1839 | r = -E2BIG; |
1840 | if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) | 1840 | if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) |
1841 | goto out; | 1841 | goto out; |
1842 | r = -ENOMEM; | 1842 | r = -ENOMEM; |
1843 | cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry) * cpuid->nent); | 1843 | cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry) * cpuid->nent); |
1844 | if (!cpuid_entries) | 1844 | if (!cpuid_entries) |
1845 | goto out; | 1845 | goto out; |
1846 | r = -EFAULT; | 1846 | r = -EFAULT; |
1847 | if (copy_from_user(cpuid_entries, entries, | 1847 | if (copy_from_user(cpuid_entries, entries, |
1848 | cpuid->nent * sizeof(struct kvm_cpuid_entry))) | 1848 | cpuid->nent * sizeof(struct kvm_cpuid_entry))) |
1849 | goto out_free; | 1849 | goto out_free; |
1850 | for (i = 0; i < cpuid->nent; i++) { | 1850 | for (i = 0; i < cpuid->nent; i++) { |
1851 | vcpu->arch.cpuid_entries[i].function = cpuid_entries[i].function; | 1851 | vcpu->arch.cpuid_entries[i].function = cpuid_entries[i].function; |
1852 | vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax; | 1852 | vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax; |
1853 | vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx; | 1853 | vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx; |
1854 | vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx; | 1854 | vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx; |
1855 | vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx; | 1855 | vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx; |
1856 | vcpu->arch.cpuid_entries[i].index = 0; | 1856 | vcpu->arch.cpuid_entries[i].index = 0; |
1857 | vcpu->arch.cpuid_entries[i].flags = 0; | 1857 | vcpu->arch.cpuid_entries[i].flags = 0; |
1858 | vcpu->arch.cpuid_entries[i].padding[0] = 0; | 1858 | vcpu->arch.cpuid_entries[i].padding[0] = 0; |
1859 | vcpu->arch.cpuid_entries[i].padding[1] = 0; | 1859 | vcpu->arch.cpuid_entries[i].padding[1] = 0; |
1860 | vcpu->arch.cpuid_entries[i].padding[2] = 0; | 1860 | vcpu->arch.cpuid_entries[i].padding[2] = 0; |
1861 | } | 1861 | } |
1862 | vcpu->arch.cpuid_nent = cpuid->nent; | 1862 | vcpu->arch.cpuid_nent = cpuid->nent; |
1863 | cpuid_fix_nx_cap(vcpu); | 1863 | cpuid_fix_nx_cap(vcpu); |
1864 | r = 0; | 1864 | r = 0; |
1865 | kvm_apic_set_version(vcpu); | 1865 | kvm_apic_set_version(vcpu); |
1866 | kvm_x86_ops->cpuid_update(vcpu); | 1866 | kvm_x86_ops->cpuid_update(vcpu); |
1867 | update_cpuid(vcpu); | 1867 | update_cpuid(vcpu); |
1868 | 1868 | ||
1869 | out_free: | 1869 | out_free: |
1870 | vfree(cpuid_entries); | 1870 | vfree(cpuid_entries); |
1871 | out: | 1871 | out: |
1872 | return r; | 1872 | return r; |
1873 | } | 1873 | } |
1874 | 1874 | ||
1875 | static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, | 1875 | static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, |
1876 | struct kvm_cpuid2 *cpuid, | 1876 | struct kvm_cpuid2 *cpuid, |
1877 | struct kvm_cpuid_entry2 __user *entries) | 1877 | struct kvm_cpuid_entry2 __user *entries) |
1878 | { | 1878 | { |
1879 | int r; | 1879 | int r; |
1880 | 1880 | ||
1881 | r = -E2BIG; | 1881 | r = -E2BIG; |
1882 | if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) | 1882 | if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) |
1883 | goto out; | 1883 | goto out; |
1884 | r = -EFAULT; | 1884 | r = -EFAULT; |
1885 | if (copy_from_user(&vcpu->arch.cpuid_entries, entries, | 1885 | if (copy_from_user(&vcpu->arch.cpuid_entries, entries, |
1886 | cpuid->nent * sizeof(struct kvm_cpuid_entry2))) | 1886 | cpuid->nent * sizeof(struct kvm_cpuid_entry2))) |
1887 | goto out; | 1887 | goto out; |
1888 | vcpu->arch.cpuid_nent = cpuid->nent; | 1888 | vcpu->arch.cpuid_nent = cpuid->nent; |
1889 | kvm_apic_set_version(vcpu); | 1889 | kvm_apic_set_version(vcpu); |
1890 | kvm_x86_ops->cpuid_update(vcpu); | 1890 | kvm_x86_ops->cpuid_update(vcpu); |
1891 | update_cpuid(vcpu); | 1891 | update_cpuid(vcpu); |
1892 | return 0; | 1892 | return 0; |
1893 | 1893 | ||
1894 | out: | 1894 | out: |
1895 | return r; | 1895 | return r; |
1896 | } | 1896 | } |
1897 | 1897 | ||
1898 | static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, | 1898 | static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, |
1899 | struct kvm_cpuid2 *cpuid, | 1899 | struct kvm_cpuid2 *cpuid, |
1900 | struct kvm_cpuid_entry2 __user *entries) | 1900 | struct kvm_cpuid_entry2 __user *entries) |
1901 | { | 1901 | { |
1902 | int r; | 1902 | int r; |
1903 | 1903 | ||
1904 | r = -E2BIG; | 1904 | r = -E2BIG; |
1905 | if (cpuid->nent < vcpu->arch.cpuid_nent) | 1905 | if (cpuid->nent < vcpu->arch.cpuid_nent) |
1906 | goto out; | 1906 | goto out; |
1907 | r = -EFAULT; | 1907 | r = -EFAULT; |
1908 | if (copy_to_user(entries, &vcpu->arch.cpuid_entries, | 1908 | if (copy_to_user(entries, &vcpu->arch.cpuid_entries, |
1909 | vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2))) | 1909 | vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2))) |
1910 | goto out; | 1910 | goto out; |
1911 | return 0; | 1911 | return 0; |
1912 | 1912 | ||
1913 | out: | 1913 | out: |
1914 | cpuid->nent = vcpu->arch.cpuid_nent; | 1914 | cpuid->nent = vcpu->arch.cpuid_nent; |
1915 | return r; | 1915 | return r; |
1916 | } | 1916 | } |
1917 | 1917 | ||
1918 | static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function, | 1918 | static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function, |
1919 | u32 index) | 1919 | u32 index) |
1920 | { | 1920 | { |
1921 | entry->function = function; | 1921 | entry->function = function; |
1922 | entry->index = index; | 1922 | entry->index = index; |
1923 | cpuid_count(entry->function, entry->index, | 1923 | cpuid_count(entry->function, entry->index, |
1924 | &entry->eax, &entry->ebx, &entry->ecx, &entry->edx); | 1924 | &entry->eax, &entry->ebx, &entry->ecx, &entry->edx); |
1925 | entry->flags = 0; | 1925 | entry->flags = 0; |
1926 | } | 1926 | } |
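do_cpuid_1_ent() is a thin capture of the host's CPUID instruction for one leaf/index pair. Outside the kernel, roughly the same probe can be made with gcc/clang's <cpuid.h> instead of the kernel's cpuid_count() helper; a sketch (the leaf-4 field masks follow the Intel SDM):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        __cpuid_count(4, 0, eax, ebx, ecx, edx); /* leaf 4, index 0 */
        printf("cache type %u, level %u\n",
               eax & 0x1f, (eax >> 5) & 0x7);
        return 0;
}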
1927 | 1927 | ||
1928 | #define F(x) bit(X86_FEATURE_##x) | 1928 | #define F(x) bit(X86_FEATURE_##x) |
1929 | 1929 | ||
1930 | static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, | 1930 | static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, |
1931 | u32 index, int *nent, int maxnent) | 1931 | u32 index, int *nent, int maxnent) |
1932 | { | 1932 | { |
1933 | unsigned f_nx = is_efer_nx() ? F(NX) : 0; | 1933 | unsigned f_nx = is_efer_nx() ? F(NX) : 0; |
1934 | #ifdef CONFIG_X86_64 | 1934 | #ifdef CONFIG_X86_64 |
1935 | unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL) | 1935 | unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL) |
1936 | ? F(GBPAGES) : 0; | 1936 | ? F(GBPAGES) : 0; |
1937 | unsigned f_lm = F(LM); | 1937 | unsigned f_lm = F(LM); |
1938 | #else | 1938 | #else |
1939 | unsigned f_gbpages = 0; | 1939 | unsigned f_gbpages = 0; |
1940 | unsigned f_lm = 0; | 1940 | unsigned f_lm = 0; |
1941 | #endif | 1941 | #endif |
1942 | unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0; | 1942 | unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0; |
1943 | 1943 | ||
1944 | /* cpuid 1.edx */ | 1944 | /* cpuid 1.edx */ |
1945 | const u32 kvm_supported_word0_x86_features = | 1945 | const u32 kvm_supported_word0_x86_features = |
1946 | F(FPU) | F(VME) | F(DE) | F(PSE) | | 1946 | F(FPU) | F(VME) | F(DE) | F(PSE) | |
1947 | F(TSC) | F(MSR) | F(PAE) | F(MCE) | | 1947 | F(TSC) | F(MSR) | F(PAE) | F(MCE) | |
1948 | F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) | | 1948 | F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) | |
1949 | F(MTRR) | F(PGE) | F(MCA) | F(CMOV) | | 1949 | F(MTRR) | F(PGE) | F(MCA) | F(CMOV) | |
1950 | F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLSH) | | 1950 | F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLSH) | |
1951 | 0 /* Reserved, DS, ACPI */ | F(MMX) | | 1951 | 0 /* Reserved, DS, ACPI */ | F(MMX) | |
1952 | F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) | | 1952 | F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) | |
1953 | 0 /* HTT, TM, Reserved, PBE */; | 1953 | 0 /* HTT, TM, Reserved, PBE */; |
1954 | /* cpuid 0x80000001.edx */ | 1954 | /* cpuid 0x80000001.edx */ |
1955 | const u32 kvm_supported_word1_x86_features = | 1955 | const u32 kvm_supported_word1_x86_features = |
1956 | F(FPU) | F(VME) | F(DE) | F(PSE) | | 1956 | F(FPU) | F(VME) | F(DE) | F(PSE) | |
1957 | F(TSC) | F(MSR) | F(PAE) | F(MCE) | | 1957 | F(TSC) | F(MSR) | F(PAE) | F(MCE) | |
1958 | F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) | | 1958 | F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) | |
1959 | F(MTRR) | F(PGE) | F(MCA) | F(CMOV) | | 1959 | F(MTRR) | F(PGE) | F(MCA) | F(CMOV) | |
1960 | F(PAT) | F(PSE36) | 0 /* Reserved */ | | 1960 | F(PAT) | F(PSE36) | 0 /* Reserved */ | |
1961 | f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | | 1961 | f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | |
1962 | F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | | 1962 | F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | |
1963 | 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); | 1963 | 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); |
1964 | /* cpuid 1.ecx */ | 1964 | /* cpuid 1.ecx */ |
1965 | const u32 kvm_supported_word4_x86_features = | 1965 | const u32 kvm_supported_word4_x86_features = |
1966 | F(XMM3) | 0 /* Reserved, DTES64, MONITOR */ | | 1966 | F(XMM3) | 0 /* Reserved, DTES64, MONITOR */ | |
1967 | 0 /* DS-CPL, VMX, SMX, EST */ | | 1967 | 0 /* DS-CPL, VMX, SMX, EST */ | |
1968 | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | | 1968 | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | |
1969 | 0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ | | 1969 | 0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ | |
1970 | 0 /* Reserved, DCA */ | F(XMM4_1) | | 1970 | 0 /* Reserved, DCA */ | F(XMM4_1) | |
1971 | F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | | 1971 | F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | |
1972 | 0 /* Reserved, AES */ | F(XSAVE) | 0 /* OSXSAVE */; | 1972 | 0 /* Reserved, AES */ | F(XSAVE) | 0 /* OSXSAVE */; |
1973 | /* cpuid 0x80000001.ecx */ | 1973 | /* cpuid 0x80000001.ecx */ |
1974 | const u32 kvm_supported_word6_x86_features = | 1974 | const u32 kvm_supported_word6_x86_features = |
1975 | F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ | | 1975 | F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ | |
1976 | F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) | | 1976 | F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) | |
1977 | F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(SSE5) | | 1977 | F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(SSE5) | |
1978 | 0 /* SKINIT */ | 0 /* WDT */; | 1978 | 0 /* SKINIT */ | 0 /* WDT */; |
1979 | 1979 | ||
1980 | /* all calls to cpuid_count() should be made on the same cpu */ | 1980 | /* all calls to cpuid_count() should be made on the same cpu */ |
1981 | get_cpu(); | 1981 | get_cpu(); |
1982 | do_cpuid_1_ent(entry, function, index); | 1982 | do_cpuid_1_ent(entry, function, index); |
1983 | ++*nent; | 1983 | ++*nent; |
1984 | 1984 | ||
1985 | switch (function) { | 1985 | switch (function) { |
1986 | case 0: | 1986 | case 0: |
1987 | entry->eax = min(entry->eax, (u32)0xd); | 1987 | entry->eax = min(entry->eax, (u32)0xd); |
1988 | break; | 1988 | break; |
1989 | case 1: | 1989 | case 1: |
1990 | entry->edx &= kvm_supported_word0_x86_features; | 1990 | entry->edx &= kvm_supported_word0_x86_features; |
1991 | entry->ecx &= kvm_supported_word4_x86_features; | 1991 | entry->ecx &= kvm_supported_word4_x86_features; |
1992 | /* we support x2apic emulation even if the host does not, | 1992 | /* we support x2apic emulation even if the host does not, |
1993 | * since x2apic is emulated in software */ | 1993 | * since x2apic is emulated in software */ |
1994 | entry->ecx |= F(X2APIC); | 1994 | entry->ecx |= F(X2APIC); |
1995 | break; | 1995 | break; |
1996 | /* function 2 entries are STATEFUL. That is, repeated cpuid commands | 1996 | /* function 2 entries are STATEFUL. That is, repeated cpuid commands |
1997 | * may return different values. This forces us to get_cpu() before | 1997 | * may return different values. This forces us to get_cpu() before |
1998 | * issuing the first command, and also to emulate this annoying behavior | 1998 | * issuing the first command, and also to emulate this annoying behavior |
1999 | * in kvm_emulate_cpuid() using KVM_CPUID_FLAG_STATE_READ_NEXT */ | 1999 | * in kvm_emulate_cpuid() using KVM_CPUID_FLAG_STATE_READ_NEXT */ |
2000 | case 2: { | 2000 | case 2: { |
2001 | int t, times = entry->eax & 0xff; | 2001 | int t, times = entry->eax & 0xff; |
2002 | 2002 | ||
2003 | entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; | 2003 | entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; |
2004 | entry->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; | 2004 | entry->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; |
2005 | for (t = 1; t < times && *nent < maxnent; ++t) { | 2005 | for (t = 1; t < times && *nent < maxnent; ++t) { |
2006 | do_cpuid_1_ent(&entry[t], function, 0); | 2006 | do_cpuid_1_ent(&entry[t], function, 0); |
2007 | entry[t].flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; | 2007 | entry[t].flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; |
2008 | ++*nent; | 2008 | ++*nent; |
2009 | } | 2009 | } |
2010 | break; | 2010 | break; |
2011 | } | 2011 | } |
2012 | /* functions 4 and 0xb have an additional index. */ | 2012 | /* functions 4 and 0xb have an additional index. */ |
2013 | case 4: { | 2013 | case 4: { |
2014 | int i, cache_type; | 2014 | int i, cache_type; |
2015 | 2015 | ||
2016 | entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; | 2016 | entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; |
2017 | /* read more entries until cache_type is zero */ | 2017 | /* read more entries until cache_type is zero */ |
2018 | for (i = 1; *nent < maxnent; ++i) { | 2018 | for (i = 1; *nent < maxnent; ++i) { |
2019 | cache_type = entry[i - 1].eax & 0x1f; | 2019 | cache_type = entry[i - 1].eax & 0x1f; |
2020 | if (!cache_type) | 2020 | if (!cache_type) |
2021 | break; | 2021 | break; |
2022 | do_cpuid_1_ent(&entry[i], function, i); | 2022 | do_cpuid_1_ent(&entry[i], function, i); |
2023 | entry[i].flags |= | 2023 | entry[i].flags |= |
2024 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX; | 2024 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX; |
2025 | ++*nent; | 2025 | ++*nent; |
2026 | } | 2026 | } |
2027 | break; | 2027 | break; |
2028 | } | 2028 | } |
2029 | case 0xb: { | 2029 | case 0xb: { |
2030 | int i, level_type; | 2030 | int i, level_type; |
2031 | 2031 | ||
2032 | entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; | 2032 | entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; |
2033 | /* read more entries until level_type is zero */ | 2033 | /* read more entries until level_type is zero */ |
2034 | for (i = 1; *nent < maxnent; ++i) { | 2034 | for (i = 1; *nent < maxnent; ++i) { |
2035 | level_type = entry[i - 1].ecx & 0xff00; | 2035 | level_type = entry[i - 1].ecx & 0xff00; |
2036 | if (!level_type) | 2036 | if (!level_type) |
2037 | break; | 2037 | break; |
2038 | do_cpuid_1_ent(&entry[i], function, i); | 2038 | do_cpuid_1_ent(&entry[i], function, i); |
2039 | entry[i].flags |= | 2039 | entry[i].flags |= |
2040 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX; | 2040 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX; |
2041 | ++*nent; | 2041 | ++*nent; |
2042 | } | 2042 | } |
2043 | break; | 2043 | break; |
2044 | } | 2044 | } |
2045 | case 0xd: { | 2045 | case 0xd: { |
2046 | int i; | 2046 | int i; |
2047 | 2047 | ||
2048 | entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; | 2048 | entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; |
2049 | for (i = 1; *nent < maxnent; ++i) { | 2049 | for (i = 1; *nent < maxnent; ++i) { |
2050 | if (entry[i - 1].eax == 0 && i != 2) | 2050 | if (entry[i - 1].eax == 0 && i != 2) |
2051 | break; | 2051 | break; |
2052 | do_cpuid_1_ent(&entry[i], function, i); | 2052 | do_cpuid_1_ent(&entry[i], function, i); |
2053 | entry[i].flags |= | 2053 | entry[i].flags |= |
2054 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX; | 2054 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX; |
2055 | ++*nent; | 2055 | ++*nent; |
2056 | } | 2056 | } |
2057 | break; | 2057 | break; |
2058 | } | 2058 | } |
2059 | case KVM_CPUID_SIGNATURE: { | 2059 | case KVM_CPUID_SIGNATURE: { |
2060 | char signature[12] = "KVMKVMKVM\0\0"; | 2060 | char signature[12] = "KVMKVMKVM\0\0"; |
2061 | u32 *sigptr = (u32 *)signature; | 2061 | u32 *sigptr = (u32 *)signature; |
2062 | entry->eax = 0; | 2062 | entry->eax = 0; |
2063 | entry->ebx = sigptr[0]; | 2063 | entry->ebx = sigptr[0]; |
2064 | entry->ecx = sigptr[1]; | 2064 | entry->ecx = sigptr[1]; |
2065 | entry->edx = sigptr[2]; | 2065 | entry->edx = sigptr[2]; |
2066 | break; | 2066 | break; |
2067 | } | 2067 | } |
2068 | case KVM_CPUID_FEATURES: | 2068 | case KVM_CPUID_FEATURES: |
2069 | entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) | | 2069 | entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) | |
2070 | (1 << KVM_FEATURE_NOP_IO_DELAY) | | 2070 | (1 << KVM_FEATURE_NOP_IO_DELAY) | |
2071 | (1 << KVM_FEATURE_CLOCKSOURCE2) | | 2071 | (1 << KVM_FEATURE_CLOCKSOURCE2) | |
2072 | (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); | 2072 | (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); |
2073 | entry->ebx = 0; | 2073 | entry->ebx = 0; |
2074 | entry->ecx = 0; | 2074 | entry->ecx = 0; |
2075 | entry->edx = 0; | 2075 | entry->edx = 0; |
2076 | break; | 2076 | break; |
2077 | case 0x80000000: | 2077 | case 0x80000000: |
2078 | entry->eax = min(entry->eax, 0x8000001a); | 2078 | entry->eax = min(entry->eax, 0x8000001a); |
2079 | break; | 2079 | break; |
2080 | case 0x80000001: | 2080 | case 0x80000001: |
2081 | entry->edx &= kvm_supported_word1_x86_features; | 2081 | entry->edx &= kvm_supported_word1_x86_features; |
2082 | entry->ecx &= kvm_supported_word6_x86_features; | 2082 | entry->ecx &= kvm_supported_word6_x86_features; |
2083 | break; | 2083 | break; |
2084 | } | 2084 | } |
2085 | 2085 | ||
2086 | kvm_x86_ops->set_supported_cpuid(function, entry); | 2086 | kvm_x86_ops->set_supported_cpuid(function, entry); |
2087 | 2087 | ||
2088 | put_cpu(); | 2088 | put_cpu(); |
2089 | } | 2089 | } |
2090 | 2090 | ||
2091 | #undef F | 2091 | #undef F |
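The KVM_CPUID_SIGNATURE leaf (0x40000000) built above is how guests discover they run on KVM: EBX/ECX/EDX spell out the 12-byte "KVMKVMKVM\0\0\0" signature. A guest-side detection sketch for x86 using the compiler's <cpuid.h> (the raw __cpuid macro is used because __get_cpuid() range-checks against the basic leaf limit):

#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int running_on_kvm(void)
{
        unsigned int eax, sig[3];

        /* Hypervisor leaves live at 0x40000000. */
        __cpuid(0x40000000, eax, sig[0], sig[1], sig[2]);
        return memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0;
}

int main(void)
{
        printf("KVM: %s\n", running_on_kvm() ? "yes" : "no");
        return 0;
}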
2092 | 2092 | ||
2093 | static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, | 2093 | static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, |
2094 | struct kvm_cpuid_entry2 __user *entries) | 2094 | struct kvm_cpuid_entry2 __user *entries) |
2095 | { | 2095 | { |
2096 | struct kvm_cpuid_entry2 *cpuid_entries; | 2096 | struct kvm_cpuid_entry2 *cpuid_entries; |
2097 | int limit, nent = 0, r = -E2BIG; | 2097 | int limit, nent = 0, r = -E2BIG; |
2098 | u32 func; | 2098 | u32 func; |
2099 | 2099 | ||
2100 | if (cpuid->nent < 1) | 2100 | if (cpuid->nent < 1) |
2101 | goto out; | 2101 | goto out; |
2102 | if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) | 2102 | if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) |
2103 | cpuid->nent = KVM_MAX_CPUID_ENTRIES; | 2103 | cpuid->nent = KVM_MAX_CPUID_ENTRIES; |
2104 | r = -ENOMEM; | 2104 | r = -ENOMEM; |
2105 | cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent); | 2105 | cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent); |
2106 | if (!cpuid_entries) | 2106 | if (!cpuid_entries) |
2107 | goto out; | 2107 | goto out; |
2108 | 2108 | ||
2109 | do_cpuid_ent(&cpuid_entries[0], 0, 0, &nent, cpuid->nent); | 2109 | do_cpuid_ent(&cpuid_entries[0], 0, 0, &nent, cpuid->nent); |
2110 | limit = cpuid_entries[0].eax; | 2110 | limit = cpuid_entries[0].eax; |
2111 | for (func = 1; func <= limit && nent < cpuid->nent; ++func) | 2111 | for (func = 1; func <= limit && nent < cpuid->nent; ++func) |
2112 | do_cpuid_ent(&cpuid_entries[nent], func, 0, | 2112 | do_cpuid_ent(&cpuid_entries[nent], func, 0, |
2113 | &nent, cpuid->nent); | 2113 | &nent, cpuid->nent); |
2114 | r = -E2BIG; | 2114 | r = -E2BIG; |
2115 | if (nent >= cpuid->nent) | 2115 | if (nent >= cpuid->nent) |
2116 | goto out_free; | 2116 | goto out_free; |
2117 | 2117 | ||
2118 | do_cpuid_ent(&cpuid_entries[nent], 0x80000000, 0, &nent, cpuid->nent); | 2118 | do_cpuid_ent(&cpuid_entries[nent], 0x80000000, 0, &nent, cpuid->nent); |
2119 | limit = cpuid_entries[nent - 1].eax; | 2119 | limit = cpuid_entries[nent - 1].eax; |
2120 | for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func) | 2120 | for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func) |
2121 | do_cpuid_ent(&cpuid_entries[nent], func, 0, | 2121 | do_cpuid_ent(&cpuid_entries[nent], func, 0, |
2122 | &nent, cpuid->nent); | 2122 | &nent, cpuid->nent); |
2123 | 2123 | ||
2124 | 2124 | ||
2125 | 2125 | ||
2126 | r = -E2BIG; | 2126 | r = -E2BIG; |
2127 | if (nent >= cpuid->nent) | 2127 | if (nent >= cpuid->nent) |
2128 | goto out_free; | 2128 | goto out_free; |
2129 | 2129 | ||
2130 | do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, &nent, | 2130 | do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, &nent, |
2131 | cpuid->nent); | 2131 | cpuid->nent); |
2132 | 2132 | ||
2133 | r = -E2BIG; | 2133 | r = -E2BIG; |
2134 | if (nent >= cpuid->nent) | 2134 | if (nent >= cpuid->nent) |
2135 | goto out_free; | 2135 | goto out_free; |
2136 | 2136 | ||
2137 | do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_FEATURES, 0, &nent, | 2137 | do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_FEATURES, 0, &nent, |
2138 | cpuid->nent); | 2138 | cpuid->nent); |
2139 | 2139 | ||
2140 | r = -E2BIG; | 2140 | r = -E2BIG; |
2141 | if (nent >= cpuid->nent) | 2141 | if (nent >= cpuid->nent) |
2142 | goto out_free; | 2142 | goto out_free; |
2143 | 2143 | ||
2144 | r = -EFAULT; | 2144 | r = -EFAULT; |
2145 | if (copy_to_user(entries, cpuid_entries, | 2145 | if (copy_to_user(entries, cpuid_entries, |
2146 | nent * sizeof(struct kvm_cpuid_entry2))) | 2146 | nent * sizeof(struct kvm_cpuid_entry2))) |
2147 | goto out_free; | 2147 | goto out_free; |
2148 | cpuid->nent = nent; | 2148 | cpuid->nent = nent; |
2149 | r = 0; | 2149 | r = 0; |
2150 | 2150 | ||
2151 | out_free: | 2151 | out_free: |
2152 | vfree(cpuid_entries); | 2152 | vfree(cpuid_entries); |
2153 | out: | 2153 | out: |
2154 | return r; | 2154 | return r; |
2155 | } | 2155 | } |
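kvm_dev_ioctl_get_supported_cpuid() is usually paired with KVM_SET_CPUID2: userspace fetches the supported set from the /dev/kvm fd, optionally edits it, then installs it on each vcpu. A minimal sketch, assuming 'kvm_fd' and 'vcpu_fd' already exist (the 100-entry buffer is an arbitrary demo size; the handler above trims nent to what it actually fills in):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int copy_supported_cpuid(int kvm_fd, int vcpu_fd)
{
        struct kvm_cpuid2 *cpuid;
        int r = -1;

        cpuid = calloc(1, sizeof(*cpuid) +
                       100 * sizeof(struct kvm_cpuid_entry2));
        if (!cpuid)
                return -1;
        cpuid->nent = 100;

        if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) == 0)
                r = ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);
        free(cpuid);
        return r;
}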
2156 | 2156 | ||
2157 | static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu, | 2157 | static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu, |
2158 | struct kvm_lapic_state *s) | 2158 | struct kvm_lapic_state *s) |
2159 | { | 2159 | { |
2160 | memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s); | 2160 | memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s); |
2161 | 2161 | ||
2162 | return 0; | 2162 | return 0; |
2163 | } | 2163 | } |
2164 | 2164 | ||
2165 | static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu, | 2165 | static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu, |
2166 | struct kvm_lapic_state *s) | 2166 | struct kvm_lapic_state *s) |
2167 | { | 2167 | { |
2168 | memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s); | 2168 | memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s); |
2169 | kvm_apic_post_state_restore(vcpu); | 2169 | kvm_apic_post_state_restore(vcpu); |
2170 | update_cr8_intercept(vcpu); | 2170 | update_cr8_intercept(vcpu); |
2171 | 2171 | ||
2172 | return 0; | 2172 | return 0; |
2173 | } | 2173 | } |
2174 | 2174 | ||
2175 | static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, | 2175 | static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, |
2176 | struct kvm_interrupt *irq) | 2176 | struct kvm_interrupt *irq) |
2177 | { | 2177 | { |
2178 | if (irq->irq < 0 || irq->irq >= 256) | 2178 | if (irq->irq < 0 || irq->irq >= 256) |
2179 | return -EINVAL; | 2179 | return -EINVAL; |
2180 | if (irqchip_in_kernel(vcpu->kvm)) | 2180 | if (irqchip_in_kernel(vcpu->kvm)) |
2181 | return -ENXIO; | 2181 | return -ENXIO; |
2182 | 2182 | ||
2183 | kvm_queue_interrupt(vcpu, irq->irq, false); | 2183 | kvm_queue_interrupt(vcpu, irq->irq, false); |
2184 | 2184 | ||
2185 | return 0; | 2185 | return 0; |
2186 | } | 2186 | } |
2187 | 2187 | ||
2188 | static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) | 2188 | static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) |
2189 | { | 2189 | { |
2190 | kvm_inject_nmi(vcpu); | 2190 | kvm_inject_nmi(vcpu); |
2191 | 2191 | ||
2192 | return 0; | 2192 | return 0; |
2193 | } | 2193 | } |
2194 | 2194 | ||
2195 | static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu, | 2195 | static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu, |
2196 | struct kvm_tpr_access_ctl *tac) | 2196 | struct kvm_tpr_access_ctl *tac) |
2197 | { | 2197 | { |
2198 | if (tac->flags) | 2198 | if (tac->flags) |
2199 | return -EINVAL; | 2199 | return -EINVAL; |
2200 | vcpu->arch.tpr_access_reporting = !!tac->enabled; | 2200 | vcpu->arch.tpr_access_reporting = !!tac->enabled; |
2201 | return 0; | 2201 | return 0; |
2202 | } | 2202 | } |
2203 | 2203 | ||
2204 | static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, | 2204 | static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, |
2205 | u64 mcg_cap) | 2205 | u64 mcg_cap) |
2206 | { | 2206 | { |
2207 | int r; | 2207 | int r; |
2208 | unsigned bank_num = mcg_cap & 0xff, bank; | 2208 | unsigned bank_num = mcg_cap & 0xff, bank; |
2209 | 2209 | ||
2210 | r = -EINVAL; | 2210 | r = -EINVAL; |
2211 | if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS) | 2211 | if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS) |
2212 | goto out; | 2212 | goto out; |
2213 | if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000)) | 2213 | if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000)) |
2214 | goto out; | 2214 | goto out; |
2215 | r = 0; | 2215 | r = 0; |
2216 | vcpu->arch.mcg_cap = mcg_cap; | 2216 | vcpu->arch.mcg_cap = mcg_cap; |
2217 | /* Init IA32_MCG_CTL to all 1s */ | 2217 | /* Init IA32_MCG_CTL to all 1s */ |
2218 | if (mcg_cap & MCG_CTL_P) | 2218 | if (mcg_cap & MCG_CTL_P) |
2219 | vcpu->arch.mcg_ctl = ~(u64)0; | 2219 | vcpu->arch.mcg_ctl = ~(u64)0; |
2220 | /* Init IA32_MCi_CTL to all 1s */ | 2220 | /* Init IA32_MCi_CTL to all 1s */ |
2221 | for (bank = 0; bank < bank_num; bank++) | 2221 | for (bank = 0; bank < bank_num; bank++) |
2222 | vcpu->arch.mce_banks[bank*4] = ~(u64)0; | 2222 | vcpu->arch.mce_banks[bank*4] = ~(u64)0; |
2223 | out: | 2223 | out: |
2224 | return r; | 2224 | return r; |
2225 | } | 2225 | } |
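The mcg_cap word validated above follows the hardware IA32_MCG_CAP layout: the bank count in the low byte, with capability flags such as MCG_CTL_P (bit 8) above it. A small decoding sketch (hypothetical demo):

#include <stdio.h>
#include <stdint.h>

#define MCG_CTL_P (1ULL << 8)  /* MCG_CTL register present (x86 SDM) */

int main(void)
{
        uint64_t mcg_cap = MCG_CTL_P | 10; /* 10 banks, MCG_CTL present */

        printf("banks=%u mcg_ctl=%s\n",
               (unsigned int)(mcg_cap & 0xff),
               (mcg_cap & MCG_CTL_P) ? "present" : "absent");
        return 0;
}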
2226 | 2226 | ||
2227 | static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu, | 2227 | static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu, |
2228 | struct kvm_x86_mce *mce) | 2228 | struct kvm_x86_mce *mce) |
2229 | { | 2229 | { |
2230 | u64 mcg_cap = vcpu->arch.mcg_cap; | 2230 | u64 mcg_cap = vcpu->arch.mcg_cap; |
2231 | unsigned bank_num = mcg_cap & 0xff; | 2231 | unsigned bank_num = mcg_cap & 0xff; |
2232 | u64 *banks = vcpu->arch.mce_banks; | 2232 | u64 *banks = vcpu->arch.mce_banks; |
2233 | 2233 | ||
2234 | if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL)) | 2234 | if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL)) |
2235 | return -EINVAL; | 2235 | return -EINVAL; |
2236 | /* | 2236 | /* |
2237 | * if IA32_MCG_CTL is not all 1s, the uncorrected error | 2237 | * if IA32_MCG_CTL is not all 1s, the uncorrected error |
2238 | * reporting is disabled | 2238 | * reporting is disabled |
2239 | */ | 2239 | */ |
2240 | if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) && | 2240 | if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) && |
2241 | vcpu->arch.mcg_ctl != ~(u64)0) | 2241 | vcpu->arch.mcg_ctl != ~(u64)0) |
2242 | return 0; | 2242 | return 0; |
2243 | banks += 4 * mce->bank; | 2243 | banks += 4 * mce->bank; |
2244 | /* | 2244 | /* |
2245 | * if IA32_MCi_CTL is not all 1s, the uncorrected error | 2245 | * if IA32_MCi_CTL is not all 1s, the uncorrected error |
2246 | * reporting is disabled for the bank | 2246 | * reporting is disabled for the bank |
2247 | */ | 2247 | */ |
2248 | if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0) | 2248 | if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0) |
2249 | return 0; | 2249 | return 0; |
2250 | if (mce->status & MCI_STATUS_UC) { | 2250 | if (mce->status & MCI_STATUS_UC) { |
2251 | if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) || | 2251 | if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) || |
2252 | !kvm_read_cr4_bits(vcpu, X86_CR4_MCE)) { | 2252 | !kvm_read_cr4_bits(vcpu, X86_CR4_MCE)) { |
2253 | printk(KERN_DEBUG "kvm: set_mce: " | 2253 | printk(KERN_DEBUG "kvm: set_mce: " |
2254 | "injects mce exception while " | 2254 | "injects mce exception while " |
2255 | "previous one is in progress!\n"); | 2255 | "previous one is in progress!\n"); |
2256 | set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); | 2256 | set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests); |
2257 | return 0; | 2257 | return 0; |
2258 | } | 2258 | } |
2259 | if (banks[1] & MCI_STATUS_VAL) | 2259 | if (banks[1] & MCI_STATUS_VAL) |
2260 | mce->status |= MCI_STATUS_OVER; | 2260 | mce->status |= MCI_STATUS_OVER; |
2261 | banks[2] = mce->addr; | 2261 | banks[2] = mce->addr; |
2262 | banks[3] = mce->misc; | 2262 | banks[3] = mce->misc; |
2263 | vcpu->arch.mcg_status = mce->mcg_status; | 2263 | vcpu->arch.mcg_status = mce->mcg_status; |
2264 | banks[1] = mce->status; | 2264 | banks[1] = mce->status; |
2265 | kvm_queue_exception(vcpu, MC_VECTOR); | 2265 | kvm_queue_exception(vcpu, MC_VECTOR); |
2266 | } else if (!(banks[1] & MCI_STATUS_VAL) | 2266 | } else if (!(banks[1] & MCI_STATUS_VAL) |
2267 | || !(banks[1] & MCI_STATUS_UC)) { | 2267 | || !(banks[1] & MCI_STATUS_UC)) { |
2268 | if (banks[1] & MCI_STATUS_VAL) | 2268 | if (banks[1] & MCI_STATUS_VAL) |
2269 | mce->status |= MCI_STATUS_OVER; | 2269 | mce->status |= MCI_STATUS_OVER; |
2270 | banks[2] = mce->addr; | 2270 | banks[2] = mce->addr; |
2271 | banks[3] = mce->misc; | 2271 | banks[3] = mce->misc; |
2272 | banks[1] = mce->status; | 2272 | banks[1] = mce->status; |
2273 | } else | 2273 | } else |
2274 | banks[1] |= MCI_STATUS_OVER; | 2274 | banks[1] |= MCI_STATUS_OVER; |
2275 | return 0; | 2275 | return 0; |
2276 | } | 2276 | } |
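
The "banks += 4 * mce->bank" stride above reflects the layout of vcpu->arch.mce_banks: four u64s per bank, in MSR order CTL, STATUS, ADDR, MISC, so banks[0..3] address the selected bank's registers. A hedged userspace sketch injecting an uncorrected error into bank 0 (vcpu_fd is hypothetical; the bit positions follow the architectural MCi_STATUS/MCG_STATUS layout and are defined locally here):

#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MCI_STATUS_VAL	(1ULL << 63)	/* register contents valid */
#define MCI_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define MCG_STATUS_RIPV	(1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_MCIP	(1ULL << 2)	/* machine check in progress */

static int inject_uc_mce(int vcpu_fd)
{
	struct kvm_x86_mce mce = {
		.status     = MCI_STATUS_VAL | MCI_STATUS_UC,
		.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
		.addr       = 0x1234000,	/* assumed faulting physical address */
		.bank       = 0,
	};

	return ioctl(vcpu_fd, KVM_X86_SET_MCE, &mce);
}
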
2277 | 2277 | ||
2278 | static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, | 2278 | static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, |
2279 | struct kvm_vcpu_events *events) | 2279 | struct kvm_vcpu_events *events) |
2280 | { | 2280 | { |
2281 | events->exception.injected = | 2281 | events->exception.injected = |
2282 | vcpu->arch.exception.pending && | 2282 | vcpu->arch.exception.pending && |
2283 | !kvm_exception_is_soft(vcpu->arch.exception.nr); | 2283 | !kvm_exception_is_soft(vcpu->arch.exception.nr); |
2284 | events->exception.nr = vcpu->arch.exception.nr; | 2284 | events->exception.nr = vcpu->arch.exception.nr; |
2285 | events->exception.has_error_code = vcpu->arch.exception.has_error_code; | 2285 | events->exception.has_error_code = vcpu->arch.exception.has_error_code; |
2286 | events->exception.error_code = vcpu->arch.exception.error_code; | 2286 | events->exception.error_code = vcpu->arch.exception.error_code; |
2287 | 2287 | ||
2288 | events->interrupt.injected = | 2288 | events->interrupt.injected = |
2289 | vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft; | 2289 | vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft; |
2290 | events->interrupt.nr = vcpu->arch.interrupt.nr; | 2290 | events->interrupt.nr = vcpu->arch.interrupt.nr; |
2291 | events->interrupt.soft = 0; | 2291 | events->interrupt.soft = 0; |
2292 | events->interrupt.shadow = | 2292 | events->interrupt.shadow = |
2293 | kvm_x86_ops->get_interrupt_shadow(vcpu, | 2293 | kvm_x86_ops->get_interrupt_shadow(vcpu, |
2294 | KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI); | 2294 | KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI); |
2295 | 2295 | ||
2296 | events->nmi.injected = vcpu->arch.nmi_injected; | 2296 | events->nmi.injected = vcpu->arch.nmi_injected; |
2297 | events->nmi.pending = vcpu->arch.nmi_pending; | 2297 | events->nmi.pending = vcpu->arch.nmi_pending; |
2298 | events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu); | 2298 | events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu); |
2299 | 2299 | ||
2300 | events->sipi_vector = vcpu->arch.sipi_vector; | 2300 | events->sipi_vector = vcpu->arch.sipi_vector; |
2301 | 2301 | ||
2302 | events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING | 2302 | events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING |
2303 | | KVM_VCPUEVENT_VALID_SIPI_VECTOR | 2303 | | KVM_VCPUEVENT_VALID_SIPI_VECTOR |
2304 | | KVM_VCPUEVENT_VALID_SHADOW); | 2304 | | KVM_VCPUEVENT_VALID_SHADOW); |
2305 | } | 2305 | } |
2306 | 2306 | ||
2307 | static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, | 2307 | static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, |
2308 | struct kvm_vcpu_events *events) | 2308 | struct kvm_vcpu_events *events) |
2309 | { | 2309 | { |
2310 | if (events->flags & ~(KVM_VCPUEVENT_VALID_NMI_PENDING | 2310 | if (events->flags & ~(KVM_VCPUEVENT_VALID_NMI_PENDING |
2311 | | KVM_VCPUEVENT_VALID_SIPI_VECTOR | 2311 | | KVM_VCPUEVENT_VALID_SIPI_VECTOR |
2312 | | KVM_VCPUEVENT_VALID_SHADOW)) | 2312 | | KVM_VCPUEVENT_VALID_SHADOW)) |
2313 | return -EINVAL; | 2313 | return -EINVAL; |
2314 | 2314 | ||
2315 | vcpu->arch.exception.pending = events->exception.injected; | 2315 | vcpu->arch.exception.pending = events->exception.injected; |
2316 | vcpu->arch.exception.nr = events->exception.nr; | 2316 | vcpu->arch.exception.nr = events->exception.nr; |
2317 | vcpu->arch.exception.has_error_code = events->exception.has_error_code; | 2317 | vcpu->arch.exception.has_error_code = events->exception.has_error_code; |
2318 | vcpu->arch.exception.error_code = events->exception.error_code; | 2318 | vcpu->arch.exception.error_code = events->exception.error_code; |
2319 | 2319 | ||
2320 | vcpu->arch.interrupt.pending = events->interrupt.injected; | 2320 | vcpu->arch.interrupt.pending = events->interrupt.injected; |
2321 | vcpu->arch.interrupt.nr = events->interrupt.nr; | 2321 | vcpu->arch.interrupt.nr = events->interrupt.nr; |
2322 | vcpu->arch.interrupt.soft = events->interrupt.soft; | 2322 | vcpu->arch.interrupt.soft = events->interrupt.soft; |
2323 | if (vcpu->arch.interrupt.pending && irqchip_in_kernel(vcpu->kvm)) | 2323 | if (vcpu->arch.interrupt.pending && irqchip_in_kernel(vcpu->kvm)) |
2324 | kvm_pic_clear_isr_ack(vcpu->kvm); | 2324 | kvm_pic_clear_isr_ack(vcpu->kvm); |
2325 | if (events->flags & KVM_VCPUEVENT_VALID_SHADOW) | 2325 | if (events->flags & KVM_VCPUEVENT_VALID_SHADOW) |
2326 | kvm_x86_ops->set_interrupt_shadow(vcpu, | 2326 | kvm_x86_ops->set_interrupt_shadow(vcpu, |
2327 | events->interrupt.shadow); | 2327 | events->interrupt.shadow); |
2328 | 2328 | ||
2329 | vcpu->arch.nmi_injected = events->nmi.injected; | 2329 | vcpu->arch.nmi_injected = events->nmi.injected; |
2330 | if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) | 2330 | if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) |
2331 | vcpu->arch.nmi_pending = events->nmi.pending; | 2331 | vcpu->arch.nmi_pending = events->nmi.pending; |
2332 | kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked); | 2332 | kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked); |
2333 | 2333 | ||
2334 | if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR) | 2334 | if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR) |
2335 | vcpu->arch.sipi_vector = events->sipi_vector; | 2335 | vcpu->arch.sipi_vector = events->sipi_vector; |
2336 | 2336 | ||
2337 | return 0; | 2337 | return 0; |
2338 | } | 2338 | } |
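
Note the asymmetry kvm_vcpu_ioctl_x86_set_vcpu_events() enforces: exception and interrupt state are always overwritten, while nmi.pending, sipi_vector and the interrupt shadow are applied only when the corresponding KVM_VCPUEVENT_VALID_* flag is set. A read-modify-write from userspace can therefore touch one piece of state in isolation; a sketch, assuming a hypothetical vcpu_fd:

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int queue_nmi(int vcpu_fd)
{
	struct kvm_vcpu_events ev;

	if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &ev) < 0)
		return -1;
	ev.nmi.pending = 1;				/* queue one NMI */
	ev.flags = KVM_VCPUEVENT_VALID_NMI_PENDING;	/* apply only that field */
	return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &ev);
}
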
2339 | 2339 | ||
2340 | static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu, | 2340 | static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu, |
2341 | struct kvm_debugregs *dbgregs) | 2341 | struct kvm_debugregs *dbgregs) |
2342 | { | 2342 | { |
2343 | memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db)); | 2343 | memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db)); |
2344 | dbgregs->dr6 = vcpu->arch.dr6; | 2344 | dbgregs->dr6 = vcpu->arch.dr6; |
2345 | dbgregs->dr7 = vcpu->arch.dr7; | 2345 | dbgregs->dr7 = vcpu->arch.dr7; |
2346 | dbgregs->flags = 0; | 2346 | dbgregs->flags = 0; |
2347 | } | 2347 | } |
2348 | 2348 | ||
2349 | static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, | 2349 | static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, |
2350 | struct kvm_debugregs *dbgregs) | 2350 | struct kvm_debugregs *dbgregs) |
2351 | { | 2351 | { |
2352 | if (dbgregs->flags) | 2352 | if (dbgregs->flags) |
2353 | return -EINVAL; | 2353 | return -EINVAL; |
2354 | 2354 | ||
2355 | memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db)); | 2355 | memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db)); |
2356 | vcpu->arch.dr6 = dbgregs->dr6; | 2356 | vcpu->arch.dr6 = dbgregs->dr6; |
2357 | vcpu->arch.dr7 = dbgregs->dr7; | 2357 | vcpu->arch.dr7 = dbgregs->dr7; |
2358 | 2358 | ||
2359 | return 0; | 2359 | return 0; |
2360 | } | 2360 | } |
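
The debugregs pair is a plain state copy: db[] holds DR0-DR3, plus DR6 and DR7, with flags reserved (must be zero today). A roundtrip sketch that plants a hardware execute breakpoint in DR0 (vcpu_fd is hypothetical; DR7 bit 0 is the L0 local-enable, with R/W0 = 00 meaning break on execution):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int plant_hw_breakpoint(int vcpu_fd, unsigned long addr)
{
	struct kvm_debugregs dbg;

	if (ioctl(vcpu_fd, KVM_GET_DEBUGREGS, &dbg) < 0)
		return -1;
	dbg.db[0] = addr;	/* DR0: linear address to trap */
	dbg.dr7  |= 0x1;	/* L0 set: enable the DR0 breakpoint */
	return ioctl(vcpu_fd, KVM_SET_DEBUGREGS, &dbg);
}
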
2361 | 2361 | ||
2362 | static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, | 2362 | static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, |
2363 | struct kvm_xsave *guest_xsave) | 2363 | struct kvm_xsave *guest_xsave) |
2364 | { | 2364 | { |
2365 | if (cpu_has_xsave) | 2365 | if (cpu_has_xsave) |
2366 | memcpy(guest_xsave->region, | 2366 | memcpy(guest_xsave->region, |
2367 | &vcpu->arch.guest_fpu.state->xsave, | 2367 | &vcpu->arch.guest_fpu.state->xsave, |
2368 | sizeof(struct xsave_struct)); | 2368 | sizeof(struct xsave_struct)); |
2369 | else { | 2369 | else { |
2370 | memcpy(guest_xsave->region, | 2370 | memcpy(guest_xsave->region, |
2371 | &vcpu->arch.guest_fpu.state->fxsave, | 2371 | &vcpu->arch.guest_fpu.state->fxsave, |
2372 | sizeof(struct i387_fxsave_struct)); | 2372 | sizeof(struct i387_fxsave_struct)); |
2373 | *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] = | 2373 | *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] = |
2374 | XSTATE_FPSSE; | 2374 | XSTATE_FPSSE; |
2375 | } | 2375 | } |
2376 | } | 2376 | } |
2377 | 2377 | ||
2378 | static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, | 2378 | static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, |
2379 | struct kvm_xsave *guest_xsave) | 2379 | struct kvm_xsave *guest_xsave) |
2380 | { | 2380 | { |
2381 | u64 xstate_bv = | 2381 | u64 xstate_bv = |
2382 | *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)]; | 2382 | *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)]; |
2383 | 2383 | ||
2384 | if (cpu_has_xsave) | 2384 | if (cpu_has_xsave) |
2385 | memcpy(&vcpu->arch.guest_fpu.state->xsave, | 2385 | memcpy(&vcpu->arch.guest_fpu.state->xsave, |
2386 | guest_xsave->region, sizeof(struct xsave_struct)); | 2386 | guest_xsave->region, sizeof(struct xsave_struct)); |
2387 | else { | 2387 | else { |
2388 | if (xstate_bv & ~XSTATE_FPSSE) | 2388 | if (xstate_bv & ~XSTATE_FPSSE) |
2389 | return -EINVAL; | 2389 | return -EINVAL; |
2390 | memcpy(&vcpu->arch.guest_fpu.state->fxsave, | 2390 | memcpy(&vcpu->arch.guest_fpu.state->fxsave, |
2391 | guest_xsave->region, sizeof(struct i387_fxsave_struct)); | 2391 | guest_xsave->region, sizeof(struct i387_fxsave_struct)); |
2392 | } | 2392 | } |
2393 | return 0; | 2393 | return 0; |
2394 | } | 2394 | } |
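
On hosts without XSAVE the kvm_xsave region is still exchanged, but only its legacy FXSAVE area is meaningful: the 64-bit XSTATE_BV word lives at byte offset XSAVE_HDR_OFFSET (512) inside the 4096-byte region, i.e. u32 index 128, and may then name only FP and SSE state. A sketch of how userspace could read it back (vcpu_fd is hypothetical):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static uint64_t read_xstate_bv(int vcpu_fd)
{
	struct kvm_xsave xs;
	uint64_t xstate_bv = 0;

	if (ioctl(vcpu_fd, KVM_GET_XSAVE, &xs) == 0)
		/* region[] is u32-sized; the XSAVE header begins at byte 512 */
		memcpy(&xstate_bv, &xs.region[512 / sizeof(uint32_t)],
		       sizeof(xstate_bv));
	return xstate_bv;
}
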
2395 | 2395 | ||
2396 | static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu, | 2396 | static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu, |
2397 | struct kvm_xcrs *guest_xcrs) | 2397 | struct kvm_xcrs *guest_xcrs) |
2398 | { | 2398 | { |
2399 | if (!cpu_has_xsave) { | 2399 | if (!cpu_has_xsave) { |
2400 | guest_xcrs->nr_xcrs = 0; | 2400 | guest_xcrs->nr_xcrs = 0; |
2401 | return; | 2401 | return; |
2402 | } | 2402 | } |
2403 | 2403 | ||
2404 | guest_xcrs->nr_xcrs = 1; | 2404 | guest_xcrs->nr_xcrs = 1; |
2405 | guest_xcrs->flags = 0; | 2405 | guest_xcrs->flags = 0; |
2406 | guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK; | 2406 | guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK; |
2407 | guest_xcrs->xcrs[0].value = vcpu->arch.xcr0; | 2407 | guest_xcrs->xcrs[0].value = vcpu->arch.xcr0; |
2408 | } | 2408 | } |
2409 | 2409 | ||
2410 | static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, | 2410 | static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, |
2411 | struct kvm_xcrs *guest_xcrs) | 2411 | struct kvm_xcrs *guest_xcrs) |
2412 | { | 2412 | { |
2413 | int i, r = 0; | 2413 | int i, r = 0; |
2414 | 2414 | ||
2415 | if (!cpu_has_xsave) | 2415 | if (!cpu_has_xsave) |
2416 | return -EINVAL; | 2416 | return -EINVAL; |
2417 | 2417 | ||
2418 | if (guest_xcrs->nr_xcrs > KVM_MAX_XCRS || guest_xcrs->flags) | 2418 | if (guest_xcrs->nr_xcrs > KVM_MAX_XCRS || guest_xcrs->flags) |
2419 | return -EINVAL; | 2419 | return -EINVAL; |
2420 | 2420 | ||
2421 | for (i = 0; i < guest_xcrs->nr_xcrs; i++) | 2421 | for (i = 0; i < guest_xcrs->nr_xcrs; i++) |
2422 | /* Only support XCR0 currently */ | 2422 | /* Only support XCR0 currently */ |
2423 | if (guest_xcrs->xcrs[i].xcr == XCR_XFEATURE_ENABLED_MASK) { | 2423 | if (guest_xcrs->xcrs[i].xcr == XCR_XFEATURE_ENABLED_MASK) { |
2424 | r = __kvm_set_xcr(vcpu, XCR_XFEATURE_ENABLED_MASK, | 2424 | r = __kvm_set_xcr(vcpu, XCR_XFEATURE_ENABLED_MASK, |
2425 | guest_xcrs->xcrs[i].value); | 2425 | guest_xcrs->xcrs[i].value); |
2426 | break; | 2426 | break; |
2427 | } | 2427 | } |
2428 | if (r) | 2428 | if (r) |
2429 | r = -EINVAL; | 2429 | r = -EINVAL; |
2430 | return r; | 2430 | return r; |
2431 | } | 2431 | } |
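
Only XCR0 is accepted today, so a well-formed request carries exactly one entry with xcr == 0 (XCR_XFEATURE_ENABLED_MASK). A sketch that enables x87, SSE and AVX in the guest's XCR0 (vcpu_fd is hypothetical; __kvm_set_xcr will still reject combinations the host cannot back):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_avx(int vcpu_fd)
{
	struct kvm_xcrs xcrs = {
		.nr_xcrs       = 1,
		.xcrs[0].xcr   = 0,	/* XCR_XFEATURE_ENABLED_MASK */
		.xcrs[0].value = 0x7,	/* x87 | SSE | AVX */
	};

	return ioctl(vcpu_fd, KVM_SET_XCRS, &xcrs);
}
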
2432 | 2432 | ||
2433 | long kvm_arch_vcpu_ioctl(struct file *filp, | 2433 | long kvm_arch_vcpu_ioctl(struct file *filp, |
2434 | unsigned int ioctl, unsigned long arg) | 2434 | unsigned int ioctl, unsigned long arg) |
2435 | { | 2435 | { |
2436 | struct kvm_vcpu *vcpu = filp->private_data; | 2436 | struct kvm_vcpu *vcpu = filp->private_data; |
2437 | void __user *argp = (void __user *)arg; | 2437 | void __user *argp = (void __user *)arg; |
2438 | int r; | 2438 | int r; |
2439 | union { | 2439 | union { |
2440 | struct kvm_lapic_state *lapic; | 2440 | struct kvm_lapic_state *lapic; |
2441 | struct kvm_xsave *xsave; | 2441 | struct kvm_xsave *xsave; |
2442 | struct kvm_xcrs *xcrs; | 2442 | struct kvm_xcrs *xcrs; |
2443 | void *buffer; | 2443 | void *buffer; |
2444 | } u; | 2444 | } u; |
2445 | 2445 | ||
2446 | u.buffer = NULL; | 2446 | u.buffer = NULL; |
2447 | switch (ioctl) { | 2447 | switch (ioctl) { |
2448 | case KVM_GET_LAPIC: { | 2448 | case KVM_GET_LAPIC: { |
2449 | r = -EINVAL; | 2449 | r = -EINVAL; |
2450 | if (!vcpu->arch.apic) | 2450 | if (!vcpu->arch.apic) |
2451 | goto out; | 2451 | goto out; |
2452 | u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); | 2452 | u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); |
2453 | 2453 | ||
2454 | r = -ENOMEM; | 2454 | r = -ENOMEM; |
2455 | if (!u.lapic) | 2455 | if (!u.lapic) |
2456 | goto out; | 2456 | goto out; |
2457 | r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic); | 2457 | r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic); |
2458 | if (r) | 2458 | if (r) |
2459 | goto out; | 2459 | goto out; |
2460 | r = -EFAULT; | 2460 | r = -EFAULT; |
2461 | if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state))) | 2461 | if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state))) |
2462 | goto out; | 2462 | goto out; |
2463 | r = 0; | 2463 | r = 0; |
2464 | break; | 2464 | break; |
2465 | } | 2465 | } |
2466 | case KVM_SET_LAPIC: { | 2466 | case KVM_SET_LAPIC: { |
2467 | r = -EINVAL; | 2467 | r = -EINVAL; |
2468 | if (!vcpu->arch.apic) | 2468 | if (!vcpu->arch.apic) |
2469 | goto out; | 2469 | goto out; |
2470 | u.lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); | 2470 | u.lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); |
2471 | r = -ENOMEM; | 2471 | r = -ENOMEM; |
2472 | if (!u.lapic) | 2472 | if (!u.lapic) |
2473 | goto out; | 2473 | goto out; |
2474 | r = -EFAULT; | 2474 | r = -EFAULT; |
2475 | if (copy_from_user(u.lapic, argp, sizeof(struct kvm_lapic_state))) | 2475 | if (copy_from_user(u.lapic, argp, sizeof(struct kvm_lapic_state))) |
2476 | goto out; | 2476 | goto out; |
2477 | r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic); | 2477 | r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic); |
2478 | if (r) | 2478 | if (r) |
2479 | goto out; | 2479 | goto out; |
2480 | r = 0; | 2480 | r = 0; |
2481 | break; | 2481 | break; |
2482 | } | 2482 | } |
2483 | case KVM_INTERRUPT: { | 2483 | case KVM_INTERRUPT: { |
2484 | struct kvm_interrupt irq; | 2484 | struct kvm_interrupt irq; |
2485 | 2485 | ||
2486 | r = -EFAULT; | 2486 | r = -EFAULT; |
2487 | if (copy_from_user(&irq, argp, sizeof irq)) | 2487 | if (copy_from_user(&irq, argp, sizeof irq)) |
2488 | goto out; | 2488 | goto out; |
2489 | r = kvm_vcpu_ioctl_interrupt(vcpu, &irq); | 2489 | r = kvm_vcpu_ioctl_interrupt(vcpu, &irq); |
2490 | if (r) | 2490 | if (r) |
2491 | goto out; | 2491 | goto out; |
2492 | r = 0; | 2492 | r = 0; |
2493 | break; | 2493 | break; |
2494 | } | 2494 | } |
2495 | case KVM_NMI: { | 2495 | case KVM_NMI: { |
2496 | r = kvm_vcpu_ioctl_nmi(vcpu); | 2496 | r = kvm_vcpu_ioctl_nmi(vcpu); |
2497 | if (r) | 2497 | if (r) |
2498 | goto out; | 2498 | goto out; |
2499 | r = 0; | 2499 | r = 0; |
2500 | break; | 2500 | break; |
2501 | } | 2501 | } |
2502 | case KVM_SET_CPUID: { | 2502 | case KVM_SET_CPUID: { |
2503 | struct kvm_cpuid __user *cpuid_arg = argp; | 2503 | struct kvm_cpuid __user *cpuid_arg = argp; |
2504 | struct kvm_cpuid cpuid; | 2504 | struct kvm_cpuid cpuid; |
2505 | 2505 | ||
2506 | r = -EFAULT; | 2506 | r = -EFAULT; |
2507 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) | 2507 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) |
2508 | goto out; | 2508 | goto out; |
2509 | r = kvm_vcpu_ioctl_set_cpuid(vcpu, &cpuid, cpuid_arg->entries); | 2509 | r = kvm_vcpu_ioctl_set_cpuid(vcpu, &cpuid, cpuid_arg->entries); |
2510 | if (r) | 2510 | if (r) |
2511 | goto out; | 2511 | goto out; |
2512 | break; | 2512 | break; |
2513 | } | 2513 | } |
2514 | case KVM_SET_CPUID2: { | 2514 | case KVM_SET_CPUID2: { |
2515 | struct kvm_cpuid2 __user *cpuid_arg = argp; | 2515 | struct kvm_cpuid2 __user *cpuid_arg = argp; |
2516 | struct kvm_cpuid2 cpuid; | 2516 | struct kvm_cpuid2 cpuid; |
2517 | 2517 | ||
2518 | r = -EFAULT; | 2518 | r = -EFAULT; |
2519 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) | 2519 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) |
2520 | goto out; | 2520 | goto out; |
2521 | r = kvm_vcpu_ioctl_set_cpuid2(vcpu, &cpuid, | 2521 | r = kvm_vcpu_ioctl_set_cpuid2(vcpu, &cpuid, |
2522 | cpuid_arg->entries); | 2522 | cpuid_arg->entries); |
2523 | if (r) | 2523 | if (r) |
2524 | goto out; | 2524 | goto out; |
2525 | break; | 2525 | break; |
2526 | } | 2526 | } |
2527 | case KVM_GET_CPUID2: { | 2527 | case KVM_GET_CPUID2: { |
2528 | struct kvm_cpuid2 __user *cpuid_arg = argp; | 2528 | struct kvm_cpuid2 __user *cpuid_arg = argp; |
2529 | struct kvm_cpuid2 cpuid; | 2529 | struct kvm_cpuid2 cpuid; |
2530 | 2530 | ||
2531 | r = -EFAULT; | 2531 | r = -EFAULT; |
2532 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) | 2532 | if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid)) |
2533 | goto out; | 2533 | goto out; |
2534 | r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid, | 2534 | r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid, |
2535 | cpuid_arg->entries); | 2535 | cpuid_arg->entries); |
2536 | if (r) | 2536 | if (r) |
2537 | goto out; | 2537 | goto out; |
2538 | r = -EFAULT; | 2538 | r = -EFAULT; |
2539 | if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid)) | 2539 | if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid)) |
2540 | goto out; | 2540 | goto out; |
2541 | r = 0; | 2541 | r = 0; |
2542 | break; | 2542 | break; |
2543 | } | 2543 | } |
2544 | case KVM_GET_MSRS: | 2544 | case KVM_GET_MSRS: |
2545 | r = msr_io(vcpu, argp, kvm_get_msr, 1); | 2545 | r = msr_io(vcpu, argp, kvm_get_msr, 1); |
2546 | break; | 2546 | break; |
2547 | case KVM_SET_MSRS: | 2547 | case KVM_SET_MSRS: |
2548 | r = msr_io(vcpu, argp, do_set_msr, 0); | 2548 | r = msr_io(vcpu, argp, do_set_msr, 0); |
2549 | break; | 2549 | break; |
2550 | case KVM_TPR_ACCESS_REPORTING: { | 2550 | case KVM_TPR_ACCESS_REPORTING: { |
2551 | struct kvm_tpr_access_ctl tac; | 2551 | struct kvm_tpr_access_ctl tac; |
2552 | 2552 | ||
2553 | r = -EFAULT; | 2553 | r = -EFAULT; |
2554 | if (copy_from_user(&tac, argp, sizeof tac)) | 2554 | if (copy_from_user(&tac, argp, sizeof tac)) |
2555 | goto out; | 2555 | goto out; |
2556 | r = vcpu_ioctl_tpr_access_reporting(vcpu, &tac); | 2556 | r = vcpu_ioctl_tpr_access_reporting(vcpu, &tac); |
2557 | if (r) | 2557 | if (r) |
2558 | goto out; | 2558 | goto out; |
2559 | r = -EFAULT; | 2559 | r = -EFAULT; |
2560 | if (copy_to_user(argp, &tac, sizeof tac)) | 2560 | if (copy_to_user(argp, &tac, sizeof tac)) |
2561 | goto out; | 2561 | goto out; |
2562 | r = 0; | 2562 | r = 0; |
2563 | break; | 2563 | break; |
2564 | } | 2564 | } |
2565 | case KVM_SET_VAPIC_ADDR: { | 2565 | case KVM_SET_VAPIC_ADDR: { |
2566 | struct kvm_vapic_addr va; | 2566 | struct kvm_vapic_addr va; |
2567 | 2567 | ||
2568 | r = -EINVAL; | 2568 | r = -EINVAL; |
2569 | if (!irqchip_in_kernel(vcpu->kvm)) | 2569 | if (!irqchip_in_kernel(vcpu->kvm)) |
2570 | goto out; | 2570 | goto out; |
2571 | r = -EFAULT; | 2571 | r = -EFAULT; |
2572 | if (copy_from_user(&va, argp, sizeof va)) | 2572 | if (copy_from_user(&va, argp, sizeof va)) |
2573 | goto out; | 2573 | goto out; |
2574 | r = 0; | 2574 | r = 0; |
2575 | kvm_lapic_set_vapic_addr(vcpu, va.vapic_addr); | 2575 | kvm_lapic_set_vapic_addr(vcpu, va.vapic_addr); |
2576 | break; | 2576 | break; |
2577 | } | 2577 | } |
2578 | case KVM_X86_SETUP_MCE: { | 2578 | case KVM_X86_SETUP_MCE: { |
2579 | u64 mcg_cap; | 2579 | u64 mcg_cap; |
2580 | 2580 | ||
2581 | r = -EFAULT; | 2581 | r = -EFAULT; |
2582 | if (copy_from_user(&mcg_cap, argp, sizeof mcg_cap)) | 2582 | if (copy_from_user(&mcg_cap, argp, sizeof mcg_cap)) |
2583 | goto out; | 2583 | goto out; |
2584 | r = kvm_vcpu_ioctl_x86_setup_mce(vcpu, mcg_cap); | 2584 | r = kvm_vcpu_ioctl_x86_setup_mce(vcpu, mcg_cap); |
2585 | break; | 2585 | break; |
2586 | } | 2586 | } |
2587 | case KVM_X86_SET_MCE: { | 2587 | case KVM_X86_SET_MCE: { |
2588 | struct kvm_x86_mce mce; | 2588 | struct kvm_x86_mce mce; |
2589 | 2589 | ||
2590 | r = -EFAULT; | 2590 | r = -EFAULT; |
2591 | if (copy_from_user(&mce, argp, sizeof mce)) | 2591 | if (copy_from_user(&mce, argp, sizeof mce)) |
2592 | goto out; | 2592 | goto out; |
2593 | r = kvm_vcpu_ioctl_x86_set_mce(vcpu, &mce); | 2593 | r = kvm_vcpu_ioctl_x86_set_mce(vcpu, &mce); |
2594 | break; | 2594 | break; |
2595 | } | 2595 | } |
2596 | case KVM_GET_VCPU_EVENTS: { | 2596 | case KVM_GET_VCPU_EVENTS: { |
2597 | struct kvm_vcpu_events events; | 2597 | struct kvm_vcpu_events events; |
2598 | 2598 | ||
2599 | kvm_vcpu_ioctl_x86_get_vcpu_events(vcpu, &events); | 2599 | kvm_vcpu_ioctl_x86_get_vcpu_events(vcpu, &events); |
2600 | 2600 | ||
2601 | r = -EFAULT; | 2601 | r = -EFAULT; |
2602 | if (copy_to_user(argp, &events, sizeof(struct kvm_vcpu_events))) | 2602 | if (copy_to_user(argp, &events, sizeof(struct kvm_vcpu_events))) |
2603 | break; | 2603 | break; |
2604 | r = 0; | 2604 | r = 0; |
2605 | break; | 2605 | break; |
2606 | } | 2606 | } |
2607 | case KVM_SET_VCPU_EVENTS: { | 2607 | case KVM_SET_VCPU_EVENTS: { |
2608 | struct kvm_vcpu_events events; | 2608 | struct kvm_vcpu_events events; |
2609 | 2609 | ||
2610 | r = -EFAULT; | 2610 | r = -EFAULT; |
2611 | if (copy_from_user(&events, argp, sizeof(struct kvm_vcpu_events))) | 2611 | if (copy_from_user(&events, argp, sizeof(struct kvm_vcpu_events))) |
2612 | break; | 2612 | break; |
2613 | 2613 | ||
2614 | r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events); | 2614 | r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events); |
2615 | break; | 2615 | break; |
2616 | } | 2616 | } |
2617 | case KVM_GET_DEBUGREGS: { | 2617 | case KVM_GET_DEBUGREGS: { |
2618 | struct kvm_debugregs dbgregs; | 2618 | struct kvm_debugregs dbgregs; |
2619 | 2619 | ||
2620 | kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs); | 2620 | kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs); |
2621 | 2621 | ||
2622 | r = -EFAULT; | 2622 | r = -EFAULT; |
2623 | if (copy_to_user(argp, &dbgregs, | 2623 | if (copy_to_user(argp, &dbgregs, |
2624 | sizeof(struct kvm_debugregs))) | 2624 | sizeof(struct kvm_debugregs))) |
2625 | break; | 2625 | break; |
2626 | r = 0; | 2626 | r = 0; |
2627 | break; | 2627 | break; |
2628 | } | 2628 | } |
2629 | case KVM_SET_DEBUGREGS: { | 2629 | case KVM_SET_DEBUGREGS: { |
2630 | struct kvm_debugregs dbgregs; | 2630 | struct kvm_debugregs dbgregs; |
2631 | 2631 | ||
2632 | r = -EFAULT; | 2632 | r = -EFAULT; |
2633 | if (copy_from_user(&dbgregs, argp, | 2633 | if (copy_from_user(&dbgregs, argp, |
2634 | sizeof(struct kvm_debugregs))) | 2634 | sizeof(struct kvm_debugregs))) |
2635 | break; | 2635 | break; |
2636 | 2636 | ||
2637 | r = kvm_vcpu_ioctl_x86_set_debugregs(vcpu, &dbgregs); | 2637 | r = kvm_vcpu_ioctl_x86_set_debugregs(vcpu, &dbgregs); |
2638 | break; | 2638 | break; |
2639 | } | 2639 | } |
2640 | case KVM_GET_XSAVE: { | 2640 | case KVM_GET_XSAVE: { |
2641 | u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL); | 2641 | u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL); |
2642 | r = -ENOMEM; | 2642 | r = -ENOMEM; |
2643 | if (!u.xsave) | 2643 | if (!u.xsave) |
2644 | break; | 2644 | break; |
2645 | 2645 | ||
2646 | kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave); | 2646 | kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave); |
2647 | 2647 | ||
2648 | r = -EFAULT; | 2648 | r = -EFAULT; |
2649 | if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave))) | 2649 | if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave))) |
2650 | break; | 2650 | break; |
2651 | r = 0; | 2651 | r = 0; |
2652 | break; | 2652 | break; |
2653 | } | 2653 | } |
2654 | case KVM_SET_XSAVE: { | 2654 | case KVM_SET_XSAVE: { |
2655 | u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL); | 2655 | u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL); |
2656 | r = -ENOMEM; | 2656 | r = -ENOMEM; |
2657 | if (!u.xsave) | 2657 | if (!u.xsave) |
2658 | break; | 2658 | break; |
2659 | 2659 | ||
2660 | r = -EFAULT; | 2660 | r = -EFAULT; |
2661 | if (copy_from_user(u.xsave, argp, sizeof(struct kvm_xsave))) | 2661 | if (copy_from_user(u.xsave, argp, sizeof(struct kvm_xsave))) |
2662 | break; | 2662 | break; |
2663 | 2663 | ||
2664 | r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave); | 2664 | r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave); |
2665 | break; | 2665 | break; |
2666 | } | 2666 | } |
2667 | case KVM_GET_XCRS: { | 2667 | case KVM_GET_XCRS: { |
2668 | u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL); | 2668 | u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL); |
2669 | r = -ENOMEM; | 2669 | r = -ENOMEM; |
2670 | if (!u.xcrs) | 2670 | if (!u.xcrs) |
2671 | break; | 2671 | break; |
2672 | 2672 | ||
2673 | kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs); | 2673 | kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs); |
2674 | 2674 | ||
2675 | r = -EFAULT; | 2675 | r = -EFAULT; |
2676 | if (copy_to_user(argp, u.xcrs, | 2676 | if (copy_to_user(argp, u.xcrs, |
2677 | sizeof(struct kvm_xcrs))) | 2677 | sizeof(struct kvm_xcrs))) |
2678 | break; | 2678 | break; |
2679 | r = 0; | 2679 | r = 0; |
2680 | break; | 2680 | break; |
2681 | } | 2681 | } |
2682 | case KVM_SET_XCRS: { | 2682 | case KVM_SET_XCRS: { |
2683 | u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL); | 2683 | u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL); |
2684 | r = -ENOMEM; | 2684 | r = -ENOMEM; |
2685 | if (!u.xcrs) | 2685 | if (!u.xcrs) |
2686 | break; | 2686 | break; |
2687 | 2687 | ||
2688 | r = -EFAULT; | 2688 | r = -EFAULT; |
2689 | if (copy_from_user(u.xcrs, argp, | 2689 | if (copy_from_user(u.xcrs, argp, |
2690 | sizeof(struct kvm_xcrs))) | 2690 | sizeof(struct kvm_xcrs))) |
2691 | break; | 2691 | break; |
2692 | 2692 | ||
2693 | r = kvm_vcpu_ioctl_x86_set_xcrs(vcpu, u.xcrs); | 2693 | r = kvm_vcpu_ioctl_x86_set_xcrs(vcpu, u.xcrs); |
2694 | break; | 2694 | break; |
2695 | } | 2695 | } |
2696 | default: | 2696 | default: |
2697 | r = -EINVAL; | 2697 | r = -EINVAL; |
2698 | } | 2698 | } |
2699 | out: | 2699 | out: |
2700 | kfree(u.buffer); | 2700 | kfree(u.buffer); |
2701 | return r; | 2701 | return r; |
2702 | } | 2702 | } |
2703 | 2703 | ||
2704 | static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr) | 2704 | static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr) |
2705 | { | 2705 | { |
2706 | int ret; | 2706 | int ret; |
2707 | 2707 | ||
2708 | if (addr > (unsigned int)(-3 * PAGE_SIZE)) | 2708 | if (addr > (unsigned int)(-3 * PAGE_SIZE)) |
2709 | return -1; | 2709 | return -1; |
2710 | ret = kvm_x86_ops->set_tss_addr(kvm, addr); | 2710 | ret = kvm_x86_ops->set_tss_addr(kvm, addr); |
2711 | return ret; | 2711 | return ret; |
2712 | } | 2712 | } |
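
The bound here reserves room for the three pages this ioctl consumes: on Intel CPUs without unrestricted-guest support, KVM parks a real-mode TSS in guest physical memory, so addr plus 3 * PAGE_SIZE must still fit in 32 bits. A conventional userspace call (vm_fd is a hypothetical VM file descriptor; 0xfffbd000 is the address QEMU historically passes, just below the BIOS region):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_tss(int vm_fd)
{
	return ioctl(vm_fd, KVM_SET_TSS_ADDR, 0xfffbd000UL);
}
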
2713 | 2713 | ||
2714 | static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm, | 2714 | static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm, |
2715 | u64 ident_addr) | 2715 | u64 ident_addr) |
2716 | { | 2716 | { |
2717 | kvm->arch.ept_identity_map_addr = ident_addr; | 2717 | kvm->arch.ept_identity_map_addr = ident_addr; |
2718 | return 0; | 2718 | return 0; |
2719 | } | 2719 | } |
2720 | 2720 | ||
2721 | static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm, | 2721 | static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm, |
2722 | u32 kvm_nr_mmu_pages) | 2722 | u32 kvm_nr_mmu_pages) |
2723 | { | 2723 | { |
2724 | if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES) | 2724 | if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES) |
2725 | return -EINVAL; | 2725 | return -EINVAL; |
2726 | 2726 | ||
2727 | mutex_lock(&kvm->slots_lock); | 2727 | mutex_lock(&kvm->slots_lock); |
2728 | spin_lock(&kvm->mmu_lock); | 2728 | spin_lock(&kvm->mmu_lock); |
2729 | 2729 | ||
2730 | kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages); | 2730 | kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages); |
2731 | kvm->arch.n_requested_mmu_pages = kvm_nr_mmu_pages; | 2731 | kvm->arch.n_requested_mmu_pages = kvm_nr_mmu_pages; |
2732 | 2732 | ||
2733 | spin_unlock(&kvm->mmu_lock); | 2733 | spin_unlock(&kvm->mmu_lock); |
2734 | mutex_unlock(&kvm->slots_lock); | 2734 | mutex_unlock(&kvm->slots_lock); |
2735 | return 0; | 2735 | return 0; |
2736 | } | 2736 | } |
2737 | 2737 | ||
2738 | static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm) | 2738 | static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm) |
2739 | { | 2739 | { |
2740 | return kvm->arch.n_alloc_mmu_pages; | 2740 | return kvm->arch.n_alloc_mmu_pages; |
2741 | } | 2741 | } |
2742 | 2742 | ||
2743 | gfn_t unalias_gfn_instantiation(struct kvm *kvm, gfn_t gfn) | ||
2744 | { | ||
2745 | int i; | ||
2746 | struct kvm_mem_alias *alias; | ||
2747 | struct kvm_mem_aliases *aliases; | ||
2748 | |||
2749 | aliases = kvm_aliases(kvm); | ||
2750 | |||
2751 | for (i = 0; i < aliases->naliases; ++i) { | ||
2752 | alias = &aliases->aliases[i]; | ||
2753 | if (alias->flags & KVM_ALIAS_INVALID) | ||
2754 | continue; | ||
2755 | if (gfn >= alias->base_gfn | ||
2756 | && gfn < alias->base_gfn + alias->npages) | ||
2757 | return alias->target_gfn + gfn - alias->base_gfn; | ||
2758 | } | ||
2759 | return gfn; | ||
2760 | } | ||
2761 | |||
2762 | gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) | ||
2763 | { | ||
2764 | int i; | ||
2765 | struct kvm_mem_alias *alias; | ||
2766 | struct kvm_mem_aliases *aliases; | ||
2767 | |||
2768 | aliases = kvm_aliases(kvm); | ||
2769 | |||
2770 | for (i = 0; i < aliases->naliases; ++i) { | ||
2771 | alias = &aliases->aliases[i]; | ||
2772 | if (gfn >= alias->base_gfn | ||
2773 | && gfn < alias->base_gfn + alias->npages) | ||
2774 | return alias->target_gfn + gfn - alias->base_gfn; | ||
2775 | } | ||
2776 | return gfn; | ||
2777 | } | ||
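
Both lookups are the same linear scan; unalias_gfn_instantiation() merely skips slots marked KVM_ALIAS_INVALID during a deletion window. Arithmetically, a gfn inside [base_gfn, base_gfn + npages) is shifted by the distance to target_gfn. A standalone model of the scan with assumed example values:

typedef unsigned long gfn_t;

static gfn_t unalias_example(gfn_t gfn)
{
	/* assume one alias: base_gfn = 0xb8, npages = 8, target_gfn = 0x100 */
	const gfn_t base_gfn = 0xb8, npages = 8, target_gfn = 0x100;

	if (gfn >= base_gfn && gfn < base_gfn + npages)
		return target_gfn + gfn - base_gfn;	/* 0xba -> 0x102 */
	return gfn;			/* e.g. 0xc5: no alias, unchanged */
}
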
2778 | |||
2779 | /* | ||
2780 | * Set a new alias region. Aliases map a portion of physical memory into | ||
2781 | * another portion. This is useful for memory windows, for example the PC | ||
2782 | * VGA region. | ||
2783 | */ | ||
2784 | static int kvm_vm_ioctl_set_memory_alias(struct kvm *kvm, | ||
2785 | struct kvm_memory_alias *alias) | ||
2786 | { | ||
2787 | int r, n; | ||
2788 | struct kvm_mem_alias *p; | ||
2789 | struct kvm_mem_aliases *aliases, *old_aliases; | ||
2790 | |||
2791 | r = -EINVAL; | ||
2792 | /* General sanity checks */ | ||
2793 | if (alias->memory_size & (PAGE_SIZE - 1)) | ||
2794 | goto out; | ||
2795 | if (alias->guest_phys_addr & (PAGE_SIZE - 1)) | ||
2796 | goto out; | ||
2797 | if (alias->slot >= KVM_ALIAS_SLOTS) | ||
2798 | goto out; | ||
2799 | if (alias->guest_phys_addr + alias->memory_size | ||
2800 | < alias->guest_phys_addr) | ||
2801 | goto out; | ||
2802 | if (alias->target_phys_addr + alias->memory_size | ||
2803 | < alias->target_phys_addr) | ||
2804 | goto out; | ||
2805 | |||
2806 | r = -ENOMEM; | ||
2807 | aliases = kzalloc(sizeof(struct kvm_mem_aliases), GFP_KERNEL); | ||
2808 | if (!aliases) | ||
2809 | goto out; | ||
2810 | |||
2811 | mutex_lock(&kvm->slots_lock); | ||
2812 | |||
2813 | /* invalidate any gfn reference in case of deletion/shrinking */ | ||
2814 | memcpy(aliases, kvm->arch.aliases, sizeof(struct kvm_mem_aliases)); | ||
2815 | aliases->aliases[alias->slot].flags |= KVM_ALIAS_INVALID; | ||
2816 | old_aliases = kvm->arch.aliases; | ||
2817 | rcu_assign_pointer(kvm->arch.aliases, aliases); | ||
2818 | synchronize_srcu_expedited(&kvm->srcu); | ||
2819 | kvm_mmu_zap_all(kvm); | ||
2820 | kfree(old_aliases); | ||
2821 | |||
2822 | r = -ENOMEM; | ||
2823 | aliases = kzalloc(sizeof(struct kvm_mem_aliases), GFP_KERNEL); | ||
2824 | if (!aliases) | ||
2825 | goto out_unlock; | ||
2826 | |||
2827 | memcpy(aliases, kvm->arch.aliases, sizeof(struct kvm_mem_aliases)); | ||
2828 | |||
2829 | p = &aliases->aliases[alias->slot]; | ||
2830 | p->base_gfn = alias->guest_phys_addr >> PAGE_SHIFT; | ||
2831 | p->npages = alias->memory_size >> PAGE_SHIFT; | ||
2832 | p->target_gfn = alias->target_phys_addr >> PAGE_SHIFT; | ||
2833 | p->flags &= ~(KVM_ALIAS_INVALID); | ||
2834 | |||
2835 | for (n = KVM_ALIAS_SLOTS; n > 0; --n) | ||
2836 | if (aliases->aliases[n - 1].npages) | ||
2837 | break; | ||
2838 | aliases->naliases = n; | ||
2839 | |||
2840 | old_aliases = kvm->arch.aliases; | ||
2841 | rcu_assign_pointer(kvm->arch.aliases, aliases); | ||
2842 | synchronize_srcu_expedited(&kvm->srcu); | ||
2843 | kfree(old_aliases); | ||
2844 | r = 0; | ||
2845 | |||
2846 | out_unlock: | ||
2847 | mutex_unlock(&kvm->slots_lock); | ||
2848 | out: | ||
2849 | return r; | ||
2850 | } | ||
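
Note the two-phase RCU update above: the slot is first republished with KVM_ALIAS_INVALID set and all shadow pages zapped, so no reader can instantiate a translation through stale state, and only then is the new mapping published. From userspace, the interface being removed looked roughly like this for the VGA-window case the comment mentions (vm_fd and the addresses are hypothetical):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int alias_vga_window(int vm_fd)
{
	struct kvm_memory_alias alias = {
		.slot             = 0,
		.guest_phys_addr  = 0xb8000,	/* where the guest looks */
		.memory_size      = 8 * 4096,	/* page-aligned, as checked above */
		.target_phys_addr = 0x100000,	/* where the pages really live */
	};

	return ioctl(vm_fd, KVM_SET_MEMORY_ALIAS, &alias);
}
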
2851 | |||
2852 | static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) | 2743 | static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) |
2853 | { | 2744 | { |
2854 | int r; | 2745 | int r; |
2855 | 2746 | ||
2856 | r = 0; | 2747 | r = 0; |
2857 | switch (chip->chip_id) { | 2748 | switch (chip->chip_id) { |
2858 | case KVM_IRQCHIP_PIC_MASTER: | 2749 | case KVM_IRQCHIP_PIC_MASTER: |
2859 | memcpy(&chip->chip.pic, | 2750 | memcpy(&chip->chip.pic, |
2860 | &pic_irqchip(kvm)->pics[0], | 2751 | &pic_irqchip(kvm)->pics[0], |
2861 | sizeof(struct kvm_pic_state)); | 2752 | sizeof(struct kvm_pic_state)); |
2862 | break; | 2753 | break; |
2863 | case KVM_IRQCHIP_PIC_SLAVE: | 2754 | case KVM_IRQCHIP_PIC_SLAVE: |
2864 | memcpy(&chip->chip.pic, | 2755 | memcpy(&chip->chip.pic, |
2865 | &pic_irqchip(kvm)->pics[1], | 2756 | &pic_irqchip(kvm)->pics[1], |
2866 | sizeof(struct kvm_pic_state)); | 2757 | sizeof(struct kvm_pic_state)); |
2867 | break; | 2758 | break; |
2868 | case KVM_IRQCHIP_IOAPIC: | 2759 | case KVM_IRQCHIP_IOAPIC: |
2869 | r = kvm_get_ioapic(kvm, &chip->chip.ioapic); | 2760 | r = kvm_get_ioapic(kvm, &chip->chip.ioapic); |
2870 | break; | 2761 | break; |
2871 | default: | 2762 | default: |
2872 | r = -EINVAL; | 2763 | r = -EINVAL; |
2873 | break; | 2764 | break; |
2874 | } | 2765 | } |
2875 | return r; | 2766 | return r; |
2876 | } | 2767 | } |
2877 | 2768 | ||
2878 | static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) | 2769 | static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip) |
2879 | { | 2770 | { |
2880 | int r; | 2771 | int r; |
2881 | 2772 | ||
2882 | r = 0; | 2773 | r = 0; |
2883 | switch (chip->chip_id) { | 2774 | switch (chip->chip_id) { |
2884 | case KVM_IRQCHIP_PIC_MASTER: | 2775 | case KVM_IRQCHIP_PIC_MASTER: |
2885 | raw_spin_lock(&pic_irqchip(kvm)->lock); | 2776 | raw_spin_lock(&pic_irqchip(kvm)->lock); |
2886 | memcpy(&pic_irqchip(kvm)->pics[0], | 2777 | memcpy(&pic_irqchip(kvm)->pics[0], |
2887 | &chip->chip.pic, | 2778 | &chip->chip.pic, |
2888 | sizeof(struct kvm_pic_state)); | 2779 | sizeof(struct kvm_pic_state)); |
2889 | raw_spin_unlock(&pic_irqchip(kvm)->lock); | 2780 | raw_spin_unlock(&pic_irqchip(kvm)->lock); |
2890 | break; | 2781 | break; |
2891 | case KVM_IRQCHIP_PIC_SLAVE: | 2782 | case KVM_IRQCHIP_PIC_SLAVE: |
2892 | raw_spin_lock(&pic_irqchip(kvm)->lock); | 2783 | raw_spin_lock(&pic_irqchip(kvm)->lock); |
2893 | memcpy(&pic_irqchip(kvm)->pics[1], | 2784 | memcpy(&pic_irqchip(kvm)->pics[1], |
2894 | &chip->chip.pic, | 2785 | &chip->chip.pic, |
2895 | sizeof(struct kvm_pic_state)); | 2786 | sizeof(struct kvm_pic_state)); |
2896 | raw_spin_unlock(&pic_irqchip(kvm)->lock); | 2787 | raw_spin_unlock(&pic_irqchip(kvm)->lock); |
2897 | break; | 2788 | break; |
2898 | case KVM_IRQCHIP_IOAPIC: | 2789 | case KVM_IRQCHIP_IOAPIC: |
2899 | r = kvm_set_ioapic(kvm, &chip->chip.ioapic); | 2790 | r = kvm_set_ioapic(kvm, &chip->chip.ioapic); |
2900 | break; | 2791 | break; |
2901 | default: | 2792 | default: |
2902 | r = -EINVAL; | 2793 | r = -EINVAL; |
2903 | break; | 2794 | break; |
2904 | } | 2795 | } |
2905 | kvm_pic_update_irq(pic_irqchip(kvm)); | 2796 | kvm_pic_update_irq(pic_irqchip(kvm)); |
2906 | return r; | 2797 | return r; |
2907 | } | 2798 | } |
2908 | 2799 | ||
2909 | static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps) | 2800 | static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps) |
2910 | { | 2801 | { |
2911 | int r = 0; | 2802 | int r = 0; |
2912 | 2803 | ||
2913 | mutex_lock(&kvm->arch.vpit->pit_state.lock); | 2804 | mutex_lock(&kvm->arch.vpit->pit_state.lock); |
2914 | memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state)); | 2805 | memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state)); |
2915 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); | 2806 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); |
2916 | return r; | 2807 | return r; |
2917 | } | 2808 | } |
2918 | 2809 | ||
2919 | static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps) | 2810 | static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps) |
2920 | { | 2811 | { |
2921 | int r = 0; | 2812 | int r = 0; |
2922 | 2813 | ||
2923 | mutex_lock(&kvm->arch.vpit->pit_state.lock); | 2814 | mutex_lock(&kvm->arch.vpit->pit_state.lock); |
2924 | memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state)); | 2815 | memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state)); |
2925 | kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0); | 2816 | kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0); |
2926 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); | 2817 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); |
2927 | return r; | 2818 | return r; |
2928 | } | 2819 | } |
2929 | 2820 | ||
2930 | static int kvm_vm_ioctl_get_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps) | 2821 | static int kvm_vm_ioctl_get_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps) |
2931 | { | 2822 | { |
2932 | int r = 0; | 2823 | int r = 0; |
2933 | 2824 | ||
2934 | mutex_lock(&kvm->arch.vpit->pit_state.lock); | 2825 | mutex_lock(&kvm->arch.vpit->pit_state.lock); |
2935 | memcpy(ps->channels, &kvm->arch.vpit->pit_state.channels, | 2826 | memcpy(ps->channels, &kvm->arch.vpit->pit_state.channels, |
2936 | sizeof(ps->channels)); | 2827 | sizeof(ps->channels)); |
2937 | ps->flags = kvm->arch.vpit->pit_state.flags; | 2828 | ps->flags = kvm->arch.vpit->pit_state.flags; |
2938 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); | 2829 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); |
2939 | return r; | 2830 | return r; |
2940 | } | 2831 | } |
2941 | 2832 | ||
2942 | static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps) | 2833 | static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps) |
2943 | { | 2834 | { |
2944 | int r = 0, start = 0; | 2835 | int r = 0, start = 0; |
2945 | u32 prev_legacy, cur_legacy; | 2836 | u32 prev_legacy, cur_legacy; |
2946 | mutex_lock(&kvm->arch.vpit->pit_state.lock); | 2837 | mutex_lock(&kvm->arch.vpit->pit_state.lock); |
2947 | prev_legacy = kvm->arch.vpit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY; | 2838 | prev_legacy = kvm->arch.vpit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY; |
2948 | cur_legacy = ps->flags & KVM_PIT_FLAGS_HPET_LEGACY; | 2839 | cur_legacy = ps->flags & KVM_PIT_FLAGS_HPET_LEGACY; |
2949 | if (!prev_legacy && cur_legacy) | 2840 | if (!prev_legacy && cur_legacy) |
2950 | start = 1; | 2841 | start = 1; |
2951 | memcpy(&kvm->arch.vpit->pit_state.channels, &ps->channels, | 2842 | memcpy(&kvm->arch.vpit->pit_state.channels, &ps->channels, |
2952 | sizeof(kvm->arch.vpit->pit_state.channels)); | 2843 | sizeof(kvm->arch.vpit->pit_state.channels)); |
2953 | kvm->arch.vpit->pit_state.flags = ps->flags; | 2844 | kvm->arch.vpit->pit_state.flags = ps->flags; |
2954 | kvm_pit_load_count(kvm, 0, kvm->arch.vpit->pit_state.channels[0].count, start); | 2845 | kvm_pit_load_count(kvm, 0, kvm->arch.vpit->pit_state.channels[0].count, start); |
2955 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); | 2846 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); |
2956 | return r; | 2847 | return r; |
2957 | } | 2848 | } |
2958 | 2849 | ||
2959 | static int kvm_vm_ioctl_reinject(struct kvm *kvm, | 2850 | static int kvm_vm_ioctl_reinject(struct kvm *kvm, |
2960 | struct kvm_reinject_control *control) | 2851 | struct kvm_reinject_control *control) |
2961 | { | 2852 | { |
2962 | if (!kvm->arch.vpit) | 2853 | if (!kvm->arch.vpit) |
2963 | return -ENXIO; | 2854 | return -ENXIO; |
2964 | mutex_lock(&kvm->arch.vpit->pit_state.lock); | 2855 | mutex_lock(&kvm->arch.vpit->pit_state.lock); |
2965 | kvm->arch.vpit->pit_state.pit_timer.reinject = control->pit_reinject; | 2856 | kvm->arch.vpit->pit_state.pit_timer.reinject = control->pit_reinject; |
2966 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); | 2857 | mutex_unlock(&kvm->arch.vpit->pit_state.lock); |
2967 | return 0; | 2858 | return 0; |
2968 | } | 2859 | } |
2969 | 2860 | ||
2970 | /* | 2861 | /* |
2971 | * Get (and clear) the dirty memory log for a memory slot. | 2862 | * Get (and clear) the dirty memory log for a memory slot. |
2972 | */ | 2863 | */ |
2973 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, | 2864 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, |
2974 | struct kvm_dirty_log *log) | 2865 | struct kvm_dirty_log *log) |
2975 | { | 2866 | { |
2976 | int r, i; | 2867 | int r, i; |
2977 | struct kvm_memory_slot *memslot; | 2868 | struct kvm_memory_slot *memslot; |
2978 | unsigned long n; | 2869 | unsigned long n; |
2979 | unsigned long is_dirty = 0; | 2870 | unsigned long is_dirty = 0; |
2980 | 2871 | ||
2981 | mutex_lock(&kvm->slots_lock); | 2872 | mutex_lock(&kvm->slots_lock); |
2982 | 2873 | ||
2983 | r = -EINVAL; | 2874 | r = -EINVAL; |
2984 | if (log->slot >= KVM_MEMORY_SLOTS) | 2875 | if (log->slot >= KVM_MEMORY_SLOTS) |
2985 | goto out; | 2876 | goto out; |
2986 | 2877 | ||
2987 | memslot = &kvm->memslots->memslots[log->slot]; | 2878 | memslot = &kvm->memslots->memslots[log->slot]; |
2988 | r = -ENOENT; | 2879 | r = -ENOENT; |
2989 | if (!memslot->dirty_bitmap) | 2880 | if (!memslot->dirty_bitmap) |
2990 | goto out; | 2881 | goto out; |
2991 | 2882 | ||
2992 | n = kvm_dirty_bitmap_bytes(memslot); | 2883 | n = kvm_dirty_bitmap_bytes(memslot); |
2993 | 2884 | ||
2994 | for (i = 0; !is_dirty && i < n/sizeof(long); i++) | 2885 | for (i = 0; !is_dirty && i < n/sizeof(long); i++) |
2995 | is_dirty = memslot->dirty_bitmap[i]; | 2886 | is_dirty = memslot->dirty_bitmap[i]; |
2996 | 2887 | ||
2997 | /* If nothing is dirty, don't bother messing with page tables. */ | 2888 | /* If nothing is dirty, don't bother messing with page tables. */ |
2998 | if (is_dirty) { | 2889 | if (is_dirty) { |
2999 | struct kvm_memslots *slots, *old_slots; | 2890 | struct kvm_memslots *slots, *old_slots; |
3000 | unsigned long *dirty_bitmap; | 2891 | unsigned long *dirty_bitmap; |
3001 | 2892 | ||
3002 | spin_lock(&kvm->mmu_lock); | 2893 | spin_lock(&kvm->mmu_lock); |
3003 | kvm_mmu_slot_remove_write_access(kvm, log->slot); | 2894 | kvm_mmu_slot_remove_write_access(kvm, log->slot); |
3004 | spin_unlock(&kvm->mmu_lock); | 2895 | spin_unlock(&kvm->mmu_lock); |
3005 | 2896 | ||
3006 | r = -ENOMEM; | 2897 | r = -ENOMEM; |
3007 | dirty_bitmap = vmalloc(n); | 2898 | dirty_bitmap = vmalloc(n); |
3008 | if (!dirty_bitmap) | 2899 | if (!dirty_bitmap) |
3009 | goto out; | 2900 | goto out; |
3010 | memset(dirty_bitmap, 0, n); | 2901 | memset(dirty_bitmap, 0, n); |
3011 | 2902 | ||
3012 | r = -ENOMEM; | 2903 | r = -ENOMEM; |
3013 | slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); | 2904 | slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); |
3014 | if (!slots) { | 2905 | if (!slots) { |
3015 | vfree(dirty_bitmap); | 2906 | vfree(dirty_bitmap); |
3016 | goto out; | 2907 | goto out; |
3017 | } | 2908 | } |
3018 | memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); | 2909 | memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); |
3019 | slots->memslots[log->slot].dirty_bitmap = dirty_bitmap; | 2910 | slots->memslots[log->slot].dirty_bitmap = dirty_bitmap; |
3020 | 2911 | ||
3021 | old_slots = kvm->memslots; | 2912 | old_slots = kvm->memslots; |
3022 | rcu_assign_pointer(kvm->memslots, slots); | 2913 | rcu_assign_pointer(kvm->memslots, slots); |
3023 | synchronize_srcu_expedited(&kvm->srcu); | 2914 | synchronize_srcu_expedited(&kvm->srcu); |
3024 | dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap; | 2915 | dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap; |
3025 | kfree(old_slots); | 2916 | kfree(old_slots); |
3026 | 2917 | ||
3027 | r = -EFAULT; | 2918 | r = -EFAULT; |
3028 | if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) { | 2919 | if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) { |
3029 | vfree(dirty_bitmap); | 2920 | vfree(dirty_bitmap); |
3030 | goto out; | 2921 | goto out; |
3031 | } | 2922 | } |
3032 | vfree(dirty_bitmap); | 2923 | vfree(dirty_bitmap); |
3033 | } else { | 2924 | } else { |
3034 | r = -EFAULT; | 2925 | r = -EFAULT; |
3035 | if (clear_user(log->dirty_bitmap, n)) | 2926 | if (clear_user(log->dirty_bitmap, n)) |
3036 | goto out; | 2927 | goto out; |
3037 | } | 2928 | } |
3038 | 2929 | ||
3039 | r = 0; | 2930 | r = 0; |
3040 | out: | 2931 | out: |
3041 | mutex_unlock(&kvm->slots_lock); | 2932 | mutex_unlock(&kvm->slots_lock); |
3042 | return r; | 2933 | return r; |
3043 | } | 2934 | } |
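
When anything is dirty, this x86 variant swaps in a fresh zeroed bitmap under RCU, so the bitmap copied out is a consistent snapshot and the next round starts clean. A sketch of the caller's side, assuming a hypothetical vm_fd and a slot of npages pages (one bit per page, rounded up to long granularity, matching kvm_dirty_bitmap_bytes() on a 64-bit host):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int fetch_dirty_log(int vm_fd, int slot, unsigned long npages)
{
	size_t n = ((npages + 63) / 64) * 8;	/* bitmap bytes, long-aligned */
	void *bitmap = calloc(1, n);
	struct kvm_dirty_log log = { .slot = slot, .dirty_bitmap = bitmap };
	int r;

	if (!bitmap)
		return -1;
	r = ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);	/* set bit => dirty page */
	/* ... scan the bitmap here before releasing it ... */
	free(bitmap);
	return r;
}
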
3044 | 2935 | ||
3045 | long kvm_arch_vm_ioctl(struct file *filp, | 2936 | long kvm_arch_vm_ioctl(struct file *filp, |
3046 | unsigned int ioctl, unsigned long arg) | 2937 | unsigned int ioctl, unsigned long arg) |
3047 | { | 2938 | { |
3048 | struct kvm *kvm = filp->private_data; | 2939 | struct kvm *kvm = filp->private_data; |
3049 | void __user *argp = (void __user *)arg; | 2940 | void __user *argp = (void __user *)arg; |
3050 | int r = -ENOTTY; | 2941 | int r = -ENOTTY; |
3051 | /* | 2942 | /* |
3052 | * This union makes it completely explicit to gcc-3.x | 2943 | * This union makes it completely explicit to gcc-3.x |
3053 | * that these two variables' stack usage should be | 2944 | * that these two variables' stack usage should be |
3054 | * combined, not added together. | 2945 | * combined, not added together. |
3055 | */ | 2946 | */ |
3056 | union { | 2947 | union { |
3057 | struct kvm_pit_state ps; | 2948 | struct kvm_pit_state ps; |
3058 | struct kvm_pit_state2 ps2; | 2949 | struct kvm_pit_state2 ps2; |
3059 | struct kvm_memory_alias alias; | ||
3060 | struct kvm_pit_config pit_config; | 2950 | struct kvm_pit_config pit_config; |
3061 | } u; | 2951 | } u; |
3062 | 2952 | ||
3063 | switch (ioctl) { | 2953 | switch (ioctl) { |
3064 | case KVM_SET_TSS_ADDR: | 2954 | case KVM_SET_TSS_ADDR: |
3065 | r = kvm_vm_ioctl_set_tss_addr(kvm, arg); | 2955 | r = kvm_vm_ioctl_set_tss_addr(kvm, arg); |
3066 | if (r < 0) | 2956 | if (r < 0) |
3067 | goto out; | 2957 | goto out; |
3068 | break; | 2958 | break; |
3069 | case KVM_SET_IDENTITY_MAP_ADDR: { | 2959 | case KVM_SET_IDENTITY_MAP_ADDR: { |
3070 | u64 ident_addr; | 2960 | u64 ident_addr; |
3071 | 2961 | ||
3072 | r = -EFAULT; | 2962 | r = -EFAULT; |
3073 | if (copy_from_user(&ident_addr, argp, sizeof ident_addr)) | 2963 | if (copy_from_user(&ident_addr, argp, sizeof ident_addr)) |
3074 | goto out; | 2964 | goto out; |
3075 | r = kvm_vm_ioctl_set_identity_map_addr(kvm, ident_addr); | 2965 | r = kvm_vm_ioctl_set_identity_map_addr(kvm, ident_addr); |
3076 | if (r < 0) | 2966 | if (r < 0) |
3077 | goto out; | 2967 | goto out; |
3078 | break; | 2968 | break; |
3079 | } | 2969 | } |
3080 | case KVM_SET_MEMORY_REGION: { | 2970 | case KVM_SET_MEMORY_REGION: { |
3081 | struct kvm_memory_region kvm_mem; | 2971 | struct kvm_memory_region kvm_mem; |
3082 | struct kvm_userspace_memory_region kvm_userspace_mem; | 2972 | struct kvm_userspace_memory_region kvm_userspace_mem; |
3083 | 2973 | ||
3084 | r = -EFAULT; | 2974 | r = -EFAULT; |
3085 | if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem)) | 2975 | if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem)) |
3086 | goto out; | 2976 | goto out; |
3087 | kvm_userspace_mem.slot = kvm_mem.slot; | 2977 | kvm_userspace_mem.slot = kvm_mem.slot; |
3088 | kvm_userspace_mem.flags = kvm_mem.flags; | 2978 | kvm_userspace_mem.flags = kvm_mem.flags; |
3089 | kvm_userspace_mem.guest_phys_addr = kvm_mem.guest_phys_addr; | 2979 | kvm_userspace_mem.guest_phys_addr = kvm_mem.guest_phys_addr; |
3090 | kvm_userspace_mem.memory_size = kvm_mem.memory_size; | 2980 | kvm_userspace_mem.memory_size = kvm_mem.memory_size; |
3091 | r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 0); | 2981 | r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 0); |
3092 | if (r) | 2982 | if (r) |
3093 | goto out; | 2983 | goto out; |
3094 | break; | 2984 | break; |
3095 | } | 2985 | } |
3096 | case KVM_SET_NR_MMU_PAGES: | 2986 | case KVM_SET_NR_MMU_PAGES: |
3097 | r = kvm_vm_ioctl_set_nr_mmu_pages(kvm, arg); | 2987 | r = kvm_vm_ioctl_set_nr_mmu_pages(kvm, arg); |
3098 | if (r) | 2988 | if (r) |
3099 | goto out; | 2989 | goto out; |
3100 | break; | 2990 | break; |
3101 | case KVM_GET_NR_MMU_PAGES: | 2991 | case KVM_GET_NR_MMU_PAGES: |
3102 | r = kvm_vm_ioctl_get_nr_mmu_pages(kvm); | 2992 | r = kvm_vm_ioctl_get_nr_mmu_pages(kvm); |
3103 | break; | 2993 | break; |
3104 | case KVM_SET_MEMORY_ALIAS: | ||
3105 | r = -EFAULT; | ||
3106 | if (copy_from_user(&u.alias, argp, sizeof(struct kvm_memory_alias))) | ||
3107 | goto out; | ||
3108 | r = kvm_vm_ioctl_set_memory_alias(kvm, &u.alias); | ||
3109 | if (r) | ||
3110 | goto out; | ||
3111 | break; | ||
3112 | case KVM_CREATE_IRQCHIP: { | 2994 | case KVM_CREATE_IRQCHIP: { |
3113 | struct kvm_pic *vpic; | 2995 | struct kvm_pic *vpic; |
3114 | 2996 | ||
3115 | mutex_lock(&kvm->lock); | 2997 | mutex_lock(&kvm->lock); |
3116 | r = -EEXIST; | 2998 | r = -EEXIST; |
3117 | if (kvm->arch.vpic) | 2999 | if (kvm->arch.vpic) |
3118 | goto create_irqchip_unlock; | 3000 | goto create_irqchip_unlock; |
3119 | r = -ENOMEM; | 3001 | r = -ENOMEM; |
3120 | vpic = kvm_create_pic(kvm); | 3002 | vpic = kvm_create_pic(kvm); |
3121 | if (vpic) { | 3003 | if (vpic) { |
3122 | r = kvm_ioapic_init(kvm); | 3004 | r = kvm_ioapic_init(kvm); |
3123 | if (r) { | 3005 | if (r) { |
3124 | kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, | 3006 | kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, |
3125 | &vpic->dev); | 3007 | &vpic->dev); |
3126 | kfree(vpic); | 3008 | kfree(vpic); |
3127 | goto create_irqchip_unlock; | 3009 | goto create_irqchip_unlock; |
3128 | } | 3010 | } |
3129 | } else | 3011 | } else |
3130 | goto create_irqchip_unlock; | 3012 | goto create_irqchip_unlock; |
3131 | smp_wmb(); | 3013 | smp_wmb(); |
3132 | kvm->arch.vpic = vpic; | 3014 | kvm->arch.vpic = vpic; |
3133 | smp_wmb(); | 3015 | smp_wmb(); |
3134 | r = kvm_setup_default_irq_routing(kvm); | 3016 | r = kvm_setup_default_irq_routing(kvm); |
3135 | if (r) { | 3017 | if (r) { |
3136 | mutex_lock(&kvm->irq_lock); | 3018 | mutex_lock(&kvm->irq_lock); |
3137 | kvm_ioapic_destroy(kvm); | 3019 | kvm_ioapic_destroy(kvm); |
3138 | kvm_destroy_pic(kvm); | 3020 | kvm_destroy_pic(kvm); |
3139 | mutex_unlock(&kvm->irq_lock); | 3021 | mutex_unlock(&kvm->irq_lock); |
3140 | } | 3022 | } |
3141 | create_irqchip_unlock: | 3023 | create_irqchip_unlock: |
3142 | mutex_unlock(&kvm->lock); | 3024 | mutex_unlock(&kvm->lock); |
3143 | break; | 3025 | break; |
3144 | } | 3026 | } |
3145 | case KVM_CREATE_PIT: | 3027 | case KVM_CREATE_PIT: |
3146 | u.pit_config.flags = KVM_PIT_SPEAKER_DUMMY; | 3028 | u.pit_config.flags = KVM_PIT_SPEAKER_DUMMY; |
3147 | goto create_pit; | 3029 | goto create_pit; |
3148 | case KVM_CREATE_PIT2: | 3030 | case KVM_CREATE_PIT2: |
3149 | r = -EFAULT; | 3031 | r = -EFAULT; |
3150 | if (copy_from_user(&u.pit_config, argp, | 3032 | if (copy_from_user(&u.pit_config, argp, |
3151 | sizeof(struct kvm_pit_config))) | 3033 | sizeof(struct kvm_pit_config))) |
3152 | goto out; | 3034 | goto out; |
3153 | create_pit: | 3035 | create_pit: |
3154 | mutex_lock(&kvm->slots_lock); | 3036 | mutex_lock(&kvm->slots_lock); |
3155 | r = -EEXIST; | 3037 | r = -EEXIST; |
3156 | if (kvm->arch.vpit) | 3038 | if (kvm->arch.vpit) |
3157 | goto create_pit_unlock; | 3039 | goto create_pit_unlock; |
3158 | r = -ENOMEM; | 3040 | r = -ENOMEM; |
3159 | kvm->arch.vpit = kvm_create_pit(kvm, u.pit_config.flags); | 3041 | kvm->arch.vpit = kvm_create_pit(kvm, u.pit_config.flags); |
3160 | if (kvm->arch.vpit) | 3042 | if (kvm->arch.vpit) |
3161 | r = 0; | 3043 | r = 0; |
3162 | create_pit_unlock: | 3044 | create_pit_unlock: |
3163 | mutex_unlock(&kvm->slots_lock); | 3045 | mutex_unlock(&kvm->slots_lock); |
3164 | break; | 3046 | break; |
3165 | case KVM_IRQ_LINE_STATUS: | 3047 | case KVM_IRQ_LINE_STATUS: |
3166 | case KVM_IRQ_LINE: { | 3048 | case KVM_IRQ_LINE: { |
3167 | struct kvm_irq_level irq_event; | 3049 | struct kvm_irq_level irq_event; |
3168 | 3050 | ||
3169 | r = -EFAULT; | 3051 | r = -EFAULT; |
3170 | if (copy_from_user(&irq_event, argp, sizeof irq_event)) | 3052 | if (copy_from_user(&irq_event, argp, sizeof irq_event)) |
3171 | goto out; | 3053 | goto out; |
3172 | r = -ENXIO; | 3054 | r = -ENXIO; |
3173 | if (irqchip_in_kernel(kvm)) { | 3055 | if (irqchip_in_kernel(kvm)) { |
3174 | __s32 status; | 3056 | __s32 status; |
3175 | status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, | 3057 | status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, |
3176 | irq_event.irq, irq_event.level); | 3058 | irq_event.irq, irq_event.level); |
3177 | if (ioctl == KVM_IRQ_LINE_STATUS) { | 3059 | if (ioctl == KVM_IRQ_LINE_STATUS) { |
3178 | r = -EFAULT; | 3060 | r = -EFAULT; |
3179 | irq_event.status = status; | 3061 | irq_event.status = status; |
3180 | if (copy_to_user(argp, &irq_event, | 3062 | if (copy_to_user(argp, &irq_event, |
3181 | sizeof irq_event)) | 3063 | sizeof irq_event)) |
3182 | goto out; | 3064 | goto out; |
3183 | } | 3065 | } |
3184 | r = 0; | 3066 | r = 0; |
3185 | } | 3067 | } |
3186 | break; | 3068 | break; |
3187 | } | 3069 | } |
3188 | case KVM_GET_IRQCHIP: { | 3070 | case KVM_GET_IRQCHIP: { |
3189 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ | 3071 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ |
3190 | struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL); | 3072 | struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL); |
3191 | 3073 | ||
3192 | r = -ENOMEM; | 3074 | r = -ENOMEM; |
3193 | if (!chip) | 3075 | if (!chip) |
3194 | goto out; | 3076 | goto out; |
3195 | r = -EFAULT; | 3077 | r = -EFAULT; |
3196 | if (copy_from_user(chip, argp, sizeof *chip)) | 3078 | if (copy_from_user(chip, argp, sizeof *chip)) |
3197 | goto get_irqchip_out; | 3079 | goto get_irqchip_out; |
3198 | r = -ENXIO; | 3080 | r = -ENXIO; |
3199 | if (!irqchip_in_kernel(kvm)) | 3081 | if (!irqchip_in_kernel(kvm)) |
3200 | goto get_irqchip_out; | 3082 | goto get_irqchip_out; |
3201 | r = kvm_vm_ioctl_get_irqchip(kvm, chip); | 3083 | r = kvm_vm_ioctl_get_irqchip(kvm, chip); |
3202 | if (r) | 3084 | if (r) |
3203 | goto get_irqchip_out; | 3085 | goto get_irqchip_out; |
3204 | r = -EFAULT; | 3086 | r = -EFAULT; |
3205 | if (copy_to_user(argp, chip, sizeof *chip)) | 3087 | if (copy_to_user(argp, chip, sizeof *chip)) |
3206 | goto get_irqchip_out; | 3088 | goto get_irqchip_out; |
3207 | r = 0; | 3089 | r = 0; |
3208 | get_irqchip_out: | 3090 | get_irqchip_out: |
3209 | kfree(chip); | 3091 | kfree(chip); |
3210 | if (r) | 3092 | if (r) |
3211 | goto out; | 3093 | goto out; |
3212 | break; | 3094 | break; |
3213 | } | 3095 | } |
3214 | case KVM_SET_IRQCHIP: { | 3096 | case KVM_SET_IRQCHIP: { |
3215 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ | 3097 | /* 0: PIC master, 1: PIC slave, 2: IOAPIC */ |
3216 | struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL); | 3098 | struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL); |
3217 | 3099 | ||
3218 | r = -ENOMEM; | 3100 | r = -ENOMEM; |
3219 | if (!chip) | 3101 | if (!chip) |
3220 | goto out; | 3102 | goto out; |
3221 | r = -EFAULT; | 3103 | r = -EFAULT; |
3222 | if (copy_from_user(chip, argp, sizeof *chip)) | 3104 | if (copy_from_user(chip, argp, sizeof *chip)) |
3223 | goto set_irqchip_out; | 3105 | goto set_irqchip_out; |
3224 | r = -ENXIO; | 3106 | r = -ENXIO; |
3225 | if (!irqchip_in_kernel(kvm)) | 3107 | if (!irqchip_in_kernel(kvm)) |
3226 | goto set_irqchip_out; | 3108 | goto set_irqchip_out; |
3227 | r = kvm_vm_ioctl_set_irqchip(kvm, chip); | 3109 | r = kvm_vm_ioctl_set_irqchip(kvm, chip); |
3228 | if (r) | 3110 | if (r) |
3229 | goto set_irqchip_out; | 3111 | goto set_irqchip_out; |
3230 | r = 0; | 3112 | r = 0; |
3231 | set_irqchip_out: | 3113 | set_irqchip_out: |
3232 | kfree(chip); | 3114 | kfree(chip); |
3233 | if (r) | 3115 | if (r) |
3234 | goto out; | 3116 | goto out; |
3235 | break; | 3117 | break; |
3236 | } | 3118 | } |
3237 | case KVM_GET_PIT: { | 3119 | case KVM_GET_PIT: { |
3238 | r = -EFAULT; | 3120 | r = -EFAULT; |
3239 | if (copy_from_user(&u.ps, argp, sizeof(struct kvm_pit_state))) | 3121 | if (copy_from_user(&u.ps, argp, sizeof(struct kvm_pit_state))) |
3240 | goto out; | 3122 | goto out; |
3241 | r = -ENXIO; | 3123 | r = -ENXIO; |
3242 | if (!kvm->arch.vpit) | 3124 | if (!kvm->arch.vpit) |
3243 | goto out; | 3125 | goto out; |
3244 | r = kvm_vm_ioctl_get_pit(kvm, &u.ps); | 3126 | r = kvm_vm_ioctl_get_pit(kvm, &u.ps); |
3245 | if (r) | 3127 | if (r) |
3246 | goto out; | 3128 | goto out; |
3247 | r = -EFAULT; | 3129 | r = -EFAULT; |
3248 | if (copy_to_user(argp, &u.ps, sizeof(struct kvm_pit_state))) | 3130 | if (copy_to_user(argp, &u.ps, sizeof(struct kvm_pit_state))) |
3249 | goto out; | 3131 | goto out; |
3250 | r = 0; | 3132 | r = 0; |
3251 | break; | 3133 | break; |
3252 | } | 3134 | } |
3253 | case KVM_SET_PIT: { | 3135 | case KVM_SET_PIT: { |
3254 | r = -EFAULT; | 3136 | r = -EFAULT; |
3255 | if (copy_from_user(&u.ps, argp, sizeof u.ps)) | 3137 | if (copy_from_user(&u.ps, argp, sizeof u.ps)) |
3256 | goto out; | 3138 | goto out; |
3257 | r = -ENXIO; | 3139 | r = -ENXIO; |
3258 | if (!kvm->arch.vpit) | 3140 | if (!kvm->arch.vpit) |
3259 | goto out; | 3141 | goto out; |
3260 | r = kvm_vm_ioctl_set_pit(kvm, &u.ps); | 3142 | r = kvm_vm_ioctl_set_pit(kvm, &u.ps); |
3261 | if (r) | 3143 | if (r) |
3262 | goto out; | 3144 | goto out; |
3263 | r = 0; | 3145 | r = 0; |
3264 | break; | 3146 | break; |
3265 | } | 3147 | } |
3266 | case KVM_GET_PIT2: { | 3148 | case KVM_GET_PIT2: { |
3267 | r = -ENXIO; | 3149 | r = -ENXIO; |
3268 | if (!kvm->arch.vpit) | 3150 | if (!kvm->arch.vpit) |
3269 | goto out; | 3151 | goto out; |
3270 | r = kvm_vm_ioctl_get_pit2(kvm, &u.ps2); | 3152 | r = kvm_vm_ioctl_get_pit2(kvm, &u.ps2); |
3271 | if (r) | 3153 | if (r) |
3272 | goto out; | 3154 | goto out; |
3273 | r = -EFAULT; | 3155 | r = -EFAULT; |
3274 | if (copy_to_user(argp, &u.ps2, sizeof(u.ps2))) | 3156 | if (copy_to_user(argp, &u.ps2, sizeof(u.ps2))) |
3275 | goto out; | 3157 | goto out; |
3276 | r = 0; | 3158 | r = 0; |
3277 | break; | 3159 | break; |
3278 | } | 3160 | } |
3279 | case KVM_SET_PIT2: { | 3161 | case KVM_SET_PIT2: { |
3280 | r = -EFAULT; | 3162 | r = -EFAULT; |
3281 | if (copy_from_user(&u.ps2, argp, sizeof(u.ps2))) | 3163 | if (copy_from_user(&u.ps2, argp, sizeof(u.ps2))) |
3282 | goto out; | 3164 | goto out; |
3283 | r = -ENXIO; | 3165 | r = -ENXIO; |
3284 | if (!kvm->arch.vpit) | 3166 | if (!kvm->arch.vpit) |
3285 | goto out; | 3167 | goto out; |
3286 | r = kvm_vm_ioctl_set_pit2(kvm, &u.ps2); | 3168 | r = kvm_vm_ioctl_set_pit2(kvm, &u.ps2); |
3287 | if (r) | 3169 | if (r) |
3288 | goto out; | 3170 | goto out; |
3289 | r = 0; | 3171 | r = 0; |
3290 | break; | 3172 | break; |
3291 | } | 3173 | } |
3292 | case KVM_REINJECT_CONTROL: { | 3174 | case KVM_REINJECT_CONTROL: { |
3293 | struct kvm_reinject_control control; | 3175 | struct kvm_reinject_control control; |
3294 | r = -EFAULT; | 3176 | r = -EFAULT; |
3295 | if (copy_from_user(&control, argp, sizeof(control))) | 3177 | if (copy_from_user(&control, argp, sizeof(control))) |
3296 | goto out; | 3178 | goto out; |
3297 | r = kvm_vm_ioctl_reinject(kvm, &control); | 3179 | r = kvm_vm_ioctl_reinject(kvm, &control); |
3298 | if (r) | 3180 | if (r) |
3299 | goto out; | 3181 | goto out; |
3300 | r = 0; | 3182 | r = 0; |
3301 | break; | 3183 | break; |
3302 | } | 3184 | } |
3303 | case KVM_XEN_HVM_CONFIG: { | 3185 | case KVM_XEN_HVM_CONFIG: { |
3304 | r = -EFAULT; | 3186 | r = -EFAULT; |
3305 | if (copy_from_user(&kvm->arch.xen_hvm_config, argp, | 3187 | if (copy_from_user(&kvm->arch.xen_hvm_config, argp, |
3306 | sizeof(struct kvm_xen_hvm_config))) | 3188 | sizeof(struct kvm_xen_hvm_config))) |
3307 | goto out; | 3189 | goto out; |
3308 | r = -EINVAL; | 3190 | r = -EINVAL; |
3309 | if (kvm->arch.xen_hvm_config.flags) | 3191 | if (kvm->arch.xen_hvm_config.flags) |
3310 | goto out; | 3192 | goto out; |
3311 | r = 0; | 3193 | r = 0; |
3312 | break; | 3194 | break; |
3313 | } | 3195 | } |
3314 | case KVM_SET_CLOCK: { | 3196 | case KVM_SET_CLOCK: { |
3315 | struct timespec now; | 3197 | struct timespec now; |
3316 | struct kvm_clock_data user_ns; | 3198 | struct kvm_clock_data user_ns; |
3317 | u64 now_ns; | 3199 | u64 now_ns; |
3318 | s64 delta; | 3200 | s64 delta; |
3319 | 3201 | ||
3320 | r = -EFAULT; | 3202 | r = -EFAULT; |
3321 | if (copy_from_user(&user_ns, argp, sizeof(user_ns))) | 3203 | if (copy_from_user(&user_ns, argp, sizeof(user_ns))) |
3322 | goto out; | 3204 | goto out; |
3323 | 3205 | ||
3324 | r = -EINVAL; | 3206 | r = -EINVAL; |
3325 | if (user_ns.flags) | 3207 | if (user_ns.flags) |
3326 | goto out; | 3208 | goto out; |
3327 | 3209 | ||
3328 | r = 0; | 3210 | r = 0; |
3329 | ktime_get_ts(&now); | 3211 | ktime_get_ts(&now); |
3330 | now_ns = timespec_to_ns(&now); | 3212 | now_ns = timespec_to_ns(&now); |
3331 | delta = user_ns.clock - now_ns; | 3213 | delta = user_ns.clock - now_ns; |
3332 | kvm->arch.kvmclock_offset = delta; | 3214 | kvm->arch.kvmclock_offset = delta; |
3333 | break; | 3215 | break; |
3334 | } | 3216 | } |
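/*
 * Illustrative sketch (not part of the commit): KVM_SET_CLOCK stores only
 * the signed delta between the requested guest clock and the host's
 * monotonic clock. With made-up values now_ns = 1000 and
 * user_ns.clock = 400:
 *
 *	delta = 400 - 1000 = -600	(saved as kvmclock_offset)
 *
 * and KVM_GET_CLOCK below reconstructs the guest view as
 * kvmclock_offset + now_ns = -600 + 1000 = 400.
 */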
3335 | case KVM_GET_CLOCK: { | 3217 | case KVM_GET_CLOCK: { |
3336 | struct timespec now; | 3218 | struct timespec now; |
3337 | struct kvm_clock_data user_ns; | 3219 | struct kvm_clock_data user_ns; |
3338 | u64 now_ns; | 3220 | u64 now_ns; |
3339 | 3221 | ||
3340 | ktime_get_ts(&now); | 3222 | ktime_get_ts(&now); |
3341 | now_ns = timespec_to_ns(&now); | 3223 | now_ns = timespec_to_ns(&now); |
3342 | user_ns.clock = kvm->arch.kvmclock_offset + now_ns; | 3224 | user_ns.clock = kvm->arch.kvmclock_offset + now_ns; |
3343 | user_ns.flags = 0; | 3225 | user_ns.flags = 0; |
3344 | 3226 | ||
3345 | r = -EFAULT; | 3227 | r = -EFAULT; |
3346 | if (copy_to_user(argp, &user_ns, sizeof(user_ns))) | 3228 | if (copy_to_user(argp, &user_ns, sizeof(user_ns))) |
3347 | goto out; | 3229 | goto out; |
3348 | r = 0; | 3230 | r = 0; |
3349 | break; | 3231 | break; |
3350 | } | 3232 | } |
3351 | 3233 | ||
3352 | default: | 3234 | default: |
3353 | ; | 3235 | ; |
3354 | } | 3236 | } |
3355 | out: | 3237 | out: |
3356 | return r; | 3238 | return r; |
3357 | } | 3239 | } |
3358 | 3240 | ||
3359 | static void kvm_init_msr_list(void) | 3241 | static void kvm_init_msr_list(void) |
3360 | { | 3242 | { |
3361 | u32 dummy[2]; | 3243 | u32 dummy[2]; |
3362 | unsigned i, j; | 3244 | unsigned i, j; |
3363 | 3245 | ||
3364 | /* skip the first MSRs in the list; they are KVM-specific */ | 3246 | /* skip the first MSRs in the list; they are KVM-specific */ |
3365 | for (i = j = KVM_SAVE_MSRS_BEGIN; i < ARRAY_SIZE(msrs_to_save); i++) { | 3247 | for (i = j = KVM_SAVE_MSRS_BEGIN; i < ARRAY_SIZE(msrs_to_save); i++) { |
3366 | if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0) | 3248 | if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0) |
3367 | continue; | 3249 | continue; |
3368 | if (j < i) | 3250 | if (j < i) |
3369 | msrs_to_save[j] = msrs_to_save[i]; | 3251 | msrs_to_save[j] = msrs_to_save[i]; |
3370 | j++; | 3252 | j++; |
3371 | } | 3253 | } |
3372 | num_msrs_to_save = j; | 3254 | num_msrs_to_save = j; |
3373 | } | 3255 | } |
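/*
 * Illustrative sketch (not part of the commit): the loop above is the
 * in-place filter idiom -- i scans every candidate, j tracks the next
 * free slot, and an MSR survives only if rdmsr_safe() shows it exists
 * on this host. The same shape in the abstract:
 *
 *	for (i = j = first; i < n; i++)
 *		if (keep(a[i]))
 *			a[j++] = a[i];
 *	n = j;
 */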
3374 | 3256 | ||
3375 | static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len, | 3257 | static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len, |
3376 | const void *v) | 3258 | const void *v) |
3377 | { | 3259 | { |
3378 | if (vcpu->arch.apic && | 3260 | if (vcpu->arch.apic && |
3379 | !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v)) | 3261 | !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v)) |
3380 | return 0; | 3262 | return 0; |
3381 | 3263 | ||
3382 | return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v); | 3264 | return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v); |
3383 | } | 3265 | } |
3384 | 3266 | ||
3385 | static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v) | 3267 | static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v) |
3386 | { | 3268 | { |
3387 | if (vcpu->arch.apic && | 3269 | if (vcpu->arch.apic && |
3388 | !kvm_iodevice_read(&vcpu->arch.apic->dev, addr, len, v)) | 3270 | !kvm_iodevice_read(&vcpu->arch.apic->dev, addr, len, v)) |
3389 | return 0; | 3271 | return 0; |
3390 | 3272 | ||
3391 | return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v); | 3273 | return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v); |
3392 | } | 3274 | } |
3393 | 3275 | ||
3394 | static void kvm_set_segment(struct kvm_vcpu *vcpu, | 3276 | static void kvm_set_segment(struct kvm_vcpu *vcpu, |
3395 | struct kvm_segment *var, int seg) | 3277 | struct kvm_segment *var, int seg) |
3396 | { | 3278 | { |
3397 | kvm_x86_ops->set_segment(vcpu, var, seg); | 3279 | kvm_x86_ops->set_segment(vcpu, var, seg); |
3398 | } | 3280 | } |
3399 | 3281 | ||
3400 | void kvm_get_segment(struct kvm_vcpu *vcpu, | 3282 | void kvm_get_segment(struct kvm_vcpu *vcpu, |
3401 | struct kvm_segment *var, int seg) | 3283 | struct kvm_segment *var, int seg) |
3402 | { | 3284 | { |
3403 | kvm_x86_ops->get_segment(vcpu, var, seg); | 3285 | kvm_x86_ops->get_segment(vcpu, var, seg); |
3404 | } | 3286 | } |
3405 | 3287 | ||
3406 | gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) | 3288 | gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) |
3407 | { | 3289 | { |
3408 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; | 3290 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; |
3409 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); | 3291 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); |
3410 | } | 3292 | } |
3411 | 3293 | ||
3412 | gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) | 3294 | gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) |
3413 | { | 3295 | { |
3414 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; | 3296 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; |
3415 | access |= PFERR_FETCH_MASK; | 3297 | access |= PFERR_FETCH_MASK; |
3416 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); | 3298 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); |
3417 | } | 3299 | } |
3418 | 3300 | ||
3419 | gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) | 3301 | gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) |
3420 | { | 3302 | { |
3421 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; | 3303 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; |
3422 | access |= PFERR_WRITE_MASK; | 3304 | access |= PFERR_WRITE_MASK; |
3423 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); | 3305 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error); |
3424 | } | 3306 | } |
3425 | 3307 | ||
3426 | /* used to access any guest's mapped memory without checking CPL */ | 3308 | /* used to access any guest's mapped memory without checking CPL */ |
3427 | gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) | 3309 | gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) |
3428 | { | 3310 | { |
3429 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error); | 3311 | return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error); |
3430 | } | 3312 | } |
3431 | 3313 | ||
3432 | static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, | 3314 | static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, |
3433 | struct kvm_vcpu *vcpu, u32 access, | 3315 | struct kvm_vcpu *vcpu, u32 access, |
3434 | u32 *error) | 3316 | u32 *error) |
3435 | { | 3317 | { |
3436 | void *data = val; | 3318 | void *data = val; |
3437 | int r = X86EMUL_CONTINUE; | 3319 | int r = X86EMUL_CONTINUE; |
3438 | 3320 | ||
3439 | while (bytes) { | 3321 | while (bytes) { |
3440 | gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error); | 3322 | gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error); |
3441 | unsigned offset = addr & (PAGE_SIZE-1); | 3323 | unsigned offset = addr & (PAGE_SIZE-1); |
3442 | unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); | 3324 | unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); |
3443 | int ret; | 3325 | int ret; |
3444 | 3326 | ||
3445 | if (gpa == UNMAPPED_GVA) { | 3327 | if (gpa == UNMAPPED_GVA) { |
3446 | r = X86EMUL_PROPAGATE_FAULT; | 3328 | r = X86EMUL_PROPAGATE_FAULT; |
3447 | goto out; | 3329 | goto out; |
3448 | } | 3330 | } |
3449 | ret = kvm_read_guest(vcpu->kvm, gpa, data, toread); | 3331 | ret = kvm_read_guest(vcpu->kvm, gpa, data, toread); |
3450 | if (ret < 0) { | 3332 | if (ret < 0) { |
3451 | r = X86EMUL_IO_NEEDED; | 3333 | r = X86EMUL_IO_NEEDED; |
3452 | goto out; | 3334 | goto out; |
3453 | } | 3335 | } |
3454 | 3336 | ||
3455 | bytes -= toread; | 3337 | bytes -= toread; |
3456 | data += toread; | 3338 | data += toread; |
3457 | addr += toread; | 3339 | addr += toread; |
3458 | } | 3340 | } |
3459 | out: | 3341 | out: |
3460 | return r; | 3342 | return r; |
3461 | } | 3343 | } |
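/*
 * Worked example (illustrative, assuming 4 KiB pages): the loop above
 * reads one guest page at a time because contiguous GVAs need not map
 * to contiguous GPAs. With addr = 0x10ff0 and bytes = 0x20:
 *
 *	pass 1: offset = 0xff0, toread = min(0x20, 0x1000 - 0xff0) = 0x10
 *	pass 2: offset = 0x000, toread = min(0x10, 0x1000)         = 0x10
 *
 * so the read is split across two pages, each translated separately.
 */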
3462 | 3344 | ||
3463 | /* used for instruction fetching */ | 3345 | /* used for instruction fetching */ |
3464 | static int kvm_fetch_guest_virt(gva_t addr, void *val, unsigned int bytes, | 3346 | static int kvm_fetch_guest_virt(gva_t addr, void *val, unsigned int bytes, |
3465 | struct kvm_vcpu *vcpu, u32 *error) | 3347 | struct kvm_vcpu *vcpu, u32 *error) |
3466 | { | 3348 | { |
3467 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; | 3349 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; |
3468 | return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, | 3350 | return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, |
3469 | access | PFERR_FETCH_MASK, error); | 3351 | access | PFERR_FETCH_MASK, error); |
3470 | } | 3352 | } |
3471 | 3353 | ||
3472 | static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes, | 3354 | static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes, |
3473 | struct kvm_vcpu *vcpu, u32 *error) | 3355 | struct kvm_vcpu *vcpu, u32 *error) |
3474 | { | 3356 | { |
3475 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; | 3357 | u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; |
3476 | return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access, | 3358 | return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access, |
3477 | error); | 3359 | error); |
3478 | } | 3360 | } |
3479 | 3361 | ||
3480 | static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes, | 3362 | static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes, |
3481 | struct kvm_vcpu *vcpu, u32 *error) | 3363 | struct kvm_vcpu *vcpu, u32 *error) |
3482 | { | 3364 | { |
3483 | return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error); | 3365 | return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error); |
3484 | } | 3366 | } |
3485 | 3367 | ||
3486 | static int kvm_write_guest_virt_system(gva_t addr, void *val, | 3368 | static int kvm_write_guest_virt_system(gva_t addr, void *val, |
3487 | unsigned int bytes, | 3369 | unsigned int bytes, |
3488 | struct kvm_vcpu *vcpu, | 3370 | struct kvm_vcpu *vcpu, |
3489 | u32 *error) | 3371 | u32 *error) |
3490 | { | 3372 | { |
3491 | void *data = val; | 3373 | void *data = val; |
3492 | int r = X86EMUL_CONTINUE; | 3374 | int r = X86EMUL_CONTINUE; |
3493 | 3375 | ||
3494 | while (bytes) { | 3376 | while (bytes) { |
3495 | gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, | 3377 | gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, |
3496 | PFERR_WRITE_MASK, error); | 3378 | PFERR_WRITE_MASK, error); |
3497 | unsigned offset = addr & (PAGE_SIZE-1); | 3379 | unsigned offset = addr & (PAGE_SIZE-1); |
3498 | unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset); | 3380 | unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset); |
3499 | int ret; | 3381 | int ret; |
3500 | 3382 | ||
3501 | if (gpa == UNMAPPED_GVA) { | 3383 | if (gpa == UNMAPPED_GVA) { |
3502 | r = X86EMUL_PROPAGATE_FAULT; | 3384 | r = X86EMUL_PROPAGATE_FAULT; |
3503 | goto out; | 3385 | goto out; |
3504 | } | 3386 | } |
3505 | ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite); | 3387 | ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite); |
3506 | if (ret < 0) { | 3388 | if (ret < 0) { |
3507 | r = X86EMUL_IO_NEEDED; | 3389 | r = X86EMUL_IO_NEEDED; |
3508 | goto out; | 3390 | goto out; |
3509 | } | 3391 | } |
3510 | 3392 | ||
3511 | bytes -= towrite; | 3393 | bytes -= towrite; |
3512 | data += towrite; | 3394 | data += towrite; |
3513 | addr += towrite; | 3395 | addr += towrite; |
3514 | } | 3396 | } |
3515 | out: | 3397 | out: |
3516 | return r; | 3398 | return r; |
3517 | } | 3399 | } |
3518 | 3400 | ||
3519 | static int emulator_read_emulated(unsigned long addr, | 3401 | static int emulator_read_emulated(unsigned long addr, |
3520 | void *val, | 3402 | void *val, |
3521 | unsigned int bytes, | 3403 | unsigned int bytes, |
3522 | unsigned int *error_code, | 3404 | unsigned int *error_code, |
3523 | struct kvm_vcpu *vcpu) | 3405 | struct kvm_vcpu *vcpu) |
3524 | { | 3406 | { |
3525 | gpa_t gpa; | 3407 | gpa_t gpa; |
3526 | 3408 | ||
3527 | if (vcpu->mmio_read_completed) { | 3409 | if (vcpu->mmio_read_completed) { |
3528 | memcpy(val, vcpu->mmio_data, bytes); | 3410 | memcpy(val, vcpu->mmio_data, bytes); |
3529 | trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, | 3411 | trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, |
3530 | vcpu->mmio_phys_addr, *(u64 *)val); | 3412 | vcpu->mmio_phys_addr, *(u64 *)val); |
3531 | vcpu->mmio_read_completed = 0; | 3413 | vcpu->mmio_read_completed = 0; |
3532 | return X86EMUL_CONTINUE; | 3414 | return X86EMUL_CONTINUE; |
3533 | } | 3415 | } |
3534 | 3416 | ||
3535 | gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, error_code); | 3417 | gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, error_code); |
3536 | 3418 | ||
3537 | if (gpa == UNMAPPED_GVA) | 3419 | if (gpa == UNMAPPED_GVA) |
3538 | return X86EMUL_PROPAGATE_FAULT; | 3420 | return X86EMUL_PROPAGATE_FAULT; |
3539 | 3421 | ||
3540 | /* For APIC access vmexit */ | 3422 | /* For APIC access vmexit */ |
3541 | if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) | 3423 | if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) |
3542 | goto mmio; | 3424 | goto mmio; |
3543 | 3425 | ||
3544 | if (kvm_read_guest_virt(addr, val, bytes, vcpu, NULL) | 3426 | if (kvm_read_guest_virt(addr, val, bytes, vcpu, NULL) |
3545 | == X86EMUL_CONTINUE) | 3427 | == X86EMUL_CONTINUE) |
3546 | return X86EMUL_CONTINUE; | 3428 | return X86EMUL_CONTINUE; |
3547 | 3429 | ||
3548 | mmio: | 3430 | mmio: |
3549 | /* | 3431 | /* |
3550 | * Is this MMIO handled locally? | 3432 | * Is this MMIO handled locally? |
3551 | */ | 3433 | */ |
3552 | if (!vcpu_mmio_read(vcpu, gpa, bytes, val)) { | 3434 | if (!vcpu_mmio_read(vcpu, gpa, bytes, val)) { |
3553 | trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, gpa, *(u64 *)val); | 3435 | trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, gpa, *(u64 *)val); |
3554 | return X86EMUL_CONTINUE; | 3436 | return X86EMUL_CONTINUE; |
3555 | } | 3437 | } |
3556 | 3438 | ||
3557 | trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0); | 3439 | trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0); |
3558 | 3440 | ||
3559 | vcpu->mmio_needed = 1; | 3441 | vcpu->mmio_needed = 1; |
3560 | vcpu->run->exit_reason = KVM_EXIT_MMIO; | 3442 | vcpu->run->exit_reason = KVM_EXIT_MMIO; |
3561 | vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; | 3443 | vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; |
3562 | vcpu->run->mmio.len = vcpu->mmio_size = bytes; | 3444 | vcpu->run->mmio.len = vcpu->mmio_size = bytes; |
3563 | vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0; | 3445 | vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0; |
3564 | 3446 | ||
3565 | return X86EMUL_IO_NEEDED; | 3447 | return X86EMUL_IO_NEEDED; |
3566 | } | 3448 | } |
3567 | 3449 | ||
3568 | int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, | 3450 | int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, |
3569 | const void *val, int bytes) | 3451 | const void *val, int bytes) |
3570 | { | 3452 | { |
3571 | int ret; | 3453 | int ret; |
3572 | 3454 | ||
3573 | ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes); | 3455 | ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes); |
3574 | if (ret < 0) | 3456 | if (ret < 0) |
3575 | return 0; | 3457 | return 0; |
3576 | kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1); | 3458 | kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1); |
3577 | return 1; | 3459 | return 1; |
3578 | } | 3460 | } |
3579 | 3461 | ||
3580 | static int emulator_write_emulated_onepage(unsigned long addr, | 3462 | static int emulator_write_emulated_onepage(unsigned long addr, |
3581 | const void *val, | 3463 | const void *val, |
3582 | unsigned int bytes, | 3464 | unsigned int bytes, |
3583 | unsigned int *error_code, | 3465 | unsigned int *error_code, |
3584 | struct kvm_vcpu *vcpu) | 3466 | struct kvm_vcpu *vcpu) |
3585 | { | 3467 | { |
3586 | gpa_t gpa; | 3468 | gpa_t gpa; |
3587 | 3469 | ||
3588 | gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error_code); | 3470 | gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error_code); |
3589 | 3471 | ||
3590 | if (gpa == UNMAPPED_GVA) | 3472 | if (gpa == UNMAPPED_GVA) |
3591 | return X86EMUL_PROPAGATE_FAULT; | 3473 | return X86EMUL_PROPAGATE_FAULT; |
3592 | 3474 | ||
3593 | /* For APIC access vmexit */ | 3475 | /* For APIC access vmexit */ |
3594 | if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) | 3476 | if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) |
3595 | goto mmio; | 3477 | goto mmio; |
3596 | 3478 | ||
3597 | if (emulator_write_phys(vcpu, gpa, val, bytes)) | 3479 | if (emulator_write_phys(vcpu, gpa, val, bytes)) |
3598 | return X86EMUL_CONTINUE; | 3480 | return X86EMUL_CONTINUE; |
3599 | 3481 | ||
3600 | mmio: | 3482 | mmio: |
3601 | trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val); | 3483 | trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val); |
3602 | /* | 3484 | /* |
3603 | * Is this MMIO handled locally? | 3485 | * Is this MMIO handled locally? |
3604 | */ | 3486 | */ |
3605 | if (!vcpu_mmio_write(vcpu, gpa, bytes, val)) | 3487 | if (!vcpu_mmio_write(vcpu, gpa, bytes, val)) |
3606 | return X86EMUL_CONTINUE; | 3488 | return X86EMUL_CONTINUE; |
3607 | 3489 | ||
3608 | vcpu->mmio_needed = 1; | 3490 | vcpu->mmio_needed = 1; |
3609 | vcpu->run->exit_reason = KVM_EXIT_MMIO; | 3491 | vcpu->run->exit_reason = KVM_EXIT_MMIO; |
3610 | vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; | 3492 | vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; |
3611 | vcpu->run->mmio.len = vcpu->mmio_size = bytes; | 3493 | vcpu->run->mmio.len = vcpu->mmio_size = bytes; |
3612 | vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1; | 3494 | vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1; |
3613 | memcpy(vcpu->run->mmio.data, val, bytes); | 3495 | memcpy(vcpu->run->mmio.data, val, bytes); |
3614 | 3496 | ||
3615 | return X86EMUL_CONTINUE; | 3497 | return X86EMUL_CONTINUE; |
3616 | } | 3498 | } |
3617 | 3499 | ||
3618 | int emulator_write_emulated(unsigned long addr, | 3500 | int emulator_write_emulated(unsigned long addr, |
3619 | const void *val, | 3501 | const void *val, |
3620 | unsigned int bytes, | 3502 | unsigned int bytes, |
3621 | unsigned int *error_code, | 3503 | unsigned int *error_code, |
3622 | struct kvm_vcpu *vcpu) | 3504 | struct kvm_vcpu *vcpu) |
3623 | { | 3505 | { |
3624 | /* Crossing a page boundary? */ | 3506 | /* Crossing a page boundary? */ |
3625 | if (((addr + bytes - 1) ^ addr) & PAGE_MASK) { | 3507 | if (((addr + bytes - 1) ^ addr) & PAGE_MASK) { |
3626 | int rc, now; | 3508 | int rc, now; |
3627 | 3509 | ||
3628 | now = -addr & ~PAGE_MASK; | 3510 | now = -addr & ~PAGE_MASK; |
3629 | rc = emulator_write_emulated_onepage(addr, val, now, error_code, | 3511 | rc = emulator_write_emulated_onepage(addr, val, now, error_code, |
3630 | vcpu); | 3512 | vcpu); |
3631 | if (rc != X86EMUL_CONTINUE) | 3513 | if (rc != X86EMUL_CONTINUE) |
3632 | return rc; | 3514 | return rc; |
3633 | addr += now; | 3515 | addr += now; |
3634 | val += now; | 3516 | val += now; |
3635 | bytes -= now; | 3517 | bytes -= now; |
3636 | } | 3518 | } |
3637 | return emulator_write_emulated_onepage(addr, val, bytes, error_code, | 3519 | return emulator_write_emulated_onepage(addr, val, bytes, error_code, |
3638 | vcpu); | 3520 | vcpu); |
3639 | } | 3521 | } |
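/*
 * Worked example (illustrative, assuming 4 KiB pages, PAGE_MASK =
 * ~0xfffUL): "now = -addr & ~PAGE_MASK" is the number of bytes left
 * before the next page boundary. With addr = 0x1ffd and bytes = 8:
 *
 *	now = -0x1ffd & 0xfff = 3
 *
 * so three bytes go to the first page and the remaining five are
 * handled by the tail call on the following page.
 */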
3640 | 3522 | ||
3641 | #define CMPXCHG_TYPE(t, ptr, old, new) \ | 3523 | #define CMPXCHG_TYPE(t, ptr, old, new) \ |
3642 | (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old)) | 3524 | (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old)) |
3643 | 3525 | ||
3644 | #ifdef CONFIG_X86_64 | 3526 | #ifdef CONFIG_X86_64 |
3645 | # define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new) | 3527 | # define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new) |
3646 | #else | 3528 | #else |
3647 | # define CMPXCHG64(ptr, old, new) \ | 3529 | # define CMPXCHG64(ptr, old, new) \ |
3648 | (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old)) | 3530 | (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old)) |
3649 | #endif | 3531 | #endif |
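/*
 * Expansion sketch (illustrative): for a 4-byte operand the macro above
 * unfolds to an ordinary cmpxchg() on the mapped guest page:
 *
 *	CMPXCHG_TYPE(u32, kaddr, old, new)
 *	  => (cmpxchg((u32 *)(kaddr), *(u32 *)(old), *(u32 *)(new))
 *	      == *(u32 *)(old))
 *
 * i.e. it evaluates to true only if the value at kaddr still matched
 * *old and was therefore atomically replaced by *new.
 */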
3650 | 3532 | ||
3651 | static int emulator_cmpxchg_emulated(unsigned long addr, | 3533 | static int emulator_cmpxchg_emulated(unsigned long addr, |
3652 | const void *old, | 3534 | const void *old, |
3653 | const void *new, | 3535 | const void *new, |
3654 | unsigned int bytes, | 3536 | unsigned int bytes, |
3655 | unsigned int *error_code, | 3537 | unsigned int *error_code, |
3656 | struct kvm_vcpu *vcpu) | 3538 | struct kvm_vcpu *vcpu) |
3657 | { | 3539 | { |
3658 | gpa_t gpa; | 3540 | gpa_t gpa; |
3659 | struct page *page; | 3541 | struct page *page; |
3660 | char *kaddr; | 3542 | char *kaddr; |
3661 | bool exchanged; | 3543 | bool exchanged; |
3662 | 3544 | ||
3663 | /* a guest's cmpxchg8b has to be emulated atomically */ | 3545 | /* a guest's cmpxchg8b has to be emulated atomically */ |
3664 | if (bytes > 8 || (bytes & (bytes - 1))) | 3546 | if (bytes > 8 || (bytes & (bytes - 1))) |
3665 | goto emul_write; | 3547 | goto emul_write; |
3666 | 3548 | ||
3667 | gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL); | 3549 | gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL); |
3668 | 3550 | ||
3669 | if (gpa == UNMAPPED_GVA || | 3551 | if (gpa == UNMAPPED_GVA || |
3670 | (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) | 3552 | (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) |
3671 | goto emul_write; | 3553 | goto emul_write; |
3672 | 3554 | ||
3673 | if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK)) | 3555 | if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK)) |
3674 | goto emul_write; | 3556 | goto emul_write; |
3675 | 3557 | ||
3676 | page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT); | 3558 | page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT); |
3677 | 3559 | ||
3678 | kaddr = kmap_atomic(page, KM_USER0); | 3560 | kaddr = kmap_atomic(page, KM_USER0); |
3679 | kaddr += offset_in_page(gpa); | 3561 | kaddr += offset_in_page(gpa); |
3680 | switch (bytes) { | 3562 | switch (bytes) { |
3681 | case 1: | 3563 | case 1: |
3682 | exchanged = CMPXCHG_TYPE(u8, kaddr, old, new); | 3564 | exchanged = CMPXCHG_TYPE(u8, kaddr, old, new); |
3683 | break; | 3565 | break; |
3684 | case 2: | 3566 | case 2: |
3685 | exchanged = CMPXCHG_TYPE(u16, kaddr, old, new); | 3567 | exchanged = CMPXCHG_TYPE(u16, kaddr, old, new); |
3686 | break; | 3568 | break; |
3687 | case 4: | 3569 | case 4: |
3688 | exchanged = CMPXCHG_TYPE(u32, kaddr, old, new); | 3570 | exchanged = CMPXCHG_TYPE(u32, kaddr, old, new); |
3689 | break; | 3571 | break; |
3690 | case 8: | 3572 | case 8: |
3691 | exchanged = CMPXCHG64(kaddr, old, new); | 3573 | exchanged = CMPXCHG64(kaddr, old, new); |
3692 | break; | 3574 | break; |
3693 | default: | 3575 | default: |
3694 | BUG(); | 3576 | BUG(); |
3695 | } | 3577 | } |
3696 | kunmap_atomic(kaddr, KM_USER0); | 3578 | kunmap_atomic(kaddr, KM_USER0); |
3697 | kvm_release_page_dirty(page); | 3579 | kvm_release_page_dirty(page); |
3698 | 3580 | ||
3699 | if (!exchanged) | 3581 | if (!exchanged) |
3700 | return X86EMUL_CMPXCHG_FAILED; | 3582 | return X86EMUL_CMPXCHG_FAILED; |
3701 | 3583 | ||
3702 | kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1); | 3584 | kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1); |
3703 | 3585 | ||
3704 | return X86EMUL_CONTINUE; | 3586 | return X86EMUL_CONTINUE; |
3705 | 3587 | ||
3706 | emul_write: | 3588 | emul_write: |
3707 | printk_once(KERN_WARNING "kvm: emulating exchange as write\n"); | 3589 | printk_once(KERN_WARNING "kvm: emulating exchange as write\n"); |
3708 | 3590 | ||
3709 | return emulator_write_emulated(addr, new, bytes, error_code, vcpu); | 3591 | return emulator_write_emulated(addr, new, bytes, error_code, vcpu); |
3710 | } | 3592 | } |
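/*
 * Side note (illustrative): the "bytes > 8 || (bytes & (bytes - 1))"
 * guard above accepts only power-of-two sizes up to 8. For example
 * bytes = 6 gives 6 & 5 = 0b110 & 0b101 = 0b100 != 0, and such an
 * operand cannot be exchanged with one native cmpxchg, so it falls
 * back to the plain-write emulation path.
 */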
3711 | 3593 | ||
3712 | static int kernel_pio(struct kvm_vcpu *vcpu, void *pd) | 3594 | static int kernel_pio(struct kvm_vcpu *vcpu, void *pd) |
3713 | { | 3595 | { |
3714 | /* TODO: string I/O for in-kernel devices */ | 3596 | /* TODO: string I/O for in-kernel devices */ |
3715 | int r; | 3597 | int r; |
3716 | 3598 | ||
3717 | if (vcpu->arch.pio.in) | 3599 | if (vcpu->arch.pio.in) |
3718 | r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port, | 3600 | r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port, |
3719 | vcpu->arch.pio.size, pd); | 3601 | vcpu->arch.pio.size, pd); |
3720 | else | 3602 | else |
3721 | r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS, | 3603 | r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS, |
3722 | vcpu->arch.pio.port, vcpu->arch.pio.size, | 3604 | vcpu->arch.pio.port, vcpu->arch.pio.size, |
3723 | pd); | 3605 | pd); |
3724 | return r; | 3606 | return r; |
3725 | } | 3607 | } |
3726 | 3608 | ||
3727 | 3609 | ||
3728 | static int emulator_pio_in_emulated(int size, unsigned short port, void *val, | 3610 | static int emulator_pio_in_emulated(int size, unsigned short port, void *val, |
3729 | unsigned int count, struct kvm_vcpu *vcpu) | 3611 | unsigned int count, struct kvm_vcpu *vcpu) |
3730 | { | 3612 | { |
3731 | if (vcpu->arch.pio.count) | 3613 | if (vcpu->arch.pio.count) |
3732 | goto data_avail; | 3614 | goto data_avail; |
3733 | 3615 | ||
3734 | trace_kvm_pio(1, port, size, 1); | 3616 | trace_kvm_pio(1, port, size, 1); |
3735 | 3617 | ||
3736 | vcpu->arch.pio.port = port; | 3618 | vcpu->arch.pio.port = port; |
3737 | vcpu->arch.pio.in = 1; | 3619 | vcpu->arch.pio.in = 1; |
3738 | vcpu->arch.pio.count = count; | 3620 | vcpu->arch.pio.count = count; |
3739 | vcpu->arch.pio.size = size; | 3621 | vcpu->arch.pio.size = size; |
3740 | 3622 | ||
3741 | if (!kernel_pio(vcpu, vcpu->arch.pio_data)) { | 3623 | if (!kernel_pio(vcpu, vcpu->arch.pio_data)) { |
3742 | data_avail: | 3624 | data_avail: |
3743 | memcpy(val, vcpu->arch.pio_data, size * count); | 3625 | memcpy(val, vcpu->arch.pio_data, size * count); |
3744 | vcpu->arch.pio.count = 0; | 3626 | vcpu->arch.pio.count = 0; |
3745 | return 1; | 3627 | return 1; |
3746 | } | 3628 | } |
3747 | 3629 | ||
3748 | vcpu->run->exit_reason = KVM_EXIT_IO; | 3630 | vcpu->run->exit_reason = KVM_EXIT_IO; |
3749 | vcpu->run->io.direction = KVM_EXIT_IO_IN; | 3631 | vcpu->run->io.direction = KVM_EXIT_IO_IN; |
3750 | vcpu->run->io.size = size; | 3632 | vcpu->run->io.size = size; |
3751 | vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE; | 3633 | vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE; |
3752 | vcpu->run->io.count = count; | 3634 | vcpu->run->io.count = count; |
3753 | vcpu->run->io.port = port; | 3635 | vcpu->run->io.port = port; |
3754 | 3636 | ||
3755 | return 0; | 3637 | return 0; |
3756 | } | 3638 | } |
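/*
 * Flow sketch (illustrative): vcpu->arch.pio.count doubles as a resume
 * flag. On the first call it is zero, the request is recorded, and if
 * no in-kernel device claims the port the function returns 0 so that
 * userspace completes the I/O. When the vcpu re-enters, count is still
 * non-zero and the jump to data_avail copies whatever userspace left
 * in pio_data.
 */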
3757 | 3639 | ||
3758 | static int emulator_pio_out_emulated(int size, unsigned short port, | 3640 | static int emulator_pio_out_emulated(int size, unsigned short port, |
3759 | const void *val, unsigned int count, | 3641 | const void *val, unsigned int count, |
3760 | struct kvm_vcpu *vcpu) | 3642 | struct kvm_vcpu *vcpu) |
3761 | { | 3643 | { |
3762 | trace_kvm_pio(0, port, size, 1); | 3644 | trace_kvm_pio(0, port, size, 1); |
3763 | 3645 | ||
3764 | vcpu->arch.pio.port = port; | 3646 | vcpu->arch.pio.port = port; |
3765 | vcpu->arch.pio.in = 0; | 3647 | vcpu->arch.pio.in = 0; |
3766 | vcpu->arch.pio.count = count; | 3648 | vcpu->arch.pio.count = count; |
3767 | vcpu->arch.pio.size = size; | 3649 | vcpu->arch.pio.size = size; |
3768 | 3650 | ||
3769 | memcpy(vcpu->arch.pio_data, val, size * count); | 3651 | memcpy(vcpu->arch.pio_data, val, size * count); |
3770 | 3652 | ||
3771 | if (!kernel_pio(vcpu, vcpu->arch.pio_data)) { | 3653 | if (!kernel_pio(vcpu, vcpu->arch.pio_data)) { |
3772 | vcpu->arch.pio.count = 0; | 3654 | vcpu->arch.pio.count = 0; |
3773 | return 1; | 3655 | return 1; |
3774 | } | 3656 | } |
3775 | 3657 | ||
3776 | vcpu->run->exit_reason = KVM_EXIT_IO; | 3658 | vcpu->run->exit_reason = KVM_EXIT_IO; |
3777 | vcpu->run->io.direction = KVM_EXIT_IO_OUT; | 3659 | vcpu->run->io.direction = KVM_EXIT_IO_OUT; |
3778 | vcpu->run->io.size = size; | 3660 | vcpu->run->io.size = size; |
3779 | vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE; | 3661 | vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE; |
3780 | vcpu->run->io.count = count; | 3662 | vcpu->run->io.count = count; |
3781 | vcpu->run->io.port = port; | 3663 | vcpu->run->io.port = port; |
3782 | 3664 | ||
3783 | return 0; | 3665 | return 0; |
3784 | } | 3666 | } |
3785 | 3667 | ||
3786 | static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg) | 3668 | static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg) |
3787 | { | 3669 | { |
3788 | return kvm_x86_ops->get_segment_base(vcpu, seg); | 3670 | return kvm_x86_ops->get_segment_base(vcpu, seg); |
3789 | } | 3671 | } |
3790 | 3672 | ||
3791 | int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address) | 3673 | int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address) |
3792 | { | 3674 | { |
3793 | kvm_mmu_invlpg(vcpu, address); | 3675 | kvm_mmu_invlpg(vcpu, address); |
3794 | return X86EMUL_CONTINUE; | 3676 | return X86EMUL_CONTINUE; |
3795 | } | 3677 | } |
3796 | 3678 | ||
3797 | int emulate_clts(struct kvm_vcpu *vcpu) | 3679 | int emulate_clts(struct kvm_vcpu *vcpu) |
3798 | { | 3680 | { |
3799 | kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS)); | 3681 | kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS)); |
3800 | kvm_x86_ops->fpu_activate(vcpu); | 3682 | kvm_x86_ops->fpu_activate(vcpu); |
3801 | return X86EMUL_CONTINUE; | 3683 | return X86EMUL_CONTINUE; |
3802 | } | 3684 | } |
3803 | 3685 | ||
3804 | int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu) | 3686 | int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu) |
3805 | { | 3687 | { |
3806 | return _kvm_get_dr(vcpu, dr, dest); | 3688 | return _kvm_get_dr(vcpu, dr, dest); |
3807 | } | 3689 | } |
3808 | 3690 | ||
3809 | int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu) | 3691 | int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu) |
3810 | { | 3692 | { |
3811 | 3693 | ||
3812 | return __kvm_set_dr(vcpu, dr, value); | 3694 | return __kvm_set_dr(vcpu, dr, value); |
3813 | } | 3695 | } |
3814 | 3696 | ||
3815 | static u64 mk_cr_64(u64 curr_cr, u32 new_val) | 3697 | static u64 mk_cr_64(u64 curr_cr, u32 new_val) |
3816 | { | 3698 | { |
3817 | return (curr_cr & ~((1ULL << 32) - 1)) | new_val; | 3699 | return (curr_cr & ~((1ULL << 32) - 1)) | new_val; |
3818 | } | 3700 | } |
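/*
 * Worked example (illustrative): mk_cr_64() splices a 32-bit value into
 * the low half of a control register while preserving the upper half:
 *
 *	curr_cr = 0xffff0000deadbeef, new_val = 0x80050033
 *	result  = (curr_cr & 0xffffffff00000000ULL) | 0x80050033
 *	        = 0xffff000080050033
 */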
3819 | 3701 | ||
3820 | static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu) | 3702 | static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu) |
3821 | { | 3703 | { |
3822 | unsigned long value; | 3704 | unsigned long value; |
3823 | 3705 | ||
3824 | switch (cr) { | 3706 | switch (cr) { |
3825 | case 0: | 3707 | case 0: |
3826 | value = kvm_read_cr0(vcpu); | 3708 | value = kvm_read_cr0(vcpu); |
3827 | break; | 3709 | break; |
3828 | case 2: | 3710 | case 2: |
3829 | value = vcpu->arch.cr2; | 3711 | value = vcpu->arch.cr2; |
3830 | break; | 3712 | break; |
3831 | case 3: | 3713 | case 3: |
3832 | value = vcpu->arch.cr3; | 3714 | value = vcpu->arch.cr3; |
3833 | break; | 3715 | break; |
3834 | case 4: | 3716 | case 4: |
3835 | value = kvm_read_cr4(vcpu); | 3717 | value = kvm_read_cr4(vcpu); |
3836 | break; | 3718 | break; |
3837 | case 8: | 3719 | case 8: |
3838 | value = kvm_get_cr8(vcpu); | 3720 | value = kvm_get_cr8(vcpu); |
3839 | break; | 3721 | break; |
3840 | default: | 3722 | default: |
3841 | vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); | 3723 | vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); |
3842 | return 0; | 3724 | return 0; |
3843 | } | 3725 | } |
3844 | 3726 | ||
3845 | return value; | 3727 | return value; |
3846 | } | 3728 | } |
3847 | 3729 | ||
3848 | static int emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu) | 3730 | static int emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu) |
3849 | { | 3731 | { |
3850 | int res = 0; | 3732 | int res = 0; |
3851 | 3733 | ||
3852 | switch (cr) { | 3734 | switch (cr) { |
3853 | case 0: | 3735 | case 0: |
3854 | res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); | 3736 | res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); |
3855 | break; | 3737 | break; |
3856 | case 2: | 3738 | case 2: |
3857 | vcpu->arch.cr2 = val; | 3739 | vcpu->arch.cr2 = val; |
3858 | break; | 3740 | break; |
3859 | case 3: | 3741 | case 3: |
3860 | res = kvm_set_cr3(vcpu, val); | 3742 | res = kvm_set_cr3(vcpu, val); |
3861 | break; | 3743 | break; |
3862 | case 4: | 3744 | case 4: |
3863 | res = kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val)); | 3745 | res = kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val)); |
3864 | break; | 3746 | break; |
3865 | case 8: | 3747 | case 8: |
3866 | res = __kvm_set_cr8(vcpu, val & 0xfUL); | 3748 | res = __kvm_set_cr8(vcpu, val & 0xfUL); |
3867 | break; | 3749 | break; |
3868 | default: | 3750 | default: |
3869 | vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); | 3751 | vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); |
3870 | res = -1; | 3752 | res = -1; |
3871 | } | 3753 | } |
3872 | 3754 | ||
3873 | return res; | 3755 | return res; |
3874 | } | 3756 | } |
3875 | 3757 | ||
3876 | static int emulator_get_cpl(struct kvm_vcpu *vcpu) | 3758 | static int emulator_get_cpl(struct kvm_vcpu *vcpu) |
3877 | { | 3759 | { |
3878 | return kvm_x86_ops->get_cpl(vcpu); | 3760 | return kvm_x86_ops->get_cpl(vcpu); |
3879 | } | 3761 | } |
3880 | 3762 | ||
3881 | static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu) | 3763 | static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu) |
3882 | { | 3764 | { |
3883 | kvm_x86_ops->get_gdt(vcpu, dt); | 3765 | kvm_x86_ops->get_gdt(vcpu, dt); |
3884 | } | 3766 | } |
3885 | 3767 | ||
3886 | static unsigned long emulator_get_cached_segment_base(int seg, | 3768 | static unsigned long emulator_get_cached_segment_base(int seg, |
3887 | struct kvm_vcpu *vcpu) | 3769 | struct kvm_vcpu *vcpu) |
3888 | { | 3770 | { |
3889 | return get_segment_base(vcpu, seg); | 3771 | return get_segment_base(vcpu, seg); |
3890 | } | 3772 | } |
3891 | 3773 | ||
3892 | static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg, | 3774 | static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg, |
3893 | struct kvm_vcpu *vcpu) | 3775 | struct kvm_vcpu *vcpu) |
3894 | { | 3776 | { |
3895 | struct kvm_segment var; | 3777 | struct kvm_segment var; |
3896 | 3778 | ||
3897 | kvm_get_segment(vcpu, &var, seg); | 3779 | kvm_get_segment(vcpu, &var, seg); |
3898 | 3780 | ||
3899 | if (var.unusable) | 3781 | if (var.unusable) |
3900 | return false; | 3782 | return false; |
3901 | 3783 | ||
3902 | if (var.g) | 3784 | if (var.g) |
3903 | var.limit >>= 12; | 3785 | var.limit >>= 12; |
3904 | set_desc_limit(desc, var.limit); | 3786 | set_desc_limit(desc, var.limit); |
3905 | set_desc_base(desc, (unsigned long)var.base); | 3787 | set_desc_base(desc, (unsigned long)var.base); |
3906 | desc->type = var.type; | 3788 | desc->type = var.type; |
3907 | desc->s = var.s; | 3789 | desc->s = var.s; |
3908 | desc->dpl = var.dpl; | 3790 | desc->dpl = var.dpl; |
3909 | desc->p = var.present; | 3791 | desc->p = var.present; |
3910 | desc->avl = var.avl; | 3792 | desc->avl = var.avl; |
3911 | desc->l = var.l; | 3793 | desc->l = var.l; |
3912 | desc->d = var.db; | 3794 | desc->d = var.db; |
3913 | desc->g = var.g; | 3795 | desc->g = var.g; |
3914 | 3796 | ||
3915 | return true; | 3797 | return true; |
3916 | } | 3798 | } |
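/*
 * Granularity note (illustrative): with var.g set the descriptor limit
 * is in 4 KiB units, so a byte limit of 0xabcfff is stored as the
 * 20-bit field 0xabc (var.limit >>= 12 above); the inverse transform in
 * emulator_set_cached_descriptor() below restores the low 0xfff.
 */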
3917 | 3799 | ||
3918 | static void emulator_set_cached_descriptor(struct desc_struct *desc, int seg, | 3800 | static void emulator_set_cached_descriptor(struct desc_struct *desc, int seg, |
3919 | struct kvm_vcpu *vcpu) | 3801 | struct kvm_vcpu *vcpu) |
3920 | { | 3802 | { |
3921 | struct kvm_segment var; | 3803 | struct kvm_segment var; |
3922 | 3804 | ||
3923 | /* needed to preserve selector */ | 3805 | /* needed to preserve selector */ |
3924 | kvm_get_segment(vcpu, &var, seg); | 3806 | kvm_get_segment(vcpu, &var, seg); |
3925 | 3807 | ||
3926 | var.base = get_desc_base(desc); | 3808 | var.base = get_desc_base(desc); |
3927 | var.limit = get_desc_limit(desc); | 3809 | var.limit = get_desc_limit(desc); |
3928 | if (desc->g) | 3810 | if (desc->g) |
3929 | var.limit = (var.limit << 12) | 0xfff; | 3811 | var.limit = (var.limit << 12) | 0xfff; |
3930 | var.type = desc->type; | 3812 | var.type = desc->type; |
3931 | var.present = desc->p; | 3813 | var.present = desc->p; |
3932 | var.dpl = desc->dpl; | 3814 | var.dpl = desc->dpl; |
3933 | var.db = desc->d; | 3815 | var.db = desc->d; |
3934 | var.s = desc->s; | 3816 | var.s = desc->s; |
3935 | var.l = desc->l; | 3817 | var.l = desc->l; |
3936 | var.g = desc->g; | 3818 | var.g = desc->g; |
3937 | var.avl = desc->avl; | 3819 | var.avl = desc->avl; |
3938 | var.present = desc->p; | 3820 | var.present = desc->p; |
3939 | var.unusable = !var.present; | 3821 | var.unusable = !var.present; |
3940 | var.padding = 0; | 3822 | var.padding = 0; |
3941 | 3823 | ||
3942 | kvm_set_segment(vcpu, &var, seg); | 3824 | kvm_set_segment(vcpu, &var, seg); |
3943 | return; | 3825 | return; |
3944 | } | 3826 | } |
3945 | 3827 | ||
3946 | static u16 emulator_get_segment_selector(int seg, struct kvm_vcpu *vcpu) | 3828 | static u16 emulator_get_segment_selector(int seg, struct kvm_vcpu *vcpu) |
3947 | { | 3829 | { |
3948 | struct kvm_segment kvm_seg; | 3830 | struct kvm_segment kvm_seg; |
3949 | 3831 | ||
3950 | kvm_get_segment(vcpu, &kvm_seg, seg); | 3832 | kvm_get_segment(vcpu, &kvm_seg, seg); |
3951 | return kvm_seg.selector; | 3833 | return kvm_seg.selector; |
3952 | } | 3834 | } |
3953 | 3835 | ||
3954 | static void emulator_set_segment_selector(u16 sel, int seg, | 3836 | static void emulator_set_segment_selector(u16 sel, int seg, |
3955 | struct kvm_vcpu *vcpu) | 3837 | struct kvm_vcpu *vcpu) |
3956 | { | 3838 | { |
3957 | struct kvm_segment kvm_seg; | 3839 | struct kvm_segment kvm_seg; |
3958 | 3840 | ||
3959 | kvm_get_segment(vcpu, &kvm_seg, seg); | 3841 | kvm_get_segment(vcpu, &kvm_seg, seg); |
3960 | kvm_seg.selector = sel; | 3842 | kvm_seg.selector = sel; |
3961 | kvm_set_segment(vcpu, &kvm_seg, seg); | 3843 | kvm_set_segment(vcpu, &kvm_seg, seg); |
3962 | } | 3844 | } |
3963 | 3845 | ||
3964 | static struct x86_emulate_ops emulate_ops = { | 3846 | static struct x86_emulate_ops emulate_ops = { |
3965 | .read_std = kvm_read_guest_virt_system, | 3847 | .read_std = kvm_read_guest_virt_system, |
3966 | .write_std = kvm_write_guest_virt_system, | 3848 | .write_std = kvm_write_guest_virt_system, |
3967 | .fetch = kvm_fetch_guest_virt, | 3849 | .fetch = kvm_fetch_guest_virt, |
3968 | .read_emulated = emulator_read_emulated, | 3850 | .read_emulated = emulator_read_emulated, |
3969 | .write_emulated = emulator_write_emulated, | 3851 | .write_emulated = emulator_write_emulated, |
3970 | .cmpxchg_emulated = emulator_cmpxchg_emulated, | 3852 | .cmpxchg_emulated = emulator_cmpxchg_emulated, |
3971 | .pio_in_emulated = emulator_pio_in_emulated, | 3853 | .pio_in_emulated = emulator_pio_in_emulated, |
3972 | .pio_out_emulated = emulator_pio_out_emulated, | 3854 | .pio_out_emulated = emulator_pio_out_emulated, |
3973 | .get_cached_descriptor = emulator_get_cached_descriptor, | 3855 | .get_cached_descriptor = emulator_get_cached_descriptor, |
3974 | .set_cached_descriptor = emulator_set_cached_descriptor, | 3856 | .set_cached_descriptor = emulator_set_cached_descriptor, |
3975 | .get_segment_selector = emulator_get_segment_selector, | 3857 | .get_segment_selector = emulator_get_segment_selector, |
3976 | .set_segment_selector = emulator_set_segment_selector, | 3858 | .set_segment_selector = emulator_set_segment_selector, |
3977 | .get_cached_segment_base = emulator_get_cached_segment_base, | 3859 | .get_cached_segment_base = emulator_get_cached_segment_base, |
3978 | .get_gdt = emulator_get_gdt, | 3860 | .get_gdt = emulator_get_gdt, |
3979 | .get_cr = emulator_get_cr, | 3861 | .get_cr = emulator_get_cr, |
3980 | .set_cr = emulator_set_cr, | 3862 | .set_cr = emulator_set_cr, |
3981 | .cpl = emulator_get_cpl, | 3863 | .cpl = emulator_get_cpl, |
3982 | .get_dr = emulator_get_dr, | 3864 | .get_dr = emulator_get_dr, |
3983 | .set_dr = emulator_set_dr, | 3865 | .set_dr = emulator_set_dr, |
3984 | .set_msr = kvm_set_msr, | 3866 | .set_msr = kvm_set_msr, |
3985 | .get_msr = kvm_get_msr, | 3867 | .get_msr = kvm_get_msr, |
3986 | }; | 3868 | }; |
3987 | 3869 | ||
3988 | static void cache_all_regs(struct kvm_vcpu *vcpu) | 3870 | static void cache_all_regs(struct kvm_vcpu *vcpu) |
3989 | { | 3871 | { |
3990 | kvm_register_read(vcpu, VCPU_REGS_RAX); | 3872 | kvm_register_read(vcpu, VCPU_REGS_RAX); |
3991 | kvm_register_read(vcpu, VCPU_REGS_RSP); | 3873 | kvm_register_read(vcpu, VCPU_REGS_RSP); |
3992 | kvm_register_read(vcpu, VCPU_REGS_RIP); | 3874 | kvm_register_read(vcpu, VCPU_REGS_RIP); |
3993 | vcpu->arch.regs_dirty = ~0; | 3875 | vcpu->arch.regs_dirty = ~0; |
3994 | } | 3876 | } |
3995 | 3877 | ||
3996 | static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) | 3878 | static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) |
3997 | { | 3879 | { |
3998 | u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask); | 3880 | u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask); |
3999 | /* | 3881 | /* |
4000 | * an sti; sti; sequence only disables interrupts for the first | 3882 | * an sti; sti; sequence only disables interrupts for the first |
4001 | * instruction. So, if the last instruction, be it emulated or | 3883 | * instruction. So, if the last instruction, be it emulated or |
4002 | * not, left the system with the INT_STI flag enabled, it | 3884 | * not, left the system with the INT_STI flag enabled, it |
4003 | * means that the last instruction is an sti. We should not | 3885 | * means that the last instruction is an sti. We should not |
4004 | * leave the flag on in this case. The same goes for mov ss. | 3886 | * leave the flag on in this case. The same goes for mov ss. |
4005 | */ | 3887 | */ |
4006 | if (!(int_shadow & mask)) | 3888 | if (!(int_shadow & mask)) |
4007 | kvm_x86_ops->set_interrupt_shadow(vcpu, mask); | 3889 | kvm_x86_ops->set_interrupt_shadow(vcpu, mask); |
4008 | } | 3890 | } |
4009 | 3891 | ||
4010 | static void inject_emulated_exception(struct kvm_vcpu *vcpu) | 3892 | static void inject_emulated_exception(struct kvm_vcpu *vcpu) |
4011 | { | 3893 | { |
4012 | struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt; | 3894 | struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt; |
4013 | if (ctxt->exception == PF_VECTOR) | 3895 | if (ctxt->exception == PF_VECTOR) |
4014 | kvm_inject_page_fault(vcpu, ctxt->cr2, ctxt->error_code); | 3896 | kvm_inject_page_fault(vcpu, ctxt->cr2, ctxt->error_code); |
4015 | else if (ctxt->error_code_valid) | 3897 | else if (ctxt->error_code_valid) |
4016 | kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code); | 3898 | kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code); |
4017 | else | 3899 | else |
4018 | kvm_queue_exception(vcpu, ctxt->exception); | 3900 | kvm_queue_exception(vcpu, ctxt->exception); |
4019 | } | 3901 | } |
4020 | 3902 | ||
4021 | static int handle_emulation_failure(struct kvm_vcpu *vcpu) | 3903 | static int handle_emulation_failure(struct kvm_vcpu *vcpu) |
4022 | { | 3904 | { |
4023 | ++vcpu->stat.insn_emulation_fail; | 3905 | ++vcpu->stat.insn_emulation_fail; |
4024 | trace_kvm_emulate_insn_failed(vcpu); | 3906 | trace_kvm_emulate_insn_failed(vcpu); |
4025 | vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; | 3907 | vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; |
4026 | vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; | 3908 | vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; |
4027 | vcpu->run->internal.ndata = 0; | 3909 | vcpu->run->internal.ndata = 0; |
4028 | kvm_queue_exception(vcpu, UD_VECTOR); | 3910 | kvm_queue_exception(vcpu, UD_VECTOR); |
4029 | return EMULATE_FAIL; | 3911 | return EMULATE_FAIL; |
4030 | } | 3912 | } |
4031 | 3913 | ||
4032 | int emulate_instruction(struct kvm_vcpu *vcpu, | 3914 | int emulate_instruction(struct kvm_vcpu *vcpu, |
4033 | unsigned long cr2, | 3915 | unsigned long cr2, |
4034 | u16 error_code, | 3916 | u16 error_code, |
4035 | int emulation_type) | 3917 | int emulation_type) |
4036 | { | 3918 | { |
4037 | int r; | 3919 | int r; |
4038 | struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; | 3920 | struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; |
4039 | 3921 | ||
4040 | kvm_clear_exception_queue(vcpu); | 3922 | kvm_clear_exception_queue(vcpu); |
4041 | vcpu->arch.mmio_fault_cr2 = cr2; | 3923 | vcpu->arch.mmio_fault_cr2 = cr2; |
4042 | /* | 3924 | /* |
4043 | * TODO: fix emulate.c to use guest_read/write_register | 3925 | * TODO: fix emulate.c to use guest_read/write_register |
4044 | * instead of direct ->regs accesses; this can save a hundred | 3926 | * instead of direct ->regs accesses; this can save a hundred |
4045 | * cycles on Intel for instructions that don't read/change | 3927 | * cycles on Intel for instructions that don't read/change |
4046 | * RSP, for example. | 3928 | * RSP, for example. |
4047 | */ | 3929 | */ |
4048 | cache_all_regs(vcpu); | 3930 | cache_all_regs(vcpu); |
4049 | 3931 | ||
4050 | if (!(emulation_type & EMULTYPE_NO_DECODE)) { | 3932 | if (!(emulation_type & EMULTYPE_NO_DECODE)) { |
4051 | int cs_db, cs_l; | 3933 | int cs_db, cs_l; |
4052 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); | 3934 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); |
4053 | 3935 | ||
4054 | vcpu->arch.emulate_ctxt.vcpu = vcpu; | 3936 | vcpu->arch.emulate_ctxt.vcpu = vcpu; |
4055 | vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu); | 3937 | vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu); |
4056 | vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); | 3938 | vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); |
4057 | vcpu->arch.emulate_ctxt.mode = | 3939 | vcpu->arch.emulate_ctxt.mode = |
4058 | (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL : | 3940 | (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL : |
4059 | (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM) | 3941 | (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM) |
4060 | ? X86EMUL_MODE_VM86 : cs_l | 3942 | ? X86EMUL_MODE_VM86 : cs_l |
4061 | ? X86EMUL_MODE_PROT64 : cs_db | 3943 | ? X86EMUL_MODE_PROT64 : cs_db |
4062 | ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; | 3944 | ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; |
4063 | memset(c, 0, sizeof(struct decode_cache)); | 3945 | memset(c, 0, sizeof(struct decode_cache)); |
4064 | memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); | 3946 | memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); |
4065 | vcpu->arch.emulate_ctxt.interruptibility = 0; | 3947 | vcpu->arch.emulate_ctxt.interruptibility = 0; |
4066 | vcpu->arch.emulate_ctxt.exception = -1; | 3948 | vcpu->arch.emulate_ctxt.exception = -1; |
4067 | 3949 | ||
4068 | r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); | 3950 | r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); |
4069 | trace_kvm_emulate_insn_start(vcpu); | 3951 | trace_kvm_emulate_insn_start(vcpu); |
4070 | 3952 | ||
4071 | /* Only allow emulation of specific instructions on #UD | 3953 | /* Only allow emulation of specific instructions on #UD |
4072 | * (namely VMMCALL, sysenter, sysexit, syscall) */ | 3954 | * (namely VMMCALL, sysenter, sysexit, syscall) */ |
4073 | if (emulation_type & EMULTYPE_TRAP_UD) { | 3955 | if (emulation_type & EMULTYPE_TRAP_UD) { |
4074 | if (!c->twobyte) | 3956 | if (!c->twobyte) |
4075 | return EMULATE_FAIL; | 3957 | return EMULATE_FAIL; |
4076 | switch (c->b) { | 3958 | switch (c->b) { |
4077 | case 0x01: /* VMMCALL */ | 3959 | case 0x01: /* VMMCALL */ |
4078 | if (c->modrm_mod != 3 || c->modrm_rm != 1) | 3960 | if (c->modrm_mod != 3 || c->modrm_rm != 1) |
4079 | return EMULATE_FAIL; | 3961 | return EMULATE_FAIL; |
4080 | break; | 3962 | break; |
4081 | case 0x34: /* sysenter */ | 3963 | case 0x34: /* sysenter */ |
4082 | case 0x35: /* sysexit */ | 3964 | case 0x35: /* sysexit */ |
4083 | if (c->modrm_mod != 0 || c->modrm_rm != 0) | 3965 | if (c->modrm_mod != 0 || c->modrm_rm != 0) |
4084 | return EMULATE_FAIL; | 3966 | return EMULATE_FAIL; |
4085 | break; | 3967 | break; |
4086 | case 0x05: /* syscall */ | 3968 | case 0x05: /* syscall */ |
4087 | if (c->modrm_mod != 0 || c->modrm_rm != 0) | 3969 | if (c->modrm_mod != 0 || c->modrm_rm != 0) |
4088 | return EMULATE_FAIL; | 3970 | return EMULATE_FAIL; |
4089 | break; | 3971 | break; |
4090 | default: | 3972 | default: |
4091 | return EMULATE_FAIL; | 3973 | return EMULATE_FAIL; |
4092 | } | 3974 | } |
4093 | 3975 | ||
4094 | if (!(c->modrm_reg == 0 || c->modrm_reg == 3)) | 3976 | if (!(c->modrm_reg == 0 || c->modrm_reg == 3)) |
4095 | return EMULATE_FAIL; | 3977 | return EMULATE_FAIL; |
4096 | } | 3978 | } |
4097 | 3979 | ||
4098 | ++vcpu->stat.insn_emulation; | 3980 | ++vcpu->stat.insn_emulation; |
4099 | if (r) { | 3981 | if (r) { |
4100 | if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) | 3982 | if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) |
4101 | return EMULATE_DONE; | 3983 | return EMULATE_DONE; |
4102 | if (emulation_type & EMULTYPE_SKIP) | 3984 | if (emulation_type & EMULTYPE_SKIP) |
4103 | return EMULATE_FAIL; | 3985 | return EMULATE_FAIL; |
4104 | return handle_emulation_failure(vcpu); | 3986 | return handle_emulation_failure(vcpu); |
4105 | } | 3987 | } |
4106 | } | 3988 | } |
4107 | 3989 | ||
4108 | if (emulation_type & EMULTYPE_SKIP) { | 3990 | if (emulation_type & EMULTYPE_SKIP) { |
4109 | kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip); | 3991 | kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip); |
4110 | return EMULATE_DONE; | 3992 | return EMULATE_DONE; |
4111 | } | 3993 | } |
4112 | 3994 | ||
4113 | /* this is needed for the VMware backdoor interface to work since it | 3995 | /* this is needed for the VMware backdoor interface to work since it |
4114 | changes register values during the I/O operation */ | 3996 | changes register values during the I/O operation */ |
4115 | memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); | 3997 | memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); |
4116 | 3998 | ||
4117 | restart: | 3999 | restart: |
4118 | r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); | 4000 | r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); |
4119 | 4001 | ||
4120 | if (r) { /* emulation failed */ | 4002 | if (r) { /* emulation failed */ |
4121 | /* | 4003 | /* |
4122 | * if emulation was due to an access to a shadowed page table | 4004 | * if emulation was due to an access to a shadowed page table |
4123 | * and it failed, try to unshadow the page and re-enter the | 4005 | * and it failed, try to unshadow the page and re-enter the |
4124 | * guest to let the CPU execute the instruction. | 4006 | * guest to let the CPU execute the instruction. |
4125 | */ | 4007 | */ |
4126 | if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) | 4008 | if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) |
4127 | return EMULATE_DONE; | 4009 | return EMULATE_DONE; |
4128 | 4010 | ||
4129 | return handle_emulation_failure(vcpu); | 4011 | return handle_emulation_failure(vcpu); |
4130 | } | 4012 | } |
4131 | 4013 | ||
4132 | toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility); | 4014 | toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility); |
4133 | kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); | 4015 | kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); |
4134 | memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); | 4016 | memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); |
4135 | kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); | 4017 | kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); |
4136 | 4018 | ||
4137 | if (vcpu->arch.emulate_ctxt.exception >= 0) { | 4019 | if (vcpu->arch.emulate_ctxt.exception >= 0) { |
4138 | inject_emulated_exception(vcpu); | 4020 | inject_emulated_exception(vcpu); |
4139 | return EMULATE_DONE; | 4021 | return EMULATE_DONE; |
4140 | } | 4022 | } |
4141 | 4023 | ||
4142 | if (vcpu->arch.pio.count) { | 4024 | if (vcpu->arch.pio.count) { |
4143 | if (!vcpu->arch.pio.in) | 4025 | if (!vcpu->arch.pio.in) |
4144 | vcpu->arch.pio.count = 0; | 4026 | vcpu->arch.pio.count = 0; |
4145 | return EMULATE_DO_MMIO; | 4027 | return EMULATE_DO_MMIO; |
4146 | } | 4028 | } |
4147 | 4029 | ||
4148 | if (vcpu->mmio_needed) { | 4030 | if (vcpu->mmio_needed) { |
4149 | if (vcpu->mmio_is_write) | 4031 | if (vcpu->mmio_is_write) |
4150 | vcpu->mmio_needed = 0; | 4032 | vcpu->mmio_needed = 0; |
4151 | return EMULATE_DO_MMIO; | 4033 | return EMULATE_DO_MMIO; |
4152 | } | 4034 | } |
4153 | 4035 | ||
4154 | if (vcpu->arch.emulate_ctxt.restart) | 4036 | if (vcpu->arch.emulate_ctxt.restart) |
4155 | goto restart; | 4037 | goto restart; |
4156 | 4038 | ||
4157 | return EMULATE_DONE; | 4039 | return EMULATE_DONE; |
4158 | } | 4040 | } |
4159 | EXPORT_SYMBOL_GPL(emulate_instruction); | 4041 | EXPORT_SYMBOL_GPL(emulate_instruction); |
4160 | 4042 | ||
4161 | int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port) | 4043 | int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port) |
4162 | { | 4044 | { |
4163 | unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); | 4045 | unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); |
4164 | int ret = emulator_pio_out_emulated(size, port, &val, 1, vcpu); | 4046 | int ret = emulator_pio_out_emulated(size, port, &val, 1, vcpu); |
4165 | /* do not return to the emulator after returning from userspace */ | 4047 | /* do not return to the emulator after returning from userspace */ |
4166 | vcpu->arch.pio.count = 0; | 4048 | vcpu->arch.pio.count = 0; |
4167 | return ret; | 4049 | return ret; |
4168 | } | 4050 | } |
4169 | EXPORT_SYMBOL_GPL(kvm_fast_pio_out); | 4051 | EXPORT_SYMBOL_GPL(kvm_fast_pio_out); |
4170 | 4052 | ||
4171 | static void bounce_off(void *info) | 4053 | static void bounce_off(void *info) |
4172 | { | 4054 | { |
4173 | /* nothing */ | 4055 | /* nothing */ |
4174 | } | 4056 | } |
4175 | 4057 | ||
4176 | static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long val, | 4058 | static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long val, |
4177 | void *data) | 4059 | void *data) |
4178 | { | 4060 | { |
4179 | struct cpufreq_freqs *freq = data; | 4061 | struct cpufreq_freqs *freq = data; |
4180 | struct kvm *kvm; | 4062 | struct kvm *kvm; |
4181 | struct kvm_vcpu *vcpu; | 4063 | struct kvm_vcpu *vcpu; |
4182 | int i, send_ipi = 0; | 4064 | int i, send_ipi = 0; |
4183 | 4065 | ||
4184 | if (val == CPUFREQ_PRECHANGE && freq->old > freq->new) | 4066 | if (val == CPUFREQ_PRECHANGE && freq->old > freq->new) |
4185 | return 0; | 4067 | return 0; |
4186 | if (val == CPUFREQ_POSTCHANGE && freq->old < freq->new) | 4068 | if (val == CPUFREQ_POSTCHANGE && freq->old < freq->new) |
4187 | return 0; | 4069 | return 0; |
4188 | per_cpu(cpu_tsc_khz, freq->cpu) = freq->new; | 4070 | per_cpu(cpu_tsc_khz, freq->cpu) = freq->new; |
4189 | 4071 | ||
4190 | spin_lock(&kvm_lock); | 4072 | spin_lock(&kvm_lock); |
4191 | list_for_each_entry(kvm, &vm_list, vm_list) { | 4073 | list_for_each_entry(kvm, &vm_list, vm_list) { |
4192 | kvm_for_each_vcpu(i, vcpu, kvm) { | 4074 | kvm_for_each_vcpu(i, vcpu, kvm) { |
4193 | if (vcpu->cpu != freq->cpu) | 4075 | if (vcpu->cpu != freq->cpu) |
4194 | continue; | 4076 | continue; |
4195 | if (!kvm_request_guest_time_update(vcpu)) | 4077 | if (!kvm_request_guest_time_update(vcpu)) |
4196 | continue; | 4078 | continue; |
4197 | if (vcpu->cpu != smp_processor_id()) | 4079 | if (vcpu->cpu != smp_processor_id()) |
4198 | send_ipi++; | 4080 | send_ipi++; |
4199 | } | 4081 | } |
4200 | } | 4082 | } |
4201 | spin_unlock(&kvm_lock); | 4083 | spin_unlock(&kvm_lock); |
4202 | 4084 | ||
4203 | if (freq->old < freq->new && send_ipi) { | 4085 | if (freq->old < freq->new && send_ipi) { |
4204 | /* | 4086 | /* |
4205 | * We upscale the frequency. Must make sure the guest | 4087 | * We upscale the frequency. Must make sure the guest |
4206 | * doesn't see old kvmclock values while running with | 4088 | * doesn't see old kvmclock values while running with |
4207 | * the new frequency, otherwise we risk the guest seeing | 4089 | * the new frequency, otherwise we risk the guest seeing |
4208 | * time go backwards. | 4090 | * time go backwards. |
4209 | * | 4091 | * |
4210 | * In case we update the frequency for another cpu | 4092 | * In case we update the frequency for another cpu |
4211 | * (which might be in guest context) send an interrupt | 4093 | * (which might be in guest context) send an interrupt |
4212 | * to kick the cpu out of guest context. Next time | 4094 | * to kick the cpu out of guest context. Next time |
4213 | * guest context is entered kvmclock will be updated, | 4095 | * guest context is entered kvmclock will be updated, |
4214 | * so the guest will not see stale values. | 4096 | * so the guest will not see stale values. |
4215 | */ | 4097 | */ |
4216 | smp_call_function_single(freq->cpu, bounce_off, NULL, 1); | 4098 | smp_call_function_single(freq->cpu, bounce_off, NULL, 1); |
4217 | } | 4099 | } |
4218 | return 0; | 4100 | return 0; |
4219 | } | 4101 | } |
4220 | 4102 | ||
4221 | static struct notifier_block kvmclock_cpufreq_notifier_block = { | 4103 | static struct notifier_block kvmclock_cpufreq_notifier_block = { |
4222 | .notifier_call = kvmclock_cpufreq_notifier | 4104 | .notifier_call = kvmclock_cpufreq_notifier |
4223 | }; | 4105 | }; |
4224 | 4106 | ||
4225 | static void kvm_timer_init(void) | 4107 | static void kvm_timer_init(void) |
4226 | { | 4108 | { |
4227 | int cpu; | 4109 | int cpu; |
4228 | 4110 | ||
4229 | if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { | 4111 | if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { |
4230 | cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block, | 4112 | cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block, |
4231 | CPUFREQ_TRANSITION_NOTIFIER); | 4113 | CPUFREQ_TRANSITION_NOTIFIER); |
4232 | for_each_online_cpu(cpu) { | 4114 | for_each_online_cpu(cpu) { |
4233 | unsigned long khz = cpufreq_get(cpu); | 4115 | unsigned long khz = cpufreq_get(cpu); |
4234 | if (!khz) | 4116 | if (!khz) |
4235 | khz = tsc_khz; | 4117 | khz = tsc_khz; |
4236 | per_cpu(cpu_tsc_khz, cpu) = khz; | 4118 | per_cpu(cpu_tsc_khz, cpu) = khz; |
4237 | } | 4119 | } |
4238 | } else { | 4120 | } else { |
4239 | for_each_possible_cpu(cpu) | 4121 | for_each_possible_cpu(cpu) |
4240 | per_cpu(cpu_tsc_khz, cpu) = tsc_khz; | 4122 | per_cpu(cpu_tsc_khz, cpu) = tsc_khz; |
4241 | } | 4123 | } |
4242 | } | 4124 | } |
4243 | 4125 | ||
4244 | static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu); | 4126 | static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu); |
4245 | 4127 | ||
4246 | static int kvm_is_in_guest(void) | 4128 | static int kvm_is_in_guest(void) |
4247 | { | 4129 | { |
4248 | return percpu_read(current_vcpu) != NULL; | 4130 | return percpu_read(current_vcpu) != NULL; |
4249 | } | 4131 | } |
4250 | 4132 | ||
4251 | static int kvm_is_user_mode(void) | 4133 | static int kvm_is_user_mode(void) |
4252 | { | 4134 | { |
4253 | int user_mode = 3; | 4135 | int user_mode = 3; |
4254 | 4136 | ||
4255 | if (percpu_read(current_vcpu)) | 4137 | if (percpu_read(current_vcpu)) |
4256 | user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu)); | 4138 | user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu)); |
4257 | 4139 | ||
4258 | return user_mode != 0; | 4140 | return user_mode != 0; |
4259 | } | 4141 | } |
4260 | 4142 | ||
4261 | static unsigned long kvm_get_guest_ip(void) | 4143 | static unsigned long kvm_get_guest_ip(void) |
4262 | { | 4144 | { |
4263 | unsigned long ip = 0; | 4145 | unsigned long ip = 0; |
4264 | 4146 | ||
4265 | if (percpu_read(current_vcpu)) | 4147 | if (percpu_read(current_vcpu)) |
4266 | ip = kvm_rip_read(percpu_read(current_vcpu)); | 4148 | ip = kvm_rip_read(percpu_read(current_vcpu)); |
4267 | 4149 | ||
4268 | return ip; | 4150 | return ip; |
4269 | } | 4151 | } |
4270 | 4152 | ||
4271 | static struct perf_guest_info_callbacks kvm_guest_cbs = { | 4153 | static struct perf_guest_info_callbacks kvm_guest_cbs = { |
4272 | .is_in_guest = kvm_is_in_guest, | 4154 | .is_in_guest = kvm_is_in_guest, |
4273 | .is_user_mode = kvm_is_user_mode, | 4155 | .is_user_mode = kvm_is_user_mode, |
4274 | .get_guest_ip = kvm_get_guest_ip, | 4156 | .get_guest_ip = kvm_get_guest_ip, |
4275 | }; | 4157 | }; |
4276 | 4158 | ||
4277 | void kvm_before_handle_nmi(struct kvm_vcpu *vcpu) | 4159 | void kvm_before_handle_nmi(struct kvm_vcpu *vcpu) |
4278 | { | 4160 | { |
4279 | percpu_write(current_vcpu, vcpu); | 4161 | percpu_write(current_vcpu, vcpu); |
4280 | } | 4162 | } |
4281 | EXPORT_SYMBOL_GPL(kvm_before_handle_nmi); | 4163 | EXPORT_SYMBOL_GPL(kvm_before_handle_nmi); |
4282 | 4164 | ||
4283 | void kvm_after_handle_nmi(struct kvm_vcpu *vcpu) | 4165 | void kvm_after_handle_nmi(struct kvm_vcpu *vcpu) |
4284 | { | 4166 | { |
4285 | percpu_write(current_vcpu, NULL); | 4167 | percpu_write(current_vcpu, NULL); |
4286 | } | 4168 | } |
4287 | EXPORT_SYMBOL_GPL(kvm_after_handle_nmi); | 4169 | EXPORT_SYMBOL_GPL(kvm_after_handle_nmi); |
4288 | 4170 | ||
4289 | int kvm_arch_init(void *opaque) | 4171 | int kvm_arch_init(void *opaque) |
4290 | { | 4172 | { |
4291 | int r; | 4173 | int r; |
4292 | struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque; | 4174 | struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque; |
4293 | 4175 | ||
4294 | if (kvm_x86_ops) { | 4176 | if (kvm_x86_ops) { |
4295 | printk(KERN_ERR "kvm: already loaded the other module\n"); | 4177 | printk(KERN_ERR "kvm: already loaded the other module\n"); |
4296 | r = -EEXIST; | 4178 | r = -EEXIST; |
4297 | goto out; | 4179 | goto out; |
4298 | } | 4180 | } |
4299 | 4181 | ||
4300 | if (!ops->cpu_has_kvm_support()) { | 4182 | if (!ops->cpu_has_kvm_support()) { |
4301 | printk(KERN_ERR "kvm: no hardware support\n"); | 4183 | printk(KERN_ERR "kvm: no hardware support\n"); |
4302 | r = -EOPNOTSUPP; | 4184 | r = -EOPNOTSUPP; |
4303 | goto out; | 4185 | goto out; |
4304 | } | 4186 | } |
4305 | if (ops->disabled_by_bios()) { | 4187 | if (ops->disabled_by_bios()) { |
4306 | printk(KERN_ERR "kvm: disabled by bios\n"); | 4188 | printk(KERN_ERR "kvm: disabled by bios\n"); |
4307 | r = -EOPNOTSUPP; | 4189 | r = -EOPNOTSUPP; |
4308 | goto out; | 4190 | goto out; |
4309 | } | 4191 | } |
4310 | 4192 | ||
4311 | r = kvm_mmu_module_init(); | 4193 | r = kvm_mmu_module_init(); |
4312 | if (r) | 4194 | if (r) |
4313 | goto out; | 4195 | goto out; |
4314 | 4196 | ||
4315 | kvm_init_msr_list(); | 4197 | kvm_init_msr_list(); |
4316 | 4198 | ||
4317 | kvm_x86_ops = ops; | 4199 | kvm_x86_ops = ops; |
4318 | kvm_mmu_set_nonpresent_ptes(0ull, 0ull); | 4200 | kvm_mmu_set_nonpresent_ptes(0ull, 0ull); |
4319 | kvm_mmu_set_base_ptes(PT_PRESENT_MASK); | 4201 | kvm_mmu_set_base_ptes(PT_PRESENT_MASK); |
4320 | kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK, | 4202 | kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK, |
4321 | PT_DIRTY_MASK, PT64_NX_MASK, 0); | 4203 | PT_DIRTY_MASK, PT64_NX_MASK, 0); |
4322 | 4204 | ||
4323 | kvm_timer_init(); | 4205 | kvm_timer_init(); |
4324 | 4206 | ||
4325 | perf_register_guest_info_callbacks(&kvm_guest_cbs); | 4207 | perf_register_guest_info_callbacks(&kvm_guest_cbs); |
4326 | 4208 | ||
4327 | if (cpu_has_xsave) | 4209 | if (cpu_has_xsave) |
4328 | host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); | 4210 | host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); |
4329 | 4211 | ||
4330 | return 0; | 4212 | return 0; |
4331 | 4213 | ||
4332 | out: | 4214 | out: |
4333 | return r; | 4215 | return r; |
4334 | } | 4216 | } |
4335 | 4217 | ||
4336 | void kvm_arch_exit(void) | 4218 | void kvm_arch_exit(void) |
4337 | { | 4219 | { |
4338 | perf_unregister_guest_info_callbacks(&kvm_guest_cbs); | 4220 | perf_unregister_guest_info_callbacks(&kvm_guest_cbs); |
4339 | 4221 | ||
4340 | if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) | 4222 | if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) |
4341 | cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block, | 4223 | cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block, |
4342 | CPUFREQ_TRANSITION_NOTIFIER); | 4224 | CPUFREQ_TRANSITION_NOTIFIER); |
4343 | kvm_x86_ops = NULL; | 4225 | kvm_x86_ops = NULL; |
4344 | kvm_mmu_module_exit(); | 4226 | kvm_mmu_module_exit(); |
4345 | } | 4227 | } |
4346 | 4228 | ||
4347 | int kvm_emulate_halt(struct kvm_vcpu *vcpu) | 4229 | int kvm_emulate_halt(struct kvm_vcpu *vcpu) |
4348 | { | 4230 | { |
4349 | ++vcpu->stat.halt_exits; | 4231 | ++vcpu->stat.halt_exits; |
4350 | if (irqchip_in_kernel(vcpu->kvm)) { | 4232 | if (irqchip_in_kernel(vcpu->kvm)) { |
4351 | vcpu->arch.mp_state = KVM_MP_STATE_HALTED; | 4233 | vcpu->arch.mp_state = KVM_MP_STATE_HALTED; |
4352 | return 1; | 4234 | return 1; |
4353 | } else { | 4235 | } else { |
4354 | vcpu->run->exit_reason = KVM_EXIT_HLT; | 4236 | vcpu->run->exit_reason = KVM_EXIT_HLT; |
4355 | return 0; | 4237 | return 0; |
4356 | } | 4238 | } |
4357 | } | 4239 | } |
4358 | EXPORT_SYMBOL_GPL(kvm_emulate_halt); | 4240 | EXPORT_SYMBOL_GPL(kvm_emulate_halt); |
4359 | 4241 | ||
4360 | static inline gpa_t hc_gpa(struct kvm_vcpu *vcpu, unsigned long a0, | 4242 | static inline gpa_t hc_gpa(struct kvm_vcpu *vcpu, unsigned long a0, |
4361 | unsigned long a1) | 4243 | unsigned long a1) |
4362 | { | 4244 | { |
4363 | if (is_long_mode(vcpu)) | 4245 | if (is_long_mode(vcpu)) |
4364 | return a0; | 4246 | return a0; |
4365 | else | 4247 | else |
4366 | return a0 | ((gpa_t)a1 << 32); | 4248 | return a0 | ((gpa_t)a1 << 32); |
4367 | } | 4249 | } |
4368 | 4250 | ||
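For reference, hc_gpa() recombines a 64-bit guest physical address that a guest outside long mode passes as two 32-bit register halves. A minimal user-space sketch of the same arithmetic, with hypothetical values:

    #include <stdint.h>
    #include <stdio.h>

    /* The recombination hc_gpa() performs when the vcpu is not in long
     * mode: a0 carries the low 32 bits, a1 the high 32 bits. */
    static uint64_t combine_gpa(uint32_t a0, uint32_t a1)
    {
        return (uint64_t)a0 | ((uint64_t)a1 << 32);
    }

    int main(void)
    {
        /* hypothetical halves */
        printf("gpa = %#llx\n",
               (unsigned long long)combine_gpa(0xdeadb000u, 0x1u));
        return 0;           /* prints gpa = 0x1deadb000 */
    }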
4369 | int kvm_hv_hypercall(struct kvm_vcpu *vcpu) | 4251 | int kvm_hv_hypercall(struct kvm_vcpu *vcpu) |
4370 | { | 4252 | { |
4371 | u64 param, ingpa, outgpa, ret; | 4253 | u64 param, ingpa, outgpa, ret; |
4372 | uint16_t code, rep_idx, rep_cnt, res = HV_STATUS_SUCCESS, rep_done = 0; | 4254 | uint16_t code, rep_idx, rep_cnt, res = HV_STATUS_SUCCESS, rep_done = 0; |
4373 | bool fast, longmode; | 4255 | bool fast, longmode; |
4374 | int cs_db, cs_l; | 4256 | int cs_db, cs_l; |
4375 | 4257 | ||
4376 | /* | 4258 | /* |
4377 | * a hypercall generates #UD from non-zero CPL or real mode | 4259 | * a hypercall generates #UD from non-zero CPL or real mode |
4378 | * per the Hyper-V spec | 4260 | * per the Hyper-V spec |
4379 | */ | 4261 | */ |
4380 | if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) { | 4262 | if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) { |
4381 | kvm_queue_exception(vcpu, UD_VECTOR); | 4263 | kvm_queue_exception(vcpu, UD_VECTOR); |
4382 | return 0; | 4264 | return 0; |
4383 | } | 4265 | } |
4384 | 4266 | ||
4385 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); | 4267 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); |
4386 | longmode = is_long_mode(vcpu) && cs_l == 1; | 4268 | longmode = is_long_mode(vcpu) && cs_l == 1; |
4387 | 4269 | ||
4388 | if (!longmode) { | 4270 | if (!longmode) { |
4389 | param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) | | 4271 | param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) | |
4390 | (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff); | 4272 | (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff); |
4391 | ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) | | 4273 | ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) | |
4392 | (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff); | 4274 | (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff); |
4393 | outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) | | 4275 | outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) | |
4394 | (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff); | 4276 | (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff); |
4395 | } | 4277 | } |
4396 | #ifdef CONFIG_X86_64 | 4278 | #ifdef CONFIG_X86_64 |
4397 | else { | 4279 | else { |
4398 | param = kvm_register_read(vcpu, VCPU_REGS_RCX); | 4280 | param = kvm_register_read(vcpu, VCPU_REGS_RCX); |
4399 | ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX); | 4281 | ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX); |
4400 | outgpa = kvm_register_read(vcpu, VCPU_REGS_R8); | 4282 | outgpa = kvm_register_read(vcpu, VCPU_REGS_R8); |
4401 | } | 4283 | } |
4402 | #endif | 4284 | #endif |
4403 | 4285 | ||
4404 | code = param & 0xffff; | 4286 | code = param & 0xffff; |
4405 | fast = (param >> 16) & 0x1; | 4287 | fast = (param >> 16) & 0x1; |
4406 | rep_cnt = (param >> 32) & 0xfff; | 4288 | rep_cnt = (param >> 32) & 0xfff; |
4407 | rep_idx = (param >> 48) & 0xfff; | 4289 | rep_idx = (param >> 48) & 0xfff; |
4408 | 4290 | ||
4409 | trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa); | 4291 | trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa); |
4410 | 4292 | ||
4411 | switch (code) { | 4293 | switch (code) { |
4412 | case HV_X64_HV_NOTIFY_LONG_SPIN_WAIT: | 4294 | case HV_X64_HV_NOTIFY_LONG_SPIN_WAIT: |
4413 | kvm_vcpu_on_spin(vcpu); | 4295 | kvm_vcpu_on_spin(vcpu); |
4414 | break; | 4296 | break; |
4415 | default: | 4297 | default: |
4416 | res = HV_STATUS_INVALID_HYPERCALL_CODE; | 4298 | res = HV_STATUS_INVALID_HYPERCALL_CODE; |
4417 | break; | 4299 | break; |
4418 | } | 4300 | } |
4419 | 4301 | ||
4420 | ret = res | (((u64)rep_done & 0xfff) << 32); | 4302 | ret = res | (((u64)rep_done & 0xfff) << 32); |
4421 | if (longmode) { | 4303 | if (longmode) { |
4422 | kvm_register_write(vcpu, VCPU_REGS_RAX, ret); | 4304 | kvm_register_write(vcpu, VCPU_REGS_RAX, ret); |
4423 | } else { | 4305 | } else { |
4424 | kvm_register_write(vcpu, VCPU_REGS_RDX, ret >> 32); | 4306 | kvm_register_write(vcpu, VCPU_REGS_RDX, ret >> 32); |
4425 | kvm_register_write(vcpu, VCPU_REGS_RAX, ret & 0xffffffff); | 4307 | kvm_register_write(vcpu, VCPU_REGS_RAX, ret & 0xffffffff); |
4426 | } | 4308 | } |
4427 | 4309 | ||
4428 | return 1; | 4310 | return 1; |
4429 | } | 4311 | } |
4430 | 4312 | ||
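The Hyper-V hypercall input value packs the call code, the fast-call flag, and the rep counts into a single 64-bit register, which the shifts above take apart; the result value is re-packed the same way. A stand-alone decoder over a made-up input value:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* hypothetical input: code 0x8, fast bit set, rep_cnt 3, rep_idx 1 */
        uint64_t param = 0x8ull | (1ull << 16) | (3ull << 32) | (1ull << 48);

        uint16_t code    = param & 0xffff;
        int      fast    = (param >> 16) & 0x1;
        uint16_t rep_cnt = (param >> 32) & 0xfff;
        uint16_t rep_idx = (param >> 48) & 0xfff;

        printf("code=%#x fast=%d rep_cnt=%u rep_idx=%u\n",
               code, fast, rep_cnt, rep_idx);

        /* the result mirrors the packing: status in the low word,
         * completed rep count at bit 32 */
        uint16_t res = 0, rep_done = 3;      /* HV_STATUS_SUCCESS */
        uint64_t ret = res | (((uint64_t)rep_done & 0xfff) << 32);
        printf("ret=%#llx\n", (unsigned long long)ret);
        return 0;
    }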
4431 | int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) | 4313 | int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) |
4432 | { | 4314 | { |
4433 | unsigned long nr, a0, a1, a2, a3, ret; | 4315 | unsigned long nr, a0, a1, a2, a3, ret; |
4434 | int r = 1; | 4316 | int r = 1; |
4435 | 4317 | ||
4436 | if (kvm_hv_hypercall_enabled(vcpu->kvm)) | 4318 | if (kvm_hv_hypercall_enabled(vcpu->kvm)) |
4437 | return kvm_hv_hypercall(vcpu); | 4319 | return kvm_hv_hypercall(vcpu); |
4438 | 4320 | ||
4439 | nr = kvm_register_read(vcpu, VCPU_REGS_RAX); | 4321 | nr = kvm_register_read(vcpu, VCPU_REGS_RAX); |
4440 | a0 = kvm_register_read(vcpu, VCPU_REGS_RBX); | 4322 | a0 = kvm_register_read(vcpu, VCPU_REGS_RBX); |
4441 | a1 = kvm_register_read(vcpu, VCPU_REGS_RCX); | 4323 | a1 = kvm_register_read(vcpu, VCPU_REGS_RCX); |
4442 | a2 = kvm_register_read(vcpu, VCPU_REGS_RDX); | 4324 | a2 = kvm_register_read(vcpu, VCPU_REGS_RDX); |
4443 | a3 = kvm_register_read(vcpu, VCPU_REGS_RSI); | 4325 | a3 = kvm_register_read(vcpu, VCPU_REGS_RSI); |
4444 | 4326 | ||
4445 | trace_kvm_hypercall(nr, a0, a1, a2, a3); | 4327 | trace_kvm_hypercall(nr, a0, a1, a2, a3); |
4446 | 4328 | ||
4447 | if (!is_long_mode(vcpu)) { | 4329 | if (!is_long_mode(vcpu)) { |
4448 | nr &= 0xFFFFFFFF; | 4330 | nr &= 0xFFFFFFFF; |
4449 | a0 &= 0xFFFFFFFF; | 4331 | a0 &= 0xFFFFFFFF; |
4450 | a1 &= 0xFFFFFFFF; | 4332 | a1 &= 0xFFFFFFFF; |
4451 | a2 &= 0xFFFFFFFF; | 4333 | a2 &= 0xFFFFFFFF; |
4452 | a3 &= 0xFFFFFFFF; | 4334 | a3 &= 0xFFFFFFFF; |
4453 | } | 4335 | } |
4454 | 4336 | ||
4455 | if (kvm_x86_ops->get_cpl(vcpu) != 0) { | 4337 | if (kvm_x86_ops->get_cpl(vcpu) != 0) { |
4456 | ret = -KVM_EPERM; | 4338 | ret = -KVM_EPERM; |
4457 | goto out; | 4339 | goto out; |
4458 | } | 4340 | } |
4459 | 4341 | ||
4460 | switch (nr) { | 4342 | switch (nr) { |
4461 | case KVM_HC_VAPIC_POLL_IRQ: | 4343 | case KVM_HC_VAPIC_POLL_IRQ: |
4462 | ret = 0; | 4344 | ret = 0; |
4463 | break; | 4345 | break; |
4464 | case KVM_HC_MMU_OP: | 4346 | case KVM_HC_MMU_OP: |
4465 | r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret); | 4347 | r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret); |
4466 | break; | 4348 | break; |
4467 | default: | 4349 | default: |
4468 | ret = -KVM_ENOSYS; | 4350 | ret = -KVM_ENOSYS; |
4469 | break; | 4351 | break; |
4470 | } | 4352 | } |
4471 | out: | 4353 | out: |
4472 | kvm_register_write(vcpu, VCPU_REGS_RAX, ret); | 4354 | kvm_register_write(vcpu, VCPU_REGS_RAX, ret); |
4473 | ++vcpu->stat.hypercalls; | 4355 | ++vcpu->stat.hypercalls; |
4474 | return r; | 4356 | return r; |
4475 | } | 4357 | } |
4476 | EXPORT_SYMBOL_GPL(kvm_emulate_hypercall); | 4358 | EXPORT_SYMBOL_GPL(kvm_emulate_hypercall); |
4477 | 4359 | ||
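From the guest's side, the ABI handled above means loading the hypercall number into RAX and the arguments into RBX/RCX/RDX/RSI before executing VMCALL (or VMMCALL on AMD); the return value comes back in RAX. A hedged guest-side sketch of a two-argument call -- outside a guest the instruction raises #UD, and at CPL > 0 KVM returns -KVM_EPERM, as the code above shows:

    /* Guest-side sketch of the KVM hypercall ABI: nr in RAX, arguments
     * in RBX/RCX/RDX/RSI, return value in RAX.  Intel encoding shown;
     * an AMD guest would use "vmmcall". */
    static inline long kvm_hypercall2(unsigned int nr,
                                      unsigned long p1, unsigned long p2)
    {
        long ret;
        asm volatile("vmcall"
                     : "=a"(ret)
                     : "a"(nr), "b"(p1), "c"(p2)
                     : "memory");
        return ret;
    }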
4478 | int kvm_fix_hypercall(struct kvm_vcpu *vcpu) | 4360 | int kvm_fix_hypercall(struct kvm_vcpu *vcpu) |
4479 | { | 4361 | { |
4480 | char instruction[3]; | 4362 | char instruction[3]; |
4481 | unsigned long rip = kvm_rip_read(vcpu); | 4363 | unsigned long rip = kvm_rip_read(vcpu); |
4482 | 4364 | ||
4483 | /* | 4365 | /* |
4484 | * Blow out the MMU so that no other VCPU has an active mapping, | 4366 | * Blow out the MMU so that no other VCPU has an active mapping, |
4485 | * ensuring that the updated hypercall appears atomically across | 4367 | * ensuring that the updated hypercall appears atomically across |
4486 | * all VCPUs. | 4368 | * all VCPUs. |
4487 | */ | 4369 | */ |
4488 | kvm_mmu_zap_all(vcpu->kvm); | 4370 | kvm_mmu_zap_all(vcpu->kvm); |
4489 | 4371 | ||
4490 | kvm_x86_ops->patch_hypercall(vcpu, instruction); | 4372 | kvm_x86_ops->patch_hypercall(vcpu, instruction); |
4491 | 4373 | ||
4492 | return emulator_write_emulated(rip, instruction, 3, NULL, vcpu); | 4374 | return emulator_write_emulated(rip, instruction, 3, NULL, vcpu); |
4493 | } | 4375 | } |
4494 | 4376 | ||
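patch_hypercall() rewrites the hypercall opcode to match the host vendor, which is why exactly 3 bytes are written back. For reference, the two encodings (printed from a user program rather than patched in place):

    #include <stdio.h>

    int main(void)
    {
        /* VMCALL (Intel/VMX) and VMMCALL (AMD/SVM) are both 3 bytes
         * long, hence the 3-byte write in kvm_fix_hypercall(). */
        unsigned char vmcall[3]  = { 0x0f, 0x01, 0xc1 };
        unsigned char vmmcall[3] = { 0x0f, 0x01, 0xd9 };

        printf("vmcall:  %02x %02x %02x\n", vmcall[0], vmcall[1], vmcall[2]);
        printf("vmmcall: %02x %02x %02x\n", vmmcall[0], vmmcall[1], vmmcall[2]);
        return 0;
    }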
4495 | void realmode_lgdt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base) | 4377 | void realmode_lgdt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base) |
4496 | { | 4378 | { |
4497 | struct desc_ptr dt = { limit, base }; | 4379 | struct desc_ptr dt = { limit, base }; |
4498 | 4380 | ||
4499 | kvm_x86_ops->set_gdt(vcpu, &dt); | 4381 | kvm_x86_ops->set_gdt(vcpu, &dt); |
4500 | } | 4382 | } |
4501 | 4383 | ||
4502 | void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base) | 4384 | void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base) |
4503 | { | 4385 | { |
4504 | struct desc_ptr dt = { limit, base }; | 4386 | struct desc_ptr dt = { limit, base }; |
4505 | 4387 | ||
4506 | kvm_x86_ops->set_idt(vcpu, &dt); | 4388 | kvm_x86_ops->set_idt(vcpu, &dt); |
4507 | } | 4389 | } |
4508 | 4390 | ||
4509 | static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i) | 4391 | static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i) |
4510 | { | 4392 | { |
4511 | struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i]; | 4393 | struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i]; |
4512 | int j, nent = vcpu->arch.cpuid_nent; | 4394 | int j, nent = vcpu->arch.cpuid_nent; |
4513 | 4395 | ||
4514 | e->flags &= ~KVM_CPUID_FLAG_STATE_READ_NEXT; | 4396 | e->flags &= ~KVM_CPUID_FLAG_STATE_READ_NEXT; |
4515 | /* when no next entry is found, the current entry[i] is reselected */ | 4397 | /* when no next entry is found, the current entry[i] is reselected */ |
4516 | for (j = i + 1; ; j = (j + 1) % nent) { | 4398 | for (j = i + 1; ; j = (j + 1) % nent) { |
4517 | struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j]; | 4399 | struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j]; |
4518 | if (ej->function == e->function) { | 4400 | if (ej->function == e->function) { |
4519 | ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; | 4401 | ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; |
4520 | return j; | 4402 | return j; |
4521 | } | 4403 | } |
4522 | } | 4404 | } |
4523 | return 0; /* silence gcc, even though control never reaches here */ | 4405 | return 0; /* silence gcc, even though control never reaches here */ |
4524 | } | 4406 | } |
4525 | 4407 | ||
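The loop above walks forward from entry i, wrapping modulo nent, and stops at the next entry with the same function number; if none exists, it comes back around to entry i itself. A simplified stand-alone model of the walk (it starts at (i + 1) % nent so the first probe can never run past the table; the table contents are hypothetical):

    #include <stdio.h>

    /* Round-robin successor search over CPUID function numbers, as in
     * move_to_next_stateful_cpuid_entry(). */
    static int next_same_function(const unsigned *func, int nent, int i)
    {
        int j;
        for (j = (i + 1) % nent; ; j = (j + 1) % nent)
            if (func[j] == func[i])
                return j;
    }

    int main(void)
    {
        unsigned func[] = { 0x2, 0x4, 0x2, 0xb };
        printf("after 0 -> %d\n", next_same_function(func, 4, 0)); /* 2 */
        printf("after 2 -> %d\n", next_same_function(func, 4, 2)); /* 0 (wraps) */
        printf("after 1 -> %d\n", next_same_function(func, 4, 1)); /* 1 (reselected) */
        return 0;
    }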
4526 | /* find an entry with matching function, matching index (if needed), and that | 4408 | /* find an entry with matching function, matching index (if needed), and that |
4527 | * should be read next (if it's stateful) */ | 4409 | * should be read next (if it's stateful) */ |
4528 | static int is_matching_cpuid_entry(struct kvm_cpuid_entry2 *e, | 4410 | static int is_matching_cpuid_entry(struct kvm_cpuid_entry2 *e, |
4529 | u32 function, u32 index) | 4411 | u32 function, u32 index) |
4530 | { | 4412 | { |
4531 | if (e->function != function) | 4413 | if (e->function != function) |
4532 | return 0; | 4414 | return 0; |
4533 | if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index) | 4415 | if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index) |
4534 | return 0; | 4416 | return 0; |
4535 | if ((e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) && | 4417 | if ((e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) && |
4536 | !(e->flags & KVM_CPUID_FLAG_STATE_READ_NEXT)) | 4418 | !(e->flags & KVM_CPUID_FLAG_STATE_READ_NEXT)) |
4537 | return 0; | 4419 | return 0; |
4538 | return 1; | 4420 | return 1; |
4539 | } | 4421 | } |
4540 | 4422 | ||
4541 | struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, | 4423 | struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, |
4542 | u32 function, u32 index) | 4424 | u32 function, u32 index) |
4543 | { | 4425 | { |
4544 | int i; | 4426 | int i; |
4545 | struct kvm_cpuid_entry2 *best = NULL; | 4427 | struct kvm_cpuid_entry2 *best = NULL; |
4546 | 4428 | ||
4547 | for (i = 0; i < vcpu->arch.cpuid_nent; ++i) { | 4429 | for (i = 0; i < vcpu->arch.cpuid_nent; ++i) { |
4548 | struct kvm_cpuid_entry2 *e; | 4430 | struct kvm_cpuid_entry2 *e; |
4549 | 4431 | ||
4550 | e = &vcpu->arch.cpuid_entries[i]; | 4432 | e = &vcpu->arch.cpuid_entries[i]; |
4551 | if (is_matching_cpuid_entry(e, function, index)) { | 4433 | if (is_matching_cpuid_entry(e, function, index)) { |
4552 | if (e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) | 4434 | if (e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) |
4553 | move_to_next_stateful_cpuid_entry(vcpu, i); | 4435 | move_to_next_stateful_cpuid_entry(vcpu, i); |
4554 | best = e; | 4436 | best = e; |
4555 | break; | 4437 | break; |
4556 | } | 4438 | } |
4557 | /* | 4439 | /* |
4558 | * Both basic or both extended? | 4440 | * Both basic or both extended? |
4559 | */ | 4441 | */ |
4560 | if (((e->function ^ function) & 0x80000000) == 0) | 4442 | if (((e->function ^ function) & 0x80000000) == 0) |
4561 | if (!best || e->function > best->function) | 4443 | if (!best || e->function > best->function) |
4562 | best = e; | 4444 | best = e; |
4563 | } | 4445 | } |
4564 | return best; | 4446 | return best; |
4565 | } | 4447 | } |
4566 | EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry); | 4448 | EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry); |
4567 | 4449 | ||
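The fallback case keys on ((e->function ^ function) & 0x80000000) == 0, i.e. both leaves belong to the same range (basic below 0x80000000, extended at or above it), because out-of-range CPUID queries are answered with the highest leaf of the same range. The test in isolation:

    #include <stdio.h>

    /* True when two CPUID leaves are both basic or both extended,
     * i.e. their top bits agree. */
    static int same_cpuid_range(unsigned a, unsigned b)
    {
        return ((a ^ b) & 0x80000000u) == 0;
    }

    int main(void)
    {
        printf("%d\n", same_cpuid_range(0x1, 0x7));               /* 1 */
        printf("%d\n", same_cpuid_range(0x80000001, 0x80000008)); /* 1 */
        printf("%d\n", same_cpuid_range(0x1, 0x80000001));        /* 0 */
        return 0;
    }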
4568 | int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) | 4450 | int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) |
4569 | { | 4451 | { |
4570 | struct kvm_cpuid_entry2 *best; | 4452 | struct kvm_cpuid_entry2 *best; |
4571 | 4453 | ||
4572 | best = kvm_find_cpuid_entry(vcpu, 0x80000000, 0); | 4454 | best = kvm_find_cpuid_entry(vcpu, 0x80000000, 0); |
4573 | if (!best || best->eax < 0x80000008) | 4455 | if (!best || best->eax < 0x80000008) |
4574 | goto not_found; | 4456 | goto not_found; |
4575 | best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); | 4457 | best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); |
4576 | if (best) | 4458 | if (best) |
4577 | return best->eax & 0xff; | 4459 | return best->eax & 0xff; |
4578 | not_found: | 4460 | not_found: |
4579 | return 36; | 4461 | return 36; |
4580 | } | 4462 | } |
4581 | 4463 | ||
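CPUID leaf 0x80000008 reports the physical address width in EAX[7:0], and 36 bits is the safe default when the leaf is absent. The same decode can be tried on the host with GCC's cpuid.h (x86 with GCC or Clang assumed):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned eax, ebx, ecx, edx;
        unsigned maxphyaddr = 36;   /* fallback, as in cpuid_maxphyaddr() */

        /* Leaf 0x80000000.EAX is the highest extended leaf; only then
         * is leaf 0x80000008.EAX[7:0] (physical address bits) valid. */
        if (__get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx) &&
            eax >= 0x80000008 &&
            __get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
            maxphyaddr = eax & 0xff;

        printf("MAXPHYADDR = %u bits\n", maxphyaddr);
        return 0;
    }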
4582 | void kvm_emulate_cpuid(struct kvm_vcpu *vcpu) | 4464 | void kvm_emulate_cpuid(struct kvm_vcpu *vcpu) |
4583 | { | 4465 | { |
4584 | u32 function, index; | 4466 | u32 function, index; |
4585 | struct kvm_cpuid_entry2 *best; | 4467 | struct kvm_cpuid_entry2 *best; |
4586 | 4468 | ||
4587 | function = kvm_register_read(vcpu, VCPU_REGS_RAX); | 4469 | function = kvm_register_read(vcpu, VCPU_REGS_RAX); |
4588 | index = kvm_register_read(vcpu, VCPU_REGS_RCX); | 4470 | index = kvm_register_read(vcpu, VCPU_REGS_RCX); |
4589 | kvm_register_write(vcpu, VCPU_REGS_RAX, 0); | 4471 | kvm_register_write(vcpu, VCPU_REGS_RAX, 0); |
4590 | kvm_register_write(vcpu, VCPU_REGS_RBX, 0); | 4472 | kvm_register_write(vcpu, VCPU_REGS_RBX, 0); |
4591 | kvm_register_write(vcpu, VCPU_REGS_RCX, 0); | 4473 | kvm_register_write(vcpu, VCPU_REGS_RCX, 0); |
4592 | kvm_register_write(vcpu, VCPU_REGS_RDX, 0); | 4474 | kvm_register_write(vcpu, VCPU_REGS_RDX, 0); |
4593 | best = kvm_find_cpuid_entry(vcpu, function, index); | 4475 | best = kvm_find_cpuid_entry(vcpu, function, index); |
4594 | if (best) { | 4476 | if (best) { |
4595 | kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax); | 4477 | kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax); |
4596 | kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx); | 4478 | kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx); |
4597 | kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx); | 4479 | kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx); |
4598 | kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx); | 4480 | kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx); |
4599 | } | 4481 | } |
4600 | kvm_x86_ops->skip_emulated_instruction(vcpu); | 4482 | kvm_x86_ops->skip_emulated_instruction(vcpu); |
4601 | trace_kvm_cpuid(function, | 4483 | trace_kvm_cpuid(function, |
4602 | kvm_register_read(vcpu, VCPU_REGS_RAX), | 4484 | kvm_register_read(vcpu, VCPU_REGS_RAX), |
4603 | kvm_register_read(vcpu, VCPU_REGS_RBX), | 4485 | kvm_register_read(vcpu, VCPU_REGS_RBX), |
4604 | kvm_register_read(vcpu, VCPU_REGS_RCX), | 4486 | kvm_register_read(vcpu, VCPU_REGS_RCX), |
4605 | kvm_register_read(vcpu, VCPU_REGS_RDX)); | 4487 | kvm_register_read(vcpu, VCPU_REGS_RDX)); |
4606 | } | 4488 | } |
4607 | EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); | 4489 | EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); |
4608 | 4490 | ||
4609 | /* | 4491 | /* |
4610 | * Check if userspace requested an interrupt window, and that the | 4492 | * Check if userspace requested an interrupt window, and that the |
4611 | * interrupt window is open. | 4493 | * interrupt window is open. |
4612 | * | 4494 | * |
4613 | * No need to exit to userspace if we already have an interrupt queued. | 4495 | * No need to exit to userspace if we already have an interrupt queued. |
4614 | */ | 4496 | */ |
4615 | static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu) | 4497 | static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu) |
4616 | { | 4498 | { |
4617 | return (!irqchip_in_kernel(vcpu->kvm) && !kvm_cpu_has_interrupt(vcpu) && | 4499 | return (!irqchip_in_kernel(vcpu->kvm) && !kvm_cpu_has_interrupt(vcpu) && |
4618 | vcpu->run->request_interrupt_window && | 4500 | vcpu->run->request_interrupt_window && |
4619 | kvm_arch_interrupt_allowed(vcpu)); | 4501 | kvm_arch_interrupt_allowed(vcpu)); |
4620 | } | 4502 | } |
4621 | 4503 | ||
4622 | static void post_kvm_run_save(struct kvm_vcpu *vcpu) | 4504 | static void post_kvm_run_save(struct kvm_vcpu *vcpu) |
4623 | { | 4505 | { |
4624 | struct kvm_run *kvm_run = vcpu->run; | 4506 | struct kvm_run *kvm_run = vcpu->run; |
4625 | 4507 | ||
4626 | kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0; | 4508 | kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0; |
4627 | kvm_run->cr8 = kvm_get_cr8(vcpu); | 4509 | kvm_run->cr8 = kvm_get_cr8(vcpu); |
4628 | kvm_run->apic_base = kvm_get_apic_base(vcpu); | 4510 | kvm_run->apic_base = kvm_get_apic_base(vcpu); |
4629 | if (irqchip_in_kernel(vcpu->kvm)) | 4511 | if (irqchip_in_kernel(vcpu->kvm)) |
4630 | kvm_run->ready_for_interrupt_injection = 1; | 4512 | kvm_run->ready_for_interrupt_injection = 1; |
4631 | else | 4513 | else |
4632 | kvm_run->ready_for_interrupt_injection = | 4514 | kvm_run->ready_for_interrupt_injection = |
4633 | kvm_arch_interrupt_allowed(vcpu) && | 4515 | kvm_arch_interrupt_allowed(vcpu) && |
4634 | !kvm_cpu_has_interrupt(vcpu) && | 4516 | !kvm_cpu_has_interrupt(vcpu) && |
4635 | !kvm_event_needs_reinjection(vcpu); | 4517 | !kvm_event_needs_reinjection(vcpu); |
4636 | } | 4518 | } |
4637 | 4519 | ||
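post_kvm_run_save() fills in the kvm_run fields userspace inspects after KVM_RUN returns. With a userspace irqchip the usual pattern is to request an interrupt-window exit while an interrupt is pending and to inject once ready_for_interrupt_injection reads true. A hedged sketch of that loop body -- vcpu_fd, the mmap'ed run structure, the vector number, and pending_irq() are all assumptions of the example:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* One iteration of a userspace-irqchip run loop: ask for an
     * interrupt-window exit while an IRQ is pending, and inject via
     * KVM_INTERRUPT once the window is open. */
    static void run_once(int vcpu_fd, struct kvm_run *run,
                         int (*pending_irq)(void))
    {
        run->request_interrupt_window = pending_irq() ? 1 : 0;

        ioctl(vcpu_fd, KVM_RUN, 0);

        if (run->ready_for_interrupt_injection && pending_irq()) {
            struct kvm_interrupt irq = { .irq = 32 };  /* example vector */
            ioctl(vcpu_fd, KVM_INTERRUPT, &irq);
        }
    }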
4638 | static void vapic_enter(struct kvm_vcpu *vcpu) | 4520 | static void vapic_enter(struct kvm_vcpu *vcpu) |
4639 | { | 4521 | { |
4640 | struct kvm_lapic *apic = vcpu->arch.apic; | 4522 | struct kvm_lapic *apic = vcpu->arch.apic; |
4641 | struct page *page; | 4523 | struct page *page; |
4642 | 4524 | ||
4643 | if (!apic || !apic->vapic_addr) | 4525 | if (!apic || !apic->vapic_addr) |
4644 | return; | 4526 | return; |
4645 | 4527 | ||
4646 | page = gfn_to_page(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT); | 4528 | page = gfn_to_page(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT); |
4647 | 4529 | ||
4648 | vcpu->arch.apic->vapic_page = page; | 4530 | vcpu->arch.apic->vapic_page = page; |
4649 | } | 4531 | } |
4650 | 4532 | ||
4651 | static void vapic_exit(struct kvm_vcpu *vcpu) | 4533 | static void vapic_exit(struct kvm_vcpu *vcpu) |
4652 | { | 4534 | { |
4653 | struct kvm_lapic *apic = vcpu->arch.apic; | 4535 | struct kvm_lapic *apic = vcpu->arch.apic; |
4654 | int idx; | 4536 | int idx; |
4655 | 4537 | ||
4656 | if (!apic || !apic->vapic_addr) | 4538 | if (!apic || !apic->vapic_addr) |
4657 | return; | 4539 | return; |
4658 | 4540 | ||
4659 | idx = srcu_read_lock(&vcpu->kvm->srcu); | 4541 | idx = srcu_read_lock(&vcpu->kvm->srcu); |
4660 | kvm_release_page_dirty(apic->vapic_page); | 4542 | kvm_release_page_dirty(apic->vapic_page); |
4661 | mark_page_dirty(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT); | 4543 | mark_page_dirty(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT); |
4662 | srcu_read_unlock(&vcpu->kvm->srcu, idx); | 4544 | srcu_read_unlock(&vcpu->kvm->srcu, idx); |
4663 | } | 4545 | } |
4664 | 4546 | ||
4665 | static void update_cr8_intercept(struct kvm_vcpu *vcpu) | 4547 | static void update_cr8_intercept(struct kvm_vcpu *vcpu) |
4666 | { | 4548 | { |
4667 | int max_irr, tpr; | 4549 | int max_irr, tpr; |
4668 | 4550 | ||
4669 | if (!kvm_x86_ops->update_cr8_intercept) | 4551 | if (!kvm_x86_ops->update_cr8_intercept) |
4670 | return; | 4552 | return; |
4671 | 4553 | ||
4672 | if (!vcpu->arch.apic) | 4554 | if (!vcpu->arch.apic) |
4673 | return; | 4555 | return; |
4674 | 4556 | ||
4675 | if (!vcpu->arch.apic->vapic_addr) | 4557 | if (!vcpu->arch.apic->vapic_addr) |
4676 | max_irr = kvm_lapic_find_highest_irr(vcpu); | 4558 | max_irr = kvm_lapic_find_highest_irr(vcpu); |
4677 | else | 4559 | else |
4678 | max_irr = -1; | 4560 | max_irr = -1; |
4679 | 4561 | ||
4680 | if (max_irr != -1) | 4562 | if (max_irr != -1) |
4681 | max_irr >>= 4; | 4563 | max_irr >>= 4; |
4682 | 4564 | ||
4683 | tpr = kvm_lapic_get_cr8(vcpu); | 4565 | tpr = kvm_lapic_get_cr8(vcpu); |
4684 | 4566 | ||
4685 | kvm_x86_ops->update_cr8_intercept(vcpu, tpr, max_irr); | 4567 | kvm_x86_ops->update_cr8_intercept(vcpu, tpr, max_irr); |
4686 | } | 4568 | } |
4687 | 4569 | ||
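Both the TPR and the highest pending IRR vector are compared by 16-vector priority class, which is why max_irr is shifted right by 4: a pending interrupt is deliverable only while the TPR sits below its class, and otherwise CR8 writes must stay intercepted so KVM notices when the guest lowers the TPR. The comparison in isolation (values are made up):

    #include <stdio.h>

    int main(void)
    {
        int max_irr = 0x51;   /* hypothetical highest pending vector */
        int tpr     = 0x4;    /* hypothetical CR8/TPR priority class */

        int pending_class = max_irr >> 4;   /* 16 vectors per class */

        printf("pending class %d, tpr %d -> %s\n", pending_class, tpr,
               tpr < pending_class ? "deliverable" : "blocked");
        return 0;
    }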
4688 | static void inject_pending_event(struct kvm_vcpu *vcpu) | 4570 | static void inject_pending_event(struct kvm_vcpu *vcpu) |
4689 | { | 4571 | { |
4690 | /* try to reinject previous events if any */ | 4572 | /* try to reinject previous events if any */ |
4691 | if (vcpu->arch.exception.pending) { | 4573 | if (vcpu->arch.exception.pending) { |
4692 | trace_kvm_inj_exception(vcpu->arch.exception.nr, | 4574 | trace_kvm_inj_exception(vcpu->arch.exception.nr, |
4693 | vcpu->arch.exception.has_error_code, | 4575 | vcpu->arch.exception.has_error_code, |
4694 | vcpu->arch.exception.error_code); | 4576 | vcpu->arch.exception.error_code); |
4695 | kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr, | 4577 | kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr, |
4696 | vcpu->arch.exception.has_error_code, | 4578 | vcpu->arch.exception.has_error_code, |
4697 | vcpu->arch.exception.error_code, | 4579 | vcpu->arch.exception.error_code, |
4698 | vcpu->arch.exception.reinject); | 4580 | vcpu->arch.exception.reinject); |
4699 | return; | 4581 | return; |
4700 | } | 4582 | } |
4701 | 4583 | ||
4702 | if (vcpu->arch.nmi_injected) { | 4584 | if (vcpu->arch.nmi_injected) { |
4703 | kvm_x86_ops->set_nmi(vcpu); | 4585 | kvm_x86_ops->set_nmi(vcpu); |
4704 | return; | 4586 | return; |
4705 | } | 4587 | } |
4706 | 4588 | ||
4707 | if (vcpu->arch.interrupt.pending) { | 4589 | if (vcpu->arch.interrupt.pending) { |
4708 | kvm_x86_ops->set_irq(vcpu); | 4590 | kvm_x86_ops->set_irq(vcpu); |
4709 | return; | 4591 | return; |
4710 | } | 4592 | } |
4711 | 4593 | ||
4712 | /* try to inject new event if pending */ | 4594 | /* try to inject new event if pending */ |
4713 | if (vcpu->arch.nmi_pending) { | 4595 | if (vcpu->arch.nmi_pending) { |
4714 | if (kvm_x86_ops->nmi_allowed(vcpu)) { | 4596 | if (kvm_x86_ops->nmi_allowed(vcpu)) { |
4715 | vcpu->arch.nmi_pending = false; | 4597 | vcpu->arch.nmi_pending = false; |
4716 | vcpu->arch.nmi_injected = true; | 4598 | vcpu->arch.nmi_injected = true; |
4717 | kvm_x86_ops->set_nmi(vcpu); | 4599 | kvm_x86_ops->set_nmi(vcpu); |
4718 | } | 4600 | } |
4719 | } else if (kvm_cpu_has_interrupt(vcpu)) { | 4601 | } else if (kvm_cpu_has_interrupt(vcpu)) { |
4720 | if (kvm_x86_ops->interrupt_allowed(vcpu)) { | 4602 | if (kvm_x86_ops->interrupt_allowed(vcpu)) { |
4721 | kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), | 4603 | kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), |
4722 | false); | 4604 | false); |
4723 | kvm_x86_ops->set_irq(vcpu); | 4605 | kvm_x86_ops->set_irq(vcpu); |
4724 | } | 4606 | } |
4725 | } | 4607 | } |
4726 | } | 4608 | } |
4727 | 4609 | ||
4728 | static void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu) | 4610 | static void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu) |
4729 | { | 4611 | { |
4730 | if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) && | 4612 | if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) && |
4731 | !vcpu->guest_xcr0_loaded) { | 4613 | !vcpu->guest_xcr0_loaded) { |
4732 | /* kvm_set_xcr() also depends on this */ | 4614 | /* kvm_set_xcr() also depends on this */ |
4733 | xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); | 4615 | xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); |
4734 | vcpu->guest_xcr0_loaded = 1; | 4616 | vcpu->guest_xcr0_loaded = 1; |
4735 | } | 4617 | } |
4736 | } | 4618 | } |
4737 | 4619 | ||
4738 | static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu) | 4620 | static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu) |
4739 | { | 4621 | { |
4740 | if (vcpu->guest_xcr0_loaded) { | 4622 | if (vcpu->guest_xcr0_loaded) { |
4741 | if (vcpu->arch.xcr0 != host_xcr0) | 4623 | if (vcpu->arch.xcr0 != host_xcr0) |
4742 | xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0); | 4624 | xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0); |
4743 | vcpu->guest_xcr0_loaded = 0; | 4625 | vcpu->guest_xcr0_loaded = 0; |
4744 | } | 4626 | } |
4745 | } | 4627 | } |
4746 | 4628 | ||
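XCR0 is swapped with xsetbv around guest execution, and only when the guest and host values differ. The current host value can be peeked at with the _xgetbv() compiler intrinsic (x86 with GCC/Clang and -mxsave assumed, OSXSAVE enabled):

    #include <immintrin.h>
    #include <stdio.h>

    int main(void)
    {
        /* XCR0 bit 0 = x87, bit 1 = SSE, bit 2 = AVX; this is the
         * register the code above saves and restores. */
        unsigned long long xcr0 = _xgetbv(0);  /* XCR_XFEATURE_ENABLED_MASK */

        printf("XCR0 = %#llx (x87=%llu sse=%llu avx=%llu)\n", xcr0,
               xcr0 & 1, (xcr0 >> 1) & 1, (xcr0 >> 2) & 1);
        return 0;
    }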
4747 | static int vcpu_enter_guest(struct kvm_vcpu *vcpu) | 4629 | static int vcpu_enter_guest(struct kvm_vcpu *vcpu) |
4748 | { | 4630 | { |
4749 | int r; | 4631 | int r; |
4750 | bool req_int_win = !irqchip_in_kernel(vcpu->kvm) && | 4632 | bool req_int_win = !irqchip_in_kernel(vcpu->kvm) && |
4751 | vcpu->run->request_interrupt_window; | 4633 | vcpu->run->request_interrupt_window; |
4752 | 4634 | ||
4753 | if (vcpu->requests) | 4635 | if (vcpu->requests) |
4754 | if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) | 4636 | if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) |
4755 | kvm_mmu_unload(vcpu); | 4637 | kvm_mmu_unload(vcpu); |
4756 | 4638 | ||
4757 | r = kvm_mmu_reload(vcpu); | 4639 | r = kvm_mmu_reload(vcpu); |
4758 | if (unlikely(r)) | 4640 | if (unlikely(r)) |
4759 | goto out; | 4641 | goto out; |
4760 | 4642 | ||
4761 | if (vcpu->requests) { | 4643 | if (vcpu->requests) { |
4762 | if (test_and_clear_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests)) | 4644 | if (test_and_clear_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests)) |
4763 | __kvm_migrate_timers(vcpu); | 4645 | __kvm_migrate_timers(vcpu); |
4764 | if (test_and_clear_bit(KVM_REQ_KVMCLOCK_UPDATE, &vcpu->requests)) | 4646 | if (test_and_clear_bit(KVM_REQ_KVMCLOCK_UPDATE, &vcpu->requests)) |
4765 | kvm_write_guest_time(vcpu); | 4647 | kvm_write_guest_time(vcpu); |
4766 | if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests)) | 4648 | if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests)) |
4767 | kvm_mmu_sync_roots(vcpu); | 4649 | kvm_mmu_sync_roots(vcpu); |
4768 | if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests)) | 4650 | if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests)) |
4769 | kvm_x86_ops->tlb_flush(vcpu); | 4651 | kvm_x86_ops->tlb_flush(vcpu); |
4770 | if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS, | 4652 | if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS, |
4771 | &vcpu->requests)) { | 4653 | &vcpu->requests)) { |
4772 | vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS; | 4654 | vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS; |
4773 | r = 0; | 4655 | r = 0; |
4774 | goto out; | 4656 | goto out; |
4775 | } | 4657 | } |
4776 | if (test_and_clear_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests)) { | 4658 | if (test_and_clear_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests)) { |
4777 | vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; | 4659 | vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; |
4778 | r = 0; | 4660 | r = 0; |
4779 | goto out; | 4661 | goto out; |
4780 | } | 4662 | } |
4781 | if (test_and_clear_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests)) { | 4663 | if (test_and_clear_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests)) { |
4782 | vcpu->fpu_active = 0; | 4664 | vcpu->fpu_active = 0; |
4783 | kvm_x86_ops->fpu_deactivate(vcpu); | 4665 | kvm_x86_ops->fpu_deactivate(vcpu); |
4784 | } | 4666 | } |
4785 | } | 4667 | } |
4786 | 4668 | ||
4787 | preempt_disable(); | 4669 | preempt_disable(); |
4788 | 4670 | ||
4789 | kvm_x86_ops->prepare_guest_switch(vcpu); | 4671 | kvm_x86_ops->prepare_guest_switch(vcpu); |
4790 | if (vcpu->fpu_active) | 4672 | if (vcpu->fpu_active) |
4791 | kvm_load_guest_fpu(vcpu); | 4673 | kvm_load_guest_fpu(vcpu); |
4792 | kvm_load_guest_xcr0(vcpu); | 4674 | kvm_load_guest_xcr0(vcpu); |
4793 | 4675 | ||
4794 | atomic_set(&vcpu->guest_mode, 1); | 4676 | atomic_set(&vcpu->guest_mode, 1); |
4795 | smp_wmb(); | 4677 | smp_wmb(); |
4796 | 4678 | ||
4797 | local_irq_disable(); | 4679 | local_irq_disable(); |
4798 | 4680 | ||
4799 | if (!atomic_read(&vcpu->guest_mode) || vcpu->requests | 4681 | if (!atomic_read(&vcpu->guest_mode) || vcpu->requests |
4800 | || need_resched() || signal_pending(current)) { | 4682 | || need_resched() || signal_pending(current)) { |
4801 | atomic_set(&vcpu->guest_mode, 0); | 4683 | atomic_set(&vcpu->guest_mode, 0); |
4802 | smp_wmb(); | 4684 | smp_wmb(); |
4803 | local_irq_enable(); | 4685 | local_irq_enable(); |
4804 | preempt_enable(); | 4686 | preempt_enable(); |
4805 | r = 1; | 4687 | r = 1; |
4806 | goto out; | 4688 | goto out; |
4807 | } | 4689 | } |
4808 | 4690 | ||
4809 | inject_pending_event(vcpu); | 4691 | inject_pending_event(vcpu); |
4810 | 4692 | ||
4811 | /* enable NMI/IRQ window open exits if needed */ | 4693 | /* enable NMI/IRQ window open exits if needed */ |
4812 | if (vcpu->arch.nmi_pending) | 4694 | if (vcpu->arch.nmi_pending) |
4813 | kvm_x86_ops->enable_nmi_window(vcpu); | 4695 | kvm_x86_ops->enable_nmi_window(vcpu); |
4814 | else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) | 4696 | else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) |
4815 | kvm_x86_ops->enable_irq_window(vcpu); | 4697 | kvm_x86_ops->enable_irq_window(vcpu); |
4816 | 4698 | ||
4817 | if (kvm_lapic_enabled(vcpu)) { | 4699 | if (kvm_lapic_enabled(vcpu)) { |
4818 | update_cr8_intercept(vcpu); | 4700 | update_cr8_intercept(vcpu); |
4819 | kvm_lapic_sync_to_vapic(vcpu); | 4701 | kvm_lapic_sync_to_vapic(vcpu); |
4820 | } | 4702 | } |
4821 | 4703 | ||
4822 | srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); | 4704 | srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); |
4823 | 4705 | ||
4824 | kvm_guest_enter(); | 4706 | kvm_guest_enter(); |
4825 | 4707 | ||
4826 | if (unlikely(vcpu->arch.switch_db_regs)) { | 4708 | if (unlikely(vcpu->arch.switch_db_regs)) { |
4827 | set_debugreg(0, 7); | 4709 | set_debugreg(0, 7); |
4828 | set_debugreg(vcpu->arch.eff_db[0], 0); | 4710 | set_debugreg(vcpu->arch.eff_db[0], 0); |
4829 | set_debugreg(vcpu->arch.eff_db[1], 1); | 4711 | set_debugreg(vcpu->arch.eff_db[1], 1); |
4830 | set_debugreg(vcpu->arch.eff_db[2], 2); | 4712 | set_debugreg(vcpu->arch.eff_db[2], 2); |
4831 | set_debugreg(vcpu->arch.eff_db[3], 3); | 4713 | set_debugreg(vcpu->arch.eff_db[3], 3); |
4832 | } | 4714 | } |
4833 | 4715 | ||
4834 | trace_kvm_entry(vcpu->vcpu_id); | 4716 | trace_kvm_entry(vcpu->vcpu_id); |
4835 | kvm_x86_ops->run(vcpu); | 4717 | kvm_x86_ops->run(vcpu); |
4836 | 4718 | ||
4837 | /* | 4719 | /* |
4838 | * If the guest has used debug registers, at least dr7 | 4720 | * If the guest has used debug registers, at least dr7 |
4839 | * will be disabled while returning to the host. | 4721 | * will be disabled while returning to the host. |
4840 | * If we don't have active breakpoints in the host, we don't | 4722 | * If we don't have active breakpoints in the host, we don't |
4841 | * care about the messed up debug address registers. But if | 4723 | * care about the messed up debug address registers. But if |
4842 | * we have some of them active, restore the old state. | 4724 | * we have some of them active, restore the old state. |
4843 | */ | 4725 | */ |
4844 | if (hw_breakpoint_active()) | 4726 | if (hw_breakpoint_active()) |
4845 | hw_breakpoint_restore(); | 4727 | hw_breakpoint_restore(); |
4846 | 4728 | ||
4847 | atomic_set(&vcpu->guest_mode, 0); | 4729 | atomic_set(&vcpu->guest_mode, 0); |
4848 | smp_wmb(); | 4730 | smp_wmb(); |
4849 | local_irq_enable(); | 4731 | local_irq_enable(); |
4850 | 4732 | ||
4851 | ++vcpu->stat.exits; | 4733 | ++vcpu->stat.exits; |
4852 | 4734 | ||
4853 | /* | 4735 | /* |
4854 | * We must have an instruction between local_irq_enable() and | 4736 | * We must have an instruction between local_irq_enable() and |
4855 | * kvm_guest_exit(), so the timer interrupt isn't delayed by | 4737 | * kvm_guest_exit(), so the timer interrupt isn't delayed by |
4856 | * the interrupt shadow. The stat.exits increment will do nicely. | 4738 | * the interrupt shadow. The stat.exits increment will do nicely. |
4857 | * But we need to prevent reordering, hence this barrier(): | 4739 | * But we need to prevent reordering, hence this barrier(): |
4858 | */ | 4740 | */ |
4859 | barrier(); | 4741 | barrier(); |
4860 | 4742 | ||
4861 | kvm_guest_exit(); | 4743 | kvm_guest_exit(); |
4862 | 4744 | ||
4863 | preempt_enable(); | 4745 | preempt_enable(); |
4864 | 4746 | ||
4865 | vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); | 4747 | vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); |
4866 | 4748 | ||
4867 | /* | 4749 | /* |
4868 | * Profile KVM exit RIPs: | 4750 | * Profile KVM exit RIPs: |
4869 | */ | 4751 | */ |
4870 | if (unlikely(prof_on == KVM_PROFILING)) { | 4752 | if (unlikely(prof_on == KVM_PROFILING)) { |
4871 | unsigned long rip = kvm_rip_read(vcpu); | 4753 | unsigned long rip = kvm_rip_read(vcpu); |
4872 | profile_hit(KVM_PROFILING, (void *)rip); | 4754 | profile_hit(KVM_PROFILING, (void *)rip); |
4873 | } | 4755 | } |
4874 | 4756 | ||
4875 | 4757 | ||
4876 | kvm_lapic_sync_from_vapic(vcpu); | 4758 | kvm_lapic_sync_from_vapic(vcpu); |
4877 | 4759 | ||
4878 | r = kvm_x86_ops->handle_exit(vcpu); | 4760 | r = kvm_x86_ops->handle_exit(vcpu); |
4879 | out: | 4761 | out: |
4880 | return r; | 4762 | return r; |
4881 | } | 4763 | } |
4882 | 4764 | ||
4883 | 4765 | ||
4884 | static int __vcpu_run(struct kvm_vcpu *vcpu) | 4766 | static int __vcpu_run(struct kvm_vcpu *vcpu) |
4885 | { | 4767 | { |
4886 | int r; | 4768 | int r; |
4887 | struct kvm *kvm = vcpu->kvm; | 4769 | struct kvm *kvm = vcpu->kvm; |
4888 | 4770 | ||
4889 | if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) { | 4771 | if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) { |
4890 | pr_debug("vcpu %d received sipi with vector # %x\n", | 4772 | pr_debug("vcpu %d received sipi with vector # %x\n", |
4891 | vcpu->vcpu_id, vcpu->arch.sipi_vector); | 4773 | vcpu->vcpu_id, vcpu->arch.sipi_vector); |
4892 | kvm_lapic_reset(vcpu); | 4774 | kvm_lapic_reset(vcpu); |
4893 | r = kvm_arch_vcpu_reset(vcpu); | 4775 | r = kvm_arch_vcpu_reset(vcpu); |
4894 | if (r) | 4776 | if (r) |
4895 | return r; | 4777 | return r; |
4896 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; | 4778 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; |
4897 | } | 4779 | } |
4898 | 4780 | ||
4899 | vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); | 4781 | vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); |
4900 | vapic_enter(vcpu); | 4782 | vapic_enter(vcpu); |
4901 | 4783 | ||
4902 | r = 1; | 4784 | r = 1; |
4903 | while (r > 0) { | 4785 | while (r > 0) { |
4904 | if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) | 4786 | if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) |
4905 | r = vcpu_enter_guest(vcpu); | 4787 | r = vcpu_enter_guest(vcpu); |
4906 | else { | 4788 | else { |
4907 | srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); | 4789 | srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); |
4908 | kvm_vcpu_block(vcpu); | 4790 | kvm_vcpu_block(vcpu); |
4909 | vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); | 4791 | vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); |
4910 | if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests)) | 4792 | if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests)) |
4911 | { | 4793 | { |
4912 | switch(vcpu->arch.mp_state) { | 4794 | switch(vcpu->arch.mp_state) { |
4913 | case KVM_MP_STATE_HALTED: | 4795 | case KVM_MP_STATE_HALTED: |
4914 | vcpu->arch.mp_state = | 4796 | vcpu->arch.mp_state = |
4915 | KVM_MP_STATE_RUNNABLE; | 4797 | KVM_MP_STATE_RUNNABLE; |
4916 | case KVM_MP_STATE_RUNNABLE: | 4798 | case KVM_MP_STATE_RUNNABLE: |
4917 | break; | 4799 | break; |
4918 | case KVM_MP_STATE_SIPI_RECEIVED: | 4800 | case KVM_MP_STATE_SIPI_RECEIVED: |
4919 | default: | 4801 | default: |
4920 | r = -EINTR; | 4802 | r = -EINTR; |
4921 | break; | 4803 | break; |
4922 | } | 4804 | } |
4923 | } | 4805 | } |
4924 | } | 4806 | } |
4925 | 4807 | ||
4926 | if (r <= 0) | 4808 | if (r <= 0) |
4927 | break; | 4809 | break; |
4928 | 4810 | ||
4929 | clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests); | 4811 | clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests); |
4930 | if (kvm_cpu_has_pending_timer(vcpu)) | 4812 | if (kvm_cpu_has_pending_timer(vcpu)) |
4931 | kvm_inject_pending_timer_irqs(vcpu); | 4813 | kvm_inject_pending_timer_irqs(vcpu); |
4932 | 4814 | ||
4933 | if (dm_request_for_irq_injection(vcpu)) { | 4815 | if (dm_request_for_irq_injection(vcpu)) { |
4934 | r = -EINTR; | 4816 | r = -EINTR; |
4935 | vcpu->run->exit_reason = KVM_EXIT_INTR; | 4817 | vcpu->run->exit_reason = KVM_EXIT_INTR; |
4936 | ++vcpu->stat.request_irq_exits; | 4818 | ++vcpu->stat.request_irq_exits; |
4937 | } | 4819 | } |
4938 | if (signal_pending(current)) { | 4820 | if (signal_pending(current)) { |
4939 | r = -EINTR; | 4821 | r = -EINTR; |
4940 | vcpu->run->exit_reason = KVM_EXIT_INTR; | 4822 | vcpu->run->exit_reason = KVM_EXIT_INTR; |
4941 | ++vcpu->stat.signal_exits; | 4823 | ++vcpu->stat.signal_exits; |
4942 | } | 4824 | } |
4943 | if (need_resched()) { | 4825 | if (need_resched()) { |
4944 | srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); | 4826 | srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); |
4945 | kvm_resched(vcpu); | 4827 | kvm_resched(vcpu); |
4946 | vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); | 4828 | vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); |
4947 | } | 4829 | } |
4948 | } | 4830 | } |
4949 | 4831 | ||
4950 | srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); | 4832 | srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); |
4951 | 4833 | ||
4952 | vapic_exit(vcpu); | 4834 | vapic_exit(vcpu); |
4953 | 4835 | ||
4954 | return r; | 4836 | return r; |
4955 | } | 4837 | } |
4956 | 4838 | ||
4957 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) | 4839 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) |
4958 | { | 4840 | { |
4959 | int r; | 4841 | int r; |
4960 | sigset_t sigsaved; | 4842 | sigset_t sigsaved; |
4961 | 4843 | ||
4962 | if (vcpu->sigset_active) | 4844 | if (vcpu->sigset_active) |
4963 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); | 4845 | sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); |
4964 | 4846 | ||
4965 | if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { | 4847 | if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { |
4966 | kvm_vcpu_block(vcpu); | 4848 | kvm_vcpu_block(vcpu); |
4967 | clear_bit(KVM_REQ_UNHALT, &vcpu->requests); | 4849 | clear_bit(KVM_REQ_UNHALT, &vcpu->requests); |
4968 | r = -EAGAIN; | 4850 | r = -EAGAIN; |
4969 | goto out; | 4851 | goto out; |
4970 | } | 4852 | } |
4971 | 4853 | ||
4972 | /* re-sync apic's tpr */ | 4854 | /* re-sync apic's tpr */ |
4973 | if (!irqchip_in_kernel(vcpu->kvm)) | 4855 | if (!irqchip_in_kernel(vcpu->kvm)) |
4974 | kvm_set_cr8(vcpu, kvm_run->cr8); | 4856 | kvm_set_cr8(vcpu, kvm_run->cr8); |
4975 | 4857 | ||
4976 | if (vcpu->arch.pio.count || vcpu->mmio_needed || | 4858 | if (vcpu->arch.pio.count || vcpu->mmio_needed || |
4977 | vcpu->arch.emulate_ctxt.restart) { | 4859 | vcpu->arch.emulate_ctxt.restart) { |
4978 | if (vcpu->mmio_needed) { | 4860 | if (vcpu->mmio_needed) { |
4979 | memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8); | 4861 | memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8); |
4980 | vcpu->mmio_read_completed = 1; | 4862 | vcpu->mmio_read_completed = 1; |
4981 | vcpu->mmio_needed = 0; | 4863 | vcpu->mmio_needed = 0; |
4982 | } | 4864 | } |
4983 | vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); | 4865 | vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); |
4984 | r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE); | 4866 | r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE); |
4985 | srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); | 4867 | srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); |
4986 | if (r != EMULATE_DONE) { | 4868 | if (r != EMULATE_DONE) { |
4987 | r = 0; | 4869 | r = 0; |
4988 | goto out; | 4870 | goto out; |
4989 | } | 4871 | } |
4990 | } | 4872 | } |
4991 | if (kvm_run->exit_reason == KVM_EXIT_HYPERCALL) | 4873 | if (kvm_run->exit_reason == KVM_EXIT_HYPERCALL) |
4992 | kvm_register_write(vcpu, VCPU_REGS_RAX, | 4874 | kvm_register_write(vcpu, VCPU_REGS_RAX, |
4993 | kvm_run->hypercall.ret); | 4875 | kvm_run->hypercall.ret); |
4994 | 4876 | ||
4995 | r = __vcpu_run(vcpu); | 4877 | r = __vcpu_run(vcpu); |
4996 | 4878 | ||
4997 | out: | 4879 | out: |
4998 | post_kvm_run_save(vcpu); | 4880 | post_kvm_run_save(vcpu); |
4999 | if (vcpu->sigset_active) | 4881 | if (vcpu->sigset_active) |
5000 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); | 4882 | sigprocmask(SIG_SETMASK, &sigsaved, NULL); |
5001 | 4883 | ||
5002 | return r; | 4884 | return r; |
5003 | } | 4885 | } |
5004 | 4886 | ||
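The function above is the kernel half of the KVM_RUN protocol: userspace re-enters the vcpu, and on the next entry after a KVM_EXIT_MMIO the kernel copies the completed data back from kvm_run->mmio.data before restarting the interrupted emulation. A minimal userspace sketch of that loop; the vcpu_fd/kvm_fd descriptors and the handle_mmio() helper are assumptions, not part of this patch:

	#include <linux/kvm.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>

	static void handle_mmio(struct kvm_run *run);	/* assumed helper */

	static int run_loop(int kvm_fd, int vcpu_fd)
	{
		/* The shared kvm_run page is mmap'ed from the vcpu fd. */
		long sz = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
		struct kvm_run *run = mmap(NULL, sz, PROT_READ | PROT_WRITE,
					   MAP_SHARED, vcpu_fd, 0);

		if (run == MAP_FAILED)
			return -1;
		for (;;) {
			if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
				return -1;	/* -EINTR, -EAGAIN, ... */
			switch (run->exit_reason) {
			case KVM_EXIT_HLT:
				return 0;
			case KVM_EXIT_MMIO:
				/* Complete the access; the mmio_needed path
				 * above resumes emulation with run->mmio.data
				 * on the next KVM_RUN. */
				handle_mmio(run);
				break;
			default:
				fprintf(stderr, "unhandled exit %d\n",
					run->exit_reason);
				return -1;
			}
		}
	}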
5005 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) | 4887 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) |
5006 | { | 4888 | { |
5007 | regs->rax = kvm_register_read(vcpu, VCPU_REGS_RAX); | 4889 | regs->rax = kvm_register_read(vcpu, VCPU_REGS_RAX); |
5008 | regs->rbx = kvm_register_read(vcpu, VCPU_REGS_RBX); | 4890 | regs->rbx = kvm_register_read(vcpu, VCPU_REGS_RBX); |
5009 | regs->rcx = kvm_register_read(vcpu, VCPU_REGS_RCX); | 4891 | regs->rcx = kvm_register_read(vcpu, VCPU_REGS_RCX); |
5010 | regs->rdx = kvm_register_read(vcpu, VCPU_REGS_RDX); | 4892 | regs->rdx = kvm_register_read(vcpu, VCPU_REGS_RDX); |
5011 | regs->rsi = kvm_register_read(vcpu, VCPU_REGS_RSI); | 4893 | regs->rsi = kvm_register_read(vcpu, VCPU_REGS_RSI); |
5012 | regs->rdi = kvm_register_read(vcpu, VCPU_REGS_RDI); | 4894 | regs->rdi = kvm_register_read(vcpu, VCPU_REGS_RDI); |
5013 | regs->rsp = kvm_register_read(vcpu, VCPU_REGS_RSP); | 4895 | regs->rsp = kvm_register_read(vcpu, VCPU_REGS_RSP); |
5014 | regs->rbp = kvm_register_read(vcpu, VCPU_REGS_RBP); | 4896 | regs->rbp = kvm_register_read(vcpu, VCPU_REGS_RBP); |
5015 | #ifdef CONFIG_X86_64 | 4897 | #ifdef CONFIG_X86_64 |
5016 | regs->r8 = kvm_register_read(vcpu, VCPU_REGS_R8); | 4898 | regs->r8 = kvm_register_read(vcpu, VCPU_REGS_R8); |
5017 | regs->r9 = kvm_register_read(vcpu, VCPU_REGS_R9); | 4899 | regs->r9 = kvm_register_read(vcpu, VCPU_REGS_R9); |
5018 | regs->r10 = kvm_register_read(vcpu, VCPU_REGS_R10); | 4900 | regs->r10 = kvm_register_read(vcpu, VCPU_REGS_R10); |
5019 | regs->r11 = kvm_register_read(vcpu, VCPU_REGS_R11); | 4901 | regs->r11 = kvm_register_read(vcpu, VCPU_REGS_R11); |
5020 | regs->r12 = kvm_register_read(vcpu, VCPU_REGS_R12); | 4902 | regs->r12 = kvm_register_read(vcpu, VCPU_REGS_R12); |
5021 | regs->r13 = kvm_register_read(vcpu, VCPU_REGS_R13); | 4903 | regs->r13 = kvm_register_read(vcpu, VCPU_REGS_R13); |
5022 | regs->r14 = kvm_register_read(vcpu, VCPU_REGS_R14); | 4904 | regs->r14 = kvm_register_read(vcpu, VCPU_REGS_R14); |
5023 | regs->r15 = kvm_register_read(vcpu, VCPU_REGS_R15); | 4905 | regs->r15 = kvm_register_read(vcpu, VCPU_REGS_R15); |
5024 | #endif | 4906 | #endif |
5025 | 4907 | ||
5026 | regs->rip = kvm_rip_read(vcpu); | 4908 | regs->rip = kvm_rip_read(vcpu); |
5027 | regs->rflags = kvm_get_rflags(vcpu); | 4909 | regs->rflags = kvm_get_rflags(vcpu); |
5028 | 4910 | ||
5029 | return 0; | 4911 | return 0; |
5030 | } | 4912 | } |
5031 | 4913 | ||
5032 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) | 4914 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) |
5033 | { | 4915 | { |
5034 | kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax); | 4916 | kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax); |
5035 | kvm_register_write(vcpu, VCPU_REGS_RBX, regs->rbx); | 4917 | kvm_register_write(vcpu, VCPU_REGS_RBX, regs->rbx); |
5036 | kvm_register_write(vcpu, VCPU_REGS_RCX, regs->rcx); | 4918 | kvm_register_write(vcpu, VCPU_REGS_RCX, regs->rcx); |
5037 | kvm_register_write(vcpu, VCPU_REGS_RDX, regs->rdx); | 4919 | kvm_register_write(vcpu, VCPU_REGS_RDX, regs->rdx); |
5038 | kvm_register_write(vcpu, VCPU_REGS_RSI, regs->rsi); | 4920 | kvm_register_write(vcpu, VCPU_REGS_RSI, regs->rsi); |
5039 | kvm_register_write(vcpu, VCPU_REGS_RDI, regs->rdi); | 4921 | kvm_register_write(vcpu, VCPU_REGS_RDI, regs->rdi); |
5040 | kvm_register_write(vcpu, VCPU_REGS_RSP, regs->rsp); | 4922 | kvm_register_write(vcpu, VCPU_REGS_RSP, regs->rsp); |
5041 | kvm_register_write(vcpu, VCPU_REGS_RBP, regs->rbp); | 4923 | kvm_register_write(vcpu, VCPU_REGS_RBP, regs->rbp); |
5042 | #ifdef CONFIG_X86_64 | 4924 | #ifdef CONFIG_X86_64 |
5043 | kvm_register_write(vcpu, VCPU_REGS_R8, regs->r8); | 4925 | kvm_register_write(vcpu, VCPU_REGS_R8, regs->r8); |
5044 | kvm_register_write(vcpu, VCPU_REGS_R9, regs->r9); | 4926 | kvm_register_write(vcpu, VCPU_REGS_R9, regs->r9); |
5045 | kvm_register_write(vcpu, VCPU_REGS_R10, regs->r10); | 4927 | kvm_register_write(vcpu, VCPU_REGS_R10, regs->r10); |
5046 | kvm_register_write(vcpu, VCPU_REGS_R11, regs->r11); | 4928 | kvm_register_write(vcpu, VCPU_REGS_R11, regs->r11); |
5047 | kvm_register_write(vcpu, VCPU_REGS_R12, regs->r12); | 4929 | kvm_register_write(vcpu, VCPU_REGS_R12, regs->r12); |
5048 | kvm_register_write(vcpu, VCPU_REGS_R13, regs->r13); | 4930 | kvm_register_write(vcpu, VCPU_REGS_R13, regs->r13); |
5049 | kvm_register_write(vcpu, VCPU_REGS_R14, regs->r14); | 4931 | kvm_register_write(vcpu, VCPU_REGS_R14, regs->r14); |
5050 | kvm_register_write(vcpu, VCPU_REGS_R15, regs->r15); | 4932 | kvm_register_write(vcpu, VCPU_REGS_R15, regs->r15); |
5051 | #endif | 4933 | #endif |
5052 | 4934 | ||
5053 | kvm_rip_write(vcpu, regs->rip); | 4935 | kvm_rip_write(vcpu, regs->rip); |
5054 | kvm_set_rflags(vcpu, regs->rflags); | 4936 | kvm_set_rflags(vcpu, regs->rflags); |
5055 | 4937 | ||
5056 | vcpu->arch.exception.pending = false; | 4938 | vcpu->arch.exception.pending = false; |
5057 | 4939 | ||
5058 | return 0; | 4940 | return 0; |
5059 | } | 4941 | } |
5060 | 4942 | ||
5061 | void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) | 4943 | void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) |
5062 | { | 4944 | { |
5063 | struct kvm_segment cs; | 4945 | struct kvm_segment cs; |
5064 | 4946 | ||
5065 | kvm_get_segment(vcpu, &cs, VCPU_SREG_CS); | 4947 | kvm_get_segment(vcpu, &cs, VCPU_SREG_CS); |
5066 | *db = cs.db; | 4948 | *db = cs.db; |
5067 | *l = cs.l; | 4949 | *l = cs.l; |
5068 | } | 4950 | } |
5069 | EXPORT_SYMBOL_GPL(kvm_get_cs_db_l_bits); | 4951 | EXPORT_SYMBOL_GPL(kvm_get_cs_db_l_bits); |
5070 | 4952 | ||
5071 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, | 4953 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, |
5072 | struct kvm_sregs *sregs) | 4954 | struct kvm_sregs *sregs) |
5073 | { | 4955 | { |
5074 | struct desc_ptr dt; | 4956 | struct desc_ptr dt; |
5075 | 4957 | ||
5076 | kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS); | 4958 | kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS); |
5077 | kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS); | 4959 | kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS); |
5078 | kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES); | 4960 | kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES); |
5079 | kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS); | 4961 | kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS); |
5080 | kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS); | 4962 | kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS); |
5081 | kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS); | 4963 | kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS); |
5082 | 4964 | ||
5083 | kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR); | 4965 | kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR); |
5084 | kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR); | 4966 | kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR); |
5085 | 4967 | ||
5086 | kvm_x86_ops->get_idt(vcpu, &dt); | 4968 | kvm_x86_ops->get_idt(vcpu, &dt); |
5087 | sregs->idt.limit = dt.size; | 4969 | sregs->idt.limit = dt.size; |
5088 | sregs->idt.base = dt.address; | 4970 | sregs->idt.base = dt.address; |
5089 | kvm_x86_ops->get_gdt(vcpu, &dt); | 4971 | kvm_x86_ops->get_gdt(vcpu, &dt); |
5090 | sregs->gdt.limit = dt.size; | 4972 | sregs->gdt.limit = dt.size; |
5091 | sregs->gdt.base = dt.address; | 4973 | sregs->gdt.base = dt.address; |
5092 | 4974 | ||
5093 | sregs->cr0 = kvm_read_cr0(vcpu); | 4975 | sregs->cr0 = kvm_read_cr0(vcpu); |
5094 | sregs->cr2 = vcpu->arch.cr2; | 4976 | sregs->cr2 = vcpu->arch.cr2; |
5095 | sregs->cr3 = vcpu->arch.cr3; | 4977 | sregs->cr3 = vcpu->arch.cr3; |
5096 | sregs->cr4 = kvm_read_cr4(vcpu); | 4978 | sregs->cr4 = kvm_read_cr4(vcpu); |
5097 | sregs->cr8 = kvm_get_cr8(vcpu); | 4979 | sregs->cr8 = kvm_get_cr8(vcpu); |
5098 | sregs->efer = vcpu->arch.efer; | 4980 | sregs->efer = vcpu->arch.efer; |
5099 | sregs->apic_base = kvm_get_apic_base(vcpu); | 4981 | sregs->apic_base = kvm_get_apic_base(vcpu); |
5100 | 4982 | ||
5101 | memset(sregs->interrupt_bitmap, 0, sizeof sregs->interrupt_bitmap); | 4983 | memset(sregs->interrupt_bitmap, 0, sizeof sregs->interrupt_bitmap); |
5102 | 4984 | ||
5103 | if (vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft) | 4985 | if (vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft) |
5104 | set_bit(vcpu->arch.interrupt.nr, | 4986 | set_bit(vcpu->arch.interrupt.nr, |
5105 | (unsigned long *)sregs->interrupt_bitmap); | 4987 | (unsigned long *)sregs->interrupt_bitmap); |
5106 | 4988 | ||
5107 | return 0; | 4989 | return 0; |
5108 | } | 4990 | } |
5109 | 4991 | ||
5110 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, | 4992 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, |
5111 | struct kvm_mp_state *mp_state) | 4993 | struct kvm_mp_state *mp_state) |
5112 | { | 4994 | { |
5113 | mp_state->mp_state = vcpu->arch.mp_state; | 4995 | mp_state->mp_state = vcpu->arch.mp_state; |
5114 | return 0; | 4996 | return 0; |
5115 | } | 4997 | } |
5116 | 4998 | ||
5117 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, | 4999 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, |
5118 | struct kvm_mp_state *mp_state) | 5000 | struct kvm_mp_state *mp_state) |
5119 | { | 5001 | { |
5120 | vcpu->arch.mp_state = mp_state->mp_state; | 5002 | vcpu->arch.mp_state = mp_state->mp_state; |
5121 | return 0; | 5003 | return 0; |
5122 | } | 5004 | } |
5123 | 5005 | ||
5124 | int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, | 5006 | int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, |
5125 | bool has_error_code, u32 error_code) | 5007 | bool has_error_code, u32 error_code) |
5126 | { | 5008 | { |
5127 | struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; | 5009 | struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; |
5128 | int cs_db, cs_l, ret; | 5010 | int cs_db, cs_l, ret; |
5129 | cache_all_regs(vcpu); | 5011 | cache_all_regs(vcpu); |
5130 | 5012 | ||
5131 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); | 5013 | kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); |
5132 | 5014 | ||
5133 | vcpu->arch.emulate_ctxt.vcpu = vcpu; | 5015 | vcpu->arch.emulate_ctxt.vcpu = vcpu; |
5134 | vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu); | 5016 | vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu); |
5135 | vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); | 5017 | vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); |
5136 | vcpu->arch.emulate_ctxt.mode = | 5018 | vcpu->arch.emulate_ctxt.mode = |
5137 | (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL : | 5019 | (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL : |
5138 | (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM) | 5020 | (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM) |
5139 | ? X86EMUL_MODE_VM86 : cs_l | 5021 | ? X86EMUL_MODE_VM86 : cs_l |
5140 | ? X86EMUL_MODE_PROT64 : cs_db | 5022 | ? X86EMUL_MODE_PROT64 : cs_db |
5141 | ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; | 5023 | ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; |
5142 | memset(c, 0, sizeof(struct decode_cache)); | 5024 | memset(c, 0, sizeof(struct decode_cache)); |
5143 | memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); | 5025 | memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); |
5144 | 5026 | ||
5145 | ret = emulator_task_switch(&vcpu->arch.emulate_ctxt, &emulate_ops, | 5027 | ret = emulator_task_switch(&vcpu->arch.emulate_ctxt, &emulate_ops, |
5146 | tss_selector, reason, has_error_code, | 5028 | tss_selector, reason, has_error_code, |
5147 | error_code); | 5029 | error_code); |
5148 | 5030 | ||
5149 | if (ret) | 5031 | if (ret) |
5150 | return EMULATE_FAIL; | 5032 | return EMULATE_FAIL; |
5151 | 5033 | ||
5152 | memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); | 5034 | memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); |
5153 | kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); | 5035 | kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); |
5154 | kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); | 5036 | kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); |
5155 | return EMULATE_DONE; | 5037 | return EMULATE_DONE; |
5156 | } | 5038 | } |
5157 | EXPORT_SYMBOL_GPL(kvm_task_switch); | 5039 | EXPORT_SYMBOL_GPL(kvm_task_switch); |
5158 | 5040 | ||
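The nested conditional that computes vcpu->arch.emulate_ctxt.mode in kvm_task_switch() is dense; an equivalent flattened form, shown only as an illustration and not as code from this patch:

	static int emulate_mode(struct kvm_vcpu *vcpu, int cs_db, int cs_l,
				unsigned long eflags)
	{
		if (!is_protmode(vcpu))
			return X86EMUL_MODE_REAL;	/* CR0.PE clear */
		if (eflags & X86_EFLAGS_VM)
			return X86EMUL_MODE_VM86;	/* virtual-8086 mode */
		if (cs_l)
			return X86EMUL_MODE_PROT64;	/* 64-bit code segment */
		return cs_db ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
	}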
5159 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, | 5041 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, |
5160 | struct kvm_sregs *sregs) | 5042 | struct kvm_sregs *sregs) |
5161 | { | 5043 | { |
5162 | int mmu_reset_needed = 0; | 5044 | int mmu_reset_needed = 0; |
5163 | int pending_vec, max_bits; | 5045 | int pending_vec, max_bits; |
5164 | struct desc_ptr dt; | 5046 | struct desc_ptr dt; |
5165 | 5047 | ||
5166 | dt.size = sregs->idt.limit; | 5048 | dt.size = sregs->idt.limit; |
5167 | dt.address = sregs->idt.base; | 5049 | dt.address = sregs->idt.base; |
5168 | kvm_x86_ops->set_idt(vcpu, &dt); | 5050 | kvm_x86_ops->set_idt(vcpu, &dt); |
5169 | dt.size = sregs->gdt.limit; | 5051 | dt.size = sregs->gdt.limit; |
5170 | dt.address = sregs->gdt.base; | 5052 | dt.address = sregs->gdt.base; |
5171 | kvm_x86_ops->set_gdt(vcpu, &dt); | 5053 | kvm_x86_ops->set_gdt(vcpu, &dt); |
5172 | 5054 | ||
5173 | vcpu->arch.cr2 = sregs->cr2; | 5055 | vcpu->arch.cr2 = sregs->cr2; |
5174 | mmu_reset_needed |= vcpu->arch.cr3 != sregs->cr3; | 5056 | mmu_reset_needed |= vcpu->arch.cr3 != sregs->cr3; |
5175 | vcpu->arch.cr3 = sregs->cr3; | 5057 | vcpu->arch.cr3 = sregs->cr3; |
5176 | 5058 | ||
5177 | kvm_set_cr8(vcpu, sregs->cr8); | 5059 | kvm_set_cr8(vcpu, sregs->cr8); |
5178 | 5060 | ||
5179 | mmu_reset_needed |= vcpu->arch.efer != sregs->efer; | 5061 | mmu_reset_needed |= vcpu->arch.efer != sregs->efer; |
5180 | kvm_x86_ops->set_efer(vcpu, sregs->efer); | 5062 | kvm_x86_ops->set_efer(vcpu, sregs->efer); |
5181 | kvm_set_apic_base(vcpu, sregs->apic_base); | 5063 | kvm_set_apic_base(vcpu, sregs->apic_base); |
5182 | 5064 | ||
5183 | mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0; | 5065 | mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0; |
5184 | kvm_x86_ops->set_cr0(vcpu, sregs->cr0); | 5066 | kvm_x86_ops->set_cr0(vcpu, sregs->cr0); |
5185 | vcpu->arch.cr0 = sregs->cr0; | 5067 | vcpu->arch.cr0 = sregs->cr0; |
5186 | 5068 | ||
5187 | mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4; | 5069 | mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4; |
5188 | kvm_x86_ops->set_cr4(vcpu, sregs->cr4); | 5070 | kvm_x86_ops->set_cr4(vcpu, sregs->cr4); |
5189 | if (!is_long_mode(vcpu) && is_pae(vcpu)) { | 5071 | if (!is_long_mode(vcpu) && is_pae(vcpu)) { |
5190 | load_pdptrs(vcpu, vcpu->arch.cr3); | 5072 | load_pdptrs(vcpu, vcpu->arch.cr3); |
5191 | mmu_reset_needed = 1; | 5073 | mmu_reset_needed = 1; |
5192 | } | 5074 | } |
5193 | 5075 | ||
5194 | if (mmu_reset_needed) | 5076 | if (mmu_reset_needed) |
5195 | kvm_mmu_reset_context(vcpu); | 5077 | kvm_mmu_reset_context(vcpu); |
5196 | 5078 | ||
5197 | max_bits = (sizeof sregs->interrupt_bitmap) << 3; | 5079 | max_bits = (sizeof sregs->interrupt_bitmap) << 3; |
5198 | pending_vec = find_first_bit( | 5080 | pending_vec = find_first_bit( |
5199 | (const unsigned long *)sregs->interrupt_bitmap, max_bits); | 5081 | (const unsigned long *)sregs->interrupt_bitmap, max_bits); |
5200 | if (pending_vec < max_bits) { | 5082 | if (pending_vec < max_bits) { |
5201 | kvm_queue_interrupt(vcpu, pending_vec, false); | 5083 | kvm_queue_interrupt(vcpu, pending_vec, false); |
5202 | pr_debug("Set back pending irq %d\n", pending_vec); | 5084 | pr_debug("Set back pending irq %d\n", pending_vec); |
5203 | if (irqchip_in_kernel(vcpu->kvm)) | 5085 | if (irqchip_in_kernel(vcpu->kvm)) |
5204 | kvm_pic_clear_isr_ack(vcpu->kvm); | 5086 | kvm_pic_clear_isr_ack(vcpu->kvm); |
5205 | } | 5087 | } |
5206 | 5088 | ||
5207 | kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS); | 5089 | kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS); |
5208 | kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS); | 5090 | kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS); |
5209 | kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES); | 5091 | kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES); |
5210 | kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS); | 5092 | kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS); |
5211 | kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS); | 5093 | kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS); |
5212 | kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS); | 5094 | kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS); |
5213 | 5095 | ||
5214 | kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR); | 5096 | kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR); |
5215 | kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR); | 5097 | kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR); |
5216 | 5098 | ||
5217 | update_cr8_intercept(vcpu); | 5099 | update_cr8_intercept(vcpu); |
5218 | 5100 | ||
5219 | /* Older userspace won't unhalt the vcpu on reset. */ | 5101 | /* Older userspace won't unhalt the vcpu on reset. */ |
5220 | if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 && | 5102 | if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 && |
5221 | sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 && | 5103 | sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 && |
5222 | !is_protmode(vcpu)) | 5104 | !is_protmode(vcpu)) |
5223 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; | 5105 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; |
5224 | 5106 | ||
5225 | return 0; | 5107 | return 0; |
5226 | } | 5108 | } |
5227 | 5109 | ||
5228 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, | 5110 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, |
5229 | struct kvm_guest_debug *dbg) | 5111 | struct kvm_guest_debug *dbg) |
5230 | { | 5112 | { |
5231 | unsigned long rflags; | 5113 | unsigned long rflags; |
5232 | int i, r; | 5114 | int i, r; |
5233 | 5115 | ||
5234 | if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) { | 5116 | if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) { |
5235 | r = -EBUSY; | 5117 | r = -EBUSY; |
5236 | if (vcpu->arch.exception.pending) | 5118 | if (vcpu->arch.exception.pending) |
5237 | goto out; | 5119 | goto out; |
5238 | if (dbg->control & KVM_GUESTDBG_INJECT_DB) | 5120 | if (dbg->control & KVM_GUESTDBG_INJECT_DB) |
5239 | kvm_queue_exception(vcpu, DB_VECTOR); | 5121 | kvm_queue_exception(vcpu, DB_VECTOR); |
5240 | else | 5122 | else |
5241 | kvm_queue_exception(vcpu, BP_VECTOR); | 5123 | kvm_queue_exception(vcpu, BP_VECTOR); |
5242 | } | 5124 | } |
5243 | 5125 | ||
5244 | /* | 5126 | /* |
5245 | * Read rflags while trace flags we may have injected are still | 5127 | * Read rflags while trace flags we may have injected are still |
5246 | * filtered out by kvm_get_rflags(). | 5128 | * filtered out by kvm_get_rflags(). |
5247 | */ | 5129 | */ |
5248 | rflags = kvm_get_rflags(vcpu); | 5130 | rflags = kvm_get_rflags(vcpu); |
5249 | 5131 | ||
5250 | vcpu->guest_debug = dbg->control; | 5132 | vcpu->guest_debug = dbg->control; |
5251 | if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE)) | 5133 | if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE)) |
5252 | vcpu->guest_debug = 0; | 5134 | vcpu->guest_debug = 0; |
5253 | 5135 | ||
5254 | if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) { | 5136 | if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) { |
5255 | for (i = 0; i < KVM_NR_DB_REGS; ++i) | 5137 | for (i = 0; i < KVM_NR_DB_REGS; ++i) |
5256 | vcpu->arch.eff_db[i] = dbg->arch.debugreg[i]; | 5138 | vcpu->arch.eff_db[i] = dbg->arch.debugreg[i]; |
5257 | vcpu->arch.switch_db_regs = | 5139 | vcpu->arch.switch_db_regs = |
5258 | (dbg->arch.debugreg[7] & DR7_BP_EN_MASK); | 5140 | (dbg->arch.debugreg[7] & DR7_BP_EN_MASK); |
5259 | } else { | 5141 | } else { |
5260 | for (i = 0; i < KVM_NR_DB_REGS; i++) | 5142 | for (i = 0; i < KVM_NR_DB_REGS; i++) |
5261 | vcpu->arch.eff_db[i] = vcpu->arch.db[i]; | 5143 | vcpu->arch.eff_db[i] = vcpu->arch.db[i]; |
5262 | vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK); | 5144 | vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK); |
5263 | } | 5145 | } |
5264 | 5146 | ||
5265 | if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) | 5147 | if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) |
5266 | vcpu->arch.singlestep_rip = kvm_rip_read(vcpu) + | 5148 | vcpu->arch.singlestep_rip = kvm_rip_read(vcpu) + |
5267 | get_segment_base(vcpu, VCPU_SREG_CS); | 5149 | get_segment_base(vcpu, VCPU_SREG_CS); |
5268 | 5150 | ||
5269 | /* | 5151 | /* |
5270 | * Trigger an rflags update that will inject or remove the trace | 5152 | * Trigger an rflags update that will inject or remove the trace |
5271 | * flags. | 5153 | * flags. |
5272 | */ | 5154 | */ |
5273 | kvm_set_rflags(vcpu, rflags); | 5155 | kvm_set_rflags(vcpu, rflags); |
5274 | 5156 | ||
5275 | kvm_x86_ops->set_guest_debug(vcpu, dbg); | 5157 | kvm_x86_ops->set_guest_debug(vcpu, dbg); |
5276 | 5158 | ||
5277 | r = 0; | 5159 | r = 0; |
5278 | 5160 | ||
5279 | out: | 5161 | out: |
5280 | 5162 | ||
5281 | return r; | 5163 | return r; |
5282 | } | 5164 | } |
5283 | 5165 | ||
5284 | /* | 5166 | /* |
5285 | * Translate a guest virtual address to a guest physical address. | 5167 | * Translate a guest virtual address to a guest physical address. |
5286 | */ | 5168 | */ |
5287 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, | 5169 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, |
5288 | struct kvm_translation *tr) | 5170 | struct kvm_translation *tr) |
5289 | { | 5171 | { |
5290 | unsigned long vaddr = tr->linear_address; | 5172 | unsigned long vaddr = tr->linear_address; |
5291 | gpa_t gpa; | 5173 | gpa_t gpa; |
5292 | int idx; | 5174 | int idx; |
5293 | 5175 | ||
5294 | idx = srcu_read_lock(&vcpu->kvm->srcu); | 5176 | idx = srcu_read_lock(&vcpu->kvm->srcu); |
5295 | gpa = kvm_mmu_gva_to_gpa_system(vcpu, vaddr, NULL); | 5177 | gpa = kvm_mmu_gva_to_gpa_system(vcpu, vaddr, NULL); |
5296 | srcu_read_unlock(&vcpu->kvm->srcu, idx); | 5178 | srcu_read_unlock(&vcpu->kvm->srcu, idx); |
5297 | tr->physical_address = gpa; | 5179 | tr->physical_address = gpa; |
5298 | tr->valid = gpa != UNMAPPED_GVA; | 5180 | tr->valid = gpa != UNMAPPED_GVA; |
5299 | tr->writeable = 1; | 5181 | tr->writeable = 1; |
5300 | tr->usermode = 0; | 5182 | tr->usermode = 0; |
5301 | 5183 | ||
5302 | return 0; | 5184 | return 0; |
5303 | } | 5185 | } |
5304 | 5186 | ||
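The translation above is reached from userspace through the KVM_TRANSLATE vcpu ioctl; a minimal sketch, where vcpu_fd and gva are assumed to exist:

	/* Query a guest-virtual to guest-physical translation. */
	struct kvm_translation tr = { .linear_address = gva };

	if (ioctl(vcpu_fd, KVM_TRANSLATE, &tr) == 0 && tr.valid)
		printf("gva 0x%lx -> gpa 0x%llx\n", gva,
		       (unsigned long long)tr.physical_address);
	else
		printf("gva 0x%lx is unmapped\n", gva);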
5305 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) | 5187 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) |
5306 | { | 5188 | { |
5307 | struct i387_fxsave_struct *fxsave = | 5189 | struct i387_fxsave_struct *fxsave = |
5308 | &vcpu->arch.guest_fpu.state->fxsave; | 5190 | &vcpu->arch.guest_fpu.state->fxsave; |
5309 | 5191 | ||
5310 | memcpy(fpu->fpr, fxsave->st_space, 128); | 5192 | memcpy(fpu->fpr, fxsave->st_space, 128); |
5311 | fpu->fcw = fxsave->cwd; | 5193 | fpu->fcw = fxsave->cwd; |
5312 | fpu->fsw = fxsave->swd; | 5194 | fpu->fsw = fxsave->swd; |
5313 | fpu->ftwx = fxsave->twd; | 5195 | fpu->ftwx = fxsave->twd; |
5314 | fpu->last_opcode = fxsave->fop; | 5196 | fpu->last_opcode = fxsave->fop; |
5315 | fpu->last_ip = fxsave->rip; | 5197 | fpu->last_ip = fxsave->rip; |
5316 | fpu->last_dp = fxsave->rdp; | 5198 | fpu->last_dp = fxsave->rdp; |
5317 | memcpy(fpu->xmm, fxsave->xmm_space, sizeof fxsave->xmm_space); | 5199 | memcpy(fpu->xmm, fxsave->xmm_space, sizeof fxsave->xmm_space); |
5318 | 5200 | ||
5319 | return 0; | 5201 | return 0; |
5320 | } | 5202 | } |
5321 | 5203 | ||
5322 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) | 5204 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) |
5323 | { | 5205 | { |
5324 | struct i387_fxsave_struct *fxsave = | 5206 | struct i387_fxsave_struct *fxsave = |
5325 | &vcpu->arch.guest_fpu.state->fxsave; | 5207 | &vcpu->arch.guest_fpu.state->fxsave; |
5326 | 5208 | ||
5327 | memcpy(fxsave->st_space, fpu->fpr, 128); | 5209 | memcpy(fxsave->st_space, fpu->fpr, 128); |
5328 | fxsave->cwd = fpu->fcw; | 5210 | fxsave->cwd = fpu->fcw; |
5329 | fxsave->swd = fpu->fsw; | 5211 | fxsave->swd = fpu->fsw; |
5330 | fxsave->twd = fpu->ftwx; | 5212 | fxsave->twd = fpu->ftwx; |
5331 | fxsave->fop = fpu->last_opcode; | 5213 | fxsave->fop = fpu->last_opcode; |
5332 | fxsave->rip = fpu->last_ip; | 5214 | fxsave->rip = fpu->last_ip; |
5333 | fxsave->rdp = fpu->last_dp; | 5215 | fxsave->rdp = fpu->last_dp; |
5334 | memcpy(fxsave->xmm_space, fpu->xmm, sizeof fxsave->xmm_space); | 5216 | memcpy(fxsave->xmm_space, fpu->xmm, sizeof fxsave->xmm_space); |
5335 | 5217 | ||
5336 | return 0; | 5218 | return 0; |
5337 | } | 5219 | } |
5338 | 5220 | ||
5339 | int fx_init(struct kvm_vcpu *vcpu) | 5221 | int fx_init(struct kvm_vcpu *vcpu) |
5340 | { | 5222 | { |
5341 | int err; | 5223 | int err; |
5342 | 5224 | ||
5343 | err = fpu_alloc(&vcpu->arch.guest_fpu); | 5225 | err = fpu_alloc(&vcpu->arch.guest_fpu); |
5344 | if (err) | 5226 | if (err) |
5345 | return err; | 5227 | return err; |
5346 | 5228 | ||
5347 | fpu_finit(&vcpu->arch.guest_fpu); | 5229 | fpu_finit(&vcpu->arch.guest_fpu); |
5348 | 5230 | ||
5349 | /* | 5231 | /* |
5350 | * Ensure guest xcr0 is valid for loading | 5232 | * Ensure guest xcr0 is valid for loading |
5351 | */ | 5233 | */ |
5352 | vcpu->arch.xcr0 = XSTATE_FP; | 5234 | vcpu->arch.xcr0 = XSTATE_FP; |
5353 | 5235 | ||
5354 | vcpu->arch.cr0 |= X86_CR0_ET; | 5236 | vcpu->arch.cr0 |= X86_CR0_ET; |
5355 | 5237 | ||
5356 | return 0; | 5238 | return 0; |
5357 | } | 5239 | } |
5358 | EXPORT_SYMBOL_GPL(fx_init); | 5240 | EXPORT_SYMBOL_GPL(fx_init); |
5359 | 5241 | ||
5360 | static void fx_free(struct kvm_vcpu *vcpu) | 5242 | static void fx_free(struct kvm_vcpu *vcpu) |
5361 | { | 5243 | { |
5362 | fpu_free(&vcpu->arch.guest_fpu); | 5244 | fpu_free(&vcpu->arch.guest_fpu); |
5363 | } | 5245 | } |
5364 | 5246 | ||
5365 | void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) | 5247 | void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) |
5366 | { | 5248 | { |
5367 | if (vcpu->guest_fpu_loaded) | 5249 | if (vcpu->guest_fpu_loaded) |
5368 | return; | 5250 | return; |
5369 | 5251 | ||
5370 | /* | 5252 | /* |
5371 | * Restore every state the guest may use, | 5253 | * Restore every state the guest may use, |
5372 | * and assume the host uses all available bits. | 5254 | * and assume the host uses all available bits. |
5373 | * The guest xcr0 is loaded later. | 5255 | * The guest xcr0 is loaded later. |
5374 | */ | 5256 | */ |
5375 | kvm_put_guest_xcr0(vcpu); | 5257 | kvm_put_guest_xcr0(vcpu); |
5376 | vcpu->guest_fpu_loaded = 1; | 5258 | vcpu->guest_fpu_loaded = 1; |
5377 | unlazy_fpu(current); | 5259 | unlazy_fpu(current); |
5378 | fpu_restore_checking(&vcpu->arch.guest_fpu); | 5260 | fpu_restore_checking(&vcpu->arch.guest_fpu); |
5379 | trace_kvm_fpu(1); | 5261 | trace_kvm_fpu(1); |
5380 | } | 5262 | } |
5381 | 5263 | ||
5382 | void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) | 5264 | void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) |
5383 | { | 5265 | { |
5384 | kvm_put_guest_xcr0(vcpu); | 5266 | kvm_put_guest_xcr0(vcpu); |
5385 | 5267 | ||
5386 | if (!vcpu->guest_fpu_loaded) | 5268 | if (!vcpu->guest_fpu_loaded) |
5387 | return; | 5269 | return; |
5388 | 5270 | ||
5389 | vcpu->guest_fpu_loaded = 0; | 5271 | vcpu->guest_fpu_loaded = 0; |
5390 | fpu_save_init(&vcpu->arch.guest_fpu); | 5272 | fpu_save_init(&vcpu->arch.guest_fpu); |
5391 | ++vcpu->stat.fpu_reload; | 5273 | ++vcpu->stat.fpu_reload; |
5392 | set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests); | 5274 | set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests); |
5393 | trace_kvm_fpu(0); | 5275 | trace_kvm_fpu(0); |
5394 | } | 5276 | } |
5395 | 5277 | ||
5396 | void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) | 5278 | void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) |
5397 | { | 5279 | { |
5398 | if (vcpu->arch.time_page) { | 5280 | if (vcpu->arch.time_page) { |
5399 | kvm_release_page_dirty(vcpu->arch.time_page); | 5281 | kvm_release_page_dirty(vcpu->arch.time_page); |
5400 | vcpu->arch.time_page = NULL; | 5282 | vcpu->arch.time_page = NULL; |
5401 | } | 5283 | } |
5402 | 5284 | ||
5403 | fx_free(vcpu); | 5285 | fx_free(vcpu); |
5404 | kvm_x86_ops->vcpu_free(vcpu); | 5286 | kvm_x86_ops->vcpu_free(vcpu); |
5405 | } | 5287 | } |
5406 | 5288 | ||
5407 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, | 5289 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, |
5408 | unsigned int id) | 5290 | unsigned int id) |
5409 | { | 5291 | { |
5410 | return kvm_x86_ops->vcpu_create(kvm, id); | 5292 | return kvm_x86_ops->vcpu_create(kvm, id); |
5411 | } | 5293 | } |
5412 | 5294 | ||
5413 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) | 5295 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) |
5414 | { | 5296 | { |
5415 | int r; | 5297 | int r; |
5416 | 5298 | ||
5417 | vcpu->arch.mtrr_state.have_fixed = 1; | 5299 | vcpu->arch.mtrr_state.have_fixed = 1; |
5418 | vcpu_load(vcpu); | 5300 | vcpu_load(vcpu); |
5419 | r = kvm_arch_vcpu_reset(vcpu); | 5301 | r = kvm_arch_vcpu_reset(vcpu); |
5420 | if (r == 0) | 5302 | if (r == 0) |
5421 | r = kvm_mmu_setup(vcpu); | 5303 | r = kvm_mmu_setup(vcpu); |
5422 | vcpu_put(vcpu); | 5304 | vcpu_put(vcpu); |
5423 | if (r < 0) | 5305 | if (r < 0) |
5424 | goto free_vcpu; | 5306 | goto free_vcpu; |
5425 | 5307 | ||
5426 | return 0; | 5308 | return 0; |
5427 | free_vcpu: | 5309 | free_vcpu: |
5428 | kvm_x86_ops->vcpu_free(vcpu); | 5310 | kvm_x86_ops->vcpu_free(vcpu); |
5429 | return r; | 5311 | return r; |
5430 | } | 5312 | } |
5431 | 5313 | ||
5432 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) | 5314 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) |
5433 | { | 5315 | { |
5434 | vcpu_load(vcpu); | 5316 | vcpu_load(vcpu); |
5435 | kvm_mmu_unload(vcpu); | 5317 | kvm_mmu_unload(vcpu); |
5436 | vcpu_put(vcpu); | 5318 | vcpu_put(vcpu); |
5437 | 5319 | ||
5438 | fx_free(vcpu); | 5320 | fx_free(vcpu); |
5439 | kvm_x86_ops->vcpu_free(vcpu); | 5321 | kvm_x86_ops->vcpu_free(vcpu); |
5440 | } | 5322 | } |
5441 | 5323 | ||
5442 | int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu) | 5324 | int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu) |
5443 | { | 5325 | { |
5444 | vcpu->arch.nmi_pending = false; | 5326 | vcpu->arch.nmi_pending = false; |
5445 | vcpu->arch.nmi_injected = false; | 5327 | vcpu->arch.nmi_injected = false; |
5446 | 5328 | ||
5447 | vcpu->arch.switch_db_regs = 0; | 5329 | vcpu->arch.switch_db_regs = 0; |
5448 | memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db)); | 5330 | memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db)); |
5449 | vcpu->arch.dr6 = DR6_FIXED_1; | 5331 | vcpu->arch.dr6 = DR6_FIXED_1; |
5450 | vcpu->arch.dr7 = DR7_FIXED_1; | 5332 | vcpu->arch.dr7 = DR7_FIXED_1; |
5451 | 5333 | ||
5452 | return kvm_x86_ops->vcpu_reset(vcpu); | 5334 | return kvm_x86_ops->vcpu_reset(vcpu); |
5453 | } | 5335 | } |
5454 | 5336 | ||
5455 | int kvm_arch_hardware_enable(void *garbage) | 5337 | int kvm_arch_hardware_enable(void *garbage) |
5456 | { | 5338 | { |
5457 | /* | 5339 | /* |
5458 | * Since this may be called from a hotplug notification, | 5340 | * Since this may be called from a hotplug notification, |
5459 | * we can't get the CPU frequency directly. | 5341 | * we can't get the CPU frequency directly. |
5460 | */ | 5342 | */ |
5461 | if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { | 5343 | if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { |
5462 | int cpu = raw_smp_processor_id(); | 5344 | int cpu = raw_smp_processor_id(); |
5463 | per_cpu(cpu_tsc_khz, cpu) = 0; | 5345 | per_cpu(cpu_tsc_khz, cpu) = 0; |
5464 | } | 5346 | } |
5465 | 5347 | ||
5466 | kvm_shared_msr_cpu_online(); | 5348 | kvm_shared_msr_cpu_online(); |
5467 | 5349 | ||
5468 | return kvm_x86_ops->hardware_enable(garbage); | 5350 | return kvm_x86_ops->hardware_enable(garbage); |
5469 | } | 5351 | } |
5470 | 5352 | ||
5471 | void kvm_arch_hardware_disable(void *garbage) | 5353 | void kvm_arch_hardware_disable(void *garbage) |
5472 | { | 5354 | { |
5473 | kvm_x86_ops->hardware_disable(garbage); | 5355 | kvm_x86_ops->hardware_disable(garbage); |
5474 | drop_user_return_notifiers(garbage); | 5356 | drop_user_return_notifiers(garbage); |
5475 | } | 5357 | } |
5476 | 5358 | ||
5477 | int kvm_arch_hardware_setup(void) | 5359 | int kvm_arch_hardware_setup(void) |
5478 | { | 5360 | { |
5479 | return kvm_x86_ops->hardware_setup(); | 5361 | return kvm_x86_ops->hardware_setup(); |
5480 | } | 5362 | } |
5481 | 5363 | ||
5482 | void kvm_arch_hardware_unsetup(void) | 5364 | void kvm_arch_hardware_unsetup(void) |
5483 | { | 5365 | { |
5484 | kvm_x86_ops->hardware_unsetup(); | 5366 | kvm_x86_ops->hardware_unsetup(); |
5485 | } | 5367 | } |
5486 | 5368 | ||
5487 | void kvm_arch_check_processor_compat(void *rtn) | 5369 | void kvm_arch_check_processor_compat(void *rtn) |
5488 | { | 5370 | { |
5489 | kvm_x86_ops->check_processor_compatibility(rtn); | 5371 | kvm_x86_ops->check_processor_compatibility(rtn); |
5490 | } | 5372 | } |
5491 | 5373 | ||
5492 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) | 5374 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) |
5493 | { | 5375 | { |
5494 | struct page *page; | 5376 | struct page *page; |
5495 | struct kvm *kvm; | 5377 | struct kvm *kvm; |
5496 | int r; | 5378 | int r; |
5497 | 5379 | ||
5498 | BUG_ON(vcpu->kvm == NULL); | 5380 | BUG_ON(vcpu->kvm == NULL); |
5499 | kvm = vcpu->kvm; | 5381 | kvm = vcpu->kvm; |
5500 | 5382 | ||
5501 | vcpu->arch.mmu.root_hpa = INVALID_PAGE; | 5383 | vcpu->arch.mmu.root_hpa = INVALID_PAGE; |
5502 | if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu)) | 5384 | if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu)) |
5503 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; | 5385 | vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; |
5504 | else | 5386 | else |
5505 | vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; | 5387 | vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; |
5506 | 5388 | ||
5507 | page = alloc_page(GFP_KERNEL | __GFP_ZERO); | 5389 | page = alloc_page(GFP_KERNEL | __GFP_ZERO); |
5508 | if (!page) { | 5390 | if (!page) { |
5509 | r = -ENOMEM; | 5391 | r = -ENOMEM; |
5510 | goto fail; | 5392 | goto fail; |
5511 | } | 5393 | } |
5512 | vcpu->arch.pio_data = page_address(page); | 5394 | vcpu->arch.pio_data = page_address(page); |
5513 | 5395 | ||
5514 | r = kvm_mmu_create(vcpu); | 5396 | r = kvm_mmu_create(vcpu); |
5515 | if (r < 0) | 5397 | if (r < 0) |
5516 | goto fail_free_pio_data; | 5398 | goto fail_free_pio_data; |
5517 | 5399 | ||
5518 | if (irqchip_in_kernel(kvm)) { | 5400 | if (irqchip_in_kernel(kvm)) { |
5519 | r = kvm_create_lapic(vcpu); | 5401 | r = kvm_create_lapic(vcpu); |
5520 | if (r < 0) | 5402 | if (r < 0) |
5521 | goto fail_mmu_destroy; | 5403 | goto fail_mmu_destroy; |
5522 | } | 5404 | } |
5523 | 5405 | ||
5524 | vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4, | 5406 | vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4, |
5525 | GFP_KERNEL); | 5407 | GFP_KERNEL); |
5526 | if (!vcpu->arch.mce_banks) { | 5408 | if (!vcpu->arch.mce_banks) { |
5527 | r = -ENOMEM; | 5409 | r = -ENOMEM; |
5528 | goto fail_free_lapic; | 5410 | goto fail_free_lapic; |
5529 | } | 5411 | } |
5530 | vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS; | 5412 | vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS; |
5531 | 5413 | ||
5532 | return 0; | 5414 | return 0; |
5533 | fail_free_lapic: | 5415 | fail_free_lapic: |
5534 | kvm_free_lapic(vcpu); | 5416 | kvm_free_lapic(vcpu); |
5535 | fail_mmu_destroy: | 5417 | fail_mmu_destroy: |
5536 | kvm_mmu_destroy(vcpu); | 5418 | kvm_mmu_destroy(vcpu); |
5537 | fail_free_pio_data: | 5419 | fail_free_pio_data: |
5538 | free_page((unsigned long)vcpu->arch.pio_data); | 5420 | free_page((unsigned long)vcpu->arch.pio_data); |
5539 | fail: | 5421 | fail: |
5540 | return r; | 5422 | return r; |
5541 | } | 5423 | } |
5542 | 5424 | ||
5543 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) | 5425 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) |
5544 | { | 5426 | { |
5545 | int idx; | 5427 | int idx; |
5546 | 5428 | ||
5547 | kfree(vcpu->arch.mce_banks); | 5429 | kfree(vcpu->arch.mce_banks); |
5548 | kvm_free_lapic(vcpu); | 5430 | kvm_free_lapic(vcpu); |
5549 | idx = srcu_read_lock(&vcpu->kvm->srcu); | 5431 | idx = srcu_read_lock(&vcpu->kvm->srcu); |
5550 | kvm_mmu_destroy(vcpu); | 5432 | kvm_mmu_destroy(vcpu); |
5551 | srcu_read_unlock(&vcpu->kvm->srcu, idx); | 5433 | srcu_read_unlock(&vcpu->kvm->srcu, idx); |
5552 | free_page((unsigned long)vcpu->arch.pio_data); | 5434 | free_page((unsigned long)vcpu->arch.pio_data); |
5553 | } | 5435 | } |
5554 | 5436 | ||
5555 | struct kvm *kvm_arch_create_vm(void) | 5437 | struct kvm *kvm_arch_create_vm(void) |
5556 | { | 5438 | { |
5557 | struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); | 5439 | struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); |
5558 | 5440 | ||
5559 | if (!kvm) | 5441 | if (!kvm) |
5560 | return ERR_PTR(-ENOMEM); | 5442 | return ERR_PTR(-ENOMEM); |
5561 | 5443 | ||
5562 | kvm->arch.aliases = kzalloc(sizeof(struct kvm_mem_aliases), GFP_KERNEL); | ||
5563 | if (!kvm->arch.aliases) { | ||
5564 | kfree(kvm); | ||
5565 | return ERR_PTR(-ENOMEM); | ||
5566 | } | ||
5567 | |||
5568 | INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); | 5444 | INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); |
5569 | INIT_LIST_HEAD(&kvm->arch.assigned_dev_head); | 5445 | INIT_LIST_HEAD(&kvm->arch.assigned_dev_head); |
5570 | 5446 | ||
5571 | /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ | 5447 | /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ |
5572 | set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); | 5448 | set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); |
5573 | 5449 | ||
5574 | rdtscll(kvm->arch.vm_init_tsc); | 5450 | rdtscll(kvm->arch.vm_init_tsc); |
5575 | 5451 | ||
5576 | return kvm; | 5452 | return kvm; |
5577 | } | 5453 | } |
5578 | 5454 | ||
5579 | static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu) | 5455 | static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu) |
5580 | { | 5456 | { |
5581 | vcpu_load(vcpu); | 5457 | vcpu_load(vcpu); |
5582 | kvm_mmu_unload(vcpu); | 5458 | kvm_mmu_unload(vcpu); |
5583 | vcpu_put(vcpu); | 5459 | vcpu_put(vcpu); |
5584 | } | 5460 | } |
5585 | 5461 | ||
5586 | static void kvm_free_vcpus(struct kvm *kvm) | 5462 | static void kvm_free_vcpus(struct kvm *kvm) |
5587 | { | 5463 | { |
5588 | unsigned int i; | 5464 | unsigned int i; |
5589 | struct kvm_vcpu *vcpu; | 5465 | struct kvm_vcpu *vcpu; |
5590 | 5466 | ||
5591 | /* | 5467 | /* |
5592 | * Unpin any mmu pages first. | 5468 | * Unpin any mmu pages first. |
5593 | */ | 5469 | */ |
5594 | kvm_for_each_vcpu(i, vcpu, kvm) | 5470 | kvm_for_each_vcpu(i, vcpu, kvm) |
5595 | kvm_unload_vcpu_mmu(vcpu); | 5471 | kvm_unload_vcpu_mmu(vcpu); |
5596 | kvm_for_each_vcpu(i, vcpu, kvm) | 5472 | kvm_for_each_vcpu(i, vcpu, kvm) |
5597 | kvm_arch_vcpu_free(vcpu); | 5473 | kvm_arch_vcpu_free(vcpu); |
5598 | 5474 | ||
5599 | mutex_lock(&kvm->lock); | 5475 | mutex_lock(&kvm->lock); |
5600 | for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) | 5476 | for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) |
5601 | kvm->vcpus[i] = NULL; | 5477 | kvm->vcpus[i] = NULL; |
5602 | 5478 | ||
5603 | atomic_set(&kvm->online_vcpus, 0); | 5479 | atomic_set(&kvm->online_vcpus, 0); |
5604 | mutex_unlock(&kvm->lock); | 5480 | mutex_unlock(&kvm->lock); |
5605 | } | 5481 | } |
5606 | 5482 | ||
5607 | void kvm_arch_sync_events(struct kvm *kvm) | 5483 | void kvm_arch_sync_events(struct kvm *kvm) |
5608 | { | 5484 | { |
5609 | kvm_free_all_assigned_devices(kvm); | 5485 | kvm_free_all_assigned_devices(kvm); |
5610 | } | 5486 | } |
5611 | 5487 | ||
5612 | void kvm_arch_destroy_vm(struct kvm *kvm) | 5488 | void kvm_arch_destroy_vm(struct kvm *kvm) |
5613 | { | 5489 | { |
5614 | kvm_iommu_unmap_guest(kvm); | 5490 | kvm_iommu_unmap_guest(kvm); |
5615 | kvm_free_pit(kvm); | 5491 | kvm_free_pit(kvm); |
5616 | kfree(kvm->arch.vpic); | 5492 | kfree(kvm->arch.vpic); |
5617 | kfree(kvm->arch.vioapic); | 5493 | kfree(kvm->arch.vioapic); |
5618 | kvm_free_vcpus(kvm); | 5494 | kvm_free_vcpus(kvm); |
5619 | kvm_free_physmem(kvm); | 5495 | kvm_free_physmem(kvm); |
5620 | if (kvm->arch.apic_access_page) | 5496 | if (kvm->arch.apic_access_page) |
5621 | put_page(kvm->arch.apic_access_page); | 5497 | put_page(kvm->arch.apic_access_page); |
5622 | if (kvm->arch.ept_identity_pagetable) | 5498 | if (kvm->arch.ept_identity_pagetable) |
5623 | put_page(kvm->arch.ept_identity_pagetable); | 5499 | put_page(kvm->arch.ept_identity_pagetable); |
5624 | cleanup_srcu_struct(&kvm->srcu); | 5500 | cleanup_srcu_struct(&kvm->srcu); |
5625 | kfree(kvm->arch.aliases); | ||
5626 | kfree(kvm); | 5501 | kfree(kvm); |
5627 | } | 5502 | } |
5628 | 5503 | ||
5629 | int kvm_arch_prepare_memory_region(struct kvm *kvm, | 5504 | int kvm_arch_prepare_memory_region(struct kvm *kvm, |
5630 | struct kvm_memory_slot *memslot, | 5505 | struct kvm_memory_slot *memslot, |
5631 | struct kvm_memory_slot old, | 5506 | struct kvm_memory_slot old, |
5632 | struct kvm_userspace_memory_region *mem, | 5507 | struct kvm_userspace_memory_region *mem, |
5633 | int user_alloc) | 5508 | int user_alloc) |
5634 | { | 5509 | { |
5635 | int npages = memslot->npages; | 5510 | int npages = memslot->npages; |
5636 | 5511 | ||
5637 | /* To keep backward compatibility with older userspace, | 5512 | /* To keep backward compatibility with older userspace, |
5638 | * x86 needs to handle the !user_alloc case. | 5513 | * x86 needs to handle the !user_alloc case. |
5639 | */ | 5514 | */ |
5640 | if (!user_alloc) { | 5515 | if (!user_alloc) { |
5641 | if (npages && !old.rmap) { | 5516 | if (npages && !old.rmap) { |
5642 | unsigned long userspace_addr; | 5517 | unsigned long userspace_addr; |
5643 | 5518 | ||
5644 | down_write(¤t->mm->mmap_sem); | 5519 | down_write(¤t->mm->mmap_sem); |
5645 | userspace_addr = do_mmap(NULL, 0, | 5520 | userspace_addr = do_mmap(NULL, 0, |
5646 | npages * PAGE_SIZE, | 5521 | npages * PAGE_SIZE, |
5647 | PROT_READ | PROT_WRITE, | 5522 | PROT_READ | PROT_WRITE, |
5648 | MAP_PRIVATE | MAP_ANONYMOUS, | 5523 | MAP_PRIVATE | MAP_ANONYMOUS, |
5649 | 0); | 5524 | 0); |
5650 | up_write(¤t->mm->mmap_sem); | 5525 | up_write(¤t->mm->mmap_sem); |
5651 | 5526 | ||
5652 | if (IS_ERR((void *)userspace_addr)) | 5527 | if (IS_ERR((void *)userspace_addr)) |
5653 | return PTR_ERR((void *)userspace_addr); | 5528 | return PTR_ERR((void *)userspace_addr); |
5654 | 5529 | ||
5655 | memslot->userspace_addr = userspace_addr; | 5530 | memslot->userspace_addr = userspace_addr; |
5656 | } | 5531 | } |
5657 | } | 5532 | } |
5658 | 5533 | ||
5659 | 5534 | ||
5660 | return 0; | 5535 | return 0; |
5661 | } | 5536 | } |
5662 | 5537 | ||
5663 | void kvm_arch_commit_memory_region(struct kvm *kvm, | 5538 | void kvm_arch_commit_memory_region(struct kvm *kvm, |
5664 | struct kvm_userspace_memory_region *mem, | 5539 | struct kvm_userspace_memory_region *mem, |
5665 | struct kvm_memory_slot old, | 5540 | struct kvm_memory_slot old, |
5666 | int user_alloc) | 5541 | int user_alloc) |
5667 | { | 5542 | { |
5668 | 5543 | ||
5669 | int npages = mem->memory_size >> PAGE_SHIFT; | 5544 | int npages = mem->memory_size >> PAGE_SHIFT; |
5670 | 5545 | ||
5671 | if (!user_alloc && !old.user_alloc && old.rmap && !npages) { | 5546 | if (!user_alloc && !old.user_alloc && old.rmap && !npages) { |
5672 | int ret; | 5547 | int ret; |
5673 | 5548 | ||
5674 | down_write(¤t->mm->mmap_sem); | 5549 | down_write(¤t->mm->mmap_sem); |
5675 | ret = do_munmap(current->mm, old.userspace_addr, | 5550 | ret = do_munmap(current->mm, old.userspace_addr, |
5676 | old.npages * PAGE_SIZE); | 5551 | old.npages * PAGE_SIZE); |
5677 | up_write(¤t->mm->mmap_sem); | 5552 | up_write(¤t->mm->mmap_sem); |
5678 | if (ret < 0) | 5553 | if (ret < 0) |
5679 | printk(KERN_WARNING | 5554 | printk(KERN_WARNING |
5680 | "kvm_vm_ioctl_set_memory_region: " | 5555 | "kvm_vm_ioctl_set_memory_region: " |
5681 | "failed to munmap memory\n"); | 5556 | "failed to munmap memory\n"); |
5682 | } | 5557 | } |
5683 | 5558 | ||
5684 | spin_lock(&kvm->mmu_lock); | 5559 | spin_lock(&kvm->mmu_lock); |
5685 | if (!kvm->arch.n_requested_mmu_pages) { | 5560 | if (!kvm->arch.n_requested_mmu_pages) { |
5686 | unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm); | 5561 | unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm); |
5687 | kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages); | 5562 | kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages); |
5688 | } | 5563 | } |
5689 | 5564 | ||
5690 | kvm_mmu_slot_remove_write_access(kvm, mem->slot); | 5565 | kvm_mmu_slot_remove_write_access(kvm, mem->slot); |
5691 | spin_unlock(&kvm->mmu_lock); | 5566 | spin_unlock(&kvm->mmu_lock); |
5692 | } | 5567 | } |
5693 | 5568 | ||
5694 | void kvm_arch_flush_shadow(struct kvm *kvm) | 5569 | void kvm_arch_flush_shadow(struct kvm *kvm) |
5695 | { | 5570 | { |
5696 | kvm_mmu_zap_all(kvm); | 5571 | kvm_mmu_zap_all(kvm); |
5697 | kvm_reload_remote_mmus(kvm); | 5572 | kvm_reload_remote_mmus(kvm); |
5698 | } | 5573 | } |
5699 | 5574 | ||
5700 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) | 5575 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) |
5701 | { | 5576 | { |
5702 | return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE | 5577 | return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE |
5703 | || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED | 5578 | || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED |
5704 | || vcpu->arch.nmi_pending || | 5579 | || vcpu->arch.nmi_pending || |
5705 | (kvm_arch_interrupt_allowed(vcpu) && | 5580 | (kvm_arch_interrupt_allowed(vcpu) && |
5706 | kvm_cpu_has_interrupt(vcpu)); | 5581 | kvm_cpu_has_interrupt(vcpu)); |
5707 | } | 5582 | } |
5708 | 5583 | ||
5709 | void kvm_vcpu_kick(struct kvm_vcpu *vcpu) | 5584 | void kvm_vcpu_kick(struct kvm_vcpu *vcpu) |
5710 | { | 5585 | { |
5711 | int me; | 5586 | int me; |
5712 | int cpu = vcpu->cpu; | 5587 | int cpu = vcpu->cpu; |
5713 | 5588 | ||
5714 | if (waitqueue_active(&vcpu->wq)) { | 5589 | if (waitqueue_active(&vcpu->wq)) { |
5715 | wake_up_interruptible(&vcpu->wq); | 5590 | wake_up_interruptible(&vcpu->wq); |
5716 | ++vcpu->stat.halt_wakeup; | 5591 | ++vcpu->stat.halt_wakeup; |
5717 | } | 5592 | } |
5718 | 5593 | ||
5719 | me = get_cpu(); | 5594 | me = get_cpu(); |
5720 | if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) | 5595 | if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) |
5721 | if (atomic_xchg(&vcpu->guest_mode, 0)) | 5596 | if (atomic_xchg(&vcpu->guest_mode, 0)) |
5722 | smp_send_reschedule(cpu); | 5597 | smp_send_reschedule(cpu); |
5723 | put_cpu(); | 5598 | put_cpu(); |
5724 | } | 5599 | } |
5725 | 5600 | ||
5726 | int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu) | 5601 | int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu) |
5727 | { | 5602 | { |
5728 | return kvm_x86_ops->interrupt_allowed(vcpu); | 5603 | return kvm_x86_ops->interrupt_allowed(vcpu); |
5729 | } | 5604 | } |
5730 | 5605 | ||
5731 | bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip) | 5606 | bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip) |
5732 | { | 5607 | { |
5733 | unsigned long current_rip = kvm_rip_read(vcpu) + | 5608 | unsigned long current_rip = kvm_rip_read(vcpu) + |
5734 | get_segment_base(vcpu, VCPU_SREG_CS); | 5609 | get_segment_base(vcpu, VCPU_SREG_CS); |
5735 | 5610 | ||
5736 | return current_rip == linear_rip; | 5611 | return current_rip == linear_rip; |
5737 | } | 5612 | } |
5738 | EXPORT_SYMBOL_GPL(kvm_is_linear_rip); | 5613 | EXPORT_SYMBOL_GPL(kvm_is_linear_rip); |
5739 | 5614 | ||
5740 | unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu) | 5615 | unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu) |
5741 | { | 5616 | { |
5742 | unsigned long rflags; | 5617 | unsigned long rflags; |
5743 | 5618 | ||
5744 | rflags = kvm_x86_ops->get_rflags(vcpu); | 5619 | rflags = kvm_x86_ops->get_rflags(vcpu); |
5745 | if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) | 5620 | if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) |
5746 | rflags &= ~X86_EFLAGS_TF; | 5621 | rflags &= ~X86_EFLAGS_TF; |
5747 | return rflags; | 5622 | return rflags; |
5748 | } | 5623 | } |
5749 | EXPORT_SYMBOL_GPL(kvm_get_rflags); | 5624 | EXPORT_SYMBOL_GPL(kvm_get_rflags); |
5750 | 5625 | ||
5751 | void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) | 5626 | void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) |
5752 | { | 5627 | { |
5753 | if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP && | 5628 | if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP && |
5754 | kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip)) | 5629 | kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip)) |
5755 | rflags |= X86_EFLAGS_TF; | 5630 | rflags |= X86_EFLAGS_TF; |
5756 | kvm_x86_ops->set_rflags(vcpu, rflags); | 5631 | kvm_x86_ops->set_rflags(vcpu, rflags); |
5757 | } | 5632 | } |
5758 | EXPORT_SYMBOL_GPL(kvm_set_rflags); | 5633 | EXPORT_SYMBOL_GPL(kvm_set_rflags); |
5759 | 5634 | ||
5760 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); | 5635 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); |
5761 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); | 5636 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); |
5762 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); | 5637 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); |
5763 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr); | 5638 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr); |
5764 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr); | 5639 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr); |
5765 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun); | 5640 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun); |
5766 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit); | 5641 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit); |
5767 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject); | 5642 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject); |
5768 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit); | 5643 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit); |
5769 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga); | 5644 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga); |
5770 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit); | 5645 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit); |
5771 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts); | 5646 | EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts); |
5772 | 5647 |
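With kvm->arch.aliases gone from kvm_arch_create_vm() and kvm_arch_destroy_vm(), the aliasing effect (one host backing visible at two guest physical addresses) is expressed purely with memory slots. A hedged userspace sketch; vm_fd, the slot numbers and the addresses are invented for illustration:

	/* One anonymous host mapping, registered as two guest slots. */
	void *backing = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
			     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	struct kvm_userspace_memory_region lowmem = {
		.slot            = 0,
		.guest_phys_addr = 0x000a0000,
		.memory_size     = 0x10000,
		.userspace_addr  = (__u64)(unsigned long)backing,
	};
	struct kvm_userspace_memory_region alias = {
		.slot            = 1,		/* distinct slot ...     */
		.guest_phys_addr = 0xfffa0000,	/* ... second guest view */
		.memory_size     = 0x10000,
		.userspace_addr  = (__u64)(unsigned long)backing, /* same pages */
	};

	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &lowmem);
	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &alias);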
arch/x86/kvm/x86.h
1 | #ifndef ARCH_X86_KVM_X86_H | 1 | #ifndef ARCH_X86_KVM_X86_H |
2 | #define ARCH_X86_KVM_X86_H | 2 | #define ARCH_X86_KVM_X86_H |
3 | 3 | ||
4 | #include <linux/kvm_host.h> | 4 | #include <linux/kvm_host.h> |
5 | #include "kvm_cache_regs.h" | 5 | #include "kvm_cache_regs.h" |
6 | 6 | ||
7 | static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) | 7 | static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) |
8 | { | 8 | { |
9 | vcpu->arch.exception.pending = false; | 9 | vcpu->arch.exception.pending = false; |
10 | } | 10 | } |
11 | 11 | ||
12 | static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector, | 12 | static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector, |
13 | bool soft) | 13 | bool soft) |
14 | { | 14 | { |
15 | vcpu->arch.interrupt.pending = true; | 15 | vcpu->arch.interrupt.pending = true; |
16 | vcpu->arch.interrupt.soft = soft; | 16 | vcpu->arch.interrupt.soft = soft; |
17 | vcpu->arch.interrupt.nr = vector; | 17 | vcpu->arch.interrupt.nr = vector; |
18 | } | 18 | } |
19 | 19 | ||
20 | static inline void kvm_clear_interrupt_queue(struct kvm_vcpu *vcpu) | 20 | static inline void kvm_clear_interrupt_queue(struct kvm_vcpu *vcpu) |
21 | { | 21 | { |
22 | vcpu->arch.interrupt.pending = false; | 22 | vcpu->arch.interrupt.pending = false; |
23 | } | 23 | } |
24 | 24 | ||
25 | static inline bool kvm_event_needs_reinjection(struct kvm_vcpu *vcpu) | 25 | static inline bool kvm_event_needs_reinjection(struct kvm_vcpu *vcpu) |
26 | { | 26 | { |
27 | return vcpu->arch.exception.pending || vcpu->arch.interrupt.pending || | 27 | return vcpu->arch.exception.pending || vcpu->arch.interrupt.pending || |
28 | vcpu->arch.nmi_injected; | 28 | vcpu->arch.nmi_injected; |
29 | } | 29 | } |
30 | 30 | ||
31 | static inline bool kvm_exception_is_soft(unsigned int nr) | 31 | static inline bool kvm_exception_is_soft(unsigned int nr) |
32 | { | 32 | { |
33 | return (nr == BP_VECTOR) || (nr == OF_VECTOR); | 33 | return (nr == BP_VECTOR) || (nr == OF_VECTOR); |
34 | } | 34 | } |
35 | 35 | ||
36 | struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, | 36 | struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, |
37 | u32 function, u32 index); | 37 | u32 function, u32 index); |
38 | 38 | ||
39 | static inline bool is_protmode(struct kvm_vcpu *vcpu) | 39 | static inline bool is_protmode(struct kvm_vcpu *vcpu) |
40 | { | 40 | { |
41 | return kvm_read_cr0_bits(vcpu, X86_CR0_PE); | 41 | return kvm_read_cr0_bits(vcpu, X86_CR0_PE); |
42 | } | 42 | } |
43 | 43 | ||
44 | static inline int is_long_mode(struct kvm_vcpu *vcpu) | 44 | static inline int is_long_mode(struct kvm_vcpu *vcpu) |
45 | { | 45 | { |
46 | #ifdef CONFIG_X86_64 | 46 | #ifdef CONFIG_X86_64 |
47 | return vcpu->arch.efer & EFER_LMA; | 47 | return vcpu->arch.efer & EFER_LMA; |
48 | #else | 48 | #else |
49 | return 0; | 49 | return 0; |
50 | #endif | 50 | #endif |
51 | } | 51 | } |
52 | 52 | ||
53 | static inline int is_pae(struct kvm_vcpu *vcpu) | 53 | static inline int is_pae(struct kvm_vcpu *vcpu) |
54 | { | 54 | { |
55 | return kvm_read_cr4_bits(vcpu, X86_CR4_PAE); | 55 | return kvm_read_cr4_bits(vcpu, X86_CR4_PAE); |
56 | } | 56 | } |
57 | 57 | ||
58 | static inline int is_pse(struct kvm_vcpu *vcpu) | 58 | static inline int is_pse(struct kvm_vcpu *vcpu) |
59 | { | 59 | { |
60 | return kvm_read_cr4_bits(vcpu, X86_CR4_PSE); | 60 | return kvm_read_cr4_bits(vcpu, X86_CR4_PSE); |
61 | } | 61 | } |
62 | 62 | ||
63 | static inline int is_paging(struct kvm_vcpu *vcpu) | 63 | static inline int is_paging(struct kvm_vcpu *vcpu) |
64 | { | 64 | { |
65 | return kvm_read_cr0_bits(vcpu, X86_CR0_PG); | 65 | return kvm_read_cr0_bits(vcpu, X86_CR0_PG); |
66 | } | 66 | } |
67 | 67 | ||
68 | static inline struct kvm_mem_aliases *kvm_aliases(struct kvm *kvm) | ||
69 | { | ||
70 | return rcu_dereference_check(kvm->arch.aliases, | ||
71 | srcu_read_lock_held(&kvm->srcu) | ||
72 | || lockdep_is_held(&kvm->slots_lock)); | ||
73 | } | ||
74 | |||
75 | void kvm_before_handle_nmi(struct kvm_vcpu *vcpu); | 68 | void kvm_before_handle_nmi(struct kvm_vcpu *vcpu); |
76 | void kvm_after_handle_nmi(struct kvm_vcpu *vcpu); | 69 | void kvm_after_handle_nmi(struct kvm_vcpu *vcpu); |
77 | 70 | ||
78 | #endif | 71 | #endif |
79 | 72 |
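kvm_aliases() was the last rcu_dereference_check() accessor for the alias table; the same SRCU read-side discipline still protects memslot lookups. A sketch of the pattern, mirroring the removed accessor (illustrative only; the exact wrappers in kvm_host.h may differ):

	int idx = srcu_read_lock(&kvm->srcu);
	struct kvm_memslots *slots;

	slots = rcu_dereference_check(kvm->memslots,
				      srcu_read_lock_held(&kvm->srcu));
	/* ... walk slots->memslots[] while the read lock is held ... */
	srcu_read_unlock(&kvm->srcu, idx);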
include/linux/kvm.h
1 | #ifndef __LINUX_KVM_H | 1 | #ifndef __LINUX_KVM_H |
2 | #define __LINUX_KVM_H | 2 | #define __LINUX_KVM_H |
3 | 3 | ||
4 | /* | 4 | /* |
5 | * Userspace interface for /dev/kvm - kernel based virtual machine | 5 | * Userspace interface for /dev/kvm - kernel based virtual machine |
6 | * | 6 | * |
7 | * Note: you must update KVM_API_VERSION if you change this interface. | 7 | * Note: you must update KVM_API_VERSION if you change this interface. |
8 | */ | 8 | */ |
9 | 9 | ||
10 | #include <linux/types.h> | 10 | #include <linux/types.h> |
11 | #include <linux/compiler.h> | 11 | #include <linux/compiler.h> |
12 | #include <linux/ioctl.h> | 12 | #include <linux/ioctl.h> |
13 | #include <asm/kvm.h> | 13 | #include <asm/kvm.h> |
14 | 14 | ||
15 | #define KVM_API_VERSION 12 | 15 | #define KVM_API_VERSION 12 |
16 | 16 | ||
17 | /* *** Deprecated interfaces *** */ | 17 | /* *** Deprecated interfaces *** */ |
18 | 18 | ||
19 | #define KVM_TRC_SHIFT 16 | 19 | #define KVM_TRC_SHIFT 16 |
20 | 20 | ||
21 | #define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT) | 21 | #define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT) |
22 | #define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1)) | 22 | #define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1)) |
23 | 23 | ||
24 | #define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01) | 24 | #define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01) |
25 | #define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02) | 25 | #define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02) |
26 | #define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01) | 26 | #define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01) |
27 | 27 | ||
28 | #define KVM_TRC_HEAD_SIZE 12 | 28 | #define KVM_TRC_HEAD_SIZE 12 |
29 | #define KVM_TRC_CYCLE_SIZE 8 | 29 | #define KVM_TRC_CYCLE_SIZE 8 |
30 | #define KVM_TRC_EXTRA_MAX 7 | 30 | #define KVM_TRC_EXTRA_MAX 7 |
31 | 31 | ||
32 | #define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02) | 32 | #define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02) |
33 | #define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03) | 33 | #define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03) |
34 | #define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04) | 34 | #define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04) |
35 | #define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05) | 35 | #define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05) |
36 | #define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06) | 36 | #define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06) |
37 | #define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07) | 37 | #define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07) |
38 | #define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08) | 38 | #define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08) |
39 | #define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09) | 39 | #define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09) |
40 | #define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A) | 40 | #define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A) |
41 | #define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B) | 41 | #define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B) |
42 | #define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C) | 42 | #define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C) |
43 | #define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D) | 43 | #define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D) |
44 | #define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E) | 44 | #define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E) |
45 | #define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F) | 45 | #define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F) |
46 | #define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10) | 46 | #define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10) |
47 | #define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11) | 47 | #define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11) |
48 | #define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12) | 48 | #define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12) |
49 | #define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13) | 49 | #define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13) |
50 | #define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14) | 50 | #define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14) |
51 | #define KVM_TRC_TDP_FAULT (KVM_TRC_HANDLER + 0x15) | 51 | #define KVM_TRC_TDP_FAULT (KVM_TRC_HANDLER + 0x15) |
52 | #define KVM_TRC_GTLB_WRITE (KVM_TRC_HANDLER + 0x16) | 52 | #define KVM_TRC_GTLB_WRITE (KVM_TRC_HANDLER + 0x16) |
53 | #define KVM_TRC_STLB_WRITE (KVM_TRC_HANDLER + 0x17) | 53 | #define KVM_TRC_STLB_WRITE (KVM_TRC_HANDLER + 0x17) |
54 | #define KVM_TRC_STLB_INVAL (KVM_TRC_HANDLER + 0x18) | 54 | #define KVM_TRC_STLB_INVAL (KVM_TRC_HANDLER + 0x18) |
55 | #define KVM_TRC_PPC_INSTR (KVM_TRC_HANDLER + 0x19) | 55 | #define KVM_TRC_PPC_INSTR (KVM_TRC_HANDLER + 0x19) |
56 | 56 | ||
57 | struct kvm_user_trace_setup { | 57 | struct kvm_user_trace_setup { |
58 | __u32 buf_size; | 58 | __u32 buf_size; |
59 | __u32 buf_nr; | 59 | __u32 buf_nr; |
60 | }; | 60 | }; |
61 | 61 | ||
62 | #define __KVM_DEPRECATED_MAIN_W_0x06 \ | 62 | #define __KVM_DEPRECATED_MAIN_W_0x06 \ |
63 | _IOW(KVMIO, 0x06, struct kvm_user_trace_setup) | 63 | _IOW(KVMIO, 0x06, struct kvm_user_trace_setup) |
64 | #define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07) | 64 | #define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07) |
65 | #define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08) | 65 | #define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08) |
66 | 66 | ||
67 | #define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq) | 67 | #define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq) |
68 | 68 | ||
69 | struct kvm_breakpoint { | 69 | struct kvm_breakpoint { |
70 | __u32 enabled; | 70 | __u32 enabled; |
71 | __u32 padding; | 71 | __u32 padding; |
72 | __u64 address; | 72 | __u64 address; |
73 | }; | 73 | }; |
74 | 74 | ||
75 | struct kvm_debug_guest { | 75 | struct kvm_debug_guest { |
76 | __u32 enabled; | 76 | __u32 enabled; |
77 | __u32 pad; | 77 | __u32 pad; |
78 | struct kvm_breakpoint breakpoints[4]; | 78 | struct kvm_breakpoint breakpoints[4]; |
79 | __u32 singlestep; | 79 | __u32 singlestep; |
80 | }; | 80 | }; |
81 | 81 | ||
82 | #define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest) | 82 | #define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest) |
83 | 83 | ||
84 | /* *** End of deprecated interfaces *** */ | 84 | /* *** End of deprecated interfaces *** */ |
85 | 85 | ||
86 | 86 | ||
87 | /* for KVM_CREATE_MEMORY_REGION */ | 87 | /* for KVM_CREATE_MEMORY_REGION */ |
88 | struct kvm_memory_region { | 88 | struct kvm_memory_region { |
89 | __u32 slot; | 89 | __u32 slot; |
90 | __u32 flags; | 90 | __u32 flags; |
91 | __u64 guest_phys_addr; | 91 | __u64 guest_phys_addr; |
92 | __u64 memory_size; /* bytes */ | 92 | __u64 memory_size; /* bytes */ |
93 | }; | 93 | }; |
94 | 94 | ||
95 | /* for KVM_SET_USER_MEMORY_REGION */ | 95 | /* for KVM_SET_USER_MEMORY_REGION */ |
96 | struct kvm_userspace_memory_region { | 96 | struct kvm_userspace_memory_region { |
97 | __u32 slot; | 97 | __u32 slot; |
98 | __u32 flags; | 98 | __u32 flags; |
99 | __u64 guest_phys_addr; | 99 | __u64 guest_phys_addr; |
100 | __u64 memory_size; /* bytes */ | 100 | __u64 memory_size; /* bytes */ |
101 | __u64 userspace_addr; /* start of the userspace allocated memory */ | 101 | __u64 userspace_addr; /* start of the userspace allocated memory */ |
102 | }; | 102 | }; |
103 | 103 | ||
104 | /* for kvm_memory_region::flags */ | 104 | /* for kvm_memory_region::flags */ |
105 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL | 105 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL |
106 | #define KVM_MEMSLOT_INVALID (1UL << 1) | 106 | #define KVM_MEMSLOT_INVALID (1UL << 1) |
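Nothing here ties a userspace_addr to a single slot, so the effect of the now-obsolete KVM_SET_MEMORY_ALIAS (marked as such in the VM ioctl list further down) can be had by installing two slots that point at the same host memory. A userspace sketch — the fd, addresses and sizes are made up for illustration:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Expose the same 1 MiB of host memory (host_mem, mmap()ed elsewhere)
 * at two guest-physical addresses by using two slots. */
static int map_twice(int vm_fd, void *host_mem)
{
	struct kvm_userspace_memory_region region;

	memset(&region, 0, sizeof(region));
	region.slot = 0;
	region.guest_phys_addr = 0x100000;	/* first view */
	region.memory_size = 0x100000;		/* 1 MiB */
	region.userspace_addr = (__u64)(unsigned long)host_mem;
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
		return -1;

	region.slot = 1;
	region.guest_phys_addr = 0xe0000000;	/* second view, same memory */
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}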
107 | 107 | ||
108 | /* for KVM_IRQ_LINE */ | 108 | /* for KVM_IRQ_LINE */ |
109 | struct kvm_irq_level { | 109 | struct kvm_irq_level { |
110 | /* | 110 | /* |
111 | * ACPI gsi notion of irq. | 111 | * ACPI gsi notion of irq. |
112 | * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47.. | 112 | * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47.. |
113 | * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23.. | 113 | * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23.. |
114 | */ | 114 | */ |
115 | union { | 115 | union { |
116 | __u32 irq; | 116 | __u32 irq; |
117 | __s32 status; | 117 | __s32 status; |
118 | }; | 118 | }; |
119 | __u32 level; | 119 | __u32 level; |
120 | }; | 120 | }; |
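As a concrete illustration of the union: raising and then lowering a line through the KVM_IRQ_LINE ioctl (defined in the VM ioctl list below) only touches irq and level; status is the kernel's to fill when KVM_IRQ_LINE_STATUS is used instead. A sketch, where vm_fd and gsi are placeholders:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Pulse a level-triggered interrupt line: assert, then deassert. */
static int pulse_gsi(int vm_fd, unsigned int gsi)
{
	struct kvm_irq_level irq = { .irq = gsi, .level = 1 };

	if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)
		return -1;
	irq.level = 0;
	return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}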
121 | 121 | ||
122 | 122 | ||
123 | struct kvm_irqchip { | 123 | struct kvm_irqchip { |
124 | __u32 chip_id; | 124 | __u32 chip_id; |
125 | __u32 pad; | 125 | __u32 pad; |
126 | union { | 126 | union { |
127 | char dummy[512]; /* reserving space */ | 127 | char dummy[512]; /* reserving space */ |
128 | #ifdef __KVM_HAVE_PIT | 128 | #ifdef __KVM_HAVE_PIT |
129 | struct kvm_pic_state pic; | 129 | struct kvm_pic_state pic; |
130 | #endif | 130 | #endif |
131 | #ifdef __KVM_HAVE_IOAPIC | 131 | #ifdef __KVM_HAVE_IOAPIC |
132 | struct kvm_ioapic_state ioapic; | 132 | struct kvm_ioapic_state ioapic; |
133 | #endif | 133 | #endif |
134 | } chip; | 134 | } chip; |
135 | }; | 135 | }; |
136 | 136 | ||
137 | /* for KVM_CREATE_PIT2 */ | 137 | /* for KVM_CREATE_PIT2 */ |
138 | struct kvm_pit_config { | 138 | struct kvm_pit_config { |
139 | __u32 flags; | 139 | __u32 flags; |
140 | __u32 pad[15]; | 140 | __u32 pad[15]; |
141 | }; | 141 | }; |
142 | 142 | ||
143 | #define KVM_PIT_SPEAKER_DUMMY 1 | 143 | #define KVM_PIT_SPEAKER_DUMMY 1 |
144 | 144 | ||
145 | #define KVM_EXIT_UNKNOWN 0 | 145 | #define KVM_EXIT_UNKNOWN 0 |
146 | #define KVM_EXIT_EXCEPTION 1 | 146 | #define KVM_EXIT_EXCEPTION 1 |
147 | #define KVM_EXIT_IO 2 | 147 | #define KVM_EXIT_IO 2 |
148 | #define KVM_EXIT_HYPERCALL 3 | 148 | #define KVM_EXIT_HYPERCALL 3 |
149 | #define KVM_EXIT_DEBUG 4 | 149 | #define KVM_EXIT_DEBUG 4 |
150 | #define KVM_EXIT_HLT 5 | 150 | #define KVM_EXIT_HLT 5 |
151 | #define KVM_EXIT_MMIO 6 | 151 | #define KVM_EXIT_MMIO 6 |
152 | #define KVM_EXIT_IRQ_WINDOW_OPEN 7 | 152 | #define KVM_EXIT_IRQ_WINDOW_OPEN 7 |
153 | #define KVM_EXIT_SHUTDOWN 8 | 153 | #define KVM_EXIT_SHUTDOWN 8 |
154 | #define KVM_EXIT_FAIL_ENTRY 9 | 154 | #define KVM_EXIT_FAIL_ENTRY 9 |
155 | #define KVM_EXIT_INTR 10 | 155 | #define KVM_EXIT_INTR 10 |
156 | #define KVM_EXIT_SET_TPR 11 | 156 | #define KVM_EXIT_SET_TPR 11 |
157 | #define KVM_EXIT_TPR_ACCESS 12 | 157 | #define KVM_EXIT_TPR_ACCESS 12 |
158 | #define KVM_EXIT_S390_SIEIC 13 | 158 | #define KVM_EXIT_S390_SIEIC 13 |
159 | #define KVM_EXIT_S390_RESET 14 | 159 | #define KVM_EXIT_S390_RESET 14 |
160 | #define KVM_EXIT_DCR 15 | 160 | #define KVM_EXIT_DCR 15 |
161 | #define KVM_EXIT_NMI 16 | 161 | #define KVM_EXIT_NMI 16 |
162 | #define KVM_EXIT_INTERNAL_ERROR 17 | 162 | #define KVM_EXIT_INTERNAL_ERROR 17 |
163 | #define KVM_EXIT_OSI 18 | 163 | #define KVM_EXIT_OSI 18 |
164 | 164 | ||
165 | /* For KVM_EXIT_INTERNAL_ERROR */ | 165 | /* For KVM_EXIT_INTERNAL_ERROR */ |
166 | #define KVM_INTERNAL_ERROR_EMULATION 1 | 166 | #define KVM_INTERNAL_ERROR_EMULATION 1 |
167 | #define KVM_INTERNAL_ERROR_SIMUL_EX 2 | 167 | #define KVM_INTERNAL_ERROR_SIMUL_EX 2 |
168 | 168 | ||
169 | /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ | 169 | /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ |
170 | struct kvm_run { | 170 | struct kvm_run { |
171 | /* in */ | 171 | /* in */ |
172 | __u8 request_interrupt_window; | 172 | __u8 request_interrupt_window; |
173 | __u8 padding1[7]; | 173 | __u8 padding1[7]; |
174 | 174 | ||
175 | /* out */ | 175 | /* out */ |
176 | __u32 exit_reason; | 176 | __u32 exit_reason; |
177 | __u8 ready_for_interrupt_injection; | 177 | __u8 ready_for_interrupt_injection; |
178 | __u8 if_flag; | 178 | __u8 if_flag; |
179 | __u8 padding2[2]; | 179 | __u8 padding2[2]; |
180 | 180 | ||
181 | /* in (pre_kvm_run), out (post_kvm_run) */ | 181 | /* in (pre_kvm_run), out (post_kvm_run) */ |
182 | __u64 cr8; | 182 | __u64 cr8; |
183 | __u64 apic_base; | 183 | __u64 apic_base; |
184 | 184 | ||
185 | #ifdef __KVM_S390 | 185 | #ifdef __KVM_S390 |
186 | /* the processor status word for s390 */ | 186 | /* the processor status word for s390 */ |
187 | __u64 psw_mask; /* psw upper half */ | 187 | __u64 psw_mask; /* psw upper half */ |
188 | __u64 psw_addr; /* psw lower half */ | 188 | __u64 psw_addr; /* psw lower half */ |
189 | #endif | 189 | #endif |
190 | union { | 190 | union { |
191 | /* KVM_EXIT_UNKNOWN */ | 191 | /* KVM_EXIT_UNKNOWN */ |
192 | struct { | 192 | struct { |
193 | __u64 hardware_exit_reason; | 193 | __u64 hardware_exit_reason; |
194 | } hw; | 194 | } hw; |
195 | /* KVM_EXIT_FAIL_ENTRY */ | 195 | /* KVM_EXIT_FAIL_ENTRY */ |
196 | struct { | 196 | struct { |
197 | __u64 hardware_entry_failure_reason; | 197 | __u64 hardware_entry_failure_reason; |
198 | } fail_entry; | 198 | } fail_entry; |
199 | /* KVM_EXIT_EXCEPTION */ | 199 | /* KVM_EXIT_EXCEPTION */ |
200 | struct { | 200 | struct { |
201 | __u32 exception; | 201 | __u32 exception; |
202 | __u32 error_code; | 202 | __u32 error_code; |
203 | } ex; | 203 | } ex; |
204 | /* KVM_EXIT_IO */ | 204 | /* KVM_EXIT_IO */ |
205 | struct { | 205 | struct { |
206 | #define KVM_EXIT_IO_IN 0 | 206 | #define KVM_EXIT_IO_IN 0 |
207 | #define KVM_EXIT_IO_OUT 1 | 207 | #define KVM_EXIT_IO_OUT 1 |
208 | __u8 direction; | 208 | __u8 direction; |
209 | __u8 size; /* bytes */ | 209 | __u8 size; /* bytes */ |
210 | __u16 port; | 210 | __u16 port; |
211 | __u32 count; | 211 | __u32 count; |
212 | __u64 data_offset; /* relative to kvm_run start */ | 212 | __u64 data_offset; /* relative to kvm_run start */ |
213 | } io; | 213 | } io; |
214 | struct { | 214 | struct { |
215 | struct kvm_debug_exit_arch arch; | 215 | struct kvm_debug_exit_arch arch; |
216 | } debug; | 216 | } debug; |
217 | /* KVM_EXIT_MMIO */ | 217 | /* KVM_EXIT_MMIO */ |
218 | struct { | 218 | struct { |
219 | __u64 phys_addr; | 219 | __u64 phys_addr; |
220 | __u8 data[8]; | 220 | __u8 data[8]; |
221 | __u32 len; | 221 | __u32 len; |
222 | __u8 is_write; | 222 | __u8 is_write; |
223 | } mmio; | 223 | } mmio; |
224 | /* KVM_EXIT_HYPERCALL */ | 224 | /* KVM_EXIT_HYPERCALL */ |
225 | struct { | 225 | struct { |
226 | __u64 nr; | 226 | __u64 nr; |
227 | __u64 args[6]; | 227 | __u64 args[6]; |
228 | __u64 ret; | 228 | __u64 ret; |
229 | __u32 longmode; | 229 | __u32 longmode; |
230 | __u32 pad; | 230 | __u32 pad; |
231 | } hypercall; | 231 | } hypercall; |
232 | /* KVM_EXIT_TPR_ACCESS */ | 232 | /* KVM_EXIT_TPR_ACCESS */ |
233 | struct { | 233 | struct { |
234 | __u64 rip; | 234 | __u64 rip; |
235 | __u32 is_write; | 235 | __u32 is_write; |
236 | __u32 pad; | 236 | __u32 pad; |
237 | } tpr_access; | 237 | } tpr_access; |
238 | /* KVM_EXIT_S390_SIEIC */ | 238 | /* KVM_EXIT_S390_SIEIC */ |
239 | struct { | 239 | struct { |
240 | __u8 icptcode; | 240 | __u8 icptcode; |
241 | __u16 ipa; | 241 | __u16 ipa; |
242 | __u32 ipb; | 242 | __u32 ipb; |
243 | } s390_sieic; | 243 | } s390_sieic; |
244 | /* KVM_EXIT_S390_RESET */ | 244 | /* KVM_EXIT_S390_RESET */ |
245 | #define KVM_S390_RESET_POR 1 | 245 | #define KVM_S390_RESET_POR 1 |
246 | #define KVM_S390_RESET_CLEAR 2 | 246 | #define KVM_S390_RESET_CLEAR 2 |
247 | #define KVM_S390_RESET_SUBSYSTEM 4 | 247 | #define KVM_S390_RESET_SUBSYSTEM 4 |
248 | #define KVM_S390_RESET_CPU_INIT 8 | 248 | #define KVM_S390_RESET_CPU_INIT 8 |
249 | #define KVM_S390_RESET_IPL 16 | 249 | #define KVM_S390_RESET_IPL 16 |
250 | __u64 s390_reset_flags; | 250 | __u64 s390_reset_flags; |
251 | /* KVM_EXIT_DCR */ | 251 | /* KVM_EXIT_DCR */ |
252 | struct { | 252 | struct { |
253 | __u32 dcrn; | 253 | __u32 dcrn; |
254 | __u32 data; | 254 | __u32 data; |
255 | __u8 is_write; | 255 | __u8 is_write; |
256 | } dcr; | 256 | } dcr; |
257 | struct { | 257 | struct { |
258 | __u32 suberror; | 258 | __u32 suberror; |
259 | /* Available with KVM_CAP_INTERNAL_ERROR_DATA: */ | 259 | /* Available with KVM_CAP_INTERNAL_ERROR_DATA: */ |
260 | __u32 ndata; | 260 | __u32 ndata; |
261 | __u64 data[16]; | 261 | __u64 data[16]; |
262 | } internal; | 262 | } internal; |
263 | /* KVM_EXIT_OSI */ | 263 | /* KVM_EXIT_OSI */ |
264 | struct { | 264 | struct { |
265 | __u64 gprs[32]; | 265 | __u64 gprs[32]; |
266 | } osi; | 266 | } osi; |
267 | /* Fix the size of the union. */ | 267 | /* Fix the size of the union. */ |
268 | char padding[256]; | 268 | char padding[256]; |
269 | }; | 269 | }; |
270 | }; | 270 | }; |
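Note that this struct is never copied through an ioctl: it is the vcpu's shared page, obtained by mmap()ing the vcpu fd at offset 0 as the comment above says. A minimal run loop, assuming the fd came from KVM_CREATE_VCPU and mmap_size from KVM_GET_VCPU_MMAP_SIZE, with error handling trimmed to the essentials:

#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

static void run_vcpu(int vcpu_fd, size_t mmap_size)
{
	/* exit_reason and the exit union are read back from this mapping */
	struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
				   MAP_SHARED, vcpu_fd, 0);

	if (run == MAP_FAILED)
		return;

	for (;;) {
		ioctl(vcpu_fd, KVM_RUN, 0);
		switch (run->exit_reason) {
		case KVM_EXIT_IO:
			/* data_offset locates the I/O buffer relative to 'run' */
			if (run->io.direction == KVM_EXIT_IO_OUT &&
			    run->io.size == 1)
				putchar(*((char *)run + run->io.data_offset));
			break;
		case KVM_EXIT_HLT:
			return;
		default:
			fprintf(stderr, "unhandled exit %u\n", run->exit_reason);
			return;
		}
	}
}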
271 | 271 | ||
272 | /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */ | 272 | /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */ |
273 | 273 | ||
274 | struct kvm_coalesced_mmio_zone { | 274 | struct kvm_coalesced_mmio_zone { |
275 | __u64 addr; | 275 | __u64 addr; |
276 | __u32 size; | 276 | __u32 size; |
277 | __u32 pad; | 277 | __u32 pad; |
278 | }; | 278 | }; |
279 | 279 | ||
280 | struct kvm_coalesced_mmio { | 280 | struct kvm_coalesced_mmio { |
281 | __u64 phys_addr; | 281 | __u64 phys_addr; |
282 | __u32 len; | 282 | __u32 len; |
283 | __u32 pad; | 283 | __u32 pad; |
284 | __u8 data[8]; | 284 | __u8 data[8]; |
285 | }; | 285 | }; |
286 | 286 | ||
287 | struct kvm_coalesced_mmio_ring { | 287 | struct kvm_coalesced_mmio_ring { |
288 | __u32 first, last; | 288 | __u32 first, last; |
289 | struct kvm_coalesced_mmio coalesced_mmio[0]; | 289 | struct kvm_coalesced_mmio coalesced_mmio[0]; |
290 | }; | 290 | }; |
291 | 291 | ||
292 | #define KVM_COALESCED_MMIO_MAX \ | 292 | #define KVM_COALESCED_MMIO_MAX \ |
293 | ((PAGE_SIZE - sizeof(struct kvm_coalesced_mmio_ring)) / \ | 293 | ((PAGE_SIZE - sizeof(struct kvm_coalesced_mmio_ring)) / \ |
294 | sizeof(struct kvm_coalesced_mmio)) | 294 | sizeof(struct kvm_coalesced_mmio)) |
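Concretely, on an architecture with 4 KiB pages this works out to (4096 - 8) / 24 = 170 ring entries: the ring header is two __u32s (8 bytes), and each struct kvm_coalesced_mmio above is 8 + 4 + 4 + 8 = 24 bytes with no compiler padding needed.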
295 | 295 | ||
296 | /* for KVM_TRANSLATE */ | 296 | /* for KVM_TRANSLATE */ |
297 | struct kvm_translation { | 297 | struct kvm_translation { |
298 | /* in */ | 298 | /* in */ |
299 | __u64 linear_address; | 299 | __u64 linear_address; |
300 | 300 | ||
301 | /* out */ | 301 | /* out */ |
302 | __u64 physical_address; | 302 | __u64 physical_address; |
303 | __u8 valid; | 303 | __u8 valid; |
304 | __u8 writeable; | 304 | __u8 writeable; |
305 | __u8 usermode; | 305 | __u8 usermode; |
306 | __u8 pad[5]; | 306 | __u8 pad[5]; |
307 | }; | 307 | }; |
308 | 308 | ||
309 | /* for KVM_INTERRUPT */ | 309 | /* for KVM_INTERRUPT */ |
310 | struct kvm_interrupt { | 310 | struct kvm_interrupt { |
311 | /* in */ | 311 | /* in */ |
312 | __u32 irq; | 312 | __u32 irq; |
313 | }; | 313 | }; |
314 | 314 | ||
315 | /* for KVM_GET_DIRTY_LOG */ | 315 | /* for KVM_GET_DIRTY_LOG */ |
316 | struct kvm_dirty_log { | 316 | struct kvm_dirty_log { |
317 | __u32 slot; | 317 | __u32 slot; |
318 | __u32 padding1; | 318 | __u32 padding1; |
319 | union { | 319 | union { |
320 | void __user *dirty_bitmap; /* one bit per page */ | 320 | void __user *dirty_bitmap; /* one bit per page */ |
321 | __u64 padding2; | 321 | __u64 padding2; |
322 | }; | 322 | }; |
323 | }; | 323 | }; |
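Fetching the log is then a matter of pointing dirty_bitmap at a buffer with one bit per page of the slot, rounded up to a long. A sketch, where npages is the caller's knowledge of the slot size and the rounding assumes a 64-bit host:

#include <linux/kvm.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fetch the dirty bitmap for a slot; the caller frees the returned buffer. */
static void *get_dirty_log(int vm_fd, __u32 slot, unsigned long npages)
{
	size_t bytes = ((npages + 63) / 64) * 8;	/* one bit per page */
	void *bitmap = calloc(1, bytes);
	struct kvm_dirty_log log;

	if (!bitmap)
		return NULL;
	memset(&log, 0, sizeof(log));
	log.slot = slot;
	log.dirty_bitmap = bitmap;
	if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
		free(bitmap);
		return NULL;
	}
	return bitmap;
}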
324 | 324 | ||
325 | /* for KVM_SET_SIGNAL_MASK */ | 325 | /* for KVM_SET_SIGNAL_MASK */ |
326 | struct kvm_signal_mask { | 326 | struct kvm_signal_mask { |
327 | __u32 len; | 327 | __u32 len; |
328 | __u8 sigset[0]; | 328 | __u8 sigset[0]; |
329 | }; | 329 | }; |
330 | 330 | ||
331 | /* for KVM_TPR_ACCESS_REPORTING */ | 331 | /* for KVM_TPR_ACCESS_REPORTING */ |
332 | struct kvm_tpr_access_ctl { | 332 | struct kvm_tpr_access_ctl { |
333 | __u32 enabled; | 333 | __u32 enabled; |
334 | __u32 flags; | 334 | __u32 flags; |
335 | __u32 reserved[8]; | 335 | __u32 reserved[8]; |
336 | }; | 336 | }; |
337 | 337 | ||
338 | /* for KVM_SET_VAPIC_ADDR */ | 338 | /* for KVM_SET_VAPIC_ADDR */ |
339 | struct kvm_vapic_addr { | 339 | struct kvm_vapic_addr { |
340 | __u64 vapic_addr; | 340 | __u64 vapic_addr; |
341 | }; | 341 | }; |
342 | 342 | ||
343 | /* for KVM_SET_MPSTATE */ | 343 | /* for KVM_SET_MPSTATE */ |
344 | 344 | ||
345 | #define KVM_MP_STATE_RUNNABLE 0 | 345 | #define KVM_MP_STATE_RUNNABLE 0 |
346 | #define KVM_MP_STATE_UNINITIALIZED 1 | 346 | #define KVM_MP_STATE_UNINITIALIZED 1 |
347 | #define KVM_MP_STATE_INIT_RECEIVED 2 | 347 | #define KVM_MP_STATE_INIT_RECEIVED 2 |
348 | #define KVM_MP_STATE_HALTED 3 | 348 | #define KVM_MP_STATE_HALTED 3 |
349 | #define KVM_MP_STATE_SIPI_RECEIVED 4 | 349 | #define KVM_MP_STATE_SIPI_RECEIVED 4 |
350 | 350 | ||
351 | struct kvm_mp_state { | 351 | struct kvm_mp_state { |
352 | __u32 mp_state; | 352 | __u32 mp_state; |
353 | }; | 353 | }; |
354 | 354 | ||
355 | struct kvm_s390_psw { | 355 | struct kvm_s390_psw { |
356 | __u64 mask; | 356 | __u64 mask; |
357 | __u64 addr; | 357 | __u64 addr; |
358 | }; | 358 | }; |
359 | 359 | ||
360 | /* valid values for type in kvm_s390_interrupt */ | 360 | /* valid values for type in kvm_s390_interrupt */ |
361 | #define KVM_S390_SIGP_STOP 0xfffe0000u | 361 | #define KVM_S390_SIGP_STOP 0xfffe0000u |
362 | #define KVM_S390_PROGRAM_INT 0xfffe0001u | 362 | #define KVM_S390_PROGRAM_INT 0xfffe0001u |
363 | #define KVM_S390_SIGP_SET_PREFIX 0xfffe0002u | 363 | #define KVM_S390_SIGP_SET_PREFIX 0xfffe0002u |
364 | #define KVM_S390_RESTART 0xfffe0003u | 364 | #define KVM_S390_RESTART 0xfffe0003u |
365 | #define KVM_S390_INT_VIRTIO 0xffff2603u | 365 | #define KVM_S390_INT_VIRTIO 0xffff2603u |
366 | #define KVM_S390_INT_SERVICE 0xffff2401u | 366 | #define KVM_S390_INT_SERVICE 0xffff2401u |
367 | #define KVM_S390_INT_EMERGENCY 0xffff1201u | 367 | #define KVM_S390_INT_EMERGENCY 0xffff1201u |
368 | 368 | ||
369 | struct kvm_s390_interrupt { | 369 | struct kvm_s390_interrupt { |
370 | __u32 type; | 370 | __u32 type; |
371 | __u32 parm; | 371 | __u32 parm; |
372 | __u64 parm64; | 372 | __u64 parm64; |
373 | }; | 373 | }; |
374 | 374 | ||
375 | /* for KVM_SET_GUEST_DEBUG */ | 375 | /* for KVM_SET_GUEST_DEBUG */ |
376 | 376 | ||
377 | #define KVM_GUESTDBG_ENABLE 0x00000001 | 377 | #define KVM_GUESTDBG_ENABLE 0x00000001 |
378 | #define KVM_GUESTDBG_SINGLESTEP 0x00000002 | 378 | #define KVM_GUESTDBG_SINGLESTEP 0x00000002 |
379 | 379 | ||
380 | struct kvm_guest_debug { | 380 | struct kvm_guest_debug { |
381 | __u32 control; | 381 | __u32 control; |
382 | __u32 pad; | 382 | __u32 pad; |
383 | struct kvm_guest_debug_arch arch; | 383 | struct kvm_guest_debug_arch arch; |
384 | }; | 384 | }; |
385 | 385 | ||
386 | enum { | 386 | enum { |
387 | kvm_ioeventfd_flag_nr_datamatch, | 387 | kvm_ioeventfd_flag_nr_datamatch, |
388 | kvm_ioeventfd_flag_nr_pio, | 388 | kvm_ioeventfd_flag_nr_pio, |
389 | kvm_ioeventfd_flag_nr_deassign, | 389 | kvm_ioeventfd_flag_nr_deassign, |
390 | kvm_ioeventfd_flag_nr_max, | 390 | kvm_ioeventfd_flag_nr_max, |
391 | }; | 391 | }; |
392 | 392 | ||
393 | #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) | 393 | #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) |
394 | #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) | 394 | #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) |
395 | #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) | 395 | #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) |
396 | 396 | ||
397 | #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1) | 397 | #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1) |
398 | 398 | ||
399 | struct kvm_ioeventfd { | 399 | struct kvm_ioeventfd { |
400 | __u64 datamatch; | 400 | __u64 datamatch; |
401 | __u64 addr; /* legal pio/mmio address */ | 401 | __u64 addr; /* legal pio/mmio address */ |
402 | __u32 len; /* 1, 2, 4, or 8 bytes */ | 402 | __u32 len; /* 1, 2, 4, or 8 bytes */ |
403 | __s32 fd; | 403 | __s32 fd; |
404 | __u32 flags; | 404 | __u32 flags; |
405 | __u8 pad[36]; | 405 | __u8 pad[36]; |
406 | }; | 406 | }; |
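For illustration, wiring an eventfd so that it fires on a specific 2-byte port write combines the PIO and DATAMATCH flags. A sketch, where efd is an eventfd(2) descriptor and the port and value are arbitrary:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Signal 'efd' whenever the guest writes 'value' (2 bytes) to 'port'. */
static int hook_pio_write(int vm_fd, int efd, __u16 port, __u16 value)
{
	struct kvm_ioeventfd ioeventfd;

	memset(&ioeventfd, 0, sizeof(ioeventfd));
	ioeventfd.datamatch = value;
	ioeventfd.addr = port;
	ioeventfd.len = 2;
	ioeventfd.fd = efd;
	ioeventfd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO;

	return ioctl(vm_fd, KVM_IOEVENTFD, &ioeventfd);
}

Without KVM_IOEVENTFD_FLAG_DATAMATCH, any write of the right width to the address fires the eventfd; the same structure with KVM_IOEVENTFD_FLAG_DEASSIGN set undoes the registration.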
407 | 407 | ||
408 | /* for KVM_ENABLE_CAP */ | 408 | /* for KVM_ENABLE_CAP */ |
409 | struct kvm_enable_cap { | 409 | struct kvm_enable_cap { |
410 | /* in */ | 410 | /* in */ |
411 | __u32 cap; | 411 | __u32 cap; |
412 | __u32 flags; | 412 | __u32 flags; |
413 | __u64 args[4]; | 413 | __u64 args[4]; |
414 | __u8 pad[64]; | 414 | __u8 pad[64]; |
415 | }; | 415 | }; |
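Usage is intentionally uniform across capabilities; for a capability that takes no arguments, such as KVM_CAP_PPC_OSI below, enabling it on a vcpu reduces to naming it. A sketch:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

static int enable_osi(int vcpu_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PPC_OSI;	/* args[] stays zeroed: OSI takes none */
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}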
416 | 416 | ||
417 | #define KVMIO 0xAE | 417 | #define KVMIO 0xAE |
418 | 418 | ||
419 | /* | 419 | /* |
420 | * ioctls for /dev/kvm fds: | 420 | * ioctls for /dev/kvm fds: |
421 | */ | 421 | */ |
422 | #define KVM_GET_API_VERSION _IO(KVMIO, 0x00) | 422 | #define KVM_GET_API_VERSION _IO(KVMIO, 0x00) |
423 | #define KVM_CREATE_VM _IO(KVMIO, 0x01) /* returns a VM fd */ | 423 | #define KVM_CREATE_VM _IO(KVMIO, 0x01) /* returns a VM fd */ |
424 | #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) | 424 | #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) |
425 | 425 | ||
426 | #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) | 426 | #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) |
427 | /* | 427 | /* |
428 | * Check if a kvm extension is available. Argument is extension number, | 428 | * Check if a kvm extension is available. Argument is extension number, |
429 | * return is 1 (yes) or 0 (no, sorry). | 429 | * return is 1 (yes) or 0 (no, sorry). |
430 | */ | 430 | */ |
431 | #define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03) | 431 | #define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03) |
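A typical probe opens /dev/kvm, sanity-checks KVM_GET_API_VERSION against KVM_API_VERSION, and only then asks about extensions. A sketch:

#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns 1 if 'cap' is supported, 0 if not, -1 on error. */
static int has_extension(int cap)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int ret = -1;

	if (kvm_fd < 0)
		return -1;
	if (ioctl(kvm_fd, KVM_GET_API_VERSION, 0) == KVM_API_VERSION)
		ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);
	close(kvm_fd);
	return ret;
}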
432 | /* | 432 | /* |
433 | * Get size for mmap(vcpu_fd) | 433 | * Get size for mmap(vcpu_fd) |
434 | */ | 434 | */ |
435 | #define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */ | 435 | #define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */ |
436 | #define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2) | 436 | #define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2) |
437 | #define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06 | 437 | #define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06 |
438 | #define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07 | 438 | #define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07 |
439 | #define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08 | 439 | #define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08 |
440 | 440 | ||
441 | /* | 441 | /* |
442 | * Extension capability list. | 442 | * Extension capability list. |
443 | */ | 443 | */ |
444 | #define KVM_CAP_IRQCHIP 0 | 444 | #define KVM_CAP_IRQCHIP 0 |
445 | #define KVM_CAP_HLT 1 | 445 | #define KVM_CAP_HLT 1 |
446 | #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2 | 446 | #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2 |
447 | #define KVM_CAP_USER_MEMORY 3 | 447 | #define KVM_CAP_USER_MEMORY 3 |
448 | #define KVM_CAP_SET_TSS_ADDR 4 | 448 | #define KVM_CAP_SET_TSS_ADDR 4 |
449 | #define KVM_CAP_VAPIC 6 | 449 | #define KVM_CAP_VAPIC 6 |
450 | #define KVM_CAP_EXT_CPUID 7 | 450 | #define KVM_CAP_EXT_CPUID 7 |
451 | #define KVM_CAP_CLOCKSOURCE 8 | 451 | #define KVM_CAP_CLOCKSOURCE 8 |
452 | #define KVM_CAP_NR_VCPUS 9 /* returns max vcpus per vm */ | 452 | #define KVM_CAP_NR_VCPUS 9 /* returns max vcpus per vm */ |
453 | #define KVM_CAP_NR_MEMSLOTS 10 /* returns max memory slots per vm */ | 453 | #define KVM_CAP_NR_MEMSLOTS 10 /* returns max memory slots per vm */ |
454 | #define KVM_CAP_PIT 11 | 454 | #define KVM_CAP_PIT 11 |
455 | #define KVM_CAP_NOP_IO_DELAY 12 | 455 | #define KVM_CAP_NOP_IO_DELAY 12 |
456 | #define KVM_CAP_PV_MMU 13 | 456 | #define KVM_CAP_PV_MMU 13 |
457 | #define KVM_CAP_MP_STATE 14 | 457 | #define KVM_CAP_MP_STATE 14 |
458 | #define KVM_CAP_COALESCED_MMIO 15 | 458 | #define KVM_CAP_COALESCED_MMIO 15 |
459 | #define KVM_CAP_SYNC_MMU 16 /* Changes to host mmap are reflected in guest */ | 459 | #define KVM_CAP_SYNC_MMU 16 /* Changes to host mmap are reflected in guest */ |
460 | #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT | 460 | #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT |
461 | #define KVM_CAP_DEVICE_ASSIGNMENT 17 | 461 | #define KVM_CAP_DEVICE_ASSIGNMENT 17 |
462 | #endif | 462 | #endif |
463 | #define KVM_CAP_IOMMU 18 | 463 | #define KVM_CAP_IOMMU 18 |
464 | #ifdef __KVM_HAVE_MSI | 464 | #ifdef __KVM_HAVE_MSI |
465 | #define KVM_CAP_DEVICE_MSI 20 | 465 | #define KVM_CAP_DEVICE_MSI 20 |
466 | #endif | 466 | #endif |
467 | /* Bug in KVM_SET_USER_MEMORY_REGION fixed: */ | 467 | /* Bug in KVM_SET_USER_MEMORY_REGION fixed: */ |
468 | #define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21 | 468 | #define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21 |
469 | #ifdef __KVM_HAVE_USER_NMI | 469 | #ifdef __KVM_HAVE_USER_NMI |
470 | #define KVM_CAP_USER_NMI 22 | 470 | #define KVM_CAP_USER_NMI 22 |
471 | #endif | 471 | #endif |
472 | #ifdef __KVM_HAVE_GUEST_DEBUG | 472 | #ifdef __KVM_HAVE_GUEST_DEBUG |
473 | #define KVM_CAP_SET_GUEST_DEBUG 23 | 473 | #define KVM_CAP_SET_GUEST_DEBUG 23 |
474 | #endif | 474 | #endif |
475 | #ifdef __KVM_HAVE_PIT | 475 | #ifdef __KVM_HAVE_PIT |
476 | #define KVM_CAP_REINJECT_CONTROL 24 | 476 | #define KVM_CAP_REINJECT_CONTROL 24 |
477 | #endif | 477 | #endif |
478 | #ifdef __KVM_HAVE_IOAPIC | 478 | #ifdef __KVM_HAVE_IOAPIC |
479 | #define KVM_CAP_IRQ_ROUTING 25 | 479 | #define KVM_CAP_IRQ_ROUTING 25 |
480 | #endif | 480 | #endif |
481 | #define KVM_CAP_IRQ_INJECT_STATUS 26 | 481 | #define KVM_CAP_IRQ_INJECT_STATUS 26 |
482 | #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT | 482 | #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT |
483 | #define KVM_CAP_DEVICE_DEASSIGNMENT 27 | 483 | #define KVM_CAP_DEVICE_DEASSIGNMENT 27 |
484 | #endif | 484 | #endif |
485 | #ifdef __KVM_HAVE_MSIX | 485 | #ifdef __KVM_HAVE_MSIX |
486 | #define KVM_CAP_DEVICE_MSIX 28 | 486 | #define KVM_CAP_DEVICE_MSIX 28 |
487 | #endif | 487 | #endif |
488 | #define KVM_CAP_ASSIGN_DEV_IRQ 29 | 488 | #define KVM_CAP_ASSIGN_DEV_IRQ 29 |
489 | /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ | 489 | /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ |
490 | #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 | 490 | #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 |
491 | #ifdef __KVM_HAVE_MCE | 491 | #ifdef __KVM_HAVE_MCE |
492 | #define KVM_CAP_MCE 31 | 492 | #define KVM_CAP_MCE 31 |
493 | #endif | 493 | #endif |
494 | #define KVM_CAP_IRQFD 32 | 494 | #define KVM_CAP_IRQFD 32 |
495 | #ifdef __KVM_HAVE_PIT | 495 | #ifdef __KVM_HAVE_PIT |
496 | #define KVM_CAP_PIT2 33 | 496 | #define KVM_CAP_PIT2 33 |
497 | #endif | 497 | #endif |
498 | #define KVM_CAP_SET_BOOT_CPU_ID 34 | 498 | #define KVM_CAP_SET_BOOT_CPU_ID 34 |
499 | #ifdef __KVM_HAVE_PIT_STATE2 | 499 | #ifdef __KVM_HAVE_PIT_STATE2 |
500 | #define KVM_CAP_PIT_STATE2 35 | 500 | #define KVM_CAP_PIT_STATE2 35 |
501 | #endif | 501 | #endif |
502 | #define KVM_CAP_IOEVENTFD 36 | 502 | #define KVM_CAP_IOEVENTFD 36 |
503 | #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37 | 503 | #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37 |
504 | #ifdef __KVM_HAVE_XEN_HVM | 504 | #ifdef __KVM_HAVE_XEN_HVM |
505 | #define KVM_CAP_XEN_HVM 38 | 505 | #define KVM_CAP_XEN_HVM 38 |
506 | #endif | 506 | #endif |
507 | #define KVM_CAP_ADJUST_CLOCK 39 | 507 | #define KVM_CAP_ADJUST_CLOCK 39 |
508 | #define KVM_CAP_INTERNAL_ERROR_DATA 40 | 508 | #define KVM_CAP_INTERNAL_ERROR_DATA 40 |
509 | #ifdef __KVM_HAVE_VCPU_EVENTS | 509 | #ifdef __KVM_HAVE_VCPU_EVENTS |
510 | #define KVM_CAP_VCPU_EVENTS 41 | 510 | #define KVM_CAP_VCPU_EVENTS 41 |
511 | #endif | 511 | #endif |
512 | #define KVM_CAP_S390_PSW 42 | 512 | #define KVM_CAP_S390_PSW 42 |
513 | #define KVM_CAP_PPC_SEGSTATE 43 | 513 | #define KVM_CAP_PPC_SEGSTATE 43 |
514 | #define KVM_CAP_HYPERV 44 | 514 | #define KVM_CAP_HYPERV 44 |
515 | #define KVM_CAP_HYPERV_VAPIC 45 | 515 | #define KVM_CAP_HYPERV_VAPIC 45 |
516 | #define KVM_CAP_HYPERV_SPIN 46 | 516 | #define KVM_CAP_HYPERV_SPIN 46 |
517 | #define KVM_CAP_PCI_SEGMENT 47 | 517 | #define KVM_CAP_PCI_SEGMENT 47 |
518 | #define KVM_CAP_PPC_PAIRED_SINGLES 48 | 518 | #define KVM_CAP_PPC_PAIRED_SINGLES 48 |
519 | #define KVM_CAP_INTR_SHADOW 49 | 519 | #define KVM_CAP_INTR_SHADOW 49 |
520 | #ifdef __KVM_HAVE_DEBUGREGS | 520 | #ifdef __KVM_HAVE_DEBUGREGS |
521 | #define KVM_CAP_DEBUGREGS 50 | 521 | #define KVM_CAP_DEBUGREGS 50 |
522 | #endif | 522 | #endif |
523 | #define KVM_CAP_X86_ROBUST_SINGLESTEP 51 | 523 | #define KVM_CAP_X86_ROBUST_SINGLESTEP 51 |
524 | #define KVM_CAP_PPC_OSI 52 | 524 | #define KVM_CAP_PPC_OSI 52 |
525 | #define KVM_CAP_PPC_UNSET_IRQ 53 | 525 | #define KVM_CAP_PPC_UNSET_IRQ 53 |
526 | #define KVM_CAP_ENABLE_CAP 54 | 526 | #define KVM_CAP_ENABLE_CAP 54 |
527 | #ifdef __KVM_HAVE_XSAVE | 527 | #ifdef __KVM_HAVE_XSAVE |
528 | #define KVM_CAP_XSAVE 55 | 528 | #define KVM_CAP_XSAVE 55 |
529 | #endif | 529 | #endif |
530 | #ifdef __KVM_HAVE_XCRS | 530 | #ifdef __KVM_HAVE_XCRS |
531 | #define KVM_CAP_XCRS 56 | 531 | #define KVM_CAP_XCRS 56 |
532 | #endif | 532 | #endif |
533 | 533 | ||
534 | #ifdef KVM_CAP_IRQ_ROUTING | 534 | #ifdef KVM_CAP_IRQ_ROUTING |
535 | 535 | ||
536 | struct kvm_irq_routing_irqchip { | 536 | struct kvm_irq_routing_irqchip { |
537 | __u32 irqchip; | 537 | __u32 irqchip; |
538 | __u32 pin; | 538 | __u32 pin; |
539 | }; | 539 | }; |
540 | 540 | ||
541 | struct kvm_irq_routing_msi { | 541 | struct kvm_irq_routing_msi { |
542 | __u32 address_lo; | 542 | __u32 address_lo; |
543 | __u32 address_hi; | 543 | __u32 address_hi; |
544 | __u32 data; | 544 | __u32 data; |
545 | __u32 pad; | 545 | __u32 pad; |
546 | }; | 546 | }; |
547 | 547 | ||
548 | /* gsi routing entry types */ | 548 | /* gsi routing entry types */ |
549 | #define KVM_IRQ_ROUTING_IRQCHIP 1 | 549 | #define KVM_IRQ_ROUTING_IRQCHIP 1 |
550 | #define KVM_IRQ_ROUTING_MSI 2 | 550 | #define KVM_IRQ_ROUTING_MSI 2 |
551 | 551 | ||
552 | struct kvm_irq_routing_entry { | 552 | struct kvm_irq_routing_entry { |
553 | __u32 gsi; | 553 | __u32 gsi; |
554 | __u32 type; | 554 | __u32 type; |
555 | __u32 flags; | 555 | __u32 flags; |
556 | __u32 pad; | 556 | __u32 pad; |
557 | union { | 557 | union { |
558 | struct kvm_irq_routing_irqchip irqchip; | 558 | struct kvm_irq_routing_irqchip irqchip; |
559 | struct kvm_irq_routing_msi msi; | 559 | struct kvm_irq_routing_msi msi; |
560 | __u32 pad[8]; | 560 | __u32 pad[8]; |
561 | } u; | 561 | } u; |
562 | }; | 562 | }; |
563 | 563 | ||
564 | struct kvm_irq_routing { | 564 | struct kvm_irq_routing { |
565 | __u32 nr; | 565 | __u32 nr; |
566 | __u32 flags; | 566 | __u32 flags; |
567 | struct kvm_irq_routing_entry entries[0]; | 567 | struct kvm_irq_routing_entry entries[0]; |
568 | }; | 568 | }; |
569 | 569 | ||
570 | #endif | 570 | #endif |
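entries[] is a zero-length array, so a caller allocates the header and however many entries it needs in one block. A sketch that routes a single MSI onto a GSI — the address and data values are placeholders, and note that KVM_SET_GSI_ROUTING installs a complete table, so a real caller re-submits the default irqchip entries as well:

#include <linux/kvm.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

static int route_msi(int vm_fd, __u32 gsi,
		     __u32 addr_lo, __u32 addr_hi, __u32 data)
{
	struct kvm_irq_routing *table;
	int ret;

	/* header plus one kvm_irq_routing_entry in a single allocation */
	table = calloc(1, sizeof(*table) + sizeof(table->entries[0]));
	if (!table)
		return -1;
	table->nr = 1;
	table->entries[0].gsi = gsi;
	table->entries[0].type = KVM_IRQ_ROUTING_MSI;
	table->entries[0].u.msi.address_lo = addr_lo;
	table->entries[0].u.msi.address_hi = addr_hi;
	table->entries[0].u.msi.data = data;

	ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
	free(table);
	return ret;
}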
571 | 571 | ||
572 | #ifdef KVM_CAP_MCE | 572 | #ifdef KVM_CAP_MCE |
573 | /* x86 MCE */ | 573 | /* x86 MCE */ |
574 | struct kvm_x86_mce { | 574 | struct kvm_x86_mce { |
575 | __u64 status; | 575 | __u64 status; |
576 | __u64 addr; | 576 | __u64 addr; |
577 | __u64 misc; | 577 | __u64 misc; |
578 | __u64 mcg_status; | 578 | __u64 mcg_status; |
579 | __u8 bank; | 579 | __u8 bank; |
580 | __u8 pad1[7]; | 580 | __u8 pad1[7]; |
581 | __u64 pad2[3]; | 581 | __u64 pad2[3]; |
582 | }; | 582 | }; |
583 | #endif | 583 | #endif |
584 | 584 | ||
585 | #ifdef KVM_CAP_XEN_HVM | 585 | #ifdef KVM_CAP_XEN_HVM |
586 | struct kvm_xen_hvm_config { | 586 | struct kvm_xen_hvm_config { |
587 | __u32 flags; | 587 | __u32 flags; |
588 | __u32 msr; | 588 | __u32 msr; |
589 | __u64 blob_addr_32; | 589 | __u64 blob_addr_32; |
590 | __u64 blob_addr_64; | 590 | __u64 blob_addr_64; |
591 | __u8 blob_size_32; | 591 | __u8 blob_size_32; |
592 | __u8 blob_size_64; | 592 | __u8 blob_size_64; |
593 | __u8 pad2[30]; | 593 | __u8 pad2[30]; |
594 | }; | 594 | }; |
595 | #endif | 595 | #endif |
596 | 596 | ||
597 | #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) | 597 | #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) |
598 | 598 | ||
599 | struct kvm_irqfd { | 599 | struct kvm_irqfd { |
600 | __u32 fd; | 600 | __u32 fd; |
601 | __u32 gsi; | 601 | __u32 gsi; |
602 | __u32 flags; | 602 | __u32 flags; |
603 | __u8 pad[20]; | 603 | __u8 pad[20]; |
604 | }; | 604 | }; |
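Pairing this with an eventfd lets an interrupt be injected by signalling the eventfd, with no ioctl on the hot path; the same struct with the DEASSIGN flag tears the binding down again. A sketch, where efd comes from eventfd(2):

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* 'assign' chooses between hooking up and tearing down the irqfd. */
static int set_irqfd(int vm_fd, int efd, __u32 gsi, int assign)
{
	struct kvm_irqfd irqfd;

	memset(&irqfd, 0, sizeof(irqfd));
	irqfd.fd = efd;
	irqfd.gsi = gsi;
	irqfd.flags = assign ? 0 : KVM_IRQFD_FLAG_DEASSIGN;

	return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}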
605 | 605 | ||
606 | struct kvm_clock_data { | 606 | struct kvm_clock_data { |
607 | __u64 clock; | 607 | __u64 clock; |
608 | __u32 flags; | 608 | __u32 flags; |
609 | __u32 pad[9]; | 609 | __u32 pad[9]; |
610 | }; | 610 | }; |
611 | 611 | ||
612 | /* | 612 | /* |
613 | * ioctls for VM fds | 613 | * ioctls for VM fds |
614 | */ | 614 | */ |
615 | #define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region) | 615 | #define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region) |
616 | /* | 616 | /* |
617 | * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns | 617 | * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns |
618 | * a vcpu fd. | 618 | * a vcpu fd. |
619 | */ | 619 | */ |
620 | #define KVM_CREATE_VCPU _IO(KVMIO, 0x41) | 620 | #define KVM_CREATE_VCPU _IO(KVMIO, 0x41) |
621 | #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log) | 621 | #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log) |
622 | /* KVM_SET_MEMORY_ALIAS is obsolete: */ | ||
622 | #define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias) | 623 | #define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias) |
623 | #define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44) | 624 | #define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44) |
624 | #define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45) | 625 | #define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45) |
625 | #define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \ | 626 | #define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \ |
626 | struct kvm_userspace_memory_region) | 627 | struct kvm_userspace_memory_region) |
627 | #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) | 628 | #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) |
628 | #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) | 629 | #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) |
629 | /* Device model IOC */ | 630 | /* Device model IOC */ |
630 | #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60) | 631 | #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60) |
631 | #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level) | 632 | #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level) |
632 | #define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip) | 633 | #define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip) |
633 | #define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip) | 634 | #define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip) |
634 | #define KVM_CREATE_PIT _IO(KVMIO, 0x64) | 635 | #define KVM_CREATE_PIT _IO(KVMIO, 0x64) |
635 | #define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state) | 636 | #define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state) |
636 | #define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state) | 637 | #define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state) |
637 | #define KVM_IRQ_LINE_STATUS _IOWR(KVMIO, 0x67, struct kvm_irq_level) | 638 | #define KVM_IRQ_LINE_STATUS _IOWR(KVMIO, 0x67, struct kvm_irq_level) |
638 | #define KVM_REGISTER_COALESCED_MMIO \ | 639 | #define KVM_REGISTER_COALESCED_MMIO \ |
639 | _IOW(KVMIO, 0x67, struct kvm_coalesced_mmio_zone) | 640 | _IOW(KVMIO, 0x67, struct kvm_coalesced_mmio_zone) |
640 | #define KVM_UNREGISTER_COALESCED_MMIO \ | 641 | #define KVM_UNREGISTER_COALESCED_MMIO \ |
641 | _IOW(KVMIO, 0x68, struct kvm_coalesced_mmio_zone) | 642 | _IOW(KVMIO, 0x68, struct kvm_coalesced_mmio_zone) |
642 | #define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \ | 643 | #define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \ |
643 | struct kvm_assigned_pci_dev) | 644 | struct kvm_assigned_pci_dev) |
644 | #define KVM_SET_GSI_ROUTING _IOW(KVMIO, 0x6a, struct kvm_irq_routing) | 645 | #define KVM_SET_GSI_ROUTING _IOW(KVMIO, 0x6a, struct kvm_irq_routing) |
645 | /* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */ | 646 | /* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */ |
646 | #define KVM_ASSIGN_IRQ __KVM_DEPRECATED_VM_R_0x70 | 647 | #define KVM_ASSIGN_IRQ __KVM_DEPRECATED_VM_R_0x70 |
647 | #define KVM_ASSIGN_DEV_IRQ _IOW(KVMIO, 0x70, struct kvm_assigned_irq) | 648 | #define KVM_ASSIGN_DEV_IRQ _IOW(KVMIO, 0x70, struct kvm_assigned_irq) |
648 | #define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71) | 649 | #define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71) |
649 | #define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \ | 650 | #define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \ |
650 | struct kvm_assigned_pci_dev) | 651 | struct kvm_assigned_pci_dev) |
651 | #define KVM_ASSIGN_SET_MSIX_NR _IOW(KVMIO, 0x73, \ | 652 | #define KVM_ASSIGN_SET_MSIX_NR _IOW(KVMIO, 0x73, \ |
652 | struct kvm_assigned_msix_nr) | 653 | struct kvm_assigned_msix_nr) |
653 | #define KVM_ASSIGN_SET_MSIX_ENTRY _IOW(KVMIO, 0x74, \ | 654 | #define KVM_ASSIGN_SET_MSIX_ENTRY _IOW(KVMIO, 0x74, \ |
654 | struct kvm_assigned_msix_entry) | 655 | struct kvm_assigned_msix_entry) |
655 | #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) | 656 | #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) |
656 | #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) | 657 | #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) |
657 | #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) | 658 | #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) |
658 | #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78) | 659 | #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78) |
659 | #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd) | 660 | #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd) |
660 | #define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config) | 661 | #define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config) |
661 | #define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data) | 662 | #define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data) |
662 | #define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data) | 663 | #define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data) |
663 | /* Available with KVM_CAP_PIT_STATE2 */ | 664 | /* Available with KVM_CAP_PIT_STATE2 */ |
664 | #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2) | 665 | #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2) |
665 | #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2) | 666 | #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2) |
666 | 667 | ||
667 | /* | 668 | /* |
668 | * ioctls for vcpu fds | 669 | * ioctls for vcpu fds |
669 | */ | 670 | */ |
670 | #define KVM_RUN _IO(KVMIO, 0x80) | 671 | #define KVM_RUN _IO(KVMIO, 0x80) |
671 | #define KVM_GET_REGS _IOR(KVMIO, 0x81, struct kvm_regs) | 672 | #define KVM_GET_REGS _IOR(KVMIO, 0x81, struct kvm_regs) |
672 | #define KVM_SET_REGS _IOW(KVMIO, 0x82, struct kvm_regs) | 673 | #define KVM_SET_REGS _IOW(KVMIO, 0x82, struct kvm_regs) |
673 | #define KVM_GET_SREGS _IOR(KVMIO, 0x83, struct kvm_sregs) | 674 | #define KVM_GET_SREGS _IOR(KVMIO, 0x83, struct kvm_sregs) |
674 | #define KVM_SET_SREGS _IOW(KVMIO, 0x84, struct kvm_sregs) | 675 | #define KVM_SET_SREGS _IOW(KVMIO, 0x84, struct kvm_sregs) |
675 | #define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation) | 676 | #define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation) |
676 | #define KVM_INTERRUPT _IOW(KVMIO, 0x86, struct kvm_interrupt) | 677 | #define KVM_INTERRUPT _IOW(KVMIO, 0x86, struct kvm_interrupt) |
677 | /* KVM_DEBUG_GUEST is no longer supported, use KVM_SET_GUEST_DEBUG instead */ | 678 | /* KVM_DEBUG_GUEST is no longer supported, use KVM_SET_GUEST_DEBUG instead */ |
678 | #define KVM_DEBUG_GUEST __KVM_DEPRECATED_VCPU_W_0x87 | 679 | #define KVM_DEBUG_GUEST __KVM_DEPRECATED_VCPU_W_0x87 |
679 | #define KVM_GET_MSRS _IOWR(KVMIO, 0x88, struct kvm_msrs) | 680 | #define KVM_GET_MSRS _IOWR(KVMIO, 0x88, struct kvm_msrs) |
680 | #define KVM_SET_MSRS _IOW(KVMIO, 0x89, struct kvm_msrs) | 681 | #define KVM_SET_MSRS _IOW(KVMIO, 0x89, struct kvm_msrs) |
681 | #define KVM_SET_CPUID _IOW(KVMIO, 0x8a, struct kvm_cpuid) | 682 | #define KVM_SET_CPUID _IOW(KVMIO, 0x8a, struct kvm_cpuid) |
682 | #define KVM_SET_SIGNAL_MASK _IOW(KVMIO, 0x8b, struct kvm_signal_mask) | 683 | #define KVM_SET_SIGNAL_MASK _IOW(KVMIO, 0x8b, struct kvm_signal_mask) |
683 | #define KVM_GET_FPU _IOR(KVMIO, 0x8c, struct kvm_fpu) | 684 | #define KVM_GET_FPU _IOR(KVMIO, 0x8c, struct kvm_fpu) |
684 | #define KVM_SET_FPU _IOW(KVMIO, 0x8d, struct kvm_fpu) | 685 | #define KVM_SET_FPU _IOW(KVMIO, 0x8d, struct kvm_fpu) |
685 | #define KVM_GET_LAPIC _IOR(KVMIO, 0x8e, struct kvm_lapic_state) | 686 | #define KVM_GET_LAPIC _IOR(KVMIO, 0x8e, struct kvm_lapic_state) |
686 | #define KVM_SET_LAPIC _IOW(KVMIO, 0x8f, struct kvm_lapic_state) | 687 | #define KVM_SET_LAPIC _IOW(KVMIO, 0x8f, struct kvm_lapic_state) |
687 | #define KVM_SET_CPUID2 _IOW(KVMIO, 0x90, struct kvm_cpuid2) | 688 | #define KVM_SET_CPUID2 _IOW(KVMIO, 0x90, struct kvm_cpuid2) |
688 | #define KVM_GET_CPUID2 _IOWR(KVMIO, 0x91, struct kvm_cpuid2) | 689 | #define KVM_GET_CPUID2 _IOWR(KVMIO, 0x91, struct kvm_cpuid2) |
689 | /* Available with KVM_CAP_VAPIC */ | 690 | /* Available with KVM_CAP_VAPIC */ |
690 | #define KVM_TPR_ACCESS_REPORTING _IOWR(KVMIO, 0x92, struct kvm_tpr_access_ctl) | 691 | #define KVM_TPR_ACCESS_REPORTING _IOWR(KVMIO, 0x92, struct kvm_tpr_access_ctl) |
691 | /* Available with KVM_CAP_VAPIC */ | 692 | /* Available with KVM_CAP_VAPIC */ |
692 | #define KVM_SET_VAPIC_ADDR _IOW(KVMIO, 0x93, struct kvm_vapic_addr) | 693 | #define KVM_SET_VAPIC_ADDR _IOW(KVMIO, 0x93, struct kvm_vapic_addr) |
693 | /* valid for virtual machine (for floating interrupt) _and_ vcpu */ | 694 | /* valid for virtual machine (for floating interrupt) _and_ vcpu */ |
694 | #define KVM_S390_INTERRUPT _IOW(KVMIO, 0x94, struct kvm_s390_interrupt) | 695 | #define KVM_S390_INTERRUPT _IOW(KVMIO, 0x94, struct kvm_s390_interrupt) |
695 | /* store status for s390 */ | 696 | /* store status for s390 */ |
696 | #define KVM_S390_STORE_STATUS_NOADDR (-1ul) | 697 | #define KVM_S390_STORE_STATUS_NOADDR (-1ul) |
697 | #define KVM_S390_STORE_STATUS_PREFIXED (-2ul) | 698 | #define KVM_S390_STORE_STATUS_PREFIXED (-2ul) |
698 | #define KVM_S390_STORE_STATUS _IOW(KVMIO, 0x95, unsigned long) | 699 | #define KVM_S390_STORE_STATUS _IOW(KVMIO, 0x95, unsigned long) |
699 | /* initial ipl psw for s390 */ | 700 | /* initial ipl psw for s390 */ |
700 | #define KVM_S390_SET_INITIAL_PSW _IOW(KVMIO, 0x96, struct kvm_s390_psw) | 701 | #define KVM_S390_SET_INITIAL_PSW _IOW(KVMIO, 0x96, struct kvm_s390_psw) |
701 | /* initial reset for s390 */ | 702 | /* initial reset for s390 */ |
702 | #define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97) | 703 | #define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97) |
703 | #define KVM_GET_MP_STATE _IOR(KVMIO, 0x98, struct kvm_mp_state) | 704 | #define KVM_GET_MP_STATE _IOR(KVMIO, 0x98, struct kvm_mp_state) |
704 | #define KVM_SET_MP_STATE _IOW(KVMIO, 0x99, struct kvm_mp_state) | 705 | #define KVM_SET_MP_STATE _IOW(KVMIO, 0x99, struct kvm_mp_state) |
705 | /* Available with KVM_CAP_NMI */ | 706 | /* Available with KVM_CAP_NMI */ |
706 | #define KVM_NMI _IO(KVMIO, 0x9a) | 707 | #define KVM_NMI _IO(KVMIO, 0x9a) |
707 | /* Available with KVM_CAP_SET_GUEST_DEBUG */ | 708 | /* Available with KVM_CAP_SET_GUEST_DEBUG */ |
708 | #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) | 709 | #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) |
709 | /* MCE for x86 */ | 710 | /* MCE for x86 */ |
710 | #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64) | 711 | #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64) |
711 | #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64) | 712 | #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64) |
712 | #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce) | 713 | #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce) |
713 | /* IA64 stack access */ | 714 | /* IA64 stack access */ |
714 | #define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) | 715 | #define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) |
715 | #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) | 716 | #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) |
716 | /* Available with KVM_CAP_VCPU_EVENTS */ | 717 | /* Available with KVM_CAP_VCPU_EVENTS */ |
717 | #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events) | 718 | #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events) |
718 | #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events) | 719 | #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events) |
719 | /* Available with KVM_CAP_DEBUGREGS */ | 720 | /* Available with KVM_CAP_DEBUGREGS */ |
720 | #define KVM_GET_DEBUGREGS _IOR(KVMIO, 0xa1, struct kvm_debugregs) | 721 | #define KVM_GET_DEBUGREGS _IOR(KVMIO, 0xa1, struct kvm_debugregs) |
721 | #define KVM_SET_DEBUGREGS _IOW(KVMIO, 0xa2, struct kvm_debugregs) | 722 | #define KVM_SET_DEBUGREGS _IOW(KVMIO, 0xa2, struct kvm_debugregs) |
722 | #define KVM_ENABLE_CAP _IOW(KVMIO, 0xa3, struct kvm_enable_cap) | 723 | #define KVM_ENABLE_CAP _IOW(KVMIO, 0xa3, struct kvm_enable_cap) |
723 | /* Available with KVM_CAP_XSAVE */ | 724 | /* Available with KVM_CAP_XSAVE */ |
724 | #define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave) | 725 | #define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave) |
725 | #define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave) | 726 | #define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave) |
726 | /* Available with KVM_CAP_XCRS */ | 727 | /* Available with KVM_CAP_XCRS */ |
727 | #define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs) | 728 | #define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs) |
728 | #define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs) | 729 | #define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs) |
729 | 730 | ||
730 | #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) | 731 | #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) |
731 | 732 | ||
732 | struct kvm_assigned_pci_dev { | 733 | struct kvm_assigned_pci_dev { |
733 | __u32 assigned_dev_id; | 734 | __u32 assigned_dev_id; |
734 | __u32 busnr; | 735 | __u32 busnr; |
735 | __u32 devfn; | 736 | __u32 devfn; |
736 | __u32 flags; | 737 | __u32 flags; |
737 | __u32 segnr; | 738 | __u32 segnr; |
738 | union { | 739 | union { |
739 | __u32 reserved[11]; | 740 | __u32 reserved[11]; |
740 | }; | 741 | }; |
741 | }; | 742 | }; |
742 | 743 | ||
743 | #define KVM_DEV_IRQ_HOST_INTX (1 << 0) | 744 | #define KVM_DEV_IRQ_HOST_INTX (1 << 0) |
744 | #define KVM_DEV_IRQ_HOST_MSI (1 << 1) | 745 | #define KVM_DEV_IRQ_HOST_MSI (1 << 1) |
745 | #define KVM_DEV_IRQ_HOST_MSIX (1 << 2) | 746 | #define KVM_DEV_IRQ_HOST_MSIX (1 << 2) |
746 | 747 | ||
747 | #define KVM_DEV_IRQ_GUEST_INTX (1 << 8) | 748 | #define KVM_DEV_IRQ_GUEST_INTX (1 << 8) |
748 | #define KVM_DEV_IRQ_GUEST_MSI (1 << 9) | 749 | #define KVM_DEV_IRQ_GUEST_MSI (1 << 9) |
749 | #define KVM_DEV_IRQ_GUEST_MSIX (1 << 10) | 750 | #define KVM_DEV_IRQ_GUEST_MSIX (1 << 10) |
750 | 751 | ||
751 | #define KVM_DEV_IRQ_HOST_MASK 0x00ff | 752 | #define KVM_DEV_IRQ_HOST_MASK 0x00ff |
752 | #define KVM_DEV_IRQ_GUEST_MASK 0xff00 | 753 | #define KVM_DEV_IRQ_GUEST_MASK 0xff00 |
753 | 754 | ||
754 | struct kvm_assigned_irq { | 755 | struct kvm_assigned_irq { |
755 | __u32 assigned_dev_id; | 756 | __u32 assigned_dev_id; |
756 | __u32 host_irq; | 757 | __u32 host_irq; |
757 | __u32 guest_irq; | 758 | __u32 guest_irq; |
758 | __u32 flags; | 759 | __u32 flags; |
759 | union { | 760 | union { |
760 | struct { | 761 | struct { |
761 | __u32 addr_lo; | 762 | __u32 addr_lo; |
762 | __u32 addr_hi; | 763 | __u32 addr_hi; |
763 | __u32 data; | 764 | __u32 data; |
764 | } guest_msi; | 765 | } guest_msi; |
765 | __u32 reserved[12]; | 766 | __u32 reserved[12]; |
766 | }; | 767 | }; |
767 | }; | 768 | }; |
768 | 769 | ||
769 | 770 | ||
770 | struct kvm_assigned_msix_nr { | 771 | struct kvm_assigned_msix_nr { |
771 | __u32 assigned_dev_id; | 772 | __u32 assigned_dev_id; |
772 | __u16 entry_nr; | 773 | __u16 entry_nr; |
773 | __u16 padding; | 774 | __u16 padding; |
774 | }; | 775 | }; |
775 | 776 | ||
776 | #define KVM_MAX_MSIX_PER_DEV 256 | 777 | #define KVM_MAX_MSIX_PER_DEV 256 |
777 | struct kvm_assigned_msix_entry { | 778 | struct kvm_assigned_msix_entry { |
778 | __u32 assigned_dev_id; | 779 | __u32 assigned_dev_id; |
779 | __u32 gsi; | 780 | __u32 gsi; |
780 | __u16 entry; /* The index of entry in the MSI-X table */ | 781 | __u16 entry; /* The index of entry in the MSI-X table */ |
781 | __u16 padding[3]; | 782 | __u16 padding[3]; |
782 | }; | 783 | }; |
783 | 784 | ||
784 | #endif /* __LINUX_KVM_H */ | 785 | #endif /* __LINUX_KVM_H */ |
785 | 786 |
include/linux/kvm_host.h
1 | #ifndef __KVM_HOST_H | 1 | #ifndef __KVM_HOST_H |
2 | #define __KVM_HOST_H | 2 | #define __KVM_HOST_H |
3 | 3 | ||
4 | /* | 4 | /* |
5 | * This work is licensed under the terms of the GNU GPL, version 2. See | 5 | * This work is licensed under the terms of the GNU GPL, version 2. See |
6 | * the COPYING file in the top-level directory. | 6 | * the COPYING file in the top-level directory. |
7 | */ | 7 | */ |
8 | 8 | ||
9 | #include <linux/types.h> | 9 | #include <linux/types.h> |
10 | #include <linux/hardirq.h> | 10 | #include <linux/hardirq.h> |
11 | #include <linux/list.h> | 11 | #include <linux/list.h> |
12 | #include <linux/mutex.h> | 12 | #include <linux/mutex.h> |
13 | #include <linux/spinlock.h> | 13 | #include <linux/spinlock.h> |
14 | #include <linux/signal.h> | 14 | #include <linux/signal.h> |
15 | #include <linux/sched.h> | 15 | #include <linux/sched.h> |
16 | #include <linux/mm.h> | 16 | #include <linux/mm.h> |
17 | #include <linux/preempt.h> | 17 | #include <linux/preempt.h> |
18 | #include <linux/msi.h> | 18 | #include <linux/msi.h> |
19 | #include <asm/signal.h> | 19 | #include <asm/signal.h> |
20 | 20 | ||
21 | #include <linux/kvm.h> | 21 | #include <linux/kvm.h> |
22 | #include <linux/kvm_para.h> | 22 | #include <linux/kvm_para.h> |
23 | 23 | ||
24 | #include <linux/kvm_types.h> | 24 | #include <linux/kvm_types.h> |
25 | 25 | ||
26 | #include <asm/kvm_host.h> | 26 | #include <asm/kvm_host.h> |
27 | 27 | ||
28 | /* | 28 | /* |
29 | * vcpu->requests bit members | 29 | * vcpu->requests bit members |
30 | */ | 30 | */ |
31 | #define KVM_REQ_TLB_FLUSH 0 | 31 | #define KVM_REQ_TLB_FLUSH 0 |
32 | #define KVM_REQ_MIGRATE_TIMER 1 | 32 | #define KVM_REQ_MIGRATE_TIMER 1 |
33 | #define KVM_REQ_REPORT_TPR_ACCESS 2 | 33 | #define KVM_REQ_REPORT_TPR_ACCESS 2 |
34 | #define KVM_REQ_MMU_RELOAD 3 | 34 | #define KVM_REQ_MMU_RELOAD 3 |
35 | #define KVM_REQ_TRIPLE_FAULT 4 | 35 | #define KVM_REQ_TRIPLE_FAULT 4 |
36 | #define KVM_REQ_PENDING_TIMER 5 | 36 | #define KVM_REQ_PENDING_TIMER 5 |
37 | #define KVM_REQ_UNHALT 6 | 37 | #define KVM_REQ_UNHALT 6 |
38 | #define KVM_REQ_MMU_SYNC 7 | 38 | #define KVM_REQ_MMU_SYNC 7 |
39 | #define KVM_REQ_KVMCLOCK_UPDATE 8 | 39 | #define KVM_REQ_KVMCLOCK_UPDATE 8 |
40 | #define KVM_REQ_KICK 9 | 40 | #define KVM_REQ_KICK 9 |
41 | #define KVM_REQ_DEACTIVATE_FPU 10 | 41 | #define KVM_REQ_DEACTIVATE_FPU 10 |
42 | 42 | ||
43 | #define KVM_USERSPACE_IRQ_SOURCE_ID 0 | 43 | #define KVM_USERSPACE_IRQ_SOURCE_ID 0 |
44 | 44 | ||
45 | struct kvm; | 45 | struct kvm; |
46 | struct kvm_vcpu; | 46 | struct kvm_vcpu; |
47 | extern struct kmem_cache *kvm_vcpu_cache; | 47 | extern struct kmem_cache *kvm_vcpu_cache; |
48 | 48 | ||
49 | /* | 49 | /* |
50 | * It would be nice to use something smarter than a linear search, TBD... | 50 | * It would be nice to use something smarter than a linear search, TBD... |
51 | * Thankfully we don't expect many devices to register (famous last words :), | 51 | * Thankfully we don't expect many devices to register (famous last words :), |
52 | * so until then it will suffice. At least it's abstracted so we can change | 52 | * so until then it will suffice. At least it's abstracted so we can change |
53 | * in one place. | 53 | * in one place. |
54 | */ | 54 | */ |
55 | struct kvm_io_bus { | 55 | struct kvm_io_bus { |
56 | int dev_count; | 56 | int dev_count; |
57 | #define NR_IOBUS_DEVS 200 | 57 | #define NR_IOBUS_DEVS 200 |
58 | struct kvm_io_device *devs[NR_IOBUS_DEVS]; | 58 | struct kvm_io_device *devs[NR_IOBUS_DEVS]; |
59 | }; | 59 | }; |
60 | 60 | ||
61 | enum kvm_bus { | 61 | enum kvm_bus { |
62 | KVM_MMIO_BUS, | 62 | KVM_MMIO_BUS, |
63 | KVM_PIO_BUS, | 63 | KVM_PIO_BUS, |
64 | KVM_NR_BUSES | 64 | KVM_NR_BUSES |
65 | }; | 65 | }; |
66 | 66 | ||
67 | int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, | 67 | int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, |
68 | int len, const void *val); | 68 | int len, const void *val); |
69 | int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int len, | 69 | int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int len, |
70 | void *val); | 70 | void *val); |
71 | int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, | 71 | int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, |
72 | struct kvm_io_device *dev); | 72 | struct kvm_io_device *dev); |
73 | int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, | 73 | int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, |
74 | struct kvm_io_device *dev); | 74 | struct kvm_io_device *dev); |
75 | 75 | ||
76 | struct kvm_vcpu { | 76 | struct kvm_vcpu { |
77 | struct kvm *kvm; | 77 | struct kvm *kvm; |
78 | #ifdef CONFIG_PREEMPT_NOTIFIERS | 78 | #ifdef CONFIG_PREEMPT_NOTIFIERS |
79 | struct preempt_notifier preempt_notifier; | 79 | struct preempt_notifier preempt_notifier; |
80 | #endif | 80 | #endif |
81 | int vcpu_id; | 81 | int vcpu_id; |
82 | struct mutex mutex; | 82 | struct mutex mutex; |
83 | int cpu; | 83 | int cpu; |
84 | atomic_t guest_mode; | 84 | atomic_t guest_mode; |
85 | struct kvm_run *run; | 85 | struct kvm_run *run; |
86 | unsigned long requests; | 86 | unsigned long requests; |
87 | unsigned long guest_debug; | 87 | unsigned long guest_debug; |
88 | int srcu_idx; | 88 | int srcu_idx; |
89 | 89 | ||
90 | int fpu_active; | 90 | int fpu_active; |
91 | int guest_fpu_loaded, guest_xcr0_loaded; | 91 | int guest_fpu_loaded, guest_xcr0_loaded; |
92 | wait_queue_head_t wq; | 92 | wait_queue_head_t wq; |
93 | int sigset_active; | 93 | int sigset_active; |
94 | sigset_t sigset; | 94 | sigset_t sigset; |
95 | struct kvm_vcpu_stat stat; | 95 | struct kvm_vcpu_stat stat; |
96 | 96 | ||
97 | #ifdef CONFIG_HAS_IOMEM | 97 | #ifdef CONFIG_HAS_IOMEM |
98 | int mmio_needed; | 98 | int mmio_needed; |
99 | int mmio_read_completed; | 99 | int mmio_read_completed; |
100 | int mmio_is_write; | 100 | int mmio_is_write; |
101 | int mmio_size; | 101 | int mmio_size; |
102 | unsigned char mmio_data[8]; | 102 | unsigned char mmio_data[8]; |
103 | gpa_t mmio_phys_addr; | 103 | gpa_t mmio_phys_addr; |
104 | #endif | 104 | #endif |
105 | 105 | ||
106 | struct kvm_vcpu_arch arch; | 106 | struct kvm_vcpu_arch arch; |
107 | }; | 107 | }; |
108 | 108 | ||
109 | /* | 109 | /* |
110 | * Some of the bitops functions do not support overly long bitmaps. | 110 | * Some of the bitops functions do not support overly long bitmaps. |
111 | * This number must be chosen so as not to exceed those limits. | 111 | * This number must be chosen so as not to exceed those limits. |
112 | */ | 112 | */ |
113 | #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1) | 113 | #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1) |
114 | 114 | ||
115 | struct kvm_memory_slot { | 115 | struct kvm_memory_slot { |
116 | gfn_t base_gfn; | 116 | gfn_t base_gfn; |
117 | unsigned long npages; | 117 | unsigned long npages; |
118 | unsigned long flags; | 118 | unsigned long flags; |
119 | unsigned long *rmap; | 119 | unsigned long *rmap; |
120 | unsigned long *dirty_bitmap; | 120 | unsigned long *dirty_bitmap; |
121 | struct { | 121 | struct { |
122 | unsigned long rmap_pde; | 122 | unsigned long rmap_pde; |
123 | int write_count; | 123 | int write_count; |
124 | } *lpage_info[KVM_NR_PAGE_SIZES - 1]; | 124 | } *lpage_info[KVM_NR_PAGE_SIZES - 1]; |
125 | unsigned long userspace_addr; | 125 | unsigned long userspace_addr; |
126 | int user_alloc; | 126 | int user_alloc; |
127 | }; | 127 | }; |
128 | 128 | ||
129 | static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot) | 129 | static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot) |
130 | { | 130 | { |
131 | return ALIGN(memslot->npages, BITS_PER_LONG) / 8; | 131 | return ALIGN(memslot->npages, BITS_PER_LONG) / 8; |
132 | } | 132 | } |
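For example, a slot with npages = 100 needs 100 dirty bits; ALIGN() rounds that up to 128 bits, so the helper returns 16 bytes and the bitmap always ends on a long-word boundary.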
133 | 133 | ||
134 | struct kvm_kernel_irq_routing_entry { | 134 | struct kvm_kernel_irq_routing_entry { |
135 | u32 gsi; | 135 | u32 gsi; |
136 | u32 type; | 136 | u32 type; |
137 | int (*set)(struct kvm_kernel_irq_routing_entry *e, | 137 | int (*set)(struct kvm_kernel_irq_routing_entry *e, |
138 | struct kvm *kvm, int irq_source_id, int level); | 138 | struct kvm *kvm, int irq_source_id, int level); |
139 | union { | 139 | union { |
140 | struct { | 140 | struct { |
141 | unsigned irqchip; | 141 | unsigned irqchip; |
142 | unsigned pin; | 142 | unsigned pin; |
143 | } irqchip; | 143 | } irqchip; |
144 | struct msi_msg msi; | 144 | struct msi_msg msi; |
145 | }; | 145 | }; |
146 | struct hlist_node link; | 146 | struct hlist_node link; |
147 | }; | 147 | }; |
148 | 148 | ||
149 | #ifdef __KVM_HAVE_IOAPIC | 149 | #ifdef __KVM_HAVE_IOAPIC |
150 | 150 | ||
151 | struct kvm_irq_routing_table { | 151 | struct kvm_irq_routing_table { |
152 | int chip[KVM_NR_IRQCHIPS][KVM_IOAPIC_NUM_PINS]; | 152 | int chip[KVM_NR_IRQCHIPS][KVM_IOAPIC_NUM_PINS]; |
153 | struct kvm_kernel_irq_routing_entry *rt_entries; | 153 | struct kvm_kernel_irq_routing_entry *rt_entries; |
154 | u32 nr_rt_entries; | 154 | u32 nr_rt_entries; |
155 | /* | 155 | /* |
156 | * Array indexed by gsi. Each entry contains a list of the irq chips | 156 | * Array indexed by gsi. Each entry contains a list of the irq chips |
157 | * the gsi is connected to. | 157 | * the gsi is connected to. |
158 | */ | 158 | */ |
159 | struct hlist_head map[0]; | 159 | struct hlist_head map[0]; |
160 | }; | 160 | }; |
161 | 161 | ||
162 | #else | 162 | #else |
163 | 163 | ||
164 | struct kvm_irq_routing_table {}; | 164 | struct kvm_irq_routing_table {}; |
165 | 165 | ||
166 | #endif | 166 | #endif |
167 | 167 | ||
168 | struct kvm_memslots { | 168 | struct kvm_memslots { |
169 | int nmemslots; | 169 | int nmemslots; |
170 | struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS + | 170 | struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS + |
171 | KVM_PRIVATE_MEM_SLOTS]; | 171 | KVM_PRIVATE_MEM_SLOTS]; |
172 | }; | 172 | }; |
173 | 173 | ||
174 | struct kvm { | 174 | struct kvm { |
175 | spinlock_t mmu_lock; | 175 | spinlock_t mmu_lock; |
176 | raw_spinlock_t requests_lock; | 176 | raw_spinlock_t requests_lock; |
177 | struct mutex slots_lock; | 177 | struct mutex slots_lock; |
178 | struct mm_struct *mm; /* userspace tied to this vm */ | 178 | struct mm_struct *mm; /* userspace tied to this vm */ |
179 | struct kvm_memslots *memslots; | 179 | struct kvm_memslots *memslots; |
180 | struct srcu_struct srcu; | 180 | struct srcu_struct srcu; |
181 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE | 181 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE |
182 | u32 bsp_vcpu_id; | 182 | u32 bsp_vcpu_id; |
183 | struct kvm_vcpu *bsp_vcpu; | 183 | struct kvm_vcpu *bsp_vcpu; |
184 | #endif | 184 | #endif |
185 | struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; | 185 | struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; |
186 | atomic_t online_vcpus; | 186 | atomic_t online_vcpus; |
187 | struct list_head vm_list; | 187 | struct list_head vm_list; |
188 | struct mutex lock; | 188 | struct mutex lock; |
189 | struct kvm_io_bus *buses[KVM_NR_BUSES]; | 189 | struct kvm_io_bus *buses[KVM_NR_BUSES]; |
190 | #ifdef CONFIG_HAVE_KVM_EVENTFD | 190 | #ifdef CONFIG_HAVE_KVM_EVENTFD |
191 | struct { | 191 | struct { |
192 | spinlock_t lock; | 192 | spinlock_t lock; |
193 | struct list_head items; | 193 | struct list_head items; |
194 | } irqfds; | 194 | } irqfds; |
195 | struct list_head ioeventfds; | 195 | struct list_head ioeventfds; |
196 | #endif | 196 | #endif |
197 | struct kvm_vm_stat stat; | 197 | struct kvm_vm_stat stat; |
198 | struct kvm_arch arch; | 198 | struct kvm_arch arch; |
199 | atomic_t users_count; | 199 | atomic_t users_count; |
200 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET | 200 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET |
201 | struct kvm_coalesced_mmio_dev *coalesced_mmio_dev; | 201 | struct kvm_coalesced_mmio_dev *coalesced_mmio_dev; |
202 | struct kvm_coalesced_mmio_ring *coalesced_mmio_ring; | 202 | struct kvm_coalesced_mmio_ring *coalesced_mmio_ring; |
203 | #endif | 203 | #endif |
204 | 204 | ||
205 | struct mutex irq_lock; | 205 | struct mutex irq_lock; |
206 | #ifdef CONFIG_HAVE_KVM_IRQCHIP | 206 | #ifdef CONFIG_HAVE_KVM_IRQCHIP |
207 | struct kvm_irq_routing_table *irq_routing; | 207 | struct kvm_irq_routing_table *irq_routing; |
208 | struct hlist_head mask_notifier_list; | 208 | struct hlist_head mask_notifier_list; |
209 | struct hlist_head irq_ack_notifier_list; | 209 | struct hlist_head irq_ack_notifier_list; |
210 | #endif | 210 | #endif |
211 | 211 | ||
212 | #ifdef KVM_ARCH_WANT_MMU_NOTIFIER | 212 | #ifdef KVM_ARCH_WANT_MMU_NOTIFIER |
213 | struct mmu_notifier mmu_notifier; | 213 | struct mmu_notifier mmu_notifier; |
214 | unsigned long mmu_notifier_seq; | 214 | unsigned long mmu_notifier_seq; |
215 | long mmu_notifier_count; | 215 | long mmu_notifier_count; |
216 | #endif | 216 | #endif |
217 | }; | 217 | }; |
218 | 218 | ||
219 | /* The guest did something we don't support. */ | 219 | /* The guest did something we don't support. */ |
220 | #define pr_unimpl(vcpu, fmt, ...) \ | 220 | #define pr_unimpl(vcpu, fmt, ...) \ |
221 | do { \ | 221 | do { \ |
222 | if (printk_ratelimit()) \ | 222 | if (printk_ratelimit()) \ |
223 | printk(KERN_ERR "kvm: %i: cpu%i " fmt, \ | 223 | printk(KERN_ERR "kvm: %i: cpu%i " fmt, \ |
224 | current->tgid, (vcpu)->vcpu_id, ## __VA_ARGS__); \ | 224 | current->tgid, (vcpu)->vcpu_id, ## __VA_ARGS__); \ |
225 | } while (0) | 225 | } while (0) |
226 | 226 | ||
227 | #define kvm_printf(kvm, fmt ...) printk(KERN_DEBUG fmt) | 227 | #define kvm_printf(kvm, fmt ...) printk(KERN_DEBUG fmt) |
228 | #define vcpu_printf(vcpu, fmt...) kvm_printf(vcpu->kvm, fmt) | 228 | #define vcpu_printf(vcpu, fmt...) kvm_printf(vcpu->kvm, fmt) |
229 | 229 | ||
230 | static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) | 230 | static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) |
231 | { | 231 | { |
232 | smp_rmb(); | 232 | smp_rmb(); |
233 | return kvm->vcpus[i]; | 233 | return kvm->vcpus[i]; |
234 | } | 234 | } |
235 | 235 | ||
236 | #define kvm_for_each_vcpu(idx, vcpup, kvm) \ | 236 | #define kvm_for_each_vcpu(idx, vcpup, kvm) \ |
237 | for (idx = 0, vcpup = kvm_get_vcpu(kvm, idx); \ | 237 | for (idx = 0, vcpup = kvm_get_vcpu(kvm, idx); \ |
238 | idx < atomic_read(&kvm->online_vcpus) && vcpup; \ | 238 | idx < atomic_read(&kvm->online_vcpus) && vcpup; \ |
239 | vcpup = kvm_get_vcpu(kvm, ++idx)) | 239 | vcpup = kvm_get_vcpu(kvm, ++idx)) |
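As a usage sketch (post_request_to_all is an illustrative name; the KVM_REQ_* request bits and kvm_vcpu_kick() are declared elsewhere in this header), iterating every online vcpu looks like this, mirroring the request-posting in make_all_cpus_request() in kvm_main.c:

	/* Sketch: raise a request bit on each online vcpu and kick it. */
	static void post_request_to_all(struct kvm *kvm, unsigned int req)
	{
		int i;
		struct kvm_vcpu *vcpu;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			if (!test_and_set_bit(req, &vcpu->requests))
				kvm_vcpu_kick(vcpu);
		}
	}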
240 | 240 | ||
241 | int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id); | 241 | int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id); |
242 | void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); | 242 | void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); |
243 | 243 | ||
244 | void vcpu_load(struct kvm_vcpu *vcpu); | 244 | void vcpu_load(struct kvm_vcpu *vcpu); |
245 | void vcpu_put(struct kvm_vcpu *vcpu); | 245 | void vcpu_put(struct kvm_vcpu *vcpu); |
246 | 246 | ||
247 | int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, | 247 | int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, |
248 | struct module *module); | 248 | struct module *module); |
249 | void kvm_exit(void); | 249 | void kvm_exit(void); |
250 | 250 | ||
251 | void kvm_get_kvm(struct kvm *kvm); | 251 | void kvm_get_kvm(struct kvm *kvm); |
252 | void kvm_put_kvm(struct kvm *kvm); | 252 | void kvm_put_kvm(struct kvm *kvm); |
253 | 253 | ||
254 | static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) | 254 | static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) |
255 | { | 255 | { |
256 | return rcu_dereference_check(kvm->memslots, | 256 | return rcu_dereference_check(kvm->memslots, |
257 | srcu_read_lock_held(&kvm->srcu) | 257 | srcu_read_lock_held(&kvm->srcu) |
258 | || lockdep_is_held(&kvm->slots_lock)); | 258 | || lockdep_is_held(&kvm->slots_lock)); |
259 | } | 259 | } |
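The rcu_dereference_check() above encodes the access rule: hold either the SRCU read lock or slots_lock while the slot array is in use. A minimal reader sketch (count_memslots is an illustrative name):

	/* Sketch: dereference the slot array under SRCU, as required. */
	static int count_memslots(struct kvm *kvm)
	{
		int idx, n;

		idx = srcu_read_lock(&kvm->srcu);
		n = kvm_memslots(kvm)->nmemslots;
		srcu_read_unlock(&kvm->srcu, idx);
		return n;
	}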
260 | 260 | ||
261 | #define HPA_MSB ((sizeof(hpa_t) * 8) - 1) | 261 | #define HPA_MSB ((sizeof(hpa_t) * 8) - 1) |
262 | #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB) | 262 | #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB) |
263 | static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; } | 263 | static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; } |
264 | 264 | ||
265 | extern struct page *bad_page; | 265 | extern struct page *bad_page; |
266 | extern pfn_t bad_pfn; | 266 | extern pfn_t bad_pfn; |
267 | 267 | ||
268 | int is_error_page(struct page *page); | 268 | int is_error_page(struct page *page); |
269 | int is_error_pfn(pfn_t pfn); | 269 | int is_error_pfn(pfn_t pfn); |
270 | int is_hwpoison_pfn(pfn_t pfn); | 270 | int is_hwpoison_pfn(pfn_t pfn); |
271 | int kvm_is_error_hva(unsigned long addr); | 271 | int kvm_is_error_hva(unsigned long addr); |
272 | int kvm_set_memory_region(struct kvm *kvm, | 272 | int kvm_set_memory_region(struct kvm *kvm, |
273 | struct kvm_userspace_memory_region *mem, | 273 | struct kvm_userspace_memory_region *mem, |
274 | int user_alloc); | 274 | int user_alloc); |
275 | int __kvm_set_memory_region(struct kvm *kvm, | 275 | int __kvm_set_memory_region(struct kvm *kvm, |
276 | struct kvm_userspace_memory_region *mem, | 276 | struct kvm_userspace_memory_region *mem, |
277 | int user_alloc); | 277 | int user_alloc); |
278 | int kvm_arch_prepare_memory_region(struct kvm *kvm, | 278 | int kvm_arch_prepare_memory_region(struct kvm *kvm, |
279 | struct kvm_memory_slot *memslot, | 279 | struct kvm_memory_slot *memslot, |
280 | struct kvm_memory_slot old, | 280 | struct kvm_memory_slot old, |
281 | struct kvm_userspace_memory_region *mem, | 281 | struct kvm_userspace_memory_region *mem, |
282 | int user_alloc); | 282 | int user_alloc); |
283 | void kvm_arch_commit_memory_region(struct kvm *kvm, | 283 | void kvm_arch_commit_memory_region(struct kvm *kvm, |
284 | struct kvm_userspace_memory_region *mem, | 284 | struct kvm_userspace_memory_region *mem, |
285 | struct kvm_memory_slot old, | 285 | struct kvm_memory_slot old, |
286 | int user_alloc); | 286 | int user_alloc); |
287 | void kvm_disable_largepages(void); | 287 | void kvm_disable_largepages(void); |
288 | void kvm_arch_flush_shadow(struct kvm *kvm); | 288 | void kvm_arch_flush_shadow(struct kvm *kvm); |
289 | gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn); | ||
290 | gfn_t unalias_gfn_instantiation(struct kvm *kvm, gfn_t gfn); | ||
291 | 289 | ||
292 | struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn); | 290 | struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn); |
293 | unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn); | 291 | unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn); |
294 | void kvm_release_page_clean(struct page *page); | 292 | void kvm_release_page_clean(struct page *page); |
295 | void kvm_release_page_dirty(struct page *page); | 293 | void kvm_release_page_dirty(struct page *page); |
296 | void kvm_set_page_dirty(struct page *page); | 294 | void kvm_set_page_dirty(struct page *page); |
297 | void kvm_set_page_accessed(struct page *page); | 295 | void kvm_set_page_accessed(struct page *page); |
298 | 296 | ||
299 | pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); | 297 | pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); |
300 | pfn_t gfn_to_pfn_memslot(struct kvm *kvm, | 298 | pfn_t gfn_to_pfn_memslot(struct kvm *kvm, |
301 | struct kvm_memory_slot *slot, gfn_t gfn); | 299 | struct kvm_memory_slot *slot, gfn_t gfn); |
302 | int memslot_id(struct kvm *kvm, gfn_t gfn); | 300 | int memslot_id(struct kvm *kvm, gfn_t gfn); |
303 | void kvm_release_pfn_dirty(pfn_t); | 301 | void kvm_release_pfn_dirty(pfn_t); |
304 | void kvm_release_pfn_clean(pfn_t pfn); | 302 | void kvm_release_pfn_clean(pfn_t pfn); |
305 | void kvm_set_pfn_dirty(pfn_t pfn); | 303 | void kvm_set_pfn_dirty(pfn_t pfn); |
306 | void kvm_set_pfn_accessed(pfn_t pfn); | 304 | void kvm_set_pfn_accessed(pfn_t pfn); |
307 | void kvm_get_pfn(pfn_t pfn); | 305 | void kvm_get_pfn(pfn_t pfn); |
308 | 306 | ||
309 | int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, | 307 | int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, |
310 | int len); | 308 | int len); |
311 | int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data, | 309 | int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data, |
312 | unsigned long len); | 310 | unsigned long len); |
313 | int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len); | 311 | int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len); |
314 | int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, | 312 | int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, |
315 | int offset, int len); | 313 | int offset, int len); |
316 | int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, | 314 | int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, |
317 | unsigned long len); | 315 | unsigned long len); |
318 | int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len); | 316 | int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len); |
319 | int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len); | 317 | int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len); |
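These copy helpers return 0 on success and a negative value when any part of the copy fails, so callers must check every step. A minimal sketch (toggle_guest_flag and the meaning of the flag are illustrative):

	/* Sketch: read a guest u32 at gpa, flip bit 0, write it back. */
	static int toggle_guest_flag(struct kvm *kvm, gpa_t gpa)
	{
		u32 v;
		int r;

		r = kvm_read_guest(kvm, gpa, &v, sizeof(v));
		if (r)
			return r;
		v ^= 1;
		return kvm_write_guest(kvm, gpa, &v, sizeof(v));
	}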
320 | struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); | 318 | struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); |
321 | int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); | 319 | int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); |
322 | unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn); | 320 | unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn); |
323 | void mark_page_dirty(struct kvm *kvm, gfn_t gfn); | 321 | void mark_page_dirty(struct kvm *kvm, gfn_t gfn); |
324 | 322 | ||
325 | void kvm_vcpu_block(struct kvm_vcpu *vcpu); | 323 | void kvm_vcpu_block(struct kvm_vcpu *vcpu); |
326 | void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); | 324 | void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); |
327 | void kvm_resched(struct kvm_vcpu *vcpu); | 325 | void kvm_resched(struct kvm_vcpu *vcpu); |
328 | void kvm_load_guest_fpu(struct kvm_vcpu *vcpu); | 326 | void kvm_load_guest_fpu(struct kvm_vcpu *vcpu); |
329 | void kvm_put_guest_fpu(struct kvm_vcpu *vcpu); | 327 | void kvm_put_guest_fpu(struct kvm_vcpu *vcpu); |
330 | void kvm_flush_remote_tlbs(struct kvm *kvm); | 328 | void kvm_flush_remote_tlbs(struct kvm *kvm); |
331 | void kvm_reload_remote_mmus(struct kvm *kvm); | 329 | void kvm_reload_remote_mmus(struct kvm *kvm); |
332 | 330 | ||
333 | long kvm_arch_dev_ioctl(struct file *filp, | 331 | long kvm_arch_dev_ioctl(struct file *filp, |
334 | unsigned int ioctl, unsigned long arg); | 332 | unsigned int ioctl, unsigned long arg); |
335 | long kvm_arch_vcpu_ioctl(struct file *filp, | 333 | long kvm_arch_vcpu_ioctl(struct file *filp, |
336 | unsigned int ioctl, unsigned long arg); | 334 | unsigned int ioctl, unsigned long arg); |
337 | 335 | ||
338 | int kvm_dev_ioctl_check_extension(long ext); | 336 | int kvm_dev_ioctl_check_extension(long ext); |
339 | 337 | ||
340 | int kvm_get_dirty_log(struct kvm *kvm, | 338 | int kvm_get_dirty_log(struct kvm *kvm, |
341 | struct kvm_dirty_log *log, int *is_dirty); | 339 | struct kvm_dirty_log *log, int *is_dirty); |
342 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, | 340 | int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, |
343 | struct kvm_dirty_log *log); | 341 | struct kvm_dirty_log *log); |
344 | 342 | ||
345 | int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, | 343 | int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, |
346 | struct | 344 | struct |
347 | kvm_userspace_memory_region *mem, | 345 | kvm_userspace_memory_region *mem, |
348 | int user_alloc); | 346 | int user_alloc); |
349 | long kvm_arch_vm_ioctl(struct file *filp, | 347 | long kvm_arch_vm_ioctl(struct file *filp, |
350 | unsigned int ioctl, unsigned long arg); | 348 | unsigned int ioctl, unsigned long arg); |
351 | 349 | ||
352 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); | 350 | int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); |
353 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); | 351 | int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); |
354 | 352 | ||
355 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, | 353 | int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, |
356 | struct kvm_translation *tr); | 354 | struct kvm_translation *tr); |
357 | 355 | ||
358 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs); | 356 | int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs); |
359 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs); | 357 | int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs); |
360 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, | 358 | int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, |
361 | struct kvm_sregs *sregs); | 359 | struct kvm_sregs *sregs); |
362 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, | 360 | int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, |
363 | struct kvm_sregs *sregs); | 361 | struct kvm_sregs *sregs); |
364 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, | 362 | int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, |
365 | struct kvm_mp_state *mp_state); | 363 | struct kvm_mp_state *mp_state); |
366 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, | 364 | int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, |
367 | struct kvm_mp_state *mp_state); | 365 | struct kvm_mp_state *mp_state); |
368 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, | 366 | int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, |
369 | struct kvm_guest_debug *dbg); | 367 | struct kvm_guest_debug *dbg); |
370 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run); | 368 | int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run); |
371 | 369 | ||
372 | int kvm_arch_init(void *opaque); | 370 | int kvm_arch_init(void *opaque); |
373 | void kvm_arch_exit(void); | 371 | void kvm_arch_exit(void); |
374 | 372 | ||
375 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu); | 373 | int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu); |
376 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu); | 374 | void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu); |
377 | 375 | ||
378 | void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu); | 376 | void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu); |
379 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); | 377 | void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); |
380 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); | 378 | void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); |
381 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id); | 379 | struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id); |
382 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu); | 380 | int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu); |
383 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu); | 381 | void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu); |
384 | 382 | ||
385 | int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu); | 383 | int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu); |
386 | int kvm_arch_hardware_enable(void *garbage); | 384 | int kvm_arch_hardware_enable(void *garbage); |
387 | void kvm_arch_hardware_disable(void *garbage); | 385 | void kvm_arch_hardware_disable(void *garbage); |
388 | int kvm_arch_hardware_setup(void); | 386 | int kvm_arch_hardware_setup(void); |
389 | void kvm_arch_hardware_unsetup(void); | 387 | void kvm_arch_hardware_unsetup(void); |
390 | void kvm_arch_check_processor_compat(void *rtn); | 388 | void kvm_arch_check_processor_compat(void *rtn); |
391 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); | 389 | int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); |
392 | 390 | ||
393 | void kvm_free_physmem(struct kvm *kvm); | 391 | void kvm_free_physmem(struct kvm *kvm); |
394 | 392 | ||
395 | struct kvm *kvm_arch_create_vm(void); | 393 | struct kvm *kvm_arch_create_vm(void); |
396 | void kvm_arch_destroy_vm(struct kvm *kvm); | 394 | void kvm_arch_destroy_vm(struct kvm *kvm); |
397 | void kvm_free_all_assigned_devices(struct kvm *kvm); | 395 | void kvm_free_all_assigned_devices(struct kvm *kvm); |
398 | void kvm_arch_sync_events(struct kvm *kvm); | 396 | void kvm_arch_sync_events(struct kvm *kvm); |
399 | 397 | ||
400 | int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); | 398 | int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); |
401 | void kvm_vcpu_kick(struct kvm_vcpu *vcpu); | 399 | void kvm_vcpu_kick(struct kvm_vcpu *vcpu); |
402 | 400 | ||
403 | int kvm_is_mmio_pfn(pfn_t pfn); | 401 | int kvm_is_mmio_pfn(pfn_t pfn); |
404 | 402 | ||
405 | struct kvm_irq_ack_notifier { | 403 | struct kvm_irq_ack_notifier { |
406 | struct hlist_node link; | 404 | struct hlist_node link; |
407 | unsigned gsi; | 405 | unsigned gsi; |
408 | void (*irq_acked)(struct kvm_irq_ack_notifier *kian); | 406 | void (*irq_acked)(struct kvm_irq_ack_notifier *kian); |
409 | }; | 407 | }; |
410 | 408 | ||
411 | #define KVM_ASSIGNED_MSIX_PENDING 0x1 | 409 | #define KVM_ASSIGNED_MSIX_PENDING 0x1 |
412 | struct kvm_guest_msix_entry { | 410 | struct kvm_guest_msix_entry { |
413 | u32 vector; | 411 | u32 vector; |
414 | u16 entry; | 412 | u16 entry; |
415 | u16 flags; | 413 | u16 flags; |
416 | }; | 414 | }; |
417 | 415 | ||
418 | struct kvm_assigned_dev_kernel { | 416 | struct kvm_assigned_dev_kernel { |
419 | struct kvm_irq_ack_notifier ack_notifier; | 417 | struct kvm_irq_ack_notifier ack_notifier; |
420 | struct work_struct interrupt_work; | 418 | struct work_struct interrupt_work; |
421 | struct list_head list; | 419 | struct list_head list; |
422 | int assigned_dev_id; | 420 | int assigned_dev_id; |
423 | int host_segnr; | 421 | int host_segnr; |
424 | int host_busnr; | 422 | int host_busnr; |
425 | int host_devfn; | 423 | int host_devfn; |
426 | unsigned int entries_nr; | 424 | unsigned int entries_nr; |
427 | int host_irq; | 425 | int host_irq; |
428 | bool host_irq_disabled; | 426 | bool host_irq_disabled; |
429 | struct msix_entry *host_msix_entries; | 427 | struct msix_entry *host_msix_entries; |
430 | int guest_irq; | 428 | int guest_irq; |
431 | struct kvm_guest_msix_entry *guest_msix_entries; | 429 | struct kvm_guest_msix_entry *guest_msix_entries; |
432 | unsigned long irq_requested_type; | 430 | unsigned long irq_requested_type; |
433 | int irq_source_id; | 431 | int irq_source_id; |
434 | int flags; | 432 | int flags; |
435 | struct pci_dev *dev; | 433 | struct pci_dev *dev; |
436 | struct kvm *kvm; | 434 | struct kvm *kvm; |
437 | spinlock_t assigned_dev_lock; | 435 | spinlock_t assigned_dev_lock; |
438 | }; | 436 | }; |
439 | 437 | ||
440 | struct kvm_irq_mask_notifier { | 438 | struct kvm_irq_mask_notifier { |
441 | void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked); | 439 | void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked); |
442 | int irq; | 440 | int irq; |
443 | struct hlist_node link; | 441 | struct hlist_node link; |
444 | }; | 442 | }; |
445 | 443 | ||
446 | void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, | 444 | void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, |
447 | struct kvm_irq_mask_notifier *kimn); | 445 | struct kvm_irq_mask_notifier *kimn); |
448 | void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, | 446 | void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, |
449 | struct kvm_irq_mask_notifier *kimn); | 447 | struct kvm_irq_mask_notifier *kimn); |
450 | void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask); | 448 | void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask); |
451 | 449 | ||
452 | #ifdef __KVM_HAVE_IOAPIC | 450 | #ifdef __KVM_HAVE_IOAPIC |
453 | void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, | 451 | void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, |
454 | union kvm_ioapic_redirect_entry *entry, | 452 | union kvm_ioapic_redirect_entry *entry, |
455 | unsigned long *deliver_bitmask); | 453 | unsigned long *deliver_bitmask); |
456 | #endif | 454 | #endif |
457 | int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); | 455 | int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); |
458 | void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); | 456 | void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); |
459 | void kvm_register_irq_ack_notifier(struct kvm *kvm, | 457 | void kvm_register_irq_ack_notifier(struct kvm *kvm, |
460 | struct kvm_irq_ack_notifier *kian); | 458 | struct kvm_irq_ack_notifier *kian); |
461 | void kvm_unregister_irq_ack_notifier(struct kvm *kvm, | 459 | void kvm_unregister_irq_ack_notifier(struct kvm *kvm, |
462 | struct kvm_irq_ack_notifier *kian); | 460 | struct kvm_irq_ack_notifier *kian); |
463 | int kvm_request_irq_source_id(struct kvm *kvm); | 461 | int kvm_request_irq_source_id(struct kvm *kvm); |
464 | void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id); | 462 | void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id); |
465 | 463 | ||
466 | /* For vcpu->arch.iommu_flags */ | 464 | /* For vcpu->arch.iommu_flags */ |
467 | #define KVM_IOMMU_CACHE_COHERENCY 0x1 | 465 | #define KVM_IOMMU_CACHE_COHERENCY 0x1 |
468 | 466 | ||
469 | #ifdef CONFIG_IOMMU_API | 467 | #ifdef CONFIG_IOMMU_API |
470 | int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot); | 468 | int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot); |
471 | int kvm_iommu_map_guest(struct kvm *kvm); | 469 | int kvm_iommu_map_guest(struct kvm *kvm); |
472 | int kvm_iommu_unmap_guest(struct kvm *kvm); | 470 | int kvm_iommu_unmap_guest(struct kvm *kvm); |
473 | int kvm_assign_device(struct kvm *kvm, | 471 | int kvm_assign_device(struct kvm *kvm, |
474 | struct kvm_assigned_dev_kernel *assigned_dev); | 472 | struct kvm_assigned_dev_kernel *assigned_dev); |
475 | int kvm_deassign_device(struct kvm *kvm, | 473 | int kvm_deassign_device(struct kvm *kvm, |
476 | struct kvm_assigned_dev_kernel *assigned_dev); | 474 | struct kvm_assigned_dev_kernel *assigned_dev); |
477 | #else /* CONFIG_IOMMU_API */ | 475 | #else /* CONFIG_IOMMU_API */ |
478 | static inline int kvm_iommu_map_pages(struct kvm *kvm, | 476 | static inline int kvm_iommu_map_pages(struct kvm *kvm, |
479 | gfn_t base_gfn, | 477 | gfn_t base_gfn, |
480 | unsigned long npages) | 478 | unsigned long npages) |
481 | { | 479 | { |
482 | return 0; | 480 | return 0; |
483 | } | 481 | } |
484 | 482 | ||
485 | static inline int kvm_iommu_map_guest(struct kvm *kvm) | 483 | static inline int kvm_iommu_map_guest(struct kvm *kvm) |
486 | { | 484 | { |
487 | return -ENODEV; | 485 | return -ENODEV; |
488 | } | 486 | } |
489 | 487 | ||
490 | static inline int kvm_iommu_unmap_guest(struct kvm *kvm) | 488 | static inline int kvm_iommu_unmap_guest(struct kvm *kvm) |
491 | { | 489 | { |
492 | return 0; | 490 | return 0; |
493 | } | 491 | } |
494 | 492 | ||
495 | static inline int kvm_assign_device(struct kvm *kvm, | 493 | static inline int kvm_assign_device(struct kvm *kvm, |
496 | struct kvm_assigned_dev_kernel *assigned_dev) | 494 | struct kvm_assigned_dev_kernel *assigned_dev) |
497 | { | 495 | { |
498 | return 0; | 496 | return 0; |
499 | } | 497 | } |
500 | 498 | ||
501 | static inline int kvm_deassign_device(struct kvm *kvm, | 499 | static inline int kvm_deassign_device(struct kvm *kvm, |
502 | struct kvm_assigned_dev_kernel *assigned_dev) | 500 | struct kvm_assigned_dev_kernel *assigned_dev) |
503 | { | 501 | { |
504 | return 0; | 502 | return 0; |
505 | } | 503 | } |
506 | #endif /* CONFIG_IOMMU_API */ | 504 | #endif /* CONFIG_IOMMU_API */ |
507 | 505 | ||
508 | static inline void kvm_guest_enter(void) | 506 | static inline void kvm_guest_enter(void) |
509 | { | 507 | { |
510 | account_system_vtime(current); | 508 | account_system_vtime(current); |
511 | current->flags |= PF_VCPU; | 509 | current->flags |= PF_VCPU; |
512 | } | 510 | } |
513 | 511 | ||
514 | static inline void kvm_guest_exit(void) | 512 | static inline void kvm_guest_exit(void) |
515 | { | 513 | { |
516 | account_system_vtime(current); | 514 | account_system_vtime(current); |
517 | current->flags &= ~PF_VCPU; | 515 | current->flags &= ~PF_VCPU; |
518 | } | 516 | } |
519 | 517 | ||
520 | static inline gpa_t gfn_to_gpa(gfn_t gfn) | 518 | static inline gpa_t gfn_to_gpa(gfn_t gfn) |
521 | { | 519 | { |
522 | return (gpa_t)gfn << PAGE_SHIFT; | 520 | return (gpa_t)gfn << PAGE_SHIFT; |
523 | } | 521 | } |
524 | 522 | ||
525 | static inline hpa_t pfn_to_hpa(pfn_t pfn) | 523 | static inline hpa_t pfn_to_hpa(pfn_t pfn) |
526 | { | 524 | { |
527 | return (hpa_t)pfn << PAGE_SHIFT; | 525 | return (hpa_t)pfn << PAGE_SHIFT; |
528 | } | 526 | } |
529 | 527 | ||
530 | static inline void kvm_migrate_timers(struct kvm_vcpu *vcpu) | 528 | static inline void kvm_migrate_timers(struct kvm_vcpu *vcpu) |
531 | { | 529 | { |
532 | set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests); | 530 | set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests); |
533 | } | 531 | } |
534 | 532 | ||
535 | enum kvm_stat_kind { | 533 | enum kvm_stat_kind { |
536 | KVM_STAT_VM, | 534 | KVM_STAT_VM, |
537 | KVM_STAT_VCPU, | 535 | KVM_STAT_VCPU, |
538 | }; | 536 | }; |
539 | 537 | ||
540 | struct kvm_stats_debugfs_item { | 538 | struct kvm_stats_debugfs_item { |
541 | const char *name; | 539 | const char *name; |
542 | int offset; | 540 | int offset; |
543 | enum kvm_stat_kind kind; | 541 | enum kvm_stat_kind kind; |
544 | struct dentry *dentry; | 542 | struct dentry *dentry; |
545 | }; | 543 | }; |
546 | extern struct kvm_stats_debugfs_item debugfs_entries[]; | 544 | extern struct kvm_stats_debugfs_item debugfs_entries[]; |
547 | extern struct dentry *kvm_debugfs_dir; | 545 | extern struct dentry *kvm_debugfs_dir; |
548 | 546 | ||
549 | #ifdef KVM_ARCH_WANT_MMU_NOTIFIER | 547 | #ifdef KVM_ARCH_WANT_MMU_NOTIFIER |
550 | static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq) | 548 | static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq) |
551 | { | 549 | { |
552 | if (unlikely(vcpu->kvm->mmu_notifier_count)) | 550 | if (unlikely(vcpu->kvm->mmu_notifier_count)) |
553 | return 1; | 551 | return 1; |
554 | /* | 552 | /* |
555 | * Both reads happen under the mmu_lock and both values are | 553 | * Both reads happen under the mmu_lock and both values are |
556 | * modified under mmu_lock, so there's no need for smp_rmb() | 554 | * modified under mmu_lock, so there's no need for smp_rmb() |
557 | * here in between, otherwise mmu_notifier_count should be | 555 | * here in between, otherwise mmu_notifier_count should be |
558 | * read before mmu_notifier_seq, see | 556 | * read before mmu_notifier_seq, see |
559 | * mmu_notifier_invalidate_range_end write side. | 557 | * mmu_notifier_invalidate_range_end write side. |
560 | */ | 558 | */ |
561 | if (vcpu->kvm->mmu_notifier_seq != mmu_seq) | 559 | if (vcpu->kvm->mmu_notifier_seq != mmu_seq) |
562 | return 1; | 560 | return 1; |
563 | return 0; | 561 | return 0; |
564 | } | 562 | } |
565 | #endif | ||
566 | |||
567 | #ifndef KVM_ARCH_HAS_UNALIAS_INSTANTIATION | ||
568 | #define unalias_gfn_instantiation unalias_gfn | ||
569 | #endif | 563 | #endif |
570 | 564 | ||
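mmu_notifier_retry() above is meant for a snapshot/recheck pattern in arch page-fault code: sample mmu_notifier_seq, do the work that may sleep, then recheck under mmu_lock. A hedged sketch (example_map_gfn is an illustrative name; the shape mirrors the x86 fault path):

	/* Sketch: bail out and retry if a notifier ran while we slept. */
	static int example_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
	{
		unsigned long mmu_seq = vcpu->kvm->mmu_notifier_seq;
		pfn_t pfn;

		smp_rmb();				/* order seq read vs. the fault */
		pfn = gfn_to_pfn(vcpu->kvm, gfn);	/* may sleep/fault */

		spin_lock(&vcpu->kvm->mmu_lock);
		if (mmu_notifier_retry(vcpu, mmu_seq)) {
			spin_unlock(&vcpu->kvm->mmu_lock);
			kvm_release_pfn_clean(pfn);
			return -EAGAIN;			/* caller retries the fault */
		}
		/* ... establish the spte here ... */
		spin_unlock(&vcpu->kvm->mmu_lock);
		return 0;
	}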
571 | #ifdef CONFIG_HAVE_KVM_IRQCHIP | 565 | #ifdef CONFIG_HAVE_KVM_IRQCHIP |
572 | 566 | ||
573 | #define KVM_MAX_IRQ_ROUTES 1024 | 567 | #define KVM_MAX_IRQ_ROUTES 1024 |
574 | 568 | ||
575 | int kvm_setup_default_irq_routing(struct kvm *kvm); | 569 | int kvm_setup_default_irq_routing(struct kvm *kvm); |
576 | int kvm_set_irq_routing(struct kvm *kvm, | 570 | int kvm_set_irq_routing(struct kvm *kvm, |
577 | const struct kvm_irq_routing_entry *entries, | 571 | const struct kvm_irq_routing_entry *entries, |
578 | unsigned nr, | 572 | unsigned nr, |
579 | unsigned flags); | 573 | unsigned flags); |
580 | void kvm_free_irq_routing(struct kvm *kvm); | 574 | void kvm_free_irq_routing(struct kvm *kvm); |
581 | 575 | ||
582 | #else | 576 | #else |
583 | 577 | ||
584 | static inline void kvm_free_irq_routing(struct kvm *kvm) {} | 578 | static inline void kvm_free_irq_routing(struct kvm *kvm) {} |
585 | 579 | ||
586 | #endif | 580 | #endif |
587 | 581 | ||
588 | #ifdef CONFIG_HAVE_KVM_EVENTFD | 582 | #ifdef CONFIG_HAVE_KVM_EVENTFD |
589 | 583 | ||
590 | void kvm_eventfd_init(struct kvm *kvm); | 584 | void kvm_eventfd_init(struct kvm *kvm); |
591 | int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags); | 585 | int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags); |
592 | void kvm_irqfd_release(struct kvm *kvm); | 586 | void kvm_irqfd_release(struct kvm *kvm); |
593 | int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args); | 587 | int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args); |
594 | 588 | ||
595 | #else | 589 | #else |
596 | 590 | ||
597 | static inline void kvm_eventfd_init(struct kvm *kvm) {} | 591 | static inline void kvm_eventfd_init(struct kvm *kvm) {} |
598 | static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) | 592 | static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) |
599 | { | 593 | { |
600 | return -EINVAL; | 594 | return -EINVAL; |
601 | } | 595 | } |
602 | 596 | ||
603 | static inline void kvm_irqfd_release(struct kvm *kvm) {} | 597 | static inline void kvm_irqfd_release(struct kvm *kvm) {} |
604 | static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) | 598 | static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) |
605 | { | 599 | { |
606 | return -ENOSYS; | 600 | return -ENOSYS; |
607 | } | 601 | } |
608 | 602 | ||
609 | #endif /* CONFIG_HAVE_KVM_EVENTFD */ | 603 | #endif /* CONFIG_HAVE_KVM_EVENTFD */ |
610 | 604 | ||
611 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE | 605 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE |
612 | static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) | 606 | static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) |
613 | { | 607 | { |
614 | return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id; | 608 | return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id; |
615 | } | 609 | } |
616 | #endif | 610 | #endif |
617 | 611 | ||
618 | #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT | 612 | #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT |
619 | 613 | ||
620 | long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, | 614 | long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, |
621 | unsigned long arg); | 615 | unsigned long arg); |
622 | 616 | ||
623 | #else | 617 | #else |
624 | 618 | ||
625 | static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, | 619 | static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, |
626 | unsigned long arg) | 620 | unsigned long arg) |
627 | { | 621 | { |
628 | return -ENOTTY; | 622 | return -ENOTTY; |
629 | } | 623 | } |
630 | 624 | ||
631 | #endif | 625 | #endif |
632 | 626 | ||
633 | #endif | 627 | #endif |
634 | 628 | ||
635 | 629 |
virt/kvm/kvm_main.c
1 | /* | 1 | /* |
2 | * Kernel-based Virtual Machine driver for Linux | 2 | * Kernel-based Virtual Machine driver for Linux |
3 | * | 3 | * |
4 | * This module enables machines with Intel VT-x extensions to run virtual | 4 | * This module enables machines with Intel VT-x extensions to run virtual |
5 | * machines without emulation or binary translation. | 5 | * machines without emulation or binary translation. |
6 | * | 6 | * |
7 | * Copyright (C) 2006 Qumranet, Inc. | 7 | * Copyright (C) 2006 Qumranet, Inc. |
8 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. | 8 | * Copyright 2010 Red Hat, Inc. and/or its affiliates. |
9 | * | 9 | * |
10 | * Authors: | 10 | * Authors: |
11 | * Avi Kivity <avi@qumranet.com> | 11 | * Avi Kivity <avi@qumranet.com> |
12 | * Yaniv Kamay <yaniv@qumranet.com> | 12 | * Yaniv Kamay <yaniv@qumranet.com> |
13 | * | 13 | * |
14 | * This work is licensed under the terms of the GNU GPL, version 2. See | 14 | * This work is licensed under the terms of the GNU GPL, version 2. See |
15 | * the COPYING file in the top-level directory. | 15 | * the COPYING file in the top-level directory. |
16 | * | 16 | * |
17 | */ | 17 | */ |
18 | 18 | ||
19 | #include "iodev.h" | 19 | #include "iodev.h" |
20 | 20 | ||
21 | #include <linux/kvm_host.h> | 21 | #include <linux/kvm_host.h> |
22 | #include <linux/kvm.h> | 22 | #include <linux/kvm.h> |
23 | #include <linux/module.h> | 23 | #include <linux/module.h> |
24 | #include <linux/errno.h> | 24 | #include <linux/errno.h> |
25 | #include <linux/percpu.h> | 25 | #include <linux/percpu.h> |
26 | #include <linux/mm.h> | 26 | #include <linux/mm.h> |
27 | #include <linux/miscdevice.h> | 27 | #include <linux/miscdevice.h> |
28 | #include <linux/vmalloc.h> | 28 | #include <linux/vmalloc.h> |
29 | #include <linux/reboot.h> | 29 | #include <linux/reboot.h> |
30 | #include <linux/debugfs.h> | 30 | #include <linux/debugfs.h> |
31 | #include <linux/highmem.h> | 31 | #include <linux/highmem.h> |
32 | #include <linux/file.h> | 32 | #include <linux/file.h> |
33 | #include <linux/sysdev.h> | 33 | #include <linux/sysdev.h> |
34 | #include <linux/cpu.h> | 34 | #include <linux/cpu.h> |
35 | #include <linux/sched.h> | 35 | #include <linux/sched.h> |
36 | #include <linux/cpumask.h> | 36 | #include <linux/cpumask.h> |
37 | #include <linux/smp.h> | 37 | #include <linux/smp.h> |
38 | #include <linux/anon_inodes.h> | 38 | #include <linux/anon_inodes.h> |
39 | #include <linux/profile.h> | 39 | #include <linux/profile.h> |
40 | #include <linux/kvm_para.h> | 40 | #include <linux/kvm_para.h> |
41 | #include <linux/pagemap.h> | 41 | #include <linux/pagemap.h> |
42 | #include <linux/mman.h> | 42 | #include <linux/mman.h> |
43 | #include <linux/swap.h> | 43 | #include <linux/swap.h> |
44 | #include <linux/bitops.h> | 44 | #include <linux/bitops.h> |
45 | #include <linux/spinlock.h> | 45 | #include <linux/spinlock.h> |
46 | #include <linux/compat.h> | 46 | #include <linux/compat.h> |
47 | #include <linux/srcu.h> | 47 | #include <linux/srcu.h> |
48 | #include <linux/hugetlb.h> | 48 | #include <linux/hugetlb.h> |
49 | #include <linux/slab.h> | 49 | #include <linux/slab.h> |
50 | 50 | ||
51 | #include <asm/processor.h> | 51 | #include <asm/processor.h> |
52 | #include <asm/io.h> | 52 | #include <asm/io.h> |
53 | #include <asm/uaccess.h> | 53 | #include <asm/uaccess.h> |
54 | #include <asm/pgtable.h> | 54 | #include <asm/pgtable.h> |
55 | #include <asm-generic/bitops/le.h> | 55 | #include <asm-generic/bitops/le.h> |
56 | 56 | ||
57 | #include "coalesced_mmio.h" | 57 | #include "coalesced_mmio.h" |
58 | 58 | ||
59 | #define CREATE_TRACE_POINTS | 59 | #define CREATE_TRACE_POINTS |
60 | #include <trace/events/kvm.h> | 60 | #include <trace/events/kvm.h> |
61 | 61 | ||
62 | MODULE_AUTHOR("Qumranet"); | 62 | MODULE_AUTHOR("Qumranet"); |
63 | MODULE_LICENSE("GPL"); | 63 | MODULE_LICENSE("GPL"); |
64 | 64 | ||
65 | /* | 65 | /* |
66 | * Ordering of locks: | 66 | * Ordering of locks: |
67 | * | 67 | * |
68 | * kvm->lock --> kvm->slots_lock --> kvm->irq_lock | 68 | * kvm->lock --> kvm->slots_lock --> kvm->irq_lock |
69 | */ | 69 | */ |
70 | 70 | ||
71 | DEFINE_SPINLOCK(kvm_lock); | 71 | DEFINE_SPINLOCK(kvm_lock); |
72 | LIST_HEAD(vm_list); | 72 | LIST_HEAD(vm_list); |
73 | 73 | ||
74 | static cpumask_var_t cpus_hardware_enabled; | 74 | static cpumask_var_t cpus_hardware_enabled; |
75 | static int kvm_usage_count = 0; | 75 | static int kvm_usage_count = 0; |
76 | static atomic_t hardware_enable_failed; | 76 | static atomic_t hardware_enable_failed; |
77 | 77 | ||
78 | struct kmem_cache *kvm_vcpu_cache; | 78 | struct kmem_cache *kvm_vcpu_cache; |
79 | EXPORT_SYMBOL_GPL(kvm_vcpu_cache); | 79 | EXPORT_SYMBOL_GPL(kvm_vcpu_cache); |
80 | 80 | ||
81 | static __read_mostly struct preempt_ops kvm_preempt_ops; | 81 | static __read_mostly struct preempt_ops kvm_preempt_ops; |
82 | 82 | ||
83 | struct dentry *kvm_debugfs_dir; | 83 | struct dentry *kvm_debugfs_dir; |
84 | 84 | ||
85 | static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, | 85 | static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, |
86 | unsigned long arg); | 86 | unsigned long arg); |
87 | static int hardware_enable_all(void); | 87 | static int hardware_enable_all(void); |
88 | static void hardware_disable_all(void); | 88 | static void hardware_disable_all(void); |
89 | 89 | ||
90 | static void kvm_io_bus_destroy(struct kvm_io_bus *bus); | 90 | static void kvm_io_bus_destroy(struct kvm_io_bus *bus); |
91 | 91 | ||
92 | static bool kvm_rebooting; | 92 | static bool kvm_rebooting; |
93 | 93 | ||
94 | static bool largepages_enabled = true; | 94 | static bool largepages_enabled = true; |
95 | 95 | ||
96 | struct page *hwpoison_page; | 96 | struct page *hwpoison_page; |
97 | pfn_t hwpoison_pfn; | 97 | pfn_t hwpoison_pfn; |
98 | 98 | ||
99 | inline int kvm_is_mmio_pfn(pfn_t pfn) | 99 | inline int kvm_is_mmio_pfn(pfn_t pfn) |
100 | { | 100 | { |
101 | if (pfn_valid(pfn)) { | 101 | if (pfn_valid(pfn)) { |
102 | struct page *page = compound_head(pfn_to_page(pfn)); | 102 | struct page *page = compound_head(pfn_to_page(pfn)); |
103 | return PageReserved(page); | 103 | return PageReserved(page); |
104 | } | 104 | } |
105 | 105 | ||
106 | return true; | 106 | return true; |
107 | } | 107 | } |
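Put differently: a pfn with no valid struct page (a device BAR, for instance) is reported as MMIO, and so is any reserved page; only ordinary, non-reserved RAM returns false here.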
108 | 108 | ||
109 | /* | 109 | /* |
110 | * Switches to the specified vcpu, until a matching vcpu_put(). | 110 | * Switches to the specified vcpu, until a matching vcpu_put(). |
111 | */ | 111 | */ |
112 | void vcpu_load(struct kvm_vcpu *vcpu) | 112 | void vcpu_load(struct kvm_vcpu *vcpu) |
113 | { | 113 | { |
114 | int cpu; | 114 | int cpu; |
115 | 115 | ||
116 | mutex_lock(&vcpu->mutex); | 116 | mutex_lock(&vcpu->mutex); |
117 | cpu = get_cpu(); | 117 | cpu = get_cpu(); |
118 | preempt_notifier_register(&vcpu->preempt_notifier); | 118 | preempt_notifier_register(&vcpu->preempt_notifier); |
119 | kvm_arch_vcpu_load(vcpu, cpu); | 119 | kvm_arch_vcpu_load(vcpu, cpu); |
120 | put_cpu(); | 120 | put_cpu(); |
121 | } | 121 | } |
122 | 122 | ||
123 | void vcpu_put(struct kvm_vcpu *vcpu) | 123 | void vcpu_put(struct kvm_vcpu *vcpu) |
124 | { | 124 | { |
125 | preempt_disable(); | 125 | preempt_disable(); |
126 | kvm_arch_vcpu_put(vcpu); | 126 | kvm_arch_vcpu_put(vcpu); |
127 | preempt_notifier_unregister(&vcpu->preempt_notifier); | 127 | preempt_notifier_unregister(&vcpu->preempt_notifier); |
128 | preempt_enable(); | 128 | preempt_enable(); |
129 | mutex_unlock(&vcpu->mutex); | 129 | mutex_unlock(&vcpu->mutex); |
130 | } | 130 | } |
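These two calls bracket every path that touches per-vcpu hardware state; the vcpu ioctl handlers in this file all follow the same shape. A minimal sketch (do_vcpu_work and its payload are illustrative):

	/* Sketch: the canonical vcpu_load()/vcpu_put() bracket. */
	static int do_vcpu_work(struct kvm_vcpu *vcpu)
	{
		int r;

		vcpu_load(vcpu);	/* takes vcpu->mutex, loads arch state */
		r = kvm_arch_vcpu_runnable(vcpu);	/* illustrative payload */
		vcpu_put(vcpu);		/* saves arch state, drops the mutex */
		return r;
	}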
131 | 131 | ||
132 | static void ack_flush(void *_completed) | 132 | static void ack_flush(void *_completed) |
133 | { | 133 | { |
134 | } | 134 | } |
135 | 135 | ||
136 | static bool make_all_cpus_request(struct kvm *kvm, unsigned int req) | 136 | static bool make_all_cpus_request(struct kvm *kvm, unsigned int req) |
137 | { | 137 | { |
138 | int i, cpu, me; | 138 | int i, cpu, me; |
139 | cpumask_var_t cpus; | 139 | cpumask_var_t cpus; |
140 | bool called = true; | 140 | bool called = true; |
141 | struct kvm_vcpu *vcpu; | 141 | struct kvm_vcpu *vcpu; |
142 | 142 | ||
143 | zalloc_cpumask_var(&cpus, GFP_ATOMIC); | 143 | zalloc_cpumask_var(&cpus, GFP_ATOMIC); |
144 | 144 | ||
145 | raw_spin_lock(&kvm->requests_lock); | 145 | raw_spin_lock(&kvm->requests_lock); |
146 | me = smp_processor_id(); | 146 | me = smp_processor_id(); |
147 | kvm_for_each_vcpu(i, vcpu, kvm) { | 147 | kvm_for_each_vcpu(i, vcpu, kvm) { |
148 | if (test_and_set_bit(req, &vcpu->requests)) | 148 | if (test_and_set_bit(req, &vcpu->requests)) |
149 | continue; | 149 | continue; |
150 | cpu = vcpu->cpu; | 150 | cpu = vcpu->cpu; |
151 | if (cpus != NULL && cpu != -1 && cpu != me) | 151 | if (cpus != NULL && cpu != -1 && cpu != me) |
152 | cpumask_set_cpu(cpu, cpus); | 152 | cpumask_set_cpu(cpu, cpus); |
153 | } | 153 | } |
154 | if (unlikely(cpus == NULL)) | 154 | if (unlikely(cpus == NULL)) |
155 | smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1); | 155 | smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1); |
156 | else if (!cpumask_empty(cpus)) | 156 | else if (!cpumask_empty(cpus)) |
157 | smp_call_function_many(cpus, ack_flush, NULL, 1); | 157 | smp_call_function_many(cpus, ack_flush, NULL, 1); |
158 | else | 158 | else |
159 | called = false; | 159 | called = false; |
160 | raw_spin_unlock(&kvm->requests_lock); | 160 | raw_spin_unlock(&kvm->requests_lock); |
161 | free_cpumask_var(cpus); | 161 | free_cpumask_var(cpus); |
162 | return called; | 162 | return called; |
163 | } | 163 | } |
164 | 164 | ||
165 | void kvm_flush_remote_tlbs(struct kvm *kvm) | 165 | void kvm_flush_remote_tlbs(struct kvm *kvm) |
166 | { | 166 | { |
167 | if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH)) | 167 | if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH)) |
168 | ++kvm->stat.remote_tlb_flush; | 168 | ++kvm->stat.remote_tlb_flush; |
169 | } | 169 | } |
170 | 170 | ||
171 | void kvm_reload_remote_mmus(struct kvm *kvm) | 171 | void kvm_reload_remote_mmus(struct kvm *kvm) |
172 | { | 172 | { |
173 | make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD); | 173 | make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD); |
174 | } | 174 | } |
175 | 175 | ||
176 | int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) | 176 | int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) |
177 | { | 177 | { |
178 | struct page *page; | 178 | struct page *page; |
179 | int r; | 179 | int r; |
180 | 180 | ||
181 | mutex_init(&vcpu->mutex); | 181 | mutex_init(&vcpu->mutex); |
182 | vcpu->cpu = -1; | 182 | vcpu->cpu = -1; |
183 | vcpu->kvm = kvm; | 183 | vcpu->kvm = kvm; |
184 | vcpu->vcpu_id = id; | 184 | vcpu->vcpu_id = id; |
185 | init_waitqueue_head(&vcpu->wq); | 185 | init_waitqueue_head(&vcpu->wq); |
186 | 186 | ||
187 | page = alloc_page(GFP_KERNEL | __GFP_ZERO); | 187 | page = alloc_page(GFP_KERNEL | __GFP_ZERO); |
188 | if (!page) { | 188 | if (!page) { |
189 | r = -ENOMEM; | 189 | r = -ENOMEM; |
190 | goto fail; | 190 | goto fail; |
191 | } | 191 | } |
192 | vcpu->run = page_address(page); | 192 | vcpu->run = page_address(page); |
193 | 193 | ||
194 | r = kvm_arch_vcpu_init(vcpu); | 194 | r = kvm_arch_vcpu_init(vcpu); |
195 | if (r < 0) | 195 | if (r < 0) |
196 | goto fail_free_run; | 196 | goto fail_free_run; |
197 | return 0; | 197 | return 0; |
198 | 198 | ||
199 | fail_free_run: | 199 | fail_free_run: |
200 | free_page((unsigned long)vcpu->run); | 200 | free_page((unsigned long)vcpu->run); |
201 | fail: | 201 | fail: |
202 | return r; | 202 | return r; |
203 | } | 203 | } |
204 | EXPORT_SYMBOL_GPL(kvm_vcpu_init); | 204 | EXPORT_SYMBOL_GPL(kvm_vcpu_init); |
205 | 205 | ||
206 | void kvm_vcpu_uninit(struct kvm_vcpu *vcpu) | 206 | void kvm_vcpu_uninit(struct kvm_vcpu *vcpu) |
207 | { | 207 | { |
208 | kvm_arch_vcpu_uninit(vcpu); | 208 | kvm_arch_vcpu_uninit(vcpu); |
209 | free_page((unsigned long)vcpu->run); | 209 | free_page((unsigned long)vcpu->run); |
210 | } | 210 | } |
211 | EXPORT_SYMBOL_GPL(kvm_vcpu_uninit); | 211 | EXPORT_SYMBOL_GPL(kvm_vcpu_uninit); |
212 | 212 | ||
213 | #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) | 213 | #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) |
214 | static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) | 214 | static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) |
215 | { | 215 | { |
216 | return container_of(mn, struct kvm, mmu_notifier); | 216 | return container_of(mn, struct kvm, mmu_notifier); |
217 | } | 217 | } |
218 | 218 | ||
219 | static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, | 219 | static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, |
220 | struct mm_struct *mm, | 220 | struct mm_struct *mm, |
221 | unsigned long address) | 221 | unsigned long address) |
222 | { | 222 | { |
223 | struct kvm *kvm = mmu_notifier_to_kvm(mn); | 223 | struct kvm *kvm = mmu_notifier_to_kvm(mn); |
224 | int need_tlb_flush, idx; | 224 | int need_tlb_flush, idx; |
225 | 225 | ||
226 | /* | 226 | /* |
227 | * When ->invalidate_page runs, the linux pte has been zapped | 227 | * When ->invalidate_page runs, the linux pte has been zapped |
228 | * already but the page is still allocated until | 228 | * already but the page is still allocated until |
229 | * ->invalidate_page returns. So if we increase the sequence | 229 | * ->invalidate_page returns. So if we increase the sequence |
230 | * here the kvm page fault will notice if the spte can't be | 230 | * here the kvm page fault will notice if the spte can't be |
231 | * established because the page is going to be freed. If | 231 | * established because the page is going to be freed. If |
232 | * instead the kvm page fault establishes the spte before | 232 | * instead the kvm page fault establishes the spte before |
233 | * ->invalidate_page runs, kvm_unmap_hva will release it | 233 | * ->invalidate_page runs, kvm_unmap_hva will release it |
234 | * before returning. | 234 | * before returning. |
235 | * | 235 | * |
236 | * The sequence increase only needs to be seen at spin_unlock | 236 | * The sequence increase only needs to be seen at spin_unlock |
237 | * time, and not at spin_lock time. | 237 | * time, and not at spin_lock time. |
238 | * | 238 | * |
239 | * Increasing the sequence after the spin_unlock would be | 239 | * Increasing the sequence after the spin_unlock would be |
240 | * unsafe because the kvm page fault could then establish the | 240 | * unsafe because the kvm page fault could then establish the |
241 | * pte after kvm_unmap_hva returned, without noticing the page | 241 | * pte after kvm_unmap_hva returned, without noticing the page |
242 | * is going to be freed. | 242 | * is going to be freed. |
243 | */ | 243 | */ |
244 | idx = srcu_read_lock(&kvm->srcu); | 244 | idx = srcu_read_lock(&kvm->srcu); |
245 | spin_lock(&kvm->mmu_lock); | 245 | spin_lock(&kvm->mmu_lock); |
246 | kvm->mmu_notifier_seq++; | 246 | kvm->mmu_notifier_seq++; |
247 | need_tlb_flush = kvm_unmap_hva(kvm, address); | 247 | need_tlb_flush = kvm_unmap_hva(kvm, address); |
248 | spin_unlock(&kvm->mmu_lock); | 248 | spin_unlock(&kvm->mmu_lock); |
249 | srcu_read_unlock(&kvm->srcu, idx); | 249 | srcu_read_unlock(&kvm->srcu, idx); |
250 | 250 | ||
251 | /* we have to flush the tlb before the pages can be freed */ | 251 | /* we have to flush the tlb before the pages can be freed */ |
252 | if (need_tlb_flush) | 252 | if (need_tlb_flush) |
253 | kvm_flush_remote_tlbs(kvm); | 253 | kvm_flush_remote_tlbs(kvm); |
254 | 254 | ||
255 | } | 255 | } |
256 | 256 | ||
257 | static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, | 257 | static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, |
258 | struct mm_struct *mm, | 258 | struct mm_struct *mm, |
259 | unsigned long address, | 259 | unsigned long address, |
260 | pte_t pte) | 260 | pte_t pte) |
261 | { | 261 | { |
262 | struct kvm *kvm = mmu_notifier_to_kvm(mn); | 262 | struct kvm *kvm = mmu_notifier_to_kvm(mn); |
263 | int idx; | 263 | int idx; |
264 | 264 | ||
265 | idx = srcu_read_lock(&kvm->srcu); | 265 | idx = srcu_read_lock(&kvm->srcu); |
266 | spin_lock(&kvm->mmu_lock); | 266 | spin_lock(&kvm->mmu_lock); |
267 | kvm->mmu_notifier_seq++; | 267 | kvm->mmu_notifier_seq++; |
268 | kvm_set_spte_hva(kvm, address, pte); | 268 | kvm_set_spte_hva(kvm, address, pte); |
269 | spin_unlock(&kvm->mmu_lock); | 269 | spin_unlock(&kvm->mmu_lock); |
270 | srcu_read_unlock(&kvm->srcu, idx); | 270 | srcu_read_unlock(&kvm->srcu, idx); |
271 | } | 271 | } |
272 | 272 | ||
273 | static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, | 273 | static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, |
274 | struct mm_struct *mm, | 274 | struct mm_struct *mm, |
275 | unsigned long start, | 275 | unsigned long start, |
276 | unsigned long end) | 276 | unsigned long end) |
277 | { | 277 | { |
278 | struct kvm *kvm = mmu_notifier_to_kvm(mn); | 278 | struct kvm *kvm = mmu_notifier_to_kvm(mn); |
279 | int need_tlb_flush = 0, idx; | 279 | int need_tlb_flush = 0, idx; |
280 | 280 | ||
281 | idx = srcu_read_lock(&kvm->srcu); | 281 | idx = srcu_read_lock(&kvm->srcu); |
282 | spin_lock(&kvm->mmu_lock); | 282 | spin_lock(&kvm->mmu_lock); |
283 | /* | 283 | /* |
284 | * The count increase must become visible at unlock time as no | 284 | * The count increase must become visible at unlock time as no |
285 | * spte can be established without taking the mmu_lock and | 285 | * spte can be established without taking the mmu_lock and |
286 | * count is also read inside the mmu_lock critical section. | 286 | * count is also read inside the mmu_lock critical section. |
287 | */ | 287 | */ |
288 | kvm->mmu_notifier_count++; | 288 | kvm->mmu_notifier_count++; |
289 | for (; start < end; start += PAGE_SIZE) | 289 | for (; start < end; start += PAGE_SIZE) |
290 | need_tlb_flush |= kvm_unmap_hva(kvm, start); | 290 | need_tlb_flush |= kvm_unmap_hva(kvm, start); |
291 | spin_unlock(&kvm->mmu_lock); | 291 | spin_unlock(&kvm->mmu_lock); |
292 | srcu_read_unlock(&kvm->srcu, idx); | 292 | srcu_read_unlock(&kvm->srcu, idx); |
293 | 293 | ||
294 | /* we have to flush the tlb before the pages can be freed */ | 294 | /* we have to flush the tlb before the pages can be freed */ |
295 | if (need_tlb_flush) | 295 | if (need_tlb_flush) |
296 | kvm_flush_remote_tlbs(kvm); | 296 | kvm_flush_remote_tlbs(kvm); |
297 | } | 297 | } |
298 | 298 | ||
299 | static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, | 299 | static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, |
300 | struct mm_struct *mm, | 300 | struct mm_struct *mm, |
301 | unsigned long start, | 301 | unsigned long start, |
302 | unsigned long end) | 302 | unsigned long end) |
303 | { | 303 | { |
304 | struct kvm *kvm = mmu_notifier_to_kvm(mn); | 304 | struct kvm *kvm = mmu_notifier_to_kvm(mn); |
305 | 305 | ||
306 | spin_lock(&kvm->mmu_lock); | 306 | spin_lock(&kvm->mmu_lock); |
307 | /* | 307 | /* |
308 | * This sequence increase will notify the kvm page fault that | 308 | * This sequence increase will notify the kvm page fault that |
309 | * the page that is going to be mapped in the spte could have | 309 | * the page that is going to be mapped in the spte could have |
310 | * been freed. | 310 | * been freed. |
311 | */ | 311 | */ |
312 | kvm->mmu_notifier_seq++; | 312 | kvm->mmu_notifier_seq++; |
313 | /* | 313 | /* |
314 | * The above sequence increase must be visible before the | 314 | * The above sequence increase must be visible before the |
315 | * below count decrease but both values are read by the kvm | 315 | * below count decrease but both values are read by the kvm |
316 | * page fault under mmu_lock spinlock so we don't need to add | 316 | * page fault under mmu_lock spinlock so we don't need to add |
317 | * a smp_wmb() here in between the two. | 317 | * a smp_wmb() here in between the two. |
318 | */ | 318 | */ |
319 | kvm->mmu_notifier_count--; | 319 | kvm->mmu_notifier_count--; |
320 | spin_unlock(&kvm->mmu_lock); | 320 | spin_unlock(&kvm->mmu_lock); |
321 | 321 | ||
322 | BUG_ON(kvm->mmu_notifier_count < 0); | 322 | BUG_ON(kvm->mmu_notifier_count < 0); |
323 | } | 323 | } |
324 | 324 | ||
325 | static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, | 325 | static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, |
326 | struct mm_struct *mm, | 326 | struct mm_struct *mm, |
327 | unsigned long address) | 327 | unsigned long address) |
328 | { | 328 | { |
329 | struct kvm *kvm = mmu_notifier_to_kvm(mn); | 329 | struct kvm *kvm = mmu_notifier_to_kvm(mn); |
330 | int young, idx; | 330 | int young, idx; |
331 | 331 | ||
332 | idx = srcu_read_lock(&kvm->srcu); | 332 | idx = srcu_read_lock(&kvm->srcu); |
333 | spin_lock(&kvm->mmu_lock); | 333 | spin_lock(&kvm->mmu_lock); |
334 | young = kvm_age_hva(kvm, address); | 334 | young = kvm_age_hva(kvm, address); |
335 | spin_unlock(&kvm->mmu_lock); | 335 | spin_unlock(&kvm->mmu_lock); |
336 | srcu_read_unlock(&kvm->srcu, idx); | 336 | srcu_read_unlock(&kvm->srcu, idx); |
337 | 337 | ||
338 | if (young) | 338 | if (young) |
339 | kvm_flush_remote_tlbs(kvm); | 339 | kvm_flush_remote_tlbs(kvm); |
340 | 340 | ||
341 | return young; | 341 | return young; |
342 | } | 342 | } |
343 | 343 | ||
344 | static void kvm_mmu_notifier_release(struct mmu_notifier *mn, | 344 | static void kvm_mmu_notifier_release(struct mmu_notifier *mn, |
345 | struct mm_struct *mm) | 345 | struct mm_struct *mm) |
346 | { | 346 | { |
347 | struct kvm *kvm = mmu_notifier_to_kvm(mn); | 347 | struct kvm *kvm = mmu_notifier_to_kvm(mn); |
348 | int idx; | 348 | int idx; |
349 | 349 | ||
350 | idx = srcu_read_lock(&kvm->srcu); | 350 | idx = srcu_read_lock(&kvm->srcu); |
351 | kvm_arch_flush_shadow(kvm); | 351 | kvm_arch_flush_shadow(kvm); |
352 | srcu_read_unlock(&kvm->srcu, idx); | 352 | srcu_read_unlock(&kvm->srcu, idx); |
353 | } | 353 | } |
354 | 354 | ||
355 | static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { | 355 | static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { |
356 | .invalidate_page = kvm_mmu_notifier_invalidate_page, | 356 | .invalidate_page = kvm_mmu_notifier_invalidate_page, |
357 | .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, | 357 | .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, |
358 | .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, | 358 | .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, |
359 | .clear_flush_young = kvm_mmu_notifier_clear_flush_young, | 359 | .clear_flush_young = kvm_mmu_notifier_clear_flush_young, |
360 | .change_pte = kvm_mmu_notifier_change_pte, | 360 | .change_pte = kvm_mmu_notifier_change_pte, |
361 | .release = kvm_mmu_notifier_release, | 361 | .release = kvm_mmu_notifier_release, |
362 | }; | 362 | }; |
363 | 363 | ||
364 | static int kvm_init_mmu_notifier(struct kvm *kvm) | 364 | static int kvm_init_mmu_notifier(struct kvm *kvm) |
365 | { | 365 | { |
366 | kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops; | 366 | kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops; |
367 | return mmu_notifier_register(&kvm->mmu_notifier, current->mm); | 367 | return mmu_notifier_register(&kvm->mmu_notifier, current->mm); |
368 | } | 368 | } |
369 | 369 | ||
370 | #else /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */ | 370 | #else /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */ |
371 | 371 | ||
372 | static int kvm_init_mmu_notifier(struct kvm *kvm) | 372 | static int kvm_init_mmu_notifier(struct kvm *kvm) |
373 | { | 373 | { |
374 | return 0; | 374 | return 0; |
375 | } | 375 | } |
376 | 376 | ||
377 | #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */ | 377 | #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */ |
378 | 378 | ||
379 | static struct kvm *kvm_create_vm(void) | 379 | static struct kvm *kvm_create_vm(void) |
380 | { | 380 | { |
381 | int r = 0, i; | 381 | int r = 0, i; |
382 | struct kvm *kvm = kvm_arch_create_vm(); | 382 | struct kvm *kvm = kvm_arch_create_vm(); |
383 | 383 | ||
384 | if (IS_ERR(kvm)) | 384 | if (IS_ERR(kvm)) |
385 | goto out; | 385 | goto out; |
386 | 386 | ||
387 | r = hardware_enable_all(); | 387 | r = hardware_enable_all(); |
388 | if (r) | 388 | if (r) |
389 | goto out_err_nodisable; | 389 | goto out_err_nodisable; |
390 | 390 | ||
391 | #ifdef CONFIG_HAVE_KVM_IRQCHIP | 391 | #ifdef CONFIG_HAVE_KVM_IRQCHIP |
392 | INIT_HLIST_HEAD(&kvm->mask_notifier_list); | 392 | INIT_HLIST_HEAD(&kvm->mask_notifier_list); |
393 | INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list); | 393 | INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list); |
394 | #endif | 394 | #endif |
395 | 395 | ||
396 | r = -ENOMEM; | 396 | r = -ENOMEM; |
397 | kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); | 397 | kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); |
398 | if (!kvm->memslots) | 398 | if (!kvm->memslots) |
399 | goto out_err; | 399 | goto out_err; |
400 | if (init_srcu_struct(&kvm->srcu)) | 400 | if (init_srcu_struct(&kvm->srcu)) |
401 | goto out_err; | 401 | goto out_err; |
402 | for (i = 0; i < KVM_NR_BUSES; i++) { | 402 | for (i = 0; i < KVM_NR_BUSES; i++) { |
403 | kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus), | 403 | kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus), |
404 | GFP_KERNEL); | 404 | GFP_KERNEL); |
405 | if (!kvm->buses[i]) { | 405 | if (!kvm->buses[i]) { |
406 | cleanup_srcu_struct(&kvm->srcu); | 406 | cleanup_srcu_struct(&kvm->srcu); |
407 | goto out_err; | 407 | goto out_err; |
408 | } | 408 | } |
409 | } | 409 | } |
410 | 410 | ||
411 | r = kvm_init_mmu_notifier(kvm); | 411 | r = kvm_init_mmu_notifier(kvm); |
412 | if (r) { | 412 | if (r) { |
413 | cleanup_srcu_struct(&kvm->srcu); | 413 | cleanup_srcu_struct(&kvm->srcu); |
414 | goto out_err; | 414 | goto out_err; |
415 | } | 415 | } |
416 | 416 | ||
417 | kvm->mm = current->mm; | 417 | kvm->mm = current->mm; |
418 | atomic_inc(&kvm->mm->mm_count); | 418 | atomic_inc(&kvm->mm->mm_count); |
419 | spin_lock_init(&kvm->mmu_lock); | 419 | spin_lock_init(&kvm->mmu_lock); |
420 | raw_spin_lock_init(&kvm->requests_lock); | 420 | raw_spin_lock_init(&kvm->requests_lock); |
421 | kvm_eventfd_init(kvm); | 421 | kvm_eventfd_init(kvm); |
422 | mutex_init(&kvm->lock); | 422 | mutex_init(&kvm->lock); |
423 | mutex_init(&kvm->irq_lock); | 423 | mutex_init(&kvm->irq_lock); |
424 | mutex_init(&kvm->slots_lock); | 424 | mutex_init(&kvm->slots_lock); |
425 | atomic_set(&kvm->users_count, 1); | 425 | atomic_set(&kvm->users_count, 1); |
426 | spin_lock(&kvm_lock); | 426 | spin_lock(&kvm_lock); |
427 | list_add(&kvm->vm_list, &vm_list); | 427 | list_add(&kvm->vm_list, &vm_list); |
428 | spin_unlock(&kvm_lock); | 428 | spin_unlock(&kvm_lock); |
429 | out: | 429 | out: |
430 | return kvm; | 430 | return kvm; |
431 | 431 | ||
432 | out_err: | 432 | out_err: |
433 | hardware_disable_all(); | 433 | hardware_disable_all(); |
434 | out_err_nodisable: | 434 | out_err_nodisable: |
435 | for (i = 0; i < KVM_NR_BUSES; i++) | 435 | for (i = 0; i < KVM_NR_BUSES; i++) |
436 | kfree(kvm->buses[i]); | 436 | kfree(kvm->buses[i]); |
437 | kfree(kvm->memslots); | 437 | kfree(kvm->memslots); |
438 | kfree(kvm); | 438 | kfree(kvm); |
439 | return ERR_PTR(r); | 439 | return ERR_PTR(r); |
440 | } | 440 | } |
441 | 441 | ||
442 | /* | 442 | /* |
443 | * Free any memory in @free but not in @dont. | 443 | * Free any memory in @free but not in @dont. |
444 | */ | 444 | */ |
445 | static void kvm_free_physmem_slot(struct kvm_memory_slot *free, | 445 | static void kvm_free_physmem_slot(struct kvm_memory_slot *free, |
446 | struct kvm_memory_slot *dont) | 446 | struct kvm_memory_slot *dont) |
447 | { | 447 | { |
448 | int i; | 448 | int i; |
449 | 449 | ||
450 | if (!dont || free->rmap != dont->rmap) | 450 | if (!dont || free->rmap != dont->rmap) |
451 | vfree(free->rmap); | 451 | vfree(free->rmap); |
452 | 452 | ||
453 | if (!dont || free->dirty_bitmap != dont->dirty_bitmap) | 453 | if (!dont || free->dirty_bitmap != dont->dirty_bitmap) |
454 | vfree(free->dirty_bitmap); | 454 | vfree(free->dirty_bitmap); |
455 | 455 | ||
456 | 456 | ||
457 | for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { | 457 | for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { |
458 | if (!dont || free->lpage_info[i] != dont->lpage_info[i]) { | 458 | if (!dont || free->lpage_info[i] != dont->lpage_info[i]) { |
459 | vfree(free->lpage_info[i]); | 459 | vfree(free->lpage_info[i]); |
460 | free->lpage_info[i] = NULL; | 460 | free->lpage_info[i] = NULL; |
461 | } | 461 | } |
462 | } | 462 | } |
463 | 463 | ||
464 | free->npages = 0; | 464 | free->npages = 0; |
465 | free->dirty_bitmap = NULL; | 465 | free->dirty_bitmap = NULL; |
466 | free->rmap = NULL; | 466 | free->rmap = NULL; |
467 | } | 467 | } |
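
The @free/@dont idiom lets one helper serve both outcomes of a slot update: only the buffers that the surviving slot does not share are released. Both call sites appear later in __kvm_set_memory_region():

        /* Commit succeeded: free what the old slot owned and new does not. */
        kvm_free_physmem_slot(&old, &new);

        /* Commit failed: free what was allocated for new but not shared with old. */
        kvm_free_physmem_slot(&new, &old);
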
468 | 468 | ||
469 | void kvm_free_physmem(struct kvm *kvm) | 469 | void kvm_free_physmem(struct kvm *kvm) |
470 | { | 470 | { |
471 | int i; | 471 | int i; |
472 | struct kvm_memslots *slots = kvm->memslots; | 472 | struct kvm_memslots *slots = kvm->memslots; |
473 | 473 | ||
474 | for (i = 0; i < slots->nmemslots; ++i) | 474 | for (i = 0; i < slots->nmemslots; ++i) |
475 | kvm_free_physmem_slot(&slots->memslots[i], NULL); | 475 | kvm_free_physmem_slot(&slots->memslots[i], NULL); |
476 | 476 | ||
477 | kfree(kvm->memslots); | 477 | kfree(kvm->memslots); |
478 | } | 478 | } |
479 | 479 | ||
480 | static void kvm_destroy_vm(struct kvm *kvm) | 480 | static void kvm_destroy_vm(struct kvm *kvm) |
481 | { | 481 | { |
482 | int i; | 482 | int i; |
483 | struct mm_struct *mm = kvm->mm; | 483 | struct mm_struct *mm = kvm->mm; |
484 | 484 | ||
485 | kvm_arch_sync_events(kvm); | 485 | kvm_arch_sync_events(kvm); |
486 | spin_lock(&kvm_lock); | 486 | spin_lock(&kvm_lock); |
487 | list_del(&kvm->vm_list); | 487 | list_del(&kvm->vm_list); |
488 | spin_unlock(&kvm_lock); | 488 | spin_unlock(&kvm_lock); |
489 | kvm_free_irq_routing(kvm); | 489 | kvm_free_irq_routing(kvm); |
490 | for (i = 0; i < KVM_NR_BUSES; i++) | 490 | for (i = 0; i < KVM_NR_BUSES; i++) |
491 | kvm_io_bus_destroy(kvm->buses[i]); | 491 | kvm_io_bus_destroy(kvm->buses[i]); |
492 | kvm_coalesced_mmio_free(kvm); | 492 | kvm_coalesced_mmio_free(kvm); |
493 | #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) | 493 | #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) |
494 | mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); | 494 | mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); |
495 | #else | 495 | #else |
496 | kvm_arch_flush_shadow(kvm); | 496 | kvm_arch_flush_shadow(kvm); |
497 | #endif | 497 | #endif |
498 | kvm_arch_destroy_vm(kvm); | 498 | kvm_arch_destroy_vm(kvm); |
499 | hardware_disable_all(); | 499 | hardware_disable_all(); |
500 | mmdrop(mm); | 500 | mmdrop(mm); |
501 | } | 501 | } |
502 | 502 | ||
503 | void kvm_get_kvm(struct kvm *kvm) | 503 | void kvm_get_kvm(struct kvm *kvm) |
504 | { | 504 | { |
505 | atomic_inc(&kvm->users_count); | 505 | atomic_inc(&kvm->users_count); |
506 | } | 506 | } |
507 | EXPORT_SYMBOL_GPL(kvm_get_kvm); | 507 | EXPORT_SYMBOL_GPL(kvm_get_kvm); |
508 | 508 | ||
509 | void kvm_put_kvm(struct kvm *kvm) | 509 | void kvm_put_kvm(struct kvm *kvm) |
510 | { | 510 | { |
511 | if (atomic_dec_and_test(&kvm->users_count)) | 511 | if (atomic_dec_and_test(&kvm->users_count)) |
512 | kvm_destroy_vm(kvm); | 512 | kvm_destroy_vm(kvm); |
513 | } | 513 | } |
514 | EXPORT_SYMBOL_GPL(kvm_put_kvm); | 514 | EXPORT_SYMBOL_GPL(kvm_put_kvm); |
515 | 515 | ||
516 | 516 | ||
517 | static int kvm_vm_release(struct inode *inode, struct file *filp) | 517 | static int kvm_vm_release(struct inode *inode, struct file *filp) |
518 | { | 518 | { |
519 | struct kvm *kvm = filp->private_data; | 519 | struct kvm *kvm = filp->private_data; |
520 | 520 | ||
521 | kvm_irqfd_release(kvm); | 521 | kvm_irqfd_release(kvm); |
522 | 522 | ||
523 | kvm_put_kvm(kvm); | 523 | kvm_put_kvm(kvm); |
524 | return 0; | 524 | return 0; |
525 | } | 525 | } |
526 | 526 | ||
527 | /* | 527 | /* |
528 | * Allocate some memory and give it an address in the guest physical address | 528 | * Allocate some memory and give it an address in the guest physical address |
529 | * space. | 529 | * space. |
530 | * | 530 | * |
531 | * Discontiguous memory is allowed, mostly for framebuffers. | 531 | * Discontiguous memory is allowed, mostly for framebuffers. |
532 | * | 532 | * |
533 | * Must be called holding kvm->slots_lock for write. | 533 | * Must be called holding kvm->slots_lock for write. |
534 | */ | 534 | */ |
535 | int __kvm_set_memory_region(struct kvm *kvm, | 535 | int __kvm_set_memory_region(struct kvm *kvm, |
536 | struct kvm_userspace_memory_region *mem, | 536 | struct kvm_userspace_memory_region *mem, |
537 | int user_alloc) | 537 | int user_alloc) |
538 | { | 538 | { |
539 | int r, flush_shadow = 0; | 539 | int r, flush_shadow = 0; |
540 | gfn_t base_gfn; | 540 | gfn_t base_gfn; |
541 | unsigned long npages; | 541 | unsigned long npages; |
542 | unsigned long i; | 542 | unsigned long i; |
543 | struct kvm_memory_slot *memslot; | 543 | struct kvm_memory_slot *memslot; |
544 | struct kvm_memory_slot old, new; | 544 | struct kvm_memory_slot old, new; |
545 | struct kvm_memslots *slots, *old_memslots; | 545 | struct kvm_memslots *slots, *old_memslots; |
546 | 546 | ||
547 | r = -EINVAL; | 547 | r = -EINVAL; |
548 | /* General sanity checks */ | 548 | /* General sanity checks */ |
549 | if (mem->memory_size & (PAGE_SIZE - 1)) | 549 | if (mem->memory_size & (PAGE_SIZE - 1)) |
550 | goto out; | 550 | goto out; |
551 | if (mem->guest_phys_addr & (PAGE_SIZE - 1)) | 551 | if (mem->guest_phys_addr & (PAGE_SIZE - 1)) |
552 | goto out; | 552 | goto out; |
553 | if (user_alloc && (mem->userspace_addr & (PAGE_SIZE - 1))) | 553 | if (user_alloc && (mem->userspace_addr & (PAGE_SIZE - 1))) |
554 | goto out; | 554 | goto out; |
555 | if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS) | 555 | if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS) |
556 | goto out; | 556 | goto out; |
557 | if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr) | 557 | if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr) |
558 | goto out; | 558 | goto out; |
559 | 559 | ||
560 | memslot = &kvm->memslots->memslots[mem->slot]; | 560 | memslot = &kvm->memslots->memslots[mem->slot]; |
561 | base_gfn = mem->guest_phys_addr >> PAGE_SHIFT; | 561 | base_gfn = mem->guest_phys_addr >> PAGE_SHIFT; |
562 | npages = mem->memory_size >> PAGE_SHIFT; | 562 | npages = mem->memory_size >> PAGE_SHIFT; |
563 | 563 | ||
564 | r = -EINVAL; | 564 | r = -EINVAL; |
565 | if (npages > KVM_MEM_MAX_NR_PAGES) | 565 | if (npages > KVM_MEM_MAX_NR_PAGES) |
566 | goto out; | 566 | goto out; |
567 | 567 | ||
568 | if (!npages) | 568 | if (!npages) |
569 | mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES; | 569 | mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES; |
570 | 570 | ||
571 | new = old = *memslot; | 571 | new = old = *memslot; |
572 | 572 | ||
573 | new.base_gfn = base_gfn; | 573 | new.base_gfn = base_gfn; |
574 | new.npages = npages; | 574 | new.npages = npages; |
575 | new.flags = mem->flags; | 575 | new.flags = mem->flags; |
576 | 576 | ||
577 | /* Disallow changing a memory slot's size. */ | 577 | /* Disallow changing a memory slot's size. */ |
578 | r = -EINVAL; | 578 | r = -EINVAL; |
579 | if (npages && old.npages && npages != old.npages) | 579 | if (npages && old.npages && npages != old.npages) |
580 | goto out_free; | 580 | goto out_free; |
581 | 581 | ||
582 | /* Check for overlaps */ | 582 | /* Check for overlaps */ |
583 | r = -EEXIST; | 583 | r = -EEXIST; |
584 | for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { | 584 | for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { |
585 | struct kvm_memory_slot *s = &kvm->memslots->memslots[i]; | 585 | struct kvm_memory_slot *s = &kvm->memslots->memslots[i]; |
586 | 586 | ||
587 | if (s == memslot || !s->npages) | 587 | if (s == memslot || !s->npages) |
588 | continue; | 588 | continue; |
589 | if (!((base_gfn + npages <= s->base_gfn) || | 589 | if (!((base_gfn + npages <= s->base_gfn) || |
590 | (base_gfn >= s->base_gfn + s->npages))) | 590 | (base_gfn >= s->base_gfn + s->npages))) |
591 | goto out_free; | 591 | goto out_free; |
592 | } | 592 | } |
593 | 593 | ||
594 | /* Free page dirty bitmap if unneeded */ | 594 | /* Free page dirty bitmap if unneeded */ |
595 | if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES)) | 595 | if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES)) |
596 | new.dirty_bitmap = NULL; | 596 | new.dirty_bitmap = NULL; |
597 | 597 | ||
598 | r = -ENOMEM; | 598 | r = -ENOMEM; |
599 | 599 | ||
600 | /* Allocate if a slot is being created */ | 600 | /* Allocate if a slot is being created */ |
601 | #ifndef CONFIG_S390 | 601 | #ifndef CONFIG_S390 |
602 | if (npages && !new.rmap) { | 602 | if (npages && !new.rmap) { |
603 | new.rmap = vmalloc(npages * sizeof(*new.rmap)); | 603 | new.rmap = vmalloc(npages * sizeof(*new.rmap)); |
604 | 604 | ||
605 | if (!new.rmap) | 605 | if (!new.rmap) |
606 | goto out_free; | 606 | goto out_free; |
607 | 607 | ||
608 | memset(new.rmap, 0, npages * sizeof(*new.rmap)); | 608 | memset(new.rmap, 0, npages * sizeof(*new.rmap)); |
609 | 609 | ||
610 | new.user_alloc = user_alloc; | 610 | new.user_alloc = user_alloc; |
611 | new.userspace_addr = mem->userspace_addr; | 611 | new.userspace_addr = mem->userspace_addr; |
612 | } | 612 | } |
613 | if (!npages) | 613 | if (!npages) |
614 | goto skip_lpage; | 614 | goto skip_lpage; |
615 | 615 | ||
616 | for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { | 616 | for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { |
617 | unsigned long ugfn; | 617 | unsigned long ugfn; |
618 | unsigned long j; | 618 | unsigned long j; |
619 | int lpages; | 619 | int lpages; |
620 | int level = i + 2; | 620 | int level = i + 2; |
621 | 621 | ||
622 | /* Avoid unused variable warning if no large pages */ | 622 | /* Avoid unused variable warning if no large pages */ |
623 | (void)level; | 623 | (void)level; |
624 | 624 | ||
625 | if (new.lpage_info[i]) | 625 | if (new.lpage_info[i]) |
626 | continue; | 626 | continue; |
627 | 627 | ||
628 | lpages = 1 + (base_gfn + npages - 1) / | 628 | lpages = 1 + (base_gfn + npages - 1) / |
629 | KVM_PAGES_PER_HPAGE(level); | 629 | KVM_PAGES_PER_HPAGE(level); |
630 | lpages -= base_gfn / KVM_PAGES_PER_HPAGE(level); | 630 | lpages -= base_gfn / KVM_PAGES_PER_HPAGE(level); |
631 | 631 | ||
632 | new.lpage_info[i] = vmalloc(lpages * sizeof(*new.lpage_info[i])); | 632 | new.lpage_info[i] = vmalloc(lpages * sizeof(*new.lpage_info[i])); |
633 | 633 | ||
634 | if (!new.lpage_info[i]) | 634 | if (!new.lpage_info[i]) |
635 | goto out_free; | 635 | goto out_free; |
636 | 636 | ||
637 | memset(new.lpage_info[i], 0, | 637 | memset(new.lpage_info[i], 0, |
638 | lpages * sizeof(*new.lpage_info[i])); | 638 | lpages * sizeof(*new.lpage_info[i])); |
639 | 639 | ||
640 | if (base_gfn % KVM_PAGES_PER_HPAGE(level)) | 640 | if (base_gfn % KVM_PAGES_PER_HPAGE(level)) |
641 | new.lpage_info[i][0].write_count = 1; | 641 | new.lpage_info[i][0].write_count = 1; |
642 | if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE(level)) | 642 | if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE(level)) |
643 | new.lpage_info[i][lpages - 1].write_count = 1; | 643 | new.lpage_info[i][lpages - 1].write_count = 1; |
644 | ugfn = new.userspace_addr >> PAGE_SHIFT; | 644 | ugfn = new.userspace_addr >> PAGE_SHIFT; |
645 | /* | 645 | /* |
646 | * If the gfn and userspace address are not aligned wrt each | 646 | * If the gfn and userspace address are not aligned wrt each |
647 | * other, or if explicitly asked to, disable large page | 647 | * other, or if explicitly asked to, disable large page |
648 | * support for this slot | 648 | * support for this slot |
649 | */ | 649 | */ |
650 | if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1) || | 650 | if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1) || |
651 | !largepages_enabled) | 651 | !largepages_enabled) |
652 | for (j = 0; j < lpages; ++j) | 652 | for (j = 0; j < lpages; ++j) |
653 | new.lpage_info[i][j].write_count = 1; | 653 | new.lpage_info[i][j].write_count = 1; |
654 | } | 654 | } |
655 | 655 | ||
656 | skip_lpage: | 656 | skip_lpage: |
657 | 657 | ||
658 | /* Allocate page dirty bitmap if needed */ | 658 | /* Allocate page dirty bitmap if needed */ |
659 | if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) { | 659 | if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) { |
660 | unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new); | 660 | unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new); |
661 | 661 | ||
662 | new.dirty_bitmap = vmalloc(dirty_bytes); | 662 | new.dirty_bitmap = vmalloc(dirty_bytes); |
663 | if (!new.dirty_bitmap) | 663 | if (!new.dirty_bitmap) |
664 | goto out_free; | 664 | goto out_free; |
665 | memset(new.dirty_bitmap, 0, dirty_bytes); | 665 | memset(new.dirty_bitmap, 0, dirty_bytes); |
666 | /* destroy any largepage mappings for dirty tracking */ | 666 | /* destroy any largepage mappings for dirty tracking */ |
667 | if (old.npages) | 667 | if (old.npages) |
668 | flush_shadow = 1; | 668 | flush_shadow = 1; |
669 | } | 669 | } |
670 | #else /* not defined CONFIG_S390 */ | 670 | #else /* not defined CONFIG_S390 */ |
671 | new.user_alloc = user_alloc; | 671 | new.user_alloc = user_alloc; |
672 | if (user_alloc) | 672 | if (user_alloc) |
673 | new.userspace_addr = mem->userspace_addr; | 673 | new.userspace_addr = mem->userspace_addr; |
674 | #endif /* not defined CONFIG_S390 */ | 674 | #endif /* not defined CONFIG_S390 */ |
675 | 675 | ||
676 | if (!npages) { | 676 | if (!npages) { |
677 | r = -ENOMEM; | 677 | r = -ENOMEM; |
678 | slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); | 678 | slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); |
679 | if (!slots) | 679 | if (!slots) |
680 | goto out_free; | 680 | goto out_free; |
681 | memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); | 681 | memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); |
682 | if (mem->slot >= slots->nmemslots) | 682 | if (mem->slot >= slots->nmemslots) |
683 | slots->nmemslots = mem->slot + 1; | 683 | slots->nmemslots = mem->slot + 1; |
684 | slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID; | 684 | slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID; |
685 | 685 | ||
686 | old_memslots = kvm->memslots; | 686 | old_memslots = kvm->memslots; |
687 | rcu_assign_pointer(kvm->memslots, slots); | 687 | rcu_assign_pointer(kvm->memslots, slots); |
688 | synchronize_srcu_expedited(&kvm->srcu); | 688 | synchronize_srcu_expedited(&kvm->srcu); |
689 | /* From this point no new shadow pages pointing to a deleted | 689 | /* From this point no new shadow pages pointing to a deleted |
690 | * memslot will be created. | 690 | * memslot will be created. |
691 | * | 691 | * |
692 | * validation of sp->gfn happens in: | 692 | * validation of sp->gfn happens in: |
693 | * - gfn_to_hva (kvm_read_guest, gfn_to_pfn) | 693 | * - gfn_to_hva (kvm_read_guest, gfn_to_pfn) |
694 | * - kvm_is_visible_gfn (mmu_check_roots) | 694 | * - kvm_is_visible_gfn (mmu_check_roots) |
695 | */ | 695 | */ |
696 | kvm_arch_flush_shadow(kvm); | 696 | kvm_arch_flush_shadow(kvm); |
697 | kfree(old_memslots); | 697 | kfree(old_memslots); |
698 | } | 698 | } |
699 | 699 | ||
700 | r = kvm_arch_prepare_memory_region(kvm, &new, old, mem, user_alloc); | 700 | r = kvm_arch_prepare_memory_region(kvm, &new, old, mem, user_alloc); |
701 | if (r) | 701 | if (r) |
702 | goto out_free; | 702 | goto out_free; |
703 | 703 | ||
704 | #ifdef CONFIG_DMAR | 704 | #ifdef CONFIG_DMAR |
705 | /* map the pages in iommu page table */ | 705 | /* map the pages in iommu page table */ |
706 | if (npages) { | 706 | if (npages) { |
707 | r = kvm_iommu_map_pages(kvm, &new); | 707 | r = kvm_iommu_map_pages(kvm, &new); |
708 | if (r) | 708 | if (r) |
709 | goto out_free; | 709 | goto out_free; |
710 | } | 710 | } |
711 | #endif | 711 | #endif |
712 | 712 | ||
713 | r = -ENOMEM; | 713 | r = -ENOMEM; |
714 | slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); | 714 | slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); |
715 | if (!slots) | 715 | if (!slots) |
716 | goto out_free; | 716 | goto out_free; |
717 | memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); | 717 | memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); |
718 | if (mem->slot >= slots->nmemslots) | 718 | if (mem->slot >= slots->nmemslots) |
719 | slots->nmemslots = mem->slot + 1; | 719 | slots->nmemslots = mem->slot + 1; |
720 | 720 | ||
721 | /* actual memory is freed via old in kvm_free_physmem_slot below */ | 721 | /* actual memory is freed via old in kvm_free_physmem_slot below */ |
722 | if (!npages) { | 722 | if (!npages) { |
723 | new.rmap = NULL; | 723 | new.rmap = NULL; |
724 | new.dirty_bitmap = NULL; | 724 | new.dirty_bitmap = NULL; |
725 | for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) | 725 | for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) |
726 | new.lpage_info[i] = NULL; | 726 | new.lpage_info[i] = NULL; |
727 | } | 727 | } |
728 | 728 | ||
729 | slots->memslots[mem->slot] = new; | 729 | slots->memslots[mem->slot] = new; |
730 | old_memslots = kvm->memslots; | 730 | old_memslots = kvm->memslots; |
731 | rcu_assign_pointer(kvm->memslots, slots); | 731 | rcu_assign_pointer(kvm->memslots, slots); |
732 | synchronize_srcu_expedited(&kvm->srcu); | 732 | synchronize_srcu_expedited(&kvm->srcu); |
733 | 733 | ||
734 | kvm_arch_commit_memory_region(kvm, mem, old, user_alloc); | 734 | kvm_arch_commit_memory_region(kvm, mem, old, user_alloc); |
735 | 735 | ||
736 | kvm_free_physmem_slot(&old, &new); | 736 | kvm_free_physmem_slot(&old, &new); |
737 | kfree(old_memslots); | 737 | kfree(old_memslots); |
738 | 738 | ||
739 | if (flush_shadow) | 739 | if (flush_shadow) |
740 | kvm_arch_flush_shadow(kvm); | 740 | kvm_arch_flush_shadow(kvm); |
741 | 741 | ||
742 | return 0; | 742 | return 0; |
743 | 743 | ||
744 | out_free: | 744 | out_free: |
745 | kvm_free_physmem_slot(&new, &old); | 745 | kvm_free_physmem_slot(&new, &old); |
746 | out: | 746 | out: |
747 | return r; | 747 | return r; |
748 | 748 | ||
749 | } | 749 | } |
750 | EXPORT_SYMBOL_GPL(__kvm_set_memory_region); | 750 | EXPORT_SYMBOL_GPL(__kvm_set_memory_region); |
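
Note how the function publishes each new kvm_memslots array with rcu_assign_pointer() followed by synchronize_srcu_expedited(); readers may therefore only dereference kvm->memslots inside an SRCU read-side critical section. A sketch of the reader pattern (kvm_memslots() is roughly a wrapper around rcu_dereference(), as used by gfn_to_memslot() below):

        int idx;
        struct kvm_memslots *slots;

        idx = srcu_read_lock(&kvm->srcu);
        slots = kvm_memslots(kvm);    /* ~ rcu_dereference(kvm->memslots) */
        /* ... walk slots->memslots[0..nmemslots) ... */
        srcu_read_unlock(&kvm->srcu, idx);
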
751 | 751 | ||
752 | int kvm_set_memory_region(struct kvm *kvm, | 752 | int kvm_set_memory_region(struct kvm *kvm, |
753 | struct kvm_userspace_memory_region *mem, | 753 | struct kvm_userspace_memory_region *mem, |
754 | int user_alloc) | 754 | int user_alloc) |
755 | { | 755 | { |
756 | int r; | 756 | int r; |
757 | 757 | ||
758 | mutex_lock(&kvm->slots_lock); | 758 | mutex_lock(&kvm->slots_lock); |
759 | r = __kvm_set_memory_region(kvm, mem, user_alloc); | 759 | r = __kvm_set_memory_region(kvm, mem, user_alloc); |
760 | mutex_unlock(&kvm->slots_lock); | 760 | mutex_unlock(&kvm->slots_lock); |
761 | return r; | 761 | return r; |
762 | } | 762 | } |
763 | EXPORT_SYMBOL_GPL(kvm_set_memory_region); | 763 | EXPORT_SYMBOL_GPL(kvm_set_memory_region); |
764 | 764 | ||
765 | int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, | 765 | int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, |
766 | struct | 766 | struct |
767 | kvm_userspace_memory_region *mem, | 767 | kvm_userspace_memory_region *mem, |
768 | int user_alloc) | 768 | int user_alloc) |
769 | { | 769 | { |
770 | if (mem->slot >= KVM_MEMORY_SLOTS) | 770 | if (mem->slot >= KVM_MEMORY_SLOTS) |
771 | return -EINVAL; | 771 | return -EINVAL; |
772 | return kvm_set_memory_region(kvm, mem, user_alloc); | 772 | return kvm_set_memory_region(kvm, mem, user_alloc); |
773 | } | 773 | } |
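
With aliases gone, a userspace VMM drives this path purely through KVM_SET_USER_MEMORY_REGION, replacing or overlapping regions as needed. A minimal usage sketch (vm_fd is assumed to come from KVM_CREATE_VM; error handling trimmed):

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    /* Map 1 MiB of anonymous memory at guest physical address 0, slot 0. */
    static int add_slot0(int vm_fd)
    {
        void *mem = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct kvm_userspace_memory_region region = {
            .slot            = 0,
            .flags           = 0,
            .guest_phys_addr = 0,
            .memory_size     = 1 << 20,
            .userspace_addr  = (unsigned long)mem,
        };

        if (mem == MAP_FAILED)
            return -1;
        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
    }
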
774 | 774 | ||
775 | int kvm_get_dirty_log(struct kvm *kvm, | 775 | int kvm_get_dirty_log(struct kvm *kvm, |
776 | struct kvm_dirty_log *log, int *is_dirty) | 776 | struct kvm_dirty_log *log, int *is_dirty) |
777 | { | 777 | { |
778 | struct kvm_memory_slot *memslot; | 778 | struct kvm_memory_slot *memslot; |
779 | int r, i; | 779 | int r, i; |
780 | unsigned long n; | 780 | unsigned long n; |
781 | unsigned long any = 0; | 781 | unsigned long any = 0; |
782 | 782 | ||
783 | r = -EINVAL; | 783 | r = -EINVAL; |
784 | if (log->slot >= KVM_MEMORY_SLOTS) | 784 | if (log->slot >= KVM_MEMORY_SLOTS) |
785 | goto out; | 785 | goto out; |
786 | 786 | ||
787 | memslot = &kvm->memslots->memslots[log->slot]; | 787 | memslot = &kvm->memslots->memslots[log->slot]; |
788 | r = -ENOENT; | 788 | r = -ENOENT; |
789 | if (!memslot->dirty_bitmap) | 789 | if (!memslot->dirty_bitmap) |
790 | goto out; | 790 | goto out; |
791 | 791 | ||
792 | n = kvm_dirty_bitmap_bytes(memslot); | 792 | n = kvm_dirty_bitmap_bytes(memslot); |
793 | 793 | ||
794 | for (i = 0; !any && i < n/sizeof(long); ++i) | 794 | for (i = 0; !any && i < n/sizeof(long); ++i) |
795 | any = memslot->dirty_bitmap[i]; | 795 | any = memslot->dirty_bitmap[i]; |
796 | 796 | ||
797 | r = -EFAULT; | 797 | r = -EFAULT; |
798 | if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n)) | 798 | if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n)) |
799 | goto out; | 799 | goto out; |
800 | 800 | ||
801 | if (any) | 801 | if (any) |
802 | *is_dirty = 1; | 802 | *is_dirty = 1; |
803 | 803 | ||
804 | r = 0; | 804 | r = 0; |
805 | out: | 805 | out: |
806 | return r; | 806 | return r; |
807 | } | 807 | } |
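
The in-kernel side above copies the bitmap out verbatim; the matching userspace call passes a bitmap sized to the slot, one bit per page, and only succeeds for slots created with KVM_MEM_LOG_DIRTY_PAGES. A sketch, with bitmap allocation left to the caller:

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* bitmap must hold memory_size / PAGE_SIZE bits, rounded up to longs. */
    static int fetch_dirty_log(int vm_fd, __u32 slot, void *bitmap)
    {
        struct kvm_dirty_log log = {
            .slot         = slot,
            .dirty_bitmap = bitmap,
        };

        return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
    }
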
808 | 808 | ||
809 | void kvm_disable_largepages(void) | 809 | void kvm_disable_largepages(void) |
810 | { | 810 | { |
811 | largepages_enabled = false; | 811 | largepages_enabled = false; |
812 | } | 812 | } |
813 | EXPORT_SYMBOL_GPL(kvm_disable_largepages); | 813 | EXPORT_SYMBOL_GPL(kvm_disable_largepages); |
814 | 814 | ||
815 | int is_error_page(struct page *page) | 815 | int is_error_page(struct page *page) |
816 | { | 816 | { |
817 | return page == bad_page || page == hwpoison_page; | 817 | return page == bad_page || page == hwpoison_page; |
818 | } | 818 | } |
819 | EXPORT_SYMBOL_GPL(is_error_page); | 819 | EXPORT_SYMBOL_GPL(is_error_page); |
820 | 820 | ||
821 | int is_error_pfn(pfn_t pfn) | 821 | int is_error_pfn(pfn_t pfn) |
822 | { | 822 | { |
823 | return pfn == bad_pfn || pfn == hwpoison_pfn; | 823 | return pfn == bad_pfn || pfn == hwpoison_pfn; |
824 | } | 824 | } |
825 | EXPORT_SYMBOL_GPL(is_error_pfn); | 825 | EXPORT_SYMBOL_GPL(is_error_pfn); |
826 | 826 | ||
827 | int is_hwpoison_pfn(pfn_t pfn) | 827 | int is_hwpoison_pfn(pfn_t pfn) |
828 | { | 828 | { |
829 | return pfn == hwpoison_pfn; | 829 | return pfn == hwpoison_pfn; |
830 | } | 830 | } |
831 | EXPORT_SYMBOL_GPL(is_hwpoison_pfn); | 831 | EXPORT_SYMBOL_GPL(is_hwpoison_pfn); |
832 | 832 | ||
833 | static inline unsigned long bad_hva(void) | 833 | static inline unsigned long bad_hva(void) |
834 | { | 834 | { |
835 | return PAGE_OFFSET; | 835 | return PAGE_OFFSET; |
836 | } | 836 | } |
837 | 837 | ||
838 | int kvm_is_error_hva(unsigned long addr) | 838 | int kvm_is_error_hva(unsigned long addr) |
839 | { | 839 | { |
840 | return addr == bad_hva(); | 840 | return addr == bad_hva(); |
841 | } | 841 | } |
842 | EXPORT_SYMBOL_GPL(kvm_is_error_hva); | 842 | EXPORT_SYMBOL_GPL(kvm_is_error_hva); |
843 | 843 | ||
844 | struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn) | 844 | struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn) |
845 | { | 845 | { |
846 | int i; | 846 | int i; |
847 | struct kvm_memslots *slots = kvm_memslots(kvm); | 847 | struct kvm_memslots *slots = kvm_memslots(kvm); |
848 | 848 | ||
849 | for (i = 0; i < slots->nmemslots; ++i) { | 849 | for (i = 0; i < slots->nmemslots; ++i) { |
850 | struct kvm_memory_slot *memslot = &slots->memslots[i]; | 850 | struct kvm_memory_slot *memslot = &slots->memslots[i]; |
851 | 851 | ||
852 | if (gfn >= memslot->base_gfn | 852 | if (gfn >= memslot->base_gfn |
853 | && gfn < memslot->base_gfn + memslot->npages) | 853 | && gfn < memslot->base_gfn + memslot->npages) |
854 | return memslot; | 854 | return memslot; |
855 | } | 855 | } |
856 | return NULL; | 856 | return NULL; |
857 | } | 857 | } |
858 | EXPORT_SYMBOL_GPL(gfn_to_memslot_unaliased); | 858 | EXPORT_SYMBOL_GPL(gfn_to_memslot); |
859 | 859 | ||
860 | struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn) | ||
861 | { | ||
862 | gfn = unalias_gfn(kvm, gfn); | ||
863 | return gfn_to_memslot_unaliased(kvm, gfn); | ||
864 | } | ||
865 | |||
866 | int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) | 860 | int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) |
867 | { | 861 | { |
868 | int i; | 862 | int i; |
869 | struct kvm_memslots *slots = kvm_memslots(kvm); | 863 | struct kvm_memslots *slots = kvm_memslots(kvm); |
870 | 864 | ||
871 | gfn = unalias_gfn_instantiation(kvm, gfn); | ||
872 | for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { | 865 | for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { |
873 | struct kvm_memory_slot *memslot = &slots->memslots[i]; | 866 | struct kvm_memory_slot *memslot = &slots->memslots[i]; |
874 | 867 | ||
875 | if (memslot->flags & KVM_MEMSLOT_INVALID) | 868 | if (memslot->flags & KVM_MEMSLOT_INVALID) |
876 | continue; | 869 | continue; |
877 | 870 | ||
878 | if (gfn >= memslot->base_gfn | 871 | if (gfn >= memslot->base_gfn |
879 | && gfn < memslot->base_gfn + memslot->npages) | 872 | && gfn < memslot->base_gfn + memslot->npages) |
880 | return 1; | 873 | return 1; |
881 | } | 874 | } |
882 | return 0; | 875 | return 0; |
883 | } | 876 | } |
884 | EXPORT_SYMBOL_GPL(kvm_is_visible_gfn); | 877 | EXPORT_SYMBOL_GPL(kvm_is_visible_gfn); |
885 | 878 | ||
886 | unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn) | 879 | unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn) |
887 | { | 880 | { |
888 | struct vm_area_struct *vma; | 881 | struct vm_area_struct *vma; |
889 | unsigned long addr, size; | 882 | unsigned long addr, size; |
890 | 883 | ||
891 | size = PAGE_SIZE; | 884 | size = PAGE_SIZE; |
892 | 885 | ||
893 | addr = gfn_to_hva(kvm, gfn); | 886 | addr = gfn_to_hva(kvm, gfn); |
894 | if (kvm_is_error_hva(addr)) | 887 | if (kvm_is_error_hva(addr)) |
895 | return PAGE_SIZE; | 888 | return PAGE_SIZE; |
896 | 889 | ||
897 | down_read(¤t->mm->mmap_sem); | 890 | down_read(¤t->mm->mmap_sem); |
898 | vma = find_vma(current->mm, addr); | 891 | vma = find_vma(current->mm, addr); |
899 | if (!vma) | 892 | if (!vma) |
900 | goto out; | 893 | goto out; |
901 | 894 | ||
902 | size = vma_kernel_pagesize(vma); | 895 | size = vma_kernel_pagesize(vma); |
903 | 896 | ||
904 | out: | 897 | out: |
905 | up_read(¤t->mm->mmap_sem); | 898 | up_read(¤t->mm->mmap_sem); |
906 | 899 | ||
907 | return size; | 900 | return size; |
908 | } | 901 | } |
909 | 902 | ||
910 | int memslot_id(struct kvm *kvm, gfn_t gfn) | 903 | int memslot_id(struct kvm *kvm, gfn_t gfn) |
911 | { | 904 | { |
912 | int i; | 905 | int i; |
913 | struct kvm_memslots *slots = kvm_memslots(kvm); | 906 | struct kvm_memslots *slots = kvm_memslots(kvm); |
914 | struct kvm_memory_slot *memslot = NULL; | 907 | struct kvm_memory_slot *memslot = NULL; |
915 | 908 | ||
916 | gfn = unalias_gfn(kvm, gfn); | ||
917 | for (i = 0; i < slots->nmemslots; ++i) { | 909 | for (i = 0; i < slots->nmemslots; ++i) { |
918 | memslot = &slots->memslots[i]; | 910 | memslot = &slots->memslots[i]; |
919 | 911 | ||
920 | if (gfn >= memslot->base_gfn | 912 | if (gfn >= memslot->base_gfn |
921 | && gfn < memslot->base_gfn + memslot->npages) | 913 | && gfn < memslot->base_gfn + memslot->npages) |
922 | break; | 914 | break; |
923 | } | 915 | } |
924 | 916 | ||
925 | return memslot - slots->memslots; | 917 | return memslot - slots->memslots; |
926 | } | 918 | } |
927 | 919 | ||
928 | static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn) | 920 | static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn) |
929 | { | 921 | { |
930 | return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE; | 922 | return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE; |
931 | } | 923 | } |
932 | 924 | ||
933 | unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) | 925 | unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) |
934 | { | 926 | { |
935 | struct kvm_memory_slot *slot; | 927 | struct kvm_memory_slot *slot; |
936 | 928 | ||
937 | gfn = unalias_gfn_instantiation(kvm, gfn); | 929 | slot = gfn_to_memslot(kvm, gfn); |
938 | slot = gfn_to_memslot_unaliased(kvm, gfn); | ||
939 | if (!slot || slot->flags & KVM_MEMSLOT_INVALID) | 930 | if (!slot || slot->flags & KVM_MEMSLOT_INVALID) |
940 | return bad_hva(); | 931 | return bad_hva(); |
941 | return gfn_to_hva_memslot(slot, gfn); | 932 | return gfn_to_hva_memslot(slot, gfn); |
942 | } | 933 | } |
943 | EXPORT_SYMBOL_GPL(gfn_to_hva); | 934 | EXPORT_SYMBOL_GPL(gfn_to_hva); |
944 | 935 | ||
945 | static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) | 936 | static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) |
946 | { | 937 | { |
947 | struct page *page[1]; | 938 | struct page *page[1]; |
948 | int npages; | 939 | int npages; |
949 | pfn_t pfn; | 940 | pfn_t pfn; |
950 | 941 | ||
951 | might_sleep(); | 942 | might_sleep(); |
952 | 943 | ||
953 | npages = get_user_pages_fast(addr, 1, 1, page); | 944 | npages = get_user_pages_fast(addr, 1, 1, page); |
954 | 945 | ||
955 | if (unlikely(npages != 1)) { | 946 | if (unlikely(npages != 1)) { |
956 | struct vm_area_struct *vma; | 947 | struct vm_area_struct *vma; |
957 | 948 | ||
958 | if (is_hwpoison_address(addr)) { | 949 | if (is_hwpoison_address(addr)) { |
959 | get_page(hwpoison_page); | 950 | get_page(hwpoison_page); |
960 | return page_to_pfn(hwpoison_page); | 951 | return page_to_pfn(hwpoison_page); |
961 | } | 952 | } |
962 | 953 | ||
963 | down_read(¤t->mm->mmap_sem); | 954 | down_read(¤t->mm->mmap_sem); |
964 | vma = find_vma(current->mm, addr); | 955 | vma = find_vma(current->mm, addr); |
965 | 956 | ||
966 | if (vma == NULL || addr < vma->vm_start || | 957 | if (vma == NULL || addr < vma->vm_start || |
967 | !(vma->vm_flags & VM_PFNMAP)) { | 958 | !(vma->vm_flags & VM_PFNMAP)) { |
968 | up_read(¤t->mm->mmap_sem); | 959 | up_read(¤t->mm->mmap_sem); |
969 | get_page(bad_page); | 960 | get_page(bad_page); |
970 | return page_to_pfn(bad_page); | 961 | return page_to_pfn(bad_page); |
971 | } | 962 | } |
972 | 963 | ||
973 | pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; | 964 | pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; |
974 | up_read(¤t->mm->mmap_sem); | 965 | up_read(¤t->mm->mmap_sem); |
975 | BUG_ON(!kvm_is_mmio_pfn(pfn)); | 966 | BUG_ON(!kvm_is_mmio_pfn(pfn)); |
976 | } else | 967 | } else |
977 | pfn = page_to_pfn(page[0]); | 968 | pfn = page_to_pfn(page[0]); |
978 | 969 | ||
979 | return pfn; | 970 | return pfn; |
980 | } | 971 | } |
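
For the VM_PFNMAP fallback, the pfn falls straight out of the vma's linear layout. A worked example with hypothetical numbers:

        /*
         * Hypothetical vma: vm_start = 0x7f0000000000, vm_pgoff = 0x100.
         * For addr = vm_start + 3 * PAGE_SIZE (4 KiB pages):
         *   pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff
         *       = 3 + 0x100 = 0x103
         */
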
981 | 972 | ||
982 | pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) | 973 | pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) |
983 | { | 974 | { |
984 | unsigned long addr; | 975 | unsigned long addr; |
985 | 976 | ||
986 | addr = gfn_to_hva(kvm, gfn); | 977 | addr = gfn_to_hva(kvm, gfn); |
987 | if (kvm_is_error_hva(addr)) { | 978 | if (kvm_is_error_hva(addr)) { |
988 | get_page(bad_page); | 979 | get_page(bad_page); |
989 | return page_to_pfn(bad_page); | 980 | return page_to_pfn(bad_page); |
990 | } | 981 | } |
991 | 982 | ||
992 | return hva_to_pfn(kvm, addr); | 983 | return hva_to_pfn(kvm, addr); |
993 | } | 984 | } |
994 | EXPORT_SYMBOL_GPL(gfn_to_pfn); | 985 | EXPORT_SYMBOL_GPL(gfn_to_pfn); |
995 | 986 | ||
996 | pfn_t gfn_to_pfn_memslot(struct kvm *kvm, | 987 | pfn_t gfn_to_pfn_memslot(struct kvm *kvm, |
997 | struct kvm_memory_slot *slot, gfn_t gfn) | 988 | struct kvm_memory_slot *slot, gfn_t gfn) |
998 | { | 989 | { |
999 | unsigned long addr = gfn_to_hva_memslot(slot, gfn); | 990 | unsigned long addr = gfn_to_hva_memslot(slot, gfn); |
1000 | return hva_to_pfn(kvm, addr); | 991 | return hva_to_pfn(kvm, addr); |
1001 | } | 992 | } |
1002 | 993 | ||
1003 | struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) | 994 | struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) |
1004 | { | 995 | { |
1005 | pfn_t pfn; | 996 | pfn_t pfn; |
1006 | 997 | ||
1007 | pfn = gfn_to_pfn(kvm, gfn); | 998 | pfn = gfn_to_pfn(kvm, gfn); |
1008 | if (!kvm_is_mmio_pfn(pfn)) | 999 | if (!kvm_is_mmio_pfn(pfn)) |
1009 | return pfn_to_page(pfn); | 1000 | return pfn_to_page(pfn); |
1010 | 1001 | ||
1011 | WARN_ON(kvm_is_mmio_pfn(pfn)); | 1002 | WARN_ON(kvm_is_mmio_pfn(pfn)); |
1012 | 1003 | ||
1013 | get_page(bad_page); | 1004 | get_page(bad_page); |
1014 | return bad_page; | 1005 | return bad_page; |
1015 | } | 1006 | } |
1016 | 1007 | ||
1017 | EXPORT_SYMBOL_GPL(gfn_to_page); | 1008 | EXPORT_SYMBOL_GPL(gfn_to_page); |
1018 | 1009 | ||
1019 | void kvm_release_page_clean(struct page *page) | 1010 | void kvm_release_page_clean(struct page *page) |
1020 | { | 1011 | { |
1021 | kvm_release_pfn_clean(page_to_pfn(page)); | 1012 | kvm_release_pfn_clean(page_to_pfn(page)); |
1022 | } | 1013 | } |
1023 | EXPORT_SYMBOL_GPL(kvm_release_page_clean); | 1014 | EXPORT_SYMBOL_GPL(kvm_release_page_clean); |
1024 | 1015 | ||
1025 | void kvm_release_pfn_clean(pfn_t pfn) | 1016 | void kvm_release_pfn_clean(pfn_t pfn) |
1026 | { | 1017 | { |
1027 | if (!kvm_is_mmio_pfn(pfn)) | 1018 | if (!kvm_is_mmio_pfn(pfn)) |
1028 | put_page(pfn_to_page(pfn)); | 1019 | put_page(pfn_to_page(pfn)); |
1029 | } | 1020 | } |
1030 | EXPORT_SYMBOL_GPL(kvm_release_pfn_clean); | 1021 | EXPORT_SYMBOL_GPL(kvm_release_pfn_clean); |
1031 | 1022 | ||
1032 | void kvm_release_page_dirty(struct page *page) | 1023 | void kvm_release_page_dirty(struct page *page) |
1033 | { | 1024 | { |
1034 | kvm_release_pfn_dirty(page_to_pfn(page)); | 1025 | kvm_release_pfn_dirty(page_to_pfn(page)); |
1035 | } | 1026 | } |
1036 | EXPORT_SYMBOL_GPL(kvm_release_page_dirty); | 1027 | EXPORT_SYMBOL_GPL(kvm_release_page_dirty); |
1037 | 1028 | ||
1038 | void kvm_release_pfn_dirty(pfn_t pfn) | 1029 | void kvm_release_pfn_dirty(pfn_t pfn) |
1039 | { | 1030 | { |
1040 | kvm_set_pfn_dirty(pfn); | 1031 | kvm_set_pfn_dirty(pfn); |
1041 | kvm_release_pfn_clean(pfn); | 1032 | kvm_release_pfn_clean(pfn); |
1042 | } | 1033 | } |
1043 | EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty); | 1034 | EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty); |
1044 | 1035 | ||
1045 | void kvm_set_page_dirty(struct page *page) | 1036 | void kvm_set_page_dirty(struct page *page) |
1046 | { | 1037 | { |
1047 | kvm_set_pfn_dirty(page_to_pfn(page)); | 1038 | kvm_set_pfn_dirty(page_to_pfn(page)); |
1048 | } | 1039 | } |
1049 | EXPORT_SYMBOL_GPL(kvm_set_page_dirty); | 1040 | EXPORT_SYMBOL_GPL(kvm_set_page_dirty); |
1050 | 1041 | ||
1051 | void kvm_set_pfn_dirty(pfn_t pfn) | 1042 | void kvm_set_pfn_dirty(pfn_t pfn) |
1052 | { | 1043 | { |
1053 | if (!kvm_is_mmio_pfn(pfn)) { | 1044 | if (!kvm_is_mmio_pfn(pfn)) { |
1054 | struct page *page = pfn_to_page(pfn); | 1045 | struct page *page = pfn_to_page(pfn); |
1055 | if (!PageReserved(page)) | 1046 | if (!PageReserved(page)) |
1056 | SetPageDirty(page); | 1047 | SetPageDirty(page); |
1057 | } | 1048 | } |
1058 | } | 1049 | } |
1059 | EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty); | 1050 | EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty); |
1060 | 1051 | ||
1061 | void kvm_set_pfn_accessed(pfn_t pfn) | 1052 | void kvm_set_pfn_accessed(pfn_t pfn) |
1062 | { | 1053 | { |
1063 | if (!kvm_is_mmio_pfn(pfn)) | 1054 | if (!kvm_is_mmio_pfn(pfn)) |
1064 | mark_page_accessed(pfn_to_page(pfn)); | 1055 | mark_page_accessed(pfn_to_page(pfn)); |
1065 | } | 1056 | } |
1066 | EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed); | 1057 | EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed); |
1067 | 1058 | ||
1068 | void kvm_get_pfn(pfn_t pfn) | 1059 | void kvm_get_pfn(pfn_t pfn) |
1069 | { | 1060 | { |
1070 | if (!kvm_is_mmio_pfn(pfn)) | 1061 | if (!kvm_is_mmio_pfn(pfn)) |
1071 | get_page(pfn_to_page(pfn)); | 1062 | get_page(pfn_to_page(pfn)); |
1072 | } | 1063 | } |
1073 | EXPORT_SYMBOL_GPL(kvm_get_pfn); | 1064 | EXPORT_SYMBOL_GPL(kvm_get_pfn); |
1074 | 1065 | ||
1075 | static int next_segment(unsigned long len, int offset) | 1066 | static int next_segment(unsigned long len, int offset) |
1076 | { | 1067 | { |
1077 | if (len > PAGE_SIZE - offset) | 1068 | if (len > PAGE_SIZE - offset) |
1078 | return PAGE_SIZE - offset; | 1069 | return PAGE_SIZE - offset; |
1079 | else | 1070 | else |
1080 | return len; | 1071 | return len; |
1081 | } | 1072 | } |
1082 | 1073 | ||
1083 | int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, | 1074 | int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, |
1084 | int len) | 1075 | int len) |
1085 | { | 1076 | { |
1086 | int r; | 1077 | int r; |
1087 | unsigned long addr; | 1078 | unsigned long addr; |
1088 | 1079 | ||
1089 | addr = gfn_to_hva(kvm, gfn); | 1080 | addr = gfn_to_hva(kvm, gfn); |
1090 | if (kvm_is_error_hva(addr)) | 1081 | if (kvm_is_error_hva(addr)) |
1091 | return -EFAULT; | 1082 | return -EFAULT; |
1092 | r = copy_from_user(data, (void __user *)addr + offset, len); | 1083 | r = copy_from_user(data, (void __user *)addr + offset, len); |
1093 | if (r) | 1084 | if (r) |
1094 | return -EFAULT; | 1085 | return -EFAULT; |
1095 | return 0; | 1086 | return 0; |
1096 | } | 1087 | } |
1097 | EXPORT_SYMBOL_GPL(kvm_read_guest_page); | 1088 | EXPORT_SYMBOL_GPL(kvm_read_guest_page); |
1098 | 1089 | ||
1099 | int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len) | 1090 | int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len) |
1100 | { | 1091 | { |
1101 | gfn_t gfn = gpa >> PAGE_SHIFT; | 1092 | gfn_t gfn = gpa >> PAGE_SHIFT; |
1102 | int seg; | 1093 | int seg; |
1103 | int offset = offset_in_page(gpa); | 1094 | int offset = offset_in_page(gpa); |
1104 | int ret; | 1095 | int ret; |
1105 | 1096 | ||
1106 | while ((seg = next_segment(len, offset)) != 0) { | 1097 | while ((seg = next_segment(len, offset)) != 0) { |
1107 | ret = kvm_read_guest_page(kvm, gfn, data, offset, seg); | 1098 | ret = kvm_read_guest_page(kvm, gfn, data, offset, seg); |
1108 | if (ret < 0) | 1099 | if (ret < 0) |
1109 | return ret; | 1100 | return ret; |
1110 | offset = 0; | 1101 | offset = 0; |
1111 | len -= seg; | 1102 | len -= seg; |
1112 | data += seg; | 1103 | data += seg; |
1113 | ++gfn; | 1104 | ++gfn; |
1114 | } | 1105 | } |
1115 | return 0; | 1106 | return 0; |
1116 | } | 1107 | } |
1117 | EXPORT_SYMBOL_GPL(kvm_read_guest); | 1108 | EXPORT_SYMBOL_GPL(kvm_read_guest); |
1118 | 1109 | ||
1119 | int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data, | 1110 | int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data, |
1120 | unsigned long len) | 1111 | unsigned long len) |
1121 | { | 1112 | { |
1122 | int r; | 1113 | int r; |
1123 | unsigned long addr; | 1114 | unsigned long addr; |
1124 | gfn_t gfn = gpa >> PAGE_SHIFT; | 1115 | gfn_t gfn = gpa >> PAGE_SHIFT; |
1125 | int offset = offset_in_page(gpa); | 1116 | int offset = offset_in_page(gpa); |
1126 | 1117 | ||
1127 | addr = gfn_to_hva(kvm, gfn); | 1118 | addr = gfn_to_hva(kvm, gfn); |
1128 | if (kvm_is_error_hva(addr)) | 1119 | if (kvm_is_error_hva(addr)) |
1129 | return -EFAULT; | 1120 | return -EFAULT; |
1130 | pagefault_disable(); | 1121 | pagefault_disable(); |
1131 | r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len); | 1122 | r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len); |
1132 | pagefault_enable(); | 1123 | pagefault_enable(); |
1133 | if (r) | 1124 | if (r) |
1134 | return -EFAULT; | 1125 | return -EFAULT; |
1135 | return 0; | 1126 | return 0; |
1136 | } | 1127 | } |
1137 | EXPORT_SYMBOL(kvm_read_guest_atomic); | 1128 | EXPORT_SYMBOL(kvm_read_guest_atomic); |
1138 | 1129 | ||
1139 | int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, | 1130 | int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, |
1140 | int offset, int len) | 1131 | int offset, int len) |
1141 | { | 1132 | { |
1142 | int r; | 1133 | int r; |
1143 | unsigned long addr; | 1134 | unsigned long addr; |
1144 | 1135 | ||
1145 | addr = gfn_to_hva(kvm, gfn); | 1136 | addr = gfn_to_hva(kvm, gfn); |
1146 | if (kvm_is_error_hva(addr)) | 1137 | if (kvm_is_error_hva(addr)) |
1147 | return -EFAULT; | 1138 | return -EFAULT; |
1148 | r = copy_to_user((void __user *)addr + offset, data, len); | 1139 | r = copy_to_user((void __user *)addr + offset, data, len); |
1149 | if (r) | 1140 | if (r) |
1150 | return -EFAULT; | 1141 | return -EFAULT; |
1151 | mark_page_dirty(kvm, gfn); | 1142 | mark_page_dirty(kvm, gfn); |
1152 | return 0; | 1143 | return 0; |
1153 | } | 1144 | } |
1154 | EXPORT_SYMBOL_GPL(kvm_write_guest_page); | 1145 | EXPORT_SYMBOL_GPL(kvm_write_guest_page); |
1155 | 1146 | ||
1156 | int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, | 1147 | int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, |
1157 | unsigned long len) | 1148 | unsigned long len) |
1158 | { | 1149 | { |
1159 | gfn_t gfn = gpa >> PAGE_SHIFT; | 1150 | gfn_t gfn = gpa >> PAGE_SHIFT; |
1160 | int seg; | 1151 | int seg; |
1161 | int offset = offset_in_page(gpa); | 1152 | int offset = offset_in_page(gpa); |
1162 | int ret; | 1153 | int ret; |
1163 | 1154 | ||
1164 | while ((seg = next_segment(len, offset)) != 0) { | 1155 | while ((seg = next_segment(len, offset)) != 0) { |
1165 | ret = kvm_write_guest_page(kvm, gfn, data, offset, seg); | 1156 | ret = kvm_write_guest_page(kvm, gfn, data, offset, seg); |
1166 | if (ret < 0) | 1157 | if (ret < 0) |
1167 | return ret; | 1158 | return ret; |
1168 | offset = 0; | 1159 | offset = 0; |
1169 | len -= seg; | 1160 | len -= seg; |
1170 | data += seg; | 1161 | data += seg; |
1171 | ++gfn; | 1162 | ++gfn; |
1172 | } | 1163 | } |
1173 | return 0; | 1164 | return 0; |
1174 | } | 1165 | } |
1175 | 1166 | ||
1176 | int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len) | 1167 | int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len) |
1177 | { | 1168 | { |
1178 | return kvm_write_guest_page(kvm, gfn, empty_zero_page, offset, len); | 1169 | return kvm_write_guest_page(kvm, gfn, empty_zero_page, offset, len); |
1179 | } | 1170 | } |
1180 | EXPORT_SYMBOL_GPL(kvm_clear_guest_page); | 1171 | EXPORT_SYMBOL_GPL(kvm_clear_guest_page); |
1181 | 1172 | ||
1182 | int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) | 1173 | int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) |
1183 | { | 1174 | { |
1184 | gfn_t gfn = gpa >> PAGE_SHIFT; | 1175 | gfn_t gfn = gpa >> PAGE_SHIFT; |
1185 | int seg; | 1176 | int seg; |
1186 | int offset = offset_in_page(gpa); | 1177 | int offset = offset_in_page(gpa); |
1187 | int ret; | 1178 | int ret; |
1188 | 1179 | ||
1189 | while ((seg = next_segment(len, offset)) != 0) { | 1180 | while ((seg = next_segment(len, offset)) != 0) { |
1190 | ret = kvm_clear_guest_page(kvm, gfn, offset, seg); | 1181 | ret = kvm_clear_guest_page(kvm, gfn, offset, seg); |
1191 | if (ret < 0) | 1182 | if (ret < 0) |
1192 | return ret; | 1183 | return ret; |
1193 | offset = 0; | 1184 | offset = 0; |
1194 | len -= seg; | 1185 | len -= seg; |
1195 | ++gfn; | 1186 | ++gfn; |
1196 | } | 1187 | } |
1197 | return 0; | 1188 | return 0; |
1198 | } | 1189 | } |
1199 | EXPORT_SYMBOL_GPL(kvm_clear_guest); | 1190 | EXPORT_SYMBOL_GPL(kvm_clear_guest); |
1200 | 1191 | ||
1201 | void mark_page_dirty(struct kvm *kvm, gfn_t gfn) | 1192 | void mark_page_dirty(struct kvm *kvm, gfn_t gfn) |
1202 | { | 1193 | { |
1203 | struct kvm_memory_slot *memslot; | 1194 | struct kvm_memory_slot *memslot; |
1204 | 1195 | ||
1205 | gfn = unalias_gfn(kvm, gfn); | 1196 | memslot = gfn_to_memslot(kvm, gfn); |
1206 | memslot = gfn_to_memslot_unaliased(kvm, gfn); | ||
1207 | if (memslot && memslot->dirty_bitmap) { | 1197 | if (memslot && memslot->dirty_bitmap) { |
1208 | unsigned long rel_gfn = gfn - memslot->base_gfn; | 1198 | unsigned long rel_gfn = gfn - memslot->base_gfn; |
1209 | 1199 | ||
1210 | generic___set_le_bit(rel_gfn, memslot->dirty_bitmap); | 1200 | generic___set_le_bit(rel_gfn, memslot->dirty_bitmap); |
1211 | } | 1201 | } |
1212 | } | 1202 | } |
1213 | 1203 | ||
1214 | /* | 1204 | /* |
1215 | * The vCPU has executed a HLT instruction with in-kernel mode enabled. | 1205 | * The vCPU has executed a HLT instruction with in-kernel mode enabled. |
1216 | */ | 1206 | */ |
1217 | void kvm_vcpu_block(struct kvm_vcpu *vcpu) | 1207 | void kvm_vcpu_block(struct kvm_vcpu *vcpu) |
1218 | { | 1208 | { |
1219 | DEFINE_WAIT(wait); | 1209 | DEFINE_WAIT(wait); |
1220 | 1210 | ||
1221 | for (;;) { | 1211 | for (;;) { |
1222 | prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); | 1212 | prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); |
1223 | 1213 | ||
1224 | if (kvm_arch_vcpu_runnable(vcpu)) { | 1214 | if (kvm_arch_vcpu_runnable(vcpu)) { |
1225 | set_bit(KVM_REQ_UNHALT, &vcpu->requests); | 1215 | set_bit(KVM_REQ_UNHALT, &vcpu->requests); |
1226 | break; | 1216 | break; |
1227 | } | 1217 | } |
1228 | if (kvm_cpu_has_pending_timer(vcpu)) | 1218 | if (kvm_cpu_has_pending_timer(vcpu)) |
1229 | break; | 1219 | break; |
1230 | if (signal_pending(current)) | 1220 | if (signal_pending(current)) |
1231 | break; | 1221 | break; |
1232 | 1222 | ||
1233 | schedule(); | 1223 | schedule(); |
1234 | } | 1224 | } |
1235 | 1225 | ||
1236 | finish_wait(&vcpu->wq, &wait); | 1226 | finish_wait(&vcpu->wq, &wait); |
1237 | } | 1227 | } |
1238 | 1228 | ||
1239 | void kvm_resched(struct kvm_vcpu *vcpu) | 1229 | void kvm_resched(struct kvm_vcpu *vcpu) |
1240 | { | 1230 | { |
1241 | if (!need_resched()) | 1231 | if (!need_resched()) |
1242 | return; | 1232 | return; |
1243 | cond_resched(); | 1233 | cond_resched(); |
1244 | } | 1234 | } |
1245 | EXPORT_SYMBOL_GPL(kvm_resched); | 1235 | EXPORT_SYMBOL_GPL(kvm_resched); |
1246 | 1236 | ||
1247 | void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu) | 1237 | void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu) |
1248 | { | 1238 | { |
1249 | ktime_t expires; | 1239 | ktime_t expires; |
1250 | DEFINE_WAIT(wait); | 1240 | DEFINE_WAIT(wait); |
1251 | 1241 | ||
1252 | prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); | 1242 | prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); |
1253 | 1243 | ||
1254 | /* Sleep for 100 us, and hope lock-holder got scheduled */ | 1244 | /* Sleep for 100 us, and hope lock-holder got scheduled */ |
1255 | expires = ktime_add_ns(ktime_get(), 100000UL); | 1245 | expires = ktime_add_ns(ktime_get(), 100000UL); |
1256 | schedule_hrtimeout(&expires, HRTIMER_MODE_ABS); | 1246 | schedule_hrtimeout(&expires, HRTIMER_MODE_ABS); |
1257 | 1247 | ||
1258 | finish_wait(&vcpu->wq, &wait); | 1248 | finish_wait(&vcpu->wq, &wait); |
1259 | } | 1249 | } |
1260 | EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin); | 1250 | EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin); |
1261 | 1251 | ||
1262 | static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf) | 1252 | static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf) |
1263 | { | 1253 | { |
1264 | struct kvm_vcpu *vcpu = vma->vm_file->private_data; | 1254 | struct kvm_vcpu *vcpu = vma->vm_file->private_data; |
1265 | struct page *page; | 1255 | struct page *page; |
1266 | 1256 | ||
1267 | if (vmf->pgoff == 0) | 1257 | if (vmf->pgoff == 0) |
1268 | page = virt_to_page(vcpu->run); | 1258 | page = virt_to_page(vcpu->run); |
1269 | #ifdef CONFIG_X86 | 1259 | #ifdef CONFIG_X86 |
1270 | else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET) | 1260 | else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET) |
1271 | page = virt_to_page(vcpu->arch.pio_data); | 1261 | page = virt_to_page(vcpu->arch.pio_data); |
1272 | #endif | 1262 | #endif |
1273 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET | 1263 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET |
1274 | else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET) | 1264 | else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET) |
1275 | page = virt_to_page(vcpu->kvm->coalesced_mmio_ring); | 1265 | page = virt_to_page(vcpu->kvm->coalesced_mmio_ring); |
1276 | #endif | 1266 | #endif |
1277 | else | 1267 | else |
1278 | return VM_FAULT_SIGBUS; | 1268 | return VM_FAULT_SIGBUS; |
1279 | get_page(page); | 1269 | get_page(page); |
1280 | vmf->page = page; | 1270 | vmf->page = page; |
1281 | return 0; | 1271 | return 0; |
1282 | } | 1272 | } |
1283 | 1273 | ||
1284 | static const struct vm_operations_struct kvm_vcpu_vm_ops = { | 1274 | static const struct vm_operations_struct kvm_vcpu_vm_ops = { |
1285 | .fault = kvm_vcpu_fault, | 1275 | .fault = kvm_vcpu_fault, |
1286 | }; | 1276 | }; |
1287 | 1277 | ||
1288 | static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma) | 1278 | static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma) |
1289 | { | 1279 | { |
1290 | vma->vm_ops = &kvm_vcpu_vm_ops; | 1280 | vma->vm_ops = &kvm_vcpu_vm_ops; |
1291 | return 0; | 1281 | return 0; |
1292 | } | 1282 | } |
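
kvm_vcpu_fault() above is what backs a userspace mmap of the vcpu fd: page offset 0 is the kvm_run structure, with the pio and coalesced-MMIO pages behind it. A minimal sketch of the userspace side (kvm_fd is the /dev/kvm fd; vcpu_fd is assumed to come from KVM_CREATE_VCPU):

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    static struct kvm_run *map_vcpu_run(int kvm_fd, int vcpu_fd)
    {
        int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
        void *p;

        if (size < 0)
            return NULL;
        /* pgoff 0 is the kvm_run page served by kvm_vcpu_fault() above. */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu_fd, 0);
        return p == MAP_FAILED ? NULL : p;
    }
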
1293 | 1283 | ||
1294 | static int kvm_vcpu_release(struct inode *inode, struct file *filp) | 1284 | static int kvm_vcpu_release(struct inode *inode, struct file *filp) |
1295 | { | 1285 | { |
1296 | struct kvm_vcpu *vcpu = filp->private_data; | 1286 | struct kvm_vcpu *vcpu = filp->private_data; |
1297 | 1287 | ||
1298 | kvm_put_kvm(vcpu->kvm); | 1288 | kvm_put_kvm(vcpu->kvm); |
1299 | return 0; | 1289 | return 0; |
1300 | } | 1290 | } |
1301 | 1291 | ||
1302 | static struct file_operations kvm_vcpu_fops = { | 1292 | static struct file_operations kvm_vcpu_fops = { |
1303 | .release = kvm_vcpu_release, | 1293 | .release = kvm_vcpu_release, |
1304 | .unlocked_ioctl = kvm_vcpu_ioctl, | 1294 | .unlocked_ioctl = kvm_vcpu_ioctl, |
1305 | .compat_ioctl = kvm_vcpu_ioctl, | 1295 | .compat_ioctl = kvm_vcpu_ioctl, |
1306 | .mmap = kvm_vcpu_mmap, | 1296 | .mmap = kvm_vcpu_mmap, |
1307 | }; | 1297 | }; |
1308 | 1298 | ||
1309 | /* | 1299 | /* |
1310 | * Allocates an inode for the vcpu. | 1300 | * Allocates an inode for the vcpu. |
1311 | */ | 1301 | */ |
1312 | static int create_vcpu_fd(struct kvm_vcpu *vcpu) | 1302 | static int create_vcpu_fd(struct kvm_vcpu *vcpu) |
1313 | { | 1303 | { |
1314 | return anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, O_RDWR); | 1304 | return anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, O_RDWR); |
1315 | } | 1305 | } |
1316 | 1306 | ||
1317 | /* | 1307 | /* |
1318 | * Creates some virtual cpus. Good luck creating more than one. | 1308 | * Creates some virtual cpus. Good luck creating more than one. |
1319 | */ | 1309 | */ |
1320 | static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) | 1310 | static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) |
1321 | { | 1311 | { |
1322 | int r; | 1312 | int r; |
1323 | struct kvm_vcpu *vcpu, *v; | 1313 | struct kvm_vcpu *vcpu, *v; |
1324 | 1314 | ||
1325 | vcpu = kvm_arch_vcpu_create(kvm, id); | 1315 | vcpu = kvm_arch_vcpu_create(kvm, id); |
1326 | if (IS_ERR(vcpu)) | 1316 | if (IS_ERR(vcpu)) |
1327 | return PTR_ERR(vcpu); | 1317 | return PTR_ERR(vcpu); |
1328 | 1318 | ||
1329 | preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops); | 1319 | preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops); |
1330 | 1320 | ||
1331 | r = kvm_arch_vcpu_setup(vcpu); | 1321 | r = kvm_arch_vcpu_setup(vcpu); |
1332 | if (r) | 1322 | if (r) |
1333 | return r; | 1323 | return r; |
1334 | 1324 | ||
1335 | mutex_lock(&kvm->lock); | 1325 | mutex_lock(&kvm->lock); |
1336 | if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { | 1326 | if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { |
1337 | r = -EINVAL; | 1327 | r = -EINVAL; |
1338 | goto vcpu_destroy; | 1328 | goto vcpu_destroy; |
1339 | } | 1329 | } |
1340 | 1330 | ||
1341 | kvm_for_each_vcpu(r, v, kvm) | 1331 | kvm_for_each_vcpu(r, v, kvm) |
1342 | if (v->vcpu_id == id) { | 1332 | if (v->vcpu_id == id) { |
1343 | r = -EEXIST; | 1333 | r = -EEXIST; |
1344 | goto vcpu_destroy; | 1334 | goto vcpu_destroy; |
1345 | } | 1335 | } |
1346 | 1336 | ||
1347 | BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]); | 1337 | BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]); |
1348 | 1338 | ||
1349 | /* Now it's all set up, let userspace reach it */ | 1339 | /* Now it's all set up, let userspace reach it */ |
1350 | kvm_get_kvm(kvm); | 1340 | kvm_get_kvm(kvm); |
1351 | r = create_vcpu_fd(vcpu); | 1341 | r = create_vcpu_fd(vcpu); |
1352 | if (r < 0) { | 1342 | if (r < 0) { |
1353 | kvm_put_kvm(kvm); | 1343 | kvm_put_kvm(kvm); |
1354 | goto vcpu_destroy; | 1344 | goto vcpu_destroy; |
1355 | } | 1345 | } |
1356 | 1346 | ||
1357 | kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu; | 1347 | kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu; |
1358 | smp_wmb(); | 1348 | smp_wmb(); |
1359 | atomic_inc(&kvm->online_vcpus); | 1349 | atomic_inc(&kvm->online_vcpus); |
1360 | 1350 | ||
1361 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE | 1351 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE |
1362 | if (kvm->bsp_vcpu_id == id) | 1352 | if (kvm->bsp_vcpu_id == id) |
1363 | kvm->bsp_vcpu = vcpu; | 1353 | kvm->bsp_vcpu = vcpu; |
1364 | #endif | 1354 | #endif |
1365 | mutex_unlock(&kvm->lock); | 1355 | mutex_unlock(&kvm->lock); |
1366 | return r; | 1356 | return r; |
1367 | 1357 | ||
1368 | vcpu_destroy: | 1358 | vcpu_destroy: |
1369 | mutex_unlock(&kvm->lock); | 1359 | mutex_unlock(&kvm->lock); |
1370 | kvm_arch_vcpu_destroy(vcpu); | 1360 | kvm_arch_vcpu_destroy(vcpu); |
1371 | return r; | 1361 | return r; |
1372 | } | 1362 | } |
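Note the publication order at the end of the success path: the vcpu is stored into kvm->vcpus[] and only then, after an smp_wmb(), is online_vcpus incremented, so lockless kvm_for_each_vcpu() walkers never index past an initialized slot. From userspace the whole function is one ioctl on the VM fd; a sketch assuming vm_fd came from KVM_CREATE_VM:

	/* Create vCPU 0; on success the return value is the new vcpu fd. */
	int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);
	if (vcpu_fd < 0)
		perror("KVM_CREATE_VCPU");	/* -EEXIST: duplicate id;
						   -EINVAL: KVM_MAX_VCPUS hit */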
1373 | 1363 | ||
1374 | static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset) | 1364 | static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset) |
1375 | { | 1365 | { |
1376 | if (sigset) { | 1366 | if (sigset) { |
1377 | sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP)); | 1367 | sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP)); |
1378 | vcpu->sigset_active = 1; | 1368 | vcpu->sigset_active = 1; |
1379 | vcpu->sigset = *sigset; | 1369 | vcpu->sigset = *sigset; |
1380 | } else | 1370 | } else |
1381 | vcpu->sigset_active = 0; | 1371 | vcpu->sigset_active = 0; |
1382 | return 0; | 1372 | return 0; |
1383 | } | 1373 | } |
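The mask installed here is applied by the arch code only for the duration of KVM_RUN, so a vcpu thread can keep signals blocked in normal operation yet still have them kick a running guest back out to userspace. One pitfall: the ioctl handler below compares kvm_sigmask.len against the kernel's sizeof(sigset_t), which is 8 bytes on x86-64, not against glibc's 128-byte sigset_t. A hedged userspace sketch under that assumption:

	#include <signal.h>
	#include <stdlib.h>
	#include <string.h>

	sigset_t set;
	struct kvm_signal_mask *mask;

	sigfillset(&set);
	mask = malloc(sizeof(*mask) + 8);	/* kernel sigset is 8 bytes */
	mask->len = 8;
	memcpy(mask->sigset, &set, 8);
	ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, mask);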
1384 | 1374 | ||
1385 | static long kvm_vcpu_ioctl(struct file *filp, | 1375 | static long kvm_vcpu_ioctl(struct file *filp, |
1386 | unsigned int ioctl, unsigned long arg) | 1376 | unsigned int ioctl, unsigned long arg) |
1387 | { | 1377 | { |
1388 | struct kvm_vcpu *vcpu = filp->private_data; | 1378 | struct kvm_vcpu *vcpu = filp->private_data; |
1389 | void __user *argp = (void __user *)arg; | 1379 | void __user *argp = (void __user *)arg; |
1390 | int r; | 1380 | int r; |
1391 | struct kvm_fpu *fpu = NULL; | 1381 | struct kvm_fpu *fpu = NULL; |
1392 | struct kvm_sregs *kvm_sregs = NULL; | 1382 | struct kvm_sregs *kvm_sregs = NULL; |
1393 | 1383 | ||
1394 | if (vcpu->kvm->mm != current->mm) | 1384 | if (vcpu->kvm->mm != current->mm) |
1395 | return -EIO; | 1385 | return -EIO; |
1396 | 1386 | ||
1397 | #if defined(CONFIG_S390) || defined(CONFIG_PPC) | 1387 | #if defined(CONFIG_S390) || defined(CONFIG_PPC) |
1398 | /* | 1388 | /* |
1399 | * Special cases: vcpu ioctls that are asynchronous to vcpu execution, | 1389 | * Special cases: vcpu ioctls that are asynchronous to vcpu execution, |
1400 | * so vcpu_load() would break it. | 1390 | * so vcpu_load() would break it. |
1401 | */ | 1391 | */ |
1402 | if (ioctl == KVM_S390_INTERRUPT || ioctl == KVM_INTERRUPT) | 1392 | if (ioctl == KVM_S390_INTERRUPT || ioctl == KVM_INTERRUPT) |
1403 | return kvm_arch_vcpu_ioctl(filp, ioctl, arg); | 1393 | return kvm_arch_vcpu_ioctl(filp, ioctl, arg); |
1404 | #endif | 1394 | #endif |
1405 | 1395 | ||
1406 | 1396 | ||
1407 | vcpu_load(vcpu); | 1397 | vcpu_load(vcpu); |
1408 | switch (ioctl) { | 1398 | switch (ioctl) { |
1409 | case KVM_RUN: | 1399 | case KVM_RUN: |
1410 | r = -EINVAL; | 1400 | r = -EINVAL; |
1411 | if (arg) | 1401 | if (arg) |
1412 | goto out; | 1402 | goto out; |
1413 | r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run); | 1403 | r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run); |
1414 | break; | 1404 | break; |
1415 | case KVM_GET_REGS: { | 1405 | case KVM_GET_REGS: { |
1416 | struct kvm_regs *kvm_regs; | 1406 | struct kvm_regs *kvm_regs; |
1417 | 1407 | ||
1418 | r = -ENOMEM; | 1408 | r = -ENOMEM; |
1419 | kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL); | 1409 | kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL); |
1420 | if (!kvm_regs) | 1410 | if (!kvm_regs) |
1421 | goto out; | 1411 | goto out; |
1422 | r = kvm_arch_vcpu_ioctl_get_regs(vcpu, kvm_regs); | 1412 | r = kvm_arch_vcpu_ioctl_get_regs(vcpu, kvm_regs); |
1423 | if (r) | 1413 | if (r) |
1424 | goto out_free1; | 1414 | goto out_free1; |
1425 | r = -EFAULT; | 1415 | r = -EFAULT; |
1426 | if (copy_to_user(argp, kvm_regs, sizeof(struct kvm_regs))) | 1416 | if (copy_to_user(argp, kvm_regs, sizeof(struct kvm_regs))) |
1427 | goto out_free1; | 1417 | goto out_free1; |
1428 | r = 0; | 1418 | r = 0; |
1429 | out_free1: | 1419 | out_free1: |
1430 | kfree(kvm_regs); | 1420 | kfree(kvm_regs); |
1431 | break; | 1421 | break; |
1432 | } | 1422 | } |
1433 | case KVM_SET_REGS: { | 1423 | case KVM_SET_REGS: { |
1434 | struct kvm_regs *kvm_regs; | 1424 | struct kvm_regs *kvm_regs; |
1435 | 1425 | ||
1436 | r = -ENOMEM; | 1426 | r = -ENOMEM; |
1437 | kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL); | 1427 | kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL); |
1438 | if (!kvm_regs) | 1428 | if (!kvm_regs) |
1439 | goto out; | 1429 | goto out; |
1440 | r = -EFAULT; | 1430 | r = -EFAULT; |
1441 | if (copy_from_user(kvm_regs, argp, sizeof(struct kvm_regs))) | 1431 | if (copy_from_user(kvm_regs, argp, sizeof(struct kvm_regs))) |
1442 | goto out_free2; | 1432 | goto out_free2; |
1443 | r = kvm_arch_vcpu_ioctl_set_regs(vcpu, kvm_regs); | 1433 | r = kvm_arch_vcpu_ioctl_set_regs(vcpu, kvm_regs); |
1444 | if (r) | 1434 | if (r) |
1445 | goto out_free2; | 1435 | goto out_free2; |
1446 | r = 0; | 1436 | r = 0; |
1447 | out_free2: | 1437 | out_free2: |
1448 | kfree(kvm_regs); | 1438 | kfree(kvm_regs); |
1449 | break; | 1439 | break; |
1450 | } | 1440 | } |
1451 | case KVM_GET_SREGS: { | 1441 | case KVM_GET_SREGS: { |
1452 | kvm_sregs = kzalloc(sizeof(struct kvm_sregs), GFP_KERNEL); | 1442 | kvm_sregs = kzalloc(sizeof(struct kvm_sregs), GFP_KERNEL); |
1453 | r = -ENOMEM; | 1443 | r = -ENOMEM; |
1454 | if (!kvm_sregs) | 1444 | if (!kvm_sregs) |
1455 | goto out; | 1445 | goto out; |
1456 | r = kvm_arch_vcpu_ioctl_get_sregs(vcpu, kvm_sregs); | 1446 | r = kvm_arch_vcpu_ioctl_get_sregs(vcpu, kvm_sregs); |
1457 | if (r) | 1447 | if (r) |
1458 | goto out; | 1448 | goto out; |
1459 | r = -EFAULT; | 1449 | r = -EFAULT; |
1460 | if (copy_to_user(argp, kvm_sregs, sizeof(struct kvm_sregs))) | 1450 | if (copy_to_user(argp, kvm_sregs, sizeof(struct kvm_sregs))) |
1461 | goto out; | 1451 | goto out; |
1462 | r = 0; | 1452 | r = 0; |
1463 | break; | 1453 | break; |
1464 | } | 1454 | } |
1465 | case KVM_SET_SREGS: { | 1455 | case KVM_SET_SREGS: { |
1466 | kvm_sregs = kmalloc(sizeof(struct kvm_sregs), GFP_KERNEL); | 1456 | kvm_sregs = kmalloc(sizeof(struct kvm_sregs), GFP_KERNEL); |
1467 | r = -ENOMEM; | 1457 | r = -ENOMEM; |
1468 | if (!kvm_sregs) | 1458 | if (!kvm_sregs) |
1469 | goto out; | 1459 | goto out; |
1470 | r = -EFAULT; | 1460 | r = -EFAULT; |
1471 | if (copy_from_user(kvm_sregs, argp, sizeof(struct kvm_sregs))) | 1461 | if (copy_from_user(kvm_sregs, argp, sizeof(struct kvm_sregs))) |
1472 | goto out; | 1462 | goto out; |
1473 | r = kvm_arch_vcpu_ioctl_set_sregs(vcpu, kvm_sregs); | 1463 | r = kvm_arch_vcpu_ioctl_set_sregs(vcpu, kvm_sregs); |
1474 | if (r) | 1464 | if (r) |
1475 | goto out; | 1465 | goto out; |
1476 | r = 0; | 1466 | r = 0; |
1477 | break; | 1467 | break; |
1478 | } | 1468 | } |
1479 | case KVM_GET_MP_STATE: { | 1469 | case KVM_GET_MP_STATE: { |
1480 | struct kvm_mp_state mp_state; | 1470 | struct kvm_mp_state mp_state; |
1481 | 1471 | ||
1482 | r = kvm_arch_vcpu_ioctl_get_mpstate(vcpu, &mp_state); | 1472 | r = kvm_arch_vcpu_ioctl_get_mpstate(vcpu, &mp_state); |
1483 | if (r) | 1473 | if (r) |
1484 | goto out; | 1474 | goto out; |
1485 | r = -EFAULT; | 1475 | r = -EFAULT; |
1486 | if (copy_to_user(argp, &mp_state, sizeof mp_state)) | 1476 | if (copy_to_user(argp, &mp_state, sizeof mp_state)) |
1487 | goto out; | 1477 | goto out; |
1488 | r = 0; | 1478 | r = 0; |
1489 | break; | 1479 | break; |
1490 | } | 1480 | } |
1491 | case KVM_SET_MP_STATE: { | 1481 | case KVM_SET_MP_STATE: { |
1492 | struct kvm_mp_state mp_state; | 1482 | struct kvm_mp_state mp_state; |
1493 | 1483 | ||
1494 | r = -EFAULT; | 1484 | r = -EFAULT; |
1495 | if (copy_from_user(&mp_state, argp, sizeof mp_state)) | 1485 | if (copy_from_user(&mp_state, argp, sizeof mp_state)) |
1496 | goto out; | 1486 | goto out; |
1497 | r = kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state); | 1487 | r = kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state); |
1498 | if (r) | 1488 | if (r) |
1499 | goto out; | 1489 | goto out; |
1500 | r = 0; | 1490 | r = 0; |
1501 | break; | 1491 | break; |
1502 | } | 1492 | } |
1503 | case KVM_TRANSLATE: { | 1493 | case KVM_TRANSLATE: { |
1504 | struct kvm_translation tr; | 1494 | struct kvm_translation tr; |
1505 | 1495 | ||
1506 | r = -EFAULT; | 1496 | r = -EFAULT; |
1507 | if (copy_from_user(&tr, argp, sizeof tr)) | 1497 | if (copy_from_user(&tr, argp, sizeof tr)) |
1508 | goto out; | 1498 | goto out; |
1509 | r = kvm_arch_vcpu_ioctl_translate(vcpu, &tr); | 1499 | r = kvm_arch_vcpu_ioctl_translate(vcpu, &tr); |
1510 | if (r) | 1500 | if (r) |
1511 | goto out; | 1501 | goto out; |
1512 | r = -EFAULT; | 1502 | r = -EFAULT; |
1513 | if (copy_to_user(argp, &tr, sizeof tr)) | 1503 | if (copy_to_user(argp, &tr, sizeof tr)) |
1514 | goto out; | 1504 | goto out; |
1515 | r = 0; | 1505 | r = 0; |
1516 | break; | 1506 | break; |
1517 | } | 1507 | } |
1518 | case KVM_SET_GUEST_DEBUG: { | 1508 | case KVM_SET_GUEST_DEBUG: { |
1519 | struct kvm_guest_debug dbg; | 1509 | struct kvm_guest_debug dbg; |
1520 | 1510 | ||
1521 | r = -EFAULT; | 1511 | r = -EFAULT; |
1522 | if (copy_from_user(&dbg, argp, sizeof dbg)) | 1512 | if (copy_from_user(&dbg, argp, sizeof dbg)) |
1523 | goto out; | 1513 | goto out; |
1524 | r = kvm_arch_vcpu_ioctl_set_guest_debug(vcpu, &dbg); | 1514 | r = kvm_arch_vcpu_ioctl_set_guest_debug(vcpu, &dbg); |
1525 | if (r) | 1515 | if (r) |
1526 | goto out; | 1516 | goto out; |
1527 | r = 0; | 1517 | r = 0; |
1528 | break; | 1518 | break; |
1529 | } | 1519 | } |
1530 | case KVM_SET_SIGNAL_MASK: { | 1520 | case KVM_SET_SIGNAL_MASK: { |
1531 | struct kvm_signal_mask __user *sigmask_arg = argp; | 1521 | struct kvm_signal_mask __user *sigmask_arg = argp; |
1532 | struct kvm_signal_mask kvm_sigmask; | 1522 | struct kvm_signal_mask kvm_sigmask; |
1533 | sigset_t sigset, *p; | 1523 | sigset_t sigset, *p; |
1534 | 1524 | ||
1535 | p = NULL; | 1525 | p = NULL; |
1536 | if (argp) { | 1526 | if (argp) { |
1537 | r = -EFAULT; | 1527 | r = -EFAULT; |
1538 | if (copy_from_user(&kvm_sigmask, argp, | 1528 | if (copy_from_user(&kvm_sigmask, argp, |
1539 | sizeof kvm_sigmask)) | 1529 | sizeof kvm_sigmask)) |
1540 | goto out; | 1530 | goto out; |
1541 | r = -EINVAL; | 1531 | r = -EINVAL; |
1542 | if (kvm_sigmask.len != sizeof sigset) | 1532 | if (kvm_sigmask.len != sizeof sigset) |
1543 | goto out; | 1533 | goto out; |
1544 | r = -EFAULT; | 1534 | r = -EFAULT; |
1545 | if (copy_from_user(&sigset, sigmask_arg->sigset, | 1535 | if (copy_from_user(&sigset, sigmask_arg->sigset, |
1546 | sizeof sigset)) | 1536 | sizeof sigset)) |
1547 | goto out; | 1537 | goto out; |
1548 | p = &sigset; | 1538 | p = &sigset; |
1549 | } | 1539 | } |
1550 | r = kvm_vcpu_ioctl_set_sigmask(vcpu, p); | 1540 | r = kvm_vcpu_ioctl_set_sigmask(vcpu, p); |
1551 | break; | 1541 | break; |
1552 | } | 1542 | } |
1553 | case KVM_GET_FPU: { | 1543 | case KVM_GET_FPU: { |
1554 | fpu = kzalloc(sizeof(struct kvm_fpu), GFP_KERNEL); | 1544 | fpu = kzalloc(sizeof(struct kvm_fpu), GFP_KERNEL); |
1555 | r = -ENOMEM; | 1545 | r = -ENOMEM; |
1556 | if (!fpu) | 1546 | if (!fpu) |
1557 | goto out; | 1547 | goto out; |
1558 | r = kvm_arch_vcpu_ioctl_get_fpu(vcpu, fpu); | 1548 | r = kvm_arch_vcpu_ioctl_get_fpu(vcpu, fpu); |
1559 | if (r) | 1549 | if (r) |
1560 | goto out; | 1550 | goto out; |
1561 | r = -EFAULT; | 1551 | r = -EFAULT; |
1562 | if (copy_to_user(argp, fpu, sizeof(struct kvm_fpu))) | 1552 | if (copy_to_user(argp, fpu, sizeof(struct kvm_fpu))) |
1563 | goto out; | 1553 | goto out; |
1564 | r = 0; | 1554 | r = 0; |
1565 | break; | 1555 | break; |
1566 | } | 1556 | } |
1567 | case KVM_SET_FPU: { | 1557 | case KVM_SET_FPU: { |
1568 | fpu = kmalloc(sizeof(struct kvm_fpu), GFP_KERNEL); | 1558 | fpu = kmalloc(sizeof(struct kvm_fpu), GFP_KERNEL); |
1569 | r = -ENOMEM; | 1559 | r = -ENOMEM; |
1570 | if (!fpu) | 1560 | if (!fpu) |
1571 | goto out; | 1561 | goto out; |
1572 | r = -EFAULT; | 1562 | r = -EFAULT; |
1573 | if (copy_from_user(fpu, argp, sizeof(struct kvm_fpu))) | 1563 | if (copy_from_user(fpu, argp, sizeof(struct kvm_fpu))) |
1574 | goto out; | 1564 | goto out; |
1575 | r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu); | 1565 | r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu); |
1576 | if (r) | 1566 | if (r) |
1577 | goto out; | 1567 | goto out; |
1578 | r = 0; | 1568 | r = 0; |
1579 | break; | 1569 | break; |
1580 | } | 1570 | } |
1581 | default: | 1571 | default: |
1582 | r = kvm_arch_vcpu_ioctl(filp, ioctl, arg); | 1572 | r = kvm_arch_vcpu_ioctl(filp, ioctl, arg); |
1583 | } | 1573 | } |
1584 | out: | 1574 | out: |
1585 | vcpu_put(vcpu); | 1575 | vcpu_put(vcpu); |
1586 | kfree(fpu); | 1576 | kfree(fpu); |
1587 | kfree(kvm_sregs); | 1577 | kfree(kvm_sregs); |
1588 | return r; | 1578 | return r; |
1589 | } | 1579 | } |
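Every payload-carrying case above follows the same shape, copy_from_user()/copy_to_user() around an arch callback, and the two heap-allocated buffers (kvm_sregs, fpu) are freed unconditionally on the shared out: path, which is why they are declared up front and initialized to NULL. From userspace each case is a plain ioctl on the vcpu fd; for example, dumping the general-purpose registers on x86:

	#include <stdio.h>
	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	struct kvm_regs regs;
	if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) == 0)
		printf("rip = 0x%llx\n", (unsigned long long)regs.rip);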
1590 | 1580 | ||
1591 | static long kvm_vm_ioctl(struct file *filp, | 1581 | static long kvm_vm_ioctl(struct file *filp, |
1592 | unsigned int ioctl, unsigned long arg) | 1582 | unsigned int ioctl, unsigned long arg) |
1593 | { | 1583 | { |
1594 | struct kvm *kvm = filp->private_data; | 1584 | struct kvm *kvm = filp->private_data; |
1595 | void __user *argp = (void __user *)arg; | 1585 | void __user *argp = (void __user *)arg; |
1596 | int r; | 1586 | int r; |
1597 | 1587 | ||
1598 | if (kvm->mm != current->mm) | 1588 | if (kvm->mm != current->mm) |
1599 | return -EIO; | 1589 | return -EIO; |
1600 | switch (ioctl) { | 1590 | switch (ioctl) { |
1601 | case KVM_CREATE_VCPU: | 1591 | case KVM_CREATE_VCPU: |
1602 | r = kvm_vm_ioctl_create_vcpu(kvm, arg); | 1592 | r = kvm_vm_ioctl_create_vcpu(kvm, arg); |
1603 | if (r < 0) | 1593 | if (r < 0) |
1604 | goto out; | 1594 | goto out; |
1605 | break; | 1595 | break; |
1606 | case KVM_SET_USER_MEMORY_REGION: { | 1596 | case KVM_SET_USER_MEMORY_REGION: { |
1607 | struct kvm_userspace_memory_region kvm_userspace_mem; | 1597 | struct kvm_userspace_memory_region kvm_userspace_mem; |
1608 | 1598 | ||
1609 | r = -EFAULT; | 1599 | r = -EFAULT; |
1610 | if (copy_from_user(&kvm_userspace_mem, argp, | 1600 | if (copy_from_user(&kvm_userspace_mem, argp, |
1611 | sizeof kvm_userspace_mem)) | 1601 | sizeof kvm_userspace_mem)) |
1612 | goto out; | 1602 | goto out; |
1613 | 1603 | ||
1614 | r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 1); | 1604 | r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem, 1); |
1615 | if (r) | 1605 | if (r) |
1616 | goto out; | 1606 | goto out; |
1617 | break; | 1607 | break; |
1618 | } | 1608 | } |
1619 | case KVM_GET_DIRTY_LOG: { | 1609 | case KVM_GET_DIRTY_LOG: { |
1620 | struct kvm_dirty_log log; | 1610 | struct kvm_dirty_log log; |
1621 | 1611 | ||
1622 | r = -EFAULT; | 1612 | r = -EFAULT; |
1623 | if (copy_from_user(&log, argp, sizeof log)) | 1613 | if (copy_from_user(&log, argp, sizeof log)) |
1624 | goto out; | 1614 | goto out; |
1625 | r = kvm_vm_ioctl_get_dirty_log(kvm, &log); | 1615 | r = kvm_vm_ioctl_get_dirty_log(kvm, &log); |
1626 | if (r) | 1616 | if (r) |
1627 | goto out; | 1617 | goto out; |
1628 | break; | 1618 | break; |
1629 | } | 1619 | } |
1630 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET | 1620 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET |
1631 | case KVM_REGISTER_COALESCED_MMIO: { | 1621 | case KVM_REGISTER_COALESCED_MMIO: { |
1632 | struct kvm_coalesced_mmio_zone zone; | 1622 | struct kvm_coalesced_mmio_zone zone; |
1633 | r = -EFAULT; | 1623 | r = -EFAULT; |
1634 | if (copy_from_user(&zone, argp, sizeof zone)) | 1624 | if (copy_from_user(&zone, argp, sizeof zone)) |
1635 | goto out; | 1625 | goto out; |
1636 | r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone); | 1626 | r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone); |
1637 | if (r) | 1627 | if (r) |
1638 | goto out; | 1628 | goto out; |
1639 | r = 0; | 1629 | r = 0; |
1640 | break; | 1630 | break; |
1641 | } | 1631 | } |
1642 | case KVM_UNREGISTER_COALESCED_MMIO: { | 1632 | case KVM_UNREGISTER_COALESCED_MMIO: { |
1643 | struct kvm_coalesced_mmio_zone zone; | 1633 | struct kvm_coalesced_mmio_zone zone; |
1644 | r = -EFAULT; | 1634 | r = -EFAULT; |
1645 | if (copy_from_user(&zone, argp, sizeof zone)) | 1635 | if (copy_from_user(&zone, argp, sizeof zone)) |
1646 | goto out; | 1636 | goto out; |
1647 | r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone); | 1637 | r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone); |
1648 | if (r) | 1638 | if (r) |
1649 | goto out; | 1639 | goto out; |
1650 | r = 0; | 1640 | r = 0; |
1651 | break; | 1641 | break; |
1652 | } | 1642 | } |
1653 | #endif | 1643 | #endif |
1654 | case KVM_IRQFD: { | 1644 | case KVM_IRQFD: { |
1655 | struct kvm_irqfd data; | 1645 | struct kvm_irqfd data; |
1656 | 1646 | ||
1657 | r = -EFAULT; | 1647 | r = -EFAULT; |
1658 | if (copy_from_user(&data, argp, sizeof data)) | 1648 | if (copy_from_user(&data, argp, sizeof data)) |
1659 | goto out; | 1649 | goto out; |
1660 | r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags); | 1650 | r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags); |
1661 | break; | 1651 | break; |
1662 | } | 1652 | } |
1663 | case KVM_IOEVENTFD: { | 1653 | case KVM_IOEVENTFD: { |
1664 | struct kvm_ioeventfd data; | 1654 | struct kvm_ioeventfd data; |
1665 | 1655 | ||
1666 | r = -EFAULT; | 1656 | r = -EFAULT; |
1667 | if (copy_from_user(&data, argp, sizeof data)) | 1657 | if (copy_from_user(&data, argp, sizeof data)) |
1668 | goto out; | 1658 | goto out; |
1669 | r = kvm_ioeventfd(kvm, &data); | 1659 | r = kvm_ioeventfd(kvm, &data); |
1670 | break; | 1660 | break; |
1671 | } | 1661 | } |
1672 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE | 1662 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE |
1673 | case KVM_SET_BOOT_CPU_ID: | 1663 | case KVM_SET_BOOT_CPU_ID: |
1674 | r = 0; | 1664 | r = 0; |
1675 | mutex_lock(&kvm->lock); | 1665 | mutex_lock(&kvm->lock); |
1676 | if (atomic_read(&kvm->online_vcpus) != 0) | 1666 | if (atomic_read(&kvm->online_vcpus) != 0) |
1677 | r = -EBUSY; | 1667 | r = -EBUSY; |
1678 | else | 1668 | else |
1679 | kvm->bsp_vcpu_id = arg; | 1669 | kvm->bsp_vcpu_id = arg; |
1680 | mutex_unlock(&kvm->lock); | 1670 | mutex_unlock(&kvm->lock); |
1681 | break; | 1671 | break; |
1682 | #endif | 1672 | #endif |
1683 | default: | 1673 | default: |
1684 | r = kvm_arch_vm_ioctl(filp, ioctl, arg); | 1674 | r = kvm_arch_vm_ioctl(filp, ioctl, arg); |
1685 | if (r == -ENOTTY) | 1675 | if (r == -ENOTTY) |
1686 | r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg); | 1676 | r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg); |
1687 | } | 1677 | } |
1688 | out: | 1678 | out: |
1689 | return r; | 1679 | return r; |
1690 | } | 1680 | } |
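KVM_SET_USER_MEMORY_REGION is the ioctl that, per this commit, now also covers what memory aliases used to do: an alias is simply a second slot at a different guest physical address whose userspace_addr points into the same host memory. A sketch that backs 1 MiB of guest physical memory at GPA 0 with anonymous host memory, assuming vm_fd from KVM_CREATE_VM:

	void *mem = mmap(NULL, 0x100000, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct kvm_userspace_memory_region region = {
		.slot            = 0,
		.guest_phys_addr = 0,
		.memory_size     = 0x100000,
		.userspace_addr  = (__u64)(unsigned long)mem,
	};
	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);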
1691 | 1681 | ||
1692 | #ifdef CONFIG_COMPAT | 1682 | #ifdef CONFIG_COMPAT |
1693 | struct compat_kvm_dirty_log { | 1683 | struct compat_kvm_dirty_log { |
1694 | __u32 slot; | 1684 | __u32 slot; |
1695 | __u32 padding1; | 1685 | __u32 padding1; |
1696 | union { | 1686 | union { |
1697 | compat_uptr_t dirty_bitmap; /* one bit per page */ | 1687 | compat_uptr_t dirty_bitmap; /* one bit per page */ |
1698 | __u64 padding2; | 1688 | __u64 padding2; |
1699 | }; | 1689 | }; |
1700 | }; | 1690 | }; |
1701 | 1691 | ||
1702 | static long kvm_vm_compat_ioctl(struct file *filp, | 1692 | static long kvm_vm_compat_ioctl(struct file *filp, |
1703 | unsigned int ioctl, unsigned long arg) | 1693 | unsigned int ioctl, unsigned long arg) |
1704 | { | 1694 | { |
1705 | struct kvm *kvm = filp->private_data; | 1695 | struct kvm *kvm = filp->private_data; |
1706 | int r; | 1696 | int r; |
1707 | 1697 | ||
1708 | if (kvm->mm != current->mm) | 1698 | if (kvm->mm != current->mm) |
1709 | return -EIO; | 1699 | return -EIO; |
1710 | switch (ioctl) { | 1700 | switch (ioctl) { |
1711 | case KVM_GET_DIRTY_LOG: { | 1701 | case KVM_GET_DIRTY_LOG: { |
1712 | struct compat_kvm_dirty_log compat_log; | 1702 | struct compat_kvm_dirty_log compat_log; |
1713 | struct kvm_dirty_log log; | 1703 | struct kvm_dirty_log log; |
1714 | 1704 | ||
1715 | r = -EFAULT; | 1705 | r = -EFAULT; |
1716 | if (copy_from_user(&compat_log, (void __user *)arg, | 1706 | if (copy_from_user(&compat_log, (void __user *)arg, |
1717 | sizeof(compat_log))) | 1707 | sizeof(compat_log))) |
1718 | goto out; | 1708 | goto out; |
1719 | log.slot = compat_log.slot; | 1709 | log.slot = compat_log.slot; |
1720 | log.padding1 = compat_log.padding1; | 1710 | log.padding1 = compat_log.padding1; |
1721 | log.padding2 = compat_log.padding2; | 1711 | log.padding2 = compat_log.padding2; |
1722 | log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap); | 1712 | log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap); |
1723 | 1713 | ||
1724 | r = kvm_vm_ioctl_get_dirty_log(kvm, &log); | 1714 | r = kvm_vm_ioctl_get_dirty_log(kvm, &log); |
1725 | if (r) | 1715 | if (r) |
1726 | goto out; | 1716 | goto out; |
1727 | break; | 1717 | break; |
1728 | } | 1718 | } |
1729 | default: | 1719 | default: |
1730 | r = kvm_vm_ioctl(filp, ioctl, arg); | 1720 | r = kvm_vm_ioctl(filp, ioctl, arg); |
1731 | } | 1721 | } |
1732 | 1722 | ||
1733 | out: | 1723 | out: |
1734 | return r; | 1724 | return r; |
1735 | } | 1725 | } |
1736 | #endif | 1726 | #endif |
1737 | 1727 | ||
1738 | static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) | 1728 | static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) |
1739 | { | 1729 | { |
1740 | struct page *page[1]; | 1730 | struct page *page[1]; |
1741 | unsigned long addr; | 1731 | unsigned long addr; |
1742 | int npages; | 1732 | int npages; |
1743 | gfn_t gfn = vmf->pgoff; | 1733 | gfn_t gfn = vmf->pgoff; |
1744 | struct kvm *kvm = vma->vm_file->private_data; | 1734 | struct kvm *kvm = vma->vm_file->private_data; |
1745 | 1735 | ||
1746 | addr = gfn_to_hva(kvm, gfn); | 1736 | addr = gfn_to_hva(kvm, gfn); |
1747 | if (kvm_is_error_hva(addr)) | 1737 | if (kvm_is_error_hva(addr)) |
1748 | return VM_FAULT_SIGBUS; | 1738 | return VM_FAULT_SIGBUS; |
1749 | 1739 | ||
1750 | npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page, | 1740 | npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page, |
1751 | NULL); | 1741 | NULL); |
1752 | if (unlikely(npages != 1)) | 1742 | if (unlikely(npages != 1)) |
1753 | return VM_FAULT_SIGBUS; | 1743 | return VM_FAULT_SIGBUS; |
1754 | 1744 | ||
1755 | vmf->page = page[0]; | 1745 | vmf->page = page[0]; |
1756 | return 0; | 1746 | return 0; |
1757 | } | 1747 | } |
1758 | 1748 | ||
1759 | static const struct vm_operations_struct kvm_vm_vm_ops = { | 1749 | static const struct vm_operations_struct kvm_vm_vm_ops = { |
1760 | .fault = kvm_vm_fault, | 1750 | .fault = kvm_vm_fault, |
1761 | }; | 1751 | }; |
1762 | 1752 | ||
1763 | static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma) | 1753 | static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma) |
1764 | { | 1754 | { |
1765 | vma->vm_ops = &kvm_vm_vm_ops; | 1755 | vma->vm_ops = &kvm_vm_vm_ops; |
1766 | return 0; | 1756 | return 0; |
1767 | } | 1757 | } |
1768 | 1758 | ||
1769 | static struct file_operations kvm_vm_fops = { | 1759 | static struct file_operations kvm_vm_fops = { |
1770 | .release = kvm_vm_release, | 1760 | .release = kvm_vm_release, |
1771 | .unlocked_ioctl = kvm_vm_ioctl, | 1761 | .unlocked_ioctl = kvm_vm_ioctl, |
1772 | #ifdef CONFIG_COMPAT | 1762 | #ifdef CONFIG_COMPAT |
1773 | .compat_ioctl = kvm_vm_compat_ioctl, | 1763 | .compat_ioctl = kvm_vm_compat_ioctl, |
1774 | #endif | 1764 | #endif |
1775 | .mmap = kvm_vm_mmap, | 1765 | .mmap = kvm_vm_mmap, |
1776 | }; | 1766 | }; |
1777 | 1767 | ||
1778 | static int kvm_dev_ioctl_create_vm(void) | 1768 | static int kvm_dev_ioctl_create_vm(void) |
1779 | { | 1769 | { |
1780 | int fd, r; | 1770 | int fd, r; |
1781 | struct kvm *kvm; | 1771 | struct kvm *kvm; |
1782 | 1772 | ||
1783 | kvm = kvm_create_vm(); | 1773 | kvm = kvm_create_vm(); |
1784 | if (IS_ERR(kvm)) | 1774 | if (IS_ERR(kvm)) |
1785 | return PTR_ERR(kvm); | 1775 | return PTR_ERR(kvm); |
1786 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET | 1776 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET |
1787 | r = kvm_coalesced_mmio_init(kvm); | 1777 | r = kvm_coalesced_mmio_init(kvm); |
1788 | if (r < 0) { | 1778 | if (r < 0) { |
1789 | kvm_put_kvm(kvm); | 1779 | kvm_put_kvm(kvm); |
1790 | return r; | 1780 | return r; |
1791 | } | 1781 | } |
1792 | #endif | 1782 | #endif |
1793 | fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR); | 1783 | fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR); |
1794 | if (fd < 0) | 1784 | if (fd < 0) |
1795 | kvm_put_kvm(kvm); | 1785 | kvm_put_kvm(kvm); |
1796 | 1786 | ||
1797 | return fd; | 1787 | return fd; |
1798 | } | 1788 | } |
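Note the reference handoff: kvm_create_vm() returns with one reference held; if anon_inode_getfd() fails, that reference is dropped here, otherwise the new fd owns it and kvm_vm_release() drops it on close. The usual userspace bootstrap that reaches this function:

	#include <fcntl.h>
	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	int kvm_fd = open("/dev/kvm", O_RDWR);
	if (ioctl(kvm_fd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION)
		return -1;			/* ABI mismatch */
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);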
1799 | 1789 | ||
1800 | static long kvm_dev_ioctl_check_extension_generic(long arg) | 1790 | static long kvm_dev_ioctl_check_extension_generic(long arg) |
1801 | { | 1791 | { |
1802 | switch (arg) { | 1792 | switch (arg) { |
1803 | case KVM_CAP_USER_MEMORY: | 1793 | case KVM_CAP_USER_MEMORY: |
1804 | case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: | 1794 | case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: |
1805 | case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: | 1795 | case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: |
1806 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE | 1796 | #ifdef CONFIG_KVM_APIC_ARCHITECTURE |
1807 | case KVM_CAP_SET_BOOT_CPU_ID: | 1797 | case KVM_CAP_SET_BOOT_CPU_ID: |
1808 | #endif | 1798 | #endif |
1809 | case KVM_CAP_INTERNAL_ERROR_DATA: | 1799 | case KVM_CAP_INTERNAL_ERROR_DATA: |
1810 | return 1; | 1800 | return 1; |
1811 | #ifdef CONFIG_HAVE_KVM_IRQCHIP | 1801 | #ifdef CONFIG_HAVE_KVM_IRQCHIP |
1812 | case KVM_CAP_IRQ_ROUTING: | 1802 | case KVM_CAP_IRQ_ROUTING: |
1813 | return KVM_MAX_IRQ_ROUTES; | 1803 | return KVM_MAX_IRQ_ROUTES; |
1814 | #endif | 1804 | #endif |
1815 | default: | 1805 | default: |
1816 | break; | 1806 | break; |
1817 | } | 1807 | } |
1818 | return kvm_dev_ioctl_check_extension(arg); | 1808 | return kvm_dev_ioctl_check_extension(arg); |
1819 | } | 1809 | } |
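Capabilities handled generically return 1 (or, for KVM_CAP_IRQ_ROUTING, the size of the route table); everything else is forwarded to the architecture. Userspace is expected to probe before relying on a feature:

	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) <= 0)
		return -1;	/* no user memory slots: unusable kernel */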
1820 | 1810 | ||
1821 | static long kvm_dev_ioctl(struct file *filp, | 1811 | static long kvm_dev_ioctl(struct file *filp, |
1822 | unsigned int ioctl, unsigned long arg) | 1812 | unsigned int ioctl, unsigned long arg) |
1823 | { | 1813 | { |
1824 | long r = -EINVAL; | 1814 | long r = -EINVAL; |
1825 | 1815 | ||
1826 | switch (ioctl) { | 1816 | switch (ioctl) { |
1827 | case KVM_GET_API_VERSION: | 1817 | case KVM_GET_API_VERSION: |
1828 | r = -EINVAL; | 1818 | r = -EINVAL; |
1829 | if (arg) | 1819 | if (arg) |
1830 | goto out; | 1820 | goto out; |
1831 | r = KVM_API_VERSION; | 1821 | r = KVM_API_VERSION; |
1832 | break; | 1822 | break; |
1833 | case KVM_CREATE_VM: | 1823 | case KVM_CREATE_VM: |
1834 | r = -EINVAL; | 1824 | r = -EINVAL; |
1835 | if (arg) | 1825 | if (arg) |
1836 | goto out; | 1826 | goto out; |
1837 | r = kvm_dev_ioctl_create_vm(); | 1827 | r = kvm_dev_ioctl_create_vm(); |
1838 | break; | 1828 | break; |
1839 | case KVM_CHECK_EXTENSION: | 1829 | case KVM_CHECK_EXTENSION: |
1840 | r = kvm_dev_ioctl_check_extension_generic(arg); | 1830 | r = kvm_dev_ioctl_check_extension_generic(arg); |
1841 | break; | 1831 | break; |
1842 | case KVM_GET_VCPU_MMAP_SIZE: | 1832 | case KVM_GET_VCPU_MMAP_SIZE: |
1843 | r = -EINVAL; | 1833 | r = -EINVAL; |
1844 | if (arg) | 1834 | if (arg) |
1845 | goto out; | 1835 | goto out; |
1846 | r = PAGE_SIZE; /* struct kvm_run */ | 1836 | r = PAGE_SIZE; /* struct kvm_run */ |
1847 | #ifdef CONFIG_X86 | 1837 | #ifdef CONFIG_X86 |
1848 | r += PAGE_SIZE; /* pio data page */ | 1838 | r += PAGE_SIZE; /* pio data page */ |
1849 | #endif | 1839 | #endif |
1850 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET | 1840 | #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET |
1851 | r += PAGE_SIZE; /* coalesced mmio ring page */ | 1841 | r += PAGE_SIZE; /* coalesced mmio ring page */ |
1852 | #endif | 1842 | #endif |
1853 | break; | 1843 | break; |
1854 | case KVM_TRACE_ENABLE: | 1844 | case KVM_TRACE_ENABLE: |
1855 | case KVM_TRACE_PAUSE: | 1845 | case KVM_TRACE_PAUSE: |
1856 | case KVM_TRACE_DISABLE: | 1846 | case KVM_TRACE_DISABLE: |
1857 | r = -EOPNOTSUPP; | 1847 | r = -EOPNOTSUPP; |
1858 | break; | 1848 | break; |
1859 | default: | 1849 | default: |
1860 | return kvm_arch_dev_ioctl(filp, ioctl, arg); | 1850 | return kvm_arch_dev_ioctl(filp, ioctl, arg); |
1861 | } | 1851 | } |
1862 | out: | 1852 | out: |
1863 | return r; | 1853 | return r; |
1864 | } | 1854 | } |
1865 | 1855 | ||
1866 | static struct file_operations kvm_chardev_ops = { | 1856 | static struct file_operations kvm_chardev_ops = { |
1867 | .unlocked_ioctl = kvm_dev_ioctl, | 1857 | .unlocked_ioctl = kvm_dev_ioctl, |
1868 | .compat_ioctl = kvm_dev_ioctl, | 1858 | .compat_ioctl = kvm_dev_ioctl, |
1869 | }; | 1859 | }; |
1870 | 1860 | ||
1871 | static struct miscdevice kvm_dev = { | 1861 | static struct miscdevice kvm_dev = { |
1872 | KVM_MINOR, | 1862 | KVM_MINOR, |
1873 | "kvm", | 1863 | "kvm", |
1874 | &kvm_chardev_ops, | 1864 | &kvm_chardev_ops, |
1875 | }; | 1865 | }; |
1876 | 1866 | ||
1877 | static void hardware_enable(void *junk) | 1867 | static void hardware_enable(void *junk) |
1878 | { | 1868 | { |
1879 | int cpu = raw_smp_processor_id(); | 1869 | int cpu = raw_smp_processor_id(); |
1880 | int r; | 1870 | int r; |
1881 | 1871 | ||
1882 | if (cpumask_test_cpu(cpu, cpus_hardware_enabled)) | 1872 | if (cpumask_test_cpu(cpu, cpus_hardware_enabled)) |
1883 | return; | 1873 | return; |
1884 | 1874 | ||
1885 | cpumask_set_cpu(cpu, cpus_hardware_enabled); | 1875 | cpumask_set_cpu(cpu, cpus_hardware_enabled); |
1886 | 1876 | ||
1887 | r = kvm_arch_hardware_enable(NULL); | 1877 | r = kvm_arch_hardware_enable(NULL); |
1888 | 1878 | ||
1889 | if (r) { | 1879 | if (r) { |
1890 | cpumask_clear_cpu(cpu, cpus_hardware_enabled); | 1880 | cpumask_clear_cpu(cpu, cpus_hardware_enabled); |
1891 | atomic_inc(&hardware_enable_failed); | 1881 | atomic_inc(&hardware_enable_failed); |
1892 | printk(KERN_INFO "kvm: enabling virtualization on " | 1882 | printk(KERN_INFO "kvm: enabling virtualization on " |
1893 | "CPU%d failed\n", cpu); | 1883 | "CPU%d failed\n", cpu); |
1894 | } | 1884 | } |
1895 | } | 1885 | } |
1896 | 1886 | ||
1897 | static void hardware_disable(void *junk) | 1887 | static void hardware_disable(void *junk) |
1898 | { | 1888 | { |
1899 | int cpu = raw_smp_processor_id(); | 1889 | int cpu = raw_smp_processor_id(); |
1900 | 1890 | ||
1901 | if (!cpumask_test_cpu(cpu, cpus_hardware_enabled)) | 1891 | if (!cpumask_test_cpu(cpu, cpus_hardware_enabled)) |
1902 | return; | 1892 | return; |
1903 | cpumask_clear_cpu(cpu, cpus_hardware_enabled); | 1893 | cpumask_clear_cpu(cpu, cpus_hardware_enabled); |
1904 | kvm_arch_hardware_disable(NULL); | 1894 | kvm_arch_hardware_disable(NULL); |
1905 | } | 1895 | } |
1906 | 1896 | ||
1907 | static void hardware_disable_all_nolock(void) | 1897 | static void hardware_disable_all_nolock(void) |
1908 | { | 1898 | { |
1909 | BUG_ON(!kvm_usage_count); | 1899 | BUG_ON(!kvm_usage_count); |
1910 | 1900 | ||
1911 | kvm_usage_count--; | 1901 | kvm_usage_count--; |
1912 | if (!kvm_usage_count) | 1902 | if (!kvm_usage_count) |
1913 | on_each_cpu(hardware_disable, NULL, 1); | 1903 | on_each_cpu(hardware_disable, NULL, 1); |
1914 | } | 1904 | } |
1915 | 1905 | ||
1916 | static void hardware_disable_all(void) | 1906 | static void hardware_disable_all(void) |
1917 | { | 1907 | { |
1918 | spin_lock(&kvm_lock); | 1908 | spin_lock(&kvm_lock); |
1919 | hardware_disable_all_nolock(); | 1909 | hardware_disable_all_nolock(); |
1920 | spin_unlock(&kvm_lock); | 1910 | spin_unlock(&kvm_lock); |
1921 | } | 1911 | } |
1922 | 1912 | ||
1923 | static int hardware_enable_all(void) | 1913 | static int hardware_enable_all(void) |
1924 | { | 1914 | { |
1925 | int r = 0; | 1915 | int r = 0; |
1926 | 1916 | ||
1927 | spin_lock(&kvm_lock); | 1917 | spin_lock(&kvm_lock); |
1928 | 1918 | ||
1929 | kvm_usage_count++; | 1919 | kvm_usage_count++; |
1930 | if (kvm_usage_count == 1) { | 1920 | if (kvm_usage_count == 1) { |
1931 | atomic_set(&hardware_enable_failed, 0); | 1921 | atomic_set(&hardware_enable_failed, 0); |
1932 | on_each_cpu(hardware_enable, NULL, 1); | 1922 | on_each_cpu(hardware_enable, NULL, 1); |
1933 | 1923 | ||
1934 | if (atomic_read(&hardware_enable_failed)) { | 1924 | if (atomic_read(&hardware_enable_failed)) { |
1935 | hardware_disable_all_nolock(); | 1925 | hardware_disable_all_nolock(); |
1936 | r = -EBUSY; | 1926 | r = -EBUSY; |
1937 | } | 1927 | } |
1938 | } | 1928 | } |
1939 | 1929 | ||
1940 | spin_unlock(&kvm_lock); | 1930 | spin_unlock(&kvm_lock); |
1941 | 1931 | ||
1942 | return r; | 1932 | return r; |
1943 | } | 1933 | } |
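hardware_enable_all()/hardware_disable_all() keep a usage count under kvm_lock so that VMX/SVM is switched on across all CPUs only when the first VM appears and off again when the last one dies; a failure on any CPU during the first enable rolls the whole operation back with -EBUSY. The per-CPU helpers test cpus_hardware_enabled first, which keeps them idempotent when re-driven from the hotplug, suspend/resume, and reboot paths below. The callers are kvm_create_vm() and kvm_destroy_vm(), outside this hunk; the pairing is essentially:

	/* Illustrative pairing; the real call sites are outside this hunk. */
	if (hardware_enable_all())	/* first VM powers on VMX/SVM */
		return ERR_PTR(-EBUSY);
	/* ... VM lives ... */
	hardware_disable_all();		/* last VM powers it off again */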
1944 | 1934 | ||
1945 | static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val, | 1935 | static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val, |
1946 | void *v) | 1936 | void *v) |
1947 | { | 1937 | { |
1948 | int cpu = (long)v; | 1938 | int cpu = (long)v; |
1949 | 1939 | ||
1950 | if (!kvm_usage_count) | 1940 | if (!kvm_usage_count) |
1951 | return NOTIFY_OK; | 1941 | return NOTIFY_OK; |
1952 | 1942 | ||
1953 | val &= ~CPU_TASKS_FROZEN; | 1943 | val &= ~CPU_TASKS_FROZEN; |
1954 | switch (val) { | 1944 | switch (val) { |
1955 | case CPU_DYING: | 1945 | case CPU_DYING: |
1956 | printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n", | 1946 | printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n", |
1957 | cpu); | 1947 | cpu); |
1958 | hardware_disable(NULL); | 1948 | hardware_disable(NULL); |
1959 | break; | 1949 | break; |
1960 | case CPU_ONLINE: | 1950 | case CPU_ONLINE: |
1961 | printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n", | 1951 | printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n", |
1962 | cpu); | 1952 | cpu); |
1963 | smp_call_function_single(cpu, hardware_enable, NULL, 1); | 1953 | smp_call_function_single(cpu, hardware_enable, NULL, 1); |
1964 | break; | 1954 | break; |
1965 | } | 1955 | } |
1966 | return NOTIFY_OK; | 1956 | return NOTIFY_OK; |
1967 | } | 1957 | } |
1968 | 1958 | ||
1969 | 1959 | ||
1970 | asmlinkage void kvm_handle_fault_on_reboot(void) | 1960 | asmlinkage void kvm_handle_fault_on_reboot(void) |
1971 | { | 1961 | { |
1972 | if (kvm_rebooting) | 1962 | if (kvm_rebooting) |
1973 | /* spin while reset goes on */ | 1963 | /* spin while reset goes on */ |
1974 | while (true) | 1964 | while (true) |
1975 | ; | 1965 | ; |
1976 | /* Fault while not rebooting. We want the trace. */ | 1966 | /* Fault while not rebooting. We want the trace. */ |
1977 | BUG(); | 1967 | BUG(); |
1978 | } | 1968 | } |
1979 | EXPORT_SYMBOL_GPL(kvm_handle_fault_on_reboot); | 1969 | EXPORT_SYMBOL_GPL(kvm_handle_fault_on_reboot); |
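kvm_handle_fault_on_reboot() backs the fault-on-reboot trick: once the reboot notifier below has disabled VMX/SVM everywhere, a virtualization instruction still in flight on some vcpu thread will fault; an exception-table fixup on those instructions redirects the fault here, where the thread spins harmlessly until reset (and BUG()s if no reboot is in progress). The redirection is done by an asm wrapper in the arch headers; an illustrative rendering of the idea, not the exact macro:

	/* Run INSN; if it faults, divert to the reboot handler instead. */
	#define __ex(insn)						\
		"666: " insn "\n\t"					\
		".pushsection .fixup, \"ax\"\n"				\
		"667: jmp kvm_handle_fault_on_reboot\n"			\
		".popsection\n"						\
		".pushsection __ex_table, \"a\"\n"			\
		_ASM_PTR " 666b, 667b\n"				\
		".popsection"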
1980 | 1970 | ||
1981 | static int kvm_reboot(struct notifier_block *notifier, unsigned long val, | 1971 | static int kvm_reboot(struct notifier_block *notifier, unsigned long val, |
1982 | void *v) | 1972 | void *v) |
1983 | { | 1973 | { |
1984 | /* | 1974 | /* |
1985 | * Some (well, at least mine) BIOSes hang on reboot if | 1975 | * Some (well, at least mine) BIOSes hang on reboot if |
1986 | * in vmx root mode. | 1976 | * in vmx root mode. |
1987 | * | 1977 | * |
1988 | * And Intel TXT requires VMX to be off on all CPUs at shutdown. | 1978 | * And Intel TXT requires VMX to be off on all CPUs at shutdown. |
1989 | */ | 1979 | */ |
1990 | printk(KERN_INFO "kvm: exiting hardware virtualization\n"); | 1980 | printk(KERN_INFO "kvm: exiting hardware virtualization\n"); |
1991 | kvm_rebooting = true; | 1981 | kvm_rebooting = true; |
1992 | on_each_cpu(hardware_disable, NULL, 1); | 1982 | on_each_cpu(hardware_disable, NULL, 1); |
1993 | return NOTIFY_OK; | 1983 | return NOTIFY_OK; |
1994 | } | 1984 | } |
1995 | 1985 | ||
1996 | static struct notifier_block kvm_reboot_notifier = { | 1986 | static struct notifier_block kvm_reboot_notifier = { |
1997 | .notifier_call = kvm_reboot, | 1987 | .notifier_call = kvm_reboot, |
1998 | .priority = 0, | 1988 | .priority = 0, |
1999 | }; | 1989 | }; |
2000 | 1990 | ||
2001 | static void kvm_io_bus_destroy(struct kvm_io_bus *bus) | 1991 | static void kvm_io_bus_destroy(struct kvm_io_bus *bus) |
2002 | { | 1992 | { |
2003 | int i; | 1993 | int i; |
2004 | 1994 | ||
2005 | for (i = 0; i < bus->dev_count; i++) { | 1995 | for (i = 0; i < bus->dev_count; i++) { |
2006 | struct kvm_io_device *pos = bus->devs[i]; | 1996 | struct kvm_io_device *pos = bus->devs[i]; |
2007 | 1997 | ||
2008 | kvm_iodevice_destructor(pos); | 1998 | kvm_iodevice_destructor(pos); |
2009 | } | 1999 | } |
2010 | kfree(bus); | 2000 | kfree(bus); |
2011 | } | 2001 | } |
2012 | 2002 | ||
2013 | /* kvm_io_bus_write - called under kvm->slots_lock */ | 2003 | /* kvm_io_bus_write - called under kvm->slots_lock */ |
2014 | int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, | 2004 | int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, |
2015 | int len, const void *val) | 2005 | int len, const void *val) |
2016 | { | 2006 | { |
2017 | int i; | 2007 | int i; |
2018 | struct kvm_io_bus *bus; | 2008 | struct kvm_io_bus *bus; |
2019 | 2009 | ||
2020 | bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu); | 2010 | bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu); |
2021 | for (i = 0; i < bus->dev_count; i++) | 2011 | for (i = 0; i < bus->dev_count; i++) |
2022 | if (!kvm_iodevice_write(bus->devs[i], addr, len, val)) | 2012 | if (!kvm_iodevice_write(bus->devs[i], addr, len, val)) |
2023 | return 0; | 2013 | return 0; |
2024 | return -EOPNOTSUPP; | 2014 | return -EOPNOTSUPP; |
2025 | } | 2015 | } |
2026 | 2016 | ||
2027 | /* kvm_io_bus_read - called under kvm->slots_lock */ | 2017 | /* kvm_io_bus_read - called under kvm->slots_lock */ |
2028 | int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, | 2018 | int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, |
2029 | int len, void *val) | 2019 | int len, void *val) |
2030 | { | 2020 | { |
2031 | int i; | 2021 | int i; |
2032 | struct kvm_io_bus *bus; | 2022 | struct kvm_io_bus *bus; |
2033 | 2023 | ||
2034 | bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu); | 2024 | bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu); |
2035 | for (i = 0; i < bus->dev_count; i++) | 2025 | for (i = 0; i < bus->dev_count; i++) |
2036 | if (!kvm_iodevice_read(bus->devs[i], addr, len, val)) | 2026 | if (!kvm_iodevice_read(bus->devs[i], addr, len, val)) |
2037 | return 0; | 2027 | return 0; |
2038 | return -EOPNOTSUPP; | 2028 | return -EOPNOTSUPP; |
2039 | } | 2029 | } |
2040 | 2030 | ||
2041 | /* Caller must hold slots_lock. */ | 2031 | /* Caller must hold slots_lock. */ |
2042 | int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, | 2032 | int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, |
2043 | struct kvm_io_device *dev) | 2033 | struct kvm_io_device *dev) |
2044 | { | 2034 | { |
2045 | struct kvm_io_bus *new_bus, *bus; | 2035 | struct kvm_io_bus *new_bus, *bus; |
2046 | 2036 | ||
2047 | bus = kvm->buses[bus_idx]; | 2037 | bus = kvm->buses[bus_idx]; |
2048 | if (bus->dev_count > NR_IOBUS_DEVS-1) | 2038 | if (bus->dev_count > NR_IOBUS_DEVS-1) |
2049 | return -ENOSPC; | 2039 | return -ENOSPC; |
2050 | 2040 | ||
2051 | new_bus = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL); | 2041 | new_bus = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL); |
2052 | if (!new_bus) | 2042 | if (!new_bus) |
2053 | return -ENOMEM; | 2043 | return -ENOMEM; |
2054 | memcpy(new_bus, bus, sizeof(struct kvm_io_bus)); | 2044 | memcpy(new_bus, bus, sizeof(struct kvm_io_bus)); |
2055 | new_bus->devs[new_bus->dev_count++] = dev; | 2045 | new_bus->devs[new_bus->dev_count++] = dev; |
2056 | rcu_assign_pointer(kvm->buses[bus_idx], new_bus); | 2046 | rcu_assign_pointer(kvm->buses[bus_idx], new_bus); |
2057 | synchronize_srcu_expedited(&kvm->srcu); | 2047 | synchronize_srcu_expedited(&kvm->srcu); |
2058 | kfree(bus); | 2048 | kfree(bus); |
2059 | 2049 | ||
2060 | return 0; | 2050 | return 0; |
2061 | } | 2051 | } |
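Bus updates never touch the array readers see: a copy is allocated, the device is appended to the copy, the pointer is published with rcu_assign_pointer(), and synchronize_srcu_expedited() drains existing readers before the old array is freed. That is why kvm_io_bus_write()/read() above only need srcu_dereference() inside an SRCU read-side section supplied by their callers; the caller-side discipline looks roughly like:

	int idx = srcu_read_lock(&kvm->srcu);
	r = kvm_io_bus_write(kvm, KVM_MMIO_BUS, gpa, len, data);
	srcu_read_unlock(&kvm->srcu, idx);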
2062 | 2052 | ||
2063 | /* Caller must hold slots_lock. */ | 2053 | /* Caller must hold slots_lock. */ |
2064 | int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, | 2054 | int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, |
2065 | struct kvm_io_device *dev) | 2055 | struct kvm_io_device *dev) |
2066 | { | 2056 | { |
2067 | int i, r; | 2057 | int i, r; |
2068 | struct kvm_io_bus *new_bus, *bus; | 2058 | struct kvm_io_bus *new_bus, *bus; |
2069 | 2059 | ||
2070 | new_bus = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL); | 2060 | new_bus = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL); |
2071 | if (!new_bus) | 2061 | if (!new_bus) |
2072 | return -ENOMEM; | 2062 | return -ENOMEM; |
2073 | 2063 | ||
2074 | bus = kvm->buses[bus_idx]; | 2064 | bus = kvm->buses[bus_idx]; |
2075 | memcpy(new_bus, bus, sizeof(struct kvm_io_bus)); | 2065 | memcpy(new_bus, bus, sizeof(struct kvm_io_bus)); |
2076 | 2066 | ||
2077 | r = -ENOENT; | 2067 | r = -ENOENT; |
2078 | for (i = 0; i < new_bus->dev_count; i++) | 2068 | for (i = 0; i < new_bus->dev_count; i++) |
2079 | if (new_bus->devs[i] == dev) { | 2069 | if (new_bus->devs[i] == dev) { |
2080 | r = 0; | 2070 | r = 0; |
2081 | new_bus->devs[i] = new_bus->devs[--new_bus->dev_count]; | 2071 | new_bus->devs[i] = new_bus->devs[--new_bus->dev_count]; |
2082 | break; | 2072 | break; |
2083 | } | 2073 | } |
2084 | 2074 | ||
2085 | if (r) { | 2075 | if (r) { |
2086 | kfree(new_bus); | 2076 | kfree(new_bus); |
2087 | return r; | 2077 | return r; |
2088 | } | 2078 | } |
2089 | 2079 | ||
2090 | rcu_assign_pointer(kvm->buses[bus_idx], new_bus); | 2080 | rcu_assign_pointer(kvm->buses[bus_idx], new_bus); |
2091 | synchronize_srcu_expedited(&kvm->srcu); | 2081 | synchronize_srcu_expedited(&kvm->srcu); |
2092 | kfree(bus); | 2082 | kfree(bus); |
2093 | return r; | 2083 | return r; |
2094 | } | 2084 | } |
2095 | 2085 | ||
2096 | static struct notifier_block kvm_cpu_notifier = { | 2086 | static struct notifier_block kvm_cpu_notifier = { |
2097 | .notifier_call = kvm_cpu_hotplug, | 2087 | .notifier_call = kvm_cpu_hotplug, |
2098 | .priority = 20, /* must be > scheduler priority */ | 2088 | .priority = 20, /* must be > scheduler priority */ |
2099 | }; | 2089 | }; |
2100 | 2090 | ||
2101 | static int vm_stat_get(void *_offset, u64 *val) | 2091 | static int vm_stat_get(void *_offset, u64 *val) |
2102 | { | 2092 | { |
2103 | unsigned offset = (long)_offset; | 2093 | unsigned offset = (long)_offset; |
2104 | struct kvm *kvm; | 2094 | struct kvm *kvm; |
2105 | 2095 | ||
2106 | *val = 0; | 2096 | *val = 0; |
2107 | spin_lock(&kvm_lock); | 2097 | spin_lock(&kvm_lock); |
2108 | list_for_each_entry(kvm, &vm_list, vm_list) | 2098 | list_for_each_entry(kvm, &vm_list, vm_list) |
2109 | *val += *(u32 *)((void *)kvm + offset); | 2099 | *val += *(u32 *)((void *)kvm + offset); |
2110 | spin_unlock(&kvm_lock); | 2100 | spin_unlock(&kvm_lock); |
2111 | return 0; | 2101 | return 0; |
2112 | } | 2102 | } |
2113 | 2103 | ||
2114 | DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, NULL, "%llu\n"); | 2104 | DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, NULL, "%llu\n"); |
2115 | 2105 | ||
2116 | static int vcpu_stat_get(void *_offset, u64 *val) | 2106 | static int vcpu_stat_get(void *_offset, u64 *val) |
2117 | { | 2107 | { |
2118 | unsigned offset = (long)_offset; | 2108 | unsigned offset = (long)_offset; |
2119 | struct kvm *kvm; | 2109 | struct kvm *kvm; |
2120 | struct kvm_vcpu *vcpu; | 2110 | struct kvm_vcpu *vcpu; |
2121 | int i; | 2111 | int i; |
2122 | 2112 | ||
2123 | *val = 0; | 2113 | *val = 0; |
2124 | spin_lock(&kvm_lock); | 2114 | spin_lock(&kvm_lock); |
2125 | list_for_each_entry(kvm, &vm_list, vm_list) | 2115 | list_for_each_entry(kvm, &vm_list, vm_list) |
2126 | kvm_for_each_vcpu(i, vcpu, kvm) | 2116 | kvm_for_each_vcpu(i, vcpu, kvm) |
2127 | *val += *(u32 *)((void *)vcpu + offset); | 2117 | *val += *(u32 *)((void *)vcpu + offset); |
2128 | 2118 | ||
2129 | spin_unlock(&kvm_lock); | 2119 | spin_unlock(&kvm_lock); |
2130 | return 0; | 2120 | return 0; |
2131 | } | 2121 | } |
2132 | 2122 | ||
2133 | DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, NULL, "%llu\n"); | 2123 | DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, NULL, "%llu\n"); |
2134 | 2124 | ||
2135 | static const struct file_operations *stat_fops[] = { | 2125 | static const struct file_operations *stat_fops[] = { |
2136 | [KVM_STAT_VCPU] = &vcpu_stat_fops, | 2126 | [KVM_STAT_VCPU] = &vcpu_stat_fops, |
2137 | [KVM_STAT_VM] = &vm_stat_fops, | 2127 | [KVM_STAT_VM] = &vm_stat_fops, |
2138 | }; | 2128 | }; |
2139 | 2129 | ||
2140 | static void kvm_init_debug(void) | 2130 | static void kvm_init_debug(void) |
2141 | { | 2131 | { |
2142 | struct kvm_stats_debugfs_item *p; | 2132 | struct kvm_stats_debugfs_item *p; |
2143 | 2133 | ||
2144 | kvm_debugfs_dir = debugfs_create_dir("kvm", NULL); | 2134 | kvm_debugfs_dir = debugfs_create_dir("kvm", NULL); |
2145 | for (p = debugfs_entries; p->name; ++p) | 2135 | for (p = debugfs_entries; p->name; ++p) |
2146 | p->dentry = debugfs_create_file(p->name, 0444, kvm_debugfs_dir, | 2136 | p->dentry = debugfs_create_file(p->name, 0444, kvm_debugfs_dir, |
2147 | (void *)(long)p->offset, | 2137 | (void *)(long)p->offset, |
2148 | stat_fops[p->kind]); | 2138 | stat_fops[p->kind]); |
2149 | } | 2139 | } |
2150 | 2140 | ||
2151 | static void kvm_exit_debug(void) | 2141 | static void kvm_exit_debug(void) |
2152 | { | 2142 | { |
2153 | struct kvm_stats_debugfs_item *p; | 2143 | struct kvm_stats_debugfs_item *p; |
2154 | 2144 | ||
2155 | for (p = debugfs_entries; p->name; ++p) | 2145 | for (p = debugfs_entries; p->name; ++p) |
2156 | debugfs_remove(p->dentry); | 2146 | debugfs_remove(p->dentry); |
2157 | debugfs_remove(kvm_debugfs_dir); | 2147 | debugfs_remove(kvm_debugfs_dir); |
2158 | } | 2148 | } |
2159 | 2149 | ||
2160 | static int kvm_suspend(struct sys_device *dev, pm_message_t state) | 2150 | static int kvm_suspend(struct sys_device *dev, pm_message_t state) |
2161 | { | 2151 | { |
2162 | if (kvm_usage_count) | 2152 | if (kvm_usage_count) |
2163 | hardware_disable(NULL); | 2153 | hardware_disable(NULL); |
2164 | return 0; | 2154 | return 0; |
2165 | } | 2155 | } |
2166 | 2156 | ||
2167 | static int kvm_resume(struct sys_device *dev) | 2157 | static int kvm_resume(struct sys_device *dev) |
2168 | { | 2158 | { |
2169 | if (kvm_usage_count) | 2159 | if (kvm_usage_count) |
2170 | hardware_enable(NULL); | 2160 | hardware_enable(NULL); |
2171 | return 0; | 2161 | return 0; |
2172 | } | 2162 | } |
2173 | 2163 | ||
2174 | static struct sysdev_class kvm_sysdev_class = { | 2164 | static struct sysdev_class kvm_sysdev_class = { |
2175 | .name = "kvm", | 2165 | .name = "kvm", |
2176 | .suspend = kvm_suspend, | 2166 | .suspend = kvm_suspend, |
2177 | .resume = kvm_resume, | 2167 | .resume = kvm_resume, |
2178 | }; | 2168 | }; |
2179 | 2169 | ||
2180 | static struct sys_device kvm_sysdev = { | 2170 | static struct sys_device kvm_sysdev = { |
2181 | .id = 0, | 2171 | .id = 0, |
2182 | .cls = &kvm_sysdev_class, | 2172 | .cls = &kvm_sysdev_class, |
2183 | }; | 2173 | }; |
2184 | 2174 | ||
2185 | struct page *bad_page; | 2175 | struct page *bad_page; |
2186 | pfn_t bad_pfn; | 2176 | pfn_t bad_pfn; |
2187 | 2177 | ||
2188 | static inline | 2178 | static inline |
2189 | struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn) | 2179 | struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn) |
2190 | { | 2180 | { |
2191 | return container_of(pn, struct kvm_vcpu, preempt_notifier); | 2181 | return container_of(pn, struct kvm_vcpu, preempt_notifier); |
2192 | } | 2182 | } |
2193 | 2183 | ||
2194 | static void kvm_sched_in(struct preempt_notifier *pn, int cpu) | 2184 | static void kvm_sched_in(struct preempt_notifier *pn, int cpu) |
2195 | { | 2185 | { |
2196 | struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn); | 2186 | struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn); |
2197 | 2187 | ||
2198 | kvm_arch_vcpu_load(vcpu, cpu); | 2188 | kvm_arch_vcpu_load(vcpu, cpu); |
2199 | } | 2189 | } |
2200 | 2190 | ||
2201 | static void kvm_sched_out(struct preempt_notifier *pn, | 2191 | static void kvm_sched_out(struct preempt_notifier *pn, |
2202 | struct task_struct *next) | 2192 | struct task_struct *next) |
2203 | { | 2193 | { |
2204 | struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn); | 2194 | struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn); |
2205 | 2195 | ||
2206 | kvm_arch_vcpu_put(vcpu); | 2196 | kvm_arch_vcpu_put(vcpu); |
2207 | } | 2197 | } |
2208 | 2198 | ||
2209 | int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, | 2199 | int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, |
2210 | struct module *module) | 2200 | struct module *module) |
2211 | { | 2201 | { |
2212 | int r; | 2202 | int r; |
2213 | int cpu; | 2203 | int cpu; |
2214 | 2204 | ||
2215 | r = kvm_arch_init(opaque); | 2205 | r = kvm_arch_init(opaque); |
2216 | if (r) | 2206 | if (r) |
2217 | goto out_fail; | 2207 | goto out_fail; |
2218 | 2208 | ||
2219 | bad_page = alloc_page(GFP_KERNEL | __GFP_ZERO); | 2209 | bad_page = alloc_page(GFP_KERNEL | __GFP_ZERO); |
2220 | 2210 | ||
2221 | if (bad_page == NULL) { | 2211 | if (bad_page == NULL) { |
2222 | r = -ENOMEM; | 2212 | r = -ENOMEM; |
2223 | goto out; | 2213 | goto out; |
2224 | } | 2214 | } |
2225 | 2215 | ||
2226 | bad_pfn = page_to_pfn(bad_page); | 2216 | bad_pfn = page_to_pfn(bad_page); |
2227 | 2217 | ||
2228 | hwpoison_page = alloc_page(GFP_KERNEL | __GFP_ZERO); | 2218 | hwpoison_page = alloc_page(GFP_KERNEL | __GFP_ZERO); |
2229 | 2219 | ||
2230 | if (hwpoison_page == NULL) { | 2220 | if (hwpoison_page == NULL) { |
2231 | r = -ENOMEM; | 2221 | r = -ENOMEM; |
2232 | goto out_free_0; | 2222 | goto out_free_0; |
2233 | } | 2223 | } |
2234 | 2224 | ||
2235 | hwpoison_pfn = page_to_pfn(hwpoison_page); | 2225 | hwpoison_pfn = page_to_pfn(hwpoison_page); |
2236 | 2226 | ||
2237 | if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) { | 2227 | if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) { |
2238 | r = -ENOMEM; | 2228 | r = -ENOMEM; |
2239 | goto out_free_0; | 2229 | goto out_free_0; |
2240 | } | 2230 | } |
2241 | 2231 | ||
2242 | r = kvm_arch_hardware_setup(); | 2232 | r = kvm_arch_hardware_setup(); |
2243 | if (r < 0) | 2233 | if (r < 0) |
2244 | goto out_free_0a; | 2234 | goto out_free_0a; |
2245 | 2235 | ||
2246 | for_each_online_cpu(cpu) { | 2236 | for_each_online_cpu(cpu) { |
2247 | smp_call_function_single(cpu, | 2237 | smp_call_function_single(cpu, |
2248 | kvm_arch_check_processor_compat, | 2238 | kvm_arch_check_processor_compat, |
2249 | &r, 1); | 2239 | &r, 1); |
2250 | if (r < 0) | 2240 | if (r < 0) |
2251 | goto out_free_1; | 2241 | goto out_free_1; |
2252 | } | 2242 | } |
2253 | 2243 | ||
2254 | r = register_cpu_notifier(&kvm_cpu_notifier); | 2244 | r = register_cpu_notifier(&kvm_cpu_notifier); |
2255 | if (r) | 2245 | if (r) |
2256 | goto out_free_2; | 2246 | goto out_free_2; |
2257 | register_reboot_notifier(&kvm_reboot_notifier); | 2247 | register_reboot_notifier(&kvm_reboot_notifier); |
2258 | 2248 | ||
2259 | r = sysdev_class_register(&kvm_sysdev_class); | 2249 | r = sysdev_class_register(&kvm_sysdev_class); |
2260 | if (r) | 2250 | if (r) |
2261 | goto out_free_3; | 2251 | goto out_free_3; |
2262 | 2252 | ||
2263 | r = sysdev_register(&kvm_sysdev); | 2253 | r = sysdev_register(&kvm_sysdev); |
2264 | if (r) | 2254 | if (r) |
2265 | goto out_free_4; | 2255 | goto out_free_4; |
2266 | 2256 | ||
2267 | /* A kmem cache lets us meet the alignment requirements of fx_save. */ | 2257 | /* A kmem cache lets us meet the alignment requirements of fx_save. */ |
2268 | if (!vcpu_align) | 2258 | if (!vcpu_align) |
2269 | vcpu_align = __alignof__(struct kvm_vcpu); | 2259 | vcpu_align = __alignof__(struct kvm_vcpu); |
2270 | kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align, | 2260 | kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align, |
2271 | 0, NULL); | 2261 | 0, NULL); |
2272 | if (!kvm_vcpu_cache) { | 2262 | if (!kvm_vcpu_cache) { |
2273 | r = -ENOMEM; | 2263 | r = -ENOMEM; |
2274 | goto out_free_5; | 2264 | goto out_free_5; |
2275 | } | 2265 | } |
2276 | 2266 | ||
2277 | kvm_chardev_ops.owner = module; | 2267 | kvm_chardev_ops.owner = module; |
2278 | kvm_vm_fops.owner = module; | 2268 | kvm_vm_fops.owner = module; |
2279 | kvm_vcpu_fops.owner = module; | 2269 | kvm_vcpu_fops.owner = module; |
2280 | 2270 | ||
2281 | r = misc_register(&kvm_dev); | 2271 | r = misc_register(&kvm_dev); |
2282 | if (r) { | 2272 | if (r) { |
2283 | printk(KERN_ERR "kvm: misc device register failed\n"); | 2273 | printk(KERN_ERR "kvm: misc device register failed\n"); |
2284 | goto out_free; | 2274 | goto out_free; |
2285 | } | 2275 | } |
2286 | 2276 | ||
2287 | kvm_preempt_ops.sched_in = kvm_sched_in; | 2277 | kvm_preempt_ops.sched_in = kvm_sched_in; |
2288 | kvm_preempt_ops.sched_out = kvm_sched_out; | 2278 | kvm_preempt_ops.sched_out = kvm_sched_out; |
2289 | 2279 | ||
2290 | kvm_init_debug(); | 2280 | kvm_init_debug(); |
2291 | 2281 | ||
2292 | return 0; | 2282 | return 0; |
2293 | 2283 | ||
2294 | out_free: | 2284 | out_free: |
2295 | kmem_cache_destroy(kvm_vcpu_cache); | 2285 | kmem_cache_destroy(kvm_vcpu_cache); |
2296 | out_free_5: | 2286 | out_free_5: |
2297 | sysdev_unregister(&kvm_sysdev); | 2287 | sysdev_unregister(&kvm_sysdev); |
2298 | out_free_4: | 2288 | out_free_4: |
2299 | sysdev_class_unregister(&kvm_sysdev_class); | 2289 | sysdev_class_unregister(&kvm_sysdev_class); |
2300 | out_free_3: | 2290 | out_free_3: |
2301 | unregister_reboot_notifier(&kvm_reboot_notifier); | 2291 | unregister_reboot_notifier(&kvm_reboot_notifier); |
2302 | unregister_cpu_notifier(&kvm_cpu_notifier); | 2292 | unregister_cpu_notifier(&kvm_cpu_notifier); |
2303 | out_free_2: | 2293 | out_free_2: |
2304 | out_free_1: | 2294 | out_free_1: |
2305 | kvm_arch_hardware_unsetup(); | 2295 | kvm_arch_hardware_unsetup(); |
2306 | out_free_0a: | 2296 | out_free_0a: |
2307 | free_cpumask_var(cpus_hardware_enabled); | 2297 | free_cpumask_var(cpus_hardware_enabled); |
2308 | out_free_0: | 2298 | out_free_0: |
2309 | if (hwpoison_page) | 2299 | if (hwpoison_page) |
2310 | __free_page(hwpoison_page); | 2300 | __free_page(hwpoison_page); |
2311 | __free_page(bad_page); | 2301 | __free_page(bad_page); |
2312 | out: | 2302 | out: |
2313 | kvm_arch_exit(); | 2303 | kvm_arch_exit(); |
2314 | out_fail: | 2304 | out_fail: |
2315 | return r; | 2305 | return r; |
2316 | } | 2306 | } |
2317 | EXPORT_SYMBOL_GPL(kvm_init); | 2307 | EXPORT_SYMBOL_GPL(kvm_init); |
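kvm_init() is the common entry point each architecture module calls from its module_init(); vcpu_size and vcpu_align size the vcpu kmem cache for the arch container struct that embeds struct kvm_vcpu (the cache comment above refers to fx_save's alignment need). The x86 VMX module's init is essentially this shape (simplified sketch; the real function does more):

	static int __init vmx_init(void)
	{
		/* opaque is passed through to kvm_arch_init(). */
		return kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
				__alignof__(struct vcpu_vmx), THIS_MODULE);
	}
	module_init(vmx_init);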
2318 | 2308 | ||
2319 | void kvm_exit(void) | 2309 | void kvm_exit(void) |
2320 | { | 2310 | { |
2321 | kvm_exit_debug(); | 2311 | kvm_exit_debug(); |
2322 | misc_deregister(&kvm_dev); | 2312 | misc_deregister(&kvm_dev); |
2323 | kmem_cache_destroy(kvm_vcpu_cache); | 2313 | kmem_cache_destroy(kvm_vcpu_cache); |
2324 | sysdev_unregister(&kvm_sysdev); | 2314 | sysdev_unregister(&kvm_sysdev); |
2325 | sysdev_class_unregister(&kvm_sysdev_class); | 2315 | sysdev_class_unregister(&kvm_sysdev_class); |
2326 | unregister_reboot_notifier(&kvm_reboot_notifier); | 2316 | unregister_reboot_notifier(&kvm_reboot_notifier); |
2327 | unregister_cpu_notifier(&kvm_cpu_notifier); | 2317 | unregister_cpu_notifier(&kvm_cpu_notifier); |
2328 | on_each_cpu(hardware_disable, NULL, 1); | 2318 | on_each_cpu(hardware_disable, NULL, 1); |
2329 | kvm_arch_hardware_unsetup(); | 2319 | kvm_arch_hardware_unsetup(); |
2330 | kvm_arch_exit(); | 2320 | kvm_arch_exit(); |
2331 | free_cpumask_var(cpus_hardware_enabled); | 2321 | free_cpumask_var(cpus_hardware_enabled); |
2332 | __free_page(hwpoison_page); | 2322 | __free_page(hwpoison_page); |
2333 | __free_page(bad_page); | 2323 | __free_page(bad_page); |
2334 | } | 2324 | } |
2335 | EXPORT_SYMBOL_GPL(kvm_exit); | 2325 | EXPORT_SYMBOL_GPL(kvm_exit); |
2336 | 2326 |