  Paravirt_ops on IA64
  ====================
                            21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp>
  
  
  Introduction
  ------------
  This document aims to help with maintainability and to encourage people
  to use paravirt_ops/IA64.
  
  paravirt_ops (pv_ops for short) is a mechanism for virtualization
  support in the Linux kernel on x86. Several approaches to virtualization
  support were proposed, and paravirt_ops is the one that was adopted.
  There are now also several IA64 virtualization technologies, such as
  kvm/IA64, xen/IA64 and many academic IA64 hypervisors, so it is
  worthwhile to add a generic virtualization infrastructure to Linux/IA64.
  
  
  What is paravirt_ops?
  ---------------------
  It was developed on x86 as virtualization support via an API, not an ABI.
  It allows each hypervisor to override, at the API level, the operations
  that matter to it, and it allows a single kernel binary to run on all
  supported execution environments, including the native machine.
  Essentially, paravirt_ops is a set of function pointers which represent
  operations corresponding to low level sensitive instructions and high
  level functionality in various areas. One significant difference from
  an ordinary function pointer table is that it allows optimization by
  binary patching, because some of these operations are very
  performance sensitive and the indirect call overhead is not negligible.
  With binary patching, an indirect C function call can be transformed into
  a direct C function call or in-place execution to eliminate the overhead.
  
  Thus, the operations of paravirt_ops are classified into three categories.
  - simple indirect call
    These operations correspond to high level functionality, so the
    overhead of an indirect call isn't very important.
  
  - indirect call which allows optimization with binary patch
    Usually these operations correspond to low level instructions. They
    are called frequently and are performance critical, so the call
    overhead matters.
  
  - a set of macros for hand written assembly code
    Hand written assembly code (.S files) also needs paravirtualization
    because it includes sensitive instructions or because some of its code
    paths are very performance critical.
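
  As a rough illustration of the first category, the following user-space
  C sketch shows a table of function pointers that defaults to native
  implementations and is overridden by a hypervisor port. All names
  (demo_pv_ops, demo_read_timer and so on) are hypothetical and are not
  taken from the kernel source. Binary patching is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; field names are illustrative only. */
struct demo_pv_ops {
	const char *name;                  /* execution environment */
	unsigned long (*read_timer)(void); /* stands in for a sensitive operation */
};

/* Native implementation: would touch the hardware directly. */
static unsigned long demo_native_read_timer(void)
{
	return 100;
}

/* Hypervisor implementation: would issue a hypercall instead. */
static unsigned long demo_hv_read_timer(void)
{
	return 200;
}

/* The table defaults to native, so bare metal needs no extra setup. */
static struct demo_pv_ops demo_pv_ops = {
	.name = "native",
	.read_timer = demo_native_read_timer,
};

/* A hypervisor port overrides the entries early during boot. */
static void demo_hv_setup(void)
{
	demo_pv_ops.name = "demo-hv";
	demo_pv_ops.read_timer = demo_hv_read_timer;
}

/* Callers always go through the table: this is the indirect call. */
static unsigned long demo_read_timer(void)
{
	assert(demo_pv_ops.read_timer != NULL);
	return demo_pv_ops.read_timer();
}
```

  Callers only ever use demo_read_timer(), so the same binary behaves
  natively until demo_hv_setup() has run and uses the hypervisor
  implementation afterwards.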
  
  
  The relation to the IA64 machine vector
  ---------------------------------------
  Linux/IA64 has the IA64 machine vector functionality, which allows the
  kernel to switch implementations (e.g. initialization, ipi, dma api...)
  depending on the platform it is running on.
  Some implementations can be replaced very easily by defining a new
  machine vector. Thus another approach to virtualization support would be
  to enhance the machine vector functionality.
  But the paravirt_ops approach was taken because
  - virtualization support needs wider coverage than the machine vector
    provides, e.g. low level instruction paravirtualization. It must be
    initialized very early, before platform detection.
  
  - virtualization support needs more functionality, such as binary patching.
    The calling overhead is probably not very large compared to the
    emulation overhead of virtualization. In the native case, however, the
    overhead should be eliminated completely.
    A single kernel binary should run in each environment including native,
    and the overhead of paravirt_ops in the native environment should be as
    small as possible.
  
  - for full virtualization technology, e.g. KVM/IA64 or
    Xen/IA64 HVM domain, the result would be
    (the emulated platform machine vector, probably dig) + (pv_ops).
    This means that the virtualization support layer should be under
    the machine vector layer.
  
  It might be better to move some function pointers from
  paravirt_ops to the machine vector. In fact, the Xen domU case uses both
  pv_ops and the machine vector.
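
  The layering described above can be sketched in user-space C as
  follows. This is only an illustration under assumed names: demo_machvec
  stands in for a platform machine vector and demo_pv_cpu for a low level
  pv_ops hook. Neither is the kernel's actual definition.

```c
#include <assert.h>

#define DEMO_NR_REGS 4

/* Lower layer: hypothetical pv_ops-style hook for a sensitive operation. */
struct demo_pv_cpu {
	void (*write_reg)(int reg, unsigned long val);
};

static unsigned long demo_hw_regs[DEMO_NR_REGS]; /* pretend hardware state */
static int demo_hypercalls;                      /* counts hypervisor entries */

static void demo_native_write_reg(int reg, unsigned long val)
{
	assert(reg >= 0 && reg < DEMO_NR_REGS);
	demo_hw_regs[reg] = val;
}

static void demo_hv_write_reg(int reg, unsigned long val)
{
	assert(reg >= 0 && reg < DEMO_NR_REGS);
	demo_hypercalls++;       /* a real port would trap to the hypervisor */
	demo_hw_regs[reg] = val;
}

static struct demo_pv_cpu demo_pv_cpu = {
	.write_reg = demo_native_write_reg,
};

/* Upper layer: hypothetical machine vector, selected per platform. */
struct demo_machvec {
	const char *name;
	void (*send_ipi)(int cpu);
};

/* Platform code is written once and reaches the hardware via pv_ops. */
static void demo_dig_send_ipi(int cpu)
{
	demo_pv_cpu.write_reg(0, (unsigned long)cpu);
}

static struct demo_machvec demo_machvec = {
	.name = "dig",
	.send_ipi = demo_dig_send_ipi,
};
```

  Replacing demo_pv_cpu.write_reg changes what happens underneath
  demo_machvec.send_ipi() without touching the machine vector itself,
  which is the sense in which the virtualization layer sits below it.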
  
  
  IA64 paravirt_ops
  -----------------
  This section discusses the concrete paravirt_ops.
  Because of the architectural differences between ia64 and x86, the
  resulting set of functions is very different from x86 pv_ops.
  
  - C function pointer tables
  They are not very performance critical, so a simple C indirect
  function call is acceptable. The following structures are currently
  defined. For details see linux/include/asm-ia64/paravirt.h
    - struct pv_info
      This structure describes the execution environment.
    - struct pv_init_ops
      This structure describes the various initialization hooks.
    - struct pv_iosapic_ops
      This structure describes hooks to iosapic operations.
    - struct pv_irq_ops
      This structure describes hooks to irq related operations.
    - struct pv_time_op
      This structure describes hooks to steal time accounting.
  
  - a set of indirect calls which need optimization
  Currently this class of functions corresponds to a subset of IA64
  intrinsics. Optimization with binary patching is not implemented yet.
  struct pv_cpu_op is defined. For details see
  linux/include/asm-ia64/paravirt_privop.h
  Mostly they correspond to ia64 intrinsics 1-to-1.
  Caveat: They are currently defined as C indirect function pointers, but
  in order to support binary patch optimization, they will be changed to
  use GCC extended inline assembly code.
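
  A minimal user-space sketch of intrinsic-style wrappers routed through
  a table of C function pointers is given here. The structure, register
  numbers and wrapper macros are hypothetical stand-ins. They do not
  reproduce the contents of paravirt_privop.h.

```c
#include <assert.h>

/* Hypothetical register numbers; not the real ia64 register encoding. */
enum demo_reg { DEMO_REG_PSR, DEMO_REG_TPR, DEMO_REG_COUNT };

struct demo_pv_cpu_ops {
	unsigned long (*getreg)(int regnum);
	void (*setreg)(int regnum, unsigned long val);
};

static unsigned long demo_hw_regs[DEMO_REG_COUNT];     /* "hardware" state */
static unsigned long demo_shadow_regs[DEMO_REG_COUNT]; /* hypervisor copy */

static unsigned long demo_native_getreg(int regnum)
{
	assert(regnum >= 0 && regnum < DEMO_REG_COUNT);
	return demo_hw_regs[regnum];
}

static void demo_native_setreg(int regnum, unsigned long val)
{
	assert(regnum >= 0 && regnum < DEMO_REG_COUNT);
	demo_hw_regs[regnum] = val;
}

/* Virtualized variant: works on a shadow copy instead of the hardware. */
static unsigned long demo_hv_getreg(int regnum)
{
	assert(regnum >= 0 && regnum < DEMO_REG_COUNT);
	return demo_shadow_regs[regnum];
}

static void demo_hv_setreg(int regnum, unsigned long val)
{
	assert(regnum >= 0 && regnum < DEMO_REG_COUNT);
	demo_shadow_regs[regnum] = val;
}

static struct demo_pv_cpu_ops demo_pv_cpu_ops = {
	.getreg = demo_native_getreg,
	.setreg = demo_native_setreg,
};

static const struct demo_pv_cpu_ops demo_hv_cpu_ops = {
	.getreg = demo_hv_getreg,
	.setreg = demo_hv_setreg,
};

/* Intrinsic-style wrappers: one wrapper per operation, 1-to-1. */
#define demo_getreg(regnum)      demo_pv_cpu_ops.getreg(regnum)
#define demo_setreg(regnum, val) demo_pv_cpu_ops.setreg((regnum), (val))
```

  Code that uses demo_getreg() and demo_setreg() does not change when
  demo_pv_cpu_ops is replaced by demo_hv_cpu_ops. Each use still costs an
  indirect call, which is the overhead binary patching is meant to remove.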
  
  - a set of macros for hand written assembly code (.S files)
  For maintainability, the approach taken for .S files is a single
  source file compiled multiple times with different macro definitions.
  Each pv_ops instance must define those macros in order to compile.
  The important point is that sensitive but non-privileged
  instructions must be paravirtualized, and that some privileged
  instructions also need paravirtualization for reasonable performance.
  Developers who modify .S files must be aware of this. A simple
  checker is currently implemented to detect paravirtualization breakage,
  but it doesn't cover all cases.
  
  This set of macros is sometimes called pv_cpu_asm_op, but there is no
  corresponding structure in the source code.
  Those macros mostly correspond 1:1 to a subset of privileged
  instructions. See linux/include/asm-ia64/native/inst.h.
  Some functions written in assembly also need to be overridden, so
  each pv_ops instance has to define some additional macros. Again see
  linux/include/asm-ia64/native/inst.h.
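
  The "single source, multiple macro definitions" idea can be imitated in
  plain C as shown here. The kernel applies it to .S files by compiling
  them several times. In this sketch a generator macro plays that role
  instead, and every name (DEMO_NATIVE_MOV_FROM_PSR and so on) is
  hypothetical.

```c
#include <assert.h>

static unsigned long demo_psr = 0x10;        /* pretend hardware register */
static unsigned long demo_shared_psr = 0x20; /* pretend hypervisor-shared copy */

/* Per-instance macro definitions, in the spirit of native/inst.h. */
#define DEMO_NATIVE_MOV_FROM_PSR() (demo_psr)
#define DEMO_HV_MOV_FROM_PSR()     (demo_shared_psr)

/*
 * The single source body. It is written only in terms of the
 * MOV_FROM_PSR parameter and never names a concrete instruction.
 */
#define DEMO_DEFINE_SAVE_FLAGS(fn, MOV_FROM_PSR)	\
	static unsigned long fn(void)			\
	{						\
		return MOV_FROM_PSR() & 0xffUL;		\
	}

/* "Compile" the same body once per pv_ops instance. */
DEMO_DEFINE_SAVE_FLAGS(demo_native_save_flags, DEMO_NATIVE_MOV_FROM_PSR)
DEMO_DEFINE_SAVE_FLAGS(demo_hv_save_flags, DEMO_HV_MOV_FROM_PSR)

/* Both instances come from one body but read different state. */
static void demo_check_instances(void)
{
	assert(demo_native_save_flags() == 0x10);
	assert(demo_hv_save_flags() == 0x20);
}
```

  An instance that does not supply its own definition of the macro simply
  fails to build, which mirrors the requirement that each pv_ops instance
  must define the macros.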
  
  
  Those structures must be initialized very early, before start_kernel,
  probably in head.S using multiple entry points or some other trick.
  For the native case implementation see linux/arch/ia64/kernel/paravirt.c.